🧬Dividend Neural Engine — 7 Modules (Boris Banushev Framework)
🌊 Fourier Denoiser
FFT-based denoising of dividend history. Detects special dividends as 2σ outliers. Isolates true trend vs noise.
active 215 lines
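The 2σ rule for special dividends can be illustrated with a minimal sketch. It uses only the standard library and leaves out the FFT step entirely; the function name `flag_special_dividends` and the sample history are hypothetical and not taken from the module.

```python
from statistics import mean, pstdev

def flag_special_dividends(amounts, n_sigma=2.0):
    """Return indices of payments more than n_sigma std devs from the mean."""
    if len(amounts) < 3:
        return []  # too short to estimate a spread
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # perfectly flat history, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) > n_sigma * sigma]

# Hypothetical history: eleven regular payments and one large special payment.
history = [0.50] * 11 + [2.00]
special = flag_special_dividends(history)
```

In a real denoiser the flagged payments would presumably be removed or capped before the trend is extracted, so that a one-off payout does not distort the smoothed series.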
📊 ARIMA Feature
Auto-selects best (p,d,q) via AIC. Forecasts next dividend as ML input feature. Handles short series gracefully.
active 212 lines
🔲 Stacked Autoencoder
PyTorch 3-layer encoder/decoder. Compresses 12 dividend features → 16-dim latent vector for LSTM input.
PyTorch 473 lines
🔮 LSTM Predictor
2-layer LSTM + attention. MC Dropout (50 passes) for confidence intervals. Predicts next dividend amount.
PyTorch 445 lines
🎲 GAN Engine
LSTM Generator + CNN Discriminator. Generates 100 synthetic dividend futures. The resulting cut_probability feeds the ensemble as an 8%-weight vote.
ensemble voter 432 lines
🗺️ SOM Anomaly Detector
10×10 Self-Organizing Map. Detects unusual payout/FCF trajectories. Returns similar tickers by BMU proximity.
minisom 487 lines
🔗 Eigen / Contagion
PCA on dividend growth matrix. Dividend contagion risk: if sector peer cuts, who follows? 24h TTL cache.
sklearn PCA 256 lines
🏭Harvey Unified ML API — Port 9001
Status: ✓ healthy
Service: harvey-unified-ml
Version: 4.0.0
Models loaded: 7
Host: 0.0.0.0:9001 (public) + nginx proxy
Endpoints: /score/symbol · /predict/yield · /predict/cut-risk · /predict/payout-rating · /health
CPU usage: 0.9%
Memory: 1.2% of 27 GB ≈ 330 MB
⚖️Ensemble Weight Distribution
Harvey GPT-5 (Azure OAI): 17%
Claude Sonnet 4 (Verification): 5%
🛡️ML-Powered Moat Features
FeatureExtractionService
Extracts 12 proprietary dividend features from DB + market data for ML input pipeline.
active
DividendQualityScorer
ML scoring of dividend quality across 6 dimensions. sklearn-based (offline on LLM VM).
sklearn missing (dev)
CutRiskAnalyzer
XGBoost dividend cut risk model. Rule-based fallback active when model unavailable.
xgboost missing (dev)
Harvey Score Service
Pre-computed scores for top 500 tickers daily at 2 AM. Composite dividend health score.
active
NAV Avoidance ML Screen
6 warning signals, erosion trend, distribution trap detection across 50+ securities.
active
Claim Verification Gate
Mandatory fact-check before AI response delivery. Cross-references ground truth tables. Upgraded with apply_multi_source_consensus() — cross-checks LLM output against median consensus, upgrades to VERIFIED within tolerance, marks CONFLICT on deviation.
active
🔬Cross-API Verification Layer — v1.0
multi_source_verifier.py
active
804 lines
5-min TTL cache
Fires 5 parallel API calls on every ticker preflight. Computes median consensus across all responding sources. Injects a VerificationResult.context_block() into the LLM system prompt before any response is generated — ensuring Harvey always reasons from ground-truth consensus rather than stale or hallucinated figures.
Data Sources
FMP (Financial Modeling Prep)
Finnhub
EODHD
Alpha Vantage
Harvey DB (unified_dividends)
Verified Metrics
📈 Price
💰 Dividend Yield
📅 Annual Dividend
⚖️ Payout Ratio
🕐 Last Dividend
Discrepancy Thresholds
Price: 2%
Dividend: 5%
Yield: 15%
Payout ratio: 20%
Confidence Levels
HIGH ✅ ≥3 sources agree
MEDIUM ⚡ 2 sources agree
LOW ⚠️ single source
Conflicts flagged & surfaced to LLM
Pipeline flow: ticker preflight → parallel fetch (asyncio.gather) → median consensus → context_block() → LLM system prompt injection → response generation → apply_multi_source_consensus() post-check → VERIFIED / CONFLICT label appended
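The fetch and consensus steps of this flow can be sketched as follows. This is an illustration under stated assumptions, not the code in multi_source_verifier.py: the helper names (`fetch_all`, `consensus`), the stub fetchers, and the rule that a source "agrees" when it lies within the metric's threshold of the median are all assumptions. The tolerances and confidence levels are the ones listed above.

```python
import asyncio
from statistics import median

# Relative tolerances from the "Discrepancy Thresholds" list.
TOLERANCE = {"price": 0.02, "dividend": 0.05, "yield": 0.15, "payout_ratio": 0.20}

async def fetch_all(fetchers, ticker):
    """Run every source concurrently; drop sources that raise or return None."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[n](ticker) for n in names), return_exceptions=True
    )
    return {
        n: r for n, r in zip(names, results)
        if not isinstance(r, BaseException) and r is not None
    }

def _within(value, mid, tol):
    if mid == 0:
        return value == 0
    return abs(value - mid) / abs(mid) <= tol

def consensus(metric, values_by_source):
    """Median consensus plus a confidence label for one metric."""
    if not values_by_source:
        return None
    mid = median(values_by_source.values())
    tol = TOLERANCE[metric]
    agree = [s for s, v in values_by_source.items() if _within(v, mid, tol)]
    conflicts = sorted(set(values_by_source) - set(agree))
    level = "HIGH" if len(agree) >= 3 else "MEDIUM" if len(agree) == 2 else "LOW"
    return {"metric": metric, "consensus": mid,
            "confidence": level, "conflicts": conflicts}

async def _demo():
    # Stub fetchers with made-up prices; one source fails outright.
    async def ok(value):
        await asyncio.sleep(0)
        return value

    async def broken(_ticker):
        raise TimeoutError("source unavailable")

    fetchers = {
        "fmp": lambda t: ok(100.0),
        "finnhub": lambda t: ok(100.5),
        "eodhd": lambda t: ok(99.8),
        "alpha_vantage": lambda t: ok(110.0),
        "harvey_db": broken,
    }
    prices = await fetch_all(fetchers, "T")
    return consensus("price", prices)

result = asyncio.run(_demo())
```

Passing `return_exceptions=True` to `asyncio.gather` keeps one failing source from cancelling the others, which matches the "all responding sources" wording. The median is used rather than the mean so that a single outlying source cannot drag the consensus value.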
📐Quantium Library Stack — 20+ Libraries (Dividend Intelligence Pipeline)
edgartools
SEC EDGAR access — 8-K dividend declarations, 10-K XBRL cash flows, Item 5 policy text extraction
Stage 1
exchange-calendars
NYSE/NASDAQ/LSE trading day validation for ex-date confirmation and growth streak counting
Stage 2
pandas-market-calendars
Market-aware date arithmetic — fiscal year boundary detection, pay-gap widening (liquidity stress signal)
Stage 2
arch (GARCH)
GARCH(1,1) volatility model on dividend yield series. Conditional variance = cut risk. Persistence α+β.
Stage 3
pmdarima
Auto-ARIMA with AIC/BIC order search. Dividend payment forecasting with confidence intervals.
Stage 4
prophet
Facebook Prophet with quarterly seasonality decomposition. 2nd member of 3-model ensemble for 4-quarter forecast.
Stage 4
scikit-learn Ridge
Ridge regression (L2, α=1.0) with lag-4 windowed features and StandardScaler. 3rd ensemble member. Dynamic inverse-MAE weighting. CI: ±1.28×val_MAE (80%).
Stage 4 · inverse-MAE weights
tsfresh
EfficientFCParameters — 800+ statistical features from payment time series. 20-feature curated subset.
Stage 5
empyrical-reloaded
Income Sharpe, income Sortino, max income drawdown. Treats dividends as the "returns" series.
Stage 6
ffn
Financial functions — yield-on-cost CAGR, income consistency score, drawdown analytics.
Stage 6
PyPortfolioOpt
Max-yield portfolio optimization where yield replaces expected returns in the efficient frontier.
Stage 7
Riskfolio-Lib
HRP (Hierarchical Risk Parity) across dividend growth correlations for robust position sizing.
Stage 7
skfolio
Portfolio optimization toolkit — additional constraint handling for income-focused allocation problems.
Stage 7
alphalens-reloaded
Factor IC / ICIR back-testing of Harvey Safety Score, yield screen, DGR screen as alpha signals.
Stage 8
ta · finta
Technical analysis indicators applied to dividend yield time series for regime context signals.
supporting
financedatabase
Sector/industry classification database — sector peer grouping for contagion and factor analysis.
supporting
pypme · rateslib
Public Market Equivalent benchmarking and rates/bond analytics for yield spread context.
supporting
🔧Core Intelligence Services
Intelligent Query Service
19 intent types · multi-model fallback chain · learning loop
Hallucination Prevention
Ticker Preflight Gate · Schema fingerprint · Claim verification
Dividend Safety Ensemble
7-voter weighted system · VERY_SAFE → CRITICAL scale
NAV Erosion Service
50 securities · 6 warning signals · distribution trap detection
Chart Generator Service
6 chart types · matplotlib · server-side PNG generation
PDF Research Report Gen
4 report types · ReportLab · institutional-grade output
SEC Filing RAG Service
EDGAR retrieval · vector search · 10-K/Q synthesis
Stock Card Service
Data-driven per-ticker summaries · ML integration · 30+ stopwords
🌐Real-Time Data Services
FMP Integration (80+ endpoints)
Financial Modeling Prep · market data · fundamentals · estimates
Finnhub Service
Real-time quotes · news · earnings · analyst ratings
X.com Monitoring
Grok agent tools · circuit breaker active · Grok 429 degraded
FRED Economic Data
Federal Reserve economic indicators · interest rate feeds
Unified Dividends Pipeline
Multi-source dividend data · FMP + Finnhub + EDGAR → Azure SQL
Multi-Source Backfill
Auto data discovery · gap filling · Backfilled_Dividends/Prices tables
🚀v4.2 Feature Layer
Semantic Vector Memory
Persistent user conversation memory with semantic retrieval
ElevenLabs TTS
Text-to-speech audio generation for research summaries
Agentic Tool-Calling Loop
Multi-step tool orchestration with self-correction
Code Interpreter (Sandboxed)
Safe Python execution for user-defined financial calculations
SSE Token Streaming
Real-time token-by-token response streaming via Server-Sent Events
WebSocket Alert Push
Real-time dividend alert delivery via persistent WebSocket
User Profile Injection
Personalized context from persistent user profiles into LLM prompts
Auto Fine-Tuning Pipeline
Fully automated: Sunday 1 AM UTC · 3 sources (50/35/15%) · min 200 pairs · gpt-4o-mini-2024-07-18 · 6h polls · zero manual steps
Harvey v4.0 Training Agent
29 seed pairs across 6 categories · 15 LLM variations per run (every 2h) · ~180 pairs/day into InvestmentResearcherTraining (35% fine-tune weight)
🆕Recent Platform Upgrades
Chat History API
Full conversation persistence to Azure SQL. 4 endpoints: GET /api/v1/chat/conversations (list recent), GET /conversations/{id}/messages (restore thread), DELETE /conversations/{id}, POST /conversations/new. Streams persist via _history_save_wrapper. User/conversation IDs via x-user-id/x-conversation-id headers.
active · chat_history_routes.py · 4 endpoints
Analysis Depth Overhaul v4.1
Every single-ticker dividend card now calls Claude Sonnet for 5 mandatory analysis sections: Dividend Sustainability (VERY SAFE/SAFE/WATCH/AT RISK/DANGER), Growth Trajectory, Yield in Context (vs sector/S&P/T-bill/peers), Business Quality, and Harvey's Verdict (BUY/HOLD/WATCH/AVOID). max_tokens raised to 2800. Banned-phrase enforcement. prefer_groq=False.
Claude Sonnet 4 · ai_sdk_routes.py · 5 sections · 2800 tokens
Two-Layer LLM Response Cache
Semantic response cache with an in-memory dictionary (L1) plus a persistent Azure SQL table (L2). Reduces repeated-query token cost by 30–50%. Semantic similarity matching lets rephrased queries hit the cache instead of missing. LRU eviction on L1 overflow.
active · L1 in-memory · L2 Azure SQL
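A minimal sketch of the L1/L2 lookup order and LRU eviction, assuming exact-key lookups only. Semantic similarity matching is left out, a plain dict stands in for the Azure SQL table, and the class name `TwoLayerCache` is hypothetical.

```python
from collections import OrderedDict

class TwoLayerCache:
    """L1: bounded in-memory LRU. L2: any dict-like persistent store."""

    def __init__(self, l2_store, l1_capacity=2):
        self.l1 = OrderedDict()
        self.l2 = l2_store
        self.l1_capacity = l1_capacity

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)  # refresh recency on an L1 hit
            return self.l1[key]
        value = self.l2.get(key)
        if value is not None:
            self._put_l1(key, value)  # promote an L2 hit into L1
        return value

    def set(self, key, value):
        self._put_l1(key, value)
        self.l2[key] = value

cache = TwoLayerCache(l2_store={}, l1_capacity=2)
cache.set("q1", "a1")
cache.set("q2", "a2")
cache.set("q3", "a3")  # L1 is full, so q1 is evicted from L1 but stays in L2
```

An entry evicted from L1 is still served from L2 on the next request and promoted back, so eviction costs one slower lookup rather than a fresh LLM call.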
99.999% Accuracy Enforcement (v1.0)
Multi-layer accuracy framework: middleware interception, claim verification gate, stale data refresh trigger, multi-source cross-validation. Batch INSERT logging (1 background thread replaces 24 sequential DB calls — ~4.2s → 200ms). Harvey accuracy log table auto-ensured once per process.
active · claim_verification_gate.py · batch INSERT
Broker Connect via Plaid (REST + MCP)
5 REST endpoints at /api/plaid/* plus 5 MCP tools. 3-step flow: create link token → connect account → sync portfolio. CUSIP (100%), ISIN (95%), ticker (70%) institutional-grade security matching. Dividend-only filtering, portfolio upsert, ownership-verified disconnect.
active · /api/plaid/* · PLAID_CLIENT_ID + PLAID_SECRET
DividendScreener Stale-Connection Retry
2-attempt retry loop in request_handler.py. On OperationalError at attempt 0 calls engine.dispose() to recycle the pymssql connection pool, then retries. Graceful fallback to LLM narrative if second attempt also fails. Eliminates "connection closed" errors after DB idle.
active · request_handler.py:637 · pool recycle
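The retry logic described for the screener could look roughly like this. It is a sketch, not the code at request_handler.py:637: `OperationalError` is a local stand-in for the driver-level exception, `FakeEngine` imitates an engine whose `dispose()` discards pooled connections, and `run_with_pool_recycle` is a hypothetical name.

```python
class OperationalError(Exception):
    """Stand-in for the driver error raised on a dead connection."""

def run_with_pool_recycle(engine, query_fn, fallback_fn, attempts=2):
    for attempt in range(attempts):
        try:
            return query_fn(engine)
        except OperationalError:
            if attempt == 0:
                # Drop pooled connections so the retry opens a fresh one.
                engine.dispose()
    return fallback_fn()  # every attempt failed

class FakeEngine:
    """Hypothetical engine whose pooled connection has gone stale."""

    def __init__(self, recoverable=True):
        self.stale = True
        self.recoverable = recoverable
        self.disposed = 0

    def dispose(self):
        self.disposed += 1
        if self.recoverable:
            self.stale = False

def screener_query(engine):
    if engine.stale:
        raise OperationalError("connection closed")
    return ["row-1", "row-2"]

engine = FakeEngine()
rows = run_with_pool_recycle(engine, screener_query, lambda: "LLM narrative")

dead_engine = FakeEngine(recoverable=False)
fallback = run_with_pool_recycle(dead_engine, screener_query, lambda: "LLM narrative")
```

Only the first failure triggers a pool recycle; a second failure goes straight to the fallback so a genuinely unreachable database does not hold up the response.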
Fix-It Agent v2.1
Hourly scan of conversation logs for failed responses. 19 fix strategies (7 new in v2.1: TAX_EFFICIENCY, ETF_ANALYSIS, DRIP_COMPOUNDING, RETIREMENT_INCOME, COMPARISON_ANALYSIS, POSITION_SIZING, DGI_STRATEGY). 45 bad-response detection patterns (12 new: knowledge-cutoff deflections, clarification-fishing, tool-error leakage, hedging, truncated/disclaimer-only responses, AI identity deflections, info-begging, accuracy disclaimers, hallucination confidence hedges, stall phrases).
active · 19 strategies · 45 patterns · v2.1
DIVIDEND_SCREENER Training Data v2
86x failure-pattern fix for SCREENER_LIST intent. dividend_intelligence_training.py: 12→20 scenarios, 8 new query pools (PRICE_CONSTRAINED, FCF_COVERAGE, TICKER_SEEDED, RECOVERY_DIVIDEND, NEGATIVE_SCREEN, INCOME_TARGET_REVERSE, COMPARATIVE_BETTER, VAGUE_INTENT). comprehensive_investment_trainer.py: +36 templates across 6 gap-pattern categories. LLM override prompt for screener intents — requires concrete ticker list with yield, safety rating, and rationale.
active · 20 scenarios · SCREENER_LIST · 86x fix
🧠Harvey v4.0 Standalone Capabilities
General Knowledge Handler
Detects educational/definitional queries ("what is inflation", "explain DCF") via prefix matching + capability triggers. Routes before ticker extraction — Harvey answers as CIO-level financial educator.
active · harvey_persona.py
Capability Registry
34 structured capabilities across 8 categories: Dividend Analysis, Portfolio Strategy, ETF Intelligence, Equity Research, Market/Macro, Screeners, Education, Advanced Tools. Exposed via GET /api/v1/harvey/capabilities.
active · harvey_capabilities.py
Agentic Reflect Loop
Fires after CHAT PATH LLM stream completes — checks if critical content is missing and appends a supplement. Only activates for non-low-latency queries with parsed tickers. Capped at 3s via SIGALRM.
active · _reflect_and_supplement()
Graceful Degradation
50 pre-built Q&A answers for common finance topics. Activates on full provider failure. Health exposed via GET /api/v1/harvey/health — 6 provider statuses (Azure OAI, Claude, Gemini, OpenAI, Groq, graceful_degradation).
active · 50 answers · graceful_degradation.py
Unified Persona Contract
Single source of truth for Harvey's CIO/Chief Strategist identity. get_system_prompt(mode="full"|"compact"|"fast") assembles the right persona for each call site. Imported by llm_providers.py and intelligent_query_service.py.
active · harvey_persona.py
📊StructuredKPIService — Perplexity Finance-Style Breakouts
Auto-appended to every single-ticker CHAT PATH response. Auto-detects security type via symbol sets + FMP profile. Fetches data in parallel via ThreadPoolExecutor (5s hard cap). Renders clean markdown panels.
BLOCK TYPE — stock
Business segments, valuation ratios, P&L summary for general equities.
BLOCK TYPE — dividend_stock
Yield, FCF payout, ML cut risk, 7-model safety score, growth streak years.
BLOCK TYPE — reit
FFO proxy, AFFO payout ratio, debt/equity, ML cut risk score.
BLOCK TYPE — etf
Top 10 holdings, sector allocation, expense ratio, AUM.
BLOCK TYPE — dividend_etf
Weighted yield, NAV erosion flag, distribution schedule.
BLOCK TYPE — bank
Net Interest Margin, ROE, ROA, efficiency ratio, P/Book.
BLOCK TYPE — mlp
DCF coverage ratio, distribution yield, debt/EBITDA, commodity exposure.
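The parallel fetch with a hard time cap that feeds these blocks can be sketched with `concurrent.futures`. The function name `fetch_panels` and the two sample tasks are hypothetical, and the cap is shortened in the demo so it runs quickly; the service itself is described as using a 5s cap.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def fetch_panels(tasks, hard_cap_s=5.0):
    """Run panel fetchers in parallel; keep only results that finish in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    futures = {pool.submit(fn): name for name, fn in tasks.items()}
    done, not_done = wait(futures, timeout=hard_cap_s)
    for f in not_done:
        f.cancel()  # only prevents tasks that have not started yet
    pool.shutdown(wait=False)  # do not block the response on stragglers
    results = {}
    for f in done:
        if f.exception() is None:
            results[futures[f]] = f.result()
    return results

# Hypothetical tasks: one fast fetch and one that overruns the cap.
panels = fetch_panels(
    {
        "yield": lambda: 0.065,
        "holdings": lambda: time.sleep(0.5) or ["late"],
    },
    hard_cap_s=0.1,
)
```

A thread that is already running cannot be interrupted, so the cap bounds how long the caller waits rather than how long the slow fetch runs. Panels that miss the cap are simply left out of the rendered block.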
🏗️HarvestEngine Platform — 8 Modules
Backtesting Engine
6 strategies · DRIP, Growth, High-Yield, Capture, Aristocrat, Covered Call
Portfolio Optimizer
Dividend income optimization · Markowitz-inspired allocation
Dividend Risk Analyzer
NAV erosion, cut risk, FCF coverage, sector contagion
Income Impact Simulator
Dividend income impact projection with DRIP compounding
Dividend Calendar
Ex-date tracking · declaration date patterns · 25 securities
NAV Avoidance Screener
50 securities · 20 profiles · 6 warning signals
Multi-Model Safety Ensemble
Orchestration layer · 7 voters · weight redistribution on failure
HarvestEngine Continuous Training
Expert Q&A pair generation · 13,120 rows in DB · ThreadPoolExecutor
🧪Dividend Intelligence Pipeline — 8 Services (Quantium Library Stack)
Full ML pipeline orchestrated by dividend_intelligence_pipeline.py; all 8 stages report available: true. Returns a unified DividendIntelligenceReport with the Harvey composite score and a plain-language verdict.
STAGE 1 — EDGAR Dividend Service
edgar_dividend_service.py
edgartools pulls 8-K declared dividends + 10-K XBRL cash flow + Item 5 policy text. Policy type classifier (consistent / growth / variable / suspended).
edgartools · SEC EDGAR · XBRL
STAGE 2 — Trading Calendar Service
trading_calendar_service.py
exchange_calendars validates ex-dates, counts consecutive growth years respecting fiscal years, detects pay-gap widening (liquidity stress), estimates next ex-date.
exchange-calendars · pandas-market-calendars
STAGE 3 — Yield Volatility Service
yield_volatility_service.py
ARCH/GARCH(1,1) fit on dividend yield time series → conditional variance = cut risk signal. 0–100 score, regime (stable / elevated / crisis), persistence alpha+beta.
arch/GARCH · cut risk 0–100
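For readers unfamiliar with GARCH(1,1), the conditional variance recursion and the persistence term alpha+beta can be written out directly. This sketch does not fit anything: the parameters are hand-picked for illustration, whereas the `arch` package estimates them from the yield series. How the service maps variance to its 0–100 score or regime label is not shown here.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path: s2[t] = omega + alpha*e[t-1]^2 + beta*s2[t-1]."""
    if not (omega > 0 and alpha >= 0 and beta >= 0):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0")
    persistence = alpha + beta
    if persistence >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    long_run = omega / (1.0 - persistence)
    s2 = [long_run]  # start the recursion at the unconditional variance
    for e in residuals[:-1]:
        s2.append(omega + alpha * e * e + beta * s2[-1])
    return s2, persistence, long_run

# Hypothetical yield shocks: calm, calm, one large shock, calm.
shocks = [0.0, 0.0, 0.5, 0.0]
s2, persistence, long_run = garch11_variance(shocks, omega=0.01, alpha=0.1, beta=0.8)
```

The variance drifts down while shocks are zero and jumps in the period after the large shock. A persistence close to 1 means such a jump decays slowly, which is why alpha+beta is reported alongside the score.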
STAGE 4 — Dividend Forecast Service
dividend_forecast_service.py
Three-model ensemble: Auto-ARIMA (pmdarima) + Facebook Prophet + Ridge regression (L2, α=1.0, lag-4 windowed features, StandardScaler). Dynamic inverse-MAE weighting — weight ∝ 1/val_MAE, equal-weight fallback if <2 valid. Next 4 payment forecasts with 80% CI. Predictability score 0–100.
pmdarima · prophet · Ridge · 4-quarter CI · inverse-MAE weights
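The inverse-MAE weighting and the ±1.28×val_MAE interval reduce to a few lines of arithmetic. The sketch below assumes that a "valid" model is one with a finite, positive validation MAE and that the equal-weight fallback spreads weight across all listed models; both readings, along with the function names and sample numbers, are assumptions rather than details from dividend_forecast_service.py.

```python
import math

def inverse_mae_weights(val_mae):
    """Weights proportional to 1/val_MAE; equal weights if fewer than two are usable."""
    valid = {m: e for m, e in val_mae.items()
             if e is not None and math.isfinite(e) and e > 0}
    if len(valid) < 2:
        return {m: 1.0 / len(val_mae) for m in val_mae}
    inv = {m: 1.0 / e for m, e in valid.items()}
    total = sum(inv.values())
    return {m: inv.get(m, 0.0) / total for m in val_mae}

def ci80(point, val_mae, z=1.28):
    """Interval point +/- z*val_MAE; z = 1.28 is the two-sided 80% normal quantile."""
    return point - z * val_mae, point + z * val_mae

# Hypothetical validation errors and next-payment forecasts.
val_mae = {"arima": 0.02, "prophet": 0.04, "ridge": 0.04}
forecasts = {"arima": 0.50, "prophet": 0.52, "ridge": 0.54}

weights = inverse_mae_weights(val_mae)
ensemble = sum(weights[m] * forecasts[m] for m in forecasts)
ridge_low, ridge_high = ci80(forecasts["ridge"], val_mae["ridge"])
```

With these numbers the model with half the error of the other two receives twice their weight. Using validation MAE as the interval half-width is a simple heuristic that assumes roughly normal errors.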
STAGE 5 — Dividend Feature Extractor
dividend_feature_extractor.py
tsfresh EfficientFCParameters extracts 800+ features from payment time series. Curated 20-feature subset + stability score 0–100. Powers the tsfresh Cut-Risk Classifier: GradientBoostingClassifier (n_estimators=200, max_depth=4, lr=0.05, subsample=0.8) — upgraded from RandomForest. Time-ordered train/test split with 10% embargo gap prevents look-ahead bias. Returns probability + top-5 feature drivers with direction signals. Monthly auto-retraining.
tsfresh · 800+ features · stability 0–100 · GradientBoosting · monthly retrain
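A time-ordered split with an embargo gap can be expressed as index ranges. The sketch assumes the 10% embargo is measured against the total sample count and uses a hypothetical 20% test fraction; neither detail is confirmed by the description above, and `embargoed_split` is an illustrative name.

```python
def embargoed_split(n_samples, test_frac=0.2, embargo_frac=0.10):
    """Time-ordered split: train | embargo gap | test, returned as index ranges."""
    n_test = int(n_samples * test_frac)
    n_gap = int(n_samples * embargo_frac)
    n_train = n_samples - n_test - n_gap
    if n_train <= 0 or n_test <= 0:
        raise ValueError("not enough samples for this split")
    train = range(0, n_train)
    test = range(n_train + n_gap, n_samples)  # samples in the gap are discarded
    return train, test

train_idx, test_idx = embargoed_split(100)
```

The discarded gap matters because features built from rolling windows overlap in time: without it, the last training rows and the first test rows would share underlying payments.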
STAGE 6 — Income Analytics Service
income_analytics_service.py
empyrical-reloaded + ffn compute income Sharpe, income Sortino, yield-on-cost CAGR, max income drawdown, income consistency score 0–100.
empyrical · ffn · YOC CAGR
STAGE 7 — Portfolio Income Optimizer
portfolio_income_optimizer.py
PyPortfolioOpt max-yield optimization (yield replaces expected returns), HRP across dividend growth correlations, Kelly position sizing (win_prob = 1 − cut_prob).
PyPortfolioOpt · Riskfolio-Lib · HRP · Kelly
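The Kelly step with win_prob = 1 − cut_prob follows the standard formula f = p − (1 − p)/b. In this sketch the payoff ratio b, the cap, and the floor at zero are assumptions; the description above does not say how the optimizer sets them.

```python
def kelly_fraction(cut_prob, payoff_ratio=1.0, cap=1.0):
    """Kelly fraction f = p - (1 - p) / b with p = 1 - cut_prob, clamped to [0, cap]."""
    if not 0.0 <= cut_prob <= 1.0:
        raise ValueError("cut_prob must be in [0, 1]")
    if payoff_ratio <= 0:
        raise ValueError("payoff_ratio must be positive")
    win_prob = 1.0 - cut_prob
    f = win_prob - (1.0 - win_prob) / payoff_ratio
    return max(0.0, min(cap, f))

safe_position = kelly_fraction(cut_prob=0.10)   # low cut risk
risky_position = kelly_fraction(cut_prob=0.60)  # negative edge, sized to zero
```

Full Kelly sizing is aggressive, so practitioners often scale the result down (half Kelly, for example) or apply a per-position cap.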
STAGE 8 — Dividend Factor Analyzer
dividend_factor_analyzer.py
alphalens-reloaded back-tests Harvey Safety Score, yield screen, DGR screen as alpha factors. Returns IC / ICIR metrics proving or refuting each screen's predictive power.
alphalens · IC · ICIR · alpha factor
🔗MCP Server v3.1 — 23 Financial Intelligence Tools · 5 Capability Groups
OAuth 2.1 one-click connection · stdio (Claude Desktop) + HTTP/SSE at /api/mcp/sse · MCPGuard prompt injection + rate limiting + SHA-256 audit log
📡Group 1 — Data (7 tools)
get_dividend_history
Full dividend payment history for any ticker from Azure SQL
data · 60/min
get_stock_price
Real-time price + yield from FMP integration
data · 60/min
get_dividend_calendar
Upcoming ex-dates, pay dates, declaration dates
data · 60/min
get_company_fundamentals
FCF, payout ratio, debt-to-equity, sector from DB views
data · 60/min
get_sec_filings
SEC EDGAR 10-K/10-Q/8-K structured extraction with 7-day cache
data · 30/min
get_earnings_transcript
FMP earnings call transcripts: dividend guidance + guidance language extraction
data · 30/min
get_market_data
Macro indicators, sector ETF flows, FRED interest rate data
data · 60/min
🔍Group 2 — Screening (3 tools)
screen_dividends
Filter securities by yield, DGR, payout ratio, streak, sector
screening · 60/min
screen_dividend_aristocrats
Filter by consecutive growth years: Aristocrats (25yr+), Kings (50yr+), Champions
screening · 60/min
screen_nav_safe_etfs
NAV-erosion-free ETF screening: 6 warning signals, 50 securities tracked
screening · 30/min
📊Group 3 — Analytics (4 tools)
analyze_dividend_safety
Full 7-voter ensemble safety score: VERY SAFE / SAFE / WATCH / AT RISK / DANGER
analytics · 20/min
compare_dividends
Side-by-side multi-ticker dividend comparison matrix with safety ratings
analytics · 60/min
forecast_dividend
Three-model ensemble forecast (ARIMA + Prophet + Ridge): next 4 quarters with 80% CI
analytics · 20/min
optimize_portfolio
HarvestEngine max-yield portfolio optimization with HRP position sizing
analytics · 10/min
🧠Group 4 — Intelligence (4 tools)
ask_harvey
Natural language query to Harvey's full AI pipeline. Prompt injection & manipulation detection via MCPGuard.
intelligence · 10/min · 50/hr · guarded
generate_research_report
Claude Opus 4.5 deep research: IoC, earnings, sector, dividend sustainability, risk
intelligence · 5/min
get_harvey_capabilities
List Harvey's 34 capabilities across 8 categories for agent discovery
intelligence · no limit
get_durability_graph
Composite durability score with 6 explainable sub-scores, stress scenarios, historical trends
intelligence · 20/min
🏦Group 5 — Brokerage / Plaid Connect (5 tools)
create_brokerage_link_token
Step 1: Generate Plaid Link token for OAuth brokerage connection flow
brokerage · 30/min
connect_brokerage_account
Step 2: Exchange public token → access token; persist to DB. CUSIP/ISIN/ticker matching.
brokerage · 30/min
sync_brokerage_portfolio
Step 3: Pull holdings → enrich with dividend data → upsert portfolio. Dividend-only filter.
brokerage · 10/min
get_brokerage_portfolio
Retrieve synced portfolio positions with Harvey safety scores and income projections
brokerage · 60/min
disconnect_brokerage
Revoke Plaid access token + purge portfolio data. Ownership-verified delete.
brokerage · 10/min
Version: MCP Server v3.1 · MCP SDK v1.26.0
Transports: stdio (Claude Desktop) + HTTP/SSE at /api/mcp/sse
Auth: OAuth 2.1 one-click connection · API key for REST consumers
Security: MCPGuard prompt injection (16 override + 7 manipulation patterns) + sliding-window rate limiter + SHA-256 audit log → dbo.mcp_audit_log
Brokerage matching: CUSIP 100% · ISIN 95% · Ticker 70% (institutional-grade security resolution)
Health: /api/mcp/health · /api/mcp/tools
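A sliding-window rate limiter and a SHA-256 audit digest of the kind MCPGuard is described as using can be sketched as below. The class and function names are hypothetical, and the choice to hash a canonical JSON encoding of the tool call is an assumption; the actual contents of dbo.mcp_audit_log are not specified here. The 10-per-minute limit mirrors the ask_harvey entry.

```python
import hashlib
import json
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_s` seconds for each key."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.calls = {}

    def allow(self, key):
        now = self.clock()
        q = self.calls.setdefault(key, deque())
        while q and now - q[0] >= self.window_s:
            q.popleft()  # drop timestamps that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

def audit_digest(tool, args):
    """SHA-256 over a canonical JSON encoding of the call (assumed format)."""
    payload = json.dumps({"tool": tool, "args": args},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# A controllable clock keeps the demo deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=10, window_s=60, clock=lambda: fake_now[0])
first_ten = [limiter.allow("client-a") for _ in range(10)]
eleventh = limiter.allow("client-a")
fake_now[0] = 60.0
after_window = limiter.allow("client-a")
digest = audit_digest("ask_harvey", {"query": "Is T safe?"})
```

Unlike a fixed-window counter, the sliding window cannot be gamed by bursting at a window boundary, because each call is judged against the preceding 60 seconds.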
🤖Claude Intelligence Layer v1.2.0 — 8 Services
ClaudeClient (Async HTTP Core)
httpx async wrapper for Anthropic API — sonnet/opus/haiku. Lazy import, fails gracefully if ANTHROPIC_API_KEY absent. No SDK dependency. Shared across all 7 other Claude services.
Sonnet 4 / Opus 4.5 · httpx · no SDK
Premium Query Router
39-signal frozenset routes complex queries to Claude Sonnet 4 with 6,000-token budget. Triggers: initiation of coverage, passive income plan, retirement portfolio, deep dive, investment thesis. OAI fallback on failure.
Sonnet 4 · 39 signals · request_handler.py
Deep Research Agent
Opus 4.5 for institutional-grade equity research. 5 report types: initiation-of-coverage, earnings deep dive, sector comparative, dividend sustainability, risk assessment. Multi-section structured output.
Opus 4.5 · POST /api/v1/claude/research/report
Training Quality Reviewer
Scores Q&A training pairs across 5 rubric dimensions. Returns verdict (pass/review/fail), dimension scores, improvement notes. Batch up to 50 concurrent. Raises overall fine-tune dataset quality.
Sonnet 4 · POST /api/v1/claude/training/review · batch 50
Self-Improvement Engine
Queries harvey_query_log + harvey_feedback DB tables. Outputs SelfImprovementReport: persona health score 0–100, knowledge gaps, prompt refinements, training recommendations. Scheduled analysis.
Sonnet 4 · POST /api/v1/claude/improve/analyze
Safety Ensemble Vote
Claude Sonnet 4 as 5% weighted voter in the 7-model dividend safety ensemble. Logical consistency check + hallucination detection. Dynamic weight redistribution on model failure.
Sonnet 4 · 5% weight · POST /api/v1/claude/ensemble/safety-score
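One plausible reading of "dynamic weight redistribution on model failure" is proportional rescaling of the surviving voters. The sketch below shows that reading only; the actual rule is not documented here, and every weight except Claude's 5% is a made-up placeholder.

```python
def redistribute(weights, failed):
    """Drop failed voters and rescale the survivors so the weights sum to 1."""
    alive = {v: w for v, w in weights.items() if v not in failed}
    total = sum(alive.values())
    if total <= 0:
        raise ValueError("no voters left to carry the vote")
    return {v: w / total for v, w in alive.items()}

# Placeholder weights; only the 5% Claude share comes from the description above.
weights = {"voter_a": 0.60, "voter_b": 0.20, "claude": 0.05, "grok": 0.15}
live_weights = redistribute(weights, failed={"grok"})
```

Proportional rescaling preserves the relative influence of the remaining voters, so a small voter stays small after a larger one drops out.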
Financial Formula Engine
43+ deterministic formulas across 8 categories with Claude explanations and Excel syntax. Prevents hallucination on quantitative calculations. Gordon Growth Model, Yield on Cost, AFFO Payout, DCF and more.
Sonnet 4 · 43 formulas · GET /api/excel/formulas/list
ClaudeDeepResearch (Direct Methods)
Three callable research primitives: generate_initiation_report() — full IoC with valuation + dividend thesis; generate_dividend_deep_dive() — payout safety + growth trajectory + BUY/HOLD/WATCH/AVOID verdict; generate_sector_comparison() — peer ranking with dividend yield matrix.
Opus 4.5 · 3 research primitives · POST /api/v1/claude/research/report
Status endpoint: GET /api/v1/claude/status
Client: httpx async (no Anthropic SDK); lazy import, fails gracefully if ANTHROPIC_API_KEY absent
Models: claude-sonnet-4-20250514 (default) + claude-opus-4-20250514 (deep research only)
𝕏X Real-Time Signal Service — v2.0 (X API 2025)
grok_x_dividend_service.py
pay-per-use
XMCP-ready
grok-3-fast
MCP tool #19
X is the most real-time data platform on earth. Harvey now queries it via Grok x_search for any ticker — not just monitored ETF accounts. Pay-per-use billing (X API 2025) means no monthly credit exhaustion. XMCP Server hook will upgrade to native X MCP context when configured.
Signal Types
📣 Dividend announcements
⚠️ Cut / suspension warnings
📊 Earnings reactions
🏦 Analyst calls
👤 Insider activity
🔴 Breaking news
Coverage
Any public ticker
10 monitored ETF accounts
All public X posts
Image understanding
1–30 day lookback
Sentiment classification
New Endpoints
GET /api/x/signals/{ticker}
GET /api/x/signals/{ticker}/dividend
GET /api/x/status
GET /api/x/xmcp/status
MCP Tool #19
get_x_dividend_signals
Args: ticker, days_back
Returns: signals[], sentiment_summary, top_signal
Claude/Cursor can now ask:
"What is X saying about $T?"
Grok Responses API /v1/responses
x_search tool
no monthly limits
XMCP fallback active
env: X_XMCP_SERVER_URL
🗂️API Routes (65 files)
advanced_analytics · advisor · agent_tool · ai_sdk · alert_push · claude_intelligence · code_interpreter · comprehensive_training · curated_list · dashboard · database_ml · deep_research · deepseek_training · dividend_aristocrats · dividend_intelligence · dividend_lists · dividend_neural · dividend_pipeline · document_learning · education_training · external_ml_api · file_processing · finetuning · fingpt · finrobot · fmp · fmp_training · general_investment · hallucination_prevention · harvest · harvest_training · hashtag · investment_strategy · market_intelligence · mcp · ml_prediction · moat · multi_source_training · notebook · perplexity · perplexity_training · quantlibs_training · recommendation · recommendation_training · researcher_agent · rlm · roundtable · security_comparison · semantic · sentiment · social_media · strategic · streaming_chat · trading_intelligence · trading_strategies_training · training_management · tts · ultimate_pack · unified_data_lake · user_profile · video · video_training · x_dividend · x_dividend_training
🌐Network & Ports
8001: Harvey Main API (public)
9001: ML API (public)
9000: Internal API (localhost)
8000: Shadow / restarting
443: Nginx → llm.theharvey.ai
🗄️Database
Engine: Azure SQL Server
Server: hey-dividend-sql-server.database.windows.net
Database: HeyDividend-Main-DB
Driver: pymssql (native FreeTDS)
Views: vSecurities, vDividendsEnhanced, vSchedules, vSignals, vPredictions
unified_dividends: 787,787 rows · canonical primary source
Data guard: CHK_unified_dividends_amount_sanity · amount > 0 AND ≤ $500 · 0 bad rows
Guard layers: ① code batch-uniformity check ② SQL CHECK constraint ③ moat cleanup endpoint
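The effect of the amount sanity constraint can be demonstrated with an in-memory SQLite table as a stand-in for Azure SQL. The two-column schema and the helper `try_insert` are hypothetical; only the constraint name and the bounds come from the Data guard line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE unified_dividends (
        ticker TEXT NOT NULL,
        amount REAL NOT NULL,
        CONSTRAINT CHK_unified_dividends_amount_sanity
            CHECK (amount > 0 AND amount <= 500)
    )
    """
)

def try_insert(ticker, amount):
    """Insert one row; return False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO unified_dividends VALUES (?, ?)", (ticker, amount)
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("T", 0.2775)
rejected_zero = try_insert("T", 0.0)
rejected_huge = try_insert("T", 750.0)

# Same scan the health check describes: zero or over-$500 amounts.
bad_rows = conn.execute(
    "SELECT COUNT(*) FROM unified_dividends WHERE amount <= 0 OR amount > 500"
).fetchone()[0]
```

Enforcing the bound in the database means a bad batch is rejected even when the code-level check is bypassed, which is presumably why the constraint sits behind the batch-uniformity check as a second layer.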
⚙️Architecture Enhancement Layer
Model Telemetry System
active
Cost-Aware Router (4-tier)
active
ML-Based Intent Classifier
active
Shared Cache (LRU in-mem)
active
Async DB Pool (ThreadPoolExecutor)
active
Parallel Source Fan-Out
active
RAG Retrieval Reranking
active
Circuit Breaker (ML API)
active
ML Health Monitor (30s interval)
active
🔑Active External Integrations
Azure OpenAI
Endpoint: htmltojson-parser-openai-a1a8.openai.azure.com
Deployment: HarveyGPT-5
Status: ✓ active
xAI Grok-4
Purpose: X.com monitoring, real-time social sentiment
Status: ✗ 429 rate-limited (ongoing; monthly credits exhausted)
Impact: Ensemble redistributes weight; X Dividend Training Agent degraded
Google Gemini 2.5 Pro
Purpose: Ensemble (31%) + market intel
Status: ✓ active
Anthropic Claude
Models: Sonnet 4 + Opus 4.5
Purpose: Ensemble + deep research
Status: ✓ active (httpx, no SDK)
Perplexity Sonar
Purpose: Ensemble (21%) + research training
Status: ✓ active
ElevenLabs TTS
Purpose: Audio generation for research
Key: ELEVENLABS_API_KEY ✓
Status: ✓ configured
FMP (Financial Modeling Prep)
Endpoints: 80+ API endpoints
Status: ✓ active
Helicone LLM Observability
Purpose: Token tracking, latency, cost
Status: ✓ active (proxy layer)
🔁GitHub Actions — ML Training Pipeline
Workflow: train-ml-models.yml · daily 2 AM UTC + manual dispatch
Models: 6 total (Dividend Growth, Cut Predictor, Anomaly Detection, ESG Scorer, Payout Rating, Portfolio Optimization)
No-DB models (always train): ① Growth Forecaster ⑤ Payout Rating ⑥ Portfolio Optimizer; use synthetic data when DB unreachable
DB-dependent models: ② Cut Predictor ③ Anomaly Detection ④ ESG Scorer; train when Azure SQL reachable
Upload: Azure Blob Storage → ml-models container · versioned archive per run
Status: ✓ Fixed; exit code 2 (pip cache / MSSQL install) + TRAINED=0 (wrong args) resolved
🩺VM Cron Health — LLM VM (20.81.210.213)
vm_training_cron.sh: ✓ execute permission fixed (was chmod -x, silent 30-min fail)
training_health_check.py: ✓ deployed; runs every 6h via cron, validates all training tables + unified_dividends integrity
Health checks: Row counts per training table · unified_dividends 787k+ · amount sanity · zero / over-$500 scan
📋Log Health (Current Session)
Dominant error source: Grok-4 HTTP 429 (96%+ of all ERROR events); ongoing monthly credit exhaustion
Other errors: DB timeouts, yfinance fallback, config misses (non-Grok)
Log file size: ~205 MB (accumulated; rotate monthly)
Live counts: See Live Logs tab → ERROR filter for real-time totals
Accuracy log: Batch INSERT active; 1 background thread per verify_all_claims() call (~200ms vs ~4.2s sequential)