7
Ensemble Voters
34
Training Agents
214
Service Modules
67
API Route Files
787k+
unified_dividends Rows
112
Scheduled Jobs
19.1k
Log Events (24h)
7
ML Models Loaded
Response Accuracy
Queries (24h)
Fix-It Pending
Auto-Fix Rate
Avg Confidence
5
Verification Sources
🟢
Grok / X API — Pay-Per-Use Active — Migrated to X API 2025 pay-per-use billing. Monthly credit exhaustion no longer possible. Ensemble weights restored. grok_x_dividend_service.py upgraded to v2.0: search_ticker_signals() for any ticker · XMCP Server hook ready · New MCP tool get_x_dividend_signals added (tool #19). Model: grok-3-fast
🟢
ML API Healthy — harvey-unified-ml v4.0.0 running on port 9001 with 7 models loaded. Harvey backend on port 8001 active. Claude Intelligence Layer v1.2.0 — 8 services registered.
🟢
Accuracy & Performance Upgrades Active — Accuracy log batch INSERT (1 background thread vs 24 sequential), DividendScreener stale-connection retry (pool recycle on OperationalError), Two-layer LLM Response Cache deployed.
🟢
DB Integrity Guard — 3-Layer Protection Active: CHK_unified_dividends_amount_sanity SQL CHECK constraint deployed (amount > 0 AND amount ≤ $500). Code-level batch-uniformity guard + moat cleanup endpoint. 787,787 clean rows · 0 over $500 · 0 zeroes.
🟢
VM Cron Health Monitoring Active: vm_training_cron.sh permissions fixed. training_health_check.py deployed on LLM VM — runs every 6h, validates all training tables + unified_dividends integrity. GitHub Actions ML training pipeline fixed: 6-model pipeline runs daily 2 AM UTC; models 1, 5 & 6 train on synthetic data when DB unreachable.
🟢
Cross-API Verification Layer Active: multi_source_verifier.py fires 5 parallel API calls (FMP · Finnhub · EODHD · Alpha Vantage · Harvey DB) on every ticker preflight. Median consensus for price, yield, annual dividend, payout ratio & last dividend is injected into the LLM system prompt before any response is generated. Confidence ratings: HIGH✅ ≥3 sources agree · MEDIUM⚡ 2 sources · LOW⚠️ single source. 5-minute TTL cache per ticker.
🏠Overview
🤖AI Models
🎓Training Agents
🧠ML & Neural
⚙️Services
🖥️Infrastructure
📋Live Logs
🧠Harvey Learned
💬Conversations
🗺️Architecture
🖥️System Vitals
Host: Azure VM — 20.81.210.213
OS: Ubuntu (miniconda3 llm env)
Uptime: ~24h (refreshed via deploy)
Load avg: 0.75 / 0.79 / 0.81
RAM: 2.8 GB used / 27 GB total
Disk: 148 GB / 248 GB (60%)
RAM usage: 10.4%
Disk usage: 60%
Active Processes
Process | Port | CPU | MEM | Status
Harvey Main API | 8001 | 7.3% | 5.3% | LIVE
Harvey ML API | 9001 | 0.9% | 1.2% | LIVE
Internal API (9000) | 9000 | 0.0% | 1.2% | IDLE
Gemini Scheduler | - | 0.0% | 0.0% | RUNNING
Service Watchdog | - | 0.0% | 0.0% | ACTIVE
Port 8000 (shadow) | 8000 | 189% | 0.6% | RESTARTING
📊Training Database — Row Counts
📦Disk Breakdown
ml_training/: 966 MB
logs/: 205 MB
app/: 24 MB
⚖️Dividend Safety Ensemble — 7 Weighted Voters
🟣 Google Gemini 2.5 Pro
Sustainability + fundamentals analyst
31%
active primary voter REST API
Evaluates FCF coverage, payout sustainability, dividend growth trajectory
🔵 Perplexity Sonar
Real-time data + news disclosures
21%
active sonar-pro web-grounded
Grounds analysis in recent earnings, news, SEC filings, and analyst reports
🟢 Harvey GPT-5
Azure OpenAI — complex reasoning
17%
active HarveyGPT-5 Azure hosted
Payout coverage analysis, DCF reasoning, complex multi-step financial logic
🔶 DeepSeek-R1
Quantitative — DCF & FCF math
13%
active quantitative chain-of-thought
Precise DCF, FCF payout calculations and quantitative stress-testing
🌐 GAN Scenarios
Dividend Neural Engine — stress testing
8%
active CNN Discriminator synthetic futures
Generates 100 synthetic dividend sequences, cut_probability drives this vote
📰 BERT News Sentiment
FinBERT — positive news signal
5%
active FinBERT sentiment layer
Scores news sentiment; positive coverage boosts safety score
🎭 Claude Sonnet 4
Anthropic — verification gate
5%
active Opus 4.5 for research hallucination check
Logical consistency check, hallucination detection, claim verification
🔌Other Active Models (Outside Ensemble)
⚡ Grok-4
xAI — real-time X.com monitoring
RATE LIMITED
Monthly credit limit exhausted (ongoing). Thousands of HTTP 429 errors logged. X.com dividend monitoring degraded. Ensemble auto-redistributes to Gemini + Perplexity. X Dividend Training Agent marked degraded.
⚡ Groq (Llama 3.3 70B)
Ultra-fast inference (~300ms)
active
Used for rapid lightweight queries. Async streaming enabled.
🏦 FinGPT
Finance-specialized open-source LLM
specialized
Domain-specific fine-tuned on financial corpus. Used for technical analysis queries.
📊 FinRobot
Multi-agent financial reasoning
specialized
Institutional-grade multi-agent financial analysis framework.
🔍 Claude Deep Research
Opus 4.5 — institutional reports
premium
5 report types: IoC, earnings, sector, dividend sustainability, risk. Opus 4.5 only.
📐 Financial Formula Engine
43 deterministic formulas + Claude explanations
active
Deterministic math layer prevents AI hallucination on quantitative calculations.
🔄Core Training Agents — Continuous
Deep Research Training Agent
Every 1 hour · 500 examples/hr · running
Investment Researcher Agent
Every 30 min · 100 questions · running
Perplexity Research Training
Every 1 hour · 30 Q&A pairs · running
HarvestEngine Training Agent
Every 2 hours · 160 pairs/batch · running
Investor Roundtable Training
Every 2 hours · ~9,600 pairs · running
FMP Comprehensive Training
Every 2 hours · 60 pairs (720/day) · running
Market Intelligence Training
Every 2 hours · 30 pairs · running
DeepSeek Quantitative Training
Every 2 hours · 20 pairs · running
Harvey Advisor Platform Training
Every 3 hours · 120 pairs/batch · running
NAV Avoidance Training Agent
Every 2 hours · 15 pairs (180/day) · running
X Dividend Training Agent
Every 2 hours · 30 questions · degraded (Grok 429)
Video Training Agent (@heydividedtv)
Every 6 hours · sync YouTube · running
Data Enrichment Service
Every 6 hours · AFFO, REIT, streaks · running
Harvey Score Pre-computation
Daily @ 2 AM · Top 500 tickers · scheduled
Dividend Frequency Fallback Sync
Every 1 hour · DB sync · running
Harvey v4.0 Standalone Training Agent
Every 2 hours · 15 variations/run · ~180/day · running
Auto Fine-Tuning Submission (Scheduler)
Sunday 1 AM UTC · poll 6h · gpt-4o-mini-2024-07-18 · scheduled
DIVIDEND_SCREENER Training v2 (86x fix)
Every 2 hours · 20 scenarios · 36 new templates · v2 — SCREENER_LIST fix
Chat History Conversation Training
On conversation end · DB-persisted turns · running
📈Trading Intelligence — 13 Specialist Agents (Every 2 Hours)
Investment Thesis Generator
20 pairs/batch • 240/day
active
Technical Analysis Agent
15 pairs/batch • 180/day
active
Sentiment Aggregator
15 pairs/batch • 180/day
active
Crypto Market Agent
15 pairs/batch • 180/day
active
ETF Composition Analyzer
15 pairs/batch • 180/day
active
Volatility Risk Scorer
15 pairs/batch • 180/day
active
Macro Regime Detector
15 pairs/batch • 180/day
active
Earnings Surprise Predictor
15 pairs/batch • 180/day
active
Rotation Strategy Agent
15 pairs/batch • 180/day
active
Multi-Agent Trading Desk
15 pairs/batch • 180/day
active
Options Strategy Analyzer
15 pairs/batch • 180/day
active
Fund Flow Analyzer
15 pairs/batch • 180/day
active
Crypto Correlation Monitor
15 pairs/batch • 180/day
active
⏱️Cron Schedule (VM)
Schedule | Job | Log
*/30 * * * * | vm_training_cron.sh — comprehensive training batch | /var/log/harvey/comprehensive_training.log
15 6,18 * * * | Dividend Intelligence 8-model training | dividend_intelligence_cron.log
0 4,16 * * * | Investor Profile Trainer | investor_profile_training.log
0 5,17 * * * | Portfolio Blueprint Trainer (15 questions) | portfolio_blueprint_training.log
0 1 * * 0 | Auto Fine-Tuning Submission (Sunday 1 AM UTC) + 6h status polls | scheduler_service.py (in-process)
0 2 * * * | Harvey Sanity Check v3 (email alert) | sanity_cron.log
0 * * * * | PII Monitor — production log scan | pii_monitor.log
📉Projected Training Volume
Agent | Rate | Per Day | Per Week
Deep Research Training | 500/hr | 12,000 | 84,000
Investment Researcher | 100/30min | 4,800 | 33,600
Investor Roundtable | ~9,600/2hr | 4,800 | 33,600
FMP Comprehensive | 60/2hr | 720 | 5,040
Investment Strategy | 15/2hr | 180 | 1,260
13 Trading Agents | 15/2hr each | 2,340 | 16,380
DI 8-Model (daily incr.) | 20/model×8 | 160 | 960
Harvey v4.0 Training Agent | 15/2hr | 180 | 1,260
Total | | ~25,180 | ~176,100
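The fixed-interval rows of the projection above (500/hr, 100/30min, 60/2hr, 15/2hr) appear to follow simple batch-times-runs arithmetic. The sketch below reproduces those rows; the function names are illustrative and not taken from the platform code, and rows quoted as approximate (~) or per-model are not expected to follow this rule.

```python
MINUTES_PER_DAY = 24 * 60


def per_day(batch_size: int, interval_minutes: int) -> int:
    """Projected daily volume for an agent that emits `batch_size` items per run."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return batch_size * runs_per_day


def per_week(batch_size: int, interval_minutes: int) -> int:
    """Projected weekly volume: seven days at the daily rate."""
    return 7 * per_day(batch_size, interval_minutes)


if __name__ == "__main__":
    print(per_day(500, 60))        # Deep Research Training
    print(per_day(100, 30))        # Investment Researcher
    print(13 * per_day(15, 120))   # 13 trading agents at 15 pairs per 2 hours
```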
🧬Dividend Neural Engine — 7 Modules (Boris Banushev Framework)

🌊 Fourier Denoiser

FFT-based denoising of dividend history. Detects special dividends as 2σ outliers. Isolates true trend vs noise.

active 215 lines
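As a rough illustration of the 2σ outlier rule mentioned for the Fourier Denoiser, the sketch below flags payments that sit more than two standard deviations from the mean of the series. It covers only the outlier step: the FFT denoising is omitted, applying the rule to raw amounts (rather than to a denoised residual) is an assumption, and `special_dividend_indices` is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def special_dividend_indices(amounts: Sequence[float], n_sigma: float = 2.0) -> List[int]:
    """Return positions of payments more than `n_sigma` std devs from the mean."""
    if len(amounts) < 2:
        return []
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        # A perfectly flat series has no outliers.
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) > n_sigma * sigma]


if __name__ == "__main__":
    history = [0.50] * 11 + [2.00]  # eleven regular payments, one special
    print(special_dividend_indices(history))
```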

📊 ARIMA Feature

Auto-selects best (p,d,q) via AIC. Forecasts next dividend as ML input feature. Handles short series gracefully.

active 212 lines

🔲 Stacked Autoencoder

PyTorch 3-layer encoder/decoder. Compresses 12 dividend features → 16-dim latent vector for LSTM input.

PyTorch 473 lines

🔮 LSTM Predictor

2-layer LSTM + attention. MC Dropout (50 passes) for confidence intervals. Predicts next dividend amount.

PyTorch 445 lines

🎲 GAN Engine

LSTM Generator + CNN Discriminator. Generates 100 synthetic dividend futures. cut_probability → 8% ensemble vote.

ensemble voter 432 lines

🗺️ SOM Anomaly Detector

10×10 Self-Organizing Map. Detects unusual payout/FCF trajectories. Returns similar tickers by BMU proximity.

minisom 487 lines

🔗 Eigen / Contagion

PCA on dividend growth matrix. Dividend contagion risk: if sector peer cuts, who follows? 24h TTL cache.

sklearn PCA 256 lines
🏭Harvey Unified ML API — Port 9001
Status: ✓ healthy
Service: harvey-unified-ml
Version: 4.0.0
Models loaded: 7
Host: 0.0.0.0:9001 (public) + nginx proxy
Endpoints: /score/symbol · /predict/yield · /predict/cut-risk · /predict/payout-rating · /health
CPU usage: 0.9%
Memory: 1.2% of 27 GB ≈ 330 MB
⚖️Ensemble Weight Distribution
Gemini 2.5 Pro: 31%
Perplexity Sonar: 21%
Harvey GPT-5 (Azure OAI): 17%
DeepSeek-R1: 13%
GAN Scenarios (Neural): 8%
BERT News Sentiment: 5%
Claude Sonnet 4 (Verification): 5%
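The weights above sum to 100%, and the dashboard says the ensemble redistributes weight when a voter fails. A minimal sketch of one way to do that follows. It assumes each voter returns a safety score in [0, 100] and that a failed voter's weight is shared proportionally among the remaining voters; the voter keys and the `ensemble_score` function are hypothetical, not the platform's actual code.

```python
from typing import Dict, Optional

# Weights as listed in the distribution panel.
WEIGHTS: Dict[str, float] = {
    "gemini": 0.31,
    "perplexity": 0.21,
    "harvey_gpt5": 0.17,
    "deepseek_r1": 0.13,
    "gan_scenarios": 0.08,
    "bert_sentiment": 0.05,
    "claude_sonnet": 0.05,
}


def ensemble_score(votes: Dict[str, Optional[float]]) -> float:
    """Weighted mean over the voters that responded.

    A vote of None marks a failed voter. Dividing by the surviving weight
    renormalizes the rest (assumed redistribution rule).
    """
    live = {name: s for name, s in votes.items() if s is not None and name in WEIGHTS}
    total_weight = sum(WEIGHTS[name] for name in live)
    if total_weight == 0:
        raise ValueError("no voters responded")
    return sum(WEIGHTS[name] * score for name, score in live.items()) / total_weight


if __name__ == "__main__":
    votes = {name: 80.0 for name in WEIGHTS}
    votes["gemini"] = None  # simulate a failed voter
    print(round(ensemble_score(votes), 2))
```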
🛡️ML-Powered Moat Features
FeatureExtractionService
Extracts 12 proprietary dividend features from DB + market data for ML input pipeline.
active
DividendQualityScorer
ML scoring of dividend quality across 6 dimensions. sklearn-based (offline on LLM VM).
sklearn missing (dev)
CutRiskAnalyzer
XGBoost dividend cut risk model. Rule-based fallback active when model unavailable.
xgboost missing (dev)
Harvey Score Service
Pre-computed scores for top 500 tickers daily at 2 AM. Composite dividend health score.
active
NAV Avoidance ML Screen
6 warning signals, erosion trend, distribution trap detection across 50+ securities.
active
Claim Verification Gate
Mandatory fact-check before AI response delivery. Cross-references ground truth tables. Upgraded with apply_multi_source_consensus() — cross-checks LLM output against median consensus, upgrades to VERIFIED within tolerance, marks CONFLICT on deviation.
active
🔬Cross-API Verification Layer — v1.0
multi_source_verifier.py active 804 lines 5-min TTL cache
Fires 5 parallel API calls on every ticker preflight. Computes median consensus across all responding sources. Injects a VerificationResult.context_block() into the LLM system prompt before any response is generated — ensuring Harvey always reasons from ground-truth consensus rather than stale or hallucinated figures.
Data Sources
FMP (Financial Modeling Prep)
Finnhub
EODHD
Alpha Vantage
Harvey DB (unified_dividends)
Verified Metrics
📈 Price
💰 Dividend Yield
📅 Annual Dividend
⚖️ Payout Ratio
🕐 Last Dividend
Discrepancy Thresholds
Price:         2%
Dividend:     5%
Yield:        15%
Payout ratio: 20%
Confidence Levels
HIGH ✅  ≥3 sources agree
MEDIUM ⚡ 2 sources agree
LOW ⚠️   single source

Conflicts flagged & surfaced to LLM
Pipeline flow: ticker preflight → parallel fetch (asyncio.gather) → median consensus → context_block() → LLM system prompt injection → response generation → apply_multi_source_consensus() post-check → VERIFIED / CONFLICT label appended
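The consensus step of this pipeline can be sketched as follows. This is an illustration under assumptions, not the contents of multi_source_verifier.py: it treats a source as "agreeing" when its reading lies within the metric's discrepancy threshold of the median, and the function name, source keys, and return shape are hypothetical.

```python
from statistics import median
from typing import Dict, Optional

# Relative discrepancy thresholds from the panel above.
THRESHOLDS = {"price": 0.02, "dividend": 0.05, "yield": 0.15, "payout_ratio": 0.20}


def consensus(metric: str, readings: Dict[str, Optional[float]]) -> dict:
    """Median consensus for one metric across the sources that responded."""
    values = {src: v for src, v in readings.items() if v is not None}
    if not values:
        raise ValueError("no source responded")
    mid = median(values.values())
    tolerance = THRESHOLDS[metric] * abs(mid)
    agreeing = [src for src, v in values.items() if abs(v - mid) <= tolerance]
    conflicts = sorted(set(values) - set(agreeing))
    if len(agreeing) >= 3:
        confidence = "HIGH"
    elif len(agreeing) == 2:
        confidence = "MEDIUM"
    else:
        confidence = "LOW"
    return {"value": mid, "confidence": confidence, "conflicts": conflicts}


if __name__ == "__main__":
    print(consensus("price", {
        "fmp": 100.0, "finnhub": 100.5, "eodhd": 101.0,
        "alpha_vantage": 110.0, "harvey_db": None,
    }))
```

Using the median rather than the mean keeps a single bad feed (the 110.0 reading here) from dragging the consensus value, which is presumably why the verifier is described as median-based.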
📐Quantium Library Stack — 20+ Libraries (Dividend Intelligence Pipeline)

edgartools

SEC EDGAR access — 8-K dividend declarations, 10-K XBRL cash flows, Item 5 policy text extraction

Stage 1

exchange-calendars

NYSE/NASDAQ/LSE trading day validation for ex-date confirmation and growth streak counting

Stage 2

pandas-market-calendars

Market-aware date arithmetic — fiscal year boundary detection, pay-gap widening (liquidity stress signal)

Stage 2

arch (GARCH)

GARCH(1,1) volatility model on dividend yield series. Conditional variance = cut risk. Persistence α+β.

Stage 3

pmdarima

Auto-ARIMA with AIC/BIC order search. Dividend payment forecasting with confidence intervals.

Stage 4

prophet

Facebook Prophet with quarterly seasonality decomposition. 2nd member of 3-model ensemble for 4-quarter forecast.

Stage 4

scikit-learn Ridge

Ridge regression (L2, α=1.0) with lag-4 windowed features and StandardScaler. 3rd ensemble member. Dynamic inverse-MAE weighting. CI: ±1.28×val_MAE (80%).

Stage 4 · inverse-MAE weights

tsfresh

EfficientFCParameters — 800+ statistical features from payment time series. 20-feature curated subset.

Stage 5

empyrical-reloaded

Income Sharpe, income Sortino, max income drawdown. Treats dividends as the "returns" series.

Stage 6

ffn

Financial functions — yield-on-cost CAGR, income consistency score, drawdown analytics.

Stage 6

PyPortfolioOpt

Max-yield portfolio optimization where yield replaces expected returns in the efficient frontier.

Stage 7

Riskfolio-Lib

HRP (Hierarchical Risk Parity) across dividend growth correlations for robust position sizing.

Stage 7

skfolio

Portfolio optimization toolkit — additional constraint handling for income-focused allocation problems.

Stage 7

alphalens-reloaded

Factor IC / ICIR back-testing of Harvey Safety Score, yield screen, DGR screen as alpha signals.

Stage 8

ta · finta

Technical analysis indicators applied to dividend yield time series for regime context signals.

supporting

financedatabase

Sector/industry classification database — sector peer grouping for contagion and factor analysis.

supporting

pypme · rateslib

Public Market Equivalent benchmarking and rates/bond analytics for yield spread context.

supporting
🔧Core Intelligence Services
Intelligent Query Service
19 intent types · multi-model fallback chain · learning loop
Hallucination Prevention
Ticker Preflight Gate · Schema fingerprint · Claim verification
Dividend Safety Ensemble
7-voter weighted system · VERY_SAFE → CRITICAL scale
NAV Erosion Service
50 securities · 6 warning signals · distribution trap detection
Chart Generator Service
6 chart types · matplotlib · server-side PNG generation
PDF Research Report Gen
4 report types · ReportLab · institutional-grade output
SEC Filing RAG Service
EDGAR retrieval · vector search · 10-K/Q synthesis
Stock Card Service
Data-driven per-ticker summaries · ML integration · 30+ stopwords
🌐Real-Time Data Services
FMP Integration (80+ endpoints)
Financial Modeling Prep · market data · fundamentals · estimates
Finnhub Service
Real-time quotes · news · earnings · analyst ratings
X.com Monitoring
Grok agent tools · circuit breaker active · Grok 429 degraded
FRED Economic Data
Federal Reserve economic indicators · interest rate feeds
Unified Dividends Pipeline
Multi-source dividend data · FMP + Finnhub + EDGAR → Azure SQL
Multi-Source Backfill
Auto data discovery · gap filling · Backfilled_Dividends/Prices tables
🚀v4.2 Feature Layer
Semantic Vector Memory
Persistent user conversation memory with semantic retrieval
ElevenLabs TTS
Text-to-speech audio generation for research summaries
Agentic Tool-Calling Loop
Multi-step tool orchestration with self-correction
Code Interpreter (Sandboxed)
Safe Python execution for user-defined financial calculations
SSE Token Streaming
Real-time token-by-token response streaming via Server-Sent Events
WebSocket Alert Push
Real-time dividend alert delivery via persistent WebSocket
User Profile Injection
Personalized context from persistent user profiles into LLM prompts
Auto Fine-Tuning Pipeline
Fully automated: Sunday 1 AM UTC · 3 sources (50/35/15%) · min 200 pairs · gpt-4o-mini-2024-07-18 · 6h polls · zero manual steps
Harvey v4.0 Training Agent
29 seed pairs across 6 categories · 15 LLM variations per run (every 2h) · ~180 pairs/day into InvestmentResearcherTraining (35% fine-tune weight)
🆕Recent Platform Upgrades
Chat History API
Full conversation persistence to Azure SQL. 4 endpoints: GET /api/v1/chat/conversations (list recent), GET /conversations/{id}/messages (restore thread), DELETE /conversations/{id}, POST /conversations/new. Streams persist via _history_save_wrapper. User/conversation IDs via x-user-id/x-conversation-id headers.
active · chat_history_routes.py · 4 endpoints
Analysis Depth Overhaul v4.1
Every single-ticker dividend card now calls Claude Sonnet for 5 mandatory analysis sections: Dividend Sustainability (VERY SAFE/SAFE/WATCH/AT RISK/DANGER), Growth Trajectory, Yield in Context (vs sector/S&P/T-bill/peers), Business Quality, and Harvey's Verdict (BUY/HOLD/WATCH/AVOID). max_tokens raised to 2800. Banned-phrase enforcement. prefer_groq=False.
Claude Sonnet 4 · ai_sdk_routes.py · 5 sections · 2800 tokens
Two-Layer LLM Response Cache
Semantic response cache with in-memory dictionary (L1) + Azure SQL persistent table (L2). Reduces repeated-query token cost by 30–50%. Semantic similarity matching prevents stale cache misses. LRU eviction on L1 overflow.
active · L1 in-memory · L2 Azure SQL
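A minimal sketch of the two-layer lookup described above: an in-memory L1 with LRU eviction in front of a persistent L2. It makes several simplifying assumptions: keys are matched exactly (the semantic-similarity matching is out of scope here), a plain dict stands in for the Azure SQL table, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoLayerCache:
    """L1: bounded in-memory LRU. L2: persistent store (a dict stand-in here)."""

    def __init__(self, l1_capacity: int, l2_store: Optional[Dict[str, str]] = None):
        self.l1: "OrderedDict[str, str]" = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2: Dict[str, str] = l2_store if l2_store is not None else {}

    def get(self, key: str) -> Optional[str]:
        if key in self.l1:
            self.l1.move_to_end(key)      # mark as most recently used
            return self.l1[key]
        if key in self.l2:
            value = self.l2[key]
            self._put_l1(key, value)      # promote an L2 hit into L1
            return value
        return None

    def put(self, key: str, value: str) -> None:
        self.l2[key] = value              # write through to the persistent layer
        self._put_l1(key, value)

    def _put_l1(self, key: str, value: str) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        while len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)   # evict the least recently used entry


if __name__ == "__main__":
    cache = TwoLayerCache(l1_capacity=2)
    for q in ("a", "b", "c"):
        cache.put(q, f"answer-{q}")
    print(cache.get("a"), list(cache.l1))
```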
99.999% Accuracy Enforcement (v1.0)
Multi-layer accuracy framework: middleware interception, claim verification gate, stale data refresh trigger, multi-source cross-validation. Batch INSERT logging (1 background thread replaces 24 sequential DB calls — ~4.2s → 200ms). Harvey accuracy log table auto-ensured once per process.
active · claim_verification_gate.py · batch INSERT
Broker Connect via Plaid (REST + MCP)
5 REST endpoints at /api/plaid/* plus 5 MCP tools. 3-step flow: create link token → connect account → sync portfolio. CUSIP (100%), ISIN (95%), ticker (70%) institutional-grade security matching. Dividend-only filtering, portfolio upsert, ownership-verified disconnect.
active · /api/plaid/* · PLAID_CLIENT_ID + PLAID_SECRET
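One plausible reading of the CUSIP / ISIN / ticker percentages is a priority-ordered match with a confidence attached to each identifier type. The sketch below illustrates that reading only; the lookup structure, the `match_security` function, and the placeholder identifiers are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Identifier types in priority order, with the confidence quoted above.
MATCH_ORDER = [("cusip", 1.00), ("isin", 0.95), ("ticker", 0.70)]


def match_security(
    holding: Dict[str, str],
    index: Dict[str, Dict[str, str]],
) -> Optional[Tuple[str, float, str]]:
    """Return (security_id, confidence, matched_field), or None if nothing matches."""
    for field, confidence in MATCH_ORDER:
        value = holding.get(field)
        if value and value in index.get(field, {}):
            return index[field][value], confidence, field
    return None


if __name__ == "__main__":
    # Placeholder identifiers, not real CUSIP/ISIN values.
    index = {
        "cusip": {"CUSIP-DEMO-1": "SEC-1"},
        "isin": {"ISIN-DEMO-1": "SEC-1"},
        "ticker": {"DEMO": "SEC-1"},
    }
    print(match_security({"isin": "ISIN-DEMO-1", "ticker": "DEMO"}, index))
```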
DividendScreener Stale-Connection Retry
2-attempt retry loop in request_handler.py. On OperationalError at attempt 0 calls engine.dispose() to recycle the pymssql connection pool, then retries. Graceful fallback to LLM narrative if second attempt also fails. Eliminates "connection closed" errors after DB idle.
active · request_handler.py:637 · pool recycle
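The retry behaviour described above can be sketched as a small helper. This is not the code at request_handler.py:637; `OperationalError` and `FakeEngine` are local stand-ins so the example runs without a database, and `run_with_pool_recycle` is a hypothetical name. The only call borrowed from the description is `engine.dispose()`.

```python
from typing import Any, Callable


class OperationalError(Exception):
    """Stand-in for the database driver's OperationalError."""


class FakeEngine:
    """Stand-in for an engine object exposing dispose(); counts recycles."""

    def __init__(self) -> None:
        self.dispose_calls = 0

    def dispose(self) -> None:
        self.dispose_calls += 1


def run_with_pool_recycle(
    query_fn: Callable[[], Any],
    engine: Any,
    fallback_fn: Callable[[], Any],
    attempts: int = 2,
) -> Any:
    """Run query_fn; on the first OperationalError recycle the pool and retry once."""
    for attempt in range(attempts):
        try:
            return query_fn()
        except OperationalError:
            if attempt == 0:
                engine.dispose()  # drop stale connections before the retry
    # Every attempt failed: degrade gracefully (e.g. to an LLM narrative).
    return fallback_fn()


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query():
        state["calls"] += 1
        if state["calls"] == 1:
            raise OperationalError("connection closed")
        return ["row-1", "row-2"]

    engine = FakeEngine()
    print(run_with_pool_recycle(flaky_query, engine, lambda: "narrative fallback"))
```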
Fix-It Agent v2.1
Hourly scan of conversation logs for failed responses. 19 fix strategies (7 new in v2.1: TAX_EFFICIENCY, ETF_ANALYSIS, DRIP_COMPOUNDING, RETIREMENT_INCOME, COMPARISON_ANALYSIS, POSITION_SIZING, DGI_STRATEGY). 45 bad-response detection patterns (12 new: knowledge-cutoff deflections, clarification-fishing, tool-error leakage, hedging, truncated/disclaimer-only responses, AI identity deflections, info-begging, accuracy disclaimers, hallucination confidence hedges, stall phrases).
active · 19 strategies · 45 patterns · v2.1
DIVIDEND_SCREENER Training Data v2
86x failure-pattern fix for SCREENER_LIST intent. dividend_intelligence_training.py: 12→20 scenarios, 8 new query pools (PRICE_CONSTRAINED, FCF_COVERAGE, TICKER_SEEDED, RECOVERY_DIVIDEND, NEGATIVE_SCREEN, INCOME_TARGET_REVERSE, COMPARATIVE_BETTER, VAGUE_INTENT). comprehensive_investment_trainer.py: +36 templates across 6 gap-pattern categories. LLM override prompt for screener intents — requires concrete ticker list with yield, safety rating, and rationale.
active · 20 scenarios · SCREENER_LIST · 86x fix
🧠Harvey v4.0 Standalone Capabilities
General Knowledge Handler
Detects educational/definitional queries ("what is inflation", "explain DCF") via prefix matching + capability triggers. Routes before ticker extraction — Harvey answers as CIO-level financial educator.
active · harvey_persona.py
Capability Registry
34 structured capabilities across 8 categories: Dividend Analysis, Portfolio Strategy, ETF Intelligence, Equity Research, Market/Macro, Screeners, Education, Advanced Tools. Exposed via GET /api/v1/harvey/capabilities.
active · harvey_capabilities.py
Agentic Reflect Loop
Fires after CHAT PATH LLM stream completes — checks if critical content is missing and appends a supplement. Only activates for non-low-latency queries with parsed tickers. Capped at 3s via SIGALRM.
active · _reflect_and_supplement()
Graceful Degradation
50 pre-built Q&A answers for common finance topics. Activates on full provider failure. Health exposed via GET /api/v1/harvey/health — 6 provider statuses (Azure OAI, Claude, Gemini, OpenAI, Groq, graceful_degradation).
active · 50 answers · graceful_degradation.py
Unified Persona Contract
Single source of truth for Harvey's CIO/Chief Strategist identity. get_system_prompt(mode="full"|"compact"|"fast") assembles right persona for each call site. Imported by llm_providers.py and intelligent_query_service.py.
active · harvey_persona.py
📊StructuredKPIService — Perplexity Finance-Style Breakouts
Auto-appended to every single-ticker CHAT PATH response. Auto-detects security type via symbol sets + FMP profile. Fetches data in parallel via ThreadPoolExecutor (5s hard cap). Renders clean markdown panels.
BLOCK TYPE — stock
Business segments, valuation ratios, P&L summary for general equities.
BLOCK TYPE — dividend_stock
Yield, FCF payout, ML cut risk, 7-model safety score, growth streak years.
BLOCK TYPE — reit
FFO proxy, AFFO payout ratio, debt/equity, ML cut risk score.
BLOCK TYPE — etf
Top 10 holdings, sector allocation, expense ratio, AUM.
BLOCK TYPE — dividend_etf
Weighted yield, NAV erosion flag, distribution schedule.
BLOCK TYPE — bank
Net Interest Margin, ROE, ROA, efficiency ratio, P/Book.
BLOCK TYPE — mlp
DCF coverage ratio, distribution yield, debt/EBITDA, commodity exposure.
🏗️HarvestEngine Platform — 8 Modules
Backtesting Engine
6 strategies · DRIP, Growth, High-Yield, Capture, Aristocrat, Covered Call
Portfolio Optimizer
Dividend income optimization · Markowitz-inspired allocation
Dividend Risk Analyzer
NAV erosion, cut risk, FCF coverage, sector contagion
Income Impact Simulator
Dividend income impact projection with DRIP compounding
Dividend Calendar
Ex-date tracking · declaration date patterns · 25 securities
NAV Avoidance Screener
50 securities · 20 profiles · 6 warning signals
Multi-Model Safety Ensemble
Orchestration layer · 7 voters · weight redistribution on failure
HarvestEngine Continuous Training
Expert Q&A pair generation · 13,120 rows in DB · ThreadPoolExecutor
🧪Dividend Intelligence Pipeline — 8 Services (Quantium Library Stack)
Full ML pipeline orchestrated by dividend_intelligence_pipeline.py — all 8 stages confirmed available: true. Returns unified DividendIntelligenceReport with Harvey composite score & plain-language verdict.
STAGE 1 — EDGAR Dividend Service
edgar_dividend_service.py
edgartools pulls 8-K declared dividends + 10-K XBRL cash flow + Item 5 policy text. Policy type classifier (consistent / growth / variable / suspended).
edgartools · SEC EDGAR · XBRL
STAGE 2 — Trading Calendar Service
trading_calendar_service.py
exchange_calendars validates ex-dates, counts consecutive growth years respecting fiscal years, detects pay-gap widening (liquidity stress), estimates next ex-date.
exchange-calendars · pandas-market-calendars
STAGE 3 — Yield Volatility Service
yield_volatility_service.py
ARCH/GARCH(1,1) fit on dividend yield time series → conditional variance = cut risk signal. 0–100 score, regime (stable / elevated / crisis), persistence alpha+beta.
arch/GARCH · cut risk 0–100
STAGE 4 — Dividend Forecast Service
dividend_forecast_service.py
Three-model ensemble: Auto-ARIMA (pmdarima) + Facebook Prophet + Ridge regression (L2, α=1.0, lag-4 windowed features, StandardScaler). Dynamic inverse-MAE weighting — weight ∝ 1/val_MAE, equal-weight fallback if <2 valid. Next 4 payment forecasts with 80% CI. Predictability score 0–100.
pmdarima · prophet · Ridge · 4-quarter CI · inverse-MAE weights
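The weighting rule for this stage (weight ∝ 1/val_MAE, with an equal-weight fallback when fewer than two models are valid) and the 80% interval (±1.28 × val_MAE) can be sketched as below. The function names are hypothetical, and treating a missing, non-finite, or non-positive MAE as "invalid" is an assumption about what the service checks.

```python
import math
from typing import Dict, Optional, Tuple


def inverse_mae_weights(val_mae: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Ensemble weights proportional to 1 / validation MAE."""
    if not val_mae:
        raise ValueError("no models supplied")
    valid = {
        model: err
        for model, err in val_mae.items()
        if err is not None and math.isfinite(err) and err > 0
    }
    if len(valid) < 2:
        # Equal-weight fallback across all ensemble members.
        return {model: 1.0 / len(val_mae) for model in val_mae}
    inverse = {model: 1.0 / err for model, err in valid.items()}
    total = sum(inverse.values())
    # Models without a usable MAE get zero weight.
    return {model: inverse.get(model, 0.0) / total for model in val_mae}


def interval_80(point_forecast: float, val_mae: float) -> Tuple[float, float]:
    """80% interval as point forecast ± 1.28 × validation MAE."""
    half_width = 1.28 * val_mae
    return point_forecast - half_width, point_forecast + half_width


if __name__ == "__main__":
    print(inverse_mae_weights({"arima": 0.02, "prophet": 0.04, "ridge": 0.04}))
    print(interval_80(0.50, 0.02))
```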
STAGE 5 — Dividend Feature Extractor
dividend_feature_extractor.py
tsfresh EfficientFCParameters extracts 800+ features from payment time series. Curated 20-feature subset + stability score 0–100. Powers the tsfresh Cut-Risk Classifier: GradientBoostingClassifier (n_estimators=200, max_depth=4, lr=0.05, subsample=0.8) — upgraded from RandomForest. Time-ordered train/test split with 10% embargo gap prevents look-ahead bias. Returns probability + top-5 feature drivers with direction signals. Monthly auto-retraining.
tsfresh · 800+ features · stability 0–100 · GradientBoosting · monthly retrain
STAGE 6 — Income Analytics Service
income_analytics_service.py
empyrical-reloaded + ffn compute income Sharpe, income Sortino, yield-on-cost CAGR, max income drawdown, income consistency score 0–100.
empyrical · ffn · YOC CAGR
STAGE 7 — Portfolio Income Optimizer
portfolio_income_optimizer.py
PyPortfolioOpt max-yield optimization (yield replaces expected returns), HRP across dividend growth correlations, Kelly position sizing (win_prob = 1 − cut_prob).
PyPortfolioOpt · Riskfolio-Lib · HRP · Kelly
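For the Kelly sizing step, the standard Kelly formula is f* = p − (1 − p)/b, with p the win probability and b the payoff ratio. The sketch below plugs in win_prob = 1 − cut_prob as stated above; the payoff ratio default, the floor at zero, and the `kelly_fraction` name are assumptions, since the dashboard does not say how the optimizer sets them.

```python
def kelly_fraction(cut_prob: float, payoff_ratio: float = 1.0) -> float:
    """Kelly position fraction with win_prob = 1 - cut_prob.

    payoff_ratio is the gain-to-loss ratio b in f* = p - (1 - p) / b.
    Negative results are floored at zero (no position).
    """
    if not 0.0 <= cut_prob <= 1.0:
        raise ValueError("cut_prob must be in [0, 1]")
    if payoff_ratio <= 0:
        raise ValueError("payoff_ratio must be positive")
    win_prob = 1.0 - cut_prob
    fraction = win_prob - (1.0 - win_prob) / payoff_ratio
    return max(0.0, fraction)


if __name__ == "__main__":
    for cut_prob in (0.05, 0.25, 0.60):
        print(cut_prob, round(kelly_fraction(cut_prob), 3))
```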
STAGE 8 — Dividend Factor Analyzer
dividend_factor_analyzer.py
alphalens-reloaded back-tests Harvey Safety Score, yield screen, DGR screen as alpha factors. Returns IC / ICIR metrics proving or refuting each screen's predictive power.
alphalens · IC · ICIR · alpha factor
🔗MCP Server v3.1 — 23 Financial Intelligence Tools · 5 Capability Groups
OAuth 2.1 one-click connection · stdio (Claude Desktop) + HTTP/SSE at /api/mcp/sse · MCPGuard prompt injection + rate limiting + SHA-256 audit log
📡Group 1 — Data (7 tools)
get_dividend_history
Full dividend payment history for any ticker from Azure SQL
data · 60/min
get_stock_price
Real-time price + yield from FMP integration
data · 60/min
get_dividend_calendar
Upcoming ex-dates, pay dates, declaration dates
data · 60/min
get_company_fundamentals
FCF, payout ratio, debt-to-equity, sector from DB views
data · 60/min
get_sec_filings
SEC EDGAR 10-K/10-Q/8-K structured extraction with 7-day cache
data · 30/min
get_earnings_transcript
FMP earnings call transcripts — dividend guidance + guidance language extraction
data · 30/min
get_market_data
Macro indicators, sector ETF flows, FRED interest rate data
data · 60/min
🔍Group 2 — Screening (3 tools)
screen_dividends
Filter securities by yield, DGR, payout ratio, streak, sector
screening · 60/min
screen_dividend_aristocrats
Filter by consecutive growth years — Aristocrats (25yr+), Kings (50yr+), Champions
screening · 60/min
screen_nav_safe_etfs
NAV-erosion-free ETF screening — 6 warning signals, 50 securities tracked
screening · 30/min
📊Group 3 — Analytics (4 tools)
analyze_dividend_safety
Full 7-voter ensemble safety score: VERY SAFE / SAFE / WATCH / AT RISK / DANGER
analytics · 20/min
compare_dividends
Side-by-side multi-ticker dividend comparison matrix with safety ratings
analytics · 60/min
forecast_dividend
Three-model ensemble forecast (ARIMA + Prophet + Ridge) — next 4 quarters with 80% CI
analytics · 20/min
optimize_portfolio
HarvestEngine max-yield portfolio optimization with HRP position sizing
analytics · 10/min
🧠Group 4 — Intelligence (4 tools)
ask_harvey
Natural language query to Harvey's full AI pipeline. Prompt injection & manipulation detection via MCPGuard.
intelligence · 10/min · 50/hr · guarded
generate_research_report
Claude Opus 4.5 deep research — IoC, earnings, sector, dividend sustainability, risk
intelligence · 5/min
get_harvey_capabilities
List Harvey's 34 capabilities across 8 categories for agent discovery
intelligence · no limit
get_durability_graph
Composite durability score with 6 explainable sub-scores, stress scenarios, historical trends
intelligence · 20/min
🏦Group 5 — Brokerage / Plaid Connect (5 tools)
create_brokerage_link_token
Step 1: Generate Plaid Link token for OAuth brokerage connection flow
brokerage · 30/min
connect_brokerage_account
Step 2: Exchange public token → access token; persist to DB. CUSIP/ISIN/ticker matching.
brokerage · 30/min
sync_brokerage_portfolio
Step 3: Pull holdings → enrich with dividend data → upsert portfolio. Dividend-only filter.
brokerage · 10/min
get_brokerage_portfolio
Retrieve synced portfolio positions with Harvey safety scores and income projections
brokerage · 60/min
disconnect_brokerage
Revoke Plaid access token + purge portfolio data. Ownership-verified delete.
brokerage · 10/min
Version: MCP Server v3.1 · MCP SDK v1.26.0
Transports: stdio (Claude Desktop) + HTTP/SSE at /api/mcp/sse
Auth: OAuth 2.1 one-click connection · API key for REST consumers
Security: MCPGuard — prompt injection (16 override + 7 manipulation patterns) + sliding-window rate limiter + SHA-256 audit log → dbo.mcp_audit_log
Brokerage matching: CUSIP 100% · ISIN 95% · Ticker 70% — institutional-grade security resolution
Health: /api/mcp/health · /api/mcp/tools
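The per-tool limits listed above (for example 60/min, or 10/min for ask_harvey) are enforced by what the dashboard calls a sliding-window rate limiter. A minimal sketch of that technique follows; it is not MCPGuard's implementation, and the class name, key format, and injectable clock are assumptions made so the example is testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per key within any `window_seconds` span."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.calls: Dict[str, Deque[float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        timestamps = self.calls.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=10, window_seconds=60)
    results = [limiter.allow("client-1:ask_harvey") for _ in range(12)]
    print(results.count(True), results.count(False))
```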
🤖Claude Intelligence Layer v1.2.0 — 8 Services
ClaudeClient (Async HTTP Core)
httpx async wrapper for Anthropic API — sonnet/opus/haiku. Lazy import, fails gracefully if ANTHROPIC_API_KEY absent. No SDK dependency. Shared across all 7 other Claude services.
Sonnet 4 / Opus 4.5 · httpx · no SDK
Premium Query Router
39-signal frozenset routes complex queries to Claude Sonnet 4 with 6,000-token budget. Triggers: initiation of coverage, passive income plan, retirement portfolio, deep dive, investment thesis. OAI fallback on failure.
Sonnet 4 · 39 signals · request_handler.py
Deep Research Agent
Opus 4.5 for institutional-grade equity research. 5 report types: initiation-of-coverage, earnings deep dive, sector comparative, dividend sustainability, risk assessment. Multi-section structured output.
Opus 4.5 · POST /api/v1/claude/research/report
Training Quality Reviewer
Scores Q&A training pairs across 5 rubric dimensions. Returns verdict (pass/review/fail), dimension scores, improvement notes. Batch up to 50 concurrent. Raises overall fine-tune dataset quality.
Sonnet 4 · POST /api/v1/claude/training/review · batch 50
Self-Improvement Engine
Queries harvey_query_log + harvey_feedback DB tables. Outputs SelfImprovementReport: persona health score 0–100, knowledge gaps, prompt refinements, training recommendations. Scheduled analysis.
Sonnet 4 · POST /api/v1/claude/improve/analyze
Safety Ensemble Vote
Claude Sonnet 4 as 5% weighted voter in the 7-model dividend safety ensemble. Logical consistency check + hallucination detection. Dynamic weight redistribution on model failure.
Sonnet 4 · 5% weight · POST /api/v1/claude/ensemble/safety-score
Financial Formula Engine
43+ deterministic formulas across 8 categories with Claude explanations and Excel syntax. Prevents hallucination on quantitative calculations. Gordon Growth Model, Yield on Cost, AFFO Payout, DCF and more.
Sonnet 4 · 43 formulas · GET /api/excel/formulas/list
ClaudeDeepResearch (Direct Methods)
Three callable research primitives: generate_initiation_report() — full IoC with valuation + dividend thesis; generate_dividend_deep_dive() — payout safety + growth trajectory + BUY/HOLD/WATCH/AVOID verdict; generate_sector_comparison() — peer ranking with dividend yield matrix.
Opus 4.5 · 3 research primitives · POST /api/v1/claude/research/report
Status endpoint: GET /api/v1/claude/status
Client: httpx async (no Anthropic SDK) — lazy import, fails gracefully if ANTHROPIC_API_KEY absent
Models: claude-sonnet-4-20250514 (default) + claude-opus-4-20250514 (deep research only)
𝕏X Real-Time Signal Service — v2.0 (X API 2025)
grok_x_dividend_service.py pay-per-use XMCP-ready grok-3-fast MCP tool #19
X is the most real-time data platform on earth. Harvey now queries it via Grok x_search for any ticker — not just monitored ETF accounts. Pay-per-use billing (X API 2025) means no monthly credit exhaustion. XMCP Server hook will upgrade to native X MCP context when configured.
Signal Types
📣 Dividend announcements
⚠️ Cut / suspension warnings
📊 Earnings reactions
🏦 Analyst calls
👤 Insider activity
🔴 Breaking news
Coverage
Any public ticker
10 monitored ETF accounts
All public X posts
Image understanding
1–30 day lookback
Sentiment classification
New Endpoints
GET /api/x/signals/{ticker}
GET /api/x/signals/{ticker}/dividend
GET /api/x/status
GET /api/x/xmcp/status
MCP Tool #19
get_x_dividend_signals
Args: ticker, days_back
Returns: signals[], sentiment_summary, top_signal

Claude/Cursor can now ask:
"What is X saying about $T?"
Grok Responses API /v1/responses x_search tool no monthly limits XMCP fallback active env: X_XMCP_SERVER_URL
🗂️ API Routes (65 files)
advanced_analytics · advisor · agent_tool · ai_sdk · alert_push · claude_intelligence · code_interpreter · comprehensive_training · curated_list · dashboard · database_ml · deep_research · deepseek_training · dividend_aristocrats · dividend_intelligence · dividend_lists · dividend_neural · dividend_pipeline · document_learning · education_training · external_ml_api · file_processing · finetuning · fingpt · finrobot · fmp · fmp_training · general_investment · hallucination_prevention · harvest · harvest_training · hashtag · investment_strategy · market_intelligence · mcp · ml_prediction · moat · multi_source_training · notebook · perplexity · perplexity_training · quantlibs_training · recommendation · recommendation_training · researcher_agent · rlm · roundtable · security_comparison · semantic · sentiment · social_media · strategic · streaming_chat · trading_intelligence · trading_strategies_training · training_management · tts · ultimate_pack · unified_data_lake · user_profile · video · video_training · x_dividend · x_dividend_training
🌐 Network & Ports
8001 · Harvey Main API (public)
9001 · ML API (public)
9000 · Internal API (localhost)
8000 · Shadow / restarting
443 · Nginx → llm.theharvey.ai
🗄️ Database
Engine: Azure SQL Server
Server: hey-dividend-sql-server.database.windows.net
Database: HeyDividend-Main-DB
Driver: pymssql (native FreeTDS)
Views: vSecurities, vDividendsEnhanced, vSchedules, vSignals, vPredictions
unified_dividends: 787,787 rows · canonical primary source
Data guard: CHK_unified_dividends_amount_sanity · amount > 0 AND amount ≤ $500 · 0 bad rows
Guard layers: ① code batch-uniformity check ② SQL CHECK constraint ③ moat cleanup endpoint
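The code-level layer of the guard can be sketched as below. The amount bounds mirror the CHECK constraint above. The uniformity rule is only one plausible reading of "batch-uniformity check" (reject a batch when a single amount dominates it), and the thresholds min_rows and max_share are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

MAX_AMOUNT = 500.0  # upper bound from CHK_unified_dividends_amount_sanity


def amount_is_sane(amount: float) -> bool:
    """Same predicate as the SQL constraint: amount > 0 AND amount <= 500."""
    return 0 < amount <= MAX_AMOUNT


def batch_is_uniform(amounts: Sequence[float], min_rows: int = 10,
                     max_share: float = 0.9) -> bool:
    """Assumed rule: flag a batch if one value makes up more than max_share of it."""
    if len(amounts) < min_rows:
        return False
    _, count = Counter(amounts).most_common(1)[0]
    return count / len(amounts) > max_share


def filter_batch(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Drop insane amounts, then reject the whole batch if it looks uniform."""
    clean = [(ticker, amt) for ticker, amt in rows if amount_is_sane(amt)]
    if batch_is_uniform([amt for _, amt in clean]):
        raise ValueError("batch rejected: amounts are suspiciously uniform")
    return clean
```

Running the code check before the INSERT means the SQL constraint acts as a backstop: a bad batch is rejected in the application with a clear reason, not as a database error.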
⚙️ Architecture Enhancement Layer
Model Telemetry System: active
Cost-Aware Router (4-tier): active
ML-Based Intent Classifier: active
Shared Cache (LRU in-mem): active
Async DB Pool (ThreadPoolExecutor): active
Response Gating: active
Parallel Source Fan-Out: active
RAG Retrieval Reranking: active
Circuit Breaker (ML API): active
ML Health Monitor (30s interval): active
🔑 Active External Integrations
Azure OpenAI
Endpoint: htmltojson-parser-openai-a1a8.openai.azure.com
Deployment: HarveyGPT-5
Status: ✓ active
xAI Grok-4
Purpose: X.com monitoring, real-time social sentiment
Status: ✗ 429 rate-limited (ongoing; monthly credits exhausted)
Impact: Ensemble redistributes weight; X Dividend Training Agent degraded
Google Gemini 2.5 Pro
Purpose: Ensemble (31%) + market intel
Status: ✓ active
Anthropic Claude
Models: Sonnet 4 + Opus 4.5
Purpose: Ensemble + deep research
Status: ✓ active (httpx, no SDK)
Perplexity Sonar
Purpose: Ensemble (21%) + research training
Status: ✓ active
ElevenLabs TTS
Purpose: Audio generation for research
Key: ELEVENLABS_API_KEY ✓
Status: ✓ configured
FMP (Financial Modeling Prep)
Endpoints: 80+ API endpoints
Status: ✓ active
Helicone LLM Observability
Purpose: Token tracking, latency, cost
Status: ✓ active (proxy layer)
🔁 GitHub Actions: ML Training Pipeline
Workflow: train-ml-models.yml · daily 2 AM UTC + manual dispatch
Models: 6 total (Dividend Growth, Cut Predictor, Anomaly Detection, ESG Scorer, Payout Rating, Portfolio Optimization)
No-DB models (always train): ① Growth Forecaster ⑤ Payout Rating ⑥ Portfolio Optimizer · use synthetic data when DB unreachable
DB-dependent models: ② Cut Predictor ③ Anomaly Detection ④ ESG Scorer · train when Azure SQL reachable
Upload: Azure Blob Storage → ml-models container · versioned archive per run
Status: ✓ Fixed · exit code 2 (pip cache / MSSQL install) and TRAINED=0 (wrong args) resolved
🩺 VM Cron Health: LLM VM (20.81.210.213)
vm_training_cron.sh: ✓ execute permission fixed (script had lost its execute bit via chmod -x, so the 30-minute job failed silently)
training_health_check.py: ✓ deployed · runs every 6h via cron, validates all training tables + unified_dividends integrity
Health checks: Row counts per training table · unified_dividends 787k+ · amount sanity · zero / over-$500 scan
📋 Log Health (Current Session)
Dominant error source: Grok-4 HTTP 429 (96%+ of all ERROR events) · ongoing monthly credit exhaustion
Other errors: DB timeouts, yfinance fallback, config misses (non-Grok)
Log file size: ~205 MB (accumulated; rotate monthly)
Live counts: See Live Logs tab → ERROR filter for real-time totals
Accuracy log: Batch INSERT active · 1 background thread per verify_all_claims() call (~200ms vs ~4.2s sequential)
Harvey Backend · Replit dev env · auto-refreshes every 10s

🌐 Next.js Frontend

llm.theharvey.ai · Dark blue UI · Vercel AI SDK · REST consumers · RIA / broker-dealer clients

🛡️ Nginx Reverse Proxy

TLS termination · Port 443 → 8001 · /api/internal/ml → 9001

⚡ FastAPI :8001

67 route files · API key auth · Rate limiting · ASGI streaming middleware

📊 Helicone

LLM observability · Token tracking · Latency · Cost per model

🔌 Vercel AI SDK Layer

/api/ai-sdk · Tool-call streaming protocol · SSE streams · 4,400+ lines · Next.js native

🔔 WebSocket Alerts

/ws/alerts/{user_id} · Real-time dividend cut-risk push · Async scanner · Connected user tracking

🎙️ ElevenLabs TTS

/api/v1/tts/synthesize · Harvey responses → MP3 audio · Voice presets · ELEVENLABS_API_KEY

📊 Dashboard Builder

/api/v1/dashboards · Custom ticker dashboards · Min 3 tickers · Widget config · Persist to DB

🗂️ Query Router

27+ query types · Intent classification · Asset class detection (6 classes)

🎭 Claude Premium Router

39 premium signals · IoC reports · Investment thesis · Retirement plans · Deep dives → Sonnet 4 (6K tokens) · OAI fallback

🎯 ML Intent Classifier

TF-IDF + ensemble scoring · 19 intent types · Semantic routing

🔍 Ticker Preflight Gate

Real-time ticker validation · Stopword filter · $TICKER prefix bypass
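A minimal sketch of how a stopword filter and a $TICKER bypass might combine is shown below. The stopword list, the rule that bare symbols must be typed in upper case, and the KNOWN_TICKERS set (a stand-in for the real-time validation call) are all assumptions made for this example.

```python
import re
from typing import List, Set

# Illustrative only: common words that also look like ticker symbols.
STOPWORDS: Set[str] = {"A", "I", "IT", "ALL", "ARE", "FOR", "ON", "THE", "BUY"}
# Stand-in for the real-time ticker validation lookup.
KNOWN_TICKERS: Set[str] = {"T", "KO", "O", "IT", "SCHD"}

# Optional '$' prefix followed by a 1-5 letter word.
TOKEN_RE = re.compile(r"(\$?)\b([A-Za-z]{1,5})\b")


def extract_tickers(query: str, known: Set[str] = KNOWN_TICKERS) -> List[str]:
    found: List[str] = []
    for match in TOKEN_RE.finditer(query):
        prefixed = match.group(1) == "$"
        word = match.group(2)
        symbol = word.upper()
        # Bare words must be upper-case and not stopwords; '$' bypasses both checks.
        if not prefixed and (word != symbol or symbol in STOPWORDS):
            continue
        if symbol in known and symbol not in found:
            found.append(symbol)
    return found
```

With these rules, "Is IT a buy? What about $it and KO?" yields IT only through the explicit $ prefix, while KO passes as a bare upper-case symbol.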

⚖️ 7-Voter Weighted Ensemble · Dynamic weight redistribution on failure
🟣 Gemini 2.5 Pro · 31% · Fundamentals
🔵 Perplexity Sonar · 21% · Real-time news
🟢 Harvey GPT-5 · 17% · Complex reasoning
🔶 DeepSeek-R1 · 13% · DCF / FCF math
🌐 GAN Scenarios · 8% · Stress-test futures
📰 BERT Sentiment · 5% · FinBERT news
🎭 Claude Sonnet 4 · 5% · Verification gate
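One simple way to redistribute weight when a voter fails is to renormalize the surviving weights so they sum to 1 again. The sketch below uses the seven weights listed above; the proportional rule, the dictionary keys, and the function names are assumptions, since the dashboard does not say how redistribution is actually done.

```python
from typing import Dict, Iterable

BASE_WEIGHTS: Dict[str, float] = {
    "gemini_2_5_pro": 0.31,
    "perplexity_sonar": 0.21,
    "harvey_gpt5": 0.17,
    "deepseek_r1": 0.13,
    "gan_scenarios": 0.08,
    "bert_sentiment": 0.05,
    "claude_sonnet_4": 0.05,
}


def redistribute(weights: Dict[str, float], failed: Iterable[str]) -> Dict[str, float]:
    """Drop failed voters and scale the rest proportionally to sum to 1."""
    failed_set = set(failed)
    alive = {name: w for name, w in weights.items() if name not in failed_set}
    total = sum(alive.values())
    if total <= 0:
        raise ValueError("no voters available")
    return {name: w / total for name, w in alive.items()}


def ensemble_score(scores: Dict[str, float],
                   weights: Dict[str, float] = BASE_WEIGHTS) -> float:
    """Weighted average over the voters that actually returned a score."""
    missing = [name for name in weights if name not in scores]
    live = redistribute(weights, missing)
    return sum(live[name] * scores[name] for name in live)
```

For example, if Perplexity Sonar is unavailable, Gemini's share rises from 0.31 to 0.31 / 0.79, and every other voter scales by the same factor.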

🔌 ClaudeClient

httpx async core

Sonnet 4 / Opus 4.5

No SDK · lazy import

Fail graceful

📋 Deep Research Agent

Opus 4.5 · 5 report types

IoC · Earnings deep dive

Sector · Dividend · Risk

🎓 Training Reviewer

5-rubric QA scoring

pass/review/fail verdict

Batch up to 50

🔄 Self-Improvement

Queries query_log + feedback

Knowledge gap detection

Health score 0-100

📐 Formula Engine

43 deterministic formulas

8 categories · Excel syntax

DCF · DDM · YOC · AFFO

🎯 Premium Router

39 signals → Sonnet 4

7 complex intent types

6K token budget

✅ Safety Vote

5% ensemble weight

Logical consistency

Hallucination check

🔬 DeepResearch Methods

generate_initiation_report

generate_dividend_deep_dive

generate_sector_comparison

🔬 Deep-Dive Framework

10 institutional research templates · IoC · Earnings · Sector · Risk

🛡️ Hallucination Prevention

Claim verification · Schema fingerprint · Multi-angle fact-check · Response gating

🤖 Agentic Tool-Calling

/api/v1/agent/query · Multi-step financial reasoning · Tool loop · List available tools

🐍 Code Interpreter

/api/v1/interpret/run · Sandboxed Python execution · Financial modelling · DCF sandbox

📚 SkillLoader Pipeline

34 financial skills · 250+ trigger keywords · Auto-injects methodology context into system prompt

🔄 Two-Layer LLM Cache

L1 in-memory (LRU) + L2 Azure SQL · semantic similarity · 30-50% token cost reduction

📜 Chat History API

4 endpoints · Azure SQL persistence · x-user-id / x-conversation-id headers · streaming wrapper

🎯 Accuracy Enforcement

99.999% target · claim gate · batch INSERT logging · 1 bg thread · multi-source cross-validation

📡 Real-Time Data

FMP (80+ endpoints)

Finnhub · yFinance

EDGAR SEC · FRED

X.com (Grok) · Alpha Vantage

🧬 Dividend Neural Engine

Fourier Denoiser

ARIMA Feature · LSTM Predictor

Stacked Autoencoder

GAN · SOM Anomaly · Eigen

🎓 Training Pipeline

33 agents · ~25k pairs/day

Cron (*/30 min + 2×daily)

256+ screener templates

Roundtable · Trading (13 modules)

🌾 HarvestEngine

Backtesting (6 strategies)

Portfolio Optimizer

NAV Avoidance Screener

Dividend Calendar · DRIP Sim

① EDGAR: 8-K dividends · 10-K XBRL · Policy classifier
② Calendar: Exchange validation · Fiscal year · Pay-gap stress
③ GARCH: Yield volatility · Cut-risk score · Crisis regime
④ Forecast: ARIMA + Prophet + Ridge · Dynamic MAE weights · 80% CI
⑤ tsfresh: 800+ features · GBRT cut-risk · Top-5 explainer
⑥ Income: Sharpe · Sortino · YOC CAGR · Max drawdown
⑦ Optimizer: PyPortfolioOpt · HRP · Kelly sizing
⑧ Alphalens: Factor IC/ICIR · Safety Score backtest · Alpha signals
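Stage ④ mentions dynamic MAE weights for blending ARIMA, Prophet, and Ridge. A common way to do this is inverse-MAE weighting, where a model with half the error gets twice the weight. The sketch below assumes that scheme; the pipeline's real weighting rule is not stated, and the function names are hypothetical.

```python
from typing import Dict


def inverse_mae_weights(mae_by_model: Dict[str, float],
                        eps: float = 1e-9) -> Dict[str, float]:
    """Weight each model by 1 / MAE, normalized to sum to 1."""
    inverse = {name: 1.0 / (mae + eps) for name, mae in mae_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend_forecasts(forecasts: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted average of the point forecasts."""
    return sum(weights[name] * forecasts[name] for name in forecasts)
```

Recomputing the weights from each model's recent MAE after every evaluation window is what would make them "dynamic".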
3-Source Consolidation → Content-Hash Dedup → JSONL Upload → Azure Fine-Tune Job
💬 harvey_query_memory · 50% budget · confidence ≥ 0.80 · live conversations
🤖 InvestmentResearcherTraining · 35% budget · 256+ synthetic Q&A templates
📚 harvey_training_data · 15% budget · quality ≥ 0.75 · ingestion service
☁️ Azure OAI Fine-Tune · AZURE_OPENAI_FINETUNE_MODEL · 3 epochs · harvey_finetuning_jobs
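The consolidation and content-hash dedup steps can be sketched as follows, using the 50/35/15 budget split above. The example field names "prompt" and "completion", the normalization rule, and the first-come quota fill are assumptions. The confidence and quality thresholds are assumed to be applied upstream and are not modeled here.

```python
import hashlib
from typing import Dict, List

# Budget shares in whole percent, as listed above.
BUDGET_PCT: Dict[str, int] = {
    "harvey_query_memory": 50,
    "InvestmentResearcherTraining": 35,
    "harvey_training_data": 15,
}


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def content_hash(example: Dict[str, str]) -> str:
    key = _normalize(example["prompt"]) + "\x1f" + _normalize(example["completion"])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def consolidate(sources: Dict[str, List[Dict[str, str]]],
                total: int) -> List[Dict[str, str]]:
    """Fill each source's quota in order, skipping duplicates across sources."""
    seen = set()
    merged: List[Dict[str, str]] = []
    for name, pct in BUDGET_PCT.items():
        quota = total * pct // 100
        taken = 0
        for example in sources.get(name, []):
            if taken >= quota:
                break
            digest = content_hash(example)
            if digest in seen:
                continue
            seen.add(digest)
            merged.append(example)
            taken += 1
    return merged
```

The merged list can then be written out one JSON object per line for the JSONL upload step.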

🏭 harvey-unified-ml v4.0.0 — Port 9001 — 7 Models Loaded

/score/symbol · /predict/yield · /predict/cut-risk (GBRT + time-ordered CV) · /predict/payout-rating · /health · conda env: llm (PyTorch + miniconda3) · Circuit Breaker (CLOSED/OPEN/HALF-OPEN) · rate-limit protection
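The three breaker states named above follow the usual circuit-breaker pattern: CLOSED passes calls through, OPEN rejects them, and HALF-OPEN lets a trial call through after a cooldown. The sketch below is a generic version of that pattern; the thresholds, method names, and injectable clock are assumptions, not the service's actual implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call to the ML API may be attempted now."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF-OPEN"  # cooldown elapsed: allow a trial call
                return True
            return False
        return True

    def record_success(self) -> None:
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would check allow() before each request to port 9001 and report the outcome with record_success() or record_failure().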

🗄️ Azure SQL — HeyDividend-Main-DB

hey-dividend-sql-server.database.windows.net · pymssql · 20+ training tables · unified_dividends · vSecurities · vDividendsEnhanced · harvey_finetuning_jobs · mcp_audit_log

💾 LRU Cache

5,000 entries in-memory · 10,800s TTL · Async DB pool via ThreadPoolExecutor
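An in-memory LRU cache with a per-entry TTL, using the 5,000-entry and 10,800-second figures above as defaults, could be built on OrderedDict as in this sketch. The class and method names and the injectable clock are hypothetical; the real cache may differ.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    def __init__(self, max_entries: int = 5000, ttl_seconds: float = 10800.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
```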

📁 ml_training/

966 MB on VM · saved_models/ · tsfresh_cut_risk_classifier.pkl · dividend_lstm.pt · dividend_gan.pt · dividend_som.pkl

☁️ Azure VM

20.81.210.213 · Ubuntu · 27 GB RAM · 248 GB disk

🐍 miniconda3 llm

Python 3.11 · PyTorch · uvicorn · conda env

⏱️ Cron + Scheduler

112 jobs · */30 min training · GBRT monthly retrain

🔑 Secrets (28)

Azure OAI · Grok · Gemini · Claude · Perplexity · FMP · Finnhub · ElevenLabs · FRED · DeepSeek

📡 MCP Server v3.1

23 tools · 5 groups (data/screening/analytics/intelligence/brokerage) · OAuth 2.1 · Claude Desktop · SSE /api/mcp/sse · MCPGuard · Audit log

🧠 What Harvey Has Learned

Training knowledge base — 8 domains, 33+ agents, continuous learning
8 Knowledge Domains