Comparison of three open-source quant agentic frameworks against the
agent-utilities finance domain (`agent_utilities/domains/finance/*`,
`CONCEPT:KG-2.6`), with three concrete gaps closed:

- `CONCEPT:KG-2.26`: Trade-journal bias auditor + shadow account
- `CONCEPT:KG-2.27`: Agent calibration / reputation tracking
- `CONCEPT:KG-2.28`: Persona decision-heuristic enrichment
The thesis: our OWL/KG, graph-native, self-evolving substrate makes each borrowed capability stronger. The borrowed feature stops being a one-shot report and becomes a reasoned-over, provenance-bearing, calibrated fact that the rest of the system can cite.
**Vibe-Trading.** A skill-rich research agent: 74 skills (ML-strategy, SMC, Elliott-wave,
sector-rotation, perp-funding-basis, on-chain analysis, liquidation-heatmap,
options/SABR vol-surface, …) and 29 swarm presets. Its standout, novel
capability is the shadow account / trade-journal audit
(`agent/src/shadow_account/*`, `agent/src/tools/trade_journal_tool.py`): it parses
a broker export, FIFO-matches roundtrips, builds a trading profile (win rate,
holding days, PnL ratio, max drawdown), and runs 4 behavioural-bias
diagnostics (disposition effect, overtrading, momentum-chasing, anchoring).
It then distils profitable roundtrips into if-then shadow rules and backtests
the counterfactual "shadow" portfolio against the user's real trades for delta-PnL
attribution. The audit is a flat report; it is not persisted as a queryable fact.
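The FIFO-matching step is what turns a raw broker export into the roundtrips every later metric consumes. A minimal sketch, assuming a long-only account and a hypothetical `Fill` / `MatchedTrip` schema with day indices instead of timestamps (neither is Vibe-Trading's nor agent-utilities' actual type):

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Fill:
    """One execution from a broker export (hypothetical minimal schema)."""
    symbol: str
    side: str    # "buy" or "sell"
    qty: float
    price: float
    day: int     # trading-day index standing in for a timestamp


@dataclass
class MatchedTrip:
    """A closed long position: one buy lot (or part of it) matched to a sell."""
    symbol: str
    qty: float
    entry_price: float
    exit_price: float
    entry_day: int
    exit_day: int

    @property
    def pnl(self) -> float:
        return (self.exit_price - self.entry_price) * self.qty

    @property
    def holding_days(self) -> int:
        return self.exit_day - self.entry_day


def fifo_match(fills: list[Fill]) -> list[MatchedTrip]:
    """Close each sell against the oldest open buy lots of the same symbol."""
    open_lots: dict[str, deque] = {}
    trips: list[MatchedTrip] = []
    for f in sorted(fills, key=lambda x: x.day):  # stable sort keeps same-day order
        lots = open_lots.setdefault(f.symbol, deque())
        if f.side == "buy":
            lots.append([f.qty, f.price, f.day])  # [remaining qty, price, day]
            continue
        remaining = f.qty
        while remaining > 1e-12 and lots:
            lot = lots[0]
            take = min(remaining, lot[0])
            trips.append(MatchedTrip(f.symbol, take, lot[1], f.price, lot[2], f.day))
            lot[0] -= take
            remaining -= take
            if lot[0] <= 1e-12:
                lots.popleft()  # oldest lot fully consumed
        # leftover sell quantity (a short) is ignored in this long-only sketch
    return trips
```

A sell of 12 shares against lots of 10 and 5 therefore yields two roundtrips: all of the first lot and 2 shares of the second, with 3 shares left open.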
**AutoHedge.** A deliberately thin autonomous loop (`autohedge/`): a fixed
Director → Quant → Risk → Execution agent chain (plus sentiment), structured
JSON output, risk-first sizing, and on-chain (Solana/Jupiter) execution. Strength:
clean risk-first pipeline ergonomics. Weakness: no memory of which agent was
right before, no behavioural feedback, a single venue, and no graph substrate. Its
role taxonomy already maps onto our `SwarmRole`.
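To make "risk-first sizing" concrete, here is a minimal sketch of a fixed Quant → Risk → Execution chain in which the risk stage can veto or shrink a trade before execution sees it. Everything here (`Proposal`, `Order`, the 5% size cap, the 0.6 confidence floor) is hypothetical and illustrative, not AutoHedge's code; the director stage is reduced to choosing the ticker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    ticker: str
    direction: int      # +1 long, -1 short, 0 flat
    confidence: float   # 0..1


@dataclass
class Order:
    ticker: str
    direction: int
    size: float         # fraction of equity


def quant_stage(ticker: str, score: float) -> Proposal:
    """Hypothetical quant step: sign of a model score -> direction, |score| -> confidence."""
    direction = (score > 0) - (score < 0)
    return Proposal(ticker, direction, min(abs(score), 1.0))


def risk_stage(p: Proposal, max_size: float = 0.05, min_conf: float = 0.6) -> Optional[Order]:
    """Risk-first: veto weak ideas and cap size before execution ever sees them."""
    if p.direction == 0 or p.confidence < min_conf:
        return None
    return Order(p.ticker, p.direction, round(max_size * p.confidence, 6))


def run_chain(ticker: str, score: float) -> dict:
    """Director (picked `ticker`) -> Quant -> Risk -> Execution, as JSON-ready output."""
    order = risk_stage(quant_stage(ticker, score))
    if order is None:
        return {"ticker": ticker, "action": "HOLD"}
    action = "BUY" if order.direction > 0 else "SELL"
    return {"ticker": ticker, "action": action, "size": order.size}
```

Because the risk stage returns `None` rather than an order, a vetoed idea cannot reach execution; what the chain lacks is any memory of whether the quant stage was right last time.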
**FinceptTerminal.** A desktop terminal (`fincept-qt/`) wrapping 100+ data sources (yfinance,
OECD/BEA/BoE macro, akshare, financial-datasets, …), 16 broker execution
integrations, and the richest persona registry of the three: 37 persona
agents across TraderInvestorsAgent, hedgeFundAgents (Renaissance-style
archetypes), GeopoliticsAgents, and EconomicAgents. Critically, each persona
config (`TraderInvestorsAgent/configs/agent_definitions.json`) carries an
explicit, scored decision framework: Buffett's ROE≥15% / D/E<0.5 /
owner-earnings yield, Graham-style value gates, named `scoring_weights`, and
required `line_items`. That structure is the seed for KG-2.28. Weakness: no KG and
no calibration/reputation; the frameworks live in prose + weights rather than as
executable, queryable graph facts.
| Capability | Vibe-Trading | AutoHedge | FinceptTerminal | agent-utilities (before) | agent-utilities (after) |
|---|---|---|---|---|---|
| Bull/bear + persona-voice debate w/ risk veto | partial | basic | persona prose | ✅ `debate_engine` + `investor_debate` | ✅ |
| Role-weighted swarm consensus | preset swarms | fixed chain | — | ✅ `trading_swarm` | ✅ |
| Forensic earnings screen (M/Z/F/Sloan) | — | — | data only | ✅ `forensic_screener` (engine) | ✅ |
| Portfolio opt / VaR / regime / alpha factors | partial | basic | data | ✅ KG-2.6 suite | ✅ |
| Trade-journal behavioural-bias audit | ✅ (flat report) | — | — | ❌ | ✅ KG-2.26 (KG-persisted) |
| Shadow account (trader profile as signal) | ✅ (report) | — | — | ❌ | ✅ KG-2.26 |
| Agent calibration / reputation feedback | — | — | — | ❌ | ✅ KG-2.27 (Brier → swarm weights) |
| Executable, queryable persona heuristics | — | — | prose + weights | partial (voice only) | ✅ KG-2.28 (OWL + evaluator) |
| 100+ data sources / many brokers | some | 1 venue | ✅ | partial | (out of scope) |
| Graph-native provenance / OWL reasoning | — | — | — | ✅ epistemic-graph + OWL | ✅ |
The three gaps closed here are the ones that are both missing from agent-utilities and amplified by our substrate. The breadth gaps (data-source and broker count) are deliberately left out: they are integration surface rather than capability.
## 3. Hidden value-adds: where our substrate makes a borrowed capability stronger
The three frameworks treat these features as terminal outputs. Because we have an epistemic graph + OWL ontology + calibration-feedback loop, the same feature becomes an input to further reasoning:
- Trade-journal audit → learning signal (KG-2.26). Vibe-Trading produces a PDF-shaped report. We persist the profile and each bias as `:TraderProfile` and `:BehavioralBias` nodes (`EXHIBITED_BY`), so a future Bull/Bear debate or the risk officer can cite them: "this account exhibits a HIGH disposition effect → weight the bear's stop-loss discipline up." The audit becomes a fact that is reasoned over, with provenance, instead of a one-shot artifact.
- Calibration feedback (KG-2.27). Palantir AIP and Fincept have personas but no memory of who was right. We record each persona's directional calls against outcomes, score them with the engine's `brier_score` kernel, and feed the calibration back into the weighted `SwarmConsensus` so historically accurate voices outvote the rest. Each score is an `:AgentCalibration` node (`CALIBRATION_OF` the agent), giving a queryable reputation, e.g. "which persona has the best Brier on tech shorts?"
- Persona heuristics as OWL facts (KG-2.28). Fincept's frameworks live in prose + weights. We make them `:DecisionHeuristic` OWL individuals (`HEURISTIC_OF` a persona `:Agent`) with a deterministic evaluator, so the graph can answer "which personas' value criteria does ACME pass?" and a Buffett bull cites the exact passing/failing rule. The engine's forensic kernels feed Burry's short triggers directly: borrowed structure, grounded in our reasoned-over numbers.
In every case the differentiator is the same: reasoned-over facts + KG provenance + calibration feedback turn a static borrowed feature into a self-improving one.
### KG-2.26: `agent_utilities/domains/finance/trade_journal.py`
```python
from agent_utilities.domains.finance import TradeJournalAuditor, Roundtrip

# `roundtrips`: a list of FIFO-matched Roundtrip records parsed from a broker export
backend = None  # a GraphBackend in production; None makes persist() a no-op (offline)

auditor = TradeJournalAuditor()
profile = auditor.audit("acct_42", roundtrips)  # win rate, PnL ratio, max DD, 4 biases
profile.bias("disposition_effect").severity     # low | medium | high
auditor.persist(profile, backend)               # -> :TraderProfile + :BehavioralBias KG nodes
```

- Real maths: FIFO-matched roundtrips in; win rate, average holding period, PnL ratio (avg win / avg |loss|), cumulative-equity max drawdown; disposition (loser/winner hold ratio), overtrading (busy-vs-quiet-day PnL gap), momentum-chasing (buys >3% above own prior buy), anchoring (<5% entry-price CV).
- Wiring → KG: `to_batch()` emits the standard `ExtractionBatch` (`GraphNode` / `EnrichmentEdge`), and `persist()` writes it through the same `write_batch` → `GraphBackend` path every enrichment source uses. A `None` backend degrades to a no-op, so the audit runs fully offline.
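The "Real maths" bullet reduces to a handful of small formulas. The sketch below implements most of them on a hypothetical `Trip` record (not the module's `Roundtrip`). It returns raw metrics only, because the thresholds that map them to low/medium/high severity are not given here; drawdown is measured in PnL units from a zero starting equity, and the overtrading gap is omitted. Both are assumptions of this sketch.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trip:
    """Minimal closed-trade record (hypothetical; not the module's Roundtrip class)."""
    pnl: float
    holding_days: float


def win_rate(trips: list[Trip]) -> float:
    return sum(t.pnl > 0 for t in trips) / len(trips)


def pnl_ratio(trips: list[Trip]) -> float:
    """Average win divided by average absolute loss (nan if either side is empty)."""
    wins = [t.pnl for t in trips if t.pnl > 0]
    losses = [-t.pnl for t in trips if t.pnl < 0]
    return mean(wins) / mean(losses) if wins and losses else float("nan")


def max_drawdown(trips: list[Trip]) -> float:
    """Largest peak-to-trough fall of cumulative PnL, in PnL units, starting from 0."""
    equity = peak = worst = 0.0
    for t in trips:
        equity += t.pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def disposition_ratio(trips: list[Trip]) -> float:
    """Mean hold of losers / mean hold of winners; > 1 means losers are held longer."""
    winners = [t.holding_days for t in trips if t.pnl > 0]
    losers = [t.holding_days for t in trips if t.pnl < 0]
    return mean(losers) / mean(winners) if winners and losers else float("nan")


def momentum_chasing_share(buy_prices: list[float], jump: float = 0.03) -> float:
    """Share of buys priced more than `jump` (3%) above the trader's own previous buy."""
    pairs = list(zip(buy_prices, buy_prices[1:]))
    if not pairs:
        return 0.0
    return sum(cur > prev * (1 + jump) for prev, cur in pairs) / len(pairs)


def anchoring_flag(entry_prices: list[float], max_cv: float = 0.05) -> bool:
    """Entries cluster around one level: coefficient of variation below 5%."""
    m = mean(entry_prices)
    return m > 0 and pstdev(entry_prices) / m < max_cv
```

For example, winners held 2 and 3 days against losers held 6 and 10 days give a disposition ratio of 8 / 2.5 = 3.2: losers are held more than three times as long.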
### KG-2.27: `agent_utilities/domains/finance/calibration_tracker.py`
```python
from agent_utilities.domains.finance import (
    CalibrationTracker, apply_calibration_to_swarm,
)

t = CalibrationTracker()
t.record_call("quant_01", direction=+1, confidence=0.85, subject="ACME")
t.record_outcome("quant_01", realized_direction=+1, subject="ACME")
t.score("quant_01")  # accuracy + Brier + calibration in [0, 1]

# `swarm`: an existing TradingSwarm whose config.role_weights drive its aggregation
apply_calibration_to_swarm(swarm, t)  # live wire: mutates swarm.config.role_weights
```

- Brier: the engine's `client.finance.brier_score` when reachable, with a vetted local fallback `mean((f-o)^2)` offline; calibration = `clamp(1 - 2·brier, 0, 1)`.
- Wiring → SwarmConsensus: `calibrated_role_weights()` scales each role's base weight by the average calibration of its agents (floored, never zeroed); `apply_calibration_to_swarm()` writes the result back into the live swarm's `config.role_weights`, which the existing `TradingSwarm.analyze` weighted aggregation already consumes. The next `analyze()` therefore lets a high-calibration quant outvote a miscalibrated sentiment agent. A live-path test proves a swarm's decision flips HOLD → BUY after calibration.
- Wiring → KG: `persist()` writes `:AgentCalibration` nodes (`CALIBRATION_OF`).
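The Brier-to-weight path can be sketched in a few lines. This assumes each directional call is scored as a probability forecast (the stated confidence that the call is right) against a 0/1 outcome, and it uses a hypothetical calibration floor of 0.1 and a "keep the base weight" rule for roles with no history; the module's `calibrated_role_weights()` may choose differently. Under `clamp(1 - 2·brier, 0, 1)`, a perfect forecaster maps to 1, a constant 0.5 forecast (Brier 0.25) maps to 0.5, and any Brier of 0.5 or worse maps to 0.

```python
from statistics import mean


def brier(forecasts: list[float], outcomes: list[float]) -> float:
    """Mean squared gap between forecast probabilities f and 0/1 outcomes o."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))


def calibration(brier_score: float) -> float:
    """clamp(1 - 2*brier, 0, 1)."""
    return min(1.0, max(0.0, 1.0 - 2.0 * brier_score))


def agent_calibration(calls: list[tuple[float, int, int]]) -> float:
    """calls: (confidence, called_direction, realized_direction) for one agent.

    Assumption: f is the stated confidence that the call is right, and
    o is 1 when the realized direction matches the call, else 0.
    """
    f = [conf for conf, _, _ in calls]
    o = [1.0 if called == realized else 0.0 for _, called, realized in calls]
    return calibration(brier(f, o))


def calibrated_role_weights(base: dict[str, float], agent_role: dict[str, str],
                            agent_cal: dict[str, float], floor: float = 0.1) -> dict[str, float]:
    """Scale each role's base weight by its agents' mean calibration, never below `floor`."""
    weights = {}
    for role, w in base.items():
        cals = [c for agent, c in agent_cal.items() if agent_role.get(agent) == role]
        scale = mean(cals) if cals else 1.0  # no track record yet: keep the base weight
        weights[role] = w * max(scale, floor)
    return weights
```

A quant that called ACME up twice at 0.85 and 0.7 confidence and was right both times scores about 0.89; a sentiment agent that was confidently wrong twice clamps to 0 and keeps only the floor weight.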
### KG-2.28: `agent_utilities/domains/finance/persona_heuristics.py`
```python
from agent_utilities.domains.finance import evaluate_persona, evaluate_all

ev = evaluate_persona("graham_investor", {"pe": 11, "pb": 1.1, "margin_of_safety": 0.4})
ev.verdict     # bullish | neutral | bearish | insufficient_data
ev.citation()  # names the exact passing/failing rules
```

- Structured heuristics: `PERSONA_HEURISTICS` attaches typed `Heuristic` rules to Graham (P/E<15, P/B<1.5, MoS≥30%), Buffett (ROE / ROIC / owner-earnings / D/E), Burry (forensic short triggers on Sloan accruals / Beneish-M / Altman-Z), Damodaran (DCF), Druckenmiller (regime), and a Lynch PEG lens. The Burry lens is inverted: satisfying the triggers ⇒ bearish. Missing metrics yield `unknown` (never a silent pass).
- Wiring → debate/screen: `DebateContext` gained a `metrics` field; `DebateEngine.persona_heuristic_evidence()` / `_heuristic_block()` fold each bound persona's verdict into its bull/bear prompt by default when metrics are present (the generic path is untouched otherwise). A live-path test asserts that the Graham verdict and the `KG-2.28` marker appear in the prompt block.
- Wiring → KG/OWL: `persona_heuristics_batch()` / `seed_persona_heuristics()` emit `:DecisionHeuristic` nodes (`HEURISTIC_OF`); matching OWL classes (`:DecisionHeuristic`, `:AgentCalibration`, `:TraderProfile`, `:BehavioralBias`) and object properties (`:heuristicOf`, `:calibrationOf`, `:exhibitedBy`) were added to `agent_utilities/knowledge_graph/ontology_quant.ttl`.
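A deterministic evaluator of this kind fits in a few dozen lines. In this sketch the names (`Rule`, `evaluate`, `citation`, `SHORT_TRIGGERS`) are hypothetical, the Beneish-M and Altman-Z cut-offs are illustrative, and the verdict policy (all rules pass ⇒ bullish, all fail ⇒ bearish, anything else neutral, no known metric ⇒ insufficient_data) is an assumption rather than the module's rule. It does keep the two properties stated above: missing metrics become `unknown`, and the short-trigger lens is inverted.

```python
import operator
from dataclasses import dataclass

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Rule:
    """One typed decision rule, e.g. Rule("pe", "<", 15) for Graham's P/E gate."""
    metric: str
    op: str
    threshold: float

    def label(self) -> str:
        return f"{self.metric} {self.op} {self.threshold}"

    def check(self, metrics: dict) -> str:
        """'pass', 'fail', or 'unknown' when the metric is missing (never a silent pass)."""
        value = metrics.get(self.metric)
        if value is None:
            return "unknown"
        return "pass" if OPS[self.op](value, self.threshold) else "fail"


GRAHAM = [Rule("pe", "<", 15), Rule("pb", "<", 1.5), Rule("margin_of_safety", ">=", 0.30)]
# Illustrative short triggers (thresholds are assumptions, not the module's values)
SHORT_TRIGGERS = [Rule("beneish_m", ">", -1.78), Rule("altman_z", "<", 1.81)]


def evaluate(rules: list[Rule], metrics: dict, inverted: bool = False) -> tuple[str, dict]:
    """Return (verdict, {rule label: status}); an inverted lens reads passes as bearish."""
    results = {r.label(): r.check(metrics) for r in rules}
    statuses = list(results.values())
    if all(s == "unknown" for s in statuses):
        return "insufficient_data", results
    all_pass = all(s == "pass" for s in statuses)
    if inverted:  # short-trigger lens: every trigger firing is a bearish signal
        return ("bearish" if all_pass else "neutral"), results
    if all_pass:
        return "bullish", results
    if all(s == "fail" for s in statuses):
        return "bearish", results
    return "neutral", results  # mixed results, or some metrics unknown


def citation(results: dict) -> str:
    """Name every rule with its status so a debater can cite it verbatim."""
    return "; ".join(f"{status.upper()}: {label}" for label, status in results.items())
```

With the Graham inputs from the snippet above (P/E 11, P/B 1.1, MoS 0.4), every rule passes; supplying only P/E leaves two rules `unknown`, which blocks a bullish verdict instead of silently passing.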
| ID | Capability | Borrowed from | Module |
|---|---|---|---|
| KG-2.26 | Trade-journal bias auditor + shadow account | Vibe-Trading | `domains/finance/trade_journal.py` |
| KG-2.27 | Agent calibration / reputation tracking | (novel; gap vs Palantir/Fincept) | `domains/finance/calibration_tracker.py` |
| KG-2.28 | Persona decision-heuristic enrichment | FinceptTerminal | `domains/finance/persona_heuristics.py` |
All three degrade cleanly offline (lazy engine + optional KG backend), are exported via
`domains/finance/__init__.py`, and are covered by tests under
`tests/unit/finance/`.