Feature/real agents #5

Open
sahana-sreeram wants to merge 2 commits into taugroup:main from sahana-sreeram:feature/real-agents

@sahana-sreeram

No description provided.

sahana-sreeram and others added 2 commits June 27, 2026 12:43
Replaces the round-based scientific meeting loop with a LangGraph policy
workflow (Policy Director -> stakeholder research -> synthesis ->
recommendation -> red-team -> revise -> forecast) exposed via a single
run_policy_analysis(request) -> PolicyRunResult entry point.

Foundation + mocked end-to-end skeleton so four workstreams can develop in
parallel against frozen Pydantic contracts and example fixtures:
- models.py: 18 frozen policy schemas (legacy meeting models retained)
- graph.py/orchestrator.py: LangGraph state graph + sequential fallback,
  bounded red-team revision loop
- context_builder.py: compact BriefingPacket (no transcript-to-every-agent)
- agents/: policy_director, stakeholder_research, implementation, red_team
  (mock mode; real Ollama path stubbed)
- retrieval.py: retrieve_policy_evidence seam (mock, dedup/rank/filter)
- storage.py (SQLite), source_scoring.py, logger model events
- app.py: general policy Streamlit UI + execute_policy_analysis adapter
- skills/, data/, examples/, evals/ (3 cases), tests/ (22 passing)
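The bounded red-team revision loop mentioned for graph.py/orchestrator.py could look roughly like the sketch below. It is not the PR's code: `Draft`, `revise_until_accepted`, `MAX_REVISIONS`, and the mock agents are hypothetical names, and the cap of 2 is an assumption since the commit does not state the bound.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_REVISIONS = 2  # hypothetical cap; the real bound is not stated in the commit


@dataclass
class Draft:
    text: str
    revisions: int = 0
    critiques: List[str] = field(default_factory=list)


def revise_until_accepted(
    draft: Draft,
    red_team: Callable[[Draft], List[str]],
    revise: Callable[[Draft, List[str]], Draft],
    max_revisions: int = MAX_REVISIONS,
) -> Draft:
    """Run red-team -> revise at most `max_revisions` times, then stop."""
    while draft.revisions < max_revisions:
        issues = red_team(draft)
        if not issues:  # nothing left to fix, exit early
            break
        draft = revise(draft, issues)
    return draft


# Mock agents, in the spirit of the commit's mock mode.
def mock_red_team(draft: Draft) -> List[str]:
    return [] if "cost estimate" in draft.text else ["missing cost estimate"]


def mock_revise(draft: Draft, issues: List[str]) -> Draft:
    return Draft(
        text=draft.text + " Adds a cost estimate.",
        revisions=draft.revisions + 1,
        critiques=draft.critiques + issues,
    )


final = revise_until_accepted(Draft("Expand bus lanes."), mock_red_team, mock_revise)
```

The loop exits either when the red team raises no issues or when the revision budget is spent, so a red team that never accepts a draft cannot stall the run.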

General-domain (not transportation-specific): forecasting is domain-gated via
a forecasters/ registry — numeric scenarios when a deterministic domain module
matches (transportation today), otherwise a qualitative directional outlook
with no fabricated numbers. No LLM arithmetic in forecasts.
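As an illustration of the domain gating described above, here is a minimal sketch of a forecaster registry. The names `Forecast`, `FORECASTERS`, `register`, and `forecast` are assumptions, and the ridership arithmetic is placeholder math, not the formula used in the forecasters/ package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Forecast:
    kind: str                      # "numeric" or "qualitative"
    summary: str
    value: Optional[float] = None  # only set by deterministic domain modules


# Hypothetical registry: domain name -> deterministic forecasting function.
FORECASTERS: Dict[str, Callable[[dict], Forecast]] = {}


def register(domain: str):
    """Decorator that adds a forecasting function to the registry."""
    def decorator(fn):
        FORECASTERS[domain] = fn
        return fn
    return decorator


@register("transportation")
def transportation_forecast(inputs: dict) -> Forecast:
    # Placeholder arithmetic done in plain code, never delegated to an LLM.
    riders = inputs["daily_riders"] * (1 + inputs["expected_growth"])
    return Forecast("numeric", "Projected daily riders", round(riders, 1))


def forecast(domain: str, inputs: dict) -> Forecast:
    """Use a registered domain module if one matches, else stay qualitative."""
    fn = FORECASTERS.get(domain)
    if fn is None:
        return Forecast("qualitative", f"Directional outlook only for '{domain}'; no numbers.")
    return fn(inputs)
```

Calling `forecast("education", {})` returns a qualitative result with `value` left as `None`, which is one way to guarantee that unmatched domains never carry fabricated numbers.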

Local-first by default (MOCK_MODE needs no model); optional frontier fallback
off by default. Preserves upstream MIT license + TAU Group attribution.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Real intelligence behind per-component flags (mock stays default + auto-fallback,
so the app always runs), plus the new agent roster with orchestrator-decided skills.

Agent framework (Person A):
- config.py: per-component MOCK_DIRECTOR/RESEARCH/ANALYSIS flags.
- agent_builder.py: run_structured() shared wrapper — lazy Ollama, JSON-mode,
  Pydantic validation, retry, optional frontier fallback, ModelEvent logging;
  never raises (returns None -> callers fall back to mock). Injectable for tests.
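The "never raises, returns None" contract of run_structured() can be sketched as follows. This is a simplified stand-in, not the code in agent_builder.py: the signature is assumed, the model call is an injected callable rather than an Ollama client, a plain `validate` function stands in for Pydantic validation, and the frontier fallback and ModelEvent logging are omitted.

```python
import json
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_structured(
    prompt: str,
    call_model: Callable[[str], str],
    validate: Callable[[dict], T],
    retries: int = 2,
) -> Optional[T]:
    """Return a validated object, or None if every attempt fails. Never raises."""
    for _ in range(retries + 1):
        try:
            raw = call_model(prompt)          # injected; a real client would go here
            return validate(json.loads(raw))  # schema check stands in for Pydantic
        except Exception:
            continue  # swallow the error and retry; the caller handles None
    return None


def plan_or_mock(prompt: str, call_model: Callable[[str], str]) -> dict:
    """Hypothetical caller: fall back to a mock plan when the model path fails."""
    def validate(data: dict) -> dict:
        if not isinstance(data.get("tasks"), list):
            raise ValueError("tasks must be a list")
        return data

    result = run_structured(prompt, call_model, validate)
    return result if result is not None else {"tasks": [], "source": "mock"}
```

Because the model call is a parameter, tests can pass a lambda that returns canned JSON or garbage, which matches the "injectable for tests" note in the commit.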

Roster (Director + 3 workers + Red-Team):
- skills_registry.py: reads skills/*/SKILL.md into a catalog.
- Policy Director now assigns each task an agent_type AND a skill set chosen from
  the registry (skills are orchestrator-decided, not hardcoded).
- New Research agent (objective cited evidence) runs as its own graph phase.
- Stakeholder agent loads the task's assigned skills.
- Data Analyst is now the canonical name for the analysis/recommendation role.
- Red-Team kept for the revision loop.
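A registry that reads skills/*/SKILL.md into a catalog might be sketched like this. The function names, the choice of the first non-empty line as the summary, and the `cost_benefit` skill are all assumptions for illustration; the actual skills_registry.py may parse the files differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def load_skill_catalog(skills_dir: Path) -> Dict[str, str]:
    """Map each skill folder name to the first non-empty line of its SKILL.md."""
    catalog: Dict[str, str] = {}
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        lines = [ln.strip() for ln in skill_file.read_text(encoding="utf-8").splitlines()]
        # Drop leading markdown heading markers from the summary line.
        summary = next((ln.lstrip("# ").strip() for ln in lines if ln), "")
        catalog[skill_file.parent.name] = summary
    return catalog


def select_skills(requested: List[str], catalog: Dict[str, str]) -> List[str]:
    """Keep only skills that exist in the catalog, preserving request order."""
    return [name for name in requested if name in catalog]


# Demo against a throwaway directory with one hypothetical skill.
root = Path(tempfile.mkdtemp())
(root / "cost_benefit").mkdir()
(root / "cost_benefit" / "SKILL.md").write_text(
    "# Cost-benefit analysis\nSteps go here.\n", encoding="utf-8"
)
catalog = load_skill_catalog(root)
assigned = select_skills(["made_up_skill", "cost_benefit"], catalog)
```

Filtering the Director's requested skills against the catalog is one way to keep skills orchestrator-decided while still rejecting names that do not exist on disk.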

Orchestration:
- graph/orchestrator: added the research phase (plan -> research -> stakeholder
  -> synthesize -> recommend -> red_team -> forecast); fallback executor updated.
- models.py (additive): AgentType, PolicyTask.agent_type + skills, ResearchBrief,
  PolicyRunResult.research_briefs.

Product:
- forecasters/housing.py: second deterministic domain (registry not transport-only).
- utils.export_policy_brief() (pure python-docx, offline); removed import-time
  pypandoc network download.
- app.py: research section, per-stakeholder assigned-skills display, run history,
  DOCX brief download.

Verified here (no Ollama/agno): 33 tests pass; evals 100% in mock and
real-flags-fallback modes. Defaults stay mock; flip POLICY_MOCK_*=0 with Ollama
to go live.
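A per-component flag reader consistent with the note above could look like the sketch below. The variable names `POLICY_MOCK_DIRECTOR`, `POLICY_MOCK_RESEARCH`, and `POLICY_MOCK_ANALYSIS` are inferred from the `POLICY_MOCK_*` pattern and the component list in config.py; they and the `mock_flags` helper are assumptions, not confirmed names.

```python
import os
from typing import Dict, Mapping

COMPONENTS = ("DIRECTOR", "RESEARCH", "ANALYSIS")


def mock_flags(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Read POLICY_MOCK_<COMPONENT>; any value other than "0" keeps mock mode on."""
    return {
        name.lower(): env.get(f"POLICY_MOCK_{name}", "1") != "0"
        for name in COMPONENTS
    }
```

With an empty environment every component stays mocked, and passing `{"POLICY_MOCK_RESEARCH": "0"}` turns off only the research mock, so components can go live one at a time.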

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>