feat: Crisis Triage demo (tiered intelligence) #13
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37387314cc
if (position "COORDINATOR" response != false) or (position "coordinator" response != false) [
  report "coordinator"
]

if (position "EXPERT" response != false) or (position "expert" response != false) [
Parse only the ROUTE field from dispatcher output
extract-route-label searches the entire LLM response for tier keywords and returns the first tier whose keyword it finds, so a valid response like ROUTE: BASIC can be misread as expert or coordinator if those words appear in the REASON text. This produces incorrect routing decisions and distorts queue and load behavior whenever the rationale mentions other tiers, which dispatch explanations commonly do.
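A minimal sketch of the suggested fix, written in Python rather than NetLogo for illustration: only the `ROUTE:` line is read, so tier names mentioned in the REASON text cannot change the result. The function name mirrors `extract-route-label`, but the regex, the three tier labels (taken from the keywords in the NetLogo snippet above), and returning `None` when no ROUTE line exists are assumptions, not the demo's actual behavior.

```python
import re
from typing import Optional

# Match a line of the form "ROUTE: <TIER>", ignoring case and surrounding spaces.
# The tier names are assumed from the keywords the NetLogo reporter checks.
ROUTE_LINE = re.compile(
    r"^\s*ROUTE\s*:\s*(BASIC|COORDINATOR|EXPERT)\b",
    re.IGNORECASE | re.MULTILINE,
)


def extract_route_label(response: str) -> Optional[str]:
    """Return the tier named in the ROUTE field, or None if it is absent.

    Only the ROUTE line is inspected, so tier names that appear in the
    REASON text cannot override the actual routing decision.
    """
    match = ROUTE_LINE.search(response)
    if match is None:
        return None  # hypothetical: the caller chooses a fallback tier
    return match.group(1).lower()


if __name__ == "__main__":
    reply = "ROUTE: BASIC\nREASON: no need to involve the expert or coordinator tiers"
    print(extract_route_label(reply))  # basic
```

Anchoring on the field name rather than scanning the whole reply also keeps the parser stable when a model echoes the list of available tiers in its reasoning.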
if (position "collision" merged != false)
  or (position "toxic" merged != false)
  or (position "evacuation" merged != false)
  or (position "critical care" merged != false)
  or (position "structural" merged != false) [
Make heuristic severity keyword checks case-insensitive
The offline triage fallback does case-sensitive substring matching, so seeded incidents with mixed-case text can miss critical rules; for example, "Critical care equipment on backup power" does not match "critical care" and falls through to a lower severity. This weakens the documented deterministic fallback mode by under-triaging high-risk cases when LLM calls are unavailable.
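The fix amounts to normalizing case on both sides before the substring test. The Python sketch below is illustrative only: `matches_high_risk` is a hypothetical name, and the keyword tuple is copied from the fallback rule shown above. Which severity the demo assigns on a match is not shown here, so the function only reports whether the rule fires.

```python
# Keywords copied from the NetLogo fallback rule; the tuple name is hypothetical.
HIGH_RISK_KEYWORDS = ("collision", "toxic", "evacuation", "critical care", "structural")


def matches_high_risk(merged: str) -> bool:
    """Case-insensitive substring check over the merged incident text.

    casefold() normalizes both the text and each keyword, so
    "Critical care equipment" matches the keyword "critical care".
    """
    text = merged.casefold()
    return any(keyword.casefold() in text for keyword in HIGH_RISK_KEYWORDS)


if __name__ == "__main__":
    print(matches_high_risk("Critical care equipment on backup power"))  # True
```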
Code Review: Crisis Triage Demo ✅
Strengths:
Logic Review:
Minor Recommendations:
Status: APPROVED FOR MERGE ✅
… checks

Replace regex-based .nlogox parsing with xml.etree.ElementTree for proper XML validation.

Add three new test classes (19 tests):
- TestModelXmlParsing: validates CDATA sections, widget structure, button/monitor counts, and turtle shapes via real XML parsing
- TestModelStructure: asserts NetLogo version 7.0.3 and the required top-level sections (code, widgets, info, turtleShapes, linkShapes, previewCommands)
- TestBehaviorRegression: ensures list syntax for chat-with-template, no deprecated primitives, balanced to/end blocks, and globals and owns declarations

All 8 original tests preserved and passing (27 total).
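A sketch of the kinds of checks described above, assuming a minimal stand-in for a `.nlogox` file whose root element is `<model>` with the listed sections as direct children. The real file layout, the helper names `missing_sections`, `code_text`, and `blocks_balanced`, and the sample content are all hypothetical. Parsing with `xml.etree.ElementTree` rejects malformed XML with `ET.ParseError` and exposes CDATA content as ordinary element text, which a regex scan cannot do reliably.

```python
import xml.etree.ElementTree as ET

# Top-level sections listed in the commit message.
REQUIRED_SECTIONS = ("code", "widgets", "info", "turtleShapes", "linkShapes", "previewCommands")

# Hypothetical, heavily trimmed stand-in for a .nlogox model file.
SAMPLE = """<model version="NetLogo 7.0.3">
  <code><![CDATA[to setup
  clear-all
end

to-report tier-of [x]
  report x
end]]></code>
  <widgets></widgets>
  <info></info>
  <turtleShapes></turtleShapes>
  <linkShapes></linkShapes>
  <previewCommands></previewCommands>
</model>"""


def missing_sections(xml_text: str) -> list:
    """List required top-level sections absent from the parsed model.

    ET.fromstring raises ET.ParseError on malformed XML.
    """
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in REQUIRED_SECTIONS if name not in present]


def code_text(xml_text: str) -> str:
    """Return the NetLogo source; ElementTree surfaces CDATA as plain text."""
    node = ET.fromstring(xml_text).find("code")
    return node.text if node is not None and node.text else ""


def blocks_balanced(source: str) -> bool:
    """Count procedure openers (to / to-report) against closing `end` lines."""
    opens = closes = 0
    for line in source.splitlines():
        words = line.strip().split()
        if not words:
            continue
        if words[0] in ("to", "to-report"):
            opens += 1
        elif words[0] == "end":
            closes += 1
    return opens == closes


if __name__ == "__main__":
    print(missing_sections(SAMPLE))             # []
    print(blocks_balanced(code_text(SAMPLE)))   # True
```

The to/end count is a coarse, line-based heuristic: it assumes each procedure header and each `end` starts its own line.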
Complete rewrite of the crisis-triage demo with 30 incidents (10 misleading, 10 clear, 10 borderline), 3 dispatcher personas (Veteran, Rookie, Analyst), episode-based memory management, and A/B heuristic comparison.

Three targeted improvements from live testing:
- Routing prompt: explicit severity-to-tier mapping rules replace vague descriptions; expected to raise route accuracy from ~28% to ~50%+
- Visual pipeline: incidents flow through y-axis zones (spawn near dispatchers, stage after triage, move to responders when routed, go to the graveyard when resolved)
- Per-persona accuracy monitors: Veteran, Rookie, and Analyst accuracy shown in dedicated UI monitors

29 static validation tests pass.
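Per-persona accuracy reduces to correct decisions divided by total decisions, grouped by persona. A minimal sketch, assuming each decision is logged as a hypothetical `(persona, predicted, expected)` tuple; the demo's own bookkeeping lives in NetLogo and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PERSONAS = ("Veteran", "Rookie", "Analyst")


def persona_accuracy(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Fraction of correct decisions per persona.

    Each record is (persona, predicted, expected); this layout is a
    hypothetical stand-in. Personas with no decisions yet report 0.0
    so that every monitor always has a value to display.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for persona, predicted, expected in records:
        total[persona] += 1
        if predicted == expected:
            correct[persona] += 1
    return {p: (correct[p] / total[p] if total[p] else 0.0) for p in PERSONAS}
```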
The 3B model systematically under-triages incidents (defaulting to LOW), yielding ~22% triage accuracy. The 7B model should calibrate better across the 4-level severity scale.
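Overall accuracy alone does not show the direction of errors. A sketch with a hypothetical helper, `triage_stats`, that also reports the under-triage rate using the ordering LOW < MODERATE < HIGH < CRITICAL from the 4-level scale; the `(predicted, expected)` input format is assumed.

```python
from typing import Iterable, Tuple

# Ordinal ranks for the 4-level severity scale.
SEVERITY_ORDER = {"LOW": 0, "MODERATE": 1, "HIGH": 2, "CRITICAL": 3}


def triage_stats(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (accuracy, under_triage_rate) for (predicted, expected) pairs.

    Under-triage means the predicted level ranks below the expected one.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0, 0.0
    exact = sum(1 for p, e in pairs if p == e)
    under = sum(1 for p, e in pairs if SEVERITY_ORDER[p] < SEVERITY_ORDER[e])
    return exact / len(pairs), under / len(pairs)
```

A model that always answers LOW scores exactly the share of truly LOW incidents on accuracy while its under-triage rate covers every other incident, which is the pattern the two numbers together make visible.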
Triage template:
- Add severity calibration anchors: LOW = no injuries/routine; MODERATE = minor injuries/single agency; HIGH = significant injuries/infrastructure at risk; CRITICAL = life-threatening/multi-agency/cascading
- Simplify the output format to the severity word first, then a brief reason, instead of the rigid two-line SEVERITY/JUSTIFICATION format

Routing prompt:
- Show tier capacity context (e.g., BASIC: 2/9) so the model can judge whether a tier is nearly full or barely used
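A sketch of both prompt changes from the Python side, with hypothetical names: `parse_severity` reads the leading severity word of the simplified format, and `capacity_context` renders lines like `BASIC: 2/9`, assuming the two numbers are the tier's current load and its capacity. The tier names and numbers in the example are placeholders.

```python
import re
from typing import Dict, Optional, Tuple

LEVELS = ("LOW", "MODERATE", "HIGH", "CRITICAL")


def parse_severity(response: str) -> Optional[str]:
    """Read the severity word that the simplified format puts first.

    Returns None if the first word is not one of the four levels.
    """
    match = re.match(r"\s*([A-Za-z]+)", response)
    if match is None:
        return None
    word = match.group(1).upper()
    return word if word in LEVELS else None


def capacity_context(load: Dict[str, Tuple[int, int]]) -> str:
    """Render tier load as 'TIER: busy/capacity' lines for the routing prompt."""
    return "\n".join(f"{tier}: {busy}/{cap}" for tier, (busy, cap) in load.items())


if __name__ == "__main__":
    print(parse_severity("HIGH - bridge supports cracked"))  # HIGH
    print(capacity_context({"BASIC": (2, 9)}))               # BASIC: 2/9
```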
Implements a tiered intelligence system with LLM-based severity assessment and resource routing.
What it does:
Architecture:
NetLogo agent tiers + Python LLM integration for intelligent dispatch
Files:
Demonstrates: Hierarchical problem-solving with LLM-based routing