diff --git a/demos/crisis-triage/README.md b/demos/crisis-triage/README.md
new file mode 100644
index 0000000..d7c5509
--- /dev/null
+++ b/demos/crisis-triage/README.md
@@ -0,0 +1,100 @@
+# Demo 2: Crisis Triage with Ambiguous Incidents
+
+A municipal emergency operations center in which LLM-powered dispatchers assess ambiguous crisis reports. It demonstrates that keyword matching fails when incidents are deliberately misleading, while LLMs that read the full impact description can succeed.
+
+Target runtime: NetLogo 7.0.3 (`.nlogox` model format).
+
+## The Story
+
+Three dispatchers (Veteran, Rookie, and Analyst) receive a stream of crisis incidents. Each must assess an incident's severity (LOW, MODERATE, HIGH, or CRITICAL) and route it to a response tier (BASIC, EXPERT, or COORDINATOR). The incident bank includes **misleading cases** where surface keywords don't match reality:
+
+- "Toxic chemical spill at school" → actually spilled vinegar (LOW severity)
+- "Minor water leak in basement" → threatening a neonatal ICU (CRITICAL severity)
+- "Dog loose on highway" → causing a multi-vehicle pileup (HIGH severity)
+
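+A naive keyword heuristic scans the summary and impact text for alarm words. It over-triggers on words such as "fire", "collapse", and "casualty", and it falls back to MODERATE when a serious impact is described without them (the neonatal ICU leak). It gets only 3 of the 10 misleading incidents right; "Dog loose on highway" is one of them, because its impact text mentions "injuries". The LLM reads the full impact description and can assess these cases correctly.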
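+
+A quick way to see this is to call the model's `heuristic-triage` reporter from the Command Center. The strings below are copied from the incident bank, and the comments give each incident's ground-truth severity for comparison.
+
+```netlogo
+;; Observer context (Command Center). heuristic-triage scans summary + impact together.
+show heuristic-triage "Minor water leak in basement" "Leak is in the electrical vault supplying the neonatal ICU; backup generators have 45 minutes of fuel"
+;; reports "MODERATE" (no alarm words found); ground truth is CRITICAL
+show heuristic-triage "Server room fire alarm triggered" "Sensor malfunction confirmed; no smoke, no heat, systems normal"
+;; reports "CRITICAL" (matches "fire"); ground truth is LOW
+```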
+
+## Quick Start
+
+1. Edit `config.txt` with your provider credentials (default: local Ollama).
+2. Open `crisis-triage.nlogox` in NetLogo 7.0.3.
+3. Click **setup** → dispatchers appear with persona labels, responders by tier.
+4. Click **go** → incidents spawn, flow through the pipeline, monitors update.
+5. Watch the output log for `[TRIAGE]`, `[ROUTE]`, and `[REFLECT]` messages.
+
+## How to Use
+
+### Controls
+
+| Control | Type | Purpose |
+|---------|------|---------|
+| `use-llm?` | Switch | Toggle between LLM dispatchers and naive heuristic |
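+| `memory-mode` | Chooser | `persistent` (history accumulates), `per-episode` (history cleared and persona re-injected at each episode boundary), or `none` (history cleared at each boundary without re-injecting the persona) |
+| `reflection-interval` | Slider | Ticks between dispatcher self-reflections (0 = off; LLM mode only) |
+| `incident-rate` | Slider | Chance (%) that a new incident spawns each tick |
+| `episode-length` | Slider | Ticks per episode; memory resets happen at episode boundaries (0 = no episodes, so `memory-mode` has no effect) |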
+| `add incident` | Button | Manually inject a random incident |
+| `force reflect` | Button | Trigger immediate reflection for all dispatchers |
+
+### What to Observe
+
+- **Misleading%** — The key metric. Accuracy on misleading incidents where keywords don't match reality.
+- **Triage Acc%** / **Route Acc%** — Overall accuracy vs ground truth.
+- **Accuracy Over Time** plot — Watch how accuracy evolves, especially with memory.
+- **Per-persona differences** — Veteran, Rookie, and Analyst may perform differently.
+- **Reflection log** — Dispatchers reason about their own performance.
+
+## The A/B Experiment
+
+1. Run with `use-llm?` ON for 50+ ticks. Note the **Misleading%** metric.
+2. Toggle `use-llm?` OFF, click **setup** again, and run for 50+ ticks.
+3. Compare:
+   - **Heuristic**: ~30% on misleading cases (keywords mislead it).
+   - **LLM**: expected ~70%+ on misleading cases (it reads the actual impact).
+4. Compare memory modes: run with `persistent` vs `none` over multiple episodes (`episode-length` must be greater than 0).
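+
+The memory comparison in step 4 can also be scripted. The procedure below is a hypothetical helper, not part of `crisis-triage.nlogox`. Paste it into the Code tab, then run, for example, `compare-memory-modes 150` from the Command Center with `use-llm?` ON and `episode-length` greater than 0. It assumes that `setup` resets the metric globals and may clear the Output area, so it prints its results to the Command Center at the end.
+
+```netlogo
+;; Hypothetical helper: runs each memory mode for the same number of ticks.
+to compare-memory-modes [ n-ticks ]
+  let results []
+  foreach [ "persistent" "per-episode" "none" ] [ mode ->
+    set memory-mode mode        ;; interface choosers can be set from code
+    random-seed 42              ;; same starting seed for each run
+    setup
+    repeat n-ticks [ go ]
+    set results lput (word mode ": triage " triage-accuracy "%, misleading " misleading-accuracy "%") results
+  ]
+  foreach results [ r -> print r ]
+end
+```
+
+LLM replies are not deterministic, so repeat each mode a few times before reading much into small differences.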
+
+## LLM Primitives Exercised (8)
+
+| Primitive | Where | Paper Concept |
+|-----------|-------|---------------|
+| `llm:load-config` | `setup-llm` | Config management |
+| `llm:set-history` | `setup-dispatchers` — persona injection | Personalization (Ch.2) |
+| `llm:chat-with-template` | `triage-my-incidents` — severity assessment | Environment/Interface (Ch.1) |
+| `llm:choose` | `route-my-incidents` — bounded tier selection | Bounded Rationality |
+| `llm:history` | `dispatcher-reflect` — check history length | Memory (Ch.3) |
+| `llm:chat` | `dispatcher-reflect` — freeform reflection | Reflection (Ch.3) |
+| `llm:clear-history` | `handle-episode-boundary` — configurable reset | Memory ablation |
+| `llm:active` | Monitor widget — show provider/model | Provider awareness |
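+
+The model calls the three memory primitives in each dispatcher's own context, which suggests the extension keeps one history per agent. This README does not verify that assumption. As a sketch, after **setup** with `use-llm?` ON, you can inspect one dispatcher's history from the Command Center. You can then reset it the same way `handle-episode-boundary` does in `per-episode` mode. Note that this really does reset that dispatcher's memory in the running simulation.
+
+```netlogo
+;; Observer context. Mirrors the "per-episode" reset in handle-episode-boundary.
+ask one-of dispatchers [
+  show length llm:history                                ;; messages stored so far
+  llm:clear-history                                      ;; drop all history
+  llm:set-history (list (list "system" persona-prompt))  ;; re-inject the persona
+  show length llm:history
+]
+```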
+
+## Design Rationale
+
+**Why dispatchers use LLM, not responders**: Triage and routing are judgment calls where reading context matters. Case processing is mechanical — it doesn't benefit from language understanding.
+
+**Why no thinking/reasoning models**: With 3 dispatchers each making up to two sequential LLM calls per tick (one triage, one routing), plus periodic reflections, thinking models could add minutes of latency per tick. Triage here is classification, not multi-step reasoning, so standard `llm:chat-with-template` and `llm:choose` are the right tools.
+
+**Why `llm:choose` for routing**: Guarantees the output is one of the valid tier names, avoiding parsing failures from freeform text.
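+
+A minimal sketch of the same pattern outside the model's routing code. It uses the tier list and the failure fallback that `route-my-incidents` uses. The prompt text is illustrative. Run it from the Command Center after **setup** with `use-llm?` ON:
+
+```netlogo
+;; Observer context. llm:choose reports one of the listed strings.
+ask one-of dispatchers [
+  let choices [ "BASIC" "EXPERT" "COORDINATOR" "HOLD" ]
+  let tier ""
+  carefully [
+    set tier llm:choose "Severity: HIGH. Route this incident to one response tier." choices
+  ] [
+    set tier heuristic-route "HIGH"   ;; same fallback route-my-incidents uses; reports "EXPERT"
+  ]
+  show tier
+]
+```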
+
+**Why misleading incidents**: They make the LLM genuinely necessary. Without them, keyword matching achieves similar accuracy and the LLM adds cost without value.
+
+## Paper Connection
+
+This demo implements concepts from the Gao et al. (2312.11970) LLM-ABM survey:
+
+- **Personalization** (Ch.2): Dispatcher personas via `llm:set-history` produce different decisions from the same model.
+- **Bounded Rationality**: `llm:choose` constrains decisions to valid options.
+- **Memory** (Ch.3): Configurable memory modes show how history retention affects performance.
+- **Reflection** (Ch.3): Dispatchers reason about their own accuracy and identify patterns.
+- **Environment/Interface** (Ch.1): Templates structure how agents perceive incidents.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `crisis-triage.nlogox` | NetLogo 7 simulation model |
+| `triage-template.yaml` | Severity assessment prompt with anti-keyword-bias guidance |
+| `dispatcher-template.yaml` | Documentation stub (routing uses `llm:choose`) |
+| `config.txt` | LLM provider configuration |
+
+## Provider Configuration
+
+Default is local Ollama (no API key needed). See commented examples in `config.txt` for OpenAI, Claude, and Gemini. Never commit real API keys.
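+
+After editing `config.txt`, you can check which provider and model are active from the Command Center. This is a sketch. It assumes `llm:load-config` takes the config file path as a string, resolved relative to the model file; the exact call inside the model's `setup-llm` is not shown here.
+
+```netlogo
+;; Observer context (Command Center). The path argument is an assumption.
+llm:load-config "config.txt"
+show llm-status   ;; the model's reporter that wraps llm:active
+```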
diff --git a/demos/crisis-triage/config.txt b/demos/crisis-triage/config.txt
new file mode 100644
index 0000000..e927166
--- /dev/null
+++ b/demos/crisis-triage/config.txt
@@ -0,0 +1,34 @@
+# Crisis Triage Demo LLM configuration
+# This file is loaded by crisis-triage.nlogox via llm:load-config
+
+# Recommended local/default option (no cloud key required)
+provider=ollama
+model=qwen2.5:7b
+base_url=http://localhost:11434
+
+# Runtime behavior
+temperature=0.2
+max_tokens=200
+timeout_seconds=45
+
+# Optional cloud fallback examples (commented)
+# provider=openai
+# api_key=YOUR_OPENAI_API_KEY_HERE
+# model=gpt-4o-mini
+# temperature=0.2
+# max_tokens=200
+# timeout_seconds=45
+
+# provider=claude
+# api_key=YOUR_ANTHROPIC_API_KEY_HERE
+# model=claude-3-5-haiku-latest
+# temperature=0.2
+# max_tokens=200
+# timeout_seconds=45
+
+# provider=gemini
+# api_key=YOUR_GEMINI_API_KEY_HERE
+# model=gemini-2.0-flash
+# temperature=0.2
+# max_tokens=200
+# timeout_seconds=45
diff --git a/demos/crisis-triage/crisis-triage.nlogox b/demos/crisis-triage/crisis-triage.nlogox
new file mode 100644
index 0000000..3b84ec6
--- /dev/null
+++ b/demos/crisis-triage/crisis-triage.nlogox
@@ -0,0 +1,1117 @@
+
+
+
+ create-dispatchers 1 [
+ set persona-name item 0 p
+ set persona-prompt item 1 p
+ set my-triaged 0
+ set my-correct-triage 0
+ set my-routed 0
+ set my-correct-route 0
+ set shape "person"
+ set size 2.5
+ set color blue + 2
+ setxy px 14
+ set label persona-name
+ set px px + 7
+
+ ;; Inject persona via llm:set-history if LLM is active
+ if llm-ready? and use-llm? [
+ carefully [
+ llm:set-history (list
+ (list "system" persona-prompt)
+ )
+ ] [
+ output-print (word "[SETUP] Failed to set history for " persona-name ": " error-message)
+ ]
+ ]
+ ]
+ ]
+end
+
+;; ---------------------------------------------------------------------------
+;; Setup Responders (3 BASIC cap=3, 3 EXPERT cap=2, 3 COORDINATOR cap=1)
+;; ---------------------------------------------------------------------------
+
+to setup-responders
+ let base-x -12
+ ;; BASIC responders
+ create-responders 3 [
+ set tier "BASIC"
+ set capacity 3
+ set current-load 0
+ set resolved-count 0
+ set shape "circle"
+ set size 1.5
+ set color green + 1
+ set label "B"
+ ]
+ let idx 0
+ ask responders with [ tier = "BASIC" ] [
+ setxy (base-x + idx * 3) -12
+ set idx idx + 1
+ ]
+
+ ;; EXPERT responders
+ create-responders 3 [
+ set tier "EXPERT"
+ set capacity 2
+ set current-load 0
+ set resolved-count 0
+ set shape "circle"
+ set size 1.8
+ set color orange + 1
+ set label "E"
+ ]
+ set idx 0
+ ask responders with [ tier = "EXPERT" ] [
+ setxy (base-x + 10 + idx * 3) -12
+ set idx idx + 1
+ ]
+
+ ;; COORDINATOR responders
+ create-responders 3 [
+ set tier "COORDINATOR"
+ set capacity 1
+ set current-load 0
+ set resolved-count 0
+ set shape "circle"
+ set size 2.1
+ set color violet + 1
+ set label "C"
+ ]
+ set idx 0
+ ask responders with [ tier = "COORDINATOR" ] [
+ setxy (base-x + 20 + idx * 3) -12
+ set idx idx + 1
+ ]
+end
+
+;; ---------------------------------------------------------------------------
+;; Incident Bank (30 incidents: 10 misleading + 10 clear + 10 borderline)
+;; ---------------------------------------------------------------------------
+
+to build-incident-bank
+ ;; Each entry: [summary impact ground-truth-severity ground-truth-tier category]
+ ;; MISLEADING: keywords suggest one severity but actual impact warrants another
+ set incident-bank (list
+ ;; --- MISLEADING (10): keywords mislead naive classifiers ---
+ (list "Server room fire alarm triggered"
+ "Sensor malfunction confirmed; no smoke, no heat, systems normal"
+ "LOW" "BASIC" "misleading")
+ (list "Toxic chemical spill reported at school"
+ "Custodian knocked over a bottle of vinegar in the cafeteria; no injuries"
+ "LOW" "BASIC" "misleading")
+ (list "Building collapse alert downtown"
+ "Scaffolding panel fell on empty sidewalk at 3 AM; no structural damage to building"
+ "LOW" "BASIC" "misleading")
+ (list "Mass casualty incident at concert venue"
+ "Three people fainted from heat; all conscious, first aid on scene"
+ "MODERATE" "BASIC" "misleading")
+ (list "Explosion heard near hospital"
+ "Transformer blew on adjacent street; hospital on backup power, no injuries"
+ "MODERATE" "EXPERT" "misleading")
+ (list "Data center cooling failure"
+ "Regional hospital patient records, 911 dispatch system, and pharmacy networks all depend on this center; 30 minutes to critical thermal threshold"
+ "CRITICAL" "COORDINATOR" "misleading")
+ (list "Minor water leak in basement"
+ "Leak is in the electrical vault supplying the neonatal ICU; backup generators have 45 minutes of fuel"
+ "CRITICAL" "COORDINATOR" "misleading")
+ (list "Small kitchen fire at restaurant"
+ "Fire spreading to adjacent apartment building; 40 residents trapped above; fire department 20 minutes away"
+ "CRITICAL" "COORDINATOR" "misleading")
+ (list "Routine power fluctuation reported"
+ "Affecting traffic signals across 12 intersections during school dismissal; two near-miss accidents already"
+ "HIGH" "EXPERT" "misleading")
+ (list "Dog loose on highway"
+ "Causing multi-vehicle chain reaction on I-95; 6 cars involved, injuries reported, highway blocked both directions"
+ "HIGH" "EXPERT" "misleading")
+
+ ;; --- CLEAR (10): keywords and impact align ---
+ (list "Multi-vehicle pileup on interstate"
+ "12 vehicles, multiple injuries confirmed, highway fully blocked, EMS requesting additional units"
+ "CRITICAL" "COORDINATOR" "clear")
+ (list "Warehouse fire with toxic plume"
+ "Residential area downwind being evacuated; 500+ people displaced; air quality hazardous"
+ "CRITICAL" "COORDINATOR" "clear")
+ (list "Earthquake damage to bridge"
+ "Visible structural cracks; bridge closed; 50,000 daily commuters affected; engineers en route"
+ "CRITICAL" "COORDINATOR" "clear")
+ (list "School bus accident with injuries"
+ "Bus overturned; 8 children with minor-moderate injuries; parents arriving at scene"
+ "HIGH" "EXPERT" "clear")
+ (list "Chemical plant pressure valve failure"
+ "Controlled venting in progress; shelter-in-place advisory for 2-mile radius; monitoring air quality"
+ "HIGH" "EXPERT" "clear")
+ (list "Hospital generator test failure"
+ "Backup generator failed routine test; primary power stable; repair crew dispatched for same-day fix"
+ "MODERATE" "BASIC" "clear")
+ (list "Broken water main on residential street"
+ "Low-pressure water to 30 homes; repair crew en route; estimated 4-hour fix"
+ "MODERATE" "BASIC" "clear")
+ (list "Traffic signal malfunction at intersection"
+ "Single intersection flashing red; police directing traffic; no accidents"
+ "LOW" "BASIC" "clear")
+ (list "Park trail flooding after rain"
+ "Trails closed; no hikers in area; water receding naturally"
+ "LOW" "BASIC" "clear")
+ (list "Streetlight outage on residential block"
+ "Six streetlights out; residents notified; maintenance scheduled for morning"
+ "LOW" "BASIC" "clear")
+
+ ;; --- BORDERLINE (10): genuinely ambiguous, reasonable people could disagree ---
+ (list "Subway train stalled between stations"
+ "200 passengers stuck for 25 minutes; ventilation working; rescue train dispatched; some passengers anxious"
+ "MODERATE" "EXPERT" "borderline")
+ (list "Power outage at nursing home"
+ "Backup generator active; 60 residents comfortable; generator fuel for 8 hours; utility ETA unknown"
+ "HIGH" "EXPERT" "borderline")
+ (list "Gas smell reported near elementary school"
+ "School in session; gas company en route; no readings yet; precautionary evacuation being considered"
+ "HIGH" "EXPERT" "borderline")
+ (list "Protest blocking major intersection"
+ "500 people; peaceful but not dispersing; ambulance rerouting adds 8 minutes to hospital route"
+ "MODERATE" "EXPERT" "borderline")
+ (list "Crane malfunction at construction site"
+ "Crane arm stuck over occupied building; no immediate danger but wind advisory in effect for afternoon"
+ "HIGH" "EXPERT" "borderline")
+ (list "River level rising near flood stage"
+ "2 feet below flood level; rain expected to continue 6 hours; 200 homes in potential flood zone"
+ "HIGH" "COORDINATOR" "borderline")
+ (list "Suspicious package at government building"
+ "Building evacuated; bomb squad 15 minutes away; 300 workers displaced; likely false alarm based on description"
+ "MODERATE" "EXPERT" "borderline")
+ (list "Internet outage affecting emergency services"
+ "911 calls routing to backup center; 12-second additional delay per call; estimated 2-hour repair"
+ "HIGH" "EXPERT" "borderline")
+ (list "Heat wave shelter capacity reached"
+ "Main cooling center full at 150 people; overflow into library planned; 3 elderly residents showing heat stress"
+ "MODERATE" "EXPERT" "borderline")
+ (list "Airport runway incursion reported"
+ "Ground vehicle crossed active runway; no aircraft in immediate path; runway closed for inspection"
+ "MODERATE" "EXPERT" "borderline")
+ )
+end
+
+;; ===========================================================================
+;; GO LOOP
+;; ===========================================================================
+
+to go
+ ;; Episode boundary check
+ handle-episode-boundary
+
+ ;; Spawn new incidents
+ if random 100 < incident-rate [
+ spawn-incident
+ ]
+
+ ;; Dispatchers triage and route
+ ask dispatchers [
+ triage-my-incidents
+ route-my-incidents
+ ]
+
+ ;; Responders process active cases
+ process-active-cases
+
+ ;; Check deadlines
+ check-deadlines
+
+ ;; Reflection at intervals
+ if reflection-interval > 0 and ticks > 0 and ticks mod reflection-interval = 0 [
+ ask dispatchers [
+ dispatcher-reflect
+ ]
+ ]
+
+ set episode-tick-counter episode-tick-counter + 1
+ tick
+end
+
+;; ===========================================================================
+;; INCIDENT SPAWNING
+;; ===========================================================================
+
+to spawn-incident
+ let picked one-of incident-bank
+ create-incidents 1 [
+ set summary item 0 picked
+ set impact item 1 picked
+ set ground-truth-severity item 2 picked
+ set ground-truth-tier item 3 picked
+ set incident-category item 4 picked
+ set assessed-severity ""
+ set assessed-tier ""
+ set queue-state "new"
+ set triage-correct? false
+ set route-correct? false
+ set created-at ticks
+ set assigned-responder nobody
+
+ ;; Deadline: severity-dependent time window
+ let window severity-deadline ground-truth-severity
+ set deadline ticks + window
+
+ set shape "circle"
+ set size 1.0
+ set color yellow
+ setxy (random-xcor * 0.5) (9 + random 3)
+ set label ""
+ ]
+end
+
+;; Manual incident injection button
+to add-incident
+ spawn-incident
+ output-print "[MANUAL] Incident added"
+end
+
+to-report severity-deadline [ sev ]
+ if sev = "LOW" [ report 30 ]
+ if sev = "MODERATE" [ report 20 ]
+ if sev = "HIGH" [ report 12 ]
+ report 8 ;; CRITICAL
+end
+
+;; ===========================================================================
+;; TRIAGE (dispatchers assess severity via llm:chat-with-template)
+;; ===========================================================================
+
+to triage-my-incidents
+ ;; Each dispatcher picks one untriaged incident per tick
+ let target one-of incidents with [ queue-state = "new" ]
+ if target = nobody [ stop ]
+
+ let sev ""
+
+ ifelse llm-ready? and use-llm? [
+ ;; LLM triage via template
+ carefully [
+ let response llm:chat-with-template triage-template-path (list
+ (list "persona" persona-prompt)
+ (list "episode" (word current-episode))
+ (list "tick" (word ticks))
+ (list "incident" [summary] of target)
+ (list "impact" [impact] of target)
+ )
+ set sev extract-severity response
+ output-print (word "[TRIAGE:" persona-name "] " [summary] of target " -> " sev)
+ ] [
+ output-print (word "[TRIAGE:" persona-name "] LLM failed: " error-message)
+ set sev ""
+ ]
+ ] [
+ ;; Heuristic triage (naive keyword matching — deliberately bad on misleading cases)
+ set sev heuristic-triage [summary] of target [impact] of target
+ output-print (word "[TRIAGE:heuristic] " [summary] of target " -> " sev)
+ ]
+
+ ;; Fallback if empty
+ if sev = "" [ set sev "MODERATE" ]
+
+ ;; Score
+ let truth [ground-truth-severity] of target
+ let is-correct? (sev = truth)
+
+ set total-triaged total-triaged + 1
+ set my-triaged my-triaged + 1
+ if is-correct? [
+ set correct-triage correct-triage + 1
+ set my-correct-triage my-correct-triage + 1
+ ]
+ if [incident-category] of target = "misleading" [
+ set misleading-triaged misleading-triaged + 1
+ if is-correct? [ set misleading-correct misleading-correct + 1 ]
+ ]
+
+ ask target [
+ set assessed-severity sev
+ set triage-correct? is-correct?
+ set queue-state "triaged"
+ set color severity-color sev
+ setxy xcor (3 + random 3)
+ ]
+end
+
+;; Heuristic triage: deliberately naive keyword matching
+to-report heuristic-triage [ s i ]
+ let text (word s " " i)
+ ;; Keywords that trigger high severity regardless of actual impact
+ if has-word? text "fire" [ report "CRITICAL" ]
+ if has-word? text "explosion" [ report "CRITICAL" ]
+ if has-word? text "collapse" [ report "CRITICAL" ]
+ if has-word? text "toxic" [ report "CRITICAL" ]
+ if has-word? text "casualty" [ report "CRITICAL" ]
+ if has-word? text "chemical" [ report "HIGH" ]
+ if has-word? text "trapped" [ report "CRITICAL" ]
+ if has-word? text "spill" [ report "HIGH" ]
+ if has-word? text "suspicious" [ report "HIGH" ]
+ if has-word? text "earthquake" [ report "CRITICAL" ]
+ if has-word? text "flood" [ report "HIGH" ]
+ if has-word? text "outage" [ report "HIGH" ]
+ if has-word? text "injuries" [ report "HIGH" ]
+ if has-word? text "accident" [ report "HIGH" ]
+ if has-word? text "alarm" [ report "HIGH" ]
+ if has-word? text "evacuat" [ report "CRITICAL" ]
+ ;; Default for anything without scary keywords
+ report "MODERATE"
+end
+
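+to-report has-word? [ text word-fragment ]
+ ;; Matches the keyword as written or with its first letter capitalized,
+ ;; so "toxic" also matches "Toxic" at the start of a summary
+ report ((position word-fragment text) != false) or ((position (capitalize-first word-fragment) text) != false)
+end
+
+to-report capitalize-first [ s ]
+ ;; NetLogo has no built-in case conversion, so map the first character
+ ;; through parallel lowercase/uppercase alphabets
+ if empty? s [ report s ]
+ let i position (first s) "abcdefghijklmnopqrstuvwxyz"
+ if i = false [ report s ]
+ report word (item i "ABCDEFGHIJKLMNOPQRSTUVWXYZ") (but-first s)
+end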
+
+to-report extract-severity [ response ]
+ if position "CRITICAL" response != false [ report "CRITICAL" ]
+ if position "HIGH" response != false [ report "HIGH" ]
+ if position "MODERATE" response != false [ report "MODERATE" ]
+ if position "LOW" response != false [ report "LOW" ]
+ report ""
+end
+
+to-report severity-color [ sev ]
+ if sev = "LOW" [ report 55 ] ;; green
+ if sev = "MODERATE" [ report 45 ] ;; yellow
+ if sev = "HIGH" [ report 25 ] ;; orange
+ if sev = "CRITICAL" [ report 15 ] ;; red
+ report 5 ;; grey
+end
+
+;; ===========================================================================
+;; ROUTING (dispatchers route via llm:choose)
+;; ===========================================================================
+
+to route-my-incidents
+ let target one-of incidents with [ queue-state = "triaged" ]
+ if target = nobody [ stop ]
+
+ let chosen-tier ""
+ let choices (list "BASIC" "EXPERT" "COORDINATOR" "HOLD")
+
+ ifelse llm-ready? and use-llm? [
+ ;; LLM routing via llm:choose
+ carefully [
+ let prompt (word
+ "Incident: " [summary] of target "\n"
+ "Severity: " [assessed-severity] of target "\n"
+ "Impact: " [impact] of target "\n"
+ "Current load — BASIC: " count-active-tier "BASIC" "/9"
+ ", EXPERT: " count-active-tier "EXPERT" "/6"
+ ", COORDINATOR: " count-active-tier "COORDINATOR" "/3" "\n"
+ "Routing rules based on severity:\n"
+ " - LOW severity -> BASIC\n"
+ " - MODERATE severity -> BASIC (or EXPERT if BASIC is full)\n"
+ " - HIGH severity -> EXPERT\n"
+ " - CRITICAL severity -> COORDINATOR\n"
+ " - HOLD only if the appropriate tier AND all higher tiers are at capacity.\n"
+ "The assessed severity for this incident is " [assessed-severity] of target ". Apply the rules above."
+ )
+ set chosen-tier llm:choose prompt choices
+ output-print (word "[ROUTE:" persona-name "] " [summary] of target " -> " chosen-tier)
+ ] [
+ output-print (word "[ROUTE:" persona-name "] LLM choose failed: " error-message)
+ set chosen-tier ""
+ ]
+ ] [
+ ;; Heuristic routing
+ set chosen-tier heuristic-route [assessed-severity] of target
+ output-print (word "[ROUTE:heuristic] " [summary] of target " -> " chosen-tier)
+ ]
+
+ if chosen-tier = "" [ set chosen-tier heuristic-route [assessed-severity] of target ]
+ if chosen-tier = "HOLD" [
+ output-print (word "[HOLD] " [summary] of target " — waiting for capacity")
+ stop
+ ]
+
+ ;; Find available responder in chosen tier
+ let worker find-responder chosen-tier
+ if worker = nobody [
+ ;; Try escalation
+ set worker find-responder escalation-tier chosen-tier
+ if worker != nobody [
+ set total-escalated total-escalated + 1
+ set chosen-tier [tier] of worker
+ ]
+ ]
+ if worker = nobody [ stop ] ;; No capacity anywhere
+
+ ;; Score routing
+ let truth [ground-truth-tier] of target
+ let is-correct? (chosen-tier = truth)
+ set total-routed total-routed + 1
+ set my-routed my-routed + 1
+ if is-correct? [
+ set correct-route correct-route + 1
+ set my-correct-route my-correct-route + 1
+ ]
+
+ ask worker [
+ set current-load current-load + 1
+ ]
+
+ ask target [
+ set assessed-tier chosen-tier
+ set route-correct? is-correct?
+ set queue-state "active"
+ set assigned-responder worker
+ ;; Move toward responder zone
+ setxy ([xcor] of worker + random-float 2 - 1) ([ycor] of worker + 3)
+ set label ""
+ ]
+end
+
+to-report heuristic-route [ sev ]
+ if sev = "LOW" [ report "BASIC" ]
+ if sev = "MODERATE" [ report "BASIC" ]
+ if sev = "HIGH" [ report "EXPERT" ]
+ report "COORDINATOR"
+end
+
+to-report escalation-tier [ current-tier ]
+ if current-tier = "BASIC" [ report "EXPERT" ]
+ if current-tier = "EXPERT" [ report "COORDINATOR" ]
+ report "COORDINATOR"
+end
+
+to-report find-responder [ tier-name ]
+ let candidates responders with [ tier = tier-name and current-load < capacity ]
+ ifelse any? candidates [
+ report min-one-of candidates [ current-load ]
+ ] [
+ report nobody
+ ]
+end
+
+to-report count-active-tier [ tier-name ]
+ ;; Occupied slots in a tier, including late cases that still hold a responder
+ report sum [ current-load ] of responders with [ tier = tier-name ]
+end
+
+;; ===========================================================================
+;; PROCESSING + DEADLINES
+;; ===========================================================================
+
+to process-active-cases
+ ask incidents with [ queue-state = "active" ] [
+ let chance completion-probability assessed-tier
+ if random-float 1 < chance [
+ resolve-incident self
+ ]
+ ]
+end
+
+to-report completion-probability [ tier-name ]
+ if tier-name = "BASIC" [ report 0.15 ]
+ if tier-name = "EXPERT" [ report 0.20 ]
+ if tier-name = "COORDINATOR" [ report 0.25 ]
+ report 0.10
+end
+
+to resolve-incident [ inc ]
+ let worker [assigned-responder] of inc
+ if worker != nobody [
+ ask worker [
+ set current-load max (list 0 (current-load - 1))
+ set resolved-count resolved-count + 1
+ ]
+ ]
+
+ set total-resolved total-resolved + 1
+ set total-response-ticks total-response-ticks + (ticks - [created-at] of inc)
+
+ ask inc [
+ set queue-state "resolved"
+ set color grey + 2
+ set size 0.6
+ setxy xcor (-15 + random-float 1)
+ set label ""
+ ]
+end
+
+to check-deadlines
+ ask incidents with [ queue-state = "active" and ticks > deadline ] [
+ set queue-state "late"
+ set total-late total-late + 1
+ set color magenta
+ output-print (word "[LATE] " summary " — exceeded deadline at tick " ticks)
+
+ ;; Try to escalate late cases
+ let current-tier assessed-tier
+ let higher-tier escalation-tier current-tier
+ if higher-tier != current-tier [
+ let new-worker find-responder higher-tier
+ if new-worker != nobody [
+ ;; Release old responder
+ if assigned-responder != nobody [
+ ask assigned-responder [
+ set current-load max (list 0 (current-load - 1))
+ ]
+ ]
+ ask new-worker [ set current-load current-load + 1 ]
+ set assigned-responder new-worker
+ set assessed-tier higher-tier
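+ ;; Keep queue-state "late": its deadline has already passed, so resetting it to
+ ;; "active" would count it as late again next tick. The loop below still processes it.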
+ set total-escalated total-escalated + 1
+ output-print (word "[ESCALATE] " summary " -> " higher-tier)
+ ]
+ ]
+ ]
+
+ ;; Also let late-but-still-processing cases resolve
+ ask incidents with [ queue-state = "late" ] [
+ let chance completion-probability assessed-tier
+ if random-float 1 < chance [
+ resolve-incident self
+ ]
+ ]
+end
+
+;; ===========================================================================
+;; REFLECTION (dispatchers reflect on performance via llm:chat)
+;; ===========================================================================
+
+to dispatcher-reflect
+ if not llm-ready? or not use-llm? [ stop ]
+ if my-triaged = 0 [ stop ]
+
+ ;; Only reflect if enough history accumulated
+ let hist-len 0
+ carefully [
+ set hist-len length llm:history
+ ] [
+ set hist-len 0
+ ]
+ if hist-len < 4 [ stop ]
+
+ let my-triage-acc ifelse-value (my-triaged > 0) [ precision (my-correct-triage / my-triaged * 100) 1 ] [ 0 ]
+ let my-route-acc ifelse-value (my-routed > 0) [ precision (my-correct-route / my-routed * 100) 1 ] [ 0 ]
+
+ carefully [
+ let reflection llm:chat (word
+ "REFLECTION — You are " persona-name " dispatcher. Review your performance:\n"
+ "Triage accuracy: " my-triage-acc "% (" my-correct-triage "/" my-triaged ")\n"
+ "Routing accuracy: " my-route-acc "% (" my-correct-route "/" my-routed ")\n"
+ "Episode: " current-episode ", Tick: " ticks "\n"
+ "What patterns are you noticing? What would you do differently? "
+ "Keep your reflection to 2-3 sentences."
+ )
+ output-print (word "[REFLECT:" persona-name "] " reflection)
+ ] [
+ output-print (word "[REFLECT:" persona-name "] Failed: " error-message)
+ ]
+end
+
+;; Manual reflection trigger
+to force-reflect
+ ask dispatchers [ dispatcher-reflect ]
+end
+
+;; ===========================================================================
+;; EPISODE BOUNDARY + MEMORY MANAGEMENT
+;; ===========================================================================
+
+to handle-episode-boundary
+ if episode-length = 0 [ stop ] ;; No episode boundaries
+ if episode-tick-counter < episode-length [ stop ]
+
+ ;; Episode ended
+ set current-episode current-episode + 1
+ set episode-tick-counter 0
+ output-print (word "[EPISODE] Starting episode " current-episode " | Memory mode: " memory-mode)
+
+ ask dispatchers [
+ if memory-mode = "per-episode" [
+ ;; Clear and re-inject persona
+ carefully [
+ llm:clear-history
+ llm:set-history (list
+ (list "system" persona-prompt)
+ )
+ output-print (word "[MEMORY:" persona-name "] History cleared, persona re-injected")
+ ] [
+ output-print (word "[MEMORY:" persona-name "] Reset failed: " error-message)
+ ]
+ ]
+ if memory-mode = "none" [
+ ;; Clear everything, including the persona system message, at each episode boundary
+ carefully [
+ llm:clear-history
+ output-print (word "[MEMORY:" persona-name "] History fully cleared")
+ ] [
+ output-print (word "[MEMORY:" persona-name "] Clear failed: " error-message)
+ ]
+ ]
+ ;; "persistent" mode: do nothing, history accumulates
+ ]
+end
+
+;; ===========================================================================
+;; METRIC REPORTERS
+;; ===========================================================================
+
+to-report triage-accuracy
+ ifelse total-triaged > 0
+ [ report precision (correct-triage / total-triaged * 100) 1 ]
+ [ report 0 ]
+end
+
+to-report route-accuracy
+ ifelse total-routed > 0
+ [ report precision (correct-route / total-routed * 100) 1 ]
+ [ report 0 ]
+end
+
+to-report late-rate
+ let total-dispatched total-routed
+ ifelse total-dispatched > 0
+ [ report precision (total-late / total-dispatched * 100) 1 ]
+ [ report 0 ]
+end
+
+to-report escalation-rate
+ ifelse total-routed > 0
+ [ report precision (total-escalated / total-routed * 100) 1 ]
+ [ report 0 ]
+end
+
+to-report avg-response-time
+ ifelse total-resolved > 0
+ [ report precision (total-response-ticks / total-resolved) 1 ]
+ [ report 0 ]
+end
+
+to-report misleading-accuracy
+ ifelse misleading-triaged > 0
+ [ report precision (misleading-correct / misleading-triaged * 100) 1 ]
+ [ report 0 ]
+end
+
+to-report persona-accuracy-report
+ report (word
+ map [ d ->
+ (word [persona-name] of d ": "
+ ifelse-value ([my-triaged] of d > 0)
+ [ (word precision ([my-correct-triage] of d / [my-triaged] of d * 100) 0 "%") ]
+ [ "N/A" ]
+ )
+ ] sort dispatchers
+ )
+end
+
+to-report accuracy-for-persona [ target-name ]
+ let d one-of dispatchers with [ persona-name = target-name ]
+ if d = nobody [ report "N/A" ]
+ ifelse [my-triaged] of d > 0
+ [ report (word precision ([my-correct-triage] of d / [my-triaged] of d * 100) 0 "%") ]
+ [ report "N/A" ]
+end
+
+to-report veteran-accuracy
+ report accuracy-for-persona "Veteran"
+end
+
+to-report rookie-accuracy
+ report accuracy-for-persona "Rookie"
+end
+
+to-report analyst-accuracy
+ report accuracy-for-persona "Analyst"
+end
+
+to-report llm-status
+ let result "N/A"
+ carefully [
+ set result (word llm:active)
+ ] [
+ ;; keep default
+ ]
+ report result
+end
+
+to-report queue-new-count
+ report count incidents with [ queue-state = "new" ]
+end
+
+to-report queue-triaged-count
+ report count incidents with [ queue-state = "triaged" ]
+end
+
+to-report queue-active-count
+ report count incidents with [ queue-state = "active" or queue-state = "late" ]
+end
+
+to-report queue-resolved-count
+ report count incidents with [ queue-state = "resolved" ]
+end
+]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ llm-status
+ current-episode
+ memory-mode
+ queue-new-count
+ queue-triaged-count
+ queue-active-count
+ triage-accuracy
+ route-accuracy
+ misleading-accuracy
+ avg-response-time
+ veteran-accuracy
+ rookie-accuracy
+ analyst-accuracy
+ late-rate
+ escalation-rate
+ total-resolved
+
+
+
+
+
+ plot triage-accuracy
+
+
+
+ plot route-accuracy
+
+
+
+ plot misleading-accuracy
+
+
+
+
+
+
+
+ plot queue-new-count
+
+
+
+ plot queue-active-count
+
+
+
+ plot total-resolved
+
+
+
+ plot total-late
+
+
+
+
+ ## Crisis Triage with Ambiguous Incidents
+
+### The Story
+
+A municipal emergency operations center receives a stream of crisis reports. Three dispatchers — a Veteran, a Rookie, and an Analyst — must assess each incident's severity and route it to the appropriate response tier (Basic, Expert, or Coordinator).
+
+The twist: many incidents are **deliberately misleading**. A "toxic chemical spill at a school" turns out to be spilled vinegar. A "minor water leak" threatens a neonatal ICU. Naive keyword matching fails on these cases — but an LLM reading the full impact description can get them right.
+
+### What This Demonstrates
+
+This demo exercises 8 LLM extension primitives, grounded in the LLM-empowered agent-based modeling survey by Gao et al. (arXiv:2312.11970):
+
+| Primitive | Where Used | Paper Concept |
+|-----------|-----------|---------------|
+| `llm:load-config` | Setup | Config management |
+| `llm:set-history` | Dispatcher personas | Personalization (Ch.2) |
+| `llm:chat-with-template` | Severity triage | Environment/Interface (Ch.1) |
+| `llm:choose` | Tier routing | Bounded Rationality |
+| `llm:history` | Reflection trigger | Memory (Ch.3) |
+| `llm:chat` | Dispatcher reflection | Reflection (Ch.3) |
+| `llm:clear-history` | Episode boundaries | Memory ablation |
+| `llm:active` | Status monitor | Provider awareness |
+
+### Quick Start
+
+1. Edit `config.txt` with your provider credentials (default: local Ollama).
+2. Click **setup**.
+3. Click **go**.
+4. Watch the output log for `[TRIAGE]`, `[ROUTE]`, and `[REFLECT]` messages.
+5. Compare the **Misleading%** monitor across modes; misleading incidents are where the LLM should outperform the heuristic.
+
+### The A/B Experiment
+
+Toggle **use-llm?** OFF to switch to pure heuristic mode:
+
+- **Heuristic mode**: Keyword matching triggers on words like "fire", "toxic", and "collapse". It scores around 70% on clear cases but only around 30% on misleading cases, where the keywords don't match the actual impact.
+- **LLM mode**: Dispatchers read the full impact description. Expected accuracy on misleading cases is around 70% or higher.
+
+Run both modes for 50+ ticks and compare the Accuracy Over Time plot.
+
+### Controls
+
+- **use-llm?**: Toggle between LLM dispatchers and naive heuristic
+- **memory-mode**: How dispatcher memory works across episodes
+ - *persistent*: Full conversation history retained
+ - *per-episode*: History cleared each episode, persona re-injected
+ - *none*: History cleared each episode, no persona
+- **reflection-interval**: How often dispatchers reflect on their performance (0 = never)
+- **incident-rate**: Probability (%) of a new incident each tick
+- **episode-length**: Ticks per episode (0 = no episodes)
+
+### What to Observe
+
+- **Triage Acc%**: How often dispatchers match ground-truth severity
+- **Misleading%**: Accuracy specifically on misleading incidents (the key metric)
+- **Route Acc%**: How often incidents go to the correct response tier
+- **Per-persona differences**: Veteran vs Rookie vs Analyst performance
+- **Reflection output**: Watch dispatchers reason about their own performance in the log
+- **Memory effects**: Compare persistent vs per-episode vs none over multiple episodes
+
+### Design Rationale
+
+**Why dispatchers (not responders) use LLM**: Triage and routing are judgment calls where context matters. Processing is mechanical — it doesn't benefit from language understanding.
+
+**Why no thinking/reasoning models**: Speed (3 dispatchers × 2 calls per tick is 6 calls per tick, which would take minutes with thinking enabled), cost (300+ calls per session), and extended reasoning is overkill for a classification task.
+
+**Why `llm:choose` for routing**: It guarantees the output is one of the valid options (BASIC, EXPERT, COORDINATOR, or HOLD), avoiding parsing failures. The extension handles fuzzy matching and falls back to a random choice if the LLM response can't be parsed.
+
+ setup repeat 30 [ go ]
+
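The Info tab's A/B Experiment section says a keyword heuristic over-triggers on words like "toxic" and misses incidents whose danger is only in the impact text. A minimal Python sketch of that failure mode, run against the three misleading cases from the README story. The keyword table, the `keyword_triage` function, the LOW default, and the clear "garage fire" case are all hypothetical; the model's own heuristic is written in NetLogo and may weigh keywords differently.

```python
# Hypothetical keyword table: the model's real heuristic is written in NetLogo
# and may use other keywords and severity levels.
KEYWORD_SEVERITY = {
    "toxic": "CRITICAL",
    "collapse": "HIGH",
    "fire": "HIGH",
}


def keyword_triage(report: str, default: str = "LOW") -> str:
    """Return the severity of the first keyword found in the report, else `default`."""
    text = report.lower()
    for keyword, severity in KEYWORD_SEVERITY.items():
        if keyword in text:
            return severity
    return default


# Misleading cases from the README story: (surface report, ground-truth severity).
MISLEADING = [
    ("Toxic chemical spill at school", "LOW"),
    ("Minor water leak in basement", "CRITICAL"),
    ("Dog loose on highway", "HIGH"),
]

# A hypothetical clear case where the keyword does reflect the real impact.
CLEAR = [
    ("Garage fire spreading toward a gas line", "HIGH"),
]


def accuracy(cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the heuristic matches the ground-truth severity."""
    correct = sum(keyword_triage(report) == truth for report, truth in cases)
    return correct / len(cases)


print(f"misleading: {accuracy(MISLEADING):.0%}, clear: {accuracy(CLEAR):.0%}")
# prints: misleading: 0%, clear: 100%
```

Because this heuristic never reads the impact description, it misses all three misleading cases while handling the clear one; that gap is what the **Misleading%** monitor is meant to expose.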
diff --git a/demos/crisis-triage/dispatcher-template.yaml b/demos/crisis-triage/dispatcher-template.yaml
new file mode 100644
index 0000000..f018c6d
--- /dev/null
+++ b/demos/crisis-triage/dispatcher-template.yaml
@@ -0,0 +1,17 @@
+# ABOUTME: Documentation stub for the dispatcher routing step.
+# ABOUTME: Routing now uses llm:choose for bounded tier selection instead of template parsing.
+#
+# This file is kept for reference. The actual routing in crisis-triage.nlogox
+# uses llm:choose with choices ["BASIC" "EXPERT" "COORDINATOR" "HOLD"],
+# which guarantees the response is one of the valid tiers.
+#
+# The dispatcher's conversational context (persona, history) is maintained
+# via llm:set-history and accumulated through llm:chat-with-template calls.
+system: "You are a crisis operations dispatcher. Route incidents to the appropriate response tier."
+template: |
+ Severity: {severity}
+ Incident: {incident}
+ Current load — BASIC: {basic_load}, EXPERT: {expert_load}, COORDINATOR: {coordinator_load}
+
+ Choose the best response tier considering severity and current workload.
+ Respond with EXACTLY ONE of: BASIC, EXPERT, COORDINATOR, HOLD
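The comment above says routing relies on `llm:choose` with four fixed choices, and the Design Rationale says the extension fuzzy-matches the reply and falls back to a random choice. A minimal Python sketch of that bounded-choice idea, assuming whole-word matching first, then `difflib` fuzzy matching at a 0.8 cutoff, then a random fallback. `bounded_choice`, the matching order, and the cutoff are assumptions for illustration, not the extension's implementation.

```python
import difflib
import random
import re
from typing import Optional

# Choices listed in the dispatcher-template.yaml comment.
TIERS = ["BASIC", "EXPERT", "COORDINATOR", "HOLD"]


def bounded_choice(response: str, choices: list[str],
                   rng: Optional[random.Random] = None) -> str:
    """Map a free-text LLM reply onto exactly one of `choices` (hypothetical sketch)."""
    rng = rng or random.Random()
    text = response.upper()

    # 1. Whole-word match: pick the choice that appears earliest in the reply.
    hits = []
    for choice in choices:
        match = re.search(rf"\b{re.escape(choice)}\b", text)
        if match:
            hits.append((match.start(), choice))
    if hits:
        return min(hits)[1]

    # 2. Fuzzy match: accept a near-miss spelling of one of the choices.
    for word in re.findall(r"[A-Z]+", text):
        close = difflib.get_close_matches(word, choices, n=1, cutoff=0.8)
        if close:
            return close[0]

    # 3. Fallback: the reply could not be parsed, so choose at random.
    return rng.choice(choices)


print(bounded_choice("Route to the coordinater team", TIERS))
```

Whatever the reply, the result is always a member of `TIERS`, which is the property that removes parsing failures from the routing step.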
diff --git a/demos/crisis-triage/tests/README.md b/demos/crisis-triage/tests/README.md
new file mode 100644
index 0000000..edd3703
--- /dev/null
+++ b/demos/crisis-triage/tests/README.md
@@ -0,0 +1,20 @@
+# Crisis Triage Demo Tests
+
+Run from repository root:
+
+```bash
+python -m unittest discover -s demos/crisis-triage/tests -p "test_*.py" -v
+```
+
+These tests validate (29 tests, no API calls):
+
+- Presence of all required demo files
+- Breed declarations (dispatchers, incidents, responders)
+- Required procedures (setup, triage, routing, reflection, episode boundary)
+- All 8 LLM primitives present in code
+- Template placeholder consistency with model substitutions
+- Config key completeness and max_tokens=200
+- README documentation sections
+- XML structure (widgets, shapes, plots, CDATA)
+- Incident bank has 30 entries (10 misleading + 10 clear + 10 borderline)
+- Procedure block matching (every `to` has an `end`)
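The block-matching bullet above, like `test_all_procedure_blocks_are_closed`, compares the number of `to`/`to-report` lines with the number of `end` lines. Equal counts can still hide a misplaced `end`. A minimal sketch of an order-aware check, assuming (as the existing regexes do) that procedure headers and `end` start at column 0 in the extracted code; `unclosed_procedures` is a hypothetical helper, not part of the suite.

```python
import re

HEADER_RE = re.compile(r"^to(?:-report)?\s+(\S+)")
END_RE = re.compile(r"^end\s*$")


def unclosed_procedures(code: str) -> list[str]:
    """Return procedures not closed by `end` before the next header, plus stray ends."""
    problems: list[str] = []
    current = None  # name of the procedure whose body we are inside
    for line in code.splitlines():
        header = HEADER_RE.match(line)
        if header:
            if current is not None:
                problems.append(current)  # new header before the previous `end`
            current = header.group(1)
        elif END_RE.match(line):
            if current is None:
                problems.append("<stray end>")
            current = None
    if current is not None:
        problems.append(current)  # file ended inside a procedure
    return problems


# Two headers and two ends: the count-based test passes, this check does not.
print(unclosed_procedures("to a\nto b\nend\nend\n"))
```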
diff --git a/demos/crisis-triage/tests/__pycache__/test_crisis_triage.cpython-312.pyc b/demos/crisis-triage/tests/__pycache__/test_crisis_triage.cpython-312.pyc
new file mode 100644
index 0000000..78a8e71
Binary files /dev/null and b/demos/crisis-triage/tests/__pycache__/test_crisis_triage.cpython-312.pyc differ
diff --git a/demos/crisis-triage/tests/test_crisis_triage.py b/demos/crisis-triage/tests/test_crisis_triage.py
new file mode 100644
index 0000000..183920d
--- /dev/null
+++ b/demos/crisis-triage/tests/test_crisis_triage.py
@@ -0,0 +1,287 @@
+# ABOUTME: Static validation tests for the crisis triage demo.
+# ABOUTME: Tests file structure, XML format, code structure, and template consistency.
+
+import re
+import unittest
+import xml.etree.ElementTree as ET
+from pathlib import Path
+
+
+DEMO_DIR = Path(__file__).resolve().parents[1]
+MODEL_PATH = DEMO_DIR / "crisis-triage.nlogox"
+TRIAGE_TEMPLATE_PATH = DEMO_DIR / "triage-template.yaml"
+DISPATCHER_TEMPLATE_PATH = DEMO_DIR / "dispatcher-template.yaml"
+CONFIG_PATH = DEMO_DIR / "config.txt"
+README_PATH = DEMO_DIR / "README.md"
+
+
+def read(path: Path) -> str:
+ return path.read_text(encoding="utf-8")
+
+
+def parse_model() -> ET.Element:
+ return ET.parse(MODEL_PATH).getroot()
+
+
+def model_code_only() -> str:
+ root = parse_model()
+ code_elem = root.find("code")
+ if code_elem is None or code_elem.text is None:
+ raise AssertionError("unable to extract <code> content from model XML")
+ return code_elem.text
+
+
+def parse_config(path: Path) -> dict[str, str]:
+ data: dict[str, str] = {}
+ for raw in read(path).splitlines():
+ line = raw.strip()
+ if not line or line.startswith("#"):
+ continue
+ if "=" not in line:
+ continue
+ key, value = line.split("=", 1)
+ data[key.strip()] = value.strip()
+ return data
+
+
+class TestCrisisTriageArtifacts(unittest.TestCase):
+ def test_required_files_exist(self) -> None:
+ required = [
+ MODEL_PATH,
+ TRIAGE_TEMPLATE_PATH,
+ DISPATCHER_TEMPLATE_PATH,
+ CONFIG_PATH,
+ README_PATH,
+ ]
+ for path in required:
+ self.assertTrue(path.exists(), f"missing file: {path}")
+
+ def test_model_declares_breeds(self) -> None:
+ code = model_code_only()
+ self.assertIn("breed [ dispatchers dispatcher ]", code)
+ self.assertIn("breed [ incidents incident ]", code)
+ self.assertIn("breed [ responders responder ]", code)
+
+ def test_model_contains_required_procedures(self) -> None:
+ code = model_code_only()
+ procedures = [
+ "to setup",
+ "to setup-llm",
+ "to setup-dispatchers",
+ "to setup-responders",
+ "to go",
+ "to triage-my-incidents",
+ "to route-my-incidents",
+ "to process-active-cases",
+ "to dispatcher-reflect",
+ "to handle-episode-boundary",
+ ]
+ for proc in procedures:
+ self.assertIn(proc, code, f"missing procedure: {proc}")
+
+ def test_model_uses_llm_config_and_template(self) -> None:
+ code = model_code_only()
+ self.assertIn('set config-path "demos/crisis-triage/config.txt"', code)
+ self.assertIn('set triage-template-path "demos/crisis-triage/triage-template.yaml"', code)
+ self.assertIn("llm:chat-with-template triage-template-path", code)
+
+ def test_model_uses_all_eight_primitives(self) -> None:
+ code = model_code_only()
+ primitives = [
+ "llm:load-config",
+ "llm:set-history",
+ "llm:chat-with-template",
+ "llm:choose",
+ "llm:history",
+ "llm:chat",
+ "llm:clear-history",
+ "llm:active",
+ ]
+ for prim in primitives:
+ self.assertIn(prim, code, f"missing LLM primitive: {prim}")
+
+ def test_triage_template_placeholders_match_model(self) -> None:
+ template = read(TRIAGE_TEMPLATE_PATH)
+ placeholders = set(re.findall(r"\{([a-zA-Z_][a-zA-Z0-9_]*)\}", template))
+ self.assertEqual(
+ placeholders,
+ {"persona", "episode", "tick", "incident", "impact"},
+ )
+
+ def test_config_has_required_keys(self) -> None:
+ config = parse_config(CONFIG_PATH)
+ for key in ["provider", "model", "temperature", "max_tokens", "timeout_seconds"]:
+ self.assertIn(key, config, f"missing key in config: {key}")
+
+ def test_config_max_tokens_is_200(self) -> None:
+ config = parse_config(CONFIG_PATH)
+ self.assertEqual(config["max_tokens"], "200")
+
+ def test_readme_has_core_sections(self) -> None:
+ readme = read(README_PATH)
+ for text in [
+ "Quick Start",
+ "A/B Experiment",
+ "Design Rationale",
+ "Paper Connection",
+ ]:
+ self.assertIn(text, readme)
+
+
+class TestModelXmlParsing(unittest.TestCase):
+ def setUp(self) -> None:
+ self.root = parse_model()
+
+ def test_model_parses_as_valid_xml(self) -> None:
+ self.assertEqual(self.root.tag, "model")
+
+ def test_code_element_contains_cdata_content(self) -> None:
+ code_elem = self.root.find("code")
+ self.assertIsNotNone(code_elem, "missing <code> element")
+ self.assertIsNotNone(code_elem.text, "<code> element has no text content")
+ self.assertIn("extensions [ llm ]", code_elem.text)
+
+ def test_raw_file_preserves_cdata_wrapping(self) -> None:
+ raw = read(MODEL_PATH)
+ self.assertIn("<![CDATA[", raw)
+
+ def test_widgets_section_has_expected_children(self) -> None:
+ widgets = self.root.find("widgets")
+ self.assertIsNotNone(widgets, "missing <widgets> section")
+ child_tags = [child.tag for child in widgets]
+ self.assertIn("view", child_tags)
+ self.assertIn("button", child_tags)
+ self.assertIn("monitor", child_tags)
+ self.assertIn("switch", child_tags)
+ self.assertIn("chooser", child_tags)
+ self.assertIn("slider", child_tags)
+ self.assertIn("plot", child_tags)
+
+ def test_widgets_button_count(self) -> None:
+ widgets = self.root.find("widgets")
+ buttons = widgets.findall("button")
+ self.assertEqual(len(buttons), 4, "expected 4 buttons: setup, go, add-incident, force-reflect")
+
+ def test_widgets_monitor_count(self) -> None:
+ widgets = self.root.find("widgets")
+ monitors = widgets.findall("monitor")
+ self.assertGreaterEqual(len(monitors), 12, "expected at least 12 monitors")
+
+ def test_widgets_plot_count(self) -> None:
+ widgets = self.root.find("widgets")
+ plots = widgets.findall("plot")
+ self.assertEqual(len(plots), 2, "expected 2 plots: Accuracy Over Time, Case Flow")
+
+ def test_turtle_shapes_defined(self) -> None:
+ shapes = self.root.find("turtleShapes")
+ self.assertIsNotNone(shapes, "missing <turtleShapes> section")
+ shape_names = [s.get("name") for s in shapes.findall("shape")]
+ self.assertIn("default", shape_names)
+ self.assertIn("circle", shape_names)
+ self.assertIn("person", shape_names)
+
+
+class TestModelStructure(unittest.TestCase):
+ def setUp(self) -> None:
+ self.root = parse_model()
+
+ def test_netlogo_version_is_7_0_3(self) -> None:
+ version = self.root.get("version")
+ self.assertEqual(version, "NetLogo 7.0.3")
+
+ def test_required_top_level_sections_exist(self) -> None:
+ required_sections = [
+ "code", "widgets", "info", "turtleShapes", "linkShapes",
+ "previewCommands",
+ ]
+ present = {child.tag for child in self.root}
+ for section in required_sections:
+ self.assertIn(section, present, f"missing top-level section: {section}")
+
+ def test_info_section_not_empty(self) -> None:
+ info = self.root.find("info")
+ self.assertIsNotNone(info, "missing <info> section")
+ self.assertTrue(
+ info.text and len(info.text.strip()) > 0,
+ "<info> section is empty",
+ )
+
+ def test_preview_commands_present(self) -> None:
+ preview = self.root.find("previewCommands")
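+ self.assertIsNotNone(preview, "missing <previewCommands> section")
+ self.assertIsNotNone(preview.text, "<previewCommands> section is empty")
+ self.assertIn("setup", preview.text)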
+
+ def test_link_shapes_has_default(self) -> None:
+ link_shapes = self.root.find("linkShapes")
+ self.assertIsNotNone(link_shapes, "missing <linkShapes> section")
+ names = [s.get("name") for s in link_shapes.findall("shape")]
+ self.assertIn("default", names)
+
+
+class TestBehaviorRegression(unittest.TestCase):
+ def setUp(self) -> None:
+ self.code = model_code_only()
+
+ def test_extensions_declaration_present(self) -> None:
+ self.assertIn("extensions [ llm ]", self.code)
+
+ def test_chat_with_template_uses_list_syntax(self) -> None:
+ lines = self.code.splitlines()
+ for line in lines:
+ stripped = line.strip()
+ if "llm:chat-with-template" not in stripped:
+ continue
+ self.assertNotRegex(
+ stripped,
+ r'llm:chat-with-template\s+\S+\s+\[\[',
+ f"bracket syntax found instead of (list ...): {stripped}",
+ )
+
+ def test_no_inline_provider_setup_in_procedures(self) -> None:
+ for deprecated in ["llm:set-provider", "llm:set-api-key", "llm:set-model"]:
+ self.assertNotIn(
+ deprecated,
+ self.code,
+ f"deprecated inline primitive found: {deprecated}",
+ )
+
+ def test_all_procedure_blocks_are_closed(self) -> None:
+ opens = len(re.findall(r"^to(?:-report)?\s", self.code, re.MULTILINE))
+ closes = len(re.findall(r"^end\s*$", self.code, re.MULTILINE))
+ self.assertEqual(
+ opens,
+ closes,
+ f"mismatched procedure blocks: {opens} opens vs {closes} ends",
+ )
+
+ def test_no_deprecated_primitives(self) -> None:
+ deprecated = [
+ "llm:ask",
+ "llm:send",
+ "llm:query",
+ "llm:prompt",
+ ]
+ for prim in deprecated:
+ self.assertNotIn(prim, self.code, f"deprecated primitive: {prim}")
+
+ def test_globals_declared(self) -> None:
+ self.assertIn("globals [", self.code)
+ for g in ["llm-ready?", "config-path", "triage-template-path",
+ "incident-bank", "total-triaged", "correct-triage"]:
+ self.assertIn(g, self.code, f"missing global: {g}")
+
+ def test_incident_bank_has_30_entries(self) -> None:
+ """The incident bank should contain 30 incidents (10 misleading + 10 clear + 10 borderline)."""
+ code = self.code
+ # Each incident entry inside build-incident-bank starts with (list "
+ bank_start = code.find("to build-incident-bank")
+ self.assertNotEqual(bank_start, -1, "missing procedure: build-incident-bank")
+ bank_end = code.find("\nend", bank_start)
+ bank_code = code[bank_start:bank_end]
+ incident_count = bank_code.count('(list "')
+ # The outer (list ...) wrapping all incidents is not followed by a quote, so it is not counted
+ self.assertEqual(incident_count, 30, f"expected 30 incidents, found {incident_count}")
+
+
+if __name__ == "__main__":
+ unittest.main()
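The tests README promises "template placeholder consistency with model substitutions", but `test_triage_template_placeholders_match_model` compares the template against a hard-coded set. A sketch of a two-sided check, assuming each substitution in the `llm:chat-with-template` call is written as a `(list "key" value)` pair, the list syntax the regression tests enforce. `placeholder_mismatch`, the sample call text, and its variable names (`my-persona`, `inc-text`) are hypothetical.

```python
import re

# Placeholder pattern copied from test_triage_template_placeholders_match_model.
PLACEHOLDER_RE = re.compile(r"\{([a-zA-Z_][a-zA-Z0-9_]*)\}")
# Assumed substitution shape inside the chat-with-template call: (list "key" value).
PAIR_RE = re.compile(r'\(list\s+"([a-zA-Z_][a-zA-Z0-9_]*)"')


def placeholder_mismatch(template_text: str, call_text: str) -> tuple[set[str], set[str]]:
    """Return (placeholders with no substitution, substitutions with no placeholder).

    Pass only the chat-with-template call, not the whole model: incident-bank
    entries also start with (list " and could be mistaken for keys.
    """
    placeholders = set(PLACEHOLDER_RE.findall(template_text))
    keys = set(PAIR_RE.findall(call_text))
    return placeholders - keys, keys - placeholders


TEMPLATE = "Background: {persona}\nEpisode {episode}, tick {tick}\nIncident: {incident}\nImpact: {impact}"
# Hypothetical call that forgets the impact substitution.
CALL = ('llm:chat-with-template triage-template-path (list (list "persona" my-persona) '
        '(list "episode" current-episode) (list "tick" ticks) (list "incident" inc-text))')

missing, extra = placeholder_mismatch(TEMPLATE, CALL)
print(missing, extra)
```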
diff --git a/demos/crisis-triage/triage-template.yaml b/demos/crisis-triage/triage-template.yaml
new file mode 100644
index 0000000..cd9745d
--- /dev/null
+++ b/demos/crisis-triage/triage-template.yaml
@@ -0,0 +1,24 @@
+# ABOUTME: Triage template for crisis severity assessment with calibration anchors.
+# ABOUTME: Used by dispatchers via llm:chat-with-template to classify incident severity.
+system: |
+ You are a crisis triage specialist with this background: {persona}
+ This is episode {episode}, tick {tick} of a municipal emergency simulation.
+
+ IMPORTANT: Do NOT rely on scary-sounding keywords alone. A "fire alarm" in a
+ server room may be a sensor malfunction. A "data center cooling loss" may threaten
+ lives if hospitals depend on it. Assess the ACTUAL described impact, not the
+ surface-level vocabulary.
+
+ Severity definitions:
+ - LOW: No injuries, no infrastructure at risk, routine response adequate.
+ - MODERATE: Minor injuries or limited disruption, single-agency response sufficient.
+ - HIGH: Significant injuries, infrastructure at risk, or time-sensitive escalation potential.
+ - CRITICAL: Life-threatening, multi-agency coordination needed, cascading failures, or large population affected.
+
+ Classify severity as exactly one of: LOW, MODERATE, HIGH, CRITICAL.
+template: |
+ Incident: {incident}
+ Impact: {impact}
+
+ Based on the described impact (not keywords), classify this incident's severity.
+ Reply with the severity level first (LOW, MODERATE, HIGH, or CRITICAL), then a brief reason.
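The template asks the model to put the severity level first and the reason after it. A minimal sketch of how a caller could pull the label out of such a reply, returning `None` when no label is present; `parse_severity` is hypothetical, and the model's own NetLogo parsing may differ.

```python
import re
from typing import Optional

# Whole-word match so that e.g. "HIGHLY" or "BELOW" is not read as a label.
SEVERITY_RE = re.compile(r"\b(LOW|MODERATE|HIGH|CRITICAL)\b")


def parse_severity(reply: str) -> Optional[str]:
    """Return the first severity label in the reply, or None if there is none."""
    match = SEVERITY_RE.search(reply.upper())
    return match.group(1) if match else None


print(parse_severity("Low. It was vinegar, not a toxic spill, so not HIGH."))
```

Taking the first match leans on the "severity level first" instruction: a reason that mentions another level, such as "not HIGH", does not override the leading label.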