diff --git a/docs/DISCOVERIES.md b/docs/DISCOVERIES.md index ac69d9903a..3f850ef4af 100644 --- a/docs/DISCOVERIES.md +++ b/docs/DISCOVERIES.md @@ -2,6 +2,89 @@ This file documents non-obvious problems, solutions, and patterns discovered during amplihack development. Review and update this regularly, removing outdated entries or those replaced by better practices, code, or tools. Update entries where best practices have evolved. +## Sync Method Called with asyncio.run() Passes Resolved Value, Not Coroutine (2026-04-25) + +### Issue + +`agent_subprocess.py` wrapped `agent.answer_question(...)` with `asyncio.run()`. +Every L1–L12 level in the progressive test suite failed with: + +``` +ValueError: a coroutine was expected, got ('', ReasoningTrace(...)) +``` + +### Root Cause + +`AnswerSynthesizerMixin.answer_question` is a **sync** method that internally +calls `_run_async_or_return` and returns the already-resolved value. Passing +the resolved tuple to `asyncio.run()` triggers `ValueError` because +`asyncio.run()` expects a coroutine object, not a plain Python value. + +### Solution + +**PR #4471**: Call `agent.answer_question(...)` directly. `learn_from_content` +on the same line is genuinely async and remains wrapped with `asyncio.run()`. + +### Key Learnings + +1. **Verify async status before wrapping**: Check `async def` in the method + signature — not just the module name — before using `asyncio.run()` +2. **`_run_async_or_return` helpers return synchronously**: They are sync + wrappers around async dispatch; they do not return coroutines +3. **Error message is misleading**: The bench logs reported a memory warning, + not the actual `ValueError`, making diagnosis harder + +### Prevention + +- Read the method signature (`async def` vs `def`) before adding `asyncio.run()` +- Search for sibling calls in the same file to verify the async/sync pattern + used for other methods on the same object + +--- + +## Shadowed Async Error in Progressive Eval Suite Exposed by Upstream Fix (2026-04-25) + +### Issue + +After PR #4471 fixed the `asyncio.run()` misuse on `answer_question`, a new +failure appeared across all L1–L12 levels in the gym benchmark: + +``` +RuntimeWarning: coroutine 'grade_metacognition' was never awaited +✗ L1 failed: 'coroutine' object has no attribute 'effort_calibration' +``` + +### Root Cause + +`grade_metacognition` (`metacognition_grader.py:280`) is `async def`, but +`progressive_test_suite.run_single_level` is a **sync** method and called it +without `asyncio.run()`. The coroutine was never awaited; accessing +`.effort_calibration` on the unawaited coroutine object raised `AttributeError`. +The bug was invisible before #4471 because the upstream failure masked it. + +### Solution + +**PR #4472**: Wrap the call with `asyncio.run(grade_metacognition(...))` and +add `import asyncio` to `progressive_test_suite.py`. + +### Key Learnings + +1. **Upstream failures can mask downstream async bugs**: Fix root-cause errors + first; expect secondary async/sync mismatches to surface after +2. **`RuntimeWarning: coroutine was never awaited` pinpoints the site**: The + warning names the exact coroutine function — go there first +3. **`AttributeError` on coroutine object is a secondary symptom**: The real + error is the missing `asyncio.run()` one line earlier + +### Prevention + +- After fixing an async/sync mismatch, re-run the full test suite to catch + downstream bugs now exposed +- Treat `RuntimeWarning: coroutine … was never awaited` as a hard error, not + a warning — it always indicates a broken async call path + +--- + ## Cleanup Agent Gap: Root Directory Organization (2026-01-12) ### Issue diff --git a/docs/howto/agent-learning-eval-harness.md b/docs/howto/agent-learning-eval-harness.md index 9391cbc4e2..3a3b4f9211 100644 --- a/docs/howto/agent-learning-eval-harness.md +++ b/docs/howto/agent-learning-eval-harness.md @@ -106,6 +106,26 @@ The latest accepted Azure scores and the local reproduction commands are tracked - [Current validation results](../hive_mind/current-validation-results.md) +## Async/Sync Contract for Eval Methods + +When adding calls to eval methods in `agent_subprocess.py` or +`progressive_test_suite.py`, check whether the method is `async def` or `def` +before deciding how to call it: + +| Method | Type | How to call from sync context | +|--------|------|-------------------------------| +| `agent.answer_question(...)` | sync (`def`) | Call directly — no `asyncio.run()` | +| `agent.learn_from_content(...)` | async (`async def`) | `asyncio.run(agent.learn_from_content(...))` | +| `grade_metacognition(...)` | async (`async def`) | `asyncio.run(grade_metacognition(...))` | + +Wrapping a sync method with `asyncio.run()` raises +`ValueError: a coroutine was expected` because the resolved return value is +not a coroutine. Omitting `asyncio.run()` from an async method leaves the +coroutine unawaited and raises `AttributeError` when you access fields on it. + +See [April 2026 eval fixes](../recipes/RECENT_FIXES_APRIL_2026.md) for the +full diagnosis (PRs #4471, #4472). + ## Related Docs - [Distributed Hive Evaluation](../hive_mind/EVAL.md) diff --git a/docs/recipes/RECENT_FIXES_APRIL_2026.md b/docs/recipes/RECENT_FIXES_APRIL_2026.md new file mode 100644 index 0000000000..7315cae8f7 --- /dev/null +++ b/docs/recipes/RECENT_FIXES_APRIL_2026.md @@ -0,0 +1,80 @@ +# Recent Fixes — April 2026 + +This document tracks bug fixes and improvements merged in April 2026, following +the [Diátaxis](https://diataxis.fr/) framework (Explanation sections). + +--- + +## April 25 — Eval Async/Sync Contract Fixes + +Two sequential fixes in the `amplihack-agent-eval` progressive test suite +resolved a class of bugs where Python's `asyncio.run()` was misapplied to +sync methods, or omitted from genuinely async ones. + +### Don't wrap sync `answer_question` with `asyncio.run()` (PR #4471) + +**Problem**: `agent_subprocess.py` called `asyncio.run(agent.answer_question(...))` +inside the `testing_phase` function. Every L1–L12 level failed with: + +``` +ValueError: a coroutine was expected, got ('', ReasoningTrace(...)) +``` + +The bench logs obscured the root cause by reporting +`Testing phase failed: WARNING: amplihack_memory.graph not available`. + +**Root cause**: `AnswerSynthesizerMixin.answer_question` +(`src/amplihack/agents/goal_seeking/answer_synthesizer.py:52`) is a **sync** +method. It calls `_run_async_or_return` internally and returns the resolved +value directly. Passing the resolved tuple `('', ReasoningTrace)` to +`asyncio.run()` raised `ValueError` because `asyncio.run()` expects a coroutine, +not an already-resolved value. + +**Fix**: Call `agent.answer_question(...)` directly without `asyncio.run()`. +`learn_from_content` (line 120) is genuinely async and remains wrapped. + +**Rule**: Before wrapping a call with `asyncio.run()`, verify via the method +signature or docstring that the method is `async def`. Sync wrappers that +internally dispatch to async code (like `_run_async_or_return`) return +synchronously — wrapping them is incorrect. + +--- + +### Await async `grade_metacognition` in progressive suite (PR #4472) + +**Problem**: This bug was hidden by the #4471 error above. Once #4471 was +fixed and `answer_question` began returning the correct trace, the downstream +`grade_metacognition` call surfaced a new failure across all L1–L12 levels: + +``` +RuntimeWarning: coroutine 'grade_metacognition' was never awaited +✗ L1 failed: 'coroutine' object has no attribute 'effort_calibration' +✗ L2 failed: 'coroutine' object has no attribute 'effort_calibration' +... (repeats L1 through L12) +``` + +**Root cause**: `grade_metacognition` is **async** +(`metacognition_grader.py:280`) but `progressive_test_suite.run_single_level` +is a **sync** method and called it without `asyncio.run()`. The returned +coroutine was never awaited; the subsequent `.effort_calibration` access +raised `AttributeError`. + +**Fix**: Wrap the call with `asyncio.run(grade_metacognition(...))` and add +the `asyncio` import to `progressive_test_suite.py`. + +**Rule**: When a sync function must call an async function, use +`asyncio.run()`. When an async function calls another async function, use +`await`. The `run_single_level` function is sync, so `asyncio.run()` is the +correct bridge. + +--- + +### Summary table + +| PR | File | Bug | Fix | +|----|------|-----|-----| +| #4471 | `eval/agent_subprocess.py` | `asyncio.run()` on sync `answer_question` | Call directly | +| #4472 | `progressive_test_suite.py` | Missing `asyncio.run()` on async `grade_metacognition` | Add `asyncio.run()` + import | + +**Dependency**: #4472 was shadowed by #4471. Fix #4471 first to expose the +downstream grader bug.