
fix: prevent SOTA hallucination in Stage 01 goal.md#239

Open
octo-patch wants to merge 2 commits into aiming-lab:main from octo-patch:fix/issue-238-sota-hallucination-in-topic-init

Conversation

@octo-patch
Contributor

Fixes #238

Problem

The previous fix (#226 / 5dc7fcc) removed the prompt line that asked the model to cite specific papers in Stage 01, but the TREND VALIDATION section still explicitly asked:

> State whether SOTA results exist on this benchmark and what they are.
> Add a 'Benchmark' subsection listing: name, source, metrics, current SOTA (if known).

These lines still invite the model to hallucinate specific model names and performance numbers. Because goal.md is the first pipeline artifact and is loaded as context by all downstream stages, inaccurate SOTA claims can anchor reasoning throughout the entire run.

Solution

  1. Prompt hardening (researchclaw/prompts.py): Replace the two problematic lines with instructions that explicitly prohibit stating specific model names or performance numbers, directing the model to describe the benchmark type and typical metrics only. Actual SOTA verification is deferred to the literature search stage — consistent with the existing guardrail for paper citations.

  2. Disclaimer in goal.md (researchclaw/pipeline/stage_impls/_topic.py): Prepend a visible blockquote disclaimer to every LLM-generated goal.md so that downstream stages (and human reviewers) treat any benchmark/SOTA figures as provisional LLM estimates, not verified facts.

  3. Regression test (tests/test_rc_prompts.py): Assert that the rendered topic_init prompt contains at least one anti-hallucination guardrail phrase and that the previously problematic phrases ("what they are", "current SOTA (if known)") are absent.
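As a rough illustration, the disclaimer prepend in item 2 could look like the following minimal sketch. `DISCLAIMER` and `prepend_disclaimer` are hypothetical names for this example, not the identifiers actually used in `_topic.py`:

```python
# Hypothetical sketch of the goal.md disclaimer prepend; names are
# illustrative and may differ from the actual researchclaw code.
DISCLAIMER = (
    "> **Note:** Any benchmark or SOTA figures below are unverified LLM\n"
    "> estimates. Treat them as provisional until confirmed during the\n"
    "> literature search stage.\n\n"
)

def prepend_disclaimer(goal_md: str) -> str:
    """Prepend the disclaimer blockquote, idempotently, so repeated
    pipeline runs do not stack multiple copies."""
    if goal_md.startswith(DISCLAIMER):
        return goal_md
    return DISCLAIMER + goal_md
```

Making the prepend idempotent matters if a stage can be re-run on an already-written goal.md.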

Testing

  • pytest tests/test_rc_prompts.py — all 35 tests pass including the new regression test.
  • Full test suite run clean.
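The regression test's shape could be sketched as below; `check_prompt` and the exact guardrail strings are stand-ins for whatever `tests/test_rc_prompts.py` actually asserts:

```python
# Hypothetical sketch of the anti-hallucination regression check; the
# phrase lists are illustrative, not the exact strings in the test suite.
BANNED_PHRASES = ["what they are", "current SOTA (if known)"]
GUARDRAIL_PHRASES = [
    "do NOT state specific model names",
    "performance numbers",
]

def check_prompt(prompt: str) -> None:
    # At least one guardrail phrase must be present...
    assert any(p in prompt for p in GUARDRAIL_PHRASES), "missing guardrail"
    # ...and none of the previously problematic phrases may remain.
    for phrase in BANNED_PHRASES:
        assert phrase not in prompt, f"prompt still solicits SOTA: {phrase!r}"
```

Asserting on rendered prompt text keeps the test cheap and independent of any model call.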

octo-patch added 2 commits April 20, 2026 11:16
…#238)

The previous fix (5dc7fcc) removed explicit paper-citation requests but
left SOTA performance claims intact. This commit closes the remaining gap.

- Replace the TREND VALIDATION prompt line that asked the model to state
  specific SOTA results with an instruction to NOT provide model names or
  performance numbers (these will be verified in the literature stage).
- Update the Benchmark subsection instruction to request only name,
  source, and typical metrics, omitting unverified performance numbers.
- Prepend a visible disclaimer to every LLM-generated goal.md so that
  downstream stages treat benchmark/SOTA figures as provisional estimates
  rather than verified facts.
- Add a regression test asserting the topic_init prompt contains
  anti-hallucination guardrails and no longer solicits exact SOTA figures.


Development

Successfully merging this pull request may close these issues.

Goal.md Hallucinated References
