fix: prevent SOTA hallucination in Stage 01 goal.md #239
Open
octo-patch wants to merge 2 commits into aiming-lab:main from
Conversation
added 2 commits on April 20, 2026 11:16
…#238)

The previous fix (5dc7fcc) removed explicit paper-citation requests but left SOTA performance claims intact. This commit closes the remaining gap.

- Replace the TREND VALIDATION prompt line that asked the model to state specific SOTA results with an instruction to NOT provide model names or performance numbers (these will be verified in the literature stage).
- Update the Benchmark subsection instruction to request only name, source, and typical metrics, omitting unverified performance numbers.
- Prepend a visible disclaimer to every LLM-generated goal.md so that downstream stages treat benchmark/SOTA figures as provisional estimates rather than verified facts.
- Add a regression test asserting the topic_init prompt contains anti-hallucination guardrails and no longer solicits exact SOTA figures.
Fixes #238
Problem
The previous fix (#226 / 5dc7fcc) removed the prompt line that asked the model to cite specific papers in Stage 01, but the TREND VALIDATION section still explicitly asked:
These lines continue to invite the model to hallucinate specific model names and performance numbers. Because `goal.md` is the first pipeline artifact and is loaded as context by all downstream stages, inaccurate SOTA claims can anchor reasoning throughout the entire run.

Solution

- Prompt hardening (`researchclaw/prompts.py`): Replace the two problematic lines with instructions that explicitly prohibit stating specific model names or performance numbers, directing the model to describe the benchmark type and typical metrics only. Actual SOTA verification is deferred to the literature search stage, consistent with the existing guardrail for paper citations.
- Disclaimer in `goal.md` (`researchclaw/pipeline/stage_impls/_topic.py`): Prepend a visible blockquote disclaimer to every LLM-generated `goal.md` so that downstream stages (and human reviewers) treat any benchmark/SOTA figures as provisional LLM estimates, not verified facts.
- Regression test (`tests/test_rc_prompts.py`): Assert that the rendered `topic_init` prompt contains at least one anti-hallucination guardrail phrase and that the previously problematic phrases ("what they are", "current SOTA (if known)") are absent.

Testing
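A minimal sketch of what such a regression test could look like. The guardrail phrases and the `check_topic_init_prompt` helper are illustrative assumptions, not the actual contents of `tests/test_rc_prompts.py`:

```python
# Hypothetical check mirroring the described regression test; the phrase
# lists below are stand-ins for the real guardrail wording.
GUARDRAIL_PHRASES = [
    "do not provide specific model names",
    "do not state performance numbers",
]
BANNED_PHRASES = ["what they are", "current SOTA (if known)"]


def check_topic_init_prompt(prompt: str) -> None:
    lowered = prompt.lower()
    # At least one anti-hallucination guardrail must be present...
    assert any(phrase in lowered for phrase in GUARDRAIL_PHRASES)
    # ...and the phrases that solicited exact SOTA figures must be gone.
    for banned in BANNED_PHRASES:
        assert banned.lower() not in lowered
```

In a pytest suite this would run against the rendered `topic_init` prompt, so the guard fires whenever someone reintroduces a line that asks for exact SOTA numbers.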
`pytest tests/test_rc_prompts.py`: all 35 tests pass, including the new regression test.
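For illustration, the disclaimer-prepend step could be as simple as the following sketch. The helper name and disclaimer wording are assumptions; the real implementation lives in `researchclaw/pipeline/stage_impls/_topic.py`:

```python
# Hypothetical helper: prepend a visible blockquote disclaimer to every
# LLM-generated goal.md before it is written to disk.
DISCLAIMER = (
    "> **Note:** Benchmark and SOTA figures below are unverified LLM "
    "estimates. Treat them as provisional until confirmed in the "
    "literature search stage.\n\n"
)


def with_disclaimer(goal_md: str) -> str:
    """Return goal.md content with the provisional-figures disclaimer prepended."""
    return DISCLAIMER + goal_md
```

Placing the disclaimer as a markdown blockquote keeps it visually distinct both for human reviewers and for downstream stages that load `goal.md` as context.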