
fix: prevent SOTA hallucination in Stage 01 goal.md#239

Open
octo-patch wants to merge 2 commits into aiming-lab:main from octo-patch:fix/issue-238-sota-hallucination-in-topic-init

Conversation

@octo-patch
Contributor

Fixes #238

Problem

The previous fix (#226 / 5dc7fcc) removed the prompt line that asked the model to cite specific papers in Stage 01, but the TREND VALIDATION section still explicitly asked:

> State whether SOTA results exist on this benchmark and what they are.
> Add a 'Benchmark' subsection listing: name, source, metrics, current SOTA (if known).

These lines still invite the model to hallucinate specific model names and performance numbers. Because goal.md is the first pipeline artifact and is loaded as context by all downstream stages, inaccurate SOTA claims can anchor reasoning throughout the entire run.

Solution

  1. Prompt hardening (researchclaw/prompts.py): Replace the two problematic lines with instructions that explicitly prohibit stating specific model names or performance numbers, directing the model to describe the benchmark type and typical metrics only. Actual SOTA verification is deferred to the literature search stage — consistent with the existing guardrail for paper citations.

  2. Disclaimer in goal.md (researchclaw/pipeline/stage_impls/_topic.py): Prepend a visible blockquote disclaimer to every LLM-generated goal.md so that downstream stages (and human reviewers) treat any benchmark/SOTA figures as provisional LLM estimates, not verified facts.

  3. Regression test (tests/test_rc_prompts.py): Assert that the rendered topic_init prompt contains at least one anti-hallucination guardrail phrase and that the previously problematic phrases ("what they are", "current SOTA (if known)") are absent.
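As a rough illustration, the disclaimer prepend in item 2 could look like the following minimal sketch. `DISCLAIMER` and `prepend_disclaimer` are hypothetical names for this example, not the identifiers actually used in `_topic.py`:

```python
# Hypothetical sketch of the goal.md disclaimer prepend; names are
# illustrative and may differ from the actual researchclaw code.
DISCLAIMER = (
    "> **Note:** Any benchmark or SOTA figures below are unverified LLM\n"
    "> estimates. Treat them as provisional until confirmed during the\n"
    "> literature search stage.\n\n"
)

def prepend_disclaimer(goal_md: str) -> str:
    """Prepend the disclaimer blockquote, idempotently, so repeated
    pipeline runs do not stack multiple copies."""
    if goal_md.startswith(DISCLAIMER):
        return goal_md
    return DISCLAIMER + goal_md
```

Making the prepend idempotent matters if a stage can be re-run on an already-written goal.md.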

Testing

  • pytest tests/test_rc_prompts.py — all 35 tests pass including the new regression test.
  • Full test suite run clean.
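The regression test's shape could be sketched as below; `check_prompt` and the exact guardrail strings are stand-ins for whatever `tests/test_rc_prompts.py` actually asserts:

```python
# Hypothetical sketch of the anti-hallucination regression check; the
# phrase lists are illustrative, not the exact strings in the test suite.
BANNED_PHRASES = ["what they are", "current SOTA (if known)"]
GUARDRAIL_PHRASES = [
    "do NOT state specific model names",
    "performance numbers",
]

def check_prompt(prompt: str) -> None:
    # At least one guardrail phrase must be present...
    assert any(p in prompt for p in GUARDRAIL_PHRASES), "missing guardrail"
    # ...and none of the previously problematic phrases may remain.
    for phrase in BANNED_PHRASES:
        assert phrase not in prompt, f"prompt still solicits SOTA: {phrase!r}"
```

Asserting on rendered prompt text keeps the test cheap and independent of any model call.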

octo-patch added 2 commits April 20, 2026 11:16
…#238)

The previous fix (5dc7fcc) removed explicit paper-citation requests but
left SOTA performance claims intact. This commit closes the remaining gap.

- Replace the TREND VALIDATION prompt line that asked the model to state
  specific SOTA results with an instruction to NOT provide model names or
  performance numbers (these will be verified in the literature stage).
- Update the Benchmark subsection instruction to request only name,
  source, and typical metrics, omitting unverified performance numbers.
- Prepend a visible disclaimer to every LLM-generated goal.md so that
  downstream stages treat benchmark/SOTA figures as provisional estimates
  rather than verified facts.
- Add a regression test asserting the topic_init prompt contains
  anti-hallucination guardrails and no longer solicits exact SOTA figures.


Development

Successfully merging this pull request may close these issues.

Goal.md Hallucinated References
