fix(search_strategy): accept search_phases / phases / dict-wrapped queries #243
Open
minamo817 wants to merge 1 commit into aiming-lab:main from
The `_execute_search_strategy` parser only recognized `search_strategies`
as the top-level key for LLM-generated plans, and only accepted queries
as a list of strings. Real LLM output frequently uses different but
semantically equivalent schemas, e.g.:
    search_phases:
      - phase: 1
        queries:
          - q1: "<first domain query>"
          - q2: "<second domain query>"
When this happens, `plan.get("search_strategies", [])` returns `[]`,
`queries_list` stays empty, and the stage falls back to
`_build_default_search_queries()`, which slices the topic sentence into
stopword-filtered n-grams of the form `<topic-first-6-words>` plus the
literal strings `" benchmark"` / `" survey"` / `" recent advances"`.
These n-grams then drive Stage 4's real API search against OpenAlex /
Semantic Scholar / arXiv, which returns whichever high-citation 2020+
papers happen to match those generic tokens — entirely unrelated to the
actual research topic. Stage 5's shortlist is then populated with
completely off-topic papers.
This patch accepts `search_strategies`, `search_phases`, `phases`, and a
top-level `queries` fallback, and handles query items that are either
strings or single-key dicts like `{"q1": "query text"}`. Behavior for
plans that already use the original schema is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5595f30 to 9f15247
Summary
`_execute_search_strategy` only recognizes `search_strategies` as the top-level key of LLM-generated plans, and only accepts queries as a list of strings. When the LLM produces a semantically equivalent but syntactically different schema (e.g. `search_phases:` with queries like `{q1: "..."}`), the parser silently returns an empty `queries_list` and the stage falls back to `_build_default_search_queries()`, which slices the topic sentence into stopword-filtered n-grams.
Those n-grams then drive Stage 4's real API search, which returns whichever high-citation papers happen to match those generic tokens, and Stage 5's shortlist is populated with off-topic papers.
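The failure mode described above can be reproduced in isolation. The sketch below is a hypothetical stand-in for the pre-patch lookup, not the repository's actual code: the function name `extract_queries_strict` and its loop structure are assumptions, while the `search_strategies` key, the `queries_list` name, and the plan shape come from the description.

```python
from typing import Any


def extract_queries_strict(plan: dict) -> list:
    """Hypothetical pre-patch behavior: one top-level key, string items only."""
    queries_list = []
    for strategy in plan.get("search_strategies", []):
        for query in strategy.get("queries", []):
            if isinstance(query, str):
                queries_list.append(query)
    return queries_list


# Plan shape taken from the example in the commit message.
llm_plan: dict = {
    "search_phases": [
        {
            "phase": 1,
            "queries": [
                {"q1": "<first domain query>"},
                {"q2": "<second domain query>"},
            ],
        }
    ]
}

queries_list = extract_queries_strict(llm_plan)
print(queries_list)  # prints []
```

Because the result is an empty list rather than an error, nothing signals that a well-formed plan was discarded, and the caller proceeds to the default query builder.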
Repro
Any run where the LLM produces a plan with a top-level key other than `search_strategies`, or with queries wrapped as dicts. After Stage 3 completes:
- `stage-03/search_plan.yaml`: a well-formed plan
- `stage-03/queries.json`: only topic n-grams, i.e. `<first N words of topic>` plus the literal `" benchmark"` / `" survey"` / `" recent advances"`
- `stage-04/candidates.jsonl`: top hits match generic tokens rather than the intended domain
- `stage-05/shortlist.jsonl`: off-topic papers
Fix
- Accept `search_strategies`, `search_phases`, or `phases` at the top level.
- Within each `queries` list, handle items that are either strings or single-key dicts like `{"q1": "query text"}`.
- Accept a top-level `queries:` list as a last-resort fallback (same shape handling).
Plans that already use the original schema are unaffected.
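
For reviewers who want the accepted shapes in one place, here is a minimal sketch of a tolerant extractor. It is an illustration under stated assumptions, not the code in this patch: the names `PLAN_KEYS`, `coerce_query`, `collect`, and `extract_queries` are hypothetical, and the choice to try the top-level keys in order and stop at the first one that yields queries is an assumption about precedence.

```python
from typing import Any, Optional

# Top-level keys tried in order (ordering is an assumption of this sketch).
PLAN_KEYS = ("search_strategies", "search_phases", "phases")


def coerce_query(item: Any) -> Optional[str]:
    """Return query text from a string or a single-key dict, else None."""
    if isinstance(item, dict) and len(item) == 1:
        # Unwrap {"q1": "query text"} to "query text".
        item = next(iter(item.values()))
    if not isinstance(item, str):
        return None
    return item.strip() or None


def collect(items: Any) -> list:
    """Coerce every usable entry of a queries list; ignore anything else."""
    if not isinstance(items, list):
        return []
    return [q for q in map(coerce_query, items) if q is not None]


def extract_queries(plan: dict) -> list:
    """Gather queries from any accepted schema, or return an empty list."""
    queries: list = []
    for key in PLAN_KEYS:
        groups = plan.get(key)
        if not isinstance(groups, list):
            continue
        for group in groups:
            if isinstance(group, dict):
                queries.extend(collect(group.get("queries")))
        if queries:
            break
    if not queries:
        # Last-resort fallback: a flat top-level `queries` list.
        queries = collect(plan.get("queries"))
    return queries


plan = {"search_phases": [{"phase": 1, "queries": [{"q1": "alpha"}, "beta"]}]}
print(extract_queries(plan))  # prints ['alpha', 'beta']
```

An empty return value still lets the caller fall back to `_build_default_search_queries()`, so the fallback path remains reachable when the plan genuinely contains no queries.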
Test plan
- `queries.json` reflects the LLM's plan rather than topic n-grams
Notes for reviewers
Separately, `_build_default_search_queries()` is a weak fallback that produces low-quality queries regardless of domain. That is out of scope here: this patch only addresses why the fallback is being hit when the LLM did produce a good plan. Happy to follow up.