Summary
Four of the English stems in the multilingual structural-question gate (#1134) — call, trace,
affect, connect — match as unbounded prefixes, so ordinary non-technical English words that merely
start with one of these stems false-fire the gate's HIGH-tier (full-explore) branch.
Root cause
```typescript
// src/directory.ts:405-406
const STRUCTURAL_STEMS_RE = new RegExp(`${NOT_WORD_BEFORE}(?:${STRUCTURAL_STEMS.join('|')})`, 'iu');
```
STRUCTURAL_STEMS_RE enforces a LEFT boundary only (by design — see the docstring at
src/directory.ts:312-320 — so "architect" matches "architecture"). Most stems in the list only ever
complete into structural words, so the open right side is safe for them. But call, trace,
affect, connect have common English completions that aren't structural at all: callus,
calligraphy, callous, Connecticut, connective (tissue), affectionate, Tracey. The file's own review
rule for this list ("Add a stem only when every plausible completion is still a structural word",
src/directory.ts:318) doesn't hold for these four.
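To see the left-only boundary in isolation, here is a minimal TypeScript sketch that rebuilds the same shape of regex with just the four stems. The real definition of NOT_WORD_BEFORE isn't quoted in this report. The lookbehind below (no Unicode letter, digit, or underscore immediately before the stem) is an assumption. RISKY_STEMS and LEFT_ONLY_RE are hypothetical names.

```typescript
// Assumed stand-in for NOT_WORD_BEFORE: a negative lookbehind rejecting a
// Unicode letter, digit, or underscore just before the stem. The real
// constant in src/directory.ts may be written differently.
const NOT_WORD_BEFORE = String.raw`(?<![\p{L}\p{N}_])`;

// Only the four stems under discussion; the real STRUCTURAL_STEMS list is longer.
const RISKY_STEMS = ['call', 'trace', 'affect', 'connect'];

// Same shape as STRUCTURAL_STEMS_RE: a left boundary, then the bare stems,
// with nothing constraining what follows the stem.
const LEFT_ONLY_RE = new RegExp(`${NOT_WORD_BEFORE}(?:${RISKY_STEMS.join('|')})`, 'iu');

const samples = [
  'who calls this function', // intended match
  'recall the error',        // rejected: "call" is preceded by a letter
  'he has a callus',         // false positive: nothing checks the right side
  'Connecticut is a state',  // false positive
  'Tracey went home early',  // false positive
];

for (const s of samples) {
  console.log(`${LEFT_ONLY_RE.test(s)}\t${s}`);
}
```

The lookbehind rejects the mid-word case ("recall"). Nothing, however, stops a match from ending partway through "callus" or "Tracey".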
Repro (verified by direct execution against the built regex, node 22, node:sqlite backend)
```typescript
import { hasStructuralKeyword } from './src/directory';

hasStructuralKeyword('he has a callus'); // true — should be false
hasStructuralKeyword('Connecticut is a state'); // true — should be false
hasStructuralKeyword('she is very affectionate'); // true — should be false
hasStructuralKeyword('Tracey went home early'); // true — should be false
```
I have NOT run the full codegraph prompt-hook CLI end-to-end on these strings; the above is a direct unit-level call. However, the call site at src/bin/codegraph.ts:1094 (keyworded = hasStructuralKeyword(prompt)) feeds straight into the HIGH-tier branch (confirmed by reading that code). That branch runs codegraph_explore and writes a <codegraph_context> block. PR #1134's own reported payload sizes put HIGH-tier injections at around 16KB on this repo for other prompts, so I'd expect a comparable cost here, though I haven't measured it for these exact strings.
No existing test catches this. The mid-word guards in __tests__/frontload-hook.test.ts ("restructure this paragraph", "an independent module", lines 230-231) only exercise the LEFT boundary.
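A regression test pinning the RIGHT boundary, mirroring the left-boundary guards at lines 230-231, might look roughly like this. It is deliberately framework-free. rightBoundaryMismatches and the leftOnlyRe stand-in (with its assumed lookbehind form) are hypothetical; in the real suite the predicate would be hasStructuralKeyword itself. MUST_NOT_MATCH holds the repro strings from this report; the MUST_MATCH prompts are illustrative only.

```typescript
// Repro strings from this report: non-structural words that merely start
// with one of the four stems. None of these should trip the gate.
const MUST_NOT_MATCH = [
  'he has a callus',
  'Connecticut is a state',
  'she is very affectionate',
  'Tracey went home early',
];

// Illustrative structural prompts that should keep tripping it.
const MUST_MATCH = [
  'who calls parseConfig',
  'what is affected if I change this',
  'where is the connection pool set up',
];

// Hypothetical helper: lists every prompt the predicate gets wrong.
function rightBoundaryMismatches(isStructural: (prompt: string) => boolean): string[] {
  const mismatches: string[] = [];
  for (const p of MUST_NOT_MATCH) {
    if (isStructural(p)) mismatches.push(`false positive: ${p}`);
  }
  for (const p of MUST_MATCH) {
    if (!isStructural(p)) mismatches.push(`false negative: ${p}`);
  }
  return mismatches;
}

// Stand-in for today's left-boundary-only behaviour (assumed lookbehind form).
const leftOnlyRe = new RegExp(String.raw`(?<![\p{L}\p{N}_])(?:call|trace|affect|connect)`, 'iu');

// Reports all four MUST_NOT_MATCH prompts as false positives.
console.log(rightBoundaryMismatches((p) => leftOnlyRe.test(p)));
```

With the bounded patterns from the suggested fix, the same check should come back empty.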
Impact
Any prompt in which one of these four stems appears only as the start of an unrelated word gets the same unnecessary explore-and-inject cost that the STRUCTURAL_WORDS exact-match class exists to avoid (per its own docstring: "short or ambiguous tokens where prefix matching would false-positive"). "call" in particular is a common enough prefix that this seems likely to fire in normal use, though I don't have production telemetry to say how often.
Suggested fix
Bound the four risky stems to their known derivational suffixes, matching the suffix-enumeration
pattern STRUCTURAL_WORDS already uses elsewhere in this same file (e.g. reach(?:es|ed)?):
```typescript
`call(?:s|ing|ed|ers?)?${NOT_WORD_AFTER}`
`trace(?:s|d|rs?)?${NOT_WORD_AFTER}`
`affect(?:s|ed|ing)?${NOT_WORD_AFTER}`
`connect(?:s|ed|ing|ions?|ors?)?${NOT_WORD_AFTER}`
```
Verified by direct execution: calls/calling/called/caller(s), traces/traced/tracer(s),
affects/affected/affecting, connects/connected/connecting/connection(s)/connector(s) all still
match; callus/calligraphy/callous/Connecticut/connective/affectionate/Tracey no longer do. All 31
existing tests in frontload-hook.test.ts still pass with this change. The other 6 English stems
(architect, structur, depend, implement, impact, explain) didn't turn up demonstrable non-technical
collisions in my check, so I'd leave those as open prefixes rather than touch what isn't
demonstrated broken.
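For reference, here is a self-contained sketch of the bounded alternation. It combines the four patterns above with the six English stems left open. NOT_WORD_BEFORE and NOT_WORD_AFTER are written as Unicode-aware lookarounds, which is an assumption about how the real constants are defined. The real STRUCTURAL_STEMS list presumably also carries the non-English stems added by #1134, which are omitted here. FIXED_STEMS_RE is a hypothetical name.

```typescript
// Assumed definitions; the real constants in src/directory.ts may differ.
const NOT_WORD_BEFORE = String.raw`(?<![\p{L}\p{N}_])`;
const NOT_WORD_AFTER = String.raw`(?![\p{L}\p{N}_])`;

// The four risky stems, each limited to its enumerated suffixes and then
// required to end at a word boundary (patterns copied from the suggested fix).
const BOUNDED_STEMS = [
  `call(?:s|ing|ed|ers?)?${NOT_WORD_AFTER}`,
  `trace(?:s|d|rs?)?${NOT_WORD_AFTER}`,
  `affect(?:s|ed|ing)?${NOT_WORD_AFTER}`,
  `connect(?:s|ed|ing|ions?|ors?)?${NOT_WORD_AFTER}`,
];

// The other six English stems stay open on the right, as before.
const OPEN_STEMS = ['architect', 'structur', 'depend', 'implement', 'impact', 'explain'];

// Same construction as STRUCTURAL_STEMS_RE: one left boundary, then the alternation.
const FIXED_STEMS_RE = new RegExp(
  `${NOT_WORD_BEFORE}(?:${[...OPEN_STEMS, ...BOUNDED_STEMS].join('|')})`,
  'iu',
);

const stillMatch = [
  'call', 'calls', 'calling', 'called', 'caller', 'callers',
  'traces', 'traced', 'tracer', 'tracers',
  'affects', 'affected', 'affecting',
  'connects', 'connected', 'connecting', 'connection', 'connections', 'connector', 'connectors',
  'architecture',
];
const noLongerMatch = ['callus', 'calligraphy', 'callous', 'Connecticut', 'connective', 'affectionate', 'Tracey'];

for (const w of stillMatch) console.log(`${w}: ${FIXED_STEMS_RE.test(w)}`); // each prints true
for (const w of noLongerMatch) console.log(`${w}: ${FIXED_STEMS_RE.test(w)}`); // each prints false
```

The lookahead sits inside each bounded alternative rather than after the whole group. This keeps the open stems (e.g. "architect" in "architecture") free to run past a word boundary.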
Verification / scope
… feat(directory): allow local-only codegraph gitignore, #1013), none touch this line range (STRUCTURAL_STEMS / STRUCTURAL_STEMS_RE, lines 312-406).
… related terms: only #994 (prompt-hook gate regex skips non-English (Chinese) prompts) and #1126 (prompt-hook gate still skips non-English Latin-script prompts (French), a follow-up to #994/#1004) came up. Those are the issues #1134 itself fixed; nothing about this specific boundary gap.
Environment
Found on main (tip e699ee9, v1.2.0). Not yet in a tagged release (still origin/main as of 2026-07-02).
Happy to send a PR for the fix (I already have it written and tested locally), or just leave this
filed if you'd rather pick it up yourself. Let me know which you'd prefer.