test: fix CI regressions (Vertex/litellmrouter construction, dedup config, trace recorder leak)#2491
Merged
Merged
Conversation
…onfig The dedup merge/update path builds a search_vector UPDATE clause from config.text_search_extension (+ _native_language) since #2425, but the _dedup_reconcile_create / _dedup_reconcile_update test configs only set consolidation_dedup_threshold, so the two merge-path tests raised AttributeError: 'types.SimpleNamespace' object has no attribute 'text_search_extension' on main. Add the two fields (production defaults native/english) to those configs. The clause reuses $1, so the existing positional-arg assertions are unchanged.
Regression: LLMConfig was refactored to use vertexai_project_id/region/ service_account_key as-passed (the caller resolves the global-config fallback), but the llm_config fixture never forwarded them. So with the CI provider set to vertexai, LLMConfig raised "HINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID is required" even though the env var was set — the test errored in the test-api job. Forward the three Vertex settings from config (mirroring MemoryEngine's own LLMConfig construction). Verified: LLMConfig(provider="vertexai", ...) now constructs with project_id passed, and still raises when it is omitted.
b29bcfd to
31cc360
Compare
_make_llm() built an LLMProvider from the env-selected provider without
forwarding provider-specific settings, so the vertexai and litellmrouter
acceptance-matrix jobs failed at construction ("VERTEXAI_PROJECT_ID is required"
/ "litellmrouter requires a config object"). LLMProvider uses these as-passed
(it does not resolve them from global config), so forward
vertexai_project_id/region/service_account_key and litellmrouter_config.
31cc360 to
4da6a13
Compare
Root cause of the flaky test_llm_trace::test_disabled_writes_no_rows: MemoryEngine.__init__ registers its LLM-trace recorder in a process-global registry, and only close() removes it. Tests that construct an engine directly (test_per_operation_llm_config, test_llm_reasoning_effort_env, etc.) never close it, leaking an ENABLED recorder. Locally those recorders' writes fail (uninitialized backend), but in CI a leaked recorder with a live backend records a later test's LLM calls into the shared DB — so test_disabled_writes_ no_rows sees rows for its bank even though its own recorder is disabled (assert N == 0). Reproduced: after test_per_operation_llm_config the registry holds 8 enabled recorders. Add an autouse fixture that snapshots the registry and removes anything a test leaked. Verified: the registry drops from 8 leaked recorders back to 1. _teardown_memory_engine already guards the fixtures; this guards direct constructions. 53 trace + leak-risk tests pass together under -n2.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes CI failures on
main. Test-only changes.1. Vertex AI regression — provider built without the Vertex settings
LLMConfig/LLMProviderwere refactored to use provider-specific settings as-passed (not resolved from global config). CI runstest-api/Core LLM/acceptance withHINDSIGHT_API_LLM_PROVIDER=vertexai, and two constructors built the provider without forwardingvertexai_project_id/region/service_account_key, raisingHINDSIGHT_API_LLM_VERTEXAI_PROJECT_ID is required:test_fact_extraction_analysis.py::llm_config(ERROR intest-api)test_llm_provider.py::_make_llm(FAILED inLLM acceptance (vertexai))2. litellmrouter regression — same class
test_llm_provider.py::_make_llmdidn't forwardlitellmrouter_configeither →LLM acceptance (litellmrouter)failed withProvider 'litellmrouter' requires a config object.3.
test_consolidation_dedup— 2 merge-path testsSince #2425 the dedup merge path reads
config.text_search_extension(+_native_language); the test configs only setconsolidation_dedup_threshold. Added the fields (defaultsnative/english).4.
test_llm_trace::test_disabled_writes_no_rows— leaked-recorder flake (#2229)MemoryEngine.__init__registers its LLM-trace recorder in a process-global registry; onlyclose()removes it. Tests that construct an engine directly (e.g.test_per_operation_llm_config) never close it, leaking an enabled recorder. Locally its writes fail (uninitialized backend) so 0 rows; in CI a leaked recorder with a live backend records a later test's LLM calls into the shared DB, sotest_disabled_writes_no_rowssees rows for its bank even though its own recorder is disabled (assert N == 0). Reproduced: aftertest_per_operation_llm_configthe registry holds 8 enabled recorders. Added an autouse fixture that drops any recorder a test leaked (registry goes 8 → 1)._teardown_memory_enginealready guarded the fixtures; this guards direct constructions.Verified
test_consolidation_dedup.py→ 18 passed.LLMConfig/LLMProviderconstruct with the params passed, still raise when omitted (repros the CI errors). Prior run confirmedtest-api (1/3)and both acceptance jobs went green.-n2; leaked-recorder count 8 → 1.Out of scope (pre-existing, not this PR)
test_fact_extraction_qualityjudge tests — real-LLM model variability (a different one flakes each run).test_migration_observation_search_vector_backfill/test-*-client-oracle— CI infra (embedded-PG / Oracle DB startup).