Skip to content

test(core): regression guard for vector-row cleanup on entity delete (#764)#771

Merged
phernandez merged 2 commits intomainfrom
fix/delete-entity-vectors-cleanup
Apr 29, 2026
Merged

test(core): regression guard for vector-row cleanup on entity delete (#764)#771
phernandez merged 2 commits intomainfrom
fix/delete-entity-vectors-cleanup

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

What the test does

Spies on repository.delete_entity_vector_rows via monkeypatch, calls handle_delete, asserts the spy was called with the entity id. Forces _semantic_enabled=True so we exercise the real delegation rather than the no-op short-circuit.

Out of scope (filed in the issue, will land separately)

  • pavelasm: project info doesn't surface orphaned search_vector_embeddings rows from pre-fix DBs. Reporting gap, not a correctness bug.
  • natedev: the watch service single-file sync path doesn't refresh vectors on UPDATE. Different code path (sync_service.sync_file), unrelated to the delete-path issue.

I'll recommend in the issue comment that those land as separate issues so they get individual triage.

Test plan

  • pytest tests/services/test_search_service.py::test_handle_delete_clears_entity_vectors --no-cov — passes
  • Existing test_delete_entity_without_permalink still passes
  • ruff check clean

Closes #764

🤖 Generated with Claude Code

…764)

The delete-path cleanup the reporter asked for already shipped in PR #733
("clean up delete vectors and cloud sync", merged 2026-04-10): handle_delete
calls _clear_entity_vectors which delegates to
repository.delete_entity_vector_rows. Issue #764 was filed two weeks later
without referencing #733, so this PR locks the wiring in with a regression
test.

The test spies on the repository's delete_entity_vector_rows and asserts
handle_delete invokes it with the entity id. monkeypatch flips
_semantic_enabled=True so we exercise the real delegation rather than the
no-op short-circuit that fires when semantic is disabled.

The two follow-up comments on #764 describe distinct issues that are out
of scope here:
  - pavelasm: project info doesn't surface orphaned vector embeddings.
    Reporting gap for pre-existing drift, not a delete-path bug.
  - natedev: the watch service single-file sync path doesn't refresh
    vectors on update. Different code path (sync_service.sync_file), not
    delete-related.
Both should land as separate issues.

Closes #764

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @phernandez's task in 1m 21s —— View job


PR Review

  • Gather context and understand the request
  • Read changed file (tests/services/test_search_service.py)
  • Review against Basic Memory team checklist
  • Post final review

This is a clean, focused regression test. The implementation is sound and aligns with project conventions. A few observations below.


Code Quality & Standards

  • ✅ Follows Basic Memory coding conventions
  • ✅ Python 3.12+ type annotations (calls: list[int] = [])
  • ✅ 100-character line limit maintained
  • ✅ No commented-out code
  • ✅ Docstring clearly explains the regression purpose and what the test is guarding against

Testing & Documentation

  • ✅ Regression test guards the correct boundary (repository.delete_entity_vector_rows)
  • ✅ Forces _semantic_enabled=True to bypass the short-circuit in _clear_entity_vectors — correct approach
  • ✅ Assertion message is informative on failure
  • ✅ Placed next to the existing test_delete_entity_without_permalink — good proximity
  • ✅ 100% coverage is preserved (test-only PR, no production code added)

Minor Observations

raising=False on _semantic_enabled (test_search_service.py, new test ~line 283):

monkeypatch.setattr(
    search_service.repository, "_semantic_enabled", True, raising=False
)

Using raising=False means if _semantic_enabled is ever renamed or removed, the monkeypatch silently succeeds but the test would fail later (or behave unexpectedly) instead of failing at the patch site. Per CLAUDE.md's "Fail fast" guideline, the default raising=True is safer and would immediately surface a typo or rename. The fixture's repository object is a real instance that does carry this attribute, so raising=True would work fine here.

Fix this →

Spy return type is -> None on an async def, which is correct Python. No issue here.

Architecture & Compatibility

  • ✅ No MCP, DB, or file handling changes — test-only
  • ✅ No file path comparisons, no unicode/emoji in output
  • ✅ No hardcoded secrets or credentials

Summary

The test is well-scoped, documents the regression risk clearly, and exercises the right wiring. The raising=False flag is the only actionable nit — removing it would align with the project's fail-fast convention. Everything else looks good. Happy to approve once that's addressed, or if intentional (e.g. the fixture's repo type varies between test configs), a short comment explaining the choice would suffice.

The repository instance under test always exposes `_semantic_enabled`,
so `raising=False` was masking a hypothetical rename rather than
guarding against a real condition. Use the default raising=True so a
future rename or removal fails loudly at the patch site instead of
mutating drift downstream.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

Claude finished @phernandez's task in 41s —— View job


PR Review

  • Read changed file and verify previous feedback addressed
  • Review against Basic Memory team checklist
  • Post final review

The follow-up commit (7386929) addressed the one nit from the previous review. Here's the full checklist pass on the final state.


Code Quality & Standards

  • ✅ Follows Basic Memory coding conventions in CLAUDE.md
  • ✅ Python 3.12+ type annotations (calls: list[int] = [], entity_id: int)
  • ✅ SQLAlchemy 2.0 — N/A (test-only)
  • ✅ FastAPI/Typer conventions — N/A (test-only)
  • ✅ 100-character line length maintained
  • ✅ No commented-out code blocks

Testing & Documentation

  • ✅ Regression test guards correct boundary (repository.delete_entity_vector_rows)
  • ✅ Integration tests for MCP tools — N/A (no new tools added)
  • ✅ Docstring clearly explains the issue (entity deletion leaves orphaned rows in search_vector_chunks and search_vector_embeddings #764), what orphaned rows are left behind, and why _semantic_enabled=True is forced
  • ✅ 100% coverage preserved (test-only PR)
  • raising=False removed in follow-up commit — monkeypatch.setattr now uses default raising=True, consistent with CLAUDE.md's fail-fast convention

Basic Memory Architecture

  • ✅ Test-only change — no MCP, database, file, or knowledge-graph changes
  • ✅ No Alembic migrations needed
  • ✅ Local-first architecture unaffected

Security & Performance

  • ✅ No hardcoded secrets or credentials
  • ✅ Input validation — N/A (test-only)
  • ✅ No sensitive data in logs or commits

Compatibility

  • ✅ No file path comparisons
  • ✅ No emoji or unicode in console/log output

Summary

Clean regression test, well-scoped, and the raising=False nit from the previous review is resolved. All checklist items pass. Ready to merge.

@phernandez phernandez merged commit 953fe20 into main Apr 29, 2026
40 of 41 checks passed
@phernandez phernandez deleted the fix/delete-entity-vectors-cleanup branch April 29, 2026 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

entity deletion leaves orphaned rows in search_vector_chunks and search_vector_embeddings

1 participant