
entity deletion leaves orphaned rows in search_vector_chunks and search_vector_embeddings #764

@pavelasm

Description

Scope

SearchService.handle_delete() removes search_index (FTS) entries but does not clean up vector data. After EntityService.delete_entity() returns, search_vector_chunks and search_vector_embeddings still contain rows for the deleted entity. Reproduced on SQLite; Postgres appears to be affected through the same delete path and schema design: search_vector_chunks has no entity FK in the Postgres migration, and embeddings cascade only from chunk deletion, not from entity deletion.

Reproduced (SQLite)

write-note → delete via the delete_note() MCP tool → row counts afterwards: after_entity=0, after_fts=0, orphaned_chunks=1, orphaned_embeddings=1.
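Counts like orphaned_chunks and orphaned_embeddings can be computed with anti-join queries. The sketch below is one way to do it, not necessarily the query used for the report above. It assumes a stripped-down, hypothetical schema: the column names (search_vector_chunks.entity_id, search_vector_embeddings.chunk_id) follow the issue text, and a plain table stands in for the sqlite-vec vec0 virtual table so the example runs with only the standard library.

```python
import sqlite3

# Hypothetical minimal schema. Column names follow the issue text, but the real
# tables have more columns, and on SQLite search_vector_embeddings is a vec0
# virtual table; a plain table stands in for it here.
SCHEMA = """
CREATE TABLE entity (id INTEGER PRIMARY KEY);
CREATE TABLE search_vector_chunks (id INTEGER PRIMARY KEY, entity_id INTEGER);
CREATE TABLE search_vector_embeddings (chunk_id INTEGER);
"""

# Chunks whose entity row no longer exists.
ORPHANED_CHUNKS = """
SELECT COUNT(*) FROM search_vector_chunks c
LEFT JOIN entity e ON e.id = c.entity_id
WHERE e.id IS NULL
"""

# Embeddings whose chunk, or whose chunk's entity, no longer exists.
ORPHANED_EMBEDDINGS = """
SELECT COUNT(*) FROM search_vector_embeddings v
LEFT JOIN search_vector_chunks c ON c.id = v.chunk_id
LEFT JOIN entity e ON e.id = c.entity_id
WHERE e.id IS NULL
"""


def count_orphans(conn: sqlite3.Connection) -> tuple[int, int]:
    chunks = conn.execute(ORPHANED_CHUNKS).fetchone()[0]
    embeddings = conn.execute(ORPHANED_EMBEDDINGS).fetchone()[0]
    return chunks, embeddings


def reproduce() -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entity (id) VALUES (1)")
    conn.execute("INSERT INTO search_vector_chunks (id, entity_id) VALUES (10, 1)")
    conn.execute("INSERT INTO search_vector_embeddings (chunk_id) VALUES (10)")
    # Mimic the current delete path: the entity row is removed, vector rows are not.
    conn.execute("DELETE FROM entity WHERE id = 1")
    return count_orphans(conn)


if __name__ == "__main__":
    print(reproduce())  # (1, 1)
```

After the fix, the same two queries should both return 0 for a deleted entity, so they could double as a regression check.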

Proposed fix

  1. Add delete_entity_vectors(entity_id: int) to the public SearchRepository protocol (search_repository.py). SearchService is typed against this protocol, so the method must be declared there before the service can call it.

  2. Implement it for both backends with a shared SearchRepositoryBase.delete_entity_vectors(), which acquires a session and calls the existing private _delete_entity_chunks(). Each backend's _delete_entity_chunks() already handles its vector tables:

    • SQLite (SQLiteSearchRepository): _delete_entity_chunks already deletes search_vector_embeddings rows first (no CASCADE on the sqlite-vec virtual table), then deletes search_vector_chunks rows.
    • Postgres (PostgresSearchRepository): _delete_entity_chunks already deletes only search_vector_chunks rows; search_vector_embeddings.chunk_id is declared REFERENCES search_vector_chunks(id) ON DELETE CASCADE, so the embeddings are removed automatically.
  3. Expose through SearchService as delete_entity_vectors(entity_id: int).

  4. Call await self.delete_entity_vectors(entity.id) from handle_delete() after the FTS cleanup loop, before the caller removes the entity row.
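The four steps could fit together roughly as in the Python sketch below. Everything in it is hypothetical scaffolding: the real code uses the project's own session type and SQL, and the exact signatures of _delete_entity_chunks() and handle_delete() are guesses. To stay runnable with only the standard library, a sqlite3 connection stands in for the session, a plain table stands in for the vec0 table, and the Postgres ON DELETE CASCADE is reproduced with SQLite foreign keys.

```python
import asyncio
import sqlite3
from types import SimpleNamespace
from typing import Protocol


class SearchRepository(Protocol):
    # Step 1: declared on the protocol that SearchService is typed against.
    async def delete_entity_vectors(self, entity_id: int) -> None: ...


class SearchRepositoryBase:
    # Hypothetical: a sqlite3 connection stands in for the real session factory.
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    async def delete_entity_vectors(self, entity_id: int) -> None:
        # Step 2: open a transaction ("session") and reuse the backend helper.
        # sqlite3's context manager commits on success and rolls back on error.
        with self.conn as session:
            self._delete_entity_chunks(session, entity_id)

    def _delete_entity_chunks(self, session: sqlite3.Connection, entity_id: int) -> None:
        raise NotImplementedError


class SQLiteSearchRepository(SearchRepositoryBase):
    def _delete_entity_chunks(self, session: sqlite3.Connection, entity_id: int) -> None:
        # No cascade into the vec0 table: delete embeddings first, then chunks.
        session.execute(
            "DELETE FROM search_vector_embeddings WHERE chunk_id IN "
            "(SELECT id FROM search_vector_chunks WHERE entity_id = ?)",
            (entity_id,),
        )
        session.execute("DELETE FROM search_vector_chunks WHERE entity_id = ?", (entity_id,))


class PostgresSearchRepository(SearchRepositoryBase):
    def _delete_entity_chunks(self, session: sqlite3.Connection, entity_id: int) -> None:
        # Embeddings go via ON DELETE CASCADE on search_vector_embeddings.chunk_id.
        # (Exercised against SQLite here, hence the "?" placeholder.)
        session.execute("DELETE FROM search_vector_chunks WHERE entity_id = ?", (entity_id,))


class SearchService:
    def __init__(self, repository: SearchRepository) -> None:
        self.repository = repository

    async def delete_entity_vectors(self, entity_id: int) -> None:
        # Step 3: thin pass-through exposed on the service.
        await self.repository.delete_entity_vectors(entity_id)

    async def handle_delete(self, entity: SimpleNamespace) -> None:
        # (The existing FTS cleanup loop over search_index would run here.)
        # Step 4: vector cleanup after FTS cleanup, before the caller deletes the entity row.
        await self.delete_entity_vectors(entity.id)


def make_db(cascade: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    fk = "REFERENCES search_vector_chunks(id) ON DELETE CASCADE" if cascade else ""
    conn.executescript(f"""
        CREATE TABLE search_vector_chunks (id INTEGER PRIMARY KEY, entity_id INTEGER);
        CREATE TABLE search_vector_embeddings (chunk_id INTEGER {fk});
        INSERT INTO search_vector_chunks VALUES (10, 1), (20, 2);
        INSERT INTO search_vector_embeddings VALUES (10), (20);
    """)
    return conn


def remaining(conn: sqlite3.Connection) -> tuple[int, int]:
    chunks = conn.execute("SELECT COUNT(*) FROM search_vector_chunks").fetchone()[0]
    embeddings = conn.execute("SELECT COUNT(*) FROM search_vector_embeddings").fetchone()[0]
    return chunks, embeddings


async def demo() -> list[tuple[int, int]]:
    results = []
    # SQLite backend: no FK (like vec0). Postgres stand-in: FK with ON DELETE CASCADE.
    for repo_cls, cascade in ((SQLiteSearchRepository, False), (PostgresSearchRepository, True)):
        conn = make_db(cascade)
        await SearchService(repo_cls(conn)).handle_delete(SimpleNamespace(id=1))
        results.append(remaining(conn))
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [(1, 1), (1, 1)]
```

In both cases only entity 2's chunk and embedding survive. Keeping the backend difference inside _delete_entity_chunks() lets the protocol method, the base implementation, and the service call stay identical across SQLite and Postgres.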

Out of scope

Schema-level FK/migration redesign. A foreign key from search_vector_chunks.entity_id -> entity(id) could be considered later, but it would require a migration and would still not fully solve SQLite cleanup, because search_vector_embeddings is a vec0 virtual table and cannot rely on normal FK cascades. Explicit service-layer cleanup is still required for correctness across both backends.
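The SQLite limitation in this note can be seen in a small standard-library sketch. It adds the hypothetical search_vector_chunks.entity_id -> entity(id) FK with ON DELETE CASCADE and uses a plain table as a stand-in for the vec0 embeddings table, which cannot declare a foreign key. Deleting the entity then removes the chunk but leaves the embedding behind.

```python
import sqlite3


def orphaned_after_entity_delete() -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE entity (id INTEGER PRIMARY KEY);
        -- Hypothetical future FK from the out-of-scope note.
        CREATE TABLE search_vector_chunks (
            id INTEGER PRIMARY KEY,
            entity_id INTEGER REFERENCES entity(id) ON DELETE CASCADE);
        -- Plain-table stand-in for the vec0 virtual table, which has no FK to cascade through.
        CREATE TABLE search_vector_embeddings (chunk_id INTEGER);
        INSERT INTO entity VALUES (1);
        INSERT INTO search_vector_chunks VALUES (10, 1);
        INSERT INTO search_vector_embeddings VALUES (10);
    """)
    conn.execute("DELETE FROM entity WHERE id = 1")
    chunks = conn.execute("SELECT COUNT(*) FROM search_vector_chunks").fetchone()[0]
    embeddings = conn.execute("SELECT COUNT(*) FROM search_vector_embeddings").fetchone()[0]
    return chunks, embeddings


if __name__ == "__main__":
    print(orphaned_after_entity_delete())  # (0, 1)
```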

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
