fix(store): migrate sqlite-vec table to partition keys#416

Open
arabold wants to merge 2 commits into main from fix/411-sqlite-vec-partition-keys

arabold commented May 18, 2026

Summary

  • add migration 014 to rebuild documents_vec with sqlite-vec partition keys for library_id and version_id
  • preserve compatible existing vectors and backfill missing rows from documents.embedding
  • update runtime vector table reconciliation and model-change invalidation to create the partition-key schema
  • restore direct partition-filtered KNN search in DocumentStore.findByContent()
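
As a rough illustration of the partition-key schema described above, the following TypeScript sketch builds a `vec0` table definition with `library_id` and `version_id` declared as partition keys. The helper name `buildVecTableDdl`, the `embedding` column name, and the example dimension are assumptions for illustration only; the actual DDL lives in migration 014 and may differ.

```typescript
// Sketch only: `buildVecTableDdl` and the `embedding` column name are
// hypothetical; only documents_vec, library_id and version_id come from the PR.
interface VecTableOptions {
  tableName: string;
  dimension: number; // embedding vector length; must match the active model
}

function buildVecTableDdl({ tableName, dimension }: VecTableOptions): string {
  if (!Number.isInteger(dimension) || dimension <= 0) {
    throw new Error(`invalid embedding dimension: ${dimension}`);
  }
  // Identifiers cannot be bound as parameters, so validate before interpolating.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
    throw new Error(`invalid table name: ${tableName}`);
  }
  return [
    `CREATE VIRTUAL TABLE ${tableName} USING vec0(`,
    `  library_id integer partition key,`,
    `  version_id integer partition key,`,
    `  embedding float[${dimension}]`,
    `);`,
  ].join("\n");
}

// Usage: buildVecTableDdl({ tableName: "documents_vec", dimension: 1536 })
```

Declaring the two columns as partition keys lets sqlite-vec shard the index internally, so a query constrained on both values only scans the matching shard.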

Fixes #411.

Migration note

This rebuilds the sqlite-vec virtual table. Operators should stop the server and keep a database backup before applying the migration on real data.

Verification

  • zsh -lc 'source ~/.nvm/nvm.sh && nvm use 22 >/dev/null && npx vitest run src/store/applyMigrations.test.ts && npx vitest run src/store/DocumentStore.test.ts -t "partition-filtered vector search|rebuild old metadata-column vec table"'
  • zsh -lc 'source ~/.nvm/nvm.sh && nvm use 22 >/dev/null && npm run typecheck'
  • zsh -lc 'source ~/.nvm/nvm.sh && nvm use 22 >/dev/null && npm run lint' reports existing unrelated optional-chain warnings in scraper/web files

Copilot AI left a comment

Pull request overview

This PR migrates the sqlite-vec documents_vec index to use partition keys so vector search can efficiently filter by library and version.

Changes:

  • Adds migration 014 to rebuild documents_vec with partition keys and backfill vectors.
  • Updates DocumentStore vector-table reconciliation and hybrid search to use the partition-key schema.
  • Adds tests for partition-key migration, runtime rebuild, and partition-filtered search.
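
To make the partition-filtered search concrete, here is a minimal sketch of how a KNN query against such a table might be assembled. The function name `buildPartitionedKnnQuery`, the `embedding` column name, and the selected columns are assumptions; the real query in `DocumentStore.findByContent()` is not shown in this conversation and is likely more involved because it feeds hybrid search.

```typescript
// Sketch only: names other than documents_vec, library_id and version_id
// are hypothetical and not taken from the PR diff.
interface KnnQuery {
  sql: string;
  params: [number, number, string, number];
}

function buildPartitionedKnnQuery(
  libraryId: number,
  versionId: number,
  queryVector: number[],
  k: number,
): KnnQuery {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error(`k must be a positive integer, got ${k}`);
  }
  const sql = [
    "SELECT rowid, distance",
    "FROM documents_vec",
    "WHERE library_id = ?", // partition key: restricts the scan to one library
    "  AND version_id = ?", // partition key: restricts the scan to one version
    "  AND embedding MATCH ?", // sqlite-vec accepts a JSON array as the query vector
    "  AND k = ?",
    "ORDER BY distance",
  ].join("\n");
  return { sql, params: [libraryId, versionId, JSON.stringify(queryVector), k] };
}
```

The returned `sql` and `params` would be handed to whichever SQLite driver the store uses; because the filter sits on the partition keys, the nearest-neighbour scan should not need a post-filter step over unrelated libraries.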

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| db/migrations/014-rebuild-vector-partition-keys.sql | Adds the vector-table rebuild migration. |
| src/store/DocumentStore.ts | Reworks vector table creation/rebuild logic and restores direct partition-filtered KNN search. |
| src/store/DocumentStore.test.ts | Adds/updates runtime vector schema and search tests. |
| src/store/applyMigrations.test.ts | Adds migration assertions and vector backfill coverage. |
Comments suppressed due to low confidence (1)

src/store/applyMigrations.test.ts:220

  • This test removes every existing vector before re-running migration 014, so it only verifies the backfill path from documents.embedding and does not exercise the migration's preservation path for rows that exist only in the old documents_vec table. Add a case that keeps an old vector row (ideally with no documents.embedding) so regressions in the preservation logic are caught.
      DELETE FROM documents_vec;

Comment thread db/migrations/014-rebuild-vector-partition-keys.sql Outdated
Comment thread src/store/DocumentStore.ts Outdated
Comment thread src/store/DocumentStore.test.ts Outdated
Comment thread src/store/applyMigrations.test.ts Outdated
Comment thread db/migrations/014-rebuild-vector-partition-keys.sql Outdated
arabold commented May 18, 2026

Addressed Copilot review comments in 4b010de:

  • Migration and runtime rebuild now derive library_id/version_id from current documents -> pages -> versions rows while preserving only the old vector embedding.
  • Staging tables are disk-backed regular tables (_documents_vec_*) instead of temporary tables, avoiding temp_store = MEMORY amplification for large vector indexes.
  • Migration test now awaits applyMigrations() and covers both old-vector preservation and documents.embedding backfill.
  • Runtime rebuild test now verifies stale old vector metadata gets corrected to current partition keys, with the old vector row as the only embedding source.
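
The rebuild rule described in the bullets above can be modelled in memory as follows. This is only a sketch of the stated behaviour, not the migration itself (which is SQL): the row shapes, field names such as `pageId` and `versionId`, the dimension check, and the function `rebuildVecRows` are all assumptions introduced for illustration.

```typescript
// Sketch only: hypothetical row shapes modelling documents -> pages -> versions.
interface DocumentRow { id: number; pageId: number; embedding: number[] | null }
interface PageRow { id: number; versionId: number }
interface VersionRow { id: number; libraryId: number }
// Old vec rows carry possibly stale metadata; only `embedding` is trusted.
interface OldVecRow { rowid: number; libraryId: number; versionId: number; embedding: number[] }
interface NewVecRow { rowid: number; libraryId: number; versionId: number; embedding: number[] }

function rebuildVecRows(
  documents: DocumentRow[],
  pages: PageRow[],
  versions: VersionRow[],
  oldVec: OldVecRow[],
  dimension: number,
): NewVecRow[] {
  const pageById = new Map<number, PageRow>(pages.map((p): [number, PageRow] => [p.id, p]));
  const versionById = new Map<number, VersionRow>(versions.map((v): [number, VersionRow] => [v.id, v]));
  const oldById = new Map<number, OldVecRow>(oldVec.map((o): [number, OldVecRow] => [o.rowid, o]));

  const rows: NewVecRow[] = [];
  for (const doc of documents) {
    // Partition keys always come from the current relational rows.
    const page = pageById.get(doc.pageId);
    const version = page ? versionById.get(page.versionId) : undefined;
    if (!page || !version) continue;

    // Prefer a dimension-compatible old vector, else backfill from documents.embedding.
    const preserved = oldById.get(doc.id)?.embedding;
    const candidate = preserved && preserved.length === dimension ? preserved : doc.embedding;
    if (!candidate || candidate.length !== dimension) continue;

    rows.push({ rowid: doc.id, libraryId: version.libraryId, versionId: version.id, embedding: candidate });
  }
  return rows;
}
```

Under this model, an old vector row with stale `libraryId`/`versionId` still contributes its embedding but is re-keyed to the document's current library and version, and old vector rows with no matching document are dropped.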

Verification under Node 22:

  • npx vitest run src/store/applyMigrations.test.ts
  • npx vitest run src/store/DocumentStore.test.ts -t "partition-filtered vector search|rebuild old metadata-column vec table"

Development

Successfully merging this pull request may close these issues.

Fix sqlite-vec search performance with proper partition-key migration