Summary
openDatabase() in src/db.ts sets no busy_timeout, and SQLite's default is 0. The index runs in WAL mode, which allows exactly one writer at a time, so whenever two qmd processes overlap, the second writer's first write statement throws SQLITE_BUSY: database is locked instead of waiting for the lock.
How it bites in practice
- update vs a long embed. A large-corpus embed (or the post-upgrade re-embed backlog) runs for hours, committing per batch. Any qmd update launched meanwhile (second shell, cron job, editor integration) dies at its first content write: insertContent / indexFiles -> SQLITE_BUSY. The update errors out and that pass reports 0 chunks embedded.
- First-open schema migration vs anything. After an upgrade, the first command to open the DB runs the one-time migration over the whole index (minutes on a multi-GB index). Any other qmd command issued during it throws SQLITE_BUSY at initializeDatabase / DROP TRIGGER, which looks like a corrupted install but is just lock contention.
For a tool that encourages background and scheduled refreshes of a personal corpus, two qmd processes overlapping is the normal case, not an edge case.
Repro
# terminal A: large backlog
qmd embed --max-docs-per-batch 500
# terminal B, while A runs:
qmd update # -> SQLITE_BUSY at insertContent/indexFiles
Environment
- qmd 2.5.3 (3f751cd), git install
- Windows 11, index ~10 GB / ~72k docs across 4 collections
- Reproduces under BOTH runtimes (bun:sqlite and node/better-sqlite3); neither sets a busy timeout by default
Proposed fix
Set a busy timeout on the shared open path so writers queue instead of failing:
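// src/db.ts
export function openDatabase(path: string): Database {
  const db = new _Database(path) as Database;
  // QMD_SQLITE_BUSY_TIMEOUT_MS overrides the default; 0 restores fail-fast behavior.
  const raw = Number(process.env.QMD_SQLITE_BUSY_TIMEOUT_MS);
  const busyTimeoutMs = Number.isFinite(raw) && raw >= 0 ? Math.floor(raw) : 120_000;
  // Make a blocked writer wait for the lock instead of throwing SQLITE_BUSY at once.
  db.exec(`PRAGMA busy_timeout = ${busyTimeoutMs}`);
  return db;
}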
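To see how the patch resolves different values of QMD_SQLITE_BUSY_TIMEOUT_MS, the parsing can be pulled out into a pure function. This is only a minimal sketch: resolveBusyTimeoutMs and DEFAULT_BUSY_TIMEOUT_MS are hypothetical names, not part of qmd, and treating an empty variable as unset is an assumption that differs from the patch as written, where Number("") evaluates to 0 and would disable the timeout.

```typescript
// Hypothetical helper, not part of qmd: mirrors the timeout parsing in the patch.
const DEFAULT_BUSY_TIMEOUT_MS = 120_000;

function resolveBusyTimeoutMs(
  value: string | undefined,
  fallback: number = DEFAULT_BUSY_TIMEOUT_MS,
): number {
  // Assumption: an unset, empty, or whitespace-only variable means "use the default".
  // Without this guard Number("") is 0, which would silently disable the timeout.
  if (value === undefined || value.trim() === "") return fallback;

  const raw = Number(value);
  // Reject NaN, Infinity, and negatives; truncate fractions to whole milliseconds.
  return Number.isFinite(raw) && raw >= 0 ? Math.floor(raw) : fallback;
}
```

Under these assumptions, "0" resolves to 0 (fail-fast), "2500.9" resolves to 2500, and unset, empty, negative, or non-numeric values fall back to 120000.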
Embeds commit per batch, so a queued writer gets the lock at the next batch boundary; the 2-minute default (120_000 ms) outlasts worst-case commit pauses on a 10 GB index. QMD_SQLITE_BUSY_TIMEOUT_MS keeps an escape hatch (0 = old fail-fast behavior).
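The queueing argument above can be illustrated with a toy model. This sketch is not SQLite's implementation: attemptWrite, LockHold, and the timeline are hypothetical, and it assumes the waiting writer acquires the lock the moment the current batch commits, whereas SQLite actually retries with short sleeps until the timeout has elapsed.

```typescript
// Toy model, not SQLite internals: when does a second writer get the write lock?
interface LockHold {
  startMs: number; // the embed takes the write lock for a batch
  endMs: number; // the batch commits and the lock is released
}

type WriteAttempt =
  | { ok: true; waitedMs: number }
  | { ok: false; error: "SQLITE_BUSY" };

function attemptWrite(
  arrivalMs: number,
  holds: LockHold[],
  busyTimeoutMs: number,
): WriteAttempt {
  // Find the batch (if any) that holds the lock when the second writer arrives.
  const blocking = holds.find((h) => h.startMs <= arrivalMs && arrivalMs < h.endMs);
  if (!blocking) return { ok: true, waitedMs: 0 };

  // Assumption: the waiting writer wins the lock as soon as the batch commits.
  const waitMs = blocking.endMs - arrivalMs;
  return waitMs <= busyTimeoutMs
    ? { ok: true, waitedMs: waitMs }
    : { ok: false, error: "SQLITE_BUSY" };
}

// Hypothetical timeline: two 5 s embed batches with a 3 s gap between them.
const batches: LockHold[] = [
  { startMs: 0, endMs: 5_000 },
  { startMs: 8_000, endMs: 13_000 },
];

const failFast = attemptWrite(2_000, batches, 0); // SQLite default, busy_timeout = 0
const queued = attemptWrite(2_000, batches, 120_000); // patched default
```

With this timeline, a writer arriving at 2 s fails immediately when the timeout is 0, and waits 3 s for the batch to commit when the timeout is 120000.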
Running this as a local patch: an update launched against a held write lock queues ~3s and succeeds where it previously threw instantly. PR ready, sending it now.