Overlapping qmd processes crash with SQLITE_BUSY (no busy_timeout): update dies at insertContent while an embed runs #710

@BAM-CRP

Summary

openDatabase() in src/db.ts sets no busy_timeout, and SQLite's default is 0. The index runs in WAL mode, which permits only one writer at a time, so whenever two qmd processes overlap, the second writer's first write statement throws "SQLITE_BUSY: database is locked" instead of waiting for the lock.

How it bites in practice

  1. update vs a long embed. A large-corpus embed (or the post-upgrade re-embed backlog) runs for hours, committing per batch. Any qmd update launched meanwhile (second shell, cron job, editor integration) dies at its first content write: insertContent / indexFiles -> SQLITE_BUSY. The update errors out and that pass reports 0 chunks embedded.
  2. First-open schema migration vs anything. After an upgrade, the first command to open the DB runs the one-time migration over the whole index (minutes on a multi-GB index). Any other qmd command issued during it throws SQLITE_BUSY at initializeDatabase / DROP TRIGGER, which looks like a corrupted install but is just lock contention.

For a tool that encourages background and scheduled refreshes of a personal corpus, two qmd processes overlapping is the normal case, not an edge case.

Repro

# terminal A: large backlog
qmd embed --max-docs-per-batch 500

# terminal B, while A runs:
qmd update    # -> SQLITE_BUSY at insertContent/indexFiles

Environment

  • qmd 2.5.3 (3f751cd), git install
  • Windows 11, index ~10 GB / ~72k docs across 4 collections
  • Reproduces under BOTH runtimes (bun:sqlite and node/better-sqlite3); neither sets a busy timeout by default

Proposed fix

Set a busy timeout on the shared open path so writers queue instead of failing:

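// src/db.ts
export function openDatabase(path: string): Database {
  const db = new _Database(path) as Database;
  // Unset or blank falls back to the default; Number("") would otherwise yield 0.
  const env = process.env.QMD_SQLITE_BUSY_TIMEOUT_MS?.trim();
  const raw = env ? Number(env) : NaN;
  const busyTimeoutMs = Number.isFinite(raw) && raw >= 0 ? Math.floor(raw) : 120_000;
  // Wait up to busyTimeoutMs for a lock instead of throwing SQLITE_BUSY immediately.
  db.exec(`PRAGMA busy_timeout = ${busyTimeoutMs}`);
  return db;
}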

Embeds commit per batch, so a queued writer gets the lock at the next batch boundary; 2 minutes outlasts worst-case commit pauses on a 10 GB index. The env var keeps an escape hatch (0 = old fail-fast behavior).
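
If it helps review, the env-var handling could be pulled into a pure helper so the edge cases are unit-testable without opening a database. This is only a sketch: resolveBusyTimeoutMs and DEFAULT_BUSY_TIMEOUT_MS are hypothetical names, not existing qmd code, and treating a blank value as "unset" is an assumption on my part (a bare Number("") evaluates to 0, which would silently select fail-fast).

```typescript
// Hypothetical helper; the name and the constant are illustrative, not qmd API.
const DEFAULT_BUSY_TIMEOUT_MS = 120_000;

function resolveBusyTimeoutMs(
  value: string | undefined,
  fallbackMs: number = DEFAULT_BUSY_TIMEOUT_MS,
): number {
  // Assumption: unset or blank means "not configured". Number("") is 0,
  // so without this check an empty variable would disable waiting.
  const trimmed = value?.trim();
  if (!trimmed) return fallbackMs;

  const parsed = Number(trimmed);
  // Reject NaN, Infinity, and negatives; keep whole milliseconds only.
  if (!Number.isFinite(parsed) || parsed < 0) return fallbackMs;
  return Math.floor(parsed);
}
```

openDatabase() would then call resolveBusyTimeoutMs(process.env.QMD_SQLITE_BUSY_TIMEOUT_MS) and pass the result to the PRAGMA. An explicit "0" still selects the old fail-fast behavior.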

Running this as a local patch: an update launched against a held write lock queues ~3s and succeeds where it previously threw instantly. PR ready, sending it now.
