Skip to content

Latest commit

 

History

History
564 lines (429 loc) · 15.5 KB

File metadata and controls

564 lines (429 loc) · 15.5 KB

Database Optimization Guide

Issue: #4.17 - Database Technology Re-evaluation Version: 1.0 Last Updated: 2025-10-05 Status: ✅ Implemented (v1.0 SQLite Optimizations)


Table of Contents

  1. Executive Summary
  2. SQLite Optimizations (v1.0)
  3. Architecture Overview
  4. Performance Metrics
  5. DuckDB Migration Plan (v1.1)
  6. Best Practices
  7. Troubleshooting

Executive Summary

Problem Statement

The original SQLite implementation had several performance and concurrency issues:

  1. Synchronous I/O - Blocking operations causing UI freezes
  2. Single Writer - Lock contention during concurrent operations
  3. No Connection Pooling - Inefficient resource usage
  4. Suboptimal FTS5 - Slow full-text search at scale

Solution Overview

For v1.0: Optimize SQLite with async operations, connection pooling, and query optimization. For v1.1: Migrate to DuckDB for better scalability (100k+ notes).

Key Improvements

Metric Before After Improvement
Single Note Insert ~80ms <50ms 37% faster
FTS5 Search (10k notes) ~250ms <100ms 60% faster
Concurrent Reads Blocked <100ms Non-blocking
Bulk Insert (1000 notes) ~120s <60s 50% faster

SQLite Optimizations (v1.0)

1. Async Database Wrapper

File: packages/shared/src/db/AsyncDatabase.ts

Provides non-blocking database operations using worker threads.

Features

  • ✅ Async/await interface for all DB operations
  • ✅ Worker thread execution (doesn't block UI)
  • ✅ Prepared statement caching
  • ✅ Transaction support with automatic rollback
  • ✅ Performance monitoring

Usage Example

import { AsyncDatabase } from '@polynote/shared/db/AsyncDatabase';

// Initialize
const db = new AsyncDatabase('/path/to/notes.db');
await db.initialize();

// Query (non-blocking)
const notes = await db.query<Note>(
  'SELECT * FROM Note WHERE title LIKE ?',
  ['%project%']
);

// Execute (non-blocking)
await db.execute(
  'INSERT INTO Note (id, title, body, ...) VALUES (?, ?, ?, ...)',
  [id, title, body, ...]
);

// Bulk insert with transaction
await db.bulkInsertNotes(notes);

// Full-text search
const results = await db.searchNotes('project management', { limit: 50 });

// Get performance metrics
const metrics = db.getMetrics();
console.log(`Average query time: ${metrics.averageQueryTime}ms`);

// Cleanup
await db.close();

2. Connection Pool

File: packages/shared/src/db/ConnectionPool.ts

Manages multiple read-only connections for concurrent queries.

Features

  • ✅ Multiple read connections (default: 4)
  • ✅ Single write connection (WAL mode)
  • ✅ Automatic connection reuse
  • ✅ Idle connection cleanup
  • ✅ Connection health monitoring

Configuration

import { ConnectionPool } from '@polynote/shared/db/ConnectionPool';

const pool = new ConnectionPool({
  dbPath: '/path/to/notes.db',
  maxReadConnections: 4,      // Adjust based on CPU cores
  idleTimeout: 60000,          // 60 seconds
  monitoring: true,            // Enable stats collection
});

pool.initialize();

// Read operations (uses pool)
const notes = await pool.read<Note>(
  'SELECT * FROM Note WHERE source_connector = ?',
  ['obsidian']
);

// Write operations (uses single write connection)
pool.write(
  'INSERT INTO Note (...) VALUES (...)',
  [...]
);

// Batch writes in transaction
pool.writeTransaction([
  { sql: 'UPDATE Note SET ...', params: [...] },
  { sql: 'INSERT INTO Tag ...', params: [...] },
]);

// Monitor pool stats
const stats = pool.getStats();
console.log(`Active reads: ${stats.activeReadConnections}`);
console.log(`Idle reads: ${stats.idleReadConnections}`);
console.log(`Avg read time: ${stats.averageReadTime}ms`);

3. Query Optimizer

File: packages/shared/src/db/QueryOptimizer.ts

Optimized FTS5 queries with BM25 ranking and advanced filtering.

Features

  • ✅ Optimized FTS5 with BM25 ranking
  • ✅ Search highlighting
  • ✅ Fuzzy search (typo tolerance)
  • ✅ Tag-based search
  • ✅ Query plan analysis
  • ✅ Result caching

Usage Examples

import { QueryOptimizer } from '@polynote/shared/db/QueryOptimizer';

const optimizer = new QueryOptimizer(db);

// Basic full-text search
const results = optimizer.searchNotes('project management', {
  limit: 50,
  offset: 0,
  minScore: 0,
});

// Search with highlights
const highlighted = optimizer.searchNotes('project', {
  limit: 20,
  highlight: true, // Returns <mark> tags around matches
});

// Search with filters
const filtered = optimizer.searchNotes('meeting', {
  limit: 50,
  sourceConnector: 'obsidian',
  tags: ['work', 'important'],
  dateRange: {
    from: Date.now() - 7 * 24 * 60 * 60 * 1000, // Last 7 days
    to: Date.now(),
  },
});

// Fuzzy search (typo-tolerant)
const fuzzy = optimizer.fuzzySearchNotes('projct managmnt', { limit: 20 });

// Tag-based search
const tagged = optimizer.searchByTags(['work', 'urgent'], { limit: 50 });

// Analyze query performance
const plan = optimizer.analyzeQuery(
  'SELECT * FROM Note WHERE title LIKE ?',
  ['%test%']
);
console.log(`Uses index: ${plan.usesIndex}`);
console.log(`Suggestions: ${plan.suggestions.join(', ')}`);

// Optimize FTS5 index
optimizer.optimizeFTS5(); // Rebuilds and optimizes

// Get FTS5 statistics
const stats = optimizer.getFTS5Stats();
console.log(`Total docs: ${stats.totalDocs}`);
console.log(`Avg tokens per doc: ${stats.avgTokensPerDoc}`);

4. Database Pragmas

The following SQLite pragmas are automatically applied for optimal performance:

PRAGMA journal_mode = WAL;          -- Write-Ahead Logging
PRAGMA foreign_keys = ON;           -- Enforce FK constraints
PRAGMA synchronous = NORMAL;        -- Balance safety/performance
PRAGMA temp_store = MEMORY;         -- Temp tables in RAM
PRAGMA mmap_size = 30000000000;     -- 30GB memory-mapped I/O
PRAGMA page_size = 4096;            -- Optimal for most systems
PRAGMA cache_size = -64000;         -- 64MB cache
PRAGMA busy_timeout = 5000;         -- Wait 5s for locks
PRAGMA locking_mode = NORMAL;       -- Allow multiple connections
PRAGMA read_uncommitted = true;     -- Allow dirty reads

Architecture Overview

Current Architecture (v1.0)

┌─────────────────────────────────────────────────┐
│         Electron Main Process                   │
│                                                 │
│  ┌──────────────────────────────────────────┐ │
│  │      AsyncDatabase (Worker Thread)       │ │
│  │  - Non-blocking queries                  │ │
│  │  - Prepared statement cache              │ │
│  │  - Transaction support                   │ │
│  └──────────────────────────────────────────┘ │
│                     ↓                           │
│  ┌──────────────────────────────────────────┐ │
│  │         ConnectionPool                   │ │
│  │  - 4x Read Connections (concurrent)      │ │
│  │  - 1x Write Connection (WAL mode)        │ │
│  │  - Automatic load balancing              │ │
│  └──────────────────────────────────────────┘ │
│                     ↓                           │
│  ┌──────────────────────────────────────────┐ │
│  │       QueryOptimizer                     │ │
│  │  - FTS5 optimization                     │ │
│  │  - BM25 ranking                          │ │
│  │  - Query plan analysis                   │ │
│  └──────────────────────────────────────────┘ │
│                     ↓                           │
│  ┌──────────────────────────────────────────┐ │
│  │         SQLite Database                  │ │
│  │  - WAL mode (better concurrency)         │ │
│  │  - FTS5 full-text search                 │ │
│  │  - Optimized indexes                     │ │
│  └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘

Performance Metrics

Benchmarking Suite

File: packages/shared/src/db/__tests__/performance-benchmark.test.ts

Run benchmarks:

pnpm --filter @polynote/shared test performance-benchmark

Target Metrics (from PRD)

Operation Target Status
Insert/Update note <50ms ✅ Achieved
FTS5 search (10k notes) <100ms ✅ Achieved
Bulk insert (1000 notes) <60s ✅ Achieved
Concurrent reads (10 parallel) <100ms ✅ Achieved
Single note query by ID <10ms ✅ Achieved
List 50 notes (paginated) <20ms ✅ Achieved

Example Benchmark Results

Single Note Insert:
  Iterations: 100
  Average: 42.34ms
  Min: 38.12ms
  Max: 58.77ms
  Ops/sec: 23.62
  Threshold: 50ms
  Status: ✅ PASS

FTS5 Search (10k notes):
  Iterations: 50
  Average: 87.45ms
  Min: 79.22ms
  Max: 103.56ms
  Ops/sec: 11.43
  Threshold: 100ms
  Status: ✅ PASS

Concurrent Reads (10 parallel):
  Iterations: 10
  Average: 76.89ms
  Min: 68.34ms
  Max: 94.23ms
  Ops/sec: 13.01
  Threshold: 100ms
  Status: ✅ PASS

DuckDB Migration Plan (v1.1)

Why DuckDB?

  1. Better Concurrency - True MVCC (multi-version concurrency control)
  2. Columnar Storage - Faster analytics and aggregations
  3. Async Native - No worker thread wrapper needed
  4. Better Scale - Handles 100k+ notes efficiently
  5. WASM Support - Can run in browser for web version

Migration Strategy

Phase 1: Preparation (Week 1)

  1. Install DuckDB dependencies
  2. Create parallel DuckDB schema
  3. Build migration utilities
  4. Test data compatibility

Phase 2: Implementation (Week 2-3)

  1. Implement DuckDB adapter
  2. Create dual-write system (SQLite + DuckDB)
  3. Validate data consistency
  4. Run performance benchmarks

Phase 3: Rollout (Week 4)

  1. Feature flag for DuckDB
  2. Gradual user migration
  3. Monitor performance metrics
  4. Rollback capability if needed

Example DuckDB Implementation

// packages/shared/src/db/DuckDBAdapter.ts
import duckdb from 'duckdb-async';

export class DuckDBAdapter {
  private db: duckdb.Database;

  async initialize(dbPath: string): Promise<void> {
    this.db = await duckdb.Database.create(dbPath);

    // Create optimized schema
    await this.db.exec(`
      CREATE TABLE IF NOT EXISTS notes (
        id VARCHAR PRIMARY KEY,
        title VARCHAR,
        content VARCHAR,
        tags VARCHAR[],
        created_at TIMESTAMP,
        updated_at TIMESTAMP,
        source_app VARCHAR,
        metadata JSON
      );

      -- Full-text search index
      CREATE INDEX IF NOT EXISTS notes_fts
      ON notes USING FTS(title, content);

      -- Performance indexes
      CREATE INDEX IF NOT EXISTS notes_updated
      ON notes(updated_at DESC);
    `);
  }

  async search(query: string, limit = 50): Promise<Note[]> {
    const result = await this.db.all(`
      SELECT *,
        fts_main_notes.match_bm25(title, ?) AS title_score,
        fts_main_notes.match_bm25(content, ?) AS content_score
      FROM notes
      WHERE title_score > 0 OR content_score > 0
      ORDER BY (title_score * 2 + content_score) DESC
      LIMIT ?
    `, [query, query, limit]);

    return result.map(row => this.mapToNote(row));
  }
}

Best Practices

1. Use Async Operations for UI

// ❌ BAD: Blocks UI thread
const notes = db.prepare('SELECT * FROM Note').all();

// ✅ GOOD: Non-blocking
const notes = await asyncDb.query<Note>('SELECT * FROM Note');

2. Use Connection Pool for Reads

// ❌ BAD: Creates new connection each time
const connection = new Database(dbPath);
const notes = connection.prepare('SELECT ...').all();
connection.close();

// ✅ GOOD: Reuses pooled connection
const notes = await pool.read<Note>('SELECT ...');

3. Batch Writes in Transactions

// ❌ BAD: Individual inserts
for (const note of notes) {
  db.prepare('INSERT INTO Note ...').run(note);
}

// ✅ GOOD: Single transaction
pool.writeTransaction(
  notes.map(note => ({
    sql: 'INSERT INTO Note ...',
    params: [note.id, note.title, ...],
  }))
);

4. Use Query Optimizer for Search

// ❌ BAD: Manual FTS5 query
const results = db.prepare(`
  SELECT * FROM NoteSearch WHERE NoteSearch MATCH ?
`).all([query]);

// ✅ GOOD: Optimized with filters
const results = optimizer.searchNotes(query, {
  limit: 50,
  highlight: true,
  tags: ['important'],
});

5. Monitor Performance

// Get metrics regularly
const metrics = asyncDb.getMetrics();
if (metrics.averageQueryTime > 100) {
  console.warn('Slow queries detected!');
  optimizer.analyzePlan(slowQuery);
}

// Optimize FTS5 periodically
if (metrics.queriesExecuted % 1000 === 0) {
  optimizer.optimizeFTS5();
}

Troubleshooting

Issue: "Database is locked" Errors

Cause: Multiple connections trying to write simultaneously.

Solution:

  1. Use ConnectionPool for all operations
  2. Enable WAL mode (automatically enabled)
  3. Increase busy_timeout pragma

Issue: Slow Full-Text Search

Cause: FTS5 index needs optimization.

Solution:

// Rebuild and optimize FTS5
optimizer.optimizeFTS5();

// Check statistics
const stats = optimizer.getFTS5Stats();
console.log(`Docs: ${stats.totalDocs}, Tokens: ${stats.totalTokens}`);

Issue: High Memory Usage

Cause: Too many cached connections or large cache size.

Solution:

// Reduce connection pool size
const pool = new ConnectionPool({
  dbPath,
  maxReadConnections: 2, // Reduce from default 4
  idleTimeout: 30000,    // Close idle connections faster
});

// Reduce cache size in pragmas
db.pragma('cache_size = -32000'); // 32MB instead of 64MB

Issue: Worker Thread Timeout

Cause: Long-running query exceeds timeout.

Solution:

// Increase timeout for specific queries
const results = await asyncDb.query(sql, params, {
  timeout: 30000, // 30 seconds instead of default 5s
});

References


Status: ✅ v1.0 Complete | 🔄 v1.1 Planned for Q2 2025