ask is SQLRite's natural-language query feature: type a question in English, get back generated SQL ready to run against your database. It ships across every product surface — the REPL, the desktop app, all four SDKs (Python / Node.js / Go / WASM), the sqlrite-mcp server (so any MCP client gets the ask tool for free), and the embedded Rust library — with a single underlying engine and one consistent set of defaults.
This doc is the canonical reference. For the per-language API details, the SDK READMEs go deeper; for the design decisions, see docs/phase-7-plan.md §7g.
- What
askdoes - Architecture
- Configuration —
SQLRITE_LLM_*env vars - Defaults
- How to use it from each surface
- The shared
AskResponseshape - Errors and how they surface
- Prompt caching
- Security notes
- Provider support
- Cost considerations
- Limitations
Given a connection to a SQLRite database and a natural-language question, ask:
- Walks the database's schema (your
CREATE TABLEstatements, alphabetically sorted for prompt-cache stability). - Builds a structured prompt — frozen system rules block, then the schema dump wrapped in a cacheable Anthropic prompt-cache breakpoint, then the user's question.
- Calls the configured LLM provider (Anthropic by default).
- Parses the model's response into
{sql, explanation, usage}— tolerant to fenced JSON / leading prose because real LLM output drifts even with strict instructions.
ask does NOT execute the SQL by default. The convention across every SDK: ask() returns the generated SQL for the caller to review (or hand to a confirm-and-run UX); ask_run() (or its language-idiomatic equivalent) is the one-shot generate-and-execute convenience.
The Rust crate sqlrite-ask (published on crates.io) holds the core machinery. Two halves:
| Half | Where it lives | Wasm-safe? |
|---|---|---|
| Core (prompt construction, response parsing, types) | sqlrite-ask lib root |
✅ yes |
HTTP transport (AnthropicProvider, ureq + rustls) |
sqlrite-ask::provider::anthropic, gated behind http feature |
❌ no (ureq doesn't compile to wasm32) |
The engine integration (the Connection::ask extension trait + the sqlrite::ask::ask family of free functions) lives in sqlrite-engine itself, gated behind the engine's ask feature (default-on for the CLI binary; off for the WASM SDK and any minimal library embedding).
The schema-dump helper (sqlrite::ask::schema::dump_schema_for_database) is always available, no feature flag needed — it's pure-engine code that the WASM SDK uses to introspect schemas without pulling in the HTTP transport.
Every surface (except WASM, which has the split JS-callback shape) reads the same environment variables for zero-config use:
| Variable | Purpose | Default |
|---|---|---|
SQLRITE_LLM_PROVIDER |
LLM provider | anthropic |
SQLRITE_LLM_API_KEY |
Provider API key | (required for any LLM call) |
SQLRITE_LLM_MODEL |
Model ID | claude-sonnet-4-6 |
SQLRITE_LLM_MAX_TOKENS |
Per-call max output tokens | 1024 |
SQLRITE_LLM_CACHE_TTL |
Anthropic prompt-cache TTL on the schema block | 5m (also 1h or off) |
Set them once in your shell rc and every surface picks them up:
export SQLRITE_LLM_API_KEY="sk-ant-…"
# Optional overrides:
# export SQLRITE_LLM_MODEL="claude-haiku-4-5"
# export SQLRITE_LLM_CACHE_TTL="1h"For per-call / per-connection overrides, each SDK exposes an AskConfig struct/object you can pass explicitly — see the SDK sections below.
Same across every surface:
| Default | Value | Why |
|---|---|---|
| Model | claude-sonnet-4-6 |
Cost/quality sweet spot for NL→SQL. Haiku 4.5 is buggy on joins/aggregates/vectors; Opus 4.7 overkills the task at 5× cost. |
max_tokens |
1024 |
SQL output rarely exceeds 500 tokens. Leaves headroom for a long explanation. |
| Cache TTL | 5m |
Break-even at 2 calls per cached prefix; right for interactive REPL/notebook use. Set 1h for editor/desktop sessions where the same DB is queried sporadically over an hour. |
| Provider | anthropic |
Per Phase 7 plan Q4 — Anthropic-first; OpenAI / Ollama follow-ups planned. |
Every example below assumes SQLRITE_LLM_API_KEY is set in the environment. Each section also shows the per-call explicit-config form for non-default keys.
$ sqlrite my.sqlrite
sqlrite> .ask How many users are over 30?
Generated SQL:
SELECT COUNT(*) FROM users WHERE age > 30
Rationale: Counts users older than thirty.
Run? [Y/n] y
+-------+
| count |
+-------+
| 47 |
+-------+
Confirmation defaults to y (just hit enter). n skips. Ctrl-C / EOF also skip — paranoid default for LLM-generated SQL. Per Phase 7g.2 retrospective in the roadmap.
Click the Ask… button in the editor toolbar. A composer panel slides in above the editor. Type a question, hit Cmd/Ctrl+Enter (or click "Generate SQL"), and the generated SQL drops into the editor textarea for review. Click Run when ready.
The Tauri Rust backend reads SQLRITE_LLM_API_KEY from the env Tauri inherited at launch, makes the HTTP call server-side, and returns only {sql, explanation} to the webview. The API key never crosses into the browser-render process. Same security story as the WASM split, achieved here as a natural side effect of how Tauri's command bridge works.
If SQLRITE_LLM_API_KEY is missing, the panel surfaces a clean "missing API key" error in the existing error slot.
[dependencies]
sqlrite-engine = "0.14"
sqlrite-ask = "0.14"use sqlrite::{Connection, ConnectionAskExt};
use sqlrite_ask::AskConfig;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let conn = Connection::open("foo.sqlrite")?;
// Path 1: env vars (zero config)
let cfg = AskConfig::from_env()?;
let resp = conn.ask("How many users are over 30?", &cfg)?;
println!("{}", resp.sql);
println!("{}", resp.explanation);
// Path 2: explicit config
let cfg = AskConfig {
api_key: Some("sk-ant-…".to_string()),
model: "claude-haiku-4-5".to_string(),
..AskConfig::default()
};
let resp = conn.ask("count by age", &cfg)?;
Ok(())
}The ConnectionAskExt::ask method is gated behind the engine's ask feature (default-on). Equivalent free functions live at sqlrite::ask::{ask, ask_with_database, ask_with_provider, ask_with_database_and_provider} — pick whichever shape reads better at the call site.
pip install sqlriteimport sqlrite
conn = sqlrite.connect("foo.sqlrite")
# Path 1: env vars
resp = conn.ask("How many users are over 30?")
print(resp.sql)
print(resp.explanation)
# Path 2: explicit config
cfg = sqlrite.AskConfig(
api_key="sk-ant-…",
model="claude-haiku-4-5",
cache_ttl="1h",
)
resp = conn.ask("count by age", cfg)
# Path 3: per-connection (set once, reuse)
conn.set_ask_config(cfg)
resp = conn.ask("anything") # uses cfg
# Convenience: generate + execute
rows = conn.ask_run("list active users").fetchall()AskConfig.__repr__ deliberately omits the API key value (shows <set> or None). See sdk/python/README.md for the full reference.
npm install @joaoh82/sqlriteimport { Database, AskConfig } from '@joaoh82/sqlrite';
const db = new Database('foo.sqlrite');
// Path 1: env vars
const resp = db.ask('How many users are over 30?');
// Path 2: explicit config (camelCase per JS convention)
const cfg = new AskConfig({
apiKey: 'sk-ant-…',
model: 'claude-haiku-4-5',
cacheTtl: '1h',
});
const resp = db.ask('count by age', cfg);
// Path 3: per-connection
db.setAskConfig(cfg);
const resp = db.ask('anything');
// Convenience: generate + execute
const rows = db.askRun('list active users');Auto-generated TypeScript types in index.d.ts. AskConfig.toString() deliberately omits the API key value. See sdk/nodejs/README.md for the full reference.
go get github.com/joaoh82/rust_sqlite/sdk/goimport (
"database/sql"
sqlrite "github.com/joaoh82/rust_sqlite/sdk/go"
)
db, _ := sql.Open("sqlrite", "foo.sqlrite")
// Path 1: env vars (nil cfg)
resp, err := sqlrite.Ask(db, "How many users are over 30?", nil)
// Path 2: explicit config
cfg := &sqlrite.AskConfig{
APIKey: "sk-ant-…",
Model: "claude-haiku-4-5",
MaxTokens: 512,
CacheTTL: "1h",
}
resp, _ := sqlrite.Ask(db, "count by age", cfg)
// Convenience: generate + execute
rows, _ := sqlrite.AskRun(db, "list active users", nil)
defer rows.Close()
// Context-aware variants for connection-pool acquisition
resp, _ = sqlrite.AskContext(ctx, db, "...", cfg)
rows, _ = sqlrite.AskRunContext(ctx, db, "...", cfg)(*AskConfig).String() deliberately omits the API key value. See sdk/go/README.md for the full reference.
The WASM SDK is the only surface that requires a backend you control. Browsers can't call api.anthropic.com directly (CORS) and can't safely hold an API key (anyone with DevTools can read it). So the WASM SDK splits the work: the browser builds the prompt and parses the response, your backend does the HTTP call.
npm install @joaoh82/sqlrite-wasmimport init, { Database } from '@joaoh82/sqlrite-wasm';
await init();
const db = new Database();
db.exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)`);
async function ask(question) {
// Step 1: build the LLM-API payload locally. No key needed.
const payload = db.askPrompt(question);
// Step 2: send to YOUR backend, which adds the key + forwards.
const apiResponse = await fetch('/api/llm/complete', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(payload),
}).then(r => r.text());
// Step 3: parse the model's reply.
return db.askParse(apiResponse);
}
const result = await ask('How many users are over 30?');
console.log(result.sql);The backend proxy is ~10 lines on any modern serverless platform. See docs/ask-backend-examples.md for ready-to-deploy templates — Cloudflare Workers, Vercel Edge Functions, Deno Deploy, Firebase Functions, and Node/Express.
A runnable end-to-end demo (browser + zero-dep Node proxy) lives at examples/wasm/ — make build && make ask-demo. See sdk/wasm/README.md for the full reference.
The sqlrite-mcp server exposes ask as a tool that any Model Context Protocol client (Claude Code, Cursor, mcp-inspector, etc.) can call without any glue code. The MCP server holds the API key in its own process environment — set SQLRITE_LLM_API_KEY in the MCP client's server-config env block once, and the ask tool works for every subsequent invocation.
# Install the MCP server:
cargo install sqlrite-mcpIn the MCP client config (Claude Code's ~/.claude.json shown):
{
"mcpServers": {
"sqlrite": {
"command": "sqlrite-mcp",
"args": ["/path/to/your.sqlrite"],
"env": { "SQLRITE_LLM_API_KEY": "sk-ant-…" }
}
}
}Now any LLM driving that MCP client sees an ask tool alongside query / execute / list_tables / etc., and can call it with { "question": "...", "execute": true } to generate-and-run in a single round trip. Per-call overrides (model, max_tokens, cache_ttl) are passed as additional tool arguments. See docs/mcp.md for the full tool reference and the other six tools the server exposes.
The ask tool is gated behind the ask cargo feature (default-on). If you want a leaner MCP binary with no LLM machinery (six pure-SQL tools only, no ureq / rustls in the dep tree), build with cargo install sqlrite-mcp --no-default-features.
Every surface returns the same logical structure (the field names are spelled idiomatically per language):
| Field | Meaning |
|---|---|
sql |
The generated SQL, ready to execute. Empty string if the model declined to generate SQL for the schema. |
explanation |
One-sentence rationale from the model. May be empty. |
usage.input_tokens |
Tokens billed as input on this call. |
usage.output_tokens |
Tokens billed as output on this call. |
usage.cache_creation_input_tokens |
Tokens written to the prompt cache (charged at ~1.25× normal). Non-zero on the first call against a fresh schema. |
usage.cache_read_input_tokens |
Tokens served from the prompt cache (charged at ~0.1× normal). Use this to verify caching is working — see Prompt caching. |
| Failure mode | Where it surfaces | Typical message |
|---|---|---|
| Missing API key | ask() call returns / throws |
"missing API key (set SQLRITE_LLM_API_KEY ...)" |
| HTTP transport error (network) | ask() call returns / throws |
"HTTP transport error: <details>" |
| LLM API 4xx/5xx | ask() call returns / throws |
"API returned status <code>: <Anthropic error type+message>" |
| Model declined (empty SQL) | ask() returns response with sql=""; ask_run() raises with the model's explanation |
"model declined to generate SQL: <explanation>" |
| Model output unparseable | ask() returns / throws |
"model output not valid JSON: <raw>" |
The parser is tolerant — strict JSON, fenced JSON (```json … ```), and JSON-with-leading-prose all parse — because real LLM output drifts even with strict prompt instructions.
Anthropic's prompt cache lets the schema dump be served at ~10% of normal input cost on repeat calls. The schema block in ask's prompt carries a cache_control: ephemeral marker that's stable across calls (alphabetical column order, byte-identical rules block) so the cache reliably hits.
Verifying it works:
# Python — same shape in every SDK
resp1 = conn.ask("first question")
resp2 = conn.ask("second question")
assert resp2.usage.cache_read_input_tokens > 0 # cache hit on the schemaIf cache_read_input_tokens stays zero across repeated asks against the same schema, something is invalidating the cache. Common culprits:
- Timestamps / UUIDs / random IDs leaking into the schema dump (the schema dump is byte-stable, but if you use
ALTER TABLEbetween asks, the schema changes). - Switching models between calls (cache is model-scoped).
- Switching the cache TTL (
5mvs1hare separate cache entries).
For long-running editor / desktop sessions where the same DB is queried sporadically over an hour, set cache_ttl to "1h" — costs 2× write premium instead of 1.25× but stays alive between asks.
| Surface | API key location |
|---|---|
| REPL | Process env (SQLRITE_LLM_API_KEY) — visible to other processes owned by the same user, same as any tool you run from a shell. |
| Desktop | Tauri Rust backend's process env. Webview (the JS rendering process) never sees the key. |
| Rust library | Wherever you put it. Read from env, secrets manager, vault, etc. |
| Python / Node / Go SDKs | Wherever you put it. Same flexibility — env, secrets manager, runtime config. |
| WASM | YOUR backend, never the browser tab. The browser hands the prompt to your backend, the backend adds the key and forwards. See docs/ask-backend-examples.md. |
| MCP server | The spawned sqlrite-mcp process's environment. The MCP client sets it in its server-config env block once at spawn time; no tool call ever echoes the key back. See docs/mcp.md. |
Every SDK's AskConfig representation deliberately omits the API key value. Printing the config in logs / debuggers / Jupyter cells / console.log won't leak the secret. Each shows a <set> / <unset> marker so you can tell whether a key is configured:
Python: AskConfig(provider="anthropic", model="claude-sonnet-4-6", max_tokens=1024, cache_ttl="5m", api_key=<set>)
Node.js: AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
Go: AskConfig(provider="anthropic", model="claude-sonnet-4-6", maxTokens=1024, cacheTtl="5m", apiKey=<set>)
The AskResponse returned to your code carries {sql, explanation, usage} — never the API key, never the request body, never the raw API response. Logging an AskResponse is safe.
Phase 7g.1–7g.7 ships with Anthropic as the only built-in provider. Per Phase 7 plan Q4, OpenAI and Ollama follow-ups are planned but not yet implemented. The internal Provider trait is open — Rust callers can supply a custom backend via ask_with_schema_and_provider (see sqlrite-ask's docs).
For non-Anthropic providers today, the WASM SDK already offers a clean path: your backend translates the Anthropic-shaped payload to whatever provider it talks to (the system blocks and messages array map cleanly to OpenAI's messages field, for example). No SDK changes needed; provider variety lives entirely on your backend. Per-SDK native support for OpenAI/Ollama is tracked as a Phase 7g.x follow-up.
ask's typical cost per call (Anthropic Sonnet 4.6, no cache hit on schema):
- Input tokens: ~3,000 — ~500 for the rules block, varies with your schema (a small DB ~500; a 20-table app DB might run 2,000+).
- Output tokens: ~50–150 — generated SQL + one-sentence explanation.
At Sonnet 4.6 pricing ($3/MTok input, $15/MTok output): roughly $0.01 per first call, then ~$0.001 per cached follow-up call within the 5-minute TTL.
For high-volume scenarios:
- Set
model: "claude-haiku-4-5"to drop cost ~3× at the price of slightly worse SQL on complex schemas. - Set
cache_ttl: "1h"if your DB is queried sporadically over an hour — costs 2× cache-write but keeps every subsequent call cheap. - Inspect
resp.usage.cache_read_input_tokensafter each call to confirm caching is hitting.
- No streaming.
askwaits for the full response. Streaming would complicate the confirm-and-run flow + the SDK return-type story for marginal UX gain on a small payload. - No multi-turn. Stateless — every call is a fresh prompt. Conversational refinement ("now sort by age") is its own UX problem.
- No parameter binding in generated SQL. The model emits literal-inlined SQL, matching the engine's current parameter-binding story (deferred to Phase 5a.2 across the whole stack).
- Anthropic only at the SDK layer today. OpenAI / Ollama require translation on your own backend (clean path on the WASM SDK; Rust crate has the
Providertrait open).
docs/phase-7-plan.md§7g — design decisions + sub-phase breakdown.docs/ask-backend-examples.md— ready-to-deploy backend proxies for the WASM SDK (Cloudflare Workers / Vercel Edge / Deno Deploy / Firebase / Express).sdk/python/README.md,sdk/nodejs/README.md,sdk/go/README.md,sdk/wasm/README.md— per-SDK API references.examples/wasm/— runnable browser demo with the Ask flow.docs/embedding.md— Rust library embedding guide that includesConnectionAskExt::ask.