docs: query-performance guide by coderdan · Pull Request #212 · cipherstash/encrypt-query-language

coderdan · 2026-05-13T11:52:29Z

Summary

A new reference page that consolidates the practical advice for getting fast queries out of EQL-encrypted columns. The 2.3 operator-inlining work surfaced enough subtleties — natural vs extractor forms, the ORDER BY sort-key trap, the inlining chain through helpers, when a functional index will and won't match — that they warrant a dedicated page rather than being scattered across database-indexes.md, the upgrade notes, and tribal knowledge.

Outline

Why functional indexes — small leaves, no superuser, structural planner match.
Operator inlining mechanics — the four conditions PG checks, the syntactic match rule, the transitive nature of the chain, EXPLAIN-based verification.
Natural / extractor / hybrid query forms — when each is the right default, with examples.
ORDER BY: the sort-key trap — three-shape contrast (natural / hybrid / fully extractor) with the 100k benchmark numbers as evidence (885 ms vs 1.4 ms).
Equality and GROUP BY / DISTINCT — including the ~425× speedup from GROUP BY on the inlined extractor vs the natural form's plpgsql hash_encrypted path.
LIKE / ILIKE — bloom-filter recipe and the case-sensitivity caveat.
JSONB containment and ste_vec field-level extraction — per-selector vs all-selector recipes.
Common pitfalls — index-creation ordering, missing ANALYZE, stale opclass indexes, pinned search_path, range queries on non-Block-ORE.
Diagnosing with EXPLAIN — what to look for, what to do when the plan is wrong.

Placement

Landed at docs/reference/query-performance.md and linked from docs/README.md next to database-indexes.md. Happy to relocate to a new docs/guides/ directory if we want to start separating action-oriented content from reference. Not blocking on that for this PR — easier to iterate on the content first.

Related

#211: range-operator inlining (< / <= / > / >=). The U-005 callouts in the guide reference that PR's upgrade note.
#193 and #196: the 2.3 = inlining ("perf: flip eql_v2_encrypted infix operator implementations to inlinable SQL (RFC Phase 1)" #193 and "perf: flip eql_v2_encrypted infix operator implementations to inlinable SQL (#193)" #196), already referenced in the guide via U-002.

Test plan

Markdown renders cleanly (checked in preview).
All in-doc links resolve (database-indexes, sql-support, index-config, v2.3 upgrade notes).
Review pass for accuracy on the numbers and the claims about planner behaviour.
Decide whether to keep at reference/ or move to a new guides/.

Draft because the content is a first pass — happy to iterate.

New reference page that consolidates the practical advice for getting fast queries out of EQL-encrypted columns. Centred on the two ingredients the 2.3 work surfaced: functional indexes and operator inlining.

Sections:

1. Why functional indexes: small leaves, no superuser, structural planner match.
2. Operator inlining mechanics: the four conditions PG checks, the syntactic match rule, the transitive nature of the chain, EXPLAIN-based verification.
3. Natural / extractor / hybrid query forms: when each is the right default, with examples for each shape.
4. ORDER BY, the sort-key trap: three-shape contrast (natural / hybrid / fully extractor) with the 100k benchmark numbers as evidence.
5. Equality and GROUP BY / DISTINCT: including the ~425× speedup from GROUP BY on the inlined extractor vs the natural form's plpgsql hash_encrypted path.
6. LIKE / ILIKE: bloom filter recipe plus the case-sensitivity caveat.
7. JSONB containment and ste_vec: per-selector vs all-selector field recipes; when to use jsonb_path_query vs hmac_256(col, selector).
8. Common pitfalls: index-creation ordering, missing ANALYZE, stale opclass indexes, pinned search_path, the natural-form ORDER BY expectation, range queries on non-Block-ORE columns.
9. Diagnosing with EXPLAIN: what to look for, what to do when the plan is wrong.

Linked from docs/README.md under Reference, alongside database-indexes.md. Placement can move to a guides/ directory if we grow more action-oriented content later.

coderabbitai · 2026-05-13T11:52:37Z

Important

Review skipped

Draft detected.


…ecipe

Rewrites §5 to lead with the extractor form (`GROUP BY eql_v2.hmac_256(col)`) as the only recipe that scales, and explains the planner trap that makes the natural form (`GROUP BY col`) degrade pathologically on real-world tables.

What changed in the underlying mechanics, and why the doc needs updating:

* `eql_v2.hash_encrypted` was flipped from plpgsql to inlinable SQL in 2.3 (the discriminator backing the natural-form `GROUP BY`). That eliminated the "per-row plpgsql call cost" framing the previous version of this section leaned on. The natural form is still slow at scale, though, and the reason is more interesting and more important to surface than the inlining detail.
* The real bottleneck is the planner's HashAggregate-vs-GroupAggregate choice, which is driven by estimated hash-table size against `work_mem`. The natural form's key is the full ~1-2 KB encrypted payload; at 100k rows the estimate is 100-200 MB, way over the default 4 MB `work_mem`, so the planner refuses HashAggregate and falls back to GroupAggregate + sort. The extractor's group key is a 32-byte HMAC and fits trivially: HashAggregate every time, no `work_mem` tuning needed.

Bench numbers measured on the cipherstash/benches setup post-2.3 with all three operator-inlining PRs in place (#205, #211, hash_encrypted):

    100k natural, default 4 MB work_mem:  ~29 s    (GroupAggregate + Sort)
    100k natural, 256 MB work_mem:        ~780 ms  (HashAggregate)
    100k extractor:                       ~80 ms   (HashAggregate, default)
    1M natural, 512 MB work_mem:          ~234 s   (GroupAggregate)

Also tightens the §8 pitfall on `hash_encrypted`. Pre-2.3 it raised loudly when used against a column without `hm`, which made `GROUP BY` a natural smoke test for misconfig. Post-2.3 it falls back to data-hashing to keep the aggregate from degrading to O(N^2), so the runtime smoke signal is gone. Audit at config time via `eql_v2.has_hmac_256(col)`.

Adds a new §8 pitfall calling out the natural-form `GROUP BY` trap specifically; it is frequent enough to warrant its own bullet.
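The `work_mem` reasoning in this commit message can be checked with a rough back-of-the-envelope model. The sketch below is only an illustration: the function names are hypothetical, the "rows × key size" estimate is a deliberate simplification, and PostgreSQL's real cost model also accounts for per-entry overhead and the planner's estimated number of groups. Key sizes and `work_mem` values are taken from the message above, using the 1 KB lower bound of the stated ~1-2 KB payload.

```python
# Simplified model of the HashAggregate-vs-GroupAggregate decision.
# Assumption: hash-table size ~= number of groups * group-key size.
# The real PostgreSQL planner is more detailed; this only mirrors the
# arithmetic used in the commit message.

MB = 1024 * 1024


def estimated_hash_table_bytes(n_groups: int, key_bytes: int) -> int:
    """Rough size of a hash table holding one key per group."""
    return n_groups * key_bytes


def aggregate_strategy(n_groups: int, key_bytes: int, work_mem_bytes: int) -> str:
    """Pick HashAggregate only if the estimated table fits in work_mem."""
    if estimated_hash_table_bytes(n_groups, key_bytes) <= work_mem_bytes:
        return "HashAggregate"
    return "GroupAggregate"


if __name__ == "__main__":
    scenarios = [
        # (label, groups, key size in bytes, work_mem in bytes)
        ("100k natural, 4 MB work_mem", 100_000, 1024, 4 * MB),
        ("100k natural, 256 MB work_mem", 100_000, 1024, 256 * MB),
        ("100k extractor, 4 MB work_mem", 100_000, 32, 4 * MB),
        ("1M natural, 512 MB work_mem", 1_000_000, 1024, 512 * MB),
    ]
    for label, groups, key_bytes, work_mem in scenarios:
        size_mb = estimated_hash_table_bytes(groups, key_bytes) / MB
        choice = aggregate_strategy(groups, key_bytes, work_mem)
        print(f"{label}: ~{size_mb:.1f} MB estimated -> {choice}")
```

Under these assumptions the model picks the same plan shape as each of the four benchmark rows, which suggests the key width rather than the per-row hashing cost is what drives the difference.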

…not bounded

Updates §4's empirical numbers to include the 1M data point from the cipherstash/benches suite. The previous framing (Top-N cost is "real but bounded — milliseconds, not seconds … if you can live with that, the natural form keeps the query readable") doesn't survive past 100k. At 1M rows on `string_encrypted_1000000` / `integer_encrypted_1000000` with a ~0.5 selectivity predicate, the natural-form Top-N takes ~8.8 s while the hybrid and fully-extractor forms stay around 1 ms. The Sort step just keeps scaling with the post-WHERE row count.

Same advice surface as §5 on GROUP BY: there's a documented better recipe, the natural form's plan choice is the trap, write the extractor form. The §5 framing now reads in parallel with this one; both sections land on "the extractor form is what scales; the natural form is for toy-size data".

Also flags that the bench suite drops the natural-form `range_lt_ordered_10` scenario and the redundant fully-extractor `range_lt_ore_ordered_10` scenario, keeping only the hybrid recipe. Companion bench change shipping on the `feat/json-benches-rebased` branch in cipherstash/benches.
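To make the "Sort step keeps scaling with the post-WHERE row count" point concrete, here is a toy model of the two plan shapes. It is a sketch under stated assumptions, not a description of PostgreSQL internals: rows are plain integers, both function names are hypothetical, and "rows examined" stands in for the work each plan does before it can return the first N results.

```python
# Toy comparison of two ways to answer "WHERE pred ORDER BY key LIMIT n".
# Assumption: work is proportional to the number of rows each plan must
# look at; real executor costs are more nuanced.
from typing import Callable, Iterable, List, Tuple


def top_n_sort(rows: Iterable[int], predicate: Callable[[int], bool],
               sort_key: Callable[[int], int], n: int) -> Tuple[List[int], int]:
    """Sort + Limit: every row passing the predicate must be sorted."""
    survivors = [r for r in rows if predicate(r)]
    survivors.sort(key=sort_key)
    return survivors[:n], len(survivors)


def top_n_index(rows_in_index_order: Iterable[int],
                predicate: Callable[[int], bool],
                n: int) -> Tuple[List[int], int]:
    """Ordered index scan + Limit: stop as soon as n rows have matched."""
    if n <= 0:
        return [], 0
    out: List[int] = []
    examined = 0
    for r in rows_in_index_order:
        examined += 1
        if predicate(r):
            out.append(r)
            if len(out) == n:
                break
    return out, examined


if __name__ == "__main__":
    def is_even(r: int) -> bool:  # ~0.5 selectivity predicate
        return r % 2 == 0

    for table_size in (1_000, 100_000):
        rows = list(range(table_size))
        _, sorted_rows = top_n_sort(rows, is_even, lambda r: r, 10)
        _, scanned_rows = top_n_index(rows, is_even, 10)
        print(f"{table_size} rows: sort path handles {sorted_rows}, "
              f"index path examines {scanned_rows}")
```

In this model the sort path's work grows with the table while the index path's stays flat, which is the same shape as the reported ~8.8 s versus ~1 ms at 1M rows.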

coderdan added 2 commits May 14, 2026 18:45

coderdan wants to merge 3 commits into main from dan/query-performance-guide
