A pure-Go PostgreSQL SQL parser. No cgo. No WebAssembly runtime. No native library to link. It compiles into any Go program as ordinary source and imposes zero per-process memory overhead at startup.
Type any PostgreSQL and watch it parse instantly — AST tree, read/write classification, and real µs/parse timing, running 100% client-side (no server, your SQL never leaves the tab).
The library (
github.com/zkrebbekx/pgparse) is pure Go with no WebAssembly — nothing you import pulls in a wasm runtime. WebAssembly is used only by the separateplayground/module, which compiles the pure-Go library to aGOOS=jsbuild so it can run in the browser. Importing pgparse into your Go program involves no wasm and no cgo.
It exists to fill the gap between the two common options:
| Approach | Cost |
|---|---|
pg_query_go |
Wraps the real Postgres parser via cgo — needs a C toolchain, cross-compilation pain, and a non-trivial binary. |
WebAssembly builds of libpg_query |
Pure Go to call, but the wasm runtime allocates a large linear memory region on startup for every instance. |
| pgparse | Hand-written scanner + recursive-descent parser. Pure Go, zero deps, no startup allocation, low per-parse allocation. |
pgparse targets a pragmatic, high-coverage subset of PostgreSQL DML rather
than the entire grammar — see Coverage. If you need every Postgres
node exactly, use pg_query_go. If you need to parse the SQL that real apps
actually write, fast, with no cgo, this is for you.
Requires Go 1.20+. Zero runtime dependencies (goconvey is a test-only dep).
go get github.com/zkrebbekx/pgparsepackage main
import (
"fmt"
"log"
"github.com/zkrebbekx/pgparse"
)
func main() {
res, err := pgparse.Parse(`
SELECT u.id, u.name, count(o.id) AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.active = true
GROUP BY u.id, u.name
HAVING count(o.id) > 0
ORDER BY orders DESC
LIMIT 10
`)
if err != nil {
log.Fatal(err)
}
sel := res.Stmts[0].(*pgparse.SelectStmt)
fmt.Println("columns:", len(sel.Columns))
fmt.Println("from:", sel.From[0].(*pgparse.JoinExpr).Kind)
}Single-statement helper:
stmt, err := pgparse.ParseOne("DELETE FROM logs WHERE created_at < $1")
del := stmt.(*pgparse.DeleteStmt)Token stream only (e.g. for highlighting):
toks, _ := pgparse.Tokenize("SELECT 1 + 2")Statements
SELECT— projection with aliases,DISTINCT/DISTINCT ON,FROMwith comma and explicit joins (INNER,LEFT,RIGHT,FULL,CROSS,OUTER,ON/USING,LATERALsubqueries),WHERE,GROUP BY,HAVING,ORDER BY(ASC/DESC,NULLS FIRST/LAST),LIMIT,OFFSETWITHCTEs (incl.RECURSIVE) and subqueries inFROM/ expressions- Set operations:
UNION/INTERSECT/EXCEPT(ALL), with correct precedence (INTERSECTbinds tighter) and a tail that binds to the whole expression - Window functions:
OVER (PARTITION BY ... ORDER BY ...), frame clauses (ROWS/RANGE/GROUPS ... BETWEEN ... AND ...,EXCLUDE), named windows (OVER w),FILTER (WHERE ...),WITHIN GROUP (ORDER BY ...), and aggregateORDER BY(array_agg(x ORDER BY y)) INSERT— column lists, multi-rowVALUES,INSERT ... SELECT,ON CONFLICT (...) DO NOTHING | DO UPDATE SET ...,RETURNINGUPDATE— single and multi-columnSET (a, b) = (v1, v2)/= (SELECT ...),FROM,WHERE,RETURNINGDELETE—USING,WHERE,RETURNING- DDL —
CREATE TABLE(column + table constraints:PRIMARY KEY,NOT NULL,UNIQUE,DEFAULT,CHECK,REFERENCES/FOREIGN KEY,IF NOT EXISTS,TEMP),CREATE [OR REPLACE] VIEW,CREATE [UNIQUE] INDEX(USING, partialWHERE,DESC),DROP TABLE/VIEW/INDEX(IF EXISTS,CASCADE),ALTER TABLE(ADD/DROP COLUMN,ADD/DROP CONSTRAINT,ALTER COLUMN … TYPE/SET/DROP DEFAULT/NOT NULL,RENAME) - Query features —
VALUESlists (standalone, inFROM, as set-op operands), set-returning functions inFROM(WITH ORDINALITY),LATERALandNATURALjoins,GROUP BYROLLUP/CUBE/GROUPING SETS,ORDER BY … USING,FETCH FIRST … ROWS ONLY/WITH TIES,FOR UPDATE/SHARElocking - Expression extras — quantified
op ANY/ALL (array | subquery),ARRAY[…]andARRAY(subquery),IS [NOT] DISTINCT FROM, row/tuple constructors, the full open operator class (geometric, range, text-search operators) - Utility & admin statements (
SET,SHOW,COPY,GRANT/REVOKE,ANALYZE,VACUUM,EXPLAIN, transaction control,TRUNCATE,COMMENT,CREATE TYPE/SEQUENCE/FUNCTION/…, and DDL options pgparse does not model) are recognised and validated asRawStmt— leading keyword plus verbatim text
Expressions
- Logical (
AND/OR/NOT), comparison, arithmetic,||concatenation — all with correct precedence and associativity - JSON/array operators (
->->>#>#>>@><@??|?&), bitwise (&|#<<>>~), and regex match (~~*!~!~*); array subscript and slice (a[1],a[1:3],a[:2]) CASE(simple + searched),CAST(x AS t)andx::t, typed literals (date '...'),INTERVAL '90' dayIN (list | subquery),BETWEEN,LIKE/ILIKE,IS [NOT] NULL/TRUE/FALSEEXISTS (...), scalar subqueries, function calls (DISTINCT,count(*)), SQL special functions (extract,substring,position,trim,overlay)- Literals (string with
''escapes, dollar-quoted, int/float, bool, NULL), positional parameters$n, qualified namesschema.table.column,table.*
Out of scope (use pg_query_go): MERGE, GRANT/REVOKE, PL/pgSQL
bodies, COPY, multi-word type names (timestamp with time zone,
double precision — use timestamptz etc.), the long tail of ALTER
sub-commands, and exact pg_query node-tree compatibility. Operators are lexed
by maximal munch, so adjacent operator characters without a space (e.g. a=-1)
are read as one operator (=-); write a = -1.
The AST is idiomatic typed Go (see ast.go) — ergonomic to walk and
pattern-match, not a protobuf mirror. Walk traverses it in pre-order:
stmt, _ := pgparse.ParseOne("SELECT * FROM a JOIN b ON a.id = b.a")
var tables []string
pgparse.Walk(stmt, func(n pgparse.Node) bool {
if t, ok := n.(*pgparse.TableName); ok {
tables = append(tables, t.Name)
}
return true // return false to skip a node's children
})
// tables == ["a", "b"]Mutates answers a common question in one call: does this SQL change data or
schema? Useful as a routing or observability hint.
res, _ := pgparse.Parse("UPDATE accounts SET balance = balance - 10 WHERE id = $1")
res.Mutates() // true
r2, _ := pgparse.Parse("SELECT * FROM accounts WHERE id = $1")
r2.Mutates() // false (r2.ReadOnly() == true)Not a security boundary. Classification is purely syntactic, so it cannot see side effects inside functions —
SELECT nextval('s')andSELECT writes_rows()are reported read-only because they look like reads. pgparse also parses a subset of PostgreSQL and is not the authoritative parser. UseMutatesas a hint; gate real writes with server-side controls (read-only roles,default_transaction_read_only, a read-replica connection).
Classify(stmt) gives the finer category — ClassReadOnly, ClassWrite
(INSERT/UPDATE/DELETE, TRUNCATE, MERGE, data-modifying CTEs, SELECT … INTO),
ClassDDL (CREATE/ALTER/DROP, GRANT, …), ClassTransaction
(BEGIN/COMMIT/ROLLBACK/SAVEPOINT — non-mutating), or ClassUtility (recognised
admin statements whose effect is not modelled, treated as possibly-writing).
Data-modifying CTEs (WITH x AS (UPDATE …) SELECT …) are detected as writes,
and EXPLAIN is treated as possibly-mutating because EXPLAIN ANALYZE executes
its argument. SELECT … FOR UPDATE is classified read-only — it locks rows but
does not change data (note it still requires a primary, not a read replica).
Deparse renders any AST node back to SQL. It is deterministic and idempotent
(deparse∘parse∘deparse == deparse∘parse), which makes it both a formatting
utility and a round-trip test oracle — every query in the test corpus is
verified to survive parse → print → re-parse unchanged.
stmt, _ := pgparse.ParseOne("select a,b from t where a>1")
fmt.Println(pgparse.Deparse(stmt))
// SELECT a, b FROM t WHERE a > 1Parse never panics: any internal panic is recovered and returned as an error,
and a Go fuzz target (FuzzParse) drives the unguarded parser
to prove it. Millions of executions over arbitrary and malformed input produce
errors, never crashes or hangs.
go test -fuzz=FuzzParse -fuzztime=30sThe suite is BDD (goconvey), race-tested for concurrency, round-trips every
parse through the deparser, and drives the full PostgreSQL regression corpus
through parse → classify → deparse → re-parse — ~89% statement coverage
(go test -cover).
A feature-by-feature acceptance matrix (in comparison/) over 23
representative constructs, parsed by each engine. pg_query_go (the real
PostgreSQL parser) is the fidelity baseline; pgparse and
GoSQLX are the pure-Go contenders.
Versions benchmarked (latest releases at time of writing):
pg_query_gov6.2.2 · GoSQLX v1.14.0 ·go-pgquery(wasm, latest, wazero v1.12.0) · pgparse — this repo. Reproduce withmake compare/make memcompare.
| pgparse | pg_query_go v6 | GoSQLX v1.14 | |
|---|---|---|---|
| features accepted | 23 / 23 | 23 / 23 | 20 / 23 |
multi-column UPDATE SET (a,b)=(…) |
✓ | ✓ | ✗ |
extract / substring keyword syntax |
✓ | ✓ | ✗ |
typed literal + INTERVAL '90' day |
✓ | ✓ | ✗ |
LATERAL join |
✓ | ✓ | ✓ |
DDL CREATE TABLE |
✓ | ✓ | ✓ |
pgparse matches the real-PostgreSQL baseline across all 23 representative
constructs, and parses the multi-column UPDATE form GoSQLX rejects.
A 23-feature matrix only proves the happy path. For a breadth measure, pgparse
is run against a subset of PostgreSQL's own regression test suite
(src/test/regress/sql) — the SQL that libpg_query/pg_query_go is validated
against. Each file is split into statements; a statement counts only when
pg_query_go (the real parser) accepts it, which discards psql meta-commands
and the suite's deliberately-invalid cases. The score is the share of those
genuinely-valid statements each pure-Go parser also accepts:
| statements | accepted | |
|---|---|---|
| pgparse | 7,985 | 99.4% |
| GoSQLX v1.14 | 7,985 | 48.4% |
pgparse accepts 99.4% of the real PostgreSQL statements pg_query_go
accepts, far ahead of GoSQLX. Of those, DML, queries, and core DDL are parsed
into a full typed AST; utility and administrative statements (SET, COPY,
GRANT, ANALYZE, CREATE TYPE/SEQUENCE/FUNCTION, DROP ROLE, …) and the
DDL options pgparse does not model (partitioning, storage parameters, …) are
recognised as a RawStmt: the leading keyword plus the verbatim,
delimiter-validated statement text. So "accepted" means recognised and
lexically validated, with full structure for the DML/DDL core. The remaining
~2% are deep edge cases (subqueries nested inside aggregate ORDER BY, array
slices as INSERT targets, …). For an exhaustive node tree of every statement,
use pg_query_go.
pg_query_go is the PostgreSQL parser (via cgo), so on coverage it is the
100% baseline. The trade is everything else. Measured over the ~5,300 regression
statements both engines accept:
| pgparse | pg_query_go v6.2.2 | |
|---|---|---|
| statements accepted (valid PG) | 98% | 100% |
| latency / statement | ~3.0 µs | ~55 µs |
| speedup | ~18× | 1× |
| memory / statement | ~2.3 KB | ~2.9 KB |
| allocations / statement | ~18 | ~57 |
| cgo / C toolchain | none | required |
| startup memory cost | none | C runtime |
Latency, memory, and allocations are measured over the statements pgparse parses
into a full AST (RawStmt-recognised statements are excluded so both sides do real
parsing work). In short: pg_query_go for exhaustive fidelity; pgparse when you
want most of the SQL real apps write, ~18× faster, with no cgo.
Reproduce all of the above with make compare.
The original motivation for pgparse was a WebAssembly build of pg_query
(wasilibs/go-pgquery, which runs
libpg_query as a wasm module via wazero) using too much memory. That is real and
measurable. Each engine here parsed the regression corpus repeatedly; RSS is the
process peak (cgo and wasm memory does not appear in Go's MemStats). Apple
M-series, one process per engine (make memcompare):
Single-threaded
| engine | ns / statement | startup RSS | peak RSS |
|---|---|---|---|
| pgparse | ~2,600 | 3 MB | 19 MB |
| GoSQLX v1.14 | ~7,700 | 5 MB | 21 MB |
| pg_query_go v6 (cgo) | ~51,000 | 12 MB | 26 MB |
| go-pgquery (wasm) | ~95,000 | 255 MB | 273 MB |
8 concurrent workers
| engine | ns / statement | peak RSS |
|---|---|---|
| pgparse | ~1,240 | 26 MB |
| GoSQLX v1.14 | ~4,500 | 28 MB |
| pg_query_go v6 (cgo) | ~7,500 | 31 MB |
| go-pgquery (wasm) | ~28,300 | 2,250 MB |
The wasm memory issue, and why pgparse avoids it. WebAssembly linear memory
can only grow — memory.grow never shrinks, and freed blocks are not returned
to the OS. A wasm parser therefore ratchets up to the high-water mark of the
largest/most queries it has seen and stays there; because a wazero instance is
single-threaded, concurrency needs a pool of instances, each holding its own
grown memory. In the table above that compounds to ~2.4 GB under 8 workers,
on top of a 255 MB fixed startup cost.
pgparse has none of this: it is pure Go with no linear memory, so each parse allocates only its AST (tokens alias the input string — zero copy) and the garbage collector reclaims it and returns pages to the OS. Memory stays flat (~26 MB) under 8× concurrency, ~95× less than the wasm build and below the cgo binding, while running ~5–20× faster.
Concurrency safety. pgparse holds no shared mutable state — the keyword and
word tables are read-only, and each Parse builds its own lexer and parser — so
it is safe to call from any number of goroutines with no pool and no lock. This
is enforced by a -race test (concurrent_test.go)
running tens of thousands of overlapping parses.
Hand-written byte scanner (no regex) producing tokens whose values alias the source string, feeding a single-pass recursive-descent parser. No reflection, no intermediate string copies in the hot path.
Measured on Go 1.26 / arm64 (go test -bench=. -benchmem):
BenchmarkParse/simple-10 774518 1564 ns/op 25.58 MB/s 1488 B/op 19 allocs/op
BenchmarkParse/join-10 194725 6043 ns/op 41.04 MB/s 5168 B/op 65 allocs/op
BenchmarkParse/cte_window-10 195241 6195 ns/op 49.56 MB/s 5752 B/op 64 allocs/op
BenchmarkParse/expr_heavy-10 127866 9399 ns/op 27.66 MB/s 11984 B/op 90 allocs/op
BenchmarkParse/insert-10 268135 4464 ns/op 43.23 MB/s 3448 B/op 41 allocs/op
BenchmarkTokenize-10 399202 2997 ns/op 82.74 MB/s 3176 B/op 14 allocs/op
Unlike cgo/wasm approaches, there is no fixed startup memory cost — a process that never parses pays nothing, and each parse allocates only its own AST.
Reproduce:
make benchBenchmarked over the TPC-H query corpus (all 22 queries parse). Both engines
parse the identical queries; pg_query_go v6.2.2 is the cgo binding
around the real PostgreSQL parser. Apple M-series, go test -bench=Corpus -benchmem:
| Engine | ns / query | B / query | allocs / query |
|---|---|---|---|
| pgparse | ~14,200 | ~9,700 | ~94 |
pg_query_go v6 |
~376,000 | ~17,700 | ~392 |
| pgparse advantage | ~26× faster | ~2× less | ~4× fewer |
pg_query_go does strictly more — it produces the full-fidelity PostgreSQL
node tree for every statement kind. Its per-call cost includes the cgo
boundary and protobuf (de)serialization of that tree, which is the real price a
Go program pays to use it. pgparse trades exhaustive fidelity for a lean Go AST,
no cgo, and the throughput above. Pick accordingly.
The comparison lives in a separate module (comparison/) so the root module
never pulls in cgo. Reproduce:
make compare # see Makefile; sets the macOS CGO workaroundCoverage of the corpus is asserted by TestTPCHCoverage (22/22).
pgparse follows Semantic Versioning. The public API —
the top-level functions (Parse, ParseOne, Tokenize, Deparse, Classify,
Mutates) and the exported AST types in ast.go / DDL nodes — is the
compatibility surface.
As of v1.0.0 the public API is covered by SemVer:
- No breaking changes to the public API within a major version. New AST fields
and new statement/expression kinds may be added (so always include a
default case when type-switching over
Stmt/Expr); fields and types are not removed or renamed except in a new major version. Parsenever panics and bounds recursion and input size.
Check the CHANGELOG before upgrading.
Safety on untrusted input and the limits of Mutates/Classify are described in
SECURITY.md.
Run on untrusted input safely: Parse never panics, recursion depth is bounded,
and MaxInputBytes caps input size. See SECURITY.md for the
threat model and how to report a vulnerability. Note that Mutates/Classify
is a syntactic hint, not a security control.
MIT — see LICENSE.
