pgparse

A pure-Go PostgreSQL SQL parser. No cgo. No WebAssembly runtime. No native library to link. It compiles into any Go program as ordinary source and imposes zero per-process memory overhead at startup.

▶ Try it live in your browser → zkrebbekx.github.io/pgparse

Type any PostgreSQL and watch it parse instantly — AST tree, read/write classification, and real µs/parse timing, running 100% client-side (no server, your SQL never leaves the tab).

The library (github.com/zkrebbekx/pgparse) is pure Go with no WebAssembly — nothing you import pulls in a wasm runtime. WebAssembly is used only by the separate playground/ module, which compiles the pure-Go library to a GOOS=js build so it can run in the browser. Importing pgparse into your Go program involves no wasm and no cgo.

It exists to fill the gap between the two common options:

Approach	Cost
`pg_query_go`	Wraps the real Postgres parser via cgo — needs a C toolchain, cross-compilation pain, and a non-trivial binary.
WebAssembly builds of `libpg_query`	Pure Go to call, but the wasm runtime allocates a large linear memory region on startup for every instance.
pgparse	Hand-written scanner + recursive-descent parser. Pure Go, zero deps, no startup allocation, low per-parse allocation.

pgparse targets a pragmatic, high-coverage subset of PostgreSQL DML rather than the entire grammar — see Coverage. If you need every Postgres node exactly, use pg_query_go. If you need to parse the SQL that real apps actually write, fast, with no cgo, this is for you.

Requires Go 1.20+. Zero runtime dependencies (goconvey is a test-only dep).

Install

go get github.com/zkrebbekx/pgparse

Usage

package main

import (
	"fmt"
	"log"

	"github.com/zkrebbekx/pgparse"
)

func main() {
	res, err := pgparse.Parse(`
		SELECT u.id, u.name, count(o.id) AS orders
		FROM users u
		LEFT JOIN orders o ON o.user_id = u.id
		WHERE u.active = true
		GROUP BY u.id, u.name
		HAVING count(o.id) > 0
		ORDER BY orders DESC
		LIMIT 10
	`)
	if err != nil {
		log.Fatal(err)
	}

	sel := res.Stmts[0].(*pgparse.SelectStmt)
	fmt.Println("columns:", len(sel.Columns))
	fmt.Println("from:", sel.From[0].(*pgparse.JoinExpr).Kind)
}

Single-statement helper:

stmt, err := pgparse.ParseOne("DELETE FROM logs WHERE created_at < $1")
del := stmt.(*pgparse.DeleteStmt)

Token stream only (e.g. for highlighting):

toks, _ := pgparse.Tokenize("SELECT 1 + 2")

Coverage

Statements

SELECT — projection with aliases, DISTINCT / DISTINCT ON, FROM with comma and explicit joins (INNER, LEFT, RIGHT, FULL, CROSS, OUTER, ON / USING, LATERAL subqueries), WHERE, GROUP BY, HAVING, ORDER BY (ASC/DESC, NULLS FIRST/LAST), LIMIT, OFFSET
WITH CTEs (incl. RECURSIVE) and subqueries in FROM / expressions
Set operations: UNION / INTERSECT / EXCEPT (ALL), with correct precedence (INTERSECT binds tighter) and a tail that binds to the whole expression
Window functions: OVER (PARTITION BY ... ORDER BY ...), frame clauses (ROWS/RANGE/GROUPS ... BETWEEN ... AND ..., EXCLUDE), named windows (OVER w), FILTER (WHERE ...), WITHIN GROUP (ORDER BY ...), and aggregate ORDER BY (array_agg(x ORDER BY y))
INSERT — column lists, multi-row VALUES, INSERT ... SELECT, ON CONFLICT (...) DO NOTHING | DO UPDATE SET ..., RETURNING
UPDATE — single and multi-column SET (a, b) = (v1, v2) / = (SELECT ...), FROM, WHERE, RETURNING
DELETE — USING, WHERE, RETURNING
DDL — CREATE TABLE (column + table constraints: PRIMARY KEY, NOT NULL, UNIQUE, DEFAULT, CHECK, REFERENCES/FOREIGN KEY, IF NOT EXISTS, TEMP), CREATE [OR REPLACE] VIEW, CREATE [UNIQUE] INDEX (USING, partial WHERE, DESC), DROP TABLE/VIEW/INDEX (IF EXISTS, CASCADE), ALTER TABLE (ADD/DROP COLUMN, ADD/DROP CONSTRAINT, ALTER COLUMN … TYPE/SET/DROP DEFAULT/NOT NULL, RENAME)
Query features — VALUES lists (standalone, in FROM, as set-op operands), set-returning functions in FROM (WITH ORDINALITY), LATERAL and NATURAL joins, GROUP BY ROLLUP/CUBE/GROUPING SETS, ORDER BY … USING, FETCH FIRST … ROWS ONLY/WITH TIES, FOR UPDATE/SHARE locking
Expression extras — quantified op ANY/ALL (array | subquery), ARRAY[…] and ARRAY(subquery), IS [NOT] DISTINCT FROM, row/tuple constructors, the full open operator class (geometric, range, text-search operators)
Utility & admin statements (SET, SHOW, COPY, GRANT/REVOKE, ANALYZE, VACUUM, EXPLAIN, transaction control, TRUNCATE, COMMENT, CREATE TYPE/SEQUENCE/FUNCTION/…, and DDL options pgparse does not model) are recognised and validated as RawStmt — leading keyword plus verbatim text

Expressions

Logical (AND / OR / NOT), comparison, arithmetic, || concatenation — all with correct precedence and associativity
JSON/array operators (-> ->> #> #>> @> <@ ? ?| ?&), bitwise (& | # << >> ~), and regex match (~ ~* !~ !~*); array subscript and slice (a[1], a[1:3], a[:2])
CASE (simple + searched), CAST(x AS t) and x::t, typed literals (date '...'), INTERVAL '90' day
IN (list | subquery), BETWEEN, LIKE / ILIKE, IS [NOT] NULL/TRUE/FALSE
EXISTS (...), scalar subqueries, function calls (DISTINCT, count(*)), SQL special functions (extract, substring, position, trim, overlay)
Literals (string with '' escapes, dollar-quoted, int/float, bool, NULL), positional parameters $n, qualified names schema.table.column, table.*

Out of scope (use pg_query_go): MERGE, GRANT/REVOKE, PL/pgSQL bodies, COPY, multi-word type names (timestamp with time zone, double precision — use timestamptz etc.), the long tail of ALTER sub-commands, and exact pg_query node-tree compatibility. Operators are lexed by maximal munch, so adjacent operator characters without a space (e.g. a=-1) are read as one operator (=-); write a = -1.

The AST is idiomatic typed Go (see ast.go) — ergonomic to walk and pattern-match, not a protobuf mirror. Walk traverses it in pre-order:

stmt, _ := pgparse.ParseOne("SELECT * FROM a JOIN b ON a.id = b.a")
var tables []string
pgparse.Walk(stmt, func(n pgparse.Node) bool {
        if t, ok := n.(*pgparse.TableName); ok {
                tables = append(tables, t.Name)
        }
        return true // return false to skip a node's children
})
// tables == ["a", "b"]

Read vs. write classification

Mutates answers a common question in one call: does this SQL change data or schema? Useful as a routing or observability hint.

res, _ := pgparse.Parse("UPDATE accounts SET balance = balance - 10 WHERE id = $1")
res.Mutates()   // true

r2, _ := pgparse.Parse("SELECT * FROM accounts WHERE id = $1")
r2.Mutates()    // false  (r2.ReadOnly() == true)

Not a security boundary. Classification is purely syntactic, so it cannot see side effects inside functions — SELECT nextval('s') and SELECT writes_rows() are reported read-only because they look like reads. pgparse also parses a subset of PostgreSQL and is not the authoritative parser. Use Mutates as a hint; gate real writes with server-side controls (read-only roles, default_transaction_read_only, a read-replica connection).

Classify(stmt) gives the finer category — ClassReadOnly, ClassWrite (INSERT/UPDATE/DELETE, TRUNCATE, MERGE, data-modifying CTEs, SELECT … INTO), ClassDDL (CREATE/ALTER/DROP, GRANT, …), ClassTransaction (BEGIN/COMMIT/ROLLBACK/SAVEPOINT — non-mutating), or ClassUtility (recognised admin statements whose effect is not modelled, treated as possibly-writing). Data-modifying CTEs (WITH x AS (UPDATE …) SELECT …) are detected as writes, and EXPLAIN is treated as possibly-mutating because EXPLAIN ANALYZE executes its argument. SELECT … FOR UPDATE is classified read-only — it locks rows but does not change data (note it still requires a primary, not a read replica).

Deparse (AST → SQL)

Deparse renders any AST node back to SQL. It is deterministic and idempotent (deparse∘parse∘deparse == deparse∘parse), which makes it both a formatting utility and a round-trip test oracle — every query in the test corpus is verified to survive parse → print → re-parse unchanged.

stmt, _ := pgparse.ParseOne("select a,b from t where a>1")
fmt.Println(pgparse.Deparse(stmt))
// SELECT a, b FROM t WHERE a > 1

Robustness

Parse never panics: any internal panic is recovered and returned as an error, and a Go fuzz target (FuzzParse) drives the unguarded parser to prove it. Millions of executions over arbitrary and malformed input produce errors, never crashes or hangs.

go test -fuzz=FuzzParse -fuzztime=30s

The suite is BDD (goconvey), race-tested for concurrency, round-trips every parse through the deparser, and drives the full PostgreSQL regression corpus through parse → classify → deparse → re-parse — ~89% statement coverage (go test -cover).

Completeness vs other Go parsers

A feature-by-feature acceptance matrix (in comparison/) over 23 representative constructs, parsed by each engine. pg_query_go (the real PostgreSQL parser) is the fidelity baseline; pgparse and GoSQLX are the pure-Go contenders.

Versions benchmarked (latest releases at time of writing): pg_query_go v6.2.2 · GoSQLX v1.14.0 · go-pgquery (wasm, latest, wazero v1.12.0) · pgparse — this repo. Reproduce with make compare / make memcompare.

	pgparse	pg_query_go v6	GoSQLX v1.14
features accepted	23 / 23	23 / 23	20 / 23
multi-column `UPDATE SET (a,b)=(…)`	✓	✓	✗
`extract` / `substring` keyword syntax	✓	✓	✗
typed literal + `INTERVAL '90' day`	✓	✓	✗
`LATERAL` join	✓	✓	✓
DDL `CREATE TABLE`	✓	✓	✓

pgparse matches the real-PostgreSQL baseline across all 23 representative constructs, and parses the multi-column UPDATE form GoSQLX rejects.

Breadth: the PostgreSQL regression suite

A 23-feature matrix only proves the happy path. For a breadth measure, pgparse is run against a subset of PostgreSQL's own regression test suite (src/test/regress/sql) — the SQL that libpg_query/pg_query_go is validated against. Each file is split into statements; a statement counts only when pg_query_go (the real parser) accepts it, which discards psql meta-commands and the suite's deliberately-invalid cases. The score is the share of those genuinely-valid statements each pure-Go parser also accepts:

	statements	accepted
pgparse	7,985	99.4%
GoSQLX v1.14	7,985	48.4%

pgparse accepts 99.4% of the real PostgreSQL statements pg_query_go accepts, far ahead of GoSQLX. Of those, DML, queries, and core DDL are parsed into a full typed AST; utility and administrative statements (SET, COPY, GRANT, ANALYZE, CREATE TYPE/SEQUENCE/FUNCTION, DROP ROLE, …) and the DDL options pgparse does not model (partitioning, storage parameters, …) are recognised as a RawStmt: the leading keyword plus the verbatim, delimiter-validated statement text. So "accepted" means recognised and lexically validated, with full structure for the DML/DDL core. The remaining ~2% are deep edge cases (subqueries nested inside aggregate ORDER BY, array slices as INSERT targets, …). For an exhaustive node tree of every statement, use pg_query_go.

vs `pg_query_go`, in one place

pg_query_go is the PostgreSQL parser (via cgo), so on coverage it is the 100% baseline. The trade is everything else. Measured over the ~5,300 regression statements both engines accept:

	pgparse	pg_query_go v6.2.2
statements accepted (valid PG)	98%	100%
latency / statement	~3.0 µs	~55 µs
speedup	~18×	1×
memory / statement	~2.3 KB	~2.9 KB
allocations / statement	~18	~57
cgo / C toolchain	none	required
startup memory cost	none	C runtime

Latency, memory, and allocations are measured over the statements pgparse parses into a full AST (RawStmt-recognised statements are excluded so both sides do real parsing work). In short: pg_query_go for exhaustive fidelity; pgparse when you want most of the SQL real apps write, ~18× faster, with no cgo.

Reproduce all of the above with make compare.

Memory, CPU & concurrency vs cgo / wasm

The original motivation for pgparse was a WebAssembly build of pg_query (wasilibs/go-pgquery, which runs libpg_query as a wasm module via wazero) using too much memory. That is real and measurable. Each engine here parsed the regression corpus repeatedly; RSS is the process peak (cgo and wasm memory does not appear in Go's MemStats). Apple M-series, one process per engine (make memcompare):

Single-threaded

engine	ns / statement	startup RSS	peak RSS
pgparse	~2,600	3 MB	19 MB
GoSQLX v1.14	~7,700	5 MB	21 MB
pg_query_go v6 (cgo)	~51,000	12 MB	26 MB
go-pgquery (wasm)	~95,000	255 MB	273 MB

8 concurrent workers

engine	ns / statement	peak RSS
pgparse	~1,240	26 MB
GoSQLX v1.14	~4,500	28 MB
pg_query_go v6 (cgo)	~7,500	31 MB
go-pgquery (wasm)	~28,300	2,250 MB

The wasm memory issue, and why pgparse avoids it. WebAssembly linear memory can only grow — memory.grow never shrinks, and freed blocks are not returned to the OS. A wasm parser therefore ratchets up to the high-water mark of the largest/most queries it has seen and stays there; because a wazero instance is single-threaded, concurrency needs a pool of instances, each holding its own grown memory. In the table above that compounds to ~2.4 GB under 8 workers, on top of a 255 MB fixed startup cost.

pgparse has none of this: it is pure Go with no linear memory, so each parse allocates only its AST (tokens alias the input string — zero copy) and the garbage collector reclaims it and returns pages to the OS. Memory stays flat (~26 MB) under 8× concurrency, ~95× less than the wasm build and below the cgo binding, while running ~5–20× faster.

Concurrency safety. pgparse holds no shared mutable state — the keyword and word tables are read-only, and each Parse builds its own lexer and parser — so it is safe to call from any number of goroutines with no pool and no lock. This is enforced by a -race test (concurrent_test.go) running tens of thousands of overlapping parses.

Performance

Hand-written byte scanner (no regex) producing tokens whose values alias the source string, feeding a single-pass recursive-descent parser. No reflection, no intermediate string copies in the hot path.

Measured on Go 1.26 / arm64 (go test -bench=. -benchmem):

BenchmarkParse/simple-10        774518     1564 ns/op   25.58 MB/s    1488 B/op    19 allocs/op
BenchmarkParse/join-10          194725     6043 ns/op   41.04 MB/s    5168 B/op    65 allocs/op
BenchmarkParse/cte_window-10    195241     6195 ns/op   49.56 MB/s    5752 B/op    64 allocs/op
BenchmarkParse/expr_heavy-10    127866     9399 ns/op   27.66 MB/s   11984 B/op    90 allocs/op
BenchmarkParse/insert-10        268135     4464 ns/op   43.23 MB/s    3448 B/op    41 allocs/op
BenchmarkTokenize-10            399202     2997 ns/op   82.74 MB/s    3176 B/op    14 allocs/op

Unlike cgo/wasm approaches, there is no fixed startup memory cost — a process that never parses pays nothing, and each parse allocates only its own AST.

Reproduce:

make bench

Head-to-head vs `pg_query_go`

Benchmarked over the TPC-H query corpus (all 22 queries parse). Both engines parse the identical queries; pg_query_go v6.2.2 is the cgo binding around the real PostgreSQL parser. Apple M-series, go test -bench=Corpus -benchmem:

Engine	ns / query	B / query	allocs / query
pgparse	~14,200	~9,700	~94
`pg_query_go` v6	~376,000	~17,700	~392
pgparse advantage	~26× faster	~2× less	~4× fewer

pg_query_go does strictly more — it produces the full-fidelity PostgreSQL node tree for every statement kind. Its per-call cost includes the cgo boundary and protobuf (de)serialization of that tree, which is the real price a Go program pays to use it. pgparse trades exhaustive fidelity for a lean Go AST, no cgo, and the throughput above. Pick accordingly.

The comparison lives in a separate module (comparison/) so the root module never pulls in cgo. Reproduce:

make compare        # see Makefile; sets the macOS CGO workaround

Coverage of the corpus is asserted by TestTPCHCoverage (22/22).

Stability

pgparse follows Semantic Versioning. The public API — the top-level functions (Parse, ParseOne, Tokenize, Deparse, Classify, Mutates) and the exported AST types in ast.go / DDL nodes — is the compatibility surface.

As of v1.0.0 the public API is covered by SemVer:

No breaking changes to the public API within a major version. New AST fields and new statement/expression kinds may be added (so always include a default case when type-switching over Stmt/Expr); fields and types are not removed or renamed except in a new major version.
Parse never panics and bounds recursion and input size.

Check the CHANGELOG before upgrading.

Safety on untrusted input and the limits of Mutates/Classify are described in SECURITY.md.

Security

Run on untrusted input safely: Parse never panics, recursion depth is bounded, and MaxInputBytes caps input size. See SECURITY.md for the threat model and how to report a vulnerability. Note that Mutates/Classify is a syntactic hint, not a security control.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github		.github
comparison		comparison
example		example
playground		playground
testdata/tpch		testdata/tpch
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
ast.go		ast.go
bench_test.go		bench_test.go
classify.go		classify.go
classify_test.go		classify_test.go
concurrent_test.go		concurrent_test.go
corpus_test.go		corpus_test.go
ddl.go		ddl.go
ddl_ast.go		ddl_ast.go
ddl_deparse.go		ddl_deparse.go
ddl_test.go		ddl_test.go
deparse.go		deparse.go
dml.go		dml.go
errors.go		errors.go
errors_test.go		errors_test.go
example_test.go		example_test.go
expr.go		expr.go
ext.go		ext.go
feature_test.go		feature_test.go
fuzz_test.go		fuzz_test.go
go.mod		go.mod
go.sum		go.sum
lexer.go		lexer.go
lexer_test.go		lexer_test.go
markers_test.go		markers_test.go
parser.go		parser.go
parser_test.go		parser_test.go
pgparse.go		pgparse.go
roundtrip_test.go		roundtrip_test.go
token.go		token.go
tpch_coverage_test.go		tpch_coverage_test.go
walk.go		walk.go
walk_test.go		walk_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pgparse

▶ Try it live in your browser → zkrebbekx.github.io/pgparse

Install

Usage

Coverage

Read vs. write classification

Deparse (AST → SQL)

Robustness

Completeness vs other Go parsers

Breadth: the PostgreSQL regression suite

vs `pg_query_go`, in one place

Memory, CPU & concurrency vs cgo / wasm

Performance

Head-to-head vs `pg_query_go`

Stability

Security

License

About

Uh oh!

Releases 9

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pgparse

▶ Try it live in your browser → zkrebbekx.github.io/pgparse

Install

Usage

Coverage

Read vs. write classification

Deparse (AST → SQL)

Robustness

Completeness vs other Go parsers

Breadth: the PostgreSQL regression suite

vs pg_query_go, in one place

Memory, CPU & concurrency vs cgo / wasm

Performance

Head-to-head vs pg_query_go

Stability

Security

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

vs `pg_query_go`, in one place

Head-to-head vs `pg_query_go`

Packages