This document captures our current analysis of gaps between pgparser and PostgreSQL's native parser, with emphasis on 100% syntax compatibility and nodeToString output parity.
- Syntax coverage: pgparser accepts the same SQL grammar as PG.
- AST shape parity: nodes and fields match PG parse trees.
- nodeToString parity: output strings exactly match PG.
All three are required; parsing success alone is insufficient.
- Grammar coverage is incomplete (many missing rule alternatives).
- Some rules are refactored/flattened due to goyacc vs bison differences.
- AST shapes sometimes diverge (e.g., qualified_name, row constructors).
- Location tracking is largely missing (goyacc lacks %locations).
- `%locations`: bison emits token positions; goyacc does not. Many nodes in pgparser use `Location: -1`, so nodeToString output cannot match PG.
- Conflict resolution: the PG grammar relies on bison behavior and `%prec`. goyacc sometimes requires rules to be inlined or rewritten; the accepted syntax may be equivalent, but the rule structure differs.
- Error productions: PG has error-recovery branches that are absent or simplified in pgparser.
- `UNIQUE (SELECT ...)` in `a_expr` (PG-only).
- `TREAT (expr AS type)` (PG-only in `func_expr_common_subexpr`).
- `ROW()` empty constructor (PG-only in `row`).
- `OPERATOR(schema.op) ANY/ALL` (PG-only in `subquery_Op`).
- `(SELECT ...)[...]` / `(SELECT ...).field` (PG-only in `c_expr`).
- `NATIONAL CHAR` / `NCHAR` type keywords (PG-only in character rules).
- `qualified_name`: PG produces `RangeVar`; pgparser produces `[]String`.
- `ConstCharacter`/`ConstTypename`: typmod defaults are handled differently.
- `JsonType`: pgparser inlines the `JSON` token; PG uses the `SystemTypeName` path.
- `AexprConst`: pgparser lacks some PG-only validation paths for type modifiers.
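In PG, `qualified_name` builds its `RangeVar` through `makeRangeVar` or `makeRangeVarFromQualifiedName`, which assign dotted parts from the right (relation, then schema, then catalog) and reject more than three parts. A minimal Go sketch of that mapping, assuming a simplified, hypothetical `RangeVar` struct rather than pgparser's real node type; the defaults mirror `makeRangeVar` (`inh = true`, `relpersistence = 'p'`).

```go
package main

import (
	"errors"
	"fmt"
)

// RangeVar is a simplified, hypothetical mirror of PG's RangeVar node,
// keeping only the fields that qualified_name populates.
type RangeVar struct {
	Catalogname    string // "" stands in for PG's NULL
	Schemaname     string // "" stands in for PG's NULL
	Relname        string
	Inh            bool
	Relpersistence byte
	Location       int
}

// rangeVarFromQualifiedName maps 1 to 3 dotted name parts onto a RangeVar:
// the last part is the relation, then the schema, then the catalog.
func rangeVarFromQualifiedName(parts []string, location int) (*RangeVar, error) {
	rv := &RangeVar{Inh: true, Relpersistence: 'p', Location: location}
	switch len(parts) {
	case 0:
		return nil, errors.New("empty qualified name")
	case 1:
		rv.Relname = parts[0]
	case 2:
		rv.Schemaname, rv.Relname = parts[0], parts[1]
	case 3:
		rv.Catalogname, rv.Schemaname, rv.Relname = parts[0], parts[1], parts[2]
	default:
		return nil, fmt.Errorf("improper qualified name (too many dotted names): %v", parts)
	}
	return rv, nil
}
```

Returning a `*RangeVar` instead of `[]String` is what lets nodeToString emit `{RANGEVAR ...}` for table references rather than a string list.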
- `IN` handling: PG uses the `in_expr` helper; pgparser inlines the alternatives in `a_expr`. Generally equivalent, but verify how `OperName` is populated.
- `subquery_Op`: pgparser expands basic operators instead of using `all_Op`/`MathOp`.
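For reference, PG's `a_expr IN_P in_expr` and `a_expr NOT_LA IN_P in_expr` actions produce two different shapes. A sub-select becomes a `SubLink` of type `ANY_SUBLINK` with `operName` left `NIL` (marking `IN` rather than `= ANY`), wrapped in a `NOT` for `NOT IN`; an expression list becomes an `A_Expr` of kind `AEXPR_IN` with operator `=` or `<>`. The sketch below encodes those rules with simplified, hypothetical node types (enum values as strings, operator names as `[]string`) so the expected shape can be checked against pgparser's output.

```go
package main

// Node and the structs below are hypothetical simplifications of PG's
// SubLink, A_Expr, and BoolExpr; field names follow the PG structs.
type Node interface{}

type SubLink struct {
	SubLinkType string   // e.g. "ANY_SUBLINK"
	Testexpr    Node     // left-hand side of IN
	OperName    []string // nil marks IN rather than "= ANY"
	Subselect   Node
	Location    int
}

type AExpr struct {
	Kind     string   // e.g. "AEXPR_IN"
	Name     []string // operator name list
	Lexpr    Node
	Rexpr    Node // the expression list
	Location int
}

type BoolExpr struct {
	Boolop   string // e.g. "NOT_EXPR"
	Args     []Node
	Location int
}

// buildIn mirrors the IN / NOT IN actions: rhs is either a *SubLink
// (sub-select) or an expression list. loc is the offset of the first
// operator keyword (IN, or NOT in NOT IN), i.e. @2 in the PG rule.
func buildIn(lhs Node, rhs Node, negated bool, loc int) Node {
	if sl, ok := rhs.(*SubLink); ok {
		sl.SubLinkType = "ANY_SUBLINK"
		sl.Testexpr = lhs
		sl.OperName = nil // PG leaves operName NIL for IN
		sl.Location = loc
		if negated {
			// NOT IN wraps the SubLink in NOT with the same location.
			return &BoolExpr{Boolop: "NOT_EXPR", Args: []Node{sl}, Location: loc}
		}
		return sl
	}
	op := "="
	if negated {
		op = "<>"
	}
	return &AExpr{Kind: "AEXPR_IN", Name: []string{op}, Lexpr: lhs, Rexpr: rhs, Location: loc}
}
```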
- Location fields: PG sets `location` via `@n` (bison); pgparser lacks this.
- AST shape: different node types or list shapes produce different output.
- Field order/omissions: outfuncs order and default values must match PG.
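Because nodeToString writes every location as ` :location N`, location-only differences can be separated from shape, order, and default differences by masking those values before comparing. A hypothetical triage helper, assuming outputs are compared per statement; note that a string literal containing `:location` followed by a number would also be masked.

```go
package main

import "regexp"

// locationField matches ":location N" entries, including -1 for unknown
// locations. ":stmt_location" is not matched because no colon precedes
// "location" there.
var locationField = regexp.MustCompile(`:location -?\d+`)

// maskLocations replaces every location value with a placeholder so two
// outputs can be compared on shape and field content alone.
func maskLocations(s string) string {
	return locationField.ReplaceAllString(s, ":location ?")
}

// classifyDiff triages one statement's output pair: "match",
// "location-only" (shape agrees, offsets differ), or "shape"
// (node types, field order, or defaults differ).
func classifyDiff(pg, ours string) string {
	switch {
	case pg == ours:
		return "match"
	case maskLocations(pg) == maskLocations(ours):
		return "location-only"
	default:
		return "shape"
	}
}
```

Counting statements per bucket gives a rough split between work on location propagation and work on AST shape or outfuncs ordering.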
- Implement location propagation in the lexer/parser (simulate `%locations`).
- Align node constructors to match PG raw parse trees.
- Ensure `nodes/outfuncs.go` matches PG `outfuncs.c` ordering and defaults.
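PG's `outfuncs.c` writes each field through a `WRITE_*_FIELD` macro in struct declaration order, printing NULL strings as `<>` and booleans as `true`/`false`. The sketch below imitates that pattern for `RangeVar` with a hypothetical `outBuf` helper; it omits token escaping, empty-string handling, and alias serialization, so treat its output as an approximation to verify against PG rather than a reference.

```go
package main

import (
	"strconv"
	"strings"
)

// outBuf is a hypothetical helper: each call appends " :name value"
// in call order, so call order defines field order.
type outBuf struct{ sb strings.Builder }

func (o *outBuf) str(name string, v *string) {
	o.sb.WriteString(" :" + name + " ")
	if v == nil {
		o.sb.WriteString("<>") // NULL string
		return
	}
	o.sb.WriteString(*v) // escaping of special characters omitted
}

func (o *outBuf) boolean(name string, v bool) {
	o.sb.WriteString(" :" + name + " " + strconv.FormatBool(v))
}

func (o *outBuf) char(name string, c byte) {
	o.sb.WriteString(" :" + name + " " + string(c))
}

func (o *outBuf) location(name string, loc int) {
	o.sb.WriteString(" :" + name + " " + strconv.Itoa(loc))
}

// outRangeVar emits RangeVar fields in PG's declaration order:
// catalogname, schemaname, relname, inh, relpersistence, alias, location.
func outRangeVar(catalog, schema *string, relname string, inh bool, persistence byte, loc int) string {
	var o outBuf
	o.sb.WriteString("{RANGEVAR")
	o.str("catalogname", catalog)
	o.str("schemaname", schema)
	o.str("relname", &relname)
	o.boolean("inh", inh)
	o.char("relpersistence", persistence)
	o.sb.WriteString(" :alias <>") // this sketch always writes a NULL alias
	o.location("location", loc)
	o.sb.WriteString("}")
	return o.sb.String()
}
```

Keeping one writer call per struct field, in declaration order, makes ordering mistakes visible in review rather than only in diff output.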
- Continue executing `docs/plans/2026-01-30-gram-completion.md`.
- Focus on missing/high-impact rules: expressions, window, table_ref, COPY, partitioning, JSON/XML.
- Systematically compare PG vs pgparser raw parse trees.
- Fix structural differences (RangeVar, row constructors, operator names, SetToDefault, TypeName, etc.).
- Add token location tracking (simulated `%locations`).
- Ensure all nodes emit fields in PG order with matching defaults.
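Simulating `%locations` mainly means that every token carries its start offset and each reduction copies the offset of the relevant right-hand-side symbol (the `@n` of the PG rule) into the node's `Location`. The toy scanner below is neither PG's flex scanner nor pgparser's lexer; it only shows the bookkeeping: 0-based byte offsets per token, and `-1` when no token is available. In a goyacc grammar, one option (an assumption, not a description of pgparser) is for the lexer to store the offset in the semantic value it fills in, so actions can read it alongside the token text.

```go
package main

import "unicode"

// Token carries its 0-based byte offset in the query string, playing
// the role of bison's @n under %locations.
type Token struct {
	Text   string
	Offset int
}

// scanWords is a toy scanner: it splits on whitespace and treats each of
// ( ) , ; as a single-character token, recording each token's start offset.
func scanWords(sql string) []Token {
	var toks []Token
	start := -1 // start offset of the word being accumulated, -1 if none
	flush := func(end int) {
		if start >= 0 {
			toks = append(toks, Token{Text: sql[start:end], Offset: start})
			start = -1
		}
	}
	for i, r := range sql { // i is a byte offset, as PG locations are
		switch {
		case unicode.IsSpace(r):
			flush(i)
		case r == '(' || r == ')' || r == ',' || r == ';':
			flush(i)
			toks = append(toks, Token{Text: string(r), Offset: i})
		default:
			if start < 0 {
				start = i
			}
		}
	}
	flush(len(sql))
	return toks
}

// locationOf returns the offset of a rule's first token (the equivalent
// of @1), or -1, PG's "unknown location", when the rule matched no tokens.
func locationOf(rhs []Token) int {
	if len(rhs) == 0 {
		return -1
	}
	return rhs[0].Offset
}
```

For `SELECT a FROM foo;`, the token `foo` starts at byte 14, which is the value PG would report as the `RangeVar` location.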
We added a baseline PG parse helper and a diff tool to compare outputs.
- `tools/pg_parse_helper/pg_parse_helper.c`
- `tools/pg_parse_helper/build.sh`

Build (requires a configured PG source tree):

```sh
PG_SRC=~/Github/postgres tools/pg_parse_helper/build.sh
```
- `tools/pg_parse_diff/main.go`
- `tools/pg_parse_diff/smoke.sql`

Usage:

```sh
go run tools/pg_parse_diff/main.go --file tools/pg_parse_diff/smoke.sql
```

Or, for the regression SQL:

```sh
go run tools/pg_parse_diff/main.go --dir parser/pgregress/testdata/sql
```
This compares raw parse trees (the same level as the output of pgparser's `parser.Parse`).
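When two nodeToString outputs differ, the first differing byte usually points at the offending field. A hypothetical helper for per-statement reports, not taken from `tools/pg_parse_diff/main.go`; `context` is assumed to be non-negative.

```go
package main

import "fmt"

// firstDiff returns the byte offset of the first difference between a
// and b, or -1 if they are identical. If one string is a strict prefix
// of the other, the difference is at the shorter length.
func firstDiff(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

// diffReport formats an excerpt of up to context bytes on each side of
// the first difference, so long outputs stay readable in a report.
func diffReport(pg, ours string, context int) string {
	i := firstDiff(pg, ours)
	if i < 0 {
		return "identical"
	}
	excerpt := func(s string) string {
		lo, hi := i-context, i+context
		if lo < 0 {
			lo = 0
		}
		if hi > len(s) {
			hi = len(s)
		}
		return s[lo:hi]
	}
	return fmt.Sprintf("offset %d: pg=%q ours=%q", i, excerpt(pg), excerpt(ours))
}
```

For example, comparing `:inh true` with `:inh false` using two bytes of context reports offset 5 with the excerpts `h tr` and `h fa`.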
- Close the high-impact missing syntax in expressions and table refs.
- Fix AST shape divergences that create nodeToString diffs (e.g., RangeVar).
- Implement location propagation to match PG output.