CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

GoSQLX is a production-ready, race-free, high-performance SQL parsing SDK for Go that provides lexing, parsing, and AST generation with zero-copy optimizations. The library is designed for enterprise use with comprehensive object pooling for memory efficiency.

Requirements: Go 1.26+ (upgraded from 1.23 to fix stdlib vulnerabilities; mark3labs/mcp-go requires 1.23)

Production Status: ✅ Validated for production deployment (v1.6.0+, current: v1.14.0)

Thread-safe with zero race conditions (20,000+ concurrent operations tested)
1.38M+ ops/sec sustained, 1.5M peak with memory-efficient object pooling
~80-85% SQL-99 compliance (window functions, CTEs, set operations, MERGE, etc.)
Multi-dialect support: PostgreSQL, MySQL, MariaDB, SQL Server, Oracle, SQLite, Snowflake, ClickHouse (8 dialects)

Architecture

Core Components

Tokenizer (pkg/sql/tokenizer/): Zero-copy SQL lexer with full UTF-8 support
Parser (pkg/sql/parser/): Recursive descent parser with one-token lookahead
AST (pkg/sql/ast/): Abstract Syntax Tree nodes with visitor pattern support
Keywords (pkg/sql/keywords/): Multi-dialect SQL keyword definitions
Models (pkg/models/): Core data structures (tokens, spans, locations)
Errors (pkg/errors/): Structured error handling with position tracking
Metrics (pkg/metrics/): Production performance monitoring
Security (pkg/sql/security/): SQL injection detection with severity classification
Linter (pkg/linter/): SQL linting engine with 30 built-in rules (L001-L030)
LSP (pkg/lsp/): Language Server Protocol for IDE integration
GoSQLX (pkg/gosqlx/): High-level simple API (recommended for most users)
Compatibility (pkg/compatibility/): API stability testing

Token Processing Pipeline

Raw SQL bytes → tokenizer.Tokenize() → []models.TokenWithSpan
             → parser.ParseFromModelTokens() → *ast.AST
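
As a rough illustration of the first stage, here is a stdlib-only sketch of a lexer that emits tokens carrying spans. It is not GoSQLX's tokenizer: `Token`, `Span`, and `tokenize` are hypothetical stand-ins, and the splitting rule (ASCII whitespace only) is an assumption made to keep the example short. The point it shows is that each token's text is a sub-slice of the input rather than a copy, which is the idea behind "zero-copy".

```go
package main

import "fmt"

// Span is a hypothetical half-open byte range [Start, End) into the input.
type Span struct{ Start, End int }

// Token is a hypothetical stand-in for models.TokenWithSpan.
type Token struct {
	Text []byte // sub-slice of the input; no bytes are copied
	Span Span
}

// tokenize splits on ASCII whitespace only. A real SQL lexer also handles
// quotes, operators, comments, and multi-byte UTF-8.
func tokenize(src []byte) []Token {
	var toks []Token
	start := -1 // -1 means "not currently inside a token"
	for i := 0; i <= len(src); i++ {
		atEnd := i == len(src)
		isSpace := !atEnd && (src[i] == ' ' || src[i] == '\t' || src[i] == '\n' || src[i] == '\r')
		if atEnd || isSpace {
			if start >= 0 {
				toks = append(toks, Token{Text: src[start:i], Span: Span{start, i}})
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i
		}
	}
	return toks
}

func main() {
	for _, t := range tokenize([]byte("SELECT id FROM users")) {
		fmt.Printf("%-6s [%d,%d)\n", t.Text, t.Span.Start, t.Span.End)
	}
}
```

Because `Text` aliases the input buffer, the caller must keep that buffer unmodified for as long as the tokens are in use.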

Object Pooling (Critical for Performance)

The codebase uses sync.Pool extensively for all major data structures:

ast.NewAST() / ast.ReleaseAST() - AST container
tokenizer.GetTokenizer() / tokenizer.PutTokenizer() - Tokenizer instances
Individual pools for SELECT, INSERT, UPDATE, DELETE statements
Expression pools for identifiers, binary expressions, literals
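
The Get/Put pairing above follows the usual sync.Pool pattern. The sketch below shows that pattern in isolation using only the standard library. The `node` type and the `getNode`/`putNode` helpers are hypothetical and are not the GoSQLX pool functions; the assumption illustrated is that a pooled object is reset before it goes back into the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical pooled structure standing in for an AST container.
type node struct {
	children []string
}

var nodePool = sync.Pool{
	New: func() any { return &node{children: make([]string, 0, 8)} },
}

// getNode hands out a node from the pool, or a fresh one if the pool is empty.
func getNode() *node { return nodePool.Get().(*node) }

// putNode resets the node before returning it, so the next caller never
// sees stale data. The backing array is kept to avoid reallocating.
func putNode(n *node) {
	n.children = n.children[:0]
	nodePool.Put(n)
}

// countParts borrows a node, uses it, and returns it via defer so the
// return happens on every exit path.
func countParts(parts ...string) int {
	n := getNode()
	defer putNode(n)
	n.children = append(n.children, parts...)
	return len(n.children)
}

func main() {
	fmt.Println(countParts("SELECT", "id", "FROM", "users"))
}
```

Resetting in the put helper rather than in the get helper keeps the invariant in one place: anything sitting in the pool is already clean.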

Module Dependencies

Clean hierarchy with minimal coupling (verified against production imports):

# Core parsing chain
models     → (no deps)
errors     → models
metrics    → (no deps)
keywords   → (no deps)
token      → (no deps)
tokenizer  → models, errors, metrics, keywords
ast        → models, metrics
parser     → models, errors, keywords, token, tokenizer, ast

# Higher-level / product packages
formatter  → models, sql/ast, sql/parser, sql/tokenizer
transform  → formatter, sql/ast, sql/keywords, sql/parser, sql/tokenizer
fingerprint→ formatter, sql/ast, sql/parser, sql/tokenizer
security   → sql/ast            (scanner; tests also pull parser, tokenizer)
linter     → sql/parser, sql/tokenizer
           # rule sub-packages additionally import: linter, models, sql/ast
lsp        → errors, models, gosqlx, sql/keywords, sql/parser, sql/tokenizer
cbinding   → gosqlx, sql/ast    (requires CGO; excluded from task test:race)

# High-level wrapper
gosqlx     → all of the above (top-level convenience API)

Notes:

pkg/cbinding requires CGO_ENABLED=1. The Taskfile splits this out: task test:race runs everything except cbinding, and task test:cbinding runs cbinding with CGO on. CI workflows must follow the same split or cbinding is silently skipped.
keywords has no intra-module deps — it's a pure keyword table.
ast depends on models (spans, locations) and metrics (pool instrumentation), NOT on token in production code.

Development Commands

This project uses Task as the task runner:

go install github.com/go-task/task/v3/cmd/task@latest
# Or: brew install go-task (macOS)

Essential Commands

task                    # Show all available tasks
task build              # Build all packages
task build:cli          # Build CLI binary
task install            # Install CLI globally
task test               # Run all tests
task test:race          # Run tests with race detection (CRITICAL)
task test:pkg PKG=./pkg/sql/parser  # Test specific package
task bench              # Run benchmarks with memory tracking
task coverage           # Generate coverage report
task quality            # Run fmt, vet, lint
task check              # Full suite: format, vet, lint, test:race
task ci                 # Full CI pipeline

Running a Single Test

go test -v -run TestSpecificName ./pkg/sql/parser/
go test -v -run "TestParser_Window.*" ./pkg/sql/parser/
go test -v -run "TestParser_TupleIn/Basic" ./pkg/sql/parser/  # Run specific subtest

CLI Tool

./gosqlx validate "SELECT * FROM users"
./gosqlx format -i query.sql
./gosqlx analyze "SELECT COUNT(*) FROM orders GROUP BY status"
./gosqlx parse -f json query.sql
./gosqlx lsp                    # Start LSP server
./gosqlx lint query.sql         # Run linter

Key Implementation Patterns

Memory Management (MANDATORY)

Always use defer with pool return functions:

// High-level API (recommended for most use cases)
ast, err := gosqlx.Parse("SELECT * FROM users")
// No cleanup needed - handled automatically

// Low-level API (for fine-grained control)
tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)  // MANDATORY

astObj := ast.NewAST()
defer ast.ReleaseAST(astObj)       // MANDATORY

Parser Architecture

Recursive descent with one-token lookahead
Main file: pkg/sql/parser/parser.go
Window functions: parseFunctionCall(), parseWindowSpec(), parseWindowFrame()
CTEs: WITH clause with RECURSIVE support
Set operations: UNION/EXCEPT/INTERSECT with left-associative parsing
JOINs: All types with proper left-associative tree logic
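
To show what "recursive descent with one-token lookahead" and "left-associative" mean in practice, here is a toy parser for set operations. Everything in it is hypothetical (`parser`, `parseSetExpr`, `parseOperand`), bare names stand in for full SELECT statements, and all three operators are treated as having equal precedence, which is a simplifying assumption and may not match how the real parser handles INTERSECT.

```go
package main

import (
	"fmt"
	"strings"
)

// parser holds the token stream and a cursor; peek is the single token of lookahead.
type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return "" // end of input
}

func (p *parser) next() string {
	t := p.peek()
	if p.pos < len(p.toks) {
		p.pos++
	}
	return t
}

func isSetOp(t string) bool {
	return t == "UNION" || t == "EXCEPT" || t == "INTERSECT"
}

// parseOperand accepts a bare name standing in for a full SELECT.
func (p *parser) parseOperand() (string, error) {
	t := p.next()
	if t == "" || isSetOp(t) {
		return "", fmt.Errorf("expected operand, got %q", t)
	}
	return t, nil
}

// parseSetExpr parses: operand { setOp operand }.
// Folding each new operand into the existing left side yields a left-associative tree.
func (p *parser) parseSetExpr() (string, error) {
	left, err := p.parseOperand()
	if err != nil {
		return "", err
	}
	for isSetOp(p.peek()) {
		op := p.next()
		right, err := p.parseOperand()
		if err != nil {
			return "", err
		}
		left = "(" + left + " " + op + " " + right + ")"
	}
	return left, nil
}

// parse renders the tree as a parenthesized string so the grouping is visible.
func parse(input string) (string, error) {
	p := &parser{toks: strings.Fields(input)}
	tree, err := p.parseSetExpr()
	if err != nil {
		return "", err
	}
	if p.peek() != "" {
		return "", fmt.Errorf("unexpected token %q", p.peek())
	}
	return tree, nil
}

func main() {
	tree, err := parse("a UNION b EXCEPT c")
	fmt.Println(tree, err) // prints "((a UNION b) EXCEPT c) <nil>"
}
```

The loop never looks further ahead than `peek()`, and the decision to continue depends only on that one token.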

Error Handling

Always check errors from tokenizer and parser
Errors include position information (models.Location)
Error codes E1001-E3004 cover tokenizer, parser, and semantic errors
Use pkg/errors/ for structured error creation

Safe Type Assertions in Tests

Always use the two-value form for type assertions to avoid panics:

stmt, ok := tree.Statements[0].(*ast.SelectStatement)
if !ok {
    t.Fatalf("expected SelectStatement, got %T", tree.Statements[0])
}

Testing Requirements

Race Detection is Mandatory

task test:race                           # Primary method
go test -race -timeout 60s ./...         # Direct command
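
The race detector only reports races on code that actually runs concurrently during the test, so shared state needs to be exercised from several goroutines. Below is a minimal, self-contained sketch of that kind of exercise. The `stats` type and `runConcurrently` helper are hypothetical and unrelated to the GoSQLX test suite.

```go
package main

import (
	"fmt"
	"sync"
)

// stats is a hypothetical shared structure; the mutex is what keeps it race-free.
type stats struct {
	mu     sync.Mutex
	parsed int
}

func (s *stats) record() {
	s.mu.Lock()
	s.parsed++
	s.mu.Unlock()
}

// runConcurrently calls record from `workers` goroutines, `perWorker` times each,
// and returns the final count after all goroutines have finished.
func runConcurrently(workers, perWorker int) int {
	var s stats
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.record()
			}
		}()
	}
	wg.Wait()
	return s.parsed
}

func main() {
	fmt.Println(runConcurrently(8, 1000)) // prints 8000
}
```

Running this with `go run -race` should be silent. Removing the Lock/Unlock pair in `record` is the kind of change the detector is expected to flag.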

Coverage by Package

pkg/models/: 100% - All core data structures
pkg/sql/ast/: 73.4% - AST nodes
pkg/sql/tokenizer/: 76.1% - Zero-copy operations
pkg/sql/parser/: 76.1% - All SQL features
pkg/errors/: 95.6% - Error handling

Benchmarking

task bench                                                    # All benchmarks
go test -bench=BenchmarkName -benchmem ./pkg/sql/parser/     # Specific benchmark
go test -bench=. -benchmem -cpuprofile=cpu.prof ./pkg/...    # With profiling

Performance Regression Tests

Baselines are defined in performance_baselines.json at the project root
CI environment variability may require baseline adjustments (tolerance %)
Run locally: go test -run TestPerformanceRegression ./pkg/sql/parser/
Automatically skipped under the race detector, which adds 3-5x overhead
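
A baseline-with-tolerance check can be sketched as follows. This is an assumption-heavy illustration: the JSON field names (`ns_per_op`, `tolerance_pct`), the `baseline` type, and the helper functions are hypothetical, and the real performance_baselines.json may use a different schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// baseline is a hypothetical schema; the real performance_baselines.json may differ.
type baseline struct {
	NsPerOp      float64 `json:"ns_per_op"`
	TolerancePct float64 `json:"tolerance_pct"`
}

// loadBaselines decodes a name-to-baseline map from JSON.
func loadBaselines(data []byte) (map[string]baseline, error) {
	var m map[string]baseline
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// checkRegression returns an error when the measurement exceeds the baseline
// by more than the allowed tolerance percentage.
func checkRegression(name string, b baseline, measuredNs float64) error {
	limit := b.NsPerOp * (1 + b.TolerancePct/100)
	if measuredNs > limit {
		return fmt.Errorf("%s: %.0f ns/op exceeds limit of %.0f ns/op", name, measuredNs, limit)
	}
	return nil
}

func main() {
	raw := []byte(`{"SimpleSelect": {"ns_per_op": 1000, "tolerance_pct": 20}}`)
	baselines, err := loadBaselines(raw)
	if err != nil {
		panic(err)
	}
	b := baselines["SimpleSelect"]
	fmt.Println(checkRegression("SimpleSelect", b, 1150)) // within tolerance
	fmt.Println(checkRegression("SimpleSelect", b, 1300)) // over the limit
}
```

Widening the tolerance in the baseline file, rather than editing the test, keeps the CI-specific slack visible in one place.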

Common Workflows

Adding a New SQL Feature

Update tokens in pkg/models/token.go (if needed)
Add keywords to pkg/sql/keywords/ (if needed)
Extend AST nodes in pkg/sql/ast/
Add parsing logic in pkg/sql/parser/parser.go
Write comprehensive tests
Run: task test:race && task bench
Update CHANGELOG.md

Debugging Parsing Issues

go test -v -run TestTokenizer_YourTest ./pkg/sql/tokenizer/
go test -v -run TestParser_YourTest ./pkg/sql/parser/

Use the visitor pattern in pkg/sql/ast/visitor.go to traverse and inspect the AST.
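
For readers unfamiliar with the pattern, here is a self-contained sketch of a visitor-based walk. The `Node`, `Visitor`, and `Walk` definitions are hypothetical and modeled on the shape of the standard library's go/ast package; the interfaces in pkg/sql/ast/visitor.go may differ, so check that file before relying on any signature here.

```go
package main

import "fmt"

// Node and Visitor are hypothetical; they mirror the shape of go/ast's
// Visitor, not necessarily the interfaces in pkg/sql/ast/visitor.go.
type Node interface {
	Children() []Node
}

type Visitor interface {
	// Visit returns the visitor to use for n's children, or nil to skip them.
	Visit(n Node) Visitor
}

// Walk traverses the tree in pre-order.
func Walk(v Visitor, n Node) {
	if n == nil {
		return
	}
	if v = v.Visit(n); v == nil {
		return
	}
	for _, c := range n.Children() {
		Walk(v, c)
	}
}

// Select and Ident are toy node types for the example.
type Select struct {
	Columns []Node
	From    Node
}

type Ident struct{ Name string }

func (s *Select) Children() []Node {
	return append(append([]Node{}, s.Columns...), s.From)
}

func (i *Ident) Children() []Node { return nil }

// identCollector records every identifier name it sees.
type identCollector struct{ names []string }

func (c *identCollector) Visit(n Node) Visitor {
	if id, ok := n.(*Ident); ok {
		c.names = append(c.names, id.Name)
	}
	return c
}

func collectIdents(root Node) []string {
	c := &identCollector{}
	Walk(c, root)
	return c.names
}

func main() {
	tree := &Select{
		Columns: []Node{&Ident{Name: "id"}, &Ident{Name: "name"}},
		From:    &Ident{Name: "users"},
	}
	fmt.Println(collectIdents(tree)) // prints "[id name users]"
}
```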

Release Workflow

CRITICAL: Main branch is protected. Never create tags in feature branches.

# 1. Develop in feature branch
git checkout -b feature/branch-name
# ... make changes, update CHANGELOG.md as [Unreleased] ...
git push origin feature/branch-name

# 2. Create PR and get it merged

# 3. After merge, create docs PR for release finalization
git checkout main && git pull
git checkout -b docs/vX.Y.Z-release
# Update CHANGELOG.md with version and date
git push origin docs/vX.Y.Z-release

# 4. After docs PR merged, create tag
git checkout main && git pull
git tag vX.Y.Z -a -m "vX.Y.Z: Release notes"
git push origin vX.Y.Z

# 5. Create GitHub release
gh release create vX.Y.Z --title "vX.Y.Z: Title" --notes "..."

Pre-commit Hooks

The repository has pre-commit hooks that run:

go fmt - Code formatting
go vet - Static analysis
go test -short - Short test suite

Install with: task hooks:install

Additional Documentation

docs/GETTING_STARTED.md - Quick start guide
docs/USAGE_GUIDE.md - Comprehensive usage patterns
docs/LSP_GUIDE.md - LSP server and IDE integration
docs/LINTING_RULES.md - All 30 linting rules reference
docs/SQL_COMPATIBILITY.md - SQL dialect compatibility matrix
docs/ARCHITECTURE.md - Detailed system design
https://gosqlx.dev - Official website with interactive playground
https://gosqlx.dev/playground/ - WASM-powered SQL playground
