Felderize — Spark SQL to Feldera SQL Translator

felderize attempts to translate Spark SQL schemas and queries into valid Feldera SQL using LLM-based translation with optional compiler validation.

Setup

cd python/felderize
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Note: pip install -e . is required before running felderize. It registers the package and CLI command.

Create a .env file with your API key:

echo 'ANTHROPIC_API_KEY=your-key-here' > .env

Usage

Run a built-in example

# List available examples
felderize example

# Translate an example (validates by default)
felderize example simple

# Without compiler validation
felderize example simple --no-validate

# Log SQL submitted to the validator at each attempt
felderize example json --verbose

# Use a specific compiler binary
felderize example simple --compiler /path/to/sql-to-dbsp

# Output as JSON
felderize example simple --json-output

Available examples:

| Name | Description |
|------|-------------|
| `simple` | Date truncation, GROUP BY |
| `strings` | INITCAP, LPAD, NVL, CONCAT_WS |
| `arrays` | array_contains, size, element_at |
| `joins` | Null-safe equality (`<=>`) |
| `windows` | LAG, running SUM OVER |
| `aggregations` | COUNT DISTINCT, HAVING (includes unsupported: COLLECT_LIST, PERCENTILE_APPROX) |
| `json` | get_json_object → PARSE_JSON + VARIANT access (combined file) |
| `topk` | ROW_NUMBER TopK, QUALIFY, DATEDIFF → TIMESTAMPDIFF (combined file) |

The JSON output contains:

{
  "feldera_schema": "...",   // translated DDL (CREATE TABLE statements)
  "feldera_query": "...",    // translated query (CREATE VIEW statements)
  "unsupported": [...],      // unsupported Spark features found
  "warnings": [...],         // non-fatal issues
  "explanations": [...],     // explanations for translation decisions
  "status": "success|unsupported|error"
}
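
A downstream script can branch on the `status` field and surface the `unsupported` and `warnings` entries. The sketch below is one way to do that in Python. It assumes the real output is plain JSON (the `//` comments above are annotations only) and that list entries can be rendered as strings; `summarize` is a hypothetical helper, not part of felderize.

```python
import json


def summarize(raw: str) -> str:
    """Build a short report from a felderize JSON result (hypothetical helper)."""
    result = json.loads(raw)
    lines = [f"status: {result.get('status')}"]
    # Both lists are optional here so a partial result still produces a report.
    for item in result.get("unsupported", []):
        lines.append(f"unsupported: {item}")
    for item in result.get("warnings", []):
        lines.append(f"warning: {item}")
    return "\n".join(lines)


# Illustrative input only; the field values are made up for this example.
sample = json.dumps({
    "feldera_schema": "CREATE TABLE t (x INT);",
    "feldera_query": "CREATE VIEW v AS SELECT x FROM t;",
    "unsupported": [],
    "warnings": [],
    "explanations": [],
    "status": "success",
})
print(summarize(sample))
```

In practice the `raw` string would come from the output of a `--json-output` run, for example captured from the command's standard output.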

Translate your own SQL

Two input formats are supported:

Separate schema and query files:

felderize translate path/to/schema.sql path/to/query.sql
felderize translate path/to/schema.sql path/to/query.sql --validate

Single combined file (CREATE TABLE and CREATE VIEW statements in one file):

felderize translate-file path/to/combined.sql
felderize translate-file path/to/combined.sql --validate

Note: Running without --validate prints a warning, because the output SQL has not been verified against the Feldera compiler.

Both commands accept:

- `--verbose`: log the SQL submitted to the validator at each repair attempt
- `--compiler PATH`: path to the Feldera compiler binary (overrides the `FELDERA_COMPILER` env var)

Batch translation

felderize batch path/to/data_dir/ --output-dir results/

Each subdirectory should contain *_schema.sql and *_query.sql files.
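
Before starting a long batch run, it can help to check that the directory layout matches what the command expects. The following Python sketch lists the schema/query pairs it finds. It is not felderize's own discovery logic: `find_pairs` is a hypothetical helper, and taking the first match when a subdirectory holds several candidate files is an assumption.

```python
from pathlib import Path


def find_pairs(data_dir):
    """Return (schema_path, query_path) for each subdirectory that has both files."""
    pairs = []
    subdirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for sub in subdirs:
        schemas = sorted(sub.glob("*_schema.sql"))
        queries = sorted(sub.glob("*_query.sql"))
        # Assumption: one pair per subdirectory; extra files are ignored.
        if schemas and queries:
            pairs.append((schemas[0], queries[0]))
    return pairs
```

Calling `find_pairs("path/to/data_dir")` and printing the result shows which subdirectories would be skipped because one of the two files is missing.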

Configuration

Environment variables (set in .env):

| Variable | Description | Default |
|----------|-------------|---------|
| `ANTHROPIC_API_KEY` | Anthropic API key | (required) |
| `FELDERIZE_LLM_PROVIDER` | `anthropic` or `openai` | `anthropic` |
| `FELDERIZE_MODEL` | LLM model to use | `claude-sonnet-4-20250514` |
| `OPENAI_API_KEY` | OpenAI API key (if using openai provider) | (none) |
| `FELDERA_COMPILER` | Path to sql-to-dbsp compiler (can also be set with `--compiler`) | `<repo-root>/sql-to-dbsp-compiler/SQL-compiler/sql-to-dbsp` |
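
The precedence described above (a `--compiler` flag overriding `FELDERA_COMPILER`, and defaults applying when a variable is unset) can be sketched in Python as follows. This is an illustration, not felderize's actual configuration code: `resolve_config` is a hypothetical function, and choosing the API key variable based on the provider is an assumption.

```python
import os

# Defaults copied from the table above.
DEFAULT_PROVIDER = "anthropic"
DEFAULT_MODEL = "claude-sonnet-4-20250514"


def resolve_config(env=None, compiler_flag=None):
    """Resolve settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    provider = env.get("FELDERIZE_LLM_PROVIDER", DEFAULT_PROVIDER)
    if provider not in ("anthropic", "openai"):
        raise ValueError(f"unknown provider: {provider}")
    # Assumption: each provider reads its own key variable.
    key_var = "OPENAI_API_KEY" if provider == "openai" else "ANTHROPIC_API_KEY"
    if not env.get(key_var):
        raise ValueError(f"{key_var} is not set")
    return {
        "provider": provider,
        "model": env.get("FELDERIZE_MODEL", DEFAULT_MODEL),
        "api_key": env[key_var],
        # The flag wins over the variable; None means "use the tool's default path".
        "compiler": compiler_flag or env.get("FELDERA_COMPILER"),
    }
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.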

How it works

1. Loads translation rules from a single skill file (spark/data/skills/spark_skills.md)
2. Sends Spark SQL to the LLM with rules, validated examples, and relevant Feldera SQL documentation (from docs.feldera.com/docs/sql/)
3. Parses the translated Feldera SQL from the LLM response
4. Optionally validates output against the Feldera compiler, retrying with error feedback if needed
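
The validate-and-retry step can be pictured as a small loop in which each compiler error is fed back into the next translation attempt. The Python sketch below shows that pattern with stand-in callables. It is not felderize's implementation: `translate_with_repair`, the `max_attempts` limit, and both stub functions (including the made-up function names in the SQL) are hypothetical.

```python
from typing import Callable, Optional, Tuple


def translate_with_repair(
    spark_sql: str,
    translate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[str, bool]:
    """Translate, validate, and retry with the last error as feedback."""
    feedback = None
    candidate = ""
    for _ in range(max_attempts):
        candidate = translate(spark_sql, feedback)
        feedback = validate(candidate)  # None means the compiler accepted it
        if feedback is None:
            return candidate, True
    return candidate, False


# Stand-ins for the LLM call and the compiler, for illustration only.
def fake_translate(sql: str, error: Optional[str]) -> str:
    # Only "repairs" the SQL once it has seen an error message.
    return sql.replace("SPARK_ONLY_FN", "FELDERA_FN") if error else sql


def fake_validate(sql: str) -> Optional[str]:
    return "unknown function SPARK_ONLY_FN" if "SPARK_ONLY_FN" in sql else None


result, ok = translate_with_repair(
    "SELECT SPARK_ONLY_FN(x) FROM t", fake_translate, fake_validate
)
print(ok, result)
```

Bounding the number of attempts keeps a translation that never validates from looping forever; the last candidate is still returned so it can be inspected.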

Support

Contact us at support@feldera.com for assistance with unsupported Spark SQL features.
