felderize attempts to translate Spark SQL schemas and queries into valid Feldera SQL using LLM-based translation with optional compiler validation.
```bash
cd python/felderize
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

Note: `pip install -e .` is required before running `felderize`; it registers the package and the CLI command.
Create a `.env` file with your API key:

```bash
echo 'ANTHROPIC_API_KEY=your-key-here' > .env
```

```bash
# List available examples
felderize example

# Translate an example (validates by default)
felderize example simple

# Without compiler validation
felderize example simple --no-validate

# Log SQL submitted to the validator at each attempt
felderize example json --verbose

# Use a specific compiler binary
felderize example simple --compiler /path/to/sql-to-dbsp

# Output as JSON
felderize example simple --json-output
```

Available examples:
| Name | Description |
|---|---|
| `simple` | Date truncation, GROUP BY |
| `strings` | INITCAP, LPAD, NVL, CONCAT_WS |
| `arrays` | array_contains, size, element_at |
| `joins` | Null-safe equality (`<=>`) |
| `windows` | LAG, running SUM OVER |
| `aggregations` | COUNT DISTINCT, HAVING (includes unsupported: COLLECT_LIST, PERCENTILE_APPROX) |
| `json` | get_json_object → PARSE_JSON + VARIANT access (combined file) |
| `topk` | ROW_NUMBER TopK, QUALIFY, DATEDIFF → TIMESTAMPDIFF (combined file) |
The JSON output contains:

```
{
  "feldera_schema": "...",   // translated DDL (CREATE TABLE statements)
  "feldera_query": "...",    // translated query (CREATE VIEW statements)
  "unsupported": [...],      // unsupported Spark features found
  "warnings": [...],         // non-fatal issues
  "explanations": [...],     // explanations for translation decisions
  "status": "success|unsupported|error"
}
```

Two input formats are supported:
Separate schema and query files:

```bash
felderize translate path/to/schema.sql path/to/query.sql
felderize translate path/to/schema.sql path/to/query.sql --validate
```

Single combined file (CREATE TABLE and CREATE VIEW statements in one file):

```bash
felderize translate-file path/to/combined.sql
felderize translate-file path/to/combined.sql --validate
```

Note: running without `--validate` prints a warning; the output SQL has not been verified against the Feldera compiler.
Both commands accept:

- `--verbose` to log the SQL submitted to the validator at each repair attempt
- `--compiler PATH` to specify the path to the Feldera compiler binary (overrides the `FELDERA_COMPILER` env var)
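For scripting around the CLI, the JSON result documented above can be inspected programmatically. A minimal sketch, assuming only the field names shown in the output structure (the sample result string here is illustrative, not real felderize output):

```python
import json

# Illustrative result string, shaped like the documented --json-output fields.
result_json = """
{
  "feldera_schema": "CREATE TABLE t (x INT);",
  "feldera_query": "CREATE VIEW v AS SELECT x FROM t;",
  "unsupported": ["PERCENTILE_APPROX"],
  "warnings": [],
  "explanations": ["COUNT DISTINCT maps directly."],
  "status": "unsupported"
}
"""

result = json.loads(result_json)

# Only treat "success" as a clean translation; surface unsupported features otherwise.
if result["status"] == "success":
    feldera_sql = result["feldera_schema"] + "\n" + result["feldera_query"]
elif result["status"] == "unsupported":
    print("Unsupported Spark features:", ", ".join(result["unsupported"]))
else:
    print("Translation failed")
```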
```bash
felderize batch path/to/data_dir/ --output-dir results/
```

Each subdirectory should contain `*_schema.sql` and `*_query.sql` files.
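The expected batch layout can be sketched with `pathlib`. The directory and job names here are made up for illustration; only the one-subdirectory-per-job structure with `*_schema.sql` / `*_query.sql` pairs comes from the docs:

```python
from pathlib import Path

root = Path("data_dir")  # illustrative input directory for `felderize batch`

# One subdirectory per translation job, each holding a schema/query pair.
for job in ["orders", "clicks"]:
    d = root / job
    d.mkdir(parents=True, exist_ok=True)
    (d / f"{job}_schema.sql").write_text("-- CREATE TABLE statements here\n")
    (d / f"{job}_query.sql").write_text("-- Spark SQL query here\n")
```

Running `felderize batch data_dir/ --output-dir results/` would then process both jobs.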
Environment variables (set in `.env`):

| Variable | Description | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key | (required) |
| `FELDERIZE_LLM_PROVIDER` | `anthropic` or `openai` | `anthropic` |
| `FELDERIZE_MODEL` | LLM model to use | `claude-sonnet-4-20250514` |
| `OPENAI_API_KEY` | OpenAI API key (if using the openai provider) | — |
| `FELDERA_COMPILER` | Path to the sql-to-dbsp compiler (can also be set with `--compiler`) | `<repo-root>/sql-to-dbsp-compiler/SQL-compiler/sql-to-dbsp` |
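The precedence implied by the table (CLI flag, then environment variable, then default) can be sketched as follows. These resolver functions are illustrative only, not part of felderize's actual code:

```python
import os
from typing import Optional

def resolve_compiler(cli_compiler: Optional[str] = None) -> str:
    """Pick the compiler path: --compiler flag, then FELDERA_COMPILER, then the repo default."""
    default = "<repo-root>/sql-to-dbsp-compiler/SQL-compiler/sql-to-dbsp"
    return cli_compiler or os.environ.get("FELDERA_COMPILER") or default

def resolve_provider() -> str:
    """The LLM provider defaults to anthropic unless FELDERIZE_LLM_PROVIDER says otherwise."""
    return os.environ.get("FELDERIZE_LLM_PROVIDER", "anthropic")
```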
- Loads translation rules from a single skill file (`spark/data/skills/spark_skills.md`)
- Sends Spark SQL to the LLM with rules, validated examples, and relevant Feldera SQL documentation (from `docs.feldera.com/docs/sql/`)
- Parses the translated Feldera SQL from the LLM response
- Optionally validates output against the Feldera compiler, retrying with error feedback if needed
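The validate-and-retry step above can be sketched as a loop. The stub callables stand in for the LLM call and the compiler check; none of these names are felderize's actual API:

```python
def translate_with_validation(spark_sql, llm_translate, compile_check, max_attempts=3):
    """Ask the LLM for Feldera SQL, then feed compiler errors back until the output compiles.

    llm_translate(spark_sql, feedback) -> str: hypothetical LLM call; feedback is
        None on the first attempt, else the previous compiler error.
    compile_check(feldera_sql) -> (bool, str): hypothetical compiler wrapper.
    """
    feedback = None
    for _ in range(max_attempts):
        feldera_sql = llm_translate(spark_sql, feedback)
        ok, error = compile_check(feldera_sql)
        if ok:
            return feldera_sql
        feedback = error  # retry with the compiler error as extra context
    raise RuntimeError(f"translation did not validate after {max_attempts} attempts")
```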
Contact us at support@feldera.com for assistance with unsupported Spark SQL features.