53 changes: 53 additions & 0 deletions .github/workflows/ci-post-release.yml
@@ -111,6 +111,55 @@ jobs:
working-directory: ./python/dbt-feldera
run: uv cache prune --ci

# Ideally this would just invoke `publish-python.yml`
#
# But not yet supported:
# https://docs.pypi.org/trusted-publishers/troubleshooting/#reusable-workflows-on-github
# https://github.com/pypa/gh-action-pypi-publish/issues/166
# https://github.com/pypi/warehouse/issues/11096
#
# When this is solved, do this again:
# - name: ""
# uses: ./.github/workflows/publish-python.yml
# secrets: inherit
publish-felderize:
runs-on: ubuntu-latest-amd64
environment:
name: release
url: https://pypi.org/p/felderize
permissions:
contents: read
id-token: write
defaults:
run:
shell: bash
working-directory: ./python
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6

Review comment (Member): Why pin the action to a commit hash instead of a version tag?

- name: Install uv
uses: astral-sh/setup-uv@6dfebec6ddbcd197e02256fbdf54deb334fb7f06 # v2
with:
version: "0.11.3"
enable-cache: true
- name: "Set up Python"
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: "3.10"
- name: Install and build felderize
working-directory: ./python/felderize
run: |
uv venv
uv pip install -e .

Review comment (Member): Do we need to run `pip install -e .` explicitly?

uv build
- name: Publish felderize
if: ${{ vars.RELEASE_DRY_RUN == 'false' }}
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e
with:
packages-dir: ./python/felderize/dist
- name: Minimize uv cache
working-directory: ./python/felderize
run: uv cache prune --ci

publish-crates:
name: ""
uses: ./.github/workflows/publish-crates.yml
@@ -162,6 +211,10 @@ jobs:
run: |
sed -i "s/version = \"${{ env.CURRENT_VERSION }}\"/version = \"${{ env.NEXT_VERSION }}\"/g" pyproject.toml
sed -i "s/version: '${{ env.CURRENT_VERSION }}'/version: '${{ env.NEXT_VERSION }}'/g" dbt/include/feldera/dbt_project.yml
- name: Adjust felderize version
working-directory: ./python/felderize
run: |
sed -i "s/version = \"${{ env.CURRENT_VERSION }}\"/version = \"${{ env.NEXT_VERSION }}\"/g" pyproject.toml
- name: Adjust sql compiler version
working-directory: ./sql-to-dbsp-compiler/SQL-compiler
run: |
35 changes: 35 additions & 0 deletions .github/workflows/publish-python.yml
@@ -91,3 +91,38 @@ jobs:
- name: Minimize uv cache
working-directory: ./python/dbt-feldera
run: uv cache prune --ci

deploy-felderize:
runs-on: ubuntu-latest-amd64
environment:
name: release
url: https://pypi.org/p/felderize

steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
ref: ${{ inputs.tag || github.ref }}
- name: Install uv
uses: astral-sh/setup-uv@6dfebec6ddbcd197e02256fbdf54deb334fb7f06 # v2
with:
version: "0.11.3"
enable-cache: true
- name: "Set up Python"
uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: "3.10"
- name: Install and build felderize
working-directory: ./python/felderize
run: |
uv venv
uv pip install -e .

Review comment (Member): Do we need this `pip install`? `uv build` should do it anyway, right?

uv build
- name: Publish felderize
if: ${{ vars.RELEASE_DRY_RUN == 'false' }}
uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e
with:
packages-dir: ./python/felderize/dist

- name: Minimize uv cache
working-directory: ./python/felderize
run: uv cache prune --ci
22 changes: 18 additions & 4 deletions python/felderize/README.md
@@ -15,13 +15,26 @@ pip install -e .

> **Note:** `pip install -e .` is required before running `felderize`. It registers the package and CLI command.

**Download the Feldera SQL compiler JAR** (requires Java 19–21 installed):
**The Feldera SQL compiler JAR** (used only for `--validate`; requires Java 19–21 installed):

felderize downloads it for you. The first time you run a command with `--validate`
and no compiler configured, felderize fetches the latest
`sql2dbsp-jar-with-dependencies-*.jar` from
[GitHub Releases](https://github.com/feldera/feldera/releases) into `~/.felderize/`
and reuses it on later runs. To opt out (e.g. in CI or offline), set
`FELDERIZE_AUTO_DOWNLOAD=0`; validation is then skipped unless you point
`FELDERA_COMPILER` / `--compiler` at a JAR.

To fetch or update it explicitly:

```bash
felderize download-compiler
```

This fetches the latest `sql2dbsp-jar-with-dependencies-*.jar` from [GitHub Releases](https://github.com/feldera/feldera/releases) and saves it to `~/.felderize/`. The command prints the exact path — copy it for the next step. Re-run it any time to pick up a newer release; it reports whether you are already on the latest one.
This saves the latest JAR to `~/.felderize/` and prints its path. Re-run it any
time to pick up a newer release; it reports whether you are already on the latest
one. felderize automatically uses the newest JAR cached in `~/.felderize/`, so you
do not need to set `FELDERA_COMPILER` unless you want a specific JAR.

> **Requirement:** felderize needs compiler **v0.304.0 or newer** — earlier releases lack SQL features felderize relies on (e.g. `div_null`, `MAKE_DATE`). `download-compiler` always fetches the latest release, and felderize warns at validation time if the configured compiler is older than v0.304.0.

@@ -35,7 +48,7 @@ FELDERA_COMPILER=~/.felderize/sql2dbsp-jar-with-dependencies-vX.Y.Z.jar
FELDERIZE_MODEL=claude-sonnet-4-6
```

All three variables are required. `FELDERA_COMPILER` is used only for validation — translation still works without it, but output SQL is not verified. You can also pass `--compiler PATH` and `--model MODEL` per command.
`ANTHROPIC_API_KEY` and `FELDERIZE_MODEL` are required. `FELDERA_COMPILER` is optional: it is used only for validation, and when unset felderize auto-downloads (and caches) the compiler on first `--validate`. Set it to pin a specific JAR. You can also pass `--compiler PATH` and `--model MODEL` per command.

> **Note:** felderize currently requires an Anthropic API key — only Claude models are supported.

@@ -202,7 +215,8 @@ Environment variables (set in `.env`):
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key | (required) |
| `FELDERIZE_MODEL` | LLM model to use (can also be set with `--model`) | (required, set in `.env`) |
| `FELDERA_COMPILER` | Path to sql-to-dbsp compiler (can also be set with `--compiler`) | (required for validation) |
| `FELDERA_COMPILER` | Path to sql-to-dbsp compiler (can also be set with `--compiler`) | (optional; auto-downloaded when unset) |
| `FELDERIZE_AUTO_DOWNLOAD` | Auto-download the compiler on first `--validate` when none is configured. Set to `0`/`false` to disable. | `1` |
| `ANTHROPIC_BASE_URL` | Override Anthropic API base URL (for proxies or alternate endpoints) | (optional) |

## Customizing translation
58 changes: 30 additions & 28 deletions python/felderize/felderize/cli.py
@@ -9,6 +9,7 @@
from felderize.constants import MINIMUM_COMPILER_VERSION
from felderize.install_feldera_sql_compiler import (
download_compiler,
ensure_compiler,
is_supported_version,
jar_version,
)
@@ -37,6 +38,31 @@ def _warn_if_unsupported_compiler(compiler_path: str | None) -> None:
)


def _prepare_config(compiler: str | None, model: str | None, validate: bool) -> Config:
"""Build the run config from env, apply CLI overrides, and resolve a compiler.

An explicit ``--compiler`` (or ``FELDERA_COMPILER``) always wins. Otherwise,
when validating, felderize reuses a compiler JAR cached in ``~/.felderize/``
or downloads the latest release — unless auto-download is disabled via
``FELDERIZE_AUTO_DOWNLOAD=0``. Download progress goes to stderr so ``--json-output``
stays clean.
"""
config = Config.from_env()
if compiler:
config.feldera_compiler = compiler
if model:
config.model = model
if validate:
if not config.feldera_compiler:
resolved = ensure_compiler(
auto_download=config.auto_download_compiler, logs=sys.stderr
)
if resolved is not None:
config.feldera_compiler = str(resolved)
_warn_if_unsupported_compiler(config.feldera_compiler)
return config
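
The precedence described in the `_prepare_config` docstring can be sketched in isolation. This is a minimal illustration, not the PR's code: `resolve_compiler`, `find_cached`, and `download_latest` are hypothetical names, and the two callables stand in for the cache lookup and the release download so the ordering can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_compiler(
    explicit: Optional[str],
    validate: bool,
    auto_download: bool,
    find_cached: Callable[[], Optional[Path]],
    download_latest: Callable[[], Path],
) -> Optional[str]:
    """Hypothetical helper mirroring the documented resolution order."""
    if explicit:
        # An explicit --compiler / FELDERA_COMPILER value always wins.
        return explicit
    if not validate:
        # Without validation there is nothing to resolve.
        return None
    cached = find_cached()
    if cached is not None:
        # Reuse a JAR that is already cached.
        return str(cached)
    if not auto_download:
        # Auto-download disabled: the caller skips validation.
        return None
    return str(download_latest())
```

Passing the lookups in as callables is only a convenience for this sketch; the PR calls `ensure_compiler` directly.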


def _split_examples(paths: tuple[str, ...]) -> tuple[list[Path], list[Path]]:
"""Split --examples paths into (dirs, files)."""
dirs, files = [], []
@@ -192,13 +218,7 @@ def translate(
"Warning: running without validation — output SQL is not verified against the Feldera compiler.",
err=True,
)
config = Config.from_env()
if compiler:
config.feldera_compiler = compiler
if model:
config.model = model
if validate:
_warn_if_unsupported_compiler(config.feldera_compiler)
config = _prepare_config(compiler, model, validate)
schema_sql = _read_text(schema_file)
query_sql = _read_text(query_file)

@@ -277,13 +297,7 @@ def translate_file(
"Warning: running without validation — output SQL is not verified against the Feldera compiler.",
err=True,
)
config = Config.from_env()
if compiler:
config.feldera_compiler = compiler
if model:
config.model = model
if validate:
_warn_if_unsupported_compiler(config.feldera_compiler)
config = _prepare_config(compiler, model, validate)
combined_sql = _read_text(sql_file)
schema_sql, query_sql = split_combined_sql(combined_sql)

@@ -371,13 +385,7 @@ def translate_batch(
"Warning: running without validation — output SQL is not verified against the Feldera compiler.",
err=True,
)
config = Config.from_env()
if compiler:
config.feldera_compiler = compiler
if model:
config.model = model
if validate:
_warn_if_unsupported_compiler(config.feldera_compiler)
config = _prepare_config(compiler, model, validate)

schema_sql = _read_text(schema_file)
schema_errors = validate_schema(schema_sql)
@@ -525,13 +533,7 @@ def example(
"Warning: running without validation — output SQL is not verified against the Feldera compiler.",
err=True,
)
config = Config.from_env()
if compiler:
config.feldera_compiler = compiler
if model:
config.model = model
if validate:
_warn_if_unsupported_compiler(config.feldera_compiler)
config = _prepare_config(compiler, model, validate)
result = translate_spark_to_feldera(
schema_sql,
query_sql,
5 changes: 5 additions & 0 deletions python/felderize/felderize/config.py
@@ -19,6 +19,7 @@ class Config:
feldera_compiler: str = ""
max_tokens: int = DEFAULT_MAX_TOKENS
docs_base_url: str = DEFAULT_DOCS_BASE_URL
auto_download_compiler: bool = True

@property
def compiler_path(self) -> str | None:
@@ -32,6 +33,9 @@ def from_env(cls) -> Config:
load_dotenv(env_path)

raw_max_tokens = os.environ.get("FELDERIZE_MAX_TOKENS")
raw_auto_download = (
os.environ.get("FELDERIZE_AUTO_DOWNLOAD", "1").strip().lower()
)
return cls(
model=os.environ.get("FELDERIZE_MODEL", ""),
api_key=os.environ.get("ANTHROPIC_API_KEY"),
@@ -41,4 +45,5 @@ def from_env(cls) -> Config:
docs_base_url=os.environ.get(
"FELDERA_DOCS_BASE_URL", DEFAULT_DOCS_BASE_URL
),
auto_download_compiler=raw_auto_download not in ("0", "false", "no", "off"),

Review comment (Member): Do we need to support all of these values, or should we standardize on one?

)
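
For reference, the flag parsing above can be factored into a small helper. This is a sketch only: `env_flag` and `_FALSY` are hypothetical names, not part of felderize, and the optional `environ` argument exists purely so the behaviour can be checked without touching the real environment. It keeps the PR's semantics, where any value other than the listed falsy spellings (including an empty string) counts as enabled.

```python
import os
from typing import Mapping, Optional

# Spellings treated as "disabled", matching the tuple used in the diff.
_FALSY = frozenset({"0", "false", "no", "off"})


def env_flag(
    name: str,
    default: bool = True,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: read a boolean flag from the environment."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    # Case-insensitive and tolerant of surrounding whitespace.
    return raw.strip().lower() not in _FALSY
```

Standardizing on a single spelling, as the review comment suggests, would amount to shrinking `_FALSY`; whether that is desirable is a project decision.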
69 changes: 66 additions & 3 deletions python/felderize/felderize/install_feldera_sql_compiler.py
@@ -61,14 +61,18 @@ def download_compiler(
output_dir: Path | None = None,
version: str | None = None,
force: bool = False,
logs=None,
) -> Path:
"""Download sql2dbsp JAR from GitHub releases. Returns the path to the JAR.

Review comment: Nit: `logs` is the only parameter in this file without a type annotation. `logs: TextIO | None = None` (with `TextIO` from `typing`) would keep the signatures consistent.

Args:
output_dir: Directory to save the JAR (default: ~/.felderize/).
version: Release tag (e.g. "v0.291.0"); defaults to latest.
force: Overwrite existing file if present.
logs: Where to write progress messages (default: stdout). Pass
sys.stderr when stdout must stay machine-clean (e.g. --json-output).
"""
out = logs or sys.stdout
dest_dir = output_dir or FELDERIZE_DIR
dest_dir.mkdir(parents=True, exist_ok=True)

@@ -90,7 +94,7 @@ def download_compiler(

if dest.exists() and not force:
status = "the latest release" if is_latest else "installed"
print(f"Already on {status}: {name} ({tag})")
print(f"Already on {status}: {name} ({tag})", file=out)
return dest

last_pct = [-1]
@@ -108,11 +112,70 @@ def _progress(block_num: int, block_size: int, total_size: int) -> None:
f"\r [{bar:<20}] {pct:3d}% {downloaded / 1_048_576:.1f}/{total_size / 1_048_576:.1f} MB",
end="",
flush=True,
file=out,
)

latest_note = " (latest release)" if is_latest else ""
print(f"Downloading {name} ({tag}){latest_note}...")
print(f"Downloading {name} ({tag}){latest_note}...", file=out)
urllib.request.urlretrieve(url, dest, reporthook=_progress)
print() # newline after progress bar
print(file=out) # newline after progress bar

return dest
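
A typed `logs` parameter, as suggested in the review comment, could look roughly like the sketch below. `announce` is a hypothetical stand-in for the progress printing in `download_compiler`; `Optional[TextIO]` is used here as the equivalent of `TextIO | None` so the snippet also runs on older interpreters.

```python
import io
import sys
from typing import Optional, TextIO


def announce(message: str, logs: Optional[TextIO] = None) -> None:
    """Hypothetical helper: write a progress message to logs (default: stdout)."""
    out = logs if logs is not None else sys.stdout
    print(message, file=out)


# Redirecting progress into a buffer (or sys.stderr) keeps stdout clean.
buffer = io.StringIO()
announce("Downloading...", logs=buffer)
```

Any object with a text `write` method works as the sink, which is what lets callers pass `sys.stderr` when stdout carries JSON.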


def find_local_compiler(search_dir: Path | None = None) -> Path | None:

Review comment (Member): What if an earlier network failure or interrupt left a truncated or corrupt `sql2dbsp-jar-with-dependencies-*.jar` file behind? Would we still treat it as a valid local compiler? Is this possible, and do we guard against corrupt downloads?

"""Return the newest compiler JAR already cached in search_dir.

Looks for ``sql2dbsp-jar-with-dependencies-*.jar`` files in search_dir
(default ``~/.felderize/``). Prefers versions felderize supports; among
those, the highest version. Falls back to the highest unsupported version
when no supported one is cached (validation then warns). Returns None when
the directory holds no compiler JAR.
"""
directory = search_dir or FELDERIZE_DIR
if not directory.is_dir():
return None

jars = [p for p in directory.glob(f"{COMPILER_JAR_PREFIX}*.jar") if p.is_file()]
if not jars:
return None

def version_key(path: Path) -> tuple[int, ...]:
tag = jar_version(path.name)
return _parse_version(tag) if tag else ()

supported = [
p for p in jars if (tag := jar_version(p.name)) and is_supported_version(tag)
]
return max(supported or jars, key=version_key)
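
The selection rule in `find_local_compiler` (prefer supported versions, then take the highest) can be illustrated on plain file names. This is a sketch under assumptions: `parse_version`, `pick_newest`, and the regular expression are hypothetical stand-ins for `_parse_version`, `jar_version`, and `is_supported_version`, whose bodies are not shown in this diff; the minimum version is taken from the README's v0.304.0 requirement.

```python
import re
from typing import List, Optional, Tuple

MINIMUM = (0, 304, 0)  # README requirement: compiler v0.304.0 or newer
_VERSION_RE = re.compile(r"v(\d+(?:\.\d+)*)\.jar$")  # assumed file-name layout


def parse_version(name: str) -> Tuple[int, ...]:
    """Extract a numeric version tuple from a JAR name; () when absent."""
    match = _VERSION_RE.search(name)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


def pick_newest(names: List[str]) -> Optional[str]:
    """Prefer supported versions; fall back to the highest unsupported one."""
    if not names:
        return None
    supported = [n for n in names if parse_version(n) >= MINIMUM]
    return max(supported or names, key=parse_version)
```

Comparing integer tuples rather than strings matters here: as strings, `"0.99.0"` sorts after `"0.304.0"`, while as tuples `(0, 99, 0)` correctly sorts before `(0, 304, 0)`.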


def ensure_compiler(
search_dir: Path | None = None,
auto_download: bool = True,
logs=None,
) -> Path | None:
"""Return a usable compiler JAR, downloading the latest one if needed.

Resolution order:
1. The newest compiler JAR already cached in search_dir (~/.felderize/).
2. Otherwise, when auto_download is set, download the latest release.

Returns None when nothing is cached and downloading is disabled or fails;
callers then fall back to their "compiler not found" handling.
"""
local = find_local_compiler(search_dir)
if local is not None:
return local
if not auto_download:
return None
try:
return download_compiler(output_dir=search_dir, logs=logs)
except Exception as e: # network/API failure — degrade gracefully
print(
f"Warning: could not auto-download the Feldera compiler ({e}); "
"validation will be skipped. Run 'felderize download-compiler' or set "
"FELDERA_COMPILER to validate.",
file=sys.stderr,
)
return None
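
On the review question about truncated or corrupt JARs: one common mitigation is to download into a temporary file, check it, and only then rename it into place. The sketch below is not what this PR does. `install_atomically` and its `fetch` callback are hypothetical, and `zipfile.is_zipfile` is only a cheap structural check (a JAR is a zip archive), not a checksum verification.

```python
import os
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def install_atomically(dest: Path, fetch: Callable[[Path], None]) -> Path:
    """Hypothetical helper: fetch to a temp file, verify, then rename to dest."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The ".part" suffix keeps partial files out of any "*.jar" glob.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(tmp)
        # A truncated archive lacks the end-of-central-directory record.
        if not zipfile.is_zipfile(tmp):
            raise ValueError(f"{dest.name} is not a valid JAR (zip) archive")
        # Atomic when source and destination share a filesystem.
        os.replace(tmp, dest)
    finally:
        # No-op after a successful rename; cleans up after a failure.
        tmp.unlink(missing_ok=True)
    return dest
```

With this shape, an interrupted download leaves at most a `.part` file behind, so a cache lookup that globs for `*.jar` would never pick it up.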