Break apart contributing and customer facing README
mdrakiburrahman committed Mar 31, 2026
commit 04b7b387a4db4862336a11ddad2b0e2e20b4c944
131 changes: 131 additions & 0 deletions python/dbt-feldera/CONTRIBUTING.md
@@ -0,0 +1,131 @@
# Contributing to dbt-feldera

Thanks for your interest in contributing to **dbt-feldera**!
This guide covers the local development workflow, test infrastructure, and
conventions.

## Prerequisites

| Tool | Version | Purpose |
| ------------------------------------- | ------- | ----------------------- |
| Python | 3.10+ | Runtime |
| [uv](https://github.com/astral-sh/uv) | latest | Package & venv manager |
| Docker (with Compose v2) | latest | Integration & e2e tests |

> Use the [vscode devcontainer](../../.devcontainer/devcontainer.json) for a smoother onboarding experience!

## Quick start

```bash
cd python/dbt-feldera
.scripts/run.sh all # venv → build → lint → unit → integration → e2e
```

## Development script

All development tasks go through a single entry-point — [`.scripts/run.sh`](.scripts/run.sh):

```bash
.scripts/run.sh <target>
```

| Target | What it does |
| ------------------ | --------------------------------------------------------- |
| `venv` | Create a fresh virtual environment and install all deps |
| `build` | Build the wheel into `dist/*.whl` |
| `fix` | Auto-fix lint issues (`ruff check --fix` + `ruff format`) |
| `lint` | Check lint (`ruff check` + `ruff format --check`) |
| `unit-test` | Run `pytest tests/` (no Docker required) |
| `integration-test` | Start Feldera in Docker, run `pytest integration_tests/` |
| `e2e` | Full dbt CLI lifecycle against a Docker Feldera instance |
| `all` | Run every target above in sequence |

## Test architecture

### Overview

The integration tests spin up Feldera and Kafka via Docker Compose, run a dbt project (`dbt-adventureworks`) against the live Feldera instance, and verify the outputs, including Delta Lake files.

```mermaid
graph TB
subgraph "Host"
RS[".scripts/run.sh"]
PT["PyTest<br/><i>integration_tests/test_dbt_feldera.py</i>"]
DBT["dbt CLI<br/><i>e2e via run-dbt-local.sh</i>"]
AW["dbt-adventureworks<br/><i>test project & seeds</i>"]
end

subgraph "Docker Compose"
FM["pipeline-manager<br/><i>Feldera</i>"]
RP["redpanda<br/><i>Kafka-compatible broker</i>"]
DV[("delta-output/<br/><i>bind-mount volume</i>")]
end

RS -- "starts/stops" --> FM
RS -- "starts/stops" --> RP
RS -- "invokes" --> PT
RS -- "invokes" --> DBT

PT -- "HTTP API<br/>:8080" --> FM
PT -- "HTTP Proxy<br/>:18082" --> RP
PT -- "reads Delta<br/>via DuckDB" --> DV

DBT -- "dbt seed / build<br/>:8080" --> FM

AW -. "fixture" .-> PT
AW -. "fixture" .-> DBT

FM -- "Kafka connector<br/>:29092" --> RP
FM -- "writes Delta" --> DV

RP -- "healthcheck<br/>:9644" --> RP
FM -- "healthcheck<br/>:8080/healthz" --> FM
```

### Test categories

| Category | Directory | Docker? | What it validates |
| --------------- | ---------------------------- | ------- | ----------------------------------------------------------------------------------------- |
| **Unit** | `tests/unit/` | No | Adapter internals: credentials, columns, cursor, relations, SQL parsing, pipeline manager |
| **Integration** | `integration_tests/` | Yes | Full dbt ↔ Feldera round-trip: seed, run, test, incremental, Delta output, Kafka IVM |
| **End-to-end** | `integration_tests/scripts/` | Yes | dbt CLI lifecycle (`debug → seed → build → docs generate`) against a real instance |

### Integration test fixtures (conftest.py)

The PyTest session fixtures handle the Docker lifecycle automatically:

1. **`delta_output_dir`** — cleans and creates the `dbt-adventureworks/delta-output/` directory (bind-mounted into the Feldera container at `/data/delta`)
2. **`docker_feldera`** — starts Docker Compose, waits for health checks, yields the Feldera URL, and tears down on exit
3. **`kafka_proxy_url`** — resolves and waits for Redpanda's HTTP proxy
4. **`dbt_project_dir`** — returns the path to the `dbt-adventureworks` project

Set `FELDERA_SKIP_DOCKER=1` to skip Docker management and test against an
external Feldera instance.

### Docker Compose services

| Service | Image | Ports | Purpose |
| ------------------ | -------------------------- | ------------------------------------- | ------------------------------------------- |
| `pipeline-manager` | `feldera/pipeline-manager` | `8080` | Feldera API + pipeline engine |
| `redpanda` | `redpanda:v23.1.13` | `19092` (Kafka), `18082` (HTTP proxy) | Kafka-compatible broker for connector tests |

## Environment variables

| Variable | Default | Used by | Description |
| --------------------- | ---------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------- |
| `FELDERA_URL` | `http://localhost:8080` | `run.sh`, e2e | Feldera API base URL |
| `FELDERA_SKIP_DOCKER` | _(unset)_ | `run.sh integration-test` | Set to `1` to skip Docker start/stop (use an external Feldera instance) |
| `FELDERA_IMAGE` | `images.feldera.com/feldera/pipeline-manager:latest` | docker-compose | Docker image for the Feldera container |
| `FELDERA_PORT` | `8080` | docker-compose | Host port mapped to the Feldera container |
| `RUST_LOG` | `info` | docker-compose | Log level inside the Feldera container |
| `SKIP_TEARDOWN` | _(unset)_ | e2e (`run-dbt-local.sh`) | Set to `1` to keep Feldera running after the e2e test and print UI URLs |
| `DBT_DOCS_PORT` | `18081` | e2e (`run-dbt-local.sh`) | Host port for `dbt docs serve` |

## Code style

We use [Ruff](https://docs.astral.sh/ruff/) for linting and formatting:

```bash
.scripts/run.sh lint # check
.scripts/run.sh fix # auto-fix
```
221 changes: 187 additions & 34 deletions python/dbt-feldera/README.md
@@ -1,15 +1,42 @@
# dbt-feldera

The [dbt](https://www.getdbt.com/) adapter for
[Feldera](https://www.feldera.com/).

**[dbt](https://www.getdbt.com/)** enables data analysts and engineers to
transform their data using the same practices that software engineers use to
build applications.

**[Feldera](https://www.feldera.com/)** is a streaming SQL engine powered by
the DBSP incremental computation engine. It automatically incrementalizes
_every_ SQL query without watermarks, scans, or `MERGE`. When input data
changes, only affected output rows are recomputed.

## Key features

- **Automatic incremental view maintenance (IVM)** — Feldera's DBSP engine
incrementalizes any SQL query out of the box. No manual merge logic or
watermark tuning required.
- **Streaming-native materializations** — first-class support for continuous
pipelines alongside standard dbt materializations.
- **Connector integration** — attach Kafka, Delta Lake, S3, and HTTP
connectors directly to models via configuration.
- **Easy setup** — pure Python adapter with no ODBC driver needed.

## Installation

```bash
pip install dbt-feldera
```

or with [uv](https://docs.astral.sh/uv/):

```bash
uv add dbt-feldera
```

Requires Python 3.10+ and dbt-core 1.9.x.

## Configuration

Add a Feldera target to your `profiles.yml`:
@@ -21,50 +48,176 @@ my_project:
    dev:
      type: feldera
      host: "http://localhost:8080"
      api_key: "apikey:..."          # optional — for authenticated instances
      database: "default"
      schema: "my_pipeline"          # maps to the Feldera pipeline name
      compilation_profile: dev       # dev | unoptimized | optimized
      workers: 4
      timeout: 300
```

### Concept mapping

Feldera uses different terminology than traditional databases. Here's how dbt
concepts map to Feldera:

| dbt concept | Feldera concept | Description |
| ----------------------------- | ----------------- | ----------------------------------------------- |
| `database` | _(unused)_ | Set to any string (e.g. `"default"`) |
| `schema` | Pipeline name | Each dbt schema maps to one Feldera pipeline |
| `table` materialization | Input table | External data source (Kafka, HTTP, S3) |
| `view` materialization | View | Intermediate SQL transform |
| `incremental` materialization | Materialized view | IVM-backed output, queryable ad-hoc |
| `seed` | Table + HTTP push | Schema registered, data pushed via HTTP ingress |

### Configuration options

| Option | Default | Description |
| --------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------- |
| `host` | `http://localhost:8080` | Feldera API base URL |
| `api_key` | _(none)_ | API key for authenticated Feldera instances |
| `schema` | _(required)_ | Pipeline name in Feldera |
| `compilation_profile` | `dev` | SQL compilation profile: `dev` (fast compile), `unoptimized`, or `optimized` (best runtime performance) |
| `workers` | `4` | Number of pipeline worker threads |
| `timeout` | `300` | Pipeline operation timeout in seconds |

## Materializations

### `view` — Intermediate transform

Creates a `CREATE VIEW` in the pipeline. Use for intermediate transformations
that don't need to be queried directly or connected to an output.

```sql
-- models/orders_enriched.sql
{{ config(materialized='view') }}

SELECT o.id, o.total, c.name AS customer_name
FROM {{ ref('orders') }} o
JOIN {{ ref('customers') }} c ON o.customer_id = c.id
```

Set `materialized_view: true` or attach `connectors` to promote to a
`CREATE MATERIALIZED VIEW` (enables ad-hoc queries and output connectors):

```sql
{{ config(
    materialized='view',
    materialized_view=true,
    connectors=[{'transport': {'name': 'my_delta_connector'}}]
) }}
```

### `table` — Input source

Creates a `CREATE TABLE` — an input source for external data ingress. The model
SQL defines the **column schema**, not a SELECT query. Attach connectors for
Kafka, S3, HTTP, or other input sources.

```sql
-- models/raw_events.sql
{{ config(
    materialized='table',
    connectors=[{
        'transport': {
            'name': 'kafka_in',
            'config': {
                'bootstrap.servers': 'redpanda:29092',
                'topics': ['events']
            }
        },
        'format': {'name': 'json'}
    }]
) }}

event_id BIGINT NOT NULL,
event_type VARCHAR NOT NULL,
payload VARCHAR,
created_at TIMESTAMP NOT NULL
```

### `incremental` — Automatic IVM

Leverages Feldera's DBSP engine for automatic incremental view maintenance.
Unlike dbt's standard incremental strategy (which uses watermarks and merge),
Feldera incrementalizes the query automatically — when inputs change, only
affected output rows are recomputed.
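As a rough mental model (plain Python, not Feldera or DBSP source code), incrementalization means applying weighted row deltas to maintained aggregates instead of rescanning inputs:

```python
# Toy illustration only: maintaining
# SELECT region, SUM(amount) ... GROUP BY region under insert/delete deltas.
from collections import defaultdict

class IncrementalSum:
    """Keeps per-group sums up to date from weighted row deltas."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, deltas):
        # Each delta is (region, amount, weight): weight +1 inserts a row,
        # weight -1 retracts one. Only the touched groups are updated.
        for region, amount, weight in deltas:
            self.totals[region] += weight * amount
        return dict(self.totals)

ivm = IncrementalSum()
ivm.apply([("west", 10.0, +1), ("west", 5.0, +1), ("east", 7.0, +1)])
totals = ivm.apply([("west", 5.0, -1)])  # retract a single row
```

Retracting one `west` row touches only that group; `east` is never revisited.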

```sql
-- models/sales_summary.sql
{{ config(materialized='incremental') }}

SELECT
    region,
    product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS order_count
FROM {{ ref('orders') }}
GROUP BY region, product_category
```

On `--full-refresh`, the pipeline is stopped, storage is cleared, and the
pipeline is redeployed from scratch.

### `streaming_pipeline` — Full pipeline as a single model

Deploys an entire Feldera pipeline as one dbt model. The model SQL **is** the
complete pipeline program — containing `CREATE TABLE` and `CREATE VIEW`
statements. Useful for complex multi-table, multi-view pipelines managed as a
single unit.

```sql
-- models/my_pipeline.sql
{{ config(materialized='streaming_pipeline') }}

CREATE TABLE orders (
    id BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    amount DECIMAL(10, 2) NOT NULL
);

CREATE TABLE customers (
    id BIGINT NOT NULL,
    name VARCHAR NOT NULL
);

CREATE MATERIALIZED VIEW enriched_orders AS
SELECT o.id, o.amount, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

### `seed` — Reference data via HTTP push

Seeds register a `CREATE TABLE` and push row data via Feldera's HTTP ingress
API after the pipeline is deployed. Use for small reference datasets (CSVs).
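Under the hood this is a plain HTTP POST per seed table. The endpoint shape below is an assumption for illustration, not taken from the adapter source; consult the Feldera API reference for the real path:

```python
# Hypothetical helper -- the exact ingress endpoint/format is an assumption,
# not taken from the dbt-feldera adapter.
def ingress_url(base: str, pipeline: str, table: str, fmt: str = "json") -> str:
    """Build the URL a seed push would POST newline-delimited rows to."""
    return f"{base.rstrip('/')}/v0/pipelines/{pipeline}/ingress/{table}?format={fmt}"

url = ingress_url("http://localhost:8080", "my_pipeline", "currency_codes")
```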

```bash
dbt seed # push seed data
dbt seed --full-refresh # drop and recreate, then push
```

### Summary

| Materialization | Feldera SQL | Best for |
| ---------------------------------- | -------------------------- | ------------------------------------------- |
| `view` | `CREATE VIEW` | Intermediate transforms |
| `view` + `materialized_view: true` | `CREATE MATERIALIZED VIEW` | Queryable outputs, connectors |
| `table` | `CREATE TABLE` | External input sources (Kafka, S3, HTTP) |
| `incremental` | `CREATE MATERIALIZED VIEW` | IVM-backed aggregations and joins |
| `streaming_pipeline` | Full program | Multi-table/view pipelines as a single unit |
| `seed` | `CREATE TABLE` + HTTP push | Small reference datasets |

## Documentation

- **[Feldera documentation](https://docs.feldera.com/)** — platform docs, SQL reference, connectors
- **[dbt documentation](https://docs.getdbt.com/)** — general dbt usage and concepts

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and
project layout.

## License

Apache-2.0 — see [LICENSE](../../LICENSE) for details.