Break apart contributing and customer facing README
mdrakiburrahman committed Mar 31, 2026
commit 04b7b387a4db4862336a11ddad2b0e2e20b4c944
131 changes: 131 additions & 0 deletions python/dbt-feldera/CONTRIBUTING.md
@@ -0,0 +1,131 @@
# Contributing to dbt-feldera

Thanks for your interest in contributing to **dbt-feldera**!
This guide covers the local development workflow, test infrastructure, and
conventions.

## Prerequisites

| Tool | Version | Purpose |
| ------------------------------------- | ------- | ----------------------- |
| Python | 3.10+ | Runtime |
| [uv](https://github.com/astral-sh/uv) | latest | Package & venv manager |
| Docker (with Compose v2) | latest | Integration & e2e tests |

> Use the [vscode devcontainer](../../.devcontainer/devcontainer.json) for a smoother onboarding experience!

## Quick start

```bash
cd python/dbt-feldera
.scripts/run.sh all # venv → build → lint → unit → integration → e2e
```

## Development script

All development tasks go through a single entry-point — [`.scripts/run.sh`](.scripts/run.sh):

```bash
.scripts/run.sh <target>
```

| Target | What it does |
| ------------------ | --------------------------------------------------------- |
| `venv` | Create a fresh virtual environment and install all deps |
| `build` | Build the wheel into `dist/*.whl` |
| `fix` | Auto-fix lint issues (`ruff check --fix` + `ruff format`) |
| `lint` | Check lint (`ruff check` + `ruff format --check`) |
| `unit-test` | Run `pytest tests/` (no Docker required) |
| `integration-test` | Start Feldera in Docker, run `pytest integration_tests/` |
| `e2e` | Full dbt CLI lifecycle against a Docker Feldera instance |
| `all` | Run every target above in sequence |

## Test architecture

### Overview

The integration tests spin up Feldera and Kafka via Docker Compose, run a dbt project (`dbt-adventureworks`) against the live Feldera instance, and verify the outputs, including Delta Lake files.

```mermaid
graph TB
subgraph "Host"
RS[".scripts/run.sh"]
PT["PyTest<br/><i>integration_tests/test_dbt_feldera.py</i>"]
DBT["dbt CLI<br/><i>e2e via run-dbt-local.sh</i>"]
AW["dbt-adventureworks<br/><i>test project & seeds</i>"]
end

subgraph "Docker Compose"
FM["pipeline-manager<br/><i>Feldera</i>"]
RP["redpanda<br/><i>Kafka-compatible broker</i>"]
DV[("delta-output/<br/><i>bind-mount volume</i>")]
end

RS -- "starts/stops" --> FM
RS -- "starts/stops" --> RP
RS -- "invokes" --> PT
RS -- "invokes" --> DBT

PT -- "HTTP API<br/>:8080" --> FM
PT -- "HTTP Proxy<br/>:18082" --> RP
PT -- "reads Delta<br/>via DuckDB" --> DV

DBT -- "dbt seed / build<br/>:8080" --> FM

AW -. "fixture" .-> PT
AW -. "fixture" .-> DBT

FM -- "Kafka connector<br/>:29092" --> RP
FM -- "writes Delta" --> DV

RP -- "healthcheck<br/>:9644" --> RP
FM -- "healthcheck<br/>:8080/healthz" --> FM
```

### Test categories

| Category | Directory | Docker? | What it validates |
| --------------- | ---------------------------- | ------- | ----------------------------------------------------------------------------------------- |
| **Unit** | `tests/unit/` | No | Adapter internals: credentials, columns, cursor, relations, SQL parsing, pipeline manager |
| **Integration** | `integration_tests/` | Yes | Full dbt ↔ Feldera round-trip: seed, run, test, incremental, Delta output, Kafka IVM |
| **End-to-end** | `integration_tests/scripts/` | Yes | dbt CLI lifecycle (`debug → seed → build → docs generate`) against a real instance |

### Integration test fixtures (conftest.py)

The PyTest session fixtures handle the Docker lifecycle automatically:

1. **`delta_output_dir`** — cleans and creates the `dbt-adventureworks/delta-output/` directory (bind-mounted into the Feldera container at `/data/delta`)
2. **`docker_feldera`** — starts Docker Compose, waits for health checks, yields the Feldera URL, and tears down on exit
3. **`kafka_proxy_url`** — resolves and waits for Redpanda's HTTP proxy
4. **`dbt_project_dir`** — returns the path to the `dbt-adventureworks` project

Set `FELDERA_SKIP_DOCKER=1` to skip Docker management and test against an
external Feldera instance.

### Docker Compose services

| Service | Image | Ports | Purpose |
| ------------------ | -------------------------- | ------------------------------------- | ------------------------------------------- |
| `pipeline-manager` | `feldera/pipeline-manager` | `8080` | Feldera API + pipeline engine |
| `redpanda` | `redpanda:v23.1.13` | `19092` (Kafka), `18082` (HTTP proxy) | Kafka-compatible broker for connector tests |

## Environment variables

| Variable | Default | Used by | Description |
| --------------------- | ---------------------------------------------------- | ------------------------- | ----------------------------------------------------------------------- |
| `FELDERA_URL` | `http://localhost:8080` | `run.sh`, e2e | Feldera API base URL |
| `FELDERA_SKIP_DOCKER` | _(unset)_ | `run.sh integration-test` | Set to `1` to skip Docker start/stop (use an external Feldera instance) |
| `FELDERA_IMAGE` | `images.feldera.com/feldera/pipeline-manager:latest` | docker-compose | Docker image for the Feldera container |
| `FELDERA_PORT` | `8080` | docker-compose | Host port mapped to the Feldera container |
| `RUST_LOG` | `info` | docker-compose | Log level inside the Feldera container |
| `SKIP_TEARDOWN` | _(unset)_ | e2e (`run-dbt-local.sh`) | Set to `1` to keep Feldera running after the e2e test and print UI URLs |
| `DBT_DOCS_PORT` | `18081` | e2e (`run-dbt-local.sh`) | Host port for `dbt docs serve` |

## Code style

We use [Ruff](https://docs.astral.sh/ruff/) for linting and formatting:

```bash
.scripts/run.sh lint # check
.scripts/run.sh fix # auto-fix
```
221 changes: 187 additions & 34 deletions python/dbt-feldera/README.md
@@ -1,15 +1,42 @@
# dbt-feldera

The [dbt](https://www.getdbt.com/) adapter for
[Feldera](https://www.feldera.com/).

**[dbt](https://www.getdbt.com/)** enables data analysts and engineers to
transform their data using the same practices that software engineers use to
build applications.

**[Feldera](https://www.feldera.com/)** is a streaming SQL engine powered by
the DBSP incremental computation engine. It automatically incrementalizes
_every_ SQL query without watermarks, scans, or `MERGE`. When input data
changes, only affected output rows are recomputed.

## Key features

- **Automatic incremental view maintenance (IVM)** — Feldera's DBSP engine
incrementalizes any SQL query out of the box. No manual merge logic or
watermark tuning required.
- **Streaming-native materializations** — first-class support for continuous
pipelines alongside standard dbt materializations.
- **Connector integration** — attach Kafka, Delta Lake, S3, and HTTP
connectors directly to models via configuration.
- **Easy setup** — pure Python adapter with no ODBC driver needed.

## Installation

```bash
pip install dbt-feldera
```

or with [uv](https://docs.astral.sh/uv/):

```bash
uv add dbt-feldera
```

Requires Python 3.10+ and dbt-core 1.9.x.

## Configuration

Add a Feldera target to your `profiles.yml`:
@@ -21,50 +48,176 @@ my_project:
    dev:
      type: feldera
      host: "http://localhost:8080"
      api_key: "apikey:..."          # optional — for authenticated instances
      database: "default"
      schema: "my_pipeline"          # maps to the Feldera pipeline name
      compilation_profile: dev       # dev | unoptimized | optimized
      workers: 4
      timeout: 300
```

### Concept mapping

Feldera uses different terminology than traditional databases. Here's how dbt
concepts map to Feldera:

| dbt concept | Feldera concept | Description |
| ----------------------------- | ----------------- | ----------------------------------------------- |
| `database` | _(unused)_ | Set to any string (e.g. `"default"`) |
| `schema` | Pipeline name | Each dbt schema maps to one Feldera pipeline |
| `table` materialization | Input table | External data source (Kafka, HTTP, S3) |
| `view` materialization | View | Intermediate SQL transform |
| `incremental` materialization | Materialized view | IVM-backed output, queryable ad-hoc |
| `seed` | Table + HTTP push | Schema registered, data pushed via HTTP ingress |

### Configuration options

| Option | Default | Description |
| --------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------- |
| `host` | `http://localhost:8080` | Feldera API base URL |
| `api_key` | _(none)_ | API key for authenticated Feldera instances |
| `schema` | _(required)_ | Pipeline name in Feldera |
| `compilation_profile` | `dev` | SQL compilation profile: `dev` (fast compile), `unoptimized`, or `optimized` (best runtime performance) |
| `workers` | `4` | Number of pipeline worker threads |
| `timeout` | `300` | Pipeline operation timeout in seconds |

## Materializations

### `view` — Intermediate transform

Creates a `CREATE VIEW` in the pipeline. Use for intermediate transformations
that don't need to be queried directly or connected to an output.

```sql
-- models/orders_enriched.sql
{{ config(materialized='view') }}

SELECT o.id, o.total, c.name AS customer_name
FROM {{ ref('orders') }} o
JOIN {{ ref('customers') }} c ON o.customer_id = c.id
```

Set `materialized_view: true` or attach `connectors` to promote to a
`CREATE MATERIALIZED VIEW` (enables ad-hoc queries and output connectors):

```sql
{{ config(
    materialized='view',
    materialized_view=true,
    connectors=[{'transport': {'name': 'my_delta_connector'}}]
) }}
```

### `table` — Input source

Creates a `CREATE TABLE` — an input source for external data ingress. The model
SQL defines the **column schema**, not a SELECT query. Attach connectors for
Kafka, S3, HTTP, or other input sources.

```sql
-- models/raw_events.sql
{{ config(
    materialized='table',
    connectors=[{
        'transport': {
            'name': 'kafka_in',
            'config': {
                'bootstrap.servers': 'redpanda:29092',
                'topics': ['events']
            }
        },
        'format': {'name': 'json'}
    }]
) }}

event_id BIGINT NOT NULL,
event_type VARCHAR NOT NULL,
payload VARCHAR,
created_at TIMESTAMP NOT NULL
```

### `incremental` — Automatic IVM

Leverages Feldera's DBSP engine for automatic incremental view maintenance.
Unlike dbt's standard incremental strategy (which uses watermarks and merge),
Feldera incrementalizes the query automatically — when inputs change, only
affected output rows are recomputed.
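As a rough mental model (plain Python, not Feldera or DBSP source code), incrementalization means applying weighted row deltas to maintained aggregates instead of rescanning inputs:

```python
# Toy illustration only: maintaining
# SELECT region, SUM(amount) ... GROUP BY region under insert/delete deltas.
from collections import defaultdict

class IncrementalSum:
    """Keeps per-group sums up to date from weighted row deltas."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, deltas):
        # Each delta is (region, amount, weight): weight +1 inserts a row,
        # weight -1 retracts one. Only the touched groups are updated.
        for region, amount, weight in deltas:
            self.totals[region] += weight * amount
        return dict(self.totals)

ivm = IncrementalSum()
ivm.apply([("west", 10.0, +1), ("west", 5.0, +1), ("east", 7.0, +1)])
totals = ivm.apply([("west", 5.0, -1)])  # retract a single row
```

Retracting one `west` row touches only that group; `east` is never revisited.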

```sql
-- models/sales_summary.sql
{{ config(materialized='incremental') }}

SELECT
    region,
    product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS order_count
FROM {{ ref('orders') }}
GROUP BY region, product_category
```

On `--full-refresh`, the pipeline is stopped, storage is cleared, and the
pipeline is redeployed from scratch.

### `streaming_pipeline` — Full pipeline as a single model

Deploys an entire Feldera pipeline as one dbt model. The model SQL **is** the
complete pipeline program — containing `CREATE TABLE` and `CREATE VIEW`
statements. Useful for complex multi-table, multi-view pipelines managed as a
single unit.

```sql
-- models/my_pipeline.sql
{{ config(materialized='streaming_pipeline') }}

CREATE TABLE orders (
    id BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    amount DECIMAL(10, 2) NOT NULL
);

CREATE TABLE customers (
    id BIGINT NOT NULL,
    name VARCHAR NOT NULL
);

CREATE MATERIALIZED VIEW enriched_orders AS
SELECT o.id, o.amount, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
```

### `seed` — Reference data via HTTP push

Seeds register a `CREATE TABLE` and push row data via Feldera's HTTP ingress
API after the pipeline is deployed. Use for small reference datasets (CSVs).
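Under the hood this is a plain HTTP POST per seed table. The endpoint shape below is an assumption for illustration, not taken from the adapter source; consult the Feldera API reference for the real path:

```python
# Hypothetical helper -- the exact ingress endpoint/format is an assumption,
# not taken from the dbt-feldera adapter.
def ingress_url(base: str, pipeline: str, table: str, fmt: str = "json") -> str:
    """Build the URL a seed push would POST newline-delimited rows to."""
    return f"{base.rstrip('/')}/v0/pipelines/{pipeline}/ingress/{table}?format={fmt}"

url = ingress_url("http://localhost:8080", "my_pipeline", "currency_codes")
```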

```bash
dbt seed # push seed data
dbt seed --full-refresh # drop and recreate, then push
```

### Summary

| Materialization | Feldera SQL | Best for |
| ---------------------------------- | -------------------------- | ------------------------------------------- |
| `view` | `CREATE VIEW` | Intermediate transforms |
| `view` + `materialized_view: true` | `CREATE MATERIALIZED VIEW` | Queryable outputs, connectors |
| `table` | `CREATE TABLE` | External input sources (Kafka, S3, HTTP) |
| `incremental` | `CREATE MATERIALIZED VIEW` | IVM-backed aggregations and joins |
| `streaming_pipeline` | Full program | Multi-table/view pipelines as a single unit |
| `seed` | `CREATE TABLE` + HTTP push | Small reference datasets |

## Documentation

- **[Feldera documentation](https://docs.feldera.com/)** — platform docs, SQL reference, connectors
- **[dbt documentation](https://docs.getdbt.com/)** — general dbt usage and concepts

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and
project layout.

## License

Apache-2.0 — see [LICENSE](../../LICENSE) for details.