|
| 1 | +# Importing Features from dbt |
| 2 | + |
| 3 | +This guide explains how to use Feast's dbt integration to import dbt models as Feast FeatureViews automatically, so you can reuse existing dbt transformations as feature definitions instead of duplicating them by hand. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +[dbt (data build tool)](https://www.getdbt.com/) is a popular tool for transforming data in your warehouse. Many teams already use dbt to create feature tables. Feast's dbt integration allows you to: |
| 8 | + |
| 9 | +- **Discover** dbt models tagged for feature engineering |
| 10 | +- **Import** model metadata (columns, types, descriptions) as Feast objects |
| 11 | +- **Generate** Python code for Entity, DataSource, and FeatureView definitions |
| 12 | + |
| 13 | +This eliminates the need to manually define Feast objects that mirror your dbt models. |
| 14 | + |
| 15 | +## Prerequisites |
| 16 | + |
| 17 | +- A dbt project with compiled artifacts (`target/manifest.json`) |
| 18 | +- Feast installed with dbt support: |
| 19 | + |
| 20 | +```bash |
| 21 | +pip install 'feast[dbt]' |
| 22 | +``` |
| 23 | + |
| 24 | +Or install the parser directly: |
| 25 | + |
| 26 | +```bash |
| 27 | +pip install dbt-artifacts-parser |
| 28 | +``` |
| 29 | + |
| 30 | +## Quick Start |
| 31 | + |
| 32 | +### 1. Tag your dbt models |
| 33 | + |
| 34 | +In your dbt project, add a `feast` tag to models you want to import: |
| 35 | + |
| 36 | +{% code title="models/driver_features.sql" %} |
| 37 | +```sql |
| 38 | +{{ config( |
| 39 | +    materialized='table', |
| 40 | +    tags=['feast'] |
| 41 | +) }} |
| 42 | + |
| 43 | +SELECT |
| 44 | +    driver_id, |
| 45 | +    event_timestamp, |
| 46 | +    avg_rating, |
| 47 | +    total_trips, |
| 48 | +    is_active |
| 49 | +FROM {{ ref('stg_drivers') }} |
| 50 | +``` |
| 51 | +{% endcode %} |
| 52 | + |
| 53 | +### 2. Define column types in schema.yml |
| 54 | + |
| 55 | +Feast uses column metadata from your `schema.yml` to determine feature types: |
| 56 | + |
| 57 | +{% code title="models/schema.yml" %} |
| 58 | +```yaml |
| 58 | +version: 2 |
| 59 | +models: |
| 60 | +  - name: driver_features |
| 61 | +    description: "Driver aggregated features for ML models" |
| 62 | +    columns: |
| 63 | +      - name: driver_id |
| 64 | +        description: "Unique driver identifier" |
| 65 | +        data_type: STRING |
| 66 | +      - name: event_timestamp |
| 67 | +        description: "Feature timestamp" |
| 68 | +        data_type: TIMESTAMP |
| 69 | +      - name: avg_rating |
| 70 | +        description: "Average driver rating" |
| 71 | +        data_type: FLOAT64 |
| 72 | +      - name: total_trips |
| 73 | +        description: "Total completed trips" |
| 74 | +        data_type: INT64 |
| 75 | +      - name: is_active |
| 76 | +        description: "Whether driver is currently active" |
| 77 | +        data_type: BOOLEAN |
| 79 | +``` |
| 80 | +{% endcode %} |
| 81 | + |
| 82 | +### 3. Compile your dbt project |
| 83 | + |
| 84 | +```bash |
| 85 | +cd your_dbt_project |
| 86 | +dbt compile |
| 87 | +``` |
| 88 | + |
| 89 | +This generates `target/manifest.json`, which Feast reads in the following steps. |
| 90 | + |
| 91 | +### 4. List available models |
| 92 | + |
| 93 | +Use the Feast CLI to discover tagged models: |
| 94 | + |
| 95 | +```bash |
| 96 | +feast dbt list target/manifest.json --tag-filter feast |
| 97 | +``` |
| 98 | + |
| 99 | +Output: |
| 100 | +``` |
| 101 | +Found 1 model(s) with tag 'feast': |
| 102 | + |
| 103 | + driver_features |
| 104 | + Description: Driver aggregated features for ML models |
| 105 | + Columns: driver_id, event_timestamp, avg_rating, total_trips, is_active |
| 106 | + Tags: feast |
| 107 | +``` |
| 108 | + |
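To see what the listing step works from, it can help to read the manifest directly. The following is a minimal sketch using only the standard library, not the implementation behind `feast dbt list`. It assumes the usual dbt manifest layout, where `nodes` maps unique ids to entries with `resource_type`, `name`, `tags`, and `columns`. The helper names `load_manifest` and `find_tagged_models` and the trimmed `example_manifest` are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read a dbt manifest.json file into a dict."""
    return json.loads(Path(path).read_text())


def find_tagged_models(manifest, tag):
    """Return model nodes from a loaded manifest that carry the given tag."""
    models = []
    for node in manifest.get("nodes", {}).values():
        # Manifest nodes also include tests, seeds, and snapshots.
        if node.get("resource_type") != "model":
            continue
        if tag in node.get("tags", []):
            models.append(node)
    return sorted(models, key=lambda n: n["name"])


# Hypothetical, heavily trimmed manifest used only for illustration.
example_manifest = {
    "nodes": {
        "model.my_dbt_project.driver_features": {
            "resource_type": "model",
            "name": "driver_features",
            "tags": ["feast"],
            "columns": {"driver_id": {"name": "driver_id", "data_type": "STRING"}},
        },
        "model.my_dbt_project.stg_drivers": {
            "resource_type": "model",
            "name": "stg_drivers",
            "tags": [],
            "columns": {},
        },
    }
}

tagged = find_tagged_models(example_manifest, "feast")
print([m["name"] for m in tagged])
```

For a real project, `find_tagged_models(load_manifest("target/manifest.json"), "feast")` would apply the same filter to the compiled manifest.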
| 109 | +### 5. Import models as Feast definitions |
| 110 | + |
| 111 | +Generate a Python file with Feast object definitions: |
| 112 | + |
| 113 | +```bash |
| 114 | +feast dbt import target/manifest.json \ |
| 115 | +  --entity-column driver_id \ |
| 116 | +  --data-source-type bigquery \ |
| 117 | +  --tag-filter feast \ |
| 118 | +  --output features/driver_features.py |
| 119 | +``` |
| 120 | + |
| 121 | +This generates: |
| 122 | + |
| 123 | +{% code title="features/driver_features.py" %} |
| 124 | +```python |
| 125 | +""" |
| 126 | +Feast feature definitions generated from dbt models. |
| 127 | + |
| 128 | +Source: target/manifest.json |
| 129 | +Project: my_dbt_project |
| 130 | +Generated by: feast dbt import |
| 131 | +""" |
| 132 | + |
| 133 | +from datetime import timedelta |
| 134 | + |
| 135 | +from feast import Entity, FeatureView, Field |
| 136 | +from feast.types import Bool, Float64, Int64 |
| 137 | +from feast.infra.offline_stores.bigquery_source import BigQuerySource |
| 138 | + |
| 139 | + |
| 140 | +# Entities |
| 141 | +driver_id = Entity( |
| 142 | +    name="driver_id", |
| 143 | +    join_keys=["driver_id"], |
| 144 | +    description="Entity key for dbt models", |
| 145 | +    tags={'source': 'dbt'}, |
| 146 | +) |
| 147 | + |
| 148 | + |
| 149 | +# Data Sources |
| 150 | +driver_features_source = BigQuerySource( |
| 151 | +    name="driver_features_source", |
| 152 | +    table="my_project.my_dataset.driver_features", |
| 153 | +    timestamp_field="event_timestamp", |
| 154 | +    description="Driver aggregated features for ML models", |
| 155 | +    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'}, |
| 156 | +) |
| 157 | + |
| 158 | + |
| 159 | +# Feature Views |
| 160 | +driver_features_fv = FeatureView( |
| 161 | +    name="driver_features", |
| 162 | +    entities=[driver_id], |
| 163 | +    ttl=timedelta(days=1), |
| 164 | +    schema=[ |
| 165 | +        Field(name="avg_rating", dtype=Float64, description="Average driver rating"), |
| 166 | +        Field(name="total_trips", dtype=Int64, description="Total completed trips"), |
| 167 | +        Field(name="is_active", dtype=Bool, description="Whether driver is currently active"), |
| 168 | +    ], |
| 169 | +    online=True, |
| 170 | +    source=driver_features_source, |
| 171 | +    description="Driver aggregated features for ML models", |
| 172 | +    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'}, |
| 173 | +) |
| 174 | +``` |
| 175 | +{% endcode %} |
| 176 | + |
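In the generated file above, `driver_id` and `event_timestamp` do not appear in the feature view's `schema`; only the remaining columns become `Field` entries. The sketch below illustrates that selection rule as inferred from the example output. It is not the importer's code, and the function name `select_feature_columns` is hypothetical.

```python
def select_feature_columns(columns, entity_column,
                           timestamp_field="event_timestamp",
                           exclude_columns=()):
    """Return the column names that would become features, in original order."""
    # The entity key, the timestamp, and any explicitly excluded columns
    # are left out of the feature schema.
    skipped = {entity_column, timestamp_field, *exclude_columns}
    return [name for name in columns if name not in skipped]


columns = ["driver_id", "event_timestamp", "avg_rating", "total_trips", "is_active"]
features = select_feature_columns(columns, entity_column="driver_id")
print(features)
```

Passing `exclude_columns=["is_active"]` mirrors what the `--exclude-columns` option is described as doing.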
| 177 | +## CLI Reference |
| 178 | + |
| 179 | +### `feast dbt list` |
| 180 | + |
| 181 | +Discover dbt models available for import. |
| 182 | + |
| 183 | +```bash |
| 184 | +feast dbt list <manifest_path> [OPTIONS] |
| 185 | +``` |
| 186 | + |
| 187 | +**Arguments:** |
| 188 | +- `manifest_path`: Path to dbt's `manifest.json` file |
| 189 | + |
| 190 | +**Options:** |
| 191 | +- `--tag-filter`, `-t`: Filter models by dbt tag (e.g., `feast`) |
| 192 | +- `--model`, `-m`: Filter to specific model name(s) |
| 193 | + |
| 194 | +### `feast dbt import` |
| 195 | + |
| 196 | +Import dbt models as Feast object definitions. |
| 197 | + |
| 198 | +```bash |
| 199 | +feast dbt import <manifest_path> [OPTIONS] |
| 200 | +``` |
| 201 | + |
| 202 | +**Arguments:** |
| 203 | +- `manifest_path`: Path to dbt's `manifest.json` file |
| 204 | + |
| 205 | +**Options:** |
| 206 | + |
| 207 | +| Option | Description | Default | |
| 208 | +|--------|-------------|---------| |
| 209 | +| `--entity-column`, `-e` | Column to use as entity key | (required) | |
| 210 | +| `--data-source-type`, `-d` | Data source type: `bigquery`, `snowflake`, `file` | `bigquery` | |
| 211 | +| `--tag-filter`, `-t` | Filter models by dbt tag | None | |
| 212 | +| `--model`, `-m` | Import specific model(s) only | None | |
| 213 | +| `--timestamp-field` | Timestamp column name | `event_timestamp` | |
| 214 | +| `--ttl-days` | Feature TTL in days | `1` | |
| 215 | +| `--exclude-columns` | Columns to exclude from features | None | |
| 216 | +| `--no-online` | Disable online serving | `False` | |
| 217 | +| `--output`, `-o` | Output Python file path | None (stdout) | |
| 218 | +| `--dry-run` | Preview without generating code | `False` | |
| 219 | + |
| 220 | +## Type Mapping |
| 221 | + |
| 222 | +Feast automatically maps dbt/warehouse column types to Feast types: |
| 223 | + |
| 224 | +| dbt/SQL Type | Feast Type | |
| 225 | +|--------------|------------| |
| 226 | +| `STRING`, `VARCHAR`, `TEXT` | `String` | |
| 227 | +| `INT`, `INTEGER`, `BIGINT`, `INT64` | `Int64` | |
| 228 | +| `SMALLINT`, `TINYINT` | `Int32` | |
| 229 | +| `FLOAT`, `REAL` | `Float32` | |
| 230 | +| `DOUBLE`, `FLOAT64` | `Float64` | |
| 231 | +| `BOOLEAN`, `BOOL` | `Bool` | |
| 232 | +| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` | |
| 233 | +| `BYTES`, `BINARY` | `Bytes` | |
| 234 | +| `ARRAY<type>` | `Array(type)` | |
| 235 | + |
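The table above can be read as a case-insensitive lookup with one recursive rule for arrays. The following sketch shows that reading. It keeps Feast type names as plain strings so it runs without Feast installed, and it is an illustration rather than the integration's actual mapper. Two points are assumptions: `INT64` is included because the Quick Start example maps it to `Int64`, and unknown type names fall back to `String` in the same way a missing `data_type` does.

```python
import re

# Name-to-name lookup taken from the table above.
TYPE_MAP = {
    "STRING": "String", "VARCHAR": "String", "TEXT": "String",
    "INT": "Int64", "INTEGER": "Int64", "BIGINT": "Int64",
    "INT64": "Int64",  # assumption: taken from the Quick Start example
    "SMALLINT": "Int32", "TINYINT": "Int32",
    "FLOAT": "Float32", "REAL": "Float32",
    "DOUBLE": "Float64", "FLOAT64": "Float64",
    "BOOLEAN": "Bool", "BOOL": "Bool",
    "TIMESTAMP": "UnixTimestamp", "DATETIME": "UnixTimestamp",
    "BYTES": "Bytes", "BINARY": "Bytes",
}

ARRAY_PATTERN = re.compile(r"^ARRAY<(.+)>$")


def map_dbt_type(data_type):
    """Map a dbt/SQL type name to a Feast type name (as a string)."""
    if not data_type:
        return "String"  # missing data_type defaults to String
    normalized = data_type.strip().upper()
    array_match = ARRAY_PATTERN.match(normalized)
    if array_match:
        # ARRAY<type> maps to Array(type), resolving the element type first.
        return f"Array({map_dbt_type(array_match.group(1))})"
    # Assumption: unrecognized type names also fall back to String.
    return TYPE_MAP.get(normalized, "String")
```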
| 236 | +Snowflake `NUMBER(precision, scale)` types are handled specially; the first matching rule applies: |
| 237 | +- Scale > 0: `Float64` |
| 238 | +- Precision <= 9: `Int32` |
| 239 | +- Precision <= 18: `Int64` |
| 240 | +- Precision > 18: `Float64` |
| 241 | + |
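The `NUMBER(precision, scale)` rules above depend on evaluation order, since a type such as `NUMBER(5, 2)` satisfies both the scale rule and a precision rule. A minimal sketch of the rules applied top to bottom follows. The function name `map_snowflake_number` is hypothetical, and treating `NUMBER(p)` as scale 0 follows Snowflake's default scale.

```python
import re

NUMBER_PATTERN = re.compile(r"^NUMBER\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)$")


def map_snowflake_number(data_type):
    """Map a Snowflake NUMBER(precision, scale) type to a Feast type name.

    Returns None when the input is not a parameterized NUMBER type.
    """
    match = NUMBER_PATTERN.match(data_type.strip().upper())
    if match is None:
        return None
    precision = int(match.group(1))
    scale = int(match.group(2) or 0)  # NUMBER(p) has scale 0
    if scale > 0:
        return "Float64"  # fractional values
    if precision <= 9:
        return "Int32"
    if precision <= 18:
        return "Int64"
    return "Float64"  # too wide for a 64-bit integer
```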
| 242 | +## Data Source Configuration |
| 243 | + |
| 244 | +### BigQuery |
| 245 | + |
| 246 | +```bash |
| 247 | +feast dbt import manifest.json -e user_id -d bigquery -o features.py |
| 248 | +``` |
| 249 | + |
| 250 | +Generates `BigQuerySource` with the full table path from dbt metadata: |
| 251 | +```python |
| 252 | +BigQuerySource( |
| 253 | +    table="project.dataset.table_name", |
| 254 | +    ... |
| 255 | +) |
| 256 | +``` |
| 257 | + |
| 258 | +### Snowflake |
| 259 | + |
| 260 | +```bash |
| 261 | +feast dbt import manifest.json -e user_id -d snowflake -o features.py |
| 262 | +``` |
| 263 | + |
| 264 | +Generates `SnowflakeSource` with database, schema, and table: |
| 265 | +```python |
| 266 | +SnowflakeSource( |
| 267 | +    database="MY_DB", |
| 268 | +    schema="MY_SCHEMA", |
| 269 | +    table="TABLE_NAME", |
| 270 | +    ... |
| 271 | +) |
| 272 | +``` |
| 273 | + |
| 274 | +### File |
| 275 | + |
| 276 | +```bash |
| 277 | +feast dbt import manifest.json -e user_id -d file -o features.py |
| 278 | +``` |
| 279 | + |
| 280 | +Generates `FileSource` with a placeholder path: |
| 281 | +```python |
| 282 | +FileSource( |
| 283 | +    path="/data/table_name.parquet", |
| 284 | +    ... |
| 285 | +) |
| 286 | +``` |
| 287 | + |
| 288 | +{% hint style="info" %} |
| 289 | +For file sources, update the generated path to point to your actual data files. |
| 290 | +{% endhint %} |
| 291 | + |
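The three source types differ only in how the model's location is expressed. The sketch below summarizes the shapes shown above as plain keyword-argument dicts. It is illustrative only: the function name `build_source_kwargs` is hypothetical, and how the importer reads the database, schema, and table name from the manifest is an assumption (dbt model nodes typically carry `database`, `schema`, and `alias` fields; on BigQuery the database is the project and the schema is the dataset).

```python
def build_source_kwargs(source_type, database, schema, table):
    """Return location keyword arguments in the shape each source type uses."""
    if source_type == "bigquery":
        # BigQuerySource takes one fully qualified table path.
        return {"table": f"{database}.{schema}.{table}"}
    if source_type == "snowflake":
        # SnowflakeSource takes the three parts separately.
        return {"database": database, "schema": schema, "table": table}
    if source_type == "file":
        # Placeholder path; it must be edited to point at real data files.
        return {"path": f"/data/{table}.parquet"}
    raise ValueError(f"Unsupported data source type: {source_type}")


print(build_source_kwargs("bigquery", "my_project", "my_dataset", "driver_features"))
```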
| 292 | +## Best Practices |
| 293 | + |
| 294 | +### 1. Use consistent tagging |
| 295 | + |
| 296 | +Create a standard tagging convention in your dbt project: |
| 297 | + |
| 298 | +```yaml |
| 299 | +# dbt_project.yml |
| 300 | +models: |
| 301 | +  my_project: |
| 302 | +    features: |
| 303 | +      +tags: ['feast']  # All models in features/ get the feast tag |
| 304 | +``` |
| 305 | + |
| 306 | +### 2. Document your columns |
| 307 | + |
| 308 | +Column descriptions from `schema.yml` are preserved in the generated Feast definitions, making your feature catalog self-documenting. |
| 309 | + |
| 310 | +### 3. Review before committing |
| 311 | + |
| 312 | +Use `--dry-run` to preview what will be generated: |
| 313 | + |
| 314 | +```bash |
| 315 | +feast dbt import manifest.json -e user_id -d bigquery --dry-run |
| 316 | +``` |
| 317 | + |
| 318 | +### 4. Version control generated code |
| 319 | + |
| 320 | +Commit the generated Python files to your repository. This allows you to: |
| 321 | +- Track changes to feature definitions over time |
| 322 | +- Review dbt-to-Feast mapping in pull requests |
| 323 | +- Customize generated code if needed |
| 324 | + |
| 325 | +### 5. Integrate with CI/CD |
| 326 | + |
| 327 | +Add dbt import to your CI pipeline: |
| 328 | + |
| 329 | +```yaml |
| 330 | +# .github/workflows/features.yml |
| 331 | +- name: Compile dbt |
| 332 | +  run: dbt compile |
| 333 | + |
| 334 | +- name: Generate Feast definitions |
| 335 | +  run: | |
| 336 | +    feast dbt import target/manifest.json \ |
| 337 | +      -e user_id -d bigquery -t feast \ |
| 338 | +      -o feature_repo/features.py |
| 339 | + |
| 340 | +- name: Apply Feast changes |
| 341 | +  run: feast -c feature_repo apply  # assumes feature_store.yaml lives in feature_repo/ |
| 342 | +``` |
| 343 | + |
| 344 | +## Limitations |
| 345 | + |
| 346 | +- **Single entity support**: Currently supports one entity column per import. For multi-entity models, run multiple imports or manually adjust the generated code. |
| 347 | +- **No incremental updates**: Each import generates a complete file. Use version control to track changes. |
| 348 | +- **Column types default to `String`**: Columns without a `data_type` in `schema.yml` are imported as `String`, so declare types explicitly for any non-string feature. |
| 349 | + |
| 350 | +## Troubleshooting |
| 351 | + |
| 352 | +### "manifest.json not found" |
| 353 | + |
| 354 | +Run `dbt compile` or `dbt run` first to generate the manifest file. |
| 355 | + |
| 356 | +### "No models found with tag" |
| 357 | + |
| 358 | +Check that your models have the correct tag in their config: |
| 359 | + |
| 360 | +```sql |
| 361 | +{{ config(tags=['feast']) }} |
| 362 | +``` |
| 363 | + |
| 364 | +### "Missing entity column" |
| 365 | + |
| 366 | +Ensure your dbt model includes the entity column specified with `--entity-column`. Models missing this column are skipped with a warning. |
| 367 | + |
| 368 | +### "Missing timestamp column" |
| 369 | + |
| 370 | +By default, Feast looks for `event_timestamp`. Use `--timestamp-field` to specify a different column name. |
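
Both column-related problems above can be checked before running the import. The following sketch inspects a single model node from the manifest and reports which required columns it does not declare. It is a hypothetical pre-flight helper, not part of the Feast CLI. Note that a manifest's `columns` entry only lists columns documented in `schema.yml`, so a column that exists in the SQL but is undocumented would also show up as missing here.

```python
def missing_required_columns(model, entity_column,
                             timestamp_field="event_timestamp"):
    """Return the required column names a manifest model node does not declare."""
    declared = set(model.get("columns", {}))
    return [name for name in (entity_column, timestamp_field)
            if name not in declared]


# Hypothetical model node, trimmed to the fields used here.
model = {
    "name": "driver_features",
    "columns": {
        "driver_id": {"name": "driver_id", "data_type": "STRING"},
        "created_at": {"name": "created_at", "data_type": "TIMESTAMP"},
    },
}

missing = missing_required_columns(model, entity_column="driver_id")
if missing:
    print(f"{model['name']} is missing: {', '.join(missing)}")
```

In this example the timestamp lives in `created_at`, so passing `timestamp_field="created_at"` (the equivalent of `--timestamp-field created_at`) resolves the report.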