Commit 53932ff

docs: Add dbt integration documentation

Add comprehensive documentation for the new dbt integration feature:

- Quick start guide with step-by-step instructions
- CLI reference for `feast dbt list` and `feast dbt import`
- Type mapping table for dbt to Feast types
- Data source configuration examples (BigQuery, Snowflake, File)
- Best practices for tagging, documentation, and CI/CD
- Troubleshooting section

Addresses review feedback from @franciscojavierarceo.

Signed-off-by: YassinNouh21 <yassinnouh21@gmail.com>
Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

1 parent fb40e93

2 files changed: 371 additions, 0 deletions

docs/SUMMARY.md (1 addition, 0 deletions)

@@ -76,6 +76,7 @@
 * [Adding a custom provider](how-to-guides/customizing-feast/creating-a-custom-provider.md)
 * [Adding or reusing tests](how-to-guides/adding-or-reusing-tests.md)
 * [Starting Feast servers in TLS(SSL) Mode](how-to-guides/starting-feast-servers-tls-mode.md)
+* [Importing Features from dbt](how-to-guides/dbt-integration.md)

 ## Reference

docs/how-to-guides/dbt-integration.md (new file, 370 additions, 0 deletions)

# Importing Features from dbt

This guide explains how to use Feast's dbt integration to automatically import dbt models as Feast FeatureViews. This enables you to leverage your existing dbt transformations as feature definitions without manual duplication.

## Overview

[dbt (data build tool)](https://www.getdbt.com/) is a popular tool for transforming data in your warehouse. Many teams already use dbt to create feature tables. Feast's dbt integration allows you to:

- **Discover** dbt models tagged for feature engineering
- **Import** model metadata (columns, types, descriptions) as Feast objects
- **Generate** Python code for Entity, DataSource, and FeatureView definitions

This eliminates the need to manually define Feast objects that mirror your dbt models.

## Prerequisites

- A dbt project with compiled artifacts (`target/manifest.json`)
- Feast installed with dbt support:

```bash
pip install 'feast[dbt]'
```

Or install the parser directly:

```bash
pip install dbt-artifacts-parser
```

## Quick Start

### 1. Tag your dbt models

In your dbt project, add a `feast` tag to models you want to import:

{% code title="models/driver_features.sql" %}
```sql
{{ config(
    materialized='table',
    tags=['feast']
) }}

SELECT
    driver_id,
    event_timestamp,
    avg_rating,
    total_trips,
    is_active
FROM {{ ref('stg_drivers') }}
```
{% endcode %}

### 2. Define column types in schema.yml

Feast uses column metadata from your `schema.yml` to determine feature types:

{% code title="models/schema.yml" %}
```yaml
version: 2
models:
  - name: driver_features
    description: "Driver aggregated features for ML models"
    columns:
      - name: driver_id
        description: "Unique driver identifier"
        data_type: STRING
      - name: event_timestamp
        description: "Feature timestamp"
        data_type: TIMESTAMP
      - name: avg_rating
        description: "Average driver rating"
        data_type: FLOAT64
      - name: total_trips
        description: "Total completed trips"
        data_type: INT64
      - name: is_active
        description: "Whether driver is currently active"
        data_type: BOOLEAN
```
{% endcode %}

### 3. Compile your dbt project

```bash
cd your_dbt_project
dbt compile
```

This generates `target/manifest.json`, which Feast reads.
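
To check what Feast will read, you can inspect the compiled manifest directly. The following is a minimal, standard-library-only sketch (not Feast's implementation) that assumes the usual dbt manifest layout: a top-level `nodes` mapping whose model entries carry a `columns` mapping, with each column's `data_type` copied from `schema.yml`. The helper names `load_manifest` and `column_types` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Parse the manifest that `dbt compile` writes (hypothetical helper)."""
    return json.loads(Path(path).read_text())


def column_types(manifest: dict, model_name: str) -> Dict[str, Optional[str]]:
    """Return {column_name: data_type} for one dbt model in a parsed manifest.

    Columns declared without a data_type in schema.yml map to None.
    """
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return {
                name: column.get("data_type")
                for name, column in node.get("columns", {}).items()
            }
    raise KeyError(f"model {model_name!r} not found in manifest")


# Usage, from the dbt project directory after `dbt compile`:
# print(column_types(load_manifest(), "driver_features"))
```

If a column you expect to become a feature is missing from this output, it is usually not declared in `schema.yml`.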

### 4. List available models

Use the Feast CLI to discover tagged models:

```bash
feast dbt list target/manifest.json --tag-filter feast
```

Output:

```
Found 1 model(s) with tag 'feast':

driver_features
Description: Driver aggregated features for ML models
Columns: driver_id, event_timestamp, avg_rating, total_trips, is_active
Tags: feast
```
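
The tag filtering behind `feast dbt list --tag-filter` can be approximated in a few lines. This is a hedged sketch, not Feast's code: it assumes each model node in the manifest has the standard dbt fields `resource_type`, `name`, `description`, `tags`, and `columns`, and the function names `models_with_tag` and `format_listing` are hypothetical. The printed layout only approximates the CLI output shown above.

```python
from typing import List


def models_with_tag(manifest: dict, tag: str) -> List[dict]:
    """Return model nodes whose dbt tags include `tag`, sorted by name.

    Non-model nodes (tests, seeds, snapshots) are ignored even if tagged.
    """
    models = [
        node
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model" and tag in node.get("tags", [])
    ]
    return sorted(models, key=lambda n: n["name"])


def format_listing(models: List[dict], tag: str) -> str:
    """Render a listing similar in spirit to `feast dbt list` (layout is approximate)."""
    lines = [f"Found {len(models)} model(s) with tag '{tag}':", ""]
    for node in models:
        lines.append(node["name"])
        lines.append(f"Description: {node.get('description', '')}")
        lines.append(f"Columns: {', '.join(node.get('columns', {}))}")
        lines.append(f"Tags: {', '.join(node.get('tags', []))}")
    return "\n".join(lines)
```

Filtering on `resource_type` matters because dbt tests and other nodes can inherit tags from folder-level config.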

### 5. Import models as Feast definitions

Generate a Python file with Feast object definitions:

```bash
feast dbt import target/manifest.json \
    --entity-column driver_id \
    --data-source-type bigquery \
    --tag-filter feast \
    --output features/driver_features.py
```

This generates:

{% code title="features/driver_features.py" %}
```python
"""
Feast feature definitions generated from dbt models.

Source: target/manifest.json
Project: my_dbt_project
Generated by: feast dbt import
"""

from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Bool, Float64, Int64
from feast.infra.offline_stores.bigquery_source import BigQuerySource


# Entities
driver_id = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Entity key for dbt models",
    tags={'source': 'dbt'},
)


# Data Sources
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="event_timestamp",
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)


# Feature Views
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_rating", dtype=Float64, description="Average driver rating"),
        Field(name="total_trips", dtype=Int64, description="Total completed trips"),
        Field(name="is_active", dtype=Bool, description="Whether driver is currently active"),
    ],
    online=True,
    source=driver_features_source,
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)
```
{% endcode %}

## CLI Reference

### `feast dbt list`

Discover dbt models available for import.

```bash
feast dbt list <manifest_path> [OPTIONS]
```

**Arguments:**
- `manifest_path`: Path to dbt's `manifest.json` file

**Options:**
- `--tag-filter`, `-t`: Filter models by dbt tag (e.g., `feast`)
- `--model`, `-m`: Filter to specific model name(s)

### `feast dbt import`

Import dbt models as Feast object definitions.

```bash
feast dbt import <manifest_path> [OPTIONS]
```

**Arguments:**
- `manifest_path`: Path to dbt's `manifest.json` file

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `--entity-column`, `-e` | Column to use as entity key | (required) |
| `--data-source-type`, `-d` | Data source type: `bigquery`, `snowflake`, `file` | `bigquery` |
| `--tag-filter`, `-t` | Filter models by dbt tag | None |
| `--model`, `-m` | Import specific model(s) only | None |
| `--timestamp-field` | Timestamp column name | `event_timestamp` |
| `--ttl-days` | Feature TTL in days | `1` |
| `--exclude-columns` | Columns to exclude from features | None |
| `--no-online` | Disable online serving | `False` |
| `--output`, `-o` | Output Python file path | None (stdout) |
| `--dry-run` | Preview without generating code | `False` |
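
In the generated example from step 5, the entity column (`driver_id`) and the timestamp column (`event_timestamp`) do not appear in the FeatureView `schema`; only the remaining columns become `Field`s. The sketch below illustrates that selection rule together with `--exclude-columns`. It is an assumption inferred from the example output rather than Feast's actual code, and `select_feature_columns` is a hypothetical name.

```python
from typing import Iterable, List


def select_feature_columns(
    columns: Iterable[str],
    entity_column: str,
    timestamp_field: str = "event_timestamp",
    exclude_columns: Iterable[str] = (),
) -> List[str]:
    """Return the columns that would become feature Fields, preserving order.

    Hypothetical illustration: drops the entity key, the event timestamp,
    and anything passed via --exclude-columns.
    """
    skip = {entity_column, timestamp_field, *exclude_columns}
    return [col for col in columns if col not in skip]


# Mirrors the Quick Start model:
cols = ["driver_id", "event_timestamp", "avg_rating", "total_trips", "is_active"]
print(select_feature_columns(cols, entity_column="driver_id"))
# → ['avg_rating', 'total_trips', 'is_active']
```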

## Type Mapping

Feast automatically maps dbt/warehouse column types to Feast types:

| dbt/SQL Type | Feast Type |
|--------------|------------|
| `STRING`, `VARCHAR`, `TEXT` | `String` |
| `INT`, `INTEGER`, `BIGINT`, `INT64` | `Int64` |
| `SMALLINT`, `TINYINT` | `Int32` |
| `FLOAT`, `REAL` | `Float32` |
| `DOUBLE`, `FLOAT64` | `Float64` |
| `BOOLEAN`, `BOOL` | `Bool` |
| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
| `BYTES`, `BINARY` | `Bytes` |
| `ARRAY<type>` | `Array(type)` |

Snowflake `NUMBER(precision, scale)` types are handled specially, with the rules applied in order:
- Scale > 0: `Float64`
- Precision <= 9: `Int32`
- Precision <= 18: `Int64`
- Precision > 18: `Float64`
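
The mapping table and the Snowflake rules can be expressed as a small lookup function. The sketch below returns Feast type names as plain strings so it runs without Feast installed. It also maps `INT64`, which the Quick Start `schema.yml` uses for `total_trips` and which the generated code types as `Int64`. The function name, the regexes, returning `String` for a missing type (per the Limitations section), and returning `None` for unrecognized types are assumptions for illustration, not Feast's implementation.

```python
import re
from typing import Optional

# Scalar mappings from the Type Mapping table (keys upper-cased).
_SCALAR_TYPES = {
    "STRING": "String", "VARCHAR": "String", "TEXT": "String",
    "INT": "Int64", "INTEGER": "Int64", "BIGINT": "Int64", "INT64": "Int64",
    "SMALLINT": "Int32", "TINYINT": "Int32",
    "FLOAT": "Float32", "REAL": "Float32",
    "DOUBLE": "Float64", "FLOAT64": "Float64",
    "BOOLEAN": "Bool", "BOOL": "Bool",
    "TIMESTAMP": "UnixTimestamp", "DATETIME": "UnixTimestamp",
    "BYTES": "Bytes", "BINARY": "Bytes",
}

_NUMBER_RE = re.compile(r"^NUMBER\(\s*(\d+)\s*,\s*(\d+)\s*\)$")
_ARRAY_RE = re.compile(r"^ARRAY<(.+)>$")


def to_feast_type(sql_type: Optional[str]) -> Optional[str]:
    """Map a dbt/warehouse type string to a Feast type name (illustrative only)."""
    if not sql_type:
        return "String"  # no data_type declared: default described under Limitations
    t = sql_type.strip().upper()

    # Snowflake NUMBER(precision, scale), checked in the documented order.
    m = _NUMBER_RE.match(t)
    if m:
        precision, scale = int(m.group(1)), int(m.group(2))
        if scale > 0:
            return "Float64"
        if precision <= 9:
            return "Int32"
        if precision <= 18:
            return "Int64"
        return "Float64"

    # ARRAY<type> maps its element type recursively.
    m = _ARRAY_RE.match(t)
    if m:
        inner = to_feast_type(m.group(1))
        return f"Array({inner})" if inner else None

    return _SCALAR_TYPES.get(t)  # None means "not covered by the table"


print(to_feast_type("NUMBER(38,0)"))
# → Float64
```

Note that `NUMBER(38,0)`, Snowflake's default for integer columns, lands in `Float64` under these rules because its precision exceeds 18.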

## Data Source Configuration

### BigQuery

```bash
feast dbt import manifest.json -e user_id -d bigquery -o features.py
```

Generates a `BigQuerySource` with the full table path from dbt metadata:
```python
BigQuerySource(
    table="project.dataset.table_name",
    # ...
)
```

### Snowflake

```bash
feast dbt import manifest.json -e user_id -d snowflake -o features.py
```

Generates a `SnowflakeSource` with database, schema, and table:
```python
SnowflakeSource(
    database="MY_DB",
    schema="MY_SCHEMA",
    table="TABLE_NAME",
    # ...
)
```

### File

```bash
feast dbt import manifest.json -e user_id -d file -o features.py
```

Generates a `FileSource` with a placeholder path:
```python
FileSource(
    path="/data/table_name.parquet",
    # ...
)
```

{% hint style="info" %}
For file sources, update the generated path to point to your actual data files.
{% endhint %}
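
The table reference in each generated source comes from the model's location in the manifest. As a hedged sketch, assuming the generator uses the node's standard dbt fields `database`, `schema`, and `alias` (on BigQuery, `database` is the GCP project and `schema` the dataset), the mapping looks roughly like this. The function `source_kwargs` is hypothetical, and the `/data/<table>.parquet` path mirrors the placeholder shown above.

```python
from typing import Dict


def source_kwargs(node: dict, source_type: str) -> Dict[str, str]:
    """Build illustrative keyword arguments for a Feast data source from a dbt model node.

    Hypothetical helper: assumes the table name is the node's alias (falling back
    to its name) and that database/schema locate the table in the warehouse.
    """
    table = node.get("alias") or node["name"]
    if source_type == "bigquery":
        # BigQuery: database = project, schema = dataset.
        return {"table": f"{node['database']}.{node['schema']}.{table}"}
    if source_type == "snowflake":
        return {"database": node["database"], "schema": node["schema"], "table": table}
    if source_type == "file":
        # Placeholder path; replace with the real location of your data files.
        return {"path": f"/data/{table}.parquet"}
    raise ValueError(f"unsupported data source type: {source_type!r}")


node = {"name": "driver_features", "alias": "driver_features",
        "database": "my_project", "schema": "my_dataset"}
print(source_kwargs(node, "bigquery"))
# → {'table': 'my_project.my_dataset.driver_features'}
```

Preferring `alias` over `name` reflects that dbt materializes a model under its alias when one is configured.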

## Best Practices

### 1. Use consistent tagging

Create a standard tagging convention in your dbt project:

```yaml
# dbt_project.yml
models:
  my_project:
    features:
      +tags: ['feast']  # All models in features/ get the feast tag
```

### 2. Document your columns

Column descriptions from `schema.yml` are preserved in the generated Feast definitions, making your feature catalog self-documenting.

### 3. Review before committing

Use `--dry-run` to preview what will be generated:

```bash
feast dbt import manifest.json -e user_id -d bigquery --dry-run
```

### 4. Version control generated code

Commit the generated Python files to your repository. This allows you to:
- Track changes to feature definitions over time
- Review the dbt-to-Feast mapping in pull requests
- Customize generated code if needed

### 5. Integrate with CI/CD

Add the dbt import step to your CI pipeline:

```yaml
# .github/workflows/features.yml
- name: Compile dbt
  run: dbt compile

- name: Generate Feast definitions
  run: |
    feast dbt import target/manifest.json \
      -e user_id -d bigquery -t feast \
      -o feature_repo/features.py

- name: Apply Feast changes
  working-directory: feature_repo  # directory containing feature_store.yaml
  run: feast apply
```

## Limitations

- **Single entity support**: Currently supports one entity column per import. For multi-entity models, run multiple imports or manually adjust the generated code.
- **No incremental updates**: Each import generates a complete file. Use version control to track changes.
- **Column types required**: Columns without a `data_type` in `schema.yml` default to the `String` type.

## Troubleshooting

### "manifest.json not found"

Run `dbt compile` or `dbt run` first to generate the manifest file.

### "No models found with tag"

Check that your models have the correct tag in their config:

```sql
{{ config(tags=['feast']) }}
```

### "Missing entity column"

Ensure your dbt model includes the entity column specified with `--entity-column`. Models missing this column are skipped with a warning.

### "Missing timestamp column"

By default, Feast looks for `event_timestamp`. Use `--timestamp-field` to specify a different column name.
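
Before running an import, you can check which tagged models would be skipped for a missing entity or timestamp column. This hedged sketch inspects the `columns` metadata in the manifest, which only lists columns declared in `schema.yml`; it assumes that is where the import looks for them. The function name `missing_required_columns` is hypothetical.

```python
import warnings
from typing import Dict, List


def missing_required_columns(
    manifest: dict,
    entity_column: str,
    timestamp_field: str = "event_timestamp",
    tag: str = "feast",
) -> Dict[str, List[str]]:
    """Map each tagged model name to the required columns it lacks.

    Only columns declared in schema.yml appear in the manifest, so an
    undocumented column is reported as missing here.
    """
    problems: Dict[str, List[str]] = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model" or tag not in node.get("tags", []):
            continue
        columns = node.get("columns", {})
        missing = [c for c in (entity_column, timestamp_field) if c not in columns]
        if missing:
            problems[node["name"]] = missing
            warnings.warn(f"model {node['name']!r} is missing: {', '.join(missing)}")
    return problems
```

Running this with the same `--entity-column` and `--timestamp-field` values you pass to `feast dbt import` shows in advance which models would be left out.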
