Commit b997361

feat: Add dbt integration for importing models as FeatureViews (feast-dev#5827)
* feat: Add dbt integration for importing models as FeatureViews (feast-dev#3335)

  This PR implements the dbt-Feast integration feature requested in feast-dev#3335, enabling users to import dbt models as Feast FeatureViews.

  ## New CLI Commands

  - `feast dbt list` - List dbt models available for import
  - `feast dbt import` - Import dbt models as Feast objects

  ## Features

  - Parse dbt manifest.json files to extract model metadata
  - Map dbt types to Feast types (38 types supported)
  - Generate Entity, DataSource, and FeatureView objects
  - Support for BigQuery, Snowflake, and File data sources
  - Tag-based filtering (--tag) to select specific models
  - Code generation (--output) to create Python files
  - Dry-run mode to preview changes before applying

  ## Usage Examples

  ```bash
  # List models with 'feast' tag
  feast dbt list -m target/manifest.json --tag feast

  # Import models to registry
  feast dbt import -m target/manifest.json -e driver_id --tag feast

  # Generate Python file instead
  feast dbt import -m target/manifest.json -e driver_id --output features.py
  ```

  Closes feast-dev#3335

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Address mypy and ruff lint errors in dbt integration

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Address ruff lint errors in dbt unit tests

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* style: Format dbt files with ruff

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Remove unused dbt-artifacts-parser import and fix enum import

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* feat: Use dbt-artifacts-parser for typed manifest parsing

  - Add dbt-artifacts-parser as optional dependency (feast[dbt])
  - Update parser to use typed parsing with fallback to raw dict
  - Provides better support for manifest versions v1-v12

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Add graceful fallback for dbt-artifacts-parser validation errors

  When parsing minimal/incomplete manifests (e.g., in unit tests), dbt-artifacts-parser may fail validation. This change adds a graceful fallback to use raw dict parsing when typed parsing fails. Also updated test fixture with dbt_schema_version field.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Skip dbt tests when dbt-artifacts-parser is not installed

  Since dbt-artifacts-parser is an optional dependency, unit tests should be skipped in CI when it's not installed.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* refactor: Simplify parser to rely solely on dbt-artifacts-parser

  Removed manual/fallback dict parsing code. The parser now exclusively uses dbt-artifacts-parser typed objects. Updated test fixtures to create complete manifests that dbt-artifacts-parser can parse.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* ci: Add dbt-artifacts-parser to unit test dependencies

  Install dbt-artifacts-parser in CI so dbt unit tests run instead of being skipped.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Address Copilot code review comments for dbt integration

  - mapper.py: Fix Array element type check to use set membership instead of incorrect isinstance() comparison
  - codegen.py: Add safe getattr() with fallback for Array.base_type access

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Only add ellipsis to truncated descriptions

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* style: Format dbt files with ruff

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Convert doctest examples to code blocks to avoid CI failures

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Add dbt-artifacts-parser to feast[ci] and update requirements

  - Add dbt-artifacts-parser to pyproject.toml under feast[dbt] and feast[ci] extras
  - Remove separate install step from unit_tests.yml workflow
  - Update all requirements lock files

  Addresses review feedback from @ntkathole.

  Signed-off-by: YassinNouh21 <yassinnouh21@gmail.com>
  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* docs: Add dbt integration documentation

  Add comprehensive documentation for the new dbt integration feature:

  - Quick start guide with step-by-step instructions
  - CLI reference for `feast dbt list` and `feast dbt import`
  - Type mapping table for dbt to Feast types
  - Data source configuration examples (BigQuery, Snowflake, File)
  - Best practices for tagging, documentation, and CI/CD
  - Troubleshooting section

  Addresses review feedback from @franciscojavierarceo.

  Signed-off-by: YassinNouh21 <yassinnouh21@gmail.com>
  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* docs: Add alpha warning to dbt integration documentation

  Add prominent warning callout highlighting that the dbt integration is an alpha feature with current limitations. This sets proper expectations for users regarding:

  - Supported data sources (BigQuery, Snowflake, File only)
  - Single entity per model constraint
  - Potential for breaking changes in future releases

  Addresses feedback from PR feast-dev#5827 review comments.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Add dbt-artifacts-parser to CI_REQUIRED dependencies

  Ensure dbt-artifacts-parser is installed in CI environments by adding it to the CI_REQUIRED list in setup.py. This matches the dependency already present in pyproject.toml and ensures CI tests for dbt integration have access to the required parser library.

  Addresses feedback from PR feast-dev#5827 review comments.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Add defensive Array.base_type handling with logging

  Add logging and defensive attribute access for Array.base_type in code generation to prevent potential AttributeError. While Array.__init__ always sets base_type, defensive programming with warnings provides:

  - Protection against edge cases or future Array implementation changes
  - Clear visibility when fallback occurs via logger.warning
  - Consistent error handling across both usage sites

  Changes:

  - Add logging module and logger instance
  - Update _get_feast_type_name() to use getattr with warning
  - Update import tracking logic to use getattr with warning
  - Add concise comments with examples (e.g., Array(String) -> base_type = String)

  Addresses code review feedback from PR feast-dev#5827.

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* docs: Add comment explaining ImageBytes/PdfBytes exclusion

  Add clarifying comment in type_map explaining why ImageBytes and PdfBytes are not included in the dbt type mapping. While these types exist in Feast, dbt manifests only expose generic BYTES type without semantic information to distinguish between regular bytes, images, or PDFs.

  Example: A dbt model with image and PDF columns both appear as 'BYTES' in the manifest, making ImageBytes/PdfBytes types unmappable from dbt artifacts.

  Addresses feedback from PR feast-dev#5827 review (franciscojavierarceo).

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

* fix: Move imports to top of file to resolve linter errors

  - Fix E402 linter error in feast/dbt/codegen.py by moving imports before logger initialization
  - Update requirements files to include dbt-artifacts-parser in pydantic dependency comments

  Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>

---------

Signed-off-by: yassinnouh21 <yassinnouh21@gmail.com>
Signed-off-by: YassinNouh21 <yassinnouh21@gmail.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
1 parent f6116f9 commit b997361

File tree

17 files changed

+2692
-0
lines changed

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -76,6 +76,7 @@
 * [Adding a custom provider](how-to-guides/customizing-feast/creating-a-custom-provider.md)
 * [Adding or reusing tests](how-to-guides/adding-or-reusing-tests.md)
 * [Starting Feast servers in TLS(SSL) Mode](how-to-guides/starting-feast-servers-tls-mode.md)
+* [Importing Features from dbt](how-to-guides/dbt-integration.md)

 ## Reference

Lines changed: 381 additions & 0 deletions
@@ -0,0 +1,381 @@
1+
# Importing Features from dbt
2+
3+
{% hint style="warning" %}
4+
**Alpha Feature**: The dbt integration is currently in early development and subject to change.
5+
6+
**Current Limitations**:
7+
- Supported data sources: BigQuery, Snowflake, and File-based sources only
8+
- Single entity per model
9+
- Manual entity column specification required
10+
11+
Breaking changes may occur in future releases.
12+
{% endhint %}
13+
14+
This guide explains how to use Feast's dbt integration to import dbt models as Feast FeatureViews, so that existing dbt transformations can serve as feature definitions without being redefined by hand.
15+
16+
## Overview
17+
18+
[dbt (data build tool)](https://www.getdbt.com/) is a popular tool for transforming data in your warehouse. Many teams already use dbt to create feature tables. Feast's dbt integration allows you to:
19+
20+
- **Discover** dbt models tagged for feature engineering
21+
- **Import** model metadata (columns, types, descriptions) as Feast objects
22+
- **Generate** Python code for Entity, DataSource, and FeatureView definitions
23+
24+
This eliminates the need to manually define Feast objects that mirror your dbt models.
25+
26+
## Prerequisites
27+
28+
- A dbt project with compiled artifacts (`target/manifest.json`)
29+
- Feast installed with dbt support:
30+
31+
```bash
32+
pip install 'feast[dbt]'
33+
```
34+
35+
Or install the parser directly:
36+
37+
```bash
38+
pip install dbt-artifacts-parser
39+
```
40+
41+
## Quick Start
42+
43+
### 1. Tag your dbt models
44+
45+
In your dbt project, add a `feast` tag to models you want to import:
46+
47+
{% code title="models/driver_features.sql" %}
48+
```sql
{{ config(
    materialized='table',
    tags=['feast']
) }}

SELECT
    driver_id,
    event_timestamp,
    avg_rating,
    total_trips,
    is_active
FROM {{ ref('stg_drivers') }}
```
62+
{% endcode %}
63+
64+
### 2. Define column types in schema.yml
65+
66+
Feast uses column metadata from your `schema.yml` to determine feature types:
67+
68+
{% code title="models/schema.yml" %}
69+
```yaml
version: 2
models:
  - name: driver_features
    description: "Driver aggregated features for ML models"
    columns:
      - name: driver_id
        description: "Unique driver identifier"
        data_type: STRING
      - name: event_timestamp
        description: "Feature timestamp"
        data_type: TIMESTAMP
      - name: avg_rating
        description: "Average driver rating"
        data_type: FLOAT64
      - name: total_trips
        description: "Total completed trips"
        data_type: INT64
      - name: is_active
        description: "Whether driver is currently active"
        data_type: BOOLEAN
```
91+
{% endcode %}
92+
93+
### 3. Compile your dbt project
94+
95+
```bash
96+
cd your_dbt_project
97+
dbt compile
98+
```
99+
100+
This generates `target/manifest.json`, the file Feast reads to discover your models and their column metadata.
101+
102+
### 4. List available models
103+
104+
Use the Feast CLI to discover tagged models:
105+
106+
```bash
107+
feast dbt list target/manifest.json --tag-filter feast
108+
```
109+
110+
Output:
111+
```
112+
Found 1 model(s) with tag 'feast':
113+
114+
driver_features
115+
Description: Driver aggregated features for ML models
116+
Columns: driver_id, event_timestamp, avg_rating, total_trips, is_active
117+
Tags: feast
118+
```
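
To make the tag filter concrete, here is a minimal sketch of how tagged models could be picked out of a parsed manifest. It is not Feast's implementation (which, per the commit message, relies on `dbt-artifacts-parser` typed objects); it works on a plain dict, and the helper name `list_tagged_models` and the in-memory `example_manifest` are hypothetical. The `nodes`, `resource_type`, `name`, and `tags` keys follow dbt's manifest layout.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a dbt manifest.json from disk into a plain dict."""
    return json.loads(Path(path).read_text())


def list_tagged_models(manifest: dict, tag: str) -> list[dict]:
    """Return model nodes that carry `tag`, sorted by model name."""
    models = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        if tag in node.get("tags", []):
            models.append(node)
    return sorted(models, key=lambda node: node["name"])


# Hypothetical manifest fragment shaped like dbt's `nodes` section.
example_manifest = {
    "nodes": {
        "model.my_dbt_project.driver_features": {
            "resource_type": "model",
            "name": "driver_features",
            "tags": ["feast"],
        },
        "model.my_dbt_project.stg_drivers": {
            "resource_type": "model",
            "name": "stg_drivers",
            "tags": [],
        },
    }
}

tagged_names = [m["name"] for m in list_tagged_models(example_manifest, "feast")]
print(tagged_names)
```

With a real project, `load_manifest("target/manifest.json")` would supply the dict instead of the in-memory example.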
119+
120+
### 5. Import models as Feast definitions
121+
122+
Generate a Python file with Feast object definitions:
123+
124+
```bash
feast dbt import target/manifest.json \
    --entity-column driver_id \
    --data-source-type bigquery \
    --tag-filter feast \
    --output features/driver_features.py
```
131+
132+
This generates:
133+
134+
{% code title="features/driver_features.py" %}
135+
```python
"""
Feast feature definitions generated from dbt models.

Source: target/manifest.json
Project: my_dbt_project
Generated by: feast dbt import
"""

from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Bool, Float64, Int64
from feast.infra.offline_stores.bigquery_source import BigQuerySource


# Entities
driver_id = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Entity key for dbt models",
    tags={'source': 'dbt'},
)


# Data Sources
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="event_timestamp",
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)


# Feature Views
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_rating", dtype=Float64, description="Average driver rating"),
        Field(name="total_trips", dtype=Int64, description="Total completed trips"),
        Field(name="is_active", dtype=Bool, description="Whether driver is currently active"),
    ],
    online=True,
    source=driver_features_source,
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)
```
186+
{% endcode %}
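
The generated FeatureView lists only `avg_rating`, `total_trips`, and `is_active` in its schema, leaving out the entity and timestamp columns. A minimal sketch of that selection rule follows. It assumes the rule is "drop the entity column, the timestamp field, and anything passed to `--exclude-columns`"; the helper `select_feature_columns` is hypothetical and not part of Feast.

```python
def select_feature_columns(
    columns,
    entity_column,
    timestamp_field="event_timestamp",
    exclude_columns=(),
):
    """Return the columns that would become Field entries, in original order."""
    skipped = {entity_column, timestamp_field, *exclude_columns}
    return [name for name in columns if name not in skipped]


model_columns = ["driver_id", "event_timestamp", "avg_rating", "total_trips", "is_active"]

feature_columns = select_feature_columns(model_columns, entity_column="driver_id")
print(feature_columns)

# Mirrors passing an extra column via --exclude-columns.
without_trips = select_feature_columns(
    model_columns, entity_column="driver_id", exclude_columns=["total_trips"]
)
print(without_trips)
```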
187+
188+
## CLI Reference
189+
190+
### `feast dbt list`
191+
192+
Discover dbt models available for import.
193+
194+
```bash
195+
feast dbt list <manifest_path> [OPTIONS]
196+
```
197+
198+
**Arguments:**
199+
- `manifest_path`: Path to dbt's `manifest.json` file
200+
201+
**Options:**
202+
- `--tag-filter`, `-t`: Filter models by dbt tag (e.g., `feast`)
203+
- `--model`, `-m`: Filter to specific model name(s)
204+
205+
### `feast dbt import`
206+
207+
Import dbt models as Feast object definitions.
208+
209+
```bash
210+
feast dbt import <manifest_path> [OPTIONS]
211+
```
212+
213+
**Arguments:**
214+
- `manifest_path`: Path to dbt's `manifest.json` file
215+
216+
**Options:**
217+
218+
| Option | Description | Default |
|--------|-------------|---------|
| `--entity-column`, `-e` | Column to use as entity key | (required) |
| `--data-source-type`, `-d` | Data source type: `bigquery`, `snowflake`, `file` | `bigquery` |
| `--tag-filter`, `-t` | Filter models by dbt tag | None |
| `--model`, `-m` | Import specific model(s) only | None |
| `--timestamp-field` | Timestamp column name | `event_timestamp` |
| `--ttl-days` | Feature TTL in days | `1` |
| `--exclude-columns` | Columns to exclude from features | None |
| `--no-online` | Disable online serving | `False` |
| `--output`, `-o` | Output Python file path | None (stdout) |
| `--dry-run` | Preview without generating code | `False` |
230+
231+
## Type Mapping
232+
233+
Feast automatically maps dbt/warehouse column types to Feast types:
234+
235+
| dbt/SQL Type | Feast Type |
|--------------|------------|
| `STRING`, `VARCHAR`, `TEXT` | `String` |
| `INT`, `INTEGER`, `BIGINT` | `Int64` |
| `SMALLINT`, `TINYINT` | `Int32` |
| `FLOAT`, `REAL` | `Float32` |
| `DOUBLE`, `FLOAT64` | `Float64` |
| `BOOLEAN`, `BOOL` | `Bool` |
| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
| `BYTES`, `BINARY` | `Bytes` |
| `ARRAY<type>` | `Array(type)` |
246+
247+
Snowflake `NUMBER(precision, scale)` types are handled specially:
248+
- Scale > 0: `Float64`
249+
- Precision <= 9: `Int32`
250+
- Precision <= 18: `Int64`
251+
- Precision > 18: `Float64`
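
The table and the `NUMBER(precision, scale)` rules above can be sketched as a small lookup function. This is an illustration only, not the mapper shipped in Feast: `map_type` and `TYPE_MAP` are hypothetical names, Feast types are returned as strings rather than `feast.types` objects, and falling back to `String` for unrecognized types is an assumption (the guide only states that columns without a `data_type` default to `String`). `INT64` is included because the Quick Start maps it to `Int64`.

```python
import re

# Subset of the mapping table, keyed by upper-cased SQL type name.
TYPE_MAP = {
    "STRING": "String", "VARCHAR": "String", "TEXT": "String",
    "INT": "Int64", "INTEGER": "Int64", "BIGINT": "Int64", "INT64": "Int64",
    "SMALLINT": "Int32", "TINYINT": "Int32",
    "FLOAT": "Float32", "REAL": "Float32",
    "DOUBLE": "Float64", "FLOAT64": "Float64",
    "BOOLEAN": "Bool", "BOOL": "Bool",
    "TIMESTAMP": "UnixTimestamp", "DATETIME": "UnixTimestamp",
    "BYTES": "Bytes", "BINARY": "Bytes",
}

NUMBER_RE = re.compile(r"^NUMBER\(\s*(\d+)\s*,\s*(\d+)\s*\)$")
ARRAY_RE = re.compile(r"^ARRAY<(.+)>$")


def map_type(data_type):
    """Map a dbt/SQL type string to the name of a Feast type."""
    if not data_type:
        return "String"  # columns without data_type default to String
    normalized = data_type.strip().upper()

    array_match = ARRAY_RE.match(normalized)
    if array_match:
        return f"Array({map_type(array_match.group(1))})"

    number_match = NUMBER_RE.match(normalized)
    if number_match:
        precision, scale = int(number_match.group(1)), int(number_match.group(2))
        if scale > 0:
            return "Float64"
        if precision <= 9:
            return "Int32"
        if precision <= 18:
            return "Int64"
        return "Float64"

    # Assumption: unknown types fall back to String.
    return TYPE_MAP.get(normalized, "String")


print(map_type("NUMBER(10, 2)"), map_type("NUMBER(9,0)"), map_type("ARRAY<STRING>"))
```

Checking `scale > 0` first matters: a `NUMBER(5, 2)` column has a small precision but still holds fractional values, so it should not be mapped to an integer type.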
252+
253+
## Data Source Configuration
254+
255+
### BigQuery
256+
257+
```bash
258+
feast dbt import manifest.json -e user_id -d bigquery -o features.py
259+
```
260+
261+
Generates `BigQuerySource` with the full table path from dbt metadata:
262+
```python
BigQuerySource(
    table="project.dataset.table_name",
    ...
)
```
268+
269+
### Snowflake
270+
271+
```bash
272+
feast dbt import manifest.json -e user_id -d snowflake -o features.py
273+
```
274+
275+
Generates `SnowflakeSource` with database, schema, and table:
276+
```python
SnowflakeSource(
    database="MY_DB",
    schema="MY_SCHEMA",
    table="TABLE_NAME",
    ...
)
```
284+
285+
### File
286+
287+
```bash
288+
feast dbt import manifest.json -e user_id -d file -o features.py
289+
```
290+
291+
Generates `FileSource` with a placeholder path:
292+
```python
FileSource(
    path="/data/table_name.parquet",
    ...
)
```
298+
299+
{% hint style="info" %}
300+
For file sources, update the generated path to point to your actual data files.
301+
{% endhint %}
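
As a rough illustration of how the three source types differ, the sketch below derives the location arguments for each from a dbt model node. It is an assumption-laden sketch rather than Feast's code generator: `source_location_kwargs` is a hypothetical helper, it reads the `database`, `schema`, `alias`, and `name` keys that dbt records for each model, and the `/data/<table>.parquet` placeholder simply mirrors the example above.

```python
def source_location_kwargs(node: dict, data_source_type: str) -> dict:
    """Build the location keyword arguments for a data source of the given type."""
    database = node["database"]
    schema = node["schema"]
    table = node.get("alias") or node["name"]  # dbt uses the alias as the relation name

    if data_source_type == "bigquery":
        return {"table": f"{database}.{schema}.{table}"}
    if data_source_type == "snowflake":
        return {"database": database, "schema": schema, "table": table}
    if data_source_type == "file":
        # Placeholder path; it must be edited to point at real data files.
        return {"path": f"/data/{table}.parquet"}
    raise ValueError(f"Unsupported data source type: {data_source_type}")


# Hypothetical node values for illustration.
example_node = {
    "database": "my_project",
    "schema": "my_dataset",
    "name": "driver_features",
    "alias": "driver_features",
}

bigquery_kwargs = source_location_kwargs(example_node, "bigquery")
snowflake_kwargs = source_location_kwargs(example_node, "snowflake")
file_kwargs = source_location_kwargs(example_node, "file")
print(bigquery_kwargs, snowflake_kwargs, file_kwargs)
```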
302+
303+
## Best Practices
304+
305+
### 1. Use consistent tagging
306+
307+
Create a standard tagging convention in your dbt project:
308+
309+
```yaml
# dbt_project.yml
models:
  my_project:
    features:
      +tags: ['feast']  # All models in features/ get the feast tag
```
316+
317+
### 2. Document your columns
318+
319+
Column descriptions from `schema.yml` are preserved in the generated Feast definitions, making your feature catalog self-documenting.
320+
321+
### 3. Review before committing
322+
323+
Use `--dry-run` to preview what will be generated:
324+
325+
```bash
326+
feast dbt import manifest.json -e user_id -d bigquery --dry-run
327+
```
328+
329+
### 4. Version control generated code
330+
331+
Commit the generated Python files to your repository. This allows you to:
332+
- Track changes to feature definitions over time
333+
- Review dbt-to-Feast mapping in pull requests
334+
- Customize generated code if needed
335+
336+
### 5. Integrate with CI/CD
337+
338+
Add dbt import to your CI pipeline:
339+
340+
```yaml
# .github/workflows/features.yml
- name: Compile dbt
  run: dbt compile

- name: Generate Feast definitions
  run: |
    feast dbt import target/manifest.json \
      -e user_id -d bigquery -t feast \
      -o feature_repo/features.py

- name: Apply Feast changes
  run: feast apply
```
354+
355+
## Limitations
356+
357+
- **Single entity support**: Currently supports one entity column per import. For multi-entity models, run multiple imports or manually adjust the generated code.
358+
- **No incremental updates**: Each import generates a complete file. Use version control to track changes.
359+
- **Column types should be declared**: Columns without a `data_type` in `schema.yml` default to the `String` type, so declare types explicitly for any non-string feature.
360+
361+
## Troubleshooting
362+
363+
### "manifest.json not found"
364+
365+
Run `dbt compile` or `dbt run` first to generate the manifest file.
366+
367+
### "No models found with tag"
368+
369+
Check that your models have the correct tag in their config:
370+
371+
```sql
372+
{{ config(tags=['feast']) }}
373+
```
374+
375+
### "Missing entity column"
376+
377+
Ensure your dbt model includes the entity column specified with `--entity-column`. Models missing this column are skipped with a warning.
378+
379+
### "Missing timestamp column"
380+
381+
By default, Feast looks for `event_timestamp`. Use `--timestamp-field` to specify a different column name.
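
The two column checks described in this section can be expressed as a small pre-flight function to run before an import. This is a hypothetical helper, not a Feast API; the message wording is illustrative and does not reproduce the CLI's actual output.

```python
def check_required_columns(
    model_name,
    columns,
    entity_column,
    timestamp_field="event_timestamp",
):
    """Return a list of human-readable problems; an empty list means the model is importable."""
    problems = []
    if entity_column not in columns:
        problems.append(f"{model_name}: missing entity column '{entity_column}'")
    if timestamp_field not in columns:
        problems.append(f"{model_name}: missing timestamp column '{timestamp_field}'")
    return problems


ok = check_required_columns(
    "driver_features",
    ["driver_id", "event_timestamp", "avg_rating"],
    entity_column="driver_id",
)
bad = check_required_columns(
    "driver_features",
    ["driver_id", "created_at", "avg_rating"],
    entity_column="user_id",
)
print(ok)
print(bad)
```

In the second call, passing `timestamp_field="created_at"` would clear the timestamp problem, which corresponds to using `--timestamp-field created_at` on the command line.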
