Commit f5640a8

feat: Added blog post for OpenLineage integration
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent e1ddd6b commit f5640a8

2 files changed: +185 -0 lines changed
docs/blog/README.md

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@ Welcome to the Feast blog! Here you'll find articles about feature store develop
 
 ## Featured Posts
 
+{% content-ref url="feast-openlineage-integration.md" %}
+[feast-openlineage-integration.md](feast-openlineage-integration.md)
+{% endcontent-ref %}
+
 {% content-ref url="what-is-a-feature-store.md" %}
 [what-is-a-feature-store.md](what-is-a-feature-store.md)
 {% endcontent-ref %}

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# Feast Launches OpenLineage Integration for Data Lineage Tracking 🔗

*January 26, 2026* | *Nikhil Kathole, Francisco Javier Arceo*

## Feast and OpenLineage

Understanding where your ML features come from and how they flow through your system is critical for debugging, compliance, and governance. We are excited to announce that Feast now supports native integration with [OpenLineage](https://openlineage.io/), the open standard for data lineage collection and analysis.

With this integration, Feast automatically tracks and emits lineage events whenever you apply feature definitions or materialize features, with **no code changes required**. Simply enable OpenLineage in your `feature_store.yaml`, and Feast handles the rest.

## Why Data Lineage Matters for Feature Stores

Feature stores manage the lifecycle of ML features, from raw data sources to model inference. As ML systems grow in complexity, teams often struggle to answer fundamental questions:

- *Where does this feature's data come from?*
- *Which models depend on this feature view?*
- *What downstream impact will changing this data source have?*
- *How do I audit the data flow for compliance?*

OpenLineage addresses these questions by providing a standardized way to capture lineage metadata, which OpenLineage-compatible tools can then visualize. By integrating OpenLineage into Feast, ML teams gain automatic visibility into their feature engineering pipelines without manual instrumentation.

## How It Works

The integration automatically emits OpenLineage events for two key operations:

### 1. Registry Changes (`feast apply`)

When you run `feast apply`, Feast creates a lineage graph that mirrors what you see in the Feast UI:

```
DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                            │
                                                           │
                                                           ▼
                                                feature_service_{name} ──→ FeatureService
```

This creates two types of jobs:
- **`feast_feature_views_{project}`**: Shows how DataSources and Entities flow into FeatureViews
- **`feature_service_{name}`**: Shows which FeatureViews compose each FeatureService
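
To make the job naming concrete, here is a minimal sketch that derives the two job types from a project name and a mapping of feature services to feature views. The helper `lineage_jobs` and the dictionary shape it returns are hypothetical illustrations, not Feast internals; only the job name patterns come from the description above.

```python
def lineage_jobs(project, sources, entities, views, services):
    """Return {job_name: {"inputs": [...], "outputs": [...]}}.

    Illustrative shape only; this is not how Feast stores lineage.
    """
    jobs = {
        # One job per project: DataSources and Entities flow into FeatureViews.
        f"feast_feature_views_{project}": {
            "inputs": sorted(sources) + sorted(entities),
            "outputs": sorted(views),
        }
    }
    # One job per feature service: its FeatureViews flow into the service.
    for name, service_views in services.items():
        jobs[f"feature_service_{name}"] = {
            "inputs": sorted(service_views),
            "outputs": [name],
        }
    return jobs


jobs = lineage_jobs(
    project="my_fraud_detection",
    sources=["driver_stats_source"],
    entities=["driver_id"],
    views=["driver_hourly_stats"],
    services={"driver_stats_service": ["driver_hourly_stats"]},
)
for job_name, io in jobs.items():
    print(job_name, io["inputs"], "->", io["outputs"])
```

A project with several feature services would therefore produce one `feast_feature_views_{project}` job plus one `feature_service_{name}` job per service.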

### 2. Feature Materialization (`feast materialize`)

When materializing features, Feast emits a START event when a run begins and a COMPLETE or FAIL event when it ends, allowing you to track:
- Which feature views were materialized
- The time window of materialization
- Success or failure status
- Duration and row counts
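
The START/COMPLETE/FAIL lifecycle can be sketched with plain dictionaries shaped like OpenLineage run events. This is an illustration under stated assumptions, not Feast's implementation: the helpers `make_event` and `tracked_materialize` and the job name `materialize_driver_hourly_stats` are hypothetical, and fields that real OpenLineage events also carry (such as `producer` and `schemaURL`) are omitted for brevity.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, run_id, job_name, namespace="feast"):
    """Build a dict with the core fields of an OpenLineage run event."""
    return {
        "eventType": event_type,  # "START", "COMPLETE", or "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
    }


def tracked_materialize(job_name, materialize_fn, emit):
    """Run materialize_fn, emitting START first and COMPLETE or FAIL after."""
    run_id = str(uuid.uuid4())  # every event of one run shares this ID
    emit(make_event("START", run_id, job_name))
    try:
        result = materialize_fn()
    except Exception:
        emit(make_event("FAIL", run_id, job_name))
        raise
    emit(make_event("COMPLETE", run_id, job_name))
    return result


# Collect events in a list instead of sending them over HTTP.
events = []
tracked_materialize("materialize_driver_hourly_stats", lambda: None, events.append)
print([e["eventType"] for e in events])
```

Because the events of a run share one `runId`, a lineage backend can pair each START with its terminal event and derive the run duration from the two timestamps.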

## Getting Started

### Step 1: Install Feast with OpenLineage Support

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install 'feast[openlineage]'
```

### Step 2: Configure Your Feature Store

Add the `openlineage` section to your `feature_store.yaml`:

```yaml
project: my_fraud_detection
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  namespace: feast
```

### Step 3: Start Marquez (Optional)

[Marquez](https://marquezproject.ai/) is the reference implementation of OpenLineage and provides a web UI for exploring lineage:

```bash
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez
```
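
Before applying features, it can help to confirm that the lineage backend is reachable. The sketch below is one way to do that, assuming Marquez is serving its REST API at the `transport_url` configured above; it requests the Marquez `GET /api/v1/namespaces` endpoint using only the standard library. The helper name `marquez_is_up` is made up for this example.

```python
import urllib.request


def marquez_is_up(base_url, timeout=2.0):
    """Return True if the Marquez API answers the namespaces endpoint with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/v1/namespaces"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


print("Marquez reachable:", marquez_is_up("http://localhost:5000"))
```

If this prints `False`, check that the container is running and that port 5000 is published.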

### Step 4: Apply Your Features

```python
from feast import FeatureStore

fs = FeatureStore(repo_path="feature_repo")

# driver_entity, driver_stats_source, driver_hourly_stats_view, and
# driver_stats_service are assumed to be defined in your feature repo.
# This automatically emits lineage events!
fs.apply([
    driver_entity,
    driver_stats_source,
    driver_hourly_stats_view,
    driver_stats_service,
])
```

Visit http://localhost:3000 to see your lineage graph in Marquez!

## Rich Metadata Tracking

The integration doesn't just track relationships; it also captures metadata about your Feast objects:

### Feature Views
- Feature names, types, and descriptions
- TTL (time-to-live) configuration
- Associated entities
- Custom tags
- Online/offline store enablement

### Feature Services
- Constituent feature views
- Total feature count
- Service-level descriptions and tags

### Data Sources
- Source type (File, BigQuery, Snowflake, etc.)
- Connection URIs
- Timestamp fields
- Field mappings

All this metadata is attached as OpenLineage facets, making it queryable and explorable in any OpenLineage-compatible tool.
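
As a rough illustration of what a facet looks like, the sketch below builds a custom facet for a feature view as a plain dictionary. OpenLineage facets carry `_producer` and `_schemaURL` fields alongside their payload; everything else here is an assumption made for the example, including the facet key `feast_featureView`, the payload field names, the placeholder URLs, and the `conv_rate` feature. The facets Feast actually emits may be named and structured differently.

```python
import json


def feature_view_facet(name, features, ttl_seconds, entities, tags):
    """Build a hypothetical custom facet describing a feature view."""
    return {
        "feast_featureView": {  # hypothetical facet key
            "_producer": "https://example.com/feast-lineage-sketch",  # placeholder
            "_schemaURL": "https://example.com/schemas/FeatureViewFacet.json",  # placeholder
            "name": name,
            "features": features,
            "ttlSeconds": ttl_seconds,
            "entities": entities,
            "tags": tags,
        }
    }


facets = feature_view_facet(
    name="driver_hourly_stats",
    features=[{"name": "conv_rate", "type": "Float32", "description": "Illustrative feature"}],
    ttl_seconds=86400,
    entities=["driver_id"],
    tags={"team": "driver_performance"},
)
print(json.dumps(facets, indent=2))
```

Since facets are plain JSON attached to jobs and datasets, any tool that stores OpenLineage events can index and display them without Feast-specific logic.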

## Try It Out: Complete Working Example

We've included a complete working example in the Feast repository that demonstrates the OpenLineage integration end-to-end. The example creates a driver statistics feature store and shows how lineage events are automatically emitted.

**Run the example:**

```bash
# Start Marquez first (in a separate terminal; this runs in the foreground)
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Clone the Feast repository and run the example
git clone https://github.com/feast-dev/feast.git
cd feast/examples/openlineage-integration
python openlineage_demo.py --url http://localhost:5000

# View lineage at http://localhost:3000
```

The example demonstrates:
- Creating entities, data sources, feature views, and feature services
- Automatic lineage emission on `feast apply`
- Materialization tracking with START/COMPLETE events
- Feature retrieval (no lineage events for retrieval operations)

In Marquez, you'll see the complete lineage graph:
- `driver_stats_source` (DataSource) → `driver_hourly_stats` (FeatureView)
- `driver_id` (Entity) → `driver_hourly_stats` (FeatureView)
- `driver_hourly_stats` (FeatureView) → `driver_stats_service` (FeatureService)

Check out the [full example code](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration) for complete details, including feature definitions with descriptions and tags.

## Benefits for ML Teams

### 1. **Debugging Made Easy**
When a model's predictions degrade, trace back through the lineage to identify which data source or feature transformation changed.

### 2. **Impact Analysis**
Before modifying a data source, understand all downstream feature views and services that will be affected.
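
Impact analysis amounts to a downstream traversal of the lineage graph. Here is a minimal sketch using the edges from the driver statistics example; the `EDGES` dictionary and the `downstream` helper are hypothetical stand-ins for whatever your lineage backend exposes.

```python
from collections import deque

# Edges from the example lineage graph: node -> direct downstream nodes.
EDGES = {
    "driver_stats_source": ["driver_hourly_stats"],
    "driver_id": ["driver_hourly_stats"],
    "driver_hourly_stats": ["driver_stats_service"],
}


def downstream(node, edges):
    """Return every node reachable from `node`, in breadth-first order."""
    seen = []
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(edges.get(current, []))
    return seen


print(downstream("driver_stats_source", EDGES))
# → ['driver_hourly_stats', 'driver_stats_service']
```

Changing `driver_stats_source` therefore touches both the feature view and the feature service built on it.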

### 3. **Compliance & Audit**
Maintain an audit trail of how data flows into feature views and feature services, supporting regulatory requirements like GDPR, CCPA, or SOC 2.

### 4. **Documentation**
Auto-generated lineage serves as living documentation that stays in sync with your actual feature store configuration.

### 5. **Cross-Team Collaboration**
Data engineers, ML engineers, and data scientists can all view the same lineage graph to understand the feature store structure.

## Final Thoughts

Data lineage is becoming essential for production ML systems. With Feast's OpenLineage integration, you get automatic lineage tracking without any code changes. Combined with visualization tools like Marquez, teams can finally answer "where does this data come from?" with confidence.

This integration is available now in the latest version of Feast. We're excited to see how teams use it to improve their ML operations and welcome feedback from the community.

To get started, check out the [OpenLineage Integration documentation](../reference/openlineage.md) and the [example in the Feast repository](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration).
