Commit e350360

feat: Added blog post for OpenLineage integration
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent e1ddd6b commit e350360

5 files changed: 209 additions and 9 deletions

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
---
title: Tracking Feature Lineage with OpenLineage
description: Feast now supports native OpenLineage integration for automatic data lineage tracking of your ML features - no code changes required.
date: 2026-01-26
authors: ["Nikhil Kathole", "Francisco Javier Arceo"]
---

<div class="hero-image">
<img src="/images/blog/openlineage1.png" alt="Feast OpenLineage Integration - Marquez UI" loading="lazy">
</div>

# Tracking Feature Lineage with OpenLineage 🔗

# Feast and OpenLineage

Understanding where your ML features come from and how they flow through your system is critical for debugging, compliance, and governance. We are excited to announce that Feast now supports native integration with [OpenLineage](https://openlineage.io/), the open standard for data lineage collection and analysis.

With this integration, Feast automatically tracks and emits lineage events whenever you apply feature definitions or materialize features, with **no code changes required**. Simply enable OpenLineage in your `feature_store.yaml`, and Feast handles the rest.

# Why Data Lineage Matters for Feature Stores

Feature stores manage the lifecycle of ML features, from raw data sources to model inference. As ML systems grow in complexity, teams often struggle to answer fundamental questions:

- *Where does this feature's data come from?*
- *Which models depend on this feature view?*
- *What downstream impact will changing this data source have?*
- *How do I audit the data flow for compliance?*

OpenLineage addresses these questions by providing a standardized way to capture and visualize data lineage. By integrating OpenLineage into Feast, ML teams gain automatic visibility into their feature engineering pipelines without manual instrumentation.

# How It Works

The integration automatically emits OpenLineage events for two key operations:

## Registry Changes (`feast apply`)

When you run `feast apply`, Feast creates a lineage graph that mirrors what you see in the Feast UI:

```
DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                           │
                                                          │
                                                          ▼
                                               feature_service_{name} ──→ FeatureService
```

This creates two types of jobs:
- **`feast_feature_views_{project}`**: Shows how DataSources and Entities flow into FeatureViews
- **`feature_service_{name}`**: Shows which FeatureViews compose each FeatureService

## Feature Materialization (`feast materialize`)

When materializing features, Feast emits a START event when the run begins, followed by a COMPLETE event on success or a FAIL event on error, allowing you to track:
- Which feature views were materialized
- The time window of materialization
- Success or failure status
- Duration and row counts
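
The event lifecycle can be sketched with a small context manager. This is not Feast's implementation: the `events` list stands in for a real OpenLineage transport, and `tracked_run` and the job name are hypothetical. The point it illustrates is that the terminal event reuses the run ID of the START event, which is how a lineage backend ties them into one run.

```python
import uuid
from contextlib import contextmanager

events = []  # stand-in for a real OpenLineage transport


@contextmanager
def tracked_run(job_name):
    """Emit START, then COMPLETE on success or FAIL on error, sharing one run ID."""
    run_id = str(uuid.uuid4())
    events.append(("START", job_name, run_id))
    try:
        yield run_id
    except Exception as exc:
        events.append(("FAIL", job_name, run_id, str(exc)))
        raise
    else:
        events.append(("COMPLETE", job_name, run_id))


# A successful run.
with tracked_run("materialize_driver_hourly_stats"):
    pass  # materialization work would happen here

# A failing run: the error still propagates after FAIL is recorded.
try:
    with tracked_run("materialize_driver_hourly_stats"):
        raise RuntimeError("offline store unavailable")
except RuntimeError:
    pass

print([event[0] for event in events])
```

Running the sketch records `START`, `COMPLETE` for the first run and `START`, `FAIL` for the second, each pair under its own run ID.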

# Getting Started

## Step 1: Install OpenLineage

```bash
# Quote the extra so shells such as zsh do not try to expand the brackets
pip install 'feast[openlineage]'
```

## Step 2: Configure Your Feature Store

Add the `openlineage` section to your `feature_store.yaml`:

```yaml
project: my_fraud_detection
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  namespace: feast
```
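
For intuition about what `transport_type: http` implies, the sketch below builds (but does not send) the kind of HTTP request a lineage client makes. It assumes the backend accepts events at Marquez's `/api/v1/lineage` path; the payload is deliberately truncated and the job name is hypothetical. Feast and the OpenLineage client handle all of this for you, so treat it as an illustration rather than something you need to write.

```python
import json
import urllib.request

transport_url = "http://localhost:5000"  # same value as transport_url in feature_store.yaml
# Assumption: the backend exposes Marquez's lineage endpoint.
endpoint = transport_url.rstrip("/") + "/api/v1/lineage"

# Truncated, illustrative payload; a real event has more required fields.
payload = {
    "eventType": "START",
    "job": {"namespace": "feast", "name": "example_job"},  # hypothetical job name
}

request = urllib.request.Request(
    endpoint,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(request.get_method(), request.full_url)

# Sending requires a running backend, so it is left commented out:
# urllib.request.urlopen(request)
```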

## Step 3: Start Marquez (Optional)

[Marquez](https://marquezproject.ai/) is the reference implementation for OpenLineage and provides a web UI for exploring lineage. In the command below, port 5000 matches the `transport_url` configured in Step 2, and port 3000 is where you will open the UI:

```bash
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez
```

## Step 4: Apply Your Features

```python
from feast import FeatureStore

# Assumes driver_entity, driver_stats_source, driver_hourly_stats_view, and
# driver_stats_service are already defined, e.g. imported from your feature repo.
fs = FeatureStore(repo_path="feature_repo")

# This automatically emits lineage events!
fs.apply([
    driver_entity,
    driver_stats_source,
    driver_hourly_stats_view,
    driver_stats_service,
])
```

Visit http://localhost:3000 to see your lineage graph in Marquez!

# Rich Metadata Tracking

The integration doesn't just track relationships; it also captures comprehensive metadata about your Feast objects:

**Feature Views**
- Feature names, types, and descriptions
- TTL (time-to-live) configuration
- Associated entities
- Custom tags
- Online/offline store enablement

**Feature Services**
- Constituent feature views
- Total feature count
- Service-level descriptions and tags

**Data Sources**
- Source type (File, BigQuery, Snowflake, etc.)
- Connection URIs
- Timestamp fields
- Field mappings

All this metadata is attached as OpenLineage facets, making it queryable and explorable in any OpenLineage-compatible tool.
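
As a rough picture of what "attached as facets" means, here is a sketch of a feature view described as an OpenLineage-style dataset. The `schema` facet is a standard OpenLineage dataset facet, and every facet carries `_producer` and `_schemaURL` keys. The `feast_featureView` facet name, its keys, the URIs, and the feature names and types are all assumptions for illustration; the facet names Feast actually uses may differ.

```python
import json

# Hypothetical URIs, used only so the sketch is self-contained.
PRODUCER = "https://example.com/feast-lineage-sketch"
SCHEMA_BASE = "https://example.com/schemas"


def feature_view_dataset(name, fields, ttl_seconds, entities, tags, namespace="feast"):
    """Describe a feature view as an OpenLineage-style dataset with facets."""
    return {
        "namespace": namespace,
        "name": name,
        "facets": {
            # "schema" is a standard OpenLineage dataset facet listing fields.
            "schema": {
                "_producer": PRODUCER,
                "_schemaURL": f"{SCHEMA_BASE}/SchemaDatasetFacet.json",
                "fields": [{"name": n, "type": t} for n, t in fields],
            },
            # Hypothetical custom facet carrying Feast-specific metadata.
            "feast_featureView": {
                "_producer": PRODUCER,
                "_schemaURL": f"{SCHEMA_BASE}/FeastFeatureViewFacet.json",
                "ttlSeconds": ttl_seconds,
                "entities": entities,
                "tags": tags,
            },
        },
    }


dataset = feature_view_dataset(
    "driver_hourly_stats",
    fields=[("conv_rate", "Float32"), ("avg_daily_trips", "Int64")],  # illustrative
    ttl_seconds=86400,
    entities=["driver_id"],
    tags={"team": "driver_performance"},
)
print(json.dumps(dataset["facets"], indent=2))
```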

# Try It Out: Complete Working Example

We've included a complete working example in the Feast repository that demonstrates the OpenLineage integration end-to-end. The example creates a driver statistics feature store and shows how lineage events are automatically emitted.

**Run the example:**

```bash
# Start Marquez first (in a separate terminal, or add -d to run it in the background)
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Clone the Feast repository and run the example
git clone https://github.com/feast-dev/feast.git
cd feast/examples/openlineage-integration
python openlineage_demo.py --url http://localhost:5000

# View lineage at http://localhost:3000
```

The example demonstrates:
- Creating entities, data sources, feature views, and feature services
- Automatic lineage emission on `feast apply`
- Materialization tracking with START/COMPLETE events
- Feature retrieval, which does not emit lineage events

In Marquez, you'll see the complete lineage graph:
- `driver_stats_source` (DataSource) → `driver_hourly_stats` (FeatureView)
- `driver_id` (Entity) → `driver_hourly_stats` (FeatureView)
- `driver_hourly_stats` (FeatureView) → `driver_stats_service` (FeatureService)
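
The three edges above are enough to answer a simple impact-analysis question in code. The sketch below stores them as an adjacency list and walks the graph breadth-first to find everything downstream of a node. The `downstream` helper is hypothetical; in practice you would query your lineage backend rather than hard-code the edges.

```python
from collections import deque

# Edges copied from the lineage graph of the example.
edges = {
    "driver_stats_source": ["driver_hourly_stats"],
    "driver_id": ["driver_hourly_stats"],
    "driver_hourly_stats": ["driver_stats_service"],
}


def downstream(node, graph):
    """Return every node reachable from `node`, in breadth-first order."""
    seen, queue, order = set(), deque([node]), []
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("driver_stats_source", edges))
# → ['driver_hourly_stats', 'driver_stats_service']
```

Changing `driver_stats_source` therefore touches both the feature view and the feature service built on it.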

<div class="content-image">
<img src="/images/blog/openlineage2.png" alt="Feast Lineage Graph in Marquez UI" loading="lazy">
</div>

Check out the [full example code](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration) for complete details including feature definitions with descriptions and tags.

# Benefits for ML Teams

**Debugging Made Easy**
When a model's predictions degrade, trace back through the lineage to identify which data source or feature transformation changed.

**Impact Analysis**
Before modifying a data source, understand all downstream feature views and services that will be affected.

**Compliance & Audit**
Maintain a complete audit trail of data flow for regulatory requirements like GDPR, CCPA, or SOC 2.

**Documentation**
Auto-generated lineage serves as living documentation that stays in sync with your actual feature store configuration.

**Cross-Team Collaboration**
Data engineers, ML engineers, and data scientists can all view the same lineage graph to understand the feature store structure.

# How Can I Get Started?

This integration is available now in the latest version of Feast. To get started:

1. Check out the [OpenLineage Integration documentation](https://docs.feast.dev/reference/openlineage)
2. Try the [example in the Feast repository](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration)
3. Join the [Feast Slack](https://slack.feast.dev) to share feedback and ask questions

We're excited to see how teams use OpenLineage integration to improve their ML operations and welcome feedback from the community!

sdk/python/feast/feature_store.py

Lines changed: 6 additions & 4 deletions
@@ -1810,10 +1810,12 @@ def _emit_openlineage_materialize_start(
         if self._openlineage_emitter is None:
             return None
         try:
-            run_id, _ = self._openlineage_emitter.emit_materialize_start(
+            run_id, success = self._openlineage_emitter.emit_materialize_start(
                 feature_views, start_date, end_date, self.project
             )
-            return run_id
+            # Return run_id only if START was successfully emitted
+            # This prevents orphaned COMPLETE/FAIL events
+            return run_id if run_id and success else None
         except Exception as e:
             warnings.warn(f"Failed to emit OpenLineage materialize start event: {e}")
             return None
@@ -1824,7 +1826,7 @@ def _emit_openlineage_materialize_complete(
         feature_views: List[Any],
     ):
         """Emit OpenLineage COMPLETE event for materialization."""
-        if self._openlineage_emitter is None or run_id is None:
+        if self._openlineage_emitter is None or not run_id:
             return
         try:
             self._openlineage_emitter.emit_materialize_complete(
@@ -1839,7 +1841,7 @@ def _emit_openlineage_materialize_fail(
         error_message: str,
     ):
         """Emit OpenLineage FAIL event for materialization."""
-        if self._openlineage_emitter is None or run_id is None:
+        if self._openlineage_emitter is None or not run_id:
             return
         try:
             self._openlineage_emitter.emit_materialize_fail(
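
The effect of these guard changes can be checked in isolation. The sketch below copies the return expression and the two guard conditions from the hunks above into standalone functions; the function names and the `emitter` stand-in are hypothetical, and the empty-string run ID is an assumed edge case rather than something the commit says occurs in practice.

```python
def start_result(run_id, success):
    """Same expression as the new return in _emit_openlineage_materialize_start."""
    return run_id if run_id and success else None


def old_guard_skips(emitter, run_id):
    """Old guard: skip only when run_id is exactly None."""
    return emitter is None or run_id is None


def new_guard_skips(emitter, run_id):
    """New guard: skip for any falsy run_id (None or an empty string)."""
    return emitter is None or not run_id


emitter = object()  # stand-in for a configured emitter

print(start_result("run-123", True))   # START emitted: run ID is passed on
print(start_result("run-123", False))  # START failed: no run ID, so no COMPLETE/FAIL
print(old_guard_skips(emitter, ""))    # old guard would still emit for ""
print(new_guard_skips(emitter, ""))    # new guard skips
```

Under these assumptions the four lines print `run-123`, `None`, `False`, and `True`, which matches the intent stated in the new comment: a terminal event is only emitted when a START event was recorded for the same run.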

sdk/python/feast/openlineage/emitter.py

Lines changed: 7 additions & 5 deletions
@@ -141,6 +141,9 @@ def emit_registry_lineage(
 
         results = []
 
+        # Initialize to empty list in case listing fails
+        feature_views: list = []
+
         # Emit events for feature views
         try:
             feature_views = registry.list_feature_views(
@@ -485,17 +488,16 @@ def emit_materialize_start(
                     )
                 )
 
-            # Add entities as inputs
+            # Add entities as inputs (use direct name for consistency with emit_apply)
             if hasattr(fv, "entities") and fv.entities:
                 for entity_name in fv.entities:
                     if entity_name and entity_name != "__dummy":
-                        entity_key = f"entity_{entity_name}"
-                        if entity_key not in seen_sources:
-                            seen_sources.add(entity_key)
+                        if entity_name not in seen_sources:
+                            seen_sources.add(entity_name)
                             inputs.append(
                                 InputDataset(
                                     namespace=namespace,
-                                    name=entity_key,
+                                    name=entity_name,
                                 )
                             )
 
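
Why the entity name matters: a lineage backend identifies a dataset by its namespace and name, so `entity_driver_id` and `driver_id` would appear as two unrelated nodes. The sketch below reproduces the deduplication loop with plain dictionaries and tuples instead of Feast feature views and `InputDataset`. The `entity_inputs` helper and its `prefix` parameter are hypothetical, and it assumes, as the new code comment suggests, that the apply path registers entities under their plain names.

```python
def entity_inputs(feature_views, namespace="feast", prefix=""):
    """Collect unique (namespace, name) pairs for the entities of all feature views."""
    seen, inputs = set(), []
    for fv in feature_views:
        for entity_name in fv["entities"]:
            if entity_name and entity_name != "__dummy":
                key = f"{prefix}{entity_name}"
                if key not in seen:
                    seen.add(key)
                    inputs.append((namespace, key))
    return inputs


views = [{"entities": ["driver_id"]}, {"entities": ["driver_id", "__dummy"]}]

# Assumption: the apply path registers the entity under its plain name.
apply_nodes = {("feast", "driver_id")}

old_nodes = set(entity_inputs(views, prefix="entity_"))  # behavior before the change
new_nodes = set(entity_inputs(views))                    # behavior after the change

print(old_nodes & apply_nodes)
print(new_nodes & apply_nodes)
```

With the prefix, the intersection is empty, so materialization inputs would not connect to the entity node created on apply. Without it, both paths refer to the same `("feast", "driver_id")` node.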
