Commit e8b524b

feat: Added blog post for OpenLineage integration
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent 94315bb commit e8b524b

6 files changed: +253 -46 lines changed
Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
---
title: Tracking Feature Lineage with OpenLineage
description: Feast now supports native OpenLineage integration for automatic data lineage tracking of your ML features - no code changes required.
date: 2026-01-26
authors: ["Nikhil Kathole", "Francisco Javier Arceo"]
---

<div class="hero-image">
<img src="/images/blog/openlineage1.png" alt="Feast OpenLineage Integration - Marquez UI" loading="lazy">
</div>

# Tracking Feature Lineage with OpenLineage 🔗

# Feast and OpenLineage

Understanding where your ML features come from and how they flow through your system is critical for debugging, compliance, and governance. We are excited to announce that Feast now supports native integration with [OpenLineage](https://openlineage.io/), the open standard for data lineage collection and analysis.

With this integration, Feast automatically tracks and emits lineage events whenever you apply feature definitions or materialize features, with **no code changes required**. Simply enable OpenLineage in your `feature_store.yaml`, and Feast handles the rest.

# Why Data Lineage Matters for Feature Stores

Feature stores manage the lifecycle of ML features, from raw data sources to model inference. As ML systems grow in complexity, teams often struggle to answer fundamental questions:

- *Where does this feature's data come from?*
- *Which models depend on this feature view?*
- *What downstream impact will changing this data source have?*
- *How do I audit the data flow for compliance?*

OpenLineage solves these challenges by providing a standardized way to capture and visualize data lineage. By integrating OpenLineage into Feast, ML teams gain automatic visibility into their feature engineering pipelines without manual instrumentation.

# How It Works

The integration automatically emits OpenLineage events for two key operations:

## Registry Changes (`feast apply`)

When you run `feast apply`, Feast creates a lineage graph that mirrors what you see in the Feast UI:

```
DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                           │
                                                          │
                                                          ▼
                                               feature_service_{name} ──→ FeatureService
```

This creates two types of jobs:
- **`feast_feature_views_{project}`**: Shows how DataSources and Entities flow into FeatureViews
- **`feature_service_{name}`**: Shows which FeatureViews compose each FeatureService
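
To make the naming scheme concrete, here is a minimal sketch that builds the job names you would expect to find in your lineage backend after `feast apply`. The helper functions are hypothetical and written only for illustration; they are not part of the Feast API and simply apply the two name patterns listed above.

```python
from typing import List


def feature_views_job_name(project: str) -> str:
    """Name of the job linking DataSources and Entities to FeatureViews."""
    return f"feast_feature_views_{project}"


def feature_service_job_name(service_name: str) -> str:
    """Name of the job linking FeatureViews to one FeatureService."""
    return f"feature_service_{service_name}"


def expected_jobs(project: str, service_names: List[str]) -> List[str]:
    """One feature-views job per project, plus one job per feature service."""
    return [feature_views_job_name(project)] + [
        feature_service_job_name(name) for name in service_names
    ]


if __name__ == "__main__":
    print(expected_jobs("my_fraud_detection", ["driver_stats_service"]))
```

A list like this can be handy when checking by hand that every feature service shows up as its own job.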

## Feature Materialization (`feast materialize`)

When materializing features, Feast emits a START event followed by a COMPLETE or FAIL event, allowing you to track:
- Which feature views were materialized
- The time window of materialization
- Success or failure status
- Duration and row counts
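
The event lifecycle can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Feast's implementation: `RecordingEmitter` and `materialize_with_lineage` are hypothetical names, and the emitter only records events in memory instead of sending them anywhere. The sketch keeps the run id only when START was emitted, so a COMPLETE or FAIL event never appears without its START.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RecordingEmitter:
    """Hypothetical stand-in for a lineage client; it only records events."""

    fail_on_start: bool = False
    events: List[Tuple[str, str]] = field(default_factory=list)

    def emit(self, event_type: str, run_id: str) -> bool:
        if event_type == "START" and self.fail_on_start:
            return False  # simulate an unreachable lineage backend
        self.events.append((event_type, run_id))
        return True


def materialize_with_lineage(emitter: RecordingEmitter, work: Callable[[], None]) -> None:
    """Run `work`, wrapped in START plus either COMPLETE or FAIL."""
    candidate = str(uuid.uuid4())
    # Keep the run id only if START went out; otherwise skip terminal events.
    run_id: Optional[str] = candidate if emitter.emit("START", candidate) else None
    try:
        work()
    except Exception:
        if run_id:
            emitter.emit("FAIL", run_id)
        raise
    if run_id:
        emitter.emit("COMPLETE", run_id)
```

Lineage emission never changes the outcome of the work itself here: a failed START is tolerated, and an exception from `work` is re-raised after the FAIL event is recorded.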

# Getting Started

## Step 1: Install OpenLineage

```bash
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "feast[openlineage]"
```

## Step 2: Configure Your Feature Store

Add the `openlineage` section to your `feature_store.yaml`:

```yaml
project: my_fraud_detection
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  namespace: feast
```
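
Before running `feast apply`, it can help to sanity-check the section you just added. The sketch below assumes the YAML has already been parsed into a dictionary (for example with a YAML library) and checks only the four keys shown above. `check_openlineage_config` is a hypothetical helper, and its rules are assumptions made for illustration rather than Feast's own validation.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def check_openlineage_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems in the `openlineage` section (empty if none)."""
    section = config.get("openlineage")
    if not isinstance(section, dict):
        return ["missing `openlineage` section"]

    problems: List[str] = []
    if section.get("enabled") is not True:
        problems.append("`enabled` is not true")
    # Assumed rule: an http transport needs a full http(s) URL.
    if section.get("transport_type") == "http":
        url = urlparse(str(section.get("transport_url", "")))
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append("`transport_url` must be an http(s) URL")
    # Assumed rule: treat an empty namespace as a mistake.
    if not section.get("namespace"):
        problems.append("`namespace` is empty")
    return problems


if __name__ == "__main__":
    parsed = {
        "openlineage": {
            "enabled": True,
            "transport_type": "http",
            "transport_url": "http://localhost:5000",
            "namespace": "feast",
        }
    }
    print(check_openlineage_config(parsed))
```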

## Step 3: Start Marquez (Optional)

[Marquez](https://marquezproject.ai/) is the reference implementation for OpenLineage and provides a beautiful UI for exploring lineage:

```bash
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez
```

## Step 4: Apply Your Features

```python
from feast import FeatureStore

fs = FeatureStore(repo_path="feature_repo")

# This automatically emits lineage events!
# The four objects below are the entity, data source, feature view, and
# feature service defined in your own feature repository.
fs.apply([
    driver_entity,
    driver_stats_source,
    driver_hourly_stats_view,
    driver_stats_service,
])
```

Visit http://localhost:3000 to see your lineage graph in Marquez!
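
If you prefer to check from a script rather than the UI, a minimal sketch like the following may help. It assumes that Marquez exposes a REST endpoint of the form `/api/v1/namespaces/{namespace}/jobs` on the API port configured above and that the response contains a `jobs` array whose items have a `name` field; verify both against the Marquez API reference for your version. The function names are hypothetical.

```python
import json
from typing import List
from urllib.parse import quote
from urllib.request import urlopen


def marquez_jobs_url(base_url: str, namespace: str) -> str:
    """Build the (assumed) Marquez URL that lists jobs in one namespace."""
    return f"{base_url.rstrip('/')}/api/v1/namespaces/{quote(namespace, safe='')}/jobs"


def list_job_names(base_url: str, namespace: str) -> List[str]:
    """Fetch job names from a running Marquez instance (needs network access)."""
    with urlopen(marquez_jobs_url(base_url, namespace), timeout=10) as response:
        payload = json.load(response)
    # Assumed response shape: {"jobs": [{"name": ...}, ...]}
    return [job["name"] for job in payload.get("jobs", [])]


if __name__ == "__main__":
    print(list_job_names("http://localhost:5000", "feast"))
```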

# Rich Metadata Tracking

The integration doesn't just track relationships; it also captures comprehensive metadata about your Feast objects:

**Feature Views**
- Feature names, types, and descriptions
- TTL (time-to-live) configuration
- Associated entities
- Custom tags
- Online/offline store enablement

**Feature Services**
- Constituent feature views
- Total feature count
- Service-level descriptions and tags

**Data Sources**
- Source type (File, BigQuery, Snowflake, etc.)
- Connection URIs
- Timestamp fields
- Field mappings

All this metadata is attached as OpenLineage facets, making it queryable and explorable in any OpenLineage-compatible tool.

# Try It Out: Complete Working Example

We've included a complete working example in the Feast repository that demonstrates the OpenLineage integration end-to-end. The example creates a driver statistics feature store and shows how lineage events are automatically emitted.

**Run the example:**

```bash
# Start Marquez first (in a separate terminal, since this command stays in the foreground)
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Clone and run the example
git clone https://github.com/feast-dev/feast.git
cd feast/examples/openlineage-integration
python openlineage_demo.py --url http://localhost:5000

# View lineage at http://localhost:3000
```

The example demonstrates:
- Creating entities, data sources, feature views, and feature services
- Automatic lineage emission on `feast apply`
- Materialization tracking with START/COMPLETE events
- Feature retrieval (no lineage events for retrieval operations)

In Marquez, you'll see the complete lineage graph:
- `driver_stats_source` (DataSource) → `driver_hourly_stats` (FeatureView)
- `driver_id` (Entity) → `driver_hourly_stats` (FeatureView)
- `driver_hourly_stats` (FeatureView) → `driver_stats_service` (FeatureService)

<div class="content-image">
<img src="/images/blog/openlineage2.png" alt="Feast Lineage Graph in Marquez UI" loading="lazy">
</div>

Check out the [full example code](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration) for complete details including feature definitions with descriptions and tags.

# Benefits for ML Teams

**Debugging Made Easy**
When a model's predictions degrade, trace back through the lineage to identify which data source or feature transformation changed.

**Impact Analysis**
Before modifying a data source, understand all downstream feature views and services that will be affected.

**Compliance & Audit**
Maintain a complete audit trail of data flow for regulatory requirements like GDPR, CCPA, or SOC2.

**Documentation**
Auto-generated lineage serves as living documentation that stays in sync with your actual feature store configuration.

**Cross-Team Collaboration**
Data engineers, ML engineers, and data scientists can all view the same lineage graph to understand the feature store structure.

# How Can I Get Started?

This integration is available now in the latest version of Feast. To get started:

1. Check out the [OpenLineage Integration documentation](https://docs.feast.dev/reference/openlineage)
2. Try the [example in the Feast repository](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration)
3. Join the [Feast Slack](https://slack.feast.dev) to share feedback and ask questions

We're excited to see how teams use the OpenLineage integration to improve their ML operations, and we welcome feedback from the community!

sdk/python/feast/feature_store.py

Lines changed: 6 additions & 4 deletions
@@ -1810,10 +1810,12 @@ def _emit_openlineage_materialize_start(
         if self._openlineage_emitter is None:
             return None
         try:
-            run_id, _ = self._openlineage_emitter.emit_materialize_start(
+            run_id, success = self._openlineage_emitter.emit_materialize_start(
                 feature_views, start_date, end_date, self.project
             )
-            return run_id
+            # Return run_id only if START was successfully emitted
+            # This prevents orphaned COMPLETE/FAIL events
+            return run_id if run_id and success else None
         except Exception as e:
             warnings.warn(f"Failed to emit OpenLineage materialize start event: {e}")
             return None
@@ -1824,7 +1826,7 @@ def _emit_openlineage_materialize_complete(
         feature_views: List[Any],
     ):
         """Emit OpenLineage COMPLETE event for materialization."""
-        if self._openlineage_emitter is None or run_id is None:
+        if self._openlineage_emitter is None or not run_id:
             return
         try:
             self._openlineage_emitter.emit_materialize_complete(
@@ -1839,7 +1841,7 @@ def _emit_openlineage_materialize_fail(
         error_message: str,
     ):
         """Emit OpenLineage FAIL event for materialization."""
-        if self._openlineage_emitter is None or run_id is None:
+        if self._openlineage_emitter is None or not run_id:
             return
         try:
             self._openlineage_emitter.emit_materialize_fail(

sdk/python/feast/openlineage/emitter.py

Lines changed: 30 additions & 35 deletions
@@ -141,46 +141,39 @@ def emit_registry_lineage(
 
         results = []
 
-        # Emit events for feature views
+        # Get all feature views at once (includes FeatureView, StreamFeatureView, OnDemandFeatureView)
+        all_feature_views: list = []
         try:
-            feature_views = registry.list_feature_views(
+            all_feature_views = registry.list_all_feature_views(
                 project=project, allow_cache=allow_cache
             )
-            for fv in feature_views:
-                result = self.emit_feature_view_lineage(fv, project)
-                results.append(result)
         except Exception as e:
-            logger.error(f"Error emitting feature view lineage: {e}")
-
-        # Emit events for stream feature views
-        try:
-            stream_fvs = registry.list_stream_feature_views(
-                project=project, allow_cache=allow_cache
-            )
-            for sfv in stream_fvs:
-                result = self.emit_stream_feature_view_lineage(sfv, project)
+            logger.error(f"Error listing all feature views: {e}")
+
+        # Emit lineage events for each feature view type
+        for fv in all_feature_views:
+            try:
+                if isinstance(fv, OnDemandFeatureView):
+                    result = self.emit_on_demand_feature_view_lineage(fv, project)
+                elif isinstance(fv, StreamFeatureView):
+                    result = self.emit_stream_feature_view_lineage(fv, project)
+                elif isinstance(fv, FeatureView):
+                    result = self.emit_feature_view_lineage(fv, project)
+                else:
+                    continue
                 results.append(result)
-        except Exception as e:
-            logger.error(f"Error emitting stream feature view lineage: {e}")
-
-        # Emit events for on-demand feature views
-        try:
-            odfvs = registry.list_on_demand_feature_views(
-                project=project, allow_cache=allow_cache
-            )
-            for odfv in odfvs:
-                result = self.emit_on_demand_feature_view_lineage(odfv, project)
-                results.append(result)
-        except Exception as e:
-            logger.error(f"Error emitting on-demand feature view lineage: {e}")
+            except Exception as e:
+                logger.error(f"Error emitting lineage for feature view {fv.name}: {e}")
 
         # Emit events for feature services
         try:
             feature_services = registry.list_feature_services(
                 project=project, allow_cache=allow_cache
             )
             for fs in feature_services:
-                result = self.emit_feature_service_lineage(fs, feature_views, project)
+                result = self.emit_feature_service_lineage(
+                    fs, all_feature_views, project
+                )
                 results.append(result)
         except Exception as e:
             logger.error(f"Error emitting feature service lineage: {e}")
@@ -372,15 +365,18 @@ def emit_on_demand_feature_view_lineage(
     def emit_feature_service_lineage(
         self,
         feature_service: "FeatureService",
-        feature_views: List["FeatureView"],
+        feature_views: List[
+            Union["FeatureView", "OnDemandFeatureView", "StreamFeatureView"]
+        ],
         project: str,
     ) -> bool:
         """
         Emit lineage for a feature service definition.
 
         Args:
             feature_service: The feature service
-            feature_views: List of available feature views
+            feature_views: List of all available feature views (FeatureView,
+                OnDemandFeatureView, StreamFeatureView)
             project: Project name
 
         Returns:
@@ -485,17 +481,16 @@ def emit_materialize_start(
                     )
                 )
 
-            # Add entities as inputs
+            # Add entities as inputs (use direct name for consistency with emit_apply)
             if hasattr(fv, "entities") and fv.entities:
                 for entity_name in fv.entities:
                     if entity_name and entity_name != "__dummy":
-                        entity_key = f"entity_{entity_name}"
-                        if entity_key not in seen_sources:
-                            seen_sources.add(entity_key)
+                        if entity_name not in seen_sources:
+                            seen_sources.add(entity_name)
                             inputs.append(
                                 InputDataset(
                                     namespace=namespace,
-                                    name=entity_key,
+                                    name=entity_name,
                                 )
                             )
 
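
A note on the `isinstance` chain in `emit_registry_lineage`: the order of the checks matters whenever one view class derives from another. The sketch below uses stub classes (not the real Feast classes) and assumes, for illustration only, that `StreamFeatureView` subclasses `FeatureView` while `OnDemandFeatureView` does not. Under that assumption, testing the base class first would route stream views to the wrong handler.

```python
from typing import Optional


class FeatureView:  # stub for illustration only
    pass


class StreamFeatureView(FeatureView):  # assumed to subclass FeatureView
    pass


class OnDemandFeatureView:  # stub with no shared parent in this sketch
    pass


def classify(fv: object) -> Optional[str]:
    """Most specific types first, mirroring the order used in the diff."""
    if isinstance(fv, OnDemandFeatureView):
        return "on_demand"
    if isinstance(fv, StreamFeatureView):
        return "stream"
    if isinstance(fv, FeatureView):
        return "batch"
    return None


def classify_base_first(fv: object) -> Optional[str]:
    """Same checks with the base class first; stream views are misrouted."""
    if isinstance(fv, FeatureView):
        return "batch"
    if isinstance(fv, StreamFeatureView):
        return "stream"  # never reached for a FeatureView subclass
    if isinstance(fv, OnDemandFeatureView):
        return "on_demand"
    return None
```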
