| 1 | +--- |
| 2 | +title: Tracking Feature Lineage with OpenLineage |
| 3 | +description: Feast now supports native OpenLineage integration for automatic data lineage tracking of your ML features - no code changes required. |
| 4 | +date: 2026-01-26 |
| 5 | +authors: ["Nikhil Kathole", "Francisco Javier Arceo"] |
| 6 | +--- |
| 7 | + |
| 8 | +<div class="hero-image"> |
| 9 | + <img src="/images/blog/openlineage1.png" alt="Feast OpenLineage Integration - Marquez UI" loading="lazy"> |
| 10 | +</div> |
| 11 | + |
| 12 | +# Tracking Feature Lineage with OpenLineage 🔗 |
| 13 | + |
| 14 | +# Feast and OpenLineage |
| 15 | + |
| 16 | +Understanding where your ML features come from and how they flow through your system is critical for debugging, compliance, and governance. We are excited to announce that Feast now supports native integration with [OpenLineage](https://openlineage.io/), the open standard for data lineage collection and analysis. |
| 17 | + |
| 18 | +With this integration, Feast automatically tracks and emits lineage events whenever you apply feature definitions or materialize features—**no code changes required**. Simply enable OpenLineage in your `feature_store.yaml`, and Feast handles the rest. |
| 19 | + |
| 20 | +# Why Data Lineage Matters for Feature Stores |
| 21 | + |
| 22 | +Feature stores manage the lifecycle of ML features, from raw data sources to model inference. As ML systems grow in complexity, teams often struggle to answer fundamental questions: |
| 23 | + |
| 24 | +- *Where does this feature's data come from?* |
| 25 | +- *Which models depend on this feature view?* |
| 26 | +- *What downstream impact will changing this data source have?* |
| 27 | +- *How do I audit the data flow for compliance?* |
| 28 | + |
| 29 | +OpenLineage solves these challenges by providing a standardized way to capture and visualize data lineage. By integrating OpenLineage into Feast, ML teams gain automatic visibility into their feature engineering pipelines without manual instrumentation. |
| 30 | + |
| 31 | +# How It Works |
| 32 | + |
| 33 | +The integration automatically emits OpenLineage events for two key operations: |
| 34 | + |
| 35 | +## Registry Changes (`feast apply`) |
| 36 | + |
| 37 | +When you run `feast apply`, Feast creates a lineage graph that mirrors what you see in the Feast UI: |
| 38 | + |
| 39 | +``` |
| 40 | +DataSources ──┐ |
| 41 | +              ├──→ feast_feature_views_{project} ──→ FeatureViews |
| 42 | +Entities ─────┘                                           │ |
| 43 | +                                                          │ |
| 44 | +                                                          ▼ |
| 45 | +                                               feature_service_{name} ──→ FeatureService |
| 46 | +``` |
| 47 | + |
| 48 | +This creates two types of jobs: |
| 49 | +- **`feast_feature_views_{project}`**: Shows how DataSources and Entities flow into FeatureViews |
| 50 | +- **`feature_service_{name}`**: Shows which FeatureViews compose each FeatureService |
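As a rough illustration of the two job types above, the sketch below derives job names and input/output edges from plain dictionaries. It is not Feast's internal implementation: the `build_lineage_jobs` helper and the dictionary layout are assumptions made for this example, and only the job-name patterns come from the text.

```python
from typing import Dict, List


def build_lineage_jobs(
    project: str,
    feature_views: Dict[str, Dict[str, List[str]]],
    feature_services: Dict[str, List[str]],
) -> List[dict]:
    """Hypothetical helper: model the two job types as plain dicts."""
    # Collect every data source and entity feeding any feature view,
    # keeping first-seen order and dropping duplicates.
    inputs: List[str] = []
    for view in feature_views.values():
        for name in view["sources"] + view["entities"]:
            if name not in inputs:
                inputs.append(name)

    # One project-wide job: DataSources + Entities -> FeatureViews.
    jobs = [{
        "job": f"feast_feature_views_{project}",
        "inputs": inputs,
        "outputs": list(feature_views),
    }]

    # One job per feature service: FeatureViews -> FeatureService.
    for service, views in feature_services.items():
        jobs.append({
            "job": f"feature_service_{service}",
            "inputs": list(views),
            "outputs": [service],
        })
    return jobs


jobs = build_lineage_jobs(
    project="my_fraud_detection",
    feature_views={
        "driver_hourly_stats": {
            "sources": ["driver_stats_source"],
            "entities": ["driver_id"],
        }
    },
    feature_services={"driver_stats_service": ["driver_hourly_stats"]},
)
for job in jobs:
    print(job["job"], job["inputs"], "->", job["outputs"])
```

Modeling the feature-view job once per project and the service job once per service matches the naming patterns above: `{project}` appears in the first pattern, `{name}` in the second.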
| 51 | + |
| 52 | +## Feature Materialization (`feast materialize`) |
| 53 | + |
| 54 | +When materializing features, Feast emits a START event followed by either a COMPLETE or a FAIL event, allowing you to track: |
| 55 | +- Which feature views were materialized |
| 56 | +- The time window of materialization |
| 57 | +- Success or failure status |
| 58 | +- Duration and row counts |
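The event lifecycle described above can be sketched in a few lines of plain Python. This is an illustration only, not Feast's code: `run_with_lineage`, the `details` key, and the job name `materialize_driver_hourly_stats` are hypothetical, and real OpenLineage events carry details such as row counts in facets rather than in an ad hoc field.

```python
import time
import uuid
from datetime import datetime, timezone
from typing import Callable, List


def run_with_lineage(
    job_name: str,
    namespace: str,
    task: Callable[[], int],
    emit: Callable[[dict], None],
) -> None:
    """Hypothetical wrapper: emit START, then COMPLETE or FAIL, around a task."""
    run_id = str(uuid.uuid4())  # all events of one run share the same run ID

    def event(event_type: str, details: dict) -> dict:
        return {
            "eventType": event_type,
            "eventTime": datetime.now(timezone.utc).isoformat(),
            "run": {"runId": run_id},
            "job": {"namespace": namespace, "name": job_name},
            "details": details,  # simplification; see the note above
        }

    emit(event("START", {}))
    started = time.monotonic()
    try:
        rows = task()  # the task returns the number of rows it wrote
    except Exception as exc:
        duration = time.monotonic() - started
        emit(event("FAIL", {"error": str(exc), "durationSeconds": duration}))
        raise
    duration = time.monotonic() - started
    emit(event("COMPLETE", {"rowCount": rows, "durationSeconds": duration}))


def failing_task() -> int:
    raise RuntimeError("offline store unreachable")


events: List[dict] = []
run_with_lineage("materialize_driver_hourly_stats", "feast", lambda: 100, events.append)
try:
    run_with_lineage("materialize_driver_hourly_stats", "feast", failing_task, events.append)
except RuntimeError:
    pass

print([e["eventType"] for e in events])
# ['START', 'COMPLETE', 'START', 'FAIL']
```

Sharing one `runId` between the START event and its terminal event is what lets a lineage backend pair them up and compute the run's duration and outcome.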
| 59 | + |
| 60 | +# Getting Started |
| 61 | + |
| 62 | +## Step 1: Install OpenLineage |
| 63 | + |
| 64 | +```bash |
| 65 | +pip install 'feast[openlineage]' |
| 66 | +``` |
| 67 | + |
| 68 | +## Step 2: Configure Your Feature Store |
| 69 | + |
| 70 | +Add the `openlineage` section to your `feature_store.yaml`: |
| 71 | + |
| 72 | +```yaml |
| 73 | +project: my_fraud_detection |
| 74 | +registry: data/registry.db |
| 75 | +provider: local |
| 76 | +online_store: |
| 77 | +  type: sqlite |
| 78 | +  path: data/online_store.db |
| 79 | + |
| 80 | +openlineage: |
| 81 | +  enabled: true |
| 82 | +  transport_type: http |
| 83 | +  transport_url: http://localhost:5000 |
| 84 | +  namespace: feast |
| 85 | +``` |
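To make the role of each setting concrete, here is a small sketch that mirrors the `openlineage` section as a Python dictionary and derives the URL that events would be posted to. The `lineage_endpoint` helper is hypothetical. The `/api/v1/lineage` path is the one Marquez accepts OpenLineage events on; that Feast resolves the endpoint exactly this way is an assumption.

```python
from typing import Optional

# Mirrors the `openlineage` section of feature_store.yaml shown above.
openlineage_config = {
    "enabled": True,
    "transport_type": "http",
    "transport_url": "http://localhost:5000",
    "namespace": "feast",
}


def lineage_endpoint(config: dict) -> Optional[str]:
    """Hypothetical helper: return the URL events would be POSTed to, or None."""
    if not config.get("enabled", False):
        return None  # lineage disabled: nothing is emitted
    if config.get("transport_type") != "http":
        raise ValueError("this sketch only handles the http transport")
    # Assumed path: the OpenLineage endpoint exposed by Marquez.
    return config["transport_url"].rstrip("/") + "/api/v1/lineage"


print(lineage_endpoint(openlineage_config))
# http://localhost:5000/api/v1/lineage
```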
| 86 | + |
| 87 | +## Step 3: Start Marquez (Optional) |
| 88 | + |
| 89 | +[Marquez](https://marquezproject.ai/) is the reference implementation for OpenLineage and provides a web UI for exploring lineage: |
| 90 | + |
| 91 | +```bash |
| 92 | +docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez |
| 93 | +``` |
| 94 | + |
| 95 | +## Step 4: Apply Your Features |
| 96 | + |
| 97 | +```python |
| 98 | +from feast import FeatureStore |
| 99 | + |
| 100 | +fs = FeatureStore(repo_path="feature_repo") |
| 101 | + |
| 102 | +# Emits lineage events automatically; the objects below are assumed to come from your feature repo definitions |
| 103 | +fs.apply([ |
| 104 | +    driver_entity, |
| 105 | +    driver_stats_source, |
| 106 | +    driver_hourly_stats_view, |
| 107 | +    driver_stats_service, |
| 108 | +]) |
| 109 | +``` |
| 110 | + |
| 111 | +Visit http://localhost:3000 to see your lineage graph in Marquez! |
| 112 | + |
| 113 | +# Rich Metadata Tracking |
| 114 | + |
| 115 | +The integration doesn't just track relationships—it captures comprehensive metadata about your Feast objects: |
| 116 | + |
| 117 | +**Feature Views** |
| 118 | +- Feature names, types, and descriptions |
| 119 | +- TTL (time-to-live) configuration |
| 120 | +- Associated entities |
| 121 | +- Custom tags |
| 122 | +- Online/offline store enablement |
| 123 | + |
| 124 | +**Feature Services** |
| 125 | +- Constituent feature views |
| 126 | +- Total feature count |
| 127 | +- Service-level descriptions and tags |
| 128 | + |
| 129 | +**Data Sources** |
| 130 | +- Source type (File, BigQuery, Snowflake, etc.) |
| 131 | +- Connection URIs |
| 132 | +- Timestamp fields |
| 133 | +- Field mappings |
| 134 | + |
| 135 | +All this metadata is attached as OpenLineage facets, making it queryable and explorable in any OpenLineage-compatible tool. |
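For readers unfamiliar with facets, the sketch below shows roughly what a dataset entry with an attached custom facet could look like as JSON. The facet key `feast_featureView`, its fields, the placeholder URIs, and the feature names are all hypothetical and are not the names Feast actually emits. The only parts taken from the OpenLineage spec are the `namespace`/`name`/`facets` layout and the `_producer` and `_schemaURL` fields that every facet carries.

```python
import json
from typing import Dict, List

# Placeholder URIs for this sketch; a real producer points at its own project
# and at a published JSON schema for the facet.
PRODUCER = "https://example.com/feast-lineage-sketch"
SCHEMA_URL = "https://example.com/schemas/FeatureViewFacet.json"


def feature_view_dataset(
    namespace: str,
    name: str,
    features: Dict[str, str],
    ttl_seconds: int,
    entities: List[str],
    tags: Dict[str, str],
) -> dict:
    """Hypothetical helper: describe a feature view as an OpenLineage-style dataset."""
    return {
        "namespace": namespace,
        "name": name,
        "facets": {
            "feast_featureView": {  # hypothetical custom facet name
                "_producer": PRODUCER,
                "_schemaURL": SCHEMA_URL,
                "features": [{"name": n, "type": t} for n, t in features.items()],
                "ttlSeconds": ttl_seconds,
                "entities": entities,
                "tags": tags,
            }
        },
    }


dataset = feature_view_dataset(
    namespace="feast",
    name="driver_hourly_stats",
    features={"conv_rate": "Float32", "acc_rate": "Float32"},  # illustrative names
    ttl_seconds=86400,
    entities=["driver_id"],
    tags={"team": "driver_performance"},
)
print(json.dumps(dataset, indent=2))
```

Because facets are ordinary JSON objects keyed by name, a consumer that does not recognize a custom facet can ignore it while still reading the dataset's namespace and name.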
| 136 | + |
| 137 | +# Try It Out: Complete Working Example |
| 138 | + |
| 139 | +We've included a complete working example in the Feast repository that demonstrates the OpenLineage integration end-to-end. The example creates a driver statistics feature store and shows how lineage events are automatically emitted. |
| 140 | + |
| 141 | +**Run the example:** |
| 142 | + |
| 143 | +```bash |
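| 144 | +# Start Marquez first (runs in the foreground, so use a separate terminal) |
| 145 | +docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez |
| 146 | + |
| 147 | +# Clone the Feast repository and run the example |
| 148 | +git clone https://github.com/feast-dev/feast.git && cd feast/examples/openlineage-integration |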
| 149 | +python openlineage_demo.py --url http://localhost:5000 |
| 150 | + |
| 151 | +# View lineage at http://localhost:3000 |
| 152 | +``` |
| 153 | + |
| 154 | +The example demonstrates: |
| 155 | +- Creating entities, data sources, feature views, and feature services |
| 156 | +- Automatic lineage emission on `feast apply` |
| 157 | +- Materialization tracking with START/COMPLETE events |
| 158 | +- Feature retrieval (no lineage events for retrieval operations) |
| 159 | + |
| 160 | +In Marquez, you'll see the complete lineage graph: |
| 161 | +- `driver_stats_source` (DataSource) → `driver_hourly_stats` (FeatureView) |
| 162 | +- `driver_id` (Entity) → `driver_hourly_stats` (FeatureView) |
| 163 | +- `driver_hourly_stats` (FeatureView) → `driver_stats_service` (FeatureService) |
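The three edges listed above are enough to answer a simple impact-analysis question, such as "what is affected if `driver_stats_source` changes?". The sketch below does this with a breadth-first traversal over a hard-coded edge list. It is an illustration rather than a Marquez or Feast API, and the `downstream` helper is hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# The lineage edges from the example, as (upstream, downstream) pairs.
EDGES: List[Tuple[str, str]] = [
    ("driver_stats_source", "driver_hourly_stats"),
    ("driver_id", "driver_hourly_stats"),
    ("driver_hourly_stats", "driver_stats_service"),
]


def downstream(node: str, edges: List[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: list everything reachable from `node`, nearest first."""
    children: Dict[str, List[str]] = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    seen = {node}
    order: List[str] = []
    queue = deque([node])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:  # guard against revisiting shared nodes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("driver_stats_source", EDGES))
# ['driver_hourly_stats', 'driver_stats_service']
```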
| 164 | + |
| 165 | +<div class="content-image"> |
| 166 | + <img src="/images/blog/openlineage2.png" alt="Feast Lineage Graph in Marquez UI" loading="lazy"> |
| 167 | +</div> |
| 168 | + |
| 169 | +Check out the [full example code](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration) for complete details including feature definitions with descriptions and tags. |
| 170 | + |
| 171 | +# Benefits for ML Teams |
| 172 | + |
| 173 | +**Debugging Made Easy** |
| 174 | +When a model's predictions degrade, trace back through the lineage to identify which data source or feature transformation changed. |
| 175 | + |
| 176 | +**Impact Analysis** |
| 177 | +Before modifying a data source, understand all downstream feature views and services that will be affected. |
| 178 | + |
| 179 | +**Compliance & Audit** |
| 180 | +Maintain a complete audit trail of data flow for regulatory and audit requirements such as GDPR, CCPA, or SOC 2. |
| 181 | + |
| 182 | +**Documentation** |
| 183 | +Auto-generated lineage serves as living documentation that stays in sync with your actual feature store configuration. |
| 184 | + |
| 185 | +**Cross-Team Collaboration** |
| 186 | +Data engineers, ML engineers, and data scientists can all view the same lineage graph to understand the feature store structure. |
| 187 | + |
| 188 | +# How Can I Get Started? |
| 189 | + |
| 190 | +This integration is available now in the latest version of Feast. To get started: |
| 191 | + |
| 192 | +1. Check out the [OpenLineage Integration documentation](https://docs.feast.dev/reference/openlineage) |
| 193 | +2. Try the [example in the Feast repository](https://github.com/feast-dev/feast/tree/master/examples/openlineage-integration) |
| 194 | +3. Join the [Feast Slack](https://slack.feast.dev) to share feedback and ask questions |
| 195 | + |
| 196 | +We're excited to see how teams use OpenLineage integration to improve their ML operations and welcome feedback from the community! |