6 changes: 3 additions & 3 deletions .secrets.baseline
@@ -957,7 +957,7 @@
"filename": "infra/feast-operator/api/v1/featurestore_types.go",
"hashed_secret": "44e17306b837162269a410204daaa5ecee4ec22c",
"is_verified": false,
"line_number": 906
"line_number": 935
}
],
"infra/feast-operator/api/v1/zz_generated.deepcopy.go": [
@@ -980,7 +980,7 @@
"filename": "infra/feast-operator/api/v1/zz_generated.deepcopy.go",
"hashed_secret": "c2028031c154bbe86fd69bef740855c74b927dcf",
"is_verified": false,
"line_number": 1528
"line_number": 1570
}
],
"infra/feast-operator/api/v1alpha1/featurestore_types.go": [
@@ -1555,5 +1555,5 @@
}
]
},
"generated_at": "2026-06-11T15:45:28Z"
"generated_at": "2026-06-22T17:54:28Z"
}
204 changes: 204 additions & 0 deletions docs/reference/openlineage.md
@@ -186,6 +186,12 @@ Captures materialization run metadata:

## Lineage Visualization

### Option 1: Feast UI (Built-in)

Feast includes a built-in OpenLineage consumer that can receive, store, and visualize lineage from any OpenLineage-compatible producer (Airflow, Spark, dbt, Feast itself, etc.) directly in the Feast UI. See the [OpenLineage Consumer](#openlineage-consumer) section below.

### Option 2: Marquez

Use [Marquez](https://marquezproject.ai/) to visualize your Feast lineage:

```bash
```
@@ -216,3 +222,201 @@ Then access the Marquez UI at http://localhost:3000 to see your feature lineage.
| Entity | InputDataset |
| FeatureService | OutputDataset |
| Materialization | RunEvent (START/COMPLETE/FAIL) |

---

## OpenLineage Consumer

Feast can act as an **OpenLineage consumer**, receiving lineage events from any OpenLineage-compatible producer and displaying them in the Feast UI. This eliminates the need for a separate Marquez deployment when you want to visualize cross-system data lineage alongside your feature store.

### Consumer Architecture

```
Producers (Airflow, Spark, dbt, Feast, Flink, …)
    │
    ▼
POST /api/v1/lineage ──→ Event Processor ──→ Lineage Store (SQL)
                                                      │
                                                      ▼
                                                  Feast UI
                                        ┌──────────────────────────┐
                                        │ Lineage tab              │
                                        │ ├─ OpenLineage Graph     │
                                        │ │    (all producers)     │
                                        │ └─ ☐ Feast Only Lineage  │
                                        │      (registry view)     │
                                        │                          │
                                        │ Events tab               │
                                        │ └─ Event browser         │
                                        └──────────────────────────┘
```

When the consumer is **not** enabled, the Feast UI shows only the original registry-based lineage view; no tabs are added.

### Enabling the Consumer

Add the `consumer` section under `openlineage` in your `feature_store.yaml`:

```yaml
project: my_project
registry:
  registry_type: sql
  path: postgresql://user:****@host:5432/feast  # pragma: allowlist secret
openlineage:
  enabled: true
  namespace: my_project
  consumer:
    enabled: true
    store_type: sql
    # Optional: separate database for lineage storage.
    # If omitted, the SQL registry database is reused.
    # connection_string: postgresql://user:****@host:5432/feast_lineage
    api_key: "change-me"  # pragma: allowlist secret
    namespace_mapping:
      airflow_ns: my_project
      spark_ns: my_project
```

Or via environment variables:

```bash
export FEAST_OPENLINEAGE_CONSUMER_ENABLED=true
export FEAST_OPENLINEAGE_CONSUMER_STORE_TYPE=sql
export FEAST_OPENLINEAGE_CONSUMER_API_KEY=change-me # pragma: allowlist secret
# Optional separate DB:
# export FEAST_OPENLINEAGE_CONSUMER_CONNECTION_STRING=postgresql://...
```

### Consumer Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `consumer.enabled` | `false` | Enable the OpenLineage consumer |
| `consumer.store_type` | `sql` | Storage backend type. Currently only `sql` is supported |
| `consumer.connection_string` | - | Optional separate database connection string. If omitted, reuses the SQL registry database |
| `consumer.api_key` | - | API key that producers must provide when sending events |
| `consumer.namespace_mapping` | `{}` | Maps OpenLineage namespaces to Feast projects for RBAC scoping |

### Consumer API Endpoints

When the consumer is enabled, the following endpoints are available on the Feast REST registry server:

#### Event Receiver (Producer-facing)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/lineage` | `POST` | Receive a single OpenLineage event (or array of events) |
| `/api/v1/lineage/batch` | `POST` | Receive a batch of OpenLineage events |

Both endpoints require the `X-API-Key` header (or `Authorization: Bearer <key>`) if `consumer.api_key` is configured.
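
The receiver rules above (an optional API key presented either as `X-API-Key` or as a bearer token, and a body that may be a single event or an array of events) can be sketched as follows. This is a minimal illustration, not Feast's implementation: the function names are hypothetical, and the sketch assumes that leaving `consumer.api_key` unset makes the endpoints open, as the sentence above implies.

```python
import hmac


def extract_api_key(headers):
    """Return the key from X-API-Key, falling back to 'Authorization: Bearer <key>'."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None


def is_authorized(headers, configured_key):
    """No configured key means the endpoint is open; otherwise compare in constant time."""
    if not configured_key:
        return True
    presented = extract_api_key(headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), configured_key.encode("utf-8"))


def normalize_payload(payload):
    """Accept one event object or a JSON array of events; always return a list."""
    if isinstance(payload, dict):
        return [payload]
    if isinstance(payload, list) and all(isinstance(e, dict) for e in payload):
        return payload
    raise ValueError("expected an OpenLineage event object or an array of event objects")
```

Comparing with `hmac.compare_digest` rather than `==` avoids leaking how many leading characters of a guessed key were correct.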

#### OpenLineage Query Endpoints (UI-facing)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/lineage/openlineage/graph` | `GET` | Full lineage graph with all nodes, edges, and symlinks |
| `/lineage/openlineage/graph/{node_type}/{namespace}/{name}` | `GET` | Lineage graph centered on a specific node |
| `/lineage/openlineage/events` | `GET` | Browse stored events with filtering |
| `/lineage/openlineage/jobs` | `GET` | List all known OpenLineage jobs |
| `/lineage/openlineage/datasets` | `GET` | List all known OpenLineage datasets |

#### Registry Query Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/lineage/registry` | `GET` | Feast registry lineage (entities, feature views, services) |
| `/lineage/registry/all` | `GET` | All registry objects with full metadata |
| `/lineage/objects/{object_type}/{object_name}` | `GET` | Detail for a specific registry object |
| `/lineage/complete` | `GET` | Complete registry lineage with relationships |
| `/lineage/complete/all` | `GET` | Complete registry lineage for all objects |

### Configuring Producers to Send Events to Feast

Configure any OpenLineage producer to send events to your Feast instance:
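
For a producer without a built-in integration, or to smoke-test the receiver, a plain HTTP POST is enough. The sketch below uses only the Python standard library and builds the request without sending it. The server address, the producer URI, the spec version in `SCHEMA_URL`, and the helper names are assumptions; the event carries only the core OpenLineage `RunEvent` fields.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib.request import Request

# Assumptions: address and key match the examples on this page.
FEAST_URL = "http://feast-registry:8080"
API_KEY = "change-me"  # pragma: allowlist secret
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def build_run_event(event_type, job_namespace, job_name, inputs=(), outputs=(), run_id=None):
    """Return a minimal RunEvent dict; inputs/outputs are (namespace, name) pairs."""
    return {
        "eventType": event_type,  # e.g. START, RUNNING, COMPLETE, ABORT, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
        "schemaURL": SCHEMA_URL,
    }


def build_lineage_request(events, base_url=FEAST_URL, api_key=API_KEY):
    """Build (without sending) a POST carrying one event dict or a list of events."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return Request(
        f"{base_url}/api/v1/lineage",
        data=json.dumps(events).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    event = build_run_event(
        "COMPLETE", "airflow_ns", "daily_etl",
        inputs=[("s3://bucket", "raw/trips")],
        outputs=[("s3://bucket", "features/driver_hourly_stats")],
    )
    request = build_lineage_request(event)
    print(request.get_method(), request.full_url)
    # To send against a running registry server, pass the request to
    # urllib.request.urlopen(request).
```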

#### Airflow

```bash
# Environment variables read by the OpenLineage client used by Airflow
export OPENLINEAGE_URL="http://feast-registry:8080/api"
export OPENLINEAGE_API_KEY="change-me"  # pragma: allowlist secret
```

#### Spark

```properties
spark.openlineage.transport.type=http
spark.openlineage.transport.url=http://feast-registry:8080/api
spark.openlineage.transport.endpoint=/v1/lineage
spark.openlineage.transport.auth.type=api_key
spark.openlineage.transport.auth.apiKey=change-me
```

#### dbt

```bash
# Environment variables read by dbt-ol (the OpenLineage dbt wrapper)
export OPENLINEAGE_URL="http://feast-registry:8080/api"
export OPENLINEAGE_API_KEY="change-me"  # pragma: allowlist secret
```

#### Feast (Self-reporting)

When both the OpenLineage producer and consumer are enabled, Feast's own events (from `feast apply`, materialization, etc.) are automatically ingested into the local consumer store — no HTTP transport is needed.

```yaml
# In feature_store.yaml
openlineage:
  enabled: true
  namespace: my_project
  consumer:
    enabled: true
    api_key: change-me  # pragma: allowlist secret
```

### Feast UI Lineage Views

When the consumer is enabled, the lineage page in the Feast UI shows two tabs:

**Lineage tab**

- **OpenLineage Graph** (default) — shows lineage from all OpenLineage producers with cross-producer connectivity. Nodes are color-coded by producer (colors generated dynamically). The graph supports filtering by type, producer, and object name. Clicking a node opens a **detail panel** showing description, schema, tags, features, entities, data quality metrics, data source info, and other facets.
- **Feast Only Lineage** (checkbox) — switches to the original Feast registry view (DataSource → FeatureView → FeatureService) powered entirely by the Feast registry.

**Events tab**

- Browse individual OpenLineage events with filtering by event type, job name, and run ID. Expand any event to inspect the full JSON payload.

### Cross-Producer Lineage Connectivity

The consumer automatically links datasets across different producers when they refer to the same physical data. Linking mechanisms:

1. **Shared namespace + name** — If Airflow writes to `s3://bucket/path` and Spark reads from the same `s3://bucket/path`, the graph connects them automatically.
2. **SymlinksDatasetFacet** — Producers can declare aliases. For example, Feast can declare that its internal `driver_hourly_stats` is a symlink to the Spark output at `s3://bucket/features/driver_hourly_stats/`.
3. **dataSource URI matching** — Datasets with matching `dataSource.uri` facets are linked even if their namespaces or names differ.

Compatible producers include Airflow, Spark, dbt, Flink, Feast, Dagster, and Great Expectations.
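
The three linking rules above amount to grouping datasets into equivalence classes, which a union-find structure handles naturally. The sketch below is an illustration of that idea, not Feast's code: the class and function names are hypothetical, and it reads the standard OpenLineage `symlinks` and `dataSource` dataset facets from plain dicts.

```python
from collections import defaultdict


class DatasetLinker:
    """Union-find over (namespace, name) dataset keys."""

    def __init__(self):
        self.parent = {}

    def find(self, key):
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            self.parent[key] = self.parent[self.parent[key]]  # path halving
            key = self.parent[key]
        return key

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def link_datasets(datasets):
    """Group dataset dicts ("namespace", "name", optional "facets") that refer to the same data."""
    linker = DatasetLinker()
    keys_by_uri = defaultdict(list)
    for ds in datasets:
        # 1. Identical namespace + name collapse onto the same key automatically.
        key = (ds["namespace"], ds["name"])
        linker.find(key)
        facets = ds.get("facets", {})
        # 2. SymlinksDatasetFacet: every declared identifier is an alias.
        for ident in facets.get("symlinks", {}).get("identifiers", []):
            linker.union(key, (ident["namespace"], ident["name"]))
        # 3. Remember dataSource URIs so matching datasets can be joined below.
        uri = facets.get("dataSource", {}).get("uri")
        if uri:
            keys_by_uri[uri].append(key)
    for keys in keys_by_uri.values():
        for other in keys[1:]:
            linker.union(keys[0], other)
    groups = defaultdict(set)
    for key in list(linker.parent):
        groups[linker.find(key)].add(key)
    return list(groups.values())
```

This follows rule 3 literally. Because `dataSource.uri` often identifies a whole database or bucket rather than one dataset, a production implementation may need a stricter match than URI equality alone.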

### RBAC for Lineage

The OpenLineage consumer integrates with Feast's existing RBAC:

- **Write access** (producers sending events): authenticated with the API key, sent in the `X-API-Key` header or as `Authorization: Bearer <key>`.
- **Read access** (UI viewing lineage): `namespace_mapping` maps OpenLineage namespaces to Feast projects, and users see lineage data only for namespaces whose mapped projects they have access to.
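
One way to read the namespace-based read rule: resolve which namespaces map to projects the user can read, then drop every node outside those namespaces and every edge that touches a dropped node. This sketch assumes that namespaces absent from `namespace_mapping` are hidden (deny by default) and uses a hypothetical node shape with `id` and `namespace` keys; the page does not specify Feast's behavior for unmapped namespaces.

```python
def visible_namespaces(namespace_mapping, allowed_projects):
    """OpenLineage namespaces whose mapped Feast project the user may read."""
    return {ns for ns, project in namespace_mapping.items() if project in allowed_projects}


def filter_lineage(nodes, edges, namespace_mapping, allowed_projects):
    """Keep nodes in visible namespaces and edges whose endpoints both survive.

    Assumption: namespaces missing from namespace_mapping are hidden.
    """
    allowed = visible_namespaces(namespace_mapping, allowed_projects)
    kept_nodes = [n for n in nodes if n["namespace"] in allowed]
    kept_ids = {n["id"] for n in kept_nodes}
    kept_edges = [(src, dst) for src, dst in edges if src in kept_ids and dst in kept_ids]
    return kept_nodes, kept_edges
```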

### Database Schema

The consumer creates the following tables automatically on first startup:

| Table | Purpose |
|-------|---------|
| `openlineage_events` | Raw event storage with JSON payloads |
| `openlineage_jobs` | Deduplicated job records with producer, description, and facets |
| `openlineage_datasets` | Deduplicated dataset records with schema, facets, and Feast mapping |
| `openlineage_runs` | Run lifecycle tracking (START/COMPLETE/FAIL) |
| `openlineage_run_io` | Input/output relationships between runs and datasets |
| `openlineage_lineage_edges` | Materialized lineage graph edges for efficient traversal |
| `openlineage_dataset_symlinks` | Cross-producer dataset linking via `SymlinksDatasetFacet` and `dataSource` URI matching |

By default these tables are created in the **same database** as the SQL registry (hybrid storage). Set `consumer.connection_string` to store them in a separate database instead.
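
As an illustration of how run input/output records can become traversable edges, the sketch below derives `dataset → job → dataset` edges from OpenLineage events and stores each edge once in an in-memory SQLite table. The table name `lineage_edges` and its columns are hypothetical and do not reflect the actual layout of `openlineage_lineage_edges`.

```python
import sqlite3

# Hypothetical table layout for illustration; not Feast's actual schema.
CREATE_EDGES = """
CREATE TABLE IF NOT EXISTS lineage_edges (
    src_type TEXT NOT NULL, src_ns TEXT NOT NULL, src_name TEXT NOT NULL,
    dst_type TEXT NOT NULL, dst_ns TEXT NOT NULL, dst_name TEXT NOT NULL,
    PRIMARY KEY (src_type, src_ns, src_name, dst_type, dst_ns, dst_name)
)
"""


def edges_from_event(event):
    """Yield (source, target) node keys: input dataset -> job -> output dataset."""
    job = ("job", event["job"]["namespace"], event["job"]["name"])
    for ds in event.get("inputs", []):
        yield ("dataset", ds["namespace"], ds["name"]), job
    for ds in event.get("outputs", []):
        yield job, ("dataset", ds["namespace"], ds["name"])


def materialize_edges(conn, events):
    """Insert each edge once; the primary key makes repeated START/COMPLETE events idempotent."""
    conn.execute(CREATE_EDGES)
    for event in events:
        for src, dst in edges_from_event(event):
            conn.execute(
                "INSERT OR IGNORE INTO lineage_edges VALUES (?, ?, ?, ?, ?, ?)",
                (*src, *dst),
            )
    conn.commit()


def downstream_of(conn, node):
    """One-hop downstream neighbors of a (type, namespace, name) node key."""
    rows = conn.execute(
        "SELECT dst_type, dst_ns, dst_name FROM lineage_edges"
        " WHERE src_type = ? AND src_ns = ? AND src_name = ?",
        node,
    ).fetchall()
    return sorted(rows)
```

Precomputing edges this way lets graph queries walk a small indexed table instead of re-parsing raw event JSON on every request.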
29 changes: 29 additions & 0 deletions infra/feast-operator/api/v1/featurestore_types.go
@@ -111,6 +111,35 @@ type OpenLineageConfig struct {
	// Keys must be valid Feast OpenLineageConfig YAML field names.
	// +optional
	ExtraConfig map[string]string `json:"extraConfig,omitempty"`
	// Consumer configures the OpenLineage consumer (event receiver) that enables
	// Feast to receive and display lineage from external producers (Airflow, Spark, dbt, etc.).
	// +optional
	Consumer *OpenLineageConsumerConfig `json:"consumer,omitempty"`
}

// OpenLineageConsumerConfig configures the OpenLineage consumer (event receiver).
// When enabled, the Feast REST server exposes POST /api/v1/lineage to receive
// OpenLineage events from any producer, storing them for visualization in the Feast UI.
type OpenLineageConsumerConfig struct {
	// Enable the OpenLineage consumer.
	Enabled bool `json:"enabled"`
	// StoreType is the storage backend for lineage events. Currently only "sql" is supported.
	// +kubebuilder:default="sql"
	// +kubebuilder:validation:Enum=sql
	// +optional
	StoreType *string `json:"storeType,omitempty"`
	// Reference to a Secret containing the key "connection_string" for a separate
	// lineage database. If omitted, the SQL registry database is reused.
	// +optional
	ConnectionStringSecretRef *corev1.LocalObjectReference `json:"connectionStringSecretRef,omitempty"`
	// Reference to a Secret containing the key "api_key" that producers must
	// provide in the X-API-Key header when sending events.
	// +optional
	ApiKeySecretRef *corev1.LocalObjectReference `json:"apiKeySecretRef,omitempty"`
	// NamespaceMapping maps OpenLineage namespaces to Feast projects for
	// RBAC-based filtering of lineage data in the UI.
	// +optional
	NamespaceMapping map[string]string `json:"namespaceMapping,omitempty"`
}

// FeatureStoreSpec defines the desired state of FeatureStore
42 changes: 42 additions & 0 deletions infra/feast-operator/api/v1/zz_generated.deepcopy.go
