Merged
40 commits
e7c9113
feat: Modernize precommit hooks and optimize test performance
franciscojavierarceo Jan 31, 2026
031a978
fix: Run uv commands from root to use pyproject.toml
franciscojavierarceo Jan 31, 2026
a23152e
fix: Use --no-project for mypy to run from sdk/python
franciscojavierarceo Jan 31, 2026
ae52fcc
fix: Simplify precommit config to use make targets
franciscojavierarceo Jan 31, 2026
88cefb3
fix: Use uv run --extra ci for tests to include all deps
franciscojavierarceo Jan 31, 2026
759dd9e
fix: Fix import sorting in snowflake bootstrap.py
franciscojavierarceo Jan 31, 2026
ff4548e
feat: Modernize development workflow with uv integration and CI perfo…
franciscojavierarceo Feb 2, 2026
de69e92
fix: Resolve MyPy type error in MilvusOnlineStoreCreator
franciscojavierarceo Feb 2, 2026
c8bbf87
fix: Ensure feast module is accessible in CI smoke tests
franciscojavierarceo Feb 2, 2026
46ed9d9
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 2, 2026
df45285
fix: Ensure feast module is accessible in CI smoke tests
franciscojavierarceo Feb 2, 2026
6bc12c2
Apply suggestion from @Copilot
franciscojavierarceo Feb 2, 2026
eb6b346
refactor: Simplify Makefile with consistent uv run usage
franciscojavierarceo Feb 2, 2026
5417ab2
fix: Use uv sync for CI to enable consistent uv run usage
franciscojavierarceo Feb 3, 2026
6f6c736
fix: Use uv run in smoke tests for virtualenv compatibility
franciscojavierarceo Feb 3, 2026
9dae77f
chore: Untrack perf-monitor.py development utility
franciscojavierarceo Feb 3, 2026
63c8f3b
fix: Address review feedback for pytest.ini and Makefile
franciscojavierarceo Feb 3, 2026
2068303
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 3, 2026
0da7c1d
fix: Configure environment paths for Ray worker compatibility
franciscojavierarceo Feb 3, 2026
8848a45
fix: Install make and fix Python paths in CI
franciscojavierarceo Feb 3, 2026
4904104
fix: Use RUNNER_OS environment variable correctly
franciscojavierarceo Feb 3, 2026
60466b2
fix: Ensure PATH is properly exported in test step
franciscojavierarceo Feb 3, 2026
97cd848
fix: Use dynamic site-packages detection for cross-platform compatibi…
franciscojavierarceo Feb 4, 2026
386c7cf
debug: Add Python 3.11 macOS debugging and compatibility workarounds
franciscojavierarceo Feb 4, 2026
d8b156c
fix: Apply macOS Ray compatibility workarounds to all Python versions
franciscojavierarceo Feb 4, 2026
0b6d274
fix: Make PYTHONPATH additive to support both Ray workers and CLI tests
franciscojavierarceo Feb 4, 2026
c530cf6
fix: Skip ray_transformation doctests to avoid macOS Ray worker timeouts
franciscojavierarceo Feb 5, 2026
dec75eb
chore: Remove feast_profile_demo from git tracking
franciscojavierarceo Feb 5, 2026
f50366b
fix: Skip test_e2e_local on macOS CI due to Ray/uv subprocess issues
franciscojavierarceo Feb 5, 2026
282558a
fix: Skip CLI tests on macOS CI due to Ray/uv subprocess issues
franciscojavierarceo Feb 6, 2026
f8051e1
chore: Remove perf-monitor.py from git tracking
franciscojavierarceo Feb 6, 2026
0e111fc
fix: Use uv pip sync with virtualenv instead of uv sync
franciscojavierarceo Feb 6, 2026
12b4a72
updated
franciscojavierarceo Feb 6, 2026
2d54924
fix: Skip test_cli_chdir on macOS CI and use uv run pytest for REST A…
franciscojavierarceo Feb 9, 2026
c69a5c8
fix: Run uv commands from repo root to use correct virtualenv
franciscojavierarceo Feb 9, 2026
cf72f4e
fix: Handle missing arguments gracefully in mypy-daemon.sh
franciscojavierarceo Feb 9, 2026
ad90593
chore: Revert .gitignore changes
franciscojavierarceo Feb 9, 2026
3e827bc
fix: Restore ruff format --check in lint-python target
franciscojavierarceo Feb 9, 2026
9552822
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 9, 2026
9c499ad
Apply suggestion from @ntkathole
franciscojavierarceo Feb 9, 2026
fix: Configure environment paths for Ray worker compatibility
Use PYTHONPATH and PATH env vars to ensure Ray workers can access
packages installed by uv sync, maintaining consistent uv usage
across all make targets while supporting subprocess tools.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
franciscojavierarceo and claude committed Feb 3, 2026
commit 0da7c1ddfbd1e67c256f75ed635b8f3546c32104
3 changes: 3 additions & 0 deletions .github/workflows/unit_tests.yml
@@ -36,6 +36,9 @@ jobs:
      - name: Install dependencies
        run: make install-python-dependencies-ci
      - name: Test Python
        env:
          PYTHONPATH: "/home/runner/work/feast/feast/.venv/lib/python${{ matrix.python-version }}/site-packages:$PYTHONPATH"
          PATH: "/home/runner/work/feast/feast/.venv/bin:$PATH"
        run: make test-python-unit
      - name: Minimize uv cache
        run: uv cache prune --ci
45 changes: 45 additions & 0 deletions feast_profile_demo/.gitignore
@@ -0,0 +1,45 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*.pyo
*.pyd

# C extensions
*.so

# Distribution / packaging
.Python
env/
venv/
ENV/
env.bak/
venv.bak/
*.egg-info/
dist/
build/
.venv

# Pytest
.cache
*.cover
*.log
.coverage
nosetests.xml
coverage.xml
*.hypothesis/
*.pytest_cache/

# Jupyter Notebook
.ipynb_checkpoints

# IDEs and Editors
.vscode/
.idea/
*.swp
*.swo
*.sublime-workspace
*.sublime-project

# OS generated files
.DS_Store
Thumbs.db
29 changes: 29 additions & 0 deletions feast_profile_demo/README.md
@@ -0,0 +1,29 @@
# Feast Quickstart
If you haven't already, check out the quickstart guide on Feast's website (http://docs.feast.dev/quickstart), which
uses this repo. A quick view of what's in this repository's `feature_repo/` directory:

* `data/` contains raw demo parquet data
* `feature_repo/feature_definitions.py` contains demo feature definitions
* `feature_repo/feature_store.yaml` contains a demo setup configuring where data sources are
* `feature_repo/test_workflow.py` showcases how to run all key Feast commands, including defining, retrieving, and pushing features.

You can run the overall workflow with `python test_workflow.py`.

## To move from this into a more production-ready workflow
> See more details in [Running Feast in production](https://docs.feast.dev/how-to-guides/running-feast-in-production)

1. First, start from a different Feast template that delegates to a more scalable offline store.
- For example, running `feast init -t gcp`
or `feast init -t aws` or `feast init -t snowflake`.
- You can see your options if you run `feast init --help`.
2. `feature_store.yaml` points to a local file as a registry. You'll want to set up a remote file (e.g. in S3/GCS) or a
SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details.
3. This example uses a file [offline store](https://docs.feast.dev/getting-started/components/offline-store)
to generate training data, which does not scale. We recommend using a data warehouse such as BigQuery,
Snowflake, or Redshift instead. There is experimental support for Spark as well.
4. Set up CI/CD and dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions).
5. (optional) Regularly scheduled materialization to power low latency feature retrieval (e.g. via Airflow). See [Batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion#batch-data-ingestion)
for more details.
6. (optional) Deploy feature server instances with `feast serve` to expose endpoints to retrieve online features.
- See [Python feature server](https://docs.feast.dev/reference/feature-servers/python-feature-server) for details.
- Use cases can also directly call the Feast client to fetch features as per [Feature retrieval](https://docs.feast.dev/getting-started/concepts/feature-retrieval)
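Step 5's scheduled materialization can be scripted. A minimal sketch, assuming a repo on which `feast apply` has already been run (the `feature_repo` path and the daily window are illustrative, not from this PR):

```python
from datetime import datetime, timedelta, timezone


def backfill_window(now: datetime, days: int = 1):
    """Compute the [start, end] window for one incremental materialization run."""
    end = now
    start = end - timedelta(days=days)
    return start, end


start, end = backfill_window(datetime.now(timezone.utc))

# With Feast installed and the repo registered, the actual call would be:
# from feast import FeatureStore
# store = FeatureStore(repo_path="feature_repo")  # hypothetical path
# store.materialize(start_date=start, end_date=end)
```

An Airflow task (or any scheduler) would simply invoke this once per day.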
Empty file added feast_profile_demo/__init__.py
Empty file.
152 changes: 152 additions & 0 deletions feast_profile_demo/feature_repo/README_Profiling.md
@@ -0,0 +1,152 @@
# Feast Performance Profiling Suite

## Overview

This repository contains a comprehensive performance profiling suite for Feast's feature serving infrastructure. The profiling tools help identify bottlenecks in FeatureStore operations, FastAPI server performance, and component-level inefficiencies.

## Files Created

### Core Profiling Scripts

1. **`profiling_utils.py`** - Shared utilities for cProfile management, timing, memory tracking
2. **`profile_feature_store.py`** - Direct FeatureStore.get_online_features() profiling
3. **`profile_feature_server.py`** - FastAPI server endpoint profiling (requires requests, aiohttp)
4. **`profile_components.py`** - Component isolation profiling (protobuf, registry, etc.)
5. **`profiling_analysis.md`** - Comprehensive analysis of performance findings

### Generated Reports

- **CSV Reports**: Quantitative performance data in `profiling_results/*/profiling_summary_*.csv`
- **Profile Files**: Detailed cProfile outputs (`.prof` files) for snakeviz analysis
- **Memory Analysis**: Tracemalloc snapshots for memory usage patterns

## Key Performance Findings

### Major Bottlenecks Identified

1. **FeatureStore Initialization: 2.4-2.5 seconds**
- Primary bottleneck for serverless deployments
- Heavy import and dependency loading overhead
- 99.8% of initialization time spent in `feature_store.py:123(__init__)`

2. **On-Demand Feature Views: 4x Performance Penalty**
- Standard features: ~2ms per request
- With ODFVs: ~8ms per request
- Bottleneck: `on_demand_feature_view.py:819(transform_arrow)`

3. **Feature Services: 129% Overhead vs Direct Features**
- Direct features: 7ms
- Feature service: 16ms
- Additional registry traversal costs

### Scaling Characteristics

- **Entity Count**: Linear scaling (good)
- 1 entity: 2ms
- 1000 entities: 22ms
- **Memory Usage**: Efficient (<1MB for most operations)
- **Provider Abstraction**: Minimal overhead

## Usage Instructions

### Quick Start

```bash
# Run basic FeatureStore profiling
python profile_feature_store.py

# Run component isolation tests
python profile_components.py

# For FastAPI server profiling (requires additional deps):
pip install requests aiohttp
python profile_feature_server.py
```

### Custom Profiling

```python
from profiling_utils import FeastProfiler
from feast import FeatureStore

profiler = FeastProfiler("my_results")

with profiler.profile_context("my_test") as result:
    store = FeatureStore(repo_path=".")

    with profiler.time_operation("feature_retrieval", result):
        response = store.get_online_features(...)

    # Add custom metrics
    result.add_timing("custom_metric", some_value)

# Generate reports
profiler.print_summary()
profiler.generate_csv_report()
```

### Analysis Tools

```bash
# View interactive call graphs
pip install snakeviz
snakeviz profiling_results/components/my_test_*.prof

# Analyze CSV reports (Python; pd.read_csv does not expand glob patterns)
python - <<'PY'
import glob
import pandas as pd
df = pd.concat(pd.read_csv(p) for p in glob.glob("profiling_results/*/profiling_summary_*.csv"))
print(df.head())
PY
```

## Optimization Priorities

### High Impact (>100ms improvement potential)

1. **Optimize FeatureStore initialization** - Lazy loading, import optimization
2. **On-Demand Feature View optimization** - Arrow operations, vectorization

### Medium Impact (10-100ms improvement potential)

3. **Entity batch processing** - Vectorized operations for large batches
4. **Response serialization** - Streaming, protobuf optimization

### Low Impact (<10ms improvement potential)

5. **Registry operations** - Already efficient, minor optimizations possible

## Environment Setup

This profiling was conducted with:
- **Data**: Local SQLite online store, 15 days × 5 drivers hourly stats
- **Features**: Standard numerical features + on-demand transformations
- **Scale**: 1-1000 entities, 1-5 features per request
- **Provider**: Local SQLite (provider-agnostic bottlenecks identified)

## Production Recommendations

### For High-Throughput Serving

1. **Pre-initialize FeatureStore** - Keep warm instances to avoid 2.4s cold start
2. **Minimize ODFV usage** - Consider pre-computation for performance-critical paths
3. **Use direct feature lists** - Avoid feature service overhead when possible
4. **Batch entity requests** - Linear scaling makes batching efficient

### For Serverless Deployment

1. **Investigate initialization optimization** - Biggest impact for cold starts
2. **Consider connection pooling** - Reduce per-request overhead
3. **Monitor memory usage** - Current usage is efficient (<1MB typical)

### For Development

1. **Use profiling suite** - Regular performance regression testing
2. **Benchmark new features** - Especially ODFV implementations
3. **Monitor provider changes** - Verify abstraction layer efficiency

## Next Steps

1. **Run FastAPI server profiling** with proper dependencies
2. **Implement optimization recommendations** starting with high-impact items
3. **Establish continuous profiling** in CI/CD pipeline
4. **Profile production workloads** to validate findings

This profiling suite provides the foundation for ongoing Feast performance optimization and monitoring.
Empty file.
Binary file not shown.
Binary file not shown.
148 changes: 148 additions & 0 deletions feast_profile_demo/feature_repo/feature_definitions.py
@@ -0,0 +1,148 @@
# This is an example feature definition file

from datetime import timedelta

import pandas as pd

from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    Project,
    PushSource,
    RequestSource,
)
from feast.feature_logging import LoggingConfig
from feast.infra.offline_stores.file_source import FileLoggingDestination
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64

# Define a project for the feature repo
project = Project(name="feast_profile_demo", description="A project for driver statistics")

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps, and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of features defined below acts as a schema: it defines the features
    # to materialize into a store and provides the references used during retrieval
    # when building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ],
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


# This groups features into a model version
driver_activity_v1 = FeatureService(
    name="driver_activity_v1",
    features=[
        driver_stats_fv[["conv_rate"]],  # Sub-selects a feature from a feature view
        transformed_conv_rate,  # Selects all features from the feature view
    ],
    logging_config=LoggingConfig(
        destination=FileLoggingDestination(path="data")
    ),
)
driver_activity_v2 = FeatureService(
    name="driver_activity_v2", features=[driver_stats_fv, transformed_conv_rate]
)

# Defines a way to push data (to be available offline, online or both) into Feast.
driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_source,
)

# Defines a slightly modified version of the feature view from above, where the source
# has been changed to the push source. This allows fresh features to be directly pushed
# to the online store for this feature view.
driver_stats_fresh_fv = FeatureView(
    name="driver_hourly_stats_fresh",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_push_source,  # Changed from above
    tags={"team": "driver_performance"},
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fresh_fv, input_request],  # relies on fresh version of FV
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


driver_activity_v3 = FeatureService(
    name="driver_activity_v3",
    features=[driver_stats_fresh_fv, transformed_conv_rate_fresh],
)
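With these definitions applied, online retrieval references features as `<view>:<feature>`. A sketch of the request shape — the retrieval call is commented out, and the entity values are illustrative:

```python
# Feature references for the views defined above
features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "transformed_conv_rate:conv_rate_plus_val1",
]
# RequestSource fields must be supplied alongside the join key
entity_rows = [{"driver_id": 1001, "val_to_add": 1, "val_to_add_2": 10}]

# With an applied repo and Feast installed:
# from feast import FeatureStore
# store = FeatureStore(repo_path=".")
# online = store.get_online_features(features=features, entity_rows=entity_rows).to_dict()

# The ODFV is plain column addition, so its output is easy to predict:
def conv_rate_plus_val1(conv_rate: float, val_to_add: int) -> float:
    return conv_rate + val_to_add
```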