Merged
40 commits
e7c9113
feat: Modernize precommit hooks and optimize test performance
franciscojavierarceo Jan 31, 2026
031a978
fix: Run uv commands from root to use pyproject.toml
franciscojavierarceo Jan 31, 2026
a23152e
fix: Use --no-project for mypy to run from sdk/python
franciscojavierarceo Jan 31, 2026
ae52fcc
fix: Simplify precommit config to use make targets
franciscojavierarceo Jan 31, 2026
88cefb3
fix: Use uv run --extra ci for tests to include all deps
franciscojavierarceo Jan 31, 2026
759dd9e
fix: Fix import sorting in snowflake bootstrap.py
franciscojavierarceo Jan 31, 2026
ff4548e
feat: Modernize development workflow with uv integration and CI perfo…
franciscojavierarceo Feb 2, 2026
de69e92
fix: Resolve MyPy type error in MilvusOnlineStoreCreator
franciscojavierarceo Feb 2, 2026
c8bbf87
fix: Ensure feast module is accessible in CI smoke tests
franciscojavierarceo Feb 2, 2026
46ed9d9
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 2, 2026
df45285
fix: Ensure feast module is accessible in CI smoke tests
franciscojavierarceo Feb 2, 2026
6bc12c2
Apply suggestion from @Copilot
franciscojavierarceo Feb 2, 2026
eb6b346
refactor: Simplify Makefile with consistent uv run usage
franciscojavierarceo Feb 2, 2026
5417ab2
fix: Use uv sync for CI to enable consistent uv run usage
franciscojavierarceo Feb 3, 2026
6f6c736
fix: Use uv run in smoke tests for virtualenv compatibility
franciscojavierarceo Feb 3, 2026
9dae77f
chore: Untrack perf-monitor.py development utility
franciscojavierarceo Feb 3, 2026
63c8f3b
fix: Address review feedback for pytest.ini and Makefile
franciscojavierarceo Feb 3, 2026
2068303
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 3, 2026
0da7c1d
fix: Configure environment paths for Ray worker compatibility
franciscojavierarceo Feb 3, 2026
8848a45
fix: Install make and fix Python paths in CI
franciscojavierarceo Feb 3, 2026
4904104
fix: Use RUNNER_OS environment variable correctly
franciscojavierarceo Feb 3, 2026
60466b2
fix: Ensure PATH is properly exported in test step
franciscojavierarceo Feb 3, 2026
97cd848
fix: Use dynamic site-packages detection for cross-platform compatibi…
franciscojavierarceo Feb 4, 2026
386c7cf
debug: Add Python 3.11 macOS debugging and compatibility workarounds
franciscojavierarceo Feb 4, 2026
d8b156c
fix: Apply macOS Ray compatibility workarounds to all Python versions
franciscojavierarceo Feb 4, 2026
0b6d274
fix: Make PYTHONPATH additive to support both Ray workers and CLI tests
franciscojavierarceo Feb 4, 2026
c530cf6
fix: Skip ray_transformation doctests to avoid macOS Ray worker timeouts
franciscojavierarceo Feb 5, 2026
dec75eb
chore: Remove feast_profile_demo from git tracking
franciscojavierarceo Feb 5, 2026
f50366b
fix: Skip test_e2e_local on macOS CI due to Ray/uv subprocess issues
franciscojavierarceo Feb 5, 2026
282558a
fix: Skip CLI tests on macOS CI due to Ray/uv subprocess issues
franciscojavierarceo Feb 6, 2026
f8051e1
chore: Remove perf-monitor.py from git tracking
franciscojavierarceo Feb 6, 2026
0e111fc
fix: Use uv pip sync with virtualenv instead of uv sync
franciscojavierarceo Feb 6, 2026
12b4a72
updated
franciscojavierarceo Feb 6, 2026
2d54924
fix: Skip test_cli_chdir on macOS CI and use uv run pytest for REST A…
franciscojavierarceo Feb 9, 2026
c69a5c8
fix: Run uv commands from repo root to use correct virtualenv
franciscojavierarceo Feb 9, 2026
cf72f4e
fix: Handle missing arguments gracefully in mypy-daemon.sh
franciscojavierarceo Feb 9, 2026
ad90593
chore: Revert .gitignore changes
franciscojavierarceo Feb 9, 2026
3e827bc
fix: Restore ruff format --check in lint-python target
franciscojavierarceo Feb 9, 2026
9552822
Merge branch 'master' into feat/precommit-test-performance-optimization
franciscojavierarceo Feb 9, 2026
9c499ad
Apply suggestion from @ntkathole
franciscojavierarceo Feb 9, 2026
fix: Configure environment paths for Ray worker compatibility
Use PYTHONPATH and PATH env vars to ensure Ray workers can access
packages installed by uv sync, maintaining consistent uv usage
across all make targets while supporting subprocess tools.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
franciscojavierarceo and claude committed Feb 3, 2026
commit 0da7c1ddfbd1e67c256f75ed635b8f3546c32104
3 changes: 3 additions & 0 deletions .github/workflows/unit_tests.yml
@@ -36,6 +36,9 @@ jobs:
      - name: Install dependencies
        run: make install-python-dependencies-ci
      - name: Test Python
        env:
          PYTHONPATH: "/home/runner/work/feast/feast/.venv/lib/python${{ matrix.python-version }}/site-packages:$PYTHONPATH"
          PATH: "/home/runner/work/feast/feast/.venv/bin:$PATH"
        run: make test-python-unit
      - name: Minimize uv cache
        run: uv cache prune --ci
45 changes: 45 additions & 0 deletions feast_profile_demo/.gitignore
@@ -0,0 +1,45 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*.pyo
*.pyd

# C extensions
*.so

# Distribution / packaging
.Python
env/
venv/
ENV/
env.bak/
venv.bak/
*.egg-info/
dist/
build/
.venv

# Pytest
.cache
*.cover
*.log
.coverage
nosetests.xml
coverage.xml
*.hypothesis/
*.pytest_cache/

# Jupyter Notebook
.ipynb_checkpoints

# IDEs and Editors
.vscode/
.idea/
*.swp
*.swo
*.sublime-workspace
*.sublime-project

# OS generated files
.DS_Store
Thumbs.db
29 changes: 29 additions & 0 deletions feast_profile_demo/README.md
@@ -0,0 +1,29 @@
# Feast Quickstart
If you haven't already, check out the quickstart guide on Feast's website (http://docs.feast.dev/quickstart), which
uses this repo. A quick view of what's in this repository's `feature_repo/` directory:

* `data/` contains raw demo parquet data
* `feature_repo/feature_definitions.py` contains demo feature definitions
* `feature_repo/feature_store.yaml` contains a demo setup configuring where data sources are
* `feature_repo/test_workflow.py` showcases how to run all key Feast commands, including defining, retrieving, and pushing features.

You can run the overall workflow with `python test_workflow.py`.

## To move from this into a more production-ready workflow
> See more details in [Running Feast in production](https://docs.feast.dev/how-to-guides/running-feast-in-production)

1. First, start from a different Feast template that delegates to a more scalable offline store.
- For example, running `feast init -t gcp`
or `feast init -t aws` or `feast init -t snowflake`.
- You can see your options if you run `feast init --help`.
2. `feature_store.yaml` points to a local file as a registry. You'll want to set up a remote file (e.g. in S3/GCS) or a
SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details.
3. This example uses a file [offline store](https://docs.feast.dev/getting-started/components/offline-store)
to generate training data, which does not scale. We recommend using a data warehouse such as BigQuery,
Snowflake, or Redshift instead. There is experimental support for Spark as well.
4. Set up CI/CD and dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions).
5. (optional) Regularly scheduled materialization to power low latency feature retrieval (e.g. via Airflow). See [Batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion#batch-data-ingestion)
for more details.
6. (optional) Deploy feature server instances with `feast serve` to expose endpoints to retrieve online features.
- See [Python feature server](https://docs.feast.dev/reference/feature-servers/python-feature-server) for details.
- Use cases can also directly call the Feast client to fetch features as per [Feature retrieval](https://docs.feast.dev/getting-started/concepts/feature-retrieval)
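Step 5's scheduled materialization can be scripted. A minimal sketch, assuming a repo on which `feast apply` has already been run (the `feature_repo` path and the daily window are illustrative, not from this PR):

```python
from datetime import datetime, timedelta, timezone


def backfill_window(now: datetime, days: int = 1):
    """Compute the [start, end] window for one incremental materialization run."""
    end = now
    start = end - timedelta(days=days)
    return start, end


start, end = backfill_window(datetime.now(timezone.utc))

# With Feast installed and the repo registered, the actual call would be:
# from feast import FeatureStore
# store = FeatureStore(repo_path="feature_repo")  # hypothetical path
# store.materialize(start_date=start, end_date=end)
```

An Airflow task (or any scheduler) would simply invoke this once per day.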
Empty file added feast_profile_demo/__init__.py
Empty file.
152 changes: 152 additions & 0 deletions feast_profile_demo/feature_repo/README_Profiling.md
@@ -0,0 +1,152 @@
# Feast Performance Profiling Suite

## Overview

This repository contains a comprehensive performance profiling suite for Feast's feature serving infrastructure. The profiling tools help identify bottlenecks in FeatureStore operations, FastAPI server performance, and component-level inefficiencies.

## Files Created

### Core Profiling Scripts

1. **`profiling_utils.py`** - Shared utilities for cProfile management, timing, memory tracking
2. **`profile_feature_store.py`** - Direct FeatureStore.get_online_features() profiling
3. **`profile_feature_server.py`** - FastAPI server endpoint profiling (requires requests, aiohttp)
4. **`profile_components.py`** - Component isolation profiling (protobuf, registry, etc.)
5. **`profiling_analysis.md`** - Comprehensive analysis of performance findings

### Generated Reports

- **CSV Reports**: Quantitative performance data in `profiling_results/*/profiling_summary_*.csv`
- **Profile Files**: Detailed cProfile outputs (`.prof` files) for snakeviz analysis
- **Memory Analysis**: Tracemalloc snapshots for memory usage patterns

## Key Performance Findings

### Major Bottlenecks Identified

1. **FeatureStore Initialization: 2.4-2.5 seconds**
- Primary bottleneck for serverless deployments
- Heavy import and dependency loading overhead
- 99.8% of initialization time spent in `feature_store.py:123(__init__)`

2. **On-Demand Feature Views: 4x Performance Penalty**
- Standard features: ~2ms per request
- With ODFVs: ~8ms per request
- Bottleneck: `on_demand_feature_view.py:819(transform_arrow)`

3. **Feature Services: 129% Overhead vs Direct Features**
- Direct features: 7ms
- Feature service: 16ms
- Additional registry traversal costs

### Scaling Characteristics

- **Entity Count**: Linear scaling (good)
- 1 entity: 2ms
- 1000 entities: 22ms
- **Memory Usage**: Efficient (<1MB for most operations)
- **Provider Abstraction**: Minimal overhead

## Usage Instructions

### Quick Start

```bash
# Run basic FeatureStore profiling
python profile_feature_store.py

# Run component isolation tests
python profile_components.py

# For FastAPI server profiling (requires additional deps):
pip install requests aiohttp
python profile_feature_server.py
```

### Custom Profiling

```python
from profiling_utils import FeastProfiler
from feast import FeatureStore

profiler = FeastProfiler("my_results")

with profiler.profile_context("my_test") as result:
    store = FeatureStore(repo_path=".")

    with profiler.time_operation("feature_retrieval", result):
        response = store.get_online_features(...)

    # Add custom metrics
    result.add_timing("custom_metric", some_value)

# Generate reports
profiler.print_summary()
profiler.generate_csv_report()
```

### Analysis Tools

```bash
# View interactive call graphs
pip install snakeviz
snakeviz profiling_results/components/my_test_*.prof

# Analyze CSV reports (Python; pd.read_csv does not expand glob patterns)
python - <<'PY'
import glob
import pandas as pd
df = pd.concat(pd.read_csv(p) for p in glob.glob("profiling_results/*/profiling_summary_*.csv"))
print(df.head())
PY
```

## Optimization Priorities

### High Impact (>100ms improvement potential)

1. **Optimize FeatureStore initialization** - Lazy loading, import optimization
2. **On-Demand Feature View optimization** - Arrow operations, vectorization

### Medium Impact (10-100ms improvement potential)

3. **Entity batch processing** - Vectorized operations for large batches
4. **Response serialization** - Streaming, protobuf optimization

### Low Impact (<10ms improvement potential)

5. **Registry operations** - Already efficient, minor optimizations possible

## Environment Setup

This profiling was conducted with:
- **Data**: Local SQLite online store, 15 days × 5 drivers hourly stats
- **Features**: Standard numerical features + on-demand transformations
- **Scale**: 1-1000 entities, 1-5 features per request
- **Provider**: Local SQLite (provider-agnostic bottlenecks identified)

## Production Recommendations

### For High-Throughput Serving

1. **Pre-initialize FeatureStore** - Keep warm instances to avoid 2.4s cold start
2. **Minimize ODFV usage** - Consider pre-computation for performance-critical paths
3. **Use direct feature lists** - Avoid feature service overhead when possible
4. **Batch entity requests** - Linear scaling makes batching efficient

### For Serverless Deployment

1. **Investigate initialization optimization** - Biggest impact for cold starts
2. **Consider connection pooling** - Reduce per-request overhead
3. **Monitor memory usage** - Current usage is efficient (<1MB typical)

### For Development

1. **Use profiling suite** - Regular performance regression testing
2. **Benchmark new features** - Especially ODFV implementations
3. **Monitor provider changes** - Verify abstraction layer efficiency

## Next Steps

1. **Run FastAPI server profiling** with proper dependencies
2. **Implement optimization recommendations** starting with high-impact items
3. **Establish continuous profiling** in CI/CD pipeline
4. **Profile production workloads** to validate findings

This profiling suite provides the foundation for ongoing Feast performance optimization and monitoring.
Empty file.
Binary file not shown.
Binary file not shown.
148 changes: 148 additions & 0 deletions feast_profile_demo/feature_repo/feature_definitions.py
@@ -0,0 +1,148 @@
# This is an example feature definition file

from datetime import timedelta

import pandas as pd

from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    Project,
    PushSource,
    RequestSource,
)
from feast.feature_logging import LoggingConfig
from feast.infra.offline_stores.file_source import FileLoggingDestination
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64

# Define a project for the feature repo
project = Project(name="feast_profile_demo", description="A project for driver statistics")

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps, and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of features defined below acts as a schema: it defines the features
    # to materialize into a store and provides the references used during retrieval
    # when building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user initiated HTTP request)
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ],
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


# This groups features into a model version
driver_activity_v1 = FeatureService(
    name="driver_activity_v1",
    features=[
        driver_stats_fv[["conv_rate"]],  # Sub-selects a feature from a feature view
        transformed_conv_rate,  # Selects all features from the feature view
    ],
    logging_config=LoggingConfig(
        destination=FileLoggingDestination(path="data")
    ),
)
driver_activity_v2 = FeatureService(
    name="driver_activity_v2", features=[driver_stats_fv, transformed_conv_rate]
)

# Defines a way to push data (to be available offline, online or both) into Feast.
driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_source,
)

# Defines a slightly modified version of the feature view from above, where the source
# has been changed to the push source. This allows fresh features to be directly pushed
# to the online store for this feature view.
driver_stats_fresh_fv = FeatureView(
    name="driver_hourly_stats_fresh",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_push_source,  # Changed from above
    tags={"team": "driver_performance"},
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fresh_fv, input_request],  # relies on fresh version of FV
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


driver_activity_v3 = FeatureService(
    name="driver_activity_v3",
    features=[driver_stats_fresh_fv, transformed_conv_rate_fresh],
)
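With these definitions applied, online retrieval references features as `<view>:<feature>`. A sketch of the request shape — the retrieval call is commented out, and the entity values are illustrative:

```python
# Feature references for the views defined above
features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "transformed_conv_rate:conv_rate_plus_val1",
]
# RequestSource fields must be supplied alongside the join key
entity_rows = [{"driver_id": 1001, "val_to_add": 1, "val_to_add_2": 10}]

# With an applied repo and Feast installed:
# from feast import FeatureStore
# store = FeatureStore(repo_path=".")
# online = store.get_online_features(features=features, entity_rows=entity_rows).to_dict()

# The ODFV is plain column addition, so its output is easy to predict:
def conv_rate_plus_val1(conv_rate: float, val_to_add: int) -> float:
    return conv_rate + val_to_add
```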