feat: Modernize precommit hooks and optimize test performance #5929
Merged: franciscojavierarceo merged 40 commits into `master` from `feat/precommit-test-performance-optimization` on Feb 10, 2026.
Commits (40; the diff below shows changes from one commit):
- e7c9113 feat: Modernize precommit hooks and optimize test performance
- 031a978 fix: Run uv commands from root to use pyproject.toml
- a23152e fix: Use --no-project for mypy to run from sdk/python
- ae52fcc fix: Simplify precommit config to use make targets
- 88cefb3 fix: Use uv run --extra ci for tests to include all deps
- 759dd9e fix: Fix import sorting in snowflake bootstrap.py
- ff4548e feat: Modernize development workflow with uv integration and CI perfo…
- de69e92 fix: Resolve MyPy type error in MilvusOnlineStoreCreator
- c8bbf87 fix: Ensure feast module is accessible in CI smoke tests
- 46ed9d9 Merge branch 'master' into feat/precommit-test-performance-optimization
- df45285 fix: Ensure feast module is accessible in CI smoke tests
- 6bc12c2 Apply suggestion from @Copilot
- eb6b346 refactor: Simplify Makefile with consistent uv run usage
- 5417ab2 fix: Use uv sync for CI to enable consistent uv run usage
- 6f6c736 fix: Use uv run in smoke tests for virtualenv compatibility
- 9dae77f chore: Untrack perf-monitor.py development utility
- 63c8f3b fix: Address review feedback for pytest.ini and Makefile
- 2068303 Merge branch 'master' into feat/precommit-test-performance-optimization
- 0da7c1d fix: Configure environment paths for Ray worker compatibility
- 8848a45 fix: Install make and fix Python paths in CI
- 4904104 fix: Use RUNNER_OS environment variable correctly
- 60466b2 fix: Ensure PATH is properly exported in test step
- 97cd848 fix: Use dynamic site-packages detection for cross-platform compatibi…
- 386c7cf debug: Add Python 3.11 macOS debugging and compatibility workarounds
- d8b156c fix: Apply macOS Ray compatibility workarounds to all Python versions
- 0b6d274 fix: Make PYTHONPATH additive to support both Ray workers and CLI tests
- c530cf6 fix: Skip ray_transformation doctests to avoid macOS Ray worker timeouts
- dec75eb chore: Remove feast_profile_demo from git tracking
- f50366b fix: Skip test_e2e_local on macOS CI due to Ray/uv subprocess issues
- 282558a fix: Skip CLI tests on macOS CI due to Ray/uv subprocess issues
- f8051e1 chore: Remove perf-monitor.py from git tracking
- 0e111fc fix: Use uv pip sync with virtualenv instead of uv sync
- 12b4a72 updated
- 2d54924 fix: Skip test_cli_chdir on macOS CI and use uv run pytest for REST A…
- c69a5c8 fix: Run uv commands from repo root to use correct virtualenv
- cf72f4e fix: Handle missing arguments gracefully in mypy-daemon.sh
- ad90593 chore: Revert .gitignore changes
- 3e827bc fix: Restore ruff format --check in lint-python target
- 9552822 Merge branch 'master' into feat/precommit-test-performance-optimization
- 9c499ad Apply suggestion from @ntkathole

All commits were authored by franciscojavierarceo.
Commit 0da7c1ddfbd1e67c256f75ed635b8f3546c32104 — fix: Configure environment paths for Ray worker compatibility

Use PYTHONPATH and PATH env vars to ensure Ray workers can access packages installed by `uv sync`, maintaining consistent uv usage across all make targets while supporting subprocess tools.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
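The "additive PYTHONPATH" idea this commit series converges on (see also commit 0b6d274) can be sketched in a few lines. This is an illustrative sketch only — the repo path and layout here are assumptions, not the PR's actual CI values:

```python
import os
import sysconfig


def additive_pythonpath(repo_root: str) -> str:
    """Build a PYTHONPATH that prepends the repo's sdk/python and the
    interpreter's site-packages WITHOUT clobbering an existing value,
    so subprocesses (e.g. Ray workers) inherit the same package set."""
    site_packages = sysconfig.get_paths()["purelib"]
    parts = [os.path.join(repo_root, "sdk", "python"), site_packages]
    existing = os.environ.get("PYTHONPATH")
    if existing:
        parts.append(existing)  # additive: keep whatever was already set
    return os.pathsep.join(parts)


# Hypothetical repo root used purely for illustration.
os.environ["PYTHONPATH"] = additive_pythonpath("/tmp/feast")
```

The key point is appending the pre-existing value rather than overwriting it, which is what lets Ray workers and CLI test subprocesses coexist.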
New file (`@@ -0,0 +1,45 @@`), apparently a `.gitignore`:

```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*.pyo
*.pyd

# C extensions
*.so

# Distribution / packaging
.Python
env/
venv/
ENV/
env.bak/
venv.bak/
*.egg-info/
dist/
build/
.venv

# Pytest
.cache
*.cover
*.log
.coverage
nosetests.xml
coverage.xml
*.hypothesis/
*.pytest_cache/

# Jupyter Notebook
.ipynb_checkpoints

# IDEs and Editors
.vscode/
.idea/
*.swp
*.swo
*.sublime-workspace
*.sublime-project

# OS generated files
.DS_Store
Thumbs.db
```
New file (`@@ -0,0 +1,29 @@`), a quickstart README:
# Feast Quickstart

If you haven't already, check out the quickstart guide on Feast's website (http://docs.feast.dev/quickstart), which uses this repo. A quick view of what's in this repository's `feature_repo/` directory:

* `data/` contains raw demo parquet data
* `feature_repo/feature_definitions.py` contains demo feature definitions
* `feature_repo/feature_store.yaml` contains a demo setup configuring where data sources are
* `feature_repo/test_workflow.py` showcases how to run all key Feast commands, including defining, retrieving, and pushing features.

You can run the overall workflow with `python test_workflow.py`.

## To move from this into a more production-ready workflow:
> See more details in [Running Feast in production](https://docs.feast.dev/how-to-guides/running-feast-in-production)

1. First: you should start with a different Feast template, which delegates to a more scalable offline store.
   - For example, running `feast init -t gcp`, `feast init -t aws`, or `feast init -t snowflake`.
   - You can see your options if you run `feast init --help`.
2. `feature_store.yaml` points to a local file as a registry. You'll want to set up a remote file (e.g. in S3/GCS) or a SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details.
3. This example uses a file [offline store](https://docs.feast.dev/getting-started/components/offline-store) to generate training data. It does not scale. We recommend instead using a data warehouse such as BigQuery, Snowflake, or Redshift. There is experimental support for Spark as well.
4. Set up CI/CD + dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions).
5. (optional) Regularly scheduled materialization to power low-latency feature retrieval (e.g. via Airflow). See [Batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion#batch-data-ingestion) for more details.
6. (optional) Deploy feature server instances with `feast serve` to expose endpoints to retrieve online features.
   - See [Python feature server](https://docs.feast.dev/reference/feature-servers/python-feature-server) for details.
   - Use cases can also directly call the Feast client to fetch features as per [Feature retrieval](https://docs.feast.dev/getting-started/concepts/feature-retrieval).
Empty file.
New file (`@@ -0,0 +1,152 @@`), a performance-profiling README:
# Feast Performance Profiling Suite

## Overview

This repository contains a comprehensive performance profiling suite for Feast's feature serving infrastructure. The profiling tools help identify bottlenecks in FeatureStore operations, FastAPI server performance, and component-level inefficiencies.

## Files Created

### Core Profiling Scripts

1. **`profiling_utils.py`** - Shared utilities for cProfile management, timing, memory tracking
2. **`profile_feature_store.py`** - Direct FeatureStore.get_online_features() profiling
3. **`profile_feature_server.py`** - FastAPI server endpoint profiling (requires requests, aiohttp)
4. **`profile_components.py`** - Component isolation profiling (protobuf, registry, etc.)
5. **`profiling_analysis.md`** - Comprehensive analysis of performance findings

### Generated Reports

- **CSV Reports**: Quantitative performance data in `profiling_results/*/profiling_summary_*.csv`
- **Profile Files**: Detailed cProfile outputs (`.prof` files) for snakeviz analysis
- **Memory Analysis**: Tracemalloc snapshots for memory usage patterns

## Key Performance Findings

### Major Bottlenecks Identified

1. **FeatureStore Initialization: 2.4-2.5 seconds**
   - Primary bottleneck for serverless deployments
   - Heavy import and dependency loading overhead
   - 99.8% of initialization time spent in `feature_store.py:123(__init__)`
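The initialization figure above is a cProfile measurement. A self-contained sketch of that measurement pattern follows; the `init_feature_store` function is a sleep-based stand-in, since constructing a real `FeatureStore(repo_path=".")` needs a Feast install and a repo on disk:

```python
import cProfile
import io
import pstats
import time


def init_feature_store():
    # Stand-in for the real FeatureStore constructor, which the findings
    # above identify as the ~2.4 s cold-start bottleneck.
    time.sleep(0.05)


profiler = cProfile.Profile()
profiler.enable()
init_feature_store()
profiler.disable()

# Render the top entries sorted by cumulative time, the same view used to
# attribute 99.8% of init time to feature_store.py:123(__init__).
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
```

Swapping the stand-in for the real constructor reproduces the per-function attribution quoted above.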
2. **On-Demand Feature Views: 4x Performance Penalty**
   - Standard features: ~2ms per request
   - With ODFVs: ~8ms per request
   - Bottleneck: `on_demand_feature_view.py:819(transform_arrow)`

3. **Feature Services: 129% Overhead vs Direct Features**
   - Direct features: 7ms
   - Feature service: 16ms
   - Additional registry traversal costs
### Scaling Characteristics

- **Entity Count**: Linear scaling (good)
  - 1 entity: 2ms
  - 1000 entities: 22ms
- **Memory Usage**: Efficient (<1MB for most operations)
- **Provider Abstraction**: Minimal overhead
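The two entity-count data points above imply a small marginal cost per entity; a quick back-of-envelope fit (numbers taken directly from the list above):

```python
# Linear model: latency(n) ≈ fixed + per_entity * n, fit to the two
# reported measurements (1 entity → 2 ms, 1000 entities → 22 ms).
t1, t1000 = 2.0, 22.0  # milliseconds
per_entity_ms = (t1000 - t1) / (1000 - 1)
fixed_ms = t1 - per_entity_ms * 1
print(f"~{per_entity_ms:.3f} ms per entity, ~{fixed_ms:.2f} ms fixed overhead")
```

About 0.02 ms per entity on top of ~2 ms of fixed overhead, which is why batching entities is recommended below.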
## Usage Instructions

### Quick Start

```bash
# Run basic FeatureStore profiling
python profile_feature_store.py

# Run component isolation tests
python profile_components.py

# For FastAPI server profiling (requires additional deps):
pip install requests aiohttp
python profile_feature_server.py
```
### Custom Profiling

```python
from profiling_utils import FeastProfiler
from feast import FeatureStore

profiler = FeastProfiler("my_results")

with profiler.profile_context("my_test") as result:
    store = FeatureStore(repo_path=".")

    with profiler.time_operation("feature_retrieval", result):
        response = store.get_online_features(...)

    # Add custom metrics
    result.add_timing("custom_metric", some_value)

# Generate reports
profiler.print_summary()
profiler.generate_csv_report()
```
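`profiling_utils.py` itself is not shown in this commit, so the snippet above can't be run as-is. A minimal, hypothetical implementation matching the names used above (a sketch, not the PR's actual code) looks like this:

```python
import cProfile
import time
from contextlib import contextmanager


class ProfileResult:
    """Holds named timings collected during one profiled run."""

    def __init__(self, name):
        self.name = name
        self.timings = {}

    def add_timing(self, key, seconds):
        self.timings[key] = seconds


class FeastProfiler:
    """Sketch of a profiler with the profile_context / time_operation API
    assumed by the usage example above."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.results = []

    @contextmanager
    def profile_context(self, name):
        # Wrap a block in cProfile and record a ProfileResult for it.
        result = ProfileResult(name)
        prof = cProfile.Profile()
        prof.enable()
        try:
            yield result
        finally:
            prof.disable()
            self.results.append(result)

    @contextmanager
    def time_operation(self, key, result):
        start = time.perf_counter()
        try:
            yield
        finally:
            result.add_timing(key, time.perf_counter() - start)


profiler = FeastProfiler("my_results")
with profiler.profile_context("demo") as result:
    with profiler.time_operation("sleep", result):
        time.sleep(0.01)
```

Report generation (`print_summary`, `generate_csv_report`) is omitted; it would iterate `profiler.results` and write the CSVs described under "Generated Reports".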
### Analysis Tools

```bash
# View interactive call graphs
pip install snakeviz
snakeviz profiling_results/components/my_test_*.prof
```

```python
# Analyze CSV reports (pandas does not expand globs, so resolve them first)
import glob
import pandas as pd

df = pd.concat(pd.read_csv(p) for p in glob.glob("profiling_results/*/profiling_summary_*.csv"))
```
## Optimization Priorities

### High Impact (>100ms improvement potential)

1. **Optimize FeatureStore initialization** - Lazy loading, import optimization
2. **On-Demand Feature View optimization** - Arrow operations, vectorization

### Medium Impact (10-100ms improvement potential)

3. **Entity batch processing** - Vectorized operations for large batches
4. **Response serialization** - Streaming, protobuf optimization

### Low Impact (<10ms improvement potential)

5. **Registry operations** - Already efficient, minor optimizations possible
## Environment Setup

This profiling was conducted with:
- **Data**: Local SQLite online store, 15 days × 5 drivers hourly stats
- **Features**: Standard numerical features + on-demand transformations
- **Scale**: 1-1000 entities, 1-5 features per request
- **Provider**: Local SQLite (provider-agnostic bottlenecks identified)
## Production Recommendations

### For High-Throughput Serving

1. **Pre-initialize FeatureStore** - Keep warm instances to avoid the 2.4s cold start
2. **Minimize ODFV usage** - Consider pre-computation for performance-critical paths
3. **Use direct feature lists** - Avoid feature service overhead when possible
4. **Batch entity requests** - Linear scaling makes batching efficient
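Recommendation 1 (warm instances) is commonly implemented as a cached module-level singleton. A hedged sketch with a stand-in class, since constructing a real `FeatureStore` needs a repo on disk:

```python
from functools import lru_cache


class FakeFeatureStore:
    """Stand-in for feast.FeatureStore, whose __init__ costs ~2.4 s
    according to the findings above."""

    init_count = 0

    def __init__(self, repo_path):
        FakeFeatureStore.init_count += 1
        self.repo_path = repo_path


@lru_cache(maxsize=1)
def get_store(repo_path="."):
    # Pay the initialization cost once per process; request handlers then
    # reuse the warm instance instead of hitting the cold-start path.
    return FakeFeatureStore(repo_path)


store_a = get_store()
store_b = get_store()  # same instance, no second __init__
```

In a FastAPI server the same effect is usually achieved by constructing the store at module import or in a startup hook rather than per request.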
### For Serverless Deployment

1. **Investigate initialization optimization** - Biggest impact for cold starts
2. **Consider connection pooling** - Reduce per-request overhead
3. **Monitor memory usage** - Current usage is efficient (<1MB typical)

### For Development

1. **Use profiling suite** - Regular performance regression testing
2. **Benchmark new features** - Especially ODFV implementations
3. **Monitor provider changes** - Verify abstraction layer efficiency

## Next Steps

1. **Run FastAPI server profiling** with proper dependencies
2. **Implement optimization recommendations** starting with high-impact items
3. **Establish continuous profiling** in CI/CD pipeline
4. **Profile production workloads** to validate findings

This profiling suite provides the foundation for ongoing Feast performance optimization and monitoring.
Empty file.
Binary file not shown.
Binary file not shown.
New file (`@@ -0,0 +1,148 @@`), an example feature definition file:
```python
# This is an example feature definition file

from datetime import timedelta

import pandas as pd

from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    Project,
    PushSource,
    RequestSource,
)
from feast.feature_logging import LoggingConfig
from feast.infra.offline_stores.file_source import FileLoggingDestination
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64

# Define a project for the feature repo
project = Project(name="feast_profile_demo", description="A project for driver statistics")

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name.
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of features defined below acts as a schema: it defines features
    # for materialization into a store, and is used as references
    # during retrieval for building a training dataset or serving features.
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user-defined key/value pairs that are attached to each
    # feature view.
    tags={"team": "driver_performance"},
)

# Define a request data source which encodes features / information only
# available at request time (e.g. part of the user-initiated HTTP request)
input_request = RequestSource(
    name="vals_to_add",
    schema=[
        Field(name="val_to_add", dtype=Int64),
        Field(name="val_to_add_2", dtype=Int64),
    ],
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


# This groups features into a model version
driver_activity_v1 = FeatureService(
    name="driver_activity_v1",
    features=[
        driver_stats_fv[["conv_rate"]],  # Sub-selects a feature from a feature view
        transformed_conv_rate,  # Selects all features from the feature view
    ],
    logging_config=LoggingConfig(
        destination=FileLoggingDestination(path="data")
    ),
)
driver_activity_v2 = FeatureService(
    name="driver_activity_v2", features=[driver_stats_fv, transformed_conv_rate]
)

# Defines a way to push data (to be available offline, online or both) into Feast.
driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats_source,
)

# Defines a slightly modified version of the feature view from above, where the source
# has been changed to the push source. This allows fresh features to be directly pushed
# to the online store for this feature view.
driver_stats_fresh_fv = FeatureView(
    name="driver_hourly_stats_fresh",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_push_source,  # Changed from above
    tags={"team": "driver_performance"},
)


# Define an on demand feature view which can generate new features based on
# existing feature views and RequestSource features
@on_demand_feature_view(
    sources=[driver_stats_fresh_fv, input_request],  # relies on fresh version of FV
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


driver_activity_v3 = FeatureService(
    name="driver_activity_v3",
    features=[driver_stats_fresh_fv, transformed_conv_rate_fresh],
)
```
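The on-demand feature views in this file are plain pandas functions under the decorator, so their transform logic can be checked in isolation on a hand-built frame — no Feast install needed, only pandas (assumed available):

```python
import pandas as pd


# Same body as transformed_conv_rate above, without the Feast decorator.
def transform(inputs: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame()
    out["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    out["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return out


# Toy request-time input mimicking the joined FV + RequestSource columns.
toy = pd.DataFrame(
    {"conv_rate": [0.5, 0.8], "val_to_add": [1, 2], "val_to_add_2": [10, 20]}
)
result = transform(toy)
```

This isolation trick is also useful for benchmarking ODFVs, since the profiling findings in this PR attribute a 4x per-request penalty to the transform path.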