Commit e159a95
feat: Add scripts for benchmark testing on local Linux machine.
Signed-off-by: Shuchu Han <shuchu.han@gmail.com>
1 parent b2b46a3 commit e159a95

File tree

8 files changed: +1017 −0 lines changed

python_local/README.md

Lines changed: 145 additions & 0 deletions
# Benchmarking Python Feature Server

Here we provide tools for benchmarking a Python-based feature server with one online store, Redis, on a local Linux machine. Follow the instructions below to reproduce the benchmarks.

_Tested with: `feast 0.37.1`_

## Prerequisites

You need to have the following installed:

* Python `3.9+`
* Feast `0.37.0+`
* Docker
* Docker Compose `v2.x`
* Vegeta
* `parquet-tools`

## Generate Data

For all of the following benchmarks, you'll need to generate the data using `data_generator.py` under the top-level directory of this repo. Just `cd` to the main directory and run `python data_generator.py`. Be aware that the timestamps in the generated parquet file effectively expire: if you try to use the generated data on a later day, the `feast materialize-incremental` command in Step 3 will fail. Regenerate this fake data whenever no feature data is being written into Redis.

The generated parquet file includes:

1. 252 columns: an "entity" column, an "event_timestamp" column, and 250 fake "feature_[*]" columns.
2. 10,000 rows.
3. Random integer values throughout the DataFrame.

The content of the parquet file can be checked with the following example commands:

1. ```parquet-tools inspect generated_data.parquet```
2. ```parquet-tools show --head 2 generated_data.parquet```

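`data_generator.py` itself is not included in this excerpt. Below is a minimal sketch of what such a generator likely looks like, assuming `pandas` and a parquet engine (e.g. `pyarrow`) are installed; the column names and shapes come from the description above, while the value ranges and entity-ID scheme are illustrative assumptions:

```python
import numpy as np
import pandas as pd


def make_dataframe(num_rows: int = 10_000, num_features: int = 250) -> pd.DataFrame:
    """Build a frame with an entity column, an event timestamp, and fake features.

    The entity/feature ranges here are assumptions, not taken from the repo.
    """
    rng = np.random.default_rng()
    data = {
        # Unique entity ids, shuffled so they look arbitrary (e.g. 94, 1992, 4475).
        "entity": rng.permutation(num_rows),
        # A fresh timestamp: materialization fails once this ages past the TTL.
        "event_timestamp": pd.Timestamp.now(tz="UTC").floor("s"),
    }
    for j in range(num_features):
        data[f"feature_{j}"] = rng.integers(0, 10_000, size=num_rows)
    return pd.DataFrame(data)


if __name__ == "__main__":
    # Regenerate whenever `feast materialize-incremental` starts failing.
    make_dataframe().to_parquet("generated_data.parquet")
```

Note the `event_timestamp` column is the reason the data "expires": the feature views below use a one-day TTL, so day-old rows are no longer materialized.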
## Redis

1. Disable the USAGE feature. Apply the feature definitions to create a Feast repo.

```
export FEAST_USAGE=False
cd python/feature_repos/redis
feast apply
```

2. Deploy Redis & the feature servers using docker-compose

```
cd ../../docker/redis
docker-compose up -d
```

If everything goes well, you should see output like this:

```
Creating redis_redis_1 ... done
Creating redis_feast_1 ... done
Creating redis_feast_2 ... done
Creating redis_feast_3 ... done
Creating redis_feast_4 ... done
Creating redis_feast_5 ... done
Creating redis_feast_6 ... done
Creating redis_feast_7 ... done
Creating redis_feast_8 ... done
Creating redis_feast_9 ... done
Creating redis_feast_10 ... done
Creating redis_feast_11 ... done
Creating redis_feast_12 ... done
Creating redis_feast_13 ... done
Creating redis_feast_14 ... done
Creating redis_feast_15 ... done
Creating redis_feast_16 ... done
```

3. Materialize data to Redis

```
cd ../../feature_repos/redis
# This is unfortunately necessary because inside docker the feature servers
# resolve the Redis host name as `redis`, but since we're running
# materialization from a shell, Redis is accessible on localhost:
sed -i 's/redis:6379/localhost:6379/g' feature_store.yaml
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# Make sure to change this back, since it can confuse the feature servers
# if you run another docker-compose command later:
sed -i 's/localhost:6379/redis:6379/g' feature_store.yaml
```

4. Check that the feature servers are working and have materialized data

```
cd ../../..
parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6
```

This should return something like this:

```
+----------+
|  entity  |
|----------|
|       94 |
|     1992 |
|     4475 |
```

Put these numbers into an environment variable:

```
TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s`
echo $TEST_ENTITY_IDS
```

(which should output something like `94 , 1992 , 4475 `)

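To see what that pipeline does, here is the same head/tail/sed/paste chain run over a hard-coded copy of the table above (GNU coreutils behavior assumed for `paste -s` reading stdin):

```shell
sample='+----------+
|  entity  |
|----------|
|       94 |
|     1992 |
|     4475 |'

# Keep the first 6 lines, take the last 3 (the data rows), strip the pipe
# characters, then join the remaining lines with commas:
TEST_ENTITY_IDS=$(printf '%s\n' "$sample" | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s)
echo "$TEST_ENTITY_IDS"
```

The surviving whitespace around each id is harmless, because the ids are later interpolated into a JSON array where spaces are ignored.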
Query the feature server with:

```
curl -X POST \
  "http://127.0.0.1:6566/get-online-features" \
  -H "accept: application/json" \
  -d "{
    \"feature_service\": \"feature_service_0\",
    \"entities\": {
      \"entity\": [$TEST_ENTITY_IDS]
    }
  }" | jq
```

In the output, make sure that the `"values"` field contains no null values. It should look something like this:

```
{
  "values": [
    4475,
    1551,
    9889,
```

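The same request can also be issued from Python. A sketch, assuming the `requests` package is installed and the server is listening on port 6566; only the payload construction runs without a live server:

```python
import json
from typing import List


def build_payload(entity_ids: List[int],
                  feature_service: str = "feature_service_0") -> dict:
    """Mirror the JSON body sent by the curl command above."""
    return {
        "feature_service": feature_service,
        "entities": {"entity": entity_ids},
    }


if __name__ == "__main__":
    import requests  # assumed installed; only needed when actually querying

    resp = requests.post(
        "http://127.0.0.1:6566/get-online-features",
        headers={"accept": "application/json"},
        data=json.dumps(build_payload([94, 1992, 4475])),
    )
    # Inspect the "values" arrays for nulls, as with the curl/jq check above.
    print(json.dumps(resp.json(), indent=2))
```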
5. Run Benchmarks

```
cd python
./run-benchmark.sh > perf.log
```

The vegeta report will be written to the `perf.log` file.
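`run-benchmark.sh` itself is not part of this excerpt. A hypothetical sketch of the kind of vegeta invocation such a script would contain; the file names, rate, and duration below are illustrative assumptions, not taken from the repo:

```shell
# Request body and target list for vegeta (illustrative paths):
cat > /tmp/body.json <<'EOF'
{"feature_service": "feature_service_0", "entities": {"entity": [94, 1992, 4475]}}
EOF
printf 'POST http://127.0.0.1:6566/get-online-features\n' > /tmp/targets.txt

# With a feature server running, an attack/report pair would look like:
#   vegeta attack -targets=/tmp/targets.txt -body=/tmp/body.json \
#     -rate=100 -duration=30s | vegeta report > perf.log
```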
Lines changed: 12 additions & 0 deletions
FROM python:3.9

RUN pip3 install 'feast[redis]==0.37.1'
RUN pip3 install cffi

COPY feature_repos/redis feature_repo

WORKDIR feature_repo

ENV FEAST_USAGE=False

CMD feast serve --host "0.0.0.0" --port 6566
Lines changed: 15 additions & 0 deletions
services:
  feast:
    build:
      context: ../..
      dockerfile: docker/redis/Dockerfile
    ports:
      - "6566-6581:6566"
    deploy:
      replicas: 16
    links:
      - redis
  redis:
    image: redis
    ports:
      - "6379:6379"
Lines changed: 45 additions & 0 deletions
import datetime

from feast import Entity, Field, FeatureView, FileSource, FeatureService, ValueType
from feast.types import Int64

generated_data_source = FileSource(
    path="../../../generated_data.parquet",
    event_timestamp_column="event_timestamp",
)

entity = Entity(
    name="entity",
    value_type=ValueType.INT64,
)

feature_views = [
    FeatureView(
        name=f"feature_view_{i}",
        entities=[entity],
        ttl=datetime.timedelta(days=1),
        schema=[
            Field(name=f"feature_{10 * i + j}", dtype=Int64)
            for j in range(10)
        ],
        online=True,
        source=generated_data_source,
    )
    for i in range(25)
]

feature_services = [
    FeatureService(
        name=f"feature_service_{i}",
        features=feature_views[:5 * (i + 1)],
    )
    for i in range(5)
]


def add_definitions_in_globals():
    # Expose each view/service as a module-level name so `feast apply`
    # discovers them when it scans this module.
    for i, fv in enumerate(feature_views):
        globals()[f"feature_view_{i}"] = fv
    for i, fs in enumerate(feature_services):
        globals()[f"feature_service_{i}"] = fs


add_definitions_in_globals()
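The comprehensions above imply a fixed naming scheme: `feature_view_i` owns `feature_{10*i}` through `feature_{10*i+9}`, and `feature_service_i` bundles the first `5*(i+1)` views (so `feature_service_4` covers all 250 features). A small helper (hypothetical, for illustration only) makes that mapping explicit:

```python
def view_for_feature(feature_index: int) -> str:
    # feature_view_i holds feature_{10*i} .. feature_{10*i + 9}
    return f"feature_view_{feature_index // 10}"


def views_in_service(service_index: int) -> list:
    # feature_service_i bundles the first 5 * (i + 1) feature views
    return [f"feature_view_{i}" for i in range(5 * (service_index + 1))]
```

This is why the benchmark README queries `feature_service_0`: it is the smallest service, touching only the first 50 features.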
Lines changed: 9 additions & 0 deletions
registry: data/registry.db
project: feature_repo
provider: local
online_store:
  type: redis
  connection_string: redis:6379
offline_store:
  type: file
entity_key_serialization_version: 2
