Commit 1c6dbc9

a few more details and aws instance ram fixed in python README
Signed-off-by: Stefano Lottini <stefano.lottini@datastax.com>
1 parent 9a6888b commit 1c6dbc9

1 file changed (+64, -24 lines)

python/README.md

Lines changed: 64 additions & 24 deletions
@@ -2,15 +2,19 @@
 
 Here we provide tools for benchmarking Python-based feature server with 3 online stores: Redis, AWS DynamoDB, and GCP Datastore. Follow the instructions below to reproduce the benchmarks.
 
+_Tested with: `feast 0.25.1`_
+
 ## Prerequisites
 
 You need to have the following installed:
-* Feast 0.17+
+* Python `3.8+`
+* Feast `0.25+`
 * Docker
-* Docker Compose
+* Docker Compose `v2.x`
 * Vegeta
+* `parquet-tools`
 
-All these benchmarks are run on an EC2 instance (c5.4xlarge, 16vCPU, 64GiB memory) or a GCP GCE instance (c2-standard-16, 16 vCPU), on the same region as the target online stores.
+All these benchmarks are run on an EC2 instance (c5.4xlarge, 16vCPU, 32GiB memory) or a GCP GCE instance (c2-standard-16, 16 vCPU, 64GiB memory), on the same region as the target online stores.
 
 ## Generate Data
 
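The prerequisites listed in this hunk can be checked before any benchmark is run. The following is a minimal bash sketch, not part of the repository: `missing_tools` and `python_is_recent_enough` are hypothetical helpers. The first prints every command that `command -v` cannot find and returns non-zero if any are missing; the second checks the `3.8+` Python requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the benchmark repo) for checking prerequisites.

# Print each argument that `command -v` cannot resolve, one per line.
# Returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
  local tool
  local -a missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
  return 0
}

# Succeed if the given interpreter (default: python3) is Python 3.8 or newer.
python_is_recent_enough() {
  "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}
```

For example, `missing_tools python3 feast docker vegeta parquet-tools jq` lists whatever still has to be installed (`jq` is not in the list above, but the `curl ... | jq` checks later in the README use it). Compose v2 is normally installed as a Docker CLI plugin, so `docker compose version` is a quick way to confirm the `v2.x` requirement, and `feast version` prints the installed Feast version.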
@@ -53,9 +57,14 @@ Creating redis_feast_16 ... done
 3. Materialize data to Redis
 ```
 cd ../../feature_repos/redis
-sed -i 's/redis:6379/localhost:6379/g' feature_store.yaml # this is unfortunately necessary because inside docker feature servers resolve Redis host name as `redis`, but since we're running materialization from shell, Redis is accessible on localhost.
+# This is unfortunately necessary because inside docker feature servers resolve
+# Redis host name as `redis`, but since we're running materialization from shell,
+# Redis is accessible on localhost:
+sed -i 's/redis:6379/localhost:6379/g' feature_store.yaml
 feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
-sed -i 's/localhost:6379/redis:6379/g' feature_store.yaml # make sure to change this back, since it can mess up with feature servers if you run another docker-compose command later.
+# Make sure to change this back, since it can mess up with feature servers
+# if you run another docker-compose command later:
+sed -i 's/localhost:6379/redis:6379/g' feature_store.yaml
 ```
 
 4. Check that feature servers are working & they have materialized data
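The materialization step still relies on remembering the second `sed`. One way to make the swap-and-restore self-contained is sketched below in bash; `with_local_redis` is a hypothetical helper that backs up `feature_store.yaml`, rewrites `redis:6379` to `localhost:6379`, runs the given command and then restores the original file even if the command fails. It writes through a redirection instead of `sed -i`, whose syntax differs between GNU and BSD/macOS `sed`. It does not protect against the shell itself being interrupted (e.g. Ctrl-C) while the command runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with feature_store.yaml pointing at
# localhost:6379, then put the original file (with the docker-internal host
# name "redis") back, whether or not the command succeeded.
with_local_redis() {
  local config="$1" backup status
  shift
  backup=$(mktemp) || return 1
  cp "$config" "$backup" || { rm -f "$backup"; return 1; }
  # Rewrite via redirection rather than `sed -i` (GNU and BSD sed differ on -i).
  if ! sed 's/redis:6379/localhost:6379/g' "$backup" > "$config"; then
    cp "$backup" "$config"
    rm -f "$backup"
    return 1
  fi
  "$@"
  status=$?
  cp "$backup" "$config"   # restore the original contents
  rm -f "$backup"
  return "$status"
}
```

From `feature_repos/redis`, the materialization would then be a single line: `with_local_redis feature_store.yaml feast materialize-incremental "$(date -u +"%Y-%m-%dT%H:%M:%S")"`.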
@@ -72,20 +81,31 @@ This should return something like this:
 | 1992 |
 | 4475 |
 ```
-Take your 3 numbers and replace in this query:
+
+Put these numbers into an env variable with:
+```
+TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s`
+echo $TEST_ENTITY_IDS
+```
+(which should output something like `94 , 1992 , 4475 `)
+
+
+Query the feature server with
 ```
 curl -X POST \
 "http://127.0.0.1:6566/get-online-features" \
 -H "accept: application/json" \
--d '{
-"feature_service": "feature_service_0",
-"entities": {
-"entity": [94, 1992, 4475]
+-d "{
+\"feature_service\": \"feature_service_0\",
+\"entities\": {
+\"entity\": [$TEST_ENTITY_IDS]
 }
-}' | jq
+}" | jq
 ```
 
-In the output, make sure that `"values"` field contains none of the null values. It should look something like this:
+
+In the output, make sure that `"values"` field contains none of the null
+values. It should look something like this:
 ```
 {
 "values": [
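The two steps this hunk introduces, extracting the first three entity IDs and splicing them into a hand-escaped JSON body, can also be wrapped in small functions. The bash sketch below is illustrative only: `entity_ids_from_table`, `online_features_payload`, `query_online_features` and the `FEATURE_SERVER_URL` variable are hypothetical names. The table parser assumes the layout implied by the README's `head -n 6 | tail -n 3` (three header lines, then one `| value |` row per line). It passes `-` to `paste` explicitly, which BSD/macOS `paste` may require, and strips the padding spaces so the list comes out as `94,1992,4475`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the commands added in this commit.

# Read a `parquet-tools show`-style table on stdin and print its first N data
# rows (default 3) as a comma-separated list. Assumes 3 header lines before
# the data rows, as the README's `head -n 6 | tail -n 3` does.
entity_ids_from_table() {
  local n="${1:-3}"
  head -n "$((3 + n))" | tail -n "$n" | sed 's/|//g' | paste -d, -s - | tr -d ' '
}

# Print the get-online-features request body for a feature service and a
# comma-separated list of entity IDs, without escaping every quote by hand.
online_features_payload() {
  printf '{"feature_service": "%s", "entities": {"entity": [%s]}}\n' "$1" "$2"
}

# POST the payload to the feature server (default: the local port used in the README).
query_online_features() {
  curl -sS -X POST "${FEATURE_SERVER_URL:-http://127.0.0.1:6566}/get-online-features" \
    -H "accept: application/json" \
    -d "$(online_features_payload "$1" "$2")"
}
```

For example, `TEST_ENTITY_IDS=$(parquet-tools show --columns entity generated_data.parquet 2>/dev/null | entity_ids_from_table)` followed by `query_online_features feature_service_0 "$TEST_ENTITY_IDS" | jq`. Building the body with `printf` keeps the JSON readable, but it performs no escaping, so it only suits plain numeric IDs like these.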
@@ -154,18 +174,26 @@ This should return something like this:
 | 94 |
 | 1992 |
 | 4475 |
+
+Put these numbers into an env variable with:
 ```
-Take your 3 numbers and replace in this query:
+TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s`
+echo $TEST_ENTITY_IDS
+```
+(which should output something like `94 , 1992 , 4475 `)
+
+
+Query the feature server with
 ```
 curl -X POST \
 "http://127.0.0.1:6566/get-online-features" \
 -H "accept: application/json" \
--d '{
-"feature_service": "feature_service_0",
-"entities": {
-"entity": [94, 1992, 4475]
+-d "{
+\"feature_service\": \"feature_service_0\",
+\"entities\": {
+\"entity\": [$TEST_ENTITY_IDS]
 }
-}' | jq
+}" | jq
 ```
 
 In the output, make sure that `"values"` field contains none of the null values. It should look something like this:
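Unlike the Redis hunk above, where the fence closing the table output stays right after `| 4475 |`, this hunk (between the Redis steps and `## GCP Datastore`, presumably the DynamoDB section) adds `Put these numbers into an env variable with:` before that fence. In the new file, line 176 is `| 4475 |`, 177 is blank, 178 is the new sentence and 179 is the fence. That section therefore appears to be one fence short, so every later fence in it switches between opening and closing; the Datastore hunk below repeats the pattern. Because the two missing fences cancel out, a whole-file count would likely not notice. The hypothetical bash/awk check below, a rough sketch rather than a markdown parser, counts fence lines per `## ` section and reports any section with an odd count. It ignores `~~~` fences and is fooled by `## ` lines inside code blocks.

```shell
#!/usr/bin/env bash
# Hypothetical lint: report each "## " section of a markdown file that contains
# an odd number of ``` fence lines. Returns non-zero if any such section exists.
check_fences_per_section() {
  awk '
    function report() {
      if (n % 2) { print "odd number of fence lines in: " section; bad = 1 }
    }
    BEGIN        { section = "(before the first ## heading)" }
    /^## /       { report(); section = $0; n = 0; next }
    /^[ \t]*```/ { n++ }
    END          { report(); exit bad }
  ' "$1"
}
```

Running it as `check_fences_per_section python/README.md` prints the heading of every section whose fences do not pair up.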
@@ -185,6 +213,10 @@ cd python
 
 ## GCP Datastore
 
+For this benchmark, you need GCP credentials accessible. Here it is assumed that it's all in
+`${HOME}/.config/gcloud`, which will be available to the docker containers running
+the feature server. (Adjust as needed by inspecting the `docker-compose.yml`).
+
 1. Apply feature definitions to create a Feast repo.
 ```
 cd feature_repos/datastore
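The new paragraph assumes the GCP credentials live under `${HOME}/.config/gcloud`. If they were created with `gcloud auth application-default login`, that command writes them to `application_default_credentials.json` in this directory, which allows a quick pre-flight check. This is a minimal bash sketch with a hypothetical function name; other credential setups (for example a service-account key file) may use a different file, and whether the directory is actually mounted depends on the `docker-compose.yml`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Datastore benchmark: make sure the gcloud
# config directory exists and holds Application Default Credentials.
# Returns 1 if the directory is missing, 2 if the credentials file is missing.
check_gcloud_config() {
  local dir="${1:-${HOME}/.config/gcloud}"
  if [ ! -d "$dir" ]; then
    echo "missing gcloud config directory: $dir" >&2
    return 1
  fi
  # Written by `gcloud auth application-default login`; other credential
  # setups may use a different file name.
  if [ ! -f "$dir/application_default_credentials.json" ]; then
    echo "no application_default_credentials.json in $dir" >&2
    return 2
  fi
  return 0
}
```

Calling `check_gcloud_config` with no argument checks the default location before the Datastore feature servers are started.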
@@ -235,18 +267,26 @@ This should return something like this:
 | 94 |
 | 1992 |
 | 4475 |
+
+Put these numbers into an env variable with:
+```
+TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s`
+echo $TEST_ENTITY_IDS
 ```
-Take your 3 numbers and replace in this query:
+(which should output something like `94 , 1992 , 4475 `)
+
+
+Query the feature server with
 ```
 curl -X POST \
 "http://127.0.0.1:6566/get-online-features" \
 -H "accept: application/json" \
--d '{
-"feature_service": "feature_service_0",
-"entities": {
-"entity": [94, 1992, 4475]
+-d "{
+\"feature_service\": \"feature_service_0\",
+\"entities\": {
+\"entity\": [$TEST_ENTITY_IDS]
 }
-}' | jq
+}" | jq
 ```
 
 In the output, make sure that `"values"` field contains none of the null values. It should look something like this:
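The "no null values" check can be automated with `jq`, which the README's `curl ... | jq` commands already rely on. The hypothetical helper below avoids assuming the exact layout of the feature server response: it collects every element of every `"values"` array anywhere in the JSON and fails if any of them is `null`. It relies on `jq -e`, which exits non-zero when the last output is `false` or `null`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a get-online-features JSON response on stdin and
# return non-zero if any element of any "values" array in it is null.
assert_no_null_values() {
  jq -e '[.. | objects | .values? | arrays | .[] | select(. == null)] | length == 0' >/dev/null \
    || { echo "null feature values found (or input was not valid JSON)" >&2; return 1; }
}
```

For example, piping one of the `curl` commands above into `assert_no_null_values` instead of `jq` turns the visual check into a pass/fail exit status.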
