| 1 | +# Benchmarking Python Feature Server |
| 2 | + |
| 3 | +Here we provide tools for benchmarking the Python-based feature server with one online store, Redis, on a local Linux machine. Follow the instructions below to reproduce the benchmarks. |
| 4 | + |
| 5 | +_Tested with: `feast 0.37.1`_ |
| 6 | + |
| 7 | +## Prerequisites |
| 8 | + |
| 9 | +You need to have the following installed: |
| 10 | +* Python `3.9+` |
| 11 | +* Feast `0.37.0+` |
| 12 | +* Docker |
| 13 | +* Docker Compose `v2.x` |
| 14 | +* Vegeta |
| 15 | +* `parquet-tools` |
| 16 | + |
| 17 | + |
| 18 | + |
| 19 | +## Generate Data |
| 20 | + |
| 21 | +For all of the following benchmarks, you'll need to generate the data using `data_generator.py` in the top-level directory of this repo: `cd` to the top-level directory and run `python data_generator.py`. Be aware that the timestamps in the generated parquet file effectively expire. If you try to use the generated data on a different day, the `feast materialize-incremental` command in step 3 of the Redis section will fail. If no feature data is written to Redis, generate the fake data again. |
| 22 | + |
| 23 | +The generated parquet file includes: |
| 24 | +1. 252 columns: an "entity" column, an "event_timestamp" column, and 250 fake "feature_[*]" columns. |
| 25 | +2. 10,000 rows. |
| 26 | +3. The values in the DataFrame are random integers. |
| 27 | + |
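To make the shape of the generated data concrete, here is a minimal standard-library sketch of rows matching the description above. It is not the repository's `data_generator.py`: the names `column_names` and `make_rows`, the zero-based `feature_<i>` naming, the sequential entity IDs, the integer range, and the use of plain dictionaries instead of a DataFrame are all assumptions made for illustration.

```python
import random
from datetime import datetime, timezone

NUM_FEATURES = 250  # number of fake feature columns, per the description above


def column_names(num_features=NUM_FEATURES):
    """Return the 2 fixed columns followed by the fake feature columns."""
    # Assumption: features are named feature_0 ... feature_249.
    return ["entity", "event_timestamp"] + [
        f"feature_{i}" for i in range(num_features)
    ]


def make_rows(num_rows=10_000, num_features=NUM_FEATURES, seed=0):
    """Build a list of dict rows filled with random integers."""
    rng = random.Random(seed)
    # A fresh timestamp matters: stale timestamps are why old data
    # stops materializing on a later day.
    now = datetime.now(timezone.utc)
    rows = []
    for entity_id in range(num_rows):  # assumption: sequential entity IDs
        row = {"entity": entity_id, "event_timestamp": now}
        for i in range(num_features):
            row[f"feature_{i}"] = rng.randint(0, 10_000)  # assumed range
        rows.append(row)
    return rows
```

Calling `make_rows()` with the defaults yields 10,000 rows of 252 keys each, which mirrors the dimensions listed above.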
| 28 | +You can check the contents of the parquet file with commands such as: |
| 29 | +1. `parquet-tools inspect generated_data.parquet` |
| 30 | +2. `parquet-tools show --head 2 generated_data.parquet` |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +## Redis |
| 35 | + |
| 36 | +1. Disable Feast usage reporting (`FEAST_USAGE`), then apply the feature definitions to create a Feast repo. |
| 37 | + |
| 38 | +``` |
| 39 | +export FEAST_USAGE=False |
| 40 | +cd python/feature_repos/redis |
| 41 | +feast apply |
| 42 | +``` |
| 43 | + |
| 44 | +2. Deploy Redis and the feature servers using docker-compose |
| 45 | + |
| 46 | +``` |
| 47 | +cd ../../docker/redis |
| 48 | +docker-compose up -d |
| 49 | +``` |
| 50 | +If everything goes well, you should see output like this: |
| 51 | + |
| 52 | +``` |
| 53 | +Creating redis_redis_1 ... done |
| 54 | +Creating redis_feast_1 ... done |
| 55 | +Creating redis_feast_2 ... done |
| 56 | +Creating redis_feast_3 ... done |
| 57 | +Creating redis_feast_4 ... done |
| 58 | +Creating redis_feast_5 ... done |
| 59 | +Creating redis_feast_6 ... done |
| 60 | +Creating redis_feast_7 ... done |
| 61 | +Creating redis_feast_8 ... done |
| 62 | +Creating redis_feast_9 ... done |
| 63 | +Creating redis_feast_10 ... done |
| 64 | +Creating redis_feast_11 ... done |
| 65 | +Creating redis_feast_12 ... done |
| 66 | +Creating redis_feast_13 ... done |
| 67 | +Creating redis_feast_14 ... done |
| 68 | +Creating redis_feast_15 ... done |
| 69 | +Creating redis_feast_16 ... done |
| 70 | +``` |
| 71 | + |
| 72 | +3. Materialize data to Redis |
| 73 | + |
| 74 | +``` |
| 75 | +cd ../../feature_repos/redis |
| 76 | +# This is unfortunately necessary because inside docker feature servers resolve |
| 77 | +# Redis host name as `redis`, but since we're running materialization from shell, |
| 78 | +# Redis is accessible on localhost: |
| 79 | +sed -i 's/redis:6379/localhost:6379/g' feature_store.yaml |
| 80 | +feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S") |
| 81 | +# Make sure to change this back, since leaving it can break the feature servers |
| 82 | +# if you run another docker-compose command later: |
| 83 | +sed -i 's/localhost:6379/redis:6379/g' feature_store.yaml |
| 84 | +``` |
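Because forgetting the second `sed` leaves `feature_store.yaml` pointing at `localhost`, the swap-and-restore pattern can also be expressed as a context manager that always restores the file. The sketch below is one possible approach, not part of the benchmark scripts; the name `redis_on_localhost` and its parameters are hypothetical.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def redis_on_localhost(config_path="feature_store.yaml",
                       docker_host="redis:6379",
                       local_host="localhost:6379"):
    """Temporarily point the config at localhost, then restore it."""
    path = Path(config_path)
    original = path.read_text()
    # Same substitution as the first sed command above.
    path.write_text(original.replace(docker_host, local_host))
    try:
        yield path
    finally:
        # Restore the original contents even if materialization fails.
        path.write_text(original)
```

Inside the `with redis_on_localhost():` block you could run the materialization command, for example through `subprocess.run`, and the original host name is written back when the block exits.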
| 85 | + |
| 86 | +4. Check that the feature servers are working and have materialized data |
| 87 | + |
| 88 | +``` |
| 89 | +cd ../../.. |
| 90 | +parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 |
| 91 | +``` |
| 92 | +This should return something like this: |
| 93 | + |
| 94 | +``` |
| 95 | ++----------+ |
| 96 | +| entity | |
| 97 | +|----------| |
| 98 | +| 94 | |
| 99 | +| 1992 | |
| 100 | +| 4475 | |
| 101 | +``` |
| 102 | + |
| 103 | +Put these numbers into an env variable with: |
| 104 | + |
| 105 | +``` |
| 106 | +TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s` |
| 107 | +echo $TEST_ENTITY_IDS |
| 108 | +``` |
| 109 | +(which should output something like `94 , 1992 , 4475 `) |
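The shell pipeline above leaves stray whitespace around each ID. If you prefer a cleaner list, the same extraction can be sketched in Python by parsing the table text that `parquet-tools show` prints. The function name `extract_entity_ids` is hypothetical, and the sketch assumes the table layout shown above (one integer per data row).

```python
def extract_entity_ids(table_text, count=3):
    """Return the first `count` integer cells of the table as 'a,b,c'."""
    ids = []
    for line in table_text.splitlines():
        # Drop the table borders and padding around the cell.
        cell = line.strip().strip("|").strip()
        # Keep only rows whose cell is an integer (skips header and rules).
        if cell.lstrip("-").isdigit():
            ids.append(cell)
        if len(ids) == count:
            break
    return ",".join(ids)


SAMPLE = """\
+----------+
|   entity |
|----------|
|       94 |
|     1992 |
|     4475 |
"""

print(extract_entity_ids(SAMPLE))
```

With the sample table from above this yields the three IDs without padding, ready to be placed inside the JSON array of the request.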
| 110 | + |
| 111 | + |
| 112 | +Query the feature server (the response is piped through `jq` for pretty-printing, so `jq` must be installed): |
| 113 | + |
| 114 | +``` |
| 115 | +curl -X POST \ |
| 116 | + "http://127.0.0.1:6566/get-online-features" \ |
| 117 | + -H "accept: application/json" \ |
| 118 | + -d "{ |
| 119 | + \"feature_service\": \"feature_service_0\", |
| 120 | + \"entities\": { |
| 121 | + \"entity\": [$TEST_ENTITY_IDS] |
| 122 | + } |
| 123 | + }" | jq |
| 124 | +``` |
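For scripted checks, the same request can be sketched with Python's standard library. The URL, feature service name, and payload shape are taken from the `curl` command above; the function names `build_request` and `query_features`, the timeout, and the explicit `Content-Type` header are assumptions added for illustration.

```python
import json
import urllib.request

FEATURE_SERVER_URL = "http://127.0.0.1:6566/get-online-features"


def build_request(entity_ids, feature_service="feature_service_0",
                  url=FEATURE_SERVER_URL):
    """Build the POST request equivalent to the curl command above."""
    payload = {
        "feature_service": feature_service,
        "entities": {"entity": list(entity_ids)},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an addition; the curl example only sets "accept".
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def query_features(entity_ids, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(entity_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`query_features([94, 1992, 4475])` only succeeds while the feature servers from step 2 are running.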
| 125 | + |
| 126 | + |
| 127 | +In the output, make sure that the `"values"` fields contain no null |
| 128 | +values. It should look something like this: |
| 129 | + |
| 130 | +``` |
| 131 | + { |
| 132 | + "values": [ |
| 133 | + 4475, |
| 134 | + 1551, |
| 135 | + 9889, |
| 136 | +``` |
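Scanning a response with hundreds of features by eye is error-prone, so the null check can be automated. The sketch below walks any decoded JSON structure and counts `null` entries inside every `"values"` list it finds. It assumes only that the response contains `"values"` lists as shown above; the function name `count_null_values` is hypothetical.

```python
def count_null_values(node):
    """Count None entries inside every "values" list found in `node`."""
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            if key == "values" and isinstance(value, list):
                # JSON null is decoded as None by the json module.
                total += sum(1 for item in value if item is None)
            else:
                total += count_null_values(value)
        return total
    if isinstance(node, list):
        return sum(count_null_values(item) for item in node)
    return 0
```

A result of `0` for the decoded response suggests that materialization wrote data for the queried entities; any other count indicates missing features.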
| 137 | + |
| 138 | +5. Run Benchmarks |
| 139 | + |
| 140 | +``` |
| 141 | +cd python |
| 142 | +./run-benchmark.sh > perf.log |
| 143 | +``` |
| 144 | + |
| 145 | +Vegeta's report (the benchmark results) will be written to the `perf.log` file. |