Commit d1807c9

Pyspark job for feature batch retrieval (feast-dev#1021)
* Pyspark job for feature batch retrieval

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Add pyspark to ci requirements

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Additional documentation and col mapping

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Add Schema validation

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Improve test case and documentation

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Change max age to integer, filter source feature tables, tests for large dataframe

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

Co-authored-by: Khor Shu Heng <khor.heng@gojek.com>
1 parent 61133e4 commit d1807c9

10 files changed

Lines changed: 1237 additions & 0 deletions

sdk/python/feast/pyspark/__init__.py

Whitespace-only changes.

sdk/python/feast/pyspark/historical_feature_retrieval_job.py

Lines changed: 421 additions & 0 deletions
Large diffs are not rendered by default.

sdk/python/requirements-ci.txt

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ pytest-lazy-fixture==0.6.3
 pytest-mock
 pytest-timeout
 pytest-ordering==0.6.*
+pyspark==3.*
 pandas~=1.0.0
 mock==2.0.0
 pandavro==1.5.*

sdk/python/requirements-dev.txt

Lines changed: 2 additions & 0 deletions
@@ -38,3 +38,5 @@ flake8
 black==19.10b0
 boto3
 moto
+pyspark==3.*
+pyspark-stubs==3.*

sdk/python/tests/data/bookings.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+driver_id,event_timestamp,created_timestamp,completed_bookings
+8001,2020-08-31T00:00:00.000,2020-08-31T00:00:00.000,200
+8001,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,300
+8002,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,600
+8002,2020-09-01T00:00:00.000,2020-09-02T00:00:00.000,500
+8003,2020-09-01T00:00:00.000,2020-09-02T00:00:00.000,700
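
The two rows for driver 8002 share an event_timestamp but differ in created_timestamp, which suggests this fixture is meant to exercise tie-breaking on creation time. The retrieval job's diff is not rendered on this page, so the following is only a plain-Python sketch under that assumption. `latest_per_event` is a hypothetical helper for illustration, not a function from the commit.

```python
import csv
import io
from datetime import datetime

# Rows copied from bookings.csv in this commit.
BOOKINGS_CSV = """\
driver_id,event_timestamp,created_timestamp,completed_bookings
8001,2020-08-31T00:00:00.000,2020-08-31T00:00:00.000,200
8001,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,300
8002,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,600
8002,2020-09-01T00:00:00.000,2020-09-02T00:00:00.000,500
8003,2020-09-01T00:00:00.000,2020-09-02T00:00:00.000,700
"""

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def latest_per_event(rows):
    """Keep one row per (driver_id, event_timestamp): the most recently created.

    Assumption: ties on event_timestamp are resolved by the latest
    created_timestamp. The actual job may behave differently.
    """
    best = {}
    for row in rows:
        key = (row["driver_id"], row["event_timestamp"])
        created = datetime.strptime(row["created_timestamp"], TS_FORMAT)
        if key not in best or created > best[key][0]:
            best[key] = (created, row)
    return [row for _, row in best.values()]


rows = list(csv.DictReader(io.StringIO(BOOKINGS_CSV)))
deduped = latest_per_event(rows)

for row in deduped:
    print(row["driver_id"], row["event_timestamp"], row["completed_bookings"])
```

Under this assumption the row with completed_bookings 600 is shadowed by the later-created row with 500, leaving four rows in total.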
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+id,event_timestamp
+1001,2020-09-02T00:00:00.000
+1001,2020-09-03T00:00:00.000
+2001,2020-09-04T00:00:00.000
+2001,2020-09-04T00:00:00.000
+3001,2020-09-04T00:00:00.000
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+customer_id,total_bookings,datetime,created_datetime
+1001,200,2020-09-02T00:00:00.000,2020-09-02T00:00:00.000
+1001,400,2020-09-04T00:00:00.000,2020-09-02T00:00:00.000
+2001,500,2020-09-03T00:00:00.000,2020-09-01T00:00:00.000
+2001,600,2020-09-03T00:00:00.000,2020-09-02T00:00:00.000
+3001,700,2020-09-03T00:00:00.000,2020-09-03T00:00:00.000
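
This fixture names its timestamp columns datetime and created_datetime rather than event_timestamp and created_timestamp, which lines up with the "col mapping" item in the commit message. How the job applies such a mapping is not visible here, so the sketch below only illustrates the general idea of renaming columns through a dictionary. `COLUMN_MAPPING` and `apply_column_mapping` are hypothetical names.

```python
import csv
import io

# Header and first two rows copied from the fixture above.
FIXTURE_CSV = """\
customer_id,total_bookings,datetime,created_datetime
1001,200,2020-09-02T00:00:00.000,2020-09-02T00:00:00.000
1001,400,2020-09-04T00:00:00.000,2020-09-02T00:00:00.000
"""

# Hypothetical mapping from source column name to the name used downstream.
COLUMN_MAPPING = {
    "datetime": "event_timestamp",
    "created_datetime": "created_timestamp",
}


def apply_column_mapping(row, mapping):
    """Return a copy of `row` with mapped columns renamed; others pass through."""
    return {mapping.get(name, name): value for name, value in row.items()}


renamed = [
    apply_column_mapping(row, COLUMN_MAPPING)
    for row in csv.DictReader(io.StringIO(FIXTURE_CSV))
]

print(list(renamed[0].keys()))
```

Columns absent from the mapping, such as customer_id and total_bookings, keep their original names.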
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+customer_id,driver_id,event_timestamp
+1001,8001,2020-09-02T00:00:00.000
+1001,8002,2020-09-02T00:00:00.000
+1001,8002,2020-09-03T00:00:00.000
+2001,8002,2020-09-03T00:00:00.000
+2001,8002,2020-09-04T00:00:00.000
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+customer_id,event_timestamp,created_timestamp,daily_transactions
+1001,2020-08-31T00:00:00.000,2020-09-01T00:00:00.000,50.0
+1001,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,100.0
+2001,2020-09-01T00:00:00.000,2020-08-31T00:00:00.000,80.0
+2001,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,200.0
+3001,2020-09-01T00:00:00.000,2020-09-01T00:00:00.000,300.0
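
The commit message mentions changing max age to an integer, and this fixture has the shape needed for a point-in-time lookup with an age limit. The sketch below shows one way such a lookup could work in plain Python. It assumes max age is a number of seconds, that the age bound is inclusive, and that ties on event_timestamp go to the latest created_timestamp. None of this is confirmed by the page, `point_in_time_lookup` is a hypothetical helper, and the entity timestamps in the usage lines are illustrative values.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_ts(text):
    return datetime.strptime(text, TS_FORMAT)


# (customer_id, event_timestamp, created_timestamp, daily_transactions),
# copied from the fixture above.
FEATURE_ROWS = [
    ("1001", "2020-08-31T00:00:00.000", "2020-09-01T00:00:00.000", 50.0),
    ("1001", "2020-09-01T00:00:00.000", "2020-09-01T00:00:00.000", 100.0),
    ("2001", "2020-09-01T00:00:00.000", "2020-08-31T00:00:00.000", 80.0),
    ("2001", "2020-09-01T00:00:00.000", "2020-09-01T00:00:00.000", 200.0),
    ("3001", "2020-09-01T00:00:00.000", "2020-09-01T00:00:00.000", 300.0),
]


def point_in_time_lookup(customer_id, entity_ts, feature_rows, max_age_seconds):
    """Return the feature value valid for the entity at `entity_ts`, or None.

    A feature row qualifies if it belongs to the same customer, its event
    time is not after the entity's, and it is at most `max_age_seconds` old
    (inclusive bound, an assumption of this sketch).
    """
    entity_time = parse_ts(entity_ts)
    oldest_allowed = entity_time - timedelta(seconds=max_age_seconds)
    candidates = [
        (parse_ts(event_ts), parse_ts(created_ts), value)
        for cid, event_ts, created_ts, value in feature_rows
        if cid == customer_id
        and oldest_allowed <= parse_ts(event_ts) <= entity_time
    ]
    if not candidates:
        return None
    # Latest event wins; ties are broken by the latest created timestamp.
    return max(candidates, key=lambda c: (c[0], c[1]))[2]


THREE_DAYS = 3 * 24 * 60 * 60  # max age as an integer number of seconds

print(point_in_time_lookup("1001", "2020-09-02T00:00:00.000", FEATURE_ROWS, THREE_DAYS))
print(point_in_time_lookup("2001", "2020-09-03T00:00:00.000", FEATURE_ROWS, THREE_DAYS))
print(point_in_time_lookup("3001", "2020-09-10T00:00:00.000", FEATURE_ROWS, THREE_DAYS))
```

With a three-day max age, customer 2001 resolves to the later-created of its two same-timestamp rows, and the lookup for customer 3001 nine days after its only feature row finds nothing.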
