Commit 891231a

chore: Raise a dedicated exception when entity SQL execution returns an empty DataFrame in the Spark offline store (feast-dev#3323)
1. Modified spark.py to raise the error for empty results from entity SQL execution in that file itself, rather than from a separate module.
2. Removed spark_utils.py.
3. Removed unnecessary files from the git repo and updated .gitignore accordingly.

Formatted spark.py with make for better linting.

Signed-off-by: amithadiraju1694 <amith.adiraju@gmail.com>
1 parent ea94aa2 commit 891231a

2 files changed: 8 additions and 2 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -223,4 +223,4 @@ ui/.vercel
 **/yarn-error.log*
 
 # Go subprocess binaries (built during feast pip package building)
-sdk/python/feast/binaries/
+sdk/python/feast/binaries/

sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py

Lines changed: 7 additions & 1 deletion
@@ -18,7 +18,7 @@
 
 from feast import FeatureView, OnDemandFeatureView
 from feast.data_source import DataSource
-from feast.errors import InvalidEntityType
+from feast.errors import EntitySQLEmptyResults, InvalidEntityType
 from feast.feature_view import DUMMY_ENTITY_ID, DUMMY_ENTITY_VAL
 from feast.infra.offline_stores import offline_utils
 from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
@@ -449,7 +449,13 @@ def _get_entity_df_event_timestamp_range(
         # If the entity_df is a string (SQL query), determine range
         # from table
         df = spark_session.sql(entity_df).select(entity_df_event_timestamp_col)
+
+        # Checks if executing entity sql resulted in any data
+        if df.rdd.isEmpty():
+            raise EntitySQLEmptyResults(entity_df)
+
         # TODO(kzhang132): need utc conversion here.
+
         entity_df_event_timestamp_range = (
             df.agg({entity_df_event_timestamp_col: "min"}).collect()[0][0],
             df.agg({entity_df_event_timestamp_col: "max"}).collect()[0][0],
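The guard added in the second hunk can be illustrated without a Spark session. The sketch below is a plain-Python stand-in, not the Feast implementation: the `EntitySQLEmptyResults` class defined here only mimics the name imported from `feast.errors` (its message text is an assumption), and `timestamp_range` is a hypothetical helper that takes an already-collected list of timestamps in place of a Spark DataFrame. It shows the intent of the change: fail early with an error that names the query, instead of letting the min/max aggregation run over no rows.

```python
from datetime import datetime
from typing import Sequence, Tuple


class EntitySQLEmptyResults(Exception):
    """Local stand-in for feast.errors.EntitySQLEmptyResults.

    The message wording is an assumption made for this sketch.
    """

    def __init__(self, entity_sql: str):
        super().__init__(f"Entity SQL query returned no rows: {entity_sql}")


def timestamp_range(
    timestamps: Sequence[datetime], entity_sql: str
) -> Tuple[datetime, datetime]:
    """Hypothetical helper mirroring the guarded min/max computation."""
    # Same idea as `if df.rdd.isEmpty(): raise ...` in the diff:
    # check for an empty result before aggregating.
    if not timestamps:
        raise EntitySQLEmptyResults(entity_sql)
    return min(timestamps), max(timestamps)


if __name__ == "__main__":
    rows = [datetime(2022, 1, 3), datetime(2022, 1, 1), datetime(2022, 1, 2)]
    print(timestamp_range(rows, "SELECT event_timestamp FROM entities"))

    try:
        timestamp_range([], "SELECT event_timestamp FROM entities WHERE 1 = 0")
    except EntitySQLEmptyResults as err:
        print(f"Caught: {err}")
```

Raising at the point where the query result is first inspected keeps the failure close to its cause, so the caller sees the offending SQL rather than a downstream error about missing timestamps.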
