diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index bcf7be93702..2ce95f07707 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -14,6 +14,7 @@ repos: stages: [commit] language: system types: [python] + exclude: '_pb2\.py$' entry: bash -c 'uv run ruff check --fix "$@" && uv run ruff format "$@"' -- pass_filenames: true @@ -24,6 +25,7 @@ repos: stages: [commit] language: system types: [python] + exclude: '_pb2\.py$' entry: bash -c 'uv run ruff check "$@" && uv run ruff format --check "$@"' -- pass_filenames: true diff --git a/docs/getting-started/concepts/feast-types.md b/docs/getting-started/concepts/feast-types.md index 72741f263e4..94c93f2a8ea 100644 --- a/docs/getting-started/concepts/feast-types.md +++ b/docs/getting-started/concepts/feast-types.md @@ -5,10 +5,44 @@ To make this possible, Feast itself has a type system for all the types it is ab Feast's type system is built on top of [protobuf](https://github.com/protocolbuffers/protobuf). The messages that make up the type system can be found [here](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto), and the corresponding python classes that wrap them can be found [here](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py). -Feast supports primitive data types (numerical values, strings, bytes, booleans and timestamps). The only complex data type Feast supports is Arrays, and arrays cannot contain other arrays. +Feast supports the following categories of data types: + +- **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`. +- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`. +- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. 
+- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`. +- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`. +- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`. + +For a complete reference with examples, see [Type System](../../reference/type-system.md). Each feature or schema field in Feast is associated with a data type, which is stored in Feast's [registry](registry.md). These types are also used to ensure that Feast operates on values correctly (e.g. making sure that timestamp columns used for [point-in-time correct joins](point-in-time-joins.md) actually have the timestamp type). -As a result, each system that feast interacts with needs a way to translate data types from the native platform, into a feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly. +As a result, each system that Feast interacts with needs a way to translate data types from the native platform into a Feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly. 
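For intuition, the translation a connector author maintains can be sketched as a simple lookup table. The dictionary and function below are illustrative stand-ins (hypothetical names, string type labels instead of `feast.types` classes), not Feast's actual `type_map` API:

```python
# Illustrative sketch only: a trimmed-down version of the kind of lookup
# table an offline/online store connector keeps. The real Snowflake mapping
# lives in feast.type_map (linked above).
WAREHOUSE_TO_FEAST = {
    "NUMBER": "Int64",
    "FLOAT": "Float64",
    "VARCHAR": "String",
    "BOOLEAN": "Bool",
    "TIMESTAMP_NTZ": "UnixTimestamp",
    "VARIANT": "Map",
}


def to_feast_type(warehouse_type: str) -> str:
    """Translate a native warehouse type name into a Feast type name."""
    try:
        return WAREHOUSE_TO_FEAST[warehouse_type.upper()]
    except KeyError:
        raise ValueError(f"No Feast type mapping for {warehouse_type!r}")
```

A real connector returns `feast.types` / `ValueType` objects rather than strings, and must cover every type its platform can emit, since unmapped types fail at conversion time.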
+
+### Backend Type Mapping for Complex Types
+
+Map, JSON, and Struct types are supported across all major Feast backends:
+
+| Backend | Native Type | Feast Type |
+|---------|-------------|------------|
+| PostgreSQL | `jsonb` | `Map`, `Json`, `Struct` |
+| PostgreSQL | `jsonb[]` | `Array(Map)` |
+| Snowflake | `VARIANT`, `OBJECT` | `Map` |
+| Snowflake | `JSON` | `Json` |
+| Redshift | `SUPER` | `Map` |
+| Redshift | `json` | `Json` |
+| BigQuery | `JSON` | `Json` |
+| BigQuery | `STRUCT`, `RECORD` | `Struct` |
+| Spark | `map` | `Map` |
+| Spark | `array<map>` | `Array(Map)` |
+| Spark | `struct<...>` | `Struct` |
+| Spark | `array<struct<...>>` | `Array(Struct(...))` |
+| MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
+| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
+| Redis | Proto bytes | `Map`, `Json`, `Struct` |
+| Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |
+
+**Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.

**Note**: Feast currently does *not* support a null type in its type system.
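The precedence rule for ambiguous native types can be sketched as follows (hypothetical helper names, not Feast internals):

```python
# Minimal sketch of the precedence rule: a schema-declared Feast type always
# wins; backend-based inference is only consulted when no type was declared.
# The first candidate per native type stands in for the inference default.
AMBIGUOUS_BACKEND_TYPES = {
    "jsonb": ["Map", "Json", "Struct"],
    "SUPER": ["Map"],
}


def resolve_feast_type(declared_type, backend_type):
    """Return the declared type if present, else infer from the backend type."""
    if declared_type is not None:
        return declared_type
    candidates = AMBIGUOUS_BACKEND_TYPES.get(backend_type)
    return candidates[0] if candidates else None
```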
\ No newline at end of file diff --git a/docs/getting-started/concepts/feature-view.md b/docs/getting-started/concepts/feature-view.md index faaaf54408a..4ea007a1f91 100644 --- a/docs/getting-started/concepts/feature-view.md +++ b/docs/getting-started/concepts/feature-view.md @@ -24,6 +24,7 @@ Feature views consist of: * (optional, but recommended) a schema specifying one or more [features](feature-view.md#field) (without this, Feast will infer the schema by reading from the data source) * (optional, but recommended) metadata (for example, description, or other free-form metadata via `tags`) * (optional) a TTL, which limits how far back Feast will look when generating historical datasets +* (optional) `enable_validation=True`, which enables schema validation during materialization (see [Schema Validation](#schema-validation) below) Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view. @@ -159,6 +160,43 @@ Feature names must be unique within a [feature view](feature-view.md#feature-vie Each field can have additional metadata associated with it, specified as key-value [tags](https://rtd.feast.dev/en/master/feast.html#feast.field.Field). +## Schema Validation + +Feature views support an optional `enable_validation` parameter that enables schema validation during materialization and historical feature retrieval. When enabled, Feast verifies that: + +- All declared feature columns are present in the input data. +- Column data types match the expected Feast types (mismatches are logged as warnings). + +This is useful for catching data quality issues early in the pipeline. 
To enable it:
+
+```python
+from feast import FeatureView, Field
+from feast.types import Int32, Int64, Float32, Json, Map, String, Struct
+
+validated_fv = FeatureView(
+    name="validated_features",
+    entities=[driver],
+    schema=[
+        Field(name="trips_today", dtype=Int64),
+        Field(name="rating", dtype=Float32),
+        Field(name="preferences", dtype=Map),
+        Field(name="config", dtype=Json),  # opaque JSON data
+        Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})),  # typed struct
+    ],
+    source=my_source,
+    enable_validation=True,  # enables schema checks
+)
+```
+
+**JSON vs Map vs Struct**: These three complex types serve different purposes:
+- **`Map`**: Schema-free dictionary with string keys. Use when the keys and values are dynamic.
+- **`Json`**: Opaque JSON data stored as a string. Backends use native JSON types (`jsonb`, `VARIANT`). Use for configuration blobs or API responses where you don't need field-level typing.
+- **`Struct`**: Schema-aware structured type with named, typed fields. Persisted through the registry via Field tags. Use when you know the exact structure and want type safety.
+
+Validation is supported in all compute engines (Local, Spark, and Ray). When a required column is missing, a `ValueError` is raised. Type mismatches are logged as warnings but do not block execution, allowing for safe gradual adoption.
+
+The `enable_validation` parameter is also available on `BatchFeatureView` and `StreamFeatureView`, as well as their respective decorators (`@batch_feature_view` and `@stream_feature_view`).
+
 ## \[Alpha] On demand feature views
 
 On demand feature views allow data scientists to use existing features and request time data (features only available at request time) to transform and create new features. Users define Python transformation logic which is executed in both the historical retrieval and online retrieval paths.
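The validation contract described above (missing columns raise, type mismatches only warn) can be sketched independently of Feast. The helper below is illustrative, working on plain type-name strings rather than Feast's PyArrow-based checks:

```python
import logging

logger = logging.getLogger(__name__)


def validate_columns(expected: dict, actual: dict) -> list:
    """Check an actual schema against an expected one.

    Both arguments map column name -> type name. A missing column raises
    ValueError; a type mismatch is logged as a warning and collected,
    mirroring the warn-but-continue behavior described above.
    """
    missing = set(expected) - set(actual)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    mismatches = []
    for column, expected_type in expected.items():
        if actual[column] != expected_type:
            message = f"{column}: expected {expected_type}, got {actual[column]}"
            logger.warning(message)
            mismatches.append(message)
    return mismatches
```

The asymmetry (raise vs. warn) is the design choice that makes gradual adoption safe: a pipeline never silently drops a declared feature, but a benign widening such as `float32` to `float64` does not block materialization.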
diff --git a/docs/how-to-guides/dbt-integration.md b/docs/how-to-guides/dbt-integration.md
index c85cf2508db..02c188d6bf1 100644
--- a/docs/how-to-guides/dbt-integration.md
+++ b/docs/how-to-guides/dbt-integration.md
@@ -289,6 +289,12 @@ Feast automatically maps dbt/warehouse column types to Feast types:
 | `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
 | `BYTES`, `BINARY` | `Bytes` |
 | `ARRAY` | `Array(type)` |
+| `JSON`, `JSONB` | `Map` (or `Json` if declared in schema) |
+| `VARIANT`, `OBJECT` | `Map` |
+| `SUPER` | `Map` |
+| `MAP` | `Map` |
+| `STRUCT`, `RECORD` | `Struct` (BigQuery) |
+| `struct<...>` | `Struct` (Spark) |
 
 Snowflake `NUMBER(precision, scale)` types are handled specially:
 - Scale > 0: `Float64`
diff --git a/docs/specs/offline_store_format.md b/docs/specs/offline_store_format.md
index ac829dd52f1..1b440d34c27 100644
--- a/docs/specs/offline_store_format.md
+++ b/docs/specs/offline_store_format.md
@@ -49,6 +49,12 @@ Here's how Feast types map to Pandas types for Feast APIs that take in or return
 | DOUBLE\_LIST | `list[float]`|
 | FLOAT\_LIST | `list[float]`|
 | BOOL\_LIST | `list[bool]`|
+| MAP | `dict` (`Dict[str, Any]`)|
+| MAP\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|
+| JSON | `object` (parsed Python dict/list/str)|
+| JSON\_LIST | `list[object]`|
+| STRUCT | `dict` (`Dict[str, Any]`)|
+| STRUCT\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|
 
 Note that this mapping is non-injective; that is, more than one Pandas type may correspond to one Feast type (but not vice versa). In these cases, when converting Feast values to Pandas, the **first** Pandas type in the table above is used.
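As a small illustration of the JSON rows above (assumed behavior consistent with this spec, not an actual Feast API), a JSON-typed feature value travels as a string and is parsed back into a Python object on retrieval:

```python
import json

# Hypothetical retrieval-side helper: JSON values are strings at the proto
# level; parsing yields the `object` dtype (dict/list/str/number) noted above.
def parse_json_feature(raw: str):
    """Parse a JSON-typed feature value; invalid JSON raises ValueError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON feature value: {e}")


config = parse_json_feature('{"max_distance_km": 50, "zones": ["north", "east"]}')
```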
@@ -78,6 +84,12 @@ Here's how Feast types map to BigQuery types when using BigQuery for offline sto
 | DOUBLE\_LIST | `ARRAY<FLOAT64>`|
 | FLOAT\_LIST | `ARRAY<FLOAT64>`|
 | BOOL\_LIST | `ARRAY<BOOL>`|
+| MAP | `JSON` / `STRUCT` |
+| MAP\_LIST | `ARRAY<JSON>` / `ARRAY<STRUCT>` |
+| JSON | `JSON` |
+| JSON\_LIST | `ARRAY<JSON>` |
+| STRUCT | `STRUCT` / `RECORD` |
+| STRUCT\_LIST | `ARRAY<STRUCT>` |
 
 Values that are not specified by the table above will cause an error on conversion.
@@ -94,3 +106,23 @@ https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#snowflake-
 | INT32 | `INT8 / UINT8 / INT16 / UINT16 / INT32 / UINT32` |
 | INT64 | `INT64 / UINT64` |
 | DOUBLE | `FLOAT64` |
+| MAP | `VARIANT` / `OBJECT` |
+| JSON | `JSON` / `VARIANT` |
+
+#### Redshift Types
+Here's how Feast types map to Redshift types when using Redshift for offline storage:
+
+| Feast Type | Redshift Type |
+|-------------|--|
+| Event Timestamp | `TIMESTAMP` / `TIMESTAMPTZ` |
+| BYTES | `VARBYTE` |
+| STRING | `VARCHAR` |
+| INT32 | `INT4` / `SMALLINT` |
+| INT64 | `INT8` / `BIGINT` |
+| DOUBLE | `FLOAT8` / `DOUBLE PRECISION` |
+| FLOAT | `FLOAT4` / `REAL` |
+| BOOL | `BOOL` |
+| MAP | `SUPER` |
+| JSON | `json` / `SUPER` |
+
+Note: Redshift's `SUPER` type stores semi-structured JSON data. During materialization, Feast automatically handles `SUPER` columns that are exported as JSON strings by parsing them back into Python dictionaries before converting to `MAP` proto values.
diff --git a/protos/feast/core/FeatureView.proto b/protos/feast/core/FeatureView.proto
index 6306d425be3..b0a62a1c854 100644
--- a/protos/feast/core/FeatureView.proto
+++ b/protos/feast/core/FeatureView.proto
@@ -36,7 +36,7 @@ message FeatureView {
   FeatureViewMeta meta = 2;
 }
 
-// Next available id: 17
+// Next available id: 18
 // TODO(adchia): refactor common fields from this and ODFV into separate metadata proto
 message FeatureViewSpec {
   // Name of the feature view. Must be unique. Not updated.
@@ -89,6 +89,9 @@ message FeatureViewSpec { // The transformation mode (e.g., "python", "pandas", "spark", "sql", "ray") string mode = 16; + + // Whether schema validation is enabled during materialization + bool enable_validation = 17; } message FeatureViewMeta { diff --git a/protos/feast/core/StreamFeatureView.proto b/protos/feast/core/StreamFeatureView.proto index 6492cbe3069..5f9ee6ce39d 100644 --- a/protos/feast/core/StreamFeatureView.proto +++ b/protos/feast/core/StreamFeatureView.proto @@ -37,7 +37,7 @@ message StreamFeatureView { FeatureViewMeta meta = 2; } -// Next available id: 20 +// Next available id: 21 message StreamFeatureViewSpec { // Name of the feature view. Must be unique. Not updated. string name = 1; @@ -99,5 +99,8 @@ message StreamFeatureViewSpec { // Hop size for tiling (e.g., 5 minutes). Determines the granularity of pre-aggregated tiles. // If not specified, defaults to 5 minutes. Only used when enable_tiling is true. google.protobuf.Duration tiling_hop_size = 19; + + // Whether schema validation is enabled during materialization + bool enable_validation = 20; } diff --git a/protos/feast/types/Value.proto b/protos/feast/types/Value.proto index be93235ab36..ada2ba42791 100644 --- a/protos/feast/types/Value.proto +++ b/protos/feast/types/Value.proto @@ -53,6 +53,10 @@ message ValueType { FLOAT_SET = 27; BOOL_SET = 28; UNIX_TIMESTAMP_SET = 29; + JSON = 32; + JSON_LIST = 33; + STRUCT = 34; + STRUCT_LIST = 35; } } @@ -88,6 +92,10 @@ message Value { FloatSet float_set_val = 27; BoolSet bool_set_val = 28; Int64Set unix_timestamp_set_val = 29; + string json_val = 32; + StringList json_list_val = 33; + Map struct_val = 34; + MapList struct_list_val = 35; } } diff --git a/sdk/python/feast/batch_feature_view.py b/sdk/python/feast/batch_feature_view.py index 3f3e1bf20ec..e2a1f78441a 100644 --- a/sdk/python/feast/batch_feature_view.py +++ b/sdk/python/feast/batch_feature_view.py @@ -97,6 +97,7 @@ def __init__( feature_transformation: 
Optional[Transformation] = None, batch_engine: Optional[Dict[str, Any]] = None, aggregations: Optional[List[Aggregation]] = None, + enable_validation: bool = False, ): if not flags_helper.is_test(): warnings.warn( @@ -136,6 +137,7 @@ def __init__( source=source, # type: ignore[arg-type] sink_source=sink_source, mode=mode, + enable_validation=enable_validation, ) def get_feature_transformation(self) -> Optional[Transformation]: @@ -169,6 +171,7 @@ def batch_feature_view( description: str = "", owner: str = "", schema: Optional[List[Field]] = None, + enable_validation: bool = False, ): """ Creates a BatchFeatureView object with the given user-defined function (UDF) as the transformation. @@ -199,6 +202,7 @@ def decorator(user_function): schema=schema, udf=user_function, udf_string=udf_string, + enable_validation=enable_validation, ) functools.update_wrapper(wrapper=batch_feature_view_obj, wrapped=user_function) return batch_feature_view_obj diff --git a/sdk/python/feast/driver_test_data.py b/sdk/python/feast/driver_test_data.py index d96c9c6d387..39b7faf22c2 100644 --- a/sdk/python/feast/driver_test_data.py +++ b/sdk/python/feast/driver_test_data.py @@ -136,10 +136,38 @@ def create_driver_hourly_stats_df(drivers, start_date, end_date) -> pd.DataFrame df_all_drivers["conv_rate"] = np.random.random(size=rows).astype(np.float32) df_all_drivers["acc_rate"] = np.random.random(size=rows).astype(np.float32) df_all_drivers["avg_daily_trips"] = np.random.randint(0, 1000, size=rows).astype( - np.int32 + np.int64 ) df_all_drivers["created"] = pd.to_datetime(pd.Timestamp.now(tz=None).round("ms")) + # Complex type columns for Map, Json, and Struct examples + import json as _json + + df_all_drivers["driver_metadata"] = [ + { + "vehicle_type": np.random.choice(["sedan", "suv", "truck"]), + "rating": str(round(np.random.uniform(3.0, 5.0), 1)), + } + for _ in range(len(df_all_drivers)) + ] + df_all_drivers["driver_config"] = [ + _json.dumps( + { + "max_distance_km": 
int(np.random.randint(10, 200)), + "preferred_zones": list( + np.random.choice( + ["north", "south", "east", "west"], size=2, replace=False + ) + ), + } + ) + for _ in range(len(df_all_drivers)) + ] + df_all_drivers["driver_profile"] = [ + {"name": f"driver_{driver_id}", "age": str(int(np.random.randint(25, 60)))} + for driver_id in df_all_drivers["driver_id"] + ] + # Create duplicate rows that should be filtered by created timestamp # TODO: These duplicate rows area indirectly being filtered out by the point in time join already. We need to # inject a bad row at a timestamp where we know it will get joined to the entity dataframe, and then test that diff --git a/sdk/python/feast/feature_view.py b/sdk/python/feast/feature_view.py index 20a8dbba6ab..b6774170fbe 100644 --- a/sdk/python/feast/feature_view.py +++ b/sdk/python/feast/feature_view.py @@ -109,6 +109,7 @@ class FeatureView(BaseFeatureView): owner: str materialization_intervals: List[Tuple[datetime, datetime]] mode: Optional[Union["TransformationMode", str]] + enable_validation: bool def __init__( self, @@ -125,6 +126,7 @@ def __init__( tags: Optional[Dict[str, str]] = None, owner: str = "", mode: Optional[Union["TransformationMode", str]] = None, + enable_validation: bool = False, ): """ Creates a FeatureView object. @@ -150,11 +152,14 @@ def __init__( primary maintainer. mode (optional): The transformation mode for feature transformations. Only meaningful when transformations are applied. Choose from TransformationMode enum values. + enable_validation (optional): If True, enables schema validation during materialization + to check that data conforms to the declared feature types. Default is False. Raises: ValueError: A field mapping conflicts with an Entity or a Feature. 
""" self.name = name + self.enable_validation = enable_validation self.entities = [e.name for e in entities] if entities else [DUMMY_ENTITY_NAME] self.ttl = ttl schema = schema or [] @@ -281,6 +286,7 @@ def __copy__(self): online=self.online, offline=self.offline, sink_source=self.batch_source if self.source_views else None, + enable_validation=self.enable_validation, ) # This is deliberately set outside of the FV initialization as we do not have the Entity objects. @@ -309,6 +315,7 @@ def __eq__(self, other): or sorted(self.entity_columns) != sorted(other.entity_columns) or self.source_views != other.source_views or self.materialization_intervals != other.materialization_intervals + or self.enable_validation != other.enable_validation ): return False @@ -447,6 +454,7 @@ def to_proto_spec( source_views=source_view_protos, feature_transformation=feature_transformation_proto, mode=mode_to_string(self.mode), + enable_validation=self.enable_validation, ) def to_proto_meta(self): @@ -616,6 +624,9 @@ def _from_proto_internal( f"Entities: {feature_view.entities} vs Entity Columns: {feature_view.entity_columns}" ) + # Restore enable_validation from proto field. + feature_view.enable_validation = feature_view_proto.spec.enable_validation + # FeatureViewProjections are not saved in the FeatureView proto. # Create the default projection. feature_view.projection = FeatureViewProjection.from_feature_view_definition( diff --git a/sdk/python/feast/field.py b/sdk/python/feast/field.py index 27552878afc..c61ed6a5c5e 100644 --- a/sdk/python/feast/field.py +++ b/sdk/python/feast/field.py @@ -12,15 +12,18 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+import json from typing import Dict, Optional from typeguard import typechecked from feast.feature import Feature from feast.protos.feast.core.Feature_pb2 import FeatureSpecV2 as FieldProto -from feast.types import FeastType, from_value_type +from feast.types import FeastType, Struct, from_value_type from feast.value_type import ValueType +STRUCT_SCHEMA_TAG = "feast:struct_schema" + @typechecked class Field: @@ -115,13 +118,21 @@ def __str__(self): def to_proto(self) -> FieldProto: """Converts a Field object to its protobuf representation.""" + from feast.types import Array + value_type = self.dtype.to_value_type() vector_search_metric = self.vector_search_metric or "" + tags = dict(self.tags) + # Persist Struct field schema in tags + if isinstance(self.dtype, Struct): + tags[STRUCT_SCHEMA_TAG] = _serialize_struct_schema(self.dtype) + elif isinstance(self.dtype, Array) and isinstance(self.dtype.base_type, Struct): + tags[STRUCT_SCHEMA_TAG] = _serialize_struct_schema(self.dtype.base_type) return FieldProto( name=self.name, value_type=value_type.value, description=self.description, - tags=self.tags, + tags=tags, vector_index=self.vector_index, vector_length=self.vector_length, vector_search_metric=vector_search_metric, @@ -136,13 +147,30 @@ def from_proto(cls, field_proto: FieldProto): field_proto: FieldProto protobuf object """ value_type = ValueType(field_proto.value_type) + tags = dict(field_proto.tags) vector_search_metric = getattr(field_proto, "vector_search_metric", "") vector_index = getattr(field_proto, "vector_index", False) vector_length = getattr(field_proto, "vector_length", 0) + + # Reconstruct Struct type from persisted schema in tags + from feast.types import Array + + dtype: FeastType + if value_type == ValueType.STRUCT and STRUCT_SCHEMA_TAG in tags: + dtype = _deserialize_struct_schema(tags[STRUCT_SCHEMA_TAG]) + user_tags = {k: v for k, v in tags.items() if k != STRUCT_SCHEMA_TAG} + elif value_type == ValueType.STRUCT_LIST and STRUCT_SCHEMA_TAG in 
tags: + inner_struct = _deserialize_struct_schema(tags[STRUCT_SCHEMA_TAG]) + dtype = Array(inner_struct) + user_tags = {k: v for k, v in tags.items() if k != STRUCT_SCHEMA_TAG} + else: + dtype = from_value_type(value_type=value_type) + user_tags = tags + return cls( name=field_proto.name, - dtype=from_value_type(value_type=value_type), - tags=dict(field_proto.tags), + dtype=dtype, + tags=user_tags, description=field_proto.description, vector_index=vector_index, vector_length=vector_length, @@ -163,3 +191,75 @@ def from_feature(cls, feature: Feature): description=feature.description, tags=feature.labels, ) + + +def _feast_type_to_str(feast_type: FeastType) -> str: + """Convert a FeastType to a string representation for serialization.""" + from feast.types import ( + Array, + PrimitiveFeastType, + ) + + if isinstance(feast_type, PrimitiveFeastType): + return feast_type.name + elif isinstance(feast_type, Struct): + nested = { + name: _feast_type_to_str(ft) for name, ft in feast_type.fields.items() + } + return json.dumps({"__struct__": nested}) + elif isinstance(feast_type, Array): + return f"Array({_feast_type_to_str(feast_type.base_type)})" + else: + return str(feast_type) + + +def _str_to_feast_type(type_str: str) -> FeastType: + """Convert a string representation back to a FeastType.""" + from feast.types import ( + Array, + PrimitiveFeastType, + ) + + # Check if it's an Array type + if type_str.startswith("Array(") and type_str.endswith(")"): + inner = type_str[6:-1] + base_type = _str_to_feast_type(inner) + return Array(base_type) + + # Check if it's a nested Struct (JSON encoded) + if type_str.startswith("{"): + try: + parsed = json.loads(type_str) + if "__struct__" in parsed: + fields = { + name: _str_to_feast_type(ft_str) + for name, ft_str in parsed["__struct__"].items() + } + return Struct(fields) + except (json.JSONDecodeError, TypeError): + pass + + # Must be a PrimitiveFeastType name + try: + return PrimitiveFeastType[type_str] + except KeyError: + from 
feast.types import String + + return String + + +def _serialize_struct_schema(struct_type: Struct) -> str: + """Serialize a Struct's field schema to a JSON string for tag storage.""" + schema_dict = {} + for name, feast_type in struct_type.fields.items(): + schema_dict[name] = _feast_type_to_str(feast_type) + return json.dumps(schema_dict) + + +def _deserialize_struct_schema(schema_str: str) -> Struct: + """Deserialize a JSON string from tags back to a Struct type.""" + schema_dict = json.loads(schema_str) + fields = {} + for name, type_str in schema_dict.items(): + fields[name] = _str_to_feast_type(type_str) + return Struct(fields) diff --git a/sdk/python/feast/infra/compute_engines/local/feature_builder.py b/sdk/python/feast/infra/compute_engines/local/feature_builder.py index 3463c0e074b..754a00db76f 100644 --- a/sdk/python/feast/infra/compute_engines/local/feature_builder.py +++ b/sdk/python/feast/infra/compute_engines/local/feature_builder.py @@ -1,3 +1,4 @@ +import logging from typing import Union from feast.aggregation import aggregation_specs_to_agg_ops @@ -16,6 +17,9 @@ LocalValidationNode, ) from feast.infra.registry.base_registry import BaseRegistry +from feast.types import PrimitiveFeastType, from_feast_to_pyarrow_type + +logger = logging.getLogger(__name__) class LocalFeatureBuilder(FeatureBuilder): @@ -88,7 +92,36 @@ def build_transformation_node(self, view, input_nodes): return node def build_validation_node(self, view, input_node): - validation_config = view.validation_config + validation_config = getattr(view, "validation_config", None) or {} + + if not validation_config.get("columns") and hasattr(view, "features"): + columns = {} + json_columns = set() + for feature in view.features: + try: + columns[feature.name] = from_feast_to_pyarrow_type(feature.dtype) + except (ValueError, KeyError): + logger.debug( + "Could not resolve PyArrow type for feature '%s' " + "(dtype=%s), skipping type check for this column.", + feature.name, + feature.dtype, + ) 
+ columns[feature.name] = None + # Track which columns are Json type for content validation + if ( + isinstance(feature.dtype, PrimitiveFeastType) + and feature.dtype.name == "JSON" + ): + json_columns.add(feature.name) + if columns: + validation_config = {**validation_config, "columns": columns} + if json_columns: + validation_config = { + **validation_config, + "json_columns": json_columns, + } + node = LocalValidationNode( "validate", validation_config, self.backend, inputs=[input_node] ) diff --git a/sdk/python/feast/infra/compute_engines/local/nodes.py b/sdk/python/feast/infra/compute_engines/local/nodes.py index 985a089daae..db65761a5e2 100644 --- a/sdk/python/feast/infra/compute_engines/local/nodes.py +++ b/sdk/python/feast/infra/compute_engines/local/nodes.py @@ -1,5 +1,7 @@ +import json +import logging from datetime import datetime, timedelta -from typing import List, Optional, Union +from typing import List, Optional, Set, Union import pyarrow as pa @@ -19,6 +21,8 @@ ) from feast.utils import _convert_arrow_to_proto +logger = logging.getLogger(__name__) + ENTITY_TS_ALIAS = "__entity_event_timestamp" @@ -236,15 +240,114 @@ def __init__( def execute(self, context: ExecutionContext) -> ArrowTableValue: input_table = self.get_single_table(context).data - df = self.backend.from_arrow(input_table) - # Placeholder for actual validation logic + if self.validation_config: - print(f"[Validation: {self.name}] Passed.") - result = self.backend.to_arrow(df) - output = ArrowTableValue(result) + self._validate_schema(input_table) + + output = ArrowTableValue(input_table) context.node_outputs[self.name] = output return output + def _validate_schema(self, table: pa.Table): + """Validate that the input table conforms to the expected schema. + + Checks that all expected columns are present, that their types + are compatible with the declared Feast types, and that Json columns + contain well-formed JSON. 
Logs warnings for type mismatches but + raises on missing columns or invalid JSON content. + """ + expected_columns = self.validation_config.get("columns", {}) + if not expected_columns: + logger.debug( + "[Validation: %s] No column schema to validate against.", + self.name, + ) + return + + actual_columns = set(table.column_names) + expected_names = set(expected_columns.keys()) + + missing = expected_names - actual_columns + if missing: + raise ValueError( + f"[Validation: {self.name}] Missing expected columns: {missing}. " + f"Actual columns: {sorted(actual_columns)}" + ) + + for col_name, expected_type in expected_columns.items(): + actual_type = table.schema.field(col_name).type + if expected_type is not None and actual_type != expected_type: + # PyArrow map columns and struct columns are compatible + # with the Feast Map type — skip warning for these cases + if pa.types.is_map(expected_type) and ( + pa.types.is_map(actual_type) + or pa.types.is_struct(actual_type) + or pa.types.is_large_list(actual_type) + or pa.types.is_list(actual_type) + ): + continue + + # JSON type (large_string) is compatible with string types + if pa.types.is_large_string(expected_type) and ( + pa.types.is_string(actual_type) + or pa.types.is_large_string(actual_type) + ): + continue + + # Struct type — expected struct is compatible with actual + # struct or map representations + if pa.types.is_struct(expected_type) and ( + pa.types.is_struct(actual_type) + or pa.types.is_map(actual_type) + or pa.types.is_list(actual_type) + ): + continue + + logger.warning( + "[Validation: %s] Column '%s' type mismatch: expected %s, got %s", + self.name, + col_name, + expected_type, + actual_type, + ) + + # Validate JSON well-formedness for declared Json columns + json_columns: Set[str] = self.validation_config.get("json_columns", set()) + for col_name in json_columns: + if col_name not in actual_columns: + continue + + column = table.column(col_name) + invalid_count = 0 + first_error = None + 
first_error_row = None + + for i in range(len(column)): + value = column[i] + if not value.is_valid: + continue + + str_value = value.as_py() + if not isinstance(str_value, str): + continue + + try: + json.loads(str_value) + except (json.JSONDecodeError, TypeError) as e: + invalid_count += 1 + if first_error is None: + first_error = str(e) + first_error_row = i + + if invalid_count > 0: + raise ValueError( + f"[Validation: {self.name}] Column '{col_name}' declared as Json " + f"contains {invalid_count} invalid JSON value(s). " + f"First error at row {first_error_row}: {first_error}" + ) + + logger.debug("[Validation: %s] Schema validation passed.", self.name) + class LocalOutputNode(LocalNode): def __init__( diff --git a/sdk/python/feast/infra/compute_engines/ray/feature_builder.py b/sdk/python/feast/infra/compute_engines/ray/feature_builder.py index 274fe87599c..a9830162c1e 100644 --- a/sdk/python/feast/infra/compute_engines/ray/feature_builder.py +++ b/sdk/python/feast/infra/compute_engines/ray/feature_builder.py @@ -17,8 +17,10 @@ RayJoinNode, RayReadNode, RayTransformationNode, + RayValidationNode, RayWriteNode, ) +from feast.types import PrimitiveFeastType, from_feast_to_pyarrow_type if TYPE_CHECKING: from feast.infra.compute_engines.ray.config import RayComputeEngineConfig @@ -174,11 +176,36 @@ def build_output_nodes(self, view, final_node): def build_validation_node(self, view, input_node): """Build the validation node for feature validation.""" - # TODO: Implement validation logic - logger.warning( - "Feature validation is not yet implemented for Ray compute engine." 
+ expected_columns = {} + json_columns: set = set() + if hasattr(view, "features"): + for feature in view.features: + try: + expected_columns[feature.name] = from_feast_to_pyarrow_type( + feature.dtype + ) + except (ValueError, KeyError): + logger.debug( + "Could not resolve PyArrow type for feature '%s' " + "(dtype=%s), skipping type check for this column.", + feature.name, + feature.dtype, + ) + expected_columns[feature.name] = None + if ( + isinstance(feature.dtype, PrimitiveFeastType) + and feature.dtype.name == "JSON" + ): + json_columns.add(feature.name) + + node = RayValidationNode( + f"{view.name}:validate", + expected_columns=expected_columns, + json_columns=json_columns, + inputs=[input_node], ) - return input_node + self.nodes.append(node) + return node def _build(self, view, input_nodes: Optional[List[DAGNode]]) -> DAGNode: has_physical_source = (hasattr(view, "batch_source") and view.batch_source) or ( diff --git a/sdk/python/feast/infra/compute_engines/ray/nodes.py b/sdk/python/feast/infra/compute_engines/ray/nodes.py index 89694a57e2d..7d600728adf 100644 --- a/sdk/python/feast/infra/compute_engines/ray/nodes.py +++ b/sdk/python/feast/infra/compute_engines/ray/nodes.py @@ -1,6 +1,7 @@ +import json import logging from datetime import datetime, timedelta, timezone -from typing import Dict, List, Optional, Union +from typing import Dict, List, Optional, Set, Union import dill import pandas as pd @@ -847,3 +848,127 @@ def write_batch_with_serialized_artifacts(batch: pd.DataFrame) -> pd.DataFrame: ), }, ) + + +class RayValidationNode(DAGNode): + """ + Ray node for validating feature data against the declared schema. + + Checks that all expected columns are present and logs warnings for + type mismatches. Validation runs once on the first batch to avoid + per-batch overhead; the full dataset is passed through unchanged. 
+ """ + + def __init__( + self, + name: str, + expected_columns: Dict[str, Optional[pa.DataType]], + json_columns: Optional[Set[str]] = None, + inputs: Optional[List[DAGNode]] = None, + ): + super().__init__(name, inputs=inputs) + self.expected_columns = expected_columns + self.json_columns = json_columns or set() + + def execute(self, context: ExecutionContext) -> DAGValue: + input_value = self.get_single_input_value(context) + dataset = input_value.data + + if not self.expected_columns: + context.node_outputs[self.name] = input_value + return input_value + + expected_names = set(self.expected_columns.keys()) + + schema = dataset.schema() + actual_columns = set(schema.names) + + missing = expected_names - actual_columns + if missing: + raise ValueError( + f"[Validation: {self.name}] Missing expected columns: {missing}. " + f"Actual columns: {sorted(actual_columns)}" + ) + + for col_name, expected_type in self.expected_columns.items(): + if expected_type is None: + continue + actual_field = schema.field(col_name) + actual_type = actual_field.type + if actual_type != expected_type: + # Map type compatibility + if pa.types.is_map(expected_type) and ( + pa.types.is_map(actual_type) + or pa.types.is_struct(actual_type) + or pa.types.is_list(actual_type) + ): + continue + + # JSON type compatibility (large_string / string) + if pa.types.is_large_string(expected_type) and ( + pa.types.is_string(actual_type) + or pa.types.is_large_string(actual_type) + ): + continue + + # Struct type compatibility + if pa.types.is_struct(expected_type) and ( + pa.types.is_struct(actual_type) + or pa.types.is_map(actual_type) + or pa.types.is_list(actual_type) + ): + continue + + logger.warning( + "[Validation: %s] Column '%s' type mismatch: expected %s, got %s", + self.name, + col_name, + expected_type, + actual_type, + ) + + # Validate JSON well-formedness for declared Json columns + if self.json_columns: + try: + first_batch = dataset.take_batch(1000) + except Exception: + logger.debug( 
+ "[Validation: %s] Could not sample batch for JSON validation.", + self.name, + ) + first_batch = None + + if first_batch is not None: + for col_name in self.json_columns: + if col_name not in first_batch: + continue + + values = first_batch[col_name] + invalid_count = 0 + first_error = None + first_error_row = None + + for i, value in enumerate(values): + if value is None: + continue + if not isinstance(value, str): + continue + try: + json.loads(value) + except (json.JSONDecodeError, TypeError) as e: + invalid_count += 1 + if first_error is None: + first_error = str(e) + first_error_row = i + + if invalid_count > 0: + raise ValueError( + f"[Validation: {self.name}] Column '{col_name}' declared " + f"as Json contains {invalid_count} invalid JSON value(s) " + f"in sampled batch. First error at row {first_error_row}: " + f"{first_error}" + ) + + logger.debug("[Validation: %s] Schema validation passed.", self.name) + context.node_outputs[self.name] = input_value + return input_value diff --git a/sdk/python/feast/infra/compute_engines/spark/feature_builder.py b/sdk/python/feast/infra/compute_engines/spark/feature_builder.py index 11a3c1587f6..94f29220513 100644 --- a/sdk/python/feast/infra/compute_engines/spark/feature_builder.py +++ b/sdk/python/feast/infra/compute_engines/spark/feature_builder.py @@ -1,3 +1,4 @@ +import logging from typing import Union from pyspark.sql import SparkSession @@ -12,9 +13,14 @@ SparkJoinNode, SparkReadNode, SparkTransformationNode, + SparkValidationNode, SparkWriteNode, + from_feast_to_spark_type, ) from feast.infra.registry.base_registry import BaseRegistry +from feast.types import PrimitiveFeastType + +logger = logging.getLogger(__name__) class SparkFeatureBuilder(FeatureBuilder): @@ -115,4 +121,30 @@ def build_output_nodes(self, view, input_node): return node def build_validation_node(self, view, input_node): - pass + expected_columns = {} + json_columns: set = set() + if hasattr(view, "features"): + for feature in view.features: + 
spark_type = from_feast_to_spark_type(feature.dtype) + if spark_type is None: + logger.debug( + "Could not resolve Spark type for feature '%s' " + "(dtype=%s), skipping type check for this column.", + feature.name, + feature.dtype, + ) + expected_columns[feature.name] = spark_type + if ( + isinstance(feature.dtype, PrimitiveFeastType) + and feature.dtype.name == "JSON" + ): + json_columns.add(feature.name) + + node = SparkValidationNode( + f"{view.name}:validate", + expected_columns=expected_columns, + json_columns=json_columns, + inputs=[input_node], + ) + self.nodes.append(node) + return node diff --git a/sdk/python/feast/infra/compute_engines/spark/nodes.py b/sdk/python/feast/infra/compute_engines/spark/nodes.py index 124ce65ff90..d5463a9a16d 100644 --- a/sdk/python/feast/infra/compute_engines/spark/nodes.py +++ b/sdk/python/feast/infra/compute_engines/spark/nodes.py @@ -1,10 +1,28 @@ +import json +import logging from datetime import datetime, timedelta -from typing import Callable, List, Optional, Union, cast +from typing import Callable, Dict, List, Optional, Set, Union, cast import pandas as pd from pyspark.sql import DataFrame, SparkSession, Window from pyspark.sql import functions as F from pyspark.sql.pandas.types import from_arrow_schema +from pyspark.sql.types import ( + ArrayType, + BinaryType, + BooleanType, + DoubleType, + FloatType, + IntegerType, + LongType, + MapType, + StringType, + StructType, + TimestampType, +) +from pyspark.sql.types import ( + DataType as SparkDataType, +) from feast import BatchFeatureView, StreamFeatureView from feast.aggregation import Aggregation @@ -29,6 +47,103 @@ infer_event_timestamp_from_entity_df, ) +logger = logging.getLogger(__name__) + + +def from_feast_to_spark_type(feast_type) -> Optional[SparkDataType]: + """Convert a Feast type to a PySpark DataType. + + Returns None if the Feast type cannot be mapped. 
+ """ + from feast.types import ( + Array, + PrimitiveFeastType, + Set, + Struct, + ) + + if isinstance(feast_type, Struct): + from pyspark.sql.types import StructField + + spark_fields = [] + for name, ftype in feast_type.fields.items(): + spark_type = from_feast_to_spark_type(ftype) + if spark_type is None: + return None + spark_fields.append(StructField(name, spark_type, nullable=True)) + return StructType(spark_fields) + + if isinstance(feast_type, PrimitiveFeastType): + mapping = { + PrimitiveFeastType.BYTES: BinaryType(), + PrimitiveFeastType.STRING: StringType(), + PrimitiveFeastType.INT32: IntegerType(), + PrimitiveFeastType.INT64: LongType(), + PrimitiveFeastType.FLOAT64: DoubleType(), + PrimitiveFeastType.FLOAT32: FloatType(), + PrimitiveFeastType.BOOL: BooleanType(), + PrimitiveFeastType.UNIX_TIMESTAMP: TimestampType(), + PrimitiveFeastType.MAP: MapType(StringType(), StringType()), + PrimitiveFeastType.JSON: StringType(), + } + return mapping.get(feast_type) + + if isinstance(feast_type, Array): + base_type = feast_type.base_type + if isinstance(base_type, Struct): + inner = from_feast_to_spark_type(base_type) + return ArrayType(inner) if inner else None + if isinstance(base_type, PrimitiveFeastType): + if base_type == PrimitiveFeastType.MAP: + return ArrayType(MapType(StringType(), StringType())) + inner = from_feast_to_spark_type(base_type) + return ArrayType(inner) if inner else None + + if isinstance(feast_type, Set): + inner = from_feast_to_spark_type(feast_type.base_type) + return ArrayType(inner) if inner else None + + return None + + +def _spark_types_compatible(expected: SparkDataType, actual: SparkDataType) -> bool: + """Check if two Spark types are compatible for validation purposes. + + Exact match is always compatible. Beyond that, we allow common + representations that arise from different data source encodings. 
+ """ + if expected == actual: + return True + + # Map ↔ Struct: data sources may encode maps as structs or vice versa + if isinstance(expected, MapType) and isinstance(actual, (MapType, StructType)): + return True + if isinstance(expected, StructType) and isinstance(actual, (StructType, MapType)): + return True + + # Json (StringType) is always compatible with StringType + if isinstance(expected, StringType) and isinstance(actual, StringType): + return True + + # Integer widening: IntegerType ↔ LongType + if isinstance(expected, (IntegerType, LongType)) and isinstance( + actual, (IntegerType, LongType) + ): + return True + + # Float widening: FloatType ↔ DoubleType + if isinstance(expected, (FloatType, DoubleType)) and isinstance( + actual, (FloatType, DoubleType) + ): + return True + + # Array compatibility: compare element types + if isinstance(expected, ArrayType) and isinstance(actual, ArrayType): + return _spark_types_compatible(expected.elementType, actual.elementType) + + return False + + ENTITY_TS_ALIAS = "__entity_event_timestamp" @@ -510,3 +625,107 @@ def execute(self, context: ExecutionContext) -> DAGValue: return DAGValue( data=transformed_df, format=DAGFormat.SPARK, metadata={"transformed": True} ) + + +class SparkValidationNode(DAGNode): + """ + Spark node for validating feature data against the declared schema. + + Checks that all expected columns are present in the Spark DataFrame, + validates column types using native Spark types, and checks JSON + well-formedness for Json columns. 
+ """ + + def __init__( + self, + name: str, + expected_columns: Dict[str, Optional[SparkDataType]], + json_columns: Optional[Set[str]] = None, + inputs: Optional[List[DAGNode]] = None, + ): + super().__init__(name, inputs=inputs) + self.expected_columns = expected_columns + self.json_columns = json_columns or set() + + def execute(self, context: ExecutionContext) -> DAGValue: + input_value = self.get_single_input_value(context) + input_value.assert_format(DAGFormat.SPARK) + spark_df: DataFrame = input_value.data + + if not self.expected_columns: + context.node_outputs[self.name] = input_value + return input_value + + self._validate_schema(spark_df) + + logger.debug("[Validation: %s] Schema validation passed.", self.name) + context.node_outputs[self.name] = input_value + return input_value + + def _validate_schema(self, spark_df: DataFrame): + """Validate the Spark DataFrame against the expected schema. + + Checks for missing columns, type mismatches using native Spark types, + and JSON well-formedness for declared Json columns. + """ + actual_columns = set(spark_df.columns) + expected_names = set(self.expected_columns.keys()) + + missing = expected_names - actual_columns + if missing: + raise ValueError( + f"[Validation: {self.name}] Missing expected columns: {missing}. 
" + f"Actual columns: {sorted(actual_columns)}" + ) + + # Type validation using native Spark types + schema = spark_df.schema + for col_name, expected_type in self.expected_columns.items(): + if expected_type is None: + continue + try: + actual_field = schema[col_name] + except (KeyError, IndexError): + continue + actual_type = actual_field.dataType + if not _spark_types_compatible(expected_type, actual_type): + logger.warning( + "[Validation: %s] Column '%s' type mismatch: expected %s, got %s", + self.name, + col_name, + expected_type.simpleString(), + actual_type.simpleString(), + ) + + # Validate JSON well-formedness for declared Json columns + if self.json_columns: + sample_rows = spark_df.limit(1000).collect() + for col_name in self.json_columns: + if col_name not in actual_columns: + continue + + invalid_count = 0 + first_error = None + first_error_row = None + + for i, row in enumerate(sample_rows): + value = row[col_name] + if value is None: + continue + if not isinstance(value, str): + continue + try: + json.loads(value) + except (json.JSONDecodeError, TypeError) as e: + invalid_count += 1 + if first_error is None: + first_error = str(e) + first_error_row = i + + if invalid_count > 0: + raise ValueError( + f"[Validation: {self.name}] Column '{col_name}' declared as " + f"Json contains {invalid_count} invalid JSON value(s) in " + f"sampled rows. 
First error at row {first_error_row}: " + f"{first_error}" + ) diff --git a/sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py b/sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py index e5afa3f3ab3..a11298e9b81 100644 --- a/sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py +++ b/sdk/python/feast/infra/offline_stores/contrib/trino_offline_store/trino_type_map.py @@ -69,6 +69,15 @@ def pa_to_trino_value_type(pa_type_as_str: str) -> str: if pa_type_as_str.startswith("decimal"): return trino_type.format(pa_type_as_str) + if pa_type_as_str.startswith("map<"): + return trino_type.format("varchar") + + if pa_type_as_str == "large_string": + return trino_type.format("varchar") + + if pa_type_as_str.startswith("struct<"): + return trino_type.format("varchar") + type_map = { "null": "null", "bool": "boolean", diff --git a/sdk/python/feast/infra/offline_stores/file_source.py b/sdk/python/feast/infra/offline_stores/file_source.py index 02d40ad770b..76460a73e5c 100644 --- a/sdk/python/feast/infra/offline_stores/file_source.py +++ b/sdk/python/feast/infra/offline_stores/file_source.py @@ -1,3 +1,4 @@ +import logging from pathlib import Path from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union from urllib.parse import urlparse @@ -24,6 +25,8 @@ from feast.saved_dataset import SavedDatasetStorage from feast.value_type import ValueType +logger = logging.getLogger(__name__) + @typechecked class FileSource(DataSource): @@ -151,8 +154,43 @@ def _to_proto_impl(self) -> DataSourceProto: return data_source_proto def validate(self, config: RepoConfig): - # TODO: validate a FileSource - pass + """Validate that the file source exists and is readable. + + Checks that the path resolves to an existing Parquet or Delta file + and that the declared timestamp column is present in the schema. 
+ """ + from feast.infra.offline_stores.file_source import FileSource + + uri = self.path + repo_path = config.repo_path if hasattr(config, "repo_path") else None + resolved = FileSource.get_uri_for_file_path(repo_path, uri) + + try: + filesystem, path = FileSystem.from_uri(resolved) + file_info = filesystem.get_file_info(path) + if file_info.type == pyarrow.fs.FileType.NotFound: + raise FileNotFoundError(f"FileSource path does not exist: {resolved}") + except Exception as e: + logger.warning("Could not validate FileSource path '%s': %s", resolved, e) + return + + try: + if isinstance(self.file_options.file_format, DeltaFormat): + return + pq_dataset = ParquetDataset(path, filesystem=filesystem) + schema = pq_dataset.schema + if self.timestamp_field and self.timestamp_field not in schema.names: + logger.warning( + "Timestamp field '%s' not found in FileSource schema at '%s'. " + "Available columns: %s", + self.timestamp_field, + resolved, + schema.names, + ) + except Exception as e: + logger.warning( + "Could not read schema from FileSource '%s': %s", resolved, e + ) @staticmethod def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]: diff --git a/sdk/python/feast/infra/online_stores/dynamodb.py b/sdk/python/feast/infra/online_stores/dynamodb.py index 0353e2c2d72..9a2d57a3278 100644 --- a/sdk/python/feast/infra/online_stores/dynamodb.py +++ b/sdk/python/feast/infra/online_stores/dynamodb.py @@ -945,6 +945,12 @@ def _extract_list_values(self, value_proto: ValueProto) -> list: return list(value_proto.bool_list_val.val) elif value_proto.HasField("bytes_list_val"): return list(value_proto.bytes_list_val.val) + elif value_proto.HasField("map_list_val"): + return list(value_proto.map_list_val.val) + elif value_proto.HasField("json_list_val"): + return list(value_proto.json_list_val.val) + elif value_proto.HasField("struct_list_val"): + return list(value_proto.struct_list_val.val) return [] def _set_list_values( @@ -965,6 +971,12 @@ def _set_list_values( 
            result.bool_list_val.val.extend(values)
        elif template.HasField("bytes_list_val"):
            result.bytes_list_val.val.extend(values)
+        elif template.HasField("map_list_val"):
+            result.map_list_val.val.extend(values)
+        elif template.HasField("json_list_val"):
+            result.json_list_val.val.extend(values)
+        elif template.HasField("struct_list_val"):
+            result.struct_list_val.val.extend(values)

     async def _update_item_with_expression_async(
         self,
diff --git a/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py b/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
index 42a8f359107..86c2aaad0e6 100644
--- a/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
+++ b/sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py
@@ -56,6 +56,12 @@
     PROTO_VALUE_TO_VALUE_TYPE_MAP["int64_list_val"]: DataType.FLOAT_VECTOR,
     PROTO_VALUE_TO_VALUE_TYPE_MAP["double_list_val"]: DataType.FLOAT_VECTOR,
     PROTO_VALUE_TO_VALUE_TYPE_MAP["bool_list_val"]: DataType.BINARY_VECTOR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["map_val"]: DataType.VARCHAR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["map_list_val"]: DataType.VARCHAR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["json_val"]: DataType.VARCHAR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["json_list_val"]: DataType.VARCHAR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["struct_val"]: DataType.VARCHAR,
+    PROTO_VALUE_TO_VALUE_TYPE_MAP["struct_list_val"]: DataType.VARCHAR,
 }

 FEAST_PRIMITIVE_TO_MILVUS_TYPE_MAPPING: Dict[
@@ -433,6 +439,19 @@ def online_read(
                         "double_list_val",
                     ]:
                         getattr(val, proto_attr).val.extend(field_value)
+                    elif proto_attr in [
+                        "map_val",
+                        "map_list_val",
+                        "struct_val",
+                        "struct_list_val",
+                        "json_list_val",
+                    ]:
+                        if isinstance(field_value, str) and field_value:
+                            try:
+                                proto_bytes = base64.b64decode(field_value)
+                                val.ParseFromString(proto_bytes)
+                            except Exception:
+                                setattr(val, "string_val", field_value)
                     else:
                         setattr(val, proto_attr, field_value)
                 else:
diff --git a/sdk/python/feast/protos/feast/core/DatastoreTable_pb2.pyi b/sdk/python/feast/protos/feast/core/DatastoreTable_pb2.pyi
index 7b5a629eb7a..6339a97536e 100644
--- a/sdk/python/feast/protos/feast/core/DatastoreTable_pb2.pyi
+++ b/sdk/python/feast/protos/feast/core/DatastoreTable_pb2.pyi
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
-* Copyright 2021 The Feast Authors
-*
-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-*
-* https://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
+
+* Copyright 2021 The Feast Authors
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* https://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
 * limitations under the License.
 """
 import builtins
diff --git a/sdk/python/feast/protos/feast/core/Entity_pb2.pyi b/sdk/python/feast/protos/feast/core/Entity_pb2.pyi
index a5924a13451..025817edfee 100644
--- a/sdk/python/feast/protos/feast/core/Entity_pb2.pyi
+++ b/sdk/python/feast/protos/feast/core/Entity_pb2.pyi
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
-* Copyright 2020 The Feast Authors
-*
-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-*
-* https://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
+
+* Copyright 2020 The Feast Authors
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* https://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
 * limitations under the License.
 """
 import builtins
diff --git a/sdk/python/feast/protos/feast/core/FeatureViewProjection_pb2.pyi b/sdk/python/feast/protos/feast/core/FeatureViewProjection_pb2.pyi
index 72426f55c9f..6b44ad4a931 100644
--- a/sdk/python/feast/protos/feast/core/FeatureViewProjection_pb2.pyi
+++ b/sdk/python/feast/protos/feast/core/FeatureViewProjection_pb2.pyi
@@ -19,7 +19,7 @@ else:

 DESCRIPTOR: google.protobuf.descriptor.FileDescriptor

 class FeatureViewProjection(google.protobuf.message.Message):
-    """A projection to be applied on top of a FeatureView.
+    """A projection to be applied on top of a FeatureView.
     Contains the modifications to a FeatureView such as the features subset to use.
""" diff --git a/sdk/python/feast/protos/feast/core/FeatureView_pb2.py b/sdk/python/feast/protos/feast/core/FeatureView_pb2.py index 9a59255375f..0221a96031b 100644 --- a/sdk/python/feast/protos/feast/core/FeatureView_pb2.py +++ b/sdk/python/feast/protos/feast/core/FeatureView_pb2.py @@ -19,7 +19,7 @@ from feast.protos.feast.core import Transformation_pb2 as feast_dot_core_dot_Transformation__pb2 -DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x1c\x66\x65\x61st/core/FeatureView.proto\x12\nfeast.core\x1a\x1egoogle/protobuf/duration.proto\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1b\x66\x65\x61st/core/DataSource.proto\x1a\x18\x66\x65\x61st/core/Feature.proto\x1a\x1f\x66\x65\x61st/core/Transformation.proto\"c\n\x0b\x46\x65\x61tureView\x12)\n\x04spec\x18\x01 \x01(\x0b\x32\x1b.feast.core.FeatureViewSpec\x12)\n\x04meta\x18\x02 \x01(\x0b\x32\x1b.feast.core.FeatureViewMeta\"\xd4\x04\n\x0f\x46\x65\x61tureViewSpec\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0f\n\x07project\x18\x02 \x01(\t\x12\x10\n\x08\x65ntities\x18\x03 \x03(\t\x12+\n\x08\x66\x65\x61tures\x18\x04 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x33\n\x04tags\x18\x05 \x03(\x0b\x32%.feast.core.FeatureViewSpec.TagsEntry\x12&\n\x03ttl\x18\x06 \x01(\x0b\x32\x19.google.protobuf.Duration\x12,\n\x0c\x62\x61tch_source\x18\x07 \x01(\x0b\x32\x16.feast.core.DataSource\x12\x0e\n\x06online\x18\x08 \x01(\x08\x12-\n\rstream_source\x18\t \x01(\x0b\x32\x16.feast.core.DataSource\x12\x13\n\x0b\x64\x65scription\x18\n \x01(\t\x12\r\n\x05owner\x18\x0b \x01(\t\x12\x31\n\x0e\x65ntity_columns\x18\x0c \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x0f\n\x07offline\x18\r \x01(\x08\x12\x31\n\x0csource_views\x18\x0e \x03(\x0b\x32\x1b.feast.core.FeatureViewSpec\x12\x43\n\x16\x66\x65\x61ture_transformation\x18\x0f \x01(\x0b\x32#.feast.core.FeatureTransformationV2\x12\x0c\n\x04mode\x18\x10 \x01(\t\x1a+\n\tTagsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 
\x01(\t:\x02\x38\x01\"\xcc\x01\n\x0f\x46\x65\x61tureViewMeta\x12\x35\n\x11\x63reated_timestamp\x18\x01 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12:\n\x16last_updated_timestamp\x18\x02 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12\x46\n\x19materialization_intervals\x18\x03 \x03(\x0b\x32#.feast.core.MaterializationInterval\"w\n\x17MaterializationInterval\x12.\n\nstart_time\x18\x01 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12,\n\x08\x65nd_time\x18\x02 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\"@\n\x0f\x46\x65\x61tureViewList\x12-\n\x0c\x66\x65\x61tureviews\x18\x01 \x03(\x0b\x32\x17.feast.core.FeatureViewBU\n\x10\x66\x65\x61st.proto.coreB\x10\x46\x65\x61tureViewProtoZ/github.com/feast-dev/feast/go/protos/feast/coreb\x06proto3') +DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x1c\x66\x65\x61st/core/FeatureView.proto\x12\nfeast.core\x1a\x1egoogle/protobuf/duration.proto\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1b\x66\x65\x61st/core/DataSource.proto\x1a\x18\x66\x65\x61st/core/Feature.proto\x1a\x1f\x66\x65\x61st/core/Transformation.proto\"c\n\x0b\x46\x65\x61tureView\x12)\n\x04spec\x18\x01 \x01(\x0b\x32\x1b.feast.core.FeatureViewSpec\x12)\n\x04meta\x18\x02 \x01(\x0b\x32\x1b.feast.core.FeatureViewMeta\"\xef\x04\n\x0f\x46\x65\x61tureViewSpec\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0f\n\x07project\x18\x02 \x01(\t\x12\x10\n\x08\x65ntities\x18\x03 \x03(\t\x12+\n\x08\x66\x65\x61tures\x18\x04 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x33\n\x04tags\x18\x05 \x03(\x0b\x32%.feast.core.FeatureViewSpec.TagsEntry\x12&\n\x03ttl\x18\x06 \x01(\x0b\x32\x19.google.protobuf.Duration\x12,\n\x0c\x62\x61tch_source\x18\x07 \x01(\x0b\x32\x16.feast.core.DataSource\x12\x0e\n\x06online\x18\x08 \x01(\x08\x12-\n\rstream_source\x18\t \x01(\x0b\x32\x16.feast.core.DataSource\x12\x13\n\x0b\x64\x65scription\x18\n \x01(\t\x12\r\n\x05owner\x18\x0b \x01(\t\x12\x31\n\x0e\x65ntity_columns\x18\x0c \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x0f\n\x07offline\x18\r 
\x01(\x08\x12\x31\n\x0csource_views\x18\x0e \x03(\x0b\x32\x1b.feast.core.FeatureViewSpec\x12\x43\n\x16\x66\x65\x61ture_transformation\x18\x0f \x01(\x0b\x32#.feast.core.FeatureTransformationV2\x12\x0c\n\x04mode\x18\x10 \x01(\t\x12\x19\n\x11\x65nable_validation\x18\x11 \x01(\x08\x1a+\n\tTagsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\"\xcc\x01\n\x0f\x46\x65\x61tureViewMeta\x12\x35\n\x11\x63reated_timestamp\x18\x01 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12:\n\x16last_updated_timestamp\x18\x02 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12\x46\n\x19materialization_intervals\x18\x03 \x03(\x0b\x32#.feast.core.MaterializationInterval\"w\n\x17MaterializationInterval\x12.\n\nstart_time\x18\x01 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12,\n\x08\x65nd_time\x18\x02 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\"@\n\x0f\x46\x65\x61tureViewList\x12-\n\x0c\x66\x65\x61tureviews\x18\x01 \x03(\x0b\x32\x17.feast.core.FeatureViewBU\n\x10\x66\x65\x61st.proto.coreB\x10\x46\x65\x61tureViewProtoZ/github.com/feast-dev/feast/go/protos/feast/coreb\x06proto3') _globals = globals() _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals) @@ -32,13 +32,13 @@ _globals['_FEATUREVIEW']._serialized_start=197 _globals['_FEATUREVIEW']._serialized_end=296 _globals['_FEATUREVIEWSPEC']._serialized_start=299 - _globals['_FEATUREVIEWSPEC']._serialized_end=895 - _globals['_FEATUREVIEWSPEC_TAGSENTRY']._serialized_start=852 - _globals['_FEATUREVIEWSPEC_TAGSENTRY']._serialized_end=895 - _globals['_FEATUREVIEWMETA']._serialized_start=898 - _globals['_FEATUREVIEWMETA']._serialized_end=1102 - _globals['_MATERIALIZATIONINTERVAL']._serialized_start=1104 - _globals['_MATERIALIZATIONINTERVAL']._serialized_end=1223 - _globals['_FEATUREVIEWLIST']._serialized_start=1225 - _globals['_FEATUREVIEWLIST']._serialized_end=1289 + _globals['_FEATUREVIEWSPEC']._serialized_end=922 + _globals['_FEATUREVIEWSPEC_TAGSENTRY']._serialized_start=879 + 
_globals['_FEATUREVIEWSPEC_TAGSENTRY']._serialized_end=922 + _globals['_FEATUREVIEWMETA']._serialized_start=925 + _globals['_FEATUREVIEWMETA']._serialized_end=1129 + _globals['_MATERIALIZATIONINTERVAL']._serialized_start=1131 + _globals['_MATERIALIZATIONINTERVAL']._serialized_end=1250 + _globals['_FEATUREVIEWLIST']._serialized_start=1252 + _globals['_FEATUREVIEWLIST']._serialized_end=1316 # @@protoc_insertion_point(module_scope) diff --git a/sdk/python/feast/protos/feast/core/FeatureView_pb2.pyi b/sdk/python/feast/protos/feast/core/FeatureView_pb2.pyi index a7115be8459..c5a54394320 100644 --- a/sdk/python/feast/protos/feast/core/FeatureView_pb2.pyi +++ b/sdk/python/feast/protos/feast/core/FeatureView_pb2.pyi @@ -58,7 +58,7 @@ class FeatureView(google.protobuf.message.Message): global___FeatureView = FeatureView class FeatureViewSpec(google.protobuf.message.Message): - """Next available id: 17 + """Next available id: 18 TODO(adchia): refactor common fields from this and ODFV into separate metadata proto """ @@ -95,6 +95,7 @@ class FeatureViewSpec(google.protobuf.message.Message): SOURCE_VIEWS_FIELD_NUMBER: builtins.int FEATURE_TRANSFORMATION_FIELD_NUMBER: builtins.int MODE_FIELD_NUMBER: builtins.int + ENABLE_VALIDATION_FIELD_NUMBER: builtins.int name: builtins.str """Name of the feature view. Must be unique. 
Not updated.""" project: builtins.str @@ -141,6 +142,8 @@ class FeatureViewSpec(google.protobuf.message.Message): """Feature transformation for batch feature views""" mode: builtins.str """The transformation mode (e.g., "python", "pandas", "spark", "sql", "ray")""" + enable_validation: builtins.bool + """Whether schema validation is enabled during materialization""" def __init__( self, *, @@ -160,9 +163,10 @@ class FeatureViewSpec(google.protobuf.message.Message): source_views: collections.abc.Iterable[global___FeatureViewSpec] | None = ..., feature_transformation: feast.core.Transformation_pb2.FeatureTransformationV2 | None = ..., mode: builtins.str = ..., + enable_validation: builtins.bool = ..., ) -> None: ... def HasField(self, field_name: typing_extensions.Literal["batch_source", b"batch_source", "feature_transformation", b"feature_transformation", "stream_source", b"stream_source", "ttl", b"ttl"]) -> builtins.bool: ... - def ClearField(self, field_name: typing_extensions.Literal["batch_source", b"batch_source", "description", b"description", "entities", b"entities", "entity_columns", b"entity_columns", "feature_transformation", b"feature_transformation", "features", b"features", "mode", b"mode", "name", b"name", "offline", b"offline", "online", b"online", "owner", b"owner", "project", b"project", "source_views", b"source_views", "stream_source", b"stream_source", "tags", b"tags", "ttl", b"ttl"]) -> None: ... + def ClearField(self, field_name: typing_extensions.Literal["batch_source", b"batch_source", "description", b"description", "enable_validation", b"enable_validation", "entities", b"entities", "entity_columns", b"entity_columns", "feature_transformation", b"feature_transformation", "features", b"features", "mode", b"mode", "name", b"name", "offline", b"offline", "online", b"online", "owner", b"owner", "project", b"project", "source_views", b"source_views", "stream_source", b"stream_source", "tags", b"tags", "ttl", b"ttl"]) -> None: ... 
global___FeatureViewSpec = FeatureViewSpec
diff --git a/sdk/python/feast/protos/feast/core/Project_pb2.pyi b/sdk/python/feast/protos/feast/core/Project_pb2.pyi
index 3196304a19b..e3cce2ec425 100644
--- a/sdk/python/feast/protos/feast/core/Project_pb2.pyi
+++ b/sdk/python/feast/protos/feast/core/Project_pb2.pyi
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
-* Copyright 2020 The Feast Authors
-*
-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-*
-* https://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
+
+* Copyright 2020 The Feast Authors
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* https://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
 * limitations under the License.
 """
 import builtins
diff --git a/sdk/python/feast/protos/feast/core/Registry_pb2.pyi b/sdk/python/feast/protos/feast/core/Registry_pb2.pyi
index ad09878b77f..fca49c75481 100644
--- a/sdk/python/feast/protos/feast/core/Registry_pb2.pyi
+++ b/sdk/python/feast/protos/feast/core/Registry_pb2.pyi
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
-* Copyright 2020 The Feast Authors
-*
-* Licensed under the Apache License, Version 2.0 (the "License");
-* you may not use this file except in compliance with the License.
-* You may obtain a copy of the License at
-*
-* https://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
+
+* Copyright 2020 The Feast Authors
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* https://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
 * limitations under the License.
""" import builtins diff --git a/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.py b/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.py index f64c2852aa9..cd3ec690574 100644 --- a/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.py +++ b/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.py @@ -21,7 +21,7 @@ from feast.protos.feast.core import Transformation_pb2 as feast_dot_core_dot_Transformation__pb2 -DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\"feast/core/StreamFeatureView.proto\x12\nfeast.core\x1a\x1egoogle/protobuf/duration.proto\x1a$feast/core/OnDemandFeatureView.proto\x1a\x1c\x66\x65\x61st/core/FeatureView.proto\x1a\x18\x66\x65\x61st/core/Feature.proto\x1a\x1b\x66\x65\x61st/core/DataSource.proto\x1a\x1c\x66\x65\x61st/core/Aggregation.proto\x1a\x1f\x66\x65\x61st/core/Transformation.proto\"o\n\x11StreamFeatureView\x12/\n\x04spec\x18\x01 \x01(\x0b\x32!.feast.core.StreamFeatureViewSpec\x12)\n\x04meta\x18\x02 \x01(\x0b\x32\x1b.feast.core.FeatureViewMeta\"\xf3\x05\n\x15StreamFeatureViewSpec\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0f\n\x07project\x18\x02 \x01(\t\x12\x10\n\x08\x65ntities\x18\x03 \x03(\t\x12+\n\x08\x66\x65\x61tures\x18\x04 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x31\n\x0e\x65ntity_columns\x18\x05 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x13\n\x0b\x64\x65scription\x18\x06 \x01(\t\x12\x39\n\x04tags\x18\x07 \x03(\x0b\x32+.feast.core.StreamFeatureViewSpec.TagsEntry\x12\r\n\x05owner\x18\x08 \x01(\t\x12&\n\x03ttl\x18\t \x01(\x0b\x32\x19.google.protobuf.Duration\x12,\n\x0c\x62\x61tch_source\x18\n \x01(\x0b\x32\x16.feast.core.DataSource\x12-\n\rstream_source\x18\x0b \x01(\x0b\x32\x16.feast.core.DataSource\x12\x0e\n\x06online\x18\x0c \x01(\x08\x12\x42\n\x15user_defined_function\x18\r \x01(\x0b\x32\x1f.feast.core.UserDefinedFunctionB\x02\x18\x01\x12\x0c\n\x04mode\x18\x0e \x01(\t\x12-\n\x0c\x61ggregations\x18\x0f \x03(\x0b\x32\x17.feast.core.Aggregation\x12\x17\n\x0ftimestamp_field\x18\x10 
\x01(\t\x12\x43\n\x16\x66\x65\x61ture_transformation\x18\x11 \x01(\x0b\x32#.feast.core.FeatureTransformationV2\x12\x15\n\renable_tiling\x18\x12 \x01(\x08\x12\x32\n\x0ftiling_hop_size\x18\x13 \x01(\x0b\x32\x19.google.protobuf.Duration\x1a+\n\tTagsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\x42[\n\x10\x66\x65\x61st.proto.coreB\x16StreamFeatureViewProtoZ/github.com/feast-dev/feast/go/protos/feast/coreb\x06proto3') +DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\"feast/core/StreamFeatureView.proto\x12\nfeast.core\x1a\x1egoogle/protobuf/duration.proto\x1a$feast/core/OnDemandFeatureView.proto\x1a\x1c\x66\x65\x61st/core/FeatureView.proto\x1a\x18\x66\x65\x61st/core/Feature.proto\x1a\x1b\x66\x65\x61st/core/DataSource.proto\x1a\x1c\x66\x65\x61st/core/Aggregation.proto\x1a\x1f\x66\x65\x61st/core/Transformation.proto\"o\n\x11StreamFeatureView\x12/\n\x04spec\x18\x01 \x01(\x0b\x32!.feast.core.StreamFeatureViewSpec\x12)\n\x04meta\x18\x02 \x01(\x0b\x32\x1b.feast.core.FeatureViewMeta\"\x8e\x06\n\x15StreamFeatureViewSpec\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0f\n\x07project\x18\x02 \x01(\t\x12\x10\n\x08\x65ntities\x18\x03 \x03(\t\x12+\n\x08\x66\x65\x61tures\x18\x04 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x31\n\x0e\x65ntity_columns\x18\x05 \x03(\x0b\x32\x19.feast.core.FeatureSpecV2\x12\x13\n\x0b\x64\x65scription\x18\x06 \x01(\t\x12\x39\n\x04tags\x18\x07 \x03(\x0b\x32+.feast.core.StreamFeatureViewSpec.TagsEntry\x12\r\n\x05owner\x18\x08 \x01(\t\x12&\n\x03ttl\x18\t \x01(\x0b\x32\x19.google.protobuf.Duration\x12,\n\x0c\x62\x61tch_source\x18\n \x01(\x0b\x32\x16.feast.core.DataSource\x12-\n\rstream_source\x18\x0b \x01(\x0b\x32\x16.feast.core.DataSource\x12\x0e\n\x06online\x18\x0c \x01(\x08\x12\x42\n\x15user_defined_function\x18\r \x01(\x0b\x32\x1f.feast.core.UserDefinedFunctionB\x02\x18\x01\x12\x0c\n\x04mode\x18\x0e \x01(\t\x12-\n\x0c\x61ggregations\x18\x0f 
\x03(\x0b\x32\x17.feast.core.Aggregation\x12\x17\n\x0ftimestamp_field\x18\x10 \x01(\t\x12\x43\n\x16\x66\x65\x61ture_transformation\x18\x11 \x01(\x0b\x32#.feast.core.FeatureTransformationV2\x12\x15\n\renable_tiling\x18\x12 \x01(\x08\x12\x32\n\x0ftiling_hop_size\x18\x13 \x01(\x0b\x32\x19.google.protobuf.Duration\x12\x19\n\x11\x65nable_validation\x18\x14 \x01(\x08\x1a+\n\tTagsEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12\r\n\x05value\x18\x02 \x01(\t:\x02\x38\x01\x42[\n\x10\x66\x65\x61st.proto.coreB\x16StreamFeatureViewProtoZ/github.com/feast-dev/feast/go/protos/feast/coreb\x06proto3') _globals = globals() _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals) @@ -36,7 +36,7 @@ _globals['_STREAMFEATUREVIEW']._serialized_start=268 _globals['_STREAMFEATUREVIEW']._serialized_end=379 _globals['_STREAMFEATUREVIEWSPEC']._serialized_start=382 - _globals['_STREAMFEATUREVIEWSPEC']._serialized_end=1137 - _globals['_STREAMFEATUREVIEWSPEC_TAGSENTRY']._serialized_start=1094 - _globals['_STREAMFEATUREVIEWSPEC_TAGSENTRY']._serialized_end=1137 + _globals['_STREAMFEATUREVIEWSPEC']._serialized_end=1164 + _globals['_STREAMFEATUREVIEWSPEC_TAGSENTRY']._serialized_start=1121 + _globals['_STREAMFEATUREVIEWSPEC_TAGSENTRY']._serialized_end=1164 # @@protoc_insertion_point(module_scope) diff --git a/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.pyi b/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.pyi index 160a59b35df..853ada60a27 100644 --- a/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.pyi +++ b/sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.pyi @@ -59,7 +59,7 @@ class StreamFeatureView(google.protobuf.message.Message): global___StreamFeatureView = StreamFeatureView class StreamFeatureViewSpec(google.protobuf.message.Message): - """Next available id: 20""" + """Next available id: 21""" DESCRIPTOR: google.protobuf.descriptor.Descriptor @@ -97,6 +97,7 @@ class StreamFeatureViewSpec(google.protobuf.message.Message): 
FEATURE_TRANSFORMATION_FIELD_NUMBER: builtins.int ENABLE_TILING_FIELD_NUMBER: builtins.int TILING_HOP_SIZE_FIELD_NUMBER: builtins.int + ENABLE_VALIDATION_FIELD_NUMBER: builtins.int name: builtins.str """Name of the feature view. Must be unique. Not updated.""" project: builtins.str @@ -152,6 +153,8 @@ class StreamFeatureViewSpec(google.protobuf.message.Message): """Hop size for tiling (e.g., 5 minutes). Determines the granularity of pre-aggregated tiles. If not specified, defaults to 5 minutes. Only used when enable_tiling is true. """ + enable_validation: builtins.bool + """Whether schema validation is enabled during materialization""" def __init__( self, *, @@ -174,8 +177,9 @@ class StreamFeatureViewSpec(google.protobuf.message.Message): feature_transformation: feast.core.Transformation_pb2.FeatureTransformationV2 | None = ..., enable_tiling: builtins.bool = ..., tiling_hop_size: google.protobuf.duration_pb2.Duration | None = ..., + enable_validation: builtins.bool = ..., ) -> None: ... def HasField(self, field_name: typing_extensions.Literal["batch_source", b"batch_source", "feature_transformation", b"feature_transformation", "stream_source", b"stream_source", "tiling_hop_size", b"tiling_hop_size", "ttl", b"ttl", "user_defined_function", b"user_defined_function"]) -> builtins.bool: ... - def ClearField(self, field_name: typing_extensions.Literal["aggregations", b"aggregations", "batch_source", b"batch_source", "description", b"description", "enable_tiling", b"enable_tiling", "entities", b"entities", "entity_columns", b"entity_columns", "feature_transformation", b"feature_transformation", "features", b"features", "mode", b"mode", "name", b"name", "online", b"online", "owner", b"owner", "project", b"project", "stream_source", b"stream_source", "tags", b"tags", "tiling_hop_size", b"tiling_hop_size", "timestamp_field", b"timestamp_field", "ttl", b"ttl", "user_defined_function", b"user_defined_function"]) -> None: ... 
+ def ClearField(self, field_name: typing_extensions.Literal["aggregations", b"aggregations", "batch_source", b"batch_source", "description", b"description", "enable_tiling", b"enable_tiling", "enable_validation", b"enable_validation", "entities", b"entities", "entity_columns", b"entity_columns", "feature_transformation", b"feature_transformation", "features", b"features", "mode", b"mode", "name", b"name", "online", b"online", "owner", b"owner", "project", b"project", "stream_source", b"stream_source", "tags", b"tags", "tiling_hop_size", b"tiling_hop_size", "timestamp_field", b"timestamp_field", "ttl", b"ttl", "user_defined_function", b"user_defined_function"]) -> None: ... global___StreamFeatureViewSpec = StreamFeatureViewSpec diff --git a/sdk/python/feast/protos/feast/types/Value_pb2.py b/sdk/python/feast/protos/feast/types/Value_pb2.py index 2ab1d2cc8fb..5edd8c5bde9 100644 --- a/sdk/python/feast/protos/feast/types/Value_pb2.py +++ b/sdk/python/feast/protos/feast/types/Value_pb2.py @@ -14,7 +14,7 @@ -DESCRIPTOR = 
_descriptor_pool.Default().AddSerializedFile(b'\n\x17\x66\x65\x61st/types/Value.proto\x12\x0b\x66\x65\x61st.types\"\xb0\x03\n\tValueType\"\xa2\x03\n\x04\x45num\x12\x0b\n\x07INVALID\x10\x00\x12\t\n\x05\x42YTES\x10\x01\x12\n\n\x06STRING\x10\x02\x12\t\n\x05INT32\x10\x03\x12\t\n\x05INT64\x10\x04\x12\n\n\x06\x44OUBLE\x10\x05\x12\t\n\x05\x46LOAT\x10\x06\x12\x08\n\x04\x42OOL\x10\x07\x12\x12\n\x0eUNIX_TIMESTAMP\x10\x08\x12\x0e\n\nBYTES_LIST\x10\x0b\x12\x0f\n\x0bSTRING_LIST\x10\x0c\x12\x0e\n\nINT32_LIST\x10\r\x12\x0e\n\nINT64_LIST\x10\x0e\x12\x0f\n\x0b\x44OUBLE_LIST\x10\x0f\x12\x0e\n\nFLOAT_LIST\x10\x10\x12\r\n\tBOOL_LIST\x10\x11\x12\x17\n\x13UNIX_TIMESTAMP_LIST\x10\x12\x12\x08\n\x04NULL\x10\x13\x12\x07\n\x03MAP\x10\x14\x12\x0c\n\x08MAP_LIST\x10\x15\x12\r\n\tBYTES_SET\x10\x16\x12\x0e\n\nSTRING_SET\x10\x17\x12\r\n\tINT32_SET\x10\x18\x12\r\n\tINT64_SET\x10\x19\x12\x0e\n\nDOUBLE_SET\x10\x1a\x12\r\n\tFLOAT_SET\x10\x1b\x12\x0c\n\x08\x42OOL_SET\x10\x1c\x12\x16\n\x12UNIX_TIMESTAMP_SET\x10\x1d\"\xe0\x08\n\x05Value\x12\x13\n\tbytes_val\x18\x01 \x01(\x0cH\x00\x12\x14\n\nstring_val\x18\x02 \x01(\tH\x00\x12\x13\n\tint32_val\x18\x03 \x01(\x05H\x00\x12\x13\n\tint64_val\x18\x04 \x01(\x03H\x00\x12\x14\n\ndouble_val\x18\x05 \x01(\x01H\x00\x12\x13\n\tfloat_val\x18\x06 \x01(\x02H\x00\x12\x12\n\x08\x62ool_val\x18\x07 \x01(\x08H\x00\x12\x1c\n\x12unix_timestamp_val\x18\x08 \x01(\x03H\x00\x12\x30\n\x0e\x62ytes_list_val\x18\x0b \x01(\x0b\x32\x16.feast.types.BytesListH\x00\x12\x32\n\x0fstring_list_val\x18\x0c \x01(\x0b\x32\x17.feast.types.StringListH\x00\x12\x30\n\x0eint32_list_val\x18\r \x01(\x0b\x32\x16.feast.types.Int32ListH\x00\x12\x30\n\x0eint64_list_val\x18\x0e \x01(\x0b\x32\x16.feast.types.Int64ListH\x00\x12\x32\n\x0f\x64ouble_list_val\x18\x0f \x01(\x0b\x32\x17.feast.types.DoubleListH\x00\x12\x30\n\x0e\x66loat_list_val\x18\x10 \x01(\x0b\x32\x16.feast.types.FloatListH\x00\x12.\n\rbool_list_val\x18\x11 \x01(\x0b\x32\x15.feast.types.BoolListH\x00\x12\x39\n\x17unix_timestamp_list_val\x18\x12 
\x01(\x0b\x32\x16.feast.types.Int64ListH\x00\x12%\n\x08null_val\x18\x13 \x01(\x0e\x32\x11.feast.types.NullH\x00\x12#\n\x07map_val\x18\x14 \x01(\x0b\x32\x10.feast.types.MapH\x00\x12,\n\x0cmap_list_val\x18\x15 \x01(\x0b\x32\x14.feast.types.MapListH\x00\x12.\n\rbytes_set_val\x18\x16 \x01(\x0b\x32\x15.feast.types.BytesSetH\x00\x12\x30\n\x0estring_set_val\x18\x17 \x01(\x0b\x32\x16.feast.types.StringSetH\x00\x12.\n\rint32_set_val\x18\x18 \x01(\x0b\x32\x15.feast.types.Int32SetH\x00\x12.\n\rint64_set_val\x18\x19 \x01(\x0b\x32\x15.feast.types.Int64SetH\x00\x12\x30\n\x0e\x64ouble_set_val\x18\x1a \x01(\x0b\x32\x16.feast.types.DoubleSetH\x00\x12.\n\rfloat_set_val\x18\x1b \x01(\x0b\x32\x15.feast.types.FloatSetH\x00\x12,\n\x0c\x62ool_set_val\x18\x1c \x01(\x0b\x32\x14.feast.types.BoolSetH\x00\x12\x37\n\x16unix_timestamp_set_val\x18\x1d \x01(\x0b\x32\x15.feast.types.Int64SetH\x00\x42\x05\n\x03val\"\x18\n\tBytesList\x12\x0b\n\x03val\x18\x01 \x03(\x0c\"\x19\n\nStringList\x12\x0b\n\x03val\x18\x01 \x03(\t\"\x18\n\tInt32List\x12\x0b\n\x03val\x18\x01 \x03(\x05\"\x18\n\tInt64List\x12\x0b\n\x03val\x18\x01 \x03(\x03\"\x19\n\nDoubleList\x12\x0b\n\x03val\x18\x01 \x03(\x01\"\x18\n\tFloatList\x12\x0b\n\x03val\x18\x01 \x03(\x02\"\x17\n\x08\x42oolList\x12\x0b\n\x03val\x18\x01 \x03(\x08\"\x17\n\x08\x42ytesSet\x12\x0b\n\x03val\x18\x01 \x03(\x0c\"\x18\n\tStringSet\x12\x0b\n\x03val\x18\x01 \x03(\t\"\x17\n\x08Int32Set\x12\x0b\n\x03val\x18\x01 \x03(\x05\"\x17\n\x08Int64Set\x12\x0b\n\x03val\x18\x01 \x03(\x03\"\x18\n\tDoubleSet\x12\x0b\n\x03val\x18\x01 \x03(\x01\"\x17\n\x08\x46loatSet\x12\x0b\n\x03val\x18\x01 \x03(\x02\"\x16\n\x07\x42oolSet\x12\x0b\n\x03val\x18\x01 \x03(\x08\"m\n\x03Map\x12&\n\x03val\x18\x01 \x03(\x0b\x32\x19.feast.types.Map.ValEntry\x1a>\n\x08ValEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12!\n\x05value\x18\x02 \x01(\x0b\x32\x12.feast.types.Value:\x02\x38\x01\"(\n\x07MapList\x12\x1d\n\x03val\x18\x01 \x03(\x0b\x32\x10.feast.types.Map\"0\n\rRepeatedValue\x12\x1f\n\x03val\x18\x01 
\x03(\x0b\x32\x12.feast.types.Value*\x10\n\x04Null\x12\x08\n\x04NULL\x10\x00\x42Q\n\x11\x66\x65\x61st.proto.typesB\nValueProtoZ0github.com/feast-dev/feast/go/protos/feast/typesb\x06proto3') +DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x17\x66\x65\x61st/types/Value.proto\x12\x0b\x66\x65\x61st.types\"\xe6\x03\n\tValueType\"\xd8\x03\n\x04\x45num\x12\x0b\n\x07INVALID\x10\x00\x12\t\n\x05\x42YTES\x10\x01\x12\n\n\x06STRING\x10\x02\x12\t\n\x05INT32\x10\x03\x12\t\n\x05INT64\x10\x04\x12\n\n\x06\x44OUBLE\x10\x05\x12\t\n\x05\x46LOAT\x10\x06\x12\x08\n\x04\x42OOL\x10\x07\x12\x12\n\x0eUNIX_TIMESTAMP\x10\x08\x12\x0e\n\nBYTES_LIST\x10\x0b\x12\x0f\n\x0bSTRING_LIST\x10\x0c\x12\x0e\n\nINT32_LIST\x10\r\x12\x0e\n\nINT64_LIST\x10\x0e\x12\x0f\n\x0b\x44OUBLE_LIST\x10\x0f\x12\x0e\n\nFLOAT_LIST\x10\x10\x12\r\n\tBOOL_LIST\x10\x11\x12\x17\n\x13UNIX_TIMESTAMP_LIST\x10\x12\x12\x08\n\x04NULL\x10\x13\x12\x07\n\x03MAP\x10\x14\x12\x0c\n\x08MAP_LIST\x10\x15\x12\r\n\tBYTES_SET\x10\x16\x12\x0e\n\nSTRING_SET\x10\x17\x12\r\n\tINT32_SET\x10\x18\x12\r\n\tINT64_SET\x10\x19\x12\x0e\n\nDOUBLE_SET\x10\x1a\x12\r\n\tFLOAT_SET\x10\x1b\x12\x0c\n\x08\x42OOL_SET\x10\x1c\x12\x16\n\x12UNIX_TIMESTAMP_SET\x10\x1d\x12\x08\n\x04JSON\x10 \x12\r\n\tJSON_LIST\x10!\x12\n\n\x06STRUCT\x10\"\x12\x0f\n\x0bSTRUCT_LIST\x10#\"\xff\t\n\x05Value\x12\x13\n\tbytes_val\x18\x01 \x01(\x0cH\x00\x12\x14\n\nstring_val\x18\x02 \x01(\tH\x00\x12\x13\n\tint32_val\x18\x03 \x01(\x05H\x00\x12\x13\n\tint64_val\x18\x04 \x01(\x03H\x00\x12\x14\n\ndouble_val\x18\x05 \x01(\x01H\x00\x12\x13\n\tfloat_val\x18\x06 \x01(\x02H\x00\x12\x12\n\x08\x62ool_val\x18\x07 \x01(\x08H\x00\x12\x1c\n\x12unix_timestamp_val\x18\x08 \x01(\x03H\x00\x12\x30\n\x0e\x62ytes_list_val\x18\x0b \x01(\x0b\x32\x16.feast.types.BytesListH\x00\x12\x32\n\x0fstring_list_val\x18\x0c \x01(\x0b\x32\x17.feast.types.StringListH\x00\x12\x30\n\x0eint32_list_val\x18\r \x01(\x0b\x32\x16.feast.types.Int32ListH\x00\x12\x30\n\x0eint64_list_val\x18\x0e 
\x01(\x0b\x32\x16.feast.types.Int64ListH\x00\x12\x32\n\x0f\x64ouble_list_val\x18\x0f \x01(\x0b\x32\x17.feast.types.DoubleListH\x00\x12\x30\n\x0e\x66loat_list_val\x18\x10 \x01(\x0b\x32\x16.feast.types.FloatListH\x00\x12.\n\rbool_list_val\x18\x11 \x01(\x0b\x32\x15.feast.types.BoolListH\x00\x12\x39\n\x17unix_timestamp_list_val\x18\x12 \x01(\x0b\x32\x16.feast.types.Int64ListH\x00\x12%\n\x08null_val\x18\x13 \x01(\x0e\x32\x11.feast.types.NullH\x00\x12#\n\x07map_val\x18\x14 \x01(\x0b\x32\x10.feast.types.MapH\x00\x12,\n\x0cmap_list_val\x18\x15 \x01(\x0b\x32\x14.feast.types.MapListH\x00\x12.\n\rbytes_set_val\x18\x16 \x01(\x0b\x32\x15.feast.types.BytesSetH\x00\x12\x30\n\x0estring_set_val\x18\x17 \x01(\x0b\x32\x16.feast.types.StringSetH\x00\x12.\n\rint32_set_val\x18\x18 \x01(\x0b\x32\x15.feast.types.Int32SetH\x00\x12.\n\rint64_set_val\x18\x19 \x01(\x0b\x32\x15.feast.types.Int64SetH\x00\x12\x30\n\x0e\x64ouble_set_val\x18\x1a \x01(\x0b\x32\x16.feast.types.DoubleSetH\x00\x12.\n\rfloat_set_val\x18\x1b \x01(\x0b\x32\x15.feast.types.FloatSetH\x00\x12,\n\x0c\x62ool_set_val\x18\x1c \x01(\x0b\x32\x14.feast.types.BoolSetH\x00\x12\x37\n\x16unix_timestamp_set_val\x18\x1d \x01(\x0b\x32\x15.feast.types.Int64SetH\x00\x12\x12\n\x08json_val\x18 \x01(\tH\x00\x12\x30\n\rjson_list_val\x18! 
\x01(\x0b\x32\x17.feast.types.StringListH\x00\x12&\n\nstruct_val\x18\" \x01(\x0b\x32\x10.feast.types.MapH\x00\x12/\n\x0fstruct_list_val\x18# \x01(\x0b\x32\x14.feast.types.MapListH\x00\x42\x05\n\x03val\"\x18\n\tBytesList\x12\x0b\n\x03val\x18\x01 \x03(\x0c\"\x19\n\nStringList\x12\x0b\n\x03val\x18\x01 \x03(\t\"\x18\n\tInt32List\x12\x0b\n\x03val\x18\x01 \x03(\x05\"\x18\n\tInt64List\x12\x0b\n\x03val\x18\x01 \x03(\x03\"\x19\n\nDoubleList\x12\x0b\n\x03val\x18\x01 \x03(\x01\"\x18\n\tFloatList\x12\x0b\n\x03val\x18\x01 \x03(\x02\"\x17\n\x08\x42oolList\x12\x0b\n\x03val\x18\x01 \x03(\x08\"\x17\n\x08\x42ytesSet\x12\x0b\n\x03val\x18\x01 \x03(\x0c\"\x18\n\tStringSet\x12\x0b\n\x03val\x18\x01 \x03(\t\"\x17\n\x08Int32Set\x12\x0b\n\x03val\x18\x01 \x03(\x05\"\x17\n\x08Int64Set\x12\x0b\n\x03val\x18\x01 \x03(\x03\"\x18\n\tDoubleSet\x12\x0b\n\x03val\x18\x01 \x03(\x01\"\x17\n\x08\x46loatSet\x12\x0b\n\x03val\x18\x01 \x03(\x02\"\x16\n\x07\x42oolSet\x12\x0b\n\x03val\x18\x01 \x03(\x08\"m\n\x03Map\x12&\n\x03val\x18\x01 \x03(\x0b\x32\x19.feast.types.Map.ValEntry\x1a>\n\x08ValEntry\x12\x0b\n\x03key\x18\x01 \x01(\t\x12!\n\x05value\x18\x02 \x01(\x0b\x32\x12.feast.types.Value:\x02\x38\x01\"(\n\x07MapList\x12\x1d\n\x03val\x18\x01 \x03(\x0b\x32\x10.feast.types.Map\"0\n\rRepeatedValue\x12\x1f\n\x03val\x18\x01 \x03(\x0b\x32\x12.feast.types.Value*\x10\n\x04Null\x12\x08\n\x04NULL\x10\x00\x42Q\n\x11\x66\x65\x61st.proto.typesB\nValueProtoZ0github.com/feast-dev/feast/go/protos/feast/typesb\x06proto3') _globals = globals() _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals) @@ -24,48 +24,48 @@ _globals['DESCRIPTOR']._serialized_options = b'\n\021feast.proto.typesB\nValueProtoZ0github.com/feast-dev/feast/go/protos/feast/types' _globals['_MAP_VALENTRY']._options = None _globals['_MAP_VALENTRY']._serialized_options = b'8\001' - _globals['_NULL']._serialized_start=2160 - _globals['_NULL']._serialized_end=2176 + _globals['_NULL']._serialized_start=2373 + _globals['_NULL']._serialized_end=2389 
_globals['_VALUETYPE']._serialized_start=41 - _globals['_VALUETYPE']._serialized_end=473 + _globals['_VALUETYPE']._serialized_end=527 _globals['_VALUETYPE_ENUM']._serialized_start=55 - _globals['_VALUETYPE_ENUM']._serialized_end=473 - _globals['_VALUE']._serialized_start=476 - _globals['_VALUE']._serialized_end=1596 - _globals['_BYTESLIST']._serialized_start=1598 - _globals['_BYTESLIST']._serialized_end=1622 - _globals['_STRINGLIST']._serialized_start=1624 - _globals['_STRINGLIST']._serialized_end=1649 - _globals['_INT32LIST']._serialized_start=1651 - _globals['_INT32LIST']._serialized_end=1675 - _globals['_INT64LIST']._serialized_start=1677 - _globals['_INT64LIST']._serialized_end=1701 - _globals['_DOUBLELIST']._serialized_start=1703 - _globals['_DOUBLELIST']._serialized_end=1728 - _globals['_FLOATLIST']._serialized_start=1730 - _globals['_FLOATLIST']._serialized_end=1754 - _globals['_BOOLLIST']._serialized_start=1756 - _globals['_BOOLLIST']._serialized_end=1779 - _globals['_BYTESSET']._serialized_start=1781 - _globals['_BYTESSET']._serialized_end=1804 - _globals['_STRINGSET']._serialized_start=1806 - _globals['_STRINGSET']._serialized_end=1830 - _globals['_INT32SET']._serialized_start=1832 - _globals['_INT32SET']._serialized_end=1855 - _globals['_INT64SET']._serialized_start=1857 - _globals['_INT64SET']._serialized_end=1880 - _globals['_DOUBLESET']._serialized_start=1882 - _globals['_DOUBLESET']._serialized_end=1906 - _globals['_FLOATSET']._serialized_start=1908 - _globals['_FLOATSET']._serialized_end=1931 - _globals['_BOOLSET']._serialized_start=1933 - _globals['_BOOLSET']._serialized_end=1955 - _globals['_MAP']._serialized_start=1957 - _globals['_MAP']._serialized_end=2066 - _globals['_MAP_VALENTRY']._serialized_start=2004 - _globals['_MAP_VALENTRY']._serialized_end=2066 - _globals['_MAPLIST']._serialized_start=2068 - _globals['_MAPLIST']._serialized_end=2108 - _globals['_REPEATEDVALUE']._serialized_start=2110 - _globals['_REPEATEDVALUE']._serialized_end=2158 + 
_globals['_VALUETYPE_ENUM']._serialized_end=527 + _globals['_VALUE']._serialized_start=530 + _globals['_VALUE']._serialized_end=1809 + _globals['_BYTESLIST']._serialized_start=1811 + _globals['_BYTESLIST']._serialized_end=1835 + _globals['_STRINGLIST']._serialized_start=1837 + _globals['_STRINGLIST']._serialized_end=1862 + _globals['_INT32LIST']._serialized_start=1864 + _globals['_INT32LIST']._serialized_end=1888 + _globals['_INT64LIST']._serialized_start=1890 + _globals['_INT64LIST']._serialized_end=1914 + _globals['_DOUBLELIST']._serialized_start=1916 + _globals['_DOUBLELIST']._serialized_end=1941 + _globals['_FLOATLIST']._serialized_start=1943 + _globals['_FLOATLIST']._serialized_end=1967 + _globals['_BOOLLIST']._serialized_start=1969 + _globals['_BOOLLIST']._serialized_end=1992 + _globals['_BYTESSET']._serialized_start=1994 + _globals['_BYTESSET']._serialized_end=2017 + _globals['_STRINGSET']._serialized_start=2019 + _globals['_STRINGSET']._serialized_end=2043 + _globals['_INT32SET']._serialized_start=2045 + _globals['_INT32SET']._serialized_end=2068 + _globals['_INT64SET']._serialized_start=2070 + _globals['_INT64SET']._serialized_end=2093 + _globals['_DOUBLESET']._serialized_start=2095 + _globals['_DOUBLESET']._serialized_end=2119 + _globals['_FLOATSET']._serialized_start=2121 + _globals['_FLOATSET']._serialized_end=2144 + _globals['_BOOLSET']._serialized_start=2146 + _globals['_BOOLSET']._serialized_end=2168 + _globals['_MAP']._serialized_start=2170 + _globals['_MAP']._serialized_end=2279 + _globals['_MAP_VALENTRY']._serialized_start=2217 + _globals['_MAP_VALENTRY']._serialized_end=2279 + _globals['_MAPLIST']._serialized_start=2281 + _globals['_MAPLIST']._serialized_end=2321 + _globals['_REPEATEDVALUE']._serialized_start=2323 + _globals['_REPEATEDVALUE']._serialized_end=2371 # @@protoc_insertion_point(module_scope) diff --git a/sdk/python/feast/protos/feast/types/Value_pb2.pyi b/sdk/python/feast/protos/feast/types/Value_pb2.pyi index 0e10849ebad..64079291f4d 
100644 --- a/sdk/python/feast/protos/feast/types/Value_pb2.pyi +++ b/sdk/python/feast/protos/feast/types/Value_pb2.pyi @@ -82,6 +82,10 @@ class ValueType(google.protobuf.message.Message): FLOAT_SET: ValueType._Enum.ValueType # 27 BOOL_SET: ValueType._Enum.ValueType # 28 UNIX_TIMESTAMP_SET: ValueType._Enum.ValueType # 29 + JSON: ValueType._Enum.ValueType # 32 + JSON_LIST: ValueType._Enum.ValueType # 33 + STRUCT: ValueType._Enum.ValueType # 34 + STRUCT_LIST: ValueType._Enum.ValueType # 35 class Enum(_Enum, metaclass=_EnumEnumTypeWrapper): ... INVALID: ValueType.Enum.ValueType # 0 @@ -112,6 +116,10 @@ class ValueType(google.protobuf.message.Message): FLOAT_SET: ValueType.Enum.ValueType # 27 BOOL_SET: ValueType.Enum.ValueType # 28 UNIX_TIMESTAMP_SET: ValueType.Enum.ValueType # 29 + JSON: ValueType.Enum.ValueType # 32 + JSON_LIST: ValueType.Enum.ValueType # 33 + STRUCT: ValueType.Enum.ValueType # 34 + STRUCT_LIST: ValueType.Enum.ValueType # 35 def __init__( self, @@ -149,6 +157,10 @@ class Value(google.protobuf.message.Message): FLOAT_SET_VAL_FIELD_NUMBER: builtins.int BOOL_SET_VAL_FIELD_NUMBER: builtins.int UNIX_TIMESTAMP_SET_VAL_FIELD_NUMBER: builtins.int + JSON_VAL_FIELD_NUMBER: builtins.int + JSON_LIST_VAL_FIELD_NUMBER: builtins.int + STRUCT_VAL_FIELD_NUMBER: builtins.int + STRUCT_LIST_VAL_FIELD_NUMBER: builtins.int bytes_val: builtins.bytes string_val: builtins.str int32_val: builtins.int @@ -194,6 +206,13 @@ class Value(google.protobuf.message.Message): def bool_set_val(self) -> global___BoolSet: ... @property def unix_timestamp_set_val(self) -> global___Int64Set: ... + json_val: builtins.str + @property + def json_list_val(self) -> global___StringList: ... + @property + def struct_val(self) -> global___Map: ... + @property + def struct_list_val(self) -> global___MapList: ... 
def __init__( self, *, @@ -224,10 +243,14 @@ class Value(google.protobuf.message.Message): float_set_val: global___FloatSet | None = ..., bool_set_val: global___BoolSet | None = ..., unix_timestamp_set_val: global___Int64Set | None = ..., + json_val: builtins.str = ..., + json_list_val: global___StringList | None = ..., + struct_val: global___Map | None = ..., + struct_list_val: global___MapList | None = ..., ) -> None: ... - def HasField(self, field_name: typing_extensions.Literal["bool_list_val", b"bool_list_val", "bool_set_val", b"bool_set_val", "bool_val", b"bool_val", "bytes_list_val", b"bytes_list_val", "bytes_set_val", b"bytes_set_val", "bytes_val", b"bytes_val", "double_list_val", b"double_list_val", "double_set_val", b"double_set_val", "double_val", b"double_val", "float_list_val", b"float_list_val", "float_set_val", b"float_set_val", "float_val", b"float_val", "int32_list_val", b"int32_list_val", "int32_set_val", b"int32_set_val", "int32_val", b"int32_val", "int64_list_val", b"int64_list_val", "int64_set_val", b"int64_set_val", "int64_val", b"int64_val", "map_list_val", b"map_list_val", "map_val", b"map_val", "null_val", b"null_val", "string_list_val", b"string_list_val", "string_set_val", b"string_set_val", "string_val", b"string_val", "unix_timestamp_list_val", b"unix_timestamp_list_val", "unix_timestamp_set_val", b"unix_timestamp_set_val", "unix_timestamp_val", b"unix_timestamp_val", "val", b"val"]) -> builtins.bool: ... 
- def ClearField(self, field_name: typing_extensions.Literal["bool_list_val", b"bool_list_val", "bool_set_val", b"bool_set_val", "bool_val", b"bool_val", "bytes_list_val", b"bytes_list_val", "bytes_set_val", b"bytes_set_val", "bytes_val", b"bytes_val", "double_list_val", b"double_list_val", "double_set_val", b"double_set_val", "double_val", b"double_val", "float_list_val", b"float_list_val", "float_set_val", b"float_set_val", "float_val", b"float_val", "int32_list_val", b"int32_list_val", "int32_set_val", b"int32_set_val", "int32_val", b"int32_val", "int64_list_val", b"int64_list_val", "int64_set_val", b"int64_set_val", "int64_val", b"int64_val", "map_list_val", b"map_list_val", "map_val", b"map_val", "null_val", b"null_val", "string_list_val", b"string_list_val", "string_set_val", b"string_set_val", "string_val", b"string_val", "unix_timestamp_list_val", b"unix_timestamp_list_val", "unix_timestamp_set_val", b"unix_timestamp_set_val", "unix_timestamp_val", b"unix_timestamp_val", "val", b"val"]) -> None: ... - def WhichOneof(self, oneof_group: typing_extensions.Literal["val", b"val"]) -> typing_extensions.Literal["bytes_val", "string_val", "int32_val", "int64_val", "double_val", "float_val", "bool_val", "unix_timestamp_val", "bytes_list_val", "string_list_val", "int32_list_val", "int64_list_val", "double_list_val", "float_list_val", "bool_list_val", "unix_timestamp_list_val", "null_val", "map_val", "map_list_val", "bytes_set_val", "string_set_val", "int32_set_val", "int64_set_val", "double_set_val", "float_set_val", "bool_set_val", "unix_timestamp_set_val"] | None: ... 
+ def HasField(self, field_name: typing_extensions.Literal["bool_list_val", b"bool_list_val", "bool_set_val", b"bool_set_val", "bool_val", b"bool_val", "bytes_list_val", b"bytes_list_val", "bytes_set_val", b"bytes_set_val", "bytes_val", b"bytes_val", "double_list_val", b"double_list_val", "double_set_val", b"double_set_val", "double_val", b"double_val", "float_list_val", b"float_list_val", "float_set_val", b"float_set_val", "float_val", b"float_val", "int32_list_val", b"int32_list_val", "int32_set_val", b"int32_set_val", "int32_val", b"int32_val", "int64_list_val", b"int64_list_val", "int64_set_val", b"int64_set_val", "int64_val", b"int64_val", "json_list_val", b"json_list_val", "json_val", b"json_val", "map_list_val", b"map_list_val", "map_val", b"map_val", "null_val", b"null_val", "string_list_val", b"string_list_val", "string_set_val", b"string_set_val", "string_val", b"string_val", "struct_list_val", b"struct_list_val", "struct_val", b"struct_val", "unix_timestamp_list_val", b"unix_timestamp_list_val", "unix_timestamp_set_val", b"unix_timestamp_set_val", "unix_timestamp_val", b"unix_timestamp_val", "val", b"val"]) -> builtins.bool: ... 
+ def ClearField(self, field_name: typing_extensions.Literal["bool_list_val", b"bool_list_val", "bool_set_val", b"bool_set_val", "bool_val", b"bool_val", "bytes_list_val", b"bytes_list_val", "bytes_set_val", b"bytes_set_val", "bytes_val", b"bytes_val", "double_list_val", b"double_list_val", "double_set_val", b"double_set_val", "double_val", b"double_val", "float_list_val", b"float_list_val", "float_set_val", b"float_set_val", "float_val", b"float_val", "int32_list_val", b"int32_list_val", "int32_set_val", b"int32_set_val", "int32_val", b"int32_val", "int64_list_val", b"int64_list_val", "int64_set_val", b"int64_set_val", "int64_val", b"int64_val", "json_list_val", b"json_list_val", "json_val", b"json_val", "map_list_val", b"map_list_val", "map_val", b"map_val", "null_val", b"null_val", "string_list_val", b"string_list_val", "string_set_val", b"string_set_val", "string_val", b"string_val", "struct_list_val", b"struct_list_val", "struct_val", b"struct_val", "unix_timestamp_list_val", b"unix_timestamp_list_val", "unix_timestamp_set_val", b"unix_timestamp_set_val", "unix_timestamp_val", b"unix_timestamp_val", "val", b"val"]) -> None: ... + def WhichOneof(self, oneof_group: typing_extensions.Literal["val", b"val"]) -> typing_extensions.Literal["bytes_val", "string_val", "int32_val", "int64_val", "double_val", "float_val", "bool_val", "unix_timestamp_val", "bytes_list_val", "string_list_val", "int32_list_val", "int64_list_val", "double_list_val", "float_list_val", "bool_list_val", "unix_timestamp_list_val", "null_val", "map_val", "map_list_val", "bytes_set_val", "string_set_val", "int32_set_val", "int64_set_val", "double_set_val", "float_set_val", "bool_set_val", "unix_timestamp_set_val", "json_val", "json_list_val", "struct_val", "struct_list_val"] | None: ... 
global___Value = Value diff --git a/sdk/python/feast/stream_feature_view.py b/sdk/python/feast/stream_feature_view.py index 2a6ae6cef78..b8f410f9a48 100644 --- a/sdk/python/feast/stream_feature_view.py +++ b/sdk/python/feast/stream_feature_view.py @@ -121,6 +121,7 @@ def __init__( stream_engine: Optional[Dict[str, Any]] = None, enable_tiling: bool = False, tiling_hop_size: Optional[timedelta] = None, + enable_validation: bool = False, ): if not flags_helper.is_test(): warnings.warn( @@ -184,6 +185,7 @@ def __init__( source=source, # type: ignore[arg-type] mode=mode, sink_source=sink_source, + enable_validation=enable_validation, ) def get_feature_transformation(self) -> Optional[Transformation]: @@ -279,6 +281,7 @@ def to_proto(self): mode=mode_to_string(self.mode), enable_tiling=self.enable_tiling, tiling_hop_size=tiling_hop_size_duration, + enable_validation=self.enable_validation, ) return StreamFeatureViewProto(spec=spec, meta=meta) @@ -340,6 +343,7 @@ def from_proto(cls, sfv_proto): and sfv_proto.spec.tiling_hop_size.ToNanoseconds() != 0 else None ), + enable_validation=sfv_proto.spec.enable_validation, ) if batch_source: @@ -393,6 +397,7 @@ def __copy__(self): udf=self.udf, udf_string=self.udf_string, feature_transformation=self.feature_transformation, + enable_validation=self.enable_validation, ) fv.entities = self.entities fv.features = copy.copy(self.features) @@ -418,6 +423,7 @@ def stream_feature_view( aggregations: Optional[List[Aggregation]] = None, mode: Optional[str] = "spark", timestamp_field: Optional[str] = "", + enable_validation: bool = False, ): """ Creates an StreamFeatureView object with the given user function as udf. 
@@ -449,6 +455,7 @@ def decorator(user_function): aggregations=aggregations, mode=mode, timestamp_field=timestamp_field, + enable_validation=enable_validation, ) functools.update_wrapper(wrapper=stream_feature_view_obj, wrapped=user_function) return stream_feature_view_obj diff --git a/sdk/python/feast/templates/local/feature_repo/feature_definitions.py b/sdk/python/feast/templates/local/feature_repo/feature_definitions.py index e2fd0a891cf..6fe94a5fa59 100644 --- a/sdk/python/feast/templates/local/feature_repo/feature_definitions.py +++ b/sdk/python/feast/templates/local/feature_repo/feature_definitions.py @@ -17,7 +17,7 @@ from feast.feature_logging import LoggingConfig from feast.infra.offline_stores.file_source import FileLoggingDestination from feast.on_demand_feature_view import on_demand_feature_view -from feast.types import Float32, Float64, Int64 +from feast.types import Float32, Float64, Int64, Json, Map, String, Struct # Define a project for the feature repo project = Project(name="%PROJECT_NAME%", description="A project for driver statistics") @@ -52,12 +52,26 @@ Field(name="conv_rate", dtype=Float32), Field(name="acc_rate", dtype=Float32), Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"), + Field( + name="driver_metadata", + dtype=Map, + description="Driver metadata as key-value pairs", + ), + Field( + name="driver_config", dtype=Json, description="Driver configuration as JSON" + ), + Field( + name="driver_profile", + dtype=Struct({"name": String, "age": String}), + description="Driver profile as a typed struct", + ), ], online=True, source=driver_stats_source, # Tags are user defined key/value pairs that are attached to each # feature view tags={"team": "driver_performance"}, + enable_validation=True, ) # Define a request data source which encodes features / information only @@ -119,6 +133,9 @@ def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame: Field(name="conv_rate", dtype=Float32), Field(name="acc_rate", 
dtype=Float32), Field(name="avg_daily_trips", dtype=Int64), + Field(name="driver_metadata", dtype=Map), + Field(name="driver_config", dtype=Json), + Field(name="driver_profile", dtype=Struct({"name": String, "age": String})), ], online=True, source=driver_stats_push_source, # Changed from above diff --git a/sdk/python/feast/templates/local/feature_repo/test_workflow.py b/sdk/python/feast/templates/local/feature_repo/test_workflow.py index eebeb113115..82175972321 100644 --- a/sdk/python/feast/templates/local/feature_repo/test_workflow.py +++ b/sdk/python/feast/templates/local/feature_repo/test_workflow.py @@ -1,3 +1,4 @@ +import json import subprocess from datetime import datetime @@ -45,6 +46,11 @@ def run_demo(): "conv_rate": [1.0], "acc_rate": [1.0], "avg_daily_trips": [1000], + "driver_metadata": [{"vehicle_type": "truck", "rating": "5.0"}], + "driver_config": [ + json.dumps({"max_distance_km": 500, "preferred_zones": ["north"]}) + ], + "driver_profile": [{"name": "driver_1001_updated", "age": "30"}], } ) print(event_df) @@ -115,6 +121,9 @@ def fetch_online_features(store, source: str = ""): else: features_to_fetch = [ "driver_hourly_stats:acc_rate", + "driver_hourly_stats:driver_metadata", + "driver_hourly_stats:driver_config", + "driver_hourly_stats:driver_profile", "transformed_conv_rate:conv_rate_plus_val1", "transformed_conv_rate:conv_rate_plus_val2", ] diff --git a/sdk/python/feast/type_map.py b/sdk/python/feast/type_map.py index cebec80d208..60d22a48ea9 100644 --- a/sdk/python/feast/type_map.py +++ b/sdk/python/feast/type_map.py @@ -83,6 +83,30 @@ def feast_value_type_to_python_type(field_value_proto: ProtoValue) -> Any: return None val = getattr(field_value_proto, val_attr) + # Handle JSON types — stored as strings but returned as parsed Python objects + if val_attr == "json_val": + try: + return json.loads(val) + except (json.JSONDecodeError, TypeError): + return val + elif val_attr == "json_list_val": + result = [] + for v in val.val: + if 
isinstance(v, str): + try: + result.append(json.loads(v)) + except (json.JSONDecodeError, TypeError): + result.append(v) + else: + result.append(v) + return result + + # Handle Struct types — stored using Map proto, returned as dicts + if val_attr == "struct_val": + return _handle_map_value(val) + elif val_attr == "struct_list_val": + return _handle_map_list_value(val) + # Handle Map and MapList types FIRST (before generic list processing) if val_attr == "map_val": return _handle_map_value(val) @@ -162,7 +186,7 @@ def feast_value_type_to_pandas_type(value_type: ValueType) -> Any: ValueType.UNIX_TIMESTAMP: "datetime64[ns]", } if ( - value_type.name == "MAP" + value_type.name in ("MAP", "JSON", "STRUCT") or value_type.name.endswith("_LIST") or value_type.name.endswith("_SET") ): @@ -375,6 +399,12 @@ def _convert_value_type_str_to_value_type(type_str: str) -> ValueType: "FLOAT_LIST": ValueType.FLOAT_LIST, "BOOL_LIST": ValueType.BOOL_LIST, "UNIX_TIMESTAMP_LIST": ValueType.UNIX_TIMESTAMP_LIST, + "MAP": ValueType.MAP, + "MAP_LIST": ValueType.MAP_LIST, + "JSON": ValueType.JSON, + "JSON_LIST": ValueType.JSON_LIST, + "STRUCT": ValueType.STRUCT, + "STRUCT_LIST": ValueType.STRUCT_LIST, } return type_map.get(type_str, ValueType.STRING) @@ -803,20 +833,108 @@ def _python_value_to_proto_value( """ # Handle Map types if feast_value_type == ValueType.MAP: - return [ - ProtoValue(map_val=_python_dict_to_map_proto(value)) - if value is not None - else ProtoValue() - for value in values - ] + result = [] + for value in values: + if value is None: + result.append(ProtoValue()) + else: + if isinstance(value, str): + value = json.loads(value) + if not isinstance(value, dict): + raise TypeError( + f"Expected dict for MAP type, got {type(value).__name__}: {value!r}" + ) + result.append(ProtoValue(map_val=_python_dict_to_map_proto(value))) + return result if feast_value_type == ValueType.MAP_LIST: - return [ - ProtoValue(map_list_val=_python_list_to_map_list_proto(value)) - if value is not 
None - else ProtoValue() - for value in values - ] + result = [] + for value in values: + if value is None: + result.append(ProtoValue()) + else: + if isinstance(value, str): + value = json.loads(value) + if not isinstance(value, list): + raise TypeError( + f"Expected list for MAP_LIST type, got {type(value).__name__}: {value!r}" + ) + result.append( + ProtoValue(map_list_val=_python_list_to_map_list_proto(value)) + ) + return result + + # Handle JSON type — serialize Python objects as JSON strings + if feast_value_type == ValueType.JSON: + result = [] + for value in values: + if value is None: + result.append(ProtoValue()) + else: + if isinstance(value, str): + try: + json.loads(value) + except (json.JSONDecodeError, TypeError) as e: + raise ValueError( + f"Invalid JSON string for JSON type: {e}" + ) from e + json_str = value + else: + json_str = json.dumps(value) + result.append(ProtoValue(json_val=json_str)) + return result + + if feast_value_type == ValueType.JSON_LIST: + result = [] + for value in values: + if value is None: + result.append(ProtoValue()) + else: + json_strings = [] + for v in value: + if isinstance(v, str): + try: + json.loads(v) + except (json.JSONDecodeError, TypeError) as e: + raise ValueError( + f"Invalid JSON string in JSON_LIST: {e}" + ) from e + json_strings.append(v) + else: + json_strings.append(json.dumps(v)) + result.append(ProtoValue(json_list_val=StringList(val=json_strings))) + return result + + # Handle Struct type — reuses Map proto for storage + if feast_value_type == ValueType.STRUCT: + result = [] + for value in values: + if value is None: + result.append(ProtoValue()) + else: + if isinstance(value, str): + value = json.loads(value) + if not isinstance(value, dict): + value = ( + dict(value) + if hasattr(value, "items") + else {"_value": str(value)} + ) + result.append(ProtoValue(struct_val=_python_dict_to_map_proto(value))) + return result + + if feast_value_type == ValueType.STRUCT_LIST: + result = [] + for value in 
values: + if value is None: + result.append(ProtoValue()) + else: + if isinstance(value, str): + value = json.loads(value) + result.append( + ProtoValue(struct_list_val=_python_list_to_map_list_proto(value)) + ) + return result # Get sample for type checking sample = next(filter(_non_empty_value, values), None) @@ -928,6 +1046,10 @@ def python_values_to_proto_values( "unix_timestamp_list_val": ValueType.UNIX_TIMESTAMP_LIST, "map_val": ValueType.MAP, "map_list_val": ValueType.MAP_LIST, + "json_val": ValueType.JSON, + "json_list_val": ValueType.JSON_LIST, + "struct_val": ValueType.STRUCT, + "struct_list_val": ValueType.STRUCT_LIST, "int32_set_val": ValueType.INT32_SET, "int64_set_val": ValueType.INT64_SET, "double_set_val": ValueType.DOUBLE_SET, @@ -967,6 +1089,12 @@ def pa_to_feast_value_type(pa_type_as_str: str) -> ValueType: if pa_type_as_str.startswith("timestamp"): value_type = ValueType.UNIX_TIMESTAMP + elif pa_type_as_str.startswith("map<"): + value_type = ValueType.MAP + elif pa_type_as_str == "large_string": + value_type = ValueType.STRING + elif pa_type_as_str.startswith("struct<") or pa_type_as_str.startswith("struct{"): + value_type = ValueType.STRUCT else: type_map = { "int32": ValueType.INT32, @@ -1012,6 +1140,9 @@ def bq_to_feast_value_type(bq_type_as_str: str) -> ValueType: "BOOL": ValueType.BOOL, "BOOLEAN": ValueType.BOOL, # legacy sql data type "NULL": ValueType.NULL, + "JSON": ValueType.JSON, + "STRUCT": ValueType.STRUCT, + "RECORD": ValueType.STRUCT, } value_type = type_map.get(bq_type_as_str, ValueType.STRING) @@ -1036,6 +1167,7 @@ def mssql_to_feast_value_type(mssql_type_as_str: str) -> ValueType: "nchar": ValueType.STRING, "nvarchar": ValueType.STRING, "nvarchar(max)": ValueType.STRING, + "json": ValueType.JSON, "real": ValueType.FLOAT, "smallint": ValueType.INT32, "tinyint": ValueType.INT32, @@ -1065,6 +1197,13 @@ def pa_to_mssql_type(pa_type: "pyarrow.DataType") -> str: if pa_type_as_str.startswith("decimal"): return pa_type_as_str + if 
pa_type_as_str.startswith("map<"): + return "nvarchar(max)" + if pa_type_as_str == "large_string": + return "nvarchar(max)" + if pa_type_as_str.startswith("struct<") or pa_type_as_str.startswith("struct{"): + return "nvarchar(max)" + # We have to take into account how arrow types map to parquet types as well. # For example, null type maps to int32 in parquet, so we have to use int4 in Redshift. # Other mappings have also been adjusted accordingly. @@ -1105,7 +1244,8 @@ def redshift_to_feast_value_type(redshift_type_as_str: str) -> ValueType: "varchar": ValueType.STRING, "timestamp": ValueType.UNIX_TIMESTAMP, "timestamptz": ValueType.UNIX_TIMESTAMP, - "super": ValueType.BYTES, + "super": ValueType.MAP, + "json": ValueType.JSON, # skip date, geometry, hllsketch, time, timetz } @@ -1126,6 +1266,10 @@ def snowflake_type_to_feast_value_type(snowflake_type: str) -> ValueType: "TIMESTAMP_TZ": ValueType.UNIX_TIMESTAMP, "TIMESTAMP_LTZ": ValueType.UNIX_TIMESTAMP, "TIMESTAMP_NTZ": ValueType.UNIX_TIMESTAMP, + "VARIANT": ValueType.MAP, + "OBJECT": ValueType.MAP, + "ARRAY": ValueType.STRING_LIST, + "JSON": ValueType.JSON, } return type_map[snowflake_type] @@ -1172,6 +1316,15 @@ def pa_to_redshift_value_type(pa_type: "pyarrow.DataType") -> str: if pa_type_as_str.startswith("list"): return "super" + if pa_type_as_str.startswith("map<"): + return "super" + + if pa_type_as_str == "large_string": + return "super" + + if pa_type_as_str.startswith("struct<"): + return "super" + # We have to take into account how arrow types map to parquet types as well. # For example, null type maps to int32 in parquet, so we have to use int4 in Redshift. # Other mappings have also been adjusted accordingly. 
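The backend mappings above all dispatch on the string form of the Arrow type (`map<...>`, `large_string`, `struct<...>`). A standalone sketch of that dispatch, using a hypothetical `pa_to_redshift` helper that mirrors the prefix checks added to `pa_to_redshift_value_type` (the scalar fallbacks here are illustrative, not Feast's full table):

```python
def pa_to_redshift(pa_type_as_str: str) -> str:
    # Complex Arrow types all land in Redshift's semi-structured SUPER type,
    # mirroring the prefix checks added in the diff above.
    if pa_type_as_str.startswith("list"):
        return "super"
    if pa_type_as_str.startswith("map<"):
        return "super"
    if pa_type_as_str == "large_string":  # Feast's JSON storage type
        return "super"
    if pa_type_as_str.startswith("struct<"):
        return "super"
    # Illustrative scalar fallbacks only.
    scalar_map = {"int64": "int8", "double": "float8", "string": "varchar(max)"}
    return scalar_map.get(pa_type_as_str, "varchar(max)")

assert pa_to_redshift("map<string, string>") == "super"
assert pa_to_redshift("large_string") == "super"
assert pa_to_redshift("int64") == "int8"
```

String-prefix dispatch is used throughout `type_map.py` because parameterized Arrow types (maps, structs, lists) cannot be matched by a flat dict lookup the way scalars can.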
@@ -1208,8 +1361,7 @@ def _non_empty_value(value: Any) -> bool: def spark_to_feast_value_type(spark_type_as_str: str) -> ValueType: - # TODO not all spark types are convertible - # Current non-convertible types: interval, map, struct, structfield, binary + # Current non-convertible types: interval, struct, structfield, binary type_map: Dict[str, ValueType] = { "null": ValueType.UNKNOWN, "byte": ValueType.BYTES, @@ -1235,14 +1387,24 @@ def spark_to_feast_value_type(spark_type_as_str: str) -> ValueType: "array": ValueType.UNIX_TIMESTAMP_LIST, "array": ValueType.UNIX_TIMESTAMP_LIST, } - if spark_type_as_str.startswith("decimal"): - spark_type_as_str = "decimal" - if spark_type_as_str.startswith("array Iterator[np.dtype]: @@ -1269,6 +1431,12 @@ def arrow_to_pg_type(t_str: str) -> str: try: if t_str.startswith("timestamp") or t_str.startswith("datetime"): return "timestamptz" if "tz=" in t_str else "timestamp" + if t_str.startswith("map<"): + return "jsonb" + if t_str == "large_string": + return "jsonb" + if t_str.startswith("struct<") or t_str.startswith("struct{"): + return "jsonb" return { "null": "null", "bool": "boolean", @@ -1329,6 +1497,10 @@ def pg_type_to_feast_value_type(type_str: str) -> ValueType: "numeric": ValueType.DOUBLE, "uuid": ValueType.STRING, "uuid[]": ValueType.STRING_LIST, + "json": ValueType.MAP, + "jsonb": ValueType.MAP, + "json[]": ValueType.MAP_LIST, + "jsonb[]": ValueType.MAP_LIST, } value = ( type_map[type_str.lower()] @@ -1362,6 +1534,14 @@ def feast_value_type_to_pa( ValueType.BYTES_LIST: pyarrow.list_(pyarrow.binary()), ValueType.BOOL_LIST: pyarrow.list_(pyarrow.bool_()), ValueType.UNIX_TIMESTAMP_LIST: pyarrow.list_(pyarrow.timestamp(timestamp_unit)), + ValueType.MAP: pyarrow.map_(pyarrow.string(), pyarrow.string()), + ValueType.MAP_LIST: pyarrow.list_( + pyarrow.map_(pyarrow.string(), pyarrow.string()) + ), + ValueType.JSON: pyarrow.large_string(), + ValueType.JSON_LIST: pyarrow.list_(pyarrow.large_string()), + ValueType.STRUCT: 
pyarrow.struct([]), + ValueType.STRUCT_LIST: pyarrow.list_(pyarrow.struct([])), ValueType.NULL: pyarrow.null(), } return type_map[feast_type] @@ -1442,7 +1622,9 @@ def athena_to_feast_value_type(athena_type_as_str: str) -> ValueType: "varchar": ValueType.STRING, "string": ValueType.STRING, "timestamp": ValueType.UNIX_TIMESTAMP, - # skip date,decimal,array,map,struct + "json": ValueType.JSON, + "struct": ValueType.STRUCT, + "map": ValueType.MAP, } return type_map[athena_type_as_str.lower()] @@ -1460,6 +1642,18 @@ def pa_to_athena_value_type(pa_type: "pyarrow.DataType") -> str: if pa_type_as_str.startswith("decimal"): return pa_type_as_str + if pa_type_as_str.startswith("list"): + return "array" + + if pa_type_as_str.startswith("map<"): + return "string" + + if pa_type_as_str == "large_string": + return "string" + + if pa_type_as_str.startswith("struct<"): + return "string" + # We have to take into account how arrow types map to parquet types as well. # For example, null type maps to int32 in parquet, so we have to use int4 in Redshift. # Other mappings have also been adjusted accordingly. 
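The JSON handling added to `_python_value_to_proto_value` earlier in this diff reduces to one rule: strings are validated as JSON and stored verbatim, everything else is serialized with `json.dumps`, and the read path parses back with a raw-string fallback. A self-contained sketch of that rule (function names are illustrative, not Feast's API):

```python
import json

def to_json_val(value):
    # Write path: None -> missing value, strings must already be valid JSON
    # (stored verbatim), any other object is serialized.
    if value is None:
        return None
    if isinstance(value, str):
        try:
            json.loads(value)  # validate only; keep the original string
        except (json.JSONDecodeError, TypeError) as e:
            raise ValueError(f"Invalid JSON string for JSON type: {e}") from e
        return value
    return json.dumps(value)

def from_json_val(stored):
    # Read path: parse back to a Python object, falling back to the raw string.
    if stored is None:
        return None
    try:
        return json.loads(stored)
    except (json.JSONDecodeError, TypeError):
        return stored

assert from_json_val(to_json_val({"max_distance_km": 500})) == {"max_distance_km": 500}
assert to_json_val('{"a": 1}') == '{"a": 1}'
```

Validating strings on the write path is what lets backends store the column as a native JSON type (`jsonb`, `VARIANT`) without risking ingestion of malformed payloads.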
@@ -1529,6 +1723,8 @@ def convert_scalar_column( return series.astype("string") elif value_type == ValueType.UNIX_TIMESTAMP: return pd.to_datetime(series, unit="s", errors="coerce") + elif value_type in (ValueType.JSON, ValueType.STRUCT, ValueType.MAP): + return series else: return series.astype(target_pandas_type) diff --git a/sdk/python/feast/types.py b/sdk/python/feast/types.py index 922b3cce0ac..d94c356cd1a 100644 --- a/sdk/python/feast/types.py +++ b/sdk/python/feast/types.py @@ -33,6 +33,7 @@ "BOOL": "BOOL", "UNIX_TIMESTAMP": "UNIX_TIMESTAMP", "MAP": "MAP", + "JSON": "JSON", } @@ -85,6 +86,7 @@ class PrimitiveFeastType(Enum): PDF_BYTES = 9 IMAGE_BYTES = 10 MAP = 11 + JSON = 12 def to_value_type(self) -> ValueType: """ @@ -118,6 +120,7 @@ def __hash__(self): Float64 = PrimitiveFeastType.FLOAT64 UnixTimestamp = PrimitiveFeastType.UNIX_TIMESTAMP Map = PrimitiveFeastType.MAP +Json = PrimitiveFeastType.JSON SUPPORTED_BASE_TYPES = [ Invalid, @@ -132,6 +135,7 @@ def __hash__(self): Float64, UnixTimestamp, Map, + Json, ] PRIMITIVE_FEAST_TYPES_TO_STRING = { @@ -147,6 +151,7 @@ def __hash__(self): "FLOAT64": "Float64", "UNIX_TIMESTAMP": "UnixTimestamp", "MAP": "Map", + "JSON": "Json", } @@ -160,8 +165,9 @@ class Array(ComplexFeastType): base_type: Union[PrimitiveFeastType, ComplexFeastType] - def __init__(self, base_type: Union[PrimitiveFeastType, ComplexFeastType]): - if base_type not in SUPPORTED_BASE_TYPES: + def __init__(self, base_type: Union[PrimitiveFeastType, "ComplexFeastType"]): + # Allow Struct as a base type for Array(Struct(...)) + if not isinstance(base_type, Struct) and base_type not in SUPPORTED_BASE_TYPES: raise ValueError( f"Type {type(base_type)} is currently not supported as a base type for Array." 
) @@ -169,6 +175,8 @@ def __init__(self, base_type: Union[PrimitiveFeastType, ComplexFeastType]): self.base_type = base_type def to_value_type(self) -> ValueType: + if isinstance(self.base_type, Struct): + return ValueType.STRUCT_LIST assert isinstance(self.base_type, PrimitiveFeastType) value_type_name = PRIMITIVE_FEAST_TYPES_TO_VALUE_TYPES[self.base_type.name] value_type_list_name = value_type_name + "_LIST" @@ -208,6 +216,53 @@ def __str__(self): return f"Set({self.base_type})" +class Struct(ComplexFeastType): + """ + A Struct represents a structured type with named, typed fields. + + Attributes: + fields: A dictionary mapping field names to their FeastTypes. + """ + + fields: Dict[str, Union[PrimitiveFeastType, "ComplexFeastType"]] + + def __init__( + self, fields: Dict[str, Union[PrimitiveFeastType, "ComplexFeastType"]] + ): + if not fields: + raise ValueError("Struct must have at least one field.") + self.fields = fields + + def to_value_type(self) -> ValueType: + return ValueType.STRUCT + + def to_pyarrow_type(self) -> pyarrow.DataType: + pa_fields = [] + for name, feast_type in self.fields.items(): + pa_type = from_feast_to_pyarrow_type(feast_type) + pa_fields.append(pyarrow.field(name, pa_type)) + return pyarrow.struct(pa_fields) + + def __str__(self): + field_strs = ", ".join( + f"{name}: {ftype}" for name, ftype in self.fields.items() + ) + return f"Struct({{{field_strs}}})" + + def __eq__(self, other): + if isinstance(other, Struct): + return self.fields == other.fields + return False + + def __hash__(self): + return hash( + ( + "Struct", + tuple((k, hash(v)) for k, v in sorted(self.fields.items())), + ) + ) + + FeastType = Union[ComplexFeastType, PrimitiveFeastType] VALUE_TYPES_TO_FEAST_TYPES: Dict["ValueType", FeastType] = { @@ -232,6 +287,8 @@ def __str__(self): ValueType.UNIX_TIMESTAMP_LIST: Array(UnixTimestamp), ValueType.MAP: Map, ValueType.MAP_LIST: Array(Map), + ValueType.JSON: Json, + ValueType.JSON_LIST: Array(Json), ValueType.BYTES_SET: 
Set(Bytes), ValueType.STRING_SET: Set(String), ValueType.INT32_SET: Set(Int32), @@ -251,6 +308,8 @@ def __str__(self): Float64: pyarrow.float64(), # Note: datetime only supports microseconds https://github.com/python/cpython/blob/3.8/Lib/datetime.py#L1559 UnixTimestamp: pyarrow.timestamp("us", tz=_utc_now().tzname()), + Map: pyarrow.map_(pyarrow.string(), pyarrow.string()), + Json: pyarrow.large_string(), } FEAST_VECTOR_TYPES: List[Union[ValueType, PrimitiveFeastType, ComplexFeastType]] = [ @@ -279,12 +338,25 @@ def from_feast_to_pyarrow_type(feast_type: FeastType) -> pyarrow.DataType: assert isinstance(feast_type, (ComplexFeastType, PrimitiveFeastType)), ( f"Expected FeastType, got {type(feast_type)}" ) + if isinstance(feast_type, Struct): + return feast_type.to_pyarrow_type() if isinstance(feast_type, PrimitiveFeastType): if feast_type in FEAST_TYPES_TO_PYARROW_TYPES: return FEAST_TYPES_TO_PYARROW_TYPES[feast_type] - elif isinstance(feast_type, ComplexFeastType): - # Handle the case when feast_type is an instance of ComplexFeastType - pass + elif isinstance(feast_type, Array): + base_type = feast_type.base_type + if isinstance(base_type, Struct): + return pyarrow.list_(base_type.to_pyarrow_type()) + if isinstance(base_type, PrimitiveFeastType): + if base_type == Map: + return pyarrow.list_(pyarrow.map_(pyarrow.string(), pyarrow.string())) + if base_type in FEAST_TYPES_TO_PYARROW_TYPES: + return pyarrow.list_(FEAST_TYPES_TO_PYARROW_TYPES[base_type]) + elif isinstance(feast_type, Set): + base_type = feast_type.base_type + if isinstance(base_type, PrimitiveFeastType): + if base_type in FEAST_TYPES_TO_PYARROW_TYPES: + return pyarrow.list_(FEAST_TYPES_TO_PYARROW_TYPES[base_type]) raise ValueError(f"Could not convert Feast type {feast_type} to PyArrow type.") @@ -304,6 +376,14 @@ def from_value_type( if value_type in VALUE_TYPES_TO_FEAST_TYPES: return VALUE_TYPES_TO_FEAST_TYPES[value_type] + # Struct types cannot be looked up from the dict because they require + # 
field definitions. Return a default placeholder Struct that can be + # enriched later from Field tags / schema metadata. + if value_type == ValueType.STRUCT: + return Struct({"_value": String}) + if value_type == ValueType.STRUCT_LIST: + return Array(Struct({"_value": String})) + raise ValueError(f"Could not convert value type {value_type} to FeastType.") @@ -322,6 +402,12 @@ def from_feast_type( Raises: ValueError: The conversion could not be performed. """ + # Handle Struct types directly since they are not in the dict + if isinstance(feast_type, Struct): + return ValueType.STRUCT + if isinstance(feast_type, Array) and isinstance(feast_type.base_type, Struct): + return ValueType.STRUCT_LIST + if feast_type in VALUE_TYPES_TO_FEAST_TYPES.values(): return list(VALUE_TYPES_TO_FEAST_TYPES.keys())[ list(VALUE_TYPES_TO_FEAST_TYPES.values()).index(feast_type) diff --git a/sdk/python/feast/value_type.py b/sdk/python/feast/value_type.py index bdd47952dc6..d05691199b4 100644 --- a/sdk/python/feast/value_type.py +++ b/sdk/python/feast/value_type.py @@ -67,6 +67,10 @@ class ValueType(enum.Enum): UNIX_TIMESTAMP_SET = 29 PDF_BYTES = 30 IMAGE_BYTES = 31 + JSON = 32 + JSON_LIST = 33 + STRUCT = 34 + STRUCT_LIST = 35 ListType = Union[ diff --git a/sdk/python/tests/unit/test_type_map.py b/sdk/python/tests/unit/test_type_map.py index 8508b490d78..8125ab61b90 100644 --- a/sdk/python/tests/unit/test_type_map.py +++ b/sdk/python/tests/unit/test_type_map.py @@ -1,15 +1,27 @@ import numpy as np import pandas as pd +import pyarrow import pytest from feast.protos.feast.types.Value_pb2 import Map, MapList from feast.type_map import ( + _convert_value_type_str_to_value_type, _python_dict_to_map_proto, _python_list_to_map_list_proto, + arrow_to_pg_type, + feast_value_type_to_pa, feast_value_type_to_python_type, + pa_to_feast_value_type, + pa_to_redshift_value_type, + pg_type_to_feast_value_type, python_type_to_feast_value_type, python_values_to_proto_values, + redshift_to_feast_value_type, + 
snowflake_type_to_feast_value_type, + spark_to_feast_value_type, ) +from feast.types import Array, from_feast_to_pyarrow_type +from feast.types import Map as FeastMap from feast.value_type import ValueType @@ -461,3 +473,945 @@ def test_multiple_set_values(self): assert feast_value_type_to_python_type(protos[0]) == {1, 2, 3} assert feast_value_type_to_python_type(protos[1]) == {4, 5} assert feast_value_type_to_python_type(protos[2]) == {6} + + +class TestMapArrowTypeSupport: + """Test cases for MAP and MAP_LIST Arrow type conversions.""" + + def test_feast_value_type_to_pa_map(self): + """Test that ValueType.MAP converts to a PyArrow map type.""" + pa_type = feast_value_type_to_pa(ValueType.MAP) + assert isinstance(pa_type, pyarrow.MapType) + assert pa_type.key_type == pyarrow.string() + + def test_feast_value_type_to_pa_map_list(self): + """Test that ValueType.MAP_LIST converts to a PyArrow list of maps.""" + pa_type = feast_value_type_to_pa(ValueType.MAP_LIST) + assert isinstance(pa_type, pyarrow.ListType) + assert isinstance(pa_type.value_type, pyarrow.MapType) + + def test_pa_to_feast_value_type_map(self): + """Test that PyArrow map type string converts to ValueType.MAP.""" + result = pa_to_feast_value_type("map<string, string>") + assert result == ValueType.MAP + + def test_pa_to_feast_value_type_map_various_value_types(self): + """Test that various PyArrow map type strings all convert to MAP.""" + assert pa_to_feast_value_type("map<string, string>") == ValueType.MAP + assert pa_to_feast_value_type("map<string, int64>") == ValueType.MAP + assert pa_to_feast_value_type("map<string, double>") == ValueType.MAP + + def test_from_feast_to_pyarrow_type_map(self): + """Test that Feast Map type converts to PyArrow map type.""" + pa_type = from_feast_to_pyarrow_type(FeastMap) + assert isinstance(pa_type, pyarrow.MapType) + + def test_from_feast_to_pyarrow_type_array_map(self): + """Test that Feast Array(Map) converts to PyArrow list of maps.""" + pa_type = from_feast_to_pyarrow_type(Array(FeastMap)) + assert isinstance(pa_type, 
pyarrow.ListType) + assert isinstance(pa_type.value_type, pyarrow.MapType) + + def test_convert_value_type_str_map(self): + """Test that 'MAP' string converts to ValueType.MAP.""" + assert _convert_value_type_str_to_value_type("MAP") == ValueType.MAP + + def test_convert_value_type_str_map_list(self): + """Test that 'MAP_LIST' string converts to ValueType.MAP_LIST.""" + assert _convert_value_type_str_to_value_type("MAP_LIST") == ValueType.MAP_LIST + + def test_arrow_to_pg_type_map(self): + """Test that Arrow map type converts to Postgres jsonb.""" + assert arrow_to_pg_type("map<string, string>") == "jsonb" + assert arrow_to_pg_type("map<string, int64>") == "jsonb" + + def test_pg_type_to_feast_value_type_json(self): + """Test that Postgres json/jsonb types convert to ValueType.MAP.""" + assert pg_type_to_feast_value_type("json") == ValueType.MAP + assert pg_type_to_feast_value_type("jsonb") == ValueType.MAP + + def test_pg_type_to_feast_value_type_json_array(self): + """Test that Postgres json[]/jsonb[] types convert to ValueType.MAP_LIST.""" + assert pg_type_to_feast_value_type("json[]") == ValueType.MAP_LIST + assert pg_type_to_feast_value_type("jsonb[]") == ValueType.MAP_LIST + + def test_snowflake_variant_to_map(self): + """Test that Snowflake VARIANT/OBJECT types convert to ValueType.MAP.""" + assert snowflake_type_to_feast_value_type("VARIANT") == ValueType.MAP + assert snowflake_type_to_feast_value_type("OBJECT") == ValueType.MAP + + def test_redshift_super_to_map(self): + """Test that Redshift super type converts to ValueType.MAP.""" + assert redshift_to_feast_value_type("super") == ValueType.MAP + + def test_map_roundtrip_proto_to_arrow_type(self): + """Test that MAP type survives a full conversion roundtrip.""" + pa_type = feast_value_type_to_pa(ValueType.MAP) + pa_type_str = str(pa_type) + roundtrip = pa_to_feast_value_type(pa_type_str) + assert roundtrip == ValueType.MAP + + def test_spark_map_to_feast(self): + """Test that Spark map types convert to ValueType.MAP.""" + assert 
spark_to_feast_value_type("map<string,string>") == ValueType.MAP + assert spark_to_feast_value_type("map<string,int>") == ValueType.MAP + assert spark_to_feast_value_type("MAP") == ValueType.MAP + + def test_spark_array_map_to_feast(self): + """Test that Spark array<map<...>> types convert to ValueType.MAP_LIST.""" + assert ( + spark_to_feast_value_type("array<map<string,string>>") == ValueType.MAP_LIST + ) + + def test_spark_unknown_still_returns_null(self): + """Test that unrecognized Spark types still return NULL.""" + assert spark_to_feast_value_type("interval") == ValueType.NULL + + def test_spark_struct_to_feast_struct(self): + """Test that Spark struct types now convert to ValueType.STRUCT.""" + assert spark_to_feast_value_type("struct<name:string,age:int>") == ValueType.STRUCT + + +class TestEnableValidationOnFeatureView: + """Test that enable_validation is a real parameter on FeatureView.""" + + def test_feature_view_has_enable_validation_default_false(self): + """Test that FeatureView has enable_validation defaulting to False.""" + import inspect + + from feast.feature_view import FeatureView + + sig = inspect.signature(FeatureView.__init__) + assert "enable_validation" in sig.parameters + assert sig.parameters["enable_validation"].default is False + + def test_batch_feature_view_has_enable_validation(self): + """Test that BatchFeatureView has enable_validation parameter.""" + import inspect + + from feast.batch_feature_view import BatchFeatureView + + sig = inspect.signature(BatchFeatureView.__init__) + assert "enable_validation" in sig.parameters + assert sig.parameters["enable_validation"].default is False + + def test_stream_feature_view_has_enable_validation(self): + """Test that StreamFeatureView has enable_validation parameter.""" + import inspect + + from feast.stream_feature_view import StreamFeatureView + + sig = inspect.signature(StreamFeatureView.__init__) + assert "enable_validation" in sig.parameters + assert sig.parameters["enable_validation"].default is False + + +class TestRedshiftDynamoDBMapSupport: + """Test 
cases for DynamoDB + Redshift map type round-trips.""" + + def test_pa_to_redshift_value_type_map(self): + """Test that Arrow map type maps to Redshift 'super' type.""" + pa_type = feast_value_type_to_pa(ValueType.MAP) + assert pa_to_redshift_value_type(pa_type) == "super" + + def test_pa_to_redshift_value_type_map_list(self): + """Test that Arrow list-of-map type maps to Redshift 'super' type.""" + pa_type = feast_value_type_to_pa(ValueType.MAP_LIST) + assert pa_to_redshift_value_type(pa_type) == "super" + + def test_json_string_to_map_proto(self): + """Test that JSON strings are parsed to MAP protos during materialization.""" + json_str = '{"key1": "value1", "key2": "value2"}' + protos = python_values_to_proto_values([json_str], ValueType.MAP) + converted = feast_value_type_to_python_type(protos[0]) + assert isinstance(converted, dict) + assert converted["key1"] == "value1" + assert converted["key2"] == "value2" + + def test_json_string_to_map_list_proto(self): + """Test that JSON strings are parsed to MAP_LIST protos during materialization.""" + json_str = '[{"a": "1"}, {"b": "2"}]' + protos = python_values_to_proto_values([json_str], ValueType.MAP_LIST) + converted = feast_value_type_to_python_type(protos[0]) + assert isinstance(converted, list) + assert len(converted) == 2 + assert converted[0]["a"] == "1" + + def test_dict_still_works_for_map(self): + """Test that regular Python dicts still work for MAP (no regression).""" + test_dict = {"x": "y", "a": 1} + protos = python_values_to_proto_values([test_dict], ValueType.MAP) + converted = feast_value_type_to_python_type(protos[0]) + assert isinstance(converted, dict) + assert converted["x"] == "y" + + def test_none_map_still_works(self): + """Test that None MAP values still produce empty proto (no regression).""" + protos = python_values_to_proto_values([None], ValueType.MAP) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_redshift_super_roundtrip(self): + 
"""Test full type conversion roundtrip: Redshift super → Feast MAP → Arrow → Redshift super.""" + feast_type = redshift_to_feast_value_type("super") + assert feast_type == ValueType.MAP + pa_type = feast_value_type_to_pa(feast_type) + redshift_type = pa_to_redshift_value_type(pa_type) + assert redshift_type == "super" + + +class TestJsonTypeSupport: + """Test cases for JSON value type.""" + + def test_simple_json_conversion(self): + """Test basic JSON type conversion: Python dict -> proto (json_val) -> Python.""" + test_data = {"name": "Alice", "age": 30, "active": True} + protos = python_values_to_proto_values([test_data], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, dict) + assert converted["name"] == "Alice" + assert converted["age"] == 30 + assert converted["active"] is True + + def test_json_string_passthrough(self): + """Test that a raw JSON string is stored and returned correctly.""" + json_str = '{"key": "value", "count": 42}' + protos = python_values_to_proto_values([json_str], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, dict) + assert converted["key"] == "value" + assert converted["count"] == 42 + + def test_json_array_value(self): + """Test JSON type with an array as the top-level value.""" + test_data = [1, 2, 3, "four"] + protos = python_values_to_proto_values([test_data], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, list) + assert converted == [1, 2, 3, "four"] + + def test_json_nested(self): + """Test deeply nested JSON structures.""" + test_data = { + "level1": {"level2": {"level3": {"value": "deep"}}}, + "array": [{"a": 1}, {"b": 2}], + } + protos = python_values_to_proto_values([test_data], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + + assert converted["level1"]["level2"]["level3"]["value"] == "deep" + assert converted["array"][0]["a"] 
== 1 + + def test_null_json(self): + """Test None JSON conversion.""" + protos = python_values_to_proto_values([None], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_json_list_conversion(self): + """Test JSON_LIST type conversion.""" + test_data = [ + {"name": "Alice"}, + '{"name": "Bob"}', + {"count": 42}, + ] + protos = python_values_to_proto_values([test_data], ValueType.JSON_LIST) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, list) + assert len(converted) == 3 + assert converted[0] == {"name": "Alice"} + assert converted[1] == {"name": "Bob"} + assert converted[2] == {"count": 42} + + def test_null_json_list(self): + """Test None JSON_LIST conversion.""" + protos = python_values_to_proto_values([None], ValueType.JSON_LIST) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_multiple_json_values(self): + """Test conversion of multiple JSON values.""" + test_values = [ + {"x": 1}, + {"y": 2}, + None, + {"z": 3}, + ] + protos = python_values_to_proto_values(test_values, ValueType.JSON) + converted = [feast_value_type_to_python_type(p) for p in protos] + + assert converted[0] == {"x": 1} + assert converted[1] == {"y": 2} + assert converted[2] is None + assert converted[3] == {"z": 3} + + def test_feast_value_type_to_pa_json(self): + """Test that ValueType.JSON converts to PyArrow large_string.""" + pa_type = feast_value_type_to_pa(ValueType.JSON) + assert pa_type == pyarrow.large_string() + + def test_feast_value_type_to_pa_json_list(self): + """Test that ValueType.JSON_LIST converts to PyArrow list of large_string.""" + pa_type = feast_value_type_to_pa(ValueType.JSON_LIST) + assert isinstance(pa_type, pyarrow.ListType) + assert pa_type.value_type == pyarrow.large_string() + + def test_convert_value_type_str_json(self): + """Test that 'JSON' string converts to ValueType.JSON.""" + assert 
_convert_value_type_str_to_value_type("JSON") == ValueType.JSON + assert _convert_value_type_str_to_value_type("JSON_LIST") == ValueType.JSON_LIST + + def test_arrow_to_pg_type_json(self): + """Test that Arrow large_string converts to Postgres jsonb.""" + assert arrow_to_pg_type("large_string") == "jsonb" + + def test_bq_json_to_feast(self): + """Test that BigQuery JSON type converts to ValueType.JSON.""" + from feast.type_map import bq_to_feast_value_type + + assert bq_to_feast_value_type("JSON") == ValueType.JSON + + def test_spark_struct_not_json(self): + """Test that Spark struct types map to STRUCT not JSON.""" + assert spark_to_feast_value_type("struct") == ValueType.STRUCT + + def test_snowflake_json_to_feast(self): + """Test that Snowflake JSON type converts to ValueType.JSON.""" + assert snowflake_type_to_feast_value_type("JSON") == ValueType.JSON + + def test_json_feast_type_aliases(self): + """Test Json FeastType alias and conversions.""" + from feast.types import Json, from_feast_to_pyarrow_type + + pa_type = from_feast_to_pyarrow_type(Json) + assert pa_type == pyarrow.large_string() + + def test_json_value_types_mapping(self): + """Test JSON types in VALUE_TYPES_TO_FEAST_TYPES.""" + from feast.types import VALUE_TYPES_TO_FEAST_TYPES, Json + + assert VALUE_TYPES_TO_FEAST_TYPES[ValueType.JSON] == Json + + def test_pa_to_feast_value_type_large_string(self): + """Test that large_string arrow type converts to ValueType.JSON.""" + result = pa_to_feast_value_type("large_string") + assert result == ValueType.JSON + + +class TestStructTypeSupport: + """Test cases for STRUCT value type.""" + + def test_simple_struct_conversion(self): + """Test basic STRUCT type conversion: Python dict -> proto (struct_val) -> Python dict.""" + test_data = {"name": "Alice", "age": 30} + protos = python_values_to_proto_values([test_data], ValueType.STRUCT) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, dict) + assert converted["name"] == 
"Alice" + assert converted["age"] == 30 + + def test_nested_struct_conversion(self): + """Test nested STRUCT type conversion.""" + test_data = { + "address": {"street": "123 Main St", "city": "NYC"}, + "name": "Alice", + } + protos = python_values_to_proto_values([test_data], ValueType.STRUCT) + converted = feast_value_type_to_python_type(protos[0]) + + assert converted["address"]["street"] == "123 Main St" + assert converted["address"]["city"] == "NYC" + assert converted["name"] == "Alice" + + def test_null_struct(self): + """Test None STRUCT conversion.""" + protos = python_values_to_proto_values([None], ValueType.STRUCT) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_struct_list_conversion(self): + """Test STRUCT_LIST type conversion.""" + test_data = [ + {"name": "Alice", "age": 30}, + {"name": "Bob", "age": 25}, + ] + protos = python_values_to_proto_values([test_data], ValueType.STRUCT_LIST) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, list) + assert len(converted) == 2 + assert converted[0]["name"] == "Alice" + assert converted[1]["age"] == 25 + + def test_null_struct_list(self): + """Test None STRUCT_LIST conversion.""" + protos = python_values_to_proto_values([None], ValueType.STRUCT_LIST) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_multiple_struct_values(self): + """Test conversion of multiple STRUCT values.""" + test_values = [ + {"x": 1}, + None, + {"y": 2, "z": 3}, + ] + protos = python_values_to_proto_values(test_values, ValueType.STRUCT) + converted = [feast_value_type_to_python_type(p) for p in protos] + + assert converted[0] == {"x": 1} + assert converted[1] is None + assert converted[2] == {"y": 2, "z": 3} + + def test_struct_class_creation(self): + """Test Struct FeastType creation and validation.""" + from feast.types import Int32, String, Struct + + struct_type = Struct({"name": String, "age": 
Int32}) + assert struct_type.to_value_type() == ValueType.STRUCT + assert "name" in struct_type.fields + assert struct_type.fields["name"] == String + assert struct_type.fields["age"] == Int32 + + def test_struct_empty_raises(self): + """Test that empty Struct raises ValueError.""" + from feast.types import Struct + + with pytest.raises(ValueError, match="at least one field"): + Struct({}) + + def test_struct_to_pyarrow(self): + """Test Struct type converts to PyArrow struct.""" + from feast.types import Int32, String, Struct + + struct_type = Struct({"name": String, "age": Int32}) + pa_type = struct_type.to_pyarrow_type() + + assert pyarrow.types.is_struct(pa_type) + assert pa_type.get_field_index("name") >= 0 + assert pa_type.get_field_index("age") >= 0 + + def test_struct_from_feast_to_pyarrow(self): + """Test from_feast_to_pyarrow_type handles Struct.""" + from feast.types import Int32, String, Struct + + struct_type = Struct({"name": String, "age": Int32}) + pa_type = from_feast_to_pyarrow_type(struct_type) + + assert pyarrow.types.is_struct(pa_type) + + def test_array_of_struct(self): + """Test Array(Struct(...)) works.""" + from feast.types import Array, Int32, String, Struct + + struct_type = Struct({"name": String, "value": Int32}) + array_type = Array(struct_type) + + assert array_type.to_value_type() == ValueType.STRUCT_LIST + pa_type = from_feast_to_pyarrow_type(array_type) + assert isinstance(pa_type, pyarrow.ListType) + assert pyarrow.types.is_struct(pa_type.value_type) + + def test_feast_value_type_to_pa_struct(self): + """Test that ValueType.STRUCT converts to PyArrow struct (empty default).""" + pa_type = feast_value_type_to_pa(ValueType.STRUCT) + assert pyarrow.types.is_struct(pa_type) + + def test_feast_value_type_to_pa_struct_list(self): + """Test that ValueType.STRUCT_LIST converts to PyArrow list of struct.""" + pa_type = feast_value_type_to_pa(ValueType.STRUCT_LIST) + assert isinstance(pa_type, pyarrow.ListType) + assert 
pyarrow.types.is_struct(pa_type.value_type) + + def test_convert_value_type_str_struct(self): + """Test that 'STRUCT' string converts to ValueType.STRUCT.""" + assert _convert_value_type_str_to_value_type("STRUCT") == ValueType.STRUCT + assert ( + _convert_value_type_str_to_value_type("STRUCT_LIST") + == ValueType.STRUCT_LIST + ) + + def test_spark_struct_to_feast(self): + """Test that Spark struct types convert to ValueType.STRUCT.""" + assert spark_to_feast_value_type("struct") == ValueType.STRUCT + assert spark_to_feast_value_type("STRUCT") == ValueType.STRUCT + + def test_spark_array_struct_to_feast(self): + """Test that Spark array> types convert to STRUCT_LIST.""" + assert ( + spark_to_feast_value_type("array>") == ValueType.STRUCT_LIST + ) + + def test_bq_struct_to_feast(self): + """Test that BigQuery STRUCT/RECORD types convert to ValueType.STRUCT.""" + from feast.type_map import bq_to_feast_value_type + + assert bq_to_feast_value_type("STRUCT") == ValueType.STRUCT + assert bq_to_feast_value_type("RECORD") == ValueType.STRUCT + + def test_pa_to_feast_value_type_struct(self): + """Test that struct arrow type string converts to ValueType.STRUCT.""" + result = pa_to_feast_value_type("struct") + assert result == ValueType.STRUCT + + def test_struct_schema_persistence(self): + """Test that Struct schema is preserved through Field serialization/deserialization.""" + from feast.field import Field + from feast.types import Int32, String, Struct + + struct_type = Struct({"street": String, "zip": Int32}) + field = Field(name="address", dtype=struct_type) + + proto = field.to_proto() + restored = Field.from_proto(proto) + + assert isinstance(restored.dtype, Struct) + assert "street" in restored.dtype.fields + assert "zip" in restored.dtype.fields + assert restored.dtype.fields["street"] == String + assert restored.dtype.fields["zip"] == Int32 + + def test_struct_json_string_parsing(self): + """Test that JSON string input is parsed for STRUCT type.""" + json_str = 
'{"name": "Alice", "score": 95}' + protos = python_values_to_proto_values([json_str], ValueType.STRUCT) + converted = feast_value_type_to_python_type(protos[0]) + + assert isinstance(converted, dict) + assert converted["name"] == "Alice" + assert converted["score"] == 95 + + def test_struct_equality(self): + """Test Struct type equality.""" + from feast.types import Int32, String, Struct + + s1 = Struct({"name": String, "age": Int32}) + s2 = Struct({"name": String, "age": Int32}) + s3 = Struct({"name": String}) + + assert s1 == s2 + assert s1 != s3 + + def test_from_feast_type_struct(self): + """Test from_feast_type works for Struct.""" + from feast.types import Int32, String, Struct, from_feast_type + + struct_type = Struct({"name": String, "age": Int32}) + value_type = from_feast_type(struct_type) + assert value_type == ValueType.STRUCT + + def test_from_value_type_struct(self): + """Test from_value_type works for STRUCT (returns placeholder).""" + from feast.types import Struct, from_value_type + + feast_type = from_value_type(ValueType.STRUCT) + assert isinstance(feast_type, Struct) + + def test_from_value_type_struct_list(self): + """Test from_value_type works for STRUCT_LIST (returns placeholder Array(Struct)).""" + from feast.types import Array, Struct, from_value_type + + feast_type = from_value_type(ValueType.STRUCT_LIST) + assert isinstance(feast_type, Array) + assert isinstance(feast_type.base_type, Struct) + + +class TestJsonValidation: + """Test JSON well-formedness validation.""" + + def test_proto_conversion_valid_json_string(self): + """Valid JSON strings should convert without error.""" + valid_json = '{"key": "value", "num": 42}' + protos = python_values_to_proto_values([valid_json], ValueType.JSON) + assert protos[0].json_val == valid_json + + def test_proto_conversion_invalid_json_string_raises(self): + """Invalid JSON strings should raise ValueError during proto conversion.""" + import pytest + + invalid_json = "this is not json {{" + with 
pytest.raises(ValueError, match="Invalid JSON string for JSON type"): + python_values_to_proto_values([invalid_json], ValueType.JSON) + + def test_proto_conversion_dict_no_validation_needed(self): + """Python dicts are valid by definition and should not raise.""" + data = {"name": "Alice", "items": [1, 2, 3]} + protos = python_values_to_proto_values([data], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert converted == data + + def test_proto_conversion_list_no_validation_needed(self): + """Python lists are valid by definition and should not raise.""" + data = [1, "two", {"three": 3}] + protos = python_values_to_proto_values([data], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert converted == data + + def test_proto_conversion_none_passes(self): + """None values should pass through without validation.""" + protos = python_values_to_proto_values([None], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_proto_conversion_json_list_invalid_string_raises(self): + """Invalid JSON strings in JSON_LIST should raise ValueError.""" + import pytest + + data = ['{"valid": true}', "not-json"] + with pytest.raises(ValueError, match="Invalid JSON string in JSON_LIST"): + python_values_to_proto_values([data], ValueType.JSON_LIST) + + def test_proto_conversion_json_list_valid_mixed(self): + """JSON_LIST with valid strings and dicts should succeed.""" + data = ['{"a": 1}', {"b": 2}] + protos = python_values_to_proto_values([data], ValueType.JSON_LIST) + converted = feast_value_type_to_python_type(protos[0]) + assert len(converted) == 2 + assert converted[0] == {"a": 1} + assert converted[1] == {"b": 2} + + def test_proto_conversion_json_scalar_string(self): + """JSON scalar values like numbers-as-strings should validate.""" + protos = python_values_to_proto_values(["42"], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert 
converted == 42 + + def test_proto_conversion_json_null_string(self): + """The JSON string 'null' is valid JSON.""" + protos = python_values_to_proto_values(["null"], ValueType.JSON) + converted = feast_value_type_to_python_type(protos[0]) + assert converted is None + + def test_proto_conversion_json_empty_string_raises(self): + """An empty string is not valid JSON.""" + import pytest + + with pytest.raises(ValueError, match="Invalid JSON string for JSON type"): + python_values_to_proto_values([""], ValueType.JSON) + + def test_local_validation_node_valid_json(self): + """LocalValidationNode should accept valid JSON strings.""" + from feast.infra.compute_engines.local.nodes import LocalValidationNode + + table = pyarrow.table( + {"config": ['{"a": 1}', '{"b": 2}', "null"]}, + schema=pyarrow.schema([pyarrow.field("config", pyarrow.string())]), + ) + + node = LocalValidationNode( + name="test_validate", + validation_config={ + "columns": {"config": pyarrow.large_string()}, + "json_columns": {"config"}, + }, + backend=None, + ) + # Should not raise + node._validate_schema(table) + + def test_local_validation_node_invalid_json(self): + """LocalValidationNode should reject invalid JSON strings.""" + import pytest + + from feast.infra.compute_engines.local.nodes import LocalValidationNode + + table = pyarrow.table( + {"config": ['{"valid": true}', "not-json-at-all", '{"ok": 1}']}, + schema=pyarrow.schema([pyarrow.field("config", pyarrow.string())]), + ) + + node = LocalValidationNode( + name="test_validate", + validation_config={ + "columns": {"config": pyarrow.large_string()}, + "json_columns": {"config"}, + }, + backend=None, + ) + with pytest.raises(ValueError, match="invalid JSON value"): + node._validate_schema(table) + + def test_local_validation_node_skips_nulls(self): + """LocalValidationNode should skip null values in JSON columns.""" + from feast.infra.compute_engines.local.nodes import LocalValidationNode + + table = pyarrow.table( + {"config": ['{"a": 1}', 
None, '{"b": 2}']}, + schema=pyarrow.schema([pyarrow.field("config", pyarrow.string())]), + ) + + node = LocalValidationNode( + name="test_validate", + validation_config={ + "columns": {"config": pyarrow.large_string()}, + "json_columns": {"config"}, + }, + backend=None, + ) + # Should not raise + node._validate_schema(table) + + def test_local_validation_node_no_json_columns(self): + """LocalValidationNode should skip JSON validation if no json_columns.""" + from feast.infra.compute_engines.local.nodes import LocalValidationNode + + table = pyarrow.table( + {"data": ["not-json"]}, + schema=pyarrow.schema([pyarrow.field("data", pyarrow.string())]), + ) + + node = LocalValidationNode( + name="test_validate", + validation_config={ + "columns": {"data": pyarrow.string()}, + }, + backend=None, + ) + # Should not raise — no json_columns configured + node._validate_schema(table) + + def test_local_validation_node_error_message_shows_row_and_detail(self): + """Error message should include the row number and parse error.""" + import pytest + + from feast.infra.compute_engines.local.nodes import LocalValidationNode + + table = pyarrow.table( + {"config": ['{"ok": 1}', '{"ok": 2}', "{bad}"]}, + schema=pyarrow.schema([pyarrow.field("config", pyarrow.string())]), + ) + + node = LocalValidationNode( + name="test_validate", + validation_config={ + "columns": {"config": pyarrow.large_string()}, + "json_columns": {"config"}, + }, + backend=None, + ) + with pytest.raises(ValueError, match="row 2"): + node._validate_schema(table) + + +class TestSparkNativeTypeValidation: + """Test Spark-native type mapping and compatibility checking.""" + + def test_feast_string_to_spark_string(self): + from pyspark.sql.types import StringType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import String + + assert from_feast_to_spark_type(String) == StringType() + + def test_feast_int32_to_spark_integer(self): + from pyspark.sql.types import 
IntegerType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Int32 + + assert from_feast_to_spark_type(Int32) == IntegerType() + + def test_feast_int64_to_spark_long(self): + from pyspark.sql.types import LongType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Int64 + + assert from_feast_to_spark_type(Int64) == LongType() + + def test_feast_float32_to_spark_float(self): + from pyspark.sql.types import FloatType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Float32 + + assert from_feast_to_spark_type(Float32) == FloatType() + + def test_feast_float64_to_spark_double(self): + from pyspark.sql.types import DoubleType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Float64 + + assert from_feast_to_spark_type(Float64) == DoubleType() + + def test_feast_bool_to_spark_boolean(self): + from pyspark.sql.types import BooleanType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Bool + + assert from_feast_to_spark_type(Bool) == BooleanType() + + def test_feast_bytes_to_spark_binary(self): + from pyspark.sql.types import BinaryType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Bytes + + assert from_feast_to_spark_type(Bytes) == BinaryType() + + def test_feast_timestamp_to_spark_timestamp(self): + from pyspark.sql.types import TimestampType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import UnixTimestamp + + assert from_feast_to_spark_type(UnixTimestamp) == TimestampType() + + def test_feast_map_to_spark_map(self): + from pyspark.sql.types import MapType, StringType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import 
Map + + assert from_feast_to_spark_type(Map) == MapType(StringType(), StringType()) + + def test_feast_json_to_spark_string(self): + from pyspark.sql.types import StringType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Json + + assert from_feast_to_spark_type(Json) == StringType() + + def test_feast_array_int_to_spark_array(self): + from pyspark.sql.types import ArrayType, IntegerType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Array, Int32 + + assert from_feast_to_spark_type(Array(Int32)) == ArrayType(IntegerType()) + + def test_feast_array_map_to_spark_array(self): + from pyspark.sql.types import ArrayType, MapType, StringType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Array, Map + + assert from_feast_to_spark_type(Array(Map)) == ArrayType( + MapType(StringType(), StringType()) + ) + + def test_feast_struct_to_spark_struct(self): + from pyspark.sql.types import IntegerType, StringType, StructField, StructType + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Int32, String, Struct + + struct = Struct({"name": String, "age": Int32}) + expected = StructType( + [ + StructField("name", StringType(), True), + StructField("age", IntegerType(), True), + ] + ) + assert from_feast_to_spark_type(struct) == expected + + def test_feast_array_struct_to_spark_array_struct(self): + from pyspark.sql.types import ( + ArrayType, + IntegerType, + StringType, + StructField, + StructType, + ) + + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Array, Int32, String, Struct + + struct = Struct({"name": String, "age": Int32}) + expected = ArrayType( + StructType( + [ + StructField("name", StringType(), True), + StructField("age", IntegerType(), True), + ] + ) + ) + assert 
from_feast_to_spark_type(Array(struct)) == expected + + def test_unsupported_type_returns_none(self): + from feast.infra.compute_engines.spark.nodes import from_feast_to_spark_type + from feast.types import Invalid + + assert from_feast_to_spark_type(Invalid) is None + + # Compatibility tests + + def test_exact_match_compatible(self): + from pyspark.sql.types import StringType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible(StringType(), StringType()) + + def test_map_struct_compatible(self): + from pyspark.sql.types import MapType, StringType, StructType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible( + MapType(StringType(), StringType()), StructType([]) + ) + + def test_struct_map_compatible(self): + from pyspark.sql.types import MapType, StringType, StructType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible( + StructType([]), MapType(StringType(), StringType()) + ) + + def test_integer_long_widening_compatible(self): + from pyspark.sql.types import IntegerType, LongType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible(IntegerType(), LongType()) + assert _spark_types_compatible(LongType(), IntegerType()) + + def test_float_double_widening_compatible(self): + from pyspark.sql.types import DoubleType, FloatType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible(FloatType(), DoubleType()) + assert _spark_types_compatible(DoubleType(), FloatType()) + + def test_string_vs_integer_incompatible(self): + from pyspark.sql.types import IntegerType, StringType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert not _spark_types_compatible(StringType(), IntegerType()) + + def 
test_bool_vs_double_incompatible(self): + from pyspark.sql.types import BooleanType, DoubleType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert not _spark_types_compatible(BooleanType(), DoubleType()) + + def test_array_element_compatibility(self): + from pyspark.sql.types import ArrayType, IntegerType, LongType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert _spark_types_compatible(ArrayType(IntegerType()), ArrayType(LongType())) + + def test_array_element_incompatibility(self): + from pyspark.sql.types import ArrayType, IntegerType, StringType + + from feast.infra.compute_engines.spark.nodes import _spark_types_compatible + + assert not _spark_types_compatible( + ArrayType(StringType()), ArrayType(IntegerType()) + )