Commit a19f25a

feat: Added Json and Struct Complex Data Type
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 970650b commit a19f25a

34 files changed: +1706 -221 lines changed

.pre-commit-config.yaml

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@ repos:
         stages: [commit]
         language: system
         types: [python]
+        exclude: '_pb2\.py$'
         entry: bash -c 'uv run ruff check --fix "$@" && uv run ruff format "$@"' --
         pass_filenames: true

@@ -20,6 +21,7 @@ repos:
         stages: [commit]
         language: system
         types: [python]
+        exclude: '_pb2\.py$'
         entry: bash -c 'uv run ruff check "$@" && uv run ruff format --check "$@"' --
         pass_filenames: true

docs/getting-started/concepts/feast-types.md

Lines changed: 17 additions & 8 deletions
@@ -11,29 +11,38 @@ Feast supports the following categories of data types:
 - **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
 - **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`.
 - **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
+- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
+- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

 For a complete reference with examples, see [Type System](../../reference/type-system.md).

 Each feature or schema field in Feast is associated with a data type, which is stored in Feast's [registry](registry.md). These types are also used to ensure that Feast operates on values correctly (e.g. making sure that timestamp columns used for [point-in-time correct joins](point-in-time-joins.md) actually have the timestamp type).

 As a result, each system that Feast interacts with needs a way to translate data types from the native platform into a Feast type. E.g., Snowflake SQL types are converted to Feast types [here](https://rtd.feast.dev/en/master/feast.html#feast.type_map.snowflake_python_type_to_feast_value_type). The onus is therefore on authors of offline or online store connectors to make sure that this type mapping happens correctly.

-### Backend Type Mapping for Maps
+### Backend Type Mapping for Complex Types

-Map types are supported across all major Feast backends:
+Map, JSON, and Struct types are supported across all major Feast backends:

 | Backend | Native Type | Feast Type |
 |---------|-------------|------------|
-| PostgreSQL | `jsonb` | `Map` |
+| PostgreSQL | `jsonb` | `Map`, `Json`, `Struct` |
 | PostgreSQL | `jsonb[]` | `Array(Map)` |
 | Snowflake | `VARIANT`, `OBJECT` | `Map` |
+| Snowflake | `JSON` | `Json` |
 | Redshift | `SUPER` | `Map` |
-| BigQuery | `JSON`, `STRUCT` | `Map` |
+| Redshift | `json` | `Json` |
+| BigQuery | `JSON` | `Json` |
+| BigQuery | `STRUCT`, `RECORD` | `Struct` |
 | Spark | `map<string,string>` | `Map` |
 | Spark | `array<map<string,string>>` | `Array(Map)` |
-| MSSQL | `nvarchar(max)` | `Map` |
-| DynamoDB | Proto bytes | `Map` |
-| Redis | Proto bytes | `Map` |
-| Milvus | `VARCHAR` (base64 proto) | `Map` |
+| Spark | `struct<...>` | `Struct` |
+| Spark | `array<struct<...>>` | `Array(Struct(...))` |
+| MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
+| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
+| Redis | Proto bytes | `Map`, `Json`, `Struct` |
+| Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |
+
+**Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.

 **Note**: Feast currently does *not* support a null type in its type system.
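
Reviewer sketch of the precedence rule stated in the added note: a schema-declared type wins, and the backend mapping is consulted only for inference. This is a minimal illustration, not Feast's implementation. `INFERRED_TYPES` and `resolve_feast_type` are hypothetical names, and the inferred default for each native type is an assumption (the first Feast type listed for it in the table).

```python
from typing import Optional

# Hypothetical inference table: (backend, native type) -> default Feast type name.
# The defaults are assumptions taken from the first entry in the docs table.
INFERRED_TYPES = {
    ("postgres", "jsonb"): "Map",
    ("snowflake", "VARIANT"): "Map",
    ("bigquery", "JSON"): "Json",
    ("bigquery", "STRUCT"): "Struct",
    ("spark", "map<string,string>"): "Map",
}


def resolve_feast_type(
    backend: str, native_type: str, declared: Optional[str] = None
) -> str:
    """Return the schema-declared type if present, else the inferred one."""
    if declared is not None:
        return declared
    try:
        return INFERRED_TYPES[(backend, native_type)]
    except KeyError:
        raise ValueError(
            f"No inferred Feast type for {backend} {native_type!r}"
        ) from None


print(resolve_feast_type("postgres", "jsonb"))                   # Map
print(resolve_feast_type("postgres", "jsonb", declared="Json"))  # Json
```

The same `jsonb` column resolves to different Feast types depending only on whether the schema declares one, which is the ambiguity the note describes.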

docs/getting-started/concepts/feature-view.md

Lines changed: 8 additions & 1 deletion
@@ -171,7 +171,7 @@ This is useful for catching data quality issues early in the pipeline. To enable

 ```python
 from feast import FeatureView, Field
-from feast.types import Int64, Float32, Map
+from feast.types import Int32, Int64, Float32, Json, Map, String, Struct

 validated_fv = FeatureView(
     name="validated_features",
@@ -180,12 +180,19 @@ validated_fv = FeatureView(
         Field(name="trips_today", dtype=Int64),
         Field(name="rating", dtype=Float32),
         Field(name="preferences", dtype=Map),
+        Field(name="config", dtype=Json),  # opaque JSON data
+        Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})),  # typed struct
     ],
     source=my_source,
     enable_validation=True,  # enables schema checks
 )
 ```

+**JSON vs Map vs Struct**: These three complex types serve different purposes:
+- **`Map`**: Schema-free dictionary with string keys. Use when the keys and values are dynamic.
+- **`Json`**: Opaque JSON data stored as a string. Backends use native JSON types (`jsonb`, `VARIANT`). Use for configuration blobs or API responses where you don't need field-level typing.
+- **`Struct`**: Schema-aware structured type with named, typed fields. Persisted through the registry via Field tags. Use when you know the exact structure and want type safety.
+
 Validation is supported in all compute engines (Local, Spark, and Ray). When a required column is missing, a `ValueError` is raised. Type mismatches are logged as warnings but do not block execution, allowing for safe gradual adoption.

 The `enable_validation` parameter is also available on `BatchFeatureView` and `StreamFeatureView`, as well as their respective decorators (`@batch_feature_view` and `@stream_feature_view`).
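
Reviewer sketch of the validation behaviour described in the docs text: a missing column raises `ValueError`, while a type mismatch is only logged. It uses plain dictionaries of column name to type name and does not call Feast. `validate_columns` and the logger name are hypothetical, and the real checks in the compute engines may differ.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("validation_sketch")


def validate_columns(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare actual column types against an expected schema.

    Raises ValueError if a required column is missing. Type mismatches are
    logged as warnings and returned, but do not stop execution.
    """
    missing = [name for name in expected if name not in actual]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    mismatches = []
    for name, expected_type in expected.items():
        if actual[name] != expected_type:
            logger.warning(
                "Column %r: expected %s, got %s", name, expected_type, actual[name]
            )
            mismatches.append(name)
    return mismatches


expected = {"trips_today": "Int64", "rating": "Float32"}
mismatched = validate_columns(expected, {"trips_today": "Int64", "rating": "Float64"})
print(mismatched)  # ['rating']
```

Returning the mismatches instead of raising mirrors the "safe gradual adoption" wording: callers can inspect them without the pipeline stopping.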

docs/how-to-guides/dbt-integration.md

Lines changed: 3 additions & 1 deletion
@@ -289,10 +289,12 @@ Feast automatically maps dbt/warehouse column types to Feast types:
 | `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
 | `BYTES`, `BINARY` | `Bytes` |
 | `ARRAY<type>` | `Array(type)` |
-| `JSON`, `JSONB` | `Map` |
+| `JSON`, `JSONB` | `Map` (or `Json` if declared in schema) |
 | `VARIANT`, `OBJECT` | `Map` |
 | `SUPER` | `Map` |
 | `MAP<string,string>` | `Map` |
+| `STRUCT`, `RECORD` | `Struct` (BigQuery) |
+| `struct<...>` | `Struct` (Spark) |

 Snowflake `NUMBER(precision, scale)` types are handled specially:
 - Scale > 0: `Float64`

docs/specs/offline_store_format.md

Lines changed: 10 additions & 0 deletions
@@ -51,6 +51,10 @@ Here's how Feast types map to Pandas types for Feast APIs that take in or return
 | BOOL\_LIST | `list[bool]`|
 | MAP | `dict` (`Dict[str, Any]`)|
 | MAP\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|
+| JSON | `object` (parsed Python dict/list/str)|
+| JSON\_LIST | `list[object]`|
+| STRUCT | `dict` (`Dict[str, Any]`)|
+| STRUCT\_LIST | `list[dict]` (`List[Dict[str, Any]]`)|

 Note that this mapping is non-injective, that is more than one Pandas type may corresponds to one Feast type (but not vice versa). In these cases, when converting Feast values to Pandas, the **first** Pandas type in the table above is used.

@@ -82,6 +86,10 @@ Here's how Feast types map to BigQuery types when using BigQuery for offline sto
 | BOOL\_LIST | `ARRAY<BOOL>`|
 | MAP | `JSON` / `STRUCT` |
 | MAP\_LIST | `ARRAY<JSON>` / `ARRAY<STRUCT>` |
+| JSON | `JSON` |
+| JSON\_LIST | `ARRAY<JSON>` |
+| STRUCT | `STRUCT` / `RECORD` |
+| STRUCT\_LIST | `ARRAY<STRUCT>` |

 Values that are not specified by the table above will cause an error on conversion.

@@ -99,6 +107,7 @@ https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#snowflake-
 | INT64 | `INT64 / UINT64` |
 | DOUBLE | `FLOAT64` |
 | MAP | `VARIANT` / `OBJECT` |
+| JSON | `JSON` / `VARIANT` |

 #### Redshift Types
 Here's how Feast types map to Redshift types when using Redshift for offline storage:
@@ -114,5 +123,6 @@ Here's how Feast types map to Redshift types when using Redshift for offline sto
 | FLOAT | `FLOAT4` / `REAL` |
 | BOOL | `BOOL` |
 | MAP | `SUPER` |
+| JSON | `json` / `SUPER` |

 Note: Redshift's `SUPER` type stores semi-structured JSON data. During materialization, Feast automatically handles `SUPER` columns that are exported as JSON strings by parsing them back into Python dictionaries before converting to `MAP` proto values.
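
Reviewer sketch of the `SUPER` handling described in the note: a column exported as a JSON string is parsed back into Python objects before conversion. It uses only the standard library. `parse_super_value` and the sample row are hypothetical and stand in for whatever Feast does internally during materialization.

```python
import json
from typing import Any


def parse_super_value(value: Any) -> Any:
    """Parse a JSON-string export into Python objects; pass anything else through."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value  # not valid JSON, leave the raw string untouched
    return value


# A row as it might arrive when a SUPER column has been exported as text.
row = {"driver_id": 1001, "preferences": '{"seat": "window", "music": "jazz"}'}

# Only the column known to be SUPER is parsed; other columns are left alone.
parsed = {k: parse_super_value(v) if k == "preferences" else v for k, v in row.items()}
print(parsed["preferences"]["seat"])  # window
```

Parsing is restricted to the known `SUPER` column because `json.loads` would also turn an ordinary string such as `"123"` into a number.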

protos/feast/core/FeatureView.proto

Lines changed: 4 additions & 1 deletion
@@ -36,7 +36,7 @@ message FeatureView {
     FeatureViewMeta meta = 2;
 }

-// Next available id: 17
+// Next available id: 18
 // TODO(adchia): refactor common fields from this and ODFV into separate metadata proto
 message FeatureViewSpec {
     // Name of the feature view. Must be unique. Not updated.
@@ -89,6 +89,9 @@ message FeatureViewSpec {

     // The transformation mode (e.g., "python", "pandas", "spark", "sql", "ray")
     string mode = 16;
+
+    // Whether schema validation is enabled during materialization
+    bool enable_validation = 17;
 }

 message FeatureViewMeta {

protos/feast/core/StreamFeatureView.proto

Lines changed: 4 additions & 1 deletion
@@ -37,7 +37,7 @@ message StreamFeatureView {
     FeatureViewMeta meta = 2;
 }

-// Next available id: 20
+// Next available id: 21
 message StreamFeatureViewSpec {
     // Name of the feature view. Must be unique. Not updated.
     string name = 1;
@@ -99,5 +99,8 @@ message StreamFeatureViewSpec {
     // Hop size for tiling (e.g., 5 minutes). Determines the granularity of pre-aggregated tiles.
     // If not specified, defaults to 5 minutes. Only used when enable_tiling is true.
     google.protobuf.Duration tiling_hop_size = 19;
+
+    // Whether schema validation is enabled during materialization
+    bool enable_validation = 20;
 }

protos/feast/types/Value.proto

Lines changed: 8 additions & 0 deletions
@@ -53,6 +53,10 @@ message ValueType {
         FLOAT_SET = 27;
         BOOL_SET = 28;
         UNIX_TIMESTAMP_SET = 29;
+        JSON = 32;
+        JSON_LIST = 33;
+        STRUCT = 34;
+        STRUCT_LIST = 35;
     }
 }

@@ -88,6 +92,10 @@ message Value {
         FloatSet float_set_val = 27;
         BoolSet bool_set_val = 28;
         Int64Set unix_timestamp_set_val = 29;
+        string json_val = 32;
+        StringList json_list_val = 33;
+        Map struct_val = 34;
+        MapList struct_list_val = 35;
     }
 }

sdk/python/feast/feature_view.py

Lines changed: 5 additions & 11 deletions
@@ -284,6 +284,7 @@ def __copy__(self):
             online=self.online,
             offline=self.offline,
             sink_source=self.batch_source if self.source_views else None,
+            enable_validation=self.enable_validation,
         )

         # This is deliberately set outside of the FV initialization as we do not have the Entity objects.
@@ -462,17 +463,13 @@ def to_proto_spec(
             else self.mode
         )

-        tags = dict(self.tags) if self.tags else {}
-        if self.enable_validation:
-            tags["feast:enable_validation"] = "true"
-
         return FeatureViewSpecProto(
             name=self.name,
             entities=self.entities,
             entity_columns=[field.to_proto() for field in self.entity_columns],
             features=[feature.to_proto() for feature in self.features],
             description=self.description,
-            tags=tags,
+            tags=self.tags,
             owner=self.owner,
             ttl=(ttl_duration if ttl_duration is not None else None),
             online=self.online,
@@ -482,6 +479,7 @@ def to_proto_spec(
             source_views=source_view_protos,
             feature_transformation=feature_transformation_proto,
             mode=mode_str,
+            enable_validation=self.enable_validation,
         )

     def to_proto_meta(self):
@@ -651,12 +649,8 @@ def _from_proto_internal(
                 f"Entities: {feature_view.entities} vs Entity Columns: {feature_view.entity_columns}"
             )

-        # Restore enable_validation from well-known tag.
-        proto_tags = dict(feature_view_proto.spec.tags)
-        feature_view.enable_validation = (
-            proto_tags.pop("feast:enable_validation", "false").lower() == "true"
-        )
-        feature_view.tags = proto_tags
+        # Restore enable_validation from proto field.
+        feature_view.enable_validation = feature_view_proto.spec.enable_validation

         # FeatureViewProjections are not saved in the FeatureView proto.
         # Create the default projection.
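
Reviewer sketch: as far as this diff shows, a spec written by the removed tag-based encoding would now load with `enable_validation` unset (proto3 booleans default to false) and would keep `feast:enable_validation` among its user tags. The snippet below is a hypothetical compatibility shim that is not part of the commit. `SpecStandIn` is a plain dataclass standing in for the spec proto, and `restore_enable_validation` is an invented helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

LEGACY_TAG = "feast:enable_validation"  # tag key used by the removed code path


@dataclass
class SpecStandIn:
    """Stand-in for the spec proto: only the two fields this sketch needs."""

    tags: Dict[str, str] = field(default_factory=dict)
    enable_validation: bool = False  # mirrors the proto3 default for bool


def restore_enable_validation(spec: SpecStandIn) -> Tuple[bool, Dict[str, str]]:
    """Prefer the proto field; fall back to the legacy tag and strip it from user tags."""
    tags = dict(spec.tags)
    legacy = tags.pop(LEGACY_TAG, "false").lower() == "true"
    return spec.enable_validation or legacy, tags


old_spec = SpecStandIn(tags={LEGACY_TAG: "true", "team": "rides"})
new_spec = SpecStandIn(tags={"team": "rides"}, enable_validation=True)
print(restore_enable_validation(old_spec))  # (True, {'team': 'rides'})
print(restore_enable_validation(new_spec))  # (True, {'team': 'rides'})
```

Whether such a fallback is needed depends on whether any registries were ever written with the tag-based encoding, which the commit page does not say.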

sdk/python/feast/field.py

Lines changed: 95 additions & 4 deletions
@@ -12,15 +12,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+import json
 from typing import Dict, Optional

 from typeguard import typechecked

 from feast.feature import Feature
 from feast.protos.feast.core.Feature_pb2 import FeatureSpecV2 as FieldProto
-from feast.types import FeastType, from_value_type
+from feast.types import FeastType, Struct, from_value_type
 from feast.value_type import ValueType

+STRUCT_SCHEMA_TAG = "feast:struct_schema"
+

 @typechecked
 class Field:
@@ -117,11 +120,15 @@ def to_proto(self) -> FieldProto:
         """Converts a Field object to its protobuf representation."""
         value_type = self.dtype.to_value_type()
         vector_search_metric = self.vector_search_metric or ""
+        tags = dict(self.tags)
+        # Persist Struct field schema in tags
+        if isinstance(self.dtype, Struct):
+            tags[STRUCT_SCHEMA_TAG] = _serialize_struct_schema(self.dtype)
         return FieldProto(
             name=self.name,
             value_type=value_type.value,
             description=self.description,
-            tags=self.tags,
+            tags=tags,
             vector_index=self.vector_index,
             vector_length=self.vector_length,
             vector_search_metric=vector_search_metric,
@@ -136,13 +143,25 @@ def from_proto(cls, field_proto: FieldProto):
             field_proto: FieldProto protobuf object
         """
         value_type = ValueType(field_proto.value_type)
+        tags = dict(field_proto.tags)
         vector_search_metric = getattr(field_proto, "vector_search_metric", "")
         vector_index = getattr(field_proto, "vector_index", False)
         vector_length = getattr(field_proto, "vector_length", 0)
+
+        # Reconstruct Struct type from persisted schema in tags
+        dtype: FeastType
+        if value_type == ValueType.STRUCT and STRUCT_SCHEMA_TAG in tags:
+            dtype = _deserialize_struct_schema(tags[STRUCT_SCHEMA_TAG])
+            # Remove the internal tag so it doesn't leak to users
+            user_tags = {k: v for k, v in tags.items() if k != STRUCT_SCHEMA_TAG}
+        else:
+            dtype = from_value_type(value_type=value_type)
+            user_tags = tags
+
         return cls(
             name=field_proto.name,
-            dtype=from_value_type(value_type=value_type),
-            tags=dict(field_proto.tags),
+            dtype=dtype,
+            tags=user_tags,
             description=field_proto.description,
             vector_index=vector_index,
             vector_length=vector_length,
@@ -163,3 +182,75 @@ def from_feature(cls, feature: Feature):
             description=feature.description,
             tags=feature.labels,
         )
+
+
+def _feast_type_to_str(feast_type: FeastType) -> str:
+    """Convert a FeastType to a string representation for serialization."""
+    from feast.types import (
+        Array,
+        PrimitiveFeastType,
+    )
+
+    if isinstance(feast_type, PrimitiveFeastType):
+        return feast_type.name
+    elif isinstance(feast_type, Struct):
+        nested = {
+            name: _feast_type_to_str(ft) for name, ft in feast_type.fields.items()
+        }
+        return json.dumps({"__struct__": nested})
+    elif isinstance(feast_type, Array):
+        return f"Array({_feast_type_to_str(feast_type.base_type)})"
+    else:
+        return str(feast_type)
+
+
+def _str_to_feast_type(type_str: str) -> FeastType:
+    """Convert a string representation back to a FeastType."""
+    from feast.types import (
+        Array,
+        PrimitiveFeastType,
+    )
+
+    # Check if it's an Array type
+    if type_str.startswith("Array(") and type_str.endswith(")"):
+        inner = type_str[6:-1]
+        base_type = _str_to_feast_type(inner)
+        return Array(base_type)
+
+    # Check if it's a nested Struct (JSON encoded)
+    if type_str.startswith("{"):
+        try:
+            parsed = json.loads(type_str)
+            if "__struct__" in parsed:
+                fields = {
+                    name: _str_to_feast_type(ft_str)
+                    for name, ft_str in parsed["__struct__"].items()
+                }
+                return Struct(fields)
+        except (json.JSONDecodeError, TypeError):
+            pass
+
+    # Must be a PrimitiveFeastType name
+    try:
+        return PrimitiveFeastType[type_str]
+    except KeyError:
+        from feast.types import String
+
+        return String
+
+
+def _serialize_struct_schema(struct_type: Struct) -> str:
+    """Serialize a Struct's field schema to a JSON string for tag storage."""
+    schema_dict = {}
+    for name, feast_type in struct_type.fields.items():
+        schema_dict[name] = _feast_type_to_str(feast_type)
+    return json.dumps(schema_dict)
+
+
+def _deserialize_struct_schema(schema_str: str) -> Struct:
+    """Deserialize a JSON string from tags back to a Struct type."""
+    schema_dict = json.loads(schema_str)
+    fields = {}
+    for name, type_str in schema_dict.items():
+        fields[name] = _str_to_feast_type(type_str)
+    return Struct(fields)
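
Reviewer sketch of the tag round trip implemented by the helpers above, written against stand-in types so it runs without Feast installed. `Primitive`, `Array`, and `Struct` here are simplified stand-ins for the classes in `feast.types`, and `type_to_str` / `str_to_type` are renamed copies of the same logic. One deliberate difference, flagged in the comments: unknown primitive names raise `KeyError` here, whereas the commit's `_str_to_feast_type` falls back to `String`.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Primitive(Enum):
    """Stand-in for PrimitiveFeastType (only two members for brevity)."""

    STRING = 1
    INT32 = 2


@dataclass
class Array:
    """Stand-in for feast.types.Array."""

    base_type: Any


@dataclass
class Struct:
    """Stand-in for feast.types.Struct."""

    fields: Dict[str, Any]


def type_to_str(t: Any) -> str:
    """Encode a type as a string; nested structs become JSON under '__struct__'."""
    if isinstance(t, Primitive):
        return t.name
    if isinstance(t, Struct):
        nested = {name: type_to_str(ft) for name, ft in t.fields.items()}
        return json.dumps({"__struct__": nested})
    if isinstance(t, Array):
        return f"Array({type_to_str(t.base_type)})"
    raise TypeError(f"Unsupported type: {t!r}")


def str_to_type(s: str) -> Any:
    """Decode the string form produced by type_to_str."""
    if s.startswith("Array(") and s.endswith(")"):
        return Array(str_to_type(s[6:-1]))
    if s.startswith("{"):
        parsed = json.loads(s)
        if "__struct__" in parsed:
            return Struct({n: str_to_type(v) for n, v in parsed["__struct__"].items()})
    # Assumption: fail loudly on unknown names (the commit returns String instead).
    return Primitive[s]


address = Struct(
    {
        "street": Primitive.STRING,
        "zip": Primitive.INT32,
        "history": Array(Struct({"city": Primitive.STRING})),
    }
)

# Same shape as the "feast:struct_schema" tag value: field name -> encoded type.
tag_value = json.dumps({name: type_to_str(t) for name, t in address.fields.items()})
restored = Struct({name: str_to_type(v) for name, v in json.loads(tag_value).items()})
print(restored == address)  # True
```

Nested structs are JSON-encoded inside an already JSON-encoded tag value, so the stored string is doubly escaped. That is harmless for the round trip but makes the raw tag harder to read in a registry dump.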
