
Commit 52458fc

feat: Add Set as feature type (feast-dev#5888)
1 parent 4018e7b commit 52458fc


16 files changed: +766 -117 lines changed


docs/reference/type-system.md

Lines changed: 43 additions & 2 deletions
@@ -3,7 +3,7 @@
 ## Motivation

 Feast uses an internal type system to provide guarantees on training and serving data.
-Feast supports primitive types, array types, and map types for feature values.
+Feast supports primitive types, array types, set types, and map types for feature values.
 Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable.
 The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python.
 Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).
@@ -40,6 +40,23 @@ All primitive types have corresponding array (list) types:
 | `Array(Bool)` | `List[bool]` | List of booleans |
 | `Array(UnixTimestamp)` | `List[datetime]` | List of timestamps |

+### Set Types
+
+All primitive types (except Map) have corresponding set types for storing unique values:
+
+| Feast Type | Python Type | Description |
+|------------|-------------|-------------|
+| `Set(Int32)` | `Set[int]` | Set of unique 32-bit integers |
+| `Set(Int64)` | `Set[int]` | Set of unique 64-bit integers |
+| `Set(Float32)` | `Set[float]` | Set of unique 32-bit floats |
+| `Set(Float64)` | `Set[float]` | Set of unique 64-bit floats |
+| `Set(String)` | `Set[str]` | Set of unique strings |
+| `Set(Bytes)` | `Set[bytes]` | Set of unique binary data |
+| `Set(Bool)` | `Set[bool]` | Set of unique booleans |
+| `Set(UnixTimestamp)` | `Set[datetime]` | Set of unique timestamps |
+
+**Note:** Set types automatically remove duplicate values. When converting from lists or other iterables to sets, duplicates are eliminated.
+
 ### Map Types

 Map types allow storing dictionary-like data structures:
@@ -60,7 +77,7 @@ from datetime import timedelta
 from feast import Entity, FeatureView, Field, FileSource
 from feast.types import (
     Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
-    Array, Map
+    Array, Set, Map
 )

 # Define a data source
@@ -101,6 +118,12 @@ user_features = FeatureView(
         Field(name="notification_settings", dtype=Array(Bool)),
         Field(name="login_timestamps", dtype=Array(UnixTimestamp)),

+        # Set types (unique values only)
+        Field(name="visited_pages", dtype=Set(String)),
+        Field(name="unique_categories", dtype=Set(Int32)),
+        Field(name="tag_ids", dtype=Set(Int64)),
+        Field(name="preferred_languages", dtype=Set(String)),
+
         # Map types
         Field(name="user_preferences", dtype=Map),
         Field(name="metadata", dtype=Map),
@@ -110,6 +133,24 @@ user_features = FeatureView(
110133
)
111134
```
112135

136+
### Set Type Usage Examples
137+
138+
Sets store unique values and automatically remove duplicates:
139+
140+
```python
141+
# Simple set
142+
visited_pages = {"home", "products", "checkout", "products"} # "products" appears twice
143+
# Feast will store this as: {"home", "products", "checkout"}
144+
145+
# Integer set
146+
unique_categories = {1, 2, 3, 2, 1} # duplicates will be removed
147+
# Feast will store this as: {1, 2, 3}
148+
149+
# Converting a list with duplicates to a set
150+
tag_list = [100, 200, 300, 100, 200]
151+
tag_ids = set(tag_list) # {100, 200, 300}
152+
```
153+
113154
### Map Type Usage Examples
114155

115156
Maps can store complex nested data structures:

protos/feast/types/Value.proto

Lines changed: 44 additions & 0 deletions
@@ -45,6 +45,14 @@ message ValueType {
         NULL = 19;
         MAP = 20;
         MAP_LIST = 21;
+        BYTES_SET = 22;
+        STRING_SET = 23;
+        INT32_SET = 24;
+        INT64_SET = 25;
+        DOUBLE_SET = 26;
+        FLOAT_SET = 27;
+        BOOL_SET = 28;
+        UNIX_TIMESTAMP_SET = 29;
     }
 }

@@ -72,6 +80,14 @@ message Value {
         Null null_val = 19;
         Map map_val = 20;
         MapList map_list_val = 21;
+        BytesSet bytes_set_val = 22;
+        StringSet string_set_val = 23;
+        Int32Set int32_set_val = 24;
+        Int64Set int64_set_val = 25;
+        DoubleSet double_set_val = 26;
+        FloatSet float_set_val = 27;
+        BoolSet bool_set_val = 28;
+        Int64Set unix_timestamp_set_val = 29;
     }
 }

@@ -107,6 +123,34 @@ message BoolList {
     repeated bool val = 1;
 }

+message BytesSet {
+    repeated bytes val = 1;
+}
+
+message StringSet {
+    repeated string val = 1;
+}
+
+message Int32Set {
+    repeated int32 val = 1;
+}
+
+message Int64Set {
+    repeated int64 val = 1;
+}
+
+message DoubleSet {
+    repeated double val = 1;
+}
+
+message FloatSet {
+    repeated float val = 1;
+}
+
+message BoolSet {
+    repeated bool val = 1;
+}
+
 message Map {
     map<string, Value> val = 1;
 }
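
Each `*Set` message in the hunk above wraps a plain `repeated` field, and a protobuf `repeated` field does not enforce uniqueness on its own. Deduplication therefore has to happen before the field is filled. The sketch below illustrates that idea under stated assumptions: `StringSet` here is a hypothetical dataclass standing in for the generated message, and `to_string_set` is a hypothetical helper, not a function from the Feast SDK.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class StringSet:
    """Hypothetical stand-in for the generated `StringSet` protobuf message."""

    val: List[str] = field(default_factory=list)


def to_string_set(values: Iterable[str]) -> StringSet:
    """Drop duplicates before filling the repeated field (assumed behaviour)."""
    # dict.fromkeys keeps the first occurrence of each value, in input order.
    return StringSet(val=list(dict.fromkeys(values)))


pages = to_string_set(["home", "products", "checkout", "products"])
print(pages.val)  # ['home', 'products', 'checkout']
```

Keeping first-seen order is a choice made for this illustration only; the commit does not say whether element order is preserved. Note also that `unix_timestamp_set_val` reuses `Int64Set` in the `Value` hunk above rather than introducing a dedicated message.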

sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py

Lines changed: 22 additions & 4 deletions
@@ -167,6 +167,8 @@ def _get_or_create_collection(
             fields_to_exclude = [
                 "event_ts",
                 "created_ts",
+                "event_timestamp",
+                "created_timestamp",
             ]
             fields_to_add = [f for f in table.schema if f.name not in fields_to_exclude]
             for field in fields_to_add:
@@ -202,6 +204,7 @@ def _get_or_create_collection(
                     schema=schema,
                 )
                 index_params = self.client.prepare_index_params()
+                indices_added = False
                 for vector_field in schema.fields:
                     if (
                         vector_field.dtype
@@ -222,7 +225,8 @@ def _get_or_create_collection(
                             index_name=f"vector_index_{vector_field.name}",
                             params={"nlist": config.online_store.nlist},
                         )
-                if len(index_params) > 0:
+                        indices_added = True
+                if indices_added:
                     self.client.create_index(
                         collection_name=collection_name,
                         index_params=index_params,
@@ -281,6 +285,16 @@ def online_write_batch(
                 serialize_to_string=True,
             )

+            # Remove timestamp fields that are handled separately to avoid conflicts
+            timestamp_fields = [
+                "event_timestamp",
+                "created_timestamp",
+                "event_ts",
+                "created_ts",
+            ]
+            for field in timestamp_fields:
+                values_dict.pop(field, None)
+
             single_entity_record = {
                 composite_key_name: entity_key_str,
                 "event_ts": timestamp_int,
@@ -722,7 +736,7 @@ def _extract_proto_values_to_dict(
     numeric_vector_list_types = [
         k
         for k in PROTO_VALUE_TO_VALUE_TYPE_MAP.keys()
-        if k is not None and "list" in k and "string" not in k
+        if k is not None and ("list" in k or "set" in k) and "string" not in k
     ]
     numeric_types = [
         "double_val",
@@ -747,9 +761,13 @@ def _extract_proto_values_to_dict(
                     if (
                         serialize_to_string
                         and proto_val_type
-                        not in ["string_val", "bytes_val"] + numeric_types
+                        not in ["string_val", "bytes_val", "unix_timestamp_val"]
+                        + numeric_types
                     ):
-                        vector_values = feature_values.SerializeToString().decode()
+                        # For complex types, use base64 encoding instead of decode
+                        vector_values = base64.b64encode(
+                            feature_values.SerializeToString()
+                        ).decode("utf-8")
                     elif proto_val_type == "bytes_val":
                         byte_data = getattr(feature_values, proto_val_type)
                         vector_values = base64.b64encode(byte_data).decode("utf-8")
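
The last `milvus.py` hunk swaps a bare `.decode()` on serialized protobuf bytes for base64 encoding. The commit does not state the motivation, but one plausible reason is that serialized messages are arbitrary bytes and need not be valid UTF-8. The sketch below shows the difference on a hypothetical payload; `payload` stands in for the result of `feature_values.SerializeToString()` and is not taken from the commit.

```python
import base64

# Hypothetical bytes standing in for a serialized protobuf message.
# 0xFF is never valid in UTF-8, so a plain decode fails on this payload.
payload = bytes([0x0A, 0x03, 0xFF, 0xFE, 0x80])

try:
    payload.decode()  # default UTF-8 decoding, as in the removed line
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Base64 maps any byte string to ASCII text and can be reversed losslessly.
encoded = base64.b64encode(payload).decode("utf-8")
restored = base64.b64decode(encoded)

print(decoded_ok)           # False
print(restored == payload)  # True
```

A consequence worth keeping in mind: any reader of these stored values has to base64-decode them before parsing the protobuf message.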

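The `numeric_vector_list_types` filter in `milvus.py` now accepts keys containing either `"list"` or `"set"`, while still excluding anything containing `"string"`. A minimal sketch of that predicate follows, assuming the keys of `PROTO_VALUE_TO_VALUE_TYPE_MAP` look like the `Value.proto` field names; `sample_keys` and `numeric_collection_keys` are hypothetical names used only for illustration.

```python
from typing import List, Optional


def numeric_collection_keys(keys: List[Optional[str]]) -> List[str]:
    """Apply the same predicate as the updated comprehension in the diff."""
    return [
        k
        for k in keys
        if k is not None and ("list" in k or "set" in k) and "string" not in k
    ]


# Hypothetical sample of map keys, modelled on Value.proto field names.
sample_keys = [
    None,
    "int64_val",
    "int64_list_val",
    "string_list_val",
    "int32_set_val",
    "string_set_val",
    "bytes_set_val",
]

print(numeric_collection_keys(sample_keys))
# ['int64_list_val', 'int32_set_val', 'bytes_set_val']
```

The predicate is purely substring-based, so `bytes_set_val` passes it just as a bytes list key would have passed the previous version.
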
sdk/python/feast/protos/feast/core/DatastoreTable_pb2.pyi

Lines changed: 13 additions & 13 deletions
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
- * Copyright 2021 The Feast Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
+
+ * Copyright 2021 The Feast Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
  * limitations under the License.
 """
 import builtins

sdk/python/feast/protos/feast/core/Entity_pb2.pyi

Lines changed: 13 additions & 13 deletions
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
- * Copyright 2020 The Feast Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
+
+ * Copyright 2020 The Feast Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
  * limitations under the License.
 """
 import builtins

sdk/python/feast/protos/feast/core/FeatureViewProjection_pb2.pyi

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ else:
 DESCRIPTOR: google.protobuf.descriptor.FileDescriptor

 class FeatureViewProjection(google.protobuf.message.Message):
-    """A projection to be applied on top of a FeatureView.
+    """A projection to be applied on top of a FeatureView.
     Contains the modifications to a FeatureView such as the features subset to use.
     """

sdk/python/feast/protos/feast/core/Project_pb2.pyi

Lines changed: 13 additions & 13 deletions
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
- * Copyright 2020 The Feast Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
+
+ * Copyright 2020 The Feast Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
  * limitations under the License.
 """
 import builtins

sdk/python/feast/protos/feast/core/Registry_pb2.pyi

Lines changed: 13 additions & 13 deletions
@@ -1,19 +1,19 @@
 """
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
-
- * Copyright 2020 The Feast Authors
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * https://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
+
+ * Copyright 2020 The Feast Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
  * limitations under the License.
 """
 import builtins
