Commit bf99640

feat: Elasticsearch vector database (feast-dev#4188)
1 parent 37f36b6 commit bf99640

15 files changed

Lines changed: 478 additions & 9 deletions

Makefile

Lines changed: 19 additions & 0 deletions
@@ -310,6 +310,25 @@ test-python-universal-cassandra-no-cloud-providers:
 	  not test_snowflake" \
 	sdk/python/tests
 
+test-python-universal-elasticsearch-online:
+	PYTHONPATH='.' \
+	FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.elasticsearch_repo_configuration \
+	PYTEST_PLUGINS=sdk.python.tests.integration.feature_repos.universal.online_store.elasticsearch \
+	python -m pytest -n 8 --integration \
+	-k "not test_universal_cli and \
+	  not test_go_feature_server and \
+	  not test_feature_logging and \
+	  not test_reorder_columns and \
+	  not test_logged_features_validation and \
+	  not test_lambda_materialization_consistency and \
+	  not test_offline_write and \
+	  not test_push_features_to_offline_store and \
+	  not gcs_registry and \
+	  not s3_registry and \
+	  not test_universal_types and \
+	  not test_snowflake" \
+	sdk/python/tests
+
 test-python-universal:
 	python -m pytest -n 8 --integration sdk/python/tests

docs/reference/alpha-vector-database.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Below are supported vector databases and implemented features:
 | Vector Database | Retrieval | Indexing |
 |-----------------|-----------|----------|
 | Pgvector        | [x]       | [ ]      |
-| Elasticsearch   | [ ]       | [ ]      |
+| Elasticsearch   | [x]       | [x]      |
 | Milvus          | [ ]       | [ ]      |
 | Faiss           | [ ]       | [ ]      |

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
+# ElasticSearch online store (contrib)
+
+## Description
+
+The ElasticSearch online store provides support for materializing tabular feature values, as well as embedding feature vectors, into an ElasticSearch index for serving online features. \
+The embedding feature vectors are stored as dense vectors, and can be used for similarity search. More information on dense vectors can be found [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html).
+
+## Getting started
+In order to use this online store, you'll need to run `pip install 'feast[elasticsearch]'`. You can get started by then running `feast init -t elasticsearch`.
+
+## Example
+
+{% code title="feature_store.yaml" %}
+```yaml
+project: my_feature_repo
+registry: data/registry.db
+provider: local
+online_store:
+    type: elasticsearch
+    host: ES_HOST
+    port: ES_PORT
+    user: ES_USERNAME
+    password: ES_PASSWORD
+    vector_len: 512
+    write_batch_size: 1000
+```
+{% endcode %}
+
+The full set of configuration options is available in [ElasticsearchOnlineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.online_stores.contrib.elasticsearch.ElasticsearchOnlineStoreConfig).
+
+## Functionality Matrix
+
+
+|                                                           | Elasticsearch |
+| :-------------------------------------------------------- | :------------ |
+| write feature values to the online store                  | yes           |
+| read feature values from the online store                 | yes           |
+| update infrastructure (e.g. tables) in the online store   | yes           |
+| teardown infrastructure (e.g. tables) in the online store | yes           |
+| generate a plan of infrastructure changes                 | no            |
+| support for on-demand transforms                          | yes           |
+| readable by Python SDK                                    | yes           |
+| readable by Java                                          | no            |
+| readable by Go                                            | no            |
+| support for entityless feature views                      | yes           |
+| support for concurrent writing to the same key            | no            |
+| support for ttl (time to live) at retrieval               | no            |
+| support for deleting expired data                         | no            |
+| collocated by feature view                                | yes           |
+| collocated by feature service                             | no            |
+| collocated by entity key                                  | no            |
+
+To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
+
+## Retrieving online document vectors
+
+The ElasticSearch online store supports retrieving document vectors for a given list of entity keys. The document vectors are returned as a dictionary where the key is the entity key and the value is the document vector. The document vector is a dense vector of floats.
+
+{% code title="python" %}
+```python
+from feast import FeatureStore
+
+# repo_path is the directory that contains feature_store.yaml
+feature_store = FeatureStore(repo_path=".")
+
+query_vector = [1.0, 2.0, 3.0, 4.0, 5.0]
+top_k = 5
+
+# Retrieve the top k closest features to the query vector
+
+feature_values = feature_store.retrieve_online_documents(
+    feature="my_feature",
+    query=query_vector,
+    top_k=top_k
+)
+```
+{% endcode %}
+
+## Indexing
+Currently, the indexing mapping in the ElasticSearch online store is configured as:
+
+{% code title="indexing_mapping" %}
+```json
+"properties": {
+    "entity_key": {"type": "binary"},
+    "feature_name": {"type": "keyword"},
+    "feature_value": {"type": "binary"},
+    "timestamp": {"type": "date"},
+    "created_ts": {"type": "date"},
+    "vector_value": {
+        "type": "dense_vector",
+        "dims": config.online_store.vector_len,
+        "index": "true",
+        "similarity": config.online_store.similarity,
+    },
+}
+```
+{% endcode %}
+And the online_read API mapping is configured as:
+
+{% code title="online_read_mapping" %}
+```json
+"query": {
+    "bool": {
+        "must": [
+            {"terms": {"entity_key": entity_keys}},
+            {"terms": {"feature_name": requested_features}},
+        ]
+    }
+},
+```
+{% endcode %}
+
+And the similarity search API mapping is configured as:
+
+{% code title="similarity_search_mapping" %}
+```json
+{
+    "field": "vector_value",
+    "query_vector": embedding_vector,
+    "k": top_k,
+}
+```
+{% endcode %}
+
+These APIs are subject to change in future versions of Feast to improve performance and usability.
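
The three request bodies documented in the new Elasticsearch page can be pictured as plain Python dicts. The sketch below only assembles those dicts and sends nothing to a cluster. The helper names (`build_index_mapping`, `build_online_read_query`, `build_knn_query`) and the sample values are hypothetical and not part of Feast, and entity keys are assumed to be already serialized to strings.

```python
import json
from typing import Any, Dict, List


def build_index_mapping(vector_len: int, similarity: str) -> Dict[str, Any]:
    """Mirror the documented index mapping (hypothetical helper)."""
    return {
        "properties": {
            "entity_key": {"type": "binary"},
            "feature_name": {"type": "keyword"},
            "feature_value": {"type": "binary"},
            "timestamp": {"type": "date"},
            "created_ts": {"type": "date"},
            "vector_value": {
                "type": "dense_vector",
                "dims": vector_len,        # config.online_store.vector_len in the docs
                "index": "true",           # kept exactly as written in the docs
                "similarity": similarity,  # config.online_store.similarity in the docs
            },
        }
    }


def build_online_read_query(
    entity_keys: List[str], requested_features: List[str]
) -> Dict[str, Any]:
    """Mirror the documented online_read query: match both keys and feature names."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"entity_key": entity_keys}},
                    {"terms": {"feature_name": requested_features}},
                ]
            }
        }
    }


def build_knn_query(embedding_vector: List[float], top_k: int) -> Dict[str, Any]:
    """Mirror the documented similarity-search body."""
    return {"field": "vector_value", "query_vector": embedding_vector, "k": top_k}


# Sample values are assumptions chosen for illustration only.
mapping = build_index_mapping(vector_len=512, similarity="cosine")
read_query = build_online_read_query(["ZW50aXR5LTE="], ["my_feature"])
knn_query = build_knn_query([1.0, 2.0, 3.0, 4.0, 5.0], top_k=5)
print(json.dumps(knn_query))
```

Building the bodies as dicts keeps them serializable with `json.dumps`, which makes it easy to inspect what would be sent before wiring in a real client.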

sdk/python/feast/feature_store.py

Lines changed: 3 additions & 3 deletions
@@ -1886,7 +1886,7 @@ def retrieve_online_documents(
         feature: str,
         query: Union[str, List[float]],
         top_k: int,
-        distance_metric: str,
+        distance_metric: Optional[str] = None,
     ) -> OnlineResponse:
         """
         Retrieves the top k closest document features. Note, embeddings are a subset of features.
@@ -1911,7 +1911,7 @@ def _retrieve_online_documents(
         feature: str,
         query: Union[str, List[float]],
         top_k: int,
-        distance_metric: str = "L2",
+        distance_metric: Optional[str] = None,
     ):
         if isinstance(query, str):
             raise ValueError(
@@ -2209,7 +2209,7 @@ def _retrieve_from_online_store(
         requested_feature: str,
         query: List[float],
         top_k: int,
-        distance_metric: str,
+        distance_metric: Optional[str],
     ) -> List[Tuple[Timestamp, "FieldStatus.ValueType", Value, Value, Value]]:
         """
         Search and return document features from the online document store.
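
The signature change above makes `distance_metric` optional, which suits a store such as Elasticsearch where the similarity function is fixed in the index mapping. One way a store could treat the `None` case is sketched below. `resolve_distance_metric` and `store_default` are hypothetical names used only for illustration; this is not Feast's implementation.

```python
from typing import Optional


def resolve_distance_metric(distance_metric: Optional[str], store_default: str) -> str:
    """Return the caller's metric, or fall back to the store-level default.

    Hypothetical helper: a store whose similarity is configured at index time
    could ignore or default the per-query metric in this way.
    """
    return store_default if distance_metric is None else distance_metric


# The caller omits the metric, so the store-level default applies.
print(resolve_distance_metric(None, store_default="cosine"))
# The caller passes an explicit metric, which takes precedence.
print(resolve_distance_metric("L2", store_default="cosine"))
```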
