diff --git a/infra/website/docs/blog/mongodb-feast-integration.md b/infra/website/docs/blog/mongodb-feast-integration.md
new file mode 100644
index 00000000000..05e3764d083
--- /dev/null
+++ b/infra/website/docs/blog/mongodb-feast-integration.md
@@ -0,0 +1,210 @@
+---
+title: "Native MongoDB Support in Feast: One Database for Operational Data, Features, and Vectors"
+description: Feast now ships first-class support for **MongoDB** as both an online and an offline store, plus native **Vector Search** for embedding-based retrieval. Machine Learning teams running on MongoDB can serve features at low latency, generate point-in-time-correct training datasets, and power RAG or recommender workloads, all from a single MongoDB Atlas cluster, with no separate cache, no separate warehouse, and no parallel vector database to keep in sync.
+date: 2026-05-07
+authors: ["Rishabh Bisht"]
+---
+
+
+
+[Figure: MongoDB Feast Stores]
+
+```
+online_store:
+  type: mongodb
+  connection_string: "mongodb+srv://:@.mongodb.net"
+  database: "feast_online"
+
+offline_store:
+  type: mongodb
+  connection_string: "mongodb+srv://:@.mongodb.net"
+  database: "feast_offline"
+
+entity_key_serialization_version: 3
+```
+
+### **Define a feature view backed by `MongoDBSource`**
+
+```py
+from datetime import timedelta
+from feast import Entity, FeatureView, Field
+from feast.types import Float32, Int64
+from feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb_source import (
+    MongoDBSource,
+)
+
+# Entity keyed on driver_id
+driver = Entity(name="driver", join_keys=["driver_id"])
+
+# Offline source: the driver_stats collection in the feast_offline database
+driver_stats_source = MongoDBSource(
+    name="driver_stats_source",
+    database="feast_offline",
+    collection="driver_stats",
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created",
+)
+
+driver_stats_fv = FeatureView(
+    name="driver_hourly_stats",
+    entities=[driver],
+    ttl=timedelta(days=7),
+    schema=[
+        Field(name="conv_rate", dtype=Float32),
+        Field(name="acc_rate", dtype=Float32),
+        Field(name="avg_daily_trips", dtype=Int64),
+    ],
+    online=True,
+    source=driver_stats_source,
+)
+```
+
+### **Apply, materialize, and serve**
+
+```shell
+# Register the definitions, then load features up to the current UTC time into the online store
+feast apply
+feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
+```
+
+```py
+from feast import FeatureStore
+
+store = FeatureStore(repo_path=".")
+
+# Look up the latest feature values for two drivers
+features = store.get_online_features(
+    features=[
+        "driver_hourly_stats:conv_rate",
+        "driver_hourly_stats:acc_rate",
+        "driver_hourly_stats:avg_daily_trips",
+    ],
+    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
+).to_dict()
+```
+
+That's it. Same connection string, same auth model, same cluster \- features in, features out.
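The `.to_dict()` call above returns a column-oriented mapping: each key is a feature or join-key name, and each value is a list with one entry per entity row. Serving code often wants one record per entity instead. The following is a minimal sketch of that pivot; the `rows_from_columns` helper and the sample values are hypothetical and are not part of Feast.

```python
from typing import Any


def rows_from_columns(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Pivot a column-oriented mapping into one dict per entity row."""
    names = list(columns)
    # zip(*...) walks the per-column lists in lockstep, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*(columns[n] for n in names))]


# Hypothetical values standing in for the result of `.to_dict()`.
features = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.5, 0.25],
    "acc_rate": [0.75, 0.5],
    "avg_daily_trips": [12, 7],
}

rows = rows_from_columns(features)
```

Each element of `rows` then carries the join key and the feature values for a single driver, which is usually the shape a model-serving handler expects.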
+
+## **RAG and embeddings: vector search in the same cluster**
+
+If you're building a RAG pipeline, a recommender, or an agent that needs nearest-neighbor lookup over feature embeddings, the online store doubles as a vector store when `vector_enabled` is set:
+
+```
+online_store:
+  type: mongodb
+  connection_string: "mongodb+srv://:@.mongodb.net"
+  database: "feast_online"
+  vector_enabled: true
+  vector_index_wait_timeout: 60
+  vector_index_wait_poll_interval: 2
+```
+
+Mark the embedding field on your `FeatureView`:
+
+```py
+from feast import FeatureView, Field
+from feast.types import Array, Float32, Int64, String, UnixTimestamp
+
+# `item` (an Entity) and `rag_documents_source` (a data source) are assumed
+# to be defined elsewhere in the feature repo.
+document_embeddings = FeatureView(
+    name="embedded_documents",
+    entities=[item],
+    schema=[
+        Field(
+            name="vector",
+            dtype=Array(Float32),
+            vector_index=True,  # ← enable vector index
+            vector_search_metric="COSINE",  # cosine | dot product | euclidean
+        ),
+        Field(name="item_id", dtype=Int64),
+        Field(name="sentence_chunks", dtype=String),
+        Field(name="event_timestamp", dtype=UnixTimestamp),
+    ],
+    source=rag_documents_source,
+)
+```
+
+When you run `feast apply`, Feast creates the corresponding Atlas vector search index. When the feature view is removed, the index is dropped. The `vector_index_wait_timeout` and `vector_index_wait_poll_interval` settings control how long Feast waits for a newly created index to become queryable before returning, and how often it checks.
+
+Querying nearest neighbors is then one call:
+
+```py
+results = store.retrieve_online_documents_v2(
+    features=[
+        "embedded_documents:vector",
+        "embedded_documents:item_id",
+        "embedded_documents:sentence_chunks",
+    ],
+    query=query_embedding,  # list[float] with the same dimension as the indexed vectors
+    top_k=5,
+    distance_metric="COSINE",
+).to_df()
+```
+
+Under the hood, this becomes a `$vectorSearch` aggregation against your Atlas cluster \- no second system to provision, no vector data to keep in sync with the rest of your features.
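For readers who have not used `$vectorSearch` directly, here is a rough sketch of what such an aggregation pipeline can look like when written by hand. It is only an illustration: the pipeline Feast actually generates, the index name it uses, and how it lays out documents are not shown in this post, so `build_vector_search_pipeline`, the `vector_index` index name, the `vector` path, and the projected field names are all assumptions.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    top_k: int,
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "vector",  # hypothetical document field holding the embedding
    candidates_per_result: int = 10,  # oversampling factor, chosen arbitrarily here
) -> list[dict[str, Any]]:
    """Build a two-stage aggregation: approximate nearest-neighbor search, then projection."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results to improve recall.
                "numCandidates": top_k * candidates_per_result,
                "limit": top_k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "item_id": 1,  # hypothetical stored field names
                "sentence_chunks": 1,
                # Similarity score computed by the $vectorSearch stage.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], top_k=5)
```

With a PyMongo collection in hand, a pipeline of this shape would be passed to `collection.aggregate(pipeline)`. Going through `retrieve_online_documents_v2` instead means you do not have to build or maintain it yourself.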
+
+## **Why this matters**
+
+A few reasons we think this lands in the right place for ML teams already on MongoDB:
+
+* **One database for training and inference.** The same Atlas cluster powers historical retrieval, materialization, and online serving. No ETL pipelines pushing features from a warehouse. Update a feature once, see it everywhere.
+* **One security and compliance posture.** Atlas networking, IAM, encryption, and audit logging cover both halves of the feature store. Architects don't have to add a new database vendor and a new threat model to say yes to ML.
+* **Vector and operational data colocated.** For RAG, recommenders, and agents, the embeddings live next to the entity data they describe. Filter your vector search on operational fields with the same query language you already use.
+* **Flexible schema where it helps.** Feature engineering is iterative. MongoDB's document model means adding a field to a feature view doesn't require a schema migration on day one.
+* **Async serving when you need it.** The online store ships a native async path on `AsyncMongoClient`, so feature lookups don't block the rest of your serving stack.
+
+## **Where to next**
+
+* **Online store reference:** [Feast docs \- MongoDB online store](https://docs.feast.dev/master/reference/online-stores/mongodb)
+* **Offline store reference:** [Feast docs \- MongoDB offline store](https://docs.feast.dev/master/reference/offline-stores/mongodb)
+* **Vector search:** [Feast docs \- Vector Search](https://docs.feast.dev/master/reference/data-sources/mongodb#vector-search)
+* **Tutorial:** [Integrate MongoDB with Feast](https://www.mongodb.com/docs/atlas/ai-integrations/feast/)
+
+If you're already on MongoDB and want to standardize your ML stack on a single backend, this is the time to try it. Spin up a feature repo, point both stores at your cluster, and let us know how it goes \- issues and PRs welcome on GitHub.
\ No newline at end of file
diff --git a/infra/website/public/images/blog/mongodb-feature-stores.png b/infra/website/public/images/blog/mongodb-feature-stores.png
new file mode 100644
index 00000000000..c0705834dc9
Binary files /dev/null and b/infra/website/public/images/blog/mongodb-feature-stores.png differ