diff --git a/infra/website/docs/blog/rag-with-feast.md b/infra/website/docs/blog/rag-with-feast.md
new file mode 100644
index 00000000000..349127c2d0d
--- /dev/null
+++ b/infra/website/docs/blog/rag-with-feast.md
@@ -0,0 +1,248 @@
+---
+title: Retrieval Augmented Generation with Feast
+description: How Feast empowers ML Engineers to ship RAG applications to Production.
+date: 2025-03-17
+authors: ["Francisco Javier Arceo"]
+---
+
+
+## Why Feature Stores Make Sense for GenAI and RAG
+
+Feature stores have been developed over the [past decade](./what-is-a-feature-store) to address the challenges AI
+practitioners face in managing, serving, and scaling machine learning models in production.
+
+Some of the key challenges include:
+* Accessing the right raw data
+* Building features from raw data
+* Combining features into training data
+* Calculating and serving features in production
+* Monitoring features in production
+
+And Feast was specifically designed to address these challenges.
+
+These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training: in
+GenAI applications, the foundation model is typically pre-trained, and the focus is on fine-tuning it or consuming it
+as an endpoint from a provider (e.g., OpenAI, Anthropic).
+
+For GenAI use cases, feature stores enable the efficient management of context and metadata, both during
+training/fine-tuning and at inference time.
+
+By using a feature store for your application, you can treat the LLM context, including the prompt,
+as features. This means you can manage not only input context, document processing, data formatting, tokenization,
+chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency,
+transparency, and reproducibility across models and iterations.
+
+With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and
+online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI
+applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex
+AI applications and enterprise needs.
+
+## Feast Now Supports RAG
+
+With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support
+for vector similarity search to power retrieval augmented generation (RAG) systems in production.
+
+
+This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI
+applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your
+production RAG applications through our scalable transformation systems (streaming, request-time, and batch).
+
+## Retrieval Augmented Generation (RAG)
+[RAG](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that combines generative models
+(e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question
+answering).
+
+The typical RAG process involves:
+1. Sourcing text data relevant for your application
+2. Transforming each text document into smaller chunks of text
+3. Transforming those chunks of text into embeddings
+4. Inserting those chunks of text along with some identifier for the chunk and document in some database
+5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context
+6. Calling some API to run inference with your LLM to generate contextually relevant output
+7. Returning the output to some end user
+
+Implicit in steps (1)-(4) is the potential need to scale to large amounts of data (i.e., using some form of distributed
+computing), to orchestrate that scaling through a batch or streaming pipeline, and to customize key transformation
+decisions (e.g., tokenization, model, chunking, data formatting).
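As a concrete illustration, steps (1)-(4) can be sketched in a few lines of plain Python. Note that `chunk` uses naive fixed-size splitting and `embed` is a toy, deterministic stand-in for a real embedding model (e.g., a sentence-transformers model); both are hypothetical helpers, not part of the Feast API:

```python
import hashlib

def chunk(text: str, size: int = 50) -> list[str]:
    """Step 2: split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str, dim: int = 8) -> list[float]:
    """Step 3: toy deterministic 'embedding' derived from a hash.

    A real system would call an embedding model here.
    """
    digest = hashlib.sha256(chunk_text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# Step 4: index each chunk under (document_id, chunk_id) identifiers.
index: dict[tuple[int, int], dict] = {}
docs = ["Feast is an open source feature store for production ML."]
for doc_id, doc in enumerate(docs):
    for chunk_id, text in enumerate(chunk(doc)):
        index[(doc_id, chunk_id)] = {"text": text, "vector": embed(text)}
```

At retrieval time (step 5), the stored vectors are compared against a query embedding, and the text of the closest chunks is injected into the LLM's context.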
+
+## Powering Retrieval in Production
+To power the Retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing,
+and serving web requests from an API.
+
+Building highly available software that can handle these requirements and scale as your data grows is a
+non-trivial task. This is a strength of Feast: the combination of Kubernetes, large-scale data frameworks like
+Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server is
+a powerful one.
+
+## Beyond Vector Similarity Search
+RAG patterns often use vector similarity search for the retrieval step, but this is not the
+only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for
+applications where relevant user context is necessary.
+
+For example, many RAG applications are customer chat bots, which benefit significantly from user data (e.g.,
+account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data
+using its existing entity-based retrieval patterns.
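As a minimal sketch of this idea, user context fetched by entity key can be combined with retrieved documents when assembling the LLM prompt (the field names and helper below are hypothetical, not part of the Feast API):

```python
def build_prompt(user_features: dict, retrieved_chunks: list[str], question: str) -> str:
    """Combine entity-based user context with retrieved documents into one prompt."""
    context = "\n".join(retrieved_chunks)
    return (
        f"User account balance: {user_features['account_balance']}\n"
        f"User location: {user_features['location']}\n"
        f"Relevant documents:\n{context}\n"
        f"Question: {question}"
    )

# In a real service, user_features would come from the online store by entity key.
prompt = build_prompt(
    {"account_balance": 120.50, "location": "NYC"},
    ["Refunds are processed within 5 business days."],
    "Where is my refund?",
)
```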
+
+## The Benefits of Feast
+Fine-tuning is the holy grail for optimizing your RAG systems, and by logging the documents and context retrieved
+during inference, you can fine-tune both the generator and *the retriever* for your
+particular needs.
+
+This means that Feast can help you not only serve your documents, user data, and other metadata for production
+RAG applications, but also scale your embedding generation over large amounts of data (e.g., using Spark to embed
+gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and
+RAG sources to provide replayability and data lineage, and prepare your datasets so you can fine-tune your
+embedding, retrieval, or generator models later.
+
+Historically, Feast catered to Data Scientists and ML Engineers who implemented their own data/feature
+transformations, but many RAG providers now handle this out of the box. We will invest in creating extensible
+implementations to make it easier to ship your applications.
+
+## Feast Powered by Milvus
+
+[Milvus](https://milvus.io/) is a high performance open source vector database that provides a powerful and efficient way to store
+and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale
+your retrieval systems on Kubernetes using the Feast Operator or the [Feature Server Helm Chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server).
+
+This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded
+documents into Milvus and retrieving the most similar documents for a given query embedding.
+
+This example consists of 5 steps:
+1. Configuring Milvus
+2. Defining your Data Sources and Views
+3. Updating your Registry
+4. Ingesting the Data
+5. Retrieving the Data
+
+The full demo is available on our [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag).
+
+### Step 1: Configure Milvus
+Configure Milvus in a simple `feature_store.yaml` file.
+```yaml
+project: rag
+provider: local
+registry: data/registry.db
+online_store:
+  type: milvus
+  path: data/online_store.db
+  vector_enabled: true
+  embedding_dim: 384
+  index_type: "IVF_FLAT"
+
+offline_store:
+  type: file
+entity_key_serialization_version: 3
+# Authentication defaults to no_auth; other options are kubernetes and oidc. See the documentation for details.
+auth:
+  type: no_auth
+```
+
+### Step 2: Define your Data Sources and Views
+You define your data declaratively using Feast's `FeatureView` and `Entity` objects, which are meant to be an easy way
+to give your software engineers and data scientists a common language to define data they want to ship to production.
+
+Here is an example of how you might define a `FeatureView` for document retrieval. Notice how we define the `vector`
+field and enable vector search by setting `vector_index=True` and the distance metric to `COSINE`.
+
+That's it; the rest of the implementation is handled for you by Feast and Milvus.
+
+```python
+from datetime import timedelta
+
+from feast import Entity, FeatureView, Field, FileSource, ValueType
+from feast.data_format import ParquetFormat
+from feast.types import Array, Float32, String
+
+document = Entity(
+    name="document_id",
+    description="Document ID",
+    value_type=ValueType.INT64,
+)
+
+source = FileSource(
+    file_format=ParquetFormat(),
+    path="./data/my_data.parquet",
+    timestamp_field="event_timestamp",
+)
+
+# Define the view for retrieval
+city_embeddings_feature_view = FeatureView(
+    name="city_embeddings",
+    entities=[document],
+    schema=[
+        Field(
+            name="vector",
+            dtype=Array(Float32),
+            vector_index=True,  # Vector search enabled
+            vector_search_metric="COSINE",  # Distance metric configured
+        ),
+        Field(name="state", dtype=String),
+        Field(name="sentence_chunks", dtype=String),
+        Field(name="wiki_summary", dtype=String),
+    ],
+    source=source,
+    ttl=timedelta(hours=2),
+)
+```
+
+### Step 3: Update your Registry
+After we have defined our objects in code, we run `feast apply` in the same folder as the `feature_store.yaml` file to
+update the registry with our metadata.
+```bash
+feast apply
+```
+
+### Step 4: Ingest your Data
+Now that we have defined our metadata, we can ingest our data into Milvus using the following code:
+```python
+from feast import FeatureStore
+
+store = FeatureStore(repo_path=".")
+# df is a pandas DataFrame matching the city_embeddings schema defined above
+store.write_to_online_store(feature_view_name='city_embeddings', df=df)
+```
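For illustration, `df` here would be a pandas DataFrame whose columns match the `city_embeddings` schema defined earlier (the sample values below are made up):

```python
from datetime import datetime, timezone

import pandas as pd

df = pd.DataFrame(
    {
        "document_id": [0, 1],
        "vector": [[0.1] * 384, [0.2] * 384],  # must match embedding_dim in feature_store.yaml
        "state": ["New York", "California"],
        "sentence_chunks": [
            "New York City is the most populous city in the USA.",
            "Los Angeles is the second most populous city.",
        ],
        "wiki_summary": ["Summary of New York.", "Summary of California."],
        "event_timestamp": [datetime.now(timezone.utc)] * 2,
    }
)
```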
+
+### Step 5: Retrieve your Data
+Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to
+retrieve the most similar documents for a given query embedding.
+```python
+context_data = store.retrieve_online_documents_v2(
+    features=[
+        "city_embeddings:vector",
+        "city_embeddings:document_id",
+        "city_embeddings:state",
+        "city_embeddings:sentence_chunks",
+        "city_embeddings:wiki_summary",
+    ],
+    query=query_embedding,
+    top_k=3,
+    distance_metric='COSINE',
+).to_df()
+```
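Once the most similar chunks come back, injecting them into the model's context (steps 5 and 6 of the RAG process) can be as simple as string formatting; the actual LLM call is provider-specific and omitted here, and `format_context` is a hypothetical helper:

```python
def format_context(chunks: list[str]) -> str:
    """Join retrieved chunks, most similar first, into one context block."""
    return "\n\n".join(chunks)

# In a real service these strings would come from the sentence_chunks column above.
chunks = [
    "New York City is the most populous city in the United States.",
    "Its metropolitan area extends into New Jersey and Connecticut.",
]
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{format_context(chunks)}\n\n"
    "Question: What is the biggest city in the USA?"
)
```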
+
+### The Benefits from using Feast for RAG
+We've discussed some of the high-level benefits from using Feast for a RAG application.
+More specifically, here are some of the concrete benefits you can expect from using Feast for RAG:
+1. [Real-time, Stream, and Batch data Ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion) support to the Feature Server for online retrieval
+2. [Data dictionary/metadata catalog](https://docs.feast.dev/getting-started/components/registry) autogenerated from code
+3. [UI exposing the metadata catalog](https://docs.feast.dev/reference/alpha-web-ui)
+4. [FastAPI Server](https://docs.feast.dev/getting-started/components/feature-server) to serve your data
+5. [Role Based Access Control (RBAC)](https://docs.feast.dev/getting-started/concepts/permission) to govern access to your data
+6. [Deployment on Kubernetes](https://docs.feast.dev/how-to-guides/running-feast-in-production) using our Helm Chart or our Operator
+7. [Multiple vector database providers](https://docs.feast.dev/reference/alpha-vector-database)
+8. [Multiple data warehouse providers](https://docs.feast.dev/reference/offline-stores/overview#functionality-matrix)
+9. Support for different [data sources](https://docs.feast.dev/reference/data-sources/overview#functionality-matrix)
+10. Support for stream and [batch processors (e.g., Spark and Flink)](https://docs.feast.dev/tutorials/building-streaming-features)
+
+And more!
+
+## The Future of Feast and GenAI
+
+Feast will continue to invest in GenAI use cases.
+
+In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for
+transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end users to
+more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration,
+and (6) making [Milvus](https://milvus.io/intro) a fully supported and core online store for RAG.
+
+## Join the Conversation
+
+Are you interested in learning more about how Feast can help you build and deploy RAG applications to production?
+Reach out to us on Slack or [GitHub](https://github.com/feast-dev/feast); we'd love to hear from you!
\ No newline at end of file
diff --git a/infra/website/public/images/blog/milvus-rag.png b/infra/website/public/images/blog/milvus-rag.png
new file mode 100644
index 00000000000..702c94e6b5b
Binary files /dev/null and b/infra/website/public/images/blog/milvus-rag.png differ
diff --git a/infra/website/public/images/blog/space.jpg b/infra/website/public/images/blog/space.jpg
new file mode 100644
index 00000000000..b0519eee4da
Binary files /dev/null and b/infra/website/public/images/blog/space.jpg differ
diff --git a/infra/website/src/pages/index.astro b/infra/website/src/pages/index.astro
index fe5b03d8e43..f64ca6388dc 100644
--- a/infra/website/src/pages/index.astro
+++ b/infra/website/src/pages/index.astro
@@ -33,7 +33,19 @@ features = store.get_online_features(
"product_features:price"
],
entity_rows=[{"customer_id": "C123", "product_id": "P456"}]
-).to_dict()`;
+).to_dict()
+
+# Retrieve your documents using vector similarity search for RAG
+features = store.retrieve_online_documents(
+ features=[
+ "corpus:document_id",
+ "corpus:chunk_id",
+ "corpus:chunk_text",
+ "corpus:chunk_embedding",
+ ],
+ query="What is the biggest city in the USA?"
+).to_dict()
+`;
---
@@ -42,7 +54,7 @@ features = store.get_online_features(
-
Feature Serving for Production AI
+
Serving Data for Production AI
Feast is an open source feature store that delivers structured data to AI and LLM applications at high scale during training and inference