diff --git a/infra/website/docs/blog/rag-with-feast.md b/infra/website/docs/blog/rag-with-feast.md new file mode 100644 index 00000000000..349127c2d0d --- /dev/null +++ b/infra/website/docs/blog/rag-with-feast.md @@ -0,0 +1,248 @@ +--- +title: Retrieval Augmented Generation with Feast +description: How Feast empowers ML Engineers to ship RAG applications to Production. +date: 2025-03-17 +authors: ["Francisco Javier Arceo"] +--- + +
+ Exploring the Possibilities of AI +
+ + +## Why Feature Stores Make Sense for GenAI and RAG + +Feature stores have been developed over the [past decade](./what-is-a-feature-store) to address the challenges AI +practitioners face in managing, serving, and scaling machine learning models in production. + +Some of the key challenges include: +* Accessing the right raw data +* Building features from raw data +* Combining features into training data +* Calculating and serving features in production +* Monitoring features in production + +And Feast was specifically designed to address these challenges. + +These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In +GenAI applications, the foundation model is typically pre-trained and the focus is on fine-tuning or using the model simply as +an endpoint from some provider (e.g., OpenAI, Anthropic, etc.). + +For GenAI use cases, feature stores enable the efficient management of context and metadata, both during +training/fine-tuning and at inference time. + +By using a feature store for your application, you have the ability to treat the LLM context, including the prompt, +as features. This means you can manage not only input context, document processing, data formatting, tokenization, +chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency, +transparency, and reproducibility across models and iterations. + +With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and +online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI +applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex +AI applications and enterprise needs. + +## Feast Now Supports RAG + +With the rise of generative AI applications, the need to serve vectors has grown quickly. 
Feast now has alpha support +for vector similarity search to power retrieval augmented generation (RAG) systems in production. + +
+ Retrieval Augmented Generation with Milvus and Feast +
+ +This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI +applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your +production RAG applications through our scalable transformation systems (streaming, request-time, and batch). + +## Retrieval Augmented Generation (RAG) +[RAG](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that combines generative models +(e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., +question answering). + +The typical RAG process involves: +1. Sourcing text data relevant for your application +2. Transforming each text document into smaller chunks of text +3. Transforming those chunks of text into embeddings +4. Inserting those embeddings and chunks of text, along with identifiers for the chunk and document, into a database +5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context +6. Calling some API to run inference with your LLM to generate contextually relevant output +7. Returning the output to some end user + +Implicit in (1)-(4) is the need to scale to large amounts of data (i.e., using some form of distributed computing), +to orchestrate that scaling through a batch or streaming pipeline, and to customize key transformation decisions +(e.g., tokenization, model, chunking, data formatting, etc.). + +## Powering Retrieval in Production +To power the Retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, +and serving web requests from an API. + +Building highly available software that can handle these requirements and scale as your data scales is a +non-trivial task. 
This is a strength of Feast: the combination of Kubernetes, large scale data frameworks like +Spark and Flink, and the ability to ingest and transform data in real-time through the Feast Feature Server is +a powerful one. + +## Beyond Vector Similarity Search +RAG patterns often use vector similarity search for the retrieval step, but this is not the +only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for +applications where relevant user context is necessary. + +For example, many RAG applications are customer chatbots, and they benefit significantly from user data (e.g., +account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data +using its existing entity-based retrieval patterns. + +## The Benefits of Feast +Fine-tuning is the holy grail for optimizing your RAG systems. By logging the documents/data and context retrieved +during inference, you can later fine-tune both the generator and *the retriever* for +your particular needs. + +This means that Feast can help you not only serve your documents, user data, and other metadata for production +RAG applications, but it can also help you scale your embeddings on large amounts of data (e.g., using Spark to embed +gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and +RAG sources to provide you with replayability and data lineage, and prepare your datasets so you can fine-tune your +embedding, retrieval, or generator models later. + +Historically, Feast catered to Data Scientists and ML Engineers who implemented their own types of data/feature transformations, but +many RAG providers now handle this out of the box for you. We will invest in creating extendable implementations to make it easier +to ship your applications. 
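
As a rough illustration of the two ideas above (combining entity-based user data with retrieved chunks, and logging what was retrieved so it can be used for fine-tuning later), here is a minimal sketch in plain Python. It does not call Feast; `RetrievalLog`, `build_prompt`, and `log_inference` are hypothetical names, and the user data and chunk are made-up stand-ins for an entity lookup and a vector search result.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RetrievalLog:
    """One logged inference call: what was asked and what context was injected."""
    user_id: str
    question: str
    user_context: dict
    retrieved_chunks: list
    prompt: str


def build_prompt(question, user_context, retrieved_chunks):
    """Combine entity-based user data and retrieved text into one LLM prompt."""
    user_lines = "\n".join(
        f"- {key}: {value}" for key, value in sorted(user_context.items())
    )
    chunk_lines = "\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Customer data:\n" + user_lines + "\n\n"
        "Retrieved documents:\n" + chunk_lines + "\n\n"
        "Question: " + question
    )


def log_inference(user_id, question, user_context, retrieved_chunks):
    """Build the prompt and return it with a JSON record kept for later fine-tuning."""
    prompt = build_prompt(question, user_context, retrieved_chunks)
    record = RetrievalLog(user_id, question, user_context, retrieved_chunks, prompt)
    return prompt, json.dumps(asdict(record))


# Hypothetical values standing in for an entity lookup and a vector search result.
user_context = {"account_balance": 120.5, "location": "Denver"}
chunks = ["Wire transfers settle within one business day."]
prompt, log_line = log_inference(
    "C123", "When will my transfer arrive?", user_context, chunks
)
```

In a real deployment, the dictionary and the chunk list would presumably come from the feature store, and each JSON line would be appended to whatever log or table you later use to build fine-tuning datasets.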
+ +## Feast Powered by Milvus + +[Milvus](https://milvus.io/) is a high performance open source vector database that provides a powerful and efficient way to store +and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale +your retrieval systems on Kubernetes using the Feast Operator or the [Feature Server Helm Chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server). + +This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded +documents into Milvus and retrieving the most similar documents for a given query embedding. + +This example consists of 5 steps: +1. Configuring Milvus +2. Defining your Data Sources and Views +3. Updating your Registry +4. Ingesting the Data +5. Retrieving the Data + +The full demo is available on our [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag). + +### Step 1: Configure Milvus +Configure Milvus in your `feature_store.yaml` file. +```yaml +project: rag +provider: local +registry: data/registry.db +online_store: + type: milvus + path: data/online_store.db + vector_enabled: true + embedding_dim: 384 + index_type: "IVF_FLAT" + +offline_store: + type: file +entity_key_serialization_version: 3 +# Authentication and authorization default to no_auth; other possible values are kubernetes and oidc. Refer to the documentation for more details. +auth: + type: no_auth +``` + +### Step 2: Define your Data Sources and Views +You define your data declaratively using Feast's `FeatureView` and `Entity` objects, which +give your software engineers and data scientists a common language to define the data they want to ship to production. + +Here is an example of how you might define a `FeatureView` for document retrieval. Notice how we define the `vector` +field and enable vector search by setting `vector_index=True` and the distance metric to `COSINE`. 
+ +That's it; the rest of the implementation is handled for you by Feast and Milvus. + +```python +from datetime import timedelta + +from feast import Entity, FeatureView, Field, FileSource, ValueType +from feast.data_format import ParquetFormat +from feast.types import Array, Float32, String + +# Entity that keys each document +document = Entity( + name="document_id", + description="Document ID", + value_type=ValueType.INT64, +) + +source = FileSource( + file_format=ParquetFormat(), + path="./data/my_data.parquet", + timestamp_field="event_timestamp", +) + +# Define the view for retrieval +city_embeddings_feature_view = FeatureView( + name="city_embeddings", + entities=[document], + schema=[ + Field( + name="vector", + dtype=Array(Float32), + vector_index=True, # Vector search enabled + vector_search_metric="COSINE", # Distance metric configured + ), + Field(name="state", dtype=String), + Field(name="sentence_chunks", dtype=String), + Field(name="wiki_summary", dtype=String), + ], + source=source, + ttl=timedelta(hours=2), +) +``` + +### Step 3: Update your Registry +After defining our code, we run the `feast apply` command in the same folder as the `feature_store.yaml` file to +update the registry with our metadata. +```bash +feast apply +``` + +### Step 4: Ingest your Data +Now that we have defined our metadata, we can ingest our data into Milvus using the following code: +```python +from feast import FeatureStore + +# Assumes this runs from the folder that contains feature_store.yaml +store = FeatureStore(repo_path=".") + +# df is a pandas DataFrame whose columns match the schema defined in Step 2 +store.write_to_online_store(feature_view_name='city_embeddings', df=df) +``` + +### Step 5: Retrieve your Data +Now that the data is stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to +retrieve the most similar documents for a given query embedding. +```python +# query_embedding is the embedding of the user's question (384 dimensions, matching embedding_dim in Step 1) +context_data = store.retrieve_online_documents_v2( + features=[ + "city_embeddings:vector", + "city_embeddings:document_id", + "city_embeddings:state", + "city_embeddings:sentence_chunks", + "city_embeddings:wiki_summary", + ], + query=query_embedding, + top_k=3, + distance_metric='COSINE', +).to_df() +``` + +### The Benefits of Using Feast for RAG +We've discussed some of the high-level benefits from using Feast for a RAG application. 
+More specifically, here are some of the concrete benefits you can expect from using Feast for RAG: +1. [Real-time, Stream, and Batch data Ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion) support to the Feature Server for online retrieval +2. [Data dictionary/metadata catalog](https://docs.feast.dev/getting-started/components/registry) autogenerated from code +3. [UI exposing the metadata catalog](https://docs.feast.dev/reference/alpha-web-ui) +4. [FastAPI Server](https://docs.feast.dev/getting-started/components/feature-server) to serve your data +5. [Role Based Access Control (RBAC)](https://docs.feast.dev/getting-started/concepts/permission) to govern access to your data +6. [Deployment on Kubernetes](https://docs.feast.dev/how-to-guides/running-feast-in-production) using our Helm Chart or our Operator +7. [Multiple vector database providers](https://docs.feast.dev/reference/alpha-vector-database) +8. [Multiple data warehouse providers](https://docs.feast.dev/reference/offline-stores/overview#functionality-matrix) +9. Support for different [data sources](https://docs.feast.dev/reference/data-sources/overview#functionality-matrix) +10. Support for stream and [batch processors (e.g., Spark and Flink)](https://docs.feast.dev/tutorials/building-streaming-features) + +And more! + +## The Future of Feast and GenAI + +Feast will continue to invest in GenAI use cases. + +In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for +transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI focused feature server to allow our end-users to +more easily ship RAG to production, (5) an out of the box chat UI meant for internal development and fast iteration, +and (6) making [Milvus](https://milvus.io/intro) a fully supported and core online store for RAG. 
+ +## Join the Conversation + +Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? +Reach out to us on Slack or [GitHub](https://github.com/feast-dev/feast), we'd love to hear from you! \ No newline at end of file diff --git a/infra/website/public/images/blog/milvus-rag.png b/infra/website/public/images/blog/milvus-rag.png new file mode 100644 index 00000000000..702c94e6b5b Binary files /dev/null and b/infra/website/public/images/blog/milvus-rag.png differ diff --git a/infra/website/public/images/blog/space.jpg b/infra/website/public/images/blog/space.jpg new file mode 100644 index 00000000000..b0519eee4da Binary files /dev/null and b/infra/website/public/images/blog/space.jpg differ diff --git a/infra/website/src/pages/index.astro b/infra/website/src/pages/index.astro index fe5b03d8e43..f64ca6388dc 100644 --- a/infra/website/src/pages/index.astro +++ b/infra/website/src/pages/index.astro @@ -33,7 +33,19 @@ features = store.get_online_features( "product_features:price" ], entity_rows=[{"customer_id": "C123", "product_id": "P456"}] -).to_dict()`; +).to_dict() + +# Retrieve your documents using vector similarity search for RAG +features = store.retrieve_online_documents( + features=[ + "corpus:document_id", + "corpus:chunk_id", + "corpus:chunk_text", + "corpus:chunk_embedding", + ], + query="What is the biggest city in the USA?" +).to_dict() +`; --- @@ -42,7 +54,7 @@ features = store.get_online_features(
-

Feature Serving for Production AI

+

Serving Data for Production AI

Feast is an open source feature store that delivers structured data to AI and LLM applications at high scale during training and inference