From f9c887955cd8893716ff9ed7df4ca0040d28ebcd Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Sun, 6 Apr 2025 11:48:18 +0000
Subject: [PATCH] feat: Add RAG tutorial and Use Cases documentation

Co-Authored-By: Francisco Javier Arceo <arceofrancisco@gmail.com>
---
 docs/SUMMARY.md                    |   2 +
 docs/getting-started/use-cases.md  | 117 ++++++++++++++
 docs/tutorials/rag-with-docling.md | 236 +++++++++++++++++++++++++++++
 3 files changed, 355 insertions(+)
 create mode 100644 docs/getting-started/use-cases.md
 create mode 100644 docs/tutorials/rag-with-docling.md

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index a8d3c7d630a..a95439a7161 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -28,6 +28,7 @@
   * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
   * [Permission](getting-started/concepts/permission.md)
   * [Tags](getting-started/concepts/tags.md)
+* [Use Cases](getting-started/use-cases.md)
 * [Components](getting-started/components/README.md)
   * [Overview](getting-started/components/overview.md)
   * [Registry](getting-started/components/registry.md)
@@ -50,6 +51,7 @@
   * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
 * [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
 * [Building streaming features](tutorials/building-streaming-features.md)
+* [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)
 
 ## How-to Guides
 
diff --git a/docs/getting-started/use-cases.md b/docs/getting-started/use-cases.md
new file mode 100644
index 00000000000..861eed4da1f
--- /dev/null
+++ b/docs/getting-started/use-cases.md
@@ -0,0 +1,117 @@
+# Use Cases
+
+This page covers common use cases for Feast and how a feature store can benefit your AI/ML workflows.
+
+## Recommendation Engines
+
+Recommendation engines require personalized feature data related to users, items, and their interactions. Feast can help by:
+
+- **Managing feature data**: Store and serve user preferences, item characteristics, and interaction history
+- **Low-latency serving**: Provide real-time features for dynamic recommendations
+- **Point-in-time correctness**: Ensure training and serving data are consistent to avoid data leakage
+- **Feature reuse**: Allow different recommendation models to share the same feature definitions
+
+### Example: User-Item Recommendations
+
+A typical recommendation engine might need features such as:
+- User features: demographics, preferences, historical behavior
+- Item features: categories, attributes, popularity scores
+- Interaction features: past user-item interactions, ratings
+
+Feast allows you to define these features once and reuse them across different recommendation models, ensuring consistency between training and serving environments.
+
+{% content-ref url="../tutorials/tutorials-overview/driver-ranking-with-feast.md" %}
+[Driver Ranking Tutorial](../tutorials/tutorials-overview/driver-ranking-with-feast.md)
+{% endcontent-ref %}
+
+## Risk Scorecards
+
+Risk scorecards (such as credit risk, fraud risk, and marketing propensity models) require a comprehensive view of entity data with historical context. Feast helps by:
+
+- **Feature consistency**: Ensure all models use the same feature definitions
+- **Historical feature retrieval**: Generate training datasets with correct point-in-time feature values
+- **Feature monitoring**: Track feature distributions to detect data drift
+- **Governance**: Maintain an audit trail of features used in regulated environments
+
+### Example: Credit Risk Scoring
+
+Credit risk models might use features like:
+- Transaction history patterns
+- Account age and status
+- Payment history features
+- External credit bureau data
+- Employment and income verification
+
+Feast enables you to combine these features from disparate sources while maintaining data consistency and freshness.
+
+{% content-ref url="../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md" %}
+[Real-time Credit Scoring on AWS](../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
+{% endcontent-ref %}
+
+{% content-ref url="../tutorials/tutorials-overview/fraud-detection.md" %}
+[Fraud Detection on GCP](../tutorials/tutorials-overview/fraud-detection.md)
+{% endcontent-ref %}
+
+## NLP / RAG / Information Retrieval
+
+Natural Language Processing (NLP) and Retrieval Augmented Generation (RAG) applications require efficient storage and retrieval of text embeddings. Feast supports these use cases by:
+
+- **Vector storage**: Store and index embedding vectors for efficient similarity search
+- **Document metadata**: Associate embeddings with metadata for contextualized retrieval
+- **Scaling retrieval**: Serve vectors with low latency for real-time applications
+- **Versioning**: Track changes to embedding models and document collections
+
+### Example: Retrieval Augmented Generation
+
+RAG systems can leverage Feast to:
+- Store document embeddings and chunks in a vector database
+- Retrieve contextually relevant documents for user queries
+- Combine document retrieval with entity-specific features
+- Scale to large document collections
+
+Feast makes data available for retrieval through a single API for both storing and querying vector embeddings.
+
+{% content-ref url="../tutorials/rag-with-docling.md" %}
+[RAG with Feast Tutorial](../tutorials/rag-with-docling.md)
+{% endcontent-ref %}
+
+## Time Series Forecasting
+
+Time series forecasting for demand planning, inventory management, and anomaly detection benefits from Feast through:
+
+- **Temporal feature management**: Store and retrieve time-bound features
+- **Feature engineering**: Create time-based aggregations and transformations
+- **Consistent feature retrieval**: Ensure training and inference use the same feature definitions
+- **Backfilling capabilities**: Generate historical features for model training
+
+### Example: Demand Forecasting
+
+Demand forecasting applications typically use features such as:
+- Historical sales data with temporal patterns
+- Seasonal indicators and holiday flags
+- Weather data
+- Price changes and promotions
+- External economic indicators
+
+Feast allows you to combine these diverse data sources and make them available for both batch training and online inference.
+
+## Image and Multi-Modal Processing
+
+While Feast was initially built for structured data, it can also support multi-modal applications by:
+
+- **Storing feature metadata**: Keep track of image paths, embeddings, and metadata
+- **Vector embeddings**: Store image embeddings for similarity search
+- **Feature fusion**: Combine image features with structured data features
+
+## Why Feast Is Impactful
+
+Across all these use cases, Feast provides several core benefits:
+
+1. **Consistency between training and serving**: Eliminate training-serving skew by using the same feature definitions
+2. **Feature reuse**: Define features once and use them across multiple models
+3. **Scalable feature serving**: Serve features at low latency for production applications
+4. **Feature governance**: Maintain a central registry of feature definitions with metadata
+5. **Data freshness**: Keep online features up-to-date with batch and streaming ingestion
+6. **Reduced operational complexity**: Standardize feature access patterns across models
+
+By implementing a feature store with Feast, teams can focus on model development rather than data engineering challenges, accelerating the delivery of ML applications to production.
diff --git a/docs/tutorials/rag-with-docling.md b/docs/tutorials/rag-with-docling.md
new file mode 100644
index 00000000000..2f9affa4b4e
--- /dev/null
+++ b/docs/tutorials/rag-with-docling.md
@@ -0,0 +1,236 @@
+# Retrieval Augmented Generation (RAG) with Feast
+
+This tutorial demonstrates how to use Feast with [Docling](https://github.com/docling-project/docling) and [Milvus](https://milvus.io/) to build a Retrieval Augmented Generation (RAG) application. You'll learn how to store document embeddings in Feast and retrieve the most relevant documents for a given query.
+
+## Overview
+
+RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering). Feast makes it easy to store and retrieve document embeddings for RAG applications by providing integrations with vector databases like Milvus.
+
+The typical RAG process involves:
+1. Sourcing text data relevant for your application
+2. Transforming each text document into smaller chunks of text
+3. Transforming those chunks of text into embeddings
+4. Inserting those chunks, along with identifiers for the chunk and its document, into a database
+5. Retrieving those chunks and their identifiers at run time to inject the text into the LLM's context
+6. Calling an inference API to run your LLM and generate contextually relevant output
+7. Returning the output to the end user
+
+## Prerequisites
+
+- Python 3.10 or later
+- Feast installed with Milvus support: `pip install 'feast[milvus]'`
+- A basic understanding of feature stores and vector embeddings
+
+## Step 1: Configure Milvus in Feast
+
+Create a `feature_store.yaml` file with the following configuration:
+
+```yaml
+project: rag
+provider: local
+registry: data/registry.db
+online_store:
+  type: milvus
+  path: data/online_store.db
+  vector_enabled: true
+  embedding_dim: 384
+  index_type: "IVF_FLAT"
+
+offline_store:
+  type: file
+entity_key_serialization_version: 3
+# Defaults to no_auth (no authentication or authorization); other possible values are kubernetes and oidc. See the documentation for details.
+auth:
+  type: no_auth
+```
+
+## Step 2: Define your Data Sources and Views
+
+Create a `feature_repo.py` file to define your entities, data sources, and feature views:
+
+```python
+from datetime import timedelta
+from feast import Entity, FeatureView, Field, FileSource, ValueType
+from feast.types import Array, Float32, Int64, String, UnixTimestamp
+
+# Define entities
+document = Entity(
+    name="document_id",
+    description="Document ID",
+    value_type=ValueType.INT64,
+)
+
+# Define data source
+source = FileSource(
+    path="data/embedded_documents.parquet",
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created_timestamp",
+)
+
+# Define the view for retrieval
+document_embeddings = FeatureView(
+    name="embedded_documents",
+    entities=[document],
+    schema=[
+        Field(
+            name="vector",
+            dtype=Array(Float32),
+            vector_index=True,                # Vector search enabled
+            vector_search_metric="COSINE",    # Distance metric configured
+        ),
+        Field(name="document_id", dtype=Int64),
+        Field(name="created_timestamp", dtype=UnixTimestamp),
+        Field(name="sentence_chunks", dtype=String),
+        Field(name="event_timestamp", dtype=UnixTimestamp),
+    ],
+    source=source,
+    ttl=timedelta(hours=24),
+)
+```
+
+## Step 3: Update your Registry
+
+Apply the feature view definitions to the registry:
+
+```bash
+feast apply
+```
+
+## Step 4: Ingest your Data
+
+Process your documents, generate embeddings, and ingest them into the Feast online store:
+
+```python
+from feast import FeatureStore
+import pandas as pd
+import numpy as np
+from transformers import AutoTokenizer, AutoModel
+import torch
+import torch.nn.functional as F
+
+# Initialize FeatureStore
+store = FeatureStore(".")
+
+# Function to generate embeddings
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
+        input_mask_expanded.sum(1), min=1e-9
+    )
+
+def generate_embeddings(sentences, tokenizer, model):
+    encoded_input = tokenizer(
+        sentences, padding=True, truncation=True, return_tensors="pt"
+    )
+    with torch.no_grad():
+        model_output = model(**encoded_input)
+    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
+    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+    return sentence_embeddings.detach().cpu().numpy()
+
+# Example data
+data = {
+    "document_id": [1, 2, 3],
+    "sentence_chunks": [
+        "New York City is the most populous city in the United States.",
+        "Los Angeles is the second most populous city in the United States.",
+        "Chicago is the third most populous city in the United States."
+    ],
+    "event_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]),
+    "created_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"])
+}
+
+# Load model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+
+# Generate embeddings
+embeddings = generate_embeddings(data["sentence_chunks"], tokenizer, model)
+
+# Create DataFrame with embeddings
+df = pd.DataFrame(data)
+df["vector"] = embeddings.tolist()
+
+# Write to online store
+store.write_to_online_store(feature_view_name='embedded_documents', df=df)
+```
+
+## Step 5: Retrieve Relevant Documents
+
+Now you can retrieve the most relevant documents for a given query:
+
+```python
+from feast import FeatureStore
+
+# Initialize FeatureStore
+store = FeatureStore(".")
+
+# Generate query embedding (reuses the transformers imports and generate_embeddings() from Step 4)
+query = "What is the largest city in the US?"
+tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+query_embedding = generate_embeddings([query], tokenizer, model)[0].tolist()
+
+# Retrieve similar documents
+context_data = store.retrieve_online_documents_v2(
+    features=[
+        "embedded_documents:vector",
+        "embedded_documents:document_id",
+        "embedded_documents:sentence_chunks",
+    ],
+    query=query_embedding,
+    top_k=3,
+    distance_metric='COSINE',
+).to_df()
+
+print(context_data)
+```
+
+## Step 6: Use Retrieved Documents for Generation
+
+Finally, you can use the retrieved documents as context for an LLM:
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+    api_key=os.environ.get("OPENAI_API_KEY"),
+)
+
+# Format documents for context
+def format_documents(context_data, base_prompt):
+    documents = "\n".join(f"Document {i}: {row['sentence_chunks']}"
+                          for i, (_, row) in enumerate(context_data.iterrows(), start=1))
+    return f"{base_prompt}\n\nContext documents:\n{documents}"
+
+BASE_PROMPT = """You are a helpful assistant that answers questions based on the provided context."""
+FULL_PROMPT = format_documents(context_data, BASE_PROMPT)
+
+# Generate response
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[
+        {"role": "system", "content": FULL_PROMPT},
+        {"role": "user", "content": query}
+    ],
+)
+
+print(response.choices[0].message.content)
+```
+
+## Why Feast for RAG?
+
+Feast simplifies setting up and managing a RAG system by:
+
+1. Simplifying vector database configuration and management
+2. Providing a consistent API for both writing and reading embeddings
+3. Supporting both batch and real-time data ingestion
+4. Enabling versioning and governance of your document repository
+5. Offering seamless integration with multiple vector database backends
+6. Providing a unified API for managing both feature data and document embeddings
+
+For more details on using vector databases with Feast, see the [Vector Database documentation](../reference/alpha-vector-database.md).
+
+The complete demo code is available in the [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag-docling).