From f9c887955cd8893716ff9ed7df4ca0040d28ebcd Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Sun, 6 Apr 2025 11:48:18 +0000 Subject: [PATCH] feat: Add RAG tutorial and Use Cases documentation Co-Authored-By: Francisco Javier Arceo --- docs/SUMMARY.md | 2 + docs/getting-started/use-cases.md | 117 ++++++++++++++ docs/tutorials/rag-with-docling.md | 236 +++++++++++++++++++++++++++++ 3 files changed, 355 insertions(+) create mode 100644 docs/getting-started/use-cases.md create mode 100644 docs/tutorials/rag-with-docling.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index a8d3c7d630a..a95439a7161 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -28,6 +28,7 @@ * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md) * [Permission](getting-started/concepts/permission.md) * [Tags](getting-started/concepts/tags.md) +* [Use Cases](getting-started/use-cases.md) * [Components](getting-started/components/README.md) * [Overview](getting-started/components/overview.md) * [Registry](getting-started/components/registry.md) @@ -50,6 +51,7 @@ * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md) * [Validating historical features with Great Expectations](tutorials/validating-historical-features.md) * [Building streaming features](tutorials/building-streaming-features.md) +* [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md) ## How-to Guides diff --git a/docs/getting-started/use-cases.md b/docs/getting-started/use-cases.md new file mode 100644 index 00000000000..861eed4da1f --- /dev/null +++ b/docs/getting-started/use-cases.md @@ -0,0 +1,117 @@ +# Use Cases + +This page covers common use cases for Feast and how a feature store can benefit your AI/ML workflows. + +## Recommendation Engines + +Recommendation engines require personalized feature data related to users, items, and their interactions. 
Feast can help by: + +- **Managing feature data**: Store and serve user preferences, item characteristics, and interaction history +- **Low-latency serving**: Provide real-time features for dynamic recommendations +- **Point-in-time correctness**: Join feature values as of each event's timestamp so training data never includes future values (avoiding data leakage) +- **Feature reuse**: Allow different recommendation models to share the same feature definitions + +### Example: User-Item Recommendations + +A typical recommendation engine might need features such as: +- User features: demographics, preferences, historical behavior +- Item features: categories, attributes, popularity scores +- Interaction features: past user-item interactions, ratings + +Feast allows you to define these features once and reuse them across different recommendation models, ensuring consistency between training and serving environments. + +{% content-ref url="../tutorials/tutorials-overview/driver-ranking-with-feast.md" %} +[Driver Ranking Tutorial](../tutorials/tutorials-overview/driver-ranking-with-feast.md) +{% endcontent-ref %} + +## Risk Scorecards + +Risk scorecards (such as credit risk, fraud risk, and marketing propensity models) require a comprehensive view of entity data with historical context. Feast helps by: + +- **Feature consistency**: Ensure all models use the same feature definitions +- **Historical feature retrieval**: Generate training datasets with correct point-in-time feature values +- **Feature monitoring**: Track feature distributions to detect data drift +- **Governance**: Maintain an audit trail of features used in regulated environments + +### Example: Credit Risk Scoring + +Credit risk models might use features like: +- Transaction history patterns +- Account age and status +- Payment history features +- External credit bureau data +- Employment and income verification + +Feast enables you to combine these features from disparate sources while maintaining data consistency and freshness. 
+ +{% content-ref url="../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md" %} +[Real-time Credit Scoring on AWS](../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md) +{% endcontent-ref %} + +{% content-ref url="../tutorials/tutorials-overview/fraud-detection.md" %} +[Fraud Detection on GCP](../tutorials/tutorials-overview/fraud-detection.md) +{% endcontent-ref %} + +## NLP / RAG / Information Retrieval + +Natural Language Processing (NLP) and Retrieval Augmented Generation (RAG) applications require efficient storage and retrieval of text embeddings. Feast supports these use cases by: + +- **Vector storage**: Store and index embedding vectors for efficient similarity search +- **Document metadata**: Associate embeddings with metadata for contextualized retrieval +- **Scaling retrieval**: Serve vectors with low latency for real-time applications +- **Versioning**: Track changes to embedding models and document collections + +### Example: Retrieval Augmented Generation + +RAG systems can leverage Feast to: +- Store document embeddings and chunks in a vector database +- Retrieve contextually relevant documents for user queries +- Combine document retrieval with entity-specific features +- Scale to large document collections + +Feast makes data available for retrieval through a single API for both storing and querying vector embeddings. 
+ +{% content-ref url="../tutorials/rag-with-docling.md" %} +[RAG with Feast Tutorial](../tutorials/rag-with-docling.md) +{% endcontent-ref %} + +## Time Series Forecasting + +Time series forecasting for demand planning, inventory management, and anomaly detection benefits from Feast through: + +- **Temporal feature management**: Store and retrieve time-bound features +- **Feature engineering**: Create time-based aggregations and transformations +- **Consistent feature retrieval**: Ensure training and inference use the same feature definitions +- **Backfilling capabilities**: Generate historical features for model training + +### Example: Demand Forecasting + +Demand forecasting applications typically use features such as: +- Historical sales data with temporal patterns +- Seasonal indicators and holiday flags +- Weather data +- Price changes and promotions +- External economic indicators + +Feast allows you to combine these diverse data sources and make them available for both batch training and online inference. + +## Image and Multi-Modal Processing + +While Feast was initially built for structured data, it can also support multi-modal applications by: + +- **Storing feature metadata**: Keep track of image paths, embeddings, and metadata +- **Vector embeddings**: Store image embeddings for similarity search +- **Feature fusion**: Combine image features with structured data features + +## Why Feast Is Impactful + +Across all these use cases, Feast provides several core benefits: + +1. **Consistency between training and serving**: Eliminate training-serving skew by using the same feature definitions +2. **Feature reuse**: Define features once and use them across multiple models +3. **Scalable feature serving**: Serve features at low latency for production applications +4. **Feature governance**: Maintain a central registry of feature definitions with metadata +5. **Data freshness**: Keep online features up-to-date with batch and streaming ingestion +6. 
**Reduced operational complexity**: Standardize feature access patterns across models + +By implementing a feature store with Feast, teams can focus on model development rather than data engineering challenges, accelerating the delivery of ML applications to production. diff --git a/docs/tutorials/rag-with-docling.md b/docs/tutorials/rag-with-docling.md new file mode 100644 index 00000000000..2f9affa4b4e --- /dev/null +++ b/docs/tutorials/rag-with-docling.md @@ -0,0 +1,236 @@ +# Retrieval Augmented Generation (RAG) with Feast + +This tutorial demonstrates how to use Feast with [Docling](https://github.com/doclingjs/docling) and [Milvus](https://milvus.io/) to build a Retrieval Augmented Generation (RAG) application. You'll learn how to store document embeddings in Feast and retrieve the most relevant documents for a given query. + +## Overview + +RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering). Feast makes it easy to store and retrieve document embeddings for RAG applications by providing integrations with vector databases like Milvus. + +The typical RAG process involves: +1. Sourcing text data relevant for your application +2. Transforming each text document into smaller chunks of text +3. Transforming those chunks of text into embeddings +4. Inserting those chunks of text along with some identifier for the chunk and document in a database +5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM's context +6. Calling some API to run inference with your LLM to generate contextually relevant output +7. 
Returning the output to the end user + +## Prerequisites + +- Python 3.10 or later +- Feast installed with Milvus support: `pip install feast[milvus]` +- A basic understanding of feature stores and vector embeddings + +## Step 1: Configure Milvus in Feast + +Create a `feature_store.yaml` file with the following configuration: + +```yaml +project: rag +provider: local +registry: data/registry.db +online_store: + type: milvus + path: data/online_store.db + vector_enabled: true + embedding_dim: 384 + index_type: "IVF_FLAT" + +offline_store: + type: file +entity_key_serialization_version: 3 +# Defaults to no_auth for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details. +auth: + type: no_auth +``` + +## Step 2: Define your Data Sources and Views + +Create a `feature_repo.py` file to define your entities, data sources, and feature views: + +```python +from datetime import timedelta +from feast import Entity, FeatureView, Field, FileSource +from feast.types import Array, Float32, Int64, String, UnixTimestamp, ValueType + +# Define entities +document = Entity( + name="document_id", + description="Document ID", + value_type=ValueType.INT64, +) + +# Define data source +source = FileSource( + path="data/embedded_documents.parquet", + timestamp_field="event_timestamp", + created_timestamp_column="created_timestamp", +) + +# Define the view for retrieval +document_embeddings = FeatureView( + name="embedded_documents", + entities=[document], + schema=[ + Field( + name="vector", + dtype=Array(Float32), + vector_index=True, # Vector search enabled + vector_search_metric="COSINE", # Distance metric configured + ), + Field(name="document_id", dtype=Int64), + Field(name="created_timestamp", dtype=UnixTimestamp), + Field(name="sentence_chunks", dtype=String), + Field(name="event_timestamp", dtype=UnixTimestamp), + ], + source=source, + ttl=timedelta(hours=24), +) +``` + +## Step 3: Update your Registry + +Apply the 
feature view definitions to the registry: + +```bash +feast apply +``` + +## Step 4: Ingest your Data + +Process your documents, generate embeddings, and ingest them into the Feast online store: + +```python +from feast import FeatureStore +import pandas as pd +import numpy as np +from transformers import AutoTokenizer, AutoModel +import torch +import torch.nn.functional as F + +# Initialize FeatureStore +store = FeatureStore(".") + +# Function to generate embeddings +def mean_pooling(model_output, attention_mask): + token_embeddings = model_output[0] + input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float() + return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp( + input_mask_expanded.sum(1), min=1e-9 + ) + +def generate_embeddings(sentences, tokenizer, model): + encoded_input = tokenizer( + sentences, padding=True, truncation=True, return_tensors="pt" + ) + with torch.no_grad(): + model_output = model(**encoded_input) + sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"]) + sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1) + return sentence_embeddings.detach().cpu().numpy() + +# Example data +data = { + "document_id": [1, 2, 3], + "sentence_chunks": [ + "New York City is the most populous city in the United States.", + "Los Angeles is the second most populous city in the United States.", + "Chicago is the third most populous city in the United States." 
+ ], + "event_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]), + "created_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]) +} + +# Load model and tokenizer +tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2") +model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2") + +# Generate embeddings +embeddings = generate_embeddings(data["sentence_chunks"], tokenizer, model) + +# Create DataFrame with embeddings +df = pd.DataFrame(data) +df["vector"] = embeddings.tolist() + +# Write to online store +store.write_to_online_store(feature_view_name='embedded_documents', df=df) +``` + +## Step 5: Retrieve Relevant Documents + +Now you can retrieve the most relevant documents for a given query: + +```python +from feast import FeatureStore + +# Initialize FeatureStore +store = FeatureStore(".") + +# Generate query embedding (reuses generate_embeddings, AutoTokenizer, and AutoModel from Step 4) +query = "What is the largest city in the US?" +tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2") +model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2") +query_embedding = generate_embeddings([query], tokenizer, model)[0].tolist() + +# Retrieve similar documents +context_data = store.retrieve_online_documents_v2( + features=[ + "embedded_documents:vector", + "embedded_documents:document_id", + "embedded_documents:sentence_chunks", + ], + query=query_embedding, + top_k=3, + distance_metric='COSINE', +).to_df() + +print(context_data) +``` + +## Step 6: Use Retrieved Documents for Generation + +Finally, you can use the retrieved documents as context for an LLM: + +```python +from openai import OpenAI +import os + +client = OpenAI( + api_key=os.environ.get("OPENAI_API_KEY"), +) + +# Format documents for context +def format_documents(context_data, base_prompt): + documents = "\n".join([f"Document {i+1}: {row['embedded_documents__sentence_chunks']}" + for i, row in context_data.iterrows()]) + return 
f"{base_prompt}\n\nContext documents:\n{documents}" + +BASE_PROMPT = """You are a helpful assistant that answers questions based on the provided context.""" +FULL_PROMPT = format_documents(context_data, BASE_PROMPT) + +# Generate response +response = client.chat.completions.create( + model="gpt-4o-mini", + messages=[ + {"role": "system", "content": FULL_PROMPT}, + {"role": "user", "content": query} + ], +) + +print(response.choices[0].message.content) +``` + +## Why Feast for RAG? + +Feast simplifies setting up and managing a RAG system by: + +1. Simplifying vector database configuration and management +2. Providing a consistent API for both writing and reading embeddings +3. Supporting both batch and real-time data ingestion +4. Enabling versioning and governance of your document repository +5. Offering seamless integration with multiple vector database backends +6. Providing a unified API for managing both feature data and document embeddings + +For more details on using vector databases with Feast, see the [Vector Database documentation](../reference/alpha-vector-database.md). + +The complete demo code is available in the [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag-docling).
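
The RAG overview in the tutorial lists chunking (step 2 of the process), but the example data in Step 4 is already split into sentences. As a rough illustration only, the sketch below shows one common way to chunk a document: fixed-size word windows with overlap. The function name `chunk_text` and the parameters `max_words` and `overlap` are hypothetical and are not part of Feast or Docling; a real pipeline would more likely rely on a document-aware chunker.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    Hypothetical helper for illustration; not a Feast or Docling API.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + max_words >= len(words):
            break
    return chunks


# Example: produce the list that would populate a "sentence_chunks" column.
document = "New York City is the most populous city in the United States."
sentence_chunks = chunk_text(document, max_words=6, overlap=2)
print(sentence_chunks)
```

The resulting list could stand in for the `sentence_chunks` values in the Step 4 example data before embeddings are generated; the window and overlap sizes here are arbitrary and would need tuning for the embedding model's input limit.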