Course Materials RAG System

A Retrieval-Augmented Generation (RAG) system designed to answer questions about course materials using semantic search and AI-powered responses.

Overview

This is a full-stack web application that lets users query course materials and receive context-aware answers. It uses ChromaDB for vector storage and Anthropic's Claude for answer generation, and provides a web interface for interaction.

Architecture

The system implements a Retrieval-Augmented Generation (RAG) pipeline:

Document Processing: Ingests course documents (TXT, PDF, DOCX) via backend/document_processor.py. Extracts metadata (title, instructor, links) and lessons using regex patterns. Chunks content into ~800-character segments with 100-character overlap, enriching with course/lesson context.
Vector Storage: ChromaDB (./chroma_db) stores embeddings using Sentence Transformers ("all-MiniLM-L6-v2"). Separate collections for course metadata (course_catalog) and content chunks (course_content).
Query Processing: FastAPI backend (backend/app.py) handles /api/query. Orchestrated by RAGSystem (backend/rag_system.py), which uses tools for semantic search and Claude for synthesis.
AI Integration: Anthropic Claude (model: "claude-sonnet-4-20250514") via AIGenerator (backend/ai_generator.py). Supports tool use (e.g., CourseSearchTool for retrieval) and session history (up to 2 exchanges).
Frontend: Static HTML/JS/CSS (frontend/) with chat UI, markdown rendering, and API calls.
Key Features: Semantic search with filters (course/lesson), sourced responses, conversational sessions.
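The chunking step described under Document Processing can be pictured as a sliding character window. The sketch below is a simplified illustration of the ~800-character / 100-character-overlap scheme, not the code in backend/document_processor.py (which may differ, for example by respecting sentence boundaries); the function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical helper: a simplified stand-in for the project's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # each new chunk starts this many chars later
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # A 2000-character text yields windows of 800, 800 and 600 characters.
    print([len(c) for c in chunk_text("x" * 2000)])
```

The overlap means a sentence cut at one chunk boundary still appears intact at the start of the next chunk, which helps retrieval when the relevant passage straddles a boundary.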

Documents must follow a structured format for optimal processing:

Line 1: Course Title: [Title]
Line 2: Course Link: [URL] (optional)
Line 3: Course Instructor: [Name] (optional)
Subsequent lines: Lesson X: [Title] followed by the lesson content, with an optional Lesson Link: [URL].

Sample docs in docs/ cover AI/LLM topics (e.g., Anthropic courses).

Prerequisites

Python 3.13 or higher
uv (Python package manager)
An Anthropic API key (for Claude AI)
Windows: use Git Bash (included with Git for Windows) to run the application commands

Installation

Install uv (if not already installed)

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install Python dependencies
```
uv sync
```
Set up environment variables

Create a .env file in the root directory:
```
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

Running the Application

Quick Start

Use the provided shell script:

```
chmod +x run.sh
./run.sh
```

Manual Start

```
cd backend
uv run uvicorn app:app --reload --port 8000
```

The application will be available at:

Web Interface: http://localhost:8000
API Documentation: http://localhost:8000/docs

Project Structure

Root: Configuration (pyproject.toml, .env.example, run.sh).
backend/: FastAPI app and core logic (8 Python modules: app.py, config.py, rag_system.py, etc.).
frontend/: Static web UI (index.html, script.js, style.css).
docs/: Sample course transcripts (4 TXT files).
./chroma_db: Auto-created vector database (delete to reload docs).

Development Notes

Sessions: Limited to 2 history exchanges for context; managed via session_manager.py.
Search Limits: Top 5 results per query; at most one search per query via tool use.
Customization: Add docs to docs/ (restart to reload). Extend tools in search_tools.py.
Dependencies: Locked via uv.lock; uses uv for virtual env management.
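The two-exchange history limit amounts to a sliding window over each session's messages. The sketch below is a hypothetical in-memory version; SessionHistory, SessionStore, and their methods are illustrative names, not the API of session_manager.py.

```python
from collections import deque


class SessionHistory:
    """Hypothetical per-session store that keeps only the last N exchanges."""

    def __init__(self, max_exchanges: int = 2) -> None:
        # One exchange = a user message plus an assistant reply, so the deque
        # holds 2 * max_exchanges messages and drops the oldest automatically.
        self._messages: deque[tuple[str, str]] = deque(maxlen=2 * max_exchanges)

    def add_exchange(self, user_msg: str, assistant_msg: str) -> None:
        self._messages.append(("User", user_msg))
        self._messages.append(("Assistant", assistant_msg))

    def as_context(self) -> str:
        """Render the retained history as plain text for inclusion in the prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self._messages)


class SessionStore:
    """Map session IDs to histories, creating a history on first use."""

    def __init__(self, max_exchanges: int = 2) -> None:
        self._max = max_exchanges
        self._sessions: dict[str, SessionHistory] = {}

    def get(self, session_id: str) -> SessionHistory:
        if session_id not in self._sessions:
            self._sessions[session_id] = SessionHistory(self._max)
        return self._sessions[session_id]
```

Capping the window keeps prompt size (and token cost) bounded regardless of how long a conversation runs, at the price of forgetting older turns.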

Troubleshooting

API Key Issues: "Invalid API key" or 500 errors → Verify .env and restart. Ensure key has Claude access.
No Courses Loaded: Empty stats/UI → Check docs/ files; delete ./chroma_db and restart for reload. Verify processing logs.
Port Conflicts: "Address already in use" → Run on a different port, e.g. --port 8001 in the run command.
Windows: Use Git Bash; ensure uv is on your PATH (check with echo $PATH).
Query Errors: "No content found" → Docs may not match query; test with samples (e.g., "MCP course outline").
High Latency: Long queries increase Claude token usage and response time; check API usage and limits in the Anthropic Console.
Clean Start: rm -rf ./chroma_db then restart to rebuild vectors.

For production: Consider Docker, environment-specific configs, or cloud vector DB (e.g., Pinecone).
