rag-cli

Ask questions about your documents — entirely on your machine, zero API costs.

rag-cli is a command-line RAG (Retrieval-Augmented Generation) tool that lets you index a PDF, a text file, or an entire folder of documents, then query them in natural language using a local LLM. Indexing and querying run locally via Ollama: no cloud, no API keys, no document data leaving your machine. The optional text-to-speech feature is the one exception, since it needs an internet connection.

$ python main.py query "What are the main configuration options?"

Q: What are the main configuration options?

A: According to the documentation, the three main configuration options are...

── Sources ──────────────────────────────
  1. docs/manual.pdf (p.12)
     "Configuration is handled through environment variables or a .env file..."
  2. docs/manual.pdf (p.14)
     "Advanced options can be set at runtime via the --model flag..."

Stack

Layer             Technology
LLM & Embeddings  Ollama (llama3.2, nomic-embed-text)
RAG Framework     LangChain 1.x (LCEL)
Vector Store      FAISS (persistent, local, no server needed)
Document Parsing  pypdf, docx2txt, built-in text loaders
TTS               edge-tts, Microsoft Neural voices (multilingual)
CLI               Typer + Rich

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                           rag-cli                               │
│                                                                 │
│  INDEX                              QUERY                       │
│                                                                 │
│  PDF / TXT / MD / DOCX              natural language question   │
│         │                                    │                  │
│         ▼                                    ▼                  │
│   Document Loader               OllamaEmbeddings (nomic)        │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  RecursiveCharacter               FAISS MMR Retriever           │
│    TextSplitter                   (top-k relevant chunks)       │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  OllamaEmbeddings  ──────►   FAISS     ChatPromptTemplate       │
│   (nomic-embed-text)        (on disk)        │                  │
│                                               ▼                 │
│                                       ChatOllama (llama3.2)     │
│                                               │                 │
│                                               ▼                 │
│                                       streamed answer           │
└─────────────────────────────────────────────────────────────────┘

Retrieval strategy: MMR (Maximal Marginal Relevance). Chunks are selected both for relevance to the query and for diversity from one another, which reduces redundancy in the context passed to the LLM.
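
To make the relevance/diversity trade-off concrete, here is a minimal pure-Python sketch of greedy MMR selection. It is not the code rag-cli runs (the tool relies on the FAISS retriever from LangChain); the function names, the toy 2-D vectors, and the `lambda_mult` weighting are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, chunk_vecs, k=5, lambda_mult=0.5):
    """Return indices of up to k chunks picked greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values penalise
    chunks that are similar to ones already selected.
    """
    relevance = [cosine(query_vec, v) for v in chunk_vecs]
    candidates = list(range(len(chunk_vecs)))
    selected = []
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy = highest similarity to any already-selected chunk.
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: chunks 0 and 1 are duplicates, chunk 2 is less relevant but different.
query = [1.0, 0.0]
chunks = [[1.0, 0.1], [1.0, 0.1], [1.0, -0.5]]
print(mmr_select(query, chunks, k=2, lambda_mult=0.5))
print(mmr_select(query, chunks, k=2, lambda_mult=1.0))
```

With a balanced weight the duplicate chunk is skipped in favour of the more distinct one; with `lambda_mult=1.0` the selection degenerates to plain top-k by relevance.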


Requirements

  • Python 3.10+
  • Ollama installed and running

Pull the required models once:

ollama pull nomic-embed-text   # ~274 MB — embedding model
ollama pull llama3.2           # ~2 GB  — default LLM (or swap for mistral, etc.)

Installation

git clone https://github.com/BenoitGaudieri/rag-cli
cd rag-cli

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

pip install -r requirements.txt

Usage

Index documents

# Single file
python main.py index ./docs/report.pdf

# Entire folder (PDF, TXT, MD, DOCX — recursive)
python main.py index ./docs/

# Named collection (to keep multiple knowledge bases separate)
python main.py index ./docs/ --collection myproject
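
During indexing each document is split into overlapping chunks before embedding (see RAG_CHUNK_SIZE and RAG_CHUNK_OVERLAP under Configuration). The sketch below shows only the overlap arithmetic with a fixed sliding window. It is a simplification: the RecursiveCharacterTextSplitter used by the tool prefers to break on separators such as paragraphs and line breaks, so real chunk boundaries will differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# A 2500-character document with the default settings.
document = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(document)
print([len(p) for p in pieces])
```

With the defaults, consecutive windows start 800 characters apart, so the last 200 characters of one chunk reappear at the start of the next.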

Query

# One-shot question
python main.py query "Summarise the key findings"

# Show the source chunks used to generate the answer
python main.py query "What are the installation steps?" --sources

# Save the answer to the output/ folder (format inferred from extension)
python main.py query "Summarise the CV" --output summary.md
python main.py query "List key skills" --output skills.json

# Interactive REPL — ask multiple questions in a session
python main.py query

# Override the LLM at runtime
python main.py query "Translate chapter 1 to Italian" --model mistral

# Query a named collection
python main.py query "..." --collection myproject

Text-to-speech

Read text, documents, or query answers aloud using Microsoft Neural voices via edge-tts. Unlike indexing and querying, synthesis requires an internet connection because edge-tts calls Microsoft's online speech service. Audio playback is handled natively (no extra dependencies on Windows).

# Read a string directly
python main.py speak "Ciao, questo è un test"

# Read a PDF file
python main.py speak ./docs/report.pdf

# Read a saved answer (.json, .txt, .md)
python main.py speak output/summary.json

# Truncate long documents to the first 3 000 characters
python main.py speak ./docs/report.pdf --max-chars 3000

# Save the synthesised audio to an MP3 instead of playing it
python main.py speak ./docs/report.pdf --save audio/report.mp3

# Use a different voice
python main.py speak "hello" --voice en-US-AriaNeural

Add --speak / -S to any query call to have the answer read aloud automatically:

# Single question
python main.py query "Riassumi il documento" --speak

# Interactive REPL — every answer is read aloud
python main.py query --speak

# Different voice
python main.py query "..." --speak --voice it-IT-IsabellaNeural

Available Italian voices: it-IT-ElsaNeural (default, female), it-IT-IsabellaNeural (female), it-IT-DiegoNeural (male).


Compare models

Run the same question(s) against multiple models and collect results in a CSV or JSON file.

# Single question, two models
python main.py compare "What are your main skills?" --models "llama3.2,mistral" --output comparison.csv

# Multiple questions from a file (one per line)
python main.py compare questions.txt --models "llama3.2,mistral,phi3" --output comparison.csv

# Save as JSON instead
python main.py compare "Summarise the document" --models "llama3.2,mistral" --output comparison.json

The output CSV has four columns: question, model, answer, latency_s.
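
Once a comparison file exists, the four columns are easy to post-process. The following sketch averages latency_s per model using only the standard library. It assumes the CSV starts with a header row naming the four columns and that latency_s parses as a float; the sample rows are made-up placeholders, not real measurements.

```python
import csv
import io
from collections import defaultdict


def mean_latency_by_model(csv_file):
    """Return {model: mean latency in seconds} from an open comparison CSV."""
    totals = defaultdict(lambda: [0.0, 0])  # model -> [sum of latencies, row count]
    for row in csv.DictReader(csv_file):
        entry = totals[row["model"]]
        entry[0] += float(row["latency_s"])
        entry[1] += 1
    return {model: total / count for model, (total, count) in totals.items()}


# Placeholder data standing in for output/comparison.csv.
sample = io.StringIO(
    "question,model,answer,latency_s\n"
    "What is X?,llama3.2,Answer A,2.0\n"
    "What is X?,mistral,Answer B,3.0\n"
    "What is Y?,llama3.2,Answer C,4.0\n"
)
print(mean_latency_by_model(sample))
```

For a real file, pass `open("output/comparison.csv", newline="", encoding="utf-8")` instead of the in-memory sample.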

Manage collections

python main.py list                          # list all collections + chunk counts
python main.py clear --collection myproject  # delete one collection
python main.py clear                         # delete everything

Configuration

All defaults can be overridden via environment variables (or a .env file):

Variable           Default           Description
RAG_LLM_MODEL      llama3.2          Ollama model for generation
RAG_EMBED_MODEL    nomic-embed-text  Ollama model for embeddings
RAG_COLLECTION     default           FAISS collection name
RAG_INDEX_DIR      ./faiss_db        Vector index directory
RAG_OUTPUT_DIR     ./output          Directory for saved answers and compare results
RAG_CHUNK_SIZE     1000              Characters per text chunk
RAG_CHUNK_OVERLAP  200               Overlap between consecutive chunks, in characters
RAG_TOP_K          5                 Number of chunks retrieved per query
RAG_TTS_VOICE      it-IT-ElsaNeural  Default TTS voice
RAG_TTS_MAX_CHARS  0                 Max characters to synthesise (0 = no limit)

Example:

RAG_LLM_MODEL=mistral RAG_TOP_K=8 python main.py query "..."
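
The override pattern can be sketched in a few lines of Python. This is an illustration of reading the variables above with their documented defaults, not the contents of rag/config.py; the `load_settings` helper is hypothetical, and loading a .env file (which would need an extra package) is left out.

```python
import os

# Defaults copied from the configuration table; ints stay ints after parsing.
DEFAULTS = {
    "RAG_LLM_MODEL": "llama3.2",
    "RAG_EMBED_MODEL": "nomic-embed-text",
    "RAG_COLLECTION": "default",
    "RAG_INDEX_DIR": "./faiss_db",
    "RAG_OUTPUT_DIR": "./output",
    "RAG_CHUNK_SIZE": 1000,
    "RAG_CHUNK_OVERLAP": 200,
    "RAG_TOP_K": 5,
    "RAG_TTS_VOICE": "it-IT-ElsaNeural",
    "RAG_TTS_MAX_CHARS": 0,
}


def load_settings(environ=None):
    """Return the defaults, overridden by any matching environment variables."""
    env = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Environment values are strings; cast them to the type of the default.
        settings[name] = default if raw is None else type(default)(raw)
    return settings


# Equivalent of: RAG_LLM_MODEL=mistral RAG_TOP_K=8 python main.py query "..."
settings = load_settings({"RAG_LLM_MODEL": "mistral", "RAG_TOP_K": "8"})
print(settings["RAG_LLM_MODEL"], settings["RAG_TOP_K"], settings["RAG_CHUNK_SIZE"])
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.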

Project structure

rag-cli/
├── main.py          # CLI entry point — index / query / speak / list / clear / compare
├── rag/
│   ├── config.py    # all parameters, overridable via env vars
│   ├── indexer.py   # document loading, chunking, embedding → FAISS
│   ├── chain.py     # LCEL RAG chain, MMR retriever, streaming output
│   └── tts.py       # TTS synthesis (edge-tts) + text extraction from PDF/JSON/MD
├── requirements.txt
├── faiss_db/        # auto-created on first index  ← gitignored
└── output/          # saved answers and compare CSVs ← gitignored

Supported file types

Extension  Loader
.pdf       PyPDFLoader (pypdf)
.txt       TextLoader (UTF-8 autodetect)
.md        TextLoader
.docx      Docx2txtLoader
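
When a folder is indexed, files are picked up recursively by extension. A rough sketch of that discovery step, using only pathlib, might look like the following. The helper name `find_supported_files` and the case-insensitive extension match are assumptions; the actual logic in rag/indexer.py may differ.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def find_supported_files(path):
    """Return supported files under path (a single file or a folder), sorted."""
    root = Path(path)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED_EXTENSIONS else []
    return sorted(
        p
        for p in root.rglob("*")  # rglob walks sub-folders recursively
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for file_path in find_supported_files("./docs"):
        print(file_path)
```

Files with other extensions are silently skipped in this sketch; a real indexer might prefer to log them.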

License

MIT
