rag-cli

Ask questions about your documents — entirely on your machine, zero API costs.

rag-cli is a command-line RAG (Retrieval-Augmented Generation) tool that lets you index a PDF, a text file, or an entire folder of documents, then query them in natural language using a local LLM. Indexing and querying run locally via Ollama: no cloud, no keys, no data leaving your machine. The optional text-to-speech feature is the one exception, since edge-tts needs an internet connection for synthesis.

$ python main.py query "What are the main configuration options?" --sources

Q: What are the main configuration options?

A: According to the documentation, the three main configuration options are...

── Sources ──────────────────────────────
  1. docs/manual.pdf (p.12)
     "Configuration is handled through environment variables or a .env file..."
  2. docs/manual.pdf (p.14)
     "Advanced options can be set at runtime via the --model flag..."

Stack

| Layer | Technology |
| --- | --- |
| LLM & Embeddings | Ollama (`llama3.2`, `nomic-embed-text`) |
| RAG Framework | LangChain 1.x (LCEL) |
| Vector Store | FAISS: persistent, local, no server needed |
| Document Parsing | `pypdf`, `docx2txt`, built-in text loaders |
| TTS | edge-tts: Microsoft Neural voices (multilingual) |
| CLI | Typer + Rich |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                           rag-cli                               │
│                                                                 │
│  INDEX                              QUERY                       │
│                                                                 │
│  PDF / TXT / MD / DOCX              natural language question   │
│         │                                    │                  │
│         ▼                                    ▼                  │
│   Document Loader               OllamaEmbeddings (nomic)        │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  RecursiveCharacter               FAISS MMR Retriever           │
│    TextSplitter                   (top-k relevant chunks)       │
│         │                                    │                  │
│         ▼                                    ▼                  │
│  OllamaEmbeddings  ──────►   FAISS     ChatPromptTemplate       │
│   (nomic-embed-text)        (on disk)        │                  │
│                                               ▼                 │
│                                       ChatOllama (llama3.2)     │
│                                               │                 │
│                                               ▼                 │
│                                       streamed answer           │
└─────────────────────────────────────────────────────────────────┘
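
The chunking step on the INDEX side can be illustrated with a simplified splitter. This is only a sketch: it cuts fixed-size character windows using the default `RAG_CHUNK_SIZE` (1000) and `RAG_CHUNK_OVERLAP` (200) values from the Configuration section, whereas LangChain's RecursiveCharacterTextSplitter also tries to break on paragraph and sentence separators. The function name `chunk_text` is hypothetical and not part of rag-cli.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: no separator-aware splitting.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.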

Retrieval strategy: MMR (Maximal Marginal Relevance). Chunks are selected one at a time, each chosen to balance relevance to the query against similarity to the chunks already selected, which reduces redundancy in the context window.
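
To make the selection rule concrete, here is a minimal pure-Python sketch of MMR over embedding vectors. It is an illustration, not rag-cli's code (the project delegates retrieval to the FAISS retriever in LangChain); the function names and the `lambda_mult` weighting parameter are assumptions chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Return indices of up to k documents picked greedily by MMR.

    lambda_mult=1.0 ranks by relevance only; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Penalise similarity to the closest already-selected document.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D embeddings: documents 0 and 1 are near-duplicates, document 2 differs.
query = [0.9397, 0.3420]
docs = [[0.9659, 0.2588], [0.9703, 0.2419], [0.5, 0.8660]]
top_two = mmr(query, docs, k=2)
```

In this toy data the second pick skips the near-duplicate of the first, even though it is slightly more relevant than the remaining document.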

Requirements

Python 3.10+
Ollama installed and running

Pull the required models once:

ollama pull nomic-embed-text   # ~274 MB — embedding model
ollama pull llama3.2           # ~2 GB  — default LLM (or swap for mistral, etc.)

Installation

git clone https://github.com/BenoitGaudieri/rag-cli
cd rag-cli

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

pip install -r requirements.txt

Usage

Index documents

# Single file
python main.py index ./docs/report.pdf

# Entire folder (PDF, TXT, MD, DOCX — recursive)
python main.py index ./docs/

# Named collection (to keep multiple knowledge bases separate)
python main.py index ./docs/ --collection myproject

Query

# One-shot question
python main.py query "Summarise the key findings"

# Show the source chunks used to generate the answer
python main.py query "What are the installation steps?" --sources

# Save the answer to the output/ folder (format inferred from extension)
python main.py query "Summarise the CV" --output summary.md
python main.py query "List key skills" --output skills.json

# Interactive REPL — ask multiple questions in a session
python main.py query

# Override the LLM at runtime
python main.py query "Translate chapter 1 to Italian" --model mistral

# Query a named collection
python main.py query "..." --collection myproject

Text-to-speech

Read text, documents, or query answers aloud using Microsoft Neural voices via edge-tts. Requires an internet connection for synthesis; audio playback is handled natively (no extra dependencies on Windows).

# Read a string directly
python main.py speak "Ciao, questo è un test"

# Read a PDF file
python main.py speak ./docs/report.pdf

# Read a saved answer (.json, .txt, .md)
python main.py speak output/summary.json

# Truncate long documents to the first 3,000 characters
python main.py speak ./docs/report.pdf --max-chars 3000

# Save the synthesised audio to an MP3 instead of playing it
python main.py speak ./docs/report.pdf --save audio/report.mp3

# Use a different voice
python main.py speak "hello" --voice en-US-AriaNeural
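
For readers curious how the save-to-MP3 path might look in code, here is a rough sketch built on the edge-tts `Communicate` class. It is not taken from `rag/tts.py`; the helper names `truncate` and `synthesise` are hypothetical, and the treatment of `max_chars` simply mirrors the documented "0 = no limit" behaviour.

```python
def truncate(text, max_chars=0):
    """Keep the first max_chars characters; 0 (or less) means no limit."""
    return text if max_chars <= 0 else text[:max_chars]


async def synthesise(text, out_path, voice="it-IT-ElsaNeural", max_chars=0):
    """Synthesise text to an MP3 file at out_path (needs internet access)."""
    # Third-party package, imported lazily so truncate() works without it.
    import edge_tts

    communicate = edge_tts.Communicate(truncate(text, max_chars), voice)
    await communicate.save(out_path)
```

A call such as `asyncio.run(synthesise("hello", "hello.mp3", voice="en-US-AriaNeural"))` would write the audio file; playback is a separate step not covered here.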

Add --speak / -S to any query call to have the answer read aloud automatically:

# Single question
python main.py query "Riassumi il documento" --speak

# Interactive REPL — every answer is read aloud
python main.py query --speak

# Different voice
python main.py query "..." --speak --voice it-IT-IsabellaNeural

Available Italian voices: it-IT-ElsaNeural (default, female), it-IT-IsabellaNeural (female), it-IT-DiegoNeural (male).

Compare models

Run the same question(s) against multiple models and collect results in a CSV or JSON file.

# Single question, two models
python main.py compare "What are your main skills?" --models "llama3.2,mistral" --output comparison.csv

# Multiple questions from a file (one per line)
python main.py compare questions.txt --models "llama3.2,mistral,phi3" --output comparison.csv

# Save as JSON instead
python main.py compare "Summarise the document" --models "llama3.2,mistral" --output comparison.json

The output CSV has four columns: question, model, answer, latency_s.
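
Once a comparison file exists, a few lines of Python are enough to summarise it. The sketch below averages `latency_s` per model; it assumes the CSV starts with a header row naming the four columns listed above, which is not verified against the tool's actual output, and the function name is hypothetical.

```python
import csv
from collections import defaultdict


def mean_latency_by_model(csv_path):
    """Return {model: mean latency in seconds} from a compare CSV."""
    totals = defaultdict(lambda: [0.0, 0])  # model -> [sum of latencies, row count]
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            entry = totals[row["model"]]
            entry[0] += float(row["latency_s"])
            entry[1] += 1
    return {model: total / count for model, (total, count) in totals.items()}
```

For example, `mean_latency_by_model("output/comparison.csv")` returns one average per model name found in the file.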

Manage collections

python main.py list                          # list all collections + chunk counts
python main.py clear --collection myproject  # delete one collection
python main.py clear                         # delete everything

Configuration

All defaults can be overridden via environment variables (or a .env file):

| Variable | Default | Description |
| --- | --- | --- |
| `RAG_LLM_MODEL` | `llama3.2` | Ollama model for generation |
| `RAG_EMBED_MODEL` | `nomic-embed-text` | Ollama model for embeddings |
| `RAG_COLLECTION` | `default` | FAISS collection name |
| `RAG_INDEX_DIR` | `./faiss_db` | Vector index directory |
| `RAG_OUTPUT_DIR` | `./output` | Directory for saved answers and compare results |
| `RAG_CHUNK_SIZE` | `1000` | Characters per text chunk |
| `RAG_CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks, in characters |
| `RAG_TOP_K` | `5` | Number of chunks retrieved per query |
| `RAG_TTS_VOICE` | `it-IT-ElsaNeural` | Default TTS voice |
| `RAG_TTS_MAX_CHARS` | `0` | Max characters to synthesise (0 = no limit) |

Example:

RAG_LLM_MODEL=mistral RAG_TOP_K=8 python main.py query "..."
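
The override pattern can be sketched as a lookup with a fallback default for each variable. This is an assumed shape, not the contents of `rag/config.py`: the `load_config` function and the dictionary keys are hypothetical, and `.env` file loading is left out.

```python
import os


def load_config(env=None):
    """Read rag-cli style settings from the environment, with defaults."""
    env = os.environ if env is None else env
    return {
        "llm_model": env.get("RAG_LLM_MODEL", "llama3.2"),
        "embed_model": env.get("RAG_EMBED_MODEL", "nomic-embed-text"),
        "collection": env.get("RAG_COLLECTION", "default"),
        "index_dir": env.get("RAG_INDEX_DIR", "./faiss_db"),
        "output_dir": env.get("RAG_OUTPUT_DIR", "./output"),
        # Environment values arrive as strings, so numeric ones are converted.
        "chunk_size": int(env.get("RAG_CHUNK_SIZE", "1000")),
        "chunk_overlap": int(env.get("RAG_CHUNK_OVERLAP", "200")),
        "top_k": int(env.get("RAG_TOP_K", "5")),
        "tts_voice": env.get("RAG_TTS_VOICE", "it-IT-ElsaNeural"),
        "tts_max_chars": int(env.get("RAG_TTS_MAX_CHARS", "0")),
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to test, for instance `load_config({"RAG_TOP_K": "8"})`.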

Project structure

rag-cli/
├── main.py          # CLI entry point — index / query / speak / list / clear / compare
├── rag/
│   ├── config.py    # all parameters, overridable via env vars
│   ├── indexer.py   # document loading, chunking, embedding → FAISS
│   ├── chain.py     # LCEL RAG chain, MMR retriever, streaming output
│   └── tts.py       # TTS synthesis (edge-tts) + text extraction from PDF/JSON/MD
├── requirements.txt
├── faiss_db/        # auto-created on first index  ← gitignored
└── output/          # saved answers and compare CSVs ← gitignored

Supported file types

| Extension | Loader |
| --- | --- |
| `.pdf` | `PyPDFLoader` (pypdf) |
| `.txt` | `TextLoader` (UTF-8 autodetect) |
| `.md` | `TextLoader` |
| `.docx` | `Docx2txtLoader` |
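
When indexing a folder, only files with these extensions are relevant. A minimal sketch of the recursive file discovery might look like the following; the `find_documents` name is hypothetical, and matching extensions case-insensitively is an assumption rather than documented behaviour.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def find_documents(root):
    """Return supported files under root (a single file or a folder), sorted."""
    root = Path(root)
    if root.is_file():
        return [root] if root.suffix.lower() in SUPPORTED_EXTENSIONS else []
    return sorted(
        p
        for p in root.rglob("*")  # rglob walks subfolders recursively
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path could then be handed to the loader listed for its extension.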

License

MIT
