HalluciTrace addresses hallucination propagation in long-running AI agent workflows. Most approaches treat hallucination as a per-step output problem and check each response individually. But in real agent systems, a false claim introduced at step 3 does not stay at step 3. It becomes context for steps 4–20, and the agent keeps building on it. By the time the final output is wrong, the original hallucination is often buried deep in the trace.
HalluciTrace treats hallucination as a propagation problem instead of just an output problem. It surfaces where the hallucination first entered the workflow, which downstream steps became contaminated by it, how the incorrect information propagated through the reasoning chain and whether the agent ever self-corrected. This helps developers understand not just whether the final answer is wrong, but how the system became wrong in the first place, making long-running AI agents easier to debug, analyze and improve.
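
To make the idea of downstream contamination concrete, here is a minimal sketch that walks propagation edges from the step where a false claim entered. It is not HalluciTrace's actual implementation: the `(source_step, target_step)` edge format and the function name `contaminated_steps` are assumptions made for illustration.

```python
from collections import deque


def contaminated_steps(edges, source_step):
    """Return every step reachable from source_step via propagation edges.

    `edges` is a hypothetical list of (source_step, target_step) pairs,
    one per detected reuse of a claim in a later step.
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, set()).add(dst)

    seen = set()
    queue = deque([source_step])
    while queue:
        step = queue.popleft()
        for nxt in downstream.get(step, ()):
            if nxt not in seen and nxt != source_step:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


# Hypothetical trace: a claim from step 3 is reused at steps 4 and 9,
# and step 4's version is reused again at step 7.
edges = [(3, 4), (4, 7), (3, 9), (5, 6)]
print(contaminated_steps(edges, 3))  # [4, 7, 9]
```

A breadth-first walk is used here because contamination is transitive: step 7 never saw step 3 directly, but it inherited the claim through step 4.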
Heatmap view
N×N matrix showing propagation edge count from source step (row) to
target step (column). A hot cell at row 3, column 9 means a claim from
step 3 is still appearing at step 9. Row label color shows whether that
step contains verified, hallucinated, or unknown claims. Per-step
hallucination rate bars on the side.
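
As a rough sketch of the data behind this view, the matrix and the per-step rate bars could be computed as below. The input shapes (1-indexed `(source, target)` edges and `(step, status)` claim records) and the function names are assumptions, as is the choice to count `unknown` claims in the rate's denominator.

```python
def propagation_matrix(edges, n_steps):
    """Count propagation edges for each (source row, target column) pair.

    Steps are 1-indexed in the input; the matrix itself is 0-indexed.
    """
    matrix = [[0] * n_steps for _ in range(n_steps)]
    for src, dst in edges:
        matrix[src - 1][dst - 1] += 1
    return matrix


def hallucination_rates(claims, n_steps):
    """Fraction of each step's claims whose status is 'hallucinated'.

    Assumption: verified and unknown claims both count in the denominator.
    Steps with no claims get a rate of 0.0.
    """
    totals = [0] * n_steps
    bad = [0] * n_steps
    for step, status in claims:
        totals[step - 1] += 1
        if status == "hallucinated":
            bad[step - 1] += 1
    return [b / t if t else 0.0 for b, t in zip(bad, totals)]


# Hypothetical data: two claims from step 3 resurface at step 9.
edges = [(3, 9), (3, 9), (3, 4)]
claims = [(3, "hallucinated"), (3, "verified"), (9, "hallucinated")]

matrix = propagation_matrix(edges, n_steps=10)
rates = hallucination_rates(claims, n_steps=10)
print(matrix[2][8])  # row 3, column 9 -> 2
print(rates[2])      # step 3 -> 0.5
```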
Replay mode
Step-by-step walkthrough of the hallucination lifecycle:
| Frame type | Meaning |
|---|---|
| INTRODUCED | False claim enters the agent's reasoning for the first time |
| REUSED | Same claim reappears in a later step |
| AMPLIFIED | Agent builds on the hallucination, extending it further |
| CORRECTED | Agent catches and corrects an earlier false claim |
| FAILURE | Hallucination reaches the final output step |
Use ← → to step through frames, Space to autoplay.
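
One way to picture replay mode is as a cursor over an ordered list of typed frames. The sketch below is illustrative only: the `Frame` and `Replay` classes, their fields, and the sample claim text are hypothetical, and only the five frame type names come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class FrameType(Enum):
    INTRODUCED = "introduced"
    REUSED = "reused"
    AMPLIFIED = "amplified"
    CORRECTED = "corrected"
    FAILURE = "failure"


@dataclass
class Frame:
    step: int        # agent step the frame belongs to
    kind: FrameType  # lifecycle stage of the claim at that step
    claim: str       # claim text (hypothetical field)


class Replay:
    """Cursor over frames, ordered by step; moves clamp at both ends."""

    def __init__(self, frames):
        self.frames = sorted(frames, key=lambda f: f.step)
        self.index = 0

    def current(self):
        return self.frames[self.index]

    def forward(self):  # what the right-arrow key would trigger
        self.index = min(self.index + 1, len(self.frames) - 1)
        return self.current()

    def back(self):     # what the left-arrow key would trigger
        self.index = max(self.index - 1, 0)
        return self.current()


# Hypothetical lifecycle of one false claim.
frames = [
    Frame(9, FrameType.FAILURE, "example claim"),
    Frame(3, FrameType.INTRODUCED, "example claim"),
    Frame(5, FrameType.REUSED, "example claim"),
]
replay = Replay(frames)
print(replay.current().kind.name)  # INTRODUCED
print(replay.forward().kind.name)  # REUSED
```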
Claims view
Full audit table of every extracted claim with verification status,
reference match and similarity score. Filterable by status.

- `fastapi`: API server
- `uvicorn`: ASGI runner
- `openai`: GPT-4o-mini (claim extraction) + text-embedding-3-small (similarity)
- `numpy`: cosine similarity on embedding vectors
- `pandas`: claim dataframe handling
- `python-dotenv`: `.env` loading

```bash
git clone https://github.com/yourname/hallucitrace
cd hallucitrace
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Optional: for better accuracy
echo "OPENAI_API_KEY=sk-..." > .env
python api/index.py
# → http://localhost:8000
```

Works without an API key. Accuracy is lower on paraphrased hallucinations.

| Threshold | With key | Without key | Notes |
|---|---|---|---|
| Verification match | 0.78 cosine | 0.60 SequenceMatcher | Scores below this are marked `unknown` |
| Propagation detection | 0.82 cosine | 0.52 SequenceMatcher | Same claim reappearing |
| Amplification | 0.92 cosine | 0.70 SequenceMatcher | Near-verbatim repetition |
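
The thresholds in the table can be read as a small decision rule. The sketch below shows one plausible way to apply them; it is not taken from the HalluciTrace source. The numeric thresholds come from the table, while the function names, the `>=` comparisons, the lowercasing in the lexical fallback, and the returned labels are assumptions. The project lists `numpy` for cosine similarity; plain Python is used here only to keep the example dependency-free.

```python
import math
from difflib import SequenceMatcher

# Thresholds copied from the table; "cosine" applies with an OpenAI key,
# "lexical" is the SequenceMatcher fallback without one.
THRESHOLDS = {
    "cosine": {"verify": 0.78, "propagate": 0.82, "amplify": 0.92},
    "lexical": {"verify": 0.60, "propagate": 0.52, "amplify": 0.70},
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_similarity(a, b):
    """Character-level similarity ratio used when no embeddings are available."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_verification_match(score, mode):
    """True if a claim matches a reference; otherwise it stays 'unknown'."""
    return score >= THRESHOLDS[mode]["verify"]


def classify_reuse(score, mode):
    """Label the relation between an earlier claim and a later one."""
    t = THRESHOLDS[mode]
    if score >= t["amplify"]:
        return "amplified"
    if score >= t["propagate"]:
        return "propagated"
    return "unrelated"


print(classify_reuse(0.85, "cosine"))   # propagated
print(classify_reuse(0.85, "lexical"))  # amplified
```

The same score lands in different buckets depending on the mode, which is why the two threshold columns are not interchangeable.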

- Verification quality depends on reference coverage. Claims not in the reference base are marked `unknown`, not `false`.
- Without an OpenAI key, paraphrased hallucinations may not be detected as propagation, because the lexical fallback misses semantic similarity.
- The heuristic claim extractor misses implicit or compound-sentence claims.
- Serverless cold start on Vercel adds roughly 1–2 s to the first request.