HalluciTrace

HalluciTrace traces how hallucinations propagate through long-running AI agent workflows. Most approaches treat hallucination as a per-step output problem and check each response individually. But in real agent systems, a false claim introduced at step 3 does not stay at step 3: it becomes context for steps 4–20, and the agent keeps building on it. By the time the final output is wrong, the original hallucination is often invisible in the trace.

HalluciTrace treats hallucination as a propagation problem instead of just an output problem. It surfaces where the hallucination first entered the workflow, which downstream steps became contaminated by it, how the incorrect information propagated through the reasoning chain and whether the agent ever self-corrected. This helps developers understand not just whether the final answer is wrong, but how the system became wrong in the first place, making long-running AI agents easier to debug, analyze and improve.

Live demo

halluci-trace.vercel.app

What it does

Heatmap view
N×N matrix showing propagation edge count from source step (row) to target step (column). A hot cell at row 3, column 9 means a claim from step 3 is still appearing at step 9. Row label color shows whether that step contains verified, hallucinated, or unknown claims. Per-step hallucination rate bars on the side.
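As a rough illustration of the matrix described above, the sketch below counts propagation edges into an N×N grid and computes a per-step hallucination rate. The function names, the 1-indexed `(source, target)` edge format, and the definition of the rate as "hallucinated claims divided by all claims in the step" are assumptions made for this example, not taken from the HalluciTrace code.

```python
from typing import List, Tuple


def build_propagation_matrix(n_steps: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Count propagation edges from source step (row) to target step (column).

    Steps in `edges` are 1-indexed and mapped to 0-indexed rows and columns.
    """
    matrix = [[0] * n_steps for _ in range(n_steps)]
    for source, target in edges:
        matrix[source - 1][target - 1] += 1
    return matrix


def hallucination_rate(statuses: List[str]) -> float:
    """Fraction of a step's claims labeled 'hallucinated' (assumed definition)."""
    if not statuses:
        return 0.0
    return statuses.count("hallucinated") / len(statuses)


# Hypothetical edges: a claim from step 3 shows up at step 5 and twice at step 9.
edges = [(3, 5), (3, 9), (3, 9), (5, 9)]
matrix = build_propagation_matrix(10, edges)
print(matrix[2][8])  # cell at row 3, column 9 -> 2
```

A hot cell is simply a large count; the row index tells you where the claim originated and the column index where it resurfaced.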

Replay mode
Step-by-step walkthrough of the hallucination lifecycle:

Frame type	Meaning
`INTRODUCED`	False claim enters the agent's reasoning for the first time
`REUSED`	Same claim reappears in a later step
`AMPLIFIED`	Agent builds on the hallucination, extending it further
`CORRECTED`	Agent catches and corrects an earlier false claim
`FAILURE`	Hallucination reaches the final output step

Use ← → to step through frames, Space to autoplay.
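One way to picture replay mode is as an ordered list of typed frames with a cursor that moves forward and back. The sketch below is a minimal model of that idea; the `Frame` and `Replay` classes and their fields are hypothetical, and only the five frame type names come from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FrameType(Enum):
    INTRODUCED = "INTRODUCED"
    REUSED = "REUSED"
    AMPLIFIED = "AMPLIFIED"
    CORRECTED = "CORRECTED"
    FAILURE = "FAILURE"


@dataclass
class Frame:
    step: int              # agent step the frame refers to
    frame_type: FrameType  # lifecycle stage of the claim at that step
    claim: str             # text of the claim (illustrative field)


class Replay:
    """Cursor over frames sorted by step; stepping clamps at both ends."""

    def __init__(self, frames: List[Frame]) -> None:
        self.frames = sorted(frames, key=lambda f: f.step)
        self.index = 0

    def current(self) -> Frame:
        return self.frames[self.index]

    def next(self) -> Frame:
        self.index = min(self.index + 1, len(self.frames) - 1)
        return self.current()

    def prev(self) -> Frame:
        self.index = max(self.index - 1, 0)
        return self.current()
```

In this model, the → and ← keys would map to `next()` and `prev()`, and autoplay would call `next()` on a timer until the last frame is reached.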

Claims view
Full audit table of every extracted claim with verification status, reference match and similarity score. Filterable by status.
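Filtering the claims table by status could look roughly like the sketch below. It uses plain dictionaries to stay self-contained (the project lists pandas for dataframe handling); the field names and the sample rows are made up for illustration, and only the three status values come from this README.

```python
from typing import Dict, List, Optional

Claim = Dict[str, object]


def filter_claims(claims: List[Claim], status: Optional[str] = None) -> List[Claim]:
    """Return claims with the given status, or all claims when status is None."""
    if status is None:
        return list(claims)
    return [claim for claim in claims if claim["status"] == status]


# Illustrative rows only; field names are assumptions.
claims = [
    {"step": 3, "text": "claim A", "status": "hallucinated", "similarity": 0.81},
    {"step": 4, "text": "claim B", "status": "verified", "similarity": 0.93},
    {"step": 6, "text": "claim C", "status": "unknown", "similarity": 0.41},
]

for claim in filter_claims(claims, status="hallucinated"):
    print(claim["step"], claim["text"])
```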

Stack

fastapi: API server
uvicorn: ASGI runner
openai: GPT-4o-mini (claim extraction) + text-embedding-3-small (similarity)
numpy: cosine similarity on embedding vectors
pandas: claim dataframe handling
python-dotenv: .env loading

Run locally

git clone https://github.com/yourname/hallucitrace
cd hallucitrace

python -m venv venv
source venv/bin/activate      # Windows: venv\Scripts\activate

pip install -r requirements.txt

# Optional — for better accuracy
echo "OPENAI_API_KEY=sk-..." > .env

python api/index.py
# → http://localhost:8000

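HalluciTrace works without an API key. In that case it falls back to lexical matching (SequenceMatcher) instead of embedding similarity, so accuracy is lower on paraphrased hallucinations.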
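The choice between the two modes can be thought of as a simple check on the environment. The sketch below shows one way to make that check; the function name and the mode labels are assumptions for this example, and the actual project may decide differently (for instance after loading `.env` with python-dotenv).

```python
import os
from typing import Mapping, Optional


def similarity_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'cosine' when an OpenAI key is set, else 'lexical' (assumed labels)."""
    env = os.environ if env is None else env
    return "cosine" if env.get("OPENAI_API_KEY") else "lexical"


print(similarity_mode())
```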

Thresholds

Parameter	With key	Without key	Notes
Verification match	0.78 cosine	0.60 SequenceMatcher	Below = "unknown"
Propagation detection	0.82 cosine	0.52 SequenceMatcher	Same claim reappearing
Amplification	0.92 cosine	0.70 SequenceMatcher	Near-verbatim repetition
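To make the table concrete, the sketch below computes both similarity measures and applies the thresholds. The threshold values are copied from the table; everything else is an assumption for illustration, including the function names, the inclusive `>=` comparisons, and checking the amplification threshold before the propagation threshold.

```python
import math
from difflib import SequenceMatcher
from typing import Optional, Sequence

# Values from the thresholds table; "cosine" = with key, "lexical" = without key.
THRESHOLDS = {
    "verification": {"cosine": 0.78, "lexical": 0.60},
    "propagation": {"cosine": 0.82, "lexical": 0.52},
    "amplification": {"cosine": 0.92, "lexical": 0.70},
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio from difflib, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def classify_reuse(score: float, mode: str) -> Optional[str]:
    """Label a claim pair as AMPLIFIED, REUSED, or None based on its score."""
    if score >= THRESHOLDS["amplification"][mode]:
        return "AMPLIFIED"
    if score >= THRESHOLDS["propagation"][mode]:
        return "REUSED"
    return None


def matches_reference(score: float, mode: str) -> bool:
    """False means the claim would be marked 'unknown'."""
    return score >= THRESHOLDS["verification"][mode]
```

The lexical thresholds are lower than the cosine ones because the two scores live on different scales: a paraphrase can score high on embedding similarity while sharing relatively few characters.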

Limitations

Verification quality depends on reference coverage. Claims not in the reference base are marked unknown, not false.
Without an OpenAI key, paraphrased hallucinations may not be detected as propagation — the lexical fallback misses semantic similarity.
The heuristic claim extractor misses implicit or compound-sentence claims.
Serverless cold start on Vercel adds roughly 1–2 s to the first request.
