AI Journal Summarizer is a production-deployed journaling analysis application with a FastAPI backend and web frontend. It focuses on reliable model-backed inference, transparent fallback diagnostics, and security-aware bring-your-own-key (BYOK) support.
- Frontend: generative-ai-journal-summarizer.vercel.app
- Backend API: ai-journal-backend-production.up.railway.app
This project demonstrates practical AI engineering for a real user-facing workflow:
- Analyze unstructured journal text for sentiment, insights, and summarization
- Support multiple providers and models behind one consistent API contract
- Preserve reliability through diagnostics, fallback behavior, and smoke-test gates
- Expose operational behavior clearly for portfolio and reviewer validation
Stack:
- Frontend: static web app served via Vercel
- Backend: FastAPI service deployed on Railway
- AI providers: Groq and Hugging Face (plus premium/BYOK provider paths)
- Token vault: encrypted persistent storage for BYOK tokens
Key features:
- Sentiment, insights, and summarize routes under /api/ai
- Provider-aware model catalog and tier metadata
- Session-based auth with guest and authenticated modes
- BYOK token ownership checks and restricted token usage
- Rate limiting and CORS controls for production hardening
- Diagnostics endpoint with provider error visibility
```mermaid
flowchart TD
A["Journal Entry\nfree text"] --> B["FastAPI Gateway\nRailway deployment"]
B --> J["Auth Layer\nSession tokens, rate limiting"]
B --> C["RAG Layer\nFAISS + SQLite\nall-MiniLM-L6-v2 384-dim"]
C --> D["Provider Router"]
D --> E["Groq\nLlama 3"]
D --> F["HuggingFace\nMistral-7B"]
D --> G["BYOK Tokens\nOpenAI, Anthropic"]
E --> H["Sentiment + Summary\nResponse"]
F --> H
G --> H
H --> I["Static Frontend\nVercel"]
```
The service is designed to return model-backed output when providers are healthy, while surfacing fallback details if provider calls fail.
Recent reliability work included:
- Migrating deprecated provider paths to currently supported APIs
- Replacing deprecated Groq model IDs with supported model IDs
- Updating Hugging Face routing and model compatibility
- Capturing evidence artifacts for both failing and successful runs
See evidence artifacts in:
- evidence/reliability-2026-04-12/
- evidence/reliability-2026-04-12-final-confirmed/
- evidence/RECRUITER_READY_EVIDENCE_BLOCK_2026-04-12.md
A production smoke-test gate is required before updating portfolio-facing reliability claims. The gate exercises:
- GET /health
- GET /api/ai/diagnostics
- GET /api/ai/tier-info
- POST /api/ai/sentiment with Groq model: groq-llama3-70b
- POST /api/ai/sentiment with Hugging Face model: hf-mistral-7b
Option 1:

```
npm run test:smoke
```

Option 2:

```
py -3 smoke_test_production.py --base-url https://ai-journal-backend-production.up.railway.app
```

Expected results:

- Health endpoint returns status=healthy
- Diagnostics and tier-info return HTTP 200 with expected metadata
- Groq sentiment returns provider_used=groq and fallback_used=false
- Hugging Face sentiment returns provider_used=huggingface and fallback_used=false
- fallback_count does not increase during smoke run
- last_provider_errors is empty in the post-check snapshot
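These criteria can be scripted end to end; a minimal sketch using `requests` (the sentiment request body shown here is an assumption — `smoke_test_production.py` in the repo remains the authoritative gate):

```python
# Minimal smoke-check sketch for the pass criteria listed above.
# Response field names (status, provider_used, fallback_used) come from
# the checklist; the request body shape is an assumption.
import requests

BASE = "https://ai-journal-backend-production.up.railway.app"

def check_health():
    r = requests.get(f"{BASE}/health", timeout=30)
    assert r.status_code == 200 and r.json().get("status") == "healthy"

def check_sentiment(model, expected_provider):
    r = requests.post(
        f"{BASE}/api/ai/sentiment",
        json={"text": "Today felt calm and productive.", "model": model},
        timeout=60,
    )
    body = r.json()
    assert r.status_code == 200
    assert body.get("provider_used") == expected_provider
    assert body.get("fallback_used") is False

check_health()
check_sentiment("groq-llama3-70b", "groq")
check_sentiment("hf-mistral-7b", "huggingface")
print("smoke checks passed")
```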
Security controls:
- Session token auth with guest and authenticated user modes
- Auth-required BYOK token connection route
- User ownership enforcement for BYOK token use
- Encrypted token persistence in backend vault storage (sketched after this list)
- Strict CORS configuration through environment variables
- Request rate limiting on AI routes
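The vault pattern behind the BYOK controls can be illustrated with symmetric encryption; a minimal sketch using the `cryptography` package's Fernet (the key management, schema, and storage backend here are assumptions, not the production implementation):

```python
# Illustrative BYOK vault: tokens are encrypted at rest and only the
# owning user's row is ever decrypted. VAULT_KEY is assumed to be a
# Fernet key supplied via the environment.
import os
import sqlite3
from typing import Optional

from cryptography.fernet import Fernet

fernet = Fernet(os.environ["VAULT_KEY"])  # e.g. generated once via Fernet.generate_key()
db = sqlite3.connect("vault.db")
db.execute("CREATE TABLE IF NOT EXISTS tokens (user_id TEXT, provider TEXT, token BLOB)")

def store_token(user_id: str, provider: str, token: str) -> None:
    db.execute(
        "INSERT INTO tokens VALUES (?, ?, ?)",
        (user_id, provider, fernet.encrypt(token.encode())),
    )
    db.commit()

def load_token(user_id: str, provider: str) -> Optional[str]:
    # Ownership enforcement: the query is scoped to user_id.
    row = db.execute(
        "SELECT token FROM tokens WHERE user_id = ? AND provider = ?",
        (user_id, provider),
    ).fetchone()
    return fernet.decrypt(row[0]).decode() if row else None
```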
API surface:
- GET /health
- GET /api/ai/models
- GET /api/ai/tier-info
- GET /api/ai/diagnostics
- POST /api/ai/sentiment
- POST /api/ai/insights
- POST /api/ai/summarize
- POST /api/journal — store a journal entry (embed + persist)
- GET /api/journal — list stored entries
- GET /api/journal/stats — store size and embedding info
- POST /api/rag/query — RAG-augmented analysis (retrieve → augment → LLM)
- POST /api/auth/session
- POST /api/auth/login
- GET /api/auth/me
- POST /api/auth/connect-token
All three AI endpoints accept `"use_rag": true` in the request body to automatically retrieve relevant past journal entries and augment the LLM prompt with longitudinal context. For example:
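A RAG-augmented sentiment call might look like this (a sketch; the `text` and `model` fields are assumptions carried over from the smoke-test examples above):

```python
import requests

resp = requests.post(
    "https://ai-journal-backend-production.up.railway.app/api/ai/sentiment",
    json={
        "text": "I keep putting off the same task every morning.",
        "model": "groq-llama3-70b",  # model ID from the smoke-test checklist
        "use_rag": True,             # pull relevant past entries into the prompt
    },
    timeout=60,
)
print(resp.json())
```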
The project includes a retrieval-augmented generation pipeline that gives the LLM temporal context across journal entries:
- Ingest — `POST /api/journal` embeds journal text with `all-MiniLM-L6-v2` (384-dim) and stores the vector in FAISS alongside the text in SQLite.
- Retrieve — On query, the pipeline embeds the input, searches FAISS with cosine similarity, and returns the top-k most relevant past entries.
- Augment — Retrieved entries are formatted into a context block and prepended to the LLM prompt, enabling the model to reference patterns, themes, and emotional trends across the user's journal history.
- Generate — The augmented prompt is sent to the selected LLM provider (Groq, HuggingFace, OpenAI, Anthropic, etc.).
| Component | Implementation |
|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (384-dim) |
| Vector store | FAISS IndexFlatIP with L2-normalized vectors (cosine similarity) |
| Text store | SQLite (data/journal.db) |
| Prompts | Task-specific templates with RAG context blocks (rag/prompts.py) |
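The retrieval core maps directly onto this table; a minimal sketch of the embed → normalize → search path (wiring simplified relative to the actual rag/ modules, with an in-memory list standing in for SQLite):

```python
# IndexFlatIP over L2-normalized vectors computes cosine similarity.
from typing import List, Tuple

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
index = faiss.IndexFlatIP(384)  # inner product on unit vectors == cosine
entries: List[str] = []         # parallel text store; the real app uses SQLite

def ingest(text: str) -> None:
    vec = model.encode([text], normalize_embeddings=True)
    index.add(np.asarray(vec, dtype="float32"))
    entries.append(text)

def retrieve(query: str, k: int = 3) -> List[Tuple[float, str]]:
    vec = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(vec, dtype="float32"), k)
    return [(float(s), entries[i]) for s, i in zip(scores[0], ids[0]) if i != -1]

def build_context(query: str) -> str:
    # Augment step: format retrieved entries into a context block.
    return "Relevant past entries:\n" + "\n".join(f"- {t}" for _, t in retrieve(query))
```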
Evaluated on a 20-entry golden test set with 5 thematic queries (k=3):
| Metric | Score |
|---|---|
| Recall@3 | 0.77 |
| Precision@3 | 0.80 |
| MRR | 1.00 |
| Avg Cosine Similarity | 0.42 |
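For reference, these metrics reduce to a few lines each; a sketch over (ranked result IDs, relevant-ID set) pairs per query (the golden set itself lives under eval/):

```python
# Retrieval metrics for one query: `ranked` is the ordered result list,
# `relevant` the set of IDs judged relevant for that query.
def recall_at_k(ranked, relevant, k=3):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked, relevant, k=3):
    return len(set(ranked[:k]) & relevant) / k

def mrr(ranked, relevant):
    # Reciprocal rank of the first relevant hit; 1.0 when it ranks first.
    for pos, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0
```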
MRR = 1.0 means the first retrieved result is always relevant. Run the eval:
```
python eval/run_eval.py
```

A ReAct-style agent, built from Groq API primitives (no LangChain), orchestrates multi-step journal analysis with 5 tools:
| Tool | Purpose |
|---|---|
| `journal_search` | Semantic search over past entries via FAISS |
| `analyze_sentiment` | Emotion/tone analysis on entry text |
| `trend_analysis` | Pattern detection across entries over time |
| `reflect` | LLM self-critique to catch unsupported claims |
| `suggest_actions` | Actionable recommendations from journal patterns |
The planner loops: prompt → LLM → tool calls → execute → observe → repeat until the LLM produces a grounded final response. Conversation + artifact memory persists in SQLite.
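The planner skeleton is compact; a sketch of that loop (tool bodies stubbed, and `call_llm` standing in for the real Groq client and prompt templates):

```python
# ReAct-style loop: prompt -> LLM -> tool call -> execute -> observe -> repeat.
# Tool implementations are stubs here; the real agent backs them with
# FAISS search, sentiment analysis, etc.
import json

TOOLS = {
    "journal_search": lambda args: "stub: top-k past entries",
    "analyze_sentiment": lambda args: "stub: emotion/tone report",
    "trend_analysis": lambda args: "stub: patterns over time",
    "reflect": lambda args: "stub: self-critique notes",
    "suggest_actions": lambda args: "stub: recommendations",
}

def run_agent(query, call_llm, max_steps=8):
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        reply = call_llm(messages)            # prompt -> LLM
        tool_call = reply.get("tool_call")
        if tool_call is None:                 # grounded final response
            return reply["content"]
        observation = TOOLS[tool_call["name"]](tool_call["args"])  # execute
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "stopped: step budget exhausted"
```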
Evaluated on 10 benchmark cases across 6 categories (Llama 4 Scout 17B):
| Metric | Score |
|---|---|
| Pass rate | 90% (9/10) |
| Tool precision | 0.77 |
| Tool recall | 0.92 |
| Keyword hit rate | 85% |
| Avg latency | 4.8s |
| Avg steps/query | 4.2 |
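Tool precision/recall compare the tools the agent actually invoked against the tools each benchmark case expects; a micro-averaged sketch (the case format here is an assumption):

```python
def tool_metrics(cases):
    # cases: iterable of (expected_tools, used_tools) pairs of sets.
    tp = fp = fn = 0
    for expected, used in cases:
        tp += len(expected & used)
        fp += len(used - expected)
        fn += len(expected - used)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```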
Run the agent eval:

```
python -m agent.eval_agent
```

Prerequisites:

- Node.js 16+
- Python 3.8+

Install dependencies:

```
npm install
npm run backend:install
```

Start the backend and frontend:

```
npm run backend:dev
npm run web
```

Authoritative project status and next steps are tracked in:
- PROJECT_STATUS_NEXT_STEPS.md
This repository is maintained as a portfolio-grade AI engineering project. Documentation and claims are expected to remain evidence-based and aligned with live production behavior.
Use this concise narrative block directly in portfolio pages and LinkedIn project posts:
- Problem: journaling tools often lack reliable, explainable AI inference in production.
- Architecture: Vercel frontend + FastAPI on Railway, multi-provider routing (Groq and Hugging Face), session auth, BYOK token controls, encrypted token vault, and diagnostics telemetry.
- Reliability proof: live production failures were captured, root causes identified (provider deprecations and endpoint migration), remediations shipped, and final confirmation validated provider-backed output.
- Outcomes: production health stability, provider-backed inference restored, and repeatable smoke quality gate established.
- Tradeoffs: explicit fallback visibility prioritized over silent failover; lightweight auth chosen for delivery speed while preserving ownership boundaries.
Extended, copy-ready versions (portfolio + 30-second demo + LinkedIn) are available in:
- evidence/PORTFOLIO_DEMO_NARRATIVE_BLOCK.md