Adeliyio/README.md

Tommy Adeliyi

AI Engineer · LLM Systems · Evaluation & Reliability

I build production-grade AI systems that organizations can trust.

My work goes beyond prototypes and demos — I focus on the engineering discipline required to make AI reliable in real-world environments: evaluation systems, calibrated confidence, self-correction loops, safety guardrails, and cost-aware architectures.


🧭 Engineering Philosophy

Most AI systems don’t fail loudly; they fail silently and convincingly.

A model works on five examples, hallucinates on the sixth, and there’s no infrastructure to catch it. I design systems for that reality.

  • Evaluation is a system → regression testing, golden datasets, LLM-as-judge
  • Confidence is engineered → calibrated scoring, uncertainty-aware escalation
  • Safety is layered → deterministic checks + semantic review
  • Cost matters → hybrid routing (frontier + local models)
  • Impact is measured → A/B testing, calibration curves, failure analysis
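
As one illustration of the "evaluation is a system" idea, a golden-dataset regression gate might look roughly like the sketch below. The `GoldenCase` type, the substring pass criterion, the 0.02 tolerance, and the stand-in models are all assumptions made for this example; they are not taken from any of the repositories listed here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    prompt: str
    expected_substring: str  # hypothetical pass criterion for this sketch


def pass_rate(model: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Fraction of golden cases whose output contains the expected text."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(
        1 for case in cases
        if case.expected_substring.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def regression_gate(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """True when the candidate pass rate has not dropped beyond the tolerance."""
    return candidate >= baseline - tolerance


# Stand-in "models" so the sketch runs without any API access.
def old_model(prompt: str) -> str:
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(prompt, "unknown")


def new_model(prompt: str) -> str:
    return {"capital of France?": "Paris"}.get(prompt, "unknown")


GOLDEN = [
    GoldenCase("capital of France?", "paris"),
    GoldenCase("2 + 2?", "4"),
]

baseline_rate = pass_rate(old_model, GOLDEN)
candidate_rate = pass_rate(new_model, GOLDEN)
ship_it = regression_gate(baseline_rate, candidate_rate)
```

In a CI job, a `False` gate would block the change; a real harness would likely swap the substring check for an LLM-as-judge or task-specific scorer.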

🚀 Featured Systems

🔧 LLM Reliability Engine

https://github.com/Adeliyio/ai-system-debugger

LLM systems don’t fail like traditional software — they fail silently with plausible but incorrect outputs.

  • Ensemble failure detection (LLM-as-judge + embeddings + rules)
  • Root cause analysis (retrieval, prompt, model, context)
  • Self-healing with regression-tested fixes
  • Hybrid routing (GPT-4o + Llama via Ollama)
  • Meta-evaluation of evaluator reliability
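
A minimal sketch of how the three ensemble signals above could be combined by majority vote. The thresholds, the specific rule, and the function names are hypothetical; the judge score and embedding vectors are assumed to come from elsewhere in the pipeline, and the real project may weight or calibrate the signals differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rule_check(output: str) -> bool:
    """Deterministic rule (assumed): reject empty answers and refusal boilerplate."""
    text = output.strip().lower()
    return bool(text) and "as an ai language model" not in text


def is_failure(
    judge_score: float,
    output_vec: Sequence[float],
    reference_vec: Sequence[float],
    output: str,
    *,
    judge_min: float = 0.7,   # assumed threshold
    sim_min: float = 0.8,     # assumed threshold
    votes_needed: int = 2,
) -> bool:
    """Flag a failure when at least `votes_needed` of the three detectors object."""
    votes = 0
    votes += int(judge_score < judge_min)                                  # LLM-as-judge
    votes += int(cosine_similarity(output_vec, reference_vec) < sim_min)   # embeddings
    votes += int(not rule_check(output))                                   # rules
    return votes >= votes_needed
```

Requiring two votes means a single noisy detector cannot flag a failure on its own, at the cost of missing failures that only one detector can see.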

📊 Financial Intelligence Copilot (10-K Q&A)

https://github.com/Adeliyio/SEC-filings-knowledge-copilot

A hallucinated financial figure is worse than no answer — correctness must be provable.

  • Multi-agent LangGraph reasoning pipeline
  • Claim-level grounding + hallucination correction
  • RAGAS + regression-based evaluation
  • Transparent confidence + citations
  • Fully local deployment (Ollama)
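
To make "claim-level grounding" concrete, here is a deliberately naive sketch that treats each sentence of an answer as a claim and checks that every number in it also appears in the retrieved source text. The sentence splitter, the number regex, and the function names are assumptions for illustration; the actual pipeline presumably uses a far richer verifier.

```python
import re
from typing import Dict, List, Set

# Matches integers and decimals, with optional thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def split_claims(answer: str) -> List[str]:
    """Naive splitter: one claim per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def numbers_in(text: str) -> Set[str]:
    """All numeric tokens in the text, with commas stripped for comparison."""
    return {match.replace(",", "") for match in NUMBER.findall(text)}


def ground_claims(answer: str, source: str) -> Dict[str, bool]:
    """Map each claim to True when all of its numbers occur in the source.

    Claims without any numbers pass vacuously in this sketch.
    """
    source_numbers = numbers_in(source)
    return {
        claim: numbers_in(claim) <= source_numbers
        for claim in split_claims(answer)
    }


source_text = "Total revenue was 1,200 million and net income was 300 million."
answer_text = "Revenue was 1,200 million. Net income was 350 million."
report = ground_claims(answer_text, source_text)
```

Here the second claim is marked ungrounded because 350 never appears in the source, which is the point at which a correction or refusal step would take over.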

🎧 AI Support Copilot

https://github.com/Adeliyio/customer-support-copilot

The goal isn’t generating replies — it’s ensuring responses are safe to send.

  • Agentic classification with deterministic fallback
  • Hybrid retrieval (dense + BM25 + reranking)
  • Dual-layer safety (rules + LLM review)
  • Confidence-aware escalation
  • Five-layer evaluation pipeline
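
One common way to merge a dense ranking with a BM25 ranking before reranking is reciprocal rank fusion (RRF). The source does not say which fusion method the copilot uses, so treat RRF, the constant `k = 60`, and the document IDs below as assumptions for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical dense-retriever order
bm25_hits = ["b", "d", "a"]    # hypothetical BM25 order
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

The fused list would then be passed to a cross-encoder or similar reranker; RRF needs only ranks, so the dense and BM25 scores never have to be put on a common scale.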

📈 Deal Intelligence System

https://github.com/Adeliyio/deal-intelligence-for-sales-teams

The real question isn’t scoring deals — it’s whether AI measurably improves sales decisions.

  • Calibrated win probabilities (Platt scaling)
  • Temporal feature engineering (deal behavior tracking)
  • Evidence-grounded risk explanations
  • Counterfactual simulations for strategy decisions
  • A/B tested agent impact
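
Platt scaling fits a sigmoid that maps raw model scores to probabilities using held-out labels. The sketch below is a simplified version using plain gradient descent on log loss, without the target smoothing from Platt's original formulation; the learning rate, step count, and toy data are assumptions, and a production system would more likely rely on a library implementation.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(
    scores: Sequence[float],
    labels: Sequence[int],
    lr: float = 0.1,
    steps: int = 2000,
) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by minimizing mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Convert a raw score into a calibrated win probability."""
    return sigmoid(a * score + b)


# Toy held-out set: raw deal scores and whether the deal was won (1) or lost (0).
raw_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
won = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(raw_scores, won)
```

The parameters should be fit on data the underlying model was not trained on; otherwise the calibrated probabilities inherit the model's overconfidence.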

🛠️ Core Stack

LLM & GenAI: OpenAI · Anthropic Claude · LangChain · LangGraph · LlamaIndex · Ollama

Retrieval & Search: FAISS · Pinecone · ChromaDB · BM25 · Cross-encoders

Evaluation Systems: RAGAS · LLM-as-judge · Golden datasets · Custom eval harnesses

ML / Modeling: PyTorch · TensorFlow · scikit-learn · XGBoost

MLOps / LLMOps: MLflow · DVC · GitHub Actions · Docker

Deployment: AWS · FastAPI · Streamlit


📦 MLOps Foundations

  • End-to-end ML pipelines (CI/CD + experiment tracking)
  • Model versioning (DVC + MLflow)
  • Real-time computer vision systems (YOLOv8)
  • Scalable inference APIs (FastAPI + AWS)

🏭 Where I've applied AI across business functions

| Domain | Representative systems |
| --- | --- |
| HR & Talent | Resume screening · Candidate ranking · Attrition prediction |
| Finance | Fraud detection · Cash flow forecasting · Invoice automation |
| Sales & Marketing | Lead scoring · Churn prediction · Customer segmentation |
| Customer Support | Support copilots · Ticket classification · Escalation systems |
| Data & Analytics | NL-to-SQL · Forecasting · Insight generation |
| Legal & Compliance | Contract analysis · Policy enforcement |
| Operations | Workflow automation · Document processing |
| Engineering / Product | Code review · Bug triage · Log analysis |
| Executive / Strategy | KPI dashboards · Scenario planning · Decision support |

✍️ Writing

  • Why Most RAG Systems Fail in Production (And How to Fix Them)
  • Continuous Integration for Data Science: Automating Model Building Pipelines
  • Version Control for Machine Learning Models: Best Practices and Tools
  • Automated Model Testing and Monitoring: The Bedrock of Startup MLOps
  • Taming Data and Model Drift in Startup MLOps
  • Building Scalable Machine Learning Pipelines in Startup Environments

https://medium.com/@tommyadeliyi


🤝 Let’s Connect

If you're building AI systems that need to be:

  • reliable in production
  • measurable and testable
  • safe under real-world conditions

I’m always open to conversations around high-impact AI systems.

Pinned

  1. ai-system-debugger (Public)

    A production-grade AI observability and self-healing system that instruments LLM pipelines, classifies semantic failures using a calibrated ensemble evaluator, performs automated root cause analysi…

    Python 2

  2. customer-support-copilot (Public)

    An AI copilot that helps human support agents resolve tickets faster with agentic classification, hybrid RAG retrieval, grounded response generation, and a five-layer evaluation harness. Runs fully…

    Python

  3. deal-intelligence-for-sales-teams (Public)

    A multi-agent AI revenue intelligence system combining predictive ML, temporal feature engineering, and retrieval-augmented generation to diagnose deal risk, simulate strategy trade-offs, and optim…

    Python

  4. SEC-filings-knowledge-copilot (Public)

    An AI copilot that reads SEC 10-K filings from Apple, Meta, and Microsoft, answering financial queries with multi-agent retrieval, self-correction, and claim-level verification. A comprehensive eva…

    HTML