I specialize in designing and deploying enterprise-grade Generative AI and LLM systems. I currently work as an Applied AI Scientist, bridging the gap between cutting-edge research and production-scale security applications.
- 🔬 Model Distillation: Successfully distilled Claude 3.5 Sonnet into a local Qwen3-0.6B model, achieving a 300× latency reduction.
- 🛡️ AI for Security: Invented StackPrint (patent pending), a system clustering 3.5M+ vulnerability records with 87.5% coverage.
- 🚀 Distributed Training: Multi-GPU optimization using DeepSpeed, FSDP, and Megatron-LM.
- 🛠️ Infrastructure: Scalable AI via Docker, Kubernetes, and high-throughput inference APIs.

| ML / AI | LLM Ops | Cloud & Data |
|---|---|---|

https://github.com/ambuj991/rufus
A Python package for intelligent web extraction tailored for RAG pipelines.
- Natural language aware web data extraction
- Designed for LLM-powered retrieval pipelines
- Compatible with LangChain & LlamaIndex
- Supports headless rendering for complex websites
📦 Package available on TestPyPI:
https://test.pypi.org/project/rufus-ai-web-extraction/

StackPrint (patent pending): an enterprise deployment-template detection system.
- Clustered 3.5M+ vulnerability records
- Identified provisioning groups automatically
- Reduced manual patch management by 96.3%

LLM distillation for local inference.
- Distilled Claude 3.5 Sonnet → Qwen3-0.6B
- Trained using Unsloth-accelerated LoRA
- Achieved 93.5% task accuracy
- Replaced expensive cloud API inference
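
The data side of a teacher-to-student distillation like the one listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual pipeline: `toy_teacher` and `toy_student` are hypothetical stand-ins for the teacher API and the local student model, the `prompt`/`completion` field names are illustrative, and the LoRA fine-tuning step itself is omitted. Scoring the student by agreement with the teacher is only one plausible metric; how the reported task accuracy was measured is not stated here.

```python
import json
from typing import Callable, Iterable


def build_distillation_records(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
) -> list[dict]:
    """Label each prompt with the teacher's answer to form student training pairs."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def agreement_accuracy(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
    student: Callable[[str], str],
) -> float:
    """Fraction of prompts where the student's answer matches the teacher's."""
    pairs = [(teacher(p), student(p)) for p in prompts]
    if not pairs:
        raise ValueError("need at least one prompt")
    return sum(t == s for t, s in pairs) / len(pairs)


# Hypothetical stand-ins: a real setup would call the teacher API
# and the fine-tuned local student model instead.
def toy_teacher(prompt: str) -> str:
    return "high" if "critical" in prompt else "low"


def toy_student(prompt: str) -> str:
    return "high" if "critical" in prompt or "urgent" in prompt else "low"


if __name__ == "__main__":
    train_prompts = ["critical RCE in web server", "minor UI typo"]
    # Teacher-labelled pairs that a fine-tuning job could consume.
    print(to_jsonl(build_distillation_records(train_prompts, toy_teacher)))

    eval_prompts = train_prompts + ["urgent cosmetic fix", "critical auth bypass"]
    # The toy student disagrees with the teacher on one of four prompts.
    print(agreement_accuracy(eval_prompts, toy_teacher, toy_student))
```

Passing the teacher and student in as plain callables keeps the sketch independent of any particular API client or inference runtime.
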
graph LR
A[Applied AI] --> B[Model Optimization]
A --> C[Scalable RAG]
B --> B1[Quantization]
B --> B2[LoRA Fine-tuning]
B --> B3[Distillation]
C --> C1[VectorDBs]
C --> C2[Knowledge Graphs]
A --> D[MLOps]
D --> D1[Distributed Training]
D --> D2[Inference Serving]
style A fill:#2196F3,stroke:#fff,stroke-width:2px,color:#fff

- University of Cincinnati: M.S. Computer Science (NLP & ML), GPA: 3.8/4.0
- SVKM's NMIMS University: B.Tech Electronics & Telecommunication, Mumbai, India

