Bioengineer → Computational Biologist → Data & AI
M.Tech in Biotechnology · 8 peer-reviewed publications
Started in wet lab (DNA extractions, PCR, microbial cultures), moved into computational biology, and built an ML portfolio around the methods that matter in both research and industry.
I recognised that data is the common thread across every domain and started going deep into it.
I now work across two tracks: biological data analysis for research roles, and data analytics for industry roles.
Projects built across financial services, healthcare, supply chain, and e-commerce: the problem domains that come up most in data analyst roles at banks, consulting firms, and operations companies.
| Project | What I Built | Key Result |
|---|---|---|
| 🧠 RetailSense AI | AI dashboard: natural language → SQL → chart → insight | Full LLM pipeline; any business question answered in <10 seconds |
| 🏥 Healthcare Operations Analytics | Patient flow & resource analysis across 318,438 admissions | Identified 78.3% departmental concentration & pricing anomaly |
| 📦 Brazilian E-Commerce Supply Chain | Full-stack supply chain analytics on 100K+ real orders | 70.9% Black Friday demand spike undetected by baseline forecast |
M.Tech in Biotechnology · 8 peer-reviewed publications in life sciences · B.Tech Gold Medalist
My research background gave me something most data analysts don't have: the habit of questioning data, understanding statistical significance, and communicating complex findings to non-technical audiences. I'm now applying those same skills to business data.
10 projects built in sequence, starting from fundamentals and working toward multi-omics integration. Each one was built to understand the method, not just run the code.
| Project | What I learned | Techniques |
|---|---|---|
| Heart Disease EDA | How to read a dataset before touching a model | pandas, seaborn, statistical analysis |
| Diabetes Data Cleaning | Real medical data is messy and cleaning it takes longer than modelling | Missing data imputation, IQR outlier capping, feature engineering |
| Cancer Risk Classification | When the simplest model wins and why that is not a failure | Logistic Regression, Random Forest, XGBoost, ROC-AUC |
| Survival Analysis | Time-to-event modelling follows its own logic, distinct from classification | Kaplan-Meier, Cox Proportional Hazards, C-index |
| Customer Segmentation | Finding structure in data without being told what to look for | K-Means, Elbow Method, PCA |
| Gene Expression Clustering | RNA-Seq preprocessing rules and why skipping them breaks everything | Log transformation, hierarchical clustering, heatmaps |
| Explainable AI with SHAP | A model nobody can explain is a model nobody will use | TreeExplainer, beeswarm plots, bootstrap stability |
| Counterfactual Explanations | SHAP tells you why. Counterfactuals tell you what to change | Actionable counterfactuals, diverse CF generation |
| Multi-Modal Data Fusion | Genomic, microbiome, and clinical data together tell a story none can tell alone | Early/late/intermediate fusion, stacking ensemble, ablation study |
| Transfer Learning | When your dataset is small you need a model that borrows knowledge | Neural network pre-training, layer freezing, fine-tuning |
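
To make one of the techniques above concrete, here is a minimal sketch of the IQR outlier capping named in the Diabetes Data Cleaning row. It assumes the common 1.5 × IQR fences; the function name `cap_outliers_iqr` and the sample readings are hypothetical illustrations, not code or data from the project itself.

```python
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip each value to the fences [Q1 - k*IQR, Q3 + k*IQR].

    Capping keeps every row in the dataset but limits the pull of
    extreme values, which matters when records are too scarce to drop.
    """
    # Quartiles of the observed data (inclusive method treats the
    # sample as covering the full range of the population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical glucose-style readings with one implausible spike.
readings = [85, 90, 92, 95, 99, 101, 104, 110, 400]
capped = cap_outliers_iqr(readings)
print(capped)
```

With these readings, Q1 is 92 and Q3 is 104, so the upper fence is 122 and only the 400 reading is pulled down to it. The multiplier `k` is a convention rather than a rule, and a wider fence may suit clinical variables with naturally long tails.
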
| Area | Skills |
|---|---|
| Languages | Python, SQL, R (basic) |
| Data Analytics | pandas, NumPy, matplotlib, seaborn, SciPy, statistics, EDA, data cleaning, feature engineering |
| Machine Learning | scikit-learn, classification, clustering, survival analysis, SHAP explainability, SMOTE, Random Forest, XGBoost, Logistic Regression, transfer learning, counterfactual explanations |
| Visualisation & BI | matplotlib, seaborn, Tableau, Power BI, Jupyter Notebook |
| Bioinformatics | NGS pipelines, RNA-seq, variant analysis (VCF), metagenomics, taxonomic profiling, hierarchical clustering, gene expression analysis |
| Databases & Querying | SQL, DuckDB |
| Reporting | Excel (xlsxwriter, openpyxl), Tableau Public, Power BI dashboards |
| Dev Tools | Git, GitHub, VS Code, virtual environments |
| Wet Lab | DNA/RNA extraction, PCR, qPCR, RT-PCR, microbial isolation, antimicrobial screening, enzymatic assays, protein quantification, metabolite extraction |
| Domain Knowledge | Fraud detection, healthcare operations, supply chain analytics, precision medicine, multi-omics integration |
Fair warning: if you follow me on Letterboxd @manicindisguise you will find out very quickly that I am a huge movie lover (not limiting myself to “cinephile”). Currently obsessed with Project Hail Mary: read the book, watched the movie, both hit. George R.R. Martin said Andy Weir is the writer for you if you like a lot of science in your science fiction, and honestly George was right and I will not be taking questions.
"The goal is to turn data into information, and information into insight." - Carly Fiorina