Shaflovescoffee19/README.md

Hi, I'm Mohiraa

Bioengineer → Computational Biologist → Data & AI

M.Tech in Biotechnology · 8 peer-reviewed publications

Started in wet lab (DNA extractions, PCR, microbial cultures), moved into computational biology, and built an ML portfolio around the methods that matter in both research and industry.

I recognised that data is the common thread across every domain and started going deep into it.

I now work across two tracks: biological data analysis for research roles, and data analytics for industry roles.

LinkedIn · Google Scholar


Industry Analytics Projects

Built across financial services, healthcare, supply chain, and e-commerce: the problem domains that come up most in data analyst roles at banks, consulting firms, and operations companies.

| Project | What I Built | Key Result |
| --- | --- | --- |
| 🧠 RetailSense AI | AI dashboard: natural language → SQL → chart → insight | Full LLM pipeline; any business question answered in <10 seconds |
| 🏥 Healthcare Operations Analytics | Patient flow & resource analysis across 318,438 admissions | Identified 78.3% departmental concentration & pricing anomaly |
| 📦 Brazilian E-Commerce Supply Chain | Full-stack supply chain analytics on 100K+ real orders | 70.9% Black Friday demand spike undetected by baseline forecast |

🎓 Background

M.Tech Biotechnology · 8 peer-reviewed publications in life sciences · B.Tech Gold Medalist

My research background gave me something most data analysts don't have: the habit of questioning data, understanding statistical significance, and communicating complex findings to non-technical audiences. I'm now applying those same skills to business data.

Computational Biology & ML Portfolio

10 projects built in sequence, starting from fundamentals and working toward multi-omics integration. Each one was built to understand the method, not just run the code.

| Project | What I Learned | Techniques |
| --- | --- | --- |
| Heart Disease EDA | How to read a dataset before touching a model | pandas, seaborn, statistical analysis |
| Diabetes Data Cleaning | Real medical data is messy, and cleaning it takes longer than modelling | Missing data imputation, IQR outlier capping, feature engineering |
| Cancer Risk Classification | When the simplest model wins and why that is not a failure | Logistic Regression, Random Forest, XGBoost, ROC-AUC |
| Survival Analysis | Time-to-event modelling has its own logic, distinct from classification | Kaplan-Meier, Cox Proportional Hazards, C-index |
| Customer Segmentation | Finding structure in data without being told what to look for | K-Means, Elbow Method, PCA |
| Gene Expression Clustering | RNA-Seq preprocessing rules and why skipping them breaks everything | Log transformation, hierarchical clustering, heatmaps |
| Explainable AI with SHAP | A model nobody can explain is a model nobody will use | TreeExplainer, beeswarm plots, bootstrap stability |
| Counterfactual Explanations | SHAP tells you why. Counterfactuals tell you what to change | Actionable counterfactuals, diverse CF generation |
| Multi-Modal Data Fusion | Genomic, microbiome, and clinical data together tell a story none can tell alone | Early/late/intermediate fusion, stacking ensemble, ablation study |
| Transfer Learning | When your dataset is small, you need a model that borrows knowledge | Neural network pre-training, layer freezing, fine-tuning |
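
For anyone unfamiliar with the IQR outlier capping listed under Diabetes Data Cleaning, here is a minimal sketch of the idea. It is not the code from that project: the function name `cap_outliers_iqr`, the conventional 1.5 multiplier, and the sample values are all assumptions chosen for illustration, and it uses only the Python standard library rather than pandas.

```python
from statistics import quantiles


def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them.

    Hypothetical helper for illustration; k=1.5 is the common rule of thumb.
    """
    # Quartiles computed with the "inclusive" method (data treated as the full population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Capping keeps the row but pulls extreme values back to the fence.
    return [min(max(v, lower), upper) for v in values]


# Made-up readings with one implausible spike.
readings = [10, 12, 11, 13, 12, 11, 100]
capped = cap_outliers_iqr(readings)
```

Capping rather than deleting keeps every record in the dataset, which can matter when the cohort is small; whether that trade-off is right depends on the data.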

Technical Skills

| Area | Skills |
| --- | --- |
| Languages | Python, SQL, R (basic) |
| Data Analytics | pandas, NumPy, matplotlib, seaborn, SciPy, statistics, EDA, data cleaning, feature engineering |
| Machine Learning | scikit-learn, classification, clustering, survival analysis, SHAP explainability, SMOTE, Random Forest, XGBoost, Logistic Regression, transfer learning, counterfactual explanations |
| Visualisation & BI | matplotlib, seaborn, Tableau, Power BI, Jupyter Notebook |
| Bioinformatics | NGS pipelines, RNA-seq, variant analysis (VCF), metagenomics, taxonomic profiling, hierarchical clustering, gene expression analysis |
| Databases & Querying | SQL, DuckDB |
| Reporting | Excel (xlsxwriter, openpyxl), Tableau Public, Power BI dashboards |
| Dev Tools | Git, GitHub, VS Code, virtual environments |
| Wet Lab | DNA/RNA extraction, PCR, qPCR, RT-PCR, microbial isolation, antimicrobial screening, enzymatic assays, protein quantification, metabolite extraction |
| Domain Knowledge | Fraud detection, healthcare operations, supply chain analytics, precision medicine, multi-omics integration |

Fair warning: if you follow me on Letterboxd @manicindisguise you will find out very quickly that I am a huge movie lover (not limiting myself to “cinephile”). Currently obsessed with Project Hail Mary: read the book, watched the movie, both hit. George R.R. Martin said Andy Weir is the writer for you if you like a lot of science in your science fiction, and honestly George was right and I will not be taking questions.

"The goal is to turn data into information, and information into insight." - Carly Fiorina

Pinned

  1. healthcare-operations-analytics Public

    Healthcare ops analytics: End-to-end analysis of 318K hospital admissions to surface operational inefficiencies · Python · Power BI · Excel

    Jupyter Notebook

  2. retailsense-ai Public

    AI-powered retail analytics dashboard: ask business questions in plain English, get SQL, charts & executive insights instantly

    Python

  3. brazilian-ecommerce-supply-chain Public

    Supply chain analytics on 100K+ real Brazilian e-commerce orders. Inventory turnover, stockout risk, seller lead times, and demand forecasting gaps. Python, SQL, Tableau, Excel.

    Python

  4. fraud-detection-portfolio Public

    Credit card fraud detection | Random Forest 0.97 AUC | Python, scikit-learn, SMOTE, pandas

    Python

  5. transfer-learning Public

    Applies transfer learning to overcome limited CRC cohort data by pre-training a neural network on 2,000-patient pan-cancer data, then fine-tuning on 200 target patients

    Python

  6. gene-expression-clustering Public

    Discovers cancer subtypes from simulated RNA-Seq data using log transformation, variance-based feature selection, PCA, hierarchical clustering, and clustered heatmaps. Implements the core computati…

    Python