Data Engineer · Azure · Databricks · Microsoft Fabric · ETL & Cloud Pipelines
I build reliable, observable, scalable data systems.
Data Engineer with 5+ years of experience turning fragmented, inconsistent data into reliable platforms across healthcare, retail, and enterprise systems. I focus on the spots where small data inconsistencies have real business impact, and I design pipelines that stay trustworthy as complexity grows.
Currently focused on the move from legacy ETL to modern cloud lakehouse architectures on Azure, Fabric, and Databricks.
📍 Vancouver, Canada · 🇨🇦 Open to remote / hybrid roles
What I work on: end-to-end pipelines (ingestion → reporting) · medallion & dimensional modeling · data quality, validation & monitoring · multi-source integration · cloud lakehouse migrations.
Medallion-architecture pipeline that turns messy, multi-source raw data into validated, observable, ML-ready feature tables.
- Bronze → Silver → Gold with embedded data-quality checks at every layer
- Per-run DQ report (JSON + Markdown) for observability
- ML consumer example showing the DE → ML handoff with scikit-learn
Stack: Python · Pandas · SQL · Medallion · Data Quality
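A minimal sketch of the embedded-checks idea above, in pandas. Function and column names here are illustrative assumptions, not the project's actual schema or code:

```python
import pandas as pd


def run_dq_checks(df: pd.DataFrame, required: list[str], key: str) -> dict:
    """Run layer-level data-quality checks and return a per-run report dict.

    Checks: non-empty layer, no nulls in required columns, no duplicate keys.
    The report dict can be dumped to JSON (or rendered as Markdown) per run.
    """
    report = {
        "row_count": len(df),
        "null_counts": {col: int(df[col].isna().sum()) for col in required},
        "duplicate_keys": int(df.duplicated(subset=[key]).sum()),
    }
    report["passed"] = (
        report["row_count"] > 0
        and all(count == 0 for count in report["null_counts"].values())
        and report["duplicate_keys"] == 0
    )
    return report
```

Running a check like this at each Bronze → Silver → Gold boundary is what makes the pipeline observable: a failed layer produces a report that says which invariant broke, instead of silently propagating bad rows downstream.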
End-to-end data warehouse on real GTFS transit data using a medallion architecture.
- Bronze → Silver → Gold layers with embedded data-quality checks
- Handled domain edge cases like GTFS times beyond 24:00
- Dimensional models built for time-based ridership analysis
Stack: Python · SQL · PySpark · Medallion architecture
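On the "times beyond 24:00" edge case: GTFS `stop_times` uses clock values past 24:00:00 for trips that continue after midnight (e.g. `25:30:00` is 1:30 AM on the next service day). A small sketch of handling this; the function names are my own, not the project's:

```python
def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS HH:MM:SS string (hours may exceed 23) to seconds past midnight.

    Standard datetime parsers reject hours > 23, so GTFS times need manual parsing.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def normalize_gtfs_time(value: str) -> tuple[str, int]:
    """Split an over-24h GTFS time into a wall-clock time plus a service-day offset."""
    total = parse_gtfs_time(value)
    day_offset, remainder = divmod(total, 86400)  # 86400 seconds per day
    hours, rest = divmod(remainder, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}", day_offset
```

Keeping the raw seconds-past-midnight value in the Silver layer preserves correct trip ordering, while the normalized form feeds the time dimensions used for ridership analysis.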
Multi-region retail lakehouse with unified customer / product / sales models.
- Standardized ingestion & transformation across regions
- Consistent datasets for scalable Power BI reporting
Stack: Microsoft Fabric · OneLake · Power BI · Medallion
Medallion-based pipeline using Delta Lake + Unity Catalog.
- Governed access, scalable processing, reusable transformations
Stack: Databricks · Delta Lake · Unity Catalog · PySpark
Airflow + Spark + AWS Pipeline
Containerized ETL reflecting production patterns: orchestration, retries, and scheduling.
Stack: Apache Airflow · Spark · AWS S3 · Docker
- ✅ Microsoft Certified: Azure Data Fundamentals (DP-900)
- 📚 In progress: Microsoft Fabric Data Engineer (DP-700)
- 🔧 Currently building hands-on lakehouse projects on Fabric & Databricks
I'm open to Data Engineering roles (full-time, contract, or remote). Reach out if you're hiring, collaborating, or just want to talk pipelines.
LinkedIn Β· Portfolio Β· Email
Build systems that remain reliable as complexity grows.


