portfolio

Bita Ashoori

💼 Data Engineering Portfolio

Designing scalable, cloud-native data pipelines that power decision-making across healthcare, retail, and public services.


About Me

I’m a Data Engineer based in Vancouver with over 5 years of experience spanning data engineering, business intelligence, and analytics. I specialize in designing cloud-native ETL/ELT pipelines and automating data workflows that transform raw data into actionable insights.

My background includes work across healthcare, retail, and public-sector environments, where I’ve delivered scalable and reliable data solutions. With 3+ years building cloud data pipelines and 2+ years as a BI/ETL Developer, I bring strong expertise in Python, SQL, Apache Airflow, and AWS (S3, Lambda, Redshift).

I’m currently expanding my skills in Azure and Databricks, focusing on modern data stack architectures (including Delta Lake, Medallion design, and real-time streaming) to build next-generation data platforms that drive performance, reliability, and business value.

Contact Me

GitHub   LinkedIn   Resume


Project Highlights

🛒 Azure ADF Retail Pipeline

Scenario: Retail organizations needed an automated cloud data pipeline to consolidate and analyze sales data from multiple regions.
📎 View GitHub Repo
Solution: Developed a cloud-native ETL pipeline using Azure Data Factory that ingests, transforms, and loads retail sales data from on-prem SQL Server to Azure Data Lake and Azure SQL Database. Implemented parameterized pipelines, incremental data loads, and monitoring through ADF logs.
✅ Impact: Improved reporting efficiency by 45%, automated data refresh cycles, and reduced manual dependencies.
🧰 Stack: Azure Data Factory · Azure SQL Database · Blob Storage · Power BI
🧪 Tested On: Azure Free Tier + GitHub Codespaces

Azure ADF Retail Pipeline Diagram
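
The incremental data loads mentioned above are commonly built on a high-watermark pattern: remember the latest modification timestamp already loaded, and on the next run copy only rows newer than it. The following is a minimal plain-Python sketch of that idea, not the pipeline's actual ADF configuration. The `modified_at` column, the sample rows, and the `incremental_rows` helper are assumptions made for illustration.

```python
from datetime import datetime


def incremental_rows(rows, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    new_rows = [r for r in rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((r["modified_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


# Hypothetical rows standing in for the on-prem SQL Server sales table.
source = [
    {"sale_id": 1, "region": "West", "amount": 120.0, "modified_at": datetime(2024, 1, 1)},
    {"sale_id": 2, "region": "East", "amount": 80.0, "modified_at": datetime(2024, 1, 2)},
    {"sale_id": 3, "region": "West", "amount": 45.5, "modified_at": datetime(2024, 1, 3)},
]

# The previous run loaded everything up to and including 2024-01-01.
batch, watermark = incremental_rows(source, datetime(2024, 1, 1))
print([r["sale_id"] for r in batch], watermark.date())  # [2, 3] 2024-01-03
```

Running the function again with the returned watermark yields an empty batch, which is what keeps repeated runs from reloading the same rows.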


🏗️ End-to-End Data Pipeline with Databricks

Scenario: Designed and implemented a complete end-to-end ETL pipeline in Azure Databricks, applying the Medallion Architecture (Bronze → Silver → Gold) to build a modern data lakehouse for analytics.
📎 View GitHub Repo
Solution: Developed a multi-layer Delta Lake pipeline to ingest, cleanse, and aggregate retail data using PySpark and SQL within Databricks notebooks. Implemented data quality rules, incremental MERGE operations, and created analytical views for dashboards.
✅ Impact: Improved data reliability and reduced transformation latency by enabling efficient, governed, and automated data processing in the Databricks ecosystem.
🧰 Stack: Azure Databricks, Delta Lake, PySpark, Spark SQL, Unity Catalog, Power BI Cloud
🧪 Tested On: Azure Databricks Community Edition + GitHub Codespaces

Databricks Lakehouse Pipeline Diagram
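
To make the Bronze → Silver → Gold flow concrete, here is a small plain-Python sketch of what a quality-gated MERGE (upsert) does: matched keys are updated, unmatched keys are inserted, and rows that fail a quality rule are set aside. It stands in for Delta Lake's MERGE rather than reproducing the notebook code. The `order_id` key, the validity rule, and the sample rows are all hypothetical.

```python
def is_valid(row):
    """Illustrative data quality rule: a key is present and amount is a non-negative number."""
    amount = row.get("amount")
    return row.get("order_id") is not None and isinstance(amount, (int, float)) and amount >= 0


def merge_into(target, updates, key="order_id"):
    """Upsert `updates` into `target` (a dict keyed by `key`) and return the rejected rows."""
    rejected = []
    for row in updates:
        if not is_valid(row):
            rejected.append(row)
            continue
        # Matched key: overwrite changed fields. Unmatched key: insert the row.
        target[row[key]] = {**target.get(row[key], {}), **row}
    return rejected


# Silver table before the batch arrives (hypothetical data).
silver = {1: {"order_id": 1, "amount": 10.0, "status": "new"}}

bronze_batch = [
    {"order_id": 1, "amount": 12.5, "status": "shipped"},  # matched, so updated
    {"order_id": 2, "amount": 30.0, "status": "new"},      # not matched, so inserted
    {"order_id": 3, "amount": -5.0, "status": "new"},      # fails the quality rule
]
rejected = merge_into(silver, bronze_batch)

# Gold: a small aggregate of the kind an analytical view might expose.
gold_total = sum(r["amount"] for r in silver.values())
print(sorted(silver), gold_total, len(rejected))  # [1, 2] 42.5 1
```

Keeping rejected rows instead of silently dropping them makes it possible to report on quality failures separately from the clean Silver data.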


☁️ Cloud ETL Modernization

Scenario: Legacy workflows lacked observability, scalability, and centralized monitoring.
📎 View GitHub Repo
Solution: Built scalable ETL from APIs to Redshift with Airflow orchestration and CloudWatch alerting; standardized schemas and error handling.
✅ Impact: ~30% faster troubleshooting via unified logging/metrics; more consistent SLAs.
🧰 Stack: Apache Airflow, AWS Redshift, CloudWatch
🧪 Tested On: AWS Free Tier, Docker

Cloud ETL Diagram


🛠️ Airflow AWS Modernization

Scenario: Legacy Windows Task Scheduler jobs needed modernization for reliability and observability.
📎 View GitHub Repo
Solution: Migrated jobs into modular Airflow DAGs containerized with Docker, storing artifacts in S3 and standardizing logging/retries.
✅ Impact: Up to 50% reduction in manual errors and improved job monitoring/alerting.
🧰 Stack: Python, Apache Airflow, Docker, AWS S3
🧪 Tested On: Local Docker, GitHub Codespaces

Airflow AWS Diagram
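
In Airflow, standardized retries are usually declared per task through arguments such as `retries`, `retry_delay`, and `retry_exponential_backoff`. The sketch below shows the same behavior in plain Python so it can run without an Airflow installation. It is an illustration under assumptions, not the DAG code from the repo: `run_with_retries`, `export_report`, and the log format are hypothetical names chosen for this example.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")
log = logging.getLogger("jobs")


def run_with_retries(job, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run job(); on failure, log the error and retry with exponential backoff."""
    for attempt in range(1, retries + 2):  # one initial try plus `retries` retries
        try:
            result = job()
            log.info("job=%s attempt=%d status=success", job.__name__, attempt)
            return result
        except Exception as exc:
            log.warning("job=%s attempt=%d status=failed error=%s", job.__name__, attempt, exc)
            if attempt > retries:
                raise  # retries exhausted: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Hypothetical flaky job: fails twice, then succeeds.
calls = {"n": 0}


def export_report():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "report.csv"


delays = []  # record the delays instead of actually sleeping
result = run_with_retries(export_report, retries=3, base_delay=1.0, sleep=delays.append)
print(result, delays)  # report.csv [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the backoff schedule easy to inspect. Every attempt is logged in the same key=value format, which is the point of standardizing logging across jobs.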


⚡ Real-Time Marketing Pipeline

Scenario: Marketing teams need faster feedback loops from ad campaigns to optimize spend and performance.
📎 View GitHub Repo
Solution: Simulated real-time ingestion of campaign data with PySpark + Delta patterns for incremental insights.
✅ Impact: Reduced reporting lag from 24h → ~1h, enabling quicker optimization cycles.
🧰 Stack: PySpark, Databricks, GitHub Actions, AWS S3
🧪 Tested On: Databricks Community Edition, GitHub CI/CD

Real-Time Marketing Pipeline Diagram


🎮 Real-Time Player Pipeline

Scenario: Gaming companies need real-time analytics on player activity to optimize engagement and retention.
📎 View GitHub Repo

✅ Impact: Reduced reporting lag from hours → seconds for live ops insights.
🧰 Stack: Kafka / AWS Kinesis, Airflow, S3, Spark

Real-Time Player Pipeline Diagram


📈 PySpark Sales Pipeline

Scenario: Enterprises need scalable ETL for large sales datasets to drive timely BI and planning.
📎 View GitHub Repo
Solution: Production-style PySpark ETL to ingest/transform into Delta Lake with partitioning and optimization.
✅ Impact: ~40% faster transformations and improved reporting accuracy with Delta optimizations.
🧰 Stack: PySpark, Delta Lake, AWS S3
🧪 Tested On: Local Databricks + S3

PySpark Sales Pipeline Diagram
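
Partitioning speeds up queries because a filter on the partition column only has to touch the matching partition. Spark's `partitionBy` writes data into `column=value` directories. The plain-Python sketch below mimics that layout with dictionary keys to show the effect. The `sale_date` column, the helper names, and the sample rows are assumptions for illustration, not the repo's schema.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Group rows under Hive-style partition keys such as 'sale_date=2024-01-01'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)


def read_partition(partitions, column, value):
    """Partition pruning: read only the partition that matches the filter."""
    return partitions.get(f"{column}={value}", [])


# Hypothetical sales rows.
sales = [
    {"sale_date": "2024-01-01", "store": "A", "amount": 100.0},
    {"sale_date": "2024-01-01", "store": "B", "amount": 50.0},
    {"sale_date": "2024-01-02", "store": "A", "amount": 75.0},
]

partitions = partition_by(sales, "sale_date")
jan_first = read_partition(partitions, "sale_date", "2024-01-01")
print(sorted(partitions), sum(r["amount"] for r in jan_first))
# ['sale_date=2024-01-01', 'sale_date=2024-01-02'] 150.0
```

A query for a single day reads two rows here instead of three. On a large dataset the same principle lets the engine skip most of the files.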


🏥 FHIR Healthcare Pipeline

Scenario: Healthcare projects using FHIR require clean, analytics-ready datasets while preserving clinical context.
📎 View GitHub Repo

✅ Impact: Cut preprocessing time by ~60%; improved data quality.
🧰 Stack: Python, Pandas, SQLite, Streamlit

FHIR Pipeline Diagram
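
FHIR resources are nested JSON, so a typical preprocessing step is flattening each resource into a single row that a DataFrame or SQL table can hold. The sketch below does this for a FHIR Patient resource using only standard Patient fields (`id`, `name`, `gender`, `birthDate`). It is a hedged illustration, not the project's actual transformation: the `flatten_patient` helper, the output column names, and the sample patient are assumptions.

```python
def flatten_patient(resource):
    """Flatten a FHIR Patient resource into one analytics-friendly row."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # `name` is a list in FHIR; this sketch keeps only the first entry.
    name = (resource.get("name") or [{}])[0]
    return {
        "patient_id": resource.get("id"),  # retained so rows can be traced back to the source
        "family_name": name.get("family"),
        "given_name": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


# Hypothetical Patient resource.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "name": [{"use": "official", "family": "Chalmers", "given": ["Peter", "James"]}],
    "gender": "male",
    "birthDate": "1974-12-25",
}

row = flatten_patient(patient)
print(row["patient_id"], row["family_name"], row["given_name"], row["birth_date"])
# example-1 Chalmers Peter James 1974-12-25
```

Rows in this shape can be passed straight to `pandas.DataFrame` or inserted into SQLite. Missing fields come through as `None` rather than raising, which keeps incomplete records visible for quality checks.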


🚀 Real-Time Event Processing with AWS Kinesis, Glue & Athena

Scenario: Simulated a real-time clickstream pipeline where user interaction events are sent to AWS Kinesis, processed with Glue, and queried in Athena.

🧰 Stack: Python • AWS Kinesis • AWS Glue • AWS Athena • S3
✅ Impact: Built a reusable pattern for clickstream and analytics pipelines.

Kinesis Glue Athena Diagram


🔍 LinkedIn Scraper (Lambda)

Scenario: Manual job tracking is slow and error-prone for candidates and recruiters.
📎 View GitHub Repo

✅ Impact: Automated lead sourcing and job search analytics.
🧰 Stack: AWS Lambda, EventBridge, BeautifulSoup, S3

LinkedIn Scraper Diagram

