Bita Ashoori 💼 Data Engineering Portfolio
I'm a Data Engineer based in Vancouver with over 5 years of experience across data engineering, business intelligence, and analytics. I specialize in building clean, cloud-native data pipelines and automating workflows that help organizations turn raw data into smart decisions. I have 3+ years of experience building and maintaining cloud-based pipelines and 2+ years as a BI/ETL Developer. I'm skilled in Python, SQL, Apache Airflow, AWS (S3, Lambda, Redshift), and modern orchestration techniques.
💻 Explore my work on GitHub
🔗 Connect with me on LinkedIn
📧 Contact me at bitaashoori20@gmail.com
📄 Download My Resume
Scenario: Legacy jobs scheduled through Windows Task Scheduler were brittle, hard to monitor, and tied to a single machine.
🔗 GitHub Repo
Solution: Migrated legacy Windows Task Scheduler jobs into modular Airflow DAGs with Docker and AWS S3.
✅ Potential Impact: Could reduce manual errors by up to 50% and improve job monitoring and reliability in real-world environments.
🧰 Stack: Python, Apache Airflow, Docker, AWS S3
🧪 Tested On: Local Docker, GitHub Codespaces
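As a minimal sketch, this is roughly what one migrated job looks like as an Airflow DAG (assuming Airflow 2.4+); the DAG id, schedule, bucket, and file path are illustrative placeholders, not the repo's actual code:

```python
from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def upload_report_to_s3() -> None:
    # The upload step that previously ran under Task Scheduler.
    s3 = boto3.client("s3")
    s3.upload_file("/tmp/nightly_report.csv", "example-bucket",
                   "reports/nightly_report.csv")


with DAG(
    dag_id="nightly_report_migration",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # same cadence as the old scheduled task
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    PythonOperator(
        task_id="upload_report_to_s3",
        python_callable=upload_report_to_s3,
    )
```

Retries and `retry_delay` give each job the automatic recovery that Task Scheduler lacked, and the Airflow UI adds per-run visibility for free.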
Scenario: Businesses needed faster feedback loops from ad campaigns to optimize performance and engagement.
🔗 GitHub Repo
Solution: Simulates real-time ingestion of campaign data, transforming and storing insights using PySpark and Delta Lake.
✅ Potential Impact: May reduce reporting lag from 24 hours to 1 hour, enabling faster marketing insights and campaign optimization.
🧰 Stack: PySpark, Databricks, GitHub Actions, AWS S3
🧪 Tested On: Databricks Community Edition, GitHub CI/CD
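A minimal sketch of the ingestion pattern, assuming Databricks (or any Spark session with delta-spark configured); the paths, schema, and window sizes are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("campaign-ingest").getOrCreate()

# Stream campaign events as they land in S3 as JSON files.
events = (
    spark.readStream
    .format("json")
    .schema("campaign_id STRING, clicks LONG, impressions LONG, event_ts TIMESTAMP")
    .load("s3://example-bucket/raw/campaign_events/")
)

# Windowed engagement metrics per campaign; the watermark bounds state
# by dropping events more than 10 minutes late.
metrics = (
    events
    .withWatermark("event_ts", "10 minutes")
    .groupBy(F.window("event_ts", "5 minutes"), "campaign_id")
    .agg(F.sum("clicks").alias("clicks"),
         F.sum("impressions").alias("impressions"))
    .withColumn("ctr", F.col("clicks") / F.col("impressions"))
)

# Append the aggregates to a Delta table for downstream reporting.
(
    metrics.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/campaign_metrics/")
    .start("s3://example-bucket/delta/campaign_metrics/")
)
```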
Scenario: Legacy workflows lacked observability, scalability, and centralized monitoring, all critical for modern data teams.
🔗 GitHub Repo
Solution: Built a scalable and maintainable ETL pipeline for structured data movement from APIs to Redshift with alerting via CloudWatch.
✅ Potential Impact: Should improve troubleshooting efficiency by ~30% with enhanced logging and monitoring practices.
🧰 Stack: Apache Airflow, AWS Redshift, CloudWatch
🧪 Tested On: AWS Free Tier, Docker
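The alerting half can be as small as an Airflow failure callback that publishes a custom CloudWatch metric for an alarm to watch; this is a sketch, with the namespace and metric name made up for illustration:

```python
import boto3


def emit_failure_metric(context) -> None:
    # Airflow passes the task context to on_failure_callback hooks.
    boto3.client("cloudwatch").put_metric_data(
        Namespace="DataPipelines",  # illustrative namespace
        MetricData=[{
            "MetricName": "TaskFailure",
            "Dimensions": [
                {"Name": "dag_id", "Value": context["dag"].dag_id},
                {"Name": "task_id", "Value": context["task_instance"].task_id},
            ],
            "Value": 1.0,
            "Unit": "Count",
        }],
    )
```

Wiring `emit_failure_metric` into a DAG's `default_args` as `on_failure_callback` means every failed task increments the metric, and a CloudWatch alarm on it handles the paging.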
Scenario: Healthcare projects using FHIR data require a clean, structured pipeline to support downstream analytics and ML.
🔗 GitHub Repo
Solution: Processes synthetic healthcare records in FHIR JSON format and converts them into clean, queryable relational tables.
✅ Potential Impact: Designed to reduce preprocessing time by 60% and prepare healthcare data for analytics and ML workloads.
🧰 Stack: Python, Pandas, Synthea, SQLite, Streamlit
🧪 Tested On: Local + Streamlit + BigQuery-compatible
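A minimal sketch of the flattening step for one resource type, assuming Synthea's default FHIR bundle output; the directory and the selected fields are illustrative:

```python
import json
import sqlite3
from pathlib import Path

import pandas as pd

rows = []
for bundle_path in Path("output/fhir").glob("*.json"):
    bundle = json.loads(bundle_path.read_text())
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        if resource.get("resourceType") == "Patient":
            name = resource.get("name", [{}])[0]
            rows.append({
                "patient_id": resource.get("id"),
                "family_name": name.get("family"),
                "gender": resource.get("gender"),
                "birth_date": resource.get("birthDate"),
            })

# One relational table per resource type; the same pattern repeats for
# Encounter, Condition, Observation, and so on.
with sqlite3.connect("fhir.db") as conn:
    pd.DataFrame(rows).to_sql("patients", conn, if_exists="replace", index=False)
```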
Scenario: Manual job tracking and lead sourcing are time-consuming and unscalable.
🔗 GitHub Repo
Solution: Automates job scraping from LinkedIn using serverless AWS Lambda and stores structured output in S3.
✅ Potential Impact: Can automate job scraping workflows and enable structured job search analysis without manual effort.
🧰 Stack: AWS Lambda, EventBridge, BeautifulSoup, S3, CloudWatch
🧪 Tested On: AWS Free Tier
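A hedged sketch of the handler shape, assuming `requests` and `beautifulsoup4` are bundled in the deployment package; the URL, CSS selector, and bucket are placeholders (real LinkedIn pages sit behind auth and rate limiting, so the live repo handles this more carefully):

```python
import json
from datetime import datetime, timezone

import boto3
import requests
from bs4 import BeautifulSoup

BUCKET = "example-job-scraper-bucket"                    # placeholder
SEARCH_URL = "https://example.com/jobs?q=data+engineer"  # placeholder


def handler(event, context):
    # EventBridge invokes this on a schedule: scrape, parse, persist.
    html = requests.get(SEARCH_URL, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    jobs = [
        {"title": card.get_text(strip=True)}
        for card in soup.select("h3.job-title")  # placeholder selector
    ]

    key = f"jobs/{datetime.now(timezone.utc):%Y-%m-%d}/results.json"
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(jobs).encode("utf-8"),
        ContentType="application/json",
    )
    return {"count": len(jobs), "key": key}
```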
Scenario: Businesses need scalable ETL systems to process large sales datasets for timely business intelligence reporting.
🔗 GitHub Repo
Solution: A production-ready PySpark ETL that ingests and transforms high-volume sales data into Delta Lake for BI.
✅ Potential Impact: Built to cut transformation runtimes by 40% and improve sales reporting accuracy through Delta Lake optimization.
🧰 Stack: PySpark, Delta Lake, AWS S3
🧪 Tested On: Local Databricks + S3
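A minimal sketch of the batch transform, assuming a Spark session with delta-spark available; the paths, columns, and partitioning choice are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

# Ingest raw CSV sales extracts from S3.
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/sales/")

# Type and clean the data, then roll up to daily store-level revenue.
daily = (
    raw
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("double"))
    .dropna(subset=["order_date", "store_id", "amount"])
    .groupBy("order_date", "store_id")
    .agg(F.sum("amount").alias("revenue"))
)

# Partitioned Delta output keeps date-bounded BI queries cheap.
(
    daily.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://example-bucket/delta/daily_sales/")
)
```

Partitioning by `order_date` is the kind of Delta-side optimization the impact claim refers to: reporting queries that filter on date skip the rest of the table.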