LET'S TURN DATA INTO IMPACT

Project Overview

A multi-source data analytics project encompassing an ETL pipeline, a data warehouse, and an interactive dashboard. It examines how primary care shortages correlate with chronic disease burden and preventable hospitalizations across US counties.

Technical Highlights

  • Built end-to-end ETL pipeline integrating 5+ public health datasets from CDC PLACES, County Health Rankings, HRSA, and USDA into Snowflake, using dbt with staging, intermediate, and mart layers.
  • Engineered 10+ derived variables across dbt staging, intermediate, and mart layers; validated data quality with dbt generic tests on staging models.
  • Developed interactive Plotly Dash dashboard visualizing shortage severity, disease burden, and preventable hospitalizations, with a KPI summary row and a custom dark theme.
  • Identified poverty rate (r = 0.33) and high disease burden (counties averaging 482 preventable stays above the national average) as the strongest predictors of preventable hospitalizations.
  • Conducted exploratory data analysis and inferential statistical testing in Python (summary statistics, correlations, t-tests, ANOVA, chi-squared), alongside SQL analyses, to answer 5 research questions across 2,957 US counties.
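
As an illustration of the kind of correlation and t-test analysis mentioned above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the variable names, and all numbers are made-up assumptions for demonstration, and the real analysis may well have used Pandas or a statistics library instead.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical toy values, NOT the project's data.
poverty_rate = [10.0, 12.5, 15.0, 18.0, 22.0, 25.5]        # percent
preventable_stays = [2100, 2600, 2400, 3100, 2900, 3600]   # per 100k enrollees

r = pearson_r(poverty_rate, preventable_stays)

# Hypothetical split of the same counties by shortage designation.
shortage = [3100, 2900, 3600]
no_shortage = [2100, 2600, 2400]
t = welch_t(shortage, no_shortage)

print(f"Pearson r = {r:.2f}")
print(f"Welch t   = {t:.2f}")
```

The sketch reports only the test statistic for the t-test; a p-value would additionally need the t distribution, which a statistics library provides.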

Technologies

  • Snowflake
  • dbt
  • SQL
  • Python
  • Pandas
  • Plotly
