A multi-source data analytics project, comprising an ETL pipeline, a data warehouse, and an interactive dashboard, that examines how primary care shortages correlate with chronic disease burden and preventable hospitalizations across US counties.
Technical Highlights
• Built an end-to-end ETL pipeline integrating 5+ public health datasets from sources including CDC PLACES, County Health Rankings, HRSA, and USDA into Snowflake, using dbt with staging, intermediate, and mart layers.
• Engineered a composite vulnerability scoring model and HPSA severity tiering across 2,957 US counties.
• Identified poverty rate (r = 0.34) and high disease burden (counties in this group averaged 481.9 preventable stays above the national average) as the strongest predictors of preventable hospitalizations, based on Pearson correlation analysis.
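
The scoring and correlation steps in the highlights above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual implementation: the field names (poverty_rate, pcp_per_100k, preventable_stays), the equal-weight z-score composite, the ±0.5 tier cutoffs, and the four toy rows are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy rows; the values are illustrative only, not project data.
counties = [
    {"fips": "00001", "poverty_rate": 10.0, "pcp_per_100k": 90.0, "preventable_stays": 2000.0},
    {"fips": "00002", "poverty_rate": 15.0, "pcp_per_100k": 70.0, "preventable_stays": 2600.0},
    {"fips": "00003", "poverty_rate": 20.0, "pcp_per_100k": 50.0, "preventable_stays": 3100.0},
    {"fips": "00004", "poverty_rate": 25.0, "pcp_per_100k": 30.0, "preventable_stays": 3900.0},
]


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def vulnerability_scores(rows):
    """Assumed equal-weight composite: more poverty and fewer providers raise the score."""
    z_poverty = z_scores([r["poverty_rate"] for r in rows])
    z_providers = z_scores([r["pcp_per_100k"] for r in rows])
    return [(zp - zc) / 2 for zp, zc in zip(z_poverty, z_providers)]


def tier(score, cutoff=0.5):
    """Bucket a composite score into three tiers using a hypothetical cutoff."""
    if score >= cutoff:
        return "high"
    if score <= -cutoff:
        return "low"
    return "moderate"


scores = vulnerability_scores(counties)
tiers = [tier(s) for s in scores]
r_poverty = pearson_r(
    [c["poverty_rate"] for c in counties],
    [c["preventable_stays"] for c in counties],
)

for county, score, label in zip(counties, scores, tiers):
    print(county["fips"], round(score, 3), label)
print("poverty vs. preventable stays: r =", round(r_poverty, 3))
```

Standardizing each input before averaging keeps indicators measured in different units on a comparable scale. A real model would likely use more indicators and weights chosen by the analyst, and a correlation computed on four toy rows says nothing about the r = 0.34 reported above.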