LET'S TURN DATA INTO IMPACT

Project Overview

A multi-source data analytics project, comprising an ETL pipeline, a data warehouse, and an interactive dashboard, that examines how primary care shortages correlate with chronic disease burden and preventable hospitalizations across US counties.

Technical Highlights

  • Built an end-to-end ETL pipeline integrating 5+ public health datasets (including CDC PLACES, County Health Rankings, HRSA, and USDA) into Snowflake using dbt with staging, intermediate, and mart layers.
  • Engineered composite vulnerability scoring model and HPSA severity tiering across 2,957 US counties.
  • Identified poverty rate (r = 0.34) and high disease burden (counties in that group averaged 481.9 stays above the national average) as the strongest predictors of preventable hospitalizations, using Pearson correlation analysis.
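
The scoring and correlation steps above can be illustrated with a minimal pandas sketch. It is not the project's actual code: the column names, the toy values, and the helper functions `composite_score` and `rank_predictors` are all hypothetical, and the averaged z-score is only one plausible way to build a composite vulnerability score.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only,
# not drawn from the project's Snowflake marts.
counties = pd.DataFrame({
    "poverty_rate": [10.0, 14.0, 18.0, 22.0, 26.0, 30.0],
    "pcp_per_100k": [90.0, 70.0, 75.0, 50.0, 55.0, 30.0],
    "diabetes_prev": [8.0, 9.5, 9.0, 11.0, 12.5, 12.0],
    "preventable_stays": [2500, 2900, 3400, 3300, 4100, 4600],
})


def composite_score(df, higher_is_worse):
    """Average the z-scores of the given columns into one score per county.

    `higher_is_worse` maps column name -> bool. Columns where a higher
    value is protective (False) have their sign flipped, so a larger
    composite score always means a more vulnerable county.
    """
    columns = list(higher_is_worse)
    z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    signs = pd.Series({c: 1.0 if worse else -1.0
                       for c, worse in higher_is_worse.items()})
    return (z * signs).mean(axis=1)


def rank_predictors(df, target, predictors):
    """Pearson r of each predictor with the target, strongest |r| first."""
    r = df[predictors].corrwith(df[target], method="pearson")
    order = r.abs().sort_values(ascending=False).index
    return r.reindex(order)


scores = composite_score(
    counties,
    {"poverty_rate": True, "pcp_per_100k": False, "diabetes_prev": True},
)
ranking = rank_predictors(
    counties,
    target="preventable_stays",
    predictors=["poverty_rate", "pcp_per_100k", "diabetes_prev"],
)
print(ranking)
```

Sorting by absolute value keeps strong negative relationships (for example, more primary care providers alongside fewer preventable stays) near the top of the ranking instead of burying them below weak positive ones.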

Technologies

Snowflake

dbt

SQL

Pandas

Plotly
