EPA Air Quality Analysis
BigQuery EPA Air Quality Analysis
Team members:
Javier Orraca
Yiting Fu
Mark Planas
Project Overview
- Dataset:
    - US Historical Air Quality data from the EPA (338 GB and continuously updated), accessible via BigQuery
- Source: Air Quality Data Collected at Outdoor Monitors Across the US
 
- Deliverables:
    - Interactive dashboard (via Dash or Bokeh) of a US map, showing how air quality has changed from 1950 to current date
 
- Interactive dashboard (via Dash or Bokeh) of a US map, showing how air quality has changed from 1950 to current date
- Project:
    - Analyze air quality summary statistics by year at the US county level
- If time permits, build a neural network via TensorFlow or PyTorch, using a pipeline created in Spark / Databricks, and made available through an interactive dashboard
 
- Research Questions:
    - How US air quality changes through time at the county level;
- What factors could cause these changes (potentially researching external data sets for understanding “why” US air quality got better or worse, etc.)
 
BigQuery Data Accessibility
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.epa_historical_air_quality.[TABLENAME].
Code
Please see the Jupyter Notebook for details.