R scripts to generate city-level forecasts
This repository contains R code to generate city-level forecasts for submission to the flu-metrocast hub. All models are exploratory/preliminary, though we will regularly update this document to describe the latest mathematical model used in the submission.
All outputs submitted to the Hub will be archived in this repository, along with additional model metadata (such as the model definition associated with a submission and details on any additional data sources used or decisions made in the submission process). If significant changes to the model are made during submission, we will rename the model in the submission file.
Since the Hub now solicits both local-level and aggregate (typically state-level) forecasts of the specified target (typically the percent of ED visits due to flu), we will:
- run each set of local jurisdictions within an aggregate location jointly
- run each aggregate location (typically state) independently using the aggregate data
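As a rough illustration of this split, the base R sketch below groups a location table into model-fitting units: one joint unit per set of local jurisdictions and one stand-alone unit per aggregate location. The column names (`location`, `agg_location`, `is_aggregate`), the helper `make_fit_units()` and the example locations are all hypothetical and are not taken from the Hub's location table.

```r
# Hypothetical location table; column names and locations are illustrative only.
locations <- data.frame(
  location     = c("city_a1", "city_a2", "state_a",
                   "city_b1", "city_b2", "city_b3", "state_b"),
  agg_location = c(rep("state_a", 3), rep("state_b", 4)),
  is_aggregate = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a named list of model-fitting units.
# Local jurisdictions sharing an aggregate location are fit jointly;
# each aggregate location is fit on its own.
make_fit_units <- function(locations) {
  local <- locations[!locations$is_aggregate, ]
  agg <- locations[locations$is_aggregate, ]
  joint_units <- split(local$location, local$agg_location)
  names(joint_units) <- paste0(names(joint_units), "_local")
  single_units <- as.list(agg$location)
  names(single_units) <- paste0(agg$location, "_aggregate")
  c(joint_units, single_units)
}

units <- make_fit_units(locations)
str(units)
```

Each element of `units` would then correspond to one model fit.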
A table of all the locations being solicited is available in the flu-metrocast Hub GitHub repository.
Since all forecast targets are percentages, we will use the same latent and observation model for all locations (however, if we have count data, this can easily be incorporated by using a Poisson observation model).
Because all data is available publicly, the forecasts generated should be completely reproducible for any given forecast date.
We use the mvgam package, an R package that leverages both the mgcv and brms formula interfaces to fit Bayesian dynamic generalized additive models (dynamic GAMs).
These packages use metaprogramming to generate Stan model code, and we also include in this repository the Stan code generated by mvgam.
To produce forecasts each week, we follow this workflow:
1. Modify the `forecast_date` in `targets/create_config_targets.R`. If using for a real-time submission, set `real_time <- TRUE` and `filepath_forecasts <- cityforecasts`.
2. Run `targets::tar_make()`. This should take about 10 minutes to run on all 57 locations (some of which are estimated jointly). It should populate `output/{filepath_forecasts}/{forecast_date}`.
3. Open a PR into the flu-metrocast hub GitHub repository under our team name, `epiforecasts`.
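For concreteness, step 1 amounts to editing a few assignments like the ones below, which also show the output directory that step 2 is expected to populate. The date is a made-up example, and treating `filepath_forecasts` as a character string is an assumption; `targets/create_config_targets.R` holds the actual definitions.

```r
# Illustrative settings only; the real ones live in targets/create_config_targets.R.
forecast_date <- as.Date("2025-01-22")  # hypothetical forecast date
real_time <- TRUE                       # real-time submission
filepath_forecasts <- "cityforecasts"   # assumed to be a character string

# Directory that step 2 should populate: output/{filepath_forecasts}/{forecast_date}
output_dir <- file.path("output", filepath_forecasts, format(forecast_date))
print(output_dir)

# Step 2 (requires the targets package and the repository's pipeline):
# targets::tar_make()
```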
Eventually, steps 1-2 will be automated with the GitHub Action `.github/workflows/generate_forecasts`, scheduled to run on Wednesdays after 2 pm EST, which is when the `target_data` is updated on the Hub.
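Until that schedule is in place, a small helper such as the hypothetical `next_run_date()` below can compute the next Wednesday on which steps 1-2 should be run. It is a base R sketch that works on calendar dates only; it ignores the 2 pm EST cut-off and time zones.

```r
# Hypothetical helper: next Wednesday on or after `today`.
next_run_date <- function(today = Sys.Date()) {
  today <- as.Date(today)
  wday <- as.POSIXlt(today)$wday  # 0 = Sunday, ..., 3 = Wednesday
  today + ((3 - wday) %% 7)       # %% keeps the offset in 0..6
}

next_run_date(as.Date("2025-01-20"))  # a Monday; returns 2025-01-22
```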
The following describes the preliminary model used:
For forecasts of counts of ED visits due to flu, we assume a Poisson observation process:
For the forecasts of the percent of ED visits due to flu, we assume a Beta observation process on the proportion of ED visits due to flu:
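The exact equation is not reproduced in this section. As a sketch, a common mean-precision parameterization (assumed here, not confirmed as the form used in the submission) links the observed proportion `y` to a latent state `x` through `logit(mu) = x` and `y ~ Beta(mu * phi, (1 - mu) * phi)`. The base R simulation below shows how the precision `phi` controls the noise around `mu`; `simulate_beta_obs()` and the parameter values are hypothetical, and Hub percentages would be divided by 100 before use.

```r
# Hypothetical helper. Assumed parameterisation:
#   logit(mu) = x,  y ~ Beta(mu * phi, (1 - mu) * phi)
# so E[y] = mu and Var[y] = mu * (1 - mu) / (1 + phi).
simulate_beta_obs <- function(x, phi) {
  mu <- plogis(x)  # inverse logit of the latent state
  rbeta(length(mu), shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

set.seed(1)
x <- rep(qlogis(0.03), 5000)  # latent state matching 3% of ED visits due to flu
y <- simulate_beta_obs(x, phi = 200)
summary(100 * y)              # back on the percent scale used by the Hub
```

Larger `phi` concentrates the observations around `mu`; smaller `phi` allows noisier weekly percentages.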
We model latent admissions with a hierarchical GAM component to capture shared seasonality and weekday effects and a univariate autoregressive component to capture trends in the dynamics within each location.
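To make this structure concrete, the base R sketch below simulates latent trajectories on the logit scale for several jointly modeled locations: a seasonal curve shared by all locations, location-level intercepts drawn from a common distribution (the hierarchical part) and an independent AR(1) process per location. The cosine curve is only a stand-in for the spline in the GAM, the AR order of 1 is an assumption, and `simulate_latent()` and all parameter values are hypothetical.

```r
# Hypothetical simulator for the latent process (logit scale):
#   x[t, j] = alpha[j] + season[t] + z[t, j]
#   z[t, j] = ar1[j] * z[t - 1, j] + e[t, j],  e ~ Normal(0, sigma)
simulate_latent <- function(n_weeks, ar1, sigma = 0.1, seed = 42) {
  set.seed(seed)
  n_locs <- length(ar1)
  week <- seq_len(n_weeks)
  season <- 1.2 * cos(2 * pi * (week - 5) / 52)  # shared seasonal shape
  alpha <- rnorm(n_locs, mean = -3.5, sd = 0.3)  # hierarchical intercepts
  latent <- matrix(NA_real_, nrow = n_weeks, ncol = n_locs)
  for (j in seq_len(n_locs)) {
    z <- numeric(n_weeks)
    # start each AR(1) process from its stationary distribution
    z[1] <- rnorm(1, 0, sigma / sqrt(1 - ar1[j]^2))
    for (t in 2:n_weeks) z[t] <- ar1[j] * z[t - 1] + rnorm(1, 0, sigma)
    latent[, j] <- alpha[j] + season + z
  }
  latent
}

latent <- simulate_latent(n_weeks = 104, ar1 = c(0.7, 0.8, 0.9))
round(100 * plogis(latent[1:4, ]), 2)  # first weeks, as percentages
```

Keeping the AR processes independent across locations mirrors the univariate autoregressive component; a vector autoregression would instead let the location-specific deviations co-vary.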
For the NYC data, we have daily count data, so we add a weekday component:
And since
where
For the TX data,
For the NYC data, we have daily data so
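For daily count data, a weekday component can be pictured as a day-of-week offset on the log scale of a Poisson rate. The base R sketch below simulates a year of daily counts with hypothetical weekday effects and recovers them with a Poisson GLM; `simulate_daily_counts()`, the effect sizes and the GLM fit are illustrative assumptions, not the mvgam model used for NYC.

```r
# Hypothetical simulator: log(lambda[t]) = log_level + weekday_effect[day of week]
simulate_daily_counts <- function(dates, log_level, weekday_effect) {
  wd <- as.POSIXlt(dates)$wday + 1  # 1 = Sunday, ..., 7 = Saturday
  lambda <- exp(log_level + weekday_effect[wd])
  rpois(length(dates), lambda)
}

set.seed(7)
dates <- seq(as.Date("2024-01-07"), by = "day", length.out = 364)
weekday_effect <- c(0.20, 0.30, 0.10, 0, 0, -0.10, 0.05)  # Sunday first
counts <- simulate_daily_counts(dates, log(50), weekday_effect)

# Recover the effects relative to Sunday (the reference level) with a Poisson GLM
wd <- factor(as.POSIXlt(dates)$wday)
fit <- glm(counts ~ wd, family = poisson())
round(coef(fit), 2)
```

Only differences between weekdays are identifiable here, which is why the GLM reports each effect relative to Sunday.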
The above model is a hierarchical dynamic GAM, which contains both a GAM component and an autoregressive component. We could additionally fit a more traditional hierarchical GAM (with no autoregression, but with tensor product splines to jointly estimate across location and time), as well as a vector ARIMA model without a spline component. Eventually, we could also combine these approaches and estimate a hierarchical GAM with a multivariate (vector) autoregressive component. These will be areas of future work.