
cityforecasts

R scripts to generate city-level forecasts

Summary

This repository contains R code to generate city-level forecasts for submission to the flu-metrocast hub. All models are exploratory/preliminary, though we will regularly update this document to describe the latest mathematical model used in the submission.

All outputs submitted to the Hub will be archived in this repository, along with additional model metadata (such as the model definition associated with a submission and details on any additional data sources used or decisions made in the submission process). If we make significant changes to the model over the course of the submission period, we will rename the model in the submission file.

Since the Hub now solicits both local-level and aggregate (typically state-level) forecasts of the specified target (typically the percent of ED visits due to flu), we will:

  • run each set of local jurisdictions within an aggregate location jointly
  • run each aggregate location (typically state) independently using the aggregate data

A table of all the locations being solicited is available in the flu-metrocast hub GitHub repository.

Since all forecast targets are percentages, we will use the same latent and observation model for all locations (if count data become available, they can easily be incorporated using a Poisson observation model).

Workflow

Because all data are publicly available, the forecasts generated should be completely reproducible for any given forecast date. We use mvgam, an R package that leverages the mgcv and brms formula interfaces to fit Bayesian Dynamic Generalized Additive Models (GAMs). These packages use metaprogramming to produce Stan files, and we also include the Stan code generated by the package.
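As a minimal sketch of how that generated Stan code can be inspected, assuming mvgam's `run_model = FALSE` skeleton mode and `code()` accessor behave as documented (the formula and data here are placeholders, not the repository's actual model):

```r
library(mvgam)

# Build a model skeleton without sampling, then print the Stan program
# that mvgam generates via mgcv/brms metaprogramming
skeleton <- mvgam(
  y ~ s(weekofyear, bs = "cc", k = 12),
  trend_model = AR(p = 1),
  family = betar(),
  data = train_df,   # hypothetical long-format data frame
  run_model = FALSE  # generate code and data only, do not fit
)
code(skeleton)       # view the generated Stan code
```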

To produce forecasts each week, we follow this workflow:

  1. Modify the forecast_date in targets/create_config_targets.R. For a real-time submission, set real_time <- TRUE and filepath_forecasts <- cityforecasts.
  2. Run targets::tar_make(). This should take about 10 minutes for all 57 locations (some of which are estimated jointly) and should populate output/{filepath_forecasts}/{forecast_date} (see the sketch after this list).
  3. Open a PR into the flu-metrocast hub GitHub repository under our team name, epiforecasts.
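For concreteness, steps 1-2 look like the following (a sketch; the example date is hypothetical and the variable names mirror the steps above):

```r
# Step 1: in targets/create_config_targets.R, set e.g.
#   forecast_date <- as.Date("2025-02-05")  # hypothetical example date
#   real_time <- TRUE                       # for a real-time submission
#   filepath_forecasts <- cityforecasts     # as written in step 1 above

# Step 2: build the pipeline; outputs populate
# output/{filepath_forecasts}/{forecast_date}
targets::tar_make()
```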

Eventually, steps 1-2 will be automated with the GitHub Action .github/workflows/generate_forecasts, scheduled to run on Wednesdays after 2 pm EST, corresponding to the time that the target_data is updated on the Hub.

Model definition

The following describes the preliminary model used:

Observation model

For forecasts of ED visit counts, we assume a Poisson observation process:

$$ y_{l,t} \sim \text{Poisson}(\exp(x_{l,t})) $$

For the forecasts of the percent of ED visits due to flu, we assume a Beta observation process on the proportion of ED visits due to flu:

$$\begin{align} p_{l,t} &= y_{l,t} \times 100 \\ y_{l,t} &\sim \text{Beta}(z_{l,t}, \phi) \\ \text{logit}(z_{l,t}) &= x_{l,t} \end{align}$$
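In mvgam terms, these two observation processes correspond to different `family` choices (a sketch using standard family constructors; the percent-to-proportion helper is illustrative, not repository code):

```r
# Count targets:    family = poisson()  ->  y ~ Poisson(exp(x))
# Percent targets:  family = betar()    ->  y ~ Beta(z, phi), logit(z) = x

# Hub targets arrive as percents; the Beta likelihood needs proportions
pct_to_prop <- function(pct) pct / 100
```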

Latent state-space model: Dynamic hierarchical GAM with independent autoregression

We model latent admissions with a hierarchical GAM component to capture shared seasonality and weekday effects and a univariate autoregressive component to capture trends in the dynamics within each location.

$$\begin{align} x_{l,t} &\sim \text{Normal}(\mu_{l,t} + \delta_{l} x_{l,t-1}, \sigma_l) \\ \mu_{l,t} &= \beta_l + f_{global,t}(weekofyear) + f_{l,t}(weekofyear) \\ \beta_l &\sim \text{Normal}(\beta_{global}, \sigma_{count}) \\ \sigma_{count} &\sim \text{Exponential}(0.33) \\ \delta_l &\sim \text{Normal}(0.5, 0.25) \, T[0,1] \\ \sigma_l &\sim \text{Exponential}(1) \end{align}$$
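A minimal mvgam sketch of this structure, assuming a long-format data frame with the package's required `series` (location) and `time` columns plus a `weekofyear` covariate; column names and basis dimensions are illustrative:

```r
library(mvgam)

mod <- mvgam(
  y ~
    s(weekofyear, bs = "cc", k = 12) +        # shared (global) seasonality
    s(weekofyear, series, bs = "fs", k = 6),  # location-level deviations
  trend_model = AR(p = 1),  # independent AR(1) latent trend per location
  family = betar(),         # Beta observation model for proportion targets
  data = train_df           # hypothetical training data
)
```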

For the NYC data, we have count data on a daily scale, so we add a weekday component:

$$\mu_{l,t} = \beta_l + f_{global,t}(week) + f_{l,t}(week) + f_{global,t}(wday)$$
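In the model formula, this could enter as an additional cyclic smooth over day of week, e.g. (a sketch; the `wday` column is an assumption):

```r
# Daily-scale (NYC) formula with a shared weekday smooth added
formula_nyc <- y ~
  s(weekofyear, bs = "cc", k = 12) +
  s(weekofyear, series, bs = "fs", k = 6) +
  s(wday, bs = "cc", k = 7)  # hypothetical day-of-week covariate (1-7)
```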

Since $\beta_{global}$ represents the intercept on the count scale, we place a prior on it using the mean observed count across the historical data:

$$ \beta_{global} \sim \text{Normal}\left(\log\left(\frac{\sum_{l=1}^L \sum_{t=1}^T y_{l,t}}{N_{obs}}\right), 1\right) $$

where $N_{obs}$ is the number of observations of $y_{l,t}$.

For the TX data, $\beta_{global}$ represents the intercept on the proportion scale, so we use:

$$ \beta_{global} \sim \text{Normal}\left(\text{logit}\left(\frac{\sum_{l=1}^L \sum_{t=1}^T y_{l,t}}{N_{obs}}\right), 1\right) $$
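These data-driven prior locations are straightforward to compute; a sketch, assuming stacked observation vectors `y_counts` and `y_props` (hypothetical names) and brms-style prior specification, which mvgam accepts via its `priors` argument:

```r
library(brms)

mu_nyc <- log(mean(y_counts))    # count scale (NYC): log of the mean count
mu_tx  <- qlogis(mean(y_props))  # proportion scale (TX): logit of the mean

# e.g. for NYC, pass to mvgam(..., priors = prior_nyc)
prior_nyc <- prior_string(
  paste0("normal(", round(mu_nyc, 2), ", 1)"), class = "Intercept"
)
```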

For the NYC data, we have daily data, so $t$ is measured in days, whereas for the Texas data, $t$ is measured in weeks.

Additional models

The above model is a hierarchical dynamic GAM, containing both a GAM component and an autoregressive component. We can additionally fit a more traditional hierarchical GAM (with no autoregression but with tensor product splines to jointly estimate across location and time), as well as a vector ARIMA without a spline component. Eventually, we can also combine these components and estimate a hierarchical GAM with a multivariate vector autoregression. These will be areas of future work.
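For instance, mvgam's VAR trend could supply the correlated latent process for that last variant (a sketch under the same assumed data layout as above):

```r
# Hierarchical GAM with a correlated latent vector autoregression
mod_var <- mvgam(
  y ~ s(weekofyear, bs = "cc", k = 12) +
      s(weekofyear, series, bs = "fs", k = 6),
  trend_model = VAR(cor = TRUE),  # jointly estimated, correlated trends
  family = betar(),
  data = train_df
)
```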
