---
title: "ATQ Guide"
output: rmarkdown::html_vignette
author: Vinay Joshy
vignette: >
  %\VignetteIndexEntry{ATQ_Guide}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 90
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

# ATQ: Using Absenteeism Data to Detect Onset of Epidemics

## Introduction

The `ATQ` package provides tools for public health institutions to detect epidemics using school absenteeism data. It offers functions to simulate regional populations of households, elementary schools, and epidemics, and to calculate alarm metrics from these simulations.

This package builds on the work of Ward et al. and Vanderkruk et al. It introduces the Alert Time Quality (ATQ) metrics, such as the Average ATQ (AATQ) and First ATQ (FATQ), to evaluate the timeliness and accuracy of epidemic alerts.

This vignette demonstrates the package's use through a simulation study based on Vanderkruk et al., modeling yearly influenza epidemics and their alarm metrics in the Wellington-Dufferin-Guelph public health unit, Canada.

To use the package, install and load it with:

```{r setup, message = FALSE, warning = FALSE}
tryCatch({
  devtools::install_github("vjoshy/ATQ_Surveillance_Package")
}, error = function(e) {
  message("Unable to install package from GitHub. Using local version if available.")
})

library(ATQ)
```

The following sections will guide you through population simulation, epidemic modeling, and alarm metric calculation using the `ATQ` package.

## Methods

`ATQ` provides a simulation model that consists of three sequential parts: 1) a population of individuals, 2) annual influenza epidemics, 3) school absenteeism and laboratory confirmed influenza case data. The final part of this section covers alarm metric evaluation.

### Population simulation

To simulate the population of the Wellington-Dufferin-Guelph (WDG) region in Ontario, Canada, the package offers the following functions:

- `catchment_sim`: Simulates catchment areas using a default gamma distribution for the number of schools in each area. The `dist_func` argument allows for specifying other distributions.
- `elementary_pop`: Simulates elementary school enrollment and assigns students to catchments using a default gamma distribution. This function requires the output of `catchment_sim`. The `dist_func` argument can be modified for other distributions.
- `subpop_children`: Simulates households with children using the output of `elementary_pop`. It requires specifying population proportions such as coupled parents, number of children per household type, and proportion of elementary school-age children. Distributions for parent, child, and age simulations can be specified.
- `subpop_noChildren`: Simulates households without children using the outputs of `subpop_children` and `elementary_pop`. It requires specifying proportions of household sizes and the overall proportion of households with children.
- `simulate_households`: Creates a list containing two simulated populations: households and individuals.

If population proportions are not provided to `subpop_children` and `subpop_noChildren`, the functions will prompt the user for input.

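
To give a feel for what a gamma-distributed school count looks like, here is a minimal base R sketch. It is not the code inside `catchment_sim`; the shape and rate values are the ones passed to `catchment_sim` in this vignette's example, and rounding the draws up to whole schools is an assumption made purely for illustration.

```r
# Illustrative sketch only; this is not the code inside catchment_sim().
set.seed(1)

n_catchments <- 16

# Draw one positive real value per catchment from a gamma distribution
raw_draws <- stats::rgamma(n_catchments, shape = 4.313, rate = 3.027)

# Assumption: round up so that every catchment has at least one school
n_schools <- ceiling(raw_draws)

# The gamma mean is shape / rate, about 1.42 schools per catchment before rounding
4.313 / 3.027

# Frequency of each simulated school count
table(n_schools)
```

Any function with the same calling pattern as `stats::rgamma` (number of draws first, then distribution parameters) would play the same role in this sketch.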
```{r populationSimulation}
set.seed(123)

# Simulate 16 catchments of 80x80 squares and the number of schools they contain
catchment <- catchment_sim(16, 80, dist_func = stats::rgamma, shape = 4.313, rate = 3.027)

# Simulate population size of elementary schools
elementary <- elementary_pop(catchment, dist_func = stats::rgamma, shape = 5.274, rate = 0.014)

# Simulate households with children
house_children <- subpop_children(elementary, n = 5,
                                  prop_parent_couple = 0.7668901,
                                  prop_children_couple = c(0.3634045, 0.4329440, 0.2036515),
                                  prop_children_lone = c(0.5857832, 0.3071523, 0.1070645),
                                  prop_elem_age = 0.4976825)

# Simulate households without children using pre-specified proportions
house_noChild <- subpop_noChildren(house_children, elementary,
                                   prop_house_size = c(0.23246269, 0.34281716, 0.16091418,
                                                       0.16427239, 0.09953358),
                                   prop_house_Children = 0.4277052)

# Combine household simulations and generate individual-level data
households <- simulate_households(house_children, house_noChild)
```

### Epidemic and Laboratory Confirmed Cases simulation

The package simulates epidemics using a stochastic Susceptible-Infected-Removed (SIR) framework. This approach differs from Vanderkruk et al., who used a spatial and network-based individual-level model.

#### Simulation Process

- Initialization: The population is divided into S (Susceptible), I (Infectious), and R (Removed) compartments. Initially, most individuals are susceptible, a few are infectious, and none are removed.
- Start Date: A random start date for the epidemic is chosen based on specified average and minimum start dates.
- Time Steps: The simulation proceeds in discrete time steps. For each step:

    a.  Transmission Probability (`p_inf`): Calculated as $1 - e^{-\alpha \frac{I[t-1]}{N}}$, where $\alpha$ is the transmission rate, $I[t-1]$ is the number of infectious individuals at the previous time step, and $N$ is the total population.
    b.  New Infections (`new_inf`): Determined by drawing from a binomial distribution with parameters n (number of susceptible individuals) and p (transmission probability).
    c.  Compartment Updates:
        - Susceptible (S): Decreases by new infections.
        - Infectious (I): Increases by new infections, decreases by recoveries/deaths.
        - Removed (R): Increases by recoveries/deaths.
    d.  Reported Cases: A subset of new infections is reported based on the reporting rate, with delays added using an exponential distribution to reflect reporting lag.

The `summary` and `plot` methods can be used to summarize and visualize the simulated epidemics:

```{r epidemicSimulation}
# Isolate individual-level data
individuals <- households$individual_sim

# Simulate epidemics for 10 years, each with a period of 300 days,
# 32 individuals infected initially, and an infection period of 4 days
epidemic <- ssir(nrow(individuals), T = 300, alpha = 0.298, avg_start = 45, min_start = 20,
                 inf_period = 4, inf_init = 32, report = 0.02, lag = 7, rep = 10)

# Summarize and plot the epidemic simulation results
summary(epidemic)
plot(epidemic)
```

### Absenteeism simulation

The `compile_epi` function compiles and processes epidemic data, simulating school absenteeism from the epidemic and individual data. It creates a data set of actual cases, absenteeism, and laboratory confirmed cases. This data set also includes a "True Alarm Window", reference dates for each epidemic year, and seasonal lag values.

Absenteeism data is simulated as follows:

- For each day, the proportion of infected individuals is calculated from new infections over the past few days.
- Whether each child is absent is then determined using the following logic:
    - 95% of infected children stay home.
    - 5% of healthy children are absent for reasons other than sickness.

The data is aggregated across all schools.

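
The absence rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the internals of `compile_epi`: the number of children, the daily new infection counts, and the 4-day window standing in for "the past few days" are all hypothetical values chosen for the example.

```r
# Illustrative sketch only; this is not the code inside compile_epi().
set.seed(42)

n_children <- 1000                                 # hypothetical number of enrolled children
new_inf    <- c(0, 1, 3, 8, 15, 22, 18, 9, 4, 1)   # hypothetical daily new infections among children
window     <- 4                                    # assumed length of "the past few days"

# Children counted as infected on day t: new infections summed over the last `window` days
infected <- sapply(seq_along(new_inf), function(t) {
  sum(new_inf[max(1, t - window + 1):t])
})

# 95% of infected children stay home
absent_sick <- rbinom(length(infected), size = infected, prob = 0.95)

# 5% of healthy children are absent for reasons other than sickness
absent_healthy <- rbinom(length(infected), size = n_children - infected, prob = 0.05)

# Daily absenteeism percentage, aggregated over all children
pct_absent <- 100 * (absent_sick + absent_healthy) / n_children
round(pct_absent, 1)
```

Even with no infections, roughly 5% of children are absent each day, so an epidemic appears as a rise above that baseline, not as a departure from zero.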
```{r absenteeism}
absent_data <- compile_epi(epidemic, individuals)

dplyr::glimpse(absent_data)
```

### Alarm Metrics Evaluation

The `eval_metrics` function assesses the performance of epidemic alarm systems across various lags and thresholds using school absenteeism data. It evaluates the following key metrics:

- False Alarm Rate (FAR): Proportion of alarms raised outside the true alarm window.
- Accumulated Days Delayed (ADD): Measures how many days after the optimal alarm day the first true alarm was raised.
- Average Alert Time Quality (AATQ): Mean quality of all alarms raised, considering their timing relative to the optimal alarm day.
- First Alert Time Quality (FATQ): Quality of the first alarm raised, based on its timing.
- Weighted versions (WAATQ, WFATQ): Apply year-specific weights to AATQ and FATQ.

The evaluation uses a logistic regression model with lagged absenteeism, fixed seasonal terms, and a school-year random effect, given by:

$$
\text{logit}(\theta_{tj}) = \beta_0 + \sum_{k=0}^{l} \beta_{k+1} x_{(t-k)j} + \beta_{l+2} \sin\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \beta_{l+3} \cos\Bigg(\frac{2\pi t^*}{T^*}\Bigg) + \gamma_j
$$

where $\theta_{tj}$ represents the probability of having at least one reported case on day $t$ for school year $j$. The predictor variables $x_{(t-k)j}$ are the mean daily absenteeism percentages with lag times ranging from 0 to $l$. To account for seasonal variations in influenza, sine and cosine terms are included, where $t^*$ denotes the calendar day of the year when $x_{tj}$ is observed and $T^* = 365.25$ days represents the length of a year. Additionally, $\gamma_j$ captures random effects specific to each school year $j$.

`eval_metrics` also identifies the best model parameters (lag and threshold) for each metric. The output is a list with three main components:

- `metrics`: An object containing:
    - matrices of each metric (FAR, ADD, AATQ, FATQ, WAATQ, WFATQ) for all lag and threshold combinations.
    - best models according to each metric, including lag and threshold values.
- `plot_data`: A plot object to visualize epidemic data and the best model for each metric.
- `results`: Summary statistics.

In the example provided, alarms are calculated for school years 2 to 10, considering lags up to 15 days and threshold values ranging from 0.1 to 0.6 in 0.05 increments. Year weights are assigned proportionally to the school year number.

```{r metrics, fig.height = 4, fig.width = 7}
# Evaluate alarm metrics for epidemic detection, with lags up to 15 days
alarms <- eval_metrics(absent_data, maxlag = 15, thres = seq(0.1, 0.6, by = 0.05))

summary(alarms$results)

# Plot the values of each alarm metric
plot(alarms$metrics, "FAR")   # False Alarm Rate
plot(alarms$metrics, "ADD")   # Accumulated Days Delayed
plot(alarms$metrics, "FATQ")  # First Alert Time Quality
plot(alarms$metrics, "AATQ")  # Average Alert Time Quality
plot(alarms$metrics, "WFATQ") # Weighted FATQ
plot(alarms$metrics, "WAATQ") # Weighted AATQ

# Visualize the epidemics with the alarms raised
alarm_plots <- plot(alarms$plot_data)
for (i in seq_along(alarm_plots)) {
  print(alarm_plots[[i]])
}
```

## References

Vanderkruk, K.R., Deeth, L.E., Feng, Z. et al. ATQ: alert time quality, an evaluation metric for assessing timely epidemic detection models within a school absenteeism-based surveillance system. BMC Public Health 23, 850 (2023).