Maria Bekker-Nielsen Dunbar, Felix Hofmann, Leonhard Held, Session 3 of the RSS Special Topic Meeting on Covid-19 Transmission: Replies to the Discussion, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 185, Issue Supplement_1, November 2022, Pages S158–S164, https://doi.org/10.1111/rssa.12985
1 ESTIMATING ACCURATE CASE COUNTS (DIGGLE)
Entire editions of academic journals are dedicated to infectious disease modelling efforts while proper use of data to inform the modelling has been emphasised only recently (e.g. Held et al., 2020). The importance of data deserves highlighting and it is noteworthy that one of the most detailed and often analysed data sets in the field dates back to a measles outbreak in 1861 (Aaby et al., 2021). Without useful data, we will not be able to estimate the susceptible and asymptomatic proportions of the population.
Strengthening and improving national and intergovernmental disease surveillance and monitoring systems (the latter coordinated by bodies such as ECDC and WHO) allows for improved early detection of disease outbreaks. Such disease surveillance systems include mandatory case reporting of notifiable diseases, sentinel surveillance systems, and also internet and news media, under the umbrella of epidemic intelligence services. Disease surveillance requires a certain amount of personnel and resources to function, and systems have seen increases in technological capacity in recent years (Groseclose & Buckeridge, 2017; Hulth et al., 2010). Resources needed for ‘infodemic’ management also reduce the amount of human effort available for surveillance activities.
Time series of infectious disease cases typically arising from a surveillance system can easily be modelled using the framework we used and presented. However, if the underlying data is flawed, so too will be the outputs. We are cognisant of the adage ‘garbage in, garbage out’. While we are aware of many funding opportunities for COVID-19 modelling, it is unclear how much emergency grant support has been given to strengthening current and future data gathering and storing infrastructure. Utilising existing data mechanisms rather than ‘re-inventing the wheel’ is paramount. Relatedly, there has recently been an attempt at re-branding the data-focused parts of infectious disease surveillance as ‘outbreak analytics’ (Polonsky et al., 2019).
In our own work examining the effect of travel restrictions to neighbouring regions on cases in Switzerland, we have recently considered both Italian and French case data (see Grimée et al., 2022, for an initial analysis of some of the regions). Two issues caused us to examine the data in further detail rather than simply model it as-is. The first is that certain case counts in Italian regions show changes from one day to the next which seem unrealistic. In particular, we have instances of zero (or even negative!) case counts followed by large counts. The second is incoherence in case counts for French regions between data sets after changing data providers. The Zurich case data does not suffer from such problems, but certain cases may not be captured by the surveillance system, and so there is a risk of underreporting.
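The first issue described above (zero or negative daily counts followed by large counts) lends itself to a simple automated check before modelling. The following is a minimal sketch of such a check, not the procedure we applied: the function name, the threshold `min_jump`, and the example series are all hypothetical and chosen purely for illustration.

```python
from typing import List


def flag_suspicious_days(counts: List[int], min_jump: int = 50) -> List[int]:
    """Return the indices of days whose reported count looks implausible.

    A day is flagged if its count is negative, or if it is zero and the
    following day's count is at least `min_jump` (an assumed, illustrative
    threshold that would need to be chosen per region in practice).
    """
    flagged = []
    for day, count in enumerate(counts):
        if count < 0:
            # Negative counts usually indicate retrospective corrections.
            flagged.append(day)
        elif count == 0 and day + 1 < len(counts) and counts[day + 1] >= min_jump:
            # A zero followed by a large count suggests delayed reporting.
            flagged.append(day)
    return flagged


# Made-up daily counts, not real surveillance data.
example_counts = [120, 135, 0, 310, 140, -12, 150]
print(flag_suspicious_days(example_counts))  # [2, 5]
```

Flagged days would still need to be inspected by hand, since a zero count can be genuine in a small region.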
2 UNDERREPORTING (DIGGLE, SCALIA-TOMBA, AND KUCHARSKI)
Routine infectious disease surveillance systems are prone to capturing only part of the disease prevalence and so provide an incomplete picture of the burden. Specifically, not all infected persons will develop symptoms and thus seek health care (asymptomatic cases), so their cases may not be reported in either notifiable disease surveillance systems or sentinel and syndromic surveillance systems. The impact of underreporting on endemic-epidemic models was examined by Bracher and Held (2021) and we are aware we need to correct for this in our analysis of school closure, taking into account that underreporting may be age dependent. The reporting also depends on a correct clinical diagnosis (i.e. no misdiagnosis) and timely entry in the notification system. Certain delays are inherent to the reporting system, for example, the time between a test being taken and sent to a laboratory for analysis, and are usually corrected for using nowcasting (Höhle & an der Heiden, 2014). Increased testing efforts are expected to change the reporting rate as more asymptomatic cases will be captured, and so underreporting is also time dependent.
3 METRICS FOR COMMUNICATION BETWEEN TECHNICAL EXPERTS AND POLICY MAKERS (SCALIA-TOMBA, KUCHARSKI, AND PANOVSKA-GRIFFITHS)
Our work is a ‘proof-of-concept’ analysis and forms the basis for an extended analysis of data from all of Switzerland, and so the feedback will help hone future efforts. Our paper provides expected case counts in order to investigate the effect of school closures on disease incidence in the relevant age groups and shows that such an approach works. Such expected case counts could be a metric reported in addition to the effective reproduction number and the growth rate. For specific formulations of endemic-epidemic models, it is even possible to estimate an effective reproduction number in addition to expected cases (see Bracher & Held, 2021, for details).
4 NEED FOR NULL HYPOTHESES IN INFECTIOUS DISEASE MODELLING (RILEY)
We agree that there is a need for well-specified null hypotheses to examine the effect of disease control interventions. Null hypotheses may need to be born from benefit-harm assessments. The societal damage from a public health emergency affects more than simple case counts. It is crucial to balance benefits and harms in a quantitative manner, which policy makers currently do qualitatively. As we are not in the position to decide which measures to introduce or lift, we cannot determine with great certainty what an ‘acceptable’ number of additional expected cases is, and so cannot construct a hypothesis test for our work, but we would like to stress the importance of age in such considerations.
Related to this, we wish to briefly highlight an experience we have had during our work in the previous year. To avoid creating unnecessary research waste and adding to the gargantuan number of exploratory COVID-19 modelling papers, we submitted our work as registered research with an associated study protocol (Chambers, 2019a, 2019b). The preregistration was written according to the specification of Van den Akker et al. (2020). One of the sticking points from our protocol is how to determine a specific and testable hypothesis for our approach with associated rationale (question 4 of the Van den Akker et al., 2020 specification). In the absence of well-defined null hypotheses as requested by Riley, such protocols can be hard to complete.
Reviewers specialised in modelling analyses of infectious disease surveillance data do not seem well-versed in the preregistered publication approach. The academic editor admitted that finding reviewers with the required subject matter expertise who were also able to review proposed procedures was difficult. Finding reviewers for the myriad COVID-19 papers being released is already taxing (Schwab & Held, 2020). It would seem that traditional publication methods (with review only occurring after the analysis is completed) are the ones used by the wider field, albeit with pre-prints and access to a repository containing the analysis code being increasingly utilised (Brooks-Pollock et al., 2021). These approaches still do not allow the option to appraise the methods before they are applied to data. Additionally, checks of data quality prior to modelling (cf. the need for improved data) provide additional motivation for infectious disease modellers to preregister their work.
5 COMPARING HYPOTHETICAL CONTROL OPTIONS (KUCHARSKI)
While we used prediction retrospectively, the model could also be used prospectively to predict the effects of a future control scenario. The endemic-epidemic modelling framework is often used in probabilistic forecasting (Bauer et al., 2016; Held & Meyer, 2020; Ray et al., 2017; Stojanović et al., 2019). Many of the recent extensions to the framework address aspects which need to be considered for such forward-looking approaches (Bracher & Held, 2022; Held et al., 2017) and incorporate methodology used in weather forecasting. We have not personally examined future scenarios of interventions using the modelling framework, as we have preferred to inform our work by available data.
Informing the model with future hypothetical time-varying contact matrices would enable us to examine the predicted number of cases under such hypothetical scenarios, for example, returning to baseline contact levels to represent fully reopening/removal of all social distancing measures. For examples of how such hypothetical contact matrices may be constructed see Willem et al. (2020) and Prem et al. (2020). Similarly to how we constructed our contact matrix with data on policy interventions, Alleman et al. (2021) informed changes to a contact matrix with mobility data and van Leeuwen et al. (2021) updated a contact matrix using time-use survey information. An alternative would be to use contact surveys conducted during the COVID-19 pandemic (Feehan & Mahmud, 2021; Jarvis et al., 2020, 2021; Latsuzbaia et al., 2020). In the work presented here—the pilot analysis of Zurich COVID-19 case data—we used a synthetic contact matrix which is informed by demographic data as well as contact diary surveys (Mistry et al., 2021). Demographic data has also been suggested as a way of ‘updating’ older contact matrices for newer use (Arregui et al., 2018) as the commonly used POLYMOD matrices are now some 16 years old and conducting a contact survey may be resource intensive.
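As a rough illustration of the scenario idea described above, the sketch below builds a sequence of hypothetical contact matrices by scaling a baseline matrix with an assumed contact level per time period, ending with a return to baseline. Uniform scaling of all entries is a simplifying assumption made here for brevity; it is not the construction used in our paper or in the cited works, and the matrix values, the levels, and the function name are invented.

```python
from typing import List

Matrix = List[List[float]]


def scenario_contact_matrices(baseline: Matrix, levels: List[float]) -> List[Matrix]:
    """Return one contact matrix per time period for a hypothetical scenario.

    Each entry of `levels` is the assumed fraction of baseline contacts in
    that period (1.0 means a full return to baseline contact levels).
    """
    for level in levels:
        if level < 0:
            raise ValueError("contact levels must be non-negative")
    return [[[level * c for c in row] for row in baseline] for level in levels]


# Invented two-group baseline matrix (mean contacts per day).
baseline = [[8.0, 3.0],
            [3.0, 5.0]]
# Hypothetical gradual reopening over three periods.
matrices = scenario_contact_matrices(baseline, [0.5, 0.75, 1.0])
print(matrices[0])   # [[4.0, 1.5], [1.5, 2.5]]
print(matrices[-1])  # [[8.0, 3.0], [3.0, 5.0]]
```

A more realistic scenario would scale setting-specific matrices (school, work, household) separately before summing them, which this sketch does not attempt.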
6 CHANGES IN TRANSMISSIBILITY AND CHOICE OF AGE GROUPS (RILEY AND SCALIA-TOMBA)
The construction of our time-varying transmission weights is based upon informing a contact matrix by policy indicators given as step functions. We have previously considered use of ramp functions (as an alternative representation of changes in policy) in place of step functions. However, the choice of slope in such a ramp function needs to be informed by relevant information. We have not considered a smooth function as suggested by Riley. For simplicity, we continued our work with the step function representation of policy (hence transmission opportunity) changes.
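The difference between the two representations of a policy change can be made concrete with a small sketch. Both functions below are illustrative only: `change_day` and `ramp_days` are hypothetical parameters, and the linear ramp is just one possible choice, whose slope (here set through `ramp_days`) is exactly the quantity that would need to be informed by relevant information.

```python
def step_indicator(t: int, change_day: int) -> float:
    """Policy indicator that switches from 0 to 1 on `change_day`."""
    return 1.0 if t >= change_day else 0.0


def ramp_indicator(t: int, change_day: int, ramp_days: int) -> float:
    """Policy indicator that rises linearly from 0 to 1 over `ramp_days`."""
    if ramp_days <= 0:
        raise ValueError("ramp_days must be positive")
    progress = (t - change_day) / ramp_days
    # Clamp to [0, 1]: no effect before the change, full effect afterwards.
    return min(1.0, max(0.0, progress))


# Hypothetical policy change on day 5, phased in over 4 days for the ramp.
for t in range(3, 11):
    print(t, step_indicator(t, 5), ramp_indicator(t, 5, 4))
```

The step function needs only the date of the change, which is available from policy records, whereas the ramp requires an additional assumption about how quickly behaviour adapts.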
It is true that the construction of the time-varying contact matrices has assumed all members of the population are in the same class with respect to factors other than age. If information on subclasses of interest (e.g. ‘responding’) is available to inform the model, it would be possible to include an extended contact matrix including subclasses, meaning cases would also need to be further divided depending on subclass status. If such a status is true for certain age groups, for example, the younger three age groups, it may be better represented as a covariate with the same matrix structure as the observed counts rather than increasing the dimension of the matrix to reflect the increased number of classes. The goal is to include enough nuance that the transmission matrix is informative for the groups of interest included in the model, but does not incorporate unnecessary distinctions which could mean artificially low disease counts would enter the model and cause convergence problems.
For example, in our work we have not stratified cases by sex, as the patterns of case counts are similar for each sex. It bears mentioning that summing the results from a multivariate endemic-epidemic model may not yield the same results as those found in the univariate version, as the interplay between groups will not have been incorporated in the latter. A related issue is how sensitive the results are to the choice of the age groups. We have tried to define the age groups in a reasonable manner (school children, working adults, elderly, etc.), though it would be interesting to investigate how sensitive the results are to other stratifications.
7 GENERALISABILITY AND VACCINES (KUCHARSKI)
While our modelling approach can easily be applied to other countries, when working with COVID-19 data for multiple regions, it is pertinent that users of the data consider whether the case definitions and testing strategies are the same across regions. If data is not harmonised in such a manner, conclusions may not be straightforward in multi-region comparisons. With regard to the roll out of COVID-19 vaccines, it is important to know not just how many doses of vaccine have been given but also which ones are being given. To continue with the examples of the two countries considered, at the time of writing (July 2021), Switzerland has only licensed messenger RNA vaccines (Comirnaty and Spikevax) for use against COVID-19, while other options exist (e.g. adenovirus-based Vaxzevria) in the United Kingdom, a nuance which might not be evident from the proportion vaccinated in each country. Furthermore, the immunisation regimens are different; many younger Swiss residents are currently fully vaccinated with 4–6 weeks between shots while their British equivalents are waiting up to 12 weeks between shots and were invited for their first vaccination later. However, once appropriate considerations have been made regarding vaccine and case data, it is possible to incorporate (time-dependent) vaccination coverage rates in endemic-epidemic modelling. To appropriately account for the remaining (unvaccinated) number of susceptibles, use of the log proportion of unvaccinated cases is recommended following Herzog et al. (2011). This is also the approach Kucharski and colleagues have utilised in their endemic-epidemic model for measles which included vaccination (Robert et al., 2022).
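To make the last point concrete, the following minimal sketch computes the log proportion of unvaccinated individuals from a time series of vaccination coverage. How this quantity then enters the model is described in Herzog et al. (2011) and is not reproduced here; the function name and the coverage values are hypothetical.

```python
import math
from typing import List


def log_unvaccinated(coverage: float) -> float:
    """Return log(1 - coverage), the log proportion of unvaccinated.

    `coverage` is the vaccinated proportion and must lie in [0, 1), since
    the logarithm is undefined when nobody remains unvaccinated.
    """
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must lie in [0, 1)")
    return math.log(1.0 - coverage)


# Invented weekly coverage values for one age group.
weekly_coverage: List[float] = [0.0, 0.10, 0.25, 0.40]
covariate = [log_unvaccinated(c) for c in weekly_coverage]
print(covariate)
```

The covariate is zero when nobody is vaccinated and becomes increasingly negative as coverage rises, so it reduces the modelled transmission as the pool of susceptibles shrinks.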
8 INTERPRETABILITY (KUCHARSKI)
It is true that there is a balance between what data allows us to fit and how realistic and interpretable our model is. The benefit of the endemic-epidemic modelling framework is that it allows us to examine the spread of disease across age groups with flexible statistical techniques. The first instance of such a multivariate model is Knorr-Held and Richardson (2003) which investigated the spatio-temporal dynamics of meningococcal disease. Compartmental models are easier to interpret, but more difficult to apply to surveillance data (see Held et al., 2006; Paul et al., 2008, for further discussion).
ACKNOWLEDGEMENTS
We thank the Royal Statistical Society for the opportunity to appraise comments from discussants and the discussants themselves for providing feedback. As some of the comments cover similar topics, we respond to the points raised rather than giving individual responses.