1 ESTIMATING ACCURATE CASE COUNTS (DIGGLE)

Entire editions of academic journals are dedicated to infectious disease modelling efforts while proper use of data to inform the modelling has been emphasised only recently (e.g. Held et al., 2020). The importance of data deserves highlighting and it is noteworthy that one of the most detailed and often analysed data sets in the field dates back to a measles outbreak in 1861 (Aaby et al., 2021). Without useful data, we will not be able to estimate the susceptible and asymptomatic proportions of the population.

Strengthening and improving national and intergovernmental disease surveillance and monitoring systems (the latter coordinated by bodies such as ECDC and WHO) allows for earlier detection of disease outbreaks. Such systems include mandatory case reporting of notifiable diseases, sentinel surveillance systems, and monitoring of internet and news media, all under the umbrella of epidemic intelligence services. Disease surveillance requires staff and resources to function, and systems have seen increases in technological capacity in recent years (Groseclose & Buckeridge, 2017; Hulth et al., 2010). Resources needed for ‘infodemic’ management also reduce the amount of human effort available for surveillance activities.

Time series of infectious disease cases typically arising from a surveillance system can easily be modelled using the framework we used and presented. However, if the underlying data is flawed, so too will be the outputs. We are cognisant of the adage ‘garbage in, garbage out’. While we are aware of many funding opportunities for COVID-19 modelling, it is unclear how much emergency grant support has been given to strengthening current and future data gathering and storing infrastructure. Utilising existing data mechanisms rather than ‘re-inventing the wheel’ is paramount. Relatedly, there has recently been an attempt at re-branding the data-focused parts of infectious disease surveillance as ‘outbreak analytics’ (Polonsky et al., 2019).

In our own work examining the effect of travel restrictions to neighbouring regions on cases in Switzerland, we have recently considered both Italian and French case data (see Grimée et al., 2022, for an initial analysis of some of the regions). Two matters caused us to examine the data in further detail rather than simply model it as-is. The first is that certain case counts in Italian regions show changes from one day to the next which seem unrealistic: in particular, we have instances of zero (or even negative!) case counts followed by large counts. The second is incoherence in case counts for French regions between data sets after a change of data provider. The Zurich case data does not suffer from such problems, but certain cases may not be captured by the surveillance system, and so there is a risk of underreporting.
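The kind of check that prompted this closer look can be sketched in a few lines. The example below is illustrative only: the function name, the threshold `min_spike` and the daily counts are hypothetical and are not taken from the Italian or French data. It flags negative counts, and large counts that directly follow a zero or negative count, which may indicate batch reporting or retrospective corrections.

```python
def flag_suspicious_counts(counts, min_spike=30):
    """Return (index, reason) pairs for implausible daily case counts.

    `min_spike` is a hypothetical threshold that would need to be
    chosen with knowledge of the reporting system in question.
    """
    flags = []
    for t, y in enumerate(counts):
        if y < 0:
            # Negative counts usually reflect retrospective corrections.
            flags.append((t, "negative"))
        elif t > 0 and counts[t - 1] <= 0 and y >= min_spike:
            # A large count right after a zero or negative count
            # suggests that cases were reported in a batch.
            flags.append((t, "spike_after_nonpositive"))
    return flags


# Made-up daily counts for one region.
daily = [12, 15, 0, 40, -3, 20, 14]
flags = flag_suspicious_counts(daily)
print(flags)
```

Flagged days would then be inspected, and possibly redistributed or treated as missing, before any model is fitted.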

2 UNDERREPORTING (DIGGLE, SCALIA-TOMBA, AND KUCHARSKI)

Routine infectious disease surveillance systems are prone to capturing only part of the disease prevalence and so provide an incomplete picture of the burden. Specifically, not all infected persons will develop symptoms (asymptomatic cases) and thus seek health care, so their cases may not be reported in either notifiable disease surveillance systems or sentinel and syndromic surveillance systems. The impact of underreporting on endemic-epidemic models was examined by Bracher and Held (2021), and we are aware that we need to correct for this in our analysis of school closure, taking into account that underreporting may be age dependent. Reporting also depends on a correct clinical diagnosis (i.e. no misdiagnosis) and timely entry in the notification system. Certain delays are inherent to the reporting system, for example, the time between a test being taken and sent to a laboratory for analysis, and are usually corrected for using nowcasting (Höhle & an der Heiden, 2014). Increased testing efforts are expected to change the reporting rate as more asymptomatic cases will be captured, and so underreporting is also time dependent.
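To make the age and time dependence concrete, the sketch below inflates reported counts by an assumed reporting probability for each week and age group. It is a deliberately naive illustration: it ignores the extra variability introduced by underreporting and is not the approach of Bracher and Held (2021). The function name and all numbers are hypothetical.

```python
def inflate_counts(reported, reporting_prob):
    """Divide reported counts by assumed reporting probabilities.

    Both arguments are lists of rows (one row per time point),
    each row holding one value per age group.
    """
    corrected = []
    for row_counts, row_probs in zip(reported, reporting_prob):
        corrected_row = []
        for y, p in zip(row_counts, row_probs):
            if not 0.0 < p <= 1.0:
                raise ValueError("reporting probability must lie in (0, 1]")
            corrected_row.append(y / p)
        corrected.append(corrected_row)
    return corrected


# Made-up example: two weeks, two age groups (younger, older).
reported = [[10, 4], [20, 6]]
# Assumed reporting probabilities; higher in week 2 after more testing.
reporting_prob = [[0.5, 0.2], [0.8, 0.3]]
corrected = inflate_counts(reported, reporting_prob)
```

With these made-up values, the younger group's week-1 count of 10 at an assumed reporting probability of 0.5 corresponds to roughly 20 underlying cases.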

3 METRICS FOR COMMUNICATION BETWEEN TECHNICAL EXPERTS AND POLICY MAKERS (SCALIA-TOMBA, KUCHARSKI, AND PANOVSKA-GRIFFITHS)

Our work is a ‘proof-of-concept’ analysis and forms the basis for an extended analysis of data from all of Switzerland and so the feedback will help hone future efforts. Our paper provides expected case counts in order to investigate the effect of school closures on disease incidence in the relevant age groups and shows that such an approach works. Such expected case counts could be a metric reported in addition to the effective reproduction number R and the growth rate r. For specific formulations of endemic-epidemic models, it is even possible to estimate an effective reproduction number in addition to expected cases (see Bracher & Held, 2021, for details).
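For orientation, the conditional mean of a generic age-stratified endemic-epidemic model can be written as below. This is a sketch of the general framework and not necessarily the exact specification fitted in the paper; the notation is assumed here for illustration, with Y_{gt} the count in age group g at time t, \nu_{gt} the endemic component, \phi_{gt} the epidemic component, c_{g'g} the contact weight from group g' to group g, and \psi_g an overdispersion parameter.

```latex
Y_{gt} \mid \text{past} \sim \operatorname{NegBin}(\mu_{gt}, \psi_g),
\qquad
\mu_{gt} = \nu_{gt} + \phi_{gt} \sum_{g'} c_{g'g}\, Y_{g',t-1}
```

In this notation, the expected case counts referred to above correspond to \mu_{gt}, evaluated with and without the intervention reflected in the contact weights.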

4 NEED FOR NULL HYPOTHESES IN INFECTIOUS DISEASE MODELLING (RILEY)

We agree that there is a need for well-specified null hypotheses to examine the effect of disease control interventions. Null hypotheses may need to be derived from benefit-harm assessments. The societal damage from a public health emergency extends beyond simple case counts. It is crucial to balance benefits and harms in a quantitative manner, something policy makers do qualitatively. As we are not in a position to decide which measures to introduce or lift, we cannot determine with great certainty what an ‘acceptable’ number of additional expected cases is, and so cannot construct a hypothesis test for our work, but we would like to stress the importance of age in such considerations.

Related to this, we wish to briefly highlight an experience we have had during our work in the previous year. To avoid creating unnecessary research waste and adding to the gargantuan amount of exploratory COVID-19 modelling papers, we submitted our work as registered research with an associated study protocol (Chambers, 2019a, 2019b). The preregistration was written according to the specification of Van den Akker et al. (2020). One of the sticking points in our protocol was how to determine a specific and testable hypothesis for our approach with an associated rationale (question 4 of the Van den Akker et al., 2020, specification). In the absence of well-defined null hypotheses as requested by Riley, such protocols can be hard to complete.

Reviewers specialised in modelling analyses of infectious disease surveillance data do not seem well-versed in the preregistered publication approach. The academic editor admitted that it was difficult to find reviewers who had the required subject matter expertise and were also able to review proposed procedures. Finding reviewers for the myriad COVID-19 papers being released is already taxing (Schwab & Held, 2020). It would seem that traditional publication methods (with review only occurring after the analysis is completed) are the ones used by the wider field, albeit with pre-prints and access to a repository of analysis code being increasingly provided (Brooks-Pollock et al., 2021). These approaches still do not allow the methods to be appraised before they are applied to data. Additionally, checks of data quality prior to modelling (cf. the need for improved data) provide further motivation for infectious disease modellers to preregister their work.

5 COMPARING HYPOTHETICAL CONTROL OPTIONS (KUCHARSKI)

While we used prediction retrospectively, the model could also be used prospectively to predict the effects of a future control scenario. The endemic-epidemic modelling framework is often used in probabilistic forecasting (Bauer et al., 2016; Held & Meyer, 2020; Ray et al., 2017; Stojanović et al., 2019). Many of the recent extensions to the framework consider aspects which need to be considered for such forward-looking approaches (Bracher & Held, 2022; Held et al., 2017) and incorporate methodology used in weather forecasting. We have not personally examined future scenarios of interventions using the modelling framework, as we have preferred to inform our work by available data.

Informing the model with future hypothetical time-varying contact matrices would enable us to examine the predicted number of cases under such hypothetical scenarios, for example, returning to baseline contact levels to represent fully reopening/removal of all social distancing measures. For examples of how such hypothetical contact matrices may be constructed see Willem et al. (2020) and Prem et al. (2020). Similarly to how we constructed our contact matrix with data on policy interventions, Alleman et al. (2021) informed changes to a contact matrix with mobility data and van Leeuwen et al. (2021) updated a contact matrix using time-use survey information. An alternative would be to use contact surveys conducted during the COVID-19 pandemic (Feehan & Mahmud, 2021; Jarvis et al., 2020, 2021; Latsuzbaia et al., 2020). In the work presented here—the pilot analysis of Zurich COVID-19 case data—we used a synthetic contact matrix which is informed by demographic data as well as contact diary surveys (Mistry et al., 2021). Demographic data has also been suggested as a way of ‘updating’ older contact matrices for newer use (Arregui et al., 2018) as the commonly used POLYMOD matrices are now some 16 years old and conducting a contact survey may be resource intensive.
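One common way to build such hypothetical matrices is to treat the overall contact matrix as a sum of setting-specific matrices and to scale each setting by a scenario-specific factor. The sketch below assumes this decomposition; the function name, the settings, the factors and the contact values are all hypothetical and do not correspond to the matrices used in our analysis.

```python
def scenario_contact_matrix(setting_matrices, scaling):
    """Weighted sum of setting-specific contact matrices.

    `setting_matrices` maps a setting name to a square matrix
    (list of lists) of contacts between age groups.
    `scaling` maps a setting name to a factor; 1.0 means baseline
    contact levels and 0.0 means the setting is fully closed.
    Settings missing from `scaling` are left at baseline.
    """
    settings = list(setting_matrices)
    n = len(setting_matrices[settings[0]])
    total = [[0.0] * n for _ in range(n)]
    for s in settings:
        factor = scaling.get(s, 1.0)
        for i in range(n):
            for j in range(n):
                total[i][j] += factor * setting_matrices[s][i][j]
    return total


# Made-up contact matrices for two age groups (children, adults).
matrices = {
    "home": [[1.0, 2.0], [2.0, 1.0]],
    "school": [[4.0, 0.0], [0.0, 0.0]],
}
baseline = scenario_contact_matrix(matrices, {})
schools_closed = scenario_contact_matrix(matrices, {"school": 0.0})
```

A sequence of such matrices, one per time point, could then stand in for the time-varying contact matrix when predicting cases under a hypothetical reopening scenario.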

6 CHANGES IN TRANSMISSIBILITY AND CHOICE OF AGE GROUPS (RILEY AND SCALIA-TOMBA)

The construction of our time-varying transmission weights is based upon informing a contact matrix by policy indicators given as step functions. We have previously considered use of ramp functions (as an alternative representation of changes in policy) in place of step functions. However, the choice of slope in such a ramp function needs to be informed by relevant information. We have not considered a smooth function as suggested by Riley. For simplicity, we continued our work with the step function representation of policy (hence transmission opportunity) changes.
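The difference between the two representations can be illustrated with a small sketch. The function names, the time points and the contact levels (1.0 before and 0.5 after a policy change) are hypothetical; the linear ramp is only one possible choice, and its slope, set here by `start` and `end`, is exactly the quantity that would need to be informed by data.

```python
def step_policy(t, change_time, before=1.0, after=0.5):
    """Policy indicator that switches abruptly at `change_time`."""
    return before if t < change_time else after


def ramp_policy(t, start, end, before=1.0, after=0.5):
    """Policy indicator that changes linearly between `start` and `end`."""
    if t <= start:
        return before
    if t >= end:
        return after
    fraction = (t - start) / (end - start)
    return before + fraction * (after - before)


# Compare both representations over ten made-up time points.
times = range(10)
step_values = [step_policy(t, change_time=5) for t in times]
ramp_values = [ramp_policy(t, start=5, end=9) for t in times]
```

Multiplying a baseline contact matrix by such an indicator at each time point gives time-varying transmission weights under either representation.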

It is true that the construction of the time-varying contact matrices has assumed all members of the population are in the same class with respect to factors that are not age. If information on subclasses of interest (e.g. ‘responding’) is available to inform the model, it would be possible to include an extended contact matrix including subclasses, meaning cases would also need to be further divided depending on subclass status. If such a status is true for certain age groups, for example, the younger three age groups, it may be better represented as a covariate with the same matrix structure as the observed counts rather than increasing the dimension of the matrix to reflect the increased number of classes. The goal is to include enough nuance that the transmission matrix is informative for the groups of interest included in the model, but does not incorporate unnecessary distinctions which could mean artificially low disease counts would enter the model, and could cause convergence problems.

For example, in our work we have not stratified cases by sex, as the patterns of case counts are similar for each sex. It bears mentioning that summing the results from a multivariate endemic-epidemic model may not yield the same results as the univariate version, as the univariate version does not incorporate the interplay between groups. A related issue is how sensitive the results are to the choice of the age groups. We have tried to define the age groups in a reasonable manner (school children, working adults, elderly, etc.) though it would be interesting to investigate how sensitive the results are to other stratifications.

7 GENERALISABILITY AND VACCINES (KUCHARSKI)

While our modelling approach can easily be applied to other countries, when working with COVID-19 data for multiple regions it is important that data users consider whether the case definitions and testing strategies are the same across regions. If data is not harmonised in such a manner, conclusions may not be straightforward in multi-region comparisons. With regard to the roll-out of COVID-19 vaccines, it is important to know not just how many doses of vaccine have been given but also which ones are being given. To continue with the examples of the two countries considered, at the time of writing (July 2021), Switzerland has only licensed messenger RNA vaccines (Comirnaty and Spikevax) for use against COVID-19, while other options exist (e.g. adenovirus-based Vaxzevria) in the United Kingdom, a nuance which might not be evident from the proportion vaccinated in each country. Furthermore, the immunisation regimens are different; many younger Swiss residents are currently fully vaccinated with 4–6 weeks between shots while their British equivalents are waiting up to 12 weeks between shots and were invited for their first vaccination later. However, once appropriate considerations have been made regarding vaccine and case data, it is possible to incorporate (time-dependent) vaccination coverage rates in endemic-epidemic modelling. To appropriately account for the remaining (unvaccinated) number of susceptibles, use of the log proportion of unvaccinated cases is recommended following Herzog et al. (2011). This is also the approach Kucharski and colleagues have utilised in their endemic-epidemic model for measles which included vaccination (Robert et al., 2022).
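As a minimal sketch of how such a covariate might be constructed, the example below maps time-dependent vaccination coverage to the log proportion unvaccinated, log(1 - coverage). The function name and the coverage values are hypothetical, and which population the coverage refers to would have to follow the considerations above.

```python
import math


def log_unvaccinated(coverage):
    """Map vaccination coverage proportions to log(1 - coverage).

    Coverage of exactly 1 is rejected because the logarithm of
    zero is undefined.
    """
    covariate = []
    for c in coverage:
        if not 0.0 <= c < 1.0:
            raise ValueError("coverage must lie in [0, 1)")
        covariate.append(math.log(1.0 - c))
    return covariate


# Made-up weekly coverage in one age group.
weekly_coverage = [0.0, 0.1, 0.25, 0.5]
covariate = log_unvaccinated(weekly_coverage)
```

The resulting values are zero when nobody is vaccinated and become increasingly negative as coverage rises, so that a log-linear predictor containing this covariate decreases with the shrinking pool of susceptibles.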

8 INTERPRETABILITY (KUCHARSKI)

It is true that there is a balance between what data allows us to fit and how realistic and interpretable our model is. The benefit of the endemic-epidemic modelling framework is that it allows us to examine the spread of disease across age groups with flexible statistical techniques. The first instance of such a multivariate model is Knorr-Held and Richardson (2003) which investigated the spatio-temporal dynamics of meningococcal disease. Compartmental models are easier to interpret, but more difficult to apply to surveillance data (see Held et al., 2006; Paul et al., 2008, for further discussion).

ACKNOWLEDGEMENTS

We thank the Royal Statistical Society for the opportunity to appraise comments from discussants and the discussants themselves for providing feedback. As some of the comments cover similar topics, we respond to the points raised rather than to each discussant individually.

REFERENCES

Aaby, P., Thoma, H. & Dietz, K. (2021) Measles in the European past: outbreak of severe measles in an isolated German village, 1861.

Alleman, T.W., Vergeynst, J., De Vischer, L., Rollier, M., Torfs, E., Belgian Collaborative Group on COVID-19 Hospital Surveillance et al. (2021) Assessing the effects of non-pharmaceutical interventions on SARS-CoV-2 transmission in Belgium by means of an extended SEIQRD model and public mobility data. Epidemics, 37, 100505.

Arregui, S., Aleta, A., Sanz, J. & Moreno, Y. (2018) Projecting social contact matrices to different demographic structures. PLoS Computational Biology, 14(12), 1–18.

Bauer, C., Wakefield, J., Rue, H., Self, S., Feng, Z. & Wang, Y. (2016) Bayesian penalized spline models for the analysis of spatio-temporal count data. Statistics in Medicine, 35(11), 1848–1865.

Bracher, J. & Held, L. (2021) A marginal moment matching approach for fitting endemic-epidemic models to underreported disease surveillance counts. Biometrics, 77(4), 1202–1214.

Bracher, J. & Held, L. (2022) Endemic-epidemic models with discrete-time serial interval distributions for infectious disease prediction. International Journal of Forecasting, 38(3), 1221–1233.

Brooks-Pollock, E., Danon, L., Jombart, T. & Pellis, L. (2021) Modelling that shaped the early COVID-19 pandemic response in the UK. Philosophical Transactions of the Royal Society B: Biological Sciences, 376(1829), 20210001.

Chambers, C. (2019a) The registered reports revolution: lessons in cultural reform. Significance, 16(4), 23–27.

Chambers, C. (2019b) What’s next for registered reports. Nature, 573(7773), 187–189.

Feehan, D.M. & Mahmud, A.S. (2021) Quantifying population contact patterns in the United States during the COVID-19 pandemic. Nature Communications, 12(1), 893.

Grimée, M., Bekker-Nielsen Dunbar, M., Hofmann, F. & Held, L. (2022) Modelling the effect of a border closure between Switzerland and Italy on the spatiotemporal spread of COVID-19 in Switzerland. Spatial Statistics, 49, 100552.

Groseclose, S.L. & Buckeridge, D.L. (2017) Public health surveillance systems: recent advances in their use and evaluation. Annual Review of Public Health, 38, 57–79.

Held, L., Hens, N., O’Neill, P. & Wallinga, J. (Eds.). (2020) Handbook of infectious disease data analysis. Handbooks of Modern Statistical Methods. Boca Raton: Chapman & Hall/CRC Press.

Held, L., Hofmann, M., Höhle, M. & Schmid, V. (2006) A two-component model for counts of infectious diseases. Biostatistics, 7(3), 422–437.

Held, L. & Meyer, S. (2020) Forecasting based on surveillance data. In: Held, L., Hens, N., O’Neill, P. & Wallinga, J. (Eds.) Handbook of infectious disease data analysis. Handbooks of Modern Statistical Methods. Boca Raton: Chapman & Hall/CRC Press, pp. 509–528.

Held, L., Meyer, S. & Bracher, J. (2017) Probabilistic forecasting in infectious disease epidemiology: the 13th Armitage lecture. Statistics in Medicine, 36(22), 3443–3460.

Herzog, S.A., Paul, M. & Held, L. (2011) Heterogeneity in vaccination coverage explains the size and occurrence of measles epidemics in German surveillance data. Epidemiology & Infection, 139(4), 505–515.

Höhle, M. & an der Heiden, M. (2014) Bayesian nowcasting during the STEC O104:H4 outbreak in Germany, 2011. Biometrics, 70(4), 993–1002.

Hulth, A., Andrews, N., Ethelberg, S., Dreesman, J., Faensen, D., Van Pelt, W. et al. (2010) Practical usage of computer-supported outbreak detection in five European countries. Eurosurveillance, 15(36), 19658.

Jarvis, C.I., Gimma, A., van Zandvoort, K. et al. (2021) The impact of local and national restrictions in response to COVID-19 on social contacts in England: a longitudinal natural experiment. BMC Medicine, 19, 52.

Jarvis, C.I., Van Zandvoort, K., Gimma, A., Prem, K., CMMID COVID-19 Working Group, Klepac, P. et al. (2020) Quantifying the impact of physical distance measures on the transmission of COVID-19 in the UK. BMC Medicine, 18(1), 124.

Knorr-Held, L. & Richardson, S. (2003) A hierarchical model for space-time surveillance data on meningococcal disease incidence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(2), 169–183.

Latsuzbaia, A., Herold, M., Bertemes, J.-P. & Mossong, J. (2020) Evolving social contact patterns during the COVID-19 crisis in Luxembourg. PLoS One, 15(8), 1–13.

Mistry, D., Litvinova, M., Pastore y Piontti, A., Chinazzi, M., Fumanelli, L., Gomes, M.F.C. et al. (2021) Inferring high-resolution human mixing patterns for disease modeling. Nature Communications, 12(1), 323.

Paul, M., Held, L. & Toschke, A.M. (2008) Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27(29), 6250–6267.

Polonsky, J.A., Baidjoe, A., Kamvar, Z.N., Cori, A., Durski, K., Edmunds, W.J. et al. (2019) Outbreak analytics: a developing data science for informing the response to emerging pathogens. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1776), 20180276.

Prem, K., Liu, Y., Russell, T., Kucharski, A.J., Eggo, R.M., Davies, N. et al. (2020) The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. Lancet Public Health, 5(5), e261–e270.

Ray, E.L., Sakrejda, K., Lauer, S.A., Johansson, M.A. & Reich, N.G. (2017) Infectious disease prediction with kernel conditional density estimation. Statistics in Medicine, 36, 4908–4929.

Robert, A., Kucharski, A.J. & Funk, S. (2022) The impact of local vaccine coverage and recent incidence on measles transmission in France between 2009 and 2018. BMC Medicine, 20, 77.

Schwab, S. & Held, L. (2020) Science after Covid-19: faster, better, stronger? Significance, 17(4), 8–9.

Stojanović, O., Leugering, J., Pipa, G., Ghozzi, S. & Ullrich, A. (2019) A Bayesian Monte Carlo approach for predicting the spread of infectious diseases. PLoS One, 14(12), e0225838.

Van den Akker, O., Weston, S.J., Campbell, L., Chopik, W.J., Damian, R.I., Davis-Kean, P. et al. (2020) Preregistration of secondary data analysis: a template and tutorial. PsyArXiv.

van Leeuwen, E., PHE Joint Modelling Group & Sandmann, F. (2021) Augmenting contact matrices with time-use data for fine-grained intervention modelling of disease dynamics: a modelling analysis. Statistical Methods in Medical Research, 31(9), 1704–1715.

Willem, L., Hoang, T.V., Funk, S., Coletti, P., Beutels, P. & Hens, N. (2020) SOCRATES: an online tool leveraging a social contact data sharing initiative to assess mitigation strategies for COVID-19. BMC Research Notes, 13(1), 293.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)