## Abstract

The recent availability of survey data on social contact patterns has made possible important advances in the understanding of the social determinants of the spread of close-contact infections, and of the importance of long-lasting contacts for effective transmission to occur. Still, little is known about the relationship between two of the most critical identified factors (frequency of contacts and duration of exposure) and how this relationship applies to different types of infections. By integrating data from two independently collected social surveys (Polymod and time use), we propose a model that combines these two transmission determinants into a new epidemiologically relevant measure of contacts: the number of “suitable” contacts, which is the number of contacts that involve a sufficiently long exposure time to allow for transmission. The validity of this new epidemiological measure is tested against Italian serological data for varicella and parvovirus-B19, with uncertainty evaluated using the Bayesian melding technique. The model performs quite well, indicating that the interplay between time of exposure and contacts is critical for varicella transmission, while for B19 it is the duration of exposure that matters for transmission.

## Introduction

Social contacts and their underlying socio-demographic structures are known to greatly affect the way in which infectious diseases, such as measles and influenza, circulate in the population (Wallinga *and others*, 2006). For instance, schools and workplaces strongly influence the age profile of the contacts individuals have during the day, as well as the overall number of contacts and the duration of exposure. Standard mathematical approaches to study infectious disease dynamics are based on compartmental transmission models, whereby individuals in the population are classified according to their infection status (e.g. the Susceptible, Infectious, and Recovered, or SIR, model) and their age (Anderson and May, 1991). The structure of transmission between groups is generally specified via “indirect” methods (e.g. the Who-Acquires-Infection-From-Whom matrices of Anderson and May, 1991, and the preferred matrices of Hethcote, 1995). Although this indirect approach has generated a very large number of contributions to theoretical epidemiology and public health, it has important limitations: in particular, the results are strongly sensitive to the hypothesized contact structures, as emphasized by Greenhalgh and Dietz (1994).

More recently, several alternative approaches have been proposed that aim at “directly” estimating contact matrices, i.e. matrices whose entries represent the average number of contacts that individuals in age group $$i$$ have with individuals in age group $$j$$ per unit of time. A first approach relies on contact surveys in which respondents self-report the number of contacts they had during a randomly sampled day (Edmunds *and others*, 1997; Mossong, Hens, Jit *and others*, 2008; Wallinga *and others*, 2006). A second approach relies on time use data (TUD), in which time of exposure matrices are estimated from time use (TU) diaries (Zagheni *and others*, 2008). In a third approach, contact matrices are generated from individual-based models and appropriately calibrated to socio-demographic data and/or TUD (Del Valle *and others*, 2007; Fumanelli *and others*, 2012; Iozzi *and others*, 2010). In all these works, TUD and contact data have been used separately to model two different dimensions of the transmission process, namely exposure duration and contacts. To date, only Smieszek (2009) has developed a basic transmission framework that includes both the duration and the intensity of contacts. Recent investigations based on Polymod data (Mossong, Hens, Jit *and others*, 2008) showed the importance of long-duration physical contact in explaining observed serological profiles for varicella and parvovirus B19 (Goeyvaerts *and others*, 2010; Melegaro *and others*, 2011). These results provide the first epidemiological evidence of the need for a deeper integration of contacts and duration of exposure in modeling the spread of infection.

The aim of this work is to take the existing literature one step further by integrating TU and contact data into a unified model, since the two data sources provide useful complementary information. Specifically, we address the following questions: what is the role of the number of contacts versus the duration of exposure in explaining serological profiles? Can this modeling framework provide insights into infection-specific determinants of transmission?

To answer these questions, we rely on a classic probabilistic result known as the occupancy problem, and we define contacts as “suitable for transmission” if they last long enough to include a potential infectious event. Our method allows us to first build a large family of “suitable” contact matrices that possibly reflect different degrees of infection transmissibility and then use these matrices to build SIR and SIRS (Susceptible, Infectious, Recovered, and Susceptible) epidemiological models for, respectively, varicella zoster virus (VZV) and parvovirus B19 (B19). The models are validated against serological data for these infections. Statistical inference of our deterministic model parameters is performed using the Bayesian melding (BM) approach (Alkema *and others*, 2007; Coelho *and others*, 2011; Poole and Raftery, 2000). Comparisons with models based on more standard social contact structures (Goeyvaerts *and others*, 2010; 2011; Melegaro *and others*, 2011) are also considered.

The article is structured as follows. Section 2 describes the data sources and presents the theoretical framework yielding the suitable contact matrix. Section 3 reports the fit of the considered models to Italian serological data for VZV and B19. Discussion of critical issues is postponed to Section 4. Supplementary material available at Biostatistics online reports additional technical details and a number of further modeling results.

## Materials and methods

### Data

Three complementary and independent datasets were used: social contact data from the Polymod Study, TUD, and serological data on varicella and B19 infections. We used data for Italy, the only country for which we have a full combination of the three relevant data sources, in particular a TU study covering young children ($$\geq 3$$ years).

The EU-funded Polymod project gathered social contact data for eight European countries from May 2005 to September 2006 (Mossong, Hens, Jit *and others*, 2008). The sample consisted of 7290 respondents (849 in Italy) who were asked to self-report the number of contacts they had during a randomly assigned day, together with some additional information on their socio-economic background. Participants provided information about the age and sex of each contacted person, the contact location, duration, proximity, and frequency.

TUD were collected by the Italian National Statistical Agency in 2002–2003 on a sample of approximately 24 000 households. Respondents recorded their activities in diaries during a randomly assigned day, as well as the location where the activity took place.

VZV and B19 age-specific serological data, i.e. data which provide the individuals’ current epidemiological status (susceptible or immune), were collected for Italy between 1997 and 2003 by the European Sero-Epidemiology Network (ESEN2) (Nardone *and others*, 2007). The total sample size was 2517 and individuals’ ages ranged from 0 to 79 years (Mossong, Hens *and others*, 2008). VZV and B19 are both close-contact childhood respiratory infections characterized by an infectious period of approximately 6–7 days (Anderson and Cherry, 2004; Heymann, 2008). Given that no mass vaccination program for either VZV or B19 was in place in Italy at the time when serological data were gathered (a vaccine is still not available for B19), we assumed a pre-vaccination equilibrium for both infections.

### Derivation of suitable contact matrices

Standard contact matrices $$C$$ based on Polymod survey data (Mossong, Hens, Jit *and others*, 2008; Wallinga *and others*, 2006) have elements $$c_{ij}$$ representing the average number of contacts (per unit of time, e.g. per day) that individuals in age group $$i$$ have with individuals in age group $$j$$. Each element is estimated by dividing the total number of contacts with individuals in age group $$j$$ reported by respondents in age group $$i$$ ($$M_{ij}$$) by the number of survey respondents in age group $$i$$, $$x_i$$ ($$c_{ij} = M_{ij}/x_i$$). Contact matrices based on either all reported contacts or duration-stratified contacts ($$<$$15, 15–60 min, $$>1\,{\rm h}$$) were considered here.
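As an illustration of the estimator just described, the sketch below builds a contact matrix from individual contact records. It takes row $$i$$ as the respondent's age group, so that $$c_{ij}$$ is the average number of contacts a respondent in group $$i$$ reports with people in group $$j$$. This is a minimal sketch under stated assumptions: the function name `contact_matrix`, the record layout, and the toy counts are all hypothetical, and any survey weighting or reciprocity correction is ignored.

```python
from collections import defaultdict


def contact_matrix(records, respondents, groups):
    """Average contacts c[i][j] that a respondent in group i reports with group j.

    records: (respondent_group, contact_group) pairs, one per reported contact.
    respondents: number of respondents x_i in each age group.
    """
    totals = defaultdict(int)  # M[i, j]: contacts with group j reported by group i
    for i, j in records:
        totals[(i, j)] += 1
    return {i: {j: totals[(i, j)] / respondents[i] for j in groups} for i in groups}


# Hypothetical toy survey: 2 child and 4 adult respondents.
groups = ["0-14", "15+"]
records = ([("0-14", "0-14")] * 10 + [("0-14", "15+")] * 4
           + [("15+", "15+")] * 12 + [("15+", "0-14")] * 4)
C = contact_matrix(records, {"0-14": 2, "15+": 4}, groups)
print(C)
```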

Duration of exposure matrices, $$E$$, are computed as in Zagheni *and others* (2008), whereby the elements $$e_{ij}$$ represent the average time (e.g. in minutes) that individuals in age group $$i$$ are “exposed” to individuals in age group $$j$$.

Matrices $$C$$ and $$E$$ are combined, and a novel measure of contact, which we call “suitable contacts”, is defined. In our approach, a contact is “suitable” when, under the assumption that transmissibility cumulates over the duration of a contact, the underlying duration of exposure is sufficiently long to allow for transmission of the infection. We evaluate suitable contacts from TUD and contact data using a probabilistic result known as the *occupancy problem* (for details see Appendix A of supplementary material available at Biostatistics online).

We assume that the duration of exposure between age groups is randomly allocated to the respective contacts in discrete amounts, and that each infection is characterized by a minimal exposure duration that is necessary for transmission, the “minimal suitable duration” (MSD). Essentially, given independent information on the average number of contacts and the average duration of exposure, we want to compute the expected number of contacts that last longer than the minimal threshold for transmission, i.e. the MSD. For example, consider an infection for which the MSD is 1 min, and assume that individuals in age groups $$i$$ and $$j$$ are exposed to each other for $$e_{ij} = 20$$ min/day and have an average number of contacts $$c_{ij} = 10$$/day. The average duration of contacts, $$e_{ij}/c_{ij}$$, is then 2 min. However, if these 20 one-minute units are randomly allocated to the 10 contacts, there is still quite a large (binomial) probability, $$(1-1/10)^{20} \approx 12.2\%$$, that a given contact receives no unit at all, i.e. lasts less than 1 min, and is therefore “not suitable” for infection transmission.
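The arithmetic of this example can be checked directly. The sketch below assumes that each one-minute unit is assigned independently and uniformly to one of the 10 contacts, computes the binomial probability that a given contact receives no unit, and compares it with the Poisson approximation $$\exp(-2)$$. The helper name is hypothetical.

```python
import math


def prob_unsuitable_binomial(exposure_units, contacts):
    """Probability that a given contact receives none of the exposure units when
    each unit is assigned independently and uniformly to one of the contacts."""
    return (1 - 1 / contacts) ** exposure_units


# Example from the text: 20 one-minute units (MSD = 1 min) over 10 contacts.
p_binom = prob_unsuitable_binomial(20, 10)
p_poisson = math.exp(-20 / 10)  # Poisson approximation, mean 2 units per contact
print(round(p_binom, 4), round(p_poisson, 4))  # 0.1216 0.1353
```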

Now consider contacts and exposure over longer time periods. Since the available data contain no information about possible variations in the average duration per contact, $$e_{ij}/c_{ij}$$, we assume that it remains constant over time. Let $$u_{ij}$$ be a random variable representing the number of suitable contacts between age groups $$i$$ and $$j$$, with $$e_{ij}$$ expressed in units of the MSD. Over $$T$$ days there are $$c_{ij}T$$ contacts and $$e_{ij}T$$ exposure units, so the probability that a given contact receives no unit is $$(1-1/(c_{ij}T))^{e_{ij}T}$$, which converges to $$\exp(-e_{ij}/c_{ij})$$ as $$T$$ grows. The expected number of suitable contacts $$\bar {u}_{ij} = E(u_{ij})$$ is therefore given by the product of the average number of contacts $$c_{ij}$$ and the proportion of these contacts that are suitable for transmission, $$(1-\exp (-e_{ij}/c_{ij}))$$, where $$\exp (-e_{ij}/c_{ij})$$ is the Poisson probability that a contact is not suitable. The larger the average duration of contacts $$e_{ij}/c_{ij}$$, the larger the proportion of contacts suitable for transmission. In our previous example, the proportion of suitable contacts would be $$1 - \exp (-2)$$, i.e. about 86%. The average number of suitable contacts $$\bar {u}_{ij}$$ is smaller than the average number of contacts $$c_{ij}$$ because some contacts do not last long enough and are therefore “not suitable for transmission”.
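A minimal sketch of the expected number of suitable contacts, $$\bar{u}_{ij} = c_{ij}(1-\exp(-e_{ij}/c_{ij}))$$, with exposure expressed in MSD units. It also includes a numerical check that the binomial probability over $$T$$ days approaches the Poisson limit. Function and variable names are illustrative. Pairs of groups with no reported contacts are assumed to contribute no suitable contacts.

```python
import math


def suitable_contacts(C, E):
    """Expected suitable contacts: u_bar[i][j] = c_ij * (1 - exp(-e_ij / c_ij)).

    E must be expressed in units of the minimal suitable duration (MSD).
    Entries with c_ij == 0 have no contacts and hence no suitable contacts.
    """
    n = len(C)
    return [[C[i][j] * (1 - math.exp(-E[i][j] / C[i][j])) if C[i][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


# Binomial probability of an unsuitable contact over T days vs. the Poisson limit.
c, e = 10, 20
for T in (1, 10, 100, 1000):
    print(T, (1 - 1 / (c * T)) ** (e * T))
print("limit", math.exp(-e / c))

U = suitable_contacts([[10.0]], [[20.0]])
print(round(U[0][0] / 10, 4))  # proportion suitable, 1 - exp(-2): 0.8647
```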

In principle, if the MSD were known for different infections, we could compute suitable contact matrices $$U$$ (with elements $$\bar{u}_{ij}$$) for any close-contact infectious disease. Although this information is generally not available and needs to be estimated or assumed, our approach can be used to show how the shape of a suitable contact matrix, or the level of assortativeness in contacts, varies as the minimal (infection-specific) duration varies. The impact of changes in the MSD on the overall shape of the suitable contact matrix depends on the shapes of both the $$E$$ and $$C$$ matrices. It can be shown that, in standard situations, less transmissible infections have a suitable matrix $$U$$ that is less assortative than the original contact matrix $$C$$ (see Figure 1 for an example).

In order to estimate the fraction of total exposure time between age groups that is suitable for transmission, which we define as $$q_2$$, we express the expected number of suitable contacts in the following more general form:

$$\bar{u}_{ij} = c_{ij}\left(1-\exp\left(-\frac{q_2\, e_{ij}}{c_{ij}}\right)\right).$$

The force of infection (FOI) for age group $$i$$ is then defined as in standard approaches (Wallinga *and others*, 2006): $$\lambda _i=q_1\sum _j \bar{u}_{ij}I_j$$, where $$I_j$$ is the fraction of infective individuals of age $$j$$ and $$q_1$$ is a constant disease-specific transmission coefficient. The FOI is used as the building block of appropriate age-structured transmission models that are validated, under the assumption of endemic equilibrium, against observed seroprevalence data for varicella and B19, according to the standard approach (Goeyvaerts *and others*, 2010, 2011; Iozzi *and others*, 2010; Melegaro *and others*, 2011; Wallinga *and others*, 2006). More specifically, an SIR model was used for varicella, while an SIRS structure was chosen for B19 because of existing epidemiological and modeling evidence of reinfection (Goeyvaerts *and others*, 2011). Using standard epidemiological modeling techniques, the above equations lead to the next generation matrix (NGM) (Diekmann *and others*, 1990), whose elements are $${\mathrm {NGM}}_{ij} = q_1 \times \bar {u}_{ij} \times d$$, where $$d$$ is the average duration of infectiousness. The leading eigenvalue of the NGM is the basic reproduction number ($$R_0$$), the best-known summary measure of the potential for infection spread.
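The NGM and $$R_0$$ can be sketched as follows. The sketch takes the definition $${\mathrm {NGM}}_{ij} = q_1 \bar{u}_{ij} d$$ literally and computes its dominant eigenvalue by power iteration, which converges for strictly positive matrices such as these. The input values are hypothetical, and a production implementation would more likely call a linear-algebra library.

```python
def dominant_eigenvalue(M, iters=1000, tol=1e-12):
    """Perron root of a strictly positive square matrix via power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(w)  # v is scaled so that max(v) == 1
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam


def basic_reproduction_number(q1, u_bar, d):
    """R0 as the dominant eigenvalue of NGM_ij = q1 * u_bar_ij * d."""
    ngm = [[q1 * u * d for u in row] for row in u_bar]
    return dominant_eigenvalue(ngm)


# Hypothetical inputs: 2 age groups, q1 = 0.1 per suitable contact, d = 7 days.
u_bar = [[4.0, 1.0], [1.0, 4.0]]
print(basic_reproduction_number(0.1, u_bar, 7))  # approximately 3.5
```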

In what follows, we call “Baseline” the transmission models (either SIR or SIRS) based on the suitable contact matrix computed from the Polymod matrix for all reported contacts $$C$$ and the exposure-duration matrix computed from TUD. The parameters $$q_1$$ and $$q_2$$ are here interpreted as “level” and “shape” parameters, respectively. High values of $$q_2$$ give little importance to the exposure matrix and more importance to the contact matrix. The level parameter $$q_1$$ rescales the overall structure of suitable contacts to account for infection transmissibility that is assumed to be non-age-dependent, and it is potentially affected by the use of suitable contacts as proxies of those contacts that enable transmission.
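To make the level/shape interpretation concrete, the sketch below assumes the general form $$\bar{u}_{ij} = c_{ij}(1-\exp(-q_2 e_{ij}/c_{ij}))$$, with exposure in minutes, and evaluates one matrix entry for increasing $$q_2$$. For large $$q_2$$ the entry approaches $$c_{ij}$$, so contacts dominate. For small $$q_2$$ it approaches $$q_2 e_{ij}$$, so exposure dominates. The numbers are hypothetical.

```python
import math


def suitable_entry(c, e, q2):
    """Assumed general form u = c * (1 - exp(-q2 * e / c)); e in minutes."""
    return c * -math.expm1(-q2 * e / c)  # -expm1(-x) == 1 - exp(-x), computed stably


c, e = 10.0, 120.0  # hypothetical: 10 contacts/day, 120 exposure minutes/day
for q2 in (1e-4, 1e-2, 1.0, 100.0):
    u = suitable_entry(c, e, q2)
    print(f"q2={q2:g}: u={u:.4f}, u/c={u / c:.4f}, u/(q2*e)={u / (q2 * e):.4f}")
```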

### Estimation and statistical inference on transmission parameters

In the standard approach, transmission parameters are estimated by fitting age-structured models based on the adopted contact matrix to serological data using maximum likelihood (Goeyvaerts *and others*, 2010; Iozzi *and others*, 2010; Kretzschmar *and others*, 2010; Melegaro *and others*, 2011; Ogunjimi *and others*, 2009; Zagheni *and others*, 2008).

In this work, we use the BM approach (Poole and Raftery, 2000) to formally incorporate existing epidemiological and biological knowledge in the estimation of SIR and SIRS model parameters. Consider, for instance, a standard age-structured SIR model that deterministically transforms a set of inputs (e.g. $$\theta $$) into a set of outputs (e.g. $$\rho $$: seroprofiles and $$R_0$$). The vector $$\theta $$ of unknown parameters is, for instance, $$\theta = (q_1,q_2)$$ for the Baseline SIR model. The available epidemiological knowledge is translated into a prior distribution for the basic reproduction number ($$R_0$$) of the infection considered. As prior density on $$R_0$$ for VZV we assume a uniform probability density with range $$(1,8)$$, which covers all best fitting $$R_0$$ estimates found in the literature (Farrington *and others*, 2013; Goeyvaerts *and others*, 2010; Melegaro *and others*, 2011). Given that in mathematical terms, $$R_0$$ is the dominant eigenvalue of the NGM (Diekmann *and others*, 1990), it follows that inputs and outputs in our model are linked through the deterministic $$M$$ function, $$R_0 = M(\theta)$$, which maps the vector of unknown parameters into the dominant eigenvalue of the $$NGM$$. A prior distribution on the outputs thus implicitly defines a prior distribution on the inputs (Poole and Raftery, 2000), conditional on the time of exposure and contact matrices. The implicitly defined prior is the so-called induced prior distribution: $$p^*(\theta)$$. Similarly, we expressed our a priori uncertainty about model parameters $$(q_1,q_2)$$ in the form of uniform prior distributions.
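The mapping $$R_0 = M(\theta)$$ can be sketched as below under several assumptions: the general suitable-contact form $$\bar{u}_{ij} = c_{ij}(1-\exp(-q_2 e_{ij}/c_{ij}))$$, hypothetical 2×2 matrices, and an infectious period of 7 days. As a crude stand-in for the induced prior, uniform draws of $$(q_1, q_2)$$ are kept only if $$M(\theta)$$ falls in $$(1,8)$$. This truncation is a simplification. It is not the logarithmic pooling that Bayesian melding actually performs (Poole and Raftery, 2000). Because $$R_0$$ is linear in $$q_1$$, the constraint carves out, for each $$q_2$$, an interval of admissible $$q_1$$ values.

```python
import math
import random


def dominant_eigenvalue(M, iters=100):
    """Perron root of a strictly positive square matrix by power iteration."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


# Hypothetical inputs: contacts/day (C), exposure minutes/day (E), infectious period D (days).
C = [[10.0, 3.0], [3.0, 5.0]]
E = [[30.0, 25.0], [25.0, 40.0]]
D = 7.0


def model_r0(q1, q2):
    """M(theta): dominant eigenvalue of NGM_ij = q1 * u_bar_ij(q2) * D."""
    u_bar = [[c * -math.expm1(-q2 * e / c) for c, e in zip(row_c, row_e)]
             for row_c, row_e in zip(C, E)]
    return dominant_eigenvalue([[q1 * u * D for u in row] for row in u_bar])


random.seed(1)
draws = [(random.uniform(0.001, 0.5), random.uniform(0.001, 1.0)) for _ in range(2000)]
# Crude stand-in for the induced prior: keep draws whose implied R0 lies in (1, 8).
kept = [(q1, q2) for q1, q2 in draws if 1 < model_r0(q1, q2) < 8]
print(f"{len(kept)} of {len(draws)} draws compatible with R0 in (1, 8)")
```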

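The likelihood of observing the vector of serological data ($$W$$) was written using a standard product of Bernoulli likelihoods across age groups:

$$p(W \mid M(\theta)) = \prod_{a}\prod_{k=1}^{n_a} \pi_a(\theta)^{W_{ak}}\left(1-\pi_a(\theta)\right)^{1-W_{ak}},$$

where $$n_a$$ is the number of individuals of age $$a$$ in the serological sample, $$W_{ak}$$ equals 1 if the $$k$$th of them is seropositive and 0 otherwise, and $$\pi_a(\theta)$$ is the model-predicted proportion of seropositive individuals at age $$a$$ … *and others* (2011). Infants younger than 1 year of age are not considered because they are assumed to be (at least partially) protected by maternal antibodies.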
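A minimal sketch of this Bernoulli log-likelihood follows. It assumes individual-level 0/1 serostatus and a model-predicted seroprevalence at each individual's age. The prediction here comes from a hypothetical constant-FOI catalytic curve, $$\pi(a)=1-\exp(-0.3a)$$, rather than from the transmission model. Infants under 1 year are dropped, as described above.

```python
import math


def bernoulli_loglik(seropositive, predicted):
    """Sum over individuals of W*log(pi) + (1 - W)*log(1 - pi).

    seropositive: 0/1 indicators (1 = seropositive); predicted: model-predicted
    seroprevalence at each individual's age, assumed strictly between 0 and 1.
    """
    return sum(math.log(p) if w else math.log1p(-p)
               for w, p in zip(seropositive, predicted))


# Hypothetical records (age in years, serostatus); infants under 1 year are dropped.
records = [(0.5, 0), (1, 0), (2, 1), (5, 1), (10, 1), (20, 1), (40, 1)]
kept = [(a, w) for a, w in records if a >= 1]
# Toy prediction from a constant-FOI catalytic model, pi(a) = 1 - exp(-0.3 a).
predicted = [-math.expm1(-0.3 * a) for a, _ in kept]
ll = bernoulli_loglik([w for _, w in kept], predicted)
print(round(ll, 3))
```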

The posterior distribution for the parameters of interest was obtained by combining priors and likelihood within the standard Bayesian framework: $$p(\theta |W) \propto p^*(\theta) p(W| M(\theta))$$.

Since an analytical solution for the posterior distribution could not be obtained, we used the Sampling-Importance-Resampling algorithm (Rubin, 1988) to generate samples of the posterior distribution of model inputs and outputs. Model comparison was based on the deviance information criterion (DIC) (Spiegelhalter *and others*, 2002).
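The Sampling-Importance-Resampling step and the DIC can be illustrated on a deliberately simple problem. In this sketch the age-structured transmission model is replaced by a hypothetical one-parameter catalytic model, $$\pi(a)=1-\exp(-\lambda a)$$, which is fitted to a synthetic serosurvey generated inside the code. The prior range, sample sizes, and seed are arbitrary choices, not those of the analysis above. DIC is computed as the posterior mean deviance plus $$p_D$$, where $$p_D$$ is the mean deviance minus the deviance at the posterior mean (Spiegelhalter *and others*, 2002).

```python
import math
import random


def log_bernoulli(lam, age, seropositive):
    """Log-probability of one serostatus under pi(a) = 1 - exp(-lam * a), computed stably."""
    x = lam * age
    return math.log(-math.expm1(-x)) if seropositive else -x


def loglik(lam, ages, status):
    return sum(log_bernoulli(lam, a, w) for a, w in zip(ages, status))


random.seed(42)
# Synthetic serosurvey for illustration only: true lam = 0.15 per year, ages 1-40.
ages = [random.randint(1, 40) for _ in range(300)]
status = [int(random.random() < -math.expm1(-0.15 * a)) for a in ages]

# Sampling-importance-resampling: draw from the prior, weight by likelihood, resample.
prior = [random.uniform(0.01, 1.0) for _ in range(3000)]
logw = [loglik(lam, ages, status) for lam in prior]
top = max(logw)
weights = [math.exp(l - top) for l in logw]  # rescale to avoid underflow
posterior = random.choices(prior, weights=weights, k=1000)

# DIC = mean posterior deviance + pD, with pD = mean deviance - deviance at posterior mean.
deviances = [-2 * loglik(lam, ages, status) for lam in posterior]
mean_dev = sum(deviances) / len(deviances)
lam_bar = sum(posterior) / len(posterior)
p_d = mean_dev + 2 * loglik(lam_bar, ages, status)
dic = mean_dev + p_d
print(f"posterior mean lam = {lam_bar:.3f}, pD = {p_d:.2f}, DIC = {dic:.1f}")
```

The deviance of this toy model is convex in $$\lambda$$, so $$p_D$$ cannot be negative here. With realistic transmission models, that property is not guaranteed.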

Extension to SIRS models, as appropriate for B19, is straightforward. The vector of unknown parameters is three-dimensional, $$\theta = (q_1, q_2, \sigma)$$, where $$\sigma$$ is the rate at which individuals lose immunity and move back to the susceptible compartment. As for varicella, a uniform prior distribution for $$R_0$$ is assumed, although with a narrower range $$(1,5)$$ (Goeyvaerts *and others*, 2011).
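The effect of an SIRS structure on seroprevalence is easiest to see in a simplified catalytic version, assumed here for illustration only. Take a constant force of infection $$\lambda$$ and loss of immunity at rate $$\sigma$$, and ignore the short infectious period. The seropositive fraction $$z(a)$$ then solves $$dz/da = \lambda(1-z) - \sigma z$$, which gives $$z(a) = \frac{\lambda}{\lambda+\sigma}\left(1-e^{-(\lambda+\sigma)a}\right)$$. Seroprevalence therefore plateaus below 100%, at $$\lambda/(\lambda+\sigma)$$. The rates in the sketch are hypothetical, and a numerical integration cross-checks the closed form.

```python
import math


def sirs_seroprevalence(age, lam, sigma):
    """Closed form z(a) = lam/(lam+sigma) * (1 - exp(-(lam+sigma) a)), with z(0) = 0."""
    k = lam + sigma
    return lam / k * -math.expm1(-k * age)


def sirs_seroprevalence_euler(age, lam, sigma, steps=100_000):
    """Cross-check: forward-Euler integration of dz/da = lam*(1 - z) - sigma*z."""
    z, h = 0.0, age / steps
    for _ in range(steps):
        z += h * (lam * (1 - z) - sigma * z)
    return z


lam, sigma = 0.1, 0.01  # hypothetical rates per year
for a in (5, 20, 60):
    print(a, round(sirs_seroprevalence(a, lam, sigma), 3),
          round(sirs_seroprevalence_euler(a, lam, sigma), 3))
print("plateau:", round(lam / (lam + sigma), 3))  # 0.909
```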

## Results

### Suitable contact matrices

By considering different values of the MSD, which we assume to be a proxy for infection transmissibility, our approach allows an entire family of suitable contact matrices to be generated. Figure 1 illustrates the main differences between three suitable matrices obtained by combining the TU matrix (measured in minutes) with the Polymod matrix, under the assumption of an MSD of, respectively, 1, 10, and 20 min. Specifically, Figure 1(a) reports the mean number of contacts along the main diagonals of the three suitable matrices. It shows a massive decline in children's contacts (i.e. age groups 0–14), which decrease by up to a factor of six as the MSD increases from 1 to 20 min, whereas older age groups are much less penalized. Figures 1(b)–(d) report the contacts of children aged, respectively, 0–4, 5–9, and 10–14, i.e. the groups contributing most to transmission, with other age groups. Again, the strongest decline (still up to a factor of six) is for contacts with other child age groups, whereas contacts with older individuals are much less affected. Overall, these effects sharply reduce the assortativeness of the matrix.

The simplest summary measure of this decline in children's role in transmission is the dominant eigenvalue of the corresponding contact matrices. This is the basic reproduction number that would be observed under the social contact hypothesis for a hypothetical infection with a probability of transmission per single contact equal to one (Wallinga *and others*, 2006). As the MSD increases from 1 to 10 to 20 min, the dominant eigenvalue declines from 21.9 (very close to the value of 22.1 for the Polymod matrix) to 16.1 and to 12.5, suggesting a substantial decline in overall transmissibility. These examples illustrate the decline in transmission potential when a longer time of exposure is required for transmission.
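The same qualitative pattern can be reproduced with toy numbers. The sketch below uses hypothetical 2×2 contact and exposure matrices in which children (row 0) have many but relatively short contacts with each other. For MSDs of 1, 10, and 20 min, it computes the suitable matrix, its dominant eigenvalue (by power iteration), and the share of suitable contacts that fall on the diagonal, used here as a simple assortativeness index. The values are illustrative and are not the Italian estimates quoted above.

```python
import math


def dominant_eigenvalue(M, iters=500):
    """Perron root of a strictly positive square matrix by power iteration."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


def suitable_matrix(C, E, msd):
    """u_bar_ij = c_ij * (1 - exp(-(e_ij / msd) / c_ij)), with exposure E in minutes."""
    return [[c * -math.expm1(-e / (msd * c)) for c, e in zip(row_c, row_e)]
            for row_c, row_e in zip(C, E)]


def diagonal_share(U):
    """Fraction of all suitable contacts that fall within the same age group."""
    return sum(U[i][i] for i in range(len(U))) / sum(map(sum, U))


C = [[10.0, 3.0], [3.0, 5.0]]      # hypothetical contacts/day
E = [[30.0, 25.0], [25.0, 40.0]]   # hypothetical exposure minutes/day
for msd in (1, 10, 20):
    U = suitable_matrix(C, E, msd)
    print(msd, round(dominant_eigenvalue(U), 2), round(diagonal_share(U), 3))
```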

### Fit to varicella and parvovirus B19 serological data

The ability of suitable contact matrices to fit serological data for VZV and B19 is evaluated by comparing the performance of the Baseline model with a range of alternative models based on different mixing matrices. These include the Polymod matrix for all reported contacts (Polymod model), the time of exposure matrix based on TUD (Zagheni *and others*, 2008) (TU model), and a range of Polymod-based matrices stratified by duration of contacts (Polymod model $$<15\,{\mathrm {min}}$$, Polymod model 15–60 min, and Polymod model $$>1\,{\mathrm {h}}$$). Additional results using contacts and TUD stratified by setting (home, school, general community) and by proximity of contacts are reported in Appendix B of supplementary material available at Biostatistics online. The comparison between the Baseline model and the models based on duration-specific Polymod matrices is of central interest here, because the suitable matrix is intended as a substitute for duration-stratified contact data when these are not available. Tables 1 (for VZV) and 2 (for B19) report the estimated posterior means and modes, and related credible intervals, for the $$q$$s and $$R_0$$ parameters, and the DIC values for all models considered. Table 1 shows that the Baseline model fits observed varicella serology slightly better ($${\mathrm {DIC}} = 1496.55$$) than the Polymod and TU models ($${\mathrm {DIC}} = 1509.66$$ and $${\mathrm {DIC}}=1519.11$$, respectively). It also clearly outperforms the models based on short-duration Polymod contacts ($${\mathrm {DIC}}_{<15\,{\mathrm { min}}} = 2025.06$$, $${\rm DIC}_{15\mbox {--}60\,{\mathrm {min}}} =1560.74$$). The Polymod model based on long contacts remains slightly superior ($${\mathrm {DIC}}_{>1\,{\mathrm { h}}}= 1478.73$$).

Table 1. Varicella (VZV), Italy†: posterior means, modes, and 95% credible intervals (CI) of the model parameters, and DIC values, by model.

| | Baseline model | TU model | Polymod model | Polymod model (contacts $$<15\,{\mathrm {min}}$$) | Polymod model (contacts 15–60 min) | Polymod model (contacts $$>1\,{\mathrm {h}}$$) | Polymod model (contacts $$\lessgtr 1\,{\mathrm {h}}$$) |
|---|---|---|---|---|---|---|---|
| Mean $$q$$ | | 0.001 | 0.038 | 0.403 | 0.271 | 0.050 | |
| Mode $$q$$ | | 0.001 | 0.039 | 0.397 | 0.270 | 0.051 | |
| 95% CI $$q$$ | | [0.0014, 0.0015] | [0.036, 0.040] | [0.379, 0.427] | [0.260, 0.282] | [0.048, 0.053] | |
| Mean $$q_1$$ | 0.040 | | | | | | |
| Mode $$q_1$$ | 0.040 | | | | | | |
| 95% CI $$q_1$$ | [0.037, 0.047] | | | | | | |
| Mean $$q_2$$ | 0.368 | | | | | | |
| Mode $$q_2$$ | 0.249 | | | | | | |
| 95% CI $$q_2$$ | [0.147, 0.882] | | | | | | |
| Mean $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | 0.0078 |
| Mode $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | 0.0001 |
| 95% CI $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | [0.0002, 0.024] |
| Mean $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | 0.048 |
| Mode $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | 0.049 |
| 95% CI $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | [0.043, 0.052] |
| Mean $$R_0$$ | 5.491 | 5.127 | 5.844 | 18.619 | 7.210 | 5.006 | 5.118 |
| Mode $$R_0$$ | 5.399 | 5.126 | 5.893 | 18.352 | 7.189 | 5.012 | 5.001 |
| 95% CI $$R_0$$ | [5.181, 5.869] | [4.886, 5.381] | [5.552, 6.151] | [17.521, 19.732] | [6.914, 7.517] | [4.751, 5.273] | [4.815, 5.494] |
| DIC | 1496.548 | 1519.114 | 1509.657 | 2025.060 | 1560.738 | 1478.733 | 1476.969 |

Estimates of the $$q$$ parameters are not comparable across models due to the different scales of the underlying matrices. The square brackets indicate the 95% Credible Interval. The model with the lower value of DIC is the one to be preferred.

† Serological samples were collected between 1997 and 2003, TUD in 2002–2003, and Polymod data in 2006.

Table 2. Parvovirus B19, Italy†: posterior means, modes, and 95% credible intervals (CI) of the SIRS model parameters, and DIC values, by model.

| | Baseline model | TU model | Polymod model | Polymod model (contacts $$<15\,{\mathrm {min}}$$) | Polymod model (contacts 15–60 min) | Polymod model (contacts $$>1\,{\mathrm {h}}$$) | Polymod model (contacts $$\lessgtr 1\,{\mathrm {h}}$$) |
|---|---|---|---|---|---|---|---|
| Mean $$q$$ | | 0.0007 | 0.018 | 0.103 | 0.126 | 0.026 | |
| Mode $$q$$ | | 0.001 | 0.018 | 0.103 | 0.126 | 0.026 | |
| 95% CI $$q$$ | | [0.0007, 0.0008] | [0.018, 0.019] | [0.097, 0.110] | [0.120, 0.132] | [0.025, 0.028] | |
| Mean $$q_1$$ | 0.764 | | | | | | |
| Mode $$q_1$$ | 0.760 | | | | | | |
| 95% CI $$q_1$$ | [0.739, 0.790] | | | | | | |
| Mean $$q_2$$ | 0.001015 | | | | | | |
| Mode $$q_2$$ | 0.001001 | | | | | | |
| 95% CI $$q_2$$ | [0.001, 0.001] | | | | | | |
| Mean $$\sigma $$ | 0.009 | 0.009 | 0.006 | 0.010 | 0.010 | 0.009 | 0.009 |
| Mode $$\sigma $$ | 0.01 | 0.010 | 0.006 | 0.01 | 0.01 | 0.010 | 0.010 |
| 95% CI $$\sigma $$ | [0.007, 0.010] | [0.007, 0.010] | [0.005, 0.006] | [0.009, 0.010] | [0.009, 0.010] | [0.007, 0.010] | [0.000, 0.005] |
| Mean $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | 0.001 |
| Mode $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | 0.000 |
| 95% CI $$q_{\le 1\,{\mathrm {h}}}$$ | | | | | | | [0.025, 0.028] |
| Mean $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | 0.026 |
| Mode $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | 0.026 |
| 95% CI $$q_{>1\,{\mathrm {h}}}$$ | | | | | | | [0.007, 0.010] |
| Mean $$R_0$$ | 2.251 | 2.389 | 2.400 | 4.098 | 2.870 | 2.251 | 2.258 |
| Mode $$R_0$$ | 2.235 | 2.389 | 2.402 | 4.091 | 2.866 | 2.246 | 2.243 |
| 95% CI $$R_0$$ | [2.176, 2.324] | [2.303, 2.479] | [2.301, 2.502] | [3.854, 4.359] | [2.737, 3.010] | [2.162, 2.342] | [2.173, 2.347] |
| DIC | 3177.597 | 3177.529 | 3274.799 | 3569.968 | 3345.168 | 3207.100 | 3525.530 |

Estimates of the $$q$$ parameters are not comparable across models due to the different scales of the underlying matrices. The square brackets indicate the 95% Credible Interval. The model with the lower value of DIC is the one to be preferred.

† Serological samples were collected between 1997 and 2003, TUD in 2002–2003, and Polymod data in 2006.

Figure 2 shows the prior and posterior distributions for the $$q$$s and $$R_0$$ parameters of the Baseline model for VZV and the TU model for B19. Specifically, Figures 2(a)–(c) show that the distribution of $$q_1$$ is mainly concentrated around the modal value of 0.04. The posterior distribution of the proportion $$q_2$$ of exposure time that is suitable for transmission is markedly right-skewed (Figure 2), as indicated by the gap between the posterior mean (0.368) and mode (0.249). Nevertheless, both values indicate that a substantial proportion (25–37%) of the duration of exposure is suitable for transmission, in line with the fact that varicella is highly transmissible. This means (see the discussion in Appendix A of supplementary material available at Biostatistics online) that, for varicella transmission, both contacts and time of exposure matter.

The estimates of $$R_0$$ provided by the best performing models in Table 1 are all within the range of 5–6. A sensitivity analysis expanding the range of the uniform prior on $$R_0$$ from $$(1,8)$$ to $$(1,30)$$ to account for the variability found in Goeyvaerts *and others* (2010) did not result in any relevant changes in the posterior distributions.

In Figure 3 (left panel), the appropriateness of the Baseline model for reproducing varicella serological data is shown and compared with the other models.

As for B19, our results indicate, first of all, that the SIRS structure (results in Table 2) is systematically superior to the SIR one (results in Appendix 2.4 of supplementary material available at Biostatistics online), with smaller values of DIC for all matrices considered. The generality of this result, in particular the fact that it holds irrespective of the contact pattern adopted, appears to robustly confirm the findings in Goeyvaerts *and others* (2011). The estimates of the rate of loss of immunity ($$\sigma$$) broadly agree with those in Goeyvaerts *and others* (2011). Table 2 shows that SIRS models based either on the suitable matrix or on the TU matrix (both with $${\mathrm {DIC}}=3177$$) perform better than all other models considered ($${\mathrm {DIC}} >3207$$). The estimate of $$q_2$$ from the Baseline model is very small (0.001). For negligible $$q_2$$ values, the suitable contact matrix collapses into the exposure duration matrix, and only the product $$q= q_1\times q_2$$ is identifiable (Appendix 1 of supplementary material available at Biostatistics online). Indeed, the product of the Baseline estimates of $$q_1$$ and $$q_2$$ ($$0.764 \times 0.001015 \approx 0.0008$$) overlaps the estimate of $$q$$ for the TU model in Table 2. Hence, for B19, the Baseline model is not really distinguishable from the TU one. Overall, this indicates that the duration of exposure is a better predictor of B19 transmission than reported contacts of any type. The ensuing estimates of $$R_0$$, around two (Figure 2(e)), agree with those in Melegaro *and others* (2011) and are slightly higher than those reported in Goeyvaerts *and others* (2011). These differences are, however, expected: compared with the latter paper, we used different (unsmoothed instead of smoothed) contact matrices and a different estimation strategy, namely BM instead of maximum likelihood.
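The identifiability argument can be checked numerically under the assumed general form $$\bar{u}_{ij} = c_{ij}(1-\exp(-q_2 e_{ij}/c_{ij}))$$. Halving $$q_1$$ while doubling $$q_2$$ keeps $$q = q_1 q_2$$ fixed. The relative difference between corresponding entries of $$q_1\bar{u}_{ij}$$ then works out to $$\tanh(x/2)$$, with $$x = q_2 e_{ij}/c_{ij}$$ evaluated at the smaller $$q_2$$, and it vanishes as $$x \to 0$$. The matrices below are hypothetical.

```python
import math


def transmission_matrix(q1, q2, C, E):
    """q1 * u_bar(q2), assuming u_bar_ij = c_ij * (1 - exp(-q2 * e_ij / c_ij))."""
    return [[q1 * c * -math.expm1(-q2 * e / c) for c, e in zip(row_c, row_e)]
            for row_c, row_e in zip(C, E)]


def max_rel_diff(A, B):
    """Largest entrywise relative difference |a - b| / |b|."""
    return max(abs(a - b) / abs(b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


C = [[10.0, 3.0], [3.0, 5.0]]        # hypothetical contacts/day
E = [[100.0, 60.0], [60.0, 150.0]]   # hypothetical exposure minutes/day

# Two (q1, q2) pairs sharing the same product q = q1 * q2.
small = max_rel_diff(transmission_matrix(0.76, 0.001, C, E),
                     transmission_matrix(0.38, 0.002, C, E))
large = max_rel_diff(transmission_matrix(0.04, 0.25, C, E),
                     transmission_matrix(0.02, 0.50, C, E))
print(f"q2 near 0.001: {small:.2%} apart; q2 near 0.25: {large:.0%} apart")
```

With $$q_2$$ near 0.001 the two parameter pairs give almost the same matrix. With $$q_2$$ near 0.25 they differ substantially, which is why $$q_1$$ and $$q_2$$ are separately identifiable for varicella but not for B19.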

The TU and Baseline models perform better than models based on Polymod matrices under the social contact hypothesis. This is due to the lower assortativeness of the underlying contact matrices, which implies a smoother predicted serological profile. Such smoother profiles allow the models to better follow the unusually noisy B19 serological data (Figure 3, right panel).

## Discussion

In recent years, the availability of social contact data (Horby *and others*, 2011; Mossong, Hens, Jit *and others*, 2008; Wallinga *and others*, 2006) as well as TUD (Zagheni *and others*, 2008) has made possible important improvements in our understanding of mixing patterns that are critical for infectious disease modeling. However, the number of social contacts is only one of the critical variables that characterize individual interactions and lead to the transmission of infections. Other quantities, such as the amount of excreted infectious virus (e.g. through coughing, sneezing, or exhalation) and its propagation dynamics, might also be relevant (Smieszek, 2009; Tellier, 2006; Teunis *and others*, 2010; Weber and Stilianakis, 2008). In practice, the difficulty of measuring these quantities necessitates the use of proxies, such as the duration of exposure between individuals. For instance, if a sneeze during a contact substantially increases the probability of transmission, then a longer contact is more likely to lead to transmission of the infection, because it is more likely that at least one sneeze occurs during the contact. In other words, the occurrence of some “suitable events” (e.g. a kiss, a handshake, or a sneeze) is very likely to be positively related to the duration of the contact. For highly transmissible infections (e.g. measles), a short duration of exposure between an infected individual and a susceptible one might be sufficient for transmission. If the infection is not highly transmissible, a longer duration of exposure may be needed. Does this mean that, in this case, many contacts are “wasted” for transmission? Does a minimal “suitable duration” (MSD) of exposure, below which transmission cannot occur, exist?

The present paper attempts to answer these questions by integrating two dimensions, the number of encounters (from contact surveys) and the duration of exposure (from TUD), which are in most cases collected independently, into a single unified contact model (termed the Baseline model).

Our approach allows us to generate a large class of contact matrices by varying the dimensional unit of the exposure matrix, which is taken as a proxy for the MSD of different infections. For small values of the MSD, we recover the standard contact matrix itself (i.e. the Polymod matrix), suggesting that for highly transmissible infections only the number of contacts matters. For larger values of the MSD (infections with lower transmissibility), the importance of the number of contacts decreases and that of the exposure duration increases. In this case, some contacts do not involve a sufficient exposure duration and are therefore not suitable for transmission. The resulting contact matrix is thus less assortative, because age groups with large numbers of contacts are penalized more heavily for their lack of exposure time. We tested our method against Italian seroprevalence data for varicella and B19, using BM to estimate the model parameters. We believe that BM can be a valuable tool for further progress in the statistical modeling of infectious disease dynamics based on deterministic models. In particular, BM allowed us to consistently incorporate the current epidemiological evidence on VZV and B19 available in the scientific literature, in the form of prior knowledge about the main summary parameter of infection transmission, $$R_0$$.
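The two limiting cases described above can be illustrated with a deliberately simple rule. Assume, for illustration only and not necessarily as the exact formulation of the Baseline model, that the number of suitable contacts between age groups i and j is the number of contacts capped by the number of MSD-long exposure windows available, s_ij = min(c_ij, e_ij / δ), where c_ij is a contact matrix, e_ij an exposure-duration matrix (hours), and δ the MSD. This notation, the function names, and the 2×2 matrices are all hypothetical. The sketch also uses the share of contacts on the diagonal as a crude assortativity index.

```python
import numpy as np


def suitable_contacts(contacts: np.ndarray, exposure_hours: np.ndarray,
                      msd_hours: float) -> np.ndarray:
    """Hypothetical 'suitable contact' matrix (illustrative assumption).

    Each entry is the number of contacts, capped by how many MSD-long
    exposure windows fit into the exposure time between the two groups.
    """
    if msd_hours <= 0:
        raise ValueError("MSD must be positive")
    return np.minimum(contacts, exposure_hours / msd_hours)


def diagonal_share(m: np.ndarray) -> float:
    """Crude assortativity index: fraction of all contacts on the diagonal."""
    return float(np.trace(m) / m.sum())


# Made-up example with two age groups (children, adults).
contacts = np.array([[10.0, 2.0],
                     [2.0, 4.0]])   # mean daily contacts
exposure = np.array([[6.0, 3.0],
                     [3.0, 5.0]])   # mean daily exposure, hours

for msd in [0.01, 1.0, 2.0]:
    s = suitable_contacts(contacts, exposure, msd)
    print(f"MSD={msd} h: diagonal share={diagonal_share(s):.2f}\n{s}")
```

With a tiny MSD the rule returns the contact matrix unchanged; with a large MSD it becomes proportional to the exposure matrix, and the group with many contacts but comparatively little exposure time (children here) loses the most, so the diagonal share falls. The element-wise minimum is only one of several conceivable ways to combine the two data sources; it is used here because it makes both limits easy to see.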

The performance of the Baseline model in explaining the serological data was compared with that of alternative models based on a variety of contact matrices computed from Polymod and TUD. For varicella, the Baseline model fits well compared with models based on the overall and duration-stratified Polymod matrices and on the TU matrix. The estimated proportion $$q_2$$ of exposure duration relevant for transmission is much higher for varicella (mode of $$q_2$$ around 25%) than for B19 (mode of $$q_2$$ around 0.1%), consistent with varicella being much more transmissible than B19. In other words, both the number of contacts and the exposure duration appear to matter for varicella transmission.
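Statements about relative transmissibility are usually expressed through $$R_0$$. In a common textbook formulation for age-structured models, if infectious individuals remain infectious for D days on average, make m_ij daily (suitable) contacts with group j, and transmit with probability q per contact, then $$R_0$$ is the dominant eigenvalue of the next-generation matrix, R_0 = q · D · ρ(M). This parameterization is an assumption for illustration; it does not reproduce the paper's exact definition or scaling of $$q_2$$, and the function names and numbers below are hypothetical.

```python
import numpy as np


def basic_reproduction_number(q: float, infectious_days: float,
                              mixing: np.ndarray) -> float:
    """R0 as the spectral radius of K = q * D * M (textbook form, assumed here).

    mixing[i, j]: mean daily (suitable) contacts of an individual in group i
    with members of group j. The next-generation matrix proper uses the
    transpose of this layout; transposition does not change the spectral
    radius, so R0 is unaffected.
    """
    next_generation = q * infectious_days * mixing
    return float(np.max(np.abs(np.linalg.eigvals(next_generation))))


def q_from_r0(r0: float, infectious_days: float, mixing: np.ndarray) -> float:
    """Invert the linear relation: the per-contact q that yields a target R0."""
    return r0 / basic_reproduction_number(1.0, infectious_days, mixing)


# Hypothetical two-group mixing matrix and infectious period.
mixing = np.array([[10.0, 2.0],
                   [2.0, 4.0]])
D = 7.0  # days

r0 = basic_reproduction_number(0.05, D, mixing)
print(f"R0 with q=0.05: {r0:.2f}")
print(f"q giving R0=3: {q_from_r0(3.0, D, mixing):.4f}")
```

Because R_0 is linear in q in this simplified formulation, fixing the mixing matrix and the infectious period turns a statement about R_0 (such as a prior distribution on it) directly into a statement about q, and vice versa.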

With regard to B19 infection, our results confirm the robustness of the recent findings of Goeyvaerts *and others* (2011) on the importance of loss of immunity and reinfection: model structures allowing reinfection (i.e. of the SIRS type) systematically perform better than structures with permanent immunity (SIR), irrespective of the contact model adopted. Moreover, our results indicate that duration of exposure is a much better predictor of B19 transmission than the number of reported contacts.

The method developed in this paper is specifically designed for the case in which contact data and exposure-duration data come from independent sources, and this may be considered a limitation of our work. Indeed, at the individual level there may be strong correlations between exposure time and number of contacts, which cannot be tested using independent datasets.
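Why this matters can be seen in a toy population of two individuals. As a stand-in for suitable contacts, assume the hypothetical per-person rule min(contacts, exposure hours / MSD); this rule, the function name, and the numbers are illustrative assumptions only. Both pairings below have identical marginal distributions of contacts and of exposure time, which is all that two independent surveys can reveal, yet they imply different average numbers of suitable contacts.

```python
def mean_suitable_contacts(people: list[tuple[float, float]],
                           msd_hours: float) -> float:
    """Average of min(contacts, exposure / MSD) over individuals.

    Hypothetical rule for illustration. `people` holds
    (daily_contacts, daily_exposure_hours) pairs, one per individual.
    """
    return sum(min(c, e / msd_hours) for c, e in people) / len(people)


contacts = [2, 10]
exposure_hours = [2.0, 10.0]

# Same marginals, two different individual-level pairings.
positively_correlated = list(zip(contacts, exposure_hours))            # (2, 2), (10, 10)
negatively_correlated = list(zip(contacts, reversed(exposure_hours)))  # (2, 10), (10, 2)

MSD = 1.0  # hours (hypothetical)
print(mean_suitable_contacts(positively_correlated, MSD))  # 6.0
print(mean_suitable_contacts(negatively_correlated, MSD))  # 2.0
```

The two marginal distributions cannot distinguish these populations, so any quantity that depends on the joint distribution of contacts and exposure at the individual level has to rest on an assumption about that dependence.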

Nevertheless, our approach provides an evaluation of the effects of the interplay between the number of contacts and the duration of exposure on the spread of infectious diseases. The results add to the theory of the social determinants of infection dynamics, and they also provide insights for public health interventions whose effect depends on how individuals mix and on the social context in which this mixing occurs (e.g. vaccination, screening, treatment of infectious individuals, or school closures during a pandemic). Our approach may be particularly valuable in settings where finely stratified contact data are costly or difficult to collect, such as low-income countries.

## Supplementary material

Supplementary Material is available at http://biostatistics.oxfordjournals.org.

## Funding

E.D.C., P.M., and A.M. were partly funded by the European Centre for Disease Prevention and Control (ECDC) under grant 2009/002 (project title: “Vaccine preventable diseases modeling in the European Union and EEFTA countries”) to the Dipartimento di Statistica e Matematica Applicata all’Economia, Università di Pisa, coordinator Piero Manfredi. The research leading to these results has also received funding from the European Research Council under the European Union's Seventh Framework Program (FP7/2007-2013)/ERC grant agreement no. 283955.

## Acknowledgements

The authors thank Luigi Marangi for his great help in programming the B19 model; ESEN2 for making the serological data on VZV and B19 available; Tommi Asikainen, Pierluigi Lopalco, Paloma Carrillo, Bruno Ciancio, Chantal Quinteen, Piero Benazzo, and Luca Fulgeri from ECDC for scientific advice and financial support; Nele Goeyvaerts for providing the poststratification weights and for helpful comments on B19; Rebecca Graziani and the Epidemics3 Conference participants for helpful discussions; and Amy Johnson for final editing of the manuscript. We warmly thank two anonymous reviewers and an associate editor of the Journal for their valuable comments, which allowed us to improve the manuscript significantly. Usual disclaimers apply. *Conflict of Interest*: None declared.

## References

$$R_0$$ in models for infectious diseases in heterogeneous populations
