Per-partnership transmission probabilities for Chlamydia trachomatis infection: evidence synthesis of population-based survey data

Abstract
Background: Chlamydia is the most commonly diagnosed sexually transmitted infection worldwide. Mathematical models used to plan and assess control measures rely on accurate estimates of chlamydia’s natural history, including the probability of transmission within a partnership. Several methods for estimating transmission probability have been proposed, but all have limitations.
Methods: We have developed a new model for estimating per-partnership chlamydia transmission probabilities from infected to uninfected individuals, using data from population-based surveys. We used data on sexual behaviour and prevalent chlamydia infection from the second UK National Study of Sexual Attitudes and Lifestyles (Natsal-2) and the US National Health and Nutrition Examination Surveys 2009–2014 (NHANES) for Bayesian inference of average transmission probabilities, across all new heterosexual partnerships reported. Posterior distributions were estimated by Markov chain Monte Carlo sampling using the Stan software.
Results: Posterior median male-to-female transmission probabilities per partnership were 32.1% [95% credible interval (CrI) 18.4–55.9%] (Natsal-2) and 34.9% (95%CrI 22.6–54.9%) (NHANES). Female-to-male transmission probabilities were 21.4% (95%CrI 5.1–67.0%) (Natsal-2) and 4.6% (95%CrI 1.0–13.1%) (NHANES). Posterior predictive checks indicated a well-specified model, although there was some discrepancy between reported and predicted numbers of partners, especially in women.
Conclusions: The model provides statistically rigorous estimates of per-partnership transmission probability, with associated uncertainty, which is crucial for modelling and understanding chlamydia epidemiology and control. Our estimates incorporate data from several sources, including population-based surveys, and use information contained in the correlation between number of partners and the probability of chlamydia infection. The evidence synthesis approach means that it is easy to include further data as they become available.


Methods
The aim of the study is to provide a mathematical and statistical model that can be used to infer per-partnership transmission probability from survey data.

a. Mathematical model
Let each individual j experience a force of infection $F_j$, which depends on his or her rate of forming infectious contacts (partnerships). Assume that all women recover from infection at the same rate, $\lambda_f$, and all men recover at the same rate, $\lambda_m$. We use a susceptible-infected-susceptible (SIS) model of infection and recovery (Figure 1). The probability that individual j is infected at a given moment is $\pi_j$, and the probability that he or she is susceptible is $1 - \pi_j$.
Assuming only heterosexual transmission, the force of infection is the rate at which an individual makes contacts with infected members of the opposite sex, multiplied by the per-contact transmission probability. We denote the sex of individual j by x, and the opposite sex by x'. The rate of contacting infected members of the opposite sex is $\phi_{xj}$, and the per-contact transmission probability from the opposite sex is $\rho_{x' \to x}$. Then:
$$F_j = \phi_{xj} \, \rho_{x' \to x}$$
At steady state, the rate of becoming infected balances the rate of recovery, $F_j (1 - \pi_j) = \lambda_x \pi_j$, so that:
$$\pi_j = \frac{F_j}{F_j + \lambda_x}$$
The following assumptions are implicit in this argument and are discussed in the main text:
1. Closed system: the number of people entering and leaving the system is negligible.
2. Steady state: prevalence is stable, and force of infection and recovery rate do not change.
3. Identical partnerships: all partnerships have the same risk of transmission, regardless of partnership length and frequency of sex acts.
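The steady-state argument can be checked numerically. The sketch below is illustrative only: it assumes the standard SIS result that the probability of infection settles at F / (F + λ), and the function names and parameter values are hypothetical, not taken from the paper or its online code.

```python
def steady_state_prevalence(force, recovery_rate):
    """Steady-state probability of being infected in an SIS model."""
    return force / (force + recovery_rate)


def simulate_sis(force, recovery_rate, years=50.0, dt=0.01):
    """Euler integration of dP/dt = F * (1 - P) - lambda * P, starting uninfected."""
    p = 0.0
    for _ in range(round(years / dt)):
        p += dt * (force * (1.0 - p) - recovery_rate * p)
    return p


# Hypothetical inputs: 0.05 contacts with infected partners per year,
# transmission probability 0.3, recovery rate 0.7 per year.
force = 0.05 * 0.3
analytic = steady_state_prevalence(force, 0.7)
numeric = simulate_sis(force, 0.7)
```

The two values agree once the simulation has run long enough for the initial condition to be forgotten, which is what the steady-state assumption requires of the real population.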
Our model considers asymptomatic infections; symptomatic infections prompt treatment-seeking and are therefore short-lived and unlikely to cause onward infection or to be detected in population-based surveys.

b. Data
We infer parameter values in the model by synthesizing data from several sources.

i. Clearance of untreated chlamydia infection
Data informing the clearance rate of untreated chlamydia infection in men and women came from studies in the literature synthesized in previous analyses. 1,2 In each study, people found to be infected with chlamydia were re-tested at a later date, having remained untreated in the interim. The number who cleared their infection provides information on the clearance rate. Nine studies in women and eight in men were included, involving a total of 569 women and 165 men. Further details are provided in the original papers describing this analysis. 1,2

ii. Partnership numbers
We used data on sexual behaviour and chlamydia infection from two population-based studies: the second National Study of Sexual Attitudes and Lifestyles (Natsal-2), 3 and the three National Health and Nutrition Examination Surveys (NHANES) conducted biennially between 2009 and 2014. 4 The ideal data to inform the sexual contact rate would be the number of new sexual partnerships formed in the last year.
In Natsal-2, participants reported their number of opposite-sex partners in the last year and were then asked a follow-up question. The answers were used to inform the distribution of the number of new partners in the last year in the Natsal-2 population.
We combined data from the three NHANES conducted between 2009 and 2014 to achieve a larger sample size than would be possible using just one study. 4 Participants were asked:
• In the past 12 months, with how many [women/men] have you had vaginal sex?
• In the past 12 months, did you have any kind of sex with a person that you never had sex with before?
We used these two questions to provide a proxy for the number of new partners in the last year according to the following algorithm:
• If a participant stated they had had no new partners in the last year, we took the number of new partners to be zero.
• If a participant stated they had had new partner(s) in the last year, and reported one partner in total, we took the number of new partners to be one.
• If a participant stated they had had new partner(s) in the last year, and reported more than one partner in total, we took the number of new partners to be one less than the total number of partners.
This approach is similar to the use elsewhere of "shifted negative binomial" distributions for modelling partner numbers. 5

iii. Infection status
The publicly available data from both Natsal-2 and NHANES also include chlamydia infection status, diagnosed using nucleic acid amplification tests (NAATs) on urine samples, which provides information on the prevalence of infection in individuals reporting different numbers of partners. Natsal-2 participants were eligible to provide a urine sample if they were aged 18-44 years and had ever had sex, and a randomly selected half of these eligible participants were invited to provide samples. All NHANES participants aged 14-39 years were invited to provide a sample for chlamydia testing, but the publicly available data exclude 14-17-year-olds.
The raw data on numbers of partnerships reported by susceptible and infected men and women in Natsal-2 and NHANES are provided in Supplementary Tables S1 and S2.
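The NHANES proxy rule for the number of new partners, described above, can be written as a small function. This is a minimal sketch: the function and argument names are hypothetical, and the case of a participant who reports a new partner but zero vaginal-sex partners is not covered by the stated rules, so it is rejected here rather than guessed.

```python
def new_partners_nhanes(total_partners, any_new_partner):
    """Proxy for new partners in the last year from the two NHANES questions.

    total_partners: reported number of vaginal-sex partners in the past 12 months.
    any_new_partner: True if the participant reported sex with someone new.
    """
    if not any_new_partner:
        return 0
    if total_partners == 1:
        return 1
    if total_partners > 1:
        return total_partners - 1
    # Assumption: combinations outside the three stated rules are flagged.
    raise ValueError("combination of answers not covered by the stated rules")


examples = [new_partners_nhanes(3, False),
            new_partners_nhanes(1, True),
            new_partners_nhanes(4, True)]
```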

c. Statistical model
We conducted a Bayesian evidence synthesis using data from the sources described to construct a likelihood. This was combined with appropriate priors to provide a posterior distribution for the model parameters.

i. Partnership dynamics
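We used negative binomial distributions to model the estimated numbers of new partners reported in the last year by men and women. A negative binomial distribution with size $k$ and mean $\mu$ can arise as a mixture of Poisson distributions, where the mixing distribution for the Poisson rate is a Gamma distribution with shape $k$ and rate $k/\mu$. Formally, let the number of new partners reported by individual j be represented by the random variable $N_j$, which has a Poisson distribution with rate $\sigma_j$:
$$N_j \sim \mathrm{Poisson}(\sigma_j), \qquad \sigma_j \sim \mathrm{Gamma}(k, k/\mu)$$
It can be shown 6 that, once individual j has reported $n_j$ new partners, the distribution of $\sigma_j$ is updated to:
$$\sigma_j \mid n_j \sim \mathrm{Gamma}(k + n_j, \; k/\mu + 1)$$
In our model, the shape and rate depend on the sex of the individual:
$$\sigma_j \sim \mathrm{Gamma}(k_x, k_x/\mu)$$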
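The Gamma-Poisson construction can be verified numerically. The sketch below compares the negative binomial probability mass function with a numerical integral of the Poisson-Gamma mixture, and applies the standard conjugate update of the Gamma distribution after observing a count. The symbol names (`size`, `mean`) and the illustrative values are assumptions, not estimates from the paper.

```python
import math


def neg_binomial_pmf(n, size, mean):
    """Negative binomial pmf parameterised by size and mean."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
               + size * math.log(p) + n * math.log(1.0 - p))
    return math.exp(log_pmf)


def gamma_pdf(x, shape, rate):
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))


def poisson_pmf(n, rate):
    return math.exp(n * math.log(rate) - rate - math.lgamma(n + 1))


def mixture_pmf(n, size, mean, upper=60.0, steps=50_000):
    """Midpoint-rule integral of Poisson(n | s) * Gamma(s | size, size / mean)."""
    rate = size / mean
    h = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += poisson_pmf(n, s) * gamma_pdf(s, size, rate)
    return total * h


def updated_gamma(n_reported, size, mean):
    """Shape and rate of the Gamma distribution after observing n_reported."""
    return size + n_reported, size / mean + 1.0


# Illustrative values only.
direct = [neg_binomial_pmf(n, 2.0, 1.5) for n in range(3)]
mixed = [mixture_pmf(n, 2.0, 1.5) for n in range(3)]
shape, rate = updated_gamma(3, 2.0, 1.5)
```

The updated mean, shape / rate, lies between the prior mean and the reported count, which is how a single report shifts the estimate of an individual's underlying partner change rate.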

ii. Prevalence
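As described above, the probability that individual j is infected with chlamydia is a function of the Poisson rate of forming partnerships with infected people ($\phi_{xj}$), the per-partnership transmission probability ($\rho_{x' \to x}$), and the clearance rate ($\lambda_x$):
$$\pi_j = \frac{\phi_{xj} \, \rho_{x' \to x}}{\phi_{xj} \, \rho_{x' \to x} + \lambda_x} \qquad (1)$$
The rate of individual j forming infectious contacts, $\phi_{xj}$, equals the rate of forming contacts, $\sigma_j$, multiplied by the proportion of contacts offered by the opposite sex that are infectious, $\bar{\pi}_{x'}$:
$$\phi_{xj} = \sigma_j \, \bar{\pi}_{x'}$$
$\bar{\pi}_{x'}$ is calculated by integrating (numerically) the product of prevalence and expected number of partnerships formed, over all possible partner change rates in sex x', and then dividing by the total expected number of partnerships formed, $\mu_{x'} = \mu$:
$$\bar{\pi}_{x'} = \frac{1}{\mu} \int_0^\infty \pi_{x'}(\sigma) \, \sigma \, g_{x'}(\sigma) \, d\sigma$$
where $\pi_{x'}(\sigma)$ is the prevalence among individuals of sex x' with partner change rate $\sigma$ and $g_{x'}$ is the Gamma density of partner change rates in sex x'. From equation (1), the probability that an individual j is infected, $\pi_j$, therefore fulfills the equality:
$$\pi_j = \frac{\sigma_j \, \bar{\pi}_{x'} \, \rho_{x' \to x}}{\sigma_j \, \bar{\pi}_{x'} \, \rho_{x' \to x} + \lambda_x}$$
For individual j, the exact value of $\sigma_j$ is not known, but the reported number of new partners, $n_j$, provides some information, allowing us to update our Gamma prior as described above. The expected prevalence in individuals reporting $n_j$ partners is calculated by integrating the product of prevalence and the updated Gamma probability density for individual j:
$$E[\pi_j \mid n_j] = \int_0^\infty \frac{\sigma \, \bar{\pi}_{x'} \, \rho_{x' \to x}}{\sigma \, \bar{\pi}_{x'} \, \rho_{x' \to x} + \lambda_x} \, \mathrm{Gamma}(\sigma \mid k_x + n_j, \; k_x/\mu + 1) \, d\sigma$$
where $k_x$ and $\mu$ are the size and mean of the negative binomial distribution of partner numbers in sex x.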
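The two integrals in this section can be approximated by simple Monte Carlo averaging. The sketch below is not the authors' Stan implementation: it swaps their numerical integration for random draws from the relevant Gamma distributions, and every name and parameter value is hypothetical.

```python
import random


def prevalence_given_rate(sigma, partner_prev, trans_prob, clearance):
    """Steady-state infection probability at partner change rate sigma."""
    force = sigma * partner_prev * trans_prob
    return force / (force + clearance)


def expected_prevalence(n_reported, size, mean, partner_prev, trans_prob,
                        clearance, draws=20000, seed=1):
    """Average prevalence over the updated Gamma(size + n, size / mean + 1)."""
    rng = random.Random(seed)
    shape = size + n_reported
    scale = 1.0 / (size / mean + 1.0)  # gammavariate takes a scale, not a rate
    total = 0.0
    for _ in range(draws):
        sigma = rng.gammavariate(shape, scale)
        total += prevalence_given_rate(sigma, partner_prev, trans_prob, clearance)
    return total / draws


def proportion_partners_infected(size, mean, partner_prev_other, trans_prob,
                                 clearance, draws=20000, seed=1):
    """E[prevalence(sigma) * sigma] / mean, with sigma ~ Gamma(size, size / mean)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        sigma = rng.gammavariate(size, mean / size)
        total += sigma * prevalence_given_rate(
            sigma, partner_prev_other, trans_prob, clearance)
    return total / (draws * mean)


# Illustrative values: size 0.5, mean 0.8 new partners per year, 3% of
# partnerships offered are infectious, transmission 0.3, clearance 0.7 per year.
by_n = [expected_prevalence(n, 0.5, 0.8, 0.03, 0.3, 0.7) for n in range(4)]
pbar = proportion_partners_infected(0.5, 0.8, 0.03, 0.3, 0.7)
```

Expected prevalence rises with the reported number of partners, which is the correlation the model uses to identify the transmission probability. In the full model the proportions for men and women depend on each other, so this one-sex calculation would need to be solved jointly for both sexes.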

iii. Infection clearance rate
We modelled immunological clearance of infection using the parameter $\lambda_x$. The statistical model is described elsewhere, 1 and allows for two courses of infection: fast- or slow-clearing. A proportion p of incident infections clear fast, and the remainder, 1 − p, clear slowly. In this analysis we assume that only the slow-clearing infections last long enough to be detected in population-based studies. The clearance rate $\lambda_x$ is therefore equal to the slow clearance rate in the clearance model, and the transmission probability we estimate is the probability that an infection is transmitted and then follows the slow-clearing course. The parameter values are inferred from published observational data in men and women. 1,2
In the absence of data on the rates of testing and treating for asymptomatic chlamydia infection at the time of Natsal-2 and NHANES, we were not able to account in our model for chlamydia clearance via treatment of asymptomatic infections. We investigated the results of this decision in our predictive checks (see below).

iv. Full likelihood
The full set of model parameters is given in Table S3, whose entries include:
• Proportion of all partnerships in which the man/woman is infected.
• Per-partnership transmission probability from an infected man/woman to a susceptible woman/man.
Survey weights $w_j$ are incorporated by multiplying the relevant component of the log-likelihood by the weight. The log-likelihood of the data is given by:
$$\ell = \ell_{turnover} + \ell_{clearance} + \ell_{prevalence}$$
where:
• $\ell_{turnover}$ is the log-likelihood associated with the partnership turnover data in men and women.
• $\ell_{clearance}$ is the log-likelihood associated with the clearance data, a binomial log-likelihood summed over data points i:
$$\ell_{clearance} = \sum_i \log \mathrm{Binomial}(r_i \mid n_{test,i}, \theta_i)$$
where ntest is the number of people tested for each data point, r is the number who had cleared their infection and θ is the proportion expected to clear the infection (full details provided elsewhere 1 ).
• $\ell_{prevalence}$ is the log-likelihood associated with the prevalence data in men and women reporting different numbers of partners.
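As an illustration of how survey weights can enter a log-likelihood, the sketch below multiplies each participant's contribution by their weight. It assumes a Bernoulli form for infection status, which is a plausible reading but is not spelled out in the text here, and a binomial form for the clearance data. All names are hypothetical.

```python
import math


def weighted_bernoulli_loglik(infected, probs, weights):
    """Sum of w_j * log P(y_j), assuming Bernoulli infection status."""
    total = 0.0
    for y, p, w in zip(infected, probs, weights):
        total += w * (math.log(p) if y else math.log(1.0 - p))
    return total


def binomial_loglik(cleared, n_test, theta):
    """Log-probability that `cleared` of `n_test` people clear infection."""
    log_choose = (math.lgamma(n_test + 1) - math.lgamma(cleared + 1)
                  - math.lgamma(n_test - cleared + 1))
    return (log_choose + cleared * math.log(theta)
            + (n_test - cleared) * math.log(1.0 - theta))


# Toy data: two participants with weights 2 and 1, and one clearance study.
prevalence_part = weighted_bernoulli_loglik([True, False], [0.5, 0.5], [2.0, 1.0])
clearance_part = binomial_loglik(1, 2, 0.5)
total_loglik = prevalence_part + clearance_part
```

A weight of 2 has the same effect on the log-likelihood as observing that participant twice, which is the sense in which weighting re-balances the sample towards the target population.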

i. Priors
Prior distributions for the parameters were as follows:

ii. Bayesian methods and sampling of posterior distribution
Estimation was carried out by sampling from the posterior using a Markov chain Monte Carlo (MCMC) algorithm implemented in the Stan software, 8 within the R environment. 9 The data, Stan model file and R scripts used for handling input and results are all available online at https://github.com/mrcide/ct_transmission_prob. MCMC estimation is carried out by drawing thousands of samples from the joint posterior distribution. We ran four chains for 2000 iterations each, discarding the first 1000 "warmup" iterations of each chain. The results reported below are summary means, medians and credible intervals of the marginal distributions from this sampled joint posterior.
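The summary step can be sketched as follows: discard the warmup iterations of each chain, pool what remains and compute the mean, median and central 95% interval. The draws below are simulated placeholders rather than output from the Stan model, and the helper names are hypothetical.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def summarise(chains, warmup):
    """Pool post-warmup draws from all chains and summarise one parameter."""
    pooled = sorted(draw for chain in chains for draw in chain[warmup:])
    return {
        "draws": len(pooled),
        "mean": statistics.fmean(pooled),
        "median": quantile(pooled, 0.5),
        "lower": quantile(pooled, 0.025),
        "upper": quantile(pooled, 0.975),
    }


# Placeholder draws standing in for four chains of 2000 iterations each.
rng = random.Random(0)
fake_chains = [[rng.betavariate(8, 16) for _ in range(2000)] for _ in range(4)]
summary = summarise(fake_chains, warmup=1000)
```

With four chains of 2000 iterations and 1000 warmup iterations each, 4000 draws contribute to every reported summary.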

iii. Posterior predictive checks
We carried out graphical posterior predictive checks 6 to check the fit of the model. We simulated values for the data (number of partners and infection status for each individual), using each sample from the joint posterior distribution. The simulated data were compared to observed data to look for any systematic differences.
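A posterior predictive simulation for a single parameter draw might look like the sketch below. It is a simplification: it simulates partner numbers and infection status jointly from the Gamma-Poisson and steady-state prevalence assumptions, using hypothetical parameter values, and is not the authors' code.

```python
import math
import random


def poisson_draw(rng, rate):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_survey(n_people, size, mean, partner_prev, trans_prob,
                    clearance, seed=0):
    """Simulate (number of new partners, infected) for each participant."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_people):
        sigma = rng.gammavariate(size, mean / size)  # scale = 1 / rate
        n_partners = poisson_draw(rng, sigma)
        force = sigma * partner_prev * trans_prob
        infected = rng.random() < force / (force + clearance)
        records.append((n_partners, infected))
    return records


def prevalence_by_partners(records):
    """Simulated prevalence among participants reporting each partner count."""
    counts = {}
    for n_partners, infected in records:
        total, positive = counts.get(n_partners, (0, 0))
        counts[n_partners] = (total + 1, positive + int(infected))
    return {n: pos / tot for n, (tot, pos) in sorted(counts.items())}


records = simulate_survey(5000, 0.5, 0.8, 0.03, 0.3, 0.7)
simulated_prevalence = prevalence_by_partners(records)
mean_partners = sum(n for n, _ in records) / len(records)
```

Repeating this for every sampled parameter set gives a cloud of simulated datasets against which the observed partner numbers and prevalence can be compared.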
We expect that a proportion $\psi_x$ of incident chlamydia infections in sex x will cause symptoms that prompt testing and treatment, while the remaining $1 - \psi_x$ are asymptomatic. As noted above, our model considers asymptomatic infections, so the modelled force of infection represents the force of asymptomatic infection. The force of symptomatic infection is therefore the modelled force of infection multiplied by $\psi_x / (1 - \psi_x)$.

Table S4: Summary of posterior distributions for model parameters, inferred using data from the second National Study of Sexual Attitudes and Lifestyles (Natsal-2) and National Health and Nutrition Examination Surveys (NHANES). The first six parameters were sampled directly; the last three were calculated from the first six, as described in the text.

Figure S2 illustrates the model's agreement with partnership number data, showing the actual and simulated proportions of men and women who reported each number of partners. Transparent grey circle markers represent simulations from the posterior distributions; lines show the 50th (solid) and 2.5th/97.5th (dashed) centiles of the simulations, and red crosses show the data. For a perfect model and completely accurate reporting of the data, we would expect the dashed lines to enclose 95% of data points.
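The 95% coverage criterion can be computed directly from the simulations. In this sketch, `observed` holds one data point per partner-number category and `simulated` holds the matching list of simulated values; both names and the toy inputs are hypothetical.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def interval_coverage(observed, simulated, level=0.95):
    """Fraction of observed points inside the central interval of simulations."""
    tail = (1.0 - level) / 2.0
    inside = 0
    for value, sims in zip(observed, simulated):
        ordered = sorted(sims)
        if quantile(ordered, tail) <= value <= quantile(ordered, 1.0 - tail):
            inside += 1
    return inside / len(observed)


# Toy example: one observation inside and one outside the simulated range.
toy_simulations = [list(range(101)), list(range(101))]
coverage = interval_coverage([50, 200], toy_simulations)
```

A coverage well below 0.95 points to misfit or misreporting in the categories where the observations fall outside the band.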

b. Posterior predictive checks

i. Partner number distributions
In both studies, the partnership numbers simulated in men generally agreed well with the data. The fit was poorer in women, who reported high partner numbers less often than the simulations predicted. If the average number of partnerships formed by men and women was allowed to differ, the agreement between simulations and data improved and the posterior distributions for transmission probability remained similar. In our model we chose to constrain the average number in men and women to be equal because this is a necessary condition in reality.

The main graph in each panel uses a linear scale on the y-axis, and the inset shows the same information but on a log scale. Simulations are shown using transparent grey markers, so that several superimposed markers appear as a darker grey. The solid and dashed lines show the 2.5th, 50th and 97.5th centiles of the simulations. The observed data shown take into account the survey weights.

ii. Infection status
We checked the predictive properties of the infection model by using each sampled parameter set to simulate infection status in each survey participant, given their reported number of partners. In Figures S3 (Natsal-2) and S4 (NHANES), each transparent grey marker shows simulated prevalence among the participants reporting a given number of partners, which agreed well with the observed data. Only a small number of participants reported the highest numbers of partners (see bar graphs in lower panels), so only a few levels of prevalence were possible in those with several partners. For example, one man in Natsal-2 reported 19 partners, so simulated prevalence could only be 0 (one man, uninfected) or 1 (one man, infected).

Figure S3: Simulated (grey) and observed (red) chlamydia prevalence (y-axis) in men and women reporting different numbers of new partners in the last year (x-axis) in the second National Study of Sexual Attitudes and Lifestyles (Natsal-2). Simulations are shown using transparent grey markers, so that several superimposed markers appear as a darker grey. The solid and dashed lines join the 2.5th, 50th and 97.5th centiles of the simulations. The observed data take into account the survey weights. Bar charts below each plot show the (unweighted) number of survey participants reporting each number of partnerships.

Figure S4: Simulated (grey) and observed (red) chlamydia prevalence (y-axis) in men and women reporting different numbers of new partners in the last year (x-axis) in the National Health and Nutrition Examination Surveys (NHANES). Simulations are shown using transparent grey markers, so that several superimposed markers appear as a darker grey. The solid and dashed lines join the 2.5th, 50th and 97.5th centiles of the simulations. The observed data take into account the survey weights. Bar charts below each plot show the (unweighted) number of survey participants reporting each number of partnerships.
Table S5 shows the median and central 95% range of simulated numbers of symptomatic chlamydia cases, based on our posterior distributions and the male and female populations of England aged 15-44 in 2000 (Natsal-2), or the US aged 15-39 in 2009 (NHANES). For comparison, we also report the number of diagnoses recorded in surveillance systems covering approximately the same times and locations. In men in both studies and women in Natsal-2, the range of our simulations overlapped with the range from surveillance, suggesting that most of the observed diagnoses can be accounted for by treatment-seeking in response to symptoms, and that few additional diagnoses were made as a result of asymptomatic testing. In women in NHANES, more diagnoses were observed than we expected to be sought by symptomatic cases alone, so it seems likely that there was additional testing of asymptomatic women, which would merit further empirical investigation.

Table S5: Numbers of symptomatic chlamydia cases simulated using posterior parameter distributions inferred using Natsal-2 and NHANES data, and diagnoses recorded in surveillance systems covering approximately the same times and locations. For comparison to Natsal-2 we used diagnosis rate ranges in 15-44-year-olds in 2000, 10

i. Balancing partnership numbers
We tested the effect of constraining the mean numbers to be equal by repeating the analysis with the constraint of equal mean partnership number in men and women relaxed (see online code). Figure S5 illustrates this model's agreement with partnership number data. In both studies the agreement between simulations and observations is improved compared to the constrained model, especially in women, but more than 5% of observations still fell outside the 95% prediction interval. Using Natsal-2, the posterior median (95%CrI) for the mean number of new partners per year in men was 0.75 (0.67-0.83) and in women was 0.40 (0.35-0.45). Inferred transmission probabilities were 32.4% (18.4-55.5%) (male-to-female) and 26.2% (5.8-84.8%) (female-to-male). Using NHANES, the inferred mean number of partners in men was 1.10 (1.08-1.33) and in women was 0.58 (0.52-0.66). Transmission probabilities were 31.3% (20.4-48.7%) (male-to-female) and 6.3% (1.4-18.0%) (female-to-male). Therefore, constraining the mean number of partnerships to be equal did not materially change the posterior distributions for transmission probabilities.

In this model, the mean number of partnerships was not constrained to be equal between the sexes. Simulations are shown using transparent grey markers, so that several superimposed markers appear as a darker grey. The solid and dashed lines show the 2.5th, 50th and 97.5th centiles of the simulations. The observed data shown take into account the survey weights.

ii. Condom use
In Natsal-2, participants were asked, "With how many different women/men have you had vaginal (or anal) intercourse in the past year without using a condom?" To investigate the potential effects of condom use on our estimates, we used this question to estimate the number of new partners without a condom:
• If participants reported 0 partners without a condom then we classified them as having 0 new partners without a condom.
• If participants reported the same number of partners in the last year as partners without a condom (i.e. if all partners in the last year were without a condom) then we classified the number of new partners without a condom as the same as the total number of new partners.
• If neither of these conditions applied then we classified the number of new partners without a condom as the reported number of partners without a condom.
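The three classification rules can be expressed as a short function. This sketch follows the rules exactly as stated, including the third rule, which uses the reported number of partners without a condom even when that exceeds the number of new partners. The function and argument names are hypothetical.

```python
def new_partners_without_condom(total_partners, new_partners,
                                partners_without_condom):
    """Classify the number of new partners without a condom (Natsal-2 rules)."""
    if partners_without_condom == 0:
        return 0
    if partners_without_condom == total_partners:
        # All partners in the last year were without a condom.
        return new_partners
    # Neither condition applies: use the reported number without a condom.
    return partners_without_condom


classified = [new_partners_without_condom(3, 2, 0),
              new_partners_without_condom(3, 2, 3),
              new_partners_without_condom(5, 2, 1)]
```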
We used the same model as in the main analysis to estimate the transmission probabilities in partnerships where condoms were not always used. Figure S6 shows the posterior distributions compared to the posteriors in the main analysis.
As expected, the posterior distributions were shifted slightly to the right, suggesting higher transmission probabilities in partnerships without a condom, but the shift was small compared to the uncertainty in the estimates. The posterior median (95% credible interval) transmission probabilities were 40.1% (21.5-72.8%) from men to women and 31.6% (7.2-96.1%) from women to men. We conclude that it might be valuable for sexual behaviour surveys to collect information on the annual number of new partnerships without a condom for parameter inference and predictive modelling. In the absence of such data, however, it is more reliable to calculate an average probability across all new partnerships, and we have no reason to suppose that such an average is not valid.

iii. Assortative mixing
The model reported in the main text assumes random mixing between men and women: that is, for individual j, the probability that a partnership they form with a member of the opposite sex is a potential source of infection does not depend on j's partnership formation rate. In fact, evidence indicates that sexual mixing is assortative, 12,13 although this is difficult to quantify precisely.
To investigate the potential effects of assortative mixing in our model, we reasoned that if individuals with more partners tend to form partnerships with others who also have more partners (and whose partners are therefore more likely to be infected with chlamydia), then $\bar{\pi}_{x'}$ would be higher in people with more partners. If the transmission probability were the same for every partnership then we would therefore expect the product $\kappa_{x' \to x} = \bar{\pi}_{x'} \rho_{x' \to x}$ to be higher in people with more partners.
We ran an adapted model which allows $\kappa$ to be different for men and women reporting different numbers of partners. If people with more partners are more likely to form partnerships with infected people then we would expect $\kappa$ to be higher in those individuals. Figure S7 shows the posterior distributions for $\kappa$ that we inferred in men and women reporting different numbers of partners. For Natsal-2, although the posterior distributions for $\kappa$ were slightly higher in people reporting no new partners, there was considerable overlap and therefore no evidence of significantly higher prevalence in partnerships presented to individuals with a high partnership formation rate than to those with a low formation rate. In NHANES the posterior distributions suggested higher values for $\kappa$ in both men and women reporting no new partners: the opposite of what we would expect if there is assortative mixing. This pattern may arise if there is a higher transmission probability in slow-turnover partnerships, because they tend to last longer and have more sex acts during the infectious period, possibly with lower levels of condom use.
We found no evidence in either Natsal-2 or NHANES of higher $\kappa$ in people reporting more partners, providing confidence that the random mixing assumption in the model has not affected our results.