The effect of vaccination programs on transmission of infectious disease is usually assessed by monitoring programs that rely on notifications of symptomatic illness. For monitoring of infectious diseases with a high proportion of asymptomatic cases or a low reporting rate, molecular sequence data combined with modern coalescent-based techniques offer a complementary tool to assess transmission. Here, the authors investigate the added value of using viral sequence data to monitor a vaccination program that was started in 1998 and was targeted against hepatitis B virus in men who have sex with men in Amsterdam, the Netherlands. The incidence in this target group, as estimated from the notifications of acute infections with hepatitis B virus, was low; therefore, there was insufficient power to show a significant change in incidence. In contrast, the genetic diversity, as estimated from the viral sequence collected from the target group, revealed a marked decrease after vaccination was introduced. Taken together, the findings suggest that introduction of vaccination coincided with a change in the target group toward behavior with a higher risk of infection. The authors argue that molecular sequence data provide a powerful additional monitoring instrument, next to conventional case registration, for assessing the impact of vaccination.
Vaccination programs are commonly monitored by using clinical case registry data. Molecular sequences may offer an additional data source for assessing the impact of vaccination. Recent advances in the analysis of molecular data enable estimation of changes in the circulation of infectious disease based on DNA or RNA sequences (1–3). Bayesian coalescent-based methods have been used to estimate changes in the circulation of hepatitis C virus (4), hepatitis A virus (5), West Nile virus (6), and human immunodeficiency virus (HIV)-1 (7). The methods assume that only a small fraction of cases per unit of time are sampled and sequenced to infer the infection dynamics in the larger population. That makes them well suited for monitoring infection dynamics when conventional case notifications offer an incomplete picture because part of the cases of infection are asymptomatic, subclinical, or simply not reported. To our knowledge, however, no studies so far have used molecular sequence data to assess the impact of a vaccination program.
A direct motivation for this study was provided by assessing the impact of a vaccination program against hepatitis B virus (HBV) targeted at men who have sex with men (MSM) in Amsterdam, the Netherlands. Sexual transmission is the most frequent route of infection in the Netherlands (8–10). An acute HBV infection typically lasts 3–4 months but can lead to chronic carriage of the virus. Chronic infection is uncommon in sexually acquired HBV because the probability of developing chronic carriage declines with age, from 90% in infants to 5% in adults (11, 12). Symptoms of acute HBV infection may include jaundice and fatigue, but often infection is asymptomatic or subclinical. The fraction of asymptomatic infections in adults is estimated to be about two-thirds. Because of this high proportion of asymptomatic and subclinical cases, a significant fraction of all cases are missed by routine passive surveillance by reporting of symptomatic cases.
Infection with HBV can be prevented by vaccination. Efficacy of vaccination against infection in MSM is 80%–87% (13). Because of the low incidence in the general population in the Netherlands, the Dutch government opted for a vaccination program targeted at risk groups. A pilot vaccination program was started in Amsterdam in 1998, aimed at MSM, commercial sex workers, heterosexuals with a high rate of partner change, and drug users. Although the targeted vaccination program is under close scrutiny, the high proportion of asymptomatic and subclinical cases makes it difficult to assess the impact of vaccination and draw firm conclusions helpful for policy making.
Coalescent-based analysis of molecular sequences provides a way to overcome the difficulties in assessing the impact of targeted vaccination programs. Because the coalescent-based method relies on the genealogical associations between the sampled sequences to infer underlying changes in the population dynamics of the pathogens, its estimates are unaffected by a high proportion of asymptomatic or subclinical cases or a low notification rate. This advantage makes it a promising complementary tool for inferring the impact of vaccination on transmission. In this study, we assessed the additional value of coalescent-based analysis of molecular sequence data of HBV, next to notification of acute hepatitis B infections, in monitoring the impact of a targeted vaccination program.
MATERIALS AND METHODS
Study population and vaccination program
We focused on hepatitis B infection among men having sex with men (MSM) in Amsterdam over the period 1992–2006. Of all reported acute HBV cases in Amsterdam, half occurred among MSM (10). The total size of the MSM risk group in Amsterdam is estimated to be approximately 26,000 persons (14). In this risk group, genotype A accounts for 87% of all infections, and almost 84% of genotype A infections are among MSM (10). Phylogenetic trees of HBV sequences from all genotypes from the study population and other risk groups in the area (until 2003) can be found in van Houdt et al. (10).
The targeted vaccination program was initiated in 1998. The vaccination effort for MSM consisted of an outreach program for recruiting vaccinees at social venues and offering vaccination for free. Over the period 1998–2006, 8,472 persons received the first vaccination. Of those receiving the first vaccination, 1,597 were found to be already anti–hepatitis B core positive, indicating prior exposure, and were not further vaccinated. We note for completeness that the persons vaccinated were not randomly sampled from the risk group, so translating this number into seroprevalence is difficult. A completed vaccination regimen of 3 administrations was achieved by 4,479 persons.
Data collection: notifications of acute infections with hepatitis B
Observed cases of hepatitis B in Amsterdam are reported to the Municipal Health Service. The patients are approached by public health nurses, who collect general information such as age and gender and try to identify the most likely route of transmission, for example, homosexual or heterosexual contact, injecting drug use, and parent to child. For some cases, a transmission route could not be established. From 1992 to 2006 in Amsterdam, there were 186 reported cases of acute hepatitis B infections with a reported homosexual transmission route. In the same period, there were 118 reported cases attributed to heterosexual transmission, 56 from an unknown source, and 53 from various nonsexual sources.
Analysis of notified acute infections with hepatitis B
We were interested in observing a change in incidence of illness. We used change-point analysis to detect whether there is any empirical support for a change and, if so, when this change occurred. For such an analysis, we invoked a step function to describe incidence of illness. This function requires 3 variables: the incidence before the change, the moment of change, and the incidence after the change. We used all registered clinical cases of illness attributed to homosexual transmission based on interviews and assumed a constant notification probability (refer to the Appendix for technical details). We used a Bayesian analysis to assess the posterior probability of many possible stepwise functions in incidence, given the data. The average over all these stepwise functions, each weighted by their respective posterior probability, yielded an estimated gradual change in incidence of illness as well as a probability distribution for the moment of change in incidence of illness. Note that a Bayesian change-point analysis does not assume that a step function is the most appropriate model for change, but it assumes that the step function is an appropriate building block for an appropriate model of change. Use of change-point analysis to detect the signature of different shapes of changes is of special value when the exact shape of change is not known beforehand. We used this method to examine indications of a change in incidence over the period 1992–2003 that was also analyzed by van Houdt et al. (10) (also refer to Figure 1 for the data).
Data collection: DNA sequences of HBV
For our analysis, we considered all 54 available sequences of HBV genotype A. Of these sequences, we excluded all 3 collected from female patients and all 8 not self-reported to originate from homosexual transmission. Additional analysis showed that exclusion of these 11 sequences did affect the precision but not the values of resulting estimates. Available sequences were obtained and aligned as described earlier (10); refer to the Appendix for accession numbers and collection dates. The 43 sequences are 654 nucleotide DNA fragments of the S gene, including part of the pre-S2 region. Collection dates for these sequences were spread evenly over the study period from 1992 to 2006, leading to inclusion of approximately 3 sequences per year. Interviews did not give any indication of social connections between the cases involved, nor was contact tracing practiced during the study period.
Analysis of molecular DNA sequences
Coalescent-based methods relate the coalescence (i.e., inverse branching) rate θ of a phylogenetic tree to the so-called effective number of infections Ne according to θ = cσ2/(τNe). In this equation, the parameter c corrects for the number of branches coexisting at a particular time in the phylogenetic tree, and σ2 is the variance in the number of new infections that infected cases create (15, 16). The generation time τ refers to the typical time between subsequent infections. The generation time τ and variance σ2 are not well known for many pathogens—for sexually transmitted HBV in MSM, we assumed that the generation time is 2–4 months—and, for this reason, Neτ/σ2 is commonly expressed as a compound variable. Following standard terminology, we refer to this compound variable Neτ/σ2 as genetic diversity (refer, for example, to Rambaut et al. (17) and Carrington et al. (18)).
We used a Bayesian skyline plot (19) to detect trends in the transmission of HBV. The procedure to construct these plots is implemented in the BEAST package (2). Skyline plots present a flexible model to visualize the trends in population change suggested by the data, making few prior assumptions about such changes.
We were interested in observing a change in the compound variable genetic diversity. We used change-point analysis to detect whether there was any empirical support for a change and, if so, when this change occurred. A step function was used to describe genetic diversity. Doing so requires 3 variables: genetic diversity before the change, the moment of change, and genetic diversity after the change. We compared this change-point model with a model based on a constant number of infections over the entire time period (representing the null hypothesis of no change in HBV circulation). To describe the goodness of fit of both models, we used the marginal likelihood, adopting the smoothed method of Newton and Raftery (20) as implemented in the BEAST package (refer to the Appendix for technical details). To quantify the strength of statistical evidence in the molecular sequence data for a change in prevalence, we used Bayes factors (21); interpretation of these Bayes factors is similar to that for the more familiar likelihood ratio.
The clinical case registries for the MSM in the Amsterdam area, going back to 1992, provided no support for a change in number of reported acute HBV infections (Figure 1). Accordingly, the posterior density from the change-point analysis was spread out over the period studied, indicating no clear-cut moment when a change in incidence of illness might have occurred (Figure 2). Such a change point would become apparent in Figure 2 as a concentration of the posterior probability density in a limited period in time.
The maximum posterior density phylogenetic tree of the Bayesian skyline analysis is presented in Figure 3. The sequence data reveal a decline in genetic diversity (Figure 4), for which there is strong statistical support (Table 1). 10Log marginal likelihoods were used to indicate the goodness of fit (the higher the better) of each model. The 10log marginal likelihoods were approximated by using the smoothed Newton and Raftery (20) method. Bayes factors indicate the strength of evidence for one model over another, in a way similar to the likelihood ratio. The common guideline for interpreting 10log Bayes factors (and 10log-likelihood ratios) is that values of 0–1/2 provide no evidence, values of 1/2–1 provide weak evidence, and values of 1 or more provide strong-to-decisive evidence in favor of the hypothesis with the maximum marginal likelihood (21). Here, the log marginal likelihood indicated that the model with a change in HBV genetic diversity gave the best fit to the data; the log Bayes factor indicated that the data provided strong evidence for a model with a change in HBV genetic diversity over a model with a constant genetic diversity.
|Model 1||Model 2|
|HBV genetic diversity||Constant||Stepwise change|
|Mean genetic diversity (Neτ/σ2)a||308||602|
|95% Credible interval||68, 996||126, 1,417|
|10Log marginal likelihood (L)||−1,332.1||−1,329.1|
|10Log Bayes’ factor (Lmax−L)||3.0||0|
|Model 1||Model 2|
|HBV genetic diversity||Constant||Stepwise change|
|Mean genetic diversity (Neτ/σ2)a||308||602|
|95% Credible interval||68, 996||126, 1,417|
|10Log marginal likelihood (L)||−1,332.1||−1,329.1|
|10Log Bayes’ factor (Lmax−L)||3.0||0|
Abbreviation: HBV, hepatitis B virus.
Ne, the effective number of infections; τ, the generation interval; σ2, the variance in the number of new infections that infected cases cause.
The population histories sampled from the posterior showed a mean reduction in the genetic diversity to 17% (95% confidence interval: 3, 65) of its former magnitude. The interval estimate of the relative change was below 100%, which therefore provides strong statistical support for the change being a decrease. The stepwise model places a high posterior probability on a change occurring just after the vaccination program was implemented in 1998, with a maximum posterior probability near the year 2000 (Figure 5).
Coalescent-based approaches can underestimate genetic diversity when the number of lineages in the phylogenetic tree is large compared with the number of infected persons in the population, which can be cause for concern when, in a phylogeny, the number of lineages increases toward the tips of the tree. Because of the distribution of our samples over time, however, change in the number of lineages through time throughout the study period was relatively low. Visual inspection of lineage through time plots from the posterior sample of trees indicated that the number of lineages ranged from 3 to 15 between 1992 and 2003, with no indication of an increasing or decreasing trend (data not shown). Fu (22) showed that the Kingman coalescent can be a reasonable approximation to exact coalescent results even for small and oversampled populations.
In this study, we explored the added value of using molecular sequence data, next to clinical case registries, to assess the impact of vaccination. As an example, we used a targeted vaccination program against HBV infections among MSM in Amsterdam because infections with HBV are often asymptomatic and will not be reported to the Municipal Health Service. We observed a significant decline in genetic diversity of the HBV genotype A viral sequences collected among the target population, and this decline occurred within a few years after the vaccination program was implemented. In contrast, we could not detect a significant decline in the number of notified acute HBV infections in the target population.
Apparently, we have 2 opposing observations, and it may be tempting to argue that one is flawed. Both data sources, however, may have limitations. On the one hand, we included only 43 sequences (654 nucleotides), so our number of observations may not be large enough and our molecular sequence data may contain insufficient information on the actual incidence of hepatitis B among MSM in Amsterdam. We collected hepatitis B viral sequences of the S gene, and mutations in the S gene are constrained by the overlapping reading frame of the P gene. Inclusion of sequences from the more variable C gene could potentially further enhance the information that can be obtained from such sample sets (23). However, despite the small number of sequences and the constrained mutation rate, we did find a statistically significant signal in the data. Support of the data for a change in diversity is considered strong evidence by all possible classifications we consulted.
There is a small probability that we have a type I error and reject the hypothesis that transmission remained constant while this hypothesis was true. With smaller sample sizes, this probability becomes smaller, in our case meaning that small sample size cannot be the reason for detecting a change. This rules out the possibility that transmission remained constant and we detected an erratic signal due to a small number of sequences; the limited number of sequences do carry a definite signal that transmission has changed.
On the other hand, the case notification data contain insufficient information about incidence because of incomplete case ascertainment of all acute infections among MSM. We had only 186 reported acute infections of hepatitis B over a period of 15 years. Some infections through homosexual contacts are suspected to be reported as unknown. Case reports of acute infections through heterosexual contacts and acute infections of unknown route declined after 1998 (10). Moreover, it cannot be excluded that increased awareness because of the vaccination program in both the risk group and health care workers influenced reporting practice. These factors suggest that it is difficult to infer the absolute value of incidence from case notifications.
There is a small probability that we have a type II error and accept the hypothesis that transmission remained constant while there was a change. For larger changes, this possibility diminishes. It is therefore extremely unlikely that a large change would be missed; the limited number of case reports does signify that incidence changed very little. Case reports for injecting drug user and heterosexual risk groups (both with smaller numbers of reported cases than MSM in the prevaccination period) declined significantly in the 6 years after vaccination (10). For injecting drug users, however, this decline is thought to be caused by a reduction in injecting drug use, and, for heterosexuals, a causal relation with the vaccination is not proven. The decline in hepatitis B case reports for these groups indicates that case registration surveillance is potentially able to detect significant changes in incidence in a short time frame.
Most studies that relate molecular sequence data to infection dynamics assume proportionality between genetic diversity and incidence of infection. However, our results demonstrate that this relation might be more complicated than suggested by the standard assumption of proportionality. We can offer a number of non-mutually-exclusive explanations for the apparent discrepancy between genetic diversity and incidence.
A reduction in genetic diversity can be due to a decline in the number of infections but also to a shorter generation time or an increase in the average and variance of the number of secondary infections produced by one infective individual (14). The latter could occur if the vaccination program coincided with an increase in risk behavior. If risk behavior increased among MSM in Amsterdam, we would also expect an increase in the incidence of other sexually transmitted infections in this group. This possibility was indeed observed (24–26). Improved HIV treatment is thought to have led to increased risk behavior around 1995–2000 in the Amsterdam region and internationally (24–28). Therefore, it is very likely that there was a change toward increased risk behavior among the target population during the time vaccination was introduced. An increase in superspreaders (both HIV-positive and HIV-negative MSM) can also help to explain reduced diversity. Furthermore, increased risk behavior by chronic carriers of HBV, including HIV-positive persons who can carry high viral loads, could have introduced conserved HBV strains into the MSM population, leading to a further decline in genetic diversity observed in sampled sequences.
Given these coincidental changes, we would expect that incidence remains about constant because increased risk behavior would result in increased incidence and because increased vaccination uptake would result in lower incidence; we would also expect genetic diversity to decrease sharply because both increased vaccination uptake and increased risk behavior (through increased variance in the number of secondary cases) would result in lower genetic diversity. In a modeling study, Xiridou et al. (29) showed that the negative effect of increased risk behavior among MSM can counterbalance the positive effect of hepatitis B vaccination at current levels of vaccine uptake in Amsterdam.
These behavioral changes can explain why the decline in genetic diversity is larger and more sudden than the expected decline in genetic diversity that would result from a decline in number of infections. Finally, imported strains of infection can potentially confound our data set and increase the estimated genetic diversity in the period before they are introduced. A standard coalescent-based analysis, without accounting for case notifications, might risk interpreting the resulting loss of genetic diversity in samples as an additional decline in the effective number of infections, Ne. A standard analysis of case notifications, without accounting for sequence data, might miss the changes in transmission altogether.
Summarizing, after a targeted hepatitis B vaccination program was introduced, we did not observe a distinct change in notified acute infections, whereas we did observe a significant sharp decline in genetic diversity in the HBV genotype A that is characteristic for the target population. The molecular data revealed that HBV transmission started changing immediately after the vaccination program was implemented. A plausible explanation for these observations is that introduction of the program coincided with increased risk behavior, leading to reduced diversity in the sampled sequences. Studies based on information about only case notifications would have missed the change in HBV genetic diversity altogether, whereas studies based on information about only molecular sequence data would have overestimated the size of the impact of the targeted vaccination program. We conclude that use of molecular sequence data is a valuable addition to routine case notifications for both assessing and interpreting the impact of vaccination on transmission of the pathogen.
hepatitis B virus
human immunodeficiency virus
men who have sex with men
Author affiliations: Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, the Netherlands (W. Marijn van Ballegooijen, Hein J. Boot, Roel A. Coutinho, Jacco Wallinga); Department of Infectious Diseases, Public Health Service, GGD, Amsterdam, the Netherlands (Robin van Houdt, Sylvia M. Bruisten); Department of Human Retrovirology, Amsterdam Medical Center, University of Amsterdam, Amsterdam, the Netherlands (Roel A. Coutinho); and Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands (Jacco Wallinga).
Financial support was received from ZonMw, grant 50-50800-98-024.
The authors thank the Department of Infectious Diseases of the Public Health Service Amsterdam for generously providing anonymous registration data of reported patients with an acute HBV infection.
Conflict of interest: none declared.
Change-point analysis of registered acute HBV infections
We assumed that the registered clinical cases Ct in year t represent a binomial sample of the true incidence in that year It, based on a constant notification probability P. Similar to the model used in the molecular analysis, we assumed that incidence is constant over time, with the exception of a single stepwise change at time T. We used flat priors for the incidence before and after this stepwise change over the range of 1 to 500 persons per year. We assumed that the prior probability of a change in number of infections at time T is constant. We used a constant prior probability for number of infections after change, Ia, and number of infections before change, Ib. With this model, we examined the posterior probability of the change occurring in a particular year, given the clinical case registries. Using Bayes’ theorem, the marginal posterior for the change occurring in year T is
Change-point analysis of HBV DNA sequences
Our data set of 43 sequences contained 21 different strains, and the dominant strain contained 21 sequences. The Bayesian framework is appropriate to such data sets because it can deal with identical sequences without reverting to branches of length zero. We used the HKY + I model (30) for molecular evolution in all analyses presented. To prevent overfitting, we constructed the skyline plots from a model with 3 constant levels of population size and 2 points where this level changed. BEAST uses Markov chain Monte Carlo sampling to estimate the posterior distribution for parameters of interest. The Markov chain Monte Carlo chain was run for 88 million updates, results were visually inspected for convergence, and the first 8 million updates were discarded as burn-in.
We calculated posterior distributions for the parameter values of interest (time of change, genetic diversity before change, genetic diversity after change). We used an uninformative, flat prior for the time of change within the years 1991–2003, and we used flat priors ranging from 0 to 2,000 for the product of the generation time and the genetic diversity before and after this change. Because sequence data contain information on genetic diversity in the population at a time only before the sequences were collected, and because the collection dates were spread out over the study period, the data set contains limited information on the most recent years. We limited the change-point window to 2003 to prevent a strong influence of the Bayesian prior in the recent years on the estimate. We examined this change-point model by running the Markov chain Monte Carlo chain for 88 million steps, examining for convergence by visual inspection, and discarded the first 8 million steps as burn-in.
Tests on simulated data
To check the reliability of our method, we repeated our analysis on simulated data sets. First, to show that the inferred decline in genetic diversity depends crucially on information in the observed sequences and their associated dates, we reassigned collection dates to the sequences at random by sampling without replacement from the actual collection dates of the sequences. The results of Bayesian skyline analysis on 20 randomized data sets (Appendix Figure 1) showed that the Bayesian skyline plot from the actual data is distinct from the skyline plots of sequences with randomized dates. This finding indicates that the observed pattern is not determined by the method of analysis or the sampling scheme but follows from the actual dates of sequence sample collection.
Second, to show that a decline in genetic diversity is unlikely for a population in which the population genetic parameters are constant through time, we simulated molecular evolution on 100 random coalescent phylogenies under the assumption of constancy, and we analyzed them. We used the mean posterior estimates of coalescent and evolutionary parameters from the constant population model of the HBV sequences. The random coalescent trees used the same sampling time points as our data. Molecular sequence evolution was simulated on these trees by using the Mesquite program (31). Change-point analysis of these simulated data sets showed that their typical posterior change-point distribution is uninformative (roughly uniform throughout its interval; Appendix Figure 2). A decline in genetic diversity was not apparent in the simulated data sets. The posterior distributions of the change point for the simulated data sets did not show a concentration of probability density in particular time intervals, unlike the posterior distribution for the actual data set, indicating that the decline in genetic diversity observed in the HBV data is unlikely to originate under the null hypothesis of no change in genetic diversity. Further tests of the coalescent method implemented in BEAST on simulated data can be found in Drummond et al. (2).
Sequence accession numbers
Sequences used in this analysis have the following accession numbers and dates (month/year): DQ631557 (June/1992), DQ631559 (July/1992), DQ631603 (September/1992), DQ631584 (October/1992), DQ631585 (December/1992), DQ631588 (February/1993), DQ631591 (June/1993), DQ631563 (October/1993), DQ631565 (October/1993), DQ631569 (May/1994), DQ631570 (June/1994), DQ631571 (August/1994), DQ631602 (October/1995), DQ631598 (September/1996), DQ631632 (August/1998), DQ631633 (September/1998), DQ631635 (December/1998), DQ631636 (April/1999), DQ631606 (July/1999), DQ631611 (November/1999), DQ631613 (September/2000), DQ631620 (December/2000), DQ631637 (April/2001), DQ631623 (May/2002), DQ631624 (October/2002), DQ631625 (May/2003), DQ631628 (June/2003), DQ988365 (February/2004), DQ988368 (March/2004), DQ988369 (April/2004), DQ988371 (May/2004), DQ988375 (August/2004), DQ988376 (August/2004), DQ988377 (August/2004), DQ988378 (September/2004), FJ391662 (June/2005), FJ391670 (June/2005), FJ391683 (October/2005), FJ391697 (December/2005), FJ391846 (February/2006), FJ391848 (February/2006), FJ391849 (February/2006).