Sandra E Chaudron, Christine Leemann, Katharina Kusejko, Huyen Nguyen, Nadine Tschumi, Alex Marzel, Michael Huber, Jürg Böni, Matthieu Perreau, Thomas Klimkait, Sabine Yerly, Alban Ramette, Hans H Hirsch, Andri Rauch, Alexandra Calmy, Pietro Vernazza, Enos Bernasconi, Matthias Cavassini, Karin J Metzner, Roger D Kouyos, Huldrych F Günthard, for the Swiss HIV Cohort Study, A Systematic Molecular Epidemiology Screen Reveals Numerous Human Immunodeficiency Virus (HIV) Type 1 Superinfections in the Swiss HIV Cohort Study, The Journal of Infectious Diseases, Volume 226, Issue 7, 1 October 2022, Pages 1256–1266, https://doi.org/10.1093/infdis/jiac166
Abstract
Studying human immunodeficiency virus type 1 (HIV-1) superinfection is important to understand virus transmission, disease progression, and vaccine design. But detection remains challenging, with low sampling frequencies and insufficient longitudinal samples.
Using the Swiss HIV Cohort Study (SHCS), we developed a molecular epidemiology screening for superinfections. A phylogeny built from 22 243 HIV-1 partial polymerase sequences was used to identify potential superinfections among 4575 SHCS participants with longitudinal sequences. A subset of potential superinfections was tested by near-full-length viral genome sequencing (NFVGS) of biobanked plasma samples.
Based on phylogenetic and distance criteria, 325 potential HIV-1 superinfections were identified and categorized by their likelihood of being detected as superinfections due to sample misidentification. NFVGS was performed for 128 potential superinfections; of these, 52 were confirmed by NFVGS, 15 were not confirmed, and for 61 sampling did not allow confirming or rejecting superinfection because the sequenced samples did not include the relevant time points causing the superinfection signal in the original screen. Thus, NFVGS could support 52 of 67 adequately sampled potential superinfections.
This cohort-based molecular approach identified, to our knowledge, the largest population of confirmed superinfections, showing that, while rare with a prevalence of 1%–7%, superinfections are not negligible events.
The human immunodeficiency virus type 1 (HIV-1) remains a global health challenge with 1.7 million new infections in 2019 [1] despite available combination antiretroviral therapy (ART), which, if successful [2], can reduce HIV transmission to almost zero [3]. In 2019, Switzerland had approximately 16 600 people living with HIV, with 425 diagnosed in 2018 [4]. Although HIV-1 is typically transmitted from infected to noninfected individuals, HIV-1 superinfection can also occur—that is, individuals with an already established HIV-1 infection acquiring another HIV-1 strain [5, 6]. Since the first reported cases of HIV-1 superinfection in 2002, several others were identified by a range of different approaches [7–9]. These superinfections were primarily identified in association with an unexpected increase in viral load (VL) or after treatment failure. Typically, cases are molecularly confirmed either by strain-specific polymerase chain reaction (PCR) [6], heteroduplex mobility assays [10], or genetic screening assays [11] to identify different HIV-1 subtypes and calculate viral sequence ambiguity scores [12], or by reconstruction of sequence-based viral phylogenies obtained from longitudinally sampled sequences [13].
However, important unknowns remain, such as the factors contributing to the acquisition of HIV-1 superinfection and its incidence in the population [14, 15]. The immunological responses associated with HIV-1 superinfection are also not well understood [16, 17]. Finally, whether HIV-1 superinfection leads to or prevents rapid disease progression is debated [18, 19]. These uncertainties remain because a systematic assessment of HIV-1 superinfection is challenging. First, HIV-1 superinfection is difficult to distinguish from coinfection due to intrasubtype viral similarity. Accordingly, most reported HIV-1 superinfections involve distinct HIV-1 subtypes [20].
Second, the event is deemed rare and can be transient, thus missed if the sampling is inappropriate and the superinfecting strain does not outcompete the established strain [21], making it challenging to assess the within-host viral population dynamics [22]. Finally, sampling size, frequencies, and timing are critical to detect HIV-1 superinfections. A large screen of a prospective seroincident cohort in Mombasa revealed 21 HIV-1 superinfection cases [23]. Another large retrospective screening of 4425 individuals could only confirm 2 of the 14 potential cases resequenced [24]. Overall, each study on HIV-1 superinfection identified at most a dozen cases. In general, longitudinal samples, population sequences, or next-generation sequences linked to dense qualitative epidemiological data from HIV-infected individuals are often not available for systematic screens of large populations and identification of HIV-1 superinfection in large numbers.
We thus established a molecular epidemiology approach to systematically screen for HIV-1 superinfection in cohort studies. We utilized the Swiss HIV Cohort Study (SHCS), a well-characterized cohort of >20 000 HIV-1–infected individuals, with good representativeness of the Swiss HIV-1 epidemic [25]. This method to identify HIV-1 superinfection cases was developed with the viral phylogeny of longitudinally sampled HIV-1 polymerase (pol) sequences from genotypic HIV-1 drug resistance testing. The process was then validated with HIV-1 near-full-length viral genome sequencing (NFVGS) from longitudinal samples of the potential cases identified within the SHCS.
METHODS
The Swiss HIV Cohort Study
The SHCS is a Swiss prospective, multicenter, longitudinal study established in 1988 [25], with 20 845 people living with HIV enrolled by the end of 2019. It covers ≥53% of the cumulative number of HIV-infected individuals, approximately 75% of all HIV-positive individuals on ART, and 72% of individuals diagnosed with acquired immunodeficiency syndrome (AIDS). All participants provided written informed consent and the SHCS was approved by the participating institutions’ ethics committees. At semiannual follow-ups, sociodemographic, behavioral, clinical, and laboratory data are obtained, and biological samples are stored in the SHCS biobank. Since 2002, routine HIV genotypic resistance tests (GRTs) have been performed on baseline plasma samples and for treatment failure. In addition, >11 000 GRTs were done retrospectively from biobanked plasma samples obtained before 2002 [26]. Overall, approximately 60% of enrolled individuals have ≥1 HIV-1 partial pol gene consensus sequence in the SHCS drug resistance database (DRDB). The DRDB HIV-1 partial pol sequences contain the protease (PR: nucleotides [nt] 2253–2550) and the reverse transcriptase (RT: nt 2550–3870, at minimum codons 28–225). At the start of this study in 2017, the database contained demographic information on 20 089 individuals.
Data availability is described in Supplementary Appendix 1.
Phylogeny Reconstruction
We built a phylogeny using an in-house pipeline. In brief, for the initial screen, all SHCS sequences were quality checked, that is, filtered for length (PR ≥250 bp, RT ≥500 bp) and duplicates. They were aligned (Supplementary Appendix 2) to the HIV-1 HXB2 pol gene (GenBank accession number: K03455.1, nt 2253–3870), and known drug resistance mutations from the Stanford HIV DRDB and the International Antiviral Society–USA were removed from the alignment. The sequences were trimmed and the phylogeny was reconstructed with 2 different tools (Supplementary Appendix 2). For the validation analyses, we used the same process on different genomic areas of interest in the near-full-length HIV-1 consensus sequences.
Identification of Potential HIV-1 Superinfections
We applied 2 criteria to SHCS participants with ≥2 longitudinal (ie, sampled at different time points) HIV-1 partial pol sequences. The first is the within-individual maximal patristic distance, obtained by calculating the pairwise patristic distances between the individual’s longitudinal sequences (Supplementary Appendix 2 for R functions). The second is the cluster size, that is, the number of sequences in the smallest subtree containing all of an individual’s longitudinal sequences, rooted at their most recent common ancestor (MRCA). For a superinfection, this cluster is nonmonophyletic, containing all sequences of the focal patient as well as other SHCS sequences. Thus, to identify potential HIV-1 superinfections, we chose thresholds of ≥0.05 for the within-individual maximum patristic distance and ≥20 for the per-individual smallest cluster size. We tested the sensitivity of these combined thresholds by varying them from ≥0.01 to ≥0.1 (patristic distance) and from ≥5 to ≥250 (smallest cluster size).
We defined the estimated time of HIV-1 superinfection for each individual as the time point with the highest maximal patristic distance. This time point is the most distant from the other time points and thus provides the strongest evidence for HIV-1 superinfection.
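The two screening criteria can be illustrated on a toy tree. The sketch below is not the study's R pipeline; it is a minimal pure-Python illustration in which the tree, tip names, and branch lengths are all hypothetical. It computes the maximum pairwise patristic distance among one individual's sequences and the number of tips under their MRCA, then applies the thresholds named in the text.

```python
from itertools import combinations

# Toy rooted tree: child -> (parent, branch length).
# All node names and branch lengths are hypothetical.
TREE = {
    "n1": ("root", 0.010),
    "n2": ("root", 0.020),
    "P1_t1": ("n1", 0.005),    # focal individual, time point 1
    "other_a": ("n1", 0.010),
    "P1_t2": ("n2", 0.030),    # focal individual, time point 2
    "other_b": ("n2", 0.020),
}

def path_to_root(node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in TREE:
        node = TREE[node][0]
        path.append(node)
    return path

def mrca(tips):
    """Most recent common ancestor of the given tips."""
    paths = [path_to_root(t) for t in tips]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first tip, the first shared node is the MRCA.
    return next(n for n in paths[0] if n in common)

def dist_to_ancestor(node, ancestor):
    """Sum of branch lengths from `node` up to `ancestor`."""
    total = 0.0
    while node != ancestor:
        parent, length = TREE[node]
        total += length
        node = parent
    return total

def patristic(a, b):
    """Path length between two tips through their MRCA."""
    anc = mrca([a, b])
    return dist_to_ancestor(a, anc) + dist_to_ancestor(b, anc)

def tips_under(node):
    """All tips in the subtree rooted at `node`."""
    children = [c for c, (p, _) in TREE.items() if p == node]
    if not children:
        return [node]
    return [t for c in children for t in tips_under(c)]

def screen(tips, min_dist=0.05, min_cluster=20):
    """Return (max patristic distance, smallest cluster size, flagged?)."""
    max_dist = max(patristic(a, b) for a, b in combinations(tips, 2))
    cluster_size = len(tips_under(mrca(tips)))
    return max_dist, cluster_size, (max_dist >= min_dist and cluster_size >= min_cluster)

max_dist, size, flagged = screen(["P1_t1", "P1_t2"])
```

On this four-tip toy tree the distance criterion is met but the cluster-size criterion is not, so the individual is not flagged at the default thresholds; in a cohort-scale phylogeny the subtree under the MRCA would typically hold many more sequences.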
Categorization of the Potential HIV-1 Superinfections
To assess the likelihood of HIV-1 superinfection relative to sample misidentification, we categorized the topology of the smallest subtree of SHCS participants with ≥2 longitudinal sequences (see Results; Supplementary Appendix 2 for R functions used). In each individual’s smallest subtree, each time point was removed in turn from the phylogeny, and a new MRCA and the corresponding smallest cluster size were calculated for the remaining time points. Category 1 cases have only 2 longitudinal HIV-1 partial pol sequences, distant in the phylogeny, so no new MRCA can be found by dropping one or the other sequence. Category 2 cases have >2 longitudinal sequences, and one of the new smallest cluster sizes is smaller than the focal individual’s total number of longitudinal sequences in the phylogeny; that is, all time points except one cluster together in the phylogeny. Category 3 cases also have >2 longitudinal sequences but a different tree topology: every new smallest cluster size remains larger than the total number of the individual’s longitudinal sequences used in the phylogeny; that is, several sequences from the same individual cluster together, away from the others in the phylogeny.
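The categorization rule can be written as a small decision function. This is a sketch under stated assumptions rather than the authors' R code: the function name and inputs are hypothetical, the leave-one-out cluster sizes are taken as already computed, and a leave-one-out size equal to the number of sequences (a case the text does not address) is treated here as category 3.

```python
def categorize(n_sequences, leave_one_out_sizes):
    """Assign a potential superinfection to category 1, 2, or 3.

    n_sequences: number of longitudinal sequences of the focal individual.
    leave_one_out_sizes: smallest cluster size recomputed after dropping
        each time point in turn (unused for category 1).
    """
    if n_sequences < 2:
        raise ValueError("at least 2 longitudinal sequences are required")
    if n_sequences == 2:
        return 1  # only two sequences, distant in the phylogeny
    if any(size < n_sequences for size in leave_one_out_sizes):
        return 2  # dropping one outlier leaves the rest clustering together
    return 3      # sequences form separate groups in the tree

# Hypothetical individuals
cat_a = categorize(2, [])
cat_b = categorize(4, [3, 40, 40, 40])   # dropping the first time point gives a tight cluster
cat_c = categorize(4, [25, 30, 40, 40])  # no single removal collapses the cluster
```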
Retrospective Sequencing of Near-Full-Length HIV-1 Genomes
To validate potential HIV-1 superinfections, we performed next-generation sequencing (NGS) of the near-full-length HIV-1 genome. Given the limitations of NGS at low VLs, we only used plasma samples with VL ≥5000 copies/mL. HIV-1 RNA isolation, complementary DNA synthesis, and PCR amplification were performed from individuals’ longitudinal plasma samples, with 4 overlapping fragments across the HIV-1 genome amplified, combined, and sequenced by NGS on an Illumina MiSeq (detailed in Supplementary Appendix 3 and Supplementary Table 6).
Bioinformatic Analysis
We analyzed the NGS reads for each time point of a focal individual using an in-house bioinformatic pipeline. In brief, the NGS reads were trimmed and mapped to HIV-1 HXB2, and the near-full-length viral consensus was reconstructed (Supplementary Appendix 2). The coverage along the genome was assessed. The read mapping was then repeated using the sample’s new viral consensus as reference, until the coverage showed no further improvement. The consensus before the last mapping was used to build the final viral consensus. The consensus sequences, together with 2500 HIV-1 full-length background sequences randomly selected from the Los Alamos HIV Databases to match the viral subtype prevalence in the SHCS (Supplementary Appendix 4), were used to validate superinfections with the phylogeny and our selection criteria as described above.
The regions analyzed were HIV-1 full-length, pol, gag, and env (Supplementary Appendix 2). We excluded samples if the amplicon spanning the genomic area of interest failed, and excluded individuals if so many samples failed that <2 sequences remained available for the analysis.
Statistical Analysis
We characterized the confirmed and not confirmed HIV-1 superinfections, the remaining potential superinfection cases, and a control group. The control group comprises the 4250 of 4575 SHCS individuals with ≥2 longitudinal HIV-1 pol sequences who did not meet the selection criteria for a potential HIV-1 superinfection. The groups were compared on gender (male or female), ethnicity (white, black, or other ethnicity [ie, Hispano-American, Asian, and unknown]), and risk behavior, defined as the likely source of infection: men who have sex with men (MSM), heterosexual contacts (HET), and intravenous drug use (IVD). Having had a positive test for other infections such as cytomegalovirus (CMV), syphilis (caused by Treponema pallidum), and hepatitis C virus (HCV) was also considered. We performed univariable and multivariable logistic regressions comparing the confirmed superinfections against the control group.
Average Pairwise Diversity Calculation
The average pairwise diversity (APD) [27, 28] was calculated over the third-codon positions of HIV-1 pol (PR-RT), based on the viral consensus sequence (Supplementary Appendix 5). A high APD score, reflecting high within-host diversity, may be a useful marker for superinfection. We used an APD score ≥0.0336 as the threshold for high diversity at a given time point to confirm superinfection.
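As an illustration of the APD idea, the sketch below averages a per-site diversity over third-codon positions and compares it with the 0.0336 threshold. The exact procedure is given in Supplementary Appendix 5 and is not reproduced here; this example assumes that per-site nucleotide frequencies are available and that site diversity is the probability that two randomly drawn reads differ. The function names and example frequencies are hypothetical.

```python
HIGH_APD = 0.0336  # threshold quoted in the text for pol (PR-RT)

def site_diversity(freqs):
    """Probability that two reads drawn at random differ at this site."""
    return sum(f * (1.0 - f) for f in freqs.values())

def average_pairwise_diversity(site_freqs):
    """Mean site diversity over third-codon positions (0-based indices 2, 5, 8, ...).

    `site_freqs` is a list of {nucleotide: frequency} dicts, one per site,
    assumed to start at the first position of a codon.
    """
    third = site_freqs[2::3]
    if not third:
        raise ValueError("need at least one complete codon")
    return sum(site_diversity(f) for f in third) / len(third)

# Two hypothetical codons; only the third positions enter the average.
sites = [
    {"A": 1.0}, {"T": 1.0}, {"A": 0.9, "G": 0.1},  # codon 1
    {"G": 1.0}, {"C": 1.0}, {"C": 1.0},            # codon 2
]
apd = average_pairwise_diversity(sites)
is_high = apd >= HIGH_APD
```

Restricting the average to third-codon positions focuses on sites where changes are mostly synonymous, which is the rationale usually given for this choice.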
RESULTS
Study Population for HIV-1 Superinfection in the SHCS
To study HIV-1 superinfection, we started our workflow with all GRT HIV-1 partial pol sequences in the SHCS DRDB (Figure 1). We then reconstructed the phylogeny of 22 243 sequences linked to 12 397 cohort participants. We restricted the workflow to 4575 individuals in the phylogeny having ≥2 longitudinal HIV-1 partial pol sequences, a requirement to screen for HIV-1 superinfection.
Figure 1. Study population. Overview of the selection process of the study population in the Swiss HIV Cohort Study (SHCS). We considered 4575 individuals, with ≥2 longitudinally sampled human immunodeficiency virus type 1 (HIV-1) pol sequences (protease [PR] and reverse transcriptase [RT]), to further study HIV-1 superinfection.
Identification of Potential HIV-1 Superinfections in the SHCS
To screen for HIV-1 superinfection, we used (1) the within-individual maximum patristic distance and (2) the per-individual smallest nonmonophyletic cluster size. The first criterion reflects the requirement for sufficiently variable and genetically distant within-individual sequences [29]. The second criterion distinguishes superinfection from transmission chains initiated by the focal individual [24, 30, 31]. Usually, a genetic distance ≤0.045 is used to identify transmission clusters in phylogenies [32]. We then varied the maximum patristic distance from ≥0.01 to ≥0.10 and the smallest cluster size from ≥5 to ≥250 sequences to identify HIV-1 superinfections. The number of cases varied considerably within the smaller thresholds, but for the higher ranges (0.05–0.10 and 20–250), it was less dependent on the thresholds (Figure 2A). We selected a within-individual maximum patristic distance ≥0.05 and a smallest cluster size ≥20 sequences to identify HIV-1 superinfection (Figure 2A and 2B). With these thresholds, we identified 325 potential HIV-1 superinfections in the SHCS (Figure 2C; results comparable using RAxML; Supplementary Table 1).
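The threshold sensitivity analysis amounts to counting flagged individuals over a grid of cutoffs. The sketch below shows the idea with made-up values: the per-individual (distance, cluster size) pairs are hypothetical, and only three cutoffs per criterion are shown, taken from the ends and the chosen value of the ranges stated in the text (the study used 10 values per criterion, which are not listed in this section).

```python
# Hypothetical (max patristic distance, smallest cluster size) per individual.
individuals = [
    (0.012, 6),
    (0.048, 300),
    (0.055, 18),
    (0.061, 25),
    (0.090, 240),
    (0.120, 1000),
]

def count_flagged(data, min_dist, min_cluster):
    """Number of individuals meeting both criteria at the given cutoffs."""
    return sum(1 for d, c in data if d >= min_dist and c >= min_cluster)

dist_thresholds = [0.01, 0.05, 0.10]   # illustrative subset of the 0.01-0.10 range
cluster_thresholds = [5, 20, 250]      # illustrative subset of the 5-250 range

# Count of flagged individuals for every combination of cutoffs.
grid = {
    (d, c): count_flagged(individuals, d, c)
    for d in dist_thresholds
    for c in cluster_thresholds
}
```

Tabulating `grid` by row and column gives the kind of overview used to judge where the count stops depending strongly on the cutoffs.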

Figure 2. Sensitivity of the screening criteria and selection of potential human immunodeficiency virus type 1 (HIV-1) superinfections in the Swiss HIV Cohort Study (SHCS). Two criteria were considered to identify potential HIV-1 superinfections in the SHCS. The sensitivity of the maximum patristic distance was assessed with 10 different threshold values chosen from 0.01 to 0.10, and that of the smallest cluster size with 10 threshold values chosen from 5 to 250 sequences. A, Overlap between the 10 values of the 2 selection criteria. The blue box marks the number of identified HIV-1 superinfections for the 2 final criteria separately. B, Overlap between the patients having a maximum patristic distance ≥0.05, the ones whose sequences in the phylogeny do not form a monophyletic subtree (cluster), and the ones whose smallest cluster of sequences contains ≥20 sequences. C, Representation of the maximum patristic distance against the smallest cluster size in log scale for 972 focal patients having nonmonophyletic clusters (empty circles). The selection criteria thresholds are set at 0.05 and log10(20) for the x and y axis, respectively. The 325 potential HIV-1 superinfections are shown as blue circles.
Categorization of the Potential HIV-1 Superinfections in the SHCS
HIV-1 superinfection is described as a much less frequent event compared to initial infection [23]. A previous study showed that approximately 86% of the analyzed cases were linked to misidentified samples or sequences [24]. We thus categorized the potential HIV-1 superinfections to weigh the evidence for superinfection against potential specimen misidentification (Figure 3). The potential cases were classified into one of 3 categories. The likelihood of being superinfected increases from category 1 to 3, while the likelihood of specimen misidentification correspondingly decreases. We classified 29 individuals in category 3, 161 in category 2, and 135 in category 1.

Figure 3. Categorizing the potential human immunodeficiency virus type 1 (HIV-1) superinfection cases in the Swiss HIV Cohort Study (SHCS). Three hundred twenty-five potential HIV-1 superinfections were classified in 3 categories, representing the likelihood of HIV-1 superinfection (category 1: likely to category 3: most likely). Category 1 individuals only had 2 longitudinally sampled partial HIV-1 pol sequences, distant in the phylogeny. Category 2 individuals had >2 longitudinally sampled partial HIV-1 pol sequences with one sequence distant from the others in the phylogeny. Category 3 individuals had >2 longitudinally sampled partial HIV-1 pol sequences with sequences clustering with each other at different positions in the tree. Numbers represent the number of individuals identified per category.
Confirming Potential HIV-1 Superinfection
To validate our approach, we performed HIV-1 NFVGS for 128 cases (Figure 4) with longitudinal plasma samples available in the SHCS biobank around the estimated time of superinfection, and reconstructed the partial pol phylogeny. Using only our selection criteria for HIV-1 superinfection, we confirmed 41 (32%, values summarized in Supplementary Table 2 and Supplementary Figure 1) superinfections in the SHCS. The “confirmed superinfections” were 10 of 15 (66.7%) category 3, 19 of 69 (27.6%) category 2, and 12 of 44 (27.2%) category 1. The varying confirmation rates correlate with the hypothesized superinfection categorization (Figure 3; results consistent using RAxML for phylogeny; Supplementary Table 3).
Figure 4. Validation of human immunodeficiency virus type 1 (HIV-1) superinfection (SI) with near-full-length HIV-1 sequencing. Of the 325 potential HIV-1 SI identified in the Swiss HIV Cohort Study, 128 (44 in category 1 [green box], 69 in category 2 [yellow box], and 15 in category 3 [red box]) were near-full-length HIV-1 next-generation sequenced. We reconstructed the phylogeny of the HIV-1 partial pol (protease–reverse transcriptase) genomic area and applied the 2 selection criteria for HIV-1 superinfection. “Confirmed SI” represents the cases where superinfection could be validated with near-full-length HIV-1 sequencing and our method. “Lack of evidence” represents either the cases for which we only have one time point or no time points matched between initial screen and validation analyses, or the cases with ≥2 time points matching that are not informative enough to identify superinfection in the initial screen or to validate it with near-full-length HIV-1–based analysis. “Discrepant phylogenetic pattern” represents the cases where we have ≥2 matching time points between the initial screen and validation analyses. However, for these cases there is a discrepancy between the initial screen and validation phylogeny resulting in them not being confirmed as superinfections with our analysis but still identified as superinfection with the matching time points. The average pairwise diversity (APD) was calculated for every case (see Supplementary Figure 1), and the numbers in blue are the total number of validated superinfections.
For the 87 unconfirmed superinfections, a potential reason could be mismatches between the time points sequenced for validation and the ones in the initial screening. To investigate this hypothesis, we looked at whether the overlapping time points between the validation and initial screen analyses alone would have allowed us to identify the superinfections in the initial screen. The “lack of evidence” subset represents 61 unconfirmed cases, for which we either had only one or no time point matching between the 2 analyses, or ≥2 matching time points not sufficient to identify the superinfection in our initial analysis. In these cases, we did not confirm superinfection likely because of less informative time points used for validation, taken outside the critical window of superinfection. Thus, the validation for these cases does not allow the confirmation of superinfection but also does not contradict the initial screen (and hence it does not provide evidence against superinfection).
The “discrepant phylogenetic pattern” subset represents 26 unconfirmed cases, with sampling times sufficiently concordant with the initial screen (ie, we have multiple sequences matching between the validation and the initial screen analyses). These matching time points were informative enough to initially identify HIV-1 superinfection, hence the discrepancy between the 2 analyses. Since our method uses consensus sequences to confirm HIV-1 superinfections, we considered a measure of diversity as a complementary screening method. The APD was shown [27, 28] to be a time measure of diversity within HIV-1–infected individuals, calculated over the HIV-1 partial pol gene. High APD scores (≥0.0336 for pol [PR-RT] [28]) may be a marker for superinfection, and in our study, 43 cases had a high APD (Figure 4, Supplementary Figure 2) for at least one time point. Eleven (6 category 2, 5 category 1) belonged to the subset of 26 cases with a discrepant phylogenetic pattern. Such a high viral diversity in some of these cases still suggests HIV-1 superinfections, thereby potentially resolving the apparent discrepancy. For the confirmed cases with no evidence of superinfection by APD, we looked at the fraction of ambiguous nucleotides [33], which supports the conclusion of superinfection (Supplementary Table 4). Finally, we investigated the remaining 15 cases (6 category 1, 9 category 2), which were not confirmed despite ≥2 longitudinal sequences matching between both analyses and which had a lower APD. Fourteen were intrasubtype B superinfections, a limitation for detection of superinfections with phylogenetic approaches, especially if the sampled viruses are genetically close. One was an intersubtype B and 01_AE superinfection, belonging to category 1. We performed NFVGS for the 2 available samples, and both were identified as subtype 01_AE. This indicates that it is not an intersubtype HIV-1 superinfection and that sample misidentification could be the cause of the suspected superinfection.
In total, of the 128 cases that underwent HIV-1 NFVGS, we confirmed 52 HIV-1 superinfections (77.6% of the 67 cases sufficiently concordant with the initial screen).
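The counts reported in this section can be reconciled with a short tally. The sketch below only restates numbers given in the text; reading the 52 confirmed cases as the 41 confirmed by the phylogenetic criteria plus the 11 discrepant cases with high APD is an interpretation of the text, and the variable names are illustrative.

```python
sequenced = 128                  # cases with NFVGS
confirmed_by_phylogeny = 41      # met both selection criteria on revalidation
unconfirmed = sequenced - confirmed_by_phylogeny          # 87

lack_of_evidence = 61            # time points did not allow a conclusion
discrepant = unconfirmed - lack_of_evidence               # 26

high_apd_in_discrepant = 11      # discrepant cases with APD >= 0.0336
confirmed_total = confirmed_by_phylogeny + high_apd_in_discrepant  # 52
not_confirmed = discrepant - high_apd_in_discrepant       # 15

adequately_sampled = confirmed_total + not_confirmed      # 67
confirmation_rate = round(100 * confirmed_total / adequately_sampled, 1)
```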
Basic Epidemiology of the HIV-1 Superinfections in the SHCS
The confirmed or hypothesized superinfections were similar in demography and basic epidemiology (Tables 1 and 2). Of the 52 confirmed cases, 71% were male and 29% female (multivariable odds ratio [OR], 1.192 [95% confidence interval {CI}, .562–2.518]), similar to the other groups compared. There were more MSM (42%) than HET (29%) (OR, 0.608 [95% CI, .244–1.428]) and IVD (29%) (OR, 0.695 [95% CI, .243–2.048]). The SHCS is predominantly of white ethnicity (approximately 70% in 2019), which is also reflected in the confirmed superinfections, with 86% white vs 8% black (OR, 0.622 [95% CI, .149–1.737]) and 6% other ethnicities. For the confirmed superinfections, 23% were already assigned ≥2 subtypes in our database vs only approximately 6% in the control group. The cases in the control group are due to recombination, where the recombinant clustered with the main subtype in the phylogeny; hence they were not superinfections. Having ≥2 subtypes is significant in the univariable and multivariable analyses (OR, 5.094 [95% CI, 2.528–9.546] and 5.409 [95% CI, 2.667–10.225], respectively), confirming the hypothesized HIV-1 superinfections. Finally, 87% of confirmed cases were seropositive for CMV (OR, 1.201 [95% CI, .547–3.029]), 40% for HCV (OR, 1.763 [95% CI, .737–3.853]), and 11% for syphilis (OR, 0.788 [95% CI, .359–1.623]). We see no significant association between having or having had these coinfections and being HIV-1 superinfected. For the confirmed superinfections, the viral load and treatment history patterns support that superinfection occurs during treatment failure, where an increase of the viral load is often noticeable (Supplementary Table 5, Supplementary Figure 3). Finally, the time elapsed analysis between the GRT time points (Supplementary Figure 4) is in line with the assumption that a smaller sampling frequency allows the detection of HIV-1 superinfections more efficiently.
Characteristic | Superinfection Confirmed | Superinfection Not Confirmed | Remaining Potential Superinfection | SHCS Individuals With ≥2 HIV-1 pol Sequences |
---|---|---|---|---|
Total No. | 52 | 76 | 273 | 4250 |
Sex | | | | |
Male | 37 (71) | 53 (70) | 182 (67) | 2968 (70) |
Female | 15 (29) | 23 (30) | 91 (33) | 1282 (30) |
Risk | | | | |
MSM | 22 (42) | 25 (33) | 92 (34) | 1594 (38) |
HET | 15 (29) | 28 (37) | 94 (34) | 1559 (37) |
IVD | 15 (29) | 19 (25) | 71 (26) | 916 (22) |
Ethnicity | | | | |
White | 45 (86) | 56 (74) | 210 (77) | 3299 (78) |
Black | 4 (8) | 10 (13) | 34 (12) | 574 (13) |
Other ethnicity | 3 (6) | 10 (13) | 29 (11) | 377 (9) |
Age, y, median (IQR) | 57 (53–64) | 56 (50–60) | 55 (49–60) | 55 (49–60) |
No. of subtypes | | | | |
1 | 40 (77) | 55 (72) | 181 (66) | 4007 (94) |
2 | 12 (23) | 20 (26) | 92 (34) | 236 (6) |
Other infections | | | | |
Ever had syphilis | 11 (21) | 15 (20) | 65 (24) | 963 (23) |
Having CMV | 45 (87) | 65 (86) | 233 (85) | 3614 (85) |
Ever had HCV | 21 (40) | 28 (37) | 90 (33) | 1221 (29) |
Data are presented as Number (%) unless otherwise indicated. Comparison of the sex, risk, ethnicity, age, number of subtypes assigned, and coinfections for the confirmed and unconfirmed HIV-1 superinfections, the remaining potential HIV-1 superinfections and the control group of SHCS individuals with ≥2 longitudinal sequences in the drug resistance database.
Abbreviations: CMV, cytomegalovirus; HCV, hepatitis C virus; HET, heterosexual contact; HIV-1, human immunodeficiency virus type 1; IQR, interquartile range; IVD, intravenous drug use; MSM, men who have sex with men; SHCS, Swiss HIV Cohort Study.
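The univariable odds ratio for having ≥2 assigned subtypes can be checked directly from the counts in the table above (confirmed superinfections: 12 with 2 subtypes and 40 with 1; control group: 236 and 4007). The sketch below computes the crude odds ratio with a Wald 95% confidence interval. The Wald interval is a simple approximation chosen for illustration and is not expected to reproduce the interval reported in the text, since the interval method used by the authors is not stated in this section; the function name is hypothetical.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases,    b: unexposed cases
    c: exposed controls, d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the table: 2 subtypes vs 1 subtype, confirmed vs control group.
or_subtypes, ci_low, ci_high = odds_ratio_wald(12, 40, 236, 4007)
```

The point estimate rounds to 5.094, matching the univariable odds ratio reported for having ≥2 subtypes.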
Characteristic | Univariable OR (95% CI) | Univariable P Value | Multivariable OR (95% CI) | Multivariable P Value |
---|---|---|---|---|
Sex | ||||
Male | 1 (reference) | | 1 (reference) | |
Female | 0.939 (.498–1.681) | .837 | 1.192 (.562–2.518) | .644 |
Risk | ||||
MSM | 1 (reference) | | 1 (reference) | |
HET | 0.697 (.353–1.681) | .284 | 0.608 (.244–1.428) | .268 |
IVD | 1.186 (.601–2.281) | .612 | 0.695 (.243–2.048) | .502 |
Ethnicity | ||||
White | 1 (reference) | | 1 (reference) | |
Black | 0.511 (.153–1.264) | .200 | 0.64 (.176–1.857) | .446 |
Other ethnicity | 0.583 (.141–1.605) | .368 | 0.622 (.149–1.737) | .432 |
No. of subtypes | ||||
1 subtype | 1 (reference) | | 1 (reference) | |
≥2 subtypes | 5.094 (2.528–9.546) | <.001 | 5.409 (2.667–10.225) | <.001 |
Other infections | ||||
Ever had syphilis | 0.916 (.446–1.726) | .797 | 0.788 (.359–1.623) | .533 |
Having CMV | 1.131 (.542–2.757) | .763 | 1.201 (.547–3.029) | .671 |
Ever had HCV | 1.681 (.949–2.918) | .068 | 1.763 (.737–3.853) | .177 |
Univariable and multivariable logistic regression of factors potentially associated with confirmed human immunodeficiency virus type 1 superinfection. Sex, risk group, ethnicity, number of subtypes, and coinfections were included in the regression, comparing the 52 confirmed cases with the 4250 control patients with ≥2 longitudinal sequences in the Swiss HIV Cohort Study drug resistance database.
Abbreviations: CI, confidence interval; CMV, cytomegalovirus; HCV, hepatitis C virus; HET, heterosexual contact; IVD, intravenous drug use; MSM, men who have sex with men; OR, odds ratio.
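The univariable odds ratios reported above can be cross-checked against the counts in the characteristics table. The following is a minimal sketch, not the authors' analysis code: it computes crude odds ratios from 2×2 counts, which for a single categorical predictor coincide with the univariable logistic regression estimate. The function names `odds_ratio` and `wald_ci` are hypothetical, and the Wald interval is a standard approximation that need not match the reported confidence intervals, whose method is not stated here.

```python
import math


def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Crude odds ratio from the four cells of a 2x2 table."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


def wald_ci(case_exposed, case_unexposed, control_exposed, control_unexposed, z=1.96):
    """Approximate 95% Wald interval on the log-odds scale.

    Assumption: this is a textbook approximation, not necessarily the
    interval method used for the published table.
    """
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    log_or = math.log(odds_ratio(*cells))
    se = math.sqrt(sum(1 / n for n in cells))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts taken from the characteristics table:
# confirmed superinfections (n = 52) vs controls (n = 4250).
or_subtypes = odds_ratio(12, 40, 236, 4007)   # >=2 subtypes vs 1 subtype
or_female = odds_ratio(15, 37, 1282, 2968)    # female vs male
or_het = odds_ratio(15, 22, 1559, 1594)       # HET vs MSM

print(round(or_subtypes, 3))  # 5.094
print(round(or_female, 3))    # 0.939
print(round(or_het, 3))       # 0.697

low, high = wald_ci(12, 40, 236, 4007)
print(f"Wald 95% CI for >=2 subtypes: {low:.3f} to {high:.3f}")
```

The point estimates agree with the univariable column to 3 decimals. The multivariable estimates cannot be reproduced this way because they require individual-level data.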
DISCUSSION
In this work, we developed a molecular epidemiology–based approach for systematic screening of HIV-1 superinfection, using the dense pool of historic samples of the SHCS. We identified 325 potential HIV-1 superinfections and assessed 128 likely cases by retrospective HIV-1 NFVGS of longitudinal samples from our biobank. We validated our approach and unambiguously confirmed 52 cases (77.6% of the 67 cases whose sampling was sufficiently concordant with the initial screen).
A similar approach found that approximately 86% of validation cases involved sample misidentification [24]. Other studies retrospectively analyzed cases identified through patients’ abnormal laboratory values (eg, increased VL, decreased CD4 cell counts, changed resistance patterns) [8, 9, 20], or used proviral DNA [21, 34, 35], which has not yet been used to screen for superinfection. In addition, available tools to investigate dual infections [30] are not yet tailored to identify superinfection or to work with individuals’ longitudinal sequences. Thus, although considerable work on superinfection has been done, larger systematic population-based longitudinal screens are missing. Moreover, with guidelines now recommending treatment as prevention independent of clinical or laboratory markers [36, 37], the window in which HIV-1 superinfection can be studied systematically has become very short or nearly impossible to find. Our study therefore demonstrates the feasibility of a systematic molecular epidemiology–based approach, applicable to cohort studies, to screen for HIV-1 superinfection. The systematic aspect of our approach is underlined by our initial screen including 4575 of 12 397 patients from the SHCS DRDB; that is, we screened for superinfections in approximately 37% of patients with available viral sequences.
With our approach and the SHCS resources, we reliably identified and confirmed, to our knowledge, more cases than other studies [23, 38, 39]. Notably, we combined 2 robust criteria (phylogenetic and distance criteria) that have often been used separately but rarely together to characterize superinfections. The confirmation rate per category supports our hypothesized stratification by likelihood of sample misidentification (66.7%–100% in category 3, 27.6% in category 2, and 27.2% in category 1). Given the similar results for categories 1 and 2, they could be treated as one category in phylogeny-based approaches to identify HIV-1 superinfection. Overall, this demonstrates the complexity of using phylogenetics, and of choosing the appropriate phylogeny reconstruction tool, in such analyses [40].
We also demonstrated that the sampling window and frequency are key to systematically screening for HIV-1 superinfection. We could neither confirm nor reject 61 cases for which the time points shared between the initial screen and the validation analyses were insufficiently informative. For 26 other cases with ≥2 time points matching between both analyses, 56.7% could not be confirmed, highlighting another limitation of such an approach: the similarity of the virus strains involved in the superinfection. These were intrasubtype B cases that our validation phylogeny could not disentangle, most likely because of sampling time or because of the size and diversity of the phylogenetic tree used for validation compared with the original screen (3009 vs 22 243 sequences). Such superinfections remain challenging for a systematic phylogeny-based screen and call for detailed recombination analysis or haplotype reconstruction, which are more sensitive to such closely related strains. We thus used the APD as an additional measure of intrapatient diversity and validated 52 cases as true superinfections (77.6% of 67 with sufficiently matching and cross-comparable near-full-length and GRT HIV-1 sequences). Overall, our phylogenetic approach, like most others, has limitations and might not detect superinfections involving viral strains from the same subtype, or a strain that does not replace, or only partially outcompetes (at a lower frequency), the original strain. We also acknowledge that our screen may fail to identify superinfections in case of recombination, or if the viral strains before and after superinfection stem from epidemiological settings that are only poorly represented in the analyzed dataset. Finally, our initial screen was based on partial pol sequences; thus, we cannot exclude that some cases not selected by this screening method would show signs of superinfection in other genes.
For the potential and confirmed superinfections, we found more MSM than HET and IVDs, but no significant association between confirmed superinfection and these risk groups. Despite the small sample size, this may suggest that superinfection occurs independent of the infection route. It also suggests that, for MSM and IVDs, HIV-1 superinfections occur in rather similar networks with closely related viral strains, which are challenging to detect using phylogeny-based approaches and therefore make the true prevalence difficult to estimate.
To our knowledge, this is the most extensive screen for superinfections in a cohort study, with the largest number of confirmed cases. It confirms that superinfections are rare but not negligible events, with an estimated prevalence of 1% to 7%, most likely an underestimate because detection is challenging. Nevertheless, this work paves the way for follow-up studies that can benefit from the sample size and the NGS data generated to molecularly characterize HIV-1 superinfection and investigate other associated risk factors. Better molecular characterization and understanding of risk factors could provide further insights into HIV transmission and pathogenesis, benefit HIV vaccine research, and enable preventive measures to raise awareness of HIV-1 superinfection in the community. Overall, this work lays the groundwork for using any detailed cohort database, such as the SHCS, to systematically study HIV-1 superinfection and its biological mechanisms.
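The headline proportions quoted in this discussion follow from the case counts by simple arithmetic, sketched below. The helper `pct` and the variable names are hypothetical. The confirmation rate and screened share are direct ratios of reported counts. Reading the 1% to 7% prevalence range as confirmed and potential cases divided by the 4575 screened participants is an assumption made here for illustration, not a derivation stated in the text.

```python
def pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Percentage rounded to the given number of decimals."""
    return round(100 * numerator / denominator, digits)


# Counts reported in the article
screened = 4575          # participants with longitudinal sequences
with_sequences = 12397   # patients in the SHCS DRDB
potential = 325          # potential superinfections from the screen
sequenced = 128          # potential cases tested by NFVGS
confirmed = 52
uninformative = 61       # sampling did not allow confirming or rejecting

adequately_sampled = sequenced - uninformative

print(adequately_sampled)                  # 67
print(pct(confirmed, adequately_sampled))  # 77.6 (confirmation rate)
print(pct(screened, with_sequences))       # 36.9 (share of patients screened)

# Assumption: prevalence bounds taken as confirmed/screened and potential/screened
print(pct(confirmed, screened))            # 1.1
print(pct(potential, screened))            # 7.1
```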
Supplementary Data
Supplementary materials are available at The Journal of Infectious Diseases online (http://jid.oxfordjournals.org/). Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.
Notes
Members of the Swiss HIV Cohort Study (SHCS). K. Aebi-Popp, A. Anagnostopoulos, M. Battegay, E. Bernasconi, J. Böni, D. L. Braun, H. C. Bucher, A. Calmy, M. Cavassini, A. Ciuffi, G. Dollenmaier, M. Egger, L. Elzi, J. Fehr, J. Fellay, H. Furrer, C. A. Fux, H. F. Günthard (President of the SHCS), D. Haerry (deputy of Positive Council), B. Hasse, H. H. Hirsch, M. Hoffmann, I. Hösli, M. Huber, C. R. Kahlert (Chairman of the Mother and Child Substudy), L. Kaiser, O. Keiser, T. Klimkait, R. D. Kouyos, H. Kovari, B. Ledergerber, G. Martinetti, B. Martinez de Tejada, C. Marzolini, K. J. Metzner, N. Müller, D. Nicca, P. Paioni, G. Pantaleo, M. Perreau, A. Rauch (Chairman of the Scientific Board), C. Rudin, K. Kusejko (Head of Data Centre), P. Schmid, R. Speck, M. Stöckle (Chairman of the Clinical and Laboratory Committee), P. Tarr, A. Trkola, P. Vernazza, G. Wandeler, R. Weber, S. Yerly.
Author contributions. H. F. G., R. D. K., K. J. M., and S. E. C. conceived the study, performed the analysis, and wrote the first draft of the manuscript. C. L. performed the HIV-1 near-full-length viral genome sequencing. A. M., K. K., N. T., and H. N. contributed to the analysis of the results. M. H., J. B., M. P., T. K., S. Y., A. R., H. H. H., A. R., A. C., P. V., E. B., and M. C. collected and contributed data. All authors read and approved the final manuscript.
Acknowledgments. We thank the patients who participate in the SHCS; the physicians and study nurses for excellent patient care; A. Scherrer, A. Traytel, S. Wild, and K. Kusejko from the SHCS Data Centre for data management; and D. Perraudin and M. Amstad for administrative assistance. We thank the members of the SHCS. We thank Melissa Robbiani for help with editing the manuscript.
Financial support. This study has been financed within the framework of the SHCS, supported by the Swiss National Science Foundation (grant numbers 177499 and 179571 to H. F. G.); the SHCS Research Foundation; and the Yvonne Jacob Foundation (to H. F. G.). The data are gathered by the 5 Swiss University Hospitals, 2 Cantonal Hospitals, 15 affiliated hospitals, and 36 private physicians (listed in http://www.shcs.ch/180-health-care-providers).
Author notes
K. J. M., R. D. K., and H. F. G. contributed equally to this work.
The members of the Swiss HIV Cohort Study are listed in the Notes.
Potential conflicts of interest. The institution of E. B. received fees for E. B. participation in advisory boards and travel grants from Gilead Sciences, MSD, ViiV Healthcare, Pfizer, AbbVie, and Sandoz. K. J. M. has received advisory board honoraria from Gilead Sciences; has received travel grants and honoraria from Gilead Sciences, Roche Diagnostics, GlaxoSmithKline, Merck Sharp & Dohme, Bristol-Myers Squibb, ViiV, and Abbott; the University of Zurich received research grants from Gilead Science, Novartis, Roche, and Merck Sharp & Dohme for studies for which K. J. M. serves as principal investigator. H. F. G. has received unrestricted research grants from Gilead Sciences and Roche; fees for data and safety monitoring board membership from Merck; consulting/advisory board membership fees from Gilead Sciences, Merck, and ViiV Healthcare; and grants from SystemsX, and the National Institutes of Health. The institution of H. F. G. received educational grants from Gilead Sciences, ViiV, MSD, AbbVie, and Sandoz. All other authors report no potential conflicts of interest.
All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.