HIV-1 p24Gag adaptation to modern and archaic HLA-allele frequency differences in ethnic groups contributes to viral subtype diversification

Abstract

Pathogen-driven selection and past interbreeding with archaic human lineages have resulted in differences in human leukocyte antigen (HLA)-allele frequencies between modern human populations. Whether this variation affects pathogen subtype diversification is unknown. Here we show a strong positive correlation between ethnic diversity in African countries and both human immunodeficiency virus (HIV)-1 p24Gag diversity and subtype diversity. We demonstrate that ethnic HLA-allele differences between populations have influenced HIV-1 subtype diversification as the virus adapted to escape common antiviral immune responses. The evolution of HIV-1 subtype B (HIV-B), which does not appear to be indigenous to Africa, is strongly affected by immune responses associated with Eurasian HLA variants acquired through adaptive introgression from Neanderthals and Denisovans. Furthermore, we show that the increasing and disproportionate number of HIV infections among African Americans in the USA drives HIV-B evolution towards an Africa-centric HIV-1 state. Similar adaptation of other pathogens to HLA variants common in affected populations is likely.


Gag Regions in HIV-B (B) and HIV-C (C)
The subtype-specific positions (SSP) 27, 41, 116, 120, and 128 are highlighted in yellow, and the letters representing site-specific amino acids are sized according to their associated frequencies.
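The letter sizes in the logo correspond to per-position amino acid frequencies, and the consensus is the most frequent residue at a position. A minimal Python sketch of both, assuming equal-length aligned p24Gag sequences addressed with the figure's 1-based positions. The helper `is_informative` and its `min_non_consensus` cut-off are hypothetical illustrations of how near-invariant positions could be flagged; they are not the criterion the authors used.

```python
from collections import Counter


def residue_frequencies(sequences, position):
    """Relative amino acid frequencies at a 1-based alignment position, most common first."""
    counts = Counter(seq[position - 1] for seq in sequences)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


def consensus_residue(sequences, position):
    """Most frequent amino acid at the position (ties go to the first residue encountered)."""
    return next(iter(residue_frequencies(sequences, position)))


def is_informative(sequences, position, min_non_consensus=0.05):
    """Hypothetical rule: keep a position only if non-consensus residues reach a minimum share."""
    freqs = residue_frequencies(sequences, position)
    return 1.0 - max(freqs.values()) >= min_non_consensus


# Toy aligned sequences, not real p24Gag data.
aligned = ["MPIVQ", "MPIVQ", "MPLVQ", "MPIVQ"]
print(residue_frequencies(aligned, 3))  # {'I': 0.75, 'L': 0.25}
print(is_informative(aligned, 1), is_informative(aligned, 3))  # False True
```

A position where one residue occupies essentially every sequence carries no contrast between patients, which is why such positions are dropped from the analysis.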
HIV-B position 128 and HIV-C positions 27, 41, and 116 are almost always occupied by the consensus amino acid and therefore cannot be used in the analysis. The HIV-B consensus amino acids are circled. This figure was generated using a single sequence from each HLA-annotated patient in the HIV database.
(A). Three common forms of HLA-associated selective pressure on a single epitope. CTL responses targeting an epitope might select for an upstream mutation (indicated by the orange polygon) that disrupts ERAP trimming (as in (Draenert et al. 2004)), or for intra-epitope mutations that abrogate binding of the epitope to the presenting HLA molecule or T cell receptor (TCR) recognition of the epitope (reviewed in (Goulder and Watkins 2008; Goulder and Walker 2012)). Arrows signify different forms of CTL selective pressure, geometric symbols indicate amino acids, and the green triangle represents an HIV-B subtype-specific amino acid (as in (B)).
(B). Schematic representation of a conserved p24Gag region in HIV-B and HIV-C with a subtype-specific amino acid position (SSP; HIV-B, green triangle; HIV-C, yellow ellipse) and down- and upstream epitope clusters; epitopes a-g indicate epitopes processed by intracellular proteasomes. When a patient presents any of these epitopes, additional HLA-associated selective pressures might select for the intra-epitope CTL-escape mutations described in Figure S2A. The amino acid in the subtype-specific position controls intracellular proteasomal production of the surrounding epitope clusters (Tenzer et al. 2014), each of which can contain 20-50 epitopes presented by approximately as many HLA variants. This structure, in which partly overlapping epitope sequences form clusters in hydrophobic regions separated by a subtype-specific position, is also common in other HIV-1 proteins (Lucchiari-Hartz et al. 2003).
(C), (D). Schematic outline of the outcome of the HLA-associated selective pressure on the subtype-specific positions (experimentally demonstrated in (Tenzer et al. 2014)). The combined HLA-associated selective pressure on the SSP in an HIV-infected population results in an inverse relationship between the abundance of a processed epitope and the frequency of the presenting HLA allele in the population in which the virus circulates (Tenzer et al. 2014). In this hypothetical example, the grey epitope is not produced when the proteasome processes HIV-B because the HLA variants that can present the grey epitope are very common in US Caucasians. In contrast, the red epitope is not produced when the proteasome digests HIV-C because the restricting HLA variant is very common in the South African Zulu population. The likelihood of CTL priming increases with the amount of epitope presented on the infected cell surface (Faroudi et al. 2003; Tenzer et al. 2009).
… Europe is higher than in the US due to a higher proportion of non-B HIV-1-infected immigrants and refugees, primarily from Africa. Furthermore, the epidemic in some European countries was founded partly or mostly by non-B subtypes; for example, the percentage of HIV-B is low in Russia and former Soviet Union countries because of the introduction of HIV-A from the DRC, and in Portugal because of the introduction of HIV-G from Cape Verde, a former Portuguese colony (Beloukas et al. 2016; Diez-Fuertes et al. 2015).
The HIV-B BEAST maximum clade credibility phylogeny annotated with the subtype-specific positions (SSP) 27, 41, 116, and 120 (see Fig. S1). Every leaf node on the phylogeny represents a patient, and the four lines next to each leaf node show whether the majority-rule amino acid in each of the four eligible SSPs was identical to the HIV-B consensus amino acid.
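The four-line annotation can be read as a leaf-by-SSP table of consensus matches. The Python sketch below is a hypothetical illustration, not the authors' pipeline: it builds one row per patient from the majority-rule residue at each SSP (ties are marked `None`, an assumed rule), then computes a crude descriptive score contrasting "vertical" agreement (adjacent leaves, same SSP) with "horizontal" agreement (same leaf, different SSPs) among non-consensus entries. It assumes rows are listed in the phylogeny's leaf order, and it is no substitute for the phylogenetic multiple-response random-effects model used in the study.

```python
from collections import Counter
from itertools import combinations


def majority_residue(residues):
    """Majority-rule residue; returns None on a tie (an assumed tie rule)."""
    ranked = Counter(residues).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def match_row(patient_residues, consensus, ssps):
    """Per SSP: True if the majority residue equals the consensus, False if not, None if tied."""
    row = []
    for ssp in ssps:
        majority = majority_residue(patient_residues[ssp])
        row.append(None if majority is None else majority == consensus[ssp])
    return row


def deviation_agreement(rows):
    """Hypothetical score: among pairs with at least one non-consensus (False) entry,
    the share in which both entries are non-consensus. Rows must be in leaf order."""
    def share(pairs):
        either = sum(1 for a, b in pairs if a is False or b is False)
        both = sum(1 for a, b in pairs if a is False and b is False)
        return both / either if either else float("nan")

    vertical = [(up[j], down[j]) for up, down in zip(rows, rows[1:]) for j in range(len(up))]
    horizontal = [pair for row in rows for pair in combinations(row, 2)]
    return share(vertical), share(horizontal)


# Toy example; residues are placeholders, not real HIV-B consensus amino acids.
ssps = [27, 41, 116, 120]
consensus = {27: "A", 41: "G", 116: "I", 120: "L"}
patient = {27: ["A", "A", "V"], 41: ["V", "V", "G"], 116: ["I", "M"], 120: ["L"]}
print(match_row(patient, consensus, ssps))  # [True, False, None, True]
```

A vertical score well above the horizontal one is the kind of pattern that favours letting each position evolve independently over a single per-patient effect.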
The amino acid patterns shown here follow the phylogeny: continuous patches of red run vertically rather than horizontally, so non-consensus amino acids are shared between closely related sequences rather than across multiple positions in the same patient. This demonstrates the need for a multiple-response random-effects model in which each amino acid is allowed to evolve independently. Had the color pattern been more horizontal, a patient's subtype-specific amino acids would have tended to change as a block, and a single patient parameter (random-effects model) would have been more appropriate than the multiple-random-effects phylogenetic model used in this study.
The multiple correspondence analysis (MCA) converts each patient's amino acid profile at the HIV subtype-specific positions into two dimensions. The MCA was trained using the subtype consensus sequences (labels) and was then used to project each patient's HIV subtype-specific amino acids onto the two-dimensional space. Points were jittered to prevent them from obscuring one another. While many patients' subtype-specific amino acids are identical to the subtype consensus (see the large clusters around the labeled HIV-B and HIV-C consensus sequences), other HIV-B and HIV-C sequences have identical amino acids at the five Gag subtype-specific sites studied here (e.g., around the labeled consensus of HIV-1 circulating recombinant form (CRF) 01_AE).

Figure S3

(modified from (Gragert et al. 2013)).
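Multiple correspondence analysis is correspondence analysis of a one-hot (indicator) table with one column per position-residue category. The pure-Python sketch below is a minimal illustration under stated assumptions, not the authors' code: it fits two axes on reference profiles (standing in for the labeled subtype consensus sequences) by power iteration on S^T S, where S is the matrix of standardized residuals, and then projects a patient profile as a supplementary row by averaging the column standard coordinates of its residues. The function names `fit_mca` and `project`, the toy profiles, and the choice to ignore residues absent from the reference set are all hypothetical.

```python
import math
import random


def _matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def fit_mca(reference_profiles, n_dims=2, n_iter=1000, seed=0):
    """Fit MCA axes on reference profiles (equal-length strings, one residue per SSP)."""
    n, q = len(reference_profiles), len(reference_profiles[0])
    cats = sorted({(j, aa) for p in reference_profiles for j, aa in enumerate(p)})
    index = {cat: k for k, cat in enumerate(cats)}
    m, total = len(cats), n * q
    # Indicator (disjunctive) table: one 0/1 column per (position, residue) category.
    z = [[0.0] * m for _ in range(n)]
    for i, p in enumerate(reference_profiles):
        for j, aa in enumerate(p):
            z[i][index[(j, aa)]] = 1.0
    r = 1.0 / n  # every row sums to q, so every row mass is q / total = 1 / n
    c = [sum(row[k] for row in z) / total for k in range(m)]  # column masses
    # Standardized residuals S_ik = (z_ik / total - r * c_k) / sqrt(r * c_k).
    s = [[(row[k] / total - r * c[k]) / math.sqrt(r * c[k]) for k in range(m)] for row in z]
    cross = [[sum(row[a] * row[b] for row in s) for b in range(m)] for a in range(m)]  # S^T S
    rng = random.Random(seed)
    vectors, eigenvalues = [], []
    for _ in range(n_dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        for _ in range(n_iter):
            w = _matvec(cross, v)
            for u in vectors:  # stay orthogonal to axes already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no inertia left in any new direction
                v = [0.0] * m
                break
            v = [a / norm for a in w]
        vectors.append(v)
        # Rayleigh quotient = principal inertia (eigenvalue) of this axis.
        eigenvalues.append(sum(a * b for a, b in zip(v, _matvec(cross, v))))
    # Column standard coordinates g_dk = v_dk / sqrt(c_k).
    col_coords = [[v[k] / math.sqrt(c[k]) for k in range(m)] for v in vectors]
    return {"index": index, "q": q, "col_coords": col_coords, "eigenvalues": eigenvalues}


def project(model, profile):
    """Supplementary row: average column standard coordinates of the profile's residues."""
    hits = [model["index"][(j, aa)] for j, aa in enumerate(profile) if (j, aa) in model["index"]]
    # Residues unseen in the reference set have no coordinates and contribute zero.
    return tuple(sum(g[k] for k in hits) / model["q"] for g in model["col_coords"])


# Toy reference profiles at five SSPs; placeholder residues, NOT real subtype consensus.
references = ["KSATV", "KSTTI", "RGTSI", "RGASV"]
model = fit_mca(references)
print(model["eigenvalues"])
print(project(model, "KSATI"))  # a hypothetical patient profile
```

Power iteration is used only to keep the sketch dependency-free; a numerical library's singular value decomposition would normally replace it. A patient whose profile equals a reference profile lands exactly on that reference's point, which is what produces the dense clusters around the labeled consensus sequences.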