Modelling and in vitro testing of the HIV-1 Nef fitness landscape

Abstract An effective vaccine is urgently required to curb the HIV-1 epidemic. We have previously described an approach to model the fitness landscape of several HIV-1 proteins, and have validated the results against experimental and clinical data. The fitness landscape may be used to identify mutation patterns harmful to virus viability, and consequently inform the design of immunogens that can target such regions for immunological control. Here we apply such an analysis and complementary experiments to HIV-1 Nef, a multifunctional protein which plays a key role in HIV-1 pathogenesis. We measured Nef-driven replication capacities as well as Nef-mediated CD4 and HLA-I down-modulation capacities of thirty-two different Nef mutants, and tested model predictions against these results. Furthermore, we evaluated the models using 448 patient-derived Nef sequences for which several Nef activities were previously measured. Model predictions correlated significantly with Nef-driven replication and CD4 down-modulation capacities, but not HLA-I down-modulation capacities, of the various Nef mutants. Similarly, in our analysis of patient-derived Nef sequences, CD4 down-modulation capacity correlated the most significantly with model predictions, suggesting that of the tested Nef functions, this is the most important in vivo. Overall, our results highlight how the fitness landscape inferred from patient-derived sequences captures, at least in part, the in vivo functional effects of mutations to Nef. However, the correlation between predictions of the fitness landscape and measured parameters of Nef function is not as accurate as the correlation observed in past studies for other proteins. This may be because of the additional complexity associated with inferring the cost of mutations on the diverse functions of Nef.


Introduction
Despite extensive research efforts, a vaccine or a cure for HIV-1 remains elusive. One of the major hurdles in vaccine design and development is eliciting a human immune response that is not evaded by the highly diverse and mutable HIV-1 (Kwong, Mascola, and Nabel 2012). Because HIV-1 fitness plays a significant role in disease progression (Deacon et al. 1995;Quiñones-Mateu et al. 2000;Song et al. 2012), continuous immune pressure on the wild-type virus favouring the outgrowth of escape variants with diminished fitness is a suggested vaccine strategy (Allen and Altfeld 2008;Chopera et al. 2011a). Although promising, this strategy poses a challenge because HIV-1 develops compensatory mutations that can restore its reduced fitness (Brockman et al. 2007;Crawford et al. 2007;Chopera et al. 2011a). Thus, mutations cannot be considered in isolation when assessing fitness consequences.
Our group has developed, and tested, computational models that aim to predict viral fitness based on the amino acid sequence and thereby quantify the relative fitness of strains containing different mutation patterns (Ferguson et al. 2013;Mann et al. 2014a; Barton et al. 2016b;Butler et al. 2016;Louie et al. 2018). These models may be used to identify deleterious mutational couplings with the end goal of potentially restricting viable escape of HIV-1 with vaccine immunogens that maximise sites that are harmful to HIV-1 when mutated in combination. We have previously applied this modelling approach to the HIV-1 Gag protein and validated the model by measuring the replication capacities of HIV-1 strains with various Gag mutation combinations in vitro (Ferguson et al. 2013;Mann et al. 2014a). It has also been successfully applied to study drug resistance mutations in HIV-1 protease (Butler et al. 2016), and to predict the process of escape in individual patients from T cell-mediated host immune pressure on diverse HIV proteins (Barton et al. 2016b).
Here we report a fitness landscape model of the HIV-1 Nef protein. Nef plays an important role in HIV-1 pathogenesis as well as host immune evasion (Kestler et al. 1991;Collins et al. 1998;Foster and Garcia 2008) making it an attractive vaccine target. Although Nef was included in the unsuccessful STEP trial, there is currently little evidence for or against its inclusion in vaccine strategies (Buchbinder et al. 2008). In this study, we evaluated and compared the accuracy of models of varying complexities in predicting the fitness consequences of mutations in HIV-1 Nef. In order to do so, we performed in vitro measurements of the Nef-driven replication capacities as well as Nef-mediated CD4 and HLA-I down-modulation capacities of thirty-two mutants, which were selected based on their predicted values of fitness derived from in silico models. Nef-mediated CD4 and HLA-I down-modulation capacities of the selected mutants were measured in addition to replicative fitness since these are key activities of the Nef protein that may influence pathogenesis (Iafrate et al. 2000;Swigut et al. 2004;Mann et al. 2014b). While CD4 down-modulation enhances viral replication through promoting the release of infectious viral particles (Ross, Oran, and Cullen 1999;Greenway et al. 2003), HLA-I down-modulation is not expected to directly influence HIV replication in vitro yet may still play a significant role in HIV-1 pathogenesis through enhancing immune evasion (Foster and Garcia 2008). Furthermore, we extended the testing of the model to patient-derived sequences of Nef, 298 subtype C Nef sequences (Mann et al. 2014b) and 150 subtype B Nef sequences (Mwimanzi et al. 2013a,b;Mahiti et al. 2015;Toyoda et al. 2015;Mahiti et al. 2016), for which several Nef functions were previously measured. For the subtype B Nef sequences, in addition to CD4 and HLA-I down-modulation, five other Nef activities were measured, namely down-modulation of CXCR4 and CCR5, upregulation of CD74, enhancement of virion infectivity, and enhancement of viral replication, allowing investigation of which Nef activities align most strongly with model predictions.

Ising and Potts model inference
In order to infer the Ising and Potts models, we downloaded multiple sequence alignments (MSAs) of 11,354 HIV-1 Nef subtype B sequences and 6,469 subtype C sequences, obtained from 3,300 and 1,656 unique individuals, respectively, from the Los Alamos National Laboratory HIV Sequence Database (www.hiv.lanl.gov). We then processed the sequence data and employed the selective cluster expansion method (Cocco andMonasson 2011, 2012;Barton et al. 2016a) to infer Ising and Potts models that capture the pattern of correlated mutations observed in the sequence data, as previously described (Barton et al. 2016b); for completeness, methods for processing sequence data and information about the model inference are summarised in the Supplementary Material. Separate models are inferred for subtypes B and C data. In this model, the estimated prevalence P(z) of an HIV sequence z, represented as a vector of amino acids, is given by Here N ¼ 206 is the length of the protein, and the z i represent the amino acids present at each residue i. The parameters h i (z i ) and J ij (z i , z j ), referred to as fields and couplings, are chosen so that the model reproduces the observed single-and doublemutant frequencies. Note that the values of E, referred to as energy, in Equation (1) are inversely related to prevalence, such that sequences with high E values have low prevalence, and vice versa. In past work, based on HIV biology, its evolutionary history and physics-based models, we have argued that, for HIV, the order of the prevalence of circulating strains is statistically similar to its fitness (i.e. ability to propagate infection) (Butler et al. 2016;Chakraborty 2017;Chakraborty and Barton 2017). This is in part due to HIV's status as a chronic infection, its high mutation and recombination rates, and the great diversity of largely ineffective immune responses against it.
We considered models that account for the diversity of amino acids at each residue (Potts models) with a wide range of complexities, as measured by the entropy threshold S T , for selecting how many amino acids to include explicitly in the model at each residue. The possible values of the entropy threshold range from S T ¼ 0 (the Ising model), where only the consensus amino acid at each residue is explicitly modelled and all other amino acids are treated as the same 'mutant' type, to S T ¼ 1, where all observed amino acids are modelled explicitly. Here we inferred models with S T ¼ 0, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, and 0.9. Overall agreement between the E values for all models was excellent ( Supplementary Fig. S1). We use models trained on sequence data from the same HIV-1 subtype as in the experiment for the comparisons with experimental measurements reported below.

Mutation selection
To evaluate and compare the ability of the Ising and the Potts models to predict the fitness consequences of mutations in HIV-1 Nef, several mutations and mutation combinations were selected for testing based on their computed values of E. The models assign a value of E to each viral strain, which is predicted to be inversely related to the replicative fitness of the strain (Berg, Willmann, and Lä ssig 2004;Sella and Hirsh 2005;Ferguson et al. 2013;Mann et al. 2014a). Mutants were selected subjectively so that they covered the full range of E values. We included mutations in known functional motifs involved in CD4 and HLA-I down-modulation for comparison to what has been documented in literature. We also included HLA-associated mutations (which are likely to be cytotoxic T lymphocyte [CTL] escape mutations and are identified by statistical association with HLA alleles; Carlson et al. 2008) or known CTL escape mutations that covered a range of E values, since it would be desirable to identify CTL escape mutations with functional costs. Different amino acid mutations at the same codon with differential E values were also chosen to test the ability of Potts models to distinguish between different substitutions at the same codon. All mutations and their E values computed by the Ising and Potts models are summarised in Table 1.

Site-directed mutagenesis
The mutations listed in Table 1 were introduced using the QuikChange II XL Site-Directed Mutagenesis kit (Stratagene, La Jolla, CA, USA) into the consensus B nef sequence (2004 consensus sequence available from the Los Alamos National Laboratory HIV Sequence Database [LANL]), which was first cloned into a TOPO vector (Invitrogen, Carlsbad, CA, USA). The procedure involved the generation of mutants through polymerase chain reaction (PCR) amplification of the desired template with custom designed mutagenic primers. The mutant plasmid was then transformed into XL10-Gold ultracompetent cells and sequencing was performed to confirm the presence of the correct mutation in nef-containing colonies. Following successful introduction of mutations, the mutant nef sequences were cloned into an HIV-1 NL4-3 plasmid. Briefly, an SF2 nef NL4-3 plasmid (containing NcoI and NotI restriction sites flanking SF2 nef), which was previously prepared (Fackler et al. 2006;Ueno et al. 2008), was digested with NcoI and NotI enzymes and the resulting nef-deleted NL4-3 plasmid was gel purified. PCR was performed to amplify the mutant nef sequences using forward and reverse primers containing the NcoI and NotI restriction sites, respectively. PCR products were digested with NcoI and NotI enzymes and then ligated to the digested nef-deleted NL4-3 plasmid using T4 ligase. The ligation mixture was transformed into XL10-Gold ultracompetent cells. Plasmids were then purified using a Qiagen maxiprep kit and sequenced to verify the correct mutant sequence.

Generation of mutant viruses in reporter cell lines and measurement of their replication capacities following virus infection of primary cells
Mutant viruses were generated from the mutant nef NL4-3 plasmids as previously described (Wright et al. 2012). Briefly, this was carried out by electroporating 4 million green fluorescent protein (GFP)-reporter CEM-GXR T cells (described in Brockman et al. 2006; and obtained from Prof. Mark Brockman from Simon Fraser University) with 10 lg mutant plasmid. This resulted in production of mutant virus from the cells transfected with mutant plasmids. The percentage infected cells (GFP-positive cells) was monitored by flow cytometry and mutant viruses were harvested at 30 per cent infection. Replication capacities of the mutant viruses were measured in peripheral blood mononuclear cells (PBMCs) from two different HIV-1 negative donors as previously described (Ueno et al. 2008). Briefly, 1 million PBMCs were infected with mutant viruses at 7.5 ng p24 (pre-determined by p24 ELISA using the Retro-trek HIV-1 p24 Antigen ELISA 2.0 kit [ZeptoMetrix corporation, New York, NY, USA]), and were equally divided into four wells (200 ml each) of a ninety-six-well plate. On Day 3 of incubation, 100 ml supernatant was removed and PBMCs were stimulated with 100 ml of 10 mg/ml phytohaemagglutinin, and thereafter every 3 days 100 ml supernatant was removed and replaced with medium containing 20 U/ml interleukin-2. p24 concentration on Day 12 of incubation, at which point the viruses had detectable replication but had not yet surpassed peak replication, was used as the measure of replication capacity. The replication capacities of the mutant viruses were expressed as the mean of quadruplicate assessments in each donor and normalised to that of the LANL consensus B Nef-NL4-3 wild-type virus.
2.2.4 CD4 and HLA-I down-modulation assay using mutant Nef sequences cloned into pSELECT plasmid Nef-mediated CD4 and HLA-I down-modulation assays were performed as previously described in a GXR cell line which expresses high levels of CD4 and HLA-A*02 (Brockman et al. 2006) (obtained from Prof. Mark Brockman of Simon Fraser University). Briefly, the mutant nef sequences were cloned, as previously described (Mann et al. 2013), into a zeocin-resistant GFP-expressing plasmid (pSELECT) as this allowed detection by flow cytometry of cells transfected with Nef clones. GXR cells (600,000) were transfected by electroporation with the Nef clones (8 mg). Following a 20-h incubation, the cells were stained with fluorochrome-conjugated antibodies which bind to CD4 (APC-labelled anti-CD4 antibody, BD Biosciences, San Jose, CA, USA) and HLA-A*02 (PE-labelled HLA-A*02 antibody, BD Biosciences) molecules to allow for measurement of these molecules by flow cytometry. The down-modulation capacities of the mutant Nef clones were indicated by median fluorescence intensity of CD4 or HLA-A*02 expression in GFP-positive cells relative to that of the positive (SF2 pSELECT) and negative (DNef pSELECT) controls. This value was then normalised to that of the LANL consensus B Nef which was included in every assay. Assays were performed in duplicate and the results averaged.

Correlation analysis for Nef mutants
The relationship between the predicted fitness costs of the mutant viruses (expressed as E values) and the measured replication capacities of the mutant viruses was evaluated using Spearman's rank correlation. The relationship between the E values and the measured Nef-mediated CD4 and HLA-I downmodulation capacities was similarly evaluated. A significance level of P < 5 Â 10 À2 was used for all statistical analyses. E values were not corrected for the influence of phylogeny.

Functional analysis of patient-derived Nef sequences
The testing of the relationship between model energies and Nef function was extended to previously published patient-derived Nef sequences of subtype C (n ¼ 298) (Mann et al. 2014b) and subtype B (n ¼ 150) (Mwimanzi et al. 2013a,b;Mahiti et al. 2015Mahiti et al. , 2016Toyoda et al. 2015), with each sequence obtained from a different patient. The subtype C Nef sequences were derived from southern Africa and comprised 107 sequences from recently infected individuals and 191 sequences from chronically infected individuals. CD4 and HLA-I down-modulation activities were previously measured, using Nef sequences cloned into pSELECT plasmid as described above, for those Nef sequences (Mann et al. 2014b). The subtype B Nef sequences were derived from Boston, NY, and Berlin, Germany, and comprised ninetyone sequences from chronically infected individuals (forty-five of whom were elite controllers) and fifty-nine sequences from acute infection. Down-modulation of HLA-I, CD4, CXCR4, and CCR5, upregulation of CD74, enhancement of virion infectivity, and enhancement of viral replication was previously measured for these subtype B Nef sequences, as previously described ( Toyoda et al. 2015). Briefly, HLA-I down-modulation, upregulation of CD74, virion infectivity, and viral replication capacity were measured using NL4-3 recombinant viruses encoding the patient-derived subtype B Nef sequences (Mwimanzi et al. 2013a,b;Mahiti et al. 2015Mahiti et al. , 2016. The Nef sequences were first cloned into a nef-deleted NL4-3 plasmid, as described earlier for the Nef mutant viruses, and then the recombinant viruses were generated by the transfection of HEK-293T cells followed by harvest of culture supernatants 48 h later (Mwimanzi et al. 2013b). HLA-I down-modulation and CD74 upregulation were determined in a 721.221 cell line stably expressing HLA-A*24:02, by infecting these cells with recombinant virus (300 ng p24) and staining cells 48 h later with fluorescently labelled antibodies for these molecules followed by flow cytometry (Mahiti et al. 2015). Nef-mediated enhancement of virion infectivity was measured in TZM-bl cells by chemiluminescence detection following infection of these cells with recombinant virus (3 ng p24) (Mwimanzi et al. 2013b). Nef-mediated enhancement of viral replication was measured following infection of PBMCs with recombinant virus (Mwimanzi et al. 2013b), as described earlier for the mutant Nef viruses. CD4, CXCR4, and CCR5 downmodulation were measured by transfecting Nef clones (patientderived Nef sequences were cloned into the pIRES2-EGFP vector) into TZM-bl cells and subsequently detecting expression of these molecules by flow cytometry using fluorescently labelled antibodies (Toyoda et al. 2015). Notably, the CD4 measurements for the same subtype B Nef clones obtained in the TZM-bl cell line concurred well (r ¼ 0.9 and P < 0.0001) with those obtained in a CEM T cell line (Mwimanzi et al. 2013a,b), as was used for CD4 down-regulation measurements for the mutant Nef and subtype C patient-derived Nef clones.

Correlation analysis for patient-derived Nef sequences
Past work has shown a close relationship between the value of E in Equation (1) and fitness for a variety of HIV proteins (Ferguson et al. 2013;Mann et al. 2014a; Barton et al. 2016b;Louie et al. 2018). However, as mentioned above, contributions to fitness are more difficult to experimentally assess for a multifunctional protein such as Nef, especially since some of its functions may only be important in vivo. Furthermore, the existence of multiple functions that contribute to fitness makes comparisons between individual functional measurements and E more challenging. This is because E, which is inferred from diverse patient-derived sequences, is a proxy for in vivo fitness and concatenates the effects of mutations on the different Nef functions (including the unknown weights of each function in vivo). For example, consider a Nef protein that is unable to perform one of its critical functions (e.g. CD4 down-modulation), but where other, independent functions remain intact. In principle, because of this critical defect we would expect the E value of the corresponding sequence to be high. This makes sense in the context of the one function that is impaired, but not for the other Nef functions, which may appear normal. In short, high E values would lead us to expect low fitness and therefore functional impairment, but it need not be true that all functions are impaired in order for fitness to be affected. Thus, for a multifunctional protein, univariate correlations between E and individual functional measurements may not be strong even if E is a good predictor of overall fitness. Thus, in order to develop a more complete picture of the potential contributions of different Nef functions to fitness, we used Nef functional measurements to predict the E value of the corresponding sequence with multiple linear regression.
Assuming that E values are a reasonable proxy for fitness in vivo, this analysis would allow us to determine the extent to which impairment of each of the functions measured here contributes to fitness. In addition, we measured the correlation between each functional measurement and E as described above, but bearing in mind that these individual measurements do not reflect the complete picture.

Analysis of mutant Nef sequences
3.1.1 Ising and Potts model predictions correlate with replication capacities of mutant viruses The replication capacities of the Nef mutants were determined by infection of PBMCs followed by supernatant p24 ELISA measurement, and were normalised to the wild-type virus ( Table 2). The experiments were performed in duplicate, using PBMCs from two different donors, and the results were averaged (concordance between replicates: Pearson r ¼ 0.78, P ¼ 3 Â 10 À7 ; Supplementary Fig. S2). E values for all models correlated significantly with the mutant replication capacities (Fig. 1A). The correlation did not depend on model complexity, that is, S T ( Supplementary Fig. S3). Indeed, pairwise comparisons across models suggest that all differences in correlations fall short of the threshold of statistical significance (Supplementary Table  S1). In the subsequent analysis we therefore present only correlations for the Potts model with S T ¼ 0.9, which we have employed in past work (Mann et al. 2014a; Barton et al. 2016b). Results for models with other values of S T are similar in all cases. Full results are included in the Supplementary Material. Moreover, independent site forms of the model (i.e. those that include only the terms h i (z i ) in Equation (1)) display similar levels of correlation (see Supplementary Fig. S4). This is somewhat surprising and is in contrast with previous work on other HIV proteins, which showed that models with couplings outperformed simple entropy (Ferguson et al. 2013;Mann et al. 2014a; Barton et al. 2016b) or models without couplings (Louie et al. 2018) for diverse HIV proteins. Furthermore, couplings were shown to make a more substantial effect on the dynamics of escape (Barton et al. 2016b), compared to static metrics of fitness measured in vitro. This is because of the high mutability and replication rates of HIV, which allows the virus to sample rare compensatory pathways dynamically. Such rare, coupled mutations may be uncommon in our data outside of T cell epitopes, which would contribute a small fraction to the energy overall. Past studies of bacterial proteins have also found an advantage to the inclusion of couplings for functional predictions (Figliuzzi et al. 2016). One reason for this may be that the Nef sequences in the patient-derived samples have mutations in few pairs of sites with strong couplings, or that the cumulative effect of the fields overwhelms the effects of couplings for sequences that are more phylogenetically distant (see Supplementary Fig. S5). Another may be that the multiple functions of Nef obscure the effect of couplings when we compare with in vitro assays of one function at a time.
Thus, our data can test our ability to predict the fitness landscape of HIV, but significantly more data would be required to test distinctions between the predictive capabilities of the different Potts models. Although good, the correlation between replication capacity and E was lower for Nef than in previous comparisons using Gag (Mann et al. 2014a) and Env (Louie et al. 2018) mutants, and the accuracy of estimates of viral evolution in individual patients using our estimated fitness landscape for other proteins (Barton et al. 2016b). This may be due to the complex behaviour of Nef in vivo, which is reflected in our fitness landscape inferred from in vivo data, only part of which is captured through in vitro assays of replication capacity. Consistent with this hypothesis, as we describe below, our data on CD4 down-modulation also correlates with predictions from the fitness landscape.
3.1.2 Model predictions are significantly correlated with CD4 down-modulation but not with HLA-I down-modulation capacities of mutant Nef sequences GXR cells expressing high levels of HLA-A*02 were transfected with the mutant Nef clones and cell surface expression of CD4 and HLA-A*02 in transfected cells was subsequently measured by flow cytometry. The CD4 and HLA-I down-modulation capacities of the mutant Nef clones are presented in Table 2. Representative flow cytometry plots are illustrated in Fig. 2.
E values correlated significantly with the mutant CD4 downmodulation capacities (Spearman q ¼ À0.48, P ¼ 6 Â 10 À3 ; Fig. 1B), with little difference in correlation between model complexities ( Supplementary Fig. S3). Substantial reductions in CD4 downmodulation are only observed at E values >5 (Fig. 3A). Further analysis showed that Nef mutant sequences with impaired ability to down-modulate CD4 (<0.8) have significantly higher energies than those that are competent at CD4 down-modulation (Mann-Whitney U ¼ 7, P ¼ 2 Â 10 À3 ), consistent with the above observation (Fig. 3A). There was no significant rank correlation between E values and the mutant HLA-I down-modulation capacities (Spearman q ¼ À0.16, P ¼ 4 Â 10 À1 ) (Fig. 1C), though this correlation is more difficult to evaluate due to the small number of Nef mutant sequences with significantly impaired ability to down-modulate HLA-I. Consistent with this, Nef mutant sequences with impaired ability to down-modulate HLA-I have higher energies than those that are competent, but near the threshold of significance (Mann-Whitney U ¼ 3, P ¼ 0.04) (Fig. 3B). As expected, CD4 down-modulation was correlated positively with replication capacity (q ¼ 0.58 and P ¼ 7 Â 10 À4 ; Fig. 3C) while no strong correlation was observed between HLA-I down-modulation and replication capacity (q ¼ 0.12 and P ¼ 5 Â 10 À1 ; Fig. 3C). However, due to the small sample of Nef mutant sequences with impaired ability to down-modulate CD4 and/or HLA-I, it is difficult to compare their importance based on this data alone. We explore this question in more detail in the analysis of a large sample of patient-derived sequences, described below.

CD4 and HLA-I down-modulation abilities of Nef mutant sequences are consistent with known functional motifs
Several of the Nef mutations tested here (17K19K, 57G, 57R, 57R58P, 72L75L, and 123G) were in functional motifs previously Escape or HLA-associated mutations (see Table 1 for further details). c These results were unobtainable due to unsuccessful cloning of mutants 192R into the NL4-3 plasmid and 28E102H into the pSELECT plasmid.
described to be involved in either CD4 and/or HLA-I downregulation (Table 1). We observed that Nef mutants 57G, 57R, 57R58P, 72L75L, and 123G had impaired ability to downmodulate CD4 (13.8-65.3% of wild-type levels). Accordingly, these mutants, with the exception of 72L75L, were previously noted in the literature to be important for CD4 down-modulation activity (Table 1). Also consistent with the literature, mutants 57G, 57R, and 57R58P retained the ability to down-modulate HLA (100-102% of wild-type levels) while CD4 down-modulation activity was lessened, and 123G was impaired for both CD4 and HLA down-modulation (28.7 and 32%, respectively). Mutations (17K19K, 72L75L, and 123G) in motifs previously shown to be required for HLA-I down-modulation (Table 1) also showed reduced HLA down-modulation relative to the wild-type virus (94.4, 54, and 32%, respectively), although for mutant 17K19K, HLA-I down-modulation was not substantially impaired. Therefore all mutations in the known functional motifs, with the exception of 17K19K, had substantially impaired function. These mutations, including the 17K19K, are predicted to have large values of E (E > 5, Table 1), consistent with experimental data.
3.2 HLA-associated/escape mutants tested did not significantly impair Nef function A subset of HLA-associated mutations (likely cytotoxic T cell escape mutants) or known escape mutants, as well as pairs of these mutations, was selected that covered a range of E values ( Table 1) as identification of escape mutants/pairs with substantial fitness costs would be useful for attenuation-based HIV vaccine design. Only one of the mutants in this category displayed a marked reduction in replication capacity, namely 80D (18.6% of wild-type levels), while the rest ranged from 46.6 to 146.3 per cent of wild-type levels (median of 85.1%) ( Table 2). Overall, CD4 and HLA down-modulation capacities of mutants in this category were largely similar to wild-type ( Table 2). The CD4 down-modulation capacities ranged from 90 to 102.6 per cent of wild-type levels (median of 99.4%) and HLA down-modulation capacities from 80.4 to 103.8 per cent of wild-type levels (median of 96.8%). This is in contrast to our previous study testing the models of the HIV-1 Gag fitness landscape where we identified several pairs of HLA-associated mutations with high E values that had substantial fitness costs i.e. viruses bearing these mutations did not replicate in vitro (Mann et al. 2014a). While there are HLA-associated mutations in Nef that reduce its function (Ueno et al. 2008;Kuang et al. 2014;Shahid et al. 2015), our results indicate that it is more difficult to identify escape mutations with a substantial fitness cost in this highly variable protein. Furthermore, considering that some of the HLAassociated mutations we studied here were previously reported to significantly impact Nef function (71K and 88G) (Mann et al. 2015;Naidoo et al. 2019) but had very moderate effects here in the LANL consensus subtype B background, the fitness cost of these mutations may be dependent on sequence background. However, in these results we emphasise that we have tested only a subset of HLA-associated mutations, not the comprehensive list of all escape or HLA-associated mutations in Nef.

Functional measurements of different amino acids at the same codon are in mixed agreement with E values predicted by the Potts model
We explored whether the fitness outcome of different mutations at the same codon could be distinguished by Potts models. We tested the effect of different mutations at codons 21, 57, 80, 102 and codon combination 28/102 on Nef-driven replication capacity as well as Nef-mediated CD4 and HLA-I down-modulation ability (Table 2) represented examples of the most distinct Potts model E values for different pairs of amino acids at the same codon. Mutants 57G and 57R both have similar, and very small, replication capacities, but the ability of 57G to down-modulate CD4 is substantially lower than for 57R. In agreement with this disparity, the Potts model assigns a higher E cost for the 57G mutation. Potts model predictions also successfully capture differences in fitness for 80D and 80N. Consistent with the higher E of 80D compared to 80N, the replication capacity of 80D is substantially lower than that of 80N. Next, we find that 102H has lower replication capacity than 102W, though the difference is not large compared to the variance in replication capacity measurements between experimental replicates. However, the HLA-I down-modulation ability of 102W is lower than that of 102H, and is in fact the third smallest among all Nef clones tested here. The Potts model predicts a higher fitness cost for 102W than for 102H, which is consistent with the HLA downmodulation data, but inconsistent with experimental replication capacities. The Potts model predicts higher fitness costs for 28E102W than for 28E102H, and in line with this the replication capacity of 28E102W was significantly lower than 28E102H. Finally, mutant 21E displays much higher replication capacity than 21K, with comparable CD4 and HLA-I down-modulation capabilities. However, the Potts model E cost of 21E is larger than 21K, and thus this difference is not correctly predicted by the model.
In summary, in three out of five instances the Potts model predictions were consistent with fitness/functional differences of different amino acid variants at the same codon. In one of the remaining two cases (102H versus 102W), the fitness/functional difference between the different amino acid variants is ambiguous, and for the other the model prediction is incorrect. In the case of 102H versus 102W, it is possible that the model captures a stronger influence of attenuated HLA-I down-modulation on in vivo fitness.

Analysis of patient-derived Nef sequences: CD4 down-regulation has the strongest connection with E values
Following our initial analysis of thirty-two closely-related Nef mutants, where each of the mutant sequences differed by only one or two amino acids from each other and a maximum of seven amino acids from the MSA, we next explored whether the model energies were also significantly related to the function of patient-derived Nef sequences which were considerably different from the constructed sequences and each other. Comparison between distant sequences is more challenging in part due to the influence of phylogeny, which affects prevalence (i.e. sequences closer to consensus are disproportionately likely to be observed). Prior work suggests that phylogeny particularly influences the inferred h i shown in Equation (1) (Shekhar et al. 2013). The effects of these biases on relative E values are anticipated to be small for closely-related sequences such as the collection of Nef mutants described above. However, for patientderived Nef sequences that can differ by tens of mutations, phylogeny-induced shifts in the E values may make dominant contributions to the energy.
Using 298 subtype C Nef sequences (Mann et al. 2014b), as for the mutant Nef sequences, we observed a statistically significant correlation between the model energies and CD4 down-modulation activities of these patient-derived sequences (q ¼ À0.22, P ¼ 1.5 Â 10 À4 ) (Fig. 4A), although this was a weaker correlation than that observed with the mutant sequences. Interestingly, we found a weaker correlation between the model energies and HLA-I down-modulation ability that was not consistently statistically significant (q ¼ À0.13, P ¼ 2.7 Â 10 À2 ) (Fig. 4B). A limitation of the analyses presented for the mutant Nef sequences and subtype C Nef sequences is that only two to three Nef activities were measured, albeit ones suggested to significantly influence clinical outcome (Iafrate et al. 2000;Swigut et al. 2004;Mwimanzi et al. 2012;Mann et al. 2014b), while Nef displays a multitude of activities that may influence pathogenesis.
However, as detailed in Section 2, such individual comparisons can be misleading because, assuming that E is a good proxy for fitness, a defect in one important Nef function may lead us to infer a high E for the corresponding sequence even though other Nef functions remain intact. We thus employed multiple linear regression to gain insight into the overall shown, together with Spearman correlations. Note that all viruses with substantially impaired ability to down-modulate CD4 or HLA also have low replication capacities.
relationship between E and Nef function, using Nef functional measurements to estimate the E. This also allows us to roughly quantify how much each Nef function contributes to in vivo fitness: although E is only an estimate it represents the totality of Nef function, and our goal was to determine which of the components measured would align more closely (and therefore likely contribute most) to the sum of Nef function. Here we normalised values across functional measurements by converting them to z-scores. Using multiple linear regression, we found that overall these seven Nef functions could explain approximately 25 per cent of the variance in energies (R 2 ¼ 0.26, adjusted R 2 ¼ 0.19, P ¼ 6.2 Â 10 À4 ; see Table 3). CD4 downmodulation had the largest and most significant association with E, suggesting that this function of Nef is particularly important in vivo. We find similar results using quantile regression, a form of linear regression that aims to minimise total absolute residuals, thereby being more robust to outliers (see Supplementary Table S2).

Discussion
In this study we inferred computational models of the HIV-1 Nef fitness landscape and firstly evaluated the models by testing predictions against in vitro measurements of Nef-driven replication capacities as well as the Nef-mediated CD4 and HLA-I down-modulation capacities of thirty-two different Nef mutants. We observed that predictions from models of varying complexities correlated significantly with Nef-driven replication capacities and CD4 down-modulation capacities, but not HLA-I down-modulation capacities, of the various Nef mutants: mutations with higher values of E tended to show lower Nef-driven replication as well as costs to CD4 down-modulation capacity. However, these correlations were weaker than those previously observed between comparisons of fitness landscape-based predictions for other HIV proteins and in vitro and clinical data (Ferguson et al. 2013;Mann et al. 2014a;Barton et al. 2016b;Butler et al. 2016;Louie et al. 2018). For example, correlations of r ¼ À0.81 (Ferguson et al. 2013) and r ¼ À0.83 (Mann et al. 2014a) were previously observed between measured HIV Gag-driven replication capacity and E and similar results (r ¼ À0.74) were also found for HIV Env in recent work (Louie et al. 2018). This difference may be due in part to the more complex way in which Nef contributes to HIV-1 fitness in vivo (Foster and Garcia 2008), which is challenging to comprehensively measure through in vitro tests. It is important to note that, because our fitness model is constructed from sequences derived from patients, it reflects a concatenation of multiple Nef functions (Foster and Garcia 2008). This presents a challenge in using our approach to predict which specific Nef functions of a particular sequence might be impaired because selection for multiple functions contributes to the distribution of sequences that are observed in clinical data. Furthermore these pressures on the different functions of Nef may fluctuate during disease progression (Carl et al. 2001;Lewis et al. 2008). Our model also neglects intergenic epistasis, which could be one additional source of variance especially for the widely-diverged natural Nef sequences which were cloned into the same viral background for experimental measurements. Prior work has estimated that intergenic epistasis is small for HIV (Hinkley et al. 2011), but we cannot rule out its influence. It is also true that our inferred fitness landscape can only be expected to be correct in a statistical way, not necessarily precise for every case. Consistent with this, the experimental data presented here are significantly correlated with model predictions, indicating that the models were able to predict fitness costs in the Nef protein better than chance. Importantly, we observed that the functional experimental outcomes for mutations within known functional motifs involved in Nef-mediated CD4 and/or HLA-I down-modulation were consistent with what was previously documented in the literature, supporting the validity of our functional measurements. Furthermore, we extended our evaluation of the models to patient-derived sequences of Nef, 298 subtype C Nef sequences (for which CD4 and HLA-I down-modulation capacities were A B Figure 4. Relationships between in vitro Nef-mediated CD4 and HLA-I down-modulation capacities of natural subtype C Nef sequences and E values. (A) Energy values both Ising and Potts model energies are significantly negatively correlated with CD4 down-modulation. Note that energies shown here are derived from models trained on subtype C sequence data. (B) Correlation between energies and HLA-I down-modulation capacities are weaker than for CD4 down-modulation. measured) (Mann et al. 2014b) and 150 subtype B Nef sequences (for which 7 different Nef activities were measured) (Mwimanzi et al. 2013a,b;Mahiti et al. 2015Mahiti et al. , 2016Toyoda et al. 2015), which were far apart in sequence space from each other and the MSA from which the landscape was inferred. Nevertheless, we still observed a significant inverse correlation between CD4 downmodulation capacity of these patient-derived Nef sequences and model energies. The subtype B patient-derived Nef sequences represented a more comprehensive analysis of Nef function.
In combination, the seven Nef activities significantly correlated with the model, although only CD4 down-modulation, followed by CXCR4/CCR5 down-modulation (at borderline significance), correlated individually, and CD4 down-modulation capacity contributed the most to the correlation in the multiple linear regression. Thus, in all analyses of mutant and patient-derived Nef sequences, CD4 down-modulation emerged as the Nef activity most strongly aligned with model predictions, suggesting that this Nef activity is the strongest determinant (out of the Nef activities measured here) of the in vivo effect of Nef, although we cannot rule out that other Nef activities not measured here, known or unknown, may be equally or more important. In support of this, in previous work when a Nef motif essential for CD4 down-modulation was disrupted (yet HLA down-modulation function was intact) simian immunodeficiency virus replication was greatly reduced (Iafrate et al. 2000), and studies in the humanised mouse model indicated that CD4 down-modulation plus one or more unknown Nef activities contribute the most strongly to Nef's pathogenic effect (Watkins et al. 2013).
It should be noted that there were some differences between the results of the mutant and patient-derived Nef analyses. For example, in the mutant analysis, HLA down-modulation capacity did not correlate significantly with the model energies; however, in the analysis of the subtype C patient-derived Nef sequences, it did, although not consistently for all model complexities. This discrepancy may be partly due to the fact that only two of the mutants displayed substantially reduced HLA-I down-modulation capacity, while there was a greater spread of values for this parameter in the patient-derived Nef sequences, which limited our ability to accurately assess correlations between HLA-I down-modulation and other measurements in the mutant group. The overall correlation values with individual functional measurements are also modest, and thus larger sample sizes are needed to observe them reliably.
The Potts model is expected to be more reliable than the Ising model in predicting the fitness costs of mutations in highly mutable proteins such as Nef since the Potts model has residue-specific resolution. The Ising model does not consider the identity of the mutant amino acid but rather uses a binary approximation (all mutant amino acids are denoted as one and the wild-type amino acid is denoted as zero) which we expected may not be sufficient for highly variable proteins. However, while the E values of all models correlated significantly with replication capacity and CD4 down-modulation of the mutant Nef sequences, differences between the models could not be determined in a statistically significant way (Supplementary  Table S1). We also observed that the values of E assigned by the Potts model to different mutations at the same codon correctly reflected the functional/fitness consequences of these mutations in three out of five cases studied, was ambiguous in one case, and incorrect in one case.
In aggregate, we observe that the performance of the Ising and Potts models is similar. The surprising success of the Ising model may be because, in most cases, the most common mutation at a particular codon was tested. The binary approximation of the Ising model is expected to be more representative of the most common mutations rather than rare ones (Mann et al. 2014a). The difference between the expected and experimental outcomes might also be due in part to the inclusion of less wellsampled and thus noisier mutation information, in particular for the most complex models (S T > 0.8, where performance more clearly degrades). Essentially, while considering multiple amino acids provides more information about the sequence sequences. In order to compare different functional measurements versus energies on roughly equal terms, all functional measurements were converted to zscores (i.e. the plotted values represent individual functional measurements minus the mean, divided by the standard deviation). The association is strongest for CD4 down-modulation (see Section 3 for details). distribution, the negative effect of additional noise may potentially outweigh the gain in information. While our data may potentially caution against overly complex models, more extensive tests will be required in order to thoroughly test the association between model complexity and predictions of viral function.
Our study has additional limitations. Firstly, only the downmodulation of HLA-A*02 was measured for the mutant Nef sequences and the subtype C patient-derived sequences (while HLA-A*24:02 was measured for the subtype B patient-derived sequences). However, in a previous study Nef-mediated HLA-A*02 down-modulation was highly correlated with HLA-B*07 down-modulation (Spearman q ¼ 0.89, P < 1 Â 10 À4 ) (Mann et al. 2013). Additionally, HLA down-modulation occurs through a sequence shared by the cytoplasmic tail of HLA-A and HLA-B molecules (Le Gall et al. 1998). Therefore, the ability of Nef to down-modulate HLA-A and HLA-B molecules may be generally represented by its ability to down-modulate HLA-A*02. However, it is possible that the ability of Nef to down-modulate HLA may differ depending on the HLA allele (Mahiti et al. 2016) and therefore it may be worthwhile to measure downmodulation of different HLA alleles. The second limitation was that even though Nef is involved in many cellular functions (Foster et al. 2011), we focussed only on replication capacity, CD4 down-modulation, and HLA-I down-modulation for the mutant Nef sequences. Using in vitro assays that only measure limited known functions of the multifunctional Nef protein, for which there may be yet unknown important functions, is a significant limitation for our study. However, while the functions we measured through our in vitro assays do not represent a comprehensive list, it was shown that they are important in vivo and do influence clinical outcome (Iafrate et al. 2000;Swigut et al. 2004;Mwimanzi et al. 2012;Mann et al. 2014b). Furthermore, this limitation was somewhat mitigated by our analysis of 150 subtype B patient-derived Nef sequences for which 7 different Nef activities were measured. Lastly, we utilised PBMCs, since the presence of Nef is not essential for viral replication in many immortalised T cell lines (Lundquist et al. 2002), but PBMCs are highly variable between donors (Brockman et al. 2006). Nevertheless, we measured Nef-driven replication capacities using two different donors in this study and the measurements of the two donors were in good agreement. Variance between replication capacity measurements in different donors does limit fine distinctions between replication capacities, however.
In conclusion, the experimental data was in significant agreement with predictions of the fitness consequences of mutations in the Nef protein, and with continuing validation efforts (Ferguson et al. 2013;Mann et al. 2014a; Barton et al. 2016b;Butler et al. 2016;Chakraborty 2017;Chakraborty and Barton 2017), these models can be used to direct immunogen design in a manner similar to that outlined for the Gag protein (Ferguson et al. 2013).