Elucidating the Molecular Determinants of Aβ Aggregation with Deep Mutational Scanning

Despite the importance of Aβ aggregation in Alzheimer’s disease etiology, our understanding of the sequence determinants of aggregation is sparse and largely derived from in vitro studies. For example, in vitro proline and alanine scanning mutagenesis of Aβ40 proposed core regions important for aggregation. However, we lack even this limited mutagenesis data for the more disease-relevant Aβ42. Thus, to better understand the molecular determinants of Aβ42 aggregation in a cell-based system, we combined a yeast DHFR aggregation assay with deep mutational scanning. We measured the effect of 791 of the 798 possible single amino acid substitutions on the aggregation propensity of Aβ42. We found that ∼75% of substitutions, largely to hydrophobic residues, maintained or increased aggregation. We identified 11 positions at which substitutions, particularly to hydrophilic and charged amino acids, disrupted Aβ aggregation. These critical positions were similar but not identical to critical positions identified in previous Aβ mutagenesis studies. Finally, we analyzed our large-scale mutagenesis data in the context of different Aβ aggregate structural models, finding that the mutagenesis data agreed best with models derived from fibrils seeded using brain-derived Aβ aggregates.

treatment with the competitive DHFR inhibitor methotrexate, yeast expressing soluble Aβ variants grow rapidly, whereas yeast expressing aggregating Aβ variants grow slowly.
Mutagenesis can elucidate the role of individual residues in protein aggregation. For example, in vitro proline (Williams et al. 2004) and alanine (Williams, Shivaprasad, and Wetzel 2006) scanning mutagenesis of Aβ40 revealed core regions important for aggregation. However, we lack even this limited mutagenesis data for the more disease-relevant Aβ42 and, so far, the majority of mutagenesis studies have been performed in vitro.
Thus, to fully understand the molecular determinants of Aβ42 aggregation in a cell-based system, we combined the yeast growth-based aggregation assay with deep mutational scanning (Araya and Fowler 2011) to measure the effect of 791 of the 798 possible single amino acid substitutions on the aggregation propensity of Aβ42. We used high-throughput DNA sequencing to track the frequency of each Aβ42 variant during the selection, enabling us to assign a solubility score to every variant. We present the first large-scale, cell-based mutational analysis of Aβ, illuminating the physicochemical properties of amino acids that abrogate, promote or do not affect Aβ aggregation. Of the 791 single amino acid Aβ variants we evaluated, 75% maintained or increased aggregation. In addition, we identified 11 positions at which substitutions, particularly to hydrophilic and charged amino acids, disrupted Aβ aggregation. These critical positions were similar but not identical to critical positions identified in previous Aβ mutagenesis studies. Finally, we analyzed our large-scale mutagenesis data in the context of different Aβ aggregate structural models, finding that some structures were plausible whereas others were not.

Library construction
The library was cloned using in vivo assembly (García-Nafría et al. 2016). First, a forward primer containing a 5′ homology region, an NNK codon, and a 3′ extension region was designed for each codon in Aβ42 (Table S1). The homology and extension regions were at least 15 nucleotides in length and had melting temperatures greater than 55 °C. Reverse primers were the reverse complement of the 5′ homology region.
A separate PCR reaction was performed for each codon. These reactions contained 40 ng template (p416GAL1-Aβ-DHFR) and 10 µM forward and reverse primers (IDT, custom oligos) in a total reaction volume of 30 µL. The following cycling conditions were used: 95 °C for 3 min; 8 cycles of [98 °C for 20 sec, 60 °C for 15 sec, 72 °C for 9 min]; 72 °C for 9 min. After PCR, 7.5 µL of each product was run on a 1.5% agarose gel for 30 min at 100 V to check for a single product. The remaining 22.5 µL aliquots of product were each digested for an hour at 37 °C with 0.6 µL of DpnI (NEB, R0176S). After digestion, 4 µL of each linear product was transformed into 50 µL of TOP10F Chemically Competent E. coli (ThermoFisher, C303003) according to the manufacturer's instructions, with the following modifications: the protocol was done in a 96-well plate, and cells were recovered in a total volume of 200 µL SOC. After recovery, cells were transferred to a deep-well plate with 1.6-1.8 mL of LB + ampicillin and shaken overnight. To estimate colony count, 50 µL of culture was plated on an LB + ampicillin agar plate. Deep-well plates and agar plates were incubated at 37 °C overnight. After incubation, all 42 deep-well plate cultures were combined and subjected to a Midiprep (Sigma, NA0200).
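The primer layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 15-nucleotide default flank, and the toy template are hypothetical and are not taken from Table S1, and the melting-temperature check mentioned in the text is deliberately left out.

```python
from itertools import product

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def design_nnk_primers(template, codon_start, flank=15):
    """Build a forward/reverse primer pair for one codon (hypothetical helper).

    template    -- template DNA sequence as a string
    codon_start -- 0-based index of the first base of the target codon
    flank       -- length of the homology and extension regions (assumed 15 nt)
    """
    if codon_start < flank or codon_start + 3 + flank > len(template):
        raise ValueError("not enough flanking sequence around the codon")
    homology = template[codon_start - flank:codon_start]               # 5' homology region
    extension = template[codon_start + 3:codon_start + 3 + flank]      # 3' extension region
    forward = homology + "NNK" + extension                             # degenerate codon replaces the target
    reverse = reverse_complement(homology)                             # reverse primer
    return forward, reverse

def expand_nnk():
    """List the codons covered by NNK (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]

# Toy template, not the real plasmid sequence.
toy_template = "ACGT" * 12
forward_primer, reverse_primer = design_nnk_primers(toy_template, codon_start=18)
print(forward_primer, reverse_primer, len(expand_nnk()))
```

A real design would extend each flank until its melting temperature exceeds the 55 °C threshold given in the text rather than using a fixed length.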

Methotrexate selection assay
Transformed yeast were inoculated into 5 mL (low-throughput) or 300 mL (co-culture and deep mutational scan) of C-Ura, 2% glucose media, grown overnight in a rotating/shaking incubator at 30 °C, and then transferred to 5 mL or 300 mL of 2% raffinose media to remove the glucose repression acting on the GAL1 promoter. After two hours in 2% raffinose, yeast were back-diluted to an OD of 0.01 into 5 mL or 300 mL of 2% galactose to induce Aβ42-DHFR expression in the presence or absence of 80 µM methotrexate (TCI America, M-1664) and 1 mM sulfanilamide (Sigma, S-9251). In 5 mL experiments, yeast growth was measured over 48 h using a spectrophotometer that detects 660 nm wavelengths. The following equation was used to calculate doubling times from two time points: $\left(\log_{10}(\mathrm{OD}_{T_2}/\mathrm{OD}_{T_1}) / \log_{10}(2)\right) / \Delta T$, where OD represents the optical density at 600 nm at a time point (T). For co-culture experiments, yeast with aggregating and nonaggregating variants were inoculated at equal densities in 300 mL. Ten OD units of yeast were collected from 300 mL cultures every 12 h, spun down, concentrated and stored at −80 °C. At the end of the experiment, frozen yeast were thawed and their plasmids were extracted using a DNA Clean and Concentrator kit (Zymo Research, D4013). Extracted plasmids were prepped and sequenced using Sanger sequencing. For the deep mutational scan, 300 mL cultures were sampled at the following time points: input, 28 h (OD 1.0), 31.5 h (OD 2.0), 35 h (OD 3.0), 38 h (OD 4.5), and 40 h (OD 6.0). Cultures were spun down, concentrated and stored at −80 °C. Plasmids were extracted from yeast with the Yeast Plasmid Miniprep 1 kit (Zymo Research, D-2001). Library fragments were amplified in 17 PCR cycles using primers specific to DNA sequences that flank Aβ-DHFR in p416, and sequenced on an Illumina NextSeq sequencer using paired-end reads (Table S1).
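As a worked illustration of the growth calculation from two optical density readings, the expression above can be evaluated as follows. The function names and the example readings are hypothetical. Note that, as written, the expression yields the number of doublings per unit time; its reciprocal is the time per doubling.

```python
import math

def doublings_per_hour(od_t1, od_t2, delta_t_hours):
    """Evaluate (log10(OD_T2 / OD_T1) / log10(2)) / delta_T."""
    if od_t1 <= 0 or od_t2 <= 0 or delta_t_hours <= 0:
        raise ValueError("OD readings and the time interval must be positive")
    return (math.log10(od_t2 / od_t1) / math.log10(2)) / delta_t_hours

def hours_per_doubling(od_t1, od_t2, delta_t_hours):
    """Reciprocal of the rate above, i.e. the time needed for one doubling."""
    return 1.0 / doublings_per_hour(od_t1, od_t2, delta_t_hours)

# Made-up readings: the culture grows from OD 0.1 to OD 0.4 in 4 hours,
# which is two doublings in four hours.
rate = doublings_per_hour(0.1, 0.4, 4.0)
time_per_doubling = hours_per_doubling(0.1, 0.4, 4.0)
print(round(rate, 3), round(time_per_doubling, 3))
```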

Variant effect analysis
Enrich2 was used to calculate solubility scores for each Aβ variant from sequencing fastq files (Rubin et al. 2017). The Enrich2 pipeline calculates a variant's score in three steps. First, a variant's normalized frequency ratios are tabulated for each time point by dividing the frequency of a variant's sequencing reads by all mapped reads and normalizing by the wild-type frequency ratio. Sequencing reads were stringently filtered for quality; we required each base to have a Phred score greater than 20 and each read to contain no uncalled bases. Second, a weighted linear least squares regression line is fit to the normalized frequency ratios across time points. Third, the slope of the regression line is calculated, averaged across the three replicates and log2 transformed. This averaged log2 slope reflects a variant's aggregation propensity. Solubility scores below 0 denote variants that are more aggregation-prone than wild-type, whereas scores above 0 indicate that a variant has increased solubility compared to wild-type.
Classifying Aβ variants using synonymous mutations
Variant classifications (i.e., WT-like, more aggregation-prone, more soluble) were assigned using the distribution of 39 synonymous mutations from the deep mutational scan. We define WT-like as any variant with a score within ± 2 SD of the synonymous variant mean [-0.26, 0.39]. A variant is more aggregation-prone than wild-type if its score is lower than -0.26, or more soluble if its score is greater than 0.39.
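The first two steps described above (frequency ratios normalized to wild type, then a weighted least squares slope across time points) can be sketched in plain Python. This is a simplified illustration and not Enrich2's implementation: the function names are hypothetical, the weights are simply supplied by the caller (uniform by default), and the averaging across replicates and the final transformation are omitted.

```python
def normalized_ratios(variant_counts, wt_counts, total_counts):
    """Per-time-point variant frequency divided by wild-type frequency."""
    ratios = []
    for variant, wild_type, total in zip(variant_counts, wt_counts, total_counts):
        variant_freq = variant / total      # variant reads / all mapped reads
        wt_freq = wild_type / total         # wild-type reads / all mapped reads
        ratios.append(variant_freq / wt_freq)
    return ratios

def weighted_slope(times, values, weights=None):
    """Slope of a weighted least squares line through (times, values)."""
    if weights is None:
        weights = [1.0] * len(times)        # assumption: uniform weights
    weight_sum = sum(weights)
    t_mean = sum(w * t for w, t in zip(weights, times)) / weight_sum
    v_mean = sum(w * v for w, v in zip(weights, values)) / weight_sum
    numerator = sum(w * (t - t_mean) * (v - v_mean)
                    for w, t, v in zip(weights, times, values))
    denominator = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, times))
    return numerator / denominator

# Made-up counts for one variant at three time points.
toy_times = [0.0, 1.0, 2.0]
toy_ratios = normalized_ratios([100, 150, 200], [100, 100, 100], [1000, 1000, 1000])
toy_slope = weighted_slope(toy_times, toy_ratios)
print(toy_ratios, toy_slope)
```

In this toy case the variant gains on wild type at every time point, so the slope is positive, matching the convention that variants enriched under selection score as more soluble.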
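A minimal sketch of this classification, following the sign convention stated for solubility scores (scores above 0 indicate increased solubility relative to wild type). The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def wt_like_bounds(synonymous_scores, n_sd=2.0):
    """Return (lower, upper) = mean -/+ n_sd standard deviations."""
    mean = statistics.mean(synonymous_scores)
    sd = statistics.stdev(synonymous_scores)   # assumption: sample SD
    return mean - n_sd * sd, mean + n_sd * sd

def classify(score, lower, upper):
    """Assign a variant to one of the three classes used in the text."""
    if score < lower:
        return "more aggregation-prone"
    if score > upper:
        return "more soluble"
    return "WT-like"

# Bounds reported in the text for the 39 synonymous variants.
reported_lower, reported_upper = -0.26, 0.39
for example_score in (-2.38, 0.0, 1.45):
    print(example_score, classify(example_score, reported_lower, reported_upper))
```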

RESULTS
First, we verified that the DHFR-based yeast aggregation assay could differentiate between aggregating wild-type Aβ (AβWT) and a nonaggregating variant (AβF19D) (Morell et al. 2011). As expected, in a mixed culture treated with methotrexate, AβF19D outcompeted AβWT (Figure 1B). We used fluorescence microscopy of Aβ-GFP fusions to confirm that 30-70% of yeast expressing AβWT-GFP had cytoplasmic punctae compared to 0-20% of cells expressing AβF19D-GFP across five fields of view (Figure 1C-D). Thus, we concluded the assay could be used in a deep mutational scan to measure the aggregation propensity of variants of Aβ.
Using this assay, we conducted a deep mutational scan of Aβ that yielded solubility scores for 791 single amino acid variants, representing 99.1% of the possible single variants. Solubility scores were calculated by taking the weighted least squares slope of each variant's frequency ratios across six time points (see Methods). The slopes from each replicate were well correlated (Pearson's R 0.78 to 0.92; Figure 2A, Figure S1A). Replicate slopes were averaged and log2 transformed to produce final solubility scores such that wild-type had a solubility score of zero (Table S2). Positive solubility scores indicated less aggregation and negative scores indicated increased aggregation.
Solubility scores ranged from -2.38 (most aggregating) to 1.45 (most soluble). The mean (median) solubility score for all variants was 0.09 (0.08), which was similar to the solubility scores of the 39 synonymous variants in our library (mean: 0.06; median: 0.08). Because we did not expect synonymous variants to affect aggregation propensity, we used their distribution of scores to identify WT-like variants (Figure 2B). In total, we found that 344 (43.4%) of Aβ variant scores were within two standard deviations of the synonymous score mean and thus had WT-like effects (WT-like range: [-0.26, 0.39]). Additionally, we found 246 (31.1%) variants to be more aggregation-prone than AβWT and 201 (25.4%) variants to be more soluble. Therefore, 75% of Aβ variants maintained or increased the peptide's propensity to aggregate in yeast cells.
To verify that our deep mutational scan accurately measured variant effects on aggregation, we tested six Aβ variants, G38F, K16V, A42V, F19Y, L17S and L34R, that spanned the solubility score range in a low-throughput validation assay. The growth rate of methotrexate-treated yeast expressing each Aβ variant was measured and compared to the aggregation propensity scores (Figure 2C, S1B). We found that low-throughput assay results strongly correlated with the solubility scores derived from deep mutational scanning (R² = 0.98). Thus, our deep mutational scan reliably measured Aβ variant aggregation propensity in the yeast assay.
Figure 2A-F. Solubility scores for 791 Aβ variants. Solubility scores reliably measure the effects of Aβ sequence on aggregation propensity. A scatter plot shows the correlation between two of three biological replicates that were averaged to yield final solubility scores (A; Figure S1A). The distribution of solubility scores (x-axis) of synonymous variants was used to determine cutoffs that define variants that are wild-type-like or more/less aggregation-prone than wild-type. The density plot shows distributions of nonsynonymous (light gray) and synonymous (dark gray) variants and the white lines show the lower (-0.26) and upper (0.39) bounds for wild-type-like variants (B). The scatterplot shows the correlation between our solubility scores (y-axis) and a low-throughput yeast growth assay that measured yeast growth rate as a proxy for Aβ solubility (C; Figure S1B). The heatmap shows the effect of 791 Aβ variants on solubility with Aβ positions on the x-axis and mutant amino acids on the y-axis. A variant's color denotes its solubility: red is most soluble, white is wild-type-like, and dark blue is most aggregated, whereas yellow variants are missing from our variant library and dots denote the wild-type amino acid at a given position. The annotation tracks on the x- and y-axes display the hydrophobicity of each wild-type and mutant amino acid, respectively. The heatmap's y-axis has been re-ordered using hierarchical clustering on the solubility score vectors (D). The mean solubility score at each position is depicted using the same color scheme as the main heatmap. Additionally, the mean solubility scores for all hydrophobic and polar substitutions are shown (E; Figure S2A). Hierarchical clustering on the x-axis yielded 6 distinct clusters: 1 (red), 2 (orange), 3 (yellow), 4 (green), 5 (light blue), and 6 (dark blue; F; Figure S2B-C).
To explore the effects of each amino acid substitution on Aβ aggregation, we created an Aβ sequence-aggregation map (Figure 2D). Substitutions to aspartic acid and proline were most associated with Aβ solubility, as evinced by their median scores of 0.64 and 0.56, respectively (Figure S2A). Conversely, the most aggregation-associated substitutions were hydrophobic tryptophan and phenylalanine, with scores of -0.60 and -0.51, respectively. Moreover, hierarchical clustering of all 791 solubility scores by amino acid revealed that hydrophobic substitutions, except alanine, clustered together and were associated with greater aggregation than other classes of substitutions.
Next, we characterized each position in Aβ based on its mutational profile. Hierarchical clustering of variant solubility scores by position identified six distinct clusters (Figures 2E-F; S2B-C). In cluster 1, comprising positions 17-20, 31-32, 34-36, 39 and 41, substitutions tended to decrease Aβ aggregation compared to substitutions in other clusters (cluster 1 mean solubility score = 0.64, all other clusters = -0.28; Figure S2D). In cluster 1, even substitutions to hydrophobic amino acids slightly decreased aggregation (mean solubility score = 0.17). The effects of substitutions in cluster 2 were similar to but less extreme than in cluster 1. Both clusters 1 and 2 are composed largely of hydrophobic positions in the wild-type Aβ sequence. Indeed, 80% of Aβ positions with hydrophobic wild-type residues are in clusters 1 and 2. In stark contrast, within clusters 4, 5 and 6, hydrophobic substitutions generally increased aggregation (means of all substitutions: -0.15, -0.12 and -0.45; means of hydrophobic substitutions: -0.29, -0.65, and -1.04). Cluster 3 contains only two positions, 37 and 38. Here, every substitution except proline increased aggregation (mean of all substitutions: -0.99; mean of hydrophobic substitutions: -1.56). Given that cluster 1 is characterized by hydrophobic positions where hydrophilic substitutions profoundly decreased aggregation, we suggest that this cluster defines buried β-strands in the Aβ sequence.
Next, we compared our solubility scores to previous alanine and proline scans, which reported Aβ40 fibril thermodynamic stability in vitro (ΔΔG). ΔΔG values were determined by measuring the concentration of variant Aβ monomer remaining in solution after fibril formation reached equilibrium (Williams et al. 2004; Williams, Shivaprasad, and Wetzel 2006). We found that the effects of proline substitutions in our assay were correlated with proline ΔΔG values (R² = 0.40), while the effects of alanine substitutions in our assay were less correlated with alanine ΔΔG values (R² = 0.17; Figure 3A). In our alanine and proline comparisons, we found the greatest correlation at positions 17-20 and 31-32, where substitutions decreased aggregation in all studies (Figure S3). The most notable disagreement between studies was for alanine substitutions at positions 37 and 38. In our assay, alanine substitutions caused a profound increase in aggregation, whereas the in vitro alanine scan showed the opposite effect.
We also compared our buried β-strand positions from cluster 1 to β-strands proposed based on the in vitro alanine and proline scans, finding some concordance (Figure 3B). The single amino acid scans identify three regions that disrupt fibril elongation thermodynamics when mutated. The regions include positions 15-21, 24-28, and 31-36 for the proline scan and positions 18-21, 25-26, and 32-33 for the combined alanine and proline scans (Williams et al. 2004; Williams, Shivaprasad, and Wetzel 2006). Given the generally highly disruptive nature of proline substitutions (Gray et al. 2017), it is not surprising that the proline scan would nominate many positions. Our deep mutational scan, on the other hand, does not reveal a central β-strand or a strong decrease in aggregation with alanine or proline substitution at positions 24-28. We speculate that this difference is due either to the distinct experimental approaches used or to the different Aβ species (Aβ40 vs. Aβ42).
Figure 3. Comparison of yeast cell-based solubility scores to in vitro aggregation measurements and Aβ structural models. The scatterplot shows the correlation between our solubility scores (y-axis) and two single amino acid scans that measured the effect of proline (orange) or alanine (teal) variants on the thermodynamic stability of aggregates, relative to wild type (ΔΔG) (A; Figure S3). The first two tracks show unmeasured mutations (dashed gray) and the Aβ buried β-strand positions (black) suggested by proline scanning alone, or by proline and alanine scanning together (Williams et al. 2004; Williams, Shivaprasad, and Wetzel 2006). The third track shows positions with the greatest increase in solubility when mutated in our large-scale mutagenesis study, found in cluster 1 (B). The next nine tracks show the secondary structure of nine models of Aβ aggregate structure for each Aβ position (x-axis; C). The Aβ wild-type sequence is shown at the top.

DISCUSSION
We used deep mutational scanning to characterize 791 Aβ variants in a yeast-based aggregation assay. Proline and aspartic acid substitutions were most disruptive of Aβ aggregation, while tryptophan and phenylalanine increased aggregation most. Additionally, we used unsupervised clustering to determine the regions of Aβ most important for aggregation. We conclude that these regions are most likely to form buried β-strands, which are necessary for aggregation and sensitive to amino acid substitutions (Jahn et al. 2010; Abrusán and Marsh 2016). These include positions 17-20, 31-32, 34-36, 39 and 41. While other positions could also form β-strands, the positions in cluster 1 are most likely to form the buried cores of Aβ aggregates in our cell-based assay.
Due to the noncrystalline nature of Aβ fibrils, traditional techniques such as X-ray crystallography and solution-state NMR cannot be used to solve Aβ's aggregate structure. Instead, structural models have been developed by amassing constraints, such as the direction and register of β-sheets. For example, solid-state nuclear magnetic resonance studies suggest that Aβ fibrils are parallel, in-register β-sheets (Gregory et al. 1998; Antzutkin et al. 2002; Tycko 2011). Many of these structural models are problematic because they are generated from constraints derived from in vitro experimental data, which may not be representative of in vivo conditions.
Given that we collected large-scale mutagenesis data in a cell-based system, we examined how our results compared to structural models of Aβ fibrils. Some models, such as 1IYT (Crescenzi et al. 2002) and 2NAO (Wälti et al. 2016), showed little to no overlap with either our proposed buried β-strands or those proposed by Williams et al. (2004, 2006) (Figure 3C). Other models contained three β-strand regions reminiscent of those suggested by Williams et al. (2004, 2006): 2MXU (Xiao et al. 2015), 5KK3 (Colvin et al. 2016), and 5OQV (Gremer et al. 2017). Yet other models propose β-strand patterns more similar to ours. These include 2BEG (Lührs et al. 2005), 2LNQ (Gremer et al. 2017), 2LMP and 2LMN (Lu et al. 2013). Since our β-strand patterns were derived from data gathered in a cell-based assay, we hypothesized that they would be most consistent with structural models based on in vivo-derived fibrils. Indeed, the 2LMP and 2LMN models were based on fibrils seeded from plaques isolated from the brains of individuals afflicted by Alzheimer's disease. Moreover, every model besides 2LMP and 2LMN was constructed using NMR or cryo-EM data from laboratory-grown fibrils. These models are less concordant with our cell-based mutational data, which suggests that there are important structural differences between in vitro- and in vivo-derived fibrils.
Two major differences exist between the experimental conditions used by Williams et al. (2004, 2006) and those in our work, and may explain the difference in β-strands proposed in our respective in vitro- and in vivo-derived models. First, Williams et al. (2004, 2006) incubated Aβ in the absence of any other proteins, while our yeast-based system provides key factors that affect protein aggregation, such as chaperone proteins and molecular crowding. Second, Williams et al. (2004, 2006) incubated Aβ peptides at 37 °C, whereas our yeast-based experiments required a lower temperature of 30 °C. This temperature difference may yield differences in folding kinetics. Further experiments are required to determine the contribution of these experimental differences to β-strand formation in Aβ. Deep mutational scanning data could contribute to the investigation of Aβ fibril structure beyond the analysis of existing models we present. For example, others have used site-saturation mutagenesis and deep mutational scanning data to evaluate proposed structural models (Bajaj et al. 2008; Khare et al. 2019). Additionally, deep mutational scanning data have now been used to generate distance constraints for the prediction of tertiary protein structure (Schmiedel and Lehner 2018; Rollins et al. 2018).
In summary, we used deep mutational scanning to elucidate the effects of amino acid substitutions on Aβ aggregation in a cell-based model. We used these large-scale mutagenesis data to propose positions critical for Aβ aggregation. Our results conflict with some previous in vitro reports of the effects of substitutions on Aβ aggregation and with some models of Aβ fibril structure. This outcome highlights the difficulties of studying protein aggregation and emphasizes the potential utility of in vivo or cell-based models. We suggest that deep mutational scanning of other aggregation-prone proteins such as α-synuclein or transthyretin could help reveal the relationship between sequence, structure and aggregation.