Coevolution between MHC Class I and Antigen-Processing Genes in Salamanders

Abstract Proteins encoded by antigen-processing genes (APGs) provide major histocompatibility complex (MHC) class I (MHC-I) with antigenic peptides. In mammals, polymorphic multigenic MHC-I family is served by monomorphic APGs, whereas in certain nonmammalian species both MHC-I and APGs are polymorphic and coevolve within stable haplotypes. Coevolution was suggested as an ancestral gnathostome feature, presumably enabling only a single highly expressed classical MHC-I gene. In this view coevolution, while optimizing some aspects of adaptive immunity, would also limit its flexibility by preventing the expansion of classical MHC-I into a multigene family. However, some nonmammalian taxa, such as salamanders, have multiple highly expressed MHC-I genes, suggesting either that coevolution is relaxed or that it does not prevent the establishment of multigene MHC-I. To distinguish between these two alternatives, we use salamanders (30 species from 16 genera representing six families) to test, within a comparative framework, a major prediction of the coevolution hypothesis: the positive correlation between MHC-I and APG diversity. We found that MHC-I diversity explained both within-individual and species-wide diversity of two APGs, TAP1 and TAP2, supporting their coevolution with MHC-I, whereas no consistent effect was detected for the other three APGs (PSMB8, PSMB9, and TAPBP). Our results imply that although coevolution occurs in salamanders, it does not preclude the expansion of the MHC-I gene family. Contrary to the previous suggestions, nonmammalian vertebrates thus may be able to accommodate diverse selection pressures with flexibility granted by rapid expansion or contraction of the MHC-I family, while retaining the benefits of coevolution between MHC-I and TAPs.


Introduction
Adaptive immunity is a major vertebrate innovation (Müller et al. 2018). The major histocompatibility complex (MHC) is a key player in the adaptive immunity of jawed vertebrates (Flajnik 2018). Classical MHC proteins present antigenic peptides to T cells, which, upon recognition of foreign antigens, trigger an adaptive immune response. Classical class I molecules (MHC-I) enable general surveillance of the translational activity inside cells, by presentation on the cell surface of antigens derived from intracellular proteins (including those of viruses and intracellular bacteria). The antigens are generated and loaded onto MHC-I molecules in a carefully orchestrated process (Blum et al. 2013;Blees et al. 2017;Fisette et al. 2020;Trowitzsch and Tamp e 2020). First, antigenic peptides are produced by a dedicated version of the proteasome: immunoproteasome. The three immunoproteasome-specific catalytic units are encoded by PSMB8 (LMP7), PSMB9 (LMP2), and PSMB10 (MECL1) genes (reviewed in Murata et al. [2018]). Next, the peptides produced by the immunoproteasome are pumped from the cytoplasm to the lumen of the endoplasmatic reticulum by a specialized, heterodimeric transporter associated with antigen presentation (TAP, encoded by the TAP1 and TAP2 genes). Peptides are then loaded onto MHC-I molecules with the help of several proteins, including tapasin (encoded by the TAPBP gene), a chaperone and mediator of TAP-MHC-I interaction. Tapasin stabilizes an empty MHC-I molecule and ensures loading of high-affinity peptides. The collective term antigen-processing genes (APGs) is used for the genes encoding PSMB, TAP, and TAPBP proteins (e.g., McConnell et al. 2016;Palomar et al. 2021). Finally, the antigen-MHC-I complex moves via a secretory pathway to the cell surface.
The above description applies to the classical MHC-I genes, which are highly expressed in multiple tissues, highly polymorphic, and encode proteins presenting antigenic peptides to the cytotoxic CD8þ ab T cells. In addition, so-called nonclassical MHC-I genes, often, but not always linked to the classical MHC-I genes, have been identified in all vertebrates studied to date (Adams and Luoma 2013). They encode molecules similar in sequence and structure to the classical MHC-I, but of limited polymorphism and tissue expression. Their functions are less well defined, but often involve presentation of specialized antigen types, interaction with other cell types, such as NK or NKT cells (Adams and Luoma 2013;Edholm et al. 2013), and may or may not require input from APGs (Braud et al. 1998;Tilloy et al. 1999).
The dependence of classical MHC-I on ligands supplied by the products of APGs sets the stage for coevolution between APGs and MHC-I within species (Kaufman 1999). In its essence, the coevolution hypothesis posits that some combinations of APG and MHC-I alleles work better together than others because selection has optimized their interaction (Kaufman 1999(Kaufman , 2015. High polymorphism, maintained mainly by the arms-race between hosts and their pathogens, is a defining feature of both classes of classical MHC (Radwan et al. 2020). This polymorphism, concentrated in the peptidebinding groove of MHC molecules, affects the spectrum of antigens that can be bound. As APG products should supply antigenic peptides matching the requirements of MHC-I alleles, APG polymorphism is expected under coevolution as well. With both coevolving partners exhibiting high levels of polymorphism, an efficient system would require little or no recombination between them-frequent recombination would impose a heavy genetic load, separating coadapted alleles and thereby reducing the fitness of recombinant haplotypes (Kaufman 1999). Tight linkage between APG and MHC-I enables coevolution and is thus a key prediction of the hypothesis that has been confirmed in several vertebrate groups, such as frogs (Ohta et al. 2006), salamanders (Palomar et al. 2021), and birds . Mammals are a notable exception (Horton et al. 2004), where generalist APGs serve all MHC-I alleles, presumably after the linkage between MHC-I and APGs was broken by an inversion (Kaufman 2015). Tight linkage was proposed as the ancestral gnathostome condition (Ohta et al. 2006), leading to the hypothesis that APG-MHC-I coevolution is also an ancestral gnathostome feature (reviewed in Ohta and Flajnik [2015] and Kaufman [2018b]).
By one view, coevolutionary fine-tuning between the APGs and MHC-I should lead to a single highly expressed classical MHC-I gene, as observed, for example, in the chicken ) and the frog Xenopus (Ohta et al. 2006), whereas the relaxation of coevolution would allow the appearance of a multigenic classical MHC-I, as seen in mammals (Kaufman 2018b). In this view, coevolution may be considered as an ancestral state that limits the flexibility of the adaptive immune response by restricting the number of classical MHC-I genes. However, evidence for multiple expressed MHC-I genes has accumulated in nonmammalian vertebrates: fishes McConnell et al. 2014;Grimholt et al. 2015), amphibians (Sammut et al. 1999;Fijarczyk et al. 2018;Palomar et al. 2021), and birds (Drews and Westerdahl 2019). At least some species in these groups maintain tight linkage between MHC-I and APGs (McConnell et al. 2016;Palomar et al. 2021). This suggests that either coevolution was relaxed in these species, or coevolution does not preclude establishment, following gene duplication, of multiple highly expressed MHC-I genes. Indeed, already in the early days of the coevolution hypothesis, it was postulated that multiple genes with similar peptide-binding specificities could coevolve on a haplotype with a single set of APG alleles (Kaufman 1999). An accumulation of MHC-I diversity in a population would then be coupled with an increase in the overall APG diversity, whereas monomorphic "average best fit" APGs would be expected to evolve in the absence of coevolution.
To date, most of what we know about coevolution between MHC-I and APGs comes from experimental studies in model systems with simple MHC, such as the chicken (Walker et al. 2011;van Hateren et al. 2013;Kaufman 2015). Such studies are invaluable in providing a detailed mechanistic understanding of the coevolutionary process, but few systems are amenable to full scale immunological and immunogenetic analyses. The question of whether APG-MHC-I coevolution is a widespread phenomenon should be addressed with studies over broader phylogenetic scales because the key predictions of the hypothesis can be tested within a comparative framework. Crucially, the coevolution hypothesis predicts a positive correlation between intraspecific APG and MHC-I diversity when differences in background genetic diversity are controlled. However, this prediction has not been tested so far, and, whereas MHC-I polymorphism has been studied in dozens of species from all major vertebrate groups, information on APG polymorphism is scarce outside mammals (Ohta et al. 2003;Miura et al. 2010;MHC-I and APGs in Salamanders . doi:10.1093/molbev/msab237 MBE Walker et al. 2011;Huang et al. 2013;van Hateren et al. 2013;Fijarczyk et al. 2018).
Salamanders (Urodela) are characterized by a long, independent history -they diverged from other extant amphibian lineages approximately 250-300 Ma, with a crown age estimated at approximately 180-200 Ma, (Marjanovi c and Laurin 2014;Irisarri et al. 2017;Jetz and Pyron 2018). Salamanders exhibit a combination of features that makes them a suitable model to test the coevolution hypothesis. At least some APGs are polymorphic in certain species (Huang et al. 2013;Fijarczyk et al. 2018;Palomar et al. 2021), as expected under the coevolution model. On the other hand, salamanders studied so far have multiple highly expressed MHC-I genes (Sammut et al. 1999;Fijarczyk et al. 2018;Palomar et al. 2021). This stands in contrast to predictions of the coevolution model which, as currently formulated, predicts a single, highly expressed classical MHC-I gene. We note, however, that establishing the classical nature of the MHC-I by sequence analysis alone is challenging (Sammut et al. 1999), whereas functional data are lacking in salamanders. To date, the only functional studies of nonclassical MHC-I in amphibians were conducted in Xenopus (Edholm et al. 2013;Robert and Edholm 2014;Banach et al. 2017), but Xenopus nonclassical class I genes appear to lack orthologues in salamanders (Sammut et al. 1999;Edholm et al. 2014).
Recently, we used salamanders to test two predictions of the coevolution hypothesis: 1) tight linkage between APGs and MHC-I, and 2) a signal of adaptive evolution in APGs (Palomar et al. 2021). First, we directly estimated the recombination rate between APG and MHC-I in Lissotriton newts by examining products of over 1,500 meioses. No recombination was detected between MHC-I and four of the five analyzed APGs, whereas the total map length of the region spanning multiple MHC-I genes and all five APGs was less than 0.5 cM. The extremely limited recombination between MHC-I and APGs in Lissotriton and close physical proximity of these genes revealed by the recent chromosomal-scale assembly of the axolotl genome (Schloissnig et al. 2021) suggest that salamanders fulfil a condition for coevolution. Second, we used the coding sequences of APGs derived from transcriptomes of over 40 salamander species to test for signatures of positive selection over evolutionary timescale. The signal of adaptive evolution was subtle and restricted mostly to TAP1 and TAP2 genes. We concluded that coevolution between APGs and MHC-I cannot be ruled out, but it may involve only some APGs, in particular TAPs, and its mechanisms would need to accommodate MHC-I duplication. We proposed that a major prediction of the coevolution hypothesis-a positive correlation between genetic variation of APGs and MHC-Ishould be tested in a comparative framework. Here, we perform such an analysis.
To test for the correlation between APG and MHC-I diversity, we examined 30 salamander species widely sampled from the Urodela tree of life ( fig. 1). Our sampling included representatives of six out of nine currently recognized salamander families (Frost 2021), and spans the most recent ancestor of all living species. We used this data set to assess the diversity of: 1) MHC-I genes, 2) five APGs (PSMB8, PSMB9, TAP1, TAP2, and TAPBP), and 3) five non-APGs (BRD2, DAXX, KIFC1, RXRBA, RGL2)-protein coding genes that are physically (Palomar et al. 2021), but not functionally, tightly linked to MHC-I. Sequence polymorphism was measured both at synonymous codon positions and at the amino acid sequence level. Diversity was estimated both at the individual and at the species level, using measures applicable to all examined genes. These measures, adopted from biodiversity studies (Chao et al. 2014), allow a comprehensive characterization of diversity, taking into account sequence divergence between alleles as well as differences in copy number among genes and individuals. We then fitted several phylogenetic generalized least squares (PGLS) models to the data, to test whether APG diversity could be explained by MHC-I and non-APG diversity, as predicted by the coevolution hypothesis.

Sequencing and Polymorphism
Samples Diversity of MHC-I, APGs, and non-APGs was studied, by targeted sequencing of genomic DNA, in 30 species representing 16 genera (23% of salamander genera), and six out of nine Urodela families that comprise approximately 98% salamander species ( fig. 1). One to four (median ¼ 2) populations and 15-65 (median ¼ 35) individuals per species were examined (supplementary table S1, Supplementary Material online).
MHC-I Polymorphism was assessed using Illumina amplicon sequencing for 214-224 bp of exon 2 and 166-184 bp of exon 3, depending on genus and not counting indels that caused some sequences to depart from the canonical length. Full codon data available for all species covered amino acid positions 8-79 (exon 2) and 109-162 (exon 3) of the human HLA-A protein a domain. A total of 2,796 and 3,133 unique sequence variants were detected in exon 2 and exon 3, respectively (supplementary fig. S1, Supplementary Material online). Although we cannot assign these variants to loci, for simplicity, we will refer to them as "alleles." Nonetheless, the alleles come from multiple genes as individuals typically carry more than two MHC-I variants. Genotyping repeatability, averaged over all species, was 91.4% for exon 2 and 97.4% for exon 3 (supplementary table S2, Supplementary Material online). The fraction of nonfunctional alleles (exhibiting frameshifts or internal stop codons) was generally low (on an average 5.6% and 4.4% for exons 2 and 3, respectively), but with considerable variation among species (supplementary table S3, Supplementary Material online). The mean perindividual number of potentially functional alleles ranged from 2.6 (Desmognathus fuscus, Plethodontidae) to 21.1 (Lissotriton helveticus, Salamandridae) in exon 2, and from 1.8 (D. fuscus) to 31.3 (Proteus anguinus, Proteidae) in exon 3, indicating substantial differences in the number of MHC-I genes among salamander species (supplementary table S3, Supplementary Material online). The assayed fragments of exons 2 and 3 covered, respectively, one and four residues Palomar et al. . doi:10.1093/molbev/msab237 MBE which are important for anchoring the termini of antigenic peptides and are conserved in classical MHC-I of most taxa. The alleles preserving the conserved amino acids at these residues formed the "conserved anchor" data set. This data set was intended to minimize the fraction of nonclassical alleles, but, because of the difficulties of establishing the classical/nonclassical status based on sequence alone, it may include some nonclassical alleles and exclude some classical alleles; therefore, we adopt a neutral "conserved anchor" name. The fraction of conserved anchor alleles in exon 2 ranged from 0.85 (Andrias davidianus, Cryptobranchidae) to 1.0 (Desmognathus, Karsenia, and Salamandra) and in exon 3, it ranged from 0.36 (P. anguinus) to 1.0 (D. fuscus) (supplementary table S3, Supplementary Material online). Phylogenies showed family-level monophyly of MHC-I alleles in most cases ( fig. 2)-either gene duplications postdated the divergence of salamander families, or sequences of different genes have been homogenized in a process of concerted evolution. Whichever the mechanism, phylogenies show dynamic evolution of MHC-I in salamanders, making identification of 1:1 orthologs between families next to impossible. A detailed analysis of MHC-I molecular evolution in salamanders will be reported elsewhere (Minias et al., in preparation).

Antigen-Processing Genes
Polymorphism of all five APGs was assessed with Illumina sequencing of targets captured with molecular inversion probes (MIPs, fig. 3). Because sequencing produced stacks of overlapping paired-end reads starting at defined positions, we were able to obtain physically phased microhaplotypes for nonoverlapping segments along the reference ( fig. 3). Apparently, some APGs have been lost in plethodontid salamanders: PSMB8 was not found in the transcriptome of any plethodontid, PSMB9 was not found in Hydromantes and TAP2 may be missing in Karsenia (Palomar et al. 2021). We attempted to sequence as much APGs coding sequence (cds) as possible, but we were not able to design MIPs for exons shorter than approximately 120 bp. We considered a segment within an individual effectively sequenced if at least one MIP spanning that segment had a coverage of 20 or more reads. The average fraction of APG cds length effectively sequenced in at least 50% of individuals within species was 0.626 (4,289 bp), ranging from 0.528 (2,192 bp)

Non-APGs
Polymorphism of four non-APGs was assessed with MIPs, whereas the fifth, BRD2, was amplified and sequenced similarly to MHC-I. Because coding sequences of some non-APGs are long, we did not attempt to maximize the fraction of non-APGs cds sequenced, but instead aimed to obtain enough data for a meaningful comparison with APGs. The average total length of sequenced non-APGs cds was 3,558 bp, ranging from 2,246 in

Diversity and the Phylogenetic Correlation between MHC-I and APGs
Diversity of each gene was estimated at both the withinindividual (a diversity) and species-wide (c diversity) level using three measures of genetic distance: 1) DNA divergence at synonymous sites, 2) protein divergence measured as amino acid p-distance, and 3) functional protein divergence measured as Grantham (1974) distance. Within-individual diversity was expressed as the sum of branch lengths of the phylogenetic tree linking individual's alleles-in the case of two alleles this amounted to the genetic distance between them. Species-wide diversity was expressed as the sum of branch lengths of the tree linking all alleles detected in a species, with three different schemes of weighting the allele frequency ( fig. 3, see Materials and Methods for details). Diversities within all three categories of genes investigated here, that is, MHC-I, APGs, and non-APGs varied, sometimes by orders of magnitude, among salamander species (figs. 1 and 4; supplementary table S8, Supplementary Material online). MHC-I diversity was generally much higher than diversity of either APGs or non-APGs. The differences in the number of MHC-I alleles per individual apparently reflect interspecific differences in the extent of gene duplication and intraspecific copy number variation (supplementary table S3, Supplementary Material online). APG diversity in turn was generally higher than that of non-APGs, PSMBs, and TAPs exhibited comparable diversities, whereas TAPBP was less diverse ( fig. 4 and supplementary table S8, Supplementary Material online).
We explored the relationships between MHC-I and APG diversity, simultaneously controlling for non-APG diversity, using a series of PGLS models. In the analysis not including APGs, diversity of non-APGs did not explain MHC-I diversity (table 2 and supplementary tables S9-S11, Supplementary Material online). The general formulation of the coevolution hypothesis, as applied here, considers all five APGs as a single class and was tested accordingly, using as the response variable the unweighted average of all APG diversities within species. The APGs may, however, also be considered a heterogeneous group consisting of three functional subgroups: PSMBs (PSMB8 and PSMB9), TAPs (TAP1 and TAP2), and TAPBP. As the strength of coevolution with MHC-I may differ among subgroups, or only some of APGs may coevolve with MHC-I, we also fitted models using each subgroup and individual APGs as the response variable. The results were broadly similar for within-individual (a) and species-wide (c) diversity, though the signal was stronger for the latter ( fig. 4 and table 2).   Resequencing of our genes of interest produced stacks of overlapping reads that provided physically phased microhaplotypes (local haplotypes). Amplicon sequencing (left) produced a single stack of reads per exon, whereas MIP resequencing (right) produced several, partly overlapping stacks, which were then divided into segments, such that within each segment microhaplotypes spanning the full segment length were recovered from reads spanning the segment. Thus a single segment per exon was obtained from amplicon resequencing, whereas MIP resequencing typically produced multiple segments per exon. Note that in both illustrated cases the individual has more than two alleles per gene, indicating gene duplication. In addition, not all variation is necessarily recovered using the applied methods, as indicated by one haplotype missed by MIP resequencing for segment j ¼ 1. (B) Phylogenetic c (left) and a (right) diversities are then calculated for each segment, using the segment's phylogeny, to estimate species-wide and individual-level diversity, respectively; c diversity is the sum of branch lengths in the phylogeny of all alleles detected in a species (blue), with allele frequency weighting schemes depending on the q value (e.g., equal weights when q ¼ 0, see Text for details on q ¼ 1 and q ¼ 2); a diversity is the sum of branch lengths connecting alleles present in a given individual (red). (C) To calculate the per-gene a and c phylogenetic diversities, the per-base/amino acid estimates were obtained using the method of Chao et al. (2014) and then their weighted average was calculated with segment length used as weights (see Text for details).   MBE MHC-I a diversity was unrelated to mean APG a diversity for any distance measure, whereas a significant effect of non-APGs was only detected for synonymous divergence (P ¼ 0.013 , table 2 and supplementary table S9, Supplementary Material online). When the three APG subsets were analyzed separately, TAP a diversity was explained by both MHC-I (P ¼ 0.008 for both amino acid pdistance and Grantham distance, the effect for synonymous divergence was marginally nonsignificant) and non-APG a diversities (P 0.002 for all three distances). We did not MHC-I c diversity was positively related to mean APG c diversity. This effect was significant for the amino acid p-distance and Grantham distance for all weightings (q values) of allele frequencies (all P 0.006), and for synonymous diversity for q ¼ 0 and 1 (P ¼ 0.002 and 0.027, respectively , table 2  and supplementary table S9, Supplementary Material online). The relationship between non-APG and mean APG c diversity was significant only for synonymous variation (all P 0.001). Similarly to a diversity, there was also considerable heterogeneity between individual APGs and their functionally related subsets in c diversity. Although the positive relationship between MHC-I and TAP c diversity was strong (stronger for TAP1), for PSMBs, it was weak and patchy-reaching significance only for some combinations of q and distance measures, and no effect was detected for TAPBP (table 2  and supplementary table S9, Supplementary Material online). The relationship between non-APG c diversity and c diversity of particular APGs also varied, with the strongest effect for synonymous diversity in TAPs (supplementary table S9, Supplementary Material online). The results of PGLS modeling for sample sizes standardized to 15 individuals per species, and for the "conserved anchor" data set were very similar to the results obtained for the full data set (supplementary tables S10 and S11, Supplementary Material online).

Discussion
Both MHC-I and APG diversity vary widely among salamander species, making them a suitable system for testing predictions of the coevolution hypothesis. MHC-I diversity predicted species-wide-but not within-individual-mean APG diversity in the PGLS analysis. The analysis of functional APG MBE subcategories and individual APGs showed that the signal is driven by TAP1 and TAP2, with a significant MHC-I effect at both the within-individual and species-wide levels. No consistent effect was detected for PSMBs, and, especially, TAPBP. Thus, TAPs, but not other APGs robustly support the major prediction of the coevolution hypothesis tested in the current study. This is in line with the results of Palomar et al. (2021) who examined signatures of adaptive evolution in salamander APGs at the phylogenetic scales, as predicted under coevolution, and found more signal of adaptive evolution in TAPs than in other APGs. The pattern of a positive correlation between MHC-I and TAP diversity suggests that coevolution does not select against the expansion of the MHC-I family. The establishment of multiple highly expressed MHC-I genes following duplication may be allowed or even favored, as long as all MHC-I variants encoded on a haplotype efficiently bind peptides pumped by the TAP variant(s) encoded on it (Kaufman 1999;Palomar et al. 2021). The number of functional MHC-I genes on a haplotype would not be strongly constrained under this scenario, whereas their postduplication divergence could be. Such a situation has been observed in plant rbcS encoding the small subunit of RuBisCO enzyme-postduplication divergence of rbcS copies has been strongly constrained by the requirements of coevolution with the large subunit (Yamada et al. 2019). The number of MHC-I genes differs considerably among salamander species. This, together with copy number variation within species, indicates that the number of MHC-I genes can change rapidly in response to as yet incompletely understood selection pressures and complex tradeoffs (O'Connor et al. 2018;Phillips et al. 2018;Migalska et al. 2019;Radwan et al. 2020). Gene duplication also occurs in APGs, but is more limited (Palomar et al. 2021, this study). Thus, the range of MHC-I diversity within haplotypes (which correlates with within-individual diversity) would vary more among species than would the APG diversity, potentially leading to a weaker correlation at the withinindividual compared with the species-wide level. Species-wide diversity of both MHC-I and APGs would be determined by the number and frequencies of different haplotypes carrying various coadapted MHC-I-APGs combinations segregating within the species, leading to the observed correlation between MHC-I and APG species-wide diversities.
The strongest support for MHC-I-APG coevolution in salamanders was found in TAPs, which is consistent with evidence from several species that have been studied in-depth: chicken (Walker et al. 2011), rat (Joly et al. 1998), Xenopus (Ohta et al. 2003), andzebrafish (McConnell et al. 2016). Our test of the coevolution hypothesis in a comparative framework implies that coevolution between at least MHC-I and TAPs may be widespread in other taxa with duplicated MHC-I genes. To confirm whether this is the case, additional comparative studies in other taxonomic groups, using methodology similar to that applied here, are needed. Candidate taxa for such additional analyses include teleost fishes (Grimholt et al. 2015), squamate reptiles (Radwan et al. 2014;Olivieri et al. 2020), and several orders of birds, particularly the Passeriformes (He et al. 2021). If coevolution is widespread, MHC-I diversity should explain TAP diversity in these taxa as well.
In contrast to TAP results, we found only a weak and inconsistent relationship between MHC-I and PSMB diversity. Indeed, the available evidence for coevolution of PSMBs is based mainly on a tight linkage with TAPs and cosegregation of divergent lineages of both genes in the frog Xenopus (reviewed in Kasahara and Flajnik [2019]), whereas no functional data supporting coevolution are available. In fact, birds lack immunoproteasome (Erath and Groettrup 2015), so no conclusive evidence could have been provided by the otherwise extensive functional research on coevolution in chicken (reviewed in Kaufman [2015]). One striking feature of nonmammalian PSMBs, which has been attributed to possible coevolution with different MHC-I alleles, still awaits explanation. Many species possess two PSMB8 lineages that most likely differ in catalytic properties and have apparently been maintained over evolutionary timescales by extremely strong balancing selection (Huang et al. 2013). The two PSMB8 lineages also occur in the Urodela, and we also detected two distinct PSMB9 lineages in the family Salamandridae, although they were less divergent than those of PSMB8 (Palomar et al. 2021). The relationship between PSMB lineages, the overall diversity of these genes, and MHC-I in salamanders remains unresolved. Although tight linkage creates conditions for coevolution, our data suggest some independence of the balancing selection mechanisms acting on PSMBs and MHC-I. Recent findings revealed diverse roles of the immunoproteasome, apart from providing antigenic peptides to the MHC-I (reviewed in Ferrington and Gregerson [2012] and Murata et al. [2018]). These include control of transcriptional activation and modulation of downstream cytokines, a role in T cell differentiation, involvement in the response to oxidative stress and a role in protein homeostasis during inflammation, a role in lipid metabolisms, and even some function in uninjured, immunoprivileged tissues such as the retina and brain. Potentially, any of these functions could contribute to the maintenance of divergent PSMB lineages.
TAPBP diversity did not correlate with that of MHC-I, whereas both within-individual and species-wide amino acid diversity of this gene was explained to some extent by the diversity of non-APGs. This suggests that TAPBP variation is determined by the joint action of purifying selection and demography, as elsewhere in the genome (Ellegren and Galtier 2016). TAPBP also consistently showed the lowest diversity among the five APGs. Thus, comparative data do not support coevolution between MHC-I and TAPBP in salamanders. Perhaps the recombination distance between TAPBP and MHC-I ($0.45 cM vs. <0.1 cM between the remaining four APGs and MHC-I in Lissotriton, the only salamander with the necessary data available, Palomar et al. 2021) is too large for coevolution to occur. In addition, MHC-I alleles differ greatly in their dependence on TAPBP for high-affinity peptide loading (Peh et al. 1998;van Hateren et al. 2013;Raghavan and Geng 2015). This can further obscure a signal of coevolution, because the presence of numerous MHC-I alleles that do not rely on TAPBP could decouple MHC-I and TAPBP diversities. So far the only evidence for coevolution, found MHC-I and APGs in Salamanders . doi:10.1093/molbev/msab237 MBE in chicken, is based on interactions of polymorphic TAPBP with MHC-I in an allele-specific manner (van Hateren et al. 2013). However, this could be system-specific: chicken has only one highly expressed MHC-I gene within an extremely tightly linked MHC region. Unfortunately almost no other information on TAPBP polymorphism is available in nonmammalian systems.
The phylogenetic correlation between MHC-I and TAP diversity reported here is statistically robust, as it holds across allele frequency weightings (q values), measures of genetic distance, and regardless of whether sample sizes were standardized across species. However, we also need consider carefully explanations other than coevolution for the observed relationship between MHC-I and TAP diversity. One could argue that the correlation is a simple consequence of tight physical linkage between MHC-I and TAPs, so that the processes that promote MHC-I duplication and divergence spill over to the neighboring genomic regions. The data, however, speak against such an interpretation. First, TAPs and PSMBs are extremely closely linked in newts (Palomar et al. 2021), whereas a robust correlation was detected only for the former. Second, we have controlled for genomic region-specific effects by including non-APGs alongside MHC-I and APGs.
Several factors may contribute to the moderate strength of the detected correlation. One is related to the recently elucidated differences in the diversity of peptides translocated and bound by different alleles of TAP and MHC-I, respectively (Chappell et al. 2015;Tregaskes et al. 2016;Kaufman 2018a). Although it was long recognized that the polymorphism of these molecules qualitatively affects peptide repertoires, it is now clear that the size and diversity of these repertoires also vary. Some molecules are "fastidious" (specialist), others "promiscuous" (generalist), promoting translocation/binding of few similar-or many diverse-peptides, respectively. The measures of diversity applied in the present study, based on sequence information only, might not fully capture these functional differences. In particular, generalists, fulfilling functions of a more diverse set of molecules, may erode the observed correlation.
Another potential complication is the possibility that nonfunctional pseudogenes or nonclassical genes may constitute a sizeable fraction of the reported MHC-I diversity. Not all MHC-I sequences detected in genomic DNA are highly expressed and pseudogenes are scattered across allele phylogenies in newts (Fijarczyk et al. 2018). These two observations, together with the monophyly of MHC-I generally observed at the level of families, suggest a rapid turnover of MHC-I genes, which likely includes both episodes of pseudogenization and recurrent emergence of nonclassical genes. The evolutionary dynamics of MHC-I probably contributes to the difficulties in establishing the classical status of MHC-I genes in salamanders based on sequence data alone (Sammut et al. 1999). Nonetheless, three major considerations argue against the possibility that nonfunctional or nonclassical genes are of major concern in our analyses. First, several MHC-I genes are similarly highly transcribed in multiple tissues of salamander species studied so far (Sammut et al. 1999;Fijarczyk et al. 2018;Palomar et al. 2021) and ubiquitous expression is consistent with their classical status. Although high expression at the mRNA level does not automatically imply high protein level or high cell surface expression (Tregaskes et al. 2016), it is suggestive of classical function. Second, alleles with signatures of pseudogenization constitute only a small fraction of salamander MHC-I diversity, whereas overall high polymorphism and consistent signal of positive selection, both hallmarks of classical MHC genes, has been detected in all species studied to date (Fijarczyk et al. 2018;Minias et al., in preparation). Third, the PGLS analyses using the "conserved anchor" MHC-I data set produced results very similar to those including all potentially functional MHC-I alleles. To summarize, although we cannot completely rule out that certain nonfunctional or nonclassical MHC-I alleles were included in our analyses, they are unlikely to contribute to the signal of phylogenetic correlation between MHC-I and APG diversities. Instead, they may have introduced noise, which would not affect our conclusions, but might also explain why the PGLS models had only a moderate predictive power.
Our approach has some limitations, as phylogenetic comparative analysis, although suggestive, does not provide direct evidence of causality. It does, however clearly indicate the need for mechanistic tests in taxa with polymorphic APGs (in particular TAPs) and duplicated classical MHC-I genes. Such tests should examine the binding profiles of MHC-I proteins encoded on a single haplotype and compare them with TAP transport specificities. Under coevolution the binding profiles of MHC-I variants encoded on the same haplotype would be more similar than those of MHC-I variants encoded on different haplotypes and would match the haplotype's TAP pumping specificity. To date, such experiments have been performed in chicken (Walker et al. 2011;Tregaskes et al. 2016), which has one highly expressed MHC-I molecule (encoded by BF2 gene), one poorly expressed (encoded by BF1), and TAP pumping specificity matching the antigen-binding specificity of the highly expressed BF2 gene product (supporting coevolution with just a single MHC-I gene). The next, challenging step would be to expand the scope of this approach to species with duplicated highly expressed and polymorphic MHC-I. Such an endeavor would require advanced experimental tools including homozygous strains with well-defined MHC haplotypes carrying multiple classical class I genes and corresponding cell lines. Such resources are increasingly available for various taxa that possess multiple MHC-I genes, including not only zebrafish and passerine birds, but also salamanders, such as Pleurodeles waltl and the axolotl (Reiß et al. 2015;Elewa et al. 2017). We hope that recent advancements in molecular biology, including those facilitating directed mutagenesis and generation of transgenic and knockout lines, will prompt mechanistic tests that will be capable of supporting or rejecting haplotype-specific coevolution of APGs with multiple MHC-Is.

Conclusions
Here, we report the first comparative test of a crucial prediction of the coevolution hypothesis-a positive phylogenetic correlation between MHC-I and APG diversities. Data from 30 salamander species across six families support this prediction, Palomar et al. . doi:10.1093/molbev/msab237 MBE with the support restricted mainly to one APG subclass-TAPs. Our results imply that coevolution does not prevent the expansion of MHC-I gene family, although it may restrict postduplication divergence of MHC-I genes. Nonmammalian vertebrates thus may be able to respond to diverse selection pressures by rapidly expanding or contracting the MHC-I gene family, while retaining the benefits of coevolution between MHC-I and TAPs within haplotypes. Such a mechanism would provide a great deal of flexibility in shaping the adaptive immune response.

Laboratory Procedures
DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega). MHC-I exons 2 and 3 as well as BRD2 variation was assessed using amplicon sequencing, whereas diversity of the remaining genes was examined by resequencing, following target capture with overlapping MIPs (Niedzicka et al. 2016). MIPs and primers for amplification of MHC-I and BRD2 (supplementary table S12, Supplementary Material online) were designed from available salamander transcriptomes (Palomar et al. 2021). For the purpose of the current study, we additionally generated tailtip transcriptomes of Eurycea bislineata, Desmognathus fuscus, and Proteus anguinus and assembled transcriptome of Plethodon cinereus using RNAseq data deposited in SRA (SRR9925250, SRR9925255, SRR9925273, and SRR9925296). Details of primer and MIP design, laboratory procedures, Illumina sequencing, SNP-calling, and genotyping are in supplementary methods, Supplementary Material online.

Identification of Putative Functional MHC-I Sequences
In nonmodel species that possess duplicated MHC-I genes, locus-specific primers for amplification of the variable exons usually cannot be designed and alleles from multiple loci are coamplified. Given sufficient similarity, gene fragments, pseudogenes, or other similar genes can also be amplified. To include in our analyses only potentially functional MHC-I alleles, we first removed all sequences with signatures of pseudogenization-frameshifts or internal stop codons. The remaining alleles could still be derived from both classical and nonclassical MHC-I genes. As almost all MHC-I alleles in Lissotriton newts segregate as stable haplotypes (Palomar et al. 2021), classical and nonclassical genes in salamanders are most likely tightly linked. Distinguishing between these two categories of MHC genes on the basis of sequence alone is challenging, probably even more so in salamanders; in the axolotl, sequences with intermediate characteristics between classical and nonclassical MHC-I have been described (Sammut et al. 1999). To minimize the risk that our results are distorted by the inclusion of nonclassical MHC-I sequences, in addition to the data set comprising all putative functional alleles, we also prepared a smaller "conserved anchor" data set, including only sequences that in key residues that anchor the termini of the antigenic peptide contained the amino acids conserved in most classical MHC-I molecules (Kaufman et al. 1994;Sammut et al. 1999). These were Y59(61) in exon 2, and T143(47), K146(50) or R146(50), W147(51), and Y159(66) in exon 3; the numbers following the letter indicate positions in the HLA-A molecule and the numbers in parentheses the positions in our alignments (supplementary fig. S1, Supplementary Material online). By applying this approach, we probably removed also many classical alleles, as in the phylogenetic trees (not shown) only some alleles excluded from the "conserved anchor" data set formed clusters that may represent nonclassical genes, whereas other alleles were scattered on the tree and intermixed with the "conserved anchor" alleles. Therefore, the "conserved anchor" data set is probably a conservative, "worst-case" scenario.

Genetic Diversity
The overall diversity of salamander MHC-I alleles was visualized, separately for exons 2 and 3, with BIONJ (Gascuel 1997) trees constructed from the matrix of Jukes-Cantor distances among DNA sequences of all potentially functional alleles. Alleles were color-coded by family to visualize the extent to which alleles clustered together within salamander families and assess whether orthology is retained over extended evolutionary periods.
A proper test of the phylogenetic correlation between MHC-I and APG diversities requires appropriate diversity measures, and we were interested in both within-individual and species-wide diversity. Because most of the studied genes were duplicated in at least some species and the extent of duplication differed among genes and taxa, standard population genetic measures, such as nucleotide diversity, were not appropriate. Instead, we adopted measures of a and c diversity developed in ecology to study species diversity, but increasingly used also for measuring genetic diversity (Sherwin et al. 2017;Gaggiotti et al. 2018). We used Hill numbers-based phylogenetic diversity as defined by Chao et al. (2014). This approach has three major advantages: 1) it naturally accommodates an arbitrary level of gene duplication; 2) it allows an assessment of the effect of rare and common alleles on diversity within a single framework by varying the q value: q ¼ 0 assigns equal weight to all variants, regardless of their frequency, and diversity corresponds to the number of alleles, q ¼ 1 weights variants according to their frequency and diversity corresponds to the exponential of Shannon's diversity index, and q ¼ 2 gives more weight to frequent variants resulting in the inverse of Simpson's diversity index; and 3) it allows the application of various measures of genetic distance among alleles/haplotypes, including those most relevant for MHC diversity, such as synonymous DNA divergence or functional protein divergence. In this approach haplotypes/alleles were analogous to species, individuals to local communities and all individuals sampled within species to the total community. Thus, phylogenetic a diversity was the sum of branch lengths connecting an individual's alleles in the phylogeny of a given locus, providing a measure of withinindividual diversity. Species-wide diversity was estimated as the phylogenetic c diversity, the sum of branch lengths in the phylogeny of all alleles detected in a species, with various allele frequency weighting schemes applied by varying q, as described in Chao et al. (2014).
MHC-I and APGs in Salamanders . doi:10.1093/molbev/msab237 MBE To calculate within-individual (a) diversity, we used information on allele presence-absence, not attempting to estimate the number of copies of each allele. Hence, a diversities were calculated and interpreted only for q ¼ 0. We deem this approach appropriate, as codominant expression of APGs and MHC-I generally results in their dominant effect on fitness, that is, it is the presence of the allele in an individual, not its number of copies that matters (Radwan et al. 2020). Because the genotype of each individual was known with an accuracy up to the genotyping error, so was its a diversity. The situation with species-wide (c) diversity is slightly more complicated. First, different weightings of allele frequencies (q ¼ 0, 1, 2) provide complementary information so we report and interpret them all. Second, c diversity depends on sample size, with the strongest dependence at q ¼ 0. Differences among species in sample sizes should not generally be a problem in testing evolutionary correlations, as long as they do not differ across categories of genes within species, which was generally the case in our study. Nonetheless, we also calculated diversities for sample sizes standardized to the minimum available for all species-15 individuals, as determined by the sample size of Andrias davidianus. To minimize missing data, the 15 individuals of each species with the highest coverage in MIPs were included.
Because both amplicon and MIP resequencing produced stacks of overlapping paired-end reads starting at defined positions, we performed the diversity analysis at the level of physically phased microhaplotypes ( fig. 3A). For MHC-I exons and BRD2, haplotypes were reconstructed during genotyping with AmpliSAS (Sebastian et al. 2016). The remaining genes were divided into nonoverlapping segments and microhaplotypes were obtained for each segment separately using the R package microhaplot (Ng 2019) and custom R scripts, as described in supplementary methods, Supplementary Material online. Then, phylogenetic withinindividual (a) and species-wide (c) diversities were calculated for each segment. a diversity was the sum of the phylogenetic tree branch lengths connecting the segment's haplotypes within the individual, whereas c diversity was the length of the branches connecting all segment's haplotypes within species, weighted accordingly for various q values ( fig.  3B; Chao et al. 2014). To obtain the per-base/per-amino acid estimate for each gene, the weighted average of segment diversities was calculated, with segments weighted according to their lengths ( fig. 3C).
We used the following measures of sequence divergence: 1) nucleotide divergence at synonymous codon positions estimated using the method of Li (1993), which should be affected mainly by demography, 2) protein divergence estimated with the amino acid p-distance, and 3) protein divergence estimated using the Grantham (1974) distance, which was shown to adequately reflect functional divergence between human MHC alleles (Pierini and Lenz 2018). BIONJ trees were constructed for each segment (see above) from the matrices of genetic distances using ape (Paradis and Schliep 2019) and diversities were calculated in hillR (Li 2018). All analyses were performed in R.

Statistical Analysis
The coevolution hypothesis evaluated in this study predicts a positive correlation between APG and MHC-I diversity when controlling for non-APG (a "covariate") diversity. We tested this prediction using PGLS in caper (Orme et al. 2013), in order to take into account the nonindependence of related species. The model including APG diversity as the response variable and MHC-I and non-APG diversity as continuous predictors was fitted with the pgls function. The nonindependence between residuals was modeled using the maximum likelihood estimate of Pagel's k to transform the variance-covariance matrix obtained from the phylogeny under the Brownian motion model (Freckleton et al. 2002). Separate models were fitted for a and c diversity. Because there was no a priori reason to transform the data and a visual inspection of residuals did not reveal serious departures from normality, we used untransformed diversity estimates in PGLS modeling.
We used the time calibrated phylogeny of Jetz and Pyron (2018), to which we added the recently recognized species O. nesterovi (van Riemsdijk et al. 2017). We also used a modified phylogeny with the topology and divergence times within Salamandridae taken from newer phylogenomics-based phylogenies of the family Salamandridae (Rancilhac et al. 2021) and the genus Triturus (Wielstra et al. 2019). The PGLS modeling results with both phylogenies were virtually identical, so we present only the former.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.