MHC class IIa haplotypes derived by high-throughput SNP screening in an isolated sheep population

Abstract Investigating the current evolutionary processes acting on a highly polymorphic gene region, such as the major histocompatibility complex (MHC), requires extensive population data for both genotypes and phenotypes. The MHC consists of several tightly linked loci with both allelic and gene content variation, making it challenging to genotype. Eight class IIa haplotypes have previously been identified in the Soay sheep (Ovis aries) of St. Kilda using Sanger sequencing and cloning, but no single locus is representative of all haplotypes. Here, we exploit the closed nature of the island population of Soay sheep and its limited haplotypic variation to identify a panel of SNPs that enable imputation of MHC haplotypes. We compared MHC class IIa haplotypes determined by Sanger sequence-based genotyping of 135 individuals to their SNP profiles generated using the Ovine Infinium HD BeadChip. A panel of 11 SNPs could reliably determine MHC diplotypes, and two additional SNPs within the DQA1 gene enabled detection of a recombinant haplotype affecting only the SNPs downstream of the expressed genes. The panel of 13 SNPs was genotyped in 5951 Soay sheep, of which 5349 passed quality control. Using the Soay sheep pedigree, we were able to trace the origin and inheritance of the recombinant SNP haplotype. This SNP-based method has enabled the rapid generation of locus-specific MHC genotypes for large numbers of Soay sheep. This volume of high-quality genotypes in a well-characterized population of free-living sheep will be valuable for investigating the mechanisms maintaining diversity at the MHC.


Introduction
The most highly polymorphic region of the vertebrate genome is the major histocompatibility complex (MHC), and yet the evolutionary processes driving this diversity remain much debated (Bernatchez and Landry 2003;Spurgin and Richardson 2010). MHC genes encode proteins that present pathogen-derived peptides to T-cells, thus enabling an individual to mount an immune response (Klein 1986). Pathogen-mediated selection (PMS) is strongly implicated in driving diversity at the MHC (Bernatchez and Landry 2003;Piertney and Oliver 2006;Spurgin and Richardson 2010). The key PMS mechanisms involved, which are not necessarily mutually exclusive, are heterozygote advantage (Doherty and Zinkernagel 1975;Slade and McCallum 1992) and variation in selection pressure through either negative frequency-dependent selection (Apanius et al. 1997) or fluctuating selection (Hill 1991;Hedrick 2002). Disentangling these mechanisms is challenging in experimental systems which may be incapable of replicating the wide array of pathogens and parasites that occur within a wild host (although see Kalbe et al. 2009 andEizaguirre et al. 2012), and within wild systems which may be swamped with variation in individual ontogeny, environment, genetic background, and pathogen diversity.
A major impediment to understanding current selection patterns on the MHC in natural populations is our inability to accurately characterize variation at the level of the individual, and to do this at scale across many individuals. Developing genotyping methods which attribute alleles to loci and loci to haplotypes are key technical challenges when studying the MHC in nonmodel species (Bernatchez and Landry 2003;Spurgin and Richardson 2010). Large sample sizes are necessary to generate sufficient power in statistical analyses of highly polymorphic loci, such as the MHC. Rare alleles, which are of particular interest in studies of the MHC as they may confer increased fitness under negative frequency-dependent selection or in heterozygote genotypes under heterozygote advantage (Apanius et al. 1997;Hill 1998), enhance the need for large sample sizes. Characterization of each locus individually is required to partition variation amongst loci and determine an individual's true genotype. Using pooled diversity measures generated by co-amplification of multiple indistinguishable loci prohibits assessment of the heterozygous state of each locus, and therefore the ability to effectively assess heterozygous advantage (Spurgin and Richardson 2010). Locus-based genotyping of the MHC, particularly at the haplotype level, is often expensive, time-consuming, or not possible due to gene duplication, gene conversion, and variation in gene number which makes designing locus-specific primers difficult or impossible (Westerdahl 2007;Babik 2010;Whittaker et al. 2012, but see Worley et al. 2008). Moreover, MHC alleles exist in haplotypes and, in the short-term, selection will act on the combination of alleles present in haplotypes or diplotypes. Genotyping only a single locus may not capture the full diversity of the MHC molecules generated by single haplotypes (e.g., Dicks et al. 2019). An effective and reliable genotyping method should, therefore, be able to capture variation at individual loci and across multiple expressed loci.
The Soay sheep (Ovis aries) population of St. Kilda, UK provides an excellent study system in which to look for signatures of the evolutionary mechanisms operating to maintain diversity at the MHC. Soay sheep are descendants of early domestication and have been unmanaged on the island of Soay for probably thousands of years, and since the translocation of 107 sheep to the adjacent island of Hirta in 1932 (Clutton-Brock et al. 2004). The population has been the subject of a long-term study since 1985 (Clutton-Brock et al. 2004). Multiple population censuses per year and the sampling of lambs in spring and of sheep of all age categories in summer has provided a wealth of data on both phenotypic and fitness traits relevant to understanding signatures of evolutionary processes acting on the MHC, as well as a deep understanding of their ecology and demography. Driven primarily by climate and food limitation at high density, the population undergoes large fluctuations in population density (Boyd et al. 1964), varying between approximately 400 and 2000 sheep on the island of Hirta. However, the effective population size was estimated at 194 sheep (Kijas et al. 2012) and the population has generally low genetic diversity compared to other breeds (Lawson Handley et al. 2007;Kijas et al. 2009). This intensively studied island population which is subjected almost exclusively to natural evolutionary processes lends itself to investigating MHC evolution. Indeed, a previous study found that allele frequencies of two MHC-linked microsatellite loci were more even than would be expected under neutral processes, implicating balancing selection (Paterson 1998), and associations have been detected between the MHC-linked microsatellites and parasite resistance (fecal egg counts) and fitness traits (survival in lambs and yearlings) .
Sequence-based genotyping methods have been developed for multiple ovine MHC class IIa loci (Ballingall and Tassi 2010;Ballingall et al. 2015Ballingall et al. , 2018, and variation at class IIa loci was previously characterized in the Soay sheep population using these methods (Dicks et al. 2019). Genotyping of the MHC class II-DRB1, DQA1, 2, and 2-like, as well as DQB1, 2 and 2-like loci revealed eight haplotypes, with some allele sharing amongst haplotypes (but not between loci; see Supplementary Figure S1). No single locus captures all MHC class IIa variation in the Soay sheep population. In addition, sequence-based genotyping of the DQ loci, especially the DQB loci, was complicated by cross-amplification amongst loci and variation in locus configuration (Ballingall et al. 2015(Ballingall et al. , 2018Ali et al. 2017;Dicks et al. 2019) . Sanger sequencing of multiple loci in large numbers of individuals would be costly and slow, yet such haplotypic data is necessary to carry out analyses of the evolutionary processes underlying the high variability in this region.
SNP-based genotyping methods are easily scalable, and human population studies have used selected SNPs that are in linkage disequilibrium (LD) with specific MHC haplotypes to impute MHC haplotypes (Dilthey et al. 2011;de Bakker and Raychaudhuri 2012;Zheng et al. 2014). High LD between loci within subregions is characteristic of the mammalian MHC (Dawkins et al. 1999), including the ovine MHC (Lee et al. 2012). However, high levels of polymorphism and sequence similarity between alleles at different loci within the class IIa genes limits the ability to select SNPs from within coding regions, and sequence data for introns and other noncoding regions of the ovine class IIa is generally lacking. As a consequence, the Illumina Ovine 50 K SNP BeadChip has just one SNP in the putative MHC class IIa region and the Illumina Ovine Infinium HD BeadChip, with an attempted 606,066 SNPs, has just 27 SNPs ( Figure 1).
Here, we develop a high throughput haplotyping system for the Soay sheep MHC class IIa. We hypothesized that on the Illumina Ovine Infinium HD BeadChip there might be a sufficient number of SNPs in the flanking regions that were in LD with the class IIa region to enable imputation of the Soay haplotypes. We then aimed to identify a panel of SNPs capable of imputing the Soay sheep haplotypes, and to genotype them in a large number of individuals spanning the 30-years study period for which DNA was available.

Study system and DNA sampling
The Soay sheep living in the Village Bay area of Hirta, St. Kilda, UK have been intensively monitored since 1985 (Clutton-Brock et al. 2004). Lambs within the study area of Village Bay are caught, ear-tagged, and sampled for genetic analysis soon after birth. Individuals which move into the study area as adults are sampled when possible, during spring, August catch-up or the autumn rut. Individual life-histories are determined through lambing observations, censuses of the Village Bay population during spring, August and autumn, and regular mortality checks. The pedigree of Soay sheep has previously been defined using 315 SNPs selected from the 50 K SNP chip combined with motheroffspring relationships observed during lambing (Bé ré nos et al.

2014).
Ovine Infinium HD SNP BeadChip and MHC class IIa genotyping Johnston et al. (2016) previously genotyped 188 sheep at 606 066 SNPs on the Ovine Infinium HD SNP BeadChip (2016). These sheep were selected from the pedigree in order to maximize the genetic diversity captured by the fewest individuals.
Soay sheep MHC class IIa haplotypes were characterized by Dicks et al. (2019) using Sanger sequence-based genotyping of individual loci. We sequenced the DRB1 and DQA1, DQA2, and DQA2-like loci in half (n ¼ 94) of the sheep that were also genotyped on the Ovine Infinium HD SNP BeadChip, and we sequenced DRB1 and DQA1 in the remaining half (n ¼ 94) individuals. DQA2, DQA2-like and DQB loci were not genotyped due to technical challenges with co-amplification of multiple loci, so it was not possible to distinguish between haplotypes A and G.

Identification of SNPs linked to MHC class IIa haplotypes
The exact locations of the MHC genes on chromosome 20 are poorly defined, so we first estimated the MHC class IIa gene region in the O. aries genome (Oar_v3.1, GCA_000298735.1) on Ensembl (Yates et al. 2016). We searched for protein families including the words "class II histocompatibility" in their description, limiting results to chromosome 20 and unmapped scaffolds, and using the single-copy gene BTNL2 to define the q-arm telomeric end of the class IIa region (Gao et al. 2010;Liu et al. 2011;Lee et al. 2012). The Soay sheep Ovine Infinium HD SNP BeadChip genotypes were then filtered using PLINK v1.90 (Purcell et al. 2007) to select polymorphic SNPs within the identified MHC class IIa region with a genotyping rate >95%, and a minor allele frequency >1%.
Pairwise LD between SNPs within 500 kb of each other was calculated in Haploview (Barrett et al. 2005) as both r 2 and D 0 . LD blocks were identified using the Four Gamete Rule (Wang et al. 2002). Blocks were numbered according to their proximity to the SNP "desert" in the center of the class II region (Figure 1), with those upstream (i.e., closer to the centromere) being negative and those downstream (i.e., closer to the q-arm telomere) being positive. The seven SNPs within block -1 were phased using BEAGLE (Browning and Browning 2007) with standard settings and compared to the class IIa haplotypes. Upstream (more negative) SNPs were then added sequentially to block -1 SNP haplotypes and phased using BEAGLE after each SNP addition. This identified if and where the SNP and class IIa haplotypes matched, and where LD degraded, i.e., where a single class IIa haplotype was represented by multiple SNP haplotypes. The process was repeated for SNPs downstream of the SNP desert.
A recombination between haplotypes B and H was identified (see Results), such that the SNP profile matched haplotype H but the class IIa haplotype was haplotype B. To enable identification of this recombinant haplotype using SNPs alone, the second exons of class IIa genes were searched to identify SNPs suitable for a genotyping assay-biallelic across all haplotypes with a minimum of 50 bp of flanking sequence in each direction containing limited variation.

KASP SNP panel selection and genotyping
Candidate SNPs were validated in silico using Kraken software by LGC Genomics (Hoddesdon, UK) for Kompetitive allele specific PCR (KASP). Of the approved SNPs, alleles and allelic combinations were identified which were unique to each class IIa Sanger haplotype, preferentially selecting those closest to the SNP "desert" in the center of the class IIa region. A minimum subset of SNPs was identified which were able to impute the class IIa diplotype of an individual with degeneracy. The final panel of SNPs used for KASP genotyping included 11 SNPs selected from the Ovine Infinium HD BeadChip and two intragenic DQA1 SNPs (Table 1).
DNA was extracted from tissues (ear punch or post-mortem) or buffy coat samples for a previous study using the Illumina Ovine50K SNP chip (Bé ré nos et al. 2014). The Qiagen DNeasy 96 Blood and Tissue Kit was used following the manufacturer's protocol, with the exception that the final elution was performed using two 50 ll elutions (Bé ré nos et al. 2014). DNA was normalized to 50 ng/ll for 5951 individuals and distributed across 66, 96-well plates. A single sheep-the "golden sheep" (ID 7568) -was included on 61 plates to confirm genotyping consistency and a no template control (NTC) was included on all plates. An additional 200 individuals were genotyped more than once for reasons unrelated to this work (e.g., when DNA was used for genotyping on the Ovine50K SNP chip, the DNA failed at the first attempt and so was re-attempted on a subsequent plate-note that differences in genotyping technologies between Illumina arrays and KASP may mean that a sample could be successful with one method but not the other).
LGC Genomics performed KASP genotyping and automatic genotype calling for all 66 plates.

Quality control and SNP haplotyping
Quality control was carried out excluding DQA1 intragenic SNPs as they carry an expected null allele. PLINK v1.90 (Chang et al. 2015) was used to exclude individuals with more than 50% missing genotypes. Genotypes of the "golden sheep" (ID 7568) were compared across all plates to identify any systematic errors. For other duplicated individuals, DNA was expected to be of poor quality and therefore more susceptible to allelic dropout, and so the genotyping attempt with the highest match to parental genotypes was included. For individuals genotyped in triplicate, the genotyping attempt with the fewest mismatches to alternative attempts was retained. Assessments of the multiple genotyping attempts of the "golden sheep" and 201 repeated individuals, as well as Mendelian error checks between offspring and parents, indicated a high genotyping error rate on DNA plates 1-3, the very oldest plates, and so all genotypes from these plates were discounted. Individuals with more than two (of 11) missing genotypes were excluded, and SNPs were phased using BEAGLE v4.0 (Browning and Browning 2007). Phased SNP haplotypes were assigned to their corresponding class IIa haplotypes. Twenty-six individuals with novel haplotypes were excluded as these were presumed genotyping or phasing errors (maximum novel haplotype frequency was four). A Mendelian inheritance check was carried out on the inferred haplotypes for each individual using the Soay sheep pedigree (Bé ré nos et al. 2014). Three individuals whose phased MHC diplotypes mismatched the diplotype of at least one parent were excluded from further analysis. Five thousand three hundred and forty-nine sheep passed quality control measures.

Detection of B-H recombinants
The class IIa gene DQA1 is not present in all haplotypes (Ballingall et al. 2018;Dicks et al. 2019) and consequently the DQA1 SNPs are effectively tri-allelic-they have a major allele (nucleotide G for locus DQA1-171 and nucleotide A for locus DQA1-195, haplotypes B, C, D, and F), a minor allele (nucleotide C for both SNPs, haplotype H), and a null allele (haplotypes A, E, and G) where the DQA1 locus is absent from the haplotype. A DQA1 genotype failure can, therefore, represent both true failures and the diplotype of an individual carrying two DQA1*null haplotypes. For all sheep identified as carrying haplotype H, the DQA1 SNP genotypes were assessed to identify B-H recombinants.

Hardy-Weinberg Equilibrium tests
Hardy-Weinberg equilibrium (HWE) predicts genotype (or diplotype) frequencies under random mating at loci with no selection, mutation, or migration (Guo and Thompson 1992), and therefore, deviation from HWE would indicate that natural selection (including heterozygote advantage and negative frequency-dependent selection) or sexual selection (including MHC based mate choice) may be acting on the haplotypes (Hedrick 2004), or that some other demographic process is acting (Guo and Thompson 1992). As selection can act more or less at different life-history stages (LH stages), we assessed HWE for all known conceived sheep (live born, still born, and fetuses of mothers who died over winter), live born (survived to be tagged within approximately 1st week of life), survived to 1st August (approximately 4 months), and survived to 2nd August (approximately 16 months). Individuals were known to be alive if they were observed in the summer census or August catch, or if they had a known birth year prior to the observation year and were observed alive or known to have died at a later date. HWE tests were performed in the R package GenePop v1.0.5 (Rousset 2008) as probability tests (the exact test of Haldane 1954; Guo and Thompson 1992;Weir 1996) where the null hypothesis is that the same allele frequencies are observed) and U-tests for heterozygote excess (where the alternative hypothesis is heterozygote excess Raymond and Rousset 1995) for each LH stages (all cohorts combined) and for each cohort (LH stages) or year (standing population). Sequential Bonferroni correction (Holm 1979) was used to account for multiple testing.
Unless otherwise stated, all bioinformatic processes analyses were carried out in R version 3.3.3.

Identification of SNPs linked to MHC class IIa haplotypes
Full class IIa diplotypes were determined by sequence-based genotyping for 135 out of the 188 individuals genotyped on the Ovine Infinium HD BeadChip. Partial diplotype information was determined for an additional 50 individuals which carried haplotypes A or G, but these haplotypes were not differentiated (using only DRB1 and DQA1 loci). Three individuals failed sequence-based genotyping.
Nine protein families represented by 22 transcripts on the Oar_v3.1 genome were identified with the term "class II histocompatibility" in their descriptions, as well as BTNL2 (Supplementary Table S1). Five transcripts were located on unmapped scaffolds. The remaining 17 transcripts were clustered in two regions of chromosome 20, 7.164-7.426 Mb and 25.353-25.812 Mb, with the former including primarily class IIb gene annotations (DO and DM genes) and the latter class IIa gene annotations (DR and DQ genes). DQA gene annotations were found in both class II regions. The unusual ruminant class IIa gene DYA is located in the former region, 7.164-7.426 Mb (Ballingall et al. 2001) and so it is likely that this gene has been poorly annotated . Therefore, it is more likely that the DQA gene is located in the latter class IIa region, 25.353-25.812 Mb, along with the DR and other DQ genes. Previous assemblies of this region using BAC cloning did not identify DQ loci outside of the class IIa region (Liu et al. 2006;Gao et al. 2010). The classical class IIa genes are therefore expected to be located between 25.353 and 25.812 Mb.
On the Ovine Infinium HD BeadChip, there were 11,757 HD SNPs located on chromosome 20, of which 8420 SNPs passed quality control in the Soay sheep sample. Twenty-one SNPs were located in the class IIa region predicted above ( Figure 1B).
Pairwise LD estimates of D 0 for the SNPs between 25.353 and 25.812 Mb were typically very high (Figure 2). There was no obvious pattern to the degradation of LD, which would be suggestive of a recombination hotspot. On the other hand, r 2 estimates, were typically very low (Supplementary Figure S2) and were also uninformative in identifying linkage regions. Given the lack of obvious LD pattern, linkage blocks were identified using the Four Gamete Rule in Haploview (shown in Figure 2).
Block -1, which is immediately upstream of the HD SNP gap, consisted of six SNPs and formed six haplotypes (Figure 3). Four of the six haplotypes in block -1 each corresponded to a single Sanger haplotype, and two each corresponded to a pair of haplotypes, A and G, or B, and H (Figure 3). The addition of upstream SNPs to block -1 showed that linkage degraded immediately upstream of block -1 (at SNP oar3_OAR20_25383037) within haplotype E, but haplotypes A and G, as well as haplotypes B and H, only became differentiated at oar3_OAR20_25346061 (block -3). Downstream of the SNP "desert," haplotypes B and H became differentiated at oar3_OAR20_25742763 (block 1), and A and G differentiated with the addition of oar3_OAR20_25752381 (adjacent to block 1). Linkage within haplotypes did not degrade until oar3_OAR20_25833644 (within block 6) for both haplotypes B and E (Figure 3) Figure 2 D 0 linkage disequilibrium estimates for pairs of HD SNPs within and surrounding the class IIa gene region, where values of 1 represent high LD, and 0 represents low LD. SNPs within the class IIa gene region are highlighted by the grey shading. The SNP "desert" is denoted by the green box, and SNP pairs that fall within this green box represent pairs spanning this SNP "desert" (i.e., that one SNP is upstream and the other downstream). Open black boxes show linkage blocks identified within Haploview using the Four Gamete Rule.

Haplotype B and H recombinant detection
Following Sanger sequencing individual 4179 was found to be heterozygous for haplotypes for B and G. However, the flanking SNPs revealed a recombination event which did not affect the expressed haplotype but affected the haplotype that would be imputed using the panel of 31 SNPs within the class IIa linkage region. Using only the seven SNPs upstream of the SNP "desert," 4179's SNP diplotype was consistent with haplotypes B/H (identical in this region) and G (Figure 3,

KASP assay selection and genotyping
All 31 SNPs within the class IIa linkage region that were selected from the Ovine Infinium HD SNP BeadChip passed in silico validation for KASP. From this shortlist, we selected 11 SNPs enabling imputation of the eight class II haplotypes whilst maintaining degeneracy in case of genotyping failures ( Figure 4; Table 1). To enable detection of the B-H recombinant haplotype, two SNPs within exon 2 of the DQA1 gene were identified (DQA1-171 and DQA1-195; Supplementary Figure S3) that met the requirements for KASP assay design. No suitable biallelic SNPs with sufficient invariant flanking sequence could be identified in other MHC class IIa genes to make this distinction. DQA1 is not present on three Soay sheep MHC class IIa haplotypes, A, E, and G, which instead carry DQA2-like alleles. As such, DQA1 SNPs are effectively triallelic and consist of a major allele, minor allele, and null allele (see Figure 4). Five thousand nine hundred and fifty-one individuals were genotyped by LGC Genomics using KASP at the final panel of 13 SNPs.

Quality control
The "golden sheep" was KASP genotyped 57 times, with an overall genotyping error rate of 0.48% and a maximum error rate of 1.85% per locus. After excluding DNA plates 1-3 due to exceptionally high error rates, 5675 individuals were KASP genotyped for the panel of 13 SNPs. Quality control measures on the 11 flanking SNPs, and samples sizes at each stage are shown in Supplementary Figure S4. The genotyping rate was below 50% for 195 sheep, which were excluded.  oar3_OAR20_25255531  oar3_OAR20_25257945  oar3_OAR20_25257947  oar3_OAR20_25272020  oar3_OAR20_25285640  oar3_OAR20_25297078  oar3_OAR20_25302189  oar3_OAR20_25305860  oar3_OAR20_25319392  oar3_OAR20_25327353  oar3_OAR20_25334790  oar3_OAR20_25336466  oar3_OAR20_25346061  oar3_OAR20_25373238  oar3_OAR20_25375239  oar3_OAR20_25383037  oar3_OAR20_25393230  oar3_OAR20_25417300  oar3_OAR20_25419505  oar3_OAR20_25441205  oar3_OAR20_25447962  oar3_OAR20_25458354  oar3_OAR20_25460053  oar3_OAR20_25685291  oar3_OAR20_25742711  oar3_OAR20_25742763  oar3_OAR20_25752381  oar3_OAR20_25754527  oar3_OAR20_25759002  oar3_OAR20_25761264  oar3_OAR20_25763641  oar3_OAR20_25774758   OAR20_27259292.1  oar3_OAR20_25782804  oar3_OAR20_25792461  oar3_OAR20_25793665  oar3_OAR20_25793818  oar3_OAR20_25800652  oar3_OAR20_25803142  oar3_OAR20_25806179  oar3_OAR20_25815134  oar3_OAR20_25815236  oar3_OAR20_25815388  oar3_OAR20_25819681  oar3_OAR20_25820480  oar3_OAR20_25829244  oar3_OAR20_25831572  oar3_OAR20_25833644  oar3_OAR20_25836502  oar3_OAR20_25836516  oar3_OAR20_25836613  oar3_OAR20_25843578  oar3_OAR20_25849183  oar3_OAR20_25849318  oar3_OAR20_25852689  oar3_OAR20_25922589  oar3_OAR20_25923324  oar3_OAR20_25924943  oar3_OAR20_25936577  oar3_OAR20_25945680  oar3_OAR20_25955212  oar3_OAR20_25976471   77  1  1   After assessing genotyping attempts for repeated individuals and excluding the least successful genotyping attempt, 102 individuals were removed due to having two or more missing genotypes. The 11 flanking SNPs were phased in the remaining 5378 individuals. This revealed eight frequent haplotypes which exactly matched the eight expected class IIa haplotypes. Twenty novel haplotypes were identified in 26 individuals (with a maximum frequency of four, see Supplementary Table S2). Information on both parents was available for 15 of the 26 individuals, and only one instance was identified where a novel haplotype was present in both a parent and offspring (haplotype AACCGGATACA, Supplementary Table S2). Thus, all novel haplotypes were excluded from further analysis as they were considered probably due to genotyping errors or extremely rare haplotypes. Mendelian error checks identified only three sheep whose haplotypes mismatched their parents'; the offspring were removed from the dataset. The final dataset consisted of diplotypes for 5349 individuals. In this sample, all SNPs were in HWE, except for the DQA1 SNPs which were expected to deviate from HWE due to null alleles (Table 1).

DQA1 SNP genotypes
DQA1 SNP genotypes were compared to the diplotypes identified by the panel of 11 flanking SNPs. The two selected DQA1 SNPs, DQA1-171, and DQA1-195, are tri-allelic such that haplotype H carries the minor allele (C for both loci), four haplotypes (B, C, D, and F) have the major allele (G or A respectively), and three haplotypes (A, E, and G) do not carry the locus and present as null alleles. As such, it is possible to predict the expected genotypes for the DQA1 SNPs from an individual's diplotype. In an initial test sample of the first 801 sheep genotyped, five had unexpected DQA1 genotypes, all of which carried DQA1*null haplotypes (see Supplementary Figure S5), which is an error rate of 0.62%. In addition, genotypes classified as fail, which include true negatives, true failures to amplify, and homozygote null-alleles, showed greater variance in fluorescence for the DQA1 SNPs compared to the panel of 11 truly bi-allelic SNPs. This suggests nonspecific amplification may be occurring in the absence of the target locus. Nevertheless, B-H recombinant haplotypes could be identified (see Supplementary Figure S5).
In the full dataset, genotypes for the two DQA1 SNPs were assessed only in individuals carrying haplotype H (N ¼ 906), identifying 56 recombinant individuals. After recoding recombinant haplotypes as R, all recombinant individuals were found to be descendants of the individual in which the recombinant haplotype was identified, male 4179 ( Figure 5). Furthermore, the mother of 4179 carried both B and H haplotypes, and thus was the likely origin of the cross-over event.

Haplotype and diplotype frequencies
Of 5349 individuals, 976 (17.9%) had homozygous diplotypes. Haplotype B was the most frequent, whilst haplotype D the rarest, with only six individuals homozygous for haplotype D ( Figure  6B). All possible diplotypes were observed ( Figure 6A). Haplotype frequencies for individuals born in 3-year periods from 1985 to 2012 (5137 individuals with known birth years) varied over time (Figure 7).

HWE tests
None of the four life-history stages considered were found to deviate from HWE (Table 2). HWE tests were carried out for each cohort independently at each of the four life stages. The 2012 "known conceived" cohort was marginally out of HWE for the exact test (P ¼ 0.023, a threshold ¼ 0.025), with more homozygotes than expected (H E ¼ 235.30, H O ¼ 223), as was the 2004 "live born" cohort (P ¼ 0.021, a threshold ¼ 0.025), also with more homozygotes than expected (H E ¼ 232.02, H O ¼ 241). All other years and tests for each LH did not deviate from HWE (Supplementary  Table S3).

Discussion
Analyses of the evolutionary processes underlying variation at the MHC require population-scale data. To achieve this, a SNPbased genotyping method was developed for rapid and accurate determination of the class IIa diplotypes in Soay sheep. Using SNPs selected from the Ovine Infinium HD BeadChip, a linkage region surrounding the class IIa genes was identified, and from it a minimal panel of 11 SNPs was selected that would enable imputation of the class IIa haplotypes. Furthermore, a recombination event between haplotypes B and H was detected, and two intragenic SNPs were selected to detect such recombinant haplotypes. Finally, this panel of 13 SNPs was applied to 5349 Soay sheep using KASP genotyping technology.
The classical MHC class IIa genes were estimated as occurring between 25.353 and 25.812 Mb on the O. aries genome, although poor mapping within this region prevents identification of the exact positions of MHC genes. Nevertheless, 21 SNPs on the Ovine Infinium BeadChip were located within the identified class IIa gene region, with a 0.225 Mb "SNP desert" located in the center of this region. This lack of SNPs is not a reflection of a lack of variable SNPs in this region but rather their unsuitability for incorporation on a SNP chip due to the presence of low minor allele frequencies, and variable flanking sequences that prevent the design of robust primers and probes.
Pairwise measures of LD, D 0 and r 2 , showed opposing patterns across the MHC class IIa region. Pairwise D 0 was typically very high, whilst pairwise r 2 was typically very low in this region. This opposing pattern of D 0 and r 2 measures of LD was also found in the classes IIa, IIb, and III regions in Rylington Merino sheep (Lee et al. 2012). VanLiere and Rosenberg (2008) showed that r 2 and D 0 are not always directly related, and r 2 is sensitive to MAF, especially MAF < 0.3 which is not uncommon within the MHC. Lee et al. (2012) showed that pairwise LD using D 0 in Rylington Merino sheep was high within class IIa but decreased between the classes IIa and IIb regions, as well as between classes IIa and III regions. This suggests recombination within class IIa is reduced compared to recombination between it and the neighboring MHC subregions. Pairwise LD measures were therefore perhaps unlikely to reveal a relationship between the class IIa haplotypes and flanking SNPs on the Ovine Infinium HD BeadChip.
An alternative method of assessing the pattern of LD between the flanking SNPs and the class IIa region, the Four Gamete Rule, was used to identify linkage blocks as a starting point to cumulatively add SNPs to determine where SNP haplotypes degraded. This method detected a linkage region of 31 SNPs that matched the class IIa haplotypes. This class IIa linkage region extended between 25.39 and 25.83 Mb according to the allocated physical positions of the SNPs, which sits well within the estimated class IIa gene region of 25. 35-25.81 Mb. The downstream extent of the estimated class IIa gene region is marked by the gene BTNL2, which, because it is a single copy gene, is more likely to be accurately mapped than the other multi-copy genes in this region. The overlap between the estimated class IIa gene region and the SNP linkage region suggests that at least 24 of the 31 SNPs are located within the estimated class IIa region.
Diplotypes for the 5349 Soay sheep which passed quality control can be considered highly accurate. Indirect MHC typing methods, such as SNP genotyping, are faster and cheaper than direct sequencing methods but may incur an increased error rate which must be minimized. Phasing SNPs to determine the two SNP haplotypes carried by an individual, carried out here with BEAGLE (Browning and Browning 2007), uses information from all individuals within the dataset to calculate the most likely haplotypes. BEAGLE also uses this information to impute missing SNP genotypes. Therefore, phasing errors are increased when there is a high genotyping error rate, and thus errors can be reduced by using a more stringent genotyping rate cut-off across all SNPs. The repetition of a single individual on almost all DNA plates was valuable in detecting systematic errors on three plates, probably due to the DNA having been exhausted or degraded by freezethawing and revealed a genotyping error rate of 0.48% (after excluding plates 1-3). The Mendelian inheritance check demonstrated that the phased haplotypes were consistent between parents and offspring, with only three exceptions. In addition, only 20 novel haplotypes were identified in 26 individuals (0.5% of haplotyped individuals), which were probably due to genotyping errors, including allelic dropout. Whilst this suggests that genotyping errors can lead to incorrect haplotype inferences, in this population, with good pedigree information, it is clearly possible to detect such errors.
A recombination event between haplotypes B and H (Sanger haplotype B upstream, SNP haplotype H downstream) was identified in one individual (ID 4179) genotyped on the Ovine Infinium HD BeadChip. The recombination event must have occurred downstream of the expressed class IIa genes characterized within the Sanger haplotypes (DRB1, DQA, and DQB) but upstream of the SNP oar3_OAR20_25742763, leaving the functional class IIa haplotype unaffected. Individuals were selected for genotyping on the Ovine Infinium HD BeadChip because they were highly representative of the genetic variation within the Soay sheep population (Johnston et al. 2016), and male 4179 was included because he was very prolific with a large number of descendants. Using two SNPs located within the exons of the class IIa locus DQA1, this recombination was detected in 54 of 4179's descendants and, because both of 4179's parents were genotyped, we were able to detect the origin of the recombination event during gametogenesis in his mother. The use of SNPs located within loci that are not present on every haplotype, as for the two DQA1 loci, was less than ideal due to the increased error rate caused by nonspecific amplification. Nevertheless, their inclusion on the KASP panel enabled the identification of a substantial proportion of recombinant individuals with high accuracy, as apparent by a lack of parent-offspring mismatches following recombinant calling. Were this recombination event not identified, and therefore not detectable, a substantial number of haplotypes would have been incorrectly imputed, which could have impacted subsequent analyses. Whilst the DQA1 SNPs were effective for detecting the identified downstream B to H recombination, it is unlikely to be possible to detect novel recombination events involving upstream SNPs between haplotypes A and G. The six SNPs located upstream of the class IIa genes and the DQA1 SNPs were identical for these haplotypes (see Figures 3 and 4). All other haplotype pairs have unique upstream and downstream SNP profiles.
All possible diplotypes were identified within the population, although the rarest, homozygote for D, was only identified in six out of the 5349 individuals genotyped (0.11%). Over the study period , 5137 haplotyped individuals had known birth years (or expected birth years in the case of fetuses). There was no evidence of dramatic shifts in haplotype frequencies, but they were found to vary over time. There is evidence that haplotype frequencies have changed over time (Figure 7), for example, there may be a slight decrease in haplotype C and a slight increase in haplotype D, which warrants further investigation to identify whether these shifts are significant, and if so, if they are due to natural or selective forces.
There was no evidence of deviation from HWE for any of the LH stages assessed. Whilst HWE was also assessed in each cohort at each LH stage, sample sizes were often relatively small, which may limit the power to detect any deviations. Deviations from HWE were detected using the exact test for the "known conceived" stage in 2012 and for the heterozygote excess test in at the "live born" stage in 2004 (both an excess of homozygotes); however, significance was marginal after applying sequential Bonferroni correction and so some caution should be applied in considering this evidence of deviation from HWE. Whilst deviation from HWE can indicate selective or demographic processes operating within the population (Guo and Thompson 1992;Spurgin and Richardson 2010), lack of deviation from HWE does not necessarily provide good evidence to the contrary (Spurgin and Richardson 2010).
Genotyping the MHC is notoriously challenging (Bernatchez and Landry 2003;Piertney and Oliver 2006;Babik 2010;Spurgin and Richardson 2010) but accurate, locus-specific genotypes are vital for unravelling the modes of evolution operating to maintain diversity at the MHC. Previous work identified eight MHC class II haplotypes in the island population of Soay sheep (Dicks et al. 2019) using sequence-based genotyping, and here we have described a SNP-based (KASP) genotyping method which enabled rapid genotyping of 5349 sheep (out of a total 5675 sheep attempted). A panel of SNPs has the advantage of being relatively  easy to genotype, suitable for high-throughput genotyping and does not suffer from artifacts, but it is susceptible to incorrect haplotype phasing due to genotyping errors, particularly allelic dropout which may be most prevalent in poor quality or quantity DNA (Gill 2001;Butler 2005). The stringent quality control measures applied here should minimize any effects of genotyping and phasing errors, and the Soay sheep pedigree was particularly valuable here in enabling the inclusion of a Mendelian inheritance check as a quality control measure. As a method for genotyping MHC in a nonmodel organism, however, this panel of SNPs required high initial optimization effort in terms of preliminary class IIa sequencing and is not easily transferable to other breeds of sheep or other species without extensive preliminary sequencing of haplotypes. Nevertheless, this method has provided a valuable haplotype-level MHC data set that is accompanied by extensive phenotypic and fitness data that will enable future studies on the evolutionary processes acting to maintain diversity at the MHC in a natural environment.

Data availability
Raw genotype files and scripts used to convert raw genotypes into MHC genotypes can be found at figshare: https://doi.org/ 10.25387/g3.14673939.