Abstract

Detailed knowledge of linkage disequilibrium (LD) is regarded as a prerequisite for population-based disease gene mapping. Variable patterns across the human genome are now recognized, both between regions and populations. Here, we demonstrate that LD may also vary within a genomic region in a haplotype-specific manner. In 864 unrelated Caucasian individuals, we describe haplotype-specific LD patterns across the human MHC by the construction of gene-specific allelic haplotypes at 25 loci between HLA-A and Tapasin. Strong and extensive LD is found across both common and rare haplotypes, suggesting that haplotype structure is influenced by factors other than genetic drift, including both selection and differential haplotype recombination. Knowledge of haplotype-specific LD in the HLA may explain the apparently discrepant data from previous studies of global LD, help delineate key areas in mapping HLA-associated diseases and, together with recombination data, provide valuable information about a population's demographic history and the selective pressures operating on it.

INTRODUCTION

Linkage disequilibrium (LD) refers to the non-random association between alleles at adjacent genetic loci. The pattern of LD across the human genome varies markedly between regions and populations (1–3), and is currently a subject of intense interest both as a mapping tool for the identification of disease susceptibility genes, and for what it may reveal about a population's demographic history and the selective forces acting thereon. An understanding of regional LD may be exploited to fine map susceptibility genes within replicated linkage areas (4). This approach of LD mapping overcomes the prohibitive requirement of linkage analyses for large numbers of families (5) by making use of the many opportunities for crossovers between markers and a disease locus since the first appearance of the mutation. In addition, knowledge of LD will be an essential tool of future genome-wide association studies, providing estimates of the minimum marker density required to achieve acceptable power in a given region.

While LD might be expected to decline monotonically with distance from a specified marker due to recombination, a number of recent studies have failed to show a consistent relationship between LD and genetic distance. Indeed, in a recent study physical distance could account for less than 50% of the observed variation in LD (1). Other factors that influence LD include changes in population demographics (population size, admixture and migration), selective forces and local variation in recombination rates.

Studies have attempted to characterize LD in defined regions using either single-nucleotide polymorphisms (SNPs) or microsatellite markers. Results have generally been disappointingly noisy and inconsistent, with important localization information obscured by properties of non-relevant markers. Characterizing LD using underlying haplotypes rather than individual SNPs is preferable (6) as knowledge of common haplotypes and the unique SNPs that ‘tag’ them may help explain the complex patterns of LD, and minimize effort required to examine common variation within a genetic region (7).

Few genetic systems are as thoroughly characterized at the genomic, population and functional levels as the MHC. This region, known in humans as the human leukocyte antigen (HLA) complex, is located on chromosome 6p, and contains more than 200 densely packed, highly polymorphic genes, many of which are involved in immune responses. Global LD characteristics between classical class I and II genes have been investigated for over 30 years (8–11), and the term ‘ancestral haplotype’ has been coined to describe conserved, population-specific haplotypes (12). These comprise stretches of consistently high LD (‘frozen blocks’) interspersed with short intervals of rapid breakdown where blocks have been shuffled in recent evolution. This produces a step-wise decay in LD, indicating that recombination is localized to hotspots between highly conserved areas in this region. This relationship between global LD and recombination has recently been confirmed within a 216 kb region of HLA class II using sperm typing (13). However, little is currently known about LD across specific HLA haplotypes. This knowledge, in conjunction with understanding of regional recombination, may yield useful information regarding the role of HLA in disease resistance and susceptibility, and in addition may elucidate functional relationships between loci. The aim of this study was to investigate patterns of LD across specific common Caucasian extended HLA haplotypes by mapping across 3.5 MB from HLA-A to TAPBP in a large, ethnically homogeneous population.

RESULTS

A total of 864 unrelated UK Caucasians were genotyped for markers across 25 genes over 3.5 MB from HLA-A to TAPBP. Individuals were recruited from a Crohn's disease cohort, an ulcerative colitis cohort and healthy controls. For the classical HLA and MIC genes, haplotypes were resolved experimentally using allele-specific PCR amplification. SNPs within all other genes were constructed into intra-gene haplotypes using the statistical haplotype reconstruction method implemented in the PHASE software (14). Haplotype reconstruction in studies of either families or unrelated individuals is critically dependent upon the genotyping error rate (15). Identifying such errors is more difficult in studies of unrelated individuals than in family studies. In our cohort we attempted to minimize this error by re-genotyping all samples identified as possessing rare intra-gene haplotypes (defined as occurring fewer than 10 times). This iterative process proved to be a useful method of highlighting genotyping errors, significantly reducing the number of identified haplotypes for each gene.
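The re-genotyping rule described above can be illustrated with a minimal sketch. The function name, the data layout (a mapping from sample ID to a pair of reconstructed haplotype labels) and the toy data are assumptions made for illustration only; they are not taken from the study's pipeline.

```python
from collections import Counter


def flag_rare_haplotypes(haplotype_calls, min_count=10):
    """Return the sorted sample IDs that carry at least one haplotype
    observed fewer than min_count times across all chromosomes."""
    # Count every haplotype copy (two per individual).
    counts = Counter(h for pair in haplotype_calls.values() for h in pair)
    rare = {h for h, n in counts.items() if n < min_count}
    return sorted(s for s, pair in haplotype_calls.items() if rare & set(pair))


# Hypothetical toy data: three individuals, threshold lowered to 2 for brevity.
calls = {"s1": ("H1", "H1"), "s2": ("H1", "H2"), "s3": ("H1", "H3")}
to_regenotype = flag_rare_haplotypes(calls, min_count=2)
print(to_regenotype)
```

In an iterative scheme of the kind described, the flagged samples would be re-typed, the haplotypes reconstructed again, and the check repeated until no further corrections arise.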

For each intra-gene haplotype, PHASE was run separately for each disease group and controls, which revealed identical intra-gene haplotype structure (data not shown), although, as expected, significant differences in frequency were apparent. The probability of accurate phase call for each constituent SNP within an intra-gene haplotype was >95% for 99.2% of SNPs. Intra-gene haplotype definitions for the combined population are shown for all genes, apart from the classical HLA class I and II and MIC genes for which established WHO nomenclature is used (Fig. 1).

Patterns of LD in this region were initially studied using the D′ metric (16) according to the method of Reich et al. (2). In order to calculate D′, maximum-likelihood two-locus haplotype frequencies were estimated using the expectation-maximization (EM) algorithm implemented in the ARLEQUIN computer program (17). LD was measured between each intra-gene haplotype block or SNP and a predetermined anchor point. The choice of anchor point using this method is arbitrary, but in this study HLA-B was chosen as the most informative locus due to its high degree of polymorphism. Data using other HLA anchor points are available on request. In order to study patterns of LD decay along individual haplotypes, we needed to define extended haplotypes. In this study we term these ‘surrogate inter-gene haplotypes (SIH)’ to reflect the fact that these are constructed from a single fixed anchor point using pair-wise comparison: SIH were defined at each locus by the intra-gene haplotype that demonstrated the strongest LD (as measured by D′) from the selected anchor point, and were constructed for the 14 most common HLA-B alleles (Table 1). These comprise 83.7% of all HLA-B alleles in this population. SIH-specific LD is presented graphically as haplotopes (Fig. 2), which demonstrate that D′ declines with distance in a stepwise fashion from the anchor point. Importantly, haplotopes in patients and controls were very similar, as shown for the three most common HLA-B alleles in Figure 2B. Therefore, to maximize information, the haplotopes presented are derived from the combined patient and control groups.
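As a rough illustration of the two steps described above, the following sketch computes a pairwise D′ from estimated frequencies and then selects, at each locus, the intra-gene haplotype in strongest LD with the anchor allele. It assumes the standard Lewontin normalization applied to one allele pair at a time; the exact multi-allelic treatment used in the study is not reproduced here, and the function names and numeric values are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute D' between allele a (frequency p_a) and allele b
    (frequency p_b), given their joint haplotype frequency p_ab."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


def select_sih(d_prime_by_locus):
    """For each locus, keep the intra-gene haplotype with the highest D'
    relative to the anchor allele."""
    return {locus: max(scores, key=scores.get)
            for locus, scores in d_prime_by_locus.items()}


# Hypothetical D' values for one anchor allele at two loci.
scores = {"MICA": {"*00801": 0.95, "*004": 0.40},
          "TNF": {"1": 0.30, "3": 0.80}}
print(select_sih(scores))
```

Because each locus is chosen independently by pairwise comparison with the anchor, the resulting SIH is a surrogate and need not correspond to a haplotype carried intact by any one chromosome.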

The haplotopes clearly show that regional LD varies according to haplotype: strong LD is observed across the SIHs defined by both common alleles (e.g. HLA-B*0801, HLA-B*5701, HLA-B*4403) and the less frequent (e.g. HLA-B*1302). The variability of LD in this region is illustrated by the half-length of LD, which is greater than 3.5 MB for the HLA-B*0801 SIH compared with 0.4 MB for HLA-B*1801. All SIH demonstrate consistent drops in LD centromeric to HLA-DQ, reflecting the known recombination hotspots in this area (18). Noise is introduced by genes where only single markers are studied, demonstrating the advantage of characterizing LD using underlying gene blocks rather than individual SNPs.
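The half-length of LD can be read off a haplotope as the first distance from the anchor at which D′ falls below 0.5. The sketch below assumes a simple list of (distance, D′) pairs along one direction from the anchor and does no interpolation between markers; the function name and the profile values are hypothetical.

```python
def half_length(profile, threshold=0.5):
    """Return the smallest distance at which D' is below threshold,
    or None if D' stays at or above threshold across the whole profile.

    profile: iterable of (distance_from_anchor, d_prime) pairs.
    """
    for distance, value in sorted(profile):
        if value < threshold:
            return distance
    return None


# Hypothetical profiles (distances in MB from the anchor).
conserved = [(0.1, 0.98), (1.0, 0.92), (3.5, 0.71)]
decaying = [(0.1, 0.90), (0.4, 0.45), (1.2, 0.30)]
print(half_length(conserved), half_length(decaying))
```

A return value of None corresponds to a haplotype whose half-length exceeds the region surveyed.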

The construction of SIH is dependent upon the anchor point selected and is limited by pair-wise comparison of markers. As an alternative, fixed multi-locus haplotypes (MLH) were constructed. To construct MLH from multiple intra-gene haplotypes we attempted to use both the EM and the Markov chain Monte Carlo (MCMC) algorithms implemented in the ARLEQUIN and PHASE computer programs, respectively. However, using either a 500 MHz G4 Macintosh (OSX) or a 1.8 GHz Pentium 4 computer (Linux OS), we found that only the PHASE program was able to process the large volume of data required. PHASE is primarily designed to analyse bi-allelic data or multi-allelic microsatellite data. To analyse multi-allelic data that do not follow a stepwise mutational process, as in our data, the stringent PHASE criteria can be relaxed using the −d option (14). In addition, there are known recombination points within the region analysed, which violate a further assumption of PHASE. Nevertheless, communication with the author of PHASE and our own initial trials (data not shown) suggest that PHASE is able to deal with some recombination without significant loss of accuracy. To minimize the amount of recombination included in this analysis, we confined MLH to a smaller region between HLA-Cw and HLA-DQB1. In construction of these MLH the phase call certainty was estimated to be more than 95% for 93.6% of all constituent intra-gene haplotypes. Some 63% of all MLH achieved a phase call certainty of greater than 95% at all constituent intra-gene haplotypes. This figure reflects the size of the region studied (3.5 MB) and the number of loci analysed within it.
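The two certainty figures quoted above are different summaries of the same per-locus phase-call probabilities: one is taken over all individual calls, the other over whole haplotypes. A minimal sketch, assuming the probabilities are available as one list per multi-locus haplotype (the layout, function name and values are hypothetical and do not reflect PHASE's output format):

```python
def phase_certainty_summary(certainties, threshold=0.95):
    """Return (fraction of individual phase calls above threshold,
    fraction of haplotypes with every call above threshold).

    certainties: list of lists, one inner list of per-locus
    phase-call probabilities for each multi-locus haplotype.
    """
    calls = [p for hap in certainties for p in hap]
    frac_calls = sum(p > threshold for p in calls) / len(calls)
    frac_haps = sum(all(p > threshold for p in hap)
                    for hap in certainties) / len(certainties)
    return frac_calls, frac_haps


# Hypothetical probabilities for two haplotypes typed at two loci each.
example = [[0.99, 0.99], [0.99, 0.90]]
print(phase_certainty_summary(example))
```

The per-haplotype fraction can never exceed the per-call fraction, and the gap widens as more loci are included, which is consistent with the lower whole-haplotype figure reported.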

For the construction of both SIH and MLH, the ancestral haplotype was arbitrarily defined as the most common. The multi-locus haplotype of the classical autoimmune haplotype is highly conserved with 61% of all HLA-B*0801 positive chromosomes being identical to the ancestral haplotype (Fig. 3A). In contrast, the intact HLA-B*1501 ancestral haplotype is only observed on 13% of HLA-B*1501 positive chromosomes (Fig. 3C). The MLH map for HLA-B*0702 (Fig. 3B) demonstrates conservation either side of the IKBL intra-gene haplotype, demonstrating that a conserved ancestral haplotype may be disrupted by a point mutation or conversion event rather than by recombination. This raises the possibility that even highly conserved haplotypes may not be identical at all loci.

Direct comparison between SIH and MLH is difficult. However, comparison of the ancestral haplotypes derived using both methods showed the structures to be virtually, but not completely, identical to each other, and to those previously derived from CEPH family studies (12). Differences in the haplotypes constructed by these methods are illustrated by HLA-B*0702. The most common B*0702 surrogate haplotype includes the IKBL-H1 intra-gene haplotype (Fig. 1). In contrast, the most common B*0702 multi-locus haplotype contains IKBL-H5 (21%), and the IKBL-H1 intra-gene haplotype is the second most common, present in 16% (Fig. 3).

DISCUSSION

In this study of 864 unrelated individuals, we investigate LD patterns across the human MHC, and convincingly demonstrate that the extent of LD varies in a haplotype-specific manner. These differences cannot be explained by genetic drift alone, which would predict short-range LD around common, typically old alleles. The observation that the HLA-B*0801 haplotype is both common and highly conserved suggests that other factors must be acting to maintain this haplotype at high frequency. One possibility is positive selection for a specific allele. If this occurs over a relatively short period of time the opportunity for recombination is insufficient to substantially break down the haplotype and entire haplotypes flanking a favoured variant may be rapidly swept to high frequency by a hitchhiking effect. This may result in disproportionately long-range LD around a specified allele given its population frequency. Alternatively, epistatic selection for combinations of alleles at two or more loci on a chromosome may influence LD patterns across haplotypes. Such molecular cooperation between alleles may, for example, explain the cis association of specific DQA1 and DQB1 alleles that ensures a functional DQ heterodimer (19). Additionally, haplotype-specific patterns of LD may reflect haplotype-specific recombination hotspots. There is growing evidence that recombination is localized to hotspots, leading to strong global LD across non-recombining regions and breakdown at hotspots. The possibility of haplotype-specific hotspot activity in humans has not previously been explored, although studies in the mouse suggest that certain MHC haplotypes are more prone to recombination than others (20). Investigating this using traditional methods requires prohibitively large numbers of families to provide sufficient informative meioses across a comparatively small area. 
If this phenomenon does indeed exist, it may be due to differences in signal sequence, or structural barriers such as variable haplotype lengths (21,22) and gene organization (23,24), which inhibit alignment of homologous chromosomes necessary for recombination. Sperm typing may address this issue more convincingly by allowing the efficient identification of sufficient recombinant chromosomes of selected haplotypes.

Measurement of LD requires an accurate estimation of haplotype frequency, which itself depends upon knowledge that the markers are sited on the same chromosome (phase). Traditionally, family genotype data have been used. However, LD is often poorly estimated using this method, as prohibitively large numbers of families are required. In this study we chose to genotype unrelated individuals rather than families. This approach offers both increased genotyping efficiency, particularly important in the HLA where high-throughput automated typing is still not available, and simpler case ascertainment.

A number of statistical methods, all with significant limitations, have been developed to estimate haplotype frequencies and reconstruct individual haplotypes in the absence of parental data. These include a sequential inference approach (Clark's algorithm) (25), several EM techniques (26–28), used here as implemented in ARLEQUIN (17), and an MCMC approach (14,29,30), as used in PHASE (14). Factors that may influence the accuracy of these statistical methods are now being clarified, and include sample size, allele frequency distributions, departures from Hardy–Weinberg equilibrium, linkage disequilibrium, and genotype error rate (31). Importantly, in areas characterized by moderate or high LD, and relatively few common but many rare haplotypes (such as the HLA complex), it appears there is little or no advantage to constructing haplotypes from family data rather than unrelated individuals (15).

These statistical methods differ in the way they define ambiguous haplotypes. At present there are insufficient real datasets of known haplotypes to allow statistical comparison, and our initial aim was to use both the EM and MCMC algorithms to reconstruct MLH. The EM algorithm is limited by the need to store estimated haplotype frequencies for every possible haplotype in the sample, requirements which increase exponentially with each additional locus. When constructing MLH, we found that the EM algorithm could not cope with large amounts of complex multi-allelic data, and thus the MCMC algorithm was used for MLH construction, and the EM algorithm was only used to estimate maximum-likelihood two-locus haplotype frequencies for the purpose of calculating D′. Ancestral haplotypes derived using D′ (SIH) were essentially identical to the multi-locus haplotypes derived using PHASE.
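The storage problem noted above follows from simple counting: the number of possible multi-locus haplotypes is the product of the allele counts at each locus, so each added locus multiplies the table an EM implementation must hold. A small sketch with hypothetical allele counts (not the counts observed in this study):

```python
from itertools import accumulate
from operator import mul


def haplotype_space_sizes(alleles_per_locus):
    """Running number of possible haplotypes as loci are added one by one."""
    return list(accumulate(alleles_per_locus, mul))


# Hypothetical numbers of alleles at four successive loci.
sizes = haplotype_space_sizes([10, 20, 15, 8])
print(sizes)  # [10, 200, 3000, 24000]
```

Sampling-based methods such as MCMC avoid enumerating this space explicitly, which is one plausible reason a program of that kind could handle data that an EM implementation could not.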

Both haplotype reconstruction and linkage disequilibrium measures are critically dependent on accurate genotyping (15,32). When genotyping a large number of markers, an error rate of only 1% will produce a large number of inaccurate haplotypes. In this study, a number of steps were taken to minimize error in genotyping and haplotype reconstruction. In particular, all samples identified as possessing rare reconstructed haplotypes were re-genotyped. This iterative approach significantly reduced the number of haplotypes identified. In addition, for classical HLA genes, the genotyping system used incorporates significant redundancy (33), which provides a method of internal audit. Each complex pattern of primer reactions was analysed by a specialist in the field and, in addition, the Assign-it program (M. Barnardo, personal communication) was used to unambiguously designate alleles according to the WHO nomenclature.
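The effect of a small per-marker error rate can be made concrete with a short calculation. The sketch assumes errors are independent across markers, which is a simplification, and the function name is hypothetical.

```python
def prob_any_error(n_markers, error_rate=0.01):
    """Probability that at least one of n_markers genotypes is wrong,
    assuming independent errors at a fixed per-marker rate."""
    return 1 - (1 - error_rate) ** n_markers


# With a 1% per-marker error rate, chance of at least one error per individual.
for n in (10, 25, 100):
    print(n, round(prob_any_error(n), 3))
```

Under this assumption roughly one individual in five would carry at least one erroneous genotype across 25 markers, and any such error can generate a spurious rare haplotype.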

The marked haplotype-specific variation in LD shown here adds a further level of complexity to our understanding of LD in the human genome. Initial descriptions of LD within specific genes were soon followed by the realization that LD extent varies both between genomic areas and between ethnic groups. The haplotype-specific variation in LD described here provides the basis for a detailed, second-generation map of the topography of the HLA. This information may help explain the apparently discrepant data from previous studies of global LD, and delineate more clearly the area to be studied when investigating HLA-associated diseases. Mapping a disease-associated polymorphism to a specific extended haplotype will help delineate the genetic boundaries within which the search for the primary susceptibility gene should be targeted. Data from this study suggest that these regions vary widely in size, in a fashion that is not immediately predictable from knowledge of allele frequency.

The complex haplotype structure and patterns of LD observed in this study relate to the HLA region. Recent studies of global LD across other regions have demonstrated the presence of only limited diversity (34,35), and haplotype-specific LD has not been reported. Furthermore, our study was limited to UK Caucasian individuals and it will now be important to determine whether patterns of LD across this area are population-specific. This information may be combined with knowledge of population histories to tease apart the relative importance of selection and demography in generating LD among genes in the HLA region.

MATERIAL AND METHODS

Subjects

The Caucasian populations sampled have previously been described (36) and include a UK Crohn's disease cohort (n=244), a UK ulcerative colitis cohort (n=300) and a UK control population (n=320) selected at random from 10 000 healthy individuals, attending general practice health screening clinics within the same geographical area in the UK (37). All subjects gave informed consent and the study was approved by the Central Oxford Research Ethics Committee (COREC 00.083).

Selection of polymorphisms and genotyping

Polymorphisms in a total of 25 genes were studied (Fig. 1). Complete genotyping results were available for all individuals. Medium-resolution classical HLA class I and II typing (HLA-A, B, Cw, DRB1, DQB1) (33) and high-resolution MICA and MICB typing (36) were carried out as previously described. Additional SNPs were selected on the basis of location, previously published frequencies if known, and previously reported disease associations. SNPs are named according to the NCBI reference number when available. Only single SNPs were studied in BAT1 (novel), PSMB8 (rs2071543), PSMB9 (rs1044244) and TAPBP (rs2071888).

Four-hundred PCR reactions were carried out on each individual using PCR-SSP. Each individual reaction also contained primers to amplify a second non-polymorphic genomic control segment. A single thermocycling programme was used for all reactions. The products were electrophoresed in 1% agarose with ethidium bromide and viewed under UV light. All gels were scored twice by independent investigators. Hardy–Weinberg equilibrium was confirmed for all SNPs. Details of primer sequences and thermocycling conditions are available on our website (www.medicine.ox.ac.uk/gastro).
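The text does not state which test was used to confirm Hardy–Weinberg equilibrium. One common choice for a bi-allelic SNP is a one-degree-of-freedom chi-square goodness-of-fit test, sketched below as an assumption; the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions for a
    bi-allelic SNP, given counts of the three genotypes.
    Returns (chi-square statistic, p-value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts in exact Hardy-Weinberg proportions.
print(hwe_chi_square(25, 50, 25))
```

For SNPs with low minor allele counts the chi-square approximation is poor and an exact test would be preferable.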

Intra-gene haplotype block construction

For the classical HLA class I, class II and MIC genes, the Assign-it program (M. Barnardo, personal communication) was used to assist in the designation of HLA and MIC alleles. For all other genes, SNPs were constructed into gene-specific haplotypes or ‘intra-gene haplotypes’ using a statistical haplotype reconstruction method, PHASE (14). This software was recompiled to run under the Macintosh OS10 operating system. Bayesian principles underlie PHASE, which is based upon the coalescent theory using Gibbs sampling, a type of MCMC algorithm. Importantly, this estimates the uncertainty associated with each phase call. For the construction of each intra-gene haplotype block, 30 000 iterations of the PHASE algorithm were used, each comprising 100 runs through the Markov chain.

Inter-gene linkage disequilibrium and haplotype inference

Ancestral haplotypes were described in two ways. Surrogate inter-gene haplotypes (SIH) were defined using LD. This was estimated between each intra-gene haplotype block and a predetermined anchor point using the D′ metric according to the method of Reich et al. (2). In order to calculate D′, maximum-likelihood two locus haplotype frequencies were estimated using the EM algorithm implemented in the ARLEQUIN computer program (17). The EM algorithm is also based on a Markov chain approach, and calculates the most likely pair-wise haplotypes and their frequencies. 500 000 iterations of the Markov chain were carried out within ARLEQUIN. D′ has the same range of values regardless of the frequency of the SNPs or haplotypes compared. In a large sample, a D′ of 1 indicates complete LD; 0 corresponds to no LD. A further measure, the ‘half-length of LD’ (i.e. the distance at which D′ drops below 0.5), was determined for each SIH. SIH were defined at each locus by the intra-gene haplotype that demonstrated the strongest LD (as measured by D′) from the selected anchor point. Multi-locus haplotypes (MLH) were constructed by the PHASE program using multiple intra-gene haplotype blocks. The ancestral haplotype in this instance was determined to be the most common.
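For readers unfamiliar with EM-based estimation of two-locus haplotype frequencies, the sketch below shows the textbook version for two bi-allelic loci, where only double heterozygotes are phase-ambiguous. It is a simplified illustration under that assumption, not ARLEQUIN's implementation, and it does not handle the multi-allelic loci analysed in this study; the function name and genotype counts are hypothetical.

```python
def em_two_locus(genotype_counts, n_iter=100):
    """Estimate haplotype frequencies at two bi-allelic loci by EM.

    genotype_counts maps (a, b) to a number of individuals, where a and b
    are the counts (0, 1 or 2) of allele 1 at the first and second locus.
    Returns a dict mapping haplotypes (allele at locus 1, allele at locus 2)
    to estimated frequencies.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    freqs = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    total = 2 * sum(genotype_counts.values())
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for (a, b), n in genotype_counts.items():
            if a == 1 and b == 1:
                # E-step: split double heterozygotes between the two
                # possible phases in proportion to current frequencies.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += n * w
                counts[(1, 1)] += n * w
                counts[(0, 1)] += n * (1 - w)
                counts[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                for hap in zip(alleles[a], alleles[b]):
                    counts[hap] += n
        # M-step: renormalize expected counts to frequencies.
        freqs = {hap: c / total for hap, c in counts.items()}
    return freqs


# Hypothetical sample in which the two loci are strongly associated.
est = em_two_locus({(0, 0): 40, (2, 2): 40, (1, 1): 20})
print(est)
```

The estimated joint frequency, together with the marginal allele frequencies, is what a D′ calculation requires as input.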

ACKNOWLEDGEMENTS

We thank the patients who participated in this study. Dr Matthew Stephens was most helpful in discussion regarding the use of PHASE. Tariq Ahmad is supported by a grant from the National Association of Crohn's and Colitis (NACC), UK.

*To whom correspondence should be addressed. Tel: +44 1865224938; Fax: +44 1865790792; E-mail: tariq.ahmad@ndm.ox.ac.uk

Figure 1. SNPs and corresponding intra-gene haplotypes studied in the HLA. SNPs genotyped are shown with their corresponding NCBI reference number. Intra-gene haplotypes constructed from these SNPs are shown. Shaded cells indicate the less common variant of each SNP. For simplicity, classical HLA class I, class II and MIC gene component SNPs are omitted. The WHO committee allele nomenclature already designates an intra-gene haplotype as an allele. For these genes, the genotyping method used permitted the following resolution: HLA-A, 78 alleles; HLA-C, 41 alleles; HLA-B, 212 alleles; MICA, 54 alleles; MICB, 17 alleles; HLA-DRB3, seven alleles; HLA-DRB4, three alleles; HLA-DRB5, two alleles; HLA-DRB1, 131 alleles; HLA-DQB1, 13 alleles.

Figure 2. Haplotope maps across the MHC. Surrogate inter-gene haplotypes are shown graphically as ‘haplotopes’. D′ (y-axis) is plotted against physical distance (x-axis). Positional coordinates from the Wellcome Trust Sanger Centre website are used (www.sanger.ac.uk/HGP/Ch6/RFP_to_KNSL2_gene_list.txt). Although the true distance between genes may vary in a haplotype-specific manner due to gene duplication events, this has been kept constant on the haplotopes for ease of comparison. (A) Haplotopes for the 12 most common HLA-B alleles derived from the combined patient and control groups. Trios are shown for ease of presentation. (B) Haplotopes for the three most common HLA-B alleles shown separately for patients and controls to illustrate LD pattern within each cohort.

Figure 3. Multi-locus haplotype maps. Multi-locus haplotypes were constructed from HLA-Cw to HLA-DQB1. Vertical axes have been scaled for all groups to show the proportion of haplotypes possessing individual components of the ancestral haplotype. The HLA-B*0801 (A), HLA-B*0702 (B), and HLA-B*1501 (C) multi-locus haplotype maps are shown.

Table 1. Surrogate inter-gene haplotypes as defined by pairwise LD from an HLA-B anchor point. SIH for the 14 most common HLA-B alleles are shown. These were defined at each locus by the intra-gene haplotype that demonstrated the greatest D′ value with the specific anchored HLA-B allele. For purposes of comparison, nomenclature for each haplotype refers to that given by Dawkins et al. (12). Nomenclature for intra-gene haplotypes is given in Figure 1 and these are shown in bold if the half-length of D′ is >0.5.

  Intragene haplotype Anchor point Intragene haplotype 
Ancestral haplotype HLA-B allele frequency HLA-A CDSN HLA-Cw HLA-B MICA MICB BAT1 NFKBIL1 LTA TNF 1C7 AIF1 HSP70 RAGE NOTCH4 DRB1 DQB1 PSMB8 PSMB9 RXRB TAPBP 
7.1 14.5% *0301 3 *0702 *0702 *00801 0104 A 1 1 1 2 2 4 1 4 *1501 *0602 C g 2 C 
8.1 13.0% *0101 3 *0701 *0801 *00801 0106 C 3 4 3 2 2 1 1 5 *0301 *0201 C A 3 C 
44.1 11.0% *0201 3 *0501 *4402 *00801 0102 A 3 3 3 2 2 3 3 7 *0401 *0301 C A 3 g 
62.1 7.4% *0201 3 *0303 *1501 *010 0103 A 6 5 4 4 3 3 2 2 *0401 *0302 A G 2 g 
44.2 6.9% *2901 3 *1601 *4403 *004 0102 A 5 1 1 2 2 2 1 6 *0701 *0202 C g 1 C 
35.2 5.3% *1101 3 *0401 *3501 *00201 0103 A 4 6 4 4 3 3 1 2 *0101 *0501 A g 2 C 
60.1 5.2% *0201 1 *0304 *4001 *00801 0103 A 5 4 1 2 3 3 1 2 *0404 *0302 A g 1 C 
51.1 4.5% *0201 2 *1502 *5101 *00901 0103 A 5 6 4 6 2 3 2 8 *1101 *0301 C A 1 g 
57.1 4.1% *0101 2 *0602 *5701 *017 0105 A 5 2 5 2 2 5 1 3 *0701 *0303 A g 3 C 
27.1 4.0% *0201 2 *0202 *2705 *00701 0102 A 3 4 1 2 3 3 1 2 *0101 *0501 C g 2 g 
18.2 3.8% *2501 2 *0701 *1801 *018 0103 A 5 1 2 1 2 3 2 2 *0301 *0201 A g 3 g 
65.2 2.2% *0301 1 *0802 *1402 *011 0102 A 4 5 4 3 2 1 1 *1303 *0301 A g 3 C 
64.1 1.8% *3201 3 *0802 *1401 *019 0102 A 5 2 1 5 2 1 1 1 *0701 *0202 A g 3 C 
13.1 1.6% *3001 2 *0602 *1302 *00801 0102 A 5 1 1 2 2 1 1 5 *0701 *0202 C g 1 C 

References

1
Abecasis, G.R., Noguchi, E., Heinzmann, A., Traherne, J.A., Bhattacharyya, S., Leaves, N.I., Anderson, G.G., Zhang, Y., Lench, N.J., Carey, A. et al. (
2001
) Extent and distribution of linkage disequilibrium in three genomic regions.
Am. J. Hum. Genet.
 ,
68
,
191
–197.
2
Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R. et al. (
2001
) Linkage disequilibrium in the human genome.
Nature
 ,
411
,
199
–204.
3
Stephens, J.C., Schneider, J.A., Tanguay, D.A., Choi, J., Acharya, T., Stanley, S.E., Jiang, R., Messer, C.J., Chew, A., Han, J.H. et al. (
2001
) Haplotype variation and linkage disequilibrium in 313 human genes.
Science
 ,
293
,
489
–493.
4
Hugot, J.P., Chamaillard, M., Zouali, H., Lesage, S., Cezard, J.P., Belaiche, J., Almer, S., Tysk, C., O'Morain, C.A., Gassull, M. et al. (
2001
) Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease.
Nature
 ,
411
,
599
–603.
5
Risch, N. and Merikangas, K. (
1996
) The future of genetic studies of complex human diseases.
Science
 ,
273
,
1516
–1517.
6
Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. and Lander, E.S. (
2001
) High-resolution haplotype structure in the human genome.
Nat. Genet.
 ,
29
,
229
–232.
7
Johnson, G.C., Esposito, L., Barratt, B.J., Smith, A.N., Heward, J., Di Genova, G., Ueda, H., Cordell, H.J., Eaves, I.A., Dudbridge, F. et al. (
2001
) Haplotype tagging for the identification of common disease genes.
Nat. Genet.
 ,
29
,
233
–237.
8
Bodmer, W.F. (
1972
) Evolutionary significance of the HL-A system.
Nature
 ,
237
,
139
–145.
9. Carrington, M., Stephens, J.C., Klitz, W., Begovich, A.B., Erlich, H.A. and Mann, D. (1994) Major histocompatibility complex class II haplotypes and linkage disequilibrium values observed in the CEPH families. Hum. Immunol., 41, 234–240.
10. Djilali-Saiah, I., Benini, V., Daniel, S., Assan, R., Bach, J.F. and Caillat-Zucman, S. (1996) Linkage disequilibrium between HLA class II (DR, DQ, DP) and antigen processing (LMP, TAP, DM) genes of the major histocompatibility complex. Tissue Antigens, 48, 87–92.
11. Sanchez-Mazas, A., Djoulah, S., Busson, M., Le Monnier de Gouville, I., Poirier, J.C., Dehay, C., Charron, D., Excoffier, L., Schneider, S., Langaney, A. et al. (2000) A linkage disequilibrium map of the MHC region based on the analysis of 14 loci haplotypes in 50 French families. Eur. J. Hum. Genet., 8, 33–41.
12. Dawkins, R., Leelayuwat, C., Gaudieri, S., Tay, G., Hui, J., Cattley, S., Martinez, P. and Kulski, J. (1999) Genomics of the major histocompatibility complex: haplotypes, duplication, retroviruses and disease. Immunol. Rev., 167, 275–304.
13. Jeffreys, A.J., Kauppi, L. and Neumann, R. (2001) Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet., 29, 217–222.
14. Stephens, M., Smith, N.J. and Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet., 68, 978–989.
15. Kirk, K.M. and Cardon, L.R. (2002) The impact of genotyping error on haplotype reconstruction and frequency estimation. Eur. J. Hum. Genet., 10, 616–622.
16. Lewontin, R.C. (1988) On measures of gametic disequilibrium. Genetics, 120, 849–852.
17. Schneider, S., Kueffer, J-M., Roessli, D. and Excoffier, L. (1997) Arlequin: a Software Package for Population Genetics. 2.1 ed. Genetics and Biometry Laboratory, Department of Anthropology, University of Geneva, Geneva.
18. Cullen, M., Noble, J., Erlich, H., Thorpe, K., Beck, S., Klitz, W., Trowsdale, J. and Carrington, M. (1997) Characterization of recombination in the HLA class II region. Am. J. Hum. Genet., 60, 397–407.
19. Kwok, W.W., Schwarz, D., Nepom, B.S., Hock, R.A., Thurtle, P.S. and Nepom, G.T. (1988) HLA-DQ molecules form alpha-beta heterodimers of mixed allotype. J. Immunol., 141, 3123–3127.
20. Yoshino, M., Sagai, T., Lindahl, K.F., Toyoda, Y., Shirayoshi, Y., Matsumoto, K., Sugaya, K., Ikemura, T., Moriwaki, K. and Shiroishi, T. (1994) Recombination in the class III region of the mouse major histocompatibility complex. Immunogenetics, 40, 280–286.
21. Niven, M.J., Hitman, G.A., Pearce, H., Marshall, B. and Sachs, J.A. (1990) Large haplotype-specific differences in inter-genic distances in human MHC shown by pulsed field electrophoresis mapping of healthy and type 1 diabetic subjects. Tissue Antigens, 36, 19–24.
22. Zhang, W.J., Degli-Esposti, M.A., Cobain, T.J., Cameron, P.U., Christiansen, F.T. and Dawkins, R.L. (1990) Differences in gene copy number carried by different MHC ancestral haplotypes. Quantitation after physical separation of haplotypes by pulsed field gel electrophoresis. J. Exp. Med., 171, 2101–2114.
23. Kendall, E., Todd, J.A. and Campbell, R.D. (1991) Molecular analysis of the MHC class II region in DR4, DR7, and DR9 haplotypes. Immunogenetics, 34, 349–357.
24. Spies, T., Sorrentino, R., Boss, J.M., Okada, K. and Strominger, J.L. (1985) Structural organization of the DR subregion of the human major histocompatibility complex. Proc. Natl Acad. Sci. USA, 82, 5165–5169.
25. Clark, A.G. (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol., 7, 111–122.
26. Excoffier, L. and Slatkin, M. (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol., 12, 921–927.
27. Hawley, M.E. and Kidd, K.K. (1995) HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes. J. Hered., 86, 409–411.
28. Long, J.C., Williams, R.C. and Urbanek, M. (1995) An E-M algorithm and testing strategy for multiple-locus haplotypes. Am. J. Hum. Genet., 56, 799–810.
29. Niu, T., Qin, Z.S., Xu, X. and Liu, J.S. (2002) Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am. J. Hum. Genet., 70, 157–169.
30. Liu, J.S., Sabatti, C., Teng, J., Keats, B.J. and Risch, N. (2001) Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res., 11, 1716–1724.
31. Fallin, D. and Schork, N.J. (2000) Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data. Am. J. Hum. Genet., 67, 947–959.
32. Akey, J.M., Zhang, K., Xiong, M., Doris, P. and Jin, L. (2001) The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures. Am. J. Hum. Genet., 68, 1447–1456.
33. Welsh, K. and Bunce, M. (1999) Molecular typing for the MHC with PCR-SSP. Rev. Immunogenet., 1, 157–176.
34. Patil, N., Berno, A.J., Hinds, D.A., Barrett, W.A., Doshi, J.M., Hacker, C.R., Kautzer, C.R., Lee, D.H., Marjoribanks, C., McDonough, D.P. et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science, 294, 1719–1723.
35. Dawson, E., Abecasis, G.R., Bumpstead, S., Chen, Y., Hunt, S., Beare, D.M., Pabial, J., Dibling, T., Tinsley, E., Kirby, S. et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature, 418, 544–548.
36. Ahmad, T., Marshall, S.E., Mulcahy-Hawes, K., Orchard, T.R., Crawshaw, J., Armuzzi, A., Neville, M., van Heel, D., Barnardo, M., Welsh, K. et al. (2002) High resolution MIC genotyping: design and application to the investigation of inflammatory bowel disease susceptibility. Tissue Antigens, 60, 164–179.
37. ICRF OXCHECK Study Group (1995) Effectiveness of health checks conducted by nurses in primary care: final results of the OXCHECK study. Br. Med. J., 310, 1099–1104.