Ligation Haplotyping is a robust, novel method for experimental determination of haplotypes over long distances, which can be applied to assaying both sequence and structural variation. The simplicity and efficacy of the method for genotyping large chromosomal rearrangements and haplotyping SNPs over long distances make it a valuable and powerful addition to the methodological repertoire, which will be beneficial to studies of population genetics and evolution, disease association and inheritance, and genomic variation. We illustrate the versatility of the method both by genotyping a Yp paracentric inversion, found in ∼ 60% of Northwest European males, that strongly influences the germline rate of infertility-causing XY translocations and by haplotyping two autosomal SNPs that lie 16.4 kb apart on chromosome 7, and which influence an individual's susceptibility to systemic lupus erythematosus.
For a diploid organism, empirical haplotype data is difficult to obtain. There is increasing evidence that gene expression is regulated over long distances, and thus the phase of SNPs is of considerable functional relevance ( 4 ). Statistical methods of assigning haplotypes to genotypes, such as PHASE ( 1 ), are relied upon heavily, and are generally reliable over short ranges ( 2 ), with switch errors occurring infrequently ( 3 ), but these approaches are only highly reliable in regions of the genome where linkage disequilibrium (LD) is high. In many parts of the genome with lower LD, statistical methods would be inaccurate, and reliable haplotype status could only be obtained from experimental data ( 5 ). However, robust experimental methods for determining haplotypes over long distances are lacking.
Previously, we developed Haplotype Fusion PCR (HF-PCR), a method to juxtapose DNA sequences on the same single molecule, thus retaining phase information ( 6 ). Here we demonstrate a ligation haplotyping assay that can be applied to the condensed haplotypes produced by HF-PCR, which enables high-throughput long-range haplotyping of SNPs, as well as genotyping of chromosomal inversions. Both of these forms of genetic variation have hitherto proved refractory to high-throughput experimental analyses.
Previously published approaches to determining haplotypes have relied on allele-specific PCR ( 7 ), long range PCR ( 8 ) or a combination of both ( 9 ). None of these approaches is suitable for long-range determination of haplotypes: allele-specific PCR is prone to mispriming and longer amplicons carry an increased risk of jumping PCR artifacts, resulting in the creation of artificial recombinants by template switching errors. Additionally, the efficiency of long PCR decreases with increasing amplicon length ( 6 ).
In HF-PCR, a fusion PCR ( 10 ) is performed to bring the regions of interest into apposition ( Figure 1 A). The reaction is carried out in an emulsion, which divides a single PCR into millions of independent reactions ( 11 , 12 ). By controlling the amount of DNA template in an emulsion, we can ensure that paralogous sequence variants (PSVs) or SNPs from single molecules are fused together in the overwhelming majority of cases, resulting in a condensed haplotype. As with long PCR, successful HF-PCR requires the targeted nucleotides to be present on the same template strand. However, whereas in long PCR low molecular weight templates increase the incidence of template switching, this is not possible in Ligation Haplotyping, which is essentially a single-molecule amplification technique. Furthermore, HF-PCR does not suffer from a decrease in amplification efficiency over long distances, because in the first few rounds amplification proceeds independently at the two loci, and it is only when these amplicons have reached a sufficiently high concentration that the fusion reaction occurs ( 6 ).
Published emulsification methods typically require the use of large volumes and are costly because of the quantity of reagents used, particularly polymerase ( 6 , 11 , 13 ). For the work presented here, we developed a method of emulsification by vortexing that allows 96 reactions to be prepared simultaneously in 96-well plate format using a small emulsion volume. This approach has several advantages: the amount of polymerase required is correspondingly small (2 units per reaction as opposed to >35 units for other published methods) ( 11 , 13 ), and 96 emulsions can be prepared in 150 s, reducing both labour and the length of time between preparing emulsions and thermocycling, and increasing emulsion uniformity.
Following HF-PCR, we perform a novel ligation-based haplotyping assay, giving a robust and high-throughput way of identifying haplotypes ( Figure 1 B). In contrast to other allele-specific ligation assays that rely upon the ability of ligases to be highly discriminating at the 3′ nucleotide at a ligation junction (e.g. ligase detection reaction ( 14 ), GoldenGate assay ( 15 ), SNPlex ( 16 )), our assay exploits the highly specific base pairing requirements of the thermostable DNA ligase from Thermus thermophilus at both 5′ and 3′ sides of the ligation junction.
Inversions are an important form of structural variation in the human genome and are likely to contribute to both simple and complex disorders and to human evolution ( 6 ). However, molecular analysis of inversions has proved difficult, because they are frequently too small to be detected by cytogenetic analysis, are not accompanied by changes in copy number or chromosome size, and inversion breakpoints typically fall within long inverted repeats. Given that the majority of inversions result from non-allelic homologous recombination (NAHR), which occurs at rates of 10⁻³ to 10⁻⁵ per generation, many inversions are likely to occur recurrently ( 6 ), precluding ascertainment of inversion status by genotyping adjacent or internal markers. Individual inversions are commonly assayed by interphase FISH or Pulsed Field Gel Electrophoresis and on a genome-wide scale from dense read-pair data from diverse individuals ( 17 ). However, none of these techniques is easily applied to population-scale genotyping. A small number of well-characterized inversions can be genotyped singly by long PCR or RFLP analysis using standard agarose gel electrophoresis ( 18 , 19 ), but this is only a realistic approach when the homologous sequences that give rise to the inversion are short, whereas inversions in the human genome tend to involve homologous recombination between long inverted repeats ( 20 , 21 ).
As with inversion assays, empirical methods for identifying SNP haplotypes have lagged behind statistical approaches, and typically require the inclusion of other family members' genotype data, which is costly, time consuming and not always practical (5).
Here we illustrate the efficacy of Ligation Haplotyping by genotyping a Yp paracentric inversion found in ∼60% of Northwest European males, that strongly influences the germline rate of infertility-causing XY translocations ( 22 ), and which results from homologous recombination between ∼300 kb long inverted repeats. In addition, we demonstrate the versatility of the method by haplotyping two autosomal interferon regulatory factor 5 ( IRF5 ) SNPs that lie 16.4 kb apart on chromosome 7, the haplotypes of which are associated with risk of developing systemic lupus erythematosus (SLE).
MATERIALS AND METHODS
HapMap DNA samples were obtained from the Centre d’Etude du Polymorphisme Humain. Other DNA samples were obtained as described previously ( 6 ).
When prepared singly by stirring, emulsions contain aqueous compartments with an average diameter of 15 μm ( 23 ), corresponding to 5.7 × 10⁵ aqueous compartments per μl. Increasing the speed of stirring reduces this average diameter, and the speed of vortexing has a similar influence on aqueous compartment diameter. Reducing the compartment diameter increases the number of compartments and thus decreases the opportunity for any given compartment to contain more than one template molecule, which reduces the background signal. However, the greater the number of aqueous compartments, the lower the number that contain both template and polymerase, so the efficiency of the reaction decreases. We found that an aqueous compartment diameter of between 5 and 10 μm gave the optimal compromise between low background and reaction efficiency ( Supplementary Figure 1 ).
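The trade-off between compartment diameter, compartment number and template occupancy can be illustrated with a short calculation. The following is a minimal sketch rather than part of our protocol: it assumes perfectly spherical compartments of uniform size, takes "per μl" to mean per μl of aqueous phase, and models template distribution as a Poisson process. The template concentration in the usage lines is a hypothetical value chosen only for illustration.

```python
import math

UM3_PER_UL = 1e9  # 1 µl = 1 mm^3 = 1e9 µm^3


def compartments_per_ul(diameter_um):
    """Number of spherical compartments of this diameter per µl of aqueous phase."""
    volume_um3 = math.pi * diameter_um ** 3 / 6.0  # volume of a sphere
    return UM3_PER_UL / volume_um3


def occupancy(templates_per_ul, diameter_um):
    """Poisson occupancy of compartments by template molecules.

    Returns (P(compartment holds >= 1 template),
             P(compartment holds >= 2 templates, given it holds >= 1)).
    """
    lam = templates_per_ul / compartments_per_ul(diameter_um)  # mean templates per compartment
    p0 = math.exp(-lam)       # empty compartment
    p1 = lam * p0             # exactly one template
    p_occupied = -math.expm1(-lam)
    p_multi_given_occupied = (1.0 - p0 - p1) / p_occupied
    return p_occupied, p_multi_given_occupied


# A 15 µm diameter gives roughly 5.7e5 compartments per µl.
print(f"{compartments_per_ul(15):.3g}")

# Hypothetical template load (assumption): 1000 molecules per µl of aqueous phase.
for d in (15, 10, 5):
    occupied, multi = occupancy(1000, d)
    print(d, f"{occupied:.2e}", f"{multi:.2e}")
```

Under these assumptions, shrinking the diameter lowers the fraction of template-containing compartments that hold more than one molecule, which is the source of background, but it also lowers the fraction of compartments that contain any template at all.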
Preparation of emulsions singly
For the Yp inversion, we prepared PCR aqueous phases in a total volume of 100 μl as described previously ( 6 ). For IRF5 SNPs, we included 1 μM primers IRF5F1 (CGGGATGAAGACTGGAGTAGG) and IRF5R2 (GACAAGGAGGAGTAAGCAAGGAAC), and 10 nM primer IRF5Fus4 (GGGTGCCTACAGCAGGGTTCTGACCCTGGCAGGTCC). All primers were synthesised by Sigma. Thermocycling conditions for IRF5 were 98°C for 30 s; 33 cycles of 98°C for 10 s, 63°C for 30 s and 72°C for 15 s.
Microtitre plate preparation of emulsions
We prepared PCRs as described above in a total volume of 25 μl and added each reaction to a different well of a 96-well plate along with 50 μl oil phase (40% DC5225C fluid (Corning), 30% DC749 fluid (Corning) and 30% AR20 silicone oil (Sigma) ( 13 )) and a 3 mm tungsten carbide bead (Qiagen). The plate was sealed with a Microseal ‘A’ plate seal (Bio-Rad), inverted, centrifuged briefly and vortexed for 150 s, inverted, at speed 5 on a Vortex Genie 2, fitted with a microtitre plate adapter (both VWR). The plate was put the right way up and centrifuged extremely briefly before thermocycling as described above.
Ligation reactions and electrophoresis
We disrupted emulsions with diethyl ether (Aldrich) and incubated the recovered aqueous phases with 0.8 units Proteinase K (Sigma) at 56°C for 1 h to digest the polymerase, followed by 95°C for 10 min to denature the Proteinase K, and made the volume up to 200 μl with water.
For IRF5 we phosphorylated 1 μM ligation primers IRF5-5T (IDT; TGAGCATCACCAATGTGACCTATCCTATCCGTGCCAGCAAGATCCAATCTAGA-2′UOMe-2′UOMe-2′UOMe-2′UOMe) and IRF5-5C (CGAGCATCACCAATGTGACCGTGCCAGCAAGATCCAATCTAGA-2′UOMe-2′UOMe-2′UOMe-2′UOMe) and 2 μM IRF5FusF4, and for Yp 2 µM primer InvFusion R1 and 3Way5T (Operon; TTACTGGGTGCTGGACATGCTCTGGTGCCAGCAAGATCCAATCTAGA-2′UOMe-2′UOMe-2′UOMe-2′UOMe 3′), and 0.5 µM 3Way5G (Operon; GTACTGGGTGCTGGACATGCTCTGCTGATGTCCGGTGCCAGCAAGATCCAATCTAGA-2′UOMe-2′UOMe-2′UOMe-2′UOMe 3′) at the 5′ terminus by incubating for 30 min at 37°C in 1 × T4 ligase buffer (Roche; 66 mM Tris-HCl pH 7.5, 5 mM MgCl₂, 5 mM DTT, 1 mM ATP) with 10 units of T4 polynucleotide kinase (Roche) in a total volume of 50 μl, after which we denatured the kinase by heating to 65°C for 20 min. For IRF5 we added 50 pmol of primers IRF5-3G (IDT; AmC6-GGGTTCCCTAAGGGTTGGAGAAAGCGAGCTCGGGG) and IRF5-3T (IDT; AmC6-GGGTTCCCTAAGGGTTGGATTTCGGAAAGCGAGCTCGGGT), and for Yp 50 pmol of primers 3Way3A (Operon; AmC6-dTGGGTTCCCTAAGGGTTGGACAGTGATTTCTGGGTAGCTACAATCATA 3′) and 3Way3G (Operon; AmC6-dT GGGTTCCCTAAGGGTTGGATACTTCAGTGATTTCTGGGTAGCTACAATCATG 3′), along with water to a total volume of 100 µl.
The optimal concentration of oligos in the ligation reaction was determined empirically, so as to provide peaks after capillary electrophoresis with similar heights: the concentration of central bridging oligo was kept constant, and concentrations of all allele-specific oligos were titrated until uniform results were obtained.
We set ligation reactions up in a total volume of 20 μl, containing 1 × Tth ligase buffer (ABGene; 20 mM Tris-HCl pH 8.3, 50 mM KCl, 10 mM MgCl₂, 1 mM EDTA, 1 mM NAD⁺, 10 mM DTT, 0.1% (v/v) Triton X-100), 2 μl of diluted fusion PCR product, 0.5 μl primer mix and 20 units Tth ligase (ABGene), and mixed these thoroughly before thermocycling (95°C for 2 min; 20 cycles of 95°C for 30 s, 64°C for 4 min; 4°C indefinitely). To digest the ligase, we added 1.6 units of Proteinase K (Sigma) to each reaction and incubated them as described above. Subsequently, we added 5 μl of Lambda exonuclease buffer (NEB; 67 mM glycine-KOH pH 9.4, 2.5 mM MgCl₂, 50 µg ml⁻¹ BSA), 22.75 μl H₂O, 1 unit Lambda exonuclease (NEB) and 0.5 units E. coli exo I (USB) to each reaction and incubated them at 37°C for 1 h to digest unligated primers, followed by 65°C for 20 min to denature the exonucleases.
To amplify ligated oligos and to incorporate a fluorescent label we performed a final PCR reaction. As with the HF-PCR, we used Phusion DNA polymerase to avoid the addition of adenosine to the 3′ end of PCR products, to give single peaks after capillary electrophoresis. We carried out reactions in 1 × Phusion HF buffer, 200 μM dNTPs, 300 μM PCR primers MLPAF (Sigma; 5′ 6FAM GGGTTCCCTAAGGGTTGGA 3′) and MLPAR (Sigma; 5′ GTGCCAGCAAGATCCAATCTAGA 3′), 2.5 μl LDR product and 1 unit Phusion DNA polymerase in a total volume of 50 μl, using the same thermal cycling conditions as above but with a 63°C annealing temperature.
To visualise the fluorescent PCR products we ran 1 μl of product on an ABI 3100 capillary electrophoresis sequencer along with 8 μl formamide (Fluka) and 1 μl GeneScan ROX 350 size standard (Applied Biosystems).
RESULTS AND DISCUSSION
In the Ligation Haplotyping reaction, allele-specific oligos for each SNP/PSV are brought together and are ligated simultaneously to a central bridging oligo. The assay is designed so that this bridging oligo is the same oligo that was used as the fusion primer in the prior reaction. Several rounds of denaturation and annealing/ligation result in linear amplification of the product: the product does not serve as a template in subsequent rounds of ligation, which helps to maintain specificity ( 14 ). For each SNP/PSV the specific oligos differ in length, giving the ligation products a characteristic size depending upon the haplotypes present. Ligation primers possess universal sequences allowing PCR amplification of all ligation products simultaneously using fluorescently labelled primers, and PCR products are subsequently resolved by capillary electrophoresis ( Figure 1 C).
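Because each haplotype yields a ligation product of characteristic size, haplotype calling reduces to matching electrophoresis peaks against a table of expected sizes. The sketch below illustrates this logic only; the oligo lengths, the size tolerance and the function names are hypothetical and are not taken from the oligo designs given in Materials and Methods.

```python
from itertools import product


def expected_product_sizes(upstream, bridge_len, downstream):
    """Map each haplotype (upstream allele, downstream allele) to a product size in nt.

    `upstream` and `downstream` map allele -> allele-specific oligo length.
    The product is modelled as upstream oligo + bridging oligo + downstream oligo.
    """
    sizes = {
        (a, b): upstream[a] + bridge_len + downstream[b]
        for a, b in product(upstream, downstream)
    }
    # The design only works if every haplotype gives a distinct size.
    if len(set(sizes.values())) != len(sizes):
        raise ValueError("oligo lengths do not give a unique size per haplotype")
    return sizes


def call_haplotypes(peak_sizes, expected, tolerance=1.0):
    """Assign each observed peak to the single haplotype within `tolerance` nt, if any."""
    calls = []
    for peak in peak_sizes:
        matches = [h for h, size in expected.items() if abs(size - peak) <= tolerance]
        if len(matches) == 1:
            calls.append(matches[0])
    return sorted(calls)


# Hypothetical oligo lengths in nt (assumptions for illustration only).
upstream = {"G": 30, "T": 35}
downstream = {"C": 45, "T": 55}
expected = expected_product_sizes(upstream, 40, downstream)

print(call_haplotypes([115.2, 129.6], expected))
# -> [('G', 'C'), ('T', 'T')]
```

Note that the length differences at the two sites must themselves differ (here 5 nt and 10 nt); otherwise two haplotypes would produce products of the same size, which the uniqueness check rejects.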
We previously fine-mapped the breakpoint interval and identified flanking PSVs using four individual YACs containing the inversion breakpoints before and after the inversion ( 6 ) ( Figure 2 A). We performed Ligation Haplotyping on individual and pairwise mixtures of these YACs (data not shown) and on genomic DNA from the same two individuals from which the YACs derive ( Figure 2 B). Ligation Haplotyping identified only a single haplotype from each YAC, which corresponded precisely to the known PSV content in each YAC, and identified the correct pair of haplotypes in the individuals from whom the YACs were derived.
We then performed the assay on a panel of genomic DNA from thirteen individuals of diverse geographic ancestry, obtained from ECACC, with the experimenter blinded to the inversion status of the individuals. In twelve individuals of this geographically diverse panel (92%) it was possible to assign either ‘inverted’ or ‘reference’ status, as both breakpoint haplotypes were diagnostic of the same orientation. In one individual, a haplotype was obtained that was a mixture of two breakpoint haplotypes ( Figure 2 B), one characteristic of the reference orientation and the other of the inverted orientation, presumably resulting from a gene conversion event that homogenises PSV2. Results were confirmed by bead haplotyping ( 6 ), showing that Ligation Haplotyping is capable of unambiguous inversion genotyping by assaying both inversion breakpoints: gene conversion produces aberrant pairs of breakpoint haplotypes, which are easily distinguishable from either of the inversion states.
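The decision rule implied by assaying both breakpoints can be written down in a few lines. This is a minimal sketch of the logic only, assuming each breakpoint haplotype has already been classified as diagnostic of one orientation; the labels `proximal` and `distal` and the return value "discordant" are hypothetical names introduced here.

```python
VALID_CALLS = {"reference", "inverted"}


def genotype_inversion(proximal, distal):
    """Combine the orientation calls from the two inversion breakpoints.

    Concordant calls give the inversion genotype. Discordant calls are flagged
    rather than forced into either state, since they may reflect gene conversion.
    """
    for call in (proximal, distal):
        if call not in VALID_CALLS:
            raise ValueError(f"unexpected breakpoint call: {call!r}")
    return proximal if proximal == distal else "discordant"


print(genotype_inversion("inverted", "inverted"))   # -> inverted
print(genotype_inversion("reference", "inverted"))  # -> discordant
```

Flagging discordant pairs separately is what allows a gene conversion event to be distinguished from either true inversion state, rather than being miscalled from a single breakpoint.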
The ability to sequence clones spanning inversion breakpoints in their entirety should enable the design of many such assays ( 17 ), allowing population-scale genotyping assays to be developed, which will facilitate our understanding of the impact of chromosomal inversions on complex traits ( 20 ).
The T allele of SNP rs2004640 creates a splice site in exon 1 of IRF5 that is absent from the G allele, permitting expression of several unique IRF5 isoforms. The second SNP, rs2280714, is cis-acting, and its T allele is associated with elevated expression ( Figure 3 A). Individuals who possess T alleles at both SNP positions on the same chromosome have an elevated risk of developing SLE ( 24 , 25 ). We compared the results obtained by Ligation Haplotyping the 30 Centre d’Etude du Polymorphisme Humain (CEPH) HapMap trios with haplotypes derived statistically ( 24 , 25 ) and obtained 100% concordance ( Supplementary Table 1 ). Inheritance of the haplotypes was observed to be Mendelian in each case, with observed diplotype frequencies consistent with Hardy–Weinberg equilibrium ( Figure 3 B). Not all haplotypes were observed in the HapMap individuals: the GT haplotype occurs at a low frequency (5.6%) and no GTGT homozygotes were detected. Furthermore, no TC haplotypes were observed ( Figure 3 B and C).
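The absence of GT homozygotes is what Hardy–Weinberg proportions predict for a haplotype at 5.6% frequency in a sample of this size. The sketch below works through the arithmetic. It is illustrative only: the other haplotypes are lumped into a single "other" class because their individual frequencies are not restated here, and the sample sizes of 60 (unrelated parents only) and 90 (all trio members) are assumptions about how individuals might be counted.

```python
def hwe_diplotype_frequencies(hap_freqs):
    """Expected diplotype frequencies under Hardy-Weinberg equilibrium.

    `hap_freqs` maps haplotype -> frequency (frequencies should sum to 1).
    Homozygotes are expected at p^2 and heterozygotes at 2pq.
    """
    haps = sorted(hap_freqs)
    expected = {}
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            p, q = hap_freqs[h1], hap_freqs[h2]
            expected[(h1, h2)] = p * p if h1 == h2 else 2 * p * q
    return expected


def expected_homozygotes(hap_freq, n_individuals):
    """Expected number of homozygotes for one haplotype in n individuals."""
    return hap_freq ** 2 * n_individuals


freqs = hwe_diplotype_frequencies({"GT": 0.056, "other": 0.944})
print(round(freqs[("GT", "GT")], 6))  # -> 0.003136

# Assumed sample sizes: 60 unrelated parents, or all 90 trio members.
for n in (60, 90):
    print(n, round(expected_homozygotes(0.056, n), 2))
```

Under either count the expected number of GT homozygotes is well below one (about 0.19 for 60 individuals and 0.28 for 90), so observing none is unsurprising.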
In principle, the length of a haplotype that can be condensed is limited only by the size of the template DNA used in the emulsion. Emulsion preparation appears to have little shearing effect on genomic DNA 100–150 kb in length ( 6 ) and so HF-PCR and Ligation Haplotyping should be possible over a considerably longer range than long PCR.
Because HF-PCR only occurs in a small fraction of the aqueous compartments within an emulsion (<0.1%), this part of the assay should be readily amenable to multiplexing, using different sets of primers that do not compete for resources because amplification proceeds in separate compartments. Additionally, as with the ligation detection reaction, SNPlex and GoldenGate assays, the Ligation Haplotyping part should also be compatible with high levels of multiplexing. The assay could easily be adapted for use on addressable arrays or in combination with ultra-high-throughput sequencing technologies, which will increase throughput and decrease cost further.
Supplementary Data are available at NAR Online.
This work was funded by the Wellcome Trust. The authors would like to thank Mark Jobling for the kind gift of DNA and Robert Graham for statistically derived IRF5 haplotype data. Funding to pay the Open Access publication charges for the article was provided by the Wellcome Trust.
Conflict of interest statement . None declared.