Velo-cardio-facial syndrome/DiGeorge syndrome/22q11.2 deletion syndrome (22q11.2DS) is caused by meiotic non-allelic homologous recombination events between the flanking low copy repeats termed LCR22A and LCR22D, resulting in a 3 megabase (Mb) deletion. Because of their complex structure, large size and high sequence identity, genetic variation within the LCR22s among individuals has not been well characterized. In this study, we sequenced 13 BAC clones derived from LCR22A/D and aligned them with 15 previously available BAC sequences to create a new genetic variation map. The thousands of variants identified were not uniformly distributed across the two LCR22s. Moreover, single nucleotide variants shared between LCR22A and LCR22D were enriched in the Breakpoint Cluster Region pseudogene (BCRP) block, suggesting a possible recombination hotspot there. Interestingly, breakpoints of atypical 22q11.2 rearrangements have previously been mapped to BCRPs. To explore this finding further, we carried out in-depth analyses of whole genome sequence (WGS) data from two unrelated probands harbouring a de novo 3 Mb 22q11.2 deletion and their unaffected parents. By focusing primarily on WGS reads uniquely mapped to LCR22A, using the variation map from our BAC analysis to help resolve allele ambiguity, and performing PCR analysis, we infer that the deletion breakpoints were most likely located near or within the BCRP module. In summary, we found a high degree of sequence variation in LCR22A and LCR22D and a potential recombination breakpoint near or within the BCRP block, providing a starting point for future breakpoint mapping in additional trios.

Introduction

Velo-cardio-facial syndrome/DiGeorge syndrome, also known as 22q11.2 deletion syndrome (22q11.2DS), is a congenital malformation disorder occurring in 1 in 4,000 live births (1). In most affected individuals, the deletion arises sporadically as a de novo event during meiosis. The most common mechanism is non-allelic homologous recombination (NAHR) between flanking low copy repeats (or segmental duplications; SDs) termed LCR22s (2–4). NAHR also underlies many other genomic disorders, each mediated by its own chromosome-specific LCRs (5–7). Most individuals with 22q11.2DS have a 3 Mb deletion flanked by two particular LCR22s, termed LCR22A and LCR22D (1,4). These are the largest of the eight LCR22s in the 22q11.2 region. Each is approximately 240 kb in size (8), and the two share a 160 kb direct repeat (i.e. an SD). Defined using the sequences within LCR22A and LCR22D as a base reference, the eight LCR22s together comprise roughly 11% of the 22q11.2 region (5,8). The LCR22s are composed of blocks, or modules, containing intermingled genes and pseudogenes, creating a complex mosaic structure (Figure 1). Copies of each block or module in the two LCR22s are more than 99% identical in sequence (5,9). Many of the blocks are repeated two or more times within the two LCR22s, as well as in other LCR22s. Taken together, this complexity makes it very challenging to fully understand the structure, evolution and sequence variation of the LCR22s among individuals. It also makes it difficult to define the region(s) where chromosome breakage occurs during the meiotic events that produce the typical 3 Mb deletion.
Figure 1.

Alignment of the 13 BAC clones to the LCR22A/D regions and density of variants for each clone. Colour cartoons depict the mosaic organization of the LCR22A and LCR22D regions, with the locations of annotated genes, pseudogenes and lincRNAs. The locations of LCR22A and LCR22D are marked by open black rectangles. The largest duplicated segment shared by the two LCR22s (160 kb, indicated by the cyan box) is the focus of the current analysis. At the bottom, each black horizontal line represents a BAC clone located in either the LCR22A or the LCR22D region. The densities (number per kb) of deletions, insertions and SNVs in each BAC, relative to the reference genome, are shown as blue, red and grey vertical lines, respectively.

One important question for 22q11.2DS is whether recombination hotspots exist within the LCR22s that could pinpoint the location(s) of historic intra- or inter-LCR22 rearrangements, or of the meiotic rearrangements that cause disease. Such hotspots may be concentrated in some regions of the LCR22s more than in others. Alternatively, chromosome breakage, whether during evolution or in 22q11.2DS patients, may occur randomly within the LCR22s. Paralogous single nucleotide sequence variants (PSVs) (10), which distinguish one LCR22 from another, would mark a particular LCR22 and would thus be valuable both for understanding the basis of LCR22 structure and variation and for narrowing the deletion breakpoints by DNA sequence analysis. Using human genomic DNA sequence from existing large insert clones tiling across the LCR22s, we previously characterized sequence variation within LCR22A and LCR22D (LCR22A/D) in the hope of identifying PSVs (9,11). Unfortunately, at that time, BAC clone sequence was available for only two, or at most three, different alleles of chromosome 22. Despite the paucity of BAC clones mapping to particular locations, we identified PSVs as well as shared sequence variants, or shared polymorphism sites (SPSs), between the two LCR22s. Characterizing these variants led to two interesting findings. First, the DNA variants clustered in particular regions within the two LCR22s and thus were distributed non-randomly (9). Second, the locations of PSVs and SPSs in the two LCR22s were negatively correlated. The non-random distribution of polymorphic sites suggested the existence of frequent historic gene conversion events (9), and regions harbouring more frequent historic gene conversion events are likely recombination hotspots. However, these findings needed to be replicated with more BAC sequences.
In addition, it remains unclear whether the breakpoints in 22q11.2DS patients are located in the same region or dispersed throughout LCR22A/D.

In this study, we completely sequenced 13 new BAC clones derived from either LCR22A or LCR22D by traditional Sanger sequencing, with the goal of further delineating PSVs and SPSs and identifying structural variants. Aligning these de novo assembled clones with the sequences of 15 previously sequenced BAC clones (28 in total) and with the reference sequence yielded an updated map of extensive single nucleotide variants (SNVs), as well as structural variants, in the LCR22A/D regions. The distribution of these variants indicates that regions near BCRP loci show evidence of greater recombination activity than other regions. This suggests that chromosome breakage during meiosis, whether in evolution or in disease, as well as ectopic crossovers on 22q11.2, may occur at the BCRP loci (12).

This hypothesis can be tested directly in 22q11.2DS probands carrying a de novo deletion by analysing the genomic sequences of the probands and their parents. We therefore analysed whole genome sequencing (WGS) data from two trios in which the proband carries a de novo 3 Mb deletion. We identified a genomic region within or near the BCRP block of LCR22A, with evidence that the probands inherited alleles from both parents upstream of it but an allele from only one parent downstream of it, suggesting that it harbours the deletion breakpoints. Identifying potential rearrangement hotspots may help determine the mechanism(s) responsible for the deletions that lead to human congenital anomaly disorders.

Results

Identification of DNA variations in LCR22A and LCR22D

The two LCR22s that flank the typical 3 Mb deletion in 22q11.2DS, LCR22A and LCR22D, are each approximately 240 kb in length (9). The 160 kb direct repeats shared between them (>99% sequence identity) are the focus of the current study, which aims to identify recombination hotspots and breakpoints in individuals with 22q11.2DS (Figure 1, cyan rectangles). The 160 kb direct repeats harbour the USP18 (Ubiquitin specific peptidase 18), GGT (GGT1; Gamma-glutamyltranspeptidase), GGTLA (GGT5; Gamma-glutamyltransferase 5), IGSF3 (Immunoglobulin superfamily member 3) and BCR (Breakpoint cluster region) pseudogenes (Figure 1). The direct repeat and the recurrence of nearly identical sequences in other LCR22s have made it very difficult to resolve individual haplotypes or paralogs, hindering a better understanding of the mechanism leading to 22q11.2DS. To more fully understand DNA sequence variation within haplotypes and LCR22 paralogs, we obtained large insert BAC clones mapping to LCR22A or LCR22D from existing libraries, for which we could confidently anchor clones to a particular LCR22 (Supplementary Material, Table S1), discern haplotypes by Sequenom genotyping (Supplementary Material, Table S2), and obtain the long reads afforded by Sanger sequencing to distinguish sequence paralogs. Note that the BAC clones were sequenced and assembled de novo, independently of the reference genome. Assembled sequence data from the 13 new BAC clones (Figure 1) were then analysed together with sequence data from 15 previously sequenced BAC clones (11) in GenBank (Supplementary Material, Table S3 and Fig. S1). In total, 7 haplotypes and 14 paralogs were identified.

The alignment between each BAC and the reference genome sequence (hg19) was examined to identify three types of variants: SNVs, indels (<200 bp) and copy number variations (CNVs; >200 bp). Treating the variants in each of the 13 individual BACs as independent events, we identified a total of 4,710 SNVs, 819 indels and 13 CNVs. Of these, 2,641 SNVs, 404 indels and 11 CNVs came from BACs in LCR22A, and 2,069 SNVs, 415 indels and two CNVs from BACs in LCR22D (Table 1). After merging these variants with those from the 15 BAC sequences previously available in GenBank based on their coordinates (i.e. variants present in different BACs at identical chromosomal positions were counted only once), we identified a total of 1,577 non-redundant SNVs (900 from the 13 new BACs) in the 160 kb repeat block of LCR22A (18,712,538–18,878,770). By comparison, 485 and 2,030 SNVs were reported by the 1000 Genomes Project (13) and the dbSNP database (version 137), respectively, of which 83 and 426 overlapped with ours. For the equivalent 160 kb block of LCR22D (21,515,447–21,679,525), we identified 525 SNVs, compared with 245 from the 1000 Genomes Project and 1,266 from dbSNP; only six and 45 of these, respectively, overlapped with ours. These small overlaps suggest that (a) more variants in LCR22s among humans remain to be discovered, and (b) analysis of short reads (e.g. the data used in the 1000 Genomes Project) may underestimate variation in highly repetitive, complex regions. Overall, we identified an average of 6.2 SNVs per kb in the 160 kb direct repeat region, consistent with a previous analysis of 10 BAC clones (9), indicating extensive single nucleotide variation in the two LCR22s relative to the genome-wide average.
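The two counting schemes described above (per-BAC counts that treat every BAC as independent, versus coordinate-merged, non-redundant counts), together with the 200 bp cut-off separating indels from CNVs, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the `Variant` record and the helper names are hypothetical, events of exactly 200 bp are treated as CNVs (an assumption, since Table 1 lists one 200 bp CNV), and the toy coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant from a BAC-versus-hg19 alignment."""
    pos: int   # chr22 coordinate in hg19
    ref: str   # reference allele
    alt: str   # allele observed in the BAC


CNV_MIN_BP = 200  # assumption: events of 200 bp or longer are called CNVs


def variant_type(v: Variant) -> str:
    """Classify a variant as SNV, indel or CNV from its length."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    # Net length change; multi-base substitutions (change 0) fall into "indel",
    # a simplification of this sketch.
    size = abs(len(v.ref) - len(v.alt))
    return "indel" if size < CNV_MIN_BP else "CNV"


def count_per_bac(per_bac):
    """Count variants treating each BAC independently (duplicates across BACs kept)."""
    counts = {"SNV": 0, "indel": 0, "CNV": 0}
    for variants in per_bac:
        for v in variants:
            counts[variant_type(v)] += 1
    return counts


def nonredundant_snv_positions(per_bac):
    """Merge SNVs across BACs by coordinate: a position seen in several BACs counts once."""
    return {v.pos for variants in per_bac for v in variants if variant_type(v) == "SNV"}


# Toy usage with invented coordinates.
bac1 = [Variant(100, "A", "G"), Variant(200, "AT", "A")]
bac2 = [Variant(100, "A", "G"), Variant(300, "C", "T"), Variant(400, "A", "A" + "T" * 250)]
print(count_per_bac([bac1, bac2]))        # {'SNV': 3, 'indel': 1, 'CNV': 1}
snvs = nonredundant_snv_positions([bac1, bac2])
external_catalogue = {100, 999}           # e.g. SNV positions from an external database
print(sorted(snvs), len(snvs & external_catalogue))  # [100, 300] 1
```

Merging by position alone, as described in the text, means that two different alternative alleles at the same coordinate in different BACs are counted once; keying the set on (position, alternative allele) would count them separately.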

Table 1.

List of the different types of variants identified within LCR22A/D regions based on pairwise sequence alignment of each BAC clone with the reference genome.

BAC location | BAC ID | Aligned start at reference | Aligned size (bp) | Sequence similarity | No. of SNVs | No. of indels (<200 bp) | CNVs (>200 bp), sign/start:size (a)
--- | --- | --- | --- | --- | --- | --- | ---
LCR22A | RP13-68B7 | 18,768,423 | 206,327 | 0.997 | 479 | 62 | +/18850827:725; +/18851697:232
LCR22A | RP13-49D18 | 18,779,607 | 195,344 | 0.996 | 359 | 40 | +/18881784:2300; −/18833164:1459*
LCR22A | RP11-690P21 | 18,804,995 | 190,334 | 0.997 | 475 | 87 | −/18833164:1445*; +/19009696:200; +/18927154:8207
LCR22A | RP11-249J8 | 18,573,262 | 175,562 | 0.999 | 168 | 21 | NA
LCR22A | RP11-130B16 | 18,744,129 | 195,886 | 0.997 | 559 | 85 | −/18833164:1446*
LCR22A | CTD-3048O14 | 18,548,277 | 139,025 | 0.999 | 125 | 36 | −/18589360:384; −/18668724:315
LCR22A | CH17-116O7 | 18,654,150 | 213,031 | 0.996 | 476 | 73 | −/18833331:1466*
LCR22D | RP3-329B9 | 21,529,124 | 128,937 | 0.997 | 346 | 48 | NA
LCR22D | RP3-181G22 | 21,394,825 | 166,140 | 0.998 | 321 | 48 | NA
LCR22D | RP13-339E24 | 21,689,294 | 206,972 | 0.998 | 265 | 72 | NA
LCR22D | CTD-2506I16 | 21,690,188 | 205,913 | 0.998 | 336 | 96 | NA
LCR22D | CH17-60L4 | 21,701,244 | 201,321 | 0.998 | 232 | 65 | NA
LCR22D | CH17-210L21 | 21,413,014 | 236,556 | 0.997 | 569 | 86 | +/21537705:432; −/21481326:930*

(a) “+” and “−” represent insertion and deletion CNVs, respectively. “*” marks CNVs in the BCRP regions.

To assess the potential effects of these variants on LCR22 gene or pseudogene structure, we compared the variant locations with the Ensembl gene annotation (build 73). Although most of the annotated elements in LCR22A/D are pseudogenes and non-coding transcripts, we found 89 SNVs and 1 CNV located in exons of annotated genes (Table 2), indicating that some variants may affect coding regions. As expected, given that most LCR22 sequence is intergenic, the majority of variants from our BAC analysis were located in intergenic regions (Table 2): intergenic variants accounted for 72.3% of SNVs, 82.8% of indels and 57.1% of CNVs.

Table 2.

Numbers of variants overlapping with Ensembl gene annotation

Annotated element | Structure | SNVs | Indels | CNVs
--- | --- | --- | --- | ---
Gene | intron | 683 | 55 | 2*
Gene | exon | 89 | 0 | 1*
lncRNA transcript | intron | 56 | 4 | 0
lncRNA transcript | exon | 110 | 8 | 0
Pseudogene | exon | 39 | 3 | 0
Intergenic | – | 2,546 | 337 | 4

* These CNVs span intron–exon junctions.

Table 3.

Numbers of different types of variants in LCR22A/D (data also plotted in Figure 3A).

Type | Category | No. of events
--- | --- | ---
SNV | PSV | 340
SNV | NPS | 108
SNV | SPS | 406
SNV | LCR22A/D SNPs | 2,669
Indel | Shared | 377
Indel | Unique | 30
CNV | Shared | 7
CNV | Unique | 0

CNVs within BAC clones and in LCR22A/D

The positions of CNVs within the LCR22s are of interest because they may help define specific haplotypes or paralogs. The CNVs within the 13 new BAC clones were up to 8,270 bp in size (Table 1). All of them have been reported previously (11) and/or are present in the Database of Genomic Variants (http://projects.tcag.ca/variation/) (14), having been identified by array comparative genomic hybridization or by direct sequencing of fosmid clones. The high frequency of CNVs (∼1 per BAC) is consistent with our previous report that CNVs occur more frequently in LCR22s than in non-LCR22 regions of 22q11.2 (P < 0.001) (9).

Examination of the breakpoints of these CNVs revealed an enrichment of Alu repetitive elements (6 of 13 CNVs) relative to chance expectation (P < 0.001, permutation test), consistent with our previous findings (11). Intersecting these CNVs with gene annotations, we found that three CNVs spanned exon/intron boundaries (Table 2) and would thus disrupt the intron–exon structures of two predicted genes or transcripts (ENSG00000182824 and ENSG00000182356).
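A permutation test of the kind cited above can be sketched as follows. The null model here is an assumption for illustration only: one representative breakpoint per CNV, each placed uniformly at random within the analysed region, and a breakpoint counted as Alu-associated only if it lies inside an annotated Alu interval. The test actually used may have differed in its null model or distance criterion; the function names and toy data are hypothetical.

```python
import random


def in_any(pos, intervals):
    """True if pos lies in any half-open [start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def alu_permutation_test(breakpoints, alu_intervals, region, n_perm=10_000, seed=0):
    """Empirical one-sided P value for the number of breakpoints inside Alu elements.

    Null model (assumption): every breakpoint is placed uniformly at random
    within region = (start, end).
    """
    start, end = region
    observed = sum(in_any(b, alu_intervals) for b in breakpoints)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(start, end) for _ in breakpoints]
        if sum(in_any(b, alu_intervals) for b in shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical P value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy data: 13 breakpoints, 6 of them inside one 300 bp "Alu" in a 100 kb region.
toy_alu = [(0, 300)]
toy_breaks = [10, 20, 30, 40, 50, 60] + [1_000 * i for i in range(1, 8)]
obs, p = alu_permutation_test(toy_breaks, toy_alu, (0, 100_000), n_perm=2_000)
print(obs, p)
```

With the add-one correction, the smallest attainable P value is 1/(n_perm + 1), so the number of permutations bounds how small a reported P value can be.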

Comparative analysis of sequence variants between LCR22A and LCR22D

Analysis of shared versus unique sequences can reveal gene conversion and recombination hotspots. To carry out such an analysis, we aligned all BAC clones to the coordinate frame of the 160 kb direct repeat in LCR22A (Figure 2). From the blocks of sequences aligned across multiple BACs, we inferred variants that were unique to or shared between LCR22A and LCR22D, using the classification scheme illustrated in Figure 2B. We identified 340 PSVs, 108 non-shared polymorphism sites (NPSs), 406 SPSs and 2,669 shared SNVs between LCR22A and LCR22D (Table 3; Figure 3A). Excluding 39 SPSs in simple repeat regions, we identified 367 SPSs, whereas only 122 were reported in the previous study (9). Moreover, the number of PSVs was significantly reduced relative to the previous report (9), from 384 to the current 100. This is expected, because some variants initially considered “unique” to LCR22A or LCR22D would likely become “shared” as more BACs are analysed. Similarly, we identified 30 unique and 384 shared indels in the 160 kb direct repeat in LCR22A and LCR22D (Supplementary Material, Figure S2A), significantly more than previously described (9).
Figure 2.

Illustration of the classification of variants by comparing the LCR22A and LCR22D regions. (A) The location of each BAC clone sequence was first determined from its best match to the reference genome. According to the mapping coordinates, we decomposed the BAC sequences into sequence modules and then constructed a multiple sequence alignment of the BAC and reference subsequences within each module independently. (B) Variants, including shared and non-shared polymorphism sites, were classified by base-by-base comparison between LCR22A and LCR22D. SNVs fall into four types: (1) PSV (paralogous sequence variant), a position where no variants among the BACs (here and below) were detected in either LCR22A or LCR22D, but the bases differ between LCR22A and LCR22D. (2) NPS (non-shared polymorphism site), a position where variants were detected in one or both LCR22s but no variants were shared between LCR22A and LCR22D. (3) SPS (shared polymorphism site), a position where both LCR22s are polymorphic and the two sets of variants in the two LCR22s are identical. (4) Shared LCR22A/D SNP, a position where only one LCR22 is polymorphic, but the nucleotide in the non-variant LCR22 is contained within the variant set of the polymorphic one. Unless otherwise specified, PSVs and NPSs are referred to as “unique to”, and the latter two categories as “shared by”, the two LCR22s. Indels and CNVs were simply classified as “unique” if they occurred in one LCR22 but not at the equivalent position of the other LCR22, and otherwise as “shared”.
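The four SNV categories defined in Figure 2B can be expressed as a small decision function over the sets of bases observed at one aligned position in each paralog. This is a sketch of the definitions as worded in the caption, not the authors' code; the function name is hypothetical, and returning "unclassified" for a position where both LCR22s are polymorphic with only partially overlapping variant sets is an assumption, because the caption does not cover that case.

```python
def classify_snv_site(alleles_a, alleles_d):
    """Classify one aligned position following the Figure 2B definitions.

    alleles_a, alleles_d: sets of bases observed at this position across all
    LCR22A (respectively LCR22D) sequences.
    """
    a, d = set(alleles_a), set(alleles_d)
    poly_a, poly_d = len(a) > 1, len(d) > 1
    if not poly_a and not poly_d:
        # (1) PSV: both paralogs monomorphic but fixed for different bases.
        return "PSV" if a != d else "invariant"
    if poly_a and poly_d:
        if a == d:
            return "SPS"  # (3) identical variant sets in both paralogs
        if not (a & d):
            return "NPS"  # (2) no variant shared between the paralogs
        return "unclassified"  # partial overlap: not covered by the definitions
    # Exactly one paralog is polymorphic.
    variable, fixed = (a, d) if poly_a else (d, a)
    if fixed <= variable:
        return "shared LCR22A/D SNP"  # (4) fixed base is among the variants
    return "NPS"  # (2) fixed base is not among the variants


for site in [({"C"}, {"T"}), ({"C", "T"}, {"C", "T"}), ({"C", "T"}, {"C"}), ({"C", "T"}, {"G"})]:
    print(sorted(site[0]), sorted(site[1]), classify_snv_site(*site))
```

Grouping the labels as in the caption, "PSV" and "NPS" correspond to variants unique to one LCR22, and "SPS" and "shared LCR22A/D SNP" to variants shared by both.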

Figure 3.

The density of different types of variants and of sequence features along the LCR22A region. (A) The four types of SNVs (unique: PSV and NPS; shared: SPS and LCR22A/D SNP; Figure 2B) are plotted as lines of different colours across the LCR22A/D region, showing the number of each type per 10 kb (#N/10kb). (B) Distribution of SPS enrichment probability based on a binomial model (see Methods), plotted using 10 kb sliding windows with a 5 kb step. (C) Locations of the 10 CNVs detected from the 13 BACs; one lies outside the plotted region. (D) Distribution of different types of repeat elements, simple tandem repeats and recombination hotspot motifs along the LCR22A region. Green dots mark the locations of HSATI elements.

As shown in Figure 3, the SNVs were not distributed uniformly within LCR22A and LCR22D. Comparing the distributions of the different types of SNVs in the updated list, we found that PSVs were located mostly at the ends of the 160 kb direct repeat in the two LCR22s, whereas the two types of shared variants (SPSs and shared LCR22A/D SNPs) were significantly enriched in the interval containing the BCRP pseudogene (Figure 3). Overall, therefore, the SPSs and PSVs did not overlap.

Because shared variants can be considered a potential signature of recombination (arising from either gene conversion or double crossover events) between LCR22A and LCR22D, these data suggest that a recombination hotspot might lie near or within the BCRP locus. To address this more quantitatively, we applied a statistical approach to compute the enrichment of SPSs along the LCR22s and confirmed that SPSs were most significantly enriched in the BCRP locus (Figure 3B). Including variants from the 1000 Genomes Project in this analysis yielded the same result (Supplementary Material, Fig. S2B). Notably, the enrichment of SPSs at the BCRP locus is unlikely to be due to higher mutation rates or ancestral polymorphisms, since no PSVs were found in the region. We therefore conclude that the enrichment of SPSs in the LCR22A and LCR22D BCRPs most likely arose from frequent past exchange of nucleotides between the two LCR22s, suggesting the existence of recombination hotspots near or at the BCRP block.
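The binomial model itself is described in Methods, which are not reproduced here. As a rough sketch only, one plausible version scores each 10 kb window (5 kb step, as in Figure 3B) by the upper-tail probability of observing at least k SPSs among the n SNV sites in that window, given the region-wide SPS fraction as background. The background definition, the one-sided test and all names below are assumptions for illustration, and the positions in the usage lines are invented.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sps_enrichment(snv_pos, sps_pos, start, end, window=10_000, step=5_000):
    """Slide a window along [start, end) and score SPS enrichment in each window.

    snv_pos: positions of all SNV sites (SPSs included; at least one required).
    sps_pos: positions of SPSs.
    Returns (window_start, n_sps, n_snv, p_value) tuples.
    """
    p0 = len(sps_pos) / len(snv_pos)  # background SPS fraction over the region
    results = []
    ws = start
    while ws + window <= end:
        we = ws + window
        n = sum(ws <= x < we for x in snv_pos)
        k = sum(ws <= x < we for x in sps_pos)
        p_value = binom_upper_tail(k, n, p0) if n else 1.0
        results.append((ws, k, n, p_value))
        ws += step
    return results


snvs = [1_000, 2_000, 3_000, 4_000, 12_000, 13_000, 14_000, 15_000, 16_000, 17_000]
sps = [12_000, 13_000, 14_000, 15_000]
for ws, k, n, p in sps_enrichment(snvs, sps, 0, 20_000):
    print(ws, k, n, round(p, 4))
```

Because overlapping windows share sites, the per-window P values are not independent; a genome-scale analysis would also need a multiple-testing correction.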

In our previous work with 10 sequenced BAC clones (11), we observed a significant enrichment (P < 0.05) of CNVs in the BCRP locus. We therefore examined the 13 CNVs from the new BAC sequences and found five deletion CNVs in the same region of the BCRP locus (Figure 3C; Table 1, CNVs marked by “*”). Furthermore, many CNVs reported in the Database of Genomic Variants occur in the BCRP block (Supplementary Material, Fig. S3).

We next searched for sequence features that might mark hotspots for recombination or gene conversion. Figure 3D shows the distribution of various annotated repetitive elements. HSATI satellite elements were located mostly at the ends of LCR22A/D, where PSVs and shared LCR22A/D variants were enriched, whereas SINE elements were weakly enriched at the BCRP locus. The correlations for HSATI and SINE enrichment were not statistically significant, however, probably because of greater SINE enrichment at two other locations far upstream of the BCRP block (Figure 3D).

Because a defined hotspot signature for meiotic recombination in the mammalian genome, CCNCCNTNNCCNC, has been reported (15), we examined whether this motif occurs in the BCRP block. Interestingly, we found seven occurrences of the motif in the vicinity of our predicted recombination hotspot at the BCRP locus, three of them within the BCRP block itself (Figure 3D). Although the motif was not statistically enriched, even a single occurrence could conceivably be sufficient to stimulate recombination. This result indicates that the BCRP locus could harbour a recombination hotspot. An alternative motif associated with recombination hotspots, CCTCCCT (16), was also found in the region (Figure 3D).
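Scanning a sequence for a degenerate motif such as CCNCCNTNNCCNC on both strands can be sketched with Python's standard `re` module: each N is translated into the class [ACGT], and the pattern is wrapped in a lookahead so that overlapping matches are all reported. The helper names and the toy sequence are hypothetical; a real scan would read the LCR22A sequence from a FASTA file, and this sketch only handles the letters A, C, G, T and N.

```python
import re

# Map each motif letter to a regex fragment; only A/C/G/T/N occur in these motifs.
IUPAC_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) tuples for every match of a degenerate motif.

    Minus-strand hits are found by scanning the forward sequence for the
    reverse complement of the motif, so positions refer to the forward strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        regex = re.compile("(?=" + "".join(IUPAC_TO_REGEX[b] for b in pattern.upper()) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


HOTSPOT_13MER = "CCNCCNTNNCCNC"
toy = "GG" + "CCACCATAACCAC" + "GG"
print(find_motif(toy, HOTSPOT_13MER))                      # [(2, '+')]
print(find_motif(reverse_complement(toy), HOTSPOT_13MER))  # [(2, '-')]
```

The same function can be called with "CCTCCCT" for the alternative motif. A reverse-complement-palindromic motif would be reported once on each strand at the same position.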

Potential large rearrangements within BAC clones that might involve LCR22A and LCR22D directly

Intra-LCR22 rearrangements have recently been identified by fibre FISH (fluorescence in situ hybridization) methods (17), and BAC clones might capture such large rearrangements within LCR22s. We did not detect evidence for such rearrangements, possibly because of the limited number of subjects used to generate the BAC libraries. In addition, a BAC clone from an individual with an inter-LCR22 rearrangement, such as a 22q11.2 deletion, could harbour the chromosome breakpoint and would then match part of LCR22A and part of LCR22D. We therefore analysed all the BAC clone sequences to search for cases in which one part of a BAC matched the structural organization of LCR22A while the other part could be aligned to LCR22D. Among the 28 BACs, one BAC from an unknown library (termed AC007708) showed such evidence (Supplementary Material, Fig. S4), suggesting that it could have been derived from an individual with an inter-LCR22 rearrangement or a 3 Mb deletion. If the rearrangement in AC007708 is bona fide and not due to uncertainty in sequencing or assembly, the breakpoints for the rearrangement or deletion in LCR22A/D would lie in the vicinity of BCRP (see Supplementary Material for more details).
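As one simple way to look for a BAC that matches LCR22A over part of its length and LCR22D over the rest, the sketch below assumes that each paralog-diagnostic site along the BAC has already been labelled "A" or "D" according to which paralog's base the BAC carries, and then finds the single switch point that best explains the labels. This PSV-based approach is a substitute chosen for illustration, not the module-structure comparison described above; the function names and the `min_gain` threshold are hypothetical.

```python
def best_single_switch(calls):
    """Best single switch point for a BAC read as one paralog followed by the other.

    calls: ordered "A"/"D" labels, one per paralog-diagnostic site along the BAC.
    Returns (score, split, orientation): calls[:split] are scored against the
    first paralog and calls[split:] against the second.
    """
    best = None
    for first, second in (("A", "D"), ("D", "A")):
        for split in range(len(calls) + 1):
            score = calls[:split].count(first) + calls[split:].count(second)
            if best is None or score > best[0]:
                best = (score, split, f"{first}->{second}")
    return best


def looks_chimeric(calls, min_gain=3):
    """Flag a switch only if it explains min_gain more sites than a single paralog."""
    score, split, orientation = best_single_switch(calls)
    no_switch = max(calls.count("A"), calls.count("D"))
    return score - no_switch >= min_gain, split, orientation


print(looks_chimeric(["A"] * 5 + ["D"] * 4))  # (True, 5, 'A->D')
print(looks_chimeric(["A"] * 6 + ["D"]))      # (False, 6, 'A->D')
```

Requiring a minimum gain over the best single-paralog fit guards against calling a switch from one or two sequencing or assembly errors, which is the same concern raised above for AC007708.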

Deletion breakpoints inferred from WGS of trios with 22q11.2 deletion probands

The BAC sequence comparisons above suggest that the BCRP locus has more genetic variation than other regions within the two LCR22s and may harbour recombination hotspots. To investigate this more directly, we reanalysed WGS data from two trios in which a 3 Mb deletion was present in the 22q11.2DS probands (BM1453.001 and BM1452.001) but not in their parents (18). WGS read depth across the sequence between LCR22A and LCR22D was half that of the surrounding 22q11.2 sequence, confirming the presence of a 3 Mb deletion in both probands (Figure 4A). Analysis of variants in the non-duplicated region between LCR22A and LCR22D showed that the deleted chromosome was of maternal origin in BM1452.001 and of paternal origin in BM1453.001 (18).
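The read-depth reasoning above can be sketched as a per-window ratio of proband depth to mean parental depth, with each sample first scaled by its own genome-wide mean depth. The scaling scheme, the tolerance of 0.15 around the expected ratios of 0.5 (one copy) and 1.0 (two copies), and the function names are assumptions for this example; the depths in the usage lines are invented.

```python
def copy_ratios(proband, proband_mean, parents, parent_means):
    """Per-window proband depth relative to the mean parental depth.

    Each sample's window depths are divided by that sample's genome-wide mean
    depth first, so libraries sequenced to different depths are comparable.
    """
    ratios = []
    for i, depth in enumerate(proband):
        parental = sum(p[i] / m for p, m in zip(parents, parent_means)) / len(parents)
        ratios.append((depth / proband_mean) / parental if parental > 0 else float("nan"))
    return ratios


def copy_state(ratio, tol=0.15):
    """Hypothetical rule: a ratio near 0.5 suggests one copy, near 1.0 two copies."""
    if abs(ratio - 0.5) <= tol:
        return "one copy"
    if abs(ratio - 1.0) <= tol:
        return "two copies"
    return "unclear"  # also returned for NaN ratios


# Invented depths for five windows; the third and fourth mimic the deleted interval.
proband = [30, 30, 15, 15, 30]
mother, father = [30] * 5, [60] * 5
ratios = copy_ratios(proband, 30, [mother, father], [30, 60])
print([copy_state(r) for r in ratios])
```

As the following paragraph explains, such depth ratios confirm the deletion but cannot localize its breakpoints inside the LCR22s, where reads map ambiguously.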
Figure 4.

The summary map from WGS analysis. (A) WGS read densities for the six individuals of the two family trios, shown from LCR22A to LCR22D. The 3 Mb deletion is evident from the reduction of read density to one half (from the end of LCR22A to the beginning of LCR22D) in the probands (BM1452.001 and BM1453.001; red) compared with their parents (100 and 200, mother and father, respectively; blue). The green line below the reference coordinate represents non-LCR sequences. A total of 4 informative sites located in the LCR22A interval spanning 18,838,045–18,843,686 were ultimately selected for inferring the number of alleles in the two probands at those chromosome positions (B–E). The numbers of WGS reads with both ends of the pair uniquely mapped across these positions are shown in (B) for the BM1453 family and (C) for the BM1452 family, while (D) and (E) show data from read pairs with at least one end uniquely mapped. Only nucleotides that differ from the reference sequence are shown in colour, following the convention of the IGV browser.

The precise positions of the breakpoints could not be determined from read depth alone, because the presence of eight LCR22s within the 22q11.2 region makes read mapping highly ambiguous. To overcome this challenge, we started from WGS read pairs that mapped uniquely to a single location within one LCR22. We manually searched for (a) heterozygous sites in the probands, indicating that both parental alleles had been inherited, and (b) sites where the two parents were homozygous for different alleles but only one of the two alleles was observed in the proband. Because of the high sequence identity between LCR22A and LCR22D, we found only a few such informative sites. We ultimately focused on one heterozygous site at position 18,838,045 in LCR22A, where both probands and their parents had many heterozygous (C/T) reads (Figure 4B and C). Because both alleles are present in each proband at this position, the breakpoints are likely located 3’ downstream of it. This position is ∼3 kb 5’ upstream of the BCRP7 locus. Two additional heterozygous sites further upstream were supported by WGS reads in the two probands (Table 4). Searching 3’ downstream from 18,838,045, we found two sites in LCR22A, 18,843,686 and 18,844,089, with evidence that BM1452.001 inherited only the paternal allele (Figure 4C; Table 4). At both positions, BM1452.100 was G/G (same as the reference genome) and BM1452.200 was T/T, but only the T allele was transmitted to the proband, BM1452.001. We performed the same analysis for the BM1453 family but were unable to find sites showing conclusive evidence that BM1453.001 inherited only one parental allele.
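The two manual search criteria can be written down as a small decision rule. The Python sketch below is an illustration under stated assumptions, not the procedure actually used: the `AlleleCounts` type, the function names and the `min_reads` cutoff for calling an allele "present" are all hypothetical, and the counts are assumed to come only from uniquely mapped read pairs.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Uniquely mapped read counts supporting the reference (R) and alternative (A) allele."""
    ref: int
    alt: int


def alleles_present(counts, min_reads=2):
    """Alleles supported by at least min_reads reads (min_reads is an assumed cutoff)."""
    present = set()
    if counts.ref >= min_reads:
        present.add("R")
    if counts.alt >= min_reads:
        present.add("A")
    return present


def classify_trio_site(proband, mother, father, min_reads=2):
    """Classify one site using criteria (a) and (b); returns a label string."""
    kid = alleles_present(proband, min_reads)
    mom = alleles_present(mother, min_reads)
    dad = alleles_present(father, min_reads)
    # Criterion (a): proband carries two different alleles, both seen in the parents,
    # so two copies of this site are present and it lies outside the deletion.
    if kid == {"R", "A"} and kid <= (mom | dad):
        return "both_alleles"
    # Criterion (b): parents homozygous for different alleles, proband shows only one,
    # consistent with the other parent's copy being deleted.
    if len(mom) == 1 and len(dad) == 1 and mom != dad and len(kid) == 1:
        return "maternal_only" if kid == mom else "paternal_only"
    return "uninformative"
```

For a configuration like the one described at 18,843,686 (mother G/G, father T/T, proband T only), the rule returns `"paternal_only"`, i.e. the maternal copy is inferred to be deleted.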

Table 4.

Number of reads at informative sites from 18,825,550 to 18,844,089, derived from WGS read pairs with both ends uniquely mapped.

Columns: Position; Reference (R)/Alternative (A); R/A read counts for the BM1452 trio (001, 100, 200), followed by the BM1453 trio (001, 100, 200).
18,825,550G/A0/00/010/09/624/60/5
18,836,085T/A6/93/027/614/033/74/6
18,838,045C/T12/134/57/3022/616/333/5
18,843,686G/T0/510/00/80/010/30/0
18,844,089G/T0/310/00/70/010/10/0

To obtain additional support, we next analysed WGS read pairs with at least one end uniquely mapped to a single LCR22 location (Figure 4D and E). In the region from 18,838,045 to 18,842,061, we found multiple heterozygous sites in BM1453.001 and BM1452.001; at all of these positions, one or both parents were heterozygous (Table 5). These additional WGS data further confirmed that the interval from 18,843,686 to 18,844,106 was hemizygous, as only one allele was observed in BM1452.001. Unfortunately, no uniquely mapped reads could be used to infer the deleted region in BM1453.001, again underscoring the difficulty of locating breakpoints in a particular 22q11.2DS individual using short-read WGS data.

Table 5.

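Number of reads at informative sites from 18,838,045 to 18,844,106, derived from WGS read pairs with at least one end uniquely mapped.

Columns: Position; R/A; R/A read counts from read pairs with both ends uniquely mapped for the BM1452 trio (001, 100, 200) and the BM1453 trio (001, 100, 200), followed by R/A read counts from read pairs with one end uniquely mapped for the BM1452 trio (001, 100, 200) and the BM1453 trio (001, 100, 200).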
18,838,045C/T12/134/57/3022/616/333/520/199/723/4830/724/4511/9
18,839,683C/G0000003/31/39/11/52/44/4
18,840,835G/A01/01/004/2014/410/225/113/431/715/6
18,841,052C/G0000000/80/40/1920/714/50/13
18,841,240T/C01/0006/006/35/316/56/023/07/1
18,842,061C/T01/02/21/002/020/1321/2223/3021/1435/1532/12
18,843,686G/T0/510/00/8010/300/2417/10/40013/300/2
18,844,089G/T0/310/00/7010/100/1019/00/14017/190
18,844,106A/G08/00010/000/615/00/6016/90

A similar analysis of LCR22D was unproductive, as there were no WGS reads with at least one end uniquely mapped to the BCRP region in LCR22D, which prevented us from inferring the deletion breakpoint in the distal region. This limitation may be related to the presence of two nearly identical BCRP blocks in LCR22D. Furthermore, there are four additional highly similar BCRP blocks in other LCR22s.

In summary, the results from our WGS analysis indicate that the deletion breakpoint in LCR22A most likely occurred between 18,842,061 and 18,843,686 for BM1452.001 while the breakpoint is likely 3’ downstream of 18,838,045 in BM1453.001 (Figure 4B). We were, however, unable to analyse the distal breakpoint in LCR22D from the WGS data.
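The way a breakpoint interval follows from the ordered informative sites can be sketched as: the breakpoint lies after the last site where two alleles are still seen and before the first site where only one allele remains. The Python function below is a hypothetical helper illustrating that logic for the proximal breakpoint in LCR22A; it assumes each site has already been labelled "biallelic" or "monoallelic" in the proband.

```python
def bracket_breakpoint(sites):
    """Return (left, right) bounding a breakpoint along one LCR22.

    sites: iterable of (position, status) with status "biallelic" or
    "monoallelic". left is the last biallelic position before the first
    monoallelic one; right is that monoallelic position (None if absent).
    Sites with any other status are ignored.
    """
    last_biallelic = None
    for pos, status in sorted(sites):
        if status == "biallelic":
            last_biallelic = pos
        elif status == "monoallelic":
            return last_biallelic, pos
    return last_biallelic, None
```

With the BM1452.001 sites described above (biallelic at 18,838,045 and 18,842,061; monoallelic at 18,843,686 and 18,844,089) this gives the interval (18,842,061, 18,843,686); with only the biallelic site 18,838,045 for BM1453.001 it gives (18,838,045, None), i.e. a breakpoint somewhere downstream.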

PCR validation of the predicted breakpoints

To confirm our breakpoint inference from WGS data, we designed a set of PCR primers near the breakpoints or within the deleted region (Figure 5B; Supplementary Material, Table S4) and carried out PCR analysis with genomic DNA from the two 22q11.2DS probands. For each primer pair, either a single PCR product band or no band was detected on agarose gels. The PCR products were shotgun subcloned into bacteria, sequenced and analysed (see Methods for details). For individual primer pairs, the number of analysed clones ranged from 8 to 35. PCR products that did not map uniquely to the target regions were excluded from our analysis; in some cases, they were the majority. Matching the polymorphisms in the PCR products to the WGS data confirmed that position chr22:18,842,061 was indeed heterozygous for C/T alleles in DNA from BM1452.001 (Figure 5B, bottom). Unfortunately, we did not obtain PCR products carrying the alternative T allele in DNA from BM1453.001, based on our analysis of 16 clones. It was not possible to design PCR primers targeting specifically the two sites (18,843,686 and 18,844,089) inferred to be in the deleted region of LCR22A in BM1452.001 (Figure 4C; Table 4). We therefore chose a pair of primers further downstream (purple in Figure 5) that targeted a variant at chr22:18,846,470. Based on WGS, a G allele was present in DNA from BM1453.001, whereas BM1453.100 and BM1453.200 were G/G and T/T, respectively. Sequencing of the PCR products confirmed that BM1453.001 inherited only the G allele, supporting the location of this site in the deleted region of LCR22A. This primer pair, however, failed to generate any PCR products using DNA from BM1452.001. Based on WGS data, the genotypes at chr22:18,846,470 were T/T, T/G and T/G for BM1452.001, BM1452.100 and BM1452.200, respectively. Thus, even successful PCR sequencing would not have been informative for confirming the position of the deletion in BM1452.001.
As discussed above, we did not find WGS read pairs that mapped uniquely to the BCRP region of LCR22D. Thus, no PCR primers were available to validate the deletion status of the regions flanking the putative distal breakpoint in LCR22D.
Figure 5.

PCR validation of the breakpoints inferred from WGS data. (A) Cartoon representation of the locations of predicted breakpoints in LCR22A. (B) PCR products used to confirm the prediction. The three primer pairs that yielded informative results are indicated with coloured arrows. PCR products are illustrated with the inferred allele(s) (black line, paternal allele; orange line, maternal allele) and the numbers of analysed PCR clones (below the arrow track: the first number for the black allele and the second for the orange allele). “*” marks the heterozygous site (18,842,061) supported by both WGS reads and PCR sequencing. Grey vertical arrows mark the boundaries of predicted breakpoints.

There is only one copy of the IGSF3P locus in the LCR22s (Figure 1). It maps upstream of our predicted breakpoint in LCR22A, and thus one parental copy would be deleted in LCR22D of the two probands. We therefore used PCR primers specific for IGSF3P in LCR22D to test its deletion status. The results confirmed that only one haplotype was present in each of the two probands, based on our analysis of 30 PCR clones from BM1452.001 and 19 from BM1453.001. We confirmed the existence of a common SNP (rs62241966; chr22:21,606,270) within the region subjected to PCR. Comparing the PCR product sequences from the probands with the parental WGS reads uniquely mapped to the IGSF3P locus indicated that BM1452.001 inherited a G allele from the father (G/A in BM1452.200), whereas the mother (BM1452.100) was A/A. Data for BM1453.001 also supported the presence of the deletion, but less conclusively, as only a G allele was observed in the proband's PCR products, whereas both parents were G/A at this position based on WGS data. Overall, the results support the conclusion that the breakpoints in LCR22A are 5’ upstream (i.e., proximal) to IGSF3P in LCR22A and 3’ downstream (i.e., distal) of IGSF3P in LCR22D (Figure 5).
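The parent-of-origin reasoning at rs62241966 reduces to asking whether only one parent could have transmitted the allele seen in a hemizygous proband. The Python sketch below is a hypothetical helper that makes the rule explicit; it assumes the proband is already known to carry a single haplotype at the site and that parental genotypes are given as sets of alleles.

```python
def retained_parent(proband_alleles, mother_genotype, father_genotype):
    """Infer which parent's copy remains in a hemizygous region.

    All arguments are sets of allele strings, e.g. {"G"} or {"G", "A"}.
    Returns "father" or "mother" when only one parent could have
    transmitted the proband's allele, otherwise None (inconclusive).
    """
    if len(proband_alleles) != 1:
        return None  # not hemizygous at this site, or no data
    (allele,) = proband_alleles
    from_mother = allele in mother_genotype
    from_father = allele in father_genotype
    if from_father and not from_mother:
        return "father"
    if from_mother and not from_father:
        return "mother"
    return None  # both (or neither) parents carry the allele
```

For BM1452 (proband G, mother A/A, father G/A) the rule returns `"father"`, consistent with a deleted maternal copy; for BM1453 (proband G, both parents G/A) it returns `None`, which is why that result is less conclusive.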

In summary, we found that the proximal deletion breakpoint in LCR22A for the BM1452 trio is located between chr22:18,842,061 and chr22:18,843,686, within BCRP7. The distal breakpoint is located downstream of IGSF3P, probably within chr22:21,644,783–21,646,407 (Figure 5), assuming that NAHR occurred between LCR22A and LCR22D. Considering all the consistent evidence obtained from sequence analysis of human BAC clones, WGS analysis and PCR validation, we believe that the breakpoints for both probands are near or within BCRP blocks.

Discussion

The most common 22q11.2 deletion, 3 Mb in size, is mediated by NAHR events between LCR22A and LCR22D. The precise positions of the chromosome breakpoints leading to the deletion have remained elusive, even though many studies in past decades have attempted to map and clone them. Using a combination of BAC sequencing of normal individuals and whole genome sequencing of two unrelated patient-parent trios, we suggest that the BCR pseudogene (BCRP) module in LCR22s may harbour a hotspot for chromosome rearrangements associated with 22q11.2DS. This suggestion is based on four lines of evidence. First, sequence analysis of BAC clones derived from normal individuals shows that the BCRP locus contains the greatest density of shared sequence polymorphisms and the lowest density of paralogous sequence variants within the 160 kb region of highest overall homology between LCR22A and LCR22D. Second, BAC clone sequencing revealed several human alleles with CNVs in the BCRP locus. Third, we found a potentially rearranged BAC clone, AC007708, which appears to have a breakpoint in BCRP (Supplementary Material, Fig. S4). Fourth, we mapped the deletion endpoints in two unrelated trios in which both probands had a 3 Mb 22q11.2 deletion.

Although the most common deletion in individuals with 22q11.2DS is 3 Mb in size, a small subset (<10%) has nested or atypical deletions involving other LCR22s. Of interest, deletion breakpoints occurred in BCRP loci in two unrelated subjects with atypical distal 22q11.2 deletions involving LCR22E to LCR22H (19). Because the telomeric LCR22s (LCR22E to LCR22H) are smaller and more divergent in DNA sequence, it was feasible to amplify and sequence breakpoint junction fragments in these individuals. In that report, the deletion in one subject had breakpoints within LCR22D and LCR22E; in the other subject, the deletion occurred even more distally, between LCR22E and LCR22F (19). Both breakpoints occurred in the BCRP module (19), in line with our current finding. Taken together, these data point to the importance of the BCRP locus in promoting NAHR or gene conversion events in LCR22s.

Other evidence also supports the potential involvement of BCR sequence in modulating chromosomal rearrangements. The BCRP locus is a partial copy of the BCR gene, which itself is involved in translocations associated with cancers (20). However, these translocation breakpoints are neither in the vicinity of exons 20–23, where the putative 3 Mb deletion breakpoint junction may lie, nor in the region of the evolutionary breakpoint in the BCR gene that formed the BCRP module in non-human primates (8). Thus, the basis of recombination within BCR is independent of other known disease or evolutionary breakpoints.

Challenges in PCR validation analysis

Although our prediction of the breakpoint positions is based on experimental WGS data, it is important to confirm it using an orthogonal approach. The presence of multiple LCR22s and the high degree of sequence identity among them pose a major challenge even for PCR analysis, because useful PCR primers must be specific to a single LCR22 and able to distinguish between two different alleles. From our analysis of a few hundred PCR clones, we found that almost all primers targeting a single BCRP amplified more than one LCR22, producing PCR products of the same size from different LCR22s. Errors from amplification, cloning and Sanger sequencing may introduce variants that must be considered when determining the true origin of PCR products, although this is less of an issue when a sufficient number of PCR product clones is available. As such, we were only able to validate the biallelic status of chr22:18,842,061 in BM1452.001 and the monoallelic status of regions a few kb away from the predicted breakpoints. We also tried PCR primers flanking the putative distal breakpoints in LCR22D (Supplementary Material, Table S4), but the PCR products from the two probands either could not be mapped back to the target regions (e.g., several clones were from LCR22E/F) or contained variants not observed in the WGS reads, and thus provided limited information. Further analysis will be needed to extend this confirmation in the future, probably using more advanced sequencing methods (21,22).
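The exclusion of PCR clones that do not map uniquely to the target region can be sketched as a best-unique-match rule against the paralogous LCR22 copies. The Python function below is an illustration under assumptions not stated in the text: the clone and paralog sequences are taken to be pre-aligned to the same length, and the `max_mismatch` tolerance for amplification, cloning and sequencing errors is a hypothetical parameter.

```python
def assign_clone(clone, paralogs, max_mismatch=2):
    """Assign a PCR clone to the paralog it matches best, or return None.

    clone: sequence of the cloned PCR product; paralogs: dict mapping a label
    (e.g. "LCR22A") to the corresponding pre-aligned paralog sequence of the
    same length. A clone is kept only if a single paralog has the fewest
    mismatches and that count is at most max_mismatch (an assumed tolerance
    for amplification, cloning and sequencing errors).
    """
    clone = clone.upper()
    mismatches = {}
    for label, seq in paralogs.items():
        if len(seq) != len(clone):
            raise ValueError(f"{label}: sequences must be pre-aligned to equal length")
        mismatches[label] = sum(a != b for a, b in zip(clone, seq.upper()))
    best = min(mismatches.values())
    winners = [label for label, n in mismatches.items() if n == best]
    if len(winners) == 1 and best <= max_mismatch:
        return winners[0]
    return None  # ambiguous between paralogs or too divergent: exclude
```

A tie between two paralogs returns `None`, mirroring the situation described above in which a single sequencing error can make a clone from LCR22E/F indistinguishable from the intended target.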

We also designed PCR primers for long-range PCR assays with the goal of generating products spanning the predicted breakpoints. Despite repeated attempts, we did not identify a primer pair, with one primer anchored before the predicted breakpoint (chr22:18,842,061) in LCR22A and the other after the breakpoint (chr22:21,646,407, equivalent to chr22:18,843,686) in LCR22D, that amplified DNA from the proband but not from the parents in either family. Almost all tested primers generated products of the expected size from genomic DNA of both the probands and their parents. Cloning and Sanger sequencing of the long-range PCR products also indicated that one or two mismatches in the primers were not sufficient to prevent amplification from non-targeted LCR22s (e.g., LCR22E/F), making the PCR sequencing data very difficult to interpret (data not shown). Nevertheless, a few clones had sequences matching the undeleted copies of LCR22A. We note that our analysis of WGS reads and PCR validation were carried out under the assumption that the linearity of the reference genome is correctly maintained in the 22q11.2 regions of the two trios. We cannot rule out the possibility that some unknown structural rearrangement contributed to the difficulty in confirming our WGS findings.

Homologous recombination requires stretches of sequence identity

To more fully understand the possible mechanism by which the BCRP module serves as a hotspot for chromosome rearrangements, it is important to compare the findings for chromosome 22q11.2 with the literature on criteria for enhanced recombination. For example, recombination resulting in crossovers or in non-crossover events such as gene conversion is known to require a minimal length of identical sequence between paralogs, because the proteins mediating homologous recombination need a stretch of sequence identity to create recombination products. In bacteria, the minimal efficient processing segment (MEPS) of direct homology for crossing over is only about 30 bp (23). Mammalian homologous recombination requires a longer stretch of sequence identity, approximately 200 bp, as determined using extrachromosomal (24,25) and intrachromosomal recombination assays (25,26) of the HSV thymidine kinase (TK) gene in mammalian cell culture. A similar length of continuous sequence identity is also needed in mitotic homologous recombination assays (27). More recent studies suggest that the efficiency of homologous recombination increases as the length of uninterrupted homology extends well beyond the MEPS. For example, primate evolutionary studies of the human Y chromosome have indicated that long stretches of direct sequence identity, in this case of human endogenous retroviral sequences (HERV), may predispose to homologous recombination events (28). This work suggests that enhanced gene conversion can homogenize regions, producing stretches of higher homology between paralogs that would in turn stimulate more recombination. Such regions would by definition have more shared single nucleotide variants and fewer paralogous sequence variants.
Support for this idea would come from identifying regions within LCRs associated with genomic disorders that serve as recombination hotspots and that show stretches of sequence identity, due to enhanced gene conversion, together with shared sequence polymorphisms among different normal humans.
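The MEPS criterion can be made concrete by listing stretches of uninterrupted identity between two aligned paralogs and keeping those at least as long as a chosen threshold. The Python sketch below assumes the two paralog sequences are already aligned to equal length with "-" for gaps, and that gaps interrupt identity; the function name is hypothetical, and the default threshold simply uses the ∼200 bp mammalian value quoted above.

```python
def identical_runs(seq_a, seq_b, min_len=200):
    """Return half-open (start, end) alignment-column intervals where the two
    aligned sequences are identical for at least min_len consecutive columns.

    Gap columns ("-") never count as identity, so they break a run.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        match = a == b and a != "-"
        if match and start is None:
            start = i  # a new identical stretch begins
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # close a run that extends to the end of the alignment
    if start is not None and len(seq_a) - start >= min_len:
        runs.append((start, len(seq_a)))
    return runs
```

Setting `min_len=30` or `min_len=200` applies the bacterial or mammalian MEPS values, respectively; regions homogenized by gene conversion would be expected to yield longer runs than the rest of the duplicated sequence.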

Regions of homology and gene conversion in NAHR hotspots for genomic disorders

Fortunately, there is substantial experimental evidence for recombination hotspots in LCRs associated with several genomic disorders that fulfil the criteria of high homology and enhanced gene conversion. These include rearrangements associated with Charcot-Marie-Tooth disease type 1A (CMT1A; MIM# 118220) and Hereditary Neuropathy with liability to Pressure Palsies (HNPP; MIM# 162500) on chromosome 17p12 (29,30), as well as Smith-Magenis syndrome (SMS; MIM# 182290) on chromosome 17p11.2 (31). A hotspot has been identified within a 12 kb region in the LCRs on 17p11.2, coinciding with one of the regions of highest homology between the LCRs and with an increased rate of historic gene conversion (32). Further, approximately 5% of Neurofibromatosis type 1 (NF1; MIM# 162200) patients have deletions that arise by NAHR events between LCRs, with recombination hotspots identified in regions of direct sequence homology (33–35). As for SMS, such perfectly matched sequences in given patients might result from enhanced gene conversion in the region and would increase the risk of recombination (34,35). Evaluation of shared and paralogous sequence variants in these regions for NF1 supported the idea that gene conversion increased along with hotspot usage (36). A hotspot of approximately 10 kb was identified for Williams-Beuren syndrome (WBS; MIM# 194050), and, as for the other disorders, the WBS hotspot region contains longer stretches of homology between paralogous LCRs than other regions (37). There is a 3 kb hotspot for NAHR events associated with Sotos syndrome (SoS; MIM# 117550) deletions between LCRs on chromosome 5q35 (38,39). As expected, this 3 kb interval had increased sequence homology compared with other regions in the LCRs (40). All of these regions became duplicated during primate evolution and are not within functional genes; as such, they would be expected to possess paralogous sequence variants. Instead, they are highly similar and possess shared polymorphic sites. The most parsimonious explanation is that gene conversion events in those particular intervals resulted in homogenization of sequences and enhanced recombination.

PRDM9, the human hotspot motif, and other sequence features

It has long been of interest to identify sequence elements within hotspots for meiotic chromosome rearrangements. One well-characterized hotspot motif for mammalian homologous recombination is the 7-mer CCTCCCT (16) or the more degenerate 13-mer CCNCCNTNNCCNC (15). More recently, a gene termed PRDM9 (PR domain-containing 9) was identified that greatly influences the positions of homologous recombination sites in mammalian genomes (41–43). PRDM9 is a meiosis-specific histone methyltransferase and zinc-finger protein that binds to the CCNCCNTNNCCNC hotspot motif (15). Its sequence recognition depends on genetic variation within the zinc fingers encoded by its coding region (44). At least one copy of the motif is present in the region of chromosome crossovers in or near hotspots within LCRs for genomic disorders including X-linked ichthyosis (STS gene) (15,45), CMT1A/HNPP, NF1 and SMS, as well as Potocki-Lupski syndrome (duplication) on 17p11.2 (46). Different alleles of PRDM9 alter male meiotic recombination at the CMT1A/HNPP locus, suggesting that this is a key mechanism that can influence NAHR events (44). We previously found that the hotspot motif was significantly enriched at the breakpoints of frequently rearranged subunits comprising LCR22s (1.6-fold enrichment, P = 0.026) and of copy number variations within LCR22s (2.3-fold enrichment, P = 0.016; using data derived from our BAC mapping analysis), suggesting that it may have had a role in shaping LCR22 architecture (11). In this study, we found seven occurrences of this motif within or in the vicinity of the predicted recombination hotspots (Figure 3), with one (chr22:18,844,349) located close to the breakpoints inferred from WGS data.
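Scanning a sequence for the two hotspot motifs named above is a straightforward pattern search on both strands. The Python sketch below is a minimal illustration, not the tool used to produce Figure 3: N is expanded to any of A/C/G/T, minus-strand hits are reported at the start of the reverse-complement match on the forward sequence, and offsets are 0-based within the input (converting to chr22 coordinates is left to the caller). The helper names are hypothetical.

```python
import re

# N matches any base; only the IUPAC codes needed for these motifs are handled
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

HOTSPOT_13MER = "CCNCCNTNNCCNC"
HOTSPOT_7MER = "CCTCCCT"


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    # zero-width lookahead so that overlapping occurrences are all reported
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")


def find_motif(seq, motif):
    """Return sorted (0-based offset, strand) hits of motif in seq on both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        hits.extend((m.start(), strand) for m in motif_regex(pattern).finditer(seq))
    return sorted(hits)
```

For example, `find_motif("AACCTCCCTAA", HOTSPOT_7MER)` reports a single plus-strand hit at offset 2.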

We note that the GC content within the BCRP locus is higher than in its adjacent regions (Supplementary Material, Fig. S5). Increased GC content has been associated with recombination, possibly independently of the recombination motif described above. As for NF1 (35) and SMS (32), the Sotos syndrome recombination hotspot occurs in a GC-rich region (40), which may be related to the hotspot motif or to genomic structural features of GC-rich regions independent of the motif (47,48). In addition, two Alu SINE elements (AluJO at chr22:18,842,627–18,842,901 and chr22:21,645,329–21,645,622) lie immediately upstream of our predicted breakpoints in LCR22A and LCR22D (data not shown). Their potential relationship with the BCRP breakpoints and their contribution to the mechanism of the 3 Mb deletion need further characterization, given that Alu elements have been implicated in frequent rearrangements in LCR22s (8,11) and in the human genome (49).
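A GC-content profile of the kind summarized in Supplementary Fig. S5 can be computed with a simple sliding window. The window and step sizes used for that figure are not stated here, so the defaults below are hypothetical; ambiguous bases (e.g. N) are excluded from the denominator, and a window with no A/C/G/T bases returns NaN.

```python
def gc_windows(seq, window=1000, step=500):
    """Return (0-based window start, GC fraction) for each full window.

    The GC fraction is computed over unambiguous bases only (A, C, G, T).
    """
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        out.append((start, gc / acgt if acgt else float("nan")))
    return out
```

Comparing the windows that fall within the BCRP block with those in its flanks gives the kind of contrast described above.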

Clinical implications and significance

Experimental replication of the WGS data presented in this report will be required to determine how frequently the BCRP locus serves as a hotspot for 22q11.2 rearrangements in individuals with 22q11.2DS or with more distal rearrangements. This would require WGS of more trios, or targeted capture and massively parallel sequencing of the BCRP locus in many more affected offspring of normal, non-deleted parents. Improvements in sequencing technology to obtain longer reads or haplotype information would also be necessary. In addition, further breakpoint sequencing of individuals with unusual deletions in LCR22s, such as was performed for subjects with atypical deletions (19), would help clarify the degree of involvement of the BCRP region. If breakpoints do occur in the BCRP locus, additional sequence data may make it possible to narrow deletion endpoints more conclusively. Knowledge of deletion endpoints may eventually enable long-range PCR-based genotyping assays as a screening tool for patients with typical and atypical deletions. Because some individuals carry the reciprocal products of crossover events, resulting in a duplication of the 22q11.2 region, delineating NAHR hotspots would also help develop PCR screening assays for the duplication. This would be useful for prenatal or postnatal diagnostic genetic testing. It would also shed more light on the nature of sequences at major hotspots and help uncover the mechanism involved in NAHR events. This might make it possible to identify additional vulnerable regions of the genome in other LCRs and to pinpoint new genomic disorders. If the hotspot motif is indeed the one recognized by PRDM9, variants in PRDM9 and the LCR22s could alter an individual's risk of recombination, which could be tested at the population level.

Materials and Methods

Identification of BACs mapping to LCR22A or LCR22D and BAC sequencing

The centromeric LCR22A spans an interval of 237,727 bp from 18,638,473 to 18,876,199, and LCR22D spans an interval of 261,616 bp from 21,465,570 to 21,726,672 (NCBI 37/hg19 assembly). Filters containing purified human genomic DNA from BAC clones from the CTD (CalTech human BAC library D; NCBI CloneDB), RP11 (RPCI human BAC library 11), RP13 (RPCI human BAC library 13) and CH17 (CHORI-17 BAC library from a hydatidiform (haploid) mole) libraries were purchased from BACPAC Resources (http://bacpac.chori.org/). These libraries represent seven different haplotypes. The BAC filters were screened by Southern blot hybridization using 32P-labelled, PCR-based DNA probes to detect sequences within and just outside LCR22A and LCR22D (Supplementary Material, Table S1). The probes adjacent to LCR22s were chosen to anchor BAC clones to a specific LCR22 and unique region in the genome. BAC DNA was isolated from bacteria as described (https://bacpac.chori.org/protocols.htm) and screened by PCR with the same oligonucleotide primers used to screen the filters, to validate the presence of the correct BAC clone. Positive clones were then screened by PCR using primers spanning the LCR22 and surrounding interval to create a physical map of each clone and to ensure there were no major structural variations. This was followed by Sequenom multiplex genotyping to distinguish the two alleles from each genomic library. Primers for Sequenom assays were chosen across LCR22 regions containing variants with a high minor allele frequency; these variants are listed in Supplementary Material, Table S2. BAC clones from each allele were sequenced and assembled de novo at the Genome Center at Washington University. If a BAC clone from one allele had previously been sequenced, we made sure to sequence the other allele. To fill sequence gaps remaining after fragment assembly, PCR amplification followed by Sanger sequencing was performed at Washington University.
Restriction digestion was also performed to create a fingerprint across the BACs for comparison with the actual sequence. For two of 15 BAC clones, ∼40 kb of CH17-161A9 (BAC size 185 kb) and ∼102 kb of CH17-409N24 (BAC size 197 kb) exhibited extensive mosaicism when aligned to the reference (i.e. >10 mismatching nucleotides per kb in the region; see below) and were therefore excluded from further analysis. The sequences of the remaining 13 BAC clones have been deposited in GenBank (accession numbers KJ155472–KJ155484).

Sequence alignment of BACs to LCR22A and LCR22D

The assembled sequences from 13 BAC clones (with a few remaining gaps in some clones) were aligned to the reference genome (Figure 1; Table 1). Sequences of an additional 15 BAC clones previously available in GenBank (Supplementary Material, Fig. S1; Table S3) and mapped to LCR22s (11) were also re-aligned. For each of the 28 BACs, the local alignment tool BLAST (http://blast.ncbi.nlm.nih.gov/) (50) was used to identify its best-matched location (the longest continuous match with >99.5% identity) in the 22q11.2 reference sequence. The locations of the second-best BLAST hits were also checked for mapping consistency. The aligned position was then used to retrieve the reference sequence, extended if necessary to match the BAC size. A global alignment between the BAC clone and the retrieved reference sequence was constructed using the program “stretcher” within the EMBOSS software package (http://emboss.sourceforge.net/) (51). To avoid potential sequencing and assembly errors, the number of mismatches between each BAC clone and the reference was determined in a 1-kb window sliding along the global alignment; windows with >10 mismatching nucleotides were flagged and excluded from further variant analysis. This filtering step was necessary because some regions of BAC clones showed major differences from the reference genome sequence, suggesting that the alignment was spurious or reflected extensive rearrangement. Sequence variants for individual BACs were then called from the global alignment.
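The sliding-window filter and the subsequent variant calling can be sketched as follows. This Python function is an illustration under stated assumptions, not the authors' script: the global alignment (for example, from stretcher) is assumed to have already been parsed into two gapped strings of equal length; any differing column, including a gap opposite a base, counts as a mismatch; the step size of 1 column is an assumption; and only single nucleotide substitutions are reported. The 1-kb window and the >10 mismatch threshold come from the text.

```python
from itertools import accumulate


def call_snvs_with_window_filter(ref_aln, bac_aln, window=1000, max_mismatch=10, step=1):
    """Return (1-based reference offset, ref base, BAC base) for SNVs outside flagged windows."""
    if len(ref_aln) != len(bac_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, bac_aln = ref_aln.upper(), bac_aln.upper()
    length = len(ref_aln)
    # 1 where the aligned characters differ (substitution or gap), else 0
    diff = [int(r != b) for r, b in zip(ref_aln, bac_aln)]
    prefix = [0, *accumulate(diff)]  # prefix sums give O(1) window counts
    # difference array marking alignment columns covered by a flagged window
    cover = [0] * (length + 1)
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        if prefix[end] - prefix[start] > max_mismatch:
            cover[start] += 1
            cover[end] -= 1
    flagged = list(accumulate(cover))
    snvs = []
    ref_pos = 0  # 1-based position in the ungapped reference segment
    for i, (r, b) in enumerate(zip(ref_aln, bac_aln)):
        if r != "-":
            ref_pos += 1
        if flagged[i] == 0 and r != b and "-" not in (r, b):
            snvs.append((ref_pos, r, b))
    return snvs
```

Offsets are relative to the retrieved reference segment; adding the segment's start coordinate minus one converts them to chr22 positions.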

Multiple sequence alignment analysis of LCR22A/D BACs to categorize sequence variations

To classify single nucleotide variants as unique or shared within or between LCR22A and LCR22D, we performed multiple sequence alignments of the genomic regions covered by overlapping BAC clones. The reference sequence of the longest shared region of homology (i.e. the 160-kb segmental duplication) in LCR22A and LCR22D was used to compare sequences from BAC clones in LCR22D with the LCR22A sequence alignment (and its adjacent regions if a BAC extended outside LCR22D). The 160-kb region in LCR22A was split into smaller sequence units according to differential BAC clone coverage (Figure 2A). These units, termed “modules”, were delimited by boundary points at which a BAC alignment either began or ended, scanning from the beginning (i.e. the centromeric end) to the end of LCR22A. From the BAC data, 31 modules were identified. For each module, the reference LCR22A, LCR22D and BAC sequences were used to construct a multiple sequence alignment with the CLUSTALW program (http://www.ebi.ac.uk/Tools/msa/clustalw2/) (52). This produced the different classes of single nucleotide variants illustrated in Figure 2.
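The module definition can be illustrated with a small sketch. It assumes each BAC has already been placed on LCR22A as a half-open interval in 0-based LCR22A coordinates. The function and BAC names are hypothetical, and stretches with no BAC coverage are returned with an empty list rather than dropped.

```python
from typing import Dict, List, Tuple


def split_into_modules(region_len: int,
                       bacs: Dict[str, Tuple[int, int]]) -> List[Tuple[int, int, List[str]]]:
    """Cut [0, region_len) at every BAC start/end; report the BACs covering each piece."""
    points = {0, region_len}
    for start, end in bacs.values():
        # clip each BAC to the region, then use its ends as module boundaries
        s, e = max(0, start), min(region_len, end)
        if s < e:
            points.update((s, e))
    edges = sorted(points)
    modules = []
    for left, right in zip(edges, edges[1:]):
        # every BAC end is a boundary, so any overlap means full containment
        covering = sorted(name for name, (s, e) in bacs.items() if s < right and e > left)
        modules.append((left, right, covering))
    return modules
```

For example, two BACs covering positions 0–60 and 40–100 of a 100-bp region yield three modules: one covered only by the first BAC, one shared by both, and one covered only by the second.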

Breakpoint inference with BAC variants in LCR22A and LCR22D

SNVs termed SPS were compared with all other variant classes and used to detect potential recombination hotspots. All variants located within simple repeat regions were first excluded. Following the method described previously (9), a binomial distribution was used to calculate the enrichment of variants of each class in a defined window (10 kb). The probability of observing at least r variants in a given window was calculated as

$$P(X \ge r) = \sum_{i=r}^{n} \binom{n}{i} p^{i} (1-p)^{n-i},$$

where p is the average per-nucleotide probability of a variant, estimated as the total number of variant events in the LCR22A/LCR22D region divided by the length of that region; r is the number of observed variant events in the window; and n is the number of nucleotides in the window (i.e. 10 kb). A p value was calculated for each 10-kb window.
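A minimal sketch of the window test follows. It assumes non-overlapping 10-kb windows tiled from the start of the region and variant positions given as 0-based offsets within the LCR22A/LCR22D region. Both are assumptions, because the windowing scheme is not specified above. The tail probability is summed in log space with `math.lgamma`, which avoids overflowing the binomial coefficients when n = 10,000.

```python
import math
from typing import List, Sequence, Tuple


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    terms = (
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k, n + 1)
    )
    return min(1.0, math.fsum(terms))


def window_enrichment(positions: Sequence[int], region_len: int,
                      window: int = 10_000) -> List[Tuple[int, int, int, float]]:
    """(start, end, observed count, binomial p value) for each non-overlapping window."""
    if region_len <= 0:
        raise ValueError("region_len must be positive")
    p = len(positions) / region_len  # average per-nucleotide variant probability
    counts = [0] * ((region_len + window - 1) // window)
    for pos in positions:  # 0-based offsets within the region
        counts[pos // window] += 1
    results = []
    for idx, r in enumerate(counts):
        start = idx * window
        n = min(window, region_len - start)  # the last window may be shorter
        results.append((start, start + n, r, binom_upper_tail(r, n, p)))
    return results
```

Here `p` is the region-wide rate, so each window is judged against the LCR22A/LCR22D background rather than a genome-wide rate.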

Inferring deletion breakpoints from whole genome sequencing data

Blood samples were obtained with informed consent from two unrelated trio families (Institutional Review Board approved program, 1999-201), and the samples were de-identified in the Molecular Cytogenetics Core Facility at Einstein. In each family, the proband carried a de novo 3 Mb 22q11.2 deletion, and neither parent carried a deletion or was affected. The presence of the deletion was determined by Affymetrix 6.0 array comparative genomic hybridization (18). The average sequencing depth and genome coverage for the six individuals were 35.8x and 99.6%, respectively. All sequencing reads (90-bp paired-end) were mapped to the human genome (hg19) using BWA (version 0.5.9-r16) (53). The full analysis of the SNP and CNV calls in the WGS data has been described previously (18). For the current analysis, we analysed only reads that aligned to the 22q11.2 region. From the alignment data, we separated aligned read pairs into three categories: (i) both ends tagged with “XT:A:U”, which indicates unique mapping to a single genomic location; (ii) only one end tagged with “XT:A:U”; and (iii) all others. We additionally performed a BLAST search against the reference genome for read pairs in categories (i) and (ii) to confirm their unique genomic locations. For example, one read pair from BM1453.001 that aligned uniquely to LCR22A (covering chr22:18,843,119) is GTCTCTGAATCTCAGCCACCACCACATACAATGCAAGTGGGAAGATGGGCAGGACTGGGGGTGGGGCAGGCAGAGGCCACCTCTGTCAGG (336n) TACTTCTGGGTCTCAGTGGCCCAGGACAAGGGGCCAGCTCTGGGCTGATGGGGAGGTCTTCATGATGTGCTTGGGAGGGAAGGGGGGGCG, where 336n represents the insert between the two ends.
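The pair categorization can be sketched from SAM text records. This assumes the BWA `aln`/`sampe` convention, in which the XT tag is `U` for a unique placement and other values (for example `R`) mark non-unique placements. The function names are hypothetical, and the BLAST re-check described above is not included.

```python
from typing import Optional


def xt_tag(sam_line: str) -> Optional[str]:
    """Return the value of the XT:A tag in a SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for opt in fields[11:]:  # optional TAG:TYPE:VALUE fields follow the 11 mandatory columns
        if opt.startswith("XT:A:"):
            return opt[5:]
    return None


def classify_pair(mate1: str, mate2: str) -> str:
    """Category 'i' (both ends unique), 'ii' (one end unique) or 'iii' (others)."""
    unique = [xt_tag(m) == "U" for m in (mate1, mate2)]
    if all(unique):
        return "i"
    if any(unique):
        return "ii"
    return "iii"
```

In practice the same logic would be applied to both records of each read pair, keeping only pairs in categories (i) and (ii) for the downstream search.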
Next, we loaded the WGS reads from categories (i) and (ii) into the IGV browser (http://www.broadinstitute.org/igv/) (54) and manually searched for heterozygous alleles in the probands. Because a proband carries only one copy of the deleted interval, such heterozygous sites were most likely located in undeleted regions.
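The manual IGV search could, in principle, be approximated by scanning a pileup of the category (i) and (ii) reads for sites with two well-supported alleles. The sketch below is a hypothetical automated analogue, not the procedure used here. The depth and minor-allele-fraction thresholds are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

BASES = {"A", "C", "G", "T"}


def het_call(bases: Iterable[str], min_depth: int = 10,
             min_minor_frac: float = 0.2) -> Optional[Tuple[str, str]]:
    """Return (major, minor) alleles if a pileup column looks heterozygous, else None.

    The thresholds are illustrative assumptions, not values used in the study.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in BASES)  # drop N and other symbols
    depth = sum(counts.values())
    if depth < min_depth or len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts.most_common(2)
    if n_minor / depth < min_minor_frac:
        return None  # a rare second allele is more likely a sequencing error
    return major, minor
```

A confident heterozygous call in a proband argues that the site lies outside the hemizygous (deleted) interval, which is the reasoning the manual search relies on.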

PCR validation in genomic DNA of the two probands

To identify the positions of chromosome breakpoints in LCR22A and LCR22D, primer pairs were designed to amplify intervals of <1 kb containing SNVs of interest. The regions to be amplified were covered by unambiguously mapped WGS reads and contained the targeted SNVs in the probands. Primers were 20–30 nt long; the 3′ nucleotide matched an allele present at only one location, in one LCR22, in one or both probands, and the penultimate 3′ nucleotide was deliberately mismatched to maximize amplification specificity. Primer pairs were placed <1 kb apart to ensure adequate PCR amplification efficiency. The same principle was applied to design primers in the intronic region of the IGSF3 pseudogene within LCR22D, to determine whether the deletion breakpoint occurred in the proximal or distal blocks (upstream or downstream of the IGSF3 region, respectively). The primers are listed in Table S4. Standard PCR with Taq polymerase was used to amplify 50 ng of genomic DNA. Sequences derived from the allele-specific primers were mapped to the reference genome using BLAT (55). To exclude products amplified from unintended LCR22 regions, we analysed only PCR sequences whose best match was the targeted region with ≥99% sequence identity. To account for sequencing errors at a given position, we used Fisher's exact test (P < 0.05) to estimate the probability of observing a mismatching nucleotide at that position by error, relative to all nucleotides in the PCR sequences. A multiple sequence alignment (MSA) of the filtered sequences was constructed for each primer pair. The number of haplotypes was then inferred from the MSA, and detection of a single haplotype was taken as evidence supporting deletion of the other haplotype.
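The error filter and haplotype count can be sketched as below. This rests on assumptions that the text does not state. Mismatches are counted against the reference row of each primer-specific MSA. The 2×2 table compares mismatch and match counts at one column with those at all other columns. The test is one-sided, asking only whether mismatches are enriched at that column. Gap columns are ignored. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # hypergeometric tail: all tables at least as extreme as the observed one
    numer = sum(math.comb(col1, x) * math.comb(total - col1, row1 - x)
                for x in range(a, upper + 1))
    return numer / math.comb(total, row1)


def real_variant_columns(seqs: Sequence[str], ref: str, alpha: float = 0.05) -> List[int]:
    """Columns whose mismatch rate versus ref exceeds the background (error) rate."""
    mism = [0] * len(ref)
    match = [0] * len(ref)
    for seq in seqs:
        for j, (x, r) in enumerate(zip(seq, ref)):
            if x == "-" or r == "-":
                continue  # gaps are ignored
            if x == r:
                match[j] += 1
            else:
                mism[j] += 1
    tot_mis, tot_match = sum(mism), sum(match)
    cols = []
    for j, (a, b) in enumerate(zip(mism, match)):
        # column j (row 1) versus all other columns (row 2)
        if a and fisher_one_sided(a, b, tot_mis - a, tot_match - b) < alpha:
            cols.append(j)
    return cols


def count_haplotypes(seqs: Sequence[str], columns: Sequence[int]) -> int:
    """Distinct allele combinations at the retained columns (0 means no informative column)."""
    if not columns:
        return 0
    return len({"".join(s[j] for j in columns) for s in seqs})
```

Under these assumptions, a result of 1 from `count_haplotypes` at a site known to be heterozygous in a parent would be read as support for deletion of the other haplotype. A result of 0 carries no information.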

Supplementary Material

Supplementary Material is available at HMG online, including supplementary methods, Supplementary Materials, Tables S1–S4 and Figs S1–S5.

Acknowledgements

The authors would like to thank Dr. Adam Auton for helpful discussion and Dr. Jinlu Cai for help in WGS data analysis. We also thank Drs. Tamim Shaikh and Beverly Emanuel for critical comments and suggestions.

Conflict of Interest Statement. None declared.

Funding

This work was supported by National Institutes of Health (NIH) grants (MH083121 and HL133120 to BM and DZ).

References

1. McDonald-McGinn, D., Sullivan, K., Marino, B., Philip, N., Swillen, A., Vorstman, J., Zackai, E., Emanuel, B., Vermeesch, J., Morrow, B., et al. (2015) 22q11.2 deletion syndrome. Nat. Rev. Dis. Primers, 15071.

2. Edelmann, L., Pandita, R.K., Spiteri, E., Funke, B., Goldberg, R., Palanisamy, N., Chaganti, R.S., Magenis, E., Shprintzen, R.J. and Morrow, B.E. (1999) A common molecular basis for rearrangement disorders on chromosome 22q11. Hum. Mol. Genet., 8, 1157–1167.

3. Edelmann, L., Spiteri, E., McCain, N., Goldberg, R., Pandita, R.K., Duong, S., Fox, J., Blumenthal, D., Lalani, S.R., Shaffer, L.G., et al. (1999) A common breakpoint on 11q23 in carriers of the constitutional t(11;22) translocation. Am. J. Hum. Genet., 65, 1608–1616.

4. Shaikh, T.H., Kurahashi, H., Saitta, S.C., O'Hare, A.M., Hu, P., Roe, B.A., Driscoll, D.A., McDonald-McGinn, D.M., Zackai, E.H., Budarf, M.L., et al. (2000) Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: genomic organization and deletion endpoint analysis. Hum. Mol. Genet., 9, 489–501.

5. Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W. and Eichler, E.E. (2002) Recent segmental duplications in the human genome. Science, 297, 1003–1007.

6. Bi, W., Yan, J., Stankiewicz, P., Park, S.S., Walz, K., Boerkoel, C.F., Potocki, L., Shaffer, L.G., Devriendt, K., Nowaczyk, M.J., et al. (2002) Genes in a refined Smith-Magenis syndrome critical deletion interval on chromosome 17p11.2 and the syntenic region of the mouse. Genome Res., 12, 713–728.

7. Lupski, J.R. (1998) Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet., 14, 417–422.

8. Babcock, M., Pavlicek, A., Spiteri, E., Kashork, C.D., Ioshikhes, I., Shaffer, L.G., Jurka, J. and Morrow, B.E. (2003) Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res., 13, 2519–2532.

9. Pavlicek, A., House, R., Gentles, A.J., Jurka, J. and Morrow, B.E. (2005) Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome. Genome Res., 15, 1487–1495.

10. Lindsay, S.J., Khajavi, M., Lupski, J.R. and Hurles, M.E. (2006) A chromosomal rearrangement hotspot can be identified from population genetic variation and is coincident with a hotspot for allelic recombination. Am. J. Hum. Genet., 79, 890–902.

11. Guo, X., Freyer, L., Morrow, B. and Zheng, D. (2011) Characterization of the past and current duplication activities in the human 22q11.2 region. BMC Genomics, 12, 71.

12. Liu, P., Lacaria, M., Zhang, F., Withers, M., Hastings, P.J. and Lupski, J.R. (2011) Frequency of nonallelic homologous recombination is correlated with length of homology: evidence that ectopic synapsis precedes ectopic crossing-over. Am. J. Hum. Genet., 89, 580–588.

13. 1000 Genomes Project Consortium, Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T. and McVean, G.A. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.

14. MacDonald, J.R., Ziman, R., Yuen, R.K., Feuk, L. and Scherer, S.W. (2014) The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res., 42, D986–D992.

15. Myers, S., Freeman, C., Auton, A., Donnelly, P. and McVean, G. (2008) A common sequence motif associated with recombination hot spots and genome instability in humans. Nat. Genet., 40, 1124–1129.

16. Myers, S., Bottolo, L., Freeman, C., McVean, G. and Donnelly, P. (2005) A fine-scale map of recombination rates and hotspots across the human genome. Science, 310, 321–324.

17. Molina, O., Blanco, J., Anton, E., Vidal, F. and Volpi, E.V. (2012) High-resolution FISH on DNA fibers for low-copy repeats genome architecture studies. Genomics, 100, 380–386.

18. Chung, J.H., Cai, J., Suskin, B.G., Zhang, Z., Coleman, K. and Morrow, B.E. (2015) Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations. Hum. Mutat., 36, 797–807.

19. Shaikh, T.H., O'Connor, R.J., Pierpont, M.E., McGrath, J., Hacker, A.M., Nimmakayalu, M., Geiger, E., Emanuel, B.S. and Saitta, S.C. (2007) Low copy repeats mediate distal chromosome 22q11.2 deletions: sequence analysis predicts breakpoint mechanisms. Genome Res., 17, 482–491.

20. Grigoriou, E.E., Psarra, K.K., Garofalaki, M.K., Tziotziou, E.C. and Papasteriades, C.A. (2012) BCR-ABL fusion protein detection in peripheral blood and bone marrow samples of adult precursor B-cell acute lymphoblastic leukemia patients using the flow cytometric immunobead assay. Clin. Chem. Lab. Med., 50, 1657–1663.

21. Chaisson, M.J., Huddleston, J., Dennis, M.Y., Sudmant, P.H., Malig, M., Hormozdiari, F., Antonacci, F., Surti, U., Sandstrom, R., Boitano, M., et al. (2015) Resolving the complexity of the human genome using single-molecule sequencing. Nature, 517, 608–611.

22. Nuttle, X., Itsara, A., Shendure, J. and Eichler, E.E. (2014) Resolving genomic disorder-associated breakpoints within segmental DNA duplications using massively parallel sequencing. Nat. Protoc., 9, 1496–1513.

23. Shen, P. and Huang, H.V. (1986) Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics, 112, 441–457.

24. Rubnitz, J. and Subramani, S. (1984) The minimum amount of homology required for homologous recombination in mammalian cells. Mol. Cell. Biol., 4, 2253–2258.

25. Waldman, A.S. and Liskay, R.M. (1988) Dependence of intrachromosomal recombination in mammalian cells on uninterrupted homology. Mol. Cell. Biol., 8, 5350–5357.

26. Waldman, A.S. and Liskay, R.M. (1987) Differential effects of base-pair mismatch on intrachromosomal versus extrachromosomal recombination in mouse cells. Proc. Natl. Acad. Sci. U. S. A., 84, 5340–5344.

27. Liskay, R.M., Letsou, A. and Stachelek, J.L. (1987) Homology requirement for efficient gene conversion between duplicated chromosomal sequences in mammalian cells. Genetics, 115, 161–167.

28. Blanco, P., Shlumukova, M., Sargent, C.A., Jobling, M.A., Affara, N. and Hurles, M.E. (2000) Divergent outcomes of intrachromosomal recombination on the human Y chromosome: male infertility and recurrent polymorphism. J. Med. Genet., 37, 752–758.

29. Lupski, J.R., de Oca-Luna, R.M., Slaugenhaupt, S., Pentao, L., Guzzetta, V., Trask, B.J., Saucedo-Cardenas, O., Barker, D.F., Killian, J.M., Garcia, C.A., et al. (1991) DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell, 66, 219–232.

30. Reiter, L.T., Hastings, P.J., Nelis, E., De Jonghe, P., Van Broeckhoven, C. and Lupski, J.R. (1998) Human meiotic recombination products revealed by sequencing a hotspot for homologous strand exchange in multiple HNPP deletion patients. Am. J. Hum. Genet., 62, 1023–1033.

31. Park, S.S., Stankiewicz, P., Bi, W., Shaw, C., Lehoczky, J., Dewar, K., Birren, B. and Lupski, J.R. (2002) Structure and evolution of the Smith-Magenis syndrome repeat gene clusters, SMS-REPs. Genome Res., 12, 729–738.

32. Bi, W., Park, S.S., Shaw, C.J., Withers, M.A., Patel, P.I. and Lupski, J.R. (2003) Reciprocal crossovers and a positional preference for strand exchange in recombination events resulting in deletion or duplication of chromosome 17p11.2. Am. J. Hum. Genet., 73, 1302–1315.

33. Lopez Correa, C., Brems, H., Lazaro, C., Marynen, P. and Legius, E. (2000) Unequal meiotic crossover: a frequent cause of NF1 microdeletions. Am. J. Hum. Genet., 66, 1969–1974.

34. Forbes, S.H., Dorschner, M.O., Le, R. and Stephens, K. (2004) Genomic context of paralogous recombination hotspots mediating recurrent NF1 region microdeletion. Genes Chromosomes Cancer, 41, 12–25.

35. Lopez-Correa, C., Dorschner, M., Brems, H., Lazaro, C., Clementi, M., Upadhyaya, M., Dooijes, D., Moog, U., Kehrer-Sawatzki, H., Rutkowski, J.L., et al. (2001) Recombination hotspot in NF1 microdeletion patients. Hum. Mol. Genet., 10, 1387–1392.

36. De Raedt, T., Stephens, M., Heyns, I., Brems, H., Thijs, D., Messiaen, L., Stephens, K., Lazaro, C., Wimmer, K., Kehrer-Sawatzki, H., et al. (2006) Conservation of hotspots for recombination in low-copy repeats associated with the NF1 microdeletion. Nat. Genet., 38, 1419–1423.

37. Bayes, M., Magano, L.F., Rivera, N., Flores, R. and Perez Jurado, L.A. (2003) Mutational mechanisms of Williams-Beuren syndrome deletions. Am. J. Hum. Genet., 73, 131–151.

38. Kurotaki, N., Harada, N., Shimokawa, O., Miyake, N., Kawame, H., Uetake, K., Makita, Y., Kondoh, T., Ogata, T., Hasegawa, T., et al. (2003) Fifty microdeletions among 112 cases of Sotos syndrome: low copy repeats possibly mediate the common deletion. Hum. Mutat., 22, 378–387.

39. Kurotaki, N., Stankiewicz, P., Wakui, K., Niikawa, N. and Lupski, J.R. (2005) Sotos syndrome common deletion is mediated by directly oriented subunits within inverted Sos-REP low-copy repeats. Hum. Mol. Genet., 14, 535–542.

40. Visser, R., Shimokawa, O., Harada, N., Kinoshita, A., Ohta, T., Niikawa, N. and Matsumoto, N. (2005) Identification of a 3.0-kb major recombination hotspot in patients with Sotos syndrome who carry a common 1.9-Mb microdeletion. Am. J. Hum. Genet., 76, 52–67.

41. Thomas, J.H., Emerson, R.O. and Shendure, J. (2009) Extraordinary molecular evolution in the PRDM9 fertility gene. PLoS One, 4, e8505.

42. Parvanov, E.D., Petkov, P.M. and Paigen, K. (2010) Prdm9 controls activation of mammalian recombination hotspots. Science, 327, 835.

43. Baudat, F., Buard, J., Grey, C., Fledel-Alon, A., Ober, C., Przeworski, M., Coop, G. and de Massy, B. (2010) PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science, 327, 836–840.

44. Berg, I.L., Neumann, R., Lam, K.W., Sarbajna, S., Odenthal-Hesse, L., May, C.A. and Jeffreys, A.J. (2010) PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat. Genet., 42, 859–863.

45. Liu, P., Erez, A., Nagamani, S.C., Bi, W., Carvalho, C.M., Simmons, A.D., Wiszniewska, J., Fang, P., Eng, P.A., Cooper, M.L., et al. (2011) Copy number gain at Xp22.31 includes complex duplication rearrangements and recurrent triplications. Hum. Mol. Genet., 20, 1975–1988.

46. Zhang, F., Potocki, L., Sampson, J.B., Liu, P., Sanchez-Valle, A., Robbins-Furman, P., Navarro, A.D., Wheeler, P.G., Spence, J.E., Brasington, C.K., et al. (2010) Identification of uncommon recurrent Potocki-Lupski syndrome-associated duplications and the distribution of rearrangement types and mechanisms in PTLS. Am. J. Hum. Genet., 86, 462–470.

47. Jiang, H., Li, N., Gopalan, V., Zilversmit, M.M., Varma, S., Nagarajan, V., Li, J., Mu, J., Hayton, K., Henschen, B., et al. (2011) High recombination rates and hotspots in a Plasmodium falciparum genetic cross. Genome Biol., 12, R33.

48. Fullerton, S.M., Bernardo Carvalho, A. and Clark, A.G. (2001) Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol., 18, 1139–1142.

49. Bailey, J.A., Liu, G. and Eichler, E.E. (2003) An Alu transposition model for the origin and expansion of human segmental duplications. Am. J. Hum. Genet., 73, 823–834.

50. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

51. Rice, P., Longden, I. and Bleasby, A. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

52. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

53. Li, H. and Durbin, R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589–595.

54. Robinson, J.T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G. and Mesirov, J.P. (2011) Integrative genomics viewer. Nat. Biotechnol., 29, 24–26.

55. Kent, W.J. (2002) BLAT–the BLAST-like alignment tool. Genome Res., 12, 656–664.

Author notes

Present address: Division of Epidemiology, Department of Medicine, Vanderbilt University School of Medicine, TN, USA
