Abstract

Genomic copy number variation (CNV) plays a major role in various human diseases as well as in normal phenotypic variability. For some recurrent disease-causing CNVs that convey genomic disorders, the causative mechanism is meiotic, non-allelic, homologous recombination between breakpoint regions exhibiting extensive sequence homology (e.g. low-copy repeats). For the majority of recently identified rare pathogenic CNVs, however, the mechanism is unknown. Recently, a model for CNV formation implicated mitotic replication-based mechanisms, such as (alternative) non-homologous end joining and fork stalling and template switching, in the etiology of human pathogenic CNVs. The extent to which such mitotic mechanisms contribute to rare pathogenic CNVs remains to be determined. In addition, it is unexplored whether genomic architectural features such as repetitive elements or sequence motifs associated with DNA breakage stimulate the formation of rare pathogenic CNVs. To this end, we have sequenced breakpoint junctions of 30 rare pathogenic microdeletions and eight tandem duplications, representing the largest series of such CNVs examined to date in this much detail. Our results demonstrate the presence of (micro)homology ranging from 2 to over 75 bp, in 79% of the breakpoint junctions. This indicates that microhomology-mediated repair mechanisms, including the recently reported fork stalling and template switching and/or microhomology-mediated break-induced replication, prevail in rare pathogenic CNVs. In addition, we found that the vast majority of all breakpoints (81%) were associated with at least one of the genomic architectural features evaluated. Moreover, 75% of tandem duplication breakpoints were associated with the presence of one of two novel sequence motifs. 
These data suggest that rare pathogenic microdeletions and tandem duplications do not occur at random genome sequences, but are stimulated and potentially catalyzed by various genomic architectural features.

INTRODUCTION

Microarray-based copy number-profiling technologies have taken the resolution to detect genome-wide copy number variation (CNV) to a level 100–1000 times higher than that achievable by conventional karyotyping. These technologies have been applied successfully in the field of clinical genetics, leading to (i) a significant increase in diagnostic yield for patients with unexplained mental retardation (MR)/developmental delay, (ii) the identification of genes causative for sporadic malformation syndromes, and (iii) novel microdeletion syndromes (1–12). In addition to these pathogenic CNVs, extensive DNA CNV has been identified recently in healthy individuals (benign CNVs). These CNVs are anticipated to contribute to normal phenotypic variation as well as to susceptibility to multifactorial disease (4, 13).

As yet, little is known about the formation of CNVs. Certain genomic architectural features, such as low-copy repeats (LCRs), may catalyze recurrent CNV formation (14–19). Because of extensive sequence homology, LCRs can mediate meiotic chromosome/chromatid ectopic pairing, followed by non-allelic homologous recombination (NAHR). As such, these recurrent CNVs do not represent random events but, instead, reflect the underlying genomic architectural features (14, 15, 17–19). However, the vast majority of pathogenic microdeletions and microduplications consist of rare non-recurrent CNVs, scattered throughout the genome, that are not mediated by LCRs (20).

It is plausible that genomic architectural features other than LCRs may play a role in the molecular mechanisms resulting in the formation of these rare non-recurrent CNVs (21–25). The spectrum and contribution of these genomic features and mechanisms leading to rare pathogenic CNVs are poorly explored, as only a limited number of such CNVs have been delineated to the base pair level (26–28). On the basis of the observation that short stretches of microhomology are present at the breakpoint sites, various mechanisms have been hypothesized to play a role, such as non-homologous end joining (NHEJ), alternative NHEJ [alt-NHEJ; also known as microhomology-mediated end joining (MMEJ)] or fork stalling and template switching (FoSTeS) (29–31). These latter replication-based mechanisms indicate that mitotic events, rather than meiotic events, could play an important role in the formation of rare pathogenic CNVs (32). Recently, this hypothesis was experimentally supported by using a model system for CNV formation specifically resulting from replication stress (33). In this series of induced CNVs, microhomology at the breakpoint junction (<6 bp) was identified as a key feature, consistent with the hypothesis that these CNVs were formed by alt-NHEJ or a replication-based mechanism. The ultimate proof to confirm these findings would come from a large series of naturally occurring rare pathogenic CNVs.

To this end, we have analyzed 38 rare pathogenic CNVs, including 30 deletions and eight tandem duplications, previously identified in routine diagnostics. All corresponding 76 breakpoints were resolved to the base pair level by amplification and sequencing of breakpoint junction fragments. The presence and extent of (micro)homology were determined for each CNV. To evaluate the local genomic architecture, the experimentally determined breakpoints were computationally analyzed for the presence of genomic architectural features, including repetitive elements, sequence motifs associated with chromosomal rearrangements and the potential of these sequences to adopt non-B DNA conformations. Additionally, we compared these breakpoint regions to ‘an average genome sequence’ to investigate whether breakpoint sequences occur randomly in the genome or show a bias towards the presence of specific genomic sequences. We provide evidence for the enrichment of Alu repetitive elements and two new sequence motifs that may facilitate pathology-associated genomic rearrangements that lead to CNV. In addition, the level of microhomology observed at the breakpoint junctions supports an important role for replication-based mechanisms and, thus, a mitotic origin.

RESULTS

To study molecular mechanisms and local genomic architectural features underlying rare pathogenic CNVs, we sequenced and aligned the breakpoints of 30 deletions and eight tandem duplications to the human genome (hg18) and determined the exact proximal and distal breakpoints (Table 1). With this information, we analyzed the respective breakpoint regions for the presence of (micro)homology, repetitive elements, sequence motifs previously associated with DNA breakage and potential to adopt non-B DNA conformations.

Table 1.

Breakpoint mapping and evaluation of genomic features at the breakpoints

Del/ Dup Chr Start (hg18) End (hg18) Size (kb)  Micro homology (bp) a Inserted bases (bp)  Breakpoint (region) 1
 
Breakpoint (region) 2
 
Potential molecular mechanism 
Repetitive element Sequence motifs Non-B DNA conformation Repetitive element Sequence motifs Non-B DNA conformation 
Deletion CNVs ordered by the size of microhomology at the junction 
14 196 883 16 342 940 2146 − −  − −  − NHEJ 
165 316 927 165 701 173 384 − − MER5A − L1PB4 − − NHEJ 
145 936 944 146 244 400 307 − 1 (A)  − L1PBa1 − NHEJ 
27 162 674 28 279 518 1117 − 7 (TTGAGAC) AluSg/x − AluJb NHEJ 
24 447 284 31 302 616 6855 −  AluJb − FoSTeSx1/MMBIR/NHEJ 
117 097 068 117 798 935 702 − AluJb − L2 − FoSTeSx1/MMBIR/NHEJ 
61 214 995 75 244 612 14 030 − FLAM_C − MER21C − − FoSTeSx1/MMBIR/NHEJ 
50 034 351 50 413 346 379 −  −  − − FoSTeSx1/MMBIR/NHEJ 
148 832 270 150 022 686 1190 −  − L1MD1 − − FoSTeSx1/MMBIR/NHEJ 
10 160 402 568 165 130 160 4728 − L2 − −  − − FoSTeSx2/MMBIR/SSA 
11 78 514 998 79 167 042 652 −  − −  − FoSTeSx1/MMBIR/NHEJ 
12 61 798 468 61 889 695 91 − Charlie1 − − AluSx − − FoSTeSx1/MMBIR/NHEJ 
13 18 360 769 18 498 469 138 − AluJb − −  − − FoSTeSx1/MMBIR/NHEJ 
14 13 31 154 819 37 648 938 6494 −  − MLT2B3 − − FoSTeSx1/MMBIR/NHEJ 
15 b 16 85 157 843 85 288 903 131 −  − −  − − FoSTeSx1/MMBIR/NHEJ 
16 18 44 917 833 45 160 158 242 − MIRb − −  − − FoSTeSx1/MMBIR/NHEJ 
17 b 16 84 275 154 86 275 754 2001 − L1ME1 − MER3 − − FoSTeSx1/MMBIR/NHEJ 
18 19 60 404 228 63 131 125 2727 − MLT1A0 − −  − FoSTeSx1/MMBIR/NHEJ 
19 145 485 456 149 766 094 4281 −  −  − FoSTeSx1/MMBIR/NHEJ 
20 b,c 16 84 402 571 85 435 712 1033 − AluSx − AluSg1 − FoSTeSx1/MMBIR/NAHR/NHEJ 
21 b,c 18 297 587 18 454 995 157 11 − AluSq − − AluSp − − FoSTeSx1/MMBIR/NAHR/NHEJ 
22 b,c 16 82 908 199 86 405 076 3497 12 − AluSx − − AluJ/FLAM − FoSTeSx1/MMBIR/NAHR/NHEJ 
23 b,c 18 367 759 18 442 628 75 15 − AluSx − − AluSq − FoSTeSx1/MMBIR/NAHR/NHEJ 
24 b,c 16 83 705 765 85 204 004 1498 17 − AluSq − − AluSq − FoSTeSx1/MMBIR/NAHR/NHEJ 
25 b,c 16 84 374 208 85 277 007 903 18 − AluSx − AluSx − − FoSTeSx1/MMBIR/NAHR/NHEJ 
26 c 16 87 676 995 88 037 132 360 20 − AluY − AluSx − FoSTeSx1/MMBIR/NAHR/NHEJ 
27 c 29 176 161 30 008 811 833 27 − AluSc − − AluY − − FoSTeSx1/MMBIR/NAHR/NHEJ 
28 c 155 746 065 157 307 965 1562 30 − AluY − − AluY − FoSTeSx1/MMBIR/NAHR/NHEJ 
29 c 60 249 148 62 195 867 1947 61 − L1PA3 − − L1PA3 − − FoSTeSx1/MMBIR/NAHR/NHEJ 
30 c 6 463 246 8 099 396 1636 >75 − LCRXpC − − LCRXpC′ NAHR 
Duplication CNVs ordered by the size of microhomology at the junction 
14 20 564 121 20 959 453 395 6 (TTTACT)  − L1MC4a FoSTeSx2/MMBIR/NHEJ 
39 298 883 39 579 678 281 6 (ACCGGG)  −  L1PA11 − − FoSTeSx2/MMBIR/NHEJ 
53 692 970 62 539 276 8846 − MIR −   − FoSTeSx1/MMBIR/NHEJ 
50 011 487 50 195 746 184 8 (GAAAGTGT)  −  L1MC1 − FoSTeSx2/MMBIR/NHEJ 
30 244 739 30 365 510 121 −  −   − − FoSTeSx1/MMBIR/NHEJ 
32 443 911 32 490 362 46 −  −   − − FoSTeSx1/MMBIR/NHEJ 
172 494 609 173 018 830 524 − L2 −  − − FoSTeSx1/MMBIR/NHEJ 
8 b 97 063 032 97 426 630 364 −   Charlie2a − − FoSTeSx1/MMBIR/NHEJ 

‘+’, enrichment of sequence motif (Table  2 ) or potential to adopt non-B DNA conformation; ‘−’, no enrichment of sequence motif or potential to adopt non-B DNA conformation.

a Schematic overview of all breakpoint junctions is provided in Supplementary Material, Figure S1.

b Previously published in references (51–53).

c Extent of homology was further evaluated using BLAST2 (Supplementary Material, Fig. S2).

Microhomology is present at the vast majority of breakpoints

Twenty-four of 30 (80%) deletion CNVs, as well as six of eight (75%) tandem duplications, showed microhomology at the breakpoints, ranging in size from 2 bases to >75 bases (Table 1). A 2 bp microhomology was identified in 11 breakpoint junctions, a 3 bp microhomology was observed in four junctions, a 5 bp microhomology was found in two junctions and 4, 6 and 8 bp microhomologies were each identified in a single breakpoint junction (Fig. 1; Supplementary Material, Fig. S1). Interestingly, in one duplication CNV, Dup4, the observed microhomology was accompanied by an 8 bp GAAAGTGT insertion at the breakpoint junction (Table 1). Longer stretches of homology, extending over 10 bases, were identified in 10 deletion CNVs (Fig. 1C; Table 1).
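The notion of perfect microhomology at a junction can be made concrete with a short Python sketch. It is an illustration only and not the procedure used in this study: the function name `microhomology_length`, the toy reference string and the 0-based, half-open coordinates are assumptions, and only a simple deletion on a single reference sequence is handled. The function counts how many bases the deleted interval can be shifted to the left or right without changing the junction product, which equals the number of bases that could be assigned to either side of the junction.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Perfect microhomology (bp) at the junction of a deletion of ref[start:end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")

    # Shifting the deletion one base to the right leaves the junction
    # unchanged when the first deleted base equals the first retained
    # distal base; repeat while that keeps holding.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1

    # Same idea for shifts to the left: last retained proximal base
    # versus last deleted base.
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1

    return left + right


# Toy reference: a 2 bp 'CT' repeat flanks the deleted segment.
toy_ref = "TTGCA" + "CT" + "GGATG" + "CT" + "AGTCC"
toy_mh = microhomology_length(toy_ref, 5, 12)  # deletes 'CTGGATG'
```

In the toy reference, the 'CT' dinucleotide is present at both ends of the deleted interval, so the junction is reported as carrying 2 bp of microhomology; a junction with no shared bases returns 0.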

Figure 1.

Junction fragments of selected CNVs: Del10 (A), Del19 (B), Del27 (C) and Dup7 (D). DNA sequences, obtained from direct sequencing of the junction fragments, were aligned to the normal wild-type proximal and distal sequences. Sequence homology to the normal proximal and distal wild-type sequences is shown in blue and magenta. If genomic elements were known for the normal wild-type proximal and distal sequences, they are indicated above and below the wild-type sequences. Arrows extending to the left and right indicate that the genomic element is longer than that shown in this schematic representation. The presence of perfect microhomology is indicated by a shaded yellow box spanning the bases with perfect microhomology. Magenta blocks indicate a second stretch of microhomology (7 bp) on each side of the additional 20 bp deletion observed in the junction fragment of Del10.

BLAST2 analysis was used to determine the extent of the homology as well as the percentage of sequence identity (Table 1; Supplementary Material, Fig. S2). Using this approach, the breakpoints of Del30 were found to lie in directly oriented LCRXpC and LCRXpC′, which are part of an ∼7.4 kb LCR cluster on chromosome Xp22.31 (Table 1). The distal copy LCRXpC is 1288 bp in size and shares 98% sequence identity with its proximal counterpart LCRXpC′, which is 1291 bp in size. For the remaining nine breakpoint junctions with stretches >10 bp, sequence identity ranged from 78% over 280 bases (Del25) to 97% over 4.6 kb (Del29) (Supplementary Material, Fig. S2).

To determine whether the observed (micro)homology could be expected by chance, a series of 100 breakpoints representing the average genome was simulated and analyzed for the presence and extent of (micro)homology. The distribution of microhomology in simulated breakpoints differed significantly (Wilcoxon rank sum test, P = 7.27 × 10⁻¹⁵) from the distribution of microhomology at the breakpoint junctions observed in rare pathogenic microdeletions and tandem duplications. To exclude significance due to the relatively low number of simulated breakpoints, the number of simulated breakpoints was increased to a total of 500. This analysis strengthened our initial results, as indicated by a highly significant P-value of 8.64 × 10⁻²⁰ (Fig. 2). In addition, to exclude a bias introduced by the relatively large number of rare pathogenic CNVs with extended homology, the analysis was repeated excluding microhomology >10 bp (100 simulations, P = 1.58 × 10⁻¹⁰; 500 simulations, P = 4.49 × 10⁻¹³). This indicates that the length of exact microhomology at the breakpoints of rare pathogenic microdeletions and tandem duplications significantly exceeds the length of microhomology expected by chance.
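The logic of such a chance expectation can be sketched in Python as follows. This is a simplified stand-in and not the simulation performed here: it draws uniformly random synthetic sequence instead of sampling breakpoints from hg18, and the function names, flank size, deletion size range and seed are all hypothetical choices. It returns a null distribution of microhomology lengths, which could then be compared with the observed junctions using a rank sum test.

```python
import random


def microhomology_length(ref: str, start: int, end: int) -> int:
    """Perfect microhomology (bp) for a deletion of ref[start:end] (0-based, half-open)."""
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    return left + right


def simulate_null(n_breakpoints: int = 500, flank: int = 200,
                  min_del: int = 50, max_del: int = 500, seed: int = 0) -> list:
    """Microhomology lengths for random deletions in random sequence.

    Assumption: bases are drawn uniformly and independently, which ignores
    the repeat content and base composition of a real genome.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_breakpoints):
        del_len = rng.randint(min_del, max_del)
        ref = "".join(rng.choice("ACGT") for _ in range(2 * flank + del_len))
        lengths.append(microhomology_length(ref, flank, flank + del_len))
    return lengths


null_lengths = simulate_null()
null_mean = sum(null_lengths) / len(null_lengths)
fraction_0_or_1 = sum(1 for x in null_lengths if x <= 1) / len(null_lengths)
```

Under this uniform model most simulated junctions carry 0 or 1 bp of microhomology, in line with the simulated distribution described for Figure 2; a genome-derived null would be needed to reproduce the reported P-values.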

Figure 2.

Distribution of microhomology for simulated breakpoints (n = 500; blue bars) and observed breakpoints in rare pathogenic CNVs (n = 38; red bars). The simulated breakpoints center at 0–1 bp of microhomology, whereas the observed breakpoint microhomology clusters away from the simulated breakpoint microhomology, with its center around 1–3 bp of microhomology. This indicates that longer stretches of microhomology are unlikely to arise by chance alone and, thus, suggests a functional role for the presence of microhomology at the breakpoint junctions of rare pathogenic CNVs.

Two of the deletion CNVs, not having any signs of homology, showed a ‘perfect’ transition from the normal proximal to normal distal wildtype sequence (Del1 and Del2). Two additional deletion CNVs, and two tandem duplication CNVs without microhomology, Del3, Del4, Dup1 and Dup2, showed incorporation of additional bases at the breakpoint junction, i.e. a 1 bp A, a 7 bp TTGAGAC, a 6 bp TTTACT and a 6 bp ACCGGG insertion, respectively (Table  1 ).

Repetitive elements are enriched at deletion breakpoints but not at duplication breakpoints

A known repetitive element was observed at 42 of 60 (70%) deletion CNV breakpoints (Table 1). These included 10 different SINEs (AluSg/x, AluJb, FLAM_C, MIRb, AluSg, AluSp, AluSc, AluSq, AluSx and AluY), six different LINEs (L2, L1ME1, L1PA3, L1PB4, L1PBa1 and L1MD1), three DNA repeats (MER5A, Charlie1 and MER3) and three LTRs (MLT1A0, MER21C and MLT2B3). In contrast, only 6 of 16 (38%) tandem duplication breakpoints occurred within such elements (Table 1), suggesting a difference between deletions and tandem duplications for these genomic features. To determine whether the observed frequency of repetitive elements at the breakpoint sequences is different from ‘an average genome’, the same list of 500 randomly generated breakpoints was analyzed for these features. In total, 235 (47%) of these simulated breakpoints coincided with known repetitive elements. When comparing deletion breakpoints to this simulation of the average genome, deletion breakpoints show a significant enrichment for repetitive elements (P = 2.42 × 10⁻²), whereas tandem duplication breakpoints fall within the normal genomic distribution of these elements (P = 0.81).

In total, 25 out of 30 (83%) deletion CNVs have at least one breakpoint within a known repetitive element (Table 1); in eight cases, only one of the breakpoints mapped within a repetitive element, whereas the other sequence was unique; in five other cases, both breakpoints were part of a known element, but those elements were of different classes; in the 12 remaining cases, both breakpoints were part of a known element belonging to the same class. For tandem duplication CNVs, six out of eight (75%) had a breakpoint within a repetitive element (Table 1). Interestingly, none of the tandem duplications showed repetitive elements at both breakpoints.

Identification of three novel sequence motifs in tandem duplication breakpoints and enrichment for known motifs in breakpoint sequences

In addition to repetitive elements, other sequence motifs have been reported to predispose to DNA breakage, such as the deletion hotspot consensus sequence, topoisomerase consensus cleavage sites and translin-binding sites (22, 34). Here we tested three hypotheses for the role of sequence motifs in the formation of rare pathogenic microdeletions and tandem duplications: (i) enrichment for individual known motifs stimulating CNV formation, (ii) an overall increased density of different motifs predisposing to CNV formation and (iii) a novel common sequence motif present in rare pathogenic CNVs mediating deletions and/or duplications. These three hypotheses were tested by a systematic search for 40 different sequence motifs previously associated with DNA breakage using the Fuzznuc program from EMBOSS (22, 35). For this search, sequences of 150 bp (referred to as breakpoint regions) were used that surrounded the exact breakpoint (Supplementary Material, Table S1). Of the 40 sequence motifs, 24 were represented in the breakpoint regions (Table 2).
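A motif search of this kind can be illustrated with a minimal Python sketch. It is not Fuzznuc and does not claim to reproduce its behavior: the helper names `iupac_to_regex` and `count_motif` are hypothetical, only the given strand is searched, and overlapping hits are counted, which is an assumption of this sketch. The motifs in Table 2 are written in IUPAC nucleotide codes, so each code is expanded into the set of bases it represents before matching.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif such as 'TGRRKM' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def count_motif(region: str, motif: str) -> int:
    """Count motif occurrences in a region, overlapping hits included.

    A zero-width lookahead lets matches that share bases each be counted.
    Only the strand given in `region` is searched (sketch assumption).
    """
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return sum(1 for _ in pattern.finditer(region.upper()))


# Deletion hotspot consensus (TGRRKM) in two toy sequences.
hit = count_motif("TGAAGA", "TGRRKM")
miss = count_motif("TGCAGA", "TGRRKM")
```

Applying `count_motif` to each 150 bp breakpoint region and to each random 150 bp genomic region would give per-region counts of the kind summarized in Table 2.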

Table 2.

Sequence motifs evaluated in breakpoint regions

Motif name Motif sequence  Total occurrence in all breakpoint regions
 
Maximum occurrence in a single breakpoint region
 
Enriched in breakpoint region 
Random genome ( n = 500)   Observed ( n = 76)   Random genome ( n = 500)   Observed ( n = 76)  
Sequences evaluated for their presence in all 76 breakpoint regions 
X-element Escherichia coli GCTGGTGG — — — 
Ade6-M26 ATGACGT — — — 
ARS consensus S. cerevisiae WTTTATRTTTW 11 Del25 (1); Del26 (2) 
ARS consensus Schizosaccharomyces pombe WRTTTATTTAW — — — 
Consensus SAR 1 AATAAAYAAA — — — 
Consensus SAR 2 TTWTWTTWTT 150 18 20 Del6 (1); Del28 (2); Dup3 (2) 
Consensus SAR 3 WADAWAYAWW 158 35 10 Del5 (1); Del8 (1) 
Consensus SAR 4 TWWTDTTWWW 300 36 20 Del6 (1); Del28 (2) 
Deletion hotspot consensus TGRRKM 696 111 — 
DNA polymerase arrest site WGGAG 521 82 — 
DNA polymerase a frameshift hotspot 1 TCCCCC 40 — 
DNA polymerase a frameshift hotspot 2 CTGGCG 12 — — — 
DNA polymerase b frameshift hotspot 1 ACCCWR 179 32 Del4 (1); Del30 (2) 
DNA polymerase a/b frameshift hotspot 1 ACCCCA 41 — 
DNA polymerase a/b frameshift hotspot 2 TGGNGT 173 17 — 
D. topoisomerase 2 consensus  GTNWAYATTNATNNR — — Del8 (1) 
Heptamer recombination signal CACAGTG 21 Del2 (1); Del9 (1); Del17 (1); Del26 (1) 
Human hypervariable minisatellites sequence 1 GGAGGTGGGCAGGARG — — — — — 
Human hypervariable minisatellites sequence 2 AGAGGTGGGCAGGTGG — — — — — 
Human minisatellites core sequence GGGCAGGARG — — — 
Human replication origin consensus WAWTTDDWWWDHWGWHMAWTTDHWGWHMAWTT — — — — — 
Human minisatellites conserved sequence/X-like element GCWGGWGG 26 Del4 (2); Del20 (1/2); Del25 (1) 
Ig heavy chain class switch repeat 1 GAGCT 140 18 — 
Ig heavy chain class switch repeat 2 GGGCT 145 11 Del1 (2) 
Ig heavy chain class switch repeat 3 GGGGT 118 21 — 
Ig heavy chain class switch repeat 4 TGGGG 184 27 — 
Ig heavy chain class switch repeat 5 TGAGC 169 43 — 
LTR-IS motif TGGAAATCCCC — — — — — 
Mariner transposon-like element GAAAATGAAGCTATTTACCCAGGA — — — — — 
Murine MHC recombination hotspot CAGRCAGR 21 Dup8 (1) 
Murine parvovirus recombination hotspot CTWTTY 290 44 Del24 (2); Dup4 (2) 
Nonamer recombination signal ACAAAAACC — — — 
Pur-binding site GGNNGAGGGAGARRRR — — — — — 
Recombination hotspot CCNCCNTNNCCNC — 
Retrotransposon TCATACACCACGCAGGGGTAGAGGACT — — — — — 
Translin-binding site 1 ATGCAG 44 Dup1 (2) 
Translin-binding site 2 GCCCWSSW 53 Del4 (2); Del22 (2) 
Vaccinia topoisomerase I consensus YCCTT 374 34 — 
Vertebrate topoisomerase II consensus RNYNNCNNGYNGKTNYNY — — — 
XY32 homopurine-pyrimidine H-palindrome motif AAGGGAGAARGGGTATAGGGRAAGAGGGAA — — — — — 

 
Motif name Motif sequence Total occurrence in 16 duplication breakpoint regions Maximum occurrence in a single duplication breakpoint region Enriched in breakpoint region 
Random genome ( n = 500)   Observed ( n = 16)   Random genome ( n = 500)   Observed ( n = 16)  

 
New sequence motifs identified in duplication breakpoint regions 
Duplication motif 1  CTSAGYTTTT ( VKSMRHDBB )  1367 53 11 — 
Duplication motif 2  WSCAGGNAYWWTTCC ( WSCAKVTVBHNYKHV )  18 10 Dup1 (2); Dup2 (1 + 2); Dup3 (2); Dup4 (1); Dup5 (1); Dup6 (1/2); Dup7 (2); Dup8 (2) 
Duplication motif 3  ATTTCTYCAGYNYTGGATHT ( RTTYYHYSRSBNHTKGMYHW )  — — Dup1 (2); Dup3 (2); Dup5 (1/2); Dup6 (2); Dup8 (1) 

Most relaxed forms of motifs used to evaluate random genome sequences are in italics. Dup/Del: deletion/duplication; (1): breakpoint 1; (2): breakpoint 2; ‘—’: not present in breakpoint regions/not enriched compared with the ‘average genome’.

For the first hypothesis, an enrichment for a single motif should be present in the breakpoint regions of the CNV. In total, 22 different breakpoint regions (28%) showed an enrichment for a sequence motif compared with the random sampling of the human genome (Table 2). Five of these, Del4 (2), Del6 (1), Del8 (1), Del25 (1) and Del28 (2), showed an enrichment for multiple sequence motifs (Table 2). The most recurring sequence motif in a single breakpoint region was the consensus scaffold attachment region (SAR) 4 motif (TWWTDTTWWW) with a maximum of nine hits, which is significantly different from the average human genome (P = 7.12 × 10⁻⁹). In addition, higher frequencies than expected were detected for the consensus SAR 2 (TTWTWTTWTT; P = 2.80 × 10⁻¹⁰), the consensus SAR 3 (WADAWAYAWW; P = 6.59 × 10⁻⁴), the Ig heavy chain class switch repeat 2 (GGGCT; P = 1.70 × 10⁻⁴), the heptamer recombination signal (CACAGTG; P = 7.10 × 10⁻⁴), the translin-binding site 1 (ATGCAG; P = 1.37 × 10⁻⁴), the translin-binding site 2 (GCCCWSSW; P = 1.52 × 10⁻⁴), the ARS consensus Saccharomyces cerevisiae (WTTTATRTTTW; P = 1.90 × 10⁻⁴), the DNA polymerase b frameshift hotspot 1 (ACCCWR; P = 4.13 × 10⁻⁴), the Drosophila topoisomerase 2 consensus (GTNWAYATTNATNNR; P = 2.99 × 10⁻⁶), the murine MHC recombination hotspot (CAGRCAGR; P = 8.73 × 10⁻⁴), the murine parvovirus recombination hotspot (CTWTTY; P = 6.85 × 10⁻⁴) and the human minisatellites conserved sequence/χ-like element (GCWGGWGG; P = 8.48 × 10⁻⁴).

If an accumulation of (different) motifs plays an important role (hypothesis II), it is expected that the total number of motifs identified in the breakpoint region is higher than that expected in a randomly selected 150 bp region within the human genome. Accordingly, the density of motifs in the 76 regions surrounding the exact breakpoint was determined. For 500 random 150 bp sequences in the genome, a mean of 7.85 motifs is encountered, ranging from 0 to 52 motifs per sequence (data not shown), whereas in the 76 analyzed breakpoint regions, the total number of motifs ranged from 2 to 22, with a mean of 7.65 motifs. Thus, the overall density of sequence motifs in breakpoint regions of rare pathogenic microdeletions and tandem duplications is not significantly different from the average genome sequence (P = 0.77, two-tailed Student's t-test).

To identify a novel common denominator in deletion and/or tandem duplication breakpoints (hypothesis III), all 60 deletion and 16 duplication breakpoint regions were compared with each other. Using Multiple Em for Motif Elicitation (MEME), we uncovered three potential new sequence motifs common in duplication breakpoints (Fig. 3). The first motif, consisting of 10 bases (consensus CTSAGYTTTT), was identified in all 16 tandem duplication breakpoints. The second motif consists of 15 bases (consensus WSCAGGNAYWWTTCC) and was identified in 10 breakpoints. The third motif includes 20 bases (consensus ATTTCTYCAGYNYTGGATHT) and occurred in six tandem duplication breakpoints (Table 2). When comparing the most relaxed forms of the three motifs to the same 500 random genomic regions of 150 bp representing ‘an average genome’, not taking the probability matrix into account, only two of the motifs are significantly enriched in duplication breakpoint regions: WSCAKVTVBHNYKHV for motif 2 (P = 1.33 × 10⁻¹⁰) and RTTYYHYSRSBNHTKGMYHW for motif 3 (P = 4.78 × 10⁻¹¹). No novel sequence motifs were identified in the deletion breakpoint regions.

Figure 3.

Logos for the consensus sequence of the new motifs identified at duplication breakpoints. The x -axis represents the position in the motif, and the height of the nucleotide represents the certainty in the content of that position. The prevalent use of a certain nucleotide over the other nucleotides, e.g. the higher A at the A/G in position 1 in ( B ), indicates the prevalence of using an A over a G on that position. ( A ) Logo for the 15 bp motif enriched in 10 duplication breakpoint regions. ( B ) Logo for the 20 bp motif enriched in six duplication breakpoint regions.

Non-B DNA conformations can predispose breakpoint sequences to DNA breakage

Genomic DNA sequences can adopt a number of conformations alternative to the normal B-conformation, including left-handed Z-DNA, cruciforms, slipped hairpin structures, triplexes and tetraplexes, which are collectively termed non-B DNA conformations. These conformations have been implicated in a number of genomic rearrangements, from which it was concluded that it is not the sequence per se but the conformation of the DNA that triggers the rearrangement (21, 24, 36, 37). To investigate the role of these non-B DNA conformations in the 76 breakpoint sequences, the same 150 bp breakpoint regions were evaluated for their potential to adopt non-B DNA conformations. Such sequences were identified in a total of 17 breakpoint regions (23%), of which 14 were located in deletion breakpoint regions and 3 in tandem duplication breakpoint regions (Table 1). For three breakpoint regions [Del3 (1); Del5 (2); Del30 (2)], multiple conformations could be adopted. Mirror repeats, potentially leading to triplex structures, were identified in two deletion breakpoints [Del3 (1): CAGAAATA; Del19 (1): TTTCTTTA] (Fig. 4A). Direct repeats aligning out of register, potentially leading to slipped hairpin structures, were found in seven deletion breakpoint regions [Del3 (2): CTCTTTCTCT; Del4 (2): AGCCCAGG; Del5 (2): CTGGCCTCC; Del7 (1): CCTGGCCT; Del14 (1): CACTGCTG; Del19 (2): ATGGCTTA and Del30 (2): TCGGTATCAGGCTGGGTT] and two tandem duplication breakpoint regions [Dup1 (2): AATATTGC; Dup7 (1): AGGTAGAG] (Fig. 4B). Oligo(G)n tracts, potentially inducing tetraplex formation, were observed in four deletion breakpoints [Del5 (2): GGCCTCCCAAGGTGCTGGGATTACAGG; Del11 (2): GGCCTGAGCTTGGTTGCGGGGGG; Del18 (2): GGGGTGGGCCTTCTTGG and Del30 (2): GGGTATCGGTATCAGGCTGGG] (Fig. 4C). Inverted repeats, leading to a cruciform, were identified in two deletion breakpoint regions [Del3 (1): GAAATAAT and Del5 (1): TTGATAAA] (Fig. 4D). Left-handed Z-DNA was predicted in two deletion breakpoints [Del6 (2): (TG)n repeat; Del23 (2): GCGTGGGGGCGCATGCT] and one tandem duplication breakpoint [Dup1 (1): (CA)n repeat] (Fig. 4E).
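The repeat classes described above differ only in the symmetry between the two copies of the repeat. As a minimal sketch (not the RepeatAround, QGRS or Z-Hunt algorithms), the hypothetical helper below classifies a pair of same-strand sequences as a direct, mirror or inverted repeat, using the repeat pairs shown in Figure 4 as examples.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat_pair(first, second):
    """Classify two copies read 5'->3' on the same strand by their symmetry."""
    first, second = first.upper(), second.upper()
    kinds = []
    if second == first:
        # Identical copies: may misalign into a slipped hairpin structure.
        kinds.append("direct")
    if second == first[::-1]:
        # Second copy is the first read backwards: may form a triplex.
        kinds.append("mirror")
    if second == first.translate(COMPLEMENT)[::-1]:
        # Second copy is the reverse complement: may extrude a cruciform.
        kinds.append("inverted")
    return kinds


mirror = classify_repeat_pair("TTTCTTTA", "ATTTCTTT")      # Fig. 4A repeat pair
direct = classify_repeat_pair("CTGGCCTCC", "CTGGCCTCC")    # Fig. 4B repeat pair
inverted = classify_repeat_pair("TTGATAAA", "TTTATCAA")    # Fig. 4D repeat pair
```

A list is returned rather than a single label because short palindromic or homopolymeric sequences can satisfy more than one symmetry at once.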

Figure 4.

Predicted non-B DNA structures for breakpoint sequences. The complementary wild-type sequences are shown in red and blue. Bold sequences represent the location of the repeats, and the sequences in shaded yellow boxes indicate the position of the sequenced breakpoint (extending the size of perfect microhomology). ( A ) Triplex in breakpoint 1 of Del19; owing to the mirror repeat TTTCTTTA–ATTTCTTT, an intermolecular DNA triplex can be adopted, in which one of the single strands folds back and forms a triplex structure and the other strand is left unpaired. The location of the breakpoint sequence –AAATAA– is within the unpaired loop. ( B ) Slipped hairpin structure in breakpoint 2 of Del19. Because of the presence of a direct repeat CTGGCCTCC–CTGGCCTCC, misalignment of the repeats can cause the intervening segments to be single-stranded. The breakpoint sequence is located in the unpaired loop. ( C ) Tetraplex structure in breakpoint 2 of Del11. Because of oligo(G) tracts, a quadruplex structure can be formed. The breakpoint is located in the unpaired loop. ( D ) Cruciform in breakpoint 1 of deletion CNV Del5. Because of the presence of inverted repeats TTGATAAA–TTTATCAA, sequences misalign, leaving two single-stranded loops. It is within such a loop that the breakpoint for this deletion CNV is located. ( E ) Left-handed Z -DNA in breakpoint 2 of Del6. The GT-repeat [(YR•YR) n ] can be converted into the Z-form, whereas the flanking sequence remains in its normal B-form. During the B–Z transition, a base pair at each end (T) is extruded from the normal backbone to form the junctions. The breakpoint of the deletion is located in close proximity of this B–Z junction.

For four CNVs, both breakpoint sequences met general sequence requirements to adopt non-B DNA conformations (Del3, Del5, Del19 and Dup1) (Table 1). The observed conformations, however, differ between the individual breakpoints, i.e. Del3 showed a triplex or cruciform and a slipped hairpin structure, Del5 a tetraplex and hairpin structure, Del19 a triplex and a slipped hairpin structure and Dup1 a slipped hairpin structure and left-handed Z-DNA.

To compare these findings to the average genome, the same series of 500 random breakpoint regions was analyzed for their potential to adopt non-B DNA structures. A total of 119 (24%) simulated regions showed this potential, suggesting that pathogenic breakpoint regions are no more capable of adopting such structures than the average genome (P = 0.89). Evaluation of the individual frequencies of the various non-B DNA conformations suggested, however, a trend towards overrepresentation of left-handed Z-DNA (P = 0.08) and cruciform structures (P = 0.14) in the breakpoint regions of pathogenic microdeletions and tandem duplications, although neither reached statistical significance.
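The comparison of 17 of 76 breakpoint regions with 119 of 500 random regions can be illustrated with a two-sided Fisher's exact test on the corresponding 2 × 2 table. The sketch below is a plain standard-library implementation written for this illustration; it is an assumption that it mirrors the authors' procedure, and no multiple-testing correction is applied here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed
    # one; the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Rows: breakpoint regions, random regions.
# Columns: with non-B potential, without non-B potential.
p_value = fisher_exact_two_sided(17, 76 - 17, 119, 500 - 119)
```

With these counts the P value is far above any conventional significance threshold, in line with the conclusion drawn in the text.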

DISCUSSION

With the wide availability of high-resolution arrays targeting the entire human genome, it has become apparent that CNVs contribute to many diseases and play an important role in normal phenotypic variation. Therefore, the major challenge is no longer the identification of CNVs but, instead, the explanation of their formation and interpretation of their biological function. In this study, we demonstrated extensive microhomology at the DNA breakpoints from a set of 38 rare pathogenic microdeletions and tandem duplications, indicating that molecular mechanisms prevailing in mitotic cell division play a major role in germline CNV formation. Our data thereby support the recent hypothesis that rare pathogenic (germline) CNVs may have a mitotic origin (33). In addition, the vast majority of the breakpoints were associated with at least one of the local genomic architectural features studied, suggestive of a strong involvement of such features in determining the breakpoint location of rare pathogenic CNVs.

Potential molecular mechanisms involved in the formation of rare pathogenic microdeletions and tandem duplications

When speculating on molecular mechanisms involved in the formation of CNVs, it is important to distinguish between (i) meiotic recombination processes such as the homology-dependent NAHR and the homology-independent classical NHEJ, and (ii) mitotic processes including classical NHEJ and NHEJ mediated by microhomology (alt-NHEJ or MMEJ) and replication-based mechanisms such as FoSTeS and microhomology-mediated break-induced replication (MMBIR). To attribute a specific mechanism to the formation of a CNV, detailed breakpoint analysis at the base level is needed, as the 'molecular fingerprint' of the breakpoint may help to discriminate between the mechanisms (Supplementary Material, Fig. S3). In the context of discriminating between different mechanisms, it is noteworthy that the number of DNA breaks necessary to form the CNV can differ between mechanisms, i.e. two DNA double-strand breaks are generally needed for NAHR and NHEJ, whereas replication-based mechanisms such as FoSTeS and MMBIR require a single DNA break. In addition, different molecular mechanisms involved in the formation of rare pathogenic CNVs may act over different size ranges, as was suggested from studying CNVs in the normal population (26, 38). In this set of 38 rare CNVs, there does not seem to be a correlation between the potential mechanism used and the size of the CNV.

When looking at the breakpoint junctions of the 38 rare pathogenic CNVs included in this study, diverse forms of NHEJ, including classical NHEJ and alt-NHEJ, could potentially explain the formation of 27 CNVs, including 19 deletions and eight duplications. In five of these CNVs, we found inserted bases at the breakpoint junction, representing a molecular scar, a phenomenon only known for classical NHEJ (19). The remaining 11 CNVs (Del20–Del30) could have resulted from homology-dependent mechanisms. NAHR using repetitive elements, such as Alu–Alu-mediated NAHR, rather than LCRs, has been reported previously in pathogenic CNVs as well as in small common structural variations in healthy individuals (19, 39). Of these 11 CNVs potentially resulting from NAHR, nine deletion CNVs could have resulted from Alu–Alu-mediated events (Del20–Del28), whereas one deletion CNV, Del29, likely represents a LINE1–LINE1-mediated NAHR. Recently, retrotransposition involving L1 elements has been proposed to specifically mediate normal structural variation (38). Finally, although LCR-mediated CNVs were excluded prior to inclusion in this study, Del30 seems to meet the criteria for LCR-mediated NAHR, with both breakpoints located in highly homologous sequences (97%) larger than 1 kb (17). Retrospective analysis indicated that the initial 'low-resolution' genome-wide microarray predicted the breakpoints ∼50 kb away from the actual breakpoints determined by breakpoint junction sequencing. In addition, BLAST2 analysis of the breakpoints showed that the LCRs were only 7 kb in size. Taken together, these two observations explain why this deletion was included in the study.

The concept that congenital pathogenic CNVs are all of meiotic origin has been challenged recently by the discovery of the first human replication-based mechanism, i.e. FoSTeS, involved in the formation of duplication CNVs causing Pelizaeus–Merzbacher disease (PLP1 duplication), as well as the description of a model for CNV formation by replication stress (28, 32, 33). Similar to alt-NHEJ, FoSTeS uses microhomology, rather than the larger blocks of DNA sequence homology known from NAHR. These sites of microhomology are used to invade a new strand for replication after stalling or collapse of the first replication fork. In addition to duplications of the PLP1 gene, duplications of MECP2 on the X chromosome have been proposed to arise by a two-step mechanism using microhomology and break-induced replication (27, 40). The FoSTeS mechanism has been generalized further in a replicative template-switch model, MMBIR, which is characterized by microhomology. In the presence of microhomology, MMBIR can, in addition to deletion/duplication CNVs, potentially also explain inversion, translocation and triplication events (29–32). In our series of 38 rare pathogenic microdeletions and tandem duplications, the presence of two to six bases of microhomology at the breakpoints in 19 CNVs supports the notion that these CNVs utilized microhomology in their formation and could thus potentially result from FoSTeS/MMBIR. Template switching in a 'forward' direction could explain the formation of 13 deletion CNVs, whereas template switching in a 'backward' direction could explain the formation of six tandem duplications. In four additional CNVs (Dup1, Dup2, Del5 and Del6), one base of microhomology was observed at the breakpoint junction (Table 1). Although this could also be observed by chance, one base of microhomology could still be sufficient to serve as a priming location to invade the second replication fork. Multiple consecutive FoSTeS (e.g. FoSTeS×2) events could potentially explain more complex deletion/duplication rearrangements, such as triplications and discontinuous deletion and duplication events (28, 32). For one deletion CNV (Del10) and three tandem duplication CNVs (Dup1, Dup2 and Dup4), the rearrangements could have occurred through two FoSTeS events (FoSTeS×2) as, in addition to the microhomology, the breakpoint junctions showed the presence of deletions and/or insertions of several bases (Supplementary Material, Fig. S1). Since these inserted bases also occur in the vicinity of the breakpoint (including the microhomology site), this could indicate that other mechanisms, such as slipped strand synthesis (SSA; Del10), have been involved in the formation of these pathogenic CNVs.
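Microhomology at a junction is the stretch of bases that is identical at both breakpoints, so that the junction cannot be assigned to a unique position. The sketch below shows one way to count it for a simple deletion within a single reference sequence; the function name, the coordinate convention (0-based, half-open) and the toy sequence are assumptions made for illustration and do not reproduce the BLAST2-based analysis used in this study.

```python
def deletion_microhomology(ref, start, end):
    """Number of bases of microhomology for a deletion removing ref[start:end].

    The deletion can be shifted left while the base preceding each breakpoint
    is identical, and shifted right while the base following each breakpoint
    is identical; the sum of both shifts is the microhomology length.
    """
    span = end - start
    left = 0
    while (left < span and start - 1 - left >= 0
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    right = 0
    while (right < span and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    return left + right


# Hypothetical reference: flank, TCAG, deleted core, TCAG, flank.
toy_ref = "ACGTC" + "TCAG" + "GGATCCAA" + "TCAG" + "CTTGA"
# The same deletion reported at its right-most and left-most positions.
mh_right_aligned = deletion_microhomology(toy_ref, 9, 21)
mh_left_aligned = deletion_microhomology(toy_ref, 5, 17)
```

Both descriptions of the toy deletion yield the same junction sequence and therefore the same microhomology length, which is why the count is independent of how the breakpoint was initially positioned.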

We propose that the nine Alu–Alu-mediated deletions could potentially have resulted from FoSTeS/MMBIR. This hypothesis is supported by the analyses of the extent of sequence homology between the Alu elements coinciding with the breakpoints. On average, Alu sequences are 83% identical, which seems insufficient to mediate NAHR, usually occurring through sequences with sequence similarity of >95–97% (Supplementary Material, Fig. S2) (15, 17, 19). In addition, all nine breakpoint junctions showed microhomologies, ranging from 8 to 30 bp. As such, we argue that replication-based mechanisms like FoSTeS/MMBIR, rather than NAHR, could be involved in the formation of these deletions, in which case the microhomology within these Alu sequences served as a priming site in a second replication fork.

In conclusion, the overlap in molecular fingerprint between diverse forms of replication-based mechanisms, such as the presence of microhomology in both alt-NHEJ and FoSTeS/MMBIR, makes it difficult to distinguish between these mechanisms for the formation of rare pathogenic CNVs (Supplementary Material, Fig. S3). We have, however, provided evidence suggesting that classical homology-dependent recombination mechanisms did not mediate these CNVs, and thus, most likely excluded a meiotic origin.

Local genomic architecture potentially leading to genomic instability

To date, it is unknown to what extent the local genomic architecture stimulates the formation of rare pathogenic CNVs. Although functional effects have not been proven in this study, the results of our analysis of breakpoint regions indicate that local genomic architecture can play a significant role in the formation of rare pathogenic microdeletions and tandem duplications. Repetitive elements, including LINEs, SINEs, LTRs and DNA simple repeats, have been implicated in chromosomal aberrations by increasing genome instability in certain regions ( 36 ). In meiosis, they are known to represent stimulating or mediating elements in NAHR and NHEJ recombination mechanisms ( 15 , 19 ). For 30 of 38 (79%) CNVs sequenced at the base pair level, at least one of the breaks occurred in a repetitive element. For deletion breakpoints, more repetitive elements were observed than expected on the basis of ‘random genome sequence’. The most frequently occurring repetitive element at CNV breakpoints included different Alu sequences. When considering a mitotic event with template switching, such repetitive elements may represent more difficult sequences to replicate, with increasing chances for fork stalling or collapse of replication forks.

Different sequence motifs have been found to be associated with recurrent chromosomal rearrangements (22, 34, 35, 41). We have characterized a subset of 40 sequence motifs for their frequency in our series of 76 rare pathogenic breakpoint regions and compared them to their normal frequency in the average genome. In total, we found 13 different sequence motifs significantly enriched in rare breakpoint regions. Of these, eight sequence motifs require the presence of an additional sequence motif and/or need to be present in more consecutive copies than observed in the breakpoint regions to be functional. The remaining five sequence motifs found to be enriched in breakpoint regions could increase susceptibility to DNA breakage in meiosis and/or mitosis or, alternatively, stall replication in mitosis. The consensus SARs 2, 3 and 4 are significantly enriched in five breakpoint regions (Dup4, Del5, Del6, Del8 and Del28). SARs are AT-rich sequences which increase the DNA strand separation potential (42). This AT richness elevates the presence of single-stranded DNA which, in turn, may be much more sensitive to DNA breakage during meiosis. Alternatively, in mitosis, such sequences may be more sensitive to FoSTeS because of a collapsed fork after replication through a nick and generation of a single double-strand end (32). Enrichment for translin-binding sites 1 and 2 has been found in three breakpoint regions (Dup1, Del4 and Del22). Translin-binding sites bind the conserved protein Translin and have been associated with breakpoint junctions of recurrent chromosomal translocations observed in some human cancers (43–45). Translin is proposed to function as a protector of broken DNA ends (45), but has been excluded recently as a primary factor in regulating genome stability and/or segregation (46). The fact that the sequence motif is observed in recurrent chromosomal aberrations, however, suggests that the motif is more sensitive to breakage than other regions in the genome.

In this study, three new motifs have been observed in tandem duplication breakpoint regions, of which two are specifically enriched in breakpoint regions of these duplications. The most relaxed forms of these two motifs do not appear frequently in random genome sequences. Even when the number of random sequences is substantially increased to 5000, the motifs remain significantly enriched (data not shown). Although currently there is no overt reason why deletion and tandem duplication CNVs should differ, these data suggest that these motifs are highly specific to tandem duplication breakpoints. To uncover a function of these motifs in the binding of proteins or a role in histone/genome stability, diverse database searches were performed, but these proved unsuccessful owing to the ambiguity codes in the motifs (data not shown). In addition, attempts to annotate the genomic locations of all possible motifs, in order to identify potential overlap with known duplication events, were unsuccessful. Thus, the potential role of these novel motifs and how they stimulate CNV formation remain elusive.

For many years, it has been known that DNA structures other than B-DNA conformations, including left-handed Z-DNA, cruciforms and slipped (hairpin) structures, predispose DNA to breakage at specific locations within the genome (37). Examples of these include large inverted repeats leading to cruciform formation in the most common non-Robertsonian recurrent translocation t(11;22) in humans, and silencing of the FXN gene in patients with Friedreich's ataxia by triplex structures (47–49). In general, the stretches of sequences leading to such conformations are several hundreds of base pairs in length, thereby contributing to the recurrent nature of these rearrangements. For 11 deletion and two tandem duplication CNVs (34%), we observed sequences that are capable of adopting non-B DNA structures for at least one of the breakpoints. However, the reported sequences capable of adopting non-B DNA structures are far longer than what we observed in our sequenced breakpoints. The longest fragment capable of adopting non-B DNA was a 23 bp sequence potentially leading to tetraplex conformation (Del11). Although shorter in length and not significantly enriched in the observed breakpoint regions, these individual structures could still lead to local instability prone to DNA breakage and, moreover, may contribute to the non-recurrent nature of the CNVs in this study. Left-handed Z-DNA could potentially have been involved in three breakpoints (Dup1, Dup2 and Del6). In two of these (Dup1 and Del6), the breakpoint occurred 14 bases from the B–Z junction. With an approximate estimation of 10 nucleotides involved in a single B-loop, the breakpoint mapping 14 bases from the B–Z junction could therefore have resulted from this local instability. Mechanistically, non-B DNA conformations could explain stalling of replication forks, thereby supporting a hypothesis for non-B DNA conformations in microhomology-mediated mitotic mechanisms such as alt-NHEJ, FoSTeS or MMBIR.

In conclusion, we have analyzed 76 breakpoint regions for the presence of various genomic architectural features. The analyses included the evaluation of repetitive elements, enrichment of sequence motifs associated with DNA breakage and the potential to adopt non-B DNA structures. For our series of 76 breakpoint regions, 62 (81%) were associated with at least one genomic architectural feature, of which 21 (28%) showed two features and two (3%) showed all three features examined in this study. Of the remaining 14 breakpoint regions without known genomic architectural features, 6 breakpoint regions (8%) contained at least one of the novel sequence motifs. This leaves only 8 breakpoint regions (11%) without an association between the breakpoint and the local genomic architecture studied here.

Summary

Recently, replication-based mechanisms have been suggested for the etiology of human pathogenic CNVs. Here, we have analyzed the etiology of 38 rare pathogenic microdeletions and tandem duplications by analyses of the breakpoint junctions and local genomic architectural features to determine the contribution of such mechanisms in rare, non-recurrent, pathogenic CNVs. Our data suggest that rare pathogenic CNVs do not seem to occur in random genomic sequences, but may favor locations with a high content of specific architectural features. Moreover, the presence of (micro)homology in 79% of the breakpoint junctions argues that (replication-based) microhomology-mediated repair mechanisms, including alt-NHEJ and FoSTeS/MMBIR, prevail in rare pathogenic CNVs. Future research on the molecular mechanistic spectrum of CNV formation and the role of genomic architecture will provide valuable information on genome plasticity in general and its role in health and disease.

MATERIALS AND METHODS

Patient material

A total of 38 patient-derived, anonymized DNA samples were included in this study on the basis of the presence of a (micro)deletion or (micro)duplication associated with a disease [multiple congenital anomalies (MCA), MR with or without MCA, epilepsy or autism]. Of all CNVs, 37 were identified in a routine diagnostic setting at the Department of Human Genetics of the Radboud University Nijmegen Medical Centre and the Department of Molecular and Human Genetics of Baylor College of Medicine: 5 by 32K BAC array analysis, 10 by 250K Nsp 1 SNP arrays (Affymetrix, Santa Clara, CA, USA) and 22 by the clinical targeted array CMA V4, V6.0 or V6.1 BAC-based arrays or CMA V6.3, V6.5, V7.1 or V7.2 oligo arrays (1, 9, 50). One CNV was detected by a commercial company using a genome-wide BAC-based SignatureChipWG array (Signature Genomics Laboratories, Spokane, WA, USA). Prior to this study, CNVs were validated using different molecular approaches, including fluorescence in situ hybridization, PCR analyses, multiplex ligation-dependent probe amplification or alternative microarray platforms. Additionally, 19 CNVs were proven to be de novo and 9 inherited, whereas for the remaining 10 CNVs, this could not be determined because of the absence of at least one of the parental DNA samples. For all CNVs, the presence of large LCRs mediating the deletion or duplication was excluded, thereby focusing on CNVs caused by molecular mechanisms and genomic structures other than LCR-mediated NAHR. Also, on the basis of the initial array analyses, all deletions and duplications were considered to be simple rearrangements, involving two breakpoints.

Ultra-high-resolution array CGH

For detailed breakpoint mapping, high-resolution NimbleGen arrays were used, including 385K and 2.1M whole-genome arrays and two custom-designed 385K ultra-high-resolution arrays (Roche NimbleGen Systems, Madison, WI, USA). The average probe spacing of the 385K HG18_WG_CGH_v1 array was 6 kb, whereas that of the 2.1M HG18_WG_CGH_v1 array was 1.5 kb. The two custom arrays were designed to cover breakpoint regions on the basis of the initial array analyses, with one probe per 65 bp for the first and one probe per 24 bp for the second array. Array hybridization, post-hybridization washes and scanning were performed according to the manufacturer's instructions (Roche NimbleGen Systems). The acquired images were analyzed using NimbleScan V2.4 extraction software (Roche NimbleGen Systems). For each probe on the array, the log2 Cy3/Cy5 ratio as well as the log2 Cy5/Cy3 ratio was calculated using the SegMNT algorithm. The relative intensity of the patient DNA versus the reference DNA was indicated on a log2 scale. A positive result (gain or loss) was called when the oligo probes covering a genomic segment showed an average difference of 0.2 on the log2 scale from the reference normal DNA. Data were visualized using SignalMap V1.9 software (Roche NimbleGen Systems). Averaging windows (×5 and ×10) were used for breakpoint determination, after which junction fragments were obtained by PCR analysis.
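The principle of calling gains and losses from probe intensities can be sketched as follows. This is not the SegMNT algorithm or the NimbleScan pipeline; the function names, the non-overlapping windowing scheme, the symmetric use of the 0.2 threshold and the toy intensities are assumptions made for illustration only.

```python
from math import log2


def log2_ratios(test_intensity, ref_intensity):
    """Per-probe log2 ratio of patient (test) over reference signal."""
    return [log2(t / r) for t, r in zip(test_intensity, ref_intensity)]


def window_average(values, size):
    """Mean of consecutive, non-overlapping windows of `size` probes."""
    windows = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(w) / len(w) for w in windows]


def call(mean_log2, threshold=0.2):
    """Label a window; the symmetric threshold is an assumption."""
    if mean_log2 >= threshold:
        return "gain"
    if mean_log2 <= -threshold:
        return "loss"
    return "normal"


# Hypothetical intensities: the last two probes lie in a heterozygous deletion.
patient = [100, 100, 50, 50]
reference = [100, 100, 100, 100]
ratios = log2_ratios(patient, reference)
calls = [call(m) for m in window_average(ratios, 2)]
```

Larger averaging windows reduce probe-level noise at the cost of positional resolution, which is the trade-off behind using both ×5 and ×10 windows for breakpoint determination.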

Junction PCR

For all CNVs, PCR primers were designed using both proximal and distal 1.5–15 kb breakpoint flanking regions determined by the array analyses ( Supplementary Material, Fig. S1 ). Genomic sequences were obtained from the UCSC genome browser, build 36.1 (hg18). Several PCR primer combinations (outward facing for duplications and inward facing for deletions) spanning the breakpoint junction were tested and optimized until a unique junction fragment was generated. To validate the uniqueness of the junction product, i.e. only present in patient DNA, control DNA from healthy individuals was used. Breakpoint spanning PCR products of <5 kb were generated using standard protocols, optimized for annealing and extension temperatures (Finnzymes). Alternatively, for breakpoint spanning PCR products expected to exceed 5 kb, long-range PCR protocols were used according to manufacturer's instructions (Takara Bio Inc., Japan). Gel electrophoresis was performed to visualize the junction fragments, and unique junction fragments were purified using QIAquick PCR purification kit (QIAgen). Subsequently, the fragments were bidirectionally sequenced using in-house 3100 or 3730 DNA Analyzers (Applied Biosystems), or by using 3730 DNA Analyzers of Lone Star Labs and SeqWright DNA Technology Service (Houston, TX, USA). Individual primer sequences and PCR programs are available upon request.

Bioinformatic analyses of junction fragments

The genomic sequences of the junction fragments were assembled using the Sequencher software (Gene Codes Corporation, Ann Arbor, MI, USA) or Vector NTI Advanced V10 (www.invitrogen.com). After assembly, breakpoints (exact physical genomic breakpoint) and breakpoint regions (150 bp stretches surrounding the breakpoint) were further analyzed using the following online tools: (i) BLAST2 for the identification of (near-perfect) sequence homology between the normal proximal and normal distal breakpoint sequences; (ii) RepeatMasker to uncover known repetitive elements at the breakpoints; (iii) Fuzznuc analyses to identify sequence motifs previously associated with chromosomal aberrations; (iv) MEME Suite for the identification of novel motifs; and (v) Z-Hunt online, RepeatAround and QGRS for the evaluation of non-B DNA conformations. As it is currently not clear whether sequence motifs are required at the breakpoint or in close proximity to the breakpoint to trigger CNV formation, Fuzznuc, MEME Suite and the programs for non-B DNA structures were run using default settings and applied to 150 bp of genomic sequence, extending 75 bp both proximal and distal to the breakpoint (referred to as 'breakpoint region') (Supplementary Material, Table S1). For the identification of sequence motifs, both '+' and '−' (complementary) strands were evaluated. As a criterion for the evaluation of non-B DNA conformations, both counterparts of the repeat needed to flank the breakpoint.
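The definition of a breakpoint region and the flanking criterion for repeats can be made concrete with a short sketch. The helper names and the toy sequence are hypothetical, and the reading of 'flank the breakpoint' as one repeat copy lying entirely on each side of the break is an assumption.

```python
def breakpoint_region(chrom_seq, breakpoint, flank=75):
    """Return up to 2 * flank bases centered on a break before index `breakpoint`."""
    start = max(0, breakpoint - flank)
    return chrom_seq[start:breakpoint + flank]


def repeats_flank_breakpoint(region, first, second, break_index):
    """True if `first` lies entirely before and `second` entirely after the break."""
    left = region[:break_index].upper()
    right = region[break_index:].upper()
    return first.upper() in left and second.upper() in right


# Hypothetical sequence with the inverted repeat from Fig. 4D around a short loop.
toy_seq = "A" * 100 + "TTGATAAA" + "CCAAATAACC" + "TTTATCAA" + "G" * 100
region = breakpoint_region(toy_seq, 113)  # break inside the loop
flanked = repeats_flank_breakpoint(region, "TTGATAAA", "TTTATCAA", 75)
```

Because the region is centered on the break, the break always falls between index 74 and index 75 of the extracted 150 bp window, provided the breakpoint lies at least 75 bp from either end of the sequence.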

Findings for the breakpoint sequences were compared with the remainder of the genome by randomly sampling 500 genomic sequences of 150 bp (Supplementary Material, Table S2). These sequences were obtained from hg18 by randomly selecting an autosomal chromosome and then a location on the selected chromosome. Centromeres and gaps in the sequence alignment were excluded. These 150 bp sequences were evaluated for the same genomic architectural features as the breakpoint (regions) derived from the rare pathogenic CNVs. The genomic position of base 75 in each randomly sampled sequence was evaluated for its location within repetitive elements. For evaluating the presence and extent of (micro)homology as well as the potential to adopt non-B DNA conformations at 'random breakpoints', a break was simulated between base 75 and base 76 within this sequence. Statistical significance between these randomly sampled sequences and the breakpoint (regions) was tested using Fisher's exact test with Bonferroni correction.
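The two-step sampling described above (first a chromosome, then a position on it) can be sketched as below. The chromosome lengths and excluded intervals in the example are hypothetical placeholders rather than hg18 values, and the rejection of any region overlapping an excluded interval is an assumed reading of how centromeres and gaps were handled.

```python
import random


def sample_regions(chrom_lengths, excluded, n=500, size=150, seed=0):
    """Sample n regions of `size` bp as (chrom, start, end), 0-based half-open."""
    rng = random.Random(seed)
    chroms = sorted(chrom_lengths)
    regions = []
    while len(regions) < n:
        # Step 1: pick a chromosome; step 2: pick a start position on it.
        chrom = rng.choice(chroms)
        start = rng.randrange(0, chrom_lengths[chrom] - size + 1)
        end = start + size
        # Reject regions overlapping a centromere or assembly gap.
        if any(s < end and start < e for s, e in excluded.get(chrom, [])):
            continue
        regions.append((chrom, start, end))
    return regions


# Hypothetical toy genome with one excluded interval.
toy_lengths = {"chr1": 10000, "chr2": 5000}
toy_excluded = {"chr1": [(4000, 6000)]}
sampled = sample_regions(toy_lengths, toy_excluded, n=50)
```

Note that choosing the chromosome uniformly, as sketched here, samples short chromosomes more densely than long ones; weighting by chromosome length would be the alternative if uniform coverage of the genome were intended.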

ONLINE RESOURCES

BLAST2, http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

Fuzznuc, http://emboss.bioinformatics.nl/cgi-bin/emboss/fuzznuc

QGRS mapper, http://bioinformatics.ramapo.edu/QGRS/analyze.php

RepeatAround, http://portugene.com/repeataround.html

RepeatMasker, http://www.repeatmasker.org/

The MEME Suite, http://meme.nbcr.net/meme4/intro.html

UCSC genome browser, http://genome.ucsc.edu/

Z-hunt online, http://gac-web.cgrb.oregonstate.edu/zDNA/

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

This work was supported by grants from the Netherlands Organisation for Health Research and Development (ZonMW 916.86.016 to L.E.L.M.V., ZonMW 917.66.363 to J.A.V., ZonMW 917.86.319 to B.B.A.d.V.), grants from the AnEUploidy project (LSHG-CT-2006-037627 to A.H., B.B.A.d.V. and J.A.V.) supported by the European Commission under FP6 and grants from the Polish Ministry of Science and Higher Education (R13-0005-04/2008 to P.S.).

ACKNOWLEDGEMENTS

We gratefully acknowledge Hanka Venselaar and Luminita Moruz for technical assistance, Jayne Hehir-Kwa, Terry Vrijenhoek and Diederik de Bruijn for useful discussions, the clinicians for collecting patient DNAs, and James R. Lupski and Philip J. Hastings for critical reviews of this manuscript.

Conflict of Interest statement. None declared.


Author notes

The authors wish it to be known that, in their opinion, the last two authors should be regarded as joint Last Authors.