Mitchell J Machiela, Lea Jessop, Weiyin Zhou, Meredith Yeager, Stephen J Chanock, Characterization of breakpoint regions of large structural autosomal mosaic events, Human Molecular Genetics, Volume 26, Issue 22, 15 November 2017, Pages 4388–4394, https://doi.org/10.1093/hmg/ddx324
Abstract
Recent studies have reported a higher than anticipated frequency of large clonal autosomal mosaic events >2 Mb in size in the aging population. Mosaic events are detected from analyses of intensity parameters of linear stretches with deviations in heterozygous probes of single nucleotide polymorphism microarrays. The non-random distribution of detected mosaic events throughout the genome suggests common mechanisms could influence the formation of mosaic events. Here we use publicly available data tracks from the University of California Santa Cruz Genome Browser to investigate the genomic characteristics of the regions at the terminal ends of two frequent types of large structural mosaic events: telomeric neutral events and interstitial losses. We observed breakpoints are more likely to occur in regions enriched for open chromatin, increased gene density, elevated meiotic recombination rates and in the proximity of repetitive elements. These observations suggest that detected mosaic event breakpoints are preferentially recovered in genomic regions that are observed to be active and thus more accessible to environmental exposures and events related to gene transcription. We propose that errors in DNA repair pathways, such as non-homologous end joining and homologous recombination, may be important cellular mechanisms that lead to the formation of large structural mosaic events such as interstitial losses and copy neutral events that include telomeres. Further studies using next generation sequencing technologies should be instrumental in mapping the specific junctions of mosaic events to the nucleotide and provide insights into the molecular mechanisms responsible for clonal somatic structural events.
Introduction
Human clonal mosaicism is the presence of more than one diploid genotype in a monozygotic individual (1). Clonal mosaic events are somatically acquired post-zygotic mutations and can range in size from single point mutations to copy number changes that span an entire chromosome (2–5). Three criteria are necessary for clonal mosaicism to exist (6). First, an initiating event generates a mutation or copy number change. Second, the event must be compatible with cellular survival and avoid correction by one or more DNA damage repair pathways, through a phenomenon known as reversion (7). Third, the event is passed on to daughter cells and confers a growth advantage relative to normal cells, even through a few divisions. In this regard, the aberrant cells can be clonally selected and most likely, over time, reach a detectable cellular fraction of the affected tissue type. Still, the clonal selection can be reversed under conditions favoring reversion. The resulting phenotype of currently detectable mosaic clones likely depends upon the developmental timing of the mutation, cell lineage affected, genomic location (including affected genes), and relative percentage of the aberrant cellular subpopulation (8). Phenotypic manifestations range from apparently normal phenotypes, to mild, localized disorders such as nevi and alterations in skin pigmentation, to systemic and life-threatening disorders including Proteus syndrome and cancer, which can demonstrate a complex relationship with aneuploidy (9,10).
Neither the timing nor mechanisms related to the initial formation of large structural clonal mosaic events are well understood. Existing evidence suggests these events occur early in development when cells are dividing rapidly and the generation of new mutations associated with mitotic errors is more likely (11,12). A recent study of cultured embryonic stem cell lines suggests that a higher than anticipated fraction rapidly develop TP53 mutations in a set of subclones within a few passages (13,14). Similarly, cell line passage has been associated with selection of large structural events (15), perhaps, like the TP53 mutated subclones, due to a proliferation advantage. Such early mutations can remain at low cellular proportions for years or even decades until the local cellular environment favors the clonal expansion of such cells, through a process that inefficiently recognizes and removes cell populations with aberrant genotypes. Alternatively, evidence suggests mosaic alterations can occur later in life as a result of the ageing process or damage due to environmental exposures, such as tobacco smoke (16,17). While circumstantial evidence is available that supports both hypotheses, limited data is available to link the formation of large structural mosaicism to either mechanism conclusively.
Existing population studies of large structural clonal mosaicism suggest mosaic events are non-randomly distributed across the genome (4,5,18–22). These observations are based on bioinformatics analyses of intensity signals generated by commercial single nucleotide polymorphism (SNP) microarray chips (23). Alterations of specific genomic regions, which harbor genes important for growth and development (e.g. 13q14 or 20q) may confer selective advantages when the copy number is altered (24,25). Additionally, the non-random distribution suggests there may be genomic regions more susceptible to the formation of clonal mosaic events. The aim of this investigation is to analyze the molecular features surrounding mosaic event breakpoint regions defined by SNP arrays in an effort to find common molecular features that cluster in breakpoint regions. Knowledge of common molecular features in breakpoint regions could direct future functional studies that seek to better understand developmental timeframe and initiating mechanisms of large clonal mosaic events.
Results
We conducted a breakpoint analysis of two major classes of detectable clonal mosaicism: interstitial losses (N = 688) and telomeric neutral events (N = 543) using current bioinformatic resources (Fig. 1). Analysis of the breakpoint regions indicates substantial clustering of breakpoints in select regions of the genome (Fig. 2). In comparison to random samplings (N = 500) of similar-sized regions on other autosomes, we observed a statistically significant enrichment (P < 0.05) for several data features (Fig. 3).

Figure 1. An example chromosome depicting a telomeric event (A) and an interstitial event (B) with breakpoint regions highlighted in red. Each telomeric event has one defined breakpoint region while each interstitial event has two breakpoint regions.

Figure 2. Circos plots of the observed genomic distribution of breakpoint regions for (A) interstitial losses (N = 688) and (B) copy neutral loss of heterozygosity (N = 543).

Figure 3. Breakpoint analysis for investigated UCSC and ENCODE data tracks. Red and blue distributions represent means from 1000 permutations of random interstitial loss and telomeric copy-neutral breakpoints, respectively. Y-axes denote density. Boxes and error bars represent the mean and 95% confidence interval for interstitial losses (red) and neutral events extending into telomeres (blue). Dotted lines are the two-sided permutation P-value cutoffs for P < 0.05.
Measures of open chromatin
There was a significant elevation in mean predicted hydroxyl radical cleavage intensity for both interstitial losses and telomeric neutral breakpoint regions when compared to permutations of random regions (P < 0.001). This suggests breakpoints are in regions with higher solvent-accessible surface area. We specifically looked at the likely cells of origin for these events in the merged tracks of lymphoblastoid cell line-derived DNase I hypersensitivity peaks. Both interstitial loss and telomeric neutral breakpoints showed P-values for DNase I enrichment of less than 0.001. Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE)-seq peaks also showed significant enrichment in breakpoint regions. Telomeric neutral breakpoints demonstrated a higher enrichment for FAIRE-seq peaks than interstitial losses (P-values = 0.001 and 0.022, respectively).
Gene-rich regions
A count of RefSeq genes spanning breakpoint regions indicated that telomeric neutral breakpoints were significantly enriched for RefSeq genes compared to random permuted regions (P-value < 0.001). Additionally, counts of CpG islands were also found to be elevated for telomeric neutral breakpoint regions (P-value < 0.001). There was no evidence to indicate enrichment for either RefSeq genes or CpG islands in interstitial loss breakpoint regions (P-values = 0.57 and 0.34, respectively).
Meiotic recombination rate
The mean recombination rate for genomic segments spanning breakpoint regions was significantly elevated when comparing breakpoint regions for mosaic losses or copy-neutral events to randomly selected breakpoint regions (P-values < 0.001). The mean recombination rates for interstitial losses and telomeric neutral events were 1.02 and 1.03, respectively, which were significantly greater than the permutation mean recombination values of 0.84 and 0.85, respectively (P-values < 0.001).
Repeat regions
In the vicinity of both interstitial loss and telomeric neutral breakpoints, short interspersed nuclear elements (SINEs) were observed to be more common than in random breakpoint regions (P-values < 0.001). Interstitial losses were also enriched for long interspersed nuclear elements (LINEs) (P-value < 0.001). Only telomeric neutral breakpoints indicated evidence for significantly fewer long terminal repeats (LTRs) (P-value = 0.002), but showed enrichment for segmental duplications (P-value < 0.001).
Discussion
In an effort to investigate genomic features that could be related to one or more initiating mechanisms, breakpoint analyses were carried out on 1231 estimated breakpoint regions for the two most common classes of mosaic events: interstitial losses and telomeric copy-neutral events. We found evidence suggesting significant clustering of breakpoint regions as well as enrichment for open chromatin, genes, meiotic recombination hotspots and select repetitive elements in breakpoint regions. These findings suggest a common molecular footprint at breakpoint regions and hint at a set of common mechanisms that could be driving DNA double strand breaks and subsequent structural mosaic event formation.
DNA double strand breaks (DSBs) can occur due to a variety of events. Naturally occurring DSBs are part of V(D)J recombination responsible for the diversity of antigen recognition sites on B and T cells, immunoglobulin class-switch recombination, and meiosis. DSBs are also produced in the cell during repair of stalled replication forks, by reactive oxygen species that are produced during normal cellular respiration, and by topoisomerases that relax torsional stress on DNA (26,27). Environmental sources, such as ionizing radiation and radiomimetic drugs, also create DSBs (28). Left unrepaired, DSBs can lead to cell death or chromosomal instability [for a recent review of DSB repair see (29,30)].
There are two primary mechanisms by which cells repair DSBs: non-homologous end joining (NHEJ) and homologous recombination (HR). NHEJ joins two broken ends of DNA with little processing of the ends, producing short indels; hence this pathway is considered to be error prone. Under certain conditions, more extensive degradation of the ends may occur prior to re-ligation, generating larger gaps. When this occurs, regions of microhomology could be instrumental in bringing the broken DNA ends together (31,32). This is sometimes referred to as alt-NHEJ and would produce breakpoints that would appear as the mosaic interstitial losses we detect. Exactly how such a large event could arise remains unclear.
By comparison, HR is considered a more precise DSB repair pathway than NHEJ. During HR, the DSB is resected to form single stranded DNA that pairs with homologous sequences and primes repair synthesis to fill in any genetic information that was lost during break formation and end resection. During the S and G2 phases of the cell cycle, a sister chromatid is present which would allow error-free repair of the break. When a sister chromatid is not used for repair, any homologous sequence could be used. Such repair could introduce short stretches of LOH (<2 Mb), which our method would likely not detect, or more severe events that we would detect as interstitial losses and telomeric neutral events (33).
In a comparison with randomly sampled regions across the genome, interstitial loss and telomeric copy neutral events had breakpoint regions with higher mean predicted hydroxyl radical cleavage intensity scores and significant enrichment for DNase I hypersensitivity and FAIRE-seq peaks (P-values < 0.001). These markers most likely indicate open chromatin by measuring DNA backbone solvent accessibility, cleavage by the DNase I protein and the absence of formaldehyde cross-linking. This evidence suggests mosaic breakpoint regions preferentially occur in genomic regions with open chromatin, which are more susceptible to environmental mutagens such as radiation exposure and genotoxic chemicals (34–36), leading to an increased likelihood of either DNA adducts or development of DSBs. Repair of these lesions by NHEJ or HR could lead to structural genomic events that form mosaic copy neutral events as well as copy losses.
Regions of open chromatin can have higher levels of transcriptional activity since they often cluster near gene promoters or regulatory regions (37). We observed enrichment for RefSeq genes and CpG islands, typically located near promoters of frequently expressed genes, in the vicinity of the breakpoint regions of copy neutral telomeric events. Expressed genes are scanned by RNA polymerases which detect DNA damage and initiate transcription-coupled repair (38). Random or inherited errors in transcription-coupled repair, particularly at non-canonical DNA structures, can result in genomic instability that could predispose to the formation of large structural mosaic events. In addition, transcription requires the activity of topoisomerases to relieve torsional stress within topological domains. The action of topoisomerases can lead to the production of DSBs (39,40).
Breakpoint regions were also observed to have higher recombination rates in comparison to random regions. Recombination rates are known to vary by ancestry (41–43). Preliminary evidence suggests the frequency of mosaicism could also vary by ancestry (4,24); however, the mechanism linking ancestry and recombination rate to mosaicism is unclear. Recent studies have shown that the recombination rate, primarily a measure of meiotic recombination, could be related to PRDM9 and other genes (43); so far, no evidence has linked recombination rates to mosaic events, suggesting a distinct set of mechanisms. Our observation that regions of mosaic breakpoints have higher recombination rates may simply reflect that these are sites of DNA accessibility. Additional studies are needed to replicate this association and determine whether local meiotic recombination rate is an important factor for developing mosaic events.
Repeat elements such as short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and segmental duplications were also found to be enriched in the two studied common classes of events, suggesting sequence homology could be an important contributor. Previously, LINE transposable elements have been reported to cause DNA copy-number alterations during embryogenesis as well as in neural progenitor cells (44,45). Genomic deletions caused by LINEs have also been found to be associated with genetic diseases (46). LINEs are normally active in the germline, but somatic retrotranspositions have been reported in a number of cancers (47). The LINE-1 endonuclease required for retrotransposition may also be a source of DSBs.
In support of DSB repair pathways as a source for mosaicism, inherited or acquired deficits in DNA repair pathways such as NHEJ or HR may predispose to the development of structural genetic mosaicism (48). A recent GWAS evaluating mosaic chromosome Y loss as a continuous trait reported 14 loci associated with mosaic Y loss, in particular germline variation near ATM and TREX1, genes important for promoting DNA repair (49), further suggesting a link between DSB repair and the development of genetic mosaicism.
While our analysis focused on autosomal mosaicism, mosaicism of the sex chromosomes (e.g. X and Y chromosomes) is more frequent than autosomal mosaicism and more commonly affects the entire chromosome (16,50,51). It is unclear why mosaic gains or losses of an entire chromosome are more common for sex chromosomes. One possibility is that the lack of a homologous chromosome limits partner choice for DSB repair. There is little homology between the X and Y chromosomes outside of the pseudoautosomal regions. Further, the inactivated X chromosome is late replicating, limiting the presence of a sister chromatid for HR of the X chromosome to a short window of time (52). Interestingly, the effect of large mosaic events on the inactive chromosome parallels what has been reported in somatic characterization of cancer genomes, namely the majority arise on the inactive X chromosome and at a density higher than the autosomes (53). Since a sister chromatid is not present for most of the cell cycle, combined with the lack of a homologue, templates for DSB repair through HR for the sex chromosomes are limited. The lack of a homologue requires DSBs to be repaired by a more error prone pathway that could lead to increases in mosaicism for these chromosomes.
Our investigation of estimated breakpoint regions focused on the two most common classes of mosaic events; therefore, observations here may not accurately reflect the molecular footprint or mechanisms important for breakpoint formation of other structural mosaic event types. We investigated overall associations across all events in two classes of mosaic events and may have missed key biological mechanisms important for individual events or biological mechanisms not related to the data tracks we investigated. Examples of such mechanisms could include DNA replication errors, mitotic missegregation and defects in telomere maintenance (54). Further studies should include expanded sample sizes as well as functional laboratory work beyond the scope of this study. These studies could employ whole genome sequencing approaches so chimeric reads can be utilized to more precisely map breakpoint sites. The non-random clustering of detected mosaic events suggests a common set of mechanisms could predispose to mosaic event formation. We have proposed DNA repair pathways such as NHEJ and HR as plausible biologic mechanisms that may be related to the initiation of somatic mosaic events. Future functional studies on mosaic event formation are needed to better understand the influence local genomic characteristics may have on developing double strand breaks from endogenous and exogenous exposures and how DNA repair pathways permit the formation and clonal expansion of structural genomic events in a subset of cells.
Materials and Methods
To determine if common genetic features are associated with breakpoint formation, event breakpoints were extracted and combined from previously published studies of large scale clonal mosaicism of the autosomes (4,20,21). Breakpoint regions were defined as 200 Kb windows centered on SNP array based breakpoint estimates, which account for the uncertainty around calling breakpoints based on probe location and density. Due to the limited abundance of events from different classes, we chose to focus our analysis on two predominant classes of breakpoints: breakpoints from mosaic losses that did not include a telomere or centromere, hereafter referred to as ‘interstitial mosaic losses’ (N = 688), and non-telomeric breakpoint ends of mosaic copy-neutral events involving telomeres, hereafter referred to as ‘telomeric neutral events’ (N = 543) (Fig. 1).
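The window definition above can be illustrated with a minimal Python sketch. The function names, the clamping at chromosome ends and the BED4 naming scheme are assumptions made for illustration and are not taken from the original analysis.

```python
# Illustrative sketch only: names and edge handling are hypothetical.
HALF_WINDOW = 100_000  # half of the 200 Kb breakpoint region


def breakpoint_window(chrom, position, chrom_length=None, half_window=HALF_WINDOW):
    """Return a (chrom, start, end) window centered on a breakpoint estimate.

    The window is clamped at 0 and, if given, at the chromosome length
    (an assumption; the text does not describe edge handling).
    """
    start = max(0, position - half_window)
    end = position + half_window
    if chrom_length is not None:
        end = min(end, chrom_length)
    return chrom, start, end


def to_bed4(windows, prefix="bp"):
    """Format windows as tab-separated BED4 lines with unique names."""
    return [
        f"{chrom}\t{start}\t{end}\t{prefix}{i}"
        for i, (chrom, start, end) in enumerate(windows, 1)
    ]


windows = [breakpoint_window("chr13", 49_500_000), breakpoint_window("chr20", 50_000)]
bed_lines = to_bed4(windows)
```

A four-column BED layout with unique names is used here because tools that summarize signal over regions commonly key their output on the name column.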
Several publicly available genetic tracks were downloaded from the University of California, Santa Cruz (UCSC) ftp site (ftp://hgdownload.cse.ucsc.edu) to investigate potential enrichment of genetic elements around breakpoint regions. Specific data tracks included OH Radical Cleavage Intensity Database (ORChID) Version 2 (/gbdb/hg18/bbi/wgEncodeBuOrchidSignalRep2Gm12878.bw), FAIRE-seq Peaks for lymphoblastoid cell lines (/goldenPath/hg18/database/wgEncodeUncFAIREseqPeaksGm12878V3.txt.gz, wgEncodeUncFAIREseqPeaksGm18507.txt.gz, wgEncodeUncFAIREseqPeaksGm19239.txt.gz), DNase I hypersensitivity peaks for lymphoblastoid cell lines (/goldenPath/hg18/database/wgEncodeUwDnaseSeqPeaksRep1Gm06990.txt.gz, wgEncodeUwDnaseSeqPeaksRep2Gm06990.txt.gz, wgEncodeUwDnaseSeqPeaksRep1Gm12865.txt.gz, wgEncodeUwDnaseSeqPeaksRep2Gm12865.txt.gz, wgEncodeUwDnaseSeqPeaksRep1Gm12878.txt.gz, wgEncodeUwDnaseSeqPeaksRep2Gm12878.txt.gz), RefSeq genes (goldenPath/hg18/database/refGene.txt.gz), CpG islands (goldenPath/hg18/database/cpgIslandExt.txt.gz), deCODE sex-averaged recombination rates (/gbdb/hg18/decode/SexAveraged.bw), repeat elements (chr*_rmskRM327.txt.gz) and segmental duplications (goldenPath/hg18/database/genomicSuperDups.txt.gz). When available, data from multiple lymphoblastoid cell lines were merged to obtain better estimates of cell-line specific feature location.
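The text does not state how peak tracks from multiple cell lines were merged. One plausible reading is a per-chromosome union of overlapping intervals, sketched below in Python; the function name and the choice to fuse touching intervals are assumptions.

```python
# Hypothetical sketch: union of peak intervals pooled from several cell lines.
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals.

    `intervals` holds peaks from all cell lines for ONE chromosome.
    Returns a sorted list of non-overlapping intervals.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Peaks on one chromosome from two hypothetical cell lines.
line_a = [(100, 250), (900, 1000)]
line_b = [(200, 400), (1500, 1600)]
union = merge_intervals(line_a + line_b)
```

Other merge rules, such as keeping only peaks shared by all cell lines, would give a different feature set, so this should be read as one option rather than the procedure actually used.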
To obtain estimates of the underlying genomic distributions from which the detected mosaic breakpoints arose, 500 random 200 Kb genomic regions spanning accessible regions of the genome were selected for each permutation run. Random points were chosen on chromosomes with selection probability based on chromosomal length. The non-independent nature of matched breakpoints for interstitial events was accounted for by selecting paired points separated by length estimates chosen from normal distributions that incorporated the empirically derived mean and standard deviation of length for the event copy number state of interest. These length distributions were left truncated to adjust for the 2 Mb size threshold reported in the literature (4). The UCSC Gap tracks (goldenPath/hg18/database/chr*_gap.txt.gz) were used to ensure random breakpoint positions were not located in inaccessible genomic regions such as centromeres or the p-arms of acrocentric chromosomes. If a random breakpoint position fell in an inaccessible region, the breakpoint was discarded and a new one selected to ensure comparability between the random breakpoints and the observed mosaic event breakpoints. To account for different SNP coverage on distinct genotyping arrays, the nearest SNPs to the randomly selected points were selected from an intersection set of Illumina SNP manifest coordinates to represent the estimated random breakpoint positions as if they were detected by an Illumina array. Finally, a ±100 Kb region around the chosen SNP was selected as the random breakpoint region. Although rare, some randomly selected regions may overlap with mosaic breakpoint regions since the search space for randomly selecting breakpoints would encompass all observed mosaic breakpoint regions.
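The sampling steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the original code: all function names are hypothetical, positions are drawn uniformly within a chromosome, truncation is done by rejection sampling, and a paired draw is repeated in full whenever the second point falls off the chromosome or into a gap.

```python
import bisect
import random

MIN_EVENT_LENGTH = 2_000_000  # 2 Mb size threshold mentioned in the text
HALF_WINDOW = 100_000         # +/-100 Kb around the chosen SNP


def in_gap(pos, gaps):
    """True if pos falls inside any half-open (start, end) gap interval."""
    return any(start <= pos < end for start, end in gaps)


def random_point(chrom_lengths, gaps, rng):
    """Pick a chromosome with probability proportional to its length, then a
    uniform position on it; redraw whenever the position lands in a gap."""
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    while True:
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        pos = rng.randrange(chrom_lengths[chrom])
        if not in_gap(pos, gaps.get(chrom, [])):
            return chrom, pos


def truncated_length(mean, sd, rng, minimum=MIN_EVENT_LENGTH):
    """Draw a length from a normal distribution left-truncated at `minimum`
    (rejection sampling; assumes the truncation is not extreme)."""
    while True:
        length = rng.gauss(mean, sd)
        if length >= minimum:
            return int(length)


def nearest_snp(pos, sorted_snps):
    """Return the SNP coordinate closest to pos (list sorted, non-empty)."""
    i = bisect.bisect_left(sorted_snps, pos)
    candidates = sorted_snps[max(0, i - 1):i + 1]
    return min(candidates, key=lambda snp: abs(snp - pos))


def random_interstitial_pair(chrom_lengths, gaps, snps, mean, sd, rng):
    """Draw two linked random breakpoint regions on one chromosome."""
    while True:
        chrom, left = random_point(chrom_lengths, gaps, rng)
        right = left + truncated_length(mean, sd, rng)
        if right >= chrom_lengths[chrom] or in_gap(right, gaps.get(chrom, [])):
            continue  # assumption: redraw the whole pair
        regions = []
        for pos in (left, right):
            snp = nearest_snp(pos, snps[chrom])
            regions.append((chrom, max(0, snp - HALF_WINDOW), snp + HALF_WINDOW))
        return regions
```

A seeded `random.Random` instance is passed in explicitly so that a permutation run can be reproduced. For telomeric neutral events only a single point per event would be needed, so `random_point` followed by `nearest_snp` would suffice.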
Where data values existed for track elements, mean values were taken across mosaic breakpoint regions using the UCSC bigWigAverageOverBed utility and compared to means from 1000 permutations of randomly selected breakpoints. P-values were calculated as the fraction of permutation means that were at least as extreme as the original means derived from breakpoint data. When only regional coordinates were available for track elements, overlaps between breakpoint regions and track elements were counted and compared to overlap counts between random regions and track elements. P-values for feature counts were likewise based on permutation.
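As a minimal Python sketch of the two statistics described above: the definition of "at least as extreme" used here (absolute distance from the mean of the permutation distribution) is an assumption, as are the function names and the choice to count every overlapping region and element pair.

```python
from statistics import mean


def permutation_p_value(observed, permuted):
    """Fraction of permutation values at least as extreme as `observed`.

    "Extreme" is measured as absolute distance from the mean of the
    permutation distribution (an assumed two-sided definition).
    """
    center = mean(permuted)
    observed_dev = abs(observed - center)
    hits = sum(1 for value in permuted if abs(value - center) >= observed_dev)
    return hits / len(permuted)


def count_overlaps(regions, elements):
    """Count (region, element) pairs that overlap.

    Both inputs are lists of half-open (chrom, start, end) tuples.
    """
    count = 0
    for r_chrom, r_start, r_end in regions:
        for e_chrom, e_start, e_end in elements:
            if r_chrom == e_chrom and r_start < e_end and e_start < r_end:
                count += 1
    return count


# Toy permutation means; the values are illustrative, not study data.
toy_permuted = [1, 2, 3, 4, 5]
p_central = permutation_p_value(3, toy_permuted)   # every value is as extreme
p_extreme = permutation_p_value(10, toy_permuted)  # no value is as extreme
```

With 1000 permutations the smallest non-zero value this estimator can return is 0.001, so a reported P < 0.001 corresponds to no permutation being as extreme as the observed statistic. The nested loop in `count_overlaps` is quadratic and is written for clarity; genome-scale tracks would normally be handled with sorted intervals or an interval tree.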
Acknowledgements
This research was funded by the Intramural Research Program of the United States National Cancer Institute.
Conflict of Interest statement. None declared.
References