The elusive evidence for chromothripsis

The chromothripsis hypothesis suggests an extraordinary one-step catastrophic genomic event allowing a chromosome to ‘shatter into many pieces’ and reassemble into a functioning chromosome. Recent efforts have aimed to detect chromothripsis by looking for a genomic signature characterized by a large number of breakpoints (50–250) but a limited number of oscillating copy number states (2–3), confined to a few chromosomes. The chromothripsis phenomenon has become widely reported in different cancers, but with inconsistent and sometimes relaxed criteria for determining that rearrangements occurred simultaneously rather than progressively. We revisit the original simulation approach and show that the signature is not clearly exceptional and can be explained using only progressive rearrangements. For example, 3.9% of progressively simulated chromosomes with 50–55 breakpoints were dominated by two or three copy number states. In addition, when the parameters of the simulation are adjusted, the proposed footprint appears more frequently. Lastly, we provide an algorithm to find a sequence of progressive rearrangements that explains all observed breakpoints from a proposed chromothripsis chromosome. Thus, the proposed signature cannot be considered sufficient proof for this extraordinary hypothesis. Great caution should be exercised when labeling complex rearrangements as chromothripsis from genome hybridization and sequencing experiments.

2 Simulation of a chromosome simulated with progressive inversions and deletion

3 Equivalence of inversions, intra-chromosomal translocations, and interchromosomal translocations
Claim: A genome that is rearranged by one or more copy number neutral events (e.g., intra-chromosomal translocations) can be equivalently explained by a sequence of inversions.
Our manuscript suggests that a sequence of inversions suffices to explain the rearranged chromosome and copy number states of the cell line SNU-C1, previously explained only by chromothripsis. This raises the natural question: is a sequence of inversions a better biological explanation? However, the question is misdirected. A more relevant question is whether other biological explanations are as plausible as chromothripsis. We do not need to invoke inversions alone to explain the rearrangements. Here, we explain how any sequence of copy-neutral rearrangements can be equivalently explained by inversions. Conversely, a sequence of inversions can possibly be explained using a combination of copy-neutral rearrangements. To build intuition for the equivalence, we consider three examples below.
• We show that an intra-chromosomal translocation is explained by a sequence of inversions.
• We define the concept of k-break rearrangements (also known as complex genome rearrangements), and show that both inversions and chromothripsis are special cases of k-break rearrangements.
• We extend these descriptions to inter-chromosomal translocations, and show that inter-chromosomal translocations can also be modeled as a sequence of inversions.
To reiterate, when we show that a complex rearrangement is modeled by a sequence of inversions, it is equivalent to saying that it is modeled by some sequence of copy neutral events. For more precise and thorough arguments of these ideas, see [11,19,18].

Intra-chromosomal translocation (transposition)
Figure 2: Transposition of 'de' into 'gh'.
The event of a DNA segment moving to a new position in the genome is referred to as transposition. An example of transposition is shown in Figure S2. In the example, the segment 'de' moves between 'g' and 'h', which rearranges "abcdefghij" to "abcfgdehij". The removal of 'de' creates two breaks in the sequence, and the insertion breaks the sequence 'gh'. As a result, three breakpoints are observed, corresponding to the fusions 'cf', 'gd', and 'eh'. While the single event is referred to as transposition, the breakpoints can also be explained by a sequence of three inversions: inversion of 'de', followed by inversion of 'fg', and finally inversion of '(-e)(-d)(-g)(-f)' (Figure S2). Again, when we say that the rearrangements are explained by a sequence of inversions, we are only claiming that they are explained by some sequence of copy-neutral events.
It is an intuitive but misguided conclusion that rearrangements always leave characteristic signatures of paired-end discordant mappings. For simple, single events this is certainly true, and we see characteristic signatures for deletions, inversions, etc., as reported by many authors [25]. However, when we look at complex rearrangements, the signatures are no longer predictive. Transposition is an intuitive example of a one-step multi-break rearrangement, and we showed in the previous example that it can be equivalently explained by inversions. To generalize this, Alekseyev and Pevzner [19] developed the notion of k-break rearrangements and argued that, as k increases, the number of rearrangement events required drops dramatically. However, given a rearranged genome, not much can be said about the type and number of rearrangements that created it.
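The three-inversion explanation of the transposition can be checked mechanically. The following is a minimal sketch, assuming a chromosome is represented as a list of (block, sign) pairs; the helper names `invert` and `show` are ours and are not part of the original analysis.

```python
def invert(blocks, start, end):
    """Return a copy of `blocks` in which blocks[start:end] is reversed
    and every block in that range has its orientation flipped."""
    flipped = [(name, -sign) for name, sign in reversed(blocks[start:end])]
    return blocks[:start] + flipped + blocks[end:]


def show(blocks):
    """Format blocks in the notation of the text, e.g. 'abc(-e)(-d)f'."""
    return "".join(name if sign > 0 else f"(-{name})" for name, sign in blocks)


reference = [(ch, 1) for ch in "abcdefghij"]

step1 = invert(reference, 3, 5)  # inversion of 'de'
step2 = invert(step1, 5, 7)      # inversion of 'fg'
step3 = invert(step2, 3, 7)      # inversion of '(-e)(-d)(-g)(-f)'

for step in (step1, step2, step3):
    print(show(step))
```

The last line printed is the transposed chromosome "abcfgdehij", so the single transposition and the three inversions leave the same breakpoints.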

Complex Intrachromosomal Rearrangement
Here, we consider an empirical example. McPherson et al. [26] analyzed cancer RNA-seq data and found numerous examples of aberrant gene fusions that implied complex genome rearrangement events. In Figure S3, we illustrate a poly-fusion observed on chromosome 8 by McPherson et al. Within the tumor chromosome, the authors observed a poly-fusion of genic and non-genic parts composed of inverted PHF20L1, an inverted non-genic section between PHF20L1 and FAM49B, inverted FAM49B, and SAMD12. In Figure S3, chromosome 8 is represented by the sequence "abcdefghij", where segments represent genes as follows: 'd' corresponds to SAMD12, 'e' corresponds to FAM49B, 'f' corresponds to the non-genic section between PHF20L1 and FAM49B, and 'gh' is PHF20L1. The tumor chromosome 8 observed by McPherson et al. [26] is "abcd(-e)(-f)(-h)(-g)ij" in the block representation. Since there was no evidence to support a particular DNA rearrangement mechanism, the authors refrained from suggesting how the complex event arose. We demonstrate two extreme scenarios for explaining the breakpoints observed in Figure S3. In the first scenario, the breakpoints are created in a single multi-break shuffle, as shown in Figure S3. In the second scenario, the breakpoints are explained by three sequential inversions: inversion of 'e', inversion of 'f', and inversion of 'gh'. Both scenarios explain the rearranged tumor chromosome equally well. Thus, any claim that one or the other event is more likely must be supported by other evidence. One could think of chromothripsis as another extreme scenario in which all observed breaks are due to a single massive rearrangement. However, the statistical evidence from the rearranged genome cannot distinguish chromothripsis from other sequences of copy neutral changes.
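The second scenario can be replayed in a few lines. This is a sketch under the same block representation used in the text; the helper names are hypothetical. Because the three inverted segments do not overlap in this particular example, the sketch also tries every ordering of the three inversions, which illustrates that the final chromosome carries no record of the order in which they occurred.

```python
from itertools import permutations


def invert(blocks, start, end):
    """Reverse blocks[start:end] and flip the orientation of each block."""
    flipped = [(name, -sign) for name, sign in reversed(blocks[start:end])]
    return blocks[:start] + flipped + blocks[end:]


def show(blocks):
    """Format blocks in the notation of the text."""
    return "".join(name if sign > 0 else f"(-{name})" for name, sign in blocks)


reference = [(ch, 1) for ch in "abcdefghij"]

# Index ranges of the inverted segments: 'e' (FAM49B), 'f' (non-genic), 'gh' (PHF20L1).
segments = {"e": (4, 5), "f": (5, 6), "gh": (6, 8)}

outcomes = set()
for order in permutations(segments):
    chromosome = reference
    for name in order:
        chromosome = invert(chromosome, *segments[name])
    outcomes.add(show(chromosome))

print(outcomes)
```

All six orderings yield the single tumor arrangement "abcd(-e)(-f)(-h)(-g)ij".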
While chromothripsis was originally claimed to occur on a single chromosome, the mechanism has additionally been claimed to affect clusters of loci across multiple chromosomes [3,5,6,8,10]. Genome rearrangement theory extends inversions to apply even to rearrangements affecting multiple chromosomes, and a progressive sequence of 'inversions' (in this more general sense) can be used to explain these breakpoints as well. Once again, we are only claiming that the statistical evidence cannot distinguish between a multitude of copy neutral events and a single, catastrophic event.
McPherson et al. discovered another gene fusion in which ZDHHC11 and RNF130 from chromosome 5 are fused together by a non-genic sequence from chromosome 8. In Figure S4, two extreme rearrangement scenarios explain the breakpoints on the rearranged chromosomes. Here, chromosome 5 is labeled "abcdef" and chromosome 8 is "ghij". The ZDHHC11 and RNF130 genes are represented by segments 'bc' and 'de', respectively. The observed tumor chromosome 5 is "a(-c)(-b)hdef". Again, one extreme is that all the rearranged segments break simultaneously and form the rearranged tumor chromosomes (Figure S4). The other extreme is also shown in Figure S4. First, suppose chromosomes 5 and 8 are concatenated into "abcdefghij". The observed tumor chromosome 5 can be created by an inversion of "bcdefg" (also known as a reciprocal interchromosomal translocation), an inversion of "(-c)(-b)h", and finally an inversion of "(-g)(-f)(-e)(-d)(-h)bc" (a reciprocal interchromosomal translocation). Note that the term "inversion" generalizes in genome rearrangement theory and encompasses the genomics term "interchromosomal translocation". Again, the example demonstrates that both scenarios explain the same rearranged chromosome equally well.
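The inter-chromosomal scenario can be traced in the same way. The sketch below treats the concatenation "abcdefghij" as a single list of signed blocks and applies the three generalized inversions named above. Splitting the result at the junction between 'f' and 'g' (the original boundary between chromosomes 5 and 8) is our reading of the example, and the helper names are hypothetical.

```python
def invert(blocks, start, end):
    """Reverse blocks[start:end] and flip the orientation of each block."""
    flipped = [(name, -sign) for name, sign in reversed(blocks[start:end])]
    return blocks[:start] + flipped + blocks[end:]


def show(blocks):
    """Format blocks in the notation of the text."""
    return "".join(name if sign > 0 else f"(-{name})" for name, sign in blocks)


# Chromosome 5 ("abcdef") and chromosome 8 ("ghij") concatenated.
concatenated = [(ch, 1) for ch in "abcdef" + "ghij"]

step1 = invert(concatenated, 1, 7)  # inversion of 'bcdefg'
step2 = invert(step1, 5, 8)         # inversion of '(-c)(-b)h'
step3 = invert(step2, 1, 8)         # inversion of '(-g)(-f)(-e)(-d)(-h)bc'

# Assumption: cut the result at the preserved 'f'|'g' junction.
cut = step3.index(("g", 1))
tumor_chr5, tumor_chr8 = step3[:cut], step3[cut:]

print(show(step1))
print(show(step2))
print(show(tumor_chr5), show(tumor_chr8))
```

The final line printed shows the tumor chromosome 5, "a(-c)(-b)hdef", with the remaining blocks "gij" left on chromosome 8.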

What can and cannot be inferred from a complex rearranged genome
We are ready to tie together claims from the previous discussion.
1. Available statistical evidence does not distinguish a single scenario of copy number neutral events (e.g., chromothripsis) as the source of genome rearrangement.
2. Explaining a rearranged genome using only inversions does not mean that inversions are the cause of the rearrangement. Instead, it suggests that an abundance of copy neutral events created the rearrangement.
3. Similarly, the number of inversion events is not representative of the actual number of rearrangement events that occurred.
4. The copy neutral events are best generalized as k-break operations. In this scenario, inversions are 2-breaks, transpositions and inverted transpositions are 3-breaks, and chromothripsis is the extreme example of an n-break. All combinations can generate a complex rearranged genome, and deciding among those options requires additional evidence.
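The correspondence between k-breaks and observed breakpoints in point 4 can be made concrete with a breakpoint count. The following is a minimal sketch, assuming blocks 'a' through 'j' are numbered 1 to 10 and written as a signed permutation relative to the reference; `count_breakpoints` is a hypothetical helper that uses the standard definition of a breakpoint (an adjacency absent from the reference), not code from the original analysis.

```python
def count_breakpoints(perm):
    """Count adjacencies of a signed permutation that are absent from the
    identity. The permutation is framed with 0 and n + 1 so that changes
    at the chromosome ends are counted too."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for left, right in zip(framed, framed[1:]) if right - left != 1)


reference = list(range(1, 11))                       # "abcdefghij"
one_inversion = [1, 2, 3, -5, -4, 6, 7, 8, 9, 10]    # "abc(-e)(-d)fghij"
transposition = [1, 2, 3, 6, 7, 4, 5, 8, 9, 10]      # "abcfgdehij"

print(count_breakpoints(reference))      # 0
print(count_breakpoints(one_inversion))  # 2
print(count_breakpoints(transposition))  # 3
```

A single inversion (a 2-break) leaves two breakpoints and the transposition (a 3-break) leaves three, matching the three fusions 'cf', 'gd', and 'eh' described earlier. The count alone does not reveal whether the breakpoints arose in one event or several.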

Formal description of algorithms
Stephens et al. simulation
Observed breakpoints are supplied as input to the simulation. In the process, breakpoints are added to a single chromosome in random order. Each breakpoint is a pair of positions together with the strandedness of the segments adjacent to those positions. B is a set of breakpoints. Assume all segments are in the forward orientation and that a parsimonious series of rearrangements creates the observed breakpoints. Thus, the breakpoint strandedness values (+, −), (−, +), (−, −), and (+, +) correspond to deletion, tandem duplication, tail-to-tail inversion, and head-to-head inversion, respectively. Deletions and tandem duplications change the copy number state of the chromosomal segment between the breakpoint positions (the chromosome length changes). Inversions only invert the segment and do not change the copy number state (the chromosome length does not change).
b ← unique or non-unique count of segment copy number changes.

1: procedure KinsellaSimulateChrom(chromosome-length, rearr-num-distr, rearr-size-distr, inv-rate, del-rate, dup-rate, tandem-dup-rate, invert-dup-rate)
2: inv-rate, del-rate, dup-rate add up to 1.
b ← unique or non-unique count of segment copy number changes.

14: return c, b
15: end procedure

Kinsella et al. grimace
Show that an observed chromosome, supposed to have been shattered, can be created by progressive rearrangements. First, we convert a set of breakpoints into a signed permutation of blocks with respect to a reference genome (i.e., synteny blocks). The signed permutation is then given to GRIMM, which resolves a series of progressive rearrangements that reconstructs the reference genome.
For SNU-C1, the breakpoints lie within an 80 Mb region (79749024 bp). The resulting blocks cover only a 40 Mb region (39652551 bp). No chromosome had two breakpoints with overlapping positions.
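To illustrate what the rearrangement-resolution step produces, the sketch below finds a shortest series of inversions that turns a small signed permutation back into the reference. It is a brute-force breadth-first search and only a stand-in for illustration: it is not GRIMM's algorithm, it is practical only for a handful of blocks, and the function names and the `max_steps` parameter are our own assumptions.

```python
from collections import deque


def all_inversions(perm):
    """Yield every permutation reachable from `perm` by one signed inversion."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j])) + perm[j:]


def shortest_inversion_scenario(observed, reference, max_steps=6):
    """Breadth-first search for a shortest inversion series from `observed`
    to `reference`. Returns the list of successive permutations (ending at
    the reference), or None if none exists within `max_steps`."""
    observed, reference = tuple(observed), tuple(reference)
    parent = {observed: None}
    depth = {observed: 0}
    queue = deque([observed])
    while queue:
        state = queue.popleft()
        if state == reference:
            path = []
            while parent[state] is not None:
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        if depth[state] == max_steps:
            continue
        for nxt in all_inversions(state):
            if nxt not in parent:
                parent[nxt] = state
                depth[nxt] = depth[state] + 1
                queue.append(nxt)
    return None


# The transposition "abcfgdehij" collapsed to four synteny blocks:
# [abc]=1, [de]=2, [fg]=3, [hij]=4, observed in the order 1 3 2 4.
scenario = shortest_inversion_scenario((1, 3, 2, 4), (1, 2, 3, 4))
for step in scenario:
    print(step)
```

For this input the search returns three steps, so no series of fewer than three inversions explains the transposition. The particular series it reports is one of several equally short ones.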
G(V, E) ← incomplete breakpoint graph. Nodes have 1 or 2 edges.
if only two nodes s, t ∈ B have exactly 1 neighbor in G then

16: Subgraph is a path starting from s and ending at t
Assemble complete path from (1, head) to (L, tail)