Toward precise CRISPR DNA fragment editing and predictable 3D genome engineering

Abstract
Ever since gene targeting, or specific modification of genome sequences in mice, was achieved in the mid-1980s, the reverse genetic approach of precisely editing any genomic locus has greatly accelerated biomedical research and biotechnology development. In particular, the recent development of the CRISPR/Cas9 system has greatly expedited the genetic dissection of 3D genomes. CRISPR gene-editing outcomes result from targeted genome cleavage by the ectopic bacterial Cas9 nuclease, followed by presumed random ligation by the host double-strand break repair machinery. Recent studies revealed, however, that CRISPR genome editing is precise and predictable because of cohesive Cas9 cleavage of target DNA. Here, we synthesize the current understanding of CRISPR DNA fragment-editing mechanisms and recent progress toward predictable outcomes from precise genetic engineering of 3D genomes. Specifically, we first briefly describe the historical genetic studies leading to CRISPR and 3D genome engineering. We then summarize the different types of chromosomal rearrangements generated by DNA fragment editing. Finally, we review significant progress from precise 1D gene editing toward predictable 3D genome engineering and synthetic biology. The exciting and rapid advances in this emerging field provide new opportunities and challenges to understand or digest 3D genomes.


Introduction
The completion of the Human Genome Project ushered in a new era of understanding and engineering genomes by reverse genetics. However, the folding of the 3-billion-bp 1D mammalian genome, which is ~2 m long, into 3D structures within cell nuclei ~5 μm in diameter adds another layer of complexity. The secret of 3D genome coding likely resides in the non-coding regions, the 97.5% of mammalian genomes that were once assumed to be 'junk DNA' but are now regarded as 'crown jewels'. Specifically, high-throughput mapping of functional genomic sequences has revealed numerous non-coding DNA elements, up to 8.4 million in number (Neph et al., 2012). In addition, junk DNA is transcribed into so-called 'junk RNA', numerous long non-coding RNAs whose functions are difficult to study (Cech and Steitz, 2014). Elucidating the organizational and structural roles of these non-coding DNA elements in 3D genome regulation and function requires functional genetic experiments.

Trekking across time: the long journey of reverse genetics leading to CRISPR and 3D genome editing
Genetic research focuses on heredity and 'mutants' (Castle and Little, 1909; Muller, 1930). Some mutants arise spontaneously, but specific mutants are usually generated through tedious forward genetic screens (Acevedo-Arozena et al., 2008). Forward genetic screening in mice was performed before the mouse genome sequence was finished and contributed greatly to our understanding of human physiology (Kile and Hilton, 2005). However, reverse genetics, that is, generating specific alterations of mammalian genomic sequences (so-called gene targeting), was only a dream in the early days.

Transgenic: random integration in animal and plant genomes
Transgenes were originally derived from viruses and transposons, or so-called jumping genes, in animals and plants (McClintock, 1950; Jaenisch and Mintz, 1974; Bevan et al., 1983). A transgene can integrate randomly into one or very few sites of the mouse genome and exhibits expression patterns with position-effect variegation (Figure 1A; Jaenisch and Mintz, 1974; Gordon et al., 1980; Brinster et al., 1981; Costantini and Lacy, 1981). Multiple copies of a transgene are typically integrated at a random genomic site in tandem arrays as a head-to-tail concatemer (Figure 1A; Brinster et al., 1981; Folger et al., 1982). Homologous recombination (HR) was convincingly demonstrated to be the predominant mechanism of head-to-tail transgene integration (Folger et al., 1982). In fact, it was this conviction that eventually led to the development of gene targeting in mice (Capecchi, 2005).

Gene targeting or knockout mice
Gene targeting differs from transgenic technologies and has greatly accelerated biological research. Even before the mouse genome sequence was completed, the dream of specifically modifying any mouse locus had been realized by gene targeting (Figure 1A; Smithies et al., 1985; Thomas et al., 1986). The technique involves constructing a targeting vector that carries the designed modification of a specific locus flanked by two homologous arms. This donor template is then introduced into mouse embryonic stem cells (ESCs) (Evans and Kaufman, 1981; Martin, 1981), where it replaces the endogenous sequences through HR (Figure 1A). Finally, ESC clones carrying the designed modification are injected into the blastocoel cavity of mouse blastocysts to generate chimeric mice, from which heterozygous or homozygous mice can be obtained simply by breeding. The technique and a general protocol for generating knockout mice with any gene targeted were quickly developed (Mansour et al., 1988).

CRISPR: clustered regularly interspaced short palindromic repeats
CRISPR systems are RNA-guided adaptive immune systems of bacteria and archaea that defend against phage or virus infection and plasmid conjugation. The type II CRISPR/Cas9 system has been widely used for genome editing. The programmable CRISPR/Cas9 system consists of a synthetic single-guide RNA (sgRNA; derived from crRNA and tracrRNA) and the RNA-guided Cas9 nuclease (Figure 1D; Jinek et al., 2012). Upon recognition of a protospacer adjacent motif (PAM; NGG for SpCas9 from Streptococcus pyogenes) downstream of the target sequence, Cas9 cleaves the complementary and non-complementary strands of the target DNA duplex with its HNH and RuvC nuclease domains, respectively (Garneau et al., 2010; Gasiunas et al., 2012; Jinek et al., 2012), resulting in presumed blunt-ended double-strand breaks (DSBs) that are then ligated by the endogenous cellular DNA repair machinery (Figure 1D).

Gene-editing outcomes from single DSBs
There are numerous gene-editing applications of single DSBs generated by CRISPR. The simplest is the generation of frameshift mutations in the coding region of a protein-encoding gene. Cas9 reprogrammed with a single sgRNA to target a coding exon generates one DSB that often leads to nucleotide insertions and/or deletions (indels). Two-thirds of these indels can shift the open reading frame of a protein-coding gene, resulting in truncated protein translation or a null mutation through nonsense-mediated mRNA decay. Recent studies demonstrated, however, that single DSBs also lead to large deletions through extensive end resection (Li et al., 2015a; Shin et al., 2017; Kosicki et al., 2018, 2020; Jia et al., 2020). In addition, Cas9 with single sgRNAs causes frequent loss of heterozygosity or gene conversion as well as allele-specific chromosome loss in human embryos (Alanis-Lobato et al., 2020; Liang et al., 2020; Zuccaro et al., 2020). Finally, if a donor DNA template is provided, single DSBs can lead to precise targeted gene insertions through HR (Figure 1B).
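The 'two-thirds' figure follows from reading-frame arithmetic: an indel shifts the frame unless its length is a multiple of 3. The minimal sketch below makes that explicit under a loudly simplifying assumption, namely that every indel size from 1 to `max_size` bp (insertions and deletions alike) is equally likely; real Cas9 indel spectra are not uniform. Both function names are hypothetical helpers for illustration.

```python
from fractions import Fraction


def causes_frameshift(indel_length: int) -> bool:
    """An insertion (+) or deletion (-) shifts the reading frame unless its size is a multiple of 3."""
    return indel_length % 3 != 0


def frameshift_fraction(max_size: int) -> Fraction:
    """Fraction of indel sizes in [-max_size, -1] and [1, max_size] that shift the frame,
    ASSUMING all sizes are equally likely (an illustrative simplification)."""
    sizes = [s for s in range(-max_size, max_size + 1) if s != 0]
    shifting = sum(causes_frameshift(s) for s in sizes)
    return Fraction(shifting, len(sizes))


print(frameshift_fraction(30))  # 2/3
```

Whenever `max_size` is a multiple of 3, exactly one size in three is in-frame, so the fraction is exactly 2/3; skewed real-world spectra (for example, dominated by 1-bp insertions) can push the true value above or below this.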

3D genome primer
Although genetic information is encoded in linear 1D genomic sequences, these extremely long and thin DNA molecules exist in 3D space and are physically folded within the cell nucleus. Each interphase chromosome occupies a distinct territory and is further partitioned into multiple topologically associated domains (TADs). Recognition sites of the architectural protein CCCTC-binding factor (CTCF) are enriched at the boundaries of chromatin domains; however, numerous CTCF sites are also located within TADs. Exactly how 3D genomes are folded and regulated remains unknown; however, new technologies have enabled tremendous progress in 3D genomics (Banigan and Mirny, 2020; Li et al., 2020a; Zhang and Li, 2020). In particular, DNA fragment editing, that is, CRISPR-induced chromosomal rearrangement, has provided significant insights into the mechanisms of 3D genome folding (Liu and Wu, 2020).
There are numerous excellent reviews on CRISPR or 3D genomics (Doudna and Charpentier, 2014; Huang and Wu, 2016; Jiao and Gao, 2016; Yan and Li, 2019; Yang and Huang, 2019; Zhang, 2019; Anzalone et al., 2020; Li et al., 2020a; Yang and Chen, 2020; Zhang and Li, 2020; …). Here, we focus on chromosomal rearrangements and 3D genome engineering by DNA fragment editing using Cas9 with dual sgRNAs.
Figure 1 Schematic of genetic methods for specific genome modifications. (A) Gene targeting is achieved by sequence replacement with a donor template harboring designed sequences flanked by two homologous arms in a specific genome locus. In addition to targeted replacement, occasional random integration in a non-specific genome site results in transgenic insertion of a tandem concatemer. (B) A DSB greatly stimulates gene targeting but not random transgenic integration. However, it can also result in targeted head-to-tail insertion at the DSB site. (C) A simplified illustration of gene editing by ZFNs and TALENs. In ZFNs, each zinc finger recognizes three specific nucleotides. In TALENs, each nucleotide is recognized by a TALE repeat, which carries two specific amino acids. ZFP, zinc-finger protein. (D) The type II CRISPR/Cas9 system. Cas9 nuclease is programmed by CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA), which can be fused into a single synthetic guide RNA (sgRNA).

Chromosomal rearrangements by CRISPR with dual sgRNAs
Structural chromosomal abnormalities, or chromosomal rearrangements, include DNA fragment deletions, inversions, duplications, translocations, and insertions (Figure 2; Shaffer and Lupski, 2000; Huang and Wu, 2016). Chromosomal rearrangements are estimated to occur in 0.6% of human newborns (Jacobs et al., 1992). In addition, recurrent chromosomal rearrangements are common in human neurological diseases (Weckselblatt and Rudd, 2015) and tumors (Rabbitts, 1994; Mitelman et al., 1997). Early studies modeling human diseases generated large chromosomal rearrangements of up to tens of megabases in mice by combining gene targeting with Cre/LoxP recombination (Ramirez-Solis et al., 1995; Herault et al., 1998; Wu et al., 2007; reviewed in Mills and Bradley, 2001; Yu and Bradley, 2001). Zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) have also been used to generate chromosomal rearrangements in human cells (Lee et al., 2010; Gupta et al., 2013; Nyquist et al., 2013; Xiao et al., 2013). In this section, we outline 3D genome engineering by modeling chromosomal rearrangements with the CRISPR/Cas9 system and dual sgRNAs (Figure 2; Li et al., 2015b).

Chromosomal rearrangements by DNA fragment editing
Disruption of a specific gene of interest can be easily achieved by Cas9 reprogrammed with a single sgRNA because two-thirds of random indels at a DSB site within a protein-coding region result in frameshifts. For non-coding elements, however, the small random indels induced by Cas9 with single sgRNAs are usually insufficient to abolish function. A practical way to characterize non-coding regions, which are estimated to contain millions of elements in mammalian genomes, is to generate very large deletions encompassing defined regions with multiple non-coding elements (Wu et al., 2007). Such large DNA fragments can be engineered by Cas9 reprogrammed with dual sgRNAs, which generates two concurrent DSBs in a genome (Figure 2). Specifically, with the participation of cellular DNA repair proteins, the four DSB ends generated by the two Cas9 cleavages are ligated in random pairwise combinations, resulting in DNA fragment deletion or inversion when the concurrent DSBs occur on a single chromosome (Figure 2A and B) and DNA fragment duplication or translocation when the DSBs occur on different chromatids or chromosomes (Figure 2C and D).

DNA fragment inversion by CRISPR
In addition to DNA fragment deletions, DNA fragment inversions also arise from double cutting (as opposed to double nicking) within a single chromosome (Figure 2B). Unlike a DNA fragment deletion, which leaves only one junction after the intervening sequence is removed, a DNA fragment inversion has both an upstream and a downstream junction after the intervening fragment is inverted (Huang and Wu, 2016).
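The junction bookkeeping for deletion versus inversion can be made concrete with a short sketch. It assumes, for illustration only, two blunt cuts at known indices on a toy top-strand sequence and precise rejoining with no end processing; the function names and the `|` junction notation are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion_product(seq: str, cut1: int, cut2: int, flank: int = 3) -> tuple[str, list[str]]:
    """Join the end left of cut1 to the end right of cut2: a single junction."""
    product = seq[:cut1] + seq[cut2:]
    junction = seq[cut1 - flank:cut1] + "|" + seq[cut2:cut2 + flank]
    return product, [junction]


def inversion_product(seq: str, cut1: int, cut2: int, flank: int = 3) -> tuple[str, list[str]]:
    """Flip the excised fragment and rejoin both sides: an upstream and a downstream junction."""
    inverted = reverse_complement(seq[cut1:cut2])
    product = seq[:cut1] + inverted + seq[cut2:]
    upstream = seq[cut1 - flank:cut1] + "|" + inverted[:flank]
    downstream = inverted[-flank:] + "|" + seq[cut2:cut2 + flank]
    return product, [upstream, downstream]


locus = "AAAACCGATTTT"  # toy sequence; cuts at indices 4 and 8 excise "CCGA"
print(deletion_product(locus, 4, 8))   # ('AAAATTTT', ['AAA|TTT'])
print(inversion_product(locus, 4, 8))  # ('AAAATCGGTTTT', ['AAA|TCG', 'CGG|TTT'])
```

Because the inverted fragment is reverse-complemented, each of its two junctions joins sequences that were never adjacent in the original locus, which is why inversions are genotyped at both junctions.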

DNA fragment duplication by CRISPR
Chromosomal duplications can be generated by trans-allelic ligation of DSB ends on two homologous chromosomes or sister chromatids (Figure 2C; Golic and Golic, 1996; Wu et al., 2007; Li et al., 2015a). Specifically, DNA fragment duplications can arise from complementary trans-chromatid ligation of paracentric DSB ends produced by Cas9 guided with dual sgRNAs after DNA replication, during both mitosis and meiosis. Accordingly, Cas9 guided with dual sgRNAs induces DNA fragment duplications in cultured cells (Kraft et al., 2015; Li et al., 2015a). DNA fragment duplications can also be induced in mice in vivo by pronuclear microinjection of Cas9 with dual sgRNAs (Li et al., 2015a; Korablev et al., 2017). In particular, a tandem duplication of a 1211-bp DNA fragment was confirmed by Sanger sequencing of the entire duplicated segment (Li et al., 2015a). Finally, quantitative analyses revealed frequent segmental duplications by Cas9 with dual sgRNAs, though at lower efficiency than DNA fragment deletions and inversions (Li et al., 2015a).
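The reciprocal outcome of trans-chromatid ligation can be sketched as follows. The sketch assumes, for illustration, that after replication one sister chromatid (A) carries a break at the second site and the other (B) at the first site, and that the ends are joined precisely in trans; `trans_chromatid_products` is a hypothetical name.

```python
def trans_chromatid_products(seq: str, cut1: int, cut2: int) -> tuple[str, str]:
    """Reciprocal products of trans-chromatid ligation (assumed: chromatid A cut at cut2, B at cut1).

    A's left end (everything before cut2) joins B's right end (everything from cut1),
    so the cut1..cut2 fragment appears twice: a tandem duplication.
    B's left end (everything before cut1) joins A's right end (everything from cut2): a deletion.
    """
    duplication = seq[:cut2] + seq[cut1:]
    deletion = seq[:cut1] + seq[cut2:]
    return duplication, deletion


dup_allele, del_allele = trans_chromatid_products("AAAACCGGTTTT", 4, 8)
print(dup_allele)  # AAAACCGGCCGGTTTT
print(del_allele)  # AAAATTTT
```

The two products together conserve the total sequence of the two chromatids, which is why a duplication allele is expected to be accompanied by a reciprocal deletion allele in the sister cell.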

Chromosomal translocation by CRISPR
Chromosomal translocations result from joining DSB ends on two distinct chromosomes (Figure 2D). Recurrent chromosomal translocations are frequent in many types of tumors, especially leukemias (Lieber, 2016; Vanoli and Jasin, 2017; Brunet and Jasin, 2018; Cheong et al., 2018). Cas9 reprogrammed with dual sgRNAs that target specific loci on non-homologous chromosomes has been used to induce chromosomal translocations modeling human Ewing's sarcoma, desmoplastic small round cell tumors, and acute myeloid leukemia (AML) (Torres et al., 2014; …).

Relationship between DNA fragment size and editing frequency
Deletion frequencies at some loci are inversely correlated with the size of the intervening sequence between the two cleavage sites (Canver et al., 2014). However, at other loci, there is no inverse correlation between DNA-fragment-deletion frequency and fragment size (Table 1; He et al., 2015; Kraft et al., 2015; Li et al., 2015a; Schmieder et al., 2018). In addition, the frequencies of DNA-fragment inversion and DNA-fragment duplication show no relationship with fragment size (Table 1). DNA fragment-editing frequency may instead be related to the locus-specific 3D chromatin structure and the spatial distance between the two cutting sites, an unresolved problem requiring further study.

DNA fragment insertion by CRISPR
DNA fragment insertion can be efficiently achieved with the CRISPR system using Cas9 with either dual or single sgRNAs (Figure 2E). Mechanistically, DNA fragment insertions can be achieved by either HR or non-homologous end joining (Suzuki et al., 2016). Single cuts by Cas9 are known to stimulate DNA fragment insertion through HR with a donor template harboring flanking homologous arms. One study carefully investigated the efficiencies of HR-mediated DNA fragment insertion by Cas9 with dual sgRNAs (Byrne et al., 2015). Moreover, Cas9 with dual sgRNAs targeting both the genome and the donor template may be more efficient, through homology-mediated end joining (HMEJ) (Yao et al., 2017). However, insertion events require careful screening for single-copy clones or mice, because any donor template can integrate as random head-to-tail tandem insertions, just as transgenes do (Figure 1B; Folger et al., 1982; Skryabin et al., 2020). Thus, DNA fragment insertion clones or mice are best screened by Southern blot analysis rather than by PCR alone.

Many ways to cut and heal
The mutated sequences obtained from CRISPR/Cas9 editing are the eventual outcome of two opposing forces: Cas9 cleavage and cellular repair. Specifically, the random indels observed with Cas9 and single sgRNAs are the eventual repair outcomes after repeated cycles of cleavage and precise religation of DNA ends. In addition to blunt-end cleavage, Cas9 can also cleave the DNA duplex cohesively, generating staggered ends with 5′ overhangs. Thus, cohesive Cas9 cleavage generates diverse profiles of DSB ends with distinct 5′ overhangs. Finally, rapid progress in the field has made it possible to achieve predictable editing outcomes, including by manipulating DNA repair pathways (Long, 2019; Yeh et al., 2019).
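Why repeated cut-and-religate cycles end in indels can be illustrated with a deliberately simple model (an assumption for illustration, not a fitted model of any dataset): in each round an intact site is cut with probability q; a cut site is religated precisely with probability p, restoring a cuttable target, or misrepaired with probability 1 − p, after which it no longer matches the sgRNA. The intact fraction after n rounds is then (1 − q(1 − p))^n, so even highly accurate repair eventually yields mostly edited alleles. Parameter values below are hypothetical.

```python
import random


def intact_fraction(q: float, p: float, rounds: int) -> float:
    """Closed form: a site survives a round unless it is cut (q) and misrepaired (1 - p)."""
    return (1 - q * (1 - p)) ** rounds


def simulate_intact_fraction(q: float, p: float, rounds: int,
                             trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the same model; precise religation returns the site to the cuttable pool."""
    rng = random.Random(seed)
    intact = 0
    for _ in range(trials):
        for _ in range(rounds):
            cut = rng.random() < q
            if cut and rng.random() >= p:  # imprecise repair: site no longer matches the sgRNA
                break
        else:
            intact += 1  # all rounds passed without a mutation
    return intact / trials


# Hypothetical parameters: 50% cut probability per round, 90% precise religation, 20 rounds
print(round(intact_fraction(0.5, 0.9, 20), 3))   # 0.358
print(simulate_intact_fraction(0.5, 0.9, 20))    # close to the closed form
```

In this model a high precise-repair rate only delays mutation; it cannot prevent it as long as the restored site remains cuttable, which is the intuition behind the statement above.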

Double cutting vs. single cutting
The obvious difference between double and single cutting is that double cutting generates four DSB ends, and combinatorial ligation of any two of these four ends produces a variety of chromosomal rearrangements (Figure 2). The fundamental difference, however, is that after precise ligation of the two DSB ends from single cutting, the repaired sequence still matches the targeting sgRNA and can therefore be recut. In contrast, junctions formed by ligating two of the four ends from double cutting cannot be recut, because the rearranged junctional sequences no longer match either of the two targeting sgRNAs (Huang and Wu, 2016; Shou et al., 2018; Shi et al., 2019). Therefore, dual-sgRNA-mediated chromosomal rearrangements preserve the integrity of Cas9-cleavage ends and make them less vulnerable to end processing by repair enzymes (Figure 2). Hence, precise ligation by direct rejoining of Cas9 blunt-cleavage ends is much more frequent after double cutting than after single cutting (Li et al., 2015a; Zhu et al., 2016b; Guo et al., 2018; Shou et al., 2018).
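The recut argument reduces to a string check: does the repaired sequence still contain the protospacer followed by an NGG PAM? The sketch below uses two made-up protospacers on a toy locus and assumes blunt cuts 3 nt upstream of each PAM; `is_spcas9_target` and all sequences are hypothetical, and real targeting also tolerates some mismatches, which this exact-match check ignores.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_spcas9_target(seq: str, protospacer: str) -> bool:
    """True if `protospacer` followed by an NGG PAM occurs on either strand (exact match only)."""
    site = re.compile(re.escape(protospacer) + "[ACGT]GG")
    return bool(site.search(seq) or site.search(reverse_complement(seq)))


# Toy locus with two hypothetical SpCas9 sites (protospacer + PAM) on the top strand
PROTO1, PROTO2 = "GATCCGTAGCTAGCATGCAA", "CTTGAACGGTACCAGTTCGA"
locus = "CATCATCATC" + PROTO1 + "TGG" + "ACACACACACACACAC" + PROTO2 + "AGG" + "GTGTGTGTGT"
cut1 = 10 + 17               # blunt cut 3 nt upstream of the first PAM
cut2 = 10 + 23 + 16 + 17     # blunt cut 3 nt upstream of the second PAM

single_cut_religated = locus[:cut1] + locus[cut1:]                # precise rejoining restores the site
single_cut_plus1 = locus[:cut1] + locus[cut1 - 1] + locus[cut1:]  # +1 insertion (duplicated -4 nt)
double_cut_deletion = locus[:cut1] + locus[cut2:]                 # precise deletion junction

print(is_spcas9_target(single_cut_religated, PROTO1))  # True: can be cut again
print(is_spcas9_target(single_cut_plus1, PROTO1))      # False: a terminal outcome
print(is_spcas9_target(double_cut_deletion, PROTO1),
      is_spcas9_target(double_cut_deletion, PROTO2))   # False False
```

Only the precisely religated single-cut product remains a substrate, so single-cut outcomes are enriched for imprecise repair, whereas a precise deletion junction from double cutting is stable the first time it forms.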

Cohesive Cas9 cleavage in vitro and in silico
Since the advent of Cas9-mediated genome editing, it has long been assumed that Cas9 cleaves the target DNA duplex at the −3 position upstream of the PAM site, generating blunt DSB ends with no overhang (Figure 1D; Gasiunas et al., 2012; Jinek et al., 2012). In contrast to the earlier finding that Cas9 has potential exonuclease activity, in silico molecular dynamics modeling and in vitro high-throughput sequencing suggest that Cas9 cleaves the non-complementary strand at the −4 position upstream of the PAM site (…; Palermo et al., 2016; Zuo and Liu, 2016). In addition, in vitro cleavage of dsDNA whose non-complementary strand is labeled at the 3′ end reveals both blunt and cohesive Cas9 cleavage (Shou et al., 2018; Stephenson et al., 2018). Specifically, in vitro cleavage of a dsDNA duplex with a 3′-biotin-labeled non-complementary strand reveals flexible cleavage at the −4 and −3 positions upstream of the PAM site (Shou et al., 2018). Finally, deep sequencing of in vitro Cas9-cleaved products reveals flexible cleavage of the non-complementary strand at the −6, −5, −4, and −3 positions upstream of the PAM site but exact cleavage of the complementary strand at the −3 position (…). Collectively, these studies show that Cas9 endonucleolytically cleaves the non-complementary strand at the −6, −5, −4, and −3 positions in vitro, generating cohesive DSB ends with 1- to 3-nt 5′ overhangs as well as blunt ends (Figure 3A).
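The mapping from the non-complementary-strand cut position to the overhangs on the two DSB ends can be sketched as below. Following the text, the complementary strand is assumed to be cut at −3 in every case; the toy target and the function name `cas9_dsb_ends` are hypothetical. Positions count back from the PAM (−1 is the nucleotide adjacent to NGG).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def cas9_dsb_ends(target: str, pam_start: int, noncomp_cut: int) -> dict:
    """Describe both DSB ends for a given non-complementary-strand cut position.

    target: non-complementary (PAM-containing) strand, 5'->3'.
    pam_start: 0-based index of the N in NGG.
    noncomp_cut: -3 (blunt) or -4/-5/-6 (staggered); the complementary strand is
                 ASSUMED to be cut at -3 in every case.
    """
    if target[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM at pam_start")
    if noncomp_cut not in (-3, -4, -5, -6):
        raise ValueError("expected a cut position between -6 and -3")
    comp_cut = pam_start - 3                 # complementary-strand cut, between -4 and -3
    top_cut = pam_start + noncomp_cut        # non-complementary-strand cut
    overhang = target[top_cut:comp_cut]      # nucleotides left single-stranded
    return {
        "overhang_length": len(overhang),
        # PAM-proximal end: 5' overhang on the non-complementary strand
        "pam_proximal_5prime_overhang": overhang,
        # PAM-distal end: 5' overhang on the complementary strand, written 5'->3'
        "pam_distal_5prime_overhang": reverse_complement(overhang),
    }


TARGET = "AAAACCCCGGGGTTTTACGT" + "AGG"  # hypothetical 20-nt protospacer + PAM
for cut in (-3, -4, -5, -6):
    print(cut, cas9_dsb_ends(TARGET, 20, cut))
```

The two overhangs of one cleavage event are always reverse complements of each other, which is what allows ends from a single cut to re-anneal.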

Cohesive Cas9 cleavage in vivo
Several lines of evidence support cohesive Cas9 cleavage in vivo. First, the predicted metal-coordination distance to the −3 phosphate is much larger than expected for typical RuvC catalysis (Figure 3B; Chen and Doudna, 2017). Second, Cas9-mediated nucleotide insertions at junctions of DNA fragment editing in vivo are strongly biased toward the nucleotides at the −6, −5, and −4 positions upstream of the PAM site (Figure 3A; Shou et al., 2018; Shi et al., 2019). Finally, rationally designed Cas9 variants with engineered hinge regions between the HNH and RuvC nuclease domains display R-loop-dependent alterations of the scissile profile of the non-complementary strand in vivo (Figure 3A; Shou et al., 2018). Taken together, these studies suggest that Cas9 cleaves the target DNA duplex with flexibility on the non-complementary strand, resulting in DSB ends with 5′ overhangs.

Mechanism of cohesive Cas9 cleavage
The Cas9 RuvC and HNH nuclease domains cleave the non-complementary and complementary strands via putative two-metal-ion and one-metal-ion mechanisms, respectively (Jinek et al., 2014; Nishimasu et al., 2014; Chen and Doudna, 2017). In both mechanisms, nucleophilic attack is in-line from the 5′ site of the phosphodiester bond, resulting in 5′ phosphate and 3′ hydroxyl groups (Figure 3B; Yang, 2010). Whereas one magnesium ion coordinates the Cas9 HNH active site to the scissile phosphate at exactly the −3 position upstream of the NGG PAM after a large conformational change, two magnesium ions coordinate the Cas9 RuvC active site to scissile phosphates at positions further upstream of the PAM, resulting in flexible Cas9 cleavage with variable staggered 5′ overhangs.

After cutting: DSB repair pathways
DNA damage response pathways are activated after Cas9 cleavage to repair the resulting DSBs. Mammalian DSB repair involves three possible pathways: HR, canonical non-homologous end joining (cNHEJ), and alternative non-homologous end joining (aNHEJ), which includes microhomology-mediated end joining (MMEJ) (Figure 3C; Chang et al., 2017). In mammalian cells, when a donor template is available, the HR pathway can be used to achieve precise genome editing, including insertion or replacement of specific sequences. However, the low efficiency of HR limits its use (Ceccaldi et al., 2016a). When no donor is provided, cNHEJ and aNHEJ (Figure 3C) are the predominant pathways for repairing Cas9-induced DSBs.
In the cNHEJ repair pathway, the Ku70-Ku80 heterodimer recognizes DSB ends and protects them from processing by resection nucleases (Figure 3C; Deriano and Roth, 2013). The DNA-dependent protein kinase catalytic subunit (DNA-PKcs) and the endonuclease Artemis are then recruited to the Ku-bound DNA ends, where they form an Artemis-PK-Ku complex. Finally, precise ligation of the two DSB ends is catalyzed by the ligase IV-XRCC4-XLF complex (Deriano and Roth, 2013). Thus, cNHEJ is an accurate and precise DSB repair pathway (Shou et al., 2018).
The aNHEJ pathway was originally thought to be a backup repair mechanism for cNHEJ, and it usually introduces small indels (Figure 3C). If the cNHEJ repair pathway is unavailable or disrupted, DSB ends are repaired by the aNHEJ pathway, resulting in error-prone large indels or chromosomal rearrangements. Indeed, in species lacking the cNHEJ pathway, genomes are prone to chromosomal rearrangements via aNHEJ (Deng et al., 2018).
In the aNHEJ pathway, extensive resection of DSB ends is catalyzed by several nucleases, including the MRE11-RAD50-NBS1 (MRN) complex (NBS1, Nijmegen breakage syndrome protein 1, also known as nibrin). This resection is facilitated by CtBP-interacting protein (CtIP, or RBBP8) and FANCD2 (Ceccaldi et al., 2016b; Chang et al., 2017; Shou et al., 2018). Resection exposes single-stranded DNA (ssDNA) overhangs that can anneal by complementary base pairing. The annealed DSB ends are then ligated by XRCC1 and DNA ligase III of the aNHEJ pathway, generating indels (Chang et al., 2017). Thus, cNHEJ- and aNHEJ-mediated repair either joins the DSB ends directly or modifies them slightly, resulting in precise ligation or small indels, respectively (Figure 3C).
Figure 3 … (A) Cas9 flexibly cleaves the non-complementary strand at the −6, −5, and −4 positions as well as the −3 position upstream of the PAM site, generating diverse cohesive DSB ends with 1-, 2-, and 3-nt 5′ overhangs in addition to blunt ends. (B) Diagram of the one-metal-ion cleavage mechanism of the HNH domain and the two-metal-ion cleavage mechanism of the RuvC domain of Cas9. (C) Schematic of NHEJ pathways for repairing a targeted DSB. NHEJ includes two competing pathways known as classic or canonical NHEJ (cNHEJ) and alternative NHEJ (aNHEJ). The cNHEJ pathway requires XRCC4 and DNA ligase IV. The aNHEJ pathway includes MMEJ. The cleaved DSB ends are ligated by cellular DNA repair machineries using either the precise cNHEJ pathway or the mutagenic MMEJ pathway.

Random vs. non-random indels
Initial CRISPR gene-editing studies indicated that Cas9 programmed with single sgRNAs induces prevalent random indels in heterologous systems (Cho et al., 2013; Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013). Similarly, small random indels are introduced at the junctions of chromosomal rearrangements (or at the Cas9 cleavage site, so-called scarring) by DNA fragment editing with Cas9 reprogrammed with dual sgRNAs (Canver et al., 2014; Kraft et al., 2015; Li et al., 2015a). These random indels likely result from the NHEJ repair pathway (Figure 3C; Jiang and Marraffini, 2015; Huang and Wu, 2016).

Predictable deletions
When short direct repeats (microhomologies) lie near the DSB ends generated by Cas9 with single sgRNAs, small deletions can be generated via the MMEJ pathway (Figure 4B; McVey and Lee, 2008; Shou et al., 2018). Specifically, if resection exposes short complementary sequences within the 3′ overhangs, they can anneal to form a DNA duplex, and the 3′ flap is then cleaved by flap endonuclease 1 (FEN1), resulting in predictable deletions (Figure 4B; Iyer et al., 2019). Similarly, when direct repeats flank the two cleavage sites of Cas9 targeted by dual sgRNAs, the intervening sequences can be deleted via the MMEJ pathway (Figure 4C; McVey and Lee, 2008; Shou et al., 2018).
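The MMEJ outcome space around a single cut can be enumerated by searching for direct repeats with one copy on each side of the break; annealing the two copies and removing the flaps keeps one copy and deletes everything between them. This is a minimal sketch assuming a blunt cut at a known index, exact repeats, and a hypothetical ranking (longer repeats first, then shorter deletions); published predictors score microhomologies more elaborately. `mmej_deletions` is a hypothetical name.

```python
def mmej_deletions(seq: str, cut: int, min_len: int = 2, window: int = 15) -> list[tuple[str, int, str]]:
    """Enumerate microhomology-mediated deletions around a DSB at index `cut`.

    A left copy seq[a:a+m] must end at or before the cut and a right copy seq[b:b+m]
    must start at or after it (both within `window` nt). Annealing the copies and
    removing the flaps keeps one copy, so the product is seq[:a] + seq[b:].
    Returns (repeat, deletion_length, product), longest repeats first, then shortest deletions.
    """
    outcomes = []
    for m in range(min_len, window + 1):
        for a in range(max(0, cut - window), cut - m + 1):
            left_copy = seq[a:a + m]
            for b in range(cut, min(len(seq), cut + window) - m + 1):
                if seq[b:b + m] == left_copy:
                    outcomes.append((left_copy, b - a, seq[:a] + seq[b:]))
    outcomes.sort(key=lambda o: (-len(o[0]), o[1]))
    return outcomes


# Toy locus: a GCAT direct repeat on each side of a cut between indices 9 and 10
for repeat, size, product in mmej_deletions("TTTTGCATAACCGCATTTTT", 10)[:3]:
    print(repeat, size, product)
```

On this toy sequence the top-ranked outcome deletes 8 bp and retains a single GCAT copy; the same logic applies to direct repeats flanking the two cut sites of a dual-sgRNA deletion.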

Predictable nucleotide insertions at editing junctions
CRISPR-editing technologies are advancing at remarkable speed. Editing outcomes were once thought to be uncontrollable or unpredictable but are now considered predictable, for example through machine learning approaches. Base-editing outcomes, for instance, have recently been shown to be predictable (Arbab et al., 2020). In this section, we focus on predictable nucleotide insertions based on the mechanistic understanding of cohesive, or staggered, Cas9 cleavage. In particular, the cohesive Cas9 cleavage mechanism has a profound impact on CRISPR gene-editing outcomes in a wide variety of scenarios and species. If Cas9 cleavage ends with single-nucleotide 5′ overhangs are filled in and ligated, the result is a duplication of the −4 nucleotide (Table 2). Similarly, if cleavage ends with 2-nt overhangs are filled in and ligated, the dinucleotide at the −5 and −4 positions is repeated (Table 2). Finally, if cleavage ends with 3-nt overhangs are filled in and ligated, the trinucleotide at the −6, −5, and −4 positions is repeated (Table 2).
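The fill-in rule described above is simple enough to compute directly from a target sequence. The sketch assumes a single NGG site on the given strand, a complementary-strand cut at −3, and complete fill-in of both ends followed by precise blunt ligation; the toy target and `predicted_fill_in_product` are hypothetical.

```python
def predicted_fill_in_product(target: str, pam_start: int, overhang: int) -> tuple[str, str]:
    """Predicted insertion and repaired sequence after staggered Cas9 cleavage, fill-in and ligation.

    target: non-complementary (PAM-containing) strand, 5'->3'.
    pam_start: 0-based index of the N in NGG.
    overhang: 0 (blunt, cut at -3) or 1/2/3 (non-complementary strand cut at -4/-5/-6).
    """
    if target[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM at pam_start")
    if overhang not in (0, 1, 2, 3):
        raise ValueError("overhang must be 0-3 nt")
    blunt = pam_start - 3                      # complementary-strand cut, between -4 and -3
    inserted = target[blunt - overhang:blunt]  # "" ; -4 ; -5,-4 ; or -6,-5,-4
    # PAM-distal end is filled in up to `blunt`; PAM-proximal end is filled in from `blunt - overhang`
    product = target[:blunt] + target[blunt - overhang:]
    return inserted, product


TARGET = "AAAACCCCGGGGTTTTACGT" + "AGG"  # hypothetical 20-nt protospacer + PAM
for k in range(4):
    print(k, predicted_fill_in_product(TARGET, 20, k))
```

Every predicted insertion is a tandem duplication of the nucleotides immediately 5′ of the −3 position, so the outcome can be read off the protospacer once the overhang length is known.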

Predictable single-nucleotide insertions at single cutting sites
Extensive studies have shown that Cas9-mediated single-nucleotide insertions at repair junctions in budding yeast, mouse ESCs, mammalian cell lines, and mice are predictable (Figure 4D; Chakrabarti et al., 2018; Kalhor et al., 2018; Lemos et al., 2018; Shen et al., 2018; Shou et al., 2018; Taheri-Ghahfarokhi et al., 2018; Chen et al., 2019; Gisler et al., 2019; Leenay et al., 2019). When Cas9 reprogrammed with a single sgRNA cleaves the non-complementary strand at the −4 position, it generates two cohesive ends with 1-nt 5′ overhangs, which can be filled in by an unknown polymerase (Figure 4D). The two filled-in DSB ends are then ligated directly, generating a single-nucleotide insertion that duplicates the −4 nucleotide upstream of the PAM (Figure 4D).
This ligation occurs via the cNHEJ pathway, since blocking XRCC4 significantly decreases precise ligation in DNA fragment editing (Shou et al., 2018). In addition, knocking down DNA ligase IV significantly decreases the efficiency of precise DNA fragment deletion, suggesting that cNHEJ is an error-free DNA repair pathway (Shou et al., 2018). Therefore, numerous 1-bp insertions that were reported as random insertions actually result from cohesive Cas9 cleavage at the −4 position (Table 2). For example, the '+1' CCR5 allele of Nana, the unethically edited baby (Ryder, 2018), was probably generated by cohesive Cas9 cleavage at the −4 position, producing two DSB ends with 1-nt 5′ overhangs that were then filled in and ligated precisely (Figure 4E). In summary, gene editing via cohesive Cas9 cleavage at the −4 position generates predictable 1-bp insertions (Table 2).

Dinucleotide and trinucleotide insertions at single cutting sites
If the Cas9 RuvC domain cleaves the non-complementary strand at the −5 or −6 position upstream of the PAM, it generates two cohesive DSB ends, each with a dinucleotide or trinucleotide 5′ overhang. After both ends are filled in, they can be ligated as blunt ends via the cNHEJ pathway. This generates a dinucleotide or trinucleotide insertion, which is a tandem duplication of the dinucleotide or trinucleotide immediately upstream of the −3 position (Table 2; Figure 4F).
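The contrast between single and double cutting discussed in the next section can be sketched with the same simplified model. Assumptions, all for illustration: both protospacers lie on the same strand (NGG...NGG), the complementary strand is cut at −3, 5′ overhangs either anneal (if complementary) or are fully filled in, and ligation is precise. Under these assumptions, the ends of one cut always carry complementary overhangs, whereas for a deletion between two cuts the PAM-proximal overhang of the second cut is retained as a predictable insertion unless it happens to match the first cut's overhang. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overhangs(locus: str, pam_start: int, k: int) -> tuple[str, str]:
    """5' overhangs (each written 5'->3') of the PAM-distal and PAM-proximal ends of one cut."""
    if k not in (0, 1, 2, 3):
        raise ValueError("k must be 0-3 nt")
    blunt = pam_start - 3
    proximal = locus[blunt - k:blunt]
    return reverse_complement(proximal), proximal


def can_anneal(distal_overhang: str, proximal_overhang: str) -> bool:
    """Two 5' overhangs pair when one is the reverse complement of the other."""
    return distal_overhang == reverse_complement(proximal_overhang)


def deletion_junction(locus: str, pam1: int, k1: int, pam2: int, k2: int) -> tuple[str, str]:
    """Inserted nucleotides and product for a precise deletion between two same-strand cuts."""
    b1, b2 = pam1 - 3, pam2 - 3
    distal1, _ = overhangs(locus, pam1, k1)
    _, proximal2 = overhangs(locus, pam2, k2)
    if k1 and k1 == k2 and can_anneal(distal1, proximal2):
        return "", locus[:b1] + locus[b2:]        # matching overhangs anneal: no insertion
    return proximal2, locus[:b1] + locus[b2 - k2:]  # both ends filled in: overhang of cut 2 retained


site1 = "AAAACCCCGGGGTTTTACGT" + "AGG"   # PAM at index 20
site2 = "GGGGAAAATTTTCCCCGATC" + "TGG"   # PAM at index 51 after an 8-nt spacer
locus = site1 + "CACACACA" + site2
print(can_anneal(*overhangs(locus, 20, 3)))          # True: single-cut ends always re-anneal
print(deletion_junction(locus, 20, 1, 51, 1))        # ('G', 'AAAACCCCGGGGTTTTAGATCTGG')
```

Because the annealed single-cut product restores the target and is recut, while the rearranged junction is not, fill-in insertions accumulate at double-cut junctions in this model.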

Prominent predictable nucleotide insertions at rearranged junctions of double cutting
Systematic analyses of the inserted nucleotides reveal predictable nucleotide insertions at the junctions of chromosomal rearrangements generated by Cas9 with dual sgRNAs (Table 3; Shou et al., 2018). Interestingly, the frequency of nucleotide insertions (1, 2, or 3 nt) is much higher at the junctions of chromosomal rearrangements generated by double cutting than at repair junctions of single cutting (…). The insertion frequency is increased at rearranged junctions because the ligated junctions of chromosomal rearrangements after Cas9 double cleavage cannot be recut. For single Cas9 cleavage, the two cohesive DSB ends are always complementary to each other (Figure 3A); after the cohesive ends anneal and are ligated by cellular repair machineries, the restored site is recut by Cas9 programmed with the same sgRNA. By contrast, two DSB ends from different cleavage sites, which carry distinct 5′ overhangs, are rarely complementary to each other; they therefore seldom anneal, and the rearranged junction cannot be recut by Cas9 programmed with either of the two original sgRNAs.
There are barely any 2- or 3-bp insertions with Cas9 reprogrammed with single sgRNAs (Figure 4F). The reason is that the annealing efficiencies of 2- or 3-nt overhangs after single Cas9 cleavage are much higher than that of 1-nt overhangs, so sites repaired through annealing of 2- or 3-nt cohesive overhangs are more frequently recut. Overall, predictable nucleotide insertions are readily observed at the junctions of chromosomal rearrangements generated by Cas9 with dual sgRNAs (…).
Figure 4 Mechanisms of precise and predictable CRISPR/Cas9 genome editing. (A) Precise chromosomal rearrangements by DNA fragment editing. cNHEJ-mediated precise DNA fragment deletion could be generated through direct ligation by XRCC4-DNA ligase IV of the two staggered or blunt DSB ends from Cas9 cleavage with the NGG-CCN PAM configuration. In particular, perturbations of CtIP or FANCD2, two proteins involved in the aNHEJ pathway, enhance cNHEJ-mediated precise DNA fragment deletion. (B) Predictable deletions. The cohesive and blunt DSB ends could be resected by the MRN complex, resulting in 3′ overhangs. This resection process could be facilitated by the CtIP and FANCD2 proteins. Further resection by the EXO1 and DNA2 nucleases exposes microhomologous sequences in the vicinity of the cleavage site. Base-pairing between the microhomologous sequences and removal of the two 3′ overhanging flaps by FEN1 generate predictable deletions. (C) Large DNA fragment deletion could also be achieved by MMEJ. When direct repeats flank the two cleavage sites of Cas9 with dual sgRNAs, MMEJ-mediated repair could delete the intervening sequence between the two direct repeats (rather than between the two cleavage sites, as in the cNHEJ repair pathway). (D) Predictable single-nucleotide insertions. Cleavage at the −4 position by Cas9 generates cohesive DSB ends with 1-nt 5′ overhangs. Fill-in and end ligation by cellular repair machineries result in predictable 1-bp insertions, which are duplications of the −4 nucleotide. (E) The Nana '+1' allele of the human CCR5 gene in the CRISPR-edited baby probably results from cohesive Cas9 cleavage at the −4 position of the non-complementary strand. (F) Predictable di- or tri-nucleotide insertions. Cleavage at the −5 (or −6) position by Cas9 generates cohesive DSB ends with 2-nt (or 3-nt) 5′ overhangs. Fill-in and end ligation by cellular repair machineries result in predictable 2-bp insertions, which are duplications of the dinucleotide at the −5 and −4 positions. Thus, nucleotide insertions mediated by Cas9 reprogrammed with single sgRNAs manifest as tandem repeats. Finally, nucleotide insertions mediated by Cas9 reprogrammed with dual sgRNAs at various junctions of chromosomal rearrangements are generated by fill-in of cohesive DSB ends. (G) Predictable DNA fragment inversion. Large DNA fragment inversion could also be achieved by MMEJ. When microhomologous inverted repeats flank the cleavage sites of Cas9 with dual sgRNAs, MMEJ-mediated repair can induce predictable inversion of the intervening sequence between the inverted repeats (rather than between the two cleavage sites, as in the cNHEJ repair pathway).

Toward precise and predictable genome editing
To achieve precise and predictable genome editing, the Cas9 endonuclease effector must first be targeted precisely to a genomic site. Once there, the Cas9 effector can make a predictable modification of the target sequence. Novel derivative gene-editing systems such as base editing and prime editing are being developed rapidly (Anzalone et al., 2020; Yang and Chen, 2020). Base editors fuse a catalytically impaired Cas9 (dCas9 or a Cas9 nickase) with a nucleobase deaminase, such as a cytidine deaminase of the APOBEC/AID family or an adenosine deaminase (Komor et al., 2016; Gaudelli et al., 2017). Prime editors fuse the Cas9 H840A nickase with a reverse transcriptase and extend the sgRNA with designed sequences that serve as a priming RNA template for reverse transcription, the so-called prime-editing guide RNA or pegRNA (Anzalone et al., 2019). Both systems offer precise editing without requiring DNA donor templates or DSBs. In this section, we focus only on precise and predictable genome editing derived from the mechanistic understanding of Cas9 catalysis.

Factors influencing CRISPR genome editing
Various factors influence the complexity of DNA repair outcomes, including the DNA repair pathways chosen by host cells, the diversity of DSB ends produced by Cas9 cleavage, and the sequence and 3D genome context surrounding the DSBs. In particular, inhibiting the aNHEJ pathway by knocking down its components CtIP or FANCD2 enhances precise DNA fragment deletion, since cNHEJ and aNHEJ compete with each other for repair substrates (Figure 3C; Shou et al., 2018). Conversely, overexpression of CtIP facilitates use of the MMEJ pathway and results in predictable deletions (Figure 4B; Nakade et al., 2018). In addition, interplay between the structures of DSB ends and cellular repair machineries (resection nucleases, polymerases, and ligases) likely determines end-joining patterns. Indeed, DSB polarity influences repair outcomes at the editing junctions of Cas9-induced artificial class switching and translocations in human B cells (So and Martin, 2019).

Mechanism underlying machine-learning prediction programs
Precise and predictable Cas9-mediated genome editing could be achievable through machine learning. For example, computer programs with machine learning algorithms have recently been developed to predict repair outcomes and to achieve predictable genome editing (Allen et al., 2018; Shen et al., 2018; Chen et al., 2019; Leenay et al., 2019). Specifically, for editing with SpCas9 and its NGG PAM, a 'T' or 'A' at the −4 position tends to result in more predictable 1-bp insertions, whereas a 'G' at the −4 position tends to generate more predictable deletions.
The reason for this deletion preference is related to microhomology between the 'G' at the −4 position and the 'GG' of the NGG PAM.
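The −4 rule above can be made concrete with a small rule-based sketch. It is not one of the published machine learning models cited above; it only hard-codes the qualitative tendency stated in the text, and it also shows how fill-in of a 1-nt 5′ overhang duplicates the −4 nucleotide (Figure 4D). We assume a 23-nt SpCas9 target written 5′ to 3′ on the PAM-containing strand (20-nt protospacer followed by NGG), with position −1 immediately 5′ of the PAM so that the blunt cut lies between −4 and −3. All function names and the example target are hypothetical.

```python
"""Toy rule-based predictor for SpCas9 single-cut outcomes (illustrative only).

Assumption: `target` is 23 nt, 5'->3' on the PAM-containing strand:
a 20-nt protospacer followed by an NGG PAM. Position -1 is the base
immediately 5' of the PAM, so the -4 base sits at index 16.
"""

PROTOSPACER_LEN = 20
MINUS4_INDEX = PROTOSPACER_LEN - 4  # index 16
BLUNT_CUT = PROTOSPACER_LEN - 3     # blunt cut between -4 and -3 (before index 17)


def minus4_base(target: str) -> str:
    """Return the -4 nucleotide after checking length and the NGG PAM."""
    target = target.upper()
    if len(target) != PROTOSPACER_LEN + 3 or target[-2:] != "GG":
        raise ValueError("expected a 20-nt protospacer followed by NGG")
    return target[MINUS4_INDEX]


def favoured_outcome(target: str) -> str:
    """Qualitative tendency described in the text, not a probability."""
    base = minus4_base(target)
    if base in ("A", "T"):
        return "1-bp insertion"
    if base == "G":
        return "deletion"
    return "unclassified"  # the text gives no rule for other bases


def plus1_allele(target: str) -> str:
    """Sequence after fill-in of a 1-nt 5' overhang: the -4 base is duplicated."""
    target = target.upper()
    minus4_base(target)  # validate input
    return target[:BLUNT_CUT] + target[MINUS4_INDEX] + target[BLUNT_CUT:]


if __name__ == "__main__":
    site = "GACGTTACCGATTGCATCGATGG"  # hypothetical target; -4 base is T
    print(favoured_outcome(site))      # 1-bp insertion
    print(plus1_allele(site))          # GACGTTACCGATTGCATTCGATGG
```

The real predictors weigh many sequence features at once; this sketch isolates only the single −4 feature so that the mechanistic reasoning stays visible.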

Predictable MMEJ-mediated DNA fragment inversion
Short inverted repeats flanking the two cleavage sites can induce microhomology-mediated inversion of the intervening sequence. Namely, when the homologous sequences near the DSB ends are inverted repeats, the intervening sequence can be inverted via the MMEJ pathway (Figure 4G; McVey and Lee, 2008; Li et al., 2015a). Therefore, the outcomes of MMEJ-mediated DNA fragment editing may be predicted from microhomologous sequences around the two cleavage sites.
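Predicting MMEJ outcomes from flanking sequence starts with locating the microhomologies. The sketch below scans a window around each of two cut sites for exact k-mer matches: a k-mer that reappears near the second cut is a direct repeat (candidate MMEJ deletion, Figure 4C), and a k-mer whose reverse complement appears there is an inverted repeat (candidate MMEJ inversion, Figure 4G). The window size, k, the 0-based cut convention, and all function names are illustrative assumptions; the sketch ignores resection, flap removal, and repair efficiency.

```python
"""Find candidate microhomologies flanking two Cas9 cut sites (toy sketch).

Assumptions: `seq` is one strand of the locus; cut1 < cut2 are 0-based
cut positions (between seq[cut-1] and seq[cut]); only exact k-mer matches
within +/- `window` nt of each cut are considered.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def _kmers_near(seq: str, cut: int, k: int, window: int):
    """Yield (start, k-mer) for every k-mer lying within the window around a cut."""
    start = max(0, cut - window)
    stop = min(len(seq), cut + window) - k + 1
    for i in range(start, max(start, stop)):
        yield i, seq[i:i + k]


def find_microhomologies(seq, cut1, cut2, k=4, window=10):
    """Return (direct, inverted) lists of (pos1, pos2, kmer) tuples."""
    seq = seq.upper()
    near2 = list(_kmers_near(seq, cut2, k, window))
    direct, inverted = [], []
    for i, kmer1 in _kmers_near(seq, cut1, k, window):
        for j, kmer2 in near2:
            if j < i + k:  # require two non-overlapping copies, second downstream
                continue
            if kmer1 == kmer2:
                direct.append((i, j, kmer1))    # candidate MMEJ deletion
            if kmer1 == revcomp(kmer2):
                inverted.append((i, j, kmer1))  # candidate MMEJ inversion
    return direct, inverted


def mmej_deletion(seq: str, i: int, j: int) -> str:
    """Keep one copy of a direct repeat and drop everything between the copies."""
    return seq[:i] + seq[j:]


if __name__ == "__main__":
    # Hypothetical locus: direct repeat ACGG at positions 10 and 38.
    locus = "ATTGCTAGCA" + "ACGG" + "TTCAATGACTTAAGCCTATTGAGC" + "ACGG" + "TAGTCATTCG"
    direct, inverted = find_microhomologies(locus, cut1=12, cut2=40)
    print((10, 38, "ACGG") in direct)     # True
    print(mmej_deletion(locus, 10, 38))   # ATTGCTAGCAACGGTAGTCATTCG
```

In practice, published predictors also weight longer and GC-rich microhomologies and their distance from the cut; this sketch only enumerates exact matches.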

Toward predictable chromosomal rearrangements
Cas9 programmed with dual sgRNAs induces predictable junctional insertions in DNA fragment editing because specific PAM configurations generate distinct combinations of DSB ends from cohesive Cas9 cleavage (Figure 5; Shou et al., 2018). For example, in the NGG-NGG PAM configuration, the flexible cleavage profile of Cas9 with sgRNA2 can be obtained by sequencing the rearranged junctions of DNA fragment deletion. Similarly, the flexible cleavage profile of Cas9 with sgRNA1 can be obtained by sequencing the rearranged junctions of DNA fragment duplication. The nucleotide insertions at the downstream junctions of DNA fragment inversion can then be predicted from the combined cleavage profiles of both sgRNAs (Figure 5A). Note that the upstream junctions of DNA fragment inversion for the NGG-NGG PAM configuration are always precise (Figure 5A). Similarly, the rearranged junctions of DNA fragment deletion (Figure 5B), the downstream junctions of DNA fragment inversion (Figure 5C), and the rearranged junctions of DNA fragment duplication (Figure 5D) are always precise for the NGG-CCN, CCN-CCN, and CCN-NGG PAM configurations, respectively. In addition, the nucleotide insertions at the rearranged junctions of DNA fragment duplication, the upstream junctions of DNA fragment inversion, and the rearranged junctions of DNA fragment deletion are predictable for the NGG-CCN, CCN-CCN, and CCN-NGG PAM configurations, respectively (Figure 5B-D). Understanding the mechanisms of chromosomal rearrangements will facilitate precise and predictable CRISPR DNA fragment editing.
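The PAM-configuration rules above amount to a lookup table. The sketch below transcribes them from the text and adds a helper that derives the configuration from the strands targeted by sgRNA1 (upstream) and sgRNA2 (downstream). It assumes the usual top-strand convention that a target on the minus strand displays its PAM as CCN 5′ of the protospacer; the dictionary and function names are hypothetical.

```python
"""Junction behaviour by PAM configuration, transcribed from the text (sketch).

Assumptions: sgRNA1 targets the upstream cut site and sgRNA2 the downstream
one; a '+' strand target shows its PAM as NGG on the top strand, and a '-'
strand target shows it as CCN.
"""

JUNCTIONS = {
    "NGG-NGG": {
        "precise": "upstream inversion junction",
        "predictable": (
            "deletion junction (sgRNA2 profile)",
            "duplication junction (sgRNA1 profile)",
            "downstream inversion junction (both profiles)",
        ),
    },
    "NGG-CCN": {"precise": "deletion junction",
                "predictable": ("duplication junction",)},
    "CCN-CCN": {"precise": "downstream inversion junction",
                "predictable": ("upstream inversion junction",)},
    "CCN-NGG": {"precise": "duplication junction",
                "predictable": ("deletion junction",)},
}


def pam_configuration(strand1: str, strand2: str) -> str:
    """Map the strands of the sgRNA1 and sgRNA2 targets to a top-strand PAM configuration."""
    top_strand_pam = {"+": "NGG", "-": "CCN"}
    return f"{top_strand_pam[strand1]}-{top_strand_pam[strand2]}"


def expected_junctions(strand1: str, strand2: str) -> dict:
    """Which rearrangement junctions are always precise or predictable."""
    return JUNCTIONS[pam_configuration(strand1, strand2)]


if __name__ == "__main__":
    print(pam_configuration("+", "-"))             # NGG-CCN
    print(expected_junctions("+", "-")["precise"])  # deletion junction
```

Keeping the rules as data rather than branching logic makes it easy to check each entry against Figure 5.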

Chromosomal rearrangement mechanisms in the context of 3D genome
After Cas9 cleavage, the histone variant H2AX within nucleosomes flanking the DSB ends is phosphorylated by the ATM kinase, generating γH2AX (Iacovoni et al., 2010; Lee et al., 2014a). Interestingly, a recent study showed that Cas9 is a genome mutator and induces γH2AX accumulation. In addition, long-distance chromatin interactions are increased within γH2AX chromatin domains (Aymard et al., 2017). However, whether these increased chromatin interactions influence the formation of so-called 'DNA repair foci' needs further exploration (Marnef and Legube, 2017).
Several recent studies have shown that CTCF participates in DSB repair through its interactions with the repair proteins BRCA2, RAD51, Mre11, and CtIP (Han et al., 2017; Hilmi et al., 2017; Lang et al., 2017; Hwang et al., 2019). In addition, cohesin inhibits distal DSB end joining (Gelot et al., 2016). Because CTCF and cohesin are prominent 3D genome architectural proteins (Merkenschlager and Nora, 2016), the recruitment of CTCF and its associated cohesin complex to regions around DSB ends suggests that 3D genome architecture is closely related to DNA DSB repair.

3D motility of DSB ends in the nuclear space
To be repaired and ligated, Cas9-induced DSB ends need to be brought into close spatial contact in the 3D nuclear space. Nuclear actin may play an important role in the DSB motility required for both HR and NHEJ repair (Caridi et al., 2018). Clustering of DSB ends and formation of a macro-repair center may be a prerequisite for proper chromosomal rearrangements by DNA fragment editing (Jasin and Rothstein, 2013; Aymard et al., 2017).
Table 3 Predictable nucleotide insertions by cohesive Cas9 cleavage with dual sgRNAs.

Toward precise and predictable 3D genome editing: from 1D to 3D
The higher order chromatin structure is highly dynamic and is regulated by epigenetic processes of DNA methylation, histone modification, and chromatin remodeling, ensuring proper cellular processes such as DNA replication, RNA transcription, and DNA damage repair in response to developmental or physiological signals (Dekker and Mirny, 2016; Hansen et al., 2018; Bickmore, 2019). Structural variations or chromosomal rearrangements affect 3D genome organization and gene expression. Editing higher order chromatin structures or engineering chromosomal rearrangements to model genome structural variations not only sheds light on the fundamental mechanisms of 3D genome folding but also contributes to our understanding of aberrant 3D genome folding in human diseases (Wang et al., 2019b). Specifically, 3D genome engineering may pave the way to interpreting vast GWAS data, and CRISPR correction of aberrant alleles may lead to therapies for human diseases in the future (Qian et al., 2019).
Proximity ligation-based chromosome conformation capture (3C) technologies, in conjunction with high-throughput next-generation sequencing, have led to tremendous progress in understanding 3D genome architecture (Dekker et al., 2002; Rao et al., 2014; Liu et al., 2017a; Tan et al., 2019; reviewed in Denker and de Laat, 2016; Zheng and Xie, 2019). In addition, fluorescence-labeled single-molecule imaging with super-resolution microscopy has shed significant light on the mechanisms of genome folding (Hansen et al., 2018; Sigal et al., 2018). Although genetic methods have long been used to investigate position-effect variegation of chromatin organization (Lewis, 1950; McClintock, 1950), they have not been widely used to probe 3D genome organization compared with the various chromosome conformation capture 'C' technologies (3C, 4C, 5C, 6C, 7C, Hi-C, capture-C, etc.) and imaging methods.

General principles of 3D genome organization
The 3D genomes in the nuclear space are thought to be assembled in a hierarchical manner composed of successive chromosomal territories, compartments or clustering regions, TADs or topological domains, and chromatin loops (Dekker and Mirny, 2016; Dixon et al., 2016; Bickmore, 2019). Briefly, each interphase chromosome occupies a distinct territory. Within a chromosome territory, chromatin fibers are segregated into active and inactive compartments with distinct histone modifications. Chromatin compartments are further divided into TADs or topological domains, which are thought to be enriched in long-distance chromatin contacts or loops (Bonev and Cavalli, 2016). Emerging evidence suggests, however, that chromosome compartments are smaller than previously thought and could be consequences of gene activity (Rowley and Corces, 2018). Nevertheless, chromatin loops are fundamental units of the higher order chromatin structures.

CRISPR DNA fragment inversion reveals that the locations and relative orientations of CTCF sites determine the directionality of chromatin looping
Inversion of CTCF sites in the protocadherin alpha (Pcdha) and β-globin clusters switches the directionality of chromatin looping (Guo et al., 2015; Shou et al., 2018; Jia et al., 2020). Specifically, the causal relationship between the orientation of mammalian insulators known as CTCF sites and the directionality of long-distance chromatin looping was demonstrated by inverting CTCF sites using CRISPR DNA fragment-editing methods (Figure 6A; Guo et al., 2015; Shou et al., 2018; Lu et al., 2019; Jia et al., 2020). In addition, haplotype variants that alter chromatin looping topology are linked to human disease risks (Tang et al., 2015). In the Sox2 and Fbn2 loci, however, reinserting an inverted CTCF site at the original location does not form new chromatin loops (de Wit et al., 2015). Nevertheless, alterations of native chromatin loops have functional consequences for gene expression (de Wit et al., 2015; Guo et al., 2015). Moreover, across the genome, forward and reverse CTCF sites tend to be located in close 3D proximity (Rao et al., 2014; Guo et al., 2015). Thus, the relative orientations of CTCF sites determine the directionality of chromatin looping across mammalian genomes (Figure 6A). Specifically, there are strong long-distance chromatin interactions between convergent forward and reverse CTCF sites, but only weak long-distance chromatin interactions between two tandem CTCF sites in the same orientation. Finally, the divergent configuration of reverse and forward CTCF sites constrains long-distance chromatin interactions between remote elements (Figure 6A). In summary, 3D genome structures could be predicted from 1D nucleotide sequences based on this CTCF-coding mechanism.
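The orientation rules above can be written as a small classifier over pairs of CTCF sites. This is a qualitative sketch only: it encodes the convergent (strong), tandem (weak), and divergent (constrained) cases described in the text, and it ignores distance, CTCF occupancy, and cohesin dynamics. The '>' (forward) and '<' (reverse) notation and the function names are assumptions introduced for illustration.

```python
"""Qualitative CTCF-orientation rules for pairwise loop prediction (sketch).

Assumed notation: '>' is a forward CTCF site and '<' a reverse site,
both read on the top strand; positions are arbitrary coordinates.
"""
from itertools import combinations

VALID = {">", "<"}


def classify_pair(left: str, right: str) -> str:
    """Classify two CTCF sites ordered by genomic position."""
    if left not in VALID or right not in VALID:
        raise ValueError("orientation must be '>' or '<'")
    if (left, right) == (">", "<"):
        return "convergent: strong loop expected"
    if left == right:
        return "tandem: weak interaction expected"
    return "divergent: interaction constrained"


def predict_contacts(sites):
    """sites: iterable of (position, orientation); returns pairwise predictions."""
    ordered = sorted(sites)
    return [
        (a_pos, b_pos, classify_pair(a_ori, b_ori))
        for (a_pos, a_ori), (b_pos, b_ori) in combinations(ordered, 2)
    ]


if __name__ == "__main__":
    # Inverting the downstream site turns a convergent pair into a tandem pair.
    print(predict_contacts([(100, ">"), (500, "<")]))
    print(predict_contacts([(100, ">"), (500, ">")]))
```

Comparing the two calls mimics, in silico, the logic of a CRISPR inversion experiment: only the orientation of one site changes, and the predicted interaction class changes with it.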

Mechanism of 3D genome folding by cohesin 'loop extrusion'
The CTCF coding for the 3D genome could be explained by CTCF blocking of cohesin 'loop extrusion' along chromatin fibers (Guo et al., 2015; Nichols and Corces, 2015; Sanborn et al., 2015; Fudenberg et al., 2016; Merkenschlager and Nora, 2016; Li et al., 2020b). The current model for the formation of TADs or topological domains is cohesin sliding-mediated 'loop extrusion' (Banigan and Mirny, 2020). Specifically, CTCF helps to establish TAD boundaries by stalling the sliding of cohesin on DNA fibers and thus facilitates chromatin loop formation by the 'two-headed' cohesin complex. In this way, the cohesin complex can bring distant DNA elements into close spatial contact through so-called active 'loop extrusion', which requires ATP as an energy source (Davidson et al., 2019; Kim et al., 2019). The genome-wide colocalization of CTCF and cohesin, together with the strong tendency of long-distance chromatin interactions to occur between forward-reverse convergent CTCF sites, provides strong evidence for CTCF stalling of cohesin 'loop extrusion' (Parelho et al., 2008; Wendt et al., 2008; Rao et al., 2014; Guo et al., 2015). In addition, consistent with the cohesin 'loop extrusion' model, deletion of WAPL, a cohesin-releasing factor, which increases cohesin enrichment on chromatin, results in a significant increase in TAD size (Gassler et al., 2017; Haarhuis et al., 2017; Wutz et al., 2017). Conversely, deletion of NIPBL, a cohesin-loading factor, or deletion of cohesin itself causes weakening or loss of chromatin loops (Rao et al., 2017; Schwarzer et al., 2017).
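A deterministic one-dimensional toy of loop extrusion shows why orientation matters. In the sketch below, a two-headed extruder is loaded at one position and each head moves outward one step per tick. We assume a simple blocking rule consistent with the convergent-loop observations above: the leftward head stops at a forward ('>') CTCF site, the rightward head stops at a reverse ('<') site, and sites in the other orientation are passed. The lattice, loading position, and function name are hypothetical; real extrusion is stochastic, with loading, unloading, and leaky barriers.

```python
"""Deterministic 1D toy of cohesin loop extrusion with oriented CTCF barriers.

Assumed blocking rule: the leftward-moving head stops at a forward site ('>'),
the rightward-moving head stops at a reverse site ('<'), and both heads stop
at the chromosome ends. Sites in the other orientation are passed through.
"""


def extrude(length: int, load: int, ctcf: dict, max_steps: int = 100_000):
    """Return the (left, right) anchor positions where both heads stop."""
    left = right = load
    left_done = right_done = False
    for _ in range(max_steps):
        if not left_done:
            if left == 0 or ctcf.get(left) == ">":
                left_done = True   # stalled by a forward site or the end
            else:
                left -= 1
        if not right_done:
            if right == length - 1 or ctcf.get(right) == "<":
                right_done = True  # stalled by a reverse site or the end
            else:
                right += 1         # a stalled left head does not stop this one
        if left_done and right_done:
            break
    return left, right


if __name__ == "__main__":
    print(extrude(100, 50, {20: ">", 80: "<"}))           # (20, 80)
    print(extrude(100, 50, {20: ">", 80: ">"}))           # (20, 99): inverted site is passed
    print(extrude(100, 50, {20: ">", 60: ">", 80: "<"}))  # (20, 80)
```

Because each head is checked independently, one head can stall while the other keeps moving, which is the behaviour the asymmetric 'loop extrusion' model builds on.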

Asymmetric reeling of chromatin fibers by cohesin 'loop extrusion'
In the Pcdh gene clusters, a large array of tandem forward CTCF sites in the variable region is followed by tandem reverse CTCF sites in the downstream super-enhancer (Guo et al., 2012; Zhai et al., 2016). CTCF/cohesin-dependent long-distance chromatin interactions bridge the distal enhancer to its target promoters and activate transcription. The reverse CTCF sites in the downstream super-enhancer act as a strong anchor that stalls one head of the cohesin complex, while the other head continues to slide along the variable region and thus reels in chromatin fibers (Figure 6B). By inverting or deleting single CTCF sites or arrays of them in the variable-promoter or super-enhancer regions of the clustered Pcdh genes and assaying the resulting architectural and functional consequences, asymmetric topological effects on long-distance chromatin contacts and disruption of Pcdh gene expression can be detected (Jia et al., 2020).
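To make the asymmetric reeling picture quantitative in the simplest possible way, consider a toy model that is not taken from the cited studies: one cohesin head is fixed at the super-enhancer anchor, and the other head slides past n tandem forward CTCF sites, each of which independently stalls it with the same probability p. The head then stops at the i-th site it encounters with probability p(1 − p)^(i − 1) and reads through all sites with probability (1 − p)^n. Equal, independent stalling is an assumption made here for illustration, and the function names are hypothetical.

```python
"""Toy stall distribution for asymmetric reeling past tandem CTCF sites.

Assumptions: one head is anchored; the other head meets n tandem sites in
order (index 0 is reached first), and each site independently stalls it
with the same probability p. Not a model from the cited studies.
"""


def stall_distribution(n_sites: int, p: float) -> list:
    """Probability that the sliding head stops at each site, in order of encounter."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return [p * (1.0 - p) ** i for i in range(n_sites)]


def read_through(n_sites: int, p: float) -> float:
    """Probability that the sliding head passes every site."""
    return (1.0 - p) ** n_sites


if __name__ == "__main__":
    probs = stall_distribution(5, 0.3)
    print([round(x, 3) for x in probs])
    # Stalling at some site or reading through all of them is exhaustive.
    print(abs(sum(probs) + read_through(5, 0.3) - 1.0) < 1e-12)  # True
```

In this toy, sites met earlier by the sliding head receive a larger share of anchored contacts; how real tandem CTCF arrays balance enhancer contacts among promoters is an empirical question addressed by the editing experiments cited above.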

Topological selections of enhancer-promoter pairing
Genome-editing technologies have advanced our understanding of the 3D chromatin architecture of specific enhancer-promoter contacts (reviewed in Schoenfelder and Fraser, 2019). CTCF/cohesin-mediated chromatin looping regulates promoter selection in the Pcdh gene clusters and their neuron-specific expression patterns (Guo et al., 2012; Jiang et al., 2017; Allahyar et al., 2018; Wu et al., 2020). Specifically, the 3C assay revealed that the enhancer element is spatially close to the promoter of the variable exon in the Pcdh gene cluster. In addition, the CTCF protein recognizes its conserved DNA-binding sites directionally (Guo et al., 2015; Yin et al., 2017; Xu et al., 2018). Finally, single CTCF sites function as traditional insulators to ensure proper activation of target promoters by cognate enhancers, whereas tandem CTCF sites function as topological insulators to balance spatial chromatin contacts and to allocate enhancer resources for promoter choice (Zhai et al., 2016; Jia et al., 2020; Wu et al., 2020).

Synthetic single-chromosome yeast
Double cutting by Cas9 guided by two sgRNAs, each targeting a site close to the telomere of a separate yeast chromosome, leads to fusion of the two chromosomes (Shao et al., 2018). Remarkably, a functional single-chromosome yeast was created by successive fusions of all 16 yeast chromosomes into one giant chromosome using this CRISPR double-cutting method (Shao et al., 2018). The two ends of the single linear chromosome could be further fused to generate a single circular chromosome (Shao et al., 2019). Neither linear nor circular single-chromosome yeasts have been found in nature; both are artificially synthesized strains. These results demonstrate the power of targeted 3D genome engineering in synthetic biology by CRISPR with dual sgRNAs (Sadhu and Kruglyak, 2018).

3D genome synthetic biology
Programmed chromosomal fission and fusion by multiplexed CRISPR have generated synthetic genomes with nucleotide precision in bacteria (Wang et al., 2019a). In prokaryotic Escherichia coli, artificial chromosomes in single cells can be fused into a single genome with precise translocation and scarless inversion (Wang et al., 2019a). In eukaryotic yeast, Hi-C experiments revealed that the large-scale 3D organization of the synthetic genome is unaffected by the removal of numerous repeated sequences (Mercy et al., 2017). Interestingly, Hi-C experiments demonstrated that the single linear-chromosome and circular-chromosome yeasts have similar globular 3D genome conformations (Shao et al., 2019). These studies suggest that global 3D genome structures have significant plasticity and can tolerate local genetic perturbations.
Figure 6 Predictable 3D genome engineering. (A) CTCF coding from 1D genomic sequences to 3D genome organization. The topology and strength of chromatin loops can be predicted based on the locations and relative orientations of CTCF sites. (B) Schematic of the asymmetric 'loop extrusion' model revealed by CRISPR inversion of boundary CTCF sites. Genetic manipulation of CTCF sites demonstrates asymmetric blocking of cohesin loop extrusion by directional CTCF binding to oriented CBS elements. Chromatin fibers are compacted by active cohesin 'loop extrusion' with 'two heads'. The cohesin complex reels in chromatin fibers until anchored by oriented CTCF sites. If one head of cohesin is anchored by CTCF sites, cohesin can continue to reel in chromatin fibers through the other head, resulting in so-called asymmetric 'loop extrusion'.

Perspective
We have sampled highlights of recent advances in the genetic engineering of 3D genomes by CRISPR/Cas9 systems with various precise chromosomal rearrangements. Significant progress has recently been made in understanding the cleavage mechanisms of the CRISPR/Cas9 genome-editing system (Chen and Doudna, 2017). In addition, rapid technological advances in predicting DSB repair outcomes of precise CRISPR DNA fragment editing may accelerate its applications in agriculture and biomedicine (Tang and Fu, 2018). Furthermore, recent multiplexed CRISPR epigenetic technologies promise cross-disciplinary revolutions (McCarty et al., 2020). Finally, CRISPR off-target effects remain a major challenge, but detection methods are improving rapidly (Wienert et al., 2019).
Genetic engineering of 3D genomes and predictable chromosomal rearrangements by DNA fragment editing require interdisciplinary research. Fully predictable 3D genome engineering has clearly not yet been achieved, despite rapid progress in precise CRISPR DNA fragment editing over the last few years. Because very little is known in this area, it is a typical example of desert-wandering 'night science': full of darkness, but with the chance of stumbling onto a gold mine with luck. 3D genomics integrates living biology with physical geometry. A future renaissance in understanding and designing 3D genomes may turn this night science into hypothesis-driven day science. Understanding the mechanisms of 3D genome folding will facilitate future precise and predictable CRISPR DNA fragment editing.