CRISPR/Cas9-induced site-specific DNA double-strand breaks (DSBs) can be repaired by homology-directed repair (HDR) or non-homologous end joining (NHEJ) pathways. Extensive efforts have been made to knock in exogenous DNA at a selected genomic locus in human cells; these efforts, however, have focused on HDR-based strategies, which have proven inefficient. Here, we report that the NHEJ pathway mediates efficient rejoining of genome and plasmids following CRISPR/Cas9-induced DNA DSBs, and promotes high-efficiency DNA integration in various human cell types. With this homology-independent knock-in strategy, integration of a 4.6 kb promoterless ires-eGFP fragment into the GAPDH locus yielded up to 20% GFP+ cells in somatic LO2 cells, and 1.70% GFP+ cells in human embryonic stem cells (ESCs). Quantitative comparison further demonstrated that the NHEJ-based knock-in is more efficient than HDR-mediated gene targeting in all human cell types examined. These data indicate that CRISPR/Cas9-induced NHEJ provides a valuable new path for efficient genome editing in human ESCs and somatic cells.
Zinc-finger nucleases (ZFNs) (1), transcription activator-like effector nucleases (TALENs) (2) and the bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (Cas9) system (3) have achieved great success in introducing site-specific DNA double-strand breaks (DSBs) with high accuracy and efficiency. They have been developed into versatile tools to introduce a broad range of genomic modifications, such as targeted mutation, insertion, large deletion or gene knock-out, in various prokaryotic and eukaryotic cells and organisms (4). Among these tools, CRISPR/Cas9 has rapidly gained popularity due to its superior simplicity (5,6). In this system, a single guide RNA (sgRNA) complexes with the Cas9 nuclease, which can recognize a variable 20-nucleotide target sequence adjacent to a 5′-NGG-3′ protospacer adjacent motif (PAM) and introduce a DSB in the target DNA (7,8). The induced DSB then triggers the DNA repair process mainly via two distinct mechanisms, namely, the non-homologous end joining (NHEJ) and the homology-directed repair (HDR) pathways.
The NHEJ pathway repairs DNA DSBs by joining the broken ends through a homology-independent, mechanistically flexible process, which often results in random small insertions or deletions (indels) (9). Thus, CRISPR/Cas9-introduced DNA cleavage followed by NHEJ repair has been exploited to generate loss-of-function alleles in protein-coding genes (10). In contrast, the HDR pathway mediates a strand-exchange process to repair DNA damage accurately based on existing homologous DNA sequences (11). Exploiting this repair mechanism enables intentional replacement of endogenous genome segments with plasmid sequences, allowing targeted DNA insertion into the genome and precise genetic modification in living cells. CRISPR/Cas9-introduced site-specific DNA cleavage greatly promotes HDR at nearby regions and enhances the efficiency of HDR-based gene targeting (12).
In human cells, efficient knock-in of foreign DNA into a selected genomic locus has been long awaited. It is anticipated to facilitate various applications, ranging from gene function study to therapeutic genome editing. Currently, most studies have focused on HDR-based strategies, and the rate of targeted integration was reported to be low (13). This is because HDR in human cells is intrinsically inefficient, whereas NHEJ-mediated DNA repair is prevalent (14). These properties result in the generation of few targeted clones amid a large number of random integrations. Notably, in human embryonic stem cells (ESCs) (15) and induced pluripotent stem cells (iPSCs) (16), which are pluripotent and possess unprecedented potential for basic research and cell-based therapies (17), gene targeting via HDR is found to be particularly difficult and has impeded the application of these cells (18,19). Even in the presence of ZFN, TALEN or CRISPR/Cas9, the efficiency of HDR-based gene targeting in human pluripotent stem cells is found to be consistently low (20,21). In a recent study by Merkle et al., the efficiency of CRISPR/Cas9-induced HDR-mediated knock-in was estimated to be around 1 × 10⁻⁵ without pre-selection (19). Hence, technical expertise for sophisticated selections and cumbersome screening of a large number of clones are required to obtain genetically modified cells (19,21–23).
To date, it remains unclear whether the extremely low efficiency of HDR is a feature unique to human pluripotent stem cells. Furthermore, it has not been investigated whether the prevalent NHEJ repair can be employed to mediate high-efficiency knock-in in a wide range of human cells, especially in ESCs. To address these questions, we constructed a universal reporter system by targeting the GAPDH locus in the human genome with a promoterless fluorescent reporter. Through systematic investigation into the potential of both HDR and NHEJ repair in mediating CRISPR/Cas9-induced reporter integration, we demonstrated that CRISPR/Cas9-induced NHEJ can mediate reporter knock-in more efficiently than the HDR-based strategy in various human cell types, including human ESCs. This finding paves a new path for efficient genome editing in human ESCs and somatic cells, and it offers great potential for their subsequent applications.
MATERIALS AND METHODS
Cas9 and sgRNA constructs
The human codon-optimized Cas9 (Addgene # 41815) and nickase Cas9D10A (Addgene # 41816) plasmids were obtained from Addgene (8). sgRNAs were designed and constructed as described previously (8,24). Briefly, target sequences (20 bp or 17 bp) starting with guanine and preceding the PAM motif (5′-NGG-3′) were selected from the target genomic regions (8,25). Potential off-target effects of sgRNA candidates were analyzed using the online tool CRISPR Design developed by Zhang's laboratory (http://crispr.mit.edu/), and the sgRNA sequences with fewer off-target sites in the human genome were selected for further analysis. Target sequences of sgRNAs used in this study are shown in Supplementary Table S1, and the potential off-target sites for sg-1–4 are listed in Supplementary Table S2.
Various donor plasmids were constructed. Details of the cloning work are provided in the Supplementary Data.
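The target-selection rule described above (a 20 bp or 17 bp sequence that starts with guanine and immediately precedes a 5′-NGG-3′ PAM) can be illustrated with a minimal sketch. The function name and the toy input sequence are hypothetical and are not taken from this study; the sketch scans only the given strand and does not perform any off-target scoring, which was done here with the CRISPR Design tool.

```python
import re


def find_sgrna_targets(seq, lengths=(20, 17)):
    """Return (start, protospacer, pam) tuples for G-initiated targets
    that are immediately followed by an NGG PAM on the given strand.

    Hypothetical helper for illustration only; the reverse strand is
    not scanned and no off-target scoring is applied.
    """
    seq = seq.upper()
    hits = []
    for n in lengths:
        # A lookahead is used so that overlapping candidates are all reported.
        pattern = re.compile(r"(?=(G[ACGT]{%d})([ACGT]GG))" % (n - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(1), m.group(2)))
    return sorted(hits)


# Toy sequence (made up): one 20 bp G-initiated target followed by TGG.
toy = "TT" + "G" + "A" * 19 + "TGG" + "CC"
for start, protospacer, pam in find_sgrna_targets(toy):
    print(start, protospacer, pam)
```

To cover both strands, the same function could be applied to the reverse complement of the region of interest.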
LIG4 overexpression construct
Human LIG4 cDNA was amplified by RT-PCR from RNA extracted from wild type LO2 cells, and cloned into the pCAG-ires-Hyg vector at the BglII and XhoI sites (26). Primers used are listed in Supplementary Table S4.
Cell culture
H1 human ESCs (WiCell Research Institute) were cultured as previously described (27), on mitomycin-C-inactivated MEF feeders. Prior to transfection, H1 human ESCs were cultured feeder-free in mTeSR1 medium (Stemcell Technologies), on Matrigel (BD Biosciences). Medium was changed daily and cells were subcultured with collagenase IV (Life Technologies) every three days (27). Human somatic cell lines were obtained from ATCC. LO2 and HEK293T cells were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS); SMMC-7721, BEL-7402, BEL-7404 and H1299 cells were cultured in Roswell Park Memorial Institute (RPMI) 1640 supplemented with 10% FBS; HK2 cells were cultured in 1:1 F-12/DMEM medium supplemented with 10% FBS; and HCT116 cells were cultured in McCoy 5A medium supplemented with 10% FBS. All media and sera were purchased from Life Technologies. All cells were incubated at 37°C and 5% CO₂.
Generation of LIG4 null LO2 cells
Wild type LO2 cells were co-transfected twice with Cas9 together with combined sgLIG4-i–iv. The transfected cells were dissociated into single cells and seeded at low density (2000 cells per 10 cm dish) for clonal expansion. Individual clones were then isolated and analyzed by genome PCR and western blot (details are provided in the Supplementary Data). Primers used are shown in Supplementary Table S4.
Transfection and gene targeting assays
H1 human ESCs were transfected using Amaxa nucleofection (Lonza) according to the manufacturer's instructions. Briefly, human ESCs were dissociated using TrypLE into single cells. For each transfection, 5 × 10⁶ cells were mixed with 100 μl pre-warmed nucleofection reagents (82 μl solution-1 and 18 μl solution-B); the cell suspension was then mixed with 16 μg DNA (6 μg donor + 6 μg Cas9 + 4 μg sgRNA) and electroporated. Electroporated H1 human ESCs were cultured on inactivated MEF feeders (27). Medium was changed daily for 4–5 days and cells were dissociated to prepare single cells for FACS analysis. The estimated transfection efficiency was around 53.5% using 16 μg pEGFP-N1 plasmid.
LO2, HEK293T and HCT116 cells were transfected using Lipofectamine 2000 (Life Technologies). SMMC-7721, BEL-7402, BEL-7404, H1299 and HK2 cells were transfected using FuGENE HD (Promega). Cells were seeded into 12-well plates at a density of 5 × 10⁵ cells/well. For each well, 1.6 μg DNA (0.6 μg donor + 0.6 μg Cas9 + 0.4 μg sgRNA) and 4 μl Lipofectamine 2000 or 6 μl FuGENE HD were used for transfection, following the manufacturer's instructions. When more than one sgRNA was used, 0.4 μg of total sgRNAs, divided equally by the number of plasmids, was added. For the LIG4 rescue assays in Figure 3B, an additional 0.6 μg LIG4 cDNA overexpression plasmid was combined with the 0.6 μg donor + 0.6 μg Cas9 + 0.4 μg sgRNA, and 5.5 μl Lipofectamine 2000 was used for the transfection. The transfected cells were passaged once or twice before FACS analysis (BD LSRFortessa Cell Analyzer). Transfection efficiency in these cell lines was estimated by transfection of 1.6 μg pEGFP-N1 plasmid followed by FACS analysis after 48 h.
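The per-well DNA amounts for the somatic cell transfections can be written out as a short calculation. This is a minimal sketch that only restates the quantities given above (0.6 μg donor, 0.6 μg Cas9, 0.4 μg total sgRNA split equally, plus 0.6 μg LIG4 plasmid for the rescue assays); the function and key names are hypothetical.

```python
def plasmid_mix_ug(n_sgrna_plasmids, lig4_rescue=False):
    """Per-well plasmid amounts (in micrograms) for a 12-well transfection.

    Hypothetical helper: the 0.4 ug sgRNA total is divided equally
    among the sgRNA plasmids, as described in the text.
    """
    if n_sgrna_plasmids < 1:
        raise ValueError("at least one sgRNA plasmid is required")
    mix = {"donor": 0.6, "Cas9": 0.6}
    per_sgrna = 0.4 / n_sgrna_plasmids
    for i in range(n_sgrna_plasmids):
        mix[f"sgRNA_{i + 1}"] = per_sgrna
    if lig4_rescue:
        # Additional LIG4 overexpression plasmid used in the rescue assays.
        mix["LIG4"] = 0.6
    return mix


# Example: two sgRNA plasmids (e.g. sg-A plus one gene-specific sgRNA).
mix = plasmid_mix_ug(2)
print(mix, "total:", round(sum(mix.values()), 2), "ug")
```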
RESULTS
Quantification of HDR-mediated knock-in in various human cells
To directly quantify and compare the efficiency of CRISPR/Cas9-induced HDR-mediated DNA integration (HDR-targeting) across human ESCs and different somatic cell types, we constructed a reporter system targeting the GAPDH locus in the human genome. Three sgRNAs (sg-1–3) were designed to target the GAPDH 3′-UTR in close proximity to the coding sequence (CDS), while a common donor plasmid was generated to carry a promoterless 2a-copGFP sequence flanked by two homology arms (HAs), one at each end, thus named the 2a-copGFP(+HAs) donor (Figure 1A and Supplementary Figure S1A). When the DSBs induced by Cas9/sg-1–3 are successfully repaired via homologous recombination between the genome and donor template, the 2a-copGFP fragment will be inserted in frame with the genomic GAPDH CDS and result in GFP expression (Figure 1A). This allows direct assessment of the knock-in efficiency by FACS analysis.
Consistent with previous studies (19,23), we observed HDR-mediated reporter integration at a low frequency in H1 human ESCs. In the absence of either sgRNAs or Cas9, no GFP+ cells were detected within 10⁵ cells. When the 2a-copGFP(+HAs) donor plasmid was co-transfected with Cas9 and sg-1, 2 or 3, GFP+ cells were observed at frequencies of 0.17–0.36% (Figure 1B) with a transfection efficiency of 53.5% (Supplementary Figure S1B). Variation in targeting efficiencies may reflect the intrinsic properties of different sgRNAs, as indicated by T7E1 assays (Supplementary Figure S1C).
On the other hand, the analysis using this reporter system revealed varied but generally higher efficiencies of CRISPR/Cas9-induced HDR-mediated knock-in in human somatic cell lines. In the presence of Cas9/sg-1, immortalized human cells LO2 showed a targeting efficiency at 5.97%, whereas HK2 and HEK293T cells produced 1.61% and 1.80% GFP+ cells (Figure 1C). Among the human cancer cell lines examined, BEL-7402, BEL-7404 and SMMC-7721 exhibited a targeting efficiency of 1.87%, 1.49% and 4.43% respectively; while H1299 and HCT116 produced 1.60% and 2.59% GFP+ cells, respectively (Figure 1C). Genome PCR and sequencing analysis of the sorted GFP+ cells showed that the 2a-copGFP indeed integrated precisely at the GAPDH 3′-UTR in the genome (Figure 1D,E), which supported that the targeting processes were mediated by the HDR pathway. Transfection efficiency in these somatic cell lines ranged from 50.0% to 73.5% (Supplementary Figure S1D); while frequency of indels induced by Cas9/sg-1, which indicates its genome-targeting activity, ranged from 6.8% to 50.1% in different cell lines (Supplementary Figure S1E). No apparent correlation was observed among the transfection efficiencies, Cas9/sg-1 targeting activities and HDR-mediated knock-in efficiencies. Compared to the fully functional Cas9, nickase Cas9D10A induced reporter knock-in at a lower efficiency (Figure 1C).
Together, these data showed that HDR-mediated DNA integration occurred at varied frequencies in different human cell types; the knock-in frequency was indeed lower in human ESCs than in somatic cells, by approximately 10–20 fold.
Homology-independent knock-in via CRISPR/Cas9-induced NHEJ repair
To explore the potential of CRISPR/Cas9-induced NHEJ in mediating DNA integration (non-homology (NH)-targeting), we constructed two donor plasmids that carry promoterless ires-eGFP, but without homology sequences to the GAPDH locus (Figure 2A). In these NH-donors, we inserted a single sgRNA (sg-A) target site at 5′ of ires-eGFP (single-cut donor), or two sg-A sites at both sides of ires-eGFP (double-cut donor), to introduce cleavage for desired integration and to generate ires-eGFP fragments in different lengths (Figure 2A). The ires element was used to bypass any frameshift caused by NHEJ-introduced indels and to ensure GFP expression after reporter integration.
We co-transfected these NH-donor plasmids with Cas9/sg-A/sg-1, 2 or 3 into LO2 cells. Intriguingly, we detected a high frequency of reporter insertion when the single-cut donor was used. GFP+ cells were detected in the presence of sg-1, 2 or 3 at a frequency of 16.41%, 20.99% and 15.05%, respectively (Figure 2B, top row). The targeting efficiencies decreased with all sg-1–3 when the double-cut donor vector was used to produce a shorter ires-eGFP fragment (Figure 2B, middle row). Importantly, no obvious reporter knock-in could be detected in the absence of either sg-1–3 or sg-A (Figure 2B, left column and bottom row), or when nickase Cas9D10A was used to introduce single strand breaks (SSBs; Figure 2C). This indicated that, unlike HDR-based knock-in, site-specific DSBs in both genome and donor DNAs are stringently required for reporter knock-in at a selected genomic locus via the NH-targeting.
PCR analysis of GFP+ cells produced with the single-cut donor verified the integration of ires-eGFP fragment together with vector backbone at the GAPDH 3′-UTR in the genome (Figure 2D, left panel). Similarly, in the GFP+ cells produced with double-cut donor, PCR analysis confirmed the genomic insertion of the short ires-eGFP fragment, which was located between the two sg-A target sites (Figure 2D, right panel). Sequencing analysis of integration junctions in both types of GFP+ cells confirmed the cleavage by specific sgRNAs as well as the rejoining between genome and donor templates at the cleavage sites (Figure 2E and Supplementary Figure S2), suggesting that the integrations indeed occurred at Cas9/sgRNA-induced DSB sites.
To explore whether this NH-targeting approach could produce stable knock-in clones at high efficiency, we expanded the cells transfected with single-cut NH-donor/Cas9/sg-A/sg-2 or sg-3 at a low density. Among the colonies arising from the unsorted cells, we observed pure GFP+ clones (Supplementary Figure S3A). Among 90 clones randomly isolated from the cells transfected with sg-2, 13 were found to be GFP+ (14.44%). PCR and sequencing analysis confirmed that these clones indeed carried the correct reporter knock-in in their genomes (Supplementary Figure S3B,C), indicating that stable knock-in clones could be generated without any pre-selection.
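The clone count above (13 GFP+ clones out of 90) can be turned into a proportion with a rough uncertainty range. The sketch below uses a Wilson score interval, which is an illustrative choice and not an analysis performed in this study; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is ~95%).

    Illustrative only; this interval was not reported in the study.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


gfp_clones, total_clones = 13, 90  # counts reported in the text
fraction = gfp_clones / total_clones
low, high = wilson_interval(gfp_clones, total_clones)
print(f"GFP+ clones: {100 * fraction:.2f}% "
      f"(approx. 95% interval {100 * low:.1f}-{100 * high:.1f}%)")
```

With only 90 clones the interval is fairly wide, which is worth keeping in mind when comparing the clonal frequency with the FACS-based estimates.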
To further uncover the molecular basis underlying these homology-independent reporter integrations, we generated DNA ligase IV (LIG4) knock-out LO2 cells by deleting large pieces of the LIG4 CDS using Cas9/sgRNAs (Figure 3A). In the two LIG4 knock-out clones examined, we observed a drastic decrease of reporter knock-in after transfection with the single-cut NH-donor/Cas9/sg-A/sg-2, as compared to that in wild type LO2 cells (Figure 3B, left panel, top row). Moreover, the decrease of NH-targeting in these LIG4 null cells could be rescued by a plasmid carrying a LIG4 overexpression cassette (Figure 3B, left panel, bottom row). Consistent with the recent studies by Maruyama et al. (28) and Chu et al. (29), we also observed a significant increase of the HDR-based knock-in of the 2a-copGFP reporter in these LIG4 null cells (Figure 3B, right panel), which correlated with the loss of NHEJ activity. Collectively, these data showed that the homology-independent reporter integrations observed were indeed largely mediated by the conventional DNA ligase IV-dependent NHEJ pathway.
NHEJ-mediated knock-in is non-directional and it accommodates large DNA inserts
Next, we speculated that the linearized NH-donor or fragments might also integrate via NHEJ repair in the reverse direction and result in no GFP expression (Supplementary Figure S4A). Moreover, the cleavages at both sg-A target sites in the double-cut donor likely produced two fragments, which might compete for genomic integration and lower the efficiency of GFP+ integration (Supplementary Figure S4A, right panel). PCR analysis indeed confirmed the presence of these non-GFP expressing integrations (Supplementary Figure S4B). The detection of non-GFP expressing integrations in the sorted GFP+ cells, which carried the correct reporter knock-in in at least one allele, suggested that different integrations might occur at the two genomic alleles in a single cell. These data indicated that NHEJ-mediated knock-in is non-directional and non-selective, and that the GFP+ cells observed represented only a portion of the cells that carried DNA integrations. This also explained why a lower rate of reporter knock-in was observed when the double-cut donor was used. Given that single-cut donor/Cas9/sg-A/sg-2 produced 20.99% GFP+ cells in LO2 cells (Figure 2B), together with non-GFP expressing events, the deduced total frequency of NHEJ-mediated integration at the single target site might reach up to 40%.
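The deduction of the total integration frequency can be sketched as a back-of-the-envelope calculation. The key assumption, which is ours for illustration and is not stated as a measured value in the text, is that forward and reverse integrations are equally likely and that only the forward orientation yields GFP; cells carrying different integrations on the two alleles are ignored.

```python
def total_integration_estimate(gfp_positive_pct, fraction_expressing=0.5):
    """Estimate the total integration frequency (%) from the GFP+ frequency.

    fraction_expressing is the assumed share of integration events that
    produce GFP (0.5 if both orientations are equally likely). This value
    is an assumption for illustration, not a measurement.
    """
    if not 0 < fraction_expressing <= 1:
        raise ValueError("fraction_expressing must be in (0, 1]")
    return gfp_positive_pct / fraction_expressing


# 20.99% GFP+ cells with single-cut donor/Cas9/sg-A/sg-2 (Figure 2B).
print(f"{total_integration_estimate(20.99):.2f}%")
```

Under this simple assumption the estimate comes to roughly 42%, which is of the same order as the "up to 40%" deduced above.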
Off-target effects are a general concern for all CRISPR/Cas9-based technologies (30). Because of its homology-independent and non-directional nature, the NHEJ-mediated knock-in approach has a higher chance of introducing DNA insertion at an off-target site than the HDR approach does. To evaluate the off-target effect, we searched the entire human genome (hg19) for potential off-target sites that contain ≤ 2 mismatches to the sgRNAs used. We found no strong off-target site for sg-A. For sg-1, 2 and 3 targeting GAPDH, we identified 15, 14 and 6 potential off-target sites respectively, and none of these off-targets is located in an exon of a known transcript (Supplementary Table S2). We further selected the top 3 off-targets of sg-2, and performed PCR analysis on off-target integrations. Among the 90 single-cell clones that were expanded previously, none were found to carry reporter integration at off-target site #1, while integrations at off-target sites #2 and #3 were found in two and three clones, respectively. Compared with the number of correct knock-in clones obtained (13 out of 90; Supplementary Figure S3), these results indicated that off-target integrations might occur during the NHEJ-mediated knock-in, but at a much lower frequency than the on-target insertion.
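The mismatch criterion used for the off-target search (sites with ≤ 2 mismatches to the sgRNA) can be illustrated with a toy scan. This is a minimal sketch under stated assumptions: it checks only the forward strand, requires an NGG PAM, and uses made-up sequences; it is not the genome-wide search against hg19 that was performed in this study, and the function names are hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(guide, region, max_mismatches=2):
    """Return (position, site, mismatches) for sites in `region` that are
    followed by an NGG PAM and differ from `guide` at 1..max_mismatches
    positions. Perfect matches (the on-target site) are skipped.
    Forward strand only; illustrative, not a genome-scale search.
    """
    guide, region = guide.upper(), region.upper()
    n = len(guide)
    hits = []
    for i in range(len(region) - n - 2):
        site = region[i:i + n]
        pam = region[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mm = count_mismatches(guide, site)
        if 0 < mm <= max_mismatches:
            hits.append((i, site, mm))
    return hits


# Made-up guide and region: one 2-mismatch site and one perfect match.
guide = "GATTACAGATTACAGATTAC"
near_match = "GATTACAGATTACAGATTGG"
region = "AA" + near_match + "TGG" + "CC" + guide + "AGG"
print(candidate_off_targets(guide, region))
```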
Furthermore, we examined whether the NHEJ-mediated knock-in could accommodate a larger insert. We constructed new plasmids named 12k and 34k NH-donors, by inserting the promoterless ires-eGFP reporter together with the 5′ sg-A target sequence into a large PiggyBac vector (12 kb) and an adenoviral vector (34 kb), respectively. These donors can be cleaved at the sg-A target sequence upon the co-transfection with Cas9/sg-A, thus providing linear donors that carry the ires-eGFP in a 12 kb or 34 kb backbone for NHEJ-based knock-in. After co-transfection with the Cas9/sg-A/sg-2, we detected 7.49% GFP+ cells with the 12k NH-donor, and 1.18% with the 34k NH-donor (Figure 3C, upper panel). Together with the 20.99% GFP+ cells observed using the single-cut NH-donor (4.6 kb; Figure 2B), it was apparent that the knock-in frequencies decreased when larger donors were used. This might be caused, at least partially, by the reduced transfection efficiencies of the larger plasmids (Figure 3C, lower panel). PCR analysis of the transfected cells further confirmed the correct knock-in of these large donors at the GAPDH locus (Figure 3D).
Comparison between the frequencies of HDR- and NHEJ-mediated knock-in
To further clarify whether NHEJ repair facilitates DNA integration at a higher efficiency than HDR does, we constructed another donor plasmid that carries an identical ires-eGFP reporter flanked by homology arms to the GAPDH locus. The 5′ homology arm in this plasmid is longer than that in 2a-copGFP(+HAs) donor, covering the GAPDH stop codon as well as sg-2 and sg-3 target sites (Figure 4A, upper panel). When this donor, namely ires-eGFP(+HAs) donor-1, was co-transfected with Cas9/sg-1 in LO2 cells, we detected HDR-mediated reporter knock-in at 7.11% (Figure 4B, Supplementary Figure S5A–C). This frequency was comparable to that produced by HDR-targeting using the 2a-copGFP(+HAs) donor with Cas9/sg-1 (Figure 1C), but lower than that produced by NH-targeting using either single- or double-cut donor and Cas9/sg-1 (Figure 2B).
Interestingly, when we co-transfected the ires-eGFP(+HAs) donor-1 with Cas9/sg-2 or sg-3, which target the 5′ homology arm in both the genome and the donor plasmid, GFP+ cells increased to 14.75% and 17.36% respectively (Figure 4B). These knock-in efficiencies were comparable to NH-targeting with the single-cut donor (Figure 2B, top row). Genome PCR and sequencing confirmed end-joining between genome and donor plasmids beyond the 3′ homology arm (Supplementary Figure S5A,B,D), suggesting that Cas9/sg-2 or sg-3 cleaved both genomic and donor DNAs, and induced NHEJ-mediated integration of the reporters. On the other hand, when Cas9/sg-4 was used to target the 3′ homology arm, GFP+ cells decreased to 10.06% (Figure 4B). Sequencing analysis detected no indels at the 5′-junctions (Supplementary Figure S5A,B,E), suggesting that the intact 5′ homology arms mediated HDR-based integrations, which explained the knock-in observed at a lower frequency.
Next, we constructed the ires-eGFP(+HAs) donor-2 by using a shortened 5′ homology arm that does not contain the sg-2 and sg-3 target sites (Figure 4A, upper panel). This plasmid will not be cleaved by Cas9/sg-2 or sg-3 and can only serve as donor for HDR-based knock-in. Indeed, co-transfection of Cas9/sg-2 with this new donor yielded 6.46% GFP+ cells (Figure 4C, top row). This frequency was much lower than the NHEJ-based knock-in introduced with ires-eGFP(+HAs) donor-1/Cas9/sg-2 (Figure 4B), while it was comparable to the HDR-mediated reporter integrations produced using Cas9/sg-1 together with either type of the (+HAs) donors (Figures 1C and 4B,C).
To compare the NHEJ- and HDR-based knock-in under identical conditions, we further examined HDR-mediated reporter insertion using a linearized donor. We constructed the ires-eGFP(+HAs) donor-2.A and donor-2.B, by inserting a sg-A target sequence at the 3′ or 5′ of the ires-eGFP(+HAs) cassette, respectively (Figure 4A). These donors thus can be cleaved at the sg-A target site by Cas9/sg-A to provide linear templates carrying homology arms. Using the ires-eGFP(+HAs) donor-2.A in the presence of sg-A, we observed 7.30% GFP+ cells with sg-1, and 7.42% with sg-2 (Figure 4C, third row), which were indeed higher than the results obtained using circular donors (Donor-2, or Donor-2.A and 2.B without sg-A; Figure 4C, top, second and fourth rows). These frequencies, however, were still much lower than those produced through NHEJ-based reporter knock-in (Figure 2B, top row; and Figure 4B, with sg-2 and sg-3). Interestingly, using the ires-eGFP(+HAs) donor-2.B and Cas9/sg-A, we observed 19.75% GFP+ cells with sg-1, and 27.23% with sg-2 (Figure 4C, bottom row). This indicated that the linearized donor-2.B enabled NHEJ-based knock-in, and the high proportion of GFP+ cells likely represented the combined result of both NHEJ- and HDR-mediated GFP+ knock-in events.
Collectively, these data are consistent with the results observed using 2a-copGFP(+HAs) donor (Figure 1C) or single-cut NH-donor (Figure 2B); and they clearly showed that the simultaneous introduction of DSBs in genome and donors induced targeted DNA integration via NHEJ, at a higher efficiency compared with that mediated by an HDR-based approach.
CRISPR/Cas9-coupled NHEJ introduces efficient knock-in at both active and silenced gene loci
Next, we examined whether the chromatin architecture in a local genomic context influences the efficiency of NHEJ-mediated reporter knock-in, by targeting another actively transcribed locus ACTB and several silenced gene loci, including SOX17, T, OCT4, NANOG and PAX6.
We designed two sgRNAs targeting ACTB 3′-UTR to examine the HDR- and NHEJ-mediated knock-in at the ACTB locus. By co-transfecting the single-cut NH-donor/Cas9/sg-A together with sgACTB-i or sgACTB-ii, we observed GFP+ cells at 10.25% and 15.27%, respectively (Figure 4D, lower left panel, top row). Using the newly constructed ACTB HDR-donor, which carried the ires-eGFP flanked by homology arms to ACTB gene locus, we observed the HDR-based knock-in at 2.38% with sgACTB-i, and 8.60% with sgACTB-ii (Figure 4D, lower left panel, bottom row). Both the NHEJ- and HDR-based knock-in frequencies were comparable to that observed at the GAPDH locus.
In order to examine knock-in at a silenced gene locus directly by FACS analysis, we employed the PGK-eGFP reporter (Figure 4D, upper right panel), which will express GFP after integration regardless of whether the target locus is actively transcribed or not. We constructed a constant expression (CE) NH-donor which carries the sg-A target sequence at 5′ of the PGK-eGFP cassette; meanwhile, we generated sgRNAs targeting the SOX17 and T 3′-UTRs. It is noteworthy that because the expression of the PGK-eGFP reporter is independent of integration orientation, the GFP+ cells observed in these assays represented knock-in events in either orientation. After transfection with the CE NH-donor/Cas9/sg-A and one of the gene-specific sgRNAs, the LO2 cells were maintained for five passages to eliminate the transient GFP expression before FACS analysis. Indeed, we detected 26.25% and 32.04% GFP+ cells for sgSOX17-i and sgSOX17-ii respectively, and observed 16.00% GFP+ cells with sgT-i (Figure 4D, lower right panel, top row). In contrast, only around 2–3% GFP+ cells were observed in the absence of gene-specific sgRNA; and around 1% GFP+ cells were detected in the absence of sg-A. Using this CE NH-donor, we also examined the NHEJ-mediated knock-in at various positions of the OCT4, NANOG, T and PAX6 gene loci, which are largely silenced in LO2 cells. Indeed, we observed varied knock-in frequencies, which correlated neither with the target positions in a gene, nor with the transcriptional status of the target loci (Supplementary Figure S6A,B), suggesting that the actual targeting efficiency was largely determined by the intrinsic properties of an sgRNA.
Furthermore, we examined the HDR-based knock-in at the SOX17 and T genomic loci, using donor plasmids carrying PGK-eGFP flanked by homology arms to SOX17 or T genomic regions respectively. Similarly, the transfected cells were passaged five times before FACS analysis. By transfecting the SOX17 HDR-donor together with Cas9/sgSOX17-i or sgSOX17-ii, we observed 1.30% and 2.83% GFP+ cells, which indicated HDR-mediated knock-in at the SOX17 locus; while use of the T HDR-donor together with Cas9/sgT-i produced 1.59% GFP+ cells (Figure 4D, lower right panel, bottom row). These frequencies were indeed much lower than those produced by the NHEJ-based knock-in at the same target sites (Figure 4D, lower right panel, top two rows). Moreover, they were also lower than the HDR-based knock-in observed at the actively transcribed GAPDH and ACTB loci (Figure 4B,C and D, lower left panel, bottom row), which is consistent with previous studies showing that active transcription enhances homologous recombination (31,32).
Collectively, these results indicated that CRISPR/Cas9-coupled NHEJ could mediate efficient knock-in at both active and silenced gene loci, and the efficiencies were higher than that produced by an HDR-based approach.
Efficient knock-in via CRISPR/Cas9-coupled NHEJ in human ESCs and somatic cell lines
Using the ires-eGFP donors with or without HAs, we examined the efficiency of NHEJ-mediated reporter knock-in in human ESCs. Indeed, the reporter knock-in observed was more efficient compared to that introduced using the HDR-based approach. Co-transfection of single-cut NH-donor/Cas9/sg-A/sg-1 produced 0.83% GFP+ cells, and the proportion of GFP+ cells increased up to 1.70% when sg-2 was used (Figure 5A, left panel). These NHEJ-mediated integrations indeed occurred at a higher frequency than the HDR-based knock-in, induced either by Cas9/sg-1–3 with the 2a-copGFP(+HAs) donor (Figure 1B), or by Cas9/sg-1 with the ires-eGFP(+HAs) donor-1 (Figure 5A, right panel, sg-1). Consistently, when we co-transfected the ires-eGFP(+HAs) donor-1 with Cas9/sg-2, which can simultaneously cleave both genome and donor DNAs and induce NHEJ-mediated donor integration, the yield of GFP+ cells increased to 0.93% (Figure 5A, right panel, sg-2). This insertion rate is also higher than that in HDR-targeting (Figure 1B and Figure 5A, right panel, sg-1), and closer to that produced with the single-cut NH-donor/Cas9/sg-A/sg-1 or sg-2 (Figure 5A, left panel). We further sorted the GFP+ cells produced with single-cut donor/Cas9/sg-A/sg-2. These cells maintained the human ESC morphology in culture and expressed the pluripotency markers OCT4 and TRA-1–60 (Figure 5B), suggesting that the NHEJ knock-in process did not interfere with the maintenance of pluripotency state, and it allows the generation of stable knock-in cells simply by FACS sorting.
To verify whether the CRISPR/Cas9-coupled NHEJ-targeting strategy can knock-in the reporter into other genomic loci in human ESCs efficiently, we co-transfected the single-cut NH-donor/Cas9/sg-A together with sgRNAs targeting the OCT4 or ACTB genes at their 3′-UTRs. Both OCT4 and ACTB genes are actively transcribed in human ESCs; hence, knock-in of ires-eGFP reporter at their 3′-UTRs will produce GFP and allow direct analysis by FACS. Indeed, we observed 0.55% and 0.40% GFP+ cells in the H1 human ESCs transfected with sgOCT4-iv or sgACTB-ii, respectively (Supplementary Figure S7A). PCR and sequencing analysis on the OCT4 locus further confirmed the integration of single-cut donor at the target site (Supplementary Figure S7B,C). Collectively, these data showed that CRISPR/Cas9-coupled NHEJ repair can mediate efficient knock-in of reporter genes into a selected genomic locus in human ESCs.
To compare in a broader range of human cells, we further quantified NHEJ- and HDR-mediated reporter knock-in in other human somatic cell lines. We found that, indeed, the NHEJ-based reporter knock-in was more efficient than HDR-mediated integration in all human somatic cell lines examined (Figure 5C). The proportion of GFP+ cells ranged from 2.76% in HCT116 cells to 18.42% in SMMC-7721 cells, when single-cut donor/Cas9/sg-A/sg-1 were used (Figure 5C). Interestingly, LO2, BEL-7402, and SMMC-7721 cells exhibited relatively higher efficiency in both NHEJ- and HDR-mediated targeting, whereas HCT116, H1299 and HK2 were relatively inefficient in both targeting strategies (Figure 5C). Notably, among all the cell lines examined, human ESCs showed the lowest efficiency in both NHEJ- and HDR-targeting (Figure 5C). These results implied that there may be intrinsic restrictions hampering efficient gene targeting in human ESCs, via either HDR or NHEJ repair. This observation is consistent with previous literature (19,33,34), suggesting that human ESCs possess unique properties in repairing DNA damage. Therefore, further investigation is needed to uncover the underlying mechanisms and resolve existing discrepancies regarding DNA repair in human ESCs.
In summary, our results demonstrate that NHEJ repair enables efficient rejoining of genome and plasmid DNA following CRISPR/Cas9-induced DSBs, which permits knock-in of large DNA fragments at markedly higher efficiency than HDR-mediated integration in all human cell lines examined. These data establish CRISPR/Cas9-coupled NHEJ repair as a valuable path for efficient knock-in in human ESCs and somatic cells (Figure 5D), with great potential in biomedical research and therapeutic applications.
Efficient knock-in of exogenous DNA is a highly desirable technology for studies carried out in human cells. Following previous far-reaching success in generating genetically modified mice (35,36), tremendous effort has been made to exploit HDR-mediated approaches for precise DNA insertion or replacement at a selected genomic locus (18,37). Even after the emergence of ZFNs, TALENs and CRISPR/Cas9 technologies, the competence and potential of other DNA repair mechanisms were not well explored, and most gene targeting studies still focused on HDR-mediated approaches to introduce genomic knock-in (19–21,23,28–29).
In fact, NHEJ repair is predominant in mammalian cells. Random DNA integration via NHEJ has been widely used to generate transgenic animals and cell lines, and its frequency was estimated to be over 1000-fold higher than that of HDR-mediated DNA insertion (37). In studies of HDR-based gene targeting using TALEN and CRISPR/Cas9, NHEJ-introduced indels were found to occur at a higher frequency than HDR-mediated reporter knock-in (38), and NHEJ-based knock-in was prevalent when single-stranded oligonucleotide donors were provided for HDR-mediated gene correction (39). However, very few studies have directly investigated the nature of these NHEJ-based integration processes, and their potential as a biological technology did not attract attention until recently. Following pioneering studies that showed NHEJ-mediated capture of exogenous DNA at genomic DSBs (40), Orlando et al. (2010) first found that short oligonucleotides (<100 bp) could be inserted efficiently at ZFN-induced genomic DSBs via NHEJ repair (41). By introducing a ZFN or TALEN target sequence into donor plasmids, Cristea et al. and Maresca et al. (2013) further showed that simultaneous nuclease cleavage of both plasmid and genomic DNA enabled targeted integration of large plasmid DNA at genomic DSBs via NHEJ repair (42,43). This design was then coupled to the CRISPR/Cas9 system for reporter knock-in in zebrafish (44–48) and Xenopus (49), in which HDR-mediated gene insertion was extremely inefficient. To date, the potential of CRISPR/Cas9-induced NHEJ in mediating large DNA insertion has not been systematically investigated in human cells, and efficient knock-in in human ESCs remains a challenge.
In this study, we constructed promoterless fluorescent reporters targeted to the GAPDH locus in the human genome. The ubiquitously active nature of the GAPDH gene enables GFP expression upon correct knock-in, allowing rapid assessment by FACS and direct comparison among different human cell types without the cumbersome process of raising single-cell clones. In addition, we employed an sg-A target site taken from a prokaryotic DNA sequence that has no homology to the human genome. This makes Cas9/sg-A universally applicable for providing linearized donors for NHEJ knock-in and suitable for direct comparison across multiple human cell lines. Using these reporter systems, with or without homology arms, we directly quantified homology-dependent and homology-independent reporter knock-in in various human cell lines. We found that CRISPR/Cas9-coupled NHEJ indeed mediates efficient DNA insertion when DSBs are introduced into both genome and donor DNAs, and that the knock-in efficiency is much higher than that of the HDR-based approach in all human cell lines examined. Our analysis using LIG4 null LO2 cells showed that the high-frequency homology-independent knock-in events were largely mediated by the conventional NHEJ (C-NHEJ) pathway. Clonal expansion further demonstrated that, with this NHEJ-based targeting approach, stable knock-in clones could be generated in a fluorescence-independent, selection-free manner. Together, these data suggest that CRISPR/Cas9-coupled NHEJ repair can provide a valuable path for efficient knock-in in human cells. The results also establish our reporter systems as valuable tools for rapid quantification of HDR and NHEJ activities across different cell lines or clones, which could be useful for dissecting a given molecular pathway or for screening therapeutic compounds that restore the impaired DNA repair responsible for human diseases.
In addition, we have uncovered distinct features of NHEJ-mediated knock-in. Unlike HDR-based DNA insertion, NHEJ-based knock-in strictly relies on the presence of DSBs in both genome and donor DNAs; it allows integration in either orientation, and it can accommodate insertion of large DNAs (up to 34 kb). Importantly, its homology-independent nature also gives this NHEJ knock-in approach an advantage in targeting silenced genomic loci, which have been shown to be difficult to access through traditional HDR-based knock-in strategies (50,51). Currently, the CE-NH donor we employed still produces non-specific GFP signal due to transient expression and random integration; hence, further work is needed to improve the reporter design for knock-in at a silenced gene locus.
On the other hand, we showed that the high-efficiency NHEJ knock-in approach could potentially introduce undesired DNA integration at low frequency, due to off-target cleavage by Cas9/sgRNA (30). It is possible that off-target integrations may also produce GFP if the off-target locus is actively transcribed or if the PGK-eGFP reporter is used. Therefore, it is important to follow high-stringency criteria in sgRNA design to minimize off-target effects. Use of Cas9 that has been further optimized to reduce off-target effects is also likely to be beneficial (52). Moreover, as shown by our results as well as by other studies (38,39), high-frequency NHEJ-mediated repair events may occur in many types of genome editing without being detected in a particular assay; hence, unwanted genomic modification should be considered and controlled for during data analysis and interpretation. Together, recognizing and controlling for these limitations is important for further improving CRISPR/Cas9-based genome editing technology, via either HDR- or NHEJ-mediated DNA repair.
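As an illustration of what a stringency check in sgRNA design might involve, the following minimal Python sketch scans a reference sequence on both strands for 20-nt sites that are followed by a 5′-NGG-3′ PAM and differ from the guide by no more than a chosen number of mismatches. It is not the procedure used in this study: the function names, the mismatch threshold and the toy sequences are hypothetical, and real off-target prediction tools weigh mismatch position and other factors that are ignored here.

```python
# Illustrative sketch only: function names, threshold and sequences are
# hypothetical and do not come from the study.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_candidate_sites(guide: str, reference: str, max_mismatches: int = 3):
    """List sites resembling the guide that are followed by an NGG PAM.

    Returns tuples (strand, start, site, mismatches). For the "-" strand,
    start is the index within the reverse-complemented reference.
    """
    guide = guide.upper()
    n = len(guide)
    hits = []
    strands = (("+", reference.upper()), ("-", reverse_complement(reference)))
    for strand, seq in strands:
        for i in range(len(seq) - n - 3 + 1):
            site = seq[i:i + n]
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":  # require an NGG PAM 3' of the site
                continue
            mismatches = count_mismatches(guide, site)
            if mismatches <= max_mismatches:
                hits.append((strand, i, site, mismatches))
    return hits


if __name__ == "__main__":
    toy_guide = "GACGTTACCGGATCAAGCTT"      # hypothetical 20-nt guide
    toy_near_match = "TACGTTACCGAATCAAGCTT"  # same guide with two changes
    toy_reference = "TTT" + toy_guide + "TGG" + "AAAA" + toy_near_match + "AGG" + "CC"
    for hit in find_candidate_sites(toy_guide, toy_reference):
        print(hit)
```

Under these assumptions, a guide would be preferred when the scan returns only the intended site at a low mismatch threshold; any additional hits mark loci where undesired integration should be checked.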
Interestingly, we observed substantial precise ligation between cleaved plasmid and genome DNAs during NHEJ-mediated knock-in (Figure 2E and Supplementary Figure S4). This occurred even though the C-NHEJ machinery often introduces indels when repairing non-compatible or damaged DNA ends (9), the alternative NHEJ (A-NHEJ) pathway initiates repair by single-strand resection (9), and the 3′-5′ exonuclease activity of Cas9 can also introduce deletions by trimming the DNA ends (7). Our observation is consistent with the analyses of ZFN- or TALEN-induced DNA integration by Cristea et al. and Maresca et al. (42,43), as well as of CRISPR/Cas9-induced chromosomal inversions by Li et al. (53). These data, together with previous evidence of precise repair by C-NHEJ (54,55), support the view that the C-NHEJ pathway can largely mediate precise ligation of DNA ends generated by engineered nucleases, pointing to greater potential for NHEJ-mediated gene targeting in a wider range of applications.
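To make the distinction between precise ligation and indel-bearing junctions concrete, the following minimal Python sketch builds the junction expected from direct end joining and compares a sequenced junction against it. It rests on stated assumptions rather than on the analysis performed in this study: Cas9 is taken to leave blunt ends 3 bp upstream of the PAM, only one of the two possible donor orientations is modeled, the observed read is assumed to be already trimmed to the same window as the expected junction, and all names and sequences are hypothetical.

```python
# Illustrative sketch only: names and sequences are hypothetical.

CUT_OFFSET = 3  # assumption: blunt cut 3 bp upstream (5') of the PAM


def split_at_cut(target_with_pam: str, pam_len: int = 3):
    """Split a target written 5'->3' with its PAM at the 3' end at the cut site."""
    cut = len(target_with_pam) - pam_len - CUT_OFFSET
    return target_with_pam[:cut], target_with_pam[cut:]


def expected_precise_junction(genome_site: str, donor_site: str) -> str:
    """Junction expected if the genomic PAM-distal end is ligated directly
    to the donor PAM-proximal end, with no bases gained or lost.

    Only this one orientation is modeled; the opposite orientation would
    join the genomic end to the reverse complement of the other donor end.
    """
    genome_left, _ = split_at_cut(genome_site)
    _, donor_right = split_at_cut(donor_site)
    return genome_left + donor_right


def classify_junction(observed: str, expected: str) -> str:
    """Crude length-based classification of an observed junction read."""
    if observed == expected:
        return "precise"
    if len(observed) < len(expected):
        return "deletion"
    if len(observed) > len(expected):
        return "insertion"
    return "substitution"


if __name__ == "__main__":
    toy_genome_site = "ACGTACGTACGTACGTACGT" + "TGG"  # hypothetical site + PAM
    toy_donor_site = "TTGACCATGCAAGGTCCATA" + "AGG"   # hypothetical site + PAM
    expected = expected_precise_junction(toy_genome_site, toy_donor_site)
    print(classify_junction(expected, expected))
    print(classify_junction(expected[:15] + expected[17:], expected))
```

Tallying the labels over all sequenced junctions would give the fraction of precise ligations; a real analysis would align reads rather than compare lengths, so this serves only to fix the definitions.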
Supplementary Data are available at NAR Online.
We thank Huck-Hui Ng, Wenyi Feng, Jia-Hui Ng, Wing Ki Wong and Tsz Yau Wong Gerald for critical comments on the manuscript.
Research Grants Council of Hong Kong [CUHK478812, CUHK14102214 and CUHK14104614 to B.F.; HKUST T13-607/12R to Y.I.]; National Natural Science Foundation of China [NSFC 31171433 to B.F., in part]; National Basic Research Program of China [973-Program 2015CB964700 to Y.L., in part]; Shenzhen SZSIA foundation [JCYJ20140425184428469 to J.R., in part]. X.J., C.T. and W.Y. are supported by the CUHK graduate school scholarship. Funding for open access charge: The Research Grants Council of Hong Kong; National Basic Research Program of China.
Conflict of interest statement. None declared.