In-depth assessment of the PAM compatibility and editing activities of Cas9 variants

Abstract
A series of Cas9 variants have been developed to improve the editing fidelity or targeting range of CRISPR–Cas9. Here, we employ a high-throughput sequencing approach, primer-extension-mediated sequencing (PEM-seq), to analyze in depth the editing efficiency, specificity and protospacer adjacent motif (PAM) compatibility of a dozen SpCas9 variants at multiple target sites. Our findings validate the high fidelity or broad editing range of these SpCas9 variants. For the PAM-flexible SpCas9 variants, we detect significantly increased levels of off-target activity and propose a trade-off between targeting range and editing specificity, especially for the near-PAM-less SpRY. Moreover, we use a deep learning model to verify the consistency and predictability of SpRY off-target sites. Furthermore, we combine high-fidelity SpCas9 variants with SpRY to generate three new SpCas9 variants with both high fidelity and broad editing range. Finally, we find that the existing SpCas9 variants are not effective in suppressing genome instability elicited by CRISPR–Cas9 editing, raising an urgent issue to be addressed.

PAM contributes to the targeting specificity of CRISPR-Cas9 by adding extra essential nucleotides that are critical for Cas9 binding (19). However, PAM also limits the targeting scope of CRISPR-Cas9 as well as similar Cas-involved genome editing toolboxes. To broaden the targeting range, several PAM-flexible SpCas9 variants have been engineered. Cas9-NG, xCas9(3.7) and SpG require only an NGN PAM compared to the original NGG for SpCas9 (20-22). The recently reported SpCas9 variant SpRY is even able to target DNA sequences bearing NNN PAMs, though it exhibits higher activity at NRN than at NYN PAMs (R for A or G, Y for C or T) (22). These PAM-flexible SpCas9 variants are especially useful for base editors, which are often locus restricted (23).
To comprehensively evaluate the editing efficiency, targeting specificity, PAM compatibility and genome integrity of genome editing exerted by high-fidelity or PAM-flexible SpCas9 variants, we employed the high-throughput primer-extension-mediated sequencing (PEM-seq) (17) assay for in-depth analysis at target sites with different types of PAMs. We validated the activity of these SpCas9 variants and also found a trade-off between targeting efficiency and specificity for high-fidelity SpCas9 variants. We compared the targeting range of four PAM-flexible SpCas9 variants and used a deep learning model to investigate the off-target activity of the near-PAM-less SpRY. Moreover, we uncovered chromatin abnormality induced by these SpCas9 variants, which was invisible to previous analyses. Finally, we combined high-fidelity variants with SpRY to generate several high-fidelity SpCas9 variants with a broad targeting range. This study provides more insight into the varied activity of high-fidelity and PAM-flexible SpCas9 variants and can shed light on further engineering of CRISPR-Cas9.

Plasmid construction
For a fair comparison among different SpCas9 variants, we generated all SpCas9 variants from the same parental SpCas9 on the plasmid pX330 (Addgene 42230) backbone. SpCas9 variants were generated by site-directed mutagenesis using Gibson assembly (New England Biolabs). The mutation information is shown in Supplementary Figures S1A and S6. All the plasmids have the same codon optimization, NLS configuration and a CMV-driven mCherry. The sgRNA was cloned into a separate plasmid with a CMV-driven GFP. The sgRNA sequences are shown in Supplementary Table S1.
A total of 3 μg of the Cas9 plasmid and 3 μg of the sgRNA plasmid were co-transfected into HEK293T cells in a 6-cm dish using 18 μl of 1 mg/ml PEI (Sigma). Cells were harvested 72 h post-transfection and were sorted by fluorescence-activated cell sorting (FACS, MoFlo XDP, Beckman Coulter) according to mCherry and GFP, followed by genomic DNA extraction.

T7EI cleavage assay
General procedures followed the method described previously (17). FastPfu (TransGen) DNA polymerase was used for general polymerase chain reaction (PCR), followed by purification, denaturation and reannealing of the PCR products. T7EI (New England Biolabs) was then used to digest the PCR products, followed by electrophoresis. Primer sequences for each target site are listed in Supplementary Table S1.

PEM-seq operation and analysis
PEM-seq library construction and analysis of off-targets, translocations and large deletions followed (17,24). In general, a biotinylated primer was designed within 150 bp of the Cas9-targeting site for primer extension, and a site-specific nested primer was designed for the subsequent amplification. All the PEM-seq libraries were sequenced on Illumina HiSeq. For off-target analysis, junctions proximal to the break site (±20 kb) were excluded and MACS2 callpeak was used to identify translocation-enriched regions. Off-target hotspots were defined as having fewer than eight mismatches with the on-target site and more than three junctions at the presumable cutting site. Translocations from general double-stranded breaks (DSBs) were calculated by excluding junctions ±20 kb around the target sites and ±100 bp around the off-target sites.
The primer sequence is shown in Supplementary Table. Plasmid insertion analysis followed (24).
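The hotspot and general-DSB criteria above can be written down as a small filter. The following is only a minimal sketch under stated assumptions: the `Candidate` record, its field names and the function names are hypothetical, the MACS2 peak-calling step is not reproduced, and the actual PEM-seq pipeline (17,24) may apply these thresholds differently.

```python
from dataclasses import dataclass

BAIT_WINDOW = 20_000       # junctions within +/-20 kb of the break site are excluded
OFF_TARGET_WINDOW = 100    # +/-100 bp around off-target sites (general-DSB analysis)
MAX_MISMATCHES = 8         # hotspots have fewer than eight mismatches
MIN_JUNCTIONS = 3          # hotspots have more than three junctions


@dataclass
class Candidate:
    """Hypothetical record for one candidate off-target region."""
    chrom: str
    cut_pos: int      # presumable cutting site (bp)
    mismatches: int   # mismatches relative to the on-target site
    junctions: int    # junctions observed at the presumable cutting site


def near_target(chrom, pos, target_chrom, target_pos):
    """True if a position lies within the excluded window around the break site."""
    return chrom == target_chrom and abs(pos - target_pos) <= BAIT_WINDOW


def call_hotspots(candidates, target_chrom, target_pos):
    """Keep candidates that satisfy both hotspot criteria stated in the text."""
    return [
        c for c in candidates
        if not near_target(c.chrom, c.cut_pos, target_chrom, target_pos)
        and c.mismatches < MAX_MISMATCHES
        and c.junctions > MIN_JUNCTIONS
    ]


def is_general_dsb_junction(chrom, pos, target_chrom, target_pos, hotspots):
    """True if a junction is neither near the target nor near any off-target hotspot."""
    if near_target(chrom, pos, target_chrom, target_pos):
        return False
    return not any(
        chrom == h.chrom and abs(pos - h.cut_pos) <= OFF_TARGET_WINDOW
        for h in hotspots
    )
```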

Deep learning for SpRY off-targets
The general procedure followed (25). The input is a code matrix with a shape of 23 (sgRNA and PAM) × 4 (A, T, C, G). The first layer is a convolutional layer, which extracts matching information. The second layer is a batch normalization (BN) layer, which reduces internal covariate shift in the neural network to speed up learning and avoid overfitting. The third layer is a global max-pooling layer connected to the previous BN layer to call whether the mismatches modeled by the respective BN layer exist in the input sequence or not. The following layers are two dense layers consisting of 100 and 23 neurons, respectively. A dropout layer is applied to the last dense layer to avoid overfitting, and the final output layer consists of one neuron using the sigmoid function. The input data for training are divided into two types: true off-targets detected by PEM-seq and false off-targets, which are randomly generated sequences that have more than 10 mismatches with the target site. The model was trained for 30 cycles. For prediction, genomic sequences that have fewer than eight mismatches with the target sequence were retrieved and subjected to prediction.
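To make the input format concrete, the encoding and the negative-example rule described above can be sketched as follows. This is an illustration only, not the code from (25): the function names are hypothetical, the column order simply follows the order given in the text (A, T, C, G), and counting mismatches over all 23 positions (sgRNA plus PAM) is an assumption.

```python
import random

BASES = "ATCG"   # column order as listed in the text: A, T, C, G
SITE_LEN = 23    # 20-nt sgRNA target plus 3-nt PAM


def one_hot(seq):
    """Encode a 23-nt site as a 23 x 4 matrix (list of rows)."""
    if len(seq) != SITE_LEN:
        raise ValueError(f"expected {SITE_LEN} nt, got {len(seq)}")
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def random_decoy(target, min_mismatches=11, rng=random):
    """Draw a random 23-mer with more than 10 mismatches to the target.

    Assumption: mismatches are counted over the whole 23-nt site.
    """
    while True:
        candidate = "".join(rng.choice(BASES) for _ in range(SITE_LEN))
        if count_mismatches(candidate, target) >= min_mismatches:
            return candidate
```

A random 23-mer differs from any fixed target at about 17 positions on average, so the rejection loop in `random_decoy` rarely needs more than one draw.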

Statistical analysis
The Wilcoxon matched-pairs signed-rank test was used. P < 0.05 was considered significant.
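For readers unfamiliar with the test, the statistic it is built on can be computed as in the sketch below. This is a generic illustration of the signed-rank statistic, not the software used in this study (which is not stated); the function name is hypothetical, and the P value, which requires the null distribution of the statistic, is not computed here.

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1   # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```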

Activities of high-fidelity and PAM-flexible SpCas9 variants at NGG loci
To extensively assess the editing activities of SpCas9 and SpCas9 variants, we employed PEM-seq to capture various editing outcomes including small insertions/deletions.
Figure 1. (A) Plasmids carrying Cas9-mCherry and sgRNA-GFP were co-transfected into HEK293T cells, followed by FACS and PEM-seq about 72 h later. PEM-seq can simultaneously detect small indels, large deletions and chromosomal translocations with off-target or general DSBs. (B) Editing efficiency for SpCas9 variants at the indicated NGG loci detected by PEM-seq. Editing efficiency refers to the total percentage of indels, large deletions and translocations. (C) Off-target numbers for SpCas9 variants at the indicated NGG loci detected by PEM-seq. '-' indicates nearly no editing activity (defined as editing efficiency <2%). (D) Gene annotation for SpRY off-targets at the RAG1 locus using KEGG of Enrichr (maayanlab.cloud/Enrichr/). The horizontal axis indicates the gene numbers in the related pathways. (E) Consensus sequence analysis by weblogo (weblogo.berkeley.edu) for SpCas9 and SpRY off-targets at the RAG1 locus detected by PEM-seq. The on-target sequence is marked below and the positions of the sgRNA and PAM are labeled above. (F) Statistics for the second and third nucleotides of the PAM for SpCas9 or SpRY off-targets at the RAG1 locus detected by PEM-seq.
SpCas9 and all the tested high-fidelity SpCas9 variants were able to induce substantial cleavage at the five target sites, except that evoCas9 showed almost undetectable cleavage activity at the RAG1 and DNMT1 sites (Figure 1B). The other high-fidelity SpCas9 variants showed editing efficiencies comparable to SpCas9 at these sites, despite some differences at certain sites for some variants (Figure 1B). As anticipated, all the high-fidelity variants generally showed significantly lower levels of off-target activity than SpCas9, with LZ3 and Sniper being the least specific (Figure 1C). Moreover, the off-target sites identified for high-fidelity variants also occurred in the PEM-seq library of SpCas9, as exemplified by the data from the RAG1 target site (Supplementary Figure S1B and Table S1), indicating that these variants have a targeting range similar to that of SpCas9. A trade-off between editing efficiency and specificity was also found for high-fidelity SpCas9 variants (Supplementary Figure S1C), consistent with previous reports (18,26).
With regard to the PAM-flexible variants, the editing efficiencies of the four variants at the tested NGG-PAM sites were generally lower than that of SpCas9, though still sufficient to induce efficient gene editing at the target sites (Figure 1B). Though fewer off-targets were detected in xCas9 samples, many more off-targets were found for Cas9-NG, SpG and especially SpRY, except at the VEGFA site, which has several very strong off-target sites harboring NGG PAMs (Figure 1C and Supplementary Table S1). For the RAG1 site, a total of 188 off-targets were identified for SpRY, and 109 of these off-targets lie in genes involved in different molecular pathways including viral infection and cancers (27) (Figure 1D). Specifically, the BCL6 gene, one of the off-targets, has been implicated in a variety of tumors, such as B-acute lymphoblastic leukemia and non-small cell lung cancer (28). Moreover, we sought to validate some top off-targets of SpRY at these NGG loci by T7EI assay. Though T7EI is less sensitive than sequencing, cleavage was still detected at 8 out of 10 tested sites, the exceptions being the third off-target of C-MYC and the second off-target of VEGFA (Supplementary Figure S1D).
The consensus sequence of SpRY off-targets is relatively less conserved in the PAM-distal region of the sgRNA body, displaying a mismatch pattern similar to that of SpCas9 (Figure 1E). Nonetheless, more off-targets of SpRY harbored higher numbers of mismatches than those of SpCas9, as exemplified by the RAG1 and EMX1 sites (Supplementary Figure S1E). The consensus PAM sequence for the off-targets of SpCas9 resembled NGG, while SpRY showed no particular preferred nucleotide at the second or third position, with a moderate bias for NRN over NYN (R for A or G, Y for C or T; Figure 1E), consistent with the initial report of SpRY (22). Collectively, a broader PAM scope and higher tolerance of mismatches lead to greatly increased off-target activity for SpRY. With regard to the other variants, off-targets with NGN PAMs are favored by xCas9, Cas9-NG and SpG, in line with their PAM preference (Supplementary Figure S1F) (20-22).

Activities of PAM-flexible variants at NGH loci
To further assess the PAM compatibility of these PAM-flexible SpCas9 variants at NGH PAMs (NGA, NGT, or NGC) in human cells, we designed five target sites for each type of PAM at genes including TRAC, EMX1, HBA1, FANCF and C-MYC. We then used PEM-seq for in-depth analysis of CRISPR editing at these target loci in HEK293T cells. SpCas9 only exhibited detectable cleavage activity at the target sites with an NGA PAM (Figure 2A), in line with previous reports that NGA is also targetable by CRISPR-Cas9 (29). Cas9-NG, SpG and SpRY showed robust editing activity at most target sites, except two NGT sites in the PTEN and FANCF genes and an NGC site in the TP53 gene; however, xCas9 showed the lowest editing capacity, and its cleavage was almost undetectable at most tested sites regardless of the PAM composition (Figure 2A). Correspondingly, we detected from several to tens of off-targets for these PAM-flexible variants at the tested sites, and SpRY universally cleaved at more off-target sites than the other variants (Figure 2B and Supplementary Table S1). Moreover, most of the identified off-targets are shared by Cas9-NG, SpG and SpRY (Figure 2C). The occurrence of several unique off-targets for Cas9-NG and SpG is probably due to compatible but slightly different preferences at the NGH PAMs: SpG showed the strictest constraint for G at the second position, followed by Cas9-NG and then SpRY (Figure 2D; examples in Supplementary Figure S2A and B). With regard to mismatches in the sgRNA sequences, the tolerance from high to low is in the order SpRY > Cas9-NG ≈ SpG > xCas9, with similar general mismatch patterns (Figure 2E; Supplementary Figure S2A and B), in line with the above findings at target sites with NGG PAMs.

Activities of PAM-less SpRY at NHN loci
SpRY is currently the only near-PAM-less SpCas9 variant and greatly broadens the targeting range of CRISPR-Cas9. To assess the activities of SpRY at NHN PAMs (NAN, NCN, or NTN), we designed three target sites for each type of PAM in HEK293T cells and employed PEM-seq for in-depth analysis. Overall, SpRY showed varied editing efficiency, ranging from 2.3 to 32.4% at these loci (Figure 3A). From several to almost one hundred off-target sites were detected for these target loci, except that none were detected for the TRAC site with an NTN PAM (Figure 3B and Supplementary Table S1). The PAMs of these off-targets approximate NNN, with a minor bias for R (A or G) at the second position, as anticipated (Figure 3C; Supplementary Figure S3A and B). For example, at the C-MYC-ACC target site, 77 off-targets have NRN PAMs while 17 have NYN PAMs (Figure 3C).
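The PAM classes used throughout (NGG, NGH, NHN, and the NRN versus NYN split) are written with standard IUPAC nucleotide codes. The sketch below shows one way to match and tally PAMs against such patterns; the function names are hypothetical and this is not part of the published analysis scripts.

```python
from collections import Counter

# Standard IUPAC codes needed for the PAM patterns in this study.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "H": "ACT",   # not G
    "N": "ACGT",  # any base
}


def pam_matches(pam, pattern):
    """True if a concrete PAM (e.g. 'AGA') satisfies an IUPAC pattern (e.g. 'NRN')."""
    if len(pam) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam.upper(), pattern.upper()))


def classify_pam(pam):
    """Assign a 3-nt PAM to NGG, NGH or NHN; these three classes cover every PAM."""
    for label in ("NGG", "NGH", "NHN"):
        if pam_matches(pam, label):
            return label
    raise ValueError(f"not a 3-nt A/C/G/T PAM: {pam!r}")


def tally_second_position(pams):
    """Count PAMs with a purine (NRN) or pyrimidine (NYN) at the second position."""
    return Counter("NRN" if pam_matches(p, "NRN") else "NYN" for p in pams)
```

Applied to the PAMs of a list of off-target sites, `tally_second_position` gives the kind of NRN/NYN breakdown reported above for the C-MYC-ACC site.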
As our data revealed a trade-off between editing range and targeting specificity for SpRY, we adapted a deep learning model developed for evaluating CRISPR-Cas9 off-targets (25) to test the consistency of SpRY off-targets among the different tested sites and thereby to enable further off-target prediction. We collected the 23-bp information (sgRNA + PAM) from a total of 456 off-targets in our SpRY PEM-seq data to train the convolutional neural network (CNN)-based model (Figure 3D) and held out the C-MYC-ACC site (from Figure 3C) for prediction. The 'accuracy' and 'loss' of the model reached 97.8% and 7.5%, respectively, after 10 epochs of training and finally reached 99.5% and 2.0%, respectively (Supplementary Figure S3C). For prediction, we retrieved sequences from the human hg38 genome that are similar to the C-MYC-ACC target site, within eight mismatches, and subjected them to the trained model. All of the top 15 and 67 of the top 80 predicted sites are true off-targets as validated by the PEM-seq data, and 90 of the 94 identified off-targets occur in the top 150 predicted sites (Figure 3E; Supplementary Figure S3D and Table S1), indicating a decent performance of the trained deep learning model for SpRY off-target prediction.
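The comparison between ranked predictions and PEM-seq-validated sites amounts to counting hits among the top k predictions. A minimal sketch of that bookkeeping is given below; the function names and the site identifiers are hypothetical, and it assumes the predictions are already sorted from highest to lowest score.

```python
def hits_in_top_k(ranked_sites, true_sites, k):
    """Number of validated off-targets among the k highest-ranked predictions."""
    truth = set(true_sites)
    return sum(1 for site in ranked_sites[:k] if site in truth)


def precision_at_k(ranked_sites, true_sites, k):
    """Fraction of the top-k predictions that are validated off-targets."""
    considered = min(k, len(ranked_sites))
    if considered == 0:
        return 0.0
    return hits_in_top_k(ranked_sites, true_sites, k) / considered


def recall_at_k(ranked_sites, true_sites, k):
    """Fraction of all validated off-targets recovered within the top-k predictions."""
    truth = set(true_sites)
    if not truth:
        return 0.0
    return hits_in_top_k(ranked_sites, truth, k) / len(truth)
```

In these terms, and reading 67/80 as 67 validated sites among the top 80 predictions, the reported figures correspond to a precision of about 84% at k = 80 and a recall of about 96% (90/94) at k = 150.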

Genome instability during genome editing via CRISPR-Cas9 variants
The DNA repair outcomes induced by CRISPR-Cas9-activated DNA repair pathways have raised great concerns recently (17,30-33). Among these DNA repair outcomes, chromatin abnormality caused by large deletions (>100 bp) and chromosomal translocations is the most dangerous. Therefore, we used the levels of large deletions and translocations to represent genome instability elicited by genome editing, as previously described (Figure 4A) (24). To detect chromatin abnormality for all the SpCas9 variants, we analyzed the PEM-seq data from CRISPR editing at five target sites with NGG PAMs. For SpCas9, large deletions and translocations occur at average rates of 3.2 and 6.2%, respectively (Supplementary Figure S4A and B). Though showing great potential in reducing the off-target activity of SpCas9, the high-fidelity variants displayed comparable levels of chromosomal translocations as well as large deletions at the tested sites (Figure 4B and C; Supplementary Figure S4A and B). With regard to the PAM-flexible variants, elevated levels of translocations were detected at the RAG1 (1.5-fold) and DNMT1 (2.0-fold) sites due to more translocations between the target sites and off-target sites, while similar levels were detected for the EMX1 and C-MYC sites (Figure 4B and Supplementary Figure S4A). Reduced levels of large deletions (2-fold on average) were detected for these PAM-flexible variants except at the EMX1 site (Figure 4C and Supplementary Figure S4B). Unfortunately, these data suggest that the current high-fidelity or PAM-flexible SpCas9 variants are not able to suppress genome instability during genome editing, the same problem as with SpCas9.

Plasmid integrations during genome editing via PAM-flexible SpCas9 variants
Plasmid integrations have been widely observed during CRISPR-Cas9 genome editing with DNA-based delivery systems including adeno-associated virus (AAV) and plasmids (24,34,35). To detect plasmid integrations for these SpCas9 variants, we analyzed the PEM-seq data as previously described (Figure 5A) (24). We found low levels of plasmid integrations for SpCas9 and the high-fidelity variants at the five tested sites with NGG PAMs, and the inserted plasmid fragments were evenly distributed across the plasmid backbone (Figure 5B and C; Supplementary Figure S5A). The  Figure S5A). Though the total levels of plasmid integrations are not increased significantly for xCas9, enrichment at the U6-sgRNA regions is still detected (Figure 5B and Supplementary Figure S5A). In a zoomed-in view of SpRY, the enrichments mainly occur around N17 and N18 of the sgRNA body CACC(N)20GTTT, suggesting potential SpRY cleavage at the plasmids (Supplementary Figure S5B), consistent with a previous report in plants (36).
To verify the cleavage of plasmids by SpRY, we generated a PEM-seq library from a primer lying 53 bp downstream of the sgRNA in the plasmid to detect indels within the plasmids as well as plasmid-genome fusions. About 10% of plasmids were cleaved by SpRY, as calculated from the PEM-seq data (Figure 5D). Substantial plasmid-genome fusion junctions were detected and were distributed widely across the genome in the SpRY-edited HEK293T cells (Figure 5D). Because the plasmid lacks the NGG PAM, SpCas9 is not expected to cleave it, and only a background level of indels (0.7%) was detected (Figure 5D). Moreover, we placed a Cas9-target site in the plasmid to induce dual cleavage at both the plasmid and the genome and detected a large number of plasmid-genome fusion junctions, providing further evidence for the danger of using a targetable plasmid or virus to deliver SpCas9 or its variants (Supplementary Figure S5C).

Enhancing the targeting specificity of SpRY
The combination of SpRY with high-fidelity variant mutations may help improve the specificity of SpRY. To this end, we introduced the mutations of the three best high-fidelity variants, eCas9, HF1 and HypaCas9, into the gene of SpRY to generate eCas9-SpRY, HF1-SpRY and Hypa-SpRY (Supplementary Figure S6A). We applied PEM-seq to evaluate these combined SpCas9 variants at the nine tested loci with the most off-targets. These sites harbored NGG, AGA, CAG, ACC or ACT PAMs. Compared to SpRY, eCas9-SpRY and HF1-SpRY showed comparable editing efficiencies at the tested loci, whereas Hypa-SpRY showed slightly lower editing efficiency (Figure 6A). The numbers of identified off-target sites for all three combined variants at the nine tested sites decreased significantly, and off-targets were even undetectable at several loci for HF1-SpRY and Hypa-SpRY (Figure 6B). Correspondingly, the levels of translocation events between on-target and off-target sites were also reduced significantly (Figure 6C and Supplementary Table S1), indicating a great improvement in specificity. However, similar or elevated levels of chromosomal translocations, large deletions and plasmid integrations were detected for eCas9-SpRY, HF1-SpRY and Hypa-SpRY versus SpRY (Figure 6D-F), indicating high levels of genome instability with these SpRY-based Cas9 variants.

DISCUSSION
Both high-fidelity and PAM-flexible SpCas9 variants have been evaluated previously by other research groups (18,26,37,38). Whereas the previous assessments utilized a multiplexing system with tens of thousands of parallel target sites in the same library in order to cover as many different types of SpCas9 variant-targeting sites in the genome as possible (18,26,37), here we used a complementary strategy to assess the PAM compatibility, editing efficiency and targeting specificity of these SpCas9 variants by in-depth analysis of editing outcomes at multiple typical target sites with PEM-seq. Our strategy confirms the main findings of the previous studies while also revealing the heterogeneity and complexity of the gene editing behaviors of these SpCas9 variants. For instance, SpRY shows 188 off-targets at the RAG1 site with an NGG PAM but none at some other sites, including the TRAC-NGA and the TRAC-NTN sites (Figures 1C, 2B and 3B). Moreover, the levels of large deletions and of general translocations formed between the on-target site and genome-wide general DSBs were constant among SpCas9 and its high-fidelity variants (Figure 4B and C) and among SpRY and its high-fidelity variants (Figure 6D and F). These findings can be explained by the fact that large deletions and general translocations are determined by DSB repair pathways, and these variants are not expected to have a significant impact on the choice of DSB repair pathway.
The in-depth analysis shows the efficacy of using high-fidelity SpCas9 variants to reduce off-target activity and of using PAM-flexible SpCas9 variants to broaden the editing range of CRISPR-Cas9 in the genome. However, the PAM compatibility of PAM-flexible SpCas9 variants, especially of SpRY, has been broadened for both on-target and off-target activity (e.g. Figure 1F), which may lead to elevated levels of off-target damage. The mismatch patterns in the sgRNA body of these SpRY off-targets are similar to those of SpCas9 (Figure 1E). Besides, the PAM usage of SpRY at on- and off-targets also has some features that remain to be explored, e.g. the A/G bias. In this context, we used a deep learning model (25) to verify the consistency of these SpRY off-targets; the model's performance should improve when the CNN-based model is fed with more data. The combination of SpRY with high-fidelity variants including eCas9, HF1 and HypaCas9 can largely improve the fidelity of SpRY and make it feasible for some genome editing scenarios.
High levels of plasmid integrations have been detected for these PAM-flexible SpCas9 variants, especially for the PAM-less SpRY, due to potential cleavage of the plasmids by SpCas9 variants (Figure 5). In this context, DNA-based delivery systems, including AAV, are not suitable for delivering PAM-flexible SpCas9 variants into cells. This is not limited to the Cas9 forms of these variants but also includes derived base editors, since base editors may also generate substantial mutations in the sgRNA sequence in the plasmids. Ribonucleoprotein (RNP) delivery would currently be the optimal choice. Further optimization is needed to suppress plasmid cleavage by PAM-flexible SpCas9 variants as well as the genome instability induced by SpCas9 or these SpCas9 variants. Moreover, since editing outcomes can be affected by the transfection method (DNA-based, RNA-based, RNP), further studies are needed to compare these variants using mRNA or RNP transfection.

DATA AVAILABILITY
Data were deposited in the NODE (National Omics Data Encyclopedia) database: OEP001824. Scripts and raw data for off-target prediction via the deep learning model in this study are available at the GitHub repository (https://github.com/JiazhiHuLab/CNN predict) (25).