Native ribonucleases process sgRNA transcripts to create catalytic Cas9/sgRNA complexes in planta

The current CRISPR/Cas9 gene editing dogma for single guide RNA (sgRNA) delivery is based on the premise that 5′ and 3′ nucleotide overhangs negate Cas9/sgRNA catalytic activity in vivo. This has led to engineering strategies designed to either avoid or remove extraneous nucleotides on the 5′ and 3′ termini. Previously, we used a Tobacco mosaic virus vector to express both GFP and an sgRNA from a single virus-derived mRNA in Nicotiana benthamiana. This vector yielded high levels of GFP and catalytically active sgRNAs. Here, to understand the biochemical basis of this result, we used in vitro assays to demonstrate that nucleotide overhangs 5′, but not 3′, proximal to the sgRNA do in fact inactivate Cas9 catalytic activity at the specified target site. Next, we showed that in planta sgRNAs bound to Cas9 are devoid of the expected 5′ overhangs transcribed by the virus. Furthermore, when a plant nuclear promoter was used to express the GFP-sgRNA fusion transcript, it also produced indels when delivered with Cas9. These results reveal that 5′ “auto-processing” of progenitor sgRNAs occurs natively in plants. Towards a possible mechanism for the perceived “auto-processing”, we found, using in vitro generated RNAs and those isolated from plants, that the 5′ to 3′ exoribonuclease XRN1 can degrade elongated progenitor sgRNAs, whereas the mature sgRNA end-products are resistant. Comparisons with other studies suggest that sgRNA “auto-processing” may not be unique to plants but may also occur in other eukaryotes.

Summary: Native Nicotiana benthamiana ribonucleases cleave exogenous nucleotides 5′ to the sgRNA spacer transcript to create catalytic Cas9/sgRNA complexes in planta.


The CRISPR/Cas9 platform, found natively in Streptococcus pyogenes, has been developed into a diverse set of functional genetic tools used in gene editing technology (Mali et [...] contradict the consensus in the field, which suggests that in vivo delivered sgRNAs containing nucleotide overhangs prevent either Cas9-sgRNA complex assembly or its catalytic activity. The activity associated with in planta produced subgenomic RNAs from the TRBO [...]). Due to TRBO being an efficient protein and sgRNA co-delivery tool in N. benthamiana, as well as the overall lack of knowledge of native 5′ CRISPR RNA (crRNA) processing currently in the literature (synonymous with the 5′ sgRNA processing used in our models here), we aimed to better understand what happens to the 5′ end of sgRNAs in vivo, specifically in the experimental model N. benthamiana.

We first used in vitro assays with Cas9 and sgRNA transcripts containing nucleotide overhangs (nucleotides not corresponding/aligning to the 100 nucleotide chimera sgRNA sequence) on either or both of the 5′ and 3′ ends of the sgRNA. In doing so, we concluded that, in accordance with the prevailing dogma, 5′ overhangs do indeed completely inhibit the activity of the Cas9-sgRNA complex in vitro. Following these results we hypothesized, and found, that upon co-infiltration of Cas9 and the TRBO-GFP-sgRNA co-expression constructs in N. benthamiana, Cas9-bound transcripts were enriched for sgRNAs that lacked the originally fused 5′ transcript sequences, indicative of a 5′ RNA processing event. Further sub-cellular fractionation analysis determined that the removal/processing of the 5′ nucleotide overhangs occurred in the plant cytosolic fraction. Next, we generated GFP-sgRNA transcripts that mimicked those generated by the viral system, but used a nuclear promoter for transcript expression.
The results demonstrated that these transcripts were also capable of programming a catalytically active Cas9-sgRNA complex. Finally, to understand a potential RNA degradation pathway responsible for processing sgRNAs, we used both in vitro and in planta transcribed sgRNA templates and subjected them to the 5′ to 3′ exonuclease XRN1. This resulted in degradation of elongated RNAs but not of mature sgRNA-specific templates. These experiments directed us to develop a tentative model for creating catalytically active Cas9/sgRNA complexes that we believe could be applicable to other eukaryotes, and possibly to the native processing system in S. pyogenes. Ultimately, the results from these experiments may have far-reaching impacts on the development of CRISPR technology in future applications, and they also serve as a model for understanding the fundamental biology of the native CRISPR/Cas9 5′ processing system.

Cody and Scholthof, page 5

To test whether Cas9 can cleave a protospacer-harboring DNA template using an sgRNA containing overhangs on the 5′ end, the 3′ end, or both ends, we elected to use the viral-based protein and sgRNA overexpression tool we previously developed, TRBO-G-3′gGFP (Fig. 1A), as a template to conduct in vitro Cas9 cleavage assays (Cody et al. 2017). In addition to carrying 5′ and 3′ UTR regions, TRBO-G-3′gGFP contains both a GFP protein coding segment and an sgRNA targeting the mgfp5 gene (gGFP). To model the subgenomic RNAs being produced from TRBO-G-3′gGFP, forward primers carrying a T7 promoter were designed at the native coat protein subgenomic RNA transcription start site (T7-F1) and at the start of the spacer sequence of gGFP (T7-F2) to replicate the 5′ overhang carrying sgRNA and the "clean" sgRNA (lacking extraneous nucleotides), respectively (Fig. 1B).
To evaluate 3′ overhang effects on Cas9 nuclease activity, reverse primers were designed both in the 3′ TMV-UTR and at the 3′-most end of the sgRNA scaffold, to replicate the 3′ overhang carrying sgRNA and the "clean" sgRNA, respectively (Fig. 1B). PCR amplification products of the 5′ overhang carrying (T7/F1-R2), 3′ overhang carrying (T7/F2-R1), 5′ and 3′ overhang carrying (T7/F1-R1), and "clean" gGFP (T7/F2-R2) sequences were used as templates for T7 transcription reactions [...] cleavage is evidenced by the presence of digested DNA template, which occurred, in this case, only when using a gGFP transcript without 5′ overhangs (Fig. 1C). Surprisingly, Cas9 still cleaved target DNA in vitro when the gGFP carried a long 3′ nucleotide overhang (Fig. 1C). These results indicate that while 3′ sgRNA overhangs can be present and still allow for Cas9-dependent DSBs, sgRNAs carrying 5′ spacer sequence-adjacent overhangs inhibit Cas9 DNA cleavage.
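
The four in vitro templates follow from a simple two-by-two primer design, which can be sketched in a few lines of Python. This is only an illustrative bookkeeping aid, not part of the experimental workflow: the dictionary and function names are hypothetical, the assignment of R1 to the 3′ TMV-UTR and R2 to the scaffold end is inferred from the template labels in the text, and the `cleavage_reported` field simply restates the outcome described for Fig. 1C.

```python
from itertools import product

# Primer roles as described for Fig. 1B. The R1/R2 assignment is an
# assumption inferred from the template labels used in the text.
FORWARD_ADDS_5P_OVERHANG = {"F1": True, "F2": False}
REVERSE_ADDS_3P_OVERHANG = {"R1": True, "R2": False}


def describe_template(fwd: str, rev: str) -> dict:
    """Summarize one T7 template and the cleavage outcome reported for Fig. 1C."""
    has_5p = FORWARD_ADDS_5P_OVERHANG[fwd]
    has_3p = REVERSE_ADDS_3P_OVERHANG[rev]
    return {
        "template": f"T7/{fwd}-{rev}",
        "5p_overhang": has_5p,
        "3p_overhang": has_3p,
        # Reported outcome: cleavage only when no 5' overhang is present.
        "cleavage_reported": not has_5p,
    }


def all_templates() -> list:
    """Enumerate every forward/reverse primer combination."""
    return [
        describe_template(fwd, rev)
        for fwd, rev in product(FORWARD_ADDS_5P_OVERHANG, REVERSE_ADDS_3P_OVERHANG)
    ]


if __name__ == "__main__":
    for row in all_templates():
        print(row)
```

Encoding the outcome as `not has_5p` makes the asymmetry explicit: the 3′ overhang flag has no influence on the reported cleavage result.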

The sgRNAs in the previous Cas9 cleavage assays (Fig. 1C) were used at concentrations suited for optimal function of "clean" sgRNAs. To mimic the TRBO-sgRNA delivery [...] there was no evidence of DNA cleavage (Fig. 1D). These results indicate that the increased concentrations of 5′-elongated gGFP progenitors observed with TRBO delivery in planta are unlikely to be the source of efficient Cas9 editing; instead, native 5′ sgRNA processing abilities most likely exist in planta.

Cas9 bound sgRNAs have processed 5′ ends in planta

Previously we established that co-delivery of pHcoCas9 (Fig. 2A) and TRBO-G-3′gGFP [...] pHcoCas9, TRBO-G-3′gGFP, or pHcoCas9 and TRBO-G-3′gGFP were performed followed by RNA extractions. Additionally, an RT-PCR amplification scheme was designed using three primer sets to detect an enrichment of TRBO-G-3′gGFP derived RNA product, with a particular emphasis on shortened (e.g., processed) gGFP spacer fragments (Fig. 2B). Forward [...] do not impede the ability of Cas9-sgRNA to induce DSBs (Fig. 1C), we elected to amplify sgRNA fragments using a reverse primer starting within the sgRNA scaffold (R2), enabling us to focus on the biologically relevant region 5′ proximal to the spacer sequence.
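
The logic of this RT-PCR scheme can be illustrated with a small Python sketch. It assumes only that a forward primer gives a product with R2 when its binding site is still present on the RNA, i.e. when the site lies at or downstream of the transcript's 5′ end. The coordinates below are hypothetical placeholders rather than positions from TRBO-G-3′gGFP, and the class and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForwardPrimer:
    name: str
    site_start: int  # hypothetical 5' coordinate of the primer binding site


# Hypothetical layout: F1 is furthest upstream, F3 sits at the gGFP spacer.
FORWARD_PRIMERS = (
    ForwardPrimer("F1", 0),
    ForwardPrimer("F2", 700),
    ForwardPrimer("F3", 1500),
)


def expected_products(five_prime_end: int) -> list:
    """Return the primer pairs expected to amplify an RNA starting at five_prime_end.

    A forward primer can only anneal if its site is retained, so processing
    that removes upstream sequence eliminates the longer amplicons first.
    """
    return [
        f"{primer.name}-R2"
        for primer in FORWARD_PRIMERS
        if primer.site_start >= five_prime_end
    ]


if __name__ == "__main__":
    print(expected_products(0))     # unprocessed RNA retaining all primer sites
    print(expected_products(1500))  # RNA trimmed to the start of the spacer
```

Under these assumptions, an RNA population trimmed to the start of the spacer would yield only the F3-R2 product, which is the pattern the scheme is designed to detect.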

Since we previously established that the majority of editing events occur during 2-3 days post-inoculation (dpi) (Cody et al. 2017), 3 dpi samples from each treatment were assayed. Cas9 protein was isolated through immunoprecipitation (IP) using a Cas9-specific antibody followed by protein G agarose bead pull-down. Cas9 protein isolation on the protein G agarose beads was verified via western blot detection (Fig. 2C). RNA extractions were carried out on all three Cas9-IP samples, and for comparison total RNA was also extracted from each tissue. RT reactions were performed using the sgRNA scaffold-specific (R2) primer.

Total RNA RT-PCR amplifications showed approximately equal quantities of product when comparing the TRBO-G-3′gGFP alone versus the pHcoCas9 plus TRBO-G-3′gGFP co-infiltrated samples (Fig. 2D). Roughly equal expression quantities held true over 3, 5, and 7 dpi (Fig. 2E). In contrast, RT-PCR amplifications on the IP products showed a clear enrichment of gGFP-specific amplicons (F3-R2) in the pHcoCas9 and TRBO-G-3′gGFP co-infiltrated tissue compared to the predicted longer viral subgenomic RNA product (F2-R2) and the genomic/first subgenomic containing RNA product (F1-R2) (Fig. 2D). In line with expectations, the two controls devoid of either sgRNA (pHcoCas9 alone) or Cas9 (TRBO-G-3′gGFP alone) did not yield amplification products of the expected molecular weight for each primer set.

We next tested whether processing is specific for sgRNAs loaded within Cas9 by examining whether sgRNAs were cleaved specifically at the 5′ terminus of the mature sgRNA or whether several subpopulations of sgRNAs containing various 5′ overhang lengths associated with Cas9. Towards this, forward primers were designed starting from the gGFP (F3) spacer sequence and moving progressively upstream into the subgenomic RNA in increments (Fig. 2B). [...] PCR indicated a clear reduction in band intensity with primers located upstream and 5′ proximal to the gGFP spacer sequence (Fig. 2F). These data confirm that the 5′ end of gGFP is being processed (cleaved) in planta to eliminate the nucleotide overhang produced during viral subgenomic RNA production (transcription), with some level of specificity for the start of the 5′ spacer sequence. Furthermore, it appears that either Cas9 preferentially binds processed sgRNAs, or proper 5′ nucleotide removal is stimulated by association of Cas9 with the progenitor-sgRNA. [...] Fig 1A).
RT-PCR products from total RNA lysate of 16c and wt plants indicated no discrepancies in band intensities using the previously designed primer sets (Sup Fig 1B). From these results it was concluded that sgRNA 5′ processing is not reliant on a protospacer being present in the nuclear DNA and must be occurring through another mechanism.

To test if nuclear localization is required for sgRNA processing, we removed the nuclear localization signals (NLS) from Cas9 and constructed p-NLSCas9 (Sup Fig 1C). 16c plants were then infiltrated with TRBO-G-3′gGFP alone or co-infiltrated with either the NLS-lacking p-NLSCas9 construct or the NLS-containing pHcoCas9 vector. To confirm that the p-NLSCas9 encoded protein does not localize to the genomic DNA (a proxy for nuclear localization), 7 dpi DNA was assayed for indel formation following each treatment. As expected, the pHcoCas9 construct produced DSBs in 16c genomic DNA, whereas indel levels for p-NLSCas9 were indistinguishable from the TRBO-G-3′gGFP only control (Sup Fig. 1D). These results demonstrate that a lack of nuclear localization of Cas9 (-NLSCas9) prevents the complex from cleaving substrate DNA. Following these results, tissue was sampled from 4 dpi 16c plants and used for Cas9-IPs followed by RNA extractions, as well as for total lysate RNA extractions. Total RNA and Cas9-bound RNA from both pHcoCas9 and p-NLSCas9 treatments were subjected to RT-PCR, which confirmed that sgRNA 5′ processing occurred in extracts containing either the NLS-lacking or the NLS-containing construct (Sup Fig. 1E).

These results indicated that DNA target recognition events in the nucleus might not be critical for progenitor-sgRNA processing, which suggests a possible contribution of cytoplasmic events to enabling catalytic activity of the complex. Therefore, we next interrogated the cellular localization of both Cas9 protein and sgRNAs to identify the location of 5′ sgRNA processing (nucleus or cytosol). Using sub-cellular fractionation in combination with the previously developed RT-PCR scheme (Fig. 2B), we compared the fractions for relative levels of protein through western blotting, which indicated that even though a sub-population of Cas9 protein accumulates in the cytosol, Cas9 preferentially localizes to the nucleus (Fig. 3A). Both nuclear and cytosolic fractions from pHcoCas9 and TRBO-G-3′gGFP co-infiltrated tissue were then used for Cas9-IP (Fig. 3A), followed by RNA extractions. Total RNA was also extracted from pHcoCas9 and TRBO-G-3′gGFP total cellular lysate as well as from the cytosolic and nuclear lysate fractions. RT-PCR analysis of the total nuclear lysate and the Cas9-IP isolated from the nuclear fraction indicated that sgRNAs were being processed prior to translocation into the nucleus (Fig. 3B). Additionally, there was a clear enrichment for specific gGFP 5′ processed forms in the Cas9-IP cytosolic fraction reactions as compared to the reactions from total RNA in the cytosolic fraction (Fig. 3B). While the cytosolic lysate showed no discrepancies between the gGFP processing forms, as in the total lysate control, the Cas9-IP RNA contained mostly 5′ processed forms of sgRNAs. Taken together, these data reinforce that 5′ sgRNA processing does not depend on Cas9 nuclear localization but instead occurs, at least primarily, in the cytosol when using our viral-sgRNA delivery system. [...]

To separate out both cytosolic transcript expression/localization and potential viral host responses, the protein-sgRNA fusion transcript, U6-GFP-gGFP, was constructed along with a transcript producing only "clean" (no 5′ nucleotide overhangs) sgRNA, U6-gGFP, to serve as a control for DSB activity. Both U6-GFP-gGFP and U6-gGFP were inserted into the pHcoCas9 expression vector to produce pHco-U6-GFP-gGFP and pHco-U6-gGFP, respectively (Fig. 4A). Then, 16c plants were used for half-leaf assays using pHco-U6-GFP-gGFP and pHco-U6-gGFP to test for in planta catalytic activity (Fig 4B).
Tissue was taken at 7 dpi from three assayed plant samples and subjected to PCR amplification followed by a BsgI digestion. The pHco-U6-GFP-gGFP infiltrated tissue surprisingly showed a substantial quantity of indels (17%-30%), although pHco-U6-gGFP was considerably higher at 33%-40% (Fig. 4C). In each half-leaf assay, indicated by number (Fig. 4C), the indel percentage was consistently lower for pHco-U6-GFP-gGFP than for the pHco-U6-gGFP infiltrated part of the leaf (Fig. 4D). One possible explanation for the lower indel percentages using the pHco-U6-GFP-gGFP construct would be that the transcript (~850 nts) is much longer than a typical Pol III transcribed RNA (100-150 nts), causing a decrease in gGFP expression due to lower levels of Pol III fidelity at the 3′ end of the transcript. To test whether the discrepancy in indel percentages between these two constructs was due to lower expression levels of the pHco-U6-GFP-gGFP transcripts or to 5′ sgRNA overhangs impairing catalytic activity, 5 dpi half-leaf assays were used for RT-PCR expression analysis (Fig. 4E). Ultimately there was no difference in expression levels of gGFP between pHco-U6-GFP-gGFP and pHco-U6-gGFP, indicating that the lower indel percentages from pHco-U6-GFP-gGFP are due to a reduction in 5′ processing efficiency in host cells, more than likely due to the extended nuclear localization of transcripts synthesized from U6 promoters, confirming that cytoplasmic localization stimulates progenitor-gRNA processing. Perhaps even more importantly, these assays demonstrate that pHco-U6-GFP-gGFP is in fact capable of delivering sgRNAs with considerable 5′ overhangs that are clearly [...] Fig 2B) in fact increased DSBs at the target loci (Sup Fig 2C). This could be related to a report that silencing of decapping enzymes in fact causes increased viral replication (Ma et al. 2015), and therefore these results could be a result of increased cellular content of sgRNA and Cas9.

Instead of moving forward with the rather complicated genetics of our in planta experimental model (Sup Fig 2), we looked towards recapitulating the XRN-sgRNA interaction [...] for the processing phenomenon we see in planta.

In vitro assays were set up by supplying an exogenous RNA 5′ pyrophosphohydrolase (RppH) to produce a 5′ monophosphate transcript that can then readily be degraded by XRN proteins, in this case XRN1. To test our hypothesis we generated two transcripts, one corresponding to the full predicted subgenomic RNA produced from TRBO-G-3′gGFP (F1-R2) and the other containing only the sgRNA sequence gGFP (F2-R2) found in higher concentrations in planta. Upon running the reactions on a denaturing gel, we found that in the presence of both RppH and XRN1 the larger F1-R2 transcript was degraded to what appears to be completion (Fig 5A). In contrast, the gGFP-specific transcript was completely recalcitrant to degradation regardless of the enzymes added (Fig 5A). Furthermore, when reactions were run on a non-denaturing agarose gel, we found that XRN1-containing reactions migrated at a slower rate than the sample without XRN1, possibly indicating XRN1/gGFP binding, again without gGFP degradation (Fig 5B). While this was a rather remarkable hypothesis-supporting result, we still questioned whether sgRNAs transcribed in vivo displayed the same property. Therefore, cytosolic Cas9-IP samples (reported in Fig 3B) were treated with XRN1 followed by RT-PCR to amplify a gGFP-specific product. While there might have been a slight reduction in band intensity in the XRN1 sample compared to the mock sample, there was certainly a substantial population of RNAs that remained resistant to the treatment (Fig 5C).
Taken together, we believe this demonstrates XRN1 resistance of the mature sgGFP transcripts, indicating a potential mechanism in which native 5′ to 3′ exoribonucleases play a role in the 5′ processing of progenitor-gGFP seen in planta. [...] results from experiments presented here (Fig 6A). Even though this model also explains the processing events for nuclear generated transcripts, we focused on the viral delivery for simplicity. Upon viral expression of transcripts that contain 5′ sequences that do not correspond to the sgRNA sequence, cytosolic localization is critical for optimal processing, either before or after binding by Cas9.

In order for sgRNA transcripts to be trimmed to the correct length, Cas9 binding might be necessary, as has been suggested previously (Mikami et al. 2017), or the sgRNA might be inherently recalcitrant to exonuclease (Fig 6B). Specifically, on one hand Cas9 binding can be inferred to be important for proper processing (Fig 2C and 2D), which agrees with previous [...] Cas9. In essence, the inclusion of the sgRNA sequence within the protein would protect the RNA from further degradation by host ribonucleases. This leads to one theory for our observed in planta catalytic activity, in which Cas9 "shields" the sgRNA sequence from further degradation by exo- or endoribonucleases (right panel Fig 6B-D). However, we also provide evidence that sgRNAs are resistant to at least one class of nucleases, the 5′ to 3′ exonucleases (Fig 5). Due to the reliance of Cas9/sgRNA duplex catalytic activity on the 5′ sequence specificity of the sgRNA (Fig 1C), we find this result particularly relevant. While endoribonucleases might affect the formation of catalytically active Cas9-sgRNA complexes, it seems rather unlikely that an endonuclease would have the specificity seen in Figure 2F or produce catalytic events at the rate seen in Figure 4C.

In either of the above cases it seems that the most likely scenario for the processing events, at least using our transcriptional models (TRBO-G-3′gGFP and pHco-U6-GFP-gGFP), is an initial RNA cleavage event by an endoribonuclease which would provide the essential step of [...]

For this, the Cas9 encoding sequence without the NLS was amplified using a forward primer designed downstream of the NLS sequence that contained a BamHI site as well as a start codon (ATG), and a reverse primer designed upstream of the C-terminal NLS sequence followed by [...] slurry (Thermo Scientific) was added and the mixture was incubated for an additional hour at 4 °C.
Cas9-protein G beads were collected through centrifugation at 2,500 × g for 3 min.

Supernatant was removed and the agarose slurry was washed 5 times with 500 µl of RIPA buffer. Following the wash steps, some of the resuspended slurry was used for western blot analysis to verify Cas9 immunoprecipitation and the rest was used for RNA extractions.

RT-PCR was performed using primers depicted in A, using cDNA from total RNA and Cas9-IP (also shown in C), to examine the presence of in vivo gGFP 5′ overhangs. Enrichment of gGFP RNAs that do not encompass the predicted subgenomic RNAs is shown by ample amplification using F3-R2 primers and not F2-R2 for Cas9-TRBO. The positive control (+C) was carried out using TRBO-G-3′gGFP purified plasmid. The expected amplified sgRNA structure is indicated to the right, with the red lines representing sgRNA-specific sequence and black depicting viral RNA. E) Total RNA was sampled at 3, 5 and 7 dpi from 16c tissue infiltrated with TRBO-G-3′gGFP both with and without pHcoCas9. Total RNA was assayed for processed and unprocessed sgRNAs by [...] Primers F4, F7, and F8 are located at increasing distance upstream of gGFP, respectively.