A Cas6-based RNA tracking platform functioning in a fluorescence-activation mode

Abstract Given that the localization of RNAs is closely associated with their functions, techniques developed for tracking the distribution of RNAs in live cells have greatly advanced the study of RNA biology. Recently, the innovative application of fluorescent protein-labelled Cas9 and Cas13 to live-cell RNA tracking has further enriched the toolbox. However, the Cas9/Cas13 platform, like the widely used MS2-MCP technique, fails to solve the problem of high background noise. It was recently reported that CRISPR/Cas6 undergoes an allosteric change after interacting with the Cas6 binding site (CBS) on RNAs. Here, we exploited this feature and designed a Cas6-based switch platform for detecting target RNAs in vivo. Conjugating split-Venus fragments to both ends of the endoribonuclease-mutated Escherichia coli Cas6 (dEcCas6) allowed ligand (CBS)-activated split-Venus complementation. We named this platform Cas6 based Fluorescence Complementation (Cas6FC). In living cells, Cas6FC could detect target RNAs with almost no background noise. Moreover, as little as one copy of CBS (29 nt) tagged onto an RNA of interest was able to turn on Cas6FC fluorescence, which greatly reduces the odds of altering the conformation and localization of target RNAs. Thus, we developed a new RNA tracking platform with inherently high sensitivity and specificity.


INTRODUCTION
The current live-cell RNA tracking techniques fall into two types: the fluorescence-enrichment (FE) type and the fluorescence-activation (FA) type. An FE-type platform usually comprises two key components: a fluorescent protein-conjugated RNA binding protein (RBP-FP) that acts as a reporter or tracker, and a specific RBP binding sequence (RBS) genetically inserted into target RNAs. A prominent example of the FE type is the MS2-MCP RNA tracking system, in which MCP stands for MS2 phage coat protein (1). MCP binds a specific RNA stem-loop structure named MBS (MCP binding site). To track target RNAs in living cells, RNAs of interest are tagged with multiple copies of MBS, which recruit fluorescent protein-conjugated MCP (MCP-FP) and thus make the target RNAs visible. The MS2-MCP system was the first RNA tracking system to be invented and remains the most widely used. Subsequently, various analogous FE-type RNA tracking systems were established, including PP7-PCP and λN (2,3). However, FE-type platforms usually suffer from high background noise. To enhance the signal-to-noise ratio (SNR), a rather high copy number (usually >24 copies, >1000 bp) of exogenous tandem-repeated RBSes is used to tag the target RNAs (1,2), which complicates genetic manipulation and raises the possibility of altering the structure and localization of the target RNAs.
Recently, the Cas9 and Cas13 gene editing tools were applied to RNA imaging (4-6). Benefiting from guide RNA (gRNA)-mediated RNA targeting, FP-conjugated Cas9/Cas13 could track any RNA of interest. However, given that at least 12 copies of MBS are required for the MS2-MCP system to reach an acceptable SNR (1), a Cas9/Cas13 RNA imaging platform would conceivably require several gRNAs targeting different motifs of a single target RNA to achieve a decent SNR. In addition, the relatively large size of Cas9/Cas13 is a concern, since accumulation of multiple Cas9/Cas13 reporters on a single RNA could compromise the conformation and localization of the RNA of interest. So far, the MS2-MCP platform is still the best FE-type RNA imaging tool.
All FE-type RNA tracking platforms inherently suffer from high background noise because their reporters are constitutively fluorescent. This concern is partly addressed by the FA-type RNA tracking/imaging platforms, which are based on bimolecular fluorescence complementation (BiFC). For example, MCP and PCP (an RNA binding protein analogous to MCP) were fused with the N- and C-fragments of a certain FP, respectively; the resultant two reporters, MCP-FPN and PCP-FPC, would not reconstitute a fully functional fluorescent protein until they are brought together by adjacent MBS and PBS (PCP binding site) on an RNA target (7,8). As such, the FA-type platforms have a much improved SNR. Undoubtedly, the current FA-type platforms, such as the BiFC or TriFC RNA tracking systems (8), outstrip any FE-type tool in sensitivity. However, compared with the 12 × MBS tag that the MS2-MCP system requires, an even larger RNA tag, e.g. 12 × MBS-PBS (1400 nt), is frequently used for the current FA-type platforms (7,8). Insertion of such a large exogenous RNA sequence is liable to cause the problems mentioned above.
Cas6 is a core component of the type I-E CRISPR complex. It binds and cleaves pre-crRNA by recognizing a specific stem-loop RNA element designated as CBS (Cas6 binding site) (9,10). CBS binding induces an allosteric change in Cas6 that juxtaposes its N and C termini (9). Employing this feature, we engineered an Escherichia coli-originated Cas6 (EcCas6) protein into an FA-type reporter. The catalytic domain-mutated EcCas6 (dEcCas6) was appended at its N- and C-termini with the non-fluorescent moieties of the protein Venus, Venus-N (VN) and Venus-C (VC), respectively. The resultant chimera, VN-dEcCas6-VC, was not fluorescent until it bound to CBS on the RNAs of interest. Because the fluorescence complementation is mediated by the allosteric switch of Cas6, we named this platform Cas6 based Fluorescence Complementation (Cas6FC). Thus, we designed a new FA-type RNA tracking platform with favorable sensitivity and specificity.

Plasmids
The annotations and important sequences of all the plasmids used in this study are presented in Supplementary Figures S1-S18.

Cell culture and transfection
HEK293T, COS-7 and HeLa cell lines were purchased from the China Center for Type Culture Collection (CCTCC). HEK293T and HeLa cells were cultured in Dulbecco's modified Eagle's medium (Gibco) supplemented with 10% fetal bovine serum (Gibco). COS-7 cells were cultured in RPMI Medium 1640 (Gibco) supplemented with 10% fetal bovine serum (Gibco). Lipofectamine® 2000 Reagent (Invitrogen) was used for COS-7 cell transfection according to the manufacturer's instructions, with a modified dose of 1 µl per well in 24-well plates. Entranster™-R4000 (Engreen Biosystem) was used for HEK293T and HeLa cell transfection according to the manufacturer's instructions, with a modified dose of 0.5 µl per well in 24-well plates. The plasmid usage is listed in Table 1.

Fluorescence in situ Hybridization of RNA
HeLa cells and HEK293T cells were cultured in 24-well plates on poly-D-lysine-coated coverslips and then co-transfected with 200 ng VN-dEcCas6-VC-GK and 800 ng Actin-16 × CBS-GK or hTERC-16 × CBS-GK. Single molecule inexpensive fluorescence in situ hybridization (smiFISH) was employed for RNA imaging (11). The sequences of the primary probes and FLAP probes are listed in Supplementary Table S1. smiFISH was performed according to the previously reported method. Briefly, cells were fixed with 4% PFA for 15 min at 37 °C and washed three times with 1 × PBS (pH 7.4) for 5 min each, followed by permeabilization with 0.5% Triton X-100 for 15 min and three further 5-min washes with 1 × PBS (pH 7.4). Cells were incubated in hybridization solution at 37 °C for 12 h. After hybridization, the samples were washed three times with 15% formamide (in 1 × SSC) at 37 °C for 30 min each, stained with 0.1 µg/ml DAPI at room temperature for 10 min, and mounted in antifading mounting medium (Solarbio).

Flow cytometry assay
HEK293T cells in 24-well glass-bottom plates were transfected with plasmids according to the experimental design (Table 1). Four replicates were prepared for each sample, of which two were observed by confocal microscopy and the other two were subjected to flow cytometry analysis. At 24 h post transfection, the HEK293T cells were digested with 0.25% (w/v) trypsin (in 1 × PBS, pH 7.4) and analyzed with a BD FACSCelesta flow cytometer. Prior to analyzing the background and CBS-induced specific fluorescence of VN-dEcCas6-VC, fluorescence compensation was performed with Actin-GK + pDsRed-Monomer-C1 transfected cells for the red fluorescence channel and VN-dEcCas6-VC-GK + Actin-16 × CBS-GK transfected cells for the green fluorescence channel. The parameters for flow cytometry analysis were as follows: 440 V laser voltage for the red fluorescence channel (PE-Texas Red) and 320 V laser voltage for the green fluorescence channel (FITC). The fluorescence compensation parameters were 4.8 for the red fluorescence channel and 12.6 for the green fluorescence channel. A total of 10 000 single living cells were selected from all the cells according to FSC (Forward Scatter) and SSC (Side Scatter). As the co-transfected cells should express the DsRed-Monomer fluorescent protein, the red fluorescence-positive (Red+) cells were then selected and used for analysis of VN-dEcCas6-VC background and specific fluorescence. The Actin-GK + pDsRed-Monomer-C1 co-transfection group served as the control group; the VN-dEcCas6-VC-GK + pDsRed-Monomer-C1 co-transfection group (named the without CBS group) was used for analyzing initial background fluorescence; the VN-dEcCas6-VC-GK + Rm-20 × CBS-C1 group (named the with CBS group) was used for analyzing specific fluorescence.
The maximum fluorescence intensity of the Red+ control group cells was used as the threshold for determining initial background fluorescence or specific fluorescence. Fluorescence of Red+ without CBS group cells whose intensity was higher than this threshold was counted as initial background fluorescence; fluorescence of Red+ with CBS group cells whose intensity was higher than this threshold was counted as specific fluorescence.

Table 1. Plasmid usage
Turn-off reporter system for confirming dEcCas6 inactivation in HEK293T cells (Figure 1A):
  100 ng CBS-EGFP-N1 + 300 ng pDsRed-Monomer-C1 (negative control)
  100 ng CBS-EGFP-N1 + 300 ng EcCas6-GK
  100 ng CBS-EGFP-N1 + 300 ng dEcCas6-GK
Turn-on reporter system for confirming dEcCas6 inactivation in HEK293T cells:
  100 ng Rm-16 × CBS-Lin28-C1 + 300 ng EGFP-C1 (negative control)
Note: In plasmid names, 'DsRed-Monomer' is abbreviated as 'Rm', e.g. Rm-20 × CBS-C1.
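The thresholding rule used in the flow cytometry analysis (threshold = maximum green intensity among Red+ control cells; any cell above it counts as positive) can be illustrated with a minimal Python sketch. The function name and the intensity values below are hypothetical and chosen only for illustration; they are not data from this study, and the sketch does not reproduce the instrument software.

```python
from typing import Sequence


def fraction_above_control_max(control: Sequence[float],
                               sample: Sequence[float]) -> float:
    """Fraction of sample cells whose green intensity exceeds the
    maximum green intensity observed in the control group."""
    if not control or not sample:
        raise ValueError("both groups need at least one cell")
    threshold = max(control)  # maximum intensity of Red+ control cells
    positive = sum(1 for value in sample if value > threshold)
    return positive / len(sample)


# Hypothetical green-channel intensities (arbitrary units) of Red+ cells.
control = [10.0, 12.5, 11.0, 9.8]       # Actin-GK + pDsRed-Monomer-C1
without_cbs = [10.5, 11.9, 13.0, 9.0]   # reporter only: background
with_cbs = [50.0, 12.0, 80.0, 11.0]     # reporter + CBS-tagged RNA

background = fraction_above_control_max(control, without_cbs)
specific = fraction_above_control_max(control, with_cbs)
print(f"background: {background:.2%}, specific: {specific:.2%}")
```

By construction, the control group itself yields 0% positive cells under this rule, because no cell can exceed its own group maximum.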

RNA imaging
For RNA imaging in living cells, the transfected cells in 24-well plates were observed on a ZEISS Vert.A1 microscope platform at 24 h post transfection. For FISH imaging, the coverslips were mounted and observed on the same ZEISS Vert.A1 microscope platform. The exposure time was set to 5 s for each fluorescence channel and 100 ms for bright field. For observation of fixed cells by confocal microscopy, the transfected cells were fixed with 4% PFA for 15 min at 37 °C and washed three times with 1 × PBS (pH 7.4) for 5 min each, followed by permeabilization with 0.5% Triton X-100 for 15 min and three further 5-min washes with 1 × PBS (pH 7.4). After being stained with 0.1 µg/ml DAPI and washed three times with 1 × PBS (pH 7.4) for 5 min each, the cells were observed with a Nikon A1R HD25 Confocal Microscope.

Escherichia coli (Ec) Cas6 could be used for in vivo RNA tracking
Owing to their CBS-binding capability, members of the Cas6 family have the potential to be used as FE-type RNA tracking tools. To test this possibility, we selected Cas6 derived from Escherichia coli (EcCas6) and investigated its application in tracking RNAs in mammalian cells at physiological temperature (37 °C). We constructed a catalytically inactive EcCas6 mutant (an H→A mutation at Position 20, according to a previous study (12)) and designated it dEcCas6 (dead EcCas6). To confirm that (a) EcCas6 could recognize and cleave CBS and (b) dEcCas6 lost the endoribonuclease activity but retained its CBS-binding activity in mammalian cells, we tested them in de novo designed turn-off and turn-on reporter systems. In the turn-off system, a 1 × CBS sequence was inserted into the 5′-UTR of the EGFP mRNA. Since the 5′ cap is pivotal for mRNA translation and EcCas6-mediated cleavage would remove the 5′ cap from the EGFP mRNA, a successful EcCas6-CBS interaction would reduce EGFP fluorescence intensity. In HEK293T cells, co-transfection of the EcCas6-expressing vector significantly reduced EGFP fluorescence intensity, which was not observed when the dEcCas6-expressing construct was used instead (Figure 1A). In the turn-on system, a 16 × CBS sequence and a 2 × LIN28 RNA nuclear retaining signal sequence were sequentially inserted into the 3′-UTR of the DsRed-Monomer (abbreviated as Rm) mRNA. The LIN28 RNA nuclear retaining signal retains the host mRNA in the nucleus (13); as a consequence, the DsRed-Monomer fluorescent protein cannot be translated. Removal of the LIN28 RNA nuclear retaining signal by EcCas6 cleavage would release the DsRed-Monomer mRNA for translation in the cytoplasm. Indeed, transfection of the Rm-CBS-LIN28-C1 vector alone gave nearly undetectable DsRed-Monomer protein expression when evaluated by fluorescence microscopy.
However, introduction of EcCas6 significantly enhanced the fluorescence of the DsRed-Monomer protein, reflecting successful EcCas6-mediated cleavage. As predicted, introduction of dEcCas6 into the cells could not turn on the expression of the DsRed-Monomer protein (Figure 1B). Taken together, these results verified the effectiveness of the EcCas6-CBS system in mammalian cells and confirmed that dEcCas6 had lost its enzymatic activity. The mutation in dEcCas6 should abrogate only its endoribonuclease activity while preserving its CBS-binding ability. Employing this property, we genetically labeled dEcCas6 with EGFP and used this fluorescent chimeric protein to track target RNAs that were genetically tagged with CBS. As shown in Figure 1C, without the 20 × CBS tag on the target DsRed-Monomer mRNA, both EcCas6-EGFP and dEcCas6-EGFP were visualized in both the cytoplasm and the nucleus. With the 20 × CBS tag appended to the target mRNA, only dEcCas6-EGFP, but not EcCas6-EGFP, localized to the target mRNA in the cytoplasm. Thus, dEcCas6, when labeled with a fluorescent protein, could be applied to RNA detection as an FE-type tool.

Establishment of a VN-dEcCas6-VC based FA-type RNA tracking platform
Similar to other FE-type RNA tracking platforms, the dEcCas6-EGFP technique also suffers from high background and an unfavorable signal-to-noise ratio. It was reported that binding to CBS induced a conformational change in Thermus thermophilus Cas6 (TtCas6) that juxtaposes the N and C termini of TtCas6 (9) (Supplementary Figure S19). This finding prompted us to investigate the possibility that Cas6 could be exploited as an FA platform for tracking RNA in vivo. However, the optimal working temperature for TtCas6 is around 65 °C, precluding its use in mammalian cells. E. coli, in contrast, propagates at 37 °C. Based on the structural similarity between EcCas6 and TtCas6 (Supplementary Figure S19B) (14), we postulated that EcCas6 could be exploited for this purpose in mammalian cells.
To construct an FA-type fluorogenic reporter, we linked split-FP fragments to the ends of the dEcCas6 protein with diverse peptide linkers, generating a series of dEcCas6-split FP fusion proteins (Supplementary Figures S20-S22). An FA-type fluorogenic reporter should be non-fluorescent in the free state but turn fluorescent upon binding target RNAs. According to this criterion, a desired fusion protein, termed VN-dEcCas6-VC (53 kDa), was isolated.
This VN-dEcCas6-VC fusion protein consists of the sequentially linked Venus N-terminal fragment (1-153), linker1, dEcCas6, linker2 and Venus C-terminal fragment (154-238) (Figure 2A and Supplementary Figure S7). As a proof of concept, we transfected HEK293T cells with a plasmid expressing VN-dEcCas6-VC together with a vector expressing either the natural DsRed-Monomer mRNA or a CBS-tagged DsRed-Monomer mRNA. The 20 × CBS tag was appended at the 3′ terminus of the mRNA to minimize its influence on RNA conformation and distribution. VN-dEcCas6-VC did not display self-fluorescence when untagged DsRed-Monomer mRNA was transcribed (Figure 2B). However, the Venus fluorescence signal was detected in the cytoplasm when CBS-tagged DsRed-Monomer mRNA was used. Appending the 20 × CBS tag at the 3′ terminus did not affect translation of the mRNA, since comparable DsRed-Monomer protein signals were observed in the two groups (Figure 2B). Similar results were obtained when COS-7 and HeLa cell lines were tested (Supplementary Figure S23), indicating that the dEcCas6 based Fluorescence Complementation RNA tracking platform is applicable to a broad range of mammalian cells. Unlike the former BiFC or TriFC FA-type RNA tracking platforms, whose fluorescence complementation is mediated by indirect protein-protein interactions through juxtaposed binding to their RBSes, VN-dEcCas6-VC is a one-protein, RBS-triggered, allosteric switch-based FC (Fluorescence Complementation) RNA tracking platform. As all its unique characteristics derive from the Cas6 protein, we named this new RNA tracking platform Cas6 based Fluorescence Complementation (Cas6FC).
Finally, to validate the authenticity of Cas6FC RNA tracking, we made a side-by-side comparison between the Cas6FC platform and the RNA smiFISH (single molecule inexpensive fluorescence in situ hybridization) technique. To exclude potential bias introduced by technical discrepancies between them, the FISH probes were designed against the natural non-CBS sequence of the target RNAs, and the same batch of cells was tested by both Cas6FC and RNA smiFISH (Figure 2C). The FISH probes were linked to Cy5, a fluorophore whose emission wavelength is distant from that of Venus, enabling co-localization analysis. The ACTB mRNA in HEK293T cells and the hTERC (human telomerase RNA component) lncRNA in HeLa cells were chosen as the target RNAs for the comparison. The ACTB mRNA is cytoplasm-localized and encodes a housekeeping cytoskeleton protein (15), whereas the nucleus-localized hTERC RNA is associated with cancer cell proliferation (16). The labeling patterns of both methods were examined by confocal microscopy. The highly consistent patterns obtained with Cas6FC and RNA smiFISH for both targets (Figure 2D and E) support the use of the Cas6FC platform in probing RNAs in mammalian cells. These results also demonstrate that tagging CBS at the 3′ terminus did not affect the natural localization of the targeted RNAs.
[Figure legend fragment: ... Table 1. Representative pictures from three independent experiments.]

Cas6FC is a sensitive RNA tracking system
To further evaluate the sensitivity of the Cas6FC platform in tracking target RNAs, we used flow cytometry, which is more sensitive than fluorescence microscopy in differentiating signal from noise. Compared with the background reference provided by cells co-transfected with the Actin-GK vector (expressing β-actin protein as control) and the pDsRed-Monomer-C1 vector (expressing DsRed-Monomer mRNA and protein), cells co-transfected with the VN-dEcCas6-VC-GK vector (expressing the reporter of the Cas6FC platform) and the pDsRed-Monomer-C1 vector had almost no detectable Venus signal (0.78% positive cells versus 0% of background) (Figure 3A). In contrast, 41.9% of cells in the group co-transfected with the VN-dEcCas6-VC-GK vector and the Rm-20 × CBS-C1 vector (expressing DsRed-Monomer-20 × CBS mRNA and DsRed-Monomer protein) displayed strong Venus fluorescence. Thus, VN-dEcCas6-VC produced negligible noise in the absence of a CBS sequence in target RNAs.
Simply increasing the copy number of the RBP binding motif can improve the sensitivity of FE-type RNA tracking platforms. For instance, 24 × MBS (MCP binding sites) are often used in the MS2-MCP system to achieve optimal signals (1). However, there is a persistent concern that inserting multiple copies of an RBS into target RNAs alters their structure and/or distribution. We therefore set out to determine the minimal number of RBS (CBS in this case) copies required for the Cas6FC platform. To this end, we first tested the sensitivity of Cas6FC in detecting a 29-nt-long RNA carrying only one copy of CBS. This short RNA was transcribed under the human U6 small nuclear RNA promoter (an RNA polymerase III promoter), which is superior in generating short transcripts with high output. To our surprise, the 1 × CBS RNA could be clearly detected in HEK293T cells (Figure 3B). The signal was located exclusively in the nucleus, in line with the fact that U6 promoter-driven RNAs lack the 5′ cap and poly(A) tail of mRNAs and preferentially distribute in the nucleus. Next, we examined the dose effect of CBS on detecting a target mRNA driven by a CMV promoter (an RNA polymerase II promoter) (Figure 3C). The fluorescence intensity increased in a CBS copy number-dependent manner, and again, the mRNA carrying one copy of CBS was visible with the Cas6FC platform. In conclusion, as little as 1 × CBS (∼29 nt) was sufficient for detection by the Cas6FC platform. Insertion of such a short tag could simplify genetic manipulation and at the same time minimize the possibility of interfering with the folding and distribution of target RNAs.

The specificity of the Cas6FC RNA tracking platform
Successful application of the Cas6FC RNA tracking system depends partly on its specificity for the signature CBS binding sequence. To understand the species specificity of CBSes for EcCas6, we first analyzed the interactions between VN-dEcCas6-VC and CBS cognates in vivo. Five CBSes derived from different species were selected for this purpose (their nucleotide sequences and the corresponding Cas6 amino acid sequences are shown in Supplementary Table S2). Compared with EcCBS, these CBSes differed in 7-11 nucleotides (Figure 4A). Four copies of each cognate CBS, including EcCBS itself, were individually inserted into the 3′-UTR of the DsRed-Monomer gene. The results showed that the dEcCas6 based FC platform was highly faithful to EcCBS, since none of the CBSes derived from other species could induce a Venus signal (Figure 4A).
A previous work revealed that the CBS RNA motif had a hairpin conformation and its stem was critical for Cas6 recognition (17), different from the widely used MBS (for MS2) and PBS (for PCP) RNA tracking platforms, which rely more on the specificity of the nucleotide sequences in the loop structure (18,19). We next studied the sequence specificity of EcCBS for VN-dEcCas6-VC interaction. Based on their positions in the hairpin structure of EcCBS, the nucleotides were divided into three groups: stem-localized nucleotides (Positions 6-11 and 16-21), loop-localized nucleotides (Positions 12-15) and the remaining non-stem-loop positions. Accordingly, a series of vectors carrying the DsRed-Monomer-4 × mutant EcCBS gene were constructed. To test the specificity of the stem-localized nucleotides, we individually introduced single-nucleotide bulge mutations, transition/transversion mutations and stem-length mutations. As shown in Figure 4B, all the single-nucleotide bulge mutations, whether occurring at the ends (C6G or C11G) or in the middle (C8G) of the stem, blocked the interaction with VN-dEcCas6-VC, as manifested by the lack of fluorescence. Similarly, either transition/transversion mutations or stem-length mutations completely abrogated the interaction with VN-dEcCas6-VC (Figure 4C and D), even though these EcCBS mutants retained the stem-loop structure. These mutagenesis assays substantiate the importance of the identity of the nucleotides composing the stem structure in Cas6 recognition, in agreement with a structure-based projection predicting that the phosphate backbones of the nucleotides at Positions 16-21 of EcCBS interact with dEcCas6 through both electrostatic forces and hydrogen bonds (Supplementary Figure S24A).
[Figure legend fragments: ... Table 1. Representative pictures from three independent experiments are shown. ... Table 1.]
In contrast to the stem nucleotides, mutation of any single nucleotide residing in the loop structure had minimal effect on the interaction with VN-dEcCas6-VC (Figure 4E). According to the structure-based projection, both C14 and A15 putatively bind to dEcCas6 through hydrogen bonds (Supplementary Figure S24B). However, a C14G|A15U double mutation did not affect the interaction with VN-dEcCas6-VC either (Figure 4E). These data indicate that the identities of the nucleotides at Positions 12-15 are not critical for the interaction with VN-dEcCas6-VC.
As for the third group of nucleotides, U5, A22, U23 and A24 were predicted to interact with EcCas6 through hydrogen bonds (Supplementary Figure S24B). A22, however, should not be crucial for the interaction: it is predicted to bind His20 of EcCas6, a homolog of His26 of TtCas6 (Supplementary Figure S24C and D), yet dEcCas6, in which His20 is replaced with Ala, was still successfully used for probing CBS-carrying RNA (Figure 1C). Indeed, the A22U mutation was not found to impede the interaction (Figure 4F). Similarly, the A24U mutation had no effect. In contrast, mutation of either U5 (to A) or U23 (to A) abrogated the development of the Venus signal. Thus, the nucleotides at Positions 5 and 23, together with the stem-localized nucleotides, of EcCBS appear irreplaceable for the dEcCas6 based RNA tracking platform (Cas6FC).
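The sequence requirements described in this section are summarized in the text as a core motif spanning Positions 4-24, 5′-UUCCCCGCNNNNGCGGGGNUN-3′. As a minimal sketch, assuming N stands for any of A, C, G or U, the motif can be turned into a regular expression to scan an RNA sequence for candidate CBS-like sites. The helper names and the example sequence (in which the N positions are filled arbitrarily) are hypothetical; this is not the alignment procedure used in the study.

```python
import re
from typing import List

# Core motif for Positions 4-24 as given in the text; N = any nucleotide.
CORE_MOTIF = "UUCCCCGCNNNNGCGGGGNUN"
FIRST_POSITION = 4  # EcCBS numbering of the first motif nucleotide


def motif_to_regex(motif: str) -> str:
    """Translate a motif over A/C/G/U/N into a regular expression."""
    parts = []
    for base in motif:
        if base == "N":
            parts.append("[ACGU]")
        elif base in "ACGU":
            parts.append(base)
        else:
            raise ValueError(f"unsupported symbol: {base!r}")
    return "".join(parts)


def find_core_motif(rna: str, motif: str = CORE_MOTIF) -> List[int]:
    """Return 0-based start indices of all (overlapping) motif matches."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    lookahead = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in lookahead.finditer(rna)]


def mutate(core: str, position: int, base: str) -> str:
    """Substitute one nucleotide, addressed by EcCBS position (4-24)."""
    index = position - FIRST_POSITION
    return core[:index] + base + core[index + 1:]


# Hypothetical 21-nt core: the N positions are filled arbitrarily with A.
example_core = "UUCCCCGCAAAAGCGGGGAUA"

print(find_core_motif("GG" + example_core + "CC"))     # start index 2
print(find_core_motif(mutate(example_core, 22, "U")))  # A22U still matches
print(find_core_motif(mutate(example_core, 5, "A")))   # U5A no longer matches
```

A pattern match of this kind only flags sequence-level candidates; it says nothing about whether the site actually folds into the stem-loop in its transcript context.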
Finally, to interrogate potential off-target effects resulting from CBS-like sequences in mammalian transcriptomes, we analyzed the occurrence frequencies of the EcCBS core motif in the transcriptomes of several mammalian species. According to our data above and a previous study (14), the core motif of EcCBS for interacting with VN-dEcCas6-VC spans Positions 4-24, i.e. 5′-UUCCCCGCNNNNGCGGGGNUN-3′. This core motif was aligned to RefSeq RNA databases (NCBI) of Homo sapiens, Rattus norvegicus and