SMARTer single cell total RNA sequencing

Abstract Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3′ end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation and on two other cancer cell lines sorted in microplates. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.


INTRODUCTION
To understand the complexity of life, knowledge of cells as fundamental units is key. Recently, technological advances have emerged to enable single cell RNA sequencing (RNAseq). In 2009, Tang et al. published the first single cell RNAseq protocol in which cells were picked manually and transcripts reverse transcribed using a polydT primer (1). As the throughput was low, new methods using early multiplexing, such as STRT-seq and SCRB-seq, were introduced in which cells were pooled at an early step in the workflow, enabling processing of many cells in parallel (2)(3)(4). In contrast to these methods that have inherent 3 end or 5 end bias, Smart-seq2 generates read coverage across the whole transcript expanding the spectrum of applications as this method can be used for fusion detection, single nucleotide variants (SNV) analysis and splicing, beyond typical gene expression profiling applications (5,6). To reduce the polymerase chain reaction (PCR) bias generated in the aforementioned methods, CEL-seq and MARS-seq were introduced using linear in vitro transcription (IVT) instead of PCR to obtain enough cDNA for sequencing (7)(8)(9). Most recently, droplet and split-pool ligation based methods capturing thousands of single cells were developed, providing new insights in cellular heterogeneity and rare cell types (10)(11)(12)(13)(14). The main drawback of these methods is that analyses are typically confined to gene expression of only (3 ends of) polyadenylated transcripts (Table 1).
More complex analyses with respect to alternative splicing, allele specific expression, mutation analysis, assembly of (novel) transcripts, circular RNA (circRNA) quantification and post-transcriptional regulation, require full-length and full-transcriptome methods. Moreover, sequencing a large number of cells is often compromising sequencing depth, resulting in low coverage per cell and detection of only the most abundant transcripts (15). In contrast to these droplet-based methods, microfluidic chip and flowcytometry based platforms typically capture fewer cells, but are able to sequence entire transcripts and detect a substantially higher number of genes per cell, providing a more complete view of the complexity and richness of single cells' transcriptomes (6,8). Of note, most single cell RNA-seq studies assess only 3 end polyadenylated (polyA[+]) transcripts, ignoring non-polyadenylated (polyA [-]) transcripts (Table 1) (6,12,14). Since a substantial part of the human transcrip-  tome is non-polyadenylated, various RNA types, including circRNAs, enhancer RNAs, histone RNAs, and a sizable fraction of long non-coding RNAs (lncRNAs), are not quantified using these classic methods (16)(17)(18). In order to study polyA [-] transcripts at the single cell level, total RNAseq workflows were developed (19)(20)(21). While in principle both polyA[+] and polyA [-] transcripts are converted into a sequencing-ready library using random primer mediated reverse transcription, these methods suffer from one or more of the following limitations: the strand-orientation information is lost and a high percentage of reads map to ribosomal RNA (rRNA) ( Table 1). Therefore, new methods circumventing these limitations are warranted. A rRNA depletion step is essential as up to 95% of the total RNA content in a mammalian cell consists of rRNA. Moreover, to discriminate sense and antisense overlapping transcripts, stranded sequencing is crucial; at least 38% of the annotated transcripts in cancer cells have antisense expression (22). Here, we developed a novel easy to use and efficient single cell total RNA-seq workflow based on the SMARTer Stranded Total RNA-Seq Kit -Pico Input Mammalian combining for the first time strandeness and effective removal of ribosomal cDNA (Table 1). We ported the method to Fluidigm's C1 single cell microfluidics instrument, and demonstrated that the method works equally well on FACS sorted cells in microplates. In total, 458 cells from five different human cancer cell lines in four experiments were sequenced with a total sequencing depth of 1528 million reads. Using our novel method, we consistently observe <3% of ribosomal reads and we detect >5360 genes by at least four reads, including novel genes, polyA[-] genes and circRNAs.

Cell lines
The

Cell cycle analysis
Four million cells were washed with PBS (Gibco, 14190094) and the pellet was resuspended in 300 l PBS. Next, 700 l of 70% ice-cold ethanol was added dropwise while vortexing to fix the cells. Cells were stored at −20 • C for at least 1 h. After incubation, cells were washed with PBS and the pellet was resuspended in 1 ml PBS containing RNAse A (Qiagen, 19101) at a final concentration of 0.2 mg/ml. After 1 h incubation at 37 • C, propidium iodide (BD biosciences, 556463) was added to a final concentration of 40 g/ml. Samples were loaded on a S3 cell sorter (Bio-Rad) and analyzed using the FlowJo v.10 software.

RNA isolation and cDNA synthesis
Total RNA was isolated using the miRNeasy mini kit (Qiagen, 217084) with DNA digestion on-column accord- Nucleic Acids Research, 2019, Vol. 47, No. 16 e93 ing to the manufacturer's instructions. RNA concentration was measured using spectrophotometry (Nanodrop 1000, Thermo Fisher Scientific). cDNA was synthesized using the iScript Advanced cDNA synthesis kit (Bio-Rad, 1708897) using 500 ng RNA as input in a 20 l reaction. cDNA was diluted to 2.5 ng/l with nuclease-free water prior to RT-qPCR measurements.

Single cell total RNA library preparation of nutlin-3 treated NGP cells
Cells were washed with PBS and pellets of vehicle treated cells were resuspended and incubated in 1 ml pre-warmed (37 • C) cell tracker (CellTracker Green BODIPY Dye, Thermo fisher Scientific, C2102) for 20 min at room temperature. After incubation, cells were washed in PBS and resuspended in 1 ml wash buffer (Fluidigm, 100-6201). An equal number of stained (vehicle treated) and non-stained (nutlin-3 treated) cells were mixed and diluted to 300 000 cells/ml. Suspension buffer (Fluidigm) was added to the cells in a 3:2 ratio and 6 l of this mix of was loaded on a primed C1 Single-Cell Open App IFC (Fluidigm, 100-8134) designed for medium-sized cells (10-17 m). Cells were captured using the 'SMARTer single cell total RNA-seq' script deposited in Script Hub (Fluidigm). Upon capture, cells were visualized using the Axio Observer Z1 (Zeiss) and a median multiplet rate of 34.54% was observed over all experiments. These cells were excluded from further analyses. Sequencing libraries were generated using the C1 running the 'SMARTer single cell total RNA-seq' script deposited on Script Hub. In short, the SMARTer Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian (Pico v2, total RNA, Takara, 634413) was used to synthesize cDNA with following modifications. Cells were fragmented and lysed by loading 7 l of 10× reaction mix [2.3 l SMART Pico Oligo Mix v2, 6 l 5× first-strand buffer, 1 l 20× C1 loading reagent (Fluidigm), 3 l lysis mix (19 l 10× lysis buffer, 1 l RNAse inhibitor (40 U/l)), 1 l 1/1000 diluted ERCC spikes (Ambion, 4456740), 6.7 l water] and incubating the cells at 85 • C for 6 minutes (to lyse cells and fragment RNA) followed by 2 min at 10 • C. Next, 8 l first strand master mix [1 l C1 loading reagent, 4 l 5× first-strand buffer, 0.9 l RNAse inhibitor (40 U/l), 3.5 l SMARTScribe reverse transcriptase (100 U/l), 7.9 l SMART TSO Mix v2 (from Takara kit, 634413), 2.7 l water] was loaded and incubated at 42 • C for 90 min followed by 70 • C for 10 min. Finally, a PCR master mix for each well was made [1 l 20× loading reagent, 2 l 2.4 M forward primer (Takara, 634413), 2 l 2.4 M reverse primer, 13.1 l 1.5× PCR mix (1050 l 2× SeqAmp CB buffer, 42 l SeqAmp DNA polymerase, 308 l water)] and 5 l of each of these mixes was loaded in the harvest wells of the IFC. The samples were incubated for 1 min at 94 • C followed by 11 PCR cycles (30 s at 98 • C, 15 s at 55 • C, 30 s at 68 • C) and 2 min at 68 • C. Following this initial cDNA amplification, 12 wells were pooled per tube using 8 l of cDNA per cell. Next steps of the library prep were performed according to manufacturer's instructions with minor modifications. 13 PCR cycles were used for PCR2 and a 1:1 ratio was used for beads cleanup after PCR2. Next, the samples were resuspended in 22 l 5 mM tris buffer (from kit) and 20 l was used to perform a second beads cleanup using a 0.9:1 ratio. Finally, the samples were resuspended in 12 l tris buffer and the quality was determined on the Fragment Analyzer (Advanced Analytical). Of note, the protocol can also be executed using the single cell specific version of the kit, released by Takara (SMART-Seq Stranded Kit, 634442) after we had completed our C1 experiments.

Single cell polyA[+] RNA library preparation of nutlin-3 treated NGP cells
Vehicle treated cells were stained with cell tracker as described above. An equal number of stained (vehicle treated) and non-stained (nutlin-3 treated) cells were mixed and diluted to 300 000 cells/ml. Suspension buffer was added to the cells in a 3:2 ratio and 6 l of this mix of was loaded on a primed C1 Single-Cell Auto Prep Array for mRNA Seq (Fluidigm, 100-6041) designed for medium-sized cells (10-17 m). Single cell polyA[+] RNA sequencing on the C1 was performed using the SMART-Seq v4 Ultra Low Input RNA Kit for the Fluidigm C1 System (SMART-Seq v4, polyA[+] RNA, Takara, 635026) according to manufacturer's instructions. One microliter of the ERCC spikein mix was diluted in 999 l loading buffer to get a 1/1000 dilution of the ERCC spikes. One microliter of this dilution was added to the 20 l lysis mix. The quality of the cDNA was checked for 11 random single cells on the Fragment Analyzer. The concentration of the cells was measured using the quantifluor dsDNA kit (Promega, E2670) and glo-

Single cell total RNA library preparation of FACS sorted A375 and Jurkat cells
Cells were processed using the SMARTer Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian (Takara, 634413) or the SMART-Seq Stranded Kit (Takara, 634444) reagents according to the manufacturer's instructions with some modifications that were also implemented in the C1 protocol. For the SMART-Seq Stranded Kit, the Ultra Low Input workflow described in the user manual was followed by pooling of eight samples according to Appendix A of the user manual. For the SMARTer Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian, the cells were also processed as described for the SMART-Seq Stranded Kit, but using the reagents specific to the SMARTer Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian, which were also used for the C1 protocol. For both kits, cells sorted in a lysis solution instead of 1× PBS were processed without addition of lysis buffer. Identital to the C1 protocol, the initial RNA shearing step was performed at 85 • C for 6 min and 10 and 13 PCR cycles were carried out for PCR1 and PCR2, respectively.

Library sequencing
All libraries were quantified using the KAPA library quantification kit (Roche) and libraries were diluted to 4 nM. For NGP, the polyA[+] RNA library and total RNA library were pooled in a 1/4 ratio and 1.5 pM of the pooled library was single-end sequenced on a NextSeq 500 (Illumina) with a read length of 75 bp and a total sequencing read depth of 274 million reads, combining single cell polyA[+] and total RNA libraries to prevent inter-run bias. A median sequencing read depth of 0.81 and 3.67 million reads per cell was reached for the single cell polyA[+] and total RNA libraries, respectively. In addition, 1.3 pM of the total RNA library was also sequenced in 2 × 75 paired-end sequencing run mode on the NextSeq 500, yielding 327 million reads and a median sequencing read depth of and 4.04 million per cell. The fastq data is deposited in GEO (GSE119984). A375 and Jurkat total RNA libraries were pooled and 1.2 pM of the pooled library was sequenced in 2 × 75 pairedend run mode on the NextSeq 500, yielding 41 million reads. FASTQ data is deposited in GEO (GSE130578).

Sequencing data quality control
While single-end sequencing libraries do not require pretrimming, the paired-end libraries were trimmed using cutadapt (v.1.16) (23) to remove three nucleotides of the 5 end of read 2. To assess the quality of the data, the reads were mapped using STAR (v.2.5.3) (24) on the hg38 genome including the full ribosomal DNA (45S, 5.8S and 5S) and mitochondrial DNA sequences. The parameters of STAR were set to retain only primary mapping reads, meaning that for multi-mapping reads only the best scoring location is retained. Using SAMtools (v1.6) (25), reads mapping to the different nuclear chromosomes, mitochondrial DNA and rRNA were extracted and annotated as exonic, intronic or intergenic. In contrast to the unstranded nature of polyA[+] Smart-seq v4 data, the total RNA SMARTerseq data is stranded and processed accordingly (unless explicitely mentioned). Gene body coverage was calculated using the full Ensembl (v91) (26) transcriptome. The coverage per percentile was calculated, followed by a loess regression fit.

Circular RNA detection
CircRNAs were detected using the deeper sequenced paired-end sequencing data. Trim galore (v.0.4.1) was used to trim adaptor sequences, perform quality filtering and remove three nucleotides from the 5 end of read 2. Subsequently, reads from all samples were combined, adding originating sample names to read names for later splitting of data. The combined data was used for circRNA detection using find circ (v.1) (31) using the reads2sample (find circ.py -r) option to allow circRNA detection on the combined dataset while dividing out the contribution from each sample in the output. Only circRNAs with unique mapping on both anchors were accepted. Human genome hg19 was used for circRNA analysis. CircRNAs were annotated with host gene names from RefSeq (release 75) and circBase IDs from circbase.org. The Database for Annotation, Visualisation and Integrated Discovery (DAVID, v.6.8) (32,33) was used for Gene Ontology (GO) analysis for the circRNA host genes using biological processes (BP) and molecular function (MF). P-value <0.05 was used for statistical significance.  transcriptomes were merged with the Ensembl (v.91) transcriptome as a reference. From the merged multi-cell transcriptome, only multi-exonic genes with a minimum length of 200 nt were retained. To define the set of novel genes, genes annotated in Ensembl (26) or LNCipedia (v.5.0) (28) were filtered out. All genes in this novel multi-cell transcriptome were quantified using Kallisto on single-end subsampled data (1, 4 or 8 million reads per cell). Genes with an estimated count higher than 1 were retained.

Principle of SMARTer single cell total RNA sequencing
We developed a single cell total RNA-seq protocol for unbiased, full transcript and strand-specific analysis of both polyadenylated and non-polyadenylated transcripts from mammalian cells. The method uses reagents from the SMARTer Stranded Total RNA-Seq Kit v2 -Pico Input Mammalian (Pico v2, total RNA), a kit that is meant for low input bulk total RNA-seq, whereby we optimized reaction volumes, number of PCR cycles, and duration and temperature of the RNA fragmentation. The library preparation method employs random primers and a template switching mechanism to capture full transcript fragments of both polyadenylated (polyA[+]) and non-polyadenylated (polyA[-]) transcripts. Unwanted ribosomal cDNA is removed using probes, complementary to mammalian rRNA. After successfully porting the bulk library prep protocol to Fluidigm's C1 single cell instrument, we assessed the performance of the single cell total RNA-seq protocol through three distinct experiments in which nutlin-3, JQ1 or doxy-cycline was used to treat NGP, SK-N-BE-2C and SHSY5Y-MYCN-TR neuroblastoma cell lines, respectively (with vehicle treated cells as control) (Figure 1). In addition, we performed matched single cell polyA[+] RNA-seq as a reference using cells from the same pool. While all experiments were successful, we focus our analyses and performance assessment on the NGP data. In this experiment, the treated and control cells were processed in the same microfluidic chip (preventing possible chip bias), the highest number of cells were captured, and the highest sequencing depth was reached.

SMARTer single cell total RNA sequencing yields highquality data
In single cell sequencing experiments, it is important to prevent or limit potential biases that mask true biological differences. In particular, the cell cycle state is a known confounder (35). Therefore, we synchronized the cells through serum starvation for 24 hours. Upon synchronization, 80.3% of the NGP cells showed an arrest at the G0/G1 stage compared to only 53.3% for non-synchronized NGP cells (Supplementary Figure S1A and B). Subsequently, the synchronized NGP cells were treated for 24 h with vehicle or nutlin-3, the latter known to release TP53 from its negative regulator MDM2. 95% CI], respectively). Nevertheless, the fraction of nuclear rRNA is very low in the total RNA libraries considering the use of random priming data (Figure 2A), and substantially lower compared to the RAMDA-seq method (9.667% rRNA [9.615, 9.719; 95% CI], Supplementary Figure S7) Figure 2B). Non-polyadenylated histone genes are highly abundant in the total RNA libraries, while low or absent in the polyA[+] libraries, confirming the validity of our single cell total RNA-seq workflow (Supplementary Figure  S8). Equal results were obtained for the SK-N-BE-2C, and SHSY5Y-MYCN-TR cell lines (Supplementary Figure S9).

SMARTer single cell total RNA sequencing reveals a unique set of genes
More reads map to long intergenic RNAs (lincRNAs) using the single cell total RNA-seq protocol ( Figure S10). Considering both polyA[+] RNA-seq and total RNA-seq data, 3978 different antisensesense relationships with an overlap of >200 nucleotides were detected with expression of the sense or antisense gene in at least one cell. These loci are prone to erroneous quantification. Quantification of the stranded SMARTer data in an unstranded way shows that 42.1% (median of 180 of the 428 detected antisense genes per cell) of the detected antisense genes (in six random cells) are receiving counts, while they have zero counts when properly treated as stranded data; further, 10.1% of the antisense genes detected in both analyses display fold change differences larger than 2 (Supplementry Figure S11). Most of these genes with fold change differences (87.0%) are more abundant in the unstranded analysis compared to the stranded analysis, explained by the fact that these antisense genes are consuming counts from the sense gene.
LincRNAs, antisense genes and pseudogenes are clearly expressed in fewer cells compared to protein coding genes. We hypothesize that low abundant genes might be missed because of sampling bias during the sequencing workflow or that lincRNAs, often low abundant in nature, are expressed under specific conditions or stimuli (Figure 4)    million) in the total RNA library protocol results in the detection of a higher number of genes. We observed no saturation when generating 8 million reads per cell, suggesting that deeper sequencing could yield even more detected genes ( Figure 3). The overlap between protein coding genes detected in the polyA[+] and total RNA libraries (subsampled for 1 million reads/cell and mean expression of at least 1 read over all cells) ( Figure 5A) is high. Genes detected in only one of the library types are generally lower abundant compared to genes detected with both methods ( Figure 5B). In contrast to protein coding genes, the overlap for lincR-NAs between the methods is much smaller ( Figure 5C). Importantly, a significant fraction of the total RNA-seq spe-  cific lncRNAs display a high expression, thus possibly representing functionally important RNAs ( Figure 5D). Lin-cRNA RMRP is one of the most abundant lincRNAs that is solely detected by our novel single cell total RNA-seq workflow. This gene is known to be 3 non-adenylated and is the first known RNA encoded by a single-copy nuclear gene imported into mitochondria (38,39). As only a subset of the lincRNAs and antisense genes are currently annotated in Ensembl, we also quantified our libraries with the LNCipedia transcriptome (the most comprehensive human resource of both antisense and lincRNA genes, further referred to as lncRNAs). While the number of detected lncR-NAs is slightly lower in the total RNA-seq libraries if an equal number of reads (1 million) is used, each library type contains a certain proportion of unique lncRNAs (Supplementary Figure S12). LNCipedia is likely biased towards medium-to-high abundant polyadenylated lncRNAs.

SMARTer single cell total RNA sequencing detects circular RNAs and novel genes
In addition to linear RNA biotypes, we tested whether the single cell total RNA-seq protocol is able to quantify cir-cRNAs as this class of non-coding RNAs lacks a polyAtail and in principle can only be detected using unbiased total RNA-seq. With a requirement of at least two unique back-spliced junction reads, 537 circRNAs were identified derived from 460 host genes (Supplementary Table S2). The majority of the circRNAs were found in fewer than 3 out of 64 cells, with only 14 circRNAs detected in at least four cells. Gene Ontology analysis for molecular functions and biological processes was performed on the circRNA host  genes from both treated and untreated cells. A significant enrichment of TP53 binding, TP53 pathway, cell cycle, and chromosome organization suggests that the identified cir-cRNAs may play a role in these biological functions.
In the single cell total RNA libraries, the fraction of intergenic reads (relative to existing Ensembl and LNCipedia annotation) is high, suggesting that these reads originate from novel unannotated transcripts. To validate this hypothesis, we generated genome and transcriptome guided transcriptome assembly of the paired-end single cell total RNA-seq data resulting in 5360 novel, multi-exonic genes. The novel transcripts have a median length of 317 nucleotides (Figure 6A) and consist on average of more than three exons ( Figure 6B). Quantification of this novel transcriptome using the single-end data subsampled at 1 million reads per cell resulted in a median number of 59 novel genes per cell [55-63; 95% CI] ( Figure 6C). Of note, most novel genes are expressed in only one cell ( Figure 6D).

SMARTer single cell total RNA profiles reflect the biological signal
To assess whether the single cell total RNA-seq protocol is also able to reveal known biological signal, we performed differential expression analysis using DESeq2 combined with the Zinger method coping with zero inflated data. Based on the ranking obtained by the DESeq2 test statistic, gene set enrichment analysis using the hallmark gene sets was performed. Firstly, the same gene sets are significantly enriched in both library preparation protocols (Figure 7A); secondly, TP53 target genes are--as expected--the most significantly enriched gene set ( Figure 7B), confirming that the biological signal is recapitulated through single cell total RNA-seq analyses.

SMARTer single cell total RNA sequencing of FACS sorted cells in microplates
To demonstrate that our novel single cell total RNA seq method also efficiently works on FACS sorted cells in microplates, we processed A375 and Jurkat sorted cells. In parallel, Takara's single cell purposed SMART-Seq Stranded Kit was also tested on these cells (Figure 1). Equally low amounts of ribosomal cDNA were sequenced using both reagent kits, i.e. 1 (Figure 8, Supplementary Figure S13). Similar to the total RNA seq libraries generated on the C1 system, we analysed the number of reads assigned to intron, exon and intergenic regions and the read fraction for all RNA biotypes. The microplate sorted single cell data was very comparable to the C1 data ( Figure 8, Supplementary Figure S13).

DISCUSSION
In this study, we developed a single cell total RNA-seq method to sequence full transcripts from single cells in an essentially unbiased manner. To demonstrate the perfor-mance of the method, we applied single cell total RNAseq in four experiments on five different cancer cell lines, of which three undergoing a specific perturbation. In parallel, we also performed single cell polyA[+] RNA-seq on three cell lines using the well-established Smart-seq v4 method (6,40). As in any genomics study, the experimental set-up may suffer from confounding factors, such as variations in cell cycle states of the cells and batch effects of single cell capture and sequencing, masking real biological differences.
In two of the four experiments, we carefully controlled all these experimental biases. The cell cycle bias was minimized  by cell cycle synchronization using serum starvation. We also avoided potential cell selection bias by capturing differentially labeled treated and untreated cells on the same chip (35,36). Finally, sequencing bias was minimized by sequencing both polyA[+] and total RNA libraries on the same Illumina flow cells.
The single cell total RNA-seq method has some distinctive advantages compared to other methods. First, in any total RNA-seq library, depletion of rRNA is essential as this makes up the bulk of the total RNA mass. Depletion of rRNA from single cells prior to cDNA synthesis is technically very difficult. Here, we used ribosomal cDNA specific removal probes, resulting in <3% of ribosomal reads per single cell library. This highly efficient rRNA depletion step is a major improvement compared to RAMDA-seq, where 10-35% of the reads map to rRNA (20). Second, given the stranded nature of the single cell total RNA sequencing data, quantification of antisense genes is accurate, which is not possible when using unstranded data. In contrast to the three existing single cell total RNA-seq methods, our method uniquely combines these two features that are highly desirable for total RNA-sequencing (19)(20)(21). Third, as expected, our single cell total RNA libraries contain substantially more intronic reads compared to polyA[+] RNA libraries (41,42). Such intronic reads can be used to detect changes in nascent transcription, whereby the difference in exonic and intronic reads provides insights in posttranscriptional regulation (43). As such, we believe that our method may be particularly well suited for 'RNA velocity analysis' of single cells (44). Fourth, the single cell total RNA-seq workflow presented in this paper detects relatively more protein coding genes, pseudogenes, lincRNAs and miscellaneous RNA (miscRNA) compared to single cell polyA[+] RNA libraries, when corrected for equal sequencing depth. While the number of detected genes increases with sequencing depth, there seems to be no plateau yet at 8 million reads, suggesting that further increasing the sequencing depth, could enable low abundant gene detection. Fifth, our method also detects non-polyadenylated RNA molecules, such as histone genes, lncRNAs and cir-cRNAs. In the NGP dataset, 537 circRNAs were detected using reads with evidence for back splicing. In order to detect more circRNAs in an individual cell, a higher sequencing depth is required or libraries should be enriched for cir-cRNAs by selectively removing linear RNA by exonuclease treatment prior to library prep and sequencing (18). Sixth, the data enables reference guided transcriptome assembly, resulting in the detection of 5360 novel genes. Finally, differential gene expression analysis and gene set enrichment of NGP cells treated with nutlin-3 confirmed activation of the TP53 pathway at the transcriptional level.
One limitation of the implementation of the single cell total RNA library preparation method on the C1 instrument is the relatively low throughput, as maximally 96 cells are simultaneously captured. In contrast, current dropletbased single cell methods capture thousands of individual cells, but these systems are limited to 3 end sequencing of polyadenylated RNA, preventing quantification of splice variants and non-polyadenylated transcripts. To enable the analysis of higher cell numbers, we demonstrated that the method works equally well on FACS sorted cells in microplates. By using FACS sorted cells the throughput can be increased and no specialized devices, such as the C1, are required. Finally, an advantage of our total RNAseq protocol on both C1 and in microplates is that singleend sequencing is sufficient while more expensive pairedend sequencing is required for most droplet-based methods. We advice to use the single cell total RNA-seq method rather than polyA[+] methods if it is desired to study nonpolyadenylated RNA molecules such as lncRNAs or circR-NAs, if strand-specific data is a must and if full transcript sequencing is priority (e.g. analysis of alternative splicing, RNA editing or somatic mutations).

DATA AVAILABILITY
The SMARTer single cell total RNA sequencing script is deposited in Script Hub (Fluidigm). The fastq files and  processed data is available through GEO (GSE119984 and GSE130578).