Tissue-specific transcriptome profiling of the Arabidopsis thaliana inflorescence stem reveals local cellular signatures

Genome-wide gene expression maps with a high spatial resolution have substantially accelerated molecular plant science. However, the number of characterized tissues and growth stages is still small because of the limited accessibility of most tissues for protoplast isolation. Here, we provide gene expression profiles of the mature inflorescence stem of Arabidopsis thaliana covering a comprehensive set of distinct tissues. By combining fluorescence-activated nucleus sorting and laser-capture microdissection with next generation RNA sequencing, we characterize transcriptomes of xylem vessels, fibers, the proximal and the distal cambium, phloem, phloem cap, pith, starch sheath, and epidermis cells. Our analyses classify more than 15,000 genes as being differentially expressed among different stem tissues and reveal known and novel tissue-specific cellular signatures. By determining transcription factor binding regions enriched in promoter regions of differentially expressed genes, we furthermore provide candidates for tissue-specific transcriptional regulators. Our datasets predict expression profiles of an exceptional amount of genes and allow generating hypotheses toward the spatial organization of physiological processes. Moreover, we demonstrate that information on gene expression in a broad range of mature plant tissues can be established with high spatial resolution by nuclear mRNA profiling. One sentence summary A genome-wide high-resolution gene expression map of the Arabidopsis inflorescence stem is established.


Introduction
Characterizing gene expression in individual cell types is a powerful tool for revealing local molecular signatures in multicellular organisms. By combining genetically encoded fluorescent reporters driven by tissue-specific promoters and fluorescence-activated cell sorting (FACS), high-resolution gene expression profiles have been established for developmental hotspots like the root and shoot tips of Arabidopsis thaliana (Arabidopsis) (Birnbaum et al., 2003;Brady et al., 2007;Yadav et al., 2009;Yadav et al., 2014).
Although these datasets have served as central resources for the scientific community for many years, high-resolution gene expression maps have not been developed for many other organs or tissues. One of the obstacles is the reliable isolation of RNA from more differentiated tissues and cell types. Depending on their identity or developmental stage, plant cells are surrounded by cell walls with very diverse properties. This requires extensive tuning of methods for disrupting cell walls for each case individually (Bart et al., 2006;Lin et al., 2014) or hampers the isolation overall. Even when protoplasts can be isolated, their highly diverse sizes render the subsequent sorting process challenging.
Laser capture microdissection (LCM) is an alternative tool for a precise isolation of cell material (Schad et al., 2005;Chandran et al., 2010;Agusti et al., 2011;Blokhina et al., 2016). Spatial specificity of the profiling process is lower in this case, because genetically encoded markers for cell identity can hardly be used. However, LCM is a powerful method when genetically encoded markers are not available and cell types can be clearly identified by morphology or anatomical position.
Plant stems play fundamental roles in determining shoot architecture and act as transport routes connecting photosynthetically active source organs with the remaining plant body (Sanchez et al., 2012;Serrano-Mislata and Sablowski, 2018). After the transition from vegetative to reproductive growth, Arabidopsis forms inflorescence stems, which are composed of various tissues including epidermis, cortex, starch sheath, vascular bundles and pith ( Figures. 1A, B). These tissues fulfil very special roles in plant bodies and, by acting in concert, form the stem as a functional unit. For example, the epidermis protects the plant from desiccation by building a transpiration barrier and serves as a first line of defense against pathogens (Esau, 1977;Suh et al., 2005). In turn, the starch sheath, also designated as endodermis, executes gravity sensing (Morita et al., 2002;Nawrath et al., 2013). In vascular bundles, xylem and phloem tissues are composed of specialized cell types such as xylem fibers, xylem vessel elements, phloem sieve elements and phloem companion cells enabling water and nutrient transport ( Figure. 1B). In comparison to these highly specialized cells, cambium stem cells maintain the potential to generate secondary xylem and phloem cells for increasing transport capacities and mechanical support of the growing shoot system (Suer et al., 2011). Interestingly, many of the tissues found in stems are also found in other organs like roots or leaves. However, their organ-specific profiles have not been characterized systematically.
Like in most dicotyledonous species, Arabidopsis stems undergo a major anatomical transition during the initiation of radial growth and of extensive wood formation (Sanchez et al., 2012). In stems holding a primary tissue conformation, cambium stem cells are restricted to vascular bundles whereas in secondary stems, cambium cells are also found in interfascicular regions. Thereby, cambium stem cells establish concentric domains of vascular tissues important for an organized radial growth (Sehr et al., 2010;Sanchez et al., 2012).
Considering the central role of plant stems in determining plant architecture and physiology, it is vital to have information on gene expression profiles from a comprehensive set of cell types and tissues. From these data, physiological and developmental features of stem tissues can be reveled and the organ can be characterized as a functional unit. Although several studies have provided transcriptome profiles from several stem tissues and stages (Ko and Han, 2004;Ko et al., 2004;Suh et al., 2005;Brackmann et al., 2018), a systematic analysis has not been performed and stem-specific tissues like the pith or the phloem cap have not been targeted at all.
Due to the large organ diameter and the heterogeneity of cell walls in plant stems, enzymatic protoplasting seems unsuitable for harvesting material from different stem tissues with the same efficiency. Therefore, nucleus isolation intrudes itself as an attractive alternative. Nuclei are released from tissues after manual chopping (Galbraith et al., 1983) and are possible to be isolated from individual tissues by fluorescenceactivated nucleus sorting (FANS) (Zhang et al., 2005;Zhang et al., 2008;Slane et al., 2014). Moreover, although nuclear mRNA and cytosolic mRNA have different compositions and roles (Yang et al., 2017;Choudury et al., 2019), previous studies demonstrated that cellular gene expression can be deduced accurately from nuclear mRNA levels (Zhang et al., 2008;Deal and Henikoff, 2010;You et al., 2017).
Motivated by these considerations, we employed in this study FANS and LCM to extract and profile mRNA from a large set of tissues present in the primary Arabidopsis stem.
By using tissue-specific promoters for fluorescence labeling and profiling nuclei from seven tissues and LCM for two tissues for which no promoter has been identified, we reveal spatial information on gene activities in a genome-wide fashion. Because the primary inflorescence stem contains a large spectrum of tissues including extremes like cambium stem cells and terminally differentiated cells in the vasculature, our results demonstrate the broad applicability of these approaches.

Microscopic inspection of cross sections from the second bottom-most internode
showed that only nuclei in the expected tissues were labeled by GFP (Figures 1C-J, Supplemental Figure 1) and, thus, that our lines carried GFP-positive nuclei in a tissuespecific manner.

Tissue-specific gene activity in stems can be determined by nucleus profiling
To see whether we were able to faithfully extract tissue-specific mRNA, we first focused on inner tissues and tissues producing prominent secondary cell walls expecting that isolation was most challenging in these cases. Therefore, we collected nuclei from fibers marked by NST3 pro activity, the distal cambium (SMXL5 pro ) and the phloem SMART-seq2 amplification of mRNA (Picelli et al., 2013) and RNA-seq analysis of three replicates for each sample type. Following this strategy, we found that the levels of the H4-GFP mRNA (detected through the GFP sequence) and the mRNA of the NST3, SMXL5 or APL genes whose promoters were used for expressing H4-GFP in the respective lines, were significantly enriched in extracts from GFP-positive nuclei compared to extracts from GFP-negative nuclei in all cases Supplemental Dataset 1). These findings indicated that cell type-specific transcriptomes were thoroughly accessible following our experimental pipeline.

Transcriptome analysis of seven stem tissues
Next, we performed transcriptome profiling of GFP-positive nuclei harvested from the remaining four lines (PXY pro , LTP1 pro , VND7 pro , SCR pro , Supplemental Figure  Confirming the biological relevance of the obtained profiles, xylem-related (PXY pro , NST3 pro , VND7 pro ) and phloem-related (SMXL5 pro , APL pro ) datasets showed a high correlation coefficient among each other but belonged to two different major branches within the correlation plot ( Figure 3B).
Overall, among 37,051 Arabidopsis protein-coding and non-coding genes (Cheng et al., 2017), we detected 25,679 -33,949 genes to be expressed in the respective tissues (Transcripts Per Million, TPM > 1), suggesting sufficient coverage by our RNA-seq analyses ( Table 1). The average percentage of 14.4 % to 24.3 % of reads mapping to introns compared to reads mapping to exons was higher than the 7.7 % found on average in our previous whole RNA-seq analyses of the second internode (Brackmann et al., 2018). This difference suggested that nucleus-derived RNA contains more nonprocessed transcripts compared to RNA derived from whole tissues (Table 1). In contrast, the average ratio of reads mapping to intergenic regions compared to the ratio of reads mapping to exons was comparable (Table 1)  gene (Friml et al., 2002), were also most abundant in SCR pro -positive nuclei ( Figure 4E).
Likewise, reads from NST1, known to be expressed in fiber cells (Mitsuda et al., 2005) had a maximum in NST3 pro -positive nuclei ( Figure 4G) and reads from the VND6 gene, the closest homologue of VND7 (Zhong et al., 2008) and its downstream target XYLEM CYSTEINE PROTEASE1 (XCP1) (Zhong et al., 2010;Yamaguchi et al., 2011) showed the highest activity in VND7 pro -positive nuclei ( Figures 4I, J). We also found activity of WUSCHEL RELATED HOMEOBOX4 (WOX4) ( Figure 4L), whose expression domain is congruent with the PXY expression domain (Hirakawa et al., 2008;Suer et al., 2011;Brackmann et al., 2018;Shi et al., 2019) to peak in PXY pro -positive nuclei. In addition, are expressed in SMXL5 pro -positive cells (Gursanscky et al., 2016;Miyashima et al., 2019) and peaked, as expected, in SMXL5 pro -positive nuclei ( Figures 4N-P). Although LTP1 reads were not enriched in LTP1 pro -positive nuclei ( Figure 4Q), the reads of FIDDLEHEAD (FDH) and ECERIFERUM6 (CER6), which are known to be expressed in the epidermis (Yephremov et al., 1999;Pruitt et al., 2000;Hooker et al., 2002), were most abundant in LTP1 pro -positive nuclei (Figures 4R,S). Further verifying that we succeeded in determining the transcriptome of the epidermis, we found that 32 out of 40 genes identified previously to show the highest activity levels in the stem epidermis (Suh et al., 2005) had more normalized read counts in the LTP1 pro -derived dataset compared to the overall tissue-average and the reads of 23 out of the same 40 genes were indeed most abundant in LTP1 pro-positive nuclei (Supplemental Figure 4).

Transcriptome dataset of the phloem cap and the pith
Because for the pith and the phloem cap no reliable tissue-specific promoters have been identified to date, we employed LCM to determine transcriptome profiles of these tissues. To this end, we first collected the phloem cap and the pith followed by the collection of the remaining vascular bundle which we harvested as a comparison ( Figure 5A, n = 2 replicates). Subsequently, RNA was extracted, mRNA was amplified and analyzed by RNA-seq ( Figure 5A, Supplemental Dataset 3). Confirming reliable sample preparation, replicates generated from each sample type grouped together in PCA plots ( Figure 5B). Moreover, correlation analyses showed that the phloem cap profile and the profile from the remaining vascular bundle were more similar to each other than to the pith profile ( Figure 5C). This finding was expected considering the high spatial and ontogenetic relatedness between the vascular bundle and the phloem cap.
On average, we detected 15,127 -17,393 genes to be expressed in each LCM-derived sample type (TPM > 1) suggesting a lower coverage in comparison to our FANS-based analyses ( Table 1). The average ratio of the number of reads mapped to intron regions compared to the number of reads mapped to exons for each sample type were with 6.5 % to 7.7 % in a similar range as in standard RNA-seq approaches for the Arabidopsis stem (Brackmann et al., 2018) (Table 1). Supporting reliable profiling, reads from the CYTOCHROME P450 83A1 (CYP83A1) and CYP83B1 genes, which are part of the indole and aliphatic glucosinolate biosynthetic pathways and known to be expressed in the phloem cap (Nintemann et al., 2018) were significantly enriched in phloem cap samples ( Figures 5D, E). In addition, reads from the ANAC074 and AT2G38380 genes which are known to be expressed in the pith (Fujimoto et al., 2018;Schürholz et al., 2018) were enriched in pith-derived samples (Figures 5F, G).

Identifying genes with tissue-related expression patterns
To identify genes predominantly active in individual tissues, we next applied the DESeq2 software package and the likelihood ratio test (LRT) to our FANS-derived datasets (Love et al., 2014). Thereby, we classified 14,063 genes as significantly  Figure 6). The remarkable exception in this regard were developing (SMXL5 pro ) and differentiated (APL pro ) phloem cells. In line with their strong ontogenetic relationship, genes very active in one of the two tissues were often very active in the second one ( Figure 6). In contrast, clusters containing genes that were very active in fiber cells were mostly distinct from clusters containing genes whose activity was high in developing vessel elements. Similarly, genes characterizing developing phloem cells (SMXL5 pro ) were, by large, distinct from genes characterizing developing xylem cells (PXY pro ). This suggested that, although these cell types partly originate from the same procambial precursors (Shi et al., 2019), they quickly establish very distinct profiles.
To explore the power of our SDE predictions, we next selected nine genes from the group of genes specifically active in a distinct tissue according to our FANS-derived profiles (Supplemental Dataset 7), generated promoter reporter lines and analyzed their activity in the inflorescence stem ( Figure 7G). From the nine promoter reporters generated, three (AT1G24575 pro , AT1G29520 pro , AT5G28630 pro ), behaved exactly as genes whose reads were significantly enriched in pith-derived samples in comparison to samples from the vascular bundle area (Supplemental Dataset 9, BH adjustment of p value in Wald test < 0.01, fold change >2). Performing GO term analyses for this group of genes, identified the term 'cell death' (GO:0008219) to be enriched (Supplemental Dataset 9). This was in accordance with previous findings that programed cell death is a characteristic for pith cells (Fujimoto et al., 2018). Taken together, these analyses demonstrated that our LCM-based transcriptomes recapitulated expression profiles of characterized genes and provide informative insights into phloem cap and pith-specific cellular processes.

Pairwise comparisons of tissues identify tissue-associated gene activities
To see whether other pairwise comparisons were useful for predicting tissue-specific processes, we directly contrasted expression profiles from functionally and ontogenetically related tissues. Fibers (NST3 pro ) and xylem vessels (VND7 pro ) both determine wood properties in angiosperms and show distinct morphologies which are important for conducting their specific functions (Evert, 2006). Accordingly, their functional specialties are reflected in very distinct expression profiles ( Figure 6).
Comparing NST3 pro -and VND7 pro -derived datasets, we identified 991 genes as being predominantly expressed in NST3 pro -positive nuclei and 1,503 genes as being predominantly expressed in VND7 pro -positive nuclei ( Figure 8A Because secondary xylem and phloem tissues originate from cells located in PXY pro and SMXL5 pro activity domains, respectively (Shi et al., 2019), we also contrasted expression data from those tissues. When directly comparing data from PXY pro and SMXL5 pro -positive nuclei, we identified 4,124 PXY pro -associated and 1,911 SMXL5 proassociated genes ( Figure 8B, Supplemental Dataset 12, BH adjustment of p value in Wald test < 0.01, fold change >2). Performing GO term enrichment analyses, we found the terms photosynthesis (GO:0015979), response to stimulus (GO:0051869) and translation (GO:0006412) enriched among PXY pro -associated genes, whereas the term drug transport (GO:0015893) was enriched within the group of SMXL5 pro -associated genes ( Figure 8B, Supplemental Dataset 13). Interestingly, genes expressed in the xylem of roots (combined S4, S18, JO121 datasets in (Brady et al., 2007)) were overrepresented in the group of PXY pro-associated genes but under-represented among SMXL5 pro -associated genes (p < 1e-06 in Fisher's Exact Test; Figure 8C). In comparison, genes expressed in the root phloem (combined SUC2, S32, APL, S17 datasets in (Lee et al., 2006;Brady et al., 2007)) were over-represented among SMXL5 pro -associated genes but under-represented among PXY pro -associated genes (p < 0.01 in Fisher's Exact Test; Figure 8C). This observation suggested that a substantial amount of genes is shared between vascular tissues of primary roots and stems but that there are also large differences in the molecular signatures comparing vascular tissues of both organs.

Identification of enriched transcription factor binding sites
Networks of transcription factors are vital for establishing tissue-specific gene expression profiles and, thereby, determine cellular behavior (Gaudinier and Brady, 2016). Taking advantage of our identified SDE genes, we therefore sought identifying  transcription factor was mostly active in the phloem cap itself, in addition to be expressed in the protophloem of roots (Brady et al., 2007). Transcription factor binding region enrichment analyses for pith-associated genes predicted a set of 61 potential regulators ( Table 2 (Zhou et al., 2014), our analysis indeed holds the potential to identify tissuespecific regulators in the inflorescence stem. In turn, the large number of clusters for which no enrichment was detected suggests that, in general, established tissues do not depend on a small set of transcriptional regulators maintaining identity of mature tissues.

Discussion
Organs as functional units are composed of different tissues determining distinct aspects of their performance. By combining FANS and LCM-based tissue harvesting with RNA-seq analyses, we established a tissue-specific gene expression atlas of the primary Arabidopsis inflorescence stem. In addition to be accessible via the GEO depository (Barrett et al., 2013) under the accession number GSE142034, data for genes of interest can be accessed via a web-site allowing extraction of expression profiles (http://arabidopsis-stem.cos.uni-heidelberg.de/). Several observations underline the robustness of our mRNA profiling. First, the activities of six out of seven genes whose promoters were used for tissue-specific nucleus labeling were found to peak in the respective tissues. This observation suggests that the chosen promoters mostly recapitulate the expression patterns of endogenous genes.
Moreover, nuclei keep a sufficient amount of their mRNA content during the isolation process and the contamination by cytosolic mRNA, which is released during tissue disruption, is apparently low. The fact that LTP1 activity did not peak in LTP1 pro -positive nuclei, may reveal discrepancies between the activity pattern of our chosen LTP1 promoter and the distribution of the endogenous LTP1 mRNA. It is also important to be aware that the stability or movement of the LTP1 mRNA and the H4-GFP mRNA driven by the LTP1 promoter may be very different. This may result in discrepancies between patterns of H4-GFP protein accumulation and LTP1 mRNA accumulation. However, our comparative analyses suggests that we succeeded in isolating epidermis-specific mRNA.
The second observation supporting robustness of obtained profiles is that promoters from genes we predicted to have a distinct spatial pattern recapitulated, by large, these patterns. In addition to the possible reasons resulting in differences between the accumulation of endogenous transcripts and the H4-GFP protein discussed above, it is important to note that the promoter regions upstream of respective start codons, which we chose to drive fluorescent reporters, may miss regulatory elements substantially influencing gene expression of endogenous genes (Raatz et al., 2011). Therefore, a certain level of differences between FANS/LCM-derived profiles and reporter activities are expected and do not necessarily argue for a low predictive power of our transcriptional profiles. As a third observation arguing for the relevance of our profiles, the activity pattern of genes and pathways known to be associated with certain tissues, was reflected in our datasets. Taken together, we conclude that the obtained expression data provide a realistic picture of gene activities in the Arabidopsis inflorescence stem.
Although cytosolic and nuclear mRNA populations may differ due to a tight regulation of nuclear export or differences in mRNA homeostasis (Yang et al., 2017;Choudury et al., 2019), our results indicate that gene expression profiles of mature plant organs can be faithfully characterized by profiling nuclear mRNA. Interestingly, our RNA-seq results show that the sensitivity of FANS/RNA-seq-based analyses is comparable or even higher than conventional profiling of whole organs by RNA-seq (Table 1)

DNA vector construction
An H4-GFP-containing construct was a gift from Daniel Schubert (Freie Universität Berlin, Germany). NST3 pro :H4-GFP (pMS59), VND7 pro :H4-GFP (pTOM61), APL pro :H4-GFP (pPS02), SCR pro :H4-GFP (pPS20) and LTP1 pro :H4-GFP (pPS16) were cloned using the pGreen0229 vector (Hellens et al., 2000) as a backbone. Cloning of APL promoter and terminator regions was described previously (Sehr et al., 2010). Primer sequences to clone promoter and terminator regions of other genes and to amplify H4-GFP sequence with appropriate sequence for restriction enzyme are listed in Supplemental Table 1. Constructs for promoter reporters were generated using the GreenGate system (Lampropoulos et al., 2013). Promoter regions were amplified using primers listed in Supplemental Table 1, cloned into the pGGA000 vector and used for GreenGate reactions with the ER signal peptide sequence (pGGB006), the GFP sequence (pGGC014), the HDEL sequence (pGGD008), the RBCS terminator sequence (pGGE001), the BASTA resistance sequence (pGGF001) and the destination vector (pGGZ003) (Lampropoulos et al., 2013). Hoechst detection. Autofluorescence at 405 nm excitation was detected using a 585/42 bandpass filter. FSC (forward scatter) detector voltage was set to 55 and a 1.0 neutral density filter was used. SSC (side scatter) detector voltage was set to 275, Hoechst detector voltage to 352 and GFP detector voltage to 379. Events were triggered on FSC using a threshold of 5,000 and Hoechst with a threshold of 400. All the events were first filtered by FSC and SSC (Supplemental Fig 2E, gate 1). Nuclei were then identified based on Hoechst fluorescence. Doublets and aggregates were characterized by their higher Hoechst signal width values and were excluded from sorting by plotting Hoechst signal width against Hoechst signal area (Supplemental Figure 2F, gate 2). To exclude false positive events due to autofluorescence in the yellow-green spectral range, the GFP detection channel was plotted against a yellow fluorescence detection channel and only events with low yellow signal intensity were selected for sorting (Supplemental Figure 2G, gate 3). Then, the gate for GFP positive nuclei selection were set using wild type plants as a reference (Supplemental Figure 2H, gate 4). DNA content distribution was used to monitor the whole procedure (Supplemental Figure 2I).
RNA was precipitated and washed according to the manufacture's protocol, and then resuspended in 15 µl water. 1 µl of the total RNA was used for mRNA-seq library construction using the Smart-seq2 protocol (Picelli et al., 2013;Hofmann et al., 2019).
The resulting cDNA was quantified using the Qubit Fluorometer (Thermo Fisher, Waltham, U.S.A.) and the dsDNA High Sensitivity Assay (Thermo Fisher, Waltham, U.S.A.). DNA quality was checked on the Agilent Bioanalyzer (Agilent, Santa Clara, U.S.A.). Libraries for next generation sequencing were generated using the NEBNext Ultra II gDNA prep kit with the NEBNext Multiplex Oligos for Illumina (New England Biolabs, Ipswich, U.S.A.). Prior to library generation, the samples were fragmented using the Covaris S2 system (Covaris, Woburn, U.S.A.). Libraries were sequenced in the single-end 50 base mode on an Illumina HiSeq 2500 machine (Illumina, San Diego, U.S.A.). When more than three samples were prepared, amplified cDNA samples were subjected to PCR analysis amplifying the H4-GFP sequence using (H4GFPfor4 and GFPrev3 primers; see Supplemental Table 1 for sequence). The preparations showing high contrast between GFP-positive and GFP-negative samples were chosen for subsequent RNA-seq analysis.

Laser capture microdissection
LCM, subsequent RNA extraction and amplification was carried out as previously described . Library preparation and RNA sequencing were carried out by BGI Genomics (Shenzhen, China) using HiSeq 2000 (Illumina, San Diego, U.S.A.) using the single-end mode 50 base mode.

Bioinformatic analyses
The TAIR10 genome sequence was obtained through Ensemble Plants

Accessing RNA sequencing data
Raw data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (Barrett et al., 2013) and are accessible through GEO Series accession number GSE142034 at https://www.ncbi.nlm.nih.gov/geo/. In addition, data for genes of interest can be accessed via a web-site based tool allowing extraction of expression profiles (http://arabidopsis-stem.cos.uni-heidelberg.de/).

Accession Numbers
Arabidopsis

Supplemental Data
Supplemental Table 1. Primers used in this study.
Supplemental Table 2. Basic statistics of RNA-seq dataset in this study.          Summary of RNA-seq results for the different tissues from the second internode collected by FANS, for the whole organ by conventional RNA extraction (Brackmann et al., 2018) and collected by LCM. n = 2 or 3 for each dataset.