Comparative cell-specific transcriptomics reveals differentiation of C4 photosynthesis pathways in switchgrass and other C4 lineages

Highlight Determination of mesophyll and bundle sheath cell-specific transcriptomes for the monocot NAD-ME C4 plant switchgrass reveals both evolutionary divergence and common elements in C4 establishment.


Introduction
C 4 species are among the world's most important food, feed, and bioenergy crops, including maize (Zea mays), sugarcane (Saccharum officinarum), sorghum (Sorghum bicolor), and switchgrass (Panicum virgatum) (Brutnell et al., 2010;John et al., 2014). C 4 plants have evolved the C 4 cycle pathway, which creates a CO 2 pump that concentrates CO 2 around the carboxylating enzyme Rubisco (Christin et al., 2009). In most C 4 species, this is achieved through integrating the two CO 2 assimilation pathways spatially into two discrete cell types, namely mesophyll (M) and bundle sheath (BS) cells (Sheen, 1999;John et al., 2014). C 4 plants can achieve high photosynthetic efficiency and consequently decrease photorespiration. These characteristics enable C 4 plants to thrive in tropical and subtropical environments that induce high rates of photorespiration in C 3 plants (Brutnell et al., 2010).
In C 4 plants, CO 2 fixation is a two-step process. Atmospheric CO 2 is initially fixed by phosphoenolpyruvate carboxylase (PEPC) in the cytosol of M cells. The resulting four-carbon dicarboxylic acid oxaloacetate (OAA) is converted to malate or aspartate. These C 4 acids are then transferred into the inner compartment of the BS, where they are decarboxylated to release CO 2 to the Calvin cycle (Hatch, 1987;Furbank and Taylor, 1995;Furbank et al., 2004). The mechanism of decarboxylation in BS cells has been traditionally divided into three different C 4 types; NAD-malic enzyme (NAD-ME), NADP-malic enzyme (NADP-ME), and phosphoenolpyruvate carboxykinase (PEPCK) (Hatch, 1987), with some degree of flexibility or co-existence (Wang et al., 2014) (Fig. 1). However, only NAD-ME and NADP-ME subtypes are considered to be exclusive subtypes from specific lineages; both contain the PEPCK cycle as the supplementary photosynthetic pathway (Wang et al., 2014;Liu and Osborne, 2015). C 4 photosynthesis is thought to have first arisen ~30 million years ago and is found in >66 independent lineages of monocotyledons and dicotyledons (Aubry et al., 2014;Burgess and Hibberd, 2015) (Fig. 1). Most of the C 4 species occur in the grasses (~4600) and sedges (~1600) of monocots, while only 1600 C 4 species are found in the dicots (Gowik and Westhoff, 2011). The mechanism of establishment of C 4 photosynthesis is still unclear. Genes involved in C 4 photosynthesis recruited existing C 3 chloroplastic ancestors and did not evolve de novo (Gowik and Westhoff, 2011;Maier et al., 2011). C 4 photosynthesis may have emerged through gene duplication, promoter insertion, alteration of leaf structure, and establishment of the photorespiratory CO 2 pump (Gowik and Westhoff, 2011;Maier et al., 2011).
A distinct set of transcripts are preferentially expressed in the fully specialized M or BS cells to enable their co-operation in carbon fixation (Majeran et al., 2005). Comparison of transcriptomes for M and BS cell-specific gene expression in different lineages of C 4 plants will provide insight into the commonality and differentiation of regulatory mechanisms for the compartmentalization of C 4 photosynthesis in its independent evolutionary origins (Burgess and Hibberd, 2015). Only limited lineages have so far been assessed for M and BS cell-specific profiles; two for the monocot NADP-ME subtype and one for the dicot NAD-ME subtype. Chang et al. (2012) and Li et al. (2010) generated M and BS transcriptome profiles from mature leaves of maize, a monocot NADP-ME subtype. Pairwise comparisons of M and BS transcriptomes have been conducted for maize versus Setaria viridis (monocot NADP-ME subtype), and maize versus Cleome gynandra (dicot NAD-ME subtype) (Li et al., 2010;Chang et al., 2012;Aubry et al., 2014;John et al., 2014). However, the transcript expression profiles of M and BS cells in a monocot NAD-ME subtype C 4 plant have yet to be determined, resulting in the absence of a global comparison of cell-specific transcriptomes in the two distinct subtypes of monocot and dicot C 4 plants.
Switchgrass (Panicum virgatum L.) is being targeted as a source of biomass for biofuel production (McLaughlin Fig. 1. Phylogenetic tree of selected C 4 and C 3 species and carbon fixation process of C 4 leaves. (A) Phylogenetic analysis of C 4 model species with rice, Arabidopsis, and Brachypodium C 3 model species, drawn based on Bennetzen et al. (2012). (B) The major biochemical cycles in NAD-ME and NADP-ME subtypes of C4 photosynthesis. AlaAT, aspartate aminotransferase; AspAT, aspartate aminotransferase; CA, carbonic anhydrase; PEPC, phosphoenolpyruvate carboxylase; PEPCK, phosphoenolpyruvate carboxykinase; NAD-ME, NAD-dependent malic enzyme; NAD-MDH, NAD-dependent malate dehydrogenase; NADP-ME, NADP-dependent malic enzyme; NADP-MDH, NADP-dependent malate dehydrogenase; PPDK, pyruvate/orthophosphate dikinase; Ala, alanine; Asp, aspartate; Mal, malate; Pyr, pyruvate; OAA, oxaloacetic acid; PEP, phosphoenolpyruvate. (This figure is available in colour at JXB online.) and Kszos, 2005;Bouton, 2007), and is here selected as the representative of the monocotyledonous NAD-ME-type C 4 plant (Christin et al., 2009). We first defined the transcriptomic profiles of M and BS cells from switchgrass by manually isolating M and BS cells from mature leaves. We further applied comparative transcriptome analysis to dissect the evolutionary convergence and divergence of monocot and dicot C 4 plants: switchgrass, maize, Setaria viridis, and Cleome gynandra. Our study provides an overview of the functional differentiation and co-ordination of M and BS cells in the C 4 photosynthesis pathway, and provides insights into the possible evolutionary pathway of differentiation in C 4 photosynthesis.

Materials and methods
Plant material, growth conditions, and enzyme assays Switchgrass plants (cultivar Alamo-AP13) were grown in the greenhouse at 28 °C and 16/8 h day/night conditions, using supplemental lighting from halide lamps (250 mol photons m −2 s −1 ). For age gradient experiments, leaves were harvested from the first fully expanded leaves to the fourth leaves of three-node stage plants. For leaf developmental gradient experiments, leaves were harvested as the top, middle, and base sections from the second or third leaf of three-node stage plants. Samples were immediately frozen in liquid nitrogen and extracted as described (Sommer et al., 2012). Enzyme activity measurements of NAD-ME, NADP-ME, PEPCK, aspartate aminotransferase (AspAT), and alanine aminotransferase (AlaAT) were conducted using coupled spectrophotometric assays as described previously (Sommer et al., 2012). All assays were performed at 28 °C. The concentration of total soluble protein was determined by the Bradford method (Bradford, 1976) using the Bio-Rad protein assay reagent at 595 nm.

Fixation, cryosectioning, and cell-specific micro-dissection
The middle sections from the second or third leaf of three-node stage plants were fixed and subsequently used for micro-dissection. A Leica CM 1850 cryostat (Leica Microsystems Inc., Bannockburn, IL, USA) was used for obtaining cross-sections of switchgrass leaves, which were processed by fixation and cryoprotection as described previously (Srivastava et al., 2010). We found that the optimum longitudinal section thickness was 10 μm. For obtaining M and BS cells, leaf sections were manually dissected with a scalpel blade under a fluorescence stereo microscope. Around 1000 cells were collected into each tube and stored at −80 °C, and every 5-10 tubes were used together for RNA isolation.

RNA extraction and analysis
Total RNA was extracted from manually harvested M and BS cells using an RNeasy Micro Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's protocol. RNA quality and quantity were estimated using an RNA 6000 Pico chip on an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). RNA was then amplified using an Arcturus ® RiboAmp ® HS PLUS 2-round kit (Life Technologies, CA, USA) following the manufacturer's protocol (Li et al., 2010). At least 1 ng of starting total RNA was used for amplification of each replicate. The quality of amplified RNA was checked with the Bioanalyzer 2100, using the RNA 6000 nanochip.
Construction of cDNA libraries and sequencing procedure As previously described (Rao et al., 2014), 1 μg of total RNA for each sample was used to construct an RNA sequencing (RNA-seq) library using TruSeq RNA Sample Prep Kits v2 (Illumina Inc., San Diego, CA, USA), according to the manufacturer's instructions, at the Genomics Core Facility at the Noble Foundation (Ardmore, OK, USA). Each library was indexed. Six libraries with different indexes were pooled together in one lane for 100 bp paired-end sequencing. The Hiseq2000 run was conducted at the Genomics Core Facility of the Oklahoma Medical Research Foundation (Oklahoma City, OK, USA).

Determination of transcript abundance
All paired-end Illumina reads were trimmed using in-house PERL scripts with two filters, quality scores of ≤20 from the end of each read, and poly(A) (forward sequencing) or poly(T) (reverse sequencing) tails >15 bp in length. Reads <50 bp in length after trimming were discarded along with their mates. The trimmed reads were mapped to the Switchgrass genome Panicum virgatum v1.1 (http://phytozome.jgi.doe.gov/) using Bowtie 2 version 2.0.0 (Langmead and Salzberg, 2012) and TopHat version 2.0.10 (Kim et al., 2013) in conjunction with SAMtools version 0.1 (Li et al., 2009) with default parameters. Based on Panicum virgatum v1.1 annotation, normalized gene expression levels were calculated in FPKM (fragments per kilobase of exon model per million mapped fragments) (Trapnell et al., 2010) using Cufflinks version 2.1.1 (default settings, set to 100 mean inner distance for paired-end reads; Trapnell et al., 2013) and in TPM (transcripts per million) (Wagner et al., 2012) using RSEM version 1.2.23 (default settings; Li and Dewey, 2011). For differential expression testing, alignments of reads from Tophat v2.0.10 were counted to genes using HT-SEQ with union mode (Anders et al., 2015), then analyzed using DESeq2 (Love et al., 2014). A Benjamini-Hochberg corrected P-value of <0.05 was set to identify differentially expressed genes. Raw data are provided in Supplementary Table S1 at JXB online.

Data annotation
Panicum virgatum v1.1 gene annotation was downloaded from the Phytozome v10.3 website (http://phytozome.jgi.doe.gov/). Classification for cell type-enriched genes was based on MapMan mappings of their Arabidopsis homologs (Thimm et al., 2004). Significant functional enrichment was determined using Fisher's exact test with Benjamini-Hochberg multiple testing correction [false discovery rate (FDR) ≤0.1]. Identification of transcription factors (TFs) in Panicum virgatum v1.1 genes was based on their Arabidopsis and rice homologs and the annotation from the Plant Transcription Factor Database v3.0 (Jin et al., 2014). Identification of homologous genes among switchgrass, maize, and S. viridis was performed using alignments of Panicum virgatum v1.1, Setaria italica v2.1, and Zea mays v5b.60 protein sequences obtained from Phytozome v10.3 using blast+ tools. The syntenic gene set of maize, sorghum, and rice was obtained from Schnable et al. (2012) and that of switchgrass, S. viridis, and maize was generated for pairwise comparisons using SynMap (Lyons et al., 2008) with default parameters (score ≥90).

Real-time quantitative reverse transcription-PCR
Total RNA was isolated from the second and third leaf, leaf sheath, and stem of three-node stage switchgrass plants. cDNA synthesis, primer design, and real-time quantitative reverse transcription-PCRs (qRT-PCRs) were performed as previously described (Rao et al., 2014). qRT-PCR was performed in triplicate for each sample, and three biological replicates were evaluated for each gene tested. Data were collected and analyzed using PikoReal Software (Thermo Scientific). PCR efficiency was estimated using PikoReal software, and transcript levels were determined by relative quantification using the actin gene as a reference (Gimeno et al., 2014).
In situ hybridization For in situ hybridization, samples of middle sections from the second or third leaf of three-node stage switchgrass plants were harvested. Tissue fixation, dehydration, and paraffin embedding were performed according to Long's protocol (http://www.its.caltech. edu/~plantlab/protocols/insitu.pdf). Pre-hybridization, hybridization, and washing were conducted on the robotic GenePaint™ system (Tecan Inc., Durham, NC, USA) following the manufacturer's instructions (Zhou et al., 2011).

Construction of phylogenetic trees
For phylogenetic reconstruction, multiple protein sequences were aligned using the ClustalW algorithm (Larkin et al., 2007). Neighbor-Joining (NJ) phylogenetic trees were generated by the MEGA 6.0 program (Tamura et al., 2013), using 1000 replicates of bootstrap analysis. Protein sequences were obtained from a previous report (Maier et al., 2011).

Accession numbers
The data sets supporting the results of this article are available in the NCBI Sequence Read Archive (SRA) repository, NCBI SRA accession no. SRX1160366.

Activity of C 4 enzymes in the switchgrass leaf
To determine the differences in CO 2 fixation enzymes along a developmental gradient in switchgrass leaves as a basis for selection of tissues for transcriptomic analysis, we measured the activities of enzymes involved in C 4 photosynthetic pathways in crude leaf extracts. For the age gradient experiment, we selected from the first to the fourth leaf blades as young, mature, and old samples from plants with three nodes, in the elongation stage of the plant (Moore et al., 1991). The three decarboxylating enzymes NAD-ME, NADP-ME, and PEPCK were assayed, as well as AspAT and AlaAT ( Fig. 2; Supplementary Table S2). This analysis confirmed that switchgrass performs C 4 photosynthesis of the NAD-ME type with a minimal PEPCK decarboxylating pathway (Warner et al., 1987). Compared with young and old samples, the mature leaves (the second and third leaves) showed significantly higher activity of NAD-ME, AspAT, and AlaAT. A base to top gradient experiment was further conducted on the second and the third leaves to assess area-related differences in C 4 enzyme activities of mature leaves. The classical NAD-ME subtype enzymes NAD-ME and AlaAT showed lower activities in the base section of leaves compared with the top or middle sections. This is consistent with the results of transcriptomic and C 4 enzyme activity assays in maize leaves, since cells in the leaf base are younger with minimal photosynthetic activity (Li et al., 2010;Pick et al., 2011). Considering the C 4 enzyme activity and the size of the leaf cells, the middle sections in the mature zones of the second and third leaves were selected for subsequent experiments.

Isolation of bundle sheath and mesophyll cells
Cross-sections from selected parts of the switchgrass leaf showed the typical Kranz anatomy structure of NAD-ME subtype C 4 plants; a layer of small M cells surrounding the layer of large BS cells with the inner mestome sheath and vascular tissue (Supplementary Fig. S1) (Edwards and Walker, 1983;Edwards et al., 2001). As observed in a previous study (Warner et al., 1987), the >10 μm width of BS and M cells makes hand dissection feasible ( Supplementary Fig. S1). Hand dissection has been successfully applied for tissue-specific and single-cell analysis in plant research, and can be simply carried out by use of a razor blade (Outlaw and Zhang, 2001). In this study, leaf fragments were cut into longitudinal cryosections with optimal thickness of 10 μm, and BS and M cells were obtained by manual micro-dissection under a fluorescence stereo microscope ( Supplementary Fig. S1). RNA isolated from BS and M cells ( Supplementary Fig. S2) was of suitable quality for amplification, library construction, and sequencing.

Deep sequencing, read mapping, and identification of expressed genes
Duplicated BS and M cell samples from switchgrass leaves were sequenced on the HiSeq™ Sequencing System to generate paired-end reads. Clean sequence reads for BS and M samples were mapped to the switchgrass reference genome Panicum virgatum v1.1, resulting in 79% of the cleaned reads mapped to the reference genome (Table 1). Considering the high complexity of the tetraploid switchgrass genome Means and standard errors from three independent biological replicates are plotted. The significant differences by the Student's t-test between pairwise samples are provided in Supplementary Table S2 at JXB online. (Sharma et al., 2012) and the fact that Panicum virgatum v1.1 represents a partial genome sequence, a moderate ratio of mapped reads is expected.
Normalized transcript levels were calculated using RSEM version 1.2.23 (Li and Dewey, 2011) and Cufflinks version 2.1.1  in terms of TPM (Wagner et al., 2012) and FPKM (Trapnell et al., 2010). To define 'differentially expressed genes', we used the criterion of adjusted P≤0.05 between the two RNA samples. On this basis, 5122 and 4630 genes were considered as differentially expressed in BS and M cells, respectively. Moreover, increasing the criterion of difference in fold value, more genes were enriched in BS cells than in M cells (Table 2). For example, 332 genes in BS but only 199 genes in M showed a 16-fold difference in expression level. This finding is consistent with observations in the other C 4 plants maize (Chang et al., 2012) and S. viridis (John et al., 2014), and suggests that BS cells play a more important role in C 4 photosynthesis and may also differentially perform other metabolic processes (Chang et al., 2012).

Estimation of C 4 pathway transcript abundance in the switchgrass leaf
The main difference in the photosynthetic pathway between C 3 and C 4 plants is that in C 4 plants the CO 2 concentration and assimilation mechanisms are divided between M and BS cells (Chang et al., 2012). Gene families that encode proteins with key roles in C 4 photosynthesis were detected in our M/ BS transcriptome data set (Supplementary Table S3). Among them, 25 genes were selected for further analysis due to strong cell-specific expression ( Table 3). The fold change between M and BS samples was similar using either the FPKM or TPM value (Supplementary Table S3). Because TPM eliminates statistical biases compared with FPKM (Wagner et al., 2012), we used TPM to present expression levels of C 4 genes in the following sections.
The C 4 marker genes encoding carbonic anhydrase (CA), pyruvate orthophosphate dikinase (PPDK), PEPC, and NAD-dependent malate dehydrogenase (NADP-MDH; Fig. 1B) that are involved in carbon fixation (Chang et al., 2012) were preferentially expressed in M cells, whereas NAD-MEs, NAD-MDH, and PCK that are involved in releasing CO 2 from C 4 acid (Chang et al., 2012), and nine genes encoding enzymes involved in the Calvin cycle, showed much higher transcript levels in BS cells than in M cells. In addition, phosphoglycerate kinase (PGK), glyceraldehyde 3-phosphate dehydrogenase B subunit (GADPH-B), and triosephosphate isomerase (TPI), that work in M cells to allow balancing of reducing equivalents between the two cell types (John et al., 2014), were preferentially expressed in the M cells.
On the basis of previously published microarray analyses (Zhang et al., 2013), C 4 marker genes in switchgrass displayed a higher expression level in leaf blade compared with other major tissues and organs such as leaf sheath and stem ( Supplementary Fig. S3). The high expression of C 4 marker genes in the mature switchgrass leaf was confirmed through qRT-PCR analysis ( Supplementary Fig. S3). To validate further the digital expression profile, NAD-ME, CA, PEPCK, phosphoribulokinase (PRK), PEPC, and PPDK were selected for localization analysis by in situ hybridization. Labeled antisense probes were used to hybridize with the target mRNAs in situ, while labeled sense probes were used as negative controls (Srivastava et al., 2010). The sequences of the primers used for qRT-PCR and in situ hybridization are given in Supplementary Table S4. As expected, NAD-ME, CA, PEPCK, and PRK were preferentially expressed in BS cells, whereas PEPC and PPDK transcripts were enriched in M cells (Fig. 3). These results are consistent with those from the transcriptome data set, and indicate that our isolation methods caused only low-level cross-contamination of M and BS cells.

Convergence in C 4 transcript abundance among monocot and dicot C 4 plants
We compared C 4 transcript abundance in the M and BS transcriptomes from switchgrass (present work), S. viridis (John et al., 2014), maize (Chang et al., 2012), and C. gynandra (Aubry et al., 2014) (Fig. 4; Supplementary Table S5). Maize, S. viridis, and switchgrass belong to the C 4 grasses, and are classified into two distinct subtypes, NADP-ME and NAD-ME (Wang et al., 2014). The dicot plant C. gynandra performs NAD-ME-type C 4 photosynthesis (Sommer et al., 2012). The major biochemical processes for C 4 carbon fixation in the two C 4 photosynthetic subtypes are shown in Fig. 1B. Transcripts involved in the C 4 cycle showed very similar compartmentalization between M and BS cells in the grass species; genes encoding enzymes involved in CO 2 fixation (CA, PEPC, PPDK, and NADP-MDH) and the Calvin cycle [ribulose bisphosphate carboxylase small chain (RBCS), Rubisco activase (RCA), PRK, ribose 5-phosphate isomerase (RPI), ribulose-phosphate 3-epimerase (RPE), transketolase (TKL), sedoheptulose-bisphosphatase (SBP), fructose-1,6-bisphosphatase (FBP), and fructose-bisphosphate aldolase (FBA)] were enriched in M and BS cells, respectively. Transcripts from the C. gynandra transcriptome showed reduced enrichment in BS cells, possibly due to posttranscriptional regulation (Aubry et al., 2014). The consistency of C 4 core gene expression preference in four distinct C 4 lineages highlights the evolutionary convergence in the C 4 pathway.
The GADPH of land plants includes two plastidic tetramer isoforms named A 4 and A 2 B 2 (Fermani et al., 2012). Transcripts encoding GADPH-B were clearly enriched in M cells in the four species, consistent with the role of the A 2 B 2 isoform in catalyzing the reducing step of the Calvin cycle in  M cells. In contrast, transcripts encoding GADPH-A accumulated in both cell types, with a slightly higher ratio in BS cells. Furthermore, transcripts encoding CP12, a small protein involved in regulation of the Calvin cycle by forming a supercomplex with PRK and GADPH (Groben et al., 2010;Lopez-Calcagno et al., 2014), were exclusively found in BS cells. This suggests that in BS cells, CP12 and A 4 GADPH mainly function together for regulation of the activity of PRK, in agreement with the observation of formation of a stable CP12-A 4 GADPH complex in the dark (Fermani et al., 2012). PEPCK, the decarboxylation enzyme that utilizes OAA to release CO 2 in the cytosol (Hatch, 1987), is enriched in BS cells at moderate expression levels in all four C 4 species. This is consistent with the detection of PEPCK activity in switchgrass, maize, and C. gynandra (Prendergast et al., 1987;Sommer et al., 2012), and suggests that a mixed and flexible decarboxylation system commonly exists in NAD-ME and NADP-ME subtypes, through an additional service of the PEPCK cycle (Furbank, 2011;John et al., 2014).

Divergence in C 4 transcript abundance between NAD-ME and NADP-ME subtypes
C 4 plants have evolved different means to decarboxylate organic carbon compounds to release CO 2 at the site of Rubisco; NADP-ME decarboxylates malate to pyruvate in chloroplasts, whereas NAD-ME decarboxylates malate to pyruvate in mitochondria (Brautigam et al., 2014) (Fig. 1B). Transcripts encoding NADP-ME, rather than NAD-ME or NAD-MDH, show enrichment in BS cells in two different lineages of the NADP-ME subtype (maize and S. viridis), whereas NAD-ME is enriched in BS cells in the monocot switchgrass and the dicot C. gynandra (Fig. 4). In the phylogenetic tree constructed with known NAD-ME sequences, four switchgrass NAD-ME genes detected in our M/BS transcriptome data set are classified into two distinct clusters, and photosynthetic NAD-ME in switchgrass belongs to the NAD-ME 2 group (Supplementary Fig. S4).
Interestingly, transcripts of two switchgrass genes (PvEa00263 and PvEb00308) that are annotated as chloroplastic NADP-ME showed high accumulation in BS cells. The high expression level of NADP-ME genes in switchgrass leaves has been observed in several previous reports (Zhang et al., 2013;Meyer et al., 2014;Palmer et al., 2015) and here was confirmed by qRT-PCR analysis using multiple pairs of primers ( Supplementary Fig. S3; Supplementary Table S4). To address the evolutionary origin of switchgrass NADP-MEs, a phylogenetic tree was constructed from protein sequence alignments of NADP-MEs from C 4 and C 3 plants ( Supplementary Fig. S5). The two highly expressed switchgrass NADP-ME genes clustered with the C 4 NADP-ME of S. viridis, and share identity to the non-photosynthetic plastidic NADP-ME in maize (Saigo et al., 2004), which presents high intrinsic NADP-ME activity but is expressed constitutively at low levels (Maier et al., 2011). Considering the low NADP-ME activity detected in switchgrass leaves, it is likely that in switchgrass either strong post-transcriptional or translational control leads to degradation or loss of activity of NADP-ME in BS cells, or the two genes encode low activity NADP-ME isoforms. To assess further the possible derivation of C 4 functional decarboxylation enzyme from the same ancestral genomic region (Schnable et al., 2012), we identified the syntenic orthologs of NAD-ME and NADP-ME isoforms among C 4 grasses and outgroup C 3 rice ( Supplementary Fig.  S6). Interestingly, NAD-ME 2 genes (functional C 4 group) were syntenic within C 4 grasses (switchgrass, S. viridis, maize, and sorgum), whereas NAD-ME 1 genes were syntenic between C 4 grasses and C 3 rice. Genes in non-C 4 and C 4 NADP-ME groups were syntenic in switchgrass, S. viridis, maize, and sorghum. Overall, it is possible that the photosynthetic and non-photosynthetic isoforms found in C 4 plants could present at some level within the most recent common ancestor (Washburn et al., 2015) but may or may not have been recruited into C 4 photosynthesis depending on selective pressure.
In NAD-ME and PEPCK subtypes, carboxylation and decarboxylation are linked by the transfer of aspartic acid and alanine in M/BS cells, whereas NADP-ME subtypes mainly recruit malate/pyruvate shuttles between the two cell types (Weber and von Caemmerer, 2010). The amino acid aspartate in the chloroplast of M cells is generated by chloroplastic AspAT, then enters the BS cells where it is converted into OAA in the mitochondria by mitochondrial AspAT (Weber and von Caemmerer, 2010) (Fig. 1B). In switchgrass, transcripts encoding chloroplastic and mitochondrial AspAT are abundant in M and BS cells, respectively, which supports a function in maintaining the ammonia balance between M/BS cell types (Weber and von Caemmerer, 2010). In contrast, AspAT shows reduced expression in BS cells in the NADP-ME subtypes maize and S. viridis. Transcripts encoding AlaAT are significantly accumulated in both M and BS cells of C. gynandra, consistent with a dual function in converting from/into alanine in M/BS cells. In contrast, the AlaAT in the NADP-ME subtypes (maize and S. viridis) displayed unequal distribution in M and BS cells. However, the apparent low expression level of the AlaAT gene in switchgrass is possibly artifactual due to the incompleteness of the switchgrass genome. At least 20 expressed sequence tags (ESTs) were predicted to represent genes encoding AlaAT in switchgrass Zhang et al., 2013); one EST (AP13CTG00178) was highly expressed in mature leaves with similar expression pattern to other C 4 genes (SupplementaryFig. S3), but is not represented in the current genome assembly Panicum virgatum v1.1.
Differences in C 4 shuttle/transport between NAD-ME and NADP-ME subtypes C 4 photosynthesis recruits a high rate of metabolic exchange between the two main cell types, which requires multiple transporters in M and BS chloroplast envelopes to facilitate this metabolic collaboration (Majeran and van Wijk, 2009). Based on the present study and previous work (Majeran and van Wijk, 2009;Weber and von Caemmerer, 2010;Brautigam et al., 2014), the detailed processes of C 4 photosynthesis in the NAD-ME subtype are summarized in Fig. 5.
In both NAD-ME and NADP-ME subtype plants, PEP generated from pyruvate in M cell chloroplasts is exported into the cytosol by a PEP/phosphate translocator (PPT) . Similarly in maize and S. viridis (Chang et al., 2012), two transcripts (Pv.Fa00799, M TPM=913; and Pv.Fb00666, M TPM=650) encoding PPT were enriched in M cells, whereas the other three (Pv.J33600, Pv.Ea00394, and Pv.J15038) were enriched in BS cells. This suggests that PPT encoded by Pv.Fa00799 and Pv.Fb00666 may specifically export PEP and import triose phosphate in the M chloroplasts. The triose phosphate translocator (TPT) is thought to export triose phosphates and 3-phosphoglycerate (3-PGA) from chloroplasts to the cytosol in M and BS cells; this is necessary to generate half reducing equivalents for the Calvin cycle (Knappe et al., 2003;Weber and von Caemmerer, 2010). In switchgrass, maize, and S. viridis (Chang et al., 2012), genes encoding TPT were expressed at a high level in both M and BS cells. In C. gynandra, PPT (AT5G33320) and TPT (AT5G46110) display high expression levels in both cell types as well. This suggests that C 4 plants recruit PPT and TPT to transfer PEP/phosphate and triose phosphate, respectively, in both NAD-ME and NADP-ME subtypes.
For NADP-ME type C 4 photosynthesis, DiT1 (a 2-oxoglutarate/malate transporter) exchanges OAA/malate across the M cell chloroplast envelope membrane and DiT2 imports malate into BS chloroplasts (Majeran and van Wijk, 2009;Chang et al., 2012). These processes are not required for NAD-ME-type C 4 photosynthesis . Genes encoding DiT1 were preferentially expressed in M cells and genes encoding DiT2 were only weakly expressed in both M and BS cell types in switchgrass (Supplementary  Table S6). No increased transcript abundance of DiT1 in C 4 (C. gynandra) was detected compared with its closely related C 3 species (C. spinosa) . This suggests that DiT1 may not act specifically for C 4 metabolite transport in M cells in the NAD-ME subtype because it dually acts as the malate valve and as a 2-oxoglutarate/malate transporter for photorespiratory nitrogen recycling (Kinoshita et al., 2011). For NAD-ME-type C 4 photosynthesis, malate is required to be generated from OAA in mitochondria or transported into mitochondria (Fig. 5). We found that transcripts encoding the mitochondrial malate/phosphate carrier DIC (Pavir.Fb01780) and the mitochondrial phosphate transporter PIC2 (Pavir.Aa00590) were significantly enriched in switchgrass BS cells with a high expression level. Similar BS cell-type enrichment and high expression of genes encoding two mitochondrial transporters (DIC, AT2G22500; and PIC2, AT5G14040) (Brautigam et al., 2014;Jia et al., 2015) was also observed in C. gynandra, but DIC expression was not enriched in BS cells of maize and S. vidiris. This suggests a requirement for the combination of the malate phosphate antiporter DIC with the phosphate/proton transporter PIC2 to pump malate into mitochondria in NAD-ME species, but not in NADP-ME species (Fig. 5).

Global comparisons of the M and BS transcriptomes among monocot and dicot C 4 plants
To define the extent to which gene expression patterns are conserved or divergent in monocot and dicot C 4 plants, the same adjusted P≤0.05 between M and BS samples was applied as the definition of 'differentially expressed genes' for switchgrass, S. viridis (John et al., 2014), maize (Chang et al., 2012), and C. gynandra (Aubry et al., 2014). Among S. viridis, maize, and C. gynandra, 5049 and 4631 transcripts, 6691 and 7647 transcripts, and 372 and 338 transcripts were enriched in M or BS cells, respectively. In our switchgrass transcriptome data set, 4630 and 5122 genes were considered as differentially expressed in M or BS cells, respectively (Supplementary Table S6). The higher numbers in switchgrass and maize are probably due to the highly duplicated and complex natures of their genomes (John et al., 2014).
Switchgrass, S. viridis, and maize are close relatives among the C 4 grasses (Fig. 1A). To determine if homologous genes underlie C 4 photosynthetic development in C 4 grasses, homology alignment was applied to differentially expressed genes in switchgrass, S. viridis, and maize (Fig. 5B). A total of 378 and 452 homologous genes are preferentially expressed in M and BS cells of these three C 4 grasses. Pairwise comparison showed that S. viridis and maize share the most homologs that are preferentially expressed in M and BS cells, whereas fewer homologs are shared between S. viridis and switchgrass, and between switchgrass and maize. Classification into the same subtype contributes to this higher degree of similarity in M-and BS-enriched genes of maize and S. viridis, though switchgrass and S. viridis have a very close phylogenetic relationship ( Fig. 1A) (Bennetzen et al., 2012).
To compare the functional differentiation between M and BS cells in C 4 plants, the biological functions of cell-specific enriched genes were visualized using Mapman. Between 74% and 100% of the cell-specific enriched genes in the four C 4 species could be assigned into 35 main categories ( Fig. 6; Supplementary Table S7). To evaluate the convergence in the distribution of each category in C 4 plants, we compared the gene number assigned into categories for pairs of species in M and BS cells, respectively. The Pearson's correlation coefficient (r 2 ) ranged from 0.68 to 0.96, indicating a high degree of convergence in distribution of biological processes in the four distinct C 4 species (Supplementary  Table S7).
Using the Fisher exact test with FDR ≤0.1, we identified two and one, 12 and five, seven and nine, and 14 and seven main functional groups significantly enriched in M and BS cells of C. gynandra, S. viridis, maize, and switchgrass, respectively (Fig. 6A). Similarity in distribution of genes in certain categories was observed in these four C 4 species. For example, considering that primary carbon assimilation is restricted to BS cells, transcripts in the categories of the TCA (tricarboxylic acid) cycle and major CHO (carbohydrate) metabolism were significantly enriched in BS cells of S. viridis, maize, and switchgrass, and moderately enriched in BS cells of C. gynandra. The preferential expression of genes involved in transport in BS cells is consistent with the fact that these cells are surrounded by vascular tissue for release and transport of metabolites to other plant parts (Christin et al., 2009).
However, many differences were detected in functional enrichment between M and BS cells in the four C 4 plants. For instance, transcripts encoding maize genes in the photosystem (PS) and mitochondrial electron transport/ATP synthesis categories were more abundant in M cells, whereas in the other three C 4 species, these genes were preferentially expressed in BS cells. This is possibly because maize is depleted in PSII in the BS cells to lower oxygen production in these cells (Gowik and Westhoff, 2011).
Further analysis of subcategories within the main categories of protein and RNA revealed more divergence in functional enrichment within the four C 4 plant species ( Fig. 6B; Supplementary Fig. S7). Interestingly, genes involved in protein synthesis, targeting, and folding displayed preferential expression in BS cells in switchgrass and equal distribution in both cell types in C. gynandra, but strong enrichment in M cells in the two NADP-ME-type plants, S. viridis and maize. Strong enrichment in BS cells of genes involved in protein post-translational modification, protein glycosylation, and protein degradation mediated by AAA-type and ubiquitin proteases was observed in maize, whereas equal distribution in both cell types or moderate enrichment in M cells were found in switchgrass, S. viridis, and C. gynandra. In addition, genes involved in RNA regulation of transcripts were abundant in M cells of switchgrass, but are more abundant in BS cells of maize and S. viridis. This suggests that at both the post-transcriptional and translational levels, switchgrass, maize, and S. viridis may display different machineries for RNA transcriptional regulation and protein import, sorting, folding, assembly, and degradation that facilitate the preferential accumulation of proteins and accommodate protein homeostasis between M and BS cells. We suggest that the functional differentiation of post-transcriptional and translational regulatory mechanisms in M or BS cells of NAD-MEtype switchgrass, and NADP-ME-type maize and S. viridis, might be a result of their distinct evolutionary pathways, and/ or may be associated with the accommodation of the different distribution of metabolites within M and BS cells of these two subtypes. For instance, NADP-ME-type plants require extra ATP produced by cyclic electron transport in BS cells to balance the production of NADPH for preventing photoinhibition or photodamage, and this is not required by NAD-ME-type plants (Nakamura et al., 2013;Walker et al., 2014).

Convergence and divergence of cell type-enriched transcription factors in C 4 plants
Given that differentially expressed TFs will play a major role in regulatory networks for the functional differentiation of M and BS cell types (Li et al., 2010;Chang et al., 2012), we annotated and identified TFs in the M and BS cell transcriptomes in switchgrass based on their homologs to Arabidopsis and rice in the Plant TF Database v3.0 (Jin et al., 2014). Compared with the annotated TFs of S. viridis (John et al., 2014), maize (Chang et al., 2012), and C. gynandra (Aubry et al., 2014), 243 and 175, 232 and 296, 274 and 412, and 23 and 20 TFs were identified as enriched genes in M and BS cells of switchgrass, S. viridis, maize, and C. gynandra, respectively (Supplementary Table S8). These genes were assigned into 60 TF families according to the Plant TF Database v3.0 (Jin et al., 2014). Consistent with previous studies (Li et al., 2010;Chang et al., 2012), many TF families showed preferential cell type expression, indicating an overall differentiation in the regulatory networks underlying the functional differentiation of the two photosynthetic cell types in C 4 leaves. However, similarities and differences were observed in the distributions of cell type preference of TF families within the four distinct C 4 plants. Overall, maize and S. viridis shared more similarities in the distributions in TF families, resulting in Pearson's correlation coefficients (r 2 ) of 0.70 and 0.89 in M and BS cells, respectively. In contrast, fewer similarities in distribution of cell type preference of TFs were shared between the monocot switchgrass and the dicot C. gynandra, although they belong to the same NAD-ME subtype.
To assess the potential co-option of TFs derived from a common ancestor, syntenic orthologs of TFs were determined between maize, S. viridis, and switchgrass. Switchgrass shared 34 and 13 TFs in M cells, and 44 and 18 TFs in BS cells as syntenic orthologs with S. viridis and maize, respectively. Among them, two and 10 syntenic ortholog TFs overlapped with M and BS cell-type enrichment, respectively, in the three C 4 grass genomes. Interestingly, one and three syntenic ortholog TFs in maize with M and BS cell specificity, respectively, have been reported as regulatory candidates in differentiation of C 4 Kranz anatomy (Wang et al., 2013;Tausta et al., 2014). For example, SCR is highly expressed in the BS in the three C 4 grasses and has been proven to play an essential role in BS cell fate specification in plants (Wysocka-Diller et al., 2000;Slewinski et al., 2012;Cui et al., 2014). This suggests that TFs might be in convergent evolution from a common ancestor and recruited in parallel in distinct C 4 linages.
To examine further possible candidates in BS establishment, homologous TFs with BS cell enrichment in four C 4 plants were identified, and MYB59, which is reported to regulate the cell division process in Arabidopsis roots (Cominelli and Tonelli, 2009;Mu et al., 2009), was found to be significantly enriched in BS cells of all four C 4 species. MYB59 in maize was listed as a candidate in regulating C 4 Kranz formation in a previous study (Wang et al., 2013;Tausta et al., 2014). It is possible that MYB59 may play a role in controlling BS cell size in C 4 species. In addition, transcriptional activators of lignification such as MYB42 (Hussey et al., 2013) were more abundantly expressed in the BS cells in the three monocot C 4 grasses than in C. gynandra, and the transcriptional repressor of lignification MYB4 (Shen et al., 2012) was highly expressed in M cells in switchgrass. This is consistent with the thicker cell walls in the BS cells of monocot as compared with dicot plants (Majeran et al., 2005), and suggests that the C 4 grasses might share similar regulatory mechanisms for secondary cell wall formation in BS cells.

Conclusions
The differentiation of M and BS cells with their specific accumulation of transcripts in C 4 leaves is considered a hallmark of the C 4 pathway (Burgess and Hibberd, 2015). Enzymatic and mechanical isolation or laser capture micro-dissection of M and BS cells have been applied in quantitative transcriptomics of specific cell types (Li et al., 2010;Chang et al., 2012;Aubry et al., 2014;John et al., 2014). Here we employed manual dissection of switchgrass M and BS cells, which is a simple and inexpensive means of obtaining a small specimen for further cell type-specific transcriptome analysis (Outlaw and Zhang, 2001). This has enabled, to the best of our knowledge, the first estimate of mRNA profiles of M and BS cells in a monocot NAD-ME subtype C 4 plant.
A high degree of convergence in the distribution of C 4 marker genes in M/BS cells was observed in the four distinct C 4 species; monocot NAD-ME-type switchgrass, monocot NADP-ME-type S. viridis and maize, and dicot NAD-ME-type C. gynandra (Fig. 4A). Furthermore, these species shared a high degree of convergence in distribution of gene functional categories in M/BS cells based on analysis of the Pearson's correlation coefficient (r 2 ) ( Fig. 6; Supplementary Table S7). Genes grouped into the categories of TCA and major CHO metabolism were significantly or moderately enriched in BS cells in all four C 4 species. A similar preferential expression of SCR and MYB59 TFs was found in C 4 grasses and in all four C 4 species, respectively (Supplementary Table S8). We conclude that the transcript abundance of core components in C 4 photosynthesis, the distribution of gene function, and the mechanism of establishment of BS might be convergent in these four independent lineages of C 4 plants.
The pathways of C 4 photosynthesis are divided into three basic biochemical subtypes based on the major decarboxylating enzymes to release CO 2 : NAD-ME, NADP-ME, and PEPCK (Gowik and Westhoff, 2011). However, the differentiation observed in our comparative transcriptomes of NAD-ME and NADP-ME subtypes goes beyond the decarboxylating enzymes. Besides aminotransferase (Fig. 4A), the NAD-ME type differs from the NADP-ME type by the differential compartmentation of specific metabolite transporters, such as sodium:pyruvate and proton:pyruvate co-transporters (Supplementary Table S3), and by more general cell-specific functional enrichment, such as RNA regulation and protein biogenesis/homeostasis (Fig. 6). The abundance of transcripts functioning in protein synthesis, folding, and assembly is more prominent in BS and M cells of switchgrass and C. gynandra than in S. viridis and maize. C 4 plants might have undergone convergent processes such as gene duplication, anatomical pre-conditioning, and compartmentalization of C 4 core enzymes to evolve C 4 photosynthesis (Sage, 2004), but may have undergone divergent processes for the optimization of M and BS cell co-ordination that are not directly associated with the C 4 pathway.

Supplementary data
Supplementary data are available at JXB online. Figure S1. Cross-and longitudinal sections of a switchgrass leaf. Figure S2. Qualitative and quantitative analysis of total RNA from manually isolated BS and M cells using a Bioanalyzer. Figure S3. Validation of C 4 -related gene expression in switchgrass by qRT-PCR. Figure S4. Phylogenetic comparison of switchgrass NAD-ME with NAD-ME protein sequences in other species. Figure S5. Phylogenetic comparison of switchgrass NADP-ME with NADP-ME protein sequences in other species. Figure S6. Analysis of syntenic genes of NAD-ME and NADP-ME in grasses Figure S7. Cell type-enriched functional groups in subcategories of protein degradation identified by Fisher exact test. Table S1. Raw and normalized data. Table S2. The fold change of C 4 enzyme activity in age-and area-related experiments in switchgrass leaves. Table S3. List of C 4 -related genes in the switchgrass M and BS transcriptomes. Table S4. Sequences of primers used in the present work. Table S5. Convergence in abundance of C 4 marker genes in switchgass, S. viridis, maize, and C. gynandra. Table S6. List of differentially expressed genes in M and BS cells from switchgrass, S. viridis, maize, and C. gynandra. Table S7. Functional distribution of differentially expressed genes in M and BS cells from switchgass, S. viridis, maize, and C. gynandra. Table S8. List of differentially expressed TFs in M and BS cells from switchgrass, S. viridis, maize, and C. gynandra.