Gene expression atlas of pigeonpea and its application to gain insights into genes associated with pollen fertility implicated in seed formation

Highlight A gene expression atlas of pigeonpea revealed spatio-temporal gene expression, co-expressed gene clusters and an important gene network critical for normal pollen and seed development.


Introduction
Pigeonpea is an important, resilient crop of the semi-arid tropics, well-suited to the dryland cropping system. It is a cross-pollinated diploid (2n=2x=22) food legume with an estimated genome size of 833.07 Mb (Varshney et al., 2012). It is mostly grown in marginal environments with low inputs and poor management practices in Asia, Africa and parts of Central America. In these developing countries, it is a major source of protein (23-30% seed protein content) and therefore plays a vital role in alleviating malnutrition, especially in children. Although it is a key staple food crop in these regions, limited efforts have been made to enhance its productivity, which has remained less than 1 ton ha -1 for over six decades. Pigeonpea productivity has been greatly challenged by various biotic and abiotic stresses. Recently, a cytoplasmic genetic male sterility based hybrid system has been established in this crop and has demonstrated improved productivity, breaking the long-standing yield barrier (Saxena et al., 2010). Pigeonpea has gained global attention due to continuously increasing demand in the developing world and lack of the desired amount of seeds in the international market.
Genomics approaches have proved to be efficient in overcoming production constraints in a number of crop species (Varshney et al., 2013;Kole et al., 2015). In the case of pigeonpea, availability of the draft genome sequence (Varshney et al., 2012) and a range of genomic resources have provided an opportunity to undertake genomics-assisted breeding (GAB). These genomic resources include the genome sequence, molecular markers, genetic maps, quantitative trait loci (QTLs), and transcriptome assembly (Pazhamala et al., 2015). Furthermore in pigeonpea, expression studies using transcriptome sequencing and quantitative real-time PCR have been conducted to understand the plant's response to abiotic stresses (drought and salinity; Sinha et al., 2015) and biotic stresses (fusarium wilt and sterility mosaic disease; Singh et al., 2016), and to study pod and seed development (Pazhamala et al., 2016). However, a baseline study of all the tissues from different developmental stages such as a gene expression atlas to increase the efficiency of such approaches has been lacking in pigeonpea. A gene expression atlas has been developed in several crops especially to define subsets of genes expressed in different tissue systems using either DNA microarray or RNA-seq approaches. In legumes, gene expression atlases are available for Medicago truncatula (Benedito et al., 2008), Lotus japonicus (Verdier et al., 2013), Glycine max (Libault et al., 2010;Severin et al., 2010), Phaseolus vulgaris (O'Rourke et al., 2014), Pisum sativum (Alves-Carvalho et al., 2015), Vigna unguiculata (Yao et al., 2016) and Arachis hypogaea (Clevenger et al., 2016). These studies aimed at investigating the complex biological processes underlying pod development, seed maturation, nodulation, and symbiosis.
The advent of the next generation sequencing technology has made sequencing of many non-model crops feasible in recent years. Pigeonpea is one of the few crops for which the technology was adopted early on to develop a draft genome sequence using the 'Asha' genotype. The genome sequence of pigeonpea has provided useful insights into the protein coding regions and gene functions, and clues to biological processes. However, this information was mainly based on the homology and de novo gene prediction programs. In order to correlate and complement the genome information with the gene expression that is modulated in a temporal and spatial manner, a gene expression atlas with 30 samples (tissues) collected from the 'Asha' (ICPL 87119) genotype has been developed. The Asha pigeonpea genotype is a widely cultivated, high yielding, medium duration inbred line resistant to several important diseases (fusarium wilt and sterility mosaic disease), for which a number of genetic and genomic resources including a draft genome have been developed.
In the present study, a gene expression atlas has been developed for pigeonpea that reports the identification and quantification of genes exhibiting spatio-temporal expression using 30 diverse tissues. Further, an example has been provided to elucidate the efficacy of this comprehensive dataset to identify a co-expressed gene network exclusive to floral tissues (bud, flower, stamen, pistil, sepal and petal). The targeted tissues are highly specialized and their development is tightly controlled and coordinated for effective fertilization and production of viable seeds. The candidate genes identified were associated to pollen fertility and seed setting, which have specific implications for the key agronomic trait of yield for their possible deployment in GAB in pigeonpea.

Plant material
Seeds of the 'Asha' genotype (ICPL 87119) were sown in three different sets under glasshouse conditions by maintaining 26 °C day/22 °C night temperature with a photoperiod of 13 h day/11 h night. These sets included seeds germinated in (i) Petri plates with filter paper, (ii) paper cups containing sterile sand, and (iii) pots containing sterile black soil and sand (1:1). All these three sets of experiments were set up in three biological replicates. Set (i) was used for harvesting tissues from the germinal stage, set (ii) for the seedling stage, and set (iii) for the vegetative, reproductive and senescence stages (Supplementary Fig. S1 at JXB online). Tissues from the germinal stages (embryo, hypocotyl, radicle, and cotyledons) and other flower/seed tissues (stamen, pistil, petal, sepal, and immature seeds) were carefully dissected on ice and immediately frozen in liquid nitrogen. Root tissues and nodules from all the stages were excised after brief washes in diethyl pyrocarbonate-treated water followed by flash freezing in liquid nitrogen. All other aerial tissues including leaves, stem, petioles, pods, shoot apical meristem, flowers and buds were excised from plants and directly frozen in liquid nitrogen. All the tissues were stored at −80 °C until total RNA isolation.
RNA extraction, Illumina sequencing and data pre-processing Total RNA was isolated using the Nucleospin RNA plant kit (Macherey-Nagel, Duren, Germany) as per the manufacturer's instructions. The qualitative and quantitative assessments of these total RNA samples were conducted using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). RNA samples with RNA integrity (RIN) value ≥8 were pooled in equimolar amounts from three biological replicates prior to library preparation and subsequent sequencing. The cDNA libraries were prepared using an Illumina TruSeq RNA Sample Preparation Kit (Illumina Inc., San Diego, CA, USA) following the manufacturer's instructions. Pair-end sequencing was performed in two sets: a set of 20 samples (nos 1-20) was sequenced using an Illumina HiSeq 2000 at Genotypic Technology Pvt. Ltd, India and a second set of 10 samples (nos 21-30) was sequenced in-house using an Illumina HiSeq 2500 sequencing system. The raw sequencing data were subjected to quality check to ensure high quality reads for downstream analyses. Reads with Phred score <20, read length <50 bases, and consisting of any uncalled bases using NGSQC Box (Katta et al., 2015) were filtered out.

Co-expressed gene network analysis
The co-expressed gene modules were identified and a topological overlap matrix (TOM) plot was generated using Weighted Gene Co-expressed Network Analysis (WGCNA; Horvath, 2007, 2008). The identified gene modules were further visualized using Cytoscape v 3 (Lopes et al., 2010). In addition, the PlantCARE (Lescot et al., 2002) database was used to identify and study cis-acting regulatory elements in selected genes.

Gene expression analysis using qPCR
Real-time quantitative polymerase chain reaction (qPCR) was performed to validate selected genes in four genotypes (ICPA 2039, ICPB 2039, ICPA 2089, and ICPB 2089 differing in their ability to develop fertile pollen. qPCR analysis was carried out using Applied Biosystems 7500 Real Time PCR System with SYBR Green chemistry (Applied Biosystems, USA). The actin gene was used as an endogenous control and reactions were performed with two technical replicates and two biological replicates. The relative expression of the genes in each of the four genotypes (two male sterile and two fertile) was calculated using a modified Livak method. The ΔC t value was calculated for each of the genes with respect to the housekeeping genes and was converted to fold change (2 −ΔC t) value (Livak and Schmittgen, 2001).

Identification of cis-regulatory elements and splice variants
The cis-acting elements were identified by scanning 1500 bp upstream regions of the transcription start site of the selected genes using the PlantCARE database (Lescot et al., 2002). Further, the splice junctions were identified using Finesplice (Gatto et al., 2014) in all 30 samples. Those alternative splice junctions that were supported by 10 or more reads were retained and considered for further study. The identified spliced junctions were then categorized into three alternative splicing events, namely alternative 5′ donor, alternative 3′ acceptor, and exon skipping. The splicing variants were visualized using the Integrative Genomics Viewer (Robinson et al., 2011) and represented in the form of a Sashimi plot (Katz et al., 2015).

Development of the Cajanus cajan gene expression atlas
To generate the Cajanus cajan gene expression atlas (CcGEA), 30 samples were collected representing all the major tissues encompassing the plant's complete lifecycle (Fig. 1A). These 30 samples represented five major stages of plant development, namely the germinal (four tissues), seedling (two tissues), vegetative (four tissues), reproductive (16 tissues), and senescence stages (four tissues) ( Table 1). Germinal tissues were represented by cotyledons, radicle, hypocotyl, and embryo. In the case of the seedling stage, shoot and root parts were separated. Vegetative tissues were composed of shoot apical meristem (SAM), nodule, root, and leaf. The majority of the samples/tissues were collected at the reproductive stages and encompassed leaf, petiole, bud, root, nodule, immature pod, immature seed, mature pod, mature seed, stem, SAM, whole flower, and distinct flower organs (i.e. sepal, petal, pistil, and stamen). Finally, at the senescence stage, petiole, leaf, root, and stem were collected. All samples were harvested in three biological replicates. For simplicity of usage, all the organs and organ system that were considered for the study were referred here as 'tissues' or 'samples'.
Using Illumina sequencing platform, a total of 590.84 million paired-end reads were generated from 30 samples ( Fig. 1B and Table 1). The low quality reads were filtered out (see Materials and methods, RNA extraction, Illumina sequencing and data pre-processing), retaining 559.98 million paired-end reads (94.5% of the total paired-end reads) for downstream analysis. An average of 94.43% of the filtered reads mapped to the reference genome (Varshney et al., 2012) using TopHat. Subsequently, a total of 28 793 genes were identified from reference guided assembly and their expression was quantified as fragments per kilobase of transcript per million fragments sequenced (FPKM) using Cufflinks (Supplementary Table S1). Gene expression profiling of 28 793 genes within different tissues and stages identified 27 132 genes with higher abundance (FPKM>1) and 1661 genes with lower abundance (FPKM<1). These results demonstrated sufficient sequencing coverage across the pigeonpea genome. In order to generate a comprehensive dataset for further analyses, genes were annotated based on previously annotated pigeonpea gene models (Varshney et al., 2012). Gene information such as the protein domain (UNIPROT), the putative cellular localization, the orthologous gene annotations in other species, the gene ontology (GO) IDs and names, and the putative corresponding pathways of each of these genes are indicated in Supplementary Table S1. Expression values of all annotated genes from the 30 samples were finally log 2 -transformed (Supplementary Table S1).

Analysis of gene expression clusters
To evaluate the quality of the generated gene expression atlas, a multi-dimensional scaling (MDS) analysis was first performed using the global expression dataset across the 30 samples. A clear repartition of the dataset in four major groups was observed, which represented the origin of the tissues, with aerial, underground, floral and embryo groups ( Fig. 2A). Simultaneously, Weighted Gene Co-expressed Network Analysis (WGCNA; Horvath, 2007, 2008), an R package, was utilized to identify sample outliers. Thirty samples clustered into two major clades, designated as clade I (Cl-I) and clade II (Cl-II) (Fig. 2B). Clade Cl-I bifurcated into three subclades, Cl-Ia corresponding to floral tissues, Cl-Ib corresponding to mature seed tissues, and Cl-Ic corresponding to aerial tissues including shoot primordial tissues such as hypocotyl, SAM and seedling. On the other hand, clade Cl-II comprised two subclades, Cl-IIa, which consisted of aerial tissue representing the reproductive and senescent stages, and Cl-IIb, which included all underground tissues (radicle, root, and nodule). As expected, there was no outlier detected from our set of samples, which validated the tissue sample preparation.
From the transcriptomic dataset, a total of 1044 stably expressed genes have been identified within all tissues (i.e. displaying a coefficient of variation below 30%; Supplementary Table S2). The catalogue of stably expressed genes represents a resource for identification of reference or housekeeping genes, which are necessary for comparative gene expression analysis to normalize transcript expressions due to developmental or environmental fluctuations. In the dataset, 62 stably expressed genes within all the tissues with a coefficient of variation (CV) below 20% were identified (Supplementary Table S2). These genes were mainly annotated as involved in basic cell functions such as RNA machinery (C.cajan_18478, C.cajan_31580, C.cajan_19893, C.cajan_16707), cell cycle machinery factors (C.cajan_10165, C.cajan_16931, C.cajan_25222) and actinrelated protein (C.cajan_31922). Based on the large diversity of analysed tissues in this study, it was difficult to identify genes displaying a CV lower than 10%. However, based on our comparative gene expression experiment, this catalogue of stably expressed genes could be refined to study specific tissues and/or stages. For instance, in the case of an experiment based on specific plant organs, 50 potential reference genes displaying a CV below 10% were identified for the study focused on underground tissues, 43 genes for the study of pod and seed tissues, and three genes for the study of aerial tissues (Supplementary Table S2).

Gene expression patterns across different tissues
The 28 793 genes were grouped into ten clusters (Cl-I to Cl-X) based on the k-means clustering algorithm with Euclidean distance as the similarity criterion using Multiexperiment Viewer (MeV; Howe et al., 2010;Supplementary Fig. S2). The optimum cluster number has been determined by plotting the sum of squared error values against the different values of k (Everitt and Hothorn, 2005). Cl-I, -III, -VI, -VII, -VIII, and -IX exhibited genes with constitutive expression across the tissues, among which Cl-I, -III, -VII, and -IX showed high gene expression. Interestingly, Cl-II comprised 685 genes and displayed floral-tissue preferential expression (Fig. 3A) including 14 transcription factors (TFs), 11 cytochrome P450, and ten L-ascorbate oxidase homologs among others. This cluster consisted of several transcription factors previously reported as developmental regulators, such as floral homeotic proteins AGAMOUS, APETALA 1, GLOBOSA and DEFICIENS (Egea-Cortines et al., 1999;Lohmann and Weigel, 2002). Other putative regulators were AGAMOUS-like MADS-box proteins AGL18 and AGL19, zinc finger proteins CONSTANS-like 3 and 4, NAC25, putative WRKY42, MYB113, SEPALLATA 2-3, and HAT5 (see corresponding gene annotations in Supplementary Table S3).
Similarly, spatially regulated genes between aerial and underground tissues were also studied. In order to examine the gene expression patterns in these two types of tissues, all the aerial tissues and all the underground tissues were pooled. As expected, a volcano plot depicted significant up-and down-regulation of DEGs between aerial and underground tissue samples (Fig. 5). All up-regulated genes were plotted on the right hand side of the dotted line passing through the center of the volcano, while down-regulated genes were plotted towards the left hand side. A total of 75 DEGs including 68 significantly up-regulated and seven down-regulated with log 2 fold change values ≥5 or ≤-5, respectively, were identified (Fig. 5). Among aerial tissues, highly expressed genes included those encoding mainly photosynthetic apparatus-related chloroplastic proteins (C.cajan_01316, C.cajan_13889, C.cajan_13832, etc.) and cell-wall modifying enzymes (C.cajan_42632), along with other proteins such as late embryogenesis abundant protein D-29 (C.cajan_03928), basic 7S globulin (C.cajan_10207), and oxygen-evolving enhancer protein 3-2 (C.cajan_37794). Other enzymes involved in photorespiratory carbon metabolism and wax biosynthesis have also been identified in aerial tissues, such as serine-glyoxylate aminotransferase (C.cajan_22602) (Somerville and Ogren, 1980) and 3-ketoacyl-CoA synthase 17 (C.cajan_28231) (Todd et al., 1999), respectively. Among the underground tissue samples, DEGs encoding leghemoglobin (C.cajan_22417), sugar transport protein 13 (C.cajan_07532), subtilisin-like protease SDD (C.cajan_02637), probable 2-oxoglutarate-dependent dioxygenase AOP1.2 (C.cajan_00158), and cytochrome P450 78A5 (C.cajan_19316) were highly expressed in nodules, while cytokinin dehydrogenase 3 was over-expressed in the radicles. Subtilisin-like proteases and leghemoglobins have been suggested to have widespread function during early stages of nodule symbioses (Ribeiro et al., 1995;Szczyglowski et al. 1997).

Tissue-specifically expressed genes
Apart from DEGs, 220 genes were identified with specific expression in exclusively one tissue ( Fig. 4 and Supplementary Table S8). Tissues displaying the highest number of specifically expressed genes were bud (39), stamen (32) and seedling root (20) (Fig. 6). Various TFs have been specifically expressed in different tissues, including MYB-like protein J (C.cajan_48299) from bud, RADIALIS (C.cajan_31277) from leaf (reproductive stage), and GATA transcription factor 27 (C.cajan_17218) from immature seeds (5 days after anthesis). Several other genes were also identified, such as those encoding BOBBLER 2, PHD finger protein ALFIN-LIKE 1, and plantacyanin in bud, superoxide dismutase and auxin-induced protein 6B in nodules (vegetative stage), FAR-RED IMPAIRED RESPONSE 1 protein in stamen, polygalacturonase in pistil, and sucrose synthase 2 in mature seeds (30 days after anthesis), apart from several retrovirusrelated proteins. Defense-related proteins such as defensinlike protein 19 and defensin-like protein 244 were specifically expressed in root (reproductive stage) and immature pods (5 days after anthesis), respectively. A complete list of tissuespecific genes is provided in Supplementary Table S8.

Identification of flower-related genes using a network analysis approach
A group of genes identified by expression patterns that are tightly interconnected constitutes a module that is most likely to be involved in common biological functions. In order to understand the complex biological networks involved in the different developmental processes, a systems biology approach Fig. 5. Volcano plot showing significant genes that were differentially expressed between aerial and underground tissues. Analysis and visualization of significant differentially expressed genes was performed using CummeRbund. Each spot represents a gene that has been plotted between log2 (fold change) on the x-axis and -log 10 of the P-value on the y-axis. Red dots represent significantly regulated genes (either up-or down-regulated) and black dots represent non-significant genes.
using WGCNA was used to analyse DEGs from different tissue combinations. A correlation matrix was generated among the 30 tissues using a set of 1076 DEGs resulting in three major modules, turquoise (containing 400 genes), blue (208 genes) and brown (197 genes). Here, modules are referred to the distinct groups formed by the clustering of genes, and each module has been designated by an arbitrary color to distinguish between them (Fig. 7A). Module to tissue association was measured and resulted as a close relationship between the turquoise module and aerial tissues, whereas blue and brown modules were associated with underground and floral tissues (i.e. bud, flower, pistil, and stamen), respectively. As observed by Pearson's correlation coefficients and P-values, the strongest expression association was measured in the brown module, especially in the floral tissues (marked with a red square in Fig. 7B). Therefore, a weighted correlation network for genes belonging to the brown module was visualized using Cytoscape.
The circular network depicted 28 genes represented by nodes, interconnected with edges (connecting lines between genes) to identify three highly connected genes, referred to as 'hub' genes. WGCNA defines co-expression networks as weighted gene network, where the nodes correspond to gene expression profiles, and edges are determined by pairwise correlations between gene expressions. Genes within the co-expression module that display high connectivity form the 'highly connected genes' referred to as 'hub genes' (Langfelder and Horvath, 2008). The hub genes identified in the co-expression network were annotated as encoding a sucrose-proton symporter 2 (C.cajan_35396), a pollen specific SF3 protein (C.cajan_07765) and an uncharacterized protein (C.cajan_28171). C.cajan_35396 gene encoding a putative H + symporting sucrose transporter protein 2 has been suggested to be involved during pollen maturation in mediating sucrose uptake in pollen grains (Lemoine et al., 1999). In the co-expression network generated, each of the two hub genes (C.cajan_35396 and C.cajan_28171) was connected to all the other 27 genes, which encoded serine threonine protein kinases (C.cajan_18757, C.cajan_20002, C.cajan_07067), pectinesterase inhibitors (C.cajan_10140, C.cajan_04310, C.cajan_46391), pectate lyase 3 (C.cajan_ 44741, C.cajan_27022), pollen-specific proteins (C.cajan_02582, C.cajan_07765, C.cajan_11513), Olee1like (C.cajan_31667, C.cajan_32224), L-ascorbate oxidase homolog (C.cajan_19226, C.cajan_20191), ATPase 8 (C.cajan_45656), β-galactosidase 13 (C.cajan_32927), polygalacturonase (C.cajan_04312), phosphatidylinositol transfer protein (C.cajan_35458), boron transporter 6 (C.cajan_04911), formin-like protein 5 (C.cajan_32517), aldose 1-epimerase (C.cajan_31220), and uncharacterized proteins (C.cajan_ 04239, C.cajan_24722, C.cajan_02776, C.cajan_27282). Among these genes, L-ascorbate oxidase, β-galactosidase, polygalacturonase homolog and sucrose transporter were reported to be pollen-specific genes (Masuko et al., 2006). C.cajan_07765, annotated as SF3 protein, displayed a high and specific expression in floral tissues including bud, flower, pistil, stamen, petal, and sepal. This gene has been previously reported as a TF regulating the expression of late pollen genes (Baltz et al., 1992a) and has been found to be connected to 13 other genes in the co-expression network (Fig. 7C). All the genes were co-expressed exclusively in floral tissues (bud,  stamen, pistil, petal, and sepal) with high gene expression in stamen (Fig. 7D), which validated this approach to identifying a gene network of flower development-related genes. Table S9) belonging to the floral gene network (circular-closed network, Fig. 7C) was analysed in the floral tissues using qPCR. Gene expression was studied in four pigeonpea genotypes, two of which were male sterile (ICPA 2039 and ICPA 2089) and two fertile (ICPB 2039 andICPB 2089). These genotypes were used for an A 4 -based hybrid breeding system in pigeonpea (Saxena et al., 2011). Out of 25, 20 genes showed marked differences in their expression among the male sterile and fertile counterparts. These genes showed much lower expression (0 to 0.6-fold change) in the male sterile genotypes compared with the fertile genotypes (ICPB 2039) (Fig. 8). A more than 1.4fold increased expression of C.cajan_46391, C.cajan_44741, C.cajan_27282 and C.cajan_35390 was observed in both fertile genotypes, namely ICPB 2039 and ICPB 2089, when compared with their male sterile counterparts. C.cajan_27282, an uncharacterized protein, showed 2.5-fold induced expression in the fertile genotypes (ICPB 2039) while in the sterile genotypes the expression was found negligible (0.5-fold).

Promoter analysis of flower-related genes
Sequence analyses of the promoter regions of these coexpressed genes identified a majority of cis-acting elements to be involved in light-responsiveness, in addition to the elements required for seed-specific regulation, endosperm expression and phytohormone responsiveness. In 25 flower-related genes, the promoter sequences could be identified and contained 243 light-responsive cis-elements, suggesting a strong stimulusdependent expression, particularly in response to light. In addition, phytohormone regulation of these genes was suggested due to the presence of multiple methyl jasmonic acid (MeJA), salicylic acid (SA), gibberellin (GA) and abscisic acid (ABA) responsive elements. Genes that were validated using qPCR also showed the presence of light responsive, circadian control, MeJA, SA, auxin, ABA, and endosperm-responsive sequence elements (Supplementary Table S10).

Splice variant study of flower-related genes
Alternative splicing (AS) events were studied in all the 28 genes belonging to a floral gene network across all the 30 samples. Overall, 18 AS events were identified among eight genes, namely C.cajan_11513, C.cajan_31667, C.cajan_35458, C.cajan_45656, C.cajan_35396, C.cajan_18757, C.cajan_27022 and C.cajan_24722. These AS events consisted of an alternative 3′ acceptor site, an alternative 5′ donor site, exon skipping, and alternative 3′ and 5′ splice sites. All the eight genes showed splicing events in floral tissues (bud, flower, stamen, pistil, sepal, and petal), while C.cajan_18757 also showed an alternative 5′ donor site in Immature seed, Immature pod, Sen_Petiole, and Rep_SAM (Supplementary Table S11). The 'hub' gene encoding sucrose-proton symporter 2 (C.cajan_35396) and two other genes encoding a pollen-specific SF3 protein (C.cajan_11513, Fig. 9A) and an uncharacterized protein (C.cajan_24722, Fig. 9B) have AS of 'alternative 3′ acceptor site' preferentially in stamen. On the other hand, two splice variants (alternative 5′ donor site and exon skipping) have been identified for C.cajan_45656 preferentially found in bud and stamen (Fig. 9C), whereas an alternative 3′ acceptor site and alternative 5′ donor site has been observed exclusively in petals and sepals for the gene C.cajan_31667 encoding an Olee 1-like protein. In all the six floral tissues, an alternative 3′ and 5′ splice site was also identified for C.cajan_35458 and C.cajan_32927. Three examples of AS events preferentially found in Rep_Stamen for three different genes are shown in comparison with Rep_Pistil ( Fig. 9A-C). The AS events, namely alternative 3′ acceptor site (Fig. 9A), alternative 5′ donor site (Fig. 9B) and exon skipping (Fig. 9C), have been identified for the genes C.cajan_11513, C.cajan_24722, and C.cajan_45656, respectively.

Discussion
In this study, a comprehensive gene expression atlas of pigeonpea (CcGEA) has been developed, which catalogued more than 28 000 genes that were expressed in 30 diverse tissues of the plant and at five different developmental stages.
This comprehensive dataset will enhance the present understanding of the genes involved in various regulatory and metabolic processes, which could directly impact important agronomic traits. With the recent advances in genomics research, GAB has accelerated precision and efficiency of breeding in many crops (Varshney et al., 2013;Kole et al., 2015). Development of the CcGEA together with the available genome sequence and other genomic resources offers an opportunity for pigeonpea researchers to not only solve biological questions, but also facilitate the GAB for pigeonpea improvement.
The RNA-seq data were analysed by pairwise comparison of all the samples between one another to identify the differentially expressed genes and also by clustering genes based on different algorithms. The best way to look into the complex biological and metabolic processes is to cluster genes based Fig. 9. Alternative splicing in three flower-related genes. Three examples of alternative splicing variants have been shown in Rep_Stamen samples compared with Rep_Pistil using a Sashimi plot. The three flower-related genes, namely C.cajan_11513, C.cajan_24722, and C.cajan_45656, have shown an alternative splicing event in Rep_Stamen (displayed in blue) with respect to Rep_Pistil (displayed in red). Alternative 3′ acceptor site, alternative 5′ donor site and exon skipping have been found in C.cajan_11513, C.cajan_24722, and C.cajan_45656, respectively. The numbers in the figure denote the read coverage supporting the alternative splicing. on their expression pattern and analyse them. By clustering genes, it would be possible to understand the co-regulated and functionally related genes. k-means algorithm based clustering identified clusters related to aerial, underground and senescing tissues, in addition to the DEGs identified by pairwise comparison between tissues and stages. A more complex co-expression-based gene cluster analysis (i.e. WGCNA) was further used to identify functional modules based on the assumption that each module contains genes involved in similar biological function (Holter et al., 2001;Hasty et al., 2001;Shen-Orr et al., 2002;Mutwil et al., 2010). A similar approach was effectively utilized to uncover complex and novel networks involved in strawberry flower development (Hollender et al., 2014). Instead of looking into the individual gene, studying the module or cluster of genes that are expressed in a similar fashion would be more informative.
To demonstrate the usefulness of the CcGEA for identifying candidate genes for key agronomic traits, as an example, a module related to flower development (the brown module) identified by gene network analysis has been studied. The brown module revealed pigeonpea genes involved in late pollen maturation, pollen tube formation and fertilization. The 'hub' gene of the network (C.cajan_07765) was a pollen-specific SF3 gene, which is a developmentally regulated TF, well documented in Arabidopsis to play a role in expression of late pollen genes, pollen tube formation and fertilization events (Baltz et al., 1992a, b). Another 'hub' gene, C.cajan_35396 codes for a H + -symporting sucrose transporter protein 2, typically named SUC2 protein, responsible for loading sucrose into sink cells such as developing pollen (Sauer, 2007) and other floral tissues. These genes were predicted to regulate directly or indirectly others genes in the network. Others genes such as C.cajan_44741, C.cajan_04312, and C.cajan_27022 encoded proteins involved in pentose and glucuronate interconversion, also known to have a significant role in mature pollen development (Ma et al., 2012). A putative boron transporter (C.cajan_04911), important for maintaining boron homeostasis, is critical for pollen viability and ability to accumulate starch, as boron deficiency could lead to impaired pollen viability (Sze et al., 2006;Fang et al., 2016). A formin-like protein 5 (C.cajan_32517) has also been identified and is known to be involved in pollen-pistil interaction (Boavida et al., 2005). The expression and putative function of all these genes strongly suggests that they are associated to pollen viability and fertilization, potentially significant for seed setting directly or indirectly. These candidate genes would be a putative targets for functional validation and further study.
In pigeonpea, a hybrid breeding system based on cytoplasmic male sterility for A 4 cytoplasm is well-established and commercialized (Saxena et al., 2010). This provided an opportunity to validate the expression of genes involved in pollen fertility in two sets of male sterile genotypes and its fertile counterparts. Using qPCR, expression of genes encoding pectinesterase inhibitor 13 (C.cajan_46391), probable pectate lyase 3 (C.cajan_44741), serine/threonine-protein kinase (C. cajan_07067), sucrose-proton symporter 2 (C.cajan_35390) and an uncharacterized protein (C.cajan_27282) have been shown to have important roles in development of pollen. Further, the sequence analysis of the promoter regions of these genes has suggested their stimulus-dependent expression in response to light and phytohormones such as abscisic acid, auxin, salicylic acid, and methyl jasmonic acid. In addition, the preferential splicing events in seven of the genes exclusively in the floral tissues including bud, flower, stamen, pistil, sepal, and petal have suggested their critical role in normal pollen and seed development. Additionally, gene clusters represent interconnected and highly correlated genes that would be helpful in interpreting the biological role of those that are novel or uncharacterized. That is, clustering and visualization of the co-expressed gene network allows understanding of the basic function of genes that were annotated or unannotated genes forming a module in performing a specific function (Childs et al., 2011). In this study, two uncharacterized proteins, C.cajan_27282 and C.cajan_28171, have been shown to be involved in normal pollen development.
Indeed, growth and development have been understood as tightly regulated processes, through a multi-level regulation of gene expression. At the DNA level, chemical modifications of DNA and histone modification have been recognized as important regulatory mechanisms for controlling gene expression (Pfluger and Wagner, 2007;Saletore et al., 2012). Such modifications have also been found at the RNA level, especially modifications of the mRNA, which have come to be known as the 'epitranscriptome'. This is presumed to be an additional layer of regulation between DNA modification and post-translational modification. As DNA and histone modifications affect regulation of gene expression, mRNA modifications have recently been found to be crucial for proper plant development (Bodi et al., 2012;Fray and Simpson, 2015). In mRNA, methylation of adenosine at the N 6 position seems to be the most prevalent and has been found to be absolutely necessary for plant survival (Zhong et al., 2008). This post-transcriptional modification is reversible, with socalled writer, reader, and eraser proteins . Epitranscriptomics was explored in pigeonpea by identifying orthologous genes and looking at their expression patterns in pigeonpea tissues, previously reported in Arabidopsis (Zhong et al., 2008). It was revealed that eight genes potentially encoding orthologous proteins were transcriptionally active in the CcGEA dataset. Moreover, some tissues related to active developmental processes such as embryo and immature/mature seed displayed higher abundance of these genes, suggesting that post-transcriptional regulation plays a crucial role in seed and embryo development in pigeonpea. These genes could be studied further for their involvement in modulating important agronomical traits related to seed development and germination.
A gene expression atlas has been developed in different legumes such as Medicago, Lotus, soybean, pea, black-eyed pea, and peanut with a focus on important traits. For instance, in the case of Medicago and pea, genes preferentially expressed in nodules have been emphasized, whereas in the case of the peanut, genes involved in geocarpy have also been described. Similarly, in the case of soybean and black-eyed pea, seed development and maturation have been focused, respectively. In pigeonpea, the CcGEA has facilitated identification of candidate genes of agronomic importance for possible deployment of GAB. This has been illustrated with the identification of candidate genes associated with pollen fertility and fertilization crucial for seed formation, providing potential candidates for future studies. Likewise, the CcGEA provides a compendium of genes identified in 30 diverse tissues that are clustered based on their expression patterns. Gene clusters have been identified with aerial and underground preferential expression, in addition to those vital for floral morphology and symmetry. These clusters could be further analysed to identify candidate genes for different agronomic traits. For instance, Cl-I has been shown to be enriched with stress-responsive genes, especially for drought and heat stress. These included genes encoding stress-induced protein (SAM22), universal stress protein-A, heat shock proteins, protein EARLY RESPONSIVE TO DEHYDRATION 15 (ERD15), and many stress-related proteins (SRP). This cluster could be analysed for identifying candidate genes that would be crucial for bolstering hardiness to the crop. Furthermore, this resource could also be utilized to look into the baseline expression of genes studied in other legumes/ crops in different tissues of pigeonpea that could be traced at different developmental stages. Thus, this resource could be valuable for the scientific community not only working in pigeonpea but also in related legume crops.

Conclusions
The gene expression atlas (CcGEA) developed in pigeonpea complements the genome sequence of pigeonpea and other genomic resources in understanding gene functions and their biological role. The CcGEA represents a comprehensive dataset of genes expressed in 30 diverse tissues across five developmental stages from embryo to senescence. The dataset has been analysed using pairwise comparison, clustering and correlation network analysis. The efficacy of the CcGEA has been demonstrated by identifying a gene network of 28 genes putatively regulated by a pollen-specific SF3 and a sucrose-proton symporter. Gene expression studies using two sets of male sterile and fertile genotypes revealed 20 genes crucial for pollen development. The role of these genes could also be ascertained in floral tissues with exclusive splicing variants identified in these tissues. This study also provide genes that would be excellent candidates for a reverse genetics approach to determine their roles in pollen fertility and seed formation. Likewise, this dataset could be further analysed to identify candidate genes for various agronomic traits such as abiotic stress tolerance, especially for drought and heat stress. The CcGEA would also be useful in looking at the basal expression of genes when investigating mutant genotypes or any candidate gene expression for a specific agronomic trait. This resource will be valuable for studying the genes expressed in specialized tissues or organ systems such as nodules, flowers and pods in pigeonpea or related legumes. With further refinement of the existing draft genome assembly or the development of a pan genome, the CcGEA could be improved further and in that scenario, it will provide more and comprehensive insights into gene expression.

Supplementary data
Supplementary data are available at JXB online. Fig. S1. Three experimental set-ups of pigeonpea grown under glasshouse conditions for sampling different tissues. Fig. S2. k-means clustering of 28 793 significantly expressed genes into ten clusters. Table S1. Expression values (log 2 transformed FPKM) of genes expressed in all 30 tissues included in the gene expression atlas. Table S2. Stably expressed genes across all the 30 tissues. Table S3. Expression values of genes expressed preferentially in floral tissues. Table S4. Expression values of genes expressed preferentially in underground tissues. Table S5. Expression values of genes expressed preferentially in aerial tissues. Table S6. Expression values of genes expressed preferentially in senescing tissues. Table S7. List of DEGs depicting significant differential expression. Table S8. Expression values (log 2 transformed FPKM) of specifically expressed genes. Table S9. Gene-specific primers used for qPCR. Table  S10.
Cis-acting elements identified in flower-related genes. Table S11. Splice sites identified in flower-related genes.