Transcriptome-wide characterization of candidate genes for improving the water use efficiency of energy crops grown on semiarid land

Highlight A matrix correlation analysis between transcriptome-wide expression levels and water use efficiency of Miscanthus lutarioriparius identified candidate genes facilitating adaptation of the energy crop to semiarid marginal land.


Introduction
Drought or water deficiency is one of the most significant abiotic stresses that negatively impacts on plant survival, growth, and productivity (Cattivelli et al., 2008).Decreasing water availability, as a result of increasing industrialization and continuing climate change poses a growing threat to sustainable agriculture (Boyer, 1982;Ings et al., 2013).Water use efficiency (WUE), a drought-adaptive trait that balances carbon assimilated per unit of water transpired, has been linked to drought resistance and higher yields (Wallace, 2000;Tardieu, 2012).WUE is a particularly important factor for dedicated energy crops that are established on marginal land for lignocellulosic feedstock production (Zhang et al., 2011;Suyker and Verma, 2012).Therefore, unravelling the genetic basis of the WUE of energy crops under water deficit conditions holds the potential for the development and improvement of energy crops.
The giant C 4 grasses from the genus Miscanthus have been identified as a valuable germplasm source for secondgeneration energy crops (Clifton-Brown and Lewandowski, 2000;Carroll and Somerville, 2009;Somerville et al., 2010;Sang, 2011;Feltus and Vandenbrink, 2012;Robson et al., 2013).Miscanthus lutarioriparius, an endemic species to central China, produces the highest biomass among the wild Miscanthus species and is also capable of high carbon sequestration and effective soil restoration in eroded regions (Chen and Renvoize, 2006;Sang and Zhu, 2011;Mi et al., 2014).When transplanted to the semiarid Loess Plateau where the annual precipitation is less than half of that in its native habitats, it showed higher WUE and produced even higher biomass than in its native habitats (Liu et al., 2012(Liu et al., , 2014;;Liu and Sang, 2013;Yan et al., 2012Yan et al., , 2015)).Whereas it is evident that WUE is of great importance for developing the energy crop in this area, the genetic mechanisms underlying the improvement of WUE in M. lutarioriparius remains unknown.
Microarray analysis was a widely used transcriptomic technique for identifying candidate genes related to drought resistance and WUE in crops and plants with genome sequences, such as Triticum aestivum L, Oryza sativa, and Arabidopsis (Xue et al., 2006;Karaba et al., 2007).The expression profiling analysis performed on wheat revealed that 11 genes are positively correlated with high WUE, measured as carbon isotope discrimination (Xue et al., 2006).The overexpression of the Arabidopsis HARDY gene in rice improved WUE by enhancing photosynthetic assimilation and reducing transpiration (Karaba et al., 2007).These studies suggested that these differentially expressed genes were candidates underlying WUE or targets for investigating expression quantitative trait loci (eQTLs).With the development of next-generation sequencing, RNA-Seq has become a powerful tool for the comparative analyses of gene expression which is also applicable to study systems without a reference genome sequence (Martin and Wang, 2011;Zhang et al., 2011;Xu et al. 2015).
In this study, RNA-Seq data of 78 individuals of M. lutarioriparius were compared and a correlation analysis was conducted between gene expression patterns and field-measured WUE.The 78 individuals were collected from the 14 populations of the species and planted in two experimental fields, one near its native habitats in Jiangxia of Hubei Province (JH) and the other in the target domestication site, Qingyang of Gansu Province (QG) located in the Loess Plateau of China (Yan et al., 2012).The leaf-level WUE of these individuals was measured in both fields.To investigate the genetic basis of WUE changes from the native to the domestication site, a method of matrix correlation analysis was developed and candidate genes presumably associated with WUE were identified.These results have provided new insights into the adaptive mechanisms of the energy crops to the semiarid region and may help to speed up energy crop domestication.

Plant materials
Mature seeds of 14 populations of M. lutarioriparius were collected in October and November 2008 and planted in 2009 in the experimental fields at Jiangxia in Hubei Province (JH) near its native habitats and at Qingyang in Gansu Province (QG) located in the Loess Plateau (Yan et al., 2012).The altitudes of JH and QG were 45 m and 1 258 m, respectively.The average annual temperature was 16.7 °C and the average annual precipitation was 1 319 mm in JH, while it was 9.3 °C and 556.5 mm in QG, respectively.In this study, 39 individuals of M. lutarioriparius at each site were sampled from 14 populations taken at random (see Supplementary Table S1 at JXB online).The individuals from one site were considered to be one large population because no distinct population structure was detected for either site (Xu et al., 2015).
The samples were kept in liquid nitrogen and brought to the laboratory for RNA isolation.The transcriptome of those leaf samples were sequenced using Illumina HiSeq 2000.The reference transcriptome that included 18 503 unigenes of M. lutarioriparius was assembled via Population RNA-Seq to Assemble Reference Transcriptome (PopART) as described by Xu et al. (2015).The expression level of each sample was estimated as FPKM (fragments per kilobase of exon per million fragments mapped) with Cufflinks v2.0.2 (Trapnell et al., 2010;Xu et al., 2015).The transcriptome coverage per sample, after filtering for read depth, ranged from 41.2% to 74.7%, with an average of 60.4%.Moreover, the sequencing depth was saturated when the number of 80 bp reads of an individual used for assembly, reached about 40 million.The number of reads for each individual is presented in Supplementary Table S1 at JXB online.Genes in the reference transcriptome had an average length of 1 601 bp and N50 of 1 871 bp, of which 93.6% ranged from 500-5 000 bp [see Table S2 in Xu et al. (2015) for details].The raw sequence data are available at NCBI's Short Read Archive under three BioProjects, PRJNA227191, PRJNA227195, and PRJNA226258.

Gas exchange measurements
CO 2 assimilation rate (A) and transpiration rate (E) were logged using the LI-6400 portable photosynthesis system (LI-COR 6400 XT system; LI-COR, USA).Instantaneous water use efficiencies (WUE) were calculated as the ratio of the CO 2 assimilation rate to transpiration rate (A/E) (Polley, 2002).Counting from the top, measurements were conducted on the middle part of the fourth leaf, which was fully expanded, under ambient temperature and photon flux density, while the CO 2 concentration was maintained at 400 μmol mol -1 .An infra-red gas analyser (IRGA) was used to reach equilibrium (monitor ΔCO 2 and ΔH 2 O) every 20 min.These measuring processes were conducted between 10.00 h and 12.00 h.Once the measurements were completed, the leaves were cut off as soon as possible and frozen immediately in liquid nitrogen for subsequent RNA isolation.Because the growing season was one month later in QG than in JH (Yan et al., 2012), the samples were taken around noon on 12 June 2012 in JH and on 13 July 2012 in QG.

Matrix correlation between WUE and expression data
As described above, the levels of gene expression for 18 503 transcripts were estimated based on transcript abundance calculated using FPKM.In order to identify the candidate genes responsible for the change of WUE in the new environment, a new method was explored for non-model species without reference genomes.First, genes were weeded out with the median FPKM value of zero and 15 495 genes were left for further analysis.For each gene, the FPKM value of each individual in QG was divided by the value in JH, with all combinations,which resulted in a 39 × 39 matrix, as it did for the WUE value.The 39 × 39 WUE matrix was then correlated with each 39 × 39 FPKM matrix by mantel test using Spearman's rank correlation method (Fig. 1).The analyses were conducted using vegan package (version 2.0-10) in the R statistical environment release 3.0.2.Thus, a Spearman's rank correlation coefficient was generated for each gene while using a 10 000 permutation test to assess its statistical significance.

Functional categorization and expression pattern analysis of candidate genes
The whole transcriptome of M. lutarioriparius was annotated and a search was made for all the photosynthesis genes of Zea mays and Sorghum in the KEGG and NCBI databases.Then the DNA sequences of these genes were compared with M. lutarioriparius transcripts and photosynthesis genes in M. lutarioriparius were identified.In addition, a detailed functional categorization for candidates was performed using the Nucleotide Basic Local Alignment Search Tool (BlastN) on the National Center for Biotechnology Information (NCBI) non-redundant nucleotide database (Nt).The expectation value (E value) was used to determine the most likely result of a query sequence.Only those results with an E value lower than 10 -10 were considered.
Hierarchical cluster analysis was conducted from the expression levels of all candidate genes (mantel test, P <0.01), using Pearson's correlation distance in Multiexperiment Viewer (MEV) 4.9 software (Saeed et al., 2003).All of the absolute FPKM values of each individual in JH and QG were normalized (divided by the standard deviation of the observations).The differentially expressed genes of candidate genes between the two sites were identified using the t test (normal distribution) or the Wilcoxon test (non-normal distribution) on expression levels.In order to control the family-wise type I error rate (FWE), the raw P values were adjusted using the Benjamini and Hochberg method (1995), which corrected for false discovery rate (FDR).

Validation of gene expression from RNA-Seq
Eight randomly selected genes were used to validate the expression profiling accuracy of RNA-Seq by quantitative real-time PCR (qPCR).The RNA samples were the same as the ones for RNA-Seq.The qPCR primers for the amplification of targeted genes were designed using Primer Premier 5.0 (Premier Biosoft International).Complementary DNA (cDNA) synthesis was carried out using PrimeScript™ Reverse Transcriptase (Takara).Amplification of cDNA was monitored using a SYBR Premix Ex Taq (Takara) on a StepOne Plus Real-Time PCR system (Life Technologies).Each PCR reaction contained 2 μl of the diluted cDNA, 10 μl Takara SYBR Premix Ex Taq, 6.8 μl of nuclease-free water, and 0.8 μl of the forward and reverse primers (10 mΜ stock) in a 20 μl reaction mixture.The PCR cycling conditions were as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 40 s.The melting curve was routinely performed after 40 cycles to verify primer specificity.Three technical replicates were analysed for each template to calculate the average and standard deviation of the expression levels.Relative expression levels of target genes among the sampled individuals were determined using the 2 -ΔΔCt method with the β-actin gene used as the normalizer (Schmittgen et al., 2000).

Genetic variation of candidate genes
Single nucleotide polymorphisms (SNPs) were identified using SAMtools with default settings as described in Xu et al. (2015).Genotypes of all individuals in SNPs within one gene were pooled together as input data for haplotype inferences using PHASE v2.1.1 software which was based on a coalescent model and a hidden Markov model (HMM) method (Stephens and Donnelly, 2003).Nucleotide diversity (π) was calculated for each transcript in JH and QG using custom Perl script according to the method introduced by Nei and Li (1979).In our study, haplotypes were considered as alleles, which were the connection of SNPs within a gene.The genotype of each individual was the best reconstruction of haplotypes with the highest probability.The average gene expression levels were computed for individuals with the same genotype (The minimum number of individuals for each genotype which was compared was not less than three, n ≥3.)To quantify the effects of genotype, environment, and genotype-by-environment interaction on the candidate genes with genetic variation, a two-way (genotype and environment) factorial analysis of variance (ANOVA) was performed directly on the transcriptional levels (FPKM) (Anholt and Mackay, 2004;Landry et al., 2006).For each transcript, firstly, an ANOVA was performed in R project (http:// www.R-project.org)taking genotype, environment, and their interaction into consideration.In the second step, genes without significant genotype×environment effect were analysed using a model without any interaction effect to identify those genes showing main effects of genotype and growth environment alone.

Results
Comparison of water use efficiencies of M. lutarioriparius between two sites WUE ranged from 2.484 to 4.027 μmol mmol -1 in JH, while it ranged from 2.886 to 4.930 μmol mmol -1 in QG.The minimum value of WUE was lower in JH than in QG, and the maximum value of WUE showed the same pattern.In addition, the average WUE in JH and QG were 3.340 and 3.956 μmol mmol -1 , respectively.In order to detect the differences of WUE between M. lutarioriparius individuals in JH and QG, an unpaired two-group Student's t test analysis was carried out which showed that the WUE of M. lutarioriparius individuals was significantly higher in QG than those in JH (P <0.001; Fig. 2).The CO 2 assimilation rate (A) and transpiration rate (E) were recorded for all individuals in both fields (see Supplementary Fig. S1 at JXB online).

Identification and classification of candidate genes in the M. lutarioriparius transcriptome
According to the matrix correlation between expression levels and WUE, a total of 48 genes were identified at the P <0.01 level and predicted to be candidates for WUE alteration in the new environment.Correlation coefficients (r) of these genes ranged from 0.232 to 0.429, revealing their contributions to WUE alteration (Table 1).The detailed functional categorization of candidates was performed on NCBI using BlastN with E values lower than 10 -10 (Table 2).
According to the blast, there were eight photosynthesisrelated genes among the identified candidates.Five of them encoded proteins involved in the assembly of photosystems I and II and one was involved in the C 4 pathway.There were nine genes found to be related to stomatal regulation, of which five could regulate abscisic acid (ABA) signal transduction.One gene encoded a homeobox transcription factor regulating stomata density and other three encoded proteins regulating the stomatal movements.Twelve abiotic stressrelated genes were also found in the 48 candidate genes.They participated in multiple pathways responding to abiotic stress, including drought stress and salt stress.Four genes were involved in protein metabolism.The remaining genes included 12 with unknown function (Table 2).Of 36 candidate gene with annotated functions, one third are abiotic stressrelated, followed by the stomatal regulation-related and the photosynthesis-related genes, each accounting for about one quarter.Expression pattern of candidate genes in two sites The fold changes of average expression levels of the candidate genes between the two sites ranged from 0.26 to 15.60 (see Supplementary Table S2 at JXB online).Compared with those in JH, 72.92% and 27.08% of the genes were up-regulated and down-regulated in QG, and 29.2% and 6.25% of the candidates were up-regulated and down-regulated more than 2-fold, respectively (Fig. 3).Fourteen genes were upregulated more than 2-fold including three photosynthesisrelated genes, three stomatal regulation-related, and four abiotic stress-related.Three genes down-regulated more than 2-fold belonged to the 'Other' functional category.
Fifteen candidates showed significant differentiation on gene expression levels (P <0.05;Table 3).Ten were significantly up-regulated, including five photosynthesis-related genes, two stomatal regulation-related, one abiotic stressrelated, one protein metabolism-related gene, and one gene in the 'Others' functional category.Genes encoding the ORMDL family protein, ASRGL1-like, and three hypothetical proteins were significantly down-regulated.The results showed that 62.5% of photosynthesis-related genes were significantly up-regulated in QG.Considering the significant enhancement of WUE in QG (Fig. 2), the up-regulation of photosynthesis-related genes should have played an important role in regulating WUE upon water deficiency.
Cluster analysis of the 48 candidate genes with normalized expression levels revealed four major groups (Fig. 4A).Cluster 1 comprised nine genes that were expressed largely at higher levels in JH than in QG (Fig. 4B).It contained four genes related to abiotic stress responses and one gene 'MluLR3628' encoding the ORMDL family protein involved in protein folding in the endoplasmic reticulum.The functions of MluLR4945, MluLR16034, MluLR12611, and MluLR13061 were related to abiotic stress and encoded lecithin-cholesterol acyltransferase-like 1 (LCAT1-like), cyclophilin type peptidyl-prolyl cis-trans isomerase (cyclophilin type PPIase), methyl-CpG binding domain 106 protein, and lysine-specific histone demethylase 1 (LSD1), respectively.
Cluster 2 included nine genes showing higher expression levels in QG than in JH (Fig. 4C).Four genes belonged to the 'Stomatal regulation-related' functional category, including genes encoding ubiquitin-protein ligase E3, anion transporter 4, starch synthase II-2, and WRKY transcription factor 4 (WRKY4).Among them, MluLR2876 and MluLR11870 encoding ubiquitin-protein ligase E3 and Fig. 3.The distribution of gene expression change patterns of the candidate genes.Fold changes of gene expression levels are expressed in log 2 (FPKM ratio), where the FPKM ratio was calculated as the ratio of FPKM (QG) to FPKM (JH).FPKM (JH) and FPKM (QG) values represent the average expression levels of each transcript in the experimental fields in Jiangxia of Hubei Province (JH) and Qingyang of Gansu Province (QG), respectively.The log-ratios beyond zero represent up-regulated genes, while the ratio of 1 and -1 mean 2-fold up-regulation and down-regulation, respectively.

Table 3. Differentially expressed genes of candidates at the 0.05 level
The statistics of the t test (normal distribution) or the Wilcoxon test (non-normal distribution) on expression levels was carried out between individuals in JH and QG.P values were adjusted using the Benjamini and Hochberg method (1995), which monitored the false discovery rate (FDR).Up-regulated genes represent that the expression levels are higher in QG than in JH, and down-regulated genes show that the expression levels are lower in QG than in JH.
those encoding PsbI, PsbK, PsbH, Ycf4/PsaI, CemA-like, and thioredoxin-like proteins.The enrichment in photosynthesis-related genes, with a proportion of 60% in this cluster, suggested a powerful coexpression pattern of photosynthesisrelated genes in regulating WUE under the water deficit conditions.Genes encoding CemA-like, Ycf4/PsaI, and PsbH, were up-regulated more than 2-fold in QG.Furthermore, this group included two genes in the chloroplast genome encoding ribosomal protein S4 and S16 that both played crucial roles in protein metabolism.Two genes encoding RH57-like and RLC retrotransposon involved in abiotic stress responses were also clustered in this group and were up-regulated more than 2-fold in QG.
Finally, Cluster 4 comprised 20 genes that were up-regulated in QG compared with lower expression levels in JH (Fig. 4E).Five genes were related to stomatal regulation, including three involved in the response to ABA, which plays a pivotal role in adaptation to drought stress and the production of reactive oxygen species (ROS).The expression levels of genes encoding CRK10 and HEX6 were increased more than two times in QG.MluLR10901 encoding ammonium transporter 2 was up-regulated more than 2-fold in QG, which was subjected to nitrogen regulation (Sohlenkamp et al., 2000).Four genes related to abiotic stress were also identified, including genes encoding methyltransferase-like protein 2-like (Mettl2-like), alcohol dehydrogenase-like 5-like (ADH5-like), prolyl 4-hydroxylase alpha-2 subunit (P4HA1), and 18.8 kDa class V heat shock protein (HSP18.8).Among them, HSP18.8 expression was increased more than 2-fold in QG.In addition, ten genes encoding products with unknown function were clustered in this group.
Quantitative real-time PCR was performed by using 15 randomly sampled individuals from each field site for eight genes, psbH, psbI, psbK, ycf4, petE, OAT4, RH57, and rps4 (Table 4).The relative expression levels determined by the two methods were significantly correlated for all eight genes (Spearman's rank correlation test, P <0.01; Fig. 5).

Genetic variation of candidate genes
SNP analysis showed that 19 candidate genes harboured 92 SNPs in total, with one to 17 SNPs for an individual gene (Table 5).The haplotypes of each gene were phased from SNPs, and each gene had 2-75 haplotypes and the number of genotypes based on haplotype combinations of each gene ranged from 3-71 (Table 5).The nucleotide diversity (π) ranged from 0.000304 to 0.00253 in JH, while it ranged from 0.000200 to 0.00239 in QG.The nucleotide diversity of nine and 10 genes were decreased and increased, respectively.
The contribution of genotype, environment, and genotype×environment interaction on gene expression variation was measured using an ANOVA model for each of 19 genes with SNPs.Detecting significant effects of those factors on gene expression levels represent, respectively, genetic variation for gene expression (G), phenotypic plasticity (E), and genetic variation for phenotypic plasticity (GEI, genotypeby-environment interaction).Of the 19 genes, 16 with a genotype found at least three times in a field site were analysed for GEI through ANOVA (Fig. 6).
The expression levels of eight genes were not significantly affected by genotype, environment or genotype×environment interaction (Fig. 6C, D, F, G, H, K, L, O).Expression of the remaining genes were all affected by environment (P E <0.05; Fig. 6A, B, E, I, J, M, N, P), of which five genes were affected only by environment (P E <0.05, P G >0.05, P GEI >0.05; Fig. 6E, I, J, M, P).Two genes were affected by both genotype and environment (P E <0.05, P G <0.05, P GEI >0.05; Fig. 6A,  B).The expression of one gene was affected by genotype and environment interaction (P E <0.05, P GEI <0.05; Fig. 6N).

The methodology of candidate gene identification
In this study, candidate genes were identified through a correlation analysis between water use efficiency and RNA-Seq data obtained from two field sites with substantially different water availability.By using the mantel test, pairwise ratios for FPKM and WUE between the two fields were calculated, which helped to exclude biases possibly caused by genetic variance and technical errors (Fig. 1).The results indicated that the matrix correlation coefficient could be effective for assessing the genetic basis of a quantitative trait based on expression data obtained from the field conditions.Thus, this method opens an opportunity for studying the genetic and genomic basis of an adaptive physiological trait using RNA-Seq data, which is becoming increasingly accessible for natural populations of organisms without reference genome sequences.

Candidate genes involved in the improvement of WUE
Since WUE is calculated as the ratio of net photosynthesis to transpiration rate, it is expected that the candidate genes of WUE include those involved in photosynthesis.The proportion of photosynthesis genes in the whole reference  (Velthuys, 1980;Fromme and Grotjohann, 2008).PsbK, PsbI, PsbH are plastome-encoded and low molecular mass subunits that are located in the peripheral region of the photosystem II (PSII) reaction centre core (Schwenkert et al., 2006;Nickelsen and Rengstl, 2013).They all play important roles in assembly and stabilizing the binding of cofactors to the PSII core (Iwai et al., 2010;Pagliano et al., 2013).
In sequence alignment, psaI and ycf4 are both annotated in one transcript, since they are very close and are located in a large operon in the chloroplast genome.PsaI is required for the stabilization of the PsaH and PsaL subunits and they are important for building the docking platform for the lightharvesting complex of PSII (LHCII) proteins in the state transition process.Ycf4 is an assembly chaperone of photosystem I (PSI) and appears to be absolutely essential for PSI accumulation in Chlamydomonas (Boudreau et al., 1997).
In the process of photosynthesis, plastocyanin functions as an electron transfer agent between cytochrome f of the cytochrome b 6 f complex and P 700 of PSI.According to Wang et al. (2014a), some photosynthesis-related genes including petE were found to interact with a calcium-sensing receptor which accelerates stomatal movement and the formation of photosynthetic electron transport, thereby regulating WUE and drought tolerance.Thioredoxin as an integral part of the ferredoxin-thioredoxin oxidoreductase executes functions in the reductive regulation of probably hundreds of chloroplast enzymes as well as in the regeneration of components of the antioxidative system, such as peroxiredoxins (Schurmann and Buchanan, 2008).Thioredoxin-like protein is also involved in the biogenesis of the cytochrome b 6 f complex in Arabidposis (Lennartz et al., 2001).All of the above genes except for MluLR15163 encoding thioredoxin-like were up-regulated in QG, which is comparable with findings that some genes in PSII were overexpressed in poplar under drought (Berta et al., 2010).Chloroplast envelope membrane protein belongs to the CemA family and is possibly involved in CO 2 uptake across the inner envelope Fig. 6.Expression reaction norms for WUE-related genes with genetic variation responding to growth environments.Detecting significant effects of those factors on the gene expression level represent, respectively, genetic variation for gene expression (G), phenotypic plasticity (E), and genetic variation for phenotypic plasticity (GEI, genotype-by-environment interaction).The graphs (A)-(P) represent the genes: MluLR17106 (ycf4), MluLR5294 (OAT4), MluLR2876 (UBE3), MluLR7126 (SSII-2), MluLR16886 (WRKY4), MluLR13061 (LSD1), MluLR16034 (Cyclophilin-type PPIase), MluLR4945 (LCAT1-like), MluLR12611 (mbd106), MluLR15146 (FKBP-type PPIase), MluLR17624 (RLC), MluLR14116, MluLR18370 (ASRGL1-like), MluLR3563, MluLR9412, and MluLR18082, respectively.The detailed functional annotation of these genes are given in Table 2.The average expression levels (FPKM) are computed only on genotypes with more than or equal to three individuals.Different genotypes are in a different colour and the error bars indicate the standard deviations.
membrane of the chloroplast (Rolland et al., 1997).Pyruvate orthophosphate dikinase (PPDK) is regulated by the PPDK regulatory protein (PDRP), a bifunctional enzyme, and plays a regulatory role in the phosphoenolpyruvate (PEP)regeneration phase of the C4 photosynthetic pathway (Naidu et al., 2003;Wang et al., 2008;Chastain et al., 2011).Therefore, these results suggest that the up-regulation of these photosynthesis-related genes played a crucial role in improving WUE (Ruiz-Nieto et al., 2015).
Because CO 2 assimilation and transpirational water loss occur predominantly through stomatal pores, it is conceivable that genes involved in stomatal development and stomatal opening/closure affect WUE (Yoo et al., 2010;Des Marais et al., 2014).Among nine stomatal regulation-related genes, five encode products regulating ABA signal transduction which contain the cysteine-rich receptor-like cytosolic kinase 10 (CRK10), hydroxyacid oxidase 1, auxin response factor 4, ubiquitin-protein ligase E3, and WRKY transcription factor 4. It is well known that the plant hormone ABA is involved in abiotic stress responses and regulates stomatal closure (Lee and Luan, 2012;Lim et al., 2012).CRK10 is induced by abiotic stress and up-regulated and probably negatively controls ABA signalling (Tanaka et al., 2012).Hydroxyacid oxidase 1 (HAO1) could catalyse the formation of hydrogen peroxide that is involved in ABA-induced stomatal closure (Zhang et al., 2001;Desikan et al., 2004).Auxin response factor 4 might indirectly participate in ABA signalling due to cross-talk between auxin-and ABA-signalling under drought responses (Bianchi et al., 2002).Ubiquitin-protein ligase probably positively regulates ABA signalling under abiotic stress (Zhang et al., 2007) while, in Arabidopsis, the CER9 gene encodes an E3 ubiquitin ligase involved in cuticle formation that could suppress transpiration and maintain plant water status (Lu et al., 2012).WRKY4 was also found to be involved in ABA signalling and the mediation of stomatal closure (Rushton et al., 2012).
The MluLR5858 transcript encodes the homeobox transcription factor WOX14, which might regulate stomata density and plant vascular proliferation (Nakata et al., 2012;Etchells et al., 2013).The MluLR5294, MluLR7126, and MluLR12213 transcripts encode anion transporter 4, starch synthase II-2, and hexose carrier protein 6 (HEX6).Anion transporter 4 is essential for resistance to abiotic stress and ion homeostasis and might be involved in the regulation of stomatal movements (Dietrich et al., 2001;Wang et al., 2014b).Starch synthase II-2 and HEX6 regulate the balance of starch and sucrose synthesis that can control stomata opening/closure.Thus, genes regulating stomata development and movement also played important roles in the improvement of WUE.
Twelve abiotic stress-related genes participate in multiple pathways related to abiotic stress.Drought stress responses can be regulated through epigenetic mechanisms such as DNA methylation and histone modification (Chinnusamy and Zhu, 2009;Yin et al., 2014).Genes encoding the methyl-CpG binding domain 106, Mettl2-like, and LSD1 were involved in epigenetic regulation.It was found that the methyl-CpG binding domain 106 played an important role in interpreting the genetic information encoded by methylated DNA (Zou et al., 2012).Mettl2 was associated with the histone deacetylase activity (Song et al., 2010), and LSD1, a histone demethylase, played a role in the epigenetic regulation of gene expression (Shi et al., 2004;Kim et al., 2008;Papaefthimiou and Tsaftaris, 2012).

Expression and genetic variation of candidate genes and environmental effect
The significant correlation between qPCR validation and RNA-Seq data indicated that the relative gene expression levels between the fields are reliable, which provided a satisfactory validation of the RNA-Seq results (Li et al., 2013).Although the fold changes of some candidate genes across the two sites were small, the variation in gene expression was high among individuals in each site (see Supplementary Table S2 at JXB online; Fig. 5).Given that all of the 48 identified genes are significantly correlated with the changed WUE, the correlation coefficient may reflect to a certain extent the relative contribution to WUE.The psbK gene involved in photosynthesis showed the highest contribution to WUE (r=0.429)(Tables 1, 2).The average correlation coefficients of photosynthesis-related, stomatal regulation-related, protein metabolism-related, abiotic stress-related, and other genes are 0.304, 0.277, 0.310, 0.280, and 0.292, respectively, which suggested that the photosynthesis-and protein metabolismrelated genes showed a higher contribution to WUE than other functional categories.Although the variation range of fold change is from 0.26 to 15.60, it is noteworthy that there is no correlation between fold change of FPKM and correlation coefficients (r=0.043;Table 1).
The most remarkable up-regulation (15.6-fold change) is found in the MluLR10901 transcript which encodes ammonium transporter 2 (AMT2) involved in nitrogen metabolism.Leaf nitrogen is a major driver of photosynthetic capacity and is critical to determining WUE (Donovan et al., 2007).The expression level of the AMT2 gene that increased sharply in QG might imply that the significant overexpression could compensate for the high carbon assimilation rate, due to low stomatal conductance under water deficit, whereas, psbK and psbI genes were significantly up-regulated with 1.37-and 1.53-fold change, respectively (Tables 1, 3).They are elements of photosystem electron transport and might possess high WUE through steady increase of expression levels.The hierarchical clustering analysis showed strong representation of photosynthesis-related genes in cluster 3 (Fig. 4D), and half of the genes in this cluster were up-regulated more than 2-fold, which indicated a coexpression pattern of photosynthesis genes.
In Fig. 6J, MluLR15146 coding for FKBP-type PPIase was involved in abiotic stress responses since levels of FKBPs were reported to increase in response to drought stress in sorghum (Sharma and Singh, 2003).On the other hand, the expression level of MluLR12611, which encodes methyl-CpG binding protein linking DNA methylation to histone methylation (Zou et al., 2012), is down-regulated under drought stress (Fig. 6I).The expression of MluLR18082 and MluLR18370 (Fig. 6M, P) classified in the 'Other' functional category are reduced under the water deficit environment.MluLR16886, a WRKY gene, increased its transcriptional level under abiotic stresses (Fig. 6E).WRKY transcription factors respond to several stress factors and regulate stress-related genes in order to adapt to adverse conditions (Chen et al., 2012), and are also involved in ABA signalling (Rushton et al., 2012).The expression level of MluLR3563 shows genetic variation for phenotypic plasticity (P E <3E-06, P GEI =0.003), but its function is still unknown.Finally, the transcriptional levels of MluLR5294 and MluLR17106 are induced by drought stress and both show genetic variation and phenotypic plasticity (P E <3E-16, P G =0.01; P E <5E-05, P G =0.02).MluLR5294 codes for an anion transporter which is essential for nutrition, resistance to biotic and abiotic stresses, and stomatal movement by regulating chloride homeostasis (Dietrich et al., 2001;Wang et al., 2014b).MluLR17106 encodes Ycf4, an assembly chaperone of PSI and appears to be of central importance to PSI accumulation (Boudreau et al., 1997;Schottler et al., 2011).
The expression levels of eight genes were all affected by growth environments (Fig. 6), which indicate that they have phenotypic plasticity to adapt to the new environment.It appears unreasonable that none of the eight genes was affected by genetic variation, compared with previous studies in yeast (Smith and Kruglyak, 2008) and Arabidopsis (Des Marais et al., 2012).One probable explanation for this phenomenon could be that the growth environment is more complicated under natural than controlled conditions.Furthermore, given that the genes are related to WUE rather than all transcripts, the WUE-related genes could have responded more sensitively to the changes of environment than to the genetic variations.
In conclusion, the five functional categories of identified genes were found relevant to the regulation of WUE, which substantiates the effectiveness of the matrix correlation analysis of physiological traits and RNA-Seq for candidate gene identification.It is noteworthy that most of the candidate genes involved in photosynthesis, stomatal regulation, and abiotic stress responses were up-regulated.Moreover, our analyses suggested that the relatively drastic changes in expression levels of the candidate genes were affected by environment rather than genotype.This study identified the candidate genes important for water deficiency adaptation of second-generation energy crops, which are subjected to further functional validation and possibly future utilization for crop improvement.

Fig. 1 .
Fig. 1.Matrix correlation based on physiological trait and expression data.Water use efficiencies (W) ratio and FPKM (F) ratio matrices are performed by mantel test via Spearman's rank correlation method.The mantel test is conducted on 15 495 transcripts of M. lutarioriparius (i=1, 2, …, 15 495).Each correlation is conducted by 10 000 permutations.

Fig. 2 .
Fig.2.Comparison of water use efficiencies (WUE) of M. lutarioriparius between two sites.The empty boxplot shows the mean value of WUE in Jiangxia of Hubei Province (JH) and the solid boxplot the mean value of WUE in Qingyang of Gansu Province (QG).The t test between the WUE values in JH and QG is examined and the result shows that the WUE value of M. lutarioriparius is significantly higher in QG than in JH (P <0.001).

Fig. 4 .
Fig. 4. (A) Hierarchical clustering of 48 candidate genes differentially expressed between individuals in JH (left) and QG (right).The normalized gene expression values (FPKM) of candidate genes of each individual are used for the cluster display.The colour scale (representing the normalized gene expression) is shown at the bottom.The genes which share similar expression patterns are divided into four groups, (B) Cluster 1, (C) Cluster 2, (D) Cluster 3, and (E) Cluster 4. For the full names of abbreviations in annotation see Table2.Up-regulated (+), down-regulated (-), up-regulated more than 2-fold (2+), down-regulated more than 2-fold (2-).

Table 1 .
Candidate genes of water use efficiency (WUE) identified at the 0.01 level by mantel test

Table 2 .
Functional categorization and putative annotation of candidate genesThe annotation and potential functional groups of candidate genes are performed using Nucleotide Basic Local Alignment Search Tool (BlastN) on National Center for Biotechnology Information (NCBI).The abbreviations of annotation and the best species of sequence alignments are enclosed in parentheses and brackets, respectively.

Table 4 .
Primers for quantitative real-time PCR.

Table 5 .
Genetic variation of candidate genesHaplotypes are inferred from the connection of SNPs within a gene using PHASE software, and genotype of each individual is the best reconstruction of haplotypes with the highest probability.Nucleotide diversity is represented by π.Expression level correlation between RNA-Seq and qPCR.Negative correlation between FPKM values of RNA-Seq and average Ct values of qPCR indicate a consistent estimation of the relative expression levels between the two methods.The R in the graphs indicates the correlation coefficient.**representsthe significant level (P <0.01, Spearman's rank correlation test).Sequences of PCR primers are given in Table4.Red and blue dots represent individuals sampled from JH and QG, respectively.