TEF-7A, a transcript elongation factor gene, influences yield-related traits in bread wheat (Triticum aestivum L.)

Summary TaTEF-7A is a functional regulatory factor gene for grain number and other traits, and its diagnostic markers could be used in marker-assisted selection to improve yield potential in wheat.


Introduction
Transcript elongation is a simple extension of an initiated transcript (Grasser, 2005). In recent years, it has become apparent that synthesis of mRNA is a dynamic and highly regulated step in gene expression, and it requires the concerted action of a variety of elongation factors (Lolas et al., 2010). Transcript elongation factors (TEFs) play a crucial role in regulation, proliferation, and differentiation of cells, and also control various growth processes. The vast majority of studies on the regulation of eukaryotic transcription have focused on the early stages of the transcription cycle. However, studies of Drosophila and zebrafish mutants defective in elongation factors revealed intriguing results about the essential roles of these factors in development. Mutations in the Drosophila gene encoding a member of the ELL family (dELL), which belongs to the TEF group, cause segmentation defects and embryonic lethality, and dELL is required for normal expression of diverse genes during development (Eissenberg et al., 2002). Characterization of the effect of a foggy mutant on the TEFs Pandora/Spt6 and Foggy/Spt5 with an amino acid substitution in zebrafish revealed that the mutation not only affected typical overall anatomy but also influenced the differentiation of cells in specific neuron lineages (Keegan et al., 2002). Recent studies in Arabidopsis and maize also indicated that factors regulating transcript elongation play crucial roles in plant development. Mutations in the Arabidopsis DEFORMED ROOTS AND LEAVES1 (DRL1) gene cause disorganized shoot, inflorescence, flower, and root growth (Nelissen et al., 2003). DRL1 is likely to be involved in regulation of meristem activity and organ growth. Mutations in three genes encoding subunits of the Arabidopsis elongator complex display a pleiotropic phenotype, to some extent resembling the phenotype of the drl1 mutant (Nelissen et al., 2005). Thus, elo mutants show reduced growth of organs that can be attributed to a reduced rate of cell proliferation. Molecular analysis of elo mutants revealed that many genes were abnormally expressed in the mutants compared with the wild types (Nelissen et al., 2005). A study of the maize Etched (et1) mutant revealed the intriguing possibility that control of transcript elongation also plays a crucial role in expression of plastid gens; et1 mutants display delayed greening of seedling leaves. Mature kernels of these plants are severely fissured and cracked. Microscopic examination demonstrated aberrant plastid development in mutant leaves and kernels (Da Costa e Silva et al., 2004).
In rice, a T-DNA insertion mutant in a gene encoding a TEF homologous to yeast elf1 (Os2TEF1) led to a 60-80% reduction in tillering, retarded growth of seminal roots, and sensitivity to salt stress compared with wild-type Basmati 370. Detailed transcriptomic profiling of OsTEF1 revealed that mutation in the TEF differentially regulated expression of >100 genes with known function and finally regulated the tillering process by inducing expression of cytochrome P450 (Paul et al., 2012). These findings show that the TEFs are more pleiotropic than previously thought, and that previous studies failed to identify all functions of the factors.
Bread wheat (Triticum aestivum L.), as one of the three most important cereals, is widely cultivated worldwide, and improvement of wheat yields could prevent global food shortages. As it is an allohexaploid species with an extremely large and complex genome (2n=6x=42) of ~16 000 Mbp (Tilman et al., 2011), gene isolation is extremely difficult. It is well known that grain yield is manifested via a complex relationship among yield component traits such as 1000-kernel weight (TKW), grain number per spike (GN), and effective tiller number (Zhang et al., 2010). Thus, deciphering the molecular mechanisms underlying yield-related genes in wheat is of great interest not only to plant biologists, but also to breeders. Wheat yield, or yield components, is controlled by numerous genes with additive and epistatic effects that are highly interactive with the environment. Quantitative trait locus (QTL) mapping has been widely used to study specific yield-related traits in bread wheat (Zhang et al., 2012). Numerous QTLs have been reported for traits such as TKW, GN, grain weight per spike, and tiller number (Huang et al., 2004;Narasimhamoorthy et al., 2006), but fine mapping of particular genes from QTLs associated with yield is rarely reported (Barrero et al., 2011). Fine mapping is hampered by the large genome and large number of repeated nucleotide sequences. A few genes associated with yield have been isolated by comparative genetics. For example, an orthologue of the rice OsGW2 gene, named TaGW2, was cloned in wheat; this gene significantly affected TKW (Su et al., 2011;Yang et al., 2012). Zhang et al. (2012) characterized TaCKX6-D1 that also increased grain productivity, and further analysis showed that it explained phenotypic variance of 1.3-1.4 g per 1000 kernels. The function and mechanism of a glutamine synthetase gene controlling nitrogen use was comprehensively studied by Li et al. (2011) and Quraishi et al. (2011), who found that the best haplotype conferred superior seedling growth, better agronomic performance, and improved N uptake during vegetative growth and grain N concentration. Xue et al. (2011) identified TaMYB13, a transcriptional activator of fructosyltransferase genes, that is involved in β-2, 6-linked fructan synthesis in wheat, and positively associated with grain yield. Currently, there is an increasing need to clone yield-related genes, exploit the more favourable alleles, and develop tailor-made markers for marker-assisted selection (MAS).
In this study, work is presented on isolation of the gene TaTEF-7A in wheat and its functional characterization by expression analysis, association analysis, near-isogenic line (NIL) comparison, and overexpression in Arabidopsis. The findings indicate that TaTEF-7A mainly regulates yieldrelated traits involved in vegetative growth and reproductive development. Furthermore, the coding and promoter regions were analysed and sequence polymorphisms among wheat cultivars were identified. Association analysis and comparison of NILs showed that TaTEF-7A was associated with grain number. Finally, a functional marker for TaTEF-7A was developed for MAS of high-yielding wheat genotypes.

Plant materials
A Chinese wheat mini core collection (MCC) was employed for association of yield traits with markers. The MCC contains 262 wheat accessions, comprising 157 landraces, 88 modern cultivars, and 17 introduced lines, and represents 1% of the national wheat collection but >70% of the genetic diversity . Accessions of the three putative diploid progenitors of hexaploid common wheat (AABBDD), namely Triticum urartu (AA, accession UR203), Aegilops speltoides (SS, closely related to BB, accession Ae49), and Aegilops tauschii (DD, accession Y2282) were used for identification of the genomic origins of TaTEF homoeologues. A set of nullisomic-tetrasomic lines of Chinese Spring was used for chromosomal location of TaTEF genes. A double haploid (DH) mapping population (Hanxuan 10×Lumai 14) was used for linkage mapping. NILs for TaTEF-7A were identified in a BC 3 F 6 of Jinmai 47//4*/Lumai14 population. In addition, a large number of modern wheat cultivars, comprising 384 European, 429 North American, 53 CIMMYT, 82 Russian, and 51 Australian cultivars (Hou et al., 2014), were used to determine the geographic distribution of TaTEF-7A haplotypes.

Phenotypic assessment and statistical analyses
Phenotypic traits of all accessions in the MCC, namely heading date (HD), maturity date (MD), spike length (SL), spikelet number per spike (SN), plant height (PH), grain number per spike (GN), effective tiller number (ETN), 1000-kernel weight (TKW), kernel length (KL), kernel width (KW), and kernel thickness (KT), were collected in four environments-in 2002, 2005, and 2006 at Luoyang, Henan province, and in 2010 at Shunyi, Beijing; they were named 02LY, 05LY, 06LY, and 10SY, respectively. Each accession was planted in a 2 m two-row plot with 30 cm between rows, and 40 seeds per row. Ten plants from the middle of each plot were used in investigating the above phenotypic traits. The DH mapping population (Hanxuan 10×Lumai 14) and NILs were planted at Changping, Beijing. Five plants in the middle of each plot were randomly sampled for analysis. Eight agronomic traits were measured, namely PH, SL, TKW, GN, ETN, SN, penultimate internode length, and uppermost internode length.
Mean values of phenotypic traits and standard errors were analysed by SPSS 16.0 software (http://www.brothersoft.com/downloads/spss-16.html). The mean value of each trait in the Chinese wheat MCC was estimated by the best linear unbiased predictor (BLUP) method (Bernardo et al., 1996a, b, c).

DNA and RNA extraction
Genomic DNA was extracted from young leaves of each accession using the cetyltrimethylammonium bromide (CTAB) method (Stewart and Via, 1993). Total RNA was isolated using an RNAprep pure plant kit (Tiangen) according to the manufacturer's instructions. cDNA was synthesized with M-MLV Reverse Transcriptase (Promega), the cDNA was diluted 10 times, and 2 μl were used for subsequent PCR.

Primers and PCR
Primers were designed by the software Primer Premier Version 5.0 (Premier Biosoft International, Palo Alto, CA, USA), and the primers were synthesized by Sangon (www.sangon.com). The Primer sequences are shown in Supplementary Table S1 available at JXB online. LA-Taq enzyme from TaKaRa (www.takara.com.cn) was used for PCR amplification. PCR were performed in total volumes of 15 μl, including 3 pmol of each primer, 120 μM of each dNTP, 80 ng of template DNA or cDNA, 0.75 U of La-Taq, and 7.5 μl of 2× buffer [TaKaRa Biotechnology (Dalian) Co. Ltd, Product Code: DRR20AG]. PCRs were performed as follows: 95 °C for 4 min; followed by 30-35 cycles of 95 °C for 30 s, annealing (55-62 °C) for 30 s, and extension at 72 °C (30 s to 3 min), and 72 °C for 30 s, with a final extension of 72 °C for 10 min. The annealing temperatures and extension times depended on the primer sets and lengths of expected PCR products.

Functional marker development
Development of TaTEF-4F/R was based on two selected variations in the promoter region of TaTEF-7A, including a 1 bp indel at position -629 bp (InDel1-629) and another 1 bp indel at position -604 bp (InDel1-604). First, genomic-specific primer set TaTEF-3F/R was used to amplify fragments from chromosome 7A in all cultivars. A second PCR was performed as follows: the first PCR solution was diluted 100 times, taking 1 μl as template for the second PCR, primer TaTEF-4F/R was used for the second PCR, with annealing at 54 °C for 30 s, and extension at 72 °C for 30 s.

Subcellular location of TaTEF-7A protein by fusion with green fluorescent protein (GFP)
The full-length cDNA clone of TaTEF-7A was fused upstream of the GFP gene and put under control of the constitutive Cauliflower mosiac virus (CaMV) 35S promoter in the pJIT163-GFP expression vector to construct a 35S::TaTEF-GFP fusion protein. Restriction sites, KpnI and XbaI, were added to the 5' and 3' ends of the coding region by PCR. The PCR products were digested with restriction endonucleases, ligated with the pJIT163-GFP plasmid, and cut with the corresponding enzymes to create recombinant plasmids expressing the fusion protein. The subcellular location of TaTEF-7A was detected by monitoring transient expression of GFP in wheat protoplast cells as described by Mao et al. (2011) Expression analysis Quantitative real-time PCR was performed using SYBR ® Premix Ex Taq™ II (Takara) according to the manufacturer's instructions on a 7300 Real-time PCR System (Applied Biosystems). The relative expression of each gene was calculated according to the 2 -ΔΔCT method (Livak and Schmittgen, 2001). The glyceraldehyde-3-phosphate dehydrogenase gene was used as an endogenous reference for real-time PCR, and all analyses were performed with three technical and three biological replicates.

Binary vector construction and transformation of Arabidopsis
The coding sequence of TaTEF-7A, including the XbaI (5′) and BamHI (3') restriction sites, was first amplified. The PCR products and pbi121 plasmid were digested with XbaI and BamHI, and then cloned into the binary vector pbi121 under control of the CaMV 35S promoter in the sense orientation. The construct was transformed into Arabidopsis (ecotype Col-0) by the floral-dip method according to previously described procedures using Agrobacterium tumefaciens strain GV3101 (Herrera-Estrella et al., 2005). Transgenic plants were selected on solid half-strength Murashige and Skoog (MS) medium containing 50 mg l -1 kanamycin (Ahmed et al., 2012).

Transient expression in tobacco
The promoter fragment of TaTEF-7A was fused with the β-glucuronidase (GUS) gene in the pCAMBIA1391Z vector. Restriction sites, BamHI and AvrII, were added to the 5′ and 3′ ends of the coding region by PCR. The PCR products were digested with restriction endonucleases, and then ligated with the plasmid. pBICHP57. The luciferase (LUC) reporter gene (Wu et al., 2009) was cloned in the binary vector pBI121 downstream of the CaMV 35S promoter. Nicotiana benthamiana leaves were co-infiltrated with plasmids and pBICHP57. The GUS activity was normalized by the LUC activity. Plant transformation, and GUS and LUC activity assays were determined according to Wu et al. (2009).

Plant growth and documentation
Arabidopsis thaliana (ecotype Columbia-0) was chosen for transgene analysis, and the T 3 generation was grown in a controlled environment chamber at 22 °C, with a 16 h/8 h photoperiod, light intensity of 120 mmol m −2 s −1 , and 70% relative humidity. After sowing, seeds were stratified in darkness at 4 °C for 48 h prior to incubation in a plant growth chamber. At least five plants of each line were used for phenotypic characterization. Distances from the stem bases in each line were measured to compare silique lengths. The numbers of siliques on each line were counted. Seeds of each line were photographed and measured using an Olympus BH-2 microscope (Olympus, Japan).

Isolation of TaTEF-7A and phylogenetic analysis of orthologous genes in wheat and other species
Using the rice TEF gene (Os02g04160) as a query, contig08999 was found in the Chinese Spring draft genome sequence (http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/ Documents/DOC_search_reads.php). A full-length wheat coding sequence (CDS; accession CJ655632) was retrieved from NCBI by blasting the contig sequence against GenBank (Supplementary Fig. S1 available at JXB online). The TaTEF genomic sequence and CDS were amplified by TaTEF-5F/R from Chinese Spring genomic DNA and cDNA, respectively.
Their lengths were ~1100 bp and 270 bp, respectively. cDNA sequences of the three homoeologous genes in the A, B, and D genomes were obtained (Supplementary Figs S2, S3 available at JXB online). Four exons and three introns were detected in this gene ( Supplementary Fig. S1 available at JXB online).
TaTEF encodes an 89 amino acid polypeptide that is 84.9% identical to OsTEF protein ( Supplementary Fig. S1b available at JXB online). In most species, TEF genes have similar structures with four exons and three introns. A phylogenetic tree of TEF genes was generated based on the predicted polypeptide sequences in monocots, eudicots, and yeast (Supplementary Table S2 available at XB online). All of the proteins shared high similarity of 52.8-95.5%, implying potentially conserved functions among orthologues ( Supplementary Fig. S1c available at JXB online). The tree consists of at least three divergent clades with strong bootstrap support; the species include monocots, eudicots, and Saccharomyces cerevisiae.

Diversity mainly occurs in the promoter region of TaTEF on chromosome 7A
A number of single nucleotide polymorphisms (SNPs) and Indels were found in the intron sequences of TaTEF genes from the A, B, and D genomes. This allowed discrimination of TaTEF-7A from the other copies on chromosomes 7B and 7D by amplification of genome-specific primers TaTEF-1F/R, TaTEF-2F/R, and TaTEF-3F/R (Fig. 1). The region of each primer set is shown in Fig. 2. No nucleotide difference was detected in the coding region from 48 cultivars with quite different phenotypes, implying that the mechanism of control of wheat development possibly depends on regions external to the gene. Approximately 700 bp upstream of ATG of the 48 accessions were then sequenced. Fourteen nucleotide substitutions and two Indels were detected among the 48 cultivars, and three major haplotypes were formed by 16  Table S3 available at JXB online). The design of primer TaTEF-3F/3R was based on polymorphisms in the promoter region. As shown in Fig. 1C, target fragments were detected only in accessions with the A genome, and were not amplified in Ae. speltoides, Ae. tauschii, NT7A7B, and NT7A7D. This indicated that TaTEF was amplified from chromosome 7A in Triticum spp. Moreover, primer TaTEF-4F/R (Table 1) was used to genotype 150 DHs derived from Hanyuan10×Lumai14. TaTEF was mapped to a region flanked by P3156.3 (7.2 cM distal) and Xwmc83 (6.5 cM proximal) on chromosome 7A (Fig. 3).

TaTEF-7A is mainly expressed in young spikes and developing seeds, and the protein is located in the nucleus
Transient expression of TaTEF-7A fused with GFP in wheat protoplast cells showed that TaTEF7A-GFP accumulated only in the nucleus, whereas GFP alone was present throughout the whole cell (Fig. 4), demonstrating that TaTEF-7A interacted with the cell nucleus. TaTEF-6F/R (Table 1) was designed based on the 5'-untranslated region (UTR) sequence, which was amplified only from the 7A chromosome ( Supplementary Fig. S4 available at JXB online). A tissue-specific expression study using quantitative real-time PCR showed that TaTEF-7A was expressed in nearly all tissues tested (Fig. 5). The highest expression levels were detected in young spikes and developing seeds. The transcription level was consistent in spikes at various stages from young to immature seeds and increased gradually during grain filling, indicating that it may have effects on yield traits.

Overexpression of TaTEF-7A in Arabidopsis alters vegetative and reproductive development
Tissue-specific expression showed that TaTEF-7A was constitutively expressed, but at much higher levels in young spikes and developing seeds. A T-DNA insertion in rice caused 60-80% less tillering, retarded growth of seminal roots, and increased sensitivity to salt stress compared with the wild type (Paul et al., 2012). The question that arises is what traits are affected when TaTEF-7A is overexpressed in Arabidopsis. Transgenic plants harbouring TaTEF-7A constructs driven by the CaMV 35S promoter were generated. Overexpression of TaTEF-7A in Arabidopsis enhanced vegetative growth of both seedlings and booting stage plants (Fig. 6A, B). As shown in Fig. 6C and D, and Table 1, silique and grain lengths were increased in transgenic lines 1 and 2 compared with the   control. The expression levels of TaTEF-7A transgenics were also higher (Fig. 6E). The fact that TaTEF-7A affects silique length, silique number, and grain length suggests that this gene should also influence yield traits in wheat.
The MCC is regarded as a suitable population for detection of major QTLs controlling yield traits (Zhang et al., 2012). In comparing the effects of each haplotype on agronomic traits GN, KL, KW, KT, and TKW in MCC landraces, significant associations were identified, and there was a negative relationship between GN and TKW. However, among modern cultivars, there were significant differences in GN between haplotypes. Interestingly, the phenotypic means for KL, KW, KT, and TKW in Hap-7A-3 accessions were all higher than for other haplotypes in this subset, and GN differences between the three haplotypes were also significant. Thus the average GN of Hap-7A-3 was increased, with TKW enhanced (P<0.05) between landraces and modern cultivars during yield improvement (Table 2). Collectively, these results demonstrated that Hap-7A-3 was positively selected because it had a significantly positive effect on yield. Previous studies demonstrated that GN differences were associated with TaELF-7A haplotypes (Table 2). Therefore, differences in GN between TaELF-7A haplotypes under different environmental conditions were examined (Table 3). In landraces, the GN of Hap-7A-1 was higher than those of Hap-7A-2 and Hap-7A-3 in all environments, but a significant difference (P<0.05) occurred only in 02LY. In modern cultivars, in contrast, the GN of Hap-7A-3 was significantly higher than those of Hap-7A-1 and Hap-7A-2 in both 02LY and 10SY. Because Hap-7A-3 had a significantly positive effect on GN, it was considered a potentially superior allele for improvement of grain yield in wheat.

The effects of TaTEF-7A haplotypes on yield traits were further confirmed in NIL comparisons
NILs derived from a BC 3 F 6 population of Jinmai 47//4*/ Lumai14 were used to evaluate haplotype genetic effects, in which Jinmai 47 represented Hap-7A-3 and Lumai 14 represented Hap-7A-2. Functional analysis of TaTEF-7A in NILs is shown in Fig. 8. It was clear that Hap-7A-3 has higher GN (18.4%) than Hap-7A-2. Moreover, although Hap-7A-3 (NIL) has a longer penultimate internode than Hap-7A-2 (L14) (13.9 ± 2.6 cm versus 11.4 ± 1.3 cm), there was no significant difference in PH. Hap-7A-3 also exhibited an increase in SN and  TKW but a lower TN that did not reach statistical significance. This further confirmed that Hap-7A-3 potentially increases grain yield through regulating grain number per spike.

TaTEF-7A positively regulates yield-related traits
To confirm the expression patterns of TaTEF-7A, young spikes of each haplotype represented by four, one, and four varieties, respectively, were collected. Their relative expression values were measured by quantitative real-time qPCR (Table 4). Hap-7A-3 was the highest overall, and a significant positive correlation between the expression level of TaTEF-7A and GN was observed (r=0.889, P<0.001) The average expression level of TaTEF-7A was significantly higher than that of either TaTEF-7B or TaTEF-7D in a set of NILs. GN was positively correlated with the total expression level of the three TaTEF genes. Furthermore, TaTEF-7A was the highest among the three homoeologues, except in the recurrent parent Lumai 14 (Fig. 9). Association analysis also showed that Hap-7A-3 at TaTEF-7A was significantly related to higher GN, whereas Hap-7A-1 and Hap-7A-2 were associated with lower GN (Table 2). Transient expression analysis in tobacco  also showed that the average relative expression of Hap-7A-3 was higher than that for the other two haplotypes (Fig. 10).
All of these results further demonstrated that TaTEF-7A positively regulated yield-related traits through its expression level.
This region of TaTEF-7A coincides with several previously reported yield-related QTLs, including spikelet number per spike QSpn.nau-7A (Ma et al., 2007;Zhang et al., 2010), flour yield QFy550.b22-7A (Kunert et al., 2007), test weight QTw. crc-7A, and grain yield QYld.crc-7A (Huang et al., 2006). In an immortalized F 2 population of 136 lines derived from Nanda 2419×Wangshuibai, QSpn.nau-7A explained 10-12% of the spikelet variation (Ma et al., 2007). Agronomic and quality traits were also detected in this region in a DH population from AC Karma'×87E03-S2B1. QTw.crc-7A for test weight and grain yield per plot explained 10.6% of the phenotypic variance in test weight, and QYld.crc-7A explained 8.1% of the variance in grain yield (Huang et al., 2006). This locus was also related to flour yield in another study (Kunert et al., 2007). An advanced backcross QTL (AB-QTL) strategy utilized to locate QFy550.b22-7A for baking quality traits in two BC 2 F 3 populations of winter wheat explained 11.9% of the genetic variance (Kunert et al., 2007). These results suggest that TaTEF-7A might be related to major yield QTLs mapped to the same region, and that the gene possibly confers complex pleiotropic effects on growth, yield, and quality.

Hap-7A-3 was positively selected in Chinese wheat breeding programmes
A total of 348 modern cultivars released in China since the 1940s  were classified into six subgroups  These cultivars were planted at Shunyi, Beijing. The expression level of YM18 was assumed to be 1.00. (1940s, 1950s, 1960s, 1970s, 1980s, and 1990s) according to decade of release, and were used to determine haplotype frequencies of TaTEF-7A over time. The frequency of Hap-7A-3 increased from 12.5% in the 1940s to 38.2% in the 1990s. Interestingly, the frequencies of this haplotype from the 1940s to 1990s were consistent with yield-related differences in GN. This strongly indicates that Hap-7A-3 was positively selected in Chinese wheat breeding and that its frequency should be closely correlated with increases in GN (Fig. 11).

Geographic distribution of haplotypes of TaTEF-7A in global wheat cultivars
In order to evaluate comprehensively and systematically the distribution of all TaTEF-7A haplotypes in global wheat cultivars, haplotypes were determined in cultivars from North America, Australia, China, CIMMYT, Europe, and Russia. In Chinese landraces, Hap-7A-1 was predominant across all zones (Fig. 12A), whereas the other two haplotypes had lower frequencies in some ecological regions. The favoured haplotype Hap-7A-3 occurred only in two zones (III at a frequency of 8.7% and VII at 11.1%). A population with 348 Chinese modern cultivars was employed to reveal further the haplotype distributions in China. As shown in Fig. 12B, Hap-7A-1 frequency declined in almost all zones except in VII. Interestingly, Hap-7A-3 dramatically increased in different zones, particularly I (14.3%), II (29.9%), and III (29.4%), which have the oldest and strongest breeding programmes in China. It can be inferred that Hap-7A-3 is the most favoured haplotype in breeding, because modern breeding has significantly promoted its frequency in Chinese modern cultivars (Jiang et al., 2011;Su et al., 2011). In addition, haplotype analysis of 384 wheat cultivars released in 18 European countries showed that the average frequency of Hap-7A-3 was 41.4%, ranging from 10% to 68.2%, which was much higher than that in cultivars released in China (Fig. 12C). Hap-7A-3 distribution showed a geographic bias, with a higher frequency in western than in eastern Europe (Fig. 12C). Comparing the global distribution in six major international wheat regions (Fig. 12D), Hap-7A-3 occurred at the highest frequency in varieties released in Russia and North America, but at lower frequencies in CIMMYT and Australian cultivars. This further demonstrated that the favoured haplotype Hap-7A-3 underwent selective pressure but with different degrees in global wheat breeding.

Genome rearrangement of the TEF orthologues occurred in grass crop evolution
Poaceae species have diverged >65 million years. Comparative mapping studies indicate that there is high synteny between species at the macro level (Sorrells et al., 2003). An 89 amino acid wheat protein encoding the TaTEF-7A gene was identified using genome-specific primers and Chinese Spring nullisomictetrasomic lines. In rice, LOC_Os02g04160, the orthologue of TaTEF-7A, is located on chromosome 2. Collinearity was shown between wheat homoeologous group 6 and rice chromosome 2 (Bolot et al., 2009;Su et al., 2011). To date, there is little information to identify and characterize shared duplications within wheat and between rice and wheat. Interestingly, one model proposed a pathway for evolution of the rice, wheat, sorghum, and maize genomes from a common ancient ancestor (n=5) followed by genome duplication, with breakage and fusion of different chromosomes in the genomes of each of the four species (Salse et al., 2008). Translocations and fusions were found between ancestral chromosomes A2, A4, and A6, which inevitably led to rearrangements of wheat homoeologue groups 6 and 7, and this also disrupted the collinearity between wheat group 6 chromosomes and rice chromosome 2. The amplification products of TaTEF-7A produced by three pairs of genome-specific primers were mapped on chromosome 7A. This confirmed the above opinion and indicated that grass genomes are labile, rapidly evolving entities with complex structural and functional relationships (Quraishi et al., 2011).

Advantages of genetic analysis using natural populations and NILs
Dissecting the genetics underlying complex traits in crops not only provides insights into genetic pathways, but also provides targets for MAS in breeding (Clark, 2010). Recently, candidate gene-based association analysis has been used to trace the origin of agronomically important alleles and to explore the process of domestication of cultivated rice. This approach takes advantage of historical and evolutionary recombination events in natural populations to resolve complex trait variations in individual nucleotides (Lu et al., 2012). Combining linkage analysis and candidate gene association mapping, Zhang et al. (2014) identified a soybean gene related to P efficiency. Moreover, for pleiotropic genes, association mapping also dissects trait correlations at the gene level because polymorphic sites can be independently associated with different traits (Chen and Lubberstedt, 2010). For example, the maize pleiotropic gene Dwarf8, which affects both flowering time and plant height, was shown to contain two SNPs that were independently associated with each trait (Thornsberry et al., 2001). The MCC contains 1% of the basic wheat germplasm collection with an estimated 70% representation of the genetic variation in that collection, making it a suitable population for detection of major QTLs controlling yield traits Zhang et al., 2012). In this study, candidate gene association mapping showed that TaTEF-7A was significantly associated with yield traits. The location of this gene was consistent with the location of a previously identified QTL associated with yield and also likely to possess pleiotropic influences on grain yield traits. NILs are a well known useful and reliable tool to evaluate trait-related allelic effects, but there are few reports of yield traits being validated by NILs in wheat (Lei et al., 2012). In this study, a NIL derived from a backcross of Lumai 14 (recurrent parent, genetic background reconstitution, 93.8%) and Jinmai 47 (donor parent) was used for functional analysis of TaTEF-7A by comparing haplotype Hap-7A-2 with Hap-7A-3. Haplotype Hap-7A-3 had a higher GN and longer penultimate internode (Fig. 8). Hence, this result with NILs was consistent with the association analysis using natural populations and led to the conclusion that TaTEF-7A was the major genetic determinant controlling yield-related traits on chromosome 7A, especially grain number, which was verified by overexpression of the gene in Arabidopsis.

TEF1 affects plant vegetative and reproductive development
Over the past few years, it has become apparent that TEFs regulate various aspects of plant development expressed as Hap-7A-2 Hap-7A-3 GN Fig. 11. TaTEF-7A haplotype frequencies and GN/spike changes over decades in Chinese modern wheat cultivars released during the 1940s to 1990s. changes in flowering, branching, leaf venation patterns, root growth, and seed development (Guo et al., 2002;Grasser et al., 2009;Lolas et al. 2010). In rice, Paul et al. (2010) reported a T-DNA insertion mutant of OsTEF1, an orthologue of TaTEF-7A. Functional loss of this gene led to 60-80% reduced tillering, retarded growth of seminal roots, and sensitivity to salt stress compared with wild-type Basmati 370. Detailed transcriptional profiling of OsTEF1 revealed that mutation in the transcribed elongation factor influenced >100 genes with known function. It finely regulated tillering in rice by induction of expression of MAX1, a member of the cytochrome P450 family. P450s regulates various aspects of biological development, such as cell proliferation (CYP2D6) (Sandee and Miller, 2011), endosperm growth (CYP78A) (Schyman et al., 2010), and grain yield-related parameters (CYP714D1) (Yang et al., 2013). A pathway was proposed for TEF regulation traits through interaction with the Cdc73 component of the Paf1 complex in plants (Kubinski et al., 2006). By its methyltransferase action on histone, the Paf1 complex silences the expression of many genes, such as MAX1 members (Paul et al., 2010). MAX1 acts downstream of HTD1/MAX3 and D3/MAX4 to produce carotenoid-derived branch-inhibiting hormones, such as strigolactone and auxins (Grasser et al., 2009). In this study, overexpression of TaTEF-7A in A thaliana led to pleiotropic effects affecting vegetative development, number and lengths of siliques, and grain length compared with the wild type. Clearly, this gene needs further investigation. It was also shown that TaTEF-7A had a significant influence on wheat yield traits by association analysis; specific haplotypes of this gene were significantly associated with GN, an important component of grain yield. This effect was confirmed in a study of NILs, and the findings indicated that TaTEF-7A probably has pleiotropic effects on plant development, especially on young developing spikes and seed.

Potential application of a functional marker for
TaTEF-7A in global wheat breeding for high GN GN is an important yield trait that continues to attract the attention of wheat breeders. Genes contributing to high GN should be targets for selection in breeding. A significant difference in GN between Hap-7A-3 and other haplotypes indicated that it has been a target of global wheat breeding, especially in China (Figs 11,12). Since 1950 wheat varieties have changed 4-6 times in China, with ~10% yield increases in each cycle (Zhuang, 2003). Yield increases in China have largely depended on higher TKW and GN (Zhang et al., 2012). As shown in Fig. 11, further increases in the frequency of Hap-7A-3 should lead to higher overall grain yields. Plant breeding through phenotypic selection has made considerable progress in this regard, but it is a time-consuming and relatively inefficient process (Gedye et al., 2012). Recently, MAS has provided a strategy for accelerating progress. Functional markers have been developed for plant height (Wu et al., 2011), vernalization response (Yan et al., 2004), photoperiod response (Beales et al., 2007), TKW (Jiang et al., 2011;Su et al., 2011), disease resistance (Gennaro et al., 2009), and grain quality . In this study, after functional verification in transgenic Arabidopsis of the yield-related gene TaTEF-7A, molecular markers based on variation in the promoter region of TaTEF-7A were developed to identify haplotype Hap-7A-3. Geographic distributions of Hap-7A-3 indicated that such a marker for increasing GN should have worldwide application.

Supplementary data
Supplementary data are available at JXB online. Figure S1. Sequence characterization of TEF genes in wheat and other plant species. Figure S2. Gene sequence of TaTEF. Figure S3. cDNA sequences of TaTEF homoeologues. Figure S4. PCR amplification of cDNA in nullisomic-tetrasomic lines of homoeologous group 7. Figure S5. Linkage disequilibrium matrix among pairwise polymorphisms in the promoter region of TaTEF-7A .  Table S1. The 48 cultivars used for identification of haplotypes.