TaBT1, affecting starch synthesis and thousand kernel weight, underwent strong selection during wheat improvement

Knockdown of TaBT1 caused decline of wheat grain starch content, and natural variations in its promoter region led to significant differences in thousand kernel weight.


Introduction
Common wheat (Triticum aestivum L.) is one of the most important staple crops worldwide, providing ~50% of human calorie intake. It has been estimated that wheat production needs to increase by 70% from the present level to meet the demand of an increased world population by 2050 (International Wheat Genome Sequencing Consortium, 2014). With reduction of arable land area, increasing yield is the foremost objective for world wheat breeding programs (Khush, 2013). Wheat production improvement in China mainly depends on the increase of yield per unit area, while thousand kernel weight (TKW) has contributed the most over the past six decades (Zheng et al., 2011;Wang et al., 2012;Zhang et al., 2012). As grain filling directly determines grain weight and starchy endosperm comprises >80% of the cereals grain, increasing starch synthesis could effectively improve TKW and yield (Barron et al., 2007;Zuo and Li, 2013).
Starch synthesis in wheat grain involves the participation of multiple enzymes and transporters (International Wheat Genome Sequencing Consortium, 2014;Kumar et al., 2018). Sucrose, the primary substrate of starch, is first transported into the grain endosperm cells by the sucrose transporter (SUT) (Deol et al., 2013). Then, under successive catalysis of sucrose synthase (SUS) and UDP-glucose pyrophosphorylase (UDPase), sucrose is converted into glucose-1-phosphate (G1P) (Neuhaus and Emes, 2000). After the catalysis of ADP-glucose pyrophosphorylase (AGPase), G1P is converted into ADP-glucose (ADP-Glc), the precursor of starch. Then ADP-Glc is transported into amyloplast for starch synthesis by the BT1 protein located on its membrane (Beckles et al., 2001). As 65-95% of the total AGPase activity in wheat endosperm is cytoplasmic (Tetlow et al., 2003), this implies that most of the ADP-Glc for starch synthesis is synthesized in the cytoplasm. Therefore, BT1 protein, responsible for transmembrane transportation of ADP-Glc, plays a vital role in starch synthesis (Denyer et al., 1996;Thorbjørnsen et al., 1996;Burton et al., 2002). BT1 belongs to the mitochondrial carrier family (MCF), which is responsible for the translocation of small molecules between the mitochondria and cytoplasm (Haferkamp, 2007;Palmieri and Monné, 2016). Generally, MCF proteins are presumed to be located in the inner membrane of mitochondria (Millar and Heazlewood, 2003); some are believed to be involved in the transport of multiple solutes across organelle membranes (Leroch et al., 2005;Palmieri et al., 2006;Kirchberger et al., 2008). The bt1 mutant was first discovered in maize, which shows reduced rates of ADP-Glc uptake into amyloplasts and accumulates substantial ADP-Glc in the cytoplasm, leading to a collapsed angular appearance of amyloplasts at maturity (Mangelsdorf, 1926;Liu et al., 1992;Tobias et al., 1992;Shannon et al., 1996Shannon et al., , 1998. Rice OsBT1 could also affect starch synthesis and starch granule formation, and its mutants result in smaller grain size and starch granules (Cakir et al., 2016;Li et al., 2017). In barley, maize, and wheat, BT1 maintains a conserved function in importing ADP-Glc by counter-exchange with ADP in vitro (Bowsher et al., 2007;Kirchberger et al., 2007;Soliman et al., 2014).
To date, studies on the physiological role of wheat BT1 protein have seldom been reported. The aim of the current study was to examine the phenotypic effect of wheat BT1, identify its favorable alleles, and investigate the breeding selection to TaBT1. In order to address these questions, we first discuss the alteration of starch granule morphology, starch content, and yield-related components in RNAi transgenic wheat. Secondly, we conducted association analysis between TaBT1 haplotypes and yield-related phenotypes. Thirdly, we evaluated the frequency of TaBT1 haplotypes in wheat modern cultivars and its nucleotide diversity in wheat wild relatives. Our results demonstrate the essential role of BT1 in wheat starch synthesis and TKW, and its potential for wheat molecular breeding.

Plant materials and field trials
Two populations, Chinese 157 landraces from the mini-core collection (MCC) and 348 Chinese modern cultivars (MC) released since the 1940s (see Supplementary Table S1 at JXB online), were used for association analysis between genotypes and yield-related phenotypes . Two populations were planted at Luoyang, Henan province (China) during 2002(2002LY) and 2005(2005 cropping seasons, and at Shunyi, Beijing (China) in the 2010-2011 (2010 SY) cropping season. Field trials were conducted in randomized complete blocks with three replicates at all locations. Each plot consisted of three 2 m rows spaced 20 cm apart. Collection of phenotypic traits was followed as described by Zheng et al. (2014). Ten plants from the middle of each plot were randomly sampled. Phenotypic traits include effective tiller number (ETN), spikelet number per spike (SN), grain number per spike (GN), TKW, kernel length (KL), kernel width (KW), and kernel thickness (KT).
A wheat introgression line derived from Fumai/Handan 6172 (BC 3 F 7 ) (Supplementary Table S2) was used to verify the effect of TaBT1-6B haplotypes on TKW. Fifteen common wheat cultivars (Supplementary  Table S3) were used for expression difference analysis among TaBT1-6B haplotypes. A total of 67 wheat ancestor collections (Supplementary  Table S4) were used in an evolutionary study of BT1 homoeologous genes. These collections comprised 17 diploid collections with the AA genome, 13 with the SS genome, 19 with the DD genome, and 18 tetraploid wheat collections with the AABB genome. In addition, Chinese 348 MC and 957 MC (Supplementary Table S5) from the other five major wheat production regions worldwide were used to investigate the global geographic distribution of TaBT1-6B haplotypes. These latter cultivars comprise 398 from North America, 53 from CIMMYT, 375 from Europe, 71 from the former USSR, and 60 from Australia.

Cloning and sequencing of TaBT1 homoeologous genes
All primers used in this study (Supplementary Table S6) were designed by the Primer Premier 5.0 software (http://www.premierbiosoft.com/; last accessed Jan 2019) and synthesized by the BGI Tech (Shenzhen, China). PCR was performed in a total volume of 15 μl, containing 7.5 μl of GC buffer I, 50 ng of DNA, 1 μl of forward and reverse primers (10 mM), 0.24 μl of dNTPs (25 mM), and 0.15 μl of LA Taq polymerase (Takara, Dalian). PCR was carried out in a Veriti 96-Well Thermal Cycler (Applied Biosystems, USA) with the following procedure: denaturing at 94 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, annealing at 55-65 °C for 45 s, and 72 °C for extension (1 kb min -1 ), and finally extension at 72 °C for 10 min. PCR products were purified by an AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Hangzhou), cloned into the pEASY-T1 Cloning Vector, and then transformed into Trans1-T1 competent cells by the heat shock method (TransGen Biotech, Beijing). Conjugative plasmids were extracted with an AxyPrep Plasmid Miniprep Kit (Axygen Biosciences) and sequenced by an ABI 3730XI DNA Analyser (Applied Biosystems).

Gene expression analysis
For assay of the spatial and temporal expression pattern of TaBT1, various tissue samples were collected at different developmental stages at 06.00 h from T. aestivum cv. Chinese Spring grown in the field at the Chinese Academy of Agricultural Sciences (CAAS) experimental station in Beijing during the natural cropping season in 2014-2015. For assay of the rhythmic expression pattern of TaBT1, flag leaves and developing grains at 10 days post-anthesis (DPA) were sampled from Chinese Spring in a growth chamber simulating the natural light intensity in the field with light-dark cycle conditions of 12 h light/12 h dark at 20 °C. The samples were collected in two cycles from 06.00 h and subsequently every 4 h.
Total RNA was extracted with a RNAprep Pure Plant Kit Plus Reagent (TIANGEN Biotech, Beijing), and cDNA was synthesized with the FastQuant RT Kit (TIANGEN Biotech). Gene expression was measured by quantitative real-time PCR (qRT-PCR) in a total volume of 20 μl, containing 2 μl of cDNA, 0.4 μl of each primer (2 μM), 0.4 μl of ROX Reference Dye (50×), and 10 μl of 2× SYBR ® Premix Ex Taq (Takara, Dalian). TaActin was used as the endogenous control to normalize expression levels of different samples. All assays were performed and analyzed from two independent experiments, each consisting of three technical repetitions.

Subcellular localization
To investigate the subcellular localization of TaBT1, the cDNA sequence of TaBT1-6D without the termination codon was isolated from Chinese Spring with primer set Sub-F containing a HindIII site and Sub-R with a BamHI site (Supplementary Table S6). After sequencing verification, the fragment was digested with HindIII and BamHI, and fused to the green fluorescent protein (GFP) gene in the pJIT163-GFP vector to generate TaBT1-GFP. The fusion construct and GFP control were each introduced into wheat protoplasts from seeding leaves following Mao et al. (2011). After an incubation at 22 °C for 18 h in the dark, the fluorescence signal of GFP was observed under a confocal laser scanning microscope (LSM880, Carl Zeiss, Germany). At least 10 independent protoplasts were imaged per experiment.

Wheat transformation and Southern blotting
A trigger fragment of 355 bp derived from the coding sequence of TaBT1-6D was used to generate the TaBT1-RNAi construct. The sense fragment was amplified with primer pair BT1-147F containing a BamHI site and BT1-501R with a KpnI site. The antisense fragment was amplified with primer pair BT1-A147F containing a SacI site and BT1-A501R with a SpeI site. After the two fragments were successively cloned to pWMB006 vector and ascertained as the correct sequence, the resulting construct was digested by HindIII and EcoRI, and cloned into pCAMBIA3301 vector. The TaBT1-RNAi construct was first mobilized into Agrobacterium tumefaciens strain EHA105, and then transformed into T. aestivum cv. Fielder by the A. tumefaciens-mediated transformation method . DNA of the transformed plants was extracted by the cetyltrimethylammonium bromide (CTAB) method (Doyle, 1987). The positive lines were verified by PCR amplification with primer pair BT1-A501R/NosR2 (Supplementary Table S6).
Southern blotting was conducted as described by Wang et al. (2017). In brief, a large amount of total genomic DNA was first extracted with a standard CTAB method (Sambrook et al., 1989); then 10 μg of DNA from each sample was digested with HindШ. The digested DNA were then fractionated on a 0.8% agarose gel and transferred onto a nylon Hybond-N + membrane. PCR products of the bar gene (429 bp), amplified by bar-F/R, were labeled with digoxigenin and used as probes to hybridize with the digested DNA on the membrane. The hybridization and detection steps were conducted with the DIG High Prime DNA Labeling and Detection Starter Kit II (Roche, Germany).

Phenotypic identification of TaBT1-RNAi transgenic wheat
Both the wild-type control and transgenic plants were grown in the field at the CAAS experimental station in Beijing (40.18°N, 116.58°E) during the natural growing season. Positive transgenic lines were self-pollinated and harvested, and transgenic T 2 plants were used for further analysis. Developmental grains at 10 DPA was sampled for TaBT1 expression assay, and TaActin was used as the internal control. For each line, three biological replicates, each with three technical replicates, were performed and the data were expressed as means ±SE. At maturity, SN and GN were evaluated from main stem spikes. After grains from single plants were harvested and dried, KL, KW, and KT of each plant were measured following the method described by Qin et al. (2014), while TKWs were converted from grain weight of an individual plant. Phenotypic data of each line came from the average value of 15 individual plants.
For the observation of starch granules, mature grains of wild-type and transgenic wheat were crushed with a mortar, and the surface of ruptured transverse sections was coated with gold powder to prepare samples for observations. All samples were then observed by field emission SEM (SU8010, Hitachi, Japan). At least two mature grains from three independent plants of each line were imaged. Micrographs of each sample were taken with a magnification of ×600. Based on this, the starch granule number of each line was counted from five micrographs of the same size. Grain total starch content was quantified according to the national standard (NY/T11-1985) for cereals with the optical rotation method. Data were collected from at least five plants with three replicates per line.

SNP detection and molecular marker development
Thirty-six common wheat cultivars with high diversity (Supplementary  Table S7) were used to detect the diversity of TaBT1 homoeologous genes. After sequencing, the sequences were aligned by DNAMAN (http:// www.lynnon.com/; last accessed Jan 2019), and SNPs (single nucleotide polymorphisms) were identified by DNASTAR (http://www.dnastar. com/; last accessed Jan 2019). Two molecular markers were developed to discriminate the three haplotypes at TaBT1-6B. Compared with Hap2 and Hap3, Hap1 has a deletion of 4 bp at base pair position -2029. Based on this, the InDel-2029 marker was developed by two-step PCR. In the first round, the genome-specific primer set B-2F/4R was used to amplify fragments in all cultivars. Then the PCR product was diluted 30 times, and 1 μl was taken as template for the second round PCR with primers B-357F/569R. After the two-step PCR, the fragment length of Hap1 was 192 bp, whereas those of Hap2 and Hap3 were 196 bp. The CAPS-1664 marker was developed based on the SNP (T/C) at base pair position -1664. Only the amplified fragments of Hap3 could be cleaved by the restriction endonuclease AclI. Thus, the PCR product of Hap1 and Hap2 amplified by the genome-specific primer pair B-CAPS-F/R was 1099 bp, whereas those of Hap3 were 778 bp and 321 bp after digestion.

Rice transformation and β-glucuronidase (GUS) activity assay
The promoter region (~2.5 kb) of three haplotypes at TaBT1-6B were isolated with primers Pro-Fe and Pro-Re, introducing DNA restriction enzyme sites HindIII and EcoRI. The three fragments were then inserted into vector pCAMBIA1391Z containing the GUS reporter gene; pCAMBIA1391Z with its native promoter was used as the vector control. Constructs were first mobilized into A. tumefaciens strain EHA105, and then transformed into Oryza sativa cv. Kitaake by the A. tumefaciens-mediated transformation method (Hiei et al., 1994). Positive transgenic rice lines of the T 1 generation were grown at Langfang (39.03°N, 116.04°E), Hebei Province (China), and grains at 21 DPA were sampled for subsequent assays.
GUS staining was performed as described by Kosugi et al. (1990). Images of stained grains were captured by using a Discovery V20 stereomicroscope (Carl Zeiss, Germany). Fluorometric GUS activity assays were based on a previous method with minor modifications (Hänsch et al., 1995). For each sample, 200 mg of fresh grains were ground into powder after freezing in liquid nitrogen, and then homogenized in GUS extraction buffer (0.02885 M Na 2 HPO 4 , 0.02115 M NaH 2 PO 4 , 0.01 M EDTA, 0.1% SDS, 0.1% β-mercaptoethanol, and Triton X-100). The homogenate was then centrifuged at 12 000 rpm for 10 min at 4 °C. Protein concentration in the supernatant was measured according to the method described by Bradford (1976). The supernatant was assayed for GUS activity with 4-methylumbellifery-β-d-glucuronic acid (4-MUG) as the substrate. Fluorescence values were measured on a TriStar LB941 fluorescence spectrophotometer (Berthold, Germany) with 4-methylumbelliferon (4-MU) as the calibration control.

Isolation and sequence analysis of TaBT1 homoeologous genes
Primers TaBT1-5F/1536R based on the mRNA sequence of TaBT1 (GenBank accession no. BT008958.1) were designed to amplify the genomic DNA and cDNA from Chinese Spring. All three homoeologous genes consist of three exons and two introns, and the genomic sequence lengths of TaBT1-6A, -6B, and -6D are 1479, 1473, and 1470 bp, respectively. The coding sequences of both TaBT1-6B and -6D are 1290 bp, encoding 430 amino acid polypeptides, whereas that of TaBT1-6A is 1302 bp, encoding a 434 amino acid polypeptide. Moreover, the proteins encoded by the three homoeologous genes show a high identity of >98% (Fig. 1A). In addition, the promoter regions of TaBT1-6A/6B/6D (~2.5 kb) were obtained from wheat 3BSEQ (http://wheat-urgi.versailles.inra.fr/Projects/3BSeq; last accessed Jan 2019), and a Skn-1 motif in the promoters of all the three genomes was predicted to be relevant to endosperm expression (Fig. 1B) TaBT1 is mainly expressed in developing grains and is localized to the plastid qRT-PCR was employed to analyze the expression pattern of TaBT1 in different tissues with the universal primer set BT1-RT-635F/881R. In the current study, TaBT1 was mainly expressed in developing grains, and the transcription levels were extremely high from 10 to 15 DPA ( Fig. 2A). As all tissues samples were collected at 06.00 h, to eliminate the effect of sampling time, we used flag leaves and developing grains of Chinese Spring at 10 DPA to study the circadian expression pattern of TaBT1. The results showed that TaBT1 was highly expressed in developing grains after being exposed to darkness for 12 h, but no expression was detected in flag leaves throughout the cycle ( Supplementary Fig. S1). This is a further confirmation that TaBT1 is mainly expressed in developing grains. Then we analyzed the expression differences among TaBT1-6A, -6B, and -6D in developing grains with genome-specific qRT-PCR primers. As expected, the three homoeologous genes displayed similar expression patterns, but the expression levels differed: TaBT1-6A exhibited the highest level, -6D was slightly lower, and -6B was substantially lower at 10 and 15 DPA (Fig. 2B).
To investigate the subcellular localization of TaBT1, a TaBT1-GFP fusion was constructed and transiently expressed in wheat protoplasts. As shown in Fig. 2C, the free GFP fluorescence was observed in the entire cell, whereas TaBT1-GFP fusion protein fluorescence was targeted to the chloroplast, indicating that TaBT1 is a plastid-localized protein.

Knockdown of TaBT1 induces alteration of grain weight and starch content
To investigate the potential function of TaBT1, we constructed TaBT1 transgenic RNAi lines in T. aestivum cv. Fielder. The  355 bp trigger fragment from TaBT1-6D for the RNAi vector construction shared 97.7% and 98.6% sequence similarity with TaBT1-6A and TaBT1-6B, respectively. A total of 24 individuals (T 0 ) were obtained, of which 17 were transgenic positive lines, while the other 7 were negative. Compared with the wild type, the positive transgenic lines did not show any obvious alteration in ETN, SN, and GN ( Supplementary  Fig. S2), while a decrease in grain size was observed (Fig. 3A). Three representative field-grown positive lines with a double copy ( Supplementary Fig. S3) had reductions by 0.7, 4.5, and 5.6% for KL (Fig. 3B), 8.1, 13.2, and 23.3% for KW (Fig. 3C), and 10.1, 15.4, and 17.1% for KT (Fig. 3D). Moreover, TKW in the three lines was significantly reduced by 15.7, 31.1, and 47.3%, respectively (Fig. 3E).
Expression analysis showed that compared with the wild type, transcript levels of TaBT1 in the above three lines were reduced by 20.8, 50.3, and 69.5%, respectively (Fig. 4A). Further, all three homoeologous genes were simultaneously silenced in the transgenic lines ( Supplementary Fig. S4). Also, the arrangement of starch granules became loose ( Fig.  4B-F) and the proportion of A-and B-type starch granules was clearly changed (Fig. 5A). Compared with the wild type, the number of A-type starch granules in the three positive lines decreased by 27.3, 36.4, and 45.5%, respectively. In contrast, the number of B-type granules increased by 0.7, 0.9, and 1.1 times. Moreover, total starch contents in the three lines were reduced by 3.4, 6.8, and 10.6%, respectively (Fig. 5B). The above results reflect that TaBT1 affects starch synthesis and TKW in wheat.

Diversity is detected only at the TaBT1-6B locus in common wheat
Considering the obvious effects of TaBT1 in wheat, we then detected its nucleotide polymorphism with 36 common wheat cultivars with high diversity (Supplementary Table S7) to find preferred alleles that can be used in wheat production. No allelic variation was found at either TaBT1-6A or TaBT1-6D loci, whereas 24 polymorphic sites were detected at TaBT1-6B, with two SNPs (base pair 378 A/G; base pair 1332 C/T) falling within the coding region and the rest in the promoter region. These variations formed three haplotypes, Hap1, Hap2, and Hap3. Moreover, haplotype-specific cis-elements at the mutation sites in the promoter region were predicted (Fig. 6A). Two molecular markers, named InDel-2029 and CAPS-1664, were developed to discriminate the three haplotypes (Fig. 6B, C). Based on a deletion of 4 bp at base pair position -2029 (Fig. 6A), the InDel-2029 marker was developed to discriminate Hap1 from Hap2 and Hap3. After two-step amplification, the PCR products could be distinguished on 6% denaturing polyacrylamide gels (Fig. 6B). The CAPS-1664 marker was developed based on the SNP (T/C) at base pair -1664 (Fig.  6A). After being cleaved by the restriction endonuclease AclI, the amplified fragments of Hap3 could be discriminated from Hap1 and Hap2 (Fig. 6C).

Strong haplotype effect on TKW is detected at TaBT1-6B
The InDel-2029 and CAPS-1664 markers were used to genotype 157 landraces from the MCC and 348 cultivars from the MC. Differences in ETN, SN, GN, KL, KW, and KT were not notable among TaBT1-6B haplotypes, while the differences in TKW were significant (Table 1). In landraces, the mean TKW of Hap1 was significantly higher than that of Hap3 by 22.8% in 2005 (P<0.01). In modern cultivars, the mean TKW of Hap1 was significantly higher than that of Hap3 in all three environments (P<0.01) by 19.2% in 2002 LY, 20.9% in 2005 LY, and 12.3% in 2010 SY, respectively. In addition, the mean TKW of Hap2 was markedly higher than that of Hap3 by 13.4% in 2002 LY and 16.0% in 2005 LY (P<0.01). However, no significant difference was found between Hap1 and Hap2 in landraces or modern cultivars. These findings show that Hap1 and Hap2 are preferred haplotypes correlated with higher TKW. Additionally, the haplotype effects on TKW between Hap1 and Hap3 were proved in a wheat introgression line derived from Fumai/Handan 6172; the mean TKW of Hap1 (50.6 g) was significantly higher than that of Hap3 (45.3 g) at P=0.05 (Supplementary Table S2). These results indicate that TaBT1-6B-Hap1 and -Hap2 positively regulate TKW.

Expression differences among TaBT1-6B haplotypes are mainly attributed to variations in the promoter region
To explore the possible causes of haplotype effects, we further investigated the sequence differences among TaBT1-6B haplotypes. Since variations in the coding region did not lead to changes in amino acids, we then focused on polymorphisms in the promoter region. Thus, the promoters of Hap1, Hap2, and Hap3 were each integrated into the vector pCAM-BIA1391Z to drive GUS gene expression in transgenic rice (Fig. 7). Histochemical staining showed that developing grains with exogenous promoters of Hap1 and Hap2 were stained distinctly darker than those of Hap3, while grains carrying the vector control were not stained (Fig. 7A). In addition, the GUS  .62 a 35.17±1.19 a 32.64±0.73 a 35.72±2.28 a 31.63±0.88 a b 29.09±0.61 b 33.12±1.17 a 33 activities in grains of Hap1 and Hap2 were 3.1 and 2.0 times significantly higher than that in Hap3, respectively ( Fig. 7B; Supplementary Table S8), indicating that promoters of Hap1 and Hap2 possessed higher driving activity. Sequence prediction revealed that some cis-elements were located at the haplotype-specific polymorphic sites (Fig. 6A), but the exact site responsible requires further confirmation. Considering that polymorphisms in the promoter region could influence gene transcription (Zuo and Li, 2014), we then selected five common wheat accessions of each haplotype to quantify the expression difference among TaBT1 haplotypes. As expected, the mean expression level of Hap1 and Hap2 in grains at 10 DPA was 4.3 and 2.7 times higher than that of Hap3 (Fig. 8), generally in agreement with the GUS activities of different haplotypes. To sum up, the results from sequencing, rice transformation studies, and qRT-PCR analysis suggest that the effect on TKW is due to polymorphisms in the promoter region of TaBT1.

TaBT1 underwent selection during wheat improvement and polyploidization
Several yield-related genes have been reported to have undergone breeding selection by our group (Su et al., 2011;Ma et al., 2016;Hou et al., 2017). To investigate whether the TaBT1-6B haplotypes were selected during the course of Chinese wheat breeding, 157 landraces from the MCC and 348 cultivars from the MC were genotyped by the two markers mentioned above. From landraces to modern cultivars, the ratio of the preferred haplotypes, Hap1 and Hap2, was increased from 49.7% to 89.7% (Supplementary Table  S1), and the increase was obvious in all 10 ecological wheat zones, especially in the major production zones I, II, and III ( Fig. 9A-B). Further evidence showing that Hap1 and Hap2 underwent positive selection, while Hap3 underwent negative selection, during Chinese wheat breeding process is provided in Fig. 9C. Their ratios in cultivars released in the other five major wheat production continents around  Table S5). The extremely high ratios of Hap1 and Hap2 in American, CIMMYT, European, and Australian wheat cultivars might be due to their longer breeding histories. The results demonstrate that TaBT1-6B-Hap1 and -Hap2 are dominant in global wheat cultivars and that selection pressure differs in different production regions around the world.
Common wheat originated from two-step polyploidization events (Kihara and Lilienfeld, 1949). To examine the evolution of BT1 during the wheat polyploidization process, we analyzed the nucleotide diversity (π) of BT1-6A, -6B, and -6D in wheat progenitor collections. The result showed that the π value of all the three genomes declined to a different extent during polyploidization events (Fig.  10). In terms of BT1-6A and -6D, the decline did not reach significant levels (P>0.05), because they had extremely low diversity in the diploids and tetraploids (Fig. 10A-B; Table  2). However, the diversity decline at BT1-6B was distinct (P<0.05) for both the π and F ST values between species of different ploidy (Table 2; Fig. 10D). This indicates that BT1-6B has undergone selection during both tetraploidization and hexaploidization.

Discussion
TaBT1 plays a vital role in starch synthesis and is significantly correlated with TKW in common wheat BT1, which is responsible for the transmembrane transportation of ADP-Glc in endosperm starch synthesis, universally exists in cereals (Denyer et al., 1996;Sikka et al., 2001;Tetlow et al., 2003). It affects TKW in maize and rice by regulating starch synthesis, and the mutants exhibit abnormal phenotypes (Sullivan et al., 1991;Cakir et al., 2016;Li et al., 2017). Although BT1 has been isolated and characterized in a variety of crops, it has not been identified in wheat and its effect on wheat yield has not been reported yet. In this study, we cloned and characterized TaBT1 in common wheat. Expression analysis as well as the Skn-1 motif in the promoter region indicate that TaBT1 has a grain-specific expression pattern. TaBT1 homoeologous genes were highly expressed in developing grains at 10 and 15 DPA, the period during which starch is being synthesized in the endosperm (Fig. 2), similarly to that of rice OsBT1-1 (Toyota et al., 2006). We may also infer that amyloplast formation has almost been completed by 15-20 DPA (Fig.  2B), but starch synthesis can be detected even at 35-40 DPA in wheat endosperm in major Chinese production regions.
Previous studies indicate that the formation of A-type starch granules mainly initiates at 4-15 DPA, while that of B-type granules generally starts at 15 DPA and lasts until the grain matures (Darlington et al., 2000;Peng et al., 2000). In this study, the expression of TaBT1 in transgenic RNAi lines was knocked down, which may subsequently affect the formation of starch granules and total starch content (Figs 4,5). In addition, the transgenic wheat exhibited a significant decrease in TKW, compared with the wild type (Fig. 3E). All of the above results indicate that TaBT1 plays a vital role in starch synthesis and is significantly correlated with TKW in common wheat.

Natural variation in the promoter region of TaBT1-6B influences its activity
Since TaBT1 has a significant effect on TKW, we then investigated its diversity in common wheat in an attempt to find elite allelic variations. Three haplotypes at TaBT1-6B were formed by 24 polymorphic sites, and most variations were located in the promoter region. Previous studies have shown that polymorphisms in gene promoter regions could give rise to variation in yield-related traits by affecting binding elements and subsequent expression Qin et al., 2014;Zuo and Li, 2014). For example, variation in the promoter of OsGS5 leads to expression level changes and grain width variations . In this study, some cis-elements were previously found to be haplotype specific (Fig.  6A). For instance, an element named PRECONSCRHSP70A, involved in the expression of HSP70A under the induction of Mg-protoporphyrin IX and light (von Gromoff et al., 2006), was found at base pair -1271 of Hap1. Moreover, an ASF1 motif involved in transcriptional activation of several genes is present at base pair -1813 of Hap3 (Després et al., 2003). The influence of these cis-elements on the expression of TaBT1 needs to be further investigated. Both GUS activities in transgenic rice grains with exogenous promoters and expression of TaBT1-6B in varieties of the three haplotypes indicate that Hap1 and Hap2 possess higher promoter activity than Hap3 (Figs 7, 8; Supplementary Table S8). Moreover, association analysis between haplotypes and yieldrelated phenotypes in the MCC and MC, combined with TKW differences between Hap1 and Hap3 in introgression lines (Table  1; Supplementary Table S2), further indicate that Hap1 and Hap2 were favorable haplotypes with a higher TKW. Thus, polymorphisms in the TaBT1-6B promoter region influence its driving activity and further affect TKW in wheat.

Potential application of molecular markers in wheat breeding
Genetic diversity analysis of functional genes is important for understanding the genetic background of phenotypic variation (Xu et al., 2014). Abundant genetic variation in wheat enables breeders to create better gene combinations and select superior varieties to meet local breeding objectives. Marker-assisted selection (MAS) provides an efficient method for genetic improvement of crops. Many functional markers associated with important agronomic traits, such as grain weight (Ma et al., 2016;Hou et al., 2017), plant height (Li et al., 2012), photoperiod response (Beales et al., 2007), disease resistance , and end-use quality (Sun et al., 2005;He et al., 2007), can be widely used in MAS (William et al., 2007;Liu et al., 2012). Based on variations in the promoter region of TaBT1-6B, the functional markers CAPS-1664 and InDel-2029 were developed to distinguish the three haplotypes in germplasm from China and five other global major wheat production regions (Fig. 9). These markers are co-dominant and can be easily implemented in the laboratory. Their potential value for selection of TKW was validated by association analysis; TaBT1-6B-Hap1 and -Hap2 identified with the two markers were correlated with higher TKW. Although the frequency of Hap1 and Hap2 was high enough among MC globally, further increases are still feasible in regions where Hap1 occurs at relatively lower frequencies. In addition, functional markers related to higher TKW so far reported are available for TaSus2-2A-HapA,  TaGS5-3A-T, and TaAGP-S1-7A-HapI Ma et al., 2016); pyramiding preferred alleles of these genes with MAS will be instrumental in wheat breeding by enhancing additive genetic variation.

TaBT1 has undergone selection during wheat polyploidization and improvement
Although domestication and modern breeding contribute to the development of different ecotypes and cultivars, genetic diversity of genes controlling agronomic traits has also been narrowed down (Haudry et al., 2007). Several yield-related genes in wheat have undergone genetic diversity reduction during polyploidization, such as the sucrose synthase gene (TaSUS1), the grain size gene (TaGS5), and the ubiquitin E3 ligase gene (TaGW2) (Su et al., 2011;Hou et al., 2014;Ma et al., 2016;Qin et al., 2017). In this study, the genetic diversity of BT1 differs among the three wheat genomes, with the highest identified in the B genome, followed by the D and A genomes; this is in general accord with previous studies (Gao et al., 2017;Hao et al., 2017). The higher diversity in the B genome may be due to the cross-pollination characteristics of Aegilops speltoides, the most likely donor of the B genome (Zohary and Imber, 1963;Shewry, 2009). However, the low diversity in the D genome was due to the bottleneck when common wheat arose from the extremely few hybridizations of tetraploid wheat with Aegilops tauschii ~10 000 years ago (Lupton, 1987;Petersen et al., 2006). Additionally, the A genome underwent higher selection pressure due to it carrying more agronomic trait-related genes (Peng et al., 2003;Marcussen et al., 2014;Hao et al., 2017). The extreme low diversity of BT1-6A and BT1-6D indirectly reflects the crucial role of this gene for propagation of these species ( Fig. 10; Table 2). TKW has been continually improved during evolution from diploid to tetraploid and to the widely cultivated hexaploid common wheat. This may be due to the accumulation of preferred haplotypes associated with high TKW. For example, the frequencies of TaBT1-6B-Hap1 and -Hap2 increased from Chinese landraces to MCs, indicating that the two haplotypes were positively selected during modern wheat breeding. Moreover, Hap1 and Hap2 were present at very high frequencies in varieties from five other global major wheat production regions. This suggests that selection for preferred haplotypes with high TKW was global, while frequencies of Hap1 could still be beneficially elevated.

Supplementary data
Supplementary data are available at JXB online. Table S1. Accessions of hexaploid wheat in the mini-core collection (MCC) and modern cultivars (MCs) in China. Table S2. Thousand kernel weight data for introgression lines derived from Fumai/Handan 6172 (BC 3 F 7 ). Table S3. Fifteen common wheat cultivars used for expression analysis among TaBT1-6B haplotypes. Table S4. Progenitor accessions used in this study. Table S5. Accessions of common wheat from five non-Chinese major wheat production regions. Table S6. Primers used in this study. Table S7. Thirty-six common wheat cultivars used for haplotype analysis of TaBT1. Table S8. GUS activities in transgenic rice grains. Fig. S1. Circadian expression of TaBT1-6A/6B/6D. Fig. S2. Effective tiller number (ETN), spike number (SN), and grain number (GN) in TaBT1 transgenic RNAi lines. Fig. S3. Southern blot detection of bar genes for the identification of transgenic wheat plants in the T 2 generation. Fig. S4. Relative expression of three homoeologous genes in TaBT1-RNAi transgenic wheat.