A proteomic analysis of grain yield-related traits in wheat

Abstract Grain yield, which is mainly contributed by tillering capacity as well as kernel number and weight, is the most important trait to plant breeders and agronomists. Label-free quantitative proteomics was used to analyse yield-contributing organs in wheat. These were leaf sample, tiller initiation, spike initiation, ovary and three successive kernel development stages at 5, 10 and 15 days after anthesis (DAA). We identified 3182 proteins across all samples. The largest number was obtained for spike initiation (1673), while the smallest was kernel sample at 15 DAA (709). Of the 3182 proteins, 296 of them were common to all seven organs. Organ-specific proteins ranged from 148 in ovary to 561 in spike initiation. When relative protein abundances were compared to that of leaf sample, 347 and 519 proteins were identified as differentially abundant in tiller initiation and spike initiation, respectively. When compared with ovary, 81, 35 and 96 proteins were identified as differentially abundant in kernels sampled at 5, 10 and 15 DAA, respectively. Our study indicated that two Argonaute proteins were solely expressed in spike initiation. Of the four expansin proteins detected, three of them were mainly expressed during the first 10 days of kernel development after anthesis. We also detected cell wall invertases and sucrose and starch synthases mainly during the kernel development period. The manipulation of these proteins could lead to increases in tillers, kernels per spike or final grain weight, and is worth exploring in future studies.


Introduction
Wheat contributes to nearly one-fifth of the total dietary calories and protein worldwide (Shiferaw et al. 2010;Reynolds et al. 2012). European Union, China, India and the USA are responsible for 60 % of the global production (Hawkesford et al. 2013). Genetic gains, defined as yield increases per unit time, have recently slowed down to <1 % per annum (Reynolds et al. 2012), while the consumption of cereals has increased worldwide (Shiferaw et al. 2010). For example, in China, wheat consumption increased from 19 million tons in 1962 to 123 million tons in 2012-a 6-fold increase in half a century (Curtis and Halford 2014). Therefore, it is important to identify the barriers and accelerate the genetic gains in wheat and other cereals.
Tiller number is a major agronomic trait in cereal crops affecting plant architecture and grain yield (Xu et al. 2017), which influences grain number (GN) per unit area (Sreenivasulu and Schnurbusch 2012). Final grain yield is the product of GN per unit area and kernel weight (KW) (Sreenivasulu and Schnurbusch 2012). The GN, in combination with the size of grains makes the total sink size (or sink strength). The limitation in sink strength is already a well-known bottleneck in modern wheat production (Calderini et al. 2006;Reynolds et al. 2012;Serrago et al. 2013). Yield improvements in the past have generally resulted from increases in GN rather than increases in KW. Understanding the physiology and genetics of sink strength is essential to design high-yielding crosses that could successfully enhance GN, mainly by increasing the number of grains per spike dry matter (Ferrante et al. 2012), while maintaining, if not increasing, the contribution of KW in improving yield (Sadras and Lawson 2011;Pask and Reynolds 2013;Aisawi et al. 2015).
Evidences provided by Miralles and Slafer (1995) and Acreche and Slafer (2006) suggest that the undesirable correlation between major yield components may not be due to a competition between growing grains. Experimental data showed simultaneous increases of GN and KW in the progeny of a cross between two CIMMYT lines with exceptional yield potential (Bustos-Korts et al. 2013;García et al. 2013). This cross produced grain yield of up to 16 t ha −1 in 3-4 % of the breeding progeny in exceptionally favourable environments (Bustos-Korts et al. 2013). This means that the undesirable correlation between the two major yield components may not represent any competition between growing grains after anthesis (Miralles and Slafer 1995;Acreche and Slafer 2006). Therefore, identification of genes that control GN and KW may provide opportunities to improve grain yield.
The development of reproductive growth involves spikelet formation and spike elongation. Identification of genes that derive inflorescence architecture can help improve our understanding and potentially help identify natural variants and their uses in wheat breeding. For example, Schilling et al. (2018) extensively discussed the importance of MADS-box genes for reproductive development. In general, the MADS-box genes are key regulators of flowering time, inflorescence architecture, floral organ identity and seed development.
Omics methods are valuable tools to identify genes and proteins that are differentially expressed/abundant in different organs, during different growth stages, and under different environmental conditions. In wheat, proteomics approaches are used in a variety of research such as grain development processes, spatiotemporal expression mapping and disease responses (Zhang et al. 2015;Yang et al. 2016aYang et al. , b, 2017Duncan et al. 2017). A recent large-scale proteome map of wheat, reporting on 24 organs and developmental samples was published by Duncan et al. (2017), representing a wide range of spatial and temporal organs, without regard to source and sink specificity. Their work aimed at dissecting total proteome and correlation of transcript and protein data for different wheat tissues. Given the large and hexaploid nature of the wheat genome (Appels et al. 2018), the task of unbiased and accurate identification of proteins based on masses is challenging. Majority of earlier wheat proteomics studies, Duncan et al. (2017) for example, did not use the recent reference genome assembly IWGSC v1.0 (Appels et al. 2018). In addition to the IWGSC v1.0 assembly, advances in mass spectrometry (MS)-based proteomics technique now provide an unprecedented opportunity for accurate identification and quantitation of proteins and assigning them to homologous groups in the wheat genome. We sought to identify proteins that are differentially abundant in different organs that contribute to the development of organs that produce tillers, spikes and kernels. These traits enhance sink strength and are physiologically relevant to the determination of grain yield. We conducted this study by employing liquid chromatographytandem mass spectrometry (LC-MS/MS) analysis.

Plant growth condition
We obtained seed for accession 'PI 495816 Cranbrook' from the National Small Grain Collection, Aberdeen, ID, USA. One hundred seeds of variety 'Cranbrook' were germinated on a tray filled with Metro-mix soil (Sun Gro Horticulture Canada Ltd) in the greenhouse condition, with 25 °C day/22 °C night ± 2 °C temperature and 16 h/8 h day/night photoperiod. Seven days after sowing, each seedling was transplanted into a square pot (10 cm × 10 cm × 15 cm). For nutrients, we used a generalpurpose greenhouse 24-8-16 N-P-K Miracle-Gro (ScottsMiracle-Gro), by dissolving 5.3 g of Miracle-Gro per litre of water. The nutrient solution was applied to the plants, starting from threeleaf stage, at a rate of 100 cm 3 per pot, twice a week.

Experimental design and sample collection
This study was designed to identify proteins associated with the following traits: the total number of grains by means of tillering and grains per spike, and the growth of the developing kernels. Therefore, the experimental design consisted of collecting three biological replicates of organs that are involved in these traits. Each biological replicate consisted of organs collected from three independent plants. We collected seven tissue samples. Leaf sample (LS) was taken at the seedling stage, when the first leaf was fully expanded and the second leaf just emerging (Fig. 1A). For LS, we cut a 3-cm segment from the fully developed leaf. Tiller initiation (TI) was taken at tillering stage (Fig. 1B). Tiller initiation for Cranbrook occurred in our greenhouse when plants were in the three-leaf stage. At this stage, we removed the plant gently from the soil and sampled the budding tiller by excising it from the main stem using a sharp cutter (Fig. 1C). The third sample, spike initiation (SI), was taken during the formation of terminal spikelet after a double-ridge stage in the inflorescence structure. Approximately 4-5 weeks after emergence, when plants were at tillering and before stem elongation, the leaves were gently removed from the plant and the very tip of the terminal meristem was excised by using a laboratory blade cutter under binocular. Ovary (OV) samples were taken before anthesis. The kernel development samples were taken successively at 5, 10 and 15 days after anthesis (DAA), named 5, 10 and 15 DAA. The kernel samples at each time point were first collected in polystyrene weighing dish floated in liquid nitrogen. After enough kernels were collected, the samples were wrapped in aluminium foil, and then flash-frozen in liquid nitrogen. For all the other samples, we wrapped the samples immediately in aluminium foil and flash-frozen in liquid nitrogen. All the sampled tissues were kept in deep freezer until processed in the proteomics facility. No other standards or internal references were used in this study.
In this study, we also monitored accumulation of dry matter in the developing kernels. Unlike the proteomics experiment where three time points were studied, we sampled developing kernels at five developmental stages 5, 10, 15, 20 and 25 DAA. At each time point, three spikes were sampled, kernels from each spike were harvested separately, counted and weighed to obtain fresh weights (W f ). Then, we oven-dried the developing kernels for 24 h at 60 °C and measured the dry weight (W d ). The averages of W d and W f per single kernel were expressed in milligram (mg). The difference between the W f and W d , expressed as percentage, was used as a measure of moisture content.

Protein extraction and LC-MS sample preparation
Biological samples were homogenized in 400 µL of 20 mM Tris-HCl, pH 7.5, 100 mM NaCl, 1 mM EDTA (ethylene diamine tetraacetic acid), 5 % glycerol and 0.5 mM DTT (dithiothreitol) using Precellys® 24 Bead Mill Homogenizer (Berti) at 4487 g for 30-s cycles. The tissue lysates were centrifuged at 20 817 g for 15 min at 4 °C, and the supernatant was transferred to new tubes. The insoluble pellet fractions were solubilized in 400 µL of 8 M urea and were incubated at room temperature for 1 h with continuous mixing. Then, the solution was centrifuged at 20 817 g for 15 min at 4 °C to remove any undissolved pellets and cell debris. Proteins both in soluble and insoluble fractions were precipitated using five volumes (v/v) of cold (−20 °C) acetone and incubated overnight at −20 °C. Precipitated proteins were pelleted by centrifugation at 20 817 g for 15 min at 4 °C. Protein pellets were washed once with 80 % cold (−20 °C) acetone and redissolved in 8 M urea. The bicinchoninic acid (BCA) assay was used to determine protein concentration in each sample with bovine serum albumin (BSA) as a standard (Kapoor et al. 2009).
About 50 µg protein from each sample was reduced by incubating with 10 mM DTT at 37 °C for 45 min, and cysteine alkylated with 20 mM iodoacetamide (IAA) in the dark for 45 min at room temperature. Then, the solution was incubated in 5 mM DTT for 20 min at 37 °C to scavenge residual IAA. Proteins were digested using sequencing-grade trypsin and Lys-C mix from Promega at a 1:25 (w/w) enzyme-to-protein ratio and 37 °C temperature overnight. The digested peptides were cleaned using C18 silica micro-spin columns from The Nest Group Inc. using the manufacturer's protocol. Peptides were eluted using 80 % acetonitrile containing 0.1 % formic acid (FA). The samples were vacuum-dried and resuspended in 3 % acetonitrile and 0.1 % FA. Peptide concentration was determined by BCA assay using BSA as standard. The concentration of peptides was adjusted to 0.2 µg µL −1 , soluble and insoluble samples were mixed and 5 µL (1 µg total peptide) was used for LC-MS/MS analysis as described below.

LC-MS/MS data acquisition
Samples were analysed by reverse-phase LC-ESI-MS/MS system using the Dionex UltiMate 3000 RSLC nano System coupled with the Q Exactive High Field (HF) Hybrid Quadrupole Orbitrap MS (Thermo Fisher Scientific). Peptides were loaded onto a trap column (300 µm internal diameter (ID) × 5 mm) packed with 5 µm 100 Å PepMap C18 medium, and then separated on a reverse-phase column (15 cm long × 75 µm ID) packed with 3 µm 100 Å PepMap C18 silica (Thermo Fisher Scientific). All the MS measurements were performed in the positive ion mode and using 120-min LC gradient. Mobile phase solvent A was 0.1 % FA in water and solvent B was 0.1 % FA in 80 % acetonitrile. Peptides were loaded to the trap column in 100 % buffer A for 5 min at 5 µL min −1 flow rate, and eluted with a linear 80-min gradient of 5-30 % of buffer B, then changing to 45 % of B at 91 min, 100 % of B at 93 min at which point the gradient was held for 7 min before reverting to 95 % of A at 100 min. Peptides were separated from the analytical column at a flow rate of 300 nL min −1 . The column temperature was maintained at 50 °C. The mass spectrometer was operated using standard data-dependent mode. Mass spectrometry data were acquired with a Top20 data-dependent MS/MS scan method. The full-scan MS spectra were collected in the 300-1650 m/z range with a maximum injection time of 30 ms, a resolution of 120 000 at m/z 200, spray voltage of 2 and AGC target of 1 × 10 6 . Three biological samples were analysed for each organ. Fragmentation of precursor ions was performed by high-energy C-trap dissociation (HCD) with the normalized collision energy of 27 eV. The MS/MS scans were acquired at a resolution of 15 000 at m/z 200 with an ion-target value of 1 × 10 5 and a maximum injection of 60 ms. The dynamic exclusion was set at 30 s to avoid repeated scanning of identical peptides. The instrument was calibrated at the start of each batch run and then in every 72 h using a calibration mix solution (Thermo Scientific). The performance of the instrument was also evaluated routinely using Escherichia coli digest.
The IWGSC v1.0 assembly contains 107 891 high-confidence genes. The database search was completed by considering the minimum amino acid length of six, a precursor mass tolerance of 10 ppm and a MS/MS fragment ion tolerance of 20 ppm. Database search was performed with enzyme specificity for trypsin and Lys-C, allowing up to two missed cleavages. Oxidation of methionine (M) was defined as a variable modification, and carbamidomethylation of cysteine (C) was defined as a fixed modification for database searches. The 'unique plus razor peptides' were used for peptide quantitation. The false discovery rate (FDR) of both peptide spectral match (PSM) and proteins identification was set at 0.01. Proteins labelled either as contaminants or reverse hits were removed from the analysis. Similarly, proteins identified without any quantifiable peak (0 intensity) and those identified by a single MS/MS count were also removed from downstream analyses. We only kept proteins that were detected in at least two biological replicates. We also consider proteins with MS/MS counts > 4 for further downstream analyses.
For downstream analysis, we used the InfernoRDN (formerly known as DanTE) bioinformatics software (Polpitiya et al. 2008) to produce a correlation heat map and 3D plot of the first three principal components (PCs). To highlight the number of unique or general proteins across different samples, we drew Venn diagrams using the InteractiVenn (Heberle et al. 2015) online tool. For quantitative abundance analysis, we used two-layer criteria. First, we compared mean abundance values, in log2 scale, by using t-test and screened proteins with adjusted P-value < 0.01. Then, further screened proteins that differed in abundance level by log2 fold change > 3. These criteria allowed identifying differentially abundant proteins (DAPs). To functionally group proteins, we used the Cluster of Orthologous Groups of protein database (COGs, http://www.ncbi.nlm.nih.gov/COG). We also used AgriGO (bioinfo.cau.edu.cn/agriGo/) and wheat Ensembl BLAST (plants.ensembl.org/Triticum_aestivum/Info/index) to make further functional characterization.

Overview of the experiment
Within the framework of yield formation (Fig. 2), it is useful to uncover sets of genes and proteins that are functional for the development of these organs. For example, it is important to know which genes and proteins are involved in the initiation and development of tillers in wheat seedlings. To identify these genes, we analysed wheat organs by using a quantitative proteomics approach. Seven plant organs were sampled ( Fig. 1) including LS, TI, SI, OV, 5 DAA, 10 DAA and 15 DAA. We hypothesized that TI represents enrichment for proteins that are abundant (and therefore the closest approximation of) tiller formation. The SI sample provides enrichment for proteins involved in SI and determination of the number of kernels per spike. The OV and the subsequent three kernel development samples (5, 10 and 15 DAA) provide enrichment of proteins that are abundant during kernel formation and progression towards developmental stages, and, therefore, critical for determination of kernel size and weight. Based on this rationale, we studied these yield forming organs via label-free LC-MS/MS-based proteome quantification according to schematic proteomic workflow presented in Fig. 3 to identify and quantify the intensity of proteins for each sample.

Protein identification across different organs
A total of 15 244 peptides were identified across all samples by LC-MS/MS analysis [see Supporting Information- Table  S1]. After filtering, 3182 proteins were retained and used for downstream analyses [see Supporting Information- Table S2]. The raw MS/MS data were uploaded to MASSIVE Data Repository (https://massive.ucsd.edu/), and are accessible under accession MSV000083833. The largest number of expressed proteins were encoded by genes located on subgenome D (n = 1550), as per IWGSC v1.0, followed by n = 793 proteins were encoded by genes located on subgenome B, and, finally, n = 776 proteins by the genes located on subgenome A. For example, 272 of the proteins were encoded by genes located on chromosome 2D while 95 proteins were encoded by genes located on chromosome 6A. We also identified 63 proteins that matched to the genes on IWGSC v1.0 but not yet associated to any of the chromosomes, and so, are unmapped.
Principal component analysis (PCA) was performed to summarize label-free quantitation (LFQ) measurements into a 3D space (Fig. 4). We applied PCA to all proteins that were quantified in each organ sample. The first PC, which accounted for 44 % of variation, created three groups. The first group, on the far left, had only one organ and which was LS. The second group on the far right included OV, SI and TI. The middle group included the three stages of kernel development, i.e. 5, 10 and 15 DAA. The second PC, accounting for 18.7 % of variation, grouped the organs almost in the same patterns. However, the third PC, which accounted for 11.5 % of variation, mainly separated the SI (on the bottom across the PC3 axis, Fig. 4) from OV and TI (on the top, across the PC3 axis). Altogether, all three components accounted for 74.3 % of the total variation. In addition, the close clustering of the three biological replicates of each organ together, revealed by PC analysis (Fig. 4), indicated the LC-MS/MS analysis and peptide quantification were largely reproducible, which is a prerequisite for accurate label-free protein quantitation.

Unique and shared proteins across different organs
Two Venn diagrams ( Fig. 5A and B) represent the shared proteins in all organs and the unique proteins identified in each organ. The list of genes and their corresponding proteins in each of the sampled tissues were given in Supporting Information- Table S3). The lowest number of proteins was identified in 15 DAA sample (n = 709) and the highest number of proteins was identified in SI sample (n = 1673). For simplicity, data for all three stages of kernel development (5, 10 and 15 DAA) were pooled together and reflected in the first Venn diagram, which represents five groups (Fig. 5A). In total, 296 shared proteins were identified across all these samples. The sample with the greatest number of unique proteins was SI (n = 561). In contrast, OV showed the lowest number (n = 148) of unique proteins. A total of 529 proteins were shared by 5, 10 and 15 DAA (Fig. 5B). The 5 DAA sample had the greatest number (n = 530) of unique proteins and the 10 DAA had the lowest number (n = 50) of unique proteins.
In TI, a total of 231 proteins were uniquely identified, which were enriched for glutathione-S-transferase, lipid transfer proteins, heat shock proteins, defensins, ferredoxin, actin-related proteins, peroxidases, histones, proteasomes and ribosomal proteins. The unique proteins of the SI sample (n = 561) were enriched for RNA processing, ubiquitin-like proteins, heat shock proteins and other transcription factors, as well as several cell division-associated proteins such as mitotic checkpoint protein, mitotic spindle-organizing protein and sister chromatid cohesion protein. Similarly, unique proteins in 5 DAA were enriched for carbohydrate-degrading enzymes and cell division-related proteins such as starch synthase and branching, expansins, defensins, protease inhibitors, subtilisinlike proteins, peroxidases, gibberellin-regulated protein, actinlike proteins and DELLA protein. The 10 DAA unique proteins included acid phosphatase, invertase, storage proteins and a TERMINAL FLOWER 1-like protein. The 15 DAA unique proteins were considerably enriched with storage proteins and, to a lesser extent, with xylanase inhibitors, lipid transfer proteins, thaumatin-like proteins, vicilin-like proteins and defensins. Interestingly, one sucrose synthase protein, grain softness proteins, dimeric α-amylase inhibitors and desiccation-related proteins were also exclusively expressed in kernels sampled at 15 DAA.

Quantitative proteome expression and organ comparisons
Expression profiles in TI and SI (as compared with LS). We compared TI with LS by using a t-test to identify DAPs. There were 200 DAPs with increased abundance and 147 DAPs with decreased abundance in the TI compared to the LS (Table 1; see Supporting  Information-Table S4). Proteins related to translation, ribosomal structure and biogenesis; amino acid transport and metabolism; post-translational modification; and chromatin structure and dynamics were higher and those related to chaperone and heat shock proteins, electron transport, lipid biosynthesis, energy production and conversion; and carbohydrate transport and metabolism were relatively lower in TI compared to LS.
Similarly, we compared SI with LS by using t-test to identify DAPs, where we identified 313 DAPs with enhanced abundance and 206 DAPs with decreased abundance in the SI compared to the LS (Table 1; see Supporting Information- Table S4). The majority of the proteins with enhanced abundance in SI were grouped as translation, post-translational modification, amino acid transport and metabolism, chromatin structure and dynamics, carbohydrate transport and metabolism, and energy production and conversion.

Considerable grain size increase occurred until 10 DAA.
To understand the pattern of kernel development, we measured fresh and dry weights of the grains taken at five developmental stages (5, 10, 15, 20 and 25 DAA). Both fresh and dry weights increased consistently (Fig. 6). The dry weight increased by 3.5-fold between 5 and 25 DAA. The moisture content dropped sharply, particularly beyond 10 DAA. Interestingly, the size of the grain also remained constant beyond 10 DAA. These results could imply that, in the variety tested (Cranbrook), the first 10 DAA was the period of a significant increase in grain size.

Expression profiles in 5, 10 and 15 DAA (as compared with OV).
For the comparison of protein expression levels during kernel development stages, we compared each stage with OV sample, with the same two-step criteria for identifying DAPs. At 5 DAA, a total of 81 DAPs were identified [see Supporting Information- Table S4], of which 57 DAPs showed enhanced abundance levels and 24 DAPs showed decreased abundance levels than the OV (Table 1). About 53 % (30 proteins) of DAPs with enhanced abundance in 5 DAA were grouped as amino acid transport and metabolism, carbohydrate transport and metabolism, and energy production and conversion. Approximately 41 % of the DAPs with decreased abundance in 5 DAA were involved in translation and transcription. At 10 DAA, a total of 35 DAPs were identified (Table 1; see Supporting Information-Table S4), of which 17 DAPs showed enhanced abundance than OV while 18 DAPs showed decreased abundance than OV. About 67 % of DAPs with enhanced abundance in 10 DAA were involved in carbohydrate transport and metabolism, post-translational modification, protein turnover, chaperones, and amino acid transport and metabolism. Approximately 60 % of the DAPs with decreased abundance in 10 DAA were involved in transcription, carbohydrate transport and metabolism, translation, inorganic ion transport and metabolism, and defence mechanisms. In total, 96 DAPs were identified in kernels sampled at 15 DAA compared to OV (Table 1; see Supporting Information-Table S4), of which 67 DAPs showed enhanced abundance levels while 29 DAPs exhibited decreased abundance levels. Approximately 76 % of DAPs with enhanced abundance than OV in 15 DAA were functionally grouped as storage proteins such as glutenin, globulins and gliadins; and carbohydrate transport and metabolism such as β-amylase and α-amylase inhibitors. The DAPs with decreased abundance in 15 DAA compared to OV were involved in translation, ribosomal structure and biogenesis; energy production and conversion; chromatin structure and dynamics; and transcription. Proteins unique to 5, 10 and 15 DAA samples were involved in carbohydrate metabolic process, nutrient reservoir activity, cellulose activity, cell wall organization and fatty acid biosynthesis.

Discussion
Grain yield depends in part on proteins that function in organ biogenesis, development of tillers, spikes and kernels, carbohydrate metabolism and nutrient storage. Expression of genes can be evaluated via transcriptome studies. Proteome analysis can be studied via 2D gel electrophoresis as well as shotgun proteomics approaches. Previous proteomic studies were carried out before the publication of wheat genome assembly (Jiao et al. 2018), and mostly relied on published genomes of closely related species. Therefore, none those studies could be matched with the exact homoelogous genes among the three wheat subgenomes.   (2017) presented the most comprehensive wheat proteomics study to this date. They completed a proteomics search on 24 organs. However, their effort was without regard to any specific phenotypic goal. For example, they produced a large specio-developmental proteomic map, which showed what proteins were expressed in which organ. In contrast, our study targeted organs that are specifically related to GN and KW. Our organ sampling strategy is different from theirs because of inclusion of tillering site and inflorescence initiation site in our study and a continuum spectrum of kernel development from ovary to later stages of kernel development (e.g. 15 DAA). Secondly, for MS/MS search, Duncan et al. (2017) used a protein database containing the protein sequences from the IWGSC1.0 + popseq PGSB/MIPS (Chapman et al. 2015) version 2.2 annotation. We used the IWGSC v1.0 wheat genome assembly published in 2018 (Appels et al. 2018), which is the complete assembly. Lastly, Duncan et al. (2017) used labelfree spectral counts for protein quantification but spectral count-based proteomics is semi-quantitative. It has long been established that protein quantification using MS1 intensity greatly enhances quantitative accuracy and dynamic range compared to spectral counts (Bondarenko et al. 2002). For this reason, in this study, we used label-free MS1 intensity-based quantitative proteomics. We discuss these identified proteins under three categories of organ biogenesis, carbohydrate metabolism and storage proteins. Similar to other studies that are conducted in controlled environment, this data may not be predictive of proteomic responses under natural field conditions. The non-homogenous nature of field, and interaction of the microclimate by unsynchronized growth stages of plants can all be potential cause for the lack of transferability results to field conditions. However, our study provided insight into specific proteins and their corresponding encoding genes (Table 2). These genes, after a thorough study, could be incorporated in elite breeding lines that will be tested for their effect in the actual field condition.

Organ biogenesis
A proliferating cell nuclear antigen (PCNA), TraesCS2D01G361800, on chromosome 2D, was identified with nearly 11-fold and 19-fold higher abundance in TI and SI tissues than in LS, respectively. Proliferating cell nuclear antigen and TEOSINTE BRANCHED1 (TB1) are members of TCP transcription factor family (Li 2015). While increased expression of TB1 restricts the outgrowth of axillary buds into lateral branches in maize and wheat (Doebley et al. 1997;Lewis et al. 2008), members of PCNA were shown to promote cell proliferation and organ growth in rice (Oryza sativa) (Li 2015). Unlike TB1, PCNA acts as a positive regulator of TI. Therefore, we predict that overexpression of this gene could result in increases in final tiller number in wheat. We identified two Argonaute proteins, TraesCS1D01G453600 and TraesCSU01G065300, that were solely expressed in SI sample, and not detected in other samples. Argonaute proteins were shown to play a variety of roles in plant development. As loss of function of the B subgenome ARGONAUTE1d homoeologous was shown to produce shorter spikes and fewer grains per spike than wild-type controls (Feng et al. 2017). Other roles suggested for Argonaute proteins are seed development and grain size and width. For example, AGO802 was shown (Singh et al. 2013) to play an important role in seed development, and preferentially expressed in wheat embryos. The OsAGO17 was shown to positively regulate grain size and weight in rice (Jun et al. 2019).
We also identified four expansin-related genes, three of which, i.e. TraesCS4D01G323800, TraesCS1D01G215400 and TraesCS4B01G3271000, were mainly expressed during kernel development, particularly in the first 10 DAA. Expansins are important proteins for cell wall loosening and cell enlargement (Cosgrove 2000;Lin et al. 2005;Marowa et al. 2016). Lizana et al. (2010) observed that TaExpA6 accumulates in the early grain development period, implying their role in determination of grain size in wheat. Castillo et al. (2018) have shown that expansins are involved in the extension of grain tissue, and their expression is associated with the grain size of sunflower.
Our study identified several proteins present during kernel development and ovary. For example, a cell wall invertase encoded by TraesCS2A01G295400 was found to be highly expressed at early kernel development stage. Invertases, that cleave sucrose into two monosaccharides, mainly appear in two isoforms of vacuolar (acid invertases) and extracellular (cell wall invertases) in plants. Invertases were shown to have a role in the control of cell differentiation and plant development (Silva and Ricardo 1992;Sturm 1999). For example, in a study on water-soluble carbohydrate and remobilization to the grains, significant difference of acid invertase activity was reported between wild and cultivated wheats (Yadhu et al. 2015). Functional analyses have shown that the rice GRAIN INCOMPLETE FILLING 1 gene encodes a cell wall invertase which has a role in carbon partitioning during early grain filling . Wang and Ruan (2012) also proposed the involvement of cell wall invertase in early grain-filling period in cotton. Several other studies have reported the importance of cell wall invertase in kernel development in different cereal crops such as maize (Chourey et al. 2006), wheat (Ma et al. 2012), barley (Weschke et al. 2003) and rice (Hirose et al. 2002;Wang et al. 2008). While our finding is not a new discovery, the exact identification of the subgenome-specific copy of the invertases identified in this study, along with the availability of complete genomic sequences, will allow future functional analyses in wheat.

Carbohydrate metabolism
The last step of kernel growth, after cell division and extension, is starch deposition, and the amount of starch synthesis and deposition is critical to kernel development and size (Edurne et al. 2003). Sucrose is the main transportable carbohydrate into non-photosynthetic organs. The products of sucrose breakdown are crucial for starch biosynthesis in the amyloplast of the sink organ (Edurne et al. 2003). Sucrose that is destined for sink organs  Table 2. Gene ID, annotation, number of peptides per protein, protein score, and Label Free Quantitation (LFQ) averages across the three biological replicates of each tissue type. Tissues are leaf blade (LB), tiller initiation (TI), spike initiation (SI), ovary (OV) before anthesis, and three progressive kernel development stages of five, ten, and fifteen days after anthesis, named as 5DAA, 10DAA, and 15DAA. FDR for protein identification was set to 1%. is either cleaved by sucrose synthase to yield UDP-glucose and fructose, or by invertase to yield glucose and fructose (Spielbauer et al. 2006). Sucrose synthase is a key enzyme in plant sucrose catabolism that enables sucrose mobilization into multiple pathways involved in metabolic, structural and storage functions (Subbaiah et al. 2007). Previously, Subbaiah et al. (2007) isolated three orthologues of Sucrose Synthase 2 (TaSus2) and mapped them to the wheat chromosomes 2A, 2B and 2D. We identified concerted co-expression of sucrose synthase and starch synthase during kernel development. The sucrose synthases encoded by two wheat genes on chromosomes 2A and 2B (Jiang et al. 2011) were at higher abundance during the three kernel development periods than during ovary development, with the highest abundance observed at 15 DAA. Grain endosperm is composed of starch that is synthesized and accumulated during grain-filling period (Dupont and Altenbach 2003;Ma et al. 2014;Huang et al. 2016). A strong relationship was observed between starch biosynthesis and grain yield in wheat and maize (Irshad et al. 2019). The activity of starch synthase was shown to be the highest at 10-20 DAA (Fahy et al. 2018), and that starch accumulation rates are correlated with the activities of key enzymes (Emes et al. 2003;Jiang et al. 2003). Our study identified the expression of several starch synthase enzymes during kernel development. The expression of sucrose and starch synthesis genes were studied in several other crops such as rice, maize and barley (Spielbauer et al. 2006;Hirose et al. 2008;Liu et al. 2008; Barrero-Sicilia et al. 2011). For future follow-up studies, we suggest collecting more biological tissues that allow analysis of starch and other soluble carbohydrates in order to correlate protein expression with starch accumulation.
The starch synthase enzymes synthesize starch that is needed for grain development and growth, and, finally, yield (Parry et al. 2011). While starch synthase synthesizes starch by mobilizing products of sucrose metabolism to sink organs, amylases reverse these reactions to degrade starch and provide energy to growing embryos during and after germination. Interestingly, concurrent with expression of starch synthases, we identified several members of the α-amylase and β-amylase protein families.
There are three possible explanations for this co-expression of starch synthases and amylases. The first possible explanation is that the activity of amylases may require post-translational modification such as phosphorylation and thus protein abundance alone is not sufficient for degradation activity. Perhaps phosphorylation only takes place during germination and not during kernel development. One experimental evidence supporting this explanation is that, Dong et al. (2015), using 2D differential gel electrophoresis, found that β-amylase was phosphorylated in germinating wheat seeds. The second explanation is that starch synthases and amylases may be expressed in tissues that are spatially different and hence do not act antagonistically. Barrero et al. (2013) reported that α-amylase is synthesized by the scutellum and adjacent aleurone, while starch synthase is expressed in endosperm cells. The third explanation is that it has been shown that α-amylase activity increased again during germination to levels much higher than those detected during kernel development (Thévenot et al. 1992). Therefore, it is possible that starch degradation requires enzyme activities much higher than the levels observed during kernel development. Therefore, accumulation of amylases during kernel development, by itself, may not be sufficient to drive starch degradation.

Storage proteins
Mature wheat grains contain 8-20 % protein (Dupont and Altenbach 2003). One group of these proteins, i.e. puroindolines,  Table 2. Continued is responsible for kernel hardness Morris 1997, 1998). Hard grain phenotype results in courser flour with greater abundance of damaged starch granules and their interaction with lipid-binding proteins on the surface of starch granules (Glenn et al. 1991;Giroux and Morris 1997;Bhave and Morris 2008), resulting in absorption of more water compared to grains with soft endosperm (Matus-Cadiz et al. 2008). One aspect of kernel growth and development is the grain texture. For example, the degree of softness or hardness of the endosperm in wheat is controlled by puroindoline-a (Pina-D1), and puroindoline-b (Pinb-D1), which are both located at the Ha locus (Lillemo et al. 2006;Bhave and Morris 2008) on the distal end of the short arm of chromosome 5D. In addition to puroindoline genes, Grain Softness Protein (GSP) is located on Ha locus. However, to date, strong evidence is lacking for its contribution to kernel hardness phenotype. Our study revealed expression of puroindoline and grain softness proteins during kernel development.
In addition to texture-related proteins, albumins, globulins, gliadins and glutenins are major protein components of kernels (Dwivedi et al. 2013). Majority of proteins identified during 15 DAA are related to storage proteins such as glutenins, gliadins and globulins. For example, this study identified increasing trends of expression of 26 gliadins, 12 globulins, 8 avenins, and 12 highand low-molecular-weight glutenins towards later stages (10 and 15 DAA) of kernel development. Albumins and globulins of wheat endosperm represent ~25 % of total grain proteins (Merlino et al. 2009). The dough-making quality of wheat grains is associated with glutenin and gliadin composition and properties (Huebbner and Wall 1976;Bottomley et al. 1982). During dough development, glutenin and gliadin for gluten network, which determine the viscoelastic property of the dough (Sharma et al. 2020). Glutenin properties control dough resistance and extensibility (Eagles et al. 2006), and gluten elasticity and extensibility have significant influence on dough viscoelasticity (Belderok 2000) and breadmaking quality of wheat (Nieto-Talasriz et al. 1994). Gluten consists of high-molecular-weight glutenin subunits (HMW-GS) and lowmolecular-weight glutenin subunits (LMW-GS) (Lindsay et al. 2000). Prolamins are also divided into two classes of monomeric gliadins and polymeric glutenins including both LMW and HMW proteins. These gradually accumulating storage proteins provide necessary nutrients for seed germination and seedling growth, and are the main nutritional components of grains consumed by humans or other animals.

Conclusions
The current study discovered organ-specific expressions of many important wheat proteins that may contribute to tiller and spike formation as well as kernel development. The data resources of this study may be used to support future analysis of selected genes or proteins and determine how they are linked to grain yield. Future trials should follow up with validations on the putative candidate genes for their precise effect on plant development processes such as tillering, spike formation and kernel development that are important to grain yield. The data will be a valuable resource for future studies on wheat systems biology and integrated, and enhancing agronomically important traits in wheat breeding programmes.

Supporting Information
The following additional information is available in the online version of this article- Table S1. List of all peptides identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis with their mapped proteins (both protein group IDs and leading protein ID), molecular weights, sequence start and end position, posterior error probability (PEP), charge state and intensities across samples are provided. Protein group IDs are the identifier(s) of proteins in the group that are identified by both unique and shared peptides and are sorted by the number of identified peptides in a descending order. Leading protein IDs are the proteins that have at least half of the unique plus shared peptides mapped. Table S2. The intensity of all the proteins identified in the seven sampled tissues and associated encoding genes for the identified proteins. Table S3. The proteins along with the genes coding them which were recorded for each sampled tissue. This information corresponds with the Venn diagrams in Fig. 5. Table S4. Differentially abundant proteins (DAPs) for tiller sample and spike initiation site compared to leaf tissue as well as the three kernel samples (5, 10 and 15 days after anthesis (DAA)) compared to ovary tissue.

Data Availability
The raw MS/MS data were submitted to MASSIVE Data Repository (https://massive.ucsd.edu/), and can be accessible under accession number MSV000083833.

Sources of Funding
Financial support from USDA Hatch grant 1013073 provided via Purdue College of Agriculture.

Conflict of Interest
The authors have no conflict of interest.