Selection for seed size has uneven effects on specialized metabolite abundance in oat (Avena sativa L.)

Abstract Plant breeding strategies to optimize metabolite profiles are necessary to develop health-promoting food crops. In oats (Avena sativa L.), seed metabolites are of interest for their antioxidant properties, yet have not been a direct target of selection in breeding. In a diverse oat germplasm panel spanning a century of breeding, we investigated the degree of variation of these specialized metabolites and how it has been molded by selection for other traits, like yield components. We also ask if these patterns of variation persist in modern breeding pools. Integrating genomic, transcriptomic, metabolomic, and phenotypic analyses for three types of seed specialized metabolites—avenanthramides, avenacins, and avenacosides—we found reduced heritable genetic variation in modern germplasm compared with diverse germplasm, in part due to increased seed size associated with more intensive breeding. Specifically, we found that abundance of avenanthramides increases with seed size, but additional variation is attributable to expression of biosynthetic enzymes. In contrast, avenacoside abundance decreases with seed size and plant breeding intensity. In addition, these different specialized metabolites do not share large-effect loci. Overall, we show that increased seed size associated with intensive plant breeding has uneven effects on the oat seed metabolome, but variation also exists independently of seed size to use in plant breeding. This work broadly contributes to our understanding of how plant breeding has influenced plant traits and tradeoffs between traits (like growth and defense) and the genetic bases of these shifts.


Introduction
Plants produce diverse arrays of specialized metabolites, generating a classification of hundreds of thousands of metabolites (Sorokina et al. 2021), that are nonessential for plant survival and frequently only found in specific plant lineages (Mutwil 2020). Plant specialized metabolites are of interest for their role in biotic and abiotic stress tolerance as well as their implications for human health as nutraceutical compounds (Afendi et al. 2012;Jacobowitz and Weng 2020). Plant breeding efforts to enhance specialized metabolite abundance in crop plants, however, are constrained by resource-intensive metabolomic phenotyping, genotype by environment interactions, and limited understanding of the genetic drivers of phenotypic variation in cultivated germplasm (Soltis and Kliebenstein 2015). While advances in the study of model organisms like Arabidopsis have contributed to our understanding of specialized metabolism (Wager and Li 2018), large-scale studies on metabolomic diversity in cultivated germplasm-like glycoalkaloids in tomato (Solanum lycopersicum L.) (Zhu et al. 2018) and benzoxazinoids in maize (Zea mays L.) (Zhou et al. 2019)-provide information about specialized metabolism limited to specific lineages and in contexts more directly applicable for plant breeding programs. Overall, characterization of genomic variation and strategies to translate this information into widely applicable plant breeding strategies are critical steps to making specialized metabolite composition an accessible goal for plant breeding.
Studying specialized metabolites in cultivated plants in addition to wild progenitors or model organisms is important as specialized metabolite profiles may have also shifted in response to direct selection or indirect selection for other traits, or through genetic drift. While there is a longstanding prediction that cultivated plants would have reduced specialized metabolite concentration as compared with wild-plants (as cultivated plants are more susceptible to biotic stress), there is not a consistent relationship between cultivation status and specialized metabolites across multiple species (Whitehead et al. 2017). Instead, differences in specialized metabolite abundance are frequently observed in distinct breeding pools and pedigrees. For instance, divergence in volatiles has been noted in roots of maize (Rasmann et al. 2005), and leaves in cranberry (Rodriguez-Saona et al. 2011) and there is variation in leaf glucosinolates in cultivated Brassicas (Poelman et al. 2008). For plant breeders, insight into how selection processes affected specialized metabolites can provide a basis for ongoing work and germplasm selection for breeding efforts.
We explored existing variation of specialized metabolites in oats (Avena sativa L.) and how the metabolomic profile has been shaped by plant breeding. Oats were domesticated from weedy progenitors (Loskutov 2008) and, like other cereal crops, domesticated oats have increased seed size compared with wild species (Preece et al. 2017). Oats are used as livestock feed and have been an important part of human diet in some parts of Europe since before the Renaissance (Murphy and Hoffman 1992). The nutraceutical benefits of fiber, skin soothing and general health promotion of oats were also noted in the first century CE by Dioscorides (Murphy and Hoffman 1992). Today, oats are still known as a healthy whole grain (Singh et al. 2013;Stewart and McDougall 2014), with high concentrations of unsaturated fats (Carlson et al. 2019) and heart health-promoting b-glucans (Newell et al. 2012). Both have been the subject of plant breeding efforts, but yield and disease resistance are still predominant traits of interest for plant breeding (Haikka et al. 2020;Gonzá lez-Barrios et al. 2021). In addition to these health-promoting compounds, oat seeds contain multiple specialized metabolites (Sang and Chu 2017) but, to the best of our knowledge, these metabolites have not been a direct target of selection. With this history, we predict that oat specialized metabolites may have been subject to genetic drift or indirect selection processes (e.g., for seed traits or disease resistance) leading to changes in patterns of variation. Characterizing the genetic bases of variation will provide a starting point for plant breeding.
We focused on three types of specialized metabolites in oat seed: avenanthramides, and the saponins avenacins and avenacosides. Avenanthramides are in highest concentration in the outer layers of the seed, most notably the aleurone layer (Liu and Wise 2021), while the saponin avenacosides are concentrated in the endosperm (Ö nning et al. 1993). Avenanthramides have antioxidant properties (Meydani 2009;Sang and Chu 2017) that are retained through processing of oats into many consumer products (Pridal et al. 2018). The committed enzymes of avenanthramide biosynthesis have been characterized, and it is wellestablished that avenanthramides are the result of condensation between phenolic acids and anthranilic acid, products of different branches of aromatic amino acid biosynthesis (Collins 2011;Wise 2014;Li et al. 2019). Avenanthramides are associated with resistance to crown rust (pathogen Puccinia coronata f. sp. avenae) (Wise et al. 2008;Wise 2014), and demonstrate variation in response to the environment (Emmons and Peterson 2001;Peterson et al. 2005;Redaelli et al. 2016;Michels et al. 2020). The avenacins and avenacosides are both saponins that have been implicated in reducing plant fungal infections and in lowering cholesterol when consumed, but have received less attention for research and breeding (Sang and Chu 2017). Core biosynthetic genes for avenacin biosynthesis have been identified in roots of the noncultivated species, Avena strigosa (Kemen et al. 2014;Leveau et al. 2019), but whether variation in expression of these genes affects abundance in seed tissues of cultivated oat remains unknown.
Knowledge of biochemical pathways is a crucial foundation but, for plant breeding, it is important to further investigate whether variants that affect enzyme activity, or regulation, or pathway flux, or metabolite transport contribute to the observed phenotypic variation (Soltis and Kliebenstein 2015). While loss of function mutations in biosynthetic enzymes are observed and employed by breeders for specialized metabolites in some crops [e.g., Pun1 mutation prevents capsaicin production in pepper (Stewart et al. 2005)], mutations in regulatory elements are critical in others [e.g., transcription factor Bt mediates cucurbitacin accumulation in cucumber (Shang et al. 2014)]. For oats, there is experimental evidence that avenanthramides increase in response to activation of systemic acquired resistance (SAR) (salicylic acid mediated defense) (Wise 2011(Wise , 2017Wise et al. 2016), and degree of induction varies between oat genotypes (Wise et al. 2016), suggesting that regulatory variants could be an important target for selection. While expression of key biosynthetic enzymes has been profiled (Dimberg and Peterson 2009;Wise 2017), there has not been a genome-wide association study (GWAS) or transcriptome-wide association study (TWAS) to identify novel genes. We are not aware of comparable studies of saponins. In other crops, integrated genomic, transcriptomic, and metabolomic analyses have been critical in understanding metabolic profiles. For instance, concomitant changes in fruit metabolome and fruit size have been characterized in tomatoes (Zhu et al. 2018).
We sought to integrate oat seed metabolomic, transcriptomic and genomic data to characterize genetic variation contributing to specialized metabolite abundance in oat seed. We also measured oat seed size to evaluate if selection on that yield component has affected specialized metabolite profiles, as some dimensions of seed shape are negatively correlated with the healthful compound b-glucan, in oat (Zimmer et al. 2021). Using a diverse germplasm panel that includes oat varieties developed beginning in 1920 and an elite germplasm panel, we measured whole seed metabolome phenotypes and seed size and weight traits. In the diverse germplasm panel, we also conducted transcriptome sequencing of developing seed. We hypothesized that heritable genetic variation is greater in the diversity than the elite panel, and examined the relationship between seed traits and specialized metabolites in both of these panels. We also investigated the relative roles of variation in regulation and known biosynthetic enzyme pathway genes in mediating metabolite variance. To test these predictions, we conducted a GWAS and TWAS, respectively, and eQTL analysis for metabolites and seed traits. Overall, this work provides insight into breeding for oat specialized metabolites and more broadly adds to our foundation of how the relative contributions of genetic variation in regulation or direct biosynthesis shapes phenotypic variation of specialized metabolites in crop plants.

Oat germplasm
We used two germplasm panels of inbred lines, a diversity panel intended to capture genetic diversity in cultivated oats and an elite panel consisting of lines selected from the North American uniform oat performance nursery. These germplasm panels have been previously described in Campbell et al. (2021a) and Hu et al. (2021). In the diversity panel, there were 368 entry genotypes (inbred lines) and seven check genotypes planted in an augmented design in plots at Ithaca, New York, United States in 2018. Six genotypes that lacked both genotyping data and gene expression data were removed from our analysis. The elite panel consisted of inbred lines and was evaluated in three northern US environments (Minnesota, "MN"; South Dakota, "SD"; Wisconsin, "WI") in 2017 in plots in an augmented design with 232 entries and three checks. Crown rust was not detected in MN or SD and had low incidence in WI, and was not recorded for the diversity panel. Nineteen entries were included in both the diversity and elite panels and were removed from the elite panel analyses to compare independent sets of germplasm.

Oat seed secondary metabolite phenotypes
We profiled the seed metabolome in the oat diversity and elite panels. Details of extraction and processing of these samples has been previously described (Campbell et al. 2021b;Hu et al. 2021) and is provided here in Supplementary Method S1. The extractions and measurements were conducted at the Bioanalysis and Omics Center of the Analytical Resources Core ("ARC-BIO"), at Colorado State University (Fort Collins, CO, USA). Briefly, 50 seeds were dehulled, homogenized, and extracted using a biphasic extraction method to separate polar and nonpolar compounds. Chromatography analysis of the polar compounds (aqueous layer) was done using a Waters Acquity UPLC system with a Waters Acquity UPLC CSH Phenyl Hexyl column (1.7 lM, 1.0 Â 100 mm) and a Waters Xevo G2 TOF-MS with an electrospray source in positive mode. Mass features were annotated by first searching against an in-house spectra and retention time database using RAMSearch (Broeckling et al. 2016) and then by using MSFinder (Tsugawa et al. 2016). The final phenotype reported was the relative signal intensity (relative concentration) of each metabolite. Names and spectra of the specialized metabolites are given in Supplementary Table S1. The mass spectra of the specialized metabolites were positively annotated by these methods in the diversity panel, which was analyzed in 2018. Many of the specialized metabolites were also annotated in the elite panel (measured in 2017), and missing annotations were completed by comparing spectra to the diversity panel and published mass spectra for avenanthramides (de Bruijn et al. 2019), avenacins (Leveau et al. 2019), and avenacosides (Bahraminejad et al. 2008).
Best linear unbiased predictions (BLUPs) were calculated for each metabolite for the diversity panel, and separately for each environment of the elite panel. To account for skew, data were log2 transformed. Then, relative concentration of each metabolite was modeled with a linear mixed model in R (R Core Team 2016) with lme4 (Bates et al. 2015). For each metabolite, there were fixed effects of whether the genotype was a replicated check and days to heading ("DTH") as a numeric covariate, and random effects of experimental block, batch in which the sample was run on the LCMS, and genotype. Outliers were defined as having studentized residual >3 and were removed, and the model was recalculated. Effect significance of the DTH covariate is shown in (Supplementary Table S2). The BLUPs were then deregressed (Garrick et al. 2009). The deregressed BLUPs (drBLUPs) were used in all following analyses. Pearson's correlations were estimated between phenotypes using the "cor.test" function in R.

Oat seed size and mass phenotypes
After dehulling, 50 seeds were used for evaluating seed length, width, and height. The seeds were scanned with a 2D scanner, where seed length and width were extracted with the software WinSeedle (Regent Instrument Canada Inc., version 2017). Seed height was measured separately using an electronic caliper manually with accuracy of 0.01 mm. Seed length and width measurements are not available from the elite panel that was evaluated in South Dakota. Seed volume was estimated as an ellipsoid (Clohessy et al. 2018), and surface area of an ellipsoid was estimated by S % 4p *((lw) 1.6 þ(lh) 1.6 þ(wh) 1.6 ))/3) (1/1.6) . Separately, 100 hand dehulled seeds and their respective hulls were weighed (hundred kernel weight, "HKW" and hundred hull weight, "HHW," respectively) and the percent groat (kernel) was calculated as the percent of total (kernel plus hull) weight. A summary of the raw measured seed size and mass values (mean, standard error, and range) is presented in Supplementary Table S3. Deregressed BLUPs were then calculated from untransformed values in the same manner as the metabolites (above) for use in further analyses. The relationship between drBLUPs of seed traits and metabolites was modeled with a linear model and effect significance was tested by ANOVA.

Oat variety release year
We conducted an extensive literature search to determine the year of variety release for as many varieties in the diversity panel as possible. Most varieties were identified from information on USDA GRIN (https://npgsweb.ars-grin.gov), some in Triticeae Toolbox (https://triticeaetoolbox.org/POOL), others in the United States (https://apps.ams.usda.gov/), Canada (https://www.inspec tion.gc.ca/english/plaveg/pbrpov/cropreport/oat), or Europe (https://ec.europa.eu/food/plant/plant_propagation_material/plant_ variety_catalogues_databases/) plant registrations, and finally as published variety releases. In sum, we identified the year of variety release for 155 varieties (Supplementary Table S4).

Genotyping and GWAS
Genotyping-by-sequencing data was retrieved from T3/Oat (https://oat.triticeaetoolbox.org/), filtered to remove markers with more than 60% missingness and markers with a minor allele frequency of <0.02, and then imputed using the glmnet function (Friedman et al. 2010) in R. Overall, there were 73,527 markers, of which 54,284 could be anchored to the genome (PepsiCO OT3098v1; https://wheat.pw.usda.gov/GG3/graingenes_down loads/oat-ot3098-pepsico). All 54,284 SNPs were used for the diversity panel, and 54,219 SNPs were used for the elite panel after these imputed SNPs were again filtered by minor allele frequency. Kinship matrices were calculated for the diversity and elite panels with their SNPs using the "A.mat" function, and genomic heritability (de los Campos et al. 2015) was calculated from variance components extracted from the "kin.blup" function in rrBLUP (Endelman 2011). Genetic correlations were calculated in sommer using the "mmer" and "cov2cor "functions (Covarrubias-Pazaran 2016). Principal components to use as covariates to account for population structure were calculated using the "prcomp" function in R. The first 25 PCs were calculated, and the scree plot was visually examined to determine the number of PCs to use in future analyses (Supplementary Figure S1). Five PCs were chosen for the diversity panel and four PCs were chosen for the elite panel. A genome-wide association study (GWAS) was conducted for each phenotype (drBLUP) in statgenGWAS (Rossum and Kruijer 2020) using the PCs as covariates and the kinship matrix. For GWAS results, P-values were adjusted with a Bonferroni correction on a per-trait basis and SNPs with a P Bonf < 0.05 were considered significant. To determine if any results colocalized with known QTL for crown rust, crown rust QTL were recorded from recent publications and mapped to the latest genome version (Supplementary Table S5) (Lin et al. 2014;Babiker et al. 2015;McNish et al. 2020;Zhao et al. 2020).

Transcriptome analyses of oat diversity panel
Developing oat seed tissue was dissected, and RNA was extracted using a hot borate protocol at 23 DAA as this time point showed slightly higher correlation between transcript and relative concentration of metabolites than other sampled developmental time points (Hu et al. 2020). RNAseq reads were aligned to the oat transcriptome using Salmon v0.12 (Patro et al. 2017) and transformed using variance stabilizing transformation in DESeq2 (Love et al. 2014) as described by Hu et al. (2020). For these analyses, we removed all transcripts expressed in fewer than 50% of samples as these are not useful for TWAS, leaving 54% of the original set (29,385). We examined the median absolute deviance of these transcripts to look for outliers and none exceeded a cutoff of MAD > 10. Deregressed BLUPs were then calculated in sommer (Covarrubias-Pazaran 2016) using the "mmer" function. For each transcript, there were fixed effects of whether the genotype was a replicated check, the plate in which RNA was extracted from, and DTH as a numeric covariate, and random effects of experimental block and genotype. In all, 22,638 transcripts had converged drBLUPs (nonzero heritability). To remove any additional factors associated with experimental design, we ran probabilistic estimation of expression residuals (PEER) and found that k ¼ 5 factors (determined by visual examination of scree plot) was sufficient (Supplementary Figure S2).
We then conducted a TWAS and enrichment analyses. We used the transcript PEER residuals and a kinship matrix, as well as five genomic PCs as covariates for TWAS on the metabolite and seed trait drBLUPs. We implemented TWAS using the "createGData" and "runSingleTraitGwas" functions in the statgenGWAS package (Rossum and Kruijer 2020). P-values were adjusted per trait using a false discovery rate adjustment, and transcripts with P FDR < 0.05 were considered significant. The adjusted P-values for all transcripts were used in gene ontology (GO) enrichment analysis for each of the phenotypes for biological processes GO terms. Enrichment analysis was implemented in the R package topGO, where significance was determined based on the default "weight01" algorithm followed by a Fisher test (Alexa and Rahnenfuhrer 2016). Finally, transcripts had previously been assigned to temporally covarying groups (Hu et al. 2020) and these annotations were used to assign transcripts by date (8, 13, or 18 DAA) and direction (up or down) that expression pattern shifted. Those that changed on multiple dates were split into the two respective days. We tested for enrichment of any temporal and direction class using a hypergeometric test with the "phyper" function in R.
We also identified transcripts associated with the avenanthramide biosynthetic pathway (beginning at PAL) and the preceding shikimate pathway using Ensemble Enzyme Prediction Pipeline (E2P2) annotations (Chae et al. 2014) of transcripts (Supplementary Table S6).

eQTL analysis
We implemented eQTL analysis in Matrix eQTL (Shabalin 2012) in R using the PEER residuals for transcript counts and with five genomic PCs as covariates. SNPs were defined as significant eQTL at a threshold of P FDR < 0.05 per transcript. As only half of the transcripts are mapped, we did not differentiate between cis and trans eQTL, although future genome and transcriptome assemblies will facilitate this analysis.

Results
Heritability and correlations of specialized metabolites in oat seed Specialized metabolites (avenanthramides, "AVNs"; avenacins, "AECs"; avenacosides, "AOSs") were measured in seeds of a diverse germplasm panel evaluated in one environment and an elite set of oat germplasm evaluated in three environments. Genomic heritability was low to moderate for most metabolites, and some metabolites had heritability <0.05 (Figure 1). In general, there was a strong degree of phenotypic and genetic correlation within metabolite groups (e.g., within AVNs) across populations and environments, with the exception of avenacoside B (AOS_B) (Figure 1). In the diversity panel, the phenolic AVNs tended to have negative phenotypic and genetic correlations with both saponins (AEC and AOS), while AECs and AOSs were positively correlated ( Figure 1A). This trend was less pronounced in the elite population phenotypes in most environments (Figure 1, B-D). While there was still strong within-group correlation, there were no significant negative phenotypic correlations between phenolics and saponins. Specialized metabolites were significantly positively correlated between environments in the elite panel (Supplementary Table S7).

Relationship between seed traits and specialized metabolites
We examined seed size traits in dehulled seeds (volume, surface area, and surface area to volume ratio), as well as kernel and hull weight and percent groat (kernel). In general, heritability of the seed traits was greater than those of the specialized metabolites (Supplementary Table S8) and seed volume was used for further analyses (diversity panel h 2 ¼ 0.72; elite panel Minnesota h 2 ¼ 0.50; elite panel Wisconsin h 2 ¼ 0.33). There were significant relationships between some metabolites and seed size (Figure 2; Supplementary Table S9) and seed weight (Supplementary Table  S10). In both the diversity and elite panel, relative concentration of AVNs (present in outer seed layers) increased with seed size, despite the decreased surface area to volume ratio. There was no relationship between avenacins and seed size except as measured in the elite panel in WI. Finally, relative concentration of AOSs (concentrated in the inner endosperm) decreased with seed size in the diversity panel but had no relationship to seed size in the elite panel. This relationship was further confirmed by examining the genetic correlation between seed traits and the specialized metabolites. In the diversity panel, there was strong positive genetic correlation between seed volume, seed surface area and HKW and AVNs (>0.70), negative correlation with AOSs (< À0.23) and essentially no correlation with AECs (between 0 and À0.12). This relationship was less consistent when examined in the elite panel. There were also not consistent patterns between percent groat and metabolite traits in any panel or location (Supplementary Table S11).

Effect of breeding intensity on metabolites and seed traits
Using year of variety release as a proxy for plant breeding intensity (where later years indicate more intensive breeding efforts), we tested if breeding intensity affected seed size or metabolites in the individuals in the diversity panel for which these data are available (phenotypes and year information is available for 138-146 individuals per trait; Supplementary Table S12). Seed volume increased over time and, correspondingly, seed surface area increased and the surface area to volume ratio decreased (Figure 3, A-C). Both HKW and HHW also increased over time, but percent groat (kernel) remained constant (Figure 3, D-F). Of the specialized metabolites, the relative concentration of AOSs decreased over time, but AVNs and AECs were unaffected (Figure 3, G-I). Using multiple regression with year and seed volume as predictors for groat percentage and the specialized metabolites, the regression coefficient for year was not significantly different from zero for any metabolite (Supplementary  Table S13). These results indicate that while seed size was likely a target of selection as a yield component that had indirect effects on the seed metabolome composition, factors independent of size and breeding intensity also contributed to the observed metabolome variation.

Genome-wide association study
Single-trait GWAS was conducted for each of the specialized metabolites and seed traits in the diversity panel and each environment of the elite panel. Few metabolite traits had SNPs below a significance threshold of P Bonferroni < 0.05 (Table  1; Supplementary Figure S3). No seed size traits had a significant GWAS result, except percent groat in one environment of the elite panel (Table 1; Supplementary Figure S3). None of these eleven significant SNPs were within genes (all genes within 6100 kb of the SNPs are presented in Supplementary Table S14).
The significant GWAS results for AVN_A in the diversity panel on chromosome 3A did not colocalize with known QTL for resistance to crown rust (Lin et al. 2014;Babiker et al. 2015;McNish et al. 2020;Zhao et al. 2020) (Supplementary Table S2), despite the previously reported relationships between AVN concentration and crown rust resistance.
To visualize genomic regions relevant for metabolite and seed traits and determine if there is shared genetic control between traits, populations or environments, we examined all SNPs that met a reduced significance threshold of P FDR < 0.20 and plotted them in 10 Mb bins (Figure 4). Within population and environment (e.g., elite panel in Minnesota), there were no shared SNPs between any two or more traits (e.g., between AVNs and seed size), indicating that the metabolite and seed traits do not have common large effect loci. Within AVNs, only results from the diversity panel met this threshold ( Figure 4A). There were multiple points of overlap between environments and panels for AECs, with the highest count of shared SNPs on 5A (elite Minnesota, diversity panel) and 5C (elite Minnesota, elite Wisconsin, diversity panel) (Figure 4B), and there were consistent SNPs identified for AOSs between the elite panel evaluated in Minnesota and South Dakota on chromosomes 1C and 4A ( Figure 4C). However, the regions identified for seed size traits in the elite panel and the diversity panel were not shared ( Figure 4D). As different genomic regions were implicated between panels and environments for the same trait, these results indicate genetic heterogeneity between panels and genotype-byenvironment interactions.

Transcriptome analyses
A TWAS was conducted for each of the specialized metabolites in the diversity panel to assess the relationship between gene expression and metabolite relative concentration. Of these, both AVNs had significant (P FDR < 0.05) TWAS results (72 for each AVN_A and AVN_B), with 51 shared and expression of most of these shared transcripts (50) positively correlated with increased AVNs (Table 2;  Supplementary Table S15). Of these, phenylalanine ammonia-lyase ("PAL," TRINITY_DN26560_c0_g2_i1), the first committed enzyme of phenylpropanoid biosynthesis and phosphoenolpyruvate/phosphate translocator 1 (TRINITY_DN1581_c0_g1_i3), an enzyme in the pentose-phosphate pathway, a pathway that precedes the shikimate pathway, could be connected to biosynthesis. The other specialized metabolites had few significant TWAS results (Table 3): the two AECs shared four significant transcripts and only AOS_B had a significant result. No significant transcripts were detected for any seed traits, even at a less stringent cutoff (P FDR < 0.25).
To better understand the biological relevance of the rest of the transcripts, GO enrichment analysis was conducted on the falsediscovery rate adjusted P-values. While only AVN_B had a significantly enriched term after multiple test correction (pentosephosphate shunt, GO: 0006098), GO terms related to the shikimate biosynthesis (chorismate biosynthetic process, GO: 0009423) and L-phenylalanine catabolic processes (GO: 0006559) were top GO terms for both AVNs (Table 4). There was no significant enrichment of GO terms for either the AECs (Supplementary  Table S16) or AOSs (Supplementary Table S17).
We also examined how expression of the significant AVN TWAS transcripts we identified here at 23 DAA changed over seed development. Developing oat seed transcripts were previously categorized into temporally covarying groups (Hu et al. 2020) and we found that significant transcripts from AVN TWAS Figure 2 Relationship between specialized metabolites and seed size in the diversity panel (evaluated only in New York) and elite panel evaluated in Minnesota ("MN") and Wisconsin ("WI"). Data are not available for the elite panel evaluated in South Dakota. For each metabolite class, an example was chosen where "Avenanthramide" refers to avenanthramide B, "Avenacin" refers to avenacin A1.1, and "Avenacoside" refers to avenacoside A (Supplementary  Table S1). Model results for all metabolites are presented in Supplementary Table S9. The ***P < 1EÀ6, **P < 1EÀ3, and "ns" indicates P > 0.05.
were enriched for transcripts that had a trajectory of increased expression beginning at 8 days after anthesis (DAA) when compared with all transcripts (hypergeometric test, AVN_A: P ¼ 2.39EÀ13, AVN_B: P ¼ 2.82EÀ10). In contrast, there was only weak evidence for enrichment of any transcript class in known avenanthramide biosynthetic enzymes (hypergeometric test, decrease in expression at 8DAA, P ¼ 0.049) or Shikimate pathway enzymes (hypergeometric test, decrease in expression at 13DAA, P ¼ 0.062) at any time point ( Figure 5).
Finally, we tested if seed volume corresponded to expression of AVN TWAS results to determine if there was expression variation independent of seed volume that could be a target of selection. Seed size was less predictive of TWAS gene expression than the phenotype (AVN_B) as measured by coefficient of determination (Supplementary Table S18). For instance, PAL and phosphoenolpyruvate/phosphate translocator 1 were not strongly associated with seed volume (Figure 6). These results indicate that while relative concentration of AVN tracks with seed volume Figure 3 Relationship between year of variety release and deregressed BLUPs of (A) seed volume, (B) seed surface area, (C) seed surface area to volume ratio, (D) HKW, (E) HHW, (F) groat percent, (G) avenanthramide, (H) an avenacin, and (I) an avenacoside in the diversity panel. For each metabolite class, an example was chosen where "Avenanthramide" refers to avenanthramide B, "Avenacin" refers to avenacin A1.1, and "Avenacoside" refers to avenacoside A (Supplementary Table S1). Model results for all traits are presented in Supplementary Table S12. ***P < 1EÀ6, **P < 1EÀ3, *P< 0.05, and "ns" indicates P > 0.05. The diversity panel was evaluated in only one environment (NY, United States). The P-value is adjusted with a Bonferroni correction.
a Trait names are defined as follows: avenanthramide A, "AVN_A"; avenacin A1, "AEC_A1.1" and 26-Desglucoavenacoside A, "AOS_dA"; hundred kernel weight, "HKW"; groat percentage "GP." and gene expression, gene expression is not strongly linked to seed volume, and thus gene expression is an independent contributor to patterns of variation in AVN abundance.

eQTL analysis
Because we predicted that expression variation is important for oat specialized metabolites, especially AVNs, we conducted eQTL analysis on genes detected in TWAS and on known pathway genes and examined if those eQTL colocalized with our GWAS results. Two AVN TWAS results had eQTL at a P FDR <0.05 threshold, TRINITY_DN1008_c0_g2_i2 a serine hydroxymethyltransferase 4, and TRINITY_DN13684_c0_g1_i1 a mitochondrial aconitate hydratase 3. These two genes neither colocalized with the AVN GWAS result nor were definitively annotated to a single position in the oat genome. Relaxing the significance threshold to P FDR < 0.2 revealed eQTL of four additional genes (Supplementary Figure  S4), but the eQTLs detected on chromosome 3A were not in LD with the GWAS result (r 2 < 0.02 for all).

Discussion
Oat (Avena sativa L.) is a cereal crop with known health benefits from consuming the grain or through topical skincare application. These benefits are derived from a diverse suite of metabolites, including unsaturated fatty acids and b-glucans as well as the specialized avenanthramides, avenacins and avenacosides. We characterized the genomic and transcriptomic bases of specialized metabolite variation in diverse and elite oat germplasm in the context of seed size and selection over a century of oat breeding. We found that heritable genetic variation is diminished in elite germplasm, but selection for larger seeds only accounts for part of that reduction. For avenanthramides in particular, we found in addition to increased abundance in larger seeds, there was also variation in biosynthetic enzymes upstream of the committed pathway enzymes that contributed to phenotypic variation. Broadly, this work contributes to our understanding of how crop breeding has shaped specialized metabolome profiles, and prospects for continued plant breeding.

Historical dimensions of oat specialized metabolism and change in seed size
Specialized metabolites serve multiple purposes in plants, with one prominent use being plant defense against biotic stresses (Mithö fer and Boland 2012; Kessler and Kalske 2018; Jacobowitz and Weng 2020). The relationship between plant domestication and breeding, resistance to biotic stress, and specialized metabolites has been widely examined to understand how plant selection has shaped agro-ecological interactions. Most work has been conducted comparing wild and domesticated plants, and has found that cultivated plants are more susceptible to biotic stress than their wild progenitors (Turcotte et al. 2014;Whitehead et al. 2017;Fernandez et al. 2021). A concomitant decrease in secondary metabolites, however, has not been consistently observed (Whitehead et al. 2017). Instead, tradeoffs between plant growth and defense (Whitehead and Poveda 2019) or plant nutrition (Fernandez et al. 2021) may be important factors. The studies that interrogate a spectrum of plant breeding intensity from domestication to landraces to modern varieties have used less than 25 accessions each, and have produced mixed results where some find a decrease in resistance with breeding intensity (Rosenthal and Dirzo 1997  A full list of all significant transcripts is in Supplementary Table S15. Rank refers to overall transcript significance in TWAS analysis, and effect refers to the direction of correlation between expression and relative concentration of avenanthramide. associated reduced biotic stress resistance with reduced metabolite diversity, but not absolute metabolite concentrations. Overall, these findings indicate that there are nuanced cropspecific patterns in how breeding has shaped specialized metabolites (and plant defense), but there is a need for work that includes a greater number of plant accessions and a finer-scale gradient of plant breeding intensity. In our work, we surveyed oats spanning almost a century of plant breeding-beginning with the rediscovery of Mendel in the early 20th century to genomics-enabled breeding in the 21st century. Yield has consistently been a trait of plant breeding interest, where yield gains were observed throughout the 20th century (Rodgers et al. 1983) and yield is still a focus of current breeding programs (Haikka et al. 2020;Gonzá lez-Barrios et al. 2021). We examined the relationship between breeding intensity (by year of variety release), seed size, and defensive metabolites in more than 138 individuals. We found that more intensive breeding led to larger oat seeds, but not a greater proportion of edible tissue (groat) and, while relative concentrations of specialized metabolites were tied to seed size, they were not a direct target of plant breeding. We found that larger seeds had high avenanthramide abundance, despite decreased surface area to volume ratio inherent to larger seeds, but there was no relationship with breeding intensity. In contrast, avenacoside abundance decreased with increasing seed size associated with breeding intensity, despite larger endosperm volume. These results indicate that there are not consistent tradeoffs between growth (seed size) and defense (avenanthramides, avenacosides). Further, we found that ongoing plant breeding did not uniformly reduce or increase plant specialized metabolites but may have affected size of and concentration of metabolites in specific seed tissues (like the aleurone layer).

Breeding for oat avenanthramides
Of the oat seed specialized metabolites, avenanthramides have garnered the most research interest. Avenanthramides are antioxidants (Bratt et al. 2003) and have been implicated in resistance to the oat crown rust (Wise et al. 2008;Wise 2014). The avenanthramide biosynthetic pathway has been defined (Collins 2011;Wise 2014;Li et al. 2019), yet this work has not been translated into tools for oat breeders, like molecular markers. Critically, it remains unknown whether functional or regulatory mutations in the committed biosynthetic pathway enzymes (enzymes specific to avenanthramide biosynthesis) or upstream biosynthetic pathway enzymes (not specific to avenanthramide biosynthesis) are the most significant contributors to heritable variation in cultivated oats. Neither our GWAS nor TWAS results implicated committed pathway genes. Instead, TWAS revealed that biosynthetically upstream enzymes expressed early in seed development contributed to avenanthramide abundance. In addition, we found that an eQTL of a biosynthetically upstream enzyme colocalized with our avenanthramide GWAS result. While our interpretation and enrichment analyses were limited by availability of transcript annotations (which, likely, are more complete for highly conserved, rather than oat-specific, genes) these results nonetheless suggest that regulation of or flux through the pathway may be a promising avenue for plant breeding. Dimberg and Peterson (2009) examined the relationship between avenanthramides and compounds that are precursors or derived from other branches of related biosynthetic pathways. Their results did not offer a straightforward indication of which biosynthetic step moderates pathway flux; instead, PAL expression neither depended upon the amount of its substrate (phenylalanine) nor affected expression of HHT (the terminal enzyme in avenanthramide biosynthesis). Our results implicate PAL expression as important for avenanthramide abundance, as well as a phosphoenolpyruvate translocator in the pentose phosphate pathway, and other transcripts of unknown function. These results add to the widely recognized importance of PAL expression as a regulator of flux in phenylpropanoid biosynthesis (Huang et al. 2010;Kim and Hwang 2014;Barros and Dixon 2020). In addition, a broader examination of precursor metabolites, including those in the pentose phosphate pathway may produce interesting results as diversification of enzymes from primary metabolism is important for contributing to specialized metabolism diversity (Moghe and Last 2015;Maeda 2019). Overall, our results should prompt future work on avenanthramides to focus on upstream biosynthetic processes, as most variation affecting avenanthramides appears to be in enzymes preceding committed biosynthetic steps. Table 3 Significant transcripts (P FDR < 0.05) from TWAS of avenacins (AEC) and avenacosides (AOS) where rank refers to overall transcript significance in TWAS analysis, and effect refers to the direction of correlation between expression and relative metabolite concentration  The P-values are unadjusted and the * indicates that it is significant when adjusted for a false discovery rate.

Figure 5
Oat seed transcripts classified by temporal variant category (day after anthesis at which expression of a particular gene substantially changes during seed development) and direction (increase or decrease) as described in Hu et al. (2020), and shown by color here. The percent of transcripts in each category is shown for all transcripts in the dataset ("all"), transcripts annotated to be part of the preceding shikimate pathway ("Shik_Pwy"), transcripts annotated in avenanthramide biosynthesis ("Avn_Pwy"), and both avenanthramides (AVN_A and AVN_B). The numbers at the top indicate the number of transcripts that were annotated by temporal group.

Figure 6
The relationship and coefficient of determination between expression of (A) phenylalanine ammonia-lyase, "PAL" and (C) phosphoenolpyruvate/phosphate translocator 1, "PEPT" and avenanthramide B ("AvnB") concentration and the relationship between seed volume and (B) PAL and (D) PEPT expression. The relationship between avenanthramide B and all significant TWAS results are given in Supplementary Table S18.
Our results also contribute to an understanding of when avenanthramide biosynthesis occurs in oat seeds. Avenanthramides are detected as early as 8 DAA, and while Hu et al. (2020) found that HHT is expressed at 8 DAA, Peterson and Dimberg (2008) did not observe expression until 20 DAA. By sampling gene expression at only 23 DAA, we likely sampled at a time where it would be possible to detect differences in HHT expression, but we may have missed peak differential expression of upstream enzymes that contributed pathway flux. Our avenanthramide TWAS results were enriched for genes that were expressed early in seed development (8 DAA), and Hu et al. (2020) found that two other pathway enzymes, 4-coumaroyl-CoA3-hydroxylase (CCoA3H), caffeoyl-CoA3-O-methyltransferase (CCoAOMT) increase in expression early in development before dropping beginning at 18 DAA. Together, these results indicate that the precursors of avenanthramides may be biosynthesized early in seed development. Our understanding will improve with further use of oat genomic resources, as well as transcriptomic analysis paired with metabolomic profiling over seed development.
Finally, despite the connection between avenanthramides and the disease, crown rust, no results from our GWAS or TWAS results colocalized with previously reported crown rust QTL (Lin et al. 2014;Babiker et al. 2015;McNish et al. 2020;Zhao et al. 2020). One explanation for this finding is that we did not inoculate oats with crown rust, nor trigger SAR. Both crown rust infection and treating oats with analogs of hormones that activate SAR increase avenanthramide concentration (Wise et al. 2008(Wise et al. , 2016Wise 2011Wise , 2017. We predict that, if SAR was activated, there would be more extreme variation in avenanthramide concentrations and we would implicate more genetic loci, some of which would colocalize with crown rust QTL due to shared regulation. Overall, these results suggest that genetic variation in regulation exists, but regulatory elements may need to be activated to effectively map or select upon this variation. Broadly, independent of crown rust infection, the role of upstream regulation during seed development in avenanthramide abundance may relate to the high degree of environmental variability noted for avenanthramide concentrations (Emmons and Peterson 2001;Peterson et al. 2005;Redaelli et al. 2016;Michels et al. 2020).

Prospects for oat saponins-avenacins and avenacosides
Oat saponins are of interest from a human health perspective as they are associated with reduction of cholesterol (Sang and Chu 2017). Our results did not implicate promising candidate genes by GWAS or TWAS that could be applied to develop tools for plant breeders. Like avenanthramides, our TWAS results are limited by only sampling at one time point. We also found that the saponins, especially the avenacosides, were more sporadically detected in the elite germplasm and within compound class correlations were weaker, potentially indicating a decrease in abundance in moving from diverse to elite germplasm. This may be due to taste: high concentrations of avenacosides in oat seed can contribute to an undesirable bitter off taste (Gü nther-Jordanland et al. 2016, 2020. Selection for organoleptic quality has been implicated in reducing saponin concentration in cultivated legumes (Ku et al. 2020), and our results indicate there has been a similar historical trajectory in oat. However, to the best of our knowledge, current oat breeding efforts do not regularly incorporate sensory evaluations.

Selection for an optimized oat seed specialized metabolome
In breeding for nutrition, flavor, or esthetics (color), plant breeders have changed crop metabolomic profiles. However, working with specialized metabolites compared to major nutritional metabolites presents different challenges and thus may require different plant breeding approaches. As an example, fatty acid methyl esters (FAMEs) are healthful fats in oat seed that comprise 3-11% of oat seed composition, compared with 0.2% for avenanthramides. Also, while fatty acid biosynthetic enzymes have some degree of cross-species conservation, this is not true for avenanthramides that are only present in a few (nonmodel) plant species (Ponchet et al. 1988;Wise 2014) and a caterpillar (Blaakmeer et al. 1994). In addition, the specialized metabolites we measured in oats are negatively correlated and do not have shared genetic control, presenting a challenge for selecting for both traits simultaneously but promising for efforts to select for a single trait. Finally, and perhaps most importantly, the specialized metabolite heritability (AVNs: h 2 < 0.26; AECs: h 2 < 0.61; AOSs: h 2 < 0.52) we report here is lower than that of FAMEs (h 2 > 0.61) (Carlson et al. 2019). Overall, these results suggest that work to increase specialized metabolite concentrations will benefit from strategies that reduce environmental variation to improve trait heritability, or increase replication in plant breeding trials, and incorporate seed size into phenotyping efforts.

Conclusions
An understanding of patterns of variation in the plant specialized metabolome is relevant for developing health-promoting functional food crops that may also withstand biotic stress. Due to the low concentrations and lineage specificity of specialized metabolites, they are infrequent direct targets of plant breeding, but may have been inadvertently shaped through processes like selection on other traits or genetic drift. In a diverse panel of cultivated oats, we measured seed size and specialized metabolites and conducted genomic and transcriptomic analyses to characterize existing variation and the processes that contributed to it. Overall, we show that the increased seed size associated with modern plant breeding has uneven effects on the oat seed metabolome, and variation also exists independently of seed size. Broadly, despite the multitudes of phenotypic changes in crops from plant breeding, variation for some specialized metabolites persists in cultivated plants and could be targeted by future plant breeding efforts.
Supplementary material is available at G3 online.