Abstract

High-dimensional and high-throughput genomic, field performance, and environmental data are becoming increasingly available to crop breeding programs, and their integration can facilitate genomic prediction within and across environments and provide insights into the genetic architecture of complex traits and the nature of genotype-by-environment interactions. To partition trait variation into additive and dominance (main effect) genetic and corresponding genetic-by-environment variances, and to identify specific environmental factors that influence genotype-by-environment interactions, we curated and analyzed genotypic and phenotypic data on 1918 maize (Zea mays L.) hybrids and environmental data from 65 testing environments. For grain yield, dominance variance was similar in magnitude to additive variance, and genetic-by-environment variances were more important than genetic main effect variances. Models involving both additive and dominance relationships best fit the data and modeling unique genetic covariances among all environments provided the best characterization of the genotype-by-environment interaction patterns. Similarity of relative hybrid performance among environments was modeled as a function of underlying weather variables, permitting identification of weather covariates driving correlations of genetic effects across environments. The resulting models can be used for genomic prediction of mean hybrid performance across populations of environments tested or for environment-specific predictions. These results can also guide efforts to incorporate high-throughput environmental data into genomic prediction models and predict values in new environments characterized with the same environmental characteristics.

Introduction

Crop adaptation to local environments depends on developmental responses to weather, photoperiod, nutrient availability, and both abiotic and biotic pressures that vary among growing regions (Romay et al. 2013; Bian et al. 2014; Peiffer et al. 2014; Lasky et al. 2015; Adee et al. 2016). Maize is one of the most important crops worldwide as food for humans and animals, as raw material for industrial processes, and as a model plant for understanding evolution, domestication, and heterosis (Romay et al. 2013). Maize is grown in a wide range of environments, and maize varieties differ for their range of environmental adaptation, primarily conditioned by their flowering time, tolerance to abiotic stresses, and disease resistance (Mercer et al. 2008; Ruiz Corral et al. 2008; Romay et al. 2013; Romero Navarro et al. 2017; Mercer and Perales 2019). Genome-wide association studies (GWAS) in some plant species have identified loci with large effects on traits related to environmental adaptation (Turner et al. 2010; Fournier-Level et al. 2011; Ågren et al. 2013; Cavanagh et al. 2013; Lipka et al. 2015; Meyer et al. 2016). However, GWAS has low power to detect loci associated with adaptation when the phenotype is controlled by a large number of variants with small effects (Lipka et al. 2015). This is a common occurrence in both plant and animal breeding, where many traits of economic importance tend to be influenced by many small effect loci and inherited quantitatively (Falconer and Mackay 1996).

Phenotypic variance (σP2) of quantitative traits can be decomposed into genotypic (σG2), environmental (σE2) components, and genotype-by-environment (G × E) interactions (σG×E2) (Comstock and Moll 1963; Falconer and Mackay 1996; Gage et al. 2017). G × E accounts for the variable response of specific genetic backgrounds to the environments they experience, and this interaction represents a portion of the phenotypic variance that hinders broad-scale adaptation but can be exploited for environment-specific adaptation (Comstock and Moll 1963; Falconer and Mackay 1996; Anholt and Mackay 2004; Gage et al. 2017). The genotypic variance can further be partitioned into additive (σA2) and dominance (σD2), and epistatic genetic variance components, although experimental estimation of the higher-order variance components is often difficult and not always possible, depending on the mating and experimental designs employed (Comstock and Robinson 1948; Holland 2001; Hill et al. 2008). Similarly, the overall G × E variance can be partitioned into additive-by-environment interaction, dominance-by-environment interaction, and epistatic-by-environment interaction variances. Understanding this breakdown of phenotypic variance into its constituent components for each trait may enable a breeder to make more informed decisions regarding optimal breeding methods and selection of individuals to use as breeding parents for the next generation, and, for hybrid crops, selection of the best combination of current breeding lines to produce hybrids for production.

G × E interactions complicate both the early- and late-generation evaluation phases of cultivar development, and are often treated as a nuisance factor to be minimized, when the breeding goal is to develop cultivars with stable performance across a wide range of environments (Comstock and Moll 1963). Understanding the relative importance of G × E interactions can help in the optimization of breeding schemes. Typically, crop breeders evaluate large numbers of new breeding materials in one or a small number of environments, followed by more expansive testing of a relatively few selected breeding lines across many environmental conditions. Understanding the relationship between breeding values evaluated in the small subset of initial evaluation environments to breeding values averaged across larger samples of environments can be critical in designing efficient selection strategies.

Rather than simply selecting on the basis of mean performance across testing environments, it may be more effective to jointly identify sets of environments in which certain varieties have optimal performance (Jannink et al. 2010; Van Eeuwijk et al. 2016). Further, the target population of environments may be considered a mixture of multiple environmental distributions, and it may be possible to identify geographic regions that correspond with more homogeneous subsets of environments (Cooper and DeLacy 1994). For example, the International Center for Maize and Wheat Improvement (CIMMYT) has defined several mega-environments, based on their similarity of biotic and abiotic pressures faced by field crops in the growing season (Rajaram et al. 1994). More homogeneous groups of environments can be identified on the basis of similarity of genotype performance from multienvironment trial experiments. Such groups are characterized by having strong, positive genotypic correlations among the environments within the group, and consequently, a large ratio of genotype main effect to genotype-by-environment variances within the group. Defining subgroups of target environments allow breeders to shift away from trying to breed a cultivar with optimal mean performance across all growing locations of interest and toward identifying different sets of genotypes with adaptation and optimal performance within subsets of the target growing region (Cooper and DeLacy 1994; Gage et al. 2017). For most breeding programs, there is a balance between defining a target population of environments narrowly enough to have stressors experienced by the crop be somewhat homogeneous while at the same time wide enough that breeding investment can be recouped by having a sufficiently large market (Murray et al. 2019).

Animal and plant breeders have adopted genomic selection (GS), a statistical method that considers both large and small effect loci to determine the breeding value of a selection candidate (Heffner et al. 2009; Lorenz et al. 2011; Meuwissen et al. 2013, 2016). In many plant species, GS is already being employed to accelerate breeding cycles and more efficiently utilize field resources (Heffner et al. 2011a, 2011b; Sorrells 2015; Crossa et al. 2017; Isik and McKeand 2019). For traits with highly polygenic genetic architecture, accurate predictions of breeding values can be made using genomic relationship matrices, which infer the proportion of genomic identity-by-descent among individuals in the population based on identity-in-state of markers throughout the whole genome (VanRaden 2008; Endelman and Jannink 2012; Isik et al. 2017). The use of marker-based realized genomic relationship matrices result in better model fit and predictive accuracy for selections than use of pedigree information alone to estimate genetic similarity in most reported cases (Burgueño et al. 2012; Heslot et al. 2014; Crossa et al. 2017; Howard et al. 2019; Juliana et al. 2019). GS models leverage additive genetic effects in estimation of breeding values of individuals, and thus provide a quantification of the additive genetic variance (σA2) and values of heritability of a trait. This idea can be generalized to include other forms of realized genomic relationships, such as dominance genomic relationships, leading to the ability to estimate nonadditive genetic variance components (Vitezica et al. 2013; Muñoz et al. 2014) and predict hybrid combinations (Kadam et al. 2016; Guo et al. 2019; Ramstein et al. 2020). Although the additive genomic breeding value estimates are the appropriate criterion for selecting among candidates to produce new breeding populations, the variances and composite genotypic predictions including dominance and higher-order genetic effects can be exploited in some plant breeding situations, such as clonally propagated species, highly inbred varieties, and hybrid crops to select candidates for immediate production (Endelman et al. 2018; Ovenden et al. 2018).

Analogous to using genetic markers as predictors in GS, environmental covariates such as weather variables and their interactions with genetic markers can be included in genomic prediction models to predict average performance of a line across environments or environment-specific performance of a genetic background (Heslot et al. 2014; Jarquín et al. 2014; Saint Pierre et al. 2016). Similarly, the concept of a genomic relationship matrix (which measures genomic similarity among individuals) can be extended to environmental relationship matrices (which measure similarity among environments based on weather and other variates). A difficulty with this approach is that whereas the weighting of individual marker influence on a genomic relationship matrix has an established theoretical basis in polygenic inheritance (Fisher 1919; Endelman and Jannink 2012), we do not know how to properly weight environmental variables in the estimation of an environmental relationship matrix. Measuring the influence of individual environmental variables on genetic covariances between pairs of environments for a phenotypic trait may lead to improved models for estimating and predicting the correlations of breeding values across environments. Understanding the environmental factors that contribute to trait outcomes is an important part of crop modeling research (Chapman 2008; Bustos-Korts et al. 2019), but researchers are just beginning to develop approaches to incorporate environmental and G × E interaction effects into predictive frameworks in a way that is useful for creating GS models (Cooper et al. 2014; Heslot et al. 2014; Jarquín et al. 2014; Van Eeuwijk et al. 2016; Millet et al. 2019). Being able to utilize the relationship between environmental variables and environmental covariance is a step toward performing prediction in new environments.

The Genomes to Fields (G2F) Initiative (Lawrence-Dill 2017; Alkhalifah et al. 2018, https://www.genomes2fields.org/) is a large collaborative project with participants across the United States and Canada that includes a network of field phenotyping sites linked to environmental data collection encompassing a wide range of growing environments. This project has evaluated a wide array of experimental hybrids that are both phenotypically and genetically diverse. Previous work using G2F data has shown that selection during modern breeding may limit genetic potential for plastic response and environment-specific breeding (Gage et al. 2017). The G2F data network provides a basis for many types of analyses, including crop growth modeling, testing hybrids across locations of interest for a breeding program, and understanding the varying environmental and disease pressures experienced across the major growing regions of the United States using maize as a model system. In this article, we first describe a systematic approach to curate trait, environmental, and genetic data from the first three years of this collaborative project, and to filter the data sets using quality control methods to identify and resolve or remove incorrect data. Second, we estimate the relative importance of the different portions of genetic and G × E variances contributing to phenotypic variance in multiple agronomic traits across a wide span of environments. In particular, we measure the relative importance of additive, dominance, additive-by-environment, and dominance-by-environment variances for several agronomically important traits, including grain yield. Finally, we estimate the pairwise genetic covariances among environments for grain yield and identify environmental factors related to the covariance of genotypic performance across environments.

Materials and methods

Data cleaning and filtering

Data from the 2014 to 2016 G2Fs Project (Alkhalifah et al. 2018) were obtained from the public releases on Cyverse Discovery Environment: https://www.doi.org/10.25739/9wjm-eq41 for data from 2014, https://www.doi.org/10.25739/kjsn-dz84 for data from 2015, and from https://www.doi.org/10.25739/yjnh-kt21 for data from 2016 (McFarland et al. 2020). Datasets used included the hybrid phenotype data, inbred genotype data, and weather data taken from the Spectrum Watchdog weather stations located at each field station described previously (Alkhalifah et al. 2018). Hybrid genotype data were constructed using GBS data from inbred parental lines. Hybrid trait data come from replicated field trials across a variety of locations in the United States and Canada, with some changes in locations from year to year, with 12 core locations participating in all 3 years, and unbalanced assignment of hybrids to locations within and across years (Supplementary Table S1). Over the 3-year period, there were 73 site-by-year combinations, referred to hereafter as environments. After data quality control, 65 environments remained for further analysis. Each environment will be referred to by its Location-Year designation, for example NCH1_2014 refers to North Carolina Hybrid Trial 1 in 2014, while IAH3_2015 refers to Iowa Hybrid Trial 3 in 2015 (Supplementary Table S1). A detailed explanation of all Marker, Phenotype, and Environmental data processing can be found in Supplementary Files S1, S2, S3, S4, S5, and S6.

Statistical analysis

Stage 1 trait analysis

Linear mixed models were applied to trait data in two steps. In the first stage, each trait–environment combination was analyzed separately with a series of models that account for the experimental design factors specific to that environment with stand count as a covariate for yield only (Supplementary Table S2; Supplementary File S6). The grid layout of plots within each environment was known for experiments in 2014 and 2015. For these environments, we also tested a variety of residual covariance structures [IID, autoregressive first order (AR1) in row direction, AR1 in column direction, and AR1 in both directions].

Models were fit using ASReml-R version 3 (Butler et al. 2009) in R (R Core Team 2020) (Supplementary File S6). The fixed covariates for stand counts were tested first and significant (p 0.05) covariates were maintained in subsequent models. The best fitting random model for each trait within a location was then selected using Schwarz’s Bayesian Information Criterion (BIC) (Schwarz 1978) (Supplementary Table S3) and used to estimate hybrid Best Linear Unbiased Estimators (BLUEs) of hybrids in that environment (Supplementary File S7). The selected model varied by environment, with some environments having only replicates fit in the experimental structure (Supplementary Table S3). The selected model for each trait-experiment combination was used to calculate within-site hybrid mean-basis heritabilities using the Cullis estimator (Cullis et al. 2006) (Supplemental Methods File S1; Supplementary Table S3), which was then used as a filtering criterion to remove problematic trait–experiment combinations. Trait–experiment combinations with a heritability of less than 10% were set to missing, and experiments with a yield heritability of less than 10% were completely removed from further analysis for all traits. Artificially inoculated disease trials from New York were also removed from analysis.

Stage 2 trait analysis

In stage two, several different genetic and G × E covariance structures were fit to the BLUEs from Stage 1 analyses using Echidna mixed models software (Gilmour 2018) (–Files S8, S9, S10, and S11). Genetic covariance structures for hybrid effects included identical and independent (IDV), additive (A) realized genomic relationships, and dominance (D) realized genomic relationships (Table 1). A and D realized relationship matrices for 1918 hybrids maintained in the final combined data set were estimated from the final set of 22,214 SNPs remaining after extensive quality control of inbred parental genotype data and in silico construction of hybrid genotypes (Supplementary Methods, File S1). Models that were unable to converge within 75 iterations were not considered further.

Table 1

Stage 2 variance components estimates for grain yield for a variety of model structures

ModelBICRuntimeσ^A2σ^D2σ^GI2σ^Env2σ^AxE2σ^DxE2σ^GIxE2
Independent Hybrid Effects (IDVG)
E + GI + G×E32602.2822 s0.8793.6990.950
E+GIE × FA(1)32087.6358s0.8423.7430.974
E+GIE × FA(2)32226.073m46s0.7113.7581.079
E+GIE × FA(3)32539.899m59s0.6613.7941.153
Additive (A) genomic relationships + independent hybrid effects (GI)
E+A +GI+A×E + GI×E29383.9829m35s0.8080.2593.9920.9010.306
E+ A(E) × FA1+GI(Ediag)30253.01131m55s2.9493.8301.3010.368
E+ AE × FA1+GI+GI×E29527.66461m28s0.9390.2313.8221.2520.259
E+ AE × FA1+GI+GI(Ediag)29824.96491m28s0.9280.2343.8391.0160.309
E+ AE × FA2+GI(Ediag)30537.7266m50s2.8000.3683.8731.3880.368
E+ AE × FA2+GI+GI × EA29820.15885m23s0.8370.2323.8771.3760.257
E+ AE × FA2+GI+GI(Ediag)30099.981049m25s0.8960.2363.8761.0400.318
Dominance (D) genomic relationships + independent hybrid effects (GI)
E+D +GI+ DxE + GI×E29102.8830m22s0.5330.1083.6780.9300. 081
E+ D(E) × FA1+GI(Ediag)29604.775m47s0.7043.5740.835
E+ DE × FA1+GI+GI×E29082.51397m23s0.5350.0913.5321.0300.045
E+ DE × FA1+GI+GI(Ediag)29569.45407m25s0.5400.0833.5430.8300.182
E+ DE × FA2+GI(Ediag)29819.55236m14s0.6583.6250.8270.212
E+ DE × FA2+GI+GI×EA29326.31171m08s0.4900.0823.5921.0610.061
E+ DE × FA2+GI+GI(Ediag)29788.00798m08s0.5120.0793.5980.8230.193
Additive (A) genomic relationships + dominance (D) genomic relationships + independent hybrid effects (GI)
E+A+D+GI+ A×E + D×E + GI×E28913.6129m30s0.5130.2500.1023.8100.5320.4530.148
E+A+D+A×E + D×E +GI(Ediag)28914.1229m25s0.5220.4133.8370.5030.2670.317
E+ A + AEdiag + D + D(Ediag) + GI + GI(Ediag)29802.2693m19s0.4360.2430.0973.7770.5800.3820.214
ModelBICRuntimeσ^A2σ^D2σ^GI2σ^Env2σ^AxE2σ^DxE2σ^GIxE2
Independent Hybrid Effects (IDVG)
E + GI + G×E32602.2822 s0.8793.6990.950
E+GIE × FA(1)32087.6358s0.8423.7430.974
E+GIE × FA(2)32226.073m46s0.7113.7581.079
E+GIE × FA(3)32539.899m59s0.6613.7941.153
Additive (A) genomic relationships + independent hybrid effects (GI)
E+A +GI+A×E + GI×E29383.9829m35s0.8080.2593.9920.9010.306
E+ A(E) × FA1+GI(Ediag)30253.01131m55s2.9493.8301.3010.368
E+ AE × FA1+GI+GI×E29527.66461m28s0.9390.2313.8221.2520.259
E+ AE × FA1+GI+GI(Ediag)29824.96491m28s0.9280.2343.8391.0160.309
E+ AE × FA2+GI(Ediag)30537.7266m50s2.8000.3683.8731.3880.368
E+ AE × FA2+GI+GI × EA29820.15885m23s0.8370.2323.8771.3760.257
E+ AE × FA2+GI+GI(Ediag)30099.981049m25s0.8960.2363.8761.0400.318
Dominance (D) genomic relationships + independent hybrid effects (GI)
E+D +GI+ DxE + GI×E29102.8830m22s0.5330.1083.6780.9300. 081
E+ D(E) × FA1+GI(Ediag)29604.775m47s0.7043.5740.835
E+ DE × FA1+GI+GI×E29082.51397m23s0.5350.0913.5321.0300.045
E+ DE × FA1+GI+GI(Ediag)29569.45407m25s0.5400.0833.5430.8300.182
E+ DE × FA2+GI(Ediag)29819.55236m14s0.6583.6250.8270.212
E+ DE × FA2+GI+GI×EA29326.31171m08s0.4900.0823.5921.0610.061
E+ DE × FA2+GI+GI(Ediag)29788.00798m08s0.5120.0793.5980.8230.193
Additive (A) genomic relationships + dominance (D) genomic relationships + independent hybrid effects (GI)
E+A+D+GI+ A×E + D×E + GI×E28913.6129m30s0.5130.2500.1023.8100.5320.4530.148
E+A+D+A×E + D×E +GI(Ediag)28914.1229m25s0.5220.4133.8370.5030.2670.317
E+ A + AEdiag + D + D(Ediag) + GI + GI(Ediag)29802.2693m19s0.4360.2430.0973.7770.5800.3820.214

Average residual variance from stage 1 within environment analysis is 1.89 (min = 0.53, max = 5.53).

Table 1

Stage 2 variance components estimates for grain yield for a variety of model structures

ModelBICRuntimeσ^A2σ^D2σ^GI2σ^Env2σ^AxE2σ^DxE2σ^GIxE2
Independent Hybrid Effects (IDVG)
E + GI + G×E32602.2822 s0.8793.6990.950
E+GIE × FA(1)32087.6358s0.8423.7430.974
E+GIE × FA(2)32226.073m46s0.7113.7581.079
E+GIE × FA(3)32539.899m59s0.6613.7941.153
Additive (A) genomic relationships + independent hybrid effects (GI)
E+A +GI+A×E + GI×E29383.9829m35s0.8080.2593.9920.9010.306
E+ A(E) × FA1+GI(Ediag)30253.01131m55s2.9493.8301.3010.368
E+ AE × FA1+GI+GI×E29527.66461m28s0.9390.2313.8221.2520.259
E+ AE × FA1+GI+GI(Ediag)29824.96491m28s0.9280.2343.8391.0160.309
E+ AE × FA2+GI(Ediag)30537.7266m50s2.8000.3683.8731.3880.368
E+ AE × FA2+GI+GI × EA29820.15885m23s0.8370.2323.8771.3760.257
E+ AE × FA2+GI+GI(Ediag)30099.981049m25s0.8960.2363.8761.0400.318
Dominance (D) genomic relationships + independent hybrid effects (GI)
E+D +GI+ DxE + GI×E29102.8830m22s0.5330.1083.6780.9300. 081
E+ D(E) × FA1+GI(Ediag)29604.775m47s0.7043.5740.835
E+ DE × FA1+GI+GI×E29082.51397m23s0.5350.0913.5321.0300.045
E+ DE × FA1+GI+GI(Ediag)29569.45407m25s0.5400.0833.5430.8300.182
E+ DE × FA2+GI(Ediag)29819.55236m14s0.6583.6250.8270.212
E+ DE × FA2+GI+GI×EA29326.31171m08s0.4900.0823.5921.0610.061
E+ DE × FA2+GI+GI(Ediag)29788.00798m08s0.5120.0793.5980.8230.193
Additive (A) genomic relationships + dominance (D) genomic relationships + independent hybrid effects (GI)
E+A+D+GI+ A×E + D×E + GI×E28913.6129m30s0.5130.2500.1023.8100.5320.4530.148
E+A+D+A×E + D×E +GI(Ediag)28914.1229m25s0.5220.4133.8370.5030.2670.317
E+ A + AEdiag + D + D(Ediag) + GI + GI(Ediag)29802.2693m19s0.4360.2430.0973.7770.5800.3820.214
ModelBICRuntimeσ^A2σ^D2σ^GI2σ^Env2σ^AxE2σ^DxE2σ^GIxE2
Independent Hybrid Effects (IDVG)
E + GI + G×E32602.2822 s0.8793.6990.950
E+GIE × FA(1)32087.6358s0.8423.7430.974
E+GIE × FA(2)32226.073m46s0.7113.7581.079
E+GIE × FA(3)32539.899m59s0.6613.7941.153
Additive (A) genomic relationships + independent hybrid effects (GI)
E+A +GI+A×E + GI×E29383.9829m35s0.8080.2593.9920.9010.306
E+ A(E) × FA1+GI(Ediag)30253.01131m55s2.9493.8301.3010.368
E+ AE × FA1+GI+GI×E29527.66461m28s0.9390.2313.8221.2520.259
E+ AE × FA1+GI+GI(Ediag)29824.96491m28s0.9280.2343.8391.0160.309
E+ AE × FA2+GI(Ediag)30537.7266m50s2.8000.3683.8731.3880.368
E+ AE × FA2+GI+GI × EA29820.15885m23s0.8370.2323.8771.3760.257
E+ AE × FA2+GI+GI(Ediag)30099.981049m25s0.8960.2363.8761.0400.318
Dominance (D) genomic relationships + independent hybrid effects (GI)
E+D +GI+ DxE + GI×E29102.8830m22s0.5330.1083.6780.9300. 081
E+ D(E) × FA1+GI(Ediag)29604.775m47s0.7043.5740.835
E+ DE × FA1+GI+GI×E29082.51397m23s0.5350.0913.5321.0300.045
E+ DE × FA1+GI+GI(Ediag)29569.45407m25s0.5400.0833.5430.8300.182
E+ DE × FA2+GI(Ediag)29819.55236m14s0.6583.6250.8270.212
E+ DE × FA2+GI+GI×EA29326.31171m08s0.4900.0823.5921.0610.061
E+ DE × FA2+GI+GI(Ediag)29788.00798m08s0.5120.0793.5980.8230.193
Additive (A) genomic relationships + dominance (D) genomic relationships + independent hybrid effects (GI)
E+A+D+GI+ A×E + D×E + GI×E28913.6129m30s0.5130.2500.1023.8100.5320.4530.148
E+A+D+A×E + D×E +GI(Ediag)28914.1229m25s0.5220.4133.8370.5030.2670.317
E+ A + AEdiag + D + D(Ediag) + GI + GI(Ediag)29802.2693m19s0.4360.2430.0973.7770.5800.3820.214

Average residual variance from stage 1 within environment analysis is 1.89 (min = 0.53, max = 5.53).

The simplest models utilized IDV genetic and environmental variance structures:
where μ is the intercept, Gi is the random genetic effect associated with hybrid genetic background (GiiidN(0, σG2I)) with I as an identity matrix of dimension 1918 (the number of hybrids with both trait and genotype data), Ej is the random effect associated with each environment (location-year combination), Ejiid N(0, σEnv2), GEij is the random effect due to hybrid-by-environment interaction, GEijiid N(0, σGE2), and εr is the error term with variance fixed to one but scaled by weights representing the reciprocals of the effective variances of the BLUEs previously measured in Stage 1 analysis: εij ~ N(0, W), where W is a diagonal matrix with diagonal values of 1/weights obtained as the diagonal elements of the inverse of the variance matrix of the BLUEs from the first stage analysis (Smith et al. 2001). Fixing the residual variance permits estimation of the hybrid × environment interaction variance separately from the residual. Versions of this model with environments or hybrids fit as fixed effects were also employed to obtain the marginal BLUEs of environments or hybrids separately (Supplementary Files S12 and S13).
Genomic relationships can be added to this model by introducing hybrid effects with covariances proportional to the realized additive genomic relationship matrix (Ai) and the realized dominance genomic relationship matrix (Di) (Su et al. 2012; Muñoz et al. 2014) in addition to the independent hybrid effects (GIi), plus separate interaction effects for each of these terms with environments:
In this model:
where A and D are the realized additive and dominance genomic relationship matrices, respectively, with dimension 1918, I1918 is an identity matrix of dimension equal to the number of hybrids, I65 is an identity matrix of dimension equal to the number of environments, I1918·65 is an identity matrix with dimension equal to the total number of hybrid–environment interaction effects, and W is the same diagonal matrix with inverses of BLUE variances described previously. Reduced versions of this model with subsets of the genetic and G × E effects were also fit to the data.
The previous model assumes a common variance component for each type of G × E effect, implying a common correlation for hybrid performance between all pairs of environments associated with each type of genetic effect. To relax this assumption, hybrid-within-environment effects can be fit without hybrid main effects but with a factor analytic (FA) structure applied to the variance–covariance matrix of hybrid performance across environments (Smith et al. 2005; Isik et al. 2017):
In the base FA model, the hybrid effects are independent:
We can add the genomic relationships to this model as follows:
where GEij represents hybrid effects with either additive or dominance genomic relationships, such that the following distributions were fit in separate models:
And GIEij represents genetically independent hybrid-within-environment effects that were modeled as cross-classified or nested within environments with three different structures:
GIi represents independent hybrid main effects, GIEij represents independent hybrid by environment interaction effects with a common variance (compound symmetry), and GIEdiagij represents independent hybrid-within-environment effects with a diagonal environment covariance structure (different variances within each environment, heterogeneous compound symmetry):

We fit models with different combinations of additive or dominant genetic effects with FA covariances across environments plus one of these three-independent genetic within environment effect structures (Table 1).

The FA covariance structure for hybrid effects across environments is: ΣFAN(0, Σi=1kλiλiT+Ψ) for k factors. In this model specification λis are vectors of environment factor loadings (with length 65 in this case), k is the number of factors fit, and Ψ is a diagonal matrix of environment-specific genetic variances of dimension 65. This provides a parsimonious parameterization of 65·66/2 = 2145 distinct variances and covariances requiring only 65 + 65k parameters to be estimated for an FA(k) model. The additional term IEdiagij requires 65 additional parameter estimates (one independent genetic variance component per environment), and was fit only in models including A or D relationships to capture independent within-environment genetic deviations that are not associated with the AΣFA or DΣFA covariances. Time to convergence for all models was measured on a system with eight 3.5 GHz processors and 64 GB RAM using Echidna Version 0.92 and without specifying initial parameter values; convergence was not achieved for some of the models using ASReml. A bug in Echidna resulted in incorrect likelihoods being reported for models involving realized genomic relationships in combination with FA structures. Therefore, correct likelihoods for these models were obtained by fitting models with parameters fixed at convergence in ASReml version 4.2 (Gilmour et al. 2015).

Derivation of variance components from genetic correlations

Comparing results from cross-classified models to FA models is difficult because the former estimate variance components for overall genetic main effects and G × E interactions, whereas the FA models involve many within-environment variances and across-environment covariances. To permit comparisons of the relative importance of genetic main effect variance (σG2) and genotype-by-environment variance (σG×xE2) among these models, we used the following relationship:
where rgij is the genetic correlation between performance in environment i and in environment j, σGij is the genetic covariance between environments i and j, and σGEi2 and σGEj2 are the genetic variances within environments i and j, respectively (Farjat et al. 2017; Isik et al. 2017). The cross-classified model assumes that the genetic variance within environments is constant and relates to the main effect genetic and genotype-by-environment variances as:
The cross-classified model also assumes that the genetic covariance between environments is constant, and therefore the genetic correlation between pairs of environments is constant:
The overall genetic main effect is typically represented by the first factor in the FA model rather than as a single variance. Therefore, we can estimate it and the overall GE interaction variance using the average within-environment variance and the average pairwise genetic correlation (Isik et al. 2017):

Relationships between environmental data and G × E

Environmental data, including daily weather measurements and soil characteristics, were collected during the growing season from each testing environment. Environmental variables were standardized and summarized over periods of 5, 10, 15, or 30 days (Supplementary Methods). Environments were clustered on the basis of their similarity for the 30-day period environmental covariates using the first 10 factors of a factor analysis of weather data alone. Factor analysis of weather data was done using the R package psych (Revelle 2020) and clustering was performed using the stats package. The number of clusters was chosen by examining the ratio of variance within and between clusters as the number of clusters was increased and identifying an inflection point where the ratio changed less than 0.01 with further splitting of clusters.

To test the relationship between weather-defined clusters and G × E variance for grain yield, we fit the following linear mixed model:
where yield BLUEs for hybrids at specific environments were used as a response with Clusteri as a fixed effect, Hybridj is a random effect with Hybrid  N(0, I1918σG2), where I1918 is an identity matrix of dimension 1918 for the number of hybrids with yield data. EnvClusterki is a random term for the effect of environment k nested within cluster i, Cluster*Hybridij is the random interaction between cluster i and hybrid j, and Hybrid*EnvClusterj*k(i) is the interaction between hybrid j and environment k within cluster i. The residual error (εijk) variance was fixed at one with weights based on the stage 1 modeling. This model permits a test of the null hypothesis of no difference in mean trait values across weather-defined clusters, and also allows the overall G × E variance component to be partitioned into components due to interactions among and within clusters defined by weather variables.

To aid understanding of which environmental variates are related to G × E patterns and environmental mean performance, forward stepwise regression models using AIC for model selection were fit to the parameter estimates from the D×FA(1) and A×FA(1) models. Specifically, environment factor loadings (λ1), site-specific genetic variances (Ψ), and the environment mean yields (from the IDVG model) were response variables in separate models and the environmental variables were fit as predictors. Note that similarity in factor loadings for the environments from the FA models correspond to similarity in relative hybrid performance between environments because the covariances between pairs of environments are modeled as the product of factor loadings for the environments. We used four different environmental data sets for this analysis, corresponding to 5, 10, 15, and 30-day windows over which the environmental variables were averaged, and limited the number of predictors to a maximum of 15 in each case.

Evaluation of prediction ability for hybrid marginal values

We compared the prediction ability of different models for hybrid marginal yield values (averaged value across environments). Observed BLUE values for hybrid marginal effects were estimated from a linear model with hybrid BLUEs estimated from a linear model with fixed hybrid effects, random environment and hybrid-environment interaction effects. Genomic prediction models tested correspond to models:
in Table 1, except that environment main effects were fit as fixed to speed computation. Weighted analysis with residual variance fixed at 1 was used for these models. In addition, two-step models that fit either A, D, or both relationship matrices to the BLUEs were tested. Ten-fold cross validation was used to fit each prediction model with yield data from 10% of hybrids masked in the training set, and the correlation between genomic predictions and observed BLUEs for the held-out 10% test set estimated for each fold. The mean and standard deviation of prediction ability over the 10 folds was computed for each model. The same fold structure was used for all models.

Data availability

Genomic, phenotypic, and environmental data from the 2014-2016 G2Fs Project (Alkhalifah et al. 2018) are available via public release on Cyverse Discovery Environment (2014 (https://www.doi.org/10.25739/9wjm-eq41), 2015 (https://www.doi.org/10.25739/kjsn-dz84), 2016 (https://www.doi.org/10.25739/yjnh-kt21). Supplementary files and tables are uploaded to the GSA Figshare portal. Supplementary File S1 contains detailed supplemental methods. Supplementary File S2 contains R code for combining and filtering trait data. Supplementary File S3 contains R code for processing weather data. Supplementary File S4 contains correlations between data obtained from the in-field Spectrum Watchdog weather stations and data scraped from the Iowa Environmental Mesonet database. Supplementary File S5 contains weather variables (in original units) averaged over 5-day periods for each environment. Supplementary File S6 contains R code for Stage 1 trait analysis. Supplementary File S7 contains hybrid best linear unbiased estimators (BLUEs) for all traits within each environment. Supplementary File S8 is a sample Echidna code for models utilizing IDVG variance structures; Supplementary File S9 is a sample Echidna code for models utilizing the A Matrix; File S10 is a sample Echidna code for models utilizing the D Matrix; and Supplementary File S11 is a sample Echidna code for models utilizing both the A and D matrices. Supplementary File S12 has environment BLUEs (averaged across all hybrids) for all traits; Supplementary File S13 has hybrid BLUEs (averaged across all environments) for all traits. Supplementary File S14 contains hybrid cluster assignments and A and D diagonal values. Supplementary File S15 contains variance components estimates for each trait and model fit criteria. Supplementary File S16 has factor loadings of each environment variable from the FA(10) model applied to 30-day period weather data. Supplementary File S17 contains as list of hybrid names in order of rows and columns of the realized genomic relationship matrices. Supplementary File S18 contains the realized additive relationship matrix (A) in TASSEL output format; Supplementary File S19 contains the realized additive relationship matrix (D) in TASSEL output format.

Supplementary material is available at fighsare DOI: https://doi.org/10.25387/g3.12636095.

Results and discussion

Genomic relationships

The diagonal values of the realized additive genomic relationship matrix (A) are centered at 0.93 (Min=0.47, Max=1.81) (Figure 1A). These values represent estimates of 1 + F, where F is the genomic-estimated inbreeding coefficient. Therefore, the entire set of hybrids has an average inbreeding coefficient near zero, with some hybrids having negative inbreeding coefficients, which is possible when inbreeding coefficients are defined in terms of allelic correlations (Powell et al. 2010). The inbreeding coefficients less than zero represent hybrids from more distant than average pairs of parents, resulting in greater heterozygosity throughout the genome than expected under random mating. In contrast, several hybrids with large inbreeding coefficients were derived from crosses between lines derived from a common breeding program (University of Guelph, Canada; CG).

Distribution of elements of the realized additive (A) and dominance (D) relationship matrices. (A) Histogram of the diagonal elements of A. (B) Histogram of the off-diagonal elements of A. (C) Histogram of the diagonal elements of D. (D) Histogram of the off-diagonal elements of D.
Figure 1

Distribution of elements of the realized additive (A) and dominance (D) relationship matrices. (A) Histogram of the diagonal elements of A. (B) Histogram of the off-diagonal elements of A. (C) Histogram of the diagonal elements of D. (D) Histogram of the off-diagonal elements of D.

The off-diagonal elements of A are centered at −0.0005 (Min= -0.53, Max=1.21) and have both a right skew and a small second mode (Figure 1B). The larger peak centered at 0 is expected in a population of unrelated individuals, while the smaller peak at 0.45 is indicative of the family structure arising from the mating design of the G2F hybrids. The panel of hybrids included groups of largely unrelated lines crossed to common inbred testers, resulting in many pairs of F1 hybrids with additive relationships near 0.5 (Supplementary Figure S1).

The diagonal values of the realized dominance genomic relationship matrix (D) are centered at 0.99 (Min = 0.47, Max = 1.35) (Figure 1C). This is very close to the value of 1 expected for a Hardy-Weinberg population, and less than the value expected for crosses mainly performed between unrelated heterotic groups, as is commonly done in commercial maize breeding. The group of hybrids tested includes crosses within as well as among known heterotic groups, resulting in varying levels of heterozygosity among the hybrids. The values greater than one in the distribution of diagonal values indicate that the individuals have higher dominance covariance with themselves than expected by random chance, due to the mating of more unrelated parents than expected by chance. The off-diagonal values of D are centered at 0.007 (Min= -0.17, Max=1.09) and are right skewed (Figure 1D). This right skew comes from pairs of hybrids where individuals from a single family were crossed to a common tester.

The A and D matrices are correlated (Supplementary Figure S2). The correlations between off-diagonal values of the two matrices is r =0.83 and between diagonal values is r =0.54. These correlations demonstrate that the A and D matrices are not orthogonal, which implies that we cannot perfectly separate the additive from dominance effects in this data set.

To identify genetic groupings among the hybrids alone, hybrids were clustered based on their genotype data using Ward’s method. At 10 clusters, the within-between variance ratio is 0.88, indicating substantial heterogeneity remains within clusters (Figure 2; Supplementary Table S4; Supplementary File S14). A larger number of clusters may be useful in creating more homogeneous clusters for description of population structure, but even at 50 clusters, the within-between ratio is equal to 0.78 indicating that to create more homogeneous clusters, a very large number of clusters would need to be used. The first and second principal components of the combined parental inbred and hybrid progeny genotype data account for 5.75% and 3.46% of variance observed in the marker data, respectively (Figure 2), indicating the complexity of genetic relationships among these genotypes. The first 40 PCs account for 43.6% of genetic variance across all parents and progeny.

Plot of first two principal components of marker data on hybrids and their inbred parents analyzed as a common dataset. Parents are colored dark purple, hybrids are colored according to the cluster they belong to, based on a separate principal components analysis of the marker data of the hybrids alone. The number of genotypes in each cluster is indicated in the legend.
Figure 2

Plot of first two principal components of marker data on hybrids and their inbred parents analyzed as a common dataset. Parents are colored dark purple, hybrids are colored according to the cluster they belong to, based on a separate principal components analysis of the marker data of the hybrids alone. The number of genotypes in each cluster is indicated in the legend.

Similarity of hybrids tested across environments

The assignment of hybrids to testing environments was highly unbalanced, primarily because more hybrids were evaluated overall than could be evaluated within any one environment due to resource limitations. Among the 65 environments, environment pairs ranged from sharing a single hybrid in common to sharing 352 hybrids (Supplementary Figure S3). We also computed the mean additive genomic relatedness of individuals between pairs of environments as a measure of similarity of genetic materials tested among environments. Mean additive relationships within and between environments ranged from 0 to 0.3. Genomic relationships of materials tested across sites was higher within 2014 because the entries that year represented a smaller, more closely related set of hybrids compared to the complete set of hybrids.

Within-environments (stage 1) trait analysis

Environment mean yield varied significantly among environments, ranging from 5.2 to 13.0 Mg ha−1 (Figure 3A). Patterns of locations with similar yield performance across the three years can be observed: locations in Western Nebraska (NEH2 and NEH3) that experienced low water availability consistently had lower yields, while Iowa and Ontario locations were ranked in the upper half of environments in all years. Residual error variances for yield varied dramatically across environments, reflecting substantial differences in precision of BLUEs for yields among environments (Figure 3B).

Results of stage 1 trait analysis. A. Boxplot distribution of hybrid BLUEs for grain yield, environment mean is marked by a black line. B. Histogram of the within-environment error variance for grain yield estimated from the selected model from within each environment in stage 1 analysis.
Figure 3

Results of stage 1 trait analysis. A. Boxplot distribution of hybrid BLUEs for grain yield, environment mean is marked by a black line. B. Histogram of the within-environment error variance for grain yield estimated from the selected model from within each environment in stage 1 analysis.

Additive, dominance, and G × E variance modeling across environments (stage 2)

Stage-two analyses were performed using the hybrid yield BLUEs estimated within environments to estimate genetic main effect and G × E interaction variances. Models of varying complexity for both genetic relationships among the hybrids and for genetic correlations among environments were fit. Hybrid effects were modeled as independent (IDVG), or with covariances proportional to the additive (A) or dominance (D) realized relationships. On the environment side, we fit cross-classified models that assume a constant genetic variance within environments and a constant genetic correlation among all pairs of environments, and FA(1) and FA(2) models that allow each environment to have a unique within-environment genetic variance and each pair of environments to have a unique genetic correlation using either one or two factors, respectively. We combined as many types of genetic models with as many types of environmental covariance models as possible, but models with more than one FA factor for environments in combination with either A or D relationships for hybrids did not converge, nor did models with one FA factor for environments in combination with both A and D hybrid relationships. Models were compared using BIC and computational time. Models with a good fit but very long runtime may be impractical for use in a breeding program.

Computation times were shortest for models with IDVG hybrid relationships (Table 1). The best model (smallest BIC) in this category was the FA(1) model, corresponding to genetic covariances between environments modeled as cross-products of loadings on a single latent variable. Selection of the FA(1) model as best suggests that genetic correlations vary significantly among environment pairs. The pairwise genetic correlations across environments estimated by this model ranged from r =0.036 to r =0.955 with a mean of r =0.497 and the within-environment genetic variance ranged from 0.076 to 7.356 with a mean of 1.815 (Supplementary Table S5). BIC for the FA(2) model increased compared to the FA(1) model, indicating that the improvement in likelihood by addition of 65 additional parameters going from FA(1) to FA(2) was less than the penalty for adding those factors. This trend continued with the FA(3) model, which had higher BIC (poorer fit) than the FA(2) model. Therefore, we did not attempt to fit models with more than three factors.

All models including A or D relationships among the hybrids had lower (better) BIC than the best IDVG but also had increased computational times by one or more orders of magnitude (Table 1). These types of models also can be used to predict untested hybrids that can be included in the genomic relationship matrices if their genomic data are available. Within the A and D model classes, fitting independent hybrid effects nested within environments (with heterogeneous variances among environments), for example, E+ A(E) × FA1+GI(Ediag), but not including independent hybrid main effects, always resulted in a substantial inflation in the additive or dominance genetic main effect variance component. This occurs because the combination of omitting hybrid main effects and fixing the residual variance (at the variances of the BLUEs previously computed in stage 1 analysis) results in some of the independent genetic effects being absorbed into estimated additive effects. The variance components estimates from this class of model were not reliable.

Models incorporating D relationships had better fit than the corresponding models incorporating A relationships only (Table 1), even though the dominance variance components estimates are always smaller than the corresponding additive variance components estimates. For example, in the simplest cross-classified model in each case, the additive main effect variance component was 0.808 and the dominance variance component estimate was 0.533. The scale of the two relationship matrices was similar, although the smaller average diagonal value of the A matrix would tend to be associated with larger estimated variance components if the two components caused equal amounts of variation. Our results suggest that although the additive variance component estimate was larger than the dominance variance component estimate (as is typical across a very wide range of genetic architectures) (Hill et al. 2008; Huang and Mackay 2016), it may be more important to account for dominance genomic relationships in this set of highly heterozygous hybrids for genomic prediction for yield.

Fitting the additive and dominance relationships cross-classified with environments together produced the best model BIC (Table 1). The variance components for additive, dominant, and independent genetic effects were 0.513, 0.250, and 0.102, respectively. This is congruent with the larger variance components for additive than dominance effects when they were fit separately, although the magnitudes of the additive and dominance variance components are greater when they are fit separately because they account for some of the same variation due to the correlation between the two relationship matrices. The corresponding G × E variance components from this model were 0.532, 0.453, and 0.148 (Figure 4A). Remarkably, for yield all the genetic-by-environment variance components were greater than their corresponding genetic main effect variance components, reflecting the wide diversity of environments that the hybrids were tested in.

Variance components estimates from stage 2 (across environments) analysis of all traits. (A) Proportion of total variance due to additive (A), dominance (D), and uncorrelated (independent) genetic effects, and their respective interactions with environment. (B) Proportions of total variance due to environment main effects, total genotype effects, genotype-by-environment interaction, and residual variance.
Figure 4

Variance components estimates from stage 2 (across environments) analysis of all traits. (A) Proportion of total variance due to additive (A), dominance (D), and uncorrelated (independent) genetic effects, and their respective interactions with environment. (B) Proportions of total variance due to environment main effects, total genotype effects, genotype-by-environment interaction, and residual variance.

The optimal structure for modeling G × E interactions varied among different classes of genetic relationships. The FA(1) model that allows genetic variances to vary among environments and genetic correlations to vary among pairs of environments was best within the IDVG and D classes of genetic relationship models. The combination of fitting dominance genetic relationships and FA(1) G × E effects revealed that genetic effects on yield were positively correlated among almost all environments except for two exceptions that had no obvious geographic or climate patterning (Figure 5). Higher-yielding and higher-heritability environments tended to have higher genetic correlations with each other (Figure 5). The cross-classified model, which assumes a constant genetic variance within environments and a constant pairwise genetic correlation between environments was best in combination with the A relationship matrix.

Genetic correlations for yield among environments based on the D×FA(1) model. Mean grain yield (Mg ha−1) and entry mean-basis heritability for grain yield are also displayed for each environment.
Figure 5

Genetic correlations for yield among environments based on the D×FA(1) model. Mean grain yield (Mg ha−1) and entry mean-basis heritability for grain yield are also displayed for each environment.

The same set of models was fit for each trait available in the G2F dataset (Supplementary File S15). For all traits, environmental main effect variance was the single largest contributor to phenotypic variation (Figure 4B). The ratio of σG2 to σG×E2 varied among traits, with yield and lodging having the greatest proportions of G × E and flowering time traits (DTA and DTS) having the lowest amount of G × E variance (Figure 4, B). Decomposing G and G × E variances revealed that the influence of A, D, GI, and their respective G × E components also varied among traits (Figure 4, A). Phenotypic variance for yield was partitioned into 11.2% genetic (A,D, GI) variance and 14.7% G × E (A×E, D×E, I×E) variance. G and G × E variances represent similar proportions of phenotypic variance for grain moisture and test weight. Additive genetic variance was most important for plant height, ear height, DTA, and DTS, with little additional contribution from G × E variance. The relative importance of genotypic main effect versus G × E variances are congruent with previous estimates in maize: flowering time and height variation are mostly affected by additive genetic variance with little G × E variance (Buckler et al. 2009; Romay et al. 2013; Peiffer et al. 2014), whereas maize yield tends to be influenced by heterosis, and consequently, dominance effects, with substantial contribution from G × E effects (Comstock and Robinson 1948; Hallauer et al. 2010; Baldauf et al. 2018).

Factor analysis of environmental data

Factor analysis was performed on the centered and scaled 30-day window weather data using 10 factors that accounted for 85.6% of variation observed in the data (Supplementary File S16). The first factor appears to be largely influenced by late season photoperiod and high temperature; environments with warmer late season temperatures and greater late season sunlight availability have higher positive scores on this factor. Environments from the southeast tended to have larger positive scores for the first factor, as they have longer average day length and higher temperature in the later parts of the growing season than their northern counterparts. The most northern environments tended to have negative scores as daylength and temperature both drop dramatically during the later season in such environments (Supplementary Figure S4A). Early season daylength loads negatively on this factor, therefore environments (such as NY and MN) with long early season day length but short late season photoperiod would receive a more negative score on factor one. The second factor has high positive loadings for early season temperature, and negative loadings for mid-late growing season photoperiod length, while the third factor has high loadings for mid-season temperatures and negative loadings for the presence of clay in the soil. Factor four has high positive loadings for all wind covariates, and negative loadings for the presence of high humidity and sandy soils.

Environment clustering

Environmental clustering was performed using Ward’s minimum variance criterion (Ward 1963) for agglomerative clustering on environmental scores from factor analysis, with seven environmental clusters selected [within-between (WB) variance ratio is 0.68; Figure 6; Supplementary Figure S5]. Geographic patterning is apparent between clusters (Figure 6), and although year-to-year variation in environmental data is present, the variance among experimental environments appears to be largely influenced by geographic area: environments representing multiple years at a common location tended to cluster together. The null hypothesis of equal representation of sites among clusters is rejected at P <10−12 using a chi-square test, whereas the null hypothesis of equal representation of years among clusters was not rejected (P =0.17). There are some exceptions to the geographic clustering, however; for example in 2016, the two experiments in Wisconsin are placed in different clusters because they were planted more than two weeks apart (WIH1_2016 planted on May 9, WIH2_2016 planted on May 24), and exposed to different weather conditions relative to the developmental stages of the hybrids. A Midwest cluster in Iowa and Illinois not observed in previous years appears in 2016, likely due to these locations being planted much earlier than the nearby Midwestern locations of Wisconsin and Minnesota.

Geographic patterns of weather variable-based clustering. Each site within a year is colored according to its weather pattern cluster. (A) Locations in 2014. (B) Locations in 2015. (C) Locations in 2016.
Figure 6

Geographic patterns of weather variable-based clustering. Each site within a year is colored according to its weather pattern cluster. (A) Locations in 2014. (B) Locations in 2015. (C) Locations in 2016.

All three years of experiments at TXH1 in College Station, Texas group in a common cluster characterized by high temperatures throughout the season compared to the average G2F environment, high GDD, and short photoperiod at the beginning of the growing season (Figure 6). The other Texas location (TXH2 in Halfway, TX) is in the plains nearer to the panhandle of Oklahoma and has much drier conditions than those experience closer to the Gulf of Mexico. In terms of overall geography, the clusters broadly account for a Southeastern environment group (cluster 1: Southeast, purple), a somewhat disparate temperate environment group in the Northeast and part of the Midwest (cluster 2: Temperate Northeast and Midwest, blue), a wet Midwestern environment group that appears only during 2016 (cluster 3: Cornbelt Early Planting 2016, teal), a far northern group (cluster 4: Northern, light green), a southern Cornbelt group (cluster 5: Southern Cornbelt, orange), a dry plains group (cluster 6: High Plains, light orange), and a Central Texas environment (cluster 7: Central Texas, yellow). These clusters capture broad mega-environment groupings but substantial variation remains within the clusters (within-between ratio = 0.68). The clustering dendrogram (Supplementary Figure S4B) shows locations that are geographically close tending to merge together quickly.

When the weather-defined clusters are introduced as a factor in the linear mixed model for yield, the fixed effect for cluster was nearly significant (p =0.065), suggesting that environment clusters are associated with some of the environment main effects on yield. In this model, the G × E variance is partitioned into a component between clusters (Cluster*Hybrid interaction) and a component within clusters (Hybrid*EnvCluster interaction). About 12% of the G × E variance for yield is explained by the variance between clusters based on environmental data, with the remaining G × E variance occurring among environments within clusters (Supplementary Table S6). Thus, although the weather-based environment clusters do not explain most of the G × E variance, they do appear to account for significant portions of the environment main effects and G × E variances, suggesting some relationship between weather and the observed yield G × E patterns. Environment clustering based on similarity of growing conditions could be used to define mega-environments (ME) to more efficiently allocate resources in multienvironment trials (Rajaram et al. 1994; Krishnamurthy et al. 2017; Gerrish et al. 2019; González-Barrios et al. 2019). Similarities of conditions experienced in environments provide a basis for sharing of information that can lead to increased accuracy in predictive studies and may help plant breeders understand sources of G × E in multienvironment trials (Burgueño et al. 2012; Heslot et al. 2014; González-Barrios et al. 2019; Millet et al. 2019; Monteverde et al. 2019).

Environmental covariates related to yield factor

A summary of broad patterns of interactions between hybrids and environments can be obtained by comparing the mean environment loadings for the seven weather-defined environment clusters and mean genotype scores for the 10 hybrid clusters defined by marker data on the yield factor estimated by the D×FA1 model (Supplementary Figure S6). The highest yielding group of hybrids is the “ex-PVP” cluster comprising mostly crosses between commercial inbreds with expired plant variety protection (ex-PVP) and crosses between mixed background lines and the tester inbreds LH185, LH195, and PB80 (Supplementary Figure S6; Table S4). This group of hybrids also has the highest mean score on the yield factor, indicating that its mean relative yield advantage is increased in the environments with higher positive loadings, which are the most northern and have the coolest early-season and average temperatures and highest yields (Supplementary Figure S6). The superiority of these hybrids decays in the environments with smaller factor loadings (which are farther south geographically). Several groups of hybrids (mostly involving hybrids with at least one noncommercial parent) had negative mean scores on the factor (Supplementary Figure S6); we would expect rank changes between these hybrids and the other hybrid groups when moving from typical environments to those with negative loadings, but few such environments were included in this experiment. Nevertheless, rank changes occur in these data, because the factor only explains part of the observed variation: site-specific genetic and independent genetic effects also contribute to the observed yields. The results also suggest the potential of exotic germplasm to enhance hybrid yields in temperate environments: the hybrid cluster with contribution from the tropical inbred line Ki3 from Thailand had both positive mean scores on the factor and higher than average mean yield across all environments (Supplementary Figure S6).

We can also examine the relationships between the statistical components quantifying G × E and the environmental inputs experienced throughout plant development using forward stepwise regression models to model each component of the A×FA1 and D×FA1 models. For the 5-day window set, the environment mean yields, A×FA(1) parameters, and D×FA(1) model parameters were modeled using a maximum of 15 factors (Supplementary Table S7). Five-day windows consistently lead to the highest explanation of variance in each parameter type, indicating that higher resolution was helpful in explaining both variance in environmental means (R2= 0.84) and in G × E parameters (R2= 0.68, R2=0.75) for both the A and the D models, respectively (Supplementary Table S7). Most models using environmental Ψs as response fit only an intercept and in cases where weather covariates were included, they were not consistent across sliding window sets (Supplementary Table S7). The random covariates fit for the Ψs between windowed datasets had no patterning and explained little variance, likely these covariates are selected largely by chance. These results indicate that site-specific variances were not significantly correlated with any patterns of environmental covariates.

The overall best fitting model for the environment mean yields utilized the 5-day window set and included the maximum allowed 15 predictors (R2=0.84). The model uses four temperature-associated covariates, four terms accounting for rainfall, three terms accounting for humidity, photoperiod near the reproductive stage, early season wind, and the amount of sand in the soil to explain variance in the environmental means (Supplementary Table S8). Two of the four temperature covariates were positively associated with and two negatively associated with environment mean yield. The two temperature covariates negatively associated with mean yield were during the vegetative to reproductive transition and peak flowering time, suggesting that heat stress at these critical periods of development can lead to decreased yield (Lobell et al. 2013; Millet et al. 2019). Increased photoperiod around flowering time was associated with increased environment mean yield, this may be due to the combined effects of more photosynthetically active radiation available in those environments and the generally better soil and growing conditions at the more northern locations. All of the hybrids tested in this study have at least one temperate-adapted parent, such that photoperiod sensitivity (which could have caused yield reductions under long day lengths) was not a problem for these hybrids. Rainfall accumulation generally had a positive influence on environment mean yield if rains occurred during early development and a negative association with yield for late season rains during grain fill.

Similarly, the best fitting model for the λ1 values of the dominance FA model used the five-day window set and the maximum 15 covariates (R2=0.68). This model contained four rain associated terms, three GDD terms, and one temperature term, three terms associated with solar irradiance, three humidity terms, and one wind term (Supplementary Table S7). Individual weather variables had a mix of positive and negative effects that changed depending on growing season period, indicating the complexity of the G × E structure in this data set (Supplementary Figure S7). Increases in weather covariates positively associated with the yield factor λ1 are associated with larger positive covariances (more similar relative hybrid performance) among environments with positive loadings and reduced covariances involving environments with negative loadings (there is only one such environment, GAH2_2016). GDD was positively associated with λ1 in two time periods during the vegetative to reproductive transition and flowering time, indicating that positive covariances between some environments is partly due to shared higher than average values of GDD during these two periods. The negative value associated with GDD right after flowering time indicates a difference between environments that remain hot after flowering time and those that begin to cool after flowering time. Similarly, solar radiation was either positively or negatively associated with the λ1 value, depending on the period of the growing season. As was the case for the environment mean yields, high temperature around peak flowering time was associated with a decreased value of λ1, contributing to decreased covariance between heat-stressed environments and their non-stressed counterparts. Rain accumulation around flowering time was associated with a decrease in the value of λ1 while increased rain in days 6–10 after planting was associated with an increased λ1 value.

Although maize hybrids can manifest plastic responses to environmental stimuli throughout their development, responses at short time scales may not contribute greatly to observed differences in phenotype, whereas persistent or recurrent periods of stressful environmental conditions are likely to have a more drastic effect on development and yield (Heffner et al. 2009; Gage et al. 2017). For example, short periods under drought stress may not have severe effects on phenotypic outcomes, but prolonged periods lead to decreased fitness across multiple phenotypes (Lasky et al. 2015; Adee et al. 2016; Gage et al. 2017; Gerrish et al. 2019). Periods of environmental stress during critical growth periods likely also have a larger effect on phenotypic outcome, as indicated by examples in flowering time where heat stress and drought conditions lengthen the anthesis-silking interval (ASI) (Song et al. 2017) and that drought during flowering time can cause significant decreases in yield (Salvi et al. 2007). Optimal window size is challenging to determine; crop modeling and physiology can inform genotype-dependent definition of time periods where plant development is particularly sensitive to environmental factors. With sufficient information about the physiological development of each hybrid in the test, such hybrid-specific modeling of environmental effects can substantially improve environment-specific predictions (Millet et al. 2019).

Implications for genomic prediction

We performed 10-fold cross-validation to compare the ability of different models to predict hybrid marginal yield values (across all environments). Models including dominance genomic relationships substantially improved prediction ability (by 7–10 percentage points) compared to additive relationships and fitting both relationship matrices together had only little effect (Supplementary Table S9). This result supports the greater importance of dominance than additive genetic variance in this set of hybrids. Modeling the covariance of yield performance across environments using the FA1 model, however, resulted in a slightly lower prediction ability for the marginal values than the simpler cross-classified models (Supplementary Table S9). This suggests that more complex G × E models are not generally useful for predicting genotype marginal values. We suggest that the FA model formulations for multi-environment trials are still informative for understanding environment relationships, and further may provide improved prediction capability for environment-specific performance, in particular for prediction of performance in new environments that are not represented in a training set. The success of this approach would rely on (1) a FA model with a small number of factors capturing a very large proportion of the genetic covariance across environments, and (2) sufficiently large samples of environments paired with robust environmental indices that can accurately predict factor loadings for new environments. Using weather and soil data to predict environment factor loadings and genomic relationship matrices to predict genotypic scores, new untested combinations of environments and genotypes could be predicted.

Conclusions

The G2Fs project coordinates an extensive testing network with data useful for quantifying and exploring G × E interactions, but also provides challenges to how large data sets including data sets contributed by many different collaborators are processed. As extensive genomic and environmental data become increasingly available to breeding programs, one of the many challenges that researchers will face is how to effectively utilize data to improve breeding efficiency. While methodology for processing genomic data for use in both plant and animal breeding has rapidly developed over the last two decades, implementation of such methods for environmental data that would standardize input into GS type models has lagged behind and tends to remain the realm of crop growth modeling (Millet et al. 2019). Estimation of crop growth modeling parameters for the many thousands of unique lines in a typical maize breeding program at any one time will be difficult because of the extensive field-testing resources required. Utilizing field conditions directly as parameters in modeling the environment and G × E will allow representation of environments in a way that would allow for simpler implementation and for prediction in untested environments. Recent years have seen implementation of environmental covariates into some GS models, but these have usually focused on a few covariates such as temperature defined over a specific developmental window (Jarquín et al. 2014). As climate change threatens to significantly change many environments currently used for agricultural production, development of methods that use the available data resources for predicting performance in future environments will be important.

Our results demonstrate that G × E variance for yield was as large or larger than corresponding genotypic main effect variance when maize hybrids are tested across a wide geographic range in the United States and Canada. For other traits, such as flowering time, G × E variance accounts for only a small portion of phenotypic variance, indicating that it may be safely ignored when conducting selection for these traits. Understanding the relative importance of genotypic main effects compared to G × E interactions for a given trait can allow breeders to make effective decisions over when to implement more complex models to account for G × E, and when a simpler model may perform well for making selections. Our results suggest that, for genomic prediction of environment-specific hybrid yield performance predictions, dominance, and G × E interactions should be incorporated into prediction models, and that the measured environmental covariates may help to improve such predictions.

Acknowledgments

The authors would like to thank Alejandro Castro Aviles, Naser Alkhalifah, Emily Rothfusz and Jane Petzoldt from the University of Wisconsin, Madison for assisting with coordination of the project; and Dustin Eilert and Marina Borsecnik for assistance organizing and conducting field trials at the University of Wisconsin, Madison.

Funding

This project was supported by the Iowa Corn Promotion Board, Nebraska Corn Board, Minnesota Corn Research and Promotion Council, Illinois Corn Marketing Board, Wisconsin Corn Promotion Board, Arkansas Corn & Grain Sorghum Board, Kansas Corn Commission, Corn Marketing Program of Michigan, Texas Corn Producers, the National Corn Growers Association, USDA Hatch program funds to multiple PIs in this project, University of Georgia startup funds, and the USDA Agricultural Research Service.

Conflicts of interest: None declared.

Literature cited

Adee
E
,
Roozeboom
K
,
Balboa
GR
,
Schlegel
A
,
Ciampitti
IA.
2016
.
Drought-tolerant corn hybrids yield more in drought-stressed environments with no penalty in non-stressed environments
.
Front Plant Sci
.
7
:
1
9
.

Ågren
J
,
Oakley
CG
,
McKay
JK
,
Lovell
JT
,
Schemske
DW.
2013
.
Genetic mapping of adaptation reveals fitness tradeoffs in Arabidopsis thaliana
.
Proc Natl Acad Sci USA
.
110
:
21077
21082
.

Alkhalifah
N
,
Campbell
DA
,
Falcon
CM
,
Gardiner
JM
,
Miller
ND
, et al.
2018
.
Maize genomes to fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets
.
BMC Res Notes
.
11
:
1
5
.

Anholt
RRH
,
Mackay
TFC.
2004
.
Quantitative genetic analyses of complex behaviours in drosophila
.
Nat Rev Genet
.
5
:
838
849
.

Baldauf
JA
,
Marcon
C
,
Lithio
A
,
Vedder
L
,
Altrogge
L
, et al.
2018
.
Single-parent expression is a general mechanism driving extensive complementation of non-syntenic genes in maize hybrids
.
Curr Biol
.
28
:
431
437
.

Bian
Y
,
Yang
Q
,
Balint-Kurti
PJ
,
Wisser
RJ
,
Holland
JB.
2014
.
Limits on the reproducibility of marker associations with southern leaf blight resistance in the maize nested association mapping population
.
BMC Genomics
.
15
:
1068
1015
.

Buckler
ES
,
Holland
JB
,
Bradbury
PJ
,
Acharya
CB
,
Brown
PJ
, et al.
2009
.
The genetic architecture of maize flowering time
.
Science
.
325
:
714
718
.

Burgueño
J
,
de los Campos
G
,
Weigel
K
,
Crossa
J.
2012
.
Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers
.
Crop Sci
.
52
:
707
719
.

Bustos-Korts
DV
,
Boer
MP
,
Malosetti
M
,
Chapman
SC
,
Chenu
K
, et al.
2019
.
Combining crop growth modeling and statistical genetic modeling to evaluate phenotyping strategies
.
Front Plant Sci
.
10
:
1
21
.

Butler
D
,
Cullis
BR
,
Gilmour
AR
,
Gogel
BJ.
2009
.
ASReml–R Reference Manual Version 3
.
Hemel Hempstead, HP1 1ES, UK
:
VSN Int. Ltd
.

Cavanagh
CR
,
Chao
S
,
Wang
S
,
Huang
BE
,
Stephen
S
, et al.
2013
.
Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars
.
Proc Natl Acad Sci USA
.
110
:
8057
8062
.

Chapman
SC.
2008
.
Use of crop models to understand genotype by environment interactions for drought in real-world and simulated plant breeding trials
.
Euphytica
.
161
:
195
208
.

Comstock
RE
,
Moll
RH.
1963
.
Statistical Genetics and Plant Breeding; a Symposium and Workshop Sponsored by the Committee on Plant Breeding and Genetics of the Agricultural Board
. In:
Hanson
WD
,
Robinson
HF
, editors. Genotype-environment interactions.
NAS-NRC
:
Washington D.C
. pp.
164
196

Comstock
RE
,
Robinson
HF.
1948
.
The components of genetic variance in populations of biparental progenies and their use in estimating the average degree of dominance
.
Biometrics
.
4
:
254
266
.

Cooper
MD
,
DeLacy
IH.
1994
.
Relationships among analytical methods used to study genotypic variation and genotype-by-environment interaction in plant breeding multi-environment experiments
.
Theoret Appl Genet
.
88
:
561
572
.

Cooper
MD
,
Messina
CD
,
Podlich
D
,
Radu Totir
L
,
Baumgarten
A
, et al.
2014
.
Predicting the future of plant breeding : complementing empirical evaluation with genetic prediction
.
Crop Pasture Sci
.
65
:
311
336
.

Crossa
J
,
Pérez-Rodríguez
P
,
Cuevas
J
,
Montesinos-López
OA
,
Jarquín
D
, et al.
2017
.
Genomic selection in plant breeding: methods, models, and perspectives
.
Trends Plant Sci
.
22
:
961
975
.

Cullis
BR
,
Smith
AB
,
Coombes
NE.
2006
.
On the design of early generation variety trials with correlated data
.
JABES
.
11
:
381
393
.

Endelman
JB
,
Jannink
J.
2012
.
Shrinkage estimation of the realized relationship matrix
.
G3 Genes, Genomes, Genet
.
2
:
1405
1413
.

Endelman
JB
,
Schmitz Carley
CA
,
Bethke
PC
,
Coombs
JJ
,
Clough
ME
, et al.
2018
.
Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato
.
Genetics
.
209
:
77
87
.

Falconer
DS
,
Mackay
TFC.
1996
.
Introduction to Quantitative Genetics
,
4
th edn.
Longman, Essex
:
Pearson
.

Farjat
AE
,
Chamblee
AK
,
Isik
F
,
Whetten
RW
,
McKeand
SE.
2017
.
Variation among loblolly pine seed sources across diverse environments in the Southeastern United States
.
For Sci
.
63
:
39
48
.

Fisher
RA.
1919
.
The correlation between relatives on the supposition of Mendelian inheritance
.
Trans R Soc Edinb
.
52
:
399
433
.

Fournier-Level
A
,
Korte
A
,
Cooper
MD
,
Nordborg
M
,
Schmitt
J
, et al.
2011
.
A map of local adaptation in Arabidopsis thaliana
.
Science
.
334
:
86
89
.

Gage
J
,
Jarquín
D
,
Romay
MC
,
Lorenz
A
,
Buckler
ES
, et al.
2017
.
The effect of artificial selection on phenotypic plasticity in maize
.
Nat Commun
.
8
:
1348
.

Gerrish
BJ
,
Ibrahim
AMH
,
Rudd
JC
,
Neely
C
,
Subramanian
NK.
2019
.
Identifying mega-environments for hard red winter wheat (Triticum aestivum L.) production in Texas
.
Euphytica
.
215
:
1
9
.

Gilmour
AR.
2018
. Echidna Mixed Model Software. In Proceedings of the World Congress on Genetics Applied to Livestock Production, Auckland, New Zealand.

Gilmour
AR
,
Gogel
BJ
,
Cullis
BR
,
Welham
SJ
,
Thompson
R.
2015
.
ASReml User Guide Release 4.1 Structural Specification
.
Hemel Hempstead, UK
:
VSN International Ltd
.

González-Barrios
P
,
Díaz-García
L
,
Gutiérrez
L.
2019
.
Mega-environmental design: using genotype × environment interaction to optimize resources for cultivar testing
.
Crop Sci
.
59
:
1899
1915
.

Guo
T
,
Yu
X
,
Li
X
,
Zhang
H
,
Zhu
C
, et al.
2019
.
Optimal designs for genomic selection in hybrid crops
.
Mol Plant
.
12
:
390
401
.

Hallauer
A
,
Carena
M
,
Filho
JBM.
2010
.
Quantitative Genetics in Maize Breeding
.
New York
:
Springer
.

Heffner
EL
,
Jannink
J-L
,
Iwata
H
,
Souza
E
,
Sorrells
ME.
2011a
.
Genomic selection accuracy for grain quality traits in biparental wheat populations
.
Crop Sci
.
51
:
2597
2606
.

Heffner
EL
,
Jannink
J-L
,
Sorrells
ME.
2011b
.
Genomic selection accuracy using multifamily prediction models in a wheat breeding program
.
Plant Genome
.
4
:
65
75
.

Heffner
EL
,
Sorrells
ME
,
Jannink
J-L.
2009
.
Genomic selection for crop improvement
.
Crop Sci
.
49
:
1
12
.

Heslot
N
,
Akdemir
D
,
Sorrells
ME
,
Jannink
J-L.
2014
.
Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions
.
Theor Appl Genet
.
127
:
463
480
.

Hill
WG
,
Goddard
ME
,
Visscher
PM.
2008
.
Data and theory point to mainly additive genetic variance for complex traits
.
PLoS Genet
.
4
:
e1000008
.

Holland
JB.
2001
.
Epistasis and plant breeding
.
Plant Breed Rev
.
21
:
27
95
.

Howard
R
,
Gianola
D
,
Montesinos-López
OA
,
Juliana
P
,
Singh
RP
, et al.
2019
.
Joint use of genome, pedigree and their interaction with environment for predicting the performance of wheat lines in new environments
.
G3 (Bethesda)
.
9
:
2925
2934
.,

Huang
W
,
Mackay
TFC.
2016
.
The genetic architecture of quantitative traits cannot be inferred from variance component analysis
.
PLoS Genet
.
12
:
e1006421
.

Isik
F
,
Holland
JB
,
Maltecca
C.
2017
.
Genetic Data Analysis for Plant and Animal Breeding
.
Springer, New York
.

Isik
F
,
McKeand
SE.
2019
.
Fourth cycle breeding and testing strategy for Pinus taeda in the NC State University Cooperative Tree Improvement Program
.
Tree Genet Genomes
.
15
:
70
.

Jannink
J-L
,
Lorenz
AJ
,
Iwata
H.
2010
.
Genomic selection in plant breeding: From theory to practice
.
Brief Funct. Genomics Proteomics
.
9
:
166
177
.

Jarquín
D
,
Crossa
J
,
Lacaze
X
,
Du Cheyron
P
,
Daucourt
J
, et al.
2014
.
A reaction norm model for genomic selection using high-dimensional genomic and environmental data
.
Theor Appl Genet
.
127
:
595
607
.

Juliana
P
,
Montesinos-López
OA
,
Crossa
J
,
Mondal
S
,
González Pérez
L
, et al.
2019
.
Integrating genomic-enabled prediction and high-throughput phenotyping in breeding for climate-resilient bread wheat
.
Theor Appl Genet
.
132
:
177
194
.,

Kadam
DC
,
Potts
SM
,
Bohn
MO
,
Lipka
AE
,
Lorenz
AJ.
2016
.
Genomic prediction of single crosses in the early stages of a maize hybrid breeding pipeline
.
G3 (Bethesda)
.
6
:
3443
3453
.

Krishnamurthy
SL
,
Sharma
PC
,
Sharma
DK
,
Ravikiran
KT
,
Singh
YP
, et al.
2017
.
Identification of mega-environments and rice genotypes for general and specific adaptation to saline and alkaline stresses in India
.
Sci Rep
.
7
:
1
14
.

Lasky
JR
,
Upadhyaya
HD
,
Ramu
P
,
Deshpande
S
,
Hash
CT
, et al.
2015
.
Genome-environment associations in sorghum landraces predict adaptive traits
.
Sci Adv
.
1
:
e1400218
.

Lawrence-Dill
CJ.
2017
. Genomes to fields: GxE Field Experiment. https://www.genomes2fields.org/resources/#sop. (Accessed: 2021 January 11).

Lipka
AE
,
Kandianis
CB
,
Hudson
ME
,
Yu
J
,
Drnevich
J
, et al.
2015
.
From association to prediction: Statistical methods for the dissection and selection of complex traits in plants
.
Curr Opin Plant Biol
.
24
:
110
118
.

Lobell
DB
,
Hammer
GL
,
McLean
G
,
Messina
CD
,
Roberts
MJ
, et al.
2013
.
The critical role of extreme heat for maize production in the United States
.
Nature Clim Change
.
3
:
497
501
.

Lorenz
AJ
,
Chao
S
,
Asoro
FG
,
Heffner
EL
,
Hayashi
T
, et al.
2011
.
Genomic selection in plant breeding: knowledge and prospects
.
Adv. Agron
.
110
:
77
123
.

McFarland
BA
,
Alkhalifah
N
,
Bohn
M
,
Bubert
J
,
Buckler
ES
, et al.
2020
.
Maize genomes to fields (G2F): 2014-2017 field seasons: Genotype, phenotype, climatic, soil, and inbred ear image datasets
.
BMC Res Notes
.
13
:
4
9
.

Mercer
KL
,
Martínez-Vásquez
Á
,
Perales
HR.
2008
.
Asymmetrical local adaptation of maize landraces along an altitudinal gradient
.
Evol Appl
.
1
:
489
500
.

Mercer
KL
,
Perales
HR.
2019
.
Structure of local adaptation across the landscape: flowering time and fitness in Mexican maize (Zea mays L. subsp. mays) landraces
.
Genet Resour Crop Evol
.
66
:
27
45
.

Meuwissen
T
,
Hayes
B
,
Goddard
ME.
2013
.
Accelerating improvement of livestock with genomic selection
.
Annu Rev Anim Biosci
.
1
:
221
237
.

Meuwissen
T
,
Hayes
B
,
Goddard
ME.
2016
.
Genomic selection: a paradigm shift in animal breeding
.
Anim Front
.
6
:
6
14
.

Meyer
RS
,
Choi
JY
,
Sanches
M
,
Plessis
A
,
Flowers
JM
, et al.
2016
.
Domestication history and geographical adaptation inferred from a SNP map of African rice
.
Nat Genet
.
48
:
1083
1088
.

Millet
EJ
,
Kruijer
W
,
Coupel-Ledru
A
,
Alvarez Prado
S
,
Cabrera-Bosquet
L
, et al.
2019
.
Genomic prediction of maize yield across European environmental conditions
.
Nat Genet
.
51
:
952
956
.

Monteverde
E
,
Gutiérrez
L
,
Blanco
P
,
Pérez de Vida
F
,
Rosas
JE
, et al.
2019
.
Integrating molecular markers and environmental covariates to interpret genotype by environment interaction in rice (Oryza sativa L.) grown in subtropical areas
.
G3 (Bethesda)
.
9
:
1519
1531
.

Muñoz
PR
,
Resende
MFR
,
Gezan
SA
,
Resende
MDV
,
de los Campos
G
, et al.
2014
.
Unraveling additive from nonadditive effects using genomic relationship matrices
.
Genetics
.
198
:
1759
1768
.

Murray
SC
,
Mayfield
K
,
Pekar
J
,
Brown
P
,
Lorenz
A
, et al.
2019
.
Tx741, Tx777, Tx779, Tx780, and Tx782 inbred maize lines for yield and Southern United states stress adaptation
.
J Plant Regist
.
13
:
258
269
.

Ovenden
B
,
Milgate
A
,
Wade
LJ
,
Rebetzke
GJ
,
Holland
JB.
2018
.
Accounting for genotype-by-environment interactions and residual genetic variation in genomic selection for water-soluble carbohydrate concentration in wheat
.
G3 (Bethesda)
.
8
:
1909
1919
.

Peiffer
JA
,
Romay
MC
,
Gore
MA
,
Flint-Garcia
SA
,
Zhang
Z
, et al.
2014
.
The genetic architecture of maize height
.
Genetics
.
196
:
1337
1356
.,

Powell
JE
,
Visscher
PM
,
Goddard
ME.
2010
.
Reconciling the analysis of IBD and IBS in complex trait studies
.
Nat Rev Genet
.
11
:
800
805
.

R Core Team

2020
.
R: A Language and Environment for Statistical Computing
.
Vienna, Austria
:
R Foundation for Statistical Computing
.

Rajaram
S
,
van Ginkel
M
,
Fisher
RA.
1994
. CIMMYT’s wheat breeding mega-environments. In Proceedings of the 8th International Wheat Genetics Symposium, Beijing.

Ramstein
GP
,
Larsson
SJ
,
Cook
JP
,
Edwards
JW
,
Ersoz
ES
, et al.
2020
.
Dominance effects and functional enrichments improve prediction of agronomic traits in hybrid maize
.
Genetics
.
215
:
215
230
.

Revelle W (2020). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.0.12.

Romay
MC
,
Millard
MJ
,
Glaubitz
JC
,
Peiffer
JA
,
Swarts
KL
, et al.
2013
.
Comprehensive genotyping of the USA national maize inbred seed bank
.
Genome Biol
.
14
:
R55
.

Romero Navarro
JA
,
Willcox
M
,
Burgueño
J
,
Romay
MC
,
Swarts
KL
, et al.
2017
.
A study of allelic diversity underlying flowering-time adaptation in maize landraces
.
Nat Genet
.
49
:
476
480
.

Ruiz Corral
JA
,
Durán Puga
N
,
Sánchez González
JDJ
,
Ron Parra
J
,
González Eguiarte
DR
, et al.
2008
.
Climatic adaptation and ecological descriptors of 42 Mexican maize races
.
Crop Sci
.
48
:
1502
1512
.

Saint Pierre
C
,
Burgueño
J
,
Crossa
J
,
Fuentes Dávila
G
,
Figueroa López
P
, et al.
2016
.
Genomic prediction models for grain yield of spring bread wheat in diverse agro-ecological zones
.
Sci Rep
.
6
:
1
11
.

Salvi
S
,
Sponza
G
,
Morgante
M
,
Tomes
D
,
Niu
X
, et al.
2007
.
Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize
.
Proc Natl Acad Sci USA
.
104
:
11376
11381
.,

Schwarz
G.
1978
.
Estimating the dimension of a model
.
Ann Statist
.
6
:
461
464
.

Smith
AB
,
Cullis
BR
,
Gilmour
AR.
2001
.
The analysis of crop variety evaluation data in Australia
.
Aust New Zeal J Stat
.
43
:
129
145
.

Smith
AB
,
Cullis
BR
,
Thompson
R.
2005
.
The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches
.
J Agric Sci
.
143
:
449
462
.

Song
K
,
Kim
HC
,
Shin
S
,
Kim
K-H
,
Moon
J-C
, et al.
2017
.
Transcriptome analysis of flowering time genes under drought stress in maize leaves
.
Front Plant Sci
.
8
:
1
12
.

Sorrells
ME.
2015
. Genomic selection in plants: empirical results and implications for wheat breeding. In:
Ogihara
Y
,
Takumi
S
,
Handa
H
, editors.
Advances in Wheat Genetics: From Genome to FIELD
.
Japan, Tokyo
:
Springer
. pp.
401
409

Su
G
,
Christensen
OF
,
Ostersen
T
,
Henryon
M
,
Lund
MS.
2012
.
Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers
.
PLoS One
.
7
:
e45293
7
.

Turner
TL
,
Bourne
EC
,
Von Wettberg
EJ
,
Hu
TT
,
Nuzhdin
SV.
2010
.
Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils
.
Nat Genet
.
42
:
260
263
.

Van Eeuwijk
FA
,
Bustos-Korts
DV
,
Malosetti
M.
2016
.
What should students in plant breeding know about the statistical aspects of genotype × environment interactions?
Crop Sci
.
56
:
2119
2140
.

VanRaden
PM.
2008
.
Efficient methods to compute genomic predictions
.
J Dairy Sci
.
91
:
4414
4423
.

Vitezica
ZG
,
Varona
L
,
Legarra
A.
2013
.
On the additive and dominant variance and covariance of individuals within the genomic selection scope
.
Genetics
.
195
:
1223
1230
.

Ward
JHJ.
1963
.
Hierarchical grouping to optimize an objective function
.
J Am Stat Assoc
.
58
:
236
244
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Editor: E Akhunov
E Akhunov
Editor
Search for other works by this author on: