Genotype-specific P-spline response surfaces assist interpretation of regional wheat adaptation to climate change


 Yield is a function of environmental quality and the sensitivity with which genotypes react to that. Environmental quality is characterized by meteorological data, soil and agronomic management, whereas genotypic sensitivity is embodied by combinations of physiological traits that determine the crop capture and partitioning of environmental resources over time. This paper illustrates how environmental quality and genotype responses can be studied by a combination of crop simulation and statistical modelling. We characterized the genotype by environment interaction for grain yield of a wheat population segregating for flowering time by simulating it using the the Agricultural Production Systems sIMulator (APSIM) cropping systems model. For sites in the NE Australian wheat-belt, we used meteorological information as integrated by APSIM to classify years according to water, heat and frost stress. Results highlight that the frequency of years with more severe water and temperature stress has largely increased in recent years. Consequently, it is likely that future varieties will need to cope with more stressful conditions than in the past, making it important to select for flowering habits contributing to temperature and water-stress adaptation. Conditional on year types, we fitted yield response surfaces as functions of genotype, latitude and longitude to virtual multi-environment trials. Response surfaces were fitted by two-dimensional P-splines in a mixed-model framework to predict yield at high spatial resolution. Predicted yields demonstrated how relative genotype performance changed with location and year type and how genotype by environment interactions can be dissected. Predicted response surfaces for yield can be used for performance recommendations, quantification of yield stability and environmental characterization.

model non-linear responses Eilers and Marx, 1996), and can be even extended to model variation across multiple dimensions (Lee et al., 2013;Rodríguez-Álvarez et al., 2015;Wood, 2017;Wood et al., 2013). Two-dimensional P-spline models are being used to separate genetic differences from spatial heterogeneity within trials (Boer et al., 2020;Rodríguez-Álvarez et al., 2018;Velazco et al., 2017). In this paper, we aim to illustrate the use of 2-dimensional P-spline approaches at larger scales than trials, i.e. to model spatially-dependent G×E variation at the level of the TPE, predicting the yield response surfaces of individual genotypes as a function of only latitude and longitude.
The choice of which environmental covariables to include in the prediction model largely depends on the environmental drivers for G×E. Within the TPE sample represented by METs, there may be recurring or repeating characteristics (i.e. that remain constant across years for a given location) that induce differential genotypic responses, which are an expression of G×E. Typical repeating characteristics are associated with latitude, longitude and to soil type. Latitude usually has a large effect on differential genotype adaptation via its effect on phenology (Zheng et al., 2012(Zheng et al., , 2013, whereas soil characteristics largely impact nutrient and water availability for the crop. Hence, soil differences between locations are usually increased in environments with low rainfall (Wang et al., 2017). Note however that in real-world METs, the effects of ‗location' may be affected by the fact that a ‗location' reference is usually associated with a nearby geographical reference (such as a town name), and the actual trials are not done in the exact same field each year even if breeders try to choose a typical soil type and management. In low rainfall environments like Australia where fallows and crop rotations are common, running a trial in exactly the same field and same place in successive years would be considered poor experimental practice due to potential carry-over effects of plot effects, impacts on soil water reserves and pressure of weeds and diseases.
Other characteristics of environmental quality are less predictable because they change from year to year with the weather fluctuations, but they are predictable from their frequencies estimated from long-term data. For example, the long-term frequency of water supply-demand ratio (Chapman et al., 2000b;Chenu et al., 2013) and the frequency of El Niño-Southern Oscillation (ENSO) events in the eastern Australian wheat-belt (Zheng et al., 2018) could potentially be used to estimate the probability of a certain stress level to occur at a particular location. If the probability of a certain year type (weather scenario) can be estimated from long-term data, it becomes attractive to predict response surfaces across latitude and longitude for each of the likely weather scenarios across years. In that Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t way, latitude and longitude effects become repeatable G×E, conditional on a year type or weather scenario.
In Australia, national variety trials of wheat are conducted by the GRDC (Grains Research and Development Corporation) in cooperation with commercial breeding companies. Those companies also conduct their own research trials. However, in neither of these variety trials are the same varieties grown over large numbers of seasons. The limited replication of varieties over years represents a bottleneck in studying long-term adaptative responses. This bottleneck can be addressed by utilizing crop simulation models to construct synthetic/virtual breeding trial datasets that span a longer series of years. This approach has been widely adopted for multiple crops; e.g. sorghum (Chapman et al., 2002;Hammer et al., 2014Hammer et al., , 2019; maize (Chenu et al., 2009;Harrison et al., 2014;Messina et al., 2011); wheat , and soybean (Messina et al., 2006), including studies that look at flowering time effects in wheat for current (Zheng et al., 2015b) and future climates .
In this paper, we present the use of P-splines embedded in mixed models to interpret G×E and predict the adaptive responses of individual wheat genotypes from simulated data for a region of about 677 by 445 km in size. APSIM yield was simulated for 156 genotypes varying in flowering time and sown in 13 Australian locations across 39 years of weather records. We focus on the adaptive responses across latitude and longitude, and we examine how these response surfaces change depending on the level of drought and heat stress present across years. We also describe the adaptation landscape in terms of the traits contributing to adaptation across environmental conditions (i.e. sensitivity to photoperiod, vernalization requirements and thermal time requirements from floral initiation and flowering).

Methods
This section describes the main steps of our approach; data generation using simulations in APSIM Wheat, G×E analysis of outputs generated by APSIM and construction of environmental indices using APSIM outputs to facilitate classification of years in year types. Conditional on year type, yield response surface models for individual genotypes were fitted as functions of longitude and latitude using P-spline methods within a mixed model context. The fitted response surfaces provided predictions of yield responses at any desired resolution level for all genotypes. The yield predictions were used to subdivide the area defined by longitude and latitude coordinates in adaptation zones. Our workflow is schematically represented in Figure 1.

Simulated data
Data corresponded to grain yield for 156 wheat genotypes simulated by Zheng et al. (2018) using the APSIM cropping system model (Holzworth et al., 2014) together with a phenology model (Zheng et al., 2013), frost impact module (Zheng et al., 2015a) and heat impact module (Lobell et al., 2015). In this data set, variation in APSIM genotype specific parameters was induced by allelic variation for the VRN-A1, VRN-B1, VRN-D1, and PPD-D1 genes, and the full range of values of additional thermal time requirements from floral initiation to flowering (from 425 to 1025 •Cd, Zheng et al., 2013). The set of genotypes included commercial varieties and virtual genotypes that could potentially be bred based on the flowering alleles present in the Australian germplasm pool (Zheng et al., 2013). Allelic combinations at vernalization and photoperiod genes produced variation for the APSIM parameters; Ppd, Vrn and Eps. Genotypes with the same phenology (but different allelic combinations) were disregarded, so that a total of 156 genotypes unique for their phenology were considered. Overall, the selected genotypes had APSIM parameters ranging from 0 to 1.2 for the photoperiod sensitivity (Ppd parameter, with values of 0, 0.3, 0.6, 0.9 and 1.2, with 0.6 for the reference genotype Janz), 0.9 to 1.7 for the vernalization sensitivity (Vrn parameter, with values of 0.9, 1.1, 1.3, 1.5 and 1.7, with 0.9 for Janz), and 425 to 1025•Cd for earliness-per-se (Eps parameter,with values of 425,475,525,575,625,675,725,775,825,925,975 and 1025•Cd, with 675•Cd for Janz). Genotypes were labelled by their flowering time parameters; the first number indicates the value for sensitivity to photoperiod, the second indicates vernalization requirement, and the third number indicates the minimum thermal time requirement from floral initiation to flowering. For example, ‗g1.2_0.9_425' indicates a genotype with a sensitivity to photoperiod of 1.2, vernalization requirements of 0.9 and minimum thermal time requirement from floral initiation to flowering of 425⁰ Cd. For most of the environments, the range for flowering time was around 50 days (Supplementary Figure S5). Note that this genotypic variation can be considered to be rather extreme compared to real-world conditions. Australian breeders tend to select wheats for early-season (slower maturing) or mainseason (faster maturing) sowing times. Commercial wheats are often classified as ‗quick', ‗medium' or ‗slow' and are usually compared to reference cultivars that are established on the market. Consequently, the range in flowering time within a typical breeders trial may be 3-4 weeks in early generation breeding, or 1 to 3 weeks in the type of MET we are considering here, i.e. much less than 50 days.
In this study, we focused on 13 out of the 15 locations used in Zheng et al. (2018), removing ‗Emerald' and ‗Roma' (Figure 2; Table 1). We dropped Emerald because it was geographically too distant, which doesn't allow for a reliable surface estimation with P-splines. Roma had extreme stress conditions, leading to zero yield for many genotypes. For the 13 locations we Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t considered, yield was simulated for each season from 1978 to 2016. Zheng et al. (2018) simulated several sowing dates. In this study, we restricted ourselves to one sowing date per location (the same date was used across years). The selected sowing date per location was identified as the one leading to the largest yield for the average of the genotypes (‗optimal' sowing date, Table 1). For a location, the starting soil conditions were the same in every year of simulation and represented the average starting condition for that location after the analysis of historical data in Zheng et al. (2018). Four environments with a large crop failure were removed, leaving in total 503 environments for the G×E analysis.

Using environmental covariables to classify years into year types
Water and temperature stress are common environmental drivers for grain yield in Australian wheat production systems (Ababaei and Chenu, 2020;Chenu et al., 2011Chenu et al., , 2013 year-location combination. The average environmental indices across genotypes were used to characterize environmental quality. Then, data was arranged in a matrix with years as objects in rows and index-location combinations as variables in columns. Variables were centred and scaled, and were used to calculate Euclidean distances between years. These distances were used in a hierarchical clustering procedure to classify years according to the four environmental indices relating to water, frost and heat stress (Ward method; Zelterman, 2015) using the ‗hclust' function in R (R Core Team, 2019). With this clustering procedure, years were classified into two classes; ‗mild' years (with reduced water and temperature stress) and ‗hot and dry' (with water, frost and heat stress).

Variance components model
To quantify the relative contribution of locations and years to the total G×E, we fitted the following mixed model to the APSIM yield: (1) Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t and year, whereas for real data the residual term would contain within trial error as well. Random effects were assumed to be independent and normally-distributed with zero means and specific variances; , , , .
To characterize the main sources of variation and quantify the contribution of year types (‗s', scenarios) to G×E variation, the model was expanded to model (2): (2) Equation (2) substitutes the random terms and from equation (1) by the following random terms; , which is the random interaction between genotype and year type, that represents genotype by year (within year type) interaction, that is the three way genotype by location by year type interaction, and that corresponds to a residual term that contains genotype by location by year interaction. As in model (1), all the random effects are assumed to be independent and normally distributed, with zero means and homogeneous variances. For the fixed part of the model, the year effect was partitioned into a year type ( and year within year type ( ) effect.
As the set of genotypes used in this study was specifically segregating for APSIM parameters related to flowering time, we also assessed the relative importance of those parameters, using the following model: (3) A c c e p t e d M a n u s c r i p t In model (3), is the yield of genotype i in environment t (year-location combination or trial), is the fixed environment effect, and are the random effects of the APSIM parameters that were used to generate the genotype i (see section ‗simulated data').
These parameters regulate photoperiod response, vernalization requirements and thermal time requirements from floral initiation to flowering. For the Ppd, Vrn and Eps APSIM parameters, the parameters defined 5, 5 and 13 classes, respectively. The interactions between APSIM parameters regulating phenology and the environment were also included. The term represents residual G×E. All random terms were assumed to be independent and normally distributed, with zero means and homogeneous variances.

GGE biplot
To visualize the contribution of APSIM parameters regulating phenology to APSIM yield and to describe their relation to genotypic performance across environments, we used a genotypegenotype-by-environment model (GGE, Yan and Kang, 2002;Yan and Rajcan, 2002).
In model (4), represents the mean yield of the i th genotype in the t th trial, stands for an intercept and is the fixed trial effect. The genotype main effect and the interaction are explained by M multiplicative terms. Each multiplicative term is formed by the product of a genotypic sensitivity (genotypic score) and environmental scores . Finally, is a residual term. To visualize how sensitivity to photoperiod, vernalization requirements and earliness contribute to G×E, the first two PCs estimated for G+G×E (∑ in model 4) were visualized as a GGE biplot, label-colouring genotypes according to their values for the APSIM parameters regulating phenology.
To understand the contribution of environmental conditions to genotypic performance, the standard GGE biplot was enriched with environmental information. The scaled environmental covariables were regressed against the environmental PC1 and PC2 from the GGE model. The coefficients of these regressions were used to describe directions of greatest change for these covariables in the biplot (Graffelman and Van Eeuwijk, 2005;Voltas et al., 1999). The direction is given by the regression coefficients and the origin. Furthermore, the angles between the direction vectors again give information about correlations: small angles between vectors mean high correlations, while angles > 90° indicate negative correlations between the vectors. A c c e p t e d M a n u s c r i p t

Predicting response surfaces with P-splines embedded in a mixed model
To describe the genotypic response across latitude and longitude, we used the following model for the APSIM yield data for each genotype separately, i.e., conditional on i: In Equation (5), is the yield of genotype i in location j and year k, is the fixed year effect.
The term defines the evaluation at location j of a set of basis functions (in vector form) for the spline fit on the latitude of the trial, while ( ) is the corresponding set of genotype specific random coefficients (in row vector form). Similar spline terms are defined for longitude, ( ) , and the interaction between latitude and longitude, ( ) . The interaction is orthogonal to the main effect spline terms for latitude and longitude (Boer et al., 2020;Piepho et al., 2021;Wood, 2017;Wood et al., 2013). The residual term contains G×E not explained by the spline surfaces and, for real data, within trial error. Smooth terms were fitted using first-degree P-splines with 10 segments and first-order penalties . Higher degree P-splines and higher differences could also be used, but the advantage of using first-degree P-splines and first order differences is that this model is equivalent to a linear variance model (Boer et al., 2020;Williams, 1996). The latitude and longitude range of the spline segments coincides with the latitude and longitude range spanned by the trials. The P-spline mixed model was fitted with ASReml-R v. 4 (Butler et al., 2017).
As there were contrasting year types, it made sense to predict the genotypic response surfaces of each genotype across latitude and longitude, conditional on the year type (one predicted surface per genotype for ‗mild' years and another predicted surface per genotype for ‗hot and dry' years).
Then, for each genotype we used the predicted surfaces to subtract ‗mild' years from ‗hot and dry' years. In this way, the difference in yield between ‗mild' and ‗hot and dry' years represents the sensitivity to year type, across the whole latitude and longitude span.
To characterize the surfaces for explicit environmental quality, we fitted model (5), replacing yield as a response by the environmental indices that were used to classify years; rainfall (S2_sum.rain), average maximum temperature (S3_avg.maxt), sum of frost temperatures (S2_frost.sum2) and vapour pressure deficit (S2_vpd2) averaged across genotypes.
To quantify the contribution of year-to-year variation (within year type) to the total variation, we extended model (5) as follows: A c c e p t e d M a n u s c r i p t Here, the term ( ) represents the set of random coefficients for latitude (in row vector form) that are specific to each year, i.e, they are scenario dependent deviations of the genotype specific coefficients introduced above. Similar spline terms are defined for year-by-longitude variation, ( ) , and the contribution of year to year variation to the interaction between latitude and longitude, ( ) . We quantified the contribution of each model term by starting with a null model with only a fixed year main effect. Then, we sequentially added terms in model 6. We quantified the contribution of each term by calculating the difference in the residual variance between the null model and the model with spline terms, divided by the residual variance of the null model.

Highest yielding genotypes across latitude and longitude (method 1 in Figure 1)
We used model (5) to make predictions for each genotype over a grid of 140 by 100 points that covered a latitude range from -27 ⁰ S to -33.5 ⁰ S and a longitude range of 147 ⁰ E to 151.5 ⁰ E (these ranges corresponded to the latitude and longitude ranges for the locations used in the simulations, Figure 2). The covered area approximately spans 677 km from North to South and 445 km from East to West, making each pixel equivalent to about 4.8 by 4.8 km.
For each pixel on this grid (=location), we identified the five genotypes that had the highest yield in the tested conditions (i.e. 1 sowing date, 1 soil condition), as predicted by the 2-dimensional spline surfaces across years within year type. The location by genotype matrix contained a 1 if a genotype was in the top five of the genotypic ranking at a particular location and a 0 if it was not. This is equivalent to applying a selection intensity of 3.2%. This presence or absence matrix was used to calculate the Jaccard similarities between locations, based on which were the five highest yielding genotypes. This similarity matrix was used for a hierarchical clustering procedure of locations (Ward method) implemented in the ‗hclust' function in R (R Core Team, 2019). The resulting regions have similar set of highest yielding genotypes. In such a way, these regions would reflect set of locations for which breeders might do the same genotype selection, at least in terms of maturity. We ran this procedure using the predictions for each year type separately (i.e. producing one classification for years with mild heat and water stress and one classification for ‗hot and dry' years).
Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t Figure 1) Assigning locations to regions by commonality of the five winning genotypes is a simple and straightforward method to classify locations. However, if genotypes that are in the upper yield percentiles are very similar in yield, this might lead to frequent changes in genotypic ranking across latitude and longitude, associated to small yield differences. This potentially leads to frequent spatial discontinuities in the classification of environments. A more robust procedure assigns locations to regions by the full set of fitted yields for all genotypes. Hence, we created yield predictions for each genotype on a grid of 15 by 24. For computational convenience, we reduced the grid resolution, compared to the analysis above that looked at the winning genotype per pixel. We considered that each point in the grid defined by latitude and longitude defined a (virtual) location. In this part of the analysis, each pixel covered about 28 by 28 km.

Using G+G×E to cluster environments in the whole adaptation landscape (method 2 in
As in the GGE model (Yan and Kang, 2002), we fitted a principal components models to the genotype by virtual location matrix of the environment-centred predicted yields from the 2dimensional spline surfaces across years within each year type. As the first two principal components explained most of the relevant variation (87.1 and 87.6% for ‗mild' and ‗hot' years, respectively), we retained the scores of both of them to construct a location by location similarity matrix (Euclidean distances). This similarity matrix was used for a hierarchical clustering procedure of locations (Ward method) implemented in the ‗hclust' function in R (R Core Team, 2019). For each year type, the clustering procedure resulted in location classes (regions).

Quantifying the contribution of sensitivity to photoperiod, vernalization requirements and earliness per se to the predicted adaptation landscape per year type
The contribution of sensitivity to photoperiod, vernalization requirements and earliness per se to the predicted adaptation surface was also quantified with a mixed model that was fitted to the spline-predicted yield across virtual locations. The following mixed model was fitted to predictions made for each year type separately, where the subscript r refers to the region as identified within year type by clustering on G+G×E of the predicted adaptation landscape; parameters regulating phenology and regions were also included. The term represents residual G×E.

Year classification
The clustering procedure on environment covariables suggested two clearly defined groups of years ( Figure 3), which were contrasting in their levels of water and temperature stress.  Ababaei and Chenu, 2020).
Within year type, there was also spatial variation for the environmental conditions ( Figure 4). years. The spatial pattern of S2_frost.sum differed between year types; in ‗mild' years, S2_frost.sum was larger in Southern locations than in the rest of the region, reflecting lower frost stress ( Figure 4G). In contrast, frost temperatures were more important in ‗hot and dry' years ( Figure 4H). A c c e p t e d M a n u s c r i p t

Variation in flowering time and yield
The variation in sensitivity to photoperiod, vernalization requirements and earliness per se led to large variation in flowering time (Supplementary Figure 5). Locations differed in the means and range of days to flowering. The shortest mean duration between sowing and flowering was observed in Goondiwindi, with a long-term mean across genotypes of 107 days. The largest duration was in Wellington, with a long-term mean across genotypes of 156 days. Within location, days to flowering did not seem to vary much between year types (Supplementary Figure   6). Note that the majority of the flowering dates (25 and 75 percentiles) coincide with the flowering ranges observed in real breeding trials. However, the full range of flowering dates is ca.
10 to 25 days greater than in most breeding trials.
The influence of year types was much larger on yield than on days to flowering, where year types had a strong effect on genotypic performance, with ‗hot and dry' years having in general lower yield across locations than ‗mild' years ( Figure 5), consistent with differences in environment quality. Given that ‗hot and dry' years have increased in their frequency during the most recent years ( Figure 5), a strategy to select for wheat varieties that are well-adapted to future climate conditions could be to select for varieties with adapted 'flowering genetics' or sow earlier Zheng et al., 2016).

Variance components for yield across locations and years
The variance components model indicates that there is large G×E for yield in this data set ( Table   3). Most of the G×E is not consistent from one year to the next and is mainly related to the twoway genotype by year interaction and the three way interaction between genotypes, locations and years (captured in the residual in Table 3). For the two-way genotype by year interaction, we investigated the option to classify years in groups that are more internally homogeneous and predictable. For example, from explicit covariables, as we did here and as done in Chenu et al. (2011);Zheng et al. (2018), or from long-term frequencies, as done in Chenu et al. (2013). The genotype by location interaction will be examined by fitting smooth surfaces with P-splines across latitude and longitudes.

Variance components for yield between year types
Upon integrating year type in the G×E analysis (  Table 3 are different from those of Table 4 (especially the genotype main effect) because of the absence of some locations in particular years (four yearlocation combinations were removed because of large crop failure in those environments).

Contribution of APSIM parameters regulating flowering time to G×E
We also estimated the contribution of APSIM parameters regulating flowering time on yield.
Across the 503 environments considered in this analysis, the largest contribution to the yield G×E was made the three-way interactions, especially Ppd:Eps:Env and Vrn:Eps:Env (Table 5). This implies that specific combinations of photoperiod and vernalization alleles, in combination with different earliness levels make important contributions to wheat adaptation across environments.
The two-way interactions between Eps:Env and Ppd:Env also made very important contributions to the total G×E variation ( Table 5) (Figure 7). In both years, the genotype main effect (Geno_mean) was more associated to larger values of Ppd (smaller angle between vectors), than to the Vrn and Eps. This result coincides with Figure 6, which showed that larger values for Ppd generally lead to larger yields. In contrast, Eps contributed more to G×E than to the genotype main effect. These results coincide with the variance components model in Table 5 and with Figure 6, which shows that the optimum value for Eps depends on the environment. In Figure 7, examining the relative positions of the vectors for environment indices and genotypic parameters regulating flowering time gives insight in the underlying physiological mechanisms of G×E in this data set. In ‗hot and dry' years, Eps induced a large G×E interaction with warmer and dryer environments (this can be observed by the almost opposite position of the vector for Eps with S2_avg.max and S2_vpd). In Figure 6, this can be observed as the position of the optimum for yield, in relation to Eps values, occurs at higher values for environments that are less drought-prone, like ‗Narrabri' or ‗Wellington' (Figure 4). This indicates that when more water is available, it is advantageous to have larger values for Eps (hence, later flowering), compared to very dry environments. In ‗mild' years, Eps also had a positive interaction with more favourable environments that had a larger rainfall (S2_sum.rain2). Larger values for Ppd and Vrn contributed to generate a negative interaction with warmer environments (i.e. those with larger values for S2_avg.max). In Figure 6, this can for example be observed as lower Ppd and Vrn values leading to a larger yield in droughtprone locations like ‗Meandarra', ‗Goondiwindi' and ‗Dalby' (Figure 4).

Spline surfaces across latitude and longitude for each year type
As the interaction of genotype and year type was large (Table 4 and  However, the average yield deviation for each virtual location (pixel) in the latitude-longitude range spanned by the trials, compared to the mean yield, was much less in ‗mild' than in ‗hot and dry' years. In ‗mild' years, the best locations had average yields that were about 1.3 times the general mean (for the same year type), and the worst locations had average yields that were about 0.8 times the general mean. In contrast, in ‗hot and dry' years, the best locations had average yields that were about 1.5 times the general mean (i.e. still lower than mean yield in a mild year) and the poorest locations had average yields that were about 0.4 times the general mean, i.e. the gradient of environmental quality was much stronger in ‗hot and dry' than in ‗mild' years. This coincides with Figure 8A, which show a larger difference in phenotypic variances in more rainy places in ‗mild' years.
Besides inspecting the gradients in average yield per year type, we examined the response surfaces for individual genotypes and focused on the yield difference between ‗hot and dry' and ‗mild' years across latitude and longitude for each genotype (Figure 9). Most genotypes showed a large yield reduction when comparing ‗hot and dry' and ‗mild' years. Only few of them (e.g.
‗g0_0.9_425', Figure 9A) had a similar yield (or even larger yield for some locations) in ‗hot and dry' than in ‗mild' years, causing strong G×E within and between year types ( Figure 9A). These exceptional genotypes were very early flowering, with small values for the three flowering time parameters and a very low mean yield (e.g. mean yield of ‗g0_0.9_425' in mild years was 773.9 kg ha -1 , Figure 9A). Genotypes with small Eps values, but larger sensitivity to photoperiod (e.g. ‗g1.2_0.9_425', Figure 9B) had much larger average yield than ‗g0_0.9_425' (2053.5 vs 773.9 kg ha -1 in mild years, Figure 9A) and showed an intermediate behaviour; maintaining or increasing yield in ‗hot and dry' compared to ‗mild' years in South-Eastern locations, and reducing yield in Western locations, in ‗hot and dry' compared to ‗mild' years ( Figure 9). The locations that led to a yield increase in ‗hot and dry' years for genotypes similar to ‗g0_0.9_425' ( Figure 9A) were characterized by lower temperatures and milder drought, compared to the Western locations ( Figure 4), which are more prone to drought (Chenu et al., 2011 and and heat stress (Ababaei and Chenu, 2020).
Genotypes with large Eps values (e.g. ‗g0_0.9_1025', Figure 9D; ‗g1.2_0.9_1025', Figure 9E and ‗g1.2_1.7_1025', Figure 9F) were much more sensitive to year type, with yield reductions between ‗mild' and ‗hot and dry' years of about 90% for some locations. For these genotypes, yield reduction in ‗hot and dry' compared to ‗mild' years was especially strong in Eastern locations, which have lower temperatures and milder drought (Figure 4). This apparent contradiction can be explained because genotypes with large Eps values express their larger yield A c c e p t e d M a n u s c r i p t potential in ‗mild' years and non-stressing conditions (i.e. Eastern locations). However, their yield is rapidly reduced under water limitation, heat and frost stress, and their high yield is no longer expressed in those locations during ‗hot and dry' years, leading to a large proportional reduction.
In contrast, Western locations, even for ‗mild' years, have less favourable conditions than Eastern locations ( Figure 3). Hence, their yield reduction between mild and ‗hot and dry' years is less strong than for Eastern locations. In both year types, genotypes with intermediate Eps values had larger general mean than those with very large values for Eps, and had a smaller yield reduction between ‗mild' and ‗hot and dry' years ( Figure 9G-I). Especially for those genotypes with intermediate values for photoperiod sensitivity (e.g. ‗g0.3_1.1_825' and ‗g0.6_1.1_675'), the range of reduction was also smaller between locations, corresponding to more stable genotypes (Figures 7 and 9).

Within-year type location classification based on the five highest-yielding genotypes
We predicted the yield response surfaces for each genotype and year type. The yield predictions per pixel in the latitude-longitude grid were used to classify pixels, defining adaptation regions that share the same five highest-yielding genotypes. Figure 10 shows that in both year types, locations in the North-West had a different set of top genotypes than in the South-East. However, the pattern was more marked for ‗hot and dry' than for ‗mild' years. In ‗mild' years ( Figure 10A), most of the locations belonged to the same region and the clustering procedure clearly indicated only two regions. In contrast, the response surface in ‗hot and dry' years had a more complex geographical separation and led to three regions ( Figure 10B). This shows that in hot and dry years, which are becoming increasingly frequent because of climate change, the TPE becomes more heterogeneous and more G×E is expressed.

Location classification based on G+G×E of all genotypes for each year type
Besides classifying locations based on the five highest-yielding genotypes, we also applied a clustering method on the first two environmental GGE scores calculated on the yield data. In this strategy, the response surfaces of all genotypes are considered simultaneously. For both year types, we observe that the classification pattern ( Figure 11) is similar to the one realized when focusing on the five highest-yielding genotypes ( Figure 10) and it coincides with the geographical distribution of temperature and water stress ( Figure 4); in both there is a diagonal pattern running parallel to the longitude boundaries (Figure 11), and approximating the rainfall patterns which are higher in the East than West of this region (Figure 4). The variance components for genotypes, genotype by region and residual G×E indicated that the relative importance of G×E, compared to the main effect was larger in ‗hot and dry' years than in ‗mild' years (Table 6A). In ‗hot and dry' years, the regions obtained after clustering on G+G×E of the P-spline surfaces also explained a A c c e p t e d M a n u s c r i p t smaller part of the G×E variation, indicating that the G×E has a more complex structure in ‗hot and dry' years than in ‗mild' years (Table 6A).
When estimating the APSIM parameter contribution to G×E across the predicted response surfaces per year type, it was clear that the genotype main effect was largely influenced by thermal time requirements between floral initiation and flowering time (Earliness-per-se, Table   6B). However, the relative importance was larger for ‗hot and dry' years than for ‗mild' years, coinciding with the genotype response surfaces across ‗mild' and ‗hot and dry ‗years ( Figure 6) and the observations made in the GGE biplot (Figure 7).  Figure 6, which indicate that in regions that are less drought-prone (like region 3, Figures 6 and   11), larger values of Ppd and Vrn lead to higher yield. In contrast, for regions that are more drought-prone (like regions 1 and 2, Figures 6 and 11), intermediate values for Ppd and Vrn lead to larger yield. This is especially the case for hot years, and in combination with large values for Eps. The residual G×E variation was much smaller than the total variation captured by the twoand three-way interactions between flowering time parameters and regions (Table 6B), indicating that the clustering of the G+G×E present in the predicted response surfaces was an effective method to capture the most important sources of spatial variation for G×E within year type.

Simulated yield landscapes allow testing of statistical methods over space and time
In this paper, we illustrate how to fit response surfaces for individual genotypes, and how these surfaces can be used to decompose the structure of repeatable G×E. We also describe the adaptation landscape encompassed by the phenotypic responses ascribed to traits regulating phenology.
The P-spline methodology presented here could be directly applied to real yield data from METs.
However, the use of bio-physical simulations of yield has the advantage of generating phenotypes for a long series of years. We illustrated how to identify year type scenarios, placing the predicted response surfaces and adaptation landscapes in a long-term context. The 39 years of data that we included in this study showed that the frequency of heat and drought stress is increasing over time, likely associated to climate change (Ababaei and Chenu, 2020;Fletcher et al., 2020).
A c c e p t e d M a n u s c r i p t When comparing the year types resulting from the clustering procedure and the ENSO events (Potgieter et al., 2005), it can be observed that most of the ‗El Niño' events correspond to ‗hot and dry' years, whereas ‗La Niña' events correspond to ‗mild' years. Years that could not be classified as either ‗El Niño' or ‗La Niña' were classified as ‗hot and dry' or ‗mild' in a comparable proportion (11 hot and 13 mild years; supplementary Table 1). Importantly, the approach that we have taken here could only improve the predictability of adaptation surfaces a posteriori, i.e. when knowing which kind of year is encountered. Nevertheless, the scientific community is making large efforts to improve weather forecasting (for example by the European ). Such forecasts would be https://www.ecmwf.int/ Range Weather Forecasts, -Centre for Medium very useful to inform breeders and farmers about the level of heat and water stress to be encountered by the crop in the upcoming season. An alternative strategy could be to undertake bivariate modelling of both year types simultaneously, borrowing information across year types.
In that way, predictions could be made at each location for the most likely year type, weighing year type information by their relative frequencies. The fact that the frequency of occurrence of ‗hot and dry' years has dramatically increased indicates a shift in the TPE. This needs to be addressed by selection strategies for future varieties, focusing on phenologies and sowing dates that are better targeted for hot years . In that sense, our response surfaces predicted for hot years might be more informative to select for varieties that are welladapted to future growing conditions, than the surfaces predicted for ‗mild' years.
The simulation setting used here shows how to arrive at useful insights about the environmental drivers of adaptation and repeatable G×E in the variable climates of the NE Australian wheat-belt, but remains limited to one sowing date and a fixed initial soil condition. The use of genotype specific yield response surfaces in combination with adaptation landscapes per year type can also be used to provide a better understanding of G×E for different management conditions, different regions and also future growing conditions, in the context of climate change. Another potential application of the predicted surfaces could be to inform the decisions about location sampling for METs. Given that breeders usually need to limit the number of trials that are conducted, the predicted surfaces could be used to identify those locations that represent well the G×E that is to be expected in the TPE in the long term (Chauhan et al., 2017). An option to ensure that the METs are informative about adaptation across the TPE could be to always include trial locations that do induce different adaptive responses across mild and hot years, and trial locations without much variation between mild and hot years (more consistent genotype discrimination across years).
Our spline approach is implicitly a spatial emulation of APSIM output, comparable to the approach taken by Stanfill et al. (2015) in the sense that the P-splines response surfaces produce yield as a statistical function of the same genotype and environment specific inputs that are fed Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t into APSIM to produce yield. In principle, the response surfaces could have been directly produced by running APSIM simulations on a very dense grid across latitude and longitude.
However, that requires massive computation time, and a dense grid for soil and weather if we want to accurately simulate yield variation. One simplification applied by Chapman et al. (2000c) used a weather grid in the mapping of variation in environment types, and ran the model multiple times for different soil types with the implication that environment types could be spatially referenced by knowing the latitude, longitude and the soil type. Our combined approach of APSIM simulations and spline models provides a more feasible and computationally efficient framework for a detailed characterization of genotypic responses, also applicable to real-world data sets. The main difference between both approaches (P-splines vs. APSIM output) is that Psplines by definition assume smooth gradients across latitude or longitude. However, there might be abrupt changes in environmental quality, for example due to changes in the soil quality resulting from the geological history of soil landscape. While the weather tends to vary spatially in some smooth way (associated with geography and interactions with weather systems), the soil does not vary smoothly, especially in a continent as old as Australia. Geology has, at extremely long time scales, created a landscape which, for any given region may comprise both abrupt (over 10s of metres) and gradual (over 100s of metres) changes in soil characteristics like water holding capacity. Abrupt changes are better addressed by crop growth models like APSIM, which contain explicit functions of environmental variables. A potential way to compare both methodologies would be to re-fit our P-spline models on a very dense grid of APSIM output and quantify the prediction error introduced by the discontinuities and abrupt changes that cannot be captured with the P-spline model. In such a case, additional fixed effects could also be introduced in the P-spline model to account for soil covariables that are present in a spatially discontinuous fashion.
In this proof of concept, we used a simulated population that was segregating only for flowering time parameters. This had the advantage that we could examine an adaptation landscape represented in a simple space of variation determined only by three traits, making it conceptually straightforward to understand. The range of variation for APSIM parameters regulating flowering time was very broad, leading to very large G×E. This G×E was also partially driven by the fact that, for a given location, all genotypes were sown on the same date. However, slow-maturing genotypes are sown early, and fast-maturing genotypes are sown late. Overall, the flowering time variation observed in this data set can be considered as an upper bound and it could be made narrower, to match the range of flowering time variation (1 to 3 weeks) that is commonly encountered in Australian wheat germplasm targeted for a given region (Zheng et al., 2012(Zheng et al., , 2013. The way in which the simulations cover the parameter space can modify the G×E patterns for the final trait. Therefore, it would be informative (i) to extend this approach by restricting the range of flowering time variation to that observed for well-adapted local germplasm and sowing A c c e p t e d M a n u s c r i p t dates to inform growers about better adapted lines depending on the year type or ENSO events (Zheng et al., 2018) and (ii) to consider other genetically-varying APSIM parameters to assist breeding progress (Bustos-Korts et al., 2019b;Chapman et al., 2003;Hammer et al., 2014). The decision about which parameters should show variation to achieve G×E patterns that directly relate to the target population of genotypes depends on the traits for which the population is segregating and on the range of environmental conditions that is explored. On the other hand, the sensitivity of APSIM to changes of specific parameters defines the impact of those traits on G×E in the TPE (Casadebaig et al., 2016). Furthermore, for a final proof of the utility of this approach, it would be important to implement it on real data, like the MET data coming from preregistration and post-registration trials.

Environment classification to look for spatial adaptation per year type
In this paper, we investigated two levels of environment classification. First, we classified years according to explicit environmental quality and we identified two groups; one for years with mild levels of temperature and water (drought) stress and a second group that consisted of years with higher temperatures and stronger water and frost stress. These groups captured part of the G×E associated to weather patterns, which is in general not very repeatable. As a second step, we fitted yield response surfaces, conditional on year type. These response surfaces provided insight on G×L, allowing us to classify locations according to the genotypic ranking across latitude and longitude. Within year type, these location classes correspond more directly to geographical and soil characteristics, which are more repeatable than across year types, allowing breeders to exploit specific adaptation. We observed that hot years are becoming increasingly frequent, coinciding with results reported by Ababaei and Chenu (2020). Hence, the response surfaces and adaptation landscapes generated for hot years become especially useful to select varieties that are adapted to future growing conditions. Such an approach can also be directly applied for projected future climate scenarios (e.g. Watson et al., 2017). We also showed that there is an increased spatial heterogeneity in environmental quality during ‗hot and dry' years, reflecting that the TPE is becoming more heterogeneous because of climate change. In the long run, this would require breeders to revisit the number and location of their METs because and increased heterogeneous TPE would also require an increased number of testing locations.
We assessed two methods to classify locations into regions; grouping locations with a similar set of genotypes in the top five of the genotypic ranking (Figure 10), and clustering locations based on the GGE scores ( Figure 11). The main conceptual difference between both methods is the degree of importance attached to the highest-yielding genotype, compared to the whole set of genotypes. Both strategies led to similar groupings, with South-Western locations generally belonging to a different region, than those more continental locations. However, from the plant Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t breeding perspective, it might be more appealing to focus the attention on the highest-yielding genotypes as those are the ones that will determine genetic gain (Yan and Kang, 2002).

Characterization of G×E across the adaptation landscape
In this study, a) we compared several methods to understand the structure of G×E across latitude and longitude within part of the north-eastern Australian wheat-belt; b) we calculated yield differences between hot and mild years for each genotype, c) we used the predicted response surfaces for each genotype to identify the highest-yielding genotype per virtual location (pixel), d) we clustered on G+G×E generated by all genotypes across latitude and longitude, creating regions for each year type, and e) we quantified the contribution of APSIM parameters regulating flowering time to G×E across those regions. In general, all strategies supported the conclusion that locations in the South-East of the studied region have higher yield (Figure 8), but also higher yield fluctuations between year types, making a strong contribution to G×E in this part of the north-eastern Australian wheat-belt. It also became apparent that the spatial heterogeneity is larger under ‗hot and dry' years, which are becoming increasingly frequent because of climate change.
We used simulated data to illustrate the approach. However, it would be interesting to confirm these findings in real phenotypic data that are coming from METs. For example, to investigate the efficiency of variety testing in VCU trial networks (VCU= value for cultivation and use). Such an analysis could give useful insight about the changes in variance and genotypic ranking across the whole target production area. If large changes in genotypic ranking occur, it is convenient to subdivide the TPE into smaller areas (Atlin et al., 2000;Piepho and Möhring, 2005), and to potentially select or recommend varieties within those smaller regions. The high spatial resolution of our predictions contain valuable information to support variety recommendation at the farmer's level because predictions can be made for the exact location of fields , provided that accurate soil information at the farm level is available. In this way, MET information is more directly leveraged to the level of specific locations.
As the relative contribution of traits underlying yield severely changes depending on the environment conditions Chenu et al., 2009Chenu et al., , 2013Slafer et al., 2021), another aspect that could be further explored is the contribution of underlying traits to genotype adaptation. Here, we focused on grain yield predictions using P-Splines. A c c e p t e d M a n u s c r i p t

Including explicit environmental covariables
Where differences in adaptation are not related to latitude and longitude, or when G×E variation is not ascribed to specific locations and years (not very repeatable), as often occurs in Australian environments (Chapman, 2008;Chapman et al., 2000a;Chenu et al., 2011Chenu et al., , 2013, an alternative is to express G×E as function of explicit environmental covariables (Nicotra and Davidson, 2010).
The use of environmental covariables could also open interesting opportunities to model the year to year variation more explicitly. We now used environmental indices related to water and temperature stress to classify years into the categories ‗mild' and ‗hot'. Although these year types do explain substantial G×E variation, they are still internally heterogeneous. Hence, part of the G×E information is lost when predicting response surfaces for each year type. An alternative would be to model yield variation as explicit function of latitude, longitude and an environmental covariables (that could also be, for example, an index that condenses the information contained in water and temperature-related variables). In that way, the model would make a more explicit use of the continuous variation that is also contained in the year to year variation.

Future developments
In this paper, we fitted spline surfaces to subsets of the data (per genotype, per year type, etc). In future work, we plan to extend the model, fitting it to all genotypes simultaneously. Such a model extension would provide parameter values that have been estimated simultaneously, which allows to borrow strength between genotypes. A second advantage of estimating parameters in the same model fit is that it can provide more insight into G×E than using the predicted response surfaces alone. An additional development of the spline approach would be to fit surfaces for different soil types, and then to determine how these could be combined and re-fitted if more local soil data can be provided. The effect of sowing dates is also an important aspect that would need to be considered in a more detailed study of wheat adaptation. This is especially relevant as large interactions between rainfall and soil type are to be expected because of the differences in soil water retention capacity (Chenu et al., 2013). Further developments (already introduced above) could be the use of bivariate models considering either yield and underlying traits or yield in mild and hot years. Such bivariate approaches would have the advantage of borrowing strength across traits/year types and would provide useful insights in the interplay between traits in modulating adaptive responses across environments.
Downloaded from https://academic.oup.com/insilicoplants/advance-article/doi/10.1093/insilicoplants/diab018/6316219 by guest on 09 July 2021 A c c e p t e d M a n u s c r i p t

Conclusions
Fitting yield response surfaces for individual genotypes was useful to understand the structure of G×E and to predict genotype adaptation across the whole latitude and longitude range encompassed by the simulated METs.
The long-term simulations indicated the presence of two types of years, with different levels of water and temperature stress. These year types contributed significantly to G×E variation and for that reason it is advisable to predict response surfaces per year type separately.
The frequency of years with increased temperature and water stress is increasing in the most recent years, indicating that, to select varieties that are well-adapted to future growing conditions in a context of climate change, it might be advisable for breeders to base their selections on the predicted response surfaces for ‗hot' years.
We used simulated data to illustrate the approach, but the spline methodology presented here can also be applied to real data from METs in breeding programs and VCU networks.
From the three APSIM parameters regulating phenology, the thermal time requirement between floral initiation and flowering had a large yield main effect (genotypes with larger requirements had higher yield, especially in mild years). Yield G×E interactions were mostly driven by specific combinations of Eps (thermal time requirement between floral initiation and flowering) and Ppd values. Our approach combining crop simulations and statistical models was useful to interpret the adaptation landscape as combinations of the APSIM phenology parameters. M a n u s c r i p t A c c e p t e d M a n u s c r i p t Table 2. Description of the environmental indices calculated with APSIM output and that were used to classify environments according to their levels of water and temperature stress. The four indices were calculated in a period from flowering + 100 o C to flowering + 600 o C.

Name environmental index Description
S2_sum.rain Sum of rainfall S2_frost.sum Accumulated thermal time when minimum temperature is less than 0 A c c e p t e d M a n u s c r i p t A c c e p t e d M a n u s c r i p t A c c e p t e d M a n u s c r i p t A c c e p t e d M a n u s c r i p t Table 6. Variance components estimated for the APSIM yield in the 13 locations for 156 genotypes varying in APSIM parameter relative to photoperiod sensitivity (PPD), vernalization requirement (Vrn) and thermal time from floral initiation to flowering (earliness per se; Eps). The analysis was made separately for ‗mild' and ‗hot and dry' years. For each year type, locations were classified into regions by clustering on spline-predicted yields for the full set of genotypes ( Figure 10).