Incorporating Realistic Trait Physiology into Crop Growth Models to Support Genetic Improvement

In-silico plant modeling is the use of dynamic crop simulation models to evaluate hypothetical plant traits (phenology, processes and plant architecture) that will enhance crop growth and yield for a defined target environment and crop management (weather, soils, limited resources). To be useful for genetic improvement, crop models must realistically simulate the principles of crop physiology responses to the environment and the principles by which genetic variation affects the dynamic crop carbon, water, and nutrient processes. Ideally, crop models should have sufficient physiological detail of processes to incorporate the genetic effects on these processes to allow for robust simulations of response outcomes in different environments. Yield, biomass, harvest index, flowering date, and maturity are emergent outcomes of many interacting genes and processes rather than being primary traits directly driven by singular genetics. Examples will be given for several grain legumes, using the CSM-CROPGRO model, to illustrate emergent outcomes simulated as a result of single and multiple combinations of genotype-specific parameters and to illustrate genotype by environment interactions that may occur in different target environments. Specific genetically-influenced traits can result in G x E interactions on crop growth and yield outcomes as affected by available water, CO2 concentration, temperature, and other factors. An emergent outcome from a given genetic trait may increase yield in one environment but have little or negative effect in another environment. Much work is needed to link genetic effects to the physiological processes for in-silico modeling applications, especially for plant breeding under future climate change.


Introduction
Genetic improvement of crops is important for increasing world food supply, in view of the continued increase in world population on a fixed arable land base with limited water and fertilizer resources (Hickey et al., 2019). Enhancing crop yield through science-based breeding has occurred over many decades, a task that has been further accelerated in recent years through molecular technologies using DNA-based markers (Varshney et al., 2020). For many years, plant breeders have attempted, and in many cases succeeded, to model plant ideotypes that result in higher yields (Donald, 1968), starting in the 1960s with the shorter semi-dwarf rice (Oryza sativa) cultivars that did not lodge under increased N supply (Chandler, 1969). With improved tools (molecular genetics, more advanced crop models) and an urgent mission of adapting to climate change and feeding an increased world population, it is timely to use these tools to hypothesize genetic improvement in yield. With the recent advancements in dynamic crop growth simulation (Thorburn et al., 2018; Boote, 2019), crop models have excellent potential for analyzing past genetic improvement from experimental data and for proposing plant ideotypes for target environments (Suriharn et al., 2007, 2011; Peng et al., 2008; Putto et al., 2008). Crop simulation models have potential for creating "virtual crop cultivars" and for assisting the breeder's selection criteria as well as integrating molecular marker-based information (Hoogenboom et al., 2004; Chenu et al., 2009; Hammer et al., 2010; Hammer et al., 2016; Muller and Martre, 2019; Oliveira et al., 2021), and for genetic enhancement of important traits that contribute to yield improvement in different target environments (Hammer et al., 2004; Hammer and Jordan, 2007; Technow et al., 2015; Yin and Struik, 2016).
Using crop simulation models to evaluate traits for genetic improvement is not new and has been attempted a number of times during the past 40 years (Duncan et al., 1978; Landivar et al., 1983; Elwell et al., 1987; Boote and Tollenaar, 1994; Hammer et al., 1996; Boote et al., 2001; Boote et al., 2003; Chapman et al., 2003). What is different now is that molecular genetics information was not available during those early efforts and connection to true genes was not feasible. What is perhaps unique about those early efforts is that those models were relatively mechanistic for their time. Elwell et al. (1987) used the SOYMOD soybean model (Curry et al., 1978), which included leaf-level photosynthesis and mass-flow phloem translocation. Boote and Tollenaar (1994) used an early version of CROPGRO that had leaf-to-canopy assimilation, explicit tissue compositions, growth respiration, and single seed growth dynamics. Also during that time, there were attempts at genetic variation with simpler models, although the interpretation of genetic information to alter radiation use efficiency (RUE), for example, is difficult because RUE is the emergent outcome of the expression of many genes. However, even models that used emergent traits as dynamic state variables have been successful for simulating genetic variability influences on life cycle phase durations, and are useful for defining best cultivar life cycles for given target environments (Loffler et al., 2005; Technow et al., 2015).
In the past two decades, crop modelers have attempted to connect genetic yield improvement more closely to genes and quantitative trait loci (QTLs). White and Hoogenboom (1996) were among the first to add specific gene effects into a crop model for dry bean (Phaseolus vulgaris). Other examples include QTL-based predictors of specific leaf area of rice (Yin et al., 1999), leaf area expansion in maize (Zea mays L.) (Tardieu, 2003), time to flower in rice (Yin et al., 2005a, 2005b), time to heading in wheat (Triticum) (Zheng et al., 2013), and time to flower, pod-set, and maturity of soybean (Glycine max L.) (Messina et al., 2006). QTL connection to reproductive life cycle progression appears to be convenient and relatively easy to achieve. Life cycle is important to fit cultivars to their target environment, and as Messina et al. (2006) showed, those QTL effects on life cycle account for a considerable amount of cultivar yield variability.
Others have tried to connect QTLs and genes in crop models to simulate more complex phenotypic outcomes for breeding purposes (Yin et al., 1999; Chapman et al., 2003; Yin et al., 2004; Hammer et al., 2004; Cooper et al., 2005; Hammer et al., 2005; Chenu et al., 2009; Hammer et al., 2010; Hammer et al., 2016). For example, Chenu et al. (2009) linked 11 QTL markers to three Genotype Specific Parameters (GSPs) affecting leaf elongation rate (LER) and one GSP affecting the anthesis-to-silking interval (ASI) of maize. The LER-connected coefficients were temperature sensitivity, VPD sensitivity, and water-potential sensitivity. These sensitivities for leaf area expansion and grain-set (the latter via ASI) were placed into a modified APSIM-Maize model, and all possible re-combinations of the QTLs were simulated to evaluate maize yield under drought conditions. The re-combinations were coded with the same approach as Messina et al. (2006) to indicate presence (1) or absence (0) of a given allele. Also important in this case is that these features of hourly leaf elongation sensitivity to temperature, VPD, and water potential were placed into the APSIM model (Hammer et al., 2010), which already responds to management (sowing date, sowing density, irrigation, N fertilization, soil characteristics) and environmental conditions (temperature, irradiance, rainfall), such that the APSIM model generated emergent phenotypic outcomes for simulated drought conditions. They simulated a number of G × E interactions resulting from combinations of the QTL markers for maize in different drought environments.
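The presence/absence coding of allele re-combinations described above can be sketched in a few lines of Python. This is an illustration only, not the procedure of Chenu et al. (2009): the marker names (`QTL1` to `QTL11`) and the function name are hypothetical, and the sketch assumes every locus is coded with exactly two states (0 or 1).

```python
from itertools import product


def allele_combinations(loci):
    """Yield every presence (1) / absence (0) combination of the given loci.

    Assumes each locus has exactly two coded states; the locus names are
    placeholders, not markers taken from any published study.
    """
    for alleles in product((0, 1), repeat=len(loci)):
        yield dict(zip(loci, alleles))


# Eleven hypothetical marker names, matching only the count given in the text.
loci = [f"QTL{i}" for i in range(1, 12)]
combinations = list(allele_combinations(loci))
n_virtual_genotypes = len(combinations)  # 2 ** 11 under the two-state assumption
```

Under the two-state assumption the number of virtual genotypes doubles with each added locus, which is why exhaustive simulation is practical for about a dozen markers but not for hundreds.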
Early crop simulation models were designed to be specific for a given crop, with little if any emphasis on genetic differences within a species (see BACROS and SUCROS examples in Penning deVries and van Laar, 1982). Other models such as CERES-Maize and CERES-Wheat have cultivar differences but with relatively few cultivar-specific parameters (GSPs). Both types of models are generic in the sense of simulating crop C, N, and water balances, and crop life cycle progression using generally accepted principles of biology and biogeochemical processes but with minimal consideration for real genetics. In these cases, there was little linkage to the true genetics (e.g., DNA, molecular markers). Most recent crop models are designed to simulate cultivars within a species; nevertheless, those models' GSPs are artificial constructs (parameters) that reproduce different phenotypic life cycles, daylength sensitivities, productivities, and seed size/growth rate traits without considering molecular genetics information. Although there is a need to link crop models to molecular genetics, fully accomplishing this task will take decades due to the complexity and lack of sufficient details at the process level that are normally captured by the crop simulation models. The goal in this paper is to describe how GSPs in current crop simulation models can be used to best consider traits for use in genetic improvement. Using crop simulation models to evaluate genetic improvement has advantages over pure bio-informatics analysis. This is because the models have embedded in them the functional sensitivities of physiological processes to temperature, water deficit, N stress, and other stresses, and they incorporate C, N, and water balances.
The latter features are missing in pure bio-informatics approaches; thus, yield predictions that integrate dynamic crop models in Whole Genome Prediction are more predictive of responses to weather, soil, and management (Technow et al., 2015; Cooper et al., 2016; Messina et al., 2018). Weather, soil, and management components are very important in order to fully understand the dynamic interactions between genetics and environmental conditions (Tsuji et al., 1998). We will briefly review general principles of crop simulation models and then introduce ways in which the crop models handle cultivar differences and genetic input.

Dynamic Crop Simulation Models -Processes and Responses to Environment
Dynamic crop growth models compute the crop carbon (C) balance on a daily (or hourly) basis, based on rates of photosynthesis, C losses to growth and maintenance respiration, C losses to senescence/abscission of plant parts, and partitioning of the net C to produce dry matter of different plant organs (Boote et al., 1998). Mathematical relationships for process sensitivities to environmental variations are included in the models. Those equations, rules and sensitivities of processes to weather and environment have been learned from analyses of measurements and studies of crop physiologists.
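As a minimal sketch of the daily carbon bookkeeping described above, the following Python function advances organ dry matter by one day. It is not the CROPGRO implementation: the function name, the organ names, the single growth-efficiency factor, and all numerical values are assumptions chosen for illustration only.

```python
def daily_carbon_step(organs, gross_photo, maint_resp, growth_eff,
                      partition, leaf_senescence):
    """Advance organ dry matter (g m-2) by one day.

    All names and values are hypothetical; real models compute each term
    from weather, crop state, and tissue composition.
    """
    if abs(sum(partition.values()) - 1.0) > 1e-9:
        raise ValueError("partitioning fractions must sum to 1")
    # Carbohydrate left after maintenance respiration (not allowed below zero).
    available = max(gross_photo - maint_resp, 0.0)
    # Growth respiration is folded into one conversion efficiency.
    new_dry_matter = available * growth_eff
    # Partition the new dry matter among organs.
    updated = {name: mass + new_dry_matter * partition[name]
               for name, mass in organs.items()}
    # Loss of leaf mass to senescence/abscission.
    updated["leaf"] -= leaf_senescence * organs["leaf"]
    return updated


organs = {"leaf": 100.0, "stem": 80.0, "root": 40.0, "seed": 0.0}
partition = {"leaf": 0.4, "stem": 0.3, "root": 0.2, "seed": 0.1}
next_day = daily_carbon_step(organs, gross_photo=30.0, maint_resp=5.0,
                             growth_eff=0.7, partition=partition,
                             leaf_senescence=0.01)
```

Because each day's result becomes the next day's starting state, any change to one term (for example, a cultivar with higher photosynthesis) propagates through all later days, which is the sense in which final biomass is an emergent outcome.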
In addition to simulating soil N and C balance processes associated with organic matter decomposition and mineral N dynamics, the models also simulate the crop N balance on a daily time step, with N input from root N uptake and/or N2-fixation, N losses to senescence/abscission, N partitioning to different organs, and N mobilization from one organ to another (Boote et al., 2008). Likewise, there are equations and sensitivities of root N uptake, N2-fixation, cost of N reduction, and N mobilization and partitioning that are parameterized from measurements. The daily crop-soil N balance simulates that tissue N concentrations will vary over time and, therefore, affect other processes such as photosynthesis, partitioning, and grain growth.
The soil water balance is also extremely important in crop models as most crops are grown under rainfed conditions. An important principle that links the crop C gain to the water balance is the obligate stomatal coupling of CO2 uptake (photosynthesis) to water vapor escape (transpiration) from the stomatal pores. The processes affecting water balance include inputs from rainfall and irrigation considering infiltration versus run-off, water movement through the soil (leaching and lateral flow), as well as evaporative losses from the soil surface and the crop transpiring surfaces (Ritchie, 1998; Boote et al., 2009). The soil-crop water balance usually honors principles of conservation of water and energy, on a daily or shorter time step, using ET methods such as the Penman-Monteith evapotranspiration equations (Allen et al., 1998).
Crop development is a fourth type of process in the models that leads to crop growth stage progression and crop phenology, which are very important in all dynamic crop growth models. Simply stated, crop development is driven by growing degree days (GDD) or photothermal time unit accumulation for successive phases of the crop from sowing to maturity, which allows the crop to fit in a given growing season niche.
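The accumulation of thermal time toward a phase threshold can be sketched as follows. This uses the simplest common form of growing degree days (daily mean temperature above a base temperature); the function names, the base temperature, and the threshold are illustrative assumptions, and actual models typically add cardinal temperatures and daylength modifiers.

```python
def growing_degree_days(tmax, tmin, tbase):
    """Daily GDD as mean temperature above a base, floored at zero."""
    return max((tmax + tmin) / 2.0 - tbase, 0.0)


def days_to_complete_phase(daily_temps, tbase, gdd_required):
    """Return the day on which accumulated GDD first reaches the threshold.

    daily_temps is a sequence of (tmax, tmin) pairs in degrees C.
    Returns None if the threshold is not reached within the record.
    """
    total = 0.0
    for day, (tmax, tmin) in enumerate(daily_temps, start=1):
        total += growing_degree_days(tmax, tmin, tbase)
        if total >= gdd_required:
            return day
    return None


# Hypothetical constant weather: 15 GDD per day with a 10 degree C base.
weather = [(30.0, 20.0)] * 10
phase_length = days_to_complete_phase(weather, tbase=10.0, gdd_required=40.0)  # 3
```

A cultivar-specific phase duration then corresponds to a different `gdd_required` value, so the same genotype completes the phase in fewer calendar days in a warmer environment.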
The word "dynamic" is very important in the context of crop growth models. This means that the crop responds dynamically to each day's weather and soil conditions, as influenced by that day's already-existing crop state (leaf area index, amount of biomass, root length density, etc.). That differs considerably from statistical regression models. Some models actually simulate and scale up from leaf to canopy assimilation on an hourly time step (Boote and Pickering, 1994; Pickering et al., 1995; Grant et al., 1995), although Class A weather data (solar radiation, Tmax, Tmin, and rainfall) are typically only available on a daily basis.
Finally, crop models integrate over time these multiple process rates with sensitivities to environment, and they simultaneously honor C, N, and soil water balances as the crops progress through their life cycle sequences. Thus, crop models represent inherent resource limitations and environmental sensitivities and illustrate that yield requires inputs for water, N and other nutrients, solar radiation, CO2, and temperature over time during growing seasons.

Genetics and Genotype-Specific Parameters (GSPs) in Crop Simulation Models
The above-described process-oriented models are relatively generic in their structure and simulation of vegetative and reproductive development, photosynthesis, respiration, translocation, partitioning, and reproductive growth processes. However, the desire to accurately simulate the growth and development of particular genotypes within a given species resulted in the incorporation of cultivar-specific information in the modeling structure, as done with the Decision Support System for Agrotechnology Transfer (DSSAT) crop models beginning in DSSAT V3.5 (Tsuji et al., 1998; Boote et al., 1998) and subsequent versions (Boote et al., 2003; Hoogenboom et al., 2019a, b). White and Hoogenboom (2003) proposed six levels by which genetic details are included in crop simulation models:
1. Generic model with no reference to species.
2. Species-specific model with no reference to genotypes.
3. Genetic differences represented by cultivar-specific parameters.
4. Genetic differences represented by specific alleles, with gene action represented through linear effects on model parameters.
5. Genetic differences represented by genotypes, with gene action explicitly simulated based on knowledge of regulation of gene expression and effects of gene products.
6. Genetic differences represented by genotypes, with gene action simulated at the level of interactions of regulators, gene-products, and other metabolites (in other words, the full genetic architecture relative to current crop state and its environment).
Most crop simulation models originated at level 2 for a given crop, based on experience with growth measurements and the developer's knowledge of a candidate crop. The early models developed in the "School of de Wit" were at level 1, including BACROS and SUCROS (Bouman et al., 1996). Some models remain at level 2, but many are now at level 3, with genetic differences represented by cultivar-specific parameters (GSPs). No candidate models have approached levels 5 and 6, while the GeneGro models for common bean (White and Hoogenboom, 1996; Hoogenboom et al., 1997; Hoogenboom and White, 2003), for soybean (Messina et al., 2006), and for wheat (White et al., 2008) have attempted level 4, with actions of specific genes (allelic 0 vs 1). Zheng et al. (2013) developed a gene-based model for time to heading in wheat that was incorporated into APSIM-Wheat, based on three vernalization genes (Vrn-A1, Vrn-B1, Vrn-D1) and a photoperiod sensitivity gene (Ppd-D1). Heading time was predicted with an error of 4.3 days for 210 spring wheat lines over many sowing dates and locations in Australia. Recently, Hwang et al. (2017) developed a simple crop phenology model based on QTL marker information. Oliveira et al. (2021) used a similar approach to integrate a dynamic QTL-based module (Vallejos et al., 2020) into the CSM-CROPGRO-Drybean model for predicting first flower appearance. For most crop models at level 3, there are two types of genetic information in the models that allow them to simulate unique growth and development responses of specific genotypes under specific environments. First, they have crop parameters and relationships (and this represents genetics as well) that are hard-wired in the computer code or are inputs from species files that create the crop species type (maize or soybean or wheat). Species traits and relationships are defined in the computer code for CERES-Maize along with a small read-in species file.
By contrast, CROPGRO has generic source code that allows the same code to be used for many species (up to 18 at present), but the species parameters and relationships are specified in a read-in species file. In addition, there are GSPs in read-in cultivar files that distinguish differences among cultivars, varieties, or hybrids within a species. While model developers may make this distinction for convenience of model users, in reality this is all genetically controlled and there should be no real distinction between species traits and cultivar traits within a species. For example, cardinal temperature parameterization of processes is defined in the CROPGRO species file at present, but cultivars may actually differ, for example, in heat tolerance of grain-set (see later example). It is a situation of initially being parsimonious with cultivar GSPs, especially when information is lacking.
In this paper, we will present examples with the DSSAT crop models, particularly the CROPGRO-legume model (Boote et al., 1998; Hoogenboom et al., 2019b) using the V4.7 release available from www.DSSAT.net, to describe genetic traits and how they are simulated to evaluate genetic yield improvement for various target environments. Genotype-specific parameters (GSPs) for the CROPGRO model are defined in Table 1. The most important GSPs are those that define daylength sensitivities and phase durations (EM-FL, FL-SH, FL-SD, SD-PM) that affect crop life cycle (Table 1). For example, soybean is daylength-sensitive (CSDL=12.58 hr and PPSEN=0.311 for a MG 5 cultivar), whereas peanut (Arachis hypogaea L.) is not daylength-sensitive (PPSEN=0.000). For simulating phenology and life cycle of winter cereals, vernalization parameters are important in addition to daylength and phase-duration parameters. For CERES-Maize, phase durations are described by growing degree days (GDD), while for CROPGRO, photothermal days (ptd) are used. CROPGRO has six vegetative GSPs affecting early leaf size (SIZFL), specific leaf area under optimum conditions (SLAVR), duration of leaf appearance (FL-VS), rate of leaf appearance (TRIFOL), duration of leaf expansion (FL-LF) beyond flowering, and leaf photosynthesis (LFMAX). Reproductive GSPs affect potential seed size (WTPSD), seeds per pod (SDPDV), individual grain growth rate (set by 1/SFDUR), the ptd duration of pod addition (PODUR), and threshing percentage (THRSH). Seed protein (SDPRO) and lipid (SDLIP) compositions are also defined as cultivar GSPs; these have become more important with the emphasis on nutrition security, especially under climate change.
Three example species within CROPGRO in Table 1 have dramatic differences in life cycle, especially in the duration of the seed-filling phase (SD-PM), with dry bean having the shortest and peanut having the longest seed-filling phase (Table 1 and Figure 1). Not surprisingly, the yield potential of the three crops generally follows that trend, with a lower yield for the shorter cycle dry bean crop. There are also differences in potential yield due to seed composition, with oil and protein requiring more energy for synthesis compared to carbohydrates. Dry bean has a short life cycle and lower yield, but compensates by having more rapid LAI development along with a higher sowing density of 25 plants m⁻² compared to 18 plants m⁻² for soybean and 16 plants m⁻² for peanut (Figure 1A). Peanut, under good fungicide treatment, does not self-senesce leaves, while dry bean and soybean have a grand senescence of LAI that is rapid after the beginning maturity stage (first mature pods). Peanut is an example of an indeterminate crop, with a long slow phase of pod addition (PODUR), less than 1.00 partitioning intensity (XFRT), and a long period of seed-filling (SFDUR and SD-PM).
The CERES-Maize model has six GSPs shown in Table 2. There are three GSPs related to life cycle and daylength sensitivity (P1, P2, and P5, which describe crop life cycle progression). P1 is GDD from emergence to end-juvenile, while P2 is daylength sensitivity, which affects an internal phase to end of floral initiation. The outcomes of P1 and P2 affect an internal phase that determines time from floral initiation to flowering. There are two reproductive parameters: G2, which is the genetic potential number of grains per plant, and G3, which is the single kernel growth rate under optimum conditions. PHINT is an additional GSP, which is meant to represent the rate of leaf appearance (PHINT being the GDD per leaf tip appearance). However, PHINT also re-scales simulated LAI (with a larger PHINT value reducing LAI). There are additional coefficients for potential leaf size and leaf longevity for the IXIM-Maize model, which is somewhat more mechanistic than CERES-Maize. The IXIM-Maize model has leaf-to-canopy photosynthesis (Lizaso et al., 2005) and more detailed relations for leaf area growth (Lizaso et al., 2003). Table 2 illustrates that longer life cycle hybrids grown in Iowa produce higher biomass and grain yield at maturity.

Some Important Principles that Crop Models Should Consider for Hypothesizing Genetic Improvement
To be truly useful for genetic improvement, crop models must realistically simulate crop physiological responses to environmental factors and how genetic variation affects the processes of crop carbon, water, and nutrient balance. These principles will be discussed in three ways: 1) Do the models have adequate mechanisms to simulate the traits of interest? 2) Do the models honor the C, N, water, and energy balances? 3) Do the models consider pleiotropic effects, e.g., physiological linkages?
The first important question is whether the current GSPs and the mechanisms in the crop models are sufficient to correctly simulate a given process and its associated genetics. Does the GSP actually represent the genetic trait? For example, radiation-use-efficiency is an approach used in many crop models, but RUE is a complex trait affected by many genes (QTLs) and there is no direct connection to leaf-level physiology that a molecular geneticist can measure. In other words, RUE-based models may be too simple for some purposes. That would imply that crop models should have a minimum level of physiology, such as leaf photosynthesis, respiration, and organ growth, to enable one to interpret genetic effects. That level of detail may be needed for the modeler to be able to communicate successfully with geneticists. As Parent and Tardieu (2014) stated, the phenotype (in this case variation in RUE) must be the emergent outcome of how genes express themselves in response to environment. Thus, RUE is an emergent outcome, not a genetic trait. This is also true for other GSPs, although some of them more closely describe physiological processes than RUE. A goal should be to incorporate genetic effects based on known or hypothesized pathways for their influence on physiological processes. For a molecular marker effect to be incorporated into a crop model, one should know the mode of action of that QTL. What is the specific mode of action and environmental sensitivity of a given QTL at its more basic level of action? Saying that the QTL affects RUE or yield or HI or seed size is not sufficient, as they are the emergent outcomes. Following the logic of Hammer et al. (2016) and Chenu et al. (2018), these are examples of complex traits that should be dissected into component traits at ecophysiology level which are then evaluated for their effects on those emergent outcomes.
A second aspect is whether the models honor the C balance, N balance, water balance, and energy balance. Is there a free lunch? This is not an easy question to understand, because one might argue that increasing leaf area index (LAI) is a simple way to increase early seedling vigor, but in fact, allocation to leaf area growth comes at the expense of less assimilate allocation to roots, and there are also feed-forward effects to consider. Another example is that greater constitutive (all the time) allocation to roots may enhance drought tolerance but may come at the expense of shoot (and reproductive) growth. A water-conservation trait associated with reduced transpiration (reduced leaf conductance) has been proposed for drought conditions. But the "no free lunch" problem to consider here is that reduced leaf conductance also reduces leaf photosynthesis, and the degree of photosynthesis reduction is the critical issue to resolve. Messina et al. (2015) showed that a limited-transpiration trait increased maize drought tolerance, resulting in a yield increase for dry environments, but a slight yield decrease for well-watered environments. Going one step further, the reduced conductance may warm the canopy above the optimum temperature for photosynthesis. N balance examples are also possible, and at the very least, models that lack a soil-plant N balance are not useful for hypothesizing genetic variation in environments where N is a major limiting factor.
A third consideration is whether the crop models consider built-in pleiotropic linkages. For example, increased leaf-level photosynthesis is commonly associated with increased specific leaf weight (SLW) or specific leaf nitrogen (SLN). That is what we would call a pleiotropic linkage because the intrinsic rate per unit leaf mass or N mass is not increased. If modeled correctly, this may come at a cost. If the plant makes high SLW or SLN leaves, the amount of total leaf area will be lower and light capture will be less. The net benefit will depend on whether the increased leaf rate offsets the decreased light capture. An example will be given of this later.
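The trade-off between leaf rate and light capture can be made concrete with a deliberately crude Python sketch. It is a toy index, not the CROPGRO leaf-to-canopy calculation, and every element is an assumption: leaf mass is held fixed, leaf rate per unit area is taken as proportional to SLW (constant rate per unit mass), light interception follows a simple exponential extinction with coefficient `k`, and the canopy index is the product of the two.

```python
import math


def canopy_index(leaf_mass, slw, rate_per_mass=1.0, k=0.5):
    """Toy canopy assimilation index (arbitrary units).

    leaf_mass: leaf dry matter per unit ground area (g m-2), held fixed.
    slw: specific leaf weight (g leaf per m2 leaf).
    All parameter values are hypothetical.
    """
    lai = leaf_mass / slw                    # thicker leaves give less leaf area
    leaf_rate = rate_per_mass * slw          # rate per unit leaf area rises with SLW
    intercepted = 1.0 - math.exp(-k * lai)   # fraction of light captured
    return leaf_rate * intercepted


def relative_gain(leaf_mass, slw_base, slw_new):
    """Ratio of the index with the new SLW to the index with the base SLW."""
    return canopy_index(leaf_mass, slw_new) / canopy_index(leaf_mass, slw_base)


# A 25% increase in SLW (40 -> 50 g m-2) with a sparse versus a dense canopy.
sparse = relative_gain(leaf_mass=50.0, slw_base=40.0, slw_new=50.0)
dense = relative_gain(leaf_mass=300.0, slw_base=40.0, slw_new=50.0)
```

In this toy, the 25% higher leaf rate returns much less than 25% when leaf mass is low, because the loss of leaf area cuts light interception; the gain approaches 25% only when the canopy already intercepts nearly all light. A realistic model would also need a saturating light response, which this sketch omits.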

The Role of Environment and Trait Response to Environment Is Very Important
The environment plays a major role in influencing ultimate plant phenotype (size, height, yield), and the phenotype is, or should be, an emergent outcome of how the individual genes and combinations of genes respond to environment. Hammer et al. (2010) indicate that crop models should have algorithms for underlying processes that link to genetics and lead to simulated phenotypes as emergent properties. Likewise, Parent and Tardieu (2014) argued that phenotype should be an emergent outcome of the genetic variation in processes as affected by environment. An example of this is harvest index (HI), which plant breeders may call a trait, but in reality HI is an emergent outcome of many genes that affect the timing, duration, grain-set, grain-growth rate, and intensity of partitioning to reproductive growth (even those five items are not single gene actions). So, how can the crop models be enhanced to get at the controllers of the underlying processes, rather than thinking that emergent outcomes are the crop model "traits"? For example, Reymond et al. (2003) and Chenu et al. (2009) simulated the effect of genetic markers linked to thermal rate of leaf area expansion and proposed individual QTLs for optimum leaf extension rate, for sensitivity to temperature, and for sensitivity to vapor pressure deficit. It is important to highlight that before QTL/gene effects can be incorporated into crop models, the models need to be developed and sufficiently enhanced to allow scaling up the impact from the leaf level to the plant level (Chenu et al., 2008). In their case, this involved inclusion of an hourly meteorology module and a leaf area growth module that simulated rate of leaf initiation, leaf tip appearance, leaf ligule appearance, and leaf expansion rate dependent on hourly temperature, hourly vapor pressure deficit, pre-dawn soil water potential, and leaf rank position.
Other examples are the studies of gene effects on cultivar life cycle of dry bean (White and Hoogenboom, 1996; Hoogenboom et al., 1997; Hoogenboom and White, 2003; Hoogenboom et al., 2004) and soybean (Messina et al., 2006). For dry bean and soybean, these are well-researched genes that are known to respond differently to daylength and temperature. Thus, the time to flower and the time to maturity, as well as final grain yield, are emergent outcomes of individual genes and their combinations. A G x E interaction is a probable outcome if a gene for strong daylength effect is present, but that will occur only if the crop is compared for short versus long daylength environments. Messina et al. (2006) linked GSPs in the CROPGRO-Soybean model to genes with a multiple linear regression approach, where 1 or 0 represents the presence or absence of the dominant allele. They illustrated how six loci related to photoperiod sensitivity and determinacy (E1, E2, E3, E4, E5, and E7) in soybean could be translated into GSPs in the CROPGRO-Soybean model. For example, critical short daylength (CSDL) and EMFL were defined as a function of the number of dominant loci (NLOCI) and the presence or absence (1 or 0) of critical loci. The equation for photothermal days from emergence to flowering (EMFL) is the following:

EMFL = 20.77 + 2.1 E1 + 1.8 E3
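To show how such a linear gene-to-GSP relationship is evaluated, the short Python sketch below applies the EMFL equation exactly as it is printed above, with E1 and E3 coded as 1 (dominant allele present) or 0 (absent). The function name is hypothetical, and any terms of the published regression that are not shown in the text are not reconstructed here.

```python
def emfl_photothermal_days(e1, e3):
    """EMFL (photothermal days) from allele codes, per the equation in the text.

    e1, e3: 1 if the dominant allele at that locus is present, else 0.
    """
    for allele in (e1, e3):
        if allele not in (0, 1):
            raise ValueError("allele codes must be 0 or 1")
    return 20.77 + 2.1 * e1 + 1.8 * e3


# EMFL for the four possible E1/E3 combinations.
emfl_by_genotype = {(e1, e3): emfl_photothermal_days(e1, e3)
                    for e1 in (0, 1) for e3 in (0, 1)}
```

With this coding, a cultivar that differs from another only by carrying the dominant E3 allele receives an EMFL that is 1.8 photothermal days longer; the calendar-day effect then depends on the temperature and daylength of the environment in which the model is run.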

Linking QTL Markers to GSPs in CROPGRO-Soybean
The QTL genetic markers associated with these genes were determined for a set of cultivars. In the example above, cultivar Savory is e1e2e3e4e5, and differs from Vinton 81, which is e1e2E3e4e5. Other cultivar coefficients, PPSEN, EM-FL, FL-SD, FL-VS, SD-PM, V1-JU, and R1PRO, were also affected by some of the same loci and number of loci. Putting these together allowed a different phenotype to be simulated based on absence or presence of multiple dominant loci (such as E3). Then, with independent data on maturity and yield from seven trial locations in Illinois, USA, over 5 years, the CROPGRO-Soybean model with genetic coefficients based on E genes predicted 75% of the variation in days to maturity and 54% of the variation in yield across the 35 site-year combinations (Messina et al., 2006).

Linking Genetic Architecture to GSPs in CROPGRO-Drybean to Develop Ideotypes
To connect genes to physiological processes, one can hypothesize ideotypes based on either dominant or recessive genes. White and Hoogenboom (1996) developed the first gene-based model, GeneGro, from the BEANGRO model (Hoogenboom et al., 1992). In CSM-GeneGro six known genes were linked to the physiological processes of the BEANGRO model through the GSPs. These included the Ppd gene for basic photoperiod response, the Hr gene to enhance the effect of Ppd, the Fin gene for determinacy, the Fd gene for early flowering and maturity, and three seed size genes, i.e., Ssz1, Ssz2, and Ssz3. The CSM-GeneGro-Drybean model was calibrated for 46 cultivars and 10 trials conducted in Canada, USA, Mexico, and Colombia. The same cultivars were then evaluated with an independent data set from 26 trials for the same locations for a total of 333 observations (White and Hoogenboom, 2003). Following calibration and evaluation, 96 genotypes were created for the CSM-GeneGro-Drybean model based on either dominant or recessive genes for Ppd * Hr * Fin * Fd * Ssz1 * Ssz2 * Ssz3. These 96 genotypes were then evaluated for dry bean production environments including Michigan, Idaho, and Washington, using standard management practices with at least 30 years of historical weather data to determine the G * E interactions and to identify the best performing ideotypes for each location. The genotype with all genes dominant except for Ssz3 resulted in the highest mean yield. With respect to that ranking, Genotype 2 (1111110), Genotype 1 (1111111), and Genotype 9 (1110111) ranked highest for Michigan; Genotype 9 (1110111), Genotype 2 (1111110), and Genotype 10 (1110110) ranked highest for Twin Falls, Idaho; and Genotype 10 (1110110), Genotype 9 (1110111), and Genotype 2 (1111110) ranked highest for Prosser, Washington. Two genotypes (Genotype 2 and Genotype 9) ranked among the top three at all three locations.
These environments differed in growing season temperature and, therefore, the Fd gene for early versus late flowering and the Sss3 gene for seed size were the critical genes.
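The construction of dominant/recessive genotype combinations described above can be sketched in a few lines. This is only an illustration, not the procedure used for CSM-GeneGro-Drybean. It assumes that the seven-digit codes quoted in the text list the loci in the order Ppd, Hr, Fin, Fd, Ssz1, Ssz2, Ssz3 with 1 = dominant and 0 = recessive, and that genotypes are numbered in descending order of that code. The pruning rule (skipping Hr-dominant genotypes on a recessive Ppd background, because Hr is described only as enhancing Ppd) is purely our assumption; it happens to leave 96 combinations, but the source does not state how its 96 genotypes were chosen.

```python
from itertools import product

# Assumed locus order for the seven-digit codes (1 = dominant, 0 = recessive).
LOCI = ("Ppd", "Hr", "Fin", "Fd", "Ssz1", "Ssz2", "Ssz3")


def enumerate_genotypes():
    """Return {genotype_number: code} for all retained allele combinations."""
    genotypes = {}
    number = 0
    # product((1, 0), ...) yields codes in descending order: 1111111, 1111110, ...
    for alleles in product((1, 0), repeat=len(LOCI)):
        state = dict(zip(LOCI, alleles))
        # Hypothetical pruning rule (an assumption, not from the source):
        # Hr only modifies Ppd, so Hr-dominant on a recessive Ppd background
        # is treated as redundant and skipped.
        if state["Ppd"] == 0 and state["Hr"] == 1:
            continue
        number += 1
        genotypes[number] = "".join(str(a) for a in alleles)
    return genotypes


GENOTYPES = enumerate_genotypes()
print(len(GENOTYPES))
print(GENOTYPES[1], GENOTYPES[2], GENOTYPES[9], GENOTYPES[10])
```

Under these assumptions the numbering agrees with the codes quoted in the text for Genotypes 1, 2, 9, and 10, which is the only check available here.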

Preparing to Simulate Genetic Improvement with a Crop Model
7.1 Use a crop model suited to the task. Does the model have the desired genetic trait capability?
Modelers and geneticists need to understand the limitations of the crop model and determine whether the model, with its GSPs, has the ability to simulate desired genetic variations (QTL, genes) in a realistic way. For example, if the model cannot simulate heat stress, disease tolerance, salinity, water-logging, soil Al saturation, or soil compaction, then conducting synthetic breeding studies of those issues is not possible. Furthermore, if the genetic traits are related to drought, N fertilization, temperature, or daylength effects, then it is important that the models be tested beforehand and documented for their capabilities to simulate responses to those factors, especially with respect to actually simulating a soil water and nitrogen balance. If the target trait relates to variation in leaf-level photosynthesis, then the crop model should simulate leaf photosynthesis at the appropriate temporal scale and scaling up to canopy assimilation in a realistic way.

7.2 Clearly define the target environment (weather, soils, and anticipated crop management, whether irrigated or rainfed, or well-fertilized or not).
Target environments will vary by region, rainfall, temperature, soils, and management. It is important to carefully characterize the crop environment of interest, especially considering the major production areas the breeding is targeted for (Chenu, 2015; Cooper et al., 2016). For example, the crop cycle may be defined by intermittent droughts, or terminal drought, or limited by killing frosts, or by a desire for two (rice) crops in a season. Obtain data (phenological data, final yields, time-series data on biomass, LAI, and reproductive growth) for baseline crop cultivars for the target regions along with the weather, soils, and management inputs needed to run the model. Do some initial calibration to start from a realistic yield level.

7.4 Consult with plant breeders and geneticists to jointly decide upon the genetic traits to vary (and their likely feasible range) to improve yield, modify the life cycle, or meet another goal.
Chenu et al. (2018) proposed that consultation with breeders and geneticists should be iterative with feedback, to avoid misconceptions on what is possible. They recommended an integrated approach that combines insights from crop modelling, physiology, genetics, and breeding to characterize traits contributing to yield gain for target environments. Furthermore, they recommended that complex traits such as RUE, HI, or transpiration efficiency should be broken down into the component traits that contribute to them, to allow improved physiological understanding.
It is important that the extent of variation in a genetic trait be limited to ranges documented in the literature or known by plant breeders/geneticists. Knowledge of the possible genetic variability will prevent simulating unrealistic results that are outside the range of genetic feasibility. For example, RUE is thought to be a relatively conservative trait. So, under that constraint, RUE should not be varied much, because models show large almost 1:1 increases in yield with increased RUE.

7.5 Determine whether the GSPs and parameters available in the crop model have the ability to simulate those desired genetic traits in a realistic manner.
This may require comparison to known response data.
Are there linkages among traits? These may occur genetically or because of conservation of C and N. First, there are true genetic linkages (where two genes reside in close proximity on the chromosomes) and therefore the outcome illustrates true linkage. The Fin gene in common bean is an example of this. It affects determinacy as well as time to first flower appearance and is thought possibly to be just a single gene. Second, there are pleiotropic linkages that can be understood from a physiological or C or N balance approach. For example, a 1% increase in either seed protein or seed oil for soybean is simulated to reduce yield by about 0.6 to 0.8%, simply because of the additional energy needed to produce protein or oil, in contrast to starch compounds that require less energy to produce (Boote and Tollenaar, 1994). Seed protein concentrations of soybean cultivars have declined by about 2 absolute percentage units over the past three decades in accordance with cultivar release date even while grain yield has increased (Naeve, 2019, US Soy Quality Report). It is possible that plant breeders selected for yield without paying sufficient attention to maintaining seed protein. An example of pleiotropic linkage caused in part by a physiological trait and N balance conservation is provided by Hammer et al. (2016), in which the stay-green outcome of sorghum (Sorghum bicolor) can be a simulated outcome of a dwarfing trait that results in less N in stem but more N in leaf, which sustains LAI longer during grain-filling. Likewise, pleiotropic linkage of a physiological trait and water balance/conservation can be demonstrated by simulated tolerance to terminal drought being an emergent outcome of lower early leaf area (less tillering in sorghum, Hammer et al., 2016) or less early plant vigor in wheat (Bourgault et al., 2020).
Another example of pleiotropic linkage not caused by "pure" genetics, but by C balance and leaf physics, is the strong relationship of increased leaf photosynthesis (Amax) to increased specific leaf weight (SLW), which occurs in many species including soybean (Dornhoff and Shibles, 1970; Buttery et al., 1981; Morrison et al., 1999). This creates a pleiotropic linkage not caused by "specific" genetics, but it affects seasonal LAI, light interception, and canopy assimilation. As simulated by Boote et al. (2003), increasing leaf photosynthesis with strict coupling to SLW gives much less benefit to canopy photosynthesis and yield than a pure increase in leaf rate not coupled to SLW. The reason is that increases in SLW increase leaf photosynthesis rate but reduce leaf area expansion and early season LAI. The net effect on growth and yield is complex and interactive effects occur depending on row spacing, sowing density, crop life cycle, and elevated CO2 (Boote et al., 2011).
The "limited transpiration" traits proposed by Sinclair et al. (2010) and Gilbert et al. (2011), provide another example of whether models are able to realistically simulate a trait. In one case the limited transpiration is routinely proposed to occur relatively soon when the crop experiences mild stress, while in another case limited transpiration is induced only under high VPD. These traits can give complex responses, because they influence C balance, water balance, and energy balance, and additionally depend on time-series timing of rainfall events and soil water depletion. At its core, the action of the constitutive "all-the-time" limited transpiration trait is that a small reduction in leaf conductance will reduce leaf transpiration and canopy transpiration, thereby allowing some soil water conservation prior to water deficit periods. The water conservation effect is hypothesized to maintain or increase yields because the crop extends better into a future rainfall period. But, an important issue here is that reduction in stomatal conductance also reduces leaf photosynthesis and the extent of the reduction in leaf rate is the crux of the problem. The reduction of photosynthesis (case 1) during good rainfall season may reduce yield, even while the same trait may increase yield under severe drought seasons. This is what Battisti et al. (2017) found with that approach in CROPGRO-Soybean, although Sinclair et al. (2010) with a simpler model (SSM) proposed greater benefits under water-limitation and very small negative effect under good rainfall environment. Both of these simulated cases represent G x E interactions (same gene action, but effect depends on environment). Case 2 with the photosynthetic reduction under high VPD is a more adaptive response since it does not act under moderate evaporative demand environments. 
Nevertheless, the reduction in photosynthesis associated with reduced conductance (transpiration) under high VPD may lead to unrealistic simulated values of internal CO2 concentration (Ci). The data of Gilbert et al. (2011), if taken to the stronger conductance sensitivities to VPD, can give a Ci/Ca ratio approaching 0.4 (like C4 crops) under high VPD, which is a concern because that has not been reported previously and tends to violate theory.
As an independent example of this, Cuadra et al. (2020) incorporated the high VPD effect on photosynthesis and conductance into the hourly energy balance version of CROPGRO, and reductions in canopy transpiration were obtained, although the benefits to growth and yield have not yet been investigated. In that hourly energy balance model, with strong sensitivities of conductance to high VPD using the Ball-Berry-Leuning sensitivity to VPD (Leuning, 1995; Miner et al., 2017), the simulated Ci/Ca ratio fell as low as 0.4 under extremely large VPD, although it was between 0.6 and 0.8 most of the time. The message is that the modelers and their models must honor water balance (nothing is free) and the resulting effects on crop C balance and energy balance (canopy temperature rises under large VPD, which pushes canopy transpiration back up somewhat).
Sinclair et al. (2010) conducted simulations illustrating that reduction in transpiration at high VPD gave a soybean yield gain near 200 kg ha⁻¹ in the USA. Battisti et al. (2017), using a similar approach with the CROPGRO-Soybean model, simulated a gain of 1 to 75 kg ha⁻¹ for most regions in Brazil, although the gain was 75 to 150 kg ha⁻¹ for some regions. They suggested that differences between the two examples could be attributed to either the climate of the regions or the level of model penalization on transpiration and photosynthesis.
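The Ci/Ca concern raised above can be made concrete with the standard CO2 supply relation for a leaf, A = g_c (Ca - Ci), where g_c is stomatal conductance to CO2 (conductance to water vapour divided by 1.6). The sketch below is a simplification that ignores boundary-layer and mesophyll conductance, and the function name and input values are illustrative assumptions rather than values from any of the cited studies. It shows that Ci/Ca falls toward 0.4 only when conductance is cut much more strongly than photosynthesis.

```python
def ci_over_ca(assimilation, gs_water, ca=400.0):
    """Ci/Ca from A = (gs_water / 1.6) * (Ca - Ci).

    assimilation: net leaf CO2 uptake (umol m-2 s-1)
    gs_water:     stomatal conductance to water vapour (mol m-2 s-1)
    ca:           ambient CO2 concentration (umol mol-1)
    Boundary-layer and mesophyll conductances are ignored (assumption).
    """
    gs_co2 = gs_water / 1.6  # ratio of water vapour to CO2 diffusivity in air
    return 1.0 - assimilation / (gs_co2 * ca)


# Illustrative inputs only (not measurements):
reference = ci_over_ca(assimilation=25.0, gs_water=0.5)
# Conductance cut by 80% while photosynthesis falls by only 40%:
restricted = ci_over_ca(assimilation=15.0, gs_water=0.1)
print(round(reference, 2), round(restricted, 2))
```

A model that reduces conductance under high VPD should therefore reduce photosynthesis consistently, otherwise the implied Ci/Ca drifts outside the range discussed in the text.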

7.6 Conduct the hypothetical trait simulations for multi-year weather across differing target environments (weather, soils, and management) to determine the genetic improvement and any genotype by environment by management interactions.
Examples follow.

In-silico Modeling of Traits for Genetic Improvement
Assume that you have defined the target environment (multiple years of weather for a site and the soils), along with the basic cultivar type to define season-length and the crop management (rainfed or irrigated, and N fertilization level). A crop simulation model for the given crop would be simulated with those soils and 20 to 30 weather years, with a baseline cultivar and crop management. The outcome would be a mean yield with the distribution of yields depending on weather variability. This defines your baseline yield case. Ideally, one should have yield trial data that verifies that the model simulations compare reasonably to field-observed yields for the baseline cultivar. It is important to know that the future hypothetical evaluations are well-grounded. Caution is needed if yield trial data experience pest and disease losses, because most crop models do not account for pest, disease, and other biotic stress effects.
Next, one would create hypothetical genetic traits. Assume that one wants the same season length, with earlier anthesis but longer grain-filling duration. This would require decreasing the GSP input of photothermal unit requirement to anthesis, and increasing the trait for photothermal units from anthesis to physiological maturity. These two traits would be considered as "genetic coefficients" for the crop model. The model would be run for the new virtual cultivar with the same weather and soils, to see how the two newly modified traits affect the percent change in mean yield, as well as the resulting yield variability. The simulations should be conducted with multiple target environments, to evaluate whether the new trait is best in just some environments or all environments (Loffler et al., 2005; Putto et al., 2008; Loison et al., 2017). This would be equivalent to testing for a G x E effect. The target environments should be described in terms of weather (over multiple seasons), water availability, soil physical and chemical constraints, desired crop life cycle, and even management constraints. The target environments can be: 1) low N or high N level, 2) rainfed versus irrigated, 3) altered management, 4) current versus future climate (higher CO2, increased temperature, or altered rainfall). The advantage is that virtual crop modeling can quickly evaluate trait effects on yield responses for multiple weather and soil environments (or future climate change), without actually doing the multi-location field trials. This can save years and millions of dollars in conducting multi-location trials. However, there are important things to consider when these computer experiments are conducted:
1) What is the degree of confidence in the crop model ability to simulate that trait or to simulate yield response to weather and management generally?
2) What is the knowledge of the range of genetic variability in the trait that one is simulating and what is possible?
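The workflow described in this section (baseline cultivar, virtual cultivar, same weather years, several target environments) can be sketched as follows. Everything here is hypothetical: `toy_simulate` is a two-line stand-in for a real crop model, the parameter name `fill_days` and all numbers are placeholders, and only three weather years are used where the text recommends 20 to 30. The point is only the bookkeeping: percent change in mean yield and yield variability per environment.

```python
from statistics import mean, pstdev


def toy_simulate(gsp, season_rain_mm):
    """Hypothetical stand-in for one crop model run; returns yield (kg/ha)."""
    potential = 100.0 * gsp["fill_days"]   # yield allowed by grain-filling duration
    water_limited = 8.0 * season_rain_mm   # yield allowed by seasonal water supply
    return min(potential, water_limited)


def summarize(yields):
    """Mean yield and coefficient of variation across weather years."""
    avg = mean(yields)
    return {"mean": avg, "cv_pct": 100.0 * pstdev(yields) / avg}


def compare(simulate, baseline, virtual, environments):
    """Run baseline and virtual cultivars over the same weather years per environment."""
    report = {}
    for name, weather_years in environments.items():
        base = [simulate(baseline, w) for w in weather_years]
        virt = [simulate(virtual, w) for w in weather_years]
        change = 100.0 * (mean(virt) - mean(base)) / mean(base)
        report[name] = {
            "baseline": summarize(base),
            "virtual": summarize(virt),
            "change_pct": change,
        }
    return report


BASELINE = {"fill_days": 35}   # placeholder genetic coefficient
VIRTUAL = {"fill_days": 40}    # virtual cultivar with longer grain filling
ENVIRONMENTS = {"wet": [500, 550, 600], "dry": [300, 350, 400]}  # seasonal rain, mm

REPORT = compare(toy_simulate, BASELINE, VIRTUAL, ENVIRONMENTS)
for name, row in REPORT.items():
    print(name, round(row["change_pct"], 1), round(row["virtual"]["cv_pct"], 1))
```

In this toy case the longer grain-filling trait raises mean yield only in the wet environment, which is the kind of G x E pattern the multi-environment comparison is meant to reveal. With a real model, `toy_simulate` would be replaced by a call that runs the crop model for one cultivar and one weather year.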

Evaluating Possible Range of a GSP Compared to Literature-Reported Range
Crop models can be used to evaluate the quantitative response over the entire possible range of a single trait, such as rate of root depth progression. Crop models can also be used to evaluate response to multiple combinations of traits. In such evaluations, it is essential that the feasible genetic range for each given GSP be considered relative to reported literature. Figure 2 from Boote et al. (2003) illustrates soybean yield response to simulated rate of root depth progression for MG 3 soybean grown on a deep Nicollet clay loam at Ames, Iowa. The published range of variation in this trait comes from Kaspar et al. (1978, 1984). This figure illustrates several important points: 1) that the response to a given trait over its whole range may become asymptotic or saturating (responses are not necessarily linear), 2) that the response within the feasible published range of variation may be relatively modest as shown here, and 3) that yield variability may be influenced by the trait; in this case, deeper rooting leads to less variability in yield.
The full range of yield response to another frequently studied genetic trait, light-saturated leaf photosynthesis (LFMAX), is illustrated in Figure 3 from Boote et al. (2003). The published literature for soybean indicates that most of the genotypic increase in light-saturated leaf rate is associated with an increase in specific leaf weight (SLW, the reciprocal of SLA) (Dornhoff and Shibles, 1970; Buttery et al., 1981; Morrison et al., 1999). So, one cannot simply propose increases in leaf photosynthesis without considering this pleiotropic relationship to SLW. In other words, there is a cost (of crop dry matter) to pay for making thicker leaves, which results in smaller canopy LAI and reduced light interception. As a result, the optimistic picture of increasing yield with increasing single leaf photosynthesis is made more complex by this pleiotropic connection of leaf photosynthesis to SLW. In Figure 3, the yield response to LFMAX with coupling to SLW is really quite modest (rising, but only slowly), while the response to LFMAX with no coupling to SLW is much more optimistic and is only slowly asymptotic. Also important, the range of measured genetic variation in light-saturated LFMAX of soybean (shown as a horizontal bar in Figure 3) indicates relatively modest potential for improving yield based on increased leaf photosynthesis. The net effect (of increased SLW to obtain increased LFMAX) on growth and yield is complex and interactive effects occur depending on row spacing, sowing density, crop life cycle, and elevated CO2 (Boote et al., 2011). The increased SLW trait can give neutral or negative effects under low sowing density, wide row spacing, and ambient CO2, but becomes increasingly more positive for yield at narrow row spacing, high sowing density, and elevated CO2 (all of which compensate for the reduced LAI).
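The reasoning about LFMAX coupled to SLW can be illustrated with a deliberately crude calculation. This is not the CROPGRO canopy photosynthesis routine. It assumes a fixed leaf mass, a strict 1:1 proportionality between LFMAX and SLW, Beer's law light interception with an extinction coefficient of 0.6, and a canopy proxy equal to leaf rate times the fraction of light intercepted. All names and values are hypothetical and expressed in relative units.

```python
import math


def canopy_proxy(lfmax, leaf_mass, slw, k=0.6):
    """Crude canopy assimilation proxy: leaf rate x fraction of light intercepted."""
    lai = leaf_mass / slw                    # thicker leaves give less area per unit mass
    intercepted = 1.0 - math.exp(-k * lai)   # Beer's law light interception
    return lfmax * intercepted


def gain_pct(leaf_mass, increase=0.10, coupled=True):
    """Percent change in the proxy for a relative increase in LFMAX.

    With coupled=True, SLW rises by the same fraction as LFMAX (assumption).
    Baseline LFMAX and SLW are both 1.0, so leaf_mass equals baseline LAI.
    """
    base = canopy_proxy(1.0, leaf_mass, 1.0)
    slw = 1.0 + increase if coupled else 1.0
    new = canopy_proxy(1.0 + increase, leaf_mass, slw)
    return 100.0 * (new - base) / base


for baseline_lai in (1.0, 5.0):
    print(
        baseline_lai,
        round(gain_pct(baseline_lai, coupled=False), 1),
        round(gain_pct(baseline_lai, coupled=True), 1),
    )
```

Under these assumptions the coupled gain is always smaller than the uncoupled gain, and the penalty is largest when LAI is low, which is consistent in direction with the text's point that conditions compensating for reduced LAI make the higher-SLW trait more favorable.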

Evaluating Single and Multiple Combinations of Traits for Genetic Yield Improvement
We can use crop models to evaluate single effects of traits as well as combinations of multiple genetic traits to improve yield. Plant breeders, in their selection process, will happily accept all traits that contribute to increased yield unless quality and disease resistance are compromised. Particularly with past traditional plant breeding, breeders may not have known all the specific traits of their improved cultivars, or what the traits may be contributing, but they were happy to accept the improvement in yield. Thus, crop modeling for yield improvement should also consider the degree of additivity of multiple traits. It is important to evaluate the traits in different environments, e.g., water-limited, elevated temperature, elevated CO2, and management conditions, to appreciate G x E interactions, as shown in the simulation examples below. Boote et al. (2001) analyzed the growth patterns of old and new soybean cultivars and concluded that there were multiple traits (up to 5 or 6, as interpreted from the CROPGRO-Soybean analysis), which contributed to the observed yield gains of 12 to 23% for the improved cultivars. These traits included earlier onset of podset, faster pod addition, longer grain filling, increased leaf photosynthesis, and slower N mobilization. Later crop model simulations by Boote et al. (2011) evaluated the effect of combining multiple traits on soybean yield and suggested that the effects of traits were generally additive (Table 3). Of particular interest is that some traits were management- and climate-dependent. For example, some traits such as determinacy, early anthesis, and increased SLW gave modest or no yield improvements in ambient CO2 and low management (wide rows, low sowing density, low input) conditions, but the same traits, when placed in high CO2 and/or high management, were contributors to increased yield.
Traits such as early anthesis and determinacy that reduce LAI are not a disadvantage under conditions that favor abundant vegetative growth. This message is particularly important for breeding for crop yield improvement under present and future CO2 increase. In fact, some GSPs gave a greater response under elevated CO2 than under 350 ppm CO2 (see highlighted increases in bold in Table 3). The additivity of multiple GSPs in two, three, and four-way combinations progressively increased yield above the base MG 3 cultivar. A single trait of longer filling period gave 5.3% yield increase, filling period plus high SLW gave 6.3%, and filling period plus slower N mobilization gave 10.2% increase in yield (these are conservative possibilities). Staying within feasible genetic range, the three-way combination of 15% longer filling period, 10% higher LFMAX, and 10% slower N mobilization from vegetative organs gave 17.7% increase in yield. Pushing the envelope for somewhat greater changes (such as 15% increase in LFMAX) in three or four-way combinations resulted in a yield increase of 20.6 to 23.9%, approaching what the plant breeders have achieved since the Williams 82 cultivar was released in 1982 (Williams 82 is the baseline in Table 3, which also corresponds approximately to the period when CO2 was about 350 ppm). The responses to GSPs and the additivity of GSPs continued to be present at the higher CO2 concentration of 500 ppm.
Yield improvements with single and multiple GSP trait variations were simulated with the CROPGRO-Peanut model using 15 years of rainfed conditions in Ghana under hypothesized increase in CO2 and a +3 °C increase in temperature (Table 4). The short-season cultivar, Chinese, had been well calibrated for multiple seasons in Ghana and the soil water-holding characteristics were also well established by comparison to observed data (Naab et al., 2004; Naab et al., 2005).
Because this is a short-season cultivar with a relatively low LAI, yield increased 4.2% with delayed flowering (+10% EM-FL) and increased 9.0% with longer grain-filling duration (+10% SD-PM), and 12.4% with both traits. Likewise, with a relatively low LAI, an increase in photosynthesis (+10% LFMAX) caused a 7.3% increase in yield, while an increase in SLA (decreased SLW) increased yield by only 0.5%. An increase in partitioning intensity increased yield by 3.4% because this cultivar is relatively indeterminate. Moreover, combinations of traits such as delayed flowering, longer grain fill, higher partitioning (+10% XFRT), and higher photosynthesis were additive and increased yield by 24.7%. Of particular interest is that three traits (delayed flowering, longer grain-filling, and increased partitioning) increased yield more under the +3 °C temperature scenario than at ambient conditions. This would be a G x E interaction. Under future climate change, a temperature increase of this magnitude is probable and would cause a shorter life cycle, which the increased EM-FL and increased SD-PM help to offset. Zheng et al. (2016) also suggested that genetic alleles for longer life cycle would be needed for crops such as wheat under the rising temperature anticipated with climate change. In addition, the peanut model simulates a reduced rate of pod addition and partitioning under elevated temperature (the relationship in the model is based on observations under elevated temperature; Prasad et al., 2003; Boote et al., 2018), and the increased partitioning helps to offset that reduction.
Crop growth models can be used to investigate traits contributing to past genetic yield improvement by comparison to phenotyping data collected on old versus improved cultivars. This was accomplished by Narh et al. (2015), who evaluated improved peanut cultivars released by ICRISAT for use in West Africa, by comparison to cv. Chinese, the baseline cultivar in that region.
Narh et al. (2015) used the CSM-CROPGRO-Peanut model as an analysis tool to evaluate possible genetic contributions to yield increase among a set of 19 cultivars that included improved cultivars from ICRISAT plus local cultivars. Data on phenology, biomass, pod mass, pod HI, and final yield were collected on 19 cultivars over two seasons at four sites in Burkina Faso and Ghana. Optimization techniques were used with the model to solve for GSPs. The two highest yielding lines, Nkatesari and ICGV-IS 96814, both ICRISAT-released genotypes, yielded 76 to 80% more than Chinese. The model was able to successfully mimic this yield increase with the GSPs described in Table 5. Traits important to yield under this no-fungicide rainfed production included longer life cycle, higher photosynthesis, higher partitioning, longer grain-fill, and modest leafspot resistance. Leafspot resistance was partial, but certainly contributed to the higher leaf photosynthesis (greater crop growth rate) along with extended life cycle. The longer life cycle, longer grain-filling, and higher partitioning were also documented in the phenology and pod HI observations.
Table 5. Observed pod yield and observed pod harvest index over two seasons at four sites, and GSPs of peanut cultivars derived from data using the CROPGRO-Peanut model with an optimizer (from Narh et al., 2014, 2015). Only the three highest yielding cultivars and the three local check cultivars are shown here. † PD, Photothermal days.
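The additivity of trait effects reported earlier in this section for the peanut simulations can be checked with simple arithmetic. The sketch below uses only the percentages quoted in the text (delayed flowering 4.2%, longer grain filling 9.0%, higher partitioning 3.4%, higher photosynthesis 7.3%, two-way combination 12.4%, four-way combination 24.7%). It assumes that the 3.4% partitioning effect corresponds to the +10% XFRT trait in the four-way combination. The function name and the choice to report both additive and multiplicative expectations are ours.

```python
import math


def additivity(single_gains_pct, combined_gain_pct):
    """Compare a simulated combined gain with expectations from single-trait gains."""
    additive = sum(single_gains_pct)
    multiplicative = 100.0 * (math.prod(1.0 + g / 100.0 for g in single_gains_pct) - 1.0)
    return {
        "additive_pct": additive,
        "multiplicative_pct": multiplicative,
        "deviation_from_additive": combined_gain_pct - additive,
    }


# Percent yield gains quoted in the text for the peanut example.
TWO_WAY = additivity([4.2, 9.0], combined_gain_pct=12.4)
FOUR_WAY = additivity([4.2, 9.0, 3.4, 7.3], combined_gain_pct=24.7)

for label, result in (("two-way", TWO_WAY), ("four-way", FOUR_WAY)):
    print(label, {key: round(value, 1) for key, value in result.items()})
```

In both cases the simulated combined gain lies within about one percentage point of the sum of the single-trait gains, which is what "generally additive" means here.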

Simulating Genotype by Environment Interactions
Genotype by environment interactions are of great interest to plant breeders because this helps select cultivars and cultivar traits for different target environments. Table 6 illustrates strong G x E interactions for the case of simulated GSP traits with the CSM-CROPGRO-Chickpea model for crops grown either under full irrigation or under water-limited terminal drought. Chickpea (Cicer arietinum) in India is typically sown at the end of the monsoon and depends on stored residual water on high-clay soils (Singh et al., 1999a, b). The target environment, irrigated versus rainfed, is important, because the responses to GSPs were often opposite and large for contrasting soil water availability. For example, the effect of increased SLW was to decrease yield 11% under irrigation but to increase yield 18% under water-limitation. Increased SLW had a negative effect on yield under irrigation because it reduced LAI and light interception. But the same higher SLW trait was beneficial under rainfed conditions because it reduced LAI, light interception, and transpiration, thus conserving water for grain growth later in the life cycle. Later flowering acted to increase LAI, and gave 15.4% yield increase under irrigation, but reduced yield 13.5% under the terminal drought. The common factor was the amount of LAI produced and the amount of soil water conserved and left at the time seed growth began. Likewise, higher photosynthesis had a much larger benefit under irrigation, while under terminal drought it had little benefit because it increased early LAI and water extraction too much. Bourgault et al. (2020) suggested that an early vigor trait in wheat could similarly predispose the crop for yield reduction under terminal drought because of early depletion of soil water caused by the earlier LAI.
Table 6. Grain yield response to varying GSPs, simulated for 22 years for Annigeri chickpea grown under either rainfed or irrigated conditions at Patancheru, India. Sown on September 29 (day 302) on a very fine montmorillonitic clay soil, starting at field capacity. Simulated with the CSM-CROPGRO chickpea model as developed by Singh and Virmani (1994) and modified by Singh et al. (2014).
It is important to understand how and when crop models reproduce G x E interactions. It is proposed that G x E interaction can come from a "normal" single gene or gene package where the effect of that gene is neutral in one environment but negative or positive in another environment, as in the example for chickpea grown under irrigation or terminal water deficit. The point here is that there is no need for a "special" G x E "gene", but rather that a given gene or gene package may be beneficial in one environment, but negative in another, as also suggested by Chapman (2008).
In the examples so far, we have illustrated that simulated G x E can be an emergent outcome for environments differing in water availability, CO2, temperature, and crop management. A G x E effect for locations (L) can occur if the locations vary in soil water-holding capacity, rainfall, and temperature. Putto et al. (2013) found G x L effects in different locations as GSPs were varied with the CROPGRO-Peanut model. G x E interaction for management can occur if the management varies in irrigation, N fertilization, row spacing, and plant population. For all the water-supply based environments, a deep rooting trait will be of benefit only if water deficit occurs. Even different cultivar life cycles can result in a G x E interaction if the environment is different during the non-common part of the life cycle.
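The sign reversals reported for chickpea can be screened for automatically once trait effects are tabulated by environment. The sketch below uses only the four percentages quoted in the text for increased SLW and later flowering. The classification labels, the function name, and the one-percentage-point tolerance are our assumptions for illustration, not a standard statistical test for G x E.

```python
def classify_gxe(effect_env_a, effect_env_b, tolerance=1.0):
    """Classify a trait's yield effects (% change) in two environments.

    tolerance (percentage points) is an arbitrary illustrative threshold.
    """
    if effect_env_a * effect_env_b < 0:
        return "crossover"          # benefit in one environment, penalty in the other
    if abs(effect_env_a - effect_env_b) > tolerance:
        return "non-crossover"      # same direction (or zero), different magnitude
    return "no interaction"


# Percent yield changes quoted in the text: (irrigated, terminal drought).
CHICKPEA_EFFECTS = {
    "increased SLW": (-11.0, 18.0),
    "later flowering": (15.4, -13.5),
}

CLASSIFIED = {
    trait: classify_gxe(irrigated, rainfed)
    for trait, (irrigated, rainfed) in CHICKPEA_EFFECTS.items()
}
print(CLASSIFIED)
```

Both traits are flagged as crossover interactions, which is the case of most interest to breeders because it changes which genotype is best in each target environment.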

8.4 Simulating Adaptation to Climate Change Factors
Crop models are useful for evaluating genetic traits for adaptation to climate change factors (Singh et al., 2012; Singh et al., 2017; Hammer et al., 2020). In a project funded by the International Food Policy Research Institute, K. J. Boote modified the code of the CERES-Pearl Millet model to account for elevated temperature effects on grain-set. Then the model was calibrated by Singh et al. (2017) to pearl millet (Pennisetum glaucum) data from six sites in India and two sites in West Africa, followed by simulating response to climate change scenarios (increased temperature, rainfall change, CO2 increase) for those sites (Table 7). Traits involved a 10% shorter life cycle, a 10% longer life cycle, increased productivity, deeper more effective water extraction, and heat tolerance for grain-set and single grain growth rate. Life cycle was modified by changing P1, P2O, and P5 (definitions are similar to those of CERES-Maize in Table 2). The simulated productivity trait was accomplished by 10% increases in the G1, G4, GT, and RUE parameters. For drought tolerance, the rooting depth profile was made deeper and the lower limit of soil water holding was reduced to give 5% more available soil water. The grain-set sensitivity to temperature was based on field observations of grain-set at an elevated temperature location (Gupta et al., 2015). For the model, the temperature thresholds were set to give no reduction in grain-set below 33 °C, a linear reduction in grain number between 33 and 39 °C, and zero grain-set at 39 °C. The grain number in the millet model is determined by the assimilate supply and cumulative biomass, and the daily mean temperature during the ISTAGE4 phase. To hypothesize a heat tolerance trait for an improved cultivar, both the threshold and the ceiling failure temperature for grain-set were increased by 2 °C to 35 and 41 °C.
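The grain-set temperature response described above is a simple piecewise-linear function, sketched below. The thresholds (33 and 39 °C for the base cultivar, 35 and 41 °C for the heat-tolerant virtual cultivar) are taken from the text. The function name and the convention of returning a 0 to 1 multiplier on grain number are our assumptions; the actual model code may apply the reduction differently.

```python
def grain_set_factor(mean_temp_c, threshold_c=33.0, failure_c=39.0):
    """Relative grain-set (0 to 1) as a function of daily mean temperature.

    No reduction at or below threshold_c, linear decline between the two
    temperatures, and zero grain-set at or above failure_c.
    """
    if mean_temp_c <= threshold_c:
        return 1.0
    if mean_temp_c >= failure_c:
        return 0.0
    return (failure_c - mean_temp_c) / (failure_c - threshold_c)


def heat_tolerant_factor(mean_temp_c, shift_c=2.0):
    """Same response with both temperatures raised by shift_c (virtual cultivar)."""
    return grain_set_factor(mean_temp_c, 33.0 + shift_c, 39.0 + shift_c)


for temp in (30.0, 34.0, 36.0, 38.0, 40.0):
    print(temp, round(grain_set_factor(temp), 2), round(heat_tolerant_factor(temp), 2))
```

The two curves are identical below 33 °C, so the heat tolerance trait can only show a benefit at sites where daily mean temperature during grain-set exceeds that threshold, which is why its simulated effect depends on the environment.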
Simulations with the CERES-Pearl Millet model for the sites in India (Hisar, Jaipur, Jodhpur, Bikaner, Aurangabad, and Bijapur) and in West Africa (Sadore and Cinzana) revealed that a shorter life cycle was not a successful strategy under climate change, although a longer life cycle increased yield for about half of the sites (Table 7). The increased productivity trait (four combined GSPs listed above) increased yield at most sites, although not consistently across all sites; however, we have concerns about the realism of modifying all those traits at once, and about whether plant breeders can achieve them. The drought tolerance trait (deeper more effective water extraction) gave more response for the low-yielding water-limited sites of Bikaner and Sadore. The heat tolerance trait (+2 °C more tolerance) gave no yield benefit for the cool sites (Aurangabad and Bijapur), but gave 17-18% increases for the warmest sites. This would be a G x E interaction. The simulated combination of drought and heat tolerance traits was additive. In a similar manner, Singh et al. (2012) evaluated GSPs of CROPGRO-Peanut for peanut adaptation to climate change across multiple sites in India. A 2 °C increase in heat tolerance of pod-set and seed growth rate resulted in larger yield responses (3 to 12%) under climate change than under current climate, except for two cool sites where there was little difference. Responses to multiple traits were also generally additive.
Table 7. Yield of base cultivar under climate change (CG) by 2050 and percentage gain or loss in yield for virtual pearl millet cultivars with 10% shorter duration, 10% longer duration, yield potential (increased productivity), drought tolerance (DT), heat tolerance (HT), or drought plus heat tolerance (DT+HT) traits at sites in India and West Africa.

Looking to the Future Role of Crop Models in Genetic Improvement
There is great opportunity to use dynamic crop simulation models as tools to evaluate past genetic improvement, to evaluate virtual cultivars for future yield improvement in target environments, to hypothesize traits that account for G x E responses in many environments, and to assist plant breeder trait selection for yield improvement. We agree whole-heartedly with Muller and Martre (2019) that crop models are important tools at the crossroads needed to link physiology, genetics, and phenomics. With the rapid advance in genotyping and QTL analysis, the most limiting factor now is the phenotyping of crop traits and performance in multiple environments (Furbank and Tester, 2011). The significant amount of resources directed to phenotyping and QTL analysis (Tardieu et al., 2019) will benefit considerably if combined with crop simulation modeling and physiological understanding as a way to integrate those many phenotypic outcomes over multiple environments (Chenu et al., 2018). There is good opportunity to integrate crop growth models with genome-wide prediction to improve genomic prediction accuracy, in particular when G x E interactions are an important determinant of performance (Technow et al., 2015; Cooper et al., 2016; Messina et al., 2018).
However, for linking crop models to genes and QTLs, it is important to consider the correct physiological mode of action and to consult with plant breeders on the feasible genetic range for a given trait. Most traits of interest to plant breeders are emergent outcomes of multiple genes and physiological processes, which implies that crop models may need improvement to include more detailed representation of processes and dissection of traits into component traits at the ecophysiology level to better simulate those emergent outcomes (Hammer et al., 2016; Chenu et al., 2018). Evaluating traits with crop growth models reveals that G x E interactions are associated with environments differing in water supply, temperature, CO2 level, soil water-holding characteristics, and crop management. Models often mimic G x E because a single trait (gene) that is beneficial in one environment may be negative in another. The crop models will need a certain depth of physiological detail, genotypic information, and understanding of genetic direct (G) and interactive (G x E and G x G) effects on dynamic physiological processes to robustly incorporate QTLs and genes into dynamic crop models.