Evaluation of soybean ( Glycine max L.) adaptation to northern European regions under different agro-climatic scenarios

Soybean is a candidate crop to increase the independency of Europe in leguminous protein crops. However, its adaptation to northern European regions is not yet well defined due to the lack of long-term references. Herein, we simulated soybean yield potential in northern France and identified the main yield limiting factors under rainfed vs. irrigated conditions. Two cultivars representing maturity groups 000 and 00 were planted within three different trials. Leaf area index, shoot and pod biomass, main phenological stages and yield were recorded to evaluate CROPGRO-soybean model predictability. Adjustment of genetic coefficients was performed prior to simulate yield on 21-years weather database (1999–2018) at Beauvais (France, N 49.46°, E 2.07°) and Estrées-Mons (France, N 49.88°, E 3.01°) under different water regimes and planting dates. Predictions showed that adding irrigation at grain filling period would increase yield potential to the level of non-water limited scenarios. Although simulated yield variability is reduced with irrigation, the remaining variability suggests that water is not the only yield-limiting factor. A tentative explanation is proposed by deriving environmental covariates from the model. The analysis confirmed the importance of precipitation amount (optimum around 200 mm) and duration (optimum around 60 days) of the flowering to physiological maturity period under rainfed conditions. Under irrigated conditions, increasing evapotranspiration and average minimum temperature affected simulated yield positively while increasing the number of days below 10 °C had a negative impact. These results give insights for soybean crop management and bring indications to breeders for adapting the existing genetic material to northern Europe.


IN TROD UCTION
Increasing the production of protein crops in Europe for both food and feed usages is becoming a growing priority for three main reasons: being more independent from imports, decreasing usage of nitrates and pesticides, giving new options to farmers (Plaza-Bonilla et al. 2017;Marraccini et al. 2020). Among the leguminous crops responding potentially to the demand, soybean (Glycine max) is a serious candidate. To date, soybean is very little cultivated in northern Europe, and northern France, despite a rising interest from farmers. The potential of the crop in these regions is not yet well defined and long-term cropping historical data are needed. Several studies have shown that cool temperatures (<15 °C), low solar radiations and low water availability along the crop cycle may affect soybean growth and development (Zhang et al. 1996;Krüger et al. 2014;Kumagai and Sameshima 2014;Giménez et al. 2017).
According to the Köppen-Geiger World climate classification, northern France belongs to a Cfb type of climate: marine temperate without dry seasons, warm summers and mild winters (Peel et al. 2007;Beck et al. 2018). This cluster includes a large part of France, UK, Belgium, Western Germany, the Netherlands and part of Denmark. Another classification has been proposed for Europe where the region falls under a Maritime zone which includes additional countries like Switzerland, Sweden, Ireland, part of Austria (Bouma 2005). Based on 20 years of Agri4cast data, the average temperature in Northern France is 17.11 °C for soybean growing season (1 May to mid-September) with minimum averages of 12.4 °C and maximum of 21.84 °C, a solar radiation of 17.49 Mj/ m 2 and a cumulative precipitation around 276 mm. The latitude between 48 and 51° North is impacting daylength and the crop is submitted for 2 months at daylength greater than 15.5 h.
The use of Crop Growth Models (CGM) is now well adopted to run simulations, unless there is a minimal calibration effort done in the field, for testing crop in specific pedoclimatic contexts (Boote et al. 2016;Maiorano et al. 2017). Such in silico references can help develop strategies to maximize yield potential (Asseng et al. 2003;Chenu et al. 2017). Furthermore, analysis of data from CGM simulations can help identify traits of importance to work in a breeding program for improving crop adaptation (Chapman et al. 2002;Hammer et al. 2002;Chapman 2008;Chenu et al. 2011).
Among soybean CGM, CROPGRO has been calibrated for North America under well-watered conditions . In 2001, a calibration of CROPGRO-soybean model under rainfed and cool conditions was performed in Spain (Ruıź-Nogueira et al. 2001). In their study, adjusting soil depth, water holding capacity and root elongation rate improved phenology and yield predictability prior to perform long-term simulations under irrigated and rainfed conditions. Then, identification of environmental parameters impacting yield variability is critical to develop agronomic and breeding strategies for adapting soybean to northern European conditions (Board and Kahlon 2011). The objectives of this study are to (i) simulate soybean yield potential in northern France after crop model calibration and to (ii) identify the main soybean yield-limiting factors under rainfed vs. irrigated conditions. The expectations are to help farmers adopting this new crop by providing information that can overcome the lack of long-term soybean cropping references through calibration efforts and simulations under latitudes higher than 49° North.  Jones et al. 2003;Hoogenboom et al. 2019) was chosen for simulations. The Willmott agreement index (d-stat) (Willmott et al. 2012) and root mean square error (RMSE) were used to evaluate simulations accuracy (Willmott 1982). A model calibration was performed using the in-season growth and development field measurements. Cultivar coefficients of the two generic cultivars MG 000 and MG 00 available in the model were manually adjusted. The calibration procedure started with phenology first and then growth parameters (Hunt and Boote 1998). The coefficients were kept within the typical range values for soybean  and validated on the basis of maximizing the d-stat values and minimizing the RMSE for each time-series growth trait.

Evaluation of agronomic suitability and yield potential of soybean
To evaluate yield potential and agronomic suitability, two sites representing contrasted environments of Northern France were selected. One was Beauvais (same site used for calibration) and the other one was Estrées-Mons, France (Lat. 49.878, Long. -3.0067). It is expected that our study sites have a prediction power for regions beyond our scope. A weather database including daily minimum and maximum temperature, solar radiation (MJ/m 2 ), precipitation (mm) and potential evapotranspiration (mm) was assembled from 1999 to 2019 for both sites. Two soils representing each location (deep silt) were selected for simulations. Two agronomic factors were evaluated to assess yield potential of the calibrated cultivars RGT Sirelia (MG 000) and ES Mentor (MG 00): the planting date and water regime. CROPGRO-soybean model has demonstrated its value in evaluating yield response to planting date and irrigation scenarios (Salmerόn et al. 2017;Yassi et al. 2019). A series of planting dates were tested in a first time (April 20, April 30, May 10, May 20 and May 30) to select the optimal and latest possible ones: April 30 and May 20, respectively. The optimal date maximized the average simulated grain yield and minimized its variability across the 21 years of simulations. The latest possible date ensured a harvest maturity date compatible with a secure winter wheat planting. These two dates will be the two planting scenarios used in our study. Two water regimes were tested: rainfed and irrigated. The irrigated regime, compatible with regional agronomic practices, was determined by minimizing the simulated yield difference with the non-water limited simulation option of the model: 3 irrigation amounts of 30 mm at 60, 80 and 100 days after planting. All simulations were performed considering a planting density of 70 seeds per square metre and a row spacing of 0.45 m.

Identification of environmental covariates 2.3.1 Environmental variables calculation. Based on the individual
year simulations, 35 environmental covariates were calculated to identify key factors explaining year to year yield variation. The variables were defined around three development stages that are simulated in DSSAT seasonal analysis: sowing to emergence (SEM), emergence to flowering (EMFL), flowering to physiological maturity (FLPM). Four major categories of variables were identified: duration of stages (in number of days), temperature variables (average minimum temperatures, number of days below 10 °C, number of days below 15 °C, average maximum temperatures, number of days above 30 °C, number of days above 34 °C), water variables (cumulative rainfall and irrigation in mm, potential evapotranspiration in mm), solar radiation (cumulative daily solar radiation, average of solar radiation in MJ/m 2 ), photothermal quotient (defined as the ratio of solar radiation on heat units).

Statistical analysis.
After generating the environmental variables, two tables were created. The X-table contained the year/location combinations with all the calculated environmental covariates and the Y-table represented the simulated yield values, estimated by the model for each year/location combination. Statistical models are available to sort out the most relevant variables explaining yield variation. Among them, bilinear models such as Partial Least Square Regression (PLS) models seemed particularly appropriate (Crossa et al. 2010). One of the advantages of PLS over linear factor regression models is the possibility to evaluate a high number of covariables allowing to integrate as many as possible environmental variables and so, limiting a priori statements on the factors to include or not in the analysis. There are adapted versions of PLS to address different objectives (regression, classification, variable selection, and survival analysis) (Mehmood and Ahmed 2016). We used the variable selection methods accepting that X-matrix contains redundant or irrelevant variables without impacting the results (Mehmood et al. 2012). Among these variable selection methods, we used the filter method taking VIP (Variable Important in the Projection) as a selection criterion. Variables with a VIP>1 are the ones that are considered as able to explain the Y-table. The number of components were defined based on the Wold algorithm (Wold et al. 1984(Wold et al. , 1987(Wold et al. , 2001Tenenhaus et al. 2005). Analysis were performed under R software.

Model performance
Crop stages were well simulated by using generic MG 000 and MG 00 cultivar coefficients, especially for RGT Sirelia (Table 2). For ES Mentor, simulations were site-dependent for phenology and adjustment of phenological trait coefficients did not consistently improve the predictions across sites. For both genotypes, physiological maturity was accurately predicted (with a maximum of 3 days difference). Therefore, the generic values of phenological trait coefficients were kept for running long-term simulations. For RGT Sirelia and ES Mentor, adjustment of two growth trait coefficients: maximum leaf photosynthesis rate at 30 °C (+0.10 mg CO 2 m −2 s −1 ) and maximum size of full leaf (three leaflets) (+10 cm 2 ) consistently improved model predictability for time-series growth traits at Beauvais (Table 3). Phenology predictions were not affected by such adjustments (data not shown). Yield prediction was globally satisfactory and improved after calibration, except for the first planting date at Beauvais and for RGT Sirelia at Catillon-Fumechon. The calibration did not necessarily improve model performance for predicting time-series LAI and leaf weight at Catillon-Fumechon for both cultivars. However, values of Willmott agreement index for these two traits remained close to the upper limit of 1 (d-stat = 0.951 and 0.922 for RGT Sirelia and d-stat = 0.846 and 0.824 for ES Mentor, respectively, for LAI and leaf weight).

Yield potential under rainfed conditions
At Beauvais site, considering the two planting dates under a rainfed scenario, simulated grain yield reached 2716 ± 652 kg ha −1 for RGT Sirelia (MG 000) and 2878 ± 615 kg ha −1 for ES Mentor (MG 00) ( Fig. 1). At Estrées-Mons site, simulated grain yield reached 2982 ± 402 kg ha −1 for RGT Sirelia (MG 000) and 3184 ± 410 kg ha −1 for ES Mentor (MG 00). For both maturity group, the planting date had no significant effect on simulated grain yield, neither at Beauvais site nor at Estrées-Mons. Under rainfed conditions at Estrées-Mons site, simulated grain yield was slightly higher for ES Mentor (3272 ± 418 kg ha −1 ) compared with RGT Sirelia (3006 ± 433 kg ha −1 ), planted on 30 April (W = 307, P < 0.05). No significant differences in simulated grain yield was observed between the two maturity groups in any other treatments.

Variables driving rainfed yield
PLS analysis sorted the 35 environmental covariates from the highest VIP to the lowest ones relative to simulated yields under rainfed conditions in the two locations, on 21 years (1999-2019) for RGT Sirelia (MG 000) and ES Mentor (MG 00). Table 4 displays the variables that show VIP greater than 1. First of all, the 3 first variables with the highest VIP concerned only the period from flowering to the end of grain filling period. The main variable impacting simulated yields was the amount of precipitation accumulated from flowering to physiological maturity with a very high VIP (2.93) compared with all the other variables. This variable tended to follow a logarithmic function with an R 2 coefficient of 0.45 (Fig. 2). Then, the average minimum temperature in the same period appeared to influence simulated yield with a high VIP of 1.90 and a positive β-coefficient of 0.40, meaning that increasing the average minimum temperature at grain filling has a positive impact on simulated yield level. Then, the duration of the grain filling period was showing a VIP of 1.87 with a positive β-coefficient of 0.30, meaning that an extension of this period has a positive impact on simulated yield level. This variable seems to follow a polynomial trend (R 2 = 0.35) (Fig. 3) with an optimum around 60 days. The other variables showed less impact based on their VIPs.

Yield potential under irrigation
Yield response to irrigation scenario (3 irrigation paths of 30 mm at 60, 80 and 100 days after planting) was the same for the two cultivars RGT Sirelia (MG 000) and ES Mentor (MG 00) at any location and planting date (data not shown). At Beauvais site, adding irrigation enhanced simulated grain yield by +498 kg ha −1 and +467 kg ha −1 , respectively, for the planting dates 30 April and 20 May (W = 1287, P < 0.001 and W = 1319, P < 0.001) (Fig. 4). At Estrées-Mons site, the positive effect Table 2. Prediction of main phenological stages for RGT Sirelia (MG 000) and ES Mentor (MG 00) using CROPGRO-soybean model generic cultivars. Differences (number of days) between observed and simulated stages (days after planting) are given for each trial and cultivar. Three trials distributed across two sites are considered: Beauvais, France (two planting dates) and Catillon-Fumechon, France.

Experimental sites Beauvais (France) Beauvais (France) Catillon-Fumechon (France)
Coordinates ( Evaluation of soybean adaptation to northern Europe • 7 of irrigation was only observed for the planting date 30 April (+249 kg ha −1 , W = 1147, P < 0.05). A decrease of simulated yield variability was obtained by adding irrigation at Beauvais (around 300 kg ha −1 ) and Estrées-Mons (around 70 kg ha −1 ) sites for the two planting dates. Considering the 21 years of simulation at Beauvais site, the irrigation scenario raised the lowest yield values from 1126 to 2450 and from 1273 to 2525 kg ha −1 , respectively, for the two planting dates.

Variables driving yield variation under irrigation
PLS analysis sorted the 35 environmental covariates from the highest VIP to the lowest ones in regards of simulated yields under irrigation conditions in the two locations, on 21 years (1999-2019) for RGT Sirelia (MG 000) and ES Mentor (MG 00). Table 5 is displaying the variables that show a VIP greater than 1. The precipitation accumulation during the grain filling period, that was the strongest factor in rainfed conditions, did not appear anymore in the irrigated scenario. Instead, the potential evapotranspiration factor during the same period going from flowering to physiological maturity was showing up as a major driver of yield levels (highest VIP of 1.77), suggesting that the increase of evapotranspiration leads to higher yields (β-coefficient of +0.24). The minimum average temperature in the same period was, as in the rainfed scenario, a key factor (VIP = 1.64). This factor seemed to correlate with the cold stress factor defined as a number of days below 10 °C during the grain filling period. This cold stress variable shown a negative β-coefficient (-0.18) meaning that increasing the number of days below 10 °C during grain filling is detrimental to yield. An additional factor concerning the vegetative period is showing a VIP value of 1.55: heat stress factor defined as the number of days above 34 °C. This factor had a negative β-coefficient (-0.25) meaning that increasing of heat stress has a negative impact on yield potential. It is interesting to note that this factor was also detected in the rainfed scenario but compared with the 3 main factors, its VIP value was much lower.

Soybean yield potential and impact of adding irrigation
Soybean is known to be sensitive to water deficit from vegetative to grain filling period (Karam et al. 2005;Giménez et al. 2017;Nemeskéri and Helyes 2019). Several studies have simulated the positive effect of water supply to soybean crop under tropical and sub-tropical climate (Bhatia et al. 2008;Grassini et al. 2015;Sharda et al. 2019). In our region, yield gap between irrigated and rainfed treatments appeared to be limited globally, although it was higher at Beauvais site (+450 kg ha −1 ) than at Estrées-Mons (+250 kg ha −1 ). While the total amount of water received at both sites during the crop cycle was comparable (222 and 228 mm for Beauvais and Estrées-Mons, respectively), it cannot explain the low differences between irrigated and not irrigated scenarios at Estrées-Mons. A more detailed analysis of the water distribution along the cycle shows that the difference in water supply occurs between first pod and physiological maturity (70 and 77 mm for Beauvais and Estrées-Mons, respectively) while between first flower and first pod, the precipitation average is similar for both sites (44 mm). This difference translated into a higher average simulated water stress index calculated at Beauvais (0.50 ± 0.22) compared with Estrées-Mons (0.43 ± 0.23) under rainfed conditions. Additionally, the level of stress can also be linked to the potential evapotranspiration that is higher at Beauvais compared with Estrées-Mons for both emergence to flowering (173 ± 22 mm and 158 ± 21 mm, respectively) and flowering to physiological maturity (230 ± 20 mm and 219 ± 21 mm, respectively) periods. We propose to investigate soybean response to water stress at various levels of irrigation in the region, by implementing multi-year field experiments on MG 000 and MG 00 cultivars.

Variables impacting soybean yield under rainfed conditions
The PLS results concerning the rainfed simulations underlined the strong impact of water supply during the grain filling period for the yield elaboration under our Northern France conditions. Although, the region is often considered as a humid region, water appears to be the main limiting factor of the production as in general water is not well distributed along the crop cycle. Under Hungarian conditions, it was observed that when whole season precipitation is lower than 300 mm, yield is penalized (Gajić et al. 2018;Anda et al. 2020) and the drought stress at vegetative stage does not affect yield (Gajić et al. 2018). This agrees with our precipitation data calculated on the growing season   Evaluation of soybean adaptation to northern Europe • 9 from emergence to physiological maturity, the 21 years average for the season precipitation is 225 mm for both sites with the following partitioning: 21 mm from planting to crop emergence, 86 mm from emergence to flowering stage and 118 mm from flowering stage to physiological maturity. The factor precipitation alone seems to follow a logarithmic function and the optimum level of precipitation would be around 200 mm, which is close to the average potential evapotranspiration estimated by the model: 224 mm. This result supports the decision to evaluate the irrigation option as a way to increase yield potential in northern regions. Furthermore, this factor underlines the need for improved water stress tolerant germplasm to widely expand the crop.
The second factor affecting yield is the minimum temperature average from flowering to physiological maturity. The average value based on the simulation is around 13 °C (from 11.6 to 15 °C). The β-coefficient shows that the increase of the minimum temperature average is favourable to yield. The average value of 13 °C is below the threshold of 15 °C which is considered as the level below which optimal growing conditions are not reached and where soybean starts to suffer from cool temperatures (Baker et al. 1989;Funatsuki et al. 2004); as we are testing adaptation for high latitudes, it seems not surprising that this factor is appearing. Evaluating cultivars that are able to maintain a superior level of photosynthetic activity under this level of chilling stress could be an area of investigation to develop better adapted germplasm. On the other hand, the maximum temperature average is not appearing as a critical factor. Maximum temperature average is around 24.6 °C (ranging from 21.5 °C to 28.2 °C) which is not limiting for the crop according to the literature (Caulfield and Bunce 1988;Baker et al. 1989).
The third factor concerns the duration of the flowering to maturity period. In our context, the duration of the flowering to physiological maturity period strongly correlates with the following variables: number of days where minimum temperatures are The variation of this period is fairly large going from 49 days up to 69 days. It has a positive β-coefficient (0.30) which means that increasing the duration of the period is associated with better yields. Some authors have suggested that a way to increase yields in soybean is increasing this reproductive period length ). Moreover, they have shown that the response is not linear but asymptotic. In our case, the simulations showed a similar trend with a tendency to plateau at 60 days before decreasing. From 49 days to 60 days, each additional day brings 154 kg ha −1 .
Increasing the grain filling period over 60 days does not bring any advantage. This could be related to additional cold stress (r = 0.83) and decrease in maximum temperature average (r = −0.77) that are offsetting the advantage of a longer grain filling period. However, extending the flowering to physiological maturity period to get closer to the optimum can be achieved by earlier planting dates. The simulations are showing that, on average, the planting date of 30 April is leading to a grain filling period of 59 days and an average yield of 3017 kg ha −1 when 20 May date leads to a grain filling period of 57 days and an average yield of 2865 kg ha −1 . The difference in vegetative period between the two planting dates is 6 days in favour of the earlier planting. Choosing or adapting soybean to earlier planting dates under high latitude environments may be a solution to increase yield potential, as it has an indirect effect on the grain filling duration. It has been suggested that selecting germplasm that can flower earlier under high latitudes (around mid-June) will allow longer grain filling periods and improve yields (Cooper 2003). We can also imagine that earlier planting dates may impact positively root development and in return improve crop capacity to sustain water deficit, by exploring more in depth the soil profile; it may also impact the grain filling duration.

Variables impacting yield potential under irrigated conditions
The simulations run under a realistic irrigation scenario (three applications of 30 mm) for our region have shown that a substantial yield variation still persists in spite of optimum water supply. PLS-analysis under this scenario has been carried out to define environmental covariates explaining this variation. The results can give indications about the strategies that can be designed to reduce this variability, and as a consequence, stabilize/increase yield potential beyond irrigation application. Again, it appears that the most critical period is the flowering to physiological maturity stage. As expected, precipitations (sum of rainfalls and irrigation) is not the key factor anymore; instead, the potential evapotranspiration appears as the key factor having the most impact on simulated yield. It has been demonstrated that evapotranspiration under irrigated conditions is explaining well the biomass, especially for the earlier maturity groups (Purcell et al. 2007). Knowing the strong correlation between biomass and grain yield for soybean, this result seems logical (De Bruin and Pedersen 2009). The positive relationship between yield and evapotranspiration has been demonstrated as a factor explaining maximum yield potential for a corn crop (Cooper et al. 2020). It seems also to be relevant for soybean. In our conditions, potential evapotranspiration better correlates to maximum temperature (R 2 = 0.35) rather than solar radiation (R 2 = 0.08) that is low and show little variation (17.48 ± 2.25 MJ m −2 ). This was already demonstrated in a previous study on temperatures effect on evapotranspiration under controlled conditions (Allen et al. 2003). It will be necessary to confirm this information with experimental data. The second factor is the average minimum temperature during the flowering to physiological maturity phase; this factor has already been discussed in the rainfed data analysis above. Under simulated irrigation, this factor can be related to the cold stress Table 5. Environmental variables with a VIP>1 sorted by descending VIP for simulated irrigated yield conditions after PLS analysis. PLS analysis was performed on two genotypes: RGT Sirelia (MG000) and ES Mentor (MG00), considering two locations: Beauvais, France (n = 84) and Estrées-Mons, France (n = 84) and 21 years of weather data . The statistical model estimated that the 3 components option was the optimal solution based on RMSE and cross-validation. β-regression coefficients are indicating whether the considered variable has a positive or negative effect on yield. variable (TMN10FLPM) which is showing a negative coefficient meaning a negative impact on yield. It is interesting to underline that the correlation between the cold stress variable and the minimum temperature average variable is not strong (r = 0.22), meaning that for a given temperature average, the number of days with temperature below 10 °C can differ substantially. For average temperatures between 12 and 13 °C, number of days below 10 °C varies from 0 to 18 days. So, the two variables need to be treated separately. Concerning the cold stress effect on soybean, one single day of exposure of soybean to a 8 °C night stress is able to reduce by 87 % the photosynthetic activity of soybean (van Heerden and Kruger 2000) or decrease drastically photosynthesis rate at V5 (Wang et al. 1997). A long period of cold stress applied after R1 stage has shown to impact harvest index of cultivars when compared with a warm treatment (18/23 °C) (Schmid and Keller 1980). Considering the impact of cold stress, it appears critical to assess more in depth the impact of this factor under more controlled conditions and further investigate if genotypic differences exist, to give breeding directions and eventually develop high throughput screening methods for the trait. The screening methods could be based upon measurements of photosynthetic activity if acceptable correlations with yield are demonstrated.

CON CLUSION
Critical environmental variables driving yield potential in Northern France have been defined. The major constraint under rainfed conditions is related to water supply. This result leads to build simulations for optimizing irrigation scenarios. Nevertheless, even if irrigation is able to mitigate substantially the yield penalty, a non-neglectable yield variability still persists in the simulations. This yield variation is related to traits such as duration of grain filling period, adaptation to cool night temperatures and cold stress occurring during the reproductive phase of the crop cycle. Nevertheless, the outcomes of the study are based upon one single crop growth model. It would be worthwhile to run the database with other crop growth models to confirm the relevance of our results. Based on these findings, some interesting directions in both crop management and cultivar improvement can be proposed to increase yield potential under high latitudes. Furthermore, assessing genotypic variability related to those traits needs to be done and if variability exists, the development of high throughput indirect screening methods should be designed to identify the best adapted germplasm for northern European regions.