Jussi Gillberg, Pekka Marttinen, Hiroshi Mamitsuka, Samuel Kaski, Modelling G×E with historical weather information improves genomic prediction in new environments, Bioinformatics, Volume 35, Issue 20, October 2019, Pages 4045–4052, https://doi.org/10.1093/bioinformatics/btz197
Abstract
Interaction between the genotype and the environment (G×E) has a strong impact on the yield of major crop plants. Although influential, taking G×E explicitly into account in plant breeding has remained difficult. Recently, G×E has been predicted from environmental and genomic covariates, but existing works have not shown that generalization to new environments and years without access to in-season data is possible, and practical applicability remains unclear. Using data from a barley breeding programme in Finland, we construct an in silico experiment to study the viability of G×E prediction under practical constraints.
We show that the response to the environment of a new generation of untested barley cultivars can be predicted in new locations and years using genomic data, machine learning and historical weather observations for the new locations. Our results highlight the need for models of G×E: non-linear effects clearly dominate linear ones, and the interaction between the soil type and daily rain is identified as the main driver of G×E for barley in Finland. Our study implies that genomic selection can be used to capture the yield potential in G×E effects for future growth seasons, providing a possible means to achieve the yield improvements needed for feeding the growing population.
The data, accompanied by the method code (http://research.cs.aalto.fi/pml/software/gxe/bioinformatics_codes.zip), are available in the form of kernels to allow reproduction of the results.
Supplementary data are available at Bioinformatics online.
1 Introduction
Global yield improvements are needed to feed the growing population (Tester and Langridge, 2010). One possibility is to breed varieties for higher environmental adaptability, known as targeted breeding (Braun et al., 1996). By improving the genetic fit of varieties to their growth environments, the yield potential in the interaction between the genotype and environment (G×E) could be realized. Although the importance of G×E for agronomic performance is widely accepted, its utilization calls for methods that predict yields in new environments, because actual experimental data, consisting of yields of plant variety candidates from yield trials, will in practice be available only from a very limited number of environments. Importantly, prediction of a plant’s response to a new environment cannot be based on weather data from the growth season, as those will never be available at the time of prediction.
Predicting the yield of a new genotype in an untested environment is an instance of ‘cold start’ problems, where predictions are needed for completely novel instances. Previously the machine learning community has developed methods for such problems, e.g. to design novel drugs for previously unseen cancers (Costello et al., 2014) and to create recommendations in on-line shopping for new customers and/or products (Schein et al., 2002). These methods are based on using external covariate data that describe properties of the novel instances.
We develop a new method, an extension of Kernelized Bayesian Matrix Factorization (KBMF; Gönen and Kaski, 2014), which accounts for the uncertainty in the covariates. This allows the use of historical records to predict weather conditions for future growth seasons and ultimately makes prediction of future yields possible. Therefore, our new method, unlike the existing alternatives (Heslot et al., 2014; Jarquín et al., 2014; Lopez-Cruz et al., 2015; Malosetti et al., 2016), does not rely on accurate weather information from the growth season at the new location (Fig. 1). Although the most relevant use case for our method is making predictions for the next growing season, it could also be used to predict phenotype values in any year. For years further in the future, however, climate simulations would likely be needed instead of the historical sample of the microclimate.
In genomic selection (GS) (Meuwissen et al., 2001), field trials are replaced with genomic predictions to speed up plant breeding. We formulate an in silico experimental setup for GS in targeted breeding that, unlike existing works (Albrecht et al., 2014; Burgueño et al., 2012; Heslot et al., 2014; Jarquín et al., 2017; Malosetti et al., 2016; Saint Pierre et al., 2016), strictly satisfies all realistic constraints: test locations, years and genotypes are all genuinely new (not part of the training set) and yields are predicted for the offspring of the training set. In this setup, we demonstrate the feasibility of targeted breeding by investigating the accuracy of G×E prediction using environmental data that include historical weather information but no in-season data (the proposed model). We compare this with multiple competing settings, including the non-realistic ideal situation of having in-season weather data, a model without the G×E interaction, a previous implementation of G×E interactions using in-season data (GE-BLUP; Malosetti et al., 2016), and the industry standard that does not include G×E (GBLUP, best linear unbiased prediction using genomic data; de los Campos et al., 2013). Data from a barley breeding programme in Finland from Boreal Plant Breeding Ltd, including historical weather information for the target environments, are divided into training, validation and test sets, and the prediction accuracy is measured as the average correlation between predicted and observed yields in the test sets (Malosetti et al., 2016). A sensitivity analysis is done to further explore the impact of model assumptions.
2 Materials and methods
2.1 Data
All data used in the experiment come from a barley breeding programme in Finland, which is part of a larger population of target environments for barley, as varieties used in Finland are also used in other Nordic countries. The phenotype consists of (z-transformed) yield measurements (kg/ha) for 2244 lines observed in trials at 11 locations across the 4 southernmost growth zones in Finland from 2008 to 2015. The total number of observed location × year combinations is 35. In some locations, trials have been performed in several years and on several fields with varying soil properties, and a total of 12 277 yield observations have been recorded. The number of observations per genetic line ranges from 1 to 118 (median 4). The lines were genotyped with the Illumina 9k iSelect SNP chip; SNPs with minor allele frequency < 0.05 or with > 5% of values missing were omitted, as were all genotypes with > 5% of SNPs missing. The final proportion of missing genotype data is 0.002. The experiment-specific average yields range from 3700 to 8600 kg/ha. Test-fold-specific yields are summarized in Supplementary Table S3. Soil types range from organic to clay soil; climatic variation is summarized in Supplementary Figure S2.
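The genotype quality-control filters described above can be sketched as follows. This is a minimal illustration of the stated thresholds (MAF < 0.05, > 5% missing per SNP, > 5% missing per line), not the breeding programme's actual pipeline; the 0/1/2 coding and function name are assumptions.

```python
import numpy as np

def filter_genotypes(G, maf_min=0.05, max_missing=0.05):
    """Apply MAF and missingness filters to a genotype matrix.

    G: (n_lines, n_snps) array coded 0/1/2, with np.nan for missing calls.
    SNPs with minor allele frequency < maf_min or > max_missing missing
    values are dropped; then lines with > max_missing missing SNPs are dropped.
    """
    # Per-SNP missingness and minor allele frequency (ignoring missing calls)
    snp_missing = np.isnan(G).mean(axis=0)
    p = np.nanmean(G, axis=0) / 2.0          # frequency of the allele coded "2"
    maf = np.minimum(p, 1.0 - p)
    keep_snps = (maf >= maf_min) & (snp_missing <= max_missing)
    G = G[:, keep_snps]

    # Per-line missingness on the remaining SNPs
    keep_lines = np.isnan(G).mean(axis=1) <= max_missing
    return G[keep_lines], keep_snps, keep_lines
```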
The soil characteristics for each field block are measured in terms of the proportions of sand, silt and clay (soil classification triangle; Shepard, 1954) and the proportion of organic content. Meteorological information consists of daily averages of temperature and rainfall, and the distances to the closest meteorological station range from 1 to 40 km (average 13.5 km). The baseline approach (GE-BLUP; Malosetti et al., 2016) requires summarizing the weather information per crop stage: vegetative (from sowing to visible awns), heading time (from visible awns to the end of anthesis) and grain filling (from the end of anthesis to maturity). The times of the crop stages are estimated using temperature sum accumulation; the details are given in Section 2.7. In the weather observations, the proportion of missing values in daily average temperature and rainfall measurements is < 0.0015 (max 3 missing values/environment) and < 0.0032 (max 2 missing values/environment), respectively.
2.2 Experimental setup
To study prediction accuracy, we use a setup that strictly imposes the realistic constraints related to G×E modelling in targeted breeding for new locations. Predictions are required for new locations (not part of the experimental grid) and for years for which no phenotype data are available (to mimic future growth seasons). Additionally, predictions are needed for the offspring of the lines in the training set, which have no phenotype data observations.
Different prediction tasks, distinguished by the availability of different data types, are summarized in Figure 2. Setups 1–4 have been studied by Jarquín et al. (2017) and Malosetti et al. (2016): in Setup 1, phenotype measurements are available for the genotypes and environments to be predicted, and both genotypes and environmental covariates are fully observed. In Setups 2 and 3 phenotype measurements are still available, but only for the genotypes or the environments to be predicted, but not for both, and covariates are fully observed. In Setup 4, no phenotype data are available for the environments/genotypes to be predicted, but both genetic and environmental covariates are still fully observed.
We introduce two additional setups. In Setups 5 and 6 environmental covariates from the environments of interest are only partially available: location and soil characteristics are known but in-season weather measurements are not available for the year of interest. However, historical observations for the same locations are available and can be used to estimate the performance of each genotype. Setups 5 and 6 differ in whether phenotype measurements are available from some other environment for the genotypes (5) or not at all (6). The results in this article are for the most challenging Setup 6, where no phenotype data are available for any of the lines of interest. We emphasize that a further difference to earlier work (Jarquín et al., 2017; Malosetti et al., 2016) is that we strictly require the test environments to simultaneously be both from a location and from a year not included among the training environments, and that the genotypes in the test and validation sets are required to be from the progeny of the training set. A summary of the differences between our setup and the earlier works is given in Supplementary Table S2.
We measure prediction accuracy using cross-validation, where the training, validation and test sets are selected to enforce the realistic constraints (Fig. 3a). The pseudo code for the nested 3D (genotype, location and year) cross-validation is given in Supplementary Section S4. In brief, 41 test sets were constructed corresponding to different year, location and genotype sets. For each test fold, at most 10 training/validation set splits were further created. Hyperparameters for each test fold were selected based on the average performance over the validation sets. The final predictions were created by retraining the model with the selected hyperparameters without omitting any validation set. To maximally exploit the data in our experiment, we also used the past years as test sets (from which all data were then similarly omitted). This is made possible by the fact that the progeny genotypes were tested in several years.
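The fold-construction constraints described above can be sketched as follows. This is an illustrative simplification of the nested 3D cross-validation in Supplementary Section S4 (record layout and function name are assumptions): for each held-out environment, the training set must share neither its location nor its year, and only progeny lines are eligible for testing.

```python
def make_test_folds(records, progeny):
    """Build test folds respecting the realistic constraints.

    records: list of dicts with keys 'line', 'loc', 'year'.
    progeny: set of offspring lines eligible for the test sets.
    """
    env_pairs = sorted({(r['loc'], r['year']) for r in records})
    folds = []
    for loc, year in env_pairs:
        # Test set: progeny lines observed in the held-out environment.
        test = [r for r in records
                if r['loc'] == loc and r['year'] == year and r['line'] in progeny]
        test_lines = {r['line'] for r in test}
        # Training set: strictly new location AND new year, and no test lines.
        train = [r for r in records
                 if r['loc'] != loc and r['year'] != year
                 and r['line'] not in test_lines]
        if test and train:
            folds.append({'test': test, 'train': train})
    return folds
```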
To measure the prediction accuracy, we employ the commonly used Pearson correlation between the predicted and observed yields in the test set (see Albrecht et al., 2014; Burgueño et al., 2012; Malosetti et al., 2016; Saint Pierre et al., 2016). This correlation is computed for each cross-validation fold in turn, and averaged over the test cases. Similarly to Malosetti et al. (2016), the test-case-specific correlations are transformed into Fisher’s z-scores before averaging and back-transformed to obtain the final results. We regress the interactions on the average characteristics of the growing season: for each yield trial, we use weather observations from the typical growing season (from May 1 until the end of August) regardless of the sowing date. This indirect approach enables the use of historical weather data. When predicting with historical data, the prediction for each genotype is made for each year for which historical weather observations are available, and the median of those is used as the final predicted value.
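The two evaluation steps above, Fisher-z averaging of per-fold correlations and the median over predictions made with each historical season, can be sketched as follows (function names and the predictor interface are illustrative assumptions):

```python
import numpy as np

def average_correlation(fold_correlations):
    """Average per-fold Pearson correlations via Fisher's z:
    transform, average, back-transform."""
    z = np.arctanh(np.asarray(fold_correlations, dtype=float))
    return float(np.tanh(z.mean()))

def predict_with_history(model_predict, genotype, historical_weather):
    """Final prediction = median over the predictions obtained by plugging in
    each historical season's weather covariates. `model_predict` stands in
    for any fitted predictor."""
    preds = [model_predict(genotype, w) for w in historical_weather]
    return float(np.median(preds))
```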
We also carry out a sensitivity analysis that allows studying the impact of modelling assumptions, such as the inclusion of G×E and the number of interaction components in the model. In detail, the sensitivity analysis shows variability (median and 90% interval) in the predictive performance in a given test environment (location-year combination) when we vary (i) the hyperparameter values over their specified ranges (see Section 2.4), (ii) the genotype sets that we are predicting and (iii) the training set by removing any single training environment for validation. For details of the sensitivity analysis, see Supplementary Section S7.
2.3 Model
In the proposed models we assume that (i) the yield yij of genotype i in environment j is affected by the genotype, the environmental conditions throughout the growing season, and the interactions between the two, (ii) the response to the environmental properties is non-linear and (iii) it may involve interactions between different environmental properties. For instance, temperature or rainfall that is either too low or too high reduces yield, and the response to rainfall is also affected by the soil type. We further assume that (iv) the responses to the environmental conditions are highly polygenic. Assumptions (i)–(iv) are encoded using the kernel trick (see Shawe-Taylor and Cristianini, 2004), in which covariate data are represented as similarities, or kernels, between different data items. Kernel methods are a computationally effective way to model non-linearities and interactions, and they have been applied to breeding data (Gianola et al., 2014). An additional complication in the data is the low number of observed trials compared with the complexity of the problem. To handle this, we constrain our model to learn only the most prominent combinations of environmental conditions affecting yield, by assuming a low-rank approximation for the model parameters accounting for the G×E effects. Finally, we follow the Bayesian statistical framework (Gelman et al., 2013) and regularize the model by placing priors on all parameters, which alleviates overfitting to the training data and improves prediction accuracy on the test data.
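A standard way to encode the non-linear responses and covariate interactions of assumptions (ii)–(iii) via the kernel trick is a Gaussian (RBF) kernel on the environmental covariates. The sketch below is generic; the paper's actual kernels are listed in Supplementary Table S1, and the bandwidth parameter here is an illustrative assumption.

```python
import numpy as np

def rbf_kernel(X, Y=None, gamma=1.0):
    """Gaussian (RBF) kernel between rows of X and Y: similarities decay
    smoothly with squared Euclidean distance, so effects of individual
    covariates and their interactions are captured non-linearly."""
    Y = X if Y is None else Y
    # Pairwise squared distances via the expansion |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))
```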
Our method builds on KBMF (Gönen and Kaski, 2014). In brief, KBMF is a matrix factorization method that incorporates information about the dependencies between the rows and columns of the data matrix in the form of side information. The side information data sources are kernelized for efficient computation of non-linear effects. Plant breeding literature typically considers locations as a static and years as a dynamic component of G×E, but we represent each environment implicitly as a probability distribution over growing conditions. We assume that a realization of that distribution fully captures the environmental covariates, and hence the location and the year are not explicit factors in our regression model. These assumptions provide a framework for predicting G×E for new years without observed weather data: the location-specific environmental covariate distribution can be estimated using historical data, and predictions are obtained by integrating over this probability distribution.
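Schematically, the prediction step of such a kernelized low-rank factorization can be sketched as below. The projection matrices Ag, Ae and the kernel shapes are illustrative stand-ins for the learned KBMF parameters, not the paper's implementation; a future-season prediction would then integrate this score over the historical environment distribution.

```python
import numpy as np

def kbmf_predict(Kg_new, Ke_new, Ag, Ae):
    """Low-rank kernelized interaction score.

    Kg_new: (n_g, n_train_g) genotype kernel rows for new genotypes.
    Ke_new: (n_e, n_train_e) environment kernel rows for new environments.
    Ag, Ae: learned projections into a shared R-dimensional latent space.
    """
    Hg = Kg_new @ Ag   # latent genotype factors, (n_g, R)
    He = Ke_new @ Ae   # latent environment factors, (n_e, R)
    return Hg @ He.T   # predicted G-by-E score matrix, (n_g, n_e)
```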
2.4 Specifying hyperparameter values
We use a combination of prior knowledge and cross-validation to determine the hyperparameters: first, prior knowledge informs a grid of sensible values; then cross-validation selects a value from the grid. To express the prior knowledge, we relate the sizes of the different terms in Equation (1) to the total variation of the output variables yij and to each other, and select hyperparameters such that the expected proportion of total variance explained (PTVE) scales reasonably with the total variance of the outputs; e.g. the covariates obviously cannot explain >100% of the variance. In practice, because the PTVE also depends on the data, its value corresponding to a given hyperparameter combination is estimated using Monte Carlo simulation. This way, either a single fixed value or a grid of values to be selected from by cross-validation is determined for each hyperparameter (see below for details). Previously, a similar approach has been used by Gillberg et al. (2016). The assumptions about the relative sizes of different effects that were used to define the numeric values of the hyperparameters are given below.
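The Monte Carlo estimation of PTVE mentioned above can be sketched as follows: simulate the model terms under a candidate hyperparameter setting and compute each term's share of the total variance. The sampler interface is an illustrative assumption, not the paper's code.

```python
import numpy as np

def estimate_ptve(sample_effects, n_sim=2000, rng=None):
    """Monte Carlo estimate of the proportion of total variance explained
    by each model term under given hyperparameters.

    sample_effects(rng) draws one simulated value per model term.
    Returns an array of per-term variance proportions summing to 1.
    """
    rng = np.random.default_rng(rng)
    draws = np.array([sample_effects(rng) for _ in range(n_sim)])  # (n_sim, n_terms)
    var_terms = draws.var(axis=0)
    return var_terms / var_terms.sum()
```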
This large value maintains the identifiability of the different terms in Equation (1), by assuming that noise comes mainly from the last term ϵij. However, the value cannot be made exactly equal to unity, to retain some flexibility in the variational inference algorithm. The second condition is that the variance of the genetic main effects, Vargenetic, is 0.2, 0.4 or 0.6, to be selected by cross-validation.
2.5 Inference
For inference we use variational approximation (Beal, 2003), which is a computationally feasible way to approximate posteriors of parameters in complex models. The variational updates required here can be derived similarly to Gönen and Kaski (2014), except that we have extended their model and algorithm by including the genotype and environment main effects, i.e. the terms gi and ej in Equation (1). Further details are given in Supplementary Section S2.
2.6 Data pre-processing and kernels
The environmental kernel Ke is formulated as a weighted sum of data-source- and transformation-specific normalized kernels; see Supplementary Table S1. The weights reflect the importance of the data sources, and they are learned by fitting Bayesian Efficient Multiple Kernel Learning (BEMKL; Gönen, 2012), a multiple kernel regression method, to the training data using experiment-specific yield means as the target variable. For BEMKL, the shape (α) and scale (β) parameters of the Gamma priors are set to 1, except for the λ parameter, whose scale is fixed to 10 to provide stronger regularization. The regression bias term b is set to 0. For details of BEMKL, see Gönen (2012). Before combining the kernels, the weights are normalized such that their sum of squares is equal to 1 and the largest weight (in absolute value) is positive. Results of the sensitivity analysis of the kernel weights are presented in Supplementary Figure S1. Finally, the composite kernel Ke is normalized according to Equation (7).
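The weight normalization and kernel combination described above can be sketched as follows (a minimal illustration of the stated conventions, not the BEMKL fitting itself):

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of (already normalized) data-source kernels.

    Weights are rescaled so their sum of squares equals 1, and the sign is
    flipped if needed so that the largest-magnitude weight is positive.
    """
    w = np.asarray(weights, dtype=float)
    w = w / np.sqrt((w ** 2).sum())        # unit sum of squares
    if w[np.argmax(np.abs(w))] < 0:        # sign convention
        w = -w
    return sum(wi * K for wi, K in zip(w, kernels))
```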
2.7 Comparison methods
The mixed model computations for the comparison methods GBLUP and GE-BLUP are performed using the R library rrBLUP (Endelman, 2011). For both methods, fixed effects were used to account for field block-specific effects, corresponding to the terms ej in the proposed models. For GBLUP, the genomic kernel (see Section 2.3) was used as the covariance matrix Σ. For GE-BLUP, the environmental kinship model (model GE-KE in Malosetti et al., 2016) is used, and the full covariance matrix Σ is generated through the Kronecker product Σ = ΣG ⊗ ΣE, where ΣG and ΣE are the genetic and environmental covariance matrices, respectively. The environmental covariance matrix ΣE is generated from the available environmental data to describe soil properties and the growth conditions during the vegetative, heading time and grain filling developmental stages. All soil data and growth zone information are used as such, whereas the daily average temperature and rainfall measurements are summarized as the mean and the standard deviation of the daily observations per crop stage. The growth periods are estimated using the sowing date and temperature sum accumulation-based estimates of heading and ripening times (440.2 and 905.9°C, respectively), which were estimated from external breeding data. The vegetative stage is assumed to last 3 weeks starting from sowing, the heading stage is assumed to start 2 weeks before and last 1 week after the estimated heading time, and grain filling is assumed to start after heading and to last 1 week longer than the estimated time of ripening. Wide estimates for the growth periods were used to account for varying growth speeds. The resulting set of environmental covariates is z-normalized and a linear kernel is used, further normalized according to Equation (5).
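The temperature-sum-based crop stage estimation above can be sketched as follows. The thresholds are the 440.2 and 905.9°C sums given in the text; the base temperature, day indexing and window arithmetic are illustrative assumptions.

```python
import numpy as np

def crop_stage_windows(daily_temp, sowing_idx, t_heading=440.2,
                       t_ripening=905.9, base_temp=0.0):
    """Estimate heading/ripening days from temperature sum accumulation and
    derive the widened crop-stage windows, as (start_day, end_day) indices."""
    # Accumulated temperature sum from sowing (negative contributions clipped)
    tsum = np.cumsum(np.maximum(daily_temp[sowing_idx:] - base_temp, 0.0))
    heading = sowing_idx + int(np.searchsorted(tsum, t_heading))
    ripening = sowing_idx + int(np.searchsorted(tsum, t_ripening))
    return {
        'vegetative':    (sowing_idx, sowing_idx + 21),  # 3 weeks from sowing
        'heading':       (heading - 14, heading + 7),    # 2 weeks before to 1 week after
        'grain_filling': (heading + 7, ripening + 7),    # after heading to ripening + 1 week
    }
```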
3 Results
Table 1 shows a comparison of the different methods in the challenging setup where yield experiments are not available for either the genotypes or the environment (i.e. location-year combination). We see that modelling G×E with historical weather data improves prediction accuracy as compared with the industry standard, GBLUP (P = 0.023, two-sided paired Wilcoxon signed rank test, df = 17). The improvement is comparable to that from using in-season data (P = 0.011). The Bayesian methods in general show a higher accuracy, whereas GE-BLUP performs poorly with the data available. Overall, the absolute prediction accuracy of all methods is relatively low in this challenging setup, with the proposed model reaching the highest correlation, 0.105. Nevertheless, the improvement is considerable over the industry standard with correlation 0.077, which corresponds to the proposed new method explaining on average 85% more of the variation of the phenotype.
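The paired comparison of per-fold correlations quoted above can be approximated along the following lines. As a self-contained stand-in for the Wilcoxon signed rank test, this sketch uses a sign-flip permutation test on the fold-wise differences (an illustrative substitute, not the paper's exact procedure).

```python
import numpy as np

def paired_permutation_pvalue(a, b, n_perm=10000, rng=None):
    """Two-sided paired test: randomly flip the signs of the per-fold
    differences and compare the observed mean difference with the null
    distribution of permuted means."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    rng = np.random.default_rng(rng)
    obs = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    return float((np.sum(null >= obs) + 1) / (n_perm + 1))
```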
Table 1. Prediction accuracy of the compared models. Each setup reports the mean test-set correlation (Mean), its standard error (SE) and the P-value versus GBLUP (P).

| Model | Mean | SE | P | Mean | SE | P | Mean | SE | P |
|---|---|---|---|---|---|---|---|---|---|
| | New location, year and genotype ||| CV1: new genotype, tested environment ||| CV2: tested genotype, new environment |||
| G×E model (in-season weather) | 0.104 | 0.04 | 0.011 | 0.173 | 0.040 | 0.16 | 0.251 | 0.034 | 0.22 |
| G×E model (historical weather) | 0.105 | 0.03 | 0.023 | 0.158 | 0.042 | 0.64 | 0.244 | 0.034 | 0.81 |
| GE-BLUP | 0.004 | 0.03 | 0.109 | 0.123 | 0.04 | 0.458 | 0.191 | 0.036 | 0.10 |
| GBLUP | 0.077 | 0.03 | N/A | 0.156 | 0.038 | N/A | 0.240 | 0.034 | N/A |
| Main effects only (no G×E) | 0.097 | 0.04 | 0.109 | 0.151 | 0.040 | 0.75 | 0.237 | 0.034 | 0.71 |

Note: The new prediction method outperforms the others in the most challenging setup and performs equally in the less demanding setups. Mean, correlation between predicted and observed yields, averaged across test sets; SE, standard error of the mean; P, P-value compared with the industry standard (GBLUP).
In the less challenging setups CV1 and CV2, where measurements are available for either the genotype or the environment, the Bayesian methods that incorporate environmental information achieve a minor, statistically non-significant improvement in prediction accuracy compared with the comparison method (see Table 1). The improvement is not as dramatic as reported earlier (Heslot et al., 2014; Jarquín et al., 2014; Malosetti et al., 2016). A possible explanation is our indirect way of using the environmental data, considering weather over the whole growing period regardless of the exact sowing date, which may not be as effective for this problem as the models formulated by others. This hypothesis could be studied by a direct comparison with the methods of, e.g., Heslot et al. (2014) and Jarquín et al. (2014) on a dataset that contains the more accurate information about the growing stages of the varieties required by those earlier methods.
Comparison of the results from the proposed new setup with CV1 and CV2 quantifies the relative challenges of the different setups. Predicting the performance of a new genotype in a tested environment is considerably more challenging (average correlation between predictions and test data ∼0.16) than predicting the performance of a line for which phenotype observations are already available (average correlation ∼0.24), and simultaneously making predictions for new environments and new genotypes is still substantially more difficult (average correlation ∼0.08), as was also observed by Jarquín et al. (2017) and Malosetti et al. (2016). Clearly, the chosen in silico experimental setup dominates the general level of predictive performance and should be selected carefully when evaluating computational methods for plant breeding. Finally, the needs for processing environmental data differ between the three setups. In CV1 and CV2, methods have access to phenotype data from all the environments, and G×E effects can be modelled implicitly without explicitly taking environmental data into account. We hypothesize that, as a result, gaining significant performance improvements in those setups requires accurate in-season data. In the new setup, the environment is completely new, and transferring information about variety performance under the conditions of the new environment calls for explicit modelling of G×E.
The sensitivity analysis demonstrates considerable variability between test environments (Fig. 3b). Indeed, including G×E interaction terms in the model decreased accuracy in 1/18 environments, had little effect in 11/18 environments, but improved the accuracy substantially in 6/18 environments. In the last group, increasing model complexity by adding more components consistently improved performance, which highlights the potential to increase accuracy through complex modelling of G×E. The importance of different data sources for the predictions can be further analysed by investigating the kernel weights that the method has learned to summarize the contributions of the data sources (Supplementary Fig. S1). We see that the two most influential kernels were those representing (i) the non-linear interaction between soil type and daily rainfall and (ii) the non-linear effect of rain, matching well the biological understanding of the problem.
4 Discussion
Our experiments confirmed that G×E prediction in new environments is a challenging task, as reported earlier (Jarquín et al., 2017; Malosetti et al., 2016), with our new method reaching the highest correlation of 0.105 between predictions and observations. Nevertheless, the usefulness of including multiple interaction terms and non-linear interactions between environmental covariates became evident from our results. We expect the gains from G×E modelling to increase in the future, as more data representing further locations and years will allow the interactions to be distinguished more accurately from the main effects. Other ways to improve the predictions include more detailed genomic modelling, e.g. using Gaussian and other kernels for summarizing the SNP data.
Besides targeted breeding, there are several other needs for G×E prediction models. They could mitigate the problems of conventional breeding: accounting for historical weather in the actual target population of environments can help prevent overfitting to the conditions in the few field trials performed, as discussed in detail in Supplementary Section S3. The assumption of a match between field trials and actual growing locations is equally crucial for the official variety trials for value of cultivation and use, required in most countries to evaluate new varieties. G×E models are also needed for assessing the effects of climate change and for selecting varieties that react favourably to the altering conditions (Tester and Langridge, 2010). For this purpose, the historical weather observations in our method can be replaced with climate simulations to assess the performance of varieties under various climate scenarios.
To summarize, we showed that G×E prediction in the setup required by targeted breeding, where the environments are strictly new and predictions are based on historical weather data available at the time of prediction, significantly outperforms the current industry standard in prediction accuracy. Such improvements are essential to accelerate the implementation of targeted breeding. Future work includes comparing methods for G×E prediction on datasets comprising all measurements required by the different methods, facilitating a meaningful comparison. Adopting techniques that are currently being studied in the wider context of G×E research, such as marker × environment interactions (Cuevas et al., 2016; Lopez-Cruz et al., 2015) and modelling multiple traits simultaneously (Dias et al., 2017; Montesinos-López et al., 2016), could further improve the usefulness of our approach.
Acknowledgements
We acknowledge Boreal Plant Breeding Ltd for access to the plant breeding data and gratefully thank Outi Manninen, Mika Isolahti and Esa Teperi for very helpful comments.
Funding
This work was in part funded by Tekes, the Finnish Funding Agency for Innovation [Dnro 1718/31/2014 to J.G., H.M. and S.K.] and the Academy of Finland [Finnish Centre of Excellence in Computational Inference Research COIN, and grant numbers 294238, 292334 and 319264 to S.K.; grant numbers 286607 and 294015 to P.M.; and grant number 315896 (AIPSE) to H.M. and P.M.].
Conflict of Interest: none declared.
References