Spatio-temporal modeling of high-throughput multispectral aerial images improves agronomic trait genomic prediction in hybrid maize

Abstract Design randomizations and spatial corrections have increased understanding of genotypic, spatial, and residual effects in field experiments, but precisely measuring spatial heterogeneity in the field remains a challenge. To this end, our study evaluated approaches to improve spatial modeling using high-throughput phenotypes (HTP) via unoccupied aerial vehicle (UAV) imagery. The normalized difference vegetation index was measured by a multispectral MicaSense camera and processed using ImageBreed. Contrasting to baseline agronomic trait spatial correction and a baseline multitrait model, a two-stage approach was proposed. Using longitudinal normalized difference vegetation index data, plot level permanent environment effects estimated spatial patterns in the field throughout the growing season. Normalized difference vegetation index permanent environment were separated from additive genetic effects using 2D spline, separable autoregressive models, or random regression models. The Permanent environment were leveraged within agronomic trait genomic best linear unbiased prediction either modeling an empirical covariance for random effects, or by modeling fixed effects as an average of permanent environment across time or split among three growth phases. Modeling approaches were tested using simulation data and Genomes-to-Fields hybrid maize (Zea mays L.) field experiments in 2015, 2017, 2019, and 2020 for grain yield, grain moisture, and ear height. The two-stage approach improved heritability, model fit, and genotypic effect estimation compared to baseline models. Electrical conductance and elevation from a 2019 soil survey significantly improved model fit, while 2D spline permanent environment were most strongly correlated with the soil parameters. Simulation of field effects demonstrated improved specificity for random regression models. In summary, the use of longitudinal normalized difference vegetation index measurements increased experimental accuracy and understanding of field spatio-temporal heterogeneity.

(4) The authors use a two-stage approach to fitting their model.Different readers may expect different things under the general label "two-stage approach."I had to read the paper twice before starting to fully grasp what the two stages are.It would be useful early on, before and detailed description of any models, to explain in general terms what exactly is done in the first stage and what is done in the second stage of the analysis.
(5) I had some trouble understanding what the authors mean by the term "local environmental effect (LEE)".This term sounds a bit cryptic.It would be much easier to understand if this term were introduced via a linear model being fitted.This could have an explicit effect for what the "LEE."My guess is that what the authors mean is simply the nongenetic effects associated with an observation / plot, i.e. possibly time-varying environmental effects for plots, blocks, replicates, as opposed to the genotypic effects of interest.But a clarification, best via a statistical model and when the term is first introduced, would be very useful.
(6) Equation 1: The residual "e" needs to be explained.What are the fixed effects in beta?What is a "local environment"?Isn't this just a plot, and is this why u_p carries the subscript "p"?Is this a model for a single time point or for multiple time points?Why does this model have no effects representing the randomization layout of the replicated field experiment (replicates, blocks)?(7) In the text following Equation 1, and also that preceding Equation 3, the authors refer to the multivariate case in the case where multiple time points are considered for the same trait.I find the term "multivariate" misleading in this context, where the correlation that needs to be accounted for is one for serial or autocorrelation.Also, as the same trait is being measured at each time point, heterogeneity of variance is not necessarily an issue, as it would be with multiple traits.So perhaps the authors can reconsider their terminology here, also because their analysis ultimately focuses on the correlation between NDVI and agronomic traits of interest, which is what I would consider as a multivariate (multitrait) problem.
(8) Equation 4: It is not clear how this scalar equation ties in with the preceding models stated in matrix form.This is also because the notation looks completely different.
(9) Equation 5: The text purports that this equation give a structure for spatial correlation according to the 2DSpl model, but the structure clearly implies independence between all plots.
(10) Equation 6: The text here suggests that the model is for the "multivariate case", i.e. for the different time points, but the structure for Sigma_u_p is diagonal, implying independence between time points, which is not a realistic model for repeated measurements.
(12) Equation 9: This equation is for a further approach to spatio-temporal modelling.However, comparison with previously described models is made difficult by yet another change in notation.This equation is scalar.The model is announced to be for repeated measurements, but neither of the two random effects a_kl and p_bl is indexed by time.The only time-varying effects seems to be the residual e_ikn:t, so it is not clear how this model can account for serial correlation.Nor is it clear how this model accounts for spatial correlation.The description of the incidence matrices Z_a and Z_p would need to be given in detail for readers to appreciate how spatial and temporal correlation are accounted for.The authors just mention on page 11 that Legendre polynomials or linear spline functions can be used, but this is not enough.What exactly did the authors use as covariates?Certainly, there would need to be time indices involved in the scalar expressions in Equation 9 for this model to account for both spatial and temporal correlation.The matrix expressions given in the first paragraph on page 11 do involve subscripts for time, but the notation is not in agreement with Equation 9.So something is wrong here.The subscript for time itself has subscripts i and j, which is also confusing, because the subscript i has a different meaning in Equation 9.The fixed effect F_i is described as effect for the "replicate nested with image event data", but the effect is not indexed by t, which is at odds with this description.It is good, though, to have design effects in the model.Why is this not done for the other models as well?Later in the text, a matrix expression is given for the variancecovariance matrix of the response vector y.For this to be comprehensible, the ordering of scalar observations in the vector y would need to be given.
(13) The description of the structures for E on the lower part of page 11 is not succinct.Only one explicit equation is given for E_ij.For this second option, isotropy seems to be assumed.Is this a realistic assumption?What exactly are "positions"?The third to fifth option is cryptic.What are "empirical correlations"?How exactly are they computed?Explicit equations would help here.What is the underlying rationale?(14) The description of the simulation on page 12 is very scant.It remains unclear from what model exactly the data are simulated.The same goes for the evaluation of accuracy, which is denoted as a five-step process.It mentions a first-stage model, but it is unclear what exactly a first-stage model is here.Also see my comment (4).In a second step, "the computed NLLE were subtracted from NDVI HTP in order to minimize latent spatiotemporal effects in the NDVI."This subtraction is never mentioned up to here, also not with the description of the various models.This would seem to be a key step in the two-stage approach announced in the introduction.Should this not be described as part of the modelling approach?Also, the subtraction will induce correlations among the corrected values.How are these taken into account in the downstream analysis?Ignoring them will lead to invalid inferences.For an illustration based on a simple case see Piepho, H.P., Williams, E.R., Ogutu, J.O. ( 2013): A two-stage approach to recovery of interblock information and shrinkage of block effect estimates.Communications in Biometry and Crop Science 8, 10-22.
(15) On page 13, the stage-wise approach is mentioned again.Specifically, a "first-stage NLEE" is mentioned.What exactly is this?It seems to be some kind of estimator derived from the models fitted to the NDVI data, but it seems this is not really explained anywhere.The subsequent Equation 13 used the empirical correlation among the NLEE over time to estimate the matrix L.An explicit equation for the correlation would be useful here.For the second option, based on "fixed factor effects (FE)", the key feature of the models seems to be a regression of the agronomic response on the NLEE as covariates.The description of the regression terms is a bit cryptic because it is not made explicit what forms the H and F matrices take and how NLEE features in these.
(16) Model performance is assessed via the heritability.The equation given seems to be on a plot basis.This should be made explicit and also justified.In field trials, heritability is usually assessed on a plot basis.Also, I am not sure the genetic variance can just be plugged in as done here because it is estimated from a model involving the GRM.How does this simple equation for heritability account for the pairwise genetic covariance among hybrids?And how does is account for the environmental covariance among residual errors "e"? (17) Table 3: It is not clear how the correlations reported in this table were estimated.Is this based on a trivariate model for the three agronomic traits?It seems that the section "Agronomic Genomic Prediction" only describes univariate (single-trait) models.Even the "multi-trait model" in Equation 12 is only for one agronomic trait at a time.
Reviewer #2 (Comments for the Authors (Required)): In general, I liked the manuscript.It brings up an interesting topic and presents an array of modelling approaches to capture spatial heterogeneity in phenotypic variation in a secondary trait (vegetation index) and link that to spatial heterogeneity in target agronomic traits.I think the work can be very relevant as many research platforms are putting their trust in high-throughput phenotyping methods.
Having said that, I have some points of concern which I will try to explain below and which, as it happens, I believe become progressively more important.
One point that is missing is a discussion about the validity of the aerial images themselves as a proxy for spatial heterogeneity in local biomass/'green-up' effects.A UAV flying at a particular date may/does not differentiate true spatial heterogeneity from e.g.heterogeneity in phenological stages of different plants.Therefore what you see as spatial variation in the field may partly be variation in phenology rather than 'local spatial effects' .Not sure how relevant this problem is for prediction since you find local environment effects from the images (NLEE) to correlate with spatial heterogeneity in grain yield, but, judging from Fig. 2, correlations are not that high and NLEE appears to correlate weakly with soil parameters (Table 4).I therefore wonder to what extent we're looking at true spatial effects vs. different plant development stages.I guess it's an inherit caveat to HTP methodologies in general, but it would be worth discussing as this paper heavily revolves around spatial heterogeneity in the field.
On the subject of spatial heterogeneity in the field, it is hard to judge from the text and figures how important spatial effects are in the studied trials.We are presented with spatial images of the field without the raw, phenotypic variation in the traits of interest as a reference.Also, if I'm correct the experimental design has never been mentioned; all I know is that there are two replicates (mentioned on p.15).The point is, in my experience working with spatial models in many a trial, most spatial variation is captured by the experimental design (e.g. by adding row/column effects to the model) and what spatial heterogeneity the spatial model picks up is usually quite small in relation to raw phenotypic variation.To put the importance of NLEEs into perspective I think the reader needs to be given (visual) guidelines as to the importance of these NLEEs in general.
A more general point about the manuscript is that it's quite dense, with many models run and many results presented.I wonder if the manuscript could be tidied up a bit.For example, by adding a table that lists all models with their distinct random and fixed effects structures and a conceptual figure that shows how the first and second stage of the analyses are linked.Also, I wonder if the number of models can be trimmed down a bit -or at least better justified/introduced.For example, random regression models (RRs) are invoked as an alternative to running spatial models at each time point.This seems like an appealing idea but could work only, in my view, if the curves can be fitted accurately.We don't get an impression reading the manuscript of how well the curves fit the data (3rdorder polynomials are used).If they don't fit well, we would a priori not expect the RRs to accurately predict spatial variation in the target trait, which is indeed what we see for grain yield in Figs. 4 and S8.
A final point is that the authors should try to get the overall aim of the manuscript -or its value -as well as its conclusion more clear.The authors aim to investigate whether "local environment effects are able to detect spatial heterogeneity in agronomic traits" and whether these effects can aid in the genomic prediction.I'd say the latter is the most interesting part for breeders as we're ultimately interested if we can get accurate predictions, yet the manuscript puts in my view too much emphasis on 'getting the right spatial model' .We've already established that the correlation between local env.effects and spatial heterogeneity in target traits is not very strong; what really matters for breeders, I think, is how plot-level (or genotype-level, rather) values for intermediate phenotypes relate to values of the target trait.That comes in the second-stage analysis, but to my surprise there has been no cross validation.The authors basically looked at goodness of fit.For a true validation of your analyses you'd want to see how well your models do when predicting the phenotypes of 'unknown' genotypes.I think the authors need be clearer on the value of their analysis in this regard.This becomes also apparent in the conclusion section, which came across to me as somewhat inconclusive.This section mentions a lot of the model abbreviations and how they compared, but what is a lacking a clear message, e.g.how we should go about leveraging HTP and spatial information to gain clearer predictions of our traits of interest.
My apologies for this long-winded review.I do like the ideas presented here and I like the variety of ways in which spatial effects are estimated; at this point, however, I think the manuscript is a bit too dense and lacking a clear goal.If the authors were somehow able to fix this it would make the manuscript much more interesting and relevant.

Some comments by section/page:
In general, please use subsections, especially in the methods section, to make it easier to navigate between the different analyses.p.2, 2nd par: "...allow estimation of permanent environment effects from RR...".I think you need to introduce this concept a bit more.In animal breeding, a PE effect would indicate an effect of the animal on the phenotype independent of its genetics (e.g.food conditions during upbringing, body size, etc).Are you referring to the fact that individual plants (replicates) may consistently perform better or worse across time than other replicates from the same genotype?So if I'm correct, rather than estimating spatial effects across plots in the field at each time point you're estimating spatial heterogeneity as different longitudinal trajectories differing between individual plants/plots by fitting lines through time; you then look at the variation between longitudinal predictions at each time point (as per info on p11).I think a conceptual figure distinguishing the two approaches may be useful.But see my earlier comment on RR.

P4, Field
Experiments: what is the design of the experiment?Later somewhere you mention there are two replicates.P8, Eqn.1: Please already give a brief explanation of the Z_p u_p term.Also, how is the experimental design of the experiment factored into the model?P10, eqn.9: See my comment above about the use of RR.P11, mid: what does n stand for in the equation?P11, same par: "...third order Legendre polynomials were considered".How well did these polynomials capture the spatial trend in HTPs?Would longitudinal spline curves not make more sense in these types of data?P13, eqns 11a-c: is there any other spatial correction to these models, based on experimental design?P14, table 2. Please make reference to the equations; it's at times hard to remember which model represents which.
Reviewer 1: This paper considers modelling of a spatio-temporal dataset obtained by high-throughput phenotyping of replicated field trials with hybrid maize.The focus is on NDVI as assessed over the growing season and correlated agronomic traits.Analysis is based on various models that account, in different ways, for spatial, temporal and trait-to-trait correlation.The ultimate objective is to improve analysis for targeted agronomic traits, which are correlated with the NDVI data.The approach taken by the authors is described as a two-stage approach.
Comment 1.The authors consider four experiments.The design of these experiments is not described at all.It is important to provide information on the field layout, the randomization and blocking structure and the number of replications and genotypes.The reason this is important is because any analysis should account for the experimental design.It appears that the design is largely ignored.For example, I would expect there to be replicates and possibly incomplete blocks, assuming that an efficient design was used.Such design effects would need to be included in at least a baseline analysis.For readers to convince themselves to what extent this has been done, a succinct description of the design is needed, including any blocking structure.Incidentally, the introduction asserts that randomization can help control errors.It may be argued that randomization only helps assessing and accounting for errors, but it does not really control errors.The key design feature that allows error control is blocking.
Author: In order to clarify this point, such design information was included in the model, and we amended the text to describe the design type, the number of genotypes tested, and the number of replicates and plots, for the four experiments (Line 160-170).The rows and columns were included in the models and are now described (Line 170-175).We added more detail in the model specification for the fixed effects, and added details about the tested germplasm.To clarify the introduction, we changed "control for errors" to "account for confounding genetic, spatial, and environmental variation" (Line 75).
Comment 2. For the data at hand, correlation needs to be taken into account in three dimensions: (i) spatial, (ii) temporal and (iii) trait-to-trait.The different model approaches do this to varying degrees, and it appears that none or not all of the models accounts for all sources of correlation simultaneously, which does not seem satisfactory.I will come back to this general point when considering specific models.
Author: We agree that there are three aspects to consider (i) spatial, (ii) temporal, and (iii) trait-to-trait.This study explores spatial and temporal information derived from high-throughput phenotype (HTPs) (in Equation 1) and agronomic traits (previously in Equations 11a-c and now in Equations 5a-c) independently.However, the trait-to-trait aspect is explored in two ways, 1) between agronomic and HTP traits using multi-trait models (previously in Equation 12and now in Equation 6) and 2) between HTP time points (Equation 3).Combining the three aspects is explored by leveraging spatial and temporal PE derived from HTP into genomic prediction in the two-stage models (previously in Equations 14a-d and now in Equations 8a-d).Ideally this would be done via a parsimonious singlestage model; however, the models and results as presented demonstrate the relationship between spatial variation in HTP and agronomic traits and the benefits of using longitudinal HTP to improve the accuracy of spatial corrections in agronomic traits.We will continue to pursue improved models moving forward, but we feel these results represent a significant contribution to the literature and should be shared with the community to spur further research on this topic.
Comment 3. The outline of the models suffers from an inconsistency in notation across models.The description at times gives the impression of being extracted from manuals and papers related to different packages used in the various an analysis, without any serious attempt made to unify the notation.This makes comparison of models difficult for a reader.Also, the description of model terms is often incomplete or inaccurate.This general point will also be taken up when discussing specific models.There is a simple litmus test for a sufficient model description: If the reader is equipped with my data and my description of the methods in my M&M section, would he or she be in a position to reproduce my results?I am afraid to say the answer to this question for the present submission is a clear 'no!'.
Author: We appreciate the concern and have made an effort to unify the notation, ensuring that matrices are bold and uppercase throughout.The models also now consistently reference the design effects modeled as fixed effects (Line 295,440,465,480,500). Model numbers were reordered to be more concise.The main text now only uses matrix notation for the numbered equations.
Comment 4. The authors use a two-stage approach to fitting their model.Different readers may expect different things under the general label "two-stage approach."I had to read the paper twice before starting to fully grasp what the two stages are.It would be useful early on, before and detailed description of any models, to explain in general terms what exactly is done in the first stage and what is done in the second stage of the analysis.
Author: A statement at the end of the Introduction was clarified to summarize the two-stage modeling approach (Line 140).The first stage estimates the plot's local environmental effect from the HTP, while the second stage leverages that estimate into genomic prediction either using random effects or fixed effects.
Comment 5. I had some trouble understanding what the authors mean by the term "local environmental effect (LEE)".This term sounds a bit cryptic.It would be much easier to understand if this term were introduced via a linear model being fitted.This could have an explicit effect for what the "LEE."My guess is that what the authors mean is simply the non-genetic effects associated with an observation / plot, i.e. possibly time-varying environmental effects for plots, blocks, replicates, as opposed to the genotypic effects of interest.But a clarification, best via a statistical model and when the term is first introduced, would be very useful.
Author: We appreciate the concern about using the term "local environmental effect" (LEE) and so we opted to use the term "permanent environment" (PE) instead of LEE throughout the paper.We believe PE is a more standard term to describe these effects when modeling longitudinal data.We amended the explanation of Equation 1to indicate the PE are solutions to the plot's specific effect (Line 299).Comment 6.
Equation 1: The residual "e" needs to be explained.What are the fixed effects in beta?What is a "local environment"?Isn't this just a plot, and is this why u_p carries the subscript "p"?Is this a model for a single time point or for multiple time points?Why does this model have no effects representing the randomization layout of the replicated field experiment (replicates, blocks)?
Author: We added the reference to the residual error in Model 1 and amended the explanation of Equation 1 to indicate the NDVI PE are solutions to the plot's specific non-genetic effect (Line 295-300).We added a clarification to indicate that the design replication nested with imaging event is represented in the fixed effects and how the different HTP time points are accounted (Line 295, 440, 460, 475).
Comment 7. In the text following Equation 1, and also that preceding Equation 3, the authors refer to the multivariate case in the case where multiple time points are considered for the same trait.I find the term "multivariate" misleading in this context, where the correlation that needs to be accounted for is one for serial or autocorrelation.Also, as the same trait is being measured at each time point, heterogeneity of variance is not necessarily an issue, as it would be with multiple traits.So perhaps the authors can reconsider their terminology here, also because their analysis ultimately focuses on the correlation between NDVI and agronomic traits of interest, which is what I would consider as a multivariate (multitrait) problem.
Author: We acknowledge the confusion that could come from the usage of the term multivariate when describing a model fitting many HTP time points, and we amended the manuscript to distinguish these models as either the "single-trait" case for fitting a single HTP time point or the "single-trait-repeated" case for fitting several HTP time points (Line 305, and throughout paper).The "multi-trait model" case of fitting HTP and agronomic traits in a single model is defined in Equation 6 (previously Equation 12).It should also be noted that, while these are repeated measurements, they are taken at different developmental stages of the plant and therefore would not be expected to have a uniform correlation structure or the same expression of genetic and residual variation.It would be expected that variance is heterogeneous across time and covariances would also change through time.To use an example from animal breeding, it is common to refer to birth weight, weaning weight, and weight at slaughter as separate but correlated traits.
Comment 12. Equation 9: This equation is for a further approach to spatio-temporal modeling.However, comparison with previously described models is made difficult by yet another change in notation.This equation is scalar.The model is announced to be for repeated measurements, but neither of the two random effects a_kl and p_bl is indexed by time.The only time-varying effects seems to be the residual e_ikn:t, so it is not clear how this model can account for serial correlation.Nor is it clear how this model accounts for spatial correlation.The description of the incidence matrices Z_a and Z_p would need to be given in detail for readers to appreciate how spatial and temporal correlation are accounted for.The authors just mention on page 11 that Legendre polynomials or linear spline functions can be used, but this is not enough.What exactly did the authors use as covariates?Certainly, there would need to be time indices involved in the scalar expressions in Equation 9for this model to account for both spatial and temporal correlation.The matrix expressions given in the first paragraph on page 11 do involve subscripts for time, but the notation is not in agreement with Equation 9.So something is wrong here.The subscript for time itself has subscripts i and j, which is also confusing, because the subscript i has a different meaning in Equation 9.The fixed effect F_i is described as effect for the "replicate nested with image event data", but the effect is not indexed by t, which is at odds with this description.It is good, though, to have design effects in the model.Why is this not done for the other models as well?Later in the text, a matrix expression is given for the variance-covariance matrix of the response vector y.For this to be comprehensible, the ordering of scalar observations in the vector y would need to be given.
Author: We appreciate the concern and to simplify the text, we now only use matrix notation in the main text.Discussion of the random regression model has references to papers for further documentation.The text now shows more succinctly how spatial effects and RR permanent environment effects are related to the underlying plot-level permanent environment (PE) effect (Line 325-355).
Comment 13.The description of the structures for E on the lower part of page 11 is not succinct.Only one explicit equation is given for E_ij.For this second option, isotropy seems to be assumed.Is this a realistic assumption?What exactly are "positions"?The third to fifth option is cryptic.What are "empirical correlations"?How exactly are they computed?Explicit equations would help here.What is the underlying rationale?
Author: We altered the description to be more succinct and clarified that these are ordinal positions (Line 399).The empirical correlation matrices described in the third and fourth options were calculated using the plot-level soil measurements, while the empirical correlation matrices in the fifth and sixth options were calculated using the plot-level spatial effect estimates across the growing season.Therefore, plots which experienced similar spatial effects would empirically have higher correlations than those with dissimilar spatial effects.We added an equation in the text to explain the approach (Line 402).Isotropy is a common assumption for spatial models out of necessity.When there is only one measurement per plot this assumption allows partitioning of plot specific spatial effects from residuals.The use of longitudinal data enables this assumption to be relaxed, which is one of the drivers for exploring these models.RRID would be an extreme case in which there are no assumptions about spatial relationships, albeit the spatial correlations are completely ignored in this case.The fact that it seems to perform well relative to models with strong assumptions of isotropy would suggest that it is indeed not a realistic assumption.The soil derived matrices remove this assumption while still accounting for spatial relationships, but make another strong assumption that specific soil properties are the main drivers of spatial patterns.
Comment 14.The description of the simulation on page 12 is very scant.It remains unclear from what model exactly the data are simulated.The same goes for the evaluation of accuracy, which is denoted as a five-step process.It mentions a first-stage model, but it is unclear what exactly a first-stage model is here.Also see my comment (4).In a second step, "the computed NLLE were subtracted from NDVI HTP in order to minimize latent spatio-temporal effects in the NDVI."This subtraction is never mentioned up to here, also not with the description of the various models.This would seem to be a key step in the two-stage approach announced in the introduction.Should this not be described as part of the modelling approach?Also, the subtraction will induce correlations among the corrected values.How are these taken into account in the downstream analysis?Ignoring them will lead to invalid inferences.For an illustration based on a simple case see Piepho, H.P., Williams, E.R., Ogutu, J.O. ( 2013): A two-stage approach to recovery of inter-block information and shrinkage of block effect estimates.Communications in Biometry and Crop Science 8, 10-22.
Author: We appreciate the opportunity to clarify this text.The simulation process was used to test the ability for the model to detect a plot's known local environmental effect, and therefore was not part of the agronomic genomic prediction piece.We have altered the text to remove the reference to the "first stage" and placed the statements more cohesively.Evaluating the simulation accuracy did not play a part in the larger part of this study, so the subtraction to minimize the latent effects did not occur outside of the simulation study.We amended the description on the simulation to be more concise (Line 420-430).
Comment 15.On page 13, the stage-wise approach is mentioned again.Specifically, a "first-stage NLEE" is mentioned.What exactly is this?It seems to be some kind of estimator derived from the models fitted to the NDVI data, but it seems this is not really explained anywhere.The subsequent Equation 13 used the empirical correlation among the NLEE over time to estimate the matrix L.An explicit equation for the correlation would be useful here.For the second option, based on "fixed factor effects (FE)", the key feature of the models seems to be a regression of the agronomic response on the NLEE as covariates.The description of the regression terms is a bit cryptic because it is not made explicit what forms the H and F matrices take and how NLEE features in these.
Author: We have clarified this in the text.The two-stage approach is summarized in the last paragraph of the introduction (Line 139).The first stage estimates the plot's local environmental effect (PE) using NDVI, while the second stage incorporates these estimates into genomic prediction as either random effects or fixed effects.The matrix L was defined using the NDVI PE effects and we added the equation in the text.The text now uses bold to mean matrices versus vectors throughout.
Comment 16.Model performance is assessed via the heritability.The equation given seems to be on a plot basis.This should be made explicit and also justified.In field trials, heritability is usually assessed on a plot basis.Also, I am not sure the genetic variance can just be plugged in as done here because it is estimated from a model involving the GRM.How does this simple equation for heritability account for the pairwise genetic covariance among hybrids?And how does is account for the environmental covariance among residual errors "e"?
Author: We modified the text to indicate heritability was calculated on a plot basis (Line 545).The additive genetic variance is the variance component for the random effect of the lines and follows a genomic relationship matrix G.We amended the text to state these are variance components (Line 550).
Comment 17.Table 3: It is not clear how the correlations reported in this table were estimated.Is this based on a trivariate model for the three agronomic traits?It seems that the section "Agronomic Genomic Prediction" only describes univariate (single-trait) models.Even the "multi-trait model" in Equation 12 is only for one agronomic trait at a time.
Author: The correlations in Table 3 are between spatial effects estimated from the 2DSpline and AR1 models.Correlations for GY, GM, and EH were calculated individually and a trivariate model was not used.We modified the text to indicate these models were run independently for each trait (Line 582).The agronomic genomic prediction models, including the multi-trait model, used one agronomic trait at a time.We modified the text to distinguish between single-trait, single-trait-repeated, and multi-trait models more clearly (Line 304, and throughout text).
Reviewer 2: In general, I liked the manuscript.It brings up an interesting topic and presents an array of modelling approaches to capture spatial heterogeneity in phenotypic variation in a secondary trait (vegetation index) and link that to spatial heterogeneity in target agronomic traits.I think the work can be very relevant as many research platforms are putting their trust in high-throughput phenotyping methods.
Having said that, I have some points of concern which I will try to explain below and which, as it happens, I believe become progressively more important.
Comment 1.One point that is missing is a discussion about the validity of the aerial images themselves as a proxy for spatial heterogeneity in local biomass/'green-up' effects.A UAV flying at a particular date may/does not differentiate true spatial heterogeneity from e.g.heterogeneity in phenological stages of different plants.Therefore what you see as spatial variation in the field may partly be variation in phenology rather than 'local spatial effects'.Not sure how relevant this problem is for prediction since you find local environment effects from the images (NLEE) to correlate with spatial heterogeneity in grain yield, but, judging from Fig. 2, correlations are not that high and NLEE appears to correlate weakly with soil parameters (Table 4).I therefore wonder to what extent we're looking at true spatial effects vs. different plant development stages.I guess it's an inherit caveat to HTP methodologies in general, but it would be worth discussing as this paper heavily revolves around spatial heterogeneity in the field.
Author: Thank you for the detailed discussion and review.We have amended the introduction to highlight the importance of using genomic relationships in our models for the HTP (Line 120).While physiological factors (e.g.greenness of different physiological stages) can be or are confounded in HTP like vegetation indices, the models presented in this study use genomic relationships to minimize such confounding between the additive genetic effects and spatial heterogeneity.Ultimately the performance of the models are evaluated by the accuracy in estimating agronomic trait effects, so any bias associated with confounding due to phenology that is specific to HTP would not improve accuracy for estimating genetic components of grain yield for example.Furthermore, the simulation piece in this study presented the ability of the tested models to recover induced spatial variability.
Comment 2. On the subject of spatial heterogeneity in the field, it is hard to judge from the text and figures how important spatial effects are in the studied trials.We are presented with spatial images of the field without the raw, phenotypic variation in the traits of interest as a reference.Also, if I'm correct the experimental design has never been mentioned; all I know is that there are two replicates (mentioned on p.15).The point is, in my experience working with spatial models in many a trial, most spatial variation is captured by the experimental design (e.g. by adding row/column effects to the model) and what spatial heterogeneity the spatial model picks up is usually quite small in relation to raw phenotypic variation.To put the importance of NLEEs into perspective I think the reader needs to be given (visual) guidelines as to the importance of these NLEEs in general.
Author: We have amended the text to include design information for the experiments (Line 153, 179).We appreciate the comment and have attempted to illustrate the importance of spatial heterogeneity in Figure 1 by including the magnitudes of the spatial effect in the heatmap (Line 572).We presented this approach to compare the two-dimensional spline against the simplest baseline model.
Comment 3. A more general point about the manuscript is that it's quite dense, with many models run and many results presented.I wonder if the manuscript could be tidied up a bit.For example, by adding a table that lists all models with their distinct random and fixed effects structures and a conceptual figure that shows how the first and second stage of the analyses are linked.Also, I wonder if the number of models can be trimmed down a bit -or at least better justified/introduced.For example, random regression models (RRs) are invoked as an alternative to running spatial models at each time point.This seems like an appealing idea but could work only, in my view, if the curves can be fitted accurately.We don't get an impression reading the manuscript of how well the curves fit the data (3rdorder polynomials are used).If they don't fit well, we would a priori not expect the RRs to accurately predict spatial variation in the target trait, which is indeed what we see for grain yield in Figs. 4 and S8.
Author: We have tried to clarify the text by significantly changing the Methods section and moving some details to the Supplemental.Table 2 summarizes the models presented across the various fixed and random effects tested, and we amended the text to be clearer (Line 439,460,510).That equations now use matrix notation through the main text.The third order random regression model was selected by looking at the log-likelihood and estimated permanent environment effects with better correlations (as in Figure 4) than random regressions of first or second orders, and we amended this to reference studies into RR polynomials (Line 385).Comment 4. A final point is that the authors should try to get the overall aim of the manuscript -or its value -as well as its conclusion more clear.The authors aim to investigate whether "local environment effects are able to detect spatial heterogeneity in agronomic traits" and whether these effects can aid in the genomic prediction.I'd say the latter is the most interesting part for breeders as we're ultimately interested if we can get accurate predictions, yet the manuscript puts in my view too much emphasis on 'getting the right spatial model'.We've already established that the correlation between local env.effects and spatial heterogeneity in target traits is not very strong; what really matters for breeders, I think, is how plot-level (or genotype-level, rather) values for intermediate phenotypes relate to values of the target trait.That comes in the second-stage analysis, but to my surprise there has been no cross validation.The authors basically looked at goodness of fit.For a true validation of your analyses you'd want to see how well your models do when predicting the phenotypes of 'unknown' genotypes.I think the authors need be clearer on the value of their analysis in this regard.This becomes also apparent in the conclusion section, which came across to me as somewhat inconclusive.This section mentions a lot of the model abbreviations and how they compared, but what is a lacking a clear message, e.g.how we should go about leveraging HTP and spatial information to gain clearer predictions of our traits of interest.
Author: Thank you for the comment and we have amended the text to be more concise.Several studies have established the importance of spatial variability for agronomic traits, and this study observed significant variability in yield, grain moisture, and ear height that could be accounted for by in-field spatial heterogeneity.We agree that the manuscript has essentially two parts, first to explore the spatial heterogeneity in the HTP and agronomic traits using different models, and secondly to leverage the estimated spatial heterogeneity for spatial corrections.We have narrowed the text to focus on spatial corrections and not genomic prediction (Line 433).The metrics we chose to evaluate were heritability, model fit, and the correlation of genetic effects across replicates.We felt that looking at the correlation of genetic effects across replicates was most relevant because it assesses the extent to which we are able to account for spatial heterogeneity in the field.In future work we would like to assess cross validation, but a dataset with a larger number of field experiments and a larger overlap in hybrids would be more suitable than what was used in this study.
Comment 5. My apologies for this long-winded review.I do like the ideas presented here and I like the variety of ways in which spatial effects are estimated; at this point, however, I think the manuscript is a bit too dense and lacking a clear goal.If the authors were somehow able to fix this it would make the manuscript much more interesting and relevant.
Author: We have made significant edits to the Introduction, Methods, and Results sections to hopefully clarify the models presented and the intended goals.Comment 6.Some comments by section/page: In general, please use subsections, especially in the methods section, to make it easier to navigate between the different analyses.
Author: We appreciate the comments presented.By rewriting the methods section and moving many extraneous details to the Supplemental we believe it is now easier to navigate the Methods section.Comment 7. p.2, 2nd par: "...allow estimation of permanent environment effects from RR...".I think you need to introduce this concept a bit more.In animal breeding, a PE effect would indicate an effect of the animal on the phenotype independent of its genetics (e.g.food conditions during upbringing, body size, etc).Are you referring to the fact that individual plants (replicates) may consistently perform better or worse across time than other replicates from the same genotype?So if I'm correct, rather than estimating spatial effects across plots in the field at each time point you're estimating spatial heterogeneity as different longitudinal trajectories differing between individual plants/plots by fitting lines through time; you then look at the variation between longitudinal predictions at each time point (as per info on p11).I think a conceptual figure distinguishing the two approaches may be useful.But see my earlier comment on RR.
Author: Your understanding of how we intended to use the random regression (RR) permanent environment effects is correct.We fitted random regression lines for each plot through time, separated from the random additive genetic effects and therefore capturing non-genetic effects.We amended the text to be shorter and clarified that the RR effects are distinct from the spatial effects, but both are estimations of the specific plot's PE We also now use the term "permanent environment" throughout the text to mean non-genetic, plot-specific environmental effects, estimated through spatial models or temporal RR models, and avoided adding the new "local environment effect" LEE term.Comment 8. P4, Field Experiments: what is the design of the experiment?Later somewhere you mention there are two replicates.
Author: The field experiment design is now described including the design type and the number of genotypes tested in the different experiments (Line 153, 170).
Comment 9. P8, Eqn.1: Please already give a brief explanation of the Z_p u_p term.Also, how is the experimental design of the experiment factored into the model?Author: We appreciate the comment and have clarified the text to describe how the experimental design is incorporated (Line 297).
Comment 10.P10, eqn.9: See my comment above about the use of RR.
Author: The random regression model fits the permanent environment as lines for each specific plot across time.We amended the text to consistently use permanent environment (PE) to describe a plot's non-genetic local environment effect.
Comment 11.P11, mid: what does n stand for in the equation?
Author: The n stands for repeated observations on a given plot on a given time point; however, in this study only one observation per plot per time point was used.We now use matrix notation throughout (Line 439).
Comment 12. P11, same par: "...third order Legendre polynomials were considered".How well did these polynomials capture the spatial trend in HTPs?Would longitudinal spline curves not make more sense in these types of data?
Author: Third order polynomials were considered for the random regression after looking at the log likelihood of first and second order polynomials.We also looked at the correlation between the estimated HTP permanent environment effects and agronomic trait spatial heterogeneity.We edited the text to clarify this point (Line 385).
Comment 13.P13, eqns 11a-c: is there any other spatial correction to these models, based on experimental design?Author: In those equations (now 5a-c), the fixed effects account for the replication in the design, while the row and column design information are used in the random spatial effects.Comment 14. P14, table 2. Please make reference to the equations; it's at times hard to remember which model represents which.