Predicting plant biomass accumulation from image-derived parameters

Abstract

Background: Image-based high-throughput phenotyping technologies have developed rapidly in plant science in recent years, and they offer great potential to yield more valuable information than traditional destructive methods. Predicting plant biomass is regarded as a key goal for plant breeders and ecologists. However, it is a great challenge to find a predictive biomass model that holds across experiments.

Results: In the present study, we constructed four predictive models to examine the quantitative relationship between image-based features and plant biomass accumulation. Our methodology was applied to three consecutive barley (Hordeum vulgare) experiments with control and stress treatments. The results showed that plant biomass can be accurately predicted from image-based parameters using a random forest model. The high prediction accuracy of this model will contribute to relieving the phenotyping bottleneck in biomass measurement in breeding applications. The prediction performance remains relatively high across experiments under similar conditions. The relative contribution of individual features to biomass prediction was further quantified, revealing new insights into the phenotypic determinants of the plant biomass outcome. Furthermore, the methods could also be used to determine the most important image-based features related to plant biomass accumulation, which would be promising for subsequent genetic mapping to uncover the genetic basis of biomass.

Conclusions: We have developed quantitative models to accurately predict plant biomass accumulation from image data. We anticipate that the analysis results will be useful in advancing our understanding of the phenotypic determinants of plant biomass outcome, and that the statistical methods can be broadly used for other plant species.



Models were constructed to quantify the ability of image-based features to statistically predict biomass accumulation. The models were developed using four widely used machine-learning methods (Fig. 1C, Fig. 1D).

Our methodology was applied to three consecutive experiments (Fig. 2A

FW over genotypes (Fig. 2E). Furthermore, the overall phenotypic patterns of these plants were similar to their biomass output (Fig. 2,

Relating image-based signals to plant biomass output

The above analyses suggest that plant biomass can at least be partially inferred from image-based features. To examine which model has the best performance and to select an appropriate model for biomass prediction, we then applied our regression models (Fig. 1C)

In addition, the NIR-based features showed higher predictive capability for FW than for DW in control and stressed plants, indicating that NIR signals were important factors in determining FW accumulation.

Next, we investigated the relative importance (RI) of each feature for predicting biomass using a full model

Furthermore, we compared the relative importance of each feature in predicting FW and DW (Fig. 4E). Although a positive correlation (r = 0.88) between the feature importance for FW and DW could be observed, several features showed large differences in their ability to interpret FW or DW, including "nir.intensity"

The first evidence for this notion is the observation that our model showed more predictive power in plants with two treatments than with a single treatment (Fig. 3, B and D). Indeed, when applying our model to the

derived parameter (such as projected area) or several geometric parameters, our analyses extended these studies by incorporating more representative features, covering both structural and physiology-related properties, into a more sophisticated model. Our model not only has generally higher predictive power than single feature-based prediction, such as the digital volume (Fig. 3) [11], but also reveals the relative contribution of individual features to the prediction of biomass. The information regarding the importance of each feature will offer new insights into the phenotypic determinants of plant biomass outcome.
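One generic way to quantify the relative contribution of individual features is permutation importance: shuffle one feature in held-out data and record how much the prediction error grows. The sketch below illustrates that idea only and is not the procedure used in this study. It assumes synthetic data, hypothetical feature names ("projected_area", "nir_intensity", "unrelated"), and a small k-nearest-neighbour regressor as a stand-in for the random forest model.

```python
import random
import statistics


def knn_predict(train_x, train_y, query, k=5):
    """Average the response of the k training rows closest to `query`."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return statistics.fmean([train_y[i] for i in ranked[:k]])


def mean_squared_error(train_x, train_y, test_x, test_y):
    """Held-out mean squared error of the k-NN stand-in model."""
    errors = [(knn_predict(train_x, train_y, row) - y) ** 2
              for row, y in zip(test_x, test_y)]
    return statistics.fmean(errors)


def relative_importance(train_x, train_y, test_x, test_y, names, seed=0):
    """Permutation importance, rescaled so that the scores sum to 1."""
    rng = random.Random(seed)
    baseline = mean_squared_error(train_x, train_y, test_x, test_y)
    increase = {}
    for j, name in enumerate(names):
        column = [row[j] for row in test_x]
        rng.shuffle(column)  # break the link between feature j and biomass
        shuffled = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(test_x, column)]
        error = mean_squared_error(train_x, train_y, shuffled, test_y)
        increase[name] = max(0.0, error - baseline)
    total = sum(increase.values()) or 1.0
    return {name: value / total for name, value in increase.items()}


# Synthetic, hypothetical data: three features already scaled to [0, 1].
FEATURES = ["projected_area", "nir_intensity", "unrelated"]
rng = random.Random(1)
rows = [[rng.random() for _ in FEATURES] for _ in range(300)]
biomass = [10.0 * r[0] + 2.0 * r[1] + rng.gauss(0.0, 0.2) for r in rows]

ri = relative_importance(rows[:200], biomass[:200],
                         rows[200:], biomass[200:], FEATURES)
for name, score in sorted(ri.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Because the synthetic biomass depends mostly on the first feature, that feature should receive the largest share of the importance. With real phenotyping data the ranking would depend on the model and on correlations among features, so scores of this kind are best read as relative rather than absolute.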

We repeated the cross-validation procedure ten times. The mean and standard deviation of the resulting R² and RMSRE values were calculated across runs.
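The repeated cross-validation summary described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the data are synthetic, a one-feature least-squares line stands in for the actual models, the number of folds (k = 10) is assumed, R² is taken as the coefficient of determination, and RMSRE is taken as the root mean squared relative error, sqrt(mean(((predicted - observed) / observed)²)).

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(obs, pred):
    """Coefficient of determination (assumed definition of R2)."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmsre(obs, pred):
    """Root mean squared relative error (assumed definition of RMSRE)."""
    return math.sqrt(statistics.fmean([((p - o) / o) ** 2
                                       for o, p in zip(obs, pred)]))


def cross_validate(xs, ys, k, rng):
    """One k-fold run: pool held-out predictions, return (R2, RMSRE)."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    obs, pred = [], []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            obs.append(ys[i])
            pred.append(a + b * xs[i])
    return r_squared(obs, pred), rmsre(obs, pred)


def repeated_cv(xs, ys, k=10, repeats=10, seed=0):
    """Repeat k-fold cross-validation; report mean and SD across runs."""
    rng = random.Random(seed)
    runs = [cross_validate(xs, ys, k, rng) for _ in range(repeats)]
    r2s = [r2 for r2, _ in runs]
    errs = [err for _, err in runs]
    return {
        "r2_mean": statistics.fmean(r2s), "r2_sd": statistics.stdev(r2s),
        "rmsre_mean": statistics.fmean(errs), "rmsre_sd": statistics.stdev(errs),
    }


# Synthetic, hypothetical data: fresh weight grows linearly with projected area.
data_rng = random.Random(42)
area = [data_rng.uniform(50.0, 500.0) for _ in range(120)]
fresh_weight = [0.02 * a + 1.0 + data_rng.gauss(0.0, 0.4) for a in area]

summary = repeated_cv(area, fresh_weight)
print(summary)
```

Reshuffling the fold assignment in every repeat is what makes the standard deviation across runs informative: it reflects how much the scores depend on the particular partition of the plants.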