Yield dissection models to improve yield: a case study in tomato

Yield as a complex trait may either be genetically improved directly, by identifying QTLs contributing to yield, or indirectly via improvement of underlying components, where parents contribute complementary alleles to different components. We investigated the utility of two yield dissection models in tomato for identifying promising yield components and corresponding QTLs. In a harvest dissection, marketable yield was the product of number of fruits and individual fruit fresh weight. In a biomass dissection, total yield was the product of fruit fresh-dry weight ratio and total fruit dry weight. Data came from a greenhouse experiment with a population of hybrids formed from four-way RILs. Trade-offs were observed between the component traits in both dissections. Genetic improvements were possible by increasing the number of fruits and the total fruit dry weight to offset losses in fruit fresh weight and fruit fresh-dry weight ratio. Most yield QTLs colocalized with component QTLs, offering options for the construction of high-yielding genotypes. An analysis of QTL allelic effects in relation to parental origin emphasized the complementary role of the parents in the construction of desired genotypes. Multi-QTL models were used for the comparison of yield predictions from yield QTLs and predictions from the products of components following multi-QTL models for those components. Component QTLs underlying dissection models were able to predict yield with the same accuracy as yield QTLs in direct predictions. Harvest and biomass yield dissection models may serve as useful tools for yield improvement in tomato by either or both of combining individual component QTLs and multi-QTL component predictions.


• Tsutsumi-Morita et al.
property from inputs defined by basic, upstream physiological traits and management and environmental factors (Tardieu 2003; Hammer et al. 2006; Chenu et al. 2009; Cooper et al. 2009; Messina et al. 2011; Bustos-Korts et al. 2019). The success of alternative yield dissections as a starting point for yield improvement strategies by breeding depends strongly on the feasibility of identifying a set of genetic factors, or quantitative trait loci (QTLs), driving phenotypic variation in the yield components. When a limited (QTL mapping) or exhaustive (genomic prediction) set of genetic markers allows the successful prediction of yield components, yield for any genotype and environmental condition can be predicted from marker profiles and environmental inputs. For recent examples of this approach, see Technow et al. (2015), Bustos-Korts et al. (2019) and Millet et al. (2019).
Another reason to dissect complex traits into component traits is to address the question of equifinality. Equifinality at a physiological level occurs when different combinations of component traits give rise to the same yield in particular M×E conditions. At a genetic level, equifinality occurs when different multi-locus QTL genotypes produce similar yields. Writing yield as a function of simpler component traits provides extra opportunities to improve yield by identifying yield component configurations that produce equal yields in some M×E conditions, but different yields in other M×E conditions. Moving to the QTL level, colocalizations of yield and component QTLs open the way to more effective marker-assisted selection strategies. Yield QTLs can contribute positively to yield under one type of M×E conditions, while contributing negatively to yield under other M×E conditions (Boer et al. 2007; Chenu et al. 2009; Tardieu 2012; Bustos-Korts et al. 2019). Dissection of yield into components will shed light on why a particular yield QTL shows contrasting behaviour across changing M×E conditions. Following up on yield dissection, it will be important to identify and quantify trade-offs between yield components in relation to the M×E conditions and formulate ideotypes for higher yielding genotypes (Yin and Struik 2008; Chenu et al. 2009, 2017). Reynolds and Langridge (2016) present a good overview of theory and practice of physiological dissection approaches to breeding. Ramstein et al. (2019) add an interesting aspect to the discussion about the relevance of dissection approaches by observing that the creation of additional phenotypes for a set of genotypes alleviates part of the n<<p problem in the fitting of genotype-to-phenotype models, where n is the number of genotypes and p is the number of markers.
For many modern breeding populations, the number of markers is far larger than the number of genotypes, so that for fitting models of phenotypes as functions of markers, penalization of marker effects is required. Additional phenotypes enhance the chances of identifying biologically relevant markers.
In predictive breeding, the ultimate aim is the prediction of yield within acceptable bounds of uncertainty for new genotypes in new environments (Cooper et al. 2014; Yin et al. 2016; Washburn et al. 2020). A more modest objective looks at a static genetic dissection of yield into upstream components that are less integrated (less complex) and practically measurable or that can be approximated by use of a phenotyping tool, and for which a strong genetic basis can be identified (Alimi et al. 2013). In this way, trade-offs and negatively correlated components may become visible that hamper breeding efforts to improve yield based on selection for yield itself. At the same time, the use of QTL analysis on the component trait data may lead to the identification of alleles that make positive (or negative) contributions to yield components and hence to yield itself. The construction of a genotype containing several such positive QTL alleles for different yield components may result in directed yield improvement. It can be visualized by a response surface for yield in which we search for the global maximum (Cooper et al. 2014). When component traits are assessed under controlled conditions in a well-defined management regime, repeatable and predictable G×M interactions may be exploited while unpredictable G×E and G×M×E interactions may be kept small. This constitutes an ideal scenario for improving yield by selecting for promising combinations of yield components.
The main objective of the current paper is to investigate two yield dissections and ways in which these dissections can increase the efficiency of breeding efforts. We use a static yield dissection into simpler components and then identify QTLs for these components and compare the component QTLs to the QTLs for yield itself in terms of location and effects. The immediate questions we would like to answer or start answering are:
• What is the phenotypic and genetic variation for yield and its components?
In this paper, we consider two static yield dissections: one based on harvest, and one based on biomass production. Dissection based on harvest considers yield as the product of number of harvested organs and average organ size and is a standard dissection proposed and discussed in many papers (e.g. barley (Yin et al. 2002), maize (Peng et al. 2011), pea (Timmerman-Vaughan et al. 2005), melon (Zalapa et al. 2007)). In the dissection based on biomass production, yield is the product of total biomass production, harvest index and fruit fresh-dry weight ratio, all yield components in the component hierarchy presented by Higashide and Heuvelink (2009), Gur et al. (2010) and Ronga et al. (2017, 2019).
We demonstrate our dissection approach on a population of hybrids of indeterminate tomato (Solanum lycopersicum) bred for greenhouse production in the Netherlands, grown and sampled during 2016. Tomato is one of the most important vegetable crops worldwide, and it is the model plant for Solanaceae (Heuvelink et al. 2020), for which genetic and genomic resources are available (The Tomato Genome Consortium et al. 2012). The tomato hybrids came from a cross between 342 four-way recombinant inbred lines (RILs) and two testers. The parent lines for the four-way RILs comprised two elite lines and two wild-type lines. The four-way cross population was not specifically created for yield improvement, but instead presented a spectrum of genetic material with contributions from all parents. The wild-type material introduced genetic and physiological interactions that allowed us to investigate the contribution of wild yield component alleles to yield in elite genetic backgrounds.

Yield dissection models
We considered two static yield dissections: one based on harvest, and another one based on biomass production. They are given in Fig. 1.
The dissection based on harvest (Fig. 1A) is straightforward: yield is the number of harvested organs times their individual weight:

$$\text{Marketable yield} = (\text{Number of fruits}) \times (\text{Individual fruit fresh weight}) \tag{1}$$

This is a standard dissection in which the most elementary limitation to yield formation is included, where the increase in one component may not necessarily result in an increased total yield because of underlying trade-offs and negative correlations (Yin et al. 2002; Sadras 2007; Yin and Struik 2008; Gambín and Borrás 2010; Sadras and Lawson 2011; Bustos et al. 2013; Slafer et al. 2014). In contrast, the dissection of biomass (Fig. 1B) is based on the component hierarchy of Higashide and Heuvelink (2009). It considers yield as the product of fruit fresh-dry weight ratio and total fruit dry weight:

$$\text{Total yield} = (\text{Fruit fresh-dry weight ratio}) \times (\text{Total fruit dry weight}) \tag{2}$$

The total fruit dry weight can be further dissected into harvest index (HI) and total biomass:

$$\text{Total fruit dry weight} = (\text{Harvest index}) \times (\text{Total biomass}) \tag{3}$$

Root weight is not considered in the calculation of harvest index and total biomass, because the measurement of roots is impractical for obvious reasons. Combining equations (2) and (3) gives

$$\text{Total yield} = (\text{Fruit fresh-dry weight ratio}) \times (\text{Harvest index}) \times (\text{Total biomass}) \tag{4}$$

By having a static dissection, it is (implicitly) assumed that the fractions fruit fresh-dry weight ratio and harvest index are fixed during the growth season and constant within a genotype.
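The two dissections can be written as a few small functions. The sketch below is only an illustration of equations (1) to (4); the function names and the numerical plot values are hypothetical and do not come from the experiment.

```python
def marketable_yield(number_of_fruits, fruit_fresh_weight):
    """Equation (1): harvest dissection."""
    return number_of_fruits * fruit_fresh_weight


def total_fruit_dry_weight(harvest_index, total_biomass):
    """Equation (3): fruit dry weight from harvest index and total biomass."""
    return harvest_index * total_biomass


def total_yield(fresh_dry_ratio, harvest_index, total_biomass):
    """Equation (4): equation (2) with equation (3) substituted."""
    return fresh_dry_ratio * total_fruit_dry_weight(harvest_index, total_biomass)


# Hypothetical per-plot values, for illustration only.
harvest_example = marketable_yield(number_of_fruits=400, fruit_fresh_weight=0.1)
biomass_example = total_yield(fresh_dry_ratio=20.0, harvest_index=0.6, total_biomass=4.0)
```

Because both dissections are pure products, a proportional loss in one component can be offset by a proportional gain in another, which is why trade-offs between components matter for the final yield.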

Breeding population
The tomato population used in this experiment consisted of indeterminate tomato hybrids. Two wild-types (W1 and W2) were each crossed to an elite parent line (E1). The F1s of these two wild-type × elite crosses were then crossed with each other. The F1×F1 offspring population was crossed with another elite line (E2), and the resulting offspring population was selfed to produce a four-way RIL population. These four-way RILs were combined with two testers (TP1, TP2), producing two populations of 342 genotypes (RILs × TP1 and RILs × TP2) (Fig. 2). Although the phenotypic observations were done on hybrids, the QTL analyses relate the phenotypes to markers in the RILs. In the rest of the paper, we will for convenience discuss results in relation to both RILs and hybrids interchangeably. The four-way RILs were genotyped with a set of 279 SNP SOLCAP markers with minor allele frequency > 0.0125 (Sim et al. 2012; http://solcap.msu.edu/tomato_genotype_data.shtml). We used a genetic map that was based on the publicly available map, TraitGenetics EXPIMP2012: https://solgenomics.net/cview/map.pl?map_version_id=149. Together with the map, the markers formed the basis for identity by descent (IBD) calculations between parents and offspring that served to create design matrices for genome-wide testing of QTL additive effects. The calculation of IBD probabilities was done via a Hidden Markov Model by the Mathematica package RABBIT as described by Zheng et al. (2015).
Following these IBD calculations, we found that the DNA in the four-way RIL population contained a 28 % contribution from E1, 52 % from E2, 12 % from W1 and 8 % from W2. The crossing scheme in Fig. 2 suggests that without selection, we should expect contributions of 25 % from E1, 50 % from E2 and 12.5 % each from W1 and W2. Therefore, some selection against the wild-type alleles, most visibly those of W2, appears to have occurred during the construction of the RIL population.
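The expected contributions follow from halving the parental contributions at each cross. The sketch below retraces the crossing scheme as described in the text; the helper `cross` and the variable names are illustrative, and the selfing generations are assumed not to change the expected founder proportions in the absence of selection.

```python
from fractions import Fraction


def cross(parent_a, parent_b):
    """Expected founder contributions in offspring: the average of both parents."""
    founders = set(parent_a) | set(parent_b)
    zero = Fraction(0)
    return {f: (parent_a.get(f, zero) + parent_b.get(f, zero)) / 2 for f in founders}


E1, E2 = {"E1": Fraction(1)}, {"E2": Fraction(1)}
W1, W2 = {"W1": Fraction(1)}, {"W2": Fraction(1)}

f1_a = cross(W1, E1)             # first wild-type x elite cross
f1_b = cross(W2, E1)             # second wild-type x elite cross
double_cross = cross(f1_a, f1_b)
four_way = cross(double_cross, E2)

expected = {f: float(v) * 100 for f, v in four_way.items()}  # in percent
observed = {"E1": 28, "E2": 52, "W1": 12, "W2": 8}           # reported IBD-based values
deviation = {f: observed[f] - expected[f] for f in observed}
```

Here `deviation` is positive for both elite founders and negative for both wild-type founders, with the largest shortfall for W2.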

Growing conditions and experimental design
The phenotyping experiment was conducted in the Westland area (the main tomato-producing region in the Netherlands) with commercial crop management. The experiment started in the first week of February 2016, when tomato seedlings were transferred to an area of around 3000 m² in a commercial tomato greenhouse (6 ha). Fruit harvest started in the middle of April. In September, the shoot tops were removed (decapitation), and the harvest finished in the middle of November. Tomatoes were grown on stone wool, and stems were supported by ropes (high-wire system). A shoot top produced about one truss and three leaves every week. Trusses were pruned to maintain six fruits per truss according to what is customary for this kind of truss tomato in the Netherlands. Plants were grown without supplementary light. Average daily temperature was 21.7 °C during daytime and 17.1 °C for night-time, while average CO₂ concentration during daytime was 629 ppm. Supporting Information-Fig. S1 shows tomato production in the greenhouse.
As experimental design, a randomized complete block design was employed (see Supporting Information-Fig. S2). The area was partitioned into two adjacent areas: RILs crossed with tester 1 (TP1) produced hybrids, RILsTP1, which were planted in area 1 (block 1), and hybrids created from the second tester (TP2), RILsTP2, were grown in area 2 (block 2). For quality control and spatial adjustment, several repetitions of crosses of the elite parents with tester 1 and 2 were planted (E1TP1 and E2TP1 in area 1, E1TP2 and E2TP2 in area 2). Also, several duplicates of five commercial cultivars were placed in both areas to be used as reference for correcting phenotypic values for spatial trends in later analysis. An experimental unit, or plot, included nine plants of a genotype. Because three out of nine plants were allowed to keep an additional stem, the resulting stem density was 3.37 m⁻², in agreement with commercial practice.

Phenotyping
Per plot, we phenotyped each of the traits as identified in the harvest and biomass dissection models (Table 1). Total yield, marketable yield, number of fruits, and fruit fresh-dry weight ratio were measured directly. All fruits per plot were measured for total yield, and all marketable fruits per plot were measured for marketable yield and the number of fruits. Individual fruit fresh weight was calculated at every harvest from marketable yield divided by the number of fruits. To determine the fruit fresh-dry weight ratio, three fruits were measured together from one side of a truss per plot. After fruit fresh weight was measured, fruits were dried in a ventilated oven at 45 °C for 24 h, 70 °C for the next 24 h and 105 °C for the last 72 h to determine fruit dry weight. Fruit dry weight measurements were conducted three times (in June, August and October), and the mean values per plot of these three measurements were used in the analysis. Also, the individual leaf dry weight was measured three times (in June, August and October). Three fully grown bottom leaves were measured together from a stem per plot. The leaves were dried in a ventilated oven at 45 °C for 24 h followed by 105 °C for 48 h.

Figure 2. Construction of four-way RILs and corresponding hybrids. Two elite parents (E1, E2) and two wild-type parents (W1, W2) were founders for an offspring population of 342 four-way RILs. The RILs were crossed with two testers (TP1, TP2). The resulting hybrids (RILsTP1, RILsTP2) were phenotyped.
The harvest index was calculated as total fruit dry weight divided by total biomass. The total biomass in turn was taken as the sum of total fruit dry weight, total leaf dry weight and total stem dry weight. Total fruit dry weight was calculated from total yield divided by the fruit fresh-dry weight ratio, in line with equation (2). Total leaf dry weight was calculated from the mean of the three individual leaf dry weight measurements per plot multiplied by the total number of leaves. The total number of leaves in turn was calculated from the number of leaves under the first truss and the number of trusses (three leaves between two adjacent trusses). Total stem dry weight was estimated from stem length and diameter based on a regression model. We used this estimation because it was not feasible to measure the whole stem dry weight for more than 700 plots. For all plots, stem length and diameter were measured on one stem per plot. The mean stem diameter was calculated from six points uniformly distributed over a stem. First, stem volume (cm³) was calculated from stem length (cm) and mean stem diameter (cm), assuming a perfect cylinder shape:

$$\text{Stem volume} = \pi \times \left(\frac{\text{Mean stem diameter}}{2}\right)^{2} \times (\text{Stem length}) \tag{5}$$

In total, 102 plots (45 RILsTP1, 45 RILsTP2, 3 E1TP1, 3 E2TP1, 3 E1TP2, 3 E2TP2, of which 10 RILs were taken equal between RILsTP1 and RILsTP2) were randomly selected based on a method to uniformly cover the genetic space. Using equation (6), total stem dry weights were calculated for all plots.
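The derived traits in this paragraph can be traced in a short sketch. It assumes that fruit dry weight is obtained by inverting equation (2), that is, by dividing total yield by the fresh-dry weight ratio, and it takes the total number of leaves and the stem dry weight as given inputs because the exact leaf-count rule and the stem regression are not reproduced here. All function and argument names are hypothetical.

```python
import math


def stem_volume_cm3(stem_length_cm, mean_stem_diameter_cm):
    """Volume of a stem treated as a perfect cylinder."""
    radius = mean_stem_diameter_cm / 2.0
    return math.pi * radius ** 2 * stem_length_cm


def biomass_traits(total_yield, fresh_dry_ratio, mean_leaf_dry_weight,
                   number_of_leaves, stem_dry_weight):
    """Derive fruit dry weight, total biomass and harvest index for one plot.

    stem_dry_weight is an input here; in the study it came from a
    regression on stem measurements that this sketch does not model.
    """
    fruit_dw = total_yield / fresh_dry_ratio          # inverse of equation (2)
    leaf_dw = mean_leaf_dry_weight * number_of_leaves
    total_biomass = fruit_dw + leaf_dw + stem_dry_weight  # roots excluded
    return {
        "total_fruit_dry_weight": fruit_dw,
        "total_biomass": total_biomass,
        "harvest_index": fruit_dw / total_biomass,
    }


# Hypothetical plot values, for illustration only.
traits = biomass_traits(total_yield=40.0, fresh_dry_ratio=20.0,
                        mean_leaf_dry_weight=0.01, number_of_leaves=100,
                        stem_dry_weight=1.0)
```

Multiplying the returned harvest index and total biomass by the fresh-dry weight ratio recovers the total yield, as required by equation (4).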

Phenotypic and QTL analysis
A first step in the phenotypic analysis was the calculation of spatially adjusted genotypic means. We utilized the R-package SpATS (Spatial Analysis of Field Trials with Splines), fitting a two-dimensional penalized spline (P-spline) mixed model (Velazco et al. 2017; Rodríguez-Álvarez et al. 2018) to calculate the BLUEs, the best linear unbiased estimates corrected for local spatial trends created by variations in environmental and management conditions. The same R-package was used to calculate a generalized heritability (Cullis et al. 2006; Rodríguez-Álvarez et al. 2018). The duplicates of the hybrid genotypes E1TP1, E2TP1, E1TP2, E2TP2 and of five commercial cultivars that were included in the experiment functioned as references and facilitated the spatial corrections. Histograms of BLUEs for yield and components were generated to show the proportion and magnitude of transgression that occurred for single traits and pairs of traits, i.e. the proportion of offspring RILs that exceeded the performance of their parents and the amount by which this was the case. Response surface plots for yield as a function of component traits (BLUEs) helped to investigate trade-offs between traits.
Because our mapping population was the result of a non-standard four-way cross, a special purpose QTL mapping procedure was used within a mixed model framework and based on IBD probabilities between parents and RILs as described in detail by Li et al. (2020). To test for QTLs, at each genomic evaluation position the following models for the observation $y$ were compared in a deviance test, where we use a scalar notation for simplicity: $H_0$: $y = \mu + \varepsilon$ versus $H_a$: $y = \mu + z^{t}u + \varepsilon$, with $\mu$ a fixed intercept, $\varepsilon$ a normally distributed error with mean 0 and variance $\sigma^2$, $z$ a design vector containing scaled IBD probabilities between the four parents and the RILs, $z^{t}$ the transpose of $z$ and $u$ a coefficients vector of random QTL allele substitution effects with mean zero and variance $\sigma^2_{\mathrm{QTL}}$. Scaling of $z$ was such that IBD probabilities were converted to allele substitution effects.
The deviance test to compare models under null ($H_0$) and alternative hypothesis ($H_a$) consisted of a log-likelihood ratio test for the variance component, $\sigma^2_{\mathrm{QTL}}$, within the package ASReml (Butler et al. 2017). A threshold for significance was put at $-\log_{10} P \geq 2.9$ as a compromise between the desire to impose an approximate correction for multiple testing and find some consistency in the occurrence pattern of yield and yield component QTLs across the genome.
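To make the significance threshold concrete, the sketch below converts a deviance (twice the difference in log-likelihood between the two models) into $-\log_{10} P$. It assumes a single variance component tested on the boundary of its parameter space, for which a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom is a common reference distribution; whether the original analysis used this mixture is an assumption, and the function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def neg_log10_p(deviance, boundary_mixture=True):
    """-log10 of the P-value for a likelihood ratio test of one variance component.

    boundary_mixture=True halves the chi-square(1) tail probability
    (assumed reference distribution, not confirmed by the source).
    """
    if deviance <= 0:
        return 0.0  # no evidence for a QTL: P-value taken as 1
    p = chi2_sf_1df(deviance)
    if boundary_mixture:
        p *= 0.5
    return -math.log10(p)


THRESHOLD = 2.9                 # threshold reported in the text
p_cutoff = 10 ** (-THRESHOLD)   # corresponding P-value, about 0.0013


def is_significant(deviance):
    return neg_log10_p(deviance) >= THRESHOLD
```

On this scale, the reported threshold corresponds to a P-value of roughly 0.0013 per evaluation position.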
The significant QTLs from a genome scan were subsequently inserted in a multi-QTL mixed model: $y = \mu + \sum (z^{t}u) + \varepsilon$, where the summation, $\sum$, indicates that multiple QTLs were included in the model. Each QTL was fitted with a specific variance. Final estimates for QTL effects were obtained from the multi-QTL model fit for QTLs with non-zero variances. Therefore, to be reported, a QTL needed to pass both the single-locus and the multi-locus test.
To investigate colocalization of yield and component QTLs, a window of 20 cM on either side of the yield QTL was defined within which the component QTL should be located. We chose 20 cM because this seemed to be a reasonable choice of confidence interval for QTL location given our type of population, population size and QTL effect magnitude (Darvasi and Soller 1997;Wu et al. 2007). A further condition to be fulfilled for colocalization of yield and component QTLs was that the parental origin of the yield QTL allele with the largest magnitude coincided with the parental origin of the component QTL allele, or haplotype, with the largest magnitude.
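The two colocalization conditions can be expressed as a simple check. The sketch below is a minimal illustration, assuming that each QTL is summarized by its chromosome, its map position in cM and one effect per parent, and that "largest magnitude" refers to the largest absolute effect; the `QTL` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QTL:
    trait: str
    chromosome: int
    position_cm: float
    allele_effects: dict  # parent name -> estimated allele effect


def top_parent(qtl):
    """Parent whose allele has the largest absolute effect at this QTL."""
    return max(qtl.allele_effects, key=lambda p: abs(qtl.allele_effects[p]))


def colocalizes(yield_qtl, component_qtl, window_cm=20.0):
    """True if the component QTL lies within the window and shares the top parent."""
    same_chromosome = yield_qtl.chromosome == component_qtl.chromosome
    within_window = abs(yield_qtl.position_cm - component_qtl.position_cm) <= window_cm
    same_origin = top_parent(yield_qtl) == top_parent(component_qtl)
    return same_chromosome and within_window and same_origin


# Hypothetical QTLs, for illustration only.
y_qtl = QTL("total yield", 3, 45.0, {"E1": 0.8, "E2": 0.1, "W1": -0.3, "W2": -0.6})
c_near = QTL("number of fruits", 3, 58.0, {"E1": 4.0, "E2": 1.0, "W1": -2.0, "W2": -3.0})
c_far = QTL("number of fruits", 3, 80.0, {"E1": 4.0, "E2": 1.0, "W1": -2.0, "W2": -3.0})
c_other = QTL("harvest index", 3, 50.0, {"E1": 0.01, "E2": 0.0, "W1": 0.05, "W2": -0.02})
```

With these values, only `c_near` colocalizes with `y_qtl`: `c_far` falls outside the window and `c_other` has a different parental origin for its largest effect.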
We investigated the predictive accuracy of multi-QTL models for yield and yield components as well as the accuracy of dissection models for yield prediction. For the latter, we first produced the multi-QTL predictions for the yield components in a dissection, and then calculated the product of the multi-QTL predictions of two (or more) yield components. For the evaluation of prediction accuracies, we implemented a cross-validation procedure. The data were partitioned into five folds, with four folds being included in a training set and one fold serving as validation set. This 5-fold cross-validation was repeated 20 times for different random partitionings of the data. An overview of all phenotypic and QTL analyses is given in Fig. 3.
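The cross-validation scheme can be sketched as follows. This is not the multi-QTL mixed model of the study: a least-squares line on a single hypothetical marker score stands in for it, and accuracy is assumed to be the Pearson correlation between predicted and observed values in the validation fold. All names are illustrative.

```python
import random
from statistics import mean, median


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def repeated_kfold(n, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists for `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


def fit_predict(x, y, train, test):
    """Stand-in model: least-squares line of trait y on one marker score x."""
    mx, my = mean(x[j] for j in train), mean(y[j] for j in train)
    sxx = sum((x[j] - mx) ** 2 for j in train)
    slope = sum((x[j] - mx) * (y[j] - my) for j in train) / sxx
    return [my + slope * (x[j] - mx) for j in test]


def cv_accuracies(x, yield_obs, components, k=5, repeats=20, seed=0):
    """Median accuracy of direct yield prediction and of the component product."""
    direct, product = [], []
    for train, test in repeated_kfold(len(yield_obs), k, repeats, seed):
        observed = [yield_obs[j] for j in test]
        pred_direct = fit_predict(x, yield_obs, train, test)
        pred_product = [1.0] * len(test)
        for comp in components:
            comp_pred = fit_predict(x, comp, train, test)
            pred_product = [p * c for p, c in zip(pred_product, comp_pred)]
        direct.append(pearson(pred_direct, observed))
        product.append(pearson(pred_product, observed))
    return median(direct), median(product)
```

Calling `cv_accuracies(x, yield_obs, [component_1, component_2])` returns the median accuracies over all 100 training and validation splits, one value for direct prediction and one for the dissection-based prediction.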

Heritabilities
The experiment was analysed by various mixed models, whose most important outputs were heritabilities and spatially corrected genotypic means. The heritabilities give an impression of the genetic determination of the traits and the possibilities for identifying QTLs. Table 2 shows that heritabilities for total yield and marketable yield were moderately high, 0.62 and 0.64, respectively. Therefore, direct selection for yield on the basis of yield QTLs looks feasible within this RIL population. When we look at the heritabilities of the yield components in the yield dissections, we note that the heritabilities of the components in the harvest dissection are higher than those for yield: 0.77 for number of fruits and 0.80 for individual fruit fresh weight. In contrast, for the components in the biomass dissection, the heritabilities were lower than for yield: 0.57 for fruit fresh-dry weight ratio, 0.47 for total fruit dry weight, 0.52 for harvest index and 0.48 for total biomass.

Genotypic means, transgressions and correlations
The distributions of the genotypic means for total yield and marketable yield reveal that relatively little transgression occurred for RILs beyond the elite parents (Fig. 4A and B), i.e. few RILs had higher yields than the elite parents. The distributions of the components in the harvest dissection (Fig. 4C-H) point to an interesting phenomenon. For individual fruit fresh weight, again few RILs were better than the parents, but for the number of fruits, many RILs produced more fruits than the elite parents. This suggests that yield may be improved by increasing the number of fruits for genotypes with sufficiently high fruit fresh weight. This would be straightforward if number of fruits and fruit fresh weight were independent traits.
In Fig. 5, we observe a trade-off between number of fruits and fruit fresh weight (Pearson correlation was -0.44). The scatter plot of predicted yields with isoclines illustrates that despite the moderately negative correlation between the components, a small set of RILs achieved superior yields because of a higher number of fruits. This higher number of fruits compensated for a slightly lower fruit fresh weight in a limited number of RILs. This observation is a first success in the application of yield dissection. The harvest dissection provides ideas for possible improvement strategies of yield that would be missed if yield were studied only by itself.
In the biomass dissection, total yield is created from fruit fresh-dry weight ratio and total fruit dry weight. The latter is then the outcome of harvest index combined with total biomass. Fig. 4 shows little transgression for the fruit fresh-dry weight ratio, but a reasonable amount of transgression for the total fruit dry weight. Therefore, yield improvement is suggested via increasing total fruit dry weight. The response surface for yield as a function of total fruit dry weight and fruit fresh-dry weight ratio in Fig. 5 demonstrates more opportunities for improving yield via total fruit dry weight than via the fruit fresh-dry weight ratio. Again, the trade-off between the components makes it almost impossible to achieve higher yields by simultaneous improvements in both components. At best one component, total fruit dry weight, can be increased in such a way that decreases in the other component, fruit fresh-dry weight ratio, are more than compensated for. This insight could only be obtained via a dissection. Further insight into alternative strategies for improving total yield via biomass dissection follows from the decomposition of total fruit dry weight into harvest index and total biomass (Figs 4 and 5). Fig. 4 leads to the conclusion that for both harvest index and total biomass substantial transgression occurred, suggesting that higher total fruit dry weight and higher total yield are possible via either or both of harvest index and total biomass. However, the yield response surface of Fig. 5 demonstrates that more yield improvement can be obtained via higher total biomass than via higher harvest index.

QTL analysis
Results of the QTL analysis are presented mainly in Table 2 and Figs 6 and 7; see also Supporting Information-Table S1. Table 2 gives the numbers of QTLs for all traits and the numbers of colocalizing QTLs between yield and its components. We decided on colocalization between a yield QTL and a component QTL when the component QTL appeared within a 20 cM window to the left or right of the yield QTL and the parental origin of the alleles for maximum yield and component coincided. Note that this is a rather strict criterion. For yield and its components, between 8 and 21 QTLs were detected per trait. We will refer to yield QTLs as follows: y1.1 is the first yield QTL on chromosome 1, y1.2 is the second yield QTL on chromosome 1, y2.3 is the third yield QTL on chromosome 2, etc. Of the 16 QTLs detected for total yield and 14 for marketable yield, 10 QTLs colocalized, resulting in 20 yield QTLs in total (16 + 14 - 10). All yield QTLs, except for one (y3.2; see Fig. 6), colocalized with component trait QTLs. More yield QTLs colocalized with QTLs for component traits in the biomass dissection than in the harvest dissection. This result is important for plant breeders, as it allows them to get a better understanding of the nature of yield QTLs and the utility of individual QTLs for improving yield, where this utility is determined by QTL alleles increasing certain yield components without compromising other yield components. Table S1, which contains physical positions of QTLs, can be used in the construction of a strategy to arrive at desirable genotypes (ideotypes), where the choice of which QTLs to target for yield improvement needs to take into account linkage and pleiotropic effects between yield and yield components.
Yield QTLs were identified on most chromosomes; the exception was chromosome 11, which carried no yield QTL although it did hold a QTL for fruit fresh-dry weight ratio. For 17 out of 19 QTLs (y1.1 and y2.1 were the exceptions), the parental allele or haplotype that showed the highest positive effect for yield also presented the highest positive effect for at least one component trait, an extremely useful result for breeders. Of these 17 QTLs, eight yield QTLs (y1.2, y3.1, y4.1, y5.1, y6.1, y7.1, y10.1, y12.2) corresponded to QTLs for component traits in both the harvest and the biomass dissection (black boxes in Fig. 6), two yield QTLs (y2.2, y10.2) corresponded to QTLs for component traits in the harvest dissection but not in the biomass dissection (orange box), and seven yield QTLs (y1.3, y1.4, y6.2, y7.2, y8.1, y9.1, y12.1) corresponded to QTLs for component traits in the biomass dissection but not in the harvest dissection (green box). Fig. 7 shows parental chromosomal effects as well as multi-QTL predictions for the parent genotypes, two wild-type and two elite lines. In this case, parental haplotypes were constructed by combining the QTL genotypes across the QTL loci on single chromosomes to create a kind of chromosomal haplotype for each of the four parents. Chromosomal effects per parent were calculated by summing allelic effects. The representation in Fig. 7 makes it possible to appreciate the individual chromosomal contributions to yield and component traits, expressed as a deviation from the mean (intercept in multi-QTL model), as well as the predicted phenotype for the parent.
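The summation of allelic effects into chromosomal effects can be illustrated with a short sketch. It assumes that each QTL contributes one estimated effect per parent and that the predicted parental phenotype is the intercept plus the sum of that parent's chromosomal effects; the function names and effect values are hypothetical.

```python
from collections import defaultdict


def chromosomal_effects(qtl_effects):
    """Sum allelic effects per parent and chromosome.

    qtl_effects: list of (chromosome, {parent: allelic effect}) tuples,
    one entry per QTL in the multi-QTL model.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for chromosome, effects in qtl_effects:
        for parent, effect in effects.items():
            totals[parent][chromosome] += effect
    return {parent: dict(by_chr) for parent, by_chr in totals.items()}


def predicted_parent_value(intercept, effects_by_chromosome):
    """Predicted phenotype: intercept plus all chromosomal effects of a parent."""
    return intercept + sum(effects_by_chromosome.values())


# Hypothetical QTL effects for one trait, for illustration only.
qtls = [
    (1, {"E1": 0.50, "E2": 0.25, "W1": -0.25, "W2": -0.50}),
    (1, {"E1": 0.25, "E2": 0.00, "W1": 0.25, "W2": -0.50}),
    (7, {"E1": -0.25, "E2": 0.25, "W1": 0.50, "W2": -0.50}),
]
effects = chromosomal_effects(qtls)
prediction_w1 = predicted_parent_value(10.0, effects["W1"])
```

In this toy example, W1 has a net effect of zero on chromosome 1 but a positive effect on chromosome 7, which is the kind of pattern that would make a wild-type parent a donor of useful alleles.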
For total yield (Fig. 7A), marketable yield (Fig. 7B), individual fruit fresh weight (Fig. 7D), total fruit dry weight (Fig. 7F) and total biomass (Fig. 7H), the two elite parents had an overall positive predicted trait value, while the two wild-type parents had a negative overall prediction, with elite parent 1 being best, followed by elite parent 2, wild-type parent 1 and, least productive, wild-type parent 2. Still, in the elite parents some small negative effects were present on some chromosomes, while some sizeable positive effects occurred in wild-type parent 1. The distribution of these effects indicates that yield increases seem possible by replacing some elite alleles by wild-type alleles. This phenomenon is more pronounced in the components than in yield itself. Furthermore, when considering number of fruits (Fig. 7C), fruit fresh-dry weight ratio (Fig. 7E) and harvest index (Fig. 7G), the usefulness of wild-type parent 1 as bearer of positive yield component alleles becomes even more evident.

Predictions from multi-QTL models
Although the merit of dissecting yield into components was already demonstrated above by the extra options that were offered to compile superior genotypes for yield from QTL alleles for components with different parental origin, we were curious to see whether yield can be predicted from the components in a dissection model. We built multi-QTL models for the components, took the multi-QTL predictions for the components and applied to them the function that, according to the dissection model, should result in yield. In this paper, product formulations were defined for yield in the harvest and biomass dissection models. Fig. 8 contains the prediction accuracies for yield (black box in Fig. 8) and component traits from their corresponding QTLs (white boxes in Fig. 8) based on 20 realizations of a 5-fold cross-validation. For marketable yield, the (median) accuracy from predictions by multi-locus yield QTL models was 0.63, while the accuracy for the components number of fruits and individual fruit fresh weight was 0.73 and 0.69, respectively. When the phenotypic predictions for the components were multiplied to approximate yield (grey box in Fig. 8), the accuracy for the harvest dissection model, 0.56, was slightly lower than the accuracy for predicting yield from its own QTLs. The correlation between the predictions of yield from its own QTLs and from the harvest dissection model with component QTLs was 0.84 (see Supporting Information-Fig. S3A).
In the biomass dissection, we observed an accuracy of 0.61 for total yield, and 0.65 for fruit fresh-dry weight ratio, while for total fruit dry weight it was 0.50. Accuracy for total yield prediction using the product of predicted components in the biomass dissection model, 0.63, was slightly better than that of direct prediction of total yield from its own QTLs. The correlation between total yield prediction from its own QTLs and total yield from the biomass dissection model was 0.89 (see Supporting Information-Fig. S3B). Extending the biomass dissection model with a decomposition of total fruit dry weight into harvest index and total biomass led to a somewhat lower accuracy, 0.58, than the other two prediction models for total yield (Fig. 8; see Supporting Information-Fig. S3C).
The conclusion of these multi-locus prediction analyses is that the genetic basis of the component traits can be estimated well enough to try to predict yield from the components using a dissection model. When the components can be measured or approximated earlier or cheaper than yield, whether by genomic prediction, secondary phenotyping or a combination of those two, prediction of yield via dissection models appears attractive.

Limitations of yield on the underlying levels
By following a yield dissection, we obtain insight into the limitations on yield that arise from genetic contributions at upstream levels. As a follow-up, we should target QTLs for component traits such that observed trade-offs and negative correlations can be circumvented or 'broken' to further improve yield.
One apparent yield limitation is the trade-off between the number of fruits and individual fruit fresh weight (the two underlying components of marketable yield in the harvest-based dissection). The wild-type parents mainly had negative effects on individual fruit fresh weight, but these negative effects seemed to be mitigated by positive effects on the number of fruits. Hybrids with a higher number of fruits tended to have a lower individual fruit fresh weight, and vice versa, resulting in a negative correlation between the two traits. Some chromosomes contained QTLs that had opposite effects on number of fruits and individual fruit fresh weight (e.g. QTLs on chromosomes 7 and 12). This suggests a necessity to consider both component traits simultaneously to increase yield. Several hybrids had a combination of individual fruit fresh weight and fruit number that led to higher yield than that of the elite parents, presenting favourable combinations of QTLs for the underlying component traits that should be targeted by breeders.
The underlying component traits of yield in the biomass-based dissection, fruit fresh-dry weight ratio and total fruit dry weight, were only weakly correlated. This might be explained by the observation that the QTLs for the two component traits were detected on different chromosomes (chromosomes 3, 6 and 11 for fruit fresh-dry weight ratio, and 10 and 12 for total fruit dry weight, respectively). In addition, QTLs on 'shared' chromosomes had differentiating effects on the two component traits (QTLs on chromosomes 1 and 4 affected fruit fresh-dry weight ratio much more than total fruit dry weight, whereas the opposite was true for QTLs on chromosome 9). This suggests that the two traits are largely independent of each other and can as such be increased independently and/or simultaneously to increase total yield, which offers opportunities for breeding.
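The role of the correlation between components in a product dissection can be made explicit with a short worked identity. This is a general property of products, stated on the logarithmic scale as an illustrative assumption; the analyses in this paper were not necessarily carried out on that scale. For a yield $Y = A \times B$ with components $A$ and $B$:

```latex
\begin{align}
  \log Y &= \log A + \log B, \\
  \operatorname{Var}(\log Y) &= \operatorname{Var}(\log A) + \operatorname{Var}(\log B)
    + 2\,\operatorname{Cov}(\log A, \log B).
\end{align}
```

A negative covariance, as for number of fruits and individual fruit fresh weight, removes part of the component variation from yield, so gains in one component are partly offset by losses in the other. A covariance near zero, as for fruit fresh-dry weight ratio and total fruit dry weight, lets variation in the two components carry over to yield largely additively.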
Total fruit dry weight was further dissected into harvest index and total biomass. In our population, total biomass correlated strongly with total fruit dry weight. By far the largest QTL effects on total biomass were contributed by elite parent 1, although at least one QTL haplotype on chromosome 1 from wild-type parent 1 also looked promising. On the other hand, both wild-type parents as well as elite parent 2 had considerable positive and negative effects on harvest index from QTLs on different chromosomes, most importantly chromosome 7 (the highest positive effect from elite parent 2 and wild-type parent 1, and the most negative effect from wild-type parent 2, respectively). Five out of 12 QTLs for total biomass colocalized with yield QTLs with the same haplotype showing the highest positive effect, whereas only one out of eight QTLs for harvest index colocalized with yield QTLs. The question that arises is why harvest index QTLs seem to contribute so little to yield in this population. Does the contribution of harvest index QTLs to yield depend on genetic background or management? Or is it a consequence of our dissection model, which may be too simple and may violate certain causal relations?
A previous study comparing eight Dutch indeterminate tomato cultivars released between 1950 and 2003 also showed that yield was positively correlated with total biomass but not with harvest index (Higashide and Heuvelink 2009). In contrast, another study comparing six Japanese indeterminate tomato cultivars released in the past 80 years showed that yield was strongly positively correlated with harvest index but not with total biomass (Higashide et al. 2012). These conflicting observations may be the result of the use of different genotypes or of genotype by environment effects (genotypic differences being dependent on the environmental and/or management conditions, van Eeuwijk et al. 2016). A higher harvest index results in a lower biomass allocation to leaves and a lower leaf area index, which could mean a lower total biomass unless the specific leaf area (leaf area per leaf dry mass) and/or light use efficiency (biomass production per unit intercepted light) would improve. In major staple crops, improvements of harvest index have historically increased yield potentials (Hay 1995;Smith et al. 2018). It is reported that modern crop cultivars have a higher harvest index, although total dry matter production is most often very similar (Evans 1993).

Contributions from wild-types
Many genes contribute to yield, making yield improvement a considerable challenge for breeders. In tomato, domestication and improvement have increased the productivity by narrowing its genetic basis (Bai and Lindhout 2007). Increasing the genetic diversity may be beneficial for improving yield along different component traits. Wild germplasm has been used as a source of new alleles for tomato breeding in recent decades (Rick 1978;Rick and Chetelat 1995;Tanksley and McCouch 1997;Lin et al. 2014). In our experiment, two wild-type parents provided new alleles. Many hybrids performed worse than the elite lines in terms of marketable yield (for the harvest dissection) or total yield (for the biomass dissection). Negative effects of wild-type parents are to be expected, as individual fruit fresh weight has increased 100-fold during the domestication and improvement of tomato (Lin et al. 2014). There was, however, a subset of hybrids in our population that performed better than the elite lines. This is because wild-type parent 1 also contributed positive QTL effects to different component traits underlying yield, such as number of fruits and fruit fresh-dry weight ratio. This emphasizes that by only considering yield and not looking at underlying components, positive effects on yield via component traits remain obscured. Moreover, genetic effects present an actionable target for breeders, who can cross QTLs for component traits into elite lines. In addition, it suggests that wild parents are useful as genetic resources, not only for incorporating multiple disease resistance loci in elite lines, but also to improve yield. Contributions of wild parents to tomato yield have been reported for biparental populations (Zamir 2001;Gur and Zamir 2004;Lippman et al. 2007), whereas our study reports it for a multi-parent population.

Future methodological possibilities
Beneficial component traits can be used for making a cross for future cultivars and/or in choosing parent lines (Cooper et al. 2014;van Eeuwijk et al. 2019), and in this paper, we present a proof of principle for using yield dissection models for identifying such beneficial component traits. The usefulness of dissecting traits along component traits has received more attention with the emergence of phenotyping technology, such as high-throughput phenotyping. Ideally, a phenotype is highly correlated with QTLs. We can then perform a selection of high-yielding RILs/hybrids in an early stage by extracting DNA from seedlings, eliminating the need for following full breeding cycles, which would reduce breeding efforts considerably.
Yield is the integrated result of many genetic and gene-environment interactions, and a strong correlation between QTLs and phenotypic traits is more likely when the traits are on a lower organizational level, i.e. are upstream, which excludes several genetic interactions. Because of the simpler genetic basis, it is also expected that prediction of component traits is more accurate than prediction of yield itself. In our analysis, the yield prediction models based on QTLs for component traits performed about as well as the ones based on QTLs for yield itself, suggesting that genetic dissection is viable. It should be noted that the prediction accuracy may be highly influenced by the quality and quantity of genotypic and environmental sampling. In our experiment, number of fruits and individual fruit fresh weight were phenotyped for all fruits.
Figure 6. QTL locations and effects for yield and components. On vertical axis, chromosome 1 to 12, on horizontal axis, position within chromosome. QTL positions are indicated by dots. Dashed vertical lines represent markers. Colour of dot shows haplotype with highest positive effect. Size of dot represents proportion (%) of largest haplotype effect to mean for the trait. Boxes indicate colocalized yield and component QTLs, with black boxes for components belonging to both harvest and biomass dissection, orange for components belonging to harvest dissection and green for components belonging to biomass dissection.
Additional advantages of using a genetic dissection with component traits may be:
1) The heritability may be higher than for yield due to the simpler genetic mechanism of component traits. In our results, the number of fruits and the individual fruit fresh weight showed higher heritability than yield (Table 2);
2) Some component traits do not need a complete harvest season (8 months) to be determined, possibly shortening the breeding cycle, e.g. individual fruit fresh weight and the fruit fresh-dry weight ratio can be measured for early yield.
In particular, this latter point may become more relevant as the measurement of several component traits is becoming easier and cheaper, and feasible for larger numbers of genotypes through new phenotyping technologies (e.g. drones or high-throughput phenotyping platforms). The use of a four-parent cross that also involves two wild-type parents generates a larger 'genetic space' of possible genotypes, exposing trade-offs and negative correlations that might not have been found had only a biparental population been used. Depending on the cross, some traits may not be sufficiently contrasting, e.g. by crossing two parents with high yield, one will most likely not find promising QTLs for improving yield. In this four-parent cross involving very different genotypes, both the aggregated trait 'yield' and underlying component traits show strong contrasts.
Another relevant item is the set-up of the experiment. By growing the population in a greenhouse, management and environmental conditions are under relative control. The advantage of this is that we have replicable conditions. However, it also means that the application of the QTLs we found may be limited when considering tomato populations grown in other environments. Other experiments could focus on alternative designs to evaluate which component-trait QTLs are consistently expressed across environmental and management gradients. This does not affect the methodological approach of combining static yield dissection modelling with QTL analysis for component traits, which we assume to be generally applicable. To include genotype by environment interactions in yield prediction, both static yield dissection models with environmental covariates (Millet et al. 2016) and dynamic crop growth models that include environmental factors may be useful (Cooper et al. 2016). In a follow-up paper to the current paper, we will report on results obtained in two experiments in Spanish greenhouses and come back to the question of the consistency of yield and component QTLs and the utility of yield dissection models.

CONCLUDING REMARKS
In this paper, we investigated the combination of static genetic yield dissection and QTL analyses for the possible detection of QTLs for component traits underlying yield. A primary outcome of such an approach for breeders is insight into yield response surfaces as functions of component traits, in which alternative scenarios for yield improvement can be investigated under genetic and management constraints as well as cost considerations. The high accuracies of yield prediction from yield dissection models, where each component trait was predicted by its own QTLs, suggest that breeding for high yields can be done by selecting for component traits. A multi-parent population with sufficiently diverse parent genotypes enforces strong contrasts in component traits. Of interest to pre-breeding is that wild germplasm has more to offer than disease resistance genes and can contribute to higher yields.

SUPPORTING INFORMATION
The following additional information is available in the online version of this article:
Figure S1. Indeterminate tomato grown in greenhouse for phenotyping. A shoot top produced a truss and three leaves constantly at around 1-week interval. Some flowers at top were pruned to keep six fruits per truss.
Figure S2. Experimental design in the greenhouse.
Figure S3. Scatter plots and correlations for yield predictions via dissection models versus direct yield predictions. (A) Harvest dissection, (B) Biomass dissection highest level, (C) Biomass dissection two levels.
Table S1. Yield and component QTLs. QTL positions, SolCAP marker names, −log10(p) values for test on variance component of haplotype effects, and haplotype effects.

ACKNOWLEDGEMENTS
We thank BASF Vegetable Seeds for allowing detailed crop measurements in their experiment. We also thank the following people for contributions to the project: Jan Snel (JFH Adviesbureau Snel), Piet Arts (Barenbrug), Bertrand Schuiling (Wiersum Plant Breeding), Roel de Bakker, Bart de Bakker (De Bakker Westland), Chaozhi Zheng for IBD calculation, Julio Velazco and Martin Boer for help with spatial analyses, Wenhao Li, Bader Arouisse and Marcos Malosetti for help with QTL analyses and predictions (WUR Biometris), Oscar Castellanos and Daisuke Tsutsumi for working together during phenotyping.

SOURCES OF FUNDING
The project was funded by the Dutch STW TKI-U Program 'Green Genetics' project nr. 14524, BASF Vegetable Seeds and Adviesbureau JFH Snel.
Figure 8. Prediction accuracies for yield and yield components as well as for yield as predicted from a dissection model. On the left, marketable yield from components indicates the accuracy of prediction of marketable yield by the product of the predictions for number of fruits and individual fruit fresh weight. On the right, total yield from components 1 shows the accuracy for prediction of total yield from the product of the predictions for fruit fresh-dry weight ratio and total fruit dry weight, while total yield from components 2 gives the prediction of total yield from the product of the predictions for fruit fresh-dry weight ratio, harvest index and total biomass. See also Supporting Information-Fig. S3.