A generator of morphological clones for plant species

Detailed and realistic tree form generators have numerous applications in ecology and forestry. Here, we present an algorithm for generating morphological tree “clones” based on the detailed reconstruction of the laser scanning data, statistical measure of similarity, and a plant growth algorithm with simple stochastic rules. The algorithm is designed to produce tree forms, i.e. morphological clones, similar as a whole (coarse-grain scale), but varying in minute details of organization (fine-grain scale). We present a general procedure for obtaining these morphological clones. Although we opted for certain choices in our algorithm, its various parts may vary depending on the application. Namely, we have shown that specific multi-purpose procedural stochastic growth model can be algorithmically adjusted to produce the morphological clones replicated from the target experimentally measured tree. For this, we have developed a statistical measure of similarity (structural distance) between any given pair of trees, which allows for the comprehensive comparing of the tree morphologies in question by means of empirical distributions describing geometrical and topological features of a tree. Our algorithm can be used in variety of applications and contexts for exploration of the morphological potential of the growth models, arising in all sectors of plant science research. Summary Statement We present an algorithmic framework, based on the Bayesian inference, for generating morphological tree clones using a combination of stochastic growth models and experimentally derived tree structures.

for certain choices in our algorithm, its various parts may vary depending on the application. Namely, 23 we have shown that specific multi-purpose procedural stochastic growth model can be algorithmically 24 adjusted to produce the morphological clones replicated from the target experimentally measured tree. 25 For this, we have developed a statistical measure of similarity (structural distance) between any given 26 pair of trees, which allows for the comprehensive comparing of the tree morphologies in question by 27 means of empirical distributions describing geometrical and topological features of a tree. Our 28 algorithm can be used in variety of applications and contexts for exploration of the morphological 29 potential of the growth models, arising in all sectors of plant science research.

I. Introduction
The most recent advances in laser scanning techniques allow for fast and non-destructive measurement 68 of trees with subsequent reconstruction of various characteristics depending on application (e.g. 69 (Rosell et al., 2009; Van Leeuwen and Nieuwenhuis, 2010)). Most of such studies dedicated to 70 reconstruction of 3D point clouds obtained from laser scanning measurements deal with overall 71 characteristics, such as height, width, and volume of stems/crowns, leaf index, biomass etc., 72 resembling traditional destructive methods of measurement (Rosell et al., 2009;Rutzinger et al., 2010). 73 However, the detailed precise geometrical and topological reconstruction with the preserved tree 74 architecture as is, is rarely sought after. 75

76
In this work, we use a fast, precise, automatic, and comprehensive reconstruction algorithm initially 77 presented in (Raumonen et al., 2013) and further developed and tested in (Calders et al., 2015). The 78 algorithm reliably reconstructs a quantitative structure model (QSM), which contains all geometrical 79 and topological characteristics of the object tree. Input for the method is the 3D point cloud, 80 sufficiently covering the tree, obtained from the terrestrial laser scanning measurements (TLS)  In this work, we utilize an inverse iterative procedure to optimize model's parameters as to match the 92 (empirical) distribution of structural features of the simulated stochastic tree models (FSPM,graphical 93 or other) to that of the tree reconstructed from the laser scanning data. Meanwhile, we formulate a 94 measure of similarity of the tree structures grounded in tomographic analysis of the structural 95 distributions (e.g. Radon transform) (Kaasalainen, 2008;Bracewell, 1990). Finally, the optimal 96 parameter set produces morphological "clone" trees with similar overall structure, but varying minute 97 details of organization. 98 99 7 asymmetry the data sets for the first order branches have a modular structure: large scaled trunk-like 185 branches along with the smaller ones. Third, we observe that the overall shape of the subject QSM can 186 be approximated by the branches of the topological orders w ≤ 2 as it can be seen from Fig. 1B. 187 Namely, with orders w = 0 and 1 the shape of the tree seems to be underrepresented (mainly due to the 188 shape asymmetry), while with orders w = 0, 1, 2, and 3 the smaller twigs just fill in the spatial gaps 189 between the major branches. This makes the analysis and form fitting a more complex task as 190 compared with the tree shapes resulting from the growth with strong apical dominance (e.g. pine trees; 191 see (Potapov et al., 2016)). Another point to consider is the underlying statistical properties. For example, it is impossible to draw 199 any branch statistics from the single instance of the trunk (w = 0), while there are plenty of samples for 200 the higher order branches. Given that the overall shape is mainly governed by the lower order 201 branches, one must compromise between the main, shape determining branches with lower abundance 202 and less important, but numerous, higher order branches (Fig. 2D).

204
Finally, the branch-related (B, see Materials and methods for the notation) data sets do not provide 205 sufficient information for the width of the branches and their curvature in space. Moreover, although 206 the B set has some information on the width (R f , L t ), it is less abundant than the similar and more 207 detailed information contained in the segment-related (S, see Materials and methods) data sets. 208 However, the B data set has information on the structure of the emanating pattern of a branch, that is, 209 the spatial location of its lateral buds/branching points (L a ), and its angular properties, which, in turn, 210 can be substituted with the biologically plausible growth algorithm. 211 212 Therefore, we begin our analysis with S 0,1 data sets as w = 0, 1 branches represent the main structural 213 frame of the tree: without its valid approximation the whole tree cannot be considered fitted. distributions U for these runs are S 0,1 . Note that this exercise serves a basic exploration of the model's 222 behavior, which can be (partially) replaced, for example, by the expert guesses for the parameter 223 values or some calibration process (if the model is designed for specific purposes and/or species). 224 225 Second, based on these preliminary results we determine the most influential parameters for each of 226 the group and combine them in a single optimization set up. Several independent optimization runs 227 were taken in order to determine the most influential parameters. For example, we found that the 228 angular properties vary the least among these runs, whereas the apical dominance requires subtler 229 adjustments (as can be understood from the complex structure of the target QSM). 230 231 Low order topological adjustment of the shape 232 233 After these initial manipulations, we obtained a model with 11 parameters and good fit of the trunk and 234 first order branches ( Fig. 3C; d h = 0.05, d g = 0.42, d c = 0.57). However, the overall form of the 235 resulting minimal score tree does not resemble the target QSM due to its rosette-shape (Fig. 3A, B). A 236 closer look at the tree reveals that the higher order branches (w > 1) are mainly responsible for the 237 formation of the rosette-shape of the tree, i.e. the orders which were not subject to the optimization 238 ( Fig. 3). This example demonstrates the contribution of the higher order branches to the overall tree 239 shape, which suggests using the scatters of these orders in further optimization steps. Moreover, the 240 branch-related features, such as the angular properties of branches of order w > 1, were not captured 241 well (Fig. 3E), although similar order segment-related features show right stochastic tendencies ( The increase in number of the structural feature tables is coupled with the increase in number of 254 distinct distance values. Although the optimization of the mean distance value hinders the 255 improvement for each target table, the low order as well as high order branches need to be fitted to the 256 corresponding branches of the target QSM as we have shown above (Fig. 3). To reduce the number of 257 distinct feature tables for the optimization we further utilize the merged data sets resulting in two joint 258 S and B tables for all topological orders (see Materials and methods). 259 260 Thus, we opted for S 0,1 and B 2,3,4 merged data sets in the next run of optimization to account for the 261 higher order branch variability (Fig. 4, d  Due to the highly discrete and stochastic nature of the tree growth, the structural distance hyper-282 surface in the space of the parameters is extremely abrupt (Fig. 5A). Hence, finding the global minima 283 of such surface is not a trivial task (the classical smooth function optimizers are not suitable in this 284 case, while stochastic discrete optimizers, like the genetic algorithm, seem to be more appropriate). 285 Moreover, the hyper-surface itself is a stochastic entity changing every time the new sample of random 286 numbers is used for a particular SSM growth realization. Therefore, any best-fit SSM is the best for a 287 particular realization of this stochastic process: one needs to study variability of the tree shape and the 288 chances are that other SSM growth realization can produce a lower distance value (Fig. 5B). We call 289 these many realizations of the SSM growth morphological tree clones. 290 The structural distance profile depends not only on the parameters of the SSM, but the choice of the 304 structural data sets. For example, in Fig. 5B and C the median distance profile is depicted given U = 305 {S 0,1 , B 2,3,4 } (red line) and U = S 0,1 (blue line). In the given parameter span the latter seems to be more 306 flattened and lifted compared to the former. The addition of the B 2,3,4 data set might be seen as a 307 perturbation to the distance profile changing the landscape properties (like minima). In our simulations 308 we maintain the global parameter boundaries, which allows for the search within the full available 309 space. However, we sequentially improve the model characteristics by perturbing the system, i.e. 310 changing the parameters, their intervals, and the U data sets to address problematic parts of the SSM 311 such that at every next optimization run the genetic algorithm is instructed to search around the 312 previous best point using the initial ranges (see Materials and methods). 313 314 Given the considerations above about the nature of the structural distance hyper-surface, the further 315 study of the morphological clones is needed. Specifically, the variability and plausibility of the clonal 316 shapes need to be addressed. For example, the clones must be further selected as to produce realistic 317 tree shapes (especially, when the general purpose SSM is used, like in this study), although we could clones is to be calibrated, for instance, by the analysis of the natural/QSM clonal individuals. 320 321 Morphological tree clones 322 323 The quintessence of our work is the generation of the morphological clones. In our pipeline, this 324 occupies the last stage (see Fig. 1 We demonstrate visualization of six clones for three distinct cases in Fig. 6. One can see the fine-grain 340 variation in the structure in each panel of the figure, although the overall (coarse-grain) structure is 341 preserved and presumably captures that of the target maple QSM (Fig. 2). The three models are: the 342 one found during the optimization process (Fig. 6A), the one minimizing the sample median distance 343 profile for D S (U = {S 0,1 , B 2,3,4 }) shown in Fig. 5B and one minimizing the sample median profile D S (U 344 = S 0,1 ) from Fig. 5C. 345 346 Out of 100 simulated clones for each case, we can see that the best-fit SSM obtained directly as the 347 optimization outcome (Fig. 6A) produces larger proportion of individual trees exhibiting the three 348 standard allometric measures closer to those of QSM (Fig. 6D). However, we argue that such simple 349 description of a tree as using the allometric measures cannot be exhaustive enough to capture both the 350 overall structure and its fine details. 351

352
The height statistics have the largest variability but by the visual inspection of the drawn clones in Fig.  353 6 one can see that this variability does not exert significant alterations of the Z axis span and the trees 354 seem to have even heights. Perhaps, the way we calculate the height of a tree produces such large 355 deviations in each particular case, which makes it a non-robust estimator. 356 357 Similarly, the girth estimation, although being captured decently, produces large errors d g , which 358 seems to be a result of variation in its linear dimensions (Fig. 6D). The girth dimension spans a small 359 proportion of the dimension of the whole tree: from several to tens of centimeters compared to meters 360 of the whole tree. This makes the girth specific error look gigantic (exceeding in some cases 100%) 361 and thus non-robust as well. 362

363
The crown spread measure shows significant variation (Fig. 6D). We believe that this takes place due 364 to the environment of the real tree the QSM was reconstructed from, which was not modeled 365 appropriately in the SSM. Namely, the environmental effects (positions relative to the sun, as the tree 366 grows in the Northern country, animals, winds, neighboring trees etc.) might cause systematic 367 influences exerted on the shape of the QSM tree. These influences were not accounted for in the SSM, In this work, we described an algorithmic pipeline aimed at producing stochastic structural replicas, or 395 morphological "clones", of trees from a QSM tree (data from TLS reconstruction) and a 396 complimentary SSM tree (analytical/procedural growth model). The pipeline is based on an iterative 397 minimization of a distance between morphological structures. The distance is based on construction of 398 the structural data sets of the tree morphologies and subsequent measure of their discrepancy using the 399 ideas of distribution tomography analysis. The resulting best-fit morphological clones are statistically 400 similar which is expressed in overall similarity of their form (coarse-grain), but, nevertheless, 401 difference in fine details of structural organization (fine-grain). In this work, we use the reconstructed QSM of a maple tree (Fig. 2). The tree was measured in leaf-off 461 conditions and our system consisted of a phase-based terrestrial laser scanner (Leica HDS6100 with a covariates (e.g. radii, lengths, angles, tapering function of a branch etc.). On the one hand, one needs to 536 exhaustively describe morphology of the tree using various geometrical and topological features. On 537 the other hand, as the number of compared data sets {U m ,U d } grows the efficiency of the optimization 538 routine decreases, since the number of distance measures to be minimized grows correspondingly (one 539 distance value for each pair {U m ,U d }). Thus, one needs more compact representation of the data. One solution is to use bigger data sets with all possibly needed (for a given application) features. (Another 541 solution is to use multi-objective optimization routines finding, e.g. Pareto front, though we do not 542 employ such an approach in this work.) Therefore, we use larger tables of all measured features; hence, 543 one table represents a data set. However, we are unable to merge segment-and branch-related features 544 into a single table as these differ in dimension (Table 1). Thus, we usually compare the array of pairs 545 {U m ,U d }, having as a result the array of distance values, but with such larger table representation we 546 have smaller size of these arrays. 547 548 Branch-and segment-related data are described in Table 1 and Fig. 7. Throughout the manuscript we 549 maintain the notations B w and S w for the branch and segment-related data sets of the (Gravelius) order 550 w, respectively. The zero order w is assigned to the trunk (a branch connecting the tree with the 551 ground). At the branching points, the lateral buds give rise to branches with order w+1, where w is the 552 order of the parent branch, while the apical buds continue the branch of the same order. 553  γ, degree Angle between horizontal projections of the segment and its parent.
ζ, degree Angle between vertical projections of the segment and its parent. These features are not exhaustive and can be augmented at will, but we found this set sufficient for 560 obtaining realistic tree shape outcomes. Representation of the data sets in the form of big branch and 561 segment related tables reduces the complexity of optimization process by reducing the number of 562 distance values to minimize. Additionally, such representation of the data allows for the fast extraction 563 of all required relations between covariates or scalar entities without having them as separate data sets. 564 565 In a simulated SSM structure the extraction of topological relations between branches is 566 straightforward as the user observes the whole process of growth: the lateral buds start the next order 567 and apical buds continue the current order. However, this is not the case with QSM since it is a time 568 snapshot of a tree form that does not retain the history of the tree growth. Thus, the reconstruction 569 algorithm requires other principles for extraction of topology. Although the reconstruction algorithm 570 defines a complicated procedure that outlines the topology of a tree, it could be roughly approximated 571 by the following rule: at branching points the thickest branch is the continuation of the same order w, 572 while thinner branches are lateral expansions of the order w + 1 (Raumonen et al., 2013). For the 573 species with weak apical dominance (shrubby trees) we maintain similar procedure when simulating 574 corresponding SSM (for the species with strong apical dominance, both techniques should converge to 575 the same result). 576 577 Finally, it is possible to merge the corresponding data sets of the same order, which results at 578 maximum in two large data sets of branch-and segment-related features, respectively. While this 579 simplifies the search of the distance minimum (max two values to minimize), this technique must be 580 used with care as in this case one heavily relies upon the growth rules of SSM. If these rules are not 581 based on biologically motivated rules, SSM can produce highly unrealistic tree forms as the "best-fit", 582 since there is a possibility to mix the features of different topological orders. For example, the 583 branches of higher order could be much thicker than those of the lower order, which is unrealistic and 584 naturally is taken care of in the biologically based growth algorithms (e.g. pipe model). Carlo method using low-discrepancy (quasi-/sub-random) sequences, which cover the given space 594 more evenly than uniformly generated sequences. The difference between the projected cumulative 595 distributions is further measured by the Kolmogorov-Smirnov statistic (any other can be used) and the 596 resulting distance between the two data sets U is an average of all statistics calculated from each of the 597 lines (see Fig. 8A). 598 599 In general, ∈ ! , in our case N = 4 (segment) or N = 5 (branch) as can be seen from Table 1 Traditional metrics (d x ). In order to provide a reference to traditional tree measurement systems, we 617 also calculate three main tree characteristics that are used for describing a tree shape (Frank, 2010). 618 Height is calculated as the highest point of a tree. Girth is calculated as the diameter of the ground 619 segment (the breast-height diameter is not appropriate for the shrubby trees). Crown spread is 620 calculated as follows. First, on XY-plane (top view, Fig. 8B) the set of spokes (red lines in Fig. 8B) 621 emanating from the center of a tree (the ground segment, green circle) is formed (here, we opted for 622 the spokes with azimuthal separation of 10 degrees). Then the length of each spoke is calculated as a distance from the tree center to the most distant point of the crown in the direction of the spoke (blue 624 circles). The crown spread is twice an average of all spokes of a tree. 625 626 Finally, when comparing two tree shapes we calculate the distances as follows: 627 In this, h d , g d , and c d are the height, girth, and crown spread of the QSM tree, respectively, whereas h m , 628