Tree height-growth trajectory estimation using uni-temporal UAV laser scanning data and deep learning

Information on tree height-growth dynamics is essential for optimizing forest management and wood procurement. Although methods to derive height-growth information from multi-temporal laser scanning data already exist, there is no method to derive such information from data acquired at a single point in time. Drone laser scanning (unmanned aerial vehicles, UAV-LS) allows for the efficient collection of very dense point clouds, creating new opportunities to measure tree and branch architecture. In this study, we examine whether it is possible to measure the vertical positions of branch whorls, which correspond to nodes and can thus be used to trace the height growth of individual trees. We propose a method to measure the vertical positions of whorls based on a single acquisition of UAV-LS data coupled with deep-learning techniques. First, single-tree point clouds were converted into 2D image projections, and a YOLOv5 (you-only-look-once) convolutional neural network was trained to detect whorls based on a sample of manually annotated images. Second, the trained whorl detector was applied to a set of 39 trees that were destructively sampled after the UAV-LS data acquisition. The detected whorls were then used to estimate tree-, plot- and stand-level height-growth trajectories. The results indicated that 70 per cent (i.e. precision) of the detected whorls were true whorls and that 63 per cent (i.e. recall) of the measured whorls were correctly detected. These results translated into an overall root-mean-squared error and bias of 8 and −5 cm for the estimated mean annual height increment. The method's performance was consistent throughout the height of the trees and independent of tree size. As a use case, we demonstrate the possibility of developing a height-age curve, such as those used for forecasting site productivity.
Overall, this study provides a proof of concept for new methods to analyse dense aerial point clouds based on image-based deep-learning techniques and demonstrates the potential for deriving useful analytics for forest management purposes at operationally relevant spatial scales.


Introduction
Periodic tree height-growth trajectories, or time-series of height growth, are extremely important forest metrics. Along with forest age, height-growth trajectories have traditionally been used to estimate site productivity through the definition of site index (Jones, 1969). In coniferous forests, given the direct link between yearly height growth and the amount of branches (i.e. knots), yearly height-growth trajectories can also provide insight into the quality of standing timber by estimating knottiness, an important aspect of wood quality (Pyörälä et al., 2018a). Furthermore, tree height-growth trajectories can provide data to study the effect of climate on forest growth in a way comparable with tree rings, thus providing insights into the past as well as a means to project future trends due to climate change (e.g. Appiah Mensah et al., 2021). Therefore, the measurement of periodic tree height growth is a subject of considerable importance in forest science, and methods to measure it both more widely and more efficiently can be considered enabling technologies for further research.
Time-series of tree height growth can traditionally be obtained in two ways, which we can broadly classify as destructive or non-destructive. Non-destructive methods rely on the repeated measurement of permanent sample plots (e.g. Manso et al., 2020), and destructive methods (e.g. Manso et al., 2021) rely on the felling and measuring of trees. These measurements are time consuming, requiring, for example, the counting of tree rings at different heights (e.g. Curtis, 1964) or, in the case of some conifers, the measurement of the internodal distance (Solberg et al., 2019) by splitting trees longitudinally. Remotely sensed data, and in particular airborne laser scanning (ALS), are becoming a more common non-destructive means of measuring forest height-growth and site productivity information at a large spatial scale. In recent years, solutions have been proposed to obtain site productivity or site index using multi-temporal ALS data (Noordermeer et al., 2018; Solberg et al., 2019; Socha et al., 2020; Guerra-Hernández et al., 2021). The drawback of these solutions is that they require at least two repeat measurements and that the time-series of height obtained are narrow in that they do not span a long period.
For many conifer species, the nodes are clearly visible as branch whorls, and this opens the possibility to obtain the internodal distance, and therefore height growth, by measuring the distance between whorls. Although this can be done destructively on felled trees (e.g. Manso et al., 2021), it is desirable to do it non-destructively on standing trees. In this respect, over the past two decades, several methods to derive tree-wise branching structure information based on terrestrial laser scanning (TLS) data have been proposed. These methods rely on the possibility to detect branches and to measure their vertical position (Gorte and Pfeifer, 2004; Côté et al., 2009; Raumonen et al., 2012; Gonzalez de Tanago et al., 2018; Lau et al., 2018; Pyörälä et al., 2018a, 2018b). Amongst the proposed methods, the most prominent is the so-called tree quantitative structure model (QSM; Raumonen et al., 2013), which converts unstructured point clouds into geometric primitives (most commonly cylinders) and derives topological information from these. Although these methods have proven to provide very detailed information on tree structures, for the large part these studies are conducted with broadleaves or uninodal conifers (e.g. Pinus sylvestris L.), and a great deal of time and care is required to produce point clouds of sufficient quality for this to work. For example, working in Boreal forest, Pyörälä et al. (2018a, 2018c) scanned groups of 3-6 Scots pine trees from 5 to 10 positions in order to obtain sufficient coverage to extract branch information. Under such labour-intensive conditions, they could detect whorls with an accuracy of 55 and 70 per cent when evaluated against log X-rays or visual identification in the point clouds, respectively.
In contrast to TLS, laser scanning data from unmanned aerial vehicles (UAV-LS) offer the possibility to bridge the gap between large-scale ALS and high-resolution TLS. The point clouds that can be obtained from UAV-LS enable us to overcome the logistical and small-area challenges intrinsic to TLS technology while still providing a highly detailed representation of single trees and tree structures such as stems and branches. However, although UAV-LS point clouds are very detailed and it is possible for the human eye to see the structural components of trees, such as stems and branches, they still lack the spatial resolution and positional accuracy required for QSM modelling (Brede et al., 2019). In such a realm, state-of-the-art deep-learning techniques may offer new possibilities to automate whorl detection from UAV-LS point clouds.
Deep-learning techniques have become commonplace for a broad range of applications due to their efficiency at solving complex problems in data-rich environments. Deep-learning applications related to the detection of tree structures in dense laser scanning point clouds can be subdivided into 3D (e.g. Krisanski et al., 2021) and 2D or image-based methods (e.g. Rehush et al., 2018). Although 3D methods may appear more suitable as they work directly at the point cloud level, their implementation in deep-learning frameworks, which are typically designed to work on gridded data, remains challenging due to the irregular, unstructured and unordered nature of point clouds (Bello et al., 2020; Guo et al., 2021) as well as the spatial invariance towards rotations and translations. In contrast, 2D deep-learning methods have been in use for a longer period of time, and thus offer a broader range of frameworks and a more advanced level of development. Within the plethora of available 2D convolutional neural network (CNN) frameworks for object detection tasks (i.e. bounding box detection), single-stage detectors (SSD) are very popular thanks to their unparalleled inference speed while still maintaining high accuracy. In this study, we opted for a you-only-look-once (YOLO) type of SSD model (Redmon et al., 2016), since these have been reported to attain similar precision while outperforming alternative object detection CNN models (e.g. faster region-based CNN) on inference speed (Du, 2018). Within the YOLO family, we opted for the latest version implemented in PyTorch, i.e. YOLO v5 (Jocher et al., 2020), due to its increased accuracy and speed over previous YOLO versions and its user-friendliness. Within the realms of forestry and wood science, YOLO v5 has been applied to solve tasks related to the detection of palms in an urban setting (Jintasuttisak et al., 2022), forest fires (Xu et al., 2021), and surface knot detection on sawn timber (Fang et al., 2021).
An advantage of applying object detection techniques to 2D projections of point cloud data is that they allow a virtually infinite number of projections of the same object to be produced from different view angles (Chen et al., 2017). In a forestry context, such a multi-view concept has two advantages: first, it allows the creation of a large training set from a relatively small number of trees; and second, it enables the detection of multiple objects (i.e. whorls) on the same tree from several angles, thus reducing issues of occlusion. In this study, we leveraged such a multi-view approach to boost the detectability of whorls based on multiple predictions for the same tree.
The specific objective of this study was to evaluate the potential of estimating tree height-growth rates for Norway spruce (Picea abies (L.) H. Karst) trees using uni-temporal, high-density laser scanning data coupled with deep-learning techniques. The intention was to detect whorls using automated, non-destructive processes in a way that would be equivalent to manually and destructively measuring their positions on felled trees. We hypothesized that a deep-learning based whorl detector trained on UAV-LS data would produce a height-age curve for mature Norway spruce forest similar to one derived from destructively measured internodal distances. As in a similar previous study (e.g. Pyörälä et al., 2018c), in order to focus entirely on the aspect of whorl detection, we deliberately avoided implementing single-tree detection and segmentation routines in the point clouds and instead segmented the sample trees manually. Ultimately, as proof of an intended end-use application, we tested the hypothesis by fitting a standard height model to the automatically detected whorls and compared it with a height model based on the manually measured whorls.

Study area description and field data collection
The study areas were located in Oslo municipality (250 m a.s.l., 60.01°N, 10.46°E) and consisted of two neighbouring forest stands of 1 ha (stand 1) and 1.9 ha (stand 2), respectively. The two stands were mature forests dominated by Norway spruce, constituting 85 per cent of the standing volume. The spruce trees were interspersed with Scots pine and deciduous species (mainly Betula pendula Roth. and Fraxinus excelsior L.).
An initial field campaign was conducted during November 2020 to perform traditional field plot measurement for a systematic sample of 10 circular plots (250 m²) per stand, with varying grid spacing depending on the stand size. We determined the tree species within these plots and measured the diameter at breast height (DBH) for all trees falling within the plot boundary using a Mantax caliper (Haglöf, 2021a). Total tree height (H) was directly derived from the drone laser scanning data (see Section 2.2). To ensure a link between the field-measured trees and the drone laser scanning data, we measured the distance (i.e. slope adjusted) and the azimuth angle for each tree stem from the plot centre using a combination of a Vertex 5 (Haglöf, 2021b) and a Suunto KB-14 compass (Suunto, 2021). The coordinates of plot centres were recorded using a Topcon GR5 GPS (Topcon, 2021), with a logging time of at least 30 min for each plot. The GPS data were post-processed using correction data from a local base station from the Norwegian mapping authority. For all spruce trees, the DBH and H were used to compute single-tree volume using existing allometric equations for Norway spruce (Vestjordet, 1967, 1968; Fitje and Vestjordet, 1977).
Within the measured plots, we selected a sub-sample of 41 spruce trees for destructive sampling, hereafter referred to as 'destructively sampled trees'. The destructively sampled trees were selected based on a stratified random sampling design, where the strata were different tree volume classes, thus ensuring coverage of the range of tree sizes in the area. The destructively sampled trees were marked at ∼2 m above ground with two parallel stripes to ensure their visibility to the harvester operator. To correct subsequent destructive measurements, we also marked the height corresponding to breast height on the north and south sides of the tree.
The stands were mechanically harvested in January-February 2021. The harvester felled the destructively sampled trees without further processing (i.e. no delimbing or cross-cutting). The branches were manually removed to provide ease of access, leaving 10-15 cm pegs on the stem to keep the whorls visible. Starting at breast height and working towards the tree apex, we measured the distance of each visible whorl from the cut base of the stem using a measuring tape fixed at the base. For eight trees, the tree-tops broke off during felling and were lost in snow and harvesting debris, and thus the measurements were performed as far as practicable. Furthermore, we also measured the distance between breast height and the cut base of the tree to correct the absolute whorl height measurements for the stump height.
Because the two stands were geographically close and structurally similar, they were pooled in all further analyses.

UAV laser scanning data
UAV laser scanning data were collected using a Staaker FX8HL multirotor (Nordic Unmanned, 2021) equipped with a Phoenix MiniRanger (Phoenix Lidar Systems, 2021), composed of a Riegl MiniVUX-1UAV (Riegl, 2021), a KVH FOG 1725 inertial measurement unit (KVH, 2021) and a Novatel GPS 702GG_1.03 (Novatel, 2021). The system was combined with a base station placed above a known point measured with cm accuracy.
The UAV-LS data were collected on 29 September 2020. Given the requirement of collecting at least 5000 points m−2, the laser scanning data collection was planned based on a double cross-hatch pattern, at a flight altitude of 60 m above ground and with a flight-line spacing of ∼6 m, resulting in a lateral overlap of the flight lines of ∼90 per cent for a scan angle of ±70° from nadir. In the flights over stand 2, additional flight lines were added to ensure a sufficiently high point density. The laser sensor was set to scan with a pulse repetition rate of 100 kHz and with a maximum of five returns per pulse. Based on the described acquisition parameters, the final point clouds had a density of 6681-12 925 points m−2 and an average of 9529 points m−2. Between the two stands, there was a difference in average point cloud density of 2515 points m−2, with the second stand having the higher point density.
The raw data were pre-processed based on the following pipeline: (1) GNSS/INS trajectory processing; (2) point cloud generation; (3) altitude transformation of trajectories and point cloud to the Norwegian reference ellipsoid (NN2000); (4) trajectory and point cloud matching and smoothing; (5) automatic ground classification; (6) flight-line matching and data merging; and (7) automatic ground classification of the entire surface.

Manual single tree segmentation
In order to explore the complete information available in single-tree point clouds, all 575 trees within the measured plots were manually segmented as part of a broader benchmarking effort (Puliti et al., 2021). Trees located outside the plot boundaries (based on the xy-coordinates of the centre of the stem at breast height) were not segmented. The resulting data can be considered the output of a highly effective segmentation routine, and they allow the full potential of branch detection algorithms to be explored. A single annotator carefully performed the manual segmentation, and all trees were double-checked for quality.

Methods
The developed methodology was structured into the following six main steps (see Figure 2): (1) manual segmentation of the point cloud; (2) generation of 2D projections from the single trees; (3) manual annotation of a sample of images; (4) training of the whorl detector; (5) prediction of whorl positions on destructively sampled trees and post-processing of the detections; and (6) estimation of the height-growth trajectory. The sections below describe the abovementioned steps in detail.

Conversion of point clouds to 2D image projections
Because YOLO v5 works on images and not directly on point clouds, images representing the projected point cloud were created for each tree by projecting the points onto different planes (see Figure 2). Individual tree point clouds were first split along the Z-axis into 10-m intervals, hereafter referred to as vertical partitions. Second, we sliced the vertically partitioned point cloud into four radial slices along the E-W (0°), SW-NE (45°), N-S (90°) and SE-NW (135°) planes. The planes were defined according to the primary axis of the tree. Slice thickness depended on the relative height of the vertical partition within the tree, decreasing linearly from the bottom to the top; the thickness of the slices at the top and the bottom was always 0.3 and 3 m, respectively. This varying slice thickness was adopted to reduce whorl occlusion from the foliage in the dense upper crown while ensuring that enough returns were captured in the lower part of the crown. Each of the planes was defined by rotating the point cloud around a pivot point defined as the x and y coordinates of the tree-top. Finally, each radial slice within each vertical partition was rendered as a jpeg image, where the grey values corresponded to the presence of returns. In addition, we rendered the points with an alpha value of 0.05 to ensure better visibility of clusters of returns.
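The partitioning, slicing and rasterization steps above can be sketched in a few lines of code. The sketch below is a minimal, hypothetical Python/NumPy reimplementation (the study's own pointcloud2images.R script is the authoritative version); the function and parameter names are illustrative, and the rendering is reduced to a 2D return-count histogram rather than alpha-blended jpeg output.

```python
import numpy as np

def radial_slice_image(points, angle_deg, rel_height, n_px=640,
                       thick_top=0.3, thick_bottom=3.0):
    """Project one radial slice of a single-tree point cloud to a grey image.

    points: (N, 3) array of x, y, z coordinates (metres).
    angle_deg: orientation of the slicing plane (0 = E-W, 45 = SW-NE, ...).
    rel_height: relative height of the vertical partition in [0, 1], used to
        interpolate slice thickness linearly from thick_bottom (base) to
        thick_top (apex), as described in the text.
    """
    # Pivot: x, y of the tree-top (highest return), as in the paper.
    top = points[np.argmax(points[:, 2])]
    xy = points[:, :2] - top[:2]

    # Rotate the cloud so the requested plane aligns with the x-axis.
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    xy = xy @ rot.T

    # Keep only points within the height-dependent slice thickness.
    thickness = thick_bottom + rel_height * (thick_top - thick_bottom)
    keep = np.abs(xy[:, 1]) <= thickness / 2.0
    u, z = xy[keep, 0], points[keep, 2]

    # Rasterise: accumulate returns into a grey-value grid (denser regions
    # darker), mimicking the low-alpha point rendering of the jpeg images.
    img, _, _ = np.histogram2d(u, z, bins=n_px)
    return img
```

In practice the function would be called once per radial plane (0, 45, 90, 135 degrees) within each 10-m vertical partition of a tree.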
As a result of the processing steps described, each individual-tree point cloud was split into a set of images, i.e. four in the case of trees shorter than 10 m (4 radial slices × 1 vertical partition = 4) and 16 for trees taller than 30 m (4 radial slices × 4 vertical partitions = 16). In addition, all metadata related to the tree's unique identifier and the horizontal and vertical geographical coordinates (in meters) of the image corners were stored in the filename for later reprojection of the YOLO v5 output into geographical space with a correct height reference system.
The described method to convert single tree point clouds to images is documented in the supplementary materials (see pointcloud2images.R script).

Manual whorl annotation
A sample of 141 jpeg images from the non-destructively sampled trees, i.e. trees that were segmented but not felled, was manually annotated using LabelImg (Tzuta Lin, 2015). The annotation consisted of drawing bounding boxes around the visible whorls. The number of annotated images was determined based on the available resources for this research. The images were selected purposively to ensure a broad range of variation in tree sizes and to cover different tree portions (tree-top, middle and bottom). In addition, the images were selected independently from the tree identifiers, meaning that not all images from a single tree were annotated. This selection was made to avoid redundancy in the data and limit the time required for annotation. In total, 1944 whorls were annotated within the selected images. The annotated images were then randomly split into training and validation sets, with the former holding 70 per cent of the images and the latter 30 per cent.

Training whorl detector
To train the whorl detector, we used a YOLO v5 model, an SSD object detector composed of a backbone, a neck and a head. The backbone is based on Cross Stage Partial networks and efficiently extracts image features from an input image. The neck relies on a Path Aggregation Network (Liu et al., 2018) and is used to generate feature pyramids that help the model generalize well on object scaling (e.g. large vs. small whorls) and thus transfer well to unseen data. The head performs the final detection, which mainly consists of applying anchor boxes to the features and generating output vectors with associated class confidence, objectness score and bounding boxes. A leaky rectified linear unit (Xu et al., 2015) was used as the activation function in the hidden layers, while a sigmoid function was adopted in the final detection layer. The default YOLO v5 stochastic gradient descent optimizer (Ruder, 2016) was used. The loss function was computed for the following scores: objectness (i.e. a measure of the probability that an object exists in a proposed region of interest), class probability and bounding-box regression. The first two losses are computed using the binary cross-entropy with logits loss function from PyTorch, and the compound loss is then computed from the three loss components.
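For intuition, the objectness and class-probability terms of the compound loss reduce to binary cross-entropy evaluated on raw logits. Below is a minimal, dependency-light sketch of the numerically stable form that PyTorch's BCEWithLogitsLoss computes; it is an illustration of the loss term, not the YOLO v5 implementation itself.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits, averaged over
    elements. Equivalent in form to max(z, 0) - z*y + log(1 + exp(-|z|)),
    which avoids overflow for large |z|."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    loss = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(np.mean(loss))
```

A confident correct prediction (large positive logit, target 1) yields a loss near zero, while an uninformative logit of 0 yields log(2).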
Among the available YOLO v5 architectures, we selected the model with the most parameters (i.e. YOLO v5x) due to its documented better performance compared with the smaller architectures. Model training leveraged transfer learning from weights pretrained for 100 epochs on the Common Objects in Context dataset (Lin et al., 2014).
The model was trained using the training set of images (see Section 3.1.2) for a maximum of 1000 epochs with the default patience threshold of 100, meaning that training stops after 100 consecutive epochs without improvement. The image resolution was set to 640 × 640 pixels, and the batch size was 32 (the maximum allowed by the hardware used). Model training was visually evaluated based on the training and validation loss curves throughout all trained epochs.
All computations were performed on a virtual machine hosted within the Norwegian Institute of Bioeconomy Research (NIBIO) high-performance computing cluster and based on the Huawei 228H V5 host. The virtual machine ran Ubuntu 20.04 long-term support with 64 GB RAM, a 1 TB NVMe (flash) disk, eight CPU cores and an NVIDIA V100 GPU with 32 GB of memory.

Whorl detection on destructively sampled trees
The jpeg images from the destructively sampled trees were then fed to the trained whorl detector for prediction (see Use.R script in supplementary materials). Detected whorls with a probability lower than 0.1 were discarded, as visual inspection revealed these to be wrongly detected whorls. The prediction results consisted of a text file per image with the image coordinates of each detected whorl (i.e. bounding box) and the corresponding probability score. The YOLO output format includes the box centre coordinates (x and y) and the width and height of the box as fractions of the pixel width and height of the image, with the origin in the upper-left corner. These coordinates were converted to xmin, xmax, ymin and ymax in meters using the following image metadata: xmin = 0; xmax = meters from xmin; ymin = minimum height above ground; ymax = maximum height above ground. Note that the y-axis in the YOLO analysis represents the z-axis in the point cloud (see yoloPredictions2UTM.R script in supplementary materials). Finally, for each tree, we aggregated the bounding-box coordinates for all radial slices and vertical partitions and merged all the bounding boxes into a single-tree file for post-processing.
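The conversion from YOLO's normalized, top-left-origin box format to metric coordinates can be sketched as follows. This is a hypothetical Python equivalent of the yoloPredictions2UTM.R step; variable names are illustrative, and note the flip of the vertical axis, since YOLO counts y downwards from the top of the image while tree height is counted upwards from the ground.

```python
def yolo_box_to_metres(xc, yc, w, h, x_extent, z_min, z_max):
    """Convert one YOLO-format box (normalised centre x/y, width, height,
    origin in the upper-left corner) to metric coordinates.

    x_extent: horizontal extent of the image footprint in metres (xmin = 0).
    z_min, z_max: height above ground of the image bottom and top (metres).
    Returns (xmin, xmax, zmin, zmax); the image y-axis maps to the point
    cloud z-axis, flipped because YOLO's origin is at the top.
    """
    xmin = (xc - w / 2.0) * x_extent
    xmax = (xc + w / 2.0) * x_extent
    z_extent = z_max - z_min
    zmax = z_max - (yc - h / 2.0) * z_extent  # top edge of the box
    zmin = z_max - (yc + h / 2.0) * z_extent  # bottom edge of the box
    return xmin, xmax, zmin, zmax
```

For example, a box centred in a 10 m wide image spanning 0-20 m above ground, with normalized width 0.2 and height 0.1, maps to roughly x = 4-6 m and z = 9-11 m.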
To discard redundant or duplicated detections, whorl predictions on a single tree from different images were post-processed as follows (see predictions postProcessing.R script in supplementary materials). First, bounding-box coordinates and associated probability scores from all images of an individual tree were merged into a single table, and the bounding boxes were ranked from highest to lowest probability. Starting from the bounding box with the highest probability score, we located all other boxes overlapping it. Next, all overlapping boxes that exceeded an Intersection over Union (IoU) threshold value were discarded from the ranked table, and the process continued in the same way with the next bounding box in the ranking. A value of IoU = 0.001 was used for all post-processing, meaning that bounding boxes overlapping even slightly were identified and dropped.
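This duplicate-removal procedure is essentially greedy non-maximum suppression with a near-zero IoU threshold. The sketch below is a simplified Python illustration that operates on the vertical (z) extent only, since that is the dimension that matters for whorl positions; the authors' predictions postProcessing.R works on the full 2D boxes, so this is an assumption-laden approximation, not their code.

```python
def deduplicate(boxes, iou_thresh=0.001):
    """Greedy removal of duplicate whorl detections merged from several
    views. boxes: list of dicts with keys 'zmin', 'zmax', 'score'.
    Higher-probability boxes are kept; any box overlapping a kept box by
    more than iou_thresh is dropped (with the paper's 0.001 threshold,
    effectively any overlap at all)."""
    def iou(a, b):
        inter = max(0.0, min(a['zmax'], b['zmax']) - max(a['zmin'], b['zmin']))
        union = (a['zmax'] - a['zmin']) + (b['zmax'] - b['zmin']) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    # Rank from highest to lowest probability, as described in the text.
    for box in sorted(boxes, key=lambda b: b['score'], reverse=True):
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```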

Evaluation of whorl detection
We performed a two-fold evaluation of the whorl detections. In the first step, we compared the raw output from the YOLO whorl detector with the manually annotated validation data (see Section 3.1.2), whereas in the second step, we compared the YOLO predictions with additional post-processing to the field measurements done on the destructively sampled trees. In both cases, using the counts of true positives (TP, the whorl was detected), false positives (FP, a whorl was detected where it did not exist) and false negatives (FN, the whorl was not detected), we reported precision (P), recall (R) and F1 score (F1) based on the following equations: P = TP/(TP + FP); R = TP/(TP + FN); F1 = 2 × P × R/(P + R). In the first validation step, TP and FP were defined as bounding boxes with IoU ≥ 0.6 and IoU < 0.6, respectively. In the second validation step, a TP was defined as a detected whorl within 20 cm (i.e. along the Z-axis) of a field-measured whorl, and conversely, an FP was a detected whorl that did not have any field-measured whorl within 20 cm. In both validation steps, an FN was a whorl measured in the field but not detected by the whorl detector based on the abovementioned criteria.
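The second validation step can be sketched as a greedy matching between detected and field-measured whorl heights with the 20 cm tolerance. The Python illustration below is hypothetical: the exact matching rule is not spelled out in the text, so here we assume each measured whorl can absorb at most one detection.

```python
def detection_scores(measured_z, detected_z, tol=0.20):
    """Precision, recall and F1 for detected whorl heights against field
    measurements, using the paper's 20 cm vertical tolerance. Assumption:
    greedy one-to-one matching, nearest unmatched measured whorl first."""
    unmatched = sorted(measured_z)
    tp = 0
    for d in sorted(detected_z):
        hit = next((m for m in unmatched if abs(m - d) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)  # each measured whorl matches once
            tp += 1
    fp = len(detected_z) - tp   # detections with no field whorl nearby
    fn = len(measured_z) - tp   # field whorls never detected
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```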

Measuring and modelling tree growth
To test the real-world applicability of the whorl detection method, we sought to compare the tree height-growth increment from the detected data with that from the measured data. Furthermore, we aimed to illustrate how the detected whorls could be used to construct a simple height-growth model without relying on pre-existing forest growth simulators. Therefore, mean growth increment was determined for each tree and each height quartile in each tree. Height quartiles split each tree into four sections based on relative heights and were used to examine performance at different relative heights in the trees. Mean growth increment is defined as the ratio of the total growth in the appropriate section (or the whole tree) to the number of whorls in that section (or the whole tree), expressed in m year−1. To quantify the error in growth increment, we calculated root-mean-squared error (RMSE) and bias according to the standard formulae: RMSE = √((1/N) Σ (xi − x̂i)²) and bias = (1/N) Σ (xi − x̂i), where N is the number of observations and xi and x̂i are the measured and detected increments for observation i.
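These error measures translate directly into code. A small NumPy sketch of the RMSE and bias formulae as defined above; note the sign convention, differences are measured minus detected, so overestimation of the increment by the detector yields a negative bias, matching the paper's interpretation.

```python
import numpy as np

def rmse_bias(measured, detected):
    """RMSE and bias of detected vs measured mean annual height increments,
    following the standard formulae in the text. Differences are taken as
    measured minus detected, so detected > measured gives negative bias."""
    d = np.asarray(measured, dtype=float) - np.asarray(detected, dtype=float)
    return float(np.sqrt(np.mean(d ** 2))), float(np.mean(d))
```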
As an intended use case example, we used a Chapman-Richards (Pienaar and Turnbull, 1973) function to illustrate how the data could be used to model the height growth of the forest. The function is described by the following formula: H = 1.3 + α1(1 − e^(−α2 t))^α3 + ε, where H is the height of the forest at time t, ε is the residual error, and α1, α2 and α3 are parameters estimated from the data. The constant of 1.3 is used to consider only the height and time above breast height, as is standard practice in Norway. We used non-linear least squares (R Core Team, 2021) to separately estimate the α parameters in the measured and detected data. Height is the height of the measured or detected whorls, and the whorl number from breast height up was used as time in years. In the detected data, all whorls were used irrespective of whether they were correctly or incorrectly detected. Before fitting the model, both the measured and detected whorl data were right-censored at time (or whorl number) 52 because this was the maximum value at which >50 per cent of the trees had an observed whorl. This was done to ensure that the data used to fit the model were reasonably balanced with respect to age and to prevent relatively high leverage on the regression by individual points at older ages. Model evaluation was performed by examining the coefficient of determination (R², equation (7)) and RMSE (equation (4), with xi and x̂i substituted by yi and ŷi, see below). The model derived from the detected data was additionally tested for its ability to predict the expected height of the observed whorls by using the observed whorl number as input, and both the measured and detected models were tested against the full, uncensored, measured data set.
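The height-age model fit can be reproduced with non-linear least squares in any environment; the paper used R's nls, and the sketch below is an equivalent SciPy substitute with illustrative starting values (the p0 defaults are an assumption, not values from the paper).

```python
import numpy as np
from scipy.optimize import curve_fit

def chapman_richards(t, a1, a2, a3):
    """Expected height (m) at t years above breast height; the 1.3 m
    constant restricts the curve to growth above breast height, as is
    standard practice in Norway."""
    return 1.3 + a1 * (1.0 - np.exp(-a2 * t)) ** a3

def fit_height_age(t, h, p0=(30.0, 0.05, 1.5)):
    """Estimate (a1, a2, a3) by non-linear least squares.
    t: whorl number from breast height (years); h: whorl height (m)."""
    params, _ = curve_fit(chapman_richards, np.asarray(t, float),
                          np.asarray(h, float), p0=p0, maxfev=10000)
    return params
```

Fitting the same function separately to the measured and the detected whorl series, and comparing the resulting parameters, reproduces the comparison made in the paper.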
The coefficient of determination was computed as R² = 1 − Σ (yi − ŷi)²/Σ (yi − ȳ)², where yi is the observed (measured or detected, as appropriate) height of whorl i, ŷi is the predicted value for that observation and ȳ is the mean of the observations.

Evaluation of whorl detector
The first step of the whorl detector accuracy assessment was the comparison of the raw output of YOLO v5 (see Figures 1-4 in supplementary materials) against the validation set, which was composed of 43 manually annotated images (609 whorls). Considering all detected whorls with a probability >0.1, the results for precision, recall and F1 score were 0.55, 0.40 and 0.46, respectively. In this case, we selected whorls with a probability >0.1 in order to retain whorls on the tree-tops that, despite being correctly detected, had low probability.

Figure 1 Geographical overview of the study area with the location of the sample plots and destructively sampled trees. Stand 2 is immediately adjacent to Stand 1 on the northeast side.
The second step was to evaluate performance after post-processing, when the whorl detections aggregated over all radial slices were compared with the whorls measured on the felled trees. Overall, for all trees and all portions of the trees, we obtained precision, recall and F1 score values of 0.70, 0.63 and 0.66, respectively (see Table 1). When split into different height quartiles, the F1 score increased from the tree's base (F1 score = 0.64) to the tree-top (F1 score = 0.70). The higher F1 score at the top of the tree was driven by the increased precision, as the recall was similar to that of the lower sections. Furthermore, the top part of the tree had the largest difference between P and R.
A visual comparison of the tree-wise detected and the measured whorls (i.e. sorted by decreasing stem volume) as a function of the height above ground is shown in Figure 3. The figure also gives the detection metrics at the tree level. A deviation of the detected from the measured points in the y direction represents a height difference, whereas a deviation along the x-axis represents a difference in whorl number. The deviation along the x-axis is the more prevalent, as can be observed from the maximum x value of each data series. The cumulative impact of FN or FP detections is an incorrect total age (maximum x), but these can potentially balance out. Therefore, where there is an imbalance between precision and recall (e.g. tree 39, which has maximum precision but low recall), there can be a more considerable discrepancy between the time-series. In general, the net result will be better when P and R are balanced (e.g. tree 30, which has a similar F1 score to tree 39). There was no significant correlation between recall (P = 0.41) or F1 (P = 0.06) and tree volume. The weak, negative correlation (R = −0.44) between precision and volume was significant (P < 0.01) and suggests that precision is marginally better on smaller trees with apparently fewer branches. Conversely, the most missed detections (FN, denoted by recall) were visible for the two smallest trees in the study (i.e. trees 38 and 39). The number of samples is small, and the analysis of the effect of tree size is inconclusive. Overall, the time-series of height and age appear to be relatively well estimated.
To illustrate the reasons for poorly detected whorls, Figure 4 shows example predictions for three trees spanning the studied volume range where the whorl detector performed well (trees 6, 16 and 37) and three trees where the performance was poor (trees 7, 23 and 38). The whorl detector failed for trees where the whorls were poorly visible in the top portion of the tree (trees 7 and 38) or where the whorls in the dead part of the crown were poorly visible (tree 23).

Figure 3 Scatterplot matrix for all destructively sampled trees showing the number of measured and detected whorls (WN) as a function of height above ground (H; m), including tree-wise precision, recall and F1-score. The plots are sorted by decreasing stem volume from the largest (1) to the smallest tree (39). The values were calculated for individual trees, and the arithmetic means are presented with the standard deviations in parentheses.

Measuring and modelling tree growth
The positional errors of the correctly detected whorls (TP) were examined by relative position in the trees (Table 2). The differences between the mean increments were minor in absolute terms, on the order of 0.10-0.13 m, although in relative terms this could be up to ∼25 per cent of the increment at the top of the trees (75-100 per cent of tree height). The lowest RMSE and the smallest bias were found in the third quartile (50-75 per cent of tree height). There was a tendency for the detection to slightly overestimate the increment (negative bias), which was most pronounced in the uppermost and lowermost sections of the stem; in the middle of the stem, the systematic errors were negligible. Finally, to test the usability of the detected results and to put the detection metrics into context, we fitted a standard height-age model to the data. The coefficients were very similar for the measured and detected data sets (Table 3), meaning that the lines representing the expected values for a given age (Figure 5) are practically indistinguishable. Furthermore, when testing the models against the full measured dataset, the performance was entirely satisfactory and equivalent for the measured and detected models (R2 = 0.94; RMSE = 1 m).
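A height-age curve can then be fitted to the (age, height) pairs implied by the whorl series. The paper's actual model form and coefficients (its Table 3) are not reproduced here; as an illustrative stand-in, the classic Schumacher model ln(H) = b0 + b1/age is linear in 1/age and can be fitted by ordinary least squares:

```python
import math


def fit_schumacher(ages, heights):
    """Fit the Schumacher height-age model ln(H) = b0 + b1/age by
    ordinary least squares. Illustrative stand-in only: the model
    form used in the paper may differ."""
    x = [1.0 / a for a in ages]
    y = [math.log(h) for h in heights]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    # slope and intercept of the linearized model
    b1 = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
          / sum((xi - xm) ** 2 for xi in x))
    b0 = ym - b1 * xm
    return b0, b1


def predict_height(age, b0, b1):
    """Expected height (m) at a given age from the fitted curve."""
    return math.exp(b0 + b1 / age)
```

Fitting the same form to the measured and to the detected whorl series, and comparing the resulting coefficients, mirrors the comparison summarized in Table 3 and Figure 5.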

Whorl detection
Table 2 Performance in terms of RMSE and bias for the estimation of the mean annual height-growth increment (m yr−1), calculated for each of the relative height sections of the stem and for the entire tree (overall). This was done on a per-tree basis and the values presented are arithmetic means with standard deviations in parentheses.

Figure 4 Examples of detected whorls (in red) for three trees with relatively successful (upper panel) and relatively unsuccessful detection (lower panel). The tree numbers correspond to the numbering in Figure 3.

In this study, we aimed to provide proof of concept for the use of UAV-LS combined with deep learning to create tree height-growth trajectories from a uni-temporal acquisition. We did this by developing an automatic procedure to detect whorls on manually segmented individual-tree point clouds, combining an existing CNN architecture with bespoke post-processing relevant to our purpose. To the best of our knowledge, this is the first time UAV-LS data have been used to detect branches or whorls.
Due to the small number of annotated images (141) and instances (1944 whorls) compared with what is recommended for training a YOLO v5 model (>1500 images or >10 000 instances; Jocher et al., 2020), precision (0.55) and recall (0.40) indicated a relatively poor performance of the YOLO v5-based whorl detector on single images. However, despite this limited single-image performance, by adopting a multi-view approach and post-processing the detected whorls (see Section 3.1.5), we were able to increase precision and recall by 27 and 57 per cent, respectively (Table 1). One interesting aspect highlighted by our study is the possibility of developing custom pipelines that use deep-learning models at their core and extend them by post-processing the raw predictions with simple rulesets to boost detection performance. In our case, aggregating the detected whorls from the four radial slices reduced occlusion effects and ensured that a whorl could be detected even if visible in only one of the slices. Further, the simple iterative bounding-box post-processing workflow allowed us to remove redundant detected whorls based on their probability scores, using a single, intuitive parameter (IoU = 0.001) specifying the minimum vertical overlap between detected bounding boxes. Although we opted to use only four projections obtained by rotating around the z-axis, further studies should investigate the effect of increasing the number of projections on detection performance. Such an analysis would provide better insight into the benefits of multi-view approaches for object detection in point clouds using 2D projections.
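The iterative bounding-box post-processing can be sketched as a greedy, score-ordered suppression of vertically overlapping detections pooled from the four radial slices. The IoU threshold of 0.001 is the parameter named above, but the exact published ruleset may differ from this minimal version:

```python
def vertical_iou(a, b):
    """1-D intersection-over-union of two vertical intervals
    (z_min, z_max), i.e. the bounding boxes collapsed onto the z-axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(boxes, iou_thr=0.001):
    """Greedy suppression of vertically overlapping whorl detections.

    boxes: list of (z_min, z_max, score) tuples pooled from all radial
    slices. Among overlapping boxes, the highest-scoring one is kept;
    a very low threshold treats almost any vertical overlap as the
    same whorl. A sketch of the idea, not the paper's exact code."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[2], reverse=True):
        if all(vertical_iou(box[:2], k[:2]) < iou_thr for k in keep):
            keep.append(box)
    return keep
```

Because detections from all four slices are pooled before suppression, a whorl visible in only one slice still survives, while duplicate detections of the same whorl across slices collapse into the single highest-scoring box.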
The method's performance was apparently independent of relative tree size, and a logical follow-up would be to test whether it can be applied to spruce trees at different developmental stages or to spruce forests with differing growth rates. On the other hand, there were some indications that the method did not perform as well for suppressed trees, where whorls were very close to each other (i.e. in the tree top), or where the branches in the dead part of the crown were barely visible. Therefore, some supplementary development will likely be required for further use. One example could be to exploit the correlations between height growth and climatic variables, and the expected inter-correlations between the height growth of trees on the same site, using techniques similar to those employed in dendrochronology.
Concerning the performance of the developed whorl detector for different portions of the tree, our results indicated that whorls could be detected along the entire height of the tree with an F1-score > 0.63, with the best performance in the upper half of the tree (F1-score = 0.65-0.70). The relatively balanced performance of our whorl detector throughout the vertical extent of the trees represents a major difference compared with the multi-scan TLS study by Pyörälä et al. (2018b), who found branch detection accuracies for the living crowns of Scots pine to be limited to 0-20 per cent. In this respect, our study indicates that very dense UAV-LS data (>5000 pts m−2) coupled with deep-learning techniques may be a more efficient source for capturing complete annual height-growth trajectories than TLS and methods relying on geometrical features. This is further reinforced by the fact that our experiment was conducted in mature spruce stands, which are typically characterized by dense and deep crowns and are thus highly prone to occlusion in the lower parts of the canopy, where one would expect poor detectability of branching structures.

Measuring and modelling tree growth
The RMSE of the mean annual height-growth measurements was always <0.13 m and was best between 50 and 75 per cent of tree height (RMSE% = 18.5 per cent; Bias% = 0 per cent). We found a slight tendency to overestimate the increment (Bias = −0.05 m, or −10 per cent of the mean increment). However, this systematic error was less than half of that found by Pyörälä et al. (2018a) when comparing manual measurements on TLS data against X-ray measurements of logs from Scots pine trees (Bias = −0.12 m, or −27 per cent of the mean increment). One reason behind the better performance in our study could be that Norway spruce is more shade-tolerant than Scots pine, and therefore branches are retained for longer along the entire length of the stem (i.e. the rate of self-pruning is considerably lower in spruce), making whorls easier to detect. Further work could be dedicated to expanding the proposed whorl detector to a broader range of coniferous species.
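As a sketch of how these increment errors arise from the detections: each internode between successive whorls corresponds to one year of height growth, so annual increments follow from differencing the sorted whorl heights, and RMSE and bias follow from comparing them with the measured increments. The sign convention below (measured minus detected) matches the text, where a negative bias indicates overestimated increments:

```python
import math


def annual_increments(whorl_heights):
    """Convert whorl heights (m above ground) into annual height
    increments: each internode between successive whorls is one year."""
    z = sorted(whorl_heights)
    return [b - a for a, b in zip(z, z[1:])]


def rmse_and_bias(measured, detected):
    """RMSE and bias of detected vs measured increments, paired by
    year. Bias = mean(measured - detected), so a negative value means
    the detected increments overestimate growth."""
    errs = [m - d for m, d in zip(measured, detected)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    bias = sum(errs) / len(errs)
    return rmse, bias
```

Note that this simple pairing assumes equal-length, correctly aligned series; FP or FN whorls shift the alignment, which is exactly why balanced precision and recall matter for the reconstructed trajectory.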
Finally, we showed that the detected whorl data could be used to estimate the rate of forest growth equivalently to the ground truth, confirming our hypothesis and demonstrating the potential for end-users. Further, our method allows the time series of height growth to be measured non-destructively using remote sensing, without knowing the age of the forest stand. This presents an opportunity to solve the problem of unknown age for site index estimation, which is prevalent in Nordic countries because age at breast height (or total age), rather than planting/regeneration age, is used to construct height-growth functions (e.g. Sharma et al., 2011). Normally, the age must be obtained by taking increment cores from dominant trees and performing tree-ring analysis (Viken, 2018). Furthermore, the construction of tailor-made height-growth functions for individual sites could have considerable implications for forest experiments or precision forecasting, for example.

General remarks
To date, research seeking to detect branches in 3D point clouds has been dominated by TLS, presumably because aerial point clouds have been unable to match the density and accuracy of terrestrial point clouds. Even though UAV-LS data may not provide dense and precise enough data for QSM reconstruction (Brede et al., 2019), our study found that it can allow the detection of whorls with reasonable accuracy. Compared with TLS studies, our method allows the study of more extensive areas at a fraction of the effort and with similar or even better results.
Although the results of this study are encouraging, it is essential to acknowledge that improvements to our methods are possible through the annotation of a more extensive set of training data. It would also be interesting to expand the range of operational scenarios by including a broader range of laser scanning sensors collected from different platforms (e.g. TLS, mobile laser scanning or low-flying helicopters) and, as aforementioned, a broader range of tree species. Ultimately, with a more extensive database of annotated whorls, the performance of the trained YOLO v5 model could likely be boosted while also increasing its transferability to a broader range of operational scenarios. One of the key advantages of deep-learning models compared with QSM or machine-learning techniques (e.g. random forests) is the possibility of applying transfer learning: relying on the backbone of previously trained model weights, an existing model (such as the one published in this study) can be re-trained using a limited amount of newly annotated data, efficiently enabling the transfer to, for example, new sensor data or new tree species, and yielding a model that evolves through time.

Conclusion
It is well known that forest site productivity is changing in many sites due to climate change and as a result of tree breeding. Consequently, it is increasingly difficult to rely on established static height-age relationships, and ways of efficiently revising these are required. Even without these changes, using generic and often nationwide relationships does not permit the highest degree of forecasting precision for single stands. In this study, we demonstrated how state-of-the-art technology could be used to determine annual tree-growth trajectories using data acquired at a single point in time. Our approach could ultimately provide a suitable solution for defining site-specific functions for tree growth. To the best of our knowledge, this represents the first study to demonstrate an approach to accomplish this non-destructively. This has considerable potential for application in forest science and operational forestry. Besides an efficient characterization of site productivity, it may also support estimating wood quality in standing trees, obtaining meaningful biometric data to support wood traceability and, last but not least, studying the effects of climate on forest growth.

Data Availability
The developed YOLO v5 whorl-detector model and the code to reproduce the methods described in this study will be publicly released as a GitHub repository upon acceptance of this paper. In addition, the manually segmented UAV-LS data will be released publicly as part of a broader benchmarking effort. Finally, further data underlying this article will be shared on reasonable request to the corresponding author.