Quantifying the precision of forest stand height and canopy cover estimates derived from air photo interpretation

Quality information on forest resources is fundamental for sustainable forest management. Manual aerial photointerpretation is used as a cost-effective source of data for forest inventories; however, the process of photointerpretation is inherently subjective and is often undertaken by multiple photointerpreters for a given forest management area. In contrast, airborne laser scanning (ALS) data enable characterization of forest structure in a systematic fashion with quantiﬁable levels of accuracy and precision that often exceed required targets and standards. However, the gains associated with the use of new technologies for forest inventory are difﬁcult to measure because the quality of existing photointepreted inventories have rarely been quantiﬁed. Using ALS data as reference, the objective of this study was to quantify the precision of photointerpreted estimates of forest stand height and canopy cover (CC). We examined forest inventories from three study sites in three different forest regions of Canada. Each of the study sites was located within a different provincial jurisdiction with unique photointerpretation standards and forest ecosystems. Stand-level estimates of forest height and cover were compared to reference estimates generated from the ALS data. Overall, our results indicated that precision was greater for photointerpreted estimates of height, with a relative standard deviation ranging from 22 per cent to 29 per cent among our three sites, compared to estimates for CC, with precision ranging from 28 per cent to 59 per cent. While the relationship between photointerpreted estimates of height and ALS estimates of height were generally linear and consistent for all study sites, relationships for CC were non-linear. We found that precision for both stand height and cover varied by dominant species, inventory stand structure, age, and ALS canopy complexity, and that in the majority of cases, the differences between the photointerpreted estimate and the ALS estimate were statistically signiﬁcant. It is also noted that the variability in photointerpretation precision as a function of the aforementioned factors was not consistent among our three study sites, indicating that site-speciﬁc forest conditions and photointerpretation procedures inﬂuence the precision of photointerpreted estimates. The inﬂuence of local forest conditions and interpretation procedures are therefore important considerations when seeking to quantify the potential relative gains in precision, which may be afforded by technologies such as ALS for forest inventory programs. Moreover, approaches to improve consistency in photointerpreted estimates of cover would be useful for operational inventory programs.


Introduction
Forest inventory is a critical information source for sustainable forest management activities with the aim of producing accurate estimates of key forest attributes such as forest area, timber volume, biomass, and canopy height and cover (Kangas and Maltamo, 2006). Increasingly, forest inventories are relied upon to produce additional insights related to the broad array of ecological goods and services provided by forests (Bastian et al., 2013;Simončič et al., 2015), including habitat and biodiversity, carbon Forestry sequestration and cultural values (Brockerhoff et al., 2017). In many jurisdictions around the world, forest inventories are undertaken using a range of approaches depending on the inventory's intended use and scale of application (i.e. strategic, tactical, operational; Kangas and Maltamo, 2006). These approaches often include regional mapping using remotely sensed data, which has conventionally been aerial photography in Canada (Leckie and Gillis, 1995), augmented by ground plots for calibration purposes and to derive a suite of forest stand attributes as well as calculation of adjustment factors for certain attributes such as timber volumes.
The interpretation of aerial photography has provided cost effective data for forest inventories for decades. Typically, the process of aerial photointerpretation for forest inventory involves four complementary activities: calibration, delineation, interpretation and verification (Leckie and Gillis, 1995). Calibration is a critical phase of any forest inventory project and comprises the collection of ground or aerial reconnaissance data or plots, as well as the collation of any relevant historical information or auxiliary data that can augment the photointerpreters' field experience and understanding of the forest types to be interpreted. Photointerpreters then delineate forest stands based on uniformity in composition, age and structure (Thompson et al., 2007). Delineation is currently accomplished directly through softcopy systems and is fully digital, allowing for efficiencies in workflows. Delineation is followed by interpretation, whereby photointerpreters estimate a suite of forest inventory attributes for the delineated stands and generalize the forest characteristics present to represent that spatial unit. These interpretations are then subject to various quality checks and verification procedures to ensure the quality of estimation and consistency among interpreters. The resulting forest stands and associated attributes are often stored within a spatial database in a Geographic Information System (GIS) or geo-database and form the basis of forest management and planning for the stand into the future, until the inventory is either updated or a new inventory is completed. Photointerpreters are required to have certain qualifications and experience in order to work on inventory projects, and often will have familiarity with the forest environment for which they are interpreting. Despite these requirements and best practices for manual calibration and verification, photointerpretation is an inherently subjective process and variability in interpretation is an accepted characteristic of the forest inventory process (Thompson et al., 2007); however, this variability has rarely been systematically quantified.
Although automated or semi-automated approaches to stand delineation and attribution have been the subject of research (Leckie et al., 2003;Wulder et al., 2008), inventory programs continue to largely rely on manual delineation and photointerpretation. As with any activity that relies on the subjective judgement of individuals, photointerpretation results will vary according to a photointerpreter's skill-level and experience and familiarity with the area of interest, the complexity of the forest environment, the quality, resolution and seasonality of the imagery being interpreted, the forest inventory specifications, including minimum mapping unit and the level of detail required in the attribution (i.e. continuous vs categorical variables), among other factors (Congalton and Mead, 1983;Magnussen and Russo, 2012). As an abstraction of reality, photointerpreted estimates are not expected to directly correspond to field measurements; however uncertainty in photointerpreted estimates can influence statistics derived from these data and therefore quantifying this uncertainty in a meaningful way is necessary for reporting purposes (Magnussen and Russo, 2012), as well as for better understanding, and possibly supporting, more operational use of alternative sources of information for forest inventory such as airborne laser scanning (ALS; White et al., 2016).
To date, the number of studies characterizing the uncertainty associated with forest attributes derived from photointerpretation has been limited. While several studies have quantified the accuracy of species interpretation (Congalton and Mead, 1983;Fent et al., 1995;Potvin et al., 1999), few studies have focused on other attributes. Thompson et al. (2007) studied the accuracy of photointerpreted estimates of tree species composition in boreal forests of Ontario, Canada and reported that 64 per cent of the stands examined were misclassified for species composition and 30 per cent were misclassified by forest type (conifer, deciduous, mixed). Common boreal species such as black spruce and jack pine were misclassified in 50 per cent of stands. Moreover, observed errors persisted when stand-level data were combined over larger spatial units (of 1, 2 and 5 km radii). Errors were more common in mixed forests and in second growth, younger forests. Magnussen and Russo (2012) simulated the impact of photointerpretation uncertainty on estimates of error for Canada's National Forest Inventory (NFI). Simulated errors for each attribute were determined based on the literature and via interview with experienced photointerpreters. Specifically, the authors examined the impact of uncertainty in photointerpreted estimates of cover type, age, maturity class, crown closure, height and volume on the precision of sample-based NFI estimates. The authors found that photointerpretation uncertainty inflated the precision of NFI estimates by an average of 7 per cent, with a range of 0 per cent to 36 per cent.
ALS data have become an important data source for forest inventories (White et al., 2013;Naesset, 2014). ALS data have demonstrated a capacity for characterizing a number of important forest stand attributes through the analysis of the threedimensional point clouds (White et al., 2016). Attributes such as height are directly measured based on difference in elevation between laser returns from canopy and ground. Canopy cover (CC) is inferred through the ratio of pulses returned to the sensor that were intercepted by vegetation vs those which were reflected off the forest floor. Other attributes, such as volume, require models that utilize ALS metrics as predictor variables. In an area-based approach, those predictors usually consist of ALS point cloud metrics (e.g. height percentiles, return densities) within a given area corresponding to the area of a measured ground plot (White et al., 2013). Following the availability of ALS derived forest inventory attributes, both researchers and practitioners sought to quantify the quality of these measurements against co-located plot information. Quality assessments initially focused on the accuracy of stand (Naesset, 1997) and tree (Andersen et al., 2006) height information. For example, bias for ALS estimates of height at the plot level is known to be <0.5 m (Naesset, 1997(Naesset, , 2007Magnussen and Boudewyn, 1998;Naesset and Økland, 2002). Quality assessments have since expanded to cover an increasingly full set of forest inventory attributes (White et al., 2016). These studies have demonstrated Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation ALS derived attributes to closely relate to field measured information. As such, ALS provides synoptic and spatially explicit, wallto-wall maps of forest attributes with a known level of error. As indicated above, forest inventory data are typically not subject to quantitative assessment of accuracy. It is the process, rather than the outcomes, that is typically prescribed and assessed as part of quality assurance activity. ALS-derived forest inventory attributes provide a source of information to better understand the quality and consistency of photointerpreted forest inventory data. With more and more remotely sensed data used to support forest inventory, understanding the current technology becomes essential when developing new methods and assessing their accuracy.
Forest inventory based upon photo interpretation continues to offer key information supporting forest management, reporting and policy development. Increasingly detailed and expanding demands for information to be derived from forest inventory data concurrent with increasingly operational options for digital data collection and automation of processes (White et al., 2016) motivate an interest in better understanding the strengths and weaknesses of differing data collection and information generation approaches. Given this context, our objective was to quantify the precision of photointerpreted estimates of forest stand height and CC, using ALS data as reference. Note that our objective is not necessarily to quantify the accuracy of these estimates per se. While our reference data can be expected to be of greater accuracy and consistency than stand-level photointerpreted attributes, we acknowledge that the reference ALS information may also have errors that prevent the characterization of the absolute accuracy of the photointerpreted estimates. Rather, our intention herein is to quantify the nature and consistency (via precision) in photointerpreted estimates of stand height and cover, and determine how consistent that level of precision is across different forest environments and jurisdictions. Previous studies have typically used field data as reference; however, a comparison of stand-level estimates of height and CC derived from manual air photo interpretation (API) and ALS estimates offers a number of benefits over a comparison to field data. First, using ALS data as reference allows for a broader and spatially extensive analysis over a much wider range of forest conditions, with a markedly larger sample. To date, most of the comparative studies present in the literature have used 100s of plots (Thompson et al., 2007;Magnussen and Russo, 2012;Bergseng et al., 2014), whereas when using ALS for comparison, 1000s of stands may be assessed. Second, aiming to make comparisons between similar spatially and categorically analogous information, we focus on stand-level estimates of height and cover. Height is directly measured with ALS data and by spatially generalizing the ALS data within the stand, we closely mimic the approach used by photointerpreters to generalize stand conditions, thereby avoiding a situation where we are comparing a spatially constrained point estimate from field data to an estimate for a larger spatial unit (the forest stand). Furthermore, stand height and cover are especially important forest inventory attributes, since they drive the definition of forest area and have implications for international reporting obligations (FAO, 2012;Wulder et al., 2020).

Study areas
Analyses were undertaken in three different forested ecosystems located in three provincial jurisdictions in Canada: British Columbia (BC), Ontario (ON) and Quebec (QC) (Figure 1). Combined, these three jurisdictions account for approximately half of Canada's managed forest area and 2/3 of the volume harvested (National Resources Canada, 2019; Stinson et al., 2019). The particular study areas were selected based on the availability of photointerpreted forest inventory data, as well as the contemporaneous acquisition of imagery used for photointerpretation and corresponding ALS data. For each study site, the photointerpretation standards and forest conditions varied. The BC site was located on the Haida Gwaii archipelago ( Figure 1) and was ∼361 000 ha in size. Haida Gwaii encompasses the Queen Charlotte Ranges and Queen Charlotte Lowland Ecoregions. These ecoregions have a strongly maritime climate with a mean annual temperature of 8 • C and 1500-2000 mm of precipitation annually. Forests in this area are dominated by western hemlock (Tsuga heterophylla) and western red cedar (Thuja plicata). The average age of forest stands within the study area was estimated to be 180 years (±113 years).
The ON site was ∼1 522 000 ha in size and was located within the Abitibi Plains Ecoregion (Figure 1). The area is characterized as having a humid mid-boreal ecoclimate, with an average annual temperature of 1 • C with warm summers (average temperature of 14 • C) and cold winters (average temperature of −12 • C). The average annual precipitation ranges from 725 to 900 mm. Elevations in this area range between 100 and 435 m above sea level and can be characterized by flat to rolling terrain underlain by poorly drained glacial-lacustrine clay deposits. Forest stands on wetter sites are dominated by black spruce (Picea mariana) and balsam fir (Abies balsamea), whereas drier sites have pure stands of jack pine (Pinus banksiana), or mixed stands of jack pine, paper birch (Betula papyrifera) and trembling aspen (Populus tremuloides). Mixed stands of white spruce (Picea glauca), balsam fir, paper birch or trembling aspen are also common. The average age of forest stands within the ON study area was 80 years (±45 years).
Lastly, the QC study site was ∼1 612 000 ha in size and was located within the Southern Laurentians Ecoregion (Figure 1), which is characterized by a low to mid-boreal ecoclimate with an average annual temperature of 1.5 • C. Winters are long and cold with an average temperature of −11 • C and summers are short with an average temperature of 14 • C. Mean annual precipitation in this area is ∼900 mm. The area is dominated by mixedwood stands of white spruce, paper birch, balsam poplar and trembling aspen. Black spruce, balsam fir and tamarack are found on poorly drained sites. The average age of forest stands within the QC study area was estimated to be 70 years (±34 years).

Forest inventory data and existing photo interpretation standards
Forest inventory data, consisting of stand polygons and their associated attributes derived using photointerpretation and compiled according to provincial forest inventory standards were gathered for each study site. Our objective was to assess the variability in photointerpreted attributes, and in order to enable fair comparisons of API-and ALS-derived estimates of stand height and CC, the calculation of the reference stand height and CC from the ALS was guided by the jurisdictionspecific photointerpretation standards. Although there are commonalities in forest inventory processes across Canada (Leckie and Gillis, 1995), there are also important differences by jurisdiction. In Table 1, we summarize the photointerpretation standards used in BC (BC MFLNRORD, 2020), ON (OMNR, 2009) and QC (QC MFFP, 2015), focusing on stand height and CC. The differences in the standards for determining height and CC are most pronounced for QC. For example, in BC and ON, the photointerpreter estimates stand height based on the average height of the dominant and co-dominant trees. In QC, photointerpreters are required to estimate stand height using separate rules for short stands (i.e. <7 m), and for single, twotier, or multistage stands. In BC and ON, stand height is recorded as a continuous variable, whereas in QC, photointerpreters round estimates of stand height to the nearest metre.
Standards for CC are more variable among the jurisdictions. In BC, photointerpreters assign a single attribute to characterize CC, while in ON, photointerpreters assign CC estimates separately to the overstorey and understorey layers of the stand. In QC, photointerpreters will only estimate CC if cover is estimated to exceed 25 per cent and only for trees taller than 4 m, although the specific height threshold used varies with stand structure (Table 1). In BC and ON, CC is estimated as a continuous variable, whereas in QC, photointerpreters estimate CC in 10 per cent classes.
Forest inventory data, derived using photointerpretation and compiled according to provincial forest inventory standards, were gathered for each study site. A summary of selected stand-level attributes is shown in Table 2. Generally, the BC site represents taller (22.2 ± 9.9 m), older (180.3 years ± 113.7), forests with moderate cover (51.9 per cent ± 17.3 per cent) that would be typical of coastal temperate rainforests. In contrast, the ON site represents shorter (11.4 m ± 5.6 m), younger (79.7 years ± 45.0 years) boreal forest conditions with denser cover (61.9 per cent ± 22.4 per cent). The QC site represents boreal and hemi-boreal forest conditions with stand heights and ages that are greater on average than the ON site, but less than the BC site (16.6 m ± 2.2 m and 90.5 years ± 27.2), but with the greatest CC on average amongst all the sites (71.8 per cent ± 11.4 per cent), as well as the greatest number of broadleaf species.
The acquisition dates for aerial imagery used for API are provided in Table 3. For the BC data, inventory attributes were projected forward by the data provider to represent forest stand conditions in 2015 (the year of ALS acquisition). Height was projected using derived site index information and age, following the approach detailed in Sandvoss et al. (2005).

Point-cloud data and derived metrics
A summary of the leaf-on ALS data acquisition characteristics for each study site is shown in Table 3. Of note, while ALS and photos were acquired contemporaneously for the ON and QC sites, the BC site had the longest time interval between acquisitions. The ALS point clouds were processed following standard processing routines, which included tiling, ground classification and height Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation Canopy cover Determined for each canopy layer. Tree crown closure is the percentage of ground area covered by the vertically projected crowns of the tree cover for each tree layer within the polygon. Where vegetation is overlapping (such as a two-layer stand) only the visible portion of each layer is estimated for crown closure. Inventory attribute name: CROWN_CLOSURE Canopy cover is defined as the percentage of ground area covered by the vertical projection of the tree crowns onto the ground. Similar to height, canopy cover is assigned separately for overstorey and understorey. Inventory attribute name: OCCLO (overstorey canopy cover); VERT (inventory-based stand structure class) Canopy cover is defined as the absolute projection on the ground of the crowns detectable in the photograph, and is classified into eight classes (25,35,45,55,65,75,85,95) with a 10% class width (e.g. class 65 corresponds to stands with canopy cover between 60 and 69%). The lowest CC is 25%. Trees of different height are considered when determining canopy cover for stands with different vertical structure. For single tier stands, dominant and codominant trees above 4 m are considered. For multistage stands, all trees above 7 m are considered. For two-storey stands, canopy cover is assessed separately for overstorey (trees above 12 m only) and understorey (trees above 7 m). For two storey stands the two stand levels are considered separately and the total canopy cover may exceed 100% (up to 120%). Inventory attribute name: DENSITE, ETAGEMENT (vertical structure) normalization. Processing was performed using a combination of FUSION (Version 3.80; McGaughey, 2015) and LAStools (Isenburg, 2020) software packages. Points classified as ground were used to normalize the ALS point cloud elevations relative to the ground surface. From the normalized point clouds, we calculated two point cloud metrics that were used as reference estimates of stand height and CC, and were later used in stand-level summaries designed to match each jurisdiction's photointerpretation standards. We used the 95th percentile of ALS first returns as a measure of stand height (P95), and the proportion of first Forestry Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation returns above a 2 m threshold (prop2m) as a measure of CC. The 2 m threshold is a commonly applied threshold in the scientific literature (White et al., 2013). Metrics were calculated for 20 × 20m grid cells, covering the coincident area between the ALS and the forest inventory data.

Photointerpreted estimates of stand height and CC
In general, photointerpreted estimates of stand height (H API ) and CC (CC API ) were taken directly from the forest inventory data (stand maps) for each stand. However, because a stand in BC may have two height estimates for each stand, with one height estimate for leading species (proj_height_1) and another height estimate for the second species (proj_height 2), the H API for BC was calculated as a weighted average of proj_height_1 and proj_height_2 (equation (1)). The dominant and second species proportions were used as weights (species_pct_1 and species_pct_2, respectively).

ALS estimates of stand height and CC
To enable comparison of the photointerpreted attributes with ALS-based reference estimates, the methodology for generating estimates of stand height and CC were designed to mimic the existing provincial photointerpretation standards as closely as possible. Since the provincial standards differ by jurisdiction, the methods used to generate reference estimates from the ALS data also differed. In general, reference estimates of stand height (H ALS ) and CC (CC ALS ) derived from the ALS data were calculated by averaging all of the 20 × 20-m cells completely enclosed within a stand boundary. Grid cells located at stand edges that were only partially found within the stand were not included in the calculation of the stand-level estimates. In addition, because photointerpretation is often more complex for two-storey stands, and the assigned value of height or CC may correspond to either the top or the lower canopy layer, a fair comparison between the reference and photointerpreted value would not be possible in these stand types. As a result, two-storey stands were excluded from further analysis. A complete list of stand structure types that were included in our analysis is provided in Table 5. CC ALS for BC and ON was calculated as follows: where n is the total number of 20 × 20-m cell within a stand polygon and i is a value of a current cell.
To mimic the fact that the interpreters in BC and ON focus on the top layer of the canopy to determine height, H ALS for those two datasets was calculated as a weighted mean of P95 values, with prop2m used as the weight: Provincial photointerpretation standards in QC are unique from those used in BC and ON (Table 1), and CC ALS and H ALS in QC were calculated using a different methodology. As noted in Table 1, height and CC in QC are estimated differently for stands of different heights or vertical structures, further complicating the calculation of reference estimates from the ALS data. The following approach was, therefore, used to emulate the operational procedures used for photointerpretation in QC.
First, ALS-based estimates of H ALS and CC ALS in QC required two additional pieces of information: stand height class and stand vertical structure. These two attributes were estimated using a canopy height model (CHM) and individual tree tops detected from the CHM. The CHM was derived for the QC study area at a 1m resolution. Stand height class ("0-3 m," "4-6 m" and "≥7 m") was determined based on proportions of CHM pixels above 5 and 7 m. If more than 25 per cent of CHM pixels within a stand had a height greater than 7 m, then the stand was classified in the "≥7 m." If less than 25 per cent of pixels within a stand had heights that were greater than 5 m, then the stand was classified as "0-3 m." If the percentage of CHM pixels within the stand that were greater than 7 m was less than 25 per cent, and the percentage of CHM pixels that were greater than 5 m was greater than 25 per cent, then the stand was classified as "4-6 m." Modal height and stand structure were determined based on individual tree heights. Treetops were detected using a local maxima filter implemented in the lidR package (Roussel et al., 2020). Modal height was calculated using a subset of the detected trees taller than 7 m if stand class was "≥7 m," trees between 4 and 7 m if the height class was "4-6 m," and trees shorter than 4 m if the height class was "0-3 m." Specifically, a height frequency table was created in 1-m wide classes. Following the photointerpretation standards (QC MFFP, 2015), modal height was then calculated based on the most frequently occurring height class and the two surrounding height classes.
Stand structure was determined based on the difference between 20th and 75th percentile of tree heights. A threshold value was defined for stands of different modal heights and if the difference between the two percentiles exceeded the threshold value, the stand was classified as multistage (Table 4).
Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation For multistage stands, height was determined as a weighted mean of tree heights instead of the modal height as described above. Individual tree volume was used as the weight and was calculated for each tree using the following formula: v = −0.3617h 2 + 24.038h − 146.53 (4) where v is the individual tree volume (in m 3 ) and h is the individual tree height. Equation (4) was developed using individual tree measurements from an extensive network of permanent sample plot data in Québec. CC was determined based on a proportion of CHM pixels above a sequence of height thresholds. The specific threshold value is determined for each stand based on the modal height. CC value is set to the proportion of CHM pixels above the modal height, and rounded to the nearest tenth.

Analysis approach
The photo-based measures of CC and height were compared to the ALS-based reference values. For each stand, we calculated the difference between the reference estimate and the API estimate. We then calculated the absolute and relative mean difference (MD, MD%) and the absolute and relative standard deviation of the differences (SD, SD%) using the following formulas: where N is the number of stands, y ALS is the ALS-based value, y API is the photo-based value for stand i,ȳ is the mean of the ALSbased estimate of height or CC. We used a paired Wilcoxon test to evaluate the null hypothesis that the medians of the reference (ALS) estimate and the API estimate were equal (at α = 0.05). The aforementioned measures of agreement between the photo-based and ALS-based attributes were calculated overall, and post hoc for the following strata: dominant species (Table 5), inventory stand structure (Table 6), age class (Table 6), as well as ALS canopy complexity (Table 6). In addition, to determine if stand height influenced photointerpretation of CC, or if CC influenced photointerpretation of stand height, we included a final comparison in which stand height MD% were stratified by CC class, and conversely, CC MD% were stratified by height class. To determine if the differences between groups within strata were significant, we used a Wilcoxon test when there were two groups, and a Kruskal-Wallis test with Dunn's post hoc test when there were more than two groups.

Results
After filtering stands based on their structure and species composition (Table 5), a total of 5084 (18.1 per cent), 30 723 (35.7 per cent) and 52 120 (20.02 per cent) stands remained within each of the BC, ON and QC study sites, respectively. The average stand size for these subsets was 10.6 ha in BC (σ = 20.0 ha), 9.9 ha in ON (σ = 13.0 ha), and 4.4 ha in QC (σ = 2.4 ha). Comparisons between API and ALS estimates of stand height and cover were undertaken overall and by strata described in Table 6.

Overall comparisons of stand-level API and ALS estimates of height and cover
The variability in photointerpreted estimates of stand height (SD%) ranged from 24.96 per cent in QC to 29.24 per cent in ON, whereas the variability of photointerpreted estimates of CC was greater, ranging from 27.92 per cent in ON to 56.85 per cent in BC (Figure 2; Tables 7-9). In general, the relationship between H API and H ALS was linear, and was similar for all three study sites; however, large differences can be observed for stands where H API is <3 m in ON (Figure 2). In contrast, the relationship between CC API and CC ALS was non-linear and markedly different for each of the study areas, partly due to differences in the use of a continuous variable in BC vs the rounding approach followed in ON (round to the nearest 10 per cent) and QC (Table 1). Of note, CC API is not assigned for short stands in QC (CC API = 0 when H API < 4 m); however, in some stands, H ALS was ≥4 m and CC ALS was estimated. This resulted in large discrepancies visible in the scatterplot (Figure 2).
A non-stratified analysis of CC shows that the average difference between CC ALS and CC API is markedly different in each of the study sites, with ON being the only site where the majority of the photointerpreted CC values were overestimated relative to CC ALS . BC had the greatest variability in CC API (56.85 per cent), but also had the greatest variability in stand ages and heights (Table 2). Our results showed that for stands that had high CC ALS , CC API was generally underestimated in BC and QC, and overestimated in ON (with the exception of stands dominated by trembling aspen). Based on the results, it is also apparent that interpreters rarely assign high CC values (e.g. above 90 per cent in BC), and that the majority of the CC API are within the 50 per cent-80 per cent range. In addition, photointerpreters tend to round the CC values to the nearest tenth, which is especially evident in ON (Figure 2). Abbreviations correspond to the terminology used in each jurisdiction. ALS canopy complexity For all jurisdictions, complexity was defined using the standard deviation of the 95th height percentile. Stands were classified as being either homogenous (SD < 2 m) or heterogeneous (SD ≥ 2 m).

Post-stratified comparisons of stand-level API and ALS estimates of height and cover
Post hoc stratifications provided further insights into the variability in API estimates (Tables 7-9). Stratification by dominant species revealed that in BC, variability (SD%) in H API was greatest for western hemlock dominated stands (29.53 per cent) and lowest for Sitka spruce (18.74 per cent); however, the magnitude of the differences between species was not large (  Figure 3). For CC API , the magnitude of photointerpreter variability in CC estimates was similar for the dominant species examined in ON, Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation but markedly different among dominant species in BC and QC. CC API in BC were underestimated relative to the reference for all species, by an average of 30 per cent, with variability greatest for CC API estimates for red cedar (73.18 per cent, Table 7). In QC, CC API estimates had the largest variability for eastern white pine stands (94.9 per cent), which were underestimated on average relative to the reference data (MD% = 82.04 per cent, Table 9). When stratified by inventory stand structure class, we found that variability in photointerpreted estimates of stand height and cover was of a similar magnitude for simple and complex stand structural classes in ON and QC (Tables 8 and 9). However, we found that variability in H API and CC API was markedly larger for stands labelled as "very uniform" in BC (Figure 3), but we also note the small sample size for this group (N = 47 stands). There is also a trend in BC of increasing variability in photointerpreted estimates for CC API with increasing non-uniformity in inventory stand structure classes (Figure 4).
The effects of age on API estimates were variable across study sites; however, in general, we observed only a small effect of age class on the agreement between API and ALS estimates of stand height and cover. In BC, the relative variability in H API was similar for both age classes, whereas in ON, the variability was markedly greater for stands <80 years (SD% = 70.67; Table 8). In QC, variability in estimates of H API were only marginally greater for stands <80 years. Of note, whereas in BC and QC, MD% was greater for stands ≥80 years and stand height was overestimated, in ON, MD% was markedly greater for stands <80 years compared to stands ≥80 years (40.26 per cent vs −2.16 per cent).
When stratified by ALS canopy complexity, the variability in H API was similar for homogenous and heterogeneous stands in BC (Table 7; Figure 3), and slightly larger for heterogeneous stands in QC (Table 9; Figure 3). By contrast, in ON, the variability in H API in heterogeneous stands (SD% = 41.87) was double that of homogenous stands (SD% = 18.62 per cent; Figure 3). In ON and QC, differences between H ALS and H API were statistically significant for all stratifications considered. In BC, differences were not significant for moderately uniform stands, stands that were <80 years old and stands with a heterogeneous ALS canopy complexity class. For CC API , variability was comparable for homogenous and heterogeneous stands in BC and ON, but in QC was double for heterogeneous stands (52 per cent) compared to homogenous stands (25 per cent) (Figure 4).
When stratifying the relative difference in CC by H classes (Figure 4, top panel) different trends can be observed for BC vs ON and QC. In BC, the median of MD% is positive for all height classes, indicating that CC API was consistently underestimated relative to CC ALS . The magnitude of MD% in BC increases with increasing H, but then levels off for stands that are >10 m in height. Trends P-values indicate results of the Wilcoxon matched pairs test. Different letters (within strata comparison) indicate significant difference between groups at α = 0.05 determined with Wilcoxon test when there were two groups, and a Kruskal-Wallis test with Dunn's post hoc test when there were more than two groups. Groups that share the same letter in the "within-strata comparison" column are not significantly different from each other. For example, PICE-SIT and THUJ-PLI share the letter "b" and are therefore not significantly different, whereas ALNU-RUB is significantly different from all other species.
were different in ON and QC, with CC API overestimated in shorter stands (i.e. <20 m in ON and <15 m in QC) and underestimated in taller stands. Overall, the relative differences in CC were statistically different for the majority of height classes across all jurisdictions, although in BC, MD% were similar for stands with heights >10 m. Relative differences in H stratified by CC classes were less similar across jurisdictions ( Figure 5, bottom panel). In BC, the MD% in H by CC oscillated around zero, with H API being overestimated for stands with CC between 30 per cent and 60 per cent. In ON, H API was underestimated for stands with CC < 20 per cent and for denser stands (i.e. CC > 20 per cent), the H MD% were consistently close to 0. In QC, H API was overestimated for the majority of CC classes, especially for open stands with CC < 30 per cent. While H MD% were significantly different across all CC classes in ON and most CC classes in QC, not all CC classes had significant differences in BC. Of note, variability in photointerpreted estimates of stand height were greatest in BC and lowest in ON.
Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation P-values indicate results of the Wilcoxon matched pairs test. Different letters (within strata comparison) indicate significant difference between groups at α = 0.05 determined with Wilcoxon test when there were two groups, and a Kruskal-Wallis test with Dunn's post hoc test when there were more than two groups. Groups that share the same letter in the "within-strata comparison" column are not significantly different from each other.

Discussion
In this study, we compared API and ALS derived stand height and CC estimates to characterize the variability of the API-derived stand attributes. To enable fair comparisons, we derived the ALS-based estimates by mimicking the existing photointerpretation standards in each of the three study areas located in three different forest regions of Canada. Each of our study sites represented a unique forest ecosystem, with older and taller coastal temperate rainforests in BC, younger and denser boreal forests in ON, and more broadleaf and mixedwood forests in QC (Table 2). Our results showed that variability was greater for photointerpreted estimates of stand CC than for stand height, with the magnitude of the variability differing by study site, as well as by species, inventory stand structure, age and ALS canopy complexity. Whereas the relationship between H API and H ALS was generally linear, the relationship between CC API and CC ALS was non-linear ( Figure 2). H API was overestimated relative to the H ALS data in BC and QC, and underestimated in ON, while conversely, CC API was underestimated in BC and QC, and overestimated in ON, relative to CC ALS . Our objective was to quantify the consistency of photointerpreted estimates of CC and height, hence when comparing the photointerpreted values to the ALS-reference, we focused on the standard deviation of the differences between the API and ALS estimates (SD and SD%). The mean difference (MD and MD%) provided an indication of bias; however, this bias could have resulted from the implementation of the existing photointerpretation standards using the reference data, specific ALS point cloud characteristics, or in the case of the BC site, the time elapsed between the air photo and ALS data acquisition. Note that in the BC site, the issue of time between acquisitions is less pronounced for heights, which were projected forward in the inventory to match the ALS acquisition date. No such forward projection is possible for the CC estimate. While we have endeavoured to emulate API protocols for estimating CC with the ALS data-to the greatest extent possible-this is particularly challenging since each jurisdiction has very different approaches Forestry to the estimation of CC using API (Table 2). In cases where the MD is large, the variability in API estimates (SD, SD%) will also be greater. This was an issue at the BC site, given the nature of the CC API estimate (a continuous variable) and the non-linearity in the relationship between CC API and CC ALS (Figure 2), and should be considered when interpreting the results. Our results showed that often minor differences between compared variables were statistically significant (e.g. MD value of −0.15 m for stands <80 years old in QC). When interpreting the results, it is therefore important to consider not only the outcome of the statistical testing, but also the practical significance of the reported result. For example, in situ tree height measurements done with a hypsometer are in most cases considered equivalent when the difference is less than 0.5 m (Luoma et al., 2017;Wang et al., 2019). Therefore, if the difference between heights derived with API and ALS is lower than the error expected with traditional forest mensuration techniques, the two should be considered equivalent.
A comparison of the non-stratified data showed that the variability in photointerpreted stand height (H API ) ranged between 24.96 per cent in QC and 29.24 per cent in ON. Research has demonstrated stability in H ALS across a range of site conditions and instruments in QC (Fradette et al., 2019). In ON, we observed greater variability in H API estimates for stands that were interpreted as being short (i.e. <5 m). We suspect that in recently harvested stands, residual trees may have been ignored by the photointerpreters, yet were included in the ALS point cloud and derived estimates. These large differences were present only in ON and influenced the overall assessment of variability in H API .
Different post hoc stratifications used in this study provide more insight into the nature of the variability in photointerpreted estimates of stand height ( Figure 3) and CC (Figure 4). For all study sites, stratification by dominant species indicated that the photointerpretation of height and CC was more challenging for some tree species (Tables 7-9).
Quantifying the precision of forest stand height and canopy cover estimates from air photo interpretation Stands <80 years of age in ON were the only case where the variability in H API were markedly larger than for the older stands (SD% = 70.67 per cent vs 16.35 per cent; Table 8). We relate this effect in part to the aforementioned issue of differences between H API and H ALS for stands with residual trees in harvest blocks, which may be ignored in the API estimate. For CC, our results indicated that in the majority of stratifications considered across the study sites, there was a significant difference between CC API and CC ALS , with the only exception being stands ≥80 years in QC (Table 9). Variability in CC was greatest for BC; however, the aforementioned temporal differences between API and ALS may have influenced the variability. In addition, the height threshold used for the ALS CC calculation (2 m) may have also influenced this result. The same threshold was applied consistently across all sites; however, in the coastal temperate forests of BC, understorey vegetation can exceed 2 m in height and this would have been included in the CC ALS , but not CC API . In terms of ALS complexity class, the magnitude of the variability in API estimates were comparable in both BC and ON for CC. In QC, however, variability for heterogeneous stands was more than double that of stands that were considered homogenous. This in part may be explained by the different nature of the forest types present, with more broadleaf and mixedwood stands at the QC site. We observed greater variability in API estimates of height in vertically heterogeneous stands in ON, where SD% was more than double that of vertically homogenous stands (41.87 per cent vs 18.62 per cent).
Our results indicated that the influence of CC on the variability in photointerpreted stand height and vice versa, varied amongst the different study sites ( Figure 5). In general, CC had less influence on the photointerpretation of stand height, than height had on the interpretation of CC, especially in BC and QC. We theorize that the general lack of consistency in the results across jurisdictions relates to differences in photointerpretation standards and differences in forest types and structures. For example, in BC and ON, the magnitude of the underestimation of CC is greatest in the tallest stands ( Figure 5), suggesting that photointerpreters may focus more on upper canopy layers when estimating cover and not account fully for shorter trees in the stand, and/or these shorter trees may be obscured in shadow or by upper canopy layers, precluding their inclusion in the estimate of stand-level CC. In BC, shorter stands in the forest types studied herein can have simpler vertical structures with less understorey, enabling a more consistent interpretation of CC. Theoretically, stands that are more open should provide the opportunity for more consistent estimation of stand height, as photointerpreters may be able to see the terrain surface in these stands. However, in the QC study area, shrubs and low vegetation are common, which may obscure the ground and contribute to under-or overestimation of stand height in open stands. In their simulation of photointerpretation uncertainty on NFI estimates, Magnussen and Russo (2012) estimated uncertainty for stand height by adding error to the photointerpreted height with a mean of 0 and a standard deviation that varied by maturity class (regeneration, sapling, young, immature mature, overmature), vertical structure (single layer, multilayer) and species group (softwoods, hardwoods and mixedwood). The results presented herein support the need to account for variability in these factors when attempting to account for uncertainty in forest inventory estimates. Uncertainty for crown closure was determined by Magnussen and Russo (2012) via interviews with photointerpreters that indicated an error of ∼6 per cent, and thus uncertainty was determined as a random draw from a beta distribution with a mean equal to the interpreted value for crown closure with a standard deviation of 0.06. The authors also note that they used data from two different jurisdictions, but assumed that the level of uncertainty was the same in both, despite differences in interpretation guidelines, aerial imagery and forest structure, assuming that the impacts of these would likely be minor. The results we presented herein suggest that an error of 6 per cent may be too low for characterizing uncertainty in API estimates of CC and, moreover, as uncertainty does vary between jurisdictions and is influenced by jurisdiction-specific photointerpretation procedures and stand attributes, those factors should also be considered.
The results of this study have quantified how photointerpreted estimates of stand height and CC vary by jurisdiction, dominant species, structure, age and canopy complexity classes. Although we identified several patterns in the variability of the photointerpreted estimates that were common in all three study sites, especially for height, we found that the variability in the photointerpreted estimates was not consistent, nor was the magnitude or direction of the bias between the API and the reference estimates. Of particular note, photointerpreted estimates of stand height were less variable than photointerpreted estimates of CC. Given the importance of CC estimates for a range of applications (Jennings and Brown, 1999), including as indicators of ecological processes (McElhinny et al., 2005) and in the definition of forest area (Wulder et al., 2020), additional or refined approaches to improve consistency in photointerpreted estimates of CC are warranted.
API of forest inventory attributes is a subjective process and a difficult and challenging task. Skilled interpreters often have extensive field and inventory experience, as well as Forestry local knowledge that enable their interpretations. However, interpretation projects for large forest management areas often require numerous interpreters and achieving consistency amongst interpreters is challenging. Moreover, forest inventories include an increasing number of attributes to address a broad range of forest management information needs (White et al., 2016), thereby also increasing the scope and complexity of the photointerpretation process. Added to this operational reality is the fact that the pool of experienced photointerpreters for forest inventory is diminishing, putting further pressure on forest inventory programs. All of these factors, combined with varying jurisdictional standards, likely contribute to the variability in photointerpreted estimates we have quantified herein and underscore the opportunities afforded by technologies such as lidar to improve the consistency and precision of forest inventory information.

Conclusion
ALS-derived forest inventory attributes have been quantified as accurate and consistent over a given area of interest. Given the known quality and consistency of ALS-derived forest inventory attributes, these data are highly desired by forest managers and practitioners. The higher cost of ALS over photointerpretationbased forest inventory has meant that in some jurisdictions, ALS data are strategically obtained over areas of high interest or value. Such a calculated implementation strategy allows for the benefits of ALS-derived inventories to be captured over specific areas, while minimizing costs of data acquisition, processing and modelling, among other considerations. Photointerpreted forest inventory attributes remain as a cost-effective means to represent larger areas with strategic forest inventory information needs in mind. With emerging opportunities to modernize forest inventory practices, knowledge of the precision and accuracy of attribution captured in photo-based forest inventories is essential to fully determine the precision and accuracy gains that may be enabled by the increasingly diverse range of remote sensing technologies available to support the forest inventory process.
In this study, we have demonstrated that uncertainty in photointerpretation of stand height and CC does vary by dominant species, inventory stand structure, age, and ALS canopy complexity, and that uncertainty in photointerpreted estimates of height is generally less than the uncertainty associated with photointerpreted estimates of CC. Importantly, we found that this variability in photointerpretation uncertainty also differed by jurisdiction, reflecting differences in forest types and photointerpretation standards. In practical terms, there are several implications of these results. Firstly, we determined that photointerpreted stand heights are more precise than estimates of CC, suggesting that approaches to improve the precision of CC estimates for individual photointerpreters and across photointerpretation teams may be warranted. Secondly, studies that seek to simulate the impact of photointerpretation uncertainty on subsequent estimates (i.e. forest area, site index, etc.), or alternatively that seek to quantify the gains in precision that may be afforded by technologies such as ALS or digital aerial photogrammetry, should consider the impact of dominant species, age and structure on uncertainty in the estimation of both stand height and CC. For a country such as Canada with a multitude of jurisdictions responsible for forest management, each with their own forest inventory and photointerpretation standards, regional variability must also be considered in determining uncertainty in photointerpretation estimates.