Kimberley D. Brosofske and others, A Review of Methods for Mapping and Prediction of Inventory Attributes for Operational Forest Management, Forest Science, Volume 60, Issue 4, August 2014, Pages 733–756, https://doi.org/10.5849/forsci.12-134
Forest inventory attributes are an important source of information for a variety of strategic and tactical forest management purposes. However, field inventories cannot feasibly be conducted contiguously across large areas, especially at a resolution fine enough to be useful for operational management. Therefore, many quantitative modeling and prediction methods have been developed and applied to predict and map forest attributes, with the goal of providing an accurate, spatially continuous, and detailed information base for practitioners of forestry and ecosystem management. This article reviews the most commonly used prediction techniques in the context of a comprehensive modeling framework that includes a discussion of methods, data sources, variable selection, and model validation. The methods discussed include regression, nearest neighbor, artificial neural networks, decision trees, and ensembles such as random forest. No single technique is universally superior for predicting forest inventory attributes; the ideal approach depends on goals, available training and ancillary data, and the modeler's interest in tradeoffs between realism and statistical considerations. Ancillary data that prove useful in these models typically include climate and topographic variables as well as vegetation indices derived from optical remote sensing systems such as Landsat. However, the use of airborne LiDAR in modeling of forest inventory attributes is increasing rapidly and shows promise for operational forest management applications. These considerations are encapsulated within a generalized model development framework that provides a structure against which tradeoffs can be evaluated.
Detailed, accurate information about forest composition and structure is needed for effective management of forest ecosystems. Whereas the goal of forest inventory is to quantify the amount and type of forest resources and related attributes in a given area, the goal of forest mapping is to depict the spatial distribution of those resources and related attributes (Corona 2010). Detailed information on forest composition and structure is necessary for silviculture (e.g., Pond et al. 2014), ecological restoration and risk assessment (e.g., Pierce et al. 2009), assessment of biodiversity and habitat (e.g., Turner et al. 2003, Martinuzzi et al. 2009, Helmer et al. 2010), carbon management and reporting (e.g., Hall et al. 2006, Gibbs et al. 2007, Froese et al. 2010, Hoover and Rebain 2011), and forest health assessment and management (e.g., Solberg et al. 2004, Wolter et al. 2009), among other purposes. Forest inventories can provide this information for sampled plots, which can be summarized to the stand or higher levels via standard statistical procedures. However, forest and ecosystem managers benefit from spatially continuous maps of inventory data that can be tailored to specific objectives or projects.
Information needs differ across the forest planning spectrum. Although strategic planning efforts typically leverage coarse-resolution information, fine spatial resolutions such as stand-level summaries and tree-level attributes may be needed for tactical planning and operational forest management (Falkowski et al. 2009b). For example, tree-level data are needed for initialization of forest growth and dynamics models for use in operational management (Falkowski et al. 2010), whereas county-, state-, or regional-level inventory summaries are used for forest planning at the regional or national levels (McRoberts et al. 2010b). Efforts to conserve biodiversity and restore ecosystems for the benefit of wildlife communities often also require information at fine spatial scales. For example, Lesak et al. (2011) found forest structure attributes derived from airborne light detection and ranging (LiDAR) to be useful in predicting songbird diversity in temperate forests. Because data can be easily summarized and scaled up, spatially continuous (i.e., "wall-to-wall") maps of inventory attributes at detailed to moderate spatial resolutions (e.g., 1–30 m) provide the greatest flexibility for meeting the widest variety of forest information needs by allowing researchers and forestry practitioners to tailor analyses to meet particular objectives.
In the context of operational management, forest inventory attributes may refer to a wide range of variables including individual tree measurements (e.g., species, diameter, height, volume, and biomass) as well as stand- or plot-level averages or totals of tree-level measurements (e.g., basal area, tree density, quadratic mean diameter, biomass, and volume). Collecting detailed inventory data across large, contiguous regions is not practical via traditional forest inventory methods because of time, expense, and other constraints; therefore, it is useful to have an effective way to predict attributes of interest at unsampled locations. A variety of statistical methods have been used toward these goals. Although regression is common, imputation, interpolation, and machine-learning algorithms are also frequently used, typically by taking advantage of remotely sensed and other ancillary data sets that are available across an area of interest. The ancillary data function as explanatory variables in models predicting necessary forest attributes, using ground samples from the forest inventory or other source (i.e., reference data) for model training and validation. Most comparisons of modeling and prediction methods that exist in the literature apply two or three methods to specific data sets (e.g., Berterretche et al. 2005, Hudak et al. 2008, Eskelson et al. 2009, Powell et al. 2010, Goerndt et al. 2011). Because the number of individual studies and methods used is large, it can be challenging to gain even a broad sense of strengths and weaknesses of the many different approaches.
In development of a prediction model, in addition to the difficulties associated with selection of an appropriate prediction method, the selection of ancillary data and variables can be daunting. Ancillary data can originate from many different sources, including satellite or airborne sensors (e.g., optical, radar, and LiDAR), digital elevation models, aerial photo interpretation, field-based mapping efforts, or a combination of these. For example, prediction of fine-scale (e.g., tree-level) attributes might require high spatial or three-dimensional resolution in the source data (e.g., LiDAR instead of Landsat). It can also be difficult to decide which variables are most appropriate within a particular context. Although certain variables may be useful for one study area, they may be irrelevant to another with different primary environmental gradients. Conversely, there may be some predictors that are universally useful (e.g., vegetation indices such as the normalized difference vegetation index [NDVI]). Variable selection can itself be the essence of a modeling exercise when the purpose is to understand the relationship between inventory and predictor variables. However, even if generating accurate predictions is most important, selecting useful variables and discarding others can have a significant impact on model performance (Hudak et al. 2008, Packalén et al. 2012). Indeed, variable selection can be more important than method selection in modeling of forest inventory.
Both regional forest planning and operational forest management could benefit from detailed, geographically extensive forest inventories derived via statistical modeling techniques leveraging remotely sensed data. Although many studies have examined the use of various types of ancillary data as well as the utility of various approaches for mapping particular data sets (e.g., Cairns 2001, Lefsky et al. 2001, Labrecque et al. 2006, Maselli and Chiesi 2006, Wallerman and Holmgren 2007, Hudak et al. 2009, Goerndt et al. 2011, Tonolli et al. 2011), a cohesive synopsis of analytical methods for developing geospatial forest inventory mapping models in the context of the overall modeling process is particularly valuable. The primary objective of this article is to provide an overview of common approaches for generating maps of operational forest inventory attributes across large spatial extents. This overview includes a review and discussion of (1) common statistical prediction methods (including strengths and limitations), (2) several variable selection approaches suitable for different data sets and prediction techniques, (3) sources of modeling data and capabilities of different remote sensing data for predicting various forest inventory attributes, and (4) relevant aspects and approaches for validating and assessing the accuracy of the prediction models and map products.
Modeling and Prediction Methods
The methods that have been used for the prediction and mapping of detailed forest attributes from geospatial data are many and varied. Most models have been built to predict univariate responses, but the desire to reduce analysis time, maintain the covariance structure of the data, and ensure realistic combinations of attribute values has resulted in increased emphasis on multivariate techniques that predict a suite of attributes simultaneously (Moeur and Stage 1995, Lister 2009, Nothdurft et al. 2009, Falkowski et al. 2010). Many parametric approaches are common in the forestry literature, including maximum likelihood estimation (Hagner and Reese 2007, Baatuuwie and Van Leeuwen 2011), discriminant analysis (Thenkabail et al. 2004, van Aardt et al. 2008), and regression (le Maire et al. 2011, Tonolli et al. 2011). However, the nature of many forest inventory variables complicates the use of these methods. The popularity of nonparametric approaches means that these receive significant attention in our review. However, we include an overview of regression as well, given the importance and unique advantages of that method.
Regression
As a parametric technique, regression has traditionally been the most widely used method for predicting forest attributes of operational interest from ancillary data and remains a common analytical tool (e.g., Fiorella and Ripple 1993, Means et al. 1999, Wolter et al. 2009). For example, le Maire et al. (2011) used stepwise multiple linear regression to predict stand dominant height and stand merchantable wood volume from age, NDVI, and a number of bioclimatic variables in Brazil. Hudak et al. (2006) used multiple linear regression models to predict and map basal area and tree density in temperate coniferous forests. In addition, many of the studies detailing the operational use of airborne LiDAR data in forest inventories have used regression to predict forest inventory variables such as tree height, basal area, volume, biomass, and density (e.g., Næsset 2007, Woods et al. 2011). For example, Stephens et al. (2012) used a regression developed from airborne LiDAR height and canopy cover metrics to model total carbon stocks in New Zealand.
The most common form of regression for forest attribute prediction uses a linear or intrinsically linear multiple regression equation solved using ordinary least squares (OLS) (e.g., Næsset 2007, le Maire et al. 2011, Tonolli et al. 2011). Intrinsically linear equations are those that may be linearized by transformation, such as multiplicative models that are linear on the log scale (e.g., Næsset 2002). When the coefficients are transformed, they may be estimated with OLS; however, if the coefficients are back-transformed to the original scale, adjustments may be necessary to correct for bias (e.g., Fehrmann et al. 2008). For example, Næsset et al. (2005) used a multiplicative regression model, linearized with a log transformation, to predict forest biophysical attributes (mean height, volume, and others) using derived attributes from LiDAR in Norway. In some cases, truly nonlinear models may better represent many of the ecological relationships of interest at the operational level. For example, Muukkonen and Heiskanen (2005) used nonlinear regression to model boreal forest attributes including volume, age, and overstory biomass as a function of ASTER spectral data but found that a linear model was sufficient for understory biomass.
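To make the log-transformation and back-transformation issue concrete, the following minimal sketch fits a multiplicative volume model on the log scale with OLS and applies a common lognormal bias correction when returning to the original scale. It assumes the statsmodels library and uses synthetic data; the variable names and coefficients are hypothetical, not taken from any of the cited studies.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical training data: plot volume (m^3/ha) vs. a LiDAR mean-height metric.
h_mean = rng.uniform(5, 30, 200)
volume = 2.5 * h_mean**1.6 * rng.lognormal(0.0, 0.2, 200)  # multiplicative error

# The multiplicative model V = a * h^b * e is intrinsically linear:
# ln(V) = ln(a) + b*ln(h) + ln(e), so it can be fitted by OLS on the log scale.
X = sm.add_constant(np.log(h_mean))
fit = sm.OLS(np.log(volume), X).fit()

# Naive back-transformation exp(yhat) underestimates the conditional mean;
# a common correction multiplies by exp(sigma^2 / 2).
sigma2 = fit.mse_resid
volume_pred = np.exp(fit.predict(X)) * np.exp(sigma2 / 2.0)
```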
Frequently, the usual OLS assumptions about errors (independent, homoskedastic, and Gaussian) (Sokal and Rohlf 2012) are not met, and remedial measures are used. These include transformation of the response variable (e.g., Frazer et al. 2011), weighted least squares (e.g., Brown et al. 1989), or a shift to alternative methods for model fitting, such as partial least squares (e.g., Coops et al. 2003, Townsend et al. 2003), generalized method of moments (e.g., Nord-Larsen and Schumacher 2012), or mixed-effects models using restricted maximum likelihood (e.g., Fehrmann et al. 2008). The optimum approach depends on the context of the problem. The use of generalized linear models is common when the goal is modeling distributions (e.g., diameter and height) and the error covariance matrix is explicitly non-Gaussian. See, for example, Breidenbach et al. (2008), who modeled diameter distribution using a Weibull function. Parameter estimates from OLS can be unstable in the presence of multicollinearity, which can lead to models that extrapolate poorly outside of the training data or are difficult to interpret. Careful screening during model construction is important, using diagnostics such as condition number (e.g., Popescu et al. 2003, Næsset et al. 2005) or variance inflation factor (Sokal and Rohlf 2012). Alternative methods like partial least squares or canonical correlation analysis (CCA) (e.g., Cohen et al. 2003, Schroeder et al. 2007) can reduce multicollinearity because they extract linear combinations of predictors that optimize the relationship with the response variable. However, such models are much more difficult to interpret and may not provide any measurable improvement over OLS in development or application (Næsset et al. 2005).
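The screening step can be illustrated with a short sketch of the two diagnostics named above, the variance inflation factor and the condition number. This assumes statsmodels and numpy and uses synthetic, deliberately collinear data; the thresholds in the comment are common rules of thumb, not values from the cited studies.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize before diagnosing
vifs = [variance_inflation_factor(Xs, j) for j in range(Xs.shape[1])]
cond = np.linalg.cond(Xs)                   # condition number of the design matrix
# Common rules of thumb flag VIF > 10 or condition numbers above roughly 30.
print(vifs, cond)
```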
OLS regression assumes that there are no measurement errors in the explanatory variables, which is often not realistic, especially with regard to remote sensing variables (Curran and Hay 1986, Berterretche et al. 2005). Standard major axis (SMA) regression has been proposed and used increasingly as a way to account for the errors in both the response and explanatory variables (Cohen et al. 2003, Berterretche et al. 2005). The method is also known as reduced major axis or geometric mean regression, but the term SMA regression is preferred (Sokal and Rohlf 2012). Cohen et al. (2003) compared SMA and OLS regression for predicting leaf area index (LAI) and canopy cover and found that SMA regression maintained the variance structure of the observations in the predictions, whereas OLS regression did not. Schroeder et al. (2007) preferred SMA regression for this reason, using it to predict tree cover (as an absolute percentage of the ground surface) from Landsat reflectance bands applied to normalized Landsat images in a date-invariant approach.
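Because SMA has a simple closed form (the slope is the ratio of standard deviations, signed by the correlation), it is easy to implement directly. A minimal sketch with synthetic data follows; the interpretation of x as a spectral index and y as canopy cover is hypothetical.

```python
import numpy as np

def sma_fit(x, y):
    """Standardized major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return intercept, slope

rng = np.random.default_rng(0)
x = rng.normal(size=100)                 # e.g., a Landsat-derived index
y = 2.0 * x + rng.normal(size=100)       # e.g., observed canopy cover
b0, b1 = sma_fit(x, y)
# Because the slope equals the ratio of standard deviations, predictions
# b0 + b1 * x reproduce the variance of the observations, unlike OLS.
```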
Recursive Partitioning: Classification and Regression Tree (CART) and Multivariate Adaptive Regression Splines (MARS)
CARTs, sometimes called decision trees, work by recursively subdividing training (i.e., reference) data into more and more homogeneous subgroups. Classification trees are those that predict a categorical response, such as forest type, whereas regression trees predict a continuous response, such as basal area. At each level, a splitting rule is developed to partition the observations into two subgroups: for classification, the rule maximizes intergroup variability, whereas for regression the rule minimizes the sum of squared residuals (Hastie et al. 2009). This results in a hierarchical “tree” that places observations into end nodes with other similar observations.
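A brief illustration of a regression tree follows, using scikit-learn and synthetic data (the predictor names are hypothetical); the printed rules show the hierarchical splits described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (300, 2))                      # e.g., NDVI and a texture metric
basal_area = 40 * X[:, 0] + rng.normal(0, 5, 300)    # synthetic response

# A regression tree recursively splits the data so that the residual sum of
# squares within the resulting subgroups is minimized.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20).fit(X, basal_area)
print(export_text(tree, feature_names=["ndvi", "texture"]))
```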
Decision trees have been used extensively for classification of forest and other vegetation types (e.g., Ruefenacht et al. 2008, Mora et al. 2010) as well as for prediction of forest structural attributes (e.g., Falkowski et al. 2005, Blackard et al. 2008, Helmer et al. 2010). Decision trees have a large number of characteristics that make them an attractive option for prediction in forestry. They are nonparametric and can readily accommodate nonlinear responses, variable interactions, both continuous and categorical explanatory variables, and missing values and are relatively unaffected by outliers or multicollinearity (Urban 2002, Piramuthu 2008). Decision trees can also handle large data sets and large numbers of variables and are useful for variable reduction because a measure of the relative importance of the different variables can be calculated. Here, relative importance is a sum of internal metrics used to construct individual trees and is different from “variable importance,” commonly provided for some ensemble methods (see Hastie et al. (2009) for a detailed explanation). Trees can be represented graphically with ease, and therefore even complex data structures such as thresholds and nested responses can be interpreted readily (Young et al. 2009). Although typically used to predict a univariate response, De'ath (2002) demonstrated their utility in predicting multivariate ecological responses. Limitations of decision trees have been found to include potential instability in the models (i.e., different tree structures) that can result from slight changes to the input data (Prasad et al. 2006). Although use of closely correlated variables as surrogates for missing data allows flexibility in the use of incomplete data sets, this can result in very different end models if a variable used heavily as a surrogate is removed from the analysis. In addition, some decision tree algorithms (e.g., CART) (Hastie et al. 2009) also show selection bias, in which a predictor is more likely to be chosen if it has a greater number of distinct values or a larger sample size (Young et al. 2009). Some algorithms have been developed to counter this problem (e.g., GUIDE) (Loh 2011). A large imbalance of cases in the response classes can also result in a greater number of predictions falling within the more abundant class (He and Garcia 2009). The use of stratified sampling designs may help balance sample size across classes.
MARS (Friedman 1991) produces a model based on local relationships between response and predictors using a set of adaptive piecewise basis functions (i.e., linear regressions) to fit separate splines to intervals of the explanatory variables. The “knots” (endpoints of the intervals) and the identification of the predictors are determined by the basis functions in a forward and backward stepwise procedure that involves overfitting the model and then removing the knots that contribute least to model fit. For continuous variables, the discontinuous splits obtained with decision trees may be arbitrary. By combining spline fitting and recursive partitioning, MARS overcomes this problem and models continuous smooth functions that reflect the local pattern. However, predictions can be unrealistic because the local nature of the data may affect the basis functions excessively; further, selection of appropriate parameters can be tedious and time-consuming, often involving trial and error (Prasad et al. 2006). MARS has been used successfully to predict forest stand biomass, age, diameter, and crown cover in the western United States (Moisen and Frescino 2002), and Muñoz and Felicísimo (2004) demonstrated its utility for mapping potential plant distributions in Latin America and Spain.
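Dedicated MARS implementations handle the knot search and backward pruning automatically; the toy sketch below (synthetic data, a single knot fixed by hand) only illustrates the piecewise "hinge" basis functions that MARS combines, not Friedman's full algorithm.

```python
import numpy as np

def hinge_pair(x, knot):
    """The mirrored MARS basis functions max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

rng = np.random.default_rng(0)
x = np.linspace(0, 30, 200)                                  # e.g., stand age
y = np.where(x < 12, 2.0 * x, 24 + 0.3 * (x - 12)) + rng.normal(0, 1, 200)

h1, h2 = hinge_pair(x, knot=12.0)            # MARS would search for this knot
A = np.column_stack([np.ones_like(x), h1, h2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS on the hinge features
```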
Artificial Neural Networks (ANNs)
ANN models are nonlinear, “black box” approaches to prediction and mapping that were originally based on the biological problem-solving process of the brain. The idea is that the data describing the unknown situation are presented to the trained model, which then provides the predictions. The training of the model involves “learning” the patterns from the observations, which takes place in different ways, depending on the ANN chosen. There are many types of ANN algorithms that can be used based on the specific objectives and characteristics of the data set. The most popular algorithm is the multilayer perceptron (MLP) (also known as multilayer feed-forward networks trained by back-propagation or simply back-propagation networks), although other algorithms such as Kohonen self-organizing maps and Hopfield networks have also been used with success (Lek and Guégan 1999). The MLP is a supervised model in that data with known outputs are used in the model training procedure. Data flow from the input layer containing the explanatory data through one or more hidden layers, with predictions provided in the output layer. The hidden layers are where the “learning” takes place and contain a number of nodes that represent a set of models or weights that are applied to the predictors to obtain the output values. These weights are determined in each hidden layer node by linearly combining all the input variables into a derived variable and then performing a nonlinear transformation on the derived value (Campbell et al. 2007). The learning process in the MLP algorithm consists of checking predictions based on a certain set of weights against the known values and then going back and correcting the weights to reduce the error if needed, until the network gives acceptable predictions. The user specifies the number of hidden layers and hidden layer nodes to use in the model. Multiple hidden layer nodes may increase the predictive ability of the model, but too many may result in model overparameterization and decreased predictive ability (Blackard and Dean 1999). Use of too many nodes will also increase the computing resources needed to train the model. Selection of appropriate model architecture and parameters can be challenging, typically requiring a trial-and-error process (Blackard and Dean 1999, Peng and Wen 1999, Niska et al. 2010).
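As an illustration, the following sketch trains a small MLP with one hidden layer of 20 nodes using scikit-learn on synthetic data; standardization of the predictors is included because MLP training is sensitive to input scaling. The architecture and parameter values are arbitrary choices for demonstration, not recommendations.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))                      # e.g., spectral bands and indices
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

# One hidden layer with 20 nodes; weights are adjusted by back-propagation
# until the held-out validation error (early_stopping) stops improving.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                 early_stopping=True, random_state=0),
)
mlp.fit(X, y)
```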
Although not as widely used as other approaches in ecology, ANNs have been applied to forestry data, especially classification problems (Fitzgerald and Lees 1992, Peng and Wen 1999, Rogan et al. 2008). Fewer studies have used ANNs to predict continuous variables (but see Jensen et al. 1999, Moisen and Frescino 2002, Niska et al. 2010). Ingram et al. (2005) used ANNs to model plot- and stand-level basal areas from Landsat ETM+ spectral data in a tropical forest, but they were not able to obtain a significant ANN model for predicting stem density from the same predictors. A study by Niska et al. (2010) in the boreal forest achieved high accuracy using three ANN models to predict species-specific plot and stand volumes from a number of spectral and textural variables derived from aerial photographs and airborne LiDAR data.
Neural network approaches may be well suited to problems in which accurate predictions are the primary goal but, because of their black box nature, are less suited to situations in which the relationships underlying the predictions are of primary interest (Cartwright 2008). Their uncomplicated application to data sets containing nonlinearity, nonnormality, heterogeneity, and multicollinearity (Campbell et al. 2007) makes ANNs potentially useful tools for ecologists, particularly when the underlying processes and structure are not known (Peng and Wen 1999). In addition, they are able to use data originating from multiple sources in the same model (Atkinson and Tatnall 1997). However, Niska et al. (2010) note that careful variable selection is important when an ANN is used because variables that have no relationship with the response can have a large influence on the predictive power of the model. The computing time required for training and testing the model can also be prohibitive, and the method can be difficult to understand and implement effectively, especially with regard to optimal selection of the hidden layers and nodes (Blackard and Dean 1999, Peng and Wen 1999, Moisen and Frescino 2002).
Ensembles: Random Forest and Other Methods
A collection of predictive models may be combined into an ensemble to leverage their collective strengths (Hastie et al. 2009). Perhaps the most popular ensemble method at present is the random forest (RF) decision tree algorithm, which combines predictions from a myriad of individual decision trees (Breiman 2001, Cutler et al. 2007). RF uses bootstrapping to select samples that are then subjected to model fitting, but only a small number of randomly selected predictor variables are used to find the best split at each node, decreasing the correlation between trees and thus the variance of the ensemble. Each tree is grown fully and then used to predict the out-of-bag (OOB) observations (i.e., observations not included in the bootstrap sample). A simple majority or average of the predictions determines the class placement or prediction of an observation. RF maintains the advantages of simpler CART algorithms but is not subject to overfitting and tends to be less biased (Breiman 2001, Prasad et al. 2006). A measure of relative variable importance similar to that used for CARTs may be obtained, but RF can also use OOB samples to construct a different metric. Here, variable importance is calculated as the difference in prediction accuracy when OOB values for each variable are permuted, compared with the actual values (Hastie et al. 2009). Randomization nullifies the value of a variable; thus, a large decrease in accuracy when a variable is permuted implies that the variable is important.
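A minimal RF sketch with synthetic data follows. Note that scikit-learn's permutation_importance, used here, permutes values on whatever data set is supplied rather than strictly on the OOB samples, so it approximates rather than reproduces Breiman's OOB-based importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 8))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 400)

# max_features controls how many randomly chosen predictors are considered at
# each split; oob_score=True scores each tree on its out-of-bag observations.
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                           oob_score=True, random_state=0).fit(X, y)
print("OOB R^2:", rf.oob_score_)

# Permutation importance: the decrease in accuracy when one predictor's
# values are shuffled (computed here on the training data, not strictly OOB).
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)
```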
Although the application of RF in forestry is relatively recent, many studies have had success using it to classify vegetation (Grossman et al. 2010) and predict various forest attributes such as live biomass (Baccini et al. 2004, Houghton et al. 2007), tree cover or species presence (Evans and Cushman 2009, Liknes et al. 2010), forest structural stage (Falkowski et al. 2009b), and disturbance (Stumpf and Kerle 2011), as well as individual tree attributes (Yu et al. 2011). For land cover classification, Gislason et al. (2006) achieved higher overall accuracy using RF than using a single decision tree, although the authors did not specify whether this difference was statistically significant. Although Prasad et al. (2006) found that RF could be computationally time-consuming, Gislason et al. (2006) found it to be very fast. RF is less interpretable than simpler tree methods because it is cumbersome to investigate the large collection of individual trees. However, metrics such as variable importance aid in interpretation, and other techniques such as conditional density estimation can be used post hoc to reveal the underlying biophysical relationships between attributes of interest and important predictor variables (Falkowski et al. 2009b, Henareh Khalyani et al. 2012).
Although RF has some significant advantages over single-tree methods, some recent studies have shown that variable selection can have an important influence on the results. For example, Falkowski et al. (2009a) and Murphy et al. (2010) have suggested that multicollinearity among the predictors might decrease RF model accuracy. Because tree-based methods are robust to overfitting, they have been commonly applied in contexts in which there are a large number of predictors relative to the number of observations in the training data set. However, because RF uses only a subset of predictor variables to find the best split at each node, the result can be sensitive to the inclusion of a large number of predictors with no causal relation, which reduces the probability of selecting useful variables (Hastie et al. 2009). Thus, careful variable selection strategies during model construction may lead to significant improvements.
Other ensemble methods have been used less commonly for operational forest management. Bootstrap aggregation, or “bagging,” is a method in which, as in RF, many decision trees are grown and averaged using bootstrapping of the training data set. It differs from RF in that all variables are used to find the best split at each node, and thus bagging lacks the advantages of RF when construction of less correlated trees is desired. Prasad et al. (2006) found that bagging trees performed better than simple regression trees and MARS, and about as well as RF, in mapping the distributions of four common forest tree species in the eastern United States under different climate scenarios. In their evaluation of land cover classification techniques, Gislason et al. (2006) also found that their best bagging classifiers performed approximately on par with RF but better than a single decision tree. Stochastic gradient boosting (SGB) (Friedman 2002) is another refinement of the single decision tree approach. In SGB, many small trees are built sequentially from “pseudo” residuals (the gradient of the loss function of the previous tree). At each iteration a tree is built from a random subsample of the data set, producing an incremental improvement in the model. Using only a fraction of the training data increases both computational speed and predictive accuracy, while also reducing overfitting (Moisen et al. 2006). SGB has been found to perform better than standard classification and regression trees and other modeling techniques for species modeling (Guisan et al. 2007), species mapping and basal area prediction (Moisen et al. 2006), prediction of aboveground biomass (Carreiras et al. 2012), and assessment of forest damage (Kupfer et al. 2008).
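For illustration, stochastic gradient boosting can be obtained in scikit-learn by setting the subsample fraction below 1.0, as in the sketch below (synthetic data; the parameter values are arbitrary rather than tuned).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.3, 400)

# Many small trees are fitted sequentially to the residuals of the ensemble
# built so far; subsample < 1.0 is what makes the boosting "stochastic".
sgb = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                max_depth=3, subsample=0.5,
                                random_state=0).fit(X, y)
```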
Nearest Neighbors
Nearest neighbor techniques (also called reference sample plot methods) (Kilkki and Päivinen 1987) predict the response variable(s) in an unsampled target unit (e.g., pixel) by computing a statistical distance metric between the target and reference samples, or “neighbors,” and then assigning the value(s) of the closest neighbor(s) to the target unit. The distance metric is computed from ancillary (sometimes termed “auxiliary”) variables drawn from the feature space (sensu McRoberts 2012) that are common to both the reference and target samples. In other words, the reference and target samples are linked via the ancillary variables, which are usually obtained from remote sensing. Much of the early work on nearest neighbor techniques originated and was conducted in the Nordic countries, especially Finland (Tomppo 1991, Tokola et al. 1996). This approach, often referred to generically as k-nearest neighbor (k-NN), can use any number (k) of nearest neighbors to impute the target value, usually by averaging or by taking the mode or median if the response is categorical. Besides the size of k, other choices to be made when this method is used include the distance measure (Chirici et al. 2008, Hudak et al. 2008) and neighbor weighting scheme (if any). Although nearest neighbor techniques can be used to predict a single response variable, they are especially useful for the prediction of multiple response variables simultaneously. For example, Vauhkonen et al. (2010) used nearest neighbors to simultaneously predict tree species, diameter, height, and stem volume from airborne LiDAR data in Finland.
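A minimal k-NN sketch with synthetic data illustrates the mechanics, including the multivariate case and inverse-distance neighbor weighting (both discussed further below); the reference/target split mirrors the plot/pixel distinction in the text, and scikit-learn is an assumed implementation choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
X_ref = rng.normal(size=(300, 5))           # feature-space variables at sampled plots
Y_ref = X_ref @ rng.normal(size=(5, 3)) + rng.normal(0, 0.2, (300, 3))  # 3 attributes
X_target = rng.normal(size=(1000, 5))       # the same features at unsampled pixels

# k-NN handles multivariate responses naturally: each target receives the
# (here inverse-distance-weighted) mean of its k nearest reference plots.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance",
                          metric="euclidean").fit(X_ref, Y_ref)
Y_pred = knn.predict(X_target)              # shape (1000, 3)
```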
Several distance metrics have been used with k-NN analysis to predict forest attributes (Chirici et al. 2008, Hudak et al. 2008). In addition to simple metrics that are independent of the response variable, such as Euclidean or Mahalanobis distance, many metrics that are based on the observations of the response have been developed. The most similar neighbor (MSN) approach (Moeur and Stage 1995) calculates distance (or similarity) using the results from CCA and then predicts values using k = 1. Both the canonical coefficients and canonical correlations are incorporated into a squared difference function to weight the ancillary variables according to their multivariate predictive power and allow preservation of the covariance structure of the response data set. MSN or a variant in which k > 1 has been applied in a wide variety of studies to predict and map forest structural attributes (LeMay et al. 2008, Nothdurft et al. 2009), and others have used this approach to obtain tree lists for unsampled areas (Eskelson et al. 2008). The gradient nearest neighbor (GNN) technique is similar to MSN but differs in that it uses CCA to establish relationships between the ancillary data and observed response variables as an initial step (Ohmann and Gregory 2002, Ohmann et al. 2011). Distances are calculated in multivariate ordination space using axis scores to weight the ancillary variables and eigenvalues to weight the axes, and attribute values of the single closest plot (k = 1) are assigned to the target pixel. Wilson et al. (2012) used a modified GNN approach to map tree species distributions at a moderate resolution in the eastern United States, noting that the approach has potential for operational application. Crookston and Finley (2008) incorporated the RF algorithm into k-NN as a distance measure using an RF proximity matrix, which is based on the number of times pairs of observations end up in the same terminal node, and modified the technique to allow for multiple response variables. RF-k-NN has been found to be a useful approach for imputation of complex forest attributes such as forest structural stage (Falkowski et al. 2009b) and snag characteristics (Eskelson et al. 2012), tree-level forest inventory data (Falkowski et al. 2010, Vauhkonen et al. 2010), and forest inventory variables at the plot level (Hudak et al. 2008) and pixel level (Chirici et al. 2008, McInerney et al. 2010). Other distance metrics have also been developed. Chirici et al. (2008) used three distance measures to predict growing stock volume in Italy: a modification of the Mahalanobis metric in which the variance-covariance matrix was computed in a fuzzy way, giving greater weight to observations of the response closer to the mean (fuzzy distance); a distance modified by multiple regression; and a distance modified by nonparametric weights. The Finnish multisource national forest inventory uses a four-step genetic algorithm to optimize weights based on large-scale variation in forest attributes as well as spectral information (Tomppo and Halme 2004, Tomppo et al. 2009). Despite the popularity of nearest neighbor techniques in the literature, there are surprisingly few critical comparisons of distance metrics. Hudak et al. (2008) compared several distance metrics, concluding that the RF approach was superior to Euclidean and Mahalanobis distances as well as to MSN and GNN for predicting basal area and tree density in a temperate conifer forest. Chirici et al. (2008) found that, for their data, fuzzy distance outperformed the other distance metrics, especially with regard to maintaining the stability of the predictions. Vauhkonen et al. (2010) compared RF and k-MSN approaches to predicting a multivariate set of tree structural attributes in Finland and found only marginal differences.
In k-NN, the optimal number of neighbors to use in calculating the attributes of the target is not well defined. Although accuracy of the predictions tends to increase with increasing k, variation in the predictions decreases as a result of averaging of a large number of values, and the differences between observed and predicted values increase for extreme values (Tuominen et al. 2003, McRoberts 2012). In addition, Franco-Lopez et al. (2001) found a tradeoff between reductions in overall error and decreasing producer's accuracy with increasing k. Several authors have also noted that, because predicted values represent an average of observed values when k >1, the values predicted for the target may be unrealistic (LeMay and Temesgen 2005); this can be especially true with multivariate response variables, which, when averaged, may produce a combination of attributes that is not seen in the observed population. Using a single neighbor may be preferred in some mapping applications to retain the full range of values in the data but using more than one neighbor might be more appropriate for accurate large-area prediction (Franco-Lopez et al. 2001). In the case of k = 1, it is extremely important to have a representative sample of the population because predictions are limited to the range of observed data. In many applications, it is desirable to optimize k using the reference data. McRoberts (2012) gives several options for optimization schemes, including choosing the value of k that minimizes a specific assessment criterion (e.g., root mean square error [RMSE]) or that minimizes over- or underprediction; he also suggests a multivariate optimization criterion.
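One simple way to implement such an optimization is to choose the k that minimizes cross-validated RMSE, as in the sketch below. McRoberts (2012) describes optimizing k against the reference data with various criteria; 10-fold cross-validation with scikit-learn is used here as one reasonable stand-in, not as his exact procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

def choose_k(X, y, k_values=range(1, 16)):
    """Return the k minimizing 10-fold cross-validated RMSE (one possible criterion)."""
    rmses = []
    for k in k_values:
        scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                                 scoring="neg_root_mean_squared_error", cv=10)
        rmses.append(-scores.mean())
    return list(k_values)[int(np.argmin(rmses))]
```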
Neighbor weighting in k-NN is most commonly accomplished using weights that are inversely proportional to the statistical distance between the target unit and the reference unit. Studies evaluating alternative weighting methods or optimization of the weighting parameter are scarce. Using a case study of univariate volume prediction from Landsat data in the boreal forest, McRoberts (2012) found that complex optimization schemes did not appreciably improve predictions over equal neighbor weighting. However, McRoberts (2012) recommended further study in the case of multiple response variables and larger sets of predictor variables. Weighting has been shown to be useful for accounting for spatial gradients (e.g., altitudinal or latitudinal) in multivariable models for volume by species group in Finland (Tomppo and Halme 2004) and for classification of forest into species and productivity groups in Finland and northern Italy (Tomppo et al. 2009).
Nearest neighbor approaches have been applied widely to the problem of predicting forest inventory information from remotely sensed data. Their popularity stems from several advantages. First, nearest neighbor approaches are intuitively simple and can therefore be readily understood and applied. They are also nonparametric in that they make no assumptions regarding the underlying distribution of the data. However, they can be computationally demanding, especially with very large data sets as are likely to be encountered when forest attributes are mapped at fine resolutions across large geographic areas. A review of the advantages and disadvantages of alternative nearest neighbor approaches used in forestry applications was recently published by Eskelson et al. (2009).
Model Development
Clearly, a large number of choices are available to the modeler in both the range of available methods and the number of readily available predictor variables in the form of remotely sensed and geospatial data. Careful decisions about method and variables are essential in model development and are intricately linked to the objectives of each modeling problem. Fundamentally, decisions may be driven by the degree to which the modeler seeks to explain or understand the system versus generating accurate predictions (Caswell 1976). If the goal of the model is to gain understanding, then ease of interpretation and ability to discern between alternative and potentially causal explanations is key. When the goal is prediction, then decisions about method and variable selection may be driven principally by the degree to which they improve accuracy, with little concern for or attention to interpreting model dynamics. In both cases, discarding irrelevant and redundant predictor variables can be important.
Method Selection
Several authors have compared the effectiveness of various techniques for predicting forest composition and structure. Pierce et al. (2009) found that GNN generally performed better than linear regression models or classification trees in temperate marine forests but that regression performed better in the more complex forests of adjacent temperate steppe and Mediterranean forests, suggesting that the best technique may be reliant on the characteristics of the particular forest or landscape and that nonparametric methods are not always better than parametric methods. In the boreal forest, Maltamo et al. (2009) found that MSN more accurately predicted a number of tree-level size and quality attributes than did a regression approach. Eskelson et al. (2012) found that RF-NN provided more accurate predictions of snag density by decay class than a two-step parametric model but less accurate classifications of the presence or absence of large snags than logistic regression. Berterretche et al. (2005) compared SMA regression and two geostatistical methods for predicting LAI and concluded that SMA was the most practical option because it did not need a high density of reference measurements for application. ANNs have been found to outperform traditional statistical approaches such as regression and discriminant analysis in a variety of studies (Blackard and Dean 1999, Jensen et al. 1999, Vasconcelos et al. 2001, Foody et al. 2003). Using Landsat data to classify complex tropical rainforest types, Sesnie et al. (2010) found classification accuracies to be comparable for ANNs and RF. However, other studies have found performance of ANNs to be worse than (Serpico et al. 1996) or similar to (Niska et al. 2010) those of more conventional models.
As the variety of outcomes of these studies suggests, there is no consensus on which method is "best" in a general sense. The determination of best depends on several factors that may be subjective, including the purpose of the analysis, available resources, data characteristics, underlying biophysical conditions of the forest, and the type of response variables (Figure 1). Powell et al. (2010) found that RF provided more accurate predictions (i.e., lower RMSE) of aboveground biomass than SMA or GNN, but the variance ratio indicated that SMA and GNN were better choices for preservation of the observed variance in the predictions. Eskelson et al. (2012) found that RF-NN produced greater accuracy than a parametric method, where accuracy was defined as the fraction of predictions within 30% of the true value. However, RF-NN also had a greater RMSE, revealing a tendency for k-NN to produce large over- and underpredictions across the range of the responses. Thus, there are tradeoffs that need to be considered when an analytical approach is selected (Table 1).
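The two criteria contrasted in these studies are straightforward to compute; a small sketch follows. The variance ratio is defined here as predicted over observed variance, an assumption consistent with its use above as a measure of variance preservation.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error."""
    return np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2))

def variance_ratio(obs, pred):
    """Predicted-to-observed variance; values near 1 mean the predictions
    preserve the spread of the observations (k-NN averaging shrinks it)."""
    return np.var(pred, ddof=1) / np.var(obs, ddof=1)
```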
Figure 1. A diagrammatic representation of the model development and validation/evaluation process. Inputs include the problem context, which affects method and variable selection decisions. The final product is model predictions.
Table 1. Advantages and disadvantages associated with methods for modeling and predicting forest inventory attributes.
In addition to weighing the relative importance of the different validation measures, statistical considerations such as model structure and characteristics of the data are important. Linear regression techniques have the most restrictive data assumptions, several of which may be violated for many important forest attributes. However, regression has the advantage that the relationship between predictors and response is fixed in the coefficients; the resulting models can be recycled or embedded in scalable systems that are not dependent on context-specific samples, as in k-NN. This property is essential in common forest inventory applications, such as individual tree volume and allometric biomass equations (e.g., Jenkins et al. 2003), which, once developed, can be used in many subsequent contexts. ANNs are free of distributional assumptions and do not require knowledge or definition of any underlying processes, but because they are a black box approach, the relationships underlying the predictions cannot be examined or interpreted. Decision trees and nearest neighbor methods are also distribution free and have several other advantages, but disadvantages may include selection bias or difficulty in selecting appropriate model parameters.
Consideration also should be given to whether the analysis will predict univariate or multivariate responses. Many studies have taken a univariate approach to predicting forest composition and structure at both large and small scales. For example, Iverson and Prasad (1998) used a number of climatic, soil, land cover, and topographic variables to individually predict the distributions of 80 tree species at the county level in the eastern United States with reasonable success for most species. However, although the univariate prediction approach was appropriate for the coarse resolution examined in their study, it could lead to biologically unrealistic predictions (i.e., unrealistic species combinations) at the finer resolutions typically required for operational use, suggesting that a multivariate approach might be the better option when composition for direct applications is mapped (Moeur and Stage 1995). Likewise, forest structural variables that are modeled independently could lead to predicted conditions that are biologically impossible, such as positive basal area predictions in areas of no forest cover. Although all of the methods discussed in this article can be adapted to handle multivariate problems, nearest neighbor methods probably handle these most easily and may also help preserve the covariance structure in the data when k = 1, if that is a desirable outcome.
Practical considerations related to computational time, resources, and analyst experience also play a role in method selection. For example, Moisen and Frescino (2002) compared five approaches to forest attribute imputation, including ANNs, geostatistical methods, CART, generalized additive models, and MARS; the authors found comparable accuracies among all five techniques when applied to real data sets, but computational time for the ANN was slowest and was described as having the "potential to be cripplingly slow" for some parameter optimization procedures. In the past decade, advances in computer technology and algorithm improvements (e.g., Finley and McRoberts 2008) have decreased the computational time required for many procedures such that it may not be the constraint it once was, but it can still be a relevant consideration. In addition, some techniques such as ANNs can be difficult to implement and understand, requiring a relatively steep learning curve, whereas others such as regression, decision trees, and nearest neighbor methods may be much more intuitive and easily implemented through widely available software.
Variable Selection
Useful models of forest inventory variables include predictor variables that are functionally or statistically related to the response(s) and discard those that are irrelevant or redundant. Clearly, distinguishing between these classes is essential when the purpose of modeling is to understand the underlying system (Rykiel 1996, Froese and Robinson 2007). However, the inclusion of irrelevant or redundant (correlated) variables can also have negative effects when models are developed for prediction. Under classic OLS, models remain unbiased even if irrelevant variables are included, but correlated variables can render models sensitive to the particular sample of calibration data. Many other estimation models have been shown to be sensitive to irrelevant or redundant predictor variables as well (Packalén and Maltamo 2007, Niska et al. 2010, Banskota et al. 2013a). For example, in a k-NN imputation approach, McRoberts et al. (2002) found that the inclusion of covariates that were unrelated to the response variables decreased the overall prediction accuracy, and Banskota et al. (2013b) and Pal and Foody (2010) demonstrated the necessity of conducting an analysis of variable importance before OLS regression and support vector machine (SVM) classification, respectively. Even approaches such as RF, which can deal with high-dimension data sets, are negatively affected by the inclusion of redundant variables in the same prediction model (Evans and Cushman 2009, Falkowski et al. 2010, Murphy et al. 2010). Variable selection can be particularly challenging when modeling forest inventory attributes, because there are typically a large number of readily available covariates in the form of remotely sensed and geospatial data.
Variable selection can be made on the basis of previous studies, through the implementation of a variable or model selection algorithm (Packalén and Maltamo 2007, Hudak et al. 2008, Lister 2009) or by conducting feature extraction techniques such as principal components analysis (PCA) or CCA (Lefsky et al. 2005a). Feature extraction techniques transform the original, high-dimension variable space into a subspace with a lower dimensionality, while preserving a high degree of the original variation (Sotoca et al. 2007). Several feature extraction techniques such as PCA (Pu and Gong 2004), singular value decomposition (Philips et al. 2009), and maximum noise fraction (Green et al. 1988) have been used in forest inventory and assessment applications. However, the use of linear combinations of predictors in approaches such as PCA may limit model interpretability because the individual predictors are combined in composite variables (Cohen et al. 2003). If the goal of the modeling effort is attaining accurate predictions rather than clear interpretability of the relationships, then techniques such as PCA might be useful. Another major limitation of these techniques is that transformations are carried out independent of response variables and are based on the global covariance matrix, which can blur distinguishable features related to the response variables (Miao et al. 2007). Conversely, CCA transforms independent variables in a manner that retains their covariance structure, which is useful in the prediction of the response variables (Lefsky et al. 2005a). The wavelet transform, which provides a multiscale representation of the original data, has been used for feature extraction in forestry applications, including forest canopy structure identification (Bradshaw and Spies 1992), crown closure mapping (Pu and Gong 2004), tropical species identification (Zhang et al. 2006), and pine species classification (Banskota et al. 2011). However, as with other feature extraction techniques, wavelet analysis lacks an automated method to select features and still needs a suitable technique for selecting a useful subset of the transformed variables.
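As a brief illustration of feature extraction, the following sketch applies PCA (via scikit-learn, an assumed implementation choice) to a synthetic, highly correlated feature set, retaining the components that explain 95% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
latent = rng.normal(size=(300, 5))                          # 5 underlying gradients
X = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(300, 40))

# Keep however many components are needed to explain 95% of the variance.
# Note the transformation ignores the response, the limitation noted above.
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_[:5])
```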
Most studies use some form of variable or model selection algorithm to choose predictor variables. Stepwise variable selection is commonly used in OLS regression (e.g., Lefsky et al. 2001, Hudak et al. 2006, Næsset 2007). Several stepwise procedures (e.g., forward, backward, and mixed) and criteria (e.g., F-test, Akaike information criterion, Bayesian information criterion, and Mallows' Cp) can be used to select variables (Sokal and Rohlf 2012). Alternatively, the best subset techniques exhaustively compare all possible subsets and return the best subset of variables according to some user-defined criterion. For example, Hudak et al. (2006) used stepwise multiple regression and best subsets regression to predict basal area and tree density from multispectral satellite and airborne LiDAR data. Decision tree or ensemble algorithms such as RF have a built-in ability to identify relevant predictor variables based on variable importance measures such as “permutation importance” or “Gini importance” (Strobl et al. 2008). Although RF is considered to be insensitive to the number of predictor variables, removal of irrelevant variables can improve the performance of the algorithm on retraining (Svetnik et al. 2003). For example, both Hudak et al. (2008) and Falkowski et al. (2010) used similar iterative stepwise looping procedures that are analogous to backwards stepwise multiple regression to identify candidate predictor variables in RFs. Packalén et al. (2012) term this “variable selection by random forests.” The RF method can also potentially act as a variable selection tool for other classifiers or regression techniques (Hudak et al. 2008, McInerney et al. 2010).
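A minimal sketch of such an iterative backward-elimination loop follows, using scikit-learn's RF and its built-in (Gini-based) importances rather than the exact procedures of Hudak et al. (2008) or Falkowski et al. (2010); the stopping rule and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_backward_selection(X, y, names, min_vars=3):
    """Drop the least important predictor each round, retraining in between."""
    keep = list(range(X.shape[1]))
    while len(keep) > min_vars:
        rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                                   random_state=0).fit(X[:, keep], y)
        worst = int(np.argmin(rf.feature_importances_))
        print(f"OOB R^2 = {rf.oob_score_:.3f}; dropping {names[keep[worst]]}")
        del keep[worst]
    return [names[j] for j in keep]
```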
After performing variable selection, some studies have reported the relative performance of each predictor or explored the reasons for particular variables being identified as important. This information is potentially very valuable for future efforts to model forest inventory data. Although relationships and important predictors may vary, depending on geographic region or scale, it is useful to have a starting point that could reduce time and effort in future model development and that could potentially (if consistencies are found) form the framework of a theory. Furthermore, some studies have suggested that variable selection may be more important than model selection when the goal is predictive accuracy. For example, Hudak et al. (2008) found that variable selection using RF improved the model fit for every nearest neighbor distance metric they compared. Packalén et al. (2012) found that optimal variable selection always improved model accuracy and in most cases was more important than their choice of nearest neighbor method.
Selecting Responses and Data Sets
Spatial forest inventory modeling can be complex and resource consuming. Informed selection of appropriate predictor variables a priori could increase the efficiency of the modeling process and the accuracy, utility, and effectiveness of the resulting model and map (Niska et al. 2010). We include here a summary of variables deemed important enough to have been included in many models and try to indicate when certain variables have been specifically determined to be more or less influential for the prediction of specific forest inventory attributes (also see Table 2).
Table 2. Common predictors of forest compositional and structural inventory attributes.

Ancillary variables refer to the variables related to climate (e.g., precipitation and temperature), topography (e.g., location, elevation, and landform), soil, and land cover, individually or in combination. Image derivatives refer to the bands or features derived from different transformations (e.g., band ratios and textures). The superscripts in the cited references denote the corresponding predictor variables used in the studies.
Reference and Feature Data
National forest inventories (NFIs) have been used as sources for reference data in forest attribute mapping efforts worldwide (e.g., Tomppo et al. 1999, 2008, McRoberts et al. 2002, Gjertsen 2007, Hirata et al. 2009, McRoberts 2009b, Teissier du Cros and Vidal 2009). Reference data may also come from ground samples obtained from other government or industry-initiated forest inventories (e.g., Pierce et al. 2009, le Maire et al. 2011) or independent project-based samples (e.g., Zheng et al. 2004, Falkowski et al. 2010, Niska et al. 2010). Pierce et al. (2009) combined field inventory data from several federal sources to map wildland fuels and forest structure in temperate forests in an effort to provide needed information for fire risk assessment and fuel treatment planning. Usually, reference data come from field measurements of actual conditions, but in some instances they may be derived from other sources, especially when field measurements would be difficult or expensive to obtain. For example, indices (e.g., successional stage and defoliation) derived from very fine-resolution imagery or airborne LiDAR data have been used as sources for reference data (Schroeder et al. 2007, Eklundh et al. 2009, McRoberts et al. 2010a).
Using existing data from NFIs can reduce costs and help ensure rigor in sampling design and consistency in data collection. In many instances, the sampling schemes have been updated to include variables relevant to current international issues such as biodiversity, sustainability, acid deposition, and carbon accounting (McRoberts et al. 2010b). However, not all countries have NFIs, or if they do, they may contain insufficient data for some purposes. Further, forest inventories may be outdated and thus of less operational utility, especially in developing countries with limited financial resources. Depending on the purpose and desired map properties, there may also be problematic spatial or temporal mismatches, including global positioning system (GPS) position errors (McRoberts 2010). Confidentiality constraints that fuzz the precise spatial locations of the reference data can also prove a hindrance to fine-resolution analytical problems. For example, Prisley et al. (2009) found the perturbed locations associated with plots of the USDA Forest Inventory and Analysis program (Bechtold and Patterson 2005) unsuitable for analyses involving topographic data at resolutions of 10–30 m.
Ancillary or "feature" data (sensu McRoberts 2012) are used as predictor variables and can originate from many sources, although the most common are from remote sensing. Other sources include field-based mapping efforts, such as soil surveys, and climate layers derived from interpolation of weather station data (e.g., Daly et al. 2008). Remotely sensed data are popular because they are spatially continuous, typically cheaper to obtain than field data, and often of high temporal resolution. Passive optical remote sensors include Landsat, the moderate resolution imaging spectroradiometer (MODIS), the advanced very high resolution radiometer (AVHRR), and the airborne visible/infrared imaging spectrometer (AVIRIS); because these have long histories and are widely available, they are especially useful for monitoring and offer unique possibilities for homogeneous global assessment systems. However, because passive sensors cannot represent three-dimensional spatial patterns, they have historically been more useful for characterizing horizontal forest structure than vertical structure, especially in heterogeneous forests or those with high biomass (Drake et al. 2002, Lefsky et al. 2002, Powell et al. 2007). More recent methodological advances, such as the use of multitemporal data in combination with canopy shadowing and light attenuation relationships, have made optical data more practical for characterization of vertical forest structure (Wolter et al. 1995, Lefsky et al. 2001, Xiao et al. 2002, Powell et al. 2010). Some studies have demonstrated that predictions of forest attributes can be improved in heterogeneous forests by using multitemporal, multisensor, or ancillary data (e.g., Lefsky et al. 2001, Xiao et al. 2002, Sesnie et al. 2008, 2010, Helmer et al. 2012).
To achieve more detailed predictions of inventory attributes, higher-resolution data sources may be needed. Very high spatial resolution data from optical airborne or satellite sensors or airborne LiDAR show promise for prediction of both stand- and tree-level attributes, and research for operational purposes is increasing (Hudak et al. 2008, Næsset and Gobakken 2008, Falkowski et al. 2009a, Gleason and Im 2011, Næsset et al. 2011, Gobakken et al. 2012). For example, Mora et al. (2010) used panchromatic QuickBird satellite data to delineate individual tree crowns and identify the major tree species present in a boreal forest. Breidenbach and Astrup (2012) used a canopy height model derived from optical airborne imagery for small area estimation of forest volume, height, and biomass. Gobakken et al. (2012) concluded that the use of airborne LiDAR was feasible for regional prediction of aboveground biomass in Norway. LiDAR has a distinct advantage over passive optical sensors: its pulses penetrate gaps in the vegetation canopy, providing direct measurements of vertical structure. Airborne LiDAR has been implemented or studied for use in operational inventory in diverse forest types across the globe (e.g., Parker and Evans 2004, Jensen et al. 2006, Tickle et al. 2006, Næsset 2007, Hilker et al. 2008, Stephens et al. 2012). Despite the potential of airborne LiDAR, cost and availability continue to limit its operational feasibility (Wulder et al. 2008, Wolter et al. 2009), although it has been found to be cost-effective in some studies (e.g., Eid et al. 2004, Hummel et al. 2011). LiDAR sampling, in contrast to wall-to-wall LiDAR surveying, could prove to be an effective compromise for some operational inventory needs (Wulder et al. 2012). In the future, further development of spaceborne LiDAR systems could ultimately provide systematically collected forest canopy measurements in support of operational forest inventory and management.
Because of the limitations associated with acquisition of airborne LiDAR data, it is of interest whether Landsat or other optical remote sensing data are useful for predicting forest attributes at fine resolutions, particularly in heterogeneous forests; unfortunately, few studies have addressed this topic directly. Generally, optical remote sensing data tend to become poor predictors of forest biomass and vertical structure after the canopy closes or when vegetation is dense (Foody et al. 2001). For example, Watt et al. (2004) concluded that airborne LiDAR metrics were able to provide accurate predictions of forest height even after canopy closure in densely stocked conifer stands, whereas metrics from Landsat ETM+ and high-resolution IKONOS imagery were not. Nevertheless, Foody et al. (2001) obtained biomass predictions that were strongly correlated with measured values using Landsat data in tropical forests; they emphasized the importance of using all the available spectral data instead of vegetation indices such as NDVI that incorporate only some of the spectral wavebands. le Maire et al. (2011) developed useful nonlinear regression models for predicting stand height and volume in eucalyptus stands by using a time series of MODIS reflectance data to predict age from NDVI and then predicting height and volume from models incorporating age, NDVI, and bioclimatic variables as predictors. These studies suggest that optical data may be useful if care is taken to incorporate a time series of images and to use appropriate indices. Few studies have directly compared the ability of Landsat versus airborne LiDAR to predict forest attributes at relatively fine resolutions. Hill et al. (2010) found that certain Landsat spectral indices were strongly correlated with canopy height as determined from airborne LiDAR measurements in a tropical rainforest, suggesting that the Landsat data could be used to predict height over a large area. In a similar study, Pascual et al. (2010) also found relationships between Landsat indices and airborne LiDAR canopy height. Hummel et al. (2011) examined a wider variety of forest structural attributes in a heterogeneous temperate forest landscape and found that imputations from Landsat data could not provide stand-level predictions of height, basal area, and volume (i.e., merchantable and total cubic meters) similar to estimates obtained from field measurements, although biomass and tree density estimates did not differ from stand means calculated from field data; airborne LiDAR data, in contrast, provided estimates that were more consistent with field measurements. These findings suggest that Landsat data may suffice for prediction of some forest attributes at operational (stand) scales, but that airborne LiDAR data are likely to be more consistently useful. However, if tree-level or very fine-resolution information is desired, Landsat data are generally not appropriate (Wulder et al. 2004).
Spatial Mismatch
Although registration errors are often not addressed explicitly, several studies have demonstrated that errors caused by poor georeferencing of remotely sensed imagery and positional errors associated with ground reference points may contribute substantially to prediction error (Næsset et al. 2011, Valbuena et al. 2011, Bright et al. 2012). McRoberts (2010) found that a mean registration error of 30 m (i.e., the width of one Landsat pixel) resulted in more than half of the subplots in his study being associated with the wrong pixel. Errors associated with GPS units that are used to locate ground reference points can have similar effects. The GPS units used by the USDA Forest Inventory and Analysis program have been reported to have errors ranging from 5 to 20 m (Cooke 2000). The combined uncertainties in GPS locations, image coordinates, and geometric errors can result in greater discrepancies among satellite images, airborne LiDAR, and field plots, especially in steep, mountainous areas and high-density forests (Valbuena et al. 2011, Bright et al. 2012, Jung et al. 2013). In a simulation study motivated by prior work with Landsat data, Patterson and Williams (2003) found that small registration errors (a 1 or 2 pixel shift) induced minimal bias but dramatically inflated the variance of estimates of forest area. These types of errors could result in forest attribute maps that are of questionable utility in the originally intended contexts. However, few recommendations exist in the literature on how to compensate for or correct the errors caused by image-to-plot mismatch. One such remedy is reported by Halme and Tomppo (2001), who showed that reassigning the locations of field plots to Landsat pixels within a 7 × 7 pixel window, using a weighted function of correlations among field and spectral values, can produce substantial reductions in prediction RMSE of volume per unit area using the k-NN method.
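The relocation idea can be illustrated with a deliberately simplified sketch. The Python fragment below is not the Halme and Tomppo (2001) algorithm, whose correlation-weighting scheme is more elaborate; it merely reassigns each plot, within a 7 × 7 pixel window, to the pixel whose spectral values best match those expected for the plot's field-measured volume under a simple linear model, with all data synthetic.

```python
# A greatly simplified, hypothetical sketch of the image-to-plot relocation
# idea reported by Halme and Tomppo (2001); their actual weighted-correlation
# scheme differs. Each plot is reassigned to the pixel within a 7x7 window
# whose spectral values best match the values expected for the plot's
# field-measured volume under a linear model fitted at the nominal locations.
import numpy as np

rng = np.random.default_rng(3)
bands, size, n_plots = 4, 60, 25
image = rng.normal(0.2, 0.05, size=(bands, size, size))   # synthetic imagery
plots_rc = rng.integers(4, size - 4, size=(n_plots, 2))   # nominal row/col
volume = rng.uniform(50, 400, n_plots)                    # field volume, m3/ha

# Fit expected spectral value per band as a linear function of volume,
# using the nominal (possibly mislocated) plot positions.
coefs = []
for b in range(bands):
    at_plots = image[b, plots_rc[:, 0], plots_rc[:, 1]]
    coefs.append(np.polyfit(volume, at_plots, deg=1))

def expected_signature(v):
    """Expected per-band spectral values for a plot with volume v."""
    return np.array([np.polyval(c, v) for c in coefs])

# Reassign each plot to the best-matching pixel in its 7x7 window.
relocated = []
for (r, c), v in zip(plots_rc, volume):
    window = image[:, r - 3:r + 4, c - 3:c + 4]           # bands x 7 x 7
    dist = np.sum((window - expected_signature(v)[:, None, None]) ** 2, axis=0)
    dr, dc = np.unravel_index(np.argmin(dist), dist.shape)
    relocated.append((r - 3 + dr, c - 3 + dc))
print(relocated[:5])
```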
Relatively small plot location errors probably do not lead to large prediction errors when field data are integrated with coarser-resolution raster data sets such as MODIS. However, the many sources of spatial mismatch do raise questions about the utility of combining field plots with very fine resolution remotely sensed data such as LiDAR in operational mapping efforts. Gobakken and Næsset (2009) examined the interacting effects of plot position error and plot size on the accuracy of least-squares predictions of stand height, basal area, and volume and found that larger field plots provided greater accuracy for a given position error. They also observed that LiDAR metrics were less sensitive to variations in plot size and coregistration error in young, dense, and homogeneous canopies than in open and heterogeneous stands. Gobakken and Næsset (2009) recommended that, for poor sites where stems are normally few, large plots and accurate plot positions are important for obtaining precise estimates; this should not be difficult to attain, because relatively precise GPS positions would be expected in such forests compared with dense forests. Similar results were obtained by Mauro et al. (2011) in an evaluation of the effect of positioning error on tree height distribution measurements carried out in circular plots: GPS positional errors had a greater impact on LiDAR metrics extracted from small plots (6-m radius) than from larger plots (10-m radius). Using a Monte Carlo approach, Frazer et al. (2011) further demonstrated that large plots were substantially more resilient to coregistration error than were small plots.
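The logic of such Monte Carlo experiments is easy to sketch. The fragment below is only a toy analog of the Frazer et al. (2011) study: it perturbs a plot center with simulated GPS error over a synthetic canopy height model and compares the stability of a mean-height metric across plot radii; all surfaces and error magnitudes are hypothetical.

```python
# A minimal Monte Carlo sketch (not Frazer et al.'s implementation) of
# plot-LiDAR coregistration error: random GPS offsets are applied to a plot
# center, a mean-canopy-height metric is extracted from a synthetic 1-m
# canopy height model, and metric variability is compared across plot radii.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 1-m canopy height model (CHM), heights in meters.
size = 200
chm = rng.gamma(shape=2.0, scale=6.0, size=(size, size))

def plot_metric(chm, row, col, radius_m):
    """Mean canopy height within a circular plot (1-m cells assumed)."""
    rows, cols = np.ogrid[:chm.shape[0], :chm.shape[1]]
    mask = (rows - row) ** 2 + (cols - col) ** 2 <= radius_m ** 2
    return chm[mask].mean()

def metric_sd_under_gps_error(radius_m, gps_sd_m=5.0, n_sim=500):
    """SD of the plot metric when the recorded center has GPS error."""
    center = size // 2
    true_value = plot_metric(chm, center, center, radius_m)
    simulated = []
    for _ in range(n_sim):
        dr, dc = rng.normal(0.0, gps_sd_m, size=2).round().astype(int)
        simulated.append(plot_metric(chm, center + dr, center + dc, radius_m))
    return true_value, np.std(simulated)

for radius in (6, 10, 15):
    true_value, sd = metric_sd_under_gps_error(radius)
    print(f"radius {radius:>2} m: true mean height {true_value:5.2f} m, "
          f"SD under GPS error {sd:5.3f} m")
```

Larger radii yield smaller SDs in this toy setting, mirroring the reported resilience of large plots to positional error.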
Response Variables
Forest inventories collect information on a multitude of variables, the number of which is constantly increasing as new uses are found. Whereas traditional inventory variables have been those relevant to timber management, such as composition, diameter, and height, an increasing focus on nontimber objectives has broadened the scope to include data useful for applications such as forest health monitoring, biodiversity conservation, and carbon accounting, simultaneously increasing the costs and time involved in collecting the data (McRoberts and Tomppo 2007). Here, we examine forest inventory variables frequently predicted via these mapping models and discuss the ancillary variables commonly found to be useful as predictors (Table 2).
Composition
Information on forest composition is basic to almost all operational forest management and planning applications. One of the main uses of remote sensing data in forestry has been to classify forest cover types (e.g., Rogan et al. 2008, Ruefenacht et al. 2008, Knorn et al. 2009, Mora et al. 2010) (Table 2). Traditionally, classification of forest composition has taken the form of relatively broad type classes. For example, Ruefenacht et al. (2008) classified 142 forest types across the United States using a large number of spatial predictors, including MODIS satellite data and derived indices, soils, climate, topographic, and ecoregion variables. Such broad forest or cover type classes may, however, obscure the heterogeneity in actual ground conditions. Some remotely sensed data collected by satellite and aerial platforms have proven useful for predicting forest composition at detailed resolutions or for distinguishing individual tree species distributions. For example, Mora et al. (2010) used tree crown metrics from panchromatic QuickBird-2 images to identify the four major tree species in forest stands in a boreal forest. Wilson et al. (2012) mapped tree species distributions in continental and subtropical forests using MODIS, climate, topographic, and ecoregion variables.
Accurate prediction of species composition tends to be more difficult for complex or diverse stands than for more homogeneous stands (e.g., Temesgen et al. 2003, Pierce et al. 2009, Helmer et al. 2012). Coarser-resolution sensors may not provide the spatial detail needed to capture compositional diversity; although the broad spectral differences between coniferous and deciduous species are generally clear, distinguishing species within these classes from each other is much more difficult given the spectral resolution of many sensor systems. Therefore, distinguishing attributes of forests with high species diversity may require data sources of high spatial and spectral resolution. Indeed, recent work by Asner and Martin (2009) has highlighted the potential of airborne, high spatial resolution, hyperspectral imagery to map taxonomic diversity in a highly diverse tropical forest. Although such approaches have merit, the relatively high costs and complexities associated with data acquisition, processing, and analysis have limited the application of hyperspectral remote sensing in forestry.
Structure
Descriptions of forest structure are necessary for strategic and tactical silvicultural, forest health, restoration, conservation, and reporting objectives, as well as for assessment of economic and noneconomic forest values. Remotely sensed and ancillary data have been used to predict a large number of forest structural attributes (Table 2) at both the plot and stand levels, including diameter (Pierce et al. 2009, Falkowski et al. 2010), height (Hall et al. 2006, Pierce et al. 2009), age (Jensen et al. 1999, Moisen and Frescino 2002), tree density (Franco-Lopez et al. 2001, Ingram et al. 2005, Pierce et al. 2009), basal area (Ingram et al. 2005, Pierce et al. 2009, Falkowski et al. 2010), volume (Franco-Lopez et al. 2001, Reese et al. 2002, Mäkelä and Pekkarinen 2004, Chirici et al. 2008, Niska et al. 2010), biomass (Moisen and Frescino 2002, Labrecque et al. 2006, Powell et al. 2010, Tuominen et al. 2010), leaf area index (LAI) (Turner et al. 1999, Berterretche et al. 2005), and crown cover or canopy closure (Moisen and Frescino 2002, Hall et al. 2006). In addition to attributes of interest for operational forest management, ecologically meaningful variables are often predicted, including snag density (Martinuzzi et al. 2009, Eskelson et al. 2012), coarse woody debris (Pesonen et al. 2008, Hudak et al. 2012), tree size inequality (Valbuena et al. 2013), and tree competition (Pedersen et al. 2012), among others.
Many forest structure variables are highly correlated (e.g., age, height, diameter, and volume). Therefore, it makes sense, when possible, to use inventory variables to predict other inventory variables. For example, le Maire et al. (2011) found that age-related variables explained substantial variation in volume and height of eucalyptus plantations in southern Brazil, although the addition of NDVI and climatic variables significantly improved model performance. Leduc et al. (2001) used stand age, site index, height, and density to predict diameter distributions in subtropical conifer plantations. Although such predictors are useful, their lack of continuous spatial coverage is a problem when one is attempting to use imputation or interpolation to create continuous predictive output layers of inventory attributes.
Fortunately, remotely sensed data can be used to predict many forest structural attributes (Table 2). Spectral data from passive sensors (e.g., Landsat) have often been used. For example, Pocewicz et al. (2004) found NDVIc (a mid-infrared corrected version of NDVI that has improved sensitivity to plant area and canopy closure) to be a strong predictor of LAI in complex temperate mountainous forests. Fiorella and Ripple (1993) found strong negative correlations between Landsat TM spectral bands and stand age in maritime coniferous forests, but also found that the structural index (TM 4/5 ratio) had the highest correlation with stand age. Similarly, Hall et al. (2006) found negative correlations between two structural variables (stand height and crown closure) and Landsat ETM+ spectral bands, with bands 4 (near infrared) and 5 (mid-infrared) being most highly correlated with height and crown closure. Ingram et al. (2005) found reflectance of Landsat bands 3, 4, 5, and 7 to be useful predictors of basal area in tropical forests, whereas NDVI was not helpful.
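For readers unfamiliar with these indices, the short sketch below shows one common formulation of NDVI, the mid-infrared corrected NDVIc, and the TM 4/5 structural index, computed from illustrative reflectance values; exact band designations and correction constants vary by sensor and study.

```python
# A hedged sketch of the spectral indices mentioned above, computed from
# Landsat band reflectance arrays. The NDVIc formulation shown is one common
# mid-infrared correction; all reflectance values are illustrative.
import numpy as np

red = np.array([0.06, 0.08, 0.05, 0.10])   # e.g., TM band 3 reflectance
nir = np.array([0.35, 0.30, 0.40, 0.25])   # e.g., TM band 4 reflectance
mir = np.array([0.12, 0.18, 0.10, 0.22])   # e.g., TM band 5 reflectance

ndvi = (nir - red) / (nir + red)

# Mid-infrared corrected NDVI: scales NDVI by canopy openness inferred from
# band 5, using scene-specific minimum and maximum MIR reflectance.
ndvic = ndvi * (1.0 - (mir - mir.min()) / (mir.max() - mir.min()))

# Structural index used by Fiorella and Ripple (1993): TM band 4/5 ratio.
structural_index = nir / mir

print(ndvi, ndvic, structural_index, sep="\n")
```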
Airborne LiDAR metrics have become increasingly popular, particularly with the rapid development of the technology. In the boreal forest, Næsset (2007) found that LiDAR-derived height and canopy density metrics were effective predictors of the variability in ground-measured values of six structural variables at the stand level, including mean and dominant heights, diameter, tree density, basal area, and volume; predictions obtained from the models using airborne LiDAR data also did not differ significantly from ground-measured values, indicating the potential for operational use in forest inventories. Airborne LiDAR metrics are especially useful in predicting the height of dominant trees (Hirata et al. 2009, Maltamo et al. 2009). Consistent with this, Richardson and Moskal (2011) found LiDAR metrics useful for predicting the density of taller trees, but not shorter trees, which they suggested might be a limitation of the technology. Falkowski et al. (2009a) found mean vegetation height and canopy cover metrics derived from airborne LiDAR measurements to be especially useful in classifying forest structural stages, and Hudak et al. (2008) found topographic and canopy structure variables derived from LiDAR measurements to be helpful when modeling basal area and tree density. Lefsky et al. (2005a) demonstrated that the vast number of metrics derived from large-footprint LiDAR data could be grouped into three nonredundant factors (mean height, cover or leaf area index, and height variability) related to unique components of forest structure.
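Typical area-based LiDAR metrics of the kind used in these studies are straightforward to compute from a normalized point cloud, as the brief sketch below illustrates; the height threshold and synthetic returns are illustrative only.

```python
# A minimal sketch of common area-based LiDAR metrics: height percentiles
# and canopy cover computed from a plot's normalized (height above ground)
# return heights. Thresholds and the synthetic point cloud are illustrative.
import numpy as np

rng = np.random.default_rng(11)
heights = np.concatenate([rng.uniform(0, 2, 300),    # ground/understory returns
                          rng.normal(18, 4, 700)])   # canopy returns (m)

VEG_THRESHOLD_M = 1.37  # returns above this height treated as vegetation
veg = heights[heights > VEG_THRESHOLD_M]

metrics = {
    "mean_height": veg.mean(),
    "h25_h50_h75_h95": np.percentile(veg, [25, 50, 75, 95]),
    # Canopy cover: fraction of all returns above the vegetation threshold.
    "canopy_cover": veg.size / heights.size,
}
print(metrics)
```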
Biomass
Forest biomass is of particular current interest because of its relationship with climate (i.e., carbon sequestration and emissions) and as a potential alternative fuel source (i.e., biofuels), although biomass inventory also aids understanding of forest productivity, nutrient allocation, and fuel accumulation (Brown et al. 1999, Zhang and Kondragunta 2006, Saatchi et al. 2007). Biomass estimates are particularly relevant for carbon accounting programs, such as measuring, reporting, and verification systems under the auspices of the United Nations Framework Convention on Climate Change, and in quantifying the impacts of activities implemented to reduce emissions from deforestation and degradation in developing countries (Global Observations of Forest and Land Cover Dynamics 2012). These programs require field measurements, which can be difficult to obtain, especially for developing countries with limited resources. However, remote sensing can facilitate carbon monitoring and measurement efforts by helping to stratify land use types for better representation of field samples, by direct prediction of structural parameters, or by modeling approaches that can be used to calculate biomass through allometric relationships (Goetz and Dubayah 2011). Although total carbon pools include several components (e.g., aboveground biomass, belowground biomass, dead wood, litter, and soil organic matter), aboveground biomass (AGB) is the most commonly and easily measured (Zheng et al. 2004, Hall et al. 2006, Blackard et al. 2008) and is often used to predict carbon stored in other pools.
The most common approach used for biomass involves deriving biomass values for reference data, usually from conventional field-based forest inventory programs, and then developing prediction models that use remote sensing data as feature variables (e.g., Baccini et al. 2004, Houghton et al. 2007, Powell et al. 2007, Nelson et al. 2012). Studies have shown that forest complexity may affect biomass predictions from optical data. For example, using Landsat spectral data, Powell et al. (2007) were able to build stronger biomass models in younger, coniferous forests than in older, deciduous forests. Houghton et al. (2007) predicted and mapped biomass in boreal forests by developing a relationship between ground-measured wood volume and MODIS data; they found that spectral saturation was less influential than differing forest definitions on variation in biomass predictions and suggested that use of finer-resolution data, multitemporal data, and ancillary data on climate, soils, and topography could help to improve biomass models in areas where forest structure or composition is less well defined. Indeed, Baccini et al. (2004) found nonlinear relationships between aboveground biomass and MODIS spectral data, precipitation, and elevation; incorporating these variables into a biomass model gave good results. Hall et al. (2006) found that modeling biomass and volume from height and crown closure (themselves predicted from Landsat ETM+ data) outperformed direct prediction of biomass and volume from Landsat data when compared with field observations. Most biomass prediction studies examined here tended to truncate the distribution, underpredicting in areas of high biomass and overpredicting in areas of low biomass.
LiDAR metrics show great promise for modeling biomass components. In an early study, Drake et al. (2002) successfully used four LiDAR height and ground return metrics to predict AGB in dense tropical forests. Næsset and Gobakken (2008) modeled both AGB and belowground biomass in a boreal forest using metrics derived from airborne LiDAR as predictors. Nelson et al. (2012) augmented forest inventory ground measurements with LiDAR-derived height and density to predict AGB and found that profiling LiDAR surveys could provide estimates comparable to ground measurements. Zhao et al. (2009) demonstrated an approach to scale-invariant biomass prediction using LiDAR-derived canopy height distributions and canopy height quantile functions.
Biomass is rarely, if ever, directly observed in the reference data; instead, biomass at the scale used in modeling is commonly derived from tree-level measurements using a variety of allometric approaches. Usually, whole-tree or tree-component biomass is estimated as a function of diameter, and plot-level unit-area estimates are calculated by summing (e.g., Zheng et al. 2004, Hall et al. 2006, Houghton et al. 2007, Powell et al. 2007). Biomass components may be available in NFI databases with no special processing required before model development (e.g., Blackard et al. 2008). Alternatively, biomass can be derived directly at the stand level using volume-to-biomass multipliers (Baccini et al. 2004) or in a two-step process in which foliage is predicted from feature variables and biomass is derived using allometric relationships with foliage (Zhang and Kondragunta 2006). Zhao et al. (2009) noted that biomass prediction methods are often sensitive to the scale of the original reference data or the feature data used in the model; they proposed a scale-invariant approach in which biomass is modeled as a function of canopy height distribution metrics from fine-scale LiDAR data. Of note, Zhao et al. (2009) derived the reference data for their model by aggregating tree-level biomass predictions from published allometric functions of diameter, with diameter itself derived from fine-scale LiDAR data.
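The tree-to-plot aggregation step can be sketched as follows. The coefficients in this fragment are hypothetical placeholders for a generic equation of the form ln(AGB) = b0 + b1 ln(dbh); operational work would substitute published species- and region-specific allometries.

```python
# A minimal sketch of the tree-to-plot biomass aggregation described above:
# tree-level AGB is predicted from diameter with a generic allometric
# equation, summed over the plot, and expanded to a per-hectare value.
# B0 and B1 are hypothetical placeholders, not published coefficients.
import math

B0, B1 = -2.0, 2.4           # hypothetical allometric coefficients
PLOT_AREA_HA = 0.04          # e.g., an 11.28-m radius circular plot

def tree_agb_kg(dbh_cm):
    """Aboveground biomass (kg) from diameter at breast height (cm)."""
    return math.exp(B0 + B1 * math.log(dbh_cm))

plot_dbh_cm = [12.5, 30.1, 22.7, 8.4, 41.0]   # one plot's tree list
plot_agb_mg_ha = sum(tree_agb_kg(d) for d in plot_dbh_cm) / 1000 / PLOT_AREA_HA
print(f"Plot AGB: {plot_agb_mg_ha:.1f} Mg/ha")
```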
Faux Inventory or “Tree Lists”
Increasingly detailed forest inventory information is needed for many objectives. For example, technological advances have improved our ability to predict forest growth and dynamics through simulation models that require tree lists or stand tables for initialization (e.g., the Forest Vegetation Simulator) (Dixon 2002). Ultimately, detailed maps of forest structure and composition that provide tree lists allow the greatest flexibility for varied uses because they can be manipulated or aggregated as needed to suit the objectives of a particular analysis. Another advantage of predictive modeling of tree lists is that it is equivalent to predicting compatible multivariate inventory attributes; i.e., if tree lists are imputed as if they were plot-based inventory data ("faux inventory"), then derived attributes such as basal area, volume, and density are automatically compatible, and separate predictions of composition, structure, or biomass may not be required.
Tree lists can be approximated through diameter distributions. For example, Maltamo et al. (2007) modeled diameter distribution using a Weibull function with LiDAR data as predictor variables. However, distributions are approximations and are incomplete compared with inventory data that routinely include multiple attributes (e.g., diameter, height, species, and quality). An alternative is to use methods that impute entire tree lists. Temesgen et al. (2003) used MSN to predict tree lists for complex temperate forest stands with aerial data on species composition, crown closure, elevation, biogeoclimatic ecosystem classification zones, height, age, and site class as predictors. When subsequently used as inputs to a growth-and-yield simulator, the modeled tree lists produced average stand volume predictions that closely approximated the volume obtained using the actual tree list, although results were sometimes less accurate for individual stands. Wallerman and Holmgren (2007) took a similar approach using MSN with LiDAR and optical image data as predictors and achieved sufficiently accurate estimates of volume and density in a boreal forest. Building on these results, Duvemo et al. (2007) evaluated the cost/benefit tradeoff of a field-plot strategy against imputation and concluded that, for some interest rate scenarios, imputation was superior in terms of both cost and risk. More recently, Falkowski et al. (2010) predicted tree-level forest inventory data from LiDAR height and topographic metrics, achieving high accuracy overall, although error was much higher for inventory metrics that incorporated small trees. Advances in LiDAR technology may enable rapid advances in tree list modeling in the future; identification of individual tree crowns from high pulse density LiDAR data may be operationally feasible, and tree-level variables may be determined directly from the LiDAR point cloud or from various other LiDAR features extracted for the tree segments (Lindberg et al. 2010, 2013, Hyyppä et al. 2012, Vastaranta et al. 2012). Overstory trees will still occlude understory trees from view of the sensor, impeding the prediction of a full tree list, although some approaches have been proposed to address this issue (e.g., Maltamo et al. 2007, Breidenbach et al. 2010).
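As a simple illustration of the diameter-distribution approach, the sketch below fits a two-parameter Weibull to a plot's diameters and draws a synthetic tree list from it; in studies such as Maltamo et al. (2007), the Weibull parameters are instead predicted from LiDAR metrics.

```python
# A brief sketch of the diameter-distribution idea: fit a two-parameter
# Weibull to a plot's diameters, then draw a synthetic tree list from the
# fitted distribution. The example diameters are synthetic.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
dbh_cm = rng.weibull(2.2, size=80) * 25.0     # example field diameters (cm)

# Fit shape and scale with location fixed at zero (two-parameter Weibull).
shape, loc, scale = weibull_min.fit(dbh_cm, floc=0)
print(f"shape={shape:.2f}, scale={scale:.1f} cm")

# Draw a synthetic tree list of the same stem count from the fitted model.
synthetic_dbh = weibull_min.rvs(shape, loc=0, scale=scale,
                                size=80, random_state=rng)
print(np.round(np.sort(synthetic_dbh)[:10], 1))
```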
Model Validation and Assessment
Model development and validation are closely linked. Once a method and appropriate variables have been identified, the model can be run and its performance assessed using a number of techniques. Results from model testing can inform strategies to refine the initial model or replace it entirely; therefore, model building is often considered an iterative process (Rykiel 1996, Froese and Robinson 2007). Despite much agreement on the utility of model validation, the literature is replete with alternative approaches and arguments about their relative appropriateness. There is some consensus that validation can at least be divided into questions about realism and questions about predictive accuracy (Caswell 1976, Rykiel 1996, Robinson and Ek 2000, Froese and Robinson 2007). Especially with the latter, comparison of predictions with measurements of the target population is the usual basis for model validation.
Ideally, data-based model validation is done with independent data drawn from the population to which the model will be applied. Here we define independent as being entirely separate from the data used to train or calibrate the model. This is in contrast to cross-validation, or “data splitting,” in which a single data set is split into two subsets, one used for training and the other for validation. Use of independent data has the advantage of generating estimates of accuracy that are free of both the model and the training data. The model is tested in the context of interest; i.e., on the population on which it is to be used, not the population from which it was constructed. If independent data are not available, cross-validation may be used instead. This approach has been argued to provide no information beyond fit and lack of fit statistics derived from the model fit, at least for regression models, and may merely demonstrate that the subsets come from the same population (Kozak and Kozak 2003). However, with cross-validation accuracy can be determined without making any assumptions, parametric or otherwise, about the distribution of model coefficients or the training data. This allows for model-free estimates of prediction accuracy and permits statistical inferences about accuracy to be made for nonstatistical models (Robinson and Ek 2000), which is particularly useful for ANN black boxes.
Sometimes splitting the observations into training and testing groups is not feasible (i.e., very small data sets), and independent data are lacking. In that case, double cross-validation (also called k-fold cross-validation) may be used to estimate the error. Double cross-validation is an iterative procedure that involves dividing a data set into k subsets, training the model using all subsets but one, and then testing the model on the subset that was left out (Kozak and Kozak 2003). This process repeats until all subsets have been used once as the test set and then averages the errors obtained from the repeated tests. In this way, all of the observations are used to train the model and to test the model, maximizing the information gleaned from the data set. Double cross-validation has been shown to be nearly unbiased with respect to estimating error, although its error estimates can be highly variable (Efron and Tibshirani 1997). Bootstrapping, which is similar to cross-validation except that error is estimated from subsamples drawn with replacement from the original sample, has also been used widely to estimate error and can sometimes provide better error estimates than cross-validation, generally resulting in smaller variability although bias tends to be larger (Efron 1983, Efron and Tibshirani 1997, Lendasse et al. 2003). However, bootstrapping does not work equally well for all analytical methods; although it is a popular method of error estimation for ANNs, it works less well with decision trees (Breiman et al. 1984, Kohavi 1995).
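The two resampling strategies can be contrasted in a few lines of code. The sketch below estimates prediction RMSE for a simple linear model by 10-fold cross-validation and by out-of-bag bootstrap resampling; the data, model, and fold counts are illustrative.

```python
# A compact sketch contrasting the validation strategies discussed above:
# k-fold ("double") cross-validation and the bootstrap, both estimating
# prediction RMSE for a simple regression model on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(120, 3))                       # e.g., LiDAR metrics
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 5, 120)   # e.g., volume

# k-fold cross-validation: every observation is used for training and,
# exactly once, for testing; fold errors are averaged.
fold_rmse = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
print(f"10-fold CV RMSE: {np.mean(fold_rmse):.2f}")

# Bootstrap: train on a resample drawn with replacement, test on the
# observations left out of that resample ("out-of-bag" cases).
boot_rmse = []
for _ in range(200):
    boot = rng.integers(0, len(y), len(y))
    oob = np.setdiff1d(np.arange(len(y)), boot)
    model = LinearRegression().fit(X[boot], y[boot])
    pred = model.predict(X[oob])
    boot_rmse.append(np.sqrt(mean_squared_error(y[oob], pred)))
print(f"Bootstrap (OOB) RMSE: {np.mean(boot_rmse):.2f}")
```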
Categorical Response
Among the scientific community, no broadly accepted standardized methods exist for accuracy assessment or reporting of categorical data, although agencies may develop standards for accuracy in certain contexts (Foody 2002). However, most studies report an overall accuracy measure, such as the percentage of correctly classified cases, that quantifies how the predictions as a whole compare with the observed data. Because overall accuracy gives no information on how error is distributed among the classes, measures of producer's and user's accuracies are also needed to provide a better assessment of the utility of the model/map for different purposes (Foody 2002, Fassnacht et al. 2006, McRoberts et al. 2010a). Producer's accuracy reflects errors of omission, revealing how many of the test observations of a particular class are labeled correctly, whereas user's accuracy reflects errors of commission, revealing how many of the pixels assigned to a given class actually belong to that class on the ground (see the sketch following the next paragraph). Reporting an error (or "confusion") matrix is generally advised to fully disclose relevant information. Although normalization of the error matrix has been a common practice for almost 30 years (Congalton et al. 1983), this practice has been criticized increasingly in recent years because of interpretation problems and demonstrated bias in parameters estimated from the normalized matrix (Stehman 2004, McRoberts et al. 2010a). Therefore, it is generally advised that the raw error matrix be reported (Foody 2002). There is no universal standard for deciding whether a classification can be considered "accurate enough," but some authors have provided guidance with respect to acceptable levels of accuracy (e.g., Thomlinson et al. 1999). Even so, whether a classification is accurate enough often depends on subjective decisions related to the prospective use of the map (Fassnacht et al. 2006).
The error matrix, as an accuracy measure, is not without criticisms (Foody 2002). It typically assumes that classes are discrete and that all cases (pixels, stands, and others) belong entirely in a single class. However, heterogeneity often exists at subpixel resolutions (i.e., mixed pixels), and sometimes a pixel could belong to more than one class. Modification of accuracy measures to include “fuzzy” class boundaries may help with this (Townsend 2000, Fassnacht et al. 2006). In addition, the severity of misclassification may differ among classes (e.g., a high misclassification rate may be less important among similar classes but extremely important among other classes). Thus, interpretation of the error matrix and derived accuracy measures is not always straightforward. Another perceived issue with the error matrix is that it does not take into account cases that have been allocated into the correct class purely by chance (Foody 2002). The κ coefficient (Cohen 1960) has become a popular statistic for estimating the difference between actual and chance agreement. Because variances can be calculated, differences between two κ coefficients can be tested statistically, although often the statistical assumptions (e.g., independence of the samples) may not be met (Foody 2002, 2004). Recent criticism of the κ coefficient suggests that the adjustment for chance agreement is conceptually flawed, and, furthermore, whether a classification is correct by chance or design is irrelevant and misleading to map users (Stehman and Foody 2009). In their evaluation of a wide variety of accuracy measures, Liu et al. (2007) found a high correlation between the κ value and overall accuracy, suggesting that κ was simply a downscaled version of overall accuracy; they recommended using overall accuracy as a primary measure and advised against using the κ coefficient despite its popularity. The error matrix is increasingly considered to be inadequate for full accuracy reporting, and current thinking emphasizes the need for techniques for constructing inferences in the form of confidence intervals (McRoberts 2010). Such techniques are of particular value for comparison and evaluation purposes as estimation protocols shift from plot-based methods to geospatial methods to accommodate operational needs and limitations.
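To make these measures concrete, the sketch below computes overall accuracy, producer's and user's accuracies, and the κ coefficient from a hypothetical three-class error matrix, with rows as reference classes and columns as predictions.

```python
# A short sketch of the accuracy measures discussed above, computed from a
# hypothetical 3-class error matrix (rows = reference, columns = predicted).
import numpy as np

cm = np.array([[50,  5,  2],    # e.g., conifer
               [ 8, 40,  6],    # e.g., deciduous
               [ 3,  7, 30]])   # e.g., mixed

n = cm.sum()
overall = np.trace(cm) / n

# Producer's accuracy (per reference class): complement of omission error.
producers = np.diag(cm) / cm.sum(axis=1)
# User's accuracy (per predicted class): complement of commission error.
users = np.diag(cm) / cm.sum(axis=0)

# Cohen's kappa: agreement corrected for chance, with expected chance
# agreement computed from the row and column marginals.
p_o = overall
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
kappa = (p_o - p_e) / (1 - p_e)

print(f"overall={overall:.3f}, kappa={kappa:.3f}")
print("producer's:", np.round(producers, 3))
print("user's:   ", np.round(users, 3))
```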
Continuous Response
For continuous data, measures of precision (unsystematic error) and bias (systematic error) are considered essential for assessing the accuracy of modeled data. Most studies commonly report fit statistics, either the RMSE or root mean squared difference (RMSD) (Stage and Crookston 2007), sometimes augmented by other measures (e.g., R2 or relative RMSE). RMSD is an analog to RMSE commonly reported for nearest neighbor models; the notational distinction is important because the RMSD includes different components of error than the usual RMSE reported in regression (for details, see Stage and Crookston 2007). None is an ideal measure, however.
R2 is a measure of the variation explained by the least squares regression model but does not actually measure the difference between two data sets (Ji and Gallo 2006). RMSE is not standardized and therefore has units identical to the quantity being predicted, so it is often not comparable among different models (McRoberts 2009a, Riemann et al. 2010). Furthermore, RMSE is a metric of precision only for an unbiased estimator; otherwise, it incorporates both precision and bias. It is important to keep in mind that although R2 and RMSE have been used as fit statistics for trained models, they can also be calculated for comparisons of predicted versus measured values as a better measure of how well the model predicts new cases. Graphical plots of predicted versus actual values of a variable give a visual representation of bias, often clearly showing where over- or underprediction occurred. Prediction bias is calculated as the average of the differences between the observed and predicted values. The bias measure can be tested statistically to determine whether it differs significantly from zero and, if so, whether under- or overprediction has occurred. For example, Niska et al. (2010) calculated a test statistic, tbias = bias/(s/√(n − 2)), where s is the SD of the differences between the actual and predicted values and n is the number of observations. This statistic was used to determine whether significant bias was present in models developed to predict stem volumes at plot and stand levels.
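These calculations are simple to reproduce. The sketch below computes RMSE, bias, and a bias t-statistic of the form attributed above to Niska et al. (2010) on illustrative volume data; the exact form of their test statistic, as reconstructed here, is an assumption.

```python
# A hedged sketch of the precision and bias measures discussed above,
# including a bias t-test of the form attributed to Niska et al. (2010);
# the exact form of that statistic is assumed. Data are illustrative.
import numpy as np
from scipy import stats

observed  = np.array([210.0, 180.5, 330.2, 95.0, 260.7, 150.3])  # e.g., m3/ha
predicted = np.array([198.4, 192.1, 301.8, 110.2, 255.0, 161.9])

diff = observed - predicted
n = len(diff)

rmse = np.sqrt(np.mean(diff**2))
bias = np.mean(diff)
s = np.std(diff, ddof=1)

# t-statistic for H0: bias = 0 (form assumed; cf. Niska et al. 2010).
t_bias = bias / (s / np.sqrt(n - 2))
p_value = 2 * stats.t.sf(abs(t_bias), df=n - 2)

print(f"RMSE={rmse:.1f}, bias={bias:.1f}, t={t_bias:.2f}, p={p_value:.3f}")
```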
Taken alone, these precision and bias measures are not sufficient to assess a modeled spatial data set. In addition to error associated strictly with the modeling process and variability in the data sets, uncertainty and error may result from several other factors, including spatial mismatches in data, registration, and other locational errors, spatial variation in error due to sensors, and uncertainty and error in the reference data (Foody 2002, Ji and Gallo 2006, Riemann et al. 2010). All of these error sources make accuracy assessments difficult to interpret, especially when only one or two indices are used. It is important to have several assessment measures because model or map choice may depend on different criteria for different uses (Foody 2002, Riemann et al. 2010). A map that is acceptable for one purpose may not be for another. For example, Powell et al. (2010) found that an RF model generally performed better than two other methods in terms of minimizing prediction error of biomass, but SMA regression and GNN did a better job at variance preservation. Thus, an effective assessment provides enough information to evaluate whether a modeled data set is appropriate for varied uses.
Riemann et al. (2010) proposed a potentially useful set of guidelines for assessing modeled output for continuous variables, summarizing what the authors considered to be the five characteristics of effective assessments: (1) use of multiple types of assessment, (2) a description of the characteristics relevant to how the data set will be used, (3) a description of the location of error, (4) assessment across a range of scales, and (5) timeliness and consistent application of assessments. Operationally, the Riemann et al. (2010) protocol proposes (1) comparing the empirical cumulative distribution functions of the observed and predicted data sets for several scales and testing them using the Kolmogorov-Smirnov test, (2) examining overall agreement across several scales using (i) 1:1 scatterplots and SMA regression lines, (ii) an agreement coefficient that is then recalculated for both systematic and unsystematic error based on the proximity of the SMA line to the 1:1 line and the degree of scatter around the SMA line, (iii) differences between means and RMSE, and (iv) maps of the plot-based and modeled estimates at different scales, (3) examining spatial and distributional patterns of local differences by comparing predictions versus plot-based confidence intervals and producing choropleth maps of predictions with respect to plot confidence intervals, and (4) examining local variability via choropleth maps of SD of modeled estimates.
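Two elements of this protocol, distributional comparison and SMA-based agreement, can be sketched briefly. The fragment below applies a two-sample Kolmogorov-Smirnov test and computes an SMA line for the 1:1 scatterplot on synthetic data; it is an illustration, not the full Riemann et al. (2010) procedure.

```python
# A minimal sketch of two elements of the Riemann et al. (2010) protocol:
# comparing observed and predicted empirical distributions with a
# Kolmogorov-Smirnov test, and computing an SMA (reduced major axis) line
# for a 1:1 scatterplot. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
observed = rng.gamma(4.0, 30.0, size=200)                  # e.g., biomass, Mg/ha
predicted = 0.8 * observed + rng.normal(15, 20, size=200)  # a "truncating" model

# (1) Distributional agreement: two-sample KS test.
ks_stat, ks_p = stats.ks_2samp(observed, predicted)
print(f"KS statistic={ks_stat:.3f}, p={ks_p:.3f}")

# (2) SMA regression for the 1:1 scatterplot: the slope is the ratio of
# SDs, signed by the correlation; departures from slope 1 and intercept 0
# indicate systematic error.
r = np.corrcoef(observed, predicted)[0, 1]
slope = np.sign(r) * predicted.std(ddof=1) / observed.std(ddof=1)
intercept = predicted.mean() - slope * observed.mean()
print(f"SMA slope={slope:.2f}, intercept={intercept:.1f}")
```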
In addition to the detailed maps and accuracy measures described above, equivalence tests have been proposed as a way to statistically test whether two data sets differ from each other (Robinson and Froese 2004, Robinson et al. 2005). Unlike traditional significance testing, which tests the null hypothesis of no differences between the two data sets, equivalence tests place the burden of proof on the model, testing the null hypothesis that the two data sets are different and thereby requiring evidence for model validation through rejection of the null hypothesis. Leites et al. (2009) used equivalence tests, among other evaluation criteria, to evaluate crown ratio models and their influence on projected stand variables in temperate conifer forests. Likewise, Falkowski et al. (2010) used equivalence tests to evaluate imputation models predicting tree-level inventory data against NFI data in the same forest type. They also assessed the model predictions for operational accuracy by conducting equivalence tests on the projections from a forest growth model obtained using both the predicted and observed data for parameterization.
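A minimal paired-difference version of such a test is sketched below using the two one-sided tests (TOST) procedure; note that Robinson et al. (2005) describe a regression-based equivalence test, so this form and the ±10% equivalence region are simplifying assumptions.

```python
# A hedged sketch of an equivalence test on paired differences using the
# two one-sided tests (TOST) procedure. This paired-difference form is a
# simplification of the regression-based tests cited above, and the
# +/-10 percent equivalence region is illustrative.
import numpy as np
from scipy import stats

observed  = np.array([210.0, 180.5, 330.2, 95.0, 260.7, 150.3, 205.8, 310.4])
predicted = np.array([205.1, 185.0, 318.9, 101.2, 252.4, 158.8, 210.6, 298.7])

diff = predicted - observed
n = len(diff)
margin = 0.10 * observed.mean()        # equivalence region: +/-10% of mean

se = diff.std(ddof=1) / np.sqrt(n)
# Two one-sided tests: H0 is "the difference lies outside the margin";
# both one-sided nulls must be rejected to conclude equivalence.
t_lower = (diff.mean() + margin) / se
t_upper = (diff.mean() - margin) / se
p_lower = stats.t.sf(t_lower, df=n - 1)    # test against -margin
p_upper = stats.t.cdf(t_upper, df=n - 1)   # test against +margin
p_tost = max(p_lower, p_upper)

print(f"mean diff={diff.mean():.1f}, margin={margin:.1f}, p={p_tost:.3f}")
print("equivalent at alpha=0.05" if p_tost < 0.05 else "not shown equivalent")
```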
Conclusions
Predicting forest inventory attributes from ancillary remotely sensed or other data involves a number of important considerations at each step of the process (Figure 1). Each of the methods we reviewed has particular strengths and weaknesses, and no analytical technique emerged as superior for all cases. The choice of method should be determined carefully by identifying the specific goals of an analysis, the type of response required, the characteristics of the ancillary data, and the resources available. Indeed, many studies conclude that the attention dedicated to variable selection is probably more important than the selection of method for generating precise and useful results.
Identifying the purpose of the analysis is probably the single most important first step and will help rule out inappropriate analytical methods. For example, “black box” techniques such as ANN provide no information on relationships among variables, so if interpretation of such relationships is important, then ANN is probably not an appropriate analysis technique. Further, specific variants of nearest neighbor analysis may be better for preserving the covariance structure of the data, whereas other techniques might provide greater predictive accuracy. It is up to the user to decide which is most important. Moreover, analyst and computing resources are not a trivial consideration. Whereas regression, nearest neighbor, and decision trees are conceptually easy to understand and implement, ANN in particular is not straightforward and typically requires large amounts of computing time for implementation.
Careful consideration of the types and characteristics of the response and ancillary variables can also help determine the best analytical method for a particular problem. Even when multivariate responses are of interest, most previous studies have built individual models for each forest attribute response. However, more recent work is trending toward development of multiresponse models for several reasons, including maintenance of covariance structure and realistic outcomes as well as simply reducing the time and expense associated with the analysis phase. For example, multivariate imputation of forest inventory plot identification numbers that are linked to the inventory data through lookup tables permits prediction of any of the structural or compositional variables measured on the inventory plots by imputing a single plot identification. Because only a single model needs to be developed, this type of analysis reduces the work and resources involved in the prediction process. Thus, multivariate responses may be best suited to nearest neighbors analysis, whereas any of the methods might perform well with a univariate response.
Other characteristics of the data such as non-normality and nonlinear relationships might suggest use of approaches that do not rely on those assumptions, such as decision trees, nearest neighbors, or ANNs. If there are missing data, regression is not recommended (Table 1). Some techniques such as decision trees and nearest neighbors can readily incorporate several ancillary variables, as well as both categorical and continuous predictors, better than other techniques such as regression. Although regression can handle both categorical and continuous data, inclusion of categorical predictors can often be a tedious and inefficient task, especially if there are a large number of categories or classes.
A wide variety of remotely sensed and other ancillary variables have been used to predict forest composition and structural attributes. Traditionally, species composition has been classified from the spectral bands of remote sensing images into broad cover type classes, but other data such as NDVI, land cover, topography, climate, soils, and tree crown metrics have also been commonly used (Table 2). Many studies have found measured forest inventory variables such as height or age to be useful in predicting other structural inventory attributes such as biomass or volume; however, for mapping purposes, the lack of spatially continuous coverage of the predictors is problematic, and remotely sensed data have proven to be valuable because of their spatial continuity. When prediction of individual tree variables is desired, metrics from active sensors such as LiDAR are being used increasingly and effectively for the purpose of providing information on vertical structure (e.g., canopy height) and horizontal structure (e.g., canopy cover and tree density) that can be further used in models predicting attributes such as volume or biomass. The use of LiDAR metrics, in particular, for predicting forest inventory attributes holds great promise for forest management and has been implemented in an operational context in some countries. Both compositional and structural attributes tend to be more difficult to predict accurately in complex stands than in simple stands, and various authors have suggested the use of fine-resolution, multitemporal, and ancillary data on climate, soils, and topography to help overcome this problem.
Method and variable selection are intertwined in the sense that characteristics of both the response and predictor variables help determine which method is most appropriate to use (Figure 1). A problematic trend apparent from our review is the lack of consistency in the literature with regard to testing and reporting the relative contributions of ancillary variables to models predicting forest attributes. If reported consistently, such information could be very useful for refining future modeling efforts and improving predictions as well as permitting generalizations and theoretical advances to be made. In many cases, variables were included with no apparent rationale other than availability, and no variable reduction procedure or other test of variable contributions was used, which impeded our efforts to compare and generalize. Therefore, we strongly recommend that future efforts report the relative contributions of the various predictor variables to the models that are developed. Formal study leveraging data fusion schemes may aid in standardization in the future.
Once a method has been selected and appropriate response and predictor variables have been identified, the formal model can be developed, tested, refined if necessary based on the results of the validation, and used to predict and map forest attributes across a spatially continuous area (Figure 1). Part of the difficulty in comparing different predictive techniques or even simply different data sets is the lack of a standardized validation protocol for prediction models. Most studies report RMSE, but this measure is not standardized, rendering it less useful for purposes of comparison outside a particular study. Other measures are reported sporadically but not consistently enough to be useful for comparison purposes. As discussed earlier, Riemann et al. (2010) proposed a comprehensive assessment protocol for prediction models using geospatial data that would, if adopted widely, provide valuable information to allow for broader comparisons and advances in model development, ultimately benefiting forest planning and operations as well as the wider forestry community.
Acknowledgments: Funding for this work was provided by a grant from the USDA Forest Service, Northern Research Station through the Northern Institute of Applied Climate Science. Additional funding was provided by the NASA New Investigator Program via Grant Contract NNX12AL53G. We are grateful for the helpful comments from Nan Pond, Rubén Valbuena, and two anonymous reviewers, which significantly improved the article.