Additive Nonlinear Biomass Equations: A Likelihood-Based Approach

Since Parresol’s (Can. J. For. Res. 31:865– 878, 2001) seminal article on the topic, it has become standard to develop nonlinear tree biomass equations to ensure compatibility among total and component predictions and to fit these equations using multistep generalized least-squares methods. In particular, many studies have specified equations for total tree biomass by aggregating the expectations of M component biomass models and have fit all M 1 equations jointly using seemingly unrelated regression. More recently, an alternative approach has been used wherein compatibility is ensured by deriving multiplicative component expectations, with estimation carried out using instrumental variables and generalized least squares. Yet neither of these strategies considers the fundamental additivity of biomass data themselves nor the implied stochastic constraints necessary for maximum likelihood (ML) estimation. For model selection based on information criteria, stochastic simulation, Bayesian inference, or estimation with missing data, it is important to base estimation and inference on valid probabilistic models. Here, we show how aggregative and disaggregative nonlinear equations can be specified within a probabilistic framework and fit using Gaussian ML with open-source software. We use Parresol’s slash pine (Pinus elliottii Engelm. var. elliottii) data to contrast model forms and predictions. We also show how the ML approach can accommodate unobserved or aggregated component biomass data and can thus be useful for integrating felled-tree data collected under different protocols.

N ational forest inventory programs now commonly report estimates of aboveground biomass and carbon in live trees. The estimates are often derived from individual tree biomass equations developed to also furnish estimates of biomass in foliage, branches, stems, and other tree components. These component estimates may be relevant to ecophysiological growth models, timber planning efforts, bioenergy programs, or wildland fuel assessments. For many of these purposes, compatibility among the component and total tree biomass models is important. As noted by Parresol (2001), this compatibility should, at a minimum, ensure that component biomass or carbon estimates do not exceed estimates of whole-tree biomass or carbon and that component estimates sum to the estimates of the totals. Yet for more advanced analytical and estimation procedures such as stochastic simulation, mixed-effects modeling, and Bayesian inference, this level of compatibility is insufficient. These techniques require that the additive nature of tree biomass measurements themselves be recognized and that appropriate constraints be placed on stochastic model elements. In particular, they require that tree biomass models be formulated so as to jointly constitute a valid probability model.
Individual-tree equations commonly relate total and component biomass to diameter at breast height (dbh), either alone or in combination with other tree attributes. Dbh is readily measured and often accounts for the largest share of variation in tree biomass, but attributes such as tree height or live crown length can account for additional variation associated with differences in height:diameter ratios, stand density, or other factors (Brown 1978, Evert 1985. These and other predictors have been combined in myriad linear and nonlinear parametric equations to describe differences in biomass allometries across trees. The multivariate power law (i.e., y ϭ b 0 x 1 b 1 x 2 b 2 . . . x p b p ) is often a candidate equation. This equation in particular may motivate linear regression after logarithmic transformation or the use of nonlinear regression on the original mass scale. As the calibration of multicomponent equations is intrinsically more complex in the latter case, this research focuses on the nonlinear equation setting. Parresol (2001) developed two methods for ensuring the additivity of nonlinear component biomass equations. Both can be called aggregative in that, at least as developed in his examples, they ensured component additivity by constraining the mean function for total biomass to be the sum of the mean functions of the biomass components. His two methods differed in the stochastic conditions attached to the models, although they are more commonly differentiated by the parameter estimation techniques employed. The first approach prescribed separate estimation of each biomass component equation, with subsequent recovery of cross-component correlations from the original measurements for estimation of precision. The second used seemingly unrelated regression methods to fit the total and component models jointly. With this latter approach, the potential cross-correlations among biomass measurements on a given tree are modeled directly. In principle, this imparts greater efficiency to parameter estimators and has helped render this method the de facto standard for nonlinear biomass equation estimation (e.g., Sanquetta et al. 2015, Zhao et al. 2015. The inverse of the aggregative approach is to focus first on the specification of an allometric equation for total biomass and then constrain the form or parameters of component equations to ensure additivity. The latter constraints could be imposed by fitting equations characterizing the observed proportions of total biomass in each component. This could be achieved, for example, using Dirichlet regression models (Gueorguieva et al. 2008), although specialized forms (perhaps incorporating random effects) might be needed to allow for dependence among total and component biomass observations from the same tree. More commonly, nonlinear models for component biomass proportions have been fit separately, leaving one component out and estimating it subsequently via subtraction (e.g., Jenkins et al. 2003). Yet this approach does not allow for dependence among observations from the same tree and cannot ensure that subtotals of the estimated component proportions are strictly less than 1.
An alternative disaggregation strategy, apparently developed initially in the Chinese literature (Tang et al. 2000, see also Dong et al. 2015), is likewise based on the development of component biomass fraction equations but with parameter estimation carried out on the original scale (i.e., kilograms of mass). This approach can guarantee additivity of regression-based estimates of biomass components and totals. Disaggregative models of this form have been completed by specifying a stochastic structure (generally one allowing for crosscomponent correlations) premised on an errors-in-variables framework (see, e.g., Tang et al. 2001, Tang andWang 2002). Like Parresol's (2001) approaches, this framework recommends a multistep least-squares (MSLS) procedure to obtain parameter estimates.
Both aggregative and disaggregative models have been developed with due attention to the inconstant nature of component biomass variance structures and to the potential for cross-correlation between measurements taken on the same trees. However, these methods typically are fit with observations of both component and total biomass and have not explicitly recognized the fundamental additivity of these data. Biomass measurement requires destructive sampling and, except occasionally for small trees, is generally carried out component by component. That is, a tree is partitioned into components, a subsampling strategy is applied to each component, and a total biomass is obtained only afterwards by addition. This additivity of the data themselves has implications for model specification as well as the validity of probability or likelihood-based estimators and selection criteria. Estimation issues stemming from this additivity have been noted (e.g., by Bi et al. 2004) but not addressed in terms of the implied constraints on model forms. Of course, it is not necessary to develop component biomass models that are strictly valid in terms of their probabilistic structure, but such models are useful for many stochastically oriented inference and estimation techniques. In addition, for regression models that are at least implicitly based on normal or Gaussian error structures, it is worthwhile appreciating when and why the estimation techniques fail to coincide with Gaussian maximum likelihood (ML).
A probabilistic approach to biomass equation development can also be useful when biomass data are incomplete. That is, ML estimation can be applied to make efficient use of data when measurements of certain components are missing or available only in aggregate for some trees. More generally, differences in field data collection protocols or differing definitions of biomass components across studies often present challenges in combining data sets to calibrate tree component biomass equations. For example, in a recent study of loblolly pine (Pinus taeda L.), Zhao et al. (2015) identified five aboveground biomass components of interest, namely stemwood, stembark, dead branches, live branches, and foliage. In another loblolly pine study, Naidu et al. (1998) identified seven biomass components including stem sapwood, stem heartwood, stembark, live branchwood, live branchbark, foliage, and roots. Each of these data sets is missing at least one component that the other includes, although they have several components in common. More interestingly, two of the components (stemwood and live branches) measured by Zhao et al. incorporate two more finely resolved subcomponents (stem sapwood and stem heartwood; live branchwood and live branchbark) available in the data of Naidu et al. By their nature, these undifferentiated components of Zhao et al. ought to contain some information about the subcomponents of Naidu et al., albeit at an aggregated level. Regardless of whether the data from these two studies would be suitable for combining to fit biomass equations, the incomplete pairing and differing resolutions of their component measurements demonstrate a challenge for analysis to be addressed here.
The primary objective of this research was to synthesize aggregative and disaggregative approaches to nonlinear biomass equation specification within a probabilistic modeling framework. A second objective was to extend the models and ML estimation algorithms to accommodate missing or aggregated observations of biomass components. Theory and analytic results relevant to these two objectives are developed in the subsequent section, both for the general case and for the special case of Gaussian ML. These are followed by empirical results obtained by estimating biomass model parameters under Gaussian ML using the slash pine (Pinus elliottii Engelm. var. elliottii) biomass data studied by Parresol (2001). The computational methodology developed to garner these results itself constitutes the product of the third objective of this research, which was to document how Gaussian ML estimation of nonlinear biomass equations can be implemented using open-source statistical software.

Biomass Models and Estimators Notation and Additivity
Biomass studies differ in the components of biomass measured and totals sought. Some studies target total tree biomass, others only aboveground biomass. Within the aboveground portion, most studies differentiate between crown (i.e., branches and foliage) and stem, but there are variations in the precise definitions of these components and their further subdivisions. Thus, for generality, let Y 1 , Y 2 , . . ., Y M denote the M biomass components of a given tree and Y t the total of interest. A fundamental identity is This identity holds by definition when the symbols represent unobserved (or true) tree biomass quantities and is often a desideratum of biomass model estimates or simulations. Yet it is important to recognize that this identity also generally holds when the symbols represent biomass measurements or data obtained from destructive sampling. This follows from the fact that total tree biomass typically goes unmeasured and is obtained instead by summing together the component biomass observations (or estimates) assembled from field and laboratory measurements. From the above identity stems the fact that given a set of treelevel predictors x the joint probability law for a set of random variables Y 1 , Y 2 , . . ., Y M , and Y t denoting tree biomass components and their total is That is, given x, the joint probability model for Y 1 ,Y 2 , . . ., Y M , and Y t is a simple multiplicative function of the model for Y 1 ,Y 2 , . . ., Y M , with the multiplier being 0 or 1 independently of x or any model parameters. As such, the ML estimators of the parameters governing the former can be obtained directly by maximizing only the latter. At work here is the fact that given the biomass components and their additivity, there is no additional information contained in the biomass totals.
In practice, multivariate Gaussian probability models are commonly adopted for component biomass data. Where the Gaussian assumption has not been made explicit, it has often been invoked implicitly in inferential procedures or in the calculation of information criteria such as the Akaike information criterion (AIC) (Akaike 1973). These component biomass models are often of the additive-error form Here g m ( ⅐ ) is an arbitrary and possibly nonlinear function of a set of explanatory variables (x m ) relevant to the mth component and a q m vector of unknown parameters ␤ m . Conditional on the explanatory variables, the error term m is a mean-zero Gaussian disturbance with variance where v m ( ⅐ ) is a function of a vector of explanatory variables z m (not necessarily identical to x m ) and a vector of variance parameters m . Whereas observations from different trees are generally taken to be independent, cross-correlations among the error terms of biomass components on a single tree are often modeled using a set of M(M Ϫ 1)/2 pairwise-correlation parameters. With the variance functions of Equation 3, the latter complete the definition of the variance-covariance matrix of component errors V ϭ ΄ 1 2 1 2 1,2 1 3 1,3 . . . 1 M 1,M 1 2 1,2 2 2 2 3 2,3 . .
where m,mЈ is the correlation between biomass components m and mЈ across all trees. Below, the cross-covariances in V will also be denoted mmЈ ϭ m mЈ m,mЈ for simplicity. There are two properties of this general model that complicate specification of a compatible submodel for Y t . First, given the identity presented in Equation 1, the variance of the total biomass is not free to take any form. Instead, it is a function of the component scedastic functions and cross-correlation parameters. For example, suppose the variances of the M components are taken to scale with dbh according to component-specific parameters: i.e., v(dbh; m Second, if the component scedastic functions of Equation 3 have free parameters, then the cross-correlation of any given component Y m and the total Y t will vary across the ranges of the heteroscedasticity variables z m . That is, the cross-covariance between the total and, say, the first component is This cross-covariance is a function of dbh and, given the structure of t 2 , the corresponding cross-correlation cannot be reduced to a constant parameter t, 1 without complex constraints on the component cross-correlations and variance parameters. In terms of allometries, the implication is that as tree dbh increases, the variance of one component (e.g., the crown) could increase faster than that of another (stemwood). Then the cross-correlation of the stemwood component with the total biomass will decrease with dbh because the variability of the total is increasing more rapidly than the variability in the stemwood component.
To our knowledge, these variance-covariance relationships have not been incorporated into biomass models that directly specify submodels for both components and totals. Yet they reveal that incorporating a submodel for total biomass by simply specifying an additional mean function and expanding V by one row and one column (incorporating one additional variance function and M additional cross-correlation parameters) will not result in a valid probability model. Instead, the implied constraints must be explicitly identified or the submodel for total biomass must be stricken. Otherwise, singularities will emerge in likelihood-based simultaneous estimation algorithms (as noted by Bi et al. 2004) and least-squares estimators will diverge from Gaussian ML estimators. From a practical standpoint, probabilistic foundations for the use of likelihoodbased criteria such as the AIC will be lacking, and stochastic tree biomass simulations derived from the model will not be additive.
Finally, we note that multiplicative-error models analogous to Equation 2 have also been used in component biomass modeling. In some cases, these are transformed (e.g., to a logarithmic scale), and their stochastic specifications are completed within an additive-error framework. Alternatively, a multiplicative-error model can be reexpressed as If stochastic properties are attached directly to m ϭ g m (x m , ␤ m )( * m Ϫ 1) rather than to * m , then these models can be subsumed into the Gaussian additive-error framework, and the variance-covariance constraints identified above still apply.

Aggregative and Disaggregative Biomass Equations
In some instances, it may be useful to specify an equation for the expected total biomass independently of any component biomass equations. Doing so would allow for maximum fidelity in characterizing totals and presumably for improved accuracy in predicting those totals. Yet in many cases, it is important to ensure that component predictions add to predictions of totals. One way to achieve this is to specify an aggregative equation for the total where x ϭ ഫ m x m and ␤ ϭ (␤ 1 , ␤ 2 , . . .,␤ M ) T collect the component predictors and parameters. Naturally, such an approach could be used to specify equations for subtotals as well (e.g., stem biomass, inclusive of both bark and wood). If the component models are linear, then Equation 5 will reduce to a congruent linear form, but in the nonlinear case, there is generally no way to simplify the expression. This aggregative approach is at the heart of the two methods described by Parresol (2001), which differ primarily in whether or not observed Y t are used with component data to estimate the elements of ␤ and other model parameters simultaneously. From a likelihood perspective, the system of equations that must be fit in this context is with the error terms conforming to the variance structure of Equation 4 when a Gaussian model is adopted. Owing to the identity defined by Equation 1, the observed totals do not contribute to the likelihood, but total biomass at given levels of x can subsequently be estimated as ensuring additivity.
Pursuing an alternative approach, Tang et al. (2000) noted that it is also possible to begin with the specification of an equation for total biomass and then disaggregate this using multiplicative componentdiscrimination functions. For example, in a case where only stemwood (Y w ), stembark (Y b ), and crown components (Y c ) are differentiated, one might begin by constructing an equation for the expected aboveground biomass Here no constraints on the form or parameters of this function are invoked, at least not for the purposes of achieving additivity. Next, a function g cs (x cs , ␤ cs ) is specified to discriminate between the crown and stem (Y s ϭ Y w ϩ Y b ) fractions of the biomass such that where g cs (x cs , ␤ cs ) maps strictly to the unit interval. For instance, from working component biomass equations [say, h c (x c , ␤ c ) for Y c and h s (x s , ␤ s ) for Y s ], a discrimination function confined to [0,1] can be obtained as In turn, a function g wb (x wb , ␤ wb ) is then developed to discriminate between the stemwood and stembark fractions of the stem biomass In this way, component biomass expectations become complex nonlinear functions of the available predictors, whereas totals (and subtotals) simplify. With the additivity identity of Equation 1 in mind, a valid Gaussian biomass model based on this system of disaggregative equations can be written in the additive-error form with the error terms () having the general variance-covariance structure defined by Equation 4. Again, incorporation of a submodel for observed biomass totals is not necessary.
As developed above and in Dong et al. (2015), this disaggregative biomass equation strategy dichotomizes biomass totals or subtotals into finer components. Yet the same approach could be used to split a (sub)total into 3 or more components (Tang et al. 2000). Regardless, attention must be paid to the parameterization of the discrimination functions if it is important that specific parameters be identifiable. For instance, if the crown and stem component equations on the right hand side of Equation 10 are arbitrary nonlinear functions, then it may be impossible to obtain unique estimates for each of ␤ c , ␤ s , and ␤ t in a simultaneous calibration of Equation set 11. On the other hand, if congruent forms are adopted for those component equations, e.g., then the discrimination function will factor and require a smaller number of parameters (␤ cs, k ϭ ␤ s, k Ϫ ␤ c, k ) to be estimated. This strategy, applied by Dong et al. (2015), reduces the complexity of parameter estimation and allows for (additive) estimation of all components and totals. However, it does not permit recovery of unique values for ␤ c and ␤ s and does constrain the complexity of component equations.
Related to this issue of identifiability, Tang et al. (2001) developed an alternative stochastic formulation to replace the additiveerror structure exemplified by Equation 11. Specifically, they developed an errors-in-variables structure, and Dong et al. (2015) refer to this in the context of their disaggregative biomass equation formulation. However, those studies do not detail how stochastic constraints on the error term for the biomass total (e.g., on its variance or its cross-covariances with components) can be explicitly recognized in estimation. As such, we treat below only the additive-error model structure represented by Equation 11.
In contrasting aggregative and disaggregative model structures, it is worth noting how component biomass observations contribute to the mean structure parameters in ␤. In the aggregative model defined by Equation 6, each component Y m contributes directly to the estimation of its own ␤ m parameter vector and also indirectly (because of cross-correlations among components) to the other vectors ␤ mЈ . In contrast, in the disaggregative framework of Equation 11, the same kinds of indirect contributions occur, but each component Y m contributes information directly to the estimation of ␤ t in Equation 8 as well as to the parameters characterizing one or more discrimination functions. This in turn has consequences for the impact of missing biomass component data, as discussed below.

Maximum Likelihood Estimation
It was noted above that under a likelihood-based approach to estimation only the likelihood of the observed component data must be maximized. That is, regarding the sample trees as independent, all parameters are estimated by maximizing where ϭ ( 1 , 2 , . . ., M ) T and ϭ ( 1, 2 , 1,3 , . . ., MϪ1, M ) T contain the variance and cross-correlation parameters of the component data, Y m ϭ (Y 1m , Y 2m , . . ., Y nm ) T comprises the observed data on component m for all n trees, X collects the covariate values x im from each tree and component, and L i ( ⅐ ) denotes the likelihood function for the ith tree. If there is tree-to-tree dependence in one or more components, then the complete sample likelihood of Equation 12 will not factor into n tree-specific likelihoods but will nonetheless remain a function only of the component observations. Under standard conditions, the ML parameter estimators are asymptotically Gaussian distributed with variance-covariance matrix given by the inverse of the Fisher information matrix (Lehmann and Casella 1998, §6.3). For multivariate Gaussian models, a treatment of ML estimation covering both theory and computational methods is given by Pinheiro and Bates (2000, §7.5). Most notably, the asymptotic variance-covariance matrix of the estimator of the q parameters ␤ in a Gaussian component biomass model such as Equation 6 or 11 can be written where V i is the variance-covariance matrix of component errors (cf. Equation 4) for the ith tree and is the M ϫ q matrix of the tree's component mean functions differentiated with respect to their parameters (and evaluated at the true value of ␤). This variance matrix can itself be estimated by plugging in the ML estimates of ␤ to allow for testing or interval estimates on the parameters themselves. For biomass estimation purposes, point estimates of components and totals can be obtained using the same framework as Equation 7 and the ML estimates of ␤. Denote an estimate of the expected component, subtotal, or total biomass at a given level of x by ⌬ T g(x,␤ ), where ⌬ is a known M ϫ 1 vector (e.g., ⌬ ϭ 1 yields an estimate of total biomass). For a Gaussian biomass model, the variance of the estimate ⌬ T g(x, ␤ ) can be estimated using the delta method where⍀ is obtained by evaluating ⍀ at the ML parameter estimates and For a prediction at the individual-tree level, say Ŷ new ϭ ⌬ T g(x, ␤ ), variance estimation would require that Equation 13 be supplemented by estimates of the intrinsic tree-to-tree variation in component, subtotal, or total biomass. Specifically, for prediction for a tree selected independently of those used to fit the biomass equations, where V corresponds to an evaluation of Equation 4, given the ML parameter estimates and the chosen levels of the predictors x.

Missing Biomass Components
Given both the complexity and cost of obtaining tree biomass data, it is important that parameter estimation techniques are able to accommodate settings where, for some trees, some of the biomass components are not available. Little and Rubin (1987, ch. 1) discuss different mechanisms that may yield incomplete data, as well as the implications for data analysis and inference. It is assumed in the subsequent discussion that cases of incomplete data are instances of data missing at random or missing completely at random. That is, it is assumed that the missingness of any biomass components is the result of a mechanism that is independent of the values of those (or any related) components and dependent only on variables and parameters with no bearing on the relationship between tree biomass and its predictors. These are cases where missingness is uninformative, and the processes leading to incomplete data can be ignored (see Little and Rubin 1987, §5.3). To such cases ML methods are readily extended, particularly in the Gaussian setting.
With multidimensional component biomass measurements, there are two patterns of missingness that should be considered. The first and simplest is where no information for one or more components is available. For instance, if destructive sampling is carried out in leaf-off season or frozen ground prevents extraction of roots, then no information will be available for foliage or root biomass. Furthermore, the total biomass of these trees will remain unobserved. For such trees, the observed data likelihood is obtained by integrating the complete data likelihood over the missing components. More specifically, if the first 2 biomass components are unobserved on a particular tree, then the corresponding observed data likelihood for the tree is proportional to In the Gaussian setting this integral is readily obtained. The result is itself a multivariate Gaussian density with the missing components eliminated from the complete data likelihood, i.e., where y o T ϭ [y 3 , y 4 , . . ., y M ] is the observed data, g o (x, ␤) denotes the mean functions for the observed, components and V o is the matrix obtained by deleting the appropriate (here the first two) rows and columns of V The likelihood function of Equation 12 covering all n trees will then be the product of complete data likelihoods for all trees without missing data and the (reduced) observed data likelihoods of the form of Equation 14 for trees with missing components. Thus, trees with different patterns of missingness will contain information on distinct subsets of model parameters and contribute differently to the overall likelihood.
The other relevant pattern of missingness for tree biomass data occurs when information is available for all components, but some components are observed only in aggregate. For example, with stem biomass components estimated from disc-based measurements, if bark and wood are not separated on some trees' discs then separate estimates of stembark and stemwood may not be available for those trees. Or, if for some trees foliage is not separated from branches or if branches are not divided into specific size classes, then only total crown or total branch mass data will be available. These trees still contain information on all components in the sense that total tree biomass is still observed, but not all components are separately resolved. For these trees, the observed data likelihood takes a different form from Equation 14 but is again obtained by integrating over the complete data likelihood. For example, if the first two components are observed only in aggregate (i.e., Z ϭ Y 1 ϩ Y 2 ), then for that tree the relevant observed data likelihood is f Z, Y3, Y4, . . ., YM ͑ z, y 3 , y 4 , . . ., y M ͒ ϭ ͵ 0 ϱ ͵ 0 zϪy2 f Y1, Y2, . . ., YC ͑ y 1 , y 2 , . . ., y C ͒dy 1 dy 2 Although complex in the general case, this integral is also straightforward to evaluate in the Gaussian setting. This follows from the fact that the sum of Gaussian variables is itself Gaussian, with mean and variance that stem directly from the means and (co)variances of the summands. Thus, f Y1ϩY2, Y3, Y4, . . ., YM ͑ y 1 ϩ y 2 , y 3 , y 4 , . . ., where y a T ϭ [y 1 ϩ y 2 , y 3 , y 4 , . . ., y M ] is the observed (partially aggregated) data, g a (x; ␤) is the vector of mean functions of observed components but with first element g 1 (x 1 ; ␤ 1 ) ϩ g 2 (x 2 ; ␤ 2 ), and V a is the matrix obtained by collapsing the first two rows and columns of V V a ϭ ΄ Once again, these analytical operations need to be undertaken only for trees with aggregated component data. The full data likelihood of Equation 12 will consist of complete data likelihoods for trees without missing data and incomplete data likelihoods for the remaining trees. By applying these results, ML estimation of Gaussian component biomass models with aggregative or disaggregative equations can proceed from data exhibiting either or both of these missingness patterns. As noted above, however, missing component biomass data have different consequences, depending on whether aggregative and disaggregative biomass equations are used. For instance, suppose a particular tree provides stemwood and stembark biomass measurements but no crown biomass. With adoption of aggregative equations, the crown biomass component model must be dropped for that tree, and the tree will contribute no information for the estimation of the parameters governing expected crown mass. Yet if disaggregative equations are used, although the Y c equation will still need to be dropped from a system such as Equation 11, the tree in question will still contribute to the estimation of all ␤ parameters because no elements of ␤ are unique to just one component equation. Conversely, suppose that all trees but one possess measurements of stemwood and stembark biomass, whereas the remaining tree only has a record for undifferentiated stem biomass. In this case, the deviant tree cannot provide information useful for estimation of ␤ wb in a disaggregated model such as Equation 11; its contribution to the joint likelihood will be based on observed deviations from the expected values given by Equation 9. Yet if aggregated biomass equations are used, this tree will still contribute to the estimation of all elements of ␤, although its likelihood would be formed by adding together two of the equations in system 6.

Materials
ML estimation of biomass model parameters is illustrated using the green mass data previously presented by Parresol (2001). The data are from 40 slash pine trees grown in unthinned plantations in Louisiana, USA. Biomass totals are exclusive of 15-cm stumps, and the crown component represents the branch and foliage masses. Figure 1 reproduces the complete data and also identifies components taken as missing in subsequent analyses.

Biomass Models
An aggregative component biomass model for the slash pine data was developed using the biomass equations selected by Parresol (2001). Expressed as exponential functions these are where D and H represent tree dbh and total height and the subscripts c, b, and w identify the crown, stembark, and stemwood equations. For error variance functions, Parresol (2001) recommended were evaluated here. To complete the variance-covariance matrix, an unstructured set of pairwise cross-correlations ( m,mЈ ) was used to describe the associations among component error terms for the same tree, and error terms for distinct trees were presumed independent. Against this, a disaggregative model of the form of Equation system 11 was then specified. This model was based on an equation for expected total biomass, a function to discriminate between crown and stem mass, and a function to discriminate between stembark and stemwood mass. The forms adopted for these equations were Substituting these equations into Equation 11 provides the overall model structure. To complete the model, variances of component errors were characterized using Equation 19 and cross-covariances using a set of unstructured pairwise correlation parameters (as above).

Maximum Likelihood Estimation
The aggregative and disaggregative models were initially fit to the complete set of component data in Figure 1. The nlme package (Pinheiro et al. 2014) for the open-source program R (R Core Team 2014) was used for Gaussian ML. Specifically, as detailed in Appendix A, the gnls function of this package was used to obtain ML estimates. This function operates by alternating between optimization of the likelihood for ␤ conditional on estimates of variance parameters and vice versa (see Pinheiro and Bates 2000, §7.5.2). To allow for simultaneous estimation of all three component biomass equations, the three mean functions (formulated on an exponential scale) were collapsed into a single function using logical flags, and the variance structure was specified using nlme's varComb() function.
For the aggregative model, the starting values of the ␤ parameters were obtained from separate ordinary least-squares calibrations undertaken after logarithmic transformation of both the observed biomass and the component expectation Equations 18. This approach was also used to obtain starting values for ␤ t in the disaggregative model. The starting values for ␤ cs and ␤ wb in that model's discrimination functions were obtained from separate log-linear regressions of Y c /(Y w ϩ Y b ) and Y b /Y w , respectively, on D and H. With starting values for ␤ in hand, model parameters were then estimated by ML using gnls.
After calibrations of the models against the complete data set, the patterns of missingness shown in Figure 1 were simulated, both separately (i.e., only crown component missing or only total stem biomass available) and in combination. The gnls algorithms can readily accommodate missing component data. Specifically, provided that components are consistently (across trees) ordered and identified by integer values (Appendix A), then missing values can be included for unobserved components and gnls will appropriately factor the likelihoods of the affected trees to the form of Equation 15. On the other hand, missingness in the form of aggregated components (e.g., overall stem mass is observed, rather than separate stemwood and stembark masses) cannot be accommodated by standard inputs to this function. In particular, gnls presumes that component scedastic functions and component cross-correlation functions can be specified separately, using nlme's built-in variance and correlation class structures. This imparts computational efficiencies because the variance and correlation parameters can be separated in optimization. However, with this form of missingness the resulting covariance structure given by Equation 17 cannot be factored into variance and correlation matrices in the same way for all trees. One way to address this is to develop a nlme covariance class structure that incorporates both variance and cross-correlation parameters as well as the structure of the data for each particular tree. This is illustrated by an R script in the Supplemental Material that redefines nlme's basic symmetric correlation class to provide a symmetric covariance class taking an additional argument describing the data structure for each tree. This last argument allows the general variance-covariance matrix (Equation 4) to be collapsed to a form (such as Equation 17) according to how the component biomass data themselves have been collapsed. With this covariance structure used as a correlation class in gnls, ML estimates of model parameters can be obtained when both patterns of missingness occur (Appendix B).

Model Comparison and Evaluation
As a matter of standard practice, circumspect evaluation of model-data agreement via graphical and statistical measures is recommended. Here, however, our emphasis is on demonstration of alternative estimation techniques for complete/incomplete data sets with respect to a fixed set of model forms. Therefore, models fit by ML to the complete and simulated-incomplete data are compared in terms of their estimated parameters and SEs. In addition, the resulting biomass estimation surfaces are evaluated and model root mean square errors (RMSEs) are compared. The latter are computed on the original, unweighted scale where the e mi ϭ Y mi Ϫ Ŷ mi are the raw residuals for the mth component. Owing to the generalized variance structure of Equation 19, the raw residuals have unequal variability and are not well suited for residual diagnostics. Better for diagnostic purposes are the normalized residuals r i ϭ V i Ϫ1/2 e i , where e i ϭ (e 1i , e 2i , . . ., e Mi ) T is the vector of raw residuals for the ith tree, V i is the corresponding estimated variance-covariance matrix, and V i Ϫ1/2 is a Choleski factorization of that matrix. Normalized residuals are approximately Gaussian distributed with mean 0 and variance 1 if the model is correctly specified (Pinheiro and Bates 2000, p. 239).
Inasmuch as the RMSE statistic as calculated above makes no adjustment for the numbers of parameters estimated by the various models, we also compute the finite-sample corrected form of AIC (AICc) for each model where d counts the number of parameters in the mean and covariance functions. For comparability, RMSE and AICc are computed using the complete set of observations even for models fit to simulated-incomplete data. Both RMSE and AICc are also computed for the aggregative models fit (on the original scale) by Parresol (2001) using 2-stage (2SLS) and 3-stage (3SLS) least squares. For these models, a Gaussian likelihood was used in AICc calculations along with the published variance functions. AICc for the 2SLS fit used the cross-correlation parameter estimates presented by Parresol (p. 870). Estimates of these parameters for the 3SLS fit were not published but were provided in computer code supplied by B.R. Parresol (USDA Forest Service, pers. comm., May 2001). Table 1 alongside the 2SLS (Agg-2SLS) and 3SLS (Agg-3SLS) estimates given in Parresol (2001). Both the estimates themselves and their estimated SEs changed only slightly when the missing data patterns of Figure 1 were imposed (Agg-ML miss ). The estimated SEs of the crown equation parameter estimates all increased, whereas those of the stembark and stemwood equation estimates fell, albeit to a lesser degree. The increased SEs for ␤ c presumably result from the complete loss of crown biomass data for 7 trees. On the other hand, the stemwood and stembark equation estimates are still informed by data from all 40 trees, although to a lesser extent by the 7 trees with only aggregated stem biomass data. The estimated SEs reported for Parresol's MSLS fits are similar to those from the ML fits, although those for his Agg-3SLS fit are generally the smallest. For the disaggregative system, estimates of the parameters characterizing total tree biomass varied less between the complete (Dis-ML) and incomplete (Dis-ML miss ) data fits than did estimates of the stem-crown or stemwood-stembark discrimination parameters ( Table 2). The same was true of the estimated SEs, although these increased across the board when missingness was simulated. Under the Dis-ML miss model, all 40 trees contribute to the estimation of ␤ t even when the missing data patterns of Figure 1 are Supplementary data are available with this article at http://dx.doi.org/10.5849/forsci.15-126.

Table 1. Estimated component biomass equation parameters and
SEs for the aggregative model obtained by ML from the complete (Agg-ML) and incomplete (Agg-ML miss ) data as well as from the MSLS fits (Agg-2SLS and Agg-3SLS) reported in Parresol (2001 imposed. However, the seven trees missing crown biomass provide less information for estimating ␤ sc , and the seven trees lacking separate stembark and stemwood biomass measurements provide no information for estimating ␤ wb . Estimates of covariance parameters by model and data set are given in Table 3. The variance parameter estimates reinforce the heteroscedastic nature of biomass observations, both across components ( w1 are approximately double b1 or c1 ) and across tree sizes ( ⅐2 Ͼ 0). Table 3 does not show estimated SEs, but the ML estimates of the cross-correlation between stemwood and stembark biomass were consistently more than 2 estimated SEs from 0, whereas the other cross-correlation estimates were consistently within 2 estimated SEs of 0. After Equations 18 were fit separately by 2SLS, Parresol (2001, p. 870) recovered cross-correlations of 0.29 (crown-stembark), 0.01 (crownstemwood), and 0.19 (stemwood-stembark). The first and last of these are different from the ML estimates in Table 3. Estimated cross-correlations under his Agg-3SLS fit were not reported but were communicated as 0.013 (crown-stembark), Ϫ0.115 (crown-stemwood), and 0.283 (stemwood-stembark). These agree more closely with the ML correlations in Table 3.
Despite variations in functional forms and parameter estimates, component and total tree biomass estimates were similar across models. The most pronounced differences between the Agg-ML and Dis-ML estimates were for trees with low heights given their dbh. For instance, applying Agg-ML coefficients, trees with dbh 20 cm and height 15 m are estimated to have, on average g c ͑20, 15; ␤ c ͒ ϭ exp ͓Ϫ3.559 ϩ 3.522 ln 20 Ϫ 1.122 ln 15͔ ϭ 52.1 kg g b ͑20; ␤ b ͒ ϭ exp ͓Ϫ3.146 ϩ 2.236 ln 20͔ ϭ 34.8 kg g w ͑20, 15; ␤ w ͒ ϭ exp ͓Ϫ4.156 ϩ 1.064 ln͑20 2 15͔͒ ϭ 163.7 kg of biomass in their crowns, stembark, and stemwood components, for an estimated aboveground biomass of 250.6 kg. Applying Equation 13, the estimated SEs for these components are 4.8, 0.8, and 2.5 kg, respectively, and 5.6 kg for the total. With the Dis-ML model, the total biomass is estimated as with respective estimated SEs of 4.7, 1.7, and 6.8 kg. For a given dbh, the respective estimates from the Agg-ML and Dis-ML models converge as height is increased.
Looking across the range of the predictors, RMSE differed little across models and estimation routines ( Table 4). As expected, the RMSE for total biomass was lower for models fit to the complete data set. However, this was not the case for all components: for some components the RMSE was the same as or lower for Agg-ML miss or Dis-ML miss relative to those for Agg-ML or Dis-ML. This can occur because, as calculated here, RMSE is an unweighted measure of data-model agreement, and each model fit is associated with a unique set of weight (i.e., variance) functions (cf. Table 3). Figures 2 and 3 show Gaussian probability plots of the normalized residuals from the Agg-ML and Dis-ML models. The distributions of residuals from these and other models had light left tails. Assessed against 100 parametric bootstrap resampling results, however, the normalized residual plots exhibited no pronounced outliers or deviations from Gaussian densities. Table 5 provides measures of model-data agreement in terms of   Gaussian probabilities. By these measures, the ML-estimated models provided better descriptions of the slash pine data set, with Agg-ML providing the best characterization after accounting for the dimension of the parameter vector. As expected, the models obtained from the incomplete data were associated with smaller log-likelihoods as these were evaluated using all the data. The models obtained by Parresol (2001) had smaller likelihoods and larger AICc than any ML fits. This is not surprising given that the MSLS fits were not obtained by optimizing a Gaussian probability model. Indeed, the structure implied by the model Parresol used to obtain the Agg-3SLS parameter estimates does not correspond with a probability model of any kind. We emphasize, however, that this does not imply that his MSLS models are inferior to the ML models obtained here in terms of predictive accuracy. The RMSEs of the models are very similar (Table 4) such that the differences from a likelihood or AIC perspective may be due in large part to differences in the fitted covariance structures.

Discussion
Aggregative and disaggregative biomass equations can be developed and fit within statistical frameworks built on valid probability models. Doing so requires that the additivity of the biomass data themselves be recognized. In turn, this implies that the dependence structure of biomass data collected from the same tree be described only for the observable components and that only the component biomass observations (not observed totals or subtotals) be used in parameter estimation. This departs from previous presentations of aggregative and disaggregative systems (Parresol 2001, Dong et al. 2015 where component data as well as their (conditionally redundant) totals are used in MSLS estimation procedures. In principle, valid probability models could also be calibrated using observations of M Ϫ 1 biomass components plus the total, but doing so would   require that additional constraints be imposed to preclude negative estimates of the Mth component. Notably, these considerations also carry over to the linear modeling context. If a system of component biomass models has a valid probabilistic structure, its parameters can be estimated by ML. This is advantageous when a credible probability law can be advanced because ML then allows for efficient use of the available data. It also facilitates the use of information-based criteria for model selection and is crucial for Bayesian model development and inference. In addition, it allows for consistent treatment of partially observed component biomass data. Specifically, if component data are missing (or aggregated) at random (cf. Little and Rubin 1987, ch. 1), then ML remains fully efficient. For a given data set, its use obviates the need to drop trees or impute missing component data. For instance, the proposed approach can be readily applied where high excavation costs preclude measurement of belowground biomass on all sample trees. More broadly, its use can facilitate assimilation of data sets collected by researchers measuring different biomass components. For example, one destructive sampling effort may yield only estimates of total branch biomass, whereas another may identify the distribution of branch biomass across several size-classes. By recognizing the aggregated nature of the former data set, the two could be pooled to jointly estimate branch biomass within each size-class. Of course, in evaluating partially observed component biomass data, one must carefully consider whether the missingness mechanism is informative or not. Trees lacking observations of foliage biomass owing to defoliation by insects may, for example, also have had biomass distributions that predisposed them to attack. In such cases, the missingness mechanism itself must be modeled if partially observed data are to be pooled with complete records (Little and Rubin 1987, ch. 11).
Joint calibration of systems of biomass component equations allows for the use of cross-component dependencies in parameter estimation. In principle, this improves estimator accuracy at the system level. At the same time, by minimizing a system-level loss function, one potentially sacrifices a greater precision that might be obtained for a particular component or subtotal if it were fit separately. This could be disadvantageous if a particular component or subtotal is of overriding importance. For instance, in the application studied here, a model for total slash pine biomass might yield more accurate estimates if fit exclusively to the observed totals (see e.g., Parresol 1999). In addition, more accurate estimates of totals may be obtained by fitting an additive system of equations but using both the component data and the observed totals. For example, Parresol's (2001) 3SLS procedure reduced the RMSE for total biomass relative both to his 2SLS procedure and the ML fit obtained here (Table 4). It achieves this by reweighting the component residuals in the objective function. For ML the latter is a quadratic function of the component residuals weighted by the variance functions of those components and their cross-correlations; for 3SLS the component residuals are also weighted by a specified variance function for the total and an additional set of cross-correlations (between components and totals). In this indirect reweighting of component residuals, as well as in the fact that it implies an invalid probability model, it is a curious (albeit effective) way to seek improved precision for totals. An alternative strategy using ML is to fix or premultiply elements of the component variance functions (e.g., the 1m in Equation 19) to reflect greater importance of some components vis-à-vis others. This allows for importance-based weighting to overcome uncertainty-based weighting, although it is unclear how this could be used to improve estimates of (sub)totals as opposed to individual components.
Joint ML estimation of nonlinear component biomass models is readily achieved using open-source software, as documented above, at least in the Gaussian setting. Moreover, in this same setting, missing or aggregated component observations can be accommodated by exploiting the extendable structure of the nlme package for R. This is important given the common invocation, implicit or explicit, of Gaussian models in biomass modeling. Nonetheless, generalization of these methods beyond the standard Gaussian model would serve to facilitate nonlinear mixed-effects modeling to account for dependence across trees as well as the characterization of fundamentally non-Gaussian data. This is the subject of continuing research.