Anthony R Ives, R|$^{2}$|s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs, Systematic Biology, Volume 68, Issue 2, March 2019, Pages 234–251, https://doi.org/10.1093/sysbio/syy060
Abstract
Many researchers want to report an |$R^{2}$| to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an |$R^{2}$| faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the |$R^{2}$| to include the variance explained by the covariances by asking questions such as “How much of the data is explained by phylogeny?” Here, I investigated three |$R^{2}$|s for phylogenetic and mixed models. |$R^{2}_{resid}$| is an extension of the ordinary least-squares |$R^{2}$| that weights residuals by variances and covariances estimated by the model; it is closely related to |$R^{2}_{glmm}$| presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133–142). |$R^{2}_{pred}$| is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. |$R^{2}_{lik}$| is based on the likelihood of fitted models, and therefore, reflects the amount of information that the models contain. These three |$R^{2}$|s are formulated as partial |$R^{2}$|s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial |$R^{2}$|s compare a full model with a reduced model without components of the full model, they are distinct from marginal |$R^{2}$|s that partition additive components of the variance. I assessed the properties of the |$R^{2}$|s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the |$R^{2}$|s are designed broadly for any model for correlated data, I also compared |$R^{2}$|s for linear mixed models and generalized linear mixed models. |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{lik}$| all have similar performance in describing the variance explained by different components of models. However, |$R^{2}_{pred}$| gives the most direct answer to the question of how much variance in the data is explained by a model. |$R^{2}_{resid}$| is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. And |$R^{2}_{lik}$| is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
Researchers often want to calculate a coefficient of determination, an |$R^{2}$|, to give a measure of the amount of variance in their data explained by a statistical model. For ordinary least-squares models (OLS), such as regression and analysis of variance (ANOVA), the |$R^{2}$| is simple to calculate and interpret. Many types of models, however, assume that the errors among response variables are correlated. Phylogenetic generalized least squares models (PGLS) allow the possibility of phylogenetically related species being more similar to each other, leading to phylogenetic correlations in the errors (Felsenstein, 1985; Garland et al., 1992; Martins and Hansen, 1997). PGLS models are structurally similar to linear mixed models (LMMs) that include random effects to account for correlations in the residual variation; for example, LMMs can account for correlation between residuals of experimental replicates within the same block (Gelman and Hill, 2007; Bolker et al., 2009). Interpretation of an |$R^{2}$| for models for discrete response variables, such as phylogenetic logistic regression models and generalized linear mixed models (GLMMs), adds a second complexity. Even a perfectly fitting model will have residual variation due to the discreteness of the data, which raises the question of how to formulate an |$R^{2}$| when the variance in the data can never be completely explained.
Formally, correlated errors in statistical models for continuous or discrete data cause two issues for defining an |$R^{2}$|. The first involves assessing the goodness-of-fit of predictor variables (fixed effects) in terms of the explained variance. For standard OLS models, the errors are assumed to be independent and identically distributed, and therefore the variance in the residuals can be calculated directly to give the total variance that is not explained by the model. In models for correlated data, however, the errors are not independently distributed. Therefore, to calculate the “unexplained variance” given by the residuals, it is necessary to deal with the covariances among errors; applying the OLS |$R^{2}$| to estimates from a model with covariances among errors gives values that have no lower bound (they can be arbitrarily negative), rather than being bounded below by zero (Judge et al., 1985, p. 32).
The second issue for defining an |$R^{2}$| involves assessing the goodness-of-fit of the covariances (random effects) estimated in the model. For phylogenetic models, this is embodied by the question “How much of the data is explained by phylogeny?” The difficulty is that a phylogenetic model can be used to estimate the strength of phylogenetic signal (the magnitude of the covariances) in the errors, but the phylogenetic signal does not directly quantify the proportion of the total (rather than residual) variance explained. For example, in PGLS Pagel’s |$\lambda $| branch-length transform (Pagel, 1997; Housworth et al., 2004) partitions the variance of the residuals into phylogenetic and non-phylogenetic parts; this is similar to LMMs that partition variances into random effects and residuals. In contrast, in OLS the |$R^{2}$| partitions the total variance in |$Y$| into that part explained by the predictor variables (fixed effects) and the residual variation. It is not immediately clear what it means in PGLS for a phylogeny to “explain” the total variance of the data in a way comparable to the predictor variables in OLS.
Here, I assess three |$R^{2}$|s for models that specify non-zero covariances among errors. Although these definitions of |$R^{2}$|s are broad enough to encompass any model specifying an error covariance matrix, I will focus on application to phylogenetic models for continuous (PGLS) and binary (PGLMM) data. In addition, I will compare the properties of the |$R^{2}$|s applied to phylogenetic data to their properties when applied to mixed models. This comparison shows the generalizability of the |$R^{2}$|s and validates them as viable measures of goodness-of-fit for a broad class of models. This is important, because |$R^{2}$|s should make it possible to assess and compare as wide a range of models as possible (Kvalseth, 1985).
The three |$R^{2}$|s are designed as partial |$R^{2}$|s that compare a full model with a reduced model in which one or more of the parameters are removed; partial |$R^{2}$|s measure the explained variance that is lost when the full model is reduced. The overall |$R^{2}$|s are obtained by comparing the full model to the simplest reduced model in which there is only an intercept and the residuals are independent. Partial |$R^{2}$|s have the advantage of making it possible to ask about the contribution of a single component, or a subset of components, to the fit of a model. This makes it possible to exclude coefficients in a model that are not of explicit interest; for example, many phylogenetic models for species traits include body size as one of the predictor variables in order to factor out body size, and partial |$R^{2}$|s make it possible to assess the goodness-of-fit for the remaining predictor variables. By comparing a model with a phylogeny to a model without, partial |$R^{2}$|s also make it possible to answer the question “How much of the data is explained by phylogeny?”
|$R^{2}$|s can be assessed on multiple grounds (Kvalseth, 1985), and here I use three. First, does the |$R^{2}$| give a good measure of the fit of a model to data? To serve as a basis for assessment, I use the log-likelihood ratio (LLR) of the full and reduced models. The LLR approaches a |$\chi^{2}$| distribution for large samples and is therefore used for hypothesis tests of full versus reduced models (Judge et al., 1985). Also, the LLR is linearly related to the Akaike Information Criterion (AIC) and other measures used for model selection (Burnham and Anderson, 2002). Therefore, the LLR is a natural choice to assess |$R^{2}$|s: a good |$R^{2}$| should be monotonically related to the LLR. Second, can the |$R^{2}$| identify the contributions of different components of the model, specifically the component attributed to predictor variables and the component attributed to phylogenetic or random effects, to the overall model fit? The partial |$R^{2}$| corresponding to the phylogenetic or random effect should give an indication of the statistical significance of these covariance structures in the data. Third, does the |$R^{2}$| give similar values when applied to data sets generated by the same statistical process? If so, the |$R^{2}$| gives a precise measure of goodness-of-fit.
Materials and Methods
Here, I present three |$R^{2}$|s (|$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{lik}$|) that can be applied to a broad class of models in which the variance structure of the residuals contains covariances. As a strategy, I will begin with a detailed discussion of |$R^{2}_{resid}$|, with the goal of explaining the challenges of defining an |$R^{2}$| for correlated data as well as possible solutions. |$R^{2}_{resid}$| is based on the variance of the residuals of a fitted model and is related to |$R^{2}_{glmm}$| (Nakagawa and Schielzeth, 2013), and contrasting |$R^{2}_{resid}$| with |$R^{2}_{glmm}$| generates a discussion of what partial |$R^{2}$|s reveal about a fitted model. I will then give briefer presentations of |$R^{2}_{pred}$| and |$R^{2}_{lik}$|. All |$R^{2}$|s are computed in the R package rr2 (Ives and Li, 2018).
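As a minimal sketch of the intended workflow, the code below simulates a simple LMM of the kind in equation (1) and computes total and partial |$R^{2}$|s by comparing full and reduced fits. It assumes the rr2 accessor functions are named R2.lik(), R2.resid(), and R2.pred(), accept lme4 model objects, and default the reduced model to an intercept-only fit, as described by Ives and Li (2018); newer package versions may use underscore names (e.g., R2_lik()), so check the package documentation.

```r
library(lme4)   # lmer() for the LMM
library(rr2)    # R2.lik(), R2.resid(), R2.pred() (names assumed; see lead-in)

set.seed(1)
taxon <- factor(rep(1:8, each = 10))              # 8 taxa, 10 observations each
b <- rnorm(8, sd = 1)                             # taxon random effect, sigma2_b = 1
x <- rnorm(80)
y <- 1 * x + b[taxon] + rnorm(80, sd = sqrt(1.5)) # beta_1 = 1, sigma2_e = 1.5

mod.f  <- lmer(y ~ x + (1 | taxon), REML = FALSE) # full model
mod.rx <- lmer(y ~ 1 + (1 | taxon), REML = FALSE) # reduced: x removed
mod.rb <- lm(y ~ x)                               # reduced: random effect removed

R2.lik(mod.f)          # total R2_lik (reduced model defaults to intercept-only)
R2.lik(mod.f, mod.rx)  # partial R2_lik for x
R2.lik(mod.f, mod.rb)  # partial R2_lik for the taxon random effect
```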
$R^{2}_{resid}$
There is an extensive literature on |$R^{2}$|s for GLMs and LMMs, and a growing literature for GLMMs (Buse, 1973; Cameron and Windmeijer, 1996, 1997; Kenward and Roger, 1997; Menard, 2000; Xu, 2003; Kramer, 2005; Edwards et al., 2008; Liu et al., 2008; Orelien and Edwards, 2008; Nakagawa and Schielzeth, 2013; Jaeger et al., 2017); this literature forms the basis for the |$R^{2}$|s that can be applied to phylogenetic models. The three |$R^{2}$|s take three different approaches to defining “explained variance”, the same general approaches considered for LMMs by Xu (2003). The |$R^{2}$| discussed first, |$R^{2}_{resid}$| (for residual variance), is similar to |$R^{2}_{glmm}$| presented by Nakagawa and Schielzeth (2013) and related work (Edwards et al., 2008; Jaeger et al., 2017; Nakagawa et al., 2017). Therefore, I will present |$R^{2}_{resid}$| first in application to GLMMs and then in application to phylogenetic models.
Figure 1. Depictions of the covariance matrices from LMM, PGLS, GLMM, and PGLMM models. In the covariance matrix for LMMs, |$\sigma^{2}\boldsymbol\Sigma(\sigma^{2}_{b})=\sigma^{2}_{b}\boldsymbol\Sigma_{b}+\sigma^{2}\textbf{I}$|, the variance of the random effect |$\sigma^{2}_{b}$| is scaled against the variance of the residual errors. In the PGLS with a Pagel's |$\lambda$| branch-length transform, the covariance matrix is |$\sigma^{2}\boldsymbol\Sigma(\lambda)=\sigma^{2}\lambda\boldsymbol\Sigma_{BM} + \sigma^{2}(1-\lambda)\textbf{I}$|, in which |$\lambda$| determines the strength of phylogenetic signal in the residual errors. For GLMMs, the variance of the random effect is given by |$\boldsymbol\Sigma(\sigma^{2}_{b})=\sigma^{2}_{b}\boldsymbol\Sigma_{b}$|, and there is additional variance |$\sigma^{2}_{w[i]}$| owing to the discreteness of the data. Similarly, for PGLMM, phylogenetic signal enters the model as a covariance matrix |$\sigma^{2}_{b}\boldsymbol\Sigma_{BM}$|, with additional variance |$\sigma^{2}_{w[i]}$|.
Table 1. Illustrative simulation comparing |$R^{2}$|s fitted to two LMMs (equation 1) that differ in whether the predictor variable |$x$| shows weak taxonomic signal (|$\sigma^{2}_{bx} = 0.2$|) or strong taxonomic signal (|$\sigma^{2}_{bx} = 0.8$|)
| | Full | Reduced | |${R^{2}_{\rm glmm.partial}}$| | |${R^{2}_{\rm resid}}$| | |${R^{2}_{\rm pred}}$| | |${R^{2}_{\rm lik}}$| | |${R^{2}_{\rm glmm(c)}}$| | |${R^{2}_{\rm glmm(m)}}$| | |${R^{2}_{\rm glmm(r)}}$| |
|---|---|---|---|---|---|---|---|---|---|
| |$\sigma^{2}_{bx} = 0.2$| | |$(\beta _{1}, \sigma^{2}_{b})$| | ||||||||
| Total tax. | (0,1) | (0,0) | 0.29 | 0.29 | 0.34 | 0.16 | |||
| Partial |$x$| | (1,1) | (0,1) | 0.25 | 0.26 | 0.26 | 0.25 | |||
| Total | (1,1) | (0,0) | 0.48 | 0.47 | 0.52 | 0.38 | 0.48 | 0.23 | 0.24 |
| Partial tax. | (1,1) | (1,0) | 0.30 | 0.31 | 0.36 | 0.19 | |||
| Total |$x$| | (1,0) | (0,0) | 0.24 | 0.24 | 0.24 | 0.24 | |||
| |$\sigma^{2}_{bx} = 0.8$| | |||||||||
| Total tax. | (0,1) | (0,0) | 0.36 | 0.36 | 0.41 | 0.22 | |||
| Partial |$x$| | (1,1) | (0,1) | 0.04 | 0.04 | 0.02 | 0.07 | |||
| Total | (1,1) | (0,0) | 0.38 | 0.38 | 0.42 | 0.28 | 0.38 | 0.18 | 0.21 |
| Partial tax. | (1,1) | (1,0) | 0.26 | 0.25 | 0.31 | 0.13 | |||
| Total |$x$| | (1,0) | (0,0) | 0.17 | 0.17 | 0.17 | 0.17 |
Partial |$R^{2}$|s were computed for all nested models, ordered in rows with the total |$R^{2}$| in the middle. Full and reduced models are specified by the presence or absence of the two parameters (|$\beta_{1}, \sigma^{2}_{b}$|). |$R^{2}_{glmm.partial}$| is the partial |$R^{2}_{glmm(c)}$| (equation 5). |$R^{2}_{glmm}$| was computed for the total |$R^{2}$| (|$R^{2}_{glmm(c)}$|, equation 2) and marginal |$R^{2}$|s (|$R^{2}_{glmm(m)}$|, equation 3; |$R^{2}_{glmm(r)}$|, equation 4); because |$R^{2}_{glmm(m)}$| and |$R^{2}_{glmm(r)}$| are computed from the full model, they are placed on the same row as the total |$R^{2}$|s from the other methods. Simulations were performed with |$\beta_{1} = 0$| or 1, with eight levels of |$b$| (|$\sigma^{2}_{b} = 0$| or 1) and 10 observations per level (|$\sigma^{2}_{e} = 1.5$|). The mean |$t$|-scores for the fixed effect when there was weak and strong taxonomic signal in |$x$| were 5.23 and 1.61, respectively, which correspond to |$P < 10^{-5}$| and |$P = 0.11$|. Although the values reported are from single simulations, they are close to the means obtained from 1000 simulations.
To illustrate the difference between marginal and partial |$R^{2}$|s, consider the case of the model from equation (1) in which, in addition to |$Y$|, the values of |$x$| are associated with taxon (i.e., |$x$| has taxonomic signal). Specifically, let |$x=b_{x}+e_{x}$|, where |$b_{x}$| is a random effect for taxa that has the same structure as the random variable |$b$| for |$Y$| in equation (1). To generate a contrast between marginal and partial |$R^{2}$|s, let |$x$| have either weak (|$\sigma^{2}_{bx} = 0.2, \sigma^{2}_{ex} = 0.8$|) or strong (|$\sigma^{2}_{bx} = 0.8, \sigma^{2}_{ex} = 0.2$|) taxonomic signal (Table 1). The marginal values of |$R^{2}_{glmm}$| (|$R^{2}_{glmm(m)}$| and |$R^{2}_{glmm(r)}$| from equations 3 and 4) are little affected by the strength of taxonomic signal in |$x$|. In contrast, the partial |$R^{2}$| values for |$x$| from |$R^{2}_{glmm.partial}$| (equation 5) and |$R^{2}_{resid}$| (equation 6) are much lower when there is strong taxonomic signal in |$x$|; this is also true for |$R^{2}_{pred}$| and |$R^{2}_{lik}$|, which are introduced later. The pattern seen in the |$R^{2}$|s corresponds to the statistical significance of the coefficient for |$x$| in the model: the average |$t$|-scores for the fixed effect when there was weak and strong taxonomic signal in |$x$| were 5.23 and 1.61, respectively, which correspond to |$P < 10^{-5}$| and |$P = 0.11$|. Thus, the lower significance of the fixed effect with strong taxonomic signal in |$x$| is reflected in the lower partial |$R^{2}$|s. Note finally that |$R^{2}_{glmm.partial}$| and |$R^{2}_{resid}$| have very similar values, implying that in this case the sum of all variance estimates used in equation (5) is close to var(|$Y$|) in equation (6).
The explanation for taxonomic signal in |$x$| affecting partial |$R^{2}$|s is that, in the reduced model with |$x$| removed, the taxonomic signal in values of |$Y$| that was accounted for by |$x$| in the full model is absorbed into the random effect |$b$| (equation 1) in the reduced model. This can be seen by looking at the taxonomic effect model [full model: (|$\beta _{1}, \sigma ^{2}_{b}) = (0,1)$|; reduced model: (|$\beta _{1}, \sigma ^{2}_{b}) = (0,0)$|]. For the case with strong taxonomic signal in |$x$|, the total |$R^{2}$| for the taxonomic effect model (0.36) is similar to the total |$R^{2}$| for the full model (0.38), implying that the taxonomic signal in the residuals can explain most of the taxonomic signal in the data |$Y$|.
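To make the setup concrete, the following sketch generates one data set of the kind summarized in Table 1, using the parameter values reported there (eight taxa, 10 observations per level, |$\beta_{1} = 1$|, |$\sigma^{2}_{b} = 1$|, |$\sigma^{2}_{e} = 1.5$|); it is illustrative rather than the exact simulation script.

```r
set.seed(2)
n.tax <- 8; n.rep <- 10; n <- n.tax * n.rep
taxon <- factor(rep(1:n.tax, each = n.rep))

s2.bx <- 0.8                                      # strong taxonomic signal in x
b.x <- rnorm(n.tax, sd = sqrt(s2.bx))             # taxon effect on x
x <- b.x[taxon] + rnorm(n, sd = sqrt(1 - s2.bx))  # x = b_x + e_x, total variance 1

b <- rnorm(n.tax, sd = 1)                         # taxon effect on Y, sigma2_b = 1
y <- 1 * x + b[taxon] + rnorm(n, sd = sqrt(1.5))  # beta_1 = 1, sigma2_e = 1.5
```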
The error term |$e_{i}$| has a multivariate Gaussian distribution with mean 0 and covariance matrix |$\sigma^{2}\boldsymbol\Sigma(\theta)$| that may depend on a vector of parameters |$\theta$|. In PGLS, the structure of |$\sigma^{2}\boldsymbol\Sigma(\theta)$| is typically generated under a specific model of evolution, such as an Ornstein–Uhlenbeck model of evolution under selection to an optimum (Martins and Hansen, 1997; Blomberg et al., 2003; Lavin et al., 2008). The definitions of |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{lik}$| are not limited by the form of the evolutionary transform, although their implementation for phylogenetic models for discrete data is limited by the availability of methods for fitting discrete phylogenetic models. I will present the |$R^{2}$|s using Pagel's |$\lambda$| branch-length transform (Pagel, 1997; Housworth et al., 2004), which makes implementation easiest. In this transform, |$\boldsymbol\Sigma(\lambda)$| is the sum of two matrices, |$\boldsymbol\Sigma(\lambda) = \lambda\boldsymbol\Sigma_{BM} + (1-\lambda)\textbf{I}$|, where |$\boldsymbol\Sigma_{BM}$| is the covariance matrix derived under the assumption of Brownian motion (BM) evolution (Felsenstein, 1985; Grafen, 1989), and |$\textbf{I}$| is the identity matrix. If |$\lambda = 1$|, then the covariance between errors at two tips on the phylogenetic tree is proportional to the height of their most recent common ancestor. If |$\lambda = 0$|, the errors are uncorrelated, and |$0 < \lambda < 1$| gives intermediate levels of phylogenetic signal. The effect of |$\lambda$| on the covariances among errors can be depicted by lengthening the tip branches of the BM phylogenetic tree to make up a proportion |$\lambda$| of the base-to-tip distance (Fig. 1, PGLS). Although the phylogenetic tree in Figure 1 is ultrametric, the calculations of |$R^{2}$|s are the same for trees with non-contemporaneous tips.
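As an illustration of the transform, the sketch below builds |$\boldsymbol\Sigma(\lambda)$| from a tree with ape; it multiplies the off-diagonal covariances by |$\lambda$|, which is equivalent to the formula above for a tree scaled to unit base-to-tip depth. The tree here is hypothetical.

```r
library(ape)  # rtree(), vcv()

phy <- rtree(30)                  # hypothetical 30-species tree
Sigma.BM <- vcv(phy)              # Brownian-motion covariance matrix
lambda <- 0.5                     # intermediate phylogenetic signal

Sigma.lam <- lambda * Sigma.BM    # scale the shared (off-diagonal) covariances
diag(Sigma.lam) <- diag(Sigma.BM) # leave the variances unchanged
```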
The LMM in equation (1) has the same statistical structure as equation (7), since the covariance matrix for the LMM can be written |$\sigma^{2}_{e}\boldsymbol\Sigma(\sigma^{2}_{b})=\sigma^{2}_{b}\boldsymbol\Sigma_{b}+\sigma^{2}_{e}\textbf{I}$|, where |$\boldsymbol\Sigma_{b}$| is a block-diagonal matrix whose values are 1 for each row |$i$| and column |$j$| corresponding to species within the same taxon and 0 elsewhere (Gelman and Hill, 2007). Although random effects in LMMs are often explained as if they were randomly varying fixed effects, technically they occur in the variance–covariance matrix of the error term. Nonetheless, in most fitting algorithms, such as that used by lmer in the lme4 R package (Bates et al., 2015), the values of |$b_{i}$| at each level of the random effect are computed; technically, these values are the conditional expectations of |$b_{i}$| as opposed to fitted coefficients. LMMs can be fit as if they were phylogenetic models, which gives results identical to fitting them as LMMs (Ives, 2018).
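For example, the block-diagonal |$\boldsymbol\Sigma_{b}$| just described can be constructed directly; a sketch with eight taxa and 10 observations each:

```r
tax <- rep(1:8, each = 10)                    # taxon index for 80 observations
Sigma.b <- outer(tax, tax, "==") * 1          # 1 if same taxon, 0 otherwise
V <- 1 * Sigma.b + 1.5 * diag(length(tax))    # sigma2_b = 1, sigma2_e = 1.5
```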
For the phylogenetic model with Pagel's |$\lambda$| transform, a possible approach to defining an |$R^{2}$| is to treat |$(1-\lambda)\sigma^{2}$| as the unexplained variance, as is done for |$\sigma^{2}_{e}$| in the LMM (equation 6). This approach fails, however. The problem is that the evolutionary null hypothesis is BM phylogenetic signal (|$\lambda = 1$|), and estimates of |$\lambda$| commonly equal 1 in real data sets. When this occurs, the total |$R^{2}_{resid} = 1$| (equation 6), and consequently all partial |$R^{2}$|s will be 1 or undefined (0 divided by 0). This problem does not occur for the LMM, because |$R^{2}_{resid}$| only equals 1 if there is no variation in |$Y$| among species within the same taxa; while this is not impossible, it will essentially never occur. An exception is models that incorporate measurement error, in which tip lengths can go to zero when estimates of measurement error in |$Y$| are too high (Ives et al., 2007); |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{lik}$| can be extended for measurement-error models, although this is not described here. The problem of defining the unexplained (residual) variation needed to compute |$R^{2}_{resid}$| (equation 6) is clearer for other branch-length transforms, such as the OU transform, in which |$\sigma^{2}\boldsymbol\Sigma(\theta)$| does not decompose into linear terms as it does under Pagel's |$\lambda$| transform.
A solution to this problem involves scaling |$\boldsymbol\Sigma(\theta)$|. Because the total variance in the model, |$\sigma^{2}\boldsymbol\Sigma(\theta)$|, is the product of |$\sigma^{2}$| and |$\boldsymbol\Sigma(\theta)$|, the matrix |$\boldsymbol\Sigma(\theta)$| can be rescaled by a constant without changing the fit of the statistical model; the only effect of multiplying |$\boldsymbol\Sigma(\theta)$| by a constant is to change the value of |$\sigma^{2}$| by 1/constant in the fitted model. Even though this does not affect model fitting, it does affect an |$R^{2}$| that is based on |$\sigma^{2}$|. In general, |$\boldsymbol\Sigma(\hat\theta_{f}) \ne \boldsymbol\Sigma(\hat\theta_{r})$| for the full and reduced models, so |$\boldsymbol\Sigma(\theta)$| does not cancel out when dividing |$\hat\sigma^{2}_{e}$| by |$\hat\sigma^{2}_{e.r}$| (equation 6). To solve this problem, I propose scaling |$\boldsymbol\Sigma(\theta)$| so that the sum of branch lengths of the corresponding phylogenetic tree equals 1. For a fitted tree with strong phylogenetic signal, scaling |$\boldsymbol\Sigma(\theta)$| to have the sum of branch lengths equal to 1 will make the base-to-tip distances greater than for a fitted tree with no phylogenetic signal. Because this scaling increases the diagonal elements of |$\boldsymbol\Sigma(\theta)$| when there is greater phylogenetic signal, it will reduce the estimate of |$\sigma^{2}$| and decrease the variance in the residuals that is unexplained by the model. Although this is only a convention (as opposed to a scaling derived from theory), the resulting |$R^{2}_{resid}$| performs for phylogenetic models in a similar way as it does for LMMs.
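The scaling convention itself is a one-line operation on the fitted tree; a sketch with a hypothetical tree:

```r
library(ape)  # rtree(), vcv()

phy <- rtree(30)                                           # hypothetical fitted tree
phy$edge.length <- phy$edge.length / sum(phy$edge.length)  # rescale branch lengths
sum(phy$edge.length)                                       # now equals 1
Sigma.scaled <- vcv(phy)  # covariance matrix under the proposed convention
```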
Table 2. Simulation example of sprint speed regressed on log body mass (|$x_{1}$|) and log hind limb length (|$x_{2}$|)
| | Phylogenetic signal in |$x_{2}$| || No phylogenetic signal in |$x_{2}$| ||
|---|---|---|---|---|
| Coefficient | Estimate | |$P$| | Estimate | |$P$| |
| Intercept | 0.34 | 0.58 | 0.28 | 0.65 |
| |$x_{1}$| | 1.17 | |$<10^{-5}$| | 1.00 | |$<10^{-5}$| |
| |$x_{2}$| | 0.21 | 0.22 | 0.20 | 0.0038 |
| |$\lambda$| | 1 | | 1 | |
| N = 30 | logLik = |$-$|16.92 | | logLik = |$-$|16.96 | |
| | Phylogenetic signal in |$x_{2}$| ||| No phylogenetic signal in |$x_{2}$| |||
|---|---|---|---|---|---|---|
| Reduced model | |$R^{2}_{resid}$| | |$R^{2}_{pred}$| | |$R^{2}_{lik}$| | |$R^{2}_{resid}$| | |$R^{2}_{pred}$| | |$R^{2}_{lik}$| |
| |$x_{2}$| = 0 | 0.05 | -0.05 | 0.05 | 0.16 | 0.22 | 0.26 |
| |$\lambda = 0$| | 0.40 | 0.65 | 0.51 | 0.46 | 0.65 | 0.56 |
| |$x_{1}=x_{2} =\lambda = 0$| | 0.86 | 0.92 | 0.89 | 0.78 | 0.86 | 0.82 |
For the simulation, the regression coefficients for log(body mass) and log(hind limb length) were |$\beta_{1} = 1$| and |$\beta_{2} = 0.5$|, and the intercept was |$\beta_{0} = 0$| (equation 8); residual variation was given by BM evolution, so |$\lambda = 1$|. Hind limb length was simulated under BM evolution (left columns) or as a Gaussian random variable (right columns), and in both cases the variance of hind limb length was standardized to 1.
An alternative approach for determining |$\sigma^{2}_{d}$| is to use the GLM iterated weights |$\sigma^{2}_{w[i]}$|, since these are associated with the unexplained variance in the Gaussian space of the GLMM (equation 9). Because |$\sigma^{2}_{w[i]}$| will differ among data points, it is necessary to take an average; the appropriate average is the geometric mean (Appendix). This leads to using |$\sigma^{2}_{d}=\sigma^{2}_{w} = \exp(E\{\log \sigma^{2}_{w[i]}\})$|. For a binary GLMM, using either |$\sigma^{2}_{d} = 0.8768809\pi^{2}/3$| or |$\sigma^{2}_{d}=\sigma^{2}_{w}$| leads to very similar values of |$R^{2}_{resid}$| for both logit and probit link functions. Nonetheless, because the values of |$\sigma^{2}_{d}$| differ, so do the values of |$R^{2}_{resid}$|, which are lower when scaling by |$\sigma^{2}_{d} = \sigma^{2}_{w}$|. Throughout the remainder of this article, the binary GLMM will use a logit link function with |$\sigma^{2}_{d} = 0.8768809\pi^{2}/3$|, which is most comparable to |$R^{2}_{glmm(c)}$|, although |$\sigma^{2}_{d}=\sigma^{2}_{w}$| is the default in the R package rr2.
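A sketch of the two choices of |$\sigma^{2}_{d}$| for a binary logit model, assuming the iterated weights are |$w_{i} = \mu_{i}(1-\mu_{i})$| so that |$\sigma^{2}_{w[i]} = 1/w_{i}$| (the standard weights for a canonical-link binomial GLM); the simulated data are hypothetical:

```r
set.seed(3)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(1.5 * x))    # binary data under a logit model
mod <- glm(y ~ x, family = binomial)

mu <- fitted(mod)                       # fitted probabilities
s2.w.i <- 1 / (mu * (1 - mu))           # per-observation latent-scale variances
s2.w <- exp(mean(log(s2.w.i)))          # geometric mean: sigma2_w
s2.d.const <- 0.8768809 * pi^2 / 3      # the fixed constant used in the text
```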
$R^{2}_{pred}$
|$R^{2}_{pred}$| (for prediction) is based on the variance of the difference between observed and predicted data, |$\frac{1}{n}\sum(Y_i-\hat Y_i)^{2}$|. This approach comes conceptually closest to answering the question of how much variation in the data is explained by the covariances in the model, where the “explanation” is measured by the prediction of the residuals. For the case of LMMs, the predicted values |$\hat Y_i$| are taken as the sum of the fixed-effect values and the conditional estimates of the values of the random effects. For the LMM in Figure 1, this corresponds to the estimated values at the polytomies formed at the nodes shared by all observations within the same level of the random effect (i.e., taxon). As the number of observations within each level of the random effect increases, |$R^{2}_{pred}$| for LMMs converges to |$R^{2}_{resid}$|, because the estimates of the values of the random effects become more precise.
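For an LMM this is a two-line computation once the model is fitted; a self-contained sketch with simulated data (in lme4, predict() includes the conditional modes of the random effects by default):

```r
library(lme4)

set.seed(4)
taxon <- factor(rep(1:8, each = 10))
b <- rnorm(8); x <- rnorm(80)
y <- x + b[taxon] + rnorm(80, sd = sqrt(1.5))

mod <- lmer(y ~ x + (1 | taxon))
y.hat <- predict(mod)          # fixed effects + conditional random-effect estimates
1 - var(y - y.hat) / var(y)    # R2_pred
```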
The same approach as used for LMMs can be used for GLMMs and PGLMMs. In these cases, the variances are calculated for untransformed values of |$Y_{i}$|, rather than in the Gaussian space of |$g(\mu _{i})$| as was done for |$R^{2}_{resid}$|. Therefore, one should expect that they may take different values. The predicted values are given from the estimation algorithms for GLMMs by glmer() in the R package lme4 (Bates et al., 2015) and for PGLMMs by binaryPGLMM() (Ives and Helmus, 2011; Ives and Garland, 2014) in the R packages ape (Paradis, 2012) and rr2 (Ives and Li, 2018). In PGLMMs, the estimates |$\hat Y_i$| correspond to those values a distance |$\sigma ^{2}_{w}$| from the tips of the tree in Figure 1.
$R^{2}_{lik}$
Here, |$\textbf{Y}$| is the |$n \times 1$| vector of response values |$Y_{i}$|, |$\textbf{X}$| is the |$n \times p$| matrix for |$p$| predictor variables (including the intercept), and |$\hat\beta$| is the |$p \times 1$| vector of estimated regression coefficients (fixed effects). Technically, GLS models do not have covariances that depend on parameters |$\theta$| (even though “PGLS” is used to refer to models that do), so the SS in equation (15) is conditioned on |$\theta$|.
I derived |$R^{2}_{lik}$| defined by equation (17) for PGLS models, although it can be extended to any model that is fit using maximum likelihood. |$R^{2}_{lik}$| has been proposed for logistic regression (Cragg and Uhler, 1970; Maddala, 1983; Cox and Snell, 1989) and generalized by Magee (1990) and Pinheiro and Chao (2006), and used for LMMs by Kramer (2005). For discrete data, equation (17) does not have a maximum of one, because the maximum attainable log-likelihood for discrete data is zero. Therefore, Nagelkerke (1991) and Cameron and Windmeijer (1996) proposed dividing by the maximum attainable value, which is equation (17) with |$logLik(\hat\theta_f)=0$|; throughout, I have used this standardization.
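Equation (17) is not reproduced here, but given the cited lineage (Cox and Snell, 1989; Magee, 1990; Nagelkerke, 1991) it has the standard likelihood-ratio form; a sketch under that assumption, with the standardization for discrete data applied on request:

```r
## ll.f, ll.r: log-likelihoods of the full and reduced models; n: sample size.
r2.lik <- function(ll.f, ll.r, n, discrete = FALSE) {
  r2 <- 1 - exp(-(2 / n) * (ll.f - ll.r))  # Cox & Snell / Magee form
  if (discrete) {
    ## Nagelkerke standardization: divide by the maximum attainable value,
    ## which is the same expression evaluated with ll.f = 0.
    r2 <- r2 / (1 - exp((2 / n) * ll.r))
  }
  r2
}
```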
|$R^{2}_{lik}$| is computed using maximum likelihood (rather than REML) estimates for LMM, PGLS, and GLMM models. A complication occurs for binary PGLMM models, because the algorithm used by binaryPGLMM(), from which |$R^{2}_{resid}$| and |$R^{2}_{pred}$| are calculated, uses quasi-likelihoods and does not give a true maximum likelihood. Therefore, for binary phylogenetic models I calculated |$R^{2}_{lik}$| using the ML-based function phyloglm() in the R package phylolm (Ho and Ane, 2014), fitting the model with penalized ML but then using the provided log-likelihood values to calculate |$R^{2}_{lik}$|. The function phyloglm() fits a phylogenetic logistic regression (Ives and Garland, 2010), rather than the PGLMM given by equation (9), although for simplicity I will still refer to this as a binary PGLMM model in the text.
Simulations for Assessment
The simulations to assess the statistical properties of the |$R^{2}$|s applied to LMM, PGLS, GLMM, and PGLMM all follow the same strategy. For each, data were simulated for the cases with variation in a predictor variable and no covariances (|$\beta _{1} > 0, \theta = 0$|), only covariances (|$\beta _{1} = 0, \theta > 0$|) or both (|$\beta _{1} > 0, \theta > 0$|). For each case, the model parameters were the same for all simulations, so that variation in values of a given |$R^{2}$| among data sets is caused by random sampling from the same statistical process.
For the PGLS model, to obtain the covariance matrix |$\boldsymbol\Sigma(\theta)$| in equation (7), I first simulated random phylogenetic trees using the rtree() function of the ape package of R (Paradis et al., 2004). Thus, a different tree was simulated for each data set. The strength of phylogenetic signal was varied using Pagel's |$\lambda$| transform. Values of |$x_{i}$| were simulated under the BM assumption using the rTraitCont() function (Paradis et al., 2004). The simulated data were fit by penalized maximum likelihood with the function phylolm(), assuming a Pagel's |$\lambda$| transform, in the package phylolm in R (Ho and Ane, 2014). The PGLMM was similar to the PGLS model, but in contrast the predictor variable |$x_{i}$| was assumed to be independently distributed; including phylogenetic signal in |$x_{i}$| caused challenges for model fitting for some simulated data sets, making the simulation studies difficult. Phylogenetic signal in the residuals |$e_{i}$| was controlled by setting |$\boldsymbol\Sigma(\theta)=\theta\boldsymbol\Sigma_{BM}$|, so that in the absence of phylogenetic signal (|$\theta = 0$|) the simulations conformed to simple logistic regression. To simulate binary data, a logit link function was used.
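A sketch of one PGLS replicate along these lines (shown for the |$\lambda = 1$| case, with BM residuals; the published simulations varied |$\lambda$| and used their own script):

```r
library(ape)      # rtree(), rTraitCont()
library(phylolm)  # phylolm()

set.seed(5)
phy <- rtree(100)                  # a new random tree for each data set
x <- rTraitCont(phy)               # predictor simulated under BM
e <- rTraitCont(phy)               # BM residuals (the lambda = 1 case)
y <- 1 * x + e                     # beta_1 = 1
d <- data.frame(y = y, x = x, row.names = names(x))
fit <- phylolm(y ~ x, data = d, phy = phy, model = "lambda")
```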
Results
The |$R^{2}$|s were assessed according to three properties: (i) their ability to measure goodness-of-fit as benchmarked by the LLR of the full model and the model with only an intercept, (ii) whether they can identify separate sources of variation in the model, and (iii) the precision of their inference about goodness-of-fit. Property (iii) treats the |$R^{2}$|s as if they were statistical estimators of goodness-of-fit and asks how variable the estimates are when applied to repeated simulations from the same model (e.g., Cameron and Windmeijer, 1996). A more comprehensive assessment is given in Supplementary Section 1 available on Dryad at http://dx.doi.org/10.5061/dryad.345v6.
Goodness-of-Fit
All |$R^{2}$|s were positively related to the LLR (Fig. 2), which is a minimum requirement for an |$R^{2}$|. |$R^{2}_{lik}$| shows a monotonic relationship with LLR, which is necessarily the case due to the definition of |$R^{2}_{lik}$| (equation 17). For the remaining |$R^{2}$|s, values for a given LLR were generally lower for simulations in which variation was produced only by the fixed effect (|$\beta_{1} > 0, \theta = 0$|; Fig. 2, circles). This implies that, relative to the LLR, these |$R^{2}$|s were attributing less “explained” variance to fixed effects than random effects.
Figure 2. Results for LMM, PGLS, GLMM, and PGLMM simulations giving |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, |$R^{2}_{lik}$|, and the OLS |$R^{2}_{adj}$| versus the LLR between the full model and a reduced model containing only an intercept. All simulated data had 100 samples. For LMM, the simulation model (equation 18) contained a fixed effect with |$\beta_{1} = 0$| or 1, and a random effect |$u_{i}$| with 10 levels and variance |$\theta = 0$| or 1.5. The binomial (binary) GLMM was similar but with |$\beta_{1} = 0$| or 1.8, and |$\theta = 0$| or 1.8. For PGLS, |$\beta_{1} = 0$| or 1, and the strength of phylogenetic signal |$\theta = \lambda = 0$| or 0.5; for PGLMM, |$\beta_{1} = 0$| or 1.5, and |$\theta = 0$| or 2. The LMM was fit using lmer() (Bates et al., 2015); the GLMM was fit using glmer() (Bates et al., 2015); the PGLS was fit using phylolm() (Ho and Ane, 2014); and for PGLMM, the LLR and |$R^{2}_{lik}$| were obtained using phyloglm() (Ho and Ane, 2014), and |$R^{2}_{resid}$| and |$R^{2}_{pred}$| were obtained using binaryPGLMM() (Ives and Garland, 2014). For reduced models without variance parameters, fitting was done with lm() and glm() in R.
For the LMM, I included the adjusted |$R^{2}, R^{2}_{adj}$|, computed from OLS regression by treating the random effect as a categorical fixed effect. |$R^{2}_{resid}$| and |$R^{2}_{adj}$| were almost identical. This correspondence implies that |$R^{2}_{resid}$| gives an |$R^{2}$| that is interpretable in the same way as the standard |$R^{2}_{adj}$| but generalized to LMMs.
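The benchmark itself is straightforward to reproduce; a sketch with simulated data matching the LMM setup above:

```r
set.seed(6)
taxon <- factor(rep(1:8, each = 10))
b <- rnorm(8); x <- rnorm(80)
y <- x + b[taxon] + rnorm(80, sd = sqrt(1.5))

## Treat the random effect as a categorical fixed effect and use OLS:
summary(lm(y ~ x + taxon))$adj.r.squared
```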
All of the |$R^{2}$|s other than |$R^{2}_{lik}$| showed greater scatter in their relationships with the LLR for the simulations of binary data (GLMM and PGLMM). In part, this is due to the difficulty of estimating the variance parameters |$\theta$| in binomial models. The scatter seems particularly large for |$R^{2}_{resid}$| and |$R^{2}_{pred}$| applied to PGLMM simulations, although this case requires some technical discussion. For PGLMM, the LLR was obtained from phyloglm() using penalized maximum likelihood, whereas |$R^{2}_{resid}$| and |$R^{2}_{pred}$| were estimated from the model fit by binaryPGLMM() using the pseudo-likelihood. The phyloglm() estimate of phylogenetic signal, |$\lambda$|, tended to be absorbed at zero even when the estimate of |$\lambda$| from binaryPGLMM() was positive; therefore, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| could be positive even when the LLR was zero. A previous comparison between phyloglm() and binaryPGLMM() showed that they have similar performance but do not necessarily give the same conclusions about the presence of phylogenetic signal for the same data set (Ives and Garland, 2014).
Identifying Sources of Variation
The partial |$R^{2}_{resid}, R^{2}_{pred}$|, and |$R^{2}_{lik}$| were generally able to identify sources of variation between components of a model, specifically between regression coefficients (fixed effects) and covariance parameters (random effects); this is seen in the generally non-overlapping clustering of symbols in Figure 3. Simulations with |$\beta _{1} > 0$| and |$\theta = 0$| should have partial |$R^{2}$|s for |$\beta _{1}$| that are positive and partial |$R^{2}$|s for |$\theta $| that are zero (circles, Fig. 3). Simulations with |$\beta _{1} = 0$| and |$\theta > 0$| should have partial |$R^{2}$|s for |$\beta _{1}$| that are zero and partial |$R^{2}$|s for |$\theta$| that are positive (triangles, Fig. 3). Simulations with |$\beta _{1} > 0$| and |$\theta > 0$| should have both partial |$R^{2}$|s positive (|$x$|’s, Fig. 3). Because in the simulations the values of |$\beta _{1}$| and |$\theta $| were the same whether or not the other was zero, the partial |$R^{2}$|s for |$\beta _{1}$| should be the same for simulations with |$\theta = 0$| (circles) as for simulations with |$\theta > 0$| (|$x$|’s), and the partial |$R^{2}$|s for |$\theta $| should similarly be the same for |$\beta _{1} = 0$| (triangles) and |$\beta _{1} > 0$|. For continuous data (LMM and PGLS), all three |$R^{2}$|s had similar performance and similar values of the partial |$R^{2}$|s (see also Supplementary Section 1 available on Dryad). For binary data (GLMM and PGLMM), the three |$R^{2}$|s showed more scatter, which in large part is due to the greater statistical challenge of estimating regression coefficients and variance parameters from discrete data. This is seen, for example, in the GLMM and PGLMM simulations with |$\beta _{1} > 0$| and |$\theta > 0$| in which the partial |$R^{2}_{lik}$| for |$\theta$| was occasionally zero (x’s); these cases occur when the estimate of |$\theta$| was zero even though a non-zero value was used in the simulations.
Figure 3. Results for LMM, PGLS, GLMM, and PGLMM simulations giving partial values of |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, |$R^{2}_{lik}$|, and |$R^{2}_{adj}$|. The partial |$R^{2}$| for |$\beta_{1}$| was calculated using the reduced model in which |$\theta$| is removed, and for the partial |$R^{2}$| for |$\theta$| the reduced model had |$\beta_{1}$| removed. The simulated data and fitting methods are the same as in Figure 2.
Inference about the Underlying Process
The ability of |$R^{2}$|s to infer the fit of a model to its assumed statistical process depends on the precision of the estimates of |$R^{2}$|. Figure 4 plots the mean values of the |$R^{2}$|s with 66% and 95% inclusion intervals for simulated data sets with sample sizes |$40, 60, \ldots , 160$|. For LMMs and GLMMs, there were 10 levels of the random effect; data sets were produced by first simulating 160 samples (16 replicates at each level) and then randomly removing two replicates at each level to reduce the sample size in steps of 20. For PGLS and PGLMM, each data set at each sample size was simulated independently.
Figure 4. Results for LMM, PGLS, GLMM, and PGLMM simulations showing means and 66% and 95% inclusion intervals for |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, |$R^{2}_{lik}$|, and |$R^{2}_{adj}$| versus sample size. For all simulations, 1000 data sets were analyzed at each sample size. Parameter values were: LMM, |$\beta_{1} = 1, \theta = 1.5$|; PGLS, |$\beta_{1} = 1, \theta = 0.5$|; GLMM, |$\beta_{1} = 1.8, \theta = 1.8$|; and PGLMM, |$\beta_{1} = 1.5, \theta = 2$|.
For LMM simulations, |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{adj}$| showed similar patterns (Fig. 4), reflecting the fact that they give very similar values (Fig. 2, Supplementary Section 2 available on Dryad). Mean values did not change much with sample size, and there was only a moderate increase in variability among simulations with decreasing sample size. In contrast, mean values of |$R^{2}_{lik}$| decreased with decreasing sample size, probably reflecting the information lost when estimating the model parameters. The PGLS simulations showed less change in the means of |$R^{2}_{lik}$| with sample size, presumably because there were more covariances among samples (i.e., the covariance matrix had more non-zero elements) than in the LMM with few replicates per level.
For the GLMM, both |$R^{2}_{resid}$| and |$R^{2}_{pred}$| had somewhat higher variances (less precision) than |$R^{2}_{lik}$|. The greater variation in values of |$R^{2}_{resid}$| and |$R^{2}_{pred}$| compared with |$R^{2}_{lik}$| may occur because |$R^{2}_{resid}$| and |$R^{2}_{pred}$| depend directly on parameter estimates from the models, whereas |$R^{2}_{lik}$| depends only on the overall likelihood. Thus, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| are compromised when the estimates are poor, as is particularly the case when sample sizes are small. For PGLMM, the variances in |$R^{2}_{resid}$| and |$R^{2}_{pred}$| were similar to those in |$R^{2}_{lik}$| (Fig. 4). This is likely because estimates of phylogenetic signal (|$\lambda = \theta$|) were well-bounded, in contrast to the variance of the random effects in the GLMMs.
Discussion
|$R^{2}_{resid}, R^{2}_{pred}$|, and |$R^{2}_{lik}$| were presented here with focus on phylogenetic models, although they are broadly applicable to models with correlated errors. Below, I first address their use as measures of goodness-of-fit, and then their specific application to phylogenetic models using mixed models as a reference. I end with general recommendations.
Partial R$^{2}$ and Goodness-of-Fit
The |$R^{2}$|s presented here are designed to give partial |$R^{2}$|s and are consequently measures of goodness-of-fit. Applications of standard |$R^{2}$|s in regression and ANOVA all rely on partial |$R^{2}$|s that compare the amount of variation explained by a full model to that explained by a reduced model with terms from the full model removed. Comparisons between full and reduced models underlie model selection, either by classical methods such as stepwise selection based on partial F-tests or by methods involving model selection criteria such as AIC. Partial |$R^{2}$|s are also closely associated with the statistical significance of a component of a model; this is especially the case for |$R^{2}_{lik}$|, which is computed from log-likelihoods and is therefore directly related to likelihood ratio tests.
The partial |$R^{2}_{resid}$|, |$R^{2}_{pred}$|, and |$R^{2}_{lik}$| differ conceptually from the marginal |$R^{2}_{glmm(r)}$| (equation 3) derived for GLMMs (Nakagawa and Schielzeth, 2013). While marginal |$R^{2}$|s are useful in partitioning variances among components of fitted models, partial |$R^{2}$|s answer the question of how important the components are, where importance is measured by the loss of explanatory power following their removal. Because marginal |$R^{2}$|s do not compare full and reduced models, they do not give information about how the fit of a model declines as components are removed. A final advantage of partial |$R^{2}$|s is that they make it easy to investigate any combination of fixed and random effects, and their interactions, by comparing full and reduced models; in contrast, computing and interpreting marginal |$R^{2}$|s for complex random effects, such as random slopes, is less straightforward (Johnson, 2014).
Partial |$R^{2}$|s can answer the question of how important phylogenies are in explaining the pattern of covariance in the data. Does this mean that partial |$R^{2}$|s are themselves measures of phylogenetic signal (sensu Blomberg et al., 2003) or, equivalently, phylogenetic heritability (Lynch, 1991)? Phylogenetic signal could be computed, for example, by comparing the reduced model containing no phylogenetic term (e.g., with Pagel's |$\lambda = 0$|) with the full model including phylogeny. However, in the phylogenetic literature, the term “phylogenetic signal” is associated with the magnitude of the phylogenetic correlations in the branch-length transform (e.g., Pagel's |$\lambda$|) that is estimated during model fitting (Blomberg et al., 2003; Revell et al., 2008). Phylogenetic signal is akin to the marginal |$R^{2}_{glmm(r)}$|, which is based on partitioning variances into a correlated component (due to random effects) and uncorrelated residuals. Phylogenetic signal is thus a description of the phylogenetic correlations, rather than a measure of how important the phylogenetic correlations are to the goodness-of-fit of the model. Therefore, I think it is best to make a clear conceptual distinction between partial |$R^{2}$|s, which give information about the statistical significance of phylogenetic correlations in the model, and phylogenetic signal, which gives a measure of their magnitude.
Applications to LMM, PGLS, GLMM, and PGLMM
For both continuous data (LMM and PGLS) and discrete data (GLMM and PGLMM), all |$R^{2}$|s had good performance, without one standing out as obviously the best. For the simple model with a single regression coefficient |$\beta_{1}$| and a single covariance parameter |$\theta$| (equation 18), all |$R^{2}$|s were reasonable measures of goodness-of-fit, as assessed against the LLR between full and reduced models (Fig. 2). Nonetheless, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| gave lower |$R^{2}$| values for variation explained by fixed effects (|$\beta_{1}$|) than by random effects (|$\theta$|) in comparison to |$R^{2}_{lik}$| and the LLR (Fig. 2). Also, although all three |$R^{2}$|s gave very similar values for the same data set with continuous data, the values differed more for discrete data fit with either GLMM or PGLMM (Fig. 2 and Supplementary Section 1 available on Dryad). This is reflected in general by the decreased precision of the |$R^{2}$|s applied to discrete data, as measured by the variation in values when fit to data simulated under the same parameter values (Fig. 4). All |$R^{2}$|s were capable of identifying whether |$\beta_{1}$| or |$\theta$| was responsible for the fit of the model to the data, as determined by the partial |$R^{2}$|s; when |$\beta_{1}$| or |$\theta$| was zero in the simulations, the partial |$R^{2}$| for |$\beta_{1}$| or |$\theta$|, respectively, was low (Fig. 3). However, the partial |$R^{2}$|s for GLMM and PGLMM tended to be more variable and less conclusive than for LMM and PGLS (Fig. 3). Also, partial |$R^{2}_{resid}$| and |$R^{2}_{pred}$| can sometimes be negative, whereas negative values of |$R^{2}_{lik}$| are not possible as long as the full and reduced models are fitted correctly. Finally, |$R^{2}_{lik}$| decreased as sample sizes decreased, especially for LMMs but also for GLMMs (Fig. 4). This is an understandable consequence of the loss of information with which to separate full and reduced models when there are fewer data. Nonetheless, it is an undesirable property when comparing across data sets with different sample sizes.
The poorer performance of all three |$R^{2}$|s for GLMM and PGLMM relative to LMM and PGLS, in terms of both identifying sources of variation (Fig. 3) and precision (Fig. 4), is due to the greater challenges of fitting discrete data. This will affect the |$R^{2}$|s differently if they differ in their sensitivity to the fitting procedure: |$R^{2}_{resid}$| is calculated from the fitted variances of a model, |$R^{2}_{pred}$| from the fitted values of |$Y_{i}$|, and |$R^{2}_{lik}$| from the likelihood, so each is sensitive to the precision with which its own attribute is estimated. For GLMMs, the precision of |$R^{2}_{lik}$| was slightly greater than that of the other two |$R^{2}$|s, although this did not appear to be the case for PGLMM. Nonetheless, for GLMMs and PGLMMs, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| can take negative values, especially for small sample sizes, while this is never the case for |$R^{2}_{lik}$|.
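The following sketch shows, for the hypothetical LMM fit full above, the three attributes that the respective |$R^{2}$|s draw on; the prediction-based calculation at the end is only illustrative of the spirit of |$R^{2}_{pred}$|, not its exact definition for correlated data:

```r
## For the hypothetical fit 'full' above, the attribute each R2 draws on:
y    <- getME(full, "y")   # observed response
yhat <- fitted(full)       # fitted values              -> R2_pred
VarCorr(full)              # fitted variance components -> R2_resid
logLik(full)               # model likelihood           -> R2_lik

# Illustrative prediction-based R2 (the spirit, not the exact definition, of R2_pred)
1 - var(y - yhat) / var(y)
```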
Recommendations
An ideal |$R^{2}$| would make it possible to compare among different models and among different methods used to fit the same model (properties 4 and 5 of a good |$R^{2}$| in Kvalseth, 1985). |$R^{2}_{resid}$| and |$R^{2}_{pred}$| can be used for any model and fitting method that estimates the covariance matrix (|$R^{2}_{resid}$|) and/or fitted values (|$R^{2}_{pred}$|); for example, they could be used to compare LMMs fit with ML vs. REML, or binary phylogenetic models fit with ML (e.g., phyloglm; Ho and Ane, 2014) or quasi-likelihood (e.g., binaryPGLMM; Ives and Garland, 2014). Nonetheless, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| have a disadvantage in terms of generality: for |$R^{2}_{resid}$| a decision must be made about how to weight the covariance matrix |$\boldsymbol\Sigma(\theta)$| (equation 9), and for |$R^{2}_{pred}$| a decision must be made about how values of |$Y_{i}$| are predicted. In contrast, |$R^{2}_{lik}$| is restricted to models that are fit with ML estimation; however, if ML is used, then values of |$R^{2}_{lik}$| can be compared across any types of data and models.
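As an illustration of this portability, a hypothetical LMM can be refit by REML and by ML and a prediction-based |$R^{2}$| (again in the spirit of |$R^{2}_{pred}$|) computed from each, because only fitted values are needed; the model and data are the placeholders used above:

```r
## Hypothetical LMM refit by REML and by ML; a prediction-based R2 needs only
## fitted values, so it can be compared across the two fitting methods.
library(lme4)

m.reml <- lmer(y ~ x + (1 | g), data = d, REML = TRUE)
m.ml   <- refitML(m.reml)  # identical model, ML estimation

r2pred <- function(m) {
  y <- getME(m, "y")
  1 - var(y - fitted(m)) / var(y)
}
c(REML = r2pred(m.reml), ML = r2pred(m.ml))
```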
An ideal |$R^{2}$| should also be intuitive (Kvalseth, 1985, property 1); however, intuitive is in the eye of the beholder. |$R^{2}_{resid}$| is similar to the OLS |$R^{2}$|, which grounds it in the familiar and intuitive OLS framework. |$R^{2}_{lik}$| is also related to the OLS |$R^{2}$|: for LMMs and PGLS, |$R^{2}_{lik}$| is the same as the GLS |$R^{2}$| if the determinant of |$\sigma^{2}\boldsymbol\Sigma(\theta)$| is scaled to equal one. |$R^{2}_{pred}$| predicts the data from the covariances estimated in the model, and therefore it relates the variance explained by regression coefficients (fixed effects) to that explained by variance parameters (random effects) in the most direct way; I think this makes it the most intuitive |$R^{2}$|. Therefore, if the question is how much variation in the data is explained by a model, or by different components of a model, then I think |$R^{2}_{pred}$| gives the most direct answer.
An ideal |$R^{2}$| would make it possible to compare the fit of one model to one data set, and the same model to another data set; in other words, it should give an overall sense of how well a model fits. For this, |$R^{2}_{resid}$| and |$R^{2}_{pred}$| have the advantage over |$R^{2}_{lik}$| in being relatively insensitive to sample size (Fig. 4). Because |$R^{2}_{lik}$| is based on the likelihood ratio between full and reduced models, all else being equal, it will decrease for smaller data sets because there is less statistical power. If some data sets are very small, |$R^{2}_{resid}$| has the advantage over |$R^{2}_{pred}$| because for very small samples, |$R^{2}_{pred}$| is more prone to give negative values (although negative values of |$R^{2}_{resid}$| are possible for discrete data). Therefore, for broad comparisons among different data sets using the same model, I would use |$R^{2}_{resid}$|.
|$R^{2}$|s are often used as “summary statistics” to describe the fit of a model to data in a way that does not involve statistical inference about the underlying stochastic process that generated the data: “How does the model fit these data?” rather than “How much does the model infer about the process that generated the data?” Should |$R^{2}$|s be judged as a summary statistic? I think not. All the |$R^{2}$|s showed high variation among simulations of the same model with the same parameters, especially when sample sizes were small (Fig. 4). This means that how the model fits a specific data set involves a lot of chance, and hence one should not get too excited about a high |$R^{2}$|, or too discouraged about a low one. |$R^{2}$|s are best treated as inferential statistics, that is, as functions of a data-generating process that are themselves random variables (Cameron and Windmeijer, 1996; Nakagawa and Schielzeth, 2013). As an inferential statistic, |$R^{2}_{lik}$| most directly ties to hypothesis testing between full and reduced models using a likelihood ratio test, and therefore it is particularly valuable as a partial |$R^{2}$|. I think this makes |$R^{2}_{lik}$| the best choice for investigating how well a specific model fits a specific data set using partial |$R^{2}$|s.
Supplementary Material
Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.345v6.
Acknowledgments
I thank Ted Garland, Daijiang Li, Shinichi Nakagawa, Lucas Nell, Eric Pedersen, Joe Phillips, Bo Wang, Gang Wang, and especially Pierre de Villemereuil for wonderfully insightful comments that helped to clarify this article.
Funding
This work was supported by the US National Science Foundation [DEB-LTREB-1052160 and DEB-1240804].
Appendix
Scaling Variances in GLMMs and PGLMMs
GLMMs and PGLMMs are formulated without a residual variance. Nonetheless, in algorithms used to fit GLMMs and PGLMMs, terms arise that can be interpreted as residual errors. These are important for the formulation of |$R^{2}$|s in two ways. First, they explain why PGLMMs do not suffer the complications that arise for PGLS in defining |$R^{2}$|s. The “residual errors” of PGLMMs never disappear, in contrast to the non-phylogenetic residuals in PGLS models that often go to zero. Second, the “residual errors” of GLMMs and PGLMMs provide a way of computing the scaling variance |$\sigma ^{2}_{d}$| in |$R^{2}_{resid}$| (equation 11).
Figure A1. Accuracy of three scaling methods for the random effects variance in the binary GLMM in equation (A1). In the left panel, |$\sigma^{2}_{b}$|[logit]/|$\sigma^{2}_{d.NS}$| is plotted against |$\sigma^{2}_{b}$|[probit], where |$\sigma^{2}_{d.NS}=\pi^{2}/3$|. In the center panel, |$\sigma^{2}_{b}$|[logit]/|$\sigma^{2}_{d.rNS}$| is plotted against |$\sigma^{2}_{b}$|[probit], where |$\sigma^{2}_{d.rNS} = 0.8768809\pi^{2}/3$|. In the right panel, |$\sigma^{2}_{b}$|[logit]/|$\sigma^{2}_{w}$|[logit] is plotted against |$\sigma^{2}_{b}$|[probit]/|$\sigma^{2}_{w}$|[probit], where |$\sigma^{2}_{w}$|[logit] and |$\sigma^{2}_{w}$|[probit] are given in equation (A3). One hundred data sets with 10 levels of the random effect and a total sample size of 1000 were simulated and fit using the code provided in Supplementary Section 2 available on Dryad.
Residual Errors in Fitting GLMMs and PGLMMs
In GLMM fitting algorithms including those of Schall (1991), Breslow and Clayton (1993), and the Laplace algorithm used in glmer() (Pinheiro and Chao, 2006), the model is fit by constructing a covariance matrix in the Gaussian space (second line in equation A1) given by |$\textbf{V} = \textbf{W}^{-1} + \sigma ^{2}_{b}\textbf{C}$|. Here, |$\sigma ^{2}_{b}\textbf{C}$| is the covariance matrix for the random effects design matrix. |$\textbf{W}^{-1}$| is the diagonal matrix containing the theoretical variances of |$\varepsilon_{i}g'(\mu _{i})$| where |$\varepsilon_{i} = (y_{i}-\mu _{i})$| is the residual in the data space and |$g'(\mu _{i})$| is the derivative of the link function evaluated at the conditional value |$\mu _{i}$| corresponding to datum |$y_{i}$|. The matrix W also appears in algorithms used to estimate GLMs, and its diagonal elements are known as the GLM iterated weights (McCullagh and Nelder, 1989). Because the |$\varepsilon_{i}g'(\mu _{i})$|s never all go to zero in a well-defined model, they are treated as “residual errors” in the Gaussian space.
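A minimal numerical sketch of this construction for a logit link follows; the conditional means mu and the random-effects structure C are hypothetical placeholders:

```r
## Numerical sketch of V = W^{-1} + sigma2_b * C for a logit link;
## 'mu' and 'C' are hypothetical placeholders.
mu <- c(0.2, 0.5, 0.7)            # conditional means for three data points
# For the logit link, g'(mu) = 1/(mu(1 - mu)) and Var(eps) = mu(1 - mu), so
# Var(eps * g'(mu)) = 1/(mu(1 - mu)): the diagonal elements of W^{-1}.
Winv     <- diag(1 / (mu * (1 - mu)))
sigma2_b <- 0.5                   # random-effect variance (placeholder)
C        <- diag(3)               # random-effect covariance structure (placeholder)
V        <- Winv + sigma2_b * C
```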
The algorithms of Schall (1991) and Breslow and Clayton (1993) implement this expression iteratively, using the “working” estimates of |$\mu _{i}$| to compute the residuals |$\varepsilon_{i}g'(\mu _{i})$|, and then treating equation (A2) as a GLS regression to estimate |$\beta +b_{i}$| and from these produce “updated” estimates of |$\mu _{i}$|. It is necessary to do this iteratively, because in general the link function |$g(\mu _{i})$| is nonlinear, so the residuals |$\varepsilon_{i}g'(\mu _{i})$| depend on |$\mu _{i}$|. The same approach was adapted for phylogenetic models by Ives and Helmus (2011). Because |$\mu _{i}$| are generally different for each datum |$i$|, the variances of |$\varepsilon_{i}g'(\mu _{i})$|, denoted |$\sigma ^{2}_{w[i]}$|, will also differ for each |$i$|. This is depicted in Figure 1 as variable lengths of |$\sigma ^{2}_{w[i]}$| added to the tips of the covariance trees for the GLMM and PGLMM.
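In outline, one iteration of this working-response update for a logit link might look as follows; X, Z, beta, b, and y are all hypothetical objects:

```r
## One working-response update for a logit link, in the spirit of Schall (1991)
## and Breslow and Clayton (1993); X, Z, beta, b, and y are hypothetical objects.
eta    <- X %*% beta + Z %*% b    # current linear predictor
mu     <- plogis(eta)             # inverse logit link
gprime <- 1 / (mu * (1 - mu))     # derivative of the link function at mu
z      <- eta + (y - mu) * gprime # working response: eta + eps_i * g'(mu_i)
# z is then regressed on X and Z by GLS with covariance V to update beta and b,
# and the loop repeats until the estimates converge.
```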
GLMM and PGLMM Scaling Variances |$\sigma^{2}_{d}$| for |$R^{2}_{resid}$|
|$R^{2}_{resid}$| requires a scaling term |$\sigma^{2}_{d}$| for the variances of fixed and random effects (equation 11). For binary GLMMs with a logit link function, Nakagawa and Schielzeth (2013) use |$\sigma^{2}_{d}=\pi^{2}/3$|, the variance of the logistic distribution, while they use |$\sigma^{2}_{d} = 1$| for the probit link function. Nakagawa et al. (2017) give additional approaches based on different approximation methods. Comparing estimates of the random effects variances for a binary GLMM fit with logit vs. probit link functions is a useful way to assess the scaling term |$\sigma^{2}_{d}$|. Below I illustrate that neither |$\sigma^{2}_{d}=\pi^{2}/3$| nor the delta approximation from Nakagawa et al. (2017) gives a particularly good scaling term for binary GLMMs with a logit link function, and I present two better alternatives.
To explore the accuracy of |$\sigma^{2}_{d.NS}=\pi^{2}/3$|, consider the estimates of the random effects variance for equation (A1) using a logit vs. a probit link function, |$\sigma^{2}_{b}$|[logit] and |$\sigma^{2}_{b}$|[probit]. If the scaling |$\sigma^{2}_{d.NS}$| is correct, then |$\sigma^{2}_{b}$|[logit]/|$\sigma^{2}_{d.NS}$| should approximate |$\sigma^{2}_{b}$|[probit]; they will not in general be exactly the same due to differences in the fits of the logit and probit GLMMs. In a simple simulation study (see Supplementary Section 2 available on Dryad), using |$\sigma^{2}_{d.NS}=\pi^{2}/3$| gives values of |$\sigma^{2}_{b}$|[logit]/|$\sigma^{2}_{d.NS}$| that are lower than |$\sigma^{2}_{b}$|[probit], implying that |$\sigma^{2}_{d.NS}=\pi^{2}/3$| is too large (Fig. A1a).
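The check described here can be sketched as follows, with a hypothetical binary data set d containing response y and grouping factor g:

```r
## Hypothetical binary data set 'd' with response y and grouping factor g.
library(lme4)

m.logit  <- glmer(y ~ 1 + (1 | g), data = d, family = binomial("logit"))
m.probit <- glmer(y ~ 1 + (1 | g), data = d, family = binomial("probit"))

s2b <- function(m) as.numeric(VarCorr(m)$g)  # random-effect variance
s2b(m.logit) / (pi^2 / 3)  # if the scaling were right, this would track ...
s2b(m.probit)              # ... the probit-scale variance
```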
There are two better alternatives. The first is |$\sigma^{2}_{d.rNS} = 0.8768809\pi^{2}/3$|, which I call the “reduced Nakagawa and Schielzeth” scaling. |$\sigma^{2}_{d.rNS}$| performs better than |$\sigma^{2}_{d.NS}$| (Fig. A1b), for reasons explained in the next section.
Figure A2. Accuracy of four scaling methods for |$R^{2}_{resid}$| in a GLMM (equation A1), determined by plotting values from a logit link function against values from a probit link function. One hundred data sets with 10 levels of the random effect and a total sample size of 1000 were simulated and fit using the code provided in Supplementary Section 2 available on Dryad.
The second alternative comes naturally from the residual variances |$\sigma ^{2}_{w[i]}$| arising from the estimation algorithms for GLMMs. There is a technical complication: because |$\sigma ^{2}_{w[i]}$| take on different values for each datum |$i$|, how should these be combined to give a single |$\sigma ^{2}_{w}$|? The algorithms for fitting GLMMs scale the total covariance matrix V according to the log of its determinant, and this implies that |$\sigma ^{2}_{w}$| should be the geometric mean of the values of |$\sigma ^{2}_{w[i]}$|. The probit and logit random effects variances should be scaled by |$\sigma ^{2}_{w}$|[probit] and |$\sigma ^{2}_{w}$|[logit], respectively. Thus, |$\sigma ^{2}_{b}$|[logit]/|$\sigma ^{2}_{w}$|[logit] should be approximately equal to |$\sigma ^{2}_{b}$|[probit]/|$\sigma ^{2}_{w}$|[probit], which is the case (Fig. A1c).
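A sketch of this geometric-mean combination, reusing the hypothetical conditional means mu from the earlier sketch:

```r
## Geometric-mean combination of the per-datum variances sigma2_w[i];
## 'mu' (conditional means from a logit fit) is a hypothetical placeholder.
s2w_i <- 1 / (mu * (1 - mu))    # per-datum variance of eps_i * g'(mu_i)
s2w   <- exp(mean(log(s2w_i)))  # geometric mean, matching the determinant scaling
```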
Figure A3. Threshold formulations of the GLMM in equation (A1) for probit (a–c) and logit (d–e) link functions. Panel (f) gives the probability density functions for a logistic distribution (solid) and Gaussian distributions with variances |$\sigma^{2}_{d.rNS} = 0.8768809\pi^{2}/3$| (dotted) and |$\sigma^{2}_{d.NS}=\pi^{2}/3$| (dashed).
From the performance of the scaling terms shown in Figure A1, |$R^{2}_{resid}$| computed with logit and probit link functions should be more similar using either |$\sigma^{2}_{d.rNS}$| or |$\sigma^{2}_{w}$| than using |$\sigma^{2}_{d.NS}$|. This is in fact the case (Fig. A2). Nakagawa et al. (2017) discuss a number of other approximations. In particular, the values of |$\sigma^{2}_{d}$| given by the delta method, although derived in a different manner, are similar to |$\sigma^{2}_{w}$|; they are computed, however, using the estimates of the mean and variance of the distributional parameter (|$\mu_{i}$| in the case of the binomial distribution) rather than the GLM iterated weights that differ among data points |$i$|. As shown in Figure A2, |$\sigma^{2}_{d.rNS}$| and |$\sigma^{2}_{w}$| outperform the delta-method values of |$\sigma^{2}_{d}$| computed with r.squaredGLMM() in the MuMIn package (Barton, 2018). The performances of the methods for calculating |$\sigma^{2}_{d}$| depend on the details of the model, such as the sample size and the number of levels of the random effect; nonetheless, the patterns shown in Figure A2 are typical.
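For reference, a hypothetical call to the MuMIn function mentioned above, applied to the logit fit from the earlier sketch (the exact output layout varies by package version):

```r
## Hypothetical comparison with the scalings implemented in MuMIn;
## recent versions report both theoretical and delta-method values.
library(MuMIn)
r.squaredGLMM(m.logit)  # marginal and conditional R2 for the logit fit above
```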
The Threshold Formulation of GLMMs and |$\sigma^{2}_{d.rNS}$|
For the logit link function, the same arguments follow (Fig. A3d,e). A complication occurs, however, because |$F_{D}\{Z_{i}\vert b_{i}\}$| in equation (A5) is the cumulative density of a logistic distribution, rather than the Gaussian distribution as it is for the probit link function. The assumption |$\sigma^{2}_{d.NS}=\pi^{2}/3$| is akin to approximating the logistic |$F_{D}\{Z_{i}\vert b_{i}\}$| with a Gaussian cumulative density function with the same variance, namely |$\pi^{2}/3$|. This is not very accurate, however, because the logistic distribution is leptokurtic relative to the Gaussian distribution (Fig. A3f). Instead, a better approximation can be made by selecting the variance of the Gaussian distribution to give a better match to the logistic cumulative density function. A simple least-squares match gives |$\sigma^{2}_{d.rNS} = 0.8768809\pi^{2}/3$|. While this is ad hoc, it works pretty well (Figs. A1 and A2).
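The least-squares match can be reproduced in a few lines of R; the grid and search interval below are arbitrary choices, and the optimum should land near |$0.8768809\pi^{2}/3 \approx 2.885$|:

```r
## Reproducing the least-squares match of a Gaussian CDF to the logistic CDF;
## the grid and search interval are arbitrary choices.
z   <- seq(-10, 10, by = 0.01)
obj <- function(s2) sum((pnorm(z, sd = sqrt(s2)) - plogis(z))^2)
optimize(obj, interval = c(1, 5))$minimum  # ~2.885, i.e., about 0.8768809 * pi^2 / 3
```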
Although this appendix has focused on GLMMs, identical arguments apply to PGLMMs.

Figure 1. Depictions of the covariance matrices from LMM, PGLS, GLMM, and PGLMM models. In the covariance matrix for LMMs, |$\sigma^{2}\boldsymbol\Sigma(\sigma^{2}_{b}) = \sigma^{2}_{b}\boldsymbol\Sigma_{b} + \sigma^{2}\textbf{I}$|, the variance of the random effect |$\sigma^{2}_{b}$| is scaled against the variance of the residual errors. In the PGLS with a Pagel’s |$\lambda$| branch-length transform, the covariance matrix is |$\sigma^{2}\boldsymbol\Sigma(\lambda) = \sigma^{2}\lambda\boldsymbol\Sigma_{BM} + \sigma^{2}(1-\lambda)\textbf{I}$|, in which |$\lambda$| determines the strength of phylogenetic signal in the residual errors. For GLMMs, the variance of the random effect is given by |$\boldsymbol\Sigma(\sigma^{2}_{b}) = \sigma^{2}_{b}\boldsymbol\Sigma_{b}$|, and there is additional variance |$\sigma^{2}_{w[i]}$| owing to the discreteness of the data. Similarly, for PGLMM, phylogenetic signal enters the model as a covariance matrix |$\sigma^{2}_{b}\boldsymbol\Sigma_{BM}$|, with additional variance |$\sigma^{2}_{w[i]}$|.




