Analyzing Disparity and Rates of Morphological Evolution with Model-Based Phylogenetic Comparative Methods

Abstract Understanding variation in rates of evolution and morphological disparity is a goal of macroevolutionary research. In a phylogenetic comparative methods framework, we present three explicit models for linking the rate of evolution of a trait to the state of another evolving trait. This allows testing hypotheses about causal influences on rates of phenotypic evolution with phylogenetic comparative data. We develop a statistical framework for fitting the models with generalized least-squares regression and use this to discuss issues and limitations in the study of rates of evolution more generally. We show that the power to detect effects on rates of evolution is low in that even strong causal effects are unlikely to explain more than a few percent of observed variance in disparity. We illustrate the models and issues by testing if rates of beak-shape evolution in birds are influenced by brain size, as may be predicted from a Baldwin effect in which presumptively more behaviorally flexible large-brained species generate more novel selection on themselves leading to higher rates of evolution. From an analysis of morphometric data for 645 species, we find evidence that both macro- and microevolution of the beak are faster in birds with larger brains, but with the caveat that there are no consistent effects of relative brain size.[Baldwin effect; beak shape; behavioral drive; bird; brain size; disparity; phylogenetic comparative method; rate of evolution.]

Received 07 July 2021; reviews returned 16 September 2021;accepted 23 September 2021 Associate Editor: Simon Ho Abstract.-Understanding variation in rates of evolution and morphological disparity is a goal of macroevolutionary research. In a phylogenetic comparative methods framework, we present three explicit models for linking the rate of evolution of a trait to the state of another evolving trait. This allows testing hypotheses about causal influences on rates of phenotypic evolution with phylogenetic comparative data. We develop a statistical framework for fitting the models with generalized least-squares regression and use this to discuss issues and limitations in the study of rates of evolution more generally. We show that the power to detect effects on rates of evolution is low in that even strong causal effects are unlikely to explain more than a few percent of observed variance in disparity. We illustrate the models and issues by testing if rates of beak-shape evolution in birds are influenced by brain size, as may be predicted from a Baldwin effect in which presumptively more behaviorally flexible large-brained species generate more novel selection on themselves leading to higher rates of evolution.
From an analysis of morphometric data for 645 species, we find evidence that both macro-and microevolution of the beak are faster in birds with larger brains, but with the caveat that there are no consistent effects of relative brain size. [Baldwin effect; beak shape; behavioral drive; bird; brain size; disparity; phylogenetic comparative method; rate of evolution.] Rates of evolution are highly variable. While some species and traits remain nearly unchanged for tens of millions of years, others undergo dramatic shifts in mere dozens of generations (Simpson 1944;Eldredge and Gould 1972;Hendry and Kinnison 1999;Schluter 2000;Gould 2002;Hendry et al. 2008;Uyeda et al. 2011;Hunt and Rabosky 2014;Gingerich 2019;Reznick et al. 2019;Voje et al. 2020;Zelditch et al. 2020). Explaining these differences is a key challenge for a theory of macroevolution (e.g., Hansen and Houle 2004;Eldredge et al. 2005;Estes and Arnold 2007;Bell 2012;Hansen 2012;Arnold 2014;Jablonski 2017a,b). Yet, the majority of work on rates of trait evolution is descriptive and lacking in quantitative hypotheses and statistical methods derived from first principles. While substantial progress may have to wait for a more mature quantitative theory of macroevolutionary change, it is important to move beyond mere description toward testing hypotheses about factors influencing rates of evolution. Model-based phylogenetic comparative methods provide a framework for such tests. The models that are used to translate phylogenetic relationships into the statistical covariances needed for statistical analysis usually depend on parameters related to rates of evolution. In particular, the commonly used Brownian-motion model includes trait-specific rates of "diffusion" that are estimated as a part of the analysis. These can be used to estimate rates of evolution in different traits or directions of morphospace (e.g., Felsenstein 1985;Lynch 1991;Adams 2013). Methods have been developed to estimate the Brownian diffusion rate in different parts of the phylogeny, and this allows testing for clade differences in rates of evolution (Garland 1992;Martins 1994;McPeek 1995;Gittleman et al. 1996;O'Meara et al. 2006; Thomas et al. 2006;Revell 2008;Revell and Harmon 2008;Eastman et al. 2011;Revell et al. 2011;Beaulieu et al. 2012;Adams 2014;Fuentes et al. 2016;Weir and Lawson 2016;Caetano and Harmon 2019; May and Moore 2020). Such approaches make it possible to obtain and compare estimated rates of evolution from standard comparative data (e.g., Pie and Meyer 2017;Alhajeri and Steppan 2018;Chira et al. 2018;Revell et al. 2018). Estimated rates can then be used as data in further analyses to test hypotheses about causes of rate differences, but this approach entails statistical and interpretational difficulties in dealing with estimation error and linking rates with explanatory variables. What is lacking are methods for combining rates of evolution with explanatory variables that are themselves treated as evolving entities in a common model. This would allow direct estimation of causal links in a statistical framework that accounts for stochasticity in both response and predictor variables.
Here, we explore some stochastic-process models for estimating the relationship between rates of evolution in a quantitative trait, formalized either as the diffusion parameter of a Brownian motion or as the variance of recent deviations from a phylogenetic prediction, and potential explanatory variables modeled as evolving quantitative traits on the phylogeny. We illustrate the models with an analysis of rates of evolution in bird beaks based on data from Cooney et al. (2017) and Chira et al. (2018). We relate these rates to measures of absolute and relative brain sizes, thus revisiting the classical analyses of Wyles et al. (1983) to test whether evolution is speeded up by increased behavioral flexibility due to a Baldwin effect (e.g., Baldwin 1896;Popper 1972;Wilson 1985). We argue that the classical tests of this hypothesis were inadequate, but that current data based on orders of magnitude more species, accurate phylogenies, and better statistical approaches now make it feasible to test such hypotheses.

Models for Testing Macroevolutionary Hypotheses
about Rates In contrast to standard models used in comparative methods for continuous traits, most of our models will not be Gaussian, meaning that the resulting distributions of the traits will not be normal (Gaussian). This complicates statistical analyses as standard statistics based on normal distributions cannot be used. Additionally, the joint probability distributions are hard to derive, and this rules out exact likelihood or Bayesian approaches. It is possible, however, to characterize the distributions by deriving moments. This allows the use of methods-of-moments estimators and generalized least squares, which do not require exact distributional assumptions. Approximate and numerical likelihood approaches are also possible but will not be explored here.
For each model, we will start by deriving moments of the trait distributions after a time interval t. We then use this to motivate a regression-based estimator of the influence of a predictor x on the rate of evolution of a trait y. Because the state of a Brownian motion is uncorrelated with its own linear deviation from the starting point, this will take the form of a regression of the squared trait deviation on the predictor. We then derive the variances and covariances of the relevant functions for phylogenetically related species, and use this to characterize the residual variance matrix for the regression. This forms the basis for generalized least-squares (GLS) estimation. We present three models that are all based on the assumption that the variance of the evolutionary rate is linear in the predictor. This assumption helps generate approximately linear regressions of y 2 on x and will often be a reasonable choice because independent components act additively on the variance scale. Some microevolutionary models also predict approximate additive effects of microevolutionary parameters on variances of change (Bolstad et al. 2014).

Model 1: Predictor Evolves as a Brownian motion
In this model, the response variable, y, follows a Brownian motion with a rate variance that is a linear function of the predictor variable, x, which itself follows a standard Brownian motion with constant variance, 2 . The model is where the dW i are uncorrelated white noise, and we assume that the initial conditions are y 0 = x 0 = 0. The influence of x on the rate of evolution in y is characterized by the parameter b, and the goal is to find an estimator of b. This model breaks down when a+bx becomes negative, and should be seen as an approximation that may be valid for the range of x values spanned by the species to be used in the analysis. As there is no causal influence of y on the evolution of x, the unconditional distribution of x is Gaussian with mean equal to the initial condition, x 0 , and variance equal to 2 t, where t is the time from the root to the tip of the phylogeny. The moments of y and the crossmoments between y and x are derived in Appendix A.1 by stochastic integration. This reveals that y and x are indeed uncorrelated, but the squared deviation of y from its mean is correlated with x as which suggests that a regression of y 2 on x can be used to estimate the b-parameter. In Appendix B.1, we show that the best linear predictor of the species vector of y 2 from the species vector of the x-variable is where x is the column vector of species predictor variables, 1 is a column vector of ones, I is the identity matrix, T is the matrix of shared branch lengths between all the species pairs, • denotes Hadamard (i.e., elementwise) multiplication, and v y is the error variance in the prediction of the root value of the y-variable. This model assumes an ultrametric phylogeny scaled to unit height and that the x and y variables are centered on their predicted values. The residual variance matrix for this model is where v x is the variance in the predicted root value used to center the x-variable. Equations 3 and 4 can be used to estimate the parameters a and b in an iterated GLS procedure conditionally on the phylogeny and previous estimates of the 2 , v y , and v x .

Model 2: Predictor Evolves as a Geometric Brownian motion
In Model 1, we assumed that the predictor x follows a Brownian motion. This yields simplicity and may be realistic in many cases, but there are also cases in which this assumption is problematic. In particular, many candidate predictor variables will be on ratio scale types. Such variables must be positive and tend to follow multiplicative dynamics, which is incompatible with Brownian motions. The usual solution to this problem is VOL. 71 log transformation, as the logarithm of a multiplicative process may follow additive dynamics compatible with Brownian motions, but this does not solve the problem if the causal influence of the predictor variable on the rate of evolution is linear on the original scale. To deal with such situations, we use the model where x follows geometric Brownian motions, which is equivalent to Brownian motions of the logarithm of x. We again assume that the y-variable is centered on an unbiased predictor of its root value. We also assume that the x-variable is standardized with an unbiased predictor of its logarithmic root value. In this model, y is also uncorrelated with x (and any power of x), but after a time interval t, which shows that the b-parameter can be estimated from a regression of the squared y-variables on the x-variables. In Appendix B.1, we show that the best linear predictor of the vector of squared y-variables from the vector of x-variables is where x is centered on its predicted mean (see Appendix B.1), [f(t a )] denotes a matrix in which the ij'th element is f(t a ) with t a being the shared branch length between species i and j on the phylogeny, and A is an intercept equal to a minus a correction term given in the appendix. We have again assumed an ultrametric phylogeny scaled to unit height (t = 1). The residual variance matrix for this model is where v lnx is the error variance in predicting lnx 0 , the root value of the x-variable on log scale. Note that 2 for this model is the predicted variance of the natural logarithm of x across the tips of an ultrametric phylogeny scaled to unit height. The regression is not linear in the sense that the covariance of y 2 with higher powers of the predictor are not zero, but these tend to zero as 2 increases. In the simulations, however, we found some bias when 2 = 2, but not for 2 ≤ 1. Nevertheless, if 2 is below 0.5 or so, it may be better to use the simpler Model 1 since the multiplicative dynamics of a ratio-scale variable with moderate variation can be approximated with additive dynamics and Brownian motions.

Model 3: Predictor Affects Rates of Recent Microevolution
In a large data set of morphological traits, Uyeda et al. (2011) observed that recent microevolution was largely decoupled from macroevolution in a so-called "blunderbuss" pattern. The best general model for this pattern was a combination of a white noise with a million-year-scale Gaussian point process. Gaussian point processes converge to Brownian motions on long time scales, and the current phenotype of a species may thus be reasonably described by the sum of a Brownian motion on a macroevolutionary scale and a white noise representing recent evolution on a less than million-year time scale. In contrast to the previous models, which represent rates of cumulative macroevolution over the entire phylogeny, the following model may be used to test if rates of recent microevolution are related to candidate predictor variables. In light of the blunderbuss, we can do this by using the squared deviation of each species from its macroevolutionary prediction as a response variable.
We model the species trait vector as y = y macro +y micro , where y macro is the result of a macroevolutionary process (e.g., Brownian motion) unfolding on the phylogeny and y micro is a microevolutionary deviation, independent of the macroevolutionary process, but with a variance that may depend on the predictor variable as where x is a vector of species-specific predictor variables.
The diag-function applied to a vector yields a diagonal matrix with the vector along the diagonal. In this case, only the recent value of the predictor matters and we will assume that it does not interact with the macroevolutionary process. As the microevolutionary deviation is not directly observable, we model it as y −ŷ, whereŷ is a vector of macroevolutionary predictions for each species based on the other species given the phylogeny and a model of macroevolution. The best linear unbiased predictor for each species based on the other species in the phylogeny is the vector where the d-function sets to zero all nondiagonal elements of its argument, V is the variance matrix of the y vector, andȳ is the mean vector. Technically, the elements of the mean vector should be calculated separately for each species by not including that species, but in practice with many species it makes little difference to use the 1057 same grand mean for all species. Any fixed effects of the evolutionary model can be included in the mean vector. This equation can be used to make predictions from any evolutionary model by using the phylogenetic variance matrix and mean vector derived from that model. It can be derived from the general equation for best linear unbiased prediction of ancestral states given in Martins and Hansen (1997) to predict the trait value of an hypothetical species in the exact phylogenetic position of the (deleted) species to be predicted. The form of the equation can be verified by elementwise comparison. The deviance from the prediction is then and the squares of the elements of this vector can be used as a response variable in the regression model. If the evolutionary process is Gaussian, then these deviations are normally distributed with mean zero and variance matrix The variance matrix of the y-vector is where V macro is the phylogenetic variance matrix derived from the assumed macroevolutionary process, V micro (= aI+diag(bx)) is the variance matrix contributed by recent microevolution (white noise) and V measurement contains any measurement variances we wish to add to the model. If we assume that macroevolution is a Brownian-motion process, V macro = 2 y T, where T is the matrix of shared branch lengths, but other Gaussian processes, such as an Ornstein-Uhlenbeck process, may also be used.
In this model, the microevolutionary parameters a,b, and macroevolutionary parameters such as 2 y could be estimated with maximum likelihood based on the normal distribution, but in line with the approach of the other models, we focus on a GLS regression of the square of the predicted microevolutionary deviations on the predictor variable as as shown in Appendix B.1. The intercept vector is which is computed by use of the relation y −ŷ = y micro + (y macro −ŷ), and measurement variance can be added to V macro This assumes that micro-and macroevolution are independent, and that the predictor variable only affects the microevolutionary deviation. The latter assumption is not likely to be important in the GLS setting, as the effects of macroevolution are largely removed by using the deviation from the macroevolutionary prediction as a response variable. The units of these parameters are Because predictions for the different species are correlated with each other, we need to compute the residual covariances. Assuming that y −ŷ is normally distributed with zero mean, the vector of their squared elements follows a multivariate 2 -distribution with one degree of freedom and covariances Because the predictor variable, x, is regarded as fixed, these covariances are also the residual covariances of the regression model. In matrix notation the residual variance matrix can then be written METHODS

The Statistical Model
To analyze these models we will use the standard framework of phylogenetic generalized least squares (e.g., Martins and Hansen 1997). The statistical model is where Y is a vector of species observations, which in our case will be the species squared deviation from a grand mean (Models 1 and 2) or prediction (Model 3). The D is a design matrix with predictor variables, β is a vector of parameters to be estimated, and r a vector of residuals with mean zero and, a not necessarily diagonal, variance matrix, V r . If the D and V r are specified, generalized least squares (GLS) estimates can be used to obtain best linear unbiased estimates of the parameter vector β, aŝ with estimation variance which can be used to form standard errors. Note that these standard errors cannot be translated into confidence intervals when the distribution of the residuals is unknown. In our models, the V r matrix will be a function of model parameters such as a,b and 2 . The parameter 2 in Models 1 and 2 can be obtained by maximum likelihood based on the predictor variable alone (since either the predictor or its logarithm is assumed to be normally distributed). The other parameters can be estimated with an iterated 1058 SYSTEMATIC BIOLOGY VOL. 71 procedure starting from an initial guess (e.g., based on ordinary least-squares regression). Usually, such schemes converge in a handful of iterations. The design matrix, D, will normally contain two columns. For Model 1, we use with x centered on its predicted root value. This estimateŝ β ={Â,b}, and an unbiased estimator of a can be obtained by adding v y to A. For Model 2, we use with x standardized with its predicted root value and centered on its predicted mean value. Here, we estimate a by adding v y plus a bias-correction term given in appendix B to the intercept. For Model 3, we use with V as in (13), and with the term subtracted from the response vector (i.e., a is estimated according to equation 15). It is also possible to add columns to these design matrices to account for various fixed effects that may influence the squared deviations.
Most of our models require that variables are centered on or standardized with predicted values at the root of the phylogeny. Under the assumption of Brownian motion, the expression will be an unbiased predictor of the root value of y. Its variance is v y = a/n e , where a is the variance parameter of the Brownian motion, and n e = 1 T T −1 1 is an effective sample size. The same equation can be used for the root value of x or lnx as needed. In principle better predictors could be found by replacing T with the estimated variance matrices of the variables, but this is unlikely to make a difference in practice. The residual variance matrix, V r , will contain a biological component with residual variances and covariances as described in equation 4 for Model 1, equation 8 for Model 2, and equation 17 for Model 3. When available, measurement or observation variances for response and predictor variables may also be included in the residual variance matrix and used to correct for attenuation as described in Appendix B.1. Observation variances in comparative studies are often available from standard errors or confidence intervals of the means (or other statistics) used from the individual species. Note that the observation variance of the response variable in this case would be the predicted variance of the square of the mean. If we assume that the observation error is normally distributed, observation variances of squared means can be obtained as 2se 4 , where se is the standard error of the mean. Hansen and Bartoszek (2012), Garamszegi (2014), and Grabowski et al. (2016) provide further discussion of observation error in comparative studies.
The three models are set up to explain the variance of a variance, and we expect a worse fit as compared to standard linear models of trait means. Indeed, even an R 2 of a few percent could be regarded as a strong signal in these models. To illustrate, let us assume that, as in all our models, the y variable conditional on x is normally distributed with a variance that is linear in x, as in Var[y|x]=a+bx. Then, assuming that y is mean centered, the proportion of variance in the square of y explained by x is Hence, with these assumptions there is a theoretical ceiling for R 2 at less than 1/3. In practice, much less variance will be explained. For example, in a rather ideal situation with two equal-sized groups of species with one group having twice the variance of the other, we get an expected R 2 = 1/21 ≈ 5% for the simple ANOVA set up (i.e., a = 1,b = 1, and Var[x]=1/4 with x as a zero-one indicator variable); even with infinitely larger variance in one group (b =∞) the variance explained is limited to R 2 = 1/5.
Step-by-Step Summary of Approach. 1. Do exploratory analysis of both the response (y) and predictor (x) variable to assess their distributional properties and approximate modes of evolution. Phylogenetic signal can be assessed by estimating a phylogenetic heritability (Lynch 1991;Housworth et al. 2004) or a phylogenetic halflife (Hansen 1997). This will tell if the variables (or their logarithms) can be approximated by a Brownian motion and help decide which of the three models is most appropriate.
2. Prepare the data by choosing appropriate scales and units for the analysis to be used. This may involve log transformation and centering on the mean or predicted root value. Be careful about variance standardization as this may obscure the signal of interest.
3. Formulate the hypothesis to test, and pay attention to the form of the relationship between predictor and rates that is predicted by the hypothesis.
4. Fit the predictor variable or its logarithm to a Brownian motion to estimate the parameter 2 . In cases of low but nonzero phylogenetic signal, it is possible to fit a mixed model and use the rate parameter ( 2 ) from the Brownian-motion component.
5. Choose the model to use. This should be based both on statistical considerations (distribution of variables, model adequacy, e.g., Pennell et al. 2015) and biological considerations (likely or hypothesized modes of evolution and relations between variables). Model 2 should only be used with predictor variables on ratio or log-ratio scale types (see Houle et al. 2011), and logtransformed predictors would not be suitable for this model. Note that Model 3 is testing a different hypothesis than Models 1 and 2. In cases of low phylogenetic signal, Model 3 would be fine to use and will then approximate an ordinary least-squares regression, but Models 1 and 2 would be problematic to interpret as they assume the relationship between the variables is built gradually over the history of the species.
6. Fit the desired models as described above. 7. Calculate the parameters a and b from the regression parameters, and report results with units and estimated uncertainty. Interpret effect sizes. The R 2 of the model should be considered, but be aware that we can only reasonably expect to explain a few percent of the variance. Resampling procedures can be used to further explore model adequacy and uncertainty of the estimates.
We have implemented statistical machinery for most of the operations in Step 4-7 in the function rate_gls in the R-package evolvability (Bolstad et al. 2014). We have also included functions for simulation of the three evolutionary models (function: simulate_rate), parametric bootstrapping (function: rate_gls_boot), and a vignette explaining the use of the functions (Analyzing rates of evolution). The package is available at CRAN with the latest version at "github.com/GHBolstad/evolvability." APPLICATION Do Beak Shapes Evolve Faster in Large-Brained Birds? Wyles et al. (1983) argued that animals with larger brains have faster rates of morphological evolution than comparable animals with smaller brains. They attributed this to behavioral drive, a process by which learning allows individuals to explore new ways of life and thereby setting up novel selection pressures on their anatomy and physiology that lead to more extensive divergence on macroevolutionary time scales. This idea is part of a broader conglomerate of hypotheses and theories about how plasticity, learning, preferences, intentions, culture, and niche construction can influence evolution and adaptation by altering selection pressures or allowing organisms to persist in novel environments long enough to adapt to them (e.g., Baldwin 1896;Popper 1972;Wake et al. 1983;Odling-Smee et al. 2003;Frank 2011;Zuk et al. 2014). The effects of such mechanisms on rates of evolution are far from obvious, however, and it has also been argued that behavioral flexibility may buffer against changes and help explain morphological stasis (e.g., Bogert 1949;Wake et al. 1983;Hansen and Houle 2004). Paenke et al. (2007) show that both scenarios are theoretically plausible and provide general conditions for plasticity to accelerate or decelerate the response to selection.
The study of Wyles et al. (1983) was based on comparing morphological disparity between a few large clades with different degrees of encephalization such as primates against other mammals, song birds against other birds, and mammals against reptiles. This procedure yields only a handful of comparisons and is vulnerable to spurious third-variable effects and artifacts of measurement stemming from the many biological differences that must exist at such broad phylogenetic scales. It can only be regarded as crudely suggestive.

Application: Methods
To illustrate our models, we capitalize on recent large-scale phylogenies and open-source data bases for bird morphology to test the hypotheses of Wyles et al. (1983) and Wilson (1985) more rigorously. We use 3D morphometric beak-shape data taken from Cooney et al. (2017) and brain-and body-size data from Fristoe et al. (2017) and Tsuboi et al. (2018) for 651 bird species to test if rates of beak-shape evolution are influenced by brain size. We excluded the paleognaths, comprising four species of ratites and one species of tinamou, from the analysis, as this group is both a sister group to the rest and biologically different from most other birds. We also removed the emperor penguin (Aptenodytes forsteri), as this species was a strong outlier in terms of both brain and body size. This left us with 645 species that we used for the analyses presented here. Results were qualitatively similar when including all species.
The 3D-landmark data (4 landmarks and 75 semilandmarks) of beak shape were standardized by generalized Procrustes analysis before principalcomponent analysis in Cooney et al. (2017). We studied the first eight principal components from this analysis, which collectively explain more than 99% of the variance. The principal components are in units of centroid size (i.e., mean linear distance of landmarks to centroid). The phylogeny was taken from Jetz et al. (2014), and we used a maximum credibility tree from a sample of thousand phylogenies based on family-level topologies from Ericson et al. (2006) using TreeAnnotator version 2.4.5.
We tested for the effects of both absolute and relative brain mass. To obtain relative brain mass we performed an allometric analysis of brain mass on body mass ( Fig. 1) with the R-package slouch (Hansen et al. 2008; version 2.0.0. documented in Kopperud et al. 2020). Following Grabowski et al. (2016), we fitted a direct-effect model and accounted for estimation error in species means based on reported standard errors of the means of logged brain and body masses. The estimated allometric exponent corrected for attenuation due to observation error was 3/5(= 0.60±0.01), and log body size explained VOL. 71 Body mass (g) Brain mass (g) FIGURE 1. Brain-body allometry for the 645 species of birds used in this study. The data points are species means taken from Tsuboi et al. (2018). The regression line −2.21(±0.12)+0.598(±0.008) log(body mass), R 2 = 88.4%) was estimated from a "direct-effect" Ornstein-Uhlenbeck model and corrected for estimation error in species means (including attenuation) as described in Grabowski et al. (2016). The phylogenetic half-life and stationary variance of the Ornstein-Uhlenbeck model were t 1/2 = 5 h with 2-unit support interval: (1.35-∞) and 2 ou = 0.43 with 2-unit support interval: (0.14-0.49). The tree length was h = 104 myr. The long half-life means that the residuals evolve similarly to a Brownian motion.
88% of the variance in log brain size. Relative brain mass is based on the residuals from this allometric model.
The basic statistics of the traits are in Table 1. The brain-and body-size data show a strong phylogenetic signal and good fit to Brownian motion on log scale. For the beak-shape data, the first principal component, which is an axis of stoutness to pointedness of the beak, likewise show strong phylogenetic signal and good fit to Brownian motion. The other seven "shape" traits show a moderately strong phylogenetic signal, leaving room for Ornstein-Uhlenbeck-type dynamics or a component of recent evolution. The two-unit support intervals for the phylogenetic halflife exclude both infinity (Brownian motion) and zero (white noise) for all these.
We tested the hypotheses that either absolute or relative brain size influence the rate of evolution of beak shape, and whether this happens on average throughout the phylogeny (Models 1 and 2) or in terms of recent "within-species" microevolution (Model 3). The models were fitted with the functions in the evolvability package as described above. Estimates are shown ± se if not otherwise indicated. Reported standard errors are based on equation 20. With some exceptions, these were similar to bootstrap estimates (see Supplementary material). The predictor variables in the plots are phylogenetically weighted values that account for correlation with related species (i.e., values correspond to the entries in the design matrices in equations 21-23).

Application: Results
Starting with absolute brain mass, we note that this is a ratio-scale variable that we expect to evolve on a multiplicative scale. This makes the geometric Brownian-motion Model 2 is the most natural choice, and the mean-scaled variance for brain mass of 1.19(= 2 t) > 1 means that Model 1 may be less good on an untransformed scale. The results from this model mix small to moderate positive effects of absolute brain size on the rates of evolution in most shape variables (PCs 2, 3, 7, 8) with a large positive effect on PC 6, and essentially no effects on PCs 1, 4, and 5 (Fig. 2). The phylogenetic R 2 of the positive effects are small in absolute terms, 5% for PC 6 and 0.6-2% for the others, but still give room for substantial influence on rates. The predicted rate variance, in units of squared centroid size (cs) per tree length (h = 104 my), for PC 6 is −7.5×10 −4 +12.4×10 −4 x, where x is mean-standardized brain mass, which means that a doubling of brain mass from the phylogenetic mean (x = 1) would increase the rate variance from 4.9× 10 −4 cs 2 /h to 17.3×10 −4 cs 2 /h. This means that expected variance (disparity) in this shape variable of a clade with twice the average brain mass would be expected to be 3.5 times as large as the average. In other words, doubling the brain mass of a lineage would increase the expected change in PC 6 over an arbitrary time period, and thus the classic rate of evolution (i.e., the darwin), with a factor of √ 3.5, or 88%. Similar calculations for PCs 2, 3, 7, and 8 yield factors of, respectively, 27%, 25%, 65%, and 60%.
An alternative is to use Model 1 with log brain mass as predictor variable. This is equivalent in terms of its assumptions about the evolution of brain mass, but differs in that the assumed causal link to the rate variance is now linear on a proportional scale (i.e., linear on log brain mass rather than linear on brain mass). The results of this analysis are congruent with the fitting of Model 2 in that the strongest effect is for PC 6, but both this and the other effects are weaker ( Supplementary Fig. S1). For PC 6 the predicted rate variance is 9.5+10.5 log brain mass, which means that doubling brain mass from the clade mean (x = 0) by setting log brain mass to x = ln 2 would increase the rate variance from 9.5×10 −4 cs 2 /h to 17 ×10 −4 cs 2 /h, or the expected rate of evolution with a factor of 34%. The other "significant" effects, on PC 7 and 8, correspond to increases of 32% and 25%, respectively.
For relative brain mass, which is on an interval scale type, the Brownian-motions-based Model 1 is the natural choice. Fitting this model to the beak-shape variables shows little effect (Fig. 3). The estimated slopes are all positive, but small, with phylogenetic R 2 below 1% in all cases except for PC 7. Taken in isolation the results on PC 6, 7, and 8 are consistent with some effect, but we interpret the results collectively as evidence against any substantial effect of relative brain size on rates of beak-shape evolution.
Fitting Model 3 tests a different hypothesis, namely that the rate of recent (within-species) evolution of beak TABLE 1. Basic statistics for the among-species data: the phylogenetic heritability, PH 2 , is the fraction of among-species variance attributable to Brownian motion (Lynch 1991). The 2 Brownian is the rate variance of the Brownian motion in units per tree length (h = 104 my), and 2 residual is the nonphylogenetic residual variance. The phylogenetic halflife, t 1/2 , is the time in units of tree length to lose half the expected ancestral influence in an Ornstein-Uhlenbeck model of evolution (Hansen 1997). Body and brain size are in units of ln (gram), and principal components of beak shape in units of 100 × centroid size (cs). The variances are in units of the squares of the trait units. The phylogentic heritability and the variance components were estimated with the function Almer in the evolvability package, and the halflife was estimated with slouch. shape is influenced by brain size. We used a Brownianmotion-based mixed model to predict the states of the shape variables at the tips of the phylogeny and used the squared deviance of the observed states from these predicted states as response variables in the regression. This model makes no assumptions about the evolution or distribution of the predictor variables, which can then be transformed and fitted on any scale that is deemed biologically reasonable. As with Model 1 there are only tiny, inconsistent effects of relative brain size ( Supplementary Fig. S2), but increasing absolute brain size elevates the rates of all PCs except the first and marginally the third with R 2 ranging from 0.5% to 4% (Fig. 4). The lack of an effect on PC 1 is perhaps not surprising given that the strong phylogenetic signal of this variable leaves a lesser fraction of "recent" variation to be explained. The estimates from the other shapes indicate that the predicted residual variance at the tip would increase from 18% to 57%, corresponding to 9% to 25% increases in conventional rates of evolution. The general lack of effect of relative brain size may suggest that the effects of absolute brain size may be caused by body size. For this reason, we also tested body mass as an explanatory variable. With Models 1 and 2, we found generally weaker effects of body mass than we found with brain mass although 3.4% of the variance in PC 6 was explained with the geometric Brownianmotion Model 2 (results not shown). This is still weaker than the effect of brain mass, but it is possible that body size may explain some of the large effect of brain mass on PC 6 in particular. For the recent-evolution model, however, the effects of body mass were similar to those of brain mass (Supplementary Fig. S3).

DISCUSSION
The variation in tempo and mode of evolution is such that even the basic concept of a rate of evolution defined as character change per time, as in the classic darwin and haldane measures, is problematic. Beyond a few generations, evolutionary change rarely resembles a deterministic trend, but instead becomes stochastic with huge variations in both rate and direction (e.g., Gingerich 1983Gingerich , 2019Bookstein 1987;Hunt 2012;Voje 2016). These variations are such that one cannot compare rates of change across different time intervals; in fact, below species-level time scales the amount of change is often independent of time span (Gingerich 1983(Gingerich , 1993(Gingerich , 2001(Gingerich , 2009Roopnarine 2003;Uyeda et al. 2011;Voje et al. 2018) rendering conventional rates meaningless. A relationship between change and time span appears at longer, among-species, time scales, but change remains stochastic and often with scalingrelationships resembling Brownian motion (Uyeda et al. 2011). This means that rates of macroevolution cannot be measured on the scale of the trait. Instead, it is necessary to measure rates as "intensities," variance of change per time. Over the last decades, it has become common to use the diffusion parameter of a fitted Brownian motion as a measure of rate of evolution on macroevolutionary time scales (e.g., Hunt and Carrano 2010, and references in the introduction).
Estimating rates as variances of change has a downside as compared to the estimation of a trend in that it is less statistically efficient and demands larger samples to reach comparable levels of accuracy (e.g., Slater and Pennell 2014). Our theoretical analysis illustrates this point. Variables affecting rates of evolution are not expected to correlate with the state of the evolving traits in the absence of directional trends. Correlations only appear with disparity on the scale of the squared traits and statistical analysis of causal effects on the tempo of evolution must be conducted on this scale. We have shown that even strong effects on the rates of change only generate weak correlations with the causal variable implying that we do not expect to explain more than a few percent of the among-species variance in squared deviations. This means that large samples of species are necessary to detect an effect. Fortunately, databases of morphological traits and phylogenetic relationships are Brain mass, mean scaled and centred PC8 (absolute values) a −1.14 ± 1.54 b 3.15 ± 0.86 R 2 2.1 % FIGURE 2. The effect of absolute brain size on beak-shape evolution. The plot shows the magnitude of the deviance from clade mean of beak-shape variables in units of centroid size (×100) plotted against (phylogentically weighted) log brain mass for 645 bird species. The fitted line and parameter estimates (± SE) are based on Model 2 in the main text assuming that brain mass evolves like a geometric Brownian motion. The estimated slopes (b) of PCs 2, 6, 7, and 8 are statistically significant in the sense that they are more than two standard errors above zero. Relative brain size, mean centred PC8 (absolute values) a 2.66 ± 1.66 b 6.45 ± 3.47 R 2 0.5 % FIGURE 3. The effect of relative brain size on beak-shape evolution. The plot shows the magnitude of the deviance from clade mean of beakshape variables in units of centroid size (×100) plotted against relative brain size for 645 bird species. The fitted line and parameter estimates are based on Model 1 in the main text assuming that relative brain size evolves like a Brownian motion. The estimated slopes (b) of PCs 6 and 7 are statistically significant in the sense that they are more than two standard errors above zero. Brain mass (ln g), mean centred. PC8 residuals (absolute values) a 0.777 ± 0.087 b 0.322 ± 0.068 R 2 3.3 % FIGURE 4. The effect of log brain size on recent beak-shape evolution. The plot shows the magnitude of the deviance of beak-shape variables from the macroevolutionary prediction in units of centroid size (×100) in Model 3 plotted against (predicted) log brain mass for 645 bird species. The fitted line and parameter estimates are based on Model 3 in the main text, and the intercept used for plotting is the average of the prediction in equation 15. The estimated slopes (b) of PCs 2 and 4 to 8 are statistically significant in the sense that they are more than two standard errors above zero. accumulating and it is becoming increasingly possible to do comparative analyses with the hundreds of species necessary to test hypotheses about causal effects on rates of evolution.
Despite the mathematical complexity, the underlying idea of our method is simple: The effect of a variable on rates of evolution can be detected by a regression of squared trait deviance on the variable. The mathematics yields two refinements to a naive regression of this type. First, it shows how we can translate the coefficients of the raw regression to effects on underlying rates of evolution (the a and b parameters), and second, it shows how to calculate residual covariances so as to use statistically more efficient GLS estimation that accounts for phylogenetic relatedness. As with most comparative methods, both interpretation and the optimal statistical model depend on assumptions about the underlying evolutionary process. Our three models differ in these assumptions and apply under different circumstances.

Do Large-Brained Birds Evolve Faster?
We have illustrated our methods with a data set of 645 species of birds from Cooney et al. (2017), which we used to test the hypothesis that brain size influences rates of evolution in beak shape. This hypothesis, derived from Wyles et al. (1983), may be motivated as the outcome of a Baldwin effect in which presumptively more behaviorally flexible larger-brained species are more able to discover new sources of food or novel uses of their beaks, which may introduce more frequent novel patterns of selection acting on their beaks and thus more frequent evolutionary changes in morphology. This is consistent with the recent finding of increased speciation rates in large-brained birds (Sayol et al. 2019), and with findings of increased body-size disparity in bird families with large brains (Sol and Price 2008).
Our results support this idea in finding indications that absolute brain size is related to the rate of evolution of some aspects of beak shape, but with the caveat that relative brain size had little effect. Our models never explained more than a few percent of the variance in beak-shape disparity, but in the cases explaining more than a percent or so of the variance, the best estimates of effects usually indicated that the variance in shape change and thus the disparity would increase substantially. For example, assuming that brain mass evolved as a geometric Brownian motion and had a linear effect on the rate variance of the shape variables leads to the prediction that a clade of birds with twice the brain mass of another would have between 1.5 and 3.5 times the disparity (among-species variance) in many of the beak-shape variables. The recent-evolution model (Model 3) also showed rather consistent positive effects, which indicates that the rapid stationary fluctuations that characterize evolution on short below-species-level time scales are more pronounced in larger-brained birds.
The absence of consistent effects of relative brain size is surprising, but consistent with the idea that it is absolute more than relative brain size that matters for intelligence (e.g., MacLean et al. 2014;and discussed in Striedter 2005). There is a serious caveat, however, in that brain size is related to many other ecological and behavioral factors in birds (e.g., Burish et al. 2004;Shultz and Dunbar 2010;Vincze 2016), and we cannot exclude that some factor other than intelligence mediates the causal effect on the tempo of evolution. We tested the effect of body size on rates and found that these could not fully explain the effect of absolute brain size in Model 2, but that they mirrored the effects of absolute brain size when using the recent-evolution model. It is therefore possible that the elevated microevolutionary rates of large-brained species are mediated through some other correlate of larger bodies.

Relation to Other Methods
As mentioned in the introduction, there are several methods for studying rates of evolution in phylogenies or fossil time series based on stochastic models. The approach most similar to ours is the likelihood-based method pioneered by O'Meara et al. (2006) and Thomas et al. (2006). Here, Brownian motions with different rate parameters are fitted to different sections (regimes) of a phylogeny. This is similar to our Model 1, but with the predictor variable as a categorical fixed effect mapped onto the phylogeny. This approach is superior to ours when the predictor categories can be mapped reliably onto the phylogeny, because the assumption of a fixed (nonevolving) predictor allows a Gaussian process and hence the application of likelihood methods based on the normal distribution. Our models should be considered an alternative to this only when the predictor is continuous or uncertain such that mapping is unreliable. In this situation a full stochastic model for the joint evolution of both response and predictor variables is preferable. In fact, Revell (2013) has shown that rate estimates from different regimes are biased to be more similar when mapping of regimes is uncertain. Also be aware that the low power we have illustrated is not a particular flaw of our models but apply similarly to all statistical estimators of variance of change.
Another approach to the study of rate involves the estimation of ancestral states on a phylogeny and then computation of rates over ancestor-descendant pairs or contrasts between such estimates. This is often done without consideration of statistical uncertainty and is problematic because estimated ancestral states and derived contrasts are generally uncertain, heteroscedastic, and correlated (Martins and Hansen 1997). As discussed in Hansen (2014), this approach is at best a roundabout way of estimating rate parameters with many pitfalls and no advantages to a direct approach.
A related approach is to correlate disparity at higher taxonomic levels with trait means at those levels, as was done by Wyles et al. (1983) and Sol and Price (2008). See also Weir and Lawson (2016) for a pairwise-contrasts VOL. 71 method to study the effects of predictors on evolutionary rates with an application to song complexity in birds.
Our three basic models are far from exhaustive, and there are many conceivable modifications and extensions. Some flexibility can be achieved by scale transformations of the variables, such as use of log scale or not. A more challenging extension would be to multivariate rates. In our example, we decomposed a multivariate character into principal components and analyzed these separately with our univariate models. It is well known, however, that principal-component analysis of comparative data can be misleading because it mixes phylogenetic structure with trait dependence (Uyeda et al. 2015). The fundamental problem with multivariate extensions, however, is that the number of rate parameters will increase, and given the low power of the univariate approaches, any method not involving some form of dimensionality reduction is unlikely to be useful. Extension to more than one predictor variable in the fashion of a multiple regression is also possible but increases mathematical complexity and would likely need to assume independent evolution of the predictors to keep a manageable number of parameters.
The Ornstein-Uhlenbeck process is a common extension to Brownian motion in the stochastic modeling of evolution. Parameters in the Ornstein-Uhlenbeck process are sometimes used to describe rates of evolution (e.g., Martins 1994) or rates of adaptation (Hansen 1997(Hansen , 2012. These parameters have a less direct interpretation as rates, however, and are perhaps better considered as collectively describing dynamically changing tempo and mode of evolution in relation to selective optima. Nevertheless, one could consider models with either the "rate of adaptation" or the stochastic diffusion of an Ornstein-Uhlenbeck model being influenced by an evolving trait. The moments of such a process would be calculable, but more complicated than in the models we have presented. The regressions would also be nonlinear. Likelihood methods for estimating the influence of mapped "fixed effects" on parameters in Ornstein-Uhlenbeck models have been presented by Beaulieu et al. (2012). Another, simpler, extension would be to allow the predictor variable to follow an Ornstein-Uhlenbeck process in place of the Brownian motion or geometric Brownian motion. Finally, May and Moore (2020) have recently proposed a model similar to ours but with a discrete predictor variable evolving according to a Markov chain. This complements the models in the present article.
The recent-evolution model provides a different perspective on rates of evolution than in most approaches derived from comparative methods or fossil time-series analysis. Instead of studying rates of evolution as change per time over phylogenetic branches or time intervals, this approach focuses on recent evolution after removing the effects of past change. In fact, the different models we have presented resemble two positions from the early days of the phylogenetic comparative-methods literature (reviewed in Martins and Hansen 1996). One position, exemplified by the autoregressive approach of Cheverud et al. (1985), sought to remove the phylogenetic component and study adaptation in the reminder, while another position, exemplified by Lynch's (1991) mixed model, sought to remove the species-specific component and study adaptation in the phylogenetic component. While the latter position, being implicit in Felsenstein's (1985) independent-contrasts and related processbased phylogenetic comparative methods, won out for conventional analysis of trait evolution, our analysis illustrates how this can be seen as a distinction between studying evolution at two different time scales, a macroevolutionary, above species-level time scale, and a microevolutionary, below species-level time scale. As argued by Uyeda et al. (2011) andHansen (2012), evolutionary rates behave differently at microand macroevolutionary time scales and likely reflect different evolutionary mechanisms. This justifies the development of distinct models to study rates at different time scales.
The recent-evolution model also resembles comparative methods designed to make inferences about changes at single branches; typically aiming to test if particular changes are unusual as compared to the overall rate of evolution on the phylogeny (e.g., McPeek 1995;Revell 2008;Nunn and Zhu 2014). In fact, our equation 11 could be used to pick out the deviance for any species or subset of species, and test these for conformity to an expected Gaussian distribution with the predicted variance matrix in equation 12. See also Uyeda et al. (2018) for more extensive discussion of the role of singular events in comparative studies.
Our motivation for developing these models was to explore the potential for testing hypotheses about factors influencing rates of evolution or the disparity of a clade. For example, as reviewed in Bolstad et al. (2014), there have been many claims that rates of divergence and among-species variance are related to within-species variance, or to more specific measures of species evolvability. Such claims are usually based on comparing clade variances to estimates of evolvability from one or a few species across different traits, and thus do not fit the framework we have set up. It is, however, desirable to develop direct comparative tests of how measures of evolvability and other microevolutionary parameters influence rates of divergence, and the models presented here illustrate both how this can be done and the limitations in doing so.

A.1 CALCULATION OF MOMENTS
In this appendix, we compute the necessary moments for the different models by integration. The general strategy is to use the multivariate Itô formula (e.g., Gard 1988, p. 59) to derive stochastic differential equations for powers of the variables x and y and then integrate over x and y to form differential equations for moments and cross moments. We then solve these differential equations by standard integration over time.
The Model 1 was with initial conditions y 0 and x 0 . Since x follows standard univariate Brownian motions, its moments at time t are where the !! signifies semifactorial. We illustrate the computation of the other moments by an example, E[y 2 x]. The Itô formula yields dy 2 x = (a+bx)xdt+2yx a+bxdW 1 +y 2 dW 2 .
Taking the expectation, and using the fact that the dW i have mean zero and are independent of previous history, yields Integrating yields and since E y 2 = y 2 0 + a+bx 0 t, we get The differential equation for a general moment obtained from the Itô formula is which can be solved iteratively in m and k. In the following we specifically need the fourth moment of y, which is E y 4 = y 4 0 +6 a+bx 0 y 2 0 t+3 a+bx 0 2 t 2 +b 2 2 t 3 , which yields Var y 2 = 4 a+bx 0 y 2 0 t+2 a+bx 0 2 t 2 +b 2 2 t 3 .
Note that this shows that the distribution of y is leptokurtic and not Gaussian.
To obtain the necessary cross-species covariances needed for the GLS regression, we use an approach from Hansen and Martins (1996) based on conditioning on the values of the most recent common ancestor of the two species in question, and assuming that they evolve independently after they split apart. This requires the following conditional expectations: E y 2 i |y a ,x a = y 2 a +(a+bx a )t ai , where y a and x a are the values of an ancestor of i and t ai is the time that separates the ancestor a and the species i. For an ultrametric tree, t ai = t aj = t−t a with t a the time from root to the most recent common ancestor of i and j. From this, we get +2b t−t a Cov y 2 a ,x a = 4 a+bx 0 y 2 0 t a +2 a+bx 0 2 t 2 a +b 2 2 t 3 a +b 2 t−t a 2 2 t a +b 2 2 t−t a t 2 a = 4 a+bx 0 y 2 0 t a +2 a+bx 0 2 t 2 a +b 2 2 t a t 2 −t a t+t 2 a , These moments are all we need to construct the necessary variance and covariance matrices for GLS regression. This is detailed in Appendix B.1. The Model 2 was dy = a+bxdW 1 , dx = 1 2 2 xdt+xdW 2 , which implies that x follows a log-normal distribution with moments Using the Itô formula and taking moments yield, the following differential equation for an arbitrary moment of the process: which can be solved iteratively in m. Most pertinently   where T =[t a ] is the matrix of shared branch lengths in units of total tree length, I is the identity matrix, 1 is a column vector of ones, and y 0 and x 0 are the initial values of y and x at the root of the tree. Centering the variables around estimates of the root values would in principle justify setting y 0 = x 0 = 0, but the estimation error in the root values can be substantial if there are only a few branches near the root of the phylogeny. To account for this we assume that the estimates of the root values are unbiased and uncorrelated with variance v y for y 0 and v x for x 0 . Furthermore, we have to account for the fact that the error in estimating y 0 is correlated with y 2 . To do this, assume the y-variable has evolved as a Brownian motion, so that the y-vector is distributed as y ∼ N(y 0 1, aT). Let where the notation [f(t a )] means a matrix in which the ij'th element is f(t a ) with t a being the shared branch length between species i and j. In this notation T =[t a ]. As for Model 1, we center the y-variable on its estimated root value, but for the x-variable we standardize by dividing with the predicted root value. To model the error in this root value, we assume that ln(x 0 ) is normally distributed with zero mean and variance v lnx . Then assuming that y is centered and x standardized on their predicted root values. This yields equations 7 and 8 in the main text. Note that the second term in the intercept of regression depends on the regression slope. This caused numerical problems in our estimation procedure, which we solved by treating the whole intercept (except v y ) as a constant, A, and the subtracting the second term at the end to estimate a. For Model 3, let y−ŷ • y−ŷ be the column vector of squared deviations from the prediction as given in equation 11. The linear regression of the squared deviations vector on the vector x is where r is a vector of residuals. To compute this we use Observation, that is measurement, error in the species means can also be added to the residual variance matrices. Let V my and V mx be the measurement variance matrices for the response (i.e., the y-squared vector) and predictor (i.e. the x-vector). These matrices will typically be diagonal, but in principle measurement covariances can also be included. The terms added to the residual variance will be then be for Model 2. For Model 3, measurement variance can be added as described in the main text. Observation error in the predictor variable may also cause attenuation, a downward bias in the estimated regression slope. Attenuation is not corrected by including observation error in the residual variance. Expressions for attenuation with correlated data are given in Hansen and Bartoszek (2012). Applying this to our situation yields the attenuation factor K = 1− Dividing the estimated slope with K would give an approximately unbiased estimate of the true slope.