Abstract

Discrete character-taxon matrices are increasingly being used in an attempt to understand the pattern and tempo of morphological evolution; however, methodological sophistication and bespoke software implementations have lagged behind. In the present study, an attempt is made to provide a state-of-the-art description of methodologies and introduce a new R package (Claddis) for performing foundational disparity (morphologic diversity) and rate calculations. Simulations using its core functions show that: (1) of the two most commonly used distance metrics (Generalized Euclidean Distance and Gower's Coefficient), the latter tends to carry forward more of the true signal; (2) a novel distance metric may improve signal retention further; (3) this signal retention may come at the cost of pruning incomplete taxa from the data set; and (4) the utility of bivariate plots of ordination spaces are undermined by their frequently extremely low variances. By contrast, challenges to estimating morphologic tempo are presented qualitatively, such as how trees are time-scaled and changes are counted. Both disparity and rates deserve better time series approaches that could unlock new macroevolutionary analyses. However, these challenges need not be fatal, and several potential future solutions and directions are suggested.

Introduction

The codification of morphology into discrete character-taxon matrices has a long pedigree in biology (Westoll, 1949) and formed the basis of the field of numerical taxonomy (Sokal & Sneath, 1963; Sneath & Sokal, 1973). A central aim of this field was estimating phylogenetic relationships, although whether this would best be achieved by phenetic or cladistic approaches was an early controversy. Arguably, the latter approach won out, at least by frequency of implementation (Kitching et al., 1998), although the former left a legacy of discrete-character distance metrics. These were subsequently co-opted and extended (Ricklefs & Travis, 1980), particularly within palaeobiology (Wills, Briggs & Fortey, 1994; Foote, 1999), to measure morphological diversity (often abbreviated as ‘disparity’; Wills, 2001). When combined with a phylogenetic hypothesis, it was realized that such data sets could also be used to measure evolutionary tempo (Derstler, 1982; Forey, 1988), an approach that has subsequently occurred with increasing frequency in the literature (Cloutier, 1991; Ruta, Wagner & Coates, 2006; Lloyd, Wang & Brusatte, 2012; Brusatte et al., 2014a; Close et al., 2015).

Palaeobiological disparity and rate approaches have increased dramatically in appearance in the past decade or so (Liow, 2004, 2006, 2007; Ruta et al., 2006, 2013a, b; Brusatte et al., 2008a, b, 2014a; Young et al., 2010; Prentice, Ruta & Benton, 2011; Thorne, Ruta & Benton, 2011; Bapst et al., 2012; Lloyd et al., 2012; Close et al., 2015; Hetherington et al., 2015; Kotric & Knoll, 2015a; Halliday & Goswami in press), although methodological development and software implementations have lagged behind. For example, many disparity analyses still rely on the MATRIX software of Wills (1998), and rate protocols have required significant manual effort (Brusatte, 2011). More recent implementations (Lloyd, Wang & Brusatte, 2012; Brusatte et al., 2014b) have leveraged the power of the statistical programming language R (R Core Team, 2015), although these have shown limited concession to usability. Meanwhile, although some methodological advances have been made, there has been a strong focus on establishing pattern but limited progress on identifying processes. Specifically, there is a lack of sophisticated models to which empirical data can be fitted.

This present study has two goals: (1) to introduce a new software implementation for both disparity and rate analyses and (2) to delineate disparity and rate protocols as they currently stand, including their limitations and biases. In addition, some current challenges and potential future solutions and directions are described.

Claddis: A New R Package for Calculating Disparity and Rates

Analysis of disparity and rates requires significant amounts of repeated calculation (e.g. iterating over characters, branches of a phylogeny, or pairs of taxa) that are well suited for computational analysis. Claddis is a new package for the statistical programming language R (R Core Team, 2015) that aims to: (1) automate and supersede a previous manual rate analysis protocol (Brusatte, 2011); (2) simplify and supersede a set of rate analysis R scripts (Lloyd et al., 2011; Brusatte et al., 2014b); (3) offer an alternative to the disparity focused MATRIX software of Wills (1998); and (4) allow input data to be taken directly from the commonly used NEXUS (Maddison, Swofford & Maddison, 1997) format (see below). The choice of R offers numerous advantages because it: (1) is freely-available (increasing access, for example, to students and those in developing countries); (2) is multiplatform (usable in essentially the same form across Windows, Apple and Linux machines); (3) includes extensive plotting functions for easily visualizing data; and (4) has a rich ‘ecosystem’ of pre-existing packages. For example, Claddis is dependent on (i.e. uses functions from) four main packages: ape (Paradis, Claude & Strimmer, 2004), gdata (Warnes et al., 2014), phytools (Revell, 2012) and strap (Bell & Lloyd, 2015). Claddis is available from both the CRAN web site (http://cran.r-project.org/web/packages/Claddis/index.html) and github (https://github.com/graemetlloyd/Claddis). It can also be installed directly from within R using the Packages menu.

The current (0.1) version of Claddis includes 16 functions and two example data sets. These functions include those for performing rate and disparity analyses, as well as subfunctions that deal with various tree-based calculations. These include tree-traversal, ancestral state reconstruction (Yang, Kumar & Nei, 1995; Brusatte, 2011; Revell, 2012) and phylogenetic gap filling (Butler et al., 2012). It is also intended that other analytical tools for exploring discrete character-taxon matrices be added to Claddis (e.g. character compatibility; Wagner & Estabrook, 2015) and a function to perform safe taxonomic reduction (Wilkinson, 1995) is already included along with the example data set from that paper (Gauthier, 1986). The second included data set (Michaux, 1989) is included purely for its small size, meaning that examples in the help files run quickly. It should be noted that, for more typical data set sizes, many of the functions will be slow to run on a typical machine (minutes to hours), which is why replicate numbers in published studies tend to be low (Brusatte et al., 2014a; Close et al., 2015). The supplementary information (SI) contains a basic tutorial on the functions of Claddis and uses a more realistic but not overly large data set (Cullen et al., 2013). In addition, all simulations (and code to perform; Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.gp16s) carried out below use Claddis, version 0.1.

Preparing Data for Disparity and Rate Analyses

It is important to note at the outset that not all cladistic data sets may be appropriate for disparity or rate analyses. Cladistic analysis can be used as a means of establishing a taxonomic classification or to infer a phylogeny to be used for other purposes. As such, cladistic data sets can be considered as inhabiting a cline, with those composed of collections of characters biased towards those that support specific a priori notions of classification at one end (Sereno, 1999) and those that are attempts at delineating the entire morphological variation of a group at the other (Livezey & Zusi, 2006), with most data sets found somewhere between these two extremes. Where a particular data set sits on this cline can affect the results inferred.

For disparity, it might be argued that an ideal scenario is to follow the example of Livezey & Zusi (2006) and be as inclusive as possible. This is often not achieved in practice (as a result of the sheer amount of labour required) and other biases such as the frequent exclusion of ‘parsimony uninformative’ characters (i.e. those that are constant across all taxa observed, or that exhibit the derived state in a single taxon: autapomorphies). Pragmatically, we might ask whether these biases really matter. One way that this can be achieved is by comparing disparity inferred from discrete characters with that from other morphometric approaches (Anderson & Friedman, 2012; Foth, Brusatte & Butler, 2012; Hetherington et al., 2015), which, in theory, lack these biases. So far, such studies show good congruence between the two techniques, at least for the groups concerned, suggesting that we appear to be capturing the same signal. However, some caution is warranted because our null expectation here is perhaps not no correlation but, instead, some degree of phylogenetic covariance (Anderson & Friedman, 2012). Thus, it is recommended that workers follow the example of Ruta et al. (2013b) in at least considering this explanation before proceeding.

For rates, the situation is somewhat different and has received relatively less consideration. A potential reduction in rate on terminal branches as a result of the biased exclusion of autapomorphies has been considered (Brusatte, 2011) but does not appear to have a noticeable effect (Brusatte et al., 2014a). This is perhaps partly because other changes accrue on terminal branches anyway (reversals and parallellisms; Cisneros & Ruta, 2010). On the other hand, the position on the cline described above may be much more important for analyses of rate. This is because the position on such a cline can be considered in terms of the proportion of ‘homologous’ (slow-evolving with few changes) to ‘homoplastic’ (fast-evolving with many changes) characters. At the classification end of the cline, there will be few homoplastic characters and fewer changes, meaning lower absolute rates. At the other end, homoplastic characters will be proportionally larger but will additionally account for a greater proportion of total changes (i.e. the total number of changes will be larger than those of each homologous character). This difference may not be of concern depending on the question being asked, although it is worthy of consideration, especially where characters are re-weighted based on their apparent homology (Farris, 1969; Goloboff, 1993).

A final consideration in data preparation concerns outgroups. Although these are a common feature of cladistic data sets, their presence in rate or disparity analyses might be counterproductive. This is because they are typically there to define the absence of certain features and hence are not coded ‘correctly’ (i.e. their own apomorphies are frequently excluded). As such, they should be, and frequently are (Hughes, Gerber & Wills, 2013; Toljagic & Butler, 2013), excluded from disparity analyses. However, they do serve an important function for rate analyses, or disparity analyses that incorporate phylogeny. Specifically, they are key in determining ancestral state reconstructions and counts of character changes on branches. In addition, they can be important for time-scaling trees, which has a direct effect on subsequent ancestral state reconstruction (at least within the likelihood framework implemented in Claddis). However, it may still be worthwhile excluding them from any subsequent analysis once they have performed their function and Claddis will allow this option in future.

Formatting Input Data for Claddis

For both rate and disparity, the starting point is to import a cladistic discrete character-taxon matrix into R. In its present form, Claddis can only take data that is already in the common NEXUS format (Maddison et al., 1997) using the ReadMorphNexus function. In future, the Henning86/TNT format will be supported. This not only places some constraints on the data, but also means it is in a machine-readable format that can easily be moved between popular software packages, such as PAUP* (Swofford, 2003) and MESQUITE (Maddison & Maddison, 2008). Specifically, Claddis assumes that the data are: (1) discrete (but see below); (2) limited to a maximum of 32 states per character; (3) composed of characters specified as either ordered (Wagner optimization; Farris, 1970) or unordered (Fitch optimization; Fitch, 1971); (4) allowed to contain missing data (unobservable or inapplicable characters); (5) allowed to contain polymorphisms (genuine polymorphisms and uncertainties); and (6) have specifiable character weights. These separate aspects of a data set are unpacked in greater detail below.

Although discrete characters are by far the most commonly used in cladistic analyses, the option is available in software such as TNT (Goloboff, Farris & Nixon, 2008) and BEAST 2 (Bouckaert et al., 2014) to combine these with (or exclusively use) continuous characters. Such combined data sets have already been used to study disparity (Ruta et al., 2013a) and, mathematically, this is easily accommodated, although not yet implemented in Claddis. In terms of rates, however, the use of continuous characters is more complex (Raia et al., 2013) and they are usually considered separately in terms of their own set of models (Harmon et al., 2008; Hunt, 2012). At present, the only way to incorporate continuous data is to convert the data into discrete ‘gap-weighted’ characters (Thiele, 1993; but see Bapst et al., 2012), normally with an accompanying up-weighting of nongap-weighted characters (Maidment et al., 2008) to equalize overall weight (see discussion on weight below). However, importing continuous characters into Claddis, and using them in at least disparity analyses, is planned for the future.

Ordered and unordered characters are really just two special cases of a character step matrix (in a cladistic, parsimony-driven paradigm) or Markov chain (in a likelihood paradigm). Alternative step matrices or Markov chains can also be envisaged for discrete characters. For example, Dollo characters or more complex ‘state trees’ have previously been specified in cladistic analyses (O'Keefe, 2001). These character types do not come with any onerous mathematical complications that would exclude them from rate or disparity analyses, although they do require an extra level of input and are not implemented in Claddis yet, partly because of their rarity in the published literature. More generally, care should be taken when inputting characters because the default assumption is unordered, although it might make more sense to treat meristic characters as ordered even if they were not treated this way in an earlier cladistic analysis.

Missing data are common in cladistic data sets and represent perhaps the primary difficulty for both rate and disparity analyses (see below). Cladistic data sets typically allow two different types of missing data: true missing data, for which characters cannot be coded as a result of incompleteness of specimen(s), and gaps, indicating inapplicable states (e.g. feather colour for a taxon where feathers are absent). At present, Claddis treats all missing data as belonging to the former category and, in most practical cases, this has no negative consequences (because most metrics are only concerned with the data that are available). However, caution should be observed when combining inapplicable states with disparity analyses when using either the Generalized Euclidean Distance metric (see below) or phylogenetic gap-filling (Butler et al., 2012). Specifically, both can effectively fill in missing states, although this may be illogical (e.g. stating feather colour to be blue in a taxon that is known to be featherless).

Polymorphisms are another common feature of cladistic data sets but bring their own complexity to analyses. These also come in two forms (specifiable in the NEXUS format by either () or {}): true polymorphisms, where both states are observed within the taxon, or uncertainties, where either may be the true state. Any state not specified in the polymorphism is considered definitively not present. Currently, Claddis treats both data types the same way. This is partly because other machine-readable formats such as Hennig86/TNT do not allow for this distinction, and also because it is not yet clear how these two types should be treated separately in disparity or rate analyses. At present, Claddis implements a minimum distance or change approach. Thus, for disparity calculations, the distance between a state of zero and a state of zero and one would be zero (the minimum distance being zero to zero) and, similarly for rate analyses, the number of changes counted along a branch for an ordered character with state zero at one end and state two and three at the other would be two (the minimum difference being zero to two).

Finally, any kind of character weighting scheme can be used in Claddis, either by specifying character weights in the NEXUS file itself, by using the equalise.weights option when importing into R, or by modifying the weights element of the data set within R. Characters can also be weighted zero to exclude them. Without specifying weights, all characters will be assigned a default weight of one, although it should be noted that this does not mean all characters have equal weight. Indeed, mixing ordered multistate characters with either binary or unordered multistate characters means that weights are unequal in cladistic, as well as disparity or rate analyses. If no a priori weighting scheme is preferred (except equal weighting), the user is recommend to set the equalise.weights option to TRUE when importing their data into R. This is the equivalent of the ‘range’ option of Wills (2001a). Note that weights are already accounted for in all disparity and rate calculations in Claddis.

At present, Claddis imports data into R as a ‘list’ object, with elements that correspond to:

  • header. A heading (basically a text field that allows the user to include notes).

  • matrix. The main character taxon matrix, with row names corresponding to taxa, missing or inapplicable states coded as NA and coded states as strings (text). The reason for this latter choice was to account for polymorphisms (e.g. ‘0&1’), which are non-numeric. Note that here states are numbered 0–31 even if the original symbols are of a different type (e.g. ACGT) or in a different order (e.g. 5130; but see number 8, below).

  • ordering. A vector indicating for each character (in sequence) if it is ord(ered) or unord(ered).

  • weights. A vector indicating for each character (in sequence) its weight.

  • max.vals. A vector indicating the maximum ‘realized’ value for each character. Note that this is only really meaningful in the context of ordered multistate characters.

  • min.vals. A vector indicating the minimum ‘realized’ value for each character. Again, this is only really meaningful in the context of ordered multistate characters.

  • step.matrices. If the input file specifies more complex step matrices than those for ordered or unordered characters, these can be specified here. At present, Claddis does not support their use but will do so in future.

  • symbols. Finally, this vector indicates the original symbols used to represent the states (e.g. 0123, or abcd) and is retained as data for writing out.

Disparity and Rates: Separate But Related

Although there is a clear need to integrate them to fully understand discrete character evolution, at present, disparity and rate analyses tend to be conducted separately (but see Hopkins, in press) and, for simplicity, that approach is also followed here. This separate evolution has lead to some notable differences between the two approaches. Perhaps the most obvious of these is that disparity analyses are much more common in the literature, whereas rates are still relatively rare. At least one probable explanation for this is the availability of an appropriate disparity software implementation since 1998 (Wills, 1998), whereas the present study marks the first comparable implementation for rates. Disparity has also seen greater methodological innovation (summarized in Fig. 1). However, rate approaches utilize model comparisons (albeit simple ones), whereas these are almost completely absent from disparity, with Foote (1996) being the only notable exception. Disparity studies also typically apply a standard core pipeline (Fig. 1), whereas rate papers tend to derive a new (albeit similar) approach each time (Derstler, 1982; Forey, 1988; Ruta et al., 2006; Lloyd et al., 2012; Brusatte et al., 2014a). Below, the typical steps in first disparity and then rate analyses are broken down to further illustrate these differences before a consideration of some ways in which time series remain a major challenge, with some suggested solutions.

Figure 1.

Disparity analysis pipeline(s). Note that disparity measures can be calculated at different stages and phylomorphospaces can be arrived at in two different ways: (1) by performing individual ancestral state reconstructions on the characters (Brusatte et al., 2011) or (2) on the individual ordination axes (Hopkins & Smith, 2015).

Figure 1.

Disparity analysis pipeline(s). Note that disparity measures can be calculated at different stages and phylomorphospaces can be arrived at in two different ways: (1) by performing individual ancestral state reconstructions on the characters (Brusatte et al., 2011) or (2) on the individual ordination axes (Hopkins & Smith, 2015).

Disparity Data Pipeline(s)

Analyses of disparity in the literature typically follow a simple pipeline that can be extended or modified depending on need (Fig. 1). The basic pipeline begins with the assembly of (or co-opting of a pre-existing) cladistic character-taxon matrix, taking care to consider some of the modifications suggested above. This is then converted into a pairwise distance matrix using one of the metrics outlined below. This can then be handed to an ordination procedure and, finally, the ordination axes themselves can be used to both visualize the data and make disparity measures.

This simple pipeline can be modified by ignoring the ordination and visualization steps, and simply using the distance matrix itself as a disparity metric. Alternatively, the process can be extended by adding a phylogenetic tree and inferring ancestral states (Brusatte et al., 2011), inferring missing tip states, or both (Butler et al., 2012). These ancestral states can then themselves be included in the ordination (Brusatte et al., 2011), or ancestral states can be inferred from the ordination itself (Hopkins & Smith, 2015) in a process directly analogous to the ‘phylomorphospaces’ produced from continuous data (Sidlauskas, 2008; Sakamoto & Ruta, 2012). Importantly, these ancestral reconstructions can be expected to differ markedly in the implied evolution of morphological diversity and so this choice should be considered carefully.

Four Distance Metrics for disParity

Regardless of whether reconstructed ancestors are included or not, a typical first step in most disparity analyses is the determination of the pairwise distances between all taxa. For any metric, the distance between a taxon and itself should be zero, and hence the diagonal of the resulting matrix should also be zero (as with any typical dissimilarity measure). Off-diagonal values should thus be positive (unless taxa are considered identical based on the available data; Wilkinson, 1995), although the exact value depends on the metric chosen. Note that, ideally, distances will always be Euclidean, although the presence of unordered multistate characters, polymorphisms, and missing data all confound this. Here, four metrics are listed, including the two most commonly used in the literature (Generalized Euclidean Distance and Gower's Coefficient).

Raw Euclidean Distance (RED)

Although not used in practice, if there are no missing data, then we can simply use the Raw Euclidean Distance, which can be written as: 
Sij=k=1vSijk2Wijk

where Sij is the distance between taxa i and j, v is the total number of characters, Wijk is the weight of the kth character, and Sijk is the distance between taxon i and taxon j for the kth character, the value of which depends on the character type. For an ordered character, this is the absolute difference between the two states; for an unordered character, any difference is one, otherwise it is zero. Note that, if there are no comparable characters for a particular taxon-taxon comparison, then theoretically Sij is zero. However, in practice, it is safer to consider it as a missing value (NA). Once repeated for every possible pairwise comparison (every possible combination of i and j for N taxa), a distance matrix is generated that can then be used to measure disparity or for ordination (Fig. 1). Note that missing data might be expected to confound the RED because distances are likely to show some degree of positive correlation with the number of comparable characters used to calculate them. Thus, it is not generalizable for practical use but is included here as a baseline metric for comparison.

Generalized Euclidean Distance (GED)

Perhaps the most popular distance metric (at least amongst vertebrate workers), partly because of its implementation in the software MATRIX (Wills, 1998), the GED is essentially a modification of the RED designed to cope with missing data. It can be expressed (modified from Wills, 2001a, eqn 1) as: 
Sij=k=1vSijk2Wijk
However, the key difference from the RED is that when distances (Sijk) are missing a weighted mean fractional univariate distance, based on those distances that are calculable, is inserted. To calculate this we can use Wills (2001a; his eqn 2): 
Sij=k=1vSijkWijk/k=1vS(ijk)maxWijk
Where S(ijk)max is the maximum possible distance between taxa i and j for the kth character. Our Sijk=S¯ijS(ijk)max can then be substituted back into eqn 2 (above) as: 
formula

Another major reason why this metric is popular is that it will always return a complete pairwise distance matrix (required by most ordinations), even if some taxon-taxon pairs have no comparable characters. This is a common scenario with vertebrate data sets and likely explains their frequency amongst studies that use it (Wills et al., 1994; Brusatte et al., 2008a, b; Young et al., 2010; Prentice et al., 2011; Thorne et al., 2011; Ruta et al., 2013a, b; Hetherington et al., 2015).

Gower's Coefficient (GC)

An alternative to filling in missing values is simply to rescale calculable distances based on the amount of information available. One way to do this is to divide each distance by the number of characters used to calculate it, as first suggested by Gower (1971; his eqn 1, modified here to include the weight term): 
Sij=k=1vSijkWijk/k=1vδijkWijk

where δijk is coded as 1 if both taxa i and j can be coded for character k, and 0 if not (sensuHopkins & Smith, 2015). If there are no comparable characters for a specific distance, then the denominator will be zero and the distance is incalculable. Here, such values are returned as NA (as with the RED) and will represent missing cells in a pairwise distance matrix. As with the other metrics, the minimum value is zero (indicating the two taxa are morphologically identical), although note that the maximum value is not capped at one, except when all characters are either binary or unordered. This metric is more commonly used by nonvertebrate workers, and was used explicitly by Hopkins & Smith (2015), and implicitly by many others (Foote, 1999; Lupia, 1999; Boyce & Knoll, 2002; Liow, 2004, 2006, 2007; Bapst et al., 2012; Kotric & Knoll, 2015a), albeit with the caveat that they were treating all characters as unordered and equally weighted (as one), such that all differences count as 1 and all similarities as zero, and incomparable differences were not counted. Davis, Finarelli & Coates (2012) state that they used Hamming distances (a technique for measuring differences between two text strings) in their approach, although it is likely that their implementation was also computationally identical to GC. Note that no square root term is used here (which helps make distances more Euclidean; Wills, 2001), although this is offered as a transformation option in Claddis (and was used by others; Foote, 1999).

Maximum Observable Rescaled Distance (MORD)

Another way that distances can be rescaled is to divide them instead by the maximum realizable distance based on observed characters. This can be expressed using the terms introduced above as: 
Sij=k=1vSijkWijk/k=1vS(ijk)maxWijk

This is a novel measure, although it is explicitly used by Close et al. (2015) and has a single main advantage: it places distances on a strict 0 to 1 scale and hence allows quick ascertainment of how close any individual distance is to the maximum possible value. Note that, when characters are entirely binary or unordered, this metric is directly equivalent to the GC and thus was effectively already used by the same investigators listed above (Foote, 1999; Lupia, 1999; Boyce & Knoll, 2002; Liow, 2004, 2006, 2007; Bapst et al., 2012; Kotric & Knoll, 2015a). As with the GC, Claddis offers the option to transform this metric by taking the square root but, because it is also a proportional measure, an additional arcsine square root transformation is also offered (and is the default). This is helpful for transforming proportional data to better meet the expectations of parametric statistics (i.e. to be normally distributed.)

Choosing a Distance Metric for Disparity

It is perhaps surprising that, although these four distance metrics exist (and two are commonly used), there has been, as far as I am aware, no direct comparison of their efficacy. Before suggesting guidelines for metric choice, we can first consider a list of desiderata fulfilling what it means to be a good distance metric. I outline my preferred desiderata below and show some simple simulation results that may aid the user in metric choice.

First, and most importantly, a good metric should retain as much of the true relative distance (despite missing data) as possible. This can be considered as its fidelity. Second, a good distance metric should lead to a distance matrix that is as close to Euclidean as possible. More specifically, they should lead to few, or no, negative eigenvalues after principal coordinate ordination (Lupia, 1999). Note that these are not fatal to analysis, although their elimination allows simple mathematical assumptions to hold; for example, that the eigenvalues can be used to create a scree plot and that the centroid can be found at zero on all axes. Third, they should (at least ideally) be calculable for all possible pairwise distances between taxa. In practice, GED will always return distances even if no comparable characters exist, except for the trivial case where the input matrix is 100% incomplete, whereas the other metrics can lead to incomplete distance matrices where incalculable distances exist. Fourth, a good metric should summarize as much of the variance as possible within the fewest number of axes that can be plotted (i.e. two for a typical bivariate plot). Typically, this is not a feature of a distance metric per se because classic multidimensional scaling allows forcing of the entire variance on to any number of axes but at some cost in terms of fidelity (although this remains, to my knowledge, unexplored). However, there is some variation between the metrics (see below) and so this is worth consideration. Fifth, and finally, because principal coordinate ordination (and other common statistical tests) is parametric, the final set of pairwise distances should be as close to normally distributed as possible.

To assess these five desiderata, four sets of simulations were run. For the first three, an idealized scenario of 20 taxa and 50 characters (all weighted one) was used. Characters were binary for the first of these (state zero or one), and multistate (state zero, one or two) for the second and third. For the first two, all characters were treated as ordered and, for the third, unordered. This is to gauge the effects of these different data types on disparity, with the binary representing an idealized scenario and the ordered and unordered a comparison of Euclidean vs. non-Euclidean characters. In the plots that follow, these are referred to as the ‘binary’, ‘ordered’, and ‘unordered’ results, respectively. For the fourth set, random sampling was allowed of the 1797 empirical datasets available at: http://www.graemetlloyd.com/matr.html to determine taxon number, character number, character weight, and character ordering. These are referred to as the ‘mixed’ results in the plots. Data sets that were too big (i.e. greater than 100 taxa) or too small (i.e. less than eight taxa) were excluded because they prevented calculation of the Shapiro–Wilk W or led to too few values for statistical analysis, respectively. For each simulation set, states were randomly generated for all taxa and characters consistent with the base matrix (i.e. minimum and maximum state values). For example, if there are three states for a character, then each cell for that character has an equal chance of being scored zero, one or two. Note that this results in matrices that, unlike most empirical data sets, will not be expected to have hierarchical phylogenetic structure (as determined by permutation tests; Alroy, 1994) and thus some caution is recommended in interpretation. Similarly, unlike many empirical data sets, inapplicable or polymorphic codings were excluded and thus their effects are not considered. For all data types, 100 matrices were generated to gauge the variability in the results.

Missing data were simulated by generating successive versions of the same matrix with increments of one percent additional missing scorings from 0% to 80%, with each cell of the matrix having equal probability of being converted to a missing value. This range is much larger than that used by a previous simulation study (Ciampaglio, Kemp & McShea, 2001; 25%; their fig. 3) but more accurately reflects the limits of the real distribution in empirical datasets (Fig. 2). As with the scorings, a random distribution of missing data is likely to be unrealistic (Smith et al., 2014) and some caution is urged in interpretation. For each level of missing data, a comparison was made to the RED values for the complete (no missing data) version of the matrix, which effectively served as the ‘true’ value.

Figure 2.

Distribution of missing data in a sample of 1797 cladistic matrices (taken from: http://www.graemetlloyd.com/matr.html). Note that a single matrix is used to represent those that are mutually non-independent using the criteria outlined in Wright, Lloyd & Hillis (in press), meaning that only 734 data sets are plotted.

Figure 2.

Distribution of missing data in a sample of 1797 cladistic matrices (taken from: http://www.graemetlloyd.com/matr.html). Note that a single matrix is used to represent those that are mutually non-independent using the criteria outlined in Wright, Lloyd & Hillis (in press), meaning that only 734 data sets are plotted.

To evaluate each metric across the range of missing data, four of the desiderata outlined above were converted to quantitative measures: (1) Mantel correlation of a true distance matrix with a degraded matrix allows a measure of fidelity from one (the true signal being present without loss) to zero (no original signal surviving); (2) calculable distances were assessed by simply taking the percentage of taxa retained from the complete matrix (taxa that led to the most incalculable distances were successively removed until the remaining pairwise distance matrix was complete); (3) the proportion of total data set variance on the first two ordination axes was used to assess the maximum amount of information that can be shown in a bivariate plot (the larger this proportion the better); and, finally, (4) normality of the data was assessed using the W statistic (Shapiro & Wilk, 1965), with one being a good approximation to normal and zero being a poor approximation. The remaining desiderata (i.e. the presence of negative eigenvalues) was sidestepped here by using the Cailliez (1983) correction in all cases. In other words, negative eigenvalues were avoided entirely to both simplify the simulations and because they were found to have little effect. Interested users can run the R script themselves (Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.gp16s) to compare results with and without the correction.

In R, these tests correspond to: (1) the mantel.randtest function (using 10 000 repetitions for the randomization test) in the ade4 package (Dray & Dufour, 2007); (2) a simple count of the number of rows in the pairwise distance matrix (after missing values are removed) divided by the number of rows in the pairwise distance matrix before missing data are removed and multiplying the answer by 100; (3) taking the sum of the variance of the first two ordination axes, dividing it by the total variance and, multiplying it by 100, and (4) using the shapiro.test function implemented in the stats package (R Core Team, 2015). The Cailliez (1983) correction was applied by using the add = TRUE option in the cmdscale function in the stats package (R Core Team, 2015). The script used to perform these simulations is available on Dryad (Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.gp16s). Below, the results for the four tested desiderata are considered in order.

Two different ways of visualizing fidelity (the most important desideratum) are shown in Fig. 3 and 4. Figure 3 shows fidelity relative to an expected value: the proportion of missing data squared (e.g. for 50% total missing data, each pairwise distance is expected to be between two taxa of 50% missing data each with overlapping scored characters of 50% of 50% = 25%). Overall, all metrics perform better than this expectation (Fig. 3), with the exception of RED for the unordered character set when the level of missing data is low but not minimal. Encouragingly, the three other distance metrics tend to fall closer to the ‘best case’ scenario (1 minus the proportion of missing data; e.g. 70% fidelity for 30% missing data). The exception here being GC and MORD where there is a combination of very low missing data and ordered multistate characters, likely as a result of the arcsine square root transformation. However, it should be noted that, for the first three character sets, GC and MORD tend to outperform GED and RED (Fig. 4), even for the ordered character set (when missing data is larger). In all cases, for very high missing data (the right end of each plot in Fig. 4), there is a tendency for metrics to become equally good, or more likely, equally poor. Note that (trivially) when there is no missing data, the GED and RED perform best because the RED for no missing data is the reference point and, when there are no missing data, the RED and GED are equivalent. GC and MORD do not perform as well here because of (again) the arscine square root transformation applied. The ordered multistate case allows comparison of the GC and MORD, which are equivalent in the binary and unordered cases, with the MORD tending to perform slightly better. The mixed data also appear to support this result, although the difference is less pronounced because all three missing data correcting metrics perform approximately equally. Thus, here, a tentative recommendation is to use the MORD distance metric, with the caveat that the level of missing data and character type should inform the decision.

Figure 3.

Comparative fidelity of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for the binary, ordered multistate, unordered multistate, and mixed characters. The results show the difference from a simple expectation (horizontal dashed line; calculated as the total proportion of missing data squared), with the grey area indicating the expected envelope between simple best case (1 – the proportion of missing data), and worst case (absolute minimal overlap, i.e. zero for 50% or more missing data) scenarios. Thus, values above the dashed line indicate better fidelity than expected, and those below, worse. Note that these expectations are really oversimplifications and are thus only intended as a general guide. Mean of 100 randomly generated matrices is shown as a solid line with 95% confidence interval indicated by vertical bars (with some truncated for the mixed data).

Figure 3.

Comparative fidelity of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for the binary, ordered multistate, unordered multistate, and mixed characters. The results show the difference from a simple expectation (horizontal dashed line; calculated as the total proportion of missing data squared), with the grey area indicating the expected envelope between simple best case (1 – the proportion of missing data), and worst case (absolute minimal overlap, i.e. zero for 50% or more missing data) scenarios. Thus, values above the dashed line indicate better fidelity than expected, and those below, worse. Note that these expectations are really oversimplifications and are thus only intended as a general guide. Mean of 100 randomly generated matrices is shown as a solid line with 95% confidence interval indicated by vertical bars (with some truncated for the mixed data).

Figure 4.

Comparative highest fidelity of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. Note that, where there is a tie for highest fidelity, the total percentage assigned is shared equally (i.e. they are not counted twice) to avoid misleading overweighting. Similarly, for binary and unordered characters, GC and MORD are equivalent and so their results are pooled.

Figure 4.

Comparative highest fidelity of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. Note that, where there is a tie for highest fidelity, the total percentage assigned is shared equally (i.e. they are not counted twice) to avoid misleading overweighting. Similarly, for binary and unordered characters, GC and MORD are equivalent and so their results are pooled.

Choosing the GED, however, ensures that all taxa can be retained, and Figure 5 shows at what levels of missing data the other three metrics require a reduction in the number of taxa that are retained through to ordination. For the binary, ordered multistate, and unordered multistate sets (which are equivalent here), taxa only start to be lost at high (approximately 60%) levels of missing data, although this can happen earlier in the mixed data (approximately 45%). In both cases, the proportion of taxa retained starts to drop precipitously at approximately 60%. Unsurprisingly, low levels of taxon retention may be fatal for useable ordination analyses but then very high levels of missing data are likely to result in little original signal being retained anyway. It should also be noted that this assumption of randomly distributed missing data is likely to be violated in reality (Smith et al., 2014), where missing data being concentrated in a small number of taxa is likely to be more common. In such cases, the level of total missing data at which the first taxon needs to be pruned is likely to be much lower but, at the same time, the total number of taxa retained is likely to stay higher for longer. Overall, the GED has an advantage here, although perhaps this is limited if missing data are low, pre-ordination disparity metrics (Fig. 1) are preferred or illogical data are introduced (see above).

Figure 5.

Comparative percentage of taxa that can be retained for ordination (i.e. the maximum complete distance matrix) for binary, ordered multistate, unordered multistate, and mixed characters. Note that the result is the same for the Raw Euclidean Distance (RED), Gower's Coefficient (GC) and Maximum Observable Rescaled Distance (MORD) distance metrics, as is the result for binary, ordered multistate, and unordered multistate (because they share the same 20-taxa-by-50-characters template) and so these are not plotted separately as in other figures. Similarly, for the Generalized Euclidean Distance (GED), the value is always 100% because missing values are always filled. The ideal value (dashed line) is 100%, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

Figure 5.

Comparative percentage of taxa that can be retained for ordination (i.e. the maximum complete distance matrix) for binary, ordered multistate, unordered multistate, and mixed characters. Note that the result is the same for the Raw Euclidean Distance (RED), Gower's Coefficient (GC) and Maximum Observable Rescaled Distance (MORD) distance metrics, as is the result for binary, ordered multistate, and unordered multistate (because they share the same 20-taxa-by-50-characters template) and so these are not plotted separately as in other figures. Similarly, for the Generalized Euclidean Distance (GED), the value is always 100% because missing values are always filled. The ideal value (dashed line) is 100%, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

The percentage of variance that can be plotted on the first two PCO axes is shown in Figure 6. Here, it is overwhelmingly clear that no metric performs particularly well, with the total variance on the first two axes generally falling well below even 50%. Two observations that are notable are the uptick seen in the non-GED metrics for high levels of missing data and the huge range of values in the confidence intervals for the mixed data. The former is a result of the exclusion of taxa (Fig. 5), which reduces the total number of ordination axes and hence increases the amount of variance each one carries. The latter is related in that the number of taxa is the main determinant of the number of axes; fewer taxa means more variance on the first two. However, it is clearly not desirable to produce datasets with very low numbers of taxa just to meet this criterion. Instead, it is only recommended that extremely minimal interpretation be placed on a ordination plot of any two axes or such plots be avoided completely.

Figure 6.

Comparative sum of variance on first two ordination axes of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. Note the uptick seen for the RED, GC, and MORD at the far right of each plot is a result of the removal of taxa (Fig. 5); this leads to a reduction in the maximum number of ordination axes and consequently an increase in the minimum variance that can be explained by the first two. The ideal value is 100%, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

Figure 6.

Comparative sum of variance on first two ordination axes of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. Note the uptick seen for the RED, GC, and MORD at the far right of each plot is a result of the removal of taxa (Fig. 5); this leads to a reduction in the maximum number of ordination axes and consequently an increase in the minimum variance that can be explained by the first two. The ideal value is 100%, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

Finally, Figure 7 shows the results of the Shapiro–Wilk test. Here, we see that the data generally conform to the expectations of a normal distribution, although, for all metrics, this shows a decline as the amount of missing data increases. Lower minimal values are also seen in the mixed data. This is likely a result of the variance in dataset size (because higher maximal W values are also seen for the mixed data). Here, the GED seems to perform best, although the GC and MORD metrics do comparatively well, at least for up to approximately 40% missing data.

Figure 7.

Comparative normality (measure by the Shapiro-Wilk W statistic; Shapiro & Wilk, 1965) of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. The ideal value (dashed line) is 1, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

Figure 7.

Comparative normality (measure by the Shapiro-Wilk W statistic; Shapiro & Wilk, 1965) of distance metrics [Raw Euclidean Distance (RED); Generalized Euclidean Distance (GED); Gower's Coefficient (GC); Maximum Observable Rescaled Distance (MORD)] for binary, ordered multistate, unordered multistate, and mixed characters. The ideal value (dashed line) is 1, the mean of 100 randomly generated matrices is shown as a solid line, and the 95% confidence interval is indicated by vertical bars.

Overall, these results show some ambiguity over metric choice. The GED performs best on most desiderata, except (critically) fidelity. Thus, here, a tentative favouring of the MORD is recommended pending more realistic simulations. However, it is also clear that dataset size, the ordering of multistate characters, and the amount of missing data should all factor into metric choice.

Ordinating Disparity

Once a metric has been chosen, a distance matrix can be produced that, if complete, can then be used to ordinate the data. In practice, most workers have done so using principal coordinates (Gower, 1966), although some (such as Liow, 2004) have used non-metric multidimensional scaling (Kruskal, 1964). Principal coordinates can be thought of as a special case of classic (or metric) multidimensional scaling (Cox & Cox, 2001) where the number of ordination axes is maximized at N – 1, where N is the number of objects (taxa). In practice, the last few axes may have too low variance to be useful and are hence discarded. Following ordination, there will therefore typically be a large number of ordination axes summarizing the full set of variance in the data. How many of these axes should be retained depends on the intended use. Broadly speaking, these will fall into two categories (Fig. 1): (1) visualization of data as a means of assessing where taxa fall in ordination space and (2) calculation of disparity metrics that can themselves be used to visualize differences between subsets of the data (such as clades, regions or time bins). In the case of the former, any single plot is likely to only visualize a relatively small amount of the total variance (Fig. 6). Although this may be because data was randomly generated for the simulations, many empirical studies, particularly vertebrate-focused ones, have exhibited the same issue (Bapst et al., 2012; Davis et al., 2012; Brusatte et al., 2014a). There is no simple solution to this problem (beyond avoiding ordination altogether) (Fig. 1), although it is important to be aware of it when interpreting data. In the case of the latter, a wide variety of approaches are seen in the literature. Specifically, previous studies have used: the first three axes only (Hetherington et al., 2015), the first ten axes only (Foote, 1999; Liow, 2004; Ruta et al., 2013a), the first 20 axes only (Ruta et al., 2013b), all axes with positive eigenvalues (Lupia, 1999), all axes (Cisneros & Ruta, 2010), the smallest number of axes that capture 85% of total variance (Young et al., 2010), and the smallest number of axes that capture 90% of total variance (Wills et al., 2012). Such variety clearly indicates a lack of consensus. Kotric & Knoll (2015a) note that information is still present on higher axes and this appears to be a good reason for not excluding them a priori based on some arbitrary variance total. However, a perhaps underappreciated concern with including higher (very low variance) axes is computational difficulties associated with small numbers (Ciampaglio et al., 2001). Specifically, floating point errors will have a greater influence when performing product calculations where values are very small. Again, there is no clear solution to this problem (beyond avoiding ordination). A tentative recommendation is to use all axes except perhaps the very highest (where total variance requires many decimal places to be expressed).

Metrics for Measuring Disparity

The ultimate goal of a disparity analysis is usually to quantitatively measure morphological diversity with a single statistic. This is especially true if a bivariate ordination visualization fails to summarize the total variance. These measures are typically comparative (Fig. 1) and may reflect a desire to understand differences between: clades (Brusatte et al., 2008a, b), time periods (Butler et al., 2012) or geographical regions (McGowan & Neige, 2011). The number of disparity measures that can be used are apparently endless (Foote, 1997; Ciampaglio et al., 2001; Wills, 2001) but can be split into those that can be calculated without ordination (e.g. weighted or unweighted mean pairwise distance, the number of discrete character states in the sample, the number of character-state combinations) or those that require ordination (e.g. mean pairwise ordination space distance, mean distance from centroid, mean distance from root node, product or sum of variance or range, participation ratio, hypercube volume, hyperellipsoid volume, minimum spanning tree length, number of grid cells occupied). Furthermore, many of these metrics can themselves be bootstrapped, jack-knifed or rarefied as a means of accounting for sampling bias or quantitatively estimating uncertainty (Foote, 1999; Kotric & Knoll, 2015b).

Simulations comparing disparity metrics are available in Ciampaglio et al. (2001) where a more in-depth discussion of these choices is provided. However, two points can be made here that may be of use. First, pre-ordination metrics are not widely used in the literature (but see Foote, 1999; Bapst et al., 2012; Benson & Druckenmiller, 2014; Close et al., 2015) despite offering a major advantage if missing data are prevalent. Specifically, as shown above, although greater fidelity is often found by using the GC or MORD distance metrics, these can lead to incomplete pairwise distance matrices (Fig. 5) that must be pruned of taxa prior to ordination. This problem can be sidestepped if an ordination is not required. Second, and more pragmatically, Claddis does not currently include functions to calculate the full suite of disparity metrics listed above; these are planned for the future and will allow a much needed update of the simulations of Ciampaglio et al. (2001). However, many simple metrics can be calculated with minimal R coding; for example, the pre-ordination mean pairwise distance and most commonly used post-ordination metrics: product or sum of range or variance; see the tutorial on Dryad (Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.gp16s).

Models for Measuring Shifts in Evolutionary Tempo

Unlike disparity, consideration of evolutionary tempo typically starts from a consideration of evolutionary models. For example, continuous univariate characters already have a mature set of standard models (e.g. Brownian motion, Ornstein-Uhlenbeck, Early Burst, etc.: Harmon et al., 2008; Hunt, 2012). These focus more on evolutionary mode, although they have parameters that also reflect tempo (Hunt, 2012). By comparison, discrete characters have rarely been subjected to model-based analyses at all (but see below), although several data sets have suggested something akin to an early burst model, including lungfish (Westoll, 1949; Lloyd et al., 2012), coelacanths (Cloutier, 1991), and early tetrapods (Ruta et al., 2006). Despite a lack of explicit models, Cloutier (1991) divided rate questions into three types: (1) a rate between cladogenetic events (a per-branch rate); (2) a rate during a fixed period (a user-defined time bin); and (3) a rate during a geological period (a geologically-defined time bin). Lloyd et al. (2012) effectively added a per-clade rate to this list, and we could also envisage a per-character rate. These can themselves be split into tree-based rates (per-branch and per-clade) that allow us to clarify ‘where’ (within a phylogeny) rate shifts may occur, per-time bin rates that allow us clarify ‘when’ rate shifts occur, and per-character rates that allow us to clarify ‘which’ anatomical regions have the highest or lowest rates.

Before we can consider models of evolution, and as with our disparity metrics outlined above, we must first account for missing data. Not accounting for incompleteness can bias our results towards finding higher rates on more completely sampled branches and lower rates on poorly sampled branches, even if the true pattern is homogenous rates. Thus, our generic rate calculation should look something like this (for the actual likelihood equations used in Claddis, see Lloyd et al., 2012): 
Rate=ChangesTime×Completeness
Note that completeness here represents the number of characters that can be scored for the nodes at both ends of a branch, and thus represents our opportunity to observe changes. As with our disparity metrics, a weight term can also be added (and is implemented in Claddis). Different weights may be appropriate; for example, in a scenario where most recorded changes are found in a small subset of the characters. In such cases, we may wish to down-weight these characters so that our overall rate metric more accurately reflects change across the whole organism. This is directly analogous to the down-weighting of homoplastic characters in cladistic analysis (Farris, 1969; Goloboff, 1993).

Now that a rate calculation is formally established, we can consider some simple models of rate evolution. So far, these (Lloyd et al., 2012; Brusatte et al., 2014a; Close et al., 2015) have reflected a single comparison between a one-rate model (null hypothesis; there is a single rate operating in all parts of the tree, at all times, or in all characters) and a two-rate model (alternate hypothesis; two rates occur in two partitions, either different parts of the tree, different time bins, or different characters). These hypotheses can then be compared using a likelihood ratio test and a P-value is returned, which, when combined with a pre-defined alpha, allows acceptance or rejection of the null. In theory, this approach can be used to answer a single hypothesis (e.g. about a particular clade or branch) but, in practice, usually an exploratory approach is used whereby all branches, clades, characters or time bins are considered (Lloyd et al., 2012; Brusatte et al., 2014a; Close et al., 2015). In these cases, a multiple comparison correction is required to avoid type I errors and here the Benjamini–Hochberg test (Benjamini & Hochberg, 1995) is used (Lloyd et al., 2012). In addition, both Brusatte et al. (2014a) and Close et al. (2015) consider the problem of internal and terminal branch duration differences (the former are often shorter and lead to faster rates when branches are pooled) by treating them separately. Quite why this difference occurs requires further investigation.

At present, these models are very simple, although ways in which they can be extended can be considered. An obvious candidate for this is to consider models with more than two-rates (and hence more than a single rate shift). This can quickly become very computer intensive because of the search time required. For example, for a five-rate tree model, we would have to consider every way the tree could be divided into five contiguous units. In practice, it would thus likely require the development of some form of heuristic algorithm. Another way in which the model could be improved is to consider the most likely two-rate model for a tree. At present, the per-clade rate test tends to show a smearing out of a pattern down the tree (Lloyd et al., 2012). For example, a significantly slow clade can be nested in a larger clade that is also considered significantly slow. However, it may be that the smaller clade is simply dominating the rate calculation for the larger clade and so it might be reasonable to ask where the implied rate shift is most likely to have occurred by comparing all the two-rate models to find the highest likelihood. A final and more nuanced consideration concerns how the current per-clade and per-branch tests split trees into segments. Specifically, clades are currently defined as beginning at a single node, although a scenario where the branch subtending that clade is part of the same rate shift can also be envisaged. Similarly, current internal branch rate tests are potentially unrealistic because, if shown to be significantly fast or slow, they really represent two rate shifts (at the top and bottom of the branch). All of these modifications are potentially realizable within Claddis in future but, at present, these issues are noted here purely for the user to consider when interpreting their results.

Aside from extending the model, important future considerations for rate analyses concern the reliability of the input data and likely violations of the assumptions of the model. Specific issues include: (1) how should we time-scale trees (uncertainty in the rate denominator); (2) how should we count changes along branches (uncertainty in the rate numerator); and (3) how can we account for non-independent character change (model assumption violation). There are now a variety of options for time-scaling a tree of fossil taxa in R (Bapst, 2012; Bell & Lloyd, 2015). These are mostly arbitrary in nature and are designed primarily to overcome the problem of zero-length branches and hence a divide by zero problem in eqn 7. They can also lead to a large variety of both absolute and relative branch lengths and hence many studies simply compare different approaches to test for robustness of observed patterns (Brusatte et al., 2014a). However, newer approaches allow for probabilistic time-scaling (Bapst, 2013) or simultaneous inference and time-scaling (Lee et al., 2014; Close et al., 2015), which may help circumvent these problems. Methods for counting character changes remain more primitive because they are focused on parsimony as an optimality criterion. As such, most published analyses (Lloyd et al., 2012; Brusatte et al., 2014a) assume a minimum number of changes. Under such circumstances, the numerator of eqn 7 can remain relatively invariant, meaning that the time component of the denominator is typically the largest determinant of rate calculations (Lloyd et al., 2012). A better, non-minimal, approach is possible but complex. One solution is that already implemented in software such as BEAST2 (Bouckaert et al., 2014), where the tree is inferred and time-scaled simultaneously allowing direct inference of rates (Lee et al., 2014; Close et al., 2015). The compromise is that this largely restricts questions to per-branch rates (although extracting branch information may allow per-clade rates to be inferred too). An alternative is to use an approach such as stochastic character mapping (SCM; Huelsenbeck, Nielsen & Bollback, 2003) that may better allow per-time-bin and per-character rates to also be explored. This approach is discussed in more detail below. Finally, an outstanding challenge that goes beyond disparity and rates to phylogenetic inference in general is the problem of non-independent character evolution. Although this has always seemed probable, in reality, most approaches assume characters evolve independently. For example, Claddis models character evolution using a Poisson process. However, recent studies appear to confirm that character evolution is integrated and non-independent (Wagner & Estabrook, 2015). This is likely to effect current approaches because it makes the null of homogenous rates (where changes are hypothesized to be approximated equally distributed across the tree and time) too easy to reject (i.e. a type 1 error problem). Indeed, homogenous rates appear to be rare in empirical data. At present, this difficult problem offers no obvious solutions but is something the user should be aware of.

Pipelines for Evolutionary Tempo Analyses

A simple rate analysis pipeline is shown in Figure 8. It should be immediately obvious that this is less complex than the choices available in disparity analyses (Fig. 1), and this largely reflects their rarity in the literature and consequent immature methodological development. However, it also simplifies the complexity underlying the two main rate approaches so far: those of Lloyd et al. (2012) and Brusatte et al. (2014a). Both ultimately require three trees with branch lengths referring to changes, time, and completeness (i.e. the three elements of eqn 7, above). However, the approach of Lloyd et al. (2012) adopts parsimony algorithms to identify changes, and thus all internal nodes and branches are considered completely known (only terminal branches may be considered incomplete). The approach of Brusatte et al. (2014a), the only one implemented in Claddis, is here considered superior because ancestral states are established using a likelihood approach that allows internal nodes and branches to be incomplete.

Figure 8.

Rate analysis pipeline(s). Note that, because of their infrequent usage relative to disparity (Fig. 1), they are much simpler, with the only major operational choice shown here related to the input data.

Figure 8.

Rate analysis pipeline(s). Note that, because of their infrequent usage relative to disparity (Fig. 1), they are much simpler, with the only major operational choice shown here related to the input data.

Another difference between these two approaches is the stage, or manner, in which the tree is time-scaled. In Lloyd et al. (2012) and Ruta et al. (2006), the number of changes recorded along a branch was used to bias the time-scaling step towards the null model of homogenous rates, and thus required the number of changes to be estimated prior to time-scaling. The approach of Brusatte et al. (2014a) operates in the opposite manner: the tree must be time-scaled first and then ancestral states (and subsequently number of changes) inferred from it. Although this is also biased towards the null hypothesis of homogenous rates, it also allows more of the operations to take place within R and hence reduces manual effort (and potential for human error). It is worth noting that the BEAST2 approach to rates (Lee et al., 2014; Close et al., 2015) can also be considered biased towards the null because it attempts to fit clock-like homogenous rates. However, this effect can be reduced by applying a relaxed rather than a strict clock.

Another way the pipeline depicted here simplifies the process is by additional analyses being hidden within the ‘Rate test function’ box. For example, prior to searching for per-branch, per-clade or per-time-bin rates, the function first performs a test on the whole tree to determine whether homogenous rates can be rejected at all. If not, then no further tests are performed (i.e. it is meaningless to ask if a single branch, clade or time-bin has a significantly high or low rate if the whole tree exhibits homogenous rates). Similarly, at present, Claddis will automatically perform additional tests where only terminal or internal branches are used and these are displayed alongside those where all branches are pooled. Finally, there is another input to the function that should be noted, namely alpha. So far, an alpha of 0.01 has been preferred (Lloyd et al., 2012; Brusatte et al., 2014a; Close et al., 2015). This is a result of the generally high heterogeneity of rates within those data sets and hence a desire to reduce the number of significant values. It is perhaps noteworthy that this appears to be a general phenomenon and a search for data sets that actually fit the null could prove of interest.

Problems and Prospects in Time Series of Rates and Disparity

Palaeontologists are uniquely fond of time series, and temporal distributions of morphological diversity and evolutionary rate open up new avenues of exploration in the search for extrinsic evolutionary drivers (Butler et al., 2012; Hopkins & Smith, 2015). However, typical cladistic data sets offer two key challenges to useable time series generation: (1) a small taxonomic sample size when spread over multiple time bins; and (2) the heterogeneous distribution of missing data over time. In addition, rate-based time series offer their own peculiar complexities in transitioning branch-specific calculations into time-bins (Forey, 1988; Ruta et al., 2006; Lloyd et al., 2012).

The literature contains few (Foote, 1999; Butler et al., 2012; Hopkins & Smith, 2015) truly large disparity time series (i.e. those that are of sufficient length to allow viable statistical interrogation beyond asking whether successive bins are significantly different in value; Smith et al., 2014). Long rate time series are more common (Derstler, 1982; Forey, 1988; Ruta et al., 2006; Lloyd et al., 2012) but present greater complications in terms of binning (because rates are usually calculated for branches but branches can span multiple time bins). A simple answer to the disparity problem is just to generate larger data sets; however, this comes with a considerable labour cost. A separate approach, adopted by Brusatte et al. (2011) and Butler et al. (2012), is to increase sample size by including reconstructed ancestral morphologies for the internal nodes of a phylogenetic tree. However, this likely introduces a smoothing bias on time series because the parsimony algorithms typically applied seek to minimize overall change along branches.

For some data sets (e.g. echinoids; Hopkins & Smith, 2015), missing data are likely to be sufficiently low that temporal biases can be safely ignored. However, as shown by Smith et al. (2014), for many real data sets, disparity time series (and the significance of bin-to-bin changes) can change once the amount of missing data is proportioned more equally. Accounting for completeness is clearly also important for measures of rate but is again complicated by the fact that these values are typically attached to branches and thus are hard to satisfactorily assign to time bins. One attempt to circumvent this problem was to randomly assign dates to each change between the beginning and end of a branch and repeat the process several times to quantify uncertainty (Lloyd et al., 2012). However, this approach simply ignores completeness (eqn 7), potentially biasing the results. The proposed solution by Smith et al. (2014) for the missing data problem for disparity time series is effectively to perform a kind of character subsampling that forces time bins to have equal amounts of missing data. This is potentially valuable, although there is no obvious route to translate the same process to a rate-based approach. Even for disparity, this process may be problematic because removing data can lead to a greater number of incalculable pairwise distances (Fig. 5) (i.e. it can shift data further away from the desiderata outlined above). In addition, the last two metrics (GC and MORD) proposed above already account for missing data in disparity. Future resolution of these issues would be aided by simulations that compare these and the approach of Smith et al. (2014).

Rather than removing data, an alternative is simply to fill gaps. For disparity, this can be achieved by using the GED metric. However, for large amounts of missing data, this is likely to (again) have a smoothing bias as distances are pulled towards the weighted fractional mean value (eqn 3). Another approach (Fig. 1) adopted by Butler et al. (2012) is to use phylogeny to estimate missing values for the tips. Unfortunately, this is also likely to lead to a biased result whereby poorly known taxa are found to sit on branches that show relatively few changes (because parsimony will favour this). Further investigation is required to help determine whether, or when, these approaches should be used, and consideration of alternatives such as likelihood is warranted.

Overall there are, at present, significant challenges to producing time series of disparity and rates. However, some promising future directions can at least be suggested. First, incomplete pairwise distance matrices that confound ordination can be avoided by using pairwise distances themselves as a disparity metric (Foote, 1999; fig. 1; Benson & Druckenmiller, 2014; fig. 7; Close et al., 2015; fig. 1). Thus, a row or column in a distance matrix that includes some incalculable values is not useless because the calculable values can still be used. This also avoids the issue of non-Euclidean distance corrections such as Cailliez (1983). However, this is only really an option for disparity time series. It should be noted, however, that distances are not independent of one another and so caution is warranted in their interpretation or subsequent statistical interrogation. Second, the random assignment of dates to character changes by Lloyd et al. (2012) is essentially a independently derived simplification of SCM that offers a potential solution to both rate and disparity time series. SCM enables single potential histories of character change (consistent with a topology and tip states) to be delineated and hence used for continuous time series of either rate or disparity. In other words, a set of morphologies can be sampled for any instant in time and passed to a rate or disparity metric. Because these are stochastically generated, this process should be repeated multiple times, as Lloyd et al. (2012) did with their pseudo-SCM approach, and hence this is likely to convey a significant computation time cost. Another potential issue with SCM is that (similar to the time-scaling and change counting approaches mentioned above), there is a bias towards the null of homogenous rates, albeit with a different rate for each character. Nevertheless, this appears to be an avenue worth exploring in that appropriate R functions are already available in the phytools package (Revell, 2012) and an approach is currently being worked on for Claddis. Third, and finally, the completeness issue for rate time series can be solved more easily within the maximum likelihood framework by splitting the time component into multiple separate trees: one for each character. These trees will differ not in the age of their nodes but, instead, in the branches that they include. Specifically, branches leading to missing states will be pruned from the tree, meaning that the total time component for each tree will differ. The duration of these branches, once binned, can be summed to produce a completeness value that is expressed purely in terms of time, simplifying the rate calculation in eqn 7 by merging the denominator into a single value.

Concluding Remarks

The data presented here temper the apparent ease of translation of cladistic datasets to disparity or rate analysis. In particular, disparity simulations show that distance metric choice often requires a compromise between taxon inclusion and fidelity. Additionally, simple bivariate plots of ordination axes are shown to be of little value. These problems (and several others) can be circumvented by using pre-ordination disparity metrics, and this may be preferable depending on the question(s) being asked. Simulations on the same level are currently lacking for rate methods, although larger issues exist concerning time-scaling trees, counting character changes, and accounting for non-independent character evolution. Nevertheless, approaches are in development that may enable new analyses and better time series, which could further unlock the wealth of morphological data stored in discrete character-taxon matrices.

Acknowledgements

This paper was improved by thoughtful reviews from Dave Bapst and Melanie Hopkins; in particular, their comments prompted additional analyses that overturned my initial conclusions regarding unordered multistate characters. I have discussed aspects of Claddis and this paper informally with many people, and thank the following for their various insights and ideas: John Alroy, Dave Bapst, Roger Benson, Steve Brusatte, Roger Close, Matt Friedman, Mike Foote, Thomas Guillerme, Matt Pennell, Emma Sherratt, and Steve Wang. I would like to single out Dave Bapst's contributions in particular. He was typically thorough in his approach and both this manuscript and the Claddis package are significantly improved because of them. Finally, I am grateful to the Australian Research Council for financial support through DECRA grant DE140101879. This paper was a contribution to a Linnean Society symposium on ‘Radiation and Extinction: Investigating Clade Dynamics in Deep Time’ held on 10–11 November 2014 at the Linnean Society of London and Imperial College London and organized by Anjali Goswami, Philip D. Mannion, and Michael J. Benton, the proceedings of which have been collated as a Special Issue of the Journal. I am grateful to the organizers for the invitation to contribute.

References

Alroy
J
1994
.
Four permutation tests for the presence of phylogenetic structure
.
Systematic Biology
43
:
430
437
.

Anderson
PSL
,
Friedman
M
2012
.
Using cladistic disparity to predict functional variety: experiments using gnathostomes
.
Journal of Vertebrate Paleontology
32
:
1254
1270
.

Bapst
DW
2012
.
Paleotree: an R package for paleontological and phylogenetic analyses of evolution
.
Methods in Ecology and Evolution
3
:
803
807
.

Bapst
DW
2013
.
A stochastic rate-calibrated method for time-scaling phylogenies of fossil taxa
.
Methods in Ecology and Evolution
4
:
724
733
.

Bapst
DW
,
Bullock
PC
,
Melchin
MJ
,
Sheets
HD
,
Mitchell
CE
2012
.
Graptoloid diversity and disparity became decoupled during the Ordovician mass extinction
.
PNAS
109
:
3428
3433
.

Bell
MA
,
Lloyd
GT
2015
.
strap: an R package for plotting phylogenies against stratigraphy and assessing their stratigraphic congruence
.
Palaeontology
58
:
379
389
.

Benjamini
Y
,
Hochberg
Y
1995
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
Journal of the Royal Statistical Society, Series B
57
:
289
300
.

Benson
RBJ
,
Druckenmiller
PS
2014
.
Faunal turnover of marine tetrapods during the Jurassic-Cretaceous transition
.
Biological Reviews
89
:
1
23
.

Bouckaert
R
,
Heled
J
,
Kühnert
D
,
Vaughan
T
,
Wu
C-H
,
Xie
D
,
Suchard
MA
,
Rambaut
A
,
Drummond
AJ
2014
.
BEAST 2: a software platform for Bayesian evolutionary analysis
.
PLoS Computational Biology
10
:
e1003537
.

Boyce
CK
,
Knoll
AH
2002
.
Evolution of developmental potential and the multiple independent origins of leaves in Paleozoic vascular plants
.
Paleobiology
28
:
70
100
.

Brusatte
SL
2011
.
Calculating the tempo of morphological evolution: rates of discrete character change in a phylogenetic context
. In:
Elewa
AMT
ed.
Computational paleontology
.
Heidelberg
:
Springer-Verlag Publishers
,
53
74
.

Brusatte
SL
,
Benton
MJ
,
Ruta
M
,
Lloyd
GT
2008a
.
Superiority, competition, and opportunism in the evolutionary radiation of dinosaurs
.
Science
321
:
1485
1488
.

Brusatte
SL
,
Benton
MJ
,
Ruta
M
,
Lloyd
GT
2008b
.
The first 50 Myr of dinosaur evolution: macroevolutionary pattern and morphological disparity
.
Biology Letters
4
:
733
736
.

Brusatte
SL
,
Montanari
S
,
Yi
H-Y
,
Norell
MA
2011
.
Phylogenetic corrections for morphological disparity analysis: new methodology and case studies
.
Paleobiology
37
:
1
22
.

Brusatte
SL
,
Lloyd
GT
,
Wang
SC
,
Norell
MA
2014a
.
Gradual assembly of avian body plan culminated in rapid rates of evolution across dinosaur-bird transition
.
Current Biology
24
:
2386
2392
.

Brusatte
SL
,
Lloyd
GT
,
Wang
SC
,
Norell
MA
2014b
.
Data from: Gradual assembly of avian body plan culminated in rapid rates of evolution across dinosaur–bird transition
.
Dryad Digital Repository
doi:10.5061/dryad.84t75.

Butler
RJ
,
Brusatte
SL
,
Andres
B
,
Benson
RBJ
2012
.
How do geological sampling biases affect studies of morphological evolution in deep time? A case study of Pterosauria (Reptilia: Archosauria) disparity
.
Evolution
66
:
147
162
.

Cailliez
F
1983
.
The analytical solution of the additive constant problem
.
Psychometrika
48
:
305
308
.

Ciampaglio
CN
,
Kemp
M
,
McShea
DW
2001
.
Detecting changes in morphospace occupation patterns in the fossil record: characterization and analysis of measures of disparity
.
Paleobiology
27
:
695
715
.

Cisneros
JC
,
Ruta
M
2010
.
Morphological diversity and biogeography of procolophonids (Amniota: Parareptilia)
.
Journal of Systematic Palaeontology
8
:
607
625
.

Close
RA
,
Friedman
M
,
Benson
RBJ
,
Lloyd
GT
2015
.
Elevated morphological rates and high disparity support a mid-Jurassic adaptive radiation in mammals
.
Current Biology
25
:
2137
2142
.

Cloutier
R
1991
.
Patterns, trends, and rates of evolution within the Actinistia
.
Environmental Biology of Fishes
32
:
23
58
.

Cox
TF
,
Cox
MAA
2001
.
Multidimensional scaling (Second Edition)
.
Boca Raton, FL
,
Chapman & Hall
.

Cullen
TM
,
Ryan
MJ
,
Schroder-Adams
C
,
Currie
PJ
,
Kobayashi
Y
2013
.
An ornithomimid (Dinosauria) bonebed from the Late Cretaceous of Alberta, with implications for the behavior, classification, and stratigraphy of North American ornithomimids
.
PLoS ONE
8
:
e58853
.

Davis
SP
,
Finarelli
JA
,
Coates
MI
2012
.
Acanthodes and shark-like conditions in the last common ancestor of modern gnathostomes
.
Nature
486
:
247
250
.

Derstler
KL
1982
.
Estimating the rate of morphological change in fossil groups
.
Proceedings of the Third North American Paleontological Convention
1
:
131
136
.

Dray
S
,
Dufour
AB
2007
.
The ade4 package: implementing the duality diagram for ecologists
.
Journal of Statistical Software
22
:
1
20
.

Farris
JS
1969
.
A successive approximations approach to character weighting
.
Systematic Zoology
18
:
374
385
.

Farris
JS
1970
.
Methods for computing Wagner trees
.
Systematic Zoology
19
:
83
92
.

Fitch
WM
1971
.
Towards defining the course of evolution: minimum change for a specified tree topology
.
Systematic Zoology
20
:
406
416
.

Foote
M
1996
.
Models of morphological diversification
. In:
Jablonski
D
,
Erwin
DH
,
Lipps
JH
eds.
Evolutionary paleobiology
.
Chicago, IL
:
Chicago University Press
,
62
86
.

Foote
M
1997
.
The evolution of morphological diversity
.
Annual Review of Ecology and Systematics
28
:
129
152
.

Foote
M
1999
.
Morphological diversity in the evolutionary radiation of Paleozoic and post-Paleozoic crinoids
.
Paleobiology
,
25
(supp),
1
115
.

Forey
PL
1988
.
Golden jubilee for the coelacanth Latimeria chalumnae
.
Nature
336
:
727
732
.

Foth
C
,
Brusatte
SL
,
Butler
RJ
2012
.
Do different disparity proxies converge on a common signal? Insights from the cranial morphometrics and evolutionary history of Pterosauria (Diapsida: Archosauria)
.
Journal of Evolutionary Biology
25
:
904
915
.

Gauthier
JA
1986
.
Saurischian monophyly and the origin of birds
. In:
Padian
K
ed.
The origin of birds and the evolution of flight
.
San Francisco, CA
:
Towne and Bacon
,
1
55
.

Goloboff
PA
1993
.
Estimating character weights during tree search
.
Cladistics
9
:
83
91
.

Goloboff
P
,
Farris
J
,
Nixon
K
2008
.
TNT, a free program for phylogenetic analysis
.
Cladistics
24
:
774
786
.

Gower
JC
1966
.
Some distance properties of latent root and vector methods used in multivariate analysis
.
Biometrika
53
:
325
338
.

Gower
JC
1971
.
A general coefficient of similarity and some of its properties
.
Biometrics
27
:
857
871
.

Halliday
TJD
,
Goswami
A
in press.
Eutherian morphological disparity across the end-Cretaceous mass extinction
.
Biological Journal of the Linnean Society
118
:
152
168
.

Harmon
LJ
,
Weir
J
,
Brock
C
,
Glor
RE
,
Challenger
W
2008
.
GEIGER: investigating evolutionary radiations
.
Bioinformatics
24
:
129
131
.

Hetherington
AJ
,
Sherratt
E
,
Ruta
M
,
Wilkinson
M
,
Deline
B
,
Donoghue
PCJ
2015
.
Do cladistic and morphometric data capture common patterns of morphological disparity
.
Palaeontology
58
:
393
399
.

Hopkins
MJ
in press.
Magnitude versus direction of change and the contribution of macroevolutionary trends to morphological disparity
.
Biological Journal of the Linnean Society
118
:
116
130
.

Hopkins
MJ
,
Smith
AB
2015
.
Dynamic evolutionary change in post-Paleozoic echinoids and the importance of scale when interpreting changes in rates of evolution
.
PNAS
112
:
3758
3763
.

Huelsenbeck
JP
,
Nielsen
R
,
Bollback
JP
2003
.
Stochastic mapping of morphological characters
.
Systematic Biology
52
:
131
158
.

Hughes
M
,
Gerber
S
,
Wills
MA
2013
.
Clades reach highest morphological disparity early in their evolution
.
Proceedings of the National Academy of Sciences of the United States of America
110
:
13875
13879
.

Hunt
G
2012
.
Measuring rates of phenotypic evolution and the inseparability of tempo and mode
.
Paleobiology
38
:
351
373
.

Kitching
IJ
,
Forey
PL
,
Humphries
CJ
,
Williams
DM
1998
.
Cladistics: the theory and practice of parsimony analysis (Second edition)
.
Oxford
:
Oxford University Press
.

Kotric
B
,
Knoll
AH
2015a
.
A morphospace of planktonic marine diatoms. I. Two views of disparity through time
.
Paleobiology
41
:
45
67
.

Kotric
B
,
Knoll
AH
2015b
.
A morphospace of planktonic marine diatoms. II. Sampling standardization and spatial disparity partitioning
.
Paleobiology
41
:
68
88
.

Kruskal
JB
1964
.
Nonmetric multidimensional scaling: a numerical method
.
Psychometrika
29
:
115
129
.

Lee
MSY
,
Cau
A
,
Naish
D
,
Dyke
GJ
2014
.
Sustained miniaturization and anatomical innovation in the dinosaurian ancestors of birds
.
Science
345
:
562
566
.

Liow
LH
2004
.
A test of Simpson's “rule of the survival of the relatively unspecialized” using fossil crinoids
.
American Naturalist
164
:
431
443
.

Liow
LH
2006
.
Do deviants live longer? Morphology and longevity in trachyleberidid ostracodes
.
Paleobiology
32
:
55
69
.

Liow
LH
2007
.
Lineages with long durations are old and morphologically average: an analysis using multiple datasets
.
Evolution
61
:
885
901
.

Livezey
BC
,
Zusi
RL
2006
.
Higher-order phylogeny of modern birds (Theropoda, Aves: Neornithes) based on comparative anatomy: I. – Methods and characters
.
Bulletin of the Carnegie Museum of Natural History
37
:
1
556
.

Lloyd
GT
2016
.
Data from: Estimating morphological diversity and tempo with discrete character-taxon matrices: implementation, challenges, progress, and future directions
.
Dryad Digital Repository
. doi:10.5061/dryad.gp16s

Lloyd
GT
,
Wang
SC
,
Brusatte
SL
2011
.
Data from: Identifying heterogeneity in rates of morphological evolution: discrete character change in the evolution of lungfish (Sarcopterygii; Dipnoi)
.
Dryad Digital Repository
. doi:10.5061/dryad.pg46f

Lloyd
GT
,
Wang
SC
,
Brusatte
SL
2012
.
Identifying heterogeneity in rates of morphological evolution: discrete character change in the evolution of lungfish (Sarcopterygii; Dipnoi)
.
Evolution
66
:
330
348
.

Lupia
R
1999
.
Discordant morphological disparity and taxonomic diversity during the Cretaceous angiosperm radiation: North American pollen record
.
Paleobiology
25
:
1
28
.

Maddison
WP
,
Maddison
DR
2008
.
Mesquite: a modular system for evolutionary analysis
, Version 2.5. Available at: http://mesquiteproject.org

Maddison
DR
,
Swofford
DL
,
Maddison
WP
1997
.
NEXUS: an extensible file format for systematic information
.
Systematic Biology
46
:
590
621
.

Maidment
SCR
,
Norman
DB
,
Barrett
PM
,
Upchurch
P
2008
.
Systematics and phylogeny of Stegosauria (Dinosauria: Ornithischia)
.
Journal of Systematic Palaeontology
6
:
367
407
.

McGowan
AJ
,
Neige
P
2011
.
Disparity as a complement to taxonomy and phylogeny in biogeographic studies: present and past examples from the cephalopods
. In:
Upchurch
P
,
McGowan
AJ
,
Slater
CSC
eds.
Palaeogeography and palaeobiogeography: biodiversity in space and time
.
Boca Raton, FL
:
CRC Press
,
173
205
.

Michaux
B
1989
.
Cladograms can reconstruct phylogenies: an example from the fossil record
.
Alcheringa
13
:
21
36
.

O'Keefe
FR
2001
.
A cladistic analysis and taxonomic revision of the Plesiosauria (Reptilia: Sauropterygia)
.
Acta Zoologica Fennica
213
:
1
63
.

Paradis
E
,
Claude
J
,
Strimmer
K
2004
.
APE: analyses of phylogenetics and evolution in R
.
Bioinformatics
20
:
289
290
.

Prentice
KC
,
Ruta
M
,
Benton
MJ
2011
.
Evolution of morphological disparity in pterosaurs
.
Journal of Systematic Palaeontology
9
:
337
353
.

R Core Team
.
2015
.
R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
. Available at: http://www.R-project.org/.

Raia
P
,
Carotenuto
F
,
Passaro
F
,
Piras
P
,
Fulgione
D
,
Werdelin
L
,
Saarinen
J
,
Fortelius
M
2013
.
Rapid action in the Palaeogene, the relationship between phenotypic and taxonomic diversification in Coenozoic mammals
.
Proceedings of the Royal Society of London Series B, Biological Sciences
280
:
20122244
.

Revell
LJ
2012
.
phytools: an R package for phylogenetic comparative biology (and other things)
.
Methods in Ecology and Evolution
3
:
217
223
.

Ricklefs
RE
,
Travis
J
1980
.
A morphological approach to the study of avian community organization
.
The Auk
97
:
321
338
.

Ruta
M
,
Wagner
PJ
,
Coates
MI
2006
.
Evolutionary patterns in early tetrapods. I. Rapid initial diversification followed by decrease in rates of character change
.
Proceedings of the Royal Society of London Series B, Biological Sciences
273
:
2107
2111
.

Ruta
M
,
Angielczyk
KD
,
Frobisch
J
,
Benton
MJ
2013a
.
Decoupling of morphological disparity and taxic diversity during the adaptive radiation of anomodont therapsids
.
Proceedings of the Royal Society of London Series B, Biological Sciences
280
:
20131071
.

Ruta
M
,
Botha-Brink
J
,
Mitchell
SA
,
Benton
MJ
2013b
.
The radiation of cynodonts and the ground plan of mammalian morphological diversity
.
Proceedings of the Royal Society of London Series B, Biological Sciences
280
:
20131865
.

Sakamoto
M
,
Ruta
M
2012
.
Convergence and divergence in the evolution of cat skulls: temporal and spatial patterns of morphological diversity
.
PLoS ONE
7
:
e39752
.

Sereno
PC
1999
.
The evolution of dinosaurs
.
Science
284
:
2137
2147
.

Shapiro
SS
,
Wilk
MB
1965
.
An analysis of variance test for normality (complete samples)
.
Biometrika
52
:
591
611
.

Sidlauskas
B
2008
.
Continuous and arrested morphological diversification in sister clades of characiform fishes: a phylomorphospace approach
.
Evolution
62
:
3135
3156
.

Smith
AJ
,
Rosario
MV
,
Eiting
TP
,
Dumont
ER
2014
.
Joined at the hip: linked characters and the problem of missing data in studies of disparity
.
Evolution
68
:
2386
2400
.

Sneath
PHA
,
Sokal
RR
1973
.
Numerical taxonomy
.
WH Freeman and Company
,
San Francisco, CA
.

Sokal
RR
,
Sneath
PHA
1963
.
Principles of numerical taxonomy
.
WH Freeman and Company
,
San Francisco, CA
.

Swofford
DL
2003
.
PAUP*: phylogenetic analysis using parsimony (*and other methods)
.
Sunderland, MA
:
Sinauer Associates
.

Thiele
K
1993
.
The holy grail of the perfect character: the cladistic treatment of morphometric data
.
Cladistics
9
:
275
304
.

Thorne
PM
,
Ruta
M
,
Benton
MJ
2011
.
Resetting the evolution of marine reptiles at the Triassic-Jurassic boundary
.
Proceedings of the National Academy of Sciences of the United States of America
,
108
,
8339
8344
.

Toljagic
O
,
Butler
RJ
2013
.
Triassic-Jurassic mass extinction as trigger for the Mesozoic radiation of crocodylomorphs
.
Biology Letters
9
:
20130095
.

Wagner
PJ
,
Estabrook
GF
2015
.
The implications of stratigraphic compatibility for character integration among fossil taxa
.
Systematic Biology
64
:
838
852
.

Warnes
GR
,
Bolker
B
,
Gorjanc
G
,
Grothendieck
G
,
Korosec
A
,
Lumley
T
,
MacQueen
D
,
Magnusson
A
,
Rogers
J
2014
.
gdata: various R programming tools for data manipulation
. R package version 2.13.3. Available at: http://CRAN.R-project.org/package=gdata

Westoll
TS
1949
.
On the evolution of the Dipnoi
. In:
Jepsen
GL
,
Simpson
GG
,
Mayr
E
eds.
Genetics, paleontology and evolution
.
Princeton, NJ
:
Princeton University Press
,
121
184
.

Wilkinson
M
1995
.
Coping with abundant missing entries in phylogenetic inference using parsimony
.
Systematic Biology
44
:
501
514
.

Wills
MA
1998
.
Crustacean disparity through the Phanerozoic: comparing morphological and stratigraphic data
.
Biological Journal of the Linnean Society
65
:
455
500
.

Wills
MA
2001
.
Morphological disparity: a primer
. In:
Adrain
JM
,
Edgecombe
GD
,
Lieberman
BS
eds.
Fossils, phylogeny, and form: an analytical approach
.
New York, NY
:
Kluwer Academic/Plenum Publishers
,
55
144
.

Wills
MA
,
Briggs
DEG
,
Fortey
RA
1994
.
Disparity as an evolutionary index: a comparison of Cambrian and Recent arthropods
.
Paleobiology
20
:
93
130
.

Wills
MA
,
Gerber
S
,
Ruta
M
,
Hughes
M
2012
.
The disparity of priapulid, archaeopriapulid and palaeoscolecid worms in the light of new data
.
Journal of Evolutionary Biology
25
:
2056
2076
.

Wright
AM
,
Lloyd
GT
,
Hillis
DM
in press.
Modeling character change heterogeneity in phylogenetic analyses of morphology through the use of priors
.
Systematic Biology
[http://sysbio.oxfordjournals.org/content/early/2015/12/22/sysbio.syv122.].

Yang
Z
,
Kumar
S
,
Nei
M
1995
.
A new method of inference of ancestral nucleotide and amino acid sequences
.
Genetics
141
:
1641
1650
.

Young
MT
,
Brusatte
SL
,
Ruta
M
,
De Andrade
MB
2010
.
The evolution of Metriorhynchoidea (Mesoeucrocodylia, Thalattosuchia): an integrated approach using geometric morphometrics, analysis of disparity, and biomechanics
.
Zoological Journal of the Linnean Society
158
:
801
859
.

Shared Data

Data deposited in the Dryad digital repository (Lloyd et al., 2016).