## Abstract

Geographic characters—traits describing the spatial distribution of a species—may both affect and be affected by processes associated with lineage birth and death. This is potentially confounding to comparative analyses of species distributions because current models do not allow reciprocal interactions between the evolution of ranges and the growth of phylogenetic trees. Here, we introduce a likelihood-based approach to estimating region-dependent rates of speciation, extinction, and range evolution from a phylogeny, using a new model in which these processes are interdependent. We demonstrate the method with simulation tests that accurately recover parameters relating to the mode of speciation and source–sink dynamics. We then apply it to the evolution of habitat occupancy in Californian plant communities, where we find higher rates of speciation in chaparral than in forests and evidence for expanding habitat tolerances.

The influence of species' traits on lineage diversification is an active area of macroevolutionary research, and recent advances have improved methods for detecting the phylogenetic signature of state-dependent speciation and extinction. In particular, hypotheses of trait acquisition leading to higher rates of species accumulation can now be tested for binary and quantitative characters using models that also allow for asymmetry in the direction of trait evolution (Maddison et al. 2007, FitzJohn 2010b). However, not all traits are well served by current methods, and in this paper, we draw attention to one particularly interesting class: characters whose states can be directly transformed by the processes of lineage birth and death. The concept that trait evolution may not only affect lineage diversification but also be affected by it is not new (e.g., rapid state changes associated with speciation, Eldredge and Gould 1972), but it has not previously been modeled in a way that permits hypothesis testing from comparative data.

We illustrate this idea of reciprocal evolutionary interactions with “geographic” characters whose states are defined by spatial distributions. Consider geographic range and habitat preference, two traits likely to influence speciation and extinction (McKinney 1997, Ribera et al. 2001, Jablonski and Roy 2003, Cardillo et al. 2005). For geographic range, imagine a widespread ancestral species split by allopatric divergence into two daughter species, each confined to a smaller range. For habitat preference, imagine an ancestral generalist species able to tolerate multiple habitat types in which local adaptation of a population to one habitat leads to its isolation and divergence as a new specialist species. In each of these examples, speciation results in the origin of at least one daughter lineage having a state different from its immediate ancestor.

This possibility of a change in character state at a splitting event has significant consequences for models of trait-dependent lineage diversification in which speciation is a stochastic process generating discrete events through time. If the rate of speciation is mode-dependent (e.g., if divergence in allopatry is more rapid than divergence in sympatry or parapatry) and the mode of speciation is state-dependent (e.g., if allopatric divergence is more likely in widespread species than in small-ranged endemics), then reciprocal feedback ensues between the evolution of traits and the rate and mode by which lineages proliferate. Character evolution is thus a function of rates of change along phylogenetic branches (anagenesis) and rates of change associated with speciation events (cladogenesis), while the rate of speciation is likewise affected by trait values and modes of divergence.

This feedback loop between diversification and state changes also includes extinction. In the case of geographic range, species with larger ranges are commonly considered less prone to extinction than those with smaller ranges (McKinney 1997). Range contraction events will therefore increase the rate of extinction, and they may have causes that are anagenetic (stochastic extirpation of local populations) or cladogenetic (speciation fragmenting an ancestral range into smaller descendant ranges). These effects are counterbalanced by range expansion, achieved through dispersal events that establish new populations. Increases in range size may have a variety of effects on speciation rates (Rosenzweig 1995, Chown 1997, Gaston and Chown 1999).

Dynamic, reciprocal interactions between state transitions (range expansion and contraction) and rates of lineage birth and death are an intrinsic feature of many evolutionary and ecological hypotheses. For example, are regional differences in species richness and endemism driven by spatial asymmetries in speciation and extinction rates, or by asymmetries in the direction of lineage dispersal (i.e., centers of origin and accumulation; Stebbins, 1974, Chown and Gaston, 2000, Mora et al., 2003)? How do range size and range evolution affect speciation rate (Wagner and Erwin 1995, Gaston and Chown 1999, Jablonski and Roy 2003, Pigot et al. 2010)? What are the most common modes by which speciation divides ancestral ranges in terms of range size or degree of sympatry (Anderson 1985, Gaston 1998, Barraclough and Vogler 2000, Phillimore et al. 2008, Pigot et al. 2010)? How do speciation rates compare on the mainland, within an island, and directly after dispersal to the island (Gillespie and Roderick 2002, Ricklefs and Bermingham 2007)? Are specialists more likely to arise from generalist ancestors or directly from other specialists (Simpson 1953, Nosil and Mooers 2005)?

The need for phylogenetic models to infer these interactions has been raised previously in the context of biogeographic frameworks that lack parameters for stochastic speciation and extinction and hence allow consideration of geographic characters only on a static tree (Ree and Smith 2008, Lamm and Redelings 2009, Ree and Sanmartín 2009). Here, we develop a model for geographic traits that encapsulates the basic concepts introduced above. It uses the mathematical framework of Maddison et al. (2007), modified to include biogeographic parameters and to allow state changes at speciation and through local extinction. We demonstrate with simulations how parameter estimation can be used to test hypotheses about the tempo and mode of lineage diversification in the context of range evolution. We then apply the model to an empirical case study, quantifying the contributions of diversification and range shifts to diversity in Californian chaparral plant communities.

## THE GEOSSE MODEL

Our model of “geographic state speciation and extinction,” GeoSSE, differs from the “binary state speciation and extinction” model (BiSSE, Maddison et al. 2007) in three specific ways. First, the character states are defined so that a single, wide-ranging species may occupy more than one area or region simultaneously. Second, the transitions between states are explicitly parameterized in terms of range expansion through dispersal and range contraction through local extirpation. And third, rates of speciation, extinction, and dispersal are tied to the regions themselves rather than to the character states, causing the dynamics of wider-ranging species to be determined by the cumulative effects of the areas they inhabit.

The mechanisms and assumptions of GeoSSE are the same as the model of Goldberg et al. (2005), with the addition of between-region speciation (from Ree et al. 2005; see below). The two models differ in the type of data they are designed to fit: the first was developed for the ages of extant species as determined from the fossil record, whereas GeoSSE works with reconstructed phylogenies estimating relationships among extant species. Of existing phylogenetic biogeographic models, the closest is the dispersal-extinction-cladogenesis model (DEC, Ree and Smith 2008); GeoSSE differs in treating speciation and global extinction as a stochastic process rather than assuming a fixed phylogenetic history.

### Conceptualization

#### The regions.—

We greatly simplify spatial structure by modeling a system with only two regions, denoted A and B, in each of which a species may be present (having one or more populations) or absent. Coding ranges in this way yields three possible extant states: A, B (species endemic to a single region), and AB (species widespread in both regions). We assume that the carrying capacities of both regions are unbounded and that the diversity dynamics are not affected by species interactions, as in the basic birth–death framework.

It is simplest to interpret each region as a continuous portion of space, but our model applies equally to regions consisting of a collection of disjoint areas under the interpretation that the processes within each region are averaged across the individual areas. Consequently, although we will present the model in terms of geographic regions, it is also relevant to other range-related traits such as habitat type or the host type of parasites. Regions cannot, however, be defined by the species within them; such systems may instead require models with clade-specific rates or interaction effects between lineages.

#### Dispersal and extinction.—

Range evolution along a phylogenetic branch (anagenesis) is composed of two processes: range expansion via dispersal and range contraction via local extirpation. Over an infinitesimal time interval, ranges can evolve only by single events. In this two region model, range expansion consists of transitions from A or B to AB, occurring with per-lineage rates dA and dB, respectively. We assume that these and all other rates of the model are constant over time and across lineages. The reverse process, range contraction from AB to A or B, occurs with per-lineage rates xB and xA, respectively. We assume that extinction from a region is independent of presence in the other region, so global extinction of species from states A and B also occurs at per-lineage rates xA and xB, respectively. Lineage extinction in the GeoSSE model thus depends on range size as well as location because more events are required for extinction of a wider-ranging species. We do not allow instantaneous transitions between A and B or immediate global extinction of AB because each of these requires two events.

#### Speciation.—

In the model, speciation of an endemic lineage always produces two daughter lineages with the same range: an A parent produces two A daughters and similarly for B. A widespread AB species may also undergo speciation within a single region, yielding one endemic and one wide-ranging daughter, that is, the daughters have states A and AB if speciation was in Region A, or B and AB. The region-specific per-lineage rates for this within-region mode of speciation are sA and sB. Alternatively, an AB species may diverge along the boundary that separates the regions, yielding one A and one B daughter. The rate of this between-region mode of speciation is $sAB$. Because widespread lineages are subject to both the between- and within-region modes of cladogenesis, their effective rate of speciation is greater than that of endemics; it is the sum of sA, sB, and $sAB$.

Identical inheritance of geographic distribution is likely impossible when examined on a fine spatial scale, but we ignore structure inside regions. Therefore, although the two daughter ranges of an endemic parent are equal in our model, we do not specify whether within-region speciation is allopatric, parapatric, or sympatric. Between-region speciation events in GeoSSE are not traditional vicariance events (a physical change in the connectivity of the regions that affects all species) but allow different species to respond individually to their environment.

### Formulation

Because the mathematical description of the GeoSSE model is similar to BiSSE (Maddison et al. 2007), we present the formulation only briefly here; a full derivation can be found in online Appendix 1 (available from http://dx.doi.org/10.5061/dryad.8343). A schematic comparison of GeoSSE and BiSSE is shown in Figure 1.

FIGURE 1.

The states and allowed transitions in a) BiSSE and b) GeoSSE. Both models have six rate parameters and allow state-dependent speciation (λ or s) and extinction (μ or x), and anagenetic state changes (q, d, x). GeoSSE additionally allows state changes during speciation in association with the third (AB) character state.

FIGURE 1.

The states and allowed transitions in a) BiSSE and b) GeoSSE. Both models have six rate parameters and allow state-dependent speciation (λ or s) and extinction (μ or x), and anagenetic state changes (q, d, x). GeoSSE additionally allows state changes during speciation in association with the third (AB) character state.

Like BiSSE, GeoSSE assumes a fully resolved, dated phylogeny of the group in question. Although BiSSE can in principle allow for ancestral nodes with hard polytomies (via modification of Equation 4 in Maddison et al. 2007), the possibility of character state changes at nodes makes this unwieldy for GeoSSE. Phylogenetic uncertainty can better be incorporated by performing analyses across a posterior set of bifurcating trees, as we illustrate in our empirical case study below.

Geographic ranges of extant species should be known with sufficient precision to say whether each species is present in Region A only, Region B only, or both regions. Incomplete sampling, either randomly distributed across the tree or in the form of unresolved clades, can be incorporated as in FitzJohn et al. (2009), as can uncertain tip state information. Such incomplete information, of course, reduces the power of the analysis.

#### Likelihood of tree and character states (D).—

The likelihood $DNi(t)$ is proportional to the probability of a lineage beginning at time t in state i (i = A, B, or AB) evolving into a clade with identical branching structure and character states as the (sub)tree actually observed to descend from N. The branching process will be viewed as proceeding forward in time, though time is defined to increase towards the root of the tree (Online Fig. A1-1).

Changes in the $DNi$ over time within a branch are described by

(1a)

(1b)

(1c)
where $Ei(t)$ is the likelihood that a lineage in state i goes extinct before the present time, described in more detail below.

Comparing Equation 1 with Equation 3 of Maddison et al. (2007), the description for states A and B (Equations 1a and 1b) is quite similar to that for BiSSE's two states. A lineage remains unbranched and in state A, for example, if it does not speciate, disperse, or go extinct (first term in Equation 1a); it changes state by dispersing (second term in Equation 1a); and if speciation does occur, one daughter lineage goes extinct before the time of observation in order for the branch to appear without a node in the reconstructed tree (third term in Equation 1a). The AB state operates differently, however. Dispersal is not an option for a species already present in both regions, so dA and dB do not appear in Equation 1c. Local extinction causes a change of state out of AB (second and third terms in Equation 1c). Speciation can occur in any of three ways (last three terms of Equation 1c), but if it does, one of the daughter lineages must go extinct before the present.

When N is a tip, the $DNi(0)$ are the initial conditions, equal to the probability of finding state i at that tip. This value is one for the observed state i (or fi if only a proportion fi of tips of state i are included in the tree, FitzJohn et al. [2009]) and zero for the others.

Likelihoods from sister branches (denoted N and M) must be combined at the node of their immediate ancestor (denoted C; see online Fig. A1-1) with a speciation event. The likelihood of a lineage just below the ancestral node, $DCi(tC)$, is

(2a)

(2b)

(2c)

Comparing Equation 2 with Equation 4 of Maddison et al. (2007), the node join for states A and B is as simple as in BiSSE because a species endemic to a single region can only produce daughters that are themselves only in that region. A parent species in AB, however, can produce three possible pairs of daughters: AB and A, AB and B, or A and B (terms one to three, respectively, of Equation 2c).

When the ancestral node is the root of the tree (C=R), the conditional likelihoods $DRi(tR)$ must be summed with an appropriate weighting to obtain the likelihood of the entire tree given the parameter values. Equal or equilibrium frequencies of the three geographic states can be used as the weights, but this is not always appropriate (Goldberg and Igić 2008). A more robust solution is to weight each state by its likelihood of giving rise to the observed data (FitzJohn et al. 2009), which we do in all analyses below.

#### Likelihood of extinction (E).—

The probability that a lineage in state i at time t goes extinct by the present time (t=0) is denoted $Ei(t)$. Changes in Ei over time are described by

(3a)

(3b)

(3c)

Equation 3 is analogous to Equation 7 of Maddison et al. (2007) for species endemic to Regions A or B and slightly more complicated for AB species because of the additional modes of speciation. The possible ways in which a lineage can eventually go extinct are no events at this particular time but later extinction (first term of each of Equation 3abc); immediate extinction (only possible for endemics, second term of Equations 3a and 3b); a state change at this time via dispersal for endemics (third term of Equations 3a and 3b) or extinction for widespread species (second and third terms of Equation 3c), followed by eventual extinction from the new state; or speciation at this time followed by eventual extinction of both daughter lineages (final terms of each of Equation 3abc).

Extinction can only occur if some amount of time has passed, so the initial condition is $Ei(0)=0$ when taxon sampling is complete or $Ei(0)=1−fi$ if only a proportion fi of tips of state i are included in the tree (FitzJohn et al. 2009). As in BiSSE, the extinction probability equations (Equation 3) can be solved numerically. Their solutions can then be used to obtain solutions to Equation 1, which, together with the procedure for combining branch likelihoods at nodes (Equation 2), allows the tree likelihood to be computed for a specified set of parameter values. Maximum likelihood estimates or posterior probability distributions of regional speciation, extinction, and dispersal rates can then be obtained.

## SIMULATION TESTS

We used simulation tests to assess the accuracy and power of the GeoSSE model under a collection of scenarios for generating spatial structure in diversity and endemism.

### Methods

We simulated trees using a continuous-time birth–death process under a variety of parameter values summarized in Table 1. Each simulation started with a single lineage in state AB, and each scenario was tested with a batch of 500 trees. The tree sizes varied because the simulations were run for a fixed period of time rather than to a fixed number of tips, but we selected time periods such that the expected number of tips was 200. In order to consider a tree for analysis, we required a minimum of 20 tips and the existence of at least two of the three possible tip states. Acquisition bias (Lewis 2001) is, therefore, a possible artifact that may reduce estimation accuracy, but we did not find any obvious signs of bias in our results.

TABLE 1.

Parameter values used in the simulation tests. sA is speciation within Region A, sB is speciation within Region B, $sAB$ is between-region speciation, xA is extinction in Region A, xB is extinction in Region B, dA is dispersal from A to B, dB is dispersal from B to A, and T is elapsed time. In all cases, the expected number of tips per tree is approximately 200

 Batch sA sB $sAB$ xA xB dA dB T Scenario Results 1 2.0 2.0 0 1.9 1.9 2.0 2.0 4.2 Symmetric parameters Figure 2a–c 2 0.5 2.0 0 1.9 1.9 2.0 2.0 9.2 Asymmetric speciation Figure 2a 3 2.0 2.0 0 1.9 0.4 2.0 2.0 2.5 Asymmetric extinction Figure 2b 4 2.0 2.0 0 1.9 1.9 2.0 0.8 5.4 Asymmetric dispersal Figure 2c 5 1.5 0.5 1.0 0.7 0.7 1.5 1.5 4.1 Both speciation modes Figure 3a 6 1.5 0.5 0 0.7 0.7 1.5 1.5 4.7 Within-region speciation Figure 3b 7 0 0 1.0 0.1 0.1 1.5 1.5 12.0 Between-region speciation Figure 3c 8 1.5 0 0 0.5 0.5 2.0 0 5.1 Source–sink system Figures 4a,b and 5 9 0.5 0.5 0.2 1.0 1.0 3.0 4.0 11.3 Sink–sink system Figure 4c,d
 Batch sA sB $sAB$ xA xB dA dB T Scenario Results 1 2.0 2.0 0 1.9 1.9 2.0 2.0 4.2 Symmetric parameters Figure 2a–c 2 0.5 2.0 0 1.9 1.9 2.0 2.0 9.2 Asymmetric speciation Figure 2a 3 2.0 2.0 0 1.9 0.4 2.0 2.0 2.5 Asymmetric extinction Figure 2b 4 2.0 2.0 0 1.9 1.9 2.0 0.8 5.4 Asymmetric dispersal Figure 2c 5 1.5 0.5 1.0 0.7 0.7 1.5 1.5 4.1 Both speciation modes Figure 3a 6 1.5 0.5 0 0.7 0.7 1.5 1.5 4.7 Within-region speciation Figure 3b 7 0 0 1.0 0.1 0.1 1.5 1.5 12.0 Between-region speciation Figure 3c 8 1.5 0 0 0.5 0.5 2.0 0 5.1 Source–sink system Figures 4a,b and 5 9 0.5 0.5 0.2 1.0 1.0 3.0 4.0 11.3 Sink–sink system Figure 4c,d

We obtained maximum likelihood parameter estimates to assess accuracy across a batch of trees. Rates were constrained to be nonnegative, and the subplex algorithm was used to search the sometimes-tricky likelihood space. To test power in discerning differences between rates, we used likelihood ratio tests for which the critical log-likelihood difference was determined from simulated trees with symmetric rate parameters.

To test model selection performance, we compared scores of the Akaike information criterion (AIC), equal to twice the difference between the number of free parameters in a model and its maximum log-likelihood. The best model is the one with the lowest AIC score, but models scoring no more than two units higher still have “substantial” support (Burnham and Anderson 2002). The parameter combinations compared under this procedure were chosen to yield similar equilibrium distributions of tip states when possible, so as to give a fair and conservative test of the model's ability to distinguish between alternative explanations for the same pattern.

To assess precision and power for individual trees, we additionally carried out a Bayesian analysis on 100 trees chosen randomly from Batch 8. Each tree was subjected to 5000 post-burn-in iterations of Markov chain Monte Carlo (MCMC) under broad priors (slice sampling algorithm, exponential priors with rate 0.5). We formed credible intervals for the differences between relevant parameter pairs by computing quantiles of the MCMC samples.

Finally, to assess correlations between model parameters, we performed similar MCMC analyses on trees from Batches 5–9. Ten trees of approximately 200 tips were taken from each batch. A cross-correlation matrix was computed for the posterior samples from each tree.

Code for simulating trees under the GeoSSE model is available in the C program SimTreeSDD, available as Supplementary Material and from http://tigger.uic.edu/∼eeg/code/code.html. Code for estimating parameters is available in the R package diversitree (FitzJohn 2010a), available as Supplementary Material and from http://cran.r-project.org/web/packages/diversitree/, which was also used for the maximum likelihood and Bayesian analyses. Analysis scripts are in the Dryad database at doi:10.5061/dryad.8343.

### Scenario-specific results

#### Regional differences in a single process.—

We first apply our model to test the separate contributions of speciation, extinction, and dispersal to diversity differences between regions. Under the center of origin hypothesis (Croizat et al. 1974), a region with high diversity has a high speciation rate. Wallace (1878) argued that low diversity in some areas is due primarily to extinctions there driven by climate fluctuations. The center of accumulation hypothesis (Mora et al. 2003) suggests that high diversity in a region comes through immigration.

We used likelihood ratio tests to assess the power of the GeoSSE model to recover each of these three scenarios (fixing $sAB$ to zero). Using trees simulated with symmetric parameter values (Batch 1; see Table 1), we fit a six parameter model and three constrained five parameter models ($sA=sB$, $xA=xB$, or $dA=dB$) to obtain the log-likelihood differences below which 95% of the trees fell. These critical values were 2.07 for the equal speciation model, 1.80 for the equal extinction model, and 1.78 for the equal dispersal model. Then we performed likelihood ratio tests, using those critical values, on trees simulated with asymmetric rates. We found that 95% of the trees correctly rejected equal speciation rates (Batch 2), 26% correctly rejected equal extinction rates (Batch 3), and 29% correctly rejected equal dispersal rates (Batch 4).

For each of Batches 1–4, we fit the four models corresponding to those scenarios (i.e., one three parameter model and three four parameter models). With the three processes symmetric (Batch 1), the correct model had the lowest AIC score on 71% of the trees, but it was never substantially preferred (i.e., in no case did all the other models score more than two AIC units worse). Under the three asymmetric scenarios, the correct model was substantially preferred on 95% of trees under asymmetric speciation (Batch 2), 45% of trees under asymmetric extinction (Batch 3), and 46% of trees under asymmetric dispersal (Batch 4). When the full six parameter model was added to the set of those considered, it frequently scored well, lowering the selection of the correct model to 35% and 33% of trees for Batches 3 and 4, respectively, and always scoring comparably to or better than the correct model for Batch 2. These results are consistent with the known tendency of AIC to favor parameter-rich models (Kass and Raftery 1995), and we therefore prefer to conduct our subsequent inference by comparing parameter estimates rather than by selecting among models.

Figure 2 shows the effects of tree size and correlations between corresponding parameters in Batches 1–4. The median rate estimates under the six parameter model across each batch of 500 trees were close to the values used in the simulations. The handful of outlier points were mostly small trees. Asymmetry in speciation rate could be identified more reliably across trees than could asymmetry in dispersal or extinction, though the parameter clouds for even those quantities show moderate separation (keep in mind the high density of points near the true values).

FIGURE 2.

Maximum likelihood parameter estimates under asymmetries in each of a) speciation, b) extinction, and c) dispersal. Of the 500 trees per batch, a representative 150 are shown. The symmetric parameter values (Batch 1 in Table 1) are shown by open circles, and the asymmetric parameter values (Batches 2–4 for panels (a–c), respectively) are shown by filled triangles. Large gray symbols show the median parameter estimates for all trees in each batch, and dashed lines show the values used in the simulations. Symbol sizes indicate the number of tips per tree, with the scale shown in the panel (a) legend.

FIGURE 2.

Maximum likelihood parameter estimates under asymmetries in each of a) speciation, b) extinction, and c) dispersal. Of the 500 trees per batch, a representative 150 are shown. The symmetric parameter values (Batch 1 in Table 1) are shown by open circles, and the asymmetric parameter values (Batches 2–4 for panels (a–c), respectively) are shown by filled triangles. Large gray symbols show the median parameter estimates for all trees in each batch, and dashed lines show the values used in the simulations. Symbol sizes indicate the number of tips per tree, with the scale shown in the panel (a) legend.

#### Geographic mode of speciation.—

The GeoSSE model allows us to ask whether speciation occurs mainly within regions (sA and/or sB dominate) or if it involves reproductive isolation across the border between the regions ($sAB$ dominates). We consider three scenarios relating to the geographic mode of speciation: both within- and between-region speciation (Batch 5), within-region only ($sAB=0$, Batch 6), and between-region only ($sA=sB=0$, Batch 7). Parameter values are given in Table 1 and histograms of maximum likelihood estimates of the three speciation rates are shown in Figure 3.

FIGURE 3.

Histograms of maximum likelihood parameter estimates, with emphasis on geographic mode of speciation. Panels (a–c) are for Batches 5–7, respectively (Table 1). Arrows indicate the parameter values used in the simulations. In panel (c), the results for sA and sB are nearly identical thus overwrite each other.

FIGURE 3.

Histograms of maximum likelihood parameter estimates, with emphasis on geographic mode of speciation. Panels (a–c) are for Batches 5–7, respectively (Table 1). Arrows indicate the parameter values used in the simulations. In panel (c), the results for sA and sB are nearly identical thus overwrite each other.

When speciation happens only within regions, $sAB$ is usually correctly estimated to be near zero, and different values for sA and sB can easily be distinguished (Fig. 3b). When speciation is only between regions, sA and sB are correctly estimated near zero, and the positive value of $sAB$ is correctly found (Fig. 3c). When speciation occurs both within and between regions, the three speciation rates can on average be recovered, but their distributions across trees are broader and overlapping (Fig. 3a). When between-region speciation occurs but is ignored, estimates of sA and sB are correctly ordered and distinguished but are biased substantially upward (by more than 40% with the Batch 5 parameter values; results not shown).

In likelihood ratio tests, using critical values determined from Batches 6 and 7 (1.21 and 1.29, respectively), we found that 85% of trees in Batch 5 correctly rejected $sAB=0$ and 100% correctly rejected $sA=sB=0$.

#### Macroevolutionary sinks.—

Dispersal or range expansion of species can play an important role in determining regional diversity (Roy and Goldberg 2007). This is particularly evident in “macroevolutionary sinks,” regions in which the speciation rate is low and richness is maintained by immigration. We consider two such scenarios. In the source–sink case (Batch 8, Table 1), all species in the system arise in one region (the source), but they may expand their ranges, so diversity in the other region (the sink) increases through immigration. For simplicity, dispersal from the sink to the source does not occur, and extinction occurs at the same rate in both regions. In the sink–sink scenario (Batch 9, Table 1), both regions have speciation rates lower then their extinction rates, so without dispersal, diversity would fall to zero in each area. Because extinction in each region is an independent event, however, sufficiently frequent dispersal can maintain positive diversity in the system, allowing each region to serve as a refuge for the other.

Despite the difficulties in estimating extinction and dispersal rates, the GeoSSE model is able to recover both source–sink and sink–sink dynamics (Fig. 4). Net diversification is nearly always estimated as positive in the source (Fig. 4a) and negative in the sinks (Fig. 4b–d). Dispersal rate estimates show more variation, but immigration is clearly recovered as greater than diversification in the sinks and less in the source.

FIGURE 4.

Histograms of maximum likelihood parameter estimates, with emphasis on dispersal maintaining diversity. Panels (a) and (b) are for the source and sink regions in Batch 8, and panels (c) and (d) are for the two sink regions in Batch 9. Net diversification rates of and immigration rates into each region are shown. Arrows indicate the parameter values used in the simulations.

FIGURE 4.

Histograms of maximum likelihood parameter estimates, with emphasis on dispersal maintaining diversity. Panels (a) and (b) are for the source and sink regions in Batch 8, and panels (c) and (d) are for the two sink regions in Batch 9. Net diversification rates of and immigration rates into each region are shown. Arrows indicate the parameter values used in the simulations.

To better reflect how real-world analysis of a single data set would proceed, results from the posterior probability distributions of 100 trees from the source–sink scenario (Batch 8) are shown separately (Fig. 5). Precision increases markedly with tree size and is lowest for dispersal. Accuracy is generally good for extinction and within-region speciation rates, and it improves with tree size for dispersal and between-region speciation rates. There is a slight bias toward underestimating the speciation difference between the regions (Fig. 5a; likely because sB and $sAB$ can be given values greater than but not less than their true values of zero), which causes underestimation of the dispersal difference (Fig. 5d).

FIGURE 5.

Credible intervals for diversity differences driven by source–sink dynamics (Batch 8). Each point shows results for a single tree, ordered on the horizontal axis by tree size. Open circles show the median value of the quantity written on the vertical axis, and inner and outer whiskers mark the 50% and 90% credible intervals, respectively. Horizontal dashed lines indicate the values used in the simulations.

FIGURE 5.

Credible intervals for diversity differences driven by source–sink dynamics (Batch 8). Each point shows results for a single tree, ordered on the horizontal axis by tree size. Open circles show the median value of the quantity written on the vertical axis, and inner and outer whiskers mark the 50% and 90% credible intervals, respectively. Horizontal dashed lines indicate the values used in the simulations.

Of the 100 trees, 98% confidently show a difference in the source and sink speciation rates (i.e., a difference of zero falls outside the 90% credibility interval); this increases to 100% when considering only the 57 trees with more than 200 tips (Fig. 5a). Ninety-three percent of trees (89% of trees with more than 200 tips) show a speciation rate difference that is consistent with the value used in the simulations (Fig. 5a). All trees correctly find no extinction rate difference between the two regions, and this is not due solely to large credibility intervals, at least on the large trees (Fig. 5c). The between-region speciation rate was zero in the simulations and was estimated to be very small on the large trees (Fig. 5b). On the smallest trees, however, some of the diversity in the sink region was incorrectly attributed to between-region speciation (Fig. 5b; also to speciation within the sink to a lesser extent, results not shown) rather than to immigration. Seventy-four percent of trees (96% of trees with more than 200 tips) confidently show a difference between the dispersal rates, and 79% of trees (75% of trees with more than 200 tips) find a dispersal difference consistent with the value used in the simulations (Fig. 5d).

### General conclusions

#### Parameter correlations.—

We found that the posterior distribution in multidimensional parameter space is considerably more constrained than is represented by the marginal parameter distributions. Some parameter correlations persisted across much of parameter space and all individual trees, whereas others were more localized or intermittent. All scenarios except the source–sink (Batch 8) showed consistent correlation between net diversification rate within a region and immigration into that region ($sA−xA$ and $dB$; $sB−xB$ and $dA$). This was generally driven by higher correlations between extinction and dispersal than between speciation and dispersal. Correlation between $dB$ and $xA$ was slightly weaker in the source–sink scenario, likely because dispersal into the source is low. Correlation between speciation and extinction within a region was common only when speciation was relatively high (Region A of Batches 5, 6, and 8). There was little correlation between the two (or three) speciation rates or between the two extinction rates (see also Fig. 2). Correlation between the two dispersal rates was common only in Batch 5. The overall strength of correlations was greatest in the sink–sink scenario (Batch 9).

#### Analysis recommendations.—

Due to the complex likelihood surfaces that arise under GeoSSE (as revealed by parameter correlation study and trial usage of a variety of optimization algorithms), working with more than point estimates of parameters is important. The MCMC-based approach we illustrate in the source-sink scenario above and the empirical application below provided a relatively straightforward basis for examining parameter uncertainties and correlations, and for hypothesis testing without the complications of model selection procedures.

Such investigations will be an important part of any empirical analysis, although some patterns are likely to emerge in all applications. Based on our simulation tests, the signature of speciation, particularly the within-region mode, appears to be written most clearly on a phylogeny, followed by those of extinction and dispersal. The strongest parameter correlations are typically between extinction and immigration, and accounting for those can allow more powerful inference than can the marginal parameter distributions alone.

## APPLICATION TO CALIFORNIA CHAPARRAL

We applied GeoSSE to the problem of diversification and historical habitat shifts in the California flora. The California Floristic Province is one of five Mediterranean biodiversity hot spots on the planet, all of which are characterized by a diversity of habitat types that turn over rapidly at short spatial scales, thus supporting multiple niches for diversification and species persistence (di Castri and Mooney 1973). Mediterranean regions are also all characterized by unique sclerophyllous shrubland communities, which are rare or absent outside of these temperate hot spots (Verdú et al. 2003). One or both of these features of Mediterranean biomes may critically affect the currently high diversity in those regions.

Two California-native shrub genera, Arctostaphylos Adans. (manzanita, Family: Ericaceae) and Ceanothus L. (California lilac, Family: Rhamnaceae), are key indicator species of the chaparral, California's version of unique Mediterranean shrubland (24% of Arctostaphylos spp. and 30% of Ceanothus spp. are chaparral endemics). However, both genera also contain forest/woodland-restricted species (7% and 29%, respectively), and generalist species inhabiting both chaparral and wooded habitats (69% and 41%, respectively), making these genera excellent candidates for examining how a unique habitat (chaparral) and shifts in habitat occupancy have interacted to produce extant diversity and distributions in California.

Because of the ubiquity of sclerophyllous shrublands in Mediterranean biodiversity hot spots, we predicted that the chaparral habitat may promote speciation, leading to a higher speciation rate in the chaparral than in forested regions ($sC>sF$). Consequently, we expected to see greater habitat expansion out of chaparral than into it ($dC>dF$). We also investigated historical expansion and contraction of habitat tolerance in these genera. Like other Mediterranean hot spots, California features many, highly interdigitated plant communities. Therefore, species have ample opportunity to enter and adapt to alternative habitats. Because of this, we predicted that expansion in habitat tolerance would be more common than habitat specialization (i.e., net range expansion is positive; $dC−xF>0$ and $dF−xC>0$).

We further expected the interdigitated nature of California's habitats to preclude between-region speciation: barriers to gene flow that separate all habitat patches by type are uncommon, so it is unlikely that a generalist species would produce one chaparral-specialist daughter and one forest-specialist daughter ($sCF=0$). This would mean that habitat specialization is more likely to occur via extirpation (anagenic specialization) or local speciation (cladogenic specialization) than via allopatric divergence. We therefore asked which of anagenic of cladogenic specialization has been more important (comparing $xF$ with $sC$ and $xC$ with $sF$).

### Methods

We generated time-calibrated phylogenies for each of Arctostaphylos and Ceanothus using sequences for internal transcribed spacer regions of 18s-26s rDNA retrieved from GenBank (http://www.ncbi.nlm.nih.gov) and fossil calibration dates retrieved from the Paleobiology database (http://paleodb.org). Phylogenies were constructed with the Bayesian MCMC method implemented in BEAST (Drummond and Rambaut 2007), using a lognormally distributed relaxed molecular clock model (Drummond et al. 2006) and a birth–death prior. Habitat states for extant species were obtained from the Jepson Manual (Hickman 1993), the CalFlora database (http://www.calflora.org), and other sources (Fig. 6). We fit the GeoSSE model to Arctostaphylos and Ceanothus jointly by approximating the genera as independent; joint fitting increased power without qualitatively changing the results from estimates derived from each genus separately. Model parameters were estimated via MCMC over 1000 trees per genus, and results are reported as the proportion of samples from the posterior distribution for which a statement is true. Models with and without the between-habitat speciation parameter (sCF) were compared with the Bayes factor (Kass and Raftery 1995). Full methods are provided in online Appendix 2, and data files and analysis scripts are available in the Dryad database at doi:10.5061/dryad.8343; TreeBASE study number 11167.

FIGURE 6.

Summary phylogenies with tip states for Arctostaphylos and Ceanothus. These are the maximum clade credibility trees with mean node heights; the posterior set of trees was used in the analysis.

FIGURE 6.

Summary phylogenies with tip states for Arctostaphylos and Ceanothus. These are the maximum clade credibility trees with mean node heights; the posterior set of trees was used in the analysis.

### Results

We found “positive” evidence in favor of the six parameter over the seven parameter model ($2ln[BF67]=3.0$; Kass and Raftery 1995). Based on this moderate statistical support and the spatial structure of the system, we present results only for the model that excludes (sets to zero) sCF. Parameter estimates are reported in Table 2, and their posterior distributions are shown in Figure 7a–c.

TABLE 2.

For the California plants, mean parameter estimates, their standard errors (s.e.), and their correlations. C denotes chaparral habitat and F denotes forest habitat. Stronger correlations (approximately −0.9) were found between $sC−xC$ and $dF$, and between $sF−xF$ and $dC$

 Correlations Parameter Mean ± s.e. $sC$ sF xC xF dC dF sC 0.194 ± 0.0032 — sF 0.080 ± 0.0020 −0.436 — xC 0.294 ± 0.0067 0.322 0.503 — xF 0.481 ± 0.011 0.485 0.195 0.280 — dC 1.293 ± 0.028 0.515 −0.025 0.175 0.887 — dF 0.873 ± 0.021 −0.106 0.653 0.837 0.101 −0.018 —
 Correlations Parameter Mean ± s.e. $sC$ sF xC xF dC dF sC 0.194 ± 0.0032 — sF 0.080 ± 0.0020 −0.436 — xC 0.294 ± 0.0067 0.322 0.503 — xF 0.481 ± 0.011 0.485 0.195 0.280 — dC 1.293 ± 0.028 0.515 −0.025 0.175 0.887 — dF 0.873 ± 0.021 −0.106 0.653 0.837 0.101 −0.018 —
FIGURE 7.

Posterior probability distributions of GeoSSE parameters for California plants. Rates of a) speciation, b) extinction, and c) dispersal were estimated for chaparral (solid lines) and forest (dashed lines) habitats. Differences in d) speciation and extinction rates between habitats, e) range contraction versus expansion, and f) mechanism of specialization were computed from the six estimated parameters.

FIGURE 7.

Posterior probability distributions of GeoSSE parameters for California plants. Rates of a) speciation, b) extinction, and c) dispersal were estimated for chaparral (solid lines) and forest (dashed lines) habitats. Differences in d) speciation and extinction rates between habitats, e) range contraction versus expansion, and f) mechanism of specialization were computed from the six estimated parameters.

We inferred a higher rate of speciation in chaparral than in forested regions in California ($sC−sF=0.11±0.003$, positive with posterior probability 0.82; Fig. 7a,d). Comparing directions of range expansion and contraction, we find indications of more rapid dispersal out of chaparral than into it ($dC−dF=0.420±0.019$, positive with posterior probability 0.72; Fig. 7c) and of more frequent reduction to chaparral habitat than to forest ($xF−xC=0.187±0.0051$, positive with posterior probability 0.76; Fig. 7b,d).

We also found that expansion in habitat tolerance has been more common than habitat specialization in the history of these genera (Fig. 7e). Net range expansion (i.e., expansion exceeds contraction) out of chaparral-covered regions and into forested habitat was strongly supported ($dC−xF=0.812±0.017$, positive with posterior probability 0.97). Support for net range expansion in the opposite direction was milder ($dF−xC=0.579±0.015$, positive with posterior probability 0.75).

Finally, we find that the production of range-limited species (i.e., species inhabiting chaparral or forest but not both) has occurred predominantly via anagenesis, rather than via cladogenesis, in these genera (Fig. 7f). (For specialization on or range limitation to chaparral-covered regions, $xF−sC=0.287±0.0086$ and was positive with posterior probability 0.90. For forested regions, $xC−sF=0.214±0.0054$ and was positive with posterior probability 0.96.)

Correlations between parameters are reported in Table 2. As in the simulation tests, the strongest correlations were between the extinction/extirpation and range expansion parameters. Next strongest, for these data, were the correlations between speciation in and range expansion out of each habitat; this is likely driven by the tie between extinction and dispersal. The combination of these two factors leads to very strong negative correlations between net diversification within a region and immigration into that region. Correlations between speciation and extinction in the same habitat type were relatively low.

Imprecision in the parameter estimates, as displayed in Figure 7, reflects contributions from both phylogenetic uncertainty (in topology and branch lengths, as captured by the posterior set of trees) and uncertainty from the parameter estimation process on each tree. Using coefficients of variation for each parameter to compare average within-tree variability, tree-to-tree variation in median estimates, and overall variability, we find that phylogenetic uncertainty contributes approximately 50% of the variation seen in each speciation rate, 40% for each extinction rate, and 35% for each dispersal rate. More precise phylogenetic inference would hence provide moderate improvement in our results, particularly for speciation, the process to which the model is most sensitive.

### Interpretation

Our results indicate that chaparral is a key site of speciation in California, at least for the two shrub genera under consideration here. Future examination of additional groups in more regions will test whether sclerophyllous shrublands consistently foster speciation across all five Mediterranean biodiversity hot spots. We also found evidence that habitat tolerance is more likely to expand than to contract in these California genera. The spatially fine-grained, and hence readily accessible patches of alternative habitat types in California and other Mediterranean regions may promote habitat generalization and may therefore support species richness if habitat-generalist lineages have lower extinction probability than specialists (as assumed by GeoSSE). We further found that the net direction of range expansion was in the direction of forested habitats; similarly, other forest lineages in California and other Mediterranean regions may have shrubland origins. As predicted, we found no evidence of vicariant speciation between forest and chaparral habitat types, likely because barriers to gene flow between these habitat types are rare. However, we did estimate moderate rates of anagenic habitat specialization, perhaps due to trade-offs between adaptation to drier, fire-prone shrublands, and moister, shaded forest habitats.

Habitat preference is often thought of as a labile trait, with transitions potentially occurring on ecological rather than macroevolutionary timescales, and this could limit GeoSSE's power (see below). However, recent evidence indicates that plant community membership is in fact quite conserved, with >96% of sister species in one study occupying the same plant community type as their most recent shared ancestor (Crisp et al. 2009). This indicates that adaptation to a new plant community is a significant and relatively rare event in a plant lineage, and one that is therefore tractable at macroevolutionary timescales.

Directly comparing the success of generalist and specialist species is not possible with GeoSSE because the same rates apply to each (though in different combinations). Possible reparameterizations to address this are discussed below, but a different approach is to apply the existing model to additional clades spanning various combinations of habitat types. Resulting associations between habitat breadth and speciation or extinction rates may reveal the extent to which habitat generalization enhances speciation rates or reduces global extinction probability.

## DISCUSSION

### Biogeographic Significance

The GeoSSE model advances existing biogeographic inference methods by integrating processes that have previously been treated separately (Lamm and Redelings 2009, Ree and Sanmartín 2009). Range and habitat evolution have been inferred on phylogenies using models for discrete characters that do not include parameters for the tempo, mode, or regional dependence of lineage diversification (e.g., Nepokroeff et al. 2003, Fine et al. 2005, Sanmartín et al. 2008, Weir et al. 2009, Bastida et al. 2010, Pirie et al. 2010). The DEC model of range evolution considers mode but not tempo or regional dependence (Ree et al. 2005, Ree and Smith 2008). Such models have been used to test for lineage movement as a key innovation (e.g.,Moore and Donoghue 2007, Moore and Donoghue 2009), but because separate analyses are required to first reconstruct ancestral states and then estimate their effect on diversification, this approach may sacrifice power or be subject to confounding model interactions (Maddison 2006).

Studies of regional effects on diversification, largely in the context of latitudinal diversity gradients or centers of origin, typically employ sister-group or -species comparisons or regressions on lineage or clade size or age (Gaston and Blackburn 1996, Cardillo et al. 2005, Ricklefs 2006, Weir and Schluter 2007, Wiens 2007). However, these do not consider the effect of change in species' ranges through time (Roy and Goldberg 2007). Methods for distinguishing the geographic mode of speciation are similarly hampered by post-speciational range shifts (Chesser and Zink 1994, Losos and Glor 2003, Fitzpatrick and Turelli 2006).

Application of BiSSE (Maddison et al. 2007) to a geographic character (Valente et al. 2009, Rabosky and Glor 2010) does allow consideration of both diversification differences and range changes. Geographic mode of speciation is not considered, though, so it does not cleanly extract the signal of region-specific effects from widespread species. The different parameterizations of BiSSE and GeoSSE may, however, be combined to give complementary perspectives on the dynamics of a system (cf. Anacker et al. 2011).

### Model Limitations and Extensions

As mathematical models of character evolution on a phylogeny become more elaborate, an obvious question is, how much information is necessary for confident inferences? The answer certainly depends on the model, the details of the data, and the strength of the signal being sought, but our simulation and empirical results indicate that roughly one or two hundred tip species may be sufficient for GeoSSE to uncover significant differences between parameters. The continuing rapid increase in availability of large phylogenies makes this a reasonable demand, and we have also illustrated how separate, smaller trees can be combined into a single analysis (subject to assumptions or justifications of independence; e.g., Anacker et al. 2011).

A potential liability of requiring a moderately large set of species is the assumption that rates of diversification and trait evolution are constant across the phylogeny, which may become more tenuous with broader and deeper clades. Allowing rates to vary across time and between clades is certainly possible, and it could be useful to represent, for example, the effects of an unmodeled trait that appears in one or a few instances, or changes in spatial connectivity such as appearance of a dispersal corridor or a vicariance event. This technique must be applied judiciously (e.g., Alfaro et al. 2009, Weir et al. 2009, FitzJohn 2010b), however, to prevent an explosion in the number of parameters to be estimated. Diversification that depends on species richness presents a different form of time-dependent rates. Phylogenetic models in which speciation and extinction are affected by species interactions will be considerably more complicated to develop, but they may be necessary to distinguish truly density-dependent diversification from the more general pattern of slowdowns in diversification (Pybus and Harvey 2000, Rabosky and Lovette 2008). Including a biogeographic context is especially relevant here (Rabosky 2009) because species interactions depend on their spatial distributions. Successfully tying rate heterogeneity to particular conditions, events, or trait changes will greatly improve the extent to which relatively simple models fit real data, and it will enhance our understanding of the forces shaping diversification and trait macroevolution.

As with related macroevolutionary models of character evolution, GeoSSE's utility will be limited when state changes are very rapid or very slow relative to diversification or the age of the clade being studied. This will affect the spatial scale of regions that can be effectively treated, because, for example, range changes between continents occur on a different temporal scale than those within local landscapes. A small set of simulations (not shown) indicated that when extinction and especially dispersal rates are very high relative to speciation and elapsed time, speciation rates are still accurately estimated and dispersal rates are correctly found to be much larger, but pronounced correlations can cause dispersal-driven diversity differences to be misattributed to extinction. Further tests of this limitation will require separating the rates of range contraction or extirpation from those of global extinction (see below). It may well be that rates of range expansion and contraction can be correctly inferred as large relative to speciation and extinction, but that ancestral ranges (see below) cannot then be reconstructed with any confidence.

One difficulty with this model's structure is the inevitable correlation between some parameters. We cannot reduce parameter number by combining processes because each has its own unique behavior. Incomplete independence of the parameters is, however, evidenced by pervasive correlations, especially between dispersal and extinction. This is intuitively expected because immigration and extirpation “undo” each other in their effects on species richness. It is perhaps surprising that correlations between speciation and extinction are less strong, but we attribute this to speciation effects being directly written on tree shape, whereas dispersal effects are not. Strong correlations do not necessarily prevent useful inference, however, and understanding their structure will improve the precision of conclusions.

All else being equal, GeoSSE may have greater statistical power than BiSSE because it has the same number of rate parameters but a larger number of character states. Further work is needed to determine whether additional states generally improve power in the BiSSE family of models or if the benefits of increased resolution may be offset by the reduction in number of tips in each state and the greater number of parameters generally required to describe transitions between the states. When feasible, however, multistate characters may allow finer-grained questions to be addressed.

GeoSSE requires only six rate parameters with its three states by making constraining assumptions about several processes, for example, rates of local and global extinction are equal within each region, and endemics and wide-ranging species share common regional rates of speciation. This strategy can be retained to maintain a reasonable number of parameters when extending the model to allow more than two regions (online Appendix 3). Constraints of this nature may not always be desired, however. For example, further study of diversification and range shifts in California chaparral communities would benefit from separating the parameters describing local extirpation and global extinction. This would enable tests of habitat shift directionality not directly affected by lineage extinction facilitate comparisons between habitat specialists and generalists. In other systems, hypotheses of interest might involve state transitions not allowed by GeoSSE, for example, A changing directly to B, an ancestral AB species yielding two AB daughters at a speciation event, or global extinction of AB species.

A general framework would allow an arbitrary number of states, all possible transitions between them along phylogenetic branches, and all possible modes of state segregation at cladogenesis events. This general model could then be constrained as needed by disallowing certain events and assuming that some processes share parameters. For example, such constraints might set northward dispersal to be constant across regions, speciation rates to vary linearly with latitude, or extinction rates to scale with region size. Such a framework would allow flexible handling of a variety of discrete characters and provide a natural means for testing hypotheses about anagenetic versus cladogenetic change. It could also facilitate tests of the errors incurred by fitting models with parameterizations that misrepresent the true processes acting in a system.

Analysis of nongeographic characters could also benefit from models that link state change to speciation and extinction. For example, the breakdown of self-incompatibility can accelerate speciation because self-fertilization will rapidly isolate a new lineage from its parent stock (Foxe et al. 2009); this mating system trait can also affect extinction and net diversification rates (Goldberg et al. 2010). Polyploidization can cause immediate speciation (Ramsey and Schemske 1998), and this process could be built into a model of chromosome number evolution (Mayrose et al. 2010).

These examples recall punctuated character evolution in the sense that speciation events can be a direct and immediate outcome of anagenetic transitions. A biogeographic process that might be modeled in this way is long distance or “jump” dispersal in which rare migration events establish new populations that are effectively isolated by geographic barriers from the rest of the species. In GeoSSE, this phenomenon translates to a high value of $sAB$, the allopatric speciation parameter, which accelerates divergence following dispersal. However, one might imagine the need for a model in which state transitions and speciation events are exactly coincident in time. In general, tests of punctuated change are preferably performed using methods that explicitly model the process of character change (e.g., Bokma 2008), rather than indirect strategies such as setting all branches of the phylogeny to equal length (Pagel 1994, Pagel 1999a).

In addition to rate estimation, reconstruction of ancestral states is an appealing class of inference due to its nominally more concrete view of changes in character states. Implementation with GeoSSE would require an extension of existing approaches (Schluter et al. 1997, Pagel 1999b, Bollback 2006) to allow for state changes during speciation. This could be approached by considering triplets of parent and daughter states at each node. It may well be that ancestral states can be more confidently reconstructed under BiSSE and GeoSSE when diversification is strongly state-dependent. Testing is required, however, to determine accuracy and power.

Beyond the more general parameterization of GeoSSE described above, a useful extension would be to multiple, correlated characters. This would allow tests of, for example, ties between life history or morphology and geographic distribution. Another, perhaps more technically difficult possibility is to treat range as a continuous character and work with more spatially explicit functions of speciation and extinction rates; this could begin with the framework of FitzJohn (2010b).

Evidence for macroevolutionary and biogeographic dynamics may come from a variety of sources, employing data and methods spanning population biology, comparative analyses, and the fossil record. Further modeling of the reciprocal effects of speciation and character change on phylogenetic trees may enable a new, complementary suite of tests for questions typically addressed on other spatial or temporal scales.

## SUPPLEMENTARY MATERIAL

Supplementary material, including data files and/or online-only appendices, can be found at http://dx.doi.org/10.5061/dryad.8343.

## FUNDING

This work was supported by the National Science Foundation [DEB-0919089 to E.E.G., DEB-0614108 to R.H.R.]; and the National Center for Ecological Analysis and Synthesis, a center funded by the National Science Foundation [DEB-0072909 to L.T.L.].

This project benefited immensely from discussions with Kaustuv Roy, Russ Lande, and Isabel Sanmartín. Work with Boris Igić, Brian Anacker, Justen Whittall, and Susan Harrison broadened our vision of what could be tackled with these approaches. Trevor Price's lab, Boris Igić, and two anonymous reviewers shared insightful comments on the manuscript. Sally Otto generously provided independent derivations of Equations 1–3. Rich FitzJohn's effort on diversitree was extremely valuable for our implementation.

## References

Alfaro
ME
Santini
F
Brock
C
Alamillo
H
Dornburg
A
Rabosky
DL
Carnevale
G
Harmon
LJ
Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates
,
2009
, vol.
106
(pg.
13410
-
13414
)
Anacker
BL
Whittall
JB
Goldberg
EE
Harrison
SP
Origins and consequences of serpentine endemism in the California flora
Evolution
,
2011
, vol.
65
(pg.
365
-
376
)
Anderson
S
The theory of range-size (RS) distributions
Am. Mus. Novit
,
1985
, vol.
2833
(pg.
1
-
20
)
Barraclough
TG
Vogler
AP
Detecting the geographical pattern of speciation from species-level phylogenies
Am. Nat
,
2000
, vol.
155
(pg.
419
-
434
)
Bastida
JM
Alcantara
JM
Rey
PJ
Vargas
P
Herrera
CM
Extended phylogeny of Aquilegia: the biogeographical and ecological patterns of two simultaneous but contrasting radiations
Plant Syst. Evol.
,
2010
, vol.
284
(pg.
171
-
185
)
Bokma
F
Detection of “punctuated equilibrium” by Bayesian estimation of speciation and extinction rates, ancestral character states, and rates of anagenetic and cladogenetic evolution on a molecular phylogeny
Evolution
,
2008
, vol.
62
(pg.
2718
-
2726
)
Bollback
JP
SIMMAP: stochastic character mapping of discretetraits on phylogenies
BMC Bioinformatics
,
2006
, vol.
7
pg.
88

Burnham
KB
Anderson
D
Model selection and multi-model inference
,
2002
New York (NY)
Springer
Cardillo
M
Orme
CDL
Owens
IPF
Testing for latitudinal bias in diversification rates: an example using New World birds
Ecology
,
2005
, vol.
86
(pg.
2278
-
2287
)
Chesser
RT
Zink
RM
Modes of speciation in birds: a test of Lynch's method
Evolution
,
1994
, vol.
48
(pg.
490
-
497
)
Chown
SL
Kunin
WE
Gaston
KJ
Speciation and rarity: separating cause from consequence
The biology of rarity: causes and consequences of rare-common differences
,
1997
London
Chapman & Hall
(pg.
91
-
109
)
Chown
SL
Gaston
KJ
Trends Ecol. Evol.
,
2000
, vol.
15
(pg.
311
-
315
)
Crisp
M
Arroyo
M
Cook
L
Gandolfo
M
Jordan
G
McGlone
M
Weston
P
Westoby
M
Wilf
P
Linder
H
Phylogenetic biome conservatism on a global scale
Nature
,
2009
, vol.
458
(pg.
754
-
756
)
Croizat
L
Nelson
G
Rosen
DE
Centers of origin and related concepts
Syst. Zool
,
1974
, vol.
23
(pg.
265
-
287
)
di Castri
F
Mooney
HA
Mediterranean-type ecosystems: origin and structure
,
1973
Springer-Verlag
Drummond
AJ
Ho
SYW
Phillips
MJ
Rambaut
A
Relaxed phylogenetics and dating with confidence
PLoS Biol.
,
2006
, vol.
4
(pg.
699
-
710
)
Drummond
AJ
Rambaut
A
BEAST: Bayesian evolutionary analysis by sampling trees
BMC Evol. Biol.
,
2007
, vol.
7
pg.
214

Eldredge
N
Gould
SJ
Schopf
TJM
Punctuated equilibria: an alternative to phyletic gradualism
Models in Paleobiology
,
1972
San Francisco (CA)
Freeman Cooper and Company
(pg.
82
-
115
)
Fine
PVA
Daly
DC
Muñoz
GV
Mesones
I
Cameron
KM
The contribution of edaphic heterogeneity to the evolution and diversity of Burseraceae trees in the Western Amazon
Evolution
,
2005
, vol.
59
pg.
1464

FitzJohn
RG
Diversitree: comparative phylogenetic tests of diversification
2010

R package version 0.4-5
FitzJohn
RG
Quantitative traits and diversification
Syst. Biol.
,
2010
, vol.
59
(pg.
619
-
633
)
FitzJohn
RG
WP
Otto
SP
Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies
Syst. Biol.
,
2009
, vol.
58
(pg.
595
-
611
)
Fitzpatrick
BM
Turelli
M
The geography of mammalian speciation: mixed signals from phylogenies and range maps
Evolution
,
2006
, vol.
60
(pg.
601
-
615
)
Foxe
JP
Slotte
T
Stahl
EA
Neuffer
B
Hurka
H
Wright
SI
Recent speciation associated with the evolution of selfing in Capsella
,
2009
, vol.
106
(pg.
5241
-
5245
)
Gaston
KJ
Species-range size distributions: products of speciation, extinction and transformation
Philos. Trans. R. Soc. B Biol. Sci.
,
1998
, vol.
353
(pg.
219
-
230
)
Gaston
KJ
Blackburn
TM
The tropics as a museum of biological diversity: an analysis of the New World avifauna
Proc R. Soc. Lond. Ser. B.
,
1996
, vol.
263
(pg.
63
-
68
)
Gaston
KJ
Chown
SL
Magurran
AE
May
RM
Geographic range size and speciation
Evolution of biological diversity
,
1999
Oxford (UK)
Oxford University Press
(pg.
237
-
259
)
Gillespie
RG
Roderick
GK
Arthropods on islands: colonization, speciation, and conservation
Ann. Rev. Entomol
,
2002
, vol.
47
(pg.
595
-
632
)
Goldberg
EE
Igić
B
On phylogenetic tests of irreversible evolution
Evolution
,
2008
, vol.
62
(pg.
2727
-
2741
)
Goldberg
EE
Kohn
JR
Lande
R
Robertson
KA
Smith
SA
Igić
B
Species selection maintains self-incompatibility
Science
,
2010
, vol.
320
(pg.
493
-
495
)
Goldberg
EE
Roy
K
Lande
R
Jablonski
D
Diversity, endemism, and age distributions in macroevolutionary sources and sinks
Am. Nat
,
2005
, vol.
165
(pg.
623
-
633
)
Hickman
JC
The Jepson manual: higher plants of California
,
1993
Berkeley (CA)
University of California Press
Jablonski
D
Roy
K
Geographical range and speciation in fossil and living molluscs
Proc. R. Soc. Lond. Ser. B Biol. Sci.
,
2003
, vol.
270
(pg.
401
-
406
)
Kass
RE
Raftery
AE
Bayes factors
J. Am. Stat. Assoc.
,
1995
, vol.
90
(pg.
773
-
795
)
Lamm
KS
Redelings
BD
Reconstructing ancestral ranges in historical biogeography: properties and prospects
J. Syst. Evol.
,
2009
, vol.
47
(pg.
369
-
382
)
Lewis
PO
A likelihood approach to estimating phylogeny from discrete morphological character data
Syst. Biol.
,
2001
, vol.
50
(pg.
913
-
925
)
Losos
JB
Glor
RE
Phylogenetic comparative methods and the geography of speciation
Trends Ecol. Evol.
,
2003
, vol.
18
(pg.
220
-
227
)
WP
Confounding asymmetries in evolutionary diversification and character change
Evolution
,
2006
, vol.
60
(pg.
1743
-
1746
)
WP
Midford
PE
Otto
SP
Estimating a binary character's effect on speciation and extinction
Syst. Biol.
,
2007
, vol.
56
(pg.
701
-
710
)
Mayrose
I
Barker
MS
Otto
SP
Probabilistic models of chromosome number evolution and the inference of polyploidy
Syst. Biol.
,
2010
, vol.
59
(pg.
132
-
144
)
McKinney
ML
Extinction vulnerability and selectivity: combining ecological and paleontological views
Ann. Rev. Ecol. Syst.
,
1997
, vol.
28
(pg.
495
-
516
)
Moore
BR
Donoghue
MJ
Correlates of diversification in the plant clade Dipsacales: geographic movement and evolutionary innovations
Am. Nat.
,
2007
, vol.
170
(pg.
S28
-
S55
)
Moore
BR
Donoghue
MJ
A Bayesian approach for evaluating the impact of historical events on rates of diversification
,
2009
, vol.
106
(pg.
4307
-
4312
)
Mora
C
Chittaro
PM
Sale
PF
Kritzer
JP
Ludsin
SA
Patterns and processes in reef fish diversity
Nature
,
2003
, vol.
421
(pg.
933
-
936
)
Nepokroeff
M
Sytsma
KJ
Wagner
WL
Zimmer
EA
Reconstructing ancestral patterns of colonization and dispersal in the Hawaiian understory tree genus Psychotria (Rubiaceae): a comparison of parsimony and likelihood approaches
Syst. Biol.
,
2003
, vol.
52
(pg.
820
-
838
)
Nosil
P
MooersØ
A
Testing hypotheses about ecological specialization using phylogenetic trees
Evolution
,
2005
, vol.
59
(pg.
2256
-
2263
)
Pagel
M
Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters
Proc. R. Soc. B
,
1994
, vol.
255
(pg.
37
-
45
)
Pagel
M
Inferring the historical patterns of biological evolution
Nature
,
1999
, vol.
401
(pg.
877
-
884
)
Pagel
M
The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies
Syst. Biol.
,
1999
, vol.
48
(pg.
612
-
622
)
Phillimore
AB
Orme
CDL
Thomas
GH
Blackburn
TM
Bennett
PM
Gaston
KJ
Owens
IPF
Sympatric speciation in birds is rare: insights from range data and simulations
Am. Nat
,
2008
, vol.
171
(pg.
646
-
657
)
Pigot
AL
Phillimore
AB
Owens
IPF
Orme
CDL
The shape and temporal dynamics of phylogenetic trees arising from geographic speciation
Syst. Biol.
,
2010
, vol.
59
(pg.
660
-
673
)
Pirie
MD
Lloyd
KM
Lee
WG
Linder
HP
Diversification of Chionochloa (Poaceae) and biogeography of the New Zealand Southern Alps
J. Biogeogr
,
2010
, vol.
37
(pg.
379
-
392
)
Pybus
OG
Harvey
PH
Testing macro-evolutionary models using incomplete molecular phylogenies
Proc. R. Soc. Lond. Ser. B.
,
2000
, vol.
267
(pg.
2267
-
2272
)
R Development Core Team
R: a language and environment for statistical computing
,
2009
Vienna (Austria)
R Foundation for Statistical Computing
Rabosky
DL
Ecological limits on clade diversification in higher taxa
Am. Nat
,
2009
, vol.
173
(pg.
662
-
674
)
Rabosky
DL
Glor
R
,
2010
, vol.
107
(pg.
22178
-
22183
)
Rabosky
DL
Lovette
IJ
Density-dependent diversification in North American wood warblers
Proc. R. Soc. B
,
2008
, vol.
275
(pg.
2363
-
2371
)
Ramsey
J
Schemske
DW
Pathways, mechanisms, and rates of polyploid formation in flowering plants. Ann. Rev. Ecol
Syst
,
1998
, vol.
29
(pg.
467
-
501
)
Ree
RH
Moore
BR
Webb
CO
Donoghue
MJ
A likelihood framework for inferring the evolution of geographic range on phylogenetic trees
Evolution
,
2005
, vol.
59
(pg.
2299
-
2311
)
Ree
RH
Sanmartín
I
Prospects and challenges for parametric models in historical biogeographical inference
J. Biogeogr
,
2009
, vol.
36
(pg.
1211
-
1220
)
Ree
RH
Smith
SA
Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis
Syst. Biol.
,
2008
, vol.
57
(pg.
4
-
14
)
Ribera
I
Barraclough
TG
Vogler
AP
The effect of habitat type on speciation rates and range movements in aquatic beetles: inferences from species-level phylogenies
Mol. Ecol
,
2001
, vol.
10
(pg.
721
-
735
)
Ricklefs
RE
Global variation in the diversification rate of passerine birds
Ecology
,
2006
, vol.
87
(pg.
2468
-
2478
)
Ricklefs
RE
Bermingham
E
The causes of evolutionary radiations in archipelagoes: passerine birds in the Lesser Antilles
Am. Nat
,
2007
, vol.
169
(pg.
285
-
297
)
Rosenzweig
ML
Species diversity in space and time.
,
1995
Cambridge
Cambridge University Press
Roy
K
Goldberg
EE
Origination, extinction, and dispersal: integrative models for understanding present-day diversity gradients
Am. Nat
,
2007
, vol.
170
(pg.
S71
-
S85
)
Sanmartín
I
van der Mark
P
Ronquist
F
Inferring dispersal: a Bayesian approach to phylogeny-based island biogeography, with special reference to the Canary Islands
J. Biogeogr
,
2008
, vol.
35
(pg.
428
-
449
)
Schluter
D
Price
T
Mooers
AO
Ludwig
D
Evolution
,
1997
, vol.
51
(pg.
1699
-
1711
)
Simpson
GG
The major features of evolution.
,
1953
New York
Columbia University Press
Stebbins
GL
Flowering plants: evolution above the species level
,
1974
Cambridge (MA)
Harvard University Press
Valente
LM
Reeves
G
Schnitzler
J
Mason
IP
Fay
MF
Rebelo
TG
Chase
MW
Barraclough
TG
Diversification of the African genus Protea (Proteaceae) in the Cape biodiversity hotspot and beyond: equal rates in different biomes
Evolution
,
2009
, vol.
64
(pg.
745
-
760
)
Verdú
M
Dávila
P
García-Fayos
P
Flores-Hernández
N
Valiente-Banuet
A
‘Convergent’ traits of mediterranean woody plants belong to pre-mediterranean lineages
Biol. J. Linn. Soc.
,
2003
, vol.
78
(pg.
415
-
427
)
Wagner
PJ
Erwin
DH
Erwin
DH
Anstey
RL
Phylogenetic patterns as tests of speciation models
New approaches to speciation in the fossil record.
,
1995
New York
Columbia University Press
(pg.
87
-
122
)
Wallace
AR
Tropical nature and other essays
,
1878
London
MacMillan
Weir
JT
Bermingham
E
Schluter
D
The Great American Biotic Interchange in birds
,
2009
, vol.
106
(pg.
21737
-
21742
)
Weir
JT
Schluter
D
The latitudinal gradient in recent speciation and extinction rates of birds and mammals
Science
,
2007
, vol.
315
(pg.
1574
-
1576
)
Wiens
JJ
Global patterns of diversification and species richness in amphibians
Am. Nat
,
2007
, vol.
170
(pg.
S86
-
S106
)

## Author notes

Associate Editor: Vincent Savolainen