Swapping Birth and Death: Symmetries and Transformations in Phylodynamic Models

Abstract Stochastic birth–death models provide the foundation for studying and simulating evolutionary trees in phylodynamics. A curious feature of such models is that they exhibit fundamental symmetries when the birth and death rates are interchanged. In this article, we first provide intuitive reasons for these known transformational symmetries. We then show that these transformational symmetries (encoded in algebraic identities) are preserved even when individuals at the present are sampled with some probability. However, these extended symmetries require the death rate parameter to sometimes take a negative value. In the last part of this article, we describe the relevance of these transformations and their application to computational phylodynamics, particularly to maximum likelihood and Bayesian inference methods, as well as to model selection.

Linear birth-death models play a pivotal role in phylodynamics. These stochastic models provide a prior distribution on evolutionary trees (both the shape and edge length distribution) for Bayesian inference methods (Yang and Rannala 1997;. Moreover, these models allow biologists to estimate key parameters of macroevolution (such as speciation rates corresponding to birth rates and extinction rates corresponding to death rates) from reconstructed phylogenetic trees which were dated by fossil (or other time-sampled) evidence (Nee et al. 1994).
The study of such models dates back to some classical papers from the early to mid-20th century (Yule 1924;Kendall 1948a,b), and the application of these models to phylogenetics and phylodynamics flourished from the 1990s onwards (Nee et al. 1994;Rannala and Yang 1996). Further in-depth mathematical analysis (Aldous 2001;Maddison 2007;Aldous et al. 2009;Morlon et al. 2011;Lambert and Stadler 2013) has extended our understanding of the properties of these models and extensions that allow more complex processes of birth and death.
In this article, we identify and explore curious symmetries in fundamental birth-death model probability distributions when the birth and death rates ( and ) are swapped. This symmetry has been known in the case of complete sampling of individuals at present (Waugh 1958;Tavaré 2018), and we will start the article by providing an intuitive account of this symmetry that seems at first a little surprising. We extend this to the more general setting where a third parameter is introduced-the sampling probability of individuals sampled at the present-and show how analogous symmetries can be derived by a transformation that reduces these three parameters to just two ( , ). One can view these as "corrected" birth and death rates, except for the caveat that this new death rate can now take negative values. A major advantage of working with the transformed pair of parameters ( , ) is that it captures the correct dimensionality of the process (namely 2), thereby avoiding the inherent redundancy present in the 3D parameterization that uses the triple (,,). This viewpoint has implications for phylogenetic and phylodynamic inferences, both in the maximum likelihood and Bayesian settings, and we explore these implications in the latter part of our article.

BIRTH-DEATH SYMMETRIES
Consider a phylogenetic tree that evolves from a single ancestral individual according to a birth-death process, with a constant birth rate ≥ 0 and a constant death rate ≥ 0. Suppose that at some time point in the tree, there are n individuals present. Let p n,m (t|,) be the probability that at time t later, there will be m individuals present. These transition probabilities are classical and provide a foundation for phylodynamic models. The starting point for this article is the following curious symmetry which goes back to (Waugh, 1958) and was recently highlighted again in (Tavaré, 2018): This equation states the surprising result that the probability of one individual having one surviving descendant after time t remains the same if we swap the birth rate () and the death rate (). Thus a process with a birth rate of, say, 100 and a death rate of, say, 1-a scenario with a very fast-growing population-has the same probability of having one surviving descendant as a process with a birth rate of 1 and a death rate of 100a scenario where we know that the process eventually leads to extinction. This symmetry can be extended to more general scenarios, as stated in the following theorem.
This result has been established in Waugh (1958) and explicitly stated in Tavaré (2018) (an alternative formal proof of Theorem 1 is provided in the supplementary material available on Dryad at http://dx.doi.org/10.5061/dryad.57704ft). To provide some intuitive insight into this result, we now provide a direct and conceptually transparent Proof of Theorem 1 in the case where n = m = 1 (i.e., equation (1)); the result for n = m > 1 follows by essentially applying the same idea. We start a birth-death process with one individual. The waiting time between "events" (a birth event or death event) is exp(n(+)), where n is the number of individuals at the considered time point. Let p = + , and consider two different scenarios (one proceeds forward in time, the other backward): • Scenario 1: The process starts at time 0 and is stopped at time t > 0. At an event, with probability p, we add an individual and, with probability 1−p. we remove an individual. Scenario 1 is a classic forward-in-time birth-death process.
• Scenario 2: The process starts at time t > 0 and is stopped at time 0. At an event, with probability 1−p we add an individual and, with probability p, we remove an individual. Scenario 2 is a birthdeath process in reversed time with the birth and death rates being interchanged compared with Scenario 1.
Intuitively, the result of the time-reversed process with birth and death being interchanged is analogous to the forward-in-time process. However, we justify this intuition by a formal argument showing that the probability of observing one individual after time t is the same under Scenario 1 and Scenario 2.
Consider some population size trajectory X that starts at time 0 with one individual and ends with one individual after time t (see Fig. 1 for an example). At each event, X can grow or decrease by one. Let the number of growth events be k, which therefore also equals the number of death events. Denote the time of these 2k events by t 1 ,t 2 ,...t 2k , and define t 0 = 0 and t 2k+1 = t. See Figure 1 for an example with k = 2.
The probability density of X under Scenario 1, L 1 (X), is a product of the probability for the birth events, p k , for the death events (1−p) k , and the waiting times between events, 2k i=1 (+)n i e −(+)n i (t i −t i−1 ) , where n i is the number of individuals prior to the event at time t i . Finally, the term e −(+)(t−t 2k ) stipulates that no subsequent event happens after the event at time t 2k . In FIGURE 1. The forward-in-time birth-death process with realization X and the equivalent time-reversed process with interchanged rates and realization X . summary, the probability density of X under Scenario 1 for k > 0 is: Now we reverse time in the realization X and call it X . Thus, X starts where X ends, and X ends where X starts. The probability density of X under Scenario 2 is then L 2 (X ). We establish L 2 (X ) analogous to the procedure above, with the birth events in X being death events in X and vice versa. Thus, the same p and (1−p) factors are multiplied when calculating the probability density of X under Scenario 2, compared to the probability density of X under Scenario 1. Furthermore, the waiting time contributions are the same for Scenario 1 and Scenario 2, and thus L 1 (X) = L 2 (X ).
One can directly extend this argument to establish Theorem 1 for any value of n ≥ 1 by considering the associated forward-in-time and backward-in-time processes. 854 SYSTEMATIC BIOLOGY VOL. 68

GENERAL SYMMETRIES UNDER INCOMPLETE SAMPLING
We continue to study a birth-death model with constant and non-negative birth and death rates and . However, we now allow each of the individuals present at time t to be sampled (independently) with probability ∈ (0,1]. Let us first suppose that we start with one individual at time 0, and let p i (t|,,) be the probability that i sampled descendants are observed (i.e., extant and sampled) at time t. The exact expressions for p i (t) = p i (t|,,) are provided by the following theorem.
Theorem 2. For = , we have: For the critical case = , we have: For > ≥ 0 and ∈ (0,1], the result is already provided in Stadler (2010), based on earlier work by Nee et al. (1994); Yang and Rannala (1997). The critical case for = 1 is provided for example in (Feller, 2008). For the proof of the remaining cases, refer to the Supplementary Material available on Dryad.
In what follows, we investigate the expressions for p i (t|,,) in detail, and identify symmetries with respect to adjusted birth and death rates.

Negative "Death Rates" in the Case of Incomplete Sampling
We introduce two new variables and , which will play a key role in the remainder of the article. They are defined by ,, and according to the following transformation: = and = −(1−).
Note that when = 1, we have = and = .
Further, for all vales of we have − = − (thus = if and only if = ). Note also that < 0 is entirely possible (e.g., when = 4 and = 0.5, we obtain =−). In this case, cannot easily be viewed as a death rate (nor as a birth rate); however, allowing to take any real value (positive or negative) means that all parameter triplets (,,) have a transformation to ( , ).
The following lemma is straightforward to verify using simple algebra (Stadler 2013).
In order to investigate symmetries, we define the following functions, which only depend on , , and t (rather than the four parameters ,,, and t) (this dependence on , , and t can easily be seen from Lemma 3). Let: For = , these equations are, In particular, we have:p 1 (t| , ) = p 1,1 (t|,, = 1). This leads to the following symmetries with respect to and . A proof is provided in the Supplementary Material available on Dryad.
Theorem 4. For ≥ 0, the following symmetries hold: (1−p 0 (t| , )) = (1−p 0 (t| , )), A phylogenetic tree T that evolves under a birthdeath process with rates , and with sampling at the present with probability . Lineages ending in a death (extinction) are marked by × whereas lineages at the present that are not sampled are marked by o. The reconstructed tree on the sampled extant individuals is indicated by the additinal lines starting at t 1 . and for all n ≥ 1: TREE PROBABILITY DENSITIES Let T be a phylogenetic tree generated by a birthdeath process starting with one individual and being stopped after time t 0 . Each individual alive after time t 0 is sampled with probability . In this tree, all extinct lineages are pruned, and only the lineages leading to the sampled tips are kept. Such a tree is also called the reconstructed tree (Nee et al. 1994), as indicated by the red lines in Figure 2. Let this tree have n sampled tips and the branching times t 1 > t 2 ,... > t n−1 , where time is measured from the present time 0. Let L(t) be the number of coexisting lineages of tree T at time t (see Fig. 2).
Let f (T |L(t 0 ) = 1) be the probability density of the tree T , and let f (T |t 0 = t stem ) be the probability density of the tree T , given that at least one individual is sampled at present. Thus t 0 is the stem age (t stem ) of the process. For = 1, this corresponds to conditioning on nonextinction of the process. Let f (T |t 0 = t stem ,L s (0) = n) denote the probability density of the tree T , given that we sample exactly n tips at present (denoted by L s (0) = n).
The tree T in these formulations was a tree starting with one individual, leading to two lineages at time t 1 in the past. Alternatively, a tree T may start with two lineages at time t 1 ago; the probability of such a tree is f (T |L(t 1 ) = 2). Let f (T |t 1 = t crown ) be the probability density of the tree T conditioning on sampling at least one descendant individual from both initial lineages.
Note that when conditioning on sampling, the time t 1 is the crown age of the clade (t crown ). Furthermore, let f (T |t 1 = t crown ,L s (0) = n) be the probability density of the tree T conditioned on sampling exactly n tips at present. Finally, in the setting where t 0 is chosen uniformly at random from (0,∞), then a tree T conditioned on n tips and integrated over all possible t 0 has probability density f (T |L s (0) = n).
In what follows, we assume >0 and thus > 0; otherwise, we cannot obtain a tree with n > 1.

Remark 6.
Only the expressions for the unconditioned tree probability densities (i.e., the equations not conditioning on observing at least one sample) depend on all three parameters ,, and . The remaining five expressions (the conditioned tree probability densities) only depend on two parameters ( , ), meaning only two out of the three birth-death parameters ,, can be inferred from the phylogenetic tree. This has already been observed for f (T |L s (0) = n) by (Stadler, 2009) and is then trivial to generalize for the other equations. Furthermore, based on Theorem 4, the expressions for f (T |t 0 = t stem ,L s (0) = n) and f (T |t 1 = t crown ,L s (0) = n) (i.e., the expressions where we condition on both the age of the process and the number of sampled tips) give the same result for , and for when the parameters are swapped to , . For complete sampling, Rannala (1997) 856 SYSTEMATIC BIOLOGY VOL. 68 noticed this symmetry in f (T |t 0 = t stem ,L s (0) = n) (this author also mentioned that this special symmetry had also been independently observed by Monty Slatkin). Note that ≤ 0 is possible, whereas > 0, thus the swapping is only well-defined if > 0.

Tree Symmetries for Complete Sampling with Implications on Parameter Inference
As highlighted in Remark 6, we can, based on Corollary 1 of the supplementary material available on Dryad, directly conclude that Thus, we obtain the same probability density when swapping birth and death. As a consequence, we have to specify if the birth rate is bigger or smaller than the death rate prior to any analysis based on these equations.

Mapping from ( , ) to the Birth-Death Model Parameters (,,) with implications for Maximum Likelihood and Bayesian Inference
When using the tree probability densities in a maximum likelihood inference framework, the expressions are maximized over the parameters for a given tree. Based on the five conditioned tree probability density equations, we should optimize over and , with ∈ (0,∞) and ∈ (−∞,∞), instead of maximizing over the three parameters ,, and , as the latter parameterization induces a ridge in the likelihood surface and thus optimization is problematic. This is equivalent to optimizing when assuming complete sampling (and allowing the "death rate" to be negative) and, in a second step, assuming a sampling probability and transforming from ( , ) to (,). This procedure was already suggested in (Stadler, 2009), Section 6.2 (up to pointing out the possibility for negative ). We next investigate for which chosen values of we can transform , to ,. A proof is provided in the Supplementary Material available on Dryad.
In summary, given we estimate a negative , for some , we cannot transform the parameters to ,. Thus, for parameter inference on empirical data, the best strategy might be to fix and then estimate and .
Given the dependency of ,, and on only two parameters and , one may decide to perform a Bayesian analysis on ∈ (0,∞), ∈ (−∞,∞) (see also Stadler (2009), Section 6.1). Care has to be taken though regarding the priors, since these priors play out in nonstraightforward ways. Assume, for example, that the analysis is performed by sampling , . For each sampled parameter pair, one might assume a ∈ (0,1] uniformly at random. Given that ≥ 0, this would yield a uniform distribution on the chosen . However, given that some sampled parameter pairs reveal < 0, it follows that only a small , namely ∈ (0, 1 1− / ] is possible, meaning that overall, the samples on would be nonuniform, with a preference for small values of . Thus, in the Bayesian setting, we need to assess the effective priors on ,, given the parameter nonidentifiability. (,,) and (,,)

Mappings between Birth-Death Model Parameters
Next, we characterize all birth-death parameters that are transformations of ,,, the proof is again provided in the Supplementary Material available on Dryad. Note that the parameters (,,) and (,,) give thus rise to the same tree probability density.
Next, we consider = 1 (i.e., the transformation to the case of complete sampling). A further consequence of Theorem 8 is the following result from Stadler and Steel (2012).
Implications for proving properties of the birth-death tree distribution. Properties of the birth-death tree distribution need to be known in order to test if empirical data are significantly different from these properties and thus the birth-death model has to be rejected for the given data. Sometimes, proofs of the properties of the conditioned tree distribution are carried out for complete sampling (i.e., for parameters,, = 1). Such properties also hold for incomplete sampling if ≥ 1 or if ≥ 1−. To include the parameter space 0 ≤ < 1−, the proof needs to be done with explicitly acknowledging incomplete sampling. This was noticed already in Stadler and Steel (2012).
Implications regarding model selection. For a given phylogenetic tree, it is tempting to ask if a model with = 1 or =<1 fits the data better. However, for every parameter combination (,,1), we also find a parameter combination (,,) with both parameter triples having the same conditioned tree probability density. Moreover, there are parameter combinations (,,) without a corresponding triplet where = 1 (see Corollary 9). Thus, the model with <1 always gets more support than the model with = 1. In summary, such a test is meaningless because of the parameter nonidentifiability.

DISCUSSION
Birth-death models have been studied for almost 100 years (Yule 1924;Kendall 1948a). However, surprising properties are still being uncovered. Here, we presented some unexpected symmetries in birth-death models with incomplete sampling of individuals. In particular, a birth-death process with incomplete sampling can be described phylogenetically through two parameters instead of three parameters, resulting in parameter nonidentifiability.
Such parameter nonidentifiability has important consequences for using birth-death models in phylogenetic and phylodynamic inference. In particular, the likelihood surface of the three birth-death parameters ,, and for a given tree has a ridge, and we can therefore only estimate two of the three parameters. Maximum likelihood estimation should thus be done for a fixed sampling probability. In Bayesian analysis, we need to carefully consider the effective prior when using such nonidentifiable parameter triplets.
Furthermore, we showed that for some of the parameter triplets (,,), their two-parameter description is, in fact, equivalent to a birth-death process with complete sampling. However, in some cases, the resulting 'death' rate is negative, and thus the transformed parameters cannot always be considered as a birth-death process with complete sampling. This means that we cannot simply prove properties of phylogenetic trees for complete sampling and then extrapolate to incomplete sampling, as we then miss some birth-death parameter combinations (namely the ones leading to a negative "death" rate). Furthermore, testing whether the data are completely sampled ( = 1) or not (<1) is not informative, as the models with <1 always have more support: parameter triplets for incomplete sampling may only have corresponding complete sampling parameters with a negative "death" rate, whereas birth and death rates under complete sampling have a corresponding triplet for all ∈ (0,1]. The birth-death model presented here is the simplest model for speciation and extinction, or for transmission and recovery. However, it has limitations for explaining the data, as it assumes exponential growth of the population, although populations cannot have unlimited growth, and it assumes that all individuals are dynamically equivalent. There has been considerable work on extending the birth-death model to address such limitations (Maddison 2007;Morlon et al. 2011;Stadler 2011;Etienne et al. 2012;, but no symmetries and only very special parameter nonidentifiability has been observed ). It will be interesting to explore in the future whether the observed symmetries and nonidentifiabilities in our simple model are also present in these more complex models.