Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance

Abstract Nonparametric population genetic modeling provides a simple and flexible approach for studying demographic history and epidemic dynamics using pathogen sequence data. Existing Bayesian approaches are premised on stochastic processes with stationary increments which may provide an unrealistic prior for epidemic histories which feature extended period of exponential growth or decline. We show that nonparametric models defined in terms of the growth rate of the effective population size can provide a more realistic prior for epidemic history. We propose a nonparametric autoregressive model on the growth rate as a prior for effective population size, which corresponds to the dynamics expected under many epidemic situations. We demonstrate the use of this model within a Bayesian phylodynamic inference framework. Our method correctly reconstructs trends of epidemic growth and decline from pathogen genealogies even when genealogical data are sparse and conventional skyline estimators erroneously predict stable population size. We also propose a regression approach for relating growth rates of pathogen effective population size and time-varying variables that may impact the replicative fitness of a pathogen. The model is applied to real data from rabies virus and Staphylococcus aureus epidemics. We find a close correspondence between the estimated growth rates of a lineage of methicillin-resistant S. aureus and population-level prescription rates of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$\beta$\end{document}-lactam antibiotics. The new models are implemented in an open source R package called skygrowth which is available at https://github.com/mrc-ide/skygrowth.

Nonparametric population genetic modeling has emerged as a simple, flexible, popular, and powerful tool for interrogating genetic sequence data to reveal demographic history (Ho and Shapiro 2011). This approach has proved especially useful for analysis of pathogen sequence data to reconstruct epidemic history, and such models are increasingly incorporated into surveillance systems for infectious diseases (Volz et al. 2013). The most commonly used techniques are derivatives of the original skyline coalescent model, which describes the evolution of effective population size as a piecewise constant function of time (Pybus et al. 2000). The basic skyline model is prone to overfitting and estimating drastic fluctuations in effective population size, so that numerous approaches were subsequently developed for smoothing population size trajectories. Initial approaches to smoothing skyline estimators were based on aggregating adjacent coalescent intervals within a maximum likelihood framework (Strimmer and Pybus 2001). Subsequent development has largely focused on Bayesian approaches where a more complex stochastic diffusion process provides a prior for the evolution of a piecewise constant function of effective population size (Drummond et al. 2005). Nonparametric Bayesian approaches are now the most popular approach for phylodynamic inference, and such approaches have illuminated the epidemic history of numerous pathogens in humans and animals (Ho and Shapiro 2011).
To date, all Bayesian nonparametric models have assumed that the effective population size (or its logarithm) follows a stochastic process such as a Brownian motion (BM) (Minin et al. 2008;Palacios and Minin 2013). The choice of a process with stationary increments as prior can have large influence on size estimates especially when genealogical data are sparse and uninformative. Genealogies often provide very little information about effective population size near the present (or most recent sample), especially in exponentially increasing populations (de Silva et al. 2012). In such cases, skyline estimators with BM priors on the effective population size may produce estimates which stabilize at a constant level even when the true size is increasing or decreasing exponentially. We argue that in many situations, a more realistic prior can be defined in terms of the growth rate of the effective population size. Below, we describe such a prior based on a simple autoregressive stochastic process defined on the growth rate of effective population size. We show how this prior can lead to substantially different estimates and argue that these estimates are more accurate in many situations. When genealogical data are sparse, our model will retain the growth rate learned from other parts of the genealogy and will correctly capture trends of exponential growth or decline. Even though our approach is nonparametric, we consider its relationship with parametric models of epidemic population genetics to show that our estimates of growth rates of pathogen effective population size are often likely to correspond to growth rates of an infectious disease epidemic.
Smoothing effective population size trajectories using a prior on growth rates also have important advantages when incorporating nongenetic covariate data into phylodynamic inference (Baele et al. 2016). Recent work has focused on refining effective population size estimates using both the times of sequencing sampling 720 SYSTEMATIC BIOLOGY VOL. 67 (Karcher et al. 2016) or using environmental data which are expected to correlate with size estimates, such as independent epidemic size estimates based on nongenetic data (Gill et al. 2016). Existing statistical models have assumed that the effective population size has a linear or log-linear relationship with temporal covariates. However in many cases, a more realistic model would specify that the growth rate of effective population size is correlated with covariates, as when for example an environmental variable impacts the replicative fitness of a pathogen. We provide a similar extension of previous skyride models with covariate data (Gill et al. 2016) to show how such data can be used to test hypotheses concerning their effect and, when a significant effect exists, to refine estimates of both the growth rates and the effective population sizes.
We illustrate the potential advantages of our growth rate model using a rabies virus data set that has been thoroughly studied using previous phylodynamic methods (Biek et al. 2007;Gill et al. 2016). In particular, we show how our model correctly estimates a recent decline in epidemic size whereas previous models mistakenly predict a stabilization of the epidemic prevalence. We also apply our methodology to a genomic data set of methicilin-resistant S. aureus (MRSA) that had not formally been analyzed using phylodynamic methods (Uhlemann et al. 2014). We show how time series on prescription rates of -lactam antibiotics correlate strongly with growth and decline of the effective population size, revealing the impact of antibiotic use on the emergence and spread of resistant bacterial pathogens.

METHODS AND MATERIALS
We model effective population size through time as a first order autoregressive stochastic process on the growth rate. This provides an intuitive link between the growth rate of effective population size of pathogens and epidemic size as well as the reproduction number of the epidemic. We further show how to incorporate timevarying environmental covariates into phylodynamic inference.

Previous Bayesian Nonparametric Phylodynamic Models
Several nonparametric phylodynamic models have been proposed based on BM processes and the Kingman coalescent genealogical model (Kingman 1982). In particular, the Bayesian nonparametric skyride model uses a BM prior to smooth trajectories of the logarithm of the effective population size (Minin et al. 2008). Let (t) = log(Ne(t)) denote the logarithm of the effective population size as a function of time. The BM prior is defined as: where is an estimated precision parameter, for which an uninformative Gamma prior is typically used.
This BM prior has been adapted and applied in a variety of ways to enable statistical inference. In the skygrid model (Gill et al. 2013), time is discretized, and is defined to be a piecewise constant function of time over a grid with time increments h, and the value i is estimated for each interval i. Time intervals do not in general correspond to coalescent times in the genealogy. In this case, the BM prior is computed over increments of : where The genealogical data take the form G = (c 1:(n−1) ,s 1:n ), where c and s are respectively ordered coalescent times (internal nodes of the genealogy) and sampling times (terminal nodes of the genealogy). In the coalescent framework, the sampling times are usually considered to be fixed, so that p(s) = 1 and p(G) = p(c|s). Alternatively, in some variations of this model, a prior p(s|Ne) is also provided for the sequence of sampling times, making this approach similar to but more flexible than samplingbirth-death-models (Volz and Frost 2014;Karcher et al. 2016).
Given a genealogy, the posterior distribution of the parameters and 1:m is decomposed as: p( 1:m ,|G) ∝ p(G| 1:m )p( 1:m |)p(). ( The second term is given by Equation 2 and the last term by the prior on . To assist with the definition of the first term, we first denote A(t) to be the number of extant lineages at time t: where I(x) is an indicator function equal to one when x is true and equal to zero otherwise. The probability density of the genealogical data given the population size history 1:m is then equal to (Griffiths and Tavare 1994): where t 1:(2n−1) = c 1:(n−1) ∪s 1:n is the set union of sample and coalescent times in descending order.  (Rosenberg and Nordborg 2002;Koelle et al. 2011;Dearlove and Wilson 2013). A simple link between these quantities does not exist, since the relationship depends on how incidence and epidemic size change through time (Volz et al. 2009), population structure (Volz 2012), and complex evolution of the pathogen within hosts (Didelot et al. 2016;Volz et al. 2017). Under idealized situations, there is however a simple relationship between the growth rate of effective population size and the growth rate of an epidemic (Frost and Volz 2010;Volz et al. 2013).
Let Y(t) and (t) denote the number of infected hosts and per-capita transmission rate, respectively, as functions of time. Note that (t) may depend on the density of susceptible individuals in the population, as in the common susceptible-infected-removed (SIR) model, in which case (t) ∝ S(t)/N (Allen 2008). The coalescent rate for an infectious disease epidemic was previously derived under the assumption that within-host effective population size is negligible and that superinfection does not occur (Volz et al. 2009;Frost and Volz 2010): Equating this rate with the coalescent rate under the coalescent model (t) = A(t) 2 /Ne(t) (Kingman 1982) yields the following formula for the effective population size: Differentiating with respect to time (denoting with a dot superscript) yields: Note that, in general the growth rate of the effective population size does not correspond to the growth rate of Y, however if the per-capita transmission rate is constant ( = 0), we haveṄe =Ẏ/(2) ∝Ẏ. Thus, we expect that over phases of the epidemic where per-capita transmission rates are nearly constant there will be close correspondence between the growth or decline of the effective population size and the growth or decline of the unobserved number of infected hosts. This condition is often satisfied near the beginning of an outbreak which has an exponential phase. It is also often satisfied towards the end of epidemics when the epidemic size is decreasing at a constant exponential rate. The basic reproduction number R 0 describes the expected number of transmission events caused by a single infected individual in an otherwise susceptible population. By extension, we can define R(t) as the expected number of transmissions by an infected host infected at time t (Fraser 2007). Assuming that all infected individuals are equally infectious (as is the case e.g., in the SIR model), we have that during periods when the epidemic growth rate is constant, each infected individual transmits at rate (t) = R(t)/ , where is the mean duration of infections. With these definitions, the number of infections Y(t) varies according to the following differential equation: Combining Equations 7, 8, and 9 leads to the following approximate estimator for the reproduction number through time: This estimator makes use of the quantityṄe(t)/Ne(t) which will be estimated in our model below. Equation 10 is likely to be a good estimator over periods of the epidemic where per-capita transmission rates are invariant. A special case of this occurs at the start of an epidemic, in which case Equation 10 can be used to estimate the basic reproduction number R 0 , as previously noted (Pybus 2001).

A Growth Rate Prior for Effective Population Size
We propose a model in which the growth rate of the effective population size, as opposed to effective population size itself, is an autoregressive process with stationary increments. This growth rate is defined as: Note that (t) is a real-valued quantity, with negative and positive values respectively indicating an increase and decrease in the effective population size. In particular, if the population is exponentially growing or declining from t = 0, then we would have Ne(t) = Ne(0)exp(t) so that (t) = at every time t ≥ 0. More generally, we model (t) using a BM process: (t) ∼ BM() (cf Equation 1). To facilitate statistical inference, we work with a discretized time axis with m intervals of length h as in the skygrid model (Gill et al. 2013). We define the growth rate in time interval i as: We use the following approximate model for model to be superior to exact models on the log transformation of Ne provided that h is small. With the above definitions, the prior density of a sequence 1:m is defined in terms of the increments: where This equation can be compared with the skygrid density, Equation 2.

Incorporating Covariates into Phylodynamic Inference
A simple model was recently proposed for incorporating time-varying covariates into phylodynamic inference with skygrid models (Gill et al. 2016). Suppose we observe q covariates at m time points denoted X = (X 1:m,1:q ), and such that observation times correspond to the grid used in the phylodynamic model. The following linear model for the marginal distribution of with covariate vector 1:q was proposed: where 0 is the expected mean of without covariate effects. This implies, along with the BM model, the following marginal distribution of the increments: When covariates are likely to be associated with growth rates of the effective population size instead of the logarithm of the effective population size, we can analogously define the density of increments of : When fitting this model, we drop for simplicity (as in Gill et al. 2016), and estimate a single variance parameter along with the regression coefficients .
Inference and Software Implementation Our growth rate model is implemented in an open source R package called skygrowth , available from https://mrc-ide.github.io/skygrowth/, and which includes both maximum a posteriori (MAP) and Bayesian Markov Chain Monte Carlo (MCMC) methods for model fitting.
The MCMC procedure uses a Gibbs-within-Metropolis algorithm that alternates between sampling the growth rate vector 1:m and sampling of the precision parameter . Metropolis-Hastings sampling is also performed for regression coefficients 1:q if covariate data are provided with univariate normal proposals. The elements of 1:m are sampled in sequence (from past to present), and multiple Gibbs iterations (by default 100) are performed before updating other parameters using Metropolis-Hastings steps.
MAP is used as a starting point for the MCMC. The MAP estimator alternates between optimization of 1:m using gradient descent (BFGS in R, Goldfarb 1970) and univariate optimization of until convergence in the posterior is observed. Approximate credible intervals are provided for the MAP estimator based on curvature of the posterior around the optimum.

Simulations
We evaluated the ability of the skygrowth model to infer epidemic trends by simulating partiallysampled genealogies from a stochastic individual-based susceptible-infected-removed (SIR) model. Simulated data were generated using the BEAST2 package MASTER (Vaughan and Drummond 2013), and code to reproduce simulated results is available at https://github.com/emvolz/skygrowth-experiments. The skygrowth model was also compared to skygrid model as implemented in the phylodyn R package (Karcher et al. 2016(Karcher et al. , 2017 which estimates effective population size using a fast approximate Bayesian nonparametric reconstruction (BNPR). The SIR model was density dependent with a reaction rate S(t)I(t) of generating new infections. Figure 1 shows results of a single simulation with R 0 = 1.3 and 10,000 initial susceptible individuals. Additional simulations are shown in Supplementary Fig. S1 available on Dryad at http://dx.doi.org/10.5061/dryad.9qh7t9t.
Estimates with skygrowth were obtained using the MCMC algorithm and an Exponential(0.1) prior on the precision parameter. We report the posterior means from both skygrowth and skygrid BNPR. Genealogies were reconstructed by sampling 200 or 1000 infected individuals at random from the entire history of the epidemic. In this scenario, both the skygrowth and skygrid models reproduce the true epidemic trend, capturing both the rate of initial exponential increase, the time of peak prevalence, and the rate of epidemic decline. However, when sampling only 200 lineages (Fig. 1B), the genealogy contains relatively little information about later epidemic dynamics, and the skygrid estimates an unrealistic levelling-off of Ne. Estimates using the skygrid BNPR model were highly similar to results using an exact MCMC algorithm for sampling the posterior also included in the phylodyn package.
While the results in Figure 1A and B suggest that Ne(t) can serve as a very effective proxy for epidemic size, the degree of correspondence will depend on details of the epidemic model as discussed in the Methods section.  Figure 1C and Supplementary Fig. S2 available on Dryad show a scenario where estimates of Ne(t) capture the initial rate of exponential growth but fail to estimate the time of peak epidemic prevalence, and the skygrid model also fails to detect that the epidemic ever decreases. This scenario was based on a higher R 0 = 5 and only 2000 initially susceptible individuals, such that almost all hosts are eventually infected and the rate of epidemic decline predominantly reflects the host recovery rate. This is easily understood using the formula Ne(t) ∝ I(t)/S(t) (cf. Equation 7). When R 0 is large, S(t) will change drastically over the course of the epidemic. In the later stages, almost all hosts have been infected so that 1/S(t) is large, producing correspondingly large effective population sizes. There is a very slight signal of decreasing growth rate which is detected shortly following epidemic peak using the skygrowth model. In the absence of other information, the skygrowth model retains this growth rate which produces estimates of decreasing Ne(t).

Rabies Virus
An epidemic of rabies broke out in the late 1970s in the North American raccoon population, following the emergence of a host-adapted variant of the virus called raccoon rabies virus (RRV). By the end of the 1990s, this outbreak had spread to a vast geographical area including all Northeast and mid-Atlantic US states (Childs et al. 2000). A sample of 47 RRV isolates has been sequenced in a previous study (Biek et al. 2007), and BEAST (Drummond et al. 2012) was used to reconstruct a dated phylogenetic tree. A standard skyline analysis (Drummond et al. 2005) was performed, which visually suggested a correlation between the inferred effective population size (Ne) and the monthly area newly affected by RRV (hereafter denoted V), but without attempting to quantify the strength or significance of this association.
These data were recently reanalyzed using the skygrid model with covariates (Gill et al. 2016). No significant association was found between Ne and V, but the authors noted that since V is the newly affected area, V would be expected to be associated with a change in Ne rather than Ne itself. Since the skyride method is focused on Ne, like all previous phylodynamic methods, the authors considered the cumulative distribution of V and showed that this is slightly associated with Ne [with a 95% credible interval of (0.18-2.86) on the covariate effect size, Gill et al. 2016]. However, this approach is not fully satisfactory. In particular, since V is always positive, the cumulative distribution of V is always increasing, whereas Ne is in principle equally likely to increase or decrease over time. Furthermore both V and its cumulative distribution were considered on a logarithm scale, so that the latter flattens over time by definition.
A more natural solution is to keep the covariate V untransformed and investigate its association with the growth rate (t) rather than Ne(t) as implemented in our methodology (Fig. 2). For this analysis, we used exactly the same dated phylogeny as previously published (Biek et al. 2007) (reproduced in Supplementary Fig. S3 available on Dryad). When the covariate was not used (red results in Fig. 2), the growth rate was inferred to be positive but declining progressively to zero from 1973 to ∼1983, then stable around zero up to ∼1990, followed by a period of positive growth until ∼2000, after which the growth rate decreased below zero. This implies that the effective population size increased from 1973 to ∼1983, then was stable until ∼1990, increased to a peak in ∼1997 and afterwards decreased. Two waves of spread have therefore been inferred as in previous analyses (Biek et al. 2007;Gill et al. 2016), with the first one starting in the 1970s and ending in ∼1983 and the second one lasting from ∼1990 to ∼1997. Unfortunately the covariate data V start in September 1978 and therefore do not cover the first wave. However, the covariate data show that the epidemic was spreading very quickly between 1992 and 1997, much faster than before or after these dates, and this timing corresponds fairly precisely to the second wave of spread. When the covariate data were integrated into phylodynamic inference, the covariate effect size was found to be statistically significant but only slightly so, with a large 95% credible interval for the covariate effect size of (0.03-4.61) and posterior mean of 1.09. The reconstructed growth rate and effective population size when using the covariate data (blue results in Fig. 2) were compatible with results without covariate data. Using additional informative data tighten the credible interval as would be expected, except in the second wave during which the covariate data suggest higher values for both the growth rate and effective population size. The mean posterior growth rate reached a value of about 2.5 per year in the 1990s (Fig. 2) and the average generation time of raccoon rabies has previously been estimated to be around 2 months (Biek et al. 2007). We can use Equation 10 to infer a reproduction number of R = 1.4, slightly higher than a previous estimate around R = 1.1 based on the same data (Biek et al. 2007).
One of the main novel findings of our analysis is that we found a significant decline of the effective population size of raccoon rabies post-2000, whereas previous phylodynamic studies based on the same data found this to be constant (Biek et al. 2007;Gill et al. 2016). Previous methods consider a BM on the logarithm of Ne, which results in a strong prior that Ne is constant in recent time. In contrast, our model results in the growth rate being a priori constant, so that the clear decline in growth rate started in the mid-1990s is likely to have continued to the point that the growth rate became negative and Ne declined. Our result is in good agreement with Centers for Disease Control surveillance that shows a clear decline in rabid raccoons after the peak in the mid-1990s (Monroe et al. 2016).

Staphylococcus aureus USA300
Staphylococcus aureus is a bacterium that causes infections ranging from mild skin infections to life-threatening septicemia. In the 1980s and 1990s, several variants of S. aureus have emerged that are resistant to methicilin and other -lactam antibiotics, and collectively called MRSA (Chambers and Deleo 2009). MRSA are well known as a leading cause of hospital infections worldwide, but the MRSA variant called USA300 differs from most others by causing infections mostly in communities rather than hospitals. USA300 was first reported in 2000 and has since spread throughout the USA and internationally (Tenover and Goering 2009;Challagundla et al. 2018). A recent study sequenced the genomes from 387 isolates of USA300 sampled from New York between 2009 and 2011, and reconstructed phylogeographic spread that frequently involved transmission within households (Uhlemann et al. 2014).
The USA300 phylogenetic tree (Uhlemann et al. 2014) was dated using a previously described method (Didelot et al. 2012) and a clock rate of ∼3 substitutions per year for USA300 (Uhlemann et al. 2014;Alam et al. 2015). We analyzed the resulting dated phylogeny ( Supplementary Fig. S4 available on Dryad) using our phylodynamic methodology (Fig. 3). We initially performed this analysis without the use of any covariate data (red results in Fig. 3) and found that the growth rate had been around zero up until 1985, after which it steadily increased until ∼1995, and subsequently decreased almost linearly, becoming negative in ∼2002 and continuing to decrease afterwards. The effective population size was accordingly found to have been very small until the mid-1990s, to have peaked in ∼2002 and to have declined since. These results are in very good agreement with a phylodynamic analysis of USA300 performed using a traditional skyline plot on a different genomic data set (Glaser et al. 2016), and epidemiological data also suggest that USA300 may be declining (Planet 2017). The overall MRSA incidence has declined by 31.2% in the USA between 2005 and 2011 (Dantes et al. 2013), with some MRSA lineages showing encouraging signs of reverting to methicilin susceptibility (Ledda et al. 2017).
We hypothesized that the dynamics of USA300 may be driven by the consumption of -lactams in the USA, and we therefore gathered data on this from three different sources covering respectively the periods between 1980and 1992(McCaig and Hughes 1995), between 1992(McCaig et al. 2003, and between 2000 and 726 SYSTEMATIC BIOLOGY VOL. 67 2012(CDDEP 2017. There first and second sources overlapped in the year 1992, and the second and third sources overlapped in the year 2000. We used these 2 years of overlap to scale the data for consistency between the three sources. Specifically, the values from the second source (McCaig et al. 2003) were scaled so that the 2000 value is equal to the one in the third source (CDDEP 2017), and values from the first source (McCaig and Hughes 1995) were then scaled so that the 1992 value is equal to the one in the previously rescaled second source. The final rescaled data are therefore measured as in the third source, namely in standard units of -lactams (i.e., narrow-spectrum and broad spectrum penicilins plus cephalosporins) consumed per 1000 population in the USA (CDDEP 2017). These data show that the consumption of -lactams almost doubled between 1980 and 1991 and subsequently decreased to reach around 2010 levels comparable to the early 1980s (Fig. 3). These trends on -lactams consumption therefore appear to be very similar to the ones observed for the USA300 growth rate without the use of covariates (red results in Fig. 3). To confirm this observation, we repeated our phylodynamic analysis with integration of the -lactam use as a covariate (blue results in Fig. 3). We found that the covariate was significantly associated with growth rate, with a mean posterior effect of 0.48% and 95% credible interval (0.18-0.71). The growth rate dynamics inferred when using covariate data were almost identical to those inferred without the use of covariate data, except for a clear reduction of the width of the intervals which reflects the gain in information when combining two independent types of data. USA300 is also partly resistant to macrolides and quinolones, but the consumption of these antibiotics increased in the USA throughout the 1990s (McCaig et al. 2003), and the subset of sensitive genomes (31.9% for ciprofloxacin and 6.3% for erythromycin) did not exhibit different phylodynamic properties compared to resistant genomes (Uhlemann et al. 2014), so that these antibiotics could not explain the USA300 growth rate dynamics.
Our analysis therefore suggests that the rise in -lactams consumption in the 1980s was responsible for the emergence of the highly successful USA300 lineage. From the mid-1990s, the use of -lactams has declined, both due to an overall reduction in antibiotic use and a diversification of the type of antibiotics prescribed (McCaig et al. 2003;CDDEP 2017), and the growth rate of USA300 has consequently decreased. Importantly, the consumption of antibiotics is expected to be associated with the growth rates of resistant bacterial pathogens, rather than with their effective population sizes, which here is not at all correlated with the covariate (Fig. 3). Amongst pairs of genomes sampled from the same hosts, the distribution of genomic distance had a mean of 1.4 substitution (Uhlemann et al. 2014). If we assume that sampling occurred on average in the middle of the carriage duration (i.e., /2 time after infection), the evolutionary time separating the two genomes is between 0 depending on the level of within-host genetic drift (Didelot et al. 2016). Given that the molecular clock rate of USA300 is approximately 3 substitutions per year (Uhlemann et al. 2014;Alam et al. 2015), the average duration of infections in this outbreak is therefore around = 1.4×2/3 = 0.93 year. In the first half of the 1990s, the growth rate peaked around 1 per year (Fig. 3) and using Equation 10 we estimate that the reproduction number was around R = 1.93, which is in good agreement with the recent estimate R = 1.5 for MRSA in the US population (Hogea et al. 2014). The fact that this estimate is only modestly above the minimum threshold of R = 1 required for outbreaks to take place could help explain why the USA300 is declining, even though -lactams are still widely used. The consumption level may have lowered below the threshold caused by the fitness cost of resistance, as previously discussed for other resistant bacteria (Dingle et al. 2017;Whittles et al. 2017).

DISCUSSION
Many environmental covariates, particularly those with a mechanistic influence on replicative fitness of pathogens, are closely related to the growth rate of epidemic size but not necessarily related to absolute epidemic size. We have found that these relationships can be inferred from random samples of pathogen genetic sequences by relating environmental covariates to the growth rate of the effective population size. This enables the estimation of the fitness effect of environmental covariates as well as the prediction of future epidemic dynamics should conditions change. We have found a clear and highly significant relationship between the growth and decline of communityassociated MRSA USA300 and the population-level prescription rates of -lactam antibiotics (Fig. 3). This relationship is not apparent when comparing antibiotic usage directly with the effective population size of MRSA USA300. Our methodology focused on growth rate is therefore well suited to investigate the drivers of antibiotic resistance, compared to previous phylodynamic methods focused on the effective population size.
The skygrowth model can provide a more realistic prior for many infectious disease epidemics where the growth rate of epidemic size is likely to approach stationarity as opposed to the absolute effective population size. Conventional skyride and skygrid models are prone to erroneously estimating a stable effective population size when genealogical data are uninformative, as for example when estimating epidemic trends in the latter stages of SIR epidemics (Fig. 1). The skygrowth model will correctly predict epidemic decline in this situation. Moreover, under ideal conditions, the estimated growth rate can be related to the reproduction number of an epidemic, and the skygrowth model provides a simple nonparametric estimator of the reproduction number through time given additional information about the natural history of infection (Equation 10). Caution should be exercised when using the effective population 2018 VOLZ AND DIDELOT-GROWTH AND DECLINE OF EFFECTIVE POPULATION SIZE 727 size as a proxy for epidemic size, as the relationship between the two is complex (cf. Simulation results). In general, there will be close correspondence between the growth of epidemic size and growth of effective population size during periods where the growth rate is relatively constant.
The methods presented here can be applied more generally to evaluate the role of antibiotic stewardship, vaccine campaigns, or other public health interventions on epidemic growth rates. Some environmental covariates, such as independent prevalence estimates, may be more closely related to effective population size rather than growth rates, and future work is indicated on the development of regression models in terms of both statistics. More complex stochastic models can also be considered, such as processes with both autoregressive and moving average components. A variety of mathematical models have been developed to explain de novo evolution of antimicrobial resistance as a function of population-level antimicrobial usage (Bonhoeffer et al. 1997;Austin et al. 1999;Spicknall et al. 2013;Whittles et al. 2017), and an important direction for future work will be the development of parametric and semiparametric structured coalescent models (Volz 2012) that can be applied to bacterial phylogenies featuring a mixture of antibiotic sensitive and resistant lineages. This methodology will allow us to estimate key evolutionary parameters, such as the fitness cost and benefit of resistance, or the rate of mutation from sensitive to resistant status, which are needed to make well-informed recommendations on resistance control strategies.