A flexible parametric accelerated failure time model

Accelerated failure time (AFT) models are used widely in medical research, though to a much lesser extent than proportional hazards models. In an AFT model, the effect of covariates act to accelerate or decelerate the time to event of interest, i.e. shorten or extend the time to event. Commonly used parametric AFT models are limited in the underlying shapes that they can capture. In this article, we propose a general parametric AFT model, and in particular concentrate on using restricted cubic splines to model the baseline to provide substantial flexibility. We then extend the model to accommodate time-dependent acceleration factors. Delayed entry is also allowed, and hence, time-dependent covariates. We evaluate the proposed model through simulation, showing substantial improvements compared to standard parametric AFT models. We also show analytically and through simulations that the AFT models are collapsible, suggesting that this model class will be well suited to causal inference. We illustrate the methods with a dataset of patients with breast cancer. User friendly Stata and R software packages are provided.


Introduction
Accelerated failure time (AFT) models are commonly used in a variety of settings within the medical literature (Collett, 2003). The interpretation of an acceleration factor can be considered more intuitive, directly adjusting the survival time, either increasing or decreasing it, compared to the interpretation of a hazard ratio, meaning a relative increase or decrease in the event rate (Swindell, 2009). A parametric approach tends to be favoured when fitting an AFT model; however, parametric models are limited by the flexibility of the distribution chosen (Cox et al., 2007;Cox, 2008). Parametric AFT models are particular prevalent in economic decision modelling, where it is emphasized to fit a wide variety of parametric models (either proportional hazards or accelerated failure time), to obtain the 'best fitting' model (Latimer, 2013). Often, extrapolation is required to calculate survival across a lifetime horizon, and hence parametric and flexible approaches are needed. Of course, extrapolation is fraught with dangers, and arguably should only be attempted in the presence of appropriate external data (Andersson et al., 2013).
To our knowledge, the most flexible fully parametric AFT model is the generalized F distribution, a four parameter distribution described by Cox (2008), which often suffers from convergence problems. This contains the more widely used (due to availability of software) generalized gamma as a special case (Cox et al., 2007). Many authors have compared and contrasted accelerated failure time models with the more commonly used proportional hazards metric (Kay and Kinnersley, 2002;Orbe et al., 2002). Lambert et al. (2004) developed a mixture AFT model with frailties, where a short term hazard component was modelled with a Gompertz distribution, and the long term hazard component could be any of the standard parametric AFT models. There have been several efforts to develop smooth accelerated failure time models, including mixtures of normal densities (Komárek et al., 2005), kernel smoothed densities (Zeng and Lin, 2007) and seminonparametric densities (Zhang and Davidian, 2008).
A software implementation of the mixture of normal densities has received modest attention. Rubio et al. (2019) recently proposed a general hazards-based model, utilising the exponentiated-Weibull model to model the baseline function.
Within a proportional hazards metric, the Royston-Parmar flexible parametric model has grown in popularity in recent years, with a number of extensions and developments being proposed (Royston and Lambert, 2011;Liu et al., 2016). The fundamental strength of the model is to use restricted cubic splines to model the underlying baseline function (regardless of scale), and any time-dependent effects. However, there are known limitations with models based on hazard ratios, where the hazard ratios are not collapsible across covariates not associated the exposure of interest (Martinussen and Vansteelandt, 2013), while AFTs are known to be robust to omitted covariates (Hougaard, 1999). Together, this motivates the incorporation of a flexible framework into an accelerated failure time paradigm, which we consider in this article.
AFT models make the assumption of a constant acceleration factor, i.e. the effect of a covariate remains the same across follow-up time, similar to the proportional hazards assumption. Clearly this assumption is open to violation. This motivates the relaxation of the constant acceleration factor to allow time-dependency, similarly to modelling of non-proportional hazards. This has been described within a generalised gamma AFT model by Cox et al. (2007). In this article, we further relax the constant acceleration factor assumption, within the flexible parametric AFT model, by using restricted cubic splines.
The paper is organised as follows. In Section 2, we first show that an accelerated failure time model has some desirable properties, including collapsibility, that are not exhibited by a proportional hazards model. In Section 3, we derive the proposed model framework and describe the estimation process within a likelihood framework. In Section 4 we conduct a simulation study to evaluate the finite sample performance of the proposed model under complex scenarios, comparing to standard parametric AFT models. In Section 5 we illustrate the model using data from the England and Wales breast cancer registry. Finally, in Section 6 we conclude the paper with a discussion.
2 Causal interpretation of the accelerated failure time model Hougaard (1999) provided an informal description of how AFT models are robust to omitted covariates. We now provide a more formal development for the collapsibility of the acceleration factor for AFT models. Consider a model with two covariates X and Z, with an event time T , with regression parameters β X and β Z and linear predictor β X x + β Z z. Assume that the censoring variable C is independent of X, Z and T , and that the time process is observed by the tuple (Y = min(T, C), ∆ = I(T ≤ C)); the associated causal diagram is given in Figure 1. Figure 1: Causal diagram for the motivating example, with censoring variable C being independent from X, Z and T .
Following Martinussen and Vansteelandt (2013), we define the marginal unadjusted effect for binary X = 1 compared with X = 0 at time t for a contrast function g and a prediction function ψ as T m (t) = g(ψ(t|x = 1), ψ(t|x = 0)). The marginal exposure effect (causal effect) is defined as T (t) = g(ψ(t|x = 1), ψ(t|x = 0), wherex = x is the do operator, which can be conceptualised as the population value that would be realised if X were uniformly set to x.
3 A general parametric accelerated failure time model Continuing with the notation defined in the previous section, an accelerated failure time model, conditional on a set of explanatory variables, X, can be written in the form of the survival function, We can also specify an AFT model in terms of the cumulative hazard function In essence, we can specify any parametric function for Equation (4), subject to the appropriate constraints that the function remains positive for all t > 0 and is monotonically increasing as t → ∞. In this article, we concentrate on a highly flexible way of specifying a parametric AFT, using restricted cubic splines as our basis functions Durrleman and Simon (1989). Similarly to Royston and Parmar (2002), we begin with the log cumulative hazard function of the Weibull distribution, log H(t|λ, γ) = log(λ) + γ log(t) Instead of incorporating covariates into the linear predictor of the log(λ) component, as in Royston and Parmar (2002), here we incorporate them as a multiplicative effect on t, where φ(X; β) is defined in Equation (3). Now we can incorporate the desired flexibility, expanding log(t φ(X; β)) into restricted cubic spline basis. For simplicity, letting u = log(t φ(X; β)), our spline function is defined as where k 0 is a vector of knot locations with parameter vector γ, and derived variables v j (known as the basis functions). For a truncated power basis, the v j are defined as where for j = 2, . . . , m + 1, (u − k j ) 3 + is equal to (u − k j ) 3 if the value is positive and 0 otherwise, and Alternatively, the v j (u, k 0 ) can be calculated using a B-spline basis with a matrix projection at the boundary knots, as per the ns function in R. Given one of these bases, our flexible parametric AFT model can be defined as log H(t|X) = s(log(t φ(X; β))|γ, k 0 ) Usually, knot locations are calculated based on quantiles of the distribution of the variable being transformed into splines, in this case log(t φ(X; β)), also restricted to those observations which are uncensored.

Likelihood and estimation
We define the likelihood in terms of the hazard and survival functions. The hazard function can be written as follows The survival function is defined as We can therefore define our log likelihood for the i th patient, allowing for delayed entry, as We maximise Equation (5) using Newton-Raphson based optimisation (Gould et al., 2010), with analytic score and Hessian.

Time-dependent acceleration factors
Following Cox and Oakes (1984) and Hougaard (1999), we have the survival function of a timedependent AFT model, such that where η(X, u; β) is the time-varying acceleration factor at time u and the baseline survival is S 0 (t) = exp(− exp(s(log(t)|γ, k 0 ))). Within our flexible parametric framework, we can avoid the integration by directly modelling on the cumulative scale, such that Since we are on a cumulative scale, to recover the directly interpretable time-dependent acceleration factor, η(X, t; β), we derive the following relationship, which gives a rather convenient formula for the time-dependent acceleration factor in terms of its cumulative. We can arguably use any continuous function to capture simple and complex timedependent acceleration factors. The hazard function is defined as × s (log(tφ(X, t))|γ, k 0 ) Our form of choice continues the use of restricted cubic splines. A common case is to use a minus log link for the linear predictor, such that where for the p th time-dependent effect, with p = {1, . . . , P }, we have x p , the p th covariate, multiplied by some spline function of log time, s(log(t)|γ p , k p ), with knot location vector, k p , and coefficient vector, γ p . Now Substituting Equation (9) into (7) h(t|X) = exp[s(log(tφ(X, t; β))|γ, k 0 )] × s (log(φ(X, t; β))|γ, with survival function S(t|X) = exp[− exp{s(log(tφ(X, t; β))|γ, k 0 )}] Equations (10) and (11) can then be substituted into Equation (5) to maximise the log likelihood. Analytic scores and Hessian elements can also be derived.

Causal inference
We first simulate under Figure 1. Assume that X is Bernoulli, Z is normal, T is exponential and C is uniform. Specifically, let X ∼ Bernoulli(0.5), Z ∼ Normal(0, 2 2 ), T ∼ Exponential(exp(β 0 + β X X + β Z Z)), β 0 = −5, β X = β Z = 1, C ∼ Uniform(0, 10) and (Y, ∆) = (min(T, C), T < C). Let (x, z, y, δ) be realisations for (X, Z, Y, ∆). These data can be modelled using both proportional hazards and accelerated failure time models. We fit models for (y, δ) with both x and z as linear and additive covariates and with only x as a covariate. We model using Poisson regression, Cox regression and our smooth AFT with 3 degrees of freedom (see Table 1). Note that the estimated β X 's have opposite signs for the AFT models compared with the proportional hazards models (Poisson and Cox regression). As a reminder, an exponential AFT would estimate −β X as per the Poisson regression. We find that all of the models are unbiased when both covariates are included (|E(β X )| = 1). When X and Z are independent, then the effect of X is not confounded by Z and T m (t) = T (t). For the AFT models, assuming that we have captured the baseline distribution, then E Z (Z|X = 1) = E Z (Z|X = 0) and T m (t) = −β X (t). However, for the proportional hazards models with both covariates, the marginal (causal) effect is attenuated for increasing time. This pattern of attenuation is well recognised from the context of frailty models (e.g. Aalen et al. (2008)). Moreover, when the covariate z is not included and X and Z are correlated, then the first three models are more biased, while the smooth AFT is less biased. Table 1: Simulation results for exponentially distributed data with β X = 1, with n = 10 4 observations per simulation and 300 simulation sets. The regression models assume that both covariates are modelled ((y, δ) ∼ x + z) or that only the x covariate is modelled ((y, δ) ∼ x). The expectations for the estimated β X and their standard errors are over the simulation sets.

Other simulations
In this section we conduct a simulation study to assess the ability of the flexible parametric AFT model to capture complex, biologically plausible, baseline functions, and subsequently the impact on estimates of acceleration factors and survival probabilities, when misspecifying the baseline. We also compare the newly proposed flexible AFT to existing parametric models, including the Weibull, generalized gamma, and generalized F. The Weibull and generalized gamma models are available in the streg command in Stata, and we implement the generalized F in Stata, following Cox (2008).
In all simulations, we use a range of two-component mixture Weibull baseline hazard functions, and also a standard Weibull, to generate complex, realistic scenarios (Crowther and Lambert, 2013). When fitting the flexible parametric models, we are therefore not fitting the 'true' model, but investigating how well the spline approximations can do (Rutherford et al., 2015). The baseline survival function for a two-component mixture Weibull is defined as follows: We choose four different baseline hazard functions, representing clinically plausible functions (Royston and Lambert, 2011;Murtaugh et al., 1994). The four assumed baseline hazard functions are shown in Figure 2. Scenarios 1 to 3 come from mixture Weibull functions defined in Equation (12), with Scenario 4 a standard Weibull function. With our baseline functions defined, we can choose to simulate under an accelerated failure time framework, or under proportional hazards, using the general survival simulation framework developed by Crowther and Lambert (2013). Consider a binary treatment group variable, X, with a log acceleration factor, β. We can simulate accelerated failure time data from the following, For each scenario, we assume a log AF of β = −0.5, or β = 0.5. This results in 8 scenarios in total.
To each simulated dataset, we apply a Weibull AFT model, a generalised gamma AFT model, a generalized F AFT model and the proposed flexible parametric AFT model with 2 to 9 degrees of We monitor estimates of β from all models, and estimates of the survival probability at 1, 2, 3, 4, and 5 years, in both treatment groups. Survival was monitored on the log[− log()] scale, with standard errors calculated using the delta method. We also monitor values of the AIC and BIC.

Simulation results
Results are presented in Table 2 for all 8 scenarios. We present bias, percentage bias, and coverage for the estimates of the log acceleration factor from all AFT models. We further present the median rank in terms of best fitting model based on either the AIC or BIC, for all models fitted. Finally, in Tables 3 to 6 we present bias, percentage bias, and coverage for estimates of the survival probability at 1, 2, 3, 4, and 5 years, for the four scenarios, when X = 0 and β = 0.5, X = 1 and β = 0.5, X = 0 and β = −0.5, X = 1 and β = −0.5, respectively, with all estimates are on the log {− log[S(t)]} scale.
From Table 2, looking at scenarios 1 to 3, the Weibull AFT model gives substantial bias in estimates of the log acceleration factor, and poor coverage probabilities. Similarly, but to a lesser extent, the generalised gamma also indicates some bias and poor coverage, but in addition an important proportion of models, 69 out of 250, failed to converge in Scenario 3 when β = 0.5. The generalised F model performed particularly poorly as a substantial proportional in most scenarios failed to converge; therefore the bias and coverage estimates calculated only on models which converged, should be interpreted with caution. Results based on those that did converge indicate some bias across scenarios, but particularly poor coverage across all scenarios. In all scenarios, the flexible parametric AFT performed well across varying degrees of freedom. In scenarios 1 to 3, there was a FPAFT with a specific degree of freedom (or multiple), that outperformed the Weibull, generalized gamma, and generalized F, both in terms of less bias and coverage probabilities closer to the optimum of 95%. When there was bias in specific degrees of freedom, the AIC and BIC indicated a more appropriate well fitting model, generally with the least bias. For example, in Scenario 1 with β = −0.5, the FPAFT with df=2 had -10.6% bias and coverage probability of 68.4% and on average was the 10th best fitting model based on both the AIC and BIC; however, all other degrees of freedom were better fitting, for example, with df=7, percentage bias was -0.4% with coverage of 94.8%, ranked 2nd and 5th on average in terms of AIC and BIC, respectively. In Scenario 4, where the true model was a Weibull (equivalent to FPAFT with df=1), generally all models estimated the log acceleration factor with minimal bias; however, coverage began to be suboptimum as the degrees of freedom increased in the FPAFT, clearly due to over-fitting. Generally, a flexible parametric AFT model was the best fitting in terms of both AIC and BIC, apart from scenario 4 where the true Weibull model (which is equivalent to a flexible parametric AFT with 1 degree of freedom). In some settings the generalized F was best fitting; however, this is based only on estimates that converged (for example scenario 3 and β = −0.5, only 106 out of 250 converged).
Moving to estimates of survival in Tables 3 to 6, in scenarios 1 to 3, the Weibull model produced substantial bias and poor coverage, compared to excellent performance in scenario 4 when the truth was Weibull. Both the generalized gamma and generalized F models produced varying levels of bias and poor coverage, particularly the generalised F, across all 4 scenarios. Both suffered from varying levels of lack of convergence, and also the delta method failed to calculate a standard error in a small number of simulations. The flexible parametric model performed well across all 4 scenarios; there was at least one degree of freedom which provided generally unbiased estimates of survival in each treatment group, with coverage around the 95% optimum.

Breast cancer in England and Wales
To illustrate the proposed AFT model, we use a dataset of 9721 women aged under 50 and diagnosed with breast cancer in England and Wales between 1986 and 1990. Our event of interest is death from any cause, where 2,847 events were observed, and we have restricted follow-up to 5 years, leading to 6,850 censored at 5 years. We are interested in the effect of deprivation status, which was categorised into 5 levels; however, in this example we restrict our analyses to comparing the least and most deprived groups. We subsequently have a binary covariate, with 0 for the least deprived and 1 for the most deprived group.
We fit Weibull, generalized gamma, generalized F and the proposed AFT models with 2 to 9 degrees of freedom, and present estimates of the log acceleration factor for the effect of deprivation status, its standard error and associated 95% confidence interval in Table 7, and also model fit statistics, namely the AIC and BIC. Table 7 indicates that the best fitting model, both in terms of lowest AIC and BIC, is the flexible parametric AFT model with 3 degrees of freedom. This estimates an acceleration factor of 0.744 (95% CI: 0.689, 0.803) for the effect of deprivation status, indicating a patient's survival time is reduced by 25.6% (95% CI: 19.7%, 31.1%) by being in the most deprived group, compared to the least deprived.
The differences in AIC and BIC are substantial between the best fitting model, and those commonly used, namely the Weibull, gamma and F models. We also observe important variation in the estimates of the effect of deprivation status between the FPAFT with df=3, and the Weibull, gamma and F model estimates.
We illustrate the fitted AFT models in Figure 3, showing the fitted survival function for both deprivation groups, for the Weibull, gamma, F and best fitting flexible AFT model, overlaid on the Kaplan-Meier estimates. It is evident from Figure 4 that the flexible AFT fits substantially better than the other models.

Discussion
Accelerated failure time models provide an attractive alternative to the proportional hazards framework, particularly for patients, as an acceleration factor can have a more intuitive meaning, directly increasing/decreasing survival time, rather than the event rate. Many authors have argued that AFT models are underused in applied research (Swindell, 2009;Kay and Kinnersley, 2002;Ng et al., 2015). Indeed, estimates have been shown to be more robust to covariate omission, compared to proportional hazards models (Lambert et al., 2004;Keiding et al., 1997). In this article we proposed a new general parametric accelerated failure time model. We focused on the use of restricted cubic splines to provide a highly flexible framework with which to capture complex, biologically plausible functions. Our model can be thought of as an accelerated failure time formulation of that proposed by Royston   and Parmar (2002). Furthermore, we extended the framework to allow time-dependent acceleration factors. Accelerated failure time models show considerable promise for causal inference. In particular, the log acceleration factors are collapsible for omitted covariates that are uncorrelated with the exposure of interest, whereas the proportional hazards models are sensitive to such random effects or frailties. Moreover, the proportional hazards model have a difficult causal interpretation (see Equations (1)).
We conducted a simulation study to evaluate the performance of the proposed AFT model, indicating excellent performance in a variety of complex, but plausible, settings. In our scenarios, it outperformed the Weibull, generalized gamma and generalized F models, both in terms of minimizing bias in estimates of the acceleration factor and coverage probabilities closer to the optimum 95%. Furthermore, we found that model selection criteria can aid in selecting degrees of freedom, both to select a model with minimal bias, but also a model which capture the baseline to provide reliable estimates of absolute risk such as survival probabilities. The proposed flexible parametric AFT models is also highly computationally efficient.
An accelerated failure time model will be most appropriate when the covariate effects are multiplicative on a time scale. This scale is intuitive for modelling life expectancy, but is more difficult to interpret in terms of competing risks. To describe this difficulty, consider dividing causes of death into two groups, where an exposure affect the causes of death with different acceleration factors. Then survival from all cause will be the product of two survival probabilities which have "ageing" at different rates for the different causes.
In contrast, the proportional hazards models assume that the covariate effects are multiplicative on the hazards scale. This scale is more intuitive for modelling system dynamics and for competing risks, but these models have a difficult causal interpretation and they are less intuitive for the lay person. As a third model class, the additive hazards model are intuitive for effects operating as competing events and have a straightforward causal interpretation, however they are less intuitive for interpreting effects on the same mechanistic pathway.
Extensions to the framework that would be useful include incorporating random effects, to account for clustered structures and unobserved heterogeneity (Lambert et al., 2004;Crowther et al., 2014), and the extension to interval censoring, possibly in combination with delayed entry.
We provide user-friendly Stata and R software packages to allow researchers to directly use the proposed model framework. For Stata, the command can be installed by typing ssc install staft. For R, the rstpm2 package on CRAN provides an aft regression function.    Table 3: Bias, percentage bias and coverage of estimates of log(-log(S(t))) when X=0 and β = .5  Table 4: Bias, percentage bias and coverage of estimates of log(-log(S(t))) when X=1 and β = .5  Table 5: Bias, percentage bias and coverage of estimates of log(-log(S(t))) when X=0 and β = −.5  Table 6: Bias, percentage bias and coverage of estimates of log(-log(S(t))) when X=1 and β = −.5