Software Application Profile: dynamicLM—a tool for performing dynamic risk prediction using a landmark supermodel for survival data under competing risks

Abstract
Motivation
Providing a dynamic assessment of prognosis is essential for improved personalized medicine. The landmark model for survival data provides a potentially powerful solution to the dynamic prediction of disease progression. However, a general framework and a flexible implementation of the model that incorporates various outcomes, such as competing events, have been lacking. We present an R package, dynamicLM, a user-friendly tool for the landmark model for the dynamic prediction of survival data under competing risks, which includes various functions for data preparation, model development, prediction and evaluation of predictive performance.
Implementation
dynamicLM is implemented as an R package.
General features
The package includes options for incorporating time-varying covariates, capturing time-dependent effects of predictors and fitting a cause-specific landmark model for time-to-event data with or without competing risks. Tools for evaluating the prediction performance include time-dependent area under the ROC curve, Brier score and calibration.
Availability
Available on GitHub [https://github.com/thehanlab/dynamicLM].


Introduction
Accurate prediction of disease prognosis is essential for effective clinical decision-making. 1 Traditional prediction models are commonly used to estimate the risk of the event of interest (e.g. mortality) at a fixed time point, such as at the time of diagnosis 2,3 or the end of curative treatment, 4,5 and thus may fail to provide updated risk estimates that can change over time. 6,7 Temporal changes in patient data can affect the subsequent risk, and incorporating these changes can help update risk estimations for optimal patient management. 6,8
The landmark model 6,8-10 and joint modelling 11,12 are two approaches used for dynamic prediction for survival data. Joint modelling can provide accurate estimates by simultaneously modelling the longitudinal markers and time-to-event outcome, but it is computationally intensive and requires more modelling assumptions. Alternatively, the landmark model, built on the concept of ongoing risk assessment times (i.e. landmarks) following the baseline, reduces the computational burden to provide updated risk estimates for large-scale data. Specifically, the dataset is transformed into multiple censored datasets based on a prediction window of interest and predefined landmarks. A single model (i.e. a supermodel) is then fitted on this stacked dataset, incorporating the dynamic trajectories of the patients, and can be used to provide the most up-to-date risk predictions.
Most landmark models for dynamic prediction have been applied in the context of standard survival outcomes using the Cox model. 6,9,10 However, it is often observed that other causes of failure (i.e. competing events) may preclude the occurrence of the event of interest. 15 The cause-specific Cox (CSC) model 16 is one solution that combines several fitted Cox models to avoid the overestimation of the predicted risk. 15 There have been efforts to incorporate competing risks into dynamic prediction, 10,14,17,18 but they lack a user-friendly implementation. Furthermore, practical evaluations (e.g. computational feasibility) of the dynamic landmark model for competing-risk data have been lacking.
We introduce an R package, dynamicLM, a tool to implement the landmark model for dynamic predictions using survival data incorporating competing risks. The package provides a framework including data preparation, model fitting, prediction, evaluation and visualization. We illustrate the method by predicting second primary lung cancer risk using data from the Multiethnic Cohort Study, 19 complemented by a set of simulations.

Implementation
The dynamicLM tool provides a simple framework to perform dynamic w-year risk predictions, i.e. predicting the risk of experiencing the event of interest within w years from each risk assessment. Risk prediction for the next w years is made at baseline (e.g. diagnosis) and at a later set of risk assessment times ('landmark' times).
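The stacking idea behind w-year landmark prediction can be sketched in base R. This is a minimal illustration on hypothetical toy data, not the dynamicLM implementation; the helper `landmark_one` and all values are assumptions made for the example. For each landmark s, individuals still at risk at s are kept and follow-up is administratively censored at s + w.

```r
# Toy survival data (hypothetical values): follow-up time in years and status
# (0 = censored, 1 = event of interest, 2 = competing event).
toy <- data.frame(
  id     = 1:5,
  time   = c(0.8, 2.5, 4.0, 6.5, 9.0),
  status = c(1,   2,   0,   1,   0)
)

# Hypothetical helper: build the dataset for one landmark time s by keeping
# individuals still at risk at s and censoring follow-up at the horizon s + w.
landmark_one <- function(data, s, w) {
  at_risk <- data[data$time > s, , drop = FALSE]
  beyond  <- at_risk$time > s + w
  at_risk$status[beyond] <- 0       # no event observed inside the window
  at_risk$time[beyond]   <- s + w   # administrative censoring at s + w
  at_risk$LM <- rep(s, nrow(at_risk))
  at_risk
}

lms <- c(0, 1, 2, 3)  # landmark times (years since baseline)
w   <- 5              # prediction window (years)

# Stack the landmark datasets into one "super" dataset.
stacked <- do.call(rbind, lapply(lms, landmark_one, data = toy, w = w))
rownames(stacked) <- NULL

table(stacked$LM)     # number of individuals contributing at each landmark
```

An individual can appear in several landmark datasets, which is why supermodels fitted to stacked data typically need robust (clustered) standard errors.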
Input data can take various forms: covariates can be static (e.g. race), time-varying (e.g. biomarker changes) or a mix. Time-varying covariates can be in long or wide formats. Outcomes can be time-to-event for one cause (standard survival data) or a specific event in the presence of competing event(s). dynamicLM includes implementations of the Cox landmark model for standard survival data and the CSC landmark model for competing risk data.
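To make the handling of long-format time-varying covariates concrete, the following base R sketch selects, for each individual, the most recent measurement taken at or before a landmark time. The data values and the helper `latest_at` are hypothetical and are not part of dynamicLM; only the column names 'T.fup', 'smkstatus' and 'cigday' mirror those used in the application below.

```r
# Hypothetical long-format data: one row per measurement occasion.
long <- data.frame(
  id        = c(1, 1, 2, 3, 3),
  T.fup     = c(0, 2.5, 0, 0, 1.0),   # years from baseline to measurement
  smkstatus = c("current", "former", "former", "current", "current"),
  cigday    = c(20, 0, 0, 10, 15)
)

# Hypothetical helper: most recent covariate values observed up to landmark s.
latest_at <- function(long, s) {
  seen <- long[long$T.fup <= s, , drop = FALSE]
  seen <- seen[order(seen$id, seen$T.fup), , drop = FALSE]
  # After sorting, the last row per individual is the most recent measurement.
  out <- seen[!duplicated(seen$id, fromLast = TRUE), , drop = FALSE]
  out$LM <- rep(s, nrow(out))
  rownames(out) <- NULL
  out
}

at_1 <- latest_at(long, 1)  # individual 1 still carries the baseline values
at_3 <- latest_at(long, 3)  # individual 1 now carries the 2.5-year update
```

Only information available at the landmark is used, so a prediction made at s never depends on measurements taken after s.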
The hazard of each cause-specific landmark supermodel for cause $j$ ($j = 1, \ldots, C$) from a landmark time $s \in [s_0, s_L]$ for time $t$ ($s \le t \le s + w$) is:
$$h_j(t \mid Z(s), s) = h_{j0}(t) \exp\{\theta(s) + Z(s)^\top \beta_j(s)\},$$
where $\theta(s)$ models the main effects of the landmark time $s$ and $Z(s)$ denotes the most recent covariates of an individual observed until $s$. In particular, the baseline hazard at time $t$, $h_{j0}(t)$, is the instantaneous rate at which a person with all covariates equal to zero experiences the event at $t$, given that the person has survived from $s$ to $t$. The interaction of $s$ with the covariates, modelled by $\beta_j(s)$, captures the time-dependent effects of covariates. The $w$-year survival and (cause-specific) cumulative incidence for cause $j$ can be predicted at any point $s$ in the window $[s_0, s_L]$:
$$S(s + w \mid Z(s), s) = \exp\left\{-\sum_{j=1}^{C} \int_s^{s+w} h_j(t \mid Z(s), s)\, dt\right\},$$
$$F_j(s + w \mid Z(s), s) = \int_s^{s+w} h_j(t \mid Z(s), s)\, S(t^- \mid Z(s), s)\, dt.$$
Estimating cumulative incidence in dynamicLM involves an iteration over the causes and landmarks in the super dataset. It also uses the prodlim 20 R package.
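As a numerical illustration of how cumulative incidence follows from cause-specific hazards, the sketch below integrates hazard times event-free survival over the prediction window. It assumes hypothetical constant hazards so that the result can be checked against a closed form; it is not how dynamicLM estimates these quantities from fitted models.

```r
# Hypothetical constant cause-specific hazards (per year) for two causes,
# conditional on being event-free at the landmark time s.
haz <- c(cause1 = 0.02, cause2 = 0.10)
s <- 1   # landmark time
w <- 5   # prediction window

# Midpoints of a fine grid over the prediction window [s, s + w].
n    <- 5000
dt   <- w / n
grid <- s + (seq_len(n) - 0.5) * dt

# Event-free survival from s: exp(-sum of cumulative cause-specific hazards).
surv <- exp(-sum(haz) * (grid - s))

# Cumulative incidence of each cause: integral of h_j(t) * S(t) over the window.
cif <- sapply(haz, function(h) sum(h * surv * dt))

# Closed form for constant hazards, used here only as a check.
cif_exact <- haz / sum(haz) * (1 - exp(-sum(haz) * w))

# The cause-specific incidences and the w-year survival sum to one.
total <- sum(cif) + exp(-sum(haz) * w)
```

Because every cause-specific hazard enters the survival term, the cumulative incidence of one cause depends on the hazards of all causes, which is the reason for iterating over causes.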

Use
This section presents the application of dynamicLM to develop a risk-prediction model for second primary lung cancer (SPLC) among lung cancer patients using the CSC landmark supermodel. The data contain 3844 ever-smoking patients diagnosed with initial primary lung cancer (IPLC) between 1993 and 2007, followed through 2017, in the Multiethnic Cohort Study (MEC) (Supplementary Method 2, available as Supplementary data at IJE online).
The outcome of interest is the time from IPLC to SPLC diagnosis (Cause 1), with competing events of lung cancer death (Cause 2) and other-cause death (Cause 3). First, we load the package and data into our R session:
> devtools::install_github("thehanlab/dynamicLM")
> library(dynamicLM)
> data(splc)
*The example dataset in this R package is synthetic because the original MEC data are only available under a data use agreement. Readers can run the code but cannot reproduce the same estimates.
The super dataset setup requires: (i) the outcome columns; and (ii) variable types (fixed vs time-varying). Fixed variables (e.g. sex) do not change over time, whereas time-varying covariates (e.g. smoking status at baseline and 10-year follow-up surveys) do. In this example, the fixed covariates include age at IPLC diagnosis ('age.ix'), family history of lung cancer ('fh'), prior history of cancer ('ph'), IPLC stage ('stage.ix'), IPLC histology and IPLC treatment. The time-varying covariates include smoking-related variables. The following code sets this up:
> outcome <- list(time = "Time", status = "event")
> fix_covs <- c("age.ix", "male", "fh", "ph", "bmi", "stage.ix", "hist_AD", "hist_LC", "hist_NSCLC_NOS", "hist_SC", "hist_OTH", "surgery.ix", "radiation.ix", "chemo.ix", "quityears")
> vary_covs <- c("smkstatus", "cigday", "packyears")
To predict the 5-year SPLC risk during the first 3 years after IPLC diagnosis, we first specified a prediction window and established a set of landmarks (i.e. risk assessment time points at 0, 1, 2 and 3 years from IPLC diagnosis), producing one stacked dataset ('lmdata') of four landmark datasets (Supplementary Figure S1, available as Supplementary data at IJE online).
Certain time-varying covariates can be manually updated as time passes. In our data, smoking quit years ('quityears') increase linearly for former smokers.
The next step involves creating: (i) interaction terms between chosen covariates ('lm_covs') and landmark times (linear, quadratic or other forms using 'func_covars') to examine time-dependent effects of the covariates and check the proportional hazards assumption; and (ii) transformations of the landmark time variables ('func_lms'). Unlike time-varying covariates, whose values change over time, time-dependent effects arise when the effect of a covariate on the hazard changes over time. For example, the effect of radiotherapy given to treat IPLC on SPLC risk has been reported to increase over time. 24
We selected a priori a list of variables and associated transformations (linear, quadratic) to be checked for time-dependent effects, based on domain knowledge of how variable effects change over time. Certain other transformations (e.g. logarithmic) can be added if necessary (Supplementary Method 2, available as Supplementary data at IJE online). The newly created interaction terms are identified by each variable name followed by an underscore and number (e.g. stage.ix_1 = stage.ix*LM and stage.ix_2 = stage.ix*LM^2).
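The two data-preparation steps described above (updating a covariate with elapsed time and forming covariate-by-landmark interactions) can be sketched by hand in base R. The toy rows, the indicator column 'former' and the landmark columns 'LM_1' and 'LM_2' are hypothetical names chosen for illustration; the package's own functions should be used in practice, and this sketch only mirrors the naming rule stated in the text.

```r
# Hypothetical stacked rows: two individuals at landmarks 0 and 2.
lmdata <- data.frame(
  ID        = c(1, 1, 2, 2),
  LM        = c(0, 2, 0, 2),
  former    = c(1, 1, 0, 0),   # hypothetical indicator: 1 = former smoker
  quityears = c(4, 4, 0, 0),   # quit years recorded at baseline
  stage.ix  = c(1, 1, 0, 0)
)

# Former smokers accumulate one extra quit-year per year since baseline,
# so the baseline value is shifted by the landmark time.
lmdata$quityears <- lmdata$quityears + lmdata$former * lmdata$LM

# Covariate-by-landmark interactions, named as in the text:
# stage.ix_1 = stage.ix * LM and stage.ix_2 = stage.ix * LM^2.
lmdata$stage.ix_1 <- lmdata$stage.ix * lmdata$LM
lmdata$stage.ix_2 <- lmdata$stage.ix * lmdata$LM^2

# Linear and quadratic transformations of the landmark time itself
# (hypothetical column names).
lmdata$LM_1 <- lmdata$LM
lmdata$LM_2 <- lmdata$LM^2
```

A non-zero coefficient on an interaction column such as stage.ix_1 would indicate that the effect of stage changes with the landmark time.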
The simulation results for evaluating the impact of correctly handling competing events on predictive accuracy are shown in Figure 2. Using a standard Cox landmark model instead of a CSC landmark model leads to reduced predictive performance when competing risks are present. The difference in AUC between the two models becomes more pronounced at later landmark times.
Additionally, we compared the two models in our SPLC application example (as opposed to simulated data). The risk predicted by the standard Cox landmark model was 1.5-5.3 times higher than that predicted by the CSC landmark model (Supplementary Figure S4, available as Supplementary data at IJE online), emphasizing the importance of correctly handling competing risks in dynamic prediction.
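The direction of this overestimation can be illustrated with a simple calculation. The sketch below assumes hypothetical constant hazards and compares a naive risk that treats competing events as censoring with the cause-specific cumulative incidence. The numbers are illustrative only and do not reproduce the estimates reported above.

```r
# Naive w-year risk when competing events are treated as censoring:
# one minus the survival implied by the cause-1 hazard alone.
risk_naive <- function(h1, w) 1 - exp(-h1 * w)

# Cause-specific cumulative incidence of cause 1 with competing hazard h2
# (closed form under constant hazards).
risk_csc <- function(h1, h2, w) {
  h1 / (h1 + h2) * (1 - exp(-(h1 + h2) * w))
}

h1 <- 0.02              # hypothetical hazard of the event of interest
h2 <- c(0, 0.1, 0.5)    # hypothetical competing-event hazards
w  <- 5                 # prediction window (years)

# Ratio of naive to competing-risk-adjusted predicted risk.
ratio <- risk_naive(h1, w) / risk_csc(h1, h2, w)
round(ratio, 2)
```

With no competing events the two risks coincide, and the ratio grows as the competing hazard increases, consistent with the larger discrepancies expected in populations with high competing mortality.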

Discussion
In this study, we presented the R package dynamicLM that implements a flexible framework for building a dynamic landmark supermodel for competing risk data, covering the entire pipeline for data preparation, model development, prediction and evaluation. By providing researchers with practical tools and instructions on how and when to use this approach, dynamicLM holds great promise to improve individual risk predictions by using updated patient data.
The proposed implementation can be applied in many clinical settings, such as predicting cancer recurrence 6 (or a therapy response) using time-varying biomarkers (e.g. circulating tumour DNA) 1 in clinical trials, or predicting second malignancies using patients' updated treatment histories in electronic health records. Future research directions include incorporating regularization for feature selection in high-dimensional data and developing unified performance metrics summarized across landmarks to evaluate the predictive performance of the landmark model.
dynamicLM also relies on the riskRegression, 21 survival 22 and dynpred 23 R packages. The dynamicLM package has a PDF manual that is downloadable from GitHub [https://github.com/thehanlab/dynamicLM] and accessible from within R (see Supplementary Method 1, available as Supplementary data at IJE online).

Table S1
The original dataset (splc) is in long format and may include multiple observations per patient if the patient has 10-year follow-up survey data (Supplementary Method 2, available as Supplementary data at IJE online). The column 'T.fup' represents the duration between baseline and follow-up measurements of the time-varying covariates ('smkstatus', 'cigday' and 'packyears'). This follow-up time is specified in the 'rtime' argument.