Semiparametric regression on cumulative incidence function with interval-censored competing risks data and missing event types

Summary
Competing risks data are frequently interval-censored; that is, the exact event time is not observed but is only known to lie between two examination time points, such as clinic visits. In addition to interval censoring, another common complication is that the event type is missing for some study participants. In this article, we propose an augmented inverse probability weighted sieve maximum likelihood estimator for the analysis of interval-censored competing risks data in the presence of missing event types. The estimator imposes weaker-than-usual missing at random assumptions by allowing for the inclusion of auxiliary variables that are potentially associated with the probability of missingness. The proposed estimator is shown to be doubly robust, in the sense that it is consistent even if either the model for the probability of missingness or the model for the probability of the event type is misspecified. Extensive Monte Carlo simulation studies show good performance of the proposed method even with a large proportion of missing event types. The method is illustrated using data from an HIV cohort study in sub-Saharan Africa, where a substantial proportion of event types is missing. The proposed method can be readily implemented using the new function ciregic_aipw in the R package intccr.


Appendix I. Proof of Theorem 1 (double-robustness)
To show the double robustness property of the proposed estimator, we use empirical process theory (Kosorok, 2008; van der Vaart and Wellner, 1996). In this section we use the standard empirical process notation $Pf = \int_{\mathcal{X}} f(x)\,dP(x)$ and $\mathbb{P}_n f = n^{-1}\sum_{i=1}^{n} f(X_i)$ for a measurable function $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X}$ is the sample space. Also, let $K$ be a generic constant that may differ from place to place. We now define the empirical process. Now, define the functions ... For the first term we have ... After trivial algebra, the first expectation on the right side of (0.2) is ... If either $\rho(O_i; \xi^*)$ or $\pi_j(O_i; \psi^*)$ is correctly specified, that is, if either $E(R_i \mid O_i) = \rho(O_i; \xi^*)$ a.s. or the conditional probability of the event type given $O_i$ equals $\pi_j(O_i; \psi^*)$ a.s., then, in light of condition C8, it follows that $E\{\Delta^{(1)}_{ij}(\xi^*, \psi^*) - \Delta^{(1)}_{ij}\} = 0$. Similarly, if either $\rho(O_i; \xi^*)$ or $\pi_j(O_i; \psi^*)$ is correctly specified, then $E\{\Delta^{(2)}_{ij}(\xi^*, \psi^*) - \Delta^{(2)}_{ij}\} = 0$. Therefore, in light of (0.2), $B_n \overset{p}{\to} 0$. Finally, Bakoyannis and others (2017) showed that $C_n \overset{p}{\to} 0$ and, thus, based on (0.1), condition (i) is satisfied.
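The role of the two working models in this step can be sketched with a generic augmented inverse probability weighted indicator. The notation below is hypothetical and is not claimed to match the exact definitions in the main text: $\delta_{ij}$ denotes an event-type indicator that is observed only when $R_i = 1$, $\tilde\delta_{ij}$ denotes its augmented version, and the missing at random assumption is taken to mean that $R_i$ and $\delta_{ij}$ are independent given $O_i$.

```latex
% Assumed generic AIPW construction (illustrative, not the paper's definition)
\tilde\delta_{ij}(\xi,\psi)
  = \frac{R_i\,\delta_{ij}}{\rho(O_i;\xi)}
  + \left\{1 - \frac{R_i}{\rho(O_i;\xi)}\right\}\pi_j(O_i;\psi)

% Rearranging the difference from the fully observed indicator
\tilde\delta_{ij}(\xi^*,\psi^*) - \delta_{ij}
  = \left\{\frac{R_i}{\rho(O_i;\xi^*)} - 1\right\}
    \left\{\delta_{ij} - \pi_j(O_i;\psi^*)\right\}

% Conditional expectation under R_i independent of \delta_{ij} given O_i
E\left\{\tilde\delta_{ij}(\xi^*,\psi^*) - \delta_{ij} \mid O_i\right\}
  = \left\{\frac{E(R_i \mid O_i)}{\rho(O_i;\xi^*)} - 1\right\}
    \left\{E(\delta_{ij} \mid O_i) - \pi_j(O_i;\psi^*)\right\}
```

The product on the right is zero as soon as one of its two factors vanishes, that is, when either the missingness model or the event-type model is correctly specified. This is the sense in which an estimator built on such indicators is doubly robust.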
Condition (ii) has been shown by Bakoyannis and others (2017). Finally, by a Taylor expansion and conditions C1, C7, and C8, it follows that $\Delta^{(1)}_{ij}(\hat\xi_n, \hat\psi_n) = \Delta^{(1)}_{ij}(\xi^*, \psi^*) + o_p(1)$ and $\Delta^{(2)}_{ij}(\hat\xi_n, \hat\psi_n) = \Delta^{(2)}_{ij}(\xi^*, \psi^*) + o_p(1)$. Thus, $M_n(\hat\theta_n; \hat\xi_n, \hat\psi_n) - M_n(\theta_0; \hat\xi_n, \hat\psi_n) = M_n(\hat\theta_n; \xi^*, \psi^*) - M_n(\theta_0; \xi^*, \psi^*) + o_p(1)$. Now, the same arguments as those used in the consistency proof in Bakoyannis and others (2017) lead to the conclusion that condition (iii) is satisfied. Therefore, $d(\hat\theta_n, \theta_0) \overset{p}{\to} 0$.

We implemented the proposed augmented inverse probability weighted method in the existing R package intccr (Park and others, 2019). The corresponding function ciregic_aipw, for the analysis of interval-censored competing risks data with missing event types, is available for R version 3.5.2 or higher (R Core Team, 2019). Currently, the function allows for only two event types. The package can be installed and loaded as follows:

R> install.packages("intccr")

R> library(intccr)
In this illustration we analyze the simulated data set simdata_aipw, which is available in the intccr package. This data set consists of 200 observations on 7 variables: id, v, u, c, z1, z2, and a. The description of these variables is provided in Table 1. In simdata_aipw, 50.3% of the 169 observations that are not right-censored have a missing event type.
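A fit to this data set might look like the sketch below. The argument names (a formula built with Surv2, plus aux, alpha, nboot, and do.par) are stated here as assumptions about the package interface and should be checked against ?ciregic_aipw. The values alpha = c(1, 1) and nboot = 0 and the seed are illustrative choices, not the settings used in this appendix. The requireNamespace guard only ensures that the snippet does nothing when intccr is not installed.

```r
# Sketch only: argument names and values are assumptions to verify in ?ciregic_aipw.
fit <- NULL
if (requireNamespace("intccr", quietly = TRUE)) {
  library(intccr)
  data(simdata_aipw)
  set.seed(1234)  # illustrative seed; it matters only when nboot > 0
  fit <- ciregic_aipw(
    formula = Surv2(v = v, u = u, event = c) ~ z1 + z2,  # interval endpoints and event type
    aux = a,          # auxiliary variable for the missingness model
    data = simdata_aipw,
    alpha = c(1, 1),  # assumed link parameters, one per event type
    nboot = 0,        # skip bootstrap standard errors in this quick sketch
    do.par = FALSE    # no parallel computing
  )
  print(fit)
}
```

Setting nboot to a positive number would request bootstrap standard errors, at the cost of a longer run time.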
The function ciregic_aipw fits two parametric models: a logistic regression model for the probability of non-missingness, using the 169 observations that are not right-censored, and a model for the probability of the event type, using the 84 observations with an observed event type.

The parallel computing option do.par = TRUE uses the maximum number of available cores minus one. For example, on a quad-core system 3 cores are used, leaving one core for the operating system. Parallel computing speeds up the bootstrap standard error computation and returns the same result when the same seed is set.

Moreover, we provide a function that returns the covariate-specific predicted cumulative incidence function (CIF). The generic function predict returns the predicted CIF for a sequence of time points and a given combination of covariates, and this output can be used to plot the predicted baseline CIFs, as depicted in the accompanying figure. A different value of the argument covp gives the predicted CIFs for the required covariate pattern (e.g. covp = ...).

Appendix IV. Analysis of cause-specific hazards in the HIV data example
It has been argued that, in real-world analyses with competing risks data, one should analyze all the cause-specific hazards (CSHs) and CIFs to obtain a more complete understanding of the competing risks process under study (Latouche and others, 2013). However, to the best of our knowledge, there are no methods for semiparametric regression analysis of the CSH under both interval censoring and missing event types. Thus, in order to additionally analyze the CSHs, we used the maximum pseudo-partial-likelihood estimator (MPPLE) for the semiparametric proportional hazards model that accounts for missing event types (Bakoyannis and others, 2020), and we used naïve midpoint imputation to address the interval censoring in the data. The results from this analysis are provided in Table 1.
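Midpoint imputation itself is simple to sketch. The helper midpoint_impute and the toy data below are hypothetical and are not part of intccr or of the HIV analysis. The sketch assumes that each event is known to lie between a left endpoint v and a right endpoint u, and that right-censored observations are coded with u = Inf.

```r
# Naive midpoint imputation for interval-censored times (illustrative sketch).
# Assumption: right-censored observations are coded with u = Inf.
midpoint_impute <- function(v, u) {
  stopifnot(length(v) == length(u), all(v <= u))
  right_censored <- is.infinite(u)
  # Interval-censored: use the interval midpoint.
  # Right-censored: keep the last examination time v as the censoring time.
  time <- ifelse(right_censored, v, (v + u) / 2)
  data.frame(time = time, observed = !right_censored)
}

# Hypothetical toy data: three interval-censored events, one right-censored.
toy <- data.frame(v = c(0, 1.0, 2.5, 3.0), u = c(0.8, 2.0, Inf, 3.6))
imputed <- midpoint_impute(toy$v, toy$u)
print(imputed)
```

The imputed times can then be passed to a method for right-censored data. The imputation ignores the uncertainty about the exact event time within each interval, which is why it is referred to as naïve.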
There is no statistically significant evidence that male gender is associated with the CSH of disengagement. In contrast, the effect of male gender on the CIF of disengagement is statistically significant. This can be explained by the fact that males have a higher CSH of death compared to females. Thus, males appear to disengage less than females because males die at a higher rate, and death precludes them from experiencing disengagement. This can be seen more precisely from the relationship between the CIF and the CSH. Let $F_1(t; Z)$ and $F_2(t; Z)$ represent the CIFs of disengagement and death, respectively, conditional on the covariates $Z$.
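The corresponding CSHs are denoted as $\lambda_1(t; Z)$ and $\lambda_2(t; Z)$. Then
$$F_1(t; Z) = \int_0^t \lambda_1(u; Z)\, S(u; Z)\, du, \qquad S(u; Z) = \exp\left[-\int_0^u \{\lambda_1(s; Z) + \lambda_2(s; Z)\}\, ds\right].$$
Therefore, even if $\lambda_1(s; Z)$ does not depend on gender, males will have a lower overall survival $S(u; Z)$ as a result of their higher CSH of death $\lambda_2(s; Z)$ and, thus, a lower CIF of disengagement.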
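This argument can be checked numerically with the standard identity $F_1(t) = \int_0^t \lambda_1(u) S(u)\,du$, where $S(u) = \exp[-\int_0^u \{\lambda_1(s) + \lambda_2(s)\}\,ds]$. The sketch below assumes constant cause-specific hazards. The hazard values and group labels are hypothetical and are not estimates from the HIV data.

```r
# CIF of cause 1 under constant cause-specific hazards (closed form):
# F1(t) = lambda1 / (lambda1 + lambda2) * (1 - exp(-(lambda1 + lambda2) * t))
cif1_constant <- function(t, lambda1, lambda2) {
  total <- lambda1 + lambda2
  lambda1 / total * (1 - exp(-total * t))
}

# Same quantity by numerical integration of lambda1 * S(u) over (0, t)
cif1_numeric <- function(t, lambda1, lambda2) {
  integrand <- function(u) lambda1 * exp(-(lambda1 + lambda2) * u)
  integrate(integrand, lower = 0, upper = t)$value
}

lambda1 <- 0.10         # hypothetical CSH of disengagement, equal in both groups
lambda2_female <- 0.05  # hypothetical CSH of death, females
lambda2_male <- 0.15    # hypothetical CSH of death, males (higher)
t_eval <- 5

cif_female <- cif1_constant(t_eval, lambda1, lambda2_female)
cif_male <- cif1_constant(t_eval, lambda1, lambda2_male)
print(c(female = cif_female, male = cif_male))
```

Although the CSH of disengagement is identical in the two groups, the group with the higher CSH of death has the lower CIF of disengagement, which mirrors the pattern described above.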