CGR-CUSUM: a continuous time generalized rapid response cumulative sum chart

Summary Rapidly detecting problems in the quality of care is of utmost importance for the well-being of patients. Without proper inspection schemes, such problems can go undetected for years. Cumulative sum (CUSUM) charts have proven to be useful for quality control, yet available methodology for survival outcomes is limited. The few available continuous time inspection charts usually require the researcher to specify an expected increase in the failure rate in advance, thereby requiring prior knowledge about the problem at hand. Misspecifying parameters can lead to false positive alerts and large detection delays. To solve this problem, we take a more general approach to derive the new Continuous time Generalized Rapid response CUSUM (CGR-CUSUM) chart. We find an expression for the approximate average run length (average time to detection) and illustrate the possible gain in detection speed by using the CGR-CUSUM over other commonly used monitoring schemes on a real-life data set from the Dutch Arthroplasty Register as well as in simulation studies. Besides the inspection of medical procedures, the CGR-CUSUM can also be used for other real-time inspection schemes such as industrial production lines and quality control of services.


Introduction
Rapid detection of deterioration in the quality of care can spare patients unnecessary health burdens.There are currently many inspection schemes which can be used to monitor the quality of care, such as funnel plots [Spiegelhalter, 2005] and a variety of Cumulative Sum (CUSUM) charts (Steiner et al., 2000;Biswas and Kalbfleisch, 2008).A particularly attractive property of CUSUM charts is that they can be used to sequentially check for a decrease in the quality of a process.Ideally, the inspection scheme is also tailored to the outcome type.In this article we are interested in inspecting survival outcomes, where every individual can experience a failure at any time after their entry into the study.As an example, the Dutch Arthroplasty Register (LROI) is interested in simultaneously monitoring the quality of orthopaedic care at multiple hospitals performing total hip replacement surgery by considering the information provided by the time of implant failure as soon as it occurs, as well as the information provided by patients not experiencing implant failures.To facilitate such real-time inspection, Biswas and Kalbfleisch [2008] developed a CUSUM chart for survival outcomes, followed by Sego et al. [2009] and Begun et al. [2019].Each of these charts uses different assumptions in the CUSUM model and is therefore applicable in different scenarios.One similarity is that all of them require the researcher to specify an expected increase in the future rate of failure.When this quantity is chosen incorrectly, the charts may experience delays in detection and produce false negative signals.
Our main goal in this article is to develop a method that no longer requires the researcher to specify many parameters in advance, thereby requiring less prior information for inspection and leading to faster detection times in practical applications.For this reason, we devise a generalisation of the CUSUM chart by Biswas and Kalbfleisch [2008], which we call the Continuous time Generalised Rapid Response CUSUM (CGR-CUSUM).Biswas and Kalbfleisch [2008] chose to only consider the information provided by patients until 1 year after their procedure.In contrast, the CGR-CUSUM is constructed using all available information on any patient at all times.A consequence of these changes is that generally our chart leads to quicker detection of underperforming hospitals, thereby contributing to the improvement of the quality of care.
Other methods for the continuous time inspection of the quality of care include the uEWMA chart for survival time data by Steiner and Jones [2009] and the STRAND chart by Grigg [2018].Grigg [2018] briefly discusses the differences between the BK-CUSUM, uEWMA and STRAND charts, and concludes that the uEWMA and STRAND charts are particularly suitable for quick detection when failures are clustered.In contrast, the BK-CUSUM and the CGR-CUSUM are designed to detect increased failure rates without a specific mechanism for clusters.
We derive an approximation for the average run length (average time to detection) of the CGR-CUSUM, by means of considering a simplification of the CGR-CUSUM called the Continuous time Generalised Initial response CUSUM (CGI-CUSUM).Additionally, we consider an adjusted Biswas and Kalbfleisch [2008] CUSUM procedure which uses the information of all patients at all times, which we call the BK-CUSUM for convenience.Similarly, we present an approximation to the average run length of the BK-CUSUM and compare this approximation with the approximation found for the CGR-CUSUM.This comparison demonstrates how incorrect prior information can significantly increase the detection times of the BK-CUSUM procedure, which then also carries over to the Biswas and Kalbfleisch [2008] CUSUM.
The new CGR-CUSUM chart can be a very useful tool in practical applications where the future expected rate of failure is not known in advance or likely to vary over the time of the study.As this occurs often in medical applications, the CGR-CUSUM chart can significantly improve the quality of care worldwide by inspecting current procedures.In contrast to the multi-chart CUSUM scheme of Han and Tsung [2007], where the possible increase in failure rate is considered over a finite probable domain, the CGR-CUSUM only requires the construction of one chart and the increase in failure rate can also be limited to a fixed domain.On top of this, the CGR-CUSUM is not limited to medical applications.The chart can be used to inspect any procedures involving "survival" outcomes, such as production lines and customer satisfaction inspection.
In Section 2 of this article the relevant quantities, notation as well as the CGR-CUSUM are introduced.Afterwards the Biswas and Kalbfleisch [2008] CUSUM procedure is introduced, as well as an adjusted version thereof which we call the BK-CUSUM.An approximate average run length is derived for both procedures.In Section 3 all methods are applied to a data set from the Dutch Arthroplasty Register (LROI).In Section 4 a simulation study is performed to compare the average run lengths of aforementioned procedures under restrictions on their null (hypothesis) average run length.Additionally, a simulation study is performed using the data from this register where the type I error of the charts over time is restricted under the null rate.The article concludes with a discussion and recommendations for practice.

Model and data
Following Biswas and Kalbfleisch [2008], consider a hospital with subjects i = 1, 2, ... arriving (entering the study) according to a Poisson process with rate ψ.Let S i denote the time of entry of subject i into the study, relative to the starting time t = 0. Denote by X i the time from entry until failure, such that T i = S i + X i is the chronological time of failure.Consider only right-censored observations, and let R i denote the chronological time of right-censoring of observation i.Let the p−vector Z i denote the relevant covariates of subject i. Assume that there is a known null-distribution for the subject specific time to failure, denoted by the hazard rate h i (x).We make use of the Cox proportional hazards model to incorporate the covariates, such that h i (x) = h 0 (x) exp Z i β β β with regression coefficients β β β and known baseline hazard rate h 0 .Let Y i (t) = 1{S i ≤ t ≤ min{T i , R i }} be an indicator whether subject i is active at time t.Define Ñi (t) = 1{T i ≤ t} and subsequently define N i (t) = t 0 Y i (u)d Ñi (u) for t > 0 as the counting process for an observed failure of subject i. Define N (t) = i≥1 N i (t) as the counting process for the total number of failures observed at the hospital.Define the cumulative intensity of subject i as • exp(θ) and F θ i the associated cumulative distribution function.We call exp(θ) the hazard ratio and say that the process is in control when θ = 0 and out of control when θ > 0. Define Λ(t) = i≥1 Λ i (t) as the total cumulative intensity at the hospital at time t.For aforementioned counting processes, define dN (t) = N (t+dt)−N (t), with dt an infinitesimally small quantity.It follows that: We denote the right-hand side of (1) by dΛ θ i (t).

Continuous Time Generalized Rapid Response CUSUM (CGR-CUSUM)
The CUSUM procedure developed by Biswas and Kalbfleisch [2008] can be used to test whether the hazard rate at a hospital has increased from Λ i to Λ θ i for some fixed and known θ > 0, at some unknown time after the start of the study.This procedure is very useful when there is some prior knowledge about the true hazard ratio e θ , but may lead to delays in detection when this is not the case or when the rate of failure is variable.For this reason we will consider a more general test, where the expected hazard ratio no longer needs to be specified in advance, much like the GLR Statistic in Siegmund and Venkatraman [1995] is a generalization of the original CUSUM procedure of Page [1954].
To achieve this, we test the hypotheses of a sudden change in hazard rate at some unknown study time s > 0, affecting all subjects at risk at time s and thereafter: with θ > 0. Let us find the likelihood ratio for a test of θ = 0 against θ = θ 1 with θ 1 > 0 an unknown constant.The distribution of dN i (t) is approximately Bernoulli distributed with probability λ θ i (t)dt conditional on the history up to time t (e.g.Klein and Moeschberger [2003], Section 3.6).The likelihood of all n observations is then equal to This yields a likelihood ratio statistic at time t of: where θ(t) is the maximum likelihood estimate of θ at time t.This maximum likelihood estimator θ(t) can be determined by maximising the likelihood at a hospital where patients are failing with cumulative intensity e θ Λ i (t) up to time t over θ and is given by: The logarithm of the LR statistic is then given by: Note that this quantity will increase as soon as a failure is observed, and drift downwards at all other times.A preliminary chart is then found by maximising U (t) over the interval [s, t]: where s indicates that the quantity is determined using the information provided by all active patients in the time frame (s, t): In contrast to the method developed by Biswas and Kalbfleisch [2008], it is not straightforward to determine this chart recursively due to the combination of a maximum likelihood estimator over time and the maximisation term over recent time.This makes the chart very computationally expensive.We therefore consider simpler hypotheses: with θ > 0 and ν ≥ 1 both unknown in advance.We then test whether the rate of failure at the hospital has changed to e θ Λ i , starting from some subject ν ≥ 1.These hypotheses make sense in a medical context, where the hazard rate is likely to depend on the entry time of the patient.
Definition 1 The Continuous time Generalised Rapid response CUSUM (CGR-CUSUM) chart is given by: with (subjects sorted according to chronological arrival time): and θ≥ν (t) = max 0, log In the CGR-CUSUM patients prior to the ν−th patient no longer contribute to the chart at all, whereas in G(t) all patients active after time t − s still contribute to the value of the chart.This difference is highlighted in Figure 1 of the Supplementary Materials.To employ a testing procedure, we construct the chart CGR(t) at every relevant time point t and reject the null hypothesis (producing a signal) as soon as CGR(t) ≥ h for some h > 0. This constant h is called the control limit, and can be chosen in accordance to some desired property of the procedure such as the average run length of the chart defined below.
Definition 2 Denote by τ h = inf{t > 0 : CGR(t) ≥ h} the time it takes for a CGR-CUSUM to produce a signal.The average run length (ARL) is then defined as E[τ h ].We refer to the in control average run length as the expected time to detection when exp(θ) = 1 and out of control average run length when exp(θ) > 1.

An approximation to the ARL
In this section we will derive an upper bound for the average run length of the CGR-CUSUM in the out of control case.The maximisation term in (7) poses a significant challenge in approximating the ARL.It turns out that we can derive a bound on the ARL through comparison with a simpler version of the CGR chart.For this reason we consider the Continuous time Generalised Initial response (CGI) CUSUM chart.This chart can be used to test the hypotheses of an initial change in the rate of failure:

Definition 3
The Continuous time Generalized Initial response CUSUM (CGI-CUSUM) with θ(t) as in (3) is given by: Note how the CGI chart is simply the CGR chart without the maximisation term.The CGI-CUSUM is not a chart which should be used in practice as it cannot be used to sequentially detect a changepoint in the process, but instead it is merely a tool for theory.Due to its simpler expression, it is possible to determine the asymptotic distribution of the chart under some assumptions.One of the key assumptions is that subjects arrive according to a Poisson process with rate ψ, allowing us to equate the number of patients to time by n ≈ ψ • t.
The proof of this theorem, the required regularity conditions as well as the derivation of the Fisher information can be found in the Supplementary materials Sections 2, 3 and 4. The usefulness of this result depends on the availability of an expression for I(θ, t).A discussion on how to calculate the Fisher information, as well as some examples for the PVF family of distributions can be found in the Supplementary Materials Section 7. We determine an approximate (asymptotic) average run length for the CGI chart by equating the expected value of the asymptotic distribution to the control limit h.
Lemma 1 We find an approximate average run length ARL CGI (θ, h) for the CGI-CUSUM when exp(θ) > 1 by solving the following equation for t: For exp(θ) = 1, this method yields no approximation to the ARL and it is therefore not possible to determine theoretical control limits which restrict the in control ARL.It is possible to approximate the value of the in control average run length by means of Monte Carlo simulation when it is of interest.Note that due to the convergence requirement, this approximate ARL will not yield good approximations for small values of the control limit h.The theoretical out of control ARL will be evaluated by means of simulation in Section 4.1.
Note that the CGR-CUSUM is simply a CGI-CUSUM maximised over the last n − ν patients.As a result, the CGR-CUSUM is always larger or equal than the CGI-CUSUM.This property allows us to compare the average run lengths of the CGR-and CGI-CUSUM charts.
Remark 1 Suppose that subjects are failing with increased hazard rate Λ i exp(θ) from the beginning of the study.
Then the average run lengths of the charts can be compared as follows: In most practical applications, an upper bound is sufficient as the interest lies in restricting the run time of the chart from above when the failure rate is higher than expected.
Due to the found upper bound, we can now determine the CGI chart on out of control samples in simulation studies to obtain information on the ARL of the CGR chart for comparable samples.This negates the need to construct the CGR chart when approximating the ARL, saving a lot of computation time.Another way to reduce the computation time of the CGR-and CGI-CUSUM charts is given in the following corollary.
Remark 2 The value of the CGR-CUSUM and CGI-CUSUM can only increase at a time point when a failure is observed.As a consequence, for detection purposes it is sufficient to only determine the value of the charts at the times of failure.

The Biswas and Kalbfleisch [2008] CUSUM and CGR-CUSUM
By a priori fixing a value θ 1 > 0 for θ in the chart G(t) (see Equation ( 4)) we would recover the CUSUM procedure developed by Biswas and Kalbfleisch [2008].The biggest advantage of the CGR-CUSUM over the Biswas and Kalbfleisch [2008] CUSUM is that we no longer need to specify this expected hazard ratio, allowing for a more general test requiring less prior knowledge.Besides this, the maximum likelihood estimator allows for updating the parameter to the most recent failure rates.In contrast, the maximum likelihood estimator needs time to converge to the true value, possibly causing delays in detection when compared to the Biswas and Kalbfleisch [2008] CUSUM with correctly specified θ 1 .
Biswas and Kalbfleisch [2008] note that one year post procedure survival outcomes are often employed for medical inspection schemes, and decide to consider subjects as active only for C = 1 year after the procedure.This limitation allows them to derive a closed form approximation to the average run length of the chart.We decide not to disregard the information provided by patients 1 year post procedure.The value of the chart is then based on more complete information, possibly leading to quicker detection times.With this approach, determining an expected run length shorter than C = 1 year is possible, in contrast to Biswas and Kalbfleisch [2008].Our new approach then also leads towards an approximate ARL for the Biswas and Kalbfleisch [2008] CUSUM procedure with the C = 1 limitation relaxed.Further on in this article, we will only consider the Biswas and Kalbfleisch [2008] CUSUM procedure with the C = 1 limitation relaxed, as it is more similar to our CGR chart.We call this chart the BK-CUSUM chart.

Definition 4
The BK-CUSUM is given by: with notation as in (5) where exp(θ 1 ) is the expected hazard ratio chosen in advance.
Taking a similar approach to Section 2.3, it is possible to determine an approximate average run length for the BK-CUSUM procedure.
Corollary 1 Suppose θ 1 is chosen such that exp(θ 1 )/ exp(θ) < θ 1 + exp(−θ).We find an approximate average run length ARL BK (θ, h) by solving the following equation for t: The proof can be found in the Supplementary Materials section 5.
Due to the restriction on θ 1 it is not always possible to use this expression for the approximate ARL.As I(θ, t) is non-negative for every t ≥ 0, the approximate ARL for the CGR and the BK-CUSUM can be compared.It can easily be seen that when θ 1 = θ, the left side of ( 9) is guaranteed to be larger than the left side of (10) for t > 0. This means that when the expected hazard ratio exp(θ 1 ) is misspecified, the approximate ARL of the CGI chart will be smaller than that of the BK-CUSUM chart therefore yielding faster out of control detection speeds.
The difference between the CGR-CUSUM and BK-CUSUM lies in the hypotheses used for constructing the chart, where the CGR-CUSUM is used to detect a change in hazard rate for all patients entering after some patient entry time and the BK-CUSUM to detect a spontaneous change in hazard rates for all patients at risk after some chronological time.This difference is shown visually in Figure 1 of the Supplementary materials.

Application to Dutch Arthroplasty Register (LROI)
We demonstrate the possible gain in detection speed when using the CGR-CUSUM over the BK-CUSUM by applying both methods on a hip replacement data set from the Dutch Arthroplasty Register (LROI).The LROI is the Dutch national registry of all orthopaedic implants (e.g.hip, elbow, wrist, ankle, knee, shoulder, finger, thumb), with a reported completeness of more than 95 percent for registered hip and knee surgical procedures (van Steenbergen et al. [2015], Dutch Arthroplasty Register (LROI) [2020]).

The data set
The data used for the analysis consists of information on total hip replacement surgeries at 97 hospitals across the Netherlands from 01/01/2014 up until 01/01/2020 and was received under agreement LROI 2020 − 053.Available variables are the dates of all primary procedures, time until failure of the prosthesis (our main interest) and/or death of the patient as well as multiple characteristics of each patient which can be found in the Supplementary materials Table 2. Three characteristics of patients had more than 0.5 percent of missing values, which were BMI (1.8%), Smoking indicator (4.5%) and Charnley Score (5.3%).Using the R package mice (van Buuren and Groothuis-Oudshoorn [2011]) we imputed missing values to obtain a complete data set.

Baseline: yearly funnel plot
The current method employed by the LROI for comparing implant surgery performance between hospitals is a yearly risk-adjusted funnel plot over all available data of the recent three to six years.The funnel plot uses one year post surgery failure as binary outcome, therefore not allowing for continuous inspection of the quality of care.van Schie et al.
[2020] have used the funnel plot as the "golden standard" for the LROI, indicating which hospitals had problems in their quality of care.As we have no information on the true failure rate and problems at the hospitals in question, we will compare detection times with the funnel plot as well.

The baseline hazard
In any practical application, the determination of the baseline is of great importance when considering the BK-and CGR-CUSUM charts as this greatly influences the detection speed and false detection rate.In both cases, the baseline is completely fixed by a null hazard rate and the corresponding Cox regression coefficients.In good scientific practice, these quantities should be determined using an in control data set, where failures are known to be happening at an acceptable rate.In reality, it is often difficult to obtain such a set for many different reasons.Because of this we determine the null hazard rate and Cox coefficients using the whole data set as training set.This implies that the average national failure rate over all hospitals is up to the desired standard.The same is done for the funnel plot.
The yearly funnel plot requires a yearly determination of the baseline.To make a fair comparison between the funnel plot and CUSUM methods, we therefore determine the baseline hazard rate, failure proportion and risk-adjustment coefficients using the whole data set restricted to the first 3, 4, 5 and 6 years of information for both methods.This was achieved by using the Cox proportional hazards function in the R package survival developed by Therneau [2021].This way, all methods use the same information for the construction of the charts and thereby all use the same "standard of care".We start with determining parameters at the 3 year margin to have a sufficient amount of information for the determination of the null parameters.

Determining control limits
We determine control limits for the risk-adjusted BK-and CGR-CUSUM charts by restricting the simulated probability of a type I error to 0.05 over a period of 6 years.The procedure is described in the Supplementary materials Section 6. Due to the extremely low failure rate in the data we chose to restrict e θ(t) ≤ 6.This comes down to believing the hazard rate at a hospital cannot be more than 6 times the baseline.Without this limitation, the CGR-CUSUM made very large jumps for patients experiencing near instant failure (first or second day after surgery), almost always leading to detection.The determined control limits and a summary of detection times with respect to the hospitals detected by the funnel plot in the first three years can be found in Table 2.The continuous time procedures have their detection time rounded upwards to the closest month, to show what detection times are realistic when constructing the charts monthly.
We also include the detection times achieved by the monthly Bernoulli CUSUM procedure suggested by van Schie et al.
[2020].They chose to take a control limit of h = 3.5 in correspondence to other literature.Exact detection times for all detected hospitals can be found in Table 1 of the Supplementary materials.

Result: average detection delays
As the Bernoulli CUSUM and funnel plot both use one year post implant failure as outcome and the same risk-adjustment model, they can both detect exactly the same hospitals at the end of the third year.This is different for the continuous time CUSUM charts: both the BK-and CGR-CUSUM do not detect hospitals 5 and 19 and yielded one and two "false" detections respectively.Overall, the continuous time CUSUM procedures yield (much) faster detection times, but also signal very different hospitals than the discrete time methods, especially after the four year mark.There are multiple reasons for this.We will explain some of the possible mechanisms which cause a mismatch in detections between the methods by means of some examples in Figure 1.A general observation is that the Bernoulli CUSUM has a one year delay compared to the continuous time charts.In Figure 1a we can see that hospital 5 was signalled by the funnel plot and Bernoulli CUSUM but not by the BK-and CGR-CUSUM charts.Whereas the continuous time charts show a downward motion after a period of multiple consecutive failures, the Bernoulli CUSUM does not.This is most likely due to the fact that the Bernoulli CUSUM and funnel plot only consider whether an implant has failed within one year, and disregard the time of the death.The BK-CUSUM has a similar problem, where multiple consecutive failures in a short period of time can trigger a false alarm, even if implants fail at reasonable times.A possible example of this can be seen in Figure 1b.We can see that the many consecutive failures make the chart jump upwards by log(2) every time, independent of the probability of failure of those implants at that point in time, thereby rapidly hitting the control limit and afterwards quickly dropping to zero.In contrast to this, the CGR-CUSUM can produce a signal when a few very unlikely failures happen in rapid succession, as can be seen in Figure 1c.We can see that the Bernoulli CUSUM also almost hits the control limit at a later point, as the upward jumps in the Bernoulli CUSUM chart also depend on the likelihood of failure.Finally, Figure 1d shows us a hospital which was only detected by the funnel plot.We can see that the hospital experiences a steady stream of failures as the value of the charts is never zero, meaning the proportion of failures at this hospital is reasonably high.The CUSUM charts however indicate that failures are happening at an acceptable rate (possibly slightly higher than target).
The main take-away from this section is that using the continuous time methods it is possible to detect most hospitals signalled by the discrete time methods (much) faster, while guaranteeing a lower percentage of false positive signals.
This follows from the fact that the type I error probability for the funnel plot was restricted to approximately 0.05 in 3 years, while we chose control limits for the BK-and CGR-CUSUM such that the probability of a type I error in 6 years was 0.05.Coincidentally, the Bernoulli CUSUM control limit of h = 3.5, although chosen with a different reasoning, corresponds closely to limiting the type I error in 3 years to 0.05 (h = 3.3).Additionally, the results found in this section have to be considered in the correct perspective.We cannot simply state whether the right hospitals were detected at all by any of the charts as we have no information about the true failure rates.For this reason, it is crucial to compare the performance of the charts on a set of hospitals where the true performance of the participating hospitals/implants is known.This will be done in the next section by means of comparing the (average) run length of the procedures under restrictions of the ARL and type I errors when hospitals are performing as expected.

Simulation studies
In Section 3 we did not know which hospitals were in control.In order to truly compare the performance of the BK-and CGR-CUSUM charts it is crucial to know which hospitals are out of control.This section will compare the methods when the true failure rates at the hospitals are known by means of simulation studies.
In many practical applications the time to detection after problems occur is of crucial importance in monitoring the quality of the process.Therefore, comparing the performance of inspection schemes in terms of detection speed is important.The expressions found in Section 2 for the approximate average run length of the charts provide a way to compare the charts on a theoretical basis.The equations yielded approximations, and depend strongly on the convergence rate of the maximum likelihood estimate.A simulation study can provide a better picture on the finite sample performance of said methods.Besides the detection speed, other quantities such as the type I error and power over time are of interest and will be considered later on in this section.
In Section 3, we chose to impose an upper limit for the MLE e θ(t) < 6 in the CGR-CUSUM, due to the extremely low failure rates in the LROI data.In this section we also investigate the unrestricted CGR-CUSUM, to investigate the impact of this decision.All simulation studies performed in this section will follow the simulation procedure stated in the Supplementary materials section 6.

A comparison of ARLs
The main goal of this simulation section will be to compare the BK-CUSUM with the new CGR-CUSUM procedure on detection speed for out of control instances.A core assumption of the considered methods is that the change in failure rate happens instantaneously (instead of gradually) and that the true change can be quantified as a fixed increase exp(θ) of the hazard rate.In many practical applications, both assumptions are not likely to hold.We want to examine the effect of wrongly choosing the expected change in failure rate exp(θ 1 ) in the BK-CUSUM.
To this end we consider the CGR-CUSUM and two BK-CUSUM procedures with exp(θ 1 ) = 1.4 and 1.8 respectively.We cannot use equal control limits for all charts as this would lead to different properties under the null.For this reason we determine control limits h for each procedure such that the in control (exp(θ) = 1) average run lengths of the procedures are approximately equal to 15 years on a simulated sample size of N = 3000 hospitals.For the in control hazard we use an exponential distribution with rate λ = 0.002 (time in days), so that approximately half of the subjects have failed 1 year post procedure.We simulate patient entry by a Poisson process with rate ψ = 2.28 (in days), corresponding to the largest hospitals in the LROI data set.For this simulation study, risk-adjustment procedures were not considered.For the out of control situation, we want to explore what happens when the chosen θ 1 is far away from the true value, so we choose true failure rates exp(θ) ∈ {1.2, 1.4, ..., 3} to generate out of control data sets containing N = 3000 hospitals with the arrival rate and null hazard rate as before.The run lengths of the two BK-CUSUM procedures are determined for each out of control data set.We determine the run length of the CGI chart on these data sets, giving us an upper bound on the run length of the CGR chart.The results can be found in Table 1, as well as the expected theoretical value of the run length as determined using equations ( 9) and ( 10).The calculation of the Fisher information for the exponential case is discussed in the Supplementary materials Section 7. A notable result is that at exp(θ) = 1.4 the BK-CUSUM with exp(θ 1 ) = 1.4 clearly performs better than the CGR-CUSUM, but the BK-CUSUM with exp(θ 1 ) = 1.8 performs worse than the CGR-CUSUM.This already indicates that the impact of misspecifying θ 1 can be quite large.Surprisingly, at exp(θ) = 1.8 the CGR-CUSUM outperforms the other two charts with respect to ARL, but has the largest standard deviation in detection times.In contrast, for small values of θ the SD of the BK-CUSUM charts is larger.Finally, for very large values of exp(θ) > 2 the CGR-CUSUM seems to be the clear winner.Noticeably, the run lengths of the BK-CUSUM are way more right-skewed than those of the CGR-CUSUM.This can be explained by the non-variable (θ 1 ) size of jumps the BK-CUSUM charts can make, in contrast to the variable ( θ(t)) jump size of the CGR-CUSUM.All in all, we can conclude that with respect to detection speed the BK-CUSUM is the preferred chart when the true hazard ratio is small (exp(θ) ≤ 1.4) and/or we have a lot of confidence in our prior knowledge.The approximate average run lengths determined using ( 10) and ( 9) seem to work quite well both for the BK-CUSUM as well as for the CGR-CUSUM, especially for large (exp(θ) > 1.2) true hazard ratios.
These simulation results give rise to the presumption that the CGR-CUSUM should perform better when the rate of failure is variable, especially combined with large values of θ.This is also what we saw in Section 3, when we applied the CGR-CUSUM to a real-life data set.

Power under type I error restriction
Instead of restricting the in control ARL, Biswas and Kalbfleisch [2008] and Begun et al. [2019] have chosen to restrict the simulated in control type I error to 0.15 in 5 years and 0.1 in 8 years respectively.Besides this, hospitals vary in size and therefore the number of patients treated per day.This difference in patients treated per time unit, in our model expressed by the parameter ψ, has a strong influence on the detection speed and power of the procedures.
For this reason, in this section we will determine the power over time of two BK-CUSUM procedures (e θ1 ∈ {2, 4}), two CGR-CUSUM (e θ ≤ {∞, 6}) procedures and the Bernoulli CUSUM (e θ1 = 2) for hospitals of different sizes under a restriction on the type I error.We consider 4 groups of hospitals by size, with ψ ∈ {0.2, 0.6, 1, 1.7}.These values were determined by subdividing the hospitals in the LROI data set into four groups by size and averaging over their estimated patient arrival rate, see also Figure 2a.Using the simulation procedure described in Supplementary materials Section 6 with resampling from the LROI data set we find control limits for all considered methods by limiting the risk-adjusted simulated type I error in 6 years to α ≈ 0.1 on N = 500 in control hospitals, see Table 3.Note that the control limits for the unrestricted CGR-CUSUM are very close together for all values of ψ.This is a consequence of no longer bounding the MLE θ(t) from above.
We then simulate N = 500 out of control (exp(θ) = 2) hospitals for each considered value of ψ.The detection times on these data sets are then determined for each chart using the control limits in Table 3.The resulting power over time for the BK-CUSUM (e θ1 = 2), the Bernoulli CUSUM and CGR-CUSUM can be seen in Figures 2b and 2c.The BK-CUSUM with correctly specified parameters clearly has the best power over time for hospitals of all sizes.The CGR-CUSUM performs worse than the Bernoulli CUSUM for low arrival rates, but does better as the arrival rate increases.This is due to the very high value of the control limit for the CGR-CUSUM, causing detections to be delayed.
We also compare the power over time of the BK-CUSUM (e θ = 2) with that of the CGR-CUSUM (e θ(t) ≤ 6) and the BK-CUSUM (e θ = 4) in Figure 2d.In this figure, the CGR-CUSUM is clearly the winner for all hospital sizes.The control limits for the restricted CGR-CUSUM are much smaller than for the unrestricted CGR-CUSUM.This is because the unrestricted CGR-CUSUM can produce extremely large estimates for θ(t), therefore becoming very unstable even in the in control situation.The BK-CUSUM (e θ1 = 4) with incorrectly specified parameters performs the worst for all hospital sizes.Notably, all 3 procedures seem to converge towards the same power over time graph as the arrival rate increases, which was not the case in Figure 2c.We conclude that the CGR-CUSUM can yield the best power over time, but depending on the nature of the data restricting the value of θ(t) might be necessary to achieve such a performance.

Discussion
For almost all applications, the CGR-CUSUM will yield earlier detection times than the BK-CUSUM.This is because in any practical application the change of failure rate at a hospital will never be of a fixed size and most of the time this change will happen gradually instead of instantaneously.For this reason, the CGR-CUSUM will perform better in any practical scenario where the expected true hazard ratio exp(θ) is not known in advance or variable over time.This was demonstrated very clearly in our application of the charts to the LROI data set.The application of the BK-and CGR-CUSUM charts on the LROI data set also showed that in a practical application the CGR-CUSUM outperforms the BK-CUSUM with respect to detection times, while retaining a similar number of "false" detections.It is important to note that we do not know whether the hospitals detected by the funnel plot were the hospitals with "true" problems, instead operating in line with van Schie et al. [2020] by taking the funnel plot as the golden standard.We cannot be sure that the chosen expected hazard ratio for the BK-CUSUM was in line with reality.All in all, we can conclude that the CGR-CUSUM is the preferred method for quality inspection, especially for large arrival rate ψ.
From the simulation study in Section 4.2 we concluded that the CGR-CUSUM can yield better power than the BK-CUSUM, but might require appropriately restricting the values of θ(t).Even though the CGR-CUSUM was created with the goal of specifying less parameters, we believe that bounding θ(t) is often more tractable than correctly specifying the expected hazard ratio.This restriction was necessary for the LROI data, but not all survival data will have an extremely low failure rate and therefore the CGR-CUSUM could also perform well without this restriction, as was seen in Section 4.1.

Recommendations for practice
For practical applications we suggest using the CGR-CUSUM for quality control, keeping in mind that it might be necessary to restrict the maximum likelihood estimate to an appropriate range of values (i.e. e θ(t) ≤ 6).For small volume hospitals the BK-CUSUM could be preferred, as long as there is some prior information about the expected increase in failure rate.This way the small amount of information retained from patients can be partly compensated by prior knowledge.The use of a funnel plot is not advised as it is not a real-time procedure and has the potential disadvantage of an increased risk of a type I error incurred by performing a multiple testing procedure.
We advise determining control limits h for CUSUM charts either by restricting the simulated probability of a type I error over a time frame or by restricting the in control average run length of the charts.The first method may be preferred due to the lower computational requirements.
Ideally, the baseline hazard rate should be determined on a data set which is known to be in control.Realistically, this is unlikely to be feasible in many applications.The practice of considering the national average rate of failure to be in control is often sufficient.An important consideration is that any major change in the distribution of risk factors in the population will require a recalculation of control limits.Whereas information on the failure of patients can be collected in real time, the aggregation of such data over multiple hospitals is not likely to happen in real time.If the risk distribution has changed over this frame of time, it might be necessary to reconstruct the CUSUM charts, possibly leading to new or different detections.

Limitations
In the considered model, we assume that observations can only be right-censored.This is because in the setting of arthroplasty surgery left and interval censoring are of little interest.The same is not true for competing risks mechanisms.Begun et al. [2019] have considered a similar procedure to Biswas and Kalbfleisch [2008] with the addition of frailty terms and competing risks, allowing for dependent competing risks.Even though they could not find an indication that the competing risks of death and revision surgery are dependent in their data, their methods can be carried over to our procedure as well.Should we be interested in detecting a decrease in the rate of failure using the BK-or CGR-CUSUM, a two-sided procedure as suggested by Page [1954] can be considered where the hypotheses of θ = 0 against θ = θ 1 < 0 are used for constructing the likelihood ratio.This yields the CUSUM charts with switched positive and negative signs.

Future work
Additions to the CGR-CUSUM and BK-CUSUM should be considered.Notably, the power of the unrestricted CGR-CUSUM was lacking for hospitals with a low volume of patients.This is largely due to the (relatively) very high value of the control limit of the CGR-CUSUM (see Table 3).These values are so high because the CGR-CUSUM will often have an initial spike upwards when the first failure is observed due to the maximization over previous patients (i.e.all patients before the first failed patient will be ignored).When the volume of patients or failure rate is low, this leads to a large uncertainty in the determination of the MLE θ(t).To counteract this, in Section 3 we introduced the upper limit θ(t) ≤ ln(6).Another solution to the problem would be to impose a time-dependent control limit which is large at the start of the study and decreases until it reaches a fixed value, allowing the CGR-CUSUM to converge before yielding detections.A patient shuffling or weighing mechanism can be added to the CGR-CUSUM chart in order to yield quicker detection in the case of clustered failures in the past.For this, the mechanisms used by Steiner and Jones [2009] and Grigg [2018] can be used as inspiration.Finally, a mechanism where patients have periods when they are not at risk of failure can be incorporated into the chart as well.

Software
Software in the form of an R package, together with a sample input data set and complete documentation is available on CRAN at https://cran.r-project.org/package=success.  1: Average/Median run length, as well as standard deviation and approximate ARL (determined using ( 10) and ( 9)) for two BK-CUSUM with exp(θ) = 1.4 and 1.8, as well as the CGR-CUSUM (exp(θ) = 1) and CGI-CUSUM exp(θ) > 1.Each of the quantities has been determined on a sample of N = 3000 hospitals with hazard ratio exp(θ).3: Control limits determined on a sample size of N = 500 in control (e θ = 1) hospitals such that the type I error in 6 years α ≈ 0.1.with ψ the rate of arrivals and e θ Λ i (t) the true risk-adjusted subject specific cumulative intensity of failure at the institute of interest.

BK-CUSUM
Proof.We consider a hospital with hazard rate e θ times the baseline hazard rate.We define: where: Assume that patients arrive according to a homogeneous Poisson process with rate ψ > 0. We choose to consider the lifetime of all patients up until the time of failure (or censoring).The first step is to calculate the expected value of dΛ(u): Here we use that T i = S i + X i .Then using the law of total expectation twice (conditioning first on Z i and then on S i ): We obtain using Fubini's Lemma that: We define, similarly to above, the notation: We assumed that f θ i and h θ i are non-negative and Borel measurable, therefore using Fubini's Lemma once again we obtain that E[Λ(t)] = t 0 γ u du.

Fisher Information
We derive the Fisher information contained in all observations: Lemma 2. The Fisher information in all observations at time t > 0 is given by: Proof.The likelihood function for all observations is given by: (t) .
The log likelihood ratio is then given by: Taking the derivative w.r.t.θ yields: And the second derivative: Finally, the Fisher information is given by: = e θ e −θ ψ t 0 where we have used our result from Lemma 1.

Convergence of MLE
In this section we reiterate a result from Hoadley [1971] and transform it so it becomes usable in our problem at hand.Lemma 3. Let t > 0 and suppose we have n patients.Let {f θ i , i = 1, ..., n} meet conditions (N1)-(N9) of Hoadley [1971].Then, as n → ∞: with I(θ, t) = I(θ,t) n and I(θ, t) the Fisher information in all observations at time t.Moreover, assuming that n = ψ • t we have that: Proof.Section 4 of [Hoadley, 1971] tells us that under conditions (N 1) − (N 9) as stated in the article our first statement holds.Most of these conditions are likely to hold when h θ i is continuous and twice differentiable.Finally, as patients arrive according to a Poisson process with rate ψ, it is reasonable (especially when ψ is large) to assume that n = t • ψ, as we expect to see ψ patients arrive per time unit.Using this relation we then have that n → ∞ implies that t → ∞ (remember that ψ is constant), therefore using the properties of a normal distribution we obtain the second statement.

A martingale approach
A result similar to Theorem 1 can be proved using a Martingale approach.Theorem 2. As t becomes large, the expected value of the CGI-CUSUM chart is approximated by: E[CGI(t)] ≈ (θe θ − e θ + 1)Λ(t) Proof.The CGR-CUSUM can be written as: Lemma 3 tells us that √ t θ(t) − θ converges to a normal distribution with mean zero and strictly decreasing variance as t becomes large.Applying the delta method we obtain that √ t e θ(t) − e θ converges to a normal distribution with mean zero and strictly decreasing variance e 2θ ψI(θ,t) as t becomes large.As t becomes large, the first two terms of Equation ( 11) will therefore become small.We can then approximate: When θ > 0, e θ Λ(t) is the compensator of N (t) and M (t) := N (t) − e θ Λ(t) is a zero-mean martingale as M (0) = 0 per construction (see Aalen et al. [2011] Section 2.2.5).We rewrite CGR(t): Again, we assume Λ(t) is constant with value E[Λ(t)] = e −θ I(θ, t) (using Lemma 1) to obtain: Equating this expression to h > 0 and solving for t we obtain the same expression for the average run length of the CGR-CUSUM as using the method above.
6 Standard simulation procedure • Step 1: Generating a training (in control) data set with N hospitals.
1. Choose (parametric) null cumulative baseline hazard rate or determine from existing data (for example, using R package survival [Therneau, 2021]).2. Generate patient arrival times in the required time frame using Poisson arrivals with rate ψ. 3. (Optional) Resample patient characteristics from data set.4. Determine (risk-adjusted) survival times for every patient using above chosen cumulative baseline hazard rate using the method described by [Bender et al., 2005]. 5. Repeat 2-4 N times.Combine into single data set.
• Step 2: Determining a suitable control limit h.
1. Determine a (parametric) cumulative baseline hazard rate using the generated in control data set.Optionally, use the cumulative hazard rate from step 1. 2. Construct the charts on the training data set.3. Determine control limit h such that required restrictions (on sensitivity or ARL under the null) are met for the collection of the constructed charts.
• Step 3: Generate test (out of control) data set.
1. Follow step 1 with θ > 0 as required. • Step 4: Evaluate charts on test data set 1. Construct the charts on the test data with the control limit determined in step 2.

Calculation of the Fisher information
There are two hurdles in the calculation of the Fisher information found in Section 2. First of all we need to calculate E Zi F θ i (s) .We propose a method to tackle this calculation in Section 7.1 as well as some examples for the PVF family of distributions.The second hurdle is then calculating the resulting integral t 0 E Zi F θ i (s) ds.For this step, an assumption must be made for the hazard rate.As an example, we calculate a closed form expression for the Fisher information for exponential failure times in Section 7.2.

Risk-adjustment
An approach which can be used to calculate the expected value of the risk-adjusted cumulative distribution function is by using Laplace transforms.
Lemma 4. Assuming that e Z i β β β ∼ U with U a general distribution, the Fisher information at time t is given by: I(θ, t) = ψt − ψ

Numerical computation
For covariate distributions which do not have an (easy) Laplace transform or hazard rates with no closed form expression, we suggest using the relationship: 1 − e −H(t) exp (θ) exp (Z i β β β) with H(t) the cumulative hazard rate.The resulting function can then be numerically integrated first with respect to exp Z i β β β and then with respect to t to obtain I(θ, t).Without risk-adjustment, exp Z i β β β can be left out of the equation.

Hospital nr.
Funnel Median (IQR) difference in detection speed (months) for hospitals detected by funnel plot in the first 3

Figure 1 :
Figure1: The (Bernoulli) CUSUM, BK-CUSUM and CGR-CUSUM charts for four hospitals with their control limits (same color/linetype).The control limits can be found in Table2.

E 0 E
Figure 1: Lexis diagrams of BK-CUSUM and CGR-CUSUM charts.The line segments in bold represent the information used by the chart if both charts were to detect the same change point (dotted vertical line).

Table 2 :
Difference in detection speed (months) of columns with respect to rows.Positive indicating quicker detection and negative indicating slower detection speeds.Values determined on hospitals detected by the funnel plot in the first 3 years, with missing detections omitted.