Abstract

In medical research, understanding changes in outcome measurements is crucial for inferring shifts in health conditions. However, traditional methods often struggle with large, irregularly longitudinal data and fail to account for the tendency of individuals in poorer health to interact more frequently with the healthcare system. Additionally, clinical data can lack information on terminating events like death. To address these challenges, we start from the continuous-time hidden Markov model which models observed data as outcomes influenced by latent health states. Our extension incorporates a point process to account for the impact of health states on observation timings and includes a ‘death’ state to model unobserved terminating events through a Poisson process, where transition rates depend on the latent health state. This approach captures both the severity of the disease and the timing of healthcare interactions. We present an exact Gibbs sampler procedure that alternates between sampling the latent health state paths and the model parameters. By including the ‘death’ state, we mitigate biases in parameter estimation that would arise from solely modelling ‘live’ health states. Simulation studies demonstrate that the proposed Gibbs sampler performs effectively. We apply our method to Canadian healthcare data, offering valuable insights for healthcare management.

1 Introduction

In medical research, observed changes in outcome measurement are used to infer changes in the underlying condition of the patient. Data must be collected over time (longitudinally) on a patient to characterize the effect of risk factors and to evaluate the effectiveness of treatments. For instance, understanding the progression of chronic diseases, such as chronic obstructive pulmonary disease (COPD), is important to inform early diagnosis, personalized care, and health system management. With no existing cure, the severity of COPD increases over time, producing more frequent exacerbations, and increasing use of healthcare (Burge and Wedzicha, 2003). This worsening disease trajectory defines COPD and many other chronic illnesses. Despite this natural history of worsening disease, health system indicators currently describe COPD from a cross-sectional perspective. These traditional measures of service volume, such as hospitalization rates, neglect the temporal disease progression within individuals that is central to managing COPD. Data from clinical and administrative systems have the potential to advance this understanding, but traditional methods for modelling disease progression are not well-suited to analysing high-dimensional data collected at irregular intervals. Many cases of severe illness can be traced back to a lack of appropriate and ongoing follow-up health care. To help understand the long-term causes of severe and preventable illness, we need to understand not only how many patients have severe illness but also what their ‘care paths’ or ‘trajectories’ were before they ended up with severe illness. Private and public healthcare systems now have the ability to maintain longitudinal digital health records. These data from sources, such as electronic health records, health systems and data from mobile health applications, allow the study of subject-specific trajectories, which can inform clinical and public health decision-making. However, data are typically recorded only when a subject interacts with a provider, resulting in irregularly spaced longitudinal observations, and the patterns of clinical interactions vary from subject to subject. Although the data allow dynamic monitoring of the underlying disease progression that governs the data recorded in the health system, this disease progression cannot be observed directly, and so inferential methods are needed to ascertain the latent progression. The statistical analysis of such data, especially when the cohort is large and observations are frequent, is complex.

A multi-state model captures the status of an individual longitudinally as a time-discretization of a continuous-time Markov process, and in many settings the common assumption that the health status of a patient falls into one of a finite number of categories (usually ordered by severity) is a plausible one. Inference for continuous-time Markov processes observed discretely and either precisely or partially has been well studied (see, for example, Bladt and Sørensen, 2005; Pfeuffer, 2017). However, health system records typically do not include direct information about underlying health status; instead, it is inferred indirectly through a proxy measure. Such a process can be modelled as a continuous-time hidden Markov model (CTHMM, e.g. Cappé et al., 2005), which has been studied recently in medical applications such as disease progression (Luo et al., 2021) and cancer screening (Klausch et al., 2023; Lange et al., 2015). While a CTHMM can address the issue of non-equidistant longitudinal observations, in medical research, it is possible that the timing of healthcare utilization itself is informative in the assessment of health status progression. For example, patients who are in more severe health states typically interact with the healthcare system more frequently. A common approach to modelling informative observation times is to use a Poisson process (e.g. Cook and Lawless, 2018). However, if the intensity of the Poisson process depends on the state of a hidden Markov process, it is often referred to as a Markov-modulated Poisson process (MMPP) (e.g. Fearnhead and Sherlock, 2006; Rydén, 1996). There have been studies modelling this observation time process using frequentist approaches, such as Lange et al. (2015) and Mews et al. (2023).

In addition, data from the claim and healthcare system provide no information for terminating events, such as death. However, appropriate parameter inference will only follow if we distinguish a lack of interaction with the healthcare system due to, for example, mild illness from a lack of interaction due to the subject being dead. Nevala et al. (2024) included the death data from a different data system to model colorectal cancer progression. However, to the best of our knowledge, there is no existing method to account for the terminating event, such as death, when it is not available in a dataset and event times are informative. In this article, we develop a Bayesian inference procedure that takes into account the observation process by modelling the observed data as an outcome process and the time of observations as a point process jointly alongside the CTHMM, given the underlying latent health state. Moreover, we also consider the death time as missing data, and model the death as an informative censoring. This extension allows us to model disease severity and death not only from the types of care received but also from the times between different observed events and their respective frequency. It also provides insights on how long patients tend to survive in each state.

The article is organized as follows. Section 2 demonstrate the model structure, followed by the likelihood construction in Section 2.1 and the forward–backward method to calculate the probability of each state at the observation time in Section 3. Section 4 layouts the Gibbs sampler to draw inference for the model, and some empirical studies, including simulation and real data analysis, are in Section 5. The article concludes in Section 6 with a summary of findings and discusses areas for future research.

2 A MMPP with an outcome process

In this section, we describe the continuous-time hidden Markov model with the observation time following a Poisson process. In the sequel, for any collection of scalars, a1,,aT or a1,,aT, we denote the concatenations by a1:T and a1:T, respectively. Thus, for a particular individual we observe the data o1:T:={o1,,oT} at time points τ1:T:={τ1,,τT}. We assume that the data arise as a consequence of a latent continuous-time Markov chain (CTMC) {Xs}s[0,τ] taking values on the finite integer set K:={1,2,,K}. We denote the generator of the latent process by Q and the initial distribution by ν. At observation time τt (t=1,,T), the latent process is in state Xτt, and the observation is ot.

Conditional on the latent process being in state kK, at time τt, the outcome observation ot is drawn from a distribution with a probability mass or density function ft(ot|Xτt=k)  fk(ot|zt,βk), where ztRD is a vector of covariates (including an intercept) and β is a D-vector of covariate effects. In all of our simulations, including the real-data example, the prior for βk, is chosen to be conditionally conjugate to fk so that it is straightforward to sample from the conditional posterior for each βk given Xτ1,,XτT. It is possible to extend inference to the non-conjugate case using the Metropolis–Hastings algorithm; see the online supplementary material for a brief discussion. Define the matrix B=(βd,k) for d=1,,D and k=1,,K as the regression coefficient matrix.

In addition, we allow the observation process itself (that is, the times that the observations are made) to be informative about the latent trajectory. Specifically, let Ns be the number of observations (events of the Poisson process) on [0,s], Ns=t=1T1(τts), so that N0=0 and Nτt=t. We assume that Ns follows a Poisson process whose intensity is λk when Xs=k. The joint process (Ns,Xs) is a MMPP, and inference approaches have been extensively studied in, for example, Rydén (1996) and Fearnhead and Sherlock (2006). To simplify notation, we write X[0,τ] for {Xs}s[0,τ] and N[0,τ] for {Ns}s[0,τ].

Conditional on Xs, the observations O1:T are assumed to be independent of each other and of Ns; also Ns is assumed to evolve independently of Xs except that its intensity is λXs. Therefore, the observation time τt is generated from the MMPP (Ns,Xs) and the outcome Ot is then generated from fXτt(ot|zt,βXτt). Figure 1 provides a schematic of the presumed data generating structure for one subject. Conditional on ν, Q, λ:=(λ1,,λk) and B, data for each subject are assumed to be generated independently of the data for all other subjects. Ideally we observe the process over some time interval [0,τ] for some τ>τT and with τ1>0. However, for each patient, our data start with a diagnosis of COPD, so τ1=0.

Schematic of the presumed data generating mechanism for the Markov-modulated Poisson process with an outcome process Oτt. Xs represents the process underlying the observed data, with observation (event) time points denoted τt,t=1,…,T for a state-dependent Poisson process Ns; Xs represents the hidden Markov process.
Figure 1.

Schematic of the presumed data generating mechanism for the Markov-modulated Poisson process with an outcome process Oτt. Xs represents the process underlying the observed data, with observation (event) time points denoted τt,t=1,,T for a state-dependent Poisson process Ns; Xs represents the hidden Markov process.

2.1 Likelihood for the MMPP with an outcome process

We are interested in inference for the data generated from the MMPP with an outcome process. We now construct the complete-data likelihood for one subject over the time interval [0,τ]. Since subjects are independent, the likelihood for all subjects is the product of the individual likelihoods.

The likelihood function for Q and ν is (Bladt and Sørensen, 2005)

where Nl,m is the number of transitions from state l to state m in the time interval [0,τ], and Rl is the total time that the process has spent in state l in [0,τ]. The values of ql,m are the elements in Q. In addition, the likelihood function for λ is

where Ni is the number of events at state i. Note that the quantities Nl,m, Rl, and Ni all are unobserved, but they can be computed when the underlying process X[0,τ] is known.

The complete likelihood, derived from o1:T, τ1:T and X[0,τ] can be factorized:

Thus,

(1)

If independent Gamma-distributed priors are assigned to all of the components λi and ql,m (see Section 4.1) then, conditional on the complete chains for all individuals, the posteriors for these parameters also have independent Gamma distributions. This permits the use of the Gibbs steps described in Section 4.2. An alternative partial imputation, which necessitates Metropolis–Hastings updates for these parameters, is discussed in the online supplementary material.

2.2 Data-induced window restriction

Our data are of interactions between patients with COPD and the health service during a 17-year window between 1st January 1998 and 31st December 2014. If a patient was diagnosed with COPD within the window then the first observation for that patient is the time of diagnosis. The patient may have interacted with the health service within the 17-year window and before they were diagnosed with COPD, but we do not know about these interactions. Therefore, for any given patient, we set τ1=0; we start the clock at the first interaction. Since we know there will always be an interaction at time 0, this observation does not contribute to the count Nk(τ) in (1). Of necessity, therefore, the initial distribution ν represents an average of the distribution of the state when an individual is first diagnosed with COPD and, for those already had with COPD at the start of 1998, the distribution of the state at their first interaction within the window.

For each patient, we do not have the time of death or, even, an indicator of whether or not death occurred before the end of 2014. If the last interaction with the health service was just before the end of the window then it seems likely the patient was still alive at the end of the window; however, if the last interaction was well before the end of the window and there had been a high frequency of interactions before this then it is likely that the patient died or moved away from the province shortly after the final interaction. In Section 5, for the data analysis and one of the simulation studies, we circumvent this issue by introducing an additional state for the hidden chain, corresponding to death (or moving away). This is an absorbing state and there are no observations whilst in this state.

Figure 2 shows three example patients’ health trajectories. The observation window is between 0 and τ, and assume that all patients have an observation at time 0. Patient 1 died right before τ, and we have the record for the last observation but do not have information for the death. Patient 2, on the other hand, has one additional interaction after the observation window and died afterwards. For these patients, the death happens close to the observation window. Modelling death does not have much impact on estimates of Q or λ. However, Patient 3 has an early death, and the trajectory for observed data is only half of the observation window. For these patients who die early, in particular, we need to model death instead of, for example, assuming the patient is alive until the end of the window.

Example of patients’ health trajectories. Circle represents the observation, and cross represents the death (missing). The end of the observation window is at time τ.
Figure 2.

Example of patients’ health trajectories. Circle represents the observation, and cross represents the death (missing). The end of the observation window is at time τ.

3 The forward–backward algorithm

Before we discuss inference for the model, we first derive the specifics of the general forward–backward algorithm (Baum and Petrie, 1966) during the window [0,τ] which enable us to simulate the unobserved Markov process X[0,τ].

In this section, it will be clearer to denote the end of the observation window by τe rather than τ and to define XτT+1Xτe. Note that τ1:T are random observation times but τe is the fixed end of the window. Just as the event τ1:T is equivalent to having observation-process events at times τ1:T and at no other times in [τ1,τT], the event τ1:T,τe implies, additionally, that there are no observation process events in the interval (τT,τe]. Recall, also, that our data format forces us to set τ1=0, so that the fact that an event happens at time 0 does not form part of the data; τ1 now has the same purpose as τe, defining the boundary of the observation window. However, since there was an event at time τ1 there is an observation, o1, which does need to be accounted for.

We use the notation j:k to indicate all indices between and including j and k; for example oj:k(oj,oj+1,,ok). Thus, we first define the forward variable,

In our case, τ1=0 is guaranteed, so α1,k=νkf(o1|Xτ1=k). Then for t=2,,T,

Further, αe,k=i=1KαT,iP(τe,Xτe=k|XτT=i).

To enact the above recursion we must calculate the quantities, P(τt,Xτt=k|Xτt1=i), t=2,,T, and P(τe,Xτe=k|XτT=i). For t=2,,T, this is the probability of the next event after time τt1 being at time τt and the chain being in state k at this time, all given that the chain was in state i at time τt1. The final quantity differs from this in that there has been no event from just after τT, up to and including τe.

Consider a new process {Yt}tτt1 on {1,,K,}. The absorbing state occurs as soon as there has been at a new event in the Poisson process Ns. Specifically, at time τt1 the process is in state i; i.e. Yτt1=Xτt1. Up until the next event in the Poisson process, Ns, we continue to have Ys=Xs, but from the time of the next Poisson event onwards, Ys=. The process {Ys}sτt1 is a CTMC (Mark and Ephraim, 2013), and its generator is

where Λ=diag(λ1,,λK). Writing Yτ for limsτYs, and defining ηt1,t,i,k:=  [exp{(QΛ)(τtτt1)}]i,k, we see that

Similarly,

Therefore, the recursions for the forward variable are

with αe,k=i=1KαT,iηT,e,i,k.

From its definition, ae,kP(Xτe=k|o1:T,τ1:T,τe) so we may sample a value for xτe with the probabilities of the individual states proportional to the elements of ae. Suppose that we have simulated xτe,xτT,,xτt+1 from their joint distribution given o1:T, τ1:T and τe. We now describe how to simulate xτt backward from its conditional distribution given xτe,xτT,,xτt+1, o1:T, τ1:T and τe. From the hidden Markov structure, conditional on xτt+1, all random variables with time indices before t+1 are independent of those with indices after t+1, so

where τT+1 is understood to be τe. Hence

again using the conditional independence structure. Thus, for t=1,,T1,

since k is known. For t=T, the constant λk term does not appear in the first place, leading to the same result. Finally,

The above recursions enable us to simulate Xτt, t=e,T,,1, conditional on B,ν,Q,λ, o1:T and τ1:T and (no further observations up to) τe. To simulate from X[0,τ], we also need to fill in the gaps between the observation times. The conditional dependence structure of the model means that given Xτt1 and Xτt, X[τt1,τt] is independent of O1:T and of Xτ1,,Xτt2 and Xτt+1,,XτT. Furthermore, given τt1 and τt, X[τt1,τt] is independent of τ1:T. However, the fact that two neighbouring event times, τt1 and τt bracket the time interval of interest means that there are no Ns events in (τt1,τt).

Combining the above information, to simulate from X(τt1,τt) given o1:T,τ1:T and Xτ1,,XτT with Xτt1=j and Xτt=k, we must simulate from Y(τt1,τt) given Yτt1=j and Yτt=k. We employ the uniformization method of Fearnhead and Sherlock (2006) and Rao and Teh (2013).

4 Bayesian inference for the MMPP with an outcome process

We are interested in Bayesian analysis of the MMPP with an outcome process. Bayesian inference for CTHMMs has been explored under various likelihood formulation, such as simulation-based (Luo et al., 2021) and approximation (Williams et al., 2020) methods. To the best of our knowledge, it has never been implemented for an MMPP model with an outcome process under a full Bayesian paradigm. We outline the steps for a Metropolis-within-Gibbs Markov chain Monte Carlo (MCMC) algorithm to draw posterior samples given suitable prior distributions for the parameters. Our scheme uses the complete data log-likelihood in (1) in the spirit of Luo et al. (2021) and Fearnhead and Sherlock (2006). Throughout, we assume that the number of states, K, is fixed and known.

4.1 Prior distributions

As stated in Section 2, the model for the outcome process when Xτt=k, fk(ot|zt,βk) is such that a family of conditionally conjugate priors exists; we choose the prior for each βk from this family, and set π0(B)=k=1Kπ0,k(βk). The initial distribution, ν, is given a Dirichlet prior, so that, conditional on X0=x0, its posterior is also Dirichlet.

For λ and Q, we choose

where Gam(x;a,b) is the density of a Gamma(a,b) random variable, evaluated at x. From (1), conditional on X[0,τ] and τ1:T, the posterior distributions for each λk and qj,k (jk) are mutually independent and all have Gamma distributions.

For λ and Q, it is tempting to represent the absence of knowledge about each parameter through a prior with a low shape parameter, a<1. However, any model with K states includes a model with K1 states, with a particular state, i, never visited. In this case, the data, o1:T and τ1:T, provide no further information on λi and qi,k (ki). Thus, the posteriors for all λ and Q parameters are a mixture including at least one component that is simply the prior. If the prior shape parameter for that parameter is below 1, the prior density rises to as the parameter tends to 0, and, because the posterior is a mixture density that includes the prior, the posterior density also rises to as each parameter 0, which usually does not represent our prior belief and, moreover, can lead to poor mixing of the MCMC chain.

4.2 Exact Gibbs sampler

Each step of our Gibbs sampler simulates a parameter or parameter vector, or an aspect of the latent process, X[0,τ], from its conditional distribution given all of the other information, including τ1:T and o1:T.

The presentation in Sections 2 and 3 focused on a single individual. To make the full process clear, let there be N subjects, and let the nth have Tn observations o1:Tnn at times τ1:Tnn. For subject n, we shift time so that τ1n=0, set τn=ττ1n, where τ corresponds to the end of 2014 and define the latent path to be X[0,τn]n. Each latent path X[0,τn]n gives rise to summary statistics Nl,mn, Rln, and Nin as defined for a specific subject in Section 2.1. We write Θ=(B,λ,Q,ν) and Dn={Tn,τ1:Tnn,O1:Tnn}.

Following Fearnhead and Sherlock (2006); Luo et al. (2021), the Gibbs sampler has the following distinct steps.

  1. For each n in 1,,N, simulate (xτ1nn,,xτTnnn,xτnn) from its conditional distribution given Θ, Dn via the steps in Section 3.

  2. For each n in 1,,N and each t in 1,,Tn, simulate x(τtn,τt+1n) (identifying τT+1n with τn) via the direct-sampling method of Hobolth and Stone (2009) or uniformization in Fearnhead and Sherlock (2006) and Rao and Teh (2013). This provides x[0,τn]n and, hence, the summaries Rln, Nl,mn and Nln, n=1,,N.

  3. Update ν from its conditional Dirichlet posterior given x01:N.

  4. For l=1,,K and ml, update ql,m from its conditional Gamma posterior given Rl1:N and Nl,m1:N.

  5. For k=1,,K, update each λk from its conditional Gamma posterior given Rk1:N and Nk1:N.

  6. For k=1,,K, update βk from its conditional posterior given xτ111,,xτT111,,  xτ1NN,,xτTNNN.

Our first simulation study involves a two-state chain and allows us to demonstrate the utility of both forms of information: the event times and values observed at these times. Our second simulated example, and our real-data example consider a three-state chain, with the third state being death and where the state can only increase. The non-zero transition rates are q1,2, q1,3, and q2,3, and the non-zero observation-process rates are λ1 and λ2.

5 Empirical studies

5.1 Example 1

In this example, we demonstrate the performance of the proposed Gibbs sampler, and investigate the convergence, mixing and inferences on parameters under different scenarios.

For each of N=50 subjects, we simulate a realization of a two-state latent process without the death, X[0,5], using a generator of

and an initial distribution of ν=(0.8,0.2). The outcome process is Gaussian, OtN(βk,1), where βk is a scalar, so that B=(β1,β2). For each individual, the time window is [0,5]; individuals are started from the initial distribution, ν, and the trajectory, events and observations simulated for the entire window. Thus, here, in contrast with our real-data scenario, there is no observation at time 0, and the first forward variable is α0,k=νk.

We consider four scenarios in this example.

  • Scenario I: B=(1,1) and the point process rate λ=(4,12).

  • Scenario II: B=(1,1) and the point process rate λ=(8,8).

  • Scenario III: B=(0.8,1) and the point process rate λ=(4,12).

  • Scenario IV: We use the same B and λ as Scenario III but perform Bayesian inference using a CTHMM (Luo et al., 2021), ignoring the point process.

In Scenario I, both the Poisson intensity and the Gaussian expectation are distinct between states, whereas in Scenario II, while the Gaussian expectations are distinct, the Poisson intensities are the same. In Scenario III, the Gaussian expectations are similar but the intensity rates distinct, while Scenario IV is extremely challenging, with no information from the point process and very little information from the outcome process.

We place non-informative priors on all the parameters in this study. Each of λ1, λ2, q12, and q21 have independent Gamma(1,1/8) priors. We place a N(0,10,000) prior on each element of B, and a Beta(1,1) prior for ν.

Trace plots, auto-correlations plots, and histograms for λ1 and q12 are shown in Figures 3 and 4; Integrated auto-correlation times (IACTs) for λ1, λ2, q1,2, q2,1, β1, and β2 are provided in Table 1. In Scenario I, where the data are generated from a model characterized by clear separation between the states, the Gibbs sampler performs well, with fast convergence and mixing. In Scenario II, where the timing of the events provides no information about the state, the Gibbs sampler’s performance declines slightly, though it still remains promising. The posterior distribution of λ1 is slightly off the true value. Challenges arise when the data are generated from a less well-separated outcome process, leading to notable auto-correlations in the posterior samples, particularly those associated with Q. The reduction in mixing efficiency is considerable; however, the chain does still mix adequately over the 20,000 iterations shown. By contrast, in Scenario IV, when the point-process information is removed from Scenario III, the exploration of the posterior over the 20,000 iterations is so poor that we would judge that the chain has perhaps not yet even converged. Moreover, the posterior for q1,2 includes values that are not strongly supported by the data. This illustrates the most important point: especially in cases where the outcome process is not particularly informative, the information from the timings of the events provides an important contribution to the posterior and, possibly because of the tighter posterior and more clearly defined paths for each X[0,τ], improves the mixing of the chain.

Example 1: Trace, ACF plots, and histogram for λ using the exact Gibbs sampler for different scenarios.
Figure 3.

Example 1: Trace, ACF plots, and histogram for λ using the exact Gibbs sampler for different scenarios.

Example 1: Trace, ACF plots, and histogram for q12 using the exact Gibbs sampler for different scenarios.
Figure 4.

Example 1: Trace, ACF plots, and histogram for q12 using the exact Gibbs sampler for different scenarios.

Table 1.

Example 1: Integrated auto-correlation times (IACTs) of the posterior sample for each parameter in each scenario

Scenarioλ1λ2q1,2q2,1β1,1β1,2
I3.45.07.88.45.44.6
II3.94.812.410.77.17.1
III27.422.952.747.713.314.5
IVn/an/a(257.0)(261.4)(276.2)(384.0)
Scenarioλ1λ2q1,2q2,1β1,1β1,2
I3.45.07.88.45.44.6
II3.94.812.410.77.17.1
III27.422.952.747.713.314.5
IVn/an/a(257.0)(261.4)(276.2)(384.0)

Note. Bracketed terms indicate that the effective sample size was <100, so there is reduced confidence in the estimate of the IACT.

Table 1.

Example 1: Integrated auto-correlation times (IACTs) of the posterior sample for each parameter in each scenario

Scenarioλ1λ2q1,2q2,1β1,1β1,2
I3.45.07.88.45.44.6
II3.94.812.410.77.17.1
III27.422.952.747.713.314.5
IVn/an/a(257.0)(261.4)(276.2)(384.0)
Scenarioλ1λ2q1,2q2,1β1,1β1,2
I3.45.07.88.45.44.6
II3.94.812.410.77.17.1
III27.422.952.747.713.314.5
IVn/an/a(257.0)(261.4)(276.2)(384.0)

Note. Bracketed terms indicate that the effective sample size was <100, so there is reduced confidence in the estimate of the IACT.

5.2 Example 2

In this example, we demonstrate valid inference for simulated data with a similar pattern to our real data. We generate the data from a three-state model which is similar to that used for our real data, with N=500 subjects. For each subject, we simulate a realization of a three-state latent process with the death as the absorbing state, X[0,10],

so trajectories can only progress forward with no chances to transit back to previous states. The outcome process is OtPoisson(μt) with logμt=B1,k+ztB2,k when in state k, and where the time-varying covariate is ZtBernoulli(0.65); we set

This is equivalent to Ot|(Zt=1)Poisson(μ1,k) and Ot|(Zt=0)Poisson(μ0,k), with logμ1,k=B1,k+B2,k and logμ0,k=B1,k. In this case, the outcome expectations (marginalized over the covariate) are 0.35μ0,k+0.65μ1,k, which are 0.46 and 1.71 for the two states, respectively. The point process rate is set as λ=(6,10,0). State 3 is the death state without any observed outcome. If the patient progresses to the absorbing state before the observation window, i.e. death, then the last observation is the final one before death. Otherwise, the last observation is the one right before the observation window, τ=10. We assumed that the first observation is made at time 0 according to the initial distribution. Similar to Example 1, we place non-informative priors on all the parameters in this study. Each of λ1, λ2, q12, and q21 have independent Gamma(1,1/8) priors. We place independent Gamma(0.1,0.1) priors on each of the four Poisson expectations, μi,k, i=0,1, k=1,2, and a Beta(1,1) prior for ν1.

Table 2 presents the integrated auto-correlation times (IACTs) for λ, Q, and B over 20,000 iterations and shows that all of these parameters mix well. Figure 5 presents a histogram of the marginal posterior for each of these parameters, with each true parameter value falling within the main posterior mass. This example demonstrate that by simulating the data-induced window similar to the real data, we can achieve valid inference.

Example 2: marginal histograms of q12, q1,3, q2,3, λ1, and λ2; the associated vertical red lines depict the true parameter value.
Figure 5.

Example 2: marginal histograms of q12, q1,3, q2,3, λ1, and λ2; the associated vertical red lines depict the true parameter value.

Table 2.

Example 2: Integrated auto-correlation times (IACTs) of the posterior sample for each parameter

λ1λ2q1,2q1,3q2,3β1,1β1,2β2,1β2,2
3.03.03.23.53.13.43.73.23.0
λ1λ2q1,2q1,3q2,3β1,1β1,2β2,1β2,2
3.03.03.23.53.13.43.73.23.0
Table 2.

Example 2: Integrated auto-correlation times (IACTs) of the posterior sample for each parameter

λ1λ2q1,2q1,3q2,3β1,1β1,2β2,1β2,2
3.03.03.23.53.13.43.73.23.0
λ1λ2q1,2q1,3q2,3β1,1β1,2β2,1β2,2
3.03.03.23.53.13.43.73.23.0

5.3 Application: Canadian healthcare system data

In this section, we delve into the proposed methodology designed to enhance our understanding of the progression of chronic diseases, particularly COPD. The significance of this understanding lies in its potential to help with effective management of healthcare systems and pave the way for methods of early diagnosis and personalized care, as the model can help the management team understand the ‘care path’ or ‘trajectory’ of the cohort before they become severely ill. While clinical and administrative data harbour valuable insights, traditional methods for modelling disease progression face limitations when dealing with data collected at irregular intervals and interaction frequency, such as when patients choose to engage with the healthcare system.

To address these challenges, our proposed methodology is applied to a COPD cohort derived from healthcare administrative data. An open, dynamic cohort was established, comprising a 25% random sample of Montreal residents in Canada. In 1998, a similar random sample was drawn from the Régie de l’assurance maladie du Québec (RAMQ)-registered population within the census metropolitan area of Montreal. Subsequently, at the onset of each subsequent year, 25% of individuals born in Montreal or relocating to Montreal within the past year were sampled to maintain cohort representativeness. Follow-up concluded upon death or a change in residential address outside Montreal; however, neither death nor the time of address change is recorded in our data. This administrative database encompasses outpatient diagnoses and procedures submitted through RAMQ billing claims, as well as inpatient claims for procedures and diagnoses. Drug dispensing data are available for individuals covered by drug insurance through RAMQ, which includes roughly half the population, especially all those aged 65 and above. All data are interconnected through an anonymized version of the RAMQ number, and administered by the Surveillance Lab at McGill Clinical & Health Informatics research group.

Utilizing established case definitions based on diagnostic codes (Lix et al., 2018), a total of 76,888 COPD patients were enrolled, with an incident event (ICD-9 491x, 492x, 496x; ICD-10 J41–J44) occurring after a minimum of 2 years at risk with no events. The observation period spanned from January 1998, commencing with the patients’ first diagnosis, until December 2014. Physicians observed these patients solely during medical visits, which transpired when patients chose to interact with the healthcare system. In these medical visits, pertinent medical information was gathered, and from this observed information, we aim to infer the patients’ unobserved disease status, which is represented using a discrete disease state model. In this study, the disease status was indirectly gauged through a proxy: the number of prescribed medications at the time of observation. Notably, this information was available exclusively for patients with drug insurance, leading us to focus on patients aged over 65 years, constituting the study population for our analysis. This full dataset has been previously analysed in Luo et al. (2021). Due to the data restriction, we are allowed to randomly sample 1,000 patients to demonstrate our proposed method. Figure 6 shows one COPD patient’s follow-up from this dataset. As shown in the figure, we observe the number of drugs prescribed as the observation process. In addition, we have also observe variations in the type of encounters, i.e. general practitioner visit (GP), hospitalization (HOSP), specialist visit (SPEC), and emergency department visit (ED), which will be incorporated as time-varying covariates in our model. These changes in the observation point process, transitioning from sparse to dense, reflect the evolving severity of the chronic disease under study. In the dataset, the average number of drugs prescribed from a visit for GP, ED, HOSP, and SPEC is 5.06, 5.57, 5.28, and 5.17 respectively, suggesting little deviation among the types of healthcare utilization. Over the 1,000 patients, the average time from the first recorded interaction until the last observation is about 13.71 years.

COPD patient’s trajectory from the Canadian healthcare system data.
Figure 6.

COPD patient’s trajectory from the Canadian healthcare system data.

We apply the model from Section 2 to this sample dataset, with the inference methodology from Section 4. We focus on a three-state model with State 3 (the unobserved death state) as an absorbing state. We place independent Gamma(1,1/8) priors on each λi and all elements in Q, and a Gamma(0.1,0.1) prior on each expected number of drugs dispensed for each healthcare utilization, and a Beta(1,1) prior for (ν1,ν2), with ν3=0. The outcome process is modelled with a Poisson distribution, with the types of healthcare utilization (HOSP, SPEC, GP, and ED) as a time-dependent covariate; equivalently, for each of States 1 and 2 there are four parameters, indicating the expected number of drugs dispensed for each of the four values of the covariate.

We ran the algorithm for 20,000 iterations; for all parameters, the estimated IACTs were <5, indicating excellent mixing of the MCMC algorithm. Table 3 displays the estimates for all of the model parameters. The narrow 95% credible intervals associated with these estimated parameters indicate low uncertainty within the estimated densities. The posterior median for the number of prescribed drugs rises by about a factor of four from State 1 to State 2, reflecting the worsening severity of patients’ health conditions. In State 2, expected drug dispensations for emergency department visits and hospitalizations are slightly higher than other types of visits. Point process rates are roughly 35% higher in State 2 than in State 1, suggesting increased healthcare system interactions during poorer health conditions. This indicates a correlation between disease severity and healthcare utilization, with State 2 indicating the higher level of healthcare engagement among patients. In the state transition dynamics, the median stay in State 1 is roughly 2.3 years and the probability that a patient is still in State 1 after 4 years is approximately 0.30. When a transition from State 1 occurs, the probability that it will be directly to the death state is 0.23. The estimates for State 2 suggest a more persistent state (median stay being over 4 years) and potentially more severe state of illness. Patients in State 1 are less likely to die (or move from the province) in any given year, compared to those in State 2. The median time to death in State 2 is 4.3 years. We also notice that ν1ν2 meaning that conditional on a healthcare visit, patients are almost equally likely to start in each of the two states in the healthcare system.

Table 3.

Posterior median of MMPP parameters associated with 95% credible intervals

ParameterState 1State 2
λ (Marked point process rate)5.07 (4.97, 5.16)6.82 (6.72, 6.93)
ν (initial state distribution)0.54 (0.51, 0.58)0.46 (0.42, 0.49)
Expected number of drugs dispensed (GP)2.12 (2.10, 2.14)7.28 (7.26, 7.31)
Expected number of drugs dispensed (ED)1.72 (1.67, 1.78)7.47 (7.43, 7.53)
Expected number of drugs dispensed (HOSP)1.77 (1.72, 1.81)7.36 (7.31, 7.41)
Expected number of drugs dispensed (SPEC)1.79 (1.75, 1.85)7.16 (7.10, 7.21)
q1. (Transition rate from State 1)0.23 (0.21, 0.25)
q.3 (Transition rate to death state)0.07 (0.06, 0.08)0.16 (0.14, 0.17)
ParameterState 1State 2
λ (Marked point process rate)5.07 (4.97, 5.16)6.82 (6.72, 6.93)
ν (initial state distribution)0.54 (0.51, 0.58)0.46 (0.42, 0.49)
Expected number of drugs dispensed (GP)2.12 (2.10, 2.14)7.28 (7.26, 7.31)
Expected number of drugs dispensed (ED)1.72 (1.67, 1.78)7.47 (7.43, 7.53)
Expected number of drugs dispensed (HOSP)1.77 (1.72, 1.81)7.36 (7.31, 7.41)
Expected number of drugs dispensed (SPEC)1.79 (1.75, 1.85)7.16 (7.10, 7.21)
q1. (Transition rate from State 1)0.23 (0.21, 0.25)
q.3 (Transition rate to death state)0.07 (0.06, 0.08)0.16 (0.14, 0.17)
Table 3.

Posterior median of MMPP parameters associated with 95% credible intervals

ParameterState 1State 2
λ (Marked point process rate)5.07 (4.97, 5.16)6.82 (6.72, 6.93)
ν (initial state distribution)0.54 (0.51, 0.58)0.46 (0.42, 0.49)
Expected number of drugs dispensed (GP)2.12 (2.10, 2.14)7.28 (7.26, 7.31)
Expected number of drugs dispensed (ED)1.72 (1.67, 1.78)7.47 (7.43, 7.53)
Expected number of drugs dispensed (HOSP)1.77 (1.72, 1.81)7.36 (7.31, 7.41)
Expected number of drugs dispensed (SPEC)1.79 (1.75, 1.85)7.16 (7.10, 7.21)
q1. (Transition rate from State 1)0.23 (0.21, 0.25)
q.3 (Transition rate to death state)0.07 (0.06, 0.08)0.16 (0.14, 0.17)
ParameterState 1State 2
λ (Marked point process rate)5.07 (4.97, 5.16)6.82 (6.72, 6.93)
ν (initial state distribution)0.54 (0.51, 0.58)0.46 (0.42, 0.49)
Expected number of drugs dispensed (GP)2.12 (2.10, 2.14)7.28 (7.26, 7.31)
Expected number of drugs dispensed (ED)1.72 (1.67, 1.78)7.47 (7.43, 7.53)
Expected number of drugs dispensed (HOSP)1.77 (1.72, 1.81)7.36 (7.31, 7.41)
Expected number of drugs dispensed (SPEC)1.79 (1.75, 1.85)7.16 (7.10, 7.21)
q1. (Transition rate from State 1)0.23 (0.21, 0.25)
q.3 (Transition rate to death state)0.07 (0.06, 0.08)0.16 (0.14, 0.17)

We also ran the Gibbs sampler without the point process as with Scenario IV in Example 1. The parameter estimates are reproduced in the online supplementary material. They show that ignoring the point process information makes patients seem to remain in State 1 for longer. The other substantial difference is that the transition rate to the death state from State 1 is slightly greater than that from State 2. We also fitted a four-state model, again with only transitions to higher states allowed, and now with State 4 (the unobserved death state) as an absorbing state. We compared a Laplace-based approximation of its marginal likelihood with a similar approximation to the marginal likelihood of the three-state model; see the online supplementary material for details. The estimated marginal likelihoods for three-state and four-state models are 2,702.93 and 2,687.05, respectively, leading to corresponding BICs of 5,502.57,5,612.19. This shows that the three-state model is preferable; results for the four-state model are also presented in the online supplementary material.

This inferred state dynamic provides valuable insights for healthcare management, offering guidance on the performance of current COPD interventions. The tendency for patients in State 1 to remain in that state initially before transitioning to State 2 underscores the need for interventions aimed at preventing or delaying disease progression, particularly during the early stages. For patients already in State 2, the observed persistence suggests a potentially more challenging treatment landscape, emphasizing the importance of comprehensive and long-term management strategies. Understanding these state transition dynamics informs healthcare providers and policymakers about the effectiveness of existing interventions and guides the development of targeted strategies to improve patient outcomes. By identifying critical transition points and patient populations at higher risk of disease progression, healthcare resources can be allocated more effectively to provide timely and appropriate interventions, ultimately improving the management and care of individuals with COPD. However, it is important to note that there is no guarantee that the latent state and trajectory groupings will directly correspond to the underlying biologic processes or disease stages as our initial model is built without clinical measurements. Therefore, the identified states correspond to the general health conditions of COPD patients, implying that a higher disease level or intensity of healthcare encounters does not strictly indicate a deterioration in COPD but could signify any temporary or prolonged period of poor health condition. By using this Canadian cohort dataset, the initial latent groupings are expected to generate hypotheses about how the patients are being managed.

6 Discussion

In this article, we have considered a generalization of both the hidden Markov model and the MMPP to an MMPP with an outcome process, and we have proposed an exact Gibbs sampler to facilitate Bayesian inference. Our data model also incorporates an unobserved terminating event (death or moving away from the area).

This advance is driven by the need to analyse longitudinal data and model an unobserved death so as to comprehend the progression of a disease processes. A critical decision revolves around whether observation times are considered non-informative or informative, on the state of the Markov chain, which relates to the disease severity. Neglecting to account for an informative observation process in the disease dynamics or to incorporate an unobserved terminating event, such as death, can potentially lead to biased parameter estimates. With the large amount of claim data and healthcare data available, there is a pressing need to work on appropriate statistical modelling framework for longitudinal healthcare studies. The proposed technique for inferring death prevents the bias that would arise if a model without death were employed with a shortened observation window only up to the final event time. Our approach offers a framework for analysing complex healthcare data, providing insights into disease progression and patient health/death states that can inform clinical decision-making and healthcare interventions.

Since our algorithm is a Gibbs sampler, it requires no tuning parameters and could be used ‘off the shelf’ by practitioners. Simulation studies indicate the effectiveness of the Gibbs sampler and show how the event times contribute vital information, especially when the outcome process lacks a clear separation between the states. By integrating observation times and death into the model, we enhance our understanding of disease dynamics and healthcare utilization patterns, thus advancing the methodology for analysing longitudinal data in healthcare research. In our application, we imposed a constraint that the process can only progress forward since COPD gets progressively worse over time (Cleveland Clinic, 2024). However, one could make no restrictions on the allowed transitions in the state space. In the absence of other information, the hidden state would then be non-identifiable with all permutations of the state labels equivalent. To avoid the ensuing identifiability issue, meaningful priors could be placed on the outcome models for each state or the outcome process event rates—that is, we would distinguish the states via their impact on the observable quantities or imposing certain restrictions on Q (see Section 4 in Luo et al., 2021 for a detailed discussion).

Although the proposed model possesses a flexible structure, there remains some scope for future research to explore several possibilities in analysing such irregularly spaced longitudinal data. The current setup assumes time-homogeneous processes Xs and Ns. Both processes may plausibly depend on time. For instance, within any given health state, older patients might engage more frequently with the healthcare system. However, introducing complex structures to model the transition in Xs poses computational challenges in both frequentist and Bayesian inference (see for example, Kendall et al., 2024, and a discussion in the online supplementary material of this article). In addition, the issue of identifying the number of hidden states in MMPPs has been a long-standing challenge. In the context of HMMs, this topic has been explored using approaches such as marginal likelihood, cross-validated likelihood, and complete-data likelihood (see, Celeux and Durand, 2008; Luo and Stephens, 2021; Pohle et al., 2017); however, model selection criteria based on these quantities become unreliable as the dimensionality increases beyond 50 (Chen and Chen, 2008). Moreover, while the current model categorizes patients into discrete states, which is appropriate for some chronic diseases thought to have discrete stages of progression, relaxing the assumption of discrete states could yield insights. Relaxing the assumption that patients evolve through discrete states of underlying disease progression, would allow estimation of a continuum of severity that may better represent the trajectory of COPD and other chronic diseases. Substantial advances in statistical methodology would be required to extend this approach to a continuous state space and to allow the latent state distribution to have an interpretation as a continuous index or score. For example, one could use the features included in comorbidity indices to measure multimorbidity in terms of the ability to predict future mortality and health services use. The inference problem for this type of assumption involves multivariate diffusion processes, using discrete-time data that may be incomplete and affected by measurement errors (e.g. Golightly and Sherlock, 2022).

Acknowledgments

The authors would like to thank Prof. David Buckeridge from the Surveillance Lab at McGill Clinical & Health Informatics research group for granting us access to the real data that enabled us to demonstrate the empirical study in this article. The authors would also like to thank two referees for their constructive comments, which have improved the content and clarity of this article.

Data availability

Real data used in this article to illustrate our findings are not shared as restrictions apply to the availability of these data, which were used under license for this study. Simulated data related to Example 1 are generated by the code supplied in the online supplementary material.

Supplementary material

Supplementary material is available online at Journal of the Royal Statistical Society: Series C.

References

Baum
 
L. E.
, &
Petrie
 
T.
(
1966
).
Statistical inference for probabilistic functions of finite state Markov chains
.
The Annals of Mathematical Statistics
,
37
(
6
),
1554
1563
.

Bladt
 
M.
, &
Sørensen
 
M.
(
2005
).
Statistical inference for discretely observed Markov jump processes
.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
,
67
(
3
),
395
410
.

Burge
 
S.
, &
Wedzicha
 
J.
(
2003
).
COPD exacerbations: Definitions and classifications
.
European Respiratory Journal
,
21
(
41 suppl
),
46s
53s
.

Cappé
 
O.
,
Moulines
 
E.
, &
Rydén
 
T.
(
2005
).
Inference in hidden Markov models
,
Springer Series in Statistics
,
Springer
.

Celeux
 
G.
, &
Durand
 
J.-B.
(
2008
).
Selecting hidden Markov model state number with cross-validated likelihood
.
Computational Statistics
,
23
(
4
),
541
564
.

Chen
 
J.
, &
Chen
 
Z.
(
2008
).
Extended Bayesian information criteria for model selection with large model spaces
.
Biometrika
,
95
(
3
),
759
771
.

Cleveland Clinic
(
2024
). Chronic obstructive pulmonary disease (COPD). https://my.clevelandclinic.org/health/diseases/8709-chronic-obstructive-pulmonary-disease-copd. Date accessed December 30, 2024.

Cook
 
R. J.
, &
Lawless
 
J. F.
(
2018
).
Multistate models for the analysis of life history data
.
Chapman and Hall/CRC
.

Fearnhead
 
P.
, &
Sherlock
 
C.
(
2006
).
An exact Gibbs sampler for the Markov-modulated Poisson process
.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
,
68
(
5
),
767
784
.

Golightly
 
A.
, &
Sherlock
 
C.
(
2022
).
Augmented pseudo-marginal Metropolis–Hastings for partially observed diffusion processes
.
Statistics and Computing
,
32
(
1
),
21
.

Hobolth
 
A.
, &
Stone
 
E. A.
(
2009
).
Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution
.
The Annals of Applied Statistics
,
3
(
3
),
1204
1231
.

Kendall
 
E. B.
,
Williams
 
J. P.
,
Hermansen
 
G. H.
,
Bois
 
F.
, &
Thanh
 
V. H.
(
2024
).
Beyond time-homogeneity for continuous-time multistate Markov models
.
Journal of Computational and Graphical Statistics
,
1
15
.

Klausch
 
T.
,
Akwiwu
 
E. U.
,
van de Wiel
 
M. A.
,
Coupé
 
V. M.
, &
Berkhof
 
J.
(
2023
).
A Bayesian accelerated failure time model for interval censored three-state screening outcomes
.
The Annals of Applied Statistics
,
17
(
2
),
1285
1306
.

Lange
 
J. M.
,
Hubbard
 
R. A.
,
Inoue
 
L. Y.
, &
Minin
 
V. N.
(
2015
).
A joint model for multistate disease processes and random informative observation times, with applications to electronic medical records data
.
Biometrics
,
71
(
1
),
90
101
.

Lix
 
L.
,
Ayles
 
J.
,
Bartholomew
 
S.
,
Cooke
 
C.
,
Ellison
 
J.
,
Emond
 
V.
,
Hamm
 
N.
,
Hannah
 
H.
,
Jean
 
S.
, &
LeBlanc
 
S.
(
2018
).
The Canadian chronic disease surveillance system: A model for collaborative surveillance
.
International Journal of Population Data Science
,
3
(
1
),
1
11
.

Luo
 
Y.
, &
Stephens
 
D. A.
(
2021
).
Bayesian inference for continuous-time hidden Markov models with an unknown number of states
.
Statistics and Computing
,
31
(
5
),
57
.

Luo
 
Y.
,
Stephens
 
D. A.
,
Verma
 
A.
, &
Buckeridge
 
D. L.
(
2021
).
Bayesian latent multi-state modeling for non-equidistant longitudinal electronic health records
.
Biometrics
,
77
(
1
),
78
90
.

Mark
 
B. L.
, &
Ephraim
 
Y.
(
2013
).
An EM algorithm for continuous-time bivariate Markov chains
.
Computational Statistics & Data analysis
,
57
(
1
),
504
517
.

Mews
 
S.
,
Surmann
 
B.
,
Hasemann
 
L.
, &
Elkenkamp
 
S.
(
2023
).
Markov-modulated marked Poisson processes for modeling disease dynamics based on medical claims data
.
Statistics in Medicine
,
42
(
21
),
3804
3815
.

Nevala
 
A.
,
Heinävaara
 
S.
,
Sarkeala
 
T.
, &
Kulathinal
 
S.
(
2024
).
Bayesian hidden Markov model for natural history of colorectal cancer: Handling misclassified observations, varying observation schemes and unobserved data
.
The Annals of Applied Statistics
,
18
(
4
),
3050
3070
.

Pfeuffer
 
M.
(
2017
).
ctmcd: An R package for estimating the parameters of a continuous-time Markov chain from discrete-time data
.
R Journal
,
9
(
2
),
127
141
.

Pohle
 
J.
,
Langrock
 
R.
,
van Beest
 
F. M.
, &
Schmidt
 
N. M.
(
2017
).
Selecting the number of states in hidden Markov models: Pragmatic solutions illustrated using animal movement
.
Journal of Agricultural, Biological and Environmental Statistics
,
22
(
3
),
270
293
.

Rao
 
V.
, &
Teh
 
Y. W.
(
2013
).
Fast MCMC sampling for Markov jump processes and extensions
.
Journal of Machine Learning Research
,
14
(
1
),
3295
3320
.

Rydén
 
T.
(
1996
).
An EM algorithm for estimation in Markov-modulated Poisson processes
.
Computational Statistics & Data Analysis
,
21
(
4
),
431
447
.

Williams
 
J. P.
,
Storlie
 
C. B.
,
Therneau
 
T. M.
,
Jack
 
C. R.
 Jr
, &
Hannig
 
J.
(
2020
).
A Bayesian approach to multistate hidden Markov models: Application to dementia progression
.
Journal of the American Statistical Association
,
115
(
529
),
16
31
.

Author notes

Conflicts of interest The authors have no conflict of interest to disclose.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data