A new relation between prevalence and incidence of a chronic disease

In 1991 Keiding published a relation between the age-specific prevalence and incidence of a chronic disease (in Age-specific incidence and prevalence: a statistical perspective. J. Roy. Stat. Soc. A, 154, 371–412). For special cases alternative formulations by differential equations were given recently in Brinks et al. (2013, Deriving age-specific incidence from prevalence with an ordinary differential equation. Statist. Med., 32, 2070–2078) and in Brinks & Landwehr (2014, Age- and time-dependent model of the prevalence of non-communicable diseases and application to dementia in Germany, Theor. Popul. Biol., 92, 62–68). From these works, we generalize formulations and discuss the advantages of the novel approach. As an implication, we obtain a new way of estimating the incidence rate of a chronic disease from prevalence data. This enables us to employ cross-sectional studies where otherwise expensive and lengthy follow-up studies are needed. This article illustrates and validates the novel method in a simulation study about dementia in Germany.


Introduction
One of the objectives of epidemiology is the description of health-related states and events in populations. To achieve this objective, incidence and prevalence are important quantitative concepts. Incidence refers to the occurrence of new cases in a specific health-related state during a time period, whereas prevalence measures the proportion of subjects who are in the state at a point in time. Both measures are fundamental in epidemiological research.
For analysing quantitative aspects of infectious diseases, state models (synonymously: compartment models) are widely used and have a history going back at least to the 1920s (see, for example, Brauer, 2005). With respect to chronic diseases, compartment models are less common and have appeared later (Fix & Neyman, 1951). The infrequent use of mathematical models in this field is in contrast to the tremendous worldwide burden of chronic diseases. For example, two-thirds of all global cases of death in 2010 have been attributed to chronic diseases (Lozano, 2012). Hence, we feel the urgent need to contribute to the mathematical understanding of the worldwide epidemics of chronic diseases.
A typical model in the epidemiology of chronic diseases considers a population in three states: healthy (H), diseased (I) and dead (D) (Keiding, 1991). Subjects of the population may undergo irreversible transitions between these states as shown in Fig. 1. The transition rates are the incidence $i$, and the mortalities $m_0$ and $m_1$ of the healthy and the diseased subjects, respectively.
In many situations, it is important to keep track of different time scales (Keiding, 2006). Mortality, for instance, crucially depends on the age of the subjects, but also on secular progress in hygiene, nutrition and medical care. Hence, the rates $i$ and $m_0$ may depend on age $a$ and on calendar time $t$. In addition, the rate $m_1$ may also depend on the duration $d$ of the disease (Fig. 1).
In the literature, two approaches to the state model and its transition rates can be found. Keiding (1991) chose a stochastic nomenclature, whereas the group around Murray & Lopez preferred differential equations (Murray & Lopez, 1994, 1996; Barendregt et al., 2003). In line with the long tradition of differential equations in modelling infectious diseases, we follow the differential equation approach. We show that this approach recovers Keiding's result (Keiding, 1991).

Methods
Let $S(t, a)$ denote the absolute number of subjects aged $a$ at time $t$ in state H. Moreover, let $C(t, a, d)$ be the number of people aged $a$ at time $t$ who have been in state I for exactly duration $d$. The total number of subjects aged $a$ at $t$ who have the chronic disease is $C^*(t, a) = \int_0^a C(t, a, \delta)\, d\delta$. In this article, we make three assumptions:
1. The population is closed, i.e. there is no migration.
2. We consider only diseases contracted after birth. Thus, it holds $C^*(t, 0) = 0$ for all $t$.
3. The functions $S$ and $C$ are sufficiently smooth.

Keiding's equation
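If we look at the change rates of the subjects in the states, balance equations for $S$ and $C$ can be formulated as follows:

$$(\partial_t + \partial_a)\, S = -(m_0 + i)\, S \tag{2.1}$$

$$(\partial_t + \partial_a + \partial_d)\, C = -m_1\, C \tag{2.2}$$

For ease of notation we have written $\partial_x = \partial/\partial x$ for $x \in \{t, a, d\}$. Equations (2.1-2.2) are partial differential equations (PDEs) that describe the outflows from the states H and I, respectively. The first of these equations implies that leaving the state H is a competing risk of the events "death without having contracted the disease" and "contracting the disease" (Putter et al., 2007). The system of PDEs (2.1-2.2) is extended by the following initial conditions:

$$S(t - a, 0) = S_0(t - a), \qquad C(t, a, 0) = i(t, a)\, S(t, a).$$

The first initial condition describes the number of (disease-free) newborns, and the second describes the number of newly diseased persons at $(t, a)$. The PDEs with the initial conditions have the following solutions:

$$S(t, a) = S_0(t - a)\, \exp\left(-\int_0^a (m_0 + i)(t - a + \tau, \tau)\, d\tau\right) \tag{2.3}$$

$$C(t, a, d) = i(t - d, a - d)\, S(t - d, a - d)\, \exp\left(-\int_0^d m_1(t - d + \tau, a - d + \tau, \tau)\, d\tau\right)$$

For brevity we define

$$M_{t,a}(y) = \exp\left(-\int_0^y (m_0 + i)(t - a + \tau, \tau)\, d\tau\right).$$

Then, the total number $C^*(t, a) = \int_0^a C(t, a, \delta)\, d\delta$ of diseased subjects is

$$C^*(t, a) = S_0(t - a) \int_0^a i(t - \delta, a - \delta)\, M_{t,a}(a - \delta)\, \exp\left(-\int_0^\delta m_1(t - \delta + \tau, a - \delta + \tau, \tau)\, d\tau\right) d\delta \tag{2.4}$$

By inserting Equations (2.3) and (2.4) into the definition of the age-specific prevalence, $p = C^*/(S + C^*)$, we get the following theorem.

Theorem 2.1 (Keiding, 1991) The prevalence $p(t, a)$ of those aged $a \ge 0$ at time $t$ can be calculated by

$$p(t, a) = \frac{\int_0^a i(t - \delta, a - \delta)\, M_{t,a}(a - \delta)\, \exp\left(-\int_0^\delta m_1(t - \delta + \tau, a - \delta + \tau, \tau)\, d\tau\right) d\delta}{M_{t,a}(a) + \int_0^a i(t - \delta, a - \delta)\, M_{t,a}(a - \delta)\, \exp\left(-\int_0^\delta m_1(t - \delta + \tau, a - \delta + \tau, \tau)\, d\tau\right) d\delta} \tag{2.5}$$

Given the incidence rate $i$ and the mortality rates $m_k$, $k = 0, 1$, Equation (2.5) analytically describes the prevalence $p$ of the chronic disease for a specific age $a$ and at a specific point in time $t$. The formula reflects the complex interplay of the involved incidence and mortality rates.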
Unfortunately, Equation (2.5) is rarely used in epidemiology or public health contexts. One of the reasons may be that only a few researchers are aware of the equation and its potential. A major advantage of the equation is the possibility of simulating scenarios. For instance, in the context of planning future health resources one might ask: What would be the effect of reducing the incidence of a specific chronic disease by 25% on the prevalence in the age group 60-80? What would be the effect of lowering the mortality $m_1$ of the diseased persons by 10%?
These are important questions in predicting the effects (e.g. outcomes, costs, budget impact etc.) of interventions or health programmes. Thus, we think the equation can contribute in planning the allocation of health resources or in the field of health policy decision-making.
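A scenario of this kind can be sketched numerically. The following minimal Python sketch evaluates the prevalence as a ratio of diseased survivors to all survivors, following the structure of Keiding's equation, and compares a baseline with a 25% reduction of the incidence. The function names, the composite Simpson rule and the demonstration rates `m0_demo`, `m1_demo` and `inc_demo` are assumptions made for illustration only; they are not taken from the cited works.

```python
import math


def simpson(f, lo, hi, n=100):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def keiding_prevalence(t, a, inc, m0, m1, n=100):
    """Age-specific prevalence p(t, a) as diseased / (healthy + diseased).

    inc(t, a), m0(t, a) and m1(t, a, d) are callables returning rates per year.
    """
    birth = t - a  # calendar time of birth of the cohort aged a at time t

    def stay_healthy(y):
        # Probability of being alive and disease-free at age y
        hazard = lambda tau: m0(birth + tau, tau) + inc(birth + tau, tau)
        return math.exp(-simpson(hazard, 0.0, y, n))

    def diseased_density(delta):
        # Onset delta years ago (at age a - delta), then survival with the disease
        onset_t, onset_a = t - delta, a - delta
        hazard = lambda tau: m1(onset_t + tau, onset_a + tau, tau)
        survive = math.exp(-simpson(hazard, 0.0, delta, n))
        return inc(onset_t, onset_a) * stay_healthy(onset_a) * survive

    diseased = simpson(diseased_density, 0.0, a, n)
    return diseased / (stay_healthy(a) + diseased)


# Hypothetical rates, chosen only for illustration
def m0_demo(t, a):
    return math.exp(-9.0 + 0.085 * a)


def m1_demo(t, a, d):
    return 2.0 * m0_demo(t, a)


def inc_demo(t, a):
    return 1e-4 * math.exp(0.08 * a)


baseline = keiding_prevalence(50.0, 80.0, inc_demo, m0_demo, m1_demo)
reduced = keiding_prevalence(
    50.0, 80.0, lambda t, a: 0.75 * inc_demo(t, a), m0_demo, m1_demo
)
print(f"prevalence at age 80: baseline {baseline:.4f}, incidence -25%: {reduced:.4f}")
```

The number of birth-cohort members cancels in the ratio, so only the rates enter. The nested quadrature is simple rather than fast; an adaptive rule would be a natural replacement for production use.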

Partial differential equations
In this section we formulate another relation between prevalence and incidence. We start with a lemma.
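Lemma 2.1 The total number $C^*$ of diseased persons aged $a \ge 0$ at $t$, $C^*(t, a) = \int_0^a C(t, a, \delta)\, d\delta$, is the solution of the initial value problem

$$(\partial_t + \partial_a)\, C^* = -m_1^*\, C^* + i\, S, \qquad C^*(t, 0) = 0,$$

where $m_1^*$ is the mortality of the diseased averaged over the distribution of the disease duration,

$$m_1^*(t, a) = \frac{\int_0^a m_1(t, a, \delta)\, C(t, a, \delta)\, d\delta}{\int_0^a C(t, a, \delta)\, d\delta}. \tag{2.6}$$

Proof. See the Appendix.

With the lemma we are able to derive the main result of this article.

Theorem 2.2 The age-specific prevalence $p$ is the solution of the initial value problem

$$(\partial_t + \partial_a)\, p = (1 - p)\, \bigl(i - p\, (m_1^* - m_0)\bigr) \tag{2.7}$$

with $p(t, 0) = 0$.

Proof. Write $\partial = \partial_t + \partial_a$. Applying the quotient rule to $p = C^*/(S + C^*)$ gives $\partial p = (S\, \partial C^* - C^*\, \partial S)/(S + C^*)^2$. Substituting $\partial S = -(m_0 + i)\, S$ and $\partial C^* = -m_1^*\, C^* + i\, S$, and using $1 - p = S/(S + C^*)$, we get Equation (2.7).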
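Along a line with constant $t - a$, that is, along the life course of one birth cohort, the operator $\partial_t + \partial_a$ reduces to an ordinary derivative with respect to age, so the initial value problem can be integrated like an ordinary differential equation. The following Python sketch does this with a classical fourth-order Runge-Kutta scheme. The function name, the step count and the constant demonstration rates are assumptions for illustration; the mortality of the diseased is passed as a function of $(t, a)$ only, i.e. as the duration-averaged rate.

```python
import math


def prevalence_along_characteristic(t, a, inc, m0, m1_star, steps=400):
    """Integrate dp/ds = (1 - p) * (i - p * (m1* - m0)) along (t - a + s, s).

    Starts from p = 0 at birth (s = 0) and returns p at age s = a.
    inc, m0 and m1_star are callables of (calendar time, age).
    """
    birth = t - a

    def rhs(s, p):
        ts = birth + s
        return (1.0 - p) * (inc(ts, s) - p * (m1_star(ts, s) - m0(ts, s)))

    h = a / steps
    p, s = 0.0, 0.0
    for _ in range(steps):
        # Classical Runge-Kutta step of order four
        k1 = rhs(s, p)
        k2 = rhs(s + h / 2.0, p + h * k1 / 2.0)
        k3 = rhs(s + h / 2.0, p + h * k2 / 2.0)
        k4 = rhs(s + h, p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s += h
    return p


# Hypothetical constant rates; with m1* = m0 the exact solution is 1 - exp(-i * a)
p_num = prevalence_along_characteristic(
    50.0, 80.0,
    inc=lambda t, a: 0.01,
    m0=lambda t, a: 0.02,
    m1_star=lambda t, a: 0.02,
)
print(p_num, 1.0 - math.exp(-0.01 * 80.0))
```

The special case of equal mortalities is convenient as a check because the equation then reduces to $dp/ds = (1 - p)\, i$, which has a closed-form solution.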
Before we describe the advantages of Equation (2.7), we show that it is a generalization of the relations found in Brinks et al. (2013) and Brinks & Landwehr (2014). If $m_1$ is independent of $d$, i.e. $m_1(t, a, d) = m_1(t, a)$, then the duration-averaged mortality $m_1^*$ equals $m_1$ and (2.7) becomes

$$(\partial_t + \partial_a)\, p = (1 - p)\, \bigl(i - p\, (m_1 - m_0)\bigr), \tag{2.8}$$

which has been shown in Brinks & Landwehr (2014). If in addition all rates are independent of $t$, one obtains the ordinary differential equation

$$\frac{dp}{da} = (1 - p)\, \bigl(i - p\, (m_1 - m_0)\bigr)$$

as in Brinks et al. (2013). Hence, Equation (2.7) is an extension of our previously published results if the mortality $m_1$ of the diseased depends on the duration $d$. For some chronic diseases, there is epidemiological evidence that $d$ plays a crucial role for $m_1$, for example, in diabetes (Carstensen et al., 2008) and systemic lupus erythematosus (Bernatsky et al., 2006).

Compared with Keiding's Equation (2.5) the PDE approach is simpler and has a greater flexibility, which is illustrated in three points. The first point is a new possibility of estimating incidence rates from prevalence data. This is an important application in epidemiology and is demonstrated in the next section.

The second advantage of the PDE approach becomes obvious when the information about the mortality is not given in terms of the mortality rates $m_0$ and $m_1^*$ of the healthy and the diseased population, respectively, but in terms of the general mortality $m$ of the whole population and the relative mortality $R = m_1^*/m_0$. While Keiding's Equation (2.5) is not able to calculate the prevalence $p$ in this situation, the PDE is. The relation $m = p\, m_1^* + (1 - p)\, m_0$ gives $m = m_0\, (1 + p\, (R - 1))$ and hence $p\, (m_1^* - m_0) = m\, p\, (R - 1)/(1 + p\, (R - 1))$, so that $p$ is the solution of the PDE

$$(\partial_t + \partial_a)\, p = (1 - p) \left(i - m\, \frac{p\, (R - 1)}{1 + p\, (R - 1)}\right). \tag{2.9}$$

The situation of given $m$ and $R$ is very common in epidemiology and public health. Often, the general mortality $m$ can be obtained from official vital statistics or life tables. The relative mortality $R$ is taken from disease-specific surveys. Then, Equation (2.9) is able to calculate the prevalence whereas Keiding's formula is not.
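The equivalence of the two parametrizations of mortality can be checked numerically. The short Python sketch below evaluates the right-hand side of the prevalence equation once from the state-specific mortalities and once from the general mortality $m$ and the relative mortality $R$. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
def rhs_state_mortalities(p, inc, m0, m1):
    """Right-hand side using the mortalities of the healthy and the diseased."""
    return (1.0 - p) * (inc - p * (m1 - m0))


def rhs_general_mortality(p, inc, m, R):
    """Right-hand side using general mortality m and relative mortality R."""
    # Population attributable fraction of the mortality
    paf = p * (R - 1.0) / (1.0 + p * (R - 1.0))
    return (1.0 - p) * (inc - m * paf)


# Hypothetical values for one (t, a) point
p, inc, m0, R = 0.10, 0.01, 0.02, 2.63
m1 = R * m0
m = p * m1 + (1.0 - p) * m0  # general mortality of the whole population

lhs = rhs_state_mortalities(p, inc, m0, m1)
rhs = rhs_general_mortality(p, inc, m, R)
print(lhs, rhs)
```

Both calls describe the same rate of change of the prevalence; only the form in which the mortality information enters differs.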
An anonymous reviewer gave us the valuable hint that Brunet & Struchiner (1999) also derived a relation between the prevalence odds $p/(1 - p)$, the incidence $i$ and the mortalities $m_k$, $k = 0, 1$, in terms of a PDE, which is similar to Equation (2.8). Like Keiding's formula, the approach of Brunet & Struchiner (1999) is not able to cope with the situation when $m$ and $R$ are given instead of $m_0$ and $m_1$.
Remark 2.1 The fraction on the right-hand side in (2.9) is the population attributable fraction, a well-known epidemiological quantity (Kirkwood & Sterne, 2003).
Finally, the greater flexibility of the PDE compared with Keiding's and Brunet & Struchiner's formulas is apparent if we relax the assumption of a closed population. Keiding and Brunet & Struchiner do not cover this case, whereas it is easily handled by an extension of the PDE (2.7). The necessary steps are described in Brinks & Landwehr (2014).
Remark 2.2 Equation (2.7) uses calendar time $t$ and age $a$ as underlying (independent) variables and describes the change of the prevalence as a function of $t$ and $a$. This may be seen in the light of the celebrated McKendrick-Von Foerster equation, which does the same for the population density (in a closed population). For a review of the history and further references, see the excellent overview by Keiding (2011).

Estimation of the age-specific incidence from two cross-sectional studies
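The primary advantage of the PDE approach over Keiding's Equation (2.5) is the possibility of deriving incidence rates from prevalence data. We start with the observation that, in contrast to (2.5), the PDE (2.7) can be solved for the incidence rate $i$:

$$i = \frac{(\partial_t + \partial_a)\, p}{1 - p} + p\, (m_1^* - m_0). \tag{2.10}$$

This equation provides a way to estimate the age-specific incidence from two cross-sectional studies. Consider two points in time, $t_0$ and $t_0 + \Delta$, $\Delta > 0$, and assume we know the age-specific mortalities $m_0(\cdot, a)$ and $m_1^*(\cdot, a)$ at calendar time $t_0 + \Delta/2$. Then, Equation (2.10) is the basis for the following algorithm:

Algorithm 2.1 (Incidence from two cross-sections) Let the age-specific prevalence $p(\cdot, a)$ be given at $t_0$ and $t_0 + \Delta$, $\Delta > 0$. Set $\bar{t} = t_0 + \Delta/2$.
1. Approximate the prevalence at $(\bar{t}, a)$:
$$p(\bar{t}, a) \doteq \tfrac{1}{2} \bigl[p(t_0, a - \Delta/2) + p(t_0 + \Delta, a + \Delta/2)\bigr]. \tag{2.11}$$
2. Approximate the derivative at $(\bar{t}, a)$:
$$(\partial_t + \partial_a)\, p(\bar{t}, a) \doteq \frac{p(t_0 + \Delta, a + \Delta/2) - p(t_0, a - \Delta/2)}{\Delta}. \tag{2.12}$$
3. Insert the values from (2.11) and (2.12) and the mortalities at $(\bar{t}, a)$ into Equation (2.10) to obtain $i(\bar{t}, a)$.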
While the last step in Algorithm 2.1 is mathematically exact, the algorithm comprises two approximation steps (indicated by the '$\doteq$' sign), which are sources of error. First, an error occurs in approximating the prevalence $p(\bar{t}, a)$ by the mean of $p(t_0, a - \Delta/2)$ and $p(t_0 + \Delta, a + \Delta/2)$ in Equation (2.11). The second error arises in estimating the derivative $(\partial_t + \partial_a)\, p$ by the finite difference in (2.12).
In both approximations, the underlying idea is linearization, i.e. the assumption that the intermediate value in (2.11) and the derivative in (2.12) can be approximated by linear functions. If the prevalence $p$ were a linear function, both steps would yield the exact values and the errors would be zero. In practical applications, the time lag $\Delta$ between the two cross-sections should not be chosen too long, but long enough to gain reliable estimates in (2.11) and (2.12).
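The steps of the algorithm translate almost directly into code. The following Python sketch assumes that the two cross-sections are available as functions of age (in practice they would be interpolated from tabulated survey data) and that the mortalities at the midpoint are given as numbers. The function name, its signature and the stationary test prevalence are assumptions for illustration.

```python
import math


def incidence_from_two_cross_sections(p_first, p_second, a, delta, m0, m1_star):
    """Estimate the incidence at age a and calendar time t0 + delta / 2.

    p_first(age):  prevalence observed at t0
    p_second(age): prevalence observed at t0 + delta
    m0, m1_star:   mortalities of healthy and diseased at (t0 + delta / 2, a)
    """
    p_lo = p_first(a - delta / 2.0)
    p_hi = p_second(a + delta / 2.0)
    p_mid = 0.5 * (p_lo + p_hi)   # mean of the two observations
    dp = (p_hi - p_lo) / delta    # finite difference along the cohort line
    return dp / (1.0 - p_mid) + p_mid * (m1_star - m0)


# Hypothetical check: constant incidence 0.01 and equal mortalities give the
# time-independent prevalence p(a) = 1 - exp(-0.01 * a)
def p_stationary(age):
    return 1.0 - math.exp(-0.01 * age)


estimate = incidence_from_two_cross_sections(
    p_stationary, p_stationary, a=70.0, delta=5.0, m0=0.03, m1_star=0.03
)
print(f"estimated incidence: {estimate:.6f} (true value 0.01)")
```

In this check the estimate deviates slightly from the true value because the prevalence is not linear in age, which is exactly the linearization error discussed above.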

Example
To illustrate the practical relevance, we apply the theory to an example motivated by dementia in German males. The mortality $m_0$ of the non-diseased is chosen to be $m_0(t, a) = \exp(-9.0 + 0.085\, a - t \ln(1.01))$, which is an approximation of the age-specific mortality of the male German population aged $\ge 50$ in the past six decades (Federal Statistical Office of Germany, 2011). The calendar time $t$ is given in years since 1960.
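For readers who wish to experiment with this mortality function, a direct transcription into Python is given below. The function name is our choice for this sketch. Since $\exp(-t \ln 1.01) = 1.01^{-t}$, the model implies that the mortality at a fixed age is divided by 1.01 with every calendar year, i.e. it declines by about 1% per year.

```python
import math


def m0(t, a):
    """Mortality of the non-diseased at age a; t in years since 1960."""
    return math.exp(-9.0 + 0.085 * a - t * math.log(1.01))


# Ratio of the mortality at age 80 in two consecutive calendar years
ratio = m0(51.0, 80.0) / m0(50.0, 80.0)
print(f"m0 at age 80 in 2010: {m0(50.0, 80.0):.4f}, year-on-year ratio: {ratio:.4f}")
```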
The age-specific incidence of dementia is given by Equation (3.1), which is an approximation of the observed rate in males (Ziegler & Doblhammer, 2009). As there are indications that the age-specific incidence is relatively stable (Qiu et al., 2009), we consider it to be independent of calendar time $t$.
Concerning the mortality $m_1$ of the men with dementia, we examine two cases: $m_1$ being independent of and $m_1$ being dependent on the disease duration $d$. In both cases, we use Keiding's Equation (2.5) to calculate the age-specific prevalence of dementia in the years 2010 and 2015. This mimics two cross-sectional studies with a time lag of 5 years ($\Delta = 5$). The two cross-sections are used to derive the age-specific incidence rate at $t = 2012.5$ by Algorithm 2.1. As we know the true incidence underlying the simulation, we can compare the estimates of Algorithm 2.1 with the true values given by (3.1). In this way, we compare our estimate with our own input and do not need additional data for validation.

Independence from duration
In the first example, we assume that the mortality $m_1$ of the diseased is independent of the duration $d$. Moreover, $m_1$ is considered proportional to $m_0$: $m_1(t, a, d) = m_1(t, a) = R\, m_0(t, a)$. The relative mortality $R\ (= m_1/m_0)$ is chosen to be $R = 2.63$, which is the average value of the relative mortality in the first 6 years after diagnosis of dementia in a comparable English population (Rait et al., 2010). The age courses of the prevalence in 2010 and 2015 are calculated by Keiding's Equation (2.5) for the ages $a = 60, 62.5, \ldots, 97.5, 100$, i.e. in steps of 2.5 years. The integrals have been calculated using Romberg's method, which allows a prescribed accuracy (Dahlquist & Björck, 1974). The results are shown in Fig. 2.
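To make the role of the prescribed accuracy concrete, a compact version of Romberg's method is sketched below in Python. It is a generic textbook variant (trapezoidal sums refined by Richardson extrapolation) and not necessarily the implementation used for the results reported here; the function name, the tolerance and the stopping rule are assumptions. As an illustration, it integrates the mortality of the non-diseased over the ages 60 to 80 at the fixed calendar time $t = 50$.

```python
import math


def romberg(f, lo, hi, tol=1e-10, max_levels=20):
    """Romberg integration of f over [lo, hi].

    Stops when two successive diagonal entries of the extrapolation
    table differ by less than tol.
    """
    h = hi - lo
    prev = [0.5 * h * (f(lo) + f(hi))]  # trapezoidal rule, one interval
    for level in range(1, max_levels):
        h /= 2.0
        # Function values at the new midpoints of the refined grid
        mid_sum = sum(f(lo + (2 * k + 1) * h) for k in range(2 ** (level - 1)))
        row = [0.5 * prev[0] + h * mid_sum]
        for j in range(1, level + 1):
            # Richardson extrapolation removes the next error term
            row.append(row[j - 1] + (row[j - 1] - prev[j - 1]) / (4 ** j - 1))
        if abs(row[-1] - prev[-1]) < tol:
            return row[-1]
        prev = row
    return prev[-1]


# Integrated mortality of the non-diseased between ages 60 and 80 at t = 50
cumulative_hazard = romberg(
    lambda a: math.exp(-9.0 + 0.085 * a - 50.0 * math.log(1.01)), 60.0, 80.0
)
print(cumulative_hazard)
```

The simple stopping rule can terminate too early for integrands that happen to vanish at the first grid points; for the smooth exponential rates considered here this is not a concern.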
Based on the age course of the prevalence in Fig. 2, we apply Algorithm 2.1 with $m_1(t, a) = 2.63\, m_0(t, a)$. The results are shown in Table 1.
Comparing the true and the calculated incidence rates, we see that the absolute value of the relative error for all ages a = 62.5, . . . , 97.5 is less than 2%.

Duration dependency
The second example mimics the mortality $m_1$ being dependent on the duration since onset of the disease. According to the values reported in the study of Rait et al. (2010), the relative mortality $R(d)$ depends on the time $d$ since diagnosis, and we set $m_1(t, a, d) = R(d)\, m_0(t, a)$. Again, we calculate the age-specific prevalence in the years 2010 and 2015 using Keiding's Equation (2.5). The resulting age-specific prevalence is similar to the prevalence shown in Fig. 2.
If we want to extract the age-specific incidence as in the previous section, we need to know the duration-averaged mortality $m_1^*$. Although $m_1^*$ may be accessible by epidemiological surveys, in our setting we do not know the exact rate, because the distribution $C(t, a, d) / \int_0^a C(t, a, \delta)\, d\delta$ in Equation (2.6) is unknown. We present two ways to overcome this problem in practice: (a) We apply Algorithm 2.1 as in the previous section, setting $m_1^*(t, a) = m_1(t, a) = 2.63\, m_0(t, a)$. The value 2.63 is the average of all the reported relative mortalities from year 1 to year 6 after diagnosis. (b) In the study by Rait et al. it has been observed that persons aged > 90 die quite soon after diagnosis of dementia. Thus, we set $m_1^*(t, a) = 2.755\, m_0(t, a)$ for $a > 90$, where 2.755 is the average relative mortality from year 1 to year 4 after diagnosis. The comparisons of the estimated incidence rates with the true values are shown in Table 2. The third and fourth columns refer to method (a) and the fifth and sixth columns refer to method (b).
Although the relation $m_1(t, a, d) = R(d)\, m_0(t, a)$ that has been used to generate the input data has not been utilized in Algorithm 2.1, the estimated age-specific incidence rates deviate only slightly for $a < 90$, namely by less than 2% in absolute terms. For ages 90 and above, the deviations increase with age in method (a), which motivated increasing the relative mortality of this age group in method (b). The rationale behind method (b) is that in the age group $\ge 90$ only a small percentage survive 5 years after diagnosis. Thus, averaging the relative mortality over years 1 to 6 gives too much weight to the later years after diagnosis, when the relative mortality is lower than in the early years after diagnosis.

Summary
In this article we have formulated and proven a new relation between the age-specific prevalence, the incidence and the mortality rates in terms of a PDE. The relation generalizes differential equations published recently in Brinks et al. (2013) and Brinks & Landwehr (2014). Compared with the relations from Keiding (1991) and Brunet & Struchiner (1999), the PDE is simpler and has a greater flexibility. The flexibility has been illustrated in three points: (i) a new way of deriving incidence rates from prevalence data, (ii) the use of the method if the general mortality is given instead of the mortality rates of the healthy and diseased and (iii) the possible extension in case of migration. A fourth aspect may be mentioned if we allow a transition from the disease state (I) back to the state (H). Again the PDE is capable of dealing with this situation and Keiding's equation is not; see Brinks & Landwehr (2014) for details.
The new method of deriving incidence rates from prevalence data may be very useful in epidemiology. While prevalence data may be obtained by cross-sectional studies, the estimation of incidence rates mostly requires lengthy and costly follow-up studies. Especially in low- or middle-income countries, data about the incidence of many diseases have not been surveyed yet. Furthermore, in some situations, estimates from cross-sectional data might be more reliable than estimates from follow-up studies. For example, in surveying the occurrence of health states where subjects might feel uncomfortable or even stigmatized, losses to follow-up or withdrawals of consent are very likely. An anonymous cross-section may be found more acceptable and less intrusive for study participants than repetitive re-examinations.
With a view to practical applications of Algorithm 2.1, apart from the approximation errors, sampling errors in surveying the age-specific prevalence have to be considered. The sampling error depends on several parameters and a discussion is beyond the scope of this article. For an introduction about this issue we refer to Brinks et al. (2013) and the associated technical appendix, where sampling error was assessed in simulation studies. Error bounds arising from uncertainties in raw population data may be obtained by bootstrap methods as described and demonstrated in Brinks et al. (2013).
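In summary, we have presented a new relation between the age-specific prevalence, the incidence and the mortality rates. The relation is applicable in many contexts from epidemiology, public health and demography. Furthermore, it is simpler and more flexible than a previously found equation. With our findings, we hope to contribute to the quantitative understanding of how basic epidemiological rates and processes may impact global health and the burden of chronic diseases.

Appendix
Proof of Lemma 2.1. We have to show that $C^*$ is the solution of the PDE $(\partial_t + \partial_a)\, C^* = -m_1^*\, C^* + i\, S$. With $\partial = \partial_t + \partial_a$ it holds:

$$\begin{aligned}
\partial\, C^*(t, a) &= \partial \int_0^a C(t, a, \delta)\, d\delta \\
&= C(t, a, a) + \int_0^a \partial\, C(t, a, \delta)\, d\delta \\
&= C(t, a, a) - \int_0^a \bigl[m_1(t, a, \delta)\, C(t, a, \delta) + \partial_d\, C(t, a, \delta)\bigr]\, d\delta \\
&= C(t, a, a) - m_1^*(t, a)\, C^*(t, a) - \bigl[C(t, a, a) - C(t, a, 0)\bigr] \\
&= -m_1^*(t, a)\, C^*(t, a) + i(t, a)\, S(t, a).
\end{aligned}$$

For the second equality Leibniz's integral rule has been used. The third equality follows from the balance equation $(\partial_t + \partial_a + \partial_d)\, C = -m_1\, C$, and the last two use the definition of the duration-averaged mortality $m_1^*$ and the initial condition $C(t, a, 0) = i(t, a)\, S(t, a)$.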