Abstract

There has been growing interest on forecasting mortality. In this article, we propose a novel dynamic Bayesian approach for modelling and forecasting the age-at-death distribution, focusing on a three-component mixture of a Dirac mass, a Gaussian distribution and a skew-normal distribution. According to the specified model, the age-at-death distribution is characterized via seven parameters corresponding to the main aspects of infant, adult and old-age mortality. The proposed approach focuses on coherent modelling of multiple countries, and following a Bayesian approach to inference we allow to borrow information across populations and to shrink parameters towards a common mean level, implicitly penalizing diverging scenarios. Dynamic modelling across years is induced through an hierarchical dynamic prior distribution that allows to characterize the temporal evolution of each mortality component and to forecast the age-at-death distribution. Empirical results on multiple countries indicate that the proposed approach outperforms popular methods for forecasting mortality, providing interpretable insights on its evolution.

1 Introduction

In the last decades, changes in life expectancy and population growths have stimulated growing interest in sophisticated models for mortality (e.g. Booth, 2020; Zanotto et al., 2020). These approaches characterize the observed age-specific regularities of the mortality function and their trends over time, providing a parsimonious representation of the components of mortality and allowing to provide forecasts. Estimates of past mortality trends provide insights on the global health of a population, describe relevant aspects of its demographic structure and facilitate comparisons across countries. Similarly, probabilistic forecasts are routinely employed by researchers, practitioners and organizations to aid and support welfare planning strategies such as pension funds, insurances and health systems, among many others (e.g. Lutz & Samir, 2010).

Human mortality has been generally characterized by several quantities, with the hazard function, the survival function and the probability density function being the most popular ones (e.g. Klein & Moeschberger, 2006). Although each representation can be directly obtained from the others, their interpretations are complementary and highlight more clearly different aspects of mortality (e.g. Basellini & Camarda, 2019). In this article we will focus on modelling the distribution of the ages at death, generally referred to as ‘age-at-death distribution’ and corresponding to the density function that characterizes mortality (Keyfitz, 2005). In the literature on modelling and forecasting mortality, this quantity has received significantly less attention than other quantities such as mortality rates (e.g. Lee & Carter, 1992; Li & Lee, 2005). However, as outlined in Basellini and Camarda (2020) and Pascariu et al. (2019), analysis of the age-at-death distribution can be incredibly useful, since it provides a direct evaluation of longevity and lifespan variability that cannot be directly inferred from other quantities (see also Canudas-Romo, 2010; Cheung et al., 2005).

To further see that, Figure 1 illustrates the age-specific distribution of deaths from four illustrative countries, over a time period that ranges from 1970 to 2015. These data highlight crucial aspects of the temporal evolution of mortality, such as the global shift of the curve to older ages—due to the increase in life expectancy—and the drop of perinatal and infant mortality. Figure 1 also suggests that such an evolution has characterized countries differently, resulting in faster changes in some countries (e.g. Italy) and slower in others (e.g. Sweden). Furthermore, the age-specific distribution of deaths illustrates that several mortality trends are converging, resulting in similar patterns in terms of modal age at death, infant mortality and compression of old-age mortality.

Age-at-death distribution across Italy, France, Sweden and Norway from 1970 to 2015
FIGURE 1

Age-at-death distribution across Italy, France, Sweden and Norway from 1970 to 2015

The first approaches for modelling the distribution of deaths date back to the 19-th century (e.g. Gompertz, 1825; Makeham, 1860; Pearson, 1897), and have been generalized in the following century (Heligman & Pollard, 1980; Siler, 1983). The main interest of such methods has been on characterizing mortality patterns across ages, focusing on a given country during a specific year. According to these approaches, the mortality curve can be characterized using different parametric specifications, accounting for distinct aspects of the mortality curve; see also Dellaportas et al. (2001) and Mazzuco et al. (2018). One notable limitation of these approaches is the focus on single-country modelling; when interest is on forecasting mortality for multiple countries, independent country-specific analysis is required. This strategy ignores the fact that in most developed nations similar trends are observed, and this abundance of information can be appropriately included in the model to improve the quality of the estimates. Joint modelling of multiple nations, in fact, is expected to improve the estimation of country-specific parameters and temporal trends, allowing to share information across different countries.

More recently, researchers have focused on explicitly modelling the temporal evolution of the age-at-death distribution, forecasting the parameters using time-series specifications (Basellini & Camarda, 2019; Pascariu et al., 2019). These methods borrow ideas from death-rate forecasting, such as the milestone approach introduced by Lee and Carter (1992) and sequent developments (e.g. Hyndman & Ullah, 2007). Potentially, these models can be both used to describe the evolution of mortality in the past years and to provide forecasts, although the focus of applications is more frequently on forecasting only. However, separate forecasts often lead to unreasonable results, with country-specific or sex-specific predictions that diverge substantially across groups. Such divergent forecasts across sub-populations are implausible, since industrialized countries have been showing global convergence trends in terms of mortality levels, and this tendency is expected to endure (Bergeron-Boucher et al., 2017; Oeppen, 2008; Wilson, 2001).

Following the breakthrough approach of Li and Lee (2005), different authors have focused on ‘coherent’ mortality forecasts, defined as predictions that do not diverge across groups. Such a strategy has constantly received attention in the past years, being successfully implemented in several models for death rates; for example, generalizing compositional-data techniques (Bergeron-Boucher et al., 2017) and functional-data models (Hyndman et al., 2013) for mortality. However, a critical issue with these methods concerns with the choice of a population standard to be used as a reference. This choice is substantially arbitrary, and a common consensus on which sub-population should be used is still lacking, since results are often sensitive to different references (Booth, 2020; Kjærgaard et al., 2016).

Motivated by the above discussion, in this article we propose a novel approach for modelling and forecasting the age-at-death distributions of multiple countries. We will focus on the age distribution of deaths, combining the life table quantities with the total number of deaths. This procedure allows to naturally take into account the population size of each country and its age structure, which are important determinants of the age-at-death distribution. The proposed methods can also be applied, without modifications, directly to the life table quantities, where we can assume that data are provided on artificial populations with fixed population size. However, this would discard an important source of uncertainty, which corresponds to the population size of a specific nation. Indeed, imposing a fixed same sample size weights each nation equally, implying that a relatively small country (e.g. Sweden) would have the same influence as a much bigger country (e.g. Germany or the United Kingdom). Therefore, we prefer to use the actual sample size of each county to have a proper quantification of uncertainty and genuinely account for different population structures.

We propose a method based on a mixture distribution function of a Dirac mass, a Gaussian distribution and a skew-normal distribution, in order to characterize infant, adult and old-age mortality with an interpretable set of parameters. Following a Bayesian approach to inference, we specify hierarchical priors for the dynamic parameters and the country-specific coefficients, relying on informative common prior specifications to improve the borrowing of information across nations. Compared to the available techniques, our method allows to model directly the temporal evolution and the country-specific variations across multiple sub-populations, avoiding sequential procedures and providing estimates shrunk towards common levels. Under the proposed methods, coherent predictions are naturally obtained without selecting a reference population or an external standard, but instead estimating these quantities from the available data. We will refer to the proposed dynamic model based on a skewed distribution function as dysm in the sequel.

2 Methods

2.1 Model specification

Our interest is on providing a flexible representation of the age-at-death distribution in multiple countries, dynamically across different years or time periods. We will characterize this quantity through a continuous probability density function (pdf) f(x;ϑjt) which parametrizes the probability of dying at age x—based on the individuals current living at age x—in country j during year t via a set of country- and time-specific parameters ϑjt, with x = 0, …, 110, j = 1, …, p countries and t = 1, …, T years. Although f(x;·) is often specified as a continuous function, in practice we focus on characterizing the age-at-death distribution on a discrete grid, corresponding to the probability pxjt of dying at a specific age x = 0, …, 110. Such an aim can be addressed discretizing f(x;ϑjt) over a set of disjoint intervals (x − 1/2, x + 1/2] as

1

where F denotes the cumulative distribution function (cdf) associated with f. This characterization of probabilities via a discretization of an underlying pdf automatically guarantees that the vector pjt=pjt(ϑjt)=[p0jt,,p110jt] with elements as in Equation 1 is a proper probability vector with pxjt[0,1] and xpxjt=1, therefore avoiding further restrictions (e.g. Oeppen, 2008); see the left panel of Figure 2 for a graphical representation of the discretization process.

Proposed mixture density function. Panel (a) provides a sketch of the discretization of the pdf into a set of probabilities pj, with j = 57, …, 60. Panel (b) highlights the three components of the mixture density function
FIGURE 2

Proposed mixture density function. Panel (a) provides a sketch of the discretization of the pdf into a set of probabilities pj, with j = 57, …, 60. Panel (b) highlights the three components of the mixture density function

We denote as Djt=[D0jt,,D110jt] the number of deaths and as pjt=[p0jt,,p110jt] the death probabilities, where elements Dxjt and pxjt denote, respectively, the number of deaths and the probability of dying at age x—based on the current individuals living at age x—in country j during year t. Given the total number of deaths njt=xDxjt and the probabilities outlined in Equation 1, we model the vector Djt as a multinomial distribution with

(2)

A crucial aspect of this approach involves the specification of f(x;·), which induces a model on the vector of probabilities pjt. In practice, different parametric models for f(x;·) are available in the literature, with the Gompertz (1825) and Siler (1983) models being popular options. The age-at-death distribution f(x;·) could be also modelled semi-parametrically, using for example splines or other basis functions (Basellini & Camarda, 2020; Hyndman et al., 2013). However, when interest is explicitly on characterizing relevant aspects of mortality—such as the impact of infant mortality or the shape of old-age mortality—parametric models provide a much clearer interpretation and deeper insights on the structure of the populations under investigation. In addition, mortality can be naturally divided into different components, accounting for deaths at different phases of life (e.g. Lexis, 1879; Pearson, 1897). Therefore, it is desirable to rely on a specification which preserves such a decomposition, for example using mixtures of distributions (Carriere, 1992; Mazzuco et al., 2018).

In this work, we draw inspiration from Mazzuco et al. (2018) and Zanotto et al. (2020) and rely on a mixture of three components to characterize mortality; see also Carriere (1992) for related arguments. Specifically, in order to facilitate model's interpretation, we decompose mortality as the contribution of infant mortality, adult mortality and old-age mortality. Infant mortality is accounted through a single Dirac mass at age 0, while adult mortality is modelled via a Gaussian distribution with mean μjtR and standard deviation σjtR+. Old-age mortality, instead, is characterized using a skew-normal distribution (e.g. Azzalini & Capitanio, 2013) with location parameter ξjtR, scale parameter ωjtR+ and shape parameter αjtR. The impact of each component to the overall mortality function is accounted assigning mixture weights πhjt[0,1] such that π0jt+π1jt+π2jt=1 in order to guarantee that the resulting expression remains a proper pdf. This choice leads to the following characterization of f(x;ϑjt)

(3)

with ϑjt=(π1jt,π2jt,μjt,σjt,ξjt,ωjt,αjt), 1[·] denoting the indicator function and φ(·), Φ(·) denoting the pdf and the cdf of a standard Gaussian, respectively; we omit the parameter π0jt from ϑjt since π0jt=1π1jtπ2jt due to the constraint on the mixture weights; see the right panel of Figure 2 for a graphical representation of the proposed model for the age-at-death distribution.

Equation (3) allows to characterize the age-at-death distribution with flexibility, while also retaining interpretability of the mortality structure. The first component of Equation (3) accounts for infant and perinatal mortality, characterizing deaths in the first year of life. As outlined, for example, in Figure 1, perinatal mortality is notably decreased during recent years, mainly due to technological advances (e.g. Wilson, 2001; Zanotto et al., 2020); the mixture weight π0jt characterizes the impact of this component in the overall mortality. The use of a Dirac mass is particularly adequate for developed countries, where infant mortality occurs during the first year of life and can be properly modelled with a single mass, avoiding more elaborate specifications and additional parameters. However, dysm can be naturally adapted to countries with high infant-mortality levels, replacing the Dirac mass with an half-normal distribution (Zanotto et al., 2020) and leveraging on the same model architecture; in Section 3.4, we compare such a specification with the proposed model. The Gaussian component, instead, is assigned to a relative weight π1jt and characterizes adult mortality via the location parameter μjt and the scale parameters σjt. In addition, these quantities correspond also to the moments of the Gaussian component, and therefore they can be interpreted directly as the mean and standard deviation of the age at adult mortality.

The skew-normal component, lastly, accounts for the asymmetric shape of old-age mortality, which has been frequently observed in developed countries (e.g. Mazzuco et al., 2018; Pascariu et al., 2019). Since most deaths occur at old ages, this component is expected to be associated with the largest mixture weight π2jt. Inference on the structure of old-age mortality rely on such mixture weight and on the skew-normal parameters ξjt,ωjt and αjt, providing insights on the evolution of old-age mortality in terms of location, scale and shape of such a component. In practice, it can be convenient to focus on the mean and standard deviation of old-age mortality, which are more interpretable measures of the shift and compression of the age-at-death distribution. According to the proposed parametrization, these quantities can be conveniently expressed as

(4)

with δjt=αjt/1+αjt (e.g. Azzalini & Capitanio, 2013, section 2.1.4). Alternatively, it is also possible to parameterize directly the skew-normal distribution in terms of its cumulants (e.g. Azzalini & Capitanio, 2013, section 3.1.4). However, for simplicity in implementation and exposition, we prefer to work with the outlined parameters, and eventually post-process the estimates to focus on the functionals of interest.

2.2 Hierarchical dynamic model

The multinomial model outlined in Expression 2, with probabilities given by Equations (1) and (3), provides a flexible specification for the distribution of the number of deaths occurred in a single country j during a specific year t. In order to account for the temporal variation of the age-at-death distribution, we model explicitly the dynamic evolution of the parameters ϑjt through an hierarchical dynamic model. This strategy will allow us to highlight what aspects of mortality have changed in the past years, characterizing the rates of such changes and facilitating comparison across different countries. In addition, the inclusion of a dynamic component facilitates the developments of mortality forecasts extrapolating parameters in time, obtaining predictions for the future age-at-death distributions.

We specify the hierarchical model in an unconstrained domain, transforming the parameters of Equation (3). This choice facilitates computation and implementations, allowing to rely on Gaussian dynamic models in the transformed space and avoiding restrictions on the support of the parameters. For this purpose, we introduce the reparametrized vector ϑ~jt, with elements (ϑ~jt1,,ϑ~jt7) corresponding to

(5)

In Equation (5), the mixture weights (π1jt,π2jt) are mapped into (ϑ~jt1,ϑ~jt2) leveraging a cumulative logit transformation, while σjt and ωjt— corresponding to adult and old-age mortality scale parameters—are transformed into ϑ~jt4 and ϑ~jt6, respectively, via a logarithmic transformations. Note that each ϑ~jtkR for k = 1, …, 7, and that the vector ϑ~jt effectively corresponds to a one-to-one reparametrization of ϑjt.

We model the dynamic evolution of the elements ϑ~jtk leveraging a random-walk model with drift. This choice leads to the following specification.

(6)

In Equation (6), ϑ~j0kR denotes the initial condition of ϑ~jtk, and it is assigned to a Gaussian distribution with mean mkR and standard deviation skR+. These values correspond to the initial information for the dynamic hierarchical model, and characterize a probabilistic representation of our beliefs before the data are observed (e.g. West & Harrison, 1997). The drift parameter βjkR measures the expected difference across consecutive values of ϑ~jtk and accounts for possible increasing or decreasing linear trends in the evolution of the parameters. The inclusion of this component is particularly relevant in our problem, where we expected to observe clear and important trends in the evolution of mortality, accounted by the parameters charactering the age-at-death distribution. Conditionally on the value at time (t − 1), each ϑ~jtk is recursively distributed as a Gaussian distribution with mean βjk+ϑ~jt1k, and standard deviation ηjkR+. Overall, we can interpret dysm as a Multinomial state-space model, where Equation (6) controls the evolution of the latent states via random walks with Gaussian innovations and, conditional on the states, the observed vectors of deaths are modelled as independent Multinomial variables with probabilities given by Equations (1) and (3); see Figure 3 for a graphical representation of the proposed hierarchical dynamic model.

Graphical illustration of the dynamic component of dysm. Solid circles, white squares and grey squares denote vectors of dynamic parameters, probabilities characterizing the age-at-death distribution and observed number of deaths respectively
FIGURE 3

Graphical illustration of the dynamic component of dysm. Solid circles, white squares and grey squares denote vectors of dynamic parameters, probabilities characterizing the age-at-death distribution and observed number of deaths respectively

Although this dynamic model might seem oversimplistic, we found that such a specification leads to good results in a large variety of applications involving developed countries; see Section 3 for further details. More elaborate extensions are possible, for example including a moving-average component, increasing the lag of the autoregressive terms or joint modelling multiple components of ϑ~jtk via multivariate random walks. However, a random walk with drift on the latent states often characterizes well many phenomena without over-parameterizing the dynamic component, and is also general enough to be directly used, without further modifications, in a large variety of settings; see Durbin and Koopman (2012) and West and Harrison (1997) for related arguments and Section 3.4 for a comparison with alternative specifications.

Additionally, we highlight that Equation (6) is valid for any equally spaced time grid, and therefore the proposed approach can be also applied to different time scales. In fact, our description focuses on yearly data for simplicity in exposition, but this requirement is not necessary for developing the proposed model in practice. Several important settings involve analysis of mortality across different time scales, such as monthly data or multi-year periods; for example, when interest is on modelling short-term variations due to extraordinary events (e.g. covid-19 pandemic) or large-scale trends, respectively. The general structure of dysm facilitates its use in these different settings, without modifying the model specification.

2.3 Bayesian inference

As discussed in Section 1, a crucial aspect of mortality forecasting involves the penalization of diverging scenarios via coherent modelling. This can be obtained explicitly including a common trend across countries (e.g. Li, 2013; Li & Lee, 2005), or choosing a reference population (e.g. Basellini & Camarda, 2019; Hyndman et al., 2013); see also Booth (2020) for a recent discussion. However, these choices are arbitrary and significantly affect the final results (e.g. Kjærgaard et al., 2016). Therefore, we follow a different path and induce coherent predictions via a Bayesian approach to inference, regularizing estimates towards a common mean trend. More specifically, we introduce a common prior specification for the country-specific dynamic hierarchical model defined in Equation (6), for each coefficient k = 1, …, 7. Including the prior for ϑ~j0k, we let

(7)

where mβk,sβk,ak and bk denote fixed hyper-parameters; see Figure 4 for an illustration of the hierarchical dependence structure induced by the common prior specification.

Graphical illustration of the hierarchical component of dysm, inducing borrowing of information and coherent predictions across countries, with j = 1,…, p. Top level block denotes fixed hyper-parameters, while bottom blocks denote model's parameters
FIGURE 4

Graphical illustration of the hierarchical component of dysm, inducing borrowing of information and coherent predictions across countries, with j = 1,…, p. Top level block denotes fixed hyper-parameters, while bottom blocks denote model's parameters

The specification of a common prior distribution for the country-specific parameters allows to share information across different nations, improving estimates for the age-at-death distributions. In addition, the introduction of a common prior shrinks estimates towards a common mean, providing an implicit regularization on the parameters. Such a shrinkage is what makes dysm a ‘coherent’ model, using a data-driven reference mortality level without the need of a user-defined benchmark.

2.4 Prior elicitation and posterior computation

The large abundance of research on the evolution of mortality, and the clear interpretation of the parameters in Equation 3, facilitate informative elicitation for the hyper-parameters of Equation 7. For example, European adult mortality is expected to lie between 30 and 70 years with high probability (e.g. Lewer et al., 2020); therefore, we can assume ϑ~j03𝒩(50,10) for the initial condition of ϑ~jt3, independently for j = 1, …, p. Similarly, informative specifications for the prior distribution on the drift parameter βjk allow to include information on the direction and intensity of the temporal trends. For instance, we could centre the trend βj3 around mβ3=2, resulting in an expected increasing trend for the age at adult mortality of 2 years between consecutive time points. In Section 3.4, we compare such informative prior specification with a non-informative elicitation, confirming the importance of the proposed approach to induce shrinkage across countries and obtain coherent predictions.

There is a large literature on simulation methods for non-Gaussian state-space models; see, for example, Durbin and Koopman (2012 chapter 13.3), Prado and West (2010 chapter 6) or Chopin and Papaspiliopoulos (2020) for recent developments. A major complexity in our methods derive from the computation of the age-at-death distribution, which links ϑjt to pjt via Equation (1). Therefore, we conduct posterior inference relying on the r package nimble (de Valpine et al., 2020), that allows to specify the main structure of dysm, compile it in c++, and implement an adaptive Metropolis-within-Gibbs mcmc algorithm to simulate from the posterior distribution of the model's parameters. Specifically, the parameters of Equation (7) are updated using Gibbs steps, while the remaining are updated with multivariate Metropolis steps, due to the lack of conjugacy induced by Equation (1). Pseudo-code illustrating the main steps to conduct posterior inference is reported in the Supplementary materials. Forecasts for future time points t > T can be simulated recursively from the posterior predictive distribution, continuing the conditional representation of Equation (6) for each draw of the mcmc output.

In order to improve the computational performance, we included further modifications to efficiently compute the cdf of the mixture density function in Expression (3) using c++. This operation is non-trivial since it requires to compute the cdf of the skew-normal component, which relies on the Owen (1956) T-function; see also Azzalini and Capitanio (2013 Appendix B) for the details and Patefield and Tandy (2000) for practical considerations. Overall, posterior computations in a setting with p = 12 countries and T = 20 time points requires roughly 2 min per 1000 iterations on an 8-core intel I7-7700HQ, 2.8 ghz processor running Linux, which is satisfactory for our purposes.

3 European Countries Mortality

3.1 Data description

The focus of this section is to evaluate the ability of dysm to characterize the age-at-death distributions of different European countries. In Section 3.2, we are interested in comparing its performance with some state-of-the-art methods for mortality forecast, focusing on rolling windows scenarios. In Section 3.3 we use the proposed method to estimate mortality and to forecasts for future years, illustrating in details how dysm can be used to draw thoughtful inferential conclusions by making inference on different quantities. Analysis will focus on the time range 1960–2016, and on joint modelling the following p = 12 countries: Austria (aut), Switzerland (che), England & Wales (gbr), Germany (deu), Denmark (dnk), Finland (fin), France (fra), Italy (ita), Netherlands (nld), Norway (nor), Spain (esp) and Sweden (swe), separately for the male and female populations. These countries were selected according to the European countries analysed in other approaches, such as Li and Lee (2005). Mortality data are retrieved from the Human Mortality Database (2020), and the year- and country-specific number of deaths are obtained with the utilities of the r package demograpy (Hyndman et al., 2019).

3.2 Performance evaluation via rolling window

In order to quantitatively evaluate the performance of dysm in estimating and forecasting mortality, we rely on rolling window scenarios. Specifically, analysis focus on sub-periods of 20 years of data to fit the models, and subsequent periods of 10 years of data to provide forecasts. Performance is evaluated both in terms of in-sample accuracy—comparing the model fit with the 20 observed years—and in terms of forecasts ability—comparing predictions with the out-of-sample block for the subsequent 10 years. Multiple scenarios are obtained rolling the window forward of steps of 1 years, thereby dividing the range 1960–2016 in 30 consecutive scenarios.

We compare dysm with different popular approaches for mortality forecasting: the Lee-Carter model (Lee & Carter, 1992), the coherent Li-Lee model (Li & Lee, 2005), the functional mortality model of Hyndman–Ullah (Hyndman & Ullah, 2007), the compositional Oeppen model (Oeppen, 2008) and the mem5 model (Pascariu et al., 2019). All methods are conveniently available in a common interface within the r package MortalityForecast (Pascariu, 2020). Since these methods focus on modelling single population data, we perform the analysis on each country separately. In order to compare the approaches on the same basis, results are evaluated in terms of the quality of the fitted and predicted age-at-death distributions and death rates. Under dysm, such estimates are obtained post-processing the mcmc output, computing the posterior distribution of Equation 1 via Monte Carlo integration. For the remaining approaches, we follow standard practices and transform predictions into death rates and age-at-death distributions (e.g. Pascariu, 2020).

Posterior inference proceeds separately for the female and male populations, relying on informative prior distributions. We elicited the initial conditions ϑ~j0k for the hierarchical dynamic component considering the expected level of mortality in the temporal window under investigation, centring old-age mortality around 70 years (m5=70), adult mortality around 50 years (m3=50) and setting the remaining values of mk to 0. We also let sk=10 for all k to induce sufficient variability in such initial prior guesses and cover the temporal window under investigation. Lastly, we specified a shared standard Gaussian for the drift parameters βjk (setting mβk=0 and sβk=1) and a weakly informative Inverse-Gamma on the variances of the dynamic innovations ηjk2, setting ak=0.01,bk=0.01 for all k. In each scenario, posterior inference for dysm relies on 100000 iterations collected after a burn-in of 5000, thinning one iteration every 5. Convergence was assessed in terms of mixing, autocorrelation of the chains and Geweke diagnostics, which resulted satisfactory in all the windows considered, with an effective sample size larger than 18000 of 20000 stored iterations for all the parameters.

Performance is evaluated in terms of mean squared error (mse) and mean absolute error (mae) between the fitted and predicted age-at-death distributions or death rates and the ground truth. Since the elements of the age-at-death distribution are generally very small numbers, we report results in relative terms, dividing the performance of each approach by the performance obtained by dysm in each scenario and country. Recalling that both mse and mae are measure of discrepancy, results larger than 1 indicate that predictions under dysm achieve an error which is lower than the competitor, and therefore we can conclude that dysm performs better; the opposite holds for values lower than 1.

Results referring to in-sample evaluation are reported in Table 1, computing the median, first and third quartiles of the relative performances across countries and rolling windows scenarios. Current results suggest that dysm outperforms the competitors both for the male and female populations, with a performance that is better than the other approaches in all settings. These results confirm the ability of dysm to characterize with flexibility age-at-death distributions across multiple countries and time periods, providing estimates that are consistent with the observed data and more accurate than state-of-the art methods. When dysm is used for fitting death rates, a relatively worse performance is observed, particularly in terms of mae. This result can be explained considering that old ages (80–100) are assigned more weight in the computation of death rates, compared to other quantities that characterize mortality (e.g. Keyfitz, 2005). Therefore, dysm performs particularly well in characterizing premature and adult mortality, and this leads to large improvements for modelling the age-at-death distribution, in particular for male populations where the share of adult mortality is relatively large.

TABLE 1

In-sample relative performance in estimating the age-at-death distribution (top) and death rates (bottom), comparing predictions of dysm against the competitors. Median across rolling windows and countries. First and third quartiles are reported in squared brackets

femalemale
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.41 [1.17, 1.64]2.73 [1.91, 3.80]1.92 [1.39, 2.40]5.37 [2.76, 7.79]
Lee-Carter1.40 [1.17, 1.62]2.74 [1.90, 3.79]1.93 [1.39, 2.36]5.34 [2.77, 7.77]
Li-Lee1.23 [0.94, 1.45]2.19 [1.22, 2.97]1.75 [1.30, 2.28]4.71 [2.53, 6.99]
mem51.22 [1.07, 1.31]2.01 [1.55, 2.55]1.86 [1.56, 2.19]4.20 [3.01, 6.61]
Oeppen1.13 [0.92, 1.36]1.88 [1.31, 2.66]1.61 [1.24, 2.14]3.79 [2.24, 6.37]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah0.86 [0.76, 0.95]1.02 [0.86, 1.11]0.84 [0.79, 0.92]1.02 [0.85, 1.19]
Lee-Carter0.91 [0.79, 1.06]1.08 [0.88, 1.86]0.87 [0.80, 1.00]1.07 [0.87, 1.58]
Li-Lee1.02 [0.71, 1.26]1.81 [0.71, 5.42]0.99 [0.77, 1.17]1.90 [0.95, 3.89]
mem51.06 [0.97, 1.14]0.99 [0.90, 1.09]1.01 [0.96, 1.05]1.02 [0.95, 1.08]
Oeppen1.27 [1.03, 4.71]3.87 [1.10, 6.51]1.09 [0.89, 2.98]1.32 [0.81, 7.01]
femalemale
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.41 [1.17, 1.64]2.73 [1.91, 3.80]1.92 [1.39, 2.40]5.37 [2.76, 7.79]
Lee-Carter1.40 [1.17, 1.62]2.74 [1.90, 3.79]1.93 [1.39, 2.36]5.34 [2.77, 7.77]
Li-Lee1.23 [0.94, 1.45]2.19 [1.22, 2.97]1.75 [1.30, 2.28]4.71 [2.53, 6.99]
mem51.22 [1.07, 1.31]2.01 [1.55, 2.55]1.86 [1.56, 2.19]4.20 [3.01, 6.61]
Oeppen1.13 [0.92, 1.36]1.88 [1.31, 2.66]1.61 [1.24, 2.14]3.79 [2.24, 6.37]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah0.86 [0.76, 0.95]1.02 [0.86, 1.11]0.84 [0.79, 0.92]1.02 [0.85, 1.19]
Lee-Carter0.91 [0.79, 1.06]1.08 [0.88, 1.86]0.87 [0.80, 1.00]1.07 [0.87, 1.58]
Li-Lee1.02 [0.71, 1.26]1.81 [0.71, 5.42]0.99 [0.77, 1.17]1.90 [0.95, 3.89]
mem51.06 [0.97, 1.14]0.99 [0.90, 1.09]1.01 [0.96, 1.05]1.02 [0.95, 1.08]
Oeppen1.27 [1.03, 4.71]3.87 [1.10, 6.51]1.09 [0.89, 2.98]1.32 [0.81, 7.01]
TABLE 1

In-sample relative performance in estimating the age-at-death distribution (top) and death rates (bottom), comparing predictions of dysm against the competitors. Median across rolling windows and countries. First and third quartiles are reported in squared brackets

femalemale
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.41 [1.17, 1.64]2.73 [1.91, 3.80]1.92 [1.39, 2.40]5.37 [2.76, 7.79]
Lee-Carter1.40 [1.17, 1.62]2.74 [1.90, 3.79]1.93 [1.39, 2.36]5.34 [2.77, 7.77]
Li-Lee1.23 [0.94, 1.45]2.19 [1.22, 2.97]1.75 [1.30, 2.28]4.71 [2.53, 6.99]
mem51.22 [1.07, 1.31]2.01 [1.55, 2.55]1.86 [1.56, 2.19]4.20 [3.01, 6.61]
Oeppen1.13 [0.92, 1.36]1.88 [1.31, 2.66]1.61 [1.24, 2.14]3.79 [2.24, 6.37]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah0.86 [0.76, 0.95]1.02 [0.86, 1.11]0.84 [0.79, 0.92]1.02 [0.85, 1.19]
Lee-Carter0.91 [0.79, 1.06]1.08 [0.88, 1.86]0.87 [0.80, 1.00]1.07 [0.87, 1.58]
Li-Lee1.02 [0.71, 1.26]1.81 [0.71, 5.42]0.99 [0.77, 1.17]1.90 [0.95, 3.89]
mem51.06 [0.97, 1.14]0.99 [0.90, 1.09]1.01 [0.96, 1.05]1.02 [0.95, 1.08]
Oeppen1.27 [1.03, 4.71]3.87 [1.10, 6.51]1.09 [0.89, 2.98]1.32 [0.81, 7.01]
femalemale
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.41 [1.17, 1.64]2.73 [1.91, 3.80]1.92 [1.39, 2.40]5.37 [2.76, 7.79]
Lee-Carter1.40 [1.17, 1.62]2.74 [1.90, 3.79]1.93 [1.39, 2.36]5.34 [2.77, 7.77]
Li-Lee1.23 [0.94, 1.45]2.19 [1.22, 2.97]1.75 [1.30, 2.28]4.71 [2.53, 6.99]
mem51.22 [1.07, 1.31]2.01 [1.55, 2.55]1.86 [1.56, 2.19]4.20 [3.01, 6.61]
Oeppen1.13 [0.92, 1.36]1.88 [1.31, 2.66]1.61 [1.24, 2.14]3.79 [2.24, 6.37]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah0.86 [0.76, 0.95]1.02 [0.86, 1.11]0.84 [0.79, 0.92]1.02 [0.85, 1.19]
Lee-Carter0.91 [0.79, 1.06]1.08 [0.88, 1.86]0.87 [0.80, 1.00]1.07 [0.87, 1.58]
Li-Lee1.02 [0.71, 1.26]1.81 [0.71, 5.42]0.99 [0.77, 1.17]1.90 [0.95, 3.89]
mem51.06 [0.97, 1.14]0.99 [0.90, 1.09]1.01 [0.96, 1.05]1.02 [0.95, 1.08]
Oeppen1.27 [1.03, 4.71]3.87 [1.10, 6.51]1.09 [0.89, 2.98]1.32 [0.81, 7.01]

Out-of-sample performances are instead reported in Table 2. Results provide further evidence in favour of the ability of dysm, which obtains out-of-sample forecasts more accurate than the competitors. This important results suggest that, overall, dysm is the most accurate methods for modelling and forecasting the age-at-death distribution in the 12 European countries under investigation. Performances stratified by countries are reported in the Supplementary materials.

TABLE 2

Out-of-sample relative performance in estimating the age-at-death distribution (top) and death rates (bottom), comparing predictions of dysm against the competitors. Median across rolling windows and countries. First and third quartiles are reported in squared brackets

femalesmales
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.38 [1.14, 1.69]1.99 [1.39, 2.88]1.91 [1.12, 2.48]3.64 [1.42, 6.36]
Lee-Carter1.30 [1.03, 1.62]1.75 [1.12, 2.65]1.73 [1.06, 2.37]3.23 [1.25, 5.54]
Li-Lee1.28 [0.91, 1.56]1.66 [0.93, 2.54]1.73 [1.06, 2.36]3.23 [1.25, 5.53]
mem52.44 [2.02, 3.03]5.67 [3.98, 8.88]2.84 [1.70, 3.88]7.92 [3.23, 9.96]
Oeppen1.29 [0.94, 1.82]2.30 [1.29, 9.34]1.66 [1.06, 2.36]3.86 [1.45, 8.11]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah1.02 [0.88, 1.10]1.35 [1.18, 1.70]0.96 [0.89, 1.05]1.24 [1.05, 1.47]
Lee-Carter1.12 [0.95, 2.66]1.73 [1.22, 8.96]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
Li-Lee1.12 [0.95, 2.66]1.73 [1.22, 7.95]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
mem51.06 [0.89, 1.19]1.03 [0.76, 1.48]1.32 [1.03, 1.58]1.05 [0.83, 1.41]
Oeppen1.67 [1.24, 4.07]3.99 [1.53, 9.44]1.29 [1.10, 1.69]1.97 [1.36, 6.35]
femalesmales
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.38 [1.14, 1.69]1.99 [1.39, 2.88]1.91 [1.12, 2.48]3.64 [1.42, 6.36]
Lee-Carter1.30 [1.03, 1.62]1.75 [1.12, 2.65]1.73 [1.06, 2.37]3.23 [1.25, 5.54]
Li-Lee1.28 [0.91, 1.56]1.66 [0.93, 2.54]1.73 [1.06, 2.36]3.23 [1.25, 5.53]
mem52.44 [2.02, 3.03]5.67 [3.98, 8.88]2.84 [1.70, 3.88]7.92 [3.23, 9.96]
Oeppen1.29 [0.94, 1.82]2.30 [1.29, 9.34]1.66 [1.06, 2.36]3.86 [1.45, 8.11]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah1.02 [0.88, 1.10]1.35 [1.18, 1.70]0.96 [0.89, 1.05]1.24 [1.05, 1.47]
Lee-Carter1.12 [0.95, 2.66]1.73 [1.22, 8.96]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
Li-Lee1.12 [0.95, 2.66]1.73 [1.22, 7.95]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
mem51.06 [0.89, 1.19]1.03 [0.76, 1.48]1.32 [1.03, 1.58]1.05 [0.83, 1.41]
Oeppen1.67 [1.24, 4.07]3.99 [1.53, 9.44]1.29 [1.10, 1.69]1.97 [1.36, 6.35]
TABLE 2

Out-of-sample relative performance in estimating the age-at-death distribution (top) and death rates (bottom), comparing predictions of dysm against the competitors. Median across rolling windows and countries. First and third quartiles are reported in squared brackets

femalesmales
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.38 [1.14, 1.69]1.99 [1.39, 2.88]1.91 [1.12, 2.48]3.64 [1.42, 6.36]
Lee-Carter1.30 [1.03, 1.62]1.75 [1.12, 2.65]1.73 [1.06, 2.37]3.23 [1.25, 5.54]
Li-Lee1.28 [0.91, 1.56]1.66 [0.93, 2.54]1.73 [1.06, 2.36]3.23 [1.25, 5.53]
mem52.44 [2.02, 3.03]5.67 [3.98, 8.88]2.84 [1.70, 3.88]7.92 [3.23, 9.96]
Oeppen1.29 [0.94, 1.82]2.30 [1.29, 9.34]1.66 [1.06, 2.36]3.86 [1.45, 8.11]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah1.02 [0.88, 1.10]1.35 [1.18, 1.70]0.96 [0.89, 1.05]1.24 [1.05, 1.47]
Lee-Carter1.12 [0.95, 2.66]1.73 [1.22, 8.96]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
Li-Lee1.12 [0.95, 2.66]1.73 [1.22, 7.95]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
mem51.06 [0.89, 1.19]1.03 [0.76, 1.48]1.32 [1.03, 1.58]1.05 [0.83, 1.41]
Oeppen1.67 [1.24, 4.07]3.99 [1.53, 9.44]1.29 [1.10, 1.69]1.97 [1.36, 6.35]
femalesmales
maemsemaemse
Age-at-death distr.
dysm1.001.001.001.00
Hyndman–Ullah1.38 [1.14, 1.69]1.99 [1.39, 2.88]1.91 [1.12, 2.48]3.64 [1.42, 6.36]
Lee-Carter1.30 [1.03, 1.62]1.75 [1.12, 2.65]1.73 [1.06, 2.37]3.23 [1.25, 5.54]
Li-Lee1.28 [0.91, 1.56]1.66 [0.93, 2.54]1.73 [1.06, 2.36]3.23 [1.25, 5.53]
mem52.44 [2.02, 3.03]5.67 [3.98, 8.88]2.84 [1.70, 3.88]7.92 [3.23, 9.96]
Oeppen1.29 [0.94, 1.82]2.30 [1.29, 9.34]1.66 [1.06, 2.36]3.86 [1.45, 8.11]
Death rates
dysm1.001.001.001.00
Hyndman–Ullah1.02 [0.88, 1.10]1.35 [1.18, 1.70]0.96 [0.89, 1.05]1.24 [1.05, 1.47]
Lee-Carter1.12 [0.95, 2.66]1.73 [1.22, 8.96]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
Li-Lee1.12 [0.95, 2.66]1.73 [1.22, 7.95]1.06 [0.94, 2.08]1.51 [1.14, 9.24]
mem51.06 [0.89, 1.19]1.03 [0.76, 1.48]1.32 [1.03, 1.58]1.05 [0.83, 1.41]
Oeppen1.67 [1.24, 4.07]3.99 [1.53, 9.44]1.29 [1.10, 1.69]1.97 [1.36, 6.35]

3.3 Results for European mortality

In Section 3.2, we have demonstrated that dysm outperforms the competitors in terms of the quality of the fitted and predicted age-at-death distributions. In this section we argue that the proposed method has concrete practical benefits in terms of mortality modelling also from an interpretative point of view. Indeed, posterior inference on the parameters characterizing dysm and their functionals provides insights on the structure of mortality and its future evolution, highlighting aspects that have larger impact in such changes and explaining differences across countries.

To illustrate this, we focus on the same European countries as in the previous section, modelling the more recent period 1980–2017 and providing forecasts until 2037. Posterior simulation for this application relies on the same settings as in Section 3.2. Results in terms of mixing, autocorrelations, Geweke diagnostic and effective sample size were satisfactory for all the parameters considered, with an effective sample size larger than 19000 of 20000 iterations for all the parameter. We conduct inference on the set of parameters (π0jt,π1jt,π2jt,μjt,σjt,αjt,μ~jtSN,σ~jtSN) to further facilitate interpretation. Results are reported in Figure 5, illustrating the posterior medians and 90% credible intervals for the estimated parameters for male and female populations separately.

Posterior medians (solid lines) and 90% credible intervals (dotted lines) for the parameters characterizing dysm and functionals of interest. Black vertical dotted lines indicate the last observed period (2017)
FIGURE 5

Posterior medians (solid lines) and 90% credible intervals (dotted lines) for the parameters characterizing dysm and functionals of interest. Black vertical dotted lines indicate the last observed period (2017)

Current empirical findings are consistent with known aspects of the temporal evolution of mortality of the past decades. For example, the impact of infant mortality—measured via the parameters π0jt—is notably decreased in all countries in the past years, dropping from values around 0.01 to 0.005. Although both male and female populations share such decreasing trends, we also observe that infant mortality is larger for males than females (e.g. Drevenstedt et al., 2008). Another clear aspect involves the evolution of the mean age of old-age mortality, captured via the skew-normal mean μ~jtSN, which has constantly increased in all countries. The overall trend is similar between males and females, with woman reporting older ages on average. Similarly, the decreasing trend of σ~jtSN is consistent with the empirical evidence on compression of old-age mortality.

Under dysm, estimates and credible intervals for any complex functional of the model's parameters are easily obtained. More importantly, prediction intervals are determined taking into account the uncertainty of the entire inferential process, which is not the case for other coherent models, where all the uncertainty related to choice of reference is generally not accounted for. Some interesting functionals include death rates and life expectancies, which can be obtained from the age-at-death distribution via deterministic transformations; dysm allows to conduct coherent inference also on this quantities, rigorously characterizing uncertainty post-processing the mcmc output. As an illustrative example, Figures 6 and 7 depict the life expectancy and the death probability, respectively, computed at different ages and considering male and female populations in Italy and Sweden. Results are consistent with the conclusions outlines above, confirming regular patters of mortality such as non-overlapping trends for males and females and an overall improvement at all ages.

Life expectancy at age 0 (at birth), 20, 40, 60, 70 and 80. Italy and Sweden, female and male population. Estimates are obtaining post-processing the mcmc for the age-at-death distribution. Solid and dashed lines denote posterior medians and 90% credible intervals, respectively
FIGURE 6

Life expectancy at age 0 (at birth), 20, 40, 60, 70 and 80. Italy and Sweden, female and male population. Estimates are obtaining post-processing the mcmc for the age-at-death distribution. Solid and dashed lines denote posterior medians and 90% credible intervals, respectively

Logarithm of death probability at age 50, 60, 70, 80 and 90. Italy and Sweden, female and male population. Estimates are obtaining post-processing the mcmc for the age-at-death distribution. Solid and dashed lines denote posterior medians and 90% credible intervals, respectively
FIGURE 7

Logarithm of death probability at age 50, 60, 70, 80 and 90. Italy and Sweden, female and male population. Estimates are obtaining post-processing the mcmc for the age-at-death distribution. Solid and dashed lines denote posterior medians and 90% credible intervals, respectively

3.4 Alternative parametric specifications

In this section we evaluate the robustness of the parametric assumptions underlying dysm, focusing on the application outlined in Section 3.3. We investigate different specifications for the mixture components characterizing dysm and alternative prior specifications. Specifically, we modify dysm by

  • modelling old-age mortality with a rescaled Beta distribution, supported in the interval [75–110], instead of a skew-normal distribution;

  • modelling infant mortality with an half-normal distribution, as in Zanotto et al. (2020), instead of a single Dirac mass at age 0;

  • removing the Gaussian component accounting for adult mortality, setting π1jt=0;

  • characterizing the joint evolution of ϑ~jt with a multivariate random walk, instead of independent innovations;

  • characterizing the evolution of ϑ~jt via random walks with Student-t innovations with 5 degrees of freedom, instead of Gaussian distributions;

  • replacing informative priors distributions in Equation 7 with flat uniform priors.

All modifications are implemented and estimated with nimble, relying on the same settings as in Section 3.3. Model evaluation is performed via marginal likelihood—estimated via the harmonic mean estimator (e.g. Raftery et al., 2007)—and comparing the mse and mae of predictionsobtained via posterior medians. The empirical findings are reported in Table 3, and suggest that the current implementation for dysm is the best performing model in terms of marginal likelihood and other measures. For example, modelling old-age mortality with a scaled Beta or characterizing infant mortality with an half-normal distribution do not improve the fit of the current specification for dysm; similar results are obtained with more complex specifications for the dynamic prior distribution or non-informative elicitation, confirming the importance of the proposed specifications. Interestingly, current results indicate that the specification without adult mortality performs better for female than for male mortality, confirming that the inclusion of a separate component devoted to adult mortality is fundamental for modelling male populations, but also provides concrete benefits for females.

TABLE 3

Evaluation of model fit: − log p(D) refer to the logarithm of the marginal likelihood, changed of sign; mse and mae refer to mean squared error and mean absloute error of the posterior medians respectively. Smaller values indicate better fit

malesfemales
− log p(D)maemse− log p(D)maemise
dysm  55.650.10  10.1946.20.08  10.21
Beta (old-age mortality)453.550.13  33.83307.530.92  98.72
Half-normal (infant mortality)176.030.15    9.79164.070.95  98.72
Removing adult mortality264.200.88  30.7988.230.74  83.74
Multivariate random-walk301.528.78116.76205.524.75135.77
Student-t innovations254.070.90  10.26178.230.91  96.24
dysm, uniform prior359.702.35175.14466.522.01121.61
malesfemales
− log p(D)maemse− log p(D)maemise
dysm  55.650.10  10.1946.20.08  10.21
Beta (old-age mortality)453.550.13  33.83307.530.92  98.72
Half-normal (infant mortality)176.030.15    9.79164.070.95  98.72
Removing adult mortality264.200.88  30.7988.230.74  83.74
Multivariate random-walk301.528.78116.76205.524.75135.77
Student-t innovations254.070.90  10.26178.230.91  96.24
dysm, uniform prior359.702.35175.14466.522.01121.61
TABLE 3

Evaluation of model fit: − log p(D) refer to the logarithm of the marginal likelihood, changed of sign; mse and mae refer to mean squared error and mean absloute error of the posterior medians respectively. Smaller values indicate better fit

malesfemales
− log p(D)maemse− log p(D)maemise
dysm  55.650.10  10.1946.20.08  10.21
Beta (old-age mortality)453.550.13  33.83307.530.92  98.72
Half-normal (infant mortality)176.030.15    9.79164.070.95  98.72
Removing adult mortality264.200.88  30.7988.230.74  83.74
Multivariate random-walk301.528.78116.76205.524.75135.77
Student-t innovations254.070.90  10.26178.230.91  96.24
dysm, uniform prior359.702.35175.14466.522.01121.61
malesfemales
− log p(D)maemse− log p(D)maemise
dysm  55.650.10  10.1946.20.08  10.21
Beta (old-age mortality)453.550.13  33.83307.530.92  98.72
Half-normal (infant mortality)176.030.15    9.79164.070.95  98.72
Removing adult mortality264.200.88  30.7988.230.74  83.74
Multivariate random-walk301.528.78116.76205.524.75135.77
Student-t innovations254.070.90  10.26178.230.91  96.24
dysm, uniform prior359.702.35175.14466.522.01121.61

4 Conclusion

In this article we have introduced a dynamic model based on a skewed distribution function (dysm) for modelling and forecasting the age-at-death distribution of multiple countries across time. The proposed method automatically regularize diverging scenarios, and allows to obtain predictions which are coherent across sub-populations. In addition, this strategy leads to a borrowing of information which improved the quality of the fit and of the forecasts, compared with state-of-the-art methods for mortality forecasting. Another important advantage of the proposed method is that it provides coherent forecasts without the need of choosing a reference population, which is instead naturally derived from data. The choice of reference population, in fact, is still an open issue of coherent forecast models and has received notable attention from scholars in recent years (Booth, 2020; Kjærgaard et al., 2016).

We have focused directly on the age distribution for the number of deaths, considering the total number of deaths as a given quantity and relying on a multinomial likelihood. This approach allows to focus directly on the distributions of deaths across different ages, accounting for the heterogeneity in the overall mortality trends. For future developments, it might be useful to include a further dynamic level in the hierarchy and model the process of death counts njt. Such an aim can be achieved, for example, using a hierarchical Poisson or Negative-Binomial dynamic model.

Data Availability Statement

The data that support the findings of this study are publicly available at the Human Mortality Database, at https://urldefense.com/v3/__https://www.mortality.org/__;!!N11eV2iwtfs!vLl4T8uK9UTbafiRG3tuesmjmbYYzfLL2h8NR80WddPeKqXfDTXjlO9TZQBmHTBQQTnm-6NXgMNE0hdYmw$. Code implementing the method is available at https://urldefense.com/v3/__https://github.com/emanuelealiverti/dysm__;!!N11eV2iwtfs!vLl4T8uK9UTbafiRG3tuesmjmbYYzfLL2h8NR80WddPeKqXfDTXjlO9TZQBmHTBQQTnm-6NXgMMCJEFYng$

Supporting information

Additional supporting information may be found in the on-line version of the article at the publisher’s website.

Acknowledgements

This work was supported by the grant miur–prin 2017 project 20177BR-JXS. The authors thank David Dunson, Sally Paganin and Tommaso Rigon for their comments and the fruitful discussions.

References

1

Azzalini
,
A.
&
Capitanio
,
A.
(
2013
)
The skew-normal and related families
, volume 3.
Cambridge
:
Cambridge University Press
.

2

Basellini
,
U.
&
Camarda
,
C.G.
(
2019
)
Modelling and forecasting adult age-at-death distributions
.
Population Studies
,
73
(
1
),
119
138
.

3

Basellini
,
U.
&
Camarda
,
C.G.
(
2020
) A three-component approach to model and forecast age at-death distributions. In:
Mazzuco
,
S.
&
Keilman
,
N.
(Eds.)
Developments in demographic forecasting
,
Cham
:
Springer International Publishing
pp.
105
129
.

4

Bergeron-Boucher
,
M.-P.
,
Canudas-Romo
,
V.
,
Oeppen
,
J.
&
Vaupel
,
J.W.
(
2017
)
Coherent forecasts of mortality with compositional data analysis
.
Demographic Research
,
37
,
527
566
.

5

Booth
,
H.
(
2020
) Coherent mortality forecasting with standards: low mortality serves as a guide. In:
Mazzuco
,
S.
&
Keilman
,
N.
(Eds.)
Developments in demographic forecasting
,
Cham
:
Springer International Publishing
, pp.
153
178
.

6

Canudas-Romo
,
V.
(
2010
)
Three measures of longevity: time trends and record values
.
Demography
,
47
(
2
),
299
312
.

7

Carriere
,
J.F.
(
1992
)
Parametric models for life tables
.
Transactions of the Society of Actuaries
,
44
,
77
99
.

8

Cheung
,
S.L.K.
,
Robine
,
J.-M.
,
Tu
,
E.J.-C.
&
Caselli
,
G.
(
2005
)
Three dimensions of the survival curve: horizontalization, verticalization, and longevity extension
.
Demography
,
42
(
2
),
243
258
.

9

Chopin
,
N.
&
Papaspiliopoulos
,
O.
(
2020
)
An introduction to sequential Monte Carlo
.
Berlin
:
Springer International Publishing
.

10

Dellaportas
,
P.
,
Smith
,
A.F.
&
Stavropoulos
,
P.
(
2001
)
Bayesian analysis of mortality data
.
Journal of the Royal Statistical Society: Series A (Statistics in Society)
,
164
(
2
),
275
291
.

11

Drevenstedt
,
G.L.
,
Crimmins
,
E.M.
,
Vasunilashorn
,
S.
&
Finch
,
C.E.
(
2008
)
The rise and fall of excess male infant mortality
.
Proceedings of the National Academy of Sciences
,
105
(
13
),
5016
5021
.

12

Durbin
,
J.
&
Koopman
,
S.J.
(
2012
)
Time series analysis by state space methods
.
Oxford
:
Oxford university press
.

13

Gompertz
,
B.
(
1825
)
On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies
.
Philosophical Transactions of the Royal Society of London
,
115
,
513
583
.

14

Heligman
,
L.
&
Pollard
,
J.H.
(
1980
)
The age pattern of mortality
.
Journal of the Institute of Actuaries
,
107
(
1
),
49
80
.

15

Human Mortality Database
. (
2020
)
University of California Berkeley, USA and Max Planck Institute for Demographic Research, Germany
. data downloaded on October 2020.

16

Hyndman
,
R.J.
&
Ullah
,
M.S.
(
2007
)
Robust forecasting of mortality and fertility rates: a functional data approach
.
Computational Statistics & Data Analysis
,
51
(
10
),
4942
4956
.

17

Hyndman
,
R.J.
,
Booth
,
H.
&
Yasmeen
,
F.
(
2013
)
Coherent mortality forecasting: the product-ratio method with functional time series models
.
Demography
,
50
(
1
),
261
283
.

18

Hyndman
,
R.J.W.C.F.H.B.
,
Tickle
,
L.
&
Maindonald
,
J.
(
2019
)
demography: Forecasting Mortality, Fertility, Migration and Population Data
. R package version 1.22.

19

Keyfitz
,
N.
(
2005
)
Applied mathematical demography
.
Berlin
:
Springer
.

20

Kjærgaard
,
S.
,
Canudas-Romo
,
V.
&
Vaupel
,
J.W.
(
2016
)
The importance of the reference populations for coherent mortality forecasting models
. In European Population Conference.

21

Klein
,
J.P.
&
Moeschberger
,
M.L.
(
2006
)
Survival analysis: techniques for censored and truncated data
.
Berlin
:
Springer Science & Business Media
.

22

Lee
,
R.D.
&
Carter
,
L.R.
(
1992
)
Modeling and forecasting us mortality
.
Journal of the American Statistical Association
,
87
(
419
),
659
671
.

23

Lewer
,
D.
,
Jayatunga
,
W.
,
Aldridge
,
R.W.
,
Edge
,
C.
,
Marmot
,
M.
,
Story
,
A.
et al. (
2020
)
Premature mortality attributable to socioeconomic inequality in england between 2003 and 2018: an observational study
.
The Lancet Public Health
,
5
(
1
),
e33
e41
.

24

Lexis
,
W.H.R.A.
(
1879
)
Sur la durée normale de la vie humaine et sur la théorie de la stabilité des rapports statistiques
. Vve. F. Henry.

25

Li
,
J.
(
2013
)
A Poisson common factor model for projecting mortality and life expectancy jointly for females and males
.
Population Studies
,
67
(
1
),
111
126
.

26

Li
,
N.
&
Lee
,
R.
(
2005
)
Coherent mortality forecasts for a group of populations: an extension of the Lee-Carter method
.
Demography
,
42
(
3
),
575
594
.

27

Lutz
,
W.
&
Samir
,
K.
(
2010
)
Dimensions of global population projections: what do we know about future population trends and structures
?
Philosophical Transactions of the Royal Society B: Biological Sciences
,
365
(
1554
),
2779
2791
.

28

Makeham
,
W.M.
(
1860
)
On the law of mortality and the construction of annuity tables
.
Journal of the Institute of Actuaries
,
8
(
6
),
301
310
.

29

Mazzuco
,
S.
,
Scarpa
,
B.
&
Zanotto
,
L.
(
2018
)
A mortality model based on a mixture distribution function
.
Population Studies
,
72
(
2
),
191
200
.

30

Oeppen
,
J.
(
2008
)
Coherent forecasting of multiple-decrement life tables: a test using Japanese cause of death data
. Compositional data analysis conference.

31

Owen
,
D.B.
(
1956
)
Tables for computing bivariate normal probabilities
.
The Annals of Mathematical Statistics
,
27
(
4
),
1075
1090
.

32

Pascariu
,
M.D.
(
2020
)
MortalityForecast: Standard Tools to Compare and Evaluate Various Mortality Forecasting Methods
. R package version 0.8.0.

33

Pascariu
,
M.D.
,
Lenart
,
A.
&
Canudas-Romo
,
V.
(
2019
)
The maximum entropy mortality model: forecasting mortality using statistical moments
.
Scandinavian Actuarial Journal
,
2019
(
8
),
661
685
.

34

Patefield
,
M.
&
Tandy
,
D.
(
2000
)
Fast and accurate calculation of Owen T-function
.
Journal of Statistical Software
,
5
(
5
),
1
25
.

35

Pearson
,
K.
(
1897
)
The chances of death, and other studies in evolution
, volume 2.
London, England
:
E. Arnold
.

36

Prado
,
R.
&
West
,
M.
(
2010
)
Time series: modeling, computation, and inference
.
Boca Raton
:
CRC Press
.

37

Raftery
,
A.E.
,
Newton
,
M.A.
,
Satagopan
,
J.M.
&
Krivitsky
,
P.N.
(
2007
)
Estimating the integrated likelihood via posterior simulation using the harmonic mean identity
.
Bayesian Statistics
,
8
,
1
45
.

38

Siler
,
W.
(
1983
)
Parameters of mortality in human populations with widely varying life spans
.
Statistics in Medicine
,
2
(
3
),
373
380
.

39

de Valpine
,
P.
,
Paciorek
,
C.
,
Turek
,
D.
,
Michaud
,
N.
,
Anderson-Bergman
,
C.
,
Obermeyer
,
F.
, et al. (
2020
)
nimble: mcmc, particle filtering, and programmable hierarchical modeling
. R package version 0.9.1.

40

West
,
M.
&
Harrison
,
J.
(
1997
)
Bayesian forecasting and dynamic models
.
Berlin
:
Springer Science & Business Media
.

41

Wilson
,
C.
(
2001
)
On the scale of global demographic convergence 1950–2000
.
Population and Development Review
,
27
(
1
),
155
171
.

42

Zanotto
,
L.
,
Canudas-Romo
,
V.
&
Mazzuco
,
S.
(
2020
)
A mixture-function mortality model: illustration of the evolution of premature mortality
.
European Journal of Population
,
37
(
1
),
1
27
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)