A Bayesian semi-parametric approach for inference on the population partly conditional mean from longitudinal data with dropout

Studies of memory trajectories using longitudinal data often result in highly non-representative samples due to selective study enrollment and attrition. An additional bias comes from practice effects that result in improved or maintained performance due to familiarity with test content or context. These challenges may bias study findings and severely distort the ability to generalize to the target population. In this study we propose an approach for estimating the finite population mean of a longitudinal outcome conditioning on being alive at a specific time point. We develop a flexible Bayesian semi-parametric predictive estimator for population inference when longitudinal auxiliary information is known for the target population. We evaluate sensitivity of the results to untestable assumptions and further compare our approach to other methods used for population inference in a simulation study. The proposed approach is motivated by 15-year longitudinal data from the Betula longitudinal cohort study. We apply our approach to estimate lifespan trajectories in episodic memory, with the aim to generalize findings to a target population.


Introduction
Studies of lifespan trajectories in memory using longitudinal data present numerous methodological challenges, including highly non-representative samples due to selective study enrollment and attrition (Weuve et al. 2015), and practice effects, which result in improved or maintained performance due to familiarity with test content or context (Salthouse 2010). These challenges may bias study findings and severely distort the ability to generalize to the target population, even in well-designed studies.
Attrition is inevitable, especially if individuals in the studied population are followed over a long time period. Standard methods often rely on the assumption of missing at random (MAR), which is invalid if the data are missing not at random (MNAR) or missing due to death. One natural way of dealing with MNAR missingness in any analysis of incomplete data is to explore sensitivity to deviations from the MAR assumption. This can be accomplished in a Bayesian setting by introducing sensitivity parameters that incorporate prior beliefs about the differences between respondents and non-respondents and assigning an appropriate prior distribution (Daniels and Hogan 2008). However, attrition due to death must be treated differently if a participant's death during follow-up truncates the outcome process (Kurland et al. 2009). For example, in research on aging patterns in memory, memory performance after death is not defined and not of interest to study. Joint models have been proposed to address the combination of dropout and death (Rizopoulos 2012). This so-called mortal cohort inference truncates participants' outcomes after death, either by conditioning on the sub-population still alive at that time point, i.e. partly conditional inference (Kurland 2005), or by conditioning on the sub-population that would survive irrespective of exposure, i.e. principal stratification (Frangakis and Rubin 2002; Frangakis et al. 2007). The latter approach has been proposed to handle truncation by death when interest is in estimating the survivors average causal effect (e.g. Josefsson et al. 2016, McGuinness et al. 2019, Shardell and Ferrucci 2018). In contrast, partly conditional inference has been proposed for estimating non-causal associations, for example by using augmented inverse probability weighting for longitudinal cohort data with truncation by death and MAR dropout (Wen, Terrera, and Seaman 2018) as well as MNAR dropout.
In addition, Li and Su (2018) proposed an approach for longitudinal semicompeting risk data with MNAR missingness and death.
This type of inference can, for example, be of interest when the studied outcome is related to the health of the elderly, and thereby to the need for health promotion and/or disease prevention programs, since this concerns all individuals who are alive. These methods, however, fail to account for the complex nature of the sampling design, such as selective study enrollment.
Post-stratification (PS) is a technique in survey analysis that uses auxiliary information on the finite population to adjust for known or expected discrepancies between the sample and the population. A classical weighting estimator for finite population inference is the Horvitz-Thompson (HT) estimator (Horvitz and Thompson 1952). Although the HT estimator is design unbiased, it can potentially be very inefficient. In contrast, a model-based (MB) approach specifies a model for the study outcome, usually a regression model, which is then used to make predictions for the population and, hence, for finite population quantities. Predictions are calculated by plugging the auxiliary variables for all units in the population into the working model. Although model-based inferences will generally outperform design-based approaches, valid inference requires correct model specification. This can be difficult when there is a large set of regressors, the relationship is non-linear and/or includes interaction terms, and there are multiple observation times.
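The contrast between the weighting and the model-based estimator can be illustrated with a minimal Python sketch. The function names, the one-regressor linear working model and the toy numbers are illustrative assumptions and are not taken from the study.

```python
from statistics import mean


def ht_mean(y_sample, pi_sample, pop_size):
    """Horvitz-Thompson estimate of the population mean: (1/N) * sum(y_i / pi_i)."""
    return sum(y / p for y, p in zip(y_sample, pi_sample)) / pop_size


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a single regressor, for illustration)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def mb_mean(x_sample, y_sample, x_population):
    """Model-based estimate: fit on the sample, predict for every population unit."""
    a, b = fit_line(x_sample, y_sample)
    return mean(a + b * xi for xi in x_population)


# Toy population of N = 4 units with auxiliary variable x; two units are sampled,
# each with an (assumed known) inclusion probability of 0.5.
x_pop = [0.0, 1.0, 2.0, 3.0]
x_s, y_s, pi_s = [0.0, 2.0], [1.0, 5.0], [0.5, 0.5]

ht_est = ht_mean(y_s, pi_s, pop_size=len(x_pop))
mb_est = mb_mean(x_s, y_s, x_pop)
```

The HT estimate uses only the sampled outcomes and their inclusion probabilities, whereas the model-based estimate also uses the auxiliary variable for the unsampled units, which is why its validity depends on the working model.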
Several approaches for poststratification adjustment in a cross-sectional setting have shown improved performance compared to the HT and the MB regression estimator.
Multilevel regression combined with poststratification (MRP; e.g. Little 1997, Park, Gelman, and Bafumi 2004) has shown improved performance compared to standard weighting and MB regression estimators, and was used for producing accurate population estimates from a nonrepresentative sample of Xbox computer games users (Wang et al. 2015). The general regression estimator (GREG; Deville and Särndal 1992) is a dual-modelling strategy that combines prediction and weighting. The approach requires both a model for the outcome and a model for the participation mechanism, and is doubly robust in the sense that it remains consistent if either one of the models is correctly specified. Bisbee (2019) combined a semi-parametric machine learning approach (Bayesian Additive Regression Trees (BART); Chipman, George, McCulloch, et al. 2010) with poststratification for predicting opinions using cross-sectional data. Kern et al. (2016) showed in a simulation study that BART and inverse probability weighting using random forests performed better than approximately doubly robust estimators for estimating the target population average treatment effect. Although they show improved performance compared to MRP and GREG, previous semi-parametric approaches for population inference are not valid for longitudinal data with dropout and deaths.
In this study we propose an approach for estimating the finite population mean of a longitudinal continuous outcome conditioning on being alive at a specific time point, i.e. the population partly conditional mean (PPCM). Specifically, we develop a flexible Bayesian semi-parametric predictive estimator for the case where longitudinal auxiliary information is known for all units in the target population. The approach is to specify observed data models using BART and then to use assumptions with embedded sensitivity parameters to identify and estimate the PPCM. We evaluate sensitivity of the results to untestable assumptions on MNAR dropout and practice effects, and further compare our approach to other methods used for population inference in a simulation study.
We are motivated by the Betula study, a prospective cohort study on memory, health and aging. The aim of the current paper is to extend previous results on cognitive lifespan trajectories (e.g. Rönnlund et al. 2005, Gorbach et al. 2017) by considering population partly conditional inference with MNAR missingness and practice effects. By using longitudinal micro-data from Statistics Sweden and the National Board of Health and Welfare, for both the sample and the target population, we are able to adjust for potential discrepancies in auxiliary variables and thereby improve generalizability of study findings.
The remainder of the paper is organized as follows. In Section 2, we present a motivating example. In Section 3 we present a MB approach for estimating the PPCM using longitudinal data with dropout and deaths and in Section 4 we describe a Bayesian semiparametric modeling approach. In Section 5 we provide results from a simulation study and in Section 6 results from the empirical example using the Betula data. Conclusions are given in Section 7.

Motivating example
The aim of the empirical study is to estimate lifespan trajectories in episodic memory, with the goal to generalize findings to a target population. Two separate sources of data are available for this study: a longitudinal cohort study and a longitudinal database covering the target population.
The Betula study is a population-based cohort study with the objective to study how memory functions change over time and to identify risk factors for dementia (Nilsson et al. 1997). The participants were randomly recruited, stratified by age, from the population registry in the Umeå municipality of Sweden. We consider longitudinal data from the first sample (S1) and four waves of data collection (T1-T4). There were five years between waves, and the first wave of data collection was initiated in 1988-1990. A total of n = 1000 participants were included, 100 participants from each of the 10 age-cohorts: 35, 40, . . . , 80. In order to obtain a total of 100 subjects in each of the 10 different cohorts, 1,976 persons had to be contacted. Of the 976 that never entered the study, 259 could not be reached, 130 had an illness to the extent that they could not participate, and 481 declined to participate (Nilsson et al. 1997). Memory was assessed at each wave using a composite of five episodic memory tasks (range: 0-76), where a higher score indicates better memory (Josefsson et al. 2012).
The second data source is the Linnaeus Database (Malmberg, Nilsson, and Weinehall 2010), a longitudinal database covering every Swedish resident. The database includes annual data from Statistics Sweden and the National Board of Health and Welfare (similar information as for the Betula sample). In this study we consider micro data for every unit of the population in the Umeå municipality who was alive and non-demented in 1990 as the target population (N = 9203). Although longitudinal data are available annually, we restrict data to the years 1990, 1995, 2000, and 2005, approximately corresponding to the years of testing in the Betula study.
A set of continuous and categorical auxiliary variables, linked to both selective study enrolment and memory, is included in both data sources. From the cause of death register we know the year of death for each deceased individual. During the study period, 29.1% of the Betula sample and 20.0% of the target population died, and there were 128 and 806 dementia cases (12.8% and 8.8%) in the sample and population, respectively.
For each unit $i$ in a finite population $U$ of size $N$, a vector of auxiliary information $x_i$ is observed. A probability sample $c$ of size $n$ is drawn from $U$. For individual $i$, denote the participation probability $\Pr\{i \in c \mid x_i\}$ by $\pi_i$. Let $y_i$ be the continuous study variable, which is observed if $i \in c$. Suppose interest is in studying the finite population mean, $\mu_U = N^{-1} \sum_{i \in U} y_i$. If we assume the participation mechanism is ignorable in the sense that $y_i$ and $i \in c$ are conditionally independent given $x_i$ (Rubin 1976), the joint distribution of the outcome, participation mechanism and auxiliary information can be factored into three conditional distributions, and the model-based estimator of $\mu_U$ has the form $N^{-1} \sum_{i \in U} m(x_i)$, where $m(x_i)$ is the working-model prediction for unit $i$. We consider a setting with two separate data sources. Thus, we cannot separate out the sample participants in the data covering the target population. Predictions must therefore be made for all participants in the target population.

Longitudinal data with dropout and death
We now consider the problem of finite population inference in the context of longitudinal data with dropout and death. Death must be treated differently than non-response since post-death outcomes are truncated (and do not exist). Here we consider partly conditional inference and, as such, are interested in estimating the finite population mean given survival up to that time point, that is, the population partly conditional mean (PPCM). Initially we further assume the non-response mechanism to be missing at random conditional on being alive at time $t$ (MARS). The working model under MARS yields predictions $\hat{y}_{it}$ for all $i \in U$ given $\bar{s}_{it} = 1$, obtained by integrating over the outcome history $\bar{y}_{t-1}$. The MB estimator of the PPCM at time $t$ is given by $PPCM^{MB}_t = \frac{1}{\sum_{i \in U} s_{it}} \sum_{i \in U} \hat{y}_{it} s_{it}$. Note that, in the context of the Betula study, interest is not in the $PPCM_t$ at a specific test wave, but rather in the age-specific PPCM aggregated over test waves, PPCM(age), where $U_{age,t}$ is the subset of individuals in age-cohort $age$ at test wave $t$.
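The MB estimator of the PPCM at a single wave is an average of predictions over the units that are alive at that wave. A minimal Python sketch, with hypothetical variable names and made-up numbers, could look as follows.

```python
def ppcm(y_hat, alive):
    """Mean of the predictions over units alive at wave t: sum(y_hat * s) / sum(s)."""
    n_alive = sum(alive)
    if n_alive == 0:
        raise ValueError("no survivors at this wave")
    return sum(y * s for y, s in zip(y_hat, alive)) / n_alive


# Hypothetical predictions for four population units; the third unit has died,
# so its prediction does not enter the estimate.
y_hat_t = [40.0, 35.0, 30.0, 25.0]
s_t = [1, 1, 0, 1]
ppcm_t = ppcm(y_hat_t, s_t)  # (40 + 35 + 25) / 3
```

Units with $s_{it} = 0$ contribute to neither the numerator nor the denominator, which is what distinguishes the partly conditional mean from an immortal-cohort average.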

Non-ignorable dropout among survivors and practice effects
We introduce a set of sensitivity parameters to assess the impact of violations of the MARS assumption for the missingness mechanism. We additionally introduce a sensitivity parameter that accounts for practice effects, i.e., improved or maintained performance at follow-up testing due to familiarity with test content or context. The general strategy is to model the observed data distribution, and to use priors on the sensitivity parameters to identify the full-data model.
Previous studies of the Betula data suggest that individuals who drop out have lower performance and steeper decline in memory (Josefsson et al. 2012). Thus, we expect dropout to be missing not at random conditioning on survival (MNARS). To allow for deviations from the MARS assumption we introduce a parameter $\gamma_{it}$ to identify the expected outcome for dropouts among survivors. For all $t > 0$ and all $j < t$, $\gamma_{it} = 0$ implies a MARS assumption for the outcome, and $\gamma_{it} > 0$ implies a negative location shift in the outcome at the unobserved test wave.
Improvements in test scores upon repeated assessment due to practice effects (PEs) are well documented in the cognitive aging literature (e.g. Salthouse 2010). PEs arise from increased familiarity with the assessment tools and may result in underestimating the decline in memory in longitudinal studies. Let $y^*_{it}$ denote the observed memory score (in contrast to $y_{it}$, which denotes an individual's actual memory function) and let $\delta_{it}$ denote a sensitivity parameter. Then $\delta_{it} > 0$ implies an overestimated memory performance due to practice effects. Note that we assume no PEs at the initial testing.
Our approach allows us to explore sensitivity to the unverifiable assumptions by specifying informative priors for the sensitivity parameters $\gamma_{i1}, \ldots, \gamma_{iT}$ and $\delta_{i1}, \ldots, \delta_{iT}$. We specify triangular distributed priors conditioning on auxiliary variables, where the three parameters of the Triangular distribution are the minimum, the mode and the maximum. We restrict the parameters to a plausible range of values, reflecting the analysts' beliefs. In the Analysis of the Betula data we specify values of $A_t(x_{it})$ and $B_t(x_{it})$ in the context of the study.
With the sensitivity parameters, identification of the PPCM based on the working model in (1) uses predictions $\hat{y}_{it}$ adjusted by $\gamma_{it}$ and $\delta_{it}$, giving the working model in (2).

A semi-parametric approach for estimating the PPCM
We propose a Bayesian semi-parametric modelling approach based on Bayesian Additive Regression Trees (BART; Chipman, George, McCulloch, et al. 2010) for the working model in (2) using the observed data and the sensitivity parameters. BART does not rely on strong modeling assumptions for the mean and, in contrast to other tree-based algorithms, yields interval estimates for full posterior inference.

Semi-parametric estimation of the outcomes and dropout
We specify BART models for the conditional distributions of the time-varying variables $y_{it}$ and $r_{it}$. The distribution of the continuous outcome $y_{it}$ is specified conditional on the history $(\bar{y}_{it-1}, \bar{x}_{it})$, for the subset that satisfies $\bar{r}_{it} = 1$ and $\bar{s}_{it} = 1$. The mean function is given by a sum-of-trees over distinct binary regression trees denoted by $T^k_{y_{it}}$. Each tree consists of a set of interior node decision rules leading down to $b^k_{y_{it}}$ terminal nodes, and each $T^k_{y_{it}}$ has a set of associated terminal node parameters.
The BART models for the binary response indicators $r_{it}$ are specified as probit models, in which the response probability is $\Phi$ applied to a sum-of-trees function of the history, where $\Phi$ denotes the cumulative distribution function of the standard normal distribution and $\pi_{it}(\bar{y}_{it-1}, \bar{x}_{it})$ is the probability of being observed at wave $t$ given $(\bar{y}_{it-1}, \bar{x}_{it})$ for the subset that satisfies $\bar{r}_{it-1} = 1$ and $s_{it} = 1$. Note that $r_{i0} = 1$ and $s_{i0} = 1$ for all individuals, and that $\pi_{it} = 0$ if $r_{it-1} = 0$.
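To illustrate how a sum-of-trees mean function and the probit link fit together, the following Python sketch evaluates a fixed set of one-split trees and maps the sum through $\Phi$. It is not BART itself: there is no prior and no posterior sampling, and the trees, cut points and terminal-node values are made up for illustration.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stump(var, cut, left, right):
    """A one-split regression tree: returns `left` if x[var] < cut, else `right`."""
    return lambda x: left if x[var] < cut else right


def sum_of_trees(trees, x):
    """Sum-of-trees mean function: add the terminal-node value from every tree."""
    return sum(tree(x) for tree in trees)


def response_prob(trees, x, observed_prev):
    """Probit response probability; zero once a unit has dropped out."""
    if not observed_prev:
        return 0.0
    return phi(sum_of_trees(trees, x))


# Hypothetical trees over x = (previous outcome, age).
trees = [stump(0, 30.0, -0.4, 0.3), stump(1, 70.0, 0.2, -0.5)]
p_stay = response_prob(trees, (35.0, 60.0), observed_prev=True)   # phi(0.3 + 0.2)
p_gone = response_prob(trees, (35.0, 60.0), observed_prev=False)  # 0.0
```

In an actual BART fit, the tree structures and terminal-node values are drawn from the posterior at every MCMC iteration, so each iteration yields one such sum-of-trees function.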

Algorithm
The estimator of the PPCM as described in Section 3 can be computed using the algorithm in Table 1. In practice, draws from the posterior distribution of the BART models are generated using Markov chain Monte Carlo (MCMC). The parameters of the sequential conditional distributions for $y_{it}$ and $r_{it}$ are assumed independent and thus their posteriors can be sampled simultaneously. We use the sparse Dirichlet splitting rule prior for BART (DART; Linero 2018) to encourage parsimony, implemented in the R package BART for continuous and binary responses. To simplify notation we denote the conditional probability $\pi_{it}(\bar{y}_{it-1}, \bar{x}_{it})$ by $\pi_{it}$ and the conditional expectation for the outcome at time

Simulation study
In this simulation study we compare five estimators for population inference. The sample and population size, the response rate, and the strength of association between $\hat{y}_{it}$ and $y_{it}$ are chosen to mirror the Betula study.

Data generating process
We consider a finite population of size N = 10,000 and a sample of size n = 1,000, and 1,000 simulated datasets were generated. The auxiliary variables are generated independently as follows: $x_1, x_2 \sim \text{Bernoulli}(0.5)$ and $x_3, x_4, \ldots, x_8 \sim \text{Uniform}(-1, 1)$, where $x_5, \ldots, x_8$ are uncorrelated with the outcome. We consider two time points ($t = 1, 2$) and non-response in the form of dropout (i.e. no deaths). The response rate was set to approximately 75%. Here, interest is in estimating the population mean, $\mu_U = N^{-1} \sum_{i \in U} y_{ti}$, for a continuous outcome variable at $t = 2$. The approaches are compared in terms of bias, standard deviation (SD), mean squared error (MSE), and coverage of 95% credible intervals. We consider 4 true outcome models:
1. Generalized linear additive models for the sample selection, response mechanism and outcomes. Given the auxiliary variables $x_1, \ldots, x_8$, the outcome values for $i \in U$ were generated from linear additive models with $N(0, 1)$ error terms. The sample selection was generated from the model $\text{logit}(\pi^c_i) = -2.67 - 0.4 x_{1i} + 0.4 x_{2i} + 0.4 x_{3i} + 0.4 x_{3i}$. In each selected sample, for $i \in c$, non-response to the study variable $y_2$ was generated from the model $\text{logit}(\pi^r_i) = -2.7 + 1.2 x_{1i} + 1.2 x_{2i} + 1.2 x_{3i} + 1.2 x_{4i} - 1.2 y_{1i}$.
2. Interaction and nonlinear dependencies for the response mechanism. $y_{1i}$, $y_{2i}$ and $\pi^c_i$ were generated as for Scenario 1. The response mechanism was generated from a model that includes interaction and nonlinear terms.
3. Interactions, nonlinear dependencies and skew normal error terms. $y_{1i}$, $\pi^r_i$ and $\pi^c_i$ were generated as for Scenario 2. However, $y_{2i}$ was generated from a model with interactions and nonlinear dependencies. The error terms for the outcomes were generated from the skew normal distribution, such that $\varepsilon_{it} \sim SN\left(-1.6 \cdot \frac{5}{\sqrt{1 + 5^2}} \cdot \sqrt{\frac{2}{\pi}},\ 1.6,\ 5\right)$ for $t = 1, 2$, i.e. a right skewed variable with 0 mean, a variance of 1 and a skewness of 1.3.
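The choice of location parameter for the skew normal errors can be checked numerically. The sketch below is a stdlib-only Python illustration, not the authors' R code: it uses the standard moment formulas for the skew normal distribution and the representation $\delta |Z_0| + \sqrt{1 - \delta^2} Z_1$ with independent standard normal $Z_0, Z_1$ to draw samples. The seed and sample size are arbitrary.

```python
import math
import random
from statistics import fmean, pvariance

ALPHA, OMEGA = 5.0, 1.6                          # shape and scale from the text
DELTA = ALPHA / math.sqrt(1.0 + ALPHA ** 2)
XI = -OMEGA * DELTA * math.sqrt(2.0 / math.pi)   # location that centres the mean at 0


def sn_mean(xi, omega, delta):
    """Mean of a skew normal variable with location xi and scale omega."""
    return xi + omega * delta * math.sqrt(2.0 / math.pi)


def sn_variance(omega, delta):
    """Variance of a skew normal variable with scale omega."""
    return omega ** 2 * (1.0 - 2.0 * delta ** 2 / math.pi)


def draw_sn(rng, xi, omega, delta):
    """One draw via delta * |Z0| + sqrt(1 - delta^2) * Z1, then shift and scale."""
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return xi + omega * (delta * abs(z0) + math.sqrt(1.0 - delta ** 2) * z1)


rng = random.Random(2024)
errors = [draw_sn(rng, XI, OMEGA, DELTA) for _ in range(100_000)]
theory = (sn_mean(XI, OMEGA, DELTA), sn_variance(OMEGA, DELTA))
empirical = (fmean(errors), pvariance(errors))
```

Comparing `theory` with `empirical` gives a quick check that the simulated error terms are centred at zero with variance close to one.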

Estimators for population inference
We compare five estimators for population inference. Our semi-parametric model based approach (MB-sp) was implemented as described in Section 4 and Table 1, but with the sensitivity parameters fixed to 0. However, for the fourth scenario we used the PE sensitivity parameter and specified a triangular prior for $\delta_{i1}$ reflecting practice effects.
In the empirical study using the Betula data, two separate datasets are used for finite population inference and the inclusion probability is not known. Thus, the inclusion weights used in HT and GREG must be estimated using cell weight adjustment. Cell weight adjustment classifies the sample and population into distinct post-stratification cells based on the auxiliary variables recorded for both groups. Note that continuous variables have to be categorized and, further, that with a large set of auxiliary variables the cell sample sizes can be small. This may result in biased and unstable estimates. To overcome the latter, weight trimming and cell collapsing are recommended. Here, only $x_1, \ldots, x_4$ were considered for computing the cell adjustment weights; hence, the uncorrelated variables $x_5, \ldots, x_8$ were omitted. The continuous variables $x_3$ and $x_4$ were first categorized into tertiles. Every unique combination of the (categorized) auxiliary variables constituted an adjustment cell. Sparse cells, $n_j < 20$, were identified and combined with their nearest non-sparse neighbor, i.e. the cell that has the most similar combination of auxiliary variables. Weights larger than 30 were trimmed.
The HT and GREG estimators and their 95% confidence intervals were estimated using the mase package in R. The working models for MB-lm and MRP were fitted using the MCMCpack and MCMCglmm packages in R (R Core Team 2018). MCMC convergence and mixing were monitored using trace plots.
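The core of cell weight adjustment with trimming can be sketched in a few lines of Python. The sketch below assumes the cells have already been formed, and it omits the collapsing of sparse cells; the function names and toy cell counts are hypothetical.

```python
from collections import Counter


def cell_weights(sample_cells, population_cells, max_weight=30.0):
    """Weight for each sampled unit: N_j / n_j for its cell j, trimmed at max_weight."""
    n_j = Counter(sample_cells)          # sample count per cell
    big_n_j = Counter(population_cells)  # population count per cell
    return [min(big_n_j[c] / n_j[c], max_weight) for c in sample_cells]


def weighted_mean(y, w):
    """Weighted mean of the sampled outcomes, normalised by the sum of the weights."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


# Toy cells defined by two binary auxiliary variables (x1, x2).
pop_cells = [(0, 0)] * 50 + [(0, 1)] * 30 + [(1, 0)] * 80 + [(1, 1)] * 40
smp_cells = [(0, 0)] * 5 + [(0, 1)] * 3 + [(1, 0)] * 2 + [(1, 1)] * 4
weights = cell_weights(smp_cells, pop_cells)
```

In this toy example the cell (1, 0) has 80 population units but only 2 sampled units, so its raw weight of 40 is trimmed to 30. With many auxiliary variables, such sparse cells become common, which is the instability the text describes.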

Simulation results
We present the bias, empirical standard deviation (SD; calculated as the standard deviation of the parameter estimates), mean-squared error (MSE), and coverage probabilities for the 95% credible intervals (CP) from the 1,000 simulations in

Analysis of the Betula data
We applied the proposed semi-parametric approach for estimating the PPCM(age) to the Betula data. Here, interest is in estimating the average memory performance across the adult lifespan among non-demented individuals given survival up to a specific age.
We consider population inference in the context of longitudinal cohort data in the presence of practice effects, non-ignorable dropout and death. Longitudinal auxiliary data for both the sample and target population is available for four waves of data collection.
These include the baseline variables age, sex, having children (Y/N), and highest level of education, and additionally, several income variables, benefits received from the government, and marital status were treated as time-varying. Details of baseline characteristics are found in Table 3. Details of the International Classification of Diseases (ICD) codes considered are given in Appendix B of the Supplementary materials.

Sensitivity parameters
The aim of a sensitivity analysis is to evaluate the degree to which inferences are influenced by departures from default assumptions, in our study MARS and no practice effects. We explore sensitivity to these assumptions by specifying informative priors for the sensitivity parameters (SPs). Since there is generally little information about the distributional form for the SPs, the prior distributions are often chosen by the analyst to reflect prior beliefs about the departures from the default assumptions.
We assume triangular priors for $\delta_{it}$, reflecting the improvement in memory performance that occurs at repeated testing, i.e. practice effects (PEs). The priors were specified as $\delta_{i1} \sim \text{Tri}(0, U_{\delta_{i1}}, U_{\delta_{i1}})$ and $\delta_{i2}, \delta_{i3} \sim \text{Tri}(0, U_{\delta_{i2}}, U_{\delta_{i2}})$, where $U_{\delta_{i1}} = 4.8 - 0.1 \cdot age_i + 5.2 \cdot 10^{-4} \cdot age_i^2$ and $U_{\delta_{i2}} = 11.0 - 0.3 \cdot age_i + 1.9 \cdot 10^{-3} \cdot age_i^2$. The upper bounds (and modes) were derived by comparing the parameter estimates from regressing memory on age using participants from two different samples (S1 and S3) from the Betula study with unequal testing experience. For $t = 1$, this involved regressing memory at the first test wave for S3 and at the second test wave for S1 on age and age squared, for the subsamples of participants who participated in at least two consecutive test waves, and then computing the age-specific differences between those with one versus no previous testing, i.e., differences in memory performance between participants with previous testing experience and those taking the test for the first time. A similar procedure was implemented for $t = 2$ using data from the third and the second test wave of data collection in the Betula study, respectively. Note that practice effects were larger for younger participants and at the third test wave compared to the second.
We further assume $\gamma_{it} = \gamma_i$, and specify a triangular prior for $\gamma_i$, reflecting a decline in memory after dropping out of the study. The prior was specified as $\gamma_i \sim \text{Tri}(L_{\gamma_i}, L_{\gamma_i}, 0)$.
The lower bound (and mode), $L_{\gamma_i} = -8.0 - 0.3 \cdot age_i + 3.9 \cdot 10^{-3} \cdot age_i^2$, was estimated from the Betula data by comparing change in memory performance between the first and second wave for responders while adjusting for age and age squared. Note that this implies that older participants decline more quickly than younger participants after dropout.
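Drawing from the age-dependent triangular prior for the practice effect can be sketched in Python as follows. The coefficients of the upper bound are those stated above for $t = 1$; the function names, the seed, and the handling of a non-positive bound (treated here as no practice effect) are assumptions made for illustration.

```python
import random


def upper_delta_1(age):
    """Age-specific upper bound (and mode) of the practice-effect prior at t = 1."""
    return 4.8 - 0.1 * age + 5.2e-4 * age ** 2


def draw_delta_1(age, rng):
    """One draw from Tri(0, U, U); a non-positive bound is treated as no effect."""
    upper = upper_delta_1(age)
    if upper <= 0.0:
        return 0.0  # assumption: the quadratic is only used where it is positive
    return rng.triangular(0.0, upper, upper)  # arguments: low, high, mode


rng = random.Random(1)
draws_35 = [draw_delta_1(35, rng) for _ in range(1000)]
draws_80 = [draw_delta_1(80, rng) for _ in range(1000)]
```

Because the mode coincides with the upper bound, most of the prior mass sits near the largest plausible practice effect, and the bound itself shrinks with age, so the draws for age 80 are much smaller than those for age 35.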

Results
We estimated the PPCM using our proposed Bayesian semi-parametric approach with the sensitivity parameters. To reduce the computation time of Step 3 of the algorithm, 14 shorter chains were run in parallel and then combined, instead of running one long chain. For each chain the first 1000 iterations were discarded as burn-in, and a total of 1000 posterior samples of the PPCM were obtained. Convergence of the posterior samples was monitored using trace plots.
In the main analysis the priors for the SPs were specified as described in the previous section. For the sensitivity analysis we compare the results for different values of the SPs. Specifically, the SPs were set to i) $\gamma_i = 0$ while $\delta_{it}$ was specified as in the main analysis (MARS and PE adjustment), ii) $\delta_{it} = 0$ for $t = 2, 3, 4$, while $\gamma_i$ was specified as in the main analysis (MNARS and no PE adjustment), iii) the SPs were specified as twice as large as in the main analysis ($2 \times \delta_{it}$ for $t = 2, 3, 4$, and similarly, $2 \times \gamma_i$), and iv) $\gamma_i = 0$ and $\delta_{it} = 0$ for $t = 2, 3, 4$ (MARS and no PE adjustment). We moreover compared the results from our mortal-cohort analysis to an analysis assuming an immortal cohort, treating non-response (both death and dropout) as missing at random ($\gamma_i = 0$), and no PE adjustment ($\delta_{it} = 0$ for all $t$). In this scenario only baseline auxiliary variables were considered since $x_{it}$ is missing when $s_{it} = 0$.
The results from the main analysis and the various sensitivity analyses are presented in Figure 1. The corresponding 95% credible intervals are also plotted for the main analysis, the immortal-cohort analysis and the analysis assuming MARS and no PE adjustment (Figure 1a). For the main analysis, the age-specific PPCM revealed an initial decline in memory performance between ages 35 and 65, and accelerated decline after the age of 65. Assuming an immortal cohort, no PEs and MAR non-response (both for dropout and death) resulted in a significantly higher estimated memory performance across adulthood, with slightly greater decline at younger ages and less decline in memory after the age of 65.
An analysis assuming MARS missingness and not adjusting for practice effects resulted in a significantly higher estimated memory performance between ages 40 and 90 compared to the main analysis.
It is apparent in Figure 1b and 1c that adjusting for both PEs and MNARS dropout resulted in lower estimated memory performance across adulthood, although the discrepancy varied in magnitude with respect to age. When the SPs were specified as twice as large as in the main analysis (1c), we see a further reduction in memory performance, which is more pronounced at older ages. As expected, comparing a PE adjusted analysis to an analysis without PE adjustment (assuming MNARS dropout) revealed lower memory performance for the PE adjusted analysis, although the discrepancy was more pronounced at younger ages. Comparing results from an analysis assuming MNARS with one assuming MARS, while adjusting for PEs, also revealed lower memory performance under MNARS, and the difference was more pronounced at older ages.
We compare the results for the Betula data using our approach (MB-sp) with the four other estimators for population inference, MB-lm, HT, MRP, and GREG. The sensitivity parameters $\gamma_i$, $\delta_{i2}$, $\delta_{i3}$ and $\delta_{i4}$ were for simplicity all set to 0 when estimating the PPCM, i.e. assuming MARS missingness and no PEs.
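The text does not specify how the parallel chains were combined. One simple possibility, shown here purely as an assumption, is to discard the burn-in of every chain, pool the remaining draws, and summarise the pooled draws with empirical quantiles. The function names and toy draws in this Python sketch are hypothetical.

```python
def pool_chains(chains, burn_in):
    """Drop the first `burn_in` draws of every chain and concatenate the rest."""
    return [draw for chain in chains for draw in chain[burn_in:]]


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical quantiles of the pooled draws."""
    ordered = sorted(draws)
    lo_idx = int((1.0 - level) / 2.0 * (len(ordered) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Two toy chains with a burn-in of two draws each.
chains = [[99.0, 99.0, 1.0, 2.0, 3.0], [-99.0, -99.0, 4.0, 5.0, 6.0]]
pooled = pool_chains(chains, burn_in=2)
interval = credible_interval([float(v) for v in range(101)])
```

Pooling in this way is only reasonable if every chain has converged to the same posterior, which is what the trace plots mentioned above are meant to check.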
Details of the estimation procedures for MB-lm, HT, MRP, and GREG are described in Section 5.2 and in Appendix A of the Supplementary materials. However, some adjustments were made. For MRP, the continuous variables were first categorized into quartiles plus an additional category if the variable was 0 (added to the model as random effects).
Due to an excessive number of cells with a small or zero sample size, we only used a subset of the baseline auxiliary variables for computing the cell adjustment weights $\pi_i$ in HT and GREG. These were age, sex, level of education, widowhood, and history of cardiovascular disease.
Results are found in Figure 2. Compared to our semi-parametric approach (MB-sp), the HT estimator overestimated memory performance across adulthood, and the three other parametric approaches revealed accelerated decline in memory performance for individuals 70-95 years old. The large discrepancy in memory performance for HT, and for GREG at older ages, is likely a result of using only a subset of the auxiliary variables, categorizing continuous variables and collapsing zero or sparse cells, since this may introduce bias in the weights. Given the findings of the simulation study, we are inclined to believe that the discrepancies between the parametric approaches (MB-lm and MRP) and our semi-parametric approach are due to model misspecification. Note that MRP gives the results most similar to our semi-parametric approach.

Conclusions
This article proposes a flexible Bayesian semi-parametric predictive estimator for estimating the population partly conditional mean, when a large set of longitudinal auxiliary variables is known for all units in the target population. A key feature of the proposed approach is the flexible modeling approach that effectively addresses nonlinearity and complex interactions. Additionally, BART (using the sparse Dirichlet splitting rule prior) demonstrated excellent predictive performance when irrelevant regressors were added, diminishing the need to carry out formal variable or model selection.
Our study is motivated by the fact that it is becoming increasingly difficult to recruit study participants, which may severely distort the ability to generalize study findings.
The increased availability of microdata covering the population in many countries however, makes post-sampling adjustments an attractive tool. Although weighting is the most popular technique, a large set of auxiliary variables makes cell weight adjustment difficult to implement. In this setting model based approaches are more attractive, but put stronger requirements on correct model specification. As expected, the results of the simulation study showed that the weighting approach (HT) performed poorly across a wide range of scenarios. In contrast, the model based approaches and GREG all performed well under correct specification of the outcome model, although, our semi-parametric method was the only approach who gave unbiased results for the more realistic scenario with unknown nonlinearity and interactions. Furthermore, under the scenario with practice effects our approach performed relatively well compared to the other approaches, showing the importance of adjusting for practice effects.
The goal of the empirical study was to estimate lifespan trajectories in memory for a target population. The results revealed an initial decline in memory performance between the ages 35 to 65, in contrast to previous studies showing stable performance up to the age of 60 (Rönnlund et al. 2005, Gorbach et al. 2017. Furthermore, the standard approach for estimation in previous literature that assumes an immortal cohort, no PEs and MAR non-response revealed significantly higher memory performance across the adult lifespan compared to our approach. This suggests that in previous studies the magnitude of memory performance across adulthood is likely overestimated while the rate of change is likely underestimated, especially at older ages. This is due to both selective study enrollment and attrition. Our approach allows for Bayesian inference under MNAR missingness and truncation by death, as well as the ability to characterize uncertainty about practice effects. This was accomplished by introducing sensitivity parameters (SPs) that incorporated prior beliefs.
A strength of the current approach is that inference with SPs and a mortal cohort is relatively easy to implement and to communicate to non-statisticians. However, specifying an appropriate prior distribution can sometimes be difficult; an alternative approach could be a tipping point analysis (Yan, Lee, and Li 2009). In a tipping point analysis, subject matter experts can discuss whether the tipping point for the SPs is plausible, which may aid in making judgments based on study findings.
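As an illustration only, the idea of a tipping point analysis can be sketched as a search over a grid of values for a single sensitivity parameter, reporting the first value at which a conclusion changes. The function `tipping_point`, the grid, and the linear `toy_estimate` are hypothetical and are not part of the analysis in this paper; in practice the conclusion would typically be based on a posterior summary such as a credible interval rather than on a point estimate.

```python
def tipping_point(estimate, grid, reference=0.0):
    """Return the first value in `grid` at which the comparison
    estimate(delta) > reference differs from its value at grid[0].

    `estimate` is any callable mapping a sensitivity-parameter value to an
    estimate (hypothetical here). Returns None if the conclusion never changes.
    """
    baseline = estimate(grid[0]) > reference
    for delta in grid[1:]:
        if (estimate(delta) > reference) != baseline:
            return delta
    return None


# Toy example (assumed, not from the paper): the estimate decreases linearly
# as the sensitivity parameter moves away from zero.
def toy_estimate(delta):
    return 1.0 - 2.0 * delta


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(tipping_point(toy_estimate, grid))
```

The returned value is the point that subject matter experts would then be asked to judge as plausible or implausible.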

Supplementary online material
In Appendix A of the Supplementary materials we provide details for the alternative estimators for the PPCM. The International Classification of Diseases (ICD) codes considered are given in Appendix B. R code for the main analysis of the Betula data and the simulation study using our approach also accompanies the paper as supplementary material.
Tables and Figure

Table 1: Algorithm for estimation of the PPCM as described in Sections 3 and 4.

1. Models for the outcomes and response mechanisms: For $t = 1, \ldots, T$, sample from the observed data posteriors for the parameters of the conditional distribution of $y_t$ and $r_t$ using DART.

2. Sensitivity parameters: For all $i \in U$ and $t = 1, \ldots, T$, sample one set from the prior distributions for $\gamma_{it}$ and $\delta_{it}$.

$x_k$, where $k = 1, \ldots, K$, and $j_{x_k} = 1, \ldots, J_{x_k}$. All random effects are modeled using independent normal distributions, such that $b_{x_k} \sim N(0, \sigma^2_{x_k})$, $k = 1, \ldots, K$, and $\varepsilon_{it} \sim N(0, \sigma^2_{MRP_t})$. In the second step, the multilevel model is used for making predictions $\hat{y}_{it}$ from $\bar{y}_{it-1}$ and $\bar{x}_{it}$, for all $i \in U$ given $s_{it} = 1$, and is (similarly to our approach) obtained by integrating over the outcome history $\bar{y}_{t-1}$. The poststratification estimate for the population mean is given by

$$\mathrm{PPCM}^{MRP}_t = \frac{1}{\sum_{i \in U} s_{it}} \sum_{i \in U} \hat{y}_{it}\, s_{it}.$$

The third approach is an extension of the classic Horvitz-Thompson weighting estimator (HT),

$$\hat{\mu}^{HT} = \frac{1}{N} \sum_{i \in c} \frac{y_i}{\pi_i}.$$

In a longitudinal setting with MAR missingness and death, the HT estimator is obtained by replacing $\pi_i$ with the probability of participation at time $t$ given survival at that time point, here denoted by $w_{it}$. The PPCM$_t$ is given by, [...] $\Pr(r_{ik} = 1 \mid \bar{y}_{ik-1}, \bar{r}_{ik-1}, \bar{x}_{ik}, s_{ik} = 1)$ for all $i \in c$. The response mechanism, the second term on the right hand side, can be estimated from data using e.g. a logistic regression model. In the empirical study using the Betula data, two separate datasets are used for finite population inference and the participation probability is not known. Then $\pi_i$ must be estimated from data using cell weight adjustment. Cell weight adjustment classifies the sample and population into distinct post-stratification cells based on the auxiliary variables recorded for both groups.
The sampled participants in cell $j$ are weighted by the inverse of the sampling rate in cell $j$. That is, for individual $i \in j$, $\hat{\pi}_i = n_j / N_j$, where $n_j$ is the number of individuals in cell $j$ in the sample and $N_j$ is the number of individuals in cell $j$ in the population.
However, difficulties may arise in this setting. For example, continuous variables have to be categorized and, with a large set of auxiliary variables, the cell sample sizes can be small. This may result in biased and unstable estimates. To overcome the latter, weight trimming and cell collapsing are recommended.
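The cell weight adjustment can be illustrated with a minimal sketch for a single time point, assuming each sampled individual carries a cell label and an outcome, and that the cell label is known for every individual in the population. The function name `cell_weight_estimate` and the toy data are hypothetical; the sketch estimates $\hat{\pi}_i = n_j / N_j$ per cell and plugs it into the classic HT estimator, without weight trimming or cell collapsing.

```python
from collections import Counter


def cell_weight_estimate(sample, population_cells):
    """HT-type estimate of a population mean using cell weight adjustment.

    sample: list of (cell_label, outcome) pairs for sampled individuals.
    population_cells: list of cell labels, one per population member.
    Assumes every sampled cell also occurs in the population.
    """
    n_j = Counter(cell for cell, _ in sample)   # sample count per cell
    N_j = Counter(population_cells)             # population count per cell
    N = len(population_cells)
    total = 0.0
    for cell, y in sample:
        pi_hat = n_j[cell] / N_j[cell]          # estimated sampling rate in cell j
        total += y / pi_hat                     # weight by the inverse rate
    return total / N


# Toy data (assumed): the "old" cell is under-represented in the sample.
population_cells = ["young"] * 6 + ["old"] * 4
sample = [("young", 10.0), ("young", 12.0), ("old", 20.0)]

print(round(cell_weight_estimate(sample, population_cells), 6))  # 14.6
print(sum(y for _, y in sample) / len(sample))                   # 14.0, unweighted
```

A population cell with no sampled individuals contributes nothing to the weighted sum, which is one way in which sparse cells lead to biased estimates.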
The fourth approach is a dual-modelling strategy that combines prediction and weighting. The general regression estimator (GREG) at time $t$ is given by, [...] The approach requires both a model for the outcome and a model for the participation mechanism, and is doubly robust in the sense that it remains consistent if either one of the models is correctly specified.
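To illustrate how prediction and weighting are combined, the following sketch implements the textbook cross-sectional form of the regression estimator: the population mean of the model predictions plus an inverse-probability-weighted mean of the sample residuals. This is an assumed simplification for a single time point and is not necessarily the longitudinal form used in this paper; the function name `greg_mean` and the inputs are hypothetical.

```python
def greg_mean(pop_predictions, sample):
    """Textbook regression-type estimator of a population mean.

    pop_predictions: outcome-model predictions for all N population members.
    sample: list of (y, y_hat, pi) triples for sampled individuals, where
            y is the observed outcome, y_hat the model prediction and
            pi the (estimated) participation probability.
    """
    N = len(pop_predictions)
    prediction_part = sum(pop_predictions)
    # Weighted residuals correct the predictions when the outcome model is off.
    correction = sum((y - y_hat) / pi for y, y_hat, pi in sample)
    return (prediction_part + correction) / N


# Toy inputs (assumed): the residuals cancel, so the estimate equals the
# mean of the population predictions.
print(greg_mean([1.0, 2.0, 3.0, 4.0], [(2.5, 2.0, 0.5), (3.5, 4.0, 0.5)]))
```

If all predictions are set to zero the expression reduces to the HT estimator, and if the residuals average out the result is driven by the outcome model alone, which gives some intuition for the double robustness property.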