Development and validation of a prediction model for unemployment and work disability among 55 950 Dutch workers

Abstract
Background: This study developed prediction models for involuntary exit from paid employment through unemployment and disability benefits and examined whether the predictors and discriminative ability of these models differ between five common chronic diseases.
Methods: Data from workers in the Lifelines Cohort Study (n = 55 950) were enriched with monthly information on employment status from Statistics Netherlands. Potential predictors included sociodemographic factors, chronic diseases, unhealthy behaviours and working conditions. Data were analyzed using cause-specific Cox regression analyses. Models were evaluated with the C-index and the positive and negative predictive values (PPV and NPV, respectively). The developed models were externally validated using data from the Study on Transitions in Employment, Ability and Motivation.
Results: Being female, low education, depression, smoking, obesity, few possibilities for development and low social support were predictors of both unemployment and disability benefits. Low meaning of work and low physical activity increased the risk of unemployment, while all chronic diseases increased the risk of disability benefits. The discriminative ability of the models in the development and validation cohorts was low for unemployment (c = 0.62 and c = 0.60, respectively) and disability benefits (c = 0.68 and c = 0.75, respectively). After stratification for specific chronic diseases, the discriminative ability of the models predicting disability benefits improved for cardiovascular disease (c = 0.81), chronic obstructive pulmonary disease (c = 0.74) and type 2 diabetes mellitus (c = 0.74). The PPV was low while the NPV was high for all models.
Conclusion: Taking workers' particular disease into account may contribute to improved prediction of disability benefits, yet risk factors are better examined at the population level than at the individual level.

Introduction
Preventing early involuntary exit from paid employment is important for both individuals and society as a whole. 1,2 Paid employment provides individuals with the opportunity to earn an income and to perform activities that provide meaning and fulfilment, and it is associated with better health. 1 From a societal perspective, entering unemployment or disability benefits leads to social costs due to lower productivity, higher welfare costs and greater health care utilization. 2 In industrialized countries, retaining workers in the labour market becomes even more important in light of an ageing population and, consequently, a higher proportion of workers with a chronic disease. 3
A large variety of risk factors for exiting paid employment involuntarily have been identified at the population level. Meta-analyses showed that poor health, including having a chronic disease or self-perceived poor health, and unhealthy behaviours, such as obesity and lack of physical activity, are associated with unemployment and disability benefits. 4,5 Regarding work-related factors, low job control, low rewards and high (physical) demands are risk factors for disability benefits. [6][7][8][9] Low decision latitude, low social support at work and high job insecurity have been found to predict unemployment. 6,10 Especially for workers with chronic diseases, increased attention to these risk factors is needed to ensure that they are able to continue their working lives. 11
Several studies have developed prediction instruments to assess an individual's risk of early exit from paid employment 12 or of receiving disability benefits. [12][13][14] Prediction models not only indicate which factors are associated with an event but also estimate to what extent a specific individual is at increased risk of leaving paid employment. This is useful in an occupational health context, as preventive efforts to reduce early exit from paid employment can then be better targeted at high-risk groups of workers. Plomp et al. found that higher age, low education, informal caregiving, a larger social network and low self-esteem were risk factors for early exit from work within 3 years among workers aged 55 years and older with a chronic disease or low physical performance. 12 Another study, among soldiers, showed that the number of months with temporary restrictions, frequent work excusals, high outpatient care utilization and psychotropic medication were strong predictors of receiving disability benefits over a period of 9 months. 13 Among the general Finnish population, older age, lower socioeconomic position, smoking, self-rated poor health, a higher number of sickness absences in the previous year, chronic illnesses, sleep problems and a higher body mass index (BMI) were all predictive of disability benefits over a period of approximately 9.5 months. Within this study, an alternative prediction model showed that job strain was the only predictor of disability benefits. 14 Previous prediction models often take frequent work excusals or sick leave into account when predicting the risk of early exit from paid employment. 13,14 While these are strong predictors of future work disability, they are, in contrast to health behaviours and working conditions, not modifiable.
Although the described prediction models have been developed in various settings, several concerns regarding the predictability of exit from paid employment need to be pointed out. First, an important methodological issue is whether the C-index is a suitable measure of model performance. The C-index in the three studies was moderate to strong, ranging from 0.70 to 0.85. [12][13][14] When the outcome of interest occurs infrequently, a model may in effect predict remaining in the labour force rather than leaving paid employment early. Therefore, it is relevant to estimate the positive and negative predictive values (PPV and NPV, respectively), which indicate to what extent prediction models are able to identify individuals at risk of leaving paid employment or to predict who will stay employed. 15 Second, previous studies have examined either specific exit routes into disability 13,16 or all exit routes together. 12 Since unemployment and disability benefits may act as communicating vessels, it is important to construct prediction models and calculate model performance taking these competing risks into account. Third, a common critique is that most prediction models are not externally validated and are therefore overly optimistic about their performance. 17 There is a clear need for external validation, whereby the prognostic performance of a model developed in one cohort is tested in another cohort. Lastly, prediction models that focus on subgroups of workers may perform better than models for the general population. As having a chronic disease is a strong predictor of leaving paid employment through disability benefits 18,19 and as the impact on daily functioning and work differs across diseases, 20 it is of interest to develop prediction models specifically for workers with a chronic disease and to compare the performance of these models across diseases.
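Why a respectable C-index can coexist with a low PPV when exits are rare follows from Bayes' rule. The sketch below uses hypothetical sensitivity, specificity and prevalence values chosen only for illustration; they are not estimates from this study or the cited studies.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and outcome prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence                # true positives per person assessed
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical: 70% sensitivity and 70% specificity, while only 5% of workers exit.
ppv, npv = predictive_values(0.70, 0.70, 0.05)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 10.9%, NPV = 97.8%
```

Under these assumed values, roughly nine in ten flagged workers would not exit, while almost all unflagged workers would indeed remain employed.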
The objectives of this study are (i) to develop and externally validate prognostic prediction models for exit from paid employment through unemployment and disability benefits and (ii) to investigate whether predictor effects and the discriminative ability of the models differ between chronic diseases. If a prediction model is able to identify workers at risk of early exit from paid employment, occupational health professionals could use it to support these workers, e.g. by optimizing their work environment.

Study design and sample
The current study used data from the Lifelines Cohort and Biobank Study 21 as the development cohort and data from the Study on Transitions in Employment, Ability and Motivation (STREAM) as the validation cohort. 22 Both cohorts were linked to register data of Statistics Netherlands.
Lifelines is a multidisciplinary prospective population-based cohort study using a unique three-generation design to examine the health and health-related behaviours of 167 729 persons living in the North of The Netherlands. It employs a broad range of investigative procedures to assess the biomedical, sociodemographic, behavioural, physical and psychological factors that contribute to the health and disease of the general population, with a special focus on multimorbidity and complex genetics. Participants were recruited between November 2006 and December 2013 through general practices, family referral and self-registration. 21 Participants were selected if they were between 18 and 65 years of age and employed at wave 3 (response rate at wave 3 = 62.5%; Supplementary Fig. S1). Information on health behaviours and working conditions was also retrieved from this wave. Information on sociodemographic factors and on clinical measures for the classification of the included diseases was retrieved from the baseline measurement. The median time between the baseline measurement and wave 3 was 25 months [interquartile range (IQR) 23-29]. Lifelines data were enriched with data from Statistics Netherlands on main income components, social benefits, pensions and gross wages, derived from the Dutch tax registers and stored in the social statistical database (SSB). 23 Data were available on a monthly basis from the time of enrolment until December 2018. The median time at risk was 54 months (IQR 44-66).
The validation cohort, STREAM, is a longitudinal cohort study among workers aged 45 years and older, conducted from 2010 onwards. Like Lifelines, STREAM was linked to monthly information on income components from the SSB. For more information on the items and constructs used in STREAM, see Supplementary file SB.

Outcome variable
Involuntary exit was defined as early exit from paid employment through unemployment or disability benefits. 24 Persons with a disability benefit received benefits for at least 50% of their income. Unemployed persons received either unemployment benefits after losing their job or social security benefits. An individual had to be unemployed or receiving disability benefits for at least three months for the exit to be counted as an event.
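The three-month rule can be expressed as a scan over each worker's monthly register status. This is a minimal sketch under assumptions not stated in the text: the status labels are hypothetical, the three months are taken to be consecutive months in the same exit route, and the event time is set to the first month of the qualifying spell.

```python
from typing import Optional, Sequence

EXIT_ROUTES = ("unemployment", "disability")  # hypothetical status labels
MIN_MONTHS = 3  # minimum spell length for an exit to count as an event

def first_exit_event(statuses: Sequence[str],
                     min_months: int = MIN_MONTHS) -> Optional[tuple[int, str]]:
    """Return (month_index, route) of the first qualifying exit spell, or None.

    A spell qualifies when the same exit route is recorded for at least
    `min_months` consecutive months; the event time is the spell's first month.
    """
    run_start, run_route = 0, None
    for month, status in enumerate(statuses):
        if status != run_route:
            run_start, run_route = month, status  # a new spell begins
        if run_route in EXIT_ROUTES and month - run_start + 1 >= min_months:
            return run_start, run_route
    return None

# Two months of unemployment followed by re-employment do not count;
# the later spell of four months on disability benefits does.
history = ["employed", "unemployment", "unemployment", "employed",
           "disability", "disability", "disability", "disability"]
print(first_exit_event(history))  # (4, 'disability')
```

Workers without a qualifying spell would be treated as censored at the end of their follow-up (or at retirement or economic inactivity, as described under Statistical analyses).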

Sociodemographic factors
Sociodemographic factors included age, gender, educational level and marital status. Educational level was categorized into low, medium and high educational level. Marital status was dichotomized into being in a relationship versus not being in a relationship.

Chronic disease and multimorbidity
At baseline, cardiovascular disease (CVD), chronic obstructive pulmonary disease (COPD), depression, rheumatoid arthritis and type 2 diabetes mellitus were classified based on previous studies conducted in Lifelines. 25,26 Clinical measures, self-report and medication use were used to classify participants as having one of these chronic diseases. Participants with two or more of these chronic diseases were considered to have multimorbidity.

Working conditions
Working conditions were measured using six dimensions from an adapted version of the Copenhagen Psychosocial Questionnaire (COPSOQ II). 27 Quantitative demands were measured with two items on getting behind with work and having enough time for work. Work pace was measured with two items on having to work very fast and having a high work pace. Influence at work was measured with items on having influence on the work one has to do and on whether one has a high degree of influence on one's work. Possibilities for development were assessed by asking whether one has the possibility to learn new things through work and whether work requires one to take initiative. Meaning of work was measured with two items asking whether someone considers their work to be important and meaningful. Social support was measured by asking about help and support from colleagues and from one's superior (two questions) and about how often colleagues and one's superior are willing to listen to work-related problems (two questions) (α = 0.76).
All questions were scored on a five-point scale ranging from 1 (almost never/never) to 5 (always). Answer categories were recoded where necessary so that a higher score reflected poorer working conditions. Domain scores were calculated as the sum of the item scores within each domain; for social support, which comprises four items, the sum was multiplied by 0.5 to keep the range consistent across domains. Thus, scores could range from 2 to 10 for all domains.
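The recoding and scaling steps can be sketched as follows. Which items are positively worded (and therefore reversed) is an assumption made here for illustration; the rescaling factor 2 / (number of items) reproduces the multiplication by 0.5 for the four-item social support domain and leaves the two-item domains unchanged.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # five-point scale: 1 = (almost) never, 5 = always

def reverse(score: int) -> int:
    """Recode an item so that a higher score means poorer working conditions."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MAX + SCALE_MIN - score

def domain_score(item_scores: list[int], positively_worded: list[bool]) -> float:
    """Sum the (recoded) item scores and rescale to the 2-10 range of a two-item domain."""
    if len(item_scores) != len(positively_worded):
        raise ValueError("one wording flag per item is required")
    recoded = [reverse(s) if pos else s
               for s, pos in zip(item_scores, positively_worded)]
    # Factor 2 / n_items: 1.0 for two-item domains, 0.5 for four-item social support.
    return sum(recoded) * 2 / len(recoded)

# Hypothetical social support answers; all four items are assumed to be
# positively worded (more support = better), so all of them are reversed.
print(domain_score([4, 4, 3, 5], [True, True, True, True]))  # 4.0
```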

Health behaviour
Smoking was dichotomized into 'non-smoking' (including ex-smokers) and 'smoking'. Physical activity was assessed with one question from the 'Short QUestionnaire to ASsess Health enhancing physical activity' (SQUASH) 28 : 'On average, how many days per week do you cycle, do odd jobs, garden, or exercise for a total of at least half an hour?'. Participants were asked how often they ate (fresh) fruit in the past month and how often they ate (cooked or stir-fried) vegetables. Both questions had seven ordinal response categories ('6-7 days per week', '4-5 days per week', '2-3 days per week', '1 day per week', '2-3 days per month', '1 day per month' and 'not during the preceding month'). BMI was based on self-reported weight. Participants were categorized as having a healthy weight (BMI 18.5-25 kg/m²), overweight (BMI 25.0-30 kg/m²) or obesity (BMI ≥30.0 kg/m²).
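The BMI categories can be written as a simple mapping. This sketch assumes half-open intervals (for example, a BMI of exactly 25.0 falls in 'overweight') and returns None below 18.5 kg/m², because the text does not state how these participants were classified.

```python
from typing import Optional

def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> Optional[str]:
    """Map a BMI value onto the three categories used in the text (half-open intervals assumed)."""
    if value < 18.5:
        return None  # below 'healthy weight'; handling not stated in the text
    if value < 25.0:
        return "healthy weight"
    if value < 30.0:
        return "overweight"
    return "obese"

print(bmi_category(bmi(85.0, 1.80)))  # overweight
```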

Statistical analyses
First, missing values in the development cohort were examined; the proportion missing ranged from 0.9% for marital status to 33.6% for fruit and vegetable intake. Missing values were imputed with the R mice package, generating 20 datasets by multiple imputation by chained equations. Working conditions were imputed at item level, and domain scores were calculated after imputation. Second, cause-specific Cox proportional hazards regression models were fitted to the m = 20 imputed datasets and the results were pooled to analyze the effects of the predictors on early exit from paid employment through unemployment and disability benefits, taking competing risks into account. Individuals were censored in case of missing data or when they exited paid employment through (early) retirement or economic inactivity. Backward elimination was applied to the results pooled over the m = 20 imputed datasets: variables with the highest P values were removed one by one to obtain a more parsimonious model containing variables that contributed significantly to predicting the events. For variable selection, P < 0.10 was considered significant. The C-index (concordance) was used to evaluate the discriminative ability of the models; it ranges from 0.5 to 1, and a higher C-index indicates better discriminative ability. Third, models including interaction terms with the chronic diseases were examined, and analyses were stratified for the five chronic diseases. Stratified analyses included multimorbidity instead of the specific diseases. The C-index was reported and the area under the curve (AUC) was calculated. Lastly, the final models were externally validated using STREAM data. The C-index and the AUC were calculated for all final models, and calibration graphs are shown. For the final models for unemployment and disability benefits, sensitivity, specificity, PPV and NPV were calculated at thresholds of 5, 10 and 20%, as the risk of early exit varies between these values. 18 The disease-specific models could not be externally validated because the sample sizes of the disease subgroups in STREAM were too small. Analyses were conducted using R version 3.6.2.
Table notes: Higher scores reflect poorer working conditions. A dash (-) indicates that this measure was not available in STREAM; STREAM included the following chronic diseases: heart disease, respiratory disease, psychological disease, diabetes and musculoskeletal disease. The 'paid employment' category includes workers who were censored during follow-up.
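The C-index in a cause-specific analysis can be illustrated with Harrell's concordance, treating exits through the competing route as censoring. This is a plain-Python sketch with hypothetical data; it is not the R implementation used in the study, pairs with tied times are not counted, and other concordance definitions for competing risks exist.

```python
from typing import Sequence

def cause_specific_c_index(times: Sequence[float], causes: Sequence[int],
                           risk: Sequence[float], cause: int) -> float:
    """Harrell's C for one exit route; exits through other routes count as censoring.

    causes: 0 = censored, otherwise the code of the observed exit route.
    risk:   predicted risk score; higher = higher predicted hazard for `cause`.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if causes[i] != cause:
            continue  # only workers with an event of this cause anchor a pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still at risk when i exited
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs for this cause")
    return concordant / comparable

# Hypothetical data: months to exit or censoring; 1 = unemployment, 2 = disability.
times = [5, 8, 12, 20, 20]
causes = [2, 1, 2, 0, 0]
risk = [0.9, 0.4, 0.3, 0.5, 0.1]
c = cause_specific_c_index(times, causes, risk, cause=2)
print(f"{c:.3f}")  # 0.833
```

The worker who became unemployed at month 8 does not anchor any pair for disability benefits but still serves as a comparison for the disability exit at month 5, because that worker was at risk at that time.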

Baseline characteristics
The final study population of the development cohort consisted of 55 950 workers with a mean age of 44.4 years (SD = 9.8). […] (table 2). A lower age was associated with a higher risk of receiving disability benefits. Being in a relationship was associated with a lower risk. All five chronic diseases were associated with an increased risk. Furthermore, smoking and obesity increased the risk. Finally, few possibilities for development, low influence and low social support increased the risk of disability benefits. A C-index of 0.68 (95% CI: 0.66-0.70) was found.
Table 2 The influence of personal and work-related predictors on involuntary exit from paid employment in the development cohort (Lifelines, n = 55 950) and validation cohort (STREAM, n = 10 093). Notes: n reflects the number of workers who exit paid employment through this route; higher scores reflect poorer working conditions. n/a indicates that this measure was not available in STREAM; STREAM included the following chronic diseases: heart disease, respiratory disease, psychological disease, diabetes and musculoskeletal disease; HR, hazard ratio.

Disease-specific models for disability benefits
For unemployment, the interaction between the predictors and depression was significant (Supplementary table S3). For disability benefits, the interactions between the predictors and CVD, COPD and rheumatoid arthritis were significant (table 3).

External model validation
For unemployment, the C-index in the validation cohort was 0.60 (95% CI: 0.58-0.62), and the AUC at 24 months of follow-up was 0.57 (SE = 0.02). For disability benefits, the C-index was 0.75 (95% CI: 0.73-0.77) and the AUC was 0.74 (SE = 0.02). Figure 1 shows the calibration graphs. Overall, calibration of the developed prediction models was reasonable. Sensitivity, specificity, PPV and NPV were calculated for 12, 24 and 60 months of follow-up (Supplementary table S4). For all models, the PPV was low (ranging from 5 to 19% for unemployment and from 0 to 18% for disability benefits), while the NPV was high (ranging from 89 to 98% for unemployment and from 97 to 100% for disability benefits).
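Sensitivity, specificity, PPV and NPV at a given risk threshold follow from a 2 × 2 classification table. The sketch below assumes that each worker's predicted risk at a fixed follow-up horizon and observed exit status by that horizon are known; how workers censored before the horizon were handled is not described here, so this simplification is an assumption. The data are hypothetical.

```python
from typing import Sequence

def classification_metrics(pred_risk: Sequence[float], exited: Sequence[bool],
                           threshold: float) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when predicted risk >= threshold flags a worker."""
    tp = fp = tn = fn = 0
    for p, y in zip(pred_risk, exited):
        flagged = p >= threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")  # undefined when the denominator is empty

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}

# Hypothetical predicted risks and observed exits for ten workers.
risks = [0.02, 0.04, 0.06, 0.08, 0.12, 0.15, 0.25, 0.03, 0.07, 0.30]
exited = [False, False, False, True, False, False, True, False, False, False]
for t in (0.05, 0.10, 0.20):  # the thresholds used in the study
    m = classification_metrics(risks, exited, t)
    print(t, {k: round(v, 2) for k, v in m.items()})
```

Raising the threshold trades sensitivity for specificity; with few exits, the PPV stays modest at every threshold while the NPV remains high.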

Discussion
Predictors of exit from paid employment through both unemployment and disability benefits were identified among sociodemographic factors, chronic diseases, health behaviours and working conditions. Model performance in the development and validation cohorts yielded low C-indices, but performance improved for disability benefits when risk factors were modelled for workers with CVD, COPD or diabetes. The PPV of the models was low while the NPV was high, indicating that the models more accurately predicted which workers remained employed than which workers exited paid employment.
The risk factors for involuntary exit from paid employment in the development and validation cohorts correspond with risk factors found in previous research at the population level. Women are known to be at higher risk of early exit from paid employment. 29 Furthermore, having a chronic disease was more strongly associated with disability benefits than with unemployment, 30 which is a less health-driven pathway out of paid employment. Smoking and obesity have previously been shown to increase the risk of both involuntary exit routes. 5,31 Lastly, the roles of low social support and low influence at work in involuntary exit through unemployment and disability benefits have also been reported previously. 32,33 In the development cohort, having few possibilities for development was a predictor of both involuntary exit routes, and low meaning of work was a risk factor for unemployment. The MetLife Mature Market Institute (2006) in the USA indicated that, among older workers, an opportunity to do meaningful work was the primary reason to continue working. 34 Whereas that study focused on retirement, its results correspond with the current findings on unemployment.
The discriminative ability of the current models was lower than that of previous models for disability benefits, for which moderate to strong C-indices ranging from 0.70 to 0.85 were found. 13,14 These differences can be explained by the fact that the previous studies included frequent work excusals or sick leave days in the past year, 13,14 which are strongly related to subsequent disability benefits. When we restricted the study population to workers with specific chronic diseases, the discriminative ability of the models for disability benefits increased. However, in previous studies as well as in the current study, the number of individuals who actually left paid employment involuntarily was low. This indicates that the models can better predict who will continue to work than who will exit paid work, as also shown by the high NPV and low PPV. Therefore, it may be more suitable to examine the relative importance of these factors for early exit at the population level rather than to make accurate predictions at the individual level. 15 The current study sheds light on which of these factors are important within the different disease groups. Working conditions seemed to be especially important for workers with COPD and diabetes, whereas smoking was especially important for workers with CVD, and obesity for workers with rheumatoid arthritis. This information is relevant for occupational health professionals, who can discuss these health behaviours and working conditions with workers during consultations.
A strength of the current study is the use of a large, representative group of workers from the Lifelines Cohort Study. This enabled stratification of the models by specific chronic disease, with diseases classified using a combination of clinical measures, medication use and self-report. Furthermore, results were validated in a cohort of older workers in which similar constructs were measured. A limitation is the different timing of the measurements: chronic diseases were measured at baseline, whereas health behaviours and working conditions were measured at wave 3. Additionally, the percentage of missing data was rather high for some variables, e.g. fruit and vegetable intake. With regard to the predictors, physical work demands, an important factor in health-related job loss, 35 were unfortunately not measured in Lifelines. Another limitation is that the definitions of specific chronic diseases differed between Lifelines and STREAM. Whereas Lifelines included workers with rheumatoid arthritis, STREAM included workers with musculoskeletal problems, a broader concept that also covers back and neck problems, resulting in a larger proportion of workers reporting this disease. Lastly, while Lifelines included workers of all ages, STREAM included only older workers.
To conclude, sociodemographic factors, chronic diseases, unhealthy behaviours and working conditions were associated with unemployment and disability benefits. However, the prediction models were not able to accurately estimate a personalized risk. Additional predictors are needed to improve the discriminative ability of prediction models. In addition, further research is needed to identify which predictors are the best targets for prevention. When predictors were modelled separately for each chronic disease, model performance increased and personalized estimates were more accurate. Taking workers' particular disease into account may contribute to the prevention of early exit from work into disability benefits.

Supplementary data
Supplementary data are available at EURPUB online.

Data availability statement
Data are stored at Statistics Netherlands. Data are available upon reasonable request, following the guidelines of Statistics Netherlands.