M.A. Green, M. Strong, F. Razak, S.V. Subramanian, C. Relton, P. Bissell; Who are the obese? A cluster analysis exploring subgroups of the obese, Journal of Public Health, Volume 38, Issue 2, 1 June 2016, Pages 258–264, https://doi.org/10.1093/pubmed/fdv040
© 2018 Oxford University Press
Abstract
Body mass index (BMI) can be used to group individuals in terms of their height and weight as obese. However, such a distinction fails to account for the variation within this group across other factors such as health, demographic and behavioural characteristics. The study aims to examine the existence of subgroups of obese individuals.
Data were taken from the Yorkshire Health Study (2010–12) including information on demographic, health and behavioural characteristics. Individuals with a BMI of ≥30 were included. A two-step cluster analysis was used to define groups of individuals who shared common characteristics.
The cluster analysis found six distinct groups of individuals whose BMI was ≥30. These subgroups were heavy drinking males; younger healthy females; the affluent and healthy elderly; the physically sick but happy elderly; the unhappy and anxious middle aged; and a cluster with the poorest health.
It is important to account for the heterogeneity among individuals who are obese. Interventions introduced by clinicians and policymakers should not target obese individuals as a whole but should tailor strategies to the subgroups to which individuals belong.
Introduction
Individuals with a body mass index (BMI) of ≥30 are classified as obese. However, this classification assumes homogeneity within the group. BMI is a classification of weight and height, not of people, who are only similar in terms of their BMI category.1 Using a single classification for obesity will fail to recognize the variation between the individuals who are obese.2 For example, there has been debate about the existence of a metabolically healthy and fit set of obese individuals, contrary to the traditional understanding of the impact of obesity on health.3 Failing to acknowledge the population-level heterogeneity among those classified as obese may restrict the effectiveness of interventions or treatments offered by clinicians, since approaches may need tailoring depending upon the health practices associated with individuals involved.
There has been relatively little consideration of the population-level heterogeneity of those classified as obese. There is some evidence that obesity-related behaviours including physical activities, diet/nutrition, lifestyle and sedentary behaviours cluster together among adolescents across all BMI categories.4–10 For example, Ogden et al.11 used a cluster analysis to identify subgroups of individuals who were successful in maintaining weight loss. Analysis showed that individuals differed in their strategies and outcomes, indicating that weight loss interventions require tailoring towards individuals to be most effective. Such an approach may be useful for thinking about obesity, allowing our understanding to move beyond a single classification of individuals as just obese.
Obesity has been shown to be associated with a variety of demographic factors, for example age,12 gender12,13 and deprivation.14,15 There are behavioural differences such as reduced physical activity7 and poorer diet.9 Obesity is associated with increased risk of adverse health outcomes including diabetes, cardiovascular diseases, stroke and osteoarthritis.16–18 Such research tends to consider obesity as a single discrete factor, analysing its relationship to a single outcome variable treated in isolation from the other factors. This ignores how these demographic, health and behavioural factors are inter-related among (and between) certain groups of individuals.7 Exploring the heterogeneity of obese individuals will help to identify population subgroups that can help clinicians and policymakers explore the need for differing strategies/interventions in helping individuals lose weight.
Methodology
Data
Data were taken from the first wave of the Yorkshire Health Study (YHS) (2010–12). The YHS is a longitudinal observational study that collects information on the health and health needs of individuals in the Yorkshire region of England (the first wave only collected data for residents in South Yorkshire).19,20 The focus of the YHS is on weight, weight management and chronic health conditions. Data were collected by recruiting general practitioner surgeries (43 accepted; 50% acceptance) and sending out questionnaires to all patients aged 16–85 (achieving a response rate of 15.9%). Data were self-reported. The first wave of the YHS collected data on 27806 individuals, of whom 4144 were classified as having a BMI ≥30.
Demographic variables included were age, sex, ethnicity and socioeconomic deprivation due to their importance in previous research.12–14 Ethnicity was reported as a binary variable for whether an individual was ‘White’ or ‘Non-White’. Deprivation was determined using the area individuals lived in. The area measure ‘Indices of Deprivation 2010’ was used since it is a multidimensional measure presenting a detailed understanding of the multiple determinants of deprivation.21
Health-related variables included were whether an individual reported the following chronic conditions: fatigue, pain, insomnia, anxiety, depression, diabetes, breathing problems, high blood pressure, heart disease, osteoarthritis, stroke or cancer. The EuroQoL EQ5D was included as a measure of an individual's health-related quality of life.22 Well-being was captured by asking individuals how satisfied they were with their life on a scale of 0 (completely dissatisfied) to 10 (completely satisfied).
Variables related to behavioural characteristics included were smoking status (binary variable for whether individuals smoked or not), the number of units of alcohol consumed in the previous week, whether an individual engaged in >1 h of physical activity a week and whether an individual walked for >1 h in a week. The low-level cut-off points for physical activity and walking were chosen to capture sedentary characteristics. Finally, we included a binary factor relating to whether an individual engaged in active management of their own weight (including use of slimming clubs, increasing exercise, controlling portion size, eating healthier, using over-the-counter weight loss medication or using meal replacements).
Analysis
Cluster analysis was used to identify subgroups within the data. The method groups together individuals based upon similarity across the set of characteristics defined above. The analysis is exploratory and hypothesis generating.23,24 Although it cannot identify causation, the results are often used to drive future research.4
As the data included both binary and continuous variables, a two-step cluster analysis method was used.7,25 The method operates through firstly scanning the data in a pre-classificatory stage and identifying ‘dense’ regions of data known as cluster features (data points that share similar values across a range of variables).25 An algorithm similar to an agglomerative hierarchical clustering method is then used to classify the data.25 The log-likelihood is used as a distance measure since it normalizes distance between different data types.25 Continuous variables are also standardized using z-scores to allow for greater comparability between the different scales.23 The analysis was conducted using SPSS (version 21).
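The z-score standardization mentioned above can be illustrated with a minimal Python sketch. It is not the SPSS two-step procedure itself: it only shows how continuous variables measured on different scales (for example age in years and alcohol in units per week) are put on a common scale before distances are computed. The function name `z_scores`, the use of the sample standard deviation and the example values are all assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale a continuous variable to mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption of this sketch)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical values for five respondents (illustration only, not YHS data).
ages = [34, 49, 52, 62, 67]
alcohol_units = [0.0, 4.5, 12.0, 6.0, 8.0]

z_age = z_scores(ages)
z_alcohol = z_scores(alcohol_units)
```

After standardization both variables have mean 0 and standard deviation 1, so neither dominates a distance calculation simply because of its units.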
There is no well-defined method for determining the optimal number of clusters.23,26 The number of clusters needs to be large enough to capture the important features in the data, but not so large that interpretation becomes difficult.23,24 Chiu et al.22 recommend using Schwarz's Bayesian Information Criterion (BIC) to inform the decision on the number of clusters that best represents the underlying structure of the data, as it has been shown to be useful.27 Change in BIC was plotted against the number of clusters to identify a ‘kink’ in the relationship that would demonstrate when subsequent solutions added less detail to the results.23
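The idea of looking for a ‘kink’ can be sketched as the change in BIC between successive solutions. The snippet below is a rough illustration only: the helper `bic_changes` is a hypothetical name, and the BIC values are invented toy numbers, not the values behind Fig. 1.

```python
def bic_changes(bic_by_k):
    """Map each cluster count k to BIC(k) minus BIC of the next-smaller solution."""
    ks = sorted(bic_by_k)
    return {k: bic_by_k[k] - bic_by_k[prev] for prev, k in zip(ks, ks[1:])}


# Invented BIC values for one- to five-cluster solutions (illustration only).
toy_bic = {1: 1000.0, 2: 800.0, 3: 650.0, 4: 640.0, 5: 635.0}

changes = bic_changes(toy_bic)
for k, delta in changes.items():
    print(f"{k} clusters: change in BIC = {delta:.1f}")
```

In this toy series the improvement shrinks sharply after the three-cluster solution, which is the kind of flattening a plot of change in BIC against the number of clusters is meant to reveal. The final choice still involves judgement.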
Interpretation of the results will be described through calculating the mean values of the variables for each cluster (with clusters labelled accordingly). The coefficient of variation was also calculated to present a normalized measure of the variation in variables to help assess their contribution to cluster formation.
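As a rough illustration of the coefficient of variation, the sketch below divides the standard deviation of the cluster means by the mean for all individuals. The text does not state the exact formula, so this is an assumption: both the use of the sample standard deviation and the choice of the all-individuals mean as denominator are guesses. The function name `coefficient_of_variation` is hypothetical; the input values are the ‘Proportion male’ row of Table 2.

```python
from statistics import stdev


def coefficient_of_variation(cluster_means, overall_mean):
    """Sample SD of the cluster means divided by the overall mean (assumed formula)."""
    if overall_mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(cluster_means) / overall_mean


# 'Proportion male' for the six clusters and for all individuals (Table 2).
proportion_male = [0.48, 0.53, 0.00, 0.27, 1.00, 0.56]
cv_male = coefficient_of_variation(proportion_male, overall_mean=0.46)
print(round(cv_male, 2))
```

Because the measure is unit-free, a variable that separates the clusters sharply (such as sex, which is 0.00 in one cluster and 1.00 in another) can be compared directly with variables on other scales, such as BMI or age.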
To test the stability of the clusters, a replication analysis was conducted to assess how robust the clusters are at capturing the structure of the data. Blashfield and Macintyre's28 split sample method was used. The procedure randomly divides the sample in half and performs the cluster analysis on each half, using the same rules and parameters as the main cluster analysis. The results from one of the samples are then taken and applied to the other sample, using the clusters derived from the first sample to classify the data in the second sample. The cluster centres for the two solutions for the second sample are compared, and Cohen's kappa coefficient is calculated to measure the agreement between their equivalent clusters.
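The agreement step can be sketched with the standard definition of Cohen's kappa, (observed agreement − chance agreement) / (1 − chance agreement). The snippet is a minimal illustration, not the SPSS output used in the study. It assumes the clusters from the two solutions have already been matched to their equivalents so that the same label means the same cluster. The function name and the toy label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two cluster assignments of the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both assignments must cover the same non-empty set of cases")
    n = len(labels_a)
    # Proportion of cases placed in the same cluster by both solutions.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from the marginal cluster sizes.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Toy assignments for six cases (illustration only, not study data).
solution_1 = [1, 1, 2, 2, 3, 3]
solution_2 = [1, 1, 2, 3, 3, 3]
kappa = cohens_kappa(solution_1, solution_2)
```

Kappa corrects raw agreement for the agreement that would arise by chance given the cluster sizes. With several clusters of unequal size this is why it is preferred to a simple percentage of matching cases.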
Results
Table 1 presents the demographic characteristics of the sample of individuals defined as obese using BMI (deprivation is reported in national quintiles). Mean age of the sample was 56 (SD = 15). Mean BMI was 34 (SD = 4). 57.6% of the sample were female, 95.2% were White, and individuals were most commonly found in more deprived areas. The obese sample contained a higher proportion of older people, females and individuals from deprived areas than the rest of the YHS (which is broadly representative of South Yorkshire20).
Table 1. Description of the demographic factors (%) of individuals whose body mass index (BMI) was ≥30
| Variable | Obese sample (BMI ≥30) |
|---|---|
| Gender | |
| Female | 57.6 |
| Male | 42.4 |
| Age | |
| ≤24 | 4.9 |
| 25–34 | 7.7 |
| 35–44 | 11.6 |
| 45–54 | 16.8 |
| 55–64 | 23.7 |
| 65–74 | 23.3 |
| ≥75 | 11.9 |
| Deprivation quintile | |
| 1 (Least deprived) | 8.9 |
| 2 | 19.5 |
| 3 | 16.1 |
| 4 | 20.9 |
| 5 (Most deprived) | 34.6 |
| Ethnicity | |
| White | 95.2 |
| Non-White | 4.8 |
Change in BIC against number of clusters is shown in Fig. 1. There are two obvious kinks in the plot: one at a three-cluster solution and the other at a six-cluster solution. Although the three-cluster solution leads to a distinct change in the gradient of the slope, the BIC continues to decline considerably until the six-cluster solution. This would suggest that a six-cluster solution offers greater discriminatory power by capturing further variation in the data that would be missed out if the broader three-cluster solution was used. A six-cluster solution was chosen, and Table 2 presents the mean characteristics of each cluster from this solution.
Table 2. The mean values of variables split by clusters
| Variable | Physically sick but happy elderly | Affluent healthy elderly | Younger healthy females | Unhappy anxious middle aged | Heavy drinking males | Poorest health | All individuals | Coefficient of variation |
|---|---|---|---|---|---|---|---|---|
| Sample size | 794 | 555 | 1021 | 577 | 887 | 310 | 4144 | |
| Mean body mass index | 34.41 | 33.68 | 34.06 | 34.32 | 32.98 | 36.49 | 34.07 | 0.03 |
| Mean age | 67 | 62 | 49 | 52 | 52 | 62 | 56 | 0.13 |
| Proportion male | 0.48 | 0.53 | 0.00 | 0.27 | 1.00 | 0.56 | 0.46 | 0.72 |
| Proportion non-White | 0.01 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | 0.03 | 0.28 |
| Mean deprivation score | 27.07 | 23.78 | 24.38 | 27.48 | 24.37 | 33.94 | 25.96 | 0.15 |
| Mean life satisfaction score | 7.45 | 7.99 | 7.55 | 5.62 | 7.6 | 4.76 | 7.12 | 0.18 |
| Mean EQ5D | 0.60 | 0.87 | 0.88 | 0.59 | 0.87 | 0.21 | 0.73 | 0.36 |
| Proportion with fatigue | 0.40 | 0.03 | 0.02 | 0.70 | 0.04 | 0.82 | 0.25 | 1.44 |
| Proportion with pain | 0.76 | 0.03 | 0.07 | 0.58 | 0.09 | 0.91 | 0.33 | 1.18 |
| Proportion with insomnia | 0.08 | 0.01 | 0.00 | 0.32 | 0.01 | 0.36 | 0.09 | 1.84 |
| Proportion with anxiety | 0.03 | 0.03 | 0.01 | 0.56 | 0.01 | 0.58 | 0.13 | 2.19 |
| Proportion with depression | 0.02 | 0.03 | 0.02 | 0.46 | 0.01 | 0.69 | 0.13 | 2.28 |
| Proportion with diabetes | 0.32 | 0.18 | 0.04 | 0.04 | 0.08 | 0.38 | 0.15 | 0.98 |
| Proportion with breathing problems | 0.27 | 0.07 | 0.07 | 0.15 | 0.06 | 0.47 | 0.15 | 1.08 |
| Proportion with high blood pressure | 0.62 | 0.99 | 0.00 | 0.15 | 0.02 | 0.70 | 0.33 | 1.25 |
| Proportion with heart disease | 0.23 | 0.04 | 0.02 | 0.01 | 0.04 | 0.36 | 0.09 | 1.61 |
| Proportion with osteoarthritis | 0.38 | 0.08 | 0.03 | 0.11 | 0.03 | 0.44 | 0.15 | 1.22 |
| Proportion with stroke | 0.04 | 0.01 | 0.00 | 0.02 | 0.01 | 0.13 | 0.02 | 2.42 |
| Proportion with cancer | 0.07 | 0.03 | 0.01 | 0.03 | 0.01 | 0.05 | 0.03 | 0.78 |
| Proportion who smoke | 0.08 | 0.06 | 0.12 | 0.16 | 0.13 | 0.21 | 0.12 | 0.45 |
| Mean alcohol intake (units/week) | 5.31 | 8.03 | 4.98 | 4.85 | 11.86 | 6.57 | 7.03 | 0.38 |
| Proportion who walk >1 h/week | 0.26 | 0.46 | 0.44 | 0.36 | 0.43 | 0.08 | 0.37 | 0.40 |
| Proportion who do physical exercise >1 h/week | 0.31 | 0.49 | 0.51 | 0.40 | 0.48 | 0.12 | 0.42 | 0.36 |
| Proportion who actively manage their weight | 0.87 | 0.89 | 0.93 | 0.96 | 0.79 | 0.73 | 0.87 | 0.10 |
Fig. 1. Change in the Bayesian Information Criterion (BIC) against number of clusters.
The largest cluster was ‘younger healthy females’, who were also the youngest cluster. They displayed the most positive health characteristics of all the clusters and also engaged in some healthy behaviours. ‘Heavy drinking males’ were similar to ‘younger healthy females’ except with respect to their high alcohol consumption. This group was also less likely to be managing their weight, although they did report above-average levels of physical exercise and walking.
The ‘unhappy anxious middle aged’ group were primarily female. They have poor mental health, with a low EQ5D and high values for insomnia, anxiety, depression and fatigue. Their sense of well-being is relatively low. However, this group does engage in healthy physical activity and in weight management, and has the lowest alcohol consumption.
The final three clusters capture different patterns among older people. The ‘affluent healthy elderly’ is the least deprived cluster. They have positive health characteristics (although they have above-average alcohol consumption). The individuals in this cluster include a large proportion with high blood pressure, and this may partially explain how the cluster was formed, although the high value may be related to their age. Next there are the ‘physically sick but happy elderly’, a group that has a higher prevalence of chronic health conditions (including osteoarthritis, diabetes and high blood pressure) but who also exhibit low levels of anxiety and depression. Finally, there are those with the ‘poorest health’. This group is the most deprived, has the worst health (highest prevalence of most chronic health conditions) and tends not to engage in healthy behaviours. Individuals reported high levels of ‘pain’ and ‘fatigue’. The cluster also has the highest mean BMI.
The coefficient of variation presents information regarding the dispersion of variables across clusters. Variables with greater variation will be more important in cluster formation, as they allow for higher discrimination between clusters. Values were highest among the health-related variables (especially stroke, anxiety and depression), suggesting that they were most important in cluster formation. Although mean alcohol intake had a low coefficient of variation, this is because the variable takes a high value in only one cluster, with the other clusters having similar values.
Conducting a replication analysis using Blashfield and Macintyre's28 split sample method produced clusters that were fairly similar to the main results. Cohen's kappa coefficient was 0.41 (P < 0.001), suggesting moderate agreement.29 The cases that altered were mostly found on the boundaries of each cluster and hence any differences were due to these cases moving to clusters of similar characteristics (i.e. knife edge issues23,24). The clusters also remained consistent if the morbidly obese were removed from the sample (i.e. leaving only the obese (BMI = 30–40, n = 3757); kappa coefficient against the original solution was 0.56, P < 0.001). A cluster analysis of just the morbidly obese (BMI = 40+, n = 387) did not produce broadly similar clusters. We also repeated the analysis for normal and overweight BMI categories. The clusters we found in the overweight group were similar to those found in the obese group; however, this similarity was not seen in the analysis of the normal BMI group.
Discussion
Main findings
The analysis presented in this study has identified six types of obese individuals: heavy drinking males; younger healthy females; affluent healthy elderly; the physically sick but happy elderly; unhappy anxious middle aged and a cluster with the poorest health. Health conditions (especially stroke, anxiety and depression) displayed the greatest variation in mean values between clusters, suggesting that they were important in differentiating between types of obese individuals. Testing suggested that the clusters were fairly stable.
What is already known on this topic
Although there has been sustained criticism of BMI as a measure of body fat and obesity,30,31 BMI continues to be an important measure for reporting obesity levels within a population.18 There has been less critique surrounding BMI as a classificatory tool. Our paper helps to drive debate around the application of BMI, refining the measure to improve the detail it can offer as a tool for grouping individuals.
What this study adds
The heterogeneity of obese individuals has important policy and clinical relevance. Obesity-related interventions often target obese people in general, rather than any particular population subgroup (except perhaps relative to age and/or gender).4,10 A focus on subgroups of individuals may allow a much more efficient targeting of scarce healthcare and health promotion resources. An example might be the targeting of messages about the role of alcohol in weight management to young males as opposed to young females. Likewise, the ‘affluent healthy elderly’ may respond to different messages and may require different interventions, compared with those who are in the ‘poorest health’ group who appear to be those with the most significant clinical need.
Although it is clear that the individuals in the study would benefit from weight loss, an important implication from the results is that weight loss may not be the primary clinical focus. For example, among the ‘poorest health’ group weight loss may be less of an issue compared with the chronic health issues associated with the cluster. This is in contrast to other groups such as ‘younger healthy females’ or ‘affluent healthy elderly’ where weight loss could be a priority. Clinical prioritization is key here, and the cluster analysis highlights this issue when dealing with obese patients (where weight loss itself might be more effectively considered a secondary outcome).
Interventions may need to be targeted to the cluster to which patients correspond, both to help patients lose weight and to tailor intervention design effectively. For example, for the ‘unhappy anxious middle aged’, an intervention involving increasing exercise may need to be mixed with psycho-social counselling, whereas for those in the ‘poorest health’ group advice surrounding exercise may not be reasonable and much more modest goals may be needed. It is important for clinicians and policymakers to recognize the different types of obese individuals they will encounter, and moving beyond the use of BMI alone will be important for successful treatment.32
Limitations of this study
BMI is used to identify individuals who are obese for the analysis. BMI may not always accurately classify individuals as obese as it does not directly measure body fat. Although BMI is easy to measure, there is conflicting evidence surrounding how useful BMI is as a measure of obesity in the general population. Shah and Braverman30 found that BMI underestimated prevalence of obesity compared with body fat; however, Gallagher et al.31 have shown BMI to be correlated to most other measures of body fat. Green18 also demonstrated that BMI was a useful measure compared with waist circumference for estimating a variety of health risks. Future research should explore whether the clusters exist when using other measures of obesity to explore their validity.
The data collected in the YHS are based on self-reported information and are therefore subject to a range of different biases. Self-reported BMI has been shown to be downwardly biased.30 The variables used to classify individuals may also be affected by bias; for example, diabetes has been shown to be under-reported through self-reported methods.33
The number of clusters chosen is somewhat arbitrary and will affect the results that are reported.23,24 While the decision for the number of clusters selected was based upon the BIC (a measure previously shown to be effective at detecting the correct number of clusters), the choice is still somewhat subjective.25,27 Exploring the results for a varying number of clusters in the dataset showed that the patterns captured in the six-cluster solution remained broadly consistent (with clusters either combined or split up depending on the number of clusters), suggesting that a six-cluster solution was appropriate. Testing also indicated that the solution was fairly stable.
Cluster analysis is a data-driven method, and hence despite the stability of the clusters within the study, the results may not be generalizable to other obese populations. However, the methodology is designed to generate hypotheses that can drive future research.23,24 It would be useful to explore the existence of the clusters in other datasets to validate how useful the groupings are both nationally and internationally. It is also worth exploring how similar the obese clusters are to other population subgroups to examine whether they are specific to obese individuals.
Acknowledgements
We are grateful to all the individuals who have enrolled in the cohort. We also acknowledge the GP practice staff for their contribution in the recruiting process. This publication is the work of the authors and does not necessarily reflect the views of the Yorkshire Health Study Management Team or Steering Committee. This publication presents independent research by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care for South Yorkshire (NIHR CLAHRC SY) a pilot which ended in 2013. Further details about the new NIHR CLAHRC Yorkshire and Humber can be found at www.clahrc-yh.nihr.ac.uk. The views and opinions expressed are those of the authors, and not necessarily those of the NHS, the NIHR or the Department of Health.
