Shu Kay Ng, Richard Tawiah, Michael Sawyer, Paul Scuffham, Patterns of multimorbid health conditions: a systematic review of analytical methods and comparison analysis, International Journal of Epidemiology, Volume 47, Issue 5, October 2018, Pages 1687–1704, https://doi.org/10.1093/ije/dyy134
Abstract
The latest review of studies on multimorbidity patterns showed high heterogeneity in the methodology for identifying groups of multimorbid conditions. However, it is unclear how the analytical methods used influence the identified multimorbidity patterns.
We undertook a systematic review of analytical methods used to identify multimorbidity patterns in PubMed and EMBASE from their inception to January 2017. We conducted a comparison analysis to assess the effect the analytical methods had on the multimorbidity patterns identified, using the Australian National Health Survey (NHS) 2007–08 data.
We identified 13 194 studies and excluded 13 091 based on titles/abstracts. From the full-text reviews of the 103 remaining publications, we identified 41 studies that used five different analytical methods to identify multimorbid conditions. Thirty-seven studies (90%) adopted either the factor-analysis or hierarchical-clustering methods, but heterogeneity arises from the use of different proximity measures within each method to form clusters. Our comparison analysis showed the variation in identified groups of multimorbid conditions when applying the methods to the same NHS data. We extracted the main similarities among the groupings obtained by the five methods: (i) cardiovascular and metabolic diseases, (ii) mental health problems and (iii) allergic diseases.
We showed the extent to which heterogeneous analytical methods affect the identification of multimorbidity patterns. However, more work is needed to guide investigators in choosing the best analytical method to improve the validity and generalizability of findings. Investigators should also attempt to compare results obtained by various methods to reach a consensus grouping of multimorbid conditions.
Identification of multimorbid conditions beyond chance is important in multimorbidity research, as the findings not only generate new hypotheses on possible shared biologic processes among specific diseases, but also facilitate quantifying the effect of multimorbidity on health-related outcomes.
Previous reviews showed substantial variation in methodologies in studies for identifying the nature and patterns of non-random multimorbidity. However, questions remain with regard to how the analytical methods used in these studies address non-random association between conditions and the extent of their effects on the grouping of multimorbid conditions.
We reviewed the analytical methods in epidemiological studies on multimorbidity and identified five different methods to reveal multimorbid conditions in these studies. Within each method, heterogeneity of findings arises due to the use of different proximity measures to form clusters.
In the comparison analysis, we used the Australian National Health Survey data to illustrate the extent of effects for heterogeneous analytical methods on identification of multimorbidity patterns. Main similarities among the clusters obtained fall into three disease groups.
These findings will contribute to developing a more uniform methodological framework to guide investigators in choosing the analytical method, so as to enhance the validity and generalizability of findings in future multimorbidity research.
Introduction
Multimorbidity (the co-occurrence of two or more health conditions within one person1) constitutes a serious burden and challenge on the healthcare system in many countries, as it is closely associated with poorer health outcomes, more complex clinical management, a higher use of health services and associated cost.2–6 Based on the 2004–05 Australian National Health Survey (NHS), 80% of Australians aged ≥65 years have three or more chronic conditions. A recent study estimated that more than 210 000 Australians aged ≥55 years spend more than 20% of their income on health and healthcare.7 High prevalence rates of multimorbidity are also observed in other countries such as Canada and the USA.8,9 Previous studies10,11 suggested that social disadvantage enhanced the likelihood of illness through increased exposure to environments that have a detrimental effect on health, such as poor diet and care, adverse family conditions and exposure to smoking, drinking and psychosocial stress.
Although multimorbidity is increasingly recognized as an important issue in medical care, most clinical studies have used a single disease-based paradigm with clinical trials typically excluding participants with multimorbidity.12–14 In addition, only a minority of clinical guidelines in Australia and the USA make specific recommendations about the management of individuals with multiple conditions.2 The vast majority of services in tertiary care settings also largely focus on single conditions despite high rates of multimorbidity among populations attending such services. Multimorbidity research has the potential to play an important role in addressing these limitations of existing services and in shifting the focus of health services away from single conditions towards the development of more effective strategies to better help the large numbers of individuals with multimorbid conditions.15,16
There are two major approaches in epidemiological studies on multimorbidity. The first approach investigates multimorbid conditions with a specific index condition, such as attention deficit hyperactivity disorders (ADHDs) in childhood.17 From this perspective, the term comorbidity is usually used.1 The comorbidity study approach allows specific health problems to be studied in selected subpopulations of interest. The second approach emphasizes the quantification of ‘non-random’ multimorbidity and differences in multimorbidity patterns among individuals in the general population. Identification of multimorbid conditions beyond chance is important in multimorbidity research, as the findings not only generate new hypotheses on possible shared biologic processes or pathophysiological pathways among specific diseases,12,18 but also facilitate quantifying the impact of multimorbidity on health-related outcomes and quality of life.19–21 The latter is essential to provide the evidence base for designing future studies to improve prevention, treatment and care of patients with multiple conditions. A recent Cochrane review of the effectiveness of interventions to improve outcomes for people with multimorbidity22 showed that the variety of interventions conducted in this area to date produced little improvement in outcomes, especially in health-service use and medication adherence. Uncertainties remain about how to characterize multimorbidity and assess its impact on health outcomes. Whilst a number of large-scale national surveys have been conducted worldwide to study important health problems associated with multimorbidity, the synergistic and interactive effects with other key risk factors on multimorbidity patterns, as well as the effect on health outcomes, remain unclear.23
To assess and quantify the impact of multimorbidity on individuals’ health-related outcomes and service use, there is a need to obtain an appropriate measure of multimorbidity to characterize individuals’ multimorbidity patterns.24 Most existing measures use a single score to quantify the multimorbidity of an individual, such as the sum of the number of conditions and weighted scores including the Charlson Multimorbidity Index and Cumulative Illness Rating Scale.24–27 These numerical indices, however, do not account for multimorbidity by chance28 and may suffer generalizability problems, as they either depend on the number of conditions under study or they were originally developed and validated for an index condition or for specific outcomes.25,29 Previous reviews and studies demonstrated that certain combinations of multimorbid conditions have greater synergistic effects on specific health outcomes and service uses, compared with multimorbidity involving other conditions,30–32 implying that the sum of the number of diagnosed conditions widely used to adjust for individual multimorbidity in outcome analyses is not an appropriate measure.33
The latest systematic review of epidemiological studies on multimorbidity patterns published before June 2012 showed the diversity of the nature and patterns of multimorbidity found across different studies.12 This finding may be explained by different methodological approaches used to identify multimorbid condition groups, select, define and code diseases, as well as different population samples used in those analyses. As different analytical methods adjust for multimorbidity by chance to different extents, it is anticipated that the reported prevalence of multimorbidity, and hence multimorbid groups of conditions from different studies, also vary.34
The objective of this systematic review was to identify the analytical methods used in studies for the identification of multimorbid condition groups and to understand how each method addresses the potential for non-random association. Moreover, we conducted a comparison analysis of the identified analytical methods using the Australian NHS 2007–08 data. Because other key factors, such as the selected conditions and the population sample, were the same for all methods, any differences in the resulting multimorbidity patterns could be attributed entirely to the methods. This review will improve our knowledge about the nature and diversity of the analytical methods used in various studies. The comparison analysis will show the extent to which heterogeneous analytical methods affect the identified multimorbidity patterns. These findings will contribute to developing a methodological framework to guide investigators in choosing the best analytical method to enhance the validity and generalizability of findings, reporting multimorbidity patterns and advancing the evidence base around the non-random associative multimorbidity of health conditions. The ultimate goal is to address increasingly complex health needs due to multimorbidity, by developing health management and guidelines for the effective and efficient prevention, treatment and care of patients with multiple conditions as well as improved navigation of the health services for the patients.
Methods
Systematic search strategy
We conducted a systematic search in PubMed and EMBASE electronic databases from their inception to January 2017 to identify relevant studies. In PubMed, we searched via Medical Subject Headings (MeSH) by combining MeSH terms relevant to our topic with Boolean operators. We specified multimorbidity and cluster analysis as MeSH headings and combined search terms synonymous with multimorbidity as well as keywords, such as coexisting health conditions, multiple diagnoses, concurrent diagnoses, etc. We conducted the search strategy in EMBASE using the index terms listed above via the EMBASE EMTREE thesaurus. Supplementary Tables 1 and 2, available as Supplementary data at IJE online, show the detailed search strategies in PubMed and EMBASE.
Inclusion criteria
Studies eligible for the systematic review must be original (written in English) and indexed in PubMed and EMBASE, wherein the basis of research is the identification of patterns of multimorbid groups of health conditions with explicit emphasis on the method(s) used for exploring multimorbidity patterns.
Exclusion criteria
Studies were excluded if they:
applied descriptive measures of multimorbidity, e.g. based on prevalence or count of health conditions;
studied association between health conditions by first selecting index conditions;
identified multimorbid clusters based on fewer than 10 health conditions;
focused only on grouping patients (or participants) instead of conditions.
Data extraction and quality assessment
We extracted the name of the first author, publication year, title, journal and abstract for each paper during the screening, as well as the sample size, number of health conditions, statistical method, clustering algorithm, proximity measure and method to determine the number of groups for each included study. The quality of reporting of the included studies was assessed using a shorter version of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist,26 in which 23 items relevant to reports of cross-sectional studies were included. An overall quality score for an included study was determined by the total number of STROBE items the study addressed (ranging from 0 to 23). The selection of included studies, full-text reviews, data extraction and quality assessment were conducted independently by two reviewers (S.N. and R.T.).
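The quality score described above is a simple count, which can be sketched as follows. This is only an illustration: the function name and the example study are hypothetical, and the only details taken from the text are the 23 checklist items and the 0 to 23 range.

```python
N_STROBE_ITEMS = 23  # items in the shortened checklist, as stated in the text


def strobe_quality_score(addressed):
    """Return the number of checklist items a study addressed (0 to 23).

    `addressed` is a sequence of 23 booleans, one per checklist item.
    """
    if len(addressed) != N_STROBE_ITEMS:
        raise ValueError(f"expected {N_STROBE_ITEMS} items, got {len(addressed)}")
    return sum(bool(flag) for flag in addressed)


# Hypothetical study that addressed 18 of the 23 items.
example_study = [True] * 18 + [False] * 5
print(strobe_quality_score(example_study))
```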
Comparison analysis
We compared the analytical methods for identifying multimorbid groups of health conditions using the Australian NHS 2007–08 data. The survey collected information about the prevalence of current long-term conditions from 20 788 Australians, where current long-term conditions were defined as medical conditions that were current at the time of the survey and that had lasted at least 6 months, or that the respondent expected to last for 6 months or more.35 Medical conditions were classified based on the World Health Organization International Classification of Diseases, Tenth Revision.36 The Australian Bureau of Statistics (ABS) website provides a summary of findings37 and the NHS data in Confidentialized Unit Record Files (CURFs), which may be accessed via the Remote Access Data Laboratory.38
Results
Literature search results
Using the above systematic search strategy, we identified a total of 13 194 unique papers after removing duplicated records. Among these papers, we excluded 13 091 papers as irrelevant after reviewing the papers’ titles and abstracts with reference to our inclusion and exclusion criteria. We conducted full-text reviews of the 103 remaining papers. Of these, we excluded a further 62 papers because they adopted descriptive measures of multimorbidity, aimed to cluster patients and/or considered association between health conditions with a pre-selected index condition. Table 1 displays the methodological aspects of the final 41 included papers in terms of the analytical method, the proximity measure and the type of clustering algorithm used to identify multimorbid condition groups.
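The screening counts reported above can be checked with a few lines of arithmetic. All figures come from the text and abstract; the variable names are illustrative only.

```python
identified = 13_194               # unique papers after removing duplicates
excluded_on_screening = 13_091    # excluded on titles/abstracts
excluded_on_full_text = 62        # excluded after full-text review

full_text_reviewed = identified - excluded_on_screening
included = full_text_reviewed - excluded_on_full_text

# Studies using factor analysis or hierarchical clustering (from the abstract).
factor_or_hierarchical = 37
share = factor_or_hierarchical / included

print(full_text_reviewed, included, f"{share:.0%}")
```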
Table 1. Analytical methodology for the identification of multimorbid condition groups (41 studies)
Study, year | No. of individuals; no. of conditions | Statistical method | Clustering algorithm | Proximity measure | Determination of number of groups |
---|---|---|---|---|---|
Abad-Díez et al.,58 2014 | 72 815; 51 | Exploratory factor analysis | Principal-factor method; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Alonso-Moran et al.,59 2015 | 126 889 patients with T2DM; 28 | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Pearson’s dissimilarity | Cut-off point of 0.98 imposed on dendrogram |
Clerencia-Sierra et al.,60 2015 | 924 patients aged ≥65 years; 59 | Exploratory factor analysis (stratified by sex) | Principal-factor method; rotation method NS | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Cornell et al.,61 2007 | 1 645 314; 45 | Agglomerative hierarchical cluster analysis | Lance-Williams method with β set at –0.50 | Jaccard coefficient | Evaluated by seven health services researchers relative to three clinical criteria |
Dong et al.,62 2013 | 496; 16 | Agglomerative hierarchical cluster analysis (stratified by sex) | Average linkage | Yule Q | Cut-off between 0.2 and 0.3 imposed on dendrogram |
Dorenkamp et al.,63 2016 | 3386; 15 | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Squared Euclidean distance | Agglomerative coefficient; dendrogram; Pseudo F statistic |
Foguet-Boreu et al.,64 2015 | 322 328 aged ≥65 years; 99 | Agglomerative hierarchical cluster analysis (stratified by age group and sex) | Ward’s minimum-variance method | Jaccard coefficient | Semi-partial R2; Calinski-Harabasz Pseudo F; Pseudo T2 statistic |
Formiga et al.,65 2013 | 328 aged 85 years; 16 | Agglomerative hierarchical cluster analysis | Average agglomeration | Yule Q | NS |
Gabilondo et al.,66 2017 | 2 255 406; 27 | Agglomerative hierarchical cluster analysis (stratified by sex) | Ward’s minimum-variance method | Pearson’s dissimilarity | Cut-off based on clinical importance |
García-Olmos et al.,67 2012 | 198 670; 26 | Multiple correspondence analysis (account for age and sex) | Graphic of category points in selected dimensions to indicate ‘importance’ | Data matrix | Benzécri inertia adjustment; inertia >0.05 |
Garin et al.,68 2014 | 3625 aged >50 years; 11 | Exploratory factor analysis | Extraction method NS; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings >0.25 |
Garin et al.,69 2016 | 41 909 aged >50 years; 12 | Exploratory factor analysis | Extraction method NS; oblique rotation | Tetrachoric correlation matrix | Parallel analysis by simulation; Scree test |
Goldstein et al.,70 2010 | 3591 homeless veterans; 16 | Factor analysis | Principal-component method; Varimax rotation | Pearson-correlation matrix | Eigenvalue ≥1 |
Gu et al.,71 2017 | 2452 aged ≥60 years; 13 | Factor analysis | Principal-component method; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; factor loadings ≥0.25 |
Held et al.,54 2016 | 1464 aged ≥70 years; 17 | Network and cluster analysis | Modularity class analysis | ForceAtlas2 layout (geometric distance and degree of nodes) | Resolution = 0.6 |
Hermans and Evenhuis,72 2014 | 1047 with at least two conditions and aged ≥50 years with ID; 14 | Exploratory factor analysis | Maximum-likelihood method; Varimax rotation | Tetrachoric correlation matrix | CFI; RMSEA; SRMR |
Herr et al.,73 2015 | 117 with diagnosed heart failure; 10 | Factor analysis | Principal-component method; Oblimin rotation | Pearson-correlation matrix | Scree plot |
Holden et al.,28 2011 | 78 430; 23 | Exploratory factor analysis | Extraction method NS; orthogonal quartimax/quartimin rotation | Tetrachoric correlation matrix | Scree plot; Eigenvalue >1.0; SRMR < 0.05; CFI & TLI > 0.95; factor loadings (magnitude) >0.4 |
Islam et al.,74 2014 | 4574; 10 | Agglomerative hierarchical cluster analysis; principal-component analysis | Average linkage; Varimax rotation | Yule Q; tetrachoric correlation matrix | Dendrogram; Agglomerative coefficient; Scree plot; Eigenvalue >1; SRMSR; CFI; TLI; factor loadings (magnitude) >0.3 |
Jackson et al.,75 2015 | 7270 women aged >75 years; 31 | Exploratory factor analysis | Principal-factor method; Varimax rotation | Tetrachoric correlation matrix | Eigenvalue >1; Scree plot |
Jackson et al.,76 2016 | 4896 women; 31 | Exploratory factor analysis | Principal-factor method; Varimax rotation | Tetrachoric correlation matrix | Eigenvalue >1; Scree plot |
John et al.,25 2003 | 1039 aged ≥60 years; 11 | Agglomerative hierarchical cluster analysis | Complete linkage; average linkage | Phi coefficient; Yule Q | Sensitivity analysis; Goodman and Kruskal’s gamma |
Jovic et al.,77 2016 | 13 103 aged ≥20 years; 13 | Exploratory factor analysis (stratified by age group and sex) | Principal-component method; Varimax orthogonal rotation | NS | Scree plot; Factor loadings >0.25; Eigenvalues >1 |
Kim et al.,78 2012 | 1844 HIV-infected patients; 15 | Exploratory factor analysis | Extraction method NS; oblique rotation | Tetrachoric correlation matrix | Eigenvalues; CFI; TLI; RMSEA |
Kirchberger et al.,79 2012 | 4127 aged ≥65 years; 13 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings ≥0.25 |
Kumar et al.,80 2017 | 2134 aged ≥50 years; 45 | Hierarchical cluster analysis using treelet transformation | Principal-component analysis | Correlation | Dendrogram; cut level of 12 determined by a 10-fold cross-validation procedure |
Magnan et al.,81 2018 | 29 562 with types 1 or 2 diabetes; 12 | Exploratory factor analysis | Robust weighted least squares; oblique (Geomin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; interpretability of factors |
Marengoni et al.,20 2009 | 1099; 15 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Marengoni et al.,82 2010 | 1332 aged ≥65 years; 19 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Marengoni et al.,83 2013 | n1 = 1155, n2 = 1173; 19 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Ng et al.,34 2012 | 8841; 24 | Three-step cluster analysis | Searching all comorbid pairs to form groups | Somers’ D statistic | Association test using the Benjamini–Hochberg procedure |
Nurnberg et al.,84 1991 | 110; 12 | Factor analysis | Principal-factor method; orthogonal Varimax rotation | Correlation matrix | Eigenvalue >1; factor loadings ≥0.4 |
Poblador-Plou et al.,85 2014 | n1 = 79 291, n2 = 275 682; 114 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Prados-Torres et al.,86 2012 | 275 682; 13–39 for various age group and sex | Exploratory factor analysis (stratified by age group and sex) | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Prazeres et al.,87 2015 | 1993; 16 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Ruiz et al.,88 2015 | 2 788 900 aged ≥65 years; 20 | Correspondence analysis | Graphic of category points in selected dimensions to quantify information explained | Data matrix | Benzécri inertia adjustment; Greenacre approximation |
Schäfer et al.,89 2010 | 149 280 aged ≥65 years; 46 | Exploratory factor analysis (stratified by sex) | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings ≥0.25 |
Sideris et al.,90 2016 | No. of individuals and conditions NS | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Jaccard coefficient | Dendrogram; cut level of 1.5 determined based on a validation procedure |
Vu et al.,91 2011 | 3729 with at least two conditions and aged ≥65 years (fall-related injuries); 14 | Agglomerative hierarchical cluster analysis | Average linkage | Jaccard coefficient; Yule Q | Calinski/Harabasz index; Duda/Hart index |
Walker et al.,92 2016 | 5647 aged ≥55 years; 19 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; Eigenvalue ≥1; factor loadings >0.25 |
Wang et al.,93 2015 | 1480 aged ≥60 years; 16 | Exploratory factor analysis | Extraction method NS; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; factor loadings ≥0.25 |
NS, not specified; CFI, comparative fit index; TLI, Tucker-Lewis index; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; ID, intellectual disabilities.
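To make two of the proximity measures in Table 1 concrete, the sketch below computes Yule's Q and the Jaccard coefficient from the 2×2 table of two binary condition indicators and feeds the resulting dissimilarities into average-linkage agglomerative clustering with SciPy. It is a minimal illustration under stated assumptions, not the procedure of any included study: the data are synthetic (not NHS data), the function names are hypothetical, and converting a similarity s to a dissimilarity as 1 − s is one common choice assumed here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def counts_2x2(x, y):
    """Cells of the 2x2 table for two binary (0/1) condition indicators."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    a = int(np.sum(x & y))      # both conditions present
    b = int(np.sum(x & ~y))     # only the first condition present
    c = int(np.sum(~x & y))     # only the second condition present
    d = int(np.sum(~x & ~y))    # neither condition present
    return a, b, c, d


def yule_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc); undefined when ad + bc == 0."""
    return (a * d - b * c) / (a * d + b * c)


def jaccard(a, b, c, d=None):
    """Jaccard coefficient = a / (a + b + c); joint absences (d) are ignored."""
    return a / (a + b + c)


def cluster_conditions(data, similarity, n_groups):
    """Average-linkage clustering of the columns (conditions) of a 0/1 matrix."""
    k = data.shape[1]
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            s = similarity(*counts_2x2(data[:, i], data[:, j]))
            dist[i, j] = dist[j, i] = 1.0 - s  # assumed similarity-to-distance rule
    tree = linkage(squareform(dist), method="average")
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Synthetic data: conditions 0-1 share one latent cause, conditions 2-3 another.
rng = np.random.default_rng(0)
n = 2000
z1 = rng.random(n) < 0.3
z2 = rng.random(n) < 0.3


def noisy(z):
    """Flip about 10% of the latent indicator to mimic imperfect association."""
    return np.where(rng.random(n) < 0.1, ~z, z)


data = np.column_stack([noisy(z1), noisy(z1), noisy(z2), noisy(z2)]).astype(int)

labels_q = cluster_conditions(data, yule_q, n_groups=2)
labels_j = cluster_conditions(data, jaccard, n_groups=2)
print("Yule Q groups: ", labels_q)
print("Jaccard groups:", labels_j)
```

On this strongly structured toy example both measures should recover the same two groups. With real survey data, where associations are weaker and prevalences differ widely, the two measures can order condition pairs differently because Yule's Q uses joint absences and the Jaccard coefficient does not.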
Analytical methodology for the identification of multimorbid condition groups (41 studies)
Study, year . | No. of individuals; no. of conditions . | Statistical method . | Clustering algorithm . | Proximity measure . | Determination of number of groups . |
---|---|---|---|---|---|
Abad-Díez et al.,58 2014 | 72 815; 51 | Exploratory factor analysis | Principal-factor method; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Alonso-Moran et al.,59 2015 | 126 889 patients with T2DM; 28 | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Pearson’s dissimilarity | Cut-off point of 0.98 imposed on dendrogram |
Clerencia-Sierra et al.,60 2015 | 924 patients aged ≥65 years; 59 | Exploratory factor analysis (stratified by sex) | Principal-factor method; rotation method NS | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Cornell et al.,61 2007 | 1 645 314; 45 | Agglomerative hierarchical cluster analysis | Lance-Williams method with β set at –0.50 | Jaccard coefficient | Evaluated by seven health services researchers relative to three clinical criteria |
Dong et al.,62 2013 | 496; 16 | Agglomerative hierarchical cluster analysis (stratified by sex) | Average linkage | Yule Q | Cut-off between 0.2 and 0.3 imposed on dendrogram |
Dorenkamp et al.,63 2016 | 3386; 15 | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Squared Euclidean distance | Agglomerative coefficient; dendrogram; Pseudo F statistic |
Foguet-Boreu et al.,64 2015 | 322 328 aged ≥65 years; 99 | Agglomerative hierarchical cluster analysis (stratified by age group and sex) | Ward’s minimum-variance method | Jaccard coefficient | Semi-partial R2; Calinski-Harabasz Pseudo F; Pseudo T2 statistic |
Formiga et al.,65 2013 | 328 aged 85 years; 16 | Agglomerative hierarchical cluster analysis | Average agglomeration | Yule Q | NS |
Gabilondo et al.,66 2017 | 2 255 406; 27 | Agglomerative hierarchical cluster analysis (stratified by sex) | Ward’s minimum-variance method | Pearson’s dissimilarity | Cut-off based on clinical importance |
García-Olmos et al.,67 2012 | 198 670; 26 | Multiple correspondence analysis (account for age and sex) | Graphic of category points in selected dimensions to indicate ‘importance’ | Data matrix | Benzécri inertia adjustment; inertia >0.05 |
Garin et al.,68 2014 | 3625 aged >50 years; 11 | Exploratory factor analysis | Extraction method NS; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings >0.25 |
Garin et al.,69 2016 | 41 909 aged >50 years; 12 | Exploratory factor analysis | Extraction method NS, Oblique rotation | Tetrachoric correlation matrix | Parallel analysis by simulation; Scree test |
Goldstein et al.,70 2010 | 3591 homeless veterans; 16 | Factor analysis | Principal-component method; Varimax rotation | Pearson-correlation matrix | Eigenvalue ≥1 |
Gu et al.,71 2017 | 2452 aged ≥60 years; 13 | Factor analysis | Principal-component method; Oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; factor loadings ≥0.25 |
Held et al.,54 2016 | 1464 aged ≥70 years; 17 | Network and cluster analysis | Modularity class analysis | ForceAtlas2 layout (geometric distance and degree of nodes) | Resolution = 0.6 |
Hermans and Evenhuis,72 2014 | 1047 with at least two conditions and aged ≥50 years with ID; 14 | Exploratory factor analysis | Maximum-likelihood method; Varimax rotation | Tetrachoric correlation matrix | CFI; RMSEA; SRMR |
Herr et al.,73 2015 | 117 with diagnosed heart failure; 10 | Factor analysis | Principal-component method; Oblimin rotation | Pearson-correlation matrix | Scree plot |
Holden et al.,28 2011 | 78 430; 23 | Exploratory factor analysis | Extraction method NS; orthogonal quartimax/quartimin rotation | Tetrachoric correlation matrix | Scree plot; Eigenvalue >1.0; SRMR < 0.05; CFI & TLI > 0.95; factor loadings (magnitude) >0.4 |
Islam et al.,74 2014 | 4574; 10 | Agglomerative hierarchical cluster analysis; principal-component analysis | Average linkage; Varimax rotation | Yule Q; tetrachoric correlation matrix | Dendrogram; Agglomerative coefficient; Scree plot; Eigenvalue >1; SRMSR; CFI; TLI; factor loadings (magnitude) >0.3 |
Jackson et al.,75 2015 | 7270 women aged >75 years; 31 | Exploratory factor analysis | Principal-factor method; Varimax rotation | Tetrachoric correlation matrix | Eigenvalue >1; Scree plot |
Jackson et al.,76 2016 | 4896 women; 31 | Exploratory factor analysis | Principal-factor method; Varimax rotation | Tetrachoric correlation matrix | Eigenvalue >1; Scree plot |
John et al.,25 2003 | 1039 aged ≥60 years; 11 | Agglomerative hierarchical cluster analysis | Complete linkage; average linkage | Phi coefficient; Yule Q | Sensitivity analysis; Goodman and Kruskal’s gamma |
Jovic et al.,77 2016 | 13 103 aged ≥20 years; 13 | Exploratory factor analysis (stratified by age group and sex) | Principal-component method; Varimax orthogonal rotation | NS | Scree plot; Factor loadings >0.25; Eigenvalues >1 |
Kim et al.,78 2012 | 1844 HIV-infected patients; 15 | Exploratory factor analysis | Extraction method NS; oblique rotation | Tetrachoric correlation matrix | Eigenvalues; CFL; TLI; RMSEA |
Kirchberger et al.,79 2012 | 4127 aged ≥65 years; 13 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings ≥0.25 |
Kumar et al.,80 2017 | 2134 aged ≥50 years; 45 | Hierarchical cluster analysis using treelet transformation | Principal-component analysis | Correlation | Dendrogram; cut level of 12 determined by a 10-fold cross-validation procedure |
Magnan et al.,81 2018 | 29 562 with types 1 or 2 diabetes; 12 | Exploratory factor analysis | Robust weight least squares; oblique (Geomin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; interpretability of factors |
Marengoni et al.,20 2009 | 1099; 15 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Marengoni et al.,82 2010 | 1332 aged ≥65 years; 19 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Marengoni et al.,83 2013 | n1 = 1155, n2 = 1173; 19 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Ng et al.,34 2012 | 8841; 24 | Three-step cluster analysis | Searching all comorbid pairs to form groups | Somers’ D statistic | Association test using the Benjamini–Hochberg procedure |
Nurnberg et al.,84 1991 | 110; 12 | Factor analysis | Principal-factor method; orthogonal Varimax rotation | Correlation matrix | Eigenvalue >1; factor loadings ≥0.4 |
Poblador-Plou et al.,85 2014 | n1 = 79 291, n2 = 275 682; 114 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Prados-Torres et al.,86 2012 | 275 682; 13–39 for various age group and sex | Exploratory factor analysis (stratified by age group and sex) | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; factor loadings ≥0.25 |
Prazeres et al.,87 2015 | 1993; 16 | Agglomerative hierarchical cluster analysis | Average linkage | Yule Q | NS |
Ruiz et al.,88 2015 | 2 788 900 aged ≥65 years; 20 | Correspondence analysis | Graphic of category points in selected dimensions to quantify information explained | Data matrix | Benzécri inertia adjustment; Greenacre approximation |
Schäfer et al.,89 2010 | 149 280 aged ≥65 years; 46 | Exploratory factor analysis (stratified by sex) | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue ≥1; factor loadings ≥0.25 |
Sideris et al.,90 2016 | No. of individuals and conditions NS | Agglomerative hierarchical cluster analysis | Ward’s minimum-variance method | Jaccard coefficient | Dendrogram; cut level of 1.5 determined based on a validation procedure |
Vu et al.,91 2011 | 3729 with at least two conditions and aged ≥65 years (fall-related injuries); 14 | Agglomerative hierarchical cluster analysis | Average linkage | Jaccard coefficient; Yule Q | Calinski/Harabasz index; Duda/Hart index |
Walker et al.,92 2016 | 5647 aged ≥55 years; 19 | Exploratory factor analysis | Principal-factor method; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Scree plot; Eigenvalue ≥1; factor loadings >0.25 |
Wang et al.,93 2015 | 1480 aged ≥60 years; 16 | Exploratory factor analysis | Extraction method NS; oblique (Oblimin) rotation | Tetrachoric correlation matrix | Eigenvalue >1; factor loadings ≥0.25 |
NS, not specified; CFI, comparative fit index; TLI, Tucker-Lewis index; RMSEA, root mean square error approximation; SRMR, standardized root mean square residual; ID, intellectual disabilities.
Analytical methods
Our review identified five analytical methods used for identifying multimorbid condition groups (Table 1). We describe these five methods below, along with their adaptations specifically for multimorbidity research.
Factor-analysis method
The role of factor analysis is to identify ‘latent’ factors based on the assumption that variables associated with the same factor share a common underlying trait that is responsible for the correlation among them. The implementation of factor analysis begins by factoring a correlation or covariance matrix that measures the pairwise associations between the observed variables. For the dichotomous morbidity variable, a tetrachoric correlation matrix will lead to more valid results.39 Factoring the matrix requires statistical estimation procedures such as the principal-factor method, maximum-likelihood method and least squares, among others.40–42
A crucial component of factor analysis is the decision about the number of factors to retain/extract. Common factor-extraction criteria are the eigenvalue-greater-than-one rule, the graphical scree test, parallel analysis and the method of explained variance. Studies have shown that the eigenvalue-greater-than-one rule leads to overextraction (inclusion of extraneous factors),43,44 especially in the presence of outliers, because their effect can cause eigenvalues to increase dramatically.45 The method of specified variance leads to few errors, although it can yield too many factors and lead to constructs with little theoretical value.45 Goodness-of-fit indexes are often considered to confirm the adequacy of the selected number of factors, including the Tucker-Lewis index (TLI), root mean square residual (RMSR), standardized root mean square residual (SRMR) and comparative fit index (CFI).
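As a rough illustration of how two of these criteria can disagree, the sketch below applies the eigenvalue-greater-than-one rule and an explained-variance rule to the same set of eigenvalues. The eigenvalues and the 75% target are hypothetical, chosen only for illustration, and the function names are not from any cited study.

```python
def kaiser_rule(eigenvalues):
    """Number of factors retained by the eigenvalue-greater-than-one rule."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def explained_variance_rule(eigenvalues, target=0.75):
    """Smallest number of factors whose cumulative share of variance reaches `target`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of an 8-variable correlation matrix (they sum to 8).
eigenvalues = [2.8, 1.6, 1.1, 0.9, 0.7, 0.5, 0.3, 0.1]

n_kaiser = kaiser_rule(eigenvalues)                      # eigenvalues above 1
n_variance = explained_variance_rule(eigenvalues, 0.75)  # 75% of total variance
```

With these made-up values the two rules retain different numbers of factors, which is the kind of disagreement that motivates reporting the criterion used.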
Factor rotation is also required to obtain a solution with a simple structure that is parsimonious and easily interpretable. There are two broad classes of rotation techniques. Orthogonal rotation (encompassing Varimax, Quartimax and Equamax) tries to separate variables into non-overlapping clusters, each possibly associated with one factor, in such a way that the factors are independent. Oblique rotation (such as Oblimin, Quartimin and Promax) permits correlation among the factors, so that variables can be associated with more than one factor.
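Once a rotated solution is available, many of the studies in Table 1 assign conditions to factors by a loading cut-off such as 0.25. The following minimal sketch shows that assignment step only. The loadings are invented for illustration, and the rotation itself is assumed to have been done elsewhere.

```python
def groups_from_loadings(loadings, threshold=0.25):
    """Map each factor index to the conditions whose |loading| meets the threshold."""
    groups = {}
    for condition, row in loadings.items():
        for factor_index, loading in enumerate(row):
            if abs(loading) >= threshold:
                groups.setdefault(factor_index, []).append(condition)
    return groups


# Hypothetical rotated loadings on two factors (not taken from any study).
loadings = {
    "hypertension": [0.62, 0.05],
    "diabetes": [0.48, 0.10],
    "osteoarthritis": [0.31, 0.44],
    "osteoporosis": [0.02, 0.57],
}

groups = groups_from_loadings(loadings)
```

In this made-up example, osteoarthritis meets the cut-off on both factors, so it appears in two groups, as can happen when a condition is associated with more than one factor.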
Hierarchical-clustering method
Proximity between entities can be quantified by either a similarity or a dissimilarity measure. In multimorbidity research, the proximity between two conditions is based on binary variables indicating the presence or absence of conditions in each individual and can be presented as a two-by-two contingency table.
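To make the two-by-two table concrete, the sketch below counts its four cells from two binary indicator vectors and computes three of the proximity measures listed in Table 1 (Jaccard coefficient, Yule Q and the phi coefficient) using their standard definitions. The data are invented, and the sketch assumes no denominator is zero.

```python
import math


def contingency_counts(x, y):
    """Cells of the 2x2 table: a = both present, b = x only, c = y only, d = neither."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(a, b, c, d):
    """Share of individuals with both conditions among those with at least one."""
    return a / (a + b + c)


def yule_q(a, b, c, d):
    """Yule Q: (ad - bc) / (ad + bc), ranging from -1 to 1."""
    return (a * d - b * c) / (a * d + b * c)


def phi(a, b, c, d):
    """Phi coefficient: Pearson correlation of the two binary variables."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical presence/absence of two conditions in eight individuals.
cond_x = [1, 1, 1, 0, 0, 0, 1, 0]
cond_y = [1, 1, 0, 0, 0, 0, 1, 1]

cells = contingency_counts(cond_x, cond_y)
```

Note that the Jaccard coefficient ignores cell d (individuals with neither condition), whereas Yule Q and phi use all four cells. This is one reason the measures can rank the same pairs differently.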
Unified-clustering algorithm
Ng et al.34 proposed a three-step unified-clustering method to identify groups of multimorbid conditions. The method specifically addresses three statistical issues for using cluster analysis to study multimorbidity patterns, namely the adjustment for multimorbidity by chance, the uniqueness of clustering results and the control for false discovery. To this end, the first step adopts the asymmetric Somers’ D statistic to quantify the degree of non-random multimorbidity between any two conditions. The Somers’ D statistic adjusts for multimorbidity by chance and offers higher power to detect non-random multimorbidity compared with other common concordance statistics such as kappa, Kendall’s Tau-b, gamma and adjusted Rand index.34 With p conditions, the first step computes asymmetric Somers’ D statistics for p(p-1)/2 distinct pairs of conditions. The second step assesses their strength of association using the Benjamini–Hochberg procedure to control for the false discovery rate (FDR) at level α and offer protection against false positives.48 Supposing that q pairs of conditions are truly non-random, this procedure obtains the null distribution of Somers’ D statistics using a permutation approach and sets the error rate so that the expected number of false positives among these q pairs is less than one.34 In the third step, the algorithm adopts a ‘clumping’ clustering technique to search all comorbid pairs of conditions and then form groups of multimorbid conditions in which all conditions belonging to a group coexist with one another. The strength of multimorbidity among conditions in a group is given by the averaged Somers’ D statistics for all pairs of conditions belonging to the group.
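The first two steps can be sketched as follows, under stated assumptions. For a two-by-two table, the asymmetric Somers’ D of Y given X reduces to a difference of two conditional proportions. The Benjamini–Hochberg step below takes p-values as given, whereas the method described above derives them from a permutation null distribution, which this sketch does not attempt. Function names and data are illustrative only.

```python
def somers_d(a, b, c, d):
    """Asymmetric Somers' D of Y given X for a 2x2 table.

    a = X and Y present, b = X only, c = Y only, d = neither.
    Equals P(Y present | X present) - P(Y present | X absent).
    """
    return a / (a + b) - c / (c + d)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical 2x2 counts for one pair of conditions.
d_y_given_x = somers_d(2, 2, 1, 5)
d_x_given_y = somers_d(2, 1, 2, 5)  # swapping b and c reverses the direction

# Hypothetical p-values for four pairs of conditions.
flags = benjamini_hochberg([0.2, 0.001, 0.04, 0.03], alpha=0.05)
```

The two directions of the statistic differ for the same table, which is what "asymmetric" refers to here.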
This three-step method identifies overlapping groups of multimorbid conditions, which are easy to interpret. It is more realistic that some health conditions could belong to more than one multimorbidity group simultaneously.28 For the subsequent clustering of individuals on the basis of their multimorbidity patterns, Ng9 proposed a procedure to convert the overlapping groups into non-overlapping groups of multimorbid conditions. An updated version of the procedure is available, where we add an extra criterion in forming the non-overlapping groups. This new criterion reflects the idea that a given health condition and its ‘closest’ condition (with which the pairwise Somers’ D statistic is maximum) should form a multimorbid pair and belong to the same group. UCINET6 for Windows49 can be used to display graphically the network of health conditions, which are classified either as a ‘closed’ non-overlapping group (all conditions in a group coexist with one another) or a ‘semi-closed’ non-overlapping group (conditions in a group coexist with more than half of the members).
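One way to read the requirement that all conditions in a group coexist with one another is as a search for maximal cliques in the graph whose edges are the significant comorbid pairs. The sketch below uses the basic Bron–Kerbosch recursion for that purpose. This is an interpretation for illustration, not a reproduction of the authors' ‘clumping’ procedure, and the pairs are hypothetical.

```python
def build_adjacency(pairs):
    """Undirected graph as {condition: set of conditions it is paired with}."""
    adjacency = {}
    for u, v in pairs:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def maximal_cliques(adjacency):
    """All maximal cliques (basic Bron-Kerbosch, no pivoting), each as a sorted list."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Hypothetical significant comorbid pairs (not results from any dataset).
pairs = [
    ("diabetes", "hypertension"),
    ("diabetes", "cholesterol"),
    ("hypertension", "cholesterol"),
    ("anxiety", "depression"),
    ("depression", "back pain"),
]

overlapping_groups = maximal_cliques(build_adjacency(pairs))
```

In this toy example, depression falls into two groups, which mirrors the overlapping structure described above.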
Multiple correspondence analysis
Multiple correspondence analysis (MCA)50 is a non-parametric multivariate method that uses graphical procedures to reveal the association between categorical variables, including binary, nominal and ordinal.51 The method attempts to present multivariate categorical data in a low-dimensional space (a counterpart of principal-component analysis for categorical data).
MCA is concerned with the decomposition of a data matrix to generate diagrams of row and column coordinates, called maps, which display the associations amongst (within) the variables. The decomposition can be approached using a Burt matrix.52 Obtaining the coordinates of row and column profiles with respect to the principal axes (dimensions) involves eigenvalue-eigenvector decomposition of the Burt matrix.53 The Burt matrix is a cross-product of an indicator matrix that arrays together all two-way cross-tabulations of all the variables. It is symmetrical and its decomposition yields a solution with row and column profiles that account for the largest proportion of variance (inertia). The Burt matrix approach may overestimate the total inertia due to the existence of submatrices on the main diagonal that are cross-tabulations of each variable with itself.52,53 This problem can be avoided by using either adjusted inertias or joint correspondence analysis.53
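The construction of the Burt matrix can be sketched in a few lines. The example below one-hot encodes two binary variables into an indicator matrix Z and forms the cross-product of Z with itself. The data and function names are illustrative, and the eigen-decomposition step is left out.

```python
def indicator_matrix(data, n_categories=2):
    """One-hot encode each variable; values are category codes 0..n_categories-1."""
    rows = []
    for record in data:
        row = []
        for value in record:
            row.extend(1 if value == cat else 0 for cat in range(n_categories))
        rows.append(row)
    return rows


def burt_matrix(indicator):
    """Cross-product Z'Z of the indicator matrix Z."""
    n_cols = len(indicator[0])
    return [
        [sum(row[i] * row[j] for row in indicator) for j in range(n_cols)]
        for i in range(n_cols)
    ]


# Hypothetical data: five individuals, two binary conditions (1 = present).
data = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]

Z = indicator_matrix(data)
B = burt_matrix(Z)
```

The diagonal blocks of B cross-tabulate each variable with itself (category counts on the diagonal and zeros elsewhere). These are the submatrices that inflate the total inertia.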
As with most dimension-reduction techniques, the optimal number of dimensions to retain is of concern in MCA. A threshold is defined as 1 divided by the total number of variables in use,52 and the dimensions with inertia or adjusted inertia greater than this threshold are retained. Interpretation can be done by inspecting the map, taking into consideration the dimensions on which each response category is best positioned based on proximity. The positions of the response categories along a given dimension set out the magnitude and sign of their loadings on that dimension. Response categories with the same sign on a given dimension are deemed to be in association with one another, but inversely related to any category with an opposite sign.
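A minimal sketch of the retention rule follows, together with the Benzécri adjustment named in Table 1. It assumes the principal inertias come from the indicator-matrix analysis, and uses the commonly cited form of the adjustment, (Q/(Q-1))² × (λ - 1/Q)² for λ > 1/Q, where Q is the number of variables. The inertia values are hypothetical.

```python
def retained_dimensions(inertias, n_variables):
    """Principal inertias that exceed the 1/Q threshold."""
    threshold = 1.0 / n_variables
    return [value for value in inertias if value > threshold]


def benzecri_adjusted(inertias, n_variables):
    """Benzecri-adjusted inertias for the dimensions above the 1/Q threshold."""
    threshold = 1.0 / n_variables
    scale = (n_variables / (n_variables - 1)) ** 2
    return [scale * (value - threshold) ** 2
            for value in inertias if value > threshold]


# Hypothetical principal inertias from an analysis of Q = 4 variables.
inertias = [0.40, 0.30, 0.20, 0.10]

kept = retained_dimensions(inertias, n_variables=4)
adjusted = benzecri_adjusted(inertias, n_variables=4)
```

Here the threshold is 1/4, so only the first two dimensions are kept. Their adjusted inertias are much smaller than the raw values, reflecting the correction for overestimated total inertia.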
Network and cluster analyses
Summary of review findings
The majority of the 41 fully reviewed papers presented in Table 1 adopted the factor-analysis method (21 studies; 51%), indicating that factor analysis was the most widely used analytical approach for identifying multimorbid condition groups in previous studies. Almost all of them (18 studies) used tetrachoric correlation matrices to account for binary morbidity data. Among them, 14 studies used the oblique method for rotation of factors.58,68,71,96,97 Only eight studies used the Varimax or the orthogonal quartimax rotation methods.28,72,74
The next most commonly used analytical method was the hierarchical-clustering algorithm (16 studies; 39%). Average linkage,20,25,61,62,64,82,87,91 Ward’s minimum-variance method59,63,64,98–102 and the Lance and Williams formula61,101,103 were the common methods adopted to update distances between clusters. Typical similarity measures adopted were the Jaccard coefficient61,64,91,101,103 and Yule Q.20,25,62,65,74,82,87,104
The other three analytical methods were only adopted in recent studies. In particular, the unified three-step clustering method was developed specifically for clustering health conditions,34 the MCA technique is a multivariate statistical method specifically for uncovering associations among categorical (binary) morbidity data67,105 and the network and cluster analyses were employed to identify sub-networks or groups of connected health conditions.54
The quality of reporting of the 41 included studies was generally good, with quality scores rated from 14 to 21 out of 23 (see the detailed rating of these studies provided in Supplementary Table 3, available as Supplementary data at IJE online). The included papers with a lower reporting quality score were mainly methodology papers, which focused on the method for identifying groups of multimorbid conditions and illustrated its applicability using real datasets as an example.
Comparison analysis results
We applied the five analytical methods to the same Australian NHS 2007–08 data. We considered 25 health conditions, whose prevalence rates are given in Table 2. Supplementary Figures 1–4, available as Supplementary data at IJE online, provide the algorithms in R (R Foundation for Statistical Computing, Vienna, Austria) for the first four analytical methods to identify multimorbid groups of these 25 conditions. In Supplementary Figure 5, available as Supplementary data at IJE online, we display a snapshot to illustrate the fifth method with the network and cluster analyses. Table 2 summarizes the identified groups of multimorbid conditions.
List of 25 health conditions and the groups of multimorbid health conditions identified by various analytical methods (NHS data)
List of 25 health conditions and the groups of multimorbid health conditions identified by various analytical methods (NHS data)
Prevalence rate of health condition (in decreasing order): Hay fever (16.4%), Hypertension (11.0%), Asthma (10.5%), Back pain (9.4%), Osteoarthritis (9.2%), Sinusitis (9.0%), Cholesterol (6.6%), Migraine (5.8%), Disc disorder (5.7%), Food allergy (5.6%), Mood disorder (5.5%), Gout (5.2%), Osteoporosis (3.8%), Anxiety (3.6%), Diabetes (3.6%), Depression (3.5%), Thyroid (2.8%), Rheumatoid (2.6%), Psoriasis (2.3%), Hernia (2.2%), Bronchitis (2.0%), Angina (2.0%), Anaemia (1.8%), Oedema (1.7%), Dermatitis (1.1%).

Analytical method (specific characteristics) | Number of groups and conditions | Groups of multimorbid health conditions (number of conditions in each group)
---|---|---
Factor analysis (oblique Oblimin rotation) | Five groups of 13 health conditions | Diabetes, Cholesterol, Angina, Hypertension (4); Anxiety, Depression, Mood disorder (3); Osteoarthritis, Osteoporosis (2); Hayfever, Sinusitis (2); Bronchitis, Asthma (2)
Factor analysis (orthogonal quartimax rotation)a | Five groups of 16 health conditions | Diabetes, Cholesterol, Angina, Oedema, Hypertension, Osteoarthritis, Osteoporosis, Gout (8); Anxiety, Depression, Mood disorder, Migraine (4); Osteoarthritis, Osteoporosis (2); Hayfever, Sinusitis (2); Bronchitis, Asthma (2)
Hierarchical-clustering algorithm (Yule Q coefficient, average linkage) | Five groups of 22 health conditions | Diabetes, Cholesterol, Angina, Oedema, Hypertension, Dermatitis, Osteoarthritis, Rheumatoid, Osteoporosis, Gout (10); Anxiety, Depression, Mood disorder, Migraine (4); Hayfever, Sinusitis, Bronchitis, Asthma (4); Anaemias, Thyroid (2); Hernia, Disc disorder (2)
Hierarchical-clustering algorithm (Jaccard coefficient, average linkage) | Two groups of 19 health conditions | Diabetes, Cholesterol, Angina, Oedema, Hypertension, Disc disorder, Osteoarthritis, Osteoporosis, Gout (9); Anxiety, Depression, Mood disorder, Migraine, Hayfever, Sinusitis, Bronchitis, Asthma, Backpain, Food allergy (10)
Three-step unified-clustering methodb | Five groups of 19 health conditions | Diabetes, Cholesterol, Angina, Oedema, Hypertension, Gout (6); Osteoarthritis, Osteoporosis, Thyroid (3); Bronchitis, Asthma (2); Migraine, Hayfever, Sinusitis, Food allergy (4); Anxiety, Depression, Mood disorder, Backpain (4)
Multiple correspondence analysis (two dimensions) | Four groups of 19 health conditions | Thyroid, Rheumatoid, Hernia, Osteoarthritis, Osteoporosis (5); Diabetes, Cholesterol, Hypertension, Gout (4); Backpain, Psoriasis (2); Asthma, Hayfever, Food allergy, Sinusitis, Mood disorder, Migraine, Anxiety, Depression (8)
Multiple correspondence analysis (three dimensions)c | Six groups of 22 health conditions | Disc disorder, Thyroid, Rheumatoid, Hernia, Osteoarthritis, Osteoporosis (6); Diabetes, Cholesterol, Hypertension, Gout (4); Backpain, Psoriasis (2); Asthma, Hayfever, Food allergy, Sinusitis, Mood disorder, Migraine, Anxiety, Depression, Bronchitis, Anaemias (10); Food allergy, Asthma, Hayfever, Sinusitis (4); Anxiety, Depression, Mood disorder (3)
Network and cluster analyses method | Five groups of 25 health conditions | Diabetes, Cholesterol, Angina, Oedema, Hypertension, Gout (6); Hayfever, Asthma, Sinusitis, Bronchitis, Dermatitis, Food allergy (6); Anxiety, Depression, Mood disorder, Disc disorder (4); Migraine, Backpain, Anaemias, Psoriasis (4); Thyroid, Rheumatoid, Hernia, Osteoarthritis, Osteoporosis (5)
Main similarities among the identified groupings of conditions are in bold: (i) cardiovascular and metabolic diseases: Diabetes, Cholesterol, Angina, Oedema, Hypertension, Gout; (ii) mental health problems: Anxiety, Depression, Mood disorder; and (iii) allergic diseases: Hayfever, Sinusitis, Food allergy.
a Osteoarthritis and osteoporosis overlapped in Groups 1 and 3.
b The first three groups were closed, whereas the last two groups were semi-closed (see text for definitions).
c Asthma, Hayfever, Food allergy, Sinusitis, Mood disorder, Anxiety and Depression overlapped.
Factor analysis with tetrachoric correlations led to different groupings of multimorbid conditions when different model-selection and/or rotation methods were used. With the maximum-likelihood method, we identified five groups of 13 health conditions with the commonly used oblique Oblimin rotation method and five groups of 16 conditions when the orthogonal quartimax rotation method was used (Table 2). Supplementary Tables 4 and 5, available as Supplementary data at IJE online, display the factors and their loadings for these two rotation methods.
Hierarchical-clustering algorithms also led to different groupings of multimorbid conditions when different proximity measures and methods of calculating distances between clusters were used. Figure 1 displays the dendrogram using the Yule Q coefficient and the average linkage method. We identified five groups of 22 health conditions (Table 2). Supplementary Figure 6, available as Supplementary data at IJE online, shows the dendrogram of a different grouping result (two groups of 19 conditions) obtained using the Jaccard coefficient and the average linkage method (Table 2).

Hierarchical clustering of health conditions using Yule Q coefficient and the average linkage method.
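As a rough illustration of the procedure behind such a dendrogram, the following Python sketch computes Yule's Q between binary condition indicators and merges clusters by average linkage. The toy data, the helper names `yule_q` and `average_linkage`, and the conversion of Q into a dissimilarity as 1 − Q are assumptions made for illustration; they are not details taken from the study.

```python
from itertools import combinations

def yule_q(x, y):
    """Yule's Q for two equal-length sequences of 0/1 indicators."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both present
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only x present
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only y present
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both absent
    denom = a * d + b * c
    return 0.0 if denom == 0 else (a * d - b * c) / denom

def average_linkage(names, dist, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(u, v) for u in clusters[i] for v in clusters[j]]
            mean_d = sum(dist[frozenset(p)] for p in pairs) / len(pairs)
            if best is None or mean_d < best[0]:
                best = (mean_d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical indicators for ten respondents (1 = condition reported).
data = {
    "anxiety":      [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    "depression":   [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    "hypertension": [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],
    "diabetes":     [0, 0, 0, 1, 1, 0, 0, 0, 0, 1],
}

# Assumed dissimilarity: 1 - Q, so strongly associated conditions are close.
dist = {frozenset((u, v)): 1 - yule_q(data[u], data[v])
        for u, v in combinations(data, 2)}

groups = average_linkage(list(data), dist, n_clusters=2)
print(groups)
```

Stopping at a fixed number of clusters stands in for cutting the dendrogram at a chosen height; a different cut, linkage rule or proximity measure can change the groups.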
The three-step unified-clustering method initially obtained 18 overlapping groups of multimorbid conditions (see Supplementary Table 6, available as Supplementary data at IJE online). Figure 2 displays the network of identified groups of multimorbid conditions. Osteoarthritis, Hypertension and Sinusitis are the three conditions with the highest number of multimorbid conditions. Using the conversion procedure, we identified five non-overlapping groups of 19 conditions. Supplementary Table 7, available as Supplementary data at IJE online, shows the pairwise Somers’ D statistics on the degree of non-random multimorbidity among the 25 conditions.
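To make the asymmetry of Somers' D concrete, the sketch below evaluates it for a single 2×2 table of two binary conditions. For such a table, D(Y|X) reduces to the difference between the proportion with Y among those with X and the proportion with Y among those without X. The counts and the function name `somers_d` are hypothetical, and the sketch does not reproduce the three-step method's multiple-testing step.

```python
def somers_d(a, b, c, d):
    """Somers' D(Y|X) for a 2x2 table of counts.

    a: X present, Y present    b: X present, Y absent
    c: X absent,  Y present    d: X absent,  Y absent
    """
    # Concordant minus discordant pairs, divided by pairs not tied on X.
    return (a * d - b * c) / ((a + b) * (c + d))

# Hypothetical counts for 1000 respondents.
a, b, c, d = 30, 70, 90, 810

d_y_given_x = somers_d(a, b, c, d)  # Y treated as the dependent condition
d_x_given_y = somers_d(a, c, b, d)  # roles swapped by transposing the table

print(d_y_given_x, d_x_given_y)
```

The two directions generally differ, which is why the statistic is described as asymmetric.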
With MCA, seven dimensions had principal inertias (see Supplementary Table 8, available as Supplementary data at IJE online) greater than the average inertia of 1/25 = 0.04. Of these seven dimensions, the last five made a negligible contribution to the explained inertia; hence, the first two dimensions, which accounted for 83.1%, were retained. Figure 3 displays the results of the MCA based on a Burt matrix for these two dimensions. We identified four groups of 19 conditions (Table 2). Supplementary Figure 7, available as Supplementary data at IJE online, shows a different grouping result (six groups of 22 conditions) corresponding to the MCA based on the Burt matrix of three dimensions, which explain 84.8% of the principal inertias (Table 2).
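The retention rule above can be sketched in Python. With Q = 25 binary variables, the average inertia is 1/Q = 0.04, and one common adjustment (Benzécri's) rescales each retained inertia λ to (Q/(Q − 1))² (λ − 1/Q)². The sketch assumes the inputs are indicator-matrix principal inertias (the square roots of the Burt-matrix inertias), uses made-up values, and reports each dimension's share of the summed adjusted inertias. Whether the study used exactly this adjustment is not stated in this section, so the numbers are not meant to reproduce the reported percentages.

```python
def benzecri_adjusted(indicator_inertias, n_vars):
    """Keep inertias above the average 1/Q and apply Benzecri's adjustment."""
    threshold = 1.0 / n_vars                 # average inertia, 1/25 = 0.04 here
    scale = (n_vars / (n_vars - 1)) ** 2
    return [scale * (lam - threshold) ** 2
            for lam in indicator_inertias
            if lam > threshold]

# Hypothetical principal inertias, for illustration only.
hypothetical_inertias = [0.10, 0.07, 0.05, 0.045, 0.03]

adjusted = benzecri_adjusted(hypothetical_inertias, n_vars=25)
total = sum(adjusted)
shares = [x / total for x in adjusted]      # share of adjusted inertia per dimension

for k, share in enumerate(shares, start=1):
    print(f"dimension {k}: {100 * share:.1f}% of adjusted inertia")
```

The squared term makes the shares fall off quickly, which is why a small number of leading dimensions typically accounts for most of the adjusted inertia.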

Non-random multimorbidity between 19 conditions using the three-step unified clustering method (nodal size proportional to the number of conditions multimorbid with the condition; bolded lines link the ‘closest’ pairs).

MCA map for Dimension 2 vs Dimension 1 of health conditions based on Burt matrix with adjusted inertia: 0 = absence of condition, 1 = presence of condition.
The network and cluster analyses, using Gephi software (version 0.9.1) through a ForceAtlas2 layout algorithm with a resolution of 0.6 and edge weight influence of 1.0, identified five non-overlapping groups of 25 conditions (Table 2). Figure 4 presents the network of identified groups of conditions.

Network analysis of health conditions using ForceAtlas2 layout algorithm: same shaded/coloured nodes (indicated with the group number) depict a multimorbid health condition group obtained via modularity class analysis (nodal size proportional to the prevalence of the health condition).
Discussion
Our systematic review showed that the analytical methods used in different studies to identify groups of multimorbid conditions are heterogeneous, which may explain the variation in the multimorbidity patterns reported in these studies. As illustrated in the comparison analysis using the NHS data, even the same analytical method may yield different results when different criteria are used for the proximity measure or the distance between clusters. The specific analytical methodology adopted for grouping multimorbid conditions, along with the selection/coding of conditions and the target populations, are the key factors explaining the diversity of the multimorbid groups found across different studies.12 Future studies should provide a detailed description of the methodology used when reporting and summarizing patterns of multimorbidity. Comparison of multimorbidity patterns among studies is plausible only if the same analytical method is used for grouping multimorbid health conditions.
The five analytical methods use different approaches to form clusters and to adapt to the study of multimorbidity. The measure adopted to quantify co-occurrence between a pair of conditions is particularly crucial in explaining the variation in the identified multimorbid condition groups between these methods. Factor analysis works on a tetrachoric correlation matrix of morbidity data, but the capacity of this measure to adjust for co-occurrence by chance is unknown. An interpretation issue may also arise when negative factor loadings are found for certain latent factors (see Supplementary Tables 4 and 5, available as Supplementary data at IJE online) or when estimates of variances converge to a value close to zero (a Heywood case).106 Hierarchical-clustering methods commonly use Jaccard or Yule Q coefficients for multimorbidity studies. A previous study34 showed that the Jaccard coefficient is inappropriate for measuring multimorbidity because it does not adjust for co-occurrence by chance. The three-step clustering method quantifies the degree of co-occurrence of a pair of conditions using the asymmetric Somers’ D statistic with control for the FDR, which showed high power to detect non-random multimorbidity. The MCA works on a Burt matrix, which is the symmetrical matrix of all two-way cross-tabulations between each pair of conditions (analogous to the covariance matrix for continuous variables). Further investigation is needed into how well this measure adjusts for co-occurrence by chance. The method involving both network and cluster analyses relies on Gephi software and a ForceAtlas2 algorithm to visualize the network of conditions based on a force-directed continuous graph layout, and on a modularity analysis to define clusters of conditions. As with hierarchical-clustering methods, different cut-off values produce clustering results with different numbers of clusters.
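The point about chance co-occurrence can be checked with a small calculation. The Python sketch below takes two prevalences from Table 2, sets the joint prevalence to their product (that is, it assumes the two conditions are statistically independent, purely for illustration) and evaluates both coefficients. The function names are hypothetical.

```python
def cell_proportions(p_x, p_y, p_xy):
    """2x2 cell proportions from two prevalences and their joint prevalence."""
    a = p_xy                      # both conditions present
    b = p_x - p_xy                # only X present
    c = p_y - p_xy                # only Y present
    d = 1.0 - p_x - p_y + p_xy    # neither present
    return a, b, c, d

def jaccard(a, b, c, d):
    """Share of people with both conditions among those with at least one."""
    return a / (a + b + c)

def yule_q(a, b, c, d):
    """Yule's Q, a rescaled odds ratio ranging from -1 to 1."""
    return (a * d - b * c) / (a * d + b * c)

# Prevalences from Table 2; independence is imposed by construction.
p_hayfever, p_hypertension = 0.164, 0.110
common = cell_proportions(p_hayfever, p_hypertension, p_hayfever * p_hypertension)

p_oedema, p_dermatitis = 0.017, 0.011
rare = cell_proportions(p_oedema, p_dermatitis, p_oedema * p_dermatitis)

print("common pair:", jaccard(*common), yule_q(*common))
print("rare pair:  ", jaccard(*rare), yule_q(*rare))
```

Under independence Yule's Q is zero for both pairs, whereas the Jaccard coefficient is positive and larger for the more prevalent pair, so Jaccard-based distances partly reflect prevalence rather than association.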
Applying these five methods to the same NHS morbidity data, we found that the methodological discrepancies between them influenced the identification of multimorbidity patterns, leading to different groupings of multimorbid conditions. We extracted the main similarities among these identified groups. The first group consists of a combination of cardiovascular and metabolic diseases, including angina, hypertension, cholesterol, diabetes, oedema and gout. Some methods also relate osteoarthritis to this group, but Figure 2 indicates that osteoarthritis appears to form a stronger group of its own with osteoporosis and thyroid. The second group corresponds to mental health problems (anxiety, depression, mood disorder) alone. The hierarchical-clustering method with Yule Q coefficients links migraine with this group, whereas the three-step clustering method relates back pain to the mental health conditions (as back pain and depression form the closest pair) and the network and cluster analyses link disc disorder with the mental health disorder group. Previous studies107,108 showed associations between mental health disorders and pain. Another recent review found a high level of multimorbidity between mental disorders.109 The third group comprises allergic diseases, including hay fever, sinusitis and food allergy. There is also a suggestion that this group of allergic diseases may be linked to both bronchitis and asthma.
Previous multimorbidity studies often considered an index condition or an isolated combination of multimorbid conditions.17,110 When the prevalence of multimorbid condition combinations is not high [e.g. the most prevalent triple combination in the NHS data is (Hayfever, Sinusitis, Asthma); prevalence of 1%], the sample size for certain combinations of multimorbid conditions may not be sufficiently large to provide reliable and/or generalizable results. New approaches focus on identifying groups of individuals with different patterns of multimorbidity and thus allow impact analyses using the whole sample simultaneously. They include the use of latent class analysis,111,112 hierarchical-clustering methods,113,114 self-organizing maps,115 principal-component analysis116 and the mixture of multivariate generalized Bernoulli distributions.9
Recent reviews on multimorbidity studies12,26 recommended that future epidemiological studies should cover a broad selection of health conditions in order to avoid missing important nosological entities and enhance external validity. When many conditions are considered, clustering of individuals on the basis of morbidity data will face the problem of high dimensionality.117 This is particularly important when a clustering-based approach is adopted to assess the impact of multimorbidity on individuals’ health outcomes and service use. In handling high-dimensional morbidity data, grouping of multimorbid conditions provides a dimension-reduction method for the subsequent clustering of individuals.9 The three-step clustering method34 establishes a clustering-based multimorbidity score for each individual on the basis of a weighted scale in terms of the estimated probabilities of membership of the multimorbid condition groups.9 Grouping of multimorbid conditions thus offers an alternative approach to measuring individual multimorbidity status for impact analyses of health outcomes. Another review118 focuses on multimorbidity in primary care and highlights the need to explore longitudinal data for evaluating which multimorbid conditions are more persistent over time119 and their role in influencing health outcomes and service use at different periods of time.21
The main strength of our review is the use of broad terms for building the search algorithm, which produced a list of 13 194 studies in the first round of the review, including relevant papers that had not been identified in previous reviews on multimorbidity studies. Our review has the following limitations. We included English-only articles. We may have omitted articles that considered multimorbidity with an index condition, and thus the interpretation of the identified multimorbidity patterns should be confined to the nature of co-occurrence of conditions observed in the general population. Our review focused on the analytical methods for identifying groups of multimorbid conditions and, as such, the assessment of the content of the identified patterns of multimorbidity falls outside the scope of the present review. We used a single NHS dataset in the comparison analysis to assess the effects of heterogeneous analytical methods on the identified multimorbidity patterns.
Conclusion
Our review and comparison analysis suggest that the methodology applied to reveal the group structure of multimorbid conditions has a great influence on the multimorbidity patterns found across different studies, posing a serious challenge to interpreting or comparing these patterns. We identified and compared five analytical methods for identifying groups of multimorbid conditions. However, further comparison of these methods in a wide variety of settings and populations would help to guide the choice of analytical method for improved validity and generalizability of findings. Investigators should also attempt to compare results obtained by various methods for a consensus grouping of multimorbid conditions. Studies with a population-based longitudinal-study setting offer a cost-effective approach to trace patterns of multimorbidity at different times and identify which patterns have the greatest impact on health outcomes and health-service use over time. The findings will inform future research on intervention studies or trials for long-term achievements of improved care and wellbeing for people with multimorbid conditions.
Funding
The work was supported by a grant from the Menzies Health Institute Queensland, Griffith University, Australia.
Acknowledgements
The authors are grateful to the Australian Bureau of Statistics for providing the Australian National Health Survey data in Confidentialised Unit Record Files (CURFs).
Conflict of interest: The authors have no conflicts of interest to declare.