Factor analysis has emerged as a useful method for understanding patterns underlying the co-occurrence of metabolic risk factors for both type 2 diabetes and atherosclerosis—often referred to as “insulin resistance syndrome.” In factor analysis of data on 322 healthy elderly people from the Cardiovascular Health Study, Sakkinen et al. (Am J Epidemiol 2000;152:897–907) confirmed findings from a dozen prior studies that as many as four distinct physiologic domains comprise the syndrome, with a unifying role for markers of insulin resistance. With the addition of markers of hemostasis and inflammation, they also found that impaired fibrinolysis and endothelial dysfunction are central features of the syndrome, while inflammation is only weakly linked to insulin resistance through associations with obesity. Am J Epidemiol 2000;152:908–11.
“Insulin resistance syndrome” is the co-occurrence of multiple metabolic and physiologic risk factors for both type 2 diabetes mellitus and atherosclerotic cardiovascular disease. This concept—that there exists a unique metabolic syndrome arising from tissue resistance to insulin—was put forth about a decade ago on the basis of both controlled clinical (1) and epidemiologic (2) observations. In the intervening years, a number of studies carried out in diverse populations have shown that multiple risk factors (including overall and central obesity, impaired glucose tolerance, the dyslipidemic combination of low levels of high density lipoprotein cholesterol and high levels of triglyceride, and hypertension) co-occur to a greater degree than would be expected by chance alone; furthermore, subjects with multiple concurrent risk factors tend to have high insulin levels, reflecting greater insulin resistance (3–8). Yet there persists little consensus about whether insulin resistance really is the unifying underlying abnormality, or about which metabolic variables constitute essential features, such that a case of the syndrome can be defined; even the name of the syndrome is the subject of uncertainty. While these basic issues remain unresolved, numerous claims are being put forth in the literature for various “extensions” of the syndrome or additional, novel features, including abnormalities in the fibrinolytic (9) and inflammatory (10) systems. In this light, the study by Sakkinen et al. (11) in this issue of the Journal expands our understanding of relations between basic features of the syndrome and ways in which novel abnormalities in fibrinolysis and inflammation are linked to these basic features.
Sakkinen et al. used factor analysis to assess metabolic risk factor data on 322 healthy elderly subjects. These subjects were a subset of participants in the Cardiovascular Health Study, for whom data on a variety of hemostatic and inflammatory markers had been obtained. Why use factor analysis? Factor analysis is a multivariate correlation method that is well suited for revealing underlying patterns or structure among variables showing high degrees of intercorrelation, as in the case of risk variables putatively comprising the insulin resistance syndrome. Factor analysis reduces a large set of measured risk variables into a smaller set of “latent” underlying factors. When derived using the method employed by Sakkinen et al. (principal components with orthogonal rotation), these factors represent unique, statistically independent domains. In the case of metabolic risk variables, these domains can be interpreted to represent distinct underlying physiologic phenotypes that are highly correlated with the variables that define the domains and uncorrelated with those that do not. Thus, a confusingly large number of apparently related risk variables can be reduced to a simpler set of distinct underlying physiologic phenotypes. If an analysis reveals only one underlying factor, this may be interpreted as supporting the unity hypothesis: that one unifying physiology (perhaps insulin resistance) underlies, and accounts for, metabolic risk variable clustering. Finding more than one factor contradicts the unity hypothesis. However, if a measured risk variable is associated with more than one factor, this overlap reveals unifying commonalities between physiologic domains. The pattern of overlap provides insight into the underlying structure of the syndrome.
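To make the mechanics of principal components with orthogonal rotation concrete, the following is a minimal sketch in Python using NumPy. It is not the authors' code, and it assumes a varimax rotation as one common orthogonal rotation; the commentary does not state which rotation was used. The data are synthetic, and the variable names (`obesity_1`, `bp_1`, and so on) are hypothetical labels chosen only for illustration.

```python
import numpy as np

def pc_loadings(X, n_factors):
    """Unrotated principal-component loadings from the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors so each entry is a variable-component correlation.
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors]), eigvals

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (assumed here; other rotations exist)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ col_ss / p)
        )
        rotation = u @ vt                     # product of orthogonal matrices
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation

# Synthetic data: two independent latent domains, three measured variables each.
rng = np.random.default_rng(0)
n_subjects = 500
latent = rng.standard_normal((n_subjects, 2))
weights = np.array([[0.9, 0.9, 0.9, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9, 0.9, 0.9]])
X = latent @ weights + 0.5 * rng.standard_normal((n_subjects, 6))
names = ["obesity_1", "obesity_2", "obesity_3", "bp_1", "bp_2", "bp_3"]

unrotated, eigvals = pc_loadings(X, n_factors=2)
rotated = varimax(unrotated)

for name, row in zip(names, rotated):
    print(f"{name:10s}", np.round(row, 2))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged; rotation only redistributes the explained variance so that each variable loads mainly on one factor. The signs of the loadings are arbitrary, so interpretation should rest on their magnitudes.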
Factor analysis has been used for decades to understand relations among psychometric variables, but only since the 1994 study by Edwards et al. (12) has this method been used to understand relations among metabolic risk variables. Since this seminal study, almost a dozen factor analyses of insulin resistance syndrome metabolic variables have been published (13–23). These studies have examined diverse populations, including men and women, children and the elderly, many racial/ethnic groups, and subjects with and without increased genetic risk for type 2 diabetes. Variations across studies in variables analyzed and in methods used for interpretation preclude formal direct comparisons, but an overview of the studies reveals several common patterns. First, all of the studies have found at least two factors, and usually three or four factors, underlying the overall correlation between risk variables. All of them have found that insulin variables (as a proxy for insulin resistance) correlate with or “load on” the same factors as measures of glycemia and obesity, and in many cases dyslipidemia. Most have found that measures of blood pressure load on a unique, separate factor. Most importantly, essentially all factor analyses have found that measures of insulin share loadings among more than one factor. Notable among these studies is that by Donahue et al. (24), wherein insulin resistance was measured directly with an insulin clamp rather than inferred by the presence of fasting hyperinsulinemia. In this analysis, nine variables were reduced to two factors joined together, unified, by the rate of insulin-mediated glucose disposal. The finding of more than one factor is interpreted as evidence that more than one physiologic process underlies full expression of insulin resistance syndrome, but factor analyses have made it clear that insulin resistance/hyperinsulinemia is a common unifying theme, the weft that weaves together much of the fabric of the syndrome.
The present study confirms and extends the results of others. As a first step, Sakkinen et al. reduced 10 basic metabolic features to four latent factors explaining 70 percent of the original variance (11). Concerned that the limited number of subjects in the subset with hemostatic/inflammatory markers might have given rise to spurious results, they repeated the basic factor analysis in the larger background Cardiovascular Health Study cohort. Analysis of a larger number of subjects produced a greater degree of overlap between the body mass and insulin/glucose factors, with variance shared between these factors by fasting levels of glucose and insulin. Relaxing the loading threshold used for interpretation from 0.4 to 0.3 further strengthened this overlap. There was no overlap at all between these two factors and either the blood pressure factor or the lipid factor. Lack of overlap with the blood pressure factor agrees with most prior results, blood pressure consistently being the most tenuously linked constituent of the insulin resistance syndrome. Lack of overlap with the lipid factor in this elderly cohort may be partially explained by observations that lipid levels become progressively less correlated with other cardiovascular disease risk factors as people age (25).
However, lack of overlap between half of the factors in the analysis raises the question of whether a four-factor solution is an overfactored solution in these data, with a three-factor model being the more parsimonious result. Both overfactoring and underfactoring are to be avoided, since either will give rise to spurious or incomplete impressions of underlying domains. The selection of the correct number of factors to consider should be based at least as much on the apparent attainment of “simple structure” as on application of strict quantitative thresholds. The authors used an eigenvalue threshold of unity (1.0) to retain factors in the model; this is a commonly used threshold recommended in Stevens' widely cited textbook on factor analysis (26). However, Cureton and D'Agostino's equally authoritative textbook (27) points out that the critical eigenvalue for retaining factors will vary considerably depending on the number of measured variables considered in the model. They recommend employing several different approaches to selecting a maximum number of salient factors, including (in addition to eigenvalue thresholds) the Bartlett test (which is very sensitive to the population sample size), the scree plot (which involves a fair amount of subjectivity in determining the inflection point above the “scree”), and requiring that the factor matrix retain relatively high loadings on some proportion of factors prior to rotation. If several of these approaches settle on a mutually agreeable number of factors to retain, that is probably the correct number. In many cases, that number will be equal to or less than the number of original variables divided by 3 (that is, about 3 in the present case of 10 basic variables). Also a matter of some subjectivity is selection of the threshold above which to consider a loading “valid” for interpretation.
Loadings are continuously distributed correlations, with higher loadings indicating stronger associations between measured variables and associated factors; the threshold for interpretation is an arbitrary, artifactual choice. Sakkinen et al. used the common threshold of 0.4, while Cureton and D'Agostino (27) suggest loading thresholds as low as 0.2 for factor interpretation, pointing out that in the original conception of factor analysis, ideal solutions were considered to be those that 1) found the fewest number of factors, 2) accounted for the majority of the original variance in the data, and 3) retained some overlap between factors, all while giving a clear pattern of both independence and relatedness of factors: “simple structure.” The lack of uniform, objective rules requires of the factor analyst thoroughness, flexibility, and a willingness to explore and present a variety of solutions, as well as skepticism of narrowly interpreted results.
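The retention and interpretation rules discussed above can be sketched in a few lines of Python. This is an illustration only: the eigenvalues and loadings below are made-up numbers, not results from Sakkinen et al., and the function names (`kaiser_count`, `variables_over_three`, `cross_loading`) are hypothetical helpers introduced for this example.

```python
import numpy as np

def kaiser_count(eigvals, threshold=1.0):
    """Number of factors whose eigenvalue exceeds the threshold (unity by default)."""
    return int(np.sum(np.asarray(eigvals) > threshold))

def variables_over_three(n_vars):
    """Rule of thumb: at most (number of variables / 3) factors."""
    return n_vars // 3

def cross_loading(loadings, names, threshold):
    """Variables with salient loadings (|loading| >= threshold) on more than one factor."""
    salient = np.abs(loadings) >= threshold
    return [name for name, row in zip(names, salient) if row.sum() > 1]

# Hypothetical eigenvalues for a 10-variable correlation matrix (they sum to 10).
eigvals = [3.1, 1.9, 1.2, 1.05, 0.8, 0.6, 0.5, 0.4, 0.3, 0.15]

n_kaiser = kaiser_count(eigvals)
n_thirds = variables_over_three(len(eigvals))
explained = sum(eigvals[:n_kaiser]) / sum(eigvals)
print("eigenvalue > 1 rule retains:", n_kaiser)
print("variables / 3 rule suggests at most:", n_thirds)
print("share of variance explained by retained factors:", round(explained, 3))

# Hypothetical rotated loadings on two factors (rows are variables).
names = ["bmi", "insulin", "glucose", "sbp"]
loadings = np.array([[0.80, 0.10],
                     [0.62, 0.35],
                     [0.33, 0.75],
                     [0.05, 0.12]])

for threshold in (0.4, 0.3, 0.2):
    print(threshold, cross_loading(loadings, names, threshold))
```

With these made-up numbers, the eigenvalue rule retains four factors while the variables-divided-by-3 rule suggests three, and the overlap between factors appears only once the loading threshold is relaxed from 0.4 to 0.3. Disagreement of this kind is the reason for exploring and presenting more than one solution.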
After exploring the factor analytical relations among the basic insulin resistance syndrome metabolic variables, Sakkinen et al. added to the analysis 11 markers of hemostasis and inflammation. Here, a 21-variable model reduced to a seven-factor solution explaining 70 percent of the original variance in the data. Levels of plasminogen activator inhibitor-1 (PAI-1) had high loadings on the body mass and insulin/glucose factors. Elevated levels of PAI-1 reflect impaired fibrinolysis and a propensity towards acute arterial thrombosis, as well as impaired endothelial function. The present results suggest that these physiologic abnormalities are core features of insulin resistance syndrome. Links between hyperinsulinemia and abnormal regulation of PAI-1 have also been shown in other population studies (28). As in the first four-factor model considered, lowering the loading interpretation threshold to 0.3 in this seven-factor model served to further tighten the weave of intercorrelations between body mass, insulin/glucose, and PAI-1.
The seven-factor model showed no obvious overlaps with the so-called “inflammation,” “procoagulation,” and “vitamin K-dependent protein” factors (11), which suggests that these physiologic domains are essentially unrelated to insulin resistance and its attendant abnormalities. The failure of inflammatory markers to share substantial loadings with at least the body mass factor is surprising, given the emerging body of data linking elevated levels of C-reactive protein, for example, with increased body mass index (29, 30). Whether the relation between obesity and C-reactive protein holds as strongly in the elderly is uncertain; as for lipid levels, weakened associations with aging could obscure patterns existing in younger populations. Lack of association between fibrinogen level and any phenotype of the syndrome is not surprising, given that fibrinogen is consistently independent of insulin and other cardiovascular disease risk factors in many studies (28, 31). However, marginal loadings on the body mass factor by levels of plasmin-α2-antiplasmin (loading = −0.34) and C-reactive protein (0.28) do suggest at least weak links in these data between the central features of insulin resistance syndrome and a domain reflecting subclinical inflammation. Furthermore, in this model the authors did consider whether they had an overfactored solution. They explored relations in a six-factor solution, without apparently substantial differences in results (unfortunately, these results were not presented in detail).
Critics of factor analysis are quick to point out its subjective, qualitative nature. The potential for subjectivity is underscored by the discussion above regarding proper selection of the number of factors to consider, or the proper threshold for interpreting loadings. The most serious criticism of factor analysis is that it is difficult to disprove the findings. Most approaches do not include formal statistical hypothesis testing, and the number and nature of the variables chosen for inclusion in the modeling can drive the results. In the setting of large numbers of risk variables from relatively few study subjects, especially those who are somewhat selected “healthy survivors,” there is a very real possibility that quirks in the variance distribution for some variables or other stochastic “noise” in the data will produce spurious, undetectable errors. In this study, the authors did attempt to validate their results by repeating the analysis in the larger background cohort. Several other methods for more formal testing of factor analysis results have been summarized (23).
External validation is also a reasonable way to view the credibility of the factor analytical approach to understanding interrelated metabolic risk factors. The work by Sakkinen et al. confirms findings from a dozen studies that similar patterns occur, in this case in a very elderly cohort. The basic insulin resistance syndrome comprises three or four distinct underlying phenotypes: obesity-hyperinsulinemia, dyslipidemia, impaired glucose tolerance, and hypertension, whose joint occurrence defines a case of the syndrome. Levels of insulin, marking the presence of insulin resistance, are a shared feature joining together most of these phenotypes. Blood pressure remains only loosely associated with the central features of the syndrome. In the Cardiovascular Health Study cohort, impaired fibrinolysis and perhaps endothelial dysfunction are also part of the syndrome, constituting a core feature of the obesity-hyperinsulinemia phenotype. Subclinical inflammation is perhaps also linked through associations with body mass, although this finding needs to be reexplored in other, younger populations. Factor analysis, once the more-or-less exclusive tool of psychometricians, provides unique and by now reasonably consistent insight into phenotypic patterns hidden within the fabric of risk factors comprising the appropriately named insulin resistance syndrome.
This work was supported by a clinical research grant from the American Diabetes Association (Alexandria, Virginia) and an unrestricted research grant from SmithKline Beecham (Philadelphia, Pennsylvania).