Abstract

Is the search for the causes of complex disease akin to the alchemist's vain quest for the Philosopher's Stone? Complex chronic diseases have tremendous public health impact in the industrialized world. Much effort has been expended on research into their causes, with the aim of predicting who will be affected or preventing effects before they arise, but progress has been halting at best. In this paper, we discuss possible reasons, including the use of models and methods that fit point-source and Mendelian diseases but may be less appropriate for complex diseases; reliance on causal criteria that may be less relevant than they are for communicable diseases; and the biology of complex disease itself. Finally, we ask whether most complex diseases are even good candidates for the kind of prediction and prevention that we have come to expect based on experience with infectious and Mendelian disease.

Introduction

Chronic complex diseases have increasingly become the focus of epidemiology and human genetics as conditions such as cardiovascular disease, hypertension, asthma, diabetes, Alzheimer's disease, cancers, psychiatric diseases, and the like have increased in prevalence and public health importance. Yet, in spite of massive investments in research time and funding, the causes of these diseases are still largely unknown or, at best, only vaguely known. This state of affairs stands in stark contrast to prior decades of success in which both epidemiology and human genetics uncovered major causal risk factors such as genes for Mendelian disease, infectious disease vectors, or environmental toxins. Those discoveries led to numerous preventive measures in the form of workplace regulation, public sanitation, vaccination, and prenatal genetic testing, as well as to effective treatments, and they lent the fields of epidemiology and human genetics a well-earned authority.

Complex diseases have proven much less tractable (Table 1). Often the results of genetic association studies cannot be replicated or subsequent studies implicate different genetic loci; environmental epidemiological findings are contradicted by new studies; publication bias in favour of strong positive results makes it difficult to interpret the significance of later studies; risk estimates are highly variable or inconsistent, or upon meta-analysis converge on little or no effect; and so forth.

Table 1

Recent unconfirmed or contradicted risk factor/disease associations

Table of irreproducible results?
 
Hormone replacement therapy and heart disease 
Hormone replacement therapy and cancer 
Stress and stomach ulcers 
Annual physical checkups and disease prevention 
Behavioural disorders and their cause 
Diagnostic mammography and cancer prevention 
Breast self-exam and cancer prevention 
Echinacea and colds 
Vitamin C and colds 
Baby aspirin and heart disease prevention 
Dietary salt and hypertension 
Dietary fat and heart disease 
Dietary calcium and bone strength 
Obesity and disease 
Dietary fibre and colon cancer 
The food pyramid and nutrient RDAs 
Cholesterol and heart disease 
Homocysteine and heart disease 
Inflammation and heart disease 
Olive oil and breast cancer 
Fidgeting and obesity 
Sun and cancer 
Mercury and autism 
Obstetric practice and schizophrenia 
Mothering patterns and schizophrenia 
Anything else and schizophrenia 
Red wine (but not white, and not grape juice) and heart disease 
Syphilis and genes 
Mothering patterns and autism 
Breast feeding and asthma 
Bottle feeding and asthma 
Anything and asthma 
Power transformers and leukaemia 
Nuclear power plants and leukaemia 
Cell phones and brain tumours 
Vitamin antioxidants and cancer, aging 
HMOs and reduced health care cost 
HMOs and healthier Americans 
Genes and you name it! 

These issues have not gone unremarked by the research community.1–5 The typical response has been to assert the need for intensification of current approaches: bigger, longer-term observational studies, better technology, and more elaborate statistical methods. Another response has been to argue that pure statistical empiricism—black-box epidemiology—is not working and that we need to turn to a more determined consideration of the underlying biological mechanism and causation of disease.6–8 To many epidemiologists this has meant a turn to genetics.9–16 Unfortunately, the same difficulties that plague environmental epidemiology apply to the genetics of complex disease as well, and for similar reasons.

In this paper, we ask why it is that after a massive scientific assault, we are still unable to account specifically for more than a small fraction of common complex diseases. We discuss methodological issues that are particularly problematic owing to the biology of complex disease, and we address epistemological assumptions about causation. The problem is a compound one, but these considerations are at its core.

Complex traits and current methodology

Why have the traditional methods of epidemiology and genetics not been sufficient to determine the causes of complex diseases, and why is individual risk essentially impossible to predict? The reasons largely have to do with the fact that, because these diseases generally arise from both endogenous genetic and exogenous environmental components, they represent a many-to-many causal universe, a likely fact that is routinely stated but almost as routinely given little more than lip service. Our methods are largely ill-equipped to tease these kinds of multiple cause and effect relationships apart. The multifactorial biology of these diseases means that most causal factors have a relatively weak effect, whereas our methods and study designs were developed to detect risk factors with strong effects. Here we briefly mention some aspects of complex diseases that may confound epidemiological or genetic studies; all are well known, but too often not considered when studies are designed.

Phenotypic heterogeneity

Generally speaking, complex diseases are those for which there is no single cause with uniformly high predictive power. Individual cases may be multifactorial, or each may be unicausal but due to one of multiple single paths to the same disease; it is likely that both situations occur for most diseases, and it is not easy to sort this out. Similarly, a disease given a single name may encompass a range of related conditions, so that assembling cases with the ‘same’ disease can be a challenge. Even when researchers attempt to define the disease precisely—same cell type, same behavioural manifestations, same symptoms, for example—they may in fact be studying a causally heterogeneous phenotype. This might seem readily avoidable, but are the very different forms of autism, for example, even within families, the same condition?

Genotypic heterogeneity and phenotypic ambiguity

Even with ‘simple’ Mendelian traits there are many, sometimes hundreds, of alleles that cause the ‘same’ disease—a phenomenon that has been called ‘phenogenetic equivalence’.17,18 The designation ‘simple’ itself is probably quite wrong as a rule.19 The finding of multiple genetic pathways to a single outcome is not unexpected given that traits evolve by selection on the phenotypes, not genotypes17,18—but it can confound genetic studies of complex disease.

Even a trait due to a single locus can be obscured by population heterogeneity. The responsible alleles (genetic variants) may have different frequencies (including zero) in different populations, and/or a plethora of relevant alleles in the causal gene may differ among populations.17 If there are m different alleles at a gene, there are m(m + 1)/2 possible genotypes, which even with modest m presents sorting, sample size, sample comparability, and multiple-testing challenges when it comes to finding, and evaluating the effects of, individual variants (a problem that would be multiplicatively compounded if serious attention were simultaneously given to both genetic and environmental factors). It is only partial comfort that most of the genotypes that actually exist are rare. Rare genotypes are captured by the medical ascertainment system that screens hundreds of millions of people and, in aggregate, they comprise a sizeable fraction of the alleles found in cases. It may be easy to evaluate causation for the common genotypes, but for specific rare genotypes causation can often be attributed only by assuming what we want to show, namely that the gene is the cause of the disease. The upshot is that many alleles routinely considered causal for diseases like phenylketonuria, cystic fibrosis, and breast cancer are viewed as such only by assumption.
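
To make the combinatorial point concrete, the short sketch below (a minimal illustration with arbitrary allele counts, not data from any study) tabulates how the m(m + 1)/2 genotype count grows with the number of segregating alleles m.

```python
# Minimal illustration of how allelic heterogeneity multiplies genotype classes.
# The allele counts below are arbitrary examples, not taken from any study.

def genotype_count(m: int) -> int:
    """Number of distinct genotypes at one locus with m segregating alleles."""
    return m * (m + 1) // 2

for m in (2, 10, 50, 200):
    print(f"{m:>4} alleles -> {genotype_count(m):>6} possible genotypes")
# 2 -> 3, 10 -> 55, 50 -> 1275, 200 -> 20100
```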

It is difficult to know how to choose cases to maximize genetic homogeneity, even among cases with similar phenotypes. For many mapping (gene-finding) methods, such as association studies,6,20–23 genetic heterogeneity can vitiate what might otherwise be a good statistical design.

Genes, but not disease, are inherited: when genetic diseases are not Mendelian

‘Simple’ Mendelian diseases follow discernible patterns: they appear in families in predictable proportions, the disease is not transmissible to unrelated individuals, the age of onset may be reasonably predictable, and so forth. Thus, it is relatively straightforward, and indeed good research practice, to demonstrate a likely genetic cause before a costly and laborious search for genes is undertaken.24–26 Initial evidence may only be of a general nature: there is often excess risk to relatives of an affected person compared with a random member of the population, twin studies may indicate higher risk among monozygous compared with dizygous twins or other siblings, or adoption studies, with the effect of the environment controlled, may indicate genetic risk.

It is much harder to demonstrate, a priori, whether and in what way genes are involved in the aetiology of complex traits and diseases. If penetrance, the probability of the trait given the genotype, is low, and the trait appears in unpredictable proportions, it may seem sporadic or spontaneous, or weakly familial owing to shared environment, and predicting which model the disease will follow can be a challenge. For example, if an allele is necessary but not sufficient to produce disease, the allele will follow Mendelian rules of inheritance but the disease will not, particularly if additional risk factors are uncorrelated with the genetic variation being transmitted. Thus, the influence of viral risk factors (as with type 1 diabetes, multiple sclerosis, or other autoimmune diseases), diet (some cancers, perhaps including breast, and heart disease), unidentified environmental triggers (asthma), or additional genes with which the gene under study interacts (Hirschsprung's disease or Bardet-Biedl syndrome) can obscure genetic effects.

A well-known confounder is that there are many non-genetic ways for disease to cluster in families. As far back as his classic 1908 paper on the genotype frequency equilibrium that bears his name,27 Wilhelm Weinberg cautioned that diseases with an environmental aetiology may appear to be familial because family members tend to be exposed to the same environment. Syphilis, for example, was once thought to be a heritable disease because it can be passed from mother to newborn. In this way, some cases of the diseases we now consider ‘complex’ may eventually prove to be ‘simple’, perhaps infectious or monogenic disorders—even if the reverse is more often the case.17,19

The subtleties here are many. A disease owing to the contribution of variants from multiple genes can appear to segregate, or to simulate Mendelism.28 Contrariwise, many genetic phenomena may lead to ‘non-Mendelian’ familial aggregation, including somatic mutation,29 gene–environment interaction, gene–gene interaction, polygenes, genetic modifiers, a gene or genes with low penetrance, locus heterogeneity (i.e. true monogenic disorders appear complex because the loci vary by affected family), or the appearance of ‘phenocopies’ (phenotypes similar to the genetic form of a disease but due to environmental risk factors rather than genes). This increases the difficulty of specifying an appropriate model of gene–environment interaction, of measuring the effect of environmental risk factors, or of detecting weak signal or finding linkage (between genetic marker and trait) when the excess risk in relatives is not great. At the same time, if conditions are appropriate, small relative risks among family members do not preclude detectable genetic effects.24,25

Because these problems make it difficult to establish Mendelian patterns of occurrence, the expectation that a costly genetic study will proceed only with good reason to believe the disease is ‘genetic’ has largely been waived. Association mapping (case–control) studies have essentially no such prior requirement. As a result, the lack of clear results is no surprise.

Many-to-many relationship of cause to disease

A widely stated genetic objective is to be able to use exposure to risk factors from the moment of conception, through the cloud of other entangling environmental exposures, to predict late-onset disease. As shown in Figure 1, causality may in some cases be a kind of hourglass phenomenon, the neck of which is a proximate risk factor (e.g. obesity), that may be arrived at via many alternative genetic and/or environmental paths and that then may lead to diverse associated outcomes (e.g. type 2 diabetes, hypertension, heart disease), the many-to-many causal fabric. Metaphorically, the individual grains of causal sand may pass through the neck with momentum, direction, and stochasticity that make prediction epistemologically problematic even with an estimate of the proximate risk factor for an individual. In other cases, there may be no proximate factor, or it may be unknown, and the neck may be quite narrow or intractably broad.

Figure 1

Hourglass metaphor for the development of complex diseases, with gene–gene and gene–environment interaction


Sometimes, an insightful combination of genetic and environmental concepts can focus study on whether a specific risk factor actually affects a trait. Mendelian randomization is an example in which knowledge of a specific gene and the substrate on which it acts can be combined with environmental exposure data to show convincingly that an environmental factor is or is not involved.9,10 But this only applies when we have an unusually detailed understanding of a disease, will usually account for only a fraction of cases, and hence will not turn complex traits into simple ones.
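
The core logic of Mendelian randomization can be reduced to a simple instrumental-variable calculation. The toy simulation below uses invented parameters and plain Wald-ratio arithmetic, not the methods of the cited studies; it shows how a genotype that influences an exposure can recover a causal exposure effect that an ordinary regression gets wrong because of unmeasured confounding.

```python
# Toy Mendelian randomization (Wald ratio) on simulated data; illustrative only.
# Assumed model: genotype G affects exposure X, X affects outcome Y, and an
# unmeasured confounder U distorts the naive X-Y association but not G.
import random

random.seed(1)
n = 100_000
true_effect = 0.5                     # causal effect of exposure on outcome

G = [random.choice((0, 1, 2)) for _ in range(n)]                      # genotype (instrument)
U = [random.gauss(0, 1) for _ in range(n)]                            # unmeasured confounder
X = [0.3 * g + u + random.gauss(0, 1) for g, u in zip(G, U)]          # exposure
Y = [true_effect * x + u + random.gauss(0, 1) for x, u in zip(X, U)]  # outcome

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

naive = cov(X, Y) / cov(X, X)   # confounded exposure-outcome regression slope
wald = cov(G, Y) / cov(G, X)    # instrumented (Mendelian randomization) estimate
print(f"naive estimate ~ {naive:.2f}, Wald ratio ~ {wald:.2f}, truth = {true_effect}")
```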

Dynamic nature of environmental risk

Risk factor exposures are dynamic. Exposure effects are perforce estimated retrospectively, even in cohort studies, because effects cannot be estimated until they have had their impact. But these estimates may be of little use because exposures are changeable and clearly unpredictable in number, identity, combination, and intensity. To estimate prospective risk we need to know what the exposure of today's population, subject, or patient will be tomorrow.

Human beings are not infinitely resilient, but the problem we face in epidemiology is to work out the multiplicity of weak effects. We have evolved to be buffered against insult, with compensatory physiological mechanisms, and these vary in detail among individuals and populations. Behaviourally, too, we respond to information (or folklore or hope-lore) about various risk factors by changing exposures: beef consumption rises and falls, and differentially according to social variables, based on what the media and health professionals say about it, whether correct or not.

Ecological fallacy and predicting risk

A well-known potential source of error in epidemiology is known as the ‘ecological fallacy’, which can result when making deductions about individuals from group data. This can be obvious: if a voting district always elects the Republican candidate, it would clearly be incorrect to assume that everyone in the district is Republican. This would be akin to assuming that total cholesterol >200 mg/dL always leads to myocardial infarction, or that all smokers eventually develop lung cancer, assumptions that would clearly be unjustified.

However, in significant ways these kinds of inferences become de facto working assumptions: when an individual's risk cannot be predicted with certainty from group data, in practice any risk at all, estimated from the population average, is treated as high and the exposure as something to be avoided. This arises in clinical practice when physicians are asked to predict a patient's risk of a given disease. Estimates are usually based on the number of risk factors a patient has, as identified from population data, or on the fraction of a study cohort with the same risk factors who went on to develop disease (Figure 2); the patient might be told that he or she has, for example, a 60% risk of heart attack. A physician can counsel the patient to lower his or her risk by eliminating or reducing risk factors associated with disease at the population level but, since we can never know what an individual's true risk is at any given time, we can never know how well the intervention works or even whether it worked at all. A related problem is treating things like race and socioeconomic status as categorical variables when they are clearly internally heterogeneous.

Figure 2

Heart attack risk calculator based on Framingham Heart Study data30


A population effect is a public health success even if we do not know who actually benefited from an intervention. It is the equating of population-level risk with individual risk that is ambiguous. Even if taking statins to lower serum cholesterol leads to a significantly lowered group incidence of myocardial infarction, the intervention is difficult to interpret at the individual level.

However, there can be fallacy in the ecological fallacy itself. Social epidemiologists31,32 caution that there can be group-level effects on health that cannot be measured at the individual level, but that are as important a part of the disease ‘pathway’ as individual environmental exposures or genes. In these situations the social context is as much a determinant of disease as more proximate causes.

Chance

In discussions of the problem of contradictory results, statistical sampling or measurement error is often invoked to defend a hypothesis. But because of the vagaries of biology, the timing of exposures, and so forth, chance plays a much larger role in disease, beyond simple measurement ‘noise’, than is generally acknowledged, and it may not be quantifiable in any sense that we currently understand. If ‘signal’ is elusive because we are not measuring relevant variables, chance has an entirely different meaning, and effects cannot be better understood simply by increasing sample sizes. The negative impact of this can be inaccurate replication and prediction relative to expectations based on conventional computations of standard errors. If instead samples can truly be replicated—often more hope than reality—then the nature of chance might be testable; if, for example, variance decreases with increased sample size, then approaches like meta-analysis, the combining of multiple smaller studies to attempt a stable risk estimate, may be useful in assessing causal heterogeneity.33

Publication or analytic bias

False positive associations are often attributed to publication bias favouring positive results, rather than to causal intractability. Evidence for effect-inflating bias may be found in the typically weaker associations observed in confirmatory studies. The statistical significance even of the latter is difficult to ascertain when weak, especially negative, results are not pursued with the same intensity as positive ones.34 Meta-analysis is often done only when an initial study yielded sufficiently enticing positive results to induce subsequent investigators to examine the same risk factors.35–39 Although there are always exceptions, a common result is to find a gradual approach, with increasing numbers of samples, towards statistically significant but small relative risks, or ones that asymptotically approach 1.0. Even when this is a reliable result, it usually leaves most cases of a disease unaccounted for.

A major cause of false positive results in epidemiology, genetic or environmental, is multiple testing. Studies routinely include very many risk factors, such as environmental variables or polymorphic markers at hundreds or even thousands of chromosomal locations. Correcting rigorously to reduce false positives, so that no more than a nominal fraction (e.g. 5%) of studies generate false positive results, may require such stringent reduction of per-test significance cut-offs that too many false negatives arise or the required samples are so large as to preclude practicability.6,40,41 A plethora of false positive, irreproducible results has been one predictable consequence of the adoption of a suggested loosening of criteria to include ‘suggestive’ significance42 in genetic mapping studies. There is understandable resistance to rigorous multiple-testing criteria that would be research-inhibiting by depriving most studies of results worthy of follow-up. Since significance cut-offs are always subjectively chosen, a routine trade-off is to weaken criteria to reduce required sample sizes, accepting false positives to avoid too many false negatives. But the criteria are vague and we are paying the price for that. Ironically, this becomes the flip side of the usual publication bias, because weakened criteria lead studies almost automatically to include at least some ‘positive’ result. It is no surprise that subsequent individual or meta-analyses converge on risk estimates that are small or zero.
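
The arithmetic behind this trade-off is simple enough to show directly. The sketch below uses arbitrary marker counts and a plain Bonferroni-style correction (it is not a reanalysis of any cited study); it shows how quickly the per-test cut-off shrinks as the number of tests grows, and how the chance of at least one false positive balloons if no correction is applied.

```python
# Back-of-the-envelope multiple-testing arithmetic; the test counts below are
# arbitrary examples, not taken from any particular genome scan or survey.

alpha_family = 0.05   # desired family-wise false positive rate

for n_tests in (1, 100, 10_000, 500_000):
    per_test = alpha_family / n_tests                 # Bonferroni per-test cut-off
    # Probability of at least one false positive if every test is run at 0.05:
    uncorrected = 1 - (1 - 0.05) ** n_tests
    print(f"{n_tests:>7} tests: per-test cut-off {per_test:.2e}, "
          f"P(>=1 false positive at 0.05) = {uncorrected:.3f}")
```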

Causation and criteria for scientific inference

Much of the ambiguity about complex disease causation has to do with how causality itself is actually determined, that is, with the epistemology of observational science. The effect of unexamined assumptions about study design or the interpretation of results can be insidious and profound.

Foundational criteria for causal inference

The question of cause and effect has been of interest to philosophers since the time of the ancient Greeks. Their idea was that inherent truths could be reasoned out by a brain created to have an intuitive understanding of Nature. This worldview was replaced by the 17th century Enlightenment's strict empiricism, in which truth was to be established according to specific operational guidelines, ushering in the age of method, which has persisted with increasing predominance to the present day.

The primary assumption of science, namely that the world is ultimately based on invariant ‘laws of nature’, justifies the basic criterion underlying all scientific reasoning: induction. Causal inference requires that repeated observation of a specific agent or factor be associated with a given effect: if people with allele X are consistently found to have disease Y, we infer that X causes Y.

In the 20th century, philosophers of science forcefully identified weaknesses in this strong reliance on induction and introduced the contrary but related notion of falsifiability, stressed by Karl Popper, who wrote that no theory can be proven by pure induction because future exceptions cannot be ruled out. Instead, hypotheses can only be falsified by finding exceptions. Systematic attempts at falsification, like Sherlock Holmes' famous reasoning, eventually eliminate all but the correct hypothesis. Falsification is routinely cited as the fundamental criterion in science, and a combination of induction and falsification is the basis of the textbook hypothetico-deductive ‘Scientific Method’. However, it was long ago shown that falsification is itself fallible, because exceptions can always be attributed to experimental error, misinterpretation, or failure to include relevant variables. And there is another problem: if the syphilis spirochete is found in every patient with syphilis, and in none without, the spirochete becomes a strong candidate for the cause of the disease, and only repeated, robustly documented instances of failing to observe the association will shake confidence in that causal hypothesis. But, as pointed out as long ago as 1935 in Ludwik Fleck's founding work on the modern philosophy of science (reprinted as Ref. 43), when a disease eventually becomes defined by its cause, so that a patient has syphilis only when the syphilis spirochete is found, or AIDS only when HIV is isolated, the causal hypothesis is effectively no longer falsifiable, but for reasons unrelated to nature herself.

The epistemological infrastructure of modern science implies the fundamental operational criterion that a true causal statement must lead to the successful prediction of future effects. This is the heart of public health, because prediction should enable disease prevention by the removal of exposure to causes of disease. Prediction is related to the deductive step in the Scientific Method, in which the conclusion is guaranteed given the premise, and extends the reach of pure induction from experience because future epidemiological situations are often not exact replicates of past situations in which causal factors were identified.

The hypothetico-deductive approach cycles by design, from observation to assumption to prediction and back to new observation, but the underlying thinking can also become circular. The cause and effect relationship is determined through induction, inferred for the general population based on findings at the level of the individual, and then predicted back to new individuals through deduction from the population level, meaning that our deductions can depend on our inductions, and vice versa.

Probabilistic causation

In ways both practical and theoretical, in the 20th century, science was disabused of its more purely deterministic, absolutist notions of laws of Nature. This was first true of the physical sciences, but became especially apparent in epidemiology. For one thing, epidemiological studies always entail various sources of measurement or sampling error, and statistical methods were developed to deal with this kind of observational ‘noise’. We were forced to accept that induction leads only to estimation of causal parameters. Confirmation and falsification (goodness and badness of fit) became largely statistical notions, though this was generally viewed as making straightforward scientific sense within classical concepts of ultimately deterministic causation.

However, causation itself also dissolved in many ways during the last century. Aspects of causation were identified in both physical and biological sciences that were inherently probabilistic. Mendelian segregation is an important and well-understood example. Probabilistic cause, when its underlying mechanism is known, can have orderly, theoretically specifiable properties, but they have to be inferred through the additional cloud of non-specified sources of statistical noise.

The probabilistic nature of science constituted a change in outlook that vitiates many of the traditional criteria, like replicability and falsifiability, and truth has become, as a consequence, hazier. Whether inference is based on significance values, standard errors, likelihood ratios, or other criteria, decision-making in the identification and characterization of causation has become formally subjective. A test may ‘replicate’ or ‘falsify’ a hypothesis only because some entirely arbitrary (that is, human-chosen and unrelated to the actual causal situation itself) significance level or likelihood ratio is, or is not, met.

In epidemiology and genetics, risk is now estimated probabilistically from repeated observations in samples. But a risk r may apply to everyone in a sample, or there may be heterogeneity. Each person either does or does not get a disease, but in many situations it is at best difficult to relate this binary outcome to individual ‘risk’. In most cases of complex disease, risk is far below 100%. What does it mean when the risk of disease is estimated to be 60%? No one would argue that all exposed persons are at a true 60% risk nor, at the other extreme, that 60% are at 100% risk and the remainder at 0%. Demographers have characterized this as a problem of ‘hidden heterogeneity’, which can sometimes be modelled in terms of distributions of underlying individual risk but which is by no means trivial to work out in specific causal terms.
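
A small simulation makes the ambiguity of a ‘60% risk’ concrete. In the two hypothetical populations below, one in which everyone truly carries a 60% risk and one in which 60% of people are at 100% risk and the rest at 0%, the case counts are statistically indistinguishable; this is exactly the hidden-heterogeneity problem, since group data alone cannot say which description applies to any individual. All numbers are invented for illustration.

```python
# Two hypothetical populations with the same average risk (60%) but very
# different underlying heterogeneity; both yield essentially identical case counts.
import random

random.seed(42)
n = 100_000

# Population A: every individual truly at 60% risk.
cases_a = sum(random.random() < 0.60 for _ in range(n))

# Population B: 60% of individuals at 100% risk, the remaining 40% at 0% risk.
cases_b = 0
for _ in range(n):
    doomed = random.random() < 0.60   # carries a fully penetrant cause
    if doomed:
        cases_b += 1                  # gets the disease with certainty; others never do

print(f"homogeneous 60% risk : {cases_a} cases out of {n}")
print(f"hidden heterogeneity : {cases_b} cases out of {n}")
# Both counts are close to 60,000; aggregate data cannot separate the two pictures.
```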

With probabilistic causation something else happens: the probabilistic estimates of the effect size of a risk factor vary among studies. What is ‘the’ risk and how do we treat the estimates? Are we estimating a true, fixed, Aristotelian kind of causation, or is the risk in question truly probabilistic—and by what criteria can we assume that the risk has a true value, rather than one that varies uncontrollably? What might this mean in the predictive sense? Does it account for differences among studies? The answers currently offered are that there is sampling variation and variation in unmeasured confounders, and methods that are usually purely empirical, such as multiple regression, are used to express the causal situation.

These issues arise, not always explicitly acknowledged, in meta-analysis, where the idea is that joint analysis of pooled studies will asymptotically approach the true risk-factor effect as the number of studies increases. But if we have to pool studies to get stable and accurate results, it usually means that the risk effect is small—whatever that means causally—at least as it is being studied with current methods. This enterprise in effect makes the strong and usually unjustified assumption that intra-study as well as inter-study variation is all due to sampling variance rather than underlying heterogeneity. Contrariwise, it may sometimes be assumed that failure of (human-chosen) standard error ranges among studies to overlap implies causal heterogeneity. It is not clear how to know which view is reasonable, or when. One consequence is a lack of clear stopping rules: the point at which we can say we know ‘the’ answer (if there is only one answer), that enough samples have been included and, hence, that further studies would not be wise funding investments. Under these conditions, even falsification loses much of its meaning.
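
The fixed-effect assumption criticized above can be written down in a few lines. The sketch below pools five invented log relative risks by inverse-variance weighting and computes Cochran's Q, the conventional (and itself threshold-dependent) check on whether the studies are plausibly estimating a single common effect; all study values are hypothetical.

```python
# Inverse-variance (fixed-effect) pooling of hypothetical study estimates,
# plus Cochran's Q as a crude check on the homogeneity assumption.
import math

# (log relative risk, standard error) for five invented studies
studies = [(0.40, 0.15), (0.10, 0.12), (0.05, 0.10), (0.30, 0.20), (-0.05, 0.09)]

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# Cochran's Q: large values relative to k-1 degrees of freedom suggest the
# studies are not all estimating one common effect.
q = sum(w * (est - pooled) ** 2 for (est, _), w in zip(studies, weights))

print(f"pooled log RR = {pooled:.3f} (SE {pooled_se:.3f}), RR ~ {math.exp(pooled):.2f}")
print(f"Cochran's Q = {q:.1f} on {len(studies) - 1} df")
```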

We have inadequate ground rules to turn to when we face recalcitrant problems such as those of complex disease epidemiology, especially because ‘P-value’ decisions are entirely subjective. We need some better way to establish causality.

Bayesian inference

In some situations, there is a better way. One important response to statistical epistemology has been to use Bayesian reasoning.44–46 We generally choose to study a question only when we have some set of plausible hypotheses about it. These a priori ideas are critical to practical study design. When conditions are appropriate, and with proper awareness of its limitations, Bayesian analysis allows us to evaluate the relative ‘probabilities’ of—a way of describing our intuitive confidence in—the truth of each of a set of specified competing hypotheses. Note that here the hypothesis itself, as well as, or rather than, the underlying causal process, is viewed probabilistically.

Bayesian analysis formalizes the Scientific Method in a quantitative way. The investigator must specify prior probabilities for each of a set of competing hypotheses, which are revised in the light of new experimental or observational data. The choice of prior probabilities inevitably involves subjective guesswork, and this has been the most common objection to Bayesian inference.46,47 With such play in the system, the process of fitting observation to hypothesis can be elusively similar to fudging.48 Nonetheless, Bayesian analysis has clearly proven ideal for some situations, such as Mendelian inference in families, in which the set of plausible competing hypotheses is tractably small and closed and in which the hypotheses involve formally specifiable probabilities.44,45 Thus, a decision among hypotheses is based on our relative confidence in them, but the underlying process need not thereby be deterministic. We may come to prefer a single-gene dominant over a recessive model, but the segregation of the individual alleles from parent to offspring is still probabilistic—and the cut-off point for deciding ‘proof’ remains subjective.
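
As an illustration of the kind of closed, formally specifiable family calculation referred to above, the sketch below works through a textbook-style Bayesian update; the scenario is generic and not taken from this paper. A woman whose brother has an X-linked recessive disorder has a prior carrier probability of 1/2, and each unaffected son shifts that probability downward.

```python
# Textbook-style Bayesian update for Mendelian inference in a family.
# This specific scenario is illustrative; it ignores new mutation and
# incomplete penetrance for simplicity.
# Prior: a woman whose brother is affected by an X-linked recessive disorder
# has probability 1/2 of being a carrier. Each unaffected son is observed
# with probability 1/2 if she is a carrier and probability 1 if she is not.

def carrier_posterior(prior: float, unaffected_sons: int) -> float:
    p_data_if_carrier = 0.5 ** unaffected_sons   # each son escapes with prob 1/2
    p_data_if_noncarrier = 1.0                   # her sons cannot be affected
    num = prior * p_data_if_carrier
    return num / (num + (1 - prior) * p_data_if_noncarrier)

for k in range(5):
    print(f"{k} unaffected sons -> P(carrier) = {carrier_posterior(0.5, k):.3f}")
# 0 -> 0.500, 1 -> 0.333, 2 -> 0.200, 3 -> 0.111, 4 -> 0.059
```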

Thus, even with Bayesian tools the amount of probabilism and uncertainty in modern criteria for inference, especially in observational fields like human genetics and epidemiology, can leave us in a kind of epistemological Limbo Land. This is the situation today with regard to many complex diseases. And the situation is actually much worse than has been described, because there is another very large source of non-probabilistic uncertainty in the actual practice of science, as will be described below.

Proximate causal criteria in practice

We have been discussing basic, general, or ultimate criteria for making causal inference in science. These are sometimes stated but often tacit. However, both genetics and epidemiology also have their own proximate day-to-day working criteria that may be explicit, invoked de facto—or ignored.

Koch's infectious disease postulates (1890) were long the gold standard: the causative bacterium must be detectable in every infected individual, and in no healthy one, and it must be possible to recover the bacterium from the host, grow it in culture in the laboratory, and successfully reinfect a new individual. However, as Fleck43 and others pointed out long ago, these postulates have limitations; for example, it is not possible to culture all pathogens in the laboratory. The criteria remain useful general guidelines for determining the agent responsible for a new infectious disease, even though numerous infectious disease vectors do not meet the Koch criteria and were identified by other means.49

Diseases for which experimentation is not as useful require different criteria. Epidemiology has essentially coalesced around a list proposed by Hill in 1965 for assessing causation in occupational medicine50,51 (Table 2). These criteria (though actually proposed by Hill as imprecise guidelines only) are often cited as formal criteria,52–57 although only temporality is regarded as necessary, and none as sufficient, to establish causation. Hill himself noted that they need not all be true in every instance, and that an observed association may be truly causal even when the criteria are not met, leaving Bayesianesque room for judgement and raising issues in complex disease causation that are not currently well addressed.

Table 2

Hill causal criteria

Hill criteria for causation
 
1. Strength of association 
    The stronger the association, the more likely the causal relationship is real 
2. Consistency 
    The cause-effect relationship is replicable 
3. Specificity 
    The association of cause and effect is one-to-one 
4. Temporality 
    The cause precedes the effect 
5. Biological gradient, or dose–response curve 
    More of the causal factor produces a stronger effect 
6. Biological plausibility 
7. Coherence 
    Epidemiological patterns make sense 
8. Experimental evidence for cause and effect 
9. Analogy 
    Precedent exists for the proposed causal relationship 

Assessing causation was historically somewhat more straightforward in human genetics because genes follow what appear to be true rules of inheritance. However, these rules work well only under three important conditions: (i) the effect (disease) has to be closely determined by a variant state of a gene so that the presence of the genetic variant or, in family and population studies, its inheritance is closely reflected by the appearance of the trait; (ii) the trait cannot be caused by too many genes in any given family (even two genes create difficult analytic challenges); and (iii) there cannot be too many non-genetic risk factors. These conditions are generally not met by complex diseases.

In search of new criteria

If the causes of complex disease are many, with small effect, and they are difficult if not impossible to detect, and disease patterns do not follow the rules of Mendelian inheritance, and causation cannot be definitively affirmed by the classical means of epidemiology, how do we make decisions about causation? And how do we decide whether to approach a disease as heritably genetic, due to somatic mutation, gene–environment interaction, gene–gene interaction, an infectious agent, polygenes, genetic modifiers, a gene with low penetrance, or any or all of the many possible scenarios that are routinely proposed, but so rarely confirmed, and how do we know when we are right?

The lack of consensus on inferential criteria, or even on what makes a trait complex, and the flood of irreproducible genetic association studies have led a number of professional journals to address these issues, and some have established guidelines for the publication of association studies in particular, but also of genetic studies in general.58–60 Rather than proposing rigorous new concepts about disease and its causation, however, journals are once again falling back on method, suggesting that the problem is inadequate research standards: the perceived solution is larger studies, better statistical infrastructure, and high-throughput molecular technologies. But accurate confidence intervals are meaningless if the study was designed around a question that does not appropriately address the biological complexity of the disease. The editors recognize that there is a problem, but challenging the inertia of the status quo is currently beyond their scope, as is true for most observers of the problem. Acceptance of the need for gestalt changes is hard to come by.

The shifting target

For reasons that may largely have to do with the relative feasibility of the search, many geneticists have turned their attention to searching for genes involved in biophysiological pathways rather than finding specific allelic variation that will account for, or ‘cause’, the preponderance of cases of a disease. The intent is to identify pathways that might be the target of pharmacological intervention, to decipher and then mine the ‘druggable genome’.61,62 In that sense, the goal is not disease prevention or even prediction, but treatment. This will work if causation is not too complex or a pathway truly constitutes a funnel in the causal hourglass and can be intercepted without untoward ancillary consequences.

Informal Bayesianism

Heretofore, we have neglected an important and relevant aspect of how science is done, perhaps even the most important one; formal causal criteria and strict adherence to high standards of study design are often secondary to it in actual research practice. Science works through a kind of informal Bayesianism. Scientists rely heavily on unstated criteria, or degrees of confidence—Bayesian priors—in particular hypotheses or approaches, and investigators are only sometimes fully able or willing to acknowledge them.

Informal Bayesian priors are probabilities only in the loosest of senses. They include all of the vanities, vested interests, hunches, experiences, politics, careerism, grantsmanship tactics, competing cadres of collaborators, imperfections, and backgrounds of the scientists investigating problems at any time. These are not formal probabilities but degrees of commitment to a specific idea or explanation. They do not represent replicable trial situations, have no ‘distributions’, and have little if any constructive effect on statistical sampling issues.

These are then put into play, again informally, in post-study adjustment. But we often adjust the basic research criteria themselves, in service to these ancillary factors. Refined statistical methods, such as larger or longer studies or meta-analysis, or more exotic statistical tests, may serve these interests when data already at hand clearly show that the object of pursuit is a weak signal at best. Even if the signal is a true one, its pursuit exerts a conservative inertia of vested interests in the search for the more important factors. Weak genetic signal does not mean giving up on genetics but typing more mapping markers. Such adjustments ensure that the current worldview, of a tractable, replicable underlying causal truth, is not threatened.

There are good reasons for some conservatism in science, but in the case of complex disease there is enough clear existing evidence to show that we face conceptual and not just methodological problems. Ideological conformism is a general characteristic of science,43 as it is of society generally. But an environment that is institutionalized and bureaucratized at many levels, and based on intensely competitive funding for professional success, provides insufficient incentive and mechanisms to generate innovative conceptual rethinking.

Conclusions: why identify cause?

The quest for the Philosopher's Stone

The ‘Philosopher's Stone’ more often elicits fictional visions of Harry Potter, wizardry, and Mugwumps than visions of wealth, but for centuries this term referred to the alchemists' long-sought, presumably real substance that would turn base metals into gold. In today's non-fictional world, the search for causes of complex disease resembles the ancient alchemists' quest. Current methods are failing, yet the response by investigators is to intensify current ideas of study design, analytic method, and technology to identify the causes of complex disease—and prove that the search was not in vain.

Is it even important to identify these causes? This may depend on the disease. Some complex diseases are primarily of early onset, others late; some have recently risen in prevalence, whereas the prevalence of others has not changed. These patterns can give important clues as to whether searching for cause, or even which sort of cause, is likely to be worthwhile.

Diseases with recently changed prevalence

Complex diseases with a recent history of rising prevalence (i.e. over the course of one or two generations) are clearly responses to environmental or lifestyle changes. Even if there is an underlying genetic predisposition that remains unidentified, identifying or intercepting these lifestyle changes could have tremendous public health benefit, particularly for early-onset diseases, for which the ensuing contribution to person-years of health would be significant.

Asthma, for example, is a common disease with some clear environmental aetiology, the prevalence of which has risen sharply since the 1980s.63 Although epidemiological studies have failed to identify a single likely trigger, the epidemiology of asthma suggests that this might be possible. Numerous genome scans have been carried out in the quest for ‘the’ asthma gene, with a few suggestive results,64 but these have generally not been confirmed by further studies. Despite the efforts in both epidemiology and genetics, we still know very little about the causes of asthma or how to prevent it. Similarly, type 2 diabetes was rare before the Second World War but is now at epidemic levels in many populations around the world.65,66 The rise in prevalence is usually attributed generically to ‘westernization’, but specific environmental changes or genes have not been definitively identified as major triggers. However, the condition is understood well enough, in a black-box way, that general energy imbalance is probably the culprit. Even without a specific identified ‘cause’, this common disease is largely preventable or controllable by diet and exercise.

However, these are generic answers that do not relieve the epistemological dilemma of epidemiology. It is not encouraging that we do not yet know whether it is better to eat butter or margarine, or whether it is excessive cleanliness or pollution that causes asthma. Beyond the methodological issues we have discussed, much may depend on how tightly we cling to prior ideas, or to limitations on our search space. Perhaps the asthma epidemic is due to, say, widespread exposure to disposable diapers, or to an infectious agent, that researchers bent on genetic solutions simply do not take seriously while they work away at repeated genome scans or the usual battery of environmental factors, with little more than suggestive results that are not confirmable. If seemingly improbable hypotheses are not entertained, they cannot be tested. Previous unexpected findings have relied on serendipity or accident (e.g. the discovery of penicillin), or have simply been slow to find acceptance (e.g. H. pylori as the cause of gastric ulcers), particularly when they defy long-held tenets of biology (bacteria cannot survive in the acidic environment of the stomach).

Late onset chronic diseases

Chronic diseases with late age-of-onset may be appreciably less likely to have a tractable set of identifiable causes as such, because they are typically the result of decades-long processes, with disease developing very slowly along a continuum from health to pathology, and the outcome often essentially stochastic. Examples are cardiovascular disease, hypertension, kidney disease, and many problems associated with obesity. Searching for genetic or environmental ‘causes’ of such traits may simply be to misunderstand their actual biology, because we dichotomize physiology as either normal or diseased.

Every individual's genotype is unique at literally millions of nucleotides, and exposure to environmental risk factors varies uniquely as well. Diseases likely to be the result of so many small, ephemeral effects may be undetectable by current statistical methods, even in principle. The greater the number of risk factors, the smaller the number of people any particular disease process pertains to, the less generalizable the findings, and, thus, the smaller the public health impact that can come of searching for any one genetic cause. Similarly, because environmental exposures vary tremendously, and unpredictably, what pertains to this generation may be of little relevance to the next.

Fortunately, many of the most common complex diseases are among the very ones that can generally be prevented or modified with behavioural changes like diet and exercise—focusing preventive or therapeutic measures on the waist of the hourglass in Figure 1. By intervening on intermediate risk factors such as weight, cholesterol, or blood pressure, the pathophysiological pathways to disease seem to be interruptible, without regard for the genes or environmental factors that led to the development of the intermediate risk factors. Ironically, once these common if generic intermediate factors are removed, the residual cases will be those that are most likely to be due to identifiable factors, like very high risk genotypes, that are causal in the usual sense of the term. Indeed, from rare instances of strongly familial breast cancer, hyperlipidaemia, and other chronic diseases, we know this is the case.

Chronic diseases with stable prevalence

Diseases for which prevalence rates have not changed appreciably over recent decades, particularly those with early onset such as psychiatric diseases, epilepsies, and some autoimmune disorders, may be more likely to have meaningful ‘genetic’ causation. However, these diseases are not proving to be any more genetically tractable than the more clearly environmentally induced diseases. The reasons are many. Most importantly, their genetic basis may be multigenic, with each person or family affected by alleles at different genes, or each by combinations of alleles at multiple genes, and differing in these ways among populations, and the usual study designs will not detect these. Non-heritable genetic change, due to somatic mutation, may be a cryptic genetic cause as well.29 The usual kinds of family or case–control genetic studies will be hard pressed to pick up these causes. If gene-by-environment interaction is involved, as is likely in many cases, the problem is comparably more complex, and even scaling up current methods may be impotent to find these causes, because they are so variable and ephemeral.

The illusion of the Philosopher's Stone

The proper advice to alchemists searching for the Philosopher's Stone would have been to stop that approach and seek their riches elsewhere. The problems we face are profound and fundamental in observational science, but the lack of an obvious alternative does not justify continuing to invest in what does not work. Unfortunately, the informal Bayesian nature of scientific criteria and indeed the hunger of the paying public for easy answers and promises are great obstacles to change, but when even the public is beginning to doubt that base metal can be turned to gold, it is time for scientists to rethink the quest.

References

1. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405:847–56.
2. Skrabanek P. Has risk-factor epidemiology outlived its usefulness? Am J Epidemiol 1993;138:1016–17.
3. Taubes G. Epidemiology faces its limits. Science 1995;269:164–69.
4. Davey Smith G. Reflections on the limitations to epidemiology. J Clin Epidemiol 2001;54:325–31.
5. Davey Smith G, Ebrahim S. Epidemiology—is it time to call it a day? Int J Epidemiol 2001;30:1–11.
6. Terwilliger JD, Weiss KM. Confounding, ascertainment bias, and the blind quest for a genetic ‘fountain of youth’. Ann Med 2003;35:532–44.
7. Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151–57.
8. Weiss KM, Clark AG. Linkage disequilibrium and the mapping of complex human traits. Trends Genet 2002;18:19–24.
9. Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1–22.
10. Davey Smith G, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 2004;33:30–42.
11. Day IN, Gu D, Ganderton RH, Spanakis E, Ye S. Epidemiology and the genetic basis of disease. Int J Epidemiol 2001;30:661–67.
12. Beaty TH, Khoury MJ. Interface of genetics and epidemiology. Epidemiol Rev 2000;22:120–25.
13. Ellsworth DL, Manolio TA. The emerging importance of genetics in epidemiologic research II. Issues in study design and gene mapping. Ann Epidemiol 1999;9:75–90.
14. Khoury MJ, Millikan R, Little J, Gwinn M. The emergence of epidemiology in the genomics age. Int J Epidemiol 2004;33:936–44.
15. Merikangas KR. Implications of genomics for public health: the role of genetic epidemiology. Cold Spring Harb Symp Quant Biol 2003;68:359–64.
16. Millikan R. The changing face of epidemiology in the genomics era. Epidemiology 2002;13:472–80.
17. Weiss KM, Buchanan AV. Evolution by phenotype: a biomedical perspective. Perspect Biol Med 2003;46:159–82.
18. Weiss KM, Buchanan A. Genetics and the Logic of Evolution. Hoboken, NJ: Wiley-Liss, 2004.
19. Scriver CR, Waters PJ. Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet 1999;15:267–72.
20. Gambaro G, Anglani F, D'Angelo A. Association studies of genetic polymorphisms and complex disease. Lancet 2000;355:308–11.
21. Neale BM, Sham PC. The future of association studies: gene-based analysis and replication. Am J Hum Genet 2004;75:353–62.
22. Rebbeck TR, Spitz M, Wu X. Assessing the function of genetic variants in candidate gene association studies. Nat Rev Genet 2004;5:589–97.
23. Rao DC, Province MA (eds). Genetic Dissection of Complex Traits. San Diego, CA: Academic Press, 1999.
24. Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 1990;46:222–8.
25. Weiss KM, Chakraborty R, Majumder PP, Smouse PE. Problems in the assessment of relative risk of chronic disease among biological relatives of affected individuals. J Chronic Dis 1982;35:539–51.
26. Ghosh S, Collins FS. The geneticist's approach to complex disease. Annu Rev Med 1996;47:333–53.
27. Weinberg W. Über den Nachweis der Vererbung beim Menschen (On the demonstration of heredity in man). Jahreshefte des Vereins für Vaterländische Naturkunde in Württemberg, Stuttgart 1908;64:368–82.
28. Edwards JH. The simulation of mendelism. Acta Genet Stat Med 1960;10:63–70.
29. Weiss KM. Cryptic causation of human disease: reading between the (germ) lines. Trends Genet 2005;21:82–8.
30. Framingham Prognostic Score Calculator (n.d.). Retrieved 15 April 2005 from http://www.cardiology.palo-alto.med.va.gov/tools/medcalc/fram/
31. Kaufman JS, Poole C. Looking back on ‘causal thinking in the health sciences’. Annu Rev Public Health 2000;21:101–19.
32. Krieger N. Theories for social epidemiology in the 21st century: an ecosocial perspective. Int J Epidemiol 2001;30:668–77.
33. Weed DL. Meta-analysis under the microscope. J Natl Cancer Inst 1997;89:904–05.
34. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–72.
35. Ioannidis JP. Genetic associations: false or true? Trends Mol Med 2003;9:135–38.
36. Ioannidis JP, Ntzani EE, Trikalinos TA. ‘Racial’ differences in genetic effects for complex diseases. Nat Genet 2004;36:1312–18.
37. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet 2001;29:306–09.
38. Ioannidis JP, Trikalinos TA, Ntzani EE, Contopoulos-Ioannidis DG. Genetic associations in large versus small studies: an empirical assessment. Lancet 2003;361:567–71.
39. Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 2003;33:177–82.
40. Goring HH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet 2001;69:1357–69.
41. Terwilliger JD. On the resolution and feasibility of genome scanning approaches. In: Rao DC, Province MA (eds). Genetic Dissection of Complex Traits. San Diego, CA: Academic Press, 1999, pp. 351–91.
42. Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 1995;11:241–47.
43. Fleck L. Genesis and Development of a Scientific Fact. Chicago: University of Chicago Press, 1979.
44. Howson C, Urbach P. Scientific Reasoning: The Bayesian Approach. 2nd edn. La Salle, IL: Open Court, 1993.
45. Jaynes ET. Probability Theory: The Logic of Science. Cambridge, England: Cambridge University Press, 2003.
46. Godfrey-Smith P. Theory and Reality. Chicago: University of Chicago Press, 2003.
47. Chalmers AF. What is This Thing Called Science? 3rd edn. Indianapolis: Hackett Publications, 1999.
48. Lipton P. Testing hypotheses: prediction and prejudice. Science 2005;307:219–21.
49. Evans AS. Causation and Disease: A Chronological Journey. New York: Plenum Medical Book Co., 1993.
50. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;58:295–300.
51. Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol 1991;133:635–48.
52. Mengersen KL, Merrilees MJ, Tweedie RL. Environmental tobacco smoke and ischaemic heart disease: a case study in applying causal criteria. Int Arch Occup Environ Health 1999;72(Suppl):R1–40.
53. Potischman N, Weed DL. Causal criteria in nutritional epidemiology. Am J Clin Nutr 1999;69:1309S–1314S.
54. Petersen KE, James WO. Agents, vehicles, and causal inference in bacterial foodborne disease outbreaks: 82 reports (1986–1995). J Am Vet Med Assoc 1998;212:1874–81.
55. Paneth N, Ahmed F, Stein AD. Early nutritional origins of hypertension: a hypothesis still lacking support. J Hypertens Suppl 1996;14:S121–29.
56. Nicol-Smith L. Causality, menopause, and depression: a critical review of the literature. BMJ 1996;313:1229–32.
57. Macdonald S, Cherpitel CJ, Borges G, Desouza A, Giesbrecht N, Stockwell T. The criteria for causation of alcohol in violent injuries based on emergency room data from six countries. Addict Behav 2005;30:103–13.
58. Huizinga TW, Pisetsky DS, Kimberly RP. Associations, populations, and the truth: recommendations for genetic association studies in Arthritis & Rheumatism. Arthritis Rheum 2004;50:2066–71.
59. In search of genetic precision. Lancet 2003;361:357.
60. Freely associating. Nat Genet 1999;22:1–2.
61. Hopkins AL, Groom CR. The druggable genome. Nat Rev Drug Discov 2002;1:727–30.
62. Orth AP, Batalov S, Perrone M, Chanda SK. The promise of genomics to identify novel therapeutic targets. Expert Opin Ther Targets 2004;8:587–96.
63. Adams PF, Hendershot GE, Marano MA. Current estimates from the National Health Interview Survey, 1996. Vital Health Stat 10 1999;1–203.
64. Wills-Karp M, Ewart SL. Time to draw breath: asthma-susceptibility genes are identified. Nat Rev Genet 2004;5:376–87.
65. Songer TJ, Zimmet PZ. Epidemiology of type II diabetes: an international perspective. Pharmacoeconomics 1995;8(Suppl 1):1–11.
66. Zimmet P. The burden of type 2 diabetes: are we doing enough? Diabetes Metab 2003;29:6S9–18.