Recent clinical trials demonstrating that hormone replacement therapy (HRT) does not prevent coronary heart disease in women have again raised doubts concerning observational studies. Although much of the explanation probably lies in what might be called the “healthy HRT user” effect, another contributing factor may be that most observational studies included many prevalent users: women taking HRT for some time before study follow-up began. This practice can cause two types of bias, both of which may plausibly have contributed to the discrepancy between observational and randomized studies. First, prevalent users are “survivors” of the early period of pharmacotherapy, which can introduce substantial bias if risk varies with time, just as in studies of operative procedures that enroll patients after they have survived surgery. This article provides several examples of medications for which the hazard function varies with time and thus would be subject to prevalent user bias. Second, covariates for drug users at study entry often are plausibly affected by the drug itself. Investigators often do not adjust for these factors on the causal pathway, which may introduce confounding. A new-user design eliminates these biases by restricting the analysis to persons under observation at the start of the current course of treatment. This article thus argues that such designs should be used more frequently in pharmacoepidemiology.
Received for publication November 7, 2002; accepted for publication May 1, 2003.
Recent reports from clinical trials of the effects of hormone replacement therapy (HRT) on coronary heart disease in women (1–4) have again raised doubts concerning the use of observational studies in clinical research (5). Observational studies have consistently shown a 35–60 percent reduction in coronary heart disease among postmenopausal users of replacement estrogens, with or without progestins (6); many were prospective cohort studies that collected extensive data on potential confounding factors and thus were considered state of the art (7, 8). Indeed, it is likely that findings from these nonrandomized studies influenced millions of women to use HRT for the presumed cardiac benefits (9). Yet, two randomized controlled trials in postmenopausal women with coronary heart disease (1–3) and results from the Women’s Health Initiative study of healthy postmenopausal women (4) have not demonstrated a benefit; in fact, the latter found that using estrogen with a progestin increased the risk of coronary heart disease. One of the many questions these data raise is how the prior observational studies could have been so misleading.
Much of the explanation probably lies in what has been called the “healthy user” effect (10). It is well recognized that the primary way in which observational studies differ from a clinical trial is that the women, not the study investigators, determine whether to use HRT and, if so, which type. Thus, many believe that women who decide to use postmenopausal hormones have a more favorable cardiovascular risk factor profile than do nonusers and that these differences are due to factors not measured in many of the observational studies (6, 11). Indeed, a subgroup analysis from a recent meta-analysis (10) provides some support for this thesis: in observational studies of fair or better quality that controlled for socioeconomic status and education, HRT did not protect against nonfatal coronary artery disease. Because such a healthy drug user effect is plausible for many medications, especially those used for prevention, substantial attention must be devoted to measuring factors that differ systematically between drug users and nonusers and to determining the extent to which doing so improves the quality of observational studies.
However, another phenomenon also may have contributed to the discrepancy. Several of the large prospective cohort studies of HRT (7, 8) included many women who had been using estrogens for some time prior to entry into the study and the beginning of follow-up. As shown in this article, these prevalent users can introduce two types of bias: 1) underascertainment of events that occur early in therapy and 2) the inability to control for disease risk factors that may be altered by the study drugs. Both of these biases may have contributed to the discrepancy between observational and randomized studies of HRT.
This article reviews new-user designs, which avoid these biases by excluding prevalent users from the study. It begins by defining new-user designs for medication studies and discussing their historical antecedents. It then illustrates the susceptibility of designs that include prevalent users to the biases mentioned above and shows how these biases are avoided by the new-user design. It explains how new-user designs can be implemented as case-control studies, either nested or nonnested. Finally, it emphasizes the logistical and sample size limitations of new-user designs.
A new-user design begins by identifying all of the patients in a defined population (both in terms of people and time) who start a course of treatment with the study medication. Study follow-up for endpoints begins at precisely the same time as initiation of therapy, or t0. The study is further restricted to patients with a minimum period of nonuse (washout) prior to t0. The study should include all patients in the study population meeting these criteria. Data for all patient characteristics are obtained at a time just before t0. Cohort studies can be performed by initially assembling a cohort consisting of only new users and an appropriate comparison group or by identifying new users and the comparison group from an existing cohort.
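For dispensing data, this definition can be made concrete. The following minimal sketch assumes that each person's pharmacy fills are available as (fill day, days' supply) records. Under that assumption, a fill counts as the start of a new course (t0) only if the person was observable for the whole washout window before it and had no drug supply during that window. The names `Fill` and `find_new_user_t0` and the 365-day default washout are hypothetical choices made for illustration. They are not values taken from this article.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fill:
    day: int          # dispensing date as a day number (hypothetical record layout)
    days_supply: int  # number of days of medication dispensed

def find_new_user_t0(fills: List[Fill], enrollment_start: int,
                     washout_days: int = 365) -> Optional[int]:
    """Return t0 for the first new course of therapy, or None if there is none.

    A fill starts a new course if (a) the person was under observation for the
    entire washout window before it and (b) no earlier fill supplied drug on any
    day of that window.
    """
    last_covered_day = None  # last day covered by any earlier fill
    for fill in sorted(fills, key=lambda f: f.day):
        window_start = fill.day - washout_days
        observable = window_start >= enrollment_start
        drug_free = last_covered_day is None or last_covered_day < window_start
        if observable and drug_free:
            return fill.day  # follow-up and covariate ascertainment anchor here
        end = fill.day + fill.days_supply - 1
        last_covered_day = end if last_covered_day is None else max(last_covered_day, end)
    return None

print(find_new_user_t0([Fill(100, 30), Fill(600, 30)], enrollment_start=0))  # 600
```

In the usage line, the first fill does not qualify as a new start because the available history does not cover a full washout period before it. The second fill qualifies because the 365 days preceding it contain no supply. This sketch ignores several things a real study would also need to handle: defining the comparison group, stockpiled or overlapping fills, and gaps in enrollment.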
This definition is very similar to the way in which data are analyzed in a clinical trial, where t0 is the time of randomization (usually just before treatment begins). In particular, if two therapies used for the same indication are compared, this design is closely analogous to a clinical trial, except of course that treatment is not assigned by randomization. However, this design differs from most observational studies in that it excludes prevalent users.
The key idea underlying new-user designs—that the beginning of study follow-up be synchronized with starting the drug—was described in 1971 by Alvan Feinstein (12, 13). In that essay, he notes that “each member of a cohort must have a chronologic reference point … from which the subsequent follow-up begins” (12, p. 870; 13, p. 95). Feinstein then uses the term zero time (the t0 used in this article) to indicate when each member’s exposure to the factor under study began (for therapeutic interventions, when the intervention began), and he continues: “Because the purpose of cohort research is to observe the effects of a maneuver, the logical choice of a reference date is zero time: the inception of the maneuver” (12, p. 870; 13, p. 95). Feinstein (12, 13) describes the broad applicability of this principle to cohort studies of disease prognosis, surgical interventions, and medications and characterizes the potential consequences of failing to observe this principle as “scientifically disastrous,” potentially creating “major, irremediable sources of bias” (which he denotes as chronology bias).
Interestingly, these seminal ideas have been widely adopted to study disease prognosis. Inception cohorts, defined in this context as “patients … identified at an early and uniform point (inception) in the course of their disease” (14, p. 177), are now standard for prognostic studies. They also are widely recognized as crucial for studies of surgical interventions. However, inception cohorts (a new-user design is an inception cohort if all persons in the study are first-time medication users) and new-user designs are used very infrequently for pharmacoepidemiologic studies. This infrequent use is probably due in large part to the formidable logistical complexity of identifying new users of medications, which requires tracking medication use on a day-by-day basis, and, for prospective studies, the loss of sample size and power that would result from excluding prevalent users. Nevertheless, as described below, the increasing availability of data resources that provide detailed drug use data should reduce the logistical barriers to new-user designs.
UNDERASCERTAINMENT OF EARLY EVENTS
For many treatments, the rate at which treatment-related outcomes occur varies with time since the start of therapy. Surgical procedures such as coronary artery bypass are among the most dramatic examples, with an early surgical and postoperative period of very high risk. Thus, it is now well recognized that in evaluations of surgical procedures, follow-up for all patients must begin just before surgery. Recruiting patients into the study after they have survived surgery will introduce serious bias that usually favors the treatment because, by definition, this procedure excludes perioperative deaths (14).
However, this time dependence of risk also can occur with medications, where the period of early use often is associated with elevated risk. For HRT, there is evidence from the Heart and Estrogen/Progestin Replacement Study (figure 1) (1), the Women’s Health Initiative (4), and some observational studies (15) that risk is increased in the first year, which may be due to early adverse effects (such as prothrombotic effects). This pattern of increased early risk is common in therapeutics (16) and has been observed for intussusception following administration of rotavirus vaccine (17), for falls after beginning use of benzodiazepines (18), for peptic ulcers in users of nonsteroidal anti-inflammatory drugs (19), for angioedema in users of angiotensin-converting enzyme inhibitors (20), and for venous thrombosis in users of oral contraceptives (21). Time-dependent risk can result from early attrition of those patients most susceptible to the event (22), medications that have both beneficial and adverse effects but with different induction periods (23), physiologic adaptation that occurs during prolonged periods of treatment (24), or other selection factors, such as adherence bias, that vary according to duration of therapy.
Just as for evaluations of surgical procedures, using cohorts composed largely of “survivors” of the early period of pharmacotherapy can introduce substantial bias if risk varies with time. The following hypothetical example shows how this bias could produce discrepancies similar to those observed for HRT. Assume a cohort of 100,000 postmenopausal women, followed for 4 years, in which 1 percent begin HRT use on the first day of each study year and continue throughout the study. Assume that in the first year of HRT use, the rate of serious coronary heart disease is 10 per 1,000 person-years and drops to 2 per 1,000 in subsequent years. Assume that the rate for comparable nonusers of HRT is a constant 4 per 1,000. Thus, this cohort will include 4,000 new users with 10,000 person-years of use (4,000 for the first year of HRT use and 6,000 subsequently) and 52 events (40 for the first year of HRT use and 12 subsequently), for a rate of 5.2 per 1,000, greater than that for nonusers. However, assume there were 10,000 prevalent users at the beginning of the study, all of whose duration of HRT use was longer than 1 year and who continued use during study follow-up. Including these users in the study would thus add 40,000 person-years and 80 events. The overall rate now would be 2.6 per 1,000 (132 events in 50,000 person-years), one half that for the new-user cohort and lower than that for nonusers.
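The arithmetic of this hypothetical example can be checked directly. The short Python sketch below reproduces it using exact fractions. All of its inputs are the assumptions stated in the text, and the function name is illustrative only.

```python
from fractions import Fraction

def hypothetical_hrt_cohort():
    """Rates per 1,000 person-years for the hypothetical HRT cohort in the text."""
    years = 4
    starters_per_year = 1_000               # 1% of 100,000 women start HRT each year
    first_year_rate = Fraction(10, 1_000)   # events per person-year, first year of use
    later_rate = Fraction(2, 1_000)         # events per person-year, after first year
    prevalent_users = 10_000                # already past their first year at entry

    # Each annual group of new users contributes one first-year person-year per
    # woman, plus the remaining study years at the later (lower) rate.
    first_py = starters_per_year * years
    later_py = sum(starters_per_year * (years - start) for start in range(1, years + 1))
    new_py = first_py + later_py
    new_events = first_py * first_year_rate + later_py * later_rate

    # Prevalent users contribute all of their follow-up at the later rate.
    prev_py = prevalent_users * years
    prev_events = prev_py * later_rate

    new_user_rate = 1_000 * new_events / new_py
    combined_rate = 1_000 * (new_events + prev_events) / (new_py + prev_py)
    return new_py, new_events, new_user_rate, combined_rate

new_py, new_events, new_rate, combined_rate = hypothetical_hrt_cohort()
print(new_py, new_events, float(new_rate), float(combined_rate))
# prints: 10000 52 5.2 2.64
```

The new-user rate (5.2) exceeds the nonuser rate of 4 per 1,000. The combined rate (2.64, rounded to 2.6 in the text) falls below it, even though no woman's underlying risk has changed.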
When two treatments are compared, inclusion of prevalent users can lead to treatment groups with different durations of prior therapy, which can introduce the bias in the manner described above (16). This situation is often very plausible in pharmacoepidemiologic studies because new drugs frequently are compared with existing therapies. This type of bias has been suggested as a potential explanation for the excess of venous thromboembolism among users of third-generation oral contraceptives, because the risk of venous thromboembolism was greatest early in therapy and women taking third-generation drugs had started use more recently than those using earlier agents (21). This problem cannot be resolved by recording duration of prior therapy at the start of follow-up, because a cohort of such “survivors” excludes persons whose duration of use is comparable but who stopped use because of events or other adverse effects before study follow-up began. In terms of the surgical example, a study that begins follow-up after hospital discharge cannot eliminate the bias that occurs from failing to include perioperative deaths by recording the time between surgery and study entry.
The new-user design eliminates this bias because analysis begins with the start of the current course of treatment for every cohort member. Thus, just as in the clinical trial, t0 is known for each cohort member, and all early events are included in the analysis. Although persons may enter the cohort on different calendar dates, the analysis for each is relative to the time that therapy started, as in a clinical trial.
Even for drugs whose physiologic effects do not vary with duration of therapy, including prevalent users may amplify adherence bias (10). This bias is thought to underlie the findings from analyses of data from several randomized controlled clinical trials in which better adherence to placebo has been associated with a 30–60 percent reduced risk of death from cardiovascular disease (25–27) and fewer episodes of fever or infection in cancer patients (28). The magnitude of the association was not materially affected by adjustment for several potential confounders. It is thought that adherence is a marker for a constellation of unmeasured factors, some of which may be time dependent, associated with better prognosis (10, 26, 27).
Analyses that include prevalent users are more susceptible to adherence bias because, as illustrated in the following simplified example, long-term users tend to be patients who adhere to therapy. Assume that a study begins in a population 4 years after a new medication is introduced. Assume that in each of the preceding 4 years, 10,000 persons begin using this drug on the first day of the year. Of these, 5,000 are poor adherers and cease use within the year. The remaining good adherers continue use indefinitely. Thus, a cohort study beginning in year 5 that allowed prevalent users would include 25,000 good adherers (20,000 from the previous 4 years) and 5,000 poor adherers. In contrast, a new-user design beginning in year 5 would include equal numbers of good and poor adherers.
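A few lines of Python reproduce the counts in this simplified example under the same assumptions. A fixed number of persons start the drug each year, and half of them stop within the year while the rest continue indefinitely. The function and parameter names are hypothetical.

```python
def adherence_mix(prior_years: int = 4, starters_per_year: int = 10_000,
                  poor_per_year: int = 5_000):
    """(good, poor) adherer counts for a study starting in year prior_years + 1."""
    good_per_year = starters_per_year - poor_per_year
    # Poor adherers from earlier years have already stopped; good adherers remain.
    prevalent_good = prior_years * good_per_year
    # Persons starting in the study's first year are new users of either type.
    new_good, new_poor = good_per_year, poor_per_year
    with_prevalent = (prevalent_good + new_good, new_poor)
    new_users_only = (new_good, new_poor)
    return with_prevalent, new_users_only

print(adherence_mix())  # ((25000, 5000), (5000, 5000))
```

Each additional year of prior marketing adds another 5,000 good adherers to the prevalent-user cohort. It adds no poor adherers, so the imbalance grows the longer the drug has been available.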
For drugs that have been marketed for some time, new-user designs will include many patients who, although not using the drug during the washout period when the study begins, will have a past history of using that drug. They also may have a past history of using a different drug, but with similar pharmacologic properties. Does this vitiate the advantages of new-user designs? If, as is thought to be true for many drugs, a patient returns to a state similar to that of a naive user following washout, there is no problem. However, some drug effects, such as early attrition of patients most susceptible to the event, may be irreversible. Inclusion of persons with a history of past use does not cause systematic bias in a clinical trial because randomization works to equalize the distribution of past users across the study groups. A similar effect can be achieved in a new-user cohort study by choosing controls matched according to past use. In both clinical trials and new-user designs, inclusion of nonnaive users does limit inference. For example, if in a study of HRT, either randomized or observational, all patients had some past exposure to HRT, then inferences could not necessarily be made about the effects unique to first exposure.
DISEASE RISK FACTORS ALTERED BY STUDY DRUGS
Inclusion of prevalent users in a study complicates control for potential confounders because these factors often are plausibly affected by the treatment itself. This leads to a difficult conundrum: either the investigators adjust for the values of the covariates, thus committing the error of adjusting for factors on the causal pathway (29), or they do not, thus potentially introducing confounding (6, 11). For example, HRT favorably alters both high- and low-density lipoprotein levels (4), suggesting that these potential confounders should not be included in multivariate models; however, the “healthy user” hypothesis would suggest that women electing HRT could have more favorable lipid profiles prior to beginning therapy (11), thus arguing for adjustment.
In the new-user design, potential confounders can be measured just prior to t0—analogous to the practice in clinical trials of measuring the values of important prognostic factors just prior to randomization—and thus cannot be influenced by the therapy. Therefore, this conundrum does not occur. The baseline values of confounders can be used to adjust for differences between the treatment groups. For factors not affected by treatment that potentially can change over time, standard time-dependent covariate analyses can be used.
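Ascertaining a covariate just prior to t0 amounts to taking the most recent measurement dated strictly before the start of therapy. The sketch below assumes that measurements are stored as (day, value) pairs. The 365-day lookback window and the function name are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

def baseline_value(measurements: List[Tuple[int, float]], t0: int,
                   lookback_days: int = 365) -> Optional[float]:
    """Most recent value recorded strictly before t0 within the lookback window.

    Values on or after t0 are ignored because they could already reflect the
    drug's effect (e.g., a lipid level measured after HRT was started).
    """
    eligible = [(day, value) for day, value in measurements
                if t0 - lookback_days <= day < t0]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m[0])[1]

print(baseline_value([(10, 200.0), (90, 180.0), (120, 150.0)], t0=100))  # 180.0
```

The value recorded on day 120 is excluded because it postdates t0. This exclusion is what keeps the baseline covariate off the causal pathway.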
The potential for bias introduced by treatment effects on intervening variables is present for many important questions of therapeutic safety and efficacy. In studies of psychotropic drugs and fall-related injuries in the elderly (30), the question arises as to whether to adjust for history of falls prior to study entry. For prevalent users of a psychotropic drug, the occurrence of prior falls may be due to the psychomotor effects of the drug per se, thus arguing against control for this factor. However, it also is plausible that psychotropic drug users could have conditions such as major depression or higher levels of somatic impairment, which themselves affect the risk of falls, arguing for control. A similar dilemma arises in the study of nonaspirin, nonsteroidal anti-inflammatory drugs and coronary heart disease (31, 32) because several of these drugs affect factors such as hypertension (33) that are on the causal pathway for this endpoint.
NEW USERS IN CASE-CONTROL STUDIES
New-user designs can be implemented as nested case-control studies, which may be viewed as cohort studies with sampling to improve efficiency (29). The nested case-control study samples from a study base (34–36) composed of explicitly identified people and, for each person, a study time window. If a member of the study base develops the outcome of interest during that person’s time window, then he or she is included in the study as a case. Other members of the study base are eligible to be controls during their study time windows. A study base member who is using the drug being evaluated just before the study time window begins is a prevalent user; one who begins such use during the time window and has the appropriate antecedent drug-free washout period is a new user. A new-user design may be implemented by restricting the study base to new users and the appropriate comparison group.
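A common way to sample controls in a nested case-control study is risk-set (incidence density) sampling. The following sketch illustrates that technique and is not the procedure of any particular study. It assumes three things. First, the study base has already been restricted to new users and their comparison group. Second, time is measured from each member's own t0. Third, each member's follow-up ends at the outcome or at censoring. For each case, controls are drawn from members still under follow-up at the case's event time. `Member`, `risk_set_sample`, and the default of four controls per case are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Member:
    pid: str
    followup: float  # time from this member's own t0 to outcome or censoring
    is_case: bool    # True if follow-up ended with the outcome of interest

def risk_set_sample(cohort: List[Member], controls_per_case: int = 4,
                    seed: int = 0) -> Dict[str, List[str]]:
    """Risk-set (incidence density) sampling of controls for each case."""
    rng = random.Random(seed)
    matched: Dict[str, List[str]] = {}
    for case in (m for m in cohort if m.is_case):
        # Eligible controls: anyone other than the case who is still under
        # follow-up at the case's event time (future cases remain eligible).
        risk_set = [m.pid for m in cohort
                    if m.pid != case.pid and m.followup >= case.followup]
        matched[case.pid] = rng.sample(risk_set, min(controls_per_case, len(risk_set)))
    return matched
```

Because every member's clock starts at t0, controls are matched on time since the start of therapy, as they would be in the full new-user cohort. Exposure (the study drug versus the comparison treatment) would then be compared between cases and their matched controls, for example with conditional logistic regression.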
New-user studies can be conducted with nonnested case-control designs provided that all cases within a defined population are ascertained for a defined time period. Let s0 and s1 be the beginning and end of the case accrual period, respectively (the notation is different so as not to confuse the beginning of accrual with the beginning of a period of drug use). Then, for each case identified or potential control selected within the period s0 to s1, that study subject is a prevalent user if the drug was being used just before s0 and would thus be excluded from the study. Covariates would need to be ascertained at a time just prior to the start of drug use.
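The eligibility rule for a nonnested case-control study can be sketched as a small classification function. This minimal illustration assumes that each subject's drug supply is recorded as inclusive (first day, last day) intervals on a common day scale. A subject whose supply covers the day before s0 is a prevalent user and is excluded. A subject whose first start within [s0, s1] is preceded by a drug-free washout is a new user. The 365-day washout, the function name, and the returned labels are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (first_day, last_day) covered by one dispensing, inclusive

def classify_subject(supply: List[Interval], s0: int, s1: int,
                     washout_days: int = 365) -> str:
    """Classify a case or potential control relative to the accrual period [s0, s1]."""
    # Prevalent user: drug supply covers the day just before case accrual begins.
    if any(start <= s0 - 1 <= end for start, end in supply):
        return "prevalent"  # excluded from the study
    starts = sorted(start for start, _ in supply if s0 <= start <= s1)
    if not starts:
        return "nonuser"
    t0 = starts[0]
    window_first, window_last = t0 - washout_days, t0 - 1
    # Any supply overlapping the washout window disqualifies this start.
    overlaps = any(start <= window_last and end >= window_first for start, end in supply)
    return "ineligible" if overlaps else "new_user"

print(classify_subject([(900, 1029)], s0=1000, s1=2000))  # prevalent
```

For subjects classified as new users, covariates would then be ascertained just before the returned start day rather than at s0.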
Case-control studies of any type that include persons who began drug use before study accrual of cases (s0) are susceptible to bias related to underascertainment of early events. For example, consider two HRT users, each beginning use 1 year prior to s0. The first has a fatal myocardial infarction 1 month after beginning HRT; the second has no study events. A case-control study that included persons beginning HRT use before the study accrual period would miss the former and include the latter as a potential control, thus underestimating the risk associated with HRT.
LOGISTICAL AND OTHER LIMITATIONS OF THE NEW-USER DESIGN
An important limitation of new-user designs is the logistical difficulty of identifying the time that medication use began and collecting information on potential confounders at this t0. Doing so usually would require tracking both drug use and potential confounders on a day-to-day basis. Thus, for studies whose primary source of data is interview of subjects, meeting this requirement would be so expensive and cumbersome for study subjects as to be generally infeasible. For nonnested case-control studies, limiting them to subjects beginning medication use within the study accrual period would materially reduce the efficiency of this design, particularly with regard to long-term effects of drugs.
Use of automated databases and record linkage in epidemiology is growing. These data often include detailed information on medication prescriptions and other information that can be used to define potential confounders. At present, there are at least three types of these databases in which it is practical to conduct new-user designs.
Computerized databases of medical care encounters for defined populations now are frequently kept by health care payers or are the by-product of computerized medical record systems. Examples of the former include Medicaid (37, 38), health maintenance organizations (39), and universal health insurance plans (40). The General Practice Research Database in the United Kingdom is an example of the latter (41). Each includes records of prescriptions filled at the pharmacy or written by physicians, which provide a measure of medication use that is sufficiently detailed to identify new users. Prescriptions, other medical care encounters, and other linked files can be used to ascertain potential confounders. New-user studies have been conducted with these databases (32, 42, 43).
Nursing homes and hospitals maintain daily “medication administration records,” which increasingly are computerized. They can be used to identify new users. Other data available in these settings, such as the now-computerized nursing home Minimum Data Set (44), provide extensive and regularly updated information on patient health, function, and medical care that can be used to ascertain potential confounders. This data source was used to conduct a new-user study of antidepressants and falls in nursing homes (30).
Special-purpose, computerized patient disease registries are another potential source of data for new-user studies. These registries typically enroll patients at a well-defined point in therapy (often the onset), thus providing the opportunity to identify new users, and can include sufficient information to ascertain potential confounders. Examples of this type of registry include the databases for tracking human immunodeficiency virus patients (45) and the registry of patients receiving the atypical antipsychotic clozapine (46).
Restricting a study to new users usually will reduce sample size and thus study power. For a study with an enrollment period that begins several years after a drug has been on the market, the number of new users available for study is likely to be considerably smaller than that of prevalent users. Furthermore, the prevalent users will include many long-term users, who are particularly important for evaluating the risk of effects related to chronic exposure, such as breast cancer. This limitation can be addressed in two ways. First, some longitudinal databases will include sufficient history to enable study of a drug from the time it is introduced. In this circumstance, the new-user design does not limit power because the time of first use can be identified for each drug user. This strategy was used in a study of hydroxymethylglutaryl-coenzyme A reductase inhibitors (statins) and hip fracture conducted in a Medicaid database (43). Second, one could assess the magnitude of the potential biases related to including prevalent users. Such an assessment could include analysis of prior or study data to determine the extent to which the hazard function varies with time (16), the magnitude of adherence bias, and whether important covariates are influenced by study exposures, as well as a comparative analysis of new and prevalent users. If no material evidence of these biases were found, prevalent users could then be included in the analysis.
For medications used for both acute and long-term indications, an analysis of new users may give excessive weight to short-term users. For example, nonsteroidal anti-inflammatory drugs are used short term for various types of acute pain and inflammation but also are used chronically by persons with osteoarthritis or rheumatoid arthritis. A new-user cohort might include disproportionate numbers of the former group. Similarly, patients who are poor compliers or who do poorly on existing medications may be overrepresented among those starting new courses of therapy. This limitation can be addressed in two ways. First, when possible, data on indication and behavioral factors should be collected and assessed as possible effect modifiers. Second, just as in clinical trials, survival analysis methods can be used to determine how the hazard function for the study outcome varies with time since the start of therapy.
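Before fitting formal survival models (for example, Kaplan-Meier curves or a Cox model with time-varying effects), a simple first look is to compare crude event rates across intervals of time since t0. The sketch below assumes that each subject contributes a pair: years from t0 to event or censoring, and an event indicator. It splits person-time at hypothetical cut points. A materially higher rate in the first interval would suggest the early-risk pattern discussed earlier. The method is a crude piecewise-rate comparison and does not replace an adjusted analysis.

```python
from typing import List, Sequence, Tuple

def rates_by_time_since_t0(subjects: List[Tuple[float, bool]],
                           cut_points: Sequence[float] = (1.0,)
                           ) -> List[Tuple[float, float, float]]:
    """Crude event rates per 1,000 person-years in intervals of time since t0.

    subjects: (years from t0 to event or censoring, had_event) for each person.
    Returns (interval start, interval end, rate) for each interval.
    """
    edges = [0.0, *cut_points, float("inf")]
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        person_years = 0.0
        events = 0
        for followup, had_event in subjects:
            if followup <= lo:
                continue  # left follow-up before this interval began
            person_years += min(followup, hi) - lo
            if had_event and lo < followup <= hi:
                events += 1
        rate = 1_000 * events / person_years if person_years else float("nan")
        results.append((lo, hi, rate))
    return results

print(rates_by_time_since_t0([(0.5, True), (2.0, False), (3.0, True)]))
```

Stratifying these rates by indication or by prior treatment history gives a rough check of the effect modification described above.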
The discrepancy between observational studies of HRT and clinical trials should stimulate reexamination of the methodology for observational studies of therapies to identify ways to improve these designs. This article has described one weakness of the most commonly used observational designs: inclusion of prevalent users. Doing so engenders susceptibility to biases related to underascertainment of adverse effects occurring early in therapy and modification of variables on the causal pathway. It is plausible, although certainly not proven, that these biases contributed to the misleading results provided by observational studies of HRT (15). However, although the new-user designs proposed here are not a panacea for the shortcomings of observational studies and have their own limitations of more complicated logistics and reduced statistical power, they can eliminate the two specific biases described and thus should be a valuable addition to the clinical research armamentarium. They are particularly important for evaluating medications such as replacement estrogens, where some adverse events may occur at increased frequency early in therapy.
Supported in part by an Agency for Healthcare Research and Quality, Centers for Education and Research in Therapeutics cooperative agreement (grant HS1-0384) and a cooperative agreement with the Food and Drug Administration (FD-U-001641).
Correspondence to Dr. Wayne A. Ray, Department of Preventive Medicine, Medical Center North, A-1124, Vanderbilt University Medical Center, Nashville, TN 37232 (e-mail: email@example.com).