The standardization of medical procedures, the evaluation of “quality” care, and the development of formal approaches to error prevention are becoming increasingly important in modern medicine.1 Procedural and interpretative algorithms, evidence-based practice protocols, and other guidelines have been published by various medical societies, and others are sure to follow.2 Compliance with these guidelines is usually regarded as a “yardstick” for quality evaluation by health care administrators, and such information is used by payers and licensing agencies; it is even published on the Internet.3 Accordingly, most medical specialties currently sponsor practice guidelines that are developed by using concepts that fall under the rubric of “evidence-based medicine” (EBM).4 As a group, however, pathologists have been slow to adopt a similar approach, which can be termed evidence-based pathology.5,6
Hundreds of evidence-based medical protocols are already readily available through the National Guideline Clearinghouse, the Institute of Medicine, the American Cancer Society, the Cochrane Collaboration, and other Web sites.2,7–9 These documents are based on data in medical publications, which are generally referenced as “evidence.” The inherent “quality” of their contents is described as a reflection of “evidence levels” (ELs).10 EBM advocates have promoted the use of various schemes for evaluation of ELs, principally aimed at an assessment of the validity and clinical applicability of therapeutic procedures. For example, a recent book by Straus et al11 codifies several ELs, with level 1 being the best. Level 1a is the label for systematic reviews with homogeneity of randomized clinical trials (RCTs); level 1b refers to individual RCTs with narrow confidence intervals; level 2a denotes systematic homogeneous reviews of cohort studies; level 2b includes individual cohort studies and “low-quality” RCTs (eg, with <80% follow-up); level 3a refers to systematic homogeneous reviews of case-control studies; level 3b is principally represented by individual case-control studies; level 4 denotes case series and poor-quality cohort and case-control studies; and level 5 is the EL represented by “expert opinion” without explicit critical appraisal or firsthand generation of data (“first principles”).11
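For readers who wish to apply such a scheme programmatically (for example, when tabulating the literature cited in a guideline), the hierarchy summarized above can be sketched as a simple ordered lookup. This is only an illustration: the names `EVIDENCE_LEVELS`, `is_stronger`, and `strongest` are hypothetical, and the descriptions paraphrase the levels as listed in this text rather than quoting any official source.

```python
# Evidence levels as summarized in the text above; the first entry is the
# strongest. Names and wording here are illustrative assumptions.
EVIDENCE_LEVELS = {
    "1a": "Systematic review (with homogeneity) of randomized clinical trials",
    "1b": "Individual randomized clinical trial with narrow confidence interval",
    "2a": "Systematic review (with homogeneity) of cohort studies",
    "2b": "Individual cohort study or low-quality RCT (eg, <80% follow-up)",
    "3a": "Systematic review (with homogeneity) of case-control studies",
    "3b": "Individual case-control study",
    "4": "Case series; poor-quality cohort and case-control studies",
    "5": "Expert opinion without explicit critical appraisal",
}

# Dict insertion order doubles as the ranking (index 0 = strongest evidence).
_ORDER = list(EVIDENCE_LEVELS)


def is_stronger(level_a: str, level_b: str) -> bool:
    """Return True if level_a denotes stronger evidence than level_b."""
    return _ORDER.index(level_a) < _ORDER.index(level_b)


def strongest(levels):
    """Return the strongest evidence level among the given labels."""
    return min(levels, key=_ORDER.index)
```

Under this sketch, an individual case-control study ("3b") ranks below an individual RCT ("1b"), which is the ordering that places much of the pathology literature at level 3 or worse.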
Other comparable EL systems have been published by the Cochrane Collaboration9 and similar groups.10 Generally, only information obtained by RCTs, or systematic reviews with meta-analysis of homogeneous cohort studies, has been considered as evidence in levels 1 and 2. Data derived from individual case-control assessments are usually considered to be in level 3 or worse because it has been shown that such observational studies are affected negatively by sources of bias that result from patient selection, sample size, distribution of data, lack of independent validation, and other factors.5,6
The ELs that are used for evaluation of clinical treatment protocols generally pose a problem for pathology and laboratory medicine (PLM). Because scientific studies in our specialty do not lend themselves to the use of RCT designs, clinical EL systems essentially consign most pathology literature to EL 3 or worse. Nevertheless, the notion that pathologist-generated literature is, at best, mediocre undervalues the many contributions of our specialty to the body of medical knowledge. Indeed, many observational studies in PLM, including well-designed case series and even some seminal case reports, have resulted in recognition of new diseases and specific clinicopathologic entities.
Moreover, the evidence levels proposed by non-PLM advocates of EBM seem to have little relevance to the particulars of our professional discipline. In fact, they may act as a disincentive for pathologists to improve the quality, in a statistical sense, of their publications. Indeed, no practical system currently exists to allow readers of PLM articles, especially those with a clinical emphasis, to assess the quality of the data they present. That failing also affects the evaluation of individual recommendations published by perceived “experts” in our specialty, which tend to be given special weight.
Several analysts have considered the process of designing scientific assessments; because physicians in PLM are unlikely to include RCTs in their investigations, it is important to recognize key factors that may influence the validity of their work.4 A detailed consideration of that issue is beyond the scope of this discussion, but generally, the essential details in scientific study design include the following: (1) a preference for prospective rather than retrospective accrual of data; (2) attention to sample size, supported by a “power analysis”; (3) the use of case-control paradigms rather than simple descriptions of findings in a particular cohort; and (4) the application of appropriate and complete statistical analyses of data sets.5,6
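As an illustration of the “power analysis” mentioned in item 2, the following minimal sketch estimates the number of cases needed per group to detect a difference between two proportions (for example, the positivity rate of a marker in two diagnostic groups). It uses the standard normal-approximation formula for a two-sided test with unpooled variances; the function name and the default values for `alpha` and `power` are assumptions chosen for the example, not recommendations from the cited sources.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate cases needed PER GROUP to detect p1 vs p2.

    Normal approximation, two-sided test, equal group sizes:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to compute a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)  # round up to a whole number of cases


# Example: hypothetical positivity rates of 60% vs 40%
n_per_group = sample_size_two_proportions(0.60, 0.40)
```

Smaller expected differences or a higher desired power increase the required sample size sharply, which is one reason small retrospective series often cannot support firm conclusions.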
In addition, the results of an observational study should ideally be validated using an independent data set that has not been used to derive the conclusions.11 Under such conditions, the results would be affirmed or disputed through the use of external data from several other institutions. As another approach, the use of meta-analyses provides a well-developed method for integration of results from different publications.12 Conversely, a lack of adherence to preferred principles of study design often results in disease classification schemes that are irreproducible in practice, immunostaining procedures that produce incongruent results when they are performed at different laboratories, marked variation in the prognostic and predictive values of various analytes, and other well-known problems in the current practice of PLM.
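The integration of results from different publications by meta-analysis, mentioned above, can be illustrated with a minimal sketch of fixed-effect (inverse-variance) pooling. This is only one of several meta-analytic models and assumes that the studies are homogeneous; the function name and the input values in the example are hypothetical.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study-level effect estimates by inverse-variance weighting.

    effects     -- per-study effect estimates (eg, log odds ratios)
    std_errors  -- corresponding standard errors
    Returns (pooled_effect, pooled_se, approximate 95% confidence interval).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Example with two hypothetical studies; the more precise one dominates.
pooled, pooled_se, ci = fixed_effect_pool([0.5, 0.3], [0.1, 0.2])
```

When the studies are heterogeneous, a random-effects model is generally more appropriate than this simple pooling.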
Table 1 proposes a simple approach for the assessment of ELs in PLM and considers some of the issues discussed herein. We hope that its contents will stimulate interest and debate among the members of our professional societies, with the aim of developing a comprehensive and useful scale of ELs that can be applied meaningfully in the publications of our specialty.