Sir,

We read with great interest the recent article by Dr Cahan and colleagues regarding the inability of residents to accurately determine probability.1 Despite the recent surge in teaching Evidence Based Medicine (EBM) in medical schools, the effectiveness of current teaching strategies remains unclear. We sought to determine how readily medical students and physicians identify the diagnostic terms often stressed in EBM.

Relevant articles were identified by searching several databases, including Medline (1980–2003), Embase (1988–2003), PsychInfo (1984–2003) and Web of Science (1993–2003), as well as educational websites and the bibliographies of relevant articles. Study design, study quality, and study limitations were abstracted by two independent reviewers. Review articles, letters to the editor, and editorials on innumeracy and diagnostic tests were excluded.

We identified eight articles (five case scenarios, two questionnaires, and one telephone survey) that met the inclusion criteria (Table 1).2–9 The number of participants per study ranged from 31 to 300, and the studies were considerably heterogeneous. The commonest physician error was overestimating the positive predictive value (PPV) (78–95%). One study reported that only 3%, 1% and 0.66% of physicians used Bayesian calculations, receiver operating characteristic (ROC) curves and likelihood ratios (LR), respectively. Medical students could not rule out disease in low and intermediate probability case scenarios when applying Bayesian estimates. In one study from Australia, 13 of 50 (26%) physicians stated that they could describe PPV, although on direct interviewing only one could actually illustrate it with an example. In another study, presenting the data in natural frequency format increased the accuracy of determining PPV from 10% to 46%.
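To make the PPV calculation concrete, the following minimal Python sketch computes the PPV of a positive test in two equivalent ways: from Bayes' theorem using probabilities, and from natural frequencies in a hypothetical cohort. The function names and the prevalence, sensitivity and specificity values are illustrative assumptions and are not data from any of the reviewed studies.

```python
def ppv_bayes(prevalence, sensitivity, specificity):
    """Positive predictive value from Bayes' theorem (probability format)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def ppv_natural_frequencies(prevalence, sensitivity, specificity, cohort=10_000):
    """The same quantity expressed as counts of people in a hypothetical cohort."""
    diseased = cohort * prevalence
    healthy = cohort - diseased
    true_pos = diseased * sensitivity            # diseased people who test positive
    false_pos = healthy * (1.0 - specificity)    # healthy people who test positive
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# Illustrative (assumed) values: a rare disease and a moderately accurate test
prevalence, sensitivity, specificity = 0.01, 0.80, 0.90

ppv = ppv_bayes(prevalence, sensitivity, specificity)
tp, fp, ppv_nf = ppv_natural_frequencies(prevalence, sensitivity, specificity)

print(f"Probability format: PPV = {ppv:.3f}")
print(f"Natural frequencies: {tp:.0f} of {tp + fp:.0f} people with a positive "
      f"test have the disease (PPV = {ppv_nf:.3f})")
```

Under these assumed values, only about 80 of roughly 1070 people with a positive test have the disease, so the PPV is well below 10% even though the sensitivity is 80%. The count-based phrasing arrives at the same number as the probability formula but keeps the large pool of false positives visible.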

Table 1

Studies of understanding diagnostic terms

| Study type | Author (year) | Design | Results |
| --- | --- | --- | --- |
| Telephone survey | Reid MC (1998)2 | Survey of 300 physicians: frequency of using quantitative diagnostic methods (sensitivity, specificity, ROC, LR, Bayesian logic) | Eight (3%) used Bayesian, three (1%) used ROC, two (0.66%) used LR. Non-familiarity with LR and ROC (97%), Bayesian 76% |
| Controlled questionnaire | Steurer J (2002)3 | Swiss GPs (n = 263) were surveyed on definition of terms sensitivity, PPV determined and also calculated PPV. Test accuracy in clinical vignette, when tests were presented as test only, test + (sens. and spec.), test + (description of LR in plain language) | Correct definition of sensitivity 76%, PPV 61%. PPV was calculated accurately only by 22% of GPs. PPV best estimated when results of LR of test presented in plain language |
| Questionnaire and survey | Young JM (2002)4 | Australian GPs (n = 50) were surveyed to describe the terms PPV, sensitivity, and specificity, followed by a direct interview by study author | 13/50 said they knew about PPV but only one met the criteria for identifying it correctly |
| Clinical cases | Hoffrage U, Gigerenzer G (1998)5 | German physicians (n = 48) were asked to calculate the PPV of four diagnostic tests. Data were presented as probabilities or as natural frequencies | Overall correct answers: Bayesian format 10%; natural frequency format 46% |
| Generic case scenarios | Lyman GH (1994)6 | Physicians (n = 31) and health care workers (n = 19) were presented with cases where the sensitivity, specificity and pre-test probability were varied, and asked to calculate PPV | Overestimating PPV in scenarios presented with lower pre-test probability. Non-physicians' estimates of PPV in cases with negative tests were inconsistent |
| Generic cases | Lyman GH (1993)7 | Physicians (n = 31) and health care workers (n = 19) were presented with two hypothetical cases of a 30-year-old and a 70-year-old woman with a breast lump. Estimate pre-test, post-test, sensitivity, specificity | Physicians and non-physicians both overestimate the PPV |
| Case scenarios | Noguchi Y (2002)8 | Japanese medical students (n = 234). Three case scenarios with low, intermediate, high probability for CAD. Estimates of pre-, post-test characteristics of stress tests were elicited from students (intuitive estimates), and from literature (reference estimates) | Medical students could not rule out disease in low and intermediate probability situations, because of error in estimating the pre-test diagnosis and applying Bayesian estimates in clinical practice. May result in ordering unnecessary testing |
| Generic cases | Eddy DM (1982)9 | Physicians (n = 100) asked to calculate PPV given positive mammogram | 95/100 estimated an incorrect probability of 75%, which was 10 times the correct frequency |

Despite their heterogeneity, the studies were carried out on four continents and yielded similar results, which suggests that the findings are generalizable. Physician innumeracy remains an impediment to popularizing EBM. Inattention to pre-test probability, and inability to assess the PPV accurately, could generate unnecessary tests and consultations and thereby increase patient anxiety. Increased attention to EBM instruction and presentation of data in alternative formats (e.g. natural frequencies) may be indicated. The limitations of our analysis include the small number of studies, their sometimes small numbers of subjects, and the variation in study design.
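The influence of pre-test probability on the interpretation of a positive result can be sketched with the odds form of Bayes' theorem, in which the post-test odds equal the pre-test odds multiplied by the positive likelihood ratio. The Python example below is illustrative only: the function name and the test characteristics (sensitivity and specificity both 0.90) are assumptions chosen for the arithmetic, not values taken from the reviewed studies.

```python
def post_test_probability(pre_test_probability, sensitivity, specificity):
    """Probability of disease after a positive test, via the odds form of Bayes' theorem."""
    lr_positive = sensitivity / (1.0 - specificity)            # positive likelihood ratio
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * lr_positive
    return post_test_odds / (1.0 + post_test_odds)             # convert odds back to probability


# Hypothetical test with sensitivity 0.90 and specificity 0.90 (LR+ of about 9)
for pre_test in (0.01, 0.10, 0.50):
    post = post_test_probability(pre_test, 0.90, 0.90)
    print(f"pre-test probability {pre_test:.2f} -> post-test probability {post:.2f}")
```

With this assumed test, the same positive result corresponds to a post-test probability of roughly 8%, 50% or 90% depending on whether the pre-test probability was 1%, 10% or 50%, which is why neglecting the pre-test probability leads to overestimation of the PPV in low-risk patients.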

References

1. Cahan A, Gilon D, Manor O, Paltiel O. Probabilistic reasoning and clinical decision-making: do doctors overestimate diagnostic probabilities? Q J Med 2003;96:763–9.
2. Reid MC, Lane DA, Feinstein AR. Academic calculations versus clinical judgements: practicing physicians' use of quantitative measures of test accuracy. Am J Med 1998;104:374–80.
3. Steurer J, Fischer JE, Bachmann LM, et al. Communicating accuracy of tests to general practitioners: a controlled study. Br Med J 2002;324:824.
4. Young JM, Glasziou P, Ward JE. General practitioners’ self rating of skills in evidence based medicine: a validation study. Br Med J 2002;324:950.
5. Hoffrage U, Gigerenzer G. Using natural frequencies to improve diagnostic inferences. Acad Med 1998;73:538–40.
6. Lyman GH, Balducci L. The effect of changing disease risk on clinical reasoning. J Gen Intern Med 1994;9:488–94.
7. Lyman GH, Balducci L. Overestimation of test effects in clinical judgement. J Cancer Educ 1993;8:297–307.
8. Noguchi Y, Matsui K, Imura H, et al. Quantitative evaluation of the diagnostic thinking process in medical students. J Gen Intern Med 2002;17:839.
9. Eddy DM. Probabilistic reasoning in clinical medicine: problems and opportunities. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. Cambridge UK, Cambridge University Press, 1982:249–67.