The “gold standard” employed for obtaining blood pressure (BP) for all the National Health and Nutrition Examination Surveys (NHANES) has been the mercury sphygmomanometer (HgS). Because of environmental concerns, there is a need to explore an alternative to HgS.
We compared the accuracy of the Welch Allyn 767 wall aneroid sphygmomanometer (AnS) to the HgS in children and adults and by BP cuff sizes. Each participant had three BP measurements per device recorded sequentially. The order of the devices and the assignment of observers were randomized. A total of 727 NHANES participants took part in the study.
The mean AnS readings were not statistically significantly different from those of the HgS, with the exception of systolic BP (SBP) in participants aged 8–17 years (mean difference 1.10mmHg, s.d. 4.87). There were no statistically significant differences by BP cuff size. Agreement for the prevalence of hypertension (BP ≥140mmHg systolic or ≥90mmHg diastolic) was above chance (κ = 0.81; sensitivity = 81%; specificity = 98%), with AnS readings underestimating prevalence by 1.66% (18.33 vs. 20%, P > 0.05) compared to the HgS readings.
American Journal of Hypertension advance online publication 16 December 2010; doi:10.1038/ajh.2010.232
For many years, the “gold standard” employed for all the National Health and Nutrition Examination Surveys (NHANES) has been the mercury sphygmomanometer (HgS). However, because of increased environmental concerns about the disposal of mercury-contaminated medical waste and the risk of spills from HgSs, the HgS is being phased out in clinical settings. In 1999, the Environmental Protection Agency and American Hospital Association agreed to eliminate mercury-containing waste from the health-care industry by 2005,1 an ideal not yet achieved. As an example of best practices, that report cited a federal hospital's complete elimination of mercury-containing instruments, including sphygmomanometers, and cited multiple sources of discussion of the recommendations for and against. The most recent American Heart Association recommendations for blood pressure (BP) measurement also note that mercury use is being banned in many countries.2 Therefore, alternatives to the HgS are increasingly being used, and national standards need to continue with technology that can be used by the majority of health-care providers and for home BP measurement.
The aneroid sphygmomanometer (AnS), a mercury-free device employing the same auscultatory principles as those used for the HgS, has been used as a substitute for the HgS. Although the AnS is ubiquitous, it is surprising to find that few studies have been published comparing aneroid to mercury manometers,3,4 and there are none in the public service setting. The exception is a recently published study by Ma et al. comparing an aneroid (the Welch Allyn Tyco 767) to a HgS.5 The study involved 997 subjects but had a number of limitations. Specifically, it took place in 24 clinics, included adults only, and had the same technician read both the AnS and the HgS, raising the issue of observer bias.
Our study comparing a HgS and an AnS, in the setting of the NHANES, had three objectives. The first was to compare the measured values of BP, systolic and diastolic, in both youth and adults. Because recent studies showed that mid-arm circumference has increased significantly in recent years,6,7 which has implications for the accuracy of BP measurements, we also compared the measured values of BP among the different BP cuff sizes. The second was to compare hypertension classifications (JNC 7) between the two sphygmomanometer types.8 The third was to examine the influence of higher levels of BP (systolic and diastolic) on the difference between the readings obtained with the two sphygmomanometers, controlling for age, gender, body mass index, BP cuff size, and examiner effects.
Study population. The study participants were from the NHANES. Briefly, NHANES is a national survey of the civilian noninstitutionalized US population. In 1999, NHANES became a continuous survey. Annual samples are selected using a complex, stratified, multistage probability sampling method and the data are released in 2-year cycles. Descriptions of the sample design and data collection methods are available at the NHANES website.9 Survey participants were interviewed in their homes and examined in mobile examination centers. After completing the mobile examination center portion of the survey, participants were asked to volunteer for the validation study, which was described as an additional optional study; no randomization was attempted and the sample was a “convenience” sample. The NCHS Ethics Review Board approved the study.
The study was designed in two parts, the first of which dealt with the accuracy of the Omron HEM-907XL,10 and the second with the accuracy of the aneroid manometer against the mercury manometer. This analysis addresses the aneroid-mercury arm of the study. During survey years 2006 and 2007, a total of 727 individuals aged 8 years and older agreed to participate in the validation study. Among the participants, 390 (54%) were male and 337 (46%) were female. For this analysis, age was categorized into three groups: 8–17, 18–49, and ≥50 years. Among the age groups, 127 (17%) individuals were 8–17 years old; 236 (32%) were 18–49 years old; and 364 (50%) were ≥50 years. The mean age of the participants was 45.9 years (s.d. 23.3; range 8–92 years).
Measurements. An auscultatory AnS (wall-mounted Welch Allyn model 767) was used in this study. The aneroid gauge consists of a metal bellows and a watch-like movement connected to the compression cuff. Variations of pressure within the system cause the bellows to expand and contract. Movement of the bellows rotates a gear that turns a pointer pivoted on bearings, across a calibrated dial. The pressure is read from the dial.11
A wall-mounted HgS (Baumanometer) was used as the standard comparison device. Four cuff sizes (child, adult, large adult, and thigh) also manufactured by Baumanometer were shared by both devices. Specifically, among the participants: 3.3% used a child cuff; 31.5% used an adult cuff; 52.1% used a large adult cuff; and 13.1% used a thigh cuff.
Measurement of BP. A standardized protocol was used to train each technician. The initial training was followed by a biannual retraining. The training protocol for the mercury device was similar to that used for training the NHANES physicians who measure survey BP12 and greatly reduces digit preference.
The measurements were taken in a quiet room with an ambient temperature between 58 and 83°F (mean 76°F). Participants were seated in a chair with back support, with both feet resting comfortably on the floor and the right arm, on which the BP readings were taken, supported on a level surface with the cubital fossa at heart level. After a 5-min rest, the study participants had their systolic and diastolic BP (DBP) (onset of K1 and K5) measured, with each determination 30 s apart. The order in which the sphygmomanometers were used was assigned at random. Additionally, observers were randomly assigned to AnS or HgS. If the AnS measurements were made first, a different observer made the HgS measurements, and vice versa, to avoid observer bias. Moreover, the observers were blinded to the other device's values.
Following the Mayo Clinic (Rochester, Minnesota) protocol, the Netech DigiMano digital pressure and vacuum meter (part no. 200-2000IN) was used for accurate calibration of the aneroid.11 The HgS device was calibrated using the standard NHANES calibration protocol.12,13 Both instruments were calibrated at the beginning of the study and on a weekly basis throughout the duration of the study, which totaled 1.5 years. In addition, daily visual checks were performed on each device.
A total of eight trained technicians were involved in the BP comparative study. The observers' zero end-digit preference was 22% for the HgS observed systolic; 25% for the HgS observed DBP; 24% for the AnS observed systolic; and 26% for the AnS observed DBP.
Other measurements. Individual age was obtained at interview time. Height in cm and weight in kg were measured in the mobile examination center following the standard NHANES anthropometric protocol. Body mass index was calculated as weight in kg divided by the square of height in m (kg/m2).
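The zero end-digit percentages above can be tallied as in the minimal sketch below. The readings are hypothetical, and the function name `end_digit_shares` is an illustrative choice, not something taken from the study. Under the assumption that readings are recorded to the nearest even number, five end digits are possible and roughly 20% of readings would be expected to end in zero in the absence of digit preference.

```python
from collections import Counter


def end_digit_shares(readings):
    """Return the share of readings ending in each digit 0-9."""
    counts = Counter(int(r) % 10 for r in readings)
    total = len(readings)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}


# Hypothetical systolic readings (mmHg); not data from the study.
sbp_readings = [120, 118, 122, 130, 126, 110, 114, 128, 132, 140]

shares = end_digit_shares(sbp_readings)
zero_share = shares[0]
print(f"Share of readings ending in zero: {zero_share:.0%}")
```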
Mid-arm circumference is determined by having the examinee stand erect with feet together and the right arm flexed 90° at the elbow with the palm facing up. On the right scapula, the observer locates and marks with a horizontal line the uppermost edge of the posterior border of the acromion process. The observer holds the zero end of the measuring tape at this mark and extends the tape down the posterior surface of the arm to the tip of the olecranon process. The observer makes a horizontal mark at the midpoint at the posterior aspect of the arm and measures the arm circumference.12
Statistical analyses. We estimated that a sample size of 127 participants, at an α = 0.05 and given a s.d. of 17.7mmHg, would provide 85% power to detect a ≥3mmHg mean difference between the 2 sphygmomanometers.
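The text does not state which formula was used for this estimate. As a rough illustration only, the sketch below applies the standard normal-approximation sample size formula for a two-sided paired comparison. It assumes that the 17.7mmHg s.d. describes a single device's readings and that the s.d. of the paired differences follows from a between-device correlation `r`; that correlation is a hypothetical input here, not a value given for the power calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_sample_size(delta, sd, r, alpha=0.05, power=0.85):
    """Approximate n for a two-sided paired test (normal approximation).

    delta: smallest mean difference to detect (mmHg)
    sd:    s.d. of readings from one device (mmHg)
    r:     assumed correlation between paired readings (hypothetical)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # s.d. of the paired differences under equal variances
    sd_diff = sd * sqrt(2 * (1 - r))
    return ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


# The required n depends strongly on the assumed correlation.
for r in (0.5, 0.8, 0.9):
    print(r, paired_sample_size(delta=3.0, sd=17.7, r=r))
```

The loop is there to show the design point: the more strongly the paired readings are correlated, the smaller the s.d. of the differences and the fewer participants are needed.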
Each subject contributed three paired observations for each device. The data from these three observations were averaged. Paired t-test was used to test the difference between the two device-readings for each participant, overall and for three age groups: 8–17, 18–49, and 50+ years. Also, a graphic display of averaged device reading differences against corresponding averaged systolic and diastolic paired measurements separately (Bland and Altman graph) is presented to visually assess the relationship between the two devices.14
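The averaging, paired comparison, and Bland and Altman summary described above can be sketched as follows. The readings are hypothetical and the function names are illustrative. The sketch reports only the t statistic and its degrees of freedom, since a P value would need a t distribution that the Python standard library does not provide, and it uses the conventional 1.96 s.d. multiplier for the limits of agreement as an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def average_triplicates(readings):
    """Average the three readings recorded for each participant."""
    return [mean(triplet) for triplet in readings]


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for the paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def bland_altman(a, b, k=1.96):
    """Return pair means, differences, bias, and limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    pair_means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = k * stdev(diffs)
    return pair_means, diffs, bias, (bias - spread, bias + spread)


# Hypothetical systolic readings (mmHg), three per device per participant.
ans_raw = [(118, 120, 116), (132, 130, 134), (104, 106, 102),
           (146, 144, 148), (122, 124, 120)]
hgs_raw = [(116, 118, 114), (132, 132, 132), (102, 104, 100),
           (148, 146, 150), (120, 120, 120)]

ans = average_triplicates(ans_raw)
hgs = average_triplicates(hgs_raw)

t_stat, df = paired_t_statistic(ans, hgs)
pair_means, diffs, bias, limits = bland_altman(ans, hgs)
print(f"t = {t_stat:.2f} on {df} df; bias = {bias:.2f}, limits = {limits}")
```

Plotting `diffs` against `pair_means` with horizontal lines at `bias` and at the two limits gives the usual Bland and Altman graph.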
We calculated sensitivity and specificity as well as the κ statistic to assess the agreement between the devices, with the HgS considered the gold standard, for systolic defined hypertension (BP ≥140mmHg), diastolic defined hypertension (BP ≥90mmHg), and ≥stage one hypertension (≥140 systolic or ≥90 diastolic mmHg).8 In addition to the κ, sensitivity, and specificity statistics, we used McNemar's test to compare the proportions of individuals classified as hypertensive by the two devices. McNemar's test assesses the significance of the difference between two correlated proportions where the two proportions are based on the same sample of subjects.15 The reference hypertension classifications were based on the HgS readings obtained during the validation study and were estimated for adults only (individuals 18 years and older).
The correlation between the readings of the two devices, for systolic and diastolic BP separately, was assessed using the Pearson correlation coefficient. In addition, the intraclass correlation coefficient (2,1) was calculated. Conceptually, intraclass correlation coefficient (2,1) assesses raters' agreement; in the present study the “raters” were the two devices.16
Multiple linear regression analyses were conducted to model the association of HgS-measured systolic and diastolic BP readings with the magnitude of the difference in readings between the two devices for systolic and DBP, controlling for age, gender, body mass index, observers (dummy coded), and cuff size: child, adult, large adult, and thigh cuff (dummy coded).
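These agreement statistics all derive from one 2 × 2 table of AnS classification against HgS classification. The sketch below is illustrative only: the cell counts are hypothetical rather than the study's data, the function name is invented, and it uses the McNemar chi-square without continuity correction, since the text does not say which variant was applied.

```python
from math import erfc, sqrt


def agreement_stats(tp, fp, fn, tn):
    """Agreement of a test device with a reference device.

    tp: hypertensive on both devices
    fp: hypertensive on the test device only
    fn: hypertensive on the reference device only
    tn: normotensive on both devices
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    # McNemar's test uses only the discordant pairs (no continuity correction)
    discordant = fp + fn
    chi2 = (fn - fp) ** 2 / discordant if discordant else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))

    return {"sensitivity": sensitivity, "specificity": specificity,
            "kappa": kappa, "mcnemar_chi2": chi2, "mcnemar_p": p_value}


# Hypothetical counts for 600 adults; not taken from the study's Table 2.
result = agreement_stats(tp=90, fp=10, fn=20, tn=480)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity treat one device as the reference, whereas κ and McNemar's test treat the two devices symmetrically, which is why reporting both kinds of statistic is informative.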
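For readers unfamiliar with the two-way random-effects, single-measure, absolute-agreement form of the intraclass correlation coefficient, the sketch below computes it from the usual analysis-of-variance mean squares. The paired readings are hypothetical and the function name is illustrative; the study's own software and settings are not described in the text.

```python
from statistics import mean


def icc_2_1(ratings):
    """Two-way random-effects, single-measure, absolute-agreement ICC.

    ratings: one row per subject, one column per rater (here, per device).
    """
    n = len(ratings)
    k = len(ratings[0])
    values = [x for row in ratings for x in row]
    grand = mean(values)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical averaged systolic readings (mmHg): (HgS, AnS) per participant.
pairs = [(116, 118), (132, 132), (102, 104), (148, 146), (120, 122)]
print(f"ICC(2,1) = {icc_2_1(pairs):.3f}")
```

Unlike the Pearson coefficient, this form penalizes a systematic offset between the devices through the column mean square, so it measures agreement rather than mere linear association.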
Finally, the effect of the order of reading (HgS/AnS or AnS/HgS) was tested using a paired t-test by age group categories. The α-level for a significant test was considered to be P < 0.05.
Between sphygmomanometer agreement
Table 1 presents the mean and s.d. for the HgS and AnS readings for systolic and DBP. With the exception of mean systolic BP (SBP) readings among individuals aged 8–17 years (SBP: HgS = 104.77 ± 9.64mmHg, AnS = 105.87 ± 9.79mmHg; mean difference = 1.1 ± 4.87mmHg, P = 0.011) there were no statistically significant differences between HgS and AnS readings overall or for each age group or cuff size category. Absolute mean differences for systolic and DBP readings were calculated. Fifty-two percent of SBP reading differences were ≤3mmHg and 69% were ≤5mmHg. Also, for DBP, 49% of diastolic reading differences were ≤3mmHg and 69% were ≤5mmHg.
Figures 1 and 2 display the Bland–Altman plots; the plots show the mean differences in systolic and DBP device readings against the corresponding averaged BP readings of both devices. Both figures show some extreme values beyond two s.d., but no discernible linear relationship can be ascertained between the y- and the x-axes (systolic correlation r = −0.06, P > 0.05; diastolic correlation r = 0.03, P > 0.05).
Figure 1. Bland–Altman14 graph, aneroid vs. mercury, systolic. BP, blood pressure.
Figure 2. Bland–Altman14 graph, aneroid vs. mercury, diastolic. BP, blood pressure.
Table 2 presents the agreement in the classification of an elevated SBP (≥140), DBP (≥90), and hypertension (BP ≥140 systolic or ≥90 diastolic mmHg) by the two devices. All agreements were above those to be expected by chance (κ = 0.82, 0.75, and 0.81), with agreement lowest for DBP. Consistently more individuals were correctly identified as nonhypertensive in all hypertensive subcategories (a notably high specificity) than were correctly identified as hypertensive (relatively lower sensitivity). For all hypertension subcategories the HgS device identified a slightly higher proportion of hypertensives than did the AnS (device difference for SBP = 0.67%, for DBP = 0.84%, and for hypertension = 1.66%). The differences in the proportion of individuals classified as hypertensive by the two devices were not statistically significant in any hypertension subcategory (McNemar's test P > 0.05).
The BP values obtained using the two devices were highly correlated (r = 0.94 for SBP, P < 0.01 and r = 0.83 for DBP, P < 0.01). The intraclass correlation coefficient was 0.936 (95% confidence interval, 0.927–0.944) for SBP and 0.832 (95% confidence interval, 0.811–0.854) for DBP, showing high between-device agreement and, in essence, suggesting that the devices are interchangeable.
Multiple linear regression analyses of the between-device difference in systolic values adjusted for age, gender, body mass index, cuff size, observer effect, and SBP (measured by HgS) and a similar model for between-device difference in DBP (measured by HgS) were calculated. The results show that HgS-measured systolic and diastolic values were significantly associated with the difference in device readings for SBP and DBP (β-coefficient = −0.08890, P < 0.01; β-coefficient = −0.18298, P < 0.01, respectively). Because both coefficients were negative, the results suggest that a 1mmHg increase in SBP corresponded to a 0.09mmHg decrease in the difference between the systolic device readings. Similarly, a 1mmHg increase in DBP corresponded to a 0.18mmHg decrease in the difference between the diastolic device readings. Neither observers nor cuff sizes were significantly associated with the discrepancy between the two devices for either systolic or diastolic readings (data not shown).
Finally, we analyzed the effect of the order of randomization on device difference by age groups (8–17, 18–49, and 50+). There was a statistically significant difference (P < 0.05) between the device readings for SBP in individuals aged 8–17 when the AnS was used first (i.e., AnS/HgS) (systolic: AnS mean = 104.81, s.d. 9.13; HgS mean = 102.86, s.d. 9.82mmHg, P = 0.003). No other randomization condition showed a statistically significant difference.
The overall results of the study demonstrate that BP readings obtained by an AnS, a mercury-free device, were not significantly different from readings obtained by an HgS. There was a statistically significant difference between device readings in SBP only for youth (ages 8–17); however, the difference (1.1mmHg) is not clinically significant. The few previous studies did not include youth in their samples, so we have no data to compare with the results in our study.
BP cuff size did not significantly affect the difference in device readings. Regrettably, we did not have enough statistical power to generalize about the effect of the small (child) cuff size (N = 24). As for the X-large (thigh) cuff size category (N = 95), a paired sample of 85 individuals would provide 90% power, at α = 0.05 and a s.d. of 17.7mmHg, to detect a mean difference of ≥4mmHg between the two sphygmomanometers; hence we do have enough power to generalize, provided we accept a slightly larger detectable mean difference (4 rather than 3mmHg).
The HgS device identified a slightly higher proportion of hypertensives than did the AnS, but these differences were not statistically significant. Crudely applying these findings to the national prevalence estimates of hypertension for years 2005–2006 would have resulted in a nonsignificant change in the hypertension prevalence estimate. Stated differently, if an AnS device had been used in the NHANES for survey years 2005–2006, the prevalence of hypertension would have been 1.66 percentage points lower than the reported rate of 29% (about 27.3%) but still within the 95% confidence interval of the reported estimate (29%; 95% confidence interval 27–30.9%).
This lower estimate of hypertension prevalence by the AnS, though modest and a result of the 81% sensitivity of the AnS, could present a potential problem in comparing national prevalence estimates in the future if the HgS is replaced with a similar AnS. However, it is a problem that will have to be faced, given the mandatory need to replace the HgS in the future, whatever replacement sphygmomanometers are chosen. It is also a problem faced by all disease-tracking technologies, whether from new laboratory measures of disease markers or from changes in nomenclature, as with ICD coding changes that have resulted in mortality “steps” up or down for specific disease classifications. That said, the AnS “step” of 1.66% is a modest one and, given the very high specificity, can easily be incorporated into trackable hypertension estimates in the future.
The multivariate adjusted regression model findings suggested that differences between the two devices were larger at higher BP values. However, although statistically significant, the multivariate regression models for both systolic and DBP explained very little of the variability of the difference between-device readings (adjusted R2 = 0.06 and 0.09, respectively). Similar results were reported very recently by Ma et al.5
The aneroid manometer has moving parts that are subject to fatigue and breakage. The metal diaphragms and the coiled spring are the most vulnerable to such problems, but when properly maintained, aneroid manometers can provide accurate BP readings.11 We followed the Mayo Clinic (Rochester, Minnesota) protocol for regular calibration of the aneroid. Indeed, during the 2 years of the study we had to replace only one aneroid. In that context it should also be remembered that HgSs readily become imprecise if not maintained regularly. In two different hospital surveys it was noted that as many as 50% of HgSs tested were defective.17,18
The study had a number of limitations. We were unable to validate the aneroid manometer using either the Association for the Advancement of Medical Instrumentation or British Hypertension Society standard protocols.19,20 To approximate these protocols we would have required that two trained observers be used to obtain BP measurements for each device, for a total of four observers, which although desirable was not feasible in our survey environment.
Furthermore, unlike in traditional validation studies, the BP observers were not continuously monitored for observer agreement within 4 or 5mmHg, with any measurement differing by more than that amount repeated. No exclusion criteria were applied for individuals having arrhythmias or receiving pharmacological therapy for hypertension.
The AnS was easy to use and differences in readings compared to the HgS were not statistically significant with the exception of a small (1.1mmHg) over-read of SBP in youth ages 8–17 years. However, some caution needs to be considered given the 81% sensitivity with regard to hypertension classification. This is the first field study to include individuals in a varied age range in which the observers were blinded to the mercury results and in which both subjects and observers were randomly assigned to mercury or aneroid device for the first readings. Moreover, all observers were rigorously trained and closely followed for quality assurance and control. The field setting of the study provided a true measure of the device performance in a survey environment and also shows that an accurate, well-calibrated AnS could replace a HgS in the quest to remove mercury from the environment.
We would like to express our gratitude to all who were involved and who made it possible to complete this validation study. We would like to thank Carlene Grim from Shared Care, Inc.; Dr Grace Willard, Doe Knight, Gunda Kube, Ruth Pressley, Olga Guererro, Jamie Saltsman, and Belma Ybarra, all from Westat, Inc.; and Jeffery Hughes from the NHANES program, NCHS, CDC.
The authors declared no conflict of interest.