Umberto Di Dedda, Gabriele Pelissero, Beatrice Agnelli, Carlo De Vincentiis, Serenella Castelvecchio, Marco Ranucci, Accuracy, calibration and clinical performance of the new EuroSCORE II risk stratification system, European Journal of Cardio-Thoracic Surgery, Volume 43, Issue 1, January 2013, Pages 27–32, https://doi.org/10.1093/ejcts/ezs196
Abstract
The European System for Cardiac Operative Risk Evaluation (EuroSCORE) has been used for many years since its introduction in 1999. Recently, a new EuroSCORE (EuroSCORE II) has been developed to update the previous version. The EuroSCORE II includes some different predictors and/or introduces a new classification of the already existing predictors. This study presents a validation series for the EuroSCORE II compared with the previous additive and the logistic EuroSCORE and with the Age, Creatinine and Ejection Fraction (ACEF) score.
A total of 1090 consecutive adult patients operated on at our institution from September 2010 to October 2011 were included in this retrospective study. All the patients received a risk stratification based on the EuroSCORE II and the other scores considered. Accuracy, calibration and clinical performance of the various risk models were assessed.
The accuracy of the EuroSCORE II was good (c-statistic 0.81) but not significantly higher than that of the other scores (range 0.78–0.80). Calibration according to the Hosmer–Lemeshow statistic was good for all the scores; the difference between observed (3.75%) and predicted mortality in the overall population was not significant for the EuroSCORE II (3.1%) and the ACEF score (3.4%), whereas the additive EuroSCORE (5.8%) and the logistic EuroSCORE (7.3%) significantly overestimated the risk. In patients at low, mild, moderate and high mortality risk, the EuroSCORE II provided a risk prediction not significantly different from the observed mortality rate, whereas in very high-risk patients (observed mortality rate 11%), it significantly underestimated the mortality risk (predicted 6.5%). The accuracy of the EuroSCORE II was acceptable in isolated coronary surgery, and good or excellent in the other operations.
The EuroSCORE II represents a useful update of the previous EuroSCORE version, with a much better clinical performance and the same good level of accuracy. It is possible that for the risk stratification of very high-risk patients, other factors (rare but associated with a mortality rate >50%) should be included in the future models.
INTRODUCTION
The European System for Cardiac Operative Risk Evaluation (EuroSCORE), developed in 1999, is a predictive model for 30-day (operative) mortality risk after cardiac operations in adults [1, 2].
After extensive validation tests, the use of this score gained widespread popularity in many European countries [3–10].
Recently, the EuroSCORE has been considered one of the possible tools to assess the operative mortality risk in coronary operations [together with the STS-PROM score and the Age, Creatinine and Ejection Fraction (ACEF) score] in the joint guidelines of the European Association for Cardio-Thoracic Surgery (EACTS) and the European Society of Cardiology (ESC) [11].
Initially, the EuroSCORE was based on an additive system (the additive EuroSCORE) [1]. Subsequently, the logistic EuroSCORE was proposed as a more sound approach to risk prediction [6]. Many studies addressed the accuracy, calibration and clinical performance of the EuroSCORE [12–15].
As a general opinion, the EuroSCORE, both in its logistic and additive versions, has a good level of accuracy, with a c-statistic of around 0.75–0.80 [15]. Conversely, the calibration of the model is more doubtful, and the predicted mortality rate was significantly higher than the observed one in a number of recent validation studies. This applies especially to high-risk patients [13–15].
Different attempts were made to mathematically adjust the EuroSCORE [15, 16], but in October 2011 a new version of the EuroSCORE, named the EuroSCORE II, was presented at the EACTS Lisbon meeting. The EuroSCORE II is the result of a totally new development series of patients, and carries a number of differences with respect to the previous version. Some new factors were included, others were excluded and yet others were more detailed in their definition. Based on these new factors or definitions, the EuroSCORE II was developed with a standard logistic regression approach, and this of course led to a different calibration of the model deriving from totally different regression coefficients for each of the individual risk factors.
So far, no study has addressed the accuracy and calibration of the EuroSCORE II. The present study is a validation study of the EuroSCORE II with specific respect to its accuracy and calibration compared with the original additive and logistic EuroSCORE and with the ACEF score [17, 18].
METHODS
EuroSCORE II
The EuroSCORE II differs from the previous version in that it includes new factors and definitions. Among the new patient-related risk factors, diabetes on insulin therapy was added. Eight factors (age, gender, chronic pulmonary disease, extracardiac arteriopathy, neurological dysfunction, previous cardiac surgery, active endocarditis and critical preoperative state) remained in the model with different definitions for some of them. In particular, extracardiac arteriopathy now includes previous amputation, and neurological dysfunction is replaced by ‘poor mobility’, defined as mobility impairment due to musculoskeletal or neurological dysfunction. Serum creatinine value was replaced by creatinine clearance, and the left ventricular ejection fraction (LVEF) was stratified in four categories.
There are now five cardiac-related factors, as the patient's New York Heart Association (NYHA) classification was introduced. Pulmonary hypertension is now stratified into two categories, according to the systolic pulmonary arterial pressure (sPAP) (moderate if sPAP = 31–55 mmHg and severe if sPAP >55 mmHg).
Three operation-related factors are now considered: the regimen of intervention is divided into elective, urgent, emergency and salvage. The operation-related risk is now more detailed based on the type of operation performed. Surgery on the thoracic aorta is still present as single risk factor apart from other procedures. Ventricular septal rupture is not included anymore.
Patients
This is a retrospective study on a series of 1090 consecutive patients operated on at our institution in the period from September 2010 to October 2011. The local ethics committee approved the experimental design and waived the need for written informed consent from the patients, who nevertheless gave written consent to the use of their data for scientific purposes in an anonymous form. Exclusion criteria were: age <18 or >95 years (according to the EuroSCORE notes, www.euroscore.org) and the presence of congenital heart disease.
The original dataset comprised 1513 patients. Three hundred and forty-nine patients were excluded for being outside the age range, and 74 were excluded because they were affected by congenital heart disease. The remaining cohort of 1090 patients represented the study patient population.
Data collection
All data were retrieved from our institutional database. Starting from a prepared computerized form containing each patient's data, the EuroSCORE II was calculated for every patient enrolled in the series with the EuroSCORE II interactive calculator (available at www.euroscore.org).
The mortality risk according to the EuroSCORE II was not calculated from the logistic regression coefficients, which are not yet publicly available, but was obtained directly for each patient by using the on-line calculator at www.euroscore.org, accessed from 20 November 2011 to 15 December 2011. For each patient, the mortality risk had already been assessed according to the other two risk models routinely used at our institution (the EuroSCORE [additive and logistic] and the ACEF score).
Data routinely collected in our institutional database include all the risk factors considered in the EuroSCORE plus some additional data, which allowed us to re-calculate the operative mortality risk according to the EuroSCORE II. When needed, additional information could be retrieved from the original patients’ files. Creatinine clearance was calculated from the serum creatinine value (mg/dl), age, weight and gender, according to the Cockcroft–Gault equation [19]. Poor mobility was assessed using our frailty scale (measurement done only in patients with evident functional impairment) and the presence of neurological deficits. The presence of amputation was already noted and was used to re-define peripheral arteriopathy. NYHA class was already present in the database, as were the exact LVEF and pulmonary artery pressure. The type of operation and the urgency level were already in the database.
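The creatinine clearance calculation mentioned above can be illustrated with a minimal Python sketch of the Cockcroft–Gault equation. The function name, argument names and the example patient are hypothetical and are not taken from the study database; the equation is used in its standard form, with the 0.85 correction factor for women.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimated creatinine clearance (ml/min), Cockcroft-Gault equation.

    Standard form: (140 - age) * weight / (72 * serum creatinine),
    multiplied by 0.85 for female patients.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


# Hypothetical patient (not from the study): 65 years, 80 kg, 1.0 mg/dl
male_clearance = cockcroft_gault(65, 80, 1.0, female=False)
female_clearance = cockcroft_gault(65, 80, 1.0, female=True)
print(round(male_clearance, 1), round(female_clearance, 1))
```

The resulting value would then be mapped to the creatinine clearance categories used by the EuroSCORE II calculator.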
The EuroSCORE II, the EuroSCORE and the ACEF score were assessed for accuracy, calibration and clinical performance with adequate statistical analyses.
The accuracy of each model was assessed with receiver operating characteristics (ROC) analyses considering the area under the curve (AUC) and its 95% confidence intervals (CI).
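For readers unfamiliar with the AUC (c-statistic), the sketch below computes it as the proportion of death/survivor pairs in which the patient who died had the higher predicted risk, with ties counted as one half. This is an illustrative pure-Python version and not the SPSS procedure used in the study; the function name and the example risks are hypothetical.

```python
def c_statistic(risk_deaths, risk_survivors):
    """AUC as the probability that a patient who died was assigned a
    higher predicted risk than a patient who survived (ties count 0.5)."""
    if not risk_deaths or not risk_survivors:
        raise ValueError("both outcome groups must be non-empty")
    concordant = 0.0
    for risk_dead in risk_deaths:
        for risk_alive in risk_survivors:
            if risk_dead > risk_alive:
                concordant += 1.0
            elif risk_dead == risk_alive:
                concordant += 0.5
    return concordant / (len(risk_deaths) * len(risk_survivors))


# Hypothetical predicted risks (fractions), not study data
example_auc = c_statistic([0.9, 0.4], [0.1, 0.4, 0.2])
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.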
The calibration was assessed with the Hosmer–Lemeshow test and, as a further measure of calibration and clinical performance, the predicted mortality rate was compared with the observed mortality rate in the whole patient population and in subgroups of patients according to the categorization of risk into quintiles of distribution. Bias and agreement between the EuroSCORE II and the logistic EuroSCORE were assessed using a Bland–Altman analysis [20]. The accuracy and clinical performance of the EuroSCORE II were tested in subgroups of patients according to the predicted risk and to the type of cardiac operation.
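As an illustration of the calibration test, the following sketch computes the Hosmer–Lemeshow statistic by sorting patients by predicted risk, splitting them into equal-sized groups and summing the squared differences between observed and expected deaths, each scaled by its binomial variance. It is a simplified version under stated assumptions (equal-sized groups, no handling of tied risks at group boundaries) and not the SPSS implementation used in the study; names and example data are hypothetical.

```python
def hosmer_lemeshow(predicted, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    predicted: predicted mortality risks as fractions (0-1)
    died:      outcomes, 1 if the patient died and 0 otherwise
    """
    if len(predicted) != len(died):
        raise ValueError("inputs must have the same length")
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(risk for risk, _ in chunk)
        observed = sum(outcome for _, outcome in chunk)
        mean_risk = expected / size
        variance = size * mean_risk * (1 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Hypothetical data, two groups for readability
example_statistic = hosmer_lemeshow([0.1, 0.1, 0.5, 0.5], [0, 0, 1, 0], groups=2)
print(round(example_statistic, 3))
```

The P-value is obtained by referring the statistic to a chi-square distribution (commonly with the number of groups minus two degrees of freedom); a non-significant result indicates no evidence of poor calibration.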
Differences between AUCs in the different scores, and differences between predicted and observed mortality rates were explored by comparing the values with 95% CI and considering the inclusion of one of the values within the 95% CI of the other as non-significant.
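The confidence interval rule described above can be sketched as follows: two estimates are treated as not significantly different when either value falls inside the 95% CI of the other. The function names are hypothetical; the figures are the observed mortality and the predicted mortality rates reported in the Results (Table 2). The sketch reproduces only this inclusion rule and not any formal test behind the P-values reported in that table.

```python
def within(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def significantly_different(value_a, ci_a, value_b, ci_b):
    """CI-inclusion rule: non-significant if either value lies
    within the 95% CI of the other."""
    return not (within(value_a, ci_b) or within(value_b, ci_a))


# Observed operative mortality, % (95% CI), as reported in the Results
observed_rate, observed_ci = 3.75, (2.55, 4.79)

# Predicted mortality, % (95% CI), as reported in Table 2
predicted = {
    "Additive EuroSCORE": (5.76, (5.56, 5.96)),
    "Logistic EuroSCORE": (7.33, (6.62, 8.04)),
    "ACEF score": (3.36, (2.88, 3.85)),
    "EuroSCORE II": (3.10, (2.75, 3.44)),
}

results = {
    model: significantly_different(rate, ci, observed_rate, observed_ci)
    for model, (rate, ci) in predicted.items()
}
for model, differs in results.items():
    print(model, "significant" if differs else "n.s.")
```

With these inputs, the rule flags the additive and logistic EuroSCORE as significantly different from the observed rate and the ACEF score and EuroSCORE II as not significantly different, in line with Table 2.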
All data were expressed as number and percentage with 95% CI, mean and standard deviation (SD) or median and interquartile range when appropriate. Statistical analysis was performed using a computerized package SPSS 13.0 (SPSS Inc., Chicago, IL, USA).
RESULTS
The patient population was analysed for preoperative risk profile (Table 1). According to the data reported in Table 1, risk stratification was performed using the additive and logistic EuroSCORE, the EuroSCORE II and the ACEF score. The operative mortality rate was 3.75% (95% CI 2.55–4.79%). This observed value was compared with the predicted value according to the four risk models utilized (Table 2). The predicted and observed values did not significantly differ for the EuroSCORE II and the ACEF score; conversely, both the additive and logistic EuroSCORE significantly overestimated the operative mortality risk in the overall population.
Risk factor | Value: number (%), mean ± SD or median (range) |
---|---|
Age (years) | 64.5 ± 13.5 |
Female | 345 (31.7) |
Creatinine clearance (ml/min) | |
Good | 425 (39) |
Moderately impaired | 454 (41.7) |
Severely impaired | 200 (18.3) |
On dialysis | 11 (1) |
Extracardiac arteriopathy | 37 (3.4) |
Poor mobility | 1 (0.1) |
Previous cardiac surgery | 79 (7.3) |
Chronic lung disease | 84 (7.7) |
Active endocarditis | 29 (2.7) |
Critical preoperative state | 65 (5.9) |
Diabetes on insulin | 194 (17.8) |
NYHA classification | 1 (1–3) |
CCS Class 4 angina | 33 (3.0) |
Left ventricular ejection fraction | |
Good | 664 (60.9) |
Intermediate | 359 (32.9) |
Poor | 58 (5.3) |
Very poor | 9 (0.8) |
Recent myocardial infarction | 106 (9.7) |
Pulmonary hypertension | |
Missing | 267 (24) |
Moderate | 6 (0.5) |
Severe | 1 (0.1) |
Urgency | |
Elective | 1048 (96.1) |
Urgent | 24 (2.2) |
Emergency | 18 (1.7) |
Salvage | 0 (0) |
Weight of the intervention | |
Isolated CABG | 372 (34.1) |
Single non-CABG | 386 (35.4) |
Two procedures | 268 (24.6) |
Three procedures | 64 (5.9) |
Surgery on thoracic aorta | 85 (7.8) |
CABG: coronary artery bypass graft; CCS: Canadian Cardiovascular Society; NYHA: New York Heart Association; SD: standard deviation.
Risk stratification model | Predicted mortality rate, % (95% CI) | Difference from observed mortality rate (%) | P-value |
---|---|---|---|
Additive EuroSCORE | 5.76 (5.56–5.96) | +2.01 | <0.001 |
Logistic EuroSCORE | 7.33 (6.62–8.04) | +3.58 | <0.001 |
ACEF score | 3.36 (2.88–3.85) | −0.39 | n.s. |
EuroSCORE II | 3.10 (2.75–3.44) | −0.65 | n.s. |
CI: confidence interval; n.s.: not significant; ACEF: the Age, Creatinine and Ejection Fraction (ACEF) score.
Accuracy and calibration of the different models are reported in Table 3. The accuracy of the models was similar, with AUCs ranging from 0.78 to 0.81. The Hosmer–Lemeshow statistic did not demonstrate a significant overall lack of calibration for any model. Figure 1 shows the ROC analyses for the four models considered.
Risk stratification model | AUC (95% CI) | Hosmer–Lemeshow test (P-value) |
---|---|---|
EuroSCORE II | 0.81 (0.74–0.88) | 0.22 |
ACEF score | 0.80 (0.73–0.88) | 0.68 |
Additive EuroSCORE | 0.78 (0.69–0.86) | 0.28 |
Logistic EuroSCORE | 0.79 (0.71–0.87) | 0.30 |
AUC: area under the curve; CI: confidence interval.

Figure 1: ROC analysis for the EuroSCORE II and the other scores considered.
The clinical performance of the EuroSCORE II was tested in groups of patients at different levels of predicted mortality risk (Fig. 2). The overall population was divided into quintiles according to the logistic EuroSCORE, and observed and predicted mortality were compared for the four models considered. In low-risk patients (1st quintile, predicted risk 0.5–1.9%), all the models performed well. In mild-risk patients (2nd quintile, predicted risk 2.0–3.1%), the EuroSCORE II, the logistic EuroSCORE and the ACEF score performed well, whereas the additive EuroSCORE significantly overestimated the mortality risk. In moderate-to-high-risk patients (3rd and 4th quintiles, predicted risk 3.2–10%), only the EuroSCORE II and the ACEF score performed well, whereas both the additive and logistic EuroSCORE significantly overestimated the mortality risk. Finally, in very high-risk patients (5th quintile, predicted risk >10%), the logistic EuroSCORE significantly overestimated the mortality risk and the EuroSCORE II significantly underestimated it, with the other scores performing well.

Figure 2: The observed vs predicted mortality rate for the EuroSCORE II and the other scores considered, according to the quintile distribution. *P < 0.01; °P < 0.05. AES: additive EuroSCORE; LES: logistic EuroSCORE; ES II: EuroSCORE II.
Accuracy and clinical performance were assessed according to the type of operation, limiting the analysis to isolated coronary operations, isolated aortic valve replacement, isolated mitral valve procedure and combined operations (Table 4). The accuracy of the EuroSCORE II was modest for isolated coronary surgery (AUC 0.70), good for isolated aortic valve replacement and combined operations (AUC 0.79 and 0.77, respectively) and excellent for isolated mitral valve procedures (AUC 0.89). The observed mortality rate almost perfectly reflected the predicted value for all the different procedures.
Accuracy and clinical performance of the EuroSCORE II in different types of cardiac surgery operations
Surgical operation | AUC (95% CI) | Predicted mortality, % (95% CI) | Observed mortality, % (95% CI) | P-value |
---|---|---|---|---|
Isolated CABG (n = 372) | 0.70 (0.45–0.95) | 1.9 (1.7–2.1) | 2.1 (0.6–3.6) | n.s. |
Isolated AVR (n = 206) | 0.79 (0.60–0.98) | 2.1 (1.7–2.5) | 1.9 (0–3.8) | n.s. |
Isolated mitral (n = 200) | 0.89 (0.82–0.97) | 3.8 (2.1–4.6) | 4.5 (1.6–7.3) | n.s. |
Combined operation (n = 332) | 0.77 (0.65–0.89) | 5.2 (4.6–5.9) | 5.4 (3–7.8) | n.s. |
AUC: area under the curve; AVR: aortic valve replacement; CABG: coronary artery bypass graft; CI: confidence interval; n.s.: not significant.
To assess the changes introduced by the EuroSCORE II vs the logistic EuroSCORE in the risk prediction, we performed a Bland–Altman analysis to verify the bias and the limits of agreement between the two measurements (Fig. 3). The EuroSCORE II produced a mean predicted risk that was 4.26% lower than that of the logistic EuroSCORE (bias), with limits of agreement of 14.5%. The graphical analysis of the Bland–Altman relationship clearly highlighted that the best agreement is found for low-to-moderate-risk patients (predicted mortality <10%), with a few outliers where the EuroSCORE II was more than 10% higher than the logistic EuroSCORE. Conversely, in patients at a high or very high risk, the agreement is lower, and there are a number of outliers where the EuroSCORE II produced a risk prediction 20–60% lower than the logistic EuroSCORE.

Figure 3: The Bland–Altman analysis of the EuroSCORE II vs the logistic EuroSCORE.
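The bias and limits of agreement of a Bland–Altman analysis can be computed as in the sketch below, which takes the paired predictions of two scores, computes the per-patient differences and reports their mean (bias) together with the bias ± 1.96 standard deviations. This is the conventional definition and is assumed here; the text does not state exactly how its limits of agreement were derived. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the differences (method_a - method_b)
    limits = bias -/+ 1.96 * sample standard deviation of the differences
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired series with at least two values")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical predicted risks (%) from two scores, not study data
bias, lower, upper = bland_altman([1.0, 2.0, 3.0], [2.0, 4.0, 3.0])
print(bias, round(lower, 2), round(upper, 2))
```

In a Bland–Altman plot, each difference is drawn against the mean of the two predictions for that patient, which is why outliers are described in terms of the mean predicted risk.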
COMMENT
The main results of our study are that (i) the EuroSCORE II has an accuracy similar to the old versions of the EuroSCORE and to the ACEF score; (ii) it has a much better calibration than the previous versions and (iii) there is the possibility that in very high-risk patients, the EuroSCORE II may actually underestimate the 30-day mortality risk.
The finding that the EuroSCORE II does not have a better accuracy than the previous models is not surprising. The AUC (c-statistic) values that we found in this study for all the tested models are in the range of 0.78–0.81, confirming a good accuracy of all the models, as already demonstrated in other studies, where the c-statistic for the logistic EuroSCORE ranged between 0.74 [8, 13] and 0.83 [9, 15, 21, 22]. It is likely that with the currently used predictive models based on logistic regression analysis, this is the best possible accuracy in predicting 30-day mortality. As a matter of fact, the logistic models discard some potentially important mortality risk factors that are rare (prevalence 0.1–0.5%) but very often lethal. Severe liver cirrhosis has a prevalence that is no higher than 0.5% in the cardiac surgical population [23], but the mortality rates in Child-Pugh class A, B and C are 5.2, 35.4 and 70%, respectively [24]. Nevertheless, liver cirrhosis is not included in the existing scores, nor are other rare risk factors (frailty, oxygen-dependent chronic obstructive pulmonary disease and others). It is likely that until different statistical approaches allow for inclusion of these neglected extreme risk factors, the predictive models will never exceed the current accuracy level.
The clinical performance of the EuroSCORE II is certainly far better than that of the previous versions. In particular, the well-known risk overestimation of both the additive and the logistic EuroSCORE has now been corrected. On average, the predicted mortality rate in the overall population according to the EuroSCORE II is about half that predicted by the additive EuroSCORE and less than half that predicted by the logistic EuroSCORE. The predicted mortality is not significantly different from the ACEF score prediction, confirming that both scores may be useful, accurate and now also well calibrated in the setting of risk stratification in cardiac surgery.
However, the subgroup analysis and the Bland–Altman comparison between the logistic EuroSCORE and the EuroSCORE II highlight the possibility that the new EuroSCORE II may actually underestimate the mortality risk in very high-risk patients. As a matter of fact, in our patient population, the mortality risk of the very high-risk patients (observed mortality 11%) was still overestimated by the logistic EuroSCORE (predicted mortality 20%), correctly estimated by the ACEF score (predicted mortality 7.5%) and slightly but significantly underestimated by the EuroSCORE II (predicted mortality 6.5%).
This observation is confirmed by the Bland–Altman analysis, which showed ∼50 patients with a mean predicted mortality risk >10% who were outliers in the negative direction (EuroSCORE II prediction lower than the logistic EuroSCORE prediction), including six patients in whom the difference exceeded 50%.
In conclusion, we think that the effort of the EuroSCORE inventors to update their risk model was worthwhile and produced a viable score with a good clinical performance, correcting the well-known limits of the previous versions. It is possible that, in the attempt to correct the well-recognized risk overestimation, the EuroSCORE II has become less well suited to identifying very high-risk patients. This last hypothesis should be verified in other external and possibly larger series.
The main limitation of the EuroSCORE II is the same (old) problem that was identified for almost all the existing risk models. First of all, it is questionable whether we should still stick to the definition of ‘operative mortality’ limited to a 30-day observation period. As a matter of fact, patients continue to die for at least 2–3 months after the operation [25]. We think that future models should extend the observation time for operation-related mortality, and may include other indicators of quality of life, including the presence of severe disabling conditions.
The EuroSCORE II as well as the other existing risk models is an important educational tool and a useful guide for discussing the operative risk with the patients. However, it cannot totally replace the clinical judgment that is based on the observation and evaluation of many other clinical signs and symptoms that are not included in the model.
The main limitation of our study is the relatively small sample size, which generated small groups for subgroup analysis. Therefore, we think that a larger validation study is warranted, especially for the analysis of the marginal populations represented by very low- and very high-risk patients. This particularly applies to high-risk patients undergoing aortic valve replacement, where new data for the correct allocation of patients to conventional surgery or transcatheter procedures may be obtained. Additional limitations are the considerable number of missing pulmonary hypertension data and the non-routine measurement of frailty in our patient series.
Further attempts to improve the predictive accuracy of the model deserve to be undertaken, by including rare predictors of operative mortality, in order to particularly address the category of very high-risk patients.
Funding
This study was funded by the local research fund of IRCCS Policlinico S. Donato.
Conflict of interest: none declared.