Eur J Cardiothorac Surg 2002;21:199-204
© 2002 Elsevier Science NL
Inaccuracy of four coronary surgery risk-adjusted models to predict mortality in individual patients
P. Pinna-Pintor a,*,
M. Bobbio a,
S. Colangelo a,
F. Veglia b,
M. Giammaria a,
D. Çuni a,
F. Maisano a,
O. Alfieri a
a Arturo Pinna Pintor Foundation, Via A. Vespucci 61, 10129 Turin, Italy
b I.S.I. (Institute for Scientific Exchange), Turin, Italy
Received 31 October 2000;
received in revised form 21 September 2001;
accepted 21 November 2001.
* Corresponding author. Tel.: +39-011-593911; fax: +39-011-5683893
e-mail: fappto{at}tin.it
Abstract
Objectives: This study was undertaken to evaluate the accuracy of four different risk-adjusted models in predicting mortality in individual patients undergoing coronary artery by-pass graft surgery. In the last decade, several models that stratify patients before open heart surgery according to factors affecting mortality have been developed with two aims: to compare outcomes of open heart surgery retrospectively, on the basis of a reliable stratification of case-mix, and to identify high-risk patients prospectively, as a basis for meaningful informed consent and patient counseling.
Methods: The pre-operative risk of death was calculated with four different models in 418 consecutive patients who underwent coronary artery by-pass surgery and was then compared with the actual outcome. The ability to discriminate patients with favorable and unfavorable outcomes was assessed by logistic regression analysis and by the areas under the receiver-operating-characteristic (ROC) curves. An accuracy score was used to evaluate the reliability of each model in predicting the individual outcome.
Results: Seven deaths (1.7%) were observed within 30 days of the operation, and the overall incidence was similar to that predicted by all models. Only the Parsonnet (NBI) score was not able to discriminate survivors from patients who died; the areas under the ROC curves were 0.596 for the Parsonnet score, 0.861 for the Cleveland Clinic Foundation score, 0.823 for the French score, and 0.806 for the EuroSCORE. The four models were highly accurate (accuracy between 0.97 and 0.98) in predicting overall mortality. In the seven patients who died, the mean predictive scores were very low, ranging between 2.1 and 4.6, although they were significantly higher than those of the patients who survived (between 1.1 and 2.2).
Conclusions: With the exception of the Parsonnet score, the four pre-surgical predictive models were similarly able to discriminate favorable from unfavorable outcomes, and all were highly accurate in predicting overall mortality, but very inaccurate in predicting mortality in individual patients.
Key Words: Predictive models; Coronary artery by-pass graft surgery; Risk-adjusted mortality
1. Introduction
Operative mortality is widely used as an indicator of the quality of surgery. However, comparisons between institutions, between surgeons, or between time periods in the same institution may be misleading if mortality is not adjusted for patient characteristics that can adversely affect survival during and early after surgery. Over the last 10 years, coronary artery by-pass graft (CABG) surgery has attracted growing interest from surgeons, epidemiologists, hospital managers, policy makers, and third-party payers because of the large number of procedures, increasing expenses, relatively homogeneous indications for surgery, and standardized procedures.
During the last decade, several models [1-13] that stratify patients before open heart surgery according to factors affecting mortality were developed with two aims: (1) to compare outcomes of open heart surgery retrospectively, on the basis of a reliable stratification of case-mix; and (2) to identify high-risk patients prospectively, as a basis for meaningful informed consent and patient counseling. In three previous studies [14-16], different models were compared in surgical populations other than those in which they were developed, to assess which model fits best in different clinical settings. The overall conclusion was that the tested models are generally accurate and perform a useful service, even though their generalizability to different hospitals or health systems cannot be guaranteed. Local models work better in the institution where they were developed but cannot be used to make equitable and fair comparisons among institutions. Investigators have focused on using models to compare institutions rather than on predicting mortality for counseling purposes.
The aim of this study was to evaluate the accuracy of four different risk-adjusted models in predicting mortality in individual patients undergoing CABG surgery.
2. Materials and methods
The present study includes 418 consecutive patients who underwent coronary artery by-pass surgery at our institution between January 1993 and December 1994. Since the beginning of surgical activity in 1984, standard demographic, clinical, angiographic, and operative variables of surgical patients have been prospectively collected in a computerized database. A physician who was not a member of the surgical team and was unaware of the outcome of the operation calculated the pre-operative risk of death with the models published by Parsonnet et al. [1] (and its subsequent update [17]), by Higgins et al. [2], by Roques et al. [3], and by Nashef et al. [5], referred to hereafter as the NBI, CCF (Cleveland Clinic Foundation), French, and EuroSCORE models, respectively. Two of these models were developed in the United States [1,2] and two in Europe [3,5]. Our institution contributed to the EuroSCORE database [6] with the patients operated on during September-November 1995; the study population (1993-1994) is therefore not included in the EuroSCORE database.
2.1. Accuracy
The degree of predictive accuracy was quantified, for each patient and each model, by comparing the predicted value with the outcome of the operation, using the Shannon index (S), a method based on information theory [18,19]:
where o is the presence (=1) or absence (=0) of the observed event (in our case, post-operative death), and e is the estimated prediction obtained with each model for the individual patient. The accuracy index ranges from 0 to 1, where 1 indicates a perfect prediction.
2.2. Area under the receiver-operating-characteristic (ROC) curve
ROC curves, which plot sensitivity (y-axis) against 1 - specificity (x-axis), were initially applied to compare the results of different diagnostic tests and subsequently to evaluate prognostic indexes, because they depict the relationship between a continuous variable (test result or prognostic variable) and a dichotomous endpoint (the gold standard or the clinical outcome). The curve shows how a test's true positive rate and false positive rate change as the discrimination threshold is varied [20].
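The area under the ROC curve can be computed without drawing the curve: it equals the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor, counting ties as one half (the Mann-Whitney statistic, closely related to the Wilcoxon rank-sum test mentioned in Section 2.3). The following is a minimal Python sketch of that equivalence; the function name `roc_auc` and the toy data are hypothetical and are not the authors' software or data.

```python
def roc_auc(scores, died):
    """Area under the ROC curve, computed as the Mann-Whitney probability
    that a patient who died has a higher score than a survivor
    (ties between a death and a survivor count as one half)."""
    pos = [s for s, d in zip(scores, died) if d]      # patients who died
    neg = [s for s, d in zip(scores, died) if not d]  # survivors
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and 30-day deaths (not study data)
scores = [1, 2, 3, 4, 5, 6]
died = [0, 0, 1, 0, 1, 1]
print(roc_auc(scores, died))  # 8 of the 9 death/survivor pairs are ranked correctly
```

An area of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of deaths from survivors.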
2.3. Statistical analysis
The performances of the four models were compared using two different approaches. The first consisted of measuring the ability of each score to discriminate patients with favorable outcomes from those with unfavorable outcomes. We applied logistic regression and used the likelihood ratio test to assess the significance of the differences between the models. The area under the ROC curve and the proportion of correctly classified subjects were also computed for each model, but the number of observed events did not provide enough power for a statistical comparison of these two measures. The scores were also compared between survivors and patients who died using the non-parametric Wilcoxon rank-sum test.
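For a single score, the logistic regression step can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: it fits a one-predictor logistic regression (the score as predictor, 30-day death as outcome) by Newton-Raphson and tests it against the intercept-only model with a likelihood ratio test on 1 degree of freedom. The function names and toy data are hypothetical; the paper does not describe how its between-model comparison with two degrees of freedom was set up, so that comparison is not reproduced here.

```python
import math


def _prob(b0, b1, x):
    """Logistic probability of death for a score x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic_1d(x, y, max_iter=100, tol=1e-10):
    """Fit logit P(death) = b0 + b1*score by Newton-Raphson.
    Assumes the data are not perfectly separable (otherwise b1 diverges)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            i00 += w                  # information matrix entries
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det  # Newton step = information^-1 * gradient
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def log_likelihood(b0, b1, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        p = _prob(b0, b1, xi)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total


def lr_test(x, y):
    """Likelihood ratio test of the score model vs the intercept-only model (1 df).
    Assumes at least one death and one survivor."""
    b0, b1 = fit_logistic_1d(x, y)
    ll_model = log_likelihood(b0, b1, x, y)
    k, n = sum(y), len(y)
    ll_null = k * math.log(k / n) + (n - k) * math.log(1.0 - k / n)
    stat = max(2.0 * (ll_model - ll_null), 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square (1 df) upper tail
    return b1, stat, p_value


# Hypothetical scores and outcomes (not study data)
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
died = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(lr_test(scores, died))
```

A statistics package would normally be used; the explicit loop only makes the likelihood ratio computation transparent. With as few events as the seven deaths in this study, such estimates are imprecise.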
The second approach consisted of comparing the accuracy of the four models in predicting the actual probability of death (calibration). For each model, the patients were sorted according to their predicted mortality and divided into five groups of similar size (quintiles), and the average score and the observed percent mortality were measured in each quintile. When predicted mortality is plotted against observed mortality for each quintile, the points of a well-calibrated model all lie near the identity line, while over- and underestimation can be easily visualized.
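The quintile calibration procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function `calibration_table` and the ten-patient data set are hypothetical, and predicted risks are expressed as probabilities rather than model points.

```python
def calibration_table(predicted, died, n_groups=5):
    """Sort patients by predicted risk, split them into n_groups groups of
    similar size (quintiles by default), and return one
    (mean predicted risk, observed death rate) pair per group."""
    n = len(predicted)
    if n < n_groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(n), key=lambda i: predicted[i])
    table = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(died[i] for i in idx) / len(idx)
        table.append((mean_pred, observed))
    return table


# Hypothetical predicted probabilities and outcomes for 10 patients
predicted = [i / 10 for i in range(10)]
died = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
for mean_pred, observed in calibration_table(predicted, died):
    # a well-calibrated model gives pairs close to the identity line
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```

With 418 patients, the integer slicing yields groups of 83 or 84 patients.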
3. Results
3.1. Population
Pre-operative, intra-operative, and post-operative characteristics of our population are listed in Table 1. Among the 418 patients studied, we observed seven deaths within 30 days of the operation (1.7%).
3.2. Performance of the four models
By logistic regression analysis, all the scores were significantly associated with the risk of death in our sample, with the exception of the NBI score (P=0.10), despite the limited power provided by the seven events that occurred. When the likelihood ratios of the models were compared, only the difference between the CCF and the NBI models (those with the highest and the lowest association, respectively) reached statistical significance (chi-square=8.212 with two degrees of freedom, P=0.016). The ROC curves for the four models are shown in Fig. 1. The areas under the curves were 0.60 for the NBI score, 0.86 for the CCF score, 0.82 for the French score, and 0.81 for the EuroSCORE. The percentages of correctly classified patients were 48% for the NBI score, 82% for the CCF score, 79% for the French score, and 76% for the EuroSCORE (Table 2). The Wilcoxon rank-sum test yielded very similar results: the NBI scores were not significantly different in survivors and patients who died (P=0.45), whereas the difference was significant for the CCF score (P=0.0015), the French score (P=0.0006), and the EuroSCORE (P=0.005).
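A chi-square statistic cannot be negative, and the reported P value can be checked directly: with 2 degrees of freedom the chi-square distribution is exponential with mean 2, so its upper tail is exp(-x/2). A short Python check (the helper name is hypothetical) confirms that a statistic of 8.212 gives P of about 0.016:

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability P(X > x) of a chi-square variable with 2 degrees
    of freedom; with 2 df the chi-square is exponential with mean 2."""
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-x / 2.0)


p = chi2_sf_2df(8.212)
print(round(p, 3))  # 0.016
```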

Fig. 1. ROC curves of the four models. Panel (a): NBI score. Panel (b): Cleveland score. Panel (c): French score. Panel (d): EuroSCORE.
Table 2. Observed and estimated mortality, likelihood ratios from the logistic regression analysis, areas under the ROC curves, numbers of patients correctly classified, and accuracy scores of the four models
3.3. Accuracy
According to the Shannon accuracy index (Table 2), the overall mortality accuracy of each model was similar and very high (between 0.970 and 0.978). Fig. 2 shows the calibration of the four models in our sample. For each model, five points are reported, corresponding to the observed (x-axis) and predicted (y-axis) mortality in the five quintiles of the sample. The diagonal line represents perfect agreement; points above the diagonal indicate overestimation, and points below it underestimation. On visual inspection, the French score (open squares) showed the best agreement between observed and predicted mortality across quintiles, whereas the NBI score (filled squares) performed poorly in the second quintile. The CCF score (open triangles) underestimated and the EuroSCORE (filled circles) overestimated the observed mortality.

Fig. 2. Comparison of observed vs. predicted mortality of the four models in the five quintiles of the sample. The diagonal line represents perfect agreement. Points above the diagonal indicate overestimation.
3.4. Prediction of mortality in individual patients
Table 3 shows the scores and accuracy indexes obtained with each model for the seven patients who died within 30 days of the operation and, globally, for the patients who survived. In the patients who died, the scores ranged from 0 to 6.05 for the NBI score, from 1 to 9 for the CCF score, from 2 to 7 for the French score, and from 3 to 6 for the EuroSCORE. The mean scores were 2.1±2.6, 4.2±2.5, 4.1±1.6, and 4.6±1.0, respectively. These scores were significantly higher than those of the patients who survived (1.1±1.5, 1.3±1.7, 1.7±2.2, and 2.2±1.9, respectively).
Table 3. Scores and accuracy indexes for the patients who died within 30 days from the operation and for those who survived
The predictive accuracy for the patients who died was extremely low: from 0 to 0.16 for the NBI score, from 0.04 to 0.26 for the CCF score, from 0.04 to 0.26 for the French score, and from 0.04 to 0.16 for the EuroSCORE. The mean accuracy indexes were 0.15±0.07, 0.10±0.05, 0.07±0.04, and 0.09±0.05, respectively, significantly lower than those of the patients who survived (0.99±0.001 for all the models).
4. Discussion
This study was designed to address an important issue: how well risk stratification models can predict post-CABG mortality in individual patients. The issue is clinically relevant because these models were developed both to compare quality among institutions independently of the severity of the patients undergoing heart surgery and to provide documented information on the surgical risk of patients referred for open heart surgery. According to Loop [21], appropriate models may make it easier for physicians and patients to discuss an individual patient's operative risk and benefit. However, to our knowledge, no specific analyses have been performed to validate these models in individual patients.
In our study, we demonstrated that, with the exception of the NBI score, the four risk stratification models were similarly able to discriminate patients who died from those who survived in an independent population and were highly accurate in predicting survival, but they were extremely inaccurate in predicting mortality in the individual patients who died. It is noteworthy that accuracy and discriminant power can be fairly independent: in theory, a model that markedly over- or underestimates the probability of death can still efficiently discriminate the patients who will die from those who will survive. Overall, the mortality estimates obtained with all the models are not far from the mortality observed in our institution (1.7%), but they are very far from the outcome of the patients who died. For those patients, the mean estimated risk was 2.1±2.6 with the NBI score, 4.2±2.5 with the CCF score, 4.1±1.6 with the French score, and 4.6±1.0 with the EuroSCORE. Thus, every model was very far from the actual outcome, which is 100% for those who die.
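The independence of discrimination and calibration can be checked numerically. In the Python sketch below (hypothetical data, not from this study), halving every predicted probability produces a model that systematically underestimates risk, yet the ranking of patients, and therefore the ROC area, is unchanged; only calibration is affected.

```python
def roc_auc(scores, died):
    """Mann-Whitney form of the ROC area (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes (not study data)
predicted = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12]
died = [0, 0, 1, 0, 1, 1]
halved = [p / 2 for p in predicted]  # same ranking, systematically lower risk

print(roc_auc(predicted, died) == roc_auc(halved, died))  # True: discrimination unchanged
print(sum(predicted) / len(predicted), sum(halved) / len(halved))  # mean predicted risk halves
```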
This predictive failure does not depend on the particular model, on the variables included, or on the statistical analysis adopted; it is an intrinsic limitation of any attempt to predict a low-rate event [22]. According to Bayes' theorem, the positive predictive value is very low when the prevalence of the disease is low, even if the test has high sensitivity and specificity. For example, the enzyme-linked immunosorbent assay (ELISA) for human immunodeficiency virus (HIV) infection has been shown to have a specificity of 99.99% [23], and we can assume a sensitivity of 100%. Both values are unusually high for a medical test, and by usual standards the test would be considered perfect. However, if the test is applied to a population at very low risk of HIV infection, an unexpectedly high number of false positive results occurs. When ELISA is applied to a population with an HIV prevalence of 1/100,000, we should expect about ten false positive tests for every true positive test, giving a positive predictive value of only about 9% (1/11): roughly one in ten subjects with a positive test should be expected to be infected with HIV.
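The Bayes' theorem arithmetic of the ELISA example can be reproduced with a short Python function (its name is hypothetical). The second call applies the same formula at the 1.7% mortality observed in this study, using a sensitivity and specificity of 80% each that are assumed purely for illustration; the paper does not report these values for any of the models.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: probability of the condition given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# ELISA example from the text: sensitivity 100%, specificity 99.99%, prevalence 1/100,000
ppv_elisa = positive_predictive_value(1.0, 0.9999, 1 / 100_000)
print(f"{ppv_elisa:.3f}")  # 0.091, i.e. about 1 true positive per 11 positive tests

# Hypothetical CABG "death predictor": 80% sensitivity and specificity, 1.7% mortality
ppv_cabg = positive_predictive_value(0.80, 0.80, 0.017)
print(f"{ppv_cabg:.3f}")  # about 0.065
```

Even a test that looks excellent by sensitivity and specificity flags mostly patients who will not have the event when the event is rare.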
In CABG, surgical mortality is almost unpredictable in the individual patient: if a model predicted a 100% chance of dying during the operation, no patient would agree to be referred for surgery and the surgeon would be ethically constrained not to perform the intervention. All the models consider a predicted probability of death of around 15% as high risk, which means that even in such patients survival remains by far the most likely outcome. Death is therefore never foreseeable in an individual patient, and, for individual prediction, any risk stratification model can be used to predict survival but not mortality. Nevertheless, pre-surgical risk stratification models remain useful for comparing quality in different centers [24] and for assessing costs related to patient severity [25,26], in order to plan activity according to case-mix. What cardiologists and cardiac surgeons should keep in mind when they use pre-surgical predictive models at the bedside to give a patient an estimate of surgical risk is that the models assign a reliable probability of death to a population, not to the actual patient: they estimate the overall risk of a subgroup of patients with the same risk profile, not the real outcome.
Our study has several limitations. First, we compared only four of the 12 available models. However, because of the general considerations discussed above, it is very unlikely that any of the untested models would be far more accurate in predicting mortality in individual patients. According to Daley [27], who performed a comparative analysis of different risk-adjusted models, the systems have more similarities than differences and identify many of the same patient characteristics as predictive of a higher likelihood of mortality. Thus, even though our analysis of models was not complete, we are confident in the overall conclusions of the study. Second, the study population is small compared with those used to derive the models. The low discriminant power of the NBI score may be ascribed to chance, because two of the patients who died had low scores. However, we do not think that our analysis would produce different results in a larger number of patients. Third, our study population was collected in a single institution, which may have reduced the variety of patient conditions, characteristics, and surgical expertise. A wider spectrum of patients might have affected the overall comparison between the models, but not the message that models cannot be used to predict mortality in individual patients. Fourth, the overall 30-day mortality in our institution was only 1.7%, so the four models could be compared only in a low-risk population.
In ancient Greece, the Delphi Oracle used to pronounce sentences that could be interpreted in opposite ways depending on the position of a comma. The most famous is the verdict given to a soldier who asked the Oracle whether he would come back from the war: "Come back you won't die in war." The meaning of the sentence changes completely depending on whether the comma is placed after "back" or after "won't". After more than 2000 years, despite more sophisticated statistical methods, larger databases, and fast computing tools, we still cannot tell whether a patient will survive or die after a surgical operation.
Acknowledgments
The authors are grateful to Ms Piera Colonna and Ms Maddalena Caviglia for their skilful secretarial assistance.
References
1. Parsonnet V., Dean D., Bernstein A.D. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation 1989;79(Suppl 1):3-12.
2. Higgins T.L., Estafanous F.G., Loop F.D., Beck G.J., Blum J.M., Paranandi L. Stratification of morbidity and mortality outcome by preoperative risk factors in coronary artery bypass patients. A clinical severity score. J Am Med Assoc 1992;267:2344-2348.
3. Roques F., Gabrielle F., Michel P., De Vincentiis C., David M., Baudet E. Quality of care in adult heart surgery: proposal for a self-assessment approach based on a French multicenter study. Eur J Cardiothorac Surg 1995;9:433-439.
4. The Society of Thoracic Surgeons. National Cardiac Surgery Database Manual for data managers. Minneapolis: Summit Medical System; August 1993.
5. Nashef S.A.M., Roques F., Michel P., Gauducheau E., Lemeshow S., Salamon R., and the EuroSCORE study group. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
6. Roques F., Nashef S.A.M., Michel P., Pinna Pintor P., David M., Baudet E., and the EuroSCORE Study Group. Does EuroSCORE work in individual European countries? Eur J Cardiothorac Surg 2000;18:27-30.
7. Hannan E.L., Kilburn H. Jr., O'Donnell J.F., Lukacic G., Shields E.P. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. J Am Med Assoc 1990;264:2768-2774.
8. The Pennsylvania Health Care Cost Containment Council. A consumer guide to coronary artery bypass graft surgery, vol. IV; June 1995.
9. O'Connor G.T., Plume S.K., Olmstead E.M. Multivariate prediction of in-hospital mortality associated with coronary artery bypass graft surgery. Circulation 1992;85:2110-2118.
10. Tuman K.J., McCarthy R.J., March R.J., Najafi H., Ivankovich A.D. Morbidity and duration of intensive care unit stay after cardiac surgery. A model for preoperative risk assessment. Chest 1992;102:36-44.
11. Geraci J.M., Rosen A.K., Ash A.S., McNiff K.J., Moskowitz M.A. Predicting the occurrence of adverse events after coronary artery bypass surgery. Ann Intern Med 1993;118:18-24.
12. Tu J.V., Jaglal S.B., Naylor C.D., and the Steering Committee of the Provincial Adult Cardiac Care Network of Ontario. Multicenter validation of a risk index for mortality, intensive care stay, and overall hospital length of stay after cardiac surgery. Circulation 1995;91:677-684.
13. Staat P., Cucherat M., Georget M., Lehot J.J., Jegaden O., André-Fouet X., Beaune J. Severe morbidity after coronary artery surgery: development and validation of a simple predictive clinical score. Eur Heart J 1999;20:960-966.
14. Orr R.K., Mainni B.S., Sottile F.D., Dumas E.M., O'Mara P. A comparison of four severity-adjusted models to predict mortality after coronary artery bypass graft surgery. Arch Surg 1995;130:301-306.
15. Bridgewater B., Neve H., Moat N., Hooper T., Jones M. Predicting operative risk for coronary artery surgery in the United Kingdom: a comparison of various risk prediction algorithms. Heart 1998;79:350-355.
16. Pliam M.B., Shaw R.E., Zapolanski A. Comparative analysis of coronary surgery risk stratification models. J Invasive Cardiol 1997;9:203-222.
17. Parsonnet V., Bernstein A.D. Development of practical methods for estimating preoperative risk of mortality in cardiac surgery: the New Jersey experience. Circulation 1996;94:I-507.
18. Shannon C.E., Weaver W. The mathematical theory of communication. Urbana, IL: University of Illinois Press, 1949.
19. Bobbio M., Deorsola A., Pistis G., Brusca A., Diamond G.A. Physician perception of exercise electrocardiography as a prognostic test after acute myocardial infarction. Am J Cardiol 1988;62:675-678.
20. Metz C.E. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-298.
21. Loop F.D., Berrettoni J.N., Pichard A., Siegel W., Razavi M., Effler D.B. Selection of the candidate for myocardial revascularization: a profile of high risk based on multivariate analysis. J Thorac Cardiovasc Surg 1975;69:40-51.
22. Bobbio M., Pollock B.H., Cohen I., Diamond G.A. Comparative accuracy of clinical tests for diagnosis and prognosis of coronary artery disease. Am J Cardiol 1988;62:896-900.
23. Burke D.S., Brundage J.F., Redfield R.R., Damato J.J., Schable C.A., Putman P., Visintine R., Kim H.I. Measurement of the false positive rate in a screening program for human immunodeficiency virus infection. N Engl J Med 1988;319:961-964.
24. Pinna Pintor P., Bobbio M., Sandrelli L., Giammaria M., Patanè F., Bartolozzi S., Bergandi G., Alfieri O. Risk stratification for open heart operations: comparison of centers regardless of the influence of the surgical team. Ann Thorac Surg 1997;64:410-413.
25. Pinna Pintor P., Giammaria M., Alfieri O., Bobbio M. Pre-surgical risk stratification to predict hospital charges of coronary artery bypass surgery. Circulation 1996;94(Suppl I):I-168.
26. Smith P.K., Smith L.R., Muhlbaier L.H. Risk stratification for adverse economic outcomes in cardiac surgery. Ann Thorac Surg 1997;64:S61-S63.
27. Daley J. Criteria by which to evaluate risk-adjusted outcomes programs in cardiac surgery. Ann Thorac Surg 1994;58:1827-1835.