Eur J Cardiothorac Surg 2003;23:935-942
© 2003 Elsevier Science NL
a Imperial College School of Medicine at Hammersmith Hospital, London, UK
b Imperial College School of Medicine at Harefield Hospital, London, UK
c Department of Statistical Science, University College, London, UK
Received 13 November 2002; received in revised form 22 February 2003; accepted 12 March 2003.
* Corresponding author. Cardiothoracic Department, St George's Hospital, Blackshaw Road, London SW17 0QT, UK. Tel. +44-20-8725-3565
e-mail: geoasi{at}hotmail.com
| Abstract |
|---|
Key Words: Risk stratification; Mortality; Cardiac surgery; Clinical aims
| 1. Introduction |
|---|
The Parsonnet system was developed in the US and was one of the first systems for predicting risk in cardiac surgery [2]. It is widely used in the UK although it has been criticised for including subjective variables [11]. The EuroSCORE is similar in concept to the Parsonnet score and was developed using data from 128 European cardiac surgical centres [8]. Recently the American College of Cardiology/American Heart Association Task Force revised their guidelines for Coronary Artery Bypass Graft Surgery, including a system for prediction of outcome after isolated coronary artery bypass grafting (CABG) surgery [10]. All three of these risk scores use simple, additive scoring systems.
The Society of Cardiothoracic Surgeons of Great Britain and Ireland (SCTS) has proposed the use of a Bayes model for CABG patients in the UK [12]. More recently the Society developed a new complex Bayes model and a new simple Bayes model [11]. The nine-factor complex Bayes model is a subset of the old model, while the five-factor simple Bayes model is derived from the complex model. These models are designed to handle missing values for risk factors automatically by effectively assigning an average risk to the missing factor or category. Hence risk scores can be calculated for all patients. The risk factors included in the six models described above are outlined in Table 1.
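The handling of missing values described above can be illustrated with a minimal sketch. It assumes a naive Bayes formulation in odds form, in which each risk factor multiplies the prior odds of death by a likelihood ratio and a missing factor contributes a neutral ratio of 1. This is only one way of "assigning an average risk" and is not taken from the SCTS publications; the factor names, likelihood ratios, prior mortality and the function `bayes_risk` are all hypothetical.

```python
# Hypothetical likelihood ratios: P(category | died) / P(category | survived).
# Illustrative placeholders only, NOT values from the SCTS Bayes models.
LIKELIHOOD_RATIOS = {
    "age_group": {"<60": 0.5, "60-69": 1.0, "70+": 2.0},
    "operative_priority": {"elective": 0.7, "urgent": 1.5, "emergency": 4.0},
}


def bayes_risk(patient, prior_mortality=0.03):
    """Predicted probability of death under a naive Bayes sketch.

    `patient` maps factor names to categories. A factor that is absent,
    None, or an unknown category contributes a likelihood ratio of 1,
    so the prediction falls back to the average risk for that factor.
    """
    odds = prior_mortality / (1.0 - prior_mortality)
    for factor, ratios in LIKELIHOOD_RATIOS.items():
        category = patient.get(factor)      # None when the value is missing
        odds *= ratios.get(category, 1.0)   # missing value: neutral update
    return odds / (1.0 + odds)


# A patient with one recorded factor and one missing factor still gets a score.
risk_with_missing = bayes_risk({"age_group": "70+", "operative_priority": None})
```

Under this assumption a patient with no recorded factors receives the prior (average) mortality, which is why a score can be produced for every patient.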
| 2. Patients and methods |
|---|
Scores for the different risk stratification models were calculated retrospectively using data from the 5471 patients who underwent isolated CABG at the participating hospitals between January 1993 and December 1999. The 13 patients who underwent a salvage procedure were not included. Salvage patients were excluded because of the strongly subjective element in applying the Parsonnet score in this special group of patients. Patients undergoing other concomitant procedures, such as valve replacement, were also excluded from this study. All procedures were carried out with cardiopulmonary bypass. The outcome measurement considered in the analysis was hospital mortality, defined as death occurring before hospital discharge. Predicted hospital mortality was calculated separately for the Parsonnet system, ACC/AHA system, EuroSCORE, old UK Bayes, complex Bayes and simple Bayes systems, using the risk scoring systems suggested in the original publications of these risk models.
2.1. Statistical analysis
Calibration signifies the degree of correspondence between the actual mortality and the mortality predicted by each risk model. We evaluated the predictive accuracy of the risk stratification systems for both institutional comparisons and patient evaluation. For the former we considered the agreement between the total observed mortality and the total predicted mortality. A 95% reference interval was constructed around the total predicted mortality and we considered whether the total observed mortality lay within this. The total predicted mortality is the sum of all the individual patient predictions (when expressed as probabilities) and the interval takes into account the uncertainty in this total (since the predictions are probabilities, not certainties).
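The institutional-level check described above can be sketched as follows. The text does not state how the 95% reference interval was constructed, so this sketch assumes that deaths are independent Bernoulli events with the predicted probabilities and uses a normal approximation for their sum. The function names and the example predictions are hypothetical and are not study data.

```python
import math


def predicted_total_interval(probs, z=1.96):
    """Total predicted deaths with an approximate 95% reference interval.

    Assumes independent Bernoulli outcomes and a normal approximation
    to the distribution of their sum (an assumption of this sketch).
    """
    expected = sum(probs)                          # sum of patient predictions
    variance = sum(p * (1.0 - p) for p in probs)   # variance of the total
    half_width = z * math.sqrt(variance)
    return expected, max(0.0, expected - half_width), expected + half_width


def observed_within_interval(observed_deaths, probs):
    """True if the observed total lies inside the reference interval."""
    _, lower, upper = predicted_total_interval(probs)
    return lower <= observed_deaths <= upper


# Hypothetical predicted probabilities, for illustration only.
example_probs = [0.01] * 500 + [0.03] * 300 + [0.10] * 200
expected, lower, upper = predicted_total_interval(example_probs)
```

An observed total outside `(lower, upper)` would suggest that the model over- or under-predicts overall mortality for the institution.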
To evaluate the risk stratification systems at the patient level (that is, whether they can be used to predict mortality for an individual patient) we used the Hosmer-Lemeshow (H-L) test [13]. This test evaluates the correspondence between observed and predicted mortality within a number of risk groups. The smaller the value of the H-L test statistic, the better the calibration. A P-value of <0.05 indicates that the model fits the data significantly poorly and is not predicting the risk of mortality accurately. To carry out the H-L test, the patients were split into six clinical risk groups, based on preoperative predicted mortality (<1, 1-2, 2-3, 3-5, 5-10 and >10%).
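A minimal sketch of the H-L calculation with fixed clinical risk groups is given below. Several details are assumptions of the sketch rather than statements from the text: a prediction falling exactly on a boundary is placed in the higher group, empty groups are skipped, and the degrees of freedom used for the P-value must be chosen by the reader (the number of groups is a common choice for external validation, the number of groups minus two for model development). The helper `chi2_sf_even_df` is a hypothetical convenience that is valid only for even degrees of freedom.

```python
import math

# Upper bounds of the six risk groups: <1, 1-2, 2-3, 3-5, 5-10 and >10%.
GROUP_UPPER_BOUNDS = [0.01, 0.02, 0.03, 0.05, 0.10, math.inf]


def risk_group(p):
    """Index of the risk group for predicted probability p (boundary goes up)."""
    for index, upper in enumerate(GROUP_UPPER_BOUNDS):
        if p < upper:
            return index
    return len(GROUP_UPPER_BOUNDS) - 1


def hosmer_lemeshow(probs, outcomes):
    """H-L statistic: sum over groups of (O - E)^2 / (n * pbar * (1 - pbar))."""
    groups = len(GROUP_UPPER_BOUNDS)
    n = [0] * groups
    observed = [0] * groups
    expected = [0.0] * groups
    for p, died in zip(probs, outcomes):
        g = risk_group(p)
        n[g] += 1
        observed[g] += died
        expected[g] += p
    statistic = 0.0
    for g in range(groups):
        if n[g] == 0:
            continue                      # skip empty groups
        mean_p = expected[g] / n[g]
        denominator = n[g] * mean_p * (1.0 - mean_p)
        if denominator > 0:
            statistic += (observed[g] - expected[g]) ** 2 / denominator
    return statistic


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total
```

For example, `chi2_sf_even_df(hosmer_lemeshow(probs, outcomes), 6)` would give a P-value under the assumption of six degrees of freedom.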
Discrimination is the ability of the system to discriminate between patients who will die in the hospital following surgery and patients who will survive. Discrimination was assessed using the area under the receiver-operating-characteristic (ROC) curve. A value of 0.5 indicates that the model cannot discriminate better than chance and a value of 1 indicates perfect discrimination.
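The ROC area can be interpreted as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it directly from that definition, counting ties as one half. It is an illustrative pairwise calculation, not the Stata routine used in the study, and the function name `roc_area` is hypothetical.

```python
def roc_area(probs, outcomes):
    """ROC area from pairwise comparisons of deaths and survivors.

    `outcomes` holds 1 for death and 0 for survival. The double loop is
    quadratic in sample size, which is acceptable for a small illustration.
    """
    deaths = [p for p, y in zip(probs, outcomes) if y == 1]
    survivors = [p for p, y in zip(probs, outcomes) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0    # death ranked above survivor
            elif d == s:
                concordant += 0.5    # tie counts as half
    return concordant / (len(deaths) * len(survivors))
```

A model that assigns every patient the same predicted risk scores 0.5 under this calculation, matching the "no better than chance" interpretation.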
All analysis was carried out using the statistical software Stata 7 (Stata Corporation, USA) [14].
| 3. Results |
|---|
Data allowing the calculation of risk scores for the Parsonnet, EuroSCORE and ACC/AHA models were not available for all patients. We were only able to calculate the Parsonnet score for 4439 patients, mainly because two risk factors used in this model, body mass index and recently failed intervention, had missing values. As suggested by the SCTS [11], we did not use the subjective risk factors "catastrophic states" and "other rare circumstances" that were included in the original Parsonnet model. We were able to calculate the EuroSCORE for 4654 patients. We did not have data on pulmonary hypertension, so the effect of this factor was not incorporated into the calculation of the score. We calculated the ACC/AHA score for 4753 patients, as information on operative priority required by this model was occasionally missing. We investigated the available pre-operative characteristics of patients who had some risk factors missing. They were similar to those of the other patients in the study.
3.1. Calibration
3.1.1. Overall mortality
Table 2 shows the total number of observed deaths alongside the total number of predicted deaths and the corresponding 95% reference interval. EuroSCORE and the simple Bayes model predict overall mortality reasonably well with the observed totals lying inside the reference intervals. The observed total lies right on the edge of the reference interval from the complex Bayes model. The other three models are not accurate with the ACC/AHA score grossly underestimating the risk of mortality and the Parsonnet model overestimating mortality.
| 4. Discussion |
|---|
Risk stratification models have been criticised for reduced applicability when used in patient populations different from the ones on which they were formulated. Models developed in the US, for instance, may not satisfactorily predict clinical outcome in European patient populations. Developing a statistical model that predicts risk of death after cardiac surgery with a high degree of accuracy is desirable for various reasons. It facilitates meaningful informed consent of the patient and contributes to the decision whether the potential benefit of surgery for a particular individual outweighs the potential risk. This knowledge assists in defining the indication for surgery, improves communication between the physician and the patient and ultimately improves patient care. At a collective level, analysis of patient outcome in relation to predicted risk allows individual surgeons and institutions to evaluate their results and compare them to others. Meaningful audit allows evaluation of outcome and is likely to protect clinicians from medico-legal litigation. Furthermore, by estimating which patients will benefit from surgery, risk prediction contributes to better allocation of resources.
This study evaluates six existing risk stratification systems for CABG surgery with respect to three specific clinical aims: (a) to compare overall institutional performance; (b) to provide patient advice; and (c) to manage treatment. We use data from 5471 patients operated on between 1993 and 1999 at two British cardiac surgical centres.
The Parsonnet, EuroSCORE, and ACC/AHA models present simple risk scores that can be used easily for patient consultation by facilitating rapid prediction of risk. The Bayes models on the other hand, require more complex calculation of risk scores, usually performed by computers.
Several studies have validated the Parsonnet model using independent data [3-6,11,15]. The Parsonnet model has been shown to over-estimate mortality [3,5,6,15] and this is the case in our study. This may be attributed to the methodology used to develop the score, which has been criticised [16,17]. The EuroSCORE predicts overall mortality reasonably well in our study and this has also been demonstrated in other studies [9,18]. The score also made good predictions in our study at the patient level for low to medium-risk patients (97% of the sample). We were unable to include pulmonary hypertension in the risk score calculation since this measurement is rarely available for patients undergoing isolated CABG in the UK. This could be partially responsible for this model predicting poorly for the high-risk group of patients. The ACC/AHA model predicts risk of mortality as well as stroke and mediastinitis, and the small number of variables makes it considerably simpler than the EuroSCORE and Parsonnet models. However, our study suggests that it grossly under-estimates mortality. There are no published studies to date that have carried out a validation of this scoring system using external data. The Bayes models have been validated using patients from the same centres, undergoing surgery at similar time periods to those of the patients used in the model development process, and have been shown to perform reasonably well [11]. However, no other published studies have carried out a validation of these models on the basis of completely independent data. All the Bayes systems under-estimated mortality for low to medium-risk patients in our study, although the simple Bayes model predicted overall mortality reasonably well.
At the patient level, the calibration results (Fig. 2 and Table 3) suggest that risk models should be used with caution when informing a patient about their chance of dying in hospital following surgery. On the basis of the findings from this study, none of the risk models can be completely relied on to give accurate information to patients, although EuroSCORE seems accurate for all but high-risk patients.
Five of the models achieved similar ROC areas in our study (0.76-0.77). The ROC area for Parsonnet was considerably lower (0.73). These ROC areas are comparable with those of the UK National Adult Cardiac Surgical Database [11], 0.71 for Parsonnet, 0.75 for EuroSCORE, 0.74 for simple Bayes and 0.75 for complex Bayes, while no externally calculated ROC area has been published previously for the ACC/AHA system. When assessed at different single centres, the ROC area for the Parsonnet system varied from 0.65 to 0.85 [3-6,9], while the EuroSCORE produced ROC areas of 0.75-0.78 in previous studies [9,18].
In this study we have concentrated on short-term mortality. However, this may not by itself be an adequate indicator of quality of care or resource use. Morbidity, being more common than mortality, may be more informative and can be measured in terms of post-operative complications and length of stay in hospital. Long-term mortality, which may be a more useful outcome, is rarely assessed, probably because of the difficulty in following patients over long periods of time [19]. However, in countries with good death registration systems, such as the UK, this is achievable and should be a priority for future research in risk modelling.
We have considered the performance of the risk scores but have not commented on the methodology used to develop them. This was beyond the scope of this paper although we do discuss the issues elsewhere [20]. Also we have not commented on the ability of the risk systems to be used for comparing surgeons. This is because we do not have information regarding the surgeons involved. However we believe that if a model is well-calibrated at the patient level then it should be suitable for this purpose. However, more research is required to demonstrate this.
In summary, this study evaluates six risk stratification models in cardiac surgical patients on a completely new patient sample from cardiac centres in the UK. Good calibration at the institutional level is essential for performing risk-stratified comparisons between institutions. In our study two of the models, EuroSCORE and simple Bayes, predicted the overall level of mortality in these data reasonably well (Table 2). These results suggest that these scores may be appropriate for producing case-mix adjusted league tables to assess institutional performance. However, we would need to see similar findings from other institutions before drawing a firm conclusion. None of the six models excelled at producing good mortality estimates for an individual patient. However, EuroSCORE proved to be accurate for low to medium-risk patients, who make up the majority of patients. We note that the SCTS [11] comment on the difficulty of producing accurate predictions for high-risk patients. In terms of discrimination, at least five of the models showed a moderate ability to differentiate between low and high-risk patients, which may help in making treatment decisions and managing surgical lists. The perfect risk stratification system is still eluding us, but some of these risk models, used with care, may have some value for institutional comparisons and patient counselling.
| Appendix A. Conference discussion |
|---|
The C index of 0.75 or 0.76 or the ROC curve leaves about 25% of the predictability to either the way that a hospital or the physicians or the whole team takes care of the patients, or by chance, and that is probably a reasonable amount of difference to expect according to the variability and care.
I think the main value for these systems is our ability as surgeons to look at our own data and compare our data to those of others, or at least to the aggregate, nationally or in Europe, whatever the geographic area is, and then you kind of see if you are doing really well or maybe not so well and areas that you can improve. When you get into comparing hospital to hospital, as you mentioned, that is always a more ticklish issue.
I think certainly the STS, which uses a logistic regression model and has had the chance to mature over the years with enough patients in the system, it now has roughly 25 variables that are predictive of operative mortality, and our goodness of fit or calibration, so to speak, is actually very good, being the least accurate at the very, very high risk because there are fewer patients in the very high risk.
We found that if we don't do this ourselves, somehow to kind of look at ourselves and occasionally identify hospitals, the government will do it for us, and I guess the main gist I would like to say in this discussion is that I think you need to keep pressing ahead for better systems, and I would urge you to be very proactive as a professional organisation, taking charge of the kind of data, have the surgeons collect the data rather than have the government do it, or else they will do it for you and it won't be done nearly as well.
Mr B. Keogh (Birmingham, UK): Thank you very much for a very interesting presentation, which I think illustrates some of the concerns that many of us have over the overuse and over-confidence in risk stratification. But there is another message I think from your paper, and that is that risk stratification models are designed on large population groups and are designed to predict variance and performance within those population groups.
So, for example, the EuroSCORE is designed on a huge European database and the Bayes models for the UK which you allude to are derived from a large number of UK patients, and you would expect the predictive value of those to vary from hospital to hospital within those groups: they will overscore in 50% of hospitals and underscore in 50% of hospitals.
So I think you must expect this kind of variance, and attempting to validate a national model using only two hospitals is less a validation of the model, I think, and more a study of the variance of performance within those hospitals. But having said that, I think your point is very well taken, that these models should simply be used as pointers to look elsewhere, and I think Fred Grover's point, that we should strive for more accurate models, is also extremely important.
Dr Asimakopoulos: Probably a quick comment on that to emphasise my main point, as Mr Keogh said. One of the aims of the study is that the next time the UK government or any government publishes a list of hospitals based on mortality, it would be inappropriate, at least based on the validation of the study in our two hospitals, to use any of the existing systems in order to rank the hospitals based on mortality. A better, more accurate system will have to be developed first.
| References |
|---|