Eur J Cardiothorac Surg 2004;26:1134-1140
© 2004 Elsevier Science NL
Division of Cardiac Surgery, Department of Cardiology, Ospedali Riuniti di Trieste, Trieste, Italy
Received 9 July 2004; received in revised form 1 September 2004; accepted 1 September 2004.
* Corresponding author. Vicolo degli Scaglioni, 22, 34141 Trieste, Italy, Tel.: +39 040 3994856; fax: +39 040 3994995. (E-mail: bartolo.zingone{at}aots.sanita.fvg.it).
| Abstract |
|---|
| 1. Introduction |
|---|
Although the predictive accuracy of EuroSCORE is now firmly established, it should be appreciated that only the additive model has been validated outside the boundaries of the original database [3]. In addition, inconsistencies have been found between the popular additive version and the logistic model when applied to the higher-risk segment of the original study population in the EuroSCORE database [3,6]. Indeed, details of the full logistic model, including the numeric coefficients required to run the procedure, were not available until a recently published report [7].
We were fortunate to obtain the logistic EuroSCORE model as early as January 1999 (P Michel, personal communication), and have used it prospectively within our quality improvement program ever since. We therefore deemed it useful to report on the first external validation of the logistic EuroSCORE in an independent population sample and to comparatively assess the additive approach in the same dataset.
| 2. Methods |
|---|
Data were collected at the time of surgery on a standardised A4 form by the surgeon's first assistant and then entered into a PC database. Stored data were formatted (or subsequently transformed) so as to comply with the sometimes differing definitions stated by the various models. The project coordinator supervised the data for consistency, and aggregate outputs were periodically cross-checked against an independent clinical database.
The outcome of interest, death early after surgery, was defined as death occurring at any time during the hospital stay or, for discharged patients, within 30 days of surgery. Hospital stay includes transfer to other units and, occasionally, to other hospitals. Vital status of discharged patients was ascertained at 30 days by phone interviews. Predictors were prospectively defined according to EuroSCORE criteria except for extra-cardiac arteriopathy, for which the absence of peripheral pulses was added to the set of criteria [1]. Missing categorical variables were considered absent. Moderate or severe left ventricular dysfunction was assigned by semi-quantitative echocardiographic or angiographic assessment in 144 cases and by any available measurement of ejection fraction in 620 cases.
The dataset was eventually imported into and analysed in SPSS 10.1 (SPSS Inc, Chicago, IL). The individual probabilities of death were estimated by running syntax code incorporating the regression coefficients and intercept developed by EuroSCORE [7]. An individual score was also calculated by a simple additive SPSS syntax. The percent difference between the two model estimates was computed on a patient basis by the formula: (logistic estimate - additive score)/logistic estimate.
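As a minimal sketch of the calculation described above, the Python fragment below evaluates a logistic model of the general form p = 1/(1 + exp(-(b0 + sum of bi*xi))) and the per-patient difference formula. It is not the SPSS syntax used in the study. The function names, the predictor names and all numeric coefficients are hypothetical placeholders rather than the published EuroSCORE values [7], and the logistic estimate is assumed to be expressed as a percentage so that it sits on the same scale as the additive score.

```python
import math


def logistic_probability(intercept, coefficients, predictors):
    """Predicted probability of death from a generic logistic model.

    coefficients and predictors are dicts keyed by predictor name;
    predictor values are 0/1 flags or numeric covariates.
    """
    linear = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def relative_difference(logistic_estimate, additive_score):
    """(logistic estimate - additive score) / logistic estimate.

    Assumption: the logistic estimate is given in percent, so that both
    arguments are on the same numeric scale.
    """
    return (logistic_estimate - additive_score) / logistic_estimate


# Placeholder values for illustration only (NOT the EuroSCORE coefficients).
example_intercept = -4.0
example_coefficients = {"predictor_a": 0.5, "predictor_b": 1.0}
example_patient = {"predictor_a": 1, "predictor_b": 1}

example_probability = logistic_probability(
    example_intercept, example_coefficients, example_patient
)
example_difference = relative_difference(10.0, 8.0)
```

With these placeholder inputs, a positive difference means the additive score is lower than the logistic estimate for that patient, and a negative difference means it is higher.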
The discriminating ability of the logistic and additive models was assessed by separately developing Receiver Operating Characteristic (ROC) curves [8]. Calibration was formally assessed by the Hosmer-Lemeshow test with 8 degrees of freedom, comparing observed and predicted deaths in risk deciles separately generated for each model [9]. To provide additional insight, two further grouping strategies were adopted. First, a clinical risk classification based on additive scores was used to generate low-, medium-, high-, and very high-risk groups [6]. Next, smaller interval scales were used to break down the sample into as large a number of groups as practical.
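The decile-based calibration test can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the SPSS procedure used in the study: patients are sorted by predicted probability, cut into ten groups of near-equal size, and the usual Hosmer-Lemeshow statistic is referred to a chi-square distribution with (groups - 2) degrees of freedom. The function names are hypothetical, and the closed-form tail probability used here holds only for an even number of degrees of freedom, such as the 8 used above.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probabilities, outcomes, groups=10):
    """Return (statistic, degrees_of_freedom, p_value).

    probabilities: predicted probabilities of death (0..1), one per patient
    outcomes: 1 for death, 0 for survival
    Assumes at least one patient per group and that no group has an
    expected count of exactly 0 or equal to its size.
    """
    # Stable sort on the predicted probability only.
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    df = groups - 2
    return statistic, df, chi2_upper_tail_even_df(statistic, df)
```

A small P value indicates that observed and predicted deaths differ across risk groups by more than chance would explain, that is, poor calibration.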
Finally, the series was split into two equal-sized consecutive groups to look for temporal variation in clinical performance and to verify whether such variation might affect the performance of the two EuroSCORE models.
Risk-adjusted mortality rates (RAMR) were calculated by multiplying the Observed/Expected ratio of death rates in the study sample by the 4.8% raw mortality in the EuroSCORE population [1]. The prevalence of categorical data was compared by the chi-square test. Either the chi-square test or 95% Confidence Intervals around Risk Ratios were used for comparing the incidence of death between groups. The one-sample t-test was employed to compare patients' ages between the EuroSCORE population and the study sample.
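The RAMR definition above amounts to a one-line calculation, sketched here in Python. The function name is hypothetical and the death counts in the example are invented for illustration; only the 4.8% reference mortality comes from the text.

```python
def risk_adjusted_mortality_rate(observed_deaths, expected_deaths, reference_rate=4.8):
    """RAMR in percent: (Observed / Expected) x reference raw mortality.

    Observed and expected deaths refer to the same sample, so the ratio of
    counts equals the ratio of death rates.
    """
    return observed_deaths / expected_deaths * reference_rate


# Hypothetical counts: fewer deaths observed than the model expected.
example_ramr = risk_adjusted_mortality_rate(observed_deaths=100, expected_deaths=120)
```

An Observed/Expected ratio below 1 yields a RAMR below the 4.8% reference, and a ratio above 1 yields a RAMR above it.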
During the study period 2443 consecutive patients were enrolled. Seventeen cases (sixteen surviving) operated upon without extra-corporeal circulation were excluded on the assumption that they would not satisfy the enrolment criteria of EuroSCORE, leaving 2426 cases available for analysis. Excluded patients underwent procedures for acute or chronic pericardial disease in 11 cases, exploratory thoracotomy in 3 cases, pacemaker procedures in 2 cases, and repair of cardiac rupture in 1 case. Patients undergoing off-pump coronary artery bypass grafting (OPCABG, n=109) were included.
Surgical operations consisted of isolated coronary artery bypass grafting (CABG) in 1629 cases (67.1%), isolated valve surgery in 340 cases (14.0%), combined CABG and valve surgery in 234 cases (9.6%), thoracic aortic surgery in 140 cases (5.8%) and miscellaneous procedures in 83 cases (3.4%).
| 3. Results |
|---|
Discriminating ability was good for both models, with areas under the ROC curve of 0.80 for the logistic model and 0.79 for the additive model (Fig. 1). Calibration was good for the logistic model (P=0.12) but proved inadequate for the additive model (P<0.0001).
| 4. Discussion |
|---|
Before we try to answer this question we would like to acknowledge that the characteristics of the population described in our study differ significantly from those of the EuroSCORE population in terms of prevalence of the predictive variables, resulting in higher predicted and observed death rates than measured in the original study [1]. Among other factors, the older age of our patients probably explains much of the variation and also accounts for the greater prevalence of co-morbidity. We are also quite confident that adherence to strict definition criteria makes unstable angina, recent infarct, critical preoperative state and emergency truly more prevalent in our sample. They probably reflect the generally evolving attitude towards increasingly aggressive management of acute coronary syndromes in the time frame in which our population was enrolled, compared to the less recent EuroSCORE data collection. That said, it should be noted that what is actually expected from a risk-stratifying tool is precisely the ability to predict outcomes reliably in practice settings different from the one in which the model was generated.
Being based on a large database collecting patients from 128 cardiac units throughout Europe, EuroSCORE has been shown to be quite robust in dealing with different populations. Despite major epidemiological differences, for instance, Nashef et al. obtained striking concordance with the Society of Thoracic Surgeons model when predicting death rates by EuroSCORE in a large North American cardiac surgical population [4]. Less surprisingly, EuroSCORE performed quite well in individual European countries [11]. Additional validations have been provided in a British population [12], in Japanese cardiovascular patients [13] and in Turkish patients [14]. Some studies concentrated on the subset of patients undergoing isolated CABG [15-17] and others further restricted their scope to high-risk CABG [18] or to thoracic aortic surgery [19]. Quite surprisingly, all of these studies, with one possible exception [11], tested the additive EuroSCORE, which transforms the original logistic coefficients into integer risk scores and, by so doing, may not necessarily reproduce the discrimination and calibration properties demonstrated for the internally validated logistic model.
We therefore addressed the validation of the logistic model of EuroSCORE and compared its performance with that of its additive counterpart. Our study revealed that the logistic model has both a good discriminating ability and a fairly good calibration across the full range of risk values in our population. To the extent that our split-series analysis can be relied upon, variation of outcomes due to changing performance over time does not affect the capacity of the logistic model to risk-stratify samples whose risk profile differs from that of the reference EuroSCORE population. On the other hand, the additive model appeared poorly calibrated in our study, and reproducibly so after multiple attempts at placing different cut-offs for risk grouping. Splitting our series into two temporal groups confirmed that such a limitation is indeed intrinsic to the additive model itself and cannot be attributed to variations in the level of clinical performance.
These findings are not entirely new. Several studies have already pointed to the strong propensity of the additive EuroSCORE to under-predict death rates in high-risk settings [6,15,17]. Although high-risk cases constitute a minority of the caseload in most cardiac programs, they also account for a significant proportion of the overall mortality, so that poor approximation in stratification cannot be easily tolerated [15]. On the other hand, an opposite trend towards over-predicting the risk of death has been shown for low-to-medium risk CABG patients by Sergeant et al. [17]. Gogbashian et al. [3] further strengthened both these criticisms and expanded their scope to the whole caseload by reviewing the published reports on EuroSCORE validation in populations not participating in the original database. By directly comparing the estimates from the two models on the same population sample, we found a consistent twist effect of the additive vs the logistic model, from over-predicting at medium-risk levels to under-predicting at high-risk levels.
We also confirmed that ROC curves cannot be taken as the only indicator of a predictive model's performance. Formal calibration by Hosmer-Lemeshow testing and additional breakdown tables are of utmost importance in understanding the possible limitations of different models showing similarly good c-statistics [10].
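The point can be illustrated with a small sketch in Python. The c-statistic depends only on how a model ranks patients, so two models that rank patients identically share the same area under the ROC curve even when their predicted numbers of deaths differ. The five patients, both sets of predictions and the function name below are invented for illustration and are not study data.

```python
def c_statistic(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    Equals the proportion of (death, survivor) pairs in which the death
    received the higher score; ties count as one half.
    """
    deaths = [s for s, y in zip(scores, outcomes) if y == 1]
    survivors = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if d > s else 0.5 if d == s else 0.0
        for d in deaths
        for s in survivors
    )
    return wins / (len(deaths) * len(survivors))


# Hypothetical data: two models that rank the same five patients identically.
outcomes = [0, 0, 1, 0, 1]
model_a = [0.02, 0.05, 0.10, 0.30, 0.60]
model_b = [0.03, 0.06, 0.08, 0.15, 0.25]

auc_a = c_statistic(model_a, outcomes)
auc_b = c_statistic(model_b, outcomes)
expected_a = sum(model_a)  # predicted number of deaths under model A
expected_b = sum(model_b)  # predicted number of deaths under model B
```

Here both models yield the same c-statistic, yet they predict different numbers of deaths for the same patients, which is the kind of discrepancy that only a calibration analysis can reveal.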
| 5. Limitations |
|---|
The process of external validation of stratifying models relies heavily on strict adherence to the variable definitions provided by the original model. In this regard, mild over-expression of the variable extra-cardiac arteriopathy in our series should be considered a possible cause of excess risk estimation, though we believe it was of trivial degree and did not consider it influential. Far more relevant deviations, such as defining mortality as within 30 days [20], limiting the events to the in-hospital course [5,17] or ignoring stronger predictors such as pulmonary hypertension [5], may weaken the results of the validation procedure and should be discouraged.
Formal external auditing was not available for our data. However, the systematic supervision and periodic cross-checking described above captured (and corrected) a number of erroneous data inputs small enough to render the whole dataset satisfactorily reliable.
| 6. Conclusion |
|---|
| References |
|---|