a National Centre of Epidemiology, Surveillance and Health Promotion – Istituto Superiore di Sanità, Rome, Italy
b Department of Epidemiology – ASL RME, Rome, Italy
Received 1 June 2007; received in revised form 16 November 2007; accepted 3 December 2007.
* Corresponding author. Address: National Centre of Epidemiology, Surveillance and Health Promotion – Istituto Superiore di Sanità, Via Giano della Bella, 34, I-00161 Rome, Italy. Tel.: +39 06 49904236; fax: +39 06 49904230. (Email: paola.derrigo{at}iss.it).
| Abstract |
|---|
Key Words: Coronary artery bypass graft; Risk-adjustment; EuroSCORE
| 1. Introduction |
|---|
In recent years, several models have been developed to stratify patients before open-heart surgery according to factors affecting mortality [1–8]. The aims were to compare outcomes retrospectively, adjusting for the case mix of patients, and to identify high-risk patients so as to provide a basis for meaningful informed consent during patient counselling.
In order to assess their appropriateness in different clinical settings, these models were often applied to surgical procedures for which they were not originally designed [9–13]. The overall conclusion was that the tested models are generally accurate and perform a useful service, but their applicability to different health systems cannot be guaranteed.
The Italian CABG Outcome Project was carried out from 2002 to 2004 with the objective of evaluating the performance of Italian cardiac surgery centres. The study facilitated the collection, analysis and publication of data relating to the 30-day mortality rates after CABG. Because risk-adjustment models work best when applied to the same population in which they were developed [14,15], a local empirically derived risk function was applied to control for real confounders.
The choice of the locally derived risk function aroused bitter controversy among Italian cardiac surgeons, who regularly employ the 10-year-old EuroSCORE system to assess their patients' preoperative risk [16]. This controversy focused on whether there was a real need to develop a new Italian risk function.
The present analysis was carried out to compare the accuracy and predictive power of the Italian empirically derived statistical model with those of the additive and logistic EuroSCORE in the Italian CABG population. The study also aimed to compare the results of the three systems in determining hospital performance.
| 2. Material and methods |
|---|
For the enrolled patients, a set of demographic variables, clinical characteristics and information on the type and circumstances of the intervention were collected; an active follow-up to determine their life status was carried out by each centre. In case of death within 30 days from the intervention, the date and specific cause were recorded. The list of variables and their detailed definitions can be found elsewhere [17].
The best empirical model was developed using a multiple logistic regression analysis in order to account for joint confounding. First, all known confounding variables were included in the model; second, a backward stepwise method was used with the aim of identifying independent associations with the outcome (exclusion probability = 0.20; inclusion probability = 0.10). A set of a priori defined interaction hypotheses was also tested. A cross-validation procedure was applied to avoid overfitting. Patients were randomly split into two samples of roughly equal size: sample I was used to build the predictive model (n = 17,231) and sample II was used as an independent database for model validation (n = 17,079). The entire data set was finally used for estimating the definitive coefficients and calculating their p-values [17].
For the purpose of this analysis, in order to provide a proper comparison between the risk-adjustment model adopted in the Italian CABG Outcome Project and the EuroSCORE model, all records with missing values for variables considered in both models were excluded. A preliminary analysis comparing the mortality rate based on the excluded records with that based on the used database showed no statistically significant differences. This comparison confirmed that no biases were introduced by this selection. Therefore, the database actually used consisted of 30,610 isolated CABG interventions, with a 30-day mortality rate of 2.54%.
Although the Italian CABG Outcome Project and EuroSCORE collected almost the same variables, some differences were known to exist. A comparison between the sets of variables used in the two studies and their operative definitions was made; if inconsistencies were found, in order to apply EuroSCORE models to the Italian database, specific EuroSCORE variables were built through the combination of other parameters collected in the Italian CABG Outcome Project.
As some variables did not correspond exactly, some assumptions were made to make the two systems as comparable as possible. In particular, the EuroSCORE variable Critical preoperative state was built by combining three risk factors collected in the Italian CABG Outcome study and included in the original EuroSCORE variable definition: Unstable haemodynamic condition or shock, Dialysis and Malignant ventricular arrhythmia. The Respiratory disease variable was collected using slightly different definitions. Since the Italian CABG Outcome study examined only the 30-day mortality from isolated CABG interventions, variables such as Other than isolated CABG, Ventricular septal rupture and Thoracic aortic surgery were not collected and their status (yes or no) was forced to be negative when the EuroSCORE was applied. In the Italian model, unlike the EuroSCORE, Age was offered both as a linear and as a quadratic term (Age²) and the variable Diabetes under treatment was included. Other variables, such as Neurological dysfunction disease, Active endocarditis and Recent myocardial infarction, were examined but did not enter the final Italian model.
Table 1 shows the comparison between variables defined in the original EuroSCORE system and those collected in the Italian CABG Outcome Project.
The risk function based on the EuroSCORE study provided two different methods for risk evaluation: logistic and additive, the latter being derived from the former. Both of them were used to estimate expected deaths in the Italian CABG population.
In particular, in the logistic approach coefficients from the EuroSCORE multivariate model were directly applied to the Italian population; in the additive approach, the score from the additive EuroSCORE was first used as the only determinant in a univariate logistic regression of 30-day mortality and then its coefficient was applied to the Italian CABG population. Both methods allowed us to estimate an individual's probability of dying within 30 days of the procedure.
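Both approaches rest on the standard logistic form. As a sketch only, in notation that does not appear in the original (β₀ and βᵢ for the published EuroSCORE intercept and coefficients, xᵢ for a patient's risk-factor values, S for the additive score, and α and γ for the intercept and slope assumed to come from the univariate regression fitted on the Italian data), the two predicted probabilities can be written as:

```latex
\[
\hat{p}_{\text{logistic}}
  = \frac{\exp\left(\beta_0 + \sum_i \beta_i x_i\right)}
         {1 + \exp\left(\beta_0 + \sum_i \beta_i x_i\right)},
\qquad
\hat{p}_{\text{additive}}
  = \frac{\exp\left(\alpha + \gamma S\right)}
         {1 + \exp\left(\alpha + \gamma S\right)}
\]
```

Under this reading, the logistic approach imports every coefficient from the EuroSCORE population, whereas the additive approach re-estimates α and γ on the Italian data.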
To evaluate how well the three models (additive EuroSCORE, logistic EuroSCORE and Italian CABG model) could predict 30-day mortality in cases of both low and high preoperative risk, five risk classes, previously described by other authors and concerning the EuroSCORE risk stratification for patients undergoing heart and thoracic aorta surgery [18], were analysed: 0.00–2.49%; 2.50–4.99%; 5.00–9.99%; 10.00–19.99%; ≥20.00%. The first risk class (0.00–2.49%) was subsequently split into two smaller classes (0.00–1.24% and 1.25–2.49%) because the 30-day mortality after isolated CABG intervention was the only end-point considered in the Italian study and was about 2.5%.
In each risk class and for each applied predictive model, the observed deaths/expected deaths ratio (O/E ratio) was calculated. In order to evaluate the agreement between the different methods, Cohen's weighted kappa (K statistic) was calculated.
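The O/E ratio per risk class can be illustrated with a minimal sketch. It assumes that each patient has a predicted probability of death (as a fraction) and a 0/1 outcome; the function name, the class labels and the handling of empty classes are illustrative choices, not taken from the study.

```python
from bisect import bisect_right

# Lower bounds of classes 2..6, as fractions, for the six risk classes in the text.
BOUNDS = [0.0125, 0.025, 0.05, 0.10, 0.20]
LABELS = ["0.00-1.24%", "1.25-2.49%", "2.50-4.99%",
          "5.00-9.99%", "10.00-19.99%", ">=20.00%"]


def oe_by_risk_class(predicted, died):
    """Return {class label: (observed deaths, expected deaths, O/E ratio)}.

    predicted: predicted probabilities of 30-day death (0..1)
    died:      matching outcomes, 1 = died within 30 days, 0 = survived
    """
    observed = [0] * len(LABELS)
    expected = [0.0] * len(LABELS)
    for p, d in zip(predicted, died):
        k = bisect_right(BOUNDS, p)   # index of the class containing p
        observed[k] += d              # observed deaths are counted
        expected[k] += p              # expected deaths are summed probabilities
    result = {}
    for label, o, e in zip(LABELS, observed, expected):
        # The ratio is undefined for a class with no expected deaths.
        result[label] = (o, e, o / e if e > 0 else None)
    return result
```

An O/E ratio below 1 in a class means the model predicted more deaths than were observed there; a ratio above 1 means it predicted fewer.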
The performance of the three models in predicting 30-day mortality was formally assessed for calibration and discrimination; the Akaike Information Criterion (AIC) statistic was calculated, when possible.
2.1 Calibration
Calibration refers to the accuracy of a score's prediction. Model calibration can be assessed using the Hosmer–Lemeshow test (H–L test). In this analysis, records were split into eight groups of roughly equal size, on the basis of their predicted probability of death within 30 days of surgery. The predicted number of deaths in each group was then compared with the corresponding number of observed deaths. A significant result indicates that the observed and predicted values do not satisfactorily overlap.
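A minimal sketch of the statistic behind the H–L test is given here, assuming a list of predicted probabilities and 0/1 outcomes. The function name and the grouping by sorted rank are illustrative. The sketch returns only the test statistic; the p-value, which is conventionally read from a chi-square distribution, is left out to keep the example free of external libraries.

```python
def hosmer_lemeshow_statistic(predicted, died, groups=8):
    """Hosmer-Lemeshow statistic over `groups` rank-based groups.

    predicted: predicted probabilities of death (0..1)
    died:      matching outcomes, 1 = died, 0 = survived
    """
    # Sort by predicted risk only, so ties keep their original order.
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

A large value of the statistic corresponds to the "significant result" described above, that is, to observed and predicted deaths that diverge across the risk groups.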
2.2 Discrimination
Discrimination refers to the ability of a model to distinguish the value 0 from the value 1 of the dependent variable, that is, the ability of the score to distinguish patients who died from those who lived. Discrimination can be assessed by the area under the receiver operating characteristic (ROC) curve. The ROC area can be interpreted as the probability that a patient who died had a higher risk score than a patient who survived; the area under the curve is thus the proportion of randomly drawn pairs (one patient who died and one who survived) for which this is true. The thresholds used to interpret this measure are fairly subjective: values greater than 0.8 usually indicate potentially useful discrimination, while a value of 0.5 indicates random predictions.
The ROC values estimated from each applied predictive model were compared using the Chi-square test for ROC area comparisons [19].
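The pairwise interpretation of the ROC area can be made concrete with a short sketch. It counts, over all pairs formed by one patient who died and one who survived, how often the patient who died had the higher score, with ties counted as one half. The function name is illustrative, and the quadratic loop is chosen for clarity, not for speed on a database of this size.

```python
def roc_area(scores, died):
    """Area under the ROC curve computed by direct pair counting.

    scores: risk scores or predicted probabilities
    died:   matching outcomes, 1 = died, 0 = survived
    """
    dead = [s for s, d in zip(scores, died) if d == 1]
    alive = [s for s, d in zip(scores, died) if d == 0]
    if not dead or not alive:
        raise ValueError("both outcomes must be present")
    concordant = 0.0
    for s_dead in dead:
        for s_alive in alive:
            if s_dead > s_alive:
                concordant += 1.0      # the patient who died was ranked higher
            elif s_dead == s_alive:
                concordant += 0.5      # ties count as half a concordant pair
    return concordant / (len(dead) * len(alive))
```

For example, `roc_area([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1])` gives 1.0, because every patient who died has a higher score than every survivor.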
2.3 Akaike Information Criterion (AIC)
In order to better compare the predictive models the AIC statistic was evaluated.
The AIC statistic (−2 × log-likelihood + 2 × number of parameters in the model) increases with the number of coefficients but decreases as the fit to the data improves. It measures how suitable a specific model is for describing the phenomenon under study and is a function of the model's residual variance (prediction error): the smaller the variance, the greater the accuracy. According to Akaike [20], the model exhibiting the smallest AIC value is the model providing the most information on the study sample.
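The formula in parentheses can be written out as a brief sketch, assuming a binary outcome and predicted probabilities strictly between 0 and 1. Both function names are illustrative.

```python
import math


def log_likelihood(predicted, died):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        d * math.log(p) + (1 - d) * math.log(1.0 - p)
        for p, d in zip(predicted, died)
    )


def aic(log_lik, n_parameters):
    """AIC = -2 x log-likelihood + 2 x number of parameters."""
    return -2.0 * log_lik + 2.0 * n_parameters
```

With a log-likelihood of −100 and three parameters, `aic(-100.0, 3)` returns 206.0. Adding a parameter raises the AIC by 2 unless the log-likelihood improves by more than 1.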
2.4 Comparison between centres' mortality and mean population mortality
The Italian CABG model was developed to evaluate the performance of the Italian cardiac surgery centres, taking into account differences in patient characteristics.
This evaluation was carried out by comparing the risk-adjusted mortality rate (RAMR) 30 days after isolated CABG interventions for each centre with the average mortality rate for the whole study population.
By applying the best predictive algorithm back to each centre's data set, the expected number of deaths for each centre was estimated. As the main objective of this analysis was to compare results from a local risk function with those from an external risk function, a rough recalibration of each model was performed. To perform this recalibration, the expected number of deaths in each centre was divided by the expected number of deaths in the whole population, as estimated by each statistical model, and then multiplied by the total number of observed deaths. This recalibration procedure is equivalent to multiplying the expected deaths in each centre by the O/E ratio in the whole population. The RAMRs were then calculated by dividing the number of observed deaths in each centre by this recalibrated expected number of deaths and multiplying the ratio by the mortality rate of the whole sample.
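The recalibration and the RAMR calculation described in this paragraph can be sketched as follows. The input layout (a dictionary mapping each centre to its number of cases, observed deaths and model-expected deaths) and the function name are assumptions made for illustration.

```python
def risk_adjusted_rates(centres):
    """Return {centre: (recalibrated expected deaths, RAMR)}.

    centres: {name: (n_cases, observed_deaths, expected_deaths)}, where
             expected_deaths is the sum of the model's predicted probabilities.
    """
    total_cases = sum(n for n, _, _ in centres.values())
    total_observed = sum(o for _, o, _ in centres.values())
    total_expected = sum(e for _, _, e in centres.values())
    overall_rate = total_observed / total_cases

    rates = {}
    for name, (_, observed, expected) in centres.items():
        # Rough recalibration: the centre's share of expected deaths,
        # rescaled to the total number of observed deaths.
        recalibrated = expected / total_expected * total_observed
        ramr = observed / recalibrated * overall_rate
        rates[name] = (recalibrated, ramr)
    return rates
```

With two hypothetical centres of 1000 cases each, 20 and 30 observed deaths, and 50 expected deaths apiece from a model that overestimates risk twofold, the recalibrated expected deaths are 25 per centre and the RAMRs are 2% and 3% around an overall rate of 2.5%.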
To test heterogeneity, the exact Poisson test with a significance threshold of p = 0.05 was used. A RAMR significantly lower than the average mortality rate indicates that the health care provider's performance is better than the average of the whole sample (low-outlier centre); on the contrary, a RAMR significantly higher shows a worse performance (high-outlier centre).
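The outlier classification can be illustrated with a minimal sketch of an exact Poisson test. It treats the observed deaths in a centre as Poisson distributed with mean equal to the expected deaths. The two-sided p-value is taken here as twice the smaller tail probability, which is one common convention and an assumption of this sketch; the original does not state which definition was used. Function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean (mean > 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def exact_poisson_test(observed, expected):
    """Return (P(X <= observed), P(X >= observed), two-sided p-value)."""
    lower = sum(poisson_pmf(k, expected) for k in range(observed + 1))
    upper = 1.0 - sum(poisson_pmf(k, expected) for k in range(observed))
    # Assumed convention: double the smaller tail, capped at 1.
    return lower, upper, min(1.0, 2.0 * min(lower, upper))


def classify_centre(observed, expected, alpha=0.05):
    """Label a centre as low-outlier, high-outlier or not an outlier."""
    _, _, p_value = exact_poisson_test(observed, expected)
    if p_value >= alpha:
        return "not an outlier"
    return "low-outlier" if observed < expected else "high-outlier"
```

Because the RAMR is the O/E ratio multiplied by a constant, testing the observed deaths against the expected deaths is equivalent to testing the RAMR against the average mortality rate.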
These procedures were also followed when the statistical models derived from the additive and logistic EuroSCORE were applied. The aim was to verify possible differences between the Italian CABG model and the EuroSCORE models in the identification of outliers.
All the analyses were performed with the STATA 8.1 statistical package.
| 3. Results |
|---|
(χ² = 20.02, p < 0.0001) as is the difference between the ROC area for the CABG model and the logistic EuroSCORE (χ² = 19.30, p < 0.0001).
The AIC statistic was higher for the additive EuroSCORE than for the Italian CABG model. This statistic decreases as the fit to the data improves; thus, the Italian CABG model appears to fit the analysed Italian data better than the additive EuroSCORE. For the logistic EuroSCORE, the required parameters were not available, so the AIC statistic could not be computed.
The K statistic (not reported in the table) between the additive EuroSCORE and the Italian CABG model was 0.69 and suggests an acceptable concordance between risk groups; in contrast, the K statistic between the Italian CABG model and logistic EuroSCORE was low (0.39).
In Table 5, low- and high-outliers, as obtained by applying the three different approaches, are reported. For the reasons explained in Section 2, expected deaths from each model were multiplied by the O/E ratio of the whole population. For the additive EuroSCORE and the Italian model the expected deaths were multiplied by 1 (population O/E ratio = 1), but for the logistic EuroSCORE this ratio was 2.54/6.27 ≈ 0.4. This analysis shows a satisfactory concordance between RAMRs from the additive EuroSCORE, logistic EuroSCORE and Italian CABG model.
| 4. Discussion |
|---|
In general, pre-surgical risk stratification models are developed with the aim of comparing the quality of different institutions while accounting for differences in the severity of disease. In spite of their widely acknowledged inaccuracy in predicting individual postoperative mortality [8,21], these models are currently used by surgeons to give patients referred for open-heart surgery documented information about their surgical risk.
As already stated by other authors, a single risk score cannot predict mortality precisely in a heterogeneous group of patients; rather, it should be specific to a single procedure performed at a given time and geographical location, on a group of patients who are classified according to risk factors [14,15].
The Italian CABG Outcome Project was carried out with the main objective of ranking the performance of Italian cardiac surgery centres, adjusting for those risk factors that are closely related to outcome but heterogeneously distributed among centres (real confounders).
The EuroSCORE system, although developed in the 1990s, is still the pre-surgical risk stratification model largely used by European surgeons to evaluate an individual's risk associated with open-heart surgery [9,16,22]. The development of a new Italian risk function generated a strong debate in the Italian scientific community. A proper analysis aimed at evaluating the two methods and comparing their results in the same population could shed light on this controversy.
When a model built on a population has to be exported to another, the first condition required is that the two populations are indeed similar.
In fact, the EuroSCORE and Italian CABG populations reveal some important differences. These differences are mainly due to technological improvements and to the changes in patient characteristics over the 10 years between the two studies [23]. Nowadays, coronary angioplasty represents a very common, well-timed and effective procedure, almost completely replacing coronary surgery. For this reason, patients undergoing a CABG intervention represent only a sub-group that probably has a high prevalence of severe risk factors. At the same time, because of technological progress, results obtained in this group are improving over time. These two opposite but concurrent trends – i.e., the increasing severity of patients' risk factors and the decreasing mortality rate – lead the two populations to have different relationships between risk factors and mortality.
Moreover, the Italian CABG model is specific to a single procedure (isolated CABG) while the EuroSCORE model takes into account all types of open-heart surgery [16,17]. The presence of variables that are risk factors in any cardiac surgery other than coronary in the EuroSCORE model definitely modifies the coefficients of the other risk factors. As those variables are not included in the Italian study population, when the EuroSCORE model is applied to the Italian CABG population, their status (yes/no) is forced to be negative. In this way, the coefficients of all variables are potentially affected and, as a consequence, the reliability of death estimates is undermined.
Considering the statistical comparison strictly, an undeniable advantage of the Italian CABG model is the low number of variables needed in the risk calculation (14 vs 17 for EuroSCORE). It is well known that including too many variables in risk indexes not only increases costs and errors but may also result in statistical overfitting and instability [15].
The benefit of using the Italian CABG risk-adjustment function is also confirmed by the evaluation of the model parameters. In particular, the ROC area and the AIC indicate a better performance of the Italian model than of the additive EuroSCORE, while the H–L test is not significant for either model. The ROC area for the logistic EuroSCORE is comparable to that of the additive model, but the H–L test reveals substantial differences between the observed and expected deaths in the estimated risk classes and highlights the poor calibration of the logistic approach.
This last finding confirms that the changes discussed above, which occurred over the last 10 years, have altered the relationship between the characteristics of the population undergoing CABG intervention and the 30-day mortality rate. Consequently, this relationship cannot be properly described by the coefficients estimated in the EuroSCORE population. Thus, when the logistic EuroSCORE is applied to the Italian CABG population, it leads to an overestimation of an individual's probability of death shortly after CABG. This overestimation remains constant across the considered risk classes (O/E = 0.4). If the expected deaths in each risk class are multiplied by 0.4, the H–L test for the logistic EuroSCORE becomes non-significant, suggesting the achievement of an acceptable level of calibration. In contrast, the overestimation problem is not present with the additive approach, where the back application of the score to the Italian population directly recalibrated the model.
Although the specific weight of each risk factor changed over time, their linear combination remained valid, allowing comparisons of the individual risk of patients belonging to the same population. This is not the case for the evaluation of the individual absolute risk or for the comparison of risk in patients from different populations. For these last purposes, as other authors already asserted [23], if a risk function has to be exported from one population to another, at least a recalibration procedure is required.
As shown in Table 5, the recalibrated logistic and additive EuroSCOREs provide a classification of the outlier status of Italian cardiac surgery centres that is satisfactorily similar to the classification obtained from the Italian CABG model. The rankings from the three statistical approaches are more similar for the high-outliers than for the low-outliers. This finding can be explained by the greater relative fluctuation of estimates in centres with a small number of deaths in comparison with the relative stability of estimates in centres with a high number of deaths.
In conclusion, the analysis of the three models indicates that the Italian CABG model takes fewer variables into account and performs better than the other two models. In line with previous studies, this research confirms that a risk-adjustment function should be specific to time and geographical location. If statistical models imported from other populations or estimated some years before have to be used, periodic updating and recalibration are strongly recommended to ensure satisfactory model performance [14,15,24].
Provided the above recommendations are followed, the EuroSCORE model can be exported to the Italian population and successfully used to rank hospital performance as well as to evaluate the preoperative risk of patients undergoing open-heart surgery.
| Appendix A |
|---|
A.1 Study design, analysis and coordination
| Acknowledgments |
|---|
The authors thank the Italian Cardiac Surgery Centres for their participation in this Study and Dr Paola Ciccarelli for her editorial help.
| Footnotes |
|---|
The Italian CABG Outcome Study was financed by the Italian Ministry of Health and partly supported by Il Progetto CUORE II – Epidemiology and Prevention of Ischaemic Heart Disease and Progetto Mattoni – Outcome Measurement of the Italian Ministry of Health.
1 See Appendix A.
| References |
|---|