Eur J Cardiothorac Surg 2005;27:1022-1029
© 2005 Elsevier Science NL
UK Cardiothoracic Transplant Audit, Clinical Effectiveness Unit, The Royal College of Surgeons of England, 35–43 Lincoln's Inn Fields, London WC2A 3PE, UK
Received 15 September 2004; received in revised form 24 February 2005; accepted 28 February 2005.
* Corresponding author. Tel.: +44 207 869 6620; fax: +44 207 869 6644. (E-mail: chris.rogers{at}bristol.ac.uk).
| Abstract |
|---|
Key Words: CUSUM; VLAD; SPRT; Case-mix; Risk adjustment; Monitoring
| 1. Introduction |
|---|
Industrial processes have been subjected to quality monitoring for almost 75 years. Various forms of control chart have been developed to aid this process, dating from the seminal work of Shewhart [4]. Process control charts were first used in a health care setting in 1971 [5]. Since then, they have been used in assessing individual surgical performance [6,7], monitoring the performance of trainees [8] and comparing institutions [9], to name a few.
The cumulative sum (CUSUM) chart, developed by Page [10], has proved particularly popular as it provides a graphical summary of changes in performance over time while also avoiding the problem of repeated significance testing [11]. It has been shown to be ideally suited for detecting small but persistent process changes [12]. Its use in sequentially monitoring outcomes in medicine was proposed by Altman et al. [13] and it was first used in cardiac surgery by de Leval et al. [6].
Although the chart is easy to construct, monitoring health care processes against set standards is not entirely straightforward, and is mired in controversy and shortcomings. Patients vary greatly in age, disease severity and co-morbidity (case-mix). In assessing performance there is a need to adjust or control for this case-mix. Also, the standard against which to measure performance may not be clear, as in the case of intrathoracic transplantation in the UK, where there is no nationally accepted standard.
Using a prospective, validated national transplant database, we applied cumulative sum techniques (the CUSUM chart and its case-mix adjusted variants, the variable life adjusted display (VLAD) proposed by Lovegrove et al. [14] and the risk-adjusted sequential probability ratio test (SPRT) chart described by Spiegelhalter et al. [15]) to assess whether these techniques were applicable in a transplant setting. We examined the impact of risk adjustment on the interpretation of the charts and the likely impact of these techniques for the prospective monitoring of post-transplant performance.
| 2. Materials and methods |
|---|
2.2. Case-mix adjustment
Multiple logistic regression was used to identify predictors of mortality. The heart transplantation model was built using data from the July 1995 to March 2001 cohort, and was evaluated using subsequent transplants (April 2001–March 2004). The lung model, which included single, bilateral sequential and heart–lung transplants, was constructed using data from July 1995 to December 2001. Potential risk factors were identified from a literature review or based on clinical opinion. Factors were chosen using backward selection. For the lung model, factors significant at P<0.10 were retained, while for the heart model bootstrapping and cross-validation methods were used to identify factors with predictive ability. Model calibration and discriminatory ability were assessed using the Hosmer–Lemeshow test and the area under the receiver operating characteristic curve, respectively. Further details on the methods used to derive the models are available on request.
2.3. Construction and interpretation of sequential monitoring charts with control limits
2.3.1. Without case-mix adjustment
In order to construct a cumulative sum chart with control limits to detect when performance has become unacceptable, four parameters need to be specified: (a) the standard or target mortality rate that is considered acceptable (p0) (also termed the null hypothesis H0); (b) the unacceptable mortality rate that the chart is intended to detect (p1, where p1>p0) (the alternative hypothesis H1); (c) α, the probability of rejecting p0 when it is true (false positive, type I error rate) and (d) β, the probability of rejecting p1 when it is true (false negative, type II error rate). The formulae for calculating the cumulative sum and the control limits (boundary lines) are given in Appendix A. The chart is then a plot of transplant number on the horizontal axis against cumulative sum on the vertical axis. The boundary lines run parallel to the horizontal axis.
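As an illustration of how the four parameters shape the chart, the following Python sketch assumes the standard Wald sequential probability ratio formulation, rescaled by the log odds ratio so that the boundary lines are horizontal. The function names and the example values are hypothetical, and the scaling is an assumption that may differ from the formulae in Appendix A.

```python
import math

def sprt_design(alpha, beta, p0, p1):
    """Return (h0, h1, s) for an unadjusted chart (assumed Wald SPRT scaling)."""
    log_or = math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))
    h1 = math.log((1 - beta) / alpha) / log_or    # upper line: performance unacceptable
    h0 = -math.log((1 - alpha) / beta) / log_or   # lower line: performance acceptable
    s = math.log((1 - p0) / (1 - p1)) / log_or    # amount subtracted for every case
    return h0, h1, s

def cusum_path(outcomes, s):
    """Cumulative sum of (outcome - s), where outcome is 1 for death, 0 otherwise."""
    total, path = 0.0, []
    for x in outcomes:
        total += x - s
        path.append(total)
    return path

# Hypothetical design: acceptable rate 10%, unacceptable rate 15%.
h0, h1, s = sprt_design(alpha=0.10, beta=0.10, p0=0.10, p1=0.15)
path = cusum_path([0, 0, 1, 0], s)
```

Under this formulation the per-case decrement s lies between p0 and p1, so a centre performing at p0 drifts slowly downwards and each death moves the chart up by 1 - s.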
2.3.2. With case-mix adjustment
For the risk-adjusted analogue [15], the fixed p0 is replaced by a patient-specific predicted probability of death. The unacceptable mortality rate, p1, is also patient-specific and is defined in terms of an increase in risk of death. The false positive and false negative rates remain unchanged.
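The patient-specific quantities can be sketched as follows. This is a minimal illustration, assuming the unacceptable risk is obtained by applying a fixed odds ratio to each patient's predicted risk and that the chart is scaled by the log odds ratio; the function names are hypothetical.

```python
import math

def alternative_risk(p0_i, odds_ratio):
    """Patient-specific unacceptable risk: predicted risk with its odds multiplied."""
    return odds_ratio * p0_i / (1 - p0_i + odds_ratio * p0_i)

def risk_adjusted_increment(died, p0_i, odds_ratio):
    """Contribution of one transplant to the risk-adjusted cumulative sum."""
    # Assumed scaling: the per-case decrement depends only on p0_i and the odds ratio.
    s_i = math.log(1 - p0_i + odds_ratio * p0_i) / math.log(odds_ratio)
    return (1 if died else 0) - s_i

# Hypothetical patient with a predicted 30-day risk of 12.5% and a design odds ratio of 1.5.
p1_i = alternative_risk(0.125, 1.5)
step_if_died = risk_adjusted_increment(True, 0.125, 1.5)
step_if_survived = risk_adjusted_increment(False, 0.125, 1.5)
```

A high-risk patient who dies therefore moves the chart up by less than a low-risk patient who dies, which is the intended effect of the adjustment.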
2.3.3. Interpretation
The cumulative sum chart as described, whether risk-adjusted or not, has no direct interpretation. The interpretation comes from its position relative to the boundary lines. While the chart remains between the lines no conclusion can be drawn and monitoring should continue. If the chart crosses the upper boundary line it is said to have signalled that performance has deteriorated (it may be an alert or alarm depending on which line(s) are crossed). In contrast, performance is confirmed acceptable when the lower boundary is crossed. The natural progress of the chart for a process in control is downwards towards the lower boundary line.
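The decision rule can be expressed as a short sketch. It assumes symmetric limits (as arise when the false positive and false negative rates are equal) and treats crossing the lower alert line as confirmation of acceptable performance; the function name and the numerical limits are illustrative only.

```python
def chart_status(value, alert_limit, alarm_limit):
    """Classify the current chart value against mirrored alert and alarm lines."""
    if value >= alarm_limit:
        return "alarm"
    if value >= alert_limit:
        return "alert"
    if value <= -alert_limit:
        return "acceptable"
    return "continue monitoring"

# Illustrative limits only; real limits follow from the chosen error rates.
statuses = [chart_status(v, alert_limit=5.4, alarm_limit=7.3) for v in (-6.0, 1.2, 5.9, 8.0)]
```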
2.4. Construction and interpretation of sequential monitoring charts without control limits
Unlike the charts with control limits, the variable life adjusted display has a direct interpretation. Briefly, a transplant that fails is given a score of one and a success scores zero. After each transplant the predicted probability of death (p) is subtracted from the transplant score and the value obtained is added to the cumulative total. So if the patient dies within 30 days the cumulative sum is increased by (1−p) and if the patient lives the sum is decreased by p. The cumulative sum obtained is the cumulative difference between the observed and expected mortality. If the performance is acceptable (i.e. as expected) the chart should oscillate about the horizontal axis, while an increase in gradient would indicate a rise in mortality over what is expected and a decrease in gradient would suggest better than expected results. Methods for formally detecting changes have been suggested [14] but they do not equate to a hypothesis test in quite the same way as described above for the other charts. Further details on the construction and interpretation of sequential monitoring charts are available [17,18].
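The construction described above can be sketched in a few lines of Python. The function name and the example data are hypothetical; the predicted probabilities would in practice come from the case-mix model.

```python
def vlad(deaths, predicted):
    """Cumulative observed minus expected deaths after each transplant."""
    total, path = 0.0, []
    for died, p in zip(deaths, predicted):
        total += (1 if died else 0) - p   # +(1 - p) for a death, -p for a survivor
        path.append(total)
    return path

# Four hypothetical transplants: one death among 0.7 expected deaths.
path = vlad(deaths=[0, 0, 1, 0], predicted=[0.10, 0.20, 0.30, 0.10])
excess_deaths = round(path[-1], 2)
```

Here the final value is the one observed death minus the 0.7 expected, so the chart ends about 0.3 deaths above expectation.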
The CUSUM chart and its case-mix adjusted variants, the variable life adjusted display (VLAD) and the risk-adjusted sequential probability ratio test (SPRT) chart, described above, were used for performance monitoring. The acceptable mortality rate for the CUSUM charts without case-mix adjustment, p0, was set at the overall 30-day mortality rate for the cohort, and for the risk-adjusted charts the predicted probability of death derived from the risk models described was used. All charts were designed to detect a 50% increase in mortality risk (odds ratio 1.5). Two settings for α and β were chosen: (i) α=β=0.10 (10%) to signal an alert and (ii) α=β=0.05 (5%) for an alarm. All analyses presented were carried out using STATA version 8.2 (Stata Corporation, TX).
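To show how the two error-rate settings translate into boundary lines for an odds ratio of 1.5, the sketch below again assumes the Wald sequential probability ratio limits rescaled by the log odds ratio. The function name is hypothetical and the scaling is an assumption rather than a statement of the exact formulae used.

```python
import math

def horizontal_limits(alpha, beta, odds_ratio):
    """Return (lower, upper) boundary lines under the assumed scaling."""
    log_or = math.log(odds_ratio)
    upper = math.log((1 - beta) / alpha) / log_or
    lower = -math.log((1 - alpha) / beta) / log_or
    return lower, upper

alert = horizontal_limits(alpha=0.10, beta=0.10, odds_ratio=1.5)
alarm = horizontal_limits(alpha=0.05, beta=0.05, odds_ratio=1.5)
```

Under this assumed scaling the alert lines sit at roughly ±5.4 and the alarm lines at roughly ±7.3, so a deteriorating chart crosses the alert line before it can cross the alarm line.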
| 3. Results |
|---|
χ²-test, P=0.008), but the national mortality rate has remained constant over time. Mortality after lung transplantation varied from 9.6% at centre A to 21.3% at centre E (χ²-test, P=0.46), and the national mortality rate has declined slightly over time (χ²-test, P=0.05).
A total of 1173 heart transplants and 757 lung transplants were used to derive the logistic regression models used for case-mix adjustment. Eighteen factors were considered for inclusion in the heart model and 19 for the lung model. Seven factors were included in the final heart model: vascular disease (weight +1.25), ventilated at registration and/or transplant (+0.97), diabetic recipient (+0.71), creatinine clearance ≤50 ml/min/1.73 m² at transplant (+0.65), more than one previous open heart operation (+0.47), ischaemia time (120–179 min: +0.36; 180–239 min: +0.72; ≥240 min: +1.08) and donor age (26–40 years: +0.28; 41–55 years: +0.56; >55 years: +0.84). The distribution of risk scores (obtained by summing the weights for the risk factors present) by centre is illustrated in Fig. 1. The model was well calibrated for both the model building and validation datasets and showed moderate discriminatory ability (area under the receiver operating characteristic curve=0.67 for both datasets). Six factors were included in the adjustment for case-mix after lung transplantation: type of transplant (single lung: −0.04; heart–lung: +0.47), diagnosis (pulmonary hypertension: +0.93; fibrosis: +0.11; chronic obstructive pulmonary disease: +0.06; other diagnosis (not suppurative): +0.91), ventilated at registration and/or transplant (+1.15), diabetic recipient (+0.70), creatinine clearance ≤50 ml/min/1.73 m² at registration (+0.53), and ischaemia time (180–239 min: +0.43; ≥240 min: +0.86). The model was well calibrated and showed good discriminatory ability (area under the receiver operating characteristic curve=0.71). The distribution of risk scores by centre is illustrated in Fig. 2.
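The risk score for a heart transplant is the sum of the weights of the factors present, which can be sketched as below. The weights are those reported above; the dictionary keys, function names and the example recipient are hypothetical, and the top ischaemia band is assumed to start at 240 min. The model intercept is not reported, so the score is not converted to a predicted probability here.

```python
# Weights for the binary factors in the heart model, as reported in the text.
HEART_WEIGHTS = {
    "vascular_disease": 1.25,
    "ventilated": 0.97,
    "diabetic": 0.71,
    "reduced_creatinine_clearance": 0.65,
    "multiple_previous_operations": 0.47,
}

def ischaemia_weight(minutes):
    """Weight for ischaemia time; the top band is assumed to start at 240 min."""
    if minutes >= 240:
        return 1.08
    if minutes >= 180:
        return 0.72
    if minutes >= 120:
        return 0.36
    return 0.0

def donor_age_weight(years):
    """Weight for donor age band."""
    if years > 55:
        return 0.84
    if years >= 41:
        return 0.56
    if years >= 26:
        return 0.28
    return 0.0

def heart_risk_score(factors, ischaemia_minutes, donor_age_years):
    """Sum the weights of the risk factors present for one transplant."""
    score = sum(HEART_WEIGHTS[name] for name in factors)
    return score + ischaemia_weight(ischaemia_minutes) + donor_age_weight(donor_age_years)

# Hypothetical recipient: diabetic, 200 min ischaemia, 45-year-old donor.
score = heart_risk_score(["diabetic"], ischaemia_minutes=200, donor_age_years=45)
```

For this hypothetical recipient the score is 0.71 + 0.72 + 0.56 = 1.99.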
3.1. Heart transplantation
The CUSUM chart without case-mix adjustment (Fig. 3) remained firmly within the boundary lines for centre H, indicating that monitoring should continue. One centre, C, crossed the lower boundary early in the sequence, confirming that its unadjusted rate was acceptable (i.e. in line with or better than the target mortality rate). Three of the remaining five centres, B, F and G, reached the lower 5% acceptance boundary at the end of the monitoring period and centre D crossed the 10% acceptance limit. Centres A and E were both very close to signalling an alert, warning of a rise in mortality rate, but neither chart signalled an alarm.
| 4. Discussion |
|---|
Sequential monitoring tools are intended as an aid to identifying deteriorating outcomes and as such are relevant for the real-time monitoring of results. The analyses reported here are retrospective and so indicate what would have been found had the monitoring been in place at the time. The increase in mortality after heart transplantation above what was expected at centre E was detected locally and prompted an investigation. Transplantation at that centre has since ceased. Further analyses of data from centre E using sequential monitoring methods are reported by Poloniecki et al. [19].
The charts described are relatively simple to construct and well suited for detecting small but persistent process changes [12], but care needs to be taken in setting the chart parameters. The settings used here were data driven, which could be viewed as a limitation. For prospective monitoring the parameters would need to be set in advance. Currently in the UK there is no recognised accepted mortality standard. The overall 30-day mortality rates for heart and lung transplantation are 12.5% and 12.6%, respectively, and the British Transplantation Society has suggested that centres aspire to a 30-day survival target of 85% and 80% for heart and lung transplants, respectively [20]. Thirty-day mortality rates after heart transplantation in the US and Europe are available (http://www.ishlt.org/registries/slides.asp), but these may not be the most appropriate target, due to differences in referral pattern, case-mix and health systems between different countries. The national or centre-specific observed mortality rate derived from recent historic data is possibly the best choice of target for future monitoring when, as here, the mortality rate has remained fairly constant over time.
The limited data available prevented us from deriving our models for case-mix adjustment using datasets that were independent of those used for monitoring. In other areas of cardiac surgery, such as coronary artery bypass grafting, there are many well-established methods of case-mix adjustment, such as the Parsonnet score [21] and EuroSCORE [22]. However, with many fewer transplant operations accrued, and with differences in referral pattern, case-mix and transplant practice between the UK, the rest of Europe and the US, there are to date no published risk models for 30-day mortality after heart or lung transplant that have been validated for use in the UK population.
Case-mix adjustment is critical to the acceptance of monitoring schemes by the clinical community. Such variation in complexity is the main factor that distinguishes medicine from the manufacturing industry. An inability to control for case-mix or lack of faith in the systems available remains the greatest limitation to the more widespread adoption of monitoring methods in clinical practice. Although it is well recognised that all risk adjustment methods are imperfect, belief in the method used amongst those being monitored is essential for successful implementation.
For our study, the false positive and false negative rates (α and β) were set at 0.10 for an alert and 0.05 for an alarm, although Spiegelhalter et al. [15] suggest that if the charts are used to monitor several surgeons or institutions smaller values may be more appropriate. Smaller α and β would make the boundaries more stringent, thereby reducing the probability that the chart for an individual surgeon or institution performing acceptably signals an alert or alarm. The decision on the most appropriate settings for α and β will also depend on the context and the action to be taken when an alert or alarm is signalled. For example, (a) who is responsible for the monitoring process: the unit, the monitoring body, the public? (b) who has access to the results? (c) how critical or important is the outcome, e.g. death or a non-fatal complication that may impact on longer term outcome? and (d) if the chart signals an alarm, will that trigger an internal review of policy and practice or a formal investigation by an external body?
There is no consensus on what monitoring action should be taken when a chart crosses a boundary line and a conclusion regarding the performance (as expected or otherwise) is reached. It has been suggested that the chart should be reset to zero after crossing the lower acceptance boundary, in order to avoid the build-up of excessive credit and increase the sensitivity of the monitoring procedure [15]. For our data this would have involved resetting the heart transplant chart for centre C twice and the lung transplant chart for centre A once during the monitoring period covered. Resetting a chart like those seen for centres A and C is intuitive and appealing but the strict interpretation of α and β is lost. Successive restarts increase the overall chance of a false positive and decrease the chance of a false negative result [15].
The unacceptable mortality, whether defined explicitly for an unadjusted chart or as an increase in risk for case-mix adjusted monitoring, also needs to be selected with care. If set too small, the chart may not be sufficiently sensitive to change, as long sequences are needed to detect small changes, while setting the risk increase too high may lead to a system which is too sensitive and signals frequently. Simulation studies assessing the performance of the scheme for known patterns of response (i.e. a centre performing as expected vs. a centre with a higher than expected mortality rate throughout vs. a centre whose mortality rate increased gradually over the monitoring period) may help to guide the decision-making process.
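A simulation of this kind can be sketched as follows for two of the patterns (a centre performing as expected and a centre with raised mortality throughout). This is a minimal illustration under stated assumptions: a constant baseline risk of 12.5%, 200 transplants per simulated centre, alert settings of 0.10 and the Wald sequential probability ratio limits scaled by the log odds ratio. All function names and settings are hypothetical.

```python
import math
import random

def run_chart(rng, n, p0, true_or, design_or, alpha, beta):
    """Simulate one centre; return 'signal', 'accept' or 'continue'."""
    log_or = math.log(design_or)
    h1 = math.log((1 - beta) / alpha) / log_or
    h0 = -math.log((1 - alpha) / beta) / log_or
    s = math.log(1 - p0 + design_or * p0) / log_or
    # True risk after applying the centre's actual odds ratio to the baseline.
    p_true = true_or * p0 / (1 - p0 + true_or * p0)
    total = 0.0
    for _ in range(n):
        total += (1 if rng.random() < p_true else 0) - s
        if total >= h1:
            return "signal"
        if total <= h0:
            return "accept"
    return "continue"

def signal_rate(true_or, runs=2000, seed=1):
    """Proportion of simulated centres whose chart crosses the upper line."""
    rng = random.Random(seed)
    results = [run_chart(rng, 200, 0.125, true_or, 1.5, 0.10, 0.10) for _ in range(runs)]
    return results.count("signal") / runs

as_expected = signal_rate(true_or=1.0)   # centre performing as expected
raised = signal_rate(true_or=1.5)        # centre with raised mortality throughout
```

Comparing the two proportions gives a rough idea of how often the chart signals falsely and how often a genuine increase is detected within the chosen number of transplants; a gradually deteriorating centre could be added by letting the true odds ratio vary with transplant number.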
More than 8 years of transplant activity are included in the monitoring charts presented. The relevance of operations carried out in 1995 to practice in 2004 is questionable. However, focussing on more recent data (say, the last 3 years) would severely reduce the number of cases available, and hence the likelihood that the chart would signal a change in performance. Ideally one would like to place greater emphasis on current results and give less weight to past activity. Moving averages based on the last N procedures [23], or which give less weight to increasingly historic cases [6], are attractive options but, as far as we are aware, this methodology has yet to be applied in a risk-adjusted setting and it is not clear how appropriate boundary lines would be constructed. Alternatively, we could increase the volume by monitoring the overall cardiothoracic transplantation programme for a centre, rather than assessing the heart and lung programmes separately. The appropriateness of doing so will depend on the context, and it may serve to mask trends in the data if, for example, results for one aspect of the programme are good and for another less so, as was seen in the unadjusted results for centre A.
Despite there still being many questions and issues surrounding implementation which warrant further exploration, we believe sequential monitoring tools such as those described here have a role in monitoring outcomes after surgery. The charts are simple to construct, provide an objective independent assessment of results and have a straightforward interpretation. In the UK, centres are planning to use this methodology for real-time monitoring at a local level. Using the audit data collected thus far together with simulated data, the Cardiothoracic Audit Group is working towards providing guidance on the setting of chart parameters and the average numbers of transplants needed before a genuine mortality increase would be detected.
| Steering group of the UK Cardiothoracic Transplant Audit |
|---|
Prof. John Dark, Freeman Hospital, Newcastle
Prof. Martin Elliott, Great Ormond Street Hospital for Children, London
Dr Bill Gutteridge, Representative, NSCAG, Department of Health
Mr Asghar Khaghani, Harefield Hospital, Harefield
Dr Jan van der Meulen, CEU, The Royal College of Surgeons of England, London
Mr Andrew J Murday, Scottish Cardiopulmonary Transplant Unit, Glasgow
Prof. John Wallwork, Papworth Hospital, Papworth
Mr Nizar Yonan, Wythenshawe Hospital, Manchester
| Appendix A |
|---|
α, the type I error rate (probability of rejecting the null hypothesis, H0, that the event rate is p0, when it is true) and β, the type II error rate (probability of accepting the null hypothesis, H0, when it is false).
If Xi denotes the outcome for procedure i, with Xi=1 if a death occurred and Xi=0 if it did not, then the cumulative sum is given by
|
| (1) |
|
| (2) |
|
| (3) |
Ti is plotted against operation number i, and the upper control limit h1 (to detect an increase from p0 to p1), and lower control limit h0 (to accept p0), are defined by
|
| (4) |
|
| (5) |
Note that OR is the odds ratio corresponding to an increase in event rate from p0 to p1.
The corresponding risk-adjusted chart is constructed in a similar manner. The control limits are horizontal and defined as above (Eqs. (4) and (5)). The cumulative sum Ti in Eq. (1) is replaced by
|
| (6) |
|
| (7) |
p0i, the procedure-specific risk-adjusted probability of death.
p1i, the procedure-specific risk-adjusted probability of failure under the alternative hypothesis, H1, corresponding to an increase in odds ratio of size OR.
In practice it is not necessary to calculate p1i for each procedure i as it can be shown that
|
| (8) |
|
| (9) |
Variable life-adjusted displays (VLAD), in contrast, graph
| $\sum_{j=1}^{i} (X_j - p_{0j})$ | (10) |
against i, where p0i is defined as above. Replacing the predicted p0i by the constant acceptable event rate p0 provides an equivalent unadjusted chart.
| Appendix B. Conference discussion |
|---|
Mr Ganesh: With the basic "difference between observed and expected CUSUM chart", what happens is we create a score of zero for every successful transplant, meaning everyone who lives beyond 30 days, and one for everyone who has died before 30 days, and the probability of surviving, that was estimated from the risk-adjusted model, is subtracted from this score.
If you take the probability as P, then for a patient who is dead, it is one minus P, so the graph will become more and more positive and it will keep going up, whereas for somebody who has survived it is zero minus P and it is negative and the chart keeps going down. So this is straightforward.
So the greater the number of deaths a centre has the more steeply the chart will go up or go down, and it is a visual representation and it is directly comparable, whereas the SPRT charts are a bit more difficult to interpret in the sense that they involve more complex techniques of using the alert and alarm lines. So we basically see whether they crossed the alert line or the alarm line and then we alert the centre.
Mr Keogh: Do you think that it is reasonable to reset to zero at regular intervals, and if so, what should those intervals be?
Mr Ganesh: It has been advised in the statistical literature that these charts should be regularly reset because we don't want to retain historical data as we go ahead, but we find that it is quite complex to actually apply risk-adjusted alert and alarm lines when you reset the chart, and there are no guidelines for telling us when we should reset it and what we should do when we reset it, because when we reset, we lose the baseline really. We are still working on that.
Dr R. Schistek (Salzburg, Austria): Have you tried to do simulations with similar cases on the subject?
Mr Ganesh: Well, we are actually working on real-time simulations now, and the next step is for us to give a real-time simulation program to the center so that they can enter the data live and see where they are, and that is what we are working on now.
Dr Schistek: We have done a similar thing with risk stratification with the EuroSCORE for coronary patients, and you see then that these curves can drift up and down for a long time without being off the expected results. So I think one has to be careful in taking this too serious.
Mr Ganesh: Yes, we agree with this point.
| Acknowledgments |
|---|
Sharon Beer, Heather Constance, Yvonne Davenport, Joanne Hasan, Myra Kerr, Vince Salter, Kirsty White, Pauline Whitmore.
UK Transplant
We are indebted to the Data Executive at UK Transplant, who initially accrue and assimilate the data for analysis by the UKCTA.
The UKCTA is funded by the Department of Health.
| Footnotes |
|---|
Presented at the joint 18th Annual Meeting of the European Association for Cardio-thoracic Surgery and the 12th Annual Meeting of the European Society of Thoracic Surgeons, Leipzig, Germany, September 12–15, 2004.
| References |
|---|