a Clinical Operational Research Unit, University College, London, UK
b Cardiothoracic Unit, Great Ormond Street Hospital for Children NHS Trust, Great Ormond Street, London, UK
c Cardiovascular Surgery Database (CVSDB) of the Division of Cardiac Surgery at the Hospital for Sick Children, Toronto, Ontario, Canada
d Hospital for Sick Children and the University of Toronto, Ontario, Canada
Received 5 September 2007; received in revised form 15 November 2007; accepted 20 November 2007.
* Corresponding author. Address: Clinical Operational Research Unit, University College, 4 Taviton Street, London WC1H OBT, UK. Tel.: +44 207 6794508/9; fax: +44 207 813 2814. (Email: s.gallivan{at}ucl.ac.uk).
| Abstract |
|---|
Key Words: Mortality; Errors; Simulation
| 1. Introduction |
|---|
The original impetus for the research was the Bristol Royal Infirmary Inquiry, a cause célèbre in the UK, concerning the investigation of a cardiac surgery centre whose outcomes were judged to be substantially worse than at other centres. Analysis of data from two national clinical databases played a major part in the Inquiry [9]. This analysis was hampered by major discrepancies between the two databases, both in the number of operations of each type recorded and in the estimated mortality rates for different types of surgery. Indeed, for some types of operation, the mortality estimate from one database was more than twice that recorded in the other [9]. Some of the difficulties associated with doing sensible statistics with clinical data known to contain errors have been discussed elsewhere [10].
In view of the numerous potential sources of error and the fallibility of data cleaning methods, it is plausible that most major databases contain some undetected errors.
The information within databases is often subjected to statistical analyses (indeed this is usually the reason that a database is assembled in the first place). However, most statistical analysis methodology is based on an implicit assumption that the data being analysed are error free. Given the acceptance that most databases contain errors, placing reliance on the results of such analysis requires something of an act of faith that these errors are unlikely to make the analysis misleading. The obvious question to ask is whether this act of faith is sound. Reflection on this issue led to an intriguing thought experiment. Suppose one takes a totally accurate database concerning deaths following surgery and deliberately seeds it, at random, with errors at a known average rate. What effect would this have on mortality rate estimates?
In fact, it is possible to analyse this problem mathematically and those who have an interest in arcane probability theory, or have difficulty sleeping, can find the details of the mathematics related to this elsewhere [11].
Here, for clarity of exposition, instead of using mathematical analysis, we adopt an alternative and simpler analysis approach using a computer simulation of the error-seeding process. This has been implemented in the context of a real clinical database concerning outcomes for congenital heart surgery. We report the results of a number of thought experiments carried out using this simulation method.
| 2. Materials and methods |
|---|
In order to replicate a realistic analysis scenario, we have used data from the Toronto Cardiovascular Surgery Database for Congenital Heart Surgery (CVSDB). CVSDB consists of a master file of demographics, one for each patient, linked to a table of operation files (one or more for each patient). Every paediatric cardiac surgical patient (n = 11,571) and cardiac operation (n = 17,668) at the Hospital for Sick Children in Toronto has been entered, from CVSDB's inception on 1 July 1982 to 31 December 2005. Our analysis excludes all re-operations during the same admission and is based upon 14,522 index operations in 11,571 patients.
It should be noted that, given the nature of thought experimentation, it is not necessary that we establish that this database is completely error free since its use is purely to give a credible baseline for data that might typically be contained in a clinical database.
A computer simulation of the effects of errors has been written making use of facilities within the Excel spreadsheet system. An extract of the data that the user must supply to use the simulation is shown in Table 1. The user selects a marker operation whose mortality rate is to be estimated, in this case Atrial Septal Defect/Primum repair.
For example, the data in Table 1 reflect the assumption that, on average, 2.0% of cases of ASD/secundum repair would be coded as ASD/Primum repair (irrespective of what their actual outcome was).
The simulation sets up a spreadsheet of data corresponding to the true entries in Table 1, and then uses a random number generator to seed this with errors according to the average rates specified.
The time taken for a single simulation run is short. For the analysis described in this paper, for every scenario investigated, the simulation was used to carry out 1000 independent simulations of the effects of error seeding. This gave 1000 estimates for the mortality rate for the marker operation from which the mean and standard deviation for the mortality rates were calculated.
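The error-seeding procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Excel implementation used for the paper: it models outcome miscoding alone, assumes that each record's outcome is flipped independently with the same probability, and uses hypothetical names (`seed_outcome_errors`, `simulate`) and made-up case numbers rather than CVSDB data.

```python
import random
import statistics


def seed_outcome_errors(n_cases, n_deaths, error_rate, rng):
    """Mortality rate estimated after one round of random outcome miscoding.

    Assumption (not from the paper): every record's outcome is flipped
    independently with probability `error_rate`.
    """
    recorded_deaths = 0
    for i in range(n_cases):
        died = i < n_deaths              # the first n_deaths records are true deaths
        if rng.random() < error_rate:
            died = not died              # miscoded record: outcome is flipped
        if died:
            recorded_deaths += 1
    return recorded_deaths / n_cases


def simulate(n_cases, n_deaths, error_rate, n_runs=1000, seed=0):
    """Repeat the error seeding n_runs times; return (mean, SD) of the estimates."""
    rng = random.Random(seed)
    estimates = [
        seed_outcome_errors(n_cases, n_deaths, error_rate, rng)
        for _ in range(n_runs)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical marker operation: 400 cases, 4 deaths (true rate 1%), 1% miscoding.
    mean_rate, sd_rate = simulate(n_cases=400, n_deaths=4, error_rate=0.01)
    print(f"mean estimate {100 * mean_rate:.2f}%, SD {100 * sd_rate:.2f}%")
```

With these illustrative inputs the mean estimate should come out close to 2%, roughly double the true rate of 1%, although the exact figures depend on the random seed.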
The CVSDB database concerns outcomes for 132 operation types. In our analysis, we restricted attention to operation types that had been performed at least 100 times according to the database and which had non-zero mortality, a total of 30 marker operations.
In addition, we carried out various thought experiments using data related to the 30 marker operations, depending on different assumptions made about the nature of errors.
In all cases, the analyses described above were also carried out using a mathematical model [11] and the results compared.
| 3. Results |
|---|
| 4. Discussion |
|---|
We have shown that the results of a simple thought experiment can provide very useful, if disturbing, insights into the dangers posed if too much faith is placed in the accuracy of databases and in the results of statistical analysis of their content.
The reason that outcome miscoding has such a major impact is clear once one thinks about it. For low mortality operations, if outcome miscoding occurs at random, then the vast majority of instances of miscoding will occur for survivors (since the majority of patients survive). The effect of this is a large proportional increase in the estimated number of deaths (since the actual number is relatively small). In mathematical terms, if one thinks of mortality rate as a fraction rather than as a percentage, outcome miscoding has a large proportional effect on its numerator (the number of deaths) and none on its denominator (the number of cases).
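A short worked calculation makes this concrete. It is a sketch under an assumption that is ours rather than stated above: each record's outcome is flipped (a death recorded as a survival, or vice versa) independently with the same probability $e$. The symbols $N$ (cases), $D$ (true deaths), $p = D/N$ (true mortality rate) and $\hat{D}$, $\hat{p}$ (their recorded counterparts), and the numerical values, are illustrative.

$$
\mathbb{E}[\hat{D}] = D(1-e) + (N-D)\,e,
\qquad
\mathbb{E}[\hat{p}] = \frac{\mathbb{E}[\hat{D}]}{N} = p(1-e) + (1-p)\,e = p + e(1-2p).
$$

For $p = 1\%$ and $e = 1\%$, this gives $\mathbb{E}[\hat{p}] = 0.01 + 0.01 \times 0.98 = 1.98\%$, nearly double the true rate. For $p = 0.25\%$ and the same $e$, it gives $0.0025 + 0.01 \times 0.995 \approx 1.25\%$, roughly a fivefold overestimate. The additive bias $e(1-2p)$ is almost constant for small $p$, which is why its proportional effect grows as the true mortality rate falls.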
The reason that random data omission has a relatively small effect is also apparent in retrospect by considering the mortality rate as a fraction. If data are lost at random, the numerator (the number of deaths) will on average be reduced by the same proportion as the denominator (the number of cases), leaving the value of the fraction unchanged. Of course this would not be the case if the rates of data loss for deaths and survivors were different, a likely clinical scenario, since one is far more likely to omit entry of an emergency salvage operation done in the middle of the night than to omit a surgical triumph.
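The same fraction-based reasoning can be written out, again as a sketch with illustrative symbols: suppose records of deaths are lost independently with probability $q_d$ and records of survivors with probability $q_s$. Using expected counts as a first approximation,

$$
\hat{p} \approx \frac{(1-q_d)\,D}{(1-q_d)\,D + (1-q_s)\,(N-D)}
= \frac{(1-q_d)\,p}{(1-q_d)\,p + (1-q_s)\,(1-p)}.
$$

When $q_d = q_s = q$, the factor $(1-q)$ cancels and $\hat{p} \approx p$. When deaths are lost more often than survivors, the estimate falls below the true rate: for example, with assumed values $p = 5\%$, $q_d = 20\%$ and $q_s = 2\%$, the formula gives $0.04 / (0.04 + 0.931) \approx 4.1\%$.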
The effect of miscoding of operation type is rather more difficult to explain. For the lowest risk procedures, the possibility that higher risk procedures would erroneously contribute to their tally of outcomes means that their mortality rate would be overestimated. On similar grounds, mortality rates for the highest risk operations would be underestimated. In some sense, this is a regression-to-the-mean effect. Under- or overestimation of mortality rate can also occur for procedures whose risks are between these two extremes, but the extent of this depends on the relative numbers of each type of case performed.
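A small expected-value calculation illustrates this mixing effect. The sketch below is not the paper's model: it assumes, purely for illustration, that each record is recoded with a single probability `miscode_rate` to one of the other operation types chosen uniformly at random, and it approximates the recorded mortality rate by the ratio of expected deaths to expected cases. The function name and the three operation types with their counts are hypothetical.

```python
def expected_recorded_mortality(cases, deaths, miscode_rate):
    """Approximate recorded mortality rate per operation type.

    cases, deaths: dicts mapping operation type -> true counts.
    Assumption: a record keeps its type with probability 1 - miscode_rate,
    otherwise it is recoded uniformly to one of the other types.
    """
    types = list(cases)
    n_other = len(types) - 1
    recorded = {}
    for target in types:
        expected_cases = 0.0
        expected_deaths = 0.0
        for source in types:
            if source == target:
                share = 1.0 - miscode_rate      # correctly coded records
            else:
                share = miscode_rate / n_other  # records miscoded into `target`
            expected_cases += share * cases[source]
            expected_deaths += share * deaths[source]
        recorded[target] = expected_deaths / expected_cases
    return recorded


if __name__ == "__main__":
    # Hypothetical true rates: 0.5%, 5% and 20%.
    cases = {"low risk": 1000, "medium risk": 500, "high risk": 200}
    deaths = {"low risk": 5, "medium risk": 25, "high risk": 40}
    for name, rate in expected_recorded_mortality(cases, deaths, 0.05).items():
        print(f"{name}: {100 * rate:.2f}%")
```

With these made-up numbers the low risk rate rises from 0.5% to about 0.66% and the high risk rate falls from 20% to about 17%, while the medium risk rate barely moves, in line with the argument that the direction and size of the bias depend on the relative numbers and risks of the operations involved.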
The consequences of this can only be imagined. There is currently enormous interest in assembling large-scale databases, particularly since the advent of risk scoring systems and methods for monitoring outcomes. The authors certainly support such efforts, indeed have contributed to such work [1–6,8]. However, the results reported here sound a loud note of caution and perhaps it is time for a reappraisal of the clinical database culture.
All the activity surrounding the amassing of clinical data has its genesis in the advent of computer systems and, to a certain extent, the culture of clinical research done while in training. Even modest laptop computers now allow easy access to powerful database and statistical analysis packages, which would have been impossible in, say, the 1970s. But computer packages being easy to access does not equate to them being used well or even sensibly. There is often a somewhat misplaced belief that if one gathers lots of data then, if analysed cleverly enough, they will reveal a new truth. Most medical statisticians will recognise the circumstances of being approached by a clinician doing research training, who has gathered a lot of data, compiled a database and wants advice about what to do with it next. Often such databases have many more data items than there are patients (a database thicker than it is deep in cynical statistical parlance). The ambitious early stages of a study often involve designing data collection forms and, all too often, the aim seems to be to identify as many things as possible that can be written down. In some sense this reflects travelling hopefully: surely, if there are lots of data, something will emerge. This view is wrong-headed; the reality is that the more data items that are collected, the more errors occur [7].
It must be stressed that our analysis has deliberately been chosen to be as simple as possible, very much in keeping with the ethos of thought experiments. The aim has been to establish a point of principle and to provide insights into the potentially damaging effects of errors in data. The aim has not been to quantify what error rates are in practice nor to reflect other subtleties associated with errors. In practice, the manner in which errors come about may well be far more complex than reflected by the assumptions made in our simple model, and it would be surprising if it were not. Our assumption of random errors reflects the sort of errors that might be made when inputting data using a computer keyboard and it is certainly plausible that such errors are made. However, there are many other mechanisms whereby errors can occur which are not completely random. Complete data may be more likely to be available for elective rather than emergency cases; centres where there is a research interest in data collection may perhaps be more thorough at checking data than others; and cases where several procedures are done are possibly more prone to miscoding than cases with just a single procedure. Thus we cannot claim that our model is an accurate representation of reality. However, if our simple thought experiments suggest that data errors can lead to very misleading analyses, it seems unlikely that a more detailed modelling exercise would reverse this view. Indeed it may well be that the assumption of random errors is conservative, since if in reality there are systematic or even deliberate errors, these might be expected to introduce even more bias.
For a given operation type, the results of each error-seeding simulation exercise give a thousand estimates of mortality rate which may well have a skew distribution. There are some who may question why we report the mean of this distribution rather than the median. There is a common, but sometimes misguided, view that medians are the appropriate method for summarising a skew distribution. However, there are circumstances, and this is one, where it is perfectly sensible to report the mean value of a skew distribution. Another example concerns summarising a poker player's nightly losses over a year. It would be disingenuous for him to tell his wife that the median loss was zero, when in fact he had just lost the house. In this case the mean loss per night, or the total loss, is a far more informative way of summarising outcome than the median. Statistical nuances aside, pragmatically, it is a fact that audit methods for monitoring cardiac surgery mortality are typically based on mean rates. If estimates of mean mortality rates are inaccurate, then this would have a direct effect on the quality of audit. This is why we have chosen to report mean rather than median values.
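The poker example can be checked with a few lines of arithmetic. The figures below are invented for illustration: 364 nights with no loss and a single night on which a house of assumed value 250,000 is lost.

```python
import statistics

# Hypothetical year of poker: 364 break-even nights and one catastrophic night.
nightly_losses = [0] * 364 + [250_000]

median_loss = statistics.median(nightly_losses)   # the typical night
mean_loss = statistics.mean(nightly_losses)       # average loss per night
total_loss = sum(nightly_losses)                  # what actually left the household

print(f"median {median_loss}, mean {mean_loss:.2f}, total {total_loss}")
```

The median is zero because more than half of the nights show no loss, whereas the mean and the total both carry the information that matters to the household.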
| 5. Conclusions |
|---|
Miscoding outcomes is particularly problematic. For most surgery, random miscoding of outcome can be expected to lead to overestimation of mortality rates. The expected scale of such overestimation is considerable, particularly for operations with a low mortality rate, and we have given a plausible mathematical rationale for why this would be expected. Our study suggests that random data omission has a smaller effect, and again there is a plausible mathematical rationale for this.
Miscoding operation types can also have a major impact on the expected accuracy of mortality rates, although the mathematical rationale for this is rather more opaque.
We strongly advocate the careful compilation and checking of data concerning clinical outcomes since, as this paper shows, even relatively small error rates in data gathering and coding can render clinical databases unfit for purpose. One should decide ahead of time precisely what it is that a database is going to be used for, what analyses will be done and what data items those analyses need. Having done that, one should focus data collection and the logic of the data coding only on those data items [17]. The sparser the data design the better, so long as the analyses specified prospectively can be achieved, whether they are associated with registry data or research data.
Even with a sparse data design, steps need to be taken to eliminate errors and there are several methods by which this can be done. It is worth considering entering data twice, preferably by different coders, then comparing the two versions to detect discrepancies. It can also be useful for coders to work in pairs, one to read from a computer printout of the data, the other to check this against the forms from which the data were extracted. A random selection of data items should be downloaded and checked against original records.
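The double-entry check described above amounts to a field-by-field comparison of two independent codings. The following Python sketch shows one way this might be done; the function name, record identifiers and field names are hypothetical and do not reflect the structure of CVSDB.

```python
def find_discrepancies(entry_a, entry_b):
    """Compare two independent codings of the same forms.

    Each argument maps a record identifier to a dict of field -> value.
    Returns a list of (record_id, field, value_a, value_b) for every
    field on which the two coders disagree, including missing entries.
    """
    discrepancies = []
    for record_id in sorted(set(entry_a) | set(entry_b)):
        record_a = entry_a.get(record_id, {})
        record_b = entry_b.get(record_id, {})
        for field in sorted(set(record_a) | set(record_b)):
            value_a = record_a.get(field)   # None if this coder left it out
            value_b = record_b.get(field)
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


if __name__ == "__main__":
    coder_a = {
        101: {"operation": "ASD/secundum repair", "died": False},
        102: {"operation": "Arterial switch", "died": False},
    }
    coder_b = {
        101: {"operation": "ASD/Primum repair", "died": False},
        102: {"operation": "Arterial switch", "died": True},
    }
    for item in find_discrepancies(coder_a, coder_b):
        print(item)
```

Each reported discrepancy still has to be resolved against the original form, since the comparison shows only that the two coders disagree, not which of them is right.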
Our experience with the CVSDB database has been that it is useful to hold a monthly review to confirm the entries for that month with the operating room team, clinical coordinator and surgeons' secretaries, and to circulate a summary report asking for feedback regarding any errors or omissions. It has also been found that frequent use of the database by many people is a good quality check. So, a requirement of a clinical database is that it should be easy to use and accessible to all appropriate staff.
Even with such labour-intensive methods, it is unlikely that errors will be completely eradicated. In view of this, it is wise to adopt a more sceptical attitude to quantitative results, particularly in relation to rates that are small.
The analysis presented here can clearly be applied beyond the realms of surgical mortality: it concerns any adverse event whose frequency is low.
| Appendix A |
|---|
Dr T. Treasure (London, United Kingdom): We should remember that databases are extremely difficult to create, to monitor and to use, but they are easy to criticise. We are all very aware of that. In our meeting in 2004 Bill Williams as EACTS Honoured Guest gave us a superb talk on this subject and reminded us of the fundamental differences between a research database and a registry.
I have two questions. This flaw you describe, this danger, seems to be inherent in the registry, and yet registries are not subject to the same quality control as research databases, perhaps exemplifying the hazards. The other one is a slightly theoretical question. This fascinating way in which the arterial switch operation remains on the line while the other two examples diverge is the reverse of regression to the mean and perhaps we could call it divergence from the mean and is the inevitable consequence of random errors.
Dr Stark: I think it's true that registries are not subject to the same quality control as research databases. It's very difficult to control any database. It's time-consuming and very costly. The mathematical co-author, Steve Gallivan, always emphasises that you really have to set up what you want to get out of the database and what analysis you will use. You will then know what data to collect. For the registry, you should collect as few data as possible, but even if your dataset is only 20–25 items, it is still very important to check, because, as we have shown, a small error of 1% can increase your mortality estimates by five times.
Dr C. Tchervenkov (Montreal, Canada): This is a very important experiment you did and it demonstrates the limitations when data is inaccurate in either direction, but this was based on one institutional experiment, so it perhaps has more relevance to single-institution studies that are presented. We know that the research database is very extensive. Once you collect the patients, then you can go up to 500–700 variables, but sometimes if the database out of which you pull out this clinical study is inaccurate to start with, then you don't have the true denominator. So now I'm leading to the question that leads to the relevance of this study to multi-institutional registries and databases. Have you done an experiment across multiple institutions to see what impact these errors have on the overall registry type of database? In other words, could you speculate what you might find? Going from the transposition example where it was fairly accurate because maybe there were errors in both directions so it was close to the average, is it possible that perhaps in a multi-registry database when you combine all these errors in different directions that perhaps actually they might be more accurate than the single institution, or actually they might be even less accurate?
Dr Stark: My speculation would be that multi-institutional registries would be far less accurate, because to control one database is difficult enough. If you have people collecting data from many institutions, in general, I think it would be more likely to have more errors. However, the reason why we used this database was not that we assumed that it was very pristine and accurate, because even if it was inaccurate, it would still demonstrate our point. We have actually done mathematical modelling, but we thought that to present it to a surgical audience would be more difficult. So to make it more understandable, we have used this database. What is interesting is that the mathematical modelling very nicely corresponded to our experiments with the database. As we point out in the paper, this model is available free of charge on the Internet, so if you want to use it for your multi-institutional database, you can do it yourself.
Dr W. Brawn (Birmingham, United Kingdom): It's fantastic that you've highlighted really the cost involved, because all this work involves money and it's very difficult for our institution to get that, and also highlighted that these databases, if you use them as research tools as well, that helps to clean them and they become living organisms and you can continually review the data and make sure it is accurate, and I think that needs to be accented, as you have done and as Bill Williams has done.
Dr Stark: I think I should just mention that the impetus for this paper was the Bristol Royal Infirmary Inquiry, which used two available databases for assessment of the institution and the surgeons. It was found during the investigation that these databases were inaccurate, and that for some operations the mortality rate in one was as much as twice the mortality rate recorded in the other database. So I think one has to be very careful when the databases, even when they are well collected, are used for the analysis of surgeons' performance on the basis of which the surgeons may be suspended.
Dr B. Maruszewski (Warsaw, Poland): As a person responsible for the EACTS Congenital Database, I cannot be more grateful. I am really very grateful for what you said. I think it is very important that people realise that any data collection which is not verified in a professional way is not valid and can be misused and can be used against our profession. That's why I think that verification of data at the various levels, including the local institutional level, including the automatic and computerised level, up to visiting the sites and checking 100% of the data is of the greatest importance. I think our association has shown that although we take responsibility for collecting and possessing enormous amounts of data, we realise how important it is to validate and verify this data, and, as you know, we have published it and the process continues. At the moment, 20% of the content of the European congenital surgeons database is completely verified 100% at the sites. I think it's a very important message to send.
| Footnotes |
|---|
Presented at the 21st Annual Meeting of the European Association for Cardio-thoracic Surgery, Geneva, Switzerland, September 16–19, 2007.
| References |
|---|