Quality in Primary Care

Principles of Quality Management - (2006) Volume 14, Issue 2

Statistics and reality: part 2

Davis Balestracci MS*

Harmony Consulting, Portland, USA

Corresponding Author:
Davis Balestracci
Harmony Consulting
94 Ashley Ln., Portland
ME 04103–2789, USA
Tel: +207 899 0962
E-mail: davis@dbharmony.com
Website: www.dbharmony.com

Received date: 18 January 2006; Accepted date: 10 March 2006


Abstract

In Part 1, after discussing why a process-orientated view of statistics is necessary, the conclusion reached was that the best statistical analysis encourages the art of asking better questions in response to variation in data.1 The purpose of this article is to debunk the myth that statistics can be used to ‘massage’ data and prove anything. It will demonstrate the counterintuitive simplicity and power of merely ‘plotting the dots’ – simple time plots of process outputs. These alone usually yield far more profound questions than the most complicated (alleged) statistical analysis.

Keywords

common cause, common cause strategy, run chart, special cause, stable process, tampering, variation

Old habits die hard

Suppose that it is time for the dreaded quarterly meeting about ‘how you’re doing’ regarding your system’s three hospitals’ infection rates. The summary table is shown in Table 1 (the data are fictitious). But first, of course, one is ‘obligated’ to show the data through what this author has found to be the most useless of tools: the bar graph (Figure 1).


Table 1: ‘Statistical’ comparison of three hospitals’ infection control performance


Figure 1: Charts showing three hospitals’ infection control performance

So ... how are you doing?

Luckily, your local statistical ‘guru’, a ‘Six Sigma black belt’, has come to your rescue and written a report, which is handed out. Several ‘significant’ findings are shared:

1 ‘Pictures are very important. A comparative histogram was done to compare the distributions of the three hospitals’ infection rates (see Figure 2). There seem to be no differences among the hospitals; however, the appearance of bell shapes suggests that we can test the normal distribution hypothesis so as to be able to perform more sophisticated statistical analyses.


Figure 2: Histogram comparison of the three hospitals

2 The three data sets were statistically tested for the assumption of normality. The results (not shown) indicated that we can assume them to be normally distributed (P values of 0.889, 0.745 and 0.669, respectively, all of which are > 0.05); however, we have to be cautious – just because the data passed the test for normality does not necessarily mean that the data are normally distributed ... only that, under the null hypothesis, the data cannot be proven to be non-normal.

3 Since the data can be assumed to be normally distributed, I proceeded with the analysis of variance (ANOVA) and generated the 95% confidence intervals (see Figure 3).


Figure 3: One-way analysis of variance

4 The P value of 0.897 is greater than 0.05. Therefore, we can reasonably conclude that there are no statistically significant differences among the hospitals, as further confirmed by the overlapping 95% confidence intervals.’
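For readers who want to see how a report like this is typically produced (before seeing why it misleads), here is a minimal sketch in Python. The scipy calls are standard, but two things are assumptions: the article does not name which normality test was used (Shapiro–Wilk is used here), and the three arrays are placeholders only, since the article’s fictitious monthly rates are not reproduced.

    # Sketch of the 'guru' analysis: normality tests followed by one-way ANOVA.
    # The three arrays are placeholders standing in for 30 monthly infection
    # rates per hospital; the article's fictitious data are not reproduced.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    hospital_1 = rng.normal(4.0, 1.0, 30)
    hospital_2 = rng.normal(4.1, 1.0, 30)
    hospital_3 = rng.normal(3.9, 1.0, 30)

    # Shapiro-Wilk normality test: P > 0.05 means only 'cannot be proven
    # non-normal', not 'proven normal'.
    for i, rates in enumerate([hospital_1, hospital_2, hospital_3], start=1):
        w, p = stats.shapiro(rates)
        print(f"Hospital {i}: Shapiro-Wilk P = {p:.3f}")

    # One-way ANOVA across the three hospitals; a large P value is then
    # (mis)read as 'no differences among the hospitals'.
    f, p = stats.f_oneway(hospital_1, hospital_2, hospital_3)
    print(f"ANOVA: F = {f:.2f}, P = {p:.3f}")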

Let’s see ... have we used all the potential jargon?

Mean ... median ... standard deviation ... trimmed mean ... quartile ... normality ... histogram ... P value ... analysis of variance (ANOVA) ... 95% confidence interval ... null hypothesis ... statistical significance ... Standard Error of the Mean ... F test ... degrees of freedom ...

Oh, by the way ... did you know that this analysis is totally worthless?

The good news is that you can forget most of the statistics that you have previously been taught. I can hear you all reassuring me, ‘Don’t worry ... we already have!’. Unfortunately, we remain a ‘maths phobic’ society, but there is no choice – whether or not we understand statistics, we are already using statistics!

We could think of an infection as ‘undesirable variation’ from an ideal state of ‘no infections’. By acting on or reacting to this ‘variation’, you instinctively gather ‘data’ (hard or soft) upon which to assess the situation and take action to eradicate the variation. You have just used statistics – reacting to a perceived undesirable gap of variation from a desired state to close the gap!

However, there are two types of variation – common cause and special cause – and treating one type as the other, as is commonly done, actually makes things worse. As the famously curmudgeonly W Edwards Deming once said, ‘For every problem, there is a solution – simple, obvious ... and wrong!’.

But there is actually more good news. The statistics you need for improvement are far easier than you could ever have imagined. However, this philosophy will initially be quite counterintuitive to most of you and very counterintuitive to the people you work with, especially physicians trained in the statistics of research and clinical trials, which, unfortunately, are not appropriate for most quality improvement situations! If nothing else, it will at least make your jobs easier by freeing up a lot of time, by helping you recognise when to walk out of time-wasting meetings! It will also help you gain the cultural respect you deserve as quality professionals, because your data collections and analyses will be simpler and more efficient ... and more effective! The respect will also be magnified because you will be able to stop inappropriate responses to variation that would make people’s jobs more complicated without adding any value to the organisation.

Back to the three hospital infection rate data

There are three questions that should become a part of every quality professional’s vocabulary whenever faced with a set of data for the first time:

1 how were the data defined and collected ... and were they collected specifically for the current purpose?

2 were the systems that produced these data stable?

3 were the analyses appropriate, given the way the data were collected and the stability state of the systems?

How were the data collected?

Table 1 is a descriptive statistical summary, for each hospital, of 30 numbers collected monthly. At the end of each month, each hospital’s computer calculates the infection rate from the incidents observed and normalises it by the number of patient days. No one has looked at the calculation formula for the last 10 years.
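The article does not give the formula buried in each hospital’s computer. Purely as an assumption, a typical normalisation expresses the month’s infections as a rate per 1000 patient days, along the lines of this sketch:

    # Assumed normalisation (the actual formula is not stated in the article):
    # infections expressed as a rate per 1000 patient days for the month.
    def monthly_infection_rate(infections: int, patient_days: int) -> float:
        return infections / patient_days * 1000

    # Example: 12 infections over 3500 patient days is roughly 3.4 per 1000 patient days.
    print(monthly_infection_rate(12, 3500))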

Were the systems that produced these data stable?

This might be a new question for you. As was made clear in Part 1, everything is a process.1 All processes occur over time. Hence, all data have a ‘time order’ element that allows one to assess the stability of the system producing the data. Otherwise, many statistical tools can become useless and put one at risk of taking inappropriate actions. Therefore, it is always a good idea, as an initial analysis, to plot the data in their naturally occurring time order.
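A minimal sketch of such an initial time plot, using Python and matplotlib, is shown below; the monthly values are placeholders rather than the article’s data.

    # 'Plot the dots': a simple time plot of one hospital's monthly rates.
    # The values are placeholders; the article's fictitious data are not reproduced.
    import matplotlib.pyplot as plt

    monthly_rate = [4.1, 3.8, 4.4, 4.0, 3.6, 4.2, 4.5, 3.9, 4.1, 3.7, 4.3, 4.0]

    plt.plot(range(1, len(monthly_rate) + 1), monthly_rate, marker="o")
    plt.xlabel("Month")
    plt.ylabel("Infection rate")
    plt.title("Time plot of monthly infection rate")
    plt.show()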

Were the analyses appropriate, given the way the data were collected and stability state of the systems?

‘But ... the data passed the normal distribution test. Isn’t that all you need to know?’ you ask.

And your local ‘guru’ also concluded that there were no statistically significant differences among the hospitals. Well, now consider the three simple time plots for the individual hospitals (see Figure 4).


Figure 4: Time plots for the three hospitals

No difference?! ...

Note that just by ‘plotting the dots’, you have far more insight and are able to ask more incisive questions whose answers will lead to more productive system improvements.

Compare this to having only the bar graphs, summary tables and the ‘sophisticated’ statistical analyses. What questions do you ask from those? Would they even be helpful? ‘Unfortunately’, you are all smart people. You will, with the best of intentions, come up with theories and actions that could unwittingly harm your system. Or, worse yet, you might do nothing because ‘there are no statistical differences’ among the systems. Or, you might decide, ‘We need more data’.

Regarding the computer-generated ‘statistics’, what do the ‘averages’ of Hospital 1 and Hospital 2 mean? I’ll tell you: ‘If I stick my right foot in a bucket of boiling water and my left foot in a bucket of ice water, on average, I’m pretty comfortable’. It is inappropriate to calculate averages, standard deviations, etc. on unstable processes.

Also, note that you haven’t calculated one statistic, yet you have just done a powerful statistical analysis! To summarise: ‘plot the dots!!!’.

More on ‘plotting the dots’: common and special causes

Almost all quality experts agree that merely plotting a process’ output over time is one of the most simple, elegant and awesome tools for gaining a deep understanding of any situation. Before one can plot, one must ask questions, clarify objectives, contemplate action and review the current use of the data. Questioning from this statistical thinking perspective leads immediately to an unexpected, deeper understanding of the process. This results in establishing baselines for key processes and then allows honest dialogue to determine meaningful goals and actions.

A more typical process is to impose arbitrary numerical goals that are ‘retrofitted’ onto the process and ‘enforced’ by exhortation that treats any deviation of process performance from the goal as unique and needing explanation – known as a special cause strategy. Paraphrasing the question and putting specific undesirable events into the context of observing a process over time:

Is this an isolated excessive deviation (‘special cause’) or, when compared to the previous measurement of the same situation, does it merely reflect the effects of ongoing actions of process inputs that have always been present and can’t be predicted ahead of time (‘common cause’)? Would I necessarily expect the exact same number the next time I measure? If not, then how much difference from the current or previous number is ‘too much’?

It is very important to realise that just because one can explain an occurrence ‘after the fact’ does not mean that it was ‘unique’. Thinking in terms of process, you have inputs causing variation that are always present and conspire in random ways to affect your process’ output; however, they conspire to produce a predictable range of possible outputs. Many explanations merely point out things that have been waiting to happen ... and will happen again at some random time in the future! Also, your process ‘prefers’ some of these ‘bogeys’ to others. So, how can you collect data to find these deeper solutions to reduce the range of variation encountered by customers (‘common cause’ strategy)?

So, if a process fluctuates within a relatively fixed range of variation, it is said to exhibit common cause variation and it can be considered ‘stable’ and predictable – although one may not necessarily like the results. If there is evidence of variation over and above what seems to be inherent, the process is said to exhibit special cause variation. This usually occurs in one of two ways: either isolated single data points that are totally out of character in the context of the other data points, or a distinct shift (or shifts) in the process level due to outside interventions (intentional or unintentional) that have now become part of the everyday process inputs.

The most common error in improvement efforts is to treat common cause (inherent) variation as if it were special cause (unique) variation. This is known as tampering and will generally add more complexity to a process without any value. In other words, despite the best of intentions, the improvement effort has actually made things worse.

The ‘reward’ luncheon

You have been invited to a free pizza lunch in celebration of meeting a safety goal. Two years ago, your organisation had 45 undesirable ‘incidents’ and set a goal for the past year of reducing them by at least 25%. The December data are in, and the yearly total was 32 incidents – a 28.9% decrease!

Various graphs were used to prove that the goal had been met.

Figure 5, the obligatory bar graph display, shows that in eight months of the second year adverse incidents were lower than in the corresponding months of the previous year – ‘obviously’ an improvement!


Figure 5: Undesirable incident data: two years, plotted by month

However, as Figure 6 so ‘obviously’ shows, the improvement was much better than originally thought! The local statistical ‘guru’ did a trend analysis (Thank God for Excel!), which showed a 46.2% decrease! The ‘guru’ also predicts 20 or fewer accidents for the next year.


Figure 6: Trend analysis of undesirable incident data: 4.173 to 2.243 – a 46.2% decrease!
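For readers curious how a figure like this is produced, the sketch below fits a least-squares line to 24 monthly counts and quotes the drop between the fitted endpoints. The counts are hypothetical, chosen only so that the yearly totals match the stated 45 and 32; the article’s own fitted endpoints were 4.173 and 2.243, which is where the quoted 46.2% comes from.

    # Sketch of the naive 'trend analysis' behind Figure 6: fit a straight line
    # to the 24 monthly counts and report the drop between the fitted endpoints.
    # The counts are hypothetical, chosen only so the yearly totals are 45 and 32.
    import numpy as np

    counts = [4, 3, 5, 2, 6, 3, 4, 3, 5, 2, 4, 4,
              3, 2, 4, 1, 3, 0, 5, 2, 3, 4, 2, 3]
    months = np.arange(1, 25)

    slope, intercept = np.polyfit(months, counts, 1)
    start, end = intercept + slope * 1, intercept + slope * 24
    print(f"Fitted 'decrease': {(start - end) / start:.1%}")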

You think of all the hard work in the monthly safety meetings where each individual incident is dissected and discussed to find root causes. Then there are the months when you have zero incidents, and the reasons for this are discussed and acted upon. It all paid off ... or did it?

Run charts

Imagine these data as 24 observations from a process. The chart in Figure 7 is known as a run chart – a time-ordered plot of the data with the overall data median drawn in as a reference. The median is the empirical ‘midpoint’ of a dataset, irrespective of time order: half of the data values are higher than the median value and half are lower. In the present case, the median is 3 (you could sort these 24 observations from lowest to highest and the median would be the average of the 12th and 13th observations in the sorted sequence – both of which happen to be 3).
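A run chart along these lines can be sketched as below. The 24 counts are the same hypothetical placeholder series used in the trend-analysis sketch above (chosen only so the yearly totals are 45 and 32, not the article’s actual monthly values); for these values np.median also returns 3, performing exactly the 12th/13th-value averaging just described.

    # Run chart sketch: time-ordered counts with the median as a reference line.
    # Counts are hypothetical placeholders (yearly totals 45 and 32).
    import numpy as np
    import matplotlib.pyplot as plt

    counts = [4, 3, 5, 2, 6, 3, 4, 3, 5, 2, 4, 4,
              3, 2, 4, 1, 3, 0, 5, 2, 3, 4, 2, 3]

    median = np.median(counts)   # average of the 12th and 13th sorted values
    plt.plot(range(1, len(counts) + 1), counts, marker="o")
    plt.axhline(median, linestyle="--", label=f"median = {median:g}")
    plt.xlabel("Month")
    plt.ylabel("Incidents")
    plt.legend()
    plt.show()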


Figure 7: Run chart for accident data January 1989 to December 1990

The initial run chart of a situation always ‘assumes’ no change was made (‘innocent until proven guilty’). A deceptively powerful set of three rules based in statistical theory can be applied to a chart like this to see whether your ‘special cause’ (you did intervene for the specific purpose of creating a change in the average, didn’t you?) did indeed affect the level of the process.

So, the question in this case becomes, ‘Is the process that produced the 12 data points of the second year the same as the process that produced the data points of the first year?’. The three statistical tests, called a runs analysis, give no evidence of a change.2 Thus, despite the (alleged) achievement of what was seen as an aggressive goal, there is no statistical evidence of it having been met. It just goes to show you: ‘given two different numbers, one will be bigger!’.
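A hedged sketch of such a runs analysis is given below. The exact thresholds differ between textbooks, and reference 2 should be consulted for the rules the author intends; the values used here – eight consecutive points on one side of the median for a ‘shift’ and six consecutively increasing or decreasing points for a ‘trend’ – are commonly quoted, and the ‘number of runs’ rule normally compares the run count with a published table, which is omitted.

    # Hedged sketch of a runs analysis on a run chart. Rule thresholds vary by
    # textbook; 8-point shifts and 6-point trends are commonly quoted values,
    # and the 'number of runs' rule normally compares n_runs to a published table.
    import numpy as np

    def runs_analysis(values, shift_len=8, trend_len=6):
        values = np.asarray(values, dtype=float)
        median = np.median(values)

        # Points exactly on the median are conventionally ignored for run counting.
        off_median = values[values != median]
        above = off_median > median

        # Rule 1: a 'shift' -- shift_len or more consecutive points on one side.
        shift = any(
            len(set(above[i:i + shift_len])) == 1
            for i in range(len(above) - shift_len + 1)
        )

        # Rule 2: a 'trend' -- trend_len or more consecutively increasing
        # (or decreasing) values in time order (ties dropped).
        signs = np.sign(np.diff(values))
        signs = signs[signs != 0]
        trend = any(
            len(set(signs[i:i + trend_len - 1])) == 1
            for i in range(len(signs) - trend_len + 2)
        )

        # Number of runs about the median: 1 + number of side changes.
        n_runs = 1 + int(np.sum(above[1:] != above[:-1])) if len(above) else 0

        return {"median": float(median), "shift": shift, "trend": trend, "runs": n_runs}

    # Example: runs_analysis(counts) for the 24 monthly incident counts.

For the 24 monthly counts behind Figure 7 (median 3), the article reports that none of these rules signals a change.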

However, even among people who agree either ‘yes’ or ‘no’, there could still be as many different ways of coming to this conclusion as there are people reading this, which would result in differing proposed actions – and a lot of formal meeting time! What is needed is a common approach for quickly and appropriately interpreting the variation through statistical theory.

Thus, given two numbers (45 and 32), one was smaller – and it also happened, coincidentally, to meet an aggressive goal. The ‘year-over-year’ and ‘trend’ analyses were inappropriate – and very misleading.

And it suddenly hits you: The result of all the hard work in the monthly safety meetings has been no improvement over two years and a lot of unneeded new policies!

In fact, if you continue to use this strategy (treating common cause as if it were special cause), you will observe between 20 and 57 accidents the following year!
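The article does not show how this range was obtained. One plausible reconstruction, offered only as an assumption, treats the two yearly totals as counts from a stable process and applies the usual three-sigma limits for counts (mean ± 3√mean), which reproduces the quoted range almost exactly:

    # Assumed reconstruction of the '20 to 57 accidents' prediction: treat the
    # two yearly totals as counts from a stable process and apply c-chart-style
    # limits, mean +/- 3 * sqrt(mean). The article does not state its method.
    from math import sqrt

    yearly_totals = [45, 32]
    mean = sum(yearly_totals) / len(yearly_totals)   # 38.5
    lower = mean - 3 * sqrt(mean)                    # about 19.9
    upper = mean + 3 * sqrt(mean)                    # about 57.1
    print(f"Expected range for next year: {lower:.1f} to {upper:.1f}")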

So ... does ‘common cause’ mean we have to live with it?

Not at all. In the case of these data, people were treating data from a stable process exhibiting common causes of variation as if there were special causes of variation. Any observed differences were due totally to chance. Looking at individual numbers or summaries and calling any differences ‘real’ is a no-yield strategy, as is looking at accidents individually. Once again, this is treating common cause as if it were special cause: tampering. Statistics on the number of accidents do not prevent accidents.

A common cause strategy looks for underlying patterns producing the data – a statistical ‘slicing and dicing’, if you will – to try to expose process inputs that could be accounting for a significant source of the process’ variation.

In the case of the adverse event data, one might ask, ‘Is there a commonality among all the high-event months ... or the low-event months ... or the months where there were zero events? Are some accidents unique to certain departments? Do some accidents occur in all departments? Does one department exhibit a disproportionate total of accidents because its safety policy enforcement process is sloppy overall?’ These questions address process patterns that are exerting their influence consistently as inputs to the safety process. Neither the monthly data points nor individual accidents should be treated uniquely in isolation. It is only by looking at the aggregated factors contributing to all 77 accidents that opportunities in the underlying process inputs will be exposed.
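A common cause strategy therefore works on the aggregated incident records rather than on each incident in turn. Here is a minimal sketch using pandas; the records and column names are hypothetical, standing in for whatever fields a real incident log provides.

    # Common cause strategy sketch: aggregate all incidents and look for patterns
    # (by department, by category, by month) instead of dissecting each one.
    # The records and column names below are hypothetical.
    import pandas as pd

    incidents = pd.DataFrame({
        "department": ["A", "B", "A", "C", "B", "A", "C", "A"],
        "category": ["fall", "medication", "fall", "fall",
                     "medication", "needle", "fall", "fall"],
    })

    # Pareto-style views: which departments and categories account for most of
    # the aggregated incidents?
    print(incidents["department"].value_counts())
    print(incidents.groupby(["department", "category"]).size()
                   .sort_values(ascending=False))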

Think of an ‘accident’ as: a hazardous situation that was unsuccessfully avoided!

It is a common approach to have ‘incident reviews’ every month, go over every single incident individually – in essence, ‘scraping it like a piece of burnt toast’ – and make a recommendation after each review. Can you see that this treats each incident as if it were a special cause?

Smart people have no trouble finding ‘reasons’ if they look hard enough ... after the fact. There is a high risk of treating spurious correlation as cause-and-effect, which only adds unneeded complexity to the current process, but no value.

This also has implications for the current issue of ‘sentinel event’ analysis without asking the question:

Was this an individually isolated event (special cause) or is the process such that it has been waiting to happen because the process inputs all randomly aligned to make it happen (common cause) ... which means that it could happen again?

Summary

As quality professionals, it is important to realise that data analysis goes far beyond the routine statistical ‘crunching’ of numbers and useless bar graph displays. The greatest contribution to an organisation is getting people to understand and use a process-oriented context in analysing situations, as well as the principles of good, simple, efficient data collection, analysis and display. This cannot help but enhance healthcare quality professionals’ credibility. It will also help gain the confidence and co-operation of the organisational culture during stressful transitions and investigations. It will be vital to put a stop to many of the current well-meaning but ultimately damaging ad hoc uses of statistics. Whether or not people understand statistics, they are already using statistics ... and with the best of intentions.

As a final summary, Box 1 shows the key lessons to keep in mind as you start looking at your organisation through a lens of statistical thinking, and Box 2 and Table 2 summarise the statistical mindsets of Parts 1 and 2 in ‘The 10 Fundamentals of Variation’ and ‘Four Common Statistical Traps’, respectively.


Box 1: Key lessons


Box 2: The fundamentals of variation


Table 2: Four common statistical traps

Further Reading

Balestracci D. Data ‘Sanity’: statistical thinking applied to everyday data. Special Publication of the American Society for Quality Statistics Division, Summer 1998. (Write to the author for a PDF copy or download from https://deming.eng.clemson.edu/pub/den/deming-papers.htm)

References