Displaying Variation

As with all aspects of improvement work, expect to carry out multiple passes through data representation and interpretation as the quality of your understanding develops and additional insights into the data arise.

The key difference between improvement data interpretation and “regular” management data interpretation is that the regular view is concerned with aggregation into totals and averages, whereas the improvement view is concerned with differences within and between the data.

This implies that your data is available in a sufficiently granular form to allow these differences to be brought out, which is one of the considerations of data collection design.

The fundamental dimension that we are interested in exploring for improvement purposes is variation over time: how does the process perform on successive iterations, or over consecutive time periods (e.g. daily, weekly, monthly)? We can represent either the absolute variation, or the difference between forecast and actual (which is useful for tracking volumes).
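As a minimal sketch of the forecast-versus-actual view, the difference can be computed period by period before it is plotted. The volumes and the `deviations` helper below are invented for illustration and do not come from any real process.

```python
# Hypothetical weekly volumes; the figures are invented for illustration only.
forecast = [120, 125, 130, 128, 135]
actual = [118, 131, 127, 140, 133]


def deviations(forecast, actual):
    """Return actual minus forecast for each period, keeping the time order."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must cover the same periods")
    return [a - f for f, a in zip(forecast, actual)]


weekly_deviation = deviations(forecast, actual)
print(weekly_deviation)
```

Plotting these deviations in time order, rather than the raw volumes, keeps the focus on how far the process strays from expectation in each period.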

[Figure: run chart]

Run Chart

We can use two graphical tools to explore this variability. The first is the common-or-garden run chart, which simply depicts the data in time sequence.

Conventionally, run charts are drawn as line charts (no vertical bar charts, please). The run chart enables us to get a first impression of the range of variability and of patterns and systematic changes over time.
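For a quick first look without a plotting tool, the time sequence can be roughed out in plain text. This is only a sketch: the `text_run_chart` helper and the readings are hypothetical, and a proper run chart would join the points with a line.

```python
def text_run_chart(values, height=8):
    """Render values in time order: columns are time, rows are value levels."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    # Map each value to a row index: 0 is the bottom row, height - 1 the top.
    rows = [round((v - lo) / span * (height - 1)) for v in values]
    lines = []
    for level in range(height - 1, -1, -1):
        line = "".join("*" if r == level else " " for r in rows)
        lines.append(line.rstrip())
    return "\n".join(lines)


# Invented readings, listed in the order they were collected.
readings = [52, 61, 48, 67, 55, 63, 49, 58, 66, 51]
print(text_run_chart(readings))
```

The data must stay in collection order. Sorting or grouping the values before charting would destroy the very pattern the run chart is meant to show.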

However, the run chart alone does not give us sufficient evidence to determine whether such variations are inherent in the process or are evidence of changes to the process. It invites questions that it cannot answer:

  • Why is point 7 so low?
  • Is there a trend from point 7 to point 13?
  • Why the big drop from point 13 to point 14?
  • What will happen at point 20?

It is human nature to read meaning into the ups and downs, but that meaning is unfounded without proper statistical backing.

To make such distinctions, we need to understand the statistical “spread” of the process. The initial graphical representation for doing this is the histogram, in which we count the number of occurrences that fall into fixed-width intervals.
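The counting step can be sketched as follows. The `bin_counts` helper, the interval width, and the data are assumptions made for illustration. Intervals are taken as closed on the left and open on the right, which is one common convention rather than the only one.

```python
import math


def bin_counts(values, width):
    """Count values per fixed-width interval [edge, edge + width).

    Returns the lower edge of the first interval and a list of counts,
    one per consecutive interval, including empty ones.
    """
    start = math.floor(min(values) / width) * width
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return start, counts


# Invented observations, binned into intervals of width 10.
start, counts = bin_counts([12, 15, 17, 21, 22, 23, 29, 41], width=10)
for i, count in enumerate(counts):
    print(f"{start + i * 10:>3} to {start + (i + 1) * 10:<3} {'#' * count}")
```

Empty intervals are kept in the result so that gaps in the data remain visible when the counts are drawn.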

Histogram

[Figure: run chart with histogram]

While we are plotting our run chart, we can count up how many points fall within equally spaced value ranges to get a picture of the total performance of the process over the data collection period.

As you can see in the chart at left, the number of dots in the histogram portion is just the count of how many points fall into each value range as demarcated by the dotted lines.

Bear in mind that the histogram "forgets" the order of the data. You can't tell from a histogram whether the process changed during the course of the data collection period. If the data have been collected over an extended period, you should always review the matching run chart alongside the histogram.
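A small sketch makes the point concrete: the same readings in two different time orders give identical histograms. The data are invented, and the `histogram` helper is a hypothetical stand-in for whatever binning you use.

```python
from collections import Counter


def histogram(values, width):
    """Count values per fixed-width interval, keyed by the interval's lower edge."""
    return Counter((v // width) * width for v in values)


# Invented data: the same ten readings in two different time orders.
mixed = [52, 61, 48, 67, 55, 63, 49, 58, 66, 51]
drifting = sorted(mixed)  # low values first, high values last: a steady upward drift

same_histogram = histogram(mixed, 5) == histogram(drifting, 5)
print(same_histogram)
```

The first sequence would look stable on a run chart, while the second would show a process that is clearly moving, yet the histogram cannot tell them apart.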

[Figure: histogram]

Conventionally, histograms are rotated by 90 degrees and use bars rather than dots to represent the number of observations in each band.

Looking at the histogram will give us some idea of the consistency of the process, although there are some practical issues involved in creating and interpreting the histogram with smaller data sets.

Histograms come into their own as a way of summarising the variation in large data sets.

Box Plots

There is another tool that is most useful for carrying out a quick visual comparison between groups of variable data.

The box plot is based on the median (midpoint) of the data set. The "box" part of the plot brackets the middle 50% of the data, and the "whiskers" estimate the amount of variation expected within the data set based on the observed values.
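The numbers behind a box plot can be sketched with the standard library. The text above does not say how the whiskers are calculated, so this example assumes one widely used convention: whiskers reach to the most extreme observations lying within 1.5 times the box height (the interquartile range) of the box edges. The `box_plot_stats` helper, the quartile method, and the data are likewise assumptions.

```python
import statistics


def box_plot_stats(values, whisker_factor=1.5):
    """Summarise values as median, box edges, whisker ends, and outliers.

    Assumes the 1.5 x IQR whisker convention; other conventions exist.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of the data
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Invented readings with one unusually high value.
stats = box_plot_stats([48, 49, 51, 52, 55, 58, 61, 63, 66, 95])
print(stats)
```

Under this convention, any observation beyond the whiskers is drawn as a separate point, which flags it for a closer look without letting it stretch the box.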

[Figure: box plot and histogram]

The exhibit at left shows the box plot that matches the run chart and horizontal histogram.

The histogram is generally more informative than the box plot, but the box plot is useful for side by side comparisons as we will see in root cause analysis.