# When Not to Connect the Dots

When plotting a sequence of points, should we connect the dots into a line? We usually do, but it shouldn’t be a foregone conclusion. Every chart element should have a clear and precise meaning: if we can’t explain what an element means, or if its meaning is ambiguous, it confuses readers and we should omit it.

The bulk of the SPC literature shows Control Charts as broken-line graphs. A hundred years ago, Walter Shewhart, the inventor of these charts, plotted separate points instead. He did not explain why, so it’s on us to try to figure out what his reasons may have been.

# Shewhart’s Dots

It’s surprising to see that Walter Shewhart plots his Control Charts as unconnected points in both of his books:

As a physics Ph.D. from UC Berkeley, Shewhart surely knew how to connect dots. If he didn’t do it, it’s because he chose not to.

## A Broken-Line Chart from Shewhart

In a 2021 guest post at the Deming Institute, John Hunter included the following broken-line plot from Shewhart, describing it as “the first control chart”:

Looking at these two plots side-by-side, however, we can see a key difference:

When we measure at discrete times a quantity that exists continuously between measurements, it makes sense to interpolate between measurements. In the absence of events like explosions, thermometer readings meet these conditions.

Let us assume we measure an object’s temperature. If we read the thermometer periodically, we can reasonably assume that a hypothetical reading between two actual ones would fall on the segment joining them. The broken line graph is 8th-grade math, taught through the example of temperatures over time. We can take it a step further:

We may find the hinge points in the broken line unrealistic for thermometer readings, as we don’t expect these readings to change direction abruptly. A smooth line, like a Bézier curve through the dots, is more realistic. Interpolation differs from linear modeling in that it produces a line that passes through every measurement and, therefore, connects the dots. A fitted linear model generally doesn’t. Interpolation and linear modeling do not serve the same purpose and are based on different assumptions about the data.
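The contrast between interpolation and linear modeling can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the sources discussed here: the hourly readings are made-up values, and NumPy's `np.interp` and `np.polyfit` stand in for "interpolation" and "linear modeling" respectively.

```python
import numpy as np

# Hypothetical thermometer readings (degrees C), one per hour.
hours = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
temps = np.array([20.0, 21.5, 21.0, 23.0, 22.5])


def interpolate(t):
    """Broken-line (piecewise linear) interpolation through the readings."""
    return np.interp(t, hours, temps)


# Linear model: one least-squares straight line fitted to all readings.
slope, intercept = np.polyfit(hours, temps, deg=1)


def linear_model(t):
    """Value of the fitted straight line at time t."""
    return slope * t + intercept


# The interpolant reproduces every reading; the fitted line does not.
interp_residuals = temps - interpolate(hours)
model_residuals = temps - linear_model(hours)

print("interpolation residuals:", interp_residuals)
print("linear model residuals: ", np.round(model_residuals, 3))
print("interpolated reading at t = 1.5 h:", interpolate(1.5))
```

The interpolation residuals are all zero by construction, while the fitted line leaves a nonzero residual at every reading. Only the interpolant answers the question "what would the thermometer have shown at 1:30?" under the assumption that the temperature exists and varies continuously between readings.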

## Workpiece Dimensions

Unlike temperatures, measurements of hole diameters on workpieces coming out of a machine exist only when you measure them. They are not a quantity that exists continuously, regardless of our ability to measure it. The hole diameter does not exist between unit completions while the next hole is being drilled inside the machine.

In Shewhart’s statistical control model, the line between two points cannot be viewed as a trend because the measurements are assumed to be independent and identically distributed (i.i.d.) fluctuations around a central value.

A line between points on a chart implies some kind of link between the variables. The assumption of independence between measurements $X_i$ and $X_{i+1}$ means that the line joining observations $x_i$ and $x_{i+1}$ is no more meaningful than a line between the winning numbers of two consecutive roulette spins. This is true of subgroup averages as well as individual values. Spinning a roulette wheel four times, we may get the following results:
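The lack of any link between consecutive spins can also be checked numerically. The following is a minimal sketch under stated assumptions: a European wheel with 37 pockets numbered 0 to 36, simulated with NumPy's random generator and an arbitrary seed. The lag-1 correlation it computes is just one convenient way to look for a link between each value and the next.

```python
import numpy as np

rng = np.random.default_rng(seed=1924)  # arbitrary seed, for reproducibility

# Assumed European wheel: 37 pockets numbered 0 to 36, spins independent.
spins = rng.integers(low=0, high=37, size=10_000)
print("first four spins:", spins[:4])

# Correlation between each spin and the one that follows it.
lag1 = np.corrcoef(spins[:-1], spins[1:])[0, 1]
print(f"lag-1 correlation: {lag1:.4f}")
```

With independent spins, the lag-1 correlation should be close to zero, so the slope of a segment joining two consecutive points carries no information about the process. A correlation far from zero in real measurement data would suggest that the i.i.d. assumption does not hold.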

# The Later SPC Literature

Shewhart’s successors had no reservations about connecting the dots. Thirty years after Shewhart invented the Control Chart, the Western Electric Statistical Quality Control Handbook showed Control Charts as broken-line graphs, omitting the x-axis altogether:

Another 60 years later, the support site for Minitab shows the following example:

Douglas Montgomery uses Minitab to generate examples of Control Charts by sample with connected dots. Don Wheeler also systematically connects the dots, sometimes with no x-axis, often with the x-axis by time, but sometimes by sample number, as in this example in his analysis of chunky data in 2023:

While the x-axis is omitted, he provides the raw data and shows how the 27 data points on each chart are generated from a sequence of 27 subgroups of 5 measurements each of a rheostat knob dimension. Therefore, the broken lines on this chart connect statistics of successive samples, which Shewhart declined to do 100 years ago.

# First and Last Piece Checking

When process capability is high in a process that produces one piece at a time, manufacturers sometimes reduce monitoring to the first and last pieces of a production run. Under the assumptions of the Shewhart Control Chart, this would make no sense, as the first and last pieces in the run would be deemed to provide no more information about the whole run than any two randomly selected pieces.

For first and last piece checking to assure that all the pieces in between are consistent, you have to assume that the independent fluctuations for every piece generated by the common causes are negligible compared to systematic changes caused by production in sequence one piece at a time, like tool wear in machining or target depletion in sputtering.

These changes are predictable and harmless as long as they are small enough. You check that the first and last pieces came out as expected and estimate the values for the other parts by interpolation. Today’s machines often have control systems that continue the process until they detect its end-point, automatically compensating for factors like tool wear. The drift is then observable through process parameters like time rather than through measurements on parts.
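The dependence of first-and-last-piece checking on this assumption can be illustrated with a small simulation. It is a sketch only: the nominal diameter, drift rate, noise levels, run length, and the helper `worst_interpolation_error` are all hypothetical, and the drift is assumed to be linear in the piece count.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed, for reproducibility


def worst_interpolation_error(drift_per_piece, noise_sd, n_pieces=50):
    """Largest gap between actual diameters and values interpolated
    from the first and last pieces only (hypothetical helper)."""
    piece = np.arange(n_pieces)
    # Assumed model: linear drift (e.g. tool wear) plus independent noise, in mm.
    noise = rng.normal(loc=0.0, scale=noise_sd, size=n_pieces)
    diameters = 10.0 + drift_per_piece * piece + noise
    # Measure only the first and last pieces, interpolate the rest.
    estimate = np.interp(piece, [0, n_pieces - 1], [diameters[0], diameters[-1]])
    return np.max(np.abs(diameters - estimate))


# Case 1: systematic drift dominates the piece-to-piece fluctuations.
drift_dominated = worst_interpolation_error(drift_per_piece=0.0004, noise_sd=0.0002)

# Case 2: same drift, but independent fluctuations dominate.
noise_dominated = worst_interpolation_error(drift_per_piece=0.0004, noise_sd=0.005)

print(f"worst error, drift dominated: {drift_dominated:.4f} mm")
print(f"worst error, noise dominated: {noise_dominated:.4f} mm")
```

In the first case, the two measured pieces pin down the whole run to within a small error. In the second, they say little about the pieces in between, which is the situation the Shewhart model assumes.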

Unlike first-and-last-piece checking, the $\overline{X}$-chart doesn’t use the part sequence within the sample in generating alarms. It uses the sample sequence information only to determine when the alarm is issued.
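One way to see this is to shuffle the parts within each sample and recompute the chart. The sketch below makes several assumptions: the data are simulated, not taken from any source cited here; the limits are set from the average range with the tabulated constant A2 = 0.577 for subgroups of 5; and `xbar_alarms` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=27)  # arbitrary seed, for reproducibility

# Hypothetical data: 27 subgroups of 5 measurements each.
samples = rng.normal(loc=100.0, scale=2.0, size=(27, 5))

A2 = 0.577  # tabulated X-bar chart constant for subgroups of size 5


def xbar_alarms(data):
    """Return subgroup means and one boolean alarm flag per subgroup."""
    means = data.mean(axis=1)
    ranges = data.max(axis=1) - data.min(axis=1)
    center = means.mean()
    half_width = A2 * ranges.mean()
    alarms = (means > center + half_width) | (means < center - half_width)
    return means, alarms


# Shuffle the order of the parts independently within each subgroup.
shuffled = rng.permuted(samples, axis=1)

means, alarms = xbar_alarms(samples)
means_shuffled, alarms_shuffled = xbar_alarms(shuffled)

print("same means: ", np.allclose(means, means_shuffled))
print("same alarms:", np.array_equal(alarms, alarms_shuffled))
```

Because the subgroup mean and range do not depend on the order of the values, the chart is unchanged by shuffling within samples. Reordering the samples themselves would leave the limits unchanged but move each alarm to a different position in the sequence.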

# Conclusions

As children, we learned to connect dots literally, to let the picture of a rabbit emerge. As adults, we made “connecting the dots” a metaphor for recognizing patterns. When actually connecting dots on a chart into a broken line, we must always consider the backstory of the data, which determines the meaning of the connections.