Label your charts!

Sep 17 2025

Charts you share with others need a bodyguard of text to be self-explanatory, avert misunderstandings, and support learning. None of this matters when you chart exclusively for your own use, but it is obligatory when communicating with a team or making a case to management.
Generating an informative, actionable chart can take hours; documenting and labeling it should take minutes, yet we encounter charts with missing or unclear labels in business documents, published articles, and even textbooks.
The Context of a Chart

At a minimum, the labels should answer the 5W1H questions: Who? What? Where? When? Why? How? For a chart in operations, this translates to the following:
Who is in charge of generating or maintaining the chart, so that they can be part of any discussion of it.
The purpose of the chart. It can be in the chart title. If on a slide, it can be in the slide title; if in a report, in the figure caption.
The production line or business operation the chart is about. It does not need to be repeated on every chart on a board, as long as it appears prominently on the board.
The date and time the chart was generated.
The range of time the data covers.
A revision number, if you have shared multiple versions.
The labels on the chart itself must explain what we are looking at:
Descriptive variable names. “OLT1” is better than nothing, but “Oxide Layer Thickness 1” is more descriptive.
Units of measure. Omitting units in charts is not a negligible detail. At best, it is a discourtesy to the reader; at worst, the cause of disasters.
The type of chart, when it is not obvious. Some types look alike, like boxplots and candlestick charts. They use the same symbols with different meanings.
The points in the plot. Are they individual measurements or sample statistics? In the latter case, which statistics are used: mean, median, standard deviation, range, extreme values, etc.?
Sample sizes. How many individual values are in each sample?
Sampling frequency. How often do you take samples?
Thresholds. If any limits are shown, are they spec limits, control limits, warning limits,…?
The methods used for sampling, measuring, and setting limits.
If you think this is an undue administrative burden, remember NASA’s Mars Climate Orbiter, which was lost on arrival at Mars in 1999 because the ground software measured thruster impulse in pound-force-seconds and the spacecraft’s onboard software in newton-seconds. $193M of work went down the drain for failure to pay attention to units of measure. We know of this case because NASA publicized it; we don’t hear about the smaller losses this kind of negligence routinely causes on factory floors, because it is no object of corporate pride.
Formatting
The formatting requirements vary depending on the use of the charts. For internal working documents displayed on boards or used in solving problems, you want the same text items in the same places, for example the author name in the bottom right-hand corner. This means using a form for headers and footers, as in engineering drawings or work instructions.
For the charts themselves, Jean-Luc Doumont has formatting recommendations that are worth studying:
The poor example is the style most commonly found in publications, largely because it is the Excel default. It has many flaws:
A heavy grid overwhelms the data. A fine grid is only useful when you use a chart to look up individual values, as in old-fashioned books of mathematical functions.
More tick marks on the axes than necessary, again, unless you are using the chart to look up values.
A legend block that requires readers to take their eyes away from the data.
A vertical title for the y-axis that twists the reader’s neck.
The first iteration of improvement deletes the grid and the extra tick marks, uses color to distinguish measured values from the theoretical curve, and identifies them right on the chart, with labels of matching colors. It also has a horizontal title for the y-axis.
The second iteration restricts the axes to the ranges of the data, and the tick marks to points of interest, like the maximum. The author also annotated the chart with the range of frequencies between inflection points.
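If you build your charts in code rather than Excel, these recommendations are easy to script. Below is a minimal matplotlib sketch along Doumont’s lines; the data, variable names, and label positions are mine, made up for illustration, not Doumont’s example:

```python
# A sketch of Doumont-style chart cleanup in matplotlib, on made-up data.
import numpy as np
import matplotlib.pyplot as plt

freq = np.linspace(0.5, 3.0, 200)                 # frequency axis (Hz)
theory = 1.0 / ((freq**2 - 2.0**2)**2 + 0.5)      # theoretical response curve
rng = np.random.default_rng(0)
f_meas = np.linspace(0.6, 2.9, 25)
measured = 1.0 / ((f_meas**2 - 2.0**2)**2 + 0.5) + rng.normal(0, 0.02, 25)

fig, ax = plt.subplots()
ax.plot(freq, theory, color="tab:blue")
ax.plot(f_meas, measured, "o", color="tab:red", markersize=4)

# No grid, no legend block: label the curves directly, in matching colors.
ax.text(1.1, 0.35, "theory", color="tab:blue")
ax.text(2.3, 1.5, "measured", color="tab:red")

# Restrict the axes to the data and keep only the ticks that matter.
ax.set_xlim(freq.min(), freq.max())
ax.set_xticks([0.5, 2.0, 3.0])                    # 2.0 marks the peak of interest
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

ax.set_xlabel("Frequency (Hz)")
# A horizontal y-axis title, placed above the axis instead of rotated 90°.
ax.set_title("Response (mm)", loc="left", fontsize=10)
plt.show()
```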
Improving the charts is more manual work than using the Excel defaults. The better charts happen to be prettier, but aesthetics is not the motivation; communication is. Don’t systematically follow any set of rules. Focus instead on what you are trying to communicate and to whom. Axes with regularly spaced tick marks and backgrounds with some form of grid may not strictly be best, but they are a cultural constraint for audiences that grew up with them.
Engineering versus SPC
At least in engineering books, charts are labeled, if not to Jean-Luc Doumont’s standards. This example is about the thrust force in lbs applied by a drill in machining, as a function of the feed in ipr (inches per revolution), for four different drill head geometries, from DeGarmo’s Materials & Processes in Manufacturing:
Some schools publish guidelines for engineering students preparing plots. They all require the y-axis to be labeled with a descriptive variable name and unit of measure. In the SPC literature, Shewhart followed these standards, but most of his followers didn’t, and it is worth pondering why.
Shewhart’s Charts
In this, as in many other areas, Walter Shewhart set a standard. Most of his successors did not follow it. This is Figure 7 from Shewhart’s Statistical Method from the Viewpoint of Quality Control (1939), with units and descriptive titles on both axes:
Later Control Charts
The Grant & Leavenworth textbook on Statistical Quality Control followed Shewhart in its discussion of Control Charts, duly labeling axes and plotting points rather than broken lines. I also found a Japanese book on the 7 tools of QC by Katsuya Hosotani from 1986 that showed examples with labeled axes. None of the other American references I checked did.
Western Electric Statistical Quality Control Handbook (1956)
In the Western Electric Statistical Quality Control Handbook (1956), some but not all charts have y-axes with units. In this example, the unit is given for the y-axis but not the name of the variable, and the x-axis is unlabeled:
Juran’s Quality Control Handbook, 5th edition (1998)
In Juran’s Quality Control Handbook, 5th edition (1998), the y-axis in Figure 4.10 has a variable name but no units, and the x-axis is mislabeled “Number of Samples” instead of “Sample Number”:
Wheeler & Chambers, Understanding Statistical Process Control (1992)
Wheeler & Chambers, in Understanding Statistical Process Control (1992), have a section 9.1 about Inadequate Measurement Units, centered on measurement resolution. If you take measurements to 0.01 in, you may smooth out patterns that are visible with measurements to 0.001 in. It is certainly a concern, but the charts in this section have y-axes with no units:
“Done right” in this context refers to the method used to set control limits.
Math versus Engineering
In math, you don’t worry about units; in engineering, you do. In math, ε is “arbitrarily small”; in engineering, a linewidth is 3 μm or 3 nm. You can study math at the highest level without ever putting units on charts. In a school of engineering, on the other hand, turning in work without units is a shortcut to a failing grade. Academics don’t take it lightly, and practicing engineers shouldn’t either.
The following chart is from a Japanese manual on the 7 tools of QC by Katsuya Hosotani from 1986:
As it illustrates a generic concept in mathematical terms, it contains no units. Similar examples of departure from control follow it. A few pages later, there is an example about actual measurements, with all the details to satisfy a nitpicky engineer, albeit not necessarily at the best locations:
Statistical Quality versus Industrial Engineering
Self-taught, shop-smart individuals created Industrial Engineering, with Lillian Gilbreth as the only PhD in the crowd. The statistical approach to quality is different. Shewhart, Deming, Romig, Feigenbaum, Ishikawa, and Taguchi all held PhDs in mathematics, physics, or statistics. The least degreed were Harold Dodge, with a Master’s in math and physics, and Juran, with a law degree. People with these backgrounds knew how to label a chart. If they violated basic rules drilled into the heads of every engineer, it wasn’t out of ignorance. It was a choice.
Shewhart honored what was “customarily used in engineering practice,” and said so when he used it as the basis for his choice of ±3σ limits. His charts are clearly labeled and annotated, perhaps because he wanted engineers to adopt his ideas. Most of his successors chose not to follow him on this. To them, signals like threshold crossings and patterns like runs or trends were all that mattered, regardless of the data’s nature.
Wheeler & Chambers’s Blast Furnace Silicon Example
Lonnie Wilson directed my attention to an example about Blast Furnace Silicon on pp. 91-94 of Wheeler & Chambers’ Understanding Statistical Process Control. There is no discussion of why it is a characteristic of interest, how you measure it, in what units you express it, or what action you might take in response to anomalies.
They just plot 33 samples of 3 measurements each on \bar{X}\text{-}R charts and observe long runs above and below the centerline:
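The charts themselves are not reproduced here, but the pattern they flag is easy to sketch in code. The subgroup means below are hypothetical, for illustration only:

```python
# A sketch of a run check: the longest run of subgroup means on one side
# of the grand mean. The data here is hypothetical, not the book's.
import numpy as np

def longest_run(means):
    """Length of the longest run of points on one side of the grand mean."""
    side = np.sign(means - means.mean())
    best = current = 1
    for prev, nxt in zip(side, side[1:]):
        current = current + 1 if nxt == prev and nxt != 0 else 1
        best = max(best, current)
    return best

xbar = np.array([150, 160, 170, 180, 175, 165, 130, 120, 110, 115,
                 105, 100, 140, 155, 190, 200, 210, 190, 180, 170])
print(longest_run(xbar))  # 8 -- a classic "lack of control" signal in SPC
```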
They conclude, “these runs are interpreted as indications of a lack of control.” This implies that the data represent a departure from a state of control in which such runs would not occur, but the authors do not provide any data collected in such a state. An alternative conclusion is that the \bar{X}\text{-}R chart is a poor fit to Blast Furnace Silicon data, and that you should use another model. Let’s consider the context of the data.
The backstory of blast furnace silicon
Wheeler and Chambers don’t even say which metal the blast furnace is for. In this section, I used Google searches to attempt to reverse-engineer the missing context. My reconstruction may be entirely wrong, but it’s their example, and as a reader, I shouldn’t have to guess.
The hot metal could be iron, copper, or lead, but the literature on blast furnace silicon primarily focuses on iron, and more specifically pig iron. So I assume that the data is about extracting pig iron from ore, as Kobe Steel does in their Kakogawa Blast Furnace:
The temperature inside a blast furnace ranges from 1,200 °C to 2,000 °C, and it is measured with thermocouples or infrared sensors, which means that you can poll the temperature as often as you want. Silicon content is also a continuous variable, but you cannot sample it at will. Measuring it takes several steps:
Draw a sample from the hot metal.
Cast it.
Grind it.
Use an electric arc to turn it into a plasma.
Analyze the plasma with an Optical Emission Spectrometer (OES).
Silicon enters the furnace with a mix of ore, coke, and limestone at the top, and some of it leaves in hot metal at the bottom. You can expect some blending to occur internally, resulting in a moving average of the inputs. As a consequence, you would expect successive measurements in the output to be autocorrelated, which can be checked from the data. You control the silicon content of the outgoing metal through the silica content of the ore, as well as the temperature and residence time of the materials in the furnace.
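To see why blending produces autocorrelation, consider a toy simulation in which the output is a moving average of the inputs; the blending window of 8 is an arbitrary assumption, not a furnace parameter:

```python
# A sketch of why internal blending induces autocorrelation: if the output
# is roughly a moving average of recent inputs, successive measurements are
# correlated even when the inputs themselves are pure white noise.
import numpy as np

rng = np.random.default_rng(42)
inputs = rng.normal(loc=0.4, scale=0.1, size=1000)  # silica feed, white noise
window = 8                                          # hypothetical blending depth
output = np.convolve(inputs, np.ones(window) / window, mode="valid")

def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print(f"inputs: {lag1_autocorr(inputs):+.2f}")   # ~0: no memory
print(f"output: {lag1_autocorr(output):+.2f}")   # ~0.87: strong memory
```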
The silicon content ranges from 0.1% to 0.8% of the weight, which raises the question of what units Wheeler & Chambers used when their measurements range from “72” to “248”.
As explained in Pourya Azadi et al. (2022), “The silicon content of the hot metal is an important variable for the operation and control of a blast furnace, as it not only reflects the product quality but also is an indicator of the internal thermal status of the blast furnace. The measurements of the variation of the silicon content in the hot metal can be used to control the heat fluctuations inside a blast furnace…”
Modeling Blast Furnace Silicon
There are actions you can take with machine tools that process parts one at a time that you cannot take with a blast furnace. For example, if a lathe starts putting out pieces with wrong diameters, you can stop it, analyze the root cause, fix it, and restart it. You can’t do this with a blast furnace. It has to run continuously. If you stop it, you may never be able to restart it. When Kobe Steel had to stop a blast furnace in an emergency due to the Hanshin earthquake of 1995, it took them two and a half months and $980 million to restart it.
The only action you can take in response to a silicon content anomaly is to tweak the furnace controls. This is a problem of feedback control, and not an obvious fit for SPC. Modeling blast furnace silicon is actually an active research topic with many recent publications, none of which discuss control charts. Instead, they discuss using machine-learning techniques to predict future values.
The control chart model is that, in a state of statistical control, the measured variable is the sum of a constant and white noise due to common causes. In their own analysis of runs, Wheeler & Chambers show that their blast furnace silicon data do not meet these conditions. They interpret the runs as “indications of a lack of control.”
This conclusion, however, only applies within the control chart model. If instead you fit a time series model, grounded in the physics and chemistry of smelting as well as the data, it may be consistent with runs above or below the mean during normal operations, predict the distributions of future values, and issue alarms when measurements are inconsistent with these predictions. The literature suggests that this is what smelters are doing.
Wheeler and Chambers’s Data
On p. 91, they provide 99 measurements in 33 subgroups of 3 points each:
When you put them on a control chart, with upper and lower control limits set from this same data, you assume that they are independent and identically distributed. Let’s check this out.
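For reference, this is how \bar{X}\text{-}R limits are computed from subgrouped data. Since the book’s table is not reproduced here, the sketch below uses simulated subgroups as a stand-in; the constants for n = 3 are from standard SPC tables:

```python
# A sketch of X-bar/R control limit calculations for subgroups of n = 3,
# on simulated data standing in for Wheeler & Chambers's 33 subgroups.
import numpy as np

rng = np.random.default_rng(1)
subgroups = rng.normal(160, 25, size=(33, 3))   # 33 subgroups of 3 measurements

xbar = subgroups.mean(axis=1)      # subgroup means
r = np.ptp(subgroups, axis=1)      # subgroup ranges (max - min)

A2, D3, D4 = 1.023, 0.0, 2.574     # standard control chart constants for n = 3
xbar_bar, r_bar = xbar.mean(), r.mean()

print(f"X-bar chart: CL={xbar_bar:.1f}, "
      f"UCL={xbar_bar + A2 * r_bar:.1f}, LCL={xbar_bar - A2 * r_bar:.1f}")
print(f"R chart:     CL={r_bar:.1f}, UCL={D4 * r_bar:.1f}, LCL={D3 * r_bar:.1f}")
```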
The measurements are not identically distributed
Assuming they are all independent and identically distributed, we should take a look at their distribution:
It looks skewed and pointy, but not alarmingly so. The authors, however, structured the data in subgroups of three values each. If the process were homogeneous, the columns for Measurements 1, 2, and 3 would have the same distribution. When we plot their separate densities, we get the following result:
The densities for Measurements 2 and 3 are visually close, but Measurement 1 stands apart. This says that the dataset is heterogeneous and that the 99 points are not identically distributed.
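This comparison is straightforward to reproduce. The sketch below uses placeholder columns, since the book’s 99 values are not reproduced here; substitute the real data to replicate the figure:

```python
# A sketch of the per-column density comparison with kernel density
# estimates. The three columns below are placeholders, not the book's data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
columns = {
    "Measurement 1": rng.normal(175, 30, 33),
    "Measurement 2": rng.normal(150, 25, 33),
    "Measurement 3": rng.normal(150, 25, 33),
}

grid = np.linspace(60, 280, 200)
for name, values in columns.items():
    plt.plot(grid, gaussian_kde(values)(grid), label=name)

plt.xlabel("Silicon measurement (units unknown)")  # the book gives none
plt.ylabel("Density")
plt.legend()
plt.show()
```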
The measurements are not independent
If the measurements are independent, there should be no correlation across subgroups. If we plot the autocorrelations separately for Measurements 1, 2, and 3, we get the following:
All three measurement sequences show strong correlations between measurements and their three predecessors, and they are therefore not independent.
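The check itself takes a few lines with statsmodels; the series below are autocorrelated placeholders for the three measurement columns, not the book’s data:

```python
# A sketch of the per-column autocorrelation check. Each placeholder series
# is AR(1): every value leans on its predecessor, as blending would cause.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)

def ar1(n, phi=0.7):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

for name in ("Measurement 1", "Measurement 2", "Measurement 3"):
    print(name, np.round(acf(ar1(33), nlags=3)[1:], 2))  # lags 1 to 3
```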
Some conclusions about Blast Furnace Silicon
Neither the backstory nor the data support the control chart model. The data is heterogeneous, with the first measurement in each subgroup having a distribution that is different from the other two, and values showing strong autocorrelations.
Wheeler’s response to autocorrelations is to ignore them when they are within ±0.6, and to ignore the control limits when they are beyond this range. When you are trying to predict a future state, however, autocorrelation is your friend, not your enemy. A substantial autocorrelation tells you that the recent past contains information about the near future. Models like ARIMA have been available for decades to leverage such characteristics, and have recently been supplemented by machine learning.
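As a sketch of the alternative, here is an ARIMA fit that turns its forecasts into alarms. The data and the order (1, 0, 0) are assumptions for illustration, not a recommendation for actual blast furnace data:

```python
# A sketch of using a time series model instead of control limits: fit an
# ARIMA model, forecast the next value, and raise an alarm when a new
# measurement falls outside the prediction interval. Placeholder data.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
series = np.full(99, 150.0)
for t in range(1, 99):  # autocorrelated placeholder, centered near 150
    series[t] = 150 + 0.7 * (series[t - 1] - 150) + rng.normal(0, 10)

model = ARIMA(series, order=(1, 0, 0)).fit()
forecast = model.get_forecast(steps=1)
low, high = forecast.conf_int(alpha=0.01)[0]  # 99% prediction interval

next_obs = 215.0                              # hypothetical new measurement
if not low <= next_obs <= high:
    print(f"Alarm: {next_obs} outside [{low:.1f}, {high:.1f}]")
```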
The example, as presented in the book, does not give enough information to understand why the data is heterogeneous. The autocorrelations explain the runs observed in the data, and they are not necessarily an indication of lack of control. There are plenty of tools available to model and predict autocorrelated processes. \bar{X}\text{-}R charts aren’t one of them.
Overall Conclusion
So why do reference authors on SPC publish charts with unlabeled axes? I give them credit for knowing that they are thereby flouting established norms of the engineering profession.
The only explanation I can think of is that they perceive their tools to be universally applicable. It doesn’t matter what you plot: you apply the formulas for control charts, and the rules of SPC let you tell common causes of variation from special causes… If this is true, you don’t need to bother with units or even variable names.
Whether you are measuring the pH of a shampoo, a critical dimension on a car transmission case, Academy Award viewership, the glucose content of your own blood, or Atlantic hurricanes, it doesn’t matter. You plot the same charts, calculate limits the same way, and voilà! Applying the same rules lets you tell common cause variation from variation due to assignable causes.
When I first went to Japan, to work on statistical models of earthquake occurrences, geophysics Prof. Toshi Asada supported my effort but warned me that the great John Tukey had tried to do it and failed, and he thought it was because Tukey had not known anything about seismology. He encouraged me to study it in depth first. Eventually, I didn’t do better than Tukey — his 1960s work on the spectral analysis of seismic waves is not forgotten — but I remembered Asada’s advice when, a few years later, I started working in Manufacturing.
The first thing to do when working with data is to research its context. Then you visualize the data, clean it, and let what you see guide you in choosing a model with predictive power. Finally, you show respect for the data by labeling axes with variable names and units.