Oct 4 2022

# Strange Statements About Probability Models | Don Wheeler | Quality Digest

In his latest column in Quality Digest, Don Wheeler wrote the following blanket statements, free of any caveat:

- “Probability models are built on the assumption that the data can be thought of as observations from a set of random variables that are independent and identically distributed.”
- “In the end, which probability model you may fit to your data hardly matters. It is an exercise that serves no practical purpose.”

Source: Wheeler, D. (2022) Converting Capabilities, What difference does the probability model make? Quality Digest

**Michel Baudin**’s comments:

### Not all models assume i.i.d. variables

Wheeler’s first statement might have applied 100 years ago. Today, however, there are many models in probability that are *not* based on the assumption that data are “observations from a set of random variables that are independent and identically distributed”:

- ARIMA models for time series are used, for example, in forecasting beer sales.
- Epidemiologists use models that assume existing infections cause new ones. Therefore, counts for successive periods are not independent.
- The spatial data analysis tools used in mining and oil exploration assume that an observation at any point informs you about its neighborhood. The analysts don’t assume that observations at different points are independent.
- The probability models used to locate a wreck on the ocean floor, find a needle in a haystack, and other similar search problems have nothing to do with series of independent and identically distributed observations.
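To make the first bullet concrete, here is a minimal sketch that compares i.i.d. Gaussian noise with a first-order autoregressive series, AR(1), the simplest member of the ARIMA family. The coefficient `phi=0.8`, the sample size, and the seed are arbitrary assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random
import statistics


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


def simulate_ar1(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard Gaussian noise e[t]."""
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


rng = random.Random(42)  # arbitrary seed, only for reproducibility
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ar1 = simulate_ar1(5000, phi=0.8, rng=rng)

print(f"lag-1 autocorrelation, i.i.d. noise: {lag1_autocorrelation(iid):+.2f}")
print(f"lag-1 autocorrelation, AR(1) series: {lag1_autocorrelation(ar1):+.2f}")
```

The i.i.d. series should show a lag-1 autocorrelation near zero, while the AR(1) series should show one near `phi`: each value carries information about the next, which is exactly what the i.i.d. assumption rules out.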

### Probability Models Are Useful

In his second statement, Wheeler seems determined to deter engineers and managers from studying probability. If a prominent statistician tells them it serves no useful purpose, why bother? It is particularly odd when you consider that Wheeler’s beloved XmR/Process Behavior charts use control limits based on the model of observations as the sum of a constant and a Gaussian white noise.
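As a reminder of how those limits are usually computed, here is a minimal sketch of the standard XmR calculation. The scaling constant 2.66 is 3/d2, where d2 = 1.128 is the expected range of two independent standard Gaussian observations, so the Gaussian model is built into the limits. The function name and the measurements are hypothetical.

```python
import statistics


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    # Moving ranges: absolute differences between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = statistics.fmean(values)
    mr_bar = statistics.fmean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for ranges of two consecutive points
    half_width = 2.66 * mr_bar
    return center, center - half_width, center + half_width


# Hypothetical measurements, made up for illustration
data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0]
center, lower, upper = xmr_limits(data)
print(f"center={center:.3f}, limits=({lower:.3f}, {upper:.3f})")
```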

Probability models have many useful purposes. They can keep you from pursuing special causes for mere fluctuations and help you find root causes of actual problems. They also help you plan your supply chain and dimension your production lines.

### Histograms are Old-Hat; Use KDE Instead

As Wheeler also says, “Many people have been taught that the first step in the statistical inquisition of their data is to fit some probability model to the histogram.” It’s time to learn something new that takes advantage of IT developments since Karl Pearson invented the histogram in 1891.

Fitting models to a sample of 250 points based on a histogram is old hat. A small dataset today is more like 30,000 points, and you visualize its distribution with kernel density estimation (KDE), not histograms.
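For readers who have not met KDE, here is a minimal sketch of a Gaussian kernel density estimate written with the Python standard library only. The rule-of-thumb bandwidth, the bimodal test sample, and its size of 3,000 points (kept small so that pure Python stays fast) are assumptions made for illustration. In practice, you would use a library implementation.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function estimating the density of `sample` with a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption; other choices are common)
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


rng = random.Random(0)  # arbitrary seed, only for reproducibility
# Hypothetical bimodal sample: two Gaussian clusters around -2 and +2
sample = [rng.gauss(-2.0, 0.5) for _ in range(1500)]
sample += [rng.gauss(2.0, 0.5) for _ in range(1500)]

density = gaussian_kde(sample)
for x in (-2.0, 0.0, 2.0):
    print(f"estimated density at {x:+.1f}: {density(x):.3f}")
```

Unlike a histogram, the estimate is a smooth curve that does not depend on where bin edges fall. It does depend on the bandwidth, which plays a role similar to the bin width.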

#donwheeler, #probability, #quality

Jason Morin

October 4, 2022 @ 9:36 am

Did you leave a comment on this post? Maybe with a link to your blog post?

Michel Baudin

October 4, 2022 @ 9:41 am

No. I focus on this blog and LinkedIn. I would welcome a response from Don Wheeler if he is interested.

Jason Morin

October 4, 2022 @ 9:37 am

Sorry, a comment on his post.

Kyle Harshbarger

October 4, 2022 @ 4:33 pm

If probability models are useless, I guess I’ve been wasting my time for the last decade.

Besides the terrible strawman of fitting the best model, there is a reason control charts are set up the way they are: the Central Limit Theorem. A Gaussian distribution will emerge with large samples if the variance is finite.

If you want to predict future performance of a system, you need to understand its possible outcomes and their likelihood. Fitting a distribution to historical data is one way to do this, but you could also just use an empirical distribution of the sample itself. What you do depends on the problem context.

“The numerically naive think that two numbers that are not the same are different. But statistics teaches us that two numbers that are different may actually be the same.” It’s really hard to take this seriously.

Allen Lee Scott

October 10, 2022 @ 11:46 pm

Dr. Wheeler’s example is manufacturing, where models were tried and failed by Shewhart. Your epidemiologist example is one where models are useful, as Wheeler said.

“Now to be clear, data analysis is distinct from modeling. Epidemiological models incorporate subject-matter knowledge to create mathematical models that are useful for understanding and predicting the course of an epidemic.

These models allow the experts to evaluate different treatment approaches. While these models are generally refined and updated using the collected data, this is not the same as what I call data analysis. Data analysis can be carried out by non-epidemiologists.

This occurs when people try to use the data to tell the story of what is happening. This article is about the analysis of the existing data by non-epidemiologists. Nothing in what follows should be construed as a critique of epidemiological models.”

Taken from https://www.spcpress.com/pdf/DJW373.pdf

Page 12

When the normal law was found to be inadequate, then generalized functional forms were tried. Today, however, all hopes of finding a unique functional form are blasted.

Taken from https://archive.org/details/CAT10502416/page/12/mode/2up?q=blasted

Michel Baudin

October 11, 2022 @ 12:36 am

Data analysis is the process by which you choose a model, and it is always better done when based on subject-matter knowledge, that is, the backstory of the data.

In the article you cite, Wheeler discusses COVID-19 deaths, as if they were precisely and accurately defined, which they weren’t. By some methods, someone run over by a truck after testing positive for COVID-19 would be counted. If you do consider the backstory, the most objective measure is Excess Deaths.

On one side, you have a model of what mortality should have been without the pandemic; on the other, you have actual death counts. The difference between the two is the number of people who would not have died without the pandemic. In the US, this number was about 40% higher than the COVID-19 death count.

In manufacturing, you need to consider the backstory of quality characteristics too. They are not just context-free numbers. In machining, for example, a sequence of length measurements may reflect a tool wear pattern.

Shewhart’s writings are of historical value but difficult to interpret because he uses probability theory as he had learned it more than 100 years ago. The theory itself has grown since, and the vocabulary used to describe it has changed. When I read the sentence you quote, I can’t be sure what he meant by, for example, “a unique functional form.”

Allen Lee Scott

October 17, 2022 @ 9:55 pm

Thanks for the reply.

Some of the unique functional forms are in Dr. Wheeler’s paper: the normal, lognormal, gamma, and Weibull models. You might recall Dr. Deming stepped in as editor of Dr. Shewhart’s last book to make it easier for us to read and understand the language. I previously linked the whole textbook.

The difference between the probability approach and Dr. Shewhart’s approach is not well understood, going back to Pearson the younger in 1935. Looking at your papers, this misunderstanding continues and will continue until the end of time, unfortunately, as it dwarfs real understanding.

“Knowing when to look for assignable causes of exceptional variation, and when to avoid doing so, requires the characterization of the process behavior rather than the estimation of parameters for some probability model. Those who advocate using the probability approach miss this distinction.” – DJW and Henry R. Neave, 2015

source: https://www.spcpress.com/pdf/DJW287.pdf

If this paper above won’t do it or at least spark interest for further investigation, I would just recommend forgetting about it. Hopeless! I am moving on…

Michel Baudin

October 17, 2022 @ 11:25 pm

When I read Shewhart, I see someone working hard to apply probability theory as it existed in the 1920s. His work is mostly of historical interest because everything has changed since.

Tools developed for the manufacturing of telephone equipment in the 1920s don’t have the power to solve the capability problems of today’s semiconductor manufacturing processes.

The technology to work with data today doesn’t have the restrictions that Shewhart had to work around: tiny, manually collected datasets, paper and pencils. Among other things, these limitations made classical statistics use arbitrary thresholds like 3σ.

Probability theory itself has moved on and, in particular, is taught in a form that didn’t exist in Shewhart’s days. The notion that it’s not applicable to fluctuations is absurd.

Moving on is a good idea. Perhaps away from 100-year-old stuff and to current data science.