Mar 5 2024

Process Control and Gaussians

The statistical quality profession has a love/hate relationship with the Gaussian distribution. In SPC, it treats it like an embarrassing spouse. It uses the Gaussian distribution as the basis for all its control limits while claiming it doesn’t matter. In 2024, what role, if any, should this distribution play in the setting of action limits for quality characteristics?

Background

I have written several posts since 2011 about the need to update the World War II vintage SPC that organizations like the ASQ still teach to address today’s process capability issues with today’s data science.

This is not to revisit the same ground but specifically to explore the role of the Gaussian distribution in classical SPC, and ways we could do better.

Fluctuations Between Pieces

Here, we are looking at the variability in measured characteristics of a sequence of units leaving an in-house production process or arriving from a supplier. It may be after one operation within a process, or it can be on a finished good. The units can leave the process one by one or as a batch.

Sometimes, you measure every unit, and sometimes, a small sample, as when, for example, the measurement involves destructive testing. In other cases, you can replace measurements on every part with go/no-go gauge checks. Sometimes, measurements are mandated by customers or the government, and the data is part of the delivered product.

In all cases, we assume that the Quality Department has done its job in calibrating instruments and gauges so that you see actual differences between product units and not measurement errors. For this, the range of measurement errors must be orders of magnitude lower than the differences between units.

The Implicit Model

While never expressed this way in the literature, the classical SPC model, as first formulated by Walter Shewhart 100 years ago, is that measured variables on manufactured products fluctuate around a constant mean $\mu$ and that these fluctuations constitute a Gaussian white noise, meaning that they are independent for every unit of product, and follow the same distribution $N(0,\sigma)$ .

To detect changes in the mean, you plot individual values, sample averages, or midranges; for changes in the amplitude of fluctuations, sample standard deviations, or ranges. Then, to determine whether you should act, you compare these statistics against control limits, set using control chart constants that you find in published tables.

The literature gives you the recipe but does not dwell on the math behind the constants. If you work it out, you find that they are all based on $\pm3\sigma$ limits for Gaussian fluctuations around a fixed mean. Some SPC specialists then tell you that this assumption does not matter and that the method works regardless of the distribution of the measured variable.
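As an illustration of where the published constants come from, here is a minimal sketch that recomputes two of them, $d_2$ and $A_2$, from the Gaussian assumption alone. The function names are mine, and the integral formula for the expected range of $n$ independent Gaussian values is a standard result rather than something stated in the SPC cookbooks.

```python
import math

def phi_cdf(x: float) -> float:
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d2(n: int, lo: float = -8.0, hi: float = 8.0, steps: int = 16000) -> float:
    """Expected range of n independent standard Gaussian values.

    Uses E[R] = integral over x of 1 - Phi(x)^n - (1 - Phi(x))^n,
    evaluated with the trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        p = phi_cdf(lo + i * h)
        g = 1.0 - p ** n - (1.0 - p) ** n
        total += g if 0 < i < steps else 0.5 * g
    return total * h

def a2(n: int) -> float:
    """X-bar chart constant: limits are grand mean +/- A2 * average range."""
    # 3 sigma of the sample mean is 3 * sigma / sqrt(n), and sigma is
    # estimated as (average range) / d2.
    return 3.0 / (d2(n) * math.sqrt(n))

for n in (2, 3, 4, 5):
    print(n, round(d2(n), 3), round(a2(n), 3))
```

The printed values can be compared with the $d_2$ and $A_2$ columns of any published table of control chart constants. Both the $3$ in the numerator and the Gaussian in the integral are baked into them.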

When Measured Variables Are Not Gaussian

In the discussion of process capability indices, we have seen that binning produces distributions that are definitely not Gaussian. There are also cases where you use truncated distributions to deal with asymmetric risks. If you cut a rod too long, you can grind it down to size, but if you cut it too short, you scrap it. To avoid short rods, you aim for the upper part of the tolerance interval and grind down the units that exceed the upper spec limit.

If measuring incoming parts from a supplier, you may encounter a distribution with two modes, which can occur when you have parts from two different production lines, and your data is heterogeneous. Or the distribution may have multiple modes blending into a large flat zone when the supplier is making the parts in four different factories, without tracking which factory a part comes from. These kinds of distributions are revealing about the supplier’s practices. You can read them for clues as to what happened and ask questions.

One characteristic of Gaussians that not all distributions share is that changing the mean simply shifts the distribution along the $x$ -axis. With Gaussians, if you add $D$ to the mean, the probability density function changes from $f\left ( x \right )$ to $f\left ( x -D\right )$ . This is not generally true. With the exponential distribution, for example, changing the mean flattens or narrows the distribution but does not shift it. It is easy to forget this fact when you spend too much time with Gaussians.
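To make the contrast concrete, here is a short worked comparison, assuming the usual one-parameter exponential density with mean $\theta$ (a notation introduced here for illustration):

$$f_\theta\left ( x \right ) = \frac{1}{\theta}e^{-x/\theta} \text{ for } x \geq 0, \qquad f_{\theta+D}\left ( x \right ) = \frac{1}{\theta+D}e^{-x/\left (\theta+D\right )} \neq f_\theta\left ( x-D \right )$$

The support still starts at $0$ , the peak drops from $1/\theta$ to $1/\left (\theta+D\right )$ , and the standard deviation grows with the mean, so a change in the mean is also a change in dispersion.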

When are Variations Gaussian?

If you measure parts made with the same process, on the same machines, by the same operators, with materials from the same sources, you expect a homogeneous data set. Then a measured characteristic of the parts coming out is a function $Y = f\left (\mathbf{X} \right )$ of all these inputs, which you can, without much loss of generality, assume to be continuous and have a gradient.

Then, as in the case of measurement errors, small variations $\Delta Y$ are approximately linear in the variations $\Delta \mathbf{X}$ . As for measurement errors, it is a sum of many independent terms of the form $\frac{\partial f }{\partial x_i}\times \Delta X_i$ . Per the extended CLT, with the same caveats as for measurement errors, the Gaussian is a plausible assumption for the distribution of $\Delta Y$ .

To say this, we don’t need to know $f$ . All we need is for it to have a gradient. $f$ also exists in the cases of binning, truncating, or commingling of parts from different lines, but these actions break its continuity. It does not have a gradient at the breakpoints, and the variations in the measured characteristics are not Gaussian.

Also, while this finite sum of independent terms might converge to a Gaussian as their number grows to infinity, the extended CLT says nothing about how fast.
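A small numerical experiment gives a feel for this. The sketch below assumes, purely for illustration, that the terms are independent exponentials of equal size, which is a far more regular case than any real process. It computes the exact probability that their sum exceeds its mean by more than $3\sigma$ and compares it with the Gaussian value.

```python
import math

def erlang_upper_tail(n: int, x: float) -> float:
    """P(S > x) for S = sum of n independent unit-mean exponentials.

    S follows an Erlang distribution, whose upper tail is the Poisson sum
    of exp(-x) * x^k / k! for k = 0 .. n-1.
    """
    term = math.exp(-x)
    total = term
    for k in range(1, n):
        term *= x / k
        total += term
    return total

def tail_beyond_3_sigma(n: int) -> float:
    """Probability that S exceeds its mean by more than 3 standard deviations."""
    mean, sd = float(n), math.sqrt(n)
    return erlang_upper_tail(n, mean + 3.0 * sd)

# Gaussian reference: upper tail beyond 3 sigma, about 0.00135.
gaussian_tail = 0.5 * math.erfc(3.0 / math.sqrt(2.0))

for n in (1, 4, 16, 64, 256):
    print(n, round(tail_beyond_3_sigma(n), 5), round(gaussian_tail, 5))
```

Under these assumptions, the upper tail shrinks toward the Gaussian value as the number of terms grows, but it remains visibly heavier even with a few hundred terms.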

The Quality Literature

Shewhart stands out as an author by asking theoretical questions while his successors present their materials in a cookbook format. A cookbook tells you how to boil an egg without dwelling on the chemistry of coagulation.

The literature on statistical methods in quality does not discuss how the Gaussian distribution might emerge as a common model for the fluctuations in measured variables. Don Wheeler is an often-quoted promoter of classical SPC, particularly of the 1942-vintage XmR chart that he rebranded as Process Behavior Chart. Let’s take a look at what he writes.

Don Wheeler’s Argument

In The Normality Myth, after opening with “the first axiom of data analysis is that no data have any meaning apart from their context,” Don Wheeler argues in favor of using $\pm 3\sigma$ limits for Control Charts, regardless of context. It is a one-size-fits-all “fixed-width filter.” The core of his case is that, with six different models, “three-sigma limits cover 98 percent to 100 percent of the area under each curve.”

It raises a few questions:

It’s only about six distributions out of the infinite range of possibilities.
It ignores the relative frequencies of true and false alarms.
It does not consider the consequences of false alarms.

Wheeler’s Sample of Distributions

First, the options are not limited to the six distributions he uses:

The Gaussian.
The $\chi^2$ with 8 degrees of freedom
The Weibull with $\alpha = 1.6$
The $\chi^2$ with 4 degrees of freedom.
The exponential.
The lognormal with $\beta = 1$

The following plots show the probability density functions (p.d.f.) of these distributions, with red lines for the $\pm3\sigma$ limits and a thick black line for the mean. In his article, Wheeler centered and normalized them to have $0$ mean and a standard deviation of $1$ so that all the limits are at $-3$ and $3$ . They are untransformed here, highlighting that all but the Gaussian are models for positive variables, for which you wouldn’t use $-3\sigma$ as a lower control limit.

Wheeler’s six distributions, with $\pm 3\sigma$ limits in red and the mean in black

Positive Variables

For positive variables a $-3\sigma$ limit that is $<0$ triggers no alarms for the interval $[-3\sigma,0]$ . In SPC, σ-charts and R-charts only have upper control limits because these sample statistics cannot be negative.

Where negative values are supposed to be impossible, they are data integrity violations, like negative weights or males flagged as pregnant, and they should trigger alarms. It’s unclear why Wheeler didn’t choose alternative distributions for which the $\pm 3\sigma$ limits apply.

The Uniform Distribution

There are, in fact, infinitely many possible distributions that he does not consider, including the uniform distribution that comes up with binning processes. The uniform distribution over $\left [-1,1 \right ]$ has a mean of $\mu = 0$ and a standard deviation of $\sigma = \sqrt{\frac{1}{3}} = 0.577$ , so that its $\pm 3\sigma$ limit interval is $\left [ -1.73, 1.73 \right ] \supset \left [ -1,1 \right ]$ .

With this distribution, any value outside of $\left [ -1,1 \right ]$ should never occur and therefore should trigger an alarm. With $\pm 3\sigma$ limits, however, it won’t unless it is outside of $\left [ -1.73, 1.73 \right ]$ . This is another case where $\pm 3\sigma$ limits are not an appropriate choice of thresholds.

The Range of Possible Distributions

The probability texts of a few decades ago extensively discuss a small menagerie of distributions. Their densities come in closed form as simple combinations of exponentials, logarithms, polynomials, factorials, and Euler’s beta and gamma functions. First, these distributions are mathematically tractable, and second, we can often relate them to natural or social phenomena that generate variables following them.

Today, a technique like Kernel Density Estimation (KDE) does more than generate a visualization of a distribution based on a sample. It actually generates a probability density function that you can work with. It is up to the analyst to tweak the kernel width as needed. It overfits the data if it’s too narrow; if too wide, it smooths away relevant features.
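As a minimal sketch of what "a function you can work with" means, the code below builds a Gaussian-kernel KDE by hand and solves for the upper threshold that leaves a chosen tail probability. The sample values, the bandwidth of 0.3, and the helper names are all hypothetical, chosen only to make the example run.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return (pdf, cdf) functions for a Gaussian-kernel density estimate."""
    n = len(sample)

    def pdf(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        ) / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def cdf(x):
        return sum(
            0.5 * (1.0 + math.erf((x - s) / (bandwidth * math.sqrt(2.0))))
            for s in sample
        ) / n

    return pdf, cdf

def upper_threshold(cdf, tail_prob, lo, hi, tol=1e-9):
    """Bisection for the value t such that 1 - cdf(t) = tail_prob."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - cdf(mid) > tail_prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical, made-up measurements with a longer right tail.
sample = [9.6, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 10.3, 10.5, 10.9, 11.4]
pdf, cdf = gaussian_kde(sample, bandwidth=0.3)

# Threshold with the same upper-tail probability as a Gaussian at +3 sigma.
t = upper_threshold(cdf, tail_prob=0.00135, lo=min(sample), hi=max(sample) + 10.0)
print(round(t, 3))
```

Rerunning this with a narrower or wider bandwidth shows how much the threshold depends on that one tuning choice, which is why it stays in the analyst's hands.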

Consequences of False Alarms

In another paper, Wheeler urges his readers to “not worry so much about straining out the gnats of false alarms that you end up swallowing the camels of undetected process changes.”

This thinking leads him to ignore the consequences of the differences between false alarm rates. For the Gaussian distribution, the $\pm 3\sigma$ interval contains 99.73% of the area under the probability density function, meaning that, without any assignable cause, the probability of an alarm is 0.27%. In the worst case he considers, these numbers are 98.2% and 1.8%, respectively. Then, the false alarm rate is more than six times higher than for the Gaussian.
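These coverage figures can be checked with a few lines of code. The sketch below assumes specific parameterizations that Wheeler's article may state differently: unit scale for the Weibull and the exponential, and a log-mean of $0$ with a log-standard deviation of $1$ for the lognormal. Coverage of the $\pm 3\sigma$ interval does not depend on the scale, so these choices should not matter.

```python
import math

def norm_cdf(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erlang_cdf(shape, scale, x):
    """CDF of a gamma distribution with integer shape (chi-square: scale = 2)."""
    if x <= 0.0:
        return 0.0
    y = x / scale
    term = total = math.exp(-y)
    for j in range(1, shape):
        term *= y / j
        total += term
    return 1.0 - total

def weibull_cdf(shape, x):
    """CDF of a Weibull distribution with unit scale."""
    return 1.0 - math.exp(-(x ** shape)) if x > 0.0 else 0.0

def lognormal_cdf(x):
    """CDF of a lognormal with log-mean 0 and log-standard deviation 1."""
    return norm_cdf(math.log(x)) if x > 0.0 else 0.0

W_SHAPE = 1.6
w_mean = math.gamma(1.0 + 1.0 / W_SHAPE)
w_sd = math.sqrt(math.gamma(1.0 + 2.0 / W_SHAPE) - w_mean ** 2)

# name: (mean, standard deviation, CDF); the parameterizations are assumptions.
models = {
    "Gaussian": (0.0, 1.0, norm_cdf),
    "chi-square, 8 df": (8.0, 4.0, lambda x: erlang_cdf(4, 2.0, x)),
    "Weibull, shape 1.6": (w_mean, w_sd, lambda x: weibull_cdf(W_SHAPE, x)),
    "chi-square, 4 df": (4.0, math.sqrt(8.0), lambda x: erlang_cdf(2, 2.0, x)),
    "exponential": (1.0, 1.0, lambda x: erlang_cdf(1, 1.0, x)),
    "lognormal, log-sd 1": (
        math.exp(0.5),
        math.sqrt((math.e - 1.0) * math.e),
        lognormal_cdf,
    ),
}

# Probability mass between mean - 3 sd and mean + 3 sd for each model.
coverage = {
    name: cdf(mean + 3.0 * sd) - cdf(mean - 3.0 * sd)
    for name, (mean, sd, cdf) in models.items()
}
for name, c in coverage.items():
    print(f"{name:22s} coverage {c:.4f}  alarm rate {1.0 - c:.4f}")
```

For every model except the Gaussian, the lower limit is negative, so the whole alarm rate sits in the upper tail.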

Ratio of True to False Alarms

Wheeler argues that this rate is still low enough for the limits to be used. What really matters, however, is less the absolute rate of false alarms than the ratio of false to true alarms, which depends on both the limits and the state of the process.

The point of SPC is to detect when a process in a state of statistical control is thrown off by an assignable cause, which can be anything from a broken tool to exceptional humidity.

Assume a process in such a condition that no assignable cause ever disrupts it. If you monitor it with Control Charts, they will occasionally generate alarms that will be all false. No one responds to these alarms after the system cries wolf too many times.

The first one may send engineers chasing an assignable cause that isn’t there, but these alarms quickly lose credibility. It’s like testing a population for a disease no one has. As infection tests are never perfect, you will get a few positives, all false. On the other hand, if assignable causes frequently occur, most alarms are true, and the quality of the process is so poor that even investigations of false alarms frequently uncover real problems…

Wheeler only considers the second case. Perhaps, it was so prevalent in the 1920s that you could ignore the alternative. Today, however, there are large-scale examples of industries, like automotive parts, that can deliver 1 million consecutive units without a single defective, making them instances of the first case. In fact, as a process improves, the rarefaction of true alarms causes a shift away from tools like control charts to rapid detection and response in flow lines.

False Alarms in Earthquake Prediction and Public Health

The consequences of false alarms are also a consideration in many other applications of probability theory. Earthquake prediction was out of reach when I got interested in it in 1977, and it’s still out of reach in 2024. Imagine, however, that a model of seismic activity in California estimates the probability of a strong earthquake in San Francisco tomorrow at 80%.

What can you do with this information? If you tell the public, the panic may make more victims than the earthquake. Then, if the earthquake doesn’t happen, everyone ignores the next warning.

Public health officials face the same quandary. In 2009, French health minister Roselyne Bachelot had to resign after ordering vaccines and masks against an H1N1 flu epidemic that fizzled, and concerns about being accused of overreacting caused her successors to leave the country unprepared for COVID-19 11 years later.

Don Wheeler’s Conclusions

In his summary of The Normality Myth, Wheeler says the following:

“Regardless of the shape of the histogram, Shewhart’s generic, symmetric, three-sigma limits and the Western Electric zone tests will work reliably to filter out virtually all of the probable noise so you can detect any potential signals.”

As we have seen, apply these limits to the uniform distribution, and you will miss some true alarms. Apply them to a process with no assignable causes of variation, and you will get nothing but false alarms.

Multiple Rules

Individually, the multiple rules specified in the Western Electric handbook are meaningful patterns to look for, like “2 out of the last 3 values outside of $\pm 2\sigma$ ,” or “4 out of the last 5 values outside of $\pm 1\sigma$ .”

Applying multiple rules one after the other, however, doesn’t sound quite right. If you keep running more tests on the same data, you eventually find one that gives you a “significant” result. This practice, common in academic papers submitted to scientific publications, is called p-hacking.
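A quick simulation shows the effect on pure Gaussian white noise, where every alarm is false by construction. This is a rough sketch under my own reading of the rules: the $2\sigma$ and $1\sigma$ patterns are counted only when the values fall on the same side of the center line, and a point is flagged whenever the pattern is present in the window ending at that point.

```python
import random

def false_alarm_rates(n_points=100_000, seed=2024):
    """Simulate in-control Gaussian noise and return two per-point rates:
    (rate for the 3-sigma rule alone, rate for any of three rules)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    single = combined = 0
    for i in range(4, n_points):
        last3 = z[i - 2:i + 1]
        last5 = z[i - 4:i + 1]
        # Rule 1: current value beyond 3 sigma.
        r1 = abs(z[i]) > 3.0
        # Rule 2: 2 of the last 3 beyond 2 sigma, same side.
        r2 = (sum(v > 2.0 for v in last3) >= 2
              or sum(v < -2.0 for v in last3) >= 2)
        # Rule 3: 4 of the last 5 beyond 1 sigma, same side.
        r3 = (sum(v > 1.0 for v in last5) >= 4
              or sum(v < -1.0 for v in last5) >= 4)
        single += r1
        combined += r1 or r2 or r3
    checked = n_points - 4
    return single / checked, combined / checked

single_rate, combined_rate = false_alarm_rates()
print(f"3-sigma rule alone: {single_rate:.4f}")
print(f"any of the three rules: {combined_rate:.4f}")
```

Each rule taken alone has a modest false alarm rate, but the rates add up when you apply them together to the same data.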

Probability Models

Wheeler then ends with another statement:

“To paraphrase Shewhart, classical statistical techniques start with the assumption that a probability model exists, whereas a process behavior chart starts with the assumption that a probability model does not exist.”

It raises two questions:

Where did Shewhart say anything like it?
What does it mean?

It’s a paraphrase, not a quote, and Wheeler doesn’t give any provenance. Shewhart’s first book does not contain the phrase “probability model.” The second, Statistical Method from the Viewpoint of Quality Control, from 1939, has it once, on page 142, in a chapter on accuracy and precision. It’s about finding a mathematical probability model to make predictions.

A model is an abstraction we build for some aspects of a physical system. To say that a model “does not exist” is to assert that it is mathematically impossible to build a model with any predictive power.

We are so confident in some models that we call them deterministic and don’t bother with probabilities. A case in point is when you have a machine tool that can hold tolerances ten times tighter than what you need for your products. You can ignore its fluctuations.

For others, we predict statistics of the system and apply probability theory to predict their accuracy and precision. I call these models random, but probability model is another good name. Shewhart’s work is about building such models of quality characteristics.

Finally, there are systems we call uncertain because we don’t know how to predict them. These include, for example, the effects of earthquakes, stock market crashes, or the outbreak of war. Shewhart’s assignable causes turn a system that produces a quality characteristic from random to uncertain, and his goal is to detect these effects before they damage your output.

False and True Alarms

Let us dive deeper into the issue, first by considering just distributions and thresholds, and then by factoring in the state of the process.

Small and Large Deviations

The $\left [-3\sigma, +3\sigma \right ]$ limits make no difference between large and small deviations. With large deviations, it is visually obvious that the process has shifted. The small deviations are a challenge to detect as an early warning that something is amiss.

Whether an observation is just outside the interval or far from it, the alarm is the same, although it is obviously less likely to be false for values far away. Let us consider the single-sided p-value for large $y$ :

$$p\left ( y \right ) = P\left ( Y \geq y \,\vert\, \text{no shift}\right )= 1-F\left ( y \right ) \text{ for } y \geq \mu+3\sigma$$

Then $p\left ( y \right )$ is the probability that $Y$ is at least as high as $y$ if no shift has happened. If we plot this for $y\in \left [ \mu+3\sigma, \mu +9\sigma \right ]$ for all the distributions in Wheeler’s article, we find that the p-value achieved with a threshold $\mu+3\sigma$ for the Gaussian requires a threshold close to $\mu+9\sigma$ for the lognormal.
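The equivalent thresholds can also be computed directly rather than read off a plot. The sketch below does it for two of the distributions, assuming a unit-mean exponential and a lognormal with log-mean $0$ and log-standard deviation $1$ . These parameter choices are mine, and a different lognormal parameterization would move its threshold.

```python
import math

# Upper-tail p-value of the Gaussian at mu + 3 sigma, about 0.00135.
P_TARGET = 0.5 * math.erfc(3.0 / math.sqrt(2.0))

def sigmas_above_mean(threshold, mean, sd):
    """Express a threshold as a number of standard deviations above the mean."""
    return (threshold - mean) / sd

# Exponential with unit mean: 1 - F(y) = exp(-y), and mean = sd = 1.
exp_threshold = -math.log(P_TARGET)
exp_k = sigmas_above_mean(exp_threshold, 1.0, 1.0)

# Lognormal with log-mean 0 and log-sd 1: Y = exp(Z) with Z standard
# Gaussian, so the threshold with the same upper tail is exp(3).
ln_mean = math.exp(0.5)
ln_sd = math.sqrt((math.e - 1.0) * math.e)
ln_k = sigmas_above_mean(math.exp(3.0), ln_mean, ln_sd)

print(f"exponential: mu + {exp_k:.2f} sigma")
print(f"lognormal:   mu + {ln_k:.2f} sigma")
```

Under these assumptions, the lognormal threshold lands between 8 and 9 standard deviations above the mean, in line with what the plot suggests, and the exponential needs between 5 and 6.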

Wheeler denounces this as the “fixed-coverage filter,” an approach he attributes to Shewhart’s British contemporary Egon Pearson, and says that you should use the “fixed-width filter” $\pm3\sigma$ threshold regardless of the distribution.

Arbitrary Thresholds

As Shewhart put it, “For this purpose let us choose 1 − p′ = .9973 because this is about the magnitude customarily used in engineering practice.” In other words, the $\pm3\sigma$ limits are arbitrary thresholds, whose only merit is their p-value for Gaussians, and it does not cross over to other distributions.

Why would Shewhart make this choice? Like data, people’s decisions about their methods can only be understood in context. When Shewhart worked, you collected measurements on paper spreadsheets, looked up tables and charts in books, added with adding machines, and applied formulas with slide rules. Computers were humans skilled with this technology, and the electronic computer was a thought experiment by Alan Turing.

Shewhart knew that the Gaussian was not a fit for all measurements. He gave counterexamples in his books, like the following case, which led, a few years later, W. Weibull to introduce the distribution that bears his name:

Skewed distribution of steel strand tensile strength in Shewhart’s first book

Shewhart also knew that, to gain any traction in the factories of his day, a tool had to be simple enough for the factory people to use yet sophisticated enough to do some good.

Fitting a specific model to every single measured parameter was too much work and required unavailable skills. On the other hand, applying the same model and the same rules to every parameter had a fighting chance, particularly when based on a “magnitude customarily used in engineering practice.”

Our context is different. Our approach is not constrained by the technology available to Shewhart, the problems we face are technically different, and the people working in factories are more educated than their counterparts 100 years ago.

Detecting Specific Process Shifts

Checking a measurement against limits is not just about a measured value, but instead about the process that generated it. Charts of sample means, sample ranges, or individual measurements are supposed to detect shifts in the process location or in the dispersion of the values. Let us narrow the discussion with some assumptions:

The process is supposed to be centered at $0$ .
The variable is $Y$ , either a sample mean or an individual measurement, which we assume to have a mean $\mu$ and a standard deviation $\sigma$ . $y$ designates an instance of $Y$ .

Applying the Threshold

As shown in the following figure, if the process has shifted upward by $D$ , the probability that $Y \geq T$ represents a true alarm is $P\left (\text{Alarm}\vert \text{D shift} \right )= 1-F\left (T-D\right)$ . This is the sensitivity of the threshold $T$ for a shift by $D$ . Then the probability of issuing a false alarm is

$P\left (\text{Alarm}\vert \text{no shift} \right )=1-F\left (T\right)$ .

Suppose the shifted and unshifted distributions are so far from each other that their supports do not overlap or that the density in the overlap is negligible. In that case, any threshold in the gap will effectively tell them apart.

Plotting the ROC Curve

We can plot these two quantities as they vary with $T$ . As discussed in an earlier post, the ROC Curve is the locus of the points $\left (1-F \left( T\right ), 1-F\left (T-D\right) \right )$ , or (1-Specificity, Sensitivity), when $T$ varies.

If $T$ is too low, the probabilities of issuing an alarm are both $1$ , whether or not the distribution has shifted by $D$ ; if $T$ is too high, they are both $0$ . The diagonal in the ROC chart is where these probabilities are equal, and you want to be above this line. Note also that the slope of the ROC curve is the likelihood ratio of the distributions with and without the $D$ shift, taken at the threshold $T$ .
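The likelihood-ratio property follows from the chain rule. As a sketch, take $x\left ( T \right ) = 1-F\left ( T \right )$ as the false alarm probability on the horizontal axis and $y\left ( T \right ) = 1-F\left ( T-D \right )$ as the sensitivity on the vertical axis, and assume that $F$ has a density $f$ :

$$\frac{dy}{dx} = \frac{dy/dT}{dx/dT} = \frac{-f\left ( T-D \right )}{-f\left ( T \right )} = \frac{f\left ( T-D \right )}{f\left ( T \right )}$$

The numerator is the density of the shifted distribution at the threshold, and the denominator is that of the unshifted one.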

As $T$ increases, you want to stop where you are closest to the point where you are certain to have an alarm if there has been a shift by $D$ , and certain not to have one if the process has not shifted. And there is no reason why it should be at $3\sigma$ .

Short for “Receiver Operating Characteristic,” the ROC name refers to the technique’s origins in World War II radars. The curve takes getting used to, but the British radar engineers found it useful in tuning their identification friend-or-foe systems. While almost as old as Control Charts, it has yet to find its way into the statistical quality canon. It is now taught as part of the machine-learning tool kit for classification model evaluation.

ROC for the Gaussian

Let us plot this curve for the Gaussian at various levels of shift, from $2\sigma$ to $5\sigma$ :

Along the $y$ axis, you see the performance of the $3\sigma$ threshold. If the process has shifted by $5\sigma$ or more, it will almost certainly detect it and not issue false alarms, as would the naked eye on a line plot. We would like to detect smaller shifts that aren’t visually obvious. For a shift of $2\sigma$ , you only have a probability of 16% of crossing the limit, which means that the mean number of tries before one limit crossing is 6.25.

On the other hand, if you set the threshold at $1.02\sigma$ , you have an 84% probability of crossing the limit if the process has shifted by $2\sigma$ and a 15% probability of crossing it if the process has not shifted. It is, perhaps, not enough evidence to shut the process down but enough to take a closer look.
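The numbers quoted for the Gaussian are easy to reproduce. The following sketch works in units of $\sigma$ with the unshifted process centered at $0$ , and finds by grid search the threshold whose ROC point is nearest to the ideal corner. The function names and the grid resolution are my own choices.

```python
import math

def norm_cdf(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def alarm_probability(threshold, shift=0.0):
    """P(Y >= threshold) for a unit-sigma Gaussian centered at `shift`."""
    return 1.0 - norm_cdf(threshold - shift)

def closest_to_ideal(shift, grid_step=0.01):
    """Threshold whose ROC point is nearest to (false alarm 0, sensitivity 1)."""
    best_t, best_d = None, float("inf")
    n_steps = int(round((shift + 6.0) / grid_step))
    for i in range(n_steps + 1):
        t = -2.0 + i * grid_step
        false_alarm = alarm_probability(t)
        sensitivity = alarm_probability(t, shift)
        d = math.hypot(false_alarm, 1.0 - sensitivity)
        if d < best_d:
            best_t, best_d = t, d
    return best_t

sens_3 = alarm_probability(3.0, shift=2.0)
print(f"3-sigma limit, 2-sigma shift: sensitivity {sens_3:.3f}, "
      f"mean points to first crossing {1.0 / sens_3:.1f}")

t_best = closest_to_ideal(2.0)
print(f"closest-to-ideal threshold: {t_best:.2f} sigma, "
      f"sensitivity {alarm_probability(t_best, 2.0):.3f}, "
      f"false alarm {alarm_probability(t_best):.3f}")
```

For a symmetric distribution like the Gaussian, the grid search lands at about half the shift, close to the $1.02\sigma$ quoted above.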

ROCs for Other Distributions

The following figure shows ROCs for the other distributions in Wheeler’s paper, for shifts from $2\sigma$ to $5\sigma$ . What happens along the $y$ -axis with the threshold set at $T= \mu + 3\sigma$ when $D$ varies is similar to the Gaussian pattern for the first three distributions but not for the last two. At the thresholds that best separate the shifted from the unshifted distributions, the sensitivity is higher than for the Gaussian but the specificity is the same.

Again, this is only from Wheeler’s small sample of distributions. Assume you have a dataset of representative values of the measured variable in the absence of assignable causes of disruption. You can apply KDE to it and plot an ROC curve for the resulting distribution. You could do this on Shewhart’s tensile strength measurements.

What about the process?

The above discussion is entirely based on distributions of the measured variable and does not consider the frequency with which assignable causes disrupt it. As discussed above, if they never do, all the alarms triggered by limit crossings are false. We also need to consider the consequences of these disruptions and the technical challenges of eliminating them.

Probability of Causes Given Alarms

In medical diagnosis, what symptoms a disease produces is nice to know. You are, instead, trying to find which disease a patient has based on observed symptoms. Likewise, rather than the probability of an alarm given a $D$ shift of the process, you would like to know the probability of a $D$ shift given an Alarm:

$$P\left (\text{D shift}\,\vert\, \text{Alarm} \right )= \frac{P\left (\text{Alarm}\,\vert\, \text{D shift}\right )\times P\left ( \text{D shift} \right )}{P\left ( \text{Alarm} \right )}$$
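To see how much the answer depends on how often shifts occur, here is a minimal sketch of this formula. It assumes, for simplicity, only two possible states, no shift or a shift by $D$ , so that $P\left ( \text{Alarm} \right )$ expands by total probability. The sensitivity of 16% and the one-sided false alarm rate of 0.135% are the Gaussian values for a $2\sigma$ shift and a $+3\sigma$ limit; the prior probabilities are made up.

```python
def prob_shift_given_alarm(sensitivity, false_alarm_rate, prior_shift):
    """Bayes' rule with two states: 'D shift' and 'no shift'."""
    p_alarm = (sensitivity * prior_shift
               + false_alarm_rate * (1.0 - prior_shift))
    if p_alarm == 0.0:
        return 0.0
    return sensitivity * prior_shift / p_alarm

SENSITIVITY = 0.16       # P(Alarm | 2-sigma shift) at a +3-sigma limit
FALSE_ALARM = 0.00135    # P(Alarm | no shift), one-sided, Gaussian

# Hypothetical prior probabilities that a given point follows a shift.
for prior in (0.0, 0.0001, 0.001, 0.01, 0.1):
    posterior = prob_shift_given_alarm(SENSITIVITY, FALSE_ALARM, prior)
    print(f"P(shift) = {prior:<7} P(shift | alarm) = {posterior:.3f}")
```

With the same limits and the same distribution, the credibility of an alarm ranges from nil to high, depending only on the state of the process.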

SPC does not treat the unconditional probability of a $D$ shift. It only cares whether $D = 0$ or $D \neq 0$ . The idea is that a crossing of the $\pm 3\sigma$ limits is enough evidence of $D \neq 0$ to trigger the search for an assignable cause. This implies that, regardless of the amplitude $D$ , a $D$ shift is not a routine fluctuation of the process, and routine fluctuations within tolerances are the object of process capability studies.

Engineers, however, may develop knowledge of the amplitude of the shifts due to specific assignable causes. These may include tool wear, tool breakage, a substitution in materials, or the arrival of a new operator… As in medical diagnosis, the alarm is a symptom, and what we are really after is the probability of a cause $C$ given the alarm, which suggests bypassing the $D$ shift and going straight to causes:

$$P\left (\text{Cause C}\,\vert\, \text{Alarm} \right )= \frac{P\left (\text{Alarm}\,\vert\, \text{Cause C}\right )\times P\left ( \text{Cause C} \right )}{P\left ( \text{Alarm} \right )}$$

Thought Experiments

The data needed to estimate the factors in this formula is not generally available, but it does not keep us from reasoning about it. Let us consider two causes, $C_1$ , occurring on average once every 300 parts, and $C_2$ , once every 10,000 parts, with equal probabilities, when they occur, of causing a measured variable to cross a given threshold.

Then the ratio of the probabilities of the two causes given the Alarm reduces to their unconditional ratio:

$$\frac{P\left ( C_1 \vert \text{Alarm} \right )}{P\left ( C_2 \vert \text{Alarm}\right )} =\frac{ P\left ( C_1 \right )}{P\left ( C_2\right )}$$

This means that, if you have an alarm, it’s $10,000/300 = 33.3$ times more likely to be due to $C_1$ than $C_2$ . This trivial example shows that you can’t ignore the probabilities or frequencies when responding to alarms.

The case where the causes have unequal probabilities of triggering an alarm is more interesting:

$$\frac{P\left ( C_1\, \vert \,\text{Alarm} \right )}{P\left ( C_2\, \vert\, \text{Alarm}\right )} =\frac{P\left ( \text{Alarm} \,\vert\, C_1 \right ) P\left ( C_1 \right )}{ P\left ( \text{Alarm}\, \vert\,C_2 \right ) P\left ( C_2\right )}$$

If $C_2$ is 10 times more likely to trigger the alarm than $C_1$ , the ratio drops to 3.33. These ratios could let us rank causes by decreasing probability. In a real situation, however, this would not be the only criterion in deciding which causes to work on. Other considerations include the severity of their consequences and the technical challenges of ruling them out or eliminating them.
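Both ratios in this thought experiment can be verified with exact fractions. In this sketch, the alarm probabilities of 1/20 and 1/2 in the second case are arbitrary values of my choosing; only their ratio of 1 to 10 comes from the text.

```python
from fractions import Fraction

def posterior_odds(p_c1, p_c2, p_alarm_given_c1=1, p_alarm_given_c2=1):
    """Odds of cause C1 versus cause C2 given an alarm.

    P(Alarm) cancels out of the ratio, so it is not needed.
    """
    return (p_alarm_given_c1 * p_c1) / (p_alarm_given_c2 * p_c2)

P_C1 = Fraction(1, 300)      # once every 300 parts
P_C2 = Fraction(1, 10_000)   # once every 10,000 parts

equal_case = posterior_odds(P_C1, P_C2)
unequal_case = posterior_odds(P_C1, P_C2, Fraction(1, 20), Fraction(1, 2))

print(float(equal_case))     # C1 versus C2, equal alarm probabilities
print(float(unequal_case))   # C2 ten times more likely to trigger the alarm
```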

pFMEA

Suppose the engineers have conducted a process Failure Mode and Effects Analysis (pFMEA), a technique I have not seen widely applied. On a new process, a pFMEA identifies issues to resolve in order to make it capable; on a capable process, what could possibly go wrong. In the latter case, it is risk assessment.

After brainstorming on possible causes, or Failure Modes, for the process, the pFMEA researches the patterns of Occurrence, Detection, and Severity for each cause:
- $P\left (\text{Occurrence}\right ) = P\left (\text{Cause}\right )$
- $P\left (\text{Detection}\right ) = P\left ( \text{Alarm}\, \vert\,\text{Cause}\right )$
- $\text{Severity} = E\left ( \text{Loss per instance}\right )$

pFMEAs, however, don’t work on probabilities and expected values but on scores on a scale of 1 to 10. You multiply these scores into a “Risk Priority Number” (RPN) for each cause. You then attack the causes by decreasing RPN.

FMEA example from The Basics of FMEA, p. 44

Instead of RPNs, the probabilities would give us richer information:

$$E\left ( \text{Loss} \right ) = P\left ( \text{Occurrence} \right )\times \left [ 1 - P\left ( \text{Detection} \right ) \right ]\times E\left ( \text{Loss per instance}\right )$$
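To illustrate how the two approaches can disagree, here is a sketch comparing the ranking by RPN with the ranking by expected loss. The two causes, their scores, their probabilities, and their dollar losses are entirely made up, and they are not taken from the FMEA example above.

```python
# Hypothetical causes: FMEA scores on a 1-to-10 scale, alongside made-up
# probabilities per part and expected losses per undetected instance.
causes = {
    "tool wear": dict(
        occ=6, det=3, sev=4,
        p_occ=1 / 300, p_det=0.9, loss=50.0,
    ),
    "wrong material": dict(
        occ=2, det=5, sev=7,
        p_occ=1 / 10_000, p_det=0.5, loss=20_000.0,
    ),
}

def rpn(c):
    """Risk Priority Number: product of the three scores."""
    return c["occ"] * c["det"] * c["sev"]

def expected_loss(c):
    """P(Occurrence) x [1 - P(Detection)] x E(Loss per instance)."""
    return c["p_occ"] * (1.0 - c["p_det"]) * c["loss"]

by_rpn = sorted(causes, key=lambda k: rpn(causes[k]), reverse=True)
by_loss = sorted(causes, key=lambda k: expected_loss(causes[k]), reverse=True)

for name, c in causes.items():
    print(f"{name:15s} RPN {rpn(c):3d}  expected loss per part {expected_loss(c):.4f}")
print("ranked by RPN:          ", by_rpn)
print("ranked by expected loss:", by_loss)
```

With these numbers, the RPN puts the frequent, cheap cause first, while the expected loss puts the rare, expensive one first.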

Severity

The probability of a cause is not a sufficient criterion to decide to act. You must also consider the severity of its consequences. In pFMEA, severity is on a scale of 1 to 10, which you then multiply with scores for occurrence and detection. If, however, the worst effect of $C_1$ is cosmetic damage on the product while $C_2$ can kill a user, a scale of 1 to 10 doesn’t begin to account for the difference.

Within a range, you can estimate or bound the expected losses caused by the effects. You can quantify lost sales due to cosmetic defects or the cost of a recall, but existential threats to the company are beyond your ability to estimate.

If word gets around that a manufacturing company’s products kill people, its very existence is in jeopardy. In principle, you do whatever it takes to keep it from happening. In reality, however, there are limits to how far you go into considering unlikely events. In an exchange in the movie Oppenheimer, Oppenheimer and General Groves discuss the risk that the first test of the atom bomb would end the world.

As we know, the Manhattan Project did not apply the Precautionary Principle. Otherwise, they would never have tested the bomb. This principle requires “scientific certainty” of harmlessness before proceeding. Since scientific certainty is never achieved, a rigorous application of the Precautionary Principle would prevent any action.

Decision-makers cannot escape this dilemma. There are no absolute answers. They must decide how low the risk of a catastrophic outcome must be to accept it. A score of 1 to 10 may conceal it. If a major disruption in operations is a 9, how can a safety violation be only a 10?

Corroboration

The only quantity we lack to calculate the probability of a cause given an alarm is the unconditional probability of the alarm. It can be due to any one of the causes we know about, to an unknown cause, or false. Over time, we can infer it from its frequency of occurrence, from any cause.

In the meantime, as $P\left ( \text{Alarm} \right )$ is a normalizing factor common to all causes, not knowing it does not prevent us from ranking the causes in terms of expected loss given alarm.
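Written out, a sketch of the ranking criterion, where $L_C$ is shorthand introduced here for the expected loss per instance of cause $C$ :

$$E\left ( \text{Loss from } C \,\vert\, \text{Alarm} \right ) = \frac{P\left ( \text{Alarm}\,\vert\, C \right )\times P\left ( C \right )\times L_C}{P\left ( \text{Alarm} \right )} \propto P\left ( \text{Alarm}\,\vert\, C \right )\times P\left ( C \right )\times L_C$$

The denominator is the same for every cause, so sorting on the right-hand side gives the same order as sorting on the left.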

Each high-ranking cause becomes a theory of the alarm to confirm or refute through additional observation or experimentation.

Conclusions

Treating all measured variables as if they were Gaussian may have been expedient in 1924. It is unnecessarily crude in 2024. If a process variable is relevant, its process capability analysis should produce a model of its common-cause fluctuations.

With the tools available in 2024, we are not limited to Gaussians. We can focus on choosing a model that fits the data and has predictive power. And the search for such a model should be informed by our knowledge of process physics and chemistry. Then we can set up the conditions for issuing alarms.

Identifying assignable causes that may derail this process and cause alarms pays off. You use theoretical process knowledge, lessons learned from similar processes, and experience with the current one. A pFMEA is the output of such an effort. Occurrence, detection, and severity, however, should be expressed in terms of probabilities and expectations. Calculations from scores on a scale of 1 to 10 can mislead.

References
- Jaynes, E. T. (2003). Probability theory : the logic of science. Cambridge University Press.
- Juran, J.M. (1999). Quality Control Handbook, 5th-edition, Chapters 4.16 to 4.20
- Mikulak, R. J., McDermott, R., Beauregard, M. (2017). The Basics of FMEA. Taylor & Francis.
- Mastrangelo, C. M., Montgomery, D. C. (1991). Introduction to Statistical Quality Control. Wiley.
- Pearson, E. S. (1935). The Application of Statistical Methods to Industrial Standardization and Quality Control. United Kingdom: British Standards Institution.
- Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product. Martino Publishing.
- Shewhart, W. A. (1939). Statistical Method from the Viewpoint of Quality Control. Dover.
- Statistical Quality Control Handbook. (1958). AT & T Technologies.
- Weibull, W. A statistical theory of the strength of materials. Ing. Vetenskapa Acad. Handlingar 1939, 151, 1–45
- Wheeler, D.J. (2019). The Normality Myth, QualityDigest, 9/9/2019
- Wheeler, D.J. (2019). Phase Two Charts and Their Probability Limits, QualityDigest, 11/4/2019
#SPC, #controlchart, #controllimit, #FMEA, #pFMEA, #quality, #gaussian, #normaldistribution
By Michel Baudin • Data science, Technology • 1 • Tags: Control Charts, Control Limits, FMEA, gaussian, Normal distribution, pFMEA, process control, Quality, SPC
