Nov 13 2023

Process Capability Indices

Within the quality profession, a capable process is one with a high $C_{p_k}$ . In the field, it is not quite so simple. My colleague Joerg Muenzing recently shared concerns about the process capability indices:

“Many manufacturers that I know struggle with incapable processes. Intellectually, people understand the concept of capability, but are unable to effectively apply it to an entire process. A single-figure measure for the entire chain would be ideal to better understand and manage it. The challenge is that the chain consists of processes with measurable characteristics, like thickness, substitute characteristics, like leak current to infer dryness, and also visual inspection results like blemishes or scratches.

They know that $C_{p_k} < 1$ is bad, and scrap shrinks the bottom line, but not much more. At the same time, customers like large Automotive OEMs demand from their suppliers a $C_{p_k} > 1.33$ for manual and “uncritical” characteristics, and $C_{p_k} >1.66$ or even $>2.00$ for critical characteristics.

What would be useful without being ‘too wrong’?”

Let’s take a closer look.

Contents

What is the capability of a process?
- Process Capability in the Literature
  - Definitions in the literature
  - Comments on the definitions
A First Look At The Capability Indices
Where Capability Indices Fail
Estimation Issues

Conclusions
References
Appendix 1: Why “Gaussian”
Appendix 2: The Bias In Cp and Cpk

What is the capability of a process?

The capability of a manufacturing process is the degree to which it can consistently hold the critical dimensions of the product within tolerances.

The tolerances specify what a process should do; the process capability describes what it can do. As we shall see, the way to express this concept in numbers varies with the process at hand.

To support decisions in manufacturing operations, you measure process capability in terms of yield. First you define what “good” means for a unit of product, whether it is being in-spec for a set of measurements and attributes, being used for the first 90 days without complaint, having a low maintenance cost over 5 years, …

Then, the yield is the probability that a unit coming out of the process is good, and you estimate it by the relative frequency of good units in your output. Finally, you use this number to decide, for example, whether the yields of individual operations within your process are high enough to organize them in a flow line. If not, you can then work on the improvements needed to bring them to the necessary level.

But neither the $C_p$ nor the $C_{p_k}$ is a yield, and we need to look into what they are, what they mean, and how they relate to yields.

Process Capability in the Literature

The literature does not provide a crisp, concise, or consistent definition of process capability. First, let’s review a variety of sources, then let’s comment on them.

Definitions in the literature

The following is not a complete list:

Per ChatGPT, “The process capability index (Cp) is a statistical measure that assesses the ability of a process to produce items within specified tolerance limits. It considers both the process variability and the width of the tolerance range. A higher Cp value indicates better capability.” It’s not bad for ChatGPT, but it is incomplete. A statistical measure is a summary of a data set, otherwise known as a statistic. Examples include the mean, standard deviation, median, mode, skewness, kurtosis, etc. The definition doesn’t say which one to use.
The term first appeared in the Western Electric Quality Control Handbook of 1956, where “The natural behavior of the process after unnatural disturbances are eliminated is called the ‘process capability.’” It did not introduce capability indices, and did not reference tolerances.
Per the ASQ glossary, “Process capability is a statistical measure of the inherent process variability of a given characteristic.” It makes no reference to tolerances but, in the next sentence, it lists two capability indices, $C_p$ and $C_{p_k}$ , that are based on tolerances and a specific probabilistic model of the characteristic.
In the “What is process capability?” section of its Engineering Statistics Handbook, NIST manages not to define it. The closest they come in their non-answer is the opening sentence: “Process capability compares the output of an in-control process to the specification limits by using capability indices.” This means that process capability does not exist without capability indices.
Juran, in discussing quality planning, explains the lack of process capability: “Many splendid designs fail because the process by which the physical product is created or the service is delivered is not capable of conforming to the design consistently time after time” (Juran’s Quality Handbook, 5th edition, p. 3.3). Later, on p. 22.11, Frank Gryna wrote “Process capability is the measured, inherent reproducibility of the product turned out by a process,” where inherent reproducibility “refers to the product uniformity resulting from a process that is in a state of statistical control.”
Shewhart does not use the term “process capability” but, in the foreword of Shewhart’s book, Deming says “A process has no measurable capability unless it is in statistical control.” Shewhart discusses the state of statistical control extensively but does not define it explicitly.
Per Douglas Montgomery, “Process capability refers to the uniformity of the process” (Statistical Quality Control, 2nd edition, p. 365). Yes, it refers to that but what is it?
Tom Pyzdek and Paul Keller go straight into a discussion of process capability analysis, which, they say, “provides an indication of whether a controlled process is capable of reliably meeting the customer requirements.” (The Handbook for Quality Management, 2nd Edition, p. 200). Yes, but what exactly is the result of this analysis?

Comments on the definitions

The ASQ definition is about a single characteristic. Most products have many, like the 2,250 critical dimensions of a car transmission case. Per the ASQ then, the transmission case also has 2,250 different process capabilities, but that’s not consistent with usage. We think of the process of making a given model of transmission case as having one single capability to hold the vector of 2,250 critical dimensions within a specified space.

Shewhart’s focus is the distinction between common and assignable causes of variability in quality characteristics. The common causes make the characteristic fluctuate around a fixed value. Whenever you execute the process, these causes are present, in the materials, the methods, the machines, and the people.

Assignable causes of variation are not always present and disrupt the process. In the absence of any, the process is deemed to be in a state of statistical control. It doesn’t mean it’s capable, because the range of the fluctuations may exceed the tolerance interval.

A First Look At The Capability Indices

The $C_p$ and $C_{p_k}$ capability indices are emphasized in many quality courses, and customers sometimes mandate minimum values. So what are they?

The capability indices are intended to be universal metrics of process capability. As described in the earlier post, $C_p$ and $C_{p_k}$ are supposed to characterize equally well the ability to cut steel rods or bake cakes within tolerances on length or sugar content.

The $C_p$ and $C_{p_k}$ only address one dimension. For multiple characteristics, you have a multivariate $C_{p_k}$ called $MC_{p_k}$ that accounts for correlations between characteristics within a tolerance hyperrectangle. For processes with a target value, in addition to a tolerance interval, Genichi Taguchi introduced a variant called $C_{pm}$ .

Definitions of Multiple Capability Indices

$C_p$ and $C_{p_k}$ are functions of a single measured variable $X$ of a product and of the tolerance interval $[L,U]$ within which the product spec requires $X$ to fall. The definitions of $C_p$ and $C_{p_k}$ require $X$ to be a random variable with an expected value $\mu$ and a standard deviation $\sigma$:

$C_p\left ( X, L,U \right )= \frac{U-L}{6\sigma}$ measures the repeatability of the process but can be high even when the process is consistently off-spec.
$C_{p_k}\left ( X, L,U \right )= \min\left (\frac{U-\mu}{3\sigma}, \frac{\mu-L}{3\sigma} \right )$ takes into account the position of the spec interval.

In addition, the multiples of $\sigma$ used in the formulas are based on the further assumption that $X$ is Gaussian.
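To make the two definitions concrete, here is a minimal Python sketch that evaluates them for known model parameters $\mu$ and $\sigma$. The function names and the rod-length numbers are made up for illustration; they are not from any standard library or from the sources cited here.

```python
def cp(sigma, lower, upper):
    """C_p: width of the tolerance interval divided by 6 sigma."""
    return (upper - lower) / (6.0 * sigma)


def cpk(mu, sigma, lower, upper):
    """C_pk: distance from the mean to the nearest limit, divided by 3 sigma."""
    return min((upper - mu) / (3.0 * sigma), (mu - lower) / (3.0 * sigma))


# Hypothetical rod length in mm, with tolerance interval [9.7, 10.3]
cp_value = cp(sigma=0.05, lower=9.7, upper=10.3)
cpk_centered = cpk(mu=10.0, sigma=0.05, lower=9.7, upper=10.3)
cpk_shifted = cpk(mu=10.2, sigma=0.05, lower=9.7, upper=10.3)

print(cp_value, cpk_centered, cpk_shifted)
```

With the mean in the middle of the interval, the two indices coincide. Shifting the mean toward the upper limit leaves $C_p$ unchanged and lowers $C_{p_k}$, which is the sense in which $C_{p_k}$ accounts for the position of the spec interval.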

In 2012, Santos-Fernandez and Scagliarini proposed a multivariate capability index $MC_{p_k}$ but it’s not a simple combination of $C_{p_{k}}$ for each dimension. The $MC_{p_k}$ is discussed in my earlier post. It is not commonly used.

Taguchi added the concept of a target $T$ for the variable $X$ , which does not necessarily match its expected value $\mu$ . You are not just shooting for $X$ to fall anywhere within the tolerance interval $\left [ L,U \right ]$; you are going for a specific target $T$ within this interval, and you incur losses that increase with the difference between $X$ and $T$ . In Taguchi’s model, the losses are quadratic – that is, proportional to $(X-T)^2$ . Consistent with this approach, the index commonly attributed to Taguchi reduces the target-free $C_p$ by dividing it by a factor that is a function of $(\mu - T)^2$ and reduces to $1$ when $\mu =T$ :

$C_{p m}\left ( X,L,U,T \right )=\frac{C_p\left ( X,L,U \right )}{\sqrt{1+\left(\frac{\mu-T}{\sigma}\right)^2}} = \frac{U-L}{6\sqrt{\sigma^2+\left(\mu-T\right)^2}}$

Other than the desire to punish the process for being off target, I have not seen a theory of the $C_{pm}$ .

Origins of the Indices

R. M. Turunen and G. H. Watson (2021) trace the concept of process capability to the Western Electric Quality Control Handbook. In their 1956 conference paper to the Japanese Society for Quality Control (JSQC), M. Kato and T. Otsu proposed the $C_p$ index as a metric of machine process capability. In a 1967 JSQC conference paper, T. Ishiyama then proposed the $C_{p_k}$ , where the “k” in $C_{p_k}$ stands for “katayori” (偏り), which is Japanese for deviation or bias.

According to Greg Watson, the papers were in the JUSE library archives that were since moved to a university, and “These authors were working in a factory and were attempting to create an index that could be used to identify the idealized production capability of their lines.”

This is where the trail ends for now. I could not find further information about M. Kato, T. Otsu, or T. Ishiyama, or their papers. The main professional society for quality in Japan is JUSE, not JSQC. The JSQC website says that it was founded in 1970, which explains the scarcity of information about earlier conferences. Over the decades since, however, the $C_{p}$ and $C_{p_k}$ from these obscure papers became part of the dogma taught in quality courses.

David Hutchins reports coming across the $C_{p_k}$ in the 1960s as a production engineer in a company supplying Ford and not being impressed. He prefers expressing capabilities in terms of the Taguchi index. Gary Cone provided additional details on the origin of $C_{pm}$ :

“There was great interest in Taguchi in the 80’s. A group of Motorola employees were introduced to Taguchi when I invited Tom Barker of Rochester Institute of Technology to teach his methods. We played around with his notions of target values and called the result $C_{pt}$ (capability with respect to Taguchi). It was introduced to a slightly larger audience in 1988 through a paper in the Journal of Quality Technology and the three authors named it $C_{pm}$. The concept is very applicable in situations where there is an expensive thing to control, such as legal package weight in the food industry, is overcontrolled using $P_{p_k}$ ( $C_{p_k}$ for long term data).”

Underlying Assumptions

Walter Shewhart was perhaps the first to use probability theory to model process characteristics. As he put it in Statistical Method from the Viewpoint of Quality Control, “The engineer must be concerned not only with the tolerance range but also with the probability associated with that range.”

Shewhart’s Ambition

Shewhart’s writings combine considerations about tolerances, part interchangeability, and common-cause versus special-cause variability, with technical specifics. Although he does not explicitly say so, several key points emerge:

He aims to establish a general theory, applicable to quality characteristics in any industry.
He wants to apply the contemporary state of the art in probability and statistics.
He wants to set procedures that any technician could apply, with manual data collection, paper spreadsheets, graph paper, slide rules, and printed tables.

It’s been almost 100 years since Shewhart did his seminal work at Western Electric and, while his insights about probability were valid, much else has changed:

The challenge of making processes capable. Today, techniques useful in the 1920s are no longer needed in mature industries and are insufficient in high technology.
Probability theory itself. Shewhart was a thorough and rigorous thinker, but his discussions of “statistical universes” or “probability limits” are difficult to follow for anyone trained in probability theory in the past 50 years, as the concepts have advanced and the vocabulary to describe them has changed.
Advances in Information Technology (IT) and Operational Technology (OT) have enabled the automatic collection, storage, retrieval, and analysis of datasets that are orders of magnitude larger than in Shewhart’s days.

Shewhart’s Model

Shewhart’s model of a measured characteristic in a state of statistical control is the sum of a target value and a white noise. He does not use these terms but, in today’s language, it is what he does. It means that the model assumes the differences between measurements and the target at different points to be independent and identically distributed.

In addition, for control charts, the thresholds he uses to raise alarms about disruptions from assignable causes are mathematically based on the Gaussian distribution, also known as “normal,” and arbitrarily set to a p-value of 0.3%. In other words, without an assignable cause, the probability that the statistic plotted on the chart is beyond the threshold in either direction is 0.3%. See Appendix 1 about why I am calling this distribution “Gaussian” rather than “normal” or “bell-shaped.”

The Statistical Process Control (SPC) literature skips the math and provides coefficients that technicians, or SPC software, are supposed to use in setting limits. If you work out the math, however, it becomes clear that these coefficients are based on the Gaussian distribution.
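As a quick check on the 0.3% figure, the probability that a Gaussian variable falls more than $3\sigma$ from its mean in either direction can be computed from the complementary error function. The sketch below uses only Python's standard library; the function name is mine.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a Gaussian Z with zero mean and unit variance."""
    return math.erfc(k / math.sqrt(2.0))


p_beyond_3_sigma = two_sided_tail(3.0)
print(f"{p_beyond_3_sigma:.4%}")
```

The result is about 0.27%, which the text above rounds to 0.3%. It holds only under the Gaussian assumption; for other distributions, the same $\pm 3\sigma$ thresholds correspond to different probabilities.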

Where Capability Indices Fail

Let’s take the example of manufactured goods with characteristics that are not Gaussian. To this day, there is a range of products made with processes that cannot control a critical dimension precisely and where you bin finished units into different grades or sizes based on one or more dimensions.

The first reaction to such a process is that it shouldn’t exist! It is not capable, and the first step should be to make it capable. In reality, however, it does exist, and it’s not a minor phenomenon. From the perspective of the recipient of its output, to the extent the binning step works, the units are all within tolerances, and the process is capable.

There are cases where we just don’t know how to avoid it yet. We have been making ceramics for ~30,000 years. Still, the firing process today introduces variations in dimensions that are too high for parts used in electronics assembly. To meet customer specs you must either cut the products after firing, which is slow and expensive, or bin the output of firing by size classes.

Other manufactured goods that you sort into bins after making them range from lead shot to microprocessors.

Lead Shot

Lead shot for hunting comes in a variety of diameters:

The different sizes, however, are not made by a size-specific process. Instead, they are the result of binning by diameter after a single process, done in a tower like this one:

The lead shot tower of Couëron, active 1878-1993

At the top of the tower, you pour molten lead onto a sieve, and the falling drops solidify on the way down. At the bottom, those not spherical enough don’t roll down a chute properly and return to the melt. Then you sort the remaining ones by diameter.

When you make different products by binning, a common process determines the breakdown by size. With lead shot, you can always recycle the overage if you have more of a given size than the market demands. You do not have that option if you bin socks by size.

Microprocessors

High-technology, ubiquitous products like microprocessors are the result of binning, as described in an interview by an Intel fellow:

Different units of the same microprocessor made in the same fabrication line and possibly on the same wafer of Silicon may be binned at different clock speeds, one being sold as a chip that works at 5.8 GHz, and the other one at 3 GHz. You first test it at the highest speed. If it doesn’t pass, you test it at a lower speed. If it doesn’t pass at any speed, it’s a reject. Otherwise, it goes into the bin for the highest speed it passes – that is, unless other considerations override the objective criterion.
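The test-and-downgrade logic described above can be sketched in a few lines of Python. Everything here is hypothetical: `passes_at` stands in for whatever test the fabrication line actually runs, and the intermediate 4.5 GHz bin is invented to have more than two grades.

```python
def bin_by_speed(passes_at, speeds_ghz):
    """Return the highest speed the unit passes, or None for a reject.

    passes_at: function taking a clock speed in GHz and returning True/False.
    speeds_ghz: the clock speeds at which the product is sold.
    """
    for speed in sorted(speeds_ghz, reverse=True):  # test the fastest bin first
        if passes_at(speed):
            return speed
    return None  # failed at every speed


def make_chip(max_stable_ghz):
    """Hypothetical unit that passes any test at or below its stable clock."""
    return lambda speed: speed <= max_stable_ghz


bins = [3.0, 4.5, 5.8]
print(bin_by_speed(make_chip(6.0), bins))  # fast unit
print(bin_by_speed(make_chip(4.9), bins))  # mid-range unit
print(bin_by_speed(make_chip(2.0), bins))  # reject
```

This captures only the objective criterion. The commercial overrides discussed next are outside the sketch.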

The fastest chips sell at a high premium, as long as they are in short supply, and the chip maker may be tempted to downgrade chips that work at the higher clock speed to maintain the scarcity of the fast chips.

In at least one case, this did not escape the attention of a reseller, who bought large volumes of “slow” chips, retested them, and relabeled the ones that passed the high-speed test. The supplier’s response was to engrave their pessimistic speed rating on the chips to make them more difficult to relabel.

Modeling Dimensions After Binning

The following picture shows a variable $X$ with a distribution wider than the tolerance interval $\left [ a,b \right ]$ . From the user’s perspective, this process is capable if you can filter out all units with $x$ outside of $\left [ a,b \right ]$ .

In the example above, the distribution within $\left [ a,b \right ]$ is flat enough to be approximated with the uniform distribution, for which

The mean is $\mu = \frac{1}{2}\left ( a+b \right )$ ,
The standard deviation is $\sigma = \sqrt{\frac{1}{12}}\left ( b-a \right )$ ,

and

$C_p = C_{p_k} = \frac{b-a}{6\sigma} = \frac{1}{6\sqrt{\frac{1}{12}}} = \frac{1}{\sqrt{3}} = 0.58$ ,

The process, therefore, has low capability indices in spite of being as capable as a binning process can be. There is no way to achieve high capability indices with this process.
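You can check the $1/\sqrt{3}$ value numerically. The following sketch, with made-up numbers, simulates a Gaussian process whose standard deviation is five times the half-width of the tolerance interval, keeps only the units that fall inside $\left [ a,b \right ]$, and estimates $C_p$ from the survivors.

```python
import math
import random
import statistics

random.seed(0)

a, b = 9.0, 11.0       # tolerance interval, hypothetical units
process_sigma = 5.0    # process spread, much wider than the tolerance

# Bin: keep only the units that fall within [a, b]
kept = []
while len(kept) < 20000:
    x = random.gauss(10.0, process_sigma)
    if a <= x <= b:
        kept.append(x)

s = statistics.stdev(kept)
cp_after_binning = (b - a) / (6.0 * s)

print(f"Cp after binning: {cp_after_binning:.3f}")
print(f"1/sqrt(3):        {1.0 / math.sqrt(3.0):.3f}")
```

Every unit that reaches the customer is within tolerances, and yet the index computed on those units stays close to 0.58.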

Estimation Issues

When you study the capability of a process, $\mu$ and $\sigma$ are unknown, and must be estimated from a sample $\mathbf{x} = \left ( x_1,...x_n \right )$ of measurements. You replace $\mu$ and $\sigma$ with their usual estimates:

$m(\mathbf{x}) = \frac{1}{n}\sum_{i =1}^{n}x_i$ and
$s(\mathbf{x}) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left [ x_i - m(\mathbf{x}) \right ]^2}$ ,

And you estimate $C_p$ and $C_{p_k}$ by:

$\hat{C}_p\left ( \mathbf{x}, L,U \right )= \frac{U-L}{6\times s\left ( \mathbf{x} \right )}$ and
$\hat{C}_{p_k}\left ( \mathbf{x}, L,U \right )= min\left (\frac{U-m\left ( \mathbf{x} \right )}{3\times s\left ( \mathbf{x} \right )}, \frac{m\left ( \mathbf{x} \right )-L}{3\times s\left ( \mathbf{x} \right )} \right )$ ,
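In Python, these estimates take a few lines with the standard library, where `statistics.stdev` uses the same $n-1$ denominator as $s(\mathbf{x})$. The function name and the thickness data below are made up for illustration.

```python
import statistics


def capability_estimates(xs, lower, upper):
    """Return (Cp_hat, Cpk_hat) estimated from the measurements xs."""
    m = statistics.fmean(xs)   # sample mean m(x)
    s = statistics.stdev(xs)   # sample standard deviation s(x), n-1 denominator
    cp_hat = (upper - lower) / (6.0 * s)
    cpk_hat = min(upper - m, m - lower) / (3.0 * s)
    return cp_hat, cpk_hat


# Hypothetical thickness measurements in mm, with tolerance [4.9, 5.1]
thickness = [4.98, 5.02, 5.01, 4.97, 5.03, 5.00, 4.99, 5.04]
cp_hat, cpk_hat = capability_estimates(thickness, 4.9, 5.1)
print(f"Cp_hat = {cp_hat:.2f}, Cpk_hat = {cpk_hat:.2f}")
```

These are functions of the data, not of the model, and a different sample from the same process gives different values.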

The ASQ’s Quality Resources article about the capability indices glosses over the fact that $\hat{C}_p$ and $\hat{C}_{p_k}$ are biased estimators of $C_p$ and $C_{p_k}$ . There is a rich and esoteric academic literature on correcting these biases, but the quality literature generally ignores them.

When $X$ is Gaussian, the math shows that they systematically overestimate the indices, by amounts that are not negligible for small samples. If you work with samples of more than 1,000 points, you don’t need to worry about it; with 30 points, you do. You usually conduct process capability studies before ramping up production, on small samples.

Model Parameters Versus Estimates

The literature makes no distinction between model parameters and their estimates:

The $C_p$ and $C_{p_k}$ are functions of your model of the variable $X$ and the tolerance interval $[L,U]$ . $X$ stands for all the values the variable may take when you measure it.
The $\hat{C}_p$ and $\hat{C}_{p_k}$ estimate $C_p$ and $C_{p_k}$ from data. $\mathbf{x} = \left ( x_1,...x_n \right )$ is the sequence of numbers you measured.

Biases in Capability Indices

Statisticians ask questions about estimators, particularly whether they give the right answer on the average. $s(\mathbf{x})$ is a function of the data set $\mathbf{x} = \left ( x_1,...x_n \right )$ . To answer the question, you view $s(\mathbf{x})$ itself as an instance of the random variable $s(\mathbf{X})$ where $\mathbf{X} = \left ( X_1,...X_n \right )$ is a sequence of independent and identically distributed random variables.

With a little algebra, you can easily verify that $E\left ( s\left ( \mathbf{X} \right )^2\right )= \sigma^2$ . In other words, $s^2$ is an unbiased estimator of $\sigma^2$ . This is true for any random variable that has a standard deviation. It does not have to be Gaussian.

On the other hand $E\left ( s\left ( \mathbf{X} \right )\right ) \neq \sigma$ . The theory of the $\bar{X}-S$ control chart takes this bias into account through a correction factor called $c_4$ , based on the assumption that the variable is Gaussian.
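A small simulation illustrates both statements for Gaussian data. It uses the standard closed form $c_4(n) = \sqrt{\frac{2}{n-1}}\,\Gamma\left(\frac{n}{2}\right)/\Gamma\left(\frac{n-1}{2}\right)$, for which $E\left(s\right) = c_4\sigma$. The sample size, $\sigma$, and number of replications are arbitrary choices for this sketch.

```python
import math
import random


def c4(n):
    """Gaussian correction factor: E(s) = c4(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))


def sample_std(xs):
    """Sample standard deviation with the n-1 denominator."""
    m = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))


random.seed(1)
n, sigma, replications = 5, 2.0, 40000
s_values = [sample_std([random.gauss(0.0, sigma) for _ in range(n)])
            for _ in range(replications)]

mean_s = math.fsum(s_values) / replications
mean_s2 = math.fsum(s * s for s in s_values) / replications

print(f"mean of s^2 = {mean_s2:.3f}  vs sigma^2     = {sigma ** 2:.3f}")
print(f"mean of s   = {mean_s:.3f}  vs c4(n)*sigma = {c4(n) * sigma:.3f}")
```

The average of $s^2$ lands near $\sigma^2$, while the average of $s$ lands near $c_4\sigma$, visibly below $\sigma$ for a sample size this small.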

$\hat{C}_p$ and $\hat{C}_{p_k}$ have $s(\mathbf{x})$ in their denominators and, in general, $E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right )\neq \frac{1}{\sigma}$ .

If the $X_i$ are Gaussian variables with mean $\mu$ and standard deviation $\sigma$ , we have instead the more complicated formula:

$E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right )= \frac{1}{\sigma}\times g\left ( n \right )$ ,

where

$g\left ( n \right ) = \frac{\sqrt{n-1}}{\sqrt{2}}\times \frac{\Gamma\left ( \left ( n-2 \right )/2 \right ) }{\Gamma\left ( \left ( n-1\right )/2 \right )} = 1+\frac{3}{4n} + O\left (\frac{1}{n^2} \right )$ ,

For readers interested in the math of these formulas, their derivation is in Appendix 2.
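The factor $g(n)$ is easy to evaluate without overflow by working with the logarithm of the $\Gamma$ function. The sketch below compares it with the first-order approximation $1 + \frac{3}{4n}$; it requires $n \geq 3$ for the $\Gamma$ arguments to be positive.

```python
import math


def g(n):
    """Bias factor for Gaussian samples: E(1/s) = g(n) / sigma, for n >= 3."""
    if n < 3:
        raise ValueError("g(n) is defined here for n >= 3")
    return math.sqrt((n - 1) / 2.0) * math.exp(
        math.lgamma((n - 2) / 2.0) - math.lgamma((n - 1) / 2.0))


for n in (5, 10, 30, 100, 1000, 100000):
    exact = g(n) - 1.0
    approx = 3.0 / (4.0 * n)
    print(f"n = {n:>6}: exact bias {exact:.4%}, 3/(4n) = {approx:.4%}")
```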

How Large are the Biases?

For large $n$ , the bias is approximately $\frac{3}{4n}$ . This approximation is close for a sample size as small as $n= 100$ , giving you 0.75% instead of 0.76%.

The following table shows the ratio by which $E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right )$ exceeds $\frac{1}{\sigma}$ as a function of sample size:

Sample size $n$	Bias $g(n) - 1$
5	+25.3%
10	+9.4%
30	+2.6%
100	+0.76%
1,000	+751 ppm
100,000	+8 ppm

This is for the Gaussian distribution. If your measured variable is Gaussian, $\hat{C}_p$ and $\hat{C}_{p_k}$ systematically overestimate $C_p$ and $C_{p_k}$ , by amounts that become negligible as the sample size increases. With $X$ Gaussian, you could use the formula for $g(n)$ to remove the bias. For other distributions, however, the formula does not apply.
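Under the Gaussian assumption, the correction amounts to dividing $\hat{C}_p$ by $g(n)$. The following sketch checks this by simulation for samples of 10 points from a process with a true $C_p$ of 1.5; the function names, tolerance limits, and replication count are assumptions made for the example.

```python
import math
import random


def g(n):
    """Gaussian bias factor of 1/s, valid for n >= 3."""
    return math.sqrt((n - 1) / 2.0) * math.exp(
        math.lgamma((n - 2) / 2.0) - math.lgamma((n - 1) / 2.0))


def sample_std(xs):
    m = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def cp_hat(xs, lower, upper):
    return (upper - lower) / (6.0 * sample_std(xs))


def cp_hat_corrected(xs, lower, upper):
    """Bias-corrected estimate, valid only if the data are Gaussian."""
    return cp_hat(xs, lower, upper) / g(len(xs))


random.seed(2)
lower, upper, n = -4.5, 4.5, 10   # sigma = 1, so the true Cp is 1.5
raw, corrected = [], []
for _ in range(20000):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    raw.append(cp_hat(xs, lower, upper))
    corrected.append(cp_hat_corrected(xs, lower, upper))

mean_raw = math.fsum(raw) / len(raw)
mean_corrected = math.fsum(corrected) / len(corrected)
print(f"true Cp 1.5, mean raw {mean_raw:.3f}, mean corrected {mean_corrected:.3f}")
```

The correction removes the bias on average. It does nothing about the scatter of individual estimates, which remains large at this sample size.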

The General Case

In the general case, Jensen’s inequality, applied to the convex function $\phi(x) = \frac{1}{\sqrt{x}}$ gives us:

$E\left [ \frac{1}{s\left ( \mathbf{X} \right )} \right ] = E\left [ \frac{1}{\sqrt{s^2\left ( \mathbf{X} \right )}} \right ]\geq \frac{1}{\sqrt{E\left [s^2\left ( \mathbf{X} \right ) \right ]}}= \frac{1}{\sigma}$ ,

This means that the estimates $\hat{C}_p$ and $\hat{C}_{p_k}$ may overestimate $C_p$ and $C_{p_k}$ but not underestimate them. This bias needs to be calculated for the specific distribution of $X$ .
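To see the inequality at work outside the Gaussian case, you can simulate $E\left ( 1/s \right )$ for other distributions and compare it with $1/\sigma$. The two distributions below, uniform on $[0,1]$ and exponential with unit rate, are arbitrary choices for this sketch, as are the sample size and replication count.

```python
import math
import random


def sample_std(xs):
    m = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def mean_inverse_s(draw, n, replications):
    """Monte Carlo estimate of E(1/s) for samples of size n drawn with draw()."""
    total = 0.0
    for _ in range(replications):
        total += 1.0 / sample_std([draw() for _ in range(n)])
    return total / replications


rng = random.Random(3)
n, replications = 10, 20000

# Uniform on [0, 1]: sigma = 1/sqrt(12)
ratio_uniform = mean_inverse_s(rng.random, n, replications) / math.sqrt(12.0)

# Exponential with rate 1: sigma = 1
ratio_exponential = mean_inverse_s(lambda: rng.expovariate(1.0), n, replications)

print(f"uniform:     E(1/s) * sigma = {ratio_uniform:.3f}")
print(f"exponential: E(1/s) * sigma = {ratio_exponential:.3f}")
```

Both ratios come out above 1, as Jensen's inequality requires, but by different amounts, which is why a correction factor derived for the Gaussian does not carry over.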

Example from the ASQ

The ASQ’s Quality Resources website discusses sampling for a process capability study:

“For example, suppose you have a rotary tablet press that produces 30 tablets, one from each of 30 pockets per rotation. If you’re interested in tablet thickness, you might want to base your estimate of process capability on the standard deviation calculated from 30 consecutive tablets. Better yet, you might assure representation by taking those 30 consecutive tablets repeatedly over eight time periods spaced evenly throughout a production run. You would pool the eight individual standard deviations yielding a thickness capability estimate based on $8 \times (30 - 1) = 232$ degrees of freedom.”

If you estimate the $C_p$ and $C_{p_k}$ of thickness based on a sample of 30 tablets, you will overestimate them by about 2.6%, and all you gain from 8 such samples is a more consistent bias of 2.6%. If, on the other hand, you estimate your indices from a single sample of $8\times 30 = 240$ points, you have a negligible bias of 0.3%.

Because the press produces 30 tablets in 30 pockets simultaneously, systematic variations between pockets interest process engineers. To study them, you must separate the tablets by pocket, which is challenging. If you throw all the tablets into a common bin, you lose this information:

To the customer, however, the only thing that matters is the consistency of thickness in the entire population of tablets regardless of which pockets on the machine they come from.

NIX, an injection molding company in Japan, had developed this “octopus” to keep parts separated by die cavity in the blue bins at the bottom:

Translation to Yield

In the ASQ tablet example, let us assume that the thickness is centered on the middle of the tolerance interval, so that $C_p= C_{p_k} = \frac{U-L}{6\sigma}$ and $\mu = \frac{U+L}{2}$ . The yield of the process is

$Y = \Phi\left ( \frac{U-\mu}{\sigma} \right ) - \Phi\left ( \frac{L-\mu}{\sigma} \right ) = \Phi\left ( \frac{U-L}{2\sigma} \right ) - \Phi\left ( -\frac{U-L }{2\sigma} \right ) = \Phi\left ( 3C_p \right ) - \Phi\left (-3C_p \right )$

where $\Phi$ is the cumulative distribution function for the Gaussian with zero mean and unit variance. The yield is only a function of $C_p$ , which is exactly the point of using it.
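Since $\Phi(z) - \Phi(-z) = \mathrm{erf}\left(z/\sqrt{2}\right)$, the yield of a centered Gaussian characteristic follows directly from $C_p$. A minimal sketch, with a function name of my choosing:

```python
import math


def yield_from_cp(cp):
    """Yield of a centered Gaussian characteristic: Phi(3 Cp) - Phi(-3 Cp)."""
    return math.erf(3.0 * cp / math.sqrt(2.0))


for cp in (1.0, 1.33, 1.5, 1.67, 2.0):
    print(f"Cp = {cp:.2f}: fraction out of tolerance = {1.0 - yield_from_cp(cp):.3e}")
```

For $C_p = 1$, the tolerance interval is the $\pm 3\sigma$ interval, and the yield is about 99.73%.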

Yield for Many Characteristics

This is for $1$ measured parameter. An automatic car transmission case has 2,250 critical dimensions. If they are all independent, centered, with the same $C_p$ , then, they all have equal yields, and their joint yield is $Y^{2250}$ . As $C_p$ varies from 1 to 2, the yield for the transmission case varies as follows:

In this simplified but not unrealistic case, any $C_p \leq 1.5$ fails to give you a capable process. In diecasting, it’s unlikely that all dimensions would have the same $C_p$ . On the other hand, if you are assembling thousands of purchased components and require only a $C_p = 1$ from suppliers, you are unlikely to get anything better, and your assembly process will have a yield near $0$ .

A $C_p = 1.5$ says that, for any given dimension $X$ among the 2,250 of the transmission case, the tolerance interval $\left [ L,U \right ]$ matches the $\pm 4.5\sigma$ interval of the dimension’s distribution. Numerically, this translates to

$P\left ( X < L \right ) = P\left ( X > U \right ) = 3.4\times 10^{-6}$ ,

Based on the strange, arbitrary, and confusing theory of the 1.5 Sigma shift, the “Six Sigma” literature calls this the “Six Sigma level.” When a product has 2,250 dimensions, each at this level, it only has a probability of 98.5% that they will all be within tolerances, which is unacceptably low in automotive parts.

At $C_p = 2$ , $\left [ L,U \right ]$ matches the actual $\pm 6\sigma$ interval of the dimension’s distribution. Then

$P\left ( X < L \right ) = P\left ( X > U \right ) = 9.9 \times 10^{-10}$ ,

It is 3445 times smaller than for $C_p = 1.5$ , and the probability of having all dimensions within tolerances rises to 99.9996%.
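These numbers can be reproduced with the standard library. The sketch below computes the one-sided tail probability and the joint yield of 2,250 independent, centered dimensions sharing the same $C_p$. It uses `erfc` and `log1p` so that the tiny tail probabilities are not lost to rounding; the function names are mine.

```python
import math

N_DIMENSIONS = 2250  # critical dimensions of the transmission case


def one_sided_tail(cp):
    """P(X < L) = P(X > U) for a centered Gaussian characteristic."""
    return 0.5 * math.erfc(3.0 * cp / math.sqrt(2.0))


def joint_yield(cp, n_dims=N_DIMENSIONS):
    """Probability that n_dims independent dimensions are all in tolerance."""
    single_yield_log = math.log1p(-2.0 * one_sided_tail(cp))
    return math.exp(n_dims * single_yield_log)


for cp in (1.0, 1.33, 1.5, 1.67, 2.0):
    print(f"Cp = {cp:.2f}: tail = {one_sided_tail(cp):.2e}, "
          f"joint yield = {joint_yield(cp):.6%}")
```

At $C_p = 1$, the joint yield is well under 1%, which is the sense in which a supplier requirement of $C_p = 1$ on thousands of characteristics leaves you with a yield near $0$.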

Does the Bias Make a Difference?

Again, let’s look at what happens when we use estimates. Equating $\hat{C}_p$ with $C_p$ , we estimate the yield as

$\hat{Y} = \Phi\left ( 3\hat{C}_p \right ) - \Phi\left (-3\hat{C}_p \right )$

Does this add more bias? In other words, how does the expected value of the estimate $E\left (\hat{Y} \right ) = E\left [\Phi\left ( 3\hat{C}_p \right ) \right ] - E\left [\Phi\left (-3\hat{C}_p \right ) \right ]$ compare with the theoretical value $Y = \Phi\left ( 3C_p \right ) - \Phi\left (-3C_p \right )$ ?

The easiest way to estimate this is through a simulation. You generate 30 million independent instances of a Gaussian variable with $0$ mean and unit variance, and group them into 1 million samples of 30 each. You estimate the standard deviation within each sample, and then take the average over the 1 million samples:

$\overline{\hat{Y}} =\overline{\Phi\left ( 3\hat{C}_p \right ) - \Phi\left (-3\hat{C}_p \right )}$ ,
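Here is a scaled-down sketch of such a simulation in plain Python, with 20,000 samples instead of 1 million so that it runs in a few seconds. It assumes a centered process with $\sigma = 1$ and a tolerance interval of width $6C_p$, so that $\hat{C}_p = C_p / s$. The seed, replication counts, and function names are choices made for this example, not the ones behind the results discussed here.

```python
import math
import random


def yield_from_cp(cp):
    """Phi(3 Cp) - Phi(-3 Cp) for a centered Gaussian characteristic."""
    return math.erf(3.0 * cp / math.sqrt(2.0))


def sample_std(xs):
    m = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def mean_estimated_yield(cp_true, n, n_samples, rng):
    """Average of Y_hat over n_samples Gaussian samples of size n."""
    total = 0.0
    for _ in range(n_samples):
        s = sample_std([rng.gauss(0.0, 1.0) for _ in range(n)])
        cp_hat = cp_true / s          # (U - L) / (6 s) with U - L = 6 * cp_true
        total += yield_from_cp(cp_hat)
    return total / n_samples


rng = random.Random(2023)
cp_true = 1.0
y_true = yield_from_cp(cp_true)
y_bar_30 = mean_estimated_yield(cp_true, 30, 20000, rng)
y_bar_100 = mean_estimated_yield(cp_true, 100, 6000, rng)

print(f"theoretical Y        = {y_true:.5f}")
print(f"mean Y_hat, n = 30   = {y_bar_30:.5f}")
print(f"mean Y_hat, n = 100  = {y_bar_100:.5f}")
```

Because the yield is a nonlinear function of $\hat{C}_p$, the average of $\hat{Y}$ reflects both the bias of $\hat{C}_p$ and its scatter, and small differences in single-characteristic yield are amplified when raised to the power 2,250.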

Let’s apply this to the transmission case that has 2,250 critical dimensions. If they are all independent, centered, with the same $C_p$ , then, they all have an equal probability $Y$ of falling within their tolerance intervals, and the probability that all of them do is $Y^{2250}$ .

Let’s look at the effect of the bias as $C_p$ varies from $1$ to $2$ :

Let us assume that we use a sample size of $100$ instead of the $30$ in the ASQ example. With the same simulated data, we can form $300,000$ samples of $100$ points each, and recalculate estimates. As shown in the following picture, we get closer to the theoretical value:

Unsurprisingly, larger samples give you better results. Unfortunately, process capability studies are part of new product introduction, when large samples are unavailable.

Conclusions

Originating in obscure papers in Japan in 1956 and 1967, the $C_p$ and $C_{p_k}$ have upstaged yields not only in the literature on process capability but also, sometimes, in the expression of requirements from customers to suppliers.

As defined, the $C_p$ and $C_{p_k}$ are applicable only to measured variables that follow a Gaussian distribution, a requirement that is far from universally met. The formulas commonly used to estimate $C_p$ and $C_{p_k}$ from data are biased in ways that are not negligible for the small datasets used in capability studies, and massive when $C_p$ and $C_{p_k}$ estimates are used to infer yields.

The math of $C_p$ and $C_{p_k}$ is flawed and more complicated than that of yields, and less generally applicable. It’s not clear that they present any advantage.

References

Alvarez, E., Moya-Fernandez, P. J., Blanco-Encomienda, F. J., & Munoz, J. F. (2015). Methodological insights for industrial quality control management: The impact of various estimators of the standard deviation on the process capability index. Journal of King Saud University – Science, 27, 271–277.
Burdick, R., Borror, C., & Montgomery, D. (2005) Design and Analysis of Gauge R and R Studies: Making Decisions with Confidence Intervals in Random and Mixed ANOVA Models. American Statistical Association and the Society for Industrial and Applied Mathematics, ISBN: 0898715881.
Chan, L. K., Cheng, S. W., & Spiring, F. A. (1988). A New Measure of Process Capability: Cpm. Journal of Quality Technology, 20, 162-175. doi:10.1080/00224065.1988.11979102.
Chao, M-T & Lin, D. K. J. (2006). Another Look at the Process Capability Index. Quality and Reliability Engineering International, 22, 153-163.
Current, M. (2014). Metrology Methods for Ion Implantation Process Controls.
Isaic-Maniu, A., Dragan, I-M., Grigore, A-M., & Constantin, F. (2023). Taguchi Risk and Process Capability. Risks. 11. 178. 10.3390/risks11100178.
May, G. S. & Spanos, C. J. (2006). Fundamentals of Semiconductor Manufacturing and Process Control. Germany: Wiley.
Mastrangelo, C. M., Montgomery, D. C. (1991). Introduction to Statistical Quality Control. United States: John Wiley & Sons.
Pearson, E. S. (1935). The Application of Statistical Methods to Industrial Standardization and Quality Control. United Kingdom: British Standards Institution.
Pyzdek, T., Keller, P. A. (2013). The Handbook for Quality Management, Second Edition: A Complete Guide to Operational Excellence. United Kingdom: McGraw-Hill Education.
Santos-Fernandez, E. & Scagliarini (2012) An R Package for Computing Multivariate Process Capability Indices, Journal of Statistical Software April 2012, Volume 47, Issue 7.
Shewhart, W. A. (1939). Statistical Method from the Viewpoint of Quality Control. United States: Dover Publications.
Small, B.B. (Ed.) (1956). Statistical Quality Control Handbook. United States: Western Electric.
Taghizadeh-Yazdi, M. (2005). Description of Process Capability Indices.
Tricomi, F. G. & Erdélyi, A. (1951). The Asymptotic Expansion of a Ratio of Gamma Functions. Pacific Journal of Mathematics, 1, 133–142.
Turunen, R. M. & Watson, G. H. (2021), Modern Approach: Analyzing the Capability of Lean Processes, Quality Progress, 54:3, pp. 14-21.

Appendix 1: Why “Gaussian”

Many call the Gaussian distribution “normal” or “bell-shaped.” I avoid both terms because normal suggests that all other distributions are somehow abnormal, and bell-shaped is imprecise, as many other distributions are also bell-shaped.

The “Gaussian” label is also commonly used and honors C.F. Gauss, whom Germany recognized on the 10-DM bill in the 1990s for his contributions to the theory of this distribution.

Appendix 2: The Bias In Cp and Cpk

This appendix goes beyond high school math. It uses the $\Gamma$ function that Euler defined to extend $n!$ to real and complex arguments, and $O\left (\frac{1}{n^2}\right )$ means “a quantity that vanishes like $\frac{1}{n^2}$ as $n$ increases.” The bias factor $g(n)$ is the result of a long calculation but is needed only for $n \leq 100$ . It can then be used to correct the bias when estimating $C_p$ from small samples.

Distribution of the Variance Estimator

Per Cochran’s theorem, with Gaussian $X_i$ , the scaled sum of squares $\frac{n-1}{\sigma^2}\times s\left ( \mathbf{X} \right )^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom, which involves Euler’s function. The probability density function of a $\chi^2$ with $k$ degrees of freedom is

$f\left ( x,k \right )= \frac{1}{2^{k/2}\Gamma\left ( k/2 \right )}x^{k/2-1}e^{-x/2}$ ,

If you are comfortable with this formula, skip to the next section. If you want to see how it is derived from the Gaussian, read on.

A $\chi^2$ with $k$ degrees of freedom is the sum of the squares of $k$ independent Gaussians $N(0,1)$ , and its value is constant over the sphere $r^2 = x^2_1 +\dots+x^2_k$ .

The area $A_{k-1}$ of the unit sphere $S_{k-1}$ in $\mathbb{R}^k$ is $\frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )}$ , and, for the sphere of radius $r$ , it is

$A_{k-1}(r)=r^{k-1}\frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )}$ ,

If we change the variable to $u = r^2$ , then $dr = \frac{1}{2\sqrt{u}}du$ and the density of a $\chi^2$ with $k$ degrees of freedom becomes

$f\left ( u,k \right )= A_{k-1}(\sqrt{u})\times \phi(u)\times\frac{1}{2\sqrt{u}}$ ,

where $\phi(u)$ is the value of the multivariate Gaussian density at any point on the sphere of radius $\sqrt{u}$ :

$\phi(u) =\prod_{i =1}^{k}\frac{1}{\sqrt{2\pi}}e^{-x^2_i/2} = \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-r^2/2} = \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-u/2}$ .

Therefore

$f(u,k)=u^{(k-1)/2}\frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )}\times \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-u/2}\times \frac{1}{2\sqrt{u}}$ ,

which simplifies to

$f(u,k)=\frac{1}{2^{k/2}\Gamma\left ( k/2 \right )}u^{(k/2-1)}e^{-u/2}$ ,
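As a sanity check on this derivation, here is a minimal Python sketch, using only the standard library, that assembles the density from its three factors (sphere area, Gaussian density, and change of variable) and compares the result with the closed form. The function names and the crude midpoint integration are illustrative choices, not part of any library.

```python
import math

def chi2_pdf(u: float, k: int) -> float:
    """Closed-form chi-square density with k degrees of freedom, for u > 0."""
    log_f = ((k / 2 - 1) * math.log(u) - u / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)

def chi2_pdf_from_sphere(u: float, k: int) -> float:
    """Same density, built from the three factors used in the derivation."""
    r = math.sqrt(u)
    # Area of the sphere of radius r in R^k
    area = r ** (k - 1) * 2 * math.pi ** (k / 2) / math.gamma(k / 2)
    # Standard multivariate Gaussian density at any point of that sphere
    phi = math.exp(-u / 2) / (2 * math.pi) ** (k / 2)
    # Jacobian of the change of variable u = r^2, i.e. dr/du
    jacobian = 1 / (2 * r)
    return area * phi * jacobian

def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint rule; crude, but adequate for a sanity check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

if __name__ == "__main__":
    for k in (3, 5, 10):
        gap = abs(chi2_pdf(4.0, k) - chi2_pdf_from_sphere(4.0, k))
        total = integrate(lambda u: chi2_pdf(u, k), 0.0, 80.0)
        print(f"k={k}: gap between the two forms={gap:.2e}, integral={total:.5f}")
```

The two forms should agree to floating-point precision, and the integral should be close to 1 for each $k$.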

Expected Value of the Inverse of the Standard Deviation Estimate

$u =(n-1)s^2/\sigma^2$ is a $\chi^2$ variable with $n-1$ degrees of freedom. The expected value of the inverse square root of $u$ is therefore

$h\left ( n \right )= \int_{0}^{+\infty}\frac{1}{\sqrt{u}}f(u,n-1)du=\frac{1}{2^{(n-1)/2}\Gamma\left ( (n-1)/2 \right )}\int_{0}^{+\infty}u^{((n-2)/2-1)}e^{-u/2}du$ ,

By a change of variable to $v = u/2$ , the integral becomes

$\int_{0}^{+\infty}u^{(n-2)/2-1}e^{-u/2}du = 2^{(n-2)/2} \times \int_{0}^{+\infty}v^{(n-2)/2-1}e^{-v}dv$ ,

$\int_{0}^{+\infty}u^{(n-2)/2-1}e^{-u/2}du = 2^{(n-2)/2} \times \Gamma((n-2)/2)$ ,

and

$h\left ( n \right ) = \frac{\Gamma((n-2)/2)}{\sqrt{2}\Gamma\left ( \left ( n-1 \right )/2 \right )}$ ,

Therefore, since $\frac{\sigma}{s} = \sqrt{\frac{n-1}{u}}$ , the bias factor is $g(n) = \sqrt{n-1}\times h(n)$ :

$g\left ( n \right ) = \frac{\sqrt{n-1}}{\sqrt{2}}\times \frac{\Gamma((n-2)/2)}{\Gamma\left ( \left ( n-1 \right )/2 \right )} = \sqrt{ \frac{n-1}{2}}\times \frac{\Gamma((n-1)/2 - 1/2)}{\Gamma\left ( \left ( n-1 \right )/2 \right )}$ .
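To make this formula concrete, here is a minimal Python sketch that evaluates $g(n)$ with the standard library, checks it against a small simulation, and uses it to correct a $C_p$ estimate. It assumes the usual definition $C_p = (USL - LSL)/6\sigma$ and that $g(n)$ is the expected value of $\sigma/s$ for a Gaussian sample. The names `g`, `simulated_bias`, and `cp_hat`, as well as the sample values, are hypothetical and for illustration only.

```python
import math
import random

def g(n: int) -> float:
    """Bias factor E[sigma / s] for a Gaussian sample of size n (needs n >= 3)."""
    if n < 3:
        raise ValueError("Gamma((n-2)/2) is not finite for n < 3")
    z = (n - 1) / 2
    # lgamma avoids overflow of the Gamma function for large n
    return math.sqrt(z) * math.exp(math.lgamma(z - 0.5) - math.lgamma(z))

def sample_std(sample) -> float:
    """Usual estimator s, with n - 1 in the denominator."""
    n = len(sample)
    mean = sum(sample) / n
    return math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

def simulated_bias(n: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[sigma / s], with sigma = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += 1.0 / sample_std([rng.gauss(0.0, 1.0) for _ in range(n)])
    return total / trials

def cp_hat(sample, lsl: float, usl: float, correct_bias: bool = False) -> float:
    """Cp estimate (USL - LSL) / (6 s), optionally divided by g(n)."""
    cp = (usl - lsl) / (6 * sample_std(sample))
    return cp / g(len(sample)) if correct_bias else cp

if __name__ == "__main__":
    for n in (5, 10, 30):
        print(n, round(g(n), 4), round(simulated_bias(n), 4))
    data = [9.8, 10.1, 10.0, 10.2, 9.9]  # made-up measurements
    print(cp_hat(data, 9.0, 11.0), cp_hat(data, 9.0, 11.0, correct_bias=True))
```

Since $g(n) > 1$, the uncorrected estimate overstates $C_p$ on average, and dividing by $g(n)$ removes that bias under the Gaussian assumption.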

Approximation for Large Samples

To approximate this for large $n$ , let’s express $g(n)$ as a function of $z = \frac{n-1}{2}$ , so that

$g(n) = \sqrt{z}\times \frac{\Gamma(z - 1/2)}{\Gamma(z)}$ ,

Stirling’s series for large $z$ gives us

$\frac{\Gamma\left ( z + \alpha \right )}{\Gamma\left ( z + \beta \right )}= z^{\alpha -\beta}\times\left [ 1 + \frac{\left ( \alpha -\beta \right )\left ( \alpha + \beta -1 \right )}{2z}+ O\left ( \frac{1}{z^2} \right ) \right ]$ ,

If we plug in $\alpha = -1/2$ and $\beta = 0$ , we get

$g(n)= \sqrt{z}\times\frac{\Gamma\left ( z -1/2 \right )}{\Gamma\left ( z \right )}= 1 + \frac{3}{8z}+ O\left ( \frac{1}{z^2} \right )$ ,

and, translating back in terms of $n$ ,

$g(n)= 1+ \frac{3}{4\left ( n-1 \right )}+ O\left ( \frac{1}{n^2} \right ) = 1+ \frac{3}{4n}+ O\left ( \frac{1}{n^2} \right )$ .
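To see how quickly this approximation becomes usable, the following Python sketch compares the exact Gamma-function expression of $g(n)$ with the first-order term $1 + \frac{3}{4(n-1)}$. It is a rough numerical check rather than a proof, and the names `g_exact` and `g_approx` are illustrative.

```python
import math

def g_exact(n: int) -> float:
    """sqrt(z) * Gamma(z - 1/2) / Gamma(z), with z = (n - 1) / 2 and n >= 3."""
    z = (n - 1) / 2
    return math.sqrt(z) * math.exp(math.lgamma(z - 0.5) - math.lgamma(z))

def g_approx(n: int) -> float:
    """First-order approximation 1 + 3 / (4 (n - 1))."""
    return 1 + 3 / (4 * (n - 1))

if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        exact, approx = g_exact(n), g_approx(n)
        print(f"n={n:4d}  exact={exact:.5f}  approx={approx:.5f}  "
              f"error={exact - approx:.5f}")
```

The error shrinks roughly like $1/n^2$, consistent with the $O\left ( \frac{1}{n^2} \right )$ remainder, so the exact expression matters mostly for small samples.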


By Michel Baudin • Data science • 7 • Tags: capability index, Cp, Cpk, Process capability, Quality

7 Comments

Joerg Muenzing
November 13, 2023 @ 10:09 pm

Thank you for this clear, convincing and thorough article on process capability. It fills many blank spots in the technical literature and blind spots in our quality concepts. Despite the limitations of these indices, they are often used in part specifications where customers require a certain capability to be sure they are getting good parts from a supplier. Reducing to a single number, regardless of the number of critical dimensions, is often required to communicate the end result (the “capability”) within a cross-functional team where not everyone understands it at a deeper level, e.g. a buyer wants to buy a part at a certain price and with a certain capability and not be bothered with hundreds of critical dimensions. Thinking about the limitations of such indices, maybe we should go back to specifying overall yield and scrap to express capability. This would be a big departure but also eliminate ambiguity…

Gary Cone
November 14, 2023 @ 2:32 pm

Tools usable and used. That is the only criterion to consider.

The author presents no alternatives that are superior.

Cp, Cpk, Pp, Ppm, and Cpm have all proved useful for decades. Properly trained personnel also graph their data for a sanity check.

Binning is also useful and used extensively. In fact, when processes are not seen as capable, it is an extremely valuable tool.

I always enjoy these theoretical diatribes that offer no solution superior to their subject.

Dr Muhammad Iqbal Hussain
November 22, 2023 @ 8:44 pm

Thanks so much for sharing the details on the process capability topic. Process capability provides a basic understanding for the production personnel of how the process behaves relative to the process specifications. What is value for the producer…

This gives an initial feel on how variability can affect the process and gives us some measurement metrics for quantifying that variability.

Such studies provide information on what the process could do under its best operating conditions, by making improvement to the process to aim for its desired target limits.

Michael Radeck
January 31, 2025 @ 1:15 pm

Thank you for this great article! I have just found (Google Books) the “Ordnance Inspection Handbook on Statistical Quality Control and Acceptance Sampling, ORD-M608-9”. It was released in March 1952. Section 8.4 has the header “Process Capability”. The definition is:
Process Capability <= Upper Spec. Limit (TU) – Lower Spec. Limit (TL)
or 6 R_bar/d2 <= TU – TL
All data used is based on control charts: The task of the target value orientation has been left to the control chart. The booklet is a gem and I enjoyed the journey back to the 50s. Great masters on whose legacy we build.

- Michel Baudin
  January 31, 2025 @ 1:18 pm
  
  Thanks for the reference.
