Process Capability Indices

Within the quality profession, a capable process is one with a high C_{p_k}. In the field, it is not quite so simple. My colleague Joerg Muenzing recently shared concerns about the process capability indices:

“Many manufacturers that I know struggle with incapable processes. Intellectually, people understand the concept of capability, but are unable to effectively apply it to an entire process. A single-figure measure for the entire chain would be ideal to better understand and manage it. The challenge is that the chain consists of processes with measurable characteristics, like thickness, substitute characteristics, like leak current to infer dryness, and also visual inspection results like blemishes or scratches.

They know that C_{p_k} < 1 is bad, and scrap shrinks the bottom line, but not much more. At the same time, customers like large Automotive OEMs demand from their suppliers a C_{p_k} > 1.33 for manual and “uncritical” characteristics, and C_{p_k} >1.66 or even >2.00 for critical characteristics.

What would be useful without being ‘too wrong’?”

Let’s take a closer look.

What is the capability of a process?

The capability of a manufacturing process is the degree to which it can consistently hold the critical dimensions of the product within tolerances.

The tolerances specify what a process should do; the process capability describes what it can do. As we shall see, the way to express this concept in numbers varies with the process at hand.

To support decisions in manufacturing operations, you measure process capability in terms of yield. First you define what “good” means for a unit of product, whether it means being in-spec for a set of measurements and attributes, being used for the first 90 days without complaint, having a low maintenance cost over 5 years, …

Then, the yield is the probability that a unit coming out of the process is good, and you estimate it by the relative frequency of good units in your output. Finally, you use this number to decide, for example, whether the yields of individual operations within your process are high enough to organize them in a flow line. If not, you can then work on the improvements needed to bring them to the necessary level.

But neither the C_p nor the C_{p_k} is a yield, and we need to look into what they are, what they mean, and how they relate to yields.

Process Capability in the Literature

The literature does not provide a crisp, concise, or consistent definition of process capability. First, let’s review a variety of sources, then let’s comment on them. 

Definitions in the literature

The following is not a complete list:

  • Per ChatGPT, “The process capability index (Cp) is a statistical measure that assesses the ability of a process to produce items within specified tolerance limits. It considers both the process variability and the width of the tolerance range. A higher Cp value indicates better capability.” It’s not bad for ChatGPT, but it is incomplete. A statistical measure is a summary of a data set, otherwise known as a statistic. Examples include the mean, standard deviation, median, mode, skewness, kurtosis, etc. The definition doesn’t say which one to use.
  • The term first appeared in the Western Electric Quality Control Handbook of 1956, where “The natural behavior of the process after unnatural disturbances are eliminated is called the ‘process capability.’” It did not introduce capability indices, and did not reference tolerances.
  • Per the ASQ glossary, “Process capability is a statistical measure of the inherent process variability of a given characteristic.” It makes no reference to tolerances but, in the next sentence, it lists two capability indices, C_p and C_{p_k}, that are based on tolerances and a specific probabilistic model of the characteristic.
  • In the “What is process capability?” section of its Engineering Statistics Handbook, NIST manages not to define it. The closest they come in their non-answer is the opening sentence: “Process capability compares the output of an in-control process to the specification limits by using capability indices.” This means that process capability does not exist without capability indices.
  • Juran, in discussing quality planning, explains the lack of process capability: “Many splendid designs fail because the process by which the physical product is created or the service is delivered is not capable of conforming to the design consistently time after time” (Juran’s Quality Handbook, 5th edition, p. 3.3). Later, on p. 22.11, Frank Gryna wrote “Process capability is the measured, inherent reproducibility of the product turned out by a process,” where inherent reproducibility refers to the product uniformity resulting from a process that is in a state of statistical control.
  • Shewhart does not use the term “process capability” but, in the foreword of Shewhart’s book, Deming says “A process has no measurable capability unless it is in statistical control.” Shewhart discusses the state of statistical control extensively but does not define it explicitly.
  • Per Douglas Montgomery, “Process capability refers to the uniformity of the process” (Statistical Quality Control, 2nd edition, p. 365). Yes, it refers to that but what is it?
  • Tom Pyzdek and Paul Keller go straight into a discussion of process capability analysis, which, they say, “provides an indication of whether a controlled process is capable of reliably meeting the customer requirements.” (The Handbook for Quality Management, 2nd Edition, p. 200). Yes, but what exactly is the result of this analysis?
Automatic Transmission Case from GM

Comments on the definitions

The ASQ definition is about a single characteristic. Most products have many, like the 2,250 critical dimensions of a car transmission case. Per the ASQ then, the transmission case also has 2,250 different process capabilities, but that’s not consistent with usage. We think of the process of making a given model of transmission case as having one single capability to hold the vector of 2,250 critical dimensions within a specified space.

Shewhart’s focus is the distinction between common and assignable causes of variability in quality characteristics. The common causes make the characteristic fluctuate around a fixed value. Whenever you execute the process, these causes are present, in the materials, the methods, the machines, and the people.

Assignable causes of variation are not always present and disrupt the process. In the absence of any, the process is deemed to be in a state of statistical control. It doesn’t mean it’s capable, because the range of the fluctuations may exceed the tolerance interval.

A First Look At The Capability Indices

The C_p and C_{p_k} capability indices are emphasized in many quality courses, and customers sometimes mandate minimum values. So what are they?

The capability indices are intended to be universal metrics of process capability. As described in the earlier post, C_p and C_{p_k} are supposed to characterize equally well the ability to cut steel rods or bake cakes within tolerances on length or sugar content.

The C_p and C_{p_k} only address one dimension. For multiple characteristics, you have a multivariate C_{p_k} called MC_{p_k} that accounts for correlations between characteristics within a tolerance hyperrectangle. For processes with a target value, in addition to a tolerance interval, Genichi Taguchi introduced a variant called C_{pm}.

Definitions of Multiple Capability Indices

C_p and C_{p_k} are functions of a single measured variable X of a product and of the tolerance interval [L,U] within which the product spec requires X to fall. The definitions of C_p and C_{p_k} require X to be a random variable with an expected value \mu and a standard deviation \sigma:

  • C_p\left ( X, L,U \right )= \frac{U-L}{6\sigma} measures the repeatability of the process but can be high even when the process is consistently off-spec.
  • C_{p_k}\left ( X, L,U \right )= min\left (\frac{U-\mu}{3\sigma}, \frac{\mu-L}{3\sigma} \right ) takes into account the position of the spec interval.

In addition, the multiples of \sigma used in the formulas are based on the further assumption that X is Gaussian.

In 2012, Santos-Fernandez and Scagliarini proposed a multivariate capability index MC_{p_k} but it’s not a simple combination of C_{p_{k}} for each dimension. The MC_{p_k} is discussed in my earlier post. It is not commonly used.

Taguchi added the concept of a target T for the variable X, which does not necessarily match its expected value \mu. You are not just shooting for X to fall anywhere within the tolerance interval \left [ L,U \right ]; you are going for a specific target T within this interval, and you incur losses that increase with the difference between X and T. In Taguchi’s model, the losses are quadratic – that is, proportional to (X-T)^2. Consistent with this approach, the index commonly attributed to Taguchi reduces the target-free C_{p_k} by dividing it with a factor that is a function of (\mu - T)^2 and reduces to 1 when \mu =T:

C_{p m}\left ( X,L,U,T \right )=\frac{C_{p_k}\left ( X,L,U \right )}{\sqrt{1+\left(\frac{\mu-T}{\sigma}\right)^2}},

Other than the desire to punish the process for being off target, I have not seen a theory of the C_{pm}.
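For readers who prefer code to formulas, here is a minimal Python sketch of these three definitions, assuming you already know the model’s \mu and \sigma; the function names and the numbers in the example are mine, not standard.

```python
# Minimal sketch of the three index definitions, for a model with known
# expected value mu and standard deviation sigma. Names are illustrative.
import math

def cp(mu, sigma, L, U):
    """Cp: compares the tolerance width to 6 sigma, ignoring centering."""
    return (U - L) / (6 * sigma)

def cpk(mu, sigma, L, U):
    """Cpk: distance from the mean to the nearest tolerance limit, in units of 3 sigma."""
    return min((U - mu) / (3 * sigma), (mu - L) / (3 * sigma))

def cpm(mu, sigma, L, U, T):
    """Cpm: Cpk reduced by a penalty that grows with the distance from the target T."""
    return cpk(mu, sigma, L, U) / math.sqrt(1 + ((mu - T) / sigma) ** 2)

if __name__ == "__main__":
    # A centered process with tolerance [9, 11], mu = 10, sigma = 0.25: Cp = Cpk = 1.33
    print(cp(10.0, 0.25, 9.0, 11.0), cpk(10.0, 0.25, 9.0, 11.0))
    # The same process aimed at a target of 10.2 is penalized by Cpm
    print(cpm(10.0, 0.25, 9.0, 11.0, 10.2))
```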

Origins of the Indices

R. M. Turunen and G.H. Watson (2021) trace the concept of process capability to the Western Electric Quality Control Handbook. In a 1956 conference paper to the Japanese Society for Quality Control (JSQC), M. Kato and T. Otsu proposed the C_p index as a metric of machine process capability. In a 1967 JSQC conference paper, T. Ishiyama then proposed the C_{p_k}, where the “k” stands for “katayori” (偏り), which is Japanese for deviation or bias.

According to Greg Watson, the papers were in the JUSE library archives that were since moved to a university, and “These authors were working in a factory and were attempting to create an index that could be used to identify the idealized production capability of their lines.”

This is where the trail ends for now. I could not find further information about M. Kato, T. Otsu, or T. Ishiyama, or their papers. The main professional society for quality in Japan is JUSE, not JSQC. The JSQC website says that it was founded in 1970, which explains the scarcity of information about earlier conferences. Over the decades since, however, the C_{p} and C_{p_k} from these obscure papers became part of the dogma taught in quality courses.

David Hutchins reports coming across the C_{p_k} in the 1960s as a production engineer in a company supplying Ford and not being impressed. He prefers expressing capabilities in terms of the Taguchi index. Gary Cone provided additional details on the origin of C_{pm}:

“There was great interest in Taguchi in the 80’s. A group of Motorola employees were introduced to Taguchi when I invited Tom Barker of Rochester Institute of Technology to teach his methods. We played around with his notions of target values and called the result C_{pt} (capability with respect to Taguchi). It was introduced to a slightly larger audience in 1988 through a paper in the Journal of Quality Technology and the three authors named it C_{pm}. The concept is very applicable in situations where an expensive thing to control, such as legal package weight in the food industry, is overcontrolled using P_{p_k} (C_{p_k} for long term data).”

Underlying Assumptions

Walter Shewhart was perhaps the first to use probability theory to model process characteristics. As he put it in Statistical Method from the Viewpoint of Quality Control, “The engineer must be concerned not only with the tolerance range but also with the probability associated with that range.”

Shewhart’s Ambition

Shewhart’s writings combine considerations about tolerances, part interchangeability, and common-cause versus special-cause variability, with technical specifics. Although he does not explicitly say so, several key points emerge:

  1. He aims to establish a general theory, applicable to quality characteristics in any industry.
  2. He wants to apply the contemporary state of the art in probability and statistics.
  3. He wants to set procedures that any technician could apply, with manual data collection, paper spreadsheets, graph paper, slide rules, and printed tables.

It’s been almost 100 years since Shewhart did his seminal work at Western Electric and, while his insights about probability were valid, much else has changed:

  1. The challenge of making processes capable. Today, techniques useful in the 1920s are no longer needed in mature industries and are insufficient in high technology.
  2. Probability theory itself. Shewhart was a thorough and rigorous thinker, but his discussions of “statistical universes” or “probability limits” are difficult to follow for anyone trained in probability theory in the past 50 years, as the concepts have advanced and the vocabulary to describe them has changed.
  3. Advances in Information Technology (IT) and Operational Technology (OT) have enabled the automatic collection, storage, retrieval, and analysis of datasets that are orders of magnitude larger than in Shewhart’s days.

Shewhart’s Model

Shewhart’s model of a measured characteristic in a state of statistical control is the sum of a target value and a white noise. He does not use these terms but, in today’s language, this is what he does. It means that the model assumes the differences between measurements and the target at different points in time to be independent and identically distributed.

In addition, for control charts, the thresholds he uses to raise alarms about disruptions from assignable causes are mathematically based on the Gaussian distribution, also known as “normal,” and arbitrarily set to a p-value of 0.3%. In other words, without an assignable cause, the probability that the statistic plotted on the chart is beyond the threshold in either direction is 0.3%. See Appendix 1 about why I am calling this distribution “Gaussian” rather than “normal” or “bell-shaped.”

The Statistical Process Control (SPC) literature skips the math and provides coefficients that technicians, or SPC software, are supposed to use in setting limits. If you work out the math, however, it becomes clear that these coefficients are based on the Gaussian distribution.
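As a quick sanity check of that 0.3% figure, the two-sided tail probability beyond ±3σ of a Gaussian is about 0.27%; here is a small Python check of my own, assuming scipy is available:

```python
# Quick check that +/-3 sigma limits on a Gaussian statistic correspond to the
# two-sided tail probability of about 0.3% (more precisely 0.27%) quoted above.
from scipy.stats import norm

two_sided_tail = 2 * norm.cdf(-3.0)  # P(|Z| > 3) for Z ~ N(0, 1)
print(f"P(beyond +/-3 sigma limits) = {two_sided_tail:.4%}")  # ~0.2700%
```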

Where Capability Indices Fail

Let’s take the example of manufactured goods with characteristics that are not Gaussian. To this day, there is a range of products made with processes that cannot control a critical dimension precisely and where you bin finished units into different grades or sizes based on one or more dimensions.

The first reaction to such a process is that it shouldn’t exist! It is not capable, and the first step should be to make it capable. In reality, however, it does exist, and it’s not a minor phenomenon. From the perspective of the recipient of its output, to the extent the binning step works, the units are all within tolerances, and the process is capable.

Gravettian Figurine from ~26,000 BCE

There are cases where we just don’t know how to avoid it yet. We have been making ceramics for ~30,000 years. Still, the firing process today introduces variations in dimensions that are too high for parts used in electronics assembly. To meet customer specs you must either cut the products after firing, which is slow and expensive, or bin the output of firing by size classes.


Other manufactured goods that you sort into bins after making them range from lead shot to microprocessors.

Lead Shot

Lead shot for hunting comes in a variety of diameters:

The different sizes, however, are not made by a size-specific process. Instead, they are the result of binning by diameter after a single process, done in a tower like this one:

The lead shot tower of Couëron, active 1878-1993

At the top of the tower, you pour molten lead onto a sieve, and the falling drops solidify on the way down. At the bottom, those not spherical enough don’t roll down a chute properly and return to the melt. Then you sort the remaining ones by diameter.

When you make different products by binning, a common process determines the breakdown by size. With lead shot, you can always recycle the overage if you have more of a given size than the market demands. You do not have that option if you bin socks by size.

Microprocessors

High-technology, ubiquitous products like microprocessors are the result of binning, as described in an interview by an Intel fellow:

Different units of the same microprocessor made in the same fabrication line and possibly on the same silicon wafer may be binned at different clock speeds, one being sold as a chip that works at 5.8 GHz, and the other one at 3 GHz. You first test each chip at the highest speed. If it doesn’t pass, you test it at a lower speed. If it doesn’t pass at any speed, it’s a reject. Otherwise, it goes into the bin for the highest speed it passes – that is, unless other considerations override the objective criterion.
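The binning logic itself is simple. Here is a hypothetical sketch of it; the speed grades and the test function are illustrative, not Intel’s actual values or process:

```python
# Hypothetical sketch of speed binning: test at the highest grade first,
# step down until the unit passes, reject it if it passes none.
from typing import Callable, Optional

SPEED_GRADES_GHZ = [5.8, 5.0, 4.2, 3.0]   # illustrative grades, not a real product line

def bin_chip(passes_at: Callable[[float], bool]) -> Optional[float]:
    """Return the highest speed grade the chip passes, or None if it is a reject."""
    for speed in SPEED_GRADES_GHZ:          # grades sorted from fastest to slowest
        if passes_at(speed):
            return speed
    return None

# Example: a chip that passes only at 4.2 GHz and below goes into the 4.2 GHz bin
print(bin_chip(lambda speed: speed <= 4.2))   # 4.2
```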

The fastest chips sell at a high premium, as long as they are in short supply, and the chip maker may be tempted to downgrade chips that work at the higher clock speed to maintain the scarcity of the fast chips.

In at least one case, this did not escape the attention of a reseller, who bought large volumes of “slow” chips, retested them, and relabeled the ones that passed the high-speed test. The supplier’s response was to engrave their pessimistic speed rating on the chips to make them more difficult to relabel.

Modeling Dimensions After Binning

The following picture shows a variable X with a distribution wider than the tolerance interval \left [ a,b \right ]. From the user’s perspective, this process is capable if you can filter out all units with x outside of \left [ a,b \right ].

In the example above, the distribution within \left [ a,b \right ] is flat enough to be approximated with the uniform distribution, for which

  • The mean is \mu = \frac{1}{2}\left ( a+b \right ),
  • The standard deviation is \sigma = \sqrt{\frac{1}{12}}\left ( b-a \right ),

and

C_p = C_{p_k} = \frac{b-a}{6\sigma} = \frac{1}{6\sqrt{\frac{1}{12}}} = \frac{1}{\sqrt{3}} = 0.58,

The process, therefore, has low capability indices in spite of being as capable as a binning process can be. There is no way to achieve high capability indices with this process.
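You can check this number with a short simulation, assuming the post-binning output is exactly uniform on [a,b]; this is my own sketch, with arbitrary values for a and b:

```python
# Simulation check: after binning, the measurements are roughly uniform on [a, b],
# and the resulting Cp = Cpk is about 1/sqrt(3) = 0.577, however good the sorting is.
import numpy as np

rng = np.random.default_rng(0)
a, b = 9.0, 11.0                       # tolerance interval, also the binning window
x = rng.uniform(a, b, size=100_000)    # units that survived the binning step

sigma_hat = x.std(ddof=1)
cp_hat = (b - a) / (6 * sigma_hat)
print(f"estimated Cp = {cp_hat:.3f}, theoretical 1/sqrt(3) = {1 / np.sqrt(3):.3f}")
```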

Estimation Issues

When you study the capability of a process, \mu and \sigma are unknown, and must be estimated from a sample \mathbf{x} = \left ( x_1,...x_n \right ) of measurements. You replace \mu and \sigma with their usual estimates:

  • m(\mathbf{x}) = \frac{1}{n}\sum_{i =1}^{n}x_i and
  • s(\mathbf{x}) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left [ x_i - m(\mathbf{x}) \right ]^2},

And you estimate C_p and C_{p_k} by:

  • \hat{C}_p\left ( \mathbf{x}, L,U \right )= \frac{U-L}{6\times s\left ( \mathbf{x} \right )} and
  • \hat{C}_{p_k}\left ( \mathbf{x}, L,U \right )= min\left (\frac{U-m\left ( \mathbf{x} \right )}{3\times s\left ( \mathbf{x} \right )}, \frac{m\left ( \mathbf{x} \right )-L}{3\times s\left ( \mathbf{x} \right )} \right ),
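In Python with numpy, these estimators look as follows. This is just a transcription of the formulas above, with ddof=1 giving the n-1 denominator; the sample data in the example is simulated and arbitrary:

```python
# Sketch of the usual estimators of Cp and Cpk from a sample x of measurements.
import numpy as np

def cp_cpk_hat(x, L, U):
    """Estimate (Cp, Cpk) from data, using the sample mean and the n-1 standard deviation."""
    x = np.asarray(x, dtype=float)
    m = x.mean()                 # m(x)
    s = x.std(ddof=1)            # s(x), with the 1/(n-1) denominator
    cp_hat = (U - L) / (6 * s)
    cpk_hat = min((U - m) / (3 * s), (m - L) / (3 * s))
    return cp_hat, cpk_hat

# Example on a small simulated sample of 30 measurements
rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=0.25, size=30)
print(cp_cpk_hat(sample, 9.0, 11.0))
```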

The ASQ’s Quality Resources article about the capability indices glosses over the fact that \hat{C}_p and \hat{C}_{p_k} are biased estimators of C_p and C_{p_k}. There is a rich and esoteric academic literature on correcting these biases, but the quality literature generally ignores them.

When X is Gaussian, the math shows that they systematically overestimate the indices, by amounts that are not negligible for small samples. If you work with samples of more than 1,000 points, you don’t need to worry about it; with 30 points, you do. You usually conduct process capability studies before ramping up production, on small samples.

Model Parameters Versus Estimates

The literature makes no distinction between model parameters and their estimates:

  • The C_p and C_{p_k} are functions of your model of the variable X and of the tolerance interval [L,U]. X stands for all the values the variable may take when you measure it.
  • The \hat{C}_p and \hat{C}_{p_k} estimate C_p and C_{p_k} from data. \mathbf{x} = \left ( x_1,...x_n \right ) is the sequence of numbers you measured.

Biases in Capability Indices

Statisticians ask questions about estimators, particularly whether they give the right answer on the average. s(\mathbf{x}) is a function of the data set \mathbf{x} = \left ( x_1,...x_n \right ). To answer the question, you view s(\mathbf{x}) itself as an instance of the random variable s(\mathbf{X}) where \mathbf{X} = \left ( X_1,...X_n \right ) is a sequence of independent and identically distributed random variables.

With a little algebra, you can easily verify that E\left ( s\left ( \mathbf{X} \right )^2\right )= \sigma^2. In other words, s^2 is an unbiased estimator of \sigma^2. This is true for any random variable that has a standard deviation. It does not have to be Gaussian.

On the other hand, E\left ( s\left ( \mathbf{X} \right )\right ) \neq \sigma. The theory of the \bar{X}-S control chart takes this bias into account through a correction factor called c_4, based on the assumption that the variable is Gaussian.
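For reference, here is a sketch of the c_4 factor, which is a ratio of \Gamma functions; the formula is the standard one from SPC references rather than something derived in this post:

```python
# Standard c4 bias-correction factor for the sample standard deviation of a
# Gaussian variable: E(s) = c4(n) * sigma. This is the usual SPC formula.
import numpy as np
from scipy.special import gammaln

def c4(n: int) -> float:
    return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))

print(round(c4(5), 4))   # 0.94, as in the usual X-bar/S chart tables
print(round(c4(30), 4))  # ~0.9914
```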

\hat{C}_p and \hat{C}_{p_k} have s(\mathbf{x}) in their denominators and, in general, E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right )\neq \frac{1}{\sigma}.

If the X_i are Gaussian variables with mean \mu and standard deviation \sigma, we have instead the more complicated formula:

E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right )= \frac{1}{\sigma}\times g\left ( n \right ),

where

g\left ( n \right ) = \frac{\sqrt{n-1}}{\sqrt{2}}\times \frac{\Gamma\left ( \left ( n-2 \right )/2 \right ) }{\Gamma\left ( \left ( n-1\right )/2 \right )} = 1+\frac{3}{4n} + O\left (\frac{1}{n^2} \right ),

For readers interested in the math of these formulas, their derivation is in Appendix 2.
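Here is a sketch that computes g(n) exactly from this formula, via the log-Gamma function to avoid overflow for large n, and compares it with the 1 + 3/(4n) approximation:

```python
# Exact bias factor g(n) = sigma * E(1/s) for Gaussian samples, compared with
# the 1 + 3/(4n) large-sample approximation.
import numpy as np
from scipy.special import gammaln

def g(n: int) -> float:
    return np.sqrt((n - 1) / 2.0) * np.exp(gammaln((n - 2) / 2.0) - gammaln((n - 1) / 2.0))

for n in (5, 10, 30, 100, 1000, 100_000):
    print(f"n = {n:>7}: exact bias = {g(n) - 1:+.4%}, approx 3/(4n) = {3 / (4 * n):+.4%}")
```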

How Large are the Biases?

For large n, the bias is approximately 3/(4n). This approximation is close for a sample size as small as n = 100, giving you 0.75% instead of 0.76%.

The following table shows the ratio by which E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right ) exceeds \frac{1}{\sigma} as a function of sample size:

Sample Size      Bias
5                +25.00%
10               +9.00%
30               +2.6%
100              +0.76%
1,000            +751 ppm
100,000          +8 ppm

This is for the Gaussian distribution. If your measured variable is Gaussian, \hat{C}_p and \hat{C}_{p_k} systematically overestimate C_p and C_{p_k}, by amounts that become negligible as the sample size increases. With X Gaussian, you could use the formula for g(n) to remove the bias. For other distributions, however, the formula does not apply.

The General Case

In the general case, Jensen’s inequality, applied to the convex function \phi(x) = \frac{1}{\sqrt{x}}, gives us:

E\left [ \frac{1}{s\left ( \mathbf{X} \right )} \right ] = E\left [ \frac{1}{\sqrt{s^2\left ( \mathbf{X} \right )}} \right ]\geq \frac{1}{\sqrt{E\left [s^2\left ( \mathbf{X} \right ) \right ]}}= \frac{1}{\sigma},

This means that, on average, the estimates \hat{C}_p and \hat{C}_{p_k} can only overestimate C_p and C_{p_k}, never underestimate them. The size of the bias needs to be calculated for the specific distribution of X.
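A quick simulation illustrates this for a non-Gaussian case. Here I use a uniform variable, which is my own example rather than one from the capability literature:

```python
# Illustration of Jensen's inequality: for samples of a (non-Gaussian) uniform
# variable, the average of 1/s exceeds 1/sigma, so Cp-hat is biased upward on average.
import numpy as np

rng = np.random.default_rng(2)
n, n_samples = 30, 200_000
sigma = 1.0 / np.sqrt(12.0)                      # standard deviation of Uniform(0, 1)

samples = rng.uniform(0.0, 1.0, size=(n_samples, n))
s = samples.std(ddof=1, axis=1)                  # one s(x) per sample of 30
print(f"mean of 1/s = {np.mean(1.0 / s):.4f}  vs  1/sigma = {1.0 / sigma:.4f}")
```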

Example from the ASQ

The ASQ’s Quality Resources website discusses sampling for a process capability study:

“For example, suppose you have a rotary tablet press that produces 30 tablets, one from each of 30 pockets per rotation. If you’re interested in tablet thickness, you might want to base your estimate of process capability on the standard deviation calculated from 30 consecutive tablets. Better yet, you might assure representation by taking those 30 consecutive tablets repeatedly over eight time periods spaced evenly throughout a production run. You would pool the eight individual standard deviations yielding a thickness capability estimate based on 8 \times (30 - 1) = 232 degrees of freedom.”

If you estimate the C_p and C_{p_k} of thickness based on a sample of 30 tablets, you will overestimate them by about 2.6%, and pooling 8 such samples only gives you that same 2.6% bias more consistently. If, on the other hand, you estimate your indices from a single sample of 8\times 30 = 240 points, you have a negligible bias of 0.3%.

Because the press produces 30 tablets in 30 pockets simultaneously, systematic variations between pockets interest process engineers. To study them, you must separate the tablets by pocket, which is challenging. If you throw all the tablets into a common bin, you lose this information:


To the customer, however, the only thing that matters is the consistency of thickness in the entire population of tablets regardless of which pockets on the machine they come from.

NIX, an injection molding company in Japan, developed this “octopus” to keep parts separated by die cavity in the blue bins at the bottom:

Translation to Yield

In the ASQ tablet example, let us assume that the thickness is centered on the middle of the tolerance interval, so that C_p= C_{p_k} = \frac{U-L}{6\sigma} and \mu = \frac{U+L}{2}. The yield of the process is

Y = \Phi\left ( \frac{U-\mu}{\sigma} \right ) - \Phi\left ( \frac{L-\mu}{\sigma} \right ) = \Phi\left ( \frac{U-L}{2\sigma} \right ) - \Phi\left ( -\frac{U-L }{2\sigma} \right ) = \Phi\left ( 3C_p \right ) - \Phi\left (-3C_p \right )

where \Phi is the cumulative distribution function for the Gaussian with zero mean and unit variance. The yield is only a function of C_p, which is exactly the point of using it.
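In code, translating C_p into yield for a centered Gaussian characteristic is a one-liner; this is a sketch assuming scipy:

```python
# Yield of a centered Gaussian characteristic as a function of Cp:
# Y = Phi(3*Cp) - Phi(-3*Cp).
from scipy.stats import norm

def yield_from_cp(cp: float) -> float:
    return norm.cdf(3 * cp) - norm.cdf(-3 * cp)

for cp in (1.0, 1.33, 1.5, 2.0):
    print(f"Cp = {cp:4.2f}: yield = {yield_from_cp(cp):.7f}")
```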

Yield for Many Characteristics

This is for 1 measured parameter. An automatic car transmission case has 2,250 critical dimensions. If they are all independent, centered, with the same C_p, then they all have equal yields, and their joint yield is Y^{2250}. As C_p varies from 1 to 2, the yield for the transmission case varies as follows:


In this simplified but not unrealistic case, any C_p \leq 1.5 fails to give you a capable process. In diecasting, it’s unlikely that all dimensions would have the same C_p. On the other hand, if you are assembling thousands of purchased components and require only a C_p = 1 from suppliers, you are unlikely to get anything better, and your assembly process will have a yield near 0.

A C_p = 1.5 says that, for any given dimension X among the 2,250 of the transmission case, the tolerance interval \left [ L,U \right ] matches the \pm 4.5\sigma interval of the dimension’s distribution. Numerically, this translates to

P\left ( X < L \right ) = P\left ( X > U \right ) = 3.4\times 10^{-6},

Based on the strange, arbitrary, and confusing theory of the 1.5 Sigma shift, the “Six Sigma” literature calls this the “Six Sigma level.” When a product has 2,250 dimensions, each at this level, the probability that they will all be within tolerances is only 98.5%, which is unacceptably low for automotive parts.

At C_p = 2, \left [ L,U \right ] matches the actual \pm 6\sigma interval of the dimension’s distribution. Then

P\left ( X < L \right ) = P\left ( X > U \right ) = 9.9 \times 10^{-10},

This tail probability is 3,445 times smaller than for C_p = 1.5, and the probability that all dimensions are within tolerances rises to 99.9996%.
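The following sketch reproduces these numbers for 2,250 independent, centered dimensions sharing the same C_p:

```python
# Joint yield of 2,250 independent, centered Gaussian dimensions, each with the
# same Cp, reproducing the tail probabilities and joint yields discussed above.
from scipy.stats import norm

N_DIMENSIONS = 2250

for cp in (1.5, 2.0):
    tail = norm.sf(3 * cp)                 # P(X > U) = P(X < L) for a centered process
    yield_one = 1.0 - 2.0 * tail           # yield of a single dimension
    yield_all = yield_one ** N_DIMENSIONS  # probability that all 2,250 are in spec
    print(f"Cp = {cp}: tail = {tail:.2e}, joint yield = {yield_all:.4%}")
```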

Does the Bias Make a Difference?

Again, let’s look at what happens when we use estimates. Equating \hat{C}_p with C_p, we estimate the yield as

\hat{Y} = \Phi\left ( 3\hat{C}_p \right ) - \Phi\left (-3\hat{C}_p \right )

Does this add more bias? In other words, how does the expected value of the estimate E\left (\hat{Y} \right ) = E\left [\Phi\left ( 3\hat{C}_p \right ) \right ] - E\left [\Phi\left (-3\hat{C}_p \right ) \right ] compare with the theoretical value Y = \Phi\left ( 3C_p \right ) - \Phi\left (-3C_p \right ) ?

The easiest way to estimate this is through a simulation. You generate 30 million independent instances of a Gaussian variable with 0 mean and unit variance, and group them into 1 million samples of 30 each. You estimate the standard deviation within each sample, and then take the average over the 1 million samples:

\overline{\hat{Y}} =\overline{\Phi\left ( 3\hat{C}_p \right ) - \Phi\left (-3\hat{C}_p \right )},
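Here is a sketch of that simulation. To keep the run short, it uses 100,000 samples of 30 instead of the 1 million described above:

```python
# Monte Carlo estimate of E(Y-hat) for samples of size 30, compared with the
# theoretical yield Y = Phi(3*Cp) - Phi(-3*Cp). Scaled-down version of the
# simulation described above (100,000 samples instead of 1 million).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, n_samples = 30, 100_000
samples = rng.normal(0.0, 1.0, size=(n_samples, n))
s = samples.std(ddof=1, axis=1)            # one standard-deviation estimate per sample

for cp in (1.0, 1.5, 2.0):
    # With sigma = 1, the tolerance interval is [-3*Cp, +3*Cp], so Cp-hat = Cp / s
    cp_hat = cp / s
    y_hat = norm.cdf(3 * cp_hat) - norm.cdf(-3 * cp_hat)   # estimated yields
    y_true = norm.cdf(3 * cp) - norm.cdf(-3 * cp)
    print(f"Cp = {cp}: mean(Y-hat) = {y_hat.mean():.6f}, theoretical Y = {y_true:.6f}")
```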

Let’s apply this to the transmission case that has 2,250 critical dimensions. If they are all independent, centered, with the same C_p, then they all have an equal probability Y of falling within their tolerance intervals, and the probability that all of them do is Y^{2250}.

Let’s look at the effect of the bias as C_p varies from 1 to 2:

Let us assume that we use a sample size of 100 instead of the 30 in the ASQ example. With the same simulated data, we can form 300,000 samples of 100 points each, and recalculate estimates. As shown in the following picture, we get closer to the theoretical value:

Unsurprisingly, larger samples give you better results. Unfortunately, process capability studies are part of new product introduction, when large samples are unavailable.

Conclusions

Originating in obscure papers in Japan in 1956 and 1967, the C_p and C_{p_k} have upstaged yields not only in the literature on process capability but also, sometimes, in the expression of requirements from customers to suppliers.

As defined, the C_p and C_{p_k} are applicable only to measured variables that follow a Gaussian distribution, a requirement that is far from universally met. The formulas commonly used to estimate C_p and C_{p_k} from data are biased in ways that are not negligible for the small datasets used in capability studies, and massive when C_p and C_{p_k} estimates are used to infer yields.

The math of C_p and C_{p_k} is flawed, more complicated than that of yields, and less generally applicable. It’s not clear that they present any advantage.


Appendix 1: Why “Gaussian”

Many call the Gaussian distribution “normal” or “bell-shaped.” I avoid both terms because normal suggests that all other distributions are somehow abnormal, and bell-shaped is imprecise, as many other distributions are also bell-shaped.

The “Gaussian” label is also commonly used and honors C.F. Gauss, whom Germany recognized on the 10-DM bill in the 1990s for his contributions to the theory of this distribution:

Appendix 2: The Bias In Cp and Cpk

This is beyond high-school math. It uses the \Gamma function that Euler defined to extend n! to real and complex arguments, and O\left (\frac{1}{n^2}\right ) means “a quantity that vanishes like \frac{1}{n^2} as n increases.” The bias factor g(n) is the result of a long calculation but matters only for small samples, say n \leq 100. It can then be used to correct the bias when estimating C_p from small samples.

Distribution of the Variance Estimator

Per Cochran’s theorem, with Gaussian X_i, the scaled sum of squares \frac{\left ( n-1 \right )s\left ( \mathbf{X} \right )^2}{\sigma^2} follows a \chi^2 distribution with n-1 degrees of freedom, whose density involves Euler’s \Gamma function. The probability density function of a \chi^2 with k degrees of freedom is

f\left ( x,k \right )= \frac{1}{2^{k/2}\Gamma\left ( k/2 \right )}x^{k/2-1}e^{-x/2},

If you are comfortable with this formula, skip to the next section. If you want to see how it is derived from the Gaussian, read on.

A \chi^2 with k degrees of freedom is the sum of the squares of k independent Gaussians N(0,1), and the joint density of these Gaussians is constant over the sphere r^2 = x^2_1 +\dots+x^2_k.

The area A_{k-1} of the unit sphere S_{k-1} in \mathbb{R}^k is \frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )}, and, for the sphere of radius r, it is

A_{k-1}(r)=r^{k-1}\frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )},

If we change the variable to u = r^2, then dr = \frac{1}{2\sqrt{u}}du and the density of a \chi^2 with k degrees of freedom becomes

f\left ( u,k \right )= A_{k-1}(\sqrt{u})\times \phi(u)\times\frac{1}{2\sqrt{u}},

where \phi(u) is the value of the multivariate Gaussian density at any point on the sphere of radius \sqrt{u}:

\phi(u) =\prod_{i =1}^{k}\frac{1}{\sqrt{2\pi}}e^{-x^2_i/2} = \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-r^2/2} = \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-u/2},

Therefore

f(u,k)=u^{(k-1)/2}\frac{2\pi^{k/2}}{\Gamma\left ( k/2 \right )}\times \frac{1}{\left ( 2\pi \right )^{k/2}}e^{-u/2}\times \frac{1}{2\sqrt{u}},

which simplifies to

f(u,k)=\frac{1}{2^{k/2}\Gamma\left ( k/2 \right )}u^{(k/2-1)}e^{-u/2},
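As a sanity check, this density matches scipy’s chi-square implementation; this verification is my own and not part of the original derivation:

```python
# Sanity check: the chi-square density derived above matches scipy's chi2 distribution.
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

def f(u, k):
    """Chi-square density with k degrees of freedom, as derived above."""
    return np.exp(-gammaln(k / 2) - (k / 2) * np.log(2) + (k / 2 - 1) * np.log(u) - u / 2)

u = np.linspace(0.5, 20.0, 40)
print(np.allclose(f(u, 7), chi2.pdf(u, df=7)))   # True
```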

Expected Value of the Inverse of the Standard Deviation Estimate

u =\frac{(n-1)s^2}{\sigma^2} is a \chi^2 variable with n-1 degrees of freedom. The expected value of the inverse square root of u is therefore

h\left ( n \right )= \int_{0}^{+\infty}\frac{1}{\sqrt{u}}f(u,n-1)du=\frac{1}{2^{(n-1)/2}\Gamma\left ( (n-1)/2 \right )}\int_{0}^{+\infty}u^{((n-2)/2-1)}e^{-u/2}du,

By a change of variable to v = u/2, the integral becomes

\int_{0}^{+\infty}u^{(n-2)/2-1}e^{-u/2}du = 2^{(n-2)/2} \times \int_{0}^{+\infty}v^{(n-2)/2-1}e^{-v}dv,

or

\int_{0}^{+\infty}u^{(n-2)/2-1}e^{-u/2}du = 2^{(n-2)/2} \times \Gamma((n-2)/2),

and

h\left ( n \right ) = \frac{\Gamma((n-2)/2)}{\sqrt{2}\Gamma\left ( \left ( n-1 \right )/2 \right )},

Since \frac{1}{s\left ( \mathbf{X} \right )} = \frac{\sqrt{n-1}}{\sigma\sqrt{u}}, it follows that E\left ( \frac{1}{s\left ( \mathbf{X} \right )} \right ) = \frac{\sqrt{n-1}}{\sigma}\,h\left ( n \right ) = \frac{g\left ( n \right )}{\sigma}, with

g\left ( n \right ) = \frac{\sqrt{n-1}}{\sqrt{2}}\times \frac{\Gamma((n-2)/2)}{\Gamma\left ( \left ( n-1 \right )/2 \right )} = \sqrt{ \frac{n-1}{2}}\times \frac{\Gamma((n-1)/2 - 1/2)}{\Gamma\left ( \left ( n-1 \right )/2 \right )}

Approximation for Large Samples

To approximate this for large n, let’s express g(n) as a function of z = \frac{n-1}{2}, so that

g(n) = \sqrt{z}\times \frac{\Gamma(z - 1/2)}{\Gamma(z)},

Stirling’s series for large z gives us

\frac{\Gamma\left ( z + \alpha \right )}{\Gamma\left ( z + \beta \right )}= z^{\alpha -\beta}\times\left [ 1 + \frac{\left ( \alpha -\beta \right )\left ( \alpha + \beta -1 \right )}{2z}+ O\left ( \frac{1}{z^2} \right ) \right ],

If we plug in \alpha = -1/2 and \beta = 0, we get

g(n)= \sqrt{z}\times\frac{\Gamma\left ( z -1/2 \right )}{\Gamma\left ( z \right )}= 1 + \frac{3}{8z}+ O\left ( \frac{1}{z^2} \right ),

and, translating back in terms of n,

g(n)= 1+ \frac{3}{4\left ( n-1 \right )}+ O\left ( \frac{1}{n^2} \right ) = 1+ \frac{3}{4n}+ O\left ( \frac{1}{n^2} \right ),

#processcapability, #capabilityindex, #cp, #cpk, #quality