Who Uses Statistical Design Of Experiments In Manufacturing?

Next to SPC, Design of Experiments (DOE) is the most common topic in discussions of Statistical Quality. Outside of niches like semiconductors or pharmaceuticals, however, there is little evidence of use, particularly in production.

At many companies, management pays lip service to DOE and even pays for training in it: if you pursue continuous improvement, the thinking goes, you must “design experiments.”

In manufacturing, DOE is intended to help engineers improve processes and design products. It is a rich but stable body of knowledge.  The latest major innovation was Taguchi methods 40 years ago. Since then, Statistics has been subsumed under Data Science and new developments have shifted in emphasis from experimentation to Data Mining.

Experimentation in science and engineering predates DOE by centuries. Mastering DOE is a multi-year commitment that few manufacturing professionals have been willing to make. Furthermore, its effective use requires DOE know-how to be combined with domain knowledge.

Six Sigma originally attempted to train cadres of engineers called “Black Belts” in a subset of DOE. They then served as internal consultants to other engineers within electronics manufacturing. Six Sigma, however, soon lost this focus.

Design of Experiments (DOE) versus Data Mining

From a data science perspective, what sets DOE apart from Data Mining is that you choose the observed variables and collect data specifically for the purpose of solving a problem. Data Mining, by contrast, is the retrieval of information from data already collected for a different purpose. In DOE, you plan every step from data collection to the presentation of results. You decide which variables to collect, by what means, in what quantities, how you will analyze the results, what thresholds will make you conclude one way or the other, and how you will justify your conclusions to stakeholders. Data Mining is like Forrest Gump’s box of chocolates, “you never know what you’re gonna get.”

Japanese academic Ichiji Minagawa made the same distinction in slightly different words:

“Design of experiments is a method for finding an optimum value fast from the smallest sample. The point is that there is no data. This is a big difference from multivariate analysis. Multivariate analysis is a method of finding valuable information from a pile of data. Design of experiments, on the other hand, is a technique used when there is no data.”


Designed experiments yield high-quality data in quantities ranging from a few tens in small engineering problems to tens of thousands in the later phases of clinical trials for new drugs or A/B testing of web-page designs. Data mining is done on whatever data is already available. It can be millions of data points but their relevance and their quality need vetting.

Ronald Fisher was explicit in his focus on Small Data when he wrote:

“Only by tackling small sample problems on their merits […] did it seem possible to apply accurate tests to practical data.” Preface to Statistical Methods for Research Workers, 11th Edition (1950).

To draw inferences from small samples, you need thresholds to mark differences as “significant.” You test the observations against a null hypothesis, according to which there is a low probability that they should fall out of a given interval. You set this probability, or level of significance, arbitrarily at 5%, 1%, 0.3%, or 3.4 ppm. The word “significant” means nothing unless you spell out the level.

The complexity associated with significance, however, vanishes with Big Data. With large datasets, the smallest perceptible wiggle in a statistic passes any significance test with flying colors. If a correlation coefficient calculated on 80 points is 0.2, it is insignificant even at the 5% level. If calculated on 10,000 points, the same correlation coefficient of 0.2 is off-the-charts significant.
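This effect is easy to check numerically. The sketch below, assuming SciPy is available, computes the two-sided p-value of a correlation coefficient under the null hypothesis of zero true correlation, for the two sample sizes mentioned above:

```python
import math
from scipy.stats import t as t_dist

def corr_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r computed on n points,
    under the null hypothesis of zero true correlation."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return 2 * t_dist.sf(abs(t), df=n - 2)

print(corr_p_value(0.2, 80))      # about 0.075: not significant at the 5% level
print(corr_p_value(0.2, 10_000))  # vanishingly small: significant at any usual level
```

The same r = 0.2 fails a 5% test on 80 points and passes any test on 10,000.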


Until recently, technology limited opportunities. The most celebrated cases of early data mining include identifying the source of a cholera outbreak in London from locating patients and water pumps on a map in 1854, and estimating the production numbers of German tanks in World War II from the serial numbers of captured or destroyed tanks. Today, companies keep historical records and the web is a source of datasets to download or scrape.

To design experiments with paper, pencil, slide rules, and books of tables in the 1920s, Fisher used local people as human computers, like NASA’s Hidden Figures 30 years later. Today, any of us can do it with readily available software. The challenge has shifted from computing to understanding where the techniques apply and what the results say.


As a consequence, the relative importance of DOE and Data Mining has shifted. It can be ascertained by mining the web for data about publications in both fields. As seen in the following table, the trick is to phrase the query fairly. “Statistical Design of Experiments” is more restrictive than “Design of Experiments” but also more precise:

Source         | Design of Experiments | Statistical Design of Experiments | Data Mining
---------------|-----------------------|-----------------------------------|------------
Amazon Books   | >6,000                | 321                               | >20,000
Google Books   | 15.2M                 | 1.76M                             | 17.8M
Google Scholar | 6.55M                 | 4.85M                             | 3.75M


The contrast between the queries is greater on Amazon than on Google because Amazon catalogs what they are selling today while Google gives cumulative data. 321 titles on Amazon for “Statistical Design of Experiments” does not qualify the topic as active compared to Data Mining. The top DOE titles are mostly new editions or reprints of books that are decades old. For “Data Mining,” on the other hand, Amazon has more than 20,000 titles, including many published in the 2010s.

For a manufacturing professional in need of DOE, this means that it is, comparatively, a mature field. Its body of knowledge, while rich, does not change every 5 years.

The Development of DOE

DOE was developed in a context where experimentation already had a long history, and not for manufacturing in particular.

Experimentation in Science

As philosopher Michael Strevens writes about scientists, “if they are to participate in the scientific enterprise, they must uncover or generate new evidence to argue with.” If you are a paleontologist, you uncover evidence in fossils; if a physicist, you generate it by experiments.

Young Ronald Fisher

Experimentation is a key part of science but not all scientific experiments are statistically designed. Marie Curie and Nikola Tesla were successful experimenters before DOE existed. The first forms of DOE were invented in the 1920s by Ronald Fisher but, decades later, I met molecular biologists who had never heard of DOE and didn’t think they were missing out.

The biologists ran experiments based on their understanding of their science, and either the expected effect occurred or it didn’t. These experiments produced results that were binary and clear. Having run experiments for 450 years, physicists were not particularly receptive to statistical DOE.

DOE comes into play when results are not obvious but answers are needed. It originated in agricultural research, where one fertilizer may enhance crop yield by only 5% over another, but where such differences have tangible economic stakes. DOE is also used in social sciences and in marketing, where you don’t have the kind of mathematical models of physics.

Experimentation in Engineering

Experimentation is even more central to Engineering than to Science. This is because all engineers do is build artifacts, and experimentation is the only way to prove they work. The Wright Brothers ran experiments on lift in order to build an airplane that flew, not to understand fluid dynamics.

Engineers have to convince people other than their peers that their contraptions are safe and effective. For road vehicles, airplanes, rockets, or pharmaceuticals, market mechanisms and industry self-discipline have proven insufficient. Governments have had to step in and regulate.

In his summary of DOE for Manufacturing, Astakhov (2012) uses the following picture to explain the general context:

The Object of DOE in Engineering

DOE aims to correlate the outputs (y₁, …, yₘ) with inputs (x₁, …, xₙ) in the presence of noise (z₁, …, zₚ) and in the absence of a mathematical model of the System or Process. The experiment, or the sequence of experiments, lets you build an empirical one. It relates outputs to inputs without opening the black box. We still don’t know what happens inside but we have a model of its consequences.

In Astakhov’s discussion of DOE, the system is a black box: the experimenter has no knowledge of its inner workings. He makes it concrete with the following example:

Experimentation Based on Domain Knowledge Versus DOE

While the DOE model looks general, it doesn’t fit all the experiments scientists and engineers do. They often don’t start with a black box but with a generic model of the system and they experiment for the purpose of assigning specific values to model coefficients.

When the Wright brothers experimented with lift, they started from a model and used a wind tunnel to establish that it overestimated lift and measure more accurate values for a number of wing profiles.

The Wright Brothers Wind Tunnel

Their experiment didn’t fit the DOE model because there was no black box. Sometimes, you treat the system as a black box even when you have a model of what happens in it, because all you care about is the relationship of outputs to inputs. The certification of drugs is a case in point.

Manufacturing Applications of DOE

Information about the applications of DOE in Manufacturing is not as easy to find as one might expect. If, for example, you google “manufacturing + design-of-experiments”, you receive links to training courses and software packages but not to use cases, let alone ones compiled by individuals or organizations that, like this author, have no skin in this game.

Readers with personal experience are invited to share it in comments. The point is to establish how effective DOE has been in Manufacturing for those who used it, not to estimate their numbers. It’s about whether it works, not how popular it is.


In pharmaceuticals, every new drug must undergo clinical trials based on protocols, set by a government agency, that mandate specific forms of DOE. A chemist might have a better way but it doesn’t matter. A government agency needs to certify the drug before the company can sell it, and this process may use up 7 of the 20 years of the drug patent, after which competitors can produce generic versions. These agencies are the Food & Drug Administration (FDA) in the US and its counterparts in other countries.

This application is sufficiently important for a leading supplier of analytics software like SAS to have dedicated a product to it, JMP® Clinical.


In semiconductors, process engineers use DOE because they need it in process development, not because of any external mandate. They can use whatever method they want as long as it works. While a technical challenge, it is faster and easier to demonstrate process capability in, say, plasma etching, than to convince the FDA that a new pain reliever is safe and effective.

It was not used in the early days of the industry. Bob Noyce did not use DOE to develop the integrated circuit (IC) and neither did Andy Grove to develop the processes used to this day to make ICs in high volumes. DOE penetrated the industry slowly, due to the cultural gap between process engineers and statisticians. In 1986, Six Sigma, as discussed below, was originally intended to spread DOE knowledge among electronics engineers at Motorola. The Six Sigma literature didn’t say so but the content of Six Sigma training did.


In manufacturing industries other than pharmaceuticals and semiconductors, it is difficult to find published cases of DOE that are not from software vendors or academics. The Minitab website, for example, has a case study of a team from Ford working with a supplier to remove brush marks on carpets while maintaining plushness.

It was, no doubt, a technical challenge but not quite as central to the car business as securing the approval of a new drug is to pharmaceuticals or developing the operations to build ICs on silicon wafers is to semiconductors. The car industry equivalent would have been eliminating leaks from transmissions or getting the electronics to run the same way in Death Valley in July and Minnesota in January.

Writing in the Harvard Business Review in 1990, Genichi Taguchi cites work on drive shafts at Mazda but says nothing about Toyota.

The Special Case of Toyota

While Toyota is known for experimentation in all aspects of its operations, it does not share much about DOE with the rest of the world. Given that DOE has been around for decades, if they were extensively using it, we would know by now, at least from Toyota alumni.

Japanese Consultants About DOE At Toyota

In Japan, the Research Center on Management Technology (経営技術研究所) is a consulting firm based in Nagoya whose members, while not Toyota alumni, have written many books about TPS, including one in 2008 about the Keywords for thoroughly implementing Quality Control within TPS.

They devoted 3 out of 142 pages to DOE, with an example on spatter prevention in arc-welding, a process used for steel plates like axle housings. The experiment considers three factors — current, voltage, and waveform — and concludes that only current has an effect on the frequency of spattering. It is a simple case for DOE.

Toyota Alumni

Ian Low, an alumnus from Toyota in the UK, commented as follows on LinkedIn:

“My practical experience of this in Toyota was on honing of cylinder blocks for the sz (Yaris) engine. Setting up the honing heads was such a nightmare (utilizing air sizing) to first rough and then finish with the correct honing angle. We would quickly scrap 8-10 blocks every time we did a honing head change before we could get the process to stabilize on the +13 micron/-0 tolerance we needed.

Then along came the 2SZ variant (1.2 litre rather than 1.0 litre) and we had to reset the machine. I was convinced that following the lean principle of running in ratio (2:3) would totally sink our productivity as we’d be spending all our time changing heads and settings and making scrap.

And we did, for the first week. But the intense scrutiny the process came under drew us to look at all the tiny variables we’d never considered before and bring them into control. And I was proven wrong.

Things like ensuring honing fluid wasn’t getting into the air sizing system and resulting in faulty sizing readings. Things like very precisely controlling the pressure and cleanliness of the honing fluid. Basically it’s always a healthy exercise because you learn that things you assumed didn’t matter, actually did matter. I never experienced the use of any complex statistical approaches…”



Boeing Frontiers has an article from 2003 about DOE that refers to an Applied Statistics group in the Phantom Works Mathematics and Computing Technology. Boeing’s Phantom Works still exists in 2020 as part of the defense group. Their website, however, makes no reference to DOE, except for the 2003 article.

The Evolution of DOE

Statistical Design of Experiments, as we know it, started with Ronald Fisher in the 1920s and, in Manufacturing, the state of the art is still largely the work of Genichi Taguchi, known in the US since the 1980s.

Fisher’s DOE and Crop Yields

Rothamsted Dataset

Even though Fisher had been hired at the Rothamsted Agricultural Research Station for the task of mining a dataset accumulated over 70 years, he is best known as the father of Statistical DOE, based on experiments with crop yields.

When you treat the same crop with different fertilizers in different plots, you observe different yields. It doesn’t, however, prove that one fertilizer works better than another. You work with small datasets because collecting data takes years. The differences you observe may be fluctuations or may be due to other factors that agricultural science cannot explain. Fisher, who was not an agronomist but a mathematician, developed methods for assessing the reality of these differences.

In The Design of Experiments, Fisher (1935) presents his methods as generic and applicable to any experiment. Other than a tea tasting experiment, all the cases Fisher uses are about crop yields. About tea, the issue is whether one woman can tell from tasting whether the server added tea to milk or milk to tea. Chapter V is about assigning different fertilizer treatments to 36 plots in a 6×6 grid, using Fisher’s Latin Squares.
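In Fisher’s tea-tasting design, the taster was presented with eight cups, four of each kind, and asked to identify the four milk-first cups. A back-of-the-envelope check, in plain Python with no statistics library, shows why a perfect score is convincing:

```python
from math import comb

# Eight cups, four of each kind; the taster selects the four she believes
# are milk-first. Under the null hypothesis of random guessing, all C(8, 4)
# selections are equally likely, and only one of them is perfect.
p_all_correct = 1 / comb(8, 4)
print(p_all_correct)  # 1/70, about 1.4%: below a 5% significance level
```

A perfect score is therefore significant at the 5% level, though not at the 1% level, which is exactly the kind of threshold discussion Fisher’s methods formalize.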

The following picture shows an arrangement of plots used at Rothamsted for wheat growing experiments since 1843:

The lesson from his book is not that there is a one-size-fits-all methodology for DOE. Instead, you tailor it to the needs of each domain.

From Crop Yields to Web Page Designs

Today, Google uses the simplest of Fisher’s techniques: A/B Testing on web page details that may be as small as the color of a confirmation button. You choose users to receive one version of the page or the other as if they were different fertilizers, then harvest the resulting crops of clicks. You don’t rely on any science of color preferences. You assume that the color of the button makes no difference, and analyze the data for evidence to the contrary.
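A minimal sketch of such an A/B test, assuming SciPy and with made-up click counts, applies a chi-squared test to the 2×2 table of clicks by button color:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: 10,000 visitors saw each button color
clicks_a, visits_a = 320, 10_000
clicks_b, visits_b = 370, 10_000

# 2x2 contingency table: clicks and non-clicks for each version
table = [[clicks_a, visits_a - clicks_a],
         [clicks_b, visits_b - clicks_b]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"p-value = {p:.3f}")  # compare with the significance level you set beforehand
```

Whether the difference counts as “significant” still depends on the level you committed to before running the test.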

From One to Multiple Factors

These One-Factor-At-A-Time (OFAT) experiments are simple and easy to understand but often insufficient. You can use fertilizers in different doses, combine different ones, and irrigation patterns also play a role. In addition, crop size may not be the only output of interest. Quality matters too. And when there are interactions between factors, you cannot reach the best combination of levels by OFAT experiments. As data collection is time-consuming and expensive, you end up with small datasets for each combination of levels.


The issues that Fisher addressed with Statistical DOE are still with us. We still use many of the tools he developed, particularly analysis of variance (ANOVA). Scientists and engineers, however, have not massively embraced his DOE.

DOE and Manufacturing Quality

When Fisher was pursuing the highest crop yields, he only cared about aggregate quantities for plots. Variations between individual plants within plots were not a concern. It’s different in manufacturing, where the goal is to produce identical units of the same item.

Critical Characteristics

The classical vision is to attach critical characteristics to an item. These are attributes or measurements with tolerances, and you reject any unit that is missing an attribute or has a measurement outside its tolerance limits. Within this vision, the goal of process engineering and experimentation is for all units to pass.

It’s imperfect in that it is possible for a unit to have all the right characteristics at all operations and still not function properly at the end of the process. The designation of a characteristic as critical is a human judgment call. The engineers may not know which ones are truly critical, and may not have the means to observe them in-process without destroying the unit or slowing down production.

First Make It Precise, Then Accurate

Making a process capable means reducing the variability of its outputs. The idea is that, once you have the ability to precisely reproduce the output, you can adjust the aim to make it accurate. It is not always true, because the precision of the output may vary with the target you are aiming at.

Statistics for Experimenters

Statistics for Experimenters, by George E.P. Box, William G. Hunter, and J. Stuart Hunter, first published in 1978, shares the top position in the lists of Best Books about Design of Experiments with John Lawson’s 2014 Design and Analysis of Experiments with R.

Math Versus Code

Statistics for Experimenters gives you mathematical formulas and results for examples and has an appendix of statistical tables. Design and Analysis of Experiments with R gives you software you can run — that is, if you know R. Fisher worked with human computers; Box, Hunter, and Hunter, with mainframes and punch cards; Lawson, with a connected laptop.

Applications to Manufacturing

Statistics for Experimenters expands the scope of DOE from Fisher’s crop yields to all sorts of other domains, including manufacturing processes. When Fisher described the Latin Square design, sometimes awkwardly abbreviated as “LSD,” it took imagination to envision this design applied outside of fertilizer studies. Box, Hunter, and Hunter did it, applying it to fuel emissions from cars and yarn strength in synthetic fibers.

Their manufacturing examples, however, are, like Fisher’s, about maximizing yields, not reducing variability in the output. Example 7.1, for example, is about maximizing the production volume of penicillin, not its consistency. Variation is studied in Chapter 17, in 33 pages out of a 653-page book.

Respect For Domain Expertise

Box, Hunter & Hunter also show respect for domain expertise. In the very beginning, they warn experimenters “not to forget what you know about your subject-matter field! Statistical techniques are most effective when combined with subject-matter knowledge.” They allude to a chasm that remains to this day between data scientists and domain experts.

No Assumption Of Prior Knowledge of Statistics

They also assume a reader who is an experimenter but has no prior knowledge of probability theory or statistics. Given that the subject has since made its way into Middle School curricula, this should by now be a pessimistic assumption.

Screening and Response Surface Methodology (RSM)

Semiconductor process engineers who practice DOE distinguish between Screening experiments — intended to select which four or five parameters influence the outcome of a process — and Response Surface Methodology (RSM) — the search for the optimal combination of values for these parameters. Statistics for Experimenters covers RSM but only briefly mentions Screening, which is the subject of a separate 2006 book.


Between temperatures, pressures, flow rates, tool wear, humidity, etc., there may be tens of factors that can influence the outcome of an operation in a machine. Screening uncovers the handful that actually matters. Initially, we don’t know which factors they are.

It sounds like dimensionality reduction, commonly done by factor analysis, but this technique does not apply here because the factors it identifies are not variables you can control and observe but functions of several of these variables. You can directly program furnace temperatures and flow rates, but not a factor that is a combination of the two.

For a liquid flowing in a pipe, for example, the Reynolds number is a function of the velocity, pipe diameter, liquid density, and viscosity that is useful in determining whether the flow is laminar or turbulent. On the other hand, it’s neither a quantity you read from a sensor nor a control you set with a dial, which is what you screen for.

Screening from Historical Records

If you have historical records, screening is a data mining problem. Otherwise, you collect the data specifically for this purpose, and it’s DOE. In 2020, sensors, the Industrial Internet of Things (IIoT), and supervisory controllers (SCADA systems) have made it easier than ever, and the greatest challenge may be to exercise restraint. Just because you can collect 100 temperature readings per second does not mean you should. It’s prudent to collect more than you think you need, but not 1,000 times more.

If you have collected a large number of parameters, you can, for example, fit a linear regression model to predict the outcome from these parameters. Modern regression tools then give each parameter a significance star rating and you can take a first cut at screening by weeding out all the low star ratings. If there are still too many parameters, you can use stepwise regression or all subset regression to zoom in on a handful accounting for the bulk of the variability of the outcome. When you use this method, domain knowledge goes into the regression formula.

Screening Experiments

If you don’t have historical records, you conduct screening experiments on many factors, focusing exclusively on their individual contributions and ignoring interactions. To analyze interactions, you use RSM.

Dorian Shainin’s Red X

Dorian Shainin, a pioneer in the use of DOE in Manufacturing, advocated screening to identify the single Red X factor that accounts for most of the variability in the output. If it is a parameter you can control, like a furnace temperature, you reduce its variability by process engineering; if it’s not, like the composition of an ore or a clay dug out of the ground, you mitigate its variability by mixing materials from several lots or you compensate for it by tweaking the process.

In the Overview of the Shainin System, causes of variability are assumed to contribute additively to the variance of the output, which is true only if they are uncorrelated. You don’t need any such assumption when using regression as above.

Response Surface Methodology (RSM)

Statistics for Experimenters co-author George Box and K. B. Wilson coined the term Response Surface Methodology (RSM) in 1951. Amazon today offers 108 books on RSM, the most highly ranked being the 3rd edition, from 2016, of a textbook from 1995, which suggests it’s a stable, mature body of knowledge.

One Factor At A Time (OFAT)

The original approach was One-Factor-At-a-Time (OFAT). Starting from where you are, you hold all factors but one constant and find the optimum for this factor. Then hold it at this value and repeat the process for a second factor, and so on. Besides being time-consuming, this approach only finds a global optimum if the factors are uncorrelated.
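A toy illustration, with an invented response surface containing an interaction term, shows a single OFAT pass stopping short of the optimum:

```python
import numpy as np

# Invented response with an interaction between x and y; true maximum at x = y = 1
def response(x, y):
    return -(x + y - 2) ** 2 - 0.1 * (x - y) ** 2

grid = np.linspace(0, 2, 201)

# One OFAT pass: optimize x with y held at 0, then y with x held at its new value
x = grid[np.argmax(response(grid, 0.0))]
y = grid[np.argmax(response(x, grid))]

print(x, y, response(x, y))  # stops short of the optimum at (1, 1), where response = 0
```

Repeating the pass would eventually creep toward (1, 1), which is why OFAT is both slow and, for a fixed experimental budget, liable to stop at a suboptimal combination.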


With correlated factors, OFAT won’t take you to the optimum. You need to take into account their interactions and filter out the effect of noise. The issue of interactions between inputs is well illustrated by photographic cameras. The following picture compares control of a Digital Single-Lens Reflex (DSLR) with a 1950s vintage Rolleiflex:

In the DSLR, one control wheel allows you to tell the camera that you want to take sports pictures, a landscape, a portrait, or a close-up of a flower. The settings are in terms of the pictures you want to take. The controls on a 1952 Rolleiflex, on the other hand, are settings on features of the camera that have interactions:

  1. The photographer decides which kind of film to use. All the other settings depend on its sensitivity. The high-sensitivity film is easier to work with but the pictures are grainier.
  2. The smaller the aperture, the wider the range of distances — the depth of field — for which the picture is sharp but the longer the exposure needs to be in order to shine enough light on the film.

It is up to the photographer to choose the film sensitivity and set the aperture, exposure, and distance for the picture to come out right. All these inputs interact but in ways that the camera manufacturer understood well enough to provide settings guidelines on the back of the camera. You don’t need experiments but serious photographers do experiment to understand what their equipment can do, and experimenting with film involves the costs and delays of processing.

From Amateur Photography to Semiconductor Wafer Processing

Semiconductor manufacturers cycle silicon wafers through dozens of photolithography steps that have issues similar to those of film photography. They spin an emulsion of photoresist on a wafer, and expose it section by section through a mask to pattern it, using rays that, over decades, migrated from visible light to ultraviolet and now extreme ultraviolet (EUV). This process, like film photography, is subject to interacting controls of aperture and exposure time.

By opening up the aperture, you let more rays through, which reduces the exposure time and increases capacity. In doing so, however, you also reduce the depth of field and, with line widths on the order of 200 nm, the slightest warpage in the wafer can locally blur the pattern, and you need experiments.

Factorial Experiments

Astakhov illustrates the concept with a cake taste experiment having four input factors:

  1. Oven temperature
  2. Sugar quantity
  3. Flour quantity
  4. Number of eggs

With two settings for each, that’s 2^4 = 16 combinations of settings to bake cakes with, randomizing the sequence as needed to filter out the influence of factors that are not part of the experiment, like the chef or the egg carton.
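Generating such a design takes a few lines of code. The sketch below enumerates the 16 runs with the customary −1/+1 coding for the low and high settings:

```python
from itertools import product

factors = ["oven_temp", "sugar", "flour", "eggs"]

# Full factorial design: every combination of low (-1) and high (+1) settings
design = list(product([-1, +1], repeat=len(factors)))

print(len(design))  # 16 runs for a 2^4 full factorial
for run in design[:3]:
    print(dict(zip(factors, run)))
# In practice, you would randomize the run order before baking
```

Each tuple is one cake to bake; the randomization of the order is what filters out nuisance factors like the chef.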

Then you use the data to assess the influence of each factor and of interactions between factors. You can group the data by any combination of factors, compute Fisher’s ratio of the variance of the target output between and within groups and, if it is deemed significant, use the group means to estimate the output produced by this combination of factors.
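For a single factor, Fisher’s ratio is what a one-way analysis of variance computes. A sketch, assuming SciPy and made-up taste scores for the cakes grouped by oven temperature level:

```python
from scipy.stats import f_oneway

# Made-up taste scores for the 8 cakes baked at each oven temperature level
low_temp  = [6.1, 5.8, 6.3, 5.9, 6.0, 6.2, 5.7, 6.1]
high_temp = [7.2, 7.0, 6.8, 7.3, 7.1, 6.9, 7.4, 7.0]

# F is the ratio of between-group to within-group variance
f_stat, p = f_oneway(low_temp, high_temp)
print(f"F = {f_stat:.1f}, p = {p:.2g}")
```

A large F with a small p-value says the difference between temperature levels dwarfs the scatter within each level, so oven temperature matters for taste.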

Full factorial experiments

In a full factorial experiment, you generate data for all 16 combinations of settings and are able to analyze all the factors individually, in pairs, in triplets, and all four together. That’s all the non-empty subsets of the set of factors, of which there are 2^4 - 1 = 15.

While this is work, it is considerably less than running separate experiments for each factor individually and for each pair, triplet, etc.

The work of doing full factorial experiments rises quickly with the number of factors and the number of levels considered for each factor. If you increase to 5 factors with 3 settings for each, the number of combinations for which you need to generate data rises to 3^5 = 243, to evaluate the effect of 2^5 - 1 = 31 combinations of factors.

Partial factorial experiments

This is why you screen to limit the number of factors to consider. You also need to keep the number of levels or settings for each factor low. It is usually not enough and you also need to use partial factorial designs that include only some of the possible combinations of settings.

The idea is to take advantage of the usually decreasing effect of combinations of factors when you include more. Individual factors contribute more than pairs, which contribute more than triplets, etc. The two Oven temperature settings alone will account for more of the variance in the result than combinations of settings for Oven temperature, Sugar quantity, Flour quantity, and Number of eggs. You take advantage of this to generate partial factorial designs when adding new factors.

Taguchi Methods

Genichi Taguchi popularized his concepts in the US in the 1980s and they are still referenced as the state of the art for DOE applied to manufacturing quality. They are focused on the elimination of variability. Some of Taguchi’s concepts are straightforward, like the quadratic loss function, but others are dauntingly complex, which may explain why they are not more widely used.

Robust design

“Robust design” is an alternative and more descriptive name for Taguchi’s methods, used in particular by his associate at Bell Labs, Madhav Phadke. The key idea is to search for the combination of controls that is least sensitive to noise.

Firing ceramics in a kiln is notorious for introducing variability in product dimensions, often requiring the binning of finished goods. Phadke cites the case of a Japanese tile company that Taguchi worked with. The engineers established that the dimensional variations were related to temperature differences due to the positions of the tiles within kiln carts.

They first proposed modifying the kiln but then realized that the variability could be reduced simply by increasing the lime content of the clay. A change in the product design made the outcome less sensitive to temperature differences and therefore the process more robust.

Quadratic Loss Function versus Tolerances

The idea behind tolerances is that, if the critical characteristics of a product unit all fall anywhere within an interval, the unit works. If there are 45 critical characteristics, their tolerance intervals form a 45-dimensional hypercube. The actual space of critical characteristics for which the unit works can be of any shape, and the target is to make the tolerance cube small enough to fit within that shape, yet not so small that it is beyond the capability of the process.

If you apply this logic rigorously, your loss is 0 if the unit is anywhere inside the cube and constant outside of it, because you reject the unit anyway. It doesn’t matter whether you missed the interval by an inch or a mile.
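In code, this accept/reject logic is a simple membership test in the hypercube formed by the tolerance intervals. A minimal sketch, with hypothetical characteristic names and limits:

```python
def within_tolerance(values, tolerances):
    """Pass/fail check per the tolerance logic: a unit is accepted
    only if every critical characteristic falls inside its own
    interval. A single miss, by an inch or a mile, rejects it.
    values: dict mapping characteristic name -> measured value
    tolerances: dict mapping characteristic name -> (low, high)
    """
    return all(low <= values[name] <= high
               for name, (low, high) in tolerances.items())
```

With 45 critical characteristics, this is exactly membership in the 45-dimensional tolerance cube: the loss is treated as zero everywhere inside and constant everywhere outside.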

The Concept of Tolerances

According to Hounshell, the concept of tolerance was introduced in the late 19th century to resolve conflicts between production managers who insisted the output of their shops was close enough to target values and inspectors who rejected it for not being on target. It was a compromise meant to enable production to function.

Deviations from target often do not have the same consequences in both directions. If a rod is slightly too long, you can grind it down to the right length; if too short, you have to scrap it. In other words, the losses are higher on one side of the tolerance interval than the other. The machinists will aim for the top rather than the center of the interval, and the actual length distribution will be off-center.

The effect on the final product of this kind of adjustment over thousands of characteristics is difficult to predict. The safest conclusion to draw is that, on the one hand, having any characteristic outside its tolerance interval is enough to brand a unit as defective but, on the other hand, that having all characteristics within their tolerance intervals does not guarantee its quality. That’s why most manufactured products still undergo a final test at the end of their process.

Target Values Versus Tolerances

The target values are what the designers decided was best for the product, which production ignores when treating all the values in the tolerance interval equally. Taguchi’s loss function is 0 only at the target value, and then grows like the square of the distance to target. Experimenting to minimize it honors the designers’ choices. The following figure, from Madhav Phadke’s book, compares the Taguchi loss function with the traditional one based on tolerances.

This logic is relevant near the target value, inside the tolerance interval. Outside of it, you are back to a logic where your loss is the same whether you are close or far.

No loss within tolerance interval versus quadratic losses, per Phadke
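The contrast between the two loss functions can be sketched as follows, with the constant of the quadratic loss conventionally set so that the loss at a tolerance limit equals the cost of scrapping the unit; the names and numbers are illustrative assumptions, not from Phadke:

```python
def tolerance_loss(y, target, half_width, scrap_cost):
    """Traditional step loss: zero anywhere inside the tolerance
    interval [target - half_width, target + half_width], and a
    constant scrap cost outside it."""
    return 0.0 if abs(y - target) <= half_width else scrap_cost

def quadratic_loss(y, target, half_width, scrap_cost):
    """Taguchi loss L(y) = k * (y - target)^2, with k chosen so
    that the loss at the tolerance limits equals the scrap cost."""
    k = scrap_cost / half_width ** 2
    return k * (y - target) ** 2
```

The two functions agree at the target and at the tolerance limits. Just inside a limit, however, the step function reports no loss at all while the quadratic one already charges nearly the full scrap cost, which is what honors the designers' target value.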

Taguchi’s Justification for the Quadratic Loss Function

With his quadratic loss function, Taguchi shifted the emphasis from keeping characteristics within a tolerance interval to aiming for a target value. While actionable and useful, this concept is also arbitrary. Taguchi justified it in terms of two equally wobbly and unnecessary foundations: loss to society and cost of quality.

When working on manufacturing quality, you are focused on making products that match customer expectations. Whether failure to do so translates into a loss to society depends on what kind of product it is, as it could be cigarettes or a cure for lung cancer. In any case, the translation of this idea into numbers is anything but obvious. This notion has a Confucian ring to it that makes it unhelpful in societies that are not influenced by Confucianism.

Cost of quality, on the other hand, is usually defined as the sum of failure, appraisal, and repair costs. Accountants can provide numbers but they usually do not begin to address the business consequences of quality problems, which are predominantly about reputation.

Signal-to-Noise Ratio

Taguchi splits the inputs into Signal Factors and Control Factors. The Signal Factors are the settings a user applies to a product or a machine when running it. The Control Factors are parameters set at design time: the characteristics of the hardware and software the product is built from. The Noise Factors, as before, are outside the control of both the user and the designer.

The objective of a Taguchi experiment is to find the control factor settings that maximize a “signal-to-noise ratio,” defined for nominal-the-best problems as 10×log10(mean²/variance), the inverse of the squared coefficient of variation of the output, expressed in decibels. This idea has not been universally accepted by statisticians.
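For the nominal-the-best case, this ratio is straightforward to compute from repeated measurements of the output. A minimal sketch:

```python
import math

def sn_nominal_the_best(values):
    """Taguchi signal-to-noise ratio for a nominal-the-best
    response: 10 * log10(mean^2 / variance), in decibels.
    Maximizing it means minimizing the relative variability
    of the output around its mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 10 * math.log10(mean ** 2 / variance)
```

For example, outputs of 9, 10, and 11 have a mean of 10 and a sample variance of 1, giving an S/N ratio of 20 dB.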

Orthogonal Arrays

Taguchi developed a theory of orthogonal arrays for fractional designs, with tables of settings for series of experiments, interaction tables, and graphs of the factors and interactions analyzed in an experiment. As Taguchi did this work before becoming aware of Fisher’s theories, there are differences between his orthogonal arrays and factorial designs, as discussed by Scibilia (2017).
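The defining property of these arrays is balance: in any pair of columns, every combination of levels appears equally often. A minimal check of this property on Taguchi's smallest array, the L4, which accommodates up to three two-level factors in four runs:

```python
from itertools import combinations, product

# Taguchi's L4 orthogonal array, with levels coded 1 and 2
# as in his published tables.
L4 = [
    (1, 1, 1),
    (1, 2, 2),
    (2, 1, 2),
    (2, 2, 1),
]

def is_orthogonal(array):
    """True if, in every pair of columns, each of the level
    combinations (1,1), (1,2), (2,1), and (2,2) appears the
    same number of times."""
    for col_a, col_b in combinations(zip(*array), 2):
        pairs = list(zip(col_a, col_b))
        counts = [pairs.count(combo) for combo in product((1, 2), repeat=2)]
        if len(set(counts)) != 1:
            return False
    return True
```

Changing a single setting in the last run breaks the balance, which is what the check detects.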

The Original Six Sigma

While it’s not obvious from recent literature, DOE was, in fact, the core technical content of the original Six Sigma in the 1980s, before Six Sigma training dialed back to SPC.

Motivation at Motorola

Motorola then was a high-technology company where product and process development required extensive experimentation throughout product life cycles of a few short years. Experimentation was hobbled by the disconnect between engineers who understood device and process physics on the one hand and statisticians trained in DOE on the other.

The engineers didn’t see the value of learning DOE and the statisticians didn’t know enough about the technology to communicate it. The industry needed armies of people proficient in both disciplines and only had a handful.

The Black Belts

The idea of Six Sigma was to package a subset of DOE for massive use by engineers: give statistical training to 1% of the workforce and let them serve as a resource for the remaining 99%. The Black Belts were not expected to be PhD-level statisticians, but process engineers with just enough knowledge of modern statistics to be effective.

Besides sounding more assertive than “staff statistician” and making an imaginary connection with Japan, the Black Belt title also made sense because there is a parallel between Six Sigma and martial arts training.

Traditional masters in the martial arts of China trained one or two disciples at the Bruce Lee level in a lifetime, just as universities train only a handful of experts in statistical design of experiments who could be effective in electronics manufacturing. One Karate instructor, on the other hand, can train hundreds of Black Belts, just as a Six Sigma program could teach a subset of DOE to hordes of engineers.


Should Manufacturers Use DOE?

Except in a few niches, there is today little evidence that the managers of Manufacturing organizations think so. It doesn’t mean they are right. It does mean that DOE boosters have failed to make a compelling case that the investment in mastering this complex body of knowledge pays off. To the extent it does, the opportunity for manufacturers to gain a competitive advantage from it is still present.

For the Curious

There are many books and articles about experimentation and DOE, with overlapping content. They are listed here in reverse chronological order:

Software for DOE

It is almost a given that any engineer wanting to get started with DOE will reach for Excel first, and a Google search of “Excel + DOE” will provide links to templates, macros, add-ons, and video tutorials. The literature above, however, references specialized software products:

#doe, #experiment, #experimentaldesign, #fisher, #taguchi, #lean, #tps, #semiconductor, #pharmaceuticals