# Variability, Randomness, And Uncertainty in Operations

This post elaborates on the topics of randomness versus uncertainty that I briefly touched on in a prior post. Always skittish about using dreaded words like “probability” or “randomness,” writers on manufacturing or service operations, even Deming, prefer “variability” or “variation” for the way both demand and performance change over time, but these terms do not mean the same thing. For example, a hotel room that goes for $100/night from November through March and $200/night from April through October has a price that is variable but not random. The rates are published, and you know them ahead of time.

By contrast, to a passenger, the airfare from San Francisco to Chicago is not only variable but random. The airlines change tens of thousands of fares every day in ways you discover when you book a flight. Based on having flown this route four times in the past 12 months, however, you expect the fare to be in the range of $400 to $800, with $600 as the most likely value. The information you have is not complete enough for you to know what the price will be, but it does enable you to have a confidence interval for it.

Beyond randomness, events like the 9/11 attack in 2001, the financial crisis in 2008, the Fukushima earthquake in 2011, or a toy that is a sudden hit for the Christmas season create uncertainty, a higher level of variability than randomness. Such large-scale, unprecedented events give you no basis to say, on 9/12/2001, when airliners would fly again, in 2008 how low the stock market would go, in 2011 when factories in northeastern Japan would restart, or how many units of the popular toy you should make.

In manufacturing, you encounter all three types of variability, each requiring different management approaches. In production planning, for example:

1. When the volume and mix of products to manufacture are known far in advance relative to your production lead time, you have a low-volume/high-mix but deterministic demand. The demand for commercial aircraft is known 18 months ahead of delivery. If you supply a variety of components to this industry that you can buy parts for, build, and ship within 6 weeks, you still have to plan and schedule production, but your planners don’t need to worry about randomness or uncertainty.
2. When volume and mix fluctuate around constant levels or a predictable trend, you have a random demand. The amplitude of fluctuations in aggregate volume is smaller than for individual products. In this context, you can use many tools. You can, for example, manage a mixed-flow assembly line by operating it at a fixed takt time, revised periodically, using overtime to absorb fluctuations in aggregate volumes, heijunka to sequence the products within a shift, and kanbans to regulate the flow of routinely used components to the line.
3. As recent history shows, uncertain events occur that can double or halve your demand overnight. No business organization can have planned responses to all emergencies, but it must be prepared to respond when it needs to. The resources needed in an emergency, which must be nurtured in normal times, include a multi-skilled, loyal, and motivated workforce, as well as a collaborative supply chain. In many cases, you have to improvise a response; in some, vigilance can help you mitigate the impact of the event. Warned by weather data, Toyota’s logistics group in Chicago anticipated the Mississippi flood of 1993. They were shipping parts by intermodal trains to the NUMMI plant in California and, two days before the flood covered the tracks, they reserved all the available trucking in the area, which cost them daily the equivalent of 6 minutes of production at NUMMI. They were then able to reroute the shipments south of the flooded area.

The distinction between random and uncertain is related to the distinction between common and special causes introduced by Shewhart and Deming in the narrower context of quality control. In Deming’s red bead game, operators plunge a paddle into a bowl containing both white and red beads, with the goal of retrieving a set of white beads only, and most paddle loads are defective. The problem has a common cause: the production system upstream from this operation is incapable of producing bowls without red beads. In Deming’s experiment, the managers assume it has a special cause: the operator is sloppy. They first try to motivate the operator with slogans, then discipline, and eventually fire him.
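The takt-time arithmetic behind this kind of line management can be sketched in a few lines. This is only an illustration: the shift length, break times, and demand figures below are assumptions, not data from any real plant.

```python
# Illustrative takt-time calculation for a mixed-flow line (all figures assumed).
shift_seconds = 8 * 3600 - 2 * 15 * 60   # 8-hour shift minus two 15-minute breaks
daily_demand = 450                        # units/day, aggregated over the product mix

takt = shift_seconds / daily_demand       # seconds available per unit
print(f"Takt time: {takt:.1f} s/unit")

# If aggregate demand rises 10%, the same takt tells you how much overtime
# is needed to keep the line at its current pace instead of re-balancing it.
new_demand = int(daily_demand * 1.10)
overtime = new_demand * takt - shift_seconds
print(f"Overtime at unchanged pace: {overtime / 60:.0f} min/day")
```

Revising the takt only periodically, and absorbing day-to-day fluctuations with overtime, is what keeps the line balance stable between revisions.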
The proper response would have been (1) as an immediate countermeasure, filtering out the red beads before the operation and (2) for a permanent solution, working with the source to improve the process so that it provides batches with all white beads every time. The imprecision, or randomness, of the process is summarized in terms of its capability, which sets limits on observable parameters of outgoing units. Observations outside of these limits indicate that, due to a special cause yet to be identified, the capability model no longer matches reality. In the other cases discussed above, the cause is known: you felt the earthquake, or you heard on the news that war broke out… The only challenge you are facing is deciding how to respond.

Deming made “knowledge of variation” one of the pillars of his “system of profound knowledge.” One key part of this knowledge is recognition of the different types of variability described above and mastery of the tools available to deal with each.

# If Talk Of Probability Makes Your Eyes Glaze Over…

Few terms cause manufacturing professionals’ eyes to glaze over like “probability.” They perceive it as a complicated theory without much relevance to their work. It is nowhere to be found in the Japanese literature on production systems and supply chains, or in the American literature on Lean. Among influential American thinkers on manufacturing, Deming was the only one to focus on it, albeit implicitly, when he made “Knowledge of Variation” one of the four components of his System of Profound Knowledge (SoPK).

# Standardization Doesn’t Stamp Out Creativity | The Deming Institute Blog | John Hunter

“[…] One of the things I find annoying, in this way, is that reducing variation and using standardization is said to mean everyone has to be the same and creativity is stamped out. This is not what Dr. Deming said at all.
And the claim makes no sense when you look at how much emphasis he put on joy in work and the importance of using everyone’s creativity. Yet I hear it over and over, decade after decade.”

Sourced through Scoop.it from: blog.deming.org

Michel Baudin‘s comments:

Yes, the metric system did not stifle anybody’s creativity. By making commerce, engineering, and science easier, it actually helped creative people innovate, invent, and discover. But when Deming says “Standardization does not mean that we all wear the same color and weave of cloth, eat standard sandwiches, or live in standard rooms with standard furnishings,” he seems to exclude the possibility that standardization could be abused.

# Dr. Deming: ‘Management Today Does Not Know What Its Job Is’ (Part 2) | Quality content from IndustryWeek

“The usual procedure is that when anything happens, [we] suppose that somebody did it. Who did it? Pin a necklace on him. He’s our culprit. He’s the one who did it. That’s wrong, entirely wrong. Chances are good, almost overwhelming, that what happened, happened as a consequence of the system that he works in, not from his own efforts. In other words, performance cannot be measured. You only measure the combined effect of the system and his efforts. You cannot untangle the two. It is very important, I believe, that performance cannot be measured.”

Source: www.industryweek.com

See on Scoop.it – lean manufacturing

# Dr. Deming: ‘Management Today Does Not Know What Its Job Is’ (Part 1) | IndustryWeek

[…] “Management today does not know what its job is. In other words, [managers] don’t understand their responsibilities. They don’t know the potential of their positions. And if they did, they don’t have the required knowledge or abilities.
There’s no substitute for knowledge.”

Source: www.industryweek.com

See on Scoop.it – lean manufacturing

# Update on The Deming Legacy: Free Sample Available on Leanpub

To download the free sample, please click The Deming Legacy, and then on Sample.PDF. The book is intended to entertain while providing food for thought during a two- to three-hour flight on a business trip. The first three chapters are now available as free downloads. In exchange, I would appreciate the following feedback:

1. The value of this book for you.
2. Questions you would like answered about the topics.
3. Any comments or suggestions.

The complete table of contents is as follows:

• Rereading Deming’s 14 Points
• Point 1: Create constancy of purpose
• Point 2: Adopt the new philosophy…
• Point 3: Cease dependence on inspection for quality
• Point 4: Stop awarding business based on price tag
• Point 5: Improve the system constantly and forever
• Point 6: Institute training on the job
• Point 7: Institute leadership
• Point 8: Drive out fear
• Point 9: Break down barriers between departments
• Point 10: Eliminate slogans and exhortations
• Point 11: Eliminate numerical quotas and goals
  • 11.a: Eliminate work standards for workers
  • 11.b: Eliminate numerical goals for managers
• Point 12: Remove barriers to pride of workmanship
  • 12.a: Remove barriers that rob the hourly worker of his right to pride of workmanship
  • 12.b: Remove barriers that rob people in management and in engineering of their right to pride of workmanship
• Point 13: Institute education and self-improvement
• Point 14: Put everybody to work on the transformation
• Other enduring concepts from Deming
  • Common causes versus special causes
  • Deming’s “System of Profound Knowledge”

The plan is for the book to be about 100 pages. Most of the material is already written, with drafts posted on this blog, but it still needs to be edited and organized as a book.
If you are interested, I would also like to know what price you would be willing to pay for it in electronic or hardcopy form.

# A Deming Interview from 1984

In a discussion in the TPS Principles and Practice discussion group on LinkedIn, Hein Winkelaar pointed out a video of an interview with W. Edwards Deming taped in 1984. For more details, you can also read about Deming’s list of “5 Deadly Diseases” in Chapter 3 of Out of the Crisis, pp. 97-148.

# Averages in Manufacturing Data

The first question we usually ask about lead times, inventory levels, critical dimensions, defective rates, or any other quantity that varies is what it is “on the average.” The second question is how much it varies, but we only ask it if we get a satisfactory answer to the first one, and we rarely do. When asked for a lead time, people usually give answers that are either evasive, like “It depends,” or weasel-worded, like “Typically, three weeks.” The beauty of a “typical value” is that no such technical term exists in data mining, statistics, or probability, and therefore the assertion that it is “three weeks” is immune to any confrontation with data. If the assertion had been that it was a mean or a median, you could have tested it, but, with “typical value,” you can’t.

For example, if the person had said “The median is three weeks,” it would have had the precise meaning that 50% of the orders are delivered in less than 3 weeks, and that 50% take longer. If the 3-week figure is true, then the probability of the next 20 orders all taking longer is $0.5^{20}\approx 9.5\times 10^{-7}$, or about one in a million. This means that, if you do observe a run of 20 orders with lead times above 3 weeks, you know the answer was wrong.
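This arithmetic is easy to verify. A minimal check, using only the definition of the median and the assumption that orders are independent:

```python
# If 3 weeks really is the median lead time, each order independently has a
# 50% chance of taking longer. The probability of a run of 20 such orders:
p = 0.5 ** 20
print(p)           # 9.5367431640625e-07
print(p * 1e6)     # about 0.95 parts per million
```

A run of 20 orders above the stated median is therefore evidence, at about the one-in-a-million level, that the stated median was wrong.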
In Out of the Crisis, Deming chided journalists for their statistical illiteracy when, for example, they bemoaned the fact that “50% of the teachers performed beneath the median.” In the US today, the meaning of averages and medians is taught in middle school, but the proper use of these tools does not seem to have been assimilated by adults.

One great feature of averages is that they add up: the average of the sum of two variables is the sum of their averages. If you take two operations performed in sequence in the route of a product, and consider the average time required to go through these operations by different units of product, then the average time to go through operations 1 and 2 is the sum of the average time through operation 1 and the average time through operation 2, as is obvious from the way an average is calculated. If you have n values $X_{1},...,X_{n}$, the average is just $\bar{X}= \frac{X_{1}+...+X_{n}}{n}$

What is often forgotten is that most other statistics are not additive. To obtain the median, you first need to sort the data so that $X_{\left(1\right)}\leq ... \leq X_{\left(n\right)}$. For each point, the sequence number then tells you how many other points are under it, which you can express as a percentage and plot as in the following example: Graphically, you see the median as the point on the x-axis where the curve crosses 50% on the y-axis. To calculate it, if n is odd, you take the middle value $\tilde{X}= X_{\left(\frac{n+1}{2}\right)}$ and, if n is even, you take the average of the two middle values, or $\tilde{X}= \frac{X_{\left(\frac{n}{2}\right)}+X_{\left(\frac{n}{2}+1\right)}}{2}$. The median is not generally additive, and neither are the other statistics based on rank, like the minimum, the maximum, quartiles, percentiles, or stanines.
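Both the median rule and the non-additivity claim can be checked in a few lines of Python. This is a sketch: the two skewed variables are simulated lognormals, an arbitrary choice made purely for illustration.

```python
import random
import statistics

def median(values):
    """Median as defined in the text: middle value for odd n,
    average of the two middle values for even n."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]                       # X_((n+1)/2) in 1-based notation
    return (xs[n // 2 - 1] + xs[n // 2]) / 2    # average of X_(n/2) and X_(n/2+1)

print(median([5, 1, 3]))       # middle value: 3
print(median([4, 1, 3, 2]))    # average of 2 and 3: 2.5

# Averages add up; medians, minima, and maxima do not.
random.seed(0)
X = [random.lognormvariate(0, 1) for _ in range(5000)]   # highly skewed
Y = [random.lognormvariate(0, 1) for _ in range(5000)]
S = [x + y for x, y in zip(X, Y)]

print(statistics.mean(X) + statistics.mean(Y), statistics.mean(S))  # equal, up to rounding
print(median(X) + median(Y), median(S))                             # not equal
print(max(X) + max(Y), max(S))                                      # sum of maxes exceeds max of sum
```

The first pair of printed numbers matches to within rounding error; the other two pairs do not, which is the whole point.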
An ERP system, for example, will add operation times along a route to plan production, but the individual operation times input to the system are not averages but worst-case values, chosen so that they can reliably be achieved. The system therefore calculates the lead time for the route as the sum of extreme values at each operation, and this math is wrong because extreme values are not additive: the worst-case value for the whole route is not the sum of the worst-case values of each operation, and the result is an absurdly long lead time.

In project management, this is also the key difference between the traditional Critical Path Method (CPM) and Eli Goldratt’s Critical Chain. In CPM, the duration of each task is set by the individual in charge of it so that he or she can be confident of completing it. It represents a perceived worst-case value for the task, which means that the duration for the whole critical path is the sum of the worst-case values of the tasks on it. In Critical Chain, each task duration is what the task is actually expected to require, with a time buffer added at the end to absorb delays and take advantage of early completions.

That medians and extreme values are not additive can be experienced, if not proven, with a simple simulation in Excel. The formula “LOGNORM.INV(RAND(),0,1)” will give you, in about a second, 5,000 instances of two highly skewed variables, X and Y, as well as their sum X+Y. On a logarithmic scale, their histograms look as follows:

And the summary statistics show that the median, minimum, and maximum of the sum are not the sums of the corresponding values for each term:

Averages are not only additive but have many more desirable properties, so why do we ever consider medians? There are real problems with averages, when taken carelessly:

1. Averages are affected by extreme values, as illustrated by the Bill Gates Walks Into a Bar story.
Here we inserted him into a promotional picture of San Francisco’s Terroir Bar. Attached to each patron other than Bill Gates is a modest yearly income, but his presence pushes the average yearly income above $100M, which is not a meaningful summary of the population. On the other hand, consider the median. Without Bill Gates, the middle person is Larry, and the median yearly income is $46K. Add Bill Gates, and the median is now the average of Larry and Randy, or $48K. The median barely budged! While, in this story, Bill Gates is a genuine outlier, manufacturing data often have outliers that are the result of malfunctions, as when wrong measurements are recorded because a probe failed to touch the object it was measuring, the instrument was calibrated in the wrong system of units, or a human operator put a decimal point in the wrong place… Large differences between average and median are a telltale sign of this kind of phenomenon. Once the outliers are identified, assessed, and filtered, you can go back to using the average rather than the median.
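The effect is easy to reproduce. In this sketch, the patrons’ incomes are invented figures, chosen only to match the medians quoted in the story ($46K without the outlier, $48K with it):

```python
import statistics

# Hypothetical yearly incomes of the bar patrons (illustrative figures only)
patrons = [29_000, 35_000, 43_000, 46_000, 50_000, 60_000, 75_000]
with_gates = patrons + [1_000_000_000]   # add one extreme outlier

print(statistics.mean(patrons), statistics.median(patrons))
print(statistics.mean(with_gates), statistics.median(with_gates))
# The mean jumps from tens of thousands to over $100M; the median
# moves only from 46,000 to 48,000.
```

One data point moved the mean by three orders of magnitude while barely moving the median, which is exactly why the median is the preferred summary in the presence of unfiltered outliers.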
2. Averages are meaningless over heterogeneous populations. The statement that best explains this is “The average American has exactly one breast and one testicle.” It says nothing useful about the American population. In manufacturing, when you consider, say, a number of units produced, you need to make sure you are not commingling 32-oz bottles with minuscule free samples.
3. Averages are meaningless for multiplicative quantities. If your data is the sequence $Y_{1}, ...,Y_{n}$ of yields of the n operations in a route, then the overall yield is $Y= Y_{1}\times ...\times Y_{n}$, and the plain average of the yields is irrelevant. Instead, you want the geometric mean $\bar{Y}=\sqrt[n]{Y_{1}\times ...\times Y_{n}}$.
The same logic applies to the compounding of interest rates, and the plain average of rates over several years is irrelevant.
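A small sketch of the difference, with assumed yield figures:

```python
import math

# Operation yields along a route (assumed figures, for illustration only)
yields = [0.98, 0.95, 0.99, 0.97]

overall = math.prod(yields)                  # rolled-throughput yield of the route
geo_mean = overall ** (1 / len(yields))      # geometric mean yield per operation
arith_mean = sum(yields) / len(yields)       # plain average: misleading here

print(overall, geo_mean, arith_mean)
# geo_mean ** 4 reproduces the overall yield exactly; arith_mean ** 4 does not.
```

Raising the geometric mean to the number of operations recovers the overall yield, which is the property that makes it the right summary for multiplicative data.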
4. Sometimes, averages do not converge when the sample size grows. It can happen even with a homogeneous population, it is not difficult to observe, and it is mind-boggling. Let us say your product is a rectangular plate. On each one you make, you measure the differences between its actual length and width and the specs, as in the following picture:
Assume then that, rather than the discrepancies in length and width, you are interested in the slope ΔW/ΔL and calculate its average over an increasing number of plates. You are then surprised to find that, no matter how many data points you add, this average keeps bouncing around instead of converging as the law of large numbers has led you to expect.

So far, we have looked at averages as just a formula applied to data. To go further, we must instead consider them as estimators of the mean of an “underlying distribution” that we use as a model of the phenomenon at hand. Here, we assume that the lengths and widths of the plates are normally distributed around the specs. The slope ΔW/ΔL is then the ratio of two normal variables with zero mean, and therefore follows the Cauchy distribution. This distribution has the nasty property of not having a mean, as a consequence of which the law of large numbers does not apply. But it has a median, which is 0.
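This behavior is easy to reproduce by simulation. A sketch, assuming unit-variance normal discrepancies (an arbitrary choice; any common scale gives the same picture):

```python
import random
import statistics

random.seed(1)

def running_mean_of_slopes(n):
    """Average of dW/dL over n plates, with dW, dL ~ Normal(0, 1).
    The ratio of two zero-mean normals is Cauchy-distributed."""
    slopes = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(slopes)

# The running average keeps jumping instead of settling down as n grows...
for n in (100, 10_000, 100_000):
    print(n, running_mean_of_slopes(n))

# ...but the median is stable near 0, as the Cauchy distribution predicts.
sample = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(100_000)]
print(statistics.median(sample))
```

The printed averages wander with no tendency to converge, while the median of even a large sample stays close to 0: a mean-free distribution with a well-defined median.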

The bottom line is that you should use averages whenever you can, because you can do more with them than with the alternatives, but you shouldn’t use them blindly. Instead, you should do the following:

1. Identify and filter outliers.
2. Make sure that the data represents a sufficiently homogeneous population.
3. Use geometric means for multiplicative data.
4. Make sure that averaging makes sense from a probability standpoint.

As Kaiser Fung would say, use your number sense.

# Forthcoming book: The Deming Legacy

About two years ago, I started posting essays on this blog about Deming’s 14 points and their current relevance. Now I am writing on Points 11.a and 12 through 14, which I have not covered yet, organizing the material, and editing it into an eBook entitled The Deming Legacy, which will shortly be available in PDF, iBooks, and Kindle formats. If you are interested, please visit the site and let me know. Comments here are also welcome.

The posts on the topic to date are as follows:

The title is a ploy to convince Matt Damon to play Deming in the movie version.

# Organizational Sabotage – The Malpractice of Management By Objective by Ken Craddock & Kelly Allan

See on Scoop.it – lean manufacturing

Organizational Sabotage – The Malpractice of Management By Objective by Ken Craddock & Kelly Allan – Innovation, quality, and productivity suffer from the abuse of MBOs. Objectives are essential to a business.

Michel Baudin‘s insight:

This article brings a new perspective on the discussion of the same topic in this blog.

By Michel Baudin Posted in Deming