Variability, Randomness, And Uncertainty in Operations

This post elaborates on the distinction between randomness and uncertainty that I briefly touched on in a prior post. Always skittish about using dreaded words like “probability” or “randomness,” writers on manufacturing or service operations, even Deming, prefer to use “variability” or “variation” for the way both demand and performance change over time, but variable does not mean the same thing as random. For example, a hotel room that goes for $100/night from November through March and $200/night from April through October has a price that is variable but not random. The rates are published, and you know them ahead of time.

By contrast, to a passenger, the airfare from San Francisco to Chicago is not only variable but random. The airlines change tens of thousands of fares every day, in ways you discover only when you book a flight. Based on having flown this route four times in the past 12 months, however, you expect the fare to be in the range of $400 to $800, with $600 as the most likely. The information you have is not complete enough for you to know what the price will be, but it does enable you to have a confidence interval for it.

Probability For Professionals

In a previous post, I pointed out that manufacturing professionals’ eyes glaze over when they hear the word “probability.” Even outside manufacturing, most professionals’ idea of probability is that, if you throw a die, you have one chance in six of getting an ace. 2,000 years ago, Claudius wrote a book on how to win at dice, but the field of inquiry has broadened since, producing results that affect business, technology, science, politics, and everyday life.

In the age of big data, all professionals would benefit from digging deeper and becoming, at least, savvy recipients of probabilistic arguments prepared by others. The analysts themselves need a deeper understanding than their audience. With the software available today in the broad categories of data science or machine learning, however, they don’t need to master 1,000 pages of math in order to apply probability theory, any more than you need to understand the mechanics of gearboxes to drive a car.

It wasn’t the case in earlier decades, when you needed to learn the math and implement it in your own code. Not only is it now unnecessary, but many new tools have been added to the kit. You still need to learn what the math doesn’t tell you: which tools to apply, when and how, in order to solve your actual problems. It’s no longer about computing, but about figuring out what to compute and acting on the results.

Following are a few examples that illustrate these ideas, and pointers on concepts I have personally found most enlightening on this subject. There is more to come, if there is popular demand.

The meaning(s) of “random”

“That was random!” is my younger son’s response to the many things I say that sound strange to him. My computer has Random Access Memory (RAM), meaning that access to all memory locations is equally fast, as opposed to sequential access, as on a tape, where you have to go through a sequence of locations to reach the one you want.

In this sense, a side-loading truck provides random access to its load, while a back-loading truck provides sequential access.

While these uses of “random” are common, they have nothing to do with probability or statistics, and that is no problem as long as the context is clear. In discussions of quality management or production control, on the other hand, randomness is connected with the application of models from probability and statistics, and misunderstanding it as a technical term leads to mistakes.

From the AMS blog (2012)

In factories, the only example I ever saw of Control Charts used as recommended in the literature was in a ceramics plant that was firing thin rectangular plates for use as electronic substrates in batches of 5,000 in a tunnel kiln. They took dimensional measurements on plates prior to firing, as a control on the stamping machine used to cut them, and they made adjustments to the machine settings if control limits were crossed. They did not measure every one of the 5,000 plates on a wagon. The operator explained to us that he took measurements on a “random sample.”

“And how do you take random samples?” I asked.

“Oh! I just pick here and there,” the operator said, pointing to a kiln wagon.

That was the end of the conversation. One of the first things I remember learning when studying statistics was that picking “here and there” did not generate a random sample. A random sample is one in which every unit in the population has an equal probability of being selected, and it doesn’t happen with humans acting arbitrarily.

A common human pattern, for example, is to refrain from picking two neighboring units in succession. A true random sampler does not know where the previous pick took place and selects the unit next to it with the same probability as any other. This is done by having a system select a location based on a random number generator, and direct the operator to it.
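As a minimal sketch of what such a system might do, the Python snippet below draws plate positions with the standard library's random number generator. The numbering of plates from 1 to 5,000, the sample size of 5, and the function name `pick_sample_positions` are assumptions made for illustration; they do not describe the plant's actual procedure.

```python
import random


def pick_sample_positions(population_size, sample_size, seed=None):
    """Pick positions by simple random sampling without replacement.

    Every position from 1 to population_size is equally likely to be
    chosen, and a pick next to a previous pick is as likely as any other.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    positions = rng.sample(range(1, population_size + 1), sample_size)
    # Sorting only sets the order in which the operator visits the plates;
    # it does not change which plates were selected.
    return sorted(positions)


# Hypothetical wagon of 5,000 plates, numbered 1 to 5,000.
positions = pick_sample_positions(population_size=5000, sample_size=5, seed=42)
print("Measure the plates at positions:", positions)
```

Unlike an operator picking “here and there,” the generator has no tendency to avoid neighboring plates.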

This meaning of the word “random” does not carry over to other uses, even in probability theory. A mistake that is frequently encountered in discussions of quality is the idea that a random variable is one for which all values are equally likely. What makes a variable random is that probabilities can be attached to values or sets of values in some fashion; the distribution does not have to be uniform. One value can have a 90% probability while all other values share the remaining 10%, and it is still a random variable.

When you say of a phenomenon that it is random, technically, it means that it is amenable to modeling using probability theory. Some real phenomena do not need it, because they are deterministic: you insert the key into the lock and it opens, or you turn on a kettle and you have boiling water. Based on your input, you know what the outcome will be. There is no need to consider multiple outcomes and assign them probabilities.

There are other phenomena that vary so much, or about which you know so little, that you can’t use probability theory. They are called by a variety of names; I use “uncertain.” Earthquakes, financial crises, or wars can be generically expected to happen but cannot be specifically predicted. You apply earthquake engineering to construction in Japan or California, but you don’t leave Fukushima or San Francisco based on a prediction that an earthquake will hit tomorrow, because no one knows how to make such a prediction.

Between the two extremes of deterministic and uncertain phenomena is the domain of randomness, where you can apply probabilistic models to estimate the most likely outcome, predict a range of outcomes, or detect when a system has shifted. It includes fluctuations in the critical dimensions of a product or in its daily demand.
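One common way to detect a shift in a critical dimension is a control chart for individual values. The sketch below assumes an individuals chart with limits derived from the average moving range, which is one textbook variant and not necessarily the one used in any plant mentioned here. The rod lengths are made-up numbers for illustration.

```python
from statistics import mean


def individuals_limits(baseline):
    """Return (lower limit, center line, upper limit) for individual values.

    The limits sit at the center line plus or minus 2.66 times the average
    moving range, where 2.66 = 3 / 1.128 is the usual constant for moving
    ranges between consecutive points.
    """
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def out_of_limits(values, lower, upper):
    """Return the indices of the values that fall outside the limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical rod lengths in mm from a period considered stable.
baseline = [100.1, 99.8, 100.0, 100.3, 99.9, 100.2, 99.7, 100.0, 100.1, 99.9]
lower, center, upper = individuals_limits(baseline)

# Hypothetical new measurements; the third one is well above the rest.
new_values = [100.2, 99.9, 101.0, 100.1]
flagged = out_of_limits(new_values, lower, upper)
print(f"limits: {lower:.2f} to {upper:.2f}, flagged indices: {flagged}")
```

A point outside the limits is a signal to investigate, not proof that the process has changed.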

The boundaries between the deterministic, random and uncertain domains are fuzzy. Which perspective you apply to a particular phenomenon is a judgement call, and depends on your needs. According to Nate Silver, over the past 20 years, daily weather has transitioned from uncertain to random, and forecasters can now give you accurate probabilities that it will rain today. On the air, however, they overstate the probability of rain, because a wrong rain forecast elicits fewer viewer complaints than a wrong fair weather forecast. In manufacturing, the length of a rod is deterministic from the assembler’s point of view but random from the perspective of an engineer trying to improve the capability of a cutting machine.

Rods for assemblers vs. engineers

Claude Shannon

This categorization suggests that a phenomenon that is almost deterministic is, in some way, “less random” than one that is near uncertainty. But we need a metric of randomness to give a meaning to an expression like “less random.” Shannon’s entropy does the job. It is not defined for every probabilistic model but, where you can calculate it, it works. It is zero for a deterministic phenomenon, and rises to a maximum where all outcomes are equally likely. This brings us back to random sampling. We could more accurately call it “maximum randomness sampling” or “maximum entropy sampling,” but it would take too long.
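For a finite set of outcomes with probabilities p_1, ..., p_n, Shannon's entropy in bits is H = -(p_1 log2 p_1 + ... + p_n log2 p_n), with zero-probability outcomes contributing nothing. The short Python sketch below evaluates it for three illustrative cases. The six-outcome setting and the variable names are assumptions chosen for the example.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a finite probability distribution."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Three hypothetical distributions over six outcomes.
deterministic = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one certain outcome
skewed = [0.9] + [0.02] * 5                     # 90% on one value, 10% shared
uniform = [1 / 6] * 6                           # all outcomes equally likely

for name, dist in [("deterministic", deterministic),
                   ("skewed", skewed),
                   ("uniform", uniform)]:
    print(f"{name}: {shannon_entropy(dist):.3f} bits")
```

The deterministic case gives zero, the 90%/10% case gives about 0.70 bits, and the uniform case gives log2(6), about 2.58 bits, the maximum for six outcomes.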