Jan 3 2017
In a previous post, I pointed out that manufacturing professionals’ eyes glaze over when they hear the word “probability.” Even outside manufacturing, most professionals’ idea of probability is that, if you throw a die, you have one chance in six of getting an ace.
2000 years ago, Claudius wrote a book on how to win at dice, but the field of inquiry has broadened since, producing results that affect business, technology, science, politics, and everyday life.
In the age of big data, all professionals would benefit from digging deeper and becoming, at least, savvy recipients of probabilistic arguments prepared by others. The analysts themselves need a deeper understanding than their audience.
With the software available today in the broad categories of data science or machine learning, however, they don’t need to master 1,000 pages of math in order to apply probability theory, any more than you need to understand the mechanics of gearboxes to drive a car.
It wasn’t the case in earlier decades, when you needed to learn the math and implement it in your own code. Not only is it now unnecessary, but many new tools have been added to the kit. You still need to learn what the math doesn’t tell you: which tools to apply, when and how, in order to solve your actual problems. It’s no longer about computing, but about figuring out what to compute and acting on the results.
Following are a few examples that illustrate these ideas, and pointers on concepts I have personally found most enlightening on this subject. There is more to come, if there is popular demand.
Mistakes and Consequences
The 2nd most popular post in this blog cautions against the misuse of a formula for setting safety stocks. I wrote Lean Logistics before I ever heard of this formula and my first encounter with it was the misapplication case study described in the post. It had the following problems:
- The assumptions required for the formula to apply were not satisfied.
- The formula in the Excel spreadsheet had two misplaced exponents. One term that was supposed to be squared wasn’t, and another was, when it shouldn’t have been.
- In addition, the input data was not raw enough to provide estimates for the standard deviations used in the formula, and assumptions were used instead.
Somewhere along the line from the analyst who made the mistakes to the plant manager who acted on the flawed output, someone should have found out and raised these issues.
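For readers who have not met this kind of formula, here is a minimal sketch of how misplaced exponents change the answer. It assumes the commonly cited textbook version, safety stock = z × sqrt(L × σ_D² + D² × σ_L²), which applies only when daily demand and lead time are independent and roughly normal; this post does not reproduce the formula, so treat that choice as an assumption. The function names, the input values, and the particular way the exponents are misplaced are all hypothetical, not a reconstruction of the spreadsheet in the case study.

```python
from math import sqrt

def safety_stock(z, mean_lead_time, sd_lead_time, mean_demand, sd_demand):
    """Assumed textbook formula: z * sqrt(L * sd_D^2 + D^2 * sd_L^2)."""
    return z * sqrt(mean_lead_time * sd_demand**2
                    + mean_demand**2 * sd_lead_time**2)

def safety_stock_miswritten(z, mean_lead_time, sd_lead_time, mean_demand, sd_demand):
    """Hypothetical typo: lead time squared by mistake, mean demand left unsquared."""
    return z * sqrt(mean_lead_time**2 * sd_demand**2
                    + mean_demand * sd_lead_time**2)

# Illustrative inputs only: 4-day lead time (sd 1 day), 100 units/day demand (sd 20).
inputs = dict(z=1.645, mean_lead_time=4, sd_lead_time=1, mean_demand=100, sd_demand=20)
correct = safety_stock(**inputs)
wrong = safety_stock_miswritten(**inputs)
print(f"as intended: {correct:.1f} units, as miswritten: {wrong:.1f} units")
```

Both versions return a plausible-looking number, which is why a review of the formula itself, and not just of its output, is needed to catch this kind of mistake.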
Start with Nate Silver
To start, math-phobes should read Nate Silver’s The Signal And The Noise. Nate Silver writes great popular science. His math is sound, but he keeps it under the hood. He is a practitioner, not an academic, and explains himself in terms anyone can follow.
He has made a living as an online poker player, developed a baseball player performance forecasting tool named PECOTA, and his FiveThirtyEight blog is the reference for American politics. Nate Silver’s book will entertain you; it will not enable you to follow in his footsteps, but it should provide the motivation to go further.
Altering Weather Predictions to Please TV Viewers
Among other things, you will learn that weather reports on TV systematically overstate the probability of rain tomorrow and why they do it. Nate Silver checked the frequency of rain on days when the weather report had set the probability at, say, 40%, and found that it actually rained on only 25% of those days. The reason was that, at least in cities, the audience never complains about being surprised by fair weather, but does about being surprised by rain.
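The kind of check described above can be sketched in a few lines: group past days by the probability the forecast stated, then compare it with how often it actually rained. The function name `calibration_table` and the 12-day record are hypothetical, made up to mirror the 40% versus 25% example; this is not Nate Silver's data or method.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each stated rain probability to the observed frequency of rain
    on the days when that probability was forecast."""
    counts = defaultdict(lambda: [0, 0])  # stated probability -> [rainy days, total days]
    for stated, rained in zip(forecasts, outcomes):
        counts[stated][0] += int(rained)
        counts[stated][1] += 1
    return {stated: rainy / total for stated, (rainy, total) in sorted(counts.items())}

# Hypothetical record: 8 days forecast at 40%, then 4 days forecast at 10%.
forecasts = [0.4] * 8 + [0.1] * 4
outcomes = [True, False, False, True, False, False, False, False,
            False, False, False, False]

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"forecast {stated:.0%} -> rained on {observed:.0%} of those days")
```

A well-calibrated forecaster would show observed frequencies close to the stated probabilities; a systematic gap in one direction is the signature of the bias described here.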
Altering Demand Forecasts to Please the Boss in Manufacturing
Does altering predictions to please an audience occur in manufacturing? My first exposure to production planning was in a small division of a semiconductor company, making special chips in low volumes. The division manager had announced that the following month’s sales would be $1M. About $515K was coming in as part of long-term defense contracts, which meant that the difference had to come in the form of small commercial orders from many customers called “turns,” and the Production Control manager dutifully plugged in $485K to make the numbers add up to the manager’s announcement.
A basic analysis of the turns of the previous 18 months told me that the following month’s turns would be within $350K ± $50K, but the Production Control manager would not have dared show these numbers to the boss. A month later, the actual turns brought in $360K.
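As a rough illustration of what such a basic analysis might look like, the sketch below takes 18 months of turns and reports the mean with a band of two sample standard deviations. Both the monthly figures and the choice of a two-standard-deviation band are assumptions made for this example; the original data and method are not given here.

```python
from statistics import mean, stdev

# Hypothetical monthly "turns" in $K for the previous 18 months (illustrative only).
turns_k = [340, 365, 310, 395, 350, 330, 375, 360, 320,
           345, 385, 355, 335, 370, 300, 390, 350, 325]

center = mean(turns_k)
spread = 2 * stdev(turns_k)  # assumed band: two sample standard deviations
low, high = center - spread, center + spread

plugged = 485  # $K needed from turns to reach the announced $1M
print(f"expected turns: {center:.0f} +/- {spread:.0f} $K")
print("plugged figure inside the band:", low <= plugged <= high)
```

The point is not the precision of the band but the comparison: a figure plugged in to make the total add up can be checked against history in a few lines.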
How Dictionaries And Encyclopedias Bungle It
That probability is subtle can be seen from the bungled attempts at defining it that you find in dictionaries and encyclopedias.
The Wikipedia article opens with:
“Probability is the measure of the likelihood that an event will occur.”
If reading this makes you look for the article on “likelihood,” you find that there isn’t one. There is one, however, for likelihood function, which contains the following:
“In informal contexts, ‘likelihood’ is often used as a synonym for ‘probability.'”
So Wikipedia gives you a circular definition.
If you find this unhelpful, you may turn to Webster’s and find the following, equally unhelpful mishmash:
Probability and Statistics
As befits a regular dictionary, four of the five meanings are about common, everyday usage. Only the fourth one is about probability as a technical term, and it presents it under the heading of Statistics, which is incorrect. Probability and Statistics are distinct fields.
Probability theory grew out of the study of games of chance for the purpose of setting gambling strategies; statistics, from the collection and analysis by governments of data about the state.
As recounted by Peter Bernstein in Against the Gods, the two met in 18th-century England, where insurance companies struggled to set premiums on ships and cargoes. It’s been an open relationship ever since. Much of the work in statistics today is visualizing and summarizing data in ways that don’t involve any probability theory. Conversely, probability theory is applied beyond statistics, to telecommunications, physics, game theory, and other fields.
Probability Versus Relative Frequency
Furthermore, defining probability as the relative frequency of one outcome among all possible outcomes in repeated, independent trials, like coin tosses or the repetitive production of one item in one-piece flow on the same line, is overly restrictive.
The dictionary writers’ excuse is that, according to E.T. Jaynes, probability was defined this way by James Bernoulli 300 years ago, and mathematicians agreed for the following 150 years. They have since changed their minds.
If this were the general meaning of a probability, what did it mean when FiveThirtyEight, on the eve of the US presidential election, estimated that Hillary Clinton had a 61% probability of winning? The only way to reconcile this with relative frequency is to imagine the 2016 election repeating itself Groundhog Day-like, and the percentage of elections Hillary Clinton wins converging towards 61% of the total number.
This kind of thought experiment is clearly not how we understand such an estimate. More generally, a probability is really a number between 0 and 1 that we assign to events in such a way that an impossible outcome is a 0, a sure thing a 1, and everything else a number in between.
There are many ways to assign this number to basic events, but probability theory then gives us rules to calculate probabilities for events that are AND, OR, and NOT combinations of other events, and to refine them as the outcome of related events becomes known.
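To make these rules concrete, here is a small sketch using the throw of a fair die, with exact fractions. The events `even` and `high` and the helper names `prob` and `given` are chosen for this example only.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # faces of a fair die, each given equal weight

def prob(event):
    """Probability of an event, i.e. a set of faces."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def given(event, known):
    """P(event | known): restrict attention to outcomes consistent with what is known."""
    return Fraction(len(event & known), len(known))

even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})

p_and = prob(even & high)       # AND: both events occur
p_or = prob(even | high)        # OR: at least one occurs
p_not = prob(OUTCOMES - even)   # NOT: the event does not occur
p_updated = given(even, high)   # refined once we learn the throw was high

print(p_and, p_or, p_not, p_updated)

# The rules tie these numbers together:
assert p_or == prob(even) + prob(high) - p_and
assert p_not == 1 - prob(even)
assert p_and == prob(high) * p_updated
```

Learning that the throw was high moves the probability of an even number from 1/2 to 2/3, which is the "refining" step in miniature.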
Betting on Soccer
The key is how we use this number to assign weights to our beliefs. Say there is a soccer match tomorrow between Barcelona and Real Madrid, and you want to make a friendly wager on the outcome. If you know nothing about either team, by symmetry, you assign a 50% probability of winning to each.
To you, it’s a coin toss. Then you find out that the “best soccer player of 2016,” Lionel Messi, plays for Barcelona. You don’t know how this distinction was awarded, but you choose to trust the authority that awarded it. In your mind, you increase Barcelona’s probability to above 50%. It doesn’t matter whether you raise it to 60% or 80% because the outcome is the same: you bet on Barcelona.
Meanwhile, your friend has other information. He knows, for example, that Real Madrid has the 2nd best player in the world, Cristiano Ronaldo, and that Messi has a knee injury, as a result of which he calculates the probabilities differently and is happy to bet on Real Madrid…
The probabilities are not just a function of the match; they are instead a judgment based on what you know about it. They are subjective in that you and your friend have different information, but objective in that this information is all you use. Unlike soccer fans, you don’t let emotional attachment to one team or the other influence your bet.
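One way to picture how the two of you end up with different numbers is Bayes' rule in odds form, where each piece of information multiplies the odds by a likelihood ratio. The sketch below is only an illustration: the function names and every likelihood ratio in it are invented for this example, and nothing says real bettors reason this explicitly.

```python
def update(p_barcelona, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
    A ratio above 1 favors Barcelona, a ratio below 1 favors Real Madrid."""
    odds = p_barcelona / (1 - p_barcelona) * likelihood_ratio
    return odds / (1 + odds)

def bet(p_barcelona):
    return "Barcelona" if p_barcelona > 0.5 else "Real Madrid"

prior = 0.5  # you know nothing: a coin toss

# You: Messi plays for Barcelona (assumed ratio 2.0).
you = update(prior, 2.0)

# Your friend: same news about Messi, plus Ronaldo at Real Madrid (assumed 0.6)
# and Messi's knee injury (assumed 0.5).
friend = update(update(update(prior, 2.0), 0.6), 0.5)

print(f"you: {you:.2f} -> bet on {bet(you)}")
print(f"friend: {friend:.2f} -> bet on {bet(friend)}")
```

Same match, same rule, different information, and therefore different probabilities and opposite bets.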
Betting on Your Own Survival
The nature of these probabilities is well illustrated in the “one chance in three” speech by the defecting Soviet captain of the submarine Red October:
E.T. Jaynes On Probability As The Logic Of Science
To E.T. Jaynes, probability is “the logic of science.” In math, logical propositions are true, false, or undecidable. Science differs from math in that you are dealing with nature, not just with ideas. When an experiment produces an outcome that does not match a theory, it refutes the theory, but when the outcome does match the theory, it does not prove it.
As Karl Popper explains, today’s science is simply the body of propositions about nature that experiments have failed to refute. While safe to accept as facts and more practically useful than mathematical theorems, scientific theories are not at the same level of certainty. They start out as plausible hypotheses that are gradually reinforced as experiments fail to disprove them.
If false is equated to 0 and true to 1, then plausibility can be quantified as a number between 0 and 1, and the operations of mathematical logic that support deductions extend to operations on plausibility that match probability theory.
In other words, to Jaynes, probability is quantified plausibility. The observed relative frequency of heads versus tails in coin tosses, or of good versus defective units in manufacturing, is information that can be used to set probabilities, but so is the presence of an exceptional player in a sports team or of a better machine in a production line.
Georges Matheron On Randomness And Uncertainty
Georges Matheron, under whom I had the privilege to study, applied probabilistic models to spatial phenomena, like orebodies in the ground or the texture of materials. Near the end of his career, he wrote a philosophical essay called Estimating and Choosing, in which, among other topics, he describes the models we use to solve problems as a choice between determinism, randomness, and uncertainty:
- If we can calculate outcomes exactly, a phenomenon is deterministic, like the period of a pendulum.
- If we can apply probability theory to guide our decisions, as in deciding whether to mine an orebody, then it’s random.
- If we can’t apply probability theory, then it’s uncertain, as with war or a stock market prone to indiscriminate rises followed by inexplicable declines.
The boundaries are subjective and blurry, and they shift over time. A plant manager may perceive product demand as uncertain, while a data scientist reviewing recent order history notices that the bulk of the demand comes from large orders placed like clockwork every other Wednesday by the same customer. These orders are predictable in both timing and range, which makes the demand random rather than uncertain. According to Nate Silver, weather forecasting has migrated from uncertain to random over the last two decades.
Matheron also tackled the problem of having no repetitions. Rock formations take shape over millions of years, causing ore to accumulate in some places and not others. It happened once; there are no multiple, independent occurrences to calculate frequencies.
Matheron chose to assume that the variations between observations at different locations in an orebody allow you to infer the variability of ore content at any point in it. He called it ergodicity.
In manufacturing, we make the same assumption in work sampling: we infer the breakdown of how a resource uses its time from snapshots of the states of multiple, identical resources.
If you have 20 injection-molding machines, each with an Andon light on top, each time you go out on the floor and count the red, yellow, and green lights, you are taking a snapshot of the state of the machines. If these snapshots tell you that, on average, you have 5 machines down and 2 idle, you are assuming ergodicity when you infer from these numbers that, on average, each machine is down 25% of the time and idle another 10% of the time.
It’s not as accurate as collecting timestamps of the state changes of your machines for a month, but it can be done in one day at minimal cost and is accurate enough to support many management decisions, including the option of taking more detailed measurements on one machine.
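The arithmetic behind the Andon-light example is simple enough to sketch. The six snapshots below are hypothetical counts chosen to average 5 machines down and 2 idle, and reading red as "down" and yellow as "idle" is an assumption made for this example.

```python
MACHINES = 20

# Hypothetical Andon-light counts from six walks through the shop: (red, yellow, green).
snapshots = [(5, 2, 13), (6, 1, 13), (4, 3, 13), (5, 2, 13), (4, 2, 14), (6, 2, 12)]

def time_shares(snapshots, machines):
    """Average share of machines in each state across snapshots. Under the
    ergodicity assumption, this is also each machine's share of time in that state."""
    observations = len(snapshots) * machines
    totals = [sum(snapshot[state] for snapshot in snapshots) for state in range(3)]
    return tuple(total / observations for total in totals)

down, idle, running = time_shares(snapshots, MACHINES)
print(f"down {down:.0%}, idle {idle:.0%}, running {running:.0%}")
```

The calculation says nothing about whether the machines really are interchangeable; that is the assumption you bring to it.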
Don Wheeler’s View
Don Wheeler is an outspoken advocate for the use of World War II vintage technologies in the analysis of quality or business data. In Myths About Data Analysis, pp. 16-17, he writes:
“In the mathematical sense all probability models are limiting functions for infinite sequences of random variables.”
This is not something you find in probability theory, at least as formulated since the 1930s. What makes the theory such a forbidding subject for most professionals is that it describes abstractions and rules for manipulating them without connecting them to actual problems.
Making this connection is up to you, and it takes different forms whether your challenge is placing a bet, counting a population, sizing a market for a new product, quantifying the range of the demand for a company’s products, quantifying the variations in the technical characteristics of a product as measured at final test, finding a wreck at the bottom of the ocean,…
What I read in Don Wheeler’s statement is that he equates the probability of an event with the limit of its relative frequency as the sample size increases indefinitely. When you are dealing with a bowl containing a mixture of red and white beads, you can decide to use the relative frequency as a probability model but, as we have seen, there are plenty of circumstances in which this is not an option, particularly when you are reasoning about unique events.
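For the bowl of beads, the relative-frequency view is easy to simulate. This sketch assumes a bowl with 20% red beads and draws with replacement, so that trials are independent; the share, the function name, and the sample sizes are all chosen for illustration.

```python
import random

def relative_frequency(red_share, draws, seed=0):
    """Draw beads with replacement and return the observed share of red beads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    reds = sum(rng.random() < red_share for _ in range(draws))
    return reds / draws

for draws in (10, 100, 1_000, 100_000):
    print(f"{draws:>7} draws: {relative_frequency(0.2, draws):.3f}")
```

As the number of draws grows, the observed share settles near 20%. No such loop can be written for a unique event, which is exactly the limitation discussed above.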
- Bernstein, P. (1998) Against the Gods, Wiley
- Jaynes, E.T. (2003) Probability Theory, The Logic of Science, Cambridge University Press
- Matheron, G. (1988) Estimating and Choosing, Springer Verlag
- Popper, K. (2002) The Logic of Scientific Discovery, Routledge Classics
- Silver, N. (2012) The Signal and the Noise, Penguin Press
- Wheeler, D. (2012) Myths About Data Analysis, 2012 International Lean & Six Sigma Conference