Aug 8 2014

## The meaning(s) of “random”

“That was *random*!” is my younger son’s response to the many things I say that sound strange to him. My computer has *Random* Access Memory (RAM), meaning that access to all memory locations is equally fast, as opposed to *sequential* access, as on a tape, where you have to go through a sequence of locations to reach the one you want.

In this sense, a side-loading truck provides random access to its load, while a back-loading truck provides sequential access.

While these uses of *random* are common, they have nothing to do with probability or statistics, which is no problem as long as the context is clear. In discussions of quality management or production control, on the other hand, *randomness* is connected with the application of models from probability and statistics, and misunderstanding its technical meaning leads to mistakes.

In factories, the only example I ever saw of Control Charts used as recommended in the literature was in a ceramics plant that was firing thin rectangular plates for use as electronic substrates in batches of 5,000 in a tunnel kiln. They took dimensional measurements on plates prior to firing, as a control on the stamping machine used to cut them, and they made adjustments to the machine settings if control limits were crossed. They did not measure every one of the 5,000 plates on a wagon. The operator explained to us that he took measurements on a “random sample.”

“And how do you take random samples?” I asked.

“Oh! I just pick here and there,” the operator said, pointing to a kiln wagon.

That was the end of the conversation. One of the first things I remember learning when studying statistics was that picking “here and there” did not generate a *random* sample. A *random sample* is one in which every unit in the population has an equal probability of being selected, and humans picking arbitrarily do not achieve it.

A common human pattern, for example, is to refrain from picking two neighboring units in succession. A true random sampler does not know where the previous pick took place and selects the unit next to it with the same probability as any other. This is done by having a system select a location based on a random number generator, and direct the operator to it.
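Such a system can be very simple. Below is a minimal Python sketch, assuming a hypothetical wagon layout of 50 stacks of 100 plates (the text gives the batch size of 5,000 but not how the plates sit on the wagon). It relies on the standard library's `random.Random.sample`, which draws without replacement and gives every plate the same chance of selection; the names `pick_sample`, `STACKS` and `PLATES_PER_STACK` are illustrative, not part of any real plant system.

```python
import random

# Hypothetical wagon layout: 50 stacks of 100 plates, 5,000 plates in all.
# The text gives the batch size but not how plates sit on the wagon.
STACKS = 50
PLATES_PER_STACK = 100
POPULATION = STACKS * PLATES_PER_STACK


def pick_sample(sample_size, seed=None):
    """Return (stack, position) locations of a simple random sample of plates.

    random.Random.sample draws without replacement and gives every plate the
    same chance of selection. The generator has no memory of where the previous
    pick was, so neighboring plates are as likely to be chosen as any others.
    Locations are 0-based and sorted to make the operator's route easier.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(POPULATION), sample_size)
    return [divmod(i, PLATES_PER_STACK) for i in sorted(indices)]


if __name__ == "__main__":
    # Direct the operator to 10 plates; shown 1-based for readability.
    for stack, position in pick_sample(10, seed=2014):
        print(f"Measure plate {position + 1} in stack {stack + 1}")
```

Sorting the picks only changes the order in which the operator visits them, not which plates are selected, so the sample stays random.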

This meaning of the word “random” does not carry over to other uses, even in probability theory. A mistake frequently encountered in discussions of quality is the idea that a *random variable* is one for which all values are equally likely. What makes a variable *random* is that probabilities can be attached to values or sets of values in *some* fashion; it does not have to be uniform. One value can have a 90% probability while all other values share the remaining 10%, and it is still a random variable.
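To make the 90%/10% case concrete, here is a small Python sketch of a discrete random variable that is anything but uniform. The labels "A" to "E" and the even split of the remaining 10% among four values are illustrative choices; `random.Random.choices` with `weights` is the standard-library way to draw from such a distribution.

```python
import random
from collections import Counter

# A discrete random variable far from uniform: "A" has probability 0.9 and
# four other values share the remaining 0.1. Labels and split are illustrative.
DISTRIBUTION = {"A": 0.90, "B": 0.025, "C": 0.025, "D": 0.025, "E": 0.025}


def draw(n, seed=None):
    """Draw n independent values according to DISTRIBUTION."""
    rng = random.Random(seed)
    values = list(DISTRIBUTION)
    weights = list(DISTRIBUTION.values())
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    n = 100_000
    counts = Counter(draw(n, seed=8))
    for value in DISTRIBUTION:
        print(value, counts[value] / n)
```

The observed frequencies come close to the assigned probabilities, which is all it takes for the variable to be random, even though one value dominates.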

When you say of a phenomenon that it is *random*, technically, it means that it is amenable to modeling using probability theory. Some real phenomena do not need it, because they are deterministic: you insert the key into the lock and it opens, or you turn on a kettle and you have boiling water. Based on your input, you know what the outcome will be. There is no need to consider multiple outcomes and assign them probabilities.

There are other phenomena that vary so much, or about which you know so little, that you can’t use probability theory. They are called by a variety of names; I use *uncertain*. Earthquakes, financial crises, or wars can be generically expected to happen but cannot be specifically predicted. You apply earthquake engineering to construction in Japan or California, but you don’t leave Fukushima or San Francisco based on a prediction that an earthquake will hit tomorrow, because no one knows how to make such a prediction.

Between the two extremes of deterministic and uncertain phenomena is the domain of *randomness*, where you can apply probabilistic models to estimate the most likely outcome, predict a range of outcomes, or detect when a system has shifted. It includes fluctuations in the critical dimensions of a product or in its daily demand.
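Detecting a shift is what a Control Chart does. The following is a minimal Python sketch, not a full charting procedure: it assumes a baseline of measurements taken while the process was stable, sets limits at the mean plus or minus three standard deviations, and flags new points outside them. The function names and the plate dimensions in the usage example are made up for illustration.

```python
import statistics


def control_limits(baseline):
    """Return (lower, center, upper) limits at the mean +/- 3 standard deviations.

    Simplification: sigma is the sample standard deviation of a baseline taken
    while the process was stable. Individuals charts in practice often estimate
    it from the average moving range instead.
    """
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(measurements, limits):
    """Return the indices of measurements that fall outside the limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(measurements) if x < lower or x > upper]


if __name__ == "__main__":
    # Made-up plate dimensions in mm, for illustration only.
    baseline = [50.02, 49.98, 50.01, 49.99, 50.00, 50.03, 49.97, 50.01, 49.99, 50.00]
    limits = control_limits(baseline)
    new_points = [50.01, 49.96, 50.08, 49.93]
    print("limits:", [round(x, 3) for x in limits])
    print("points outside limits:", out_of_control(new_points, limits))
```

A point outside the limits is a signal that the process has probably shifted, to be investigated before adjusting the machine; it is a probabilistic judgment, not a certainty.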

The boundaries between the deterministic, random and uncertain domains are fuzzy. Which perspective you apply to a particular phenomenon is a judgement call, and depends on your needs. According to Nate Silver, over the past 20 years, daily weather has transitioned from uncertain to random, and forecasters could give you accurate probabilities that it will rain today. On the air, they overstate the probability of rain, because a wrong rain forecast elicits fewer viewer complaints than a wrong fair weather forecast. In manufacturing, the length of a rod is deterministic from the assembler’s point of view but random from the perspective of an engineer trying to improve the capability of a cutting machine.

This categorization suggests that a phenomenon that is almost deterministic is, in some way, “less random” than one that is near uncertainty. But we need a metric of randomness to give a meaning to an expression like “less random.” Shannon’s entropy does the job. It is not defined for every probabilistic model but, where you can calculate it, it works. It is zero for a deterministic phenomenon, and rises to a maximum where all outcomes are equally likely. This brings us back to *random sampling*. We could more accurately call it “maximum randomness sampling” or “maximum entropy sampling,” but it would take too long.
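For a discrete distribution, Shannon's entropy in bits is `H = -sum(p * log2(p))` over all outcomes. The short Python sketch below computes it and shows the behavior described above: zero for a deterministic outcome, something in between for the 90%/10% case, and the maximum, `log2(N)`, when all N outcomes are equally likely, as in a random sample drawn from 5,000 plates. The function name is illustrative.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete distribution.

    Each outcome contributes p * log2(1/p); outcomes with p == 0 are skipped,
    following the usual convention that they contribute nothing.
    """
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print("deterministic:", shannon_entropy([1.0]))
    print("90% / 4 x 2.5%:", round(shannon_entropy([0.9] + [0.025] * 4), 3))
    print("5 equally likely:", round(shannon_entropy([0.2] * 5), 3))
    # Uniform selection among 5,000 plates reaches the maximum, log2(5000) bits.
    print("5,000 equally likely:", round(shannon_entropy([1 / 5000] * 5000), 3))
```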

Aug 23 2014

## The bell curve: “Normal” or “Gaussian”?

Most discussions of statistical quality refer to the “Normal distribution,” but “Normal” is a loaded word. If we talk about the “Normal distribution,” it implies that all other distributions are, in some way, abnormal. The “Normal distribution” is also called “Gaussian,” after the discoverer of many of its properties, and I prefer it as a more neutral term. Before Germany adopted the Euro, its last 10-Mark note featured the bell curve next to Gauss’s face.

The Gaussian distribution is widely used, and abused, because its math is simple, well known, and wonderful. Here are a few of its remarkable properties:

- *It applies to a broad class of measurement errors.* John Herschel arrived at the Gaussian distribution for measurement errors in the position of bodies in the sky simply from the fact that the errors in x and y should be independent and that the probability of a given error should depend only on the distance from the true point.
- *It is stable.* If you add independent Gaussian variables, or take any linear combination of them, the result is also Gaussian.
- *Many sums of variables converge to it.* The Central Limit Theorem (CLT) says that, if you add variables that are independent and identically distributed, with a distribution that has a mean and a standard deviation, their sum, once centered and scaled, converges towards a Gaussian. This makes it an attractive model, for example, for order quantities for a product coming independently from a large number of customers.
- *It solves the equation of diffusion.* The concentration of, say, a dye introduced into clear water through a pinpoint is a Gaussian that spreads over time. You can experience it in your kitchen: fill a white plate with about 1/8 in of water, and drop the smallest amount of mint syrup you can in the center. After a few seconds, the syrup in the water forms a cloud that looks very much like a two-dimensional Gaussian bell shape for concentration, as shown on the right. And in fact it is, because the Gaussian density function solves the diffusion equation, with a standard deviation that rises with time. It also happens in gases, but too quickly to observe in your kitchen, and in solids, but too slowly.
- *It solves the equation of heat transfer by conduction.* Likewise, when heat spreads by conduction from a point source in a solid, the temperature profile is Gaussian. The equation is the same as for diffusion.
- *Unique filter.* A time series of raw data (temperatures, order quantities, stock prices, …) usually has fluctuations that you want to smooth out in order to bring to light the trends or cycles you are looking for. A common way of doing this is replacing each point with a moving average of its neighbors, taken over windows of varying lengths, often with weights that decrease with distance, so that a point that is 30 minutes in the past counts for less than the point of 1 second ago. And you would like to set these weights so that, whenever you enlarge the window, the peaks in your signal are eroded and the valleys fill up. A surprising and recent discovery (1986) is that the only weighting function that does this is the Gaussian bell curve, with its standard deviation as the scale parameter.
- *Own transform.* This is mostly of interest to mathematicians, but the Gaussian bell curve is its own Fourier transform, which drastically simplifies calculations.

For all these reasons, the Gaussian distribution deserves attention, but this doesn’t mean that other models don’t deserve it too. For example, when you pool the output of independent series of events, like failures of different types on a machine, you tend towards a Poisson process, characterized by independent numbers of events in disjoint time intervals, and a constant occurrence rate over time. It is also quite useful, but it doesn’t command the same level of attention as the Gaussian.
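The CLT claim about order quantities can be checked with a quick simulation. This Python sketch rests on made-up assumptions: 50 independent customers, each ordering a whole number of units drawn uniformly between 0 and 10 per day. Each order is far from Gaussian, yet the daily totals should fall within 1, 2 and 3 standard deviations of their mean about as often as a Gaussian predicts (roughly 68%, 95% and 99.7%). The reference values come from `statistics.NormalDist` (Python 3.8+); the other names are illustrative.

```python
import random
import statistics

# Hypothetical setup: 50 independent customers, each ordering a whole number
# of units drawn uniformly from 0 to 10 on a given day.
CUSTOMERS = 50
MAX_ORDER = 10


def daily_total(rng):
    """Total units ordered on one simulated day."""
    return sum(rng.randint(0, MAX_ORDER) for _ in range(CUSTOMERS))


def simulate(days=20_000, seed=23):
    """Daily totals over a number of simulated days."""
    rng = random.Random(seed)
    return [daily_total(rng) for _ in range(days)]


def fraction_within(values, mean, sd, k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


if __name__ == "__main__":
    totals = simulate()
    mean, sd = statistics.mean(totals), statistics.stdev(totals)
    gauss = statistics.NormalDist()
    for k in (1, 2, 3):
        expected = gauss.cdf(k) - gauss.cdf(-k)
        print(f"within {k} sd: simulated {fraction_within(totals, mean, sd, k):.3f}, "
              f"Gaussian {expected:.3f}")
```

The agreement is close but not exact: totals are whole numbers, and 50 customers is a finite number, so this illustrates convergence rather than proving it.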

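A Gaussian-weighted moving average, as in the *Unique filter* item, takes only a few lines. This Python sketch makes several assumptions of its own: the window is centered on each point (so it looks at neighbors on both sides), it is cut off at three standard deviations, the weights are renormalized at the ends of the series, and the test series is synthetic (a sine wave plus noise). Counting local maxima at increasing standard deviations shows peaks disappearing as the scale grows. With sampled weights, this illustrates the idea; it does not reproduce the 1986 uniqueness result.

```python
import math
import random


def gaussian_weights(sigma):
    """Sampled Gaussian weights over [-r, r], r = ceil(3 * sigma), summing to 1."""
    radius = max(1, math.ceil(3 * sigma))
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def smooth(signal, sigma):
    """Replace each point with a Gaussian-weighted average of its neighbors.

    The window is centered on the point. Near the ends of the series it is
    truncated and the remaining weights are renormalized, which is one of
    several possible edge treatments.
    """
    weights = gaussian_weights(sigma)
    radius = len(weights) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = norm = 0.0
        for k, w in enumerate(weights, start=-radius):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
                norm += w
        out.append(acc / norm)
    return out


def count_local_maxima(xs):
    """Number of interior points strictly higher than both neighbors."""
    return sum(1 for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] > xs[i + 1])


def noisy_signal(n=200, seed=1986):
    """Synthetic test series: a sine wave of period 50 plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.5) for t in range(n)]


if __name__ == "__main__":
    series = noisy_signal()
    for sigma in (0.5, 1, 2, 4, 8):
        print(f"sigma={sigma}: {count_local_maxima(smooth(series, sigma))} local maxima")
```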
The most egregious misuse of the Gaussian distribution is in the rank-and-yank approach to human resources, which forces bosses to rate their subordinates “on a curve.” Measuring several dimensions of people’s performance and examining their distributions might make sense, but mandating that grades be “normally distributed” is absurd.

By Michel Baudin • Data science • Tags: gauss, gaussian, measurement, measurement error, Normal distribution, scale-space filtering