# Gaussian (Normal) Distributions In Science

The Gaussian – also known as “Normal” – distribution is used and abused in many domains. In Manufacturing, this includes quality assurance, supply-chain management,  and human resources. This is the first in a series of posts aimed at understanding the range of applicability of this tool.

Googling uses of the normal distribution produces nearly 1 million results. Yet the top ones all ignore science, even when you narrow the query to physics, and this post attempts to remedy this. For example, the Gaussian distribution plays a central role in modeling Brownian motion, diffusion processes, heat transfer by conduction, the measurement of star positions, and the theory of gases.

These matter not just because the models are useful but also because they anchor this abstraction in physical phenomena that we can experience with no more equipment than is used in a Middle School science project. This post will not help you solve a shop floor problem by the end of the day, but I hope you will find it nonetheless enlightening.

# About The Gaussian (Normal) Distribution

If you search Wikipedia for the “Gaussian distribution,” it redirects you to the article on the Normal distribution, which contains a good summary of its definition and properties. The key elements to remember are the following:

• The formula for its probability density function (p.d.f.): f(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
• When you plot this distribution, you get a bell shape:

The p.d.f. is defined for all x \in \mathbb{R}, and depends on only two parameters, the expected value \mu and the standard deviation \sigma.  Also, not all bell shapes are Gaussian; in fact, the bell used as the featured image for this post has a flatter top than the Gaussian density.
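As a quick sanity check on the formula, here is a minimal Python sketch that codes the p.d.f. and integrates it numerically. The function names, the midpoint rule, and the values of \mu and \sigma are illustrative assumptions, not part of the original post. It should show a total area very close to 1 and roughly 68% of the probability within one standard deviation of the mean.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with expected value mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Illustrative parameter values (assumptions, not from the post).
mu, sigma = 2.0, 0.5

# Area under the curve over +/- 10 standard deviations.
total = integrate(lambda x: gaussian_pdf(x, mu, sigma),
                  mu - 10 * sigma, mu + 10 * sigma)

# Probability of falling within one standard deviation of the mean.
within_one_sigma = integrate(lambda x: gaussian_pdf(x, mu, sigma),
                             mu - sigma, mu + sigma)

print(f"total area     = {total:.6f}")
print(f"within 1 sigma = {within_one_sigma:.4f}")
```

Changing \mu only shifts the bell and changing \sigma only stretches it, so both numbers stay the same for any choice of the two parameters.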

# Gaussian Versus Normal

In an earlier post, I explained my reasons to call the distribution Gaussian rather than Normal. I just found out that Karl Pearson agreed with me 100 years ago, even though he was the first to call it “Normal.” He later realized it was a mistake:

“Many years ago I called the Laplace–Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal’.”  Pearson (1920)

His purpose was to avoid sterile controversy about who in what country came up with the idea, but he led others to believe it was more universally applicable than it really is, and that all other distributions are pathological deviations from it.

The Khan Academy lesson on the normal distribution, intended for High Schoolers, starts out with:

“Early statisticians noticed the same shape coming up over and over again in different distributions—so they named it the normal distribution.”

According to Karl Pearson, this was not the reason. Instead, naming it after Laplace upset the Germans and naming it after Gauss, the French.

“Normal” was neutral, and intended to allow the statistics community to stop bickering. By the late 20th century, however, even French publications routinely called it the “Gaussienne.”

# Brownian motion

Observing pollen in water under a microscope in 1827, botanist Robert Brown reported that the pollen grains were jiggling about in random patterns. Today, you don’t need to wait for spring and collect pollen. You can observe this phenomenon with 1\,\mu\text{m}-diameter polystyrene microspheres in deionized (DI) water, which you can buy online, as explained for teachers on YouTube:

It shows the movement as the agitation of water molecules knocking about the much larger microspheres in random directions. Observed at regular intervals, the path of a particle in Brownian motion may look like this, from Chemtalk:

Brownian motion did not escape Walter Shewhart’s attention. In his 1931 book Economic Control of Quality of Manufactured Product, he discusses it in multiple places, as an example of randomness in nature. He even included the following diagram, which he attributed to Jean Perrin:

## Mathematical Model of Brownian Motion

In the 20th century, physicists and mathematicians, including Albert Einstein and Norbert Wiener, built mathematical models of Brownian motion, according to which the increments of motion of a particle in a time interval follow the Gaussian distribution and increments in disjoint intervals are independent.

The key to the mathematical model of Brownian motion is stationary independent increments: the path of a particle is infinitely divisible into sums of independent, identically distributed segments with coordinates that have a zero mean and the same standard deviation.

In the previous picture, the path of the particle from its starting point at time 0 to its ending point at time t breaks down into a sequence of n segments through times 0 \leq t_1 \leq\dots\leq t_{n-1} \leq t. Both coordinates of each of these segments are random variables with 0 mean and standard deviations that we can make the same \sigma by spreading the t_i evenly, so that t_{i+1}-t_i = \frac{t}{n} for i = 0,\dots,n-1, with t_0 = 0 and t_n = t.

Then the variance of each coordinate of the particle’s position \mathbf{X}_t at time t is n\sigma^2 by the additivity of variances of independent variables. This suggests a model where this variance is of the form Dt for some positive constant D.

The coordinates of \mathbf{X}_t can be expressed as the sum of arbitrarily many independent and identically distributed variables with the same mean and standard deviation. The original Central Limit Theorem then suggests using a Gaussian distribution with 0 mean and variance Dt as a model for each coordinate of \mathbf{X}_t.
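The argument can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the steps are drawn from a uniform distribution, chosen deliberately because it is not Gaussian, and the values of D, t, n, and the number of paths are arbitrary. Each step has variance Dt/n, so the endpoint should have variance close to Dt, and about 68% of the endpoints should fall within \sqrt{Dt} of the origin, as a Gaussian would predict.

```python
import math
import random
import statistics


def endpoint(t, D, n, rng):
    """One coordinate of X_t as a sum of n i.i.d. zero-mean steps.

    Each step is uniform on [-a, a], which has variance a**2 / 3,
    so a is chosen to give each step the variance D * t / n.
    """
    a = math.sqrt(3.0 * D * t / n)
    return sum(rng.uniform(-a, a) for _ in range(n))


rng = random.Random(0)                   # fixed seed for repeatability
D, t, n, paths = 2.0, 3.0, 50, 5_000     # illustrative values (assumptions)

xs = [endpoint(t, D, n, rng) for _ in range(paths)]

sample_variance = statistics.pvariance(xs)
sd = math.sqrt(D * t)
share_within_one_sd = sum(abs(x) <= sd for x in xs) / paths

print(f"sample variance = {sample_variance:.3f}  (model: D*t = {D * t})")
print(f"share within one s.d. = {share_within_one_sd:.3f}  (Gaussian: about 0.683)")
```

Replacing the uniform steps with any other zero-mean distribution of the same variance should give similar numbers, which is the point of the Central Limit Theorem argument.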

In this model, the particle’s path between 0 and t is infinitely divisible into series of independent random segments, and each of these segments is likewise divisible, as shown below:

This structure makes the path what Benoît Mandelbrot would later call a fractal object.

## Scientific Theory of Brownian Motion

To turn this guess into a scientific theory of Brownian motion, you need to do two things:

1. Show that its math holds water; in other words, it doesn’t logically lead to absurd conclusions.
2. Confront it with Brownian motion as observed in physics.

If you are interested in the math, you will find it in Chapter 1 of J. Michael Harrison’s Brownian Motion and Stochastic Flow Systems (1985). The rest of the book connects it with networks involving random flows of goods. For the physics, see Michael Fowler’s Brownian Motion summary from 2002.

In the mathematical model, the particle’s path between 0 and t is infinitely divisible into series of independent random segments; the path of a grain of pollen bounced about by molecules of water is not. Between two consecutive collisions, it moves in a straight line, so that, if the t_i are the sequence of collision times, you cannot divide it further. This physical limit, however, is so far from the motion we can observe that it doesn’t invalidate the math.

# Diffusion

If, instead of following the motion of one particle in a medium, you insert N = 1,000,000,000 particles at one point into the liquid, you have diffusion. If we change the notation to \mathbf{X}_i\left ( t \right ) for the position of particle i at time t, then the \mathbf{X}_1\left ( t \right ),\dots,\mathbf{X}_N\left ( t \right ) are a sample of 1 billion Gaussians with 0 mean and variance Dt. Consequently, their concentration profile around the origin matches the Gaussian probability density function. With just 10,000 simulated points, the scatterplot looks as follows:

With 10,000,000 simulated points, we get the following heat map:

This was math. Switching to physics, we can look at the actual diffusion of a droplet of mint syrup in a thin layer of water:

## The Equation of Diffusion

In 1855, decades before anyone worked out the math of Brownian motion, Adolf Fick developed the equation of diffusion that treats the concentration of the diffusing material like a continuous variable. This concentration can be interpreted as an expected value.

Fick started from the premise that the diffusing material flows from high- to low-concentration zones, at a rate proportional to the concentration gradient. In other words, it rolls down a concentration hill in proportion to how steep it is.

The flows in and out of the interval \left [ x, x+dx\right ] are therefore

\mathbf{J}\left ( x,t \right ) = -D\frac{\partial c}{\partial x}\bigg|_{x, t}

and

\mathbf{J}\left ( x+dx,t \right ) = -D\frac{\partial c}{\partial x}\bigg|_{x+dx, t}

and the net change in concentration in the infinitesimal box between x and x+dx between times t and t+dt is

\frac{\partial c}{\partial t}\bigg|_{x, t} dxdt = \left [\mathbf{J}\left ( x,t \right )- \mathbf{J}\left ( x + dx,t \right ) \right ]dt = D\frac{\partial^2 c}{\partial x^2}\bigg|_{x, t} dxdt

In other words:

\frac{\partial c}{\partial t} = D\frac{\partial^2 c}{\partial x^2}

which is the basic equation of diffusion in one dimension. With the diffusion coefficient D being the same in every direction, this extends to three dimensions as

\frac{\partial c}{\partial t} = D\left [\frac{\partial^2 c}{\partial x^2}+\frac{\partial^2 c}{\partial y^2} +\frac{\partial^2 c}{\partial z^2}\right ] = D\Delta c

where \Delta c is known as the Laplacian of c.
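To see what the one-dimensional equation does to a concentrated blob, here is a minimal Python sketch that steps it forward with an explicit finite-difference scheme. The scheme, the grid, the time step, and the value of D are all assumptions chosen for illustration; they are not taken from Fick or from the post. Starting from a unit mass at the origin, it tracks the variance of the concentration profile and the height of its peak.

```python
def diffuse(c, D, dx, dt, steps):
    """Advance dc/dt = D * d2c/dx2 by `steps` explicit time steps.

    The concentration is held at 0 at both ends of the grid. The explicit
    scheme is only stable when D * dt / dx**2 <= 0.5.
    """
    r = D * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable: need D * dt / dx**2 <= 0.5")
    c = list(c)
    for _ in range(steps):
        inner = [c[i] + r * (c[i - 1] - 2.0 * c[i] + c[i + 1])
                 for i in range(1, len(c) - 1)]
        c = [0.0] + inner + [0.0]
    return c


def profile_variance(c, xs):
    """Variance of position, weighting each grid point by its concentration."""
    mass = sum(c)
    mean = sum(x * ci for x, ci in zip(xs, c)) / mass
    return sum((x - mean) ** 2 * ci for x, ci in zip(xs, c)) / mass


# Illustrative grid and coefficient (assumptions).
D, dx, dt = 1.0, 0.1, 0.004
xs = [(i - 200) * dx for i in range(401)]   # grid from -20 to 20
c = [0.0] * len(xs)
c[200] = 1.0 / dx                           # unit mass inserted at the origin

results = []   # (time, variance of the profile, peak concentration)
elapsed = 0
for target in (250, 500, 1000):
    c = diffuse(c, D, dx, dt, target - elapsed)
    elapsed = target
    t = elapsed * dt
    results.append((t, profile_variance(c, xs), max(c)))
    print(f"t = {t:.1f}  variance = {results[-1][1]:.4f}  peak = {results[-1][2]:.4f}")
```

With D as it appears in Fick's equation, the variance of the profile grows linearly in time at the rate 2D, while the peak keeps dropping.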

## The Gaussian p.d.f. as Solution

The first surprise is that a Gaussian p.d.f. with mean 0 and a variance proportional to t solves the equation of diffusion. With D defined as in Fick's equation above, that variance is 2Dt, so the constant in the variance of the Brownian model is twice Fick's coefficient. It’s a tedious calculation but, by taking partial derivatives, you can verify that

c\left ( x,t \right ) = \frac{k}{\sqrt{Dt}}e^{-\frac{x^2}{4Dt}}

is a solution of the equation of diffusion in one dimension, and that

c\left ( x,y,z,t \right ) = \frac{k}{\left ( Dt \right )^{3/2}}e^{-\frac{x^2+y^2+z^2}{4Dt}}

is a solution in three dimensions, where you usually observe diffusion.

This says that the theory of Brownian motion as developed in the 20th century is consistent with the equations of diffusion Fick worked out half a century earlier.
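For readers who want to see the calculation rather than take it on faith, here is a sketch of the one-dimensional case. It writes the profile as the Gaussian density with mean 0 and variance 2Dt, which is the normalization that matches the coefficient D in Fick's equation as written above; the three-dimensional case works the same way, coordinate by coordinate.

```latex
c(x,t) = \frac{1}{\sqrt{4\pi D t}}\, e^{-\frac{x^2}{4Dt}}

\frac{\partial c}{\partial x} = -\frac{x}{2Dt}\, c
\qquad
\frac{\partial^2 c}{\partial x^2}
  = \left( \frac{x^2}{4D^2t^2} - \frac{1}{2Dt} \right) c

\frac{\partial c}{\partial t}
  = \left( -\frac{1}{2t} + \frac{x^2}{4Dt^2} \right) c
  = D \left( \frac{x^2}{4D^2t^2} - \frac{1}{2Dt} \right) c
  = D\, \frac{\partial^2 c}{\partial x^2}
```

In the time derivative, the term -1/(2t) comes from the prefactor and the term x^2/(4Dt^2) from the exponent.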

Any linear combination of Gaussian p.d.f.’s with different means is also a solution, as is its convolution with any function representing the initial concentration at t=0, which gives you the general solution of the equation of diffusion.

If the particles are inserted at one point, the concentration of these particles in the medium after a time t is the Gaussian probability density function with a standard deviation proportional to \sqrt{t}.

If they are inserted over a surface, then the concentration profile after t is the sum of the profiles generated at each point in the surface – that is, the convolution of the initial condition with the Gaussian p.d.f.

## Eroding Peaks and Filling Valleys

As t increases, diffusion erodes the peaks and fills up the valleys of concentration:

Concentration does not oscillate or propagate as a wave, as that would require peaks and valleys to go up and down. Remarkably, the only difference between the equations of diffusion and waves is a first derivative versus a second derivative with respect to time on the left side. If the equation were

\frac{\partial^2 c}{\partial t^2} = D \Delta c

c would have waves of concentration traveling at the speed of \sqrt{D}, and the Gaussian p.d.f. would not be a solution.

In the early 1980s, researcher Andrew Witkin developed Scale-Space Filtering, which involved smoothing a signal by convolving it with Gaussian p.d.f.s of increasing scale. A key motivation to use this particular function was that increasing the scale always eroded the peaks and filled the valleys.

Witkin conjectured that the Gaussian p.d.f. was the only function with this property. Retired Schlumberger executive Jean Babaud proved it by designing a signal that makes every other possible function fail. Richard O. Duda and I then shaped this proof into a paper published in 1986 that is still being cited.

The method is now used in Kernel Density Estimation, a successor to histograms for visualizing distributions from data.
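The erosion of peaks and filling of valleys is easy to observe on a sampled signal. The sketch below is not Witkin's code: the test signal, the kernel truncation at four standard deviations, the edge handling, and all names are assumptions made for illustration. It smooths the signal repeatedly with the same small Gaussian kernel, which approximates a single Gaussian of growing scale because variances add under convolution, and it records the highest peak, the lowest valley, and the number of local extrema after each pass.

```python
import math


def gaussian_kernel(sigma):
    """Sampled Gaussian weights, truncated at 4 sigma and normalized to sum to 1."""
    radius = max(1, int(4 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian kernel, repeating the edge values."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)   # clamp index at the edges
            acc += w * signal[j]
        out.append(acc)
    return out


def count_extrema(signal):
    """Number of strict interior local maxima and minima."""
    return sum(
        (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0
        for i in range(1, len(signal) - 1)
    )


# Illustrative signal: a slow oscillation plus two faster ones (assumption).
signal = [math.sin(0.2 * i) + 0.5 * math.sin(1.3 * i) + 0.3 * math.sin(2.9 * i)
          for i in range(200)]

maxima, minima, extrema = [max(signal)], [min(signal)], [count_extrema(signal)]
current = signal
for _ in range(5):
    current = smooth(current, sigma=1.5)   # each pass increases the overall scale
    maxima.append(max(current))
    minima.append(min(current))
    extrema.append(count_extrema(current))

for level, (hi, lo, n_ext) in enumerate(zip(maxima, minima, extrema)):
    print(f"pass {level}: max = {hi:.3f}  min = {lo:.3f}  extrema = {n_ext}")
```

Because each pass is a weighted average with non-negative weights, the highest peak can only come down and the lowest valley can only come up, and the fast oscillations disappear first.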

## Solid-State Diffusion

Diffusion occurs in solids as well as liquids, mostly at elevated temperatures, when the agitation of the atoms in the solid creates openings for foreign particles to sneak into the solid structure. Depending on the structure of the solid, the diffusion coefficient may not be the same in all directions.

Solid-state diffusion is a topic in physical metallurgy and materials science, and the diffusion of dopants into silicon crystals is a key process in semiconductor fabrication. You can see an animation of this process on PVEducation. Andy S. Grove, in the 1960s, invented many of the processes still used today to make integrated circuits and dedicated 50 pages of his Physics and Technology of Semiconductor Devices to solid-state diffusion.

## The Present, the Future, and Markov

With a given diffusion coefficient D, if you know the current concentration profile, you can calculate its future. It makes no difference how the current profile emerged. In other words, conditionally on the present, the future does not depend on the past.

This is called the Markov property, and it simplifies the math of many systems. Logically, the game of chess is Markovian, in that the state of the board contains all the information a player needs to figure out the next best move. Actual chess, however, is not Markovian, because players look for the best move to defeat their opponent, not the logically optimal next move. They study each other’s past games, which means that they use historical information that is not on the board.

## Chromatogram Peaks

The following picture is from an article in News Medical Life Sciences that provides advice on presenting gas chromatography and mass spectrometry results. The peaks represent additives mixed in food separated in a chromatography column and arriving at different times at a detector that identifies them by mass spectrometry.

The peaks look like narrow Gaussian p.d.f.’s as a function of time, with a height that varies with the amount of the additive found in the food. According to the literature, the Gaussian model fits the peaks, a phenomenon it attributes to diffusion occurring inside the column.

# Heat Transfer by Conduction

When you make contact between two bodies at different temperatures, heat flows from the hot one to the cold one by conduction. Heat transfer also takes other forms, but this one is governed by the same equation as diffusion, resulting in temperature profiles that evolve over time as the sum of Gaussians from each point of contact.

Joseph Fourier developed the theory of heat transfer by conduction 200 years ago, more than 30 years ahead of Fick with diffusion. He treated heat as a fluid going down a temperature gradient at a rate determined by a coefficient that is specific to the medium. The equation is as follows:

\frac{\partial u}{\partial t}=\alpha \Delta u

where u(x,y,z,t) is the temperature at point (x,y,z) at time t and \alpha is the medium’s diffusivity. It is formally identical to the equation of diffusion, and the Gaussian p.d.f. plays the same role in the solution. As for diffusion, this equation establishes that conduction produces no heat waves. The phenomena we call “heat waves” are not waves in the sense of water ripples, sound, or radio-frequency waves.

# Star Positions and Gas Velocities

The Gaussian distribution shows up in other unexpected places, like measurement errors in the locations of stars in the sky or the apparently unrelated velocities of gas molecules. The only connection between the two is that, ten years apart, two scientists used the same simple math to establish its relevance in both contexts.

## Errors in Measurements of Star Positions

As recounted by E.T. Jaynes, in 1850, astronomer John Herschel arrived at the Gaussian distribution from just two assumptions: the errors in latitude x and longitude y should be independent, and the probability of a given error should depend only on its distance from the true point. The math proving that these assumptions are sufficient is astonishingly simple. Let’s take the true position of the star as the origin.

The distributions of both coordinates must be identical, and therefore their joint p.d.f. is of the form f(x)f(y). Since it must depend only on the distance to the true value, it is also of the form g\left (\sqrt{x^2 + y^2}  \right ), with f(x)f(y)= g\left (\sqrt{x^2 + y^2}  \right ). Setting y = 0 gives g(x) = f(x)f(0) for x \geq 0, so that, for any real x and y, dividing both sides by f(0)^2 and taking logarithms yields

\log \frac{f(x)}{f(0)} + \log \frac{f(y)}{f(0)} = \log \frac{f\left (\sqrt{x^2 + y^2}  \right )}{f(0)}

If we set v(x^2)= \log \frac{f(x)}{f(0)}, this equation becomes

v(x^2) +v(y^2) = v(x^2 +y^2 )

from which we deduce that v(x^2) = \alpha x^2 for some constant \alpha and

f(x) = f(0) e^{\alpha x^2}

which, with \alpha = -\frac{1}{2\sigma^2} and f(0) = \frac{1}{\sigma \sqrt{2 \pi}} set by normalization, gives you the Gaussian distribution.
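Herschel's two assumptions can be probed numerically. The following Python sketch evaluates the joint density f(x)f(y) of two independent coordinates at points spread around a circle and measures how much it varies. The Laplace density is included only as a contrasting example of my own choosing, and the radius and angles are arbitrary. For the Gaussian, the joint density should be constant on the circle up to rounding; for the Laplace density, it should not.

```python
import math


def gauss(x, sigma=1.0):
    """Gaussian density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace(x, b=1.0):
    """Laplace (double exponential) density, used here as a counterexample."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def joint_on_circle(f, r, angles):
    """Joint density f(x) * f(y) of independent coordinates at radius r."""
    return [f(r * math.cos(a)) * f(r * math.sin(a)) for a in angles]


angles = [k * math.pi / 8 for k in range(16)]   # 16 points around the circle
radius = 1.5                                    # illustrative value (assumption)

gauss_values = joint_on_circle(gauss, radius, angles)
laplace_values = joint_on_circle(laplace, radius, angles)

gauss_spread = max(gauss_values) - min(gauss_values)
laplace_spread = max(laplace_values) - min(laplace_values)

print(f"Gaussian: spread of joint density on the circle = {gauss_spread:.2e}")
print(f"Laplace:  spread of joint density on the circle = {laplace_spread:.2e}")
```

Independent coordinates with any other marginal density give contours that are not circles, which is what the functional equation above rules out.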

## Distribution of Gas Molecule Velocities

Ten years later, James Clerk Maxwell applied the same argument in three dimensions to establish that the velocities \mathbf{v} of the molecules in a gas follow the Gaussian distribution:

f\left (\mathbf{v} \right ) dv_xdv_ydv_z=\left[\frac{m}{2 \pi k T}\right]^{\frac{3}{2}} \exp \left[-\frac{m \left ( v_x^2 + v_y^2 +v_z^2 \right )}{2 k T}\right] dv_xdv_ydv_z

where m is the mass of the molecule, T the absolute temperature, and k Boltzmann’s constant.

When you use the spherical symmetry to get the distribution g of the speed v=\sqrt{v_x^2 + v_y^2 +v_z^2}, it gives you the Maxwell-Boltzmann distribution:

g(v)=\left[\frac{m}{2 \pi k T}\right]^{\frac{3}{2}} 4 \pi v^2 \exp \left[-\frac{m v^2}{2 k T}\right]
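The link between the two formulas can be checked by simulation. In the Python sketch below, each velocity component is drawn from a Gaussian with mean 0 and variance kT/m, as the exponent in Maxwell's formula implies, and the resulting speeds are compared with the mean speed \sqrt{8kT/(\pi m)} of the Maxwell-Boltzmann distribution. The molecular mass is an approximate value for a nitrogen molecule, and the temperature and sample size are arbitrary; all three are assumptions for illustration.

```python
import math
import random

K_B = 1.380649e-23    # Boltzmann's constant in J/K
MASS = 4.65e-26       # approximate mass of an N2 molecule in kg (assumption)
TEMPERATURE = 300.0   # absolute temperature in K (assumption)

rng = random.Random(1)   # fixed seed for repeatability

# Standard deviation of each velocity component, from exp(-m v^2 / (2 k T)).
component_sd = math.sqrt(K_B * TEMPERATURE / MASS)


def sample_speed():
    """Speed of one molecule from three independent Gaussian components."""
    vx, vy, vz = (rng.gauss(0.0, component_sd) for _ in range(3))
    return math.sqrt(vx * vx + vy * vy + vz * vz)


speeds = [sample_speed() for _ in range(20_000)]
mean_speed = sum(speeds) / len(speeds)

# Mean of the Maxwell-Boltzmann speed distribution g(v).
theoretical_mean = math.sqrt(8.0 * K_B * TEMPERATURE / (math.pi * MASS))

print(f"simulated mean speed   = {mean_speed:.1f} m/s")
print(f"theoretical mean speed = {theoretical_mean:.1f} m/s")
```

Each component is Gaussian and symmetric around 0, but the speed is never negative, which is why its distribution carries the extra factor 4 \pi v^2 and is skewed.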