Oct 12 2022

## Musings on Large Numbers

Anyone who has taken an introductory course in probability, or even SPC, has heard of the law of large numbers. It’s a powerful result from probability theory and perhaps the most widely used. Wikipedia opens its article on the topic with a statement free of any caveats or restrictions:

> In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

This is how the literature describes it and how most professionals understand it. Buried in the fine print of the Wikipedia article, however, you find conditions for this law to apply. In this post, we first discuss the differences between sample averages and expected values, both of which we often call “mean.” Then we consider applications of the law of large numbers in cases ranging from SPC to statistical physics. Finally, we zoom in on a simple case, the Cauchy distribution. It easily emerges from experimental data, and the law of large numbers does *not* apply to it.
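The contrast between a distribution that obeys the law and one that does not can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the post: the function names, the seed, and the sample size are arbitrary choices. It compares running averages of standard Gaussian draws, which have an expected value of 0, with running averages of standard Cauchy draws, which have no expected value at all.

```python
import math
import random


def running_means(samples):
    """Return the running average after each successive sample."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means


def standard_cauchy(rng):
    """Draw a standard Cauchy variate as tan(pi * (U - 1/2)), with U uniform on [0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(2022)  # arbitrary seed, chosen only for reproducibility
n = 100_000                # arbitrary sample size

gauss_means = running_means(rng.gauss(0.0, 1.0) for _ in range(n))
cauchy_means = running_means(standard_cauchy(rng) for _ in range(n))

# Print both running averages at a few checkpoints for comparison.
for k in (100, 1_000, 10_000, 100_000):
    print(f"n={k:>7}  Gaussian mean={gauss_means[k - 1]:+.4f}  "
          f"Cauchy mean={cauchy_means[k - 1]:+.4f}")
```

The Gaussian column should settle near 0 as n grows. The Cauchy column has no reason to settle: the average of n independent standard Cauchy variables is itself standard Cauchy, so it is exactly as dispersed as a single draw, however large n is.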

Nov 7 2022

## Analyzing Variation with Histograms, KDE, and the Bootstrap

Assume you have a dataset that is a clean sample of a measured variable. It could be a critical dimension of a product, delivery lead times from a supplier, or environmental characteristics like temperature and humidity. How do you make it talk about the variable’s distribution? This post explores this challenge in the simple case of 1-dimensional data. I have used methods from histograms to KDE and the Bootstrap, varying in vintage from the 1890s to the 1980s.

Other methods for the same purpose were surely invented between 1895 and 1960, or since 1979, that I don’t know about or haven’t used. Readers are welcome to point them out.

The ones discussed here are not black boxes, automatically producing answers from a stream of data. All require a human to tune the settings of the tools. And this human needs to know the back story of the data.
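To make the tuning concrete, here is a minimal sketch of the three tools in plain Python. It is an illustration under assumptions, not the code behind the post: the function names are hypothetical, the data are simulated, and the bin counts, bandwidth, and number of resamples are arbitrary values of the kind a human would have to choose.

```python
import math
import random
import statistics


def histogram(data, bins):
    """Count observations in `bins` equal-width intervals spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts


def gaussian_kde(data, bandwidth):
    """Return a density estimate built from Gaussian kernels of width `bandwidth`."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return density


def bootstrap_ci(data, stat, resamples, rng, level=0.95):
    """Percentile bootstrap interval for `stat`, resampling the data with replacement."""
    values = sorted(stat(rng.choices(data, k=len(data))) for _ in range(resamples))
    tail = (1 - level) / 2
    return values[int(tail * resamples)], values[int((1 - tail) * resamples) - 1]


rng = random.Random(7)                             # arbitrary seed
data = [rng.gauss(10.0, 0.2) for _ in range(200)]  # simulated measurements, not real data

# Setting 1: the number of bins changes the picture a histogram gives.
print(histogram(data, bins=8))
print(histogram(data, bins=20))

# Setting 2: the bandwidth controls how smooth the KDE curve is.
kde = gaussian_kde(data, bandwidth=0.08)
print(kde(10.0))

# Setting 3: the statistic, the number of resamples, and the confidence level.
print(bootstrap_ci(data, statistics.median, resamples=1000, rng=rng))
```

None of these settings comes from the data alone. Rerunning the sketch with different bin counts or bandwidths shows how much the apparent shape of the distribution depends on them.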


By Michel Baudin • Data science • 2 • Tags: Histogram, KDE, Kernel Density Estimator, Process capability