Michel Baudin's Blog
Ideas from manufacturing operations

Oct 12 2022

Musings on Large Numbers

Anyone who has taken an introductory course in probability, or even SPC, has heard of the law of large numbers. It’s a powerful result from probability theory, and, perhaps, the most widely used. Wikipedia starts its article on this topic with a statement that is free of caveats or restrictions:

In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

This is how the literature describes it and how most professionals understand it. Buried in the fine print of the Wikipedia article, however, you find conditions for this law to apply. First, we discuss the difference between sample averages and expected values, both of which we often call “mean.” Then we consider applications of the law of large numbers in cases ranging from SPC to statistical physics. Finally, we zoom in on a simple case, the Cauchy distribution: it emerges easily from experimental data, yet the law of large numbers does not apply to it.
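The contrast is easy to see in a simulation. The sketch below (assuming NumPy; the sample sizes and seed are arbitrary choices for illustration) tracks the running mean of Gaussian samples, which settles near the expected value, against the running mean of Cauchy samples, which never settles because the Cauchy distribution has no expected value:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Gaussian samples: the running mean converges toward the expected value (0)
gauss = rng.normal(loc=0.0, scale=1.0, size=n)
gauss_running_mean = np.cumsum(gauss) / np.arange(1, n + 1)

# Cauchy samples: the distribution has no expected value, so the
# running mean keeps jumping no matter how many trials you add
cauchy = rng.standard_cauchy(size=n)
cauchy_running_mean = np.cumsum(cauchy) / np.arange(1, n + 1)

print("Gaussian running mean at 1K, 10K, 100K trials:",
      gauss_running_mean[[999, 9_999, 99_999]])
print("Cauchy running mean at 1K, 10K, 100K trials:",
      cauchy_running_mean[[999, 9_999, 99_999]])
```

Rerunning with different seeds makes the point even more vividly: the Gaussian checkpoints always shrink toward 0, while the Cauchy checkpoints land somewhere different every time.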

Continue reading…


By Michel Baudin • Laws of nature • 1


Oct 4 2022

Strange Statements About Probability Models | Don Wheeler | Quality Digest

In his latest column in Quality Digest, Don Wheeler wrote the following blanket statements, free of any caveat:

  1. “Probability models are built on the assumption that the data can be thought of as observations from a set of random variables that are independent and identically distributed.”
  2. “In the end, which probability model you may fit to your data hardly matters. It is an exercise that serves no practical purpose.”

Source: Wheeler, D. (2022) Converting Capabilities, What difference does the probability model make? Quality Digest

Michel Baudin’s comments:

Not all models assume i.i.d. variables

Wheeler’s first statement might have applied 100 years ago. Today, however, there are many models in probability that are not based on the assumption that data are “observations from a set of random variables that are independent and identically distributed”:

  • ARIMA models for time series are used, for example, in forecasting beer sales.
  • Epidemiologists use models that assume existing infections cause new ones; therefore, counts for successive periods are not independent.
  • The spatial data analysis tools used in mining and oil exploration assume that an observation at any point informs you about its neighborhood. The analysts don’t assume that observations at different points are independent.
  • The probability models used to locate a wreck on the ocean floor, find a needle in a haystack, and other similar search problems have nothing to do with series of independent and identically distributed observations.
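The simplest member of the ARIMA family already breaks the i.i.d. assumption. The sketch below (assuming NumPy; the coefficient and sample size are arbitrary) simulates an AR(1) series, where each value carries over most of the previous one, and checks that successive observations are strongly correlated — nowhere near the zero correlation that independence implies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
phi = 0.8  # each observation carries 80% of the previous one

# AR(1): x[t] = phi * x[t-1] + noise -- the simplest non-i.i.d. model,
# and a building block of the ARIMA family
x = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

# Lag-1 autocorrelation: close to phi, not to the 0 that i.i.d. would give
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f}")
```

Treating such a series as i.i.d. — for example, by putting the raw values on a control chart — would produce limits that are far too tight and flag fluctuations as signals.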

Probability Models Are Useful

In his second statement, Wheeler seems determined to deter engineers and managers from studying probability. If a prominent statistician tells them it serves no useful purpose, why bother? It is particularly odd when you consider that Wheeler’s beloved XmR/Process Behavior charts use control limits based on a model of observations as the sum of a constant and Gaussian white noise.

Probability models have many useful purposes. They can keep you from pursuing special causes for mere fluctuations and help you find the root causes of actual problems. They also help you plan your supply chain and dimension your production lines.
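The XmR chart itself illustrates the point: its standard limit formula, center ± 2.66 × average moving range, is only meaningful because the 2.66 factor converts the average moving range into a 3-sigma estimate under the constant-plus-Gaussian-noise model. A minimal sketch (assuming NumPy; the sample data are made up):

```python
import numpy as np

def xmr_limits(data):
    """Natural process limits for an XmR (individuals) chart.

    The 2.66 factor is 3/d2, with d2 = 1.128 for moving ranges of
    size 2: it turns the average moving range into an estimate of
    3 sigma under the constant-plus-Gaussian-noise model.
    """
    data = np.asarray(data, dtype=float)
    center = data.mean()
    avg_moving_range = np.abs(np.diff(data)).mean()
    return (center - 2.66 * avg_moving_range,
            center,
            center + 2.66 * avg_moving_range)

lcl, center, ucl = xmr_limits([10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7])
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
```

Without the underlying probability model, 2.66 is just a magic number and there is no basis for reading a point outside the limits as a signal.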

Histograms are Old-Hat; Use KDE Instead

As Wheeler also says, “Many people have been taught that the first step in the statistical inquisition of their data is to fit some probability model to the histogram.” It’s time to learn something new that takes advantage of IT developments since Karl Pearson invented the histogram in 1891.

Fitting models to a sample of 250 points based on a histogram is old hat. A small dataset today is more like 30,000 points, and you visualize its distribution with kernel density estimation (KDE), not histograms.
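A short sketch of the idea (assuming NumPy and SciPy; the bimodal sample is synthetic, chosen because coarse histogram bins can smear out exactly this kind of structure):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# A 30,000-point sample from a two-peaked mixture
data = np.concatenate([rng.normal(-2, 0.5, 15_000),
                       rng.normal(2, 0.5, 15_000)])

kde = gaussian_kde(data)          # bandwidth set by Scott's rule by default
grid = np.linspace(-5, 5, 201)
density = kde(grid)               # smooth density estimate, ready to plot

# Both modes stand out: the density near -2 and +2 dwarfs the valley at 0
print("density at -2, 0, +2:", kde([-2.0, 0.0, 2.0]))
```

Unlike a histogram, the KDE curve does not depend on an arbitrary choice of bin edges, and with samples this large the smoothing introduces negligible distortion.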

#donwheeler, #probability, #quality


By Michel Baudin • Press clippings • 8 • Tags: Don Wheeler, Probability, Quality


Jul 18 2022

The Most Basic Problem in Quality

Two groups of parts are supposed to be identical in quality: they have the same item number and are made to the same specs, at different times in the same production lines, at the same time in different lines, or by different suppliers.

One group may be larger than the other, and both may contain defectives. Is the difference in fraction defective between the two groups a fluctuation, or does it have a cause you need to investigate? It’s as basic a question as it gets, but it’s a real problem, with solutions that aren’t quite as obvious as one might expect. We review several methods that have evolved over the years with information technology.
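One standard answer, among the methods in this post's tags, is Fisher's exact test. A sketch with made-up counts (assuming SciPy):

```python
from scipy.stats import fisher_exact

# Group A: 12 defectives out of 1,000 parts (1.2%)
# Group B: 25 defectives out of 1,500 parts (~1.7%)
# Fluctuation, or a difference worth investigating?
table = [[12, 1000 - 12],
         [25, 1500 - 25]]

odds_ratio, p_value = fisher_exact(table)
print(f"p-value: {p_value:.3f}")
```

A large p-value means counts like these arise routinely from two groups with the same underlying fraction defective, so there is no case yet for an investigation; a small one says the difference is unlikely to be a fluctuation.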

Continue reading…


By Michel Baudin • Data science • 0 • Tags: A/B testing, Barnard's Test, Binomial Probability Paper, Fisher's Test, Incoming QA, Z-test


Jun 30 2022

A Kaizen Case Study

This is the start of a new section of this blog, about case studies. The stories do not have to be extraordinary, but they have to be real, from factories large and small. The Japanese example below is a manga. It’s a difficult art, and I am not expecting anyone to submit cases in this form. An infographic showing before and after states, methods used, and results achieved would be plenty. I will then format it for this blog and post it in this category.

Continue reading…


By Michel Baudin • Case studies • 0


Jun 12 2022

Perspectives On Probability In Operations

The spirited discussions on LinkedIn about whether probabilities are relative frequencies or quantifications of beliefs are guaranteed to baffle practitioners. They come up in threads about manufacturing quality, supply-chain management, and public health, and do not generate much light. Their participants trade barbs without much civility and without actually engaging on substance.

The latest one, by Alexander von Felbert, is among the more thoughtful, and therefore unlikely to inspire rants. I do, however, fault it for using words like “aleatory” or “epistemic” that I don’t think are helpful. I am trying to discuss it here in everyday language and to apply the concepts to numerically specific cases, with an eye to operations.

While there are genuinely great and not-so-great ideas, the root of the most violent disagreements is elsewhere, with individuals generalizing from different experience bases. You may map probability to reality differently depending on whether you are developing drugs in the pharmaceutical industry, enhancing yield in a semiconductor process, or driving down dppms in auto parts. The math doesn’t care as long as you follow its rules, and it doesn’t invalidate other interpretations.

Continue reading…


By Michel Baudin • Data science • 0 • Tags: Bayesian Statistics, data science, Probability, statistics


Apr 27 2022

Flow

In his latest post on AllAboutLean, Christoph Roser compares the flow of materials in a factory with the flow of traffic on roads. About flow, he asks “But what is it?” but stops short of giving an answer. I also wrote many posts about flow without ever bothering to answer that question. It seemed so obvious and self-explanatory that it didn’t require defining but, perhaps, it does.

Continue reading…


By Michel Baudin • Uncategorized • 4 • Tags: Data Flow, Flow, Flow line, Information Flow, Job shop, Material Flow





© Michel Baudin's Blog 2025