Nov 7 2022
Assume you have a dataset that is a clean sample of a measured variable. It could be a critical dimension of a product, delivery lead times from a supplier, or environmental characteristics like temperature and humidity. How do you make it talk about the variable’s distribution? This post explores this challenge in the simple case of 1-dimensional data. I have used methods from histograms to KDE and the Bootstrap, varying in vintage from the 1890s to the 1980s:
Other methods were surely invented for the same purpose between 1895 and 1960 or since 1979, that I don’t know about or haven’t used. Readers are welcome to point them out.
The ones discussed here are not black boxes, automatically producing answers from a stream of data. All require a human to tune the settings of the tools. And this human needs to know the back story of the data.