Don Wheeler’s Understanding Variation starts with a chapter entitled “Data are random and miscellaneous” that contains no discussion of any part of its title. Implicit in Wheeler’s book, however, is the view that data consists of tables of numbers, representing either measured variables — lengths, weights, densities,… — or event occurrence counts — defective units, defects, machine failures,…
I have often quoted computer scientist Don Knuth on this subject: data is “the stuff that’s input or output,” meaning anything that can be read or written, which includes much more than tables of numbers. The data we work with today includes, for example, the following:
- Unstructured text, like 25,000 incident reports about problems with jet engines, written by maintenance techs all over the world in their own versions of English, or thousands of product reviews posted by consumers on e-commerce sites.
- Images, like photographs of visual defects on products, or electron-microscope images of integrated circuits.
- Video recordings of operations.
Analyzing data about a manufacturing process today means extracting information from all of these sources. The state of the art, based on automatic data acquisition and databases, includes analytical techniques that were unthinkable in Shewhart’s day, known under the labels of data science, data mining, or machine learning.