Data mining, in general, is the retrieval of information from data collected for a different purpose, such as using sales transaction histories to infer what products tend to be bought together. By contrast, design of experiments involves the collection of observations for the purpose of confirming or refuting hypotheses.
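As a minimal sketch of the sales example, the snippet below counts how often each pair of products appears in the same transaction. The baskets are invented for illustration, and `pair_counts` is a hypothetical helper. Real basket analysis would also normalize for how popular each product is on its own.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "jam"},
    {"beer", "chips", "bread"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "jam").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The transactions were recorded to bill customers, not to study buying habits, which is what makes this a mining exercise rather than an experiment.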
This perspective on data mining is consistent with the literature in expressing purpose, but most authors go further. They include in their definitions that data mining is done with computers, using large databases and specific analytical tools, which I think is too restrictive. The tools they list are the ones they have found useful in analyzing the behavior of millions of users of search engines or commerce websites, and they are not obviously applicable in other areas, such as manufacturing.
During World War II, British analysts used the serial numbers of captured or destroyed German tanks to estimate the numbers produced. Because the serial numbers had not been attached for this purpose, this was data mining. The analysis used clever statistical models but, obviously, no computers.
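The idea behind the tank estimate fits in a few lines. The sketch below uses the textbook estimator for this kind of problem, N ≈ m + m/k − 1, where m is the largest serial number observed and k is the number of tanks observed. It assumes serial numbers run consecutively from 1 and that the observed tanks are a random sample, which is a simplification and not necessarily the model the wartime analysts used. The function name and the sample serial numbers are hypothetical.

```python
def estimate_total(serials):
    """Estimate the size of a population numbered 1..N from a sample.

    Assumes distinct serial numbers drawn at random from 1..N.
    Uses the textbook estimator m + m/k - 1.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one serial number")
    if len(set(serials)) != k:
        raise ValueError("serial numbers must be distinct")
    m = max(serials)
    # The largest serial seen, plus the average gap between observed serials.
    return m + m / k - 1


# Hypothetical sample of four serial numbers, invented for illustration.
sample = [19, 40, 42, 60]
print(estimate_total(sample))  # 74.0
```

The arithmetic is light enough to do by hand, which is the point: the value came from the model, not from computing power.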
Today, PhD-level data miners at Google, eBay, or Amazon sift through the page views and click-throughs of millions of users for clues to patterns they can use. The data is collected automatically, is accurate, and accumulates by the terabyte every day. This “big data” requires parallel processing on clusters of computers and lends itself to the most advanced analytical tools ever developed.
Compared to this fire hose of data, what manufacturing produces is a trickle. In a factory, the master data/technical specs, the plans and schedules, the status of operations and work in process, and the history of production over, say, 12 months usually add up to a few gigabytes. That doesn’t fit on one spreadsheet, but it often fits on a memory stick. On the other hand, much of it is still manually generated and therefore contains errors, and it is often structured in ways that make it difficult to work with.
Even if manufacturing companies could hire the data miners away from their current jobs, their experience with e-commerce or web search would not have prepared them well for the different challenges of manufacturing data mining.
There is an opportunity for data mining to contribute to competitiveness in manufacturing, but the approach must start from the needs of manufacturing. It must not be an e-commerce cure in search of manufacturing diseases.