Data Mining/Machine-Learning Tools In Manufacturing

This post elaborates on the Analyzing The Data section of the previous post. For each tool on a list of those used for “data mining” or “machine learning,” I researched who invented it, when, and for what purpose, and what applications it has had in manufacturing. The table below summarizes my findings.

I am, however, not satisfied with the level of applications I found and would like to crowdsource more. If you have made these or other tools useful in your own manufacturing environment, please share whatever information you can about your applications in the survey that follows.

Data Mining/Machine Learning Tools

For each tool, grouped by category, the table provides the following information:

  1. The year the tool was invented. This matters because of the tools’ relationship with the information technology available at the time.
  2. Who invented it. This points to the technical and social context of the invention, beyond just information technology. Sometimes we know about it from publications by the inventors or others and sometimes we don’t. And when the inventors are still alive and active, we can ask them.
  3. What problem the inventors were trying to solve. When you are looking at six different tools for clustering and five for classifying, you may wonder which one to apply, especially since mastering any one of them is a substantial commitment. The original purpose is a clue.
  4. Currently publicized applications in manufacturing. This cell is empty for quite a few of the tools and, where applications exist, the publications are coy about their maturity. It is not always obvious whether they are discussing research projects or use in daily operations.

I suspect that there are both applications I have not heard of and tools that are not on the list.

| Category | Tool | Year | Inventor | Original Purpose | Manufacturing Applications |
|---|---|---|---|---|---|
| Bayes | | | | Refining prior knowledge with data | |
| | Bayesian Networks | 1985 | Judea Pearl | Diagnosis, classification, text mining, natural language processing, speech recognition, signal processing, bioinformatics, error-control codes, medical diagnosis, weather forecasting, cellular networks | Process problem diagnosis, failure analysis |
| | Naive Bayes | Early 1960s | ? | Diagnosis, classification | Failure analysis |
| Clustering | | | | Exploration | Group technology, cell formation |
| | Canopy clustering | 2000 | Andrew McCallum, Kamal Nigam and Lyle Ungar | Speed up clustering on large data sets | |
| | k-means clustering (a.k.a. Centroid-based clustering) | 1957 | Stuart Lloyd | Assigning points to k cluster centers | Grouping inventory by sales activity or manufacturing metrics |
| | Correlation clustering | 2002 | Nikhil Bansal, Avrim Blum, and Shuchi Chawla | Grouping objects with correlated attributes | |
| | Density-based clustering | 1996 | Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu | Grouping points that are closely packed together | |
| | Distribution-based clustering | 1998 | Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jörg Sander | Separating the pooled output of multiple distributions, by distribution | |
| | Hierarchical clustering | 1959, 1973 | Williams & Lambert (1959), splitting; Sneath and Sokal (1973), agglomerating | Build a hierarchy of clusters | |
| Collaborative filtering | | | | Predicting the interests of a user from the preferences of many users | |
| | Bayesian Networks (see above) | | | | |
| | Latent Dirichlet allocation | 2003 | David M. Blei, Andrew Y. Ng and Michael I. Jordan | Discovery of topics in sentences | Making sense of free-text incident reports in Maintenance or Quality |
| | Locality sensitive hashing (LSH) | 1998 | Rajeev Motwani and Piotr Indyk | Near-duplicate detection in text, spam filtering, audio, video or DNA fingerprinting | Detecting misspelled duplicates in product or operation names |
| Dimensionality Reduction | | | | | |
| | Fisher Discriminant Analysis | 1936 | Ronald Fisher | Separate two or more classes of multidimensional, Gaussian data by a linear combination of features | Failure analysis: variation source identification in multistage manufacturing |
| | Principal Component Analysis (PCA) | 1933 | Harold Hotelling | Visualizing high-dimensional datasets by replacing the points with uncorrelated linear combinations of their coordinates, sorted by decreasing variance, and plotting the top two or three | Equipment monitoring and anomaly detection in semiconductor wafer fabrication |
| | Kernel PCA | 1998 | Bernhard Schölkopf | Novelty detection and image de-noising by PCA on transformed data | |
| Distributed computing | | | | | |
| | MapReduce | 2004 | Jeffrey Dean and Sanjay Ghemawat (Google) | Managing big data across multiple servers | |
| Ensemble Methods | | | | Using multiple learning algorithms to improve predictions | |
| | Bagging, a.k.a. Bootstrap Aggregation | 1994 | Leo Breiman | Improve accuracy of classification and prediction by resampling and averaging results | |
| | Boosting | 1990 | Robert Schapire | Strengthening a weak classifier by repeated application to weighted data. Similar to Bagging, with systematic weighting instead of resampling. | |
| | Random forests | 1995 | Tin Kam Ho, Leo Breiman, and Adele Cutler | Making decision trees more stable by a combination of bagging with random feature selection | |
| Linear Classifiers | | | | | |
| | Support Vector Machines (SVM) | 1963 | Vladimir N. Vapnik and Alexey Ya. Chervonenkis | Categorization of text, images, and proteins, and handwriting recognition, by finding hyperplanes to best separate two classes of training data | Tool/machine condition monitoring, fault diagnosis, tool wear, quality monitoring |
| Non-linear classifiers | | | | | |
| | Classification And Regression Trees (CART) | 1984 | Leo Breiman | Build a binary decision tree | Yield enhancement |
| | Induction trees | 1972 | Monica Bad | Build the shortest possible decision tree from a dataset, choosing nodes by maximum information gain | Sequencing tests or inspection for fastest decision |
| | Kernel trick | 1991 | Isabelle M. Guyon, Bernhard E. Boser and Vladimir N. Vapnik | Expand SVM to cases where the two classes cannot be separated by a hyperplane, by implicitly embedding the data in a higher dimensional space where they can be | |
| | k-nearest neighbor (k-NN) | 1967 | Tom Cover and Peter Hart | Classification among two classes or regression | Metrology and yield enhancement in semiconductor manufacturing |
| | Neural networks (ANN) | 1943 | Warren McCulloch and Walter Pitts | Unsupervised learning for vehicle control, process control, natural resource management, quantum chemistry, game-playing and decision making, pattern recognition in radars, face identification, signal classification, gesture, speech, handwritten and printed text recognition, medical diagnosis, finance, visualization, machine translation, social network filtering, and e-mail spam filtering | The academic papers on neural networks in manufacturing are from the mid-1990s, saying "they could be of great help," for example in part classification and family formation for Group Technology, or design engineering, or process control, or scheduling,... 20+ years later, academic papers still describe these applications as being "in the future." |
| | Deep Learning | 1965 | Alexey Ivakhnenko | Automatic speech recognition, image recognition, natural language processing, customer relationship management, recommendation systems, … through multilayered ANNs | |
| | Linear regression | 1805 | Adrien-Marie Legendre and Carl Gauss | Predicting a numeric variable as a linear combination of predictors | In production planning, demand forecasting. In quality, using easily observable substitute characteristics instead of true characteristics that may be difficult to observe or require destructive testing. |
| | Logistic regression | 1958 | David Cox | Predicting a pass/fail variable from a linear combination of predictors | In binning, predicting which bin a unit belongs in from observed variables |
| | Nonlinear regression | ? | ? | Predicting a variable as a function of predictors | In process engineering, predicting workpiece characteristics that are not linear functions of input parameters, such as thickness of an oxide layer as a function of oxidation time |
| Reinforcement learning | | | | | |
| | Markov decision process | 1957 | Richard Bellman | Supplementing Markov chains with decisions. A Markov chain is a sequence of events in which, knowing the present, the future does not depend on the past. | Real-time job-shop scheduling |
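
To make one row of the table concrete, here is a minimal sketch of k-means (Lloyd's algorithm) applied to the "grouping inventory by sales activity" use case. The item names, the two features (units sold and order lines per month), and the choice of k = 2 are all hypothetical, made up for illustration and not taken from any real application.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the nearest
    center and moving each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        if new_centers == centers:  # converged: assignments will not change
            break
        centers = new_centers
    return centers, labels


# Hypothetical inventory items: (units sold per month, order lines per month).
items = {
    "bolt-M6": (950.0, 40.0),
    "washer-M6": (1000.0, 42.0),
    "nut-M6": (980.0, 38.0),
    "gasket-A": (20.0, 2.0),
    "seal-B": (15.0, 3.0),
    "shim-C": (25.0, 1.0),
}
points = list(items.values())
centers, labels = kmeans(points, k=2)

for name, label in zip(items, labels):
    print(f"{name}: cluster {label}")
```

On this toy data, the fast movers and the slow movers end up in separate clusters. With real data, features measured on very different scales would usually be normalized first, and the choice of k is itself a judgment call.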

Report Your Own Applications

This survey is not to score respondents or tools but simply to collect manufacturing use cases for these tools and to identify tools that are not on the list but should have been. Participants are not required to give their names or contact information but are given the opportunity to do so at the end of the questionnaire. The results will be used to enhance the table.

Most of the questions are, I believe, self-explanatory; the question about application maturity, perhaps not. The options are as follows:

  1. Research. You are using company data to try the tool and gauge its usefulness.
  2. Development. Having established the tool’s potential, you are working with the IT department to make it usable on a range of datasets extracted from the company’s systems.
  3. Pilot use. A single user is applying the tool to a single, real project.
  4. Routine use. All the users who can potentially benefit are trained to use the tool and apply it in operations whenever relevant.

The checkboxes and pull-down lists are there just for your convenience. If they don’t fit your experience, please use the Other categories. The results will serve to complete and improve the above table.

[qsm quiz=1]