Jun 2, 2018
Data Mining/Machine-Learning Tools In Manufacturing
This post elaborates on the Analyzing The Data section of the previous post. For a list of tools used in “data mining” or “machine learning,” I researched who invented each one, when, for what purpose, and what applications it has had in manufacturing, and I summarized my findings in the table below.
I am, however, not satisfied with the level of applications I found and would like to crowdsource more. If you have made these or other tools useful in your own manufacturing environment, please share whatever information you can about your applications in the survey that follows.
Data Mining/Machine Learning Tools
For each tool, grouped by category, the table provides the following information:
- The year the tool was invented. This matters because each tool reflects the information technology available at the time.
- Who invented it. This points to the technical and social context of the invention, beyond information technology alone. Sometimes we know this context from publications by the inventors or others, and sometimes we don’t. When the inventors are still alive and active, we can ask them.
- What problem the inventors were trying to solve. When you are looking at six different tools for clustering and five for classifying, you may wonder which one to apply, especially when you consider that mastering any one of them is a substantial commitment.
- Currently publicized applications in manufacturing. This column is empty for quite a few of the tools and, where applications exist, the publications are coy about their maturity: it is not always obvious whether they describe research projects or use in daily operations.
I suspect that there are both applications I have not heard of and tools that are not on the list.
| Category | Tool | Year | Inventor | Original Purpose | Manufacturing Applications |
|---|---|---|---|---|---|
| Bayes | | | | Refining prior knowledge with data | |
| | Bayesian networks | 1985 | Judea Pearl | Diagnosis, classification, text mining, natural language processing, speech recognition, signal processing, bioinformatics, error-control codes, medical diagnosis, weather forecasting, cellular networks | Process problem diagnosis, failure analysis |
| | Naive Bayes | Early 1960s | ? | Diagnosis, classification | Failure analysis |
| Clustering | | | | Exploration | Group technology, cell formation |
| | Canopy clustering | 2000 | Andrew McCallum, Kamal Nigam, and Lyle Ungar | Speeding up clustering on large data sets | |
| | k-means clustering (a.k.a. centroid-based clustering) | 1957 | Stuart Lloyd | Assigning points to k cluster centers | Grouping inventory by sales activity or manufacturing metrics |
| | Correlation clustering | 2002 | Nikhil Bansal, Avrim Blum, and Shuchi Chawla | Grouping objects with correlated attributes | |
| | Density-based clustering | 1996 | Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu | Grouping points that are closely packed together | |
| | Distribution-based clustering | 1998 | Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, and Jörg Sander | Separating the pooled output of multiple distributions, by distribution | |
| | Hierarchical clustering | 1959, 1973 | Williams & Lambert (1959, divisive); Sneath & Sokal (1973, agglomerative) | Building a hierarchy of clusters | |
| Collaborative Filtering | | | | Predicting the interests of a user from the preferences of many users | |
| | Bayesian networks (see above) | | | | |
| | Latent Dirichlet allocation | 2003 | David M. Blei, Andrew Y. Ng, and Michael I. Jordan | Discovery of topics in sentences | Making sense of free-text incident reports in maintenance or quality |
| | Locality-sensitive hashing (LSH) | 1998 | Rajeev Motwani and Piotr Indyk | Near-duplicate detection in text, spam filtering, and audio, video, or DNA fingerprinting | Detecting misspelled duplicates in product or operation names |
| Dimensionality Reduction | | | | | |
| | Fisher discriminant analysis | 1936 | Ronald Fisher | Separating two or more classes of multidimensional, Gaussian data by a linear combination of features | Failure analysis: variation source identification in multistage manufacturing processes |
| | Principal component analysis (PCA) | 1933 | Harold Hotelling | Visualizing high-dimensional data sets by replacing the points with uncorrelated linear combinations of their coordinates, sorted by decreasing variance, and plotting the top two or three | Equipment monitoring and anomaly detection in semiconductor wafer fabrication |
| | Kernel PCA | 1998 | Bernhard Schölkopf | Novelty detection and image de-noising by PCA on transformed data | |
| Distributed Computing | | | | | |
| | MapReduce | 2004 | Jeffrey Dean and Sanjay Ghemawat (Google) | Managing big data across multiple servers | |
| Ensemble Methods | | | | Using multiple learning algorithms to improve predictions | |
| | Bagging (bootstrap aggregation) | 1994 | Leo Breiman | Improving the accuracy of classification and prediction by resampling and averaging results | |
| | Boosting | 1990 | Robert Schapire | Strengthening a weak classifier by repeated application to weighted data; similar to bagging, with systematic weighting instead of resampling | |
| | Random forests | 1995 | Tin Kam Ho, Leo Breiman, and Adele Cutler | Making decision trees more stable by combining bagging with random feature selection | |
| Linear Classifiers | | | | | |
| | Support vector machines (SVM) | 1963 | Vladimir N. Vapnik and Alexey Ya. Chervonenkis | Categorization of text, images, and proteins, and handwriting recognition, by finding hyperplanes that best separate two classes of training data | Tool/machine condition monitoring, fault diagnosis, tool wear, quality monitoring |
| Non-Linear Classifiers | | | | | |
| | Classification and regression trees (CART) | 1984 | Leo Breiman | Building a binary decision tree | Yield enhancement |
| | Induction trees | 1972 | Monica Bad | Building the shortest possible decision tree from a data set, choosing nodes by maximum information gain | Sequencing tests or inspections for the fastest decision |
| | Kernel trick | 1991 | Isabelle M. Guyon, Bernhard E. Boser, and Vladimir N. Vapnik | Expanding SVM to cases where the two classes cannot be separated by a hyperplane, by implicitly embedding the data in a higher-dimensional space where they can be | |
| | k-nearest neighbors (k-NN) | 1967 | Tom Cover and Peter Hart | Classification between two classes, or regression | Metrology and yield enhancement in semiconductor manufacturing |
| | Neural networks (ANN) | 1943 | Warren McCulloch and Walter Pitts | Supervised and unsupervised learning for vehicle control, process control, natural resource management, quantum chemistry, game playing and decision making, pattern recognition in radar, face identification, signal classification, gesture, speech, handwritten, and printed text recognition, medical diagnosis, finance, visualization, machine translation, social network filtering, and e-mail spam filtering | The academic papers on neural networks in manufacturing date from the mid-1990s, saying “they could be of great help,” for example in part classification and family formation for group technology, design engineering, process control, or scheduling. 20+ years later, academic papers still describe these applications as being “in the future.” |
| | Deep learning | 1965 | Alexey Ivakhnenko | Automatic speech recognition, image recognition, natural language processing, customer relationship management, recommendation systems, … through multilayered ANNs | |
| Regression | | | | | |
| | Linear regression | 1805 | Adrien-Marie Legendre and Carl Friedrich Gauss | Predicting a numeric variable as a linear combination of predictors | In production planning, demand forecasting; in quality, using easily observable substitute characteristics in place of true characteristics that may be difficult to observe or require destructive testing |
| | Logistic regression | 1958 | David Cox | Predicting a pass/fail variable as a linear combination of predictors | In binning, predicting which bin a unit belongs in from observed variables |
| | Nonlinear regression | ? | ? | Predicting a variable as a function of predictors | In process engineering, predicting workpiece characteristics that are not linear functions of input parameters, such as the thickness of an oxide layer as a function of oxidation time |
| Reinforcement Learning | | | | | |
| | Markov decision process | 1957 | Richard Bellman | Supplementing Markov chains with decisions. A Markov chain is a sequence of events in which, knowing the present, the future does not depend on the past | Real-time job-shop scheduling |
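To give a sense of how lightweight it is to try some of these tools, here is a minimal sketch, in Python with scikit-learn, that chains two of them: PCA to project item-level manufacturing metrics onto their top two principal components, followed by k-means to group the items, as in the inventory-grouping application listed above. The data, the meaning of the columns, and the choice of k = 3 are illustrative assumptions, not recommendations.

```python
# A minimal sketch, not production code: standardize hypothetical item-level
# metrics, reduce them to two principal components, then group items with
# k-means. All data and parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for a table of 200 items x 4 metrics (e.g., demand rate, lead
# time, unit cost, scrap rate) extracted from ERP/MES systems.
X = rng.normal(size=(200, 4))

X_std = StandardScaler().fit_transform(X)        # put metrics on comparable scales
X_2d = PCA(n_components=2).fit_transform(X_std)  # keep top two principal components

# k = 3 is arbitrary here; in practice, compare inertia or silhouette scores.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
for k in range(3):
    print(f"cluster {k}: {(labels == k).sum()} items")
```

On real data, the number of clusters would be chosen by inspecting the results rather than fixed in advance, and the two-dimensional projection would also serve to plot and eyeball the groups.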
Report Your Own Applications
This survey is not to score respondents or tools but simply to collect manufacturing use cases for these tools and to identify tools that are not on the list but should have been. Participants are not required to give their names or contact information but are given the opportunity to do so at the end of the questionnaire. The results will be used to enhance the table.
Most of the questions are, I believe, self-explanatory; the question about application maturity, perhaps, is not. The options are as follows:
- Research. You are using company data to try the tool and gauge its usefulness.
- Development. Having established the tool’s potential, you are working with the IT department to make it usable on a range of datasets extracted from the company’s systems.
- Pilot use. A single user is applying the tool to a single, real project.
- Routine use. All the users who can potentially benefit are trained to use the tool and apply it in operations whenever relevant.
The checkboxes and pull-down lists are there for your convenience. If they don’t fit your experience, please use the “Other” categories.
[Survey form embedded here]
George Baggs
June 6, 2018 @ 8:41 pm
CNNs and related deep-learning ML methods excel at classifying process outputs that are visually intensive. An example of this type of process is metal additive manufacturing. We have been able to use CNNs to augment the SMEs who currently evaluate these outputs, and have found the machines to be more consistent at classification and sorting (i.e., unbiased and with much higher accuracy) than the humans. Another reason is the growing volume of data generated by the AM machines as production ramps up… the human SMEs are becoming overwhelmed, creating a process bottleneck that can only be relieved through automation.
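For readers curious what such a classifier looks like in code, below is a minimal sketch of a small CNN for pass/fail classification of process images, written with Keras. The input size, layer widths, and binary output are assumptions made for illustration; they do not describe the commenter’s actual system.

```python
# A minimal illustrative CNN for pass/fail classification of process images,
# using Keras. Input size, layer widths, and the binary output are
# assumptions for the sketch, not a description of any deployed system.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),        # hypothetical grayscale images
    layers.Conv2D(16, 3, activation="relu"),  # learn local visual features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # probability of "pass"
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# Training would look like:
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)
```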