Deep Learning And Profound Knowledge

[The featured image is Maureen Mace’s Tree of Knowledge]

In the news, Deep Learning is the currently emblematic technology of Machine-Learning (ML) and Artificial Intelligence (AI). In Management, the System of Profound Knowledge (SoPK) is a framework by W. Edwards Deming that specifies what individuals should know to be effective leaders of business organizations.

Your knowledge is what you have learned. You would not call a deep lake profound, but a deep thought is also a profound one, and vice versa. When discussing abstractions, there is no daylight in meaning between deep and profound.

Consequently, we might expect Deep Learning to be the process by which you acquire Profound Knowledge but it is nothing of the kind. As technical terms, they are unrelated and neither one matches expectations based on common, everyday usage. 

Deep Learning

As a technical term in AI, Deep Learning designates a class of software used on tasks like image recognition, speech recognition, and natural language processing. Postal services use it to read addresses on envelopes, and banks use it to read checks.

Developers at DeepMind applied this technology to the less mundane task of playing Go, developing the AlphaGo program that defeated world champion Lee Sedol in 2016, and later generalized it into AlphaZero, which also plays other games like chess and shogi.

It has clearly become a technology that professionals in every field, including Manufacturing, should learn about. This means gaining an understanding of what it can and cannot do today, and of how it might change in the future.

When Outputs Need Explaining

H2O, of Mountain View, CA, is a software provider that has been hosting Meetups about Deep Learning. One I attended was about explanations. If your software tells you that a scribble is a handwritten “8,” that a pipe on your machine needs a vibration-resistant fitting, or that you should approve a loan application, you may or may not need to know why. In handwriting recognition, you care that the software recognizes symbols accurately, but it doesn’t matter how.

The same is true of humans. We instantly recognize symbols in other people’s handwriting but would be unable to explain how we do it. On the other hand, when you act on a diagnosis from a piece of software about a machine or the creditworthiness of an applicant, you are taking responsibility for decisions with such consequences that you can’t do it without a rationale. 

Guillaume Bodiou

The issues of responsibility and transparency are also behind a proposed regulation of the use of AI by the European Union. In a recent article in French, Guillaume Bodiou said this about responsibility:

“Today, the responsibility can only be human, because AI has no moral judgment. Indeed, when a machine-learning algorithm gives a bad result, there is inevitably a human responsibility, if only because a person decided to use AI for the task. Thus, in the event of a medical error, a doctor will never be able to hide behind the result given by AI.”

and this about transparency:

“The purpose of requiring transparency and intelligibility is to tell the human the reason for the decision. In practice, there are different types of intelligibility, because the answer will depend on the audience. Indeed, a data scientist will not require the same level of intelligibility as an average citizen. For example, when AI has given a very bad credit rating, the customer needs to be able to know the reasons.”

Deep Learning Systems Cannot Explain

The way Deep Learning works, however, does not make explanations possible. “Deep” refers to having multiple layers, such as turning the pixels in a picture into strokes, and then grouping the strokes into characters, the characters into words, etc.

Within each layer, modules called “neurons” weigh input evidence and produce outputs that are themselves evidence for neurons in the next layer. The network learns by adjusting the weights on inputs, based on the data in a training set with known answers.

The more known cases you feed it, the better it gets at finding the right answer on new ones. The University of Cincinnati Business Analytics R Programming Guide illustrates Deep Learning as follows:

[Figure: a deep neural network]
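As a sketch of this weighing-and-adjusting mechanism, and not of any production system, the following Python toy trains a two-layer network on the XOR function; the layer sizes, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training set with known answers: the XOR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))     # weights: inputs -> hidden layer
W2 = rng.normal(size=(4, 1))     # weights: hidden layer -> output

def predict(inputs):
    hidden = sigmoid(inputs @ W1)    # each hidden neuron weighs the inputs
    return sigmoid(hidden @ W2)      # the output neuron weighs that evidence

loss_before = np.mean((predict(X) - y) ** 2)

for _ in range(10_000):              # learning = repeatedly adjusting weights
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    grad_out = (out - y) * out * (1 - out)       # error signal at the output
    dW2 = h.T @ grad_out                         # backpropagate to each layer
    dW1 = X.T @ (grad_out @ W2.T * h * (1 - h))
    W2 -= 0.5 * dW2
    W1 -= 0.5 * dW1

loss_after = np.mean((predict(X) - y) ** 2)
print(f"mean squared error: {loss_before:.3f} -> {loss_after:.3f}")
```

The more training it gets, the lower the error on known cases, but nothing in the final weight matrices explains any individual answer.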

The upshot of the H2O meetup was that the only way to figure out how a Deep Learning system “thinks” is to treat it as a black box, feed it inputs, and observe its outputs. Neural networks are supposed to emulate how the brain works at the cellular level. For transparency and intelligibility, a system must instead emulate the way a human mind works when it consciously tackles a problem.
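This kind of black-box probing can be sketched in a few lines; the black_box function below is a hypothetical stand-in for an opaque trained model:

```python
# Probe a black box by perturbing one input at a time and watching
# whether the output changes -- without ever looking inside it.

def black_box(pixels):
    # Stand-in for an opaque model; the user cannot see this rule.
    return 1.0 if sum(pixels) > 2.0 else 0.0

baseline = [0.9, 0.8, 0.7, 0.1]
base_out = black_box(baseline)

influential = []
for i in range(len(baseline)):
    probe = list(baseline)
    probe[i] = 0.0                    # knock out one input
    if black_box(probe) != base_out:
        influential.append(i)         # this input mattered to the decision

print(f"inputs that flip the output: {influential}")
```

This recovers which inputs influence a decision, but not a rationale for it.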

How Humans Make Decisions

An experienced maintenance technician or loan officer does not function like a neural network, at least consciously. They know their domain and have solved previous cases. They summon memories of similar cases and adapt them to the new one.

The maintenance technician thus forms hypotheses, checks them for consistency with observations, and validates them through experiments. You know from experience that the most common reason for electronics to fail is being unplugged, so your first step is to check all connections. With loans, you know soon enough when you have approved a deadbeat; on the other hand, you never know how often you have denied a worthy applicant.

Individuals with less experience or who have to account for their decision process often use decision trees, checklists, or scoring rubrics. In factories, teams use frameworks at various levels of detail, like PDCA, DMAIC, 8D, or TBP to solve problems.

As seen on TV, detectives map cases on big boards with pictures of suspects, places, and objects, connected with red thread. On TV also, Dr. House’s team uses Differential Diagnosis to map patient symptoms to diseases. According to media accounts, real cops and real doctors actually use these techniques.

Alternatives to Deep Learning 

In all their variety, none of these approaches resembles Deep Learning algorithms. There are, however, other AI techniques, like induction-based decision trees and case-based reasoning (CBR), that come much closer to the way humans work and are transparent. See Bergmann et al. for examples of applications in car manufacturing, semiconductor failure analysis, or jet engine diagnosis. Following is their overview of the CBR process:
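As a toy illustration of the retrieve-and-reuse core of CBR, not of any of Bergmann et al.’s systems, here is a Python sketch with invented maintenance cases; unlike a neural network, it can point to the past case behind its answer:

```python
# Minimal case-based reasoning: retrieve the most similar past case
# and reuse its solution. Cases and solutions are invented.

def similarity(case_a, case_b):
    """Fraction of shared features on which two cases agree."""
    keys = case_a.keys() & case_b.keys()
    return sum(case_a[k] == case_b[k] for k in keys) / len(keys)

case_base = [
    ({"symptom": "vibration", "load": "high", "temp": "normal"},
     "replace worn bearing"),
    ({"symptom": "vibration", "load": "low", "temp": "high"},
     "check lubrication"),
    ({"symptom": "noise", "load": "high", "temp": "normal"},
     "tighten mounting bolts"),
]

def solve(new_case):
    # Retrieve: rank stored cases by similarity to the new one.
    best_case, solution = max(case_base,
                              key=lambda cs: similarity(cs[0], new_case))
    # Reuse: return the solution along with the matched case as rationale.
    return solution, best_case

solution, rationale = solve({"symptom": "vibration", "load": "high",
                             "temp": "high"})
print(solution)
print(rationale)   # the transparent part: which past case drove the answer
```

A full CBR cycle would also revise the reused solution and retain the confirmed new case, growing the case base over time.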


Deep Learning and Prejudice

The approval of loan applications is one of many decisions affecting human beings that should be based on their individual character and not membership in any group they were born into. To date, the support of Deep Learning for such decisions has not eliminated bias.  

Deep Learning tools learn from the data they read but not from the backstories they don’t read. Human societies are rife with prejudice and discrimination. First, organizations deny people education or economic opportunities based on membership in a group they were born into. Then the resulting achievement gap becomes a rationale to brand the group as “inferior.” If you feed a Deep Learning system only outcomes along with group memberships, it will draw the same conclusion. A term like “Deep Learning” is misleading in this case.

The key to avoiding this kind of bias is to withhold from the system any data elements that could lead it to generalize inappropriately about individuals. Items like personal names and addresses often reveal a person’s gender and ethnicity and are irrelevant to the evaluation of, say, credit risk. Even this, however, has limits: the schools a person attended, for example, are revealing too, but they are a key element in evaluating resumes for recruitment.
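A minimal sketch of such withholding, with hypothetical field names, is simply to scrub revealing fields before anything reaches the model:

```python
# Drop fields that reveal group membership rather than, say, credit risk.
# The field names and the application record are invented for illustration.

SENSITIVE = {"name", "address", "gender", "ethnicity"}

def scrub(application: dict) -> dict:
    """Return a copy of the application without revealing fields."""
    return {k: v for k, v in application.items() if k not in SENSITIVE}

app = {"name": "A. Borrower", "address": "12 Main St",
       "income": 52000, "debt_ratio": 0.31, "years_employed": 6}
print(scrub(app))   # only the risk-relevant fields remain
```

This handles direct identifiers only; proxies like schools attended, as noted above, resist such a simple filter.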

Profound Knowledge

The “System of Profound Knowledge” (SoPK) was introduced by W. Edwards Deming in The New Economics, and is what the book is best known for. The MIT Press published it in 1994, a year after Deming died and six years before Igor Aizenberg first applied the term Deep Learning to neural networks.

Whoever advised Deming on the titles of his books probably had not read them, because The New Economics is not about economics any more than Out of the Crisis was about overcoming any crisis. By contrast, Mastering the Art of French Cooking, by Deming contemporary Julia Child, has a title that exactly matches its content.

The Components Of Deming’s SoPK

Deming’s SoPK is intended to provide individuals with “a basis for judgment and for transformation of organizations.” It is what they need to rise above their daily tasks and consider the system they are participating in as an outsider would.  

We would expect “profound knowledge” to be concrete and specific. For a production operator, it would be the ability to perform a task from beginning to end while explaining the purpose of every step, as explained in TWI; for a nuclear engineer, it would be knowing the ways of accelerating or slowing the reaction and how they work. The four components of Deming’s SoPK are instead both abstract and generic:

Appreciation for a system

A system is more than the collection of its parts, and actions on any part have repercussions on the others. This isn’t news but it’s a point Deming needed to make to managers who are routinely surprised by the unintended side-effects of their decisions. A system has a purpose, and local changes only improve it if they further this purpose.

The performance of a system usually has multiple dimensions, like Quality, Productivity, Delivery, Safety, and Morale in Manufacturing. An improvement enhances performance in at least one dimension without degrading it in any other. Whether a change meets this criterion is not always obvious. Does it move the entire organization in the direction of a True North?

In Manufacturing, this True North is usually takt-driven production. As explained in an earlier post, in takt-driven production, you perform all operations one piece at a time, with process times that exactly match the takt time and instant transfer to the next operation at every beat. It is never perfectly realized, even on an assembly line. Real lines can only approximate it, but it sets a direction for improvement.

Knowledge about variation

As Deming describes it, it is the ability to tell changes in process outcomes that are due to assignable causes from meaningless fluctuations. It means understanding Shewhart’s concept of statistical control. Deming does not see hypothesis testing as part of this knowledge, as it is “useless for prediction.”

Oddly, the book contains no reference to probability, which, after all, is the math of variability. Writing in 2000, Don Wheeler, in Understanding Variation, makes only one reference to probability, claiming on p. 140 that no assumptions about distributions are needed for the XmR chart. In fact, knowledge of variation is knowledge of probabilities, and the coefficients used to set limits on XmR charts are based on assumptions about the distribution of the data.
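For concreteness, here is how the XmR limits are computed, with made-up data; the constant 2.66 is 3/d2 with d2 ≈ 1.128 for moving ranges of two points, a value derived under a normality assumption:

```python
# XmR (individuals and moving range) chart limits on invented data.
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1]

mean_x = sum(data) / len(data)                          # centerline
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mean_mr = sum(moving_ranges) / len(moving_ranges)       # average moving range

ucl = mean_x + 2.66 * mean_mr   # upper natural process limit
lcl = mean_x - 2.66 * mean_mr   # lower natural process limit
print(f"center={mean_x:.2f}  UCL={ucl:.2f}  LCL={lcl:.2f}")
```

The arithmetic is assumption-free; the 2.66 multiplier is where the distributional assumption hides.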

Why Deming and Wheeler chose to ignore probability is a mystery. It’s time to take probability out of the closet. Scatterplots today are taught in American middle schools, and the meaning of “95% effective” for a vaccine in high school. To understand variation, business professionals do not need to master probability theory themselves, but they need to be savvy readers of conclusions from data scientists who do.

Theory of knowledge

In a nutshell, we process information into theories that we use to make predictions. These theories embody knowledge to the extent that the predictions come true.

Counterexamples refute a theory and you must then revise or extend it. This is how knowledge accumulates. A theory that is consistent with every outcome has no predictive value and is therefore void of content.

In Karl Popper’s view, a theory has content to the extent that there is an experiment that can prove it wrong, in other words, a false prediction. Theories can never be proven true. The ones we consider to be knowledge are the ones we have so far failed to disprove. This is consistent with Deming’s theory of knowledge.

In this binary view, either we have disproved a theory or we have not. In practice, the quantity of cases in which predictions have come true makes a difference. It’s not the same thing for a vaccine to be effective in samples of 100, 20,000, or 1 million patients: we have more confidence in the result from 1 million patients than from 100. E.T. Jaynes quantifies this in terms of the probability that the theory is true, which he calls “plausibility.”
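One way to quantify this, in the spirit of Jaynes, is a Beta-Binomial sketch; the uniform prior and the sample sizes below are illustrative. The point estimate of effectiveness is the same in all three samples, but the spread of the posterior, our remaining uncertainty, shrinks as the sample grows:

```python
from math import sqrt

def posterior_sd(k, n):
    """Standard deviation of the Beta(k+1, n-k+1) posterior for an
    effectiveness rate, starting from a uniform Beta(1, 1) prior,
    after observing k effective outcomes in n trials."""
    a, b = k + 1, n - k + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

for n in (100, 20_000, 1_000_000):
    k = int(0.95 * n)            # "95% effective" in each sample
    print(f"n={n:>9}: posterior sd = {posterior_sd(k, n):.5f}")
```

Each confirming case narrows the posterior a little, which is the quantitative sense in which agreement with the theory is also learning.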

There is learning not only in cases that refute the theory but also in the ones that agree with it, in that they increase its plausibility. While developed for human learning, this perspective is central to machine learning, and to Deep Learning in particular.


Psychology

Deming here advocates for due consideration of human nature, which, in the Toyota literature, is phrased as “respect for humanity” and earlier, by Lillian Gilbreth, as The Psychology of Management.

What Deming wants managers to know is that workers can do more than just follow instructions defined by others and are not purely mercenary, as Frederick Taylor had assumed. It means paying attention to each individual’s abilities, ambitions, and sensitivities.

Psychology And Pseudo-Science

Deming’s discussion of psychology does not reference any of the pseudo-scientific theories and tools that litter the field of enterprise psychology, like Briggs-Myers personality profiles, Maslow’s hierarchy of needs, the Hawthorne effect, or the Kübler-Ross “five stages of grief,” whose misapplication can be lethal, as it was in the late 2000s at France Telecom.

In response to a change in its business, the company wanted to shed 23,000 of its 130,000 employees and to change the jobs of many of the remaining ones. Consultants trained the managers to expect their subordinates to take it: based on the “five stages of grief,” they were supposed to move along the following orbit in energy versus satisfaction.

[Figure: the Kübler-Ross grieving curve]

It did not work. Dozens of employees committed suicide, and three top managers eventually went to prison for moral harassment. According to Valery Michaux, as of 2018, the five stages of grief were still being taught as part of change management.

SoPK And Domain Expertise

The four components of the SoPK are useful. A manager, however, can understand systems, predict outcomes with probabilities, tell knowledge from superstition, work well with people, and still have no clue about the business and technology of, say, plastics extrusion.

For anyone who hasn’t read The New Economics, profound knowledge is about a topic. If you describe a person as having profound knowledge, you normally say of what. For plastics extrusion, it would include knowing what products you make with it, in what volumes, who buys them at what prices, as well as the requirements on raw materials, what happens inside an extruder, and how to handle the output. 


As technical terms, Deep Learning and Profound Knowledge are both misleading. Deep Learning, even if not descriptive, is catchy; “multilayer neural network” is descriptive but attractive to a much smaller audience. A descriptive name for Deming’s Profound Knowledge might be “generic management wisdom,” which is also likely to induce yawns.

Commercial names matter. George Turin once told me a parable of two software companies with rival products. One was superior to the other but failed in the market:

  • The company with the superior product “sold sushi and called it raw fish.”
  • The company with the other product “sold raw fish and called it sushi.”

The winner of that contest 35 years ago dominates this market to this day. 

Implementation happens after you have bought the technology or learned the theory, and this is where, as Albert Camus is reported to have said, “By naming things wrongly, we add to the misfortunes of the world.” Marketing thrives on ambiguity; Engineering needs precise language.

Further Reading