Jan 5 2021
Deep Learning And Profound Knowledge
[The featured image is Maureen Mace’s Tree of Knowledge]
In the news, Deep Learning is the currently emblematic technology of Machine-Learning (ML) and Artificial Intelligence (AI). In Management, the System of Profound Knowledge (SoPK) is a framework by W. Edwards Deming that specifies what individuals should know to be effective leaders of business organizations.
Your knowledge is what you have learned. You would not call a deep lake profound, but a deep thought is profound and a profound thought is deep. When discussing abstractions, there is no daylight in meaning between deep and profound.
Consequently, we might expect Deep Learning to be the process by which you acquire Profound Knowledge but it is nothing of the kind. As technical terms, they are unrelated and neither one matches expectations based on common, everyday usage.
Deep Learning
As a technical term in AI, Deep Learning designates a class of software used on tasks like image recognition, speech recognition, and natural language processing. Postal services use it to read addresses on envelopes and banks to read checks.
Developers at DeepMind applied this technology to the less mundane task of playing Go. Their AlphaGo program defeated Go world champion Lee Sedol in 2016 and later evolved into AlphaZero, which also plays other games such as chess and shogi.
It has clearly become a technology that professionals in every field, including Manufacturing, should learn about. This means gaining an understanding of what it can and cannot do today, and how it might change in the future.
When Outputs Need Explaining
H2O, of Mountain View, CA, is a software provider that has been hosting Meetups about Deep Learning. One I attended was about explanations. If your software tells you that a scribble is a handwritten “8,” that a pipe on your machine needs a vibration resistant fitting, or that you should approve a loan application, you may or may not need to know why. In handwriting recognition, you care that the software recognizes symbols accurately but it doesn’t matter how.
The same is true of humans. We instantly recognize symbols in other people’s handwriting but would be unable to explain how we do it. On the other hand, when you act on a diagnosis from a piece of software about a machine or the creditworthiness of an applicant, you are taking responsibility for decisions with such consequences that you can’t do it without a rationale.
The issues of responsibility and transparency are also behind a proposed regulation of the use of AI by the European Union. In a recent article in French, Guillaume Bodiou said this about responsibility:
“Today, responsibility can only be human because AI has no moral judgment. Indeed, when a machine-learning algorithm gives a bad result, there is inevitably a human responsibility, if only because a person decided to use AI for the task. Thus, in the event of a medical error, a doctor will never be able to hide behind a result given by AI.”
and this about transparency:
“The purpose of requiring transparency and intelligibility is to tell the human the reason for the decision. In practice, there are different levels of intelligibility because the answer depends on the audience: a data scientist does not require the same level of intelligibility as an average citizen. For example, when AI has given a very bad credit rating, the customer needs to be able to know the reasons.”
Deep Learning Systems Cannot Explain
The way Deep Learning works, however, does not make explanations possible. “Deep” refers to having multiple layers, such as turning the pixels in a picture into strokes, and then grouping the strokes into characters, the characters into words, etc.
Within each layer, modules called “neurons” weigh input evidence and produce outputs that are themselves evidence for neurons in the next layer. The network learns by adjusting the weights on inputs, based on a training set of data with known answers.
The more known cases you feed it, the better it gets at finding the right answer on new ones. The University of Cincinnati Business Analytics R Programming Guide illustrates Deep Learning as follows:
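As a sketch of the mechanism just described, here is a tiny multilayer network in plain Python that learns the XOR function by adjusting its weights on a training set with answers. The layer size, learning rate, and iteration count are arbitrary illustrative choices, not anything from the article:

```python
import math
import random

random.seed(0)

# Training set "with answers": XOR inputs and expected outputs
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

H = 4  # hidden "neurons" in the middle layer
W1 = [[random.gauss(0, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.gauss(0, 1) for _ in range(H)]
b2 = 0.0

lr = 0.5
for _ in range(20000):
    for (x1, x2), target in data:
        # Forward pass: each layer weighs evidence from the previous one
        h = [sigmoid(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(H)]
        out = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)

        # Backward pass: nudge the weights to reduce the error
        d_out = (out - target) * out * (1 - out)
        for j in range(H):
            d_h = d_out * W2[j] * h[j] * (1 - h[j])
            W2[j] -= lr * d_out * h[j]
            W1[j][0] -= lr * d_h * x1
            W1[j][1] -= lr * d_h * x2
            b1[j] -= lr * d_h
        b2 -= lr * d_out

def predict(x1, x2):
    h = [sigmoid(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(H)]
    return sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)

# With enough iterations, the rounded outputs typically reproduce XOR
print([round(predict(x1, x2)) for (x1, x2), _ in data])
```

Note that the trained weights explain nothing by themselves: they are just numbers, which is precisely the transparency problem discussed below.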
The upshot of the H2O meetup was that the only way to figure out how a Deep Learning system “thinks” was to treat it as a black box, feed it inputs, and observe its outputs. Neural networks are supposed to emulate how the brain works, at the cellular level. For transparency and intelligibility, a system must instead emulate the way a human mind works when it consciously tackles a problem.
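A crude version of such black-box probing can be sketched as follows. The scoring rule inside black_box is a hypothetical stand-in for an opaque model; the probe nudges one input at a time and watches whether the decision flips:

```python
# Treat the model as a black box: feed it inputs, observe its outputs.
def black_box(income, debt, years_employed):
    # Hypothetical stand-in for an opaque credit-scoring model
    return 1 if income - 2 * debt + 5 * years_employed > 50 else 0

baseline = black_box(income=70, debt=5, years_employed=2)  # approve: 1

# Probe: change one input at a time and record whether the decision flips
probes = {
    "income": black_box(income=40, debt=5, years_employed=2),
    "debt": black_box(income=70, debt=25, years_employed=2),
    "years_employed": black_box(income=70, debt=5, years_employed=0),
}
flips = [name for name, out in probes.items() if out != baseline]
print(baseline, flips)  # 1 ['income', 'debt']
```

Such probing reveals sensitivities but not a rationale: it tells you which inputs matter, not why.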
How Humans Make Decisions
An experienced maintenance technician or loan officer does not function like a neural network, at least consciously. They know their domain and have solved previous cases. They summon memories of similar cases and adapt them to the new one.
The maintenance technician thus forms hypotheses to check for consistency with observations and validate through experiments. You know from experience that the most common reason for electronics to fail is being unplugged, so your first step is to check all connections. With loans, you know soon enough when you have approved a deadbeat; on the other hand, you never know how often you have denied a worthy applicant.
Individuals with less experience or who have to account for their decision process often use decision trees, checklists, or scoring rubrics. In factories, teams use frameworks at various levels of detail, like PDCA, DMAIC, 8D, or TBP to solve problems.
As seen on TV, detectives map cases on big boards with pictures of suspects, places, and objects, connected with red thread. On TV also, Dr. House’s team uses Differential Diagnosis to map patient symptoms to diseases. According to media accounts, real cops and real doctors actually use these techniques.
Alternatives to Deep Learning
In all their variety, none of these approaches resembles Deep Learning algorithms. There are, however, other AI techniques, like induction-based decision trees and case-based reasoning (CBR), that come much closer to the way humans work and are transparent. See Bergmann et al. for examples of applications in car manufacturing, semiconductor failure analysis, and jet engine diagnosis. Following is their overview of the CBR process:
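The retrieve-reuse-retain cycle at the heart of CBR can be sketched in a few lines of code. The cases, symptoms, and similarity measure below are invented for illustration, not taken from the INRECA applications:

```python
# A minimal case-based reasoning loop: retrieve the most similar past case,
# reuse its solution, and retain the new case for the future.
case_base = [
    {"symptoms": {"vibration": 1, "overheating": 0, "noise": 1},
     "diagnosis": "loose fitting"},
    {"symptoms": {"vibration": 0, "overheating": 1, "noise": 0},
     "diagnosis": "blocked coolant line"},
]

def similarity(a, b):
    # Fraction of matching symptom values, a deliberately simple measure
    keys = a.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

def solve(new_symptoms):
    # Retrieve the closest past case and reuse its diagnosis
    best = max(case_base, key=lambda c: similarity(c["symptoms"], new_symptoms))
    diagnosis = best["diagnosis"]
    # Retain: the solved case becomes a precedent for future problems
    case_base.append({"symptoms": new_symptoms, "diagnosis": diagnosis})
    return diagnosis

print(solve({"vibration": 1, "overheating": 0, "noise": 0}))  # loose fitting
```

Unlike a neural network, this system can explain itself: the retrieved case is the rationale.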
Deep Learning and Prejudice
The approval of loan applications is one of many decisions affecting human beings that should be based on their individual character and not membership in any group they were born into. To date, the support of Deep Learning for such decisions has not eliminated bias.
Deep Learning tools learn from the data they read but not from the backstories they don’t read. Human societies are rife with prejudice and discrimination. First, organizations deny people education or economic opportunities based on membership in a group they were born into. Then the resulting achievement gap becomes a rationale to brand the group as “inferior.” If you feed a Deep Learning system only outcomes along with group memberships, it will draw the same conclusion. A term like “Deep Learning” is misleading in this case.
The key to avoiding this kind of bias is to withhold from the system any data elements that could lead it to generalize inappropriately about individuals. Items like personal names and addresses often reveal a person’s gender and ethnicity and are irrelevant to the evaluation of, say, credit risk. Even this has limits: the schools a person attended, for example, are revealing too, but they are a key element in evaluating resumes for recruitment.
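In code, withholding such data elements can be as simple as filtering fields out of the training records. The field names and values below are hypothetical:

```python
# Drop fields that could proxy for group membership before training
SENSITIVE = {"name", "address", "birthplace"}

applications = [
    {"name": "A. Smith", "address": "1 Main St", "income": 52000, "debt": 8000},
    {"name": "B. Jones", "address": "2 Oak Ave", "income": 47000, "debt": 12000},
]

training_rows = [
    {field: value for field, value in app.items() if field not in SENSITIVE}
    for app in applications
]
print(training_rows[0])  # {'income': 52000, 'debt': 8000}
```

As noted above, this only goes so far: the remaining fields may still correlate with the withheld ones.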
Profound Knowledge
The “System of Profound Knowledge” (SoPK) was introduced by W. Edwards Deming in The New Economics, and is what the book is best known for. The MIT Press published it in 1994, a year after Deming died and six years before Igor Aizenberg first applied the term Deep Learning to neural networks.
Whoever advised Deming on the titles of his books probably had not read them, because The New Economics is not about economics any more than Out of the Crisis was about overcoming any crisis. By contrast, Mastering the Art of French Cooking, by Deming’s contemporary Julia Child, has a title that exactly matches its content.
The Components Of Deming’s SoPK
Deming’s SoPK is intended to provide individuals with “a basis for judgment and for transformation of organizations.” It is what they need to rise above their daily tasks and consider the system they are participating in as an outsider would.
We would expect “profound knowledge” to be concrete and specific. For a production operator, it would be the ability to perform a task from beginning to end while explaining the purpose of every step, as explained in TWI; for a nuclear engineer, it would be knowing the ways of accelerating or slowing the reaction and how they work. The four components of Deming’s SoPK are instead both abstract and generic:
Appreciation for a system.
A system is more than the collection of its parts, and actions on any part have repercussions on the others. This isn’t news but it’s a point Deming needed to make to managers who are routinely surprised by the unintended side-effects of their decisions. A system has a purpose, and local changes only improve it if they further this purpose.
The performance of a system usually has multiple dimensions, like Quality, Productivity, Delivery, Safety, and Morale in Manufacturing. An improvement enhances performance in at least one dimension without degrading it in any other. Whether a change meets this criterion is not always obvious. Does it move the entire organization in the direction of a True North?
In Manufacturing, this True North is usually takt-driven production. As explained in an earlier post, in takt-driven production you perform all operations one piece at a time, with process times that exactly match the takt time and instant transfer to the next operation at every beat. It is never perfectly realized, even on an assembly line. Real lines can only approximate it, but it sets a direction for improvement.
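As a back-of-the-envelope illustration with made-up numbers, the takt time is the available production time divided by customer demand, and each operation’s process time is measured against that beat:

```python
# Takt time: available time per shift divided by units demanded per shift
available_seconds = 8 * 3600 - 2 * 10 * 60  # 8-hour shift minus two 10-min breaks
demand_per_shift = 400

takt_time = available_seconds / demand_per_shift
print(f"takt time: {takt_time:.0f} s per unit")  # 69 s per unit

# In ideal takt-driven production every process time equals the takt time;
# real stations only approximate it, and the slowest one paces the line.
process_times = [68, 70, 74, 66]  # seconds per unit at four stations
bottleneck = max(process_times)
print("meets takt" if bottleneck <= takt_time else "bottleneck misses takt")
```

Here the 74-second station exceeds the 69-second takt, pointing at where improvement effort should go.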
Knowledge about variation
As Deming describes it, this knowledge is the ability to tell changes in process outcomes that are due to assignable causes from meaningless fluctuations. It means understanding Shewhart’s concept of statistical control. Deming does not see hypothesis testing as part of this knowledge, dismissing it as “useless for prediction.”
Oddly, the book contains no reference to probability, which, after all, is the math of variability. Writing in 2000 in Understanding Variation, Don Wheeler likewise makes only one reference to probability, claiming on p. 140 that no assumptions about distributions are needed for the XmR chart. Knowledge of variation, however, truly is knowledge of probabilities, and the coefficients used to set limits on XmR charts are based on assumptions about the distribution of the data.
Why Deming and Wheeler chose to ignore probability is a mystery. It’s time to take probability out of the closet. Scatterplots today are taught in American Middle Schools and the meaning of “95% effective” for a vaccine, in High School. To understand variation, business professionals do not need to master probability theory themselves but they need to be savvy readers of conclusions from data scientists who do master it.
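To make the point about XmR coefficients concrete: the familiar 2.66 factor is 3/d2, where d2 ≈ 1.128 is the expected range of two observations drawn from a normal distribution, in units of its standard deviation. A sketch with illustrative data:

```python
import statistics

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]  # illustrative readings

moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
x_bar = statistics.fmean(data)
mr_bar = statistics.fmean(moving_ranges)

# d2 comes from the normal distribution: E[range of 2 obs] = d2 * sigma
d2 = 1.128
ucl = x_bar + 3 * mr_bar / d2  # the textbook x_bar + 2.66 * mr_bar
lcl = x_bar - 3 * mr_bar / d2

print(f"X-bar={x_bar:.2f}  mR-bar={mr_bar:.3f}  limits=({lcl:.2f}, {ucl:.2f})")
```

The constant 1.128 is a property of the normal distribution, so the “assumption-free” chart does encode a distributional assumption.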
Theory of knowledge
In a nutshell, we process information into theories that we use to make predictions. These theories embody knowledge to the extent that the predictions come true.
Counterexamples refute a theory and you must then revise or extend it. This is how knowledge accumulates. A theory that is consistent with every outcome has no predictive value and is therefore void of content.
In Karl Popper’s view, a theory has content to the extent that there is an experiment that can prove it wrong, in other words, one that could yield a false prediction. Theories can never be proven true. The ones we consider to be knowledge are the ones we have so far failed to disprove. This is consistent with Deming’s theory of knowledge.
In this binary view, either we have disproved a theory or we have not. In practice, the number of cases in which predictions have come true makes a difference. It’s not the same thing for a vaccine to be effective in samples of 100, 20,000, or 1 million patients: we have more confidence in the result from the largest sample. E.T. Jaynes quantifies this in terms of the probability that the theory is true, which he calls “plausibility.”
There is learning not only in the cases that refute the theory but also in the ones that agree with it, in that they increase its plausibility. Though developed to describe human learning, this perspective is central to machine learning, and to Deep Learning in particular.
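The effect of sample size on plausibility can be sketched with a Bayesian update. The Beta-Binomial model and the 95%-effective counts are illustrative assumptions, not data from any actual trial:

```python
from statistics import NormalDist

def posterior_interval(successes, trials, prior_a=1, prior_b=1):
    """95% interval for the success rate (normal approximation to the Beta posterior)."""
    a = prior_a + successes
    b = prior_b + trials - successes
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    z = NormalDist().inv_cdf(0.975)
    return mean - z * var ** 0.5, mean + z * var ** 0.5

# Same observed rate, increasingly large samples: the interval tightens
for n in (100, 20_000, 1_000_000):
    lo, hi = posterior_interval(successes=int(0.95 * n), trials=n)
    print(f"n={n:>9}: effectiveness in ({lo:.4f}, {hi:.4f}) with 95% plausibility")
```

Each confirming case narrows the interval, which is one way to quantify the idea that agreement with a theory also teaches us something.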
Psychology
Deming here advocates due consideration of human nature, which the Toyota literature phrases as “respect for humanity” and which Lillian Gilbreth earlier explored in The Psychology of Management.
What Deming wants managers to know is that workers can do more than follow instructions defined by others and are not purely mercenary, as Frederick Taylor had assumed. It means paying attention to each individual’s abilities, ambitions, and sensitivities.
Psychology And Pseudo-Science
Deming’s discussion of psychology does not reference any of the pseudo-scientific theories and tools that litter the field of enterprise psychology, like Briggs-Myers personality profiles, Maslow’s hierarchy of needs, the Hawthorne effect, or the Kübler-Ross “five stages of grief,” whose misapplication can be lethal, as it was in the late 2000s at France Telecom.
In response to a change in its business, the company wanted to shed 23,000 of its 130,000 employees. It also wanted to change the jobs of many of the remaining ones. Consultants trained the managers to expect their subordinates to accept this. Based on the “five stages of grief,” employees were supposed to move along the following orbit in energy versus satisfaction.
It did not work. Dozens of employees committed suicide, and three top managers eventually went to prison for moral harassment. According to Valery Michaux, as of 2018, the five stages of grief were still being taught as part of change management.
SoPK And Domain Expertise
The four components of the SoPK are useful. A manager, however, can understand systems, predict outcomes with probabilities, tell knowledge from superstition, work well with people, and still have no clue about the business and technology of, say, plastics extrusion.
For anyone who hasn’t read The New Economics, profound knowledge is knowledge of a topic. If you describe a person as having profound knowledge, you normally say of what. For plastics extrusion, it would include knowing what products you make with it, in what volumes, and who buys them at what prices, as well as the requirements on raw materials, what happens inside an extruder, and how to handle the output.
Conclusions
As technical terms, Deep Learning and Profound Knowledge are both misleading. Deep Learning, even if not descriptive, is catchy; “multilayer neural network” is descriptive but attractive to a much smaller audience. A descriptive name for Deming’s Profound Knowledge might be “generic management wisdom,” which is also likely to induce yawns.
Commercial names matter. George Turin once told me a parable of two software companies with rival products. One was superior to the other but failed in the market:
- The company with the superior product “sold sushi and called it raw fish.”
- The company with the other product “sold raw fish and called it sushi.”
The winner of that contest 35 years ago dominates this market to this day.
Implementation happens after you have bought the technology or learned the theory, and this is where, as Albert Camus is reported to have said, “By naming things wrongly, we add to the misfortunes of the world.” Marketing thrives on ambiguity; Engineering needs precise language.
Further Reading
- Goodfellow, I. et al. (2016) Deep Learning, MIT Press
- Nielsen, M. (2015) Neural Networks and Deep Learning, Determination Press
- Bergmann, R. et al. (2003) Industrial Case-Based Reasoning Applications: The INRECA Methodology, 2nd Edition, Springer, ISBN: 978-3540207375
- Jaynes, E.T. (2003) Probability Theory: The Logic of Science, Cambridge University Press, ISBN: 978-0521592710
- Wheeler, D. J. (2000) Understanding Variation, 2nd Edition, SPC Press, ISBN: 978-0945320-53-1
- Deming, W. (1994) The New Economics, MIT Press, ISBN: 978-0-262-54116-9
- Popper, K. (1959) The Logic of Scientific Discovery, Martino Fine Books (December 3, 2014), ISBN: 978-1614277439
- Gilbreth, L. M. (1914) The Psychology of Management: The Function of the Mind in Determining, Teaching and Installing Methods of Least Waste, 2016 reprint from Bibliolife, ISBN:9781355049975
sid joynson
January 5, 2021 @ 11:31 am
When trying to understand any subject or situation we must recall the advice of Ivan & Taiichi
“Don’t just be collectors of facts. Try to penetrate to the secrets of their occurrence, persistently search for the laws that govern them.” Pavlov.
“Understanding is my favourite word. I believe it has a specific meaning to approach an object/subject positively & comprehend its nature.” Ohno.
“It is easy to remember theory with the mind; the problem is to remember with the body. The goal is to know & do instinctively. Having the spirit to endure the training & practice is the first step on the road to understanding.” Taiichi Ohno.
The Sensei’s role is to guide the student along the path to personal understanding. The path must travel through the 4 A’s. This sequence describes the journey we must travel from not-knowing to understanding:
A1 – I am A-void. I don’t think and I don’t act.
A2 – I become Aware. I do think but I don’t act.
A3 – I Adopt. I do think & I do act.
A4 – I become Adept. I can act without thinking.
When the true Sensei’s job is done, the people should say, “We can do this for ourselves”.
Learning to drive a car is a perfect example of this process that we have all followed.
Bruce Lee defined A3 as the key step, “If you want to learn to swim, jump into the water. On dry land no frame of mind is ever going to help you.”
The intellectual mind prefers A2, & wants to stand on the river- bank & discuss the nature of water.
Ronald Kirby
January 6, 2021 @ 10:03 am
Engineering needs precise language. “Plausibility”, of True North Lean 360 degs. can only be proven with the Value Stream of team employees from (Start to Finish).
Perception of a variable TT of customer demand accomplished with empathy & knowledge of actual 360 Deg. data at each stage of Value Stream variability, in each step of the (process’s) that controls the limited knowledge of TT of daily delivery’s of JIT, even between each step of the inert (false) signal that will always betray the human team signal of completion and managements desire to always reschedule or demand increased false schedules of finished products in order to meet advanced profits due to managements non understanding of True Lean North 360 Deg.
TT daily demands of the customer; (“Quote” Iwata San / Nakao San 2001 – The Customer (IE) Boeing Is not always Correct)… with their increasing and decreasing of TT schedules, BO & GEA management greed has over come GEA and Boeing (Delivery Schedules) of TT in advancing the Cash Flow of (“PERSUMED Plausibility”)… of Cash Flow when the world has Max Aircraft and Demand for Leap Engines SD over night for almost (2+) years.
In the end, all the Kings Men (Management) and Horses (Teams) could not put Humpty Dumpty back together again… until 2023.5! U Think, what was the total cost for, “OVERPRODUCTION”? I can’t count that high in lost wages alone.
Ron Kirby San
Michel Baudin
January 6, 2021 @ 10:16 am
In “Back to School,” the Rodney Dangerfield character says of a Professor: “He really seems to care, about what I have no idea.”
This is what your comment reminds me of. You seem to want to say something about takt times and the B737 Max case. I would like to understand your points but I can’t. Could you please clarify them?
Ronald Kirby
January 6, 2021 @ 11:09 am
Customer Demand = TT (the amount of time it takes at each Value Stream) process to deliver it the finished product to the end customer JIT. These products have yet to be paid for or delivered. In this example of a Boeing737 Max Aircraft. As orders piled up at BO for these aircraft between 2018-2020 the need for cash flow increased with every order being placed and not being sold or leased during that time frame, Boeing demanded more engines and aircraft parts from working teams around the clock to play catch up, even though they were not paid for yet for the orders and leased planes, After the 2 crashes the planes keep flying and being sold until the FAA SD the flying customer aircraft 737’s in the fall October 2019. Boeing keep making the planes and ordering more engines from GEA? In March of 2020, all assy. lines for BO Max Aircraft and GEA Leap engines were stopped because of over production & Covid-19. Why did Bo & GEA continue to Overproduce Products for the 737 Max Aircraft? Total aircraft sitting waiting for customers in March totaled over aprox. 1800 Aircraft? BO & GEA SD 737 Max Production, all sites producing this model Aircraft and Engines are dead stopped as of today.
All of this loss of profit can be blamed on one thing… (“PERSUMED Plausibility”), the need of greed in “Overproduction”!
I apricate your need for… the Knowledge of understanding in your need of “Empathy”… explanation. The timing extrapolation for the start up of this Planes production will not be back up on line until 2023.5.
Thank u for your article above: “Deep Learning And Profound Knowledge”
Ron Kirby San
Michel Baudin
January 6, 2021 @ 12:49 pm
What is the connection with Deep Learning and Profound Knowledge?
Ronald Kirby
January 7, 2021 @ 10:03 am
https://www.flightglobal.com/
As a Lean Specialist in AI inputs of Profound Knowledge & Deep Learnings, one will never see these inputs of Knowledge & Learnings into the (AI data base) of TT input into the technical scheduling of TT. U ask “Why”, this has nothing to do with absolute precision timing of TT Schedules? Or does it? Just read the daily heart beat of the worlds aviation’s news updates & daily updates of the Covid-19 Impact of Aviation Sales, Orders & MRO. Who is buying and who is selling? Where are the data points for AI TT when to speed up ,slow down or to stop? Did I miss something, oh yea I forgot to close the refrigerator door!
Ron Kirby San
Lonnie Wilson
January 7, 2021 @ 11:40 am
Michel,
I am not sure what motivated this, seems like an odd comparison. You say,
” Consequently, we might expect Deep Learning to be the process by which you acquire Profound Knowledge but it is nothing of the kind. As technical terms, they are unrelated and neither one matches expectations based on common, everyday usage”.
Not sure why we would “expect” this as one is about machines and one is about people, but that aside I have a couple of issues.
For Kubler-Ross’s and other concepts, you seem to denigrate them by using the term “pseudo-science.” Since you do not carefully describe what pseudo-science is, I can only infer it to be some inferior scientific theory. K-R has its detractors, but it has also been a strong tool in the field of psychology, and to cite one example to support that denigration is hardly compelling evidence. All principles and theories have a bounded applicability, which means they work quite well, as Shewhart says, “… at least within limits,” and then do not work very well outside those boundaries. Beyond that, any theory, practice, or principle can be applied poorly. So a piece of evidence that tends to disprove a theory first has to be reality-tested against the execution of that theory: simple things such as bad measurement systems, more complicated things such as poor environmental control of the experiment, up to and including a superficial understanding of the theory, can lead to bad results that have little to do with the strength of the underlying theory. A deep understanding of Maslow’s principles has had huge and positive impacts on many fields of human endeavor, so although they may not be perfectly tuned, that does not mean they are not useful.
thanks again and keep poking the bear
Michel Baudin
January 7, 2021 @ 4:36 pm
Hi Lonnie,
I wrote this because I see comments about “deep learning” from Kaizen practitioners and comments about Deep Learning as an AI method in the same space. I thought I explained why I would expect deep learning to lead to profound knowledge: learning is the way you acquire knowledge, while deep and profound are synonyms. Ergo, deep learning should be the acquisition of profound knowledge.
Psychology, I think, is a much harder science than rockets. The human mind is more complex than propulsion and more difficult to experiment with. I don’t believe that Maslow or Kübler-Ross ever claimed to have run experiments to support their hierarchy of needs or stages of grief.
I don’t even know what such experiments might consist of but they would be necessary to establish a range of applicability. I think these theories are nothing more than reasonable-sounding frameworks.
The claims about the actual number of suicides at France Telecom vary from 19 to 60, with arguments about imputation to the HR policy. The consequences in this one case strike me as severe enough to give pause to anyone thinking of teaching the “5 stages of grief” to managers.
I call such theories pseudo-science because they don’t meet the standards needed to be called science. Putting them on an equal footing with the theory of rocket propulsion is not justified.
Lonnie Wilson
January 8, 2021 @ 9:34 am
Michel,
Thanks for the quick reply. Although I agree that Kubler-Ross, Maslow, and Erikson, as well as principles like McGregor’s Theory X and Y, may not have been validated by controlled experimentation, that does not mean they are any less useful.
To me the key is usefulness and predictability, and it would be hard to argue that they have not provided use and do not have predictive power. Although I am a strong adherent of scientific principles, at the same time I am often skeptical of the claims made in the name of science.
I lived in Southern California when they were making a strong effort to control auto emissions to reduce smog. Well, after about 10 years they learned their well-thought out and scientifically proven technology was actually increasing the smog.
Well, like I said earlier, any good theory can be applied poorly and make the theory look bad. Likewise bad science can be proven, no problem there.
As for the pseudo-sciences, I am mostly interested in two qualities. Can you use them to improve things? And, the very purpose of theories, can you use them to predict?
If they have predictive power, then I think they have a use. Whether they are what you call science or pseudo-science I believe they all have a bounded applicability.
And for those we call science, we have spent a lot more time defining the boundaries and the applicability. In that regard, those who can properly use Kubler-Ross, Maslow’s hierarchy or Eupsychian Management, McGregor’s framework, or Deming’s principles may be using science and just not know it.
To me they are far more than reasonable-sounding frameworks, they are useful and they are helpful.
When I was growing up, my grandmother lived next door to us. She grew up in a tiny village in the Italian alps in Northern Italy and immigrated to the US at 20. When we would get sick she would often have some remedies that the doctors laughed at…but they worked, repeatedly. Her remedies helped and given the symptoms she often had something for us that worked repeatedly, they were useful and predictable.
I too would give pause to teaching managers the 5 Stages of Grief, but not because of the theory; I would worry about the application by the managers. My grandmother in her thick Italian accent would say, “Lonnie, we get too soon old, and too late smart.” Be well and stay safe…