Probability For Professionals

In a previous post, I pointed out that manufacturing professionals’ eyes glaze over when they hear the word “probability.” Even outside manufacturing, most professionals’ idea of probability is that, if you throw a die, you have one chance in six of getting an ace. 2,000 years ago, Claudius wrote a book on how to win at dice, but the field of inquiry has broadened since, producing results that affect business, technology, science, politics, and everyday life.

In the age of big data, all professionals would benefit from digging deeper and becoming, at least, savvy recipients of probabilistic arguments prepared by others. The analysts themselves need a deeper understanding than their audience. With the software available today in the broad categories of data science or machine learning, however, they don’t need to master 1,000 pages of math in order to apply probability theory, any more than you need to understand the mechanics of gearboxes to drive a car.

It wasn’t the case in earlier decades, when you needed to learn the math and implement it in your own code. Not only is it now unnecessary, but many new tools have been added to the kit. You still need to learn what the math doesn’t tell you: which tools to apply, when and how, in order to solve your actual problems. It’s no longer about computing, but about figuring out what to compute and acting on the results.

Following are a few examples that illustrate these ideas, and pointers on concepts I have personally found most enlightening on this subject. There is more to come, if there is popular demand.

The Value Of Surveys: A Debate With Joseph Paris

Joseph Paris and I debated this issue in the Operational Excellence group on LinkedIn, where he started a discussion by posting the following:

“Riddle me this…

If the Japanese way of management and their engagement with employees is supposedly the best, yielding the best result, why is there such a lack of trust among employment across the spectrum; employers, bosses, teams/colleagues. From Bloomberg and EY.

Japanese Workers Really Distrust Their Employers

Lifetime employment sounds like a great thing, but not if you hate where you work. That seems to be the plight of Japanese “salarymen” and “office ladies.” Only 22 percent of Japanese workers have “a great deal of trust” in their employers, which is way below the average of eight countries surveyed, according to a new report by EY, the global accounting and consulting firm formerly known as Ernst & Young. And it’s not just the companies: Those employees are no more trusting of their bosses or colleagues, the study found.

Introduction to R for Excel Users | Thomas Hopper | R-bloggers

“…The quality of our decisions in an industrial environment depends strongly on the quality of our analyses of data. Excel, a tool designed for simple financial analyses, is often used for data analysis simply because it’s the tool at hand, provided by corporate IT departments who are not trained in data science.

Unfortunately, Excel is a very poor tool for data analysis and its use results in incomplete and inaccurate analyses, which in turn result in incorrect or, at best, suboptimal business decisions. In a highly competitive, global business environment, using the right tools can make the difference between a business’ survival and failure. Alternatives to Excel exist that lead to clearer thinking and better decisions. The free software R is one of the best of these…”

Booze, bonks and bodies | The Economist

The various Bonds are more different than you think

Michel Baudin’s comments:

Once hailed by Edward Tufte as purveyor of the most sophisticated graphics in the press, Britain’s “The Economist” has apparently surrendered to the dictatorship of the stacked-bars.

Where Have The Scatterplots Gone?

What passes for “business intelligence” (BI), as advertised by software vendors, is limited to basic and poorly designed charts that fail to show interactions between variables, even though the use of scatterplots and elementary regression is taught to American middle schoolers and to shop floor operators participating in quality circles.

But the software suppliers seem to think that it is beyond the cognitive ability of executives. Technically, scatterplots are not difficult to generate, and there are even techniques to visualize more complex interactions than between pairs of variables, like trendalyzers or 3D scatterplots. And, of course, visualization is only the first step. You usually need other techniques to base any decision on data.

“Studies show…” or do they?

Various organizations put out studies that, for example, purport to “identify performances and practices in place among U.S. manufacturers.” The reports contain tables and charts, with narratives about “significant gaps” (without stating any level of significance) or “exponential growth” (as if there were no other kind). They borrow the vocabulary of statistics or data science, but don’t actually use the science; they just use the words to support sweeping statements about what manufacturers should do for the future.

At the bottom of the reports, there is usually a paragraph about the study methodology, explaining that the data was collected as answers to questionnaires mailed to manufacturers and made available online, with the incentive for recipients to participate being a free copy of the report. The participants are asked, for example, to rate “the importance of process improvement to their organization’s success over the next five years” on a scale of 1 to 5.

The results are a compilation of subjective answers from a self-selected sample. In marketing, this kind of survey makes sense. You throw out a questionnaire about a product or a service. The sheer proportion of respondents gives you information about the level of interest in what you are offering, and the responses may further tell you about popular features and shortcomings.

But it is not an effective approach to gauge the state of an industry. For this purpose, you need objective data, either on all companies involved or on a representative sample that you select. Government bodies like the Census Bureau or the Bureau of Labor Statistics collect useful global statistics like value-added per employee or the ratio of indirect to direct labor by industry, but they are just a starting point.
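The difference between a self-selected sample and a sample chosen by the surveyor can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population of ratings, the response probabilities, and the function name `simulate_survey` are all hypothetical and not taken from any actual study.

```python
import random
import statistics

def simulate_survey(population_size=10_000, seed=0):
    """Compare a self-selected sample with a random sample of the same size."""
    rng = random.Random(seed)
    # Hypothetical population: each company rates a topic from 1 to 5.
    ratings = [rng.randint(1, 5) for _ in range(population_size)]
    # Assumed response behavior: the higher a company rates the topic,
    # the more likely it is to send the questionnaire back.
    response_prob = {1: 0.02, 2: 0.04, 3: 0.08, 4: 0.16, 5: 0.32}
    self_selected = [r for r in ratings if rng.random() < response_prob[r]]
    # A sample of the same size, drawn at random by the surveyor.
    random_sample = rng.sample(ratings, len(self_selected))
    return (statistics.mean(ratings),
            statistics.mean(self_selected),
            statistics.mean(random_sample))

true_mean, self_selected_mean, random_mean = simulate_survey()
print(f"population mean:    {true_mean:.2f}")
print(f"self-selected mean: {self_selected_mean:.2f}")
print(f"random-sample mean: {random_mean:.2f}")
```

Under these assumed response rates, the self-selected mean should land well above the population mean, while the random sample of the same size stays close to it. The size of the gap depends entirely on the assumed response behavior, which is exactly what a real survey report cannot tell you.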

Going beyond them is so difficult that I don’t know of any successful case. Any serious assessment of a company or factory requires visiting it, interviewing its leaders in person, and reviewing its data. It takes time, money, know-how, and a willing target. It means that the sample has to be small, but there is a clash between the objective of having a representative sample and the constraint of having a sample of the willing.

For these reasons, benchmarking is a more realistic approach, and I know of at least two successful benchmarking studies in manufacturing, both of which, I believe, were funded by the Sloan Foundation:

  • The first was the International Assembly Plant Study, conducted in the late 1980s about the car industry, whose findings were summarized in The Machine That Changed The World in 1990. The goal was not to identify the distribution of manufacturing practices worldwide but to compare the approaches followed in specific plants of specific companies, for the purpose of learning. Among other things, the use of the term “Lean” came out of this study.
  • The second is the Competitive Semiconductor Manufacturing Program, which started in the early 1990s with a benchmarking study of wafer fabrication facilities worldwide. It did not have the public impact of the car assembly plant study, but it did provide valuable information to industry participants.

The car study was conducted out of MIT; the semiconductor study, out of UC Berkeley. Leadership from prestigious academic organizations helped in convincing companies to participate and provided students to collect and analyze the data. Consulting firms might have had better expertise, but could not have been perceived as neutral with respect to the approaches used by the different participants.

The bottom line is that studies based on subjective answers from a self-selected sample are not worth the disk space you can download them onto.

Betting on Lean, or …. Analytics versus Empowerment | Bill Waddell

“Management is all about playing the odds. […]  In operations, calculate lot sizes, generate forecasts and set quality standards with enough data and increasingly sophisticated algorithms and statistical methods and you will increase the chances of coming close enough.  At least that is the theory, and the hope.

This is the basic premise of big data and ERP.  With point of sale scanning, RFID, smart phones and all of the other data collecting technologies increasingly in use, the data to feed the engines is more and more available.  The potential and the lure of the data driven, analytical approach to finding the center line and getting more decisions closer to correctness is growing.

The other approach is empowered people.  Recognizing that management cannot be involved in every one of the individual customer interactions and operational, situational, tiny decisions, those calls are left to the people on the spot.  They are expected to rely on their knowledge, understanding of company values and goals, and the information available to them in very real time to decide what to do.[…] The basic question is whether empowered people will get it right more often than big computer.”

Michel Baudin’s insight:

In this article, Bill Waddell presents the data-driven approach to management decision making as contradictory to people empowerment. I do not see these as mutually exclusive.

In 1993, there was a group within Toyota’s logistics organization in the US that, based on weather data, thought that the Mississippi might flood the railroad routes used to ship parts from the Midwest to the NUMMI plant in California. Four days before the flood, they reserved all the trucking available in the Chicago area, for the daily cost of 6 minutes of production at NUMMI. When the flood hit, they were able to ship the parts by truck around the flood zone, and NUMMI didn’t miss a beat.

This is what a good data scientist does.

In Numbersense, Kaiser Fung points out that data analysis isn’t just about the data, but also about the assumptions people make about it. As an example, he cites the Republican polling fiasco of the 2012 election, which he attributes to the combination of flawed data collection and equally flawed modeling.

In other words, it’s not a computer that comes up with answers from data, but a human being, and the quality of these answers depends as much on the human analyst’s understanding of the underlying reality as it does on the ability to collect clicks from the web or transactions from point-of-sale systems.

Good data analysis does not require petabytes of data. In statistics, a small sample is 10 points; a large sample, 100 points. The difference matters because, with small samples, there are many convenient approximations that you cannot make. But 100 points is plenty for these approximations to work.
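As one illustration of such an approximation, chosen here as an example rather than taken from the discussion above, consider replacing the exact binomial distribution of a count with a normal distribution. The sketch below measures the largest gap between the two cumulative distributions for 10 and for 100 points. The success probability of 0.3 and the function name `max_cdf_error` are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def max_cdf_error(n, p):
    """Largest gap between the exact binomial CDF and its normal approximation."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        # Accumulate the exact probability of observing at most k successes.
        exact_cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        # Continuity correction: evaluate the normal CDF at k + 0.5.
        worst = max(worst, abs(exact_cdf - approx.cdf(k + 0.5)))
    return worst

error_small = max_cdf_error(10, 0.3)   # small sample
error_large = max_cdf_error(100, 0.3)  # large sample
print(f"max CDF error with  10 points: {error_small:.4f}")
print(f"max CDF error with 100 points: {error_large:.4f}")
```

The error with 100 points should come out markedly smaller than with 10. How small is small enough depends on the decision the numbers are meant to support.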

With millions of points, the tiniest wiggle in your data will show overwhelming significance in any statistical test, which means that these tests are not much use in that context. To figure out what this tiny wiggle is telling you about reality, however, you still need to understand the world the data is coming from.
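A quick calculation shows the effect of sample size on significance. This is a minimal sketch, assuming a one-sample z-test with a known standard deviation and a hypothetical shift of 1% of that standard deviation; the function name `two_sided_p_value` is made up for the example.

```python
import math

def two_sided_p_value(effect_in_sd, n):
    """Two-sided p-value of a one-sample z-test for a shift in the mean.

    effect_in_sd is the observed shift of the sample mean, expressed in
    units of the (assumed known) standard deviation of individual points.
    """
    z = effect_in_sd * math.sqrt(n)
    # erfc keeps precision for very small tail probabilities.
    return math.erfc(abs(z) / math.sqrt(2))

wiggle = 0.01  # hypothetical shift: 1% of a standard deviation
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p_value(wiggle, n):.3g}")
```

The same shift that is indistinguishable from noise with 100 points becomes overwhelmingly "significant" with a million, even though its practical meaning has not changed.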

I don’t see an opposition between relying on people and relying on data, because, whether you realize it or not, you are never relying on data, only on people’s ability to make sense of it.
