yardstick

The staying power of bad metrics


A speaker I once heard on manufacturing metrics started with a quote from football coach Vince Lombardi: “If you’re not keeping score, you’re only practicing.” In a sport, your score or your rank is, by definition, the correct measure of success, and we assume too easily that this kind of thinking crosses over to every human endeavor, from national economies to plant performance or education. In this process, we begin using highly aggregated metrics as if they were physical measurements like mass or speed, and avert our eyes from how these sausages are made.

Following are a few egregious examples:

  • GDP. Gross Domestic Product (GDP) is in the news every day. If you pollute and spend money to clean up your toxic waste, you contribute more to the GDP than if you produce cleanly. Because of this kind of absurdity, GDP as a metric has been criticized by many economists, including Joseph Stiglitz. In 2009, he even convinced French president Nicolas Sarkozy to seek alternatives. Yet, two years later, the same president is pushing to include in the country’s constitution a “Golden Rule” that caps budget deficits at a percentage of the same flawed GDP!
  • IQ. In the US, IQ is still widely treated as a measure of intelligence. On its face, the notion that human intelligence is reducible to a number is an insult to its subject. In fact, all an IQ score measures is the ability to take an IQ test. Psychologists recognize this, but many school teachers and the public at large don’t. (See Stephen Jay Gould’s The Mismeasure of Man.)
  • Food calories. Calories are the most commonly used metric in nutrition. What this number actually represents is the heat generated by drying and burning a food item. But is digestion the same as combustion? Obviously not for fibers, for example, which pass through the human body undigested. The absurdity of assigning calories to fibers has not escaped one dieter, who questioned it on a Calorie Count forum and received, among other replies, the following:

Fiber calories are included in nutrition information, but only in some countries. In the US, it is legal to not put in fiber calories because they are not digestible. Therefore, they do not “count” as such. However, if you, like most people, tend to underestimate cals slightly, there is nothing wrong with including them to create a “buffer zone.”

In other words, it makes no sense but you should pretend it does.

Do we behave the same way in the manufacturing world? Yes. For example, many companies measure productivity in terms of Sales/Employee. There is an easy way to boost this metric: outsource all production, close all plants and become a trading company. It is not easy to find metrics for quality, cost, delivery, safety and morale that are meaningful and cannot be gamed, but it can be done. For overall company productivity, for example, you can use Value added/Employee, where

Value added = Sales – (Materials + Energy + Outsourced Services)

This is what Peter Drucker called Contributed Value. Value added/Employee is not a perfect metric, but at least it does not provide a perverse incentive to outsource, and the US Census Bureau publishes statistics on value added and employment by industry that are helpful for benchmarking.
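To make the contrast concrete, here is a minimal Python sketch with made-up figures for one hypothetical company, before and after it outsources production to a contract manufacturer whose invoice covers materials, labor, and margin. The `Company` class and all the numbers are assumptions for illustration, not data from any real firm.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """Hypothetical annual figures for one company, in any single currency."""
    sales: float
    materials: float
    energy: float
    outsourced_services: float
    employees: int

    def sales_per_employee(self) -> float:
        return self.sales / self.employees

    def value_added(self) -> float:
        # Value added = Sales - (Materials + Energy + Outsourced Services)
        return self.sales - (self.materials + self.energy + self.outsourced_services)

    def value_added_per_employee(self) -> float:
        return self.value_added() / self.employees


# Made-up figures for the same business before and after outsourcing production.
in_house = Company(sales=100_000_000, materials=40_000_000, energy=5_000_000,
                   outsourced_services=5_000_000, employees=500)
# After outsourcing, the contractor buys the materials, and its invoice
# (materials + its labor + its margin) shows up as outsourced services.
outsourced = Company(sales=100_000_000, materials=0, energy=1_000_000,
                     outsourced_services=92_000_000, employees=100)

for label, company in [("in-house", in_house), ("outsourced", outsourced)]:
    print(f"{label}: sales/employee = {company.sales_per_employee():,.0f}, "
          f"value added/employee = {company.value_added_per_employee():,.0f}")
```

With these figures, Sales/Employee jumps fivefold after outsourcing, from 200,000 to 1,000,000, while Value added/Employee drops from 100,000 to 70,000. If the contractor were cheap enough, Value added/Employee could rise too, but then outsourcing would actually be the better business decision.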

Following are a few conditions that a good metric must meet:

  1. A good metric is immediately understandable. No training or even explanation is required to figure out what it means, and the number maps directly to reality, free of any manipulation. One common manipulation is to assume that a particular ratio cannot possibly exceed 85%, and to redefine 85% for this ratio as “100% performance.” While this makes performance look better, it also makes the number misleading and difficult to interpret.
  2. People see how they can affect the outcome. With a good metric, it is also easy to understand what kinds of actions can affect its value. A shop floor metric, for example, should not be a function of the price of oil on the world market, because there is nothing the operators can do to affect it. Their actions, on the other hand, can affect the number of labor-hours required per unit, or the rework rate.
  3. A better value for the metric always means better business performance for the company. This is one of the most difficult characteristics to guarantee. Equipment efficiency measures are notorious for failing in this area, because maximizing them often leads to overproduction and WIP accumulation.
  4. The input data of the metric should be easy to collect. Lead time statistics, for example, require entry and exit timestamps by unit of production. The difference between these times only gives you the lead time in calendar time, not in work time. To get lead times in work time, you then have to match the timestamps against the plant’s work calendar. Lead time information, however, can be inferred from WIP and WIP age data, which can be collected by direct observation of WIP on the shop floor. Metrics of WIP, therefore, contain essentially the same information but are easier to calculate. (See Little’s Law.)
  5. All metrics should have the appropriate sensitivity. If daily fluctuations are not what is of interest, then they need to be filtered out. A common method for doing this is to plot 5-day moving averages instead of individual values — that is, the point plotted today is the average of the values observed in the last five days. Daily fluctuations are smoothed away, but weekly trends stand out.
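Two of these conditions involve simple arithmetic. The Python sketch below shows Little’s Law used to infer average lead time from WIP counts (average WIP = throughput × average lead time, which holds for long-run averages in a reasonably stable process), and a trailing 5-day moving average. The function names and sample numbers are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def littles_law_lead_time(avg_wip: float, throughput_per_day: float) -> float:
    """Average lead time from Little's Law: lead time = WIP / throughput.

    When throughput is counted per working day, the result is in working days.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_day


def moving_average(values: Iterable[float], window: int = 5) -> Iterator[float]:
    """Trailing moving average: each output is the mean of the last `window` values.

    Nothing is emitted until `window` values have been seen.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque = deque(maxlen=window)  # oldest value drops out automatically
    for v in values:
        recent.append(v)
        if len(recent) == window:
            yield sum(recent) / window


if __name__ == "__main__":
    # Hypothetical: 1,200 units counted in WIP, 400 units completed per working day.
    print(littles_law_lead_time(1200, 400))    # 3.0
    daily_output = [10, 12, 8, 11, 9, 30, 10]
    print(list(moving_average(daily_output)))  # [10.0, 14.0, 13.6]
```

The one-day spike to 30 moves the moving average from 10.0 to 14.0 rather than tripling the plotted value, which is the smoothing the text describes.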

Peter Drucker sold corporate America on the idea that you can’t manage what you can’t measure, and this has led many managers to believe that employees would do whatever it takes to maximize their scores. Given flawed metrics, it is fortunate for the companies that these managers were wrong. If they had been right, all the companies that measure productivity in terms of Sales/Employee would actually have outsourced all production. They didn’t, because metrics are only one of many factors influencing behavior. Most employees, at all levels, will not maximize their metrics through actions they feel violate common sense or are inconsistent with their personal ethics.

Taxis waiting for people and people waiting for taxis

Waiting for each other


We have all seen the absurd situation in the featured picture above of a line of customers waiting for taxis while a line of taxis next to them is waiting for customers, with a barrier separating them. This particular instance is from The Hopeful Traveler blog. The cabs are from London, but the same scene could have been shot in many other major world cities.

I am sure we have all encountered similar situations in other circumstances, some easier to resolve than others. One case where it should be easy is the restaurant buffet. Figure 1 shows a typical scene in buffet restaurants, with a line of people waiting to get food, all on one side of the table, while food is waiting and accessible on the opposite side.

Figure 1. A typical buffet

I think the fundamental mistake is the assumption that a buffet is like an assembly line, providing sequential access to dishes. This means that you cannot get to the Aloo Gobi until the person in front of you is done with the Tandoori. The ideal buffet would instead provide random access, meaning that each customer would have immediate access to all dishes at all times. While this may not be fully feasible, you can get much closer to it than with the linear buffet. The following picture shows an alternative, non-sequential organization of a buffet in circular islands.

Figure 2. A buffet island at the Holiday Inn in Visalia, CA

The limitation of this concept is that replenishment by waiters can interfere with customers. To avoid this, you would want dishes to be replenished from inside the circle while customers help themselves on the outside, as in the following sketch:

Figure 3. A buffet island with replenishment from inside

One problem with the circular buffet island, however, is its lack of modularity. You can add or remove whole islands, but you cannot expand or shrink an island, which you can do if you use straight tables arranged in a U-shape, as in Figure 4.

Figure 4. Buffet island with straight tables

This buffet island may superficially look like a manufacturing cell, but it is radically different. Its purpose is random access to food as opposed to sequential processing of work pieces, and the materials do not flow around the cell but from the inside out.
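A toy model can make the difference between sequential and random access visible. In this Python sketch, which rests on assumed simplifications rather than anything measured at a real buffet, all customers are in line at time zero, each spends a made-up time at each dish (zero for dishes skipped), and on the linear buffet nobody can pass the person in front. It ignores blocking (a customer lingering at one dish because the next one is occupied), so if anything it understates the waiting. The random-access case assumes unlimited room around the island.

```python
def sequential_times(service: list[list[float]]) -> list[float]:
    """Time each customer needs to get through a linear buffet with no passing.

    service[i][j] is how long customer i spends at dish j (0 if skipped).
    Customer i starts dish j only after finishing dish j-1 and after
    customer i-1 has finished dish j. Everyone is in line at time 0.
    """
    if not service:
        return []
    n, m = len(service), len(service[0])
    finish = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            own_ready = finish[i][j - 1] if j > 0 else 0.0
            front_done = finish[i - 1][j] if i > 0 else 0.0
            finish[i][j] = max(own_ready, front_done) + service[i][j]
    return [row[-1] for row in finish]


def random_access_times(service: list[list[float]]) -> list[float]:
    """Idealized island: every customer reaches every dish without waiting."""
    return [sum(row) for row in service]


if __name__ == "__main__":
    # Customer 0 spends 2 minutes at dish 0 (say, the Tandoori), skips dish 1,
    # and spends 1 minute at dish 2; customer 1 only wants dish 1 (the Aloo Gobi).
    service = [[2.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]]
    print(sequential_times(service))     # [3.0, 3.0]
    print(random_access_times(service))  # [3.0, 1.0]
```

Customer 1 wants a single one-minute dish but needs three minutes on the linear buffet, because it cannot reach the Aloo Gobi until customer 0 is done with the Tandoori.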

Such are the thoughts going through my mind while munching on the Naan at Darbar.

Mining operations

Data Mining in Manufacturing versus the Web


Data mining, in general, is the retrieval of information from data collected for a different purpose, such as using sales transaction histories to infer what products tend to be bought together. By contrast, design of experiments involves collecting observations for the purpose of confirming or refuting hypotheses.

This definition of data mining by its purpose is consistent with the literature, but most authors go further. They include in their definitions that data mining is done with computers, using large databases and specific analytical tools, which I think is too restrictive. The tools they list are the ones they have found useful in analyzing the behavior of millions of users of search engines or e-commerce websites, and they are not obviously applicable in other areas, such as manufacturing.

During World War II, British analysts used the serial numbers of captured or destroyed German tanks to estimate the numbers produced. Because serial numbers were not attached for this purpose, it was data mining. It used clever statistical models but, obviously, no computers.
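The estimator usually presented with this story, often called the German tank problem, fits in a few lines. Assuming serial numbers run from 1 to N and the observed ones are a uniform random sample without replacement, the minimum-variance unbiased estimate of N is m + m/k - 1, where m is the largest serial number observed and k the number of tanks observed. This Python sketch is the textbook simplification, not a reconstruction of the wartime analysts’ own models, and the serial numbers in the example are made up.

```python
def estimate_total_produced(serials: list[int]) -> float:
    """Textbook German tank estimate of the total number produced.

    Assumes serials run 1..N and `serials` is a uniform random sample drawn
    without replacement, so duplicate observations are ignored.
    """
    observed = set(serials)
    if not observed:
        raise ValueError("need at least one serial number")
    k = len(observed)  # number of distinct tanks observed
    m = max(observed)  # largest serial number observed
    return m + m / k - 1


if __name__ == "__main__":
    print(estimate_total_produced([19, 40, 42, 60]))  # 74.0
```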

Today, PhD-level data miners at Google, eBay, or Amazon sift through the page views and click-throughs of millions of users for clues to patterns they can use. The data is collected automatically, is accurate, and accumulates by the terabyte every day. This “big data” requires parallel processing on clusters of computers and lends itself to the most advanced analytical tools ever developed.

Compared to this fire hose of data, what manufacturing produces is a trickle. In a factory, the master data/technical specs, plans and schedules, status of operations and work in process, and the history of production over, say, 12 months, usually add up to a few gigabytes. That doesn’t fit on one spreadsheet, but it often does on a memory stick. On the other hand, much of it is still manually generated and therefore contains errors, and it is often structured in ways that make it difficult to work with.

Even if manufacturing companies could hire the data miners away from their current jobs, their experience with e-commerce or web search would not have prepared them well for the different challenges of manufacturing data mining.

There is an opportunity for data mining to contribute to competitiveness in manufacturing, but the approach must start from the needs. It must not be an e-commerce cure in search of manufacturing diseases.

Steam locomotive and typewriter

The steam locomotive and the typewriter


The first draft of my book Working with Machines contained a chapter that was a post-mortem on two obsolete machines. It was cut on the grounds that, unlike all other chapters, it was not actionable for the reader.

Its abstract is as follows:

The steam locomotive and the typewriter are icons of the industrial age, and their parallel histories show different aspects of the human experience of working with machines. The steam locomotive is fondly remembered; the typewriter, all but forgotten except for the QWERTY keyboard. The steam engine participated in the development of every industrial economy, but the typewriter played no major role in Japan. The typewriter did not demonstrably improve the productivity or quality of office output, but was adopted only because of its image of modernity.

Locomotive driver was a prestigious position for a manual laborer, but typist never was. Compared to electrics and diesels, the steam locomotive had a cab that was exposed to the elements and to the heat of the firebox, and was therefore uncomfortable, difficult to operate, and dangerous. Yet engineers and firemen preferred it to the tedium and loneliness of modern locomotives. Automatic machines that require human attention only when they malfunction are also found in airplanes and in manufacturing plants, challenging the job designer to keep the operator alert and efficiently used.

As the typewriter prints one keystroke at a time, typists were always busy with a single machine and determined both its productivity and output quality. Typists worked in comfortable places, but under pressure, and faced the long-term hazards of sedentary work. The typewriter’s main legacy is that a society can make a long-term investment in machines whose tangible benefits do not obviously exceed their costs.


Manual data collection at end of shift

A management perspective on data quality


Prof. Mei-Chen Lo, of National University and Kainan University in Taiwan, worked with Operations Managers in two semiconductor companies to establish a list of 16 dimensions of data quality. Most are not parameters that can be measured, and should instead be considered as questions to ask about a company’s data. I learned about this list from her at an IE conference in Kitakyushu in 2009, and found it useful by itself as a checklist for a thorough assessment of a current state. Her research is about methods for ranking the importance of these criteria.

They are grouped in four main categories:

  1. Intrinsic. Agreement of the data with reality.
  2. Context. Usability of the information in the data to support decisions or solve problems.
  3. Representation. The way the data is structured, or not.
  4. Accessibility. The ability to retrieve, analyze and protect the data.

Each category breaks further down as follows:

  1. Intrinsic quality
    • Accuracy. Accuracy is the most obvious issue, and is measurable. If the inventory data says that slot 2-3-2 contains two bins of screws, can we be confident that, if we walk to aisle 2, column 3, level 2 in the warehouse, we will actually find two bins of screws?
    • Fact or judgement. That slot 2-3-2 contains two bins of screws is a statement of fact. Its accuracy is in principle independent of the observer. On the other hand, “Operator X does not get along with teammates” is a judgement made by a supervisor and cannot carry the same weight as a statement of fact.
    • Source credibility. Is the source of the data credible? Credibility problems may arise due to the following:
      • Lack of training. For example, measurements that are supposed to be taken on “random samples” of parts are not, because no one in the organization knows how to draw a random sample.
      • Mistake-prone collection methods. For example, manually collected measurements are affected by typing errors.
      • Conflicts of interest. Employees collecting data stand to be rewarded or punished depending on the values of the data. For example, forecasters are often rewarded for optimistic forecasts.
    • Believability of the content. Data can be unbelievable because it is valid news of extraordinary results, or because it is inaccurate. In either case, it warrants special attention.
  2. Context
    • Relevance. Companies often collect data because they can, rather than because it is relevant. It is the corporate equivalent of looking for keys at night under the street light rather than next to the car. In the semiconductor industry, where this list of criteria was established, measurements are routinely taken after each step of the wafer process and plotted in control charts. This data is relatively easy to collect but of little relevance to the control and improvement of the wafer process as a whole. Most of the relevant data cannot be captured until the circuits can be tested at the end of the process.
    • Value added. Some of the data produced in a plant has a direct economic value. Aerospace or defense goods, for example, are delivered with documentation containing a record of their production process, and this data is part of the product. More generally, the data generated by commercial transactions, such as orders, invoices, shipping notices, or receipts, is at the heart of the company’s business activity. This is to be contrasted with data generated to satisfy internal needs, such as the number of employees trained in transaction processing on the ERP system.
    • Timeliness. Is the data available early enough to be actionable? A field failure report on a product, caused by a manufacturing process as it was 6 months ago, is not timely if this process has since been the object of two engineering changes.
    • Completeness. Measurements must be accompanied by all the data characterizing where, when, how and by whom they were collected and in what units they are expressed.
    • Sufficiency. Does the data cover all the parameters needed to support a decision or solve a problem?
  3. Representation
    • Interpretability. What inferences can you draw directly from the data? If the demand for an item has been rising 5%/month for the past 18 months, it is no stretch to infer that this trend will continue next month. On the other hand, if you are told that a machine has an Overall Equipment Effectiveness (OEE) of 35%, what can you deduce from it? The OEE is the product of three ratios: availability, yield, and actual over nominal speed. The 35% figure may tell you that there is a problem, but not where it is.
    • Ease of understanding. Management accounting exists to support decision making by operations managers. Yet the reports provided to managers are often in a language they don’t understand. This need not be the case, and financial officers like Orrie Fiume have modified the vocabulary used in these reports to make them easier for actual managers to understand. The understandability of technical data can also be impaired when engineers use cryptic abbreviations instead of plain language.
    • Conciseness. A table with 100 columns and 20,000 rows with 90% of its cells empty is a verbose representation of a sparse matrix. A concise representation would list only the non-empty cells, each with its row ID, column ID, and value.
    • Consistency. Consistency problems often arise as a result of mergers and acquisitions, when the different data models of the companies involved need to be mashed together.
  4. Accessibility
    • Convenience of access. Data that an end-user can retrieve directly through a graphic interface is conveniently accessible; data in paper folders on library shelves is not. Neither are databases in which each new query requires the development of a custom report by a specially trained programmer.
    • Usability. High-usability data, for example, comes in the form of lists of property names and values that can easily be tabulated into spreadsheets or database tables and, from that point on, selected, filtered and summarized in a variety of informative ways. Low-usability data often comes in the form of a string of characters that first needs to be split, with characters 1 to 5 being one field, 6 to 12 another, etc., and the meaning of each of these substrings needs to be retrieved from a correspondence table, to find, for example, that ‘00at3’ means “lime green.”
    • Security. Manufacturing data contain some of the company’s intellectual property, which must be protected not only from theft but from inadvertent alterations by unqualified employees. But effective security must also be provided efficiently, so that qualified, authorized employees are not slowed down by security procedures when accessing data.
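The OEE example under Interpretability can be sketched in Python. Two hypothetical machines, with made-up ratios, both come out at 35% OEE for entirely different reasons, which is why the aggregate alone does not say where to look. The record layout and the `weakest_factor` helper are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineRecord:
    """Hypothetical performance data for one machine over some period."""
    name: str
    availability: float  # running time / planned time
    speed_ratio: float    # actual speed / nominal speed
    yield_rate: float     # good units / units produced

    def oee(self) -> float:
        # OEE is the product of the three ratios.
        return self.availability * self.speed_ratio * self.yield_rate

    def weakest_factor(self) -> str:
        # The component with the lowest ratio is the first place to look.
        factors = {
            "availability": self.availability,
            "speed": self.speed_ratio,
            "yield": self.yield_rate,
        }
        return min(factors, key=factors.get)


machines = [
    MachineRecord("press A", availability=0.50, speed_ratio=0.875, yield_rate=0.80),
    MachineRecord("press B", availability=0.875, speed_ratio=0.80, yield_rate=0.50),
]

for machine in machines:
    print(f"{machine.name}: OEE {machine.oee():.0%}, "
          f"weakest factor: {machine.weakest_factor()}")
```

Both machines print an OEE of 35%, but press A is mostly down while press B mostly makes scrap. Even the components only narrow the search; they do not explain the causes.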

Prof. Mei-Chen Lo’s research on this topic was published as “The assessment of the information quality with the aid of multiple criteria analysis,” European Journal of Operational Research, Volume 195, Issue 3, 16 June 2009, pages 850-856.
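The Conciseness and Usability items in the list above describe transformations that are easy to sketch. In this Python example, `to_triples` keeps only the non-empty cells of a table as (row, column, value), and `parse_fixed_width` splits a record at the character positions described under Usability (characters 1 to 5, then 6 to 12) and decodes the first field through a correspondence table. The layout, the part number, and the one-entry code table are hypothetical.

```python
from typing import Any, Optional

# Hypothetical record layout: field name -> (start, end) as 0-based slice bounds,
# so characters 1 to 5 are [0:5] and characters 6 to 12 are [5:12].
LAYOUT = {"color_code": (0, 5), "part_no": (5, 12)}

# Hypothetical correspondence table for the coded field.
COLOR_CODES = {"00at3": "lime green"}


def to_triples(table: list[list[Optional[Any]]]) -> list[tuple[int, int, Any]]:
    """Concise form of a sparse table: (row, column, value) for non-empty cells only."""
    return [
        (r, c, value)
        for r, row in enumerate(table)
        for c, value in enumerate(row)
        if value is not None
    ]


def parse_fixed_width(record: str) -> dict[str, str]:
    """Split a fixed-width record into named fields and decode the color code."""
    fields = {name: record[start:end].strip() for name, (start, end) in LAYOUT.items()}
    fields["color"] = COLOR_CODES.get(fields["color_code"], "unknown")
    return fields


if __name__ == "__main__":
    sparse = [[None, 3, None],
              [None, None, None],
              [7, None, None]]
    print(to_triples(sparse))  # [(0, 1, 3), (2, 0, 7)]
    # {'color_code': '00at3', 'part_no': 'P-10442', 'color': 'lime green'}
    print(parse_fixed_width("00at3P-10442"))
```

Once records are parsed into named fields like this, they can be tabulated, filtered, and summarized like the high-usability data described above.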

Dashboard assembly stations

Why production matters


In the LinkedIn Lean Business Process group, Ralph Bartelmann asked the following:

Does it really matter to squeeze the last cent out of production? In many environments, production costs represent a minor part of the overall product cost. Following Pareto reasoning, it seems more reasonable to work on other parts of the value stream, like supplier development, product design, etc. What is your opinion and experience? What are the real challenges?

Following is my answer:

It’s not about what production costs but about what it does for the business. Improving production is about making it faster, better, safer, less tedious,… and cheaper. It needs to be faster to make you more responsive, better so that production does not introduce defects that harm your reputation, safer and less tedious so that you can retain your work force and grow its skills.

If you improve on all these fronts, guess what? Your costs go down, and not only in production but in other parts of the value stream too, because they are not independent of production. For example, there is no point in trying to develop just-in-sequence suppliers unless you practice leveled sequencing (a.k.a. Heijunka) in your assembly line.

A manufacturing company ignoring production is an army ignoring combat on the grounds that more money is spent moving soldiers and keeping them supplied.

To get to Dublin, you don't start from here

Kaizen without prerequisites


Question from Pedro Burgos: You have an assembly line where ‘Standardized Work’ is not utilized. The operators have ‘work instructions’ and follow them. They do not miss steps. But operators differ in the sequence in which they perform the work. All of them perform under takt time, some more than others, due to ‘wastes’ of motion. You are asked to do a Kaizen to reduce inventory quantities and space. Inventory quantities on the production floor vary: some parts have 3 days of inventory, some have 2 days, some have 1 day (16 hours). The goal is a ‘kanban’ of 2 hours of inventory in the assembly line. Do you think it is possible to do this Kaizen? (Discussion in the LinkedIn Global Lean & Six Sigma Network)

Your question reminds me of the old story about asking a farmer in the countryside of Ireland how to get to Dublin, and getting the answer that “you don’t start from here.” In fact, regardless of its current state, a production line can always be improved.

Looking further at the way you describe the challenge, however, several things disturb me. First, while Kaizen, meaning improvement, is always possible, “a Kaizen” seems to designate a Kaizen event, a project template which is not always appropriate. Second, your description suggests opportunities for productivity and quality improvement, but neither is stated as an objective.

A top-down mandate to use a particular method to pursue two goals like inventory and space reduction does not strike me as a recipe for success. Instead, someone needs to do the following:

  1. Take a close look at this assembly line.
  2. Identify improvement opportunities in all dimensions of its performance.
  3. Select one or two to pursue in priority based on improvement potential, technical feasibility, and human feasibility.
  4. Determine the project management approach based on content, rather than the other way around.

This approach is much more likely to achieve the desired inventory and space reductions, among other improvements, than just running a Kaizen event to pursue these objectives directly and exclusively.

Continuous improvement from bolts to clamps

The Role of Technology in Continuous Improvement


In an article on this topic in IndustryWeek today, Ralph Keller asserts that Continuous Improvement is focused on business processes rather than technology.

However, if you wrap tinfoil around the feet of a welding fixture to make it easier to clean, replace bolts with clamps on a machine to reduce setup time, or mount a hand tool on the machine on which it is used, it usually counts as Continuous Improvement but involves technical changes to work that I don’t think anyone would describe as business processes.

Yes, Continuous Improvement is done without expensive technology, but it does involve cheap technology.

Ralph Keller also reminds us that Continuous Improvement is not “rocket science,” which implies that it is easier. I agree that it is different, but not easier. I don’t know any rocket scientist with the skills to facilitate Continuous Improvement.