Industry 4.0: Without Stable Processes, Nothing Works | Wiegand’s Watch

This is a translation of Bodo Wiegand’s latest newsletter, about Lean in Germany, followed by my comments:

This week I was with a company that is on its way to implementing Industry 4.0. All machines were networked. The manager could see from his desk which machines were running and which were not. All data were collected centrally and also shown locally to the machine operator. The trend was easy to see: one third of the machines had a malfunction. With an average OEE of 62%, the machines clearly do not always run.

“As long as we buy new machines, we have to live with this,” was his answer to my question.

But machines, not only the newest but also the older ones, don’t need to be smeared with oil and dirty, even while generating chips. When we asked, the fire-fighting factor reported by the maintenance technicians was above 75%. The chief knew exactly: 76.6%. An OEE of 62% and 76.6% fire-fighting means, in plain language, that there are no stable processes in this business.

But what then drives an intelligent manager to network his whole company, only to find out that the processes are unstable? With some thought, he could have discovered this without networking and invested first in stabilizing the processes. Introducing Industry 4.0 on unstable processes will fail. The crucial question: how do I manage to stabilize the processes and avoid unplanned shutdowns?

The stability of the manufacturing process itself is the first requirement for the introduction of Industry 4.0. For this purpose, it is necessary to measure the stability of each system’s process and secure it by appropriate measures. The 6σ toolbox offers countless possibilities here. But where do I start? Which system comes next? Here the procedure of prioritizing investment from Lean Maintenance has proven itself, however, with the additional component of the causality principle. That is, based on the value stream, we have identified system X as priority 1, and now we need to find the root cause of its instability. It should be noted that this cause can also lie outside the system, for example in incoming materials.

Process stability is important not only for quality but also for the ability to plan. If my processes are unstable, I cannot plan properly. All plans for unstable processes are usually discarded every hour and therefore disrupt production. They cause delays and additional setups, increase safety stocks, and cause more transportation — all first-degree waste.

This brings us to the second approach: Avoiding unscheduled machine downtime. The value-stream oriented Lean Maintenance concept aims to bring maintenance downtime to zero as a maintenance strategy, and to establish an optimal maintenance organization to make processes predictable and reliable, while reducing maintenance costs. Concentrating on the most important equipment for the value-added process and increasing predictability to at least 95% helps conserve resources and make Maintenance into an important factor for Production.

The Lean Maintenance concept builds on the value stream and gives priority to the systems that are important for delivery to the customer at takt. The first priority, of course, is the bottleneck. For these machines, the following applies: if possible, no unplanned outages.

Now, how do you achieve this? For this, we have developed a classification of the components of these asset classes in terms of potential damage. That is, we break down each machine into its components and calculate each component’s probability of failure. If the failure of a component causes unplanned machine downtime, we develop an appropriate maintenance strategy for that component to avoid it.
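The classification described above can be sketched as a simple risk ranking: expected unplanned downtime per component is failure probability times the downtime a failure would cause. The component names and figures below are hypothetical, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    failure_prob_per_year: float     # estimated from failure history (illustrative)
    downtime_hours_if_failed: float  # unplanned downtime one failure would cause

def risk(c: Component) -> float:
    # Expected unplanned downtime per year: a simple risk priority measure
    return c.failure_prob_per_year * c.downtime_hours_if_failed

components = [
    Component("spindle bearing", 0.30, 24.0),
    Component("coolant pump", 0.80, 2.0),
    Component("servo drive", 0.10, 48.0),
]

# Rank components so the maintenance strategy is developed for the riskiest first
for c in sorted(components, key=risk, reverse=True):
    print(f"{c.name}: {risk(c):.1f} expected downtime hours/year")
```

With these made-up numbers, the spindle bearing tops the list even though the coolant pump fails more often, because its failures cost far more downtime.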

For example, for a component that wears out, you measure the degree of wear by means of sensors in order to determine the correct maintenance time. If this is not possible, you exchange these components preventively based on experience, according to a plan. With electrical components that fail unpredictably, you apply a strategy of redundancy for the important systems. To make damage from systems predictable and thus avoidable, we have measurement instruments like thermal imagers, vibration and acoustic gauges. It is about avoiding unplanned shutdowns to achieve predictability. If shutdowns still occur, everything must be done to keep the time to repair under the takt time.
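The strategy selection in the paragraph above follows a small decision rule: wear parts get condition monitoring where measurable, otherwise planned replacement; unpredictable failures on critical systems get redundancy. A minimal sketch, with the rule structure and wording my own, not Wiegand’s:

```python
def maintenance_strategy(wears_out: bool, wear_measurable: bool,
                         critical: bool) -> str:
    """Pick a maintenance strategy for one component (illustrative rule set)."""
    if wears_out:
        if wear_measurable:
            # e.g. vibration, thermal, or acoustic sensors on a wear part
            return "condition-based: monitor wear with sensors, schedule replacement"
        return "preventive: replace on a fixed schedule based on experience"
    # Components that fail unpredictably, such as electronics
    if critical:
        return "redundancy: install a stand-by unit for this component"
    return "run to failure: repair on breakdown, keep repair time under takt"

print(maintenance_strategy(wears_out=True, wear_measurable=True, critical=True))
```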

In addition, setting priorities among systems and classifying failure effects is the basis to develop strategies for the spare parts inventory and tooling.

To me, stable and predictable processes are a prerequisite for the realization of the vision of Industry 4.0. Companies that ignore this may be networked, but won’t be able to reap what they have sown.

Michel Baudin’s comments:

Wiegand’s main point is that it’s not worth networking all your machines just to find out how many of them are down on the average. In fact, for a manager to see this information on a screen in the office is not an improvement over seeing it as red Andon lights on the shop floor. This is especially true if it is expressed as OEE (Overall Equipment Effectiveness), an aggregate measure of performance that is not easy to interpret. In practice, to take any action, you end up having to break it down into its constituent factors of availability, performance, and quality.
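To see why an aggregate OEE is hard to interpret, note that it is the product of availability, performance, and quality rates, so very different plants can report the same number. The factor values below are made up for illustration; only the 62% aggregate comes from the article.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """OEE is the product of the availability, performance, and quality rates."""
    return availability * performance * quality

# Two hypothetical plants with the same aggregate OEE but different problems:
plant_a = oee(availability=0.70, performance=0.93, quality=0.95)  # downtime problem
plant_b = oee(availability=0.95, performance=0.67, quality=0.97)  # speed-loss problem

print(f"{plant_a:.2f}")  # prints 0.62
print(f"{plant_b:.2f}")  # prints 0.62
```

Acting on the number requires knowing which factor is dragging it down, which the aggregate alone does not tell you.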

Where Wiegand’s argument is less compelling is when he suggests that investing in advanced information systems is never a good idea when your challenge is stabilizing processes. While presented in generic terms, his example is about machines that get oily and dirty and produce “chips,” by which he means the bits of metal that come off when you turn, mill, drill or grind a piece of metal, not the electronic chips that power your smart phone. So he is really talking about machine shops and machine tools.

If you actually make electronic chips in a semiconductor wafer fabrication facility, you are using machines performing deposition, patterning, and doping processes based on solid state physics, loaded with sensors and driven by embedded controllers that are themselves powerful computers, networked with higher-level supervisory controllers and Manufacturing Execution Systems (MES). Together with the analytical tools attached to testers for finished chips, these systems are key to the stabilization of the process, known in the industry as “yield enhancement.”

The key reason why, contrary to machining, semiconductor wafer processes require these systems is that they never get a chance to mature. High-volume production starts at yields of 10% and, by the time you reach 60% or 70%, the chips you make with this process are obsolete and you need to start on the next generation. This is an example of a strategically important sector where the issues of process stability differ from machine shops. We must be wary of blanket statements on manufacturing in general.

Wiegand then evokes Lean Maintenance as an approach to both stabilizing processes and avoiding unplanned downtime. He describes this as focusing on identifying the root cause of instability on the bottleneck machines. Bottleneck machines, by definition, throttle production, but their problems may not be the easiest ones to solve for the available maintenance team, and focusing on them may lead to neglecting other machines until they deteriorate enough to become bottlenecks themselves, which is not the desired outcome.

To eliminate unplanned downtime, Wiegand then suggests applying Failure-Mode-Effect-Analysis (FMEA) on the bottleneck machines to target modules or subsystems for redundancy, so that the stand-by unit kicks in when the current one fails, and allows the machine to keep operating until the next planned maintenance shutdown. This sounds very much like Reliability-Centered Maintenance (RCM), an approach developed at United Airlines in the late 1960s when it took delivery of the first Boeing 747s and applied since then to nuclear power and chemical plants as well as aircraft maintenance.

In the same spirit, in a machining center, putting the same tool in two different pockets lets you cut with it for twice the tool life before replacing it. In general, however, FMEA is the only tool from RCM that is widely known in manufacturing, if not extensively used.

What I would understand under “Lean Maintenance” would instead start with the Autonomous Maintenance component of TPM, which involves delegating the simplest maintenance tasks, like switching lightbulbs, to production operators, thereby getting the small problems solved faster while freeing up technicians to address bigger challenges. Then you organize a group of maintenance first-responders with responsibility for all the equipment in a section of the plant, cross-trained to solve the most common mechanical, electrical, plumbing or control problems. The first responders are then backed up by specialists in one of these trades to address the more difficult problems.

In principle, information systems should substantially help make maintenance effective. However, the focus of the commercially available software, called Computerized Maintenance Management Systems (CMMS) or Enterprise Asset Management (EAM), to this day, appears to be on work order management, as opposed to the technical content of the maintenance work. In health care, your clinic’s information system holds your history of symptoms, diagnoses, treatments and outcomes, in a way that (1) helps your physician understand what is ailing you on your next visit, and (2) by aggregating patient data, identify which bugs are running around. It is focused on medical content, with administrative information on visit scheduling and billing as a by-product.

Maintenance technicians, by contrast, do not turn to their CMMS or EAM system to figure out what’s wrong with a machine. These systems are in the category of business, not engineering software. They put out performance metrics for managers, not diagnostic help for technicians. They may locate spare parts once a technician has identified a need, but are no help in relating error messages, gauge readings and direct observation to failure causes.

As a consequence, the administrative data is not automatically generated as a by-product of the work itself, but instead requires transactions that add labor and are usually entered by a clerk at the end of the shift rather than in real time. This hurts the quality of the data, particularly regarding the use of technicians’ time. The maintenance department must account for the use of its technicians, and it is more a matter of allocating their time to various jobs than recording how much each job actually consumed.

I believe the reason software suppliers focus on administration is that these functions are generic to maintenance, and needed regardless of the nature of the object being maintained. This is why you see customer lists that include food processors, steel mills, research labs, and municipalities. Technical content, on the other hand, is specific. A system addressing the diagnosis of rolling mill failures would have a much smaller niche market. A generic system to diagnose equipment would have to be populated with specific equipment models for each application. It is more difficult to develop and more work to implement.

In maintenance, I agree with Wiegand that most companies “don’t reap what they have sown,” but I don’t think that lack of stability in the process is the reason. Maintenance information systems could help stabilize processes, and the reason they don’t is their focus on work order management.