Metrics in Lean – Alternatives to Rank-and-Yank in Evaluating People

The July, 2012 edition of Vanity Fair had a cover story about Microsoft and the damage caused to the company by its stack ranking system, also known as rank-and-yank. This story caused a flurry of responses in the press, including an article in Forbes defending the practice. The idea is not just that employees should be evaluated individually and ranked, but that the bottom few performers should be mercilessly culled from the work force, thus bringing up the overall level, tightening the performance range, and motivating survivors for the next round. This assumes that the performance of employees can meaningfully be reduced to a single number, and that living in permanent fear of losing your livelihood is the best motivation for improvement.

According to another Forbes article, GE, the company that championed Rank-and-Yank in the 1980s, abandoned it in 2005. Another once-celebrated showcase of this method was the notorious Enron, and now Microsoft is under fire because of it. Yet, according to the same article, the practice is now widespread among large American corporations, hidden under a variety of names. But where is the evidence that it actually works in the long term?

In fact, stack-ranking is the polar opposite of the Lean approach to human resources, and it needs to be said, explained, and repeated. In this spirit, this post covers the following:

  1. What is Rank-and-Yank?
  2. How Rank-and-Yank would apply in the Tour de France.
  3. The effect of Rank-and-Yank on organizational behavior.
  4. How you should evaluate people.

What is Rank-and-Yank?

The Microsoft model is officially called the Vitality Curve. In its latest version, in effect since April 2011, the model ranks employees into five buckets of predefined size:

  • 20% are outstanding
  • 20% are above average
  • 40% are average
  • 13% are fair
  • 7% are poor

The manager of every department, regardless of what it does, is expected to simply rank its members into these buckets, with far-reaching consequences. All compensation is predefined by bucket, and employees in the bottom bucket are ineligible to move to other positions, with the understanding that they will soon be fired.
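Purely to illustrate the mechanics, and not Microsoft’s actual tooling, here is a minimal Python sketch that slices an already-ranked list into buckets of these fixed proportions; the function and employee names are hypothetical:

```python
# Hypothetical sketch of a forced distribution: every employee's bucket is
# determined solely by position in a single ranked list.
BUCKETS = [("outstanding", 0.20), ("above average", 0.20),
           ("average", 0.40), ("fair", 0.13), ("poor", 0.07)]

def force_rank(ranked_names):
    """Assign each name to a bucket purely by its rank position."""
    n = len(ranked_names)
    assignment, start = {}, 0
    for label, share in BUCKETS:
        end = min(n, start + round(share * n))
        for name in ranked_names[start:end]:
            assignment[name] = label
        start = end
    for name in ranked_names[start:]:   # rounding leftovers fall into the last bucket
        assignment[name] = BUCKETS[-1][0]
    return assignment

print(force_rank([f"employee_{i:02d}" for i in range(1, 11)]))
```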

According to former GE CEO Jack Welch, every team has 20% of A-players, 70% of B-players, and 10% of C-players, who contribute nothing, procrastinate, fail to deliver on promises, and therefore should be fired. This is predicated on the assumption that the performance of individuals follows a bell curve, as in Figure 1:

Figure 1. Stack ranking bell curve (from Nagesh Belludi’s blog)

Underlying Figure 1 are the following assumptions:

  1. Performance is one-dimensional and numeric.
  2. In any given population, it is normally distributed.

Both are obviously false. It is actually difficult to contrive an example where they might hold. To construct one, you might do the following:

  1. Choose an objectively measurable task, performed individually, such as shoveling dirt.
  2. To perform the task, pick a random sample of the population, to make sure the participants have no special skills.

The quantities of dirt shoveled by individuals might follow a normal distribution, or bell curve. Then you may well be able to group them into A, B, and C categories, and place in the C category the people you would rather not ask to do this task again. Even then, the proportion of people who fail to meet the minimum standard will vary between samples.

There are many circumstances in which you encounter the bell curve, from the distribution of IQ test scores to temperature profiles in a solid in which heat penetrates by conduction from a point source. This does not mean that every variable fits a bell curve, and you never assume it does. Instead, you examine the data to determine whether it is a reasonable model, and run statistical tests to confirm it.
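As a sketch of what “run statistical tests” can mean in practice, the fragment below applies the Shapiro-Wilk normality test to a simulated sample of dirt-shoveling outputs; the numbers are generated for illustration, not measured:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Made-up sample: pounds of dirt shoveled in a day by 100 randomly chosen
# people -- placeholder numbers, not real measurements.
dirt = rng.normal(loc=300, scale=40, size=100)

# Shapiro-Wilk tests the null hypothesis that the sample is drawn from a
# normal distribution; a small p-value is evidence against normality.
statistic, p_value = stats.shapiro(dirt)
print(f"Shapiro-Wilk W = {statistic:.3f}, p = {p_value:.3f}")
```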

To anyone trained in statistics, the idea of manipulating data to force them onto a given curve is anathema. But that is exactly what rank-and-yank does. It mandates that there be 10% of C-players, in corporate situations where, usually, the following holds:

  1. The people are not a random sample of the population but a group of trained professionals, recruited and vetted for special skills.
  2. They work in teams, not as individuals in parallel.
  3. Performance is multidimensional. There is no single, objective performance metric with which to rank them.

According to Jack Welch, A-players are as follows:

“The As are people who are filled with passion, committed to making things happen, open to new ideas from anywhere, and blessed with lots of runway ahead of them.  They have the ability to energize not only themselves, but everyone who comes in contact with them.  They make business productive and fun at the same time.”

(Straight from the Gut, page 158)

I am not quite sure what “lots of runway ahead” means, and there are many jobs, including in management, that must be done but that no sane person would consider fun. That said, there are domains in which you can expect the recruitment process to have selected entire teams of A-players. This does not mean that you cannot or should not evaluate them, but Rank-and-Yank may not be the best way to do it.

Rank-and-Yank applied to the Tour de France

Tour de France riders on the Champs Elysees

Known as the toughest bicycle race on the planet, the Tour de France has the following special characteristics that make it usable as a metaphor for performance ranking in other businesses:

  1. The participants are not a random sample of the population but the best riders in the world.
  2. They are measured individually by their total time through all the stages of the race. It is an objective, numeric metric, and the only one that matters in the end.
  3. They work in teams during the race.

Individual rider performance

The 2012 Tour de France covered 2,173 miles in 20 stages. The winner, also known as the yellow jersey, was Englishman Bradley Wiggins, who covered this distance in a total of 87 hours, 34 minutes and 47 seconds. The last finisher, known as the red lantern, arrived 3 hours, 57 minutes and 36 seconds later, meaning that it took him 4.5% longer than the winner. Through the paving stones of the North and the mountain passes of the Alps and the Pyrenees, the yellow jersey averaged 24.82 mph; the red lantern, 23.74 mph. These numbers are not only high but close.
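For readers who want to check the arithmetic, the quoted gap percentage and average speeds follow directly from the times and distance above:

```python
# Back-of-the-envelope check of the quoted Tour de France 2012 numbers.
DISTANCE_MILES = 2173

winner_s = 87 * 3600 + 34 * 60 + 47   # yellow jersey: 87h 34m 47s
gap_s = 3 * 3600 + 57 * 60 + 36       # red lantern finished this much later

print(f"relative gap:  {gap_s / winner_s:.1%}")                                   # ~4.5%
print(f"yellow jersey: {DISTANCE_MILES / (winner_s / 3600):.2f} mph")             # ~24.8 mph
print(f"red lantern:   {DISTANCE_MILES / ((winner_s + gap_s) / 3600):.2f} mph")   # ~23.7 mph
```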

Figure 2 shows a histogram of how far behind the yellow jersey each rider finished, also known as his “gap,” in 10-minute bins. It clearly takes too much imagination to see a bell curve in it.

Figure 2. Histogram of rider gaps in 10-minute bins, Tour de France 2012

We could just leave it at that and conclude right away that the gaps are not normally distributed. Just to make sure, let us call in the heavy statistical artillery. Another way to examine the data is through the cumulative distributions of the gaps, as in Figure 3, for the 2012 and 2011 Tours de France. These curves are built on the raw data, which avoids the aliasing due to bin size in histograms. The actual data are in burgundy, and the normal distributions in blue. Fitting the normal model involves taking the average and standard deviation of the actual data from the 153 riders of 2012 and the 166 of 2011. Theoretically, all tests should therefore be based on the Student t-distribution rather than the normal distribution. The consensus of statisticians, however, is that it makes no difference when you have more than 50 data points.

Figure 3. Cumulative distribution of gaps in Tour de France performance
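The same comparison can be scripted. The sketch below assumes the gaps have first been extracted from the official standings into a plain text file (gaps_2012_minutes.txt is a hypothetical name); it fits the normal model by mean and standard deviation, and adds a Kolmogorov-Smirnov test to quantify the mismatch that Figure 3 shows visually:

```python
import numpy as np
from scipy import stats

# Hypothetical input: one gap behind the yellow jersey, in minutes, per
# finisher, extracted beforehand from the official standings.
gaps = np.loadtxt("gaps_2012_minutes.txt")

# Fit the normal model by matching mean and standard deviation.
mu, sigma = gaps.mean(), gaps.std(ddof=1)

# Empirical cumulative distribution vs. the fitted normal CDF.
sorted_gaps = np.sort(gaps)
empirical_cdf = np.arange(1, len(gaps) + 1) / len(gaps)
fitted_cdf = stats.norm.cdf(sorted_gaps, loc=mu, scale=sigma)
print(f"max CDF deviation: {np.abs(empirical_cdf - fitted_cdf).max():.3f}")

# Kolmogorov-Smirnov test of the gaps against the fitted normal model.
print(stats.kstest(gaps, "norm", args=(mu, sigma)))
```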

The actual data for both years have curves that are sufficiently similar to assume it is not a coincidence, and neither fits the normal distribution. This is not obvious at first sight, but it is when you consider the ends and apply a back-of-the-envelope test. In the 2012 race, for example, the normal distribution model gives a rider a probability of 1.7% to beat the Yellow Jersey and 4.7% to lose to the Red Lantern. With the normal model, the probability that 153 independent riders will all finish behind the Yellow Jersey and ahead of the Red Lantern is $(1 - 1.7\% - 4.7\%)^{153} = 0.0039\%$, which shows that the model does not fit the data.
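The back-of-the-envelope calculation itself is one line of arithmetic. With the rounded tail probabilities quoted above, it lands near, rather than exactly at, 0.0039%:

```python
p_beat_yellow = 0.017   # P(gap < 0) under the fitted normal model
p_behind_red = 0.047    # P(gap > red lantern's gap) under the same model
n_riders = 153

p_all_between = (1 - p_beat_yellow - p_behind_red) ** n_riders
print(f"{p_all_between:.4%}")   # roughly 0.004%
```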

There are two possible reasons for the gaps not being normally distributed that immediately come to mind:

  1. The Tour de France riders are not a random sample of human beings riding bikes but the best in the world.
  2. They race in teams, not individually.

While records for cycling speed or endurance are readily available on the web, data on ordinary riders are not, although you might expect cities with millions of bicycle commuters to have some. As a consequence, I have not been able to check their speed distributions. On the other hand, information about teams can be retrieved from the Tour de France website.

Team performance

Although riders are ranked individually, they work in teams during the race. In 2012, they were in 22 teams, named after corporate sponsors. Each team has one star rider, considered a contender, whom all the other riders are expected to support. For example, the supporting riders take turns in front of the contender so that he can ride in their wake. With the energy they have left, the supporting riders can draw attention to themselves by winning stages, or by escaping ahead of the pack for an hour or two during a stage.

Bradley Wiggins’s team in 2012, SKY PROCYCLING, had kept only one rider from the 2011 team. On the other hand, the team of 2011 winner Cadel Evans, BMC Racing Team, kept 6 of its 9 riders for 2012, and we can compare their performances year-to-year, as in Figure 4:

Figure 4. BMC Racing Team results in 2011 and 2012

While the 2011 winner fell behind, all the other returning team members improved their standings, in particular the last one. In 2011, Marcus Burghardt was only three positions and 12 minutes ahead of the red lantern; in 2012, he placed 58th, 1 hour and 43 minutes behind the winner. Had Rank-and-Yank been applied in 2011, he would have been branded a C-player.

In Jack Welch’s words, “C-players are non-producers. They are likely to “enervate” rather than “energize”, according to Serge Hovnanian’s model. Procrastination is a common trait of C-players, as well as failure to deliver on promises.”  But can this description possibly apply to a professional rider who finishes the Tour de France? Following is how CyclingNews described Burghardt’s performance in 2011:

“One of Cadel Evans’ domestiques at the 2011 Tour de France, Marcus Burghardt, was instrumental in the Australian’s overall victory. The tall German Classics rider is a powerful rouleur and set the pace for the BMC Racing Team whenever it was needed, protecting the squad’s sole leader throughout the three-week race. Having succeeded in the team’s goal to win the Tour de France, Evans gave one of the plush lions that go with the yellow jersey to Burghardt as a present for his baby girl. This will be one of the 28-year-old’s greatest accomplishments as a rider, […]”

The overall time metric obviously does not tell the whole story. What other ways are there to prove a rider’s value? The Tour de France offers two consolation prizes: the green jersey for the winner of the points classification, and the polka-dot jersey for the best mountain climber. Escapes during stages are not recognized by the Tour de France because they have no effect on the race itself if the pack catches up before the end of the stage. They are, however, valued by team sponsors, whose logos on team jerseys are exposed to TV cameras during escapes. But none of the above indicates how good a team player Marcus Burghardt was.

Here we go from the objective measurement of sport performance, like the total race time, to the subjective assessment that a rider was a good team player because others say he was, and it leads us back to what happens in businesses other than sports.

The effect of Rank-and-Yank on behavior

Your work force may range from an R&D lab populated with PhDs representing the world’s top talent in the domain to a crew of previously unknown day laborers recruited that morning. The notion that the same performance evaluation model could apply throughout a large company is absurd on its face, and we should not even have to discuss it. Since, however, it is done anyway, it is worth pondering the impact it has on attitudes and behavior.

Rank-and-Yank turns work life into a permanent game of high-stakes musical chairs. Where there are clear metrics, as in shoveling dirt or racing, individuals can at least predict and affect outcomes. Everywhere else, the evaluations are based on subjective assessments, prone to favoritism, and perceived by employees as unfair. The Vanity Fair article shows employees protecting themselves against this process with strategies that are counterproductive for the company, for example:

  • Nurturing their image with respect to anyone with influence on their ranking instead of producing output.
  • Withholding key information that might help colleagues, while pretending to collaborate with them.

There is anecdotal evidence that Rank-and-Yank has the same effects in other companies. On a subject this political, with variations across companies in the way the approach is implemented, objective assessments are not easy to find. The overwhelming majority of blog posts is negative, but bloggers are a self-selected group.

How you should evaluate people

In Out of the Crisis, Deming branded evaluation by performance, merit rating, or annual review of performance as a deadly disease. Unfortunately, in every company, management has to make decisions on employee raises, bonuses, stock grants, promotions, transfers… based on some form of evaluation. Since Deming was also critical of Management By Objectives (MBO), I think his statement on reviews should be viewed in this context.

The less formal the evaluation is, the more arbitrary it is and the more unfair it is perceived to be, leading employees to distrust the company, disengage from it, and leave it at the first opportunity. Making General Motors into a company managed through processes that employees could trust was central to Alfred P. Sloan's approach. In the 1920s, this was a contrast with rival Ford, which had no such processes in place until a generation later, when the Whiz Kids, including Robert McNamara and Arjay Miller, implemented them after World War II.

The challenge

In general, a fair and objective review process is essential if you want to retain talent, which means that you cannot have a learning organization without it. A learning organization is an organization whose members learn. As a term, “learning organization” can easily mislead us into thinking that organizations have knowledge outside the heads of their members. They don’t. All they have otherwise is a library of data, which only turns into knowledge when a member reads it and checks it against reality, a process that restarts from scratch with every new hire. For an organization to learn and to retain knowledge, it must retain its people, and it cannot do so without a review process that employees not only perceive to be fair but that also helps them manage their careers, so that they have an idea of what they can accomplish by staying and aligning their interests with those of the company.

While Rank-and-Yank is overly harsh, a system can also fail by giving good reviews to everybody, making the company like Garrison Keillor’s Lake Wobegon, “where all the children are above average.” It is a trap that conflict-averse managers easily fall into, leading to complacency and demoralizing high-achievers.

The challenge, therefore, is to have a review process that recognizes high performance and encourages all to emulate it, without creating an environment where employees work in constant fear of losing their jobs for no reason they can understand.

Who gets reviewed?

Companies like Boeing, Unilever, GE, or the old GM are known to have programs to nurture young employees identified as having executive potential, giving them access to special training and rotating them through various jobs designed to broaden their understanding of the company and grow a network of relationships. The rest of the professional staff receives less attention, and shop floor workers none at all, unless they have problems.

One rarely discussed feature of the Toyota Production System (TPS) is the way in which it extended the review and career planning process to all permanent employees, including production operators. Not even the Japanese literature says much about it, possibly because it is less unique to Toyota than other aspects of TPS.

Lifetime employment, limited to men under 60, is a practice introduced in Japan after World War II. The men working under this arrangement were supplemented by temporary contractors, retirees, and young women expected to marry and resign. The social origin of candidates was also a factor in hiring, in that, for example, large companies were reluctant to hire heirs to family businesses, based on the assumption that they would leave. Labor mobility was one-way, from larger to smaller organizations. You could leave government service for large companies, or large companies for smaller ones, but not the other way around.

Since the 1990s, the economic crisis has in some places broken the lifetime employment practice, while the evolution of Japanese society has weakened gender, social and ethnic discrimination. But the legacy of the postwar system is that most companies and their employees still have a stronger bond than in the US. The system is self-perpetuating within Japan in many ways. Since there are few opportunities for mid-career job-hopping to an equally prestigious company, few employees can do it, and those who do have a hard time integrating with cohorts that have joined right after school. Also, rotation between jobs inside the company develops skills and relationships that make an employee more valuable to the company but not outside of it.

Long-term retention actually requires you to plan the careers of employees at all levels of education and talent, and that means production operators along with engineers and managers. If a permanent employee cannot keep up with the work, you have to find another way to use his or her talents.

The review process and its consequences

Fairness may be in the eye of the beholder, but, in this case, the beholder matters. Figure 5 shows a case where the employee does not agree with the process, which we can imagine was a meeting behind closed doors between the pointy-haired manager and his superiors. The process should instead be open, and give the employee the opportunity to defend his or her record.

Figure 5. Dilbert’s 10/13/1996 strip

In Lean management, along with leading Kaizen activity, performance review and career planning are major tasks for supervisors. To keep the personal chemistry between an employee and a supervisor from unduly influencing outcomes, formal reviews are carried out by panels rather than individuals. In some companies, the employee selects one of the panel members. Formal reviews occur at least twice a year, and cover both hard and soft skills, meaning both the technical ability to carry out tasks and the managerial ability to work with others, contribute to or lead projects, and communicate.

Among production operators, you encounter people who, outside of work, may have management roles in clubs and religious or political organizations. At work, however, daily operations do not afford them much opportunity to show this kind of talent, but Kaizen activity does, and this is another reason it is so essential.

It is a multidimensional assessment and includes an analysis of each employee’s ambitions and of the means of realizing them. That a company should do this for anyone is alien to the Silicon Valley culture, and any that did would be a prime target for raiding by competitors. But it is a tradition for professional staff in older, established companies even in the US. Doing it for a machinist who wants to become a maintenance technician, however, is unheard of. It involves rotating such a person around the shop to get acquainted with a broad variety of machines, while arranging for the required training and certification in mechanics, lubrication, power and controls over, say, 5 years, to allow a smooth transition into the role of maintenance technician.

Of course, the opportunity to fulfill an employee’s ambition is contingent on the availability of positions. Not all members of a cohort can be promoted into the management hierarchy, and even technical positions are in limited supply. For the professional staff, the best-known, and worst, response is to turn managers who have been passed over into “window people,” who are given a desk near a window to wait for retirement without any specific assignment. Another option is to send them as “fallen angels” to take on management responsibilities at subsidiaries or suppliers. On the shop floor, some operators remain in production for their entire careers, but the review process ensures that the abilities they accumulate are recognized in their wages and, towards the end of their careers, through titles like Master Craftsman, which honor their contributions and designate them as a source of advice for younger workers.

Another consideration is the scope of differences in compensation introduced by the review process. The compensation offered by a company is a means of communication. Extreme differences in rewards for small differences in performance happen in a race like the Tour de France, but should not among people who work together inside a company, lest they focus on competing with one another instead of with rival companies. The rewards must be large enough to communicate the company’s appreciation and encouragement to keep up the good work, yet small enough that employees do not turn into bounty hunters. Effective teamwork should be acknowledged by team rewards; outstanding individual performance within a team, by an additional individual reward.