From the Harvard Business Review, a look at how unreliable the kinds of performance measures being implemented in education are, and why business is abandoning the practice.

Microsoft has decided to dump the practice of rating individuals’ performance on a numerical scale – a decision I applauded in a recent post. I argued that such rating systems don’t accomplish the task managers expect from them, which is to accelerate the performance of their people. At best, they serve other goals: allocating compensation fairly, and aligning each individual’s goals with the values and strategies of the company.

However, even if these were sufficient goals, managers would still be frustrated by how poorly ratings-based Human Capital Management (HCM) systems achieve them. Here are the two intractable problems with today’s approach.

False Precision

All current HCM systems are based on the notion that a manager can be guided to become a reliable rater of another person’s strengths and skills. The assumption is that, if we give you just the right scale, and just the right words to anchor that scale, and if we tell you to look for certain behaviors, and to rate this person a “4” if you see these behaviors frequently, and a “3” if you see them less frequently, then, over time, you and your fellow managers will become reliable raters of other people’s performance. Indeed, your ratings will come to have such high inter-rater reliability (meaning that two managers would give the same employee’s performance the same rating) that the company will use your ratings to pinpoint low performers, promote top performers, and pay everyone.

Unfortunately, there is no evidence that this happens. Instead, an overwhelming amount of evidence shows that each of us is a horribly unreliable rater of another person’s strengths and skills. It appears that, when it comes to rating someone else, our own strengths, skills, and biases get in the way, and we end up rating the person not on some wonderfully objective scale, but on our own scale. Our rating of the other person simply answers the question: “Does she have more or less of this strength or skill than I do?” If she has more, her rating is high; if she has less, it is low. Thus our rating is really a rating of us, not of her.

Some companies have tried to neutralize this effect by training the manager how to look for specific clues to the desired strength or skill. This may result in managers becoming more observant, but it doesn’t turn them into better raters. This inability to rate reliably is so entrenched that even when organizations spend millions of hours and dollars training up a roster of experts whose only job is rating, they still don’t get the reliability they seek.

As an example, over the last few years every US state has done precisely that. Each state created a cadre of experts to evaluate, in extraordinary detail, the performance of teachers. One would have expected variation, with some good teachers, some not so good, and some differently good, reflected in a range of ratings from the experts. But as The New York Times reported earlier this year, the results of these ratings have revealed alarmingly little variation. These expert raters are simply not very reliable.

Scour the literature and you will discover similar studies all confirming our struggles with rating the strengths and skills of others. Our ratings of others certainly look precise. They look like objective data. But they aren’t. They offer precision, but it is a false precision. So when we decide to promote someone based upon their “4” rating, or when we say that a certain choice assignment is open only to those employees who scored an “exceeds expectations” rating, or when we pay someone based on these ratings, or suggest a particular training course based upon them, we are making decisions on bad data. Earlier this month, in a spirited defense of the forced curve, Jack Welch advocated rating people on lists of competencies so that you can, in his words, “let them know where they stand.” This is a worthy sentiment, but given how poor we are as raters, competency ratings will only ever serve to confuse people as to where they stand. As they say in the data world: “Garbage in, garbage out.”
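
To make the idea of inter-rater reliability concrete, here is a minimal sketch in Python of one common way to measure it, Cohen's kappa, which compares how often two raters agree with how often they would agree by chance. The choice of statistic, the function name `cohens_kappa`, and the scores for the two managers are all illustrative assumptions; none of them comes from the studies mentioned above.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same employees."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: share of employees given the same score by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: how often the two would match if each rater just
    # followed his or her own overall distribution of scores.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical score for everyone; kappa is
        # undefined here, so returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-5 scale) for the same eight employees.
manager_1 = [4, 3, 4, 5, 3, 4, 2, 4]
manager_2 = [3, 3, 4, 4, 4, 4, 3, 5]

raw_agreement = sum(a == b for a, b in zip(manager_1, manager_2)) / len(manager_1)
kappa = cohens_kappa(manager_1, manager_2)
print(f"raw agreement: {raw_agreement:.2f}, kappa: {kappa:.2f}")
```

In this made-up example the two managers give the same score to three of the eight employees, yet kappa comes out close to zero, because two raters who both favor "3" and "4" would match about that often by chance alone. A kappa of 1 would mean perfect agreement; a value near 0 means the ratings agree no better than chance.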

Bad practice, streamlined

We know how great managers manage. They define very clearly the outcomes they want, and then they get to know the person in as much detail as possible to discover the best way to help this person achieve the outcomes. Whether you call this an individualized approach, a strengths-based approach, or just common sense, it’s what great managers do.

This is not what our current performance management systems do. They ignore the person and instead tell the manager to rate the person on a disembodied list of strengths and skills, often called competencies, and then to teach the person how to acquire the competencies she lacks. This is hard, and not just the rating part. The teaching part is supremely tricky. After all, what is the best way to help someone learn how to be a better “strategic thinker” or to display “learning agility”? In recognition of just how hard this is, current performance management systems attempt to streamline the process by supplying the manager with writing tips on how to phrase feedback about the person’s competencies, or lack thereof, and then by integrating the competency rating with the company’s Learning Management System so that it spits out a training course to fix a particular competency “gap.”

The problem with all of this is not just that there is no credible research proving that the best performers possess the entire list of competencies, nor any showing that acquiring the competencies you lack improves your performance. It is not even that, as I described above, managers are woefully inaccurate at rating the competencies of others. No, the chief problem with all of this is that it is not what the best managers actually do.

They don’t look past the real person to a list of theoretical competencies. Instead, the person, with her unique mix of strengths and skills, is their singular focus. They know they can’t ignore the individual. After all, the person’s messy uniqueness is the very raw material they must mold, shape, and focus in order to create the performance they want. Cloaking it with a generic list of competencies is inherently counter-productive.

Some say that we need to rate people on their competencies because this creates “differentiation,” a necessary practice of great companies. Of course they are right in theory: companies need to be able to differentiate between their people. But the practice is outdated. Differentiation cannot mean rating people on a pre-set list of competencies. These competencies are, by definition, formulaic, and so they will actually serve to limit differentiation. True differentiation means focusing on the individual: understanding the strengths of each individual, setting the right expectations for each individual, recognizing the individual, putting the right career plan together for the individual. This is what the best managers do today. They seek to understand, and capitalize on, the whole individual. This is hard enough to do when you work with the person every day. It’s nigh on impossible when you are expected to peer through the filter of a formula.

Telegraph Trumps Pony Express

In 1850 it took the average piece of mail five weeks to travel from St. Joseph, Missouri to the California coast. This was frustrating, since in 1848 somebody had discovered gold in the California hills and the wild and crazy rush was on. America was moving west and needed a much more efficient, streamlined way to communicate with its West Coast, full of riches. The Pony Express was the answer. Four hundred horses. A hundred and fifty small wiry riders. Two hundred stations, and the innovation of lightweight, leather cantinas to carry the mail westward. It was a fantastically complicated arrangement requiring careful forethought, detailed planning, and not inconsiderable daring. And, having woven together this complicated system, the inventors managed to streamline the process so well that, on its very first journey, what was once a five-week trek turned into a ten-day sprint from St. Joe to Sacramento. Speeches were made, fireworks fired, a great innovation was celebrated.

And then, Baron Pavel Schilling destroyed it all.

He didn’t do it deliberately, of course. But he did invent the telegraph. And with that one invention, that one concept, he created a new worldview, one that rendered obsolete the entire system that the founders of the Pony Express had worked so hard to streamline.

Our current performance management systems are the Pony Express: worthy efforts to streamline a labor-intensive, time-consuming, and unnecessarily complicated process. Who is our Baron Schilling? Well, let’s give that role to Microsoft’s Lisa Brummel, the executive who declared “no more ratings.”

And then there’s the biggest question. What’s the telegraph? A topic for the next post.