
Shame on the PD and NPR

When the Cleveland Plain Dealer and NPR decided to publish the names of 4,200 Ohio teachers alongside their value-added grades, their reasoning was specious and self-serving. Worst of all, the decision is damaging to the teaching profession in Ohio.

Despite pointing out all the flaws, caveats, and controversies surrounding the use of value-added measures to evaluate teachers, both publications decided to go ahead and shame these 4,200 teachers anyway. Publishing teachers' names and scores isn't new. The LA Times did it first, and the publication was a factor in the suicide of one teacher. The LA Times' findings and analysis were later discredited:

The research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Using the same L.A. Unified School District data and the same methods as the Times, this study probes deeper and finds the earlier research to have serious weaknesses.

"Due Diligence and the Evaluation of Teachers," National Education Policy Center

The Plain Dealer's analysis is weaker still than the LA Times', relying on just two years' worth of data rather than seven. In fact, the Plain Dealer and NPR stated they published only 4,200 teachers' scores, not all 12,000 they had data for, because most teachers had just one year's worth of data. That is a serious problem, as value-added scores are known to be highly unreliable and subject to massive year-to-year variance.

Beyond the questionable statistical analysis, the publication of teachers' names and value-added scores has been criticized by a great number of people, including corporate education reformer Bill Gates, in a NYT op-ed titled "Shame Is Not the Solution":

LAST week, the New York State Court of Appeals ruled that teachers’ individual performance assessments could be made public. I have no opinion on the ruling as a matter of law, but as a harbinger of education policy in the United States, it is a big mistake.

I am a strong proponent of measuring teachers’ effectiveness, and my foundation works with many schools to help make sure that such evaluations improve the overall quality of teaching. But publicly ranking teachers by name will not help them get better at their jobs or improve student learning. On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.

Gates isn't the only high-profile corporate education reformer critical of such shaming. Wendy Kopp, CEO of Teach For America, has also spoken out against the practice:

Kopp is not shy about saying what she'd do differently as New York City schools chancellor. While the Bloomberg administration is fighting the United Federation of Teachers in court for the right to release to the news media individual teachers' "value added" ratings—an estimate of how effective a teacher is at improving his or her students' standardized test scores—Kopp says she finds the idea "baffling" and believes doing so would undermine trust among teachers and between teachers and administrators.

"The principals of very high performing schools would all say their No. 1 strategy is to build extraordinary teams," Kopp said. "I can't imagine it's a good organizational strategy to go publish the names of teachers and one data point about whether they are effective or not in the newspaper."

Indeed, if the editors of the Plain Dealer and NPR had read their own reporting, they would have realized that the public release of this information was unsound, unfair, and damaging. Let's look at the warning signs in their own reporting:

...scores can vary from year to year.

Yet they relied on only one year's worth of data for much of their analysis, and just two for the teachers whose names they published.

...decided it was more important to provide information — even if flawed.

How can it be useful to the layperson to be provided with flawed information? Why would a newspaper knowingly publish flawed information?

...these scores are only a part of the criteria necessary for full and accurate evaluation of an individual teacher.

And yet they published 4,200 teachers' ratings based solely on value-added scores, which at best make up only 35% of a teacher's evaluation. Laypeople will not understand that these scores are only a partial measurement of a teacher's effectiveness, and a poor one at that.

...There are a lot of questions still about the particular formula Ohio uses.

Indeed, so many questions that one would be best advised to wait until those questions are answered before publicly shaming teachers who were part of a pilot program designed to answer them.

...variables beyond a teacher’s control need to be considered in arriving at a fair and accurate formula.

Yet none of these reporters considered any of those variables in publishing teachers' names, and readers will wholly miss that necessary context.

...The company that calculates value-added for Ohio says scores are most reliable with three years of data.

Again, the data is unreliable, especially with fewer than three years' worth, yet the Plain Dealer and NPR decided they should shame teachers using just two years' worth.
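To see why that third year matters, here is a back-of-the-envelope simulation. It is only a sketch: it assumes each year's score is a teacher's true effect plus independent noise, and the noise level is invented for illustration rather than taken from Ohio's actual (and secret) model.

```python
import random
import statistics

# Sketch only: each year's value-added score is modeled as a teacher's
# "true" effect plus independent, normally distributed noise. The noise
# level is an assumption for illustration, not Ohio's actual model.
random.seed(1)

TRUE_EFFECT = 0.0   # an exactly average teacher
NOISE_SD = 1.0      # year-to-year noise, in arbitrary score units
TRIALS = 10_000

for years in (1, 2, 3):
    estimates = [
        statistics.mean(random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(years))
        for _ in range(TRIALS)
    ]
    print(f"{years} year(s): spread of estimates (SD) = "
          f"{statistics.stdev(estimates):.2f}")
```

The spread of the estimates shrinks only as one over the square root of the number of years: roughly 1.00 with one year, 0.71 with two, 0.58 with three. Averaging in a second year still leaves most of the noise, which is exactly why the vendor wants three years before calling a score reliable.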

...Ohio’s value-added ratings do not account for the socioeconomic backgrounds of students, as they do in some other states.

How many "ineffective" teachers are really just working in depressed socioeconomic classrooms? The reporters seem not to care and publish the names anyway.

...Value-added scores are not a teacher’s full rating.

Nowhere in the publication of these names are the teachers' full ratings indicated. This again leaves laypeople and site visitors thinking these flawed value-added scores are the final reflection of a teacher's quality.

...ratings are still something of an experiment.

How absurd does the decision to publish now seem? Shaming people on the basis of the results of an experiment! By its very nature, an experiment can demonstrate that something is wrong, not that it is right.

...The details of how the scores are calculated aren’t public.

We don't even know whether the value-added scores are correct and accurate, because the formula is secret. How can it be fair for the results of a secret formula to be made public? Did that not raise any alarm bells at the Plain Dealer and NPR?

...The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works.

But somehow NPR listeners and Cleveland Plain Dealer readers are supposed to understand the complexities and read the necessary context into the publication of individual teacher scores?

...StateImpact/Plain Dealer analysis of initial state data suggests.

"Initial", "Suggests". They have decided to shame teachers without properly vetting the data and their own analysis - exactly the same problem the LA Times ran into that we highlighted at the top of this article.

It doesn't take a lot of "analysis" to understand that a failing newspaper needed controversy and eyeballs, and that the decision to shame teachers was made in its own economic interest, not the public good. In the end, then, the real shame falls not on teachers, who are working hard every day, often in difficult situations made worse by draconian budget cuts, endless political meddling, and student poverty, but on the editors of these two publications for putting their own narrow self-interest above that of Ohio's children.

It's a disgrace for which they owe 4,200 apologies.

Shaming teachers

The efforts by corporate education reformers to shame teachers by publishing value-added scores and evaluations are coming under mounting pressure. First Bill Gates penned an op-ed in the NYT titled "Shame Is Not the Solution"; now come two new pieces. The first is research from the National Education Policy Center, which finds the LA Times' controversial effort to shame California's teachers was grossly error-ridden:

In its second attempt to rank Los Angeles teachers based on “value-added” assessments derived from students’ standardized test scores, the Los Angeles Times has still produced unreliable information that cannot be used for the purpose the newspaper intends, according to new research released today by the National Education Policy Center, housed at the University of Colorado Boulder.

Dr. Catherine Durso of the University of Denver studied the newspaper’s 2011 rankings of teachers and found that they rely on data yielding results that are unstable from year to year. Additionally, Durso found that the value-added assessment model used by the Times can easily impute to teachers effects that may in fact result from outside factors, such as a student’s poverty level or the neighborhood in which he or she lives.

“The effect estimate for each teacher cannot be taken at face value,” Durso writes. Instead, each teacher’s effect estimate includes a large “error band” that reflects the probable range of scores for a teacher under the assessment system.

“The error band . . . for many teachers is larger than the entire range of scores from the ‘less effective’ to ‘more effective’ designations provided by the LA Times,” Durso writes. As a consequence, the so-called teacher-linked effect for individual teachers “is also unstable over time,” she continues.
[...]
These failings have rendered the Times’ rankings not merely useless, but potentially harmful, according to Alex Molnar, NEPC’s publications director and a research professor at the University of Colorado Boulder.

“The Los Angeles Times has added no value to the discussion of how best to identify and retain the highest-quality teachers for our nation’s children,” Molnar says. “Indeed, it has made things worse. Based on this flawed use of data, parents are enticed into thinking their children’s teachers are either wonderful or terrible.”

“The Los Angeles Times editors and reporters either knew or should have known that their reporting was based on a social science tool that cannot validly or reliably do what they set out to quantify,” Molnar said. “Yet in their ignorance or arrogance they used it anyway, to the detriment of children, teachers, and parents.”

Their full report can be read here. Meanwhile, New York, which has long been at the cutting edge of corporate ed reform efforts, has passed legislation that would eliminate this kind of teacher shaming:

Senate Republicans agreed to take up Cuomo’s bill on the final day of the session. The bill will make public all teacher evaluations, without names attached. Parents would then be able to obtain the specific evaluations of their own child’s teacher. Assembly Democrats had already agreed to pass it. Senate Majority Leader Dean Skelos says it’s a reasonable compromise.

“It strikes a good balance between parents’ right to know and some form of confidentiality,” Skelos said. Some GOP senators were concerned that the bill would inadvertently result in the disclosure of the identities of teachers in small rural schools.

Senate Education Chair John Flanagan calls it a “work in progress,” and says the message of intent accompanying the bill will attempt to make clear the need to protect teacher privacy. “I’m hoping that if you’re in a small school and they release data by class, subject and grade that there’s some type of interpretation to protect people’s privacy,” said Flanagan.

Ohio's legislature should pass a similar measure.

Shame, errors and demoralizing

Shame, errors, demoralizing: just some of the rhetoric emerging since the NYT and other publications went ahead and published teacher-level value-added scores. A great number of articles have been written decrying the move.

Perhaps most surprising of all was Bill Gates, in a piece titled "Shame Is Not the Solution." In it, Gates argues:

Value-added ratings are one important piece of a complete personnel system. But student test scores alone aren’t a sensitive enough measure to gauge effective teaching, nor are they diagnostic enough to identify areas of improvement. Teaching is multifaceted, complex work. A reliable evaluation system must incorporate other measures of effectiveness, like students’ feedback about their teachers and classroom observations by highly trained peer evaluators and principals.

Putting sophisticated personnel systems in place is going to take a serious commitment. Those who believe we can do it on the cheap — by doing things like making individual teachers’ performance reports public — are underestimating the level of resources needed to spur real improvement.
[...]
Developing a systematic way to help teachers get better is the most powerful idea in education today. The surest way to weaken it is to twist it into a capricious exercise in public shaming. Let’s focus on creating a personnel system that truly helps teachers improve.

Following that, Matthew Di Carlo at the Shanker Institute took a deeper look at the data and the error margins inherent in using it:

First, let’s quickly summarize the imprecision associated with the NYC value-added scores, using the raw datasets from the city. It has been heavily reported that the average confidence interval for these estimates – the range within which we can be confident the “true estimate” falls – is 35 percentile points in math and 53 in English Language Arts (ELA). But this oversimplifies the situation somewhat, as the overall average masks quite a bit of variation by data availability.
[...]
This can be illustrated by taking a look at the categories that the city (and the Journal) uses to label teachers (or, in the case of the Times, schools).

Here’s how teachers are rated: low (0-4th percentile); below average (5-24); average (25-74); above average (75-94); and high (95-99).

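Di Carlo's point about the rocky relationship between those margins of error and the rating categories is easy to illustrate. Here is a toy simulation, my own sketch rather than the city's model: assume a teacher whose true performance sits at the 80th percentile, with the reported score wandering uniformly within a 35-point confidence interval around that truth.

```python
import random

# Toy illustration, not the NYC model: a 35-percentile-point confidence
# interval is treated as +/-17.5 points of uniform noise around a
# teacher's true percentile. All numbers here are assumptions.
random.seed(1)

def category(pct):
    """The city's labels: low (0-4), below average (5-24),
    average (25-74), above average (75-94), high (95-99)."""
    if pct < 5:
        return "low"
    if pct < 25:
        return "below average"
    if pct < 75:
        return "average"
    if pct < 95:
        return "above average"
    return "high"

TRUE_PERCENTILE = 80   # a genuinely above-average teacher
HALF_WIDTH = 17.5      # half of a 35-point confidence interval
TRIALS = 10_000

counts = {}
for _ in range(TRIALS):
    observed = TRUE_PERCENTILE + random.uniform(-HALF_WIDTH, HALF_WIDTH)
    observed = max(0.0, min(99.0, observed))
    label = category(observed)
    counts[label] = counts.get(label, 0) + 1

for label, count in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {count / TRIALS:.0%}")
```

Under these assumptions, the genuinely above-average teacher is labeled merely "average" more than a third of the time, purely from measurement noise, and the problem is worse in ELA, where the average interval is 53 points.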

That level of error in each measurement renders the teacher grades virtually useless. But that was just the start of the problems, as David Cohen notes in a piece titled "Big Apple’s Rotten Ratings".

So far, I think the best image from the whole fiasco comes from math teacher Gary Rubinstein, who ran the numbers himself, a bunch of different ways. The first analysis works on the premise that a teacher should not become dramatically better or worse in one year. He compared the data for 13,000 teachers over two consecutive years and found a virtually random distribution.

First of all, as I’ve repeated every chance I get, the three leading professional organizations for educational research and measurement (AERA, NCME, APA) agree that you cannot draw valid inferences about teaching from a test that was designed and validated to measure learning; they are not the same thing. No one using value-added measurement EVER has an answer for that.

Then, I thought of a set of objections that had already been articulated on DiCarlo’s blog by a commenter. Harris Zwerling called for answers to the following questions if we’re to believe in value-added ratings:

1. Does the VAM used to calculate the results plausibly meet its required assumptions? Did the contractor test this? (See Harris, Sass, and Semykina, “Value-Added Models and the Measurement of Teacher Productivity” Calder Working Paper No. 54.)
2. Was the VAM properly specified? (e.g., Did the VAM control for summer learning, tutoring, test for various interactions, e.g., between class size and behavioral disabilities?)
3. What specification tests were performed? How did they affect the categorization of teachers as effective or ineffective?
4. How was missing data handled?
5. How did the contractors handle team teaching or other forms of joint teaching for the purposes of attributing the test score results?
6. Did they use appropriate statistical methods to analyze the test scores? (For example, did the VAM provider use regression techniques if the math and reading tests were not plausibly scored at an interval level?)
7. When referring back to the original tests, particularly ELA, does the range of teacher effects detected cover an educationally meaningful range of test performance?
8. To what degree would the test results differ if different outcome tests were used?
9. Did the VAM provider test for sorting bias?
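Rubinstein's two-year comparison, quoted above, is easy to reproduce in spirit. The sketch below uses invented numbers, assuming a stable teacher effect that accounts for only a small share of the published score, with the rest year-specific noise, and asks how correlated two consecutive years of scores would be.

```python
import random
import statistics  # statistics.correlation requires Python 3.10+

# Sketch of a Rubinstein-style check, with invented numbers: each
# teacher has a stable "true" effect, but the published score is
# dominated by year-specific noise. The 0.2/0.8 split is an assumption.
random.seed(1)

N_TEACHERS = 13_000
SIGNAL_SD = 0.2   # stable teacher effect (assumed)
NOISE_SD = 0.8    # year-specific noise (assumed)

true_effect = [random.gauss(0, SIGNAL_SD) for _ in range(N_TEACHERS)]
year1 = [t + random.gauss(0, NOISE_SD) for t in true_effect]
year2 = [t + random.gauss(0, NOISE_SD) for t in true_effect]

print(f"year-over-year correlation: "
      f"{statistics.correlation(year1, year2):.2f}")  # roughly 0.06
```

When noise dominates, the correlation lands near zero, and a scatter plot of year-one against year-two scores looks like the near-random cloud Rubinstein found: the same teachers bounce between "effective" and "ineffective" from one year to the next.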

Today, education historian Diane Ravitch published a piece titled "How to Demoralize Teachers," which draws all these problems together to highlight how counterproductive the effort is becoming:

Gates raises an important question: What is the point of evaluations? Shaming employees or helping them improve? In New York City, as in Los Angeles in 2010, it's hard to imagine that the publication of the ratings—with all their inaccuracies and errors—will result in anything other than embarrassing and humiliating teachers. No one will be a better teacher because of these actions. Some will leave this disrespected profession—which is daily losing the trappings of professionalism, the autonomy requisite to be considered a profession. Some will think twice about becoming a teacher. And children will lose the good teachers, the confident teachers, the energetic and creative teachers, they need.
[...]
Interesting that teaching is the only profession where job ratings, no matter how inaccurate, are published in the news media. Will we soon see similar evaluations of police officers and firefighters, legislators and reporters? Interesting, too, that no other nation does this to its teachers. Of course, when teachers are graded on a curve, 50 percent will be in the bottom half, and 25 percent in the bottom quartile.

Is this just another ploy to undermine public confidence in public education?

It's hard not to conclude that, for some, that might very well be the goal.