OEA Response to PD and NPR Teacher Shaming

Here's the statement from the Ohio Education Association, which represents over 121,000 educators.

Responding to a series of newspaper, web and radio stories on the value-added scores of individual Ohio teachers, Patricia Frost-Brooks, President of the Ohio Education Association, criticized the fairness of the stories and the wisdom of using value-added scores as such a prominent index of teacher success:

"The Ohio Education Association was not contacted for comment on the Plain Dealer/StateImpact Ohio stories, despite our expertise, which would have provided desperately needed context and perspective. Reporters and editors admitted this value-added data was 'flawed,' but they chose surprise and impact over fairness, balance and accuracy," Frost-Brooks said.

"We are all accountable for student success – teachers, support professionals, parents, students and elected officials. And the Ohio Education Association is committed to fair teacher evaluation systems that include student performance, among other multiple measures. But listing teachers as effective or ineffective based on narrow tests not designed to be used for this purpose is a disservice to everyone.

"Value-added ratings can never paint a complete or objective picture of an individual teacher’s work or performance. Trained educators can use a student’s value-added data, along with other student data, to improve student instruction. But the stories promote a simplistic and inaccurate view of value-added as a valid basis for high-stakes decisions on schools, teachers and students."

It is highly questionable that reporters would not contact the state's largest teachers' association while crafting their story.

Shame on the PD and NPR

When the Cleveland Plain Dealer and NPR decided to publish the names of 4,200 Ohio teachers and their value-added grades, their reasoning was specious and self-serving, and, most of all, damaging to the teaching profession in Ohio.

Despite pointing out all the flaws, caveats, and controversies with using value-added as a means to evaluate teachers, both publications decided to go ahead and shame these 4,200 teachers anyway. Publishing teachers' names and scores isn't new. It was first done by the LA Times, and was a factor in the suicide of one teacher. The LA Times findings and analysis were later discredited:

The research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Using the same L.A. Unified School District data and the same methods as the Times, this study probes deeper and finds the earlier research to have serious weaknesses.


The Plain Dealer analysis is weaker still than the LA Times', relying on just two years' worth of data rather than seven. In fact, the Plain Dealer and NPR stated they published only 4,200 teachers' scores, rather than the 12,000 scores they had data for, because most teachers had only one year's worth of data. This is a serious error, as value-added scores are known to be highly unreliable and subject to large year-to-year variance.
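The reliability problem can be made concrete with a toy simulation. The numbers below are invented for illustration only (a stable per-teacher effect with yearly measurement noise twice as large); they are not Ohio's actual model, whose formula, as the reporting itself notes, isn't even public. The sketch simply shows how averaging more years of noisy scores shrinks the rate at which teachers land on the wrong side of the line:

```python
import random
import statistics

random.seed(0)

def rank_error_rate(n_years, n_teachers=1000, true_sd=1.0, noise_sd=2.0):
    """Simulate value-added: each teacher has a stable 'true' effect,
    but each year's score adds measurement noise. Return the fraction
    of teachers whose averaged score disagrees in sign with their true
    effect - i.e., a misclassification rate."""
    errors = 0
    for _ in range(n_teachers):
        true_effect = random.gauss(0, true_sd)
        yearly = [true_effect + random.gauss(0, noise_sd) for _ in range(n_years)]
        estimate = statistics.mean(yearly)
        # Misclassified if the estimate puts an above-average teacher
        # below average, or vice versa.
        if (true_effect > 0) != (estimate > 0):
            errors += 1
    return errors / n_teachers

for years in (1, 2, 3, 7):
    print(years, "year(s):", round(rank_error_rate(years), 3))
```

Under these toy assumptions, roughly a third of teachers are misclassified with one year of data, and the rate falls steadily as years are added, which is the intuition behind the vendor's advice that scores are most reliable with three or more years.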

Beyond the questionable statistical analysis, the publication of teachers' names and value-added scores has been criticized by a great number of people, including corporate education reformer Bill Gates, in a New York Times op-ed titled "Shame Is Not the Solution":

LAST week, the New York State Court of Appeals ruled that teachers’ individual performance assessments could be made public. I have no opinion on the ruling as a matter of law, but as a harbinger of education policy in the United States, it is a big mistake.

I am a strong proponent of measuring teachers’ effectiveness, and my foundation works with many schools to help make sure that such evaluations improve the overall quality of teaching. But publicly ranking teachers by name will not help them get better at their jobs or improve student learning. On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.

Gates isn't the only high-profile corporate education reformer critical of such shaming. Wendy Kopp, CEO of Teach for America, has also spoken out against the practice:

Kopp is not shy about saying what she'd do differently as New York City schools chancellor. While the Bloomberg administration is fighting the United Federation of Teachers in court for the right to release to the news media individual teachers' "value added" ratings—an estimate of how effective a teacher is at improving his or her students' standardized test scores—Kopp says she finds the idea "baffling" and believes doing so would undermine trust among teachers and between teachers and administrators.

"The principals of very high performing schools would all say their No. 1 strategy is to build extraordinary teams," Kopp said. "I can't imagine it's a good organizational strategy to go publish the names of teachers and one data point about whether they are effective or not in the newspaper."

Indeed, if the editors of the Plain Dealer and NPR had read their own reporting, they would have realized that the public release of this information was unsound, unfair and damaging. Let's look at the warning signs in their own reporting:

...scores can vary from year to year.

Yet they relied upon only one year's worth of data for much of their analysis, and just two for the teachers whose names they published.

...decided it was more important to provide information — even if flawed.

How can it be useful to the layperson to be provided with flawed information? Why would a newspaper knowingly publish flawed information?

...these scores are only a part of the criteria necessary for full and accurate evaluation of an individual teacher.

And yet they published 4,200 teachers' value-added scores standing alone, when value-added at best makes up only 35% of a teacher's evaluation. Laypeople will not understand that these scores are only a partial measurement of a teacher's effectiveness, and a poor one at that.

...There are a lot of questions still about the particular formula Ohio uses.

Indeed, so many questions that one would be best advised to wait until those questions are answered before publicly shaming teachers who were part of a pilot program being used to answer them.

...variables beyond a teacher’s control need to be considered in arriving at a fair and accurate formula.

Yet the reporters considered none of these factors in publishing teachers' names, and readers will wholly miss that necessary context.

...The company that calculates value-added for Ohio says scores are most reliable with three years of data.

Again, the data is unreliable, especially with less than three years' worth, yet the Plain Dealer and NPR decided to shame teachers using just two years' worth of data.

...Ohio’s value-added ratings do not account for the socioeconomic backgrounds of students, as they do in some other states.

How many "ineffective" teachers are really just working in socioeconomically depressed classrooms? The reporters seemed not to care, and published the names anyway.

...Value-added scores are not a teacher’s full rating.

Nowhere in the publication of these names are the teachers' full ratings indicated. This again leaves laypeople and site visitors to think these flawed value-added scores are the final reflection of a teacher's quality.

...ratings are still something of an experiment.

How absurd is the decision to publish now seeming? Shaming people on the basis of the results of an experiment! By their very nature, experiments can demonstrate that something is wrong, not right.

...The details of how the scores are calculated aren’t public.

We don't even know if the value-added scores are correct and accurate, because the formula is secret. How can it be fair for the results of a secret formula to be made public? Did that not raise any alarm bells for the Plain Dealer and NPR?

...The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works.

But somehow NPR listeners and Cleveland Plain Dealer readers are supposed to understand the complexities, and read the necessary context into the publication of individual teacher scores?

...StateImpact/Plain Dealer analysis of initial state data suggests.

"Initial." "Suggests." They decided to shame teachers without properly vetting the data or their own analysis - exactly the problem the LA Times ran into, as highlighted at the top of this article.

It doesn't take a lot of "analysis" to understand that a failing newspaper needed controversy and eyeballs, and that the decision to shame teachers was made in its own economic interest, not that of the public good. In the end, then, the real shame falls not on teachers who are working hard every day, often in difficult situations made worse by draconian budget cuts, endless political meddling, and student poverty - but on the editors of these two publications for putting their own narrow self-interest above that of Ohio's children.

It's a disgrace for which they owe 4,200 apologies.

On Teacher Evaluation: Slow Down And Get It Right

One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, most everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.

The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent.

The risks to excessive haste are likely higher than whatever opportunity costs would be incurred by proceeding more cautiously. Moving too quickly gives policymakers and educators less time to devise and test the new systems, and to become familiar with how they work and the results they provide.

Moreover, careless rushing may result in avoidable erroneous high stakes decisions about individual teachers. Such decisions are harmful to the profession, they threaten the credibility of the evaluations, and they may well promote widespread backlash (such as the recent Florida lawsuits and the growing “opt-out” movement). Making things worse, the opposition will likely “spill over” into other promising policies, such as the already-fragile effort to enact the Common Core standards and aligned assessments.

[readon2 url=""]Continue reading...[/readon2]

The Science of Value-Added Evaluation

"A value-added analysis constitutes a series of personal, high-stakes experiments conducted under extremely uncontrolled conditions".

If drug experiments were conducted like VAM analyses, we might all have three legs, or worse.

Value-added teacher evaluation has been extensively criticized and strongly defended, but less frequently examined from a dispassionate scientific perspective. Among the value-added movement's most fervent advocates is a respected scientific school of thought that believes reliable causal conclusions can be teased out of huge data sets by economists or statisticians using sophisticated statistical models that control for extraneous factors.

Another scientific school of thought, especially prevalent in medical research, holds that the most reliable method for arriving at defensible causal conclusions involves conducting randomized controlled trials, or RCTs, in which (a) individuals are premeasured on an outcome, (b) randomly assigned to receive different treatments, and (c) measured again to ascertain if changes in the outcome differed based upon the treatments received.

The purpose of this brief essay is not to argue the pros and cons of the two approaches, but to frame value-added teacher evaluation from the latter, experimental perspective. For conceptually, what else is an evaluation of perhaps 500 4th grade teachers in a moderate-size urban school district but 500 high-stakes individual experiments? Are not students premeasured, assigned to receive a particular intervention (the teacher), and measured again to see which teachers were the more (or less) efficacious?

Granted, a number of structural differences exist between a medical randomized controlled trial and a districtwide value-added teacher evaluation. Medical trials normally employ only one intervention instead of 500, but the basic logic is the same. Each medical RCT is also privy to its own comparison group, while individual teachers share a common one (consisting of the entire district's average 4th grade results).

From a methodological perspective, however, both medical and teacher-evaluation trials are designed to generate causal conclusions: namely, that the intervention was statistically superior to the comparison group, statistically inferior, or just the same. But a degree in statistics shouldn't be required to recognize that an individual medical experiment is designed to produce a more defensible causal conclusion than the collected assortment of 500 teacher-evaluation experiments.

How? Let us count the ways:

  • Random assignment is considered the gold standard in medical research because it helps to ensure that the participants in different experimental groups are initially equivalent and therefore have the same propensity to change relative to a specified variable. In controlled clinical trials, the process involves a rigidly prescribed computerized procedure whereby every participant is afforded an equal chance of receiving any given treatment. Public school students cannot be randomly assigned to teachers between schools for logistical reasons and are seldom if ever truly randomly assigned within schools because of (a) individual parent requests for a given teacher; (b) professional judgments regarding which teachers might benefit certain types of students; (c) grouping of classrooms by ability level; and (d) other, often unknown, possibly idiosyncratic reasons. Suffice it to say that no medical trial would ever be published in any reputable journal (or reputable newspaper) which assigned its patients in the haphazard manner in which students are assigned to teachers at the beginning of a school year.
  • Medical experiments are designed to purposefully minimize the occurrence of extraneous events that might potentially influence changes on the outcome variable. (In drug trials, for example, it is customary to ensure that only the experimental drug is received by the intervention group, only the placebo is received by the comparison group, and no auxiliary treatments are received by either.) However, no comparable procedural control is attempted in a value-added teacher-evaluation experiment (either for the current year or for prior student performance) so any student assigned to any teacher can receive auxiliary tutoring, be helped at home, team-taught, or subjected to any number of naturally occurring positive or disruptive learning experiences.
  • When medical trials are reported in the scientific literature, their statistical analysis involves only the patients assigned to an intervention and its comparison group (which could quite conceivably constitute a comparison between two groups of 30 individuals). This means that statistical significance is computed to facilitate a single causal conclusion based upon a total of 60 observations. The statistical analyses reported for a teacher evaluation, on the other hand, would be reported in terms of all 500 combined experiments, which in this example would constitute a total of 15,000 observations (or 30 students times 500 teachers). The 500 causal conclusions published in the newspaper (or on a school district website), on the other hand, are based upon separate contrasts of 500 "treatment groups" (each composed of changes in outcomes for a single teacher's 30 students) versus essentially the same "comparison group."
  • Explicit guidelines exist for the reporting of medical experiments, such as the (a) specification of how many observations were lost between the beginning and the end of the experiment (which is seldom done in value-added experiments, but would entail reporting student transfers, dropouts, missing test data, scoring errors, improperly marked test sheets, clerical errors resulting in incorrect class lists, and so forth for each teacher); and (b) whether statistical significance was obtained—which is impractical for each teacher in a value-added experiment since the reporting of so many individual results would violate multiple statistical principles.
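The last point above - that reporting statistical significance for each teacher "would violate multiple statistical principles" - is essentially the multiple-comparisons problem. A minimal sketch (with hypothetical numbers, not any actual district's model) shows how many teachers a naive per-teacher test flags purely by chance, even when every teacher is truly identical:

```python
import random
import statistics

random.seed(1)

def false_flags(n_teachers=500, n_students=30, crit=1.96):
    """All teachers here are truly identical: every student's score
    change is pure noise. Count how many teachers a naive per-teacher
    test (mean gain vs. its standard error) would still flag as
    significantly above or below average."""
    flagged = 0
    for _ in range(n_teachers):
        gains = [random.gauss(0, 1) for _ in range(n_students)]
        mean = statistics.mean(gains)
        se = statistics.stdev(gains) / (n_students ** 0.5)
        if abs(mean / se) > crit:
            flagged += 1
    return flagged

# With 500 teachers tested at roughly the 5% level, dozens get
# flagged by chance alone, despite no real differences existing.
print(false_flags())
```

Running 500 simultaneous tests at a nominal 5% level guarantees a steady stream of false "ineffective" (and false "effective") labels - which is precisely why publishing one causal verdict per teacher is statistically indefensible without corrections the reporting never mentions.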

[readon2 url=""]Continue reading...[/readon2]

What I’ve learned so far

A guest post by Robert Barkley, Jr.

What I’ve learned so far – as of November 19, 2012

In February of 1958 I began student teaching in a small rural Pennsylvania town. Approximately one month into that experience my master teacher was drafted into the military. And since there were no other teachers in my field in that small district, I was simply asked to complete the school year as the regular teacher.

From that day on I have been immersed in public education at many levels, in several states – even in Canada and with some international contacts, as well as from many vantage points. So some 54 and a half years later, here’s what I have learned so far.

  1. There will be no significant change in education until and unless our society truly and deeply adopts a sense of community attitude. And a sense of community is first and foremost based upon an acceptance that we all belong together – regardless of wealth, race, gender, etc.
  2. The views of amateurs, otherwise known as politicians and private sector moneyed interests, while they may be genuine and well intentioned, are, at best, less than helpful if unrestrained by the views of the professionals working at ground level. Put another way, the view from 30,000 feet may give a broad sense of how the system looks, but the view from street level gives a sense of how the system actually works. Neither is wrong, but both are inadequate by themselves.
  3. Moneyed interests such as test and textbook manufacturers and charter school enthusiasts will destroy general education, for they have little commitment to the general welfare and common good.
  4. No institution or organization will excel until and unless it adopts at all levels a shared sense of purpose – a central aim if you will, and agrees upon how progress toward that purpose will be measured over time. Education is no different.
  5. At the basic levels all education must begin with the recognition and nurturing of the natural curiosity and the current reality of each student.
  6. Teaching is a team sport. In other words, the structure and general practice in schools of teachers operating as independent sources of instruction is flawed. Anything that exacerbates this flawed structure, such as test score ratings of individual teachers and/or individual performance pay schemes, will be harmful and counterproductive.
  7. The separation of knowledge into separate disciplines may be convenient to organizing instruction but it is counter to the construction of learning. Therefore, integrated curriculum strategies are essential if neuroscience is to be appreciated and taken into account.
  8. School employee unions can be useful or problematic to educational progress. Which they become is dependent upon their full inclusion in determining the structure and purpose of education. The more they are pushed to the sidelines, the more their focus will be narrow and self-serving.

Robert Barkley, Jr., is retired Executive Director of the Ohio Education Association, a thirty-five year veteran of NEA and NEA affiliate staff work. He is the author of Quality in Education: A Primer for Collaborative Visionary Educational Leaders, Leadership In Education: A Handbook for School Superintendents and Teacher Union Presidents, and Lessons for a New Reality: Guidance for Superintendent/Teacher Organization Collaboration. He may be reached at

Teaching as team sport

A guest post by Robert Barkley

Yes, you read that title right. Traditional schools are structured and managed as if teachers were individual performers. Evidence and common sense say that's far from being the case.

Given the recent furor over the Chicago teacher strike and the accompanying union bashing that dominates the mainstream media, we'd do well to give thought to what can be learned from successful schools around the globe.

We talk much about American exceptionalism. A key element of that exceptionalism is our deep-seated belief in the merits of competition. So thoroughly have we adopted the notion that market forces inevitably lead to superior performance, we have great difficulty accepting the fact that schools that emphasize collegial relationships, encourage shared faculty planning, and make use of cooperative approaches to designing and implementing teaching and learning strategies, routinely outpace those that stress competition.

Most teachers know this intuitively, although too few articulate it well. Professional organizations, unions, school administrators, and schools of education are also familiar with the research and conclusions based on experience, but are no more successful than individual teachers at getting the message across. The narrow preoccupation with raising test scores at the expense of all else seems to have so rattled educators they can’t get their sensible messages out.

The need to work together is a major reason why private sector pressure to rate and pay teachers on the basis of test scores and other individual performance measures is a huge mistake. Predictably—given political reliance on corporate funding for campaigns—neither Republicans nor Democrats are willing to listen to educators. Vouchers, choice, charters, merit pay, school closings and “turnarounds,” and other silver bullets being fired by politicians and rich entrepreneurs block dialogue that could be productive if they came to the issues open to the possibility that the hundreds of thousands who actually do the work might just possibly know something about how to do it best.

Corporate fascination with competitiveness notwithstanding, in teaching and learning, competitiveness is almost always counterproductive. It blocks a host of useful strategies for evaluating performance, gets in the way of freely sharing good ideas, and wastes the benefits of knowing one is part of a team, the work of which will inevitably be smarter than that of individual members.

It’s ironic that teamwork—an idea the merit of which is taken for granted on factory floors and playing fields, in neighborhoods and families, and just about everywhere else that humans try to be productive—is seen as counterproductive in classrooms. Within companies managers want employees to collaborate with colleagues. An accountant sitting next to a fellow accountant is required to work with that person. No one wants the two of them to compete, withhold trade secrets, and crush the other by the end of the day.

Finding scapegoats, fixing blame for poor performance on a percentage of teachers or on a few individuals, has an appealing simplicity about it, but it’s a lazy, simplistic, misguided approach to improving system performance. As management experts have been pointing out for decades, if a system isn’t performing, it almost always means there’s a system problem. Since teachers have almost no control over the systems of which they are a part, it’s necessary to make the most of a bad situation, and the easiest way to do that is to capitalize on their collective wisdom. If they’re being forced to compete against each other, there’s no such thing as collective wisdom.

For a generation, under the banner of standards and accountability, teachers have been criticized, scorned, denigrated, maligned, blamed. Accountability in education as indicated by standardized test scores is no more about individual teacher performance than accountability in health care as indicated by patient temperatures is about individual nurse performance.

I’m not making excuses for poor educator performance. Teachers should be held accountable for identifying, understanding, and applying practices that produce the highest level of student achievement. Administrators should be held accountable for creating an environment that encourages the identification, understanding and sharing of effective practices. Schools of education should be held accountable for whatever improves the institution.

But the new reformers aren't interested in improvement, just replacement. Management experts say, "Don't fix blame; fix the system." Just about everyone in the system would love to help do that if given the opportunity, but the opportunity hasn’t been offered, so nothing of consequence changes.

Case in point: The Chicago teachers’ strike. Rahm Emanuel, like the rest of the current “reformers,” came to the table having bought the conventional wisdom in Washington and state capitols that educators either don’t know what to do or aren’t willing to do it. He obviously went to Chicago with the same tired suspicion of teachers, the same belief that they’re the problem rather than the key to a real solution, the same confrontational, competitive stance.

Will we ever learn? Don’t hold your breath.

Robert Barkley, Jr., is retired Executive Director of the Ohio Education Association, a thirty-five year veteran of NEA and NEA affiliate staff work. He is the author of Quality in Education: A Primer for Collaborative Visionary Educational Leaders, Leadership In Education: A Handbook for School Superintendents and Teacher Union Presidents, and Lessons for a New Reality: Guidance for Superintendent/Teacher Organization Collaboration. He may be reached at