
Ohio Value-added measures poverty

Congratulations, Ohio corporate education reformers: you have discovered yet another way to measure poverty. Unfortunately, you seem to believe this is also a good way to evaluate teachers.

Value-added was supposed to be the great equalizer -- a measure of schools that would finally judge fairly how much poor students are learning compared with their wealthier peers.

Meant to gauge whether students learn as much as expected in a given year, value-added will become a key part of rating individual teachers from rich and poor districts alike next school year.

But a Plain Dealer/StateImpact Ohio analysis raises questions about how much of an equalizer it truly is, even as the state ramps up its use.

The 2011-12 value-added results show that districts, schools and teachers with large numbers of poor students tend to have lower value-added results than those that serve more-affluent ones.

Of course, there are going to be defenders of the high-stakes sweepstakes:

"Value-added is not influenced by socioeconomic status," said Matt Cohen, the chief research officer at the Ohio Department of Education. "That much is pretty clear."

That is the same Matt Cohen who admitted he is no expert and has no clue how value-added is calculated:

The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works. He said that’s not a problem.

“It’s not important for me to be able to be the expert,” he said. “I rely on the expertise of people who have been involved in the field.” 

Perhaps if Mr. Cohen became more familiar with the science and the data, he would realize that:

  • Value-added scores were 2½ times higher on average for districts where the median family income is above $35,000 than for districts with income below that amount.
  • For low-poverty school districts, two-thirds had positive value-added scores -- scores indicating students made more than a year's worth of progress.
  • For high-poverty school districts, two-thirds had negative value-added scores -- scores indicating that students made less than a year's progress.

  • Almost 40 percent of low-poverty schools scored "Above" the state's value-added target, compared with 20 percent of high-poverty schools.
  • At the same time, 25 percent of high-poverty schools scored "Below" state value-added targets while low-poverty schools were half as likely to score "Below."

  • Students in high-poverty schools are more likely to have teachers rated "Least Effective" -- the lowest state rating -- than "Most Effective" -- the highest of five ratings. The three ratings in the middle are treated by the state as essentially average performance.

Is there really any doubt what is truly being measured here? Ohio's secret Value-added formula is good at measuring poverty, not teacher effectiveness.
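The pattern in those bullets is straightforward to check against any district-level table of incomes and value-added scores. Below is a minimal Python sketch of that check; the district data it generates is entirely made up (with the reported income pattern baked in purely for illustration), and the $35,000 cutoff is simply the line the article uses, so treat this as a template for the check, not a reproduction of the Plain Dealer analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical district-level table (NOT the real state data): a median family
# income and a composite value-added score for each of 600 made-up districts.
income = rng.normal(45_000, 15_000, size=600).clip(15_000, 120_000)
# Bake in a modest income effect plus noise, mimicking the pattern the article reports.
value_added = 0.00004 * (income - 45_000) + rng.normal(0, 1.2, size=600)
districts = pd.DataFrame({"median_income": income, "value_added": value_added})

# Split at the $35,000 median-income line the article uses.
lower = districts[districts["median_income"] < 35_000]
higher = districts[districts["median_income"] >= 35_000]

print("Share with positive VA, lower-income districts :",
      f"{(lower['value_added'] > 0).mean():.0%}")
print("Share with positive VA, higher-income districts:",
      f"{(higher['value_added'] > 0).mean():.0%}")
# A score that truly isolated teaching effects should show little or no correlation here.
print("Income/VA correlation:",
      f"{districts['median_income'].corr(districts['value_added']):.2f}")
```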

We predict that districts, administrators, and those connected to the development of value-added measures are going to be deluged with lawsuits once high-stakes decisions are attached to the misguided application of these diagnostic scores.

Shame on the PD and NPR

When the Cleveland Plain Dealer and NPR decided to publish the names of 4,200 Ohio teachers and their value-added grades, their reasoning was specious and self-serving. Most of all, it is damaging to the teaching profession in Ohio.

Despite pointing out all the flaws, caveats, and controversies with the use of value-added as a means to evaluate teachers, both publications decided to go ahead and shame these 4,200 teachers anyway. The publication of teachers' names and scores isn't new. It was first done by the LA Times, and was a factor in the suicide of one teacher. The LA Times findings and analysis were later discredited:

The research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Using the same L.A. Unified School District data and the same methods as the Times, this study probes deeper and finds the earlier research to have serious weaknesses.

DUE DILIGENCE AND THE EVALUATION OF TEACHERS by National Education Policy Center

The Plain Dealer analysis is weaker than the LA Times', relying on just two years' worth of data rather than seven. In fact, the Plain Dealer and NPR stated they published only 4,200 teachers' scores, and not the 12,000 scores they had data for, because most teachers had only one year's worth of data. That is a serious problem, as value-added is known to be highly unreliable and subject to massive year-to-year variance.
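To see why one or two years of data is a problem, consider a small simulation. The sketch below does not use Ohio's (secret) model; it simply assumes a teacher with an exactly average true effect plus year-to-year classroom noise of an assumed size, and shows how much a value-added estimate built on one, two, or three years of scores bounces around.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_EFFECT = 0.0   # assume an exactly average teacher
YEAR_NOISE = 1.0    # assumed size of year-to-year classroom noise (class mix, test day, etc.)
TRIALS = 10_000     # number of simulated teachers

for years in (1, 2, 3):
    # Each simulated teacher's estimate is the average of the noisy yearly results they drew.
    estimates = TRUE_EFFECT + rng.normal(0.0, YEAR_NOISE, size=(TRIALS, years)).mean(axis=1)
    # How often does an exactly average teacher land a full noise unit above or below average?
    far_off = np.mean(np.abs(estimates) > 1.0)
    print(f"{years} year(s) of data: spread (std) = {estimates.std():.2f}, "
          f"rated well above/below average anyway = {far_off:.1%}")
```

Even in this toy setup, a single year of noise misrates a sizeable share of perfectly average teachers, while averaging three years cuts the spread considerably, which is exactly why the vendor itself says three years of data are needed.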

Beyond the questionable statistical analysis, the publication of teachers' names and value-added scores has been criticized by a great many people, including corporate education reformer Bill Gates, who wrote a New York Times op-ed titled "Shame Is Not the Solution":

LAST week, the New York State Court of Appeals ruled that teachers’ individual performance assessments could be made public. I have no opinion on the ruling as a matter of law, but as a harbinger of education policy in the United States, it is a big mistake.

I am a strong proponent of measuring teachers’ effectiveness, and my foundation works with many schools to help make sure that such evaluations improve the overall quality of teaching. But publicly ranking teachers by name will not help them get better at their jobs or improve student learning. On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.

Gates isn't the only high-profile corporate education reformer critical of such shaming; Wendy Kopp, CEO of Teach for America, has also spoken out against the practice:

Kopp is not shy about saying what she'd do differently as New York City schools chancellor. While the Bloomberg administration is fighting the United Federation of Teachers in court for the right to release to the news media individual teachers' "value added" ratings—an estimate of how effective a teacher is at improving his or her students' standardized test scores—Kopp says she finds the idea "baffling" and believes doing so would undermine trust among teachers and between teachers and administrators.

"The principals of very high performing schools would all say their No. 1 strategy is to build extraordinary teams," Kopp said. "I can't imagine it's a good organizational strategy to go publish the names of teachers and one data point about whether they are effective or not in the newspaper."

Indeed, if the editors of the Plain Dealer and NPR had read their own reporting, they would have realized the public release of this information was unsound, unfair, and damaging. Let's look at the warning signs in their own reporting:

...scores can vary from year to year.

Yet they relied upon only one year's worth of data for much of their analysis, and just two years for the teachers whose names they published.

...decided it was more important to provide information — even if flawed.

How can it be useful to the layperson to be provided with flawed information? Why would a newspaper knowingly publish flawed information?

...these scores are only a part of the criteria necessary for full and accurate evaluation of an individual teacher.

And yet they published 4,200 teachers' value-added scores, based solely on value-added, which at best makes up only 35% of a teacher's evaluation. Laypeople will not understand that these scores are only a partial measurement of a teacher's effectiveness, and a poor one at that.

...There are a lot of questions still about the particular formula Ohio uses.

Indeed, there are so many questions that one would be best advised to wait until those questions are answered before publicly shaming teachers who were part of a pilot program being used to answer those questions.

...variables beyond a teacher’s control need to be considered in arriving at a fair and accurate formula.

Yet none of these reporters considered any of these factors in publishing teachers' names, and readers will wholly miss that necessary context.

...The company that calculates value-added for Ohio says scores are most reliable with three years of data.

Again, the data is unreliable, especially with fewer than three years' worth of data, yet the Plain Dealer and NPR decided they should shame teachers using just two years' worth.

...Ohio’s value-added ratings do not account for the socioeconomic backgrounds of students, as they do in some other states.

How many "ineffective" teachers are really just working in depressed socioeconomic classrooms? The reporters seem not to care and publish the names anyway.

...Value-added scores are not a teacher’s full rating.

Nowhere in the publication of these names are the teachers' full ratings indicated. This again leaves laypeople and site visitors to think these flawed value-added scores are the final reflection of a teacher's quality.

...ratings are still something of an experiment.

How absurd is the decision to publish now seeming? Shaming people on the basis of the results of an experiment! By their very nature, experiments can demonstrate that something is wrong, not that it is right.

...The details of how the scores are calculated aren’t public.

We don't even know whether the value-added scores are correct and accurate, because the formula is secret. How can it be fair for the results of a secret formula to be made public? Did that not raise any alarm bells for the Plain Dealer and NPR?

...The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works.

But somehow NPR listeners and Cleveland Plain Dealer readers are supposed to understand the complexities, and read the necessary context into the publication of individual teacher scores?

...StateImpact/Plain Dealer analysis of initial state data suggests.

"Initial", "Suggests". They have decided to shame teachers without properly vetting the data and their own analysis - exactly the same problem the LA Times ran into that we highlighted at the top of this article.

It doesn't take a lot of "analysis" to understand that a failing newspaper needed controversy and eyeballs, and that the decision to shame teachers was made in the publishers' own economic interests, not in the public good. In the end, the real shame falls not on teachers, who are working hard every day, often in difficult situations made worse by draconian budget cuts, endless political meddling, and student poverty, but on the editors of these two publications for putting their own narrow self-interest above that of Ohio's children.

It's a disgrace, and one they ought to make 4,200 apologies for.

Value-added: How Ohio is destroying a profession

We ended last week with a post titled "The 'fun' begins soon", which took a look at the imminent changes to education policy in Ohio. We planned on detailing each of these issues over the next few weeks.

Little did we know that the 'fun' would begin that very weekend. It came in the form of the Cleveland Plain Dealer and NPR publishing a story on the changing landscape of teacher evaluations titled "Grading the Teachers: How Ohio is Measuring Teacher Quality by the Numbers".

It's a solid, long piece, worth the time it takes to read. It covers some, though not all, of the problems of using value-added measurements to evaluate teachers:

Those ratings are still something of an experiment. Only reading and math teachers in grades four to eight get value-added ratings now. But the state is exploring how to expand value-added to other grades and subjects.

Among some teachers, there’s confusion about how these measures are calculated and what they mean.

“We just know they have to do better than they did last year,” Beachwood fourth-grade teacher Alesha Trudell said.

Some of the confusion may be due to a lack of transparency around the value-added model.

The details of how the scores are calculated aren’t public. The Ohio Education Department will pay a North Carolina-based company, SAS Institute Inc., $2.3 million this year to do value-added calculations for teachers and schools. The company has released some information on its value-added model but declined to release key details about how Ohio teachers’ value-added scores are calculated.

The Education Department doesn’t have a copy of the full model and data rules either.

The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works. He said that’s not a problem.

Evaluating a teacher on a secret formula isn't a practice that can be sustained, supported, or defended. The article further details a common theme we hear over and over again:

But many teachers believe Ohio’s value-added model is essentially unfair. They say it doesn’t account for forces that are out of their control. They also echo a common complaint about standardized tests: that too much is riding on these exams.

“It’s hard for me to think that my evaluation and possibly some day my pay could be in a 13-year-old’s hands who might be falling asleep during the test or might have other things on their mind,” said Zielke, the Columbus middle school teacher.

The article also analyzes several thousand value-added scores, and that analysis demonstrates what we have long reported: value-added is a poor indicator of teacher quality, with too many external factors affecting the score.

A StateImpact/Plain Dealer analysis of initial state data suggests that teachers with high value-added ratings are more likely to work in schools with fewer poor students: A top-rated teacher is almost twice as likely to work at a school where most students are not from low-income families as in a school where most students are from low-income families.
[…]
Teachers say they’ve seen their value-added scores drop when they’ve had larger classes. Or classes with more students who have special needs. Or more students who are struggling to read.

Teachers who switch from one grade to another are more likely to see their value-added ratings change than teachers who teach the same grade year after year, the StateImpact/Plain Dealer analysis shows. But their ratings went down at about the same rate as teachers who taught the same grade level from one year to the next and saw their ratings change.

What are we measuring here? Surely not teacher quality, but rather socioeconomic factors and budget conditions of the schools and their students.

Teachers are intelligent people, and they are going to adapt to this knowledge in lots of unfortunate ways. It will become progressively harder for districts with poor students to recruit and retain the best teachers. But perhaps the most pernicious effect is captured at the end of the article:

Stephon says the idea of Plecnik being an ineffective teacher is “outrageous.”

But Plecnik is through. She’s quitting her job at the end of this school year to go back to school and train to be a counselor — in the community, not in schools.

Plecnik was already frustrated by the focus on testing, mandatory meetings and piles of paperwork. She developed medical problems from the stress of her job, she said. But receiving the news that despite her hard work and the praise of her students and peers the state thought she was Least Effective pushed her out the door.

“That’s when I said I can’t do it anymore,” she said. “For my own sanity, I had to leave.”

The Cleveland Plain Dealer and NPR then decided to add to this stress by publishing individual teachers' value-added scores, a matter we will address in our next post.

Why Test Scores CAN'T Evaluate Teachers

From the National Education Policy Center. The entire post is well worth a read; here's the synopsis:

The key element here that distinguishes Student Growth Percentiles from some of the other things that people have used in research is the use of percentiles. It's there in the title, so you'd expect it to have something to do with percentiles. What does that mean? It means that these measures are scale-free. They get away from psychometric scaling in a way that many researchers - not all, but many - say is important.

Now these researchers are not psychometricians, who aren't arguing against the scale. The psychometricians who create our tests, they create a scale, and they use scientific formulae and theories and models to come up with a scale. It's like on the SAT, you can get between 200 and 800. And the idea there is that the difference in the learning or achievement between a 200 and a 300 is the same as between a 700 and an 800.

There is no proof that that is true. There is no proof that that is true. There can't be any proof that is true. But, if you believe their model, then you would agree that that's a good estimate to make. There are a lot of people who argue... they don't trust those scales. And they'd rather use percentiles because it gets them away from the scale.

Let's state this another way so we're absolutely clear: there is, according to Jonah Rockoff, no proof that a gain on a state test like the NJASK from 150 to 160 represents the same amount of "growth" in learning as a gain from 250 to 260. If two students have the same numeric growth but start at different places, there is no proof that their "growth" is equivalent.

Now there's a corollary to this, and it's important: you also can't say that two students who have different numeric levels of "growth" are actually equivalent. I mean, if we don't know whether the same numerical gain at different points on the scale are really equivalent, how can we know whether one is actually "better" or "worse"? And if that's true, how can we possibly compare different numerical gains?
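A toy example makes the percentile idea, and the scale problem, concrete. The sketch below computes a crude "growth percentile" by ranking each student's current score only against students who started from the same prior score; the real Student Growth Percentile model uses quantile regression over score histories, and every number here is invented, so treat this as an illustration of the concept rather than the actual method.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up test scores on a 150-260 scale: students who start higher tend to gain
# fewer raw points on this invented scale, just to make the contrast visible.
prior = rng.integers(150, 261, size=5000)
current = prior + rng.normal(30 - 0.1 * (prior - 150), 8, size=prior.size)
df = pd.DataFrame({"prior": prior, "current": current})

def growth_percentile(row):
    # Crude stand-in for SGP: percentile rank among students with the same prior score.
    peers = df.loc[df["prior"] == row["prior"], "current"]
    return (peers < row["current"]).mean() * 100

# Two hypothetical students with the same 10-point raw gain, different starting points.
examples = pd.DataFrame({"prior": [150, 250], "current": [160, 260]})
for _, row in examples.iterrows():
    print(f"prior {row['prior']:.0f} -> current {row['current']:.0f}: "
          f"growth percentile ~ {growth_percentile(row):.0f}")
```

Both hypothetical students gain the same ten raw points, yet they land at very different percentiles among their academic peers, which is the whole argument for (and the limitation of) abandoning the scale.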

[readon2 url="http://nepc.colorado.edu/blog/why-test-scores-cant-evaluate-teachers"]Continue reading...[/readon2]

The cheating will continue until morale improves

Atlanta wasn’t an isolated incident. Neither was El Paso, or Washington, DC, or Columbus. A new Government Accountability Office report demonstrates that cheating by school officials on standardized tests has become commonplace despite the use of the security measures the report recommends. The only solution is one that Education Secretary Arne Duncan has so far refused: removing the high stakes attached to standardized testing.

The latest embarrassment is in Columbus, where this month Ohio State Auditor Dave Yost seized records at 20 high schools. This is part of a two-year-old investigation into “scrubbing” 2.8 million attendance records of students who failed tests. Yost has recently widened his investigation to look into whether school administrators also changed grades to boost graduation rates.

A GAO report released May 16 recommends adopting “leading practices to prevent test irregularities.” However, the report reveals that while all states and the District of Columbia use at least some of the recommended best practices, 33 states had confirmed instances of test cheating in the last two school years. And states where the worst offenses are occurring already have adopted most of the practices identified in the report, making it unlikely that greater security will improve test integrity.

Ohio employs five of the nine security plans recommended by the GAO report. Atlanta, where the superintendent and 34 other educators were recently indicted for changing test answers, has adopted eight of nine security practices, as has Texas, where the former El Paso superintendent is now in federal prison for a scheme to encourage low-performing students to drop out. And Washington, D.C., where 191 teachers at 70 schools were implicated in a rash of wrong-to-right erasure marks on tests, uses every single security measure.

The Department of Education responded to the GAO’s findings by holding a symposium on test integrity and issuing a follow-up report on best practices and policies. But the federal government convening a meeting and issuing yet another report might be even less effective at stopping cheating than increased security.

The report also noted that linking awards and recognition to improving test scores and threatening the jobs of principals for low test scores “could provide incentives to cheat.” But at a conference of education writers in April, Sec. Arne Duncan denied that linking test scores to career outcomes could drive educators to criminally manipulate the system.

“I reject the idea that the system forces people to cheat,” he said.

Maybe so, but cheating now seems inherent in the system, and our Education Secretary seems incurious as to why. It’s even hard to get him to admit there is an epidemic of test cheating. Asked about the Ohio investigation, Duncan said, “I almost don’t know of another situation like this.”

[readon2 url="http://jasonstanford.org/2013/05/the-cheating-will-continue-until-morale-improves/"]Continue reading...[/readon2]

Michelle Rhee and the unproven teacher evaluation

Via the LA Times

The debate -- and that’s putting it nicely -- over the use of standardized test scores in teacher evaluations has always confused me, because the answer seemed so simple. One of the things we ask of teachers -- but just one thing -- is to raise those scores. So they have some place in the evaluation. But how much? Easy. Get some good evidence and base the decisions on that, not on guessing. The quality of education is at stake, as well as people’s livelihoods.

Much to my surprise, at a meeting with the editorial board this week, Michelle Rhee agreed, more or less. As one of the more outspoken voices in the school-reform movement, Rhee is at least as polarizing as the topic of teacher evaluations, and her lobbying organization, Students First, takes the position that the standardized test scores of each teacher’s students should count for no less than 50% of that teacher’s rating on performance evaluations.

But asked where the evidence was to back up that or any other percentage figure, Rhee agreed quite openly that it’s lacking.

[readon2 url="http://www.latimes.com/news/opinion/opinion-la/la-ol-michelle-rhee-teachers-20130416,0,4487460.story"]Continue reading...[/readon2]