Researchers Give Failing Marks to Teacher Evaluation Systems

Via Hechinger Report.

School systems around the country are trying to use objective, quantifiable measures to identify which teachers are good and which are bad. One popular approach, used in New York, Chicago and other cities, is to calculate a value-added measure (VAM) of performance. Essentially, you create a model that begins by calculating how much kids’ test scores increase, on average, each year (test score in year 2 minus test score in year 1). Then you give a high score to teachers whose students post test-score gains above the average, and a low score to teachers whose students show smaller gains. There are lots of mathematical tweaks, but the general idea is to build a model that answers this question: are the students of this particular teacher learning more or less than you would expect them to? The teachers’ value-added scores are then used to figure out which teachers to train, fire or reward with bonuses.
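To make the basic arithmetic concrete, here is a minimal sketch in Python of the simplified idea described above: each teacher's average student gain compared with the overall average gain. It leaves out the "mathematical tweaks" entirely and is not the model any district actually uses. The function name, the record layout and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def value_added_scores(records):
    """Return a simplified value-added score per teacher.

    `records` is an iterable of (teacher, score_year1, score_year2)
    tuples, one per student. This layout is hypothetical.
    """
    # Collect each student's gain (year 2 minus year 1) under their teacher.
    gains_by_teacher = defaultdict(list)
    for teacher, score_year1, score_year2 in records:
        gains_by_teacher[teacher].append(score_year2 - score_year1)

    # Average gain across all students, regardless of teacher.
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    average_gain = mean(all_gains)

    # A teacher's score is how far their students' average gain sits
    # above (positive) or below (negative) the overall average.
    return {
        teacher: mean(gains) - average_gain
        for teacher, gains in gains_by_teacher.items()
    }


# Made-up sample data: two teachers, two students each.
sample_records = [
    ("Teacher A", 50, 60),
    ("Teacher A", 40, 52),
    ("Teacher B", 55, 58),
    ("Teacher B", 60, 65),
]
sample_scores = value_added_scores(sample_records)
print(sample_scores)
```

In this toy data, Teacher A's students gain more than the overall average and Teacher B's gain less, so A ends up with a positive score and B with a negative one. Real value-added models typically adjust for other factors as well, which this sketch does not attempt.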

Two academic researchers from the University of Southern California and the University of Pennsylvania looked at these value-added measures in six districts around the nation and found that there was a weak to zero relationship between these new numbers and the content or quality of the teacher’s instruction.

“These results call into question the fixed and formulaic approach to teacher evaluation that’s being promoted in a lot of states right now,” said Morgan Polikoff, one of the study’s authors, in a video that explains his paper, “Instructional Alignment as a Measure of Teaching Quality,” published online in Educational Evaluation and Policy Analysis on May 13, 2014. “These measures are not yet up to the task of being put into, say, an index to make important summative decisions about teachers.”

Polikoff of the University of Southern California and Andrew Porter of the University of Pennsylvania looked at the value-added scores of 327 fourth- and eighth-grade mathematics and English language arts teachers across all six school districts included in the Measures of Effective Teaching (MET) study (New York City, Dallas, Denver, Charlotte-Mecklenburg, Memphis, and Hillsborough County, Florida). Specifically, they compared the teachers’ value-added scores with how closely their instructional materials aligned with their state’s instructional standards and the content of the state tests. But teachers who were teaching the right things weren’t getting higher value-added scores.

They also looked at other measures of teacher quality, such as teacher observations and student evaluations. Teachers who won high marks from professional observers or students were not getting higher value-added scores either.

“What we’re left with is that state tests aren’t picking up what we think of as good teaching,” Polikoff said.

What’s interesting is that Polikoff’s and Porter’s research was funded by the Gates Foundation, which had been touting how teachers’ effectiveness could be estimated by their students’ progress on standardized tests. The foundation had come under fire from economists for flawed analysis. Now this new Gates Foundation-commissioned research has proved the critics right. (The Gates Foundation is also among the funders of The Hechinger Report.)

Polikoff said that the value-added measures do provide some information, but they’re meaningless if you want to use them to improve instruction. “If the things we think of as defining good instruction don’t seem to be producing substantially better student achievement, then how is it that teachers will be able to use the value-added results to make instructional improvements?” he asked.

Polikoff concludes that the research community needs to develop new measures of teacher quality in order to “move the needle” on teacher performance.
