Two new studies question value-added measures

The evidence is overwhelming: yet more studies show that using value-added to measure teacher quality is fraught with error.

Academic tracking in secondary education appears to confound an increasingly common method for gauging differences in teacher quality, according to two recently released studies.

Failing to account for how students are sorted into more- or less-rigorous classes—as well as the effect different tracks have on student learning—can lead to biased "value added" estimates of middle and high school teachers' ability to boost their students' standardized-test scores, the papers conclude.

"I think it suggests that we're making even more errors than we need to—and probably pretty large errors—when we're applying value-added to the middle school level," said Douglas N. Harris, an associate professor of economics at Tulane University in New Orleans, whose study examines the application of a value-added approach to middle school math scores.

High-school-level findings from a second study, by C. Kirabo Jackson, an associate professor of human development and social policy at Northwestern University in Evanston, Ill., complement Mr. Harris' paper.

"At the elementary level, [value-added] is a pretty reliable measure, in terms of predicting how teachers will perform the following year," Mr. Jackson said. "At the high school level, it is quite a bit less reliable, so the scope for using this to improve student outcomes is much more limited."

The first study concludes (emphasis ours):

We test the degree to which variation in measured performance is due to misalignment versus selection bias in a statewide sample of middle schools where students and teachers are assigned to explicit “tracks,” reflecting heterogeneous student ability and/or preferences. We find that failing to account for tracks leads to large biases in teacher value-added estimates.

A teacher of all lower track courses whose measured value-added is at the 50th percentile could increase her measured value-added to the 99th percentile simply by switching to all upper-track courses. We estimate that 75-95 percent of the bias is due to student sorting and the remainder due to test misalignment.

We also decompose the remaining bias into two parts, metric and multidimensionality misalignment, which work in opposite directions. Even after accounting for explicit tracking, the standard method for estimating teacher value-added may yield biased estimates.
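The sorting-bias mechanism the first study describes can be illustrated with a toy simulation. This is our own sketch, not the papers' actual model: every teacher here adds exactly nothing, yet the teacher assigned the upper track earns a large positive measured value-added score, simply because students are tracked on ability that the prior-year test only partially captures.

```python
import random
import statistics

random.seed(0)

# Toy assumptions (ours, not the studies'): each student has a latent
# ability a; prior and post test scores are equally noisy measures of a;
# teachers contribute NOTHING to the post score (true effect = 0).
N = 20000
ability = [random.gauss(0, 1) for _ in range(N)]
prior = [a + random.gauss(0, 1) for a in ability]
post = [a + random.gauss(0, 1) for a in ability]

# Tracking sorts on latent ability, which the prior score
# only partially reflects.
upper = [a > 0 for a in ability]

# Naive value-added: regress post on prior in the pooled sample,
# ignoring tracks, then average residuals by track ("teacher").
mx, my = statistics.mean(prior), statistics.mean(post)
b = sum((x - mx) * (y - my) for x, y in zip(prior, post)) / \
    sum((x - mx) ** 2 for x in prior)
resid = [y - (my + b * (x - mx)) for x, y in zip(prior, post)]

va_upper = statistics.mean(r for r, u in zip(resid, upper) if u)
va_lower = statistics.mean(r for r, u in zip(resid, upper) if not u)
print(f"slope on prior test: {b:.2f}")
print(f"measured VA, upper-track teacher: {va_upper:+.2f}")
print(f"measured VA, lower-track teacher: {va_lower:+.2f}")
```

Because the prior score is a noisy proxy for the ability students were sorted on, the upper-track teacher's students systematically beat the regression's prediction and the lower-track teacher's students systematically fall short of it, even though neither teacher has any effect at all. That is the flavor of bias the studies say switching tracks can produce.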

The second study replicates the findings and concludes:

Unlike in elementary-school, high-school teacher effects may be confounded with both selection to tracks and unobserved track-level treatments. I document sizable confounding tracks effects, and show that traditional tests for the existence of teacher effects are likely biased. After accounting for these biases, algebra teachers have modest effects and there is little evidence of English teacher effects.

Unlike in elementary-school, value-added estimates are weak predictors of teachers’ future performance. Results indicate that either (a) teachers are less influential in high-school than in elementary-school, or (b) test-scores are a poor metric to measure teacher quality at the high-school level.

Corporate education reformers need to begin addressing the science that is refuting their policies; the sooner that happens, the less damage is likely to be wrought.