On Teacher Evaluation: Slow Down And Get It Right

One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, nearly everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.

The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent.

The risks of excessive haste are likely greater than whatever opportunity costs would be incurred by proceeding more cautiously. Moving too quickly gives policymakers and educators less time to devise and test the new systems, and to become familiar with how they work and the results they provide.

Moreover, careless rushing may result in avoidable, erroneous high-stakes decisions about individual teachers. Such decisions harm the profession, threaten the credibility of the evaluations, and may well provoke widespread backlash (such as the recent Florida lawsuits and the growing “opt-out” movement). Worse, the opposition will likely “spill over” into other promising policies, such as the already-fragile effort to enact the Common Core standards and aligned assessments.

[readon2 url="http://shankerblog.org/?p=8358"]Continue reading...[/readon2]

Do Different Value-Added Models Tell Us the Same Things?



  • Statistical models that evaluate teachers based on growth in student achievement differ in how they account for student backgrounds and for school and classroom resources. They also differ in whether they compare teachers across a district (or state) or only within schools.
  • Statistical models that do not account for student background factors produce estimates of teacher quality that are highly correlated with estimates from value-added models that do control for student backgrounds, as long as each includes a measure of prior student achievement.
  • Even when correlations between models are high, different models will categorize many teachers differently.
  • Teachers of advantaged students benefit from models that do not control for student background factors, while teachers of disadvantaged students benefit from models that do.
  • The type of teacher comparisons, whether within or between schools, generally has a larger effect on teacher rankings than statistical adjustments for differences in student backgrounds across classrooms.
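The interplay described in the bullets above can be illustrated with a small simulation. The sketch below is purely illustrative (the data-generating process, parameter values, and the simple residual-averaging estimator are assumptions for demonstration, not any district's actual value-added model): it fits two estimators to simulated classrooms — one adjusting only for prior achievement, one also adjusting for a student background factor — and shows that the resulting teacher estimates can be substantially correlated while still sorting many teachers into different performance quintiles.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_students = 200, 25

# True teacher effects, and non-random sorting of students: each teacher's
# classroom has a different average background (e.g., SES) level.
true_effect = rng.normal(0.0, 0.2, n_teachers)
class_ses = rng.normal(0.0, 1.0, n_teachers)

teacher_id, ses, prior, current = [], [], [], []
for t in range(n_teachers):
    s = class_ses[t] + rng.normal(0.0, 0.5, n_students)   # student background
    p = 0.5 * s + rng.normal(0.0, 1.0, n_students)        # prior achievement
    y = 0.7 * p + 0.3 * s + true_effect[t] + rng.normal(0.0, 1.0, n_students)
    teacher_id += [t] * n_students
    ses.extend(s); prior.extend(p); current.extend(y)

teacher_id = np.array(teacher_id)
ses, prior, current = map(np.array, (ses, prior, current))

def teacher_estimates(covariates):
    """OLS residuals of current score on covariates, averaged per classroom."""
    X = np.column_stack([np.ones(len(current))] + covariates)
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return np.array([resid[teacher_id == t].mean() for t in range(n_teachers)])

est_no_bg = teacher_estimates([prior])        # prior achievement only
est_bg = teacher_estimates([prior, ses])      # prior achievement + background

def quintile(v):
    # Rank-based quintile assignment, 0 (bottom) through 4 (top).
    return v.argsort().argsort() * 5 // len(v)

corr = float(np.corrcoef(est_no_bg, est_bg)[0, 1])
disagree = float(np.mean(quintile(est_no_bg) != quintile(est_bg)))
print(f"correlation: {corr:.2f}, quintile disagreement: {disagree:.0%}")
```

Because both models include prior achievement, the two sets of estimates correlate strongly; but the model that omits the background adjustment systematically favors teachers of advantaged classrooms, so a nontrivial share of teachers land in different quintiles — the pattern the bullets describe.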


There are good reasons for re-thinking teacher evaluation. As we know, evaluation systems in most school districts appear to be far from rigorous. A recent study showed that more than 99 percent of teachers in a number of districts were rated “satisfactory,” which does not comport with empirical evidence that teachers differ substantially from each other in terms of their effectiveness. Likewise, the ratings do not reflect the assessment of the teacher workforce by administrators, other teachers, or students.

Evaluation systems that fail to recognize the true differences that we know exist among teachers greatly hamper the ability of school leaders and policymakers to make informed decisions about such matters as which teachers to hire, which to support, which to promote, and which to dismiss. Thus it is encouraging that policymakers are developing more rigorous evaluation systems, many of which are partly based on student test scores.

Yet while the idea of using student test scores for teacher evaluations may be conceptually appealing, there is no universally accepted methodology for translating student growth into a measure of teacher performance. In this brief, we review what is known about how measures that use student growth align with one another, and what that agreement or disagreement might mean for policy.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/different-growth-models/"]Continue reading...[/readon2]

Assessing Ourselves To Death

I have two points to make. The first is something that I think everyone knows: Educational outcomes, such as graduation and test scores, are signals of or proxies for the traits that lead to success in life, not the cause of that success.

For example, it is well-documented that high school graduates earn more, on average, than non-graduates. Thus, one often hears arguments that increasing graduation rates will drastically improve students’ future prospects, and the performance of the economy overall. Well, not exactly.

The piece of paper, of course, only goes so far. Rather, the benefits of graduation arise because graduates are more likely to possess the skills – including the critical non-cognitive sort – that make people good employees (and, on a highly related note, because employers know that, and use credentials to screen applicants).

We could very easily increase the graduation rate by easing requirements, but this wouldn’t do much to help kids advance in the labor market. They might get a few more calls for interviews, but over the long haul, they’d still be at a tremendous disadvantage if they lacked the required skills and work habits.

Moreover, employers would quickly catch on, and adjust course accordingly. They’d stop relying as much on high school graduation to screen potential workers. This would not only deflate the economic value of a diploma, but high school completion would also become a less useful measure for policymakers and researchers.

This is, of course, one of the well-known risks of a high-stakes focus on metrics such as test scores. Test-based accountability presumes that test scores are reliable proxies for the knowledge and skills we actually care about. We all know about what is sometimes called “Campbell’s Law,” and we’ve all heard the warnings and complaints about so-called “teaching to the test.” Some people take these arguments too far, while others are too casually dismissive. In general, though, the public (if not all policymakers) has a sense that test-based accountability can be a good thing so long as it is done correctly and doesn’t go too far.

Now, here’s my second point: I’m afraid we’ve gone too far.

[readon2 url="http://shankerblog.org/?p=6835"]Continue Reading...[/readon2]