If a new drug had shown some promise in curing the flu in lab trials, but there were also indications that it had some nasty, in some cases fatal, side effects, would you want that drug put through more testing and trials, or rushed into production and given out as widely as possible?

That's basically the scenario we have with using value-added scores for high-stakes decisions about teachers. Sure, no one is actually going to die, but if corporate education reformers have their way, many teachers might wrongly lose their jobs, and the money wasted will never be used to actually educate a student. And what of the opportunity cost of failing to get effective reforms into the classroom?

Given the context-dependency of the estimators’ ability to produce accurate results, however, and our current lack of knowledge regarding prevailing assignment practices, VAM-based measures of teacher performance, as currently applied in practice and research, must be subjected to close scrutiny regarding the methods used and interpreted with a high degree of caution.

Methods of constructing estimates of teacher effects that we can trust for high-stakes evaluative purposes must be further studied, and there is much left to investigate. In future research, we will explore the extent to which various estimation methods, including more sophisticated dynamic treatment effects estimators, can handle further complexity in the DGPs.

The addition of test measurement error, school effects, time-varying teacher effects, and different types of interactions among teachers and students are a few of many possible dimensions of complexity that must be studied. Finally, diagnostics are needed to identify the structure of decay and prevailing teacher assignment mechanisms. If contextual norms with regard to grouping and assignment mechanisms can be deduced from available data, then it may be possible to determine which estimators should be applied in a given context.
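To make the point about assignment mechanisms concrete, here is a minimal toy simulation. Everything in it is an illustrative assumption rather than anything taken from the research quoted above: the function name `simulate`, the simple gain-score estimator, the persistence (decay) value of 0.5, and the noise levels are all hypothetical. It compares how far a gain-score estimate lands from each teacher's true effect when students are assigned to classes at random versus grouped by prior test score.

```python
import math
import random


def simulate(tracked, n_teachers=20, class_size=25, persistence=0.5, seed=0):
    """Return the RMSE of a simple gain-score estimate of teacher effects.

    Hypothetical data-generating process (an assumption for illustration):
      prior score   = ability + noise
      current score = persistence * ability + teacher effect + noise
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 0.25) for _ in range(n_teachers)]

    # Each student is (ability, prior_score).
    students = []
    for _ in range(n_teachers * class_size):
        ability = rng.gauss(0, 1)
        prior = ability + rng.gauss(0, 0.5)
        students.append((ability, prior))

    if tracked:
        # Grouping: classes are formed from students with similar prior scores.
        students.sort(key=lambda s: s[1])
    else:
        # Random assignment of students to classes.
        rng.shuffle(students)

    squared_errors = []
    for j, effect in enumerate(true_effects):
        classroom = students[j * class_size:(j + 1) * class_size]
        gains = []
        for ability, prior in classroom:
            current = persistence * ability + effect + rng.gauss(0, 0.5)
            gains.append(current - prior)
        # Gain-score estimate: average (current - prior) in the classroom.
        estimate = sum(gains) / len(gains)
        squared_errors.append((estimate - effect) ** 2)

    return math.sqrt(sum(squared_errors) / n_teachers)


rmse_random = simulate(tracked=False)
rmse_tracked = simulate(tracked=True)
print(f"RMSE with random assignment:   {rmse_random:.3f}")
print(f"RMSE with grouping by prior:   {rmse_tracked:.3f}")
```

In this toy setup the same estimator applied to the same teachers does noticeably worse once students are grouped by prior score, because the class average then reflects who was in the room as well as who taught it. A real evaluation system would be far more complicated, which is rather the point.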

We must be able to prove that evaluations and the metrics that make them up are fair, accurate, and stable, and if they are to have any real benefit, they must ultimately demonstrate a cost-effective way to improve student achievement and education quality. We're simply not there yet, and pretending we are is dangerous and carries some very real risks.