The Albert Shanker Institute has an important analysis of Ohio's school report card data, and finds a large amount of instability in the results. This should cause some pause, especially as we move towards using teacher level value add data for high stakes decisions. To say it will be critical to have reliable, trustworthy, and stable data when making hiring/firing and salary decisions is an obvious understatement. If there are serious and genuine questions about building level data stability, then the rush to go further ought to at least have some brakes applied.
Some people might look at these results, in which most schools got different ratings between years, and be very skeptical of Ohio’s value-added measures. Others will have faith in them. It’s important to bear in mind that measuring school “quality” is far from an exact science, and all attempts to do so – using test scores or other metrics – will necessarily entail imprecision, both within and between years. It is good practice to always keep this in mind, and to interpret the results with caution.
So I can’t say definitively whether the two-year instability in ratings among Ohio’s public schools is “high” or “low” by any absolute standard. But I can say that the data suggest that schools really shouldn’t be judged to any significant extent based on just one or two years of value-added ratings.
Unfortunately, that’s exactly what’s happening in Ohio. Starting this year, all schools that come in “above expectations” in any given year are automatically bumped up a full “report card grade,” while schools that receive a “below expectations” ratings for two consecutive years are knocked down a grade (there are six possible grades). In both cases, the rules were changed (effective this year) such that fewer years were required to trigger the bumps – previously, it took two consecutive years “above expectations” to get a higher report card grade, and three consecutive years “below expectations” to lose a grade (see the state’s guide to ratings). These final grades can carry serious consequences, including closure, if they remain persistently low.
As I’ve said before, value-added and other growth models can be useful tools, if used properly. This is especially true of school-level value-added, since the samples are larger, and issues such as non-random assignment are less severe due to pooling of data for an entire school. However, given the rather high instability of ratings between years, and the fact that accuracy improves with additional years of data, the prudent move, if any, would be to require that more years of ratings be required to affect report card grades, not fewer. The state is once again moving in the wrong direction.