Without question, designing school and district rating systems is a difficult task, and Ohio was somewhat ahead of the curve in attempting to do so (the state is also great about releasing a ton of data every year). As part of its application for ESEA waivers, the state recently announced a newly designed version of its long-standing system, with the changes slated to go into effect in 2014-15. State officials told reporters that the new scheme is a “more accurate reflection of … true [school and district] quality.”

In reality, however, despite its best intentions, what Ohio has done is perpetuate a troubled system by making less-than-substantive changes that seem to serve the primary purpose of giving lower grades to more schools in order for the results to square with preconceptions about the distribution of “true quality.” It’s not a better system in terms of measurement – both the new and old schemes consist of mostly the same inappropriate components, and the ratings differentiate schools based largely on student characteristics rather than school performance.

So, whether or not the aggregate results seem more plausible is not particularly important, since the manner in which they’re calculated is still deeply flawed. And demonstrating this is very easy.

I won’t get bogged down in the details of the two schemes. The quick and dirty version of the story is that the old system assigned six possible ratings based mostly on four measures: AYP; the state’s performance index; the percent of state standards met; and a value-added growth model (see our post for more details on the old system). The new system retains most of the components of the old one, but the formula is a bit different, and it incorporates a new “achievement and graduation gap” measure that is supposed to gauge whether student subgroups are making acceptable progress. The “gap” measure is really the only major substantive change to the system’s components, but it basically just replaces one primitive measure (AYP) with another.*

Although the two systems yield different results overall, the major components of both – all but the value-added scores – are, directly or indirectly, “absolute performance” measures. They reflect how highly students score, not how quickly they improve. As a result, the measures tell you more about the students that schools serve than about the quality of instruction that they provide. Making high-stakes decisions based on this information is bad policy. For example, closing a school in a low-income neighborhood based on biased ratings not only means that one might very well be shutting down an effective school, but also that the school is unlikely to be replaced by a more effective alternative.

Put differently, the most important step in measuring schools’ effectiveness is controlling for confounding observable factors, most notably student characteristics. Ohio’s ratings are instead driven by those characteristics. And Ohio is not the only state where this is the case.

(Important side note: With the exception of the state’s value-added model, which, despite the usual issues, such as instability, is pretty good, virtually every indicator used by the state is a cutpoint-based measure. These are severely limited and potentially very misleading in ways that are unrelated to the bias. I will not be discussing these issues in this post, but see the second footnote below this post, and here and here for some related work.)**

The components of the new system

The severe bias in the new system’s constituent measures is unmistakable and easy to spot. To illustrate it in an accessible manner, I’ve identified the schools with free/reduced lunch rates that are among the highest 20 percent (highest quintile) of all non-charter schools in the state. This is an imperfect proxy for student background, but it’s sufficient for our purposes. (Note: charter schools are excluded from all these figures.)

The graph below breaks down schools in terms of how they scored (A-F) on each of the four components in the new system; these four grades are averaged to create the final grade. The bars represent the percent of schools (over 3,000 in total) receiving each grade that are in the highest poverty quintile. For example, looking at the last set of bars on the right (value-added), 17 percent of the schools that received the equivalent of an F (red bar) on the value-added component were high-poverty schools.
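For readers who want to run this kind of breakdown themselves, here is a minimal sketch of the calculation described above. It is not the code behind the graph: the field names (`frl_rate`, `is_charter`, `grades`), the component labels, and the eleven schools are all made up for illustration, and the "highest quintile" is approximated as the top 20 percent of schools ranked by free/reduced lunch rate. Because one in five schools is high-poverty by construction, a component unrelated to poverty would show a share near 20 percent for every grade.

```python
from collections import defaultdict


def high_poverty_share_by_grade(schools, component):
    """For one rating component, return {grade: percent of the schools
    receiving that grade that are in the highest poverty quintile}."""
    # Charter schools are excluded, as in the post.
    pool = [s for s in schools if not s["is_charter"]]

    # Highest quintile: the top 20 percent of schools ranked by
    # free/reduced lunch (FRL) rate. This ranking rule is an assumption.
    ranked = sorted(pool, key=lambda s: s["frl_rate"], reverse=True)
    high_ids = {s["id"] for s in ranked[: len(ranked) // 5]}

    totals = defaultdict(int)   # schools receiving each grade
    high = defaultdict(int)     # of those, how many are high-poverty
    for s in pool:
        grade = s["grades"][component]
        totals[grade] += 1
        if s["id"] in high_ids:
            high[grade] += 1
    return {g: 100.0 * high[g] / totals[g] for g in sorted(totals)}


# Hypothetical data: (id, FRL rate, charter?, performance index, value-added).
rows = [
    (1, 0.05, False, "A", "C"), (2, 0.15, False, "A", "F"),
    (3, 0.25, False, "B", "A"), (4, 0.35, False, "B", "C"),
    (5, 0.45, False, "C", "F"), (6, 0.55, False, "C", "A"),
    (7, 0.65, False, "D", "C"), (8, 0.75, False, "D", "F"),
    (9, 0.85, False, "F", "A"), (10, 0.95, False, "F", "F"),
    (11, 0.99, True, "F", "F"),  # charter school, dropped from the pool
]
schools = [
    {"id": i, "frl_rate": frl, "is_charter": charter,
     "grades": {"performance_index": pi, "value_added": va}}
    for i, frl, charter, pi, va in rows
]

pi_shares = high_poverty_share_by_grade(schools, "performance_index")
va_shares = high_poverty_share_by_grade(schools, "value_added")
print(pi_shares)
print(va_shares)
```

In this toy data the two high-poverty schools account for every F on the absolute-performance component, while on the value-added component they are spread across the A and F groups. That contrast is built into the made-up numbers and says nothing about Ohio's actual results; the point is only to show how each bar in a graph like this one is computed.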

[readon2 url="http://shankerblog.org/?p=5511"]Continue reading[/readon2]