The Arbitrary Albatross: Standardized Testing and Teacher Evaluation

On Chicago's streets and Hollywood's silver screens, education reform has been cast as a false dilemma between students and teachers. Reputable actresses and liberal mayors have both fallen prey. At the center of this drama lie teacher evaluations. A linchpin of the debate, they weigh especially heavily around the necks of educators like me.

Think: Shaky Foundation

With the arrival of spring, testing season is now upon us: America's new national pastime. I believe student results from standardized tests should not be used to evaluate teachers because the data are imprecise and the effects are pernicious. Including such inaccurate measures is both unfair to teachers and detrimental to student learning.

As a large body of research suggests, standardized test data are imprecise for two main reasons. First, they do not account for individual and environmental factors affecting student performance, factors over which teachers have no control. (Think: commitment, social class, family.) Second, high-stakes, one-time tests increase the likelihood of random variation so that scores fluctuate in arbitrary ways not linked to teacher efficacy. (Think: sleep, allergies, the heartache of a recent breakup.)

High-stakes assessments are also ruinous to student learning. They encourage, at the least, teaching to the test and, at the worst, outright cheating. This phenomenon is captured by Campbell's law, which holds that the more a quantitative indicator is used for decision making, the more subject it becomes to corruption pressures, and the more it distorts the very processes it is meant to monitor. (Think: presidential campaigns.)

As a teacher, if my livelihood is based on test results, then I will do everything possible to ensure high marks, including narrowing the curriculum and prepping fiercely for the test. The choice between an interesting project and a paycheck is no choice at all. These are powerful disincentives to student learning. Tying teachers' careers to standardized tests does not foster creative, passionate, skillful young adults. It does exactly the opposite.


Charters and their supporters failing our kids

ODE has finally released the full school report card, though only in spreadsheet format, and it comes with a warning:

ODE will not publish PDFs of the Local Report Cards until the investigation by the Auditor of State is concluded.

We thought it would be useful to compare how effective traditional public schools were versus their charter school counterparts. The results are staggeringly bad for charter schools:

Report Card Rating            Traditional Schools   Charter Schools
Academic Emergency                    3.4%               18.8%
Academic Watch                        4.6%               15.6%
Continuous Improvement               10.4%               27.3%
Effective                            21.4%               15.6%
Excellent                            41.0%                7.4%
Excellent with Distinction           14.4%                1.1%
Not Rated                             4.8%               14.2%

61.6% of all charter schools in Ohio are rated less than effective, while the same can be said of only 18.4% of traditional schools. If the purpose of charter schools was to be incubators of excellence, they are doing a very poor job, with only 8.5% of them achieving an excellent or better rating. Indeed, if you truly want to see excellence, you have to look at traditional public schools, where over 55% are rated excellent or better.
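The headline figures above are simple sums over the table's rating categories. A minimal sketch of that arithmetic, using the rating shares exactly as quoted in the table (note that summing the published one-decimal charter figures gives 61.7% below effective, a hair off the 61.6% above, presumably because the original figure was computed from unrounded counts):

```python
# Rating shares from ODE's report-card spreadsheet, as quoted in the table:
# (traditional %, charter %)
ratings = {
    "Academic Emergency":         (3.4, 18.8),
    "Academic Watch":             (4.6, 15.6),
    "Continuous Improvement":     (10.4, 27.3),
    "Effective":                  (21.4, 15.6),
    "Excellent":                  (41.0, 7.4),
    "Excellent with Distinction": (14.4, 1.1),
    "Not Rated":                  (4.8, 14.2),
}

BELOW_EFFECTIVE = ("Academic Emergency", "Academic Watch", "Continuous Improvement")
EXCELLENT_PLUS = ("Excellent", "Excellent with Distinction")

def share(categories, column):
    """Sum the percentage shares for the given rating categories.
    column 0 = traditional schools, column 1 = charter schools."""
    return round(sum(ratings[c][column] for c in categories), 1)

print(share(BELOW_EFFECTIVE, 0))  # 18.4 -- traditional schools below effective
print(share(BELOW_EFFECTIVE, 1))  # 61.7 -- charter schools below effective
print(share(EXCELLENT_PLUS, 0))   # 55.4 -- traditional schools excellent or better
print(share(EXCELLENT_PLUS, 1))   # 8.5  -- charter schools excellent or better
```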

If "school choice" organizations in Ohio had any integrity, the choice they would be urging, in almost all cases, would be for parents to choose traditional public schools. In the vast majority of cases, their advocacy of charter schools is an advocacy of miserable failure, at huge taxpayer expense.

Diane Ravitch spoke to this issue in Columbus yesterday:

Proficiency testing and charter schools were billed in the late 1990s as solutions to a broken public-education system. Now, they are part of a failed status quo, said Ravitch, 74, an author and U.S. assistant secretary of education under President George H.W. Bush.

Proficiency tests have changed — from something that assesses students to something used to punish teachers and schools, said Ravitch. And after a decade of poor results from charter schools, she said, the charter movement and high-stakes testing have proved to be failed national experiments.

Also speaking at the same event was Greg Harris, the Ohio director of the 65,000-member charter-school advocacy group StudentsFirst:

...charters were supposed to provide an experiment in innovation, and though many have failed, many others are working.

“The parents are making these choices” to go to charters, Harris said. “These are parents from high-poverty backgrounds who are making major sacrifices to get their kids out of failing schools.

“We agree with her that bad charter schools should be closed, but why close good ones?”

Parents are often steered into these choices by corporate education reformers and their boosters, like StudentsFirst, the most ironically named group of all. And when parents aren't being steered into wrong choices, it's because they are using factors other than quality to make their decisions, as we noted in this article.

Where the polls stand - Post convention

With the RNC and DNC conventions over, the clear winner, based on current polling, appears to be President Obama.

“Mr. Obama had another strong day in the polls on Saturday, making further gains in each of four national tracking polls. The question now is not whether Mr. Obama will get a bounce in the polls, but how substantial it will be. Some of the data, in fact, suggests that the conventions may have changed the composition of the race, making Mr. Obama a reasonably clear favorite as we enter the stretch run of the campaign.” So wrote Nate Silver in The New York Times.

Let's take a look at the state of play. First, Real Clear Politics has the race essentially unchanged from last week, with President Obama holding 221 electoral college votes to Mitt Romney's 191; 126 are listed as toss-ups.

In Ohio, RCP has Obama's lead increasing from an average of 1.4% to 2.2%.

538, which we quoted up top, has the President's advantage increasing by 10 electoral college votes; it now stands at landslide levels of 318.8.

In Ohio, his chances of victory have also increased and now stand at 74.6%, up from 71.5% last week.

The crazy polling result of the day perhaps comes from a PPP poll of Ohio, in which 15% of Ohio Republicans said Mitt Romney deserved more credit than President Obama for the killing of Osama bin Laden.

Poor schools can’t win

Without question, designing school and district rating systems is a difficult task, and Ohio was somewhat ahead of the curve in attempting to do so (and they’re also great about releasing a ton of data every year). As part of its application for ESEA waivers, the state recently announced a newly-designed version of its long-standing system, with the changes slated to go into effect in 2014-15. State officials told reporters that the new scheme is a “more accurate reflection of … true [school and district] quality.”

In reality, however, despite its best intentions, what Ohio has done is perpetuate a troubled system by making less-than-substantive changes that seem to serve the primary purpose of giving lower grades to more schools in order for the results to square with preconceptions about the distribution of “true quality.” It’s not a better system in terms of measurement – both the new and old schemes consist of mostly the same inappropriate components, and the ratings differentiate schools based largely on student characteristics rather than school performance.

So, whether or not the aggregate results seem more plausible is not particularly important, since the manner in which they’re calculated is still deeply flawed. And demonstrating this is very easy.

Rather than get bogged down in details about the schemes, the quick and dirty version of the story is that the old system assigned six possible ratings based mostly on four measures: AYP; the state’s performance index; the percent of state standards met; and a value-added growth model (see our post for more details on the old system). The new system essentially retains most of the components of the old, but the formula is a bit different and it incorporates a new “achievement and graduation gap” measure that is supposed to gauge whether student subgroups are making acceptable progress. The “gap” measure is really the only major substantive change to the system’s components, but it basically just replaces one primitive measure (AYP) with another.*

Although the two systems yield different results overall, the major components of both – all but the value-added scores – are, directly or indirectly, “absolute performance” measures. They reflect how highly students score, not how quickly they improve. As a result, the measures are telling you more about the students that schools serve than the quality of instruction that they provide. Making high-stakes decisions based on this information is bad policy. For example, closing a school in a low-income neighborhood based on biased ratings not only means that one might very well be shutting down an effective school, but also that it’s unlikely it will be replaced by a more effective alternative.

Put differently, the most important step in measuring schools’ effectiveness is controlling for confounding observable factors, most notably student characteristics. Ohio’s ratings are driven by them. And Ohio is not the only state.

(Important side note: With the exception of the state’s value-added model, which, despite the usual issues, such as instability, is pretty good, virtually every indicator used by the state is a cutpoint-based measure. These are severely limited and potentially very misleading in ways that are unrelated to the bias. I will not be discussing these issues in this post, but see the second footnote below this post, and here and here for some related work.)**

The components of the new system

The severe bias in the new system’s constituent measures is unmistakable and easy to spot. To illustrate it in an accessible manner, I’ve identified the schools with free/reduced lunch rates that are among the highest 20 percent (highest quintile) of all non-charter schools in the state. This is an imperfect proxy for student background, but it’s sufficient for our purposes. (Note: charter schools are excluded from all these figures.)

The graph below breaks down schools in terms of how they scored (A-F) on each of the four components in the new system; these four grades are averaged to create the final grade. The bars represent the percent of schools (over 3,000 in total) receiving each grade that are in the highest poverty quintile. For example, looking at the last set of bars on the right (value-added), 17 percent of the schools that received the equivalent of an F (red bar) on the value-added component were high-poverty schools.
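The breakdown just described can be sketched in a few lines of code. Everything below is an illustrative assumption, not Ohio's actual formula: the field names (`frl_rate`), the A-to-F grade-point mapping, and the sample data are hypothetical, standing in for the state's component grades and the free/reduced lunch quintile flag:

```python
from statistics import quantiles

# Hypothetical letter-grade-to-points mapping for averaging component grades.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def poverty_cutoff(frl_rates):
    """Free/reduced-lunch rate at the 80th percentile: schools above this
    cutoff form the highest-poverty quintile."""
    return quantiles(frl_rates, n=5)[-1]

def final_grade(component_grades):
    """Average the four component grades (A-F) into a final letter grade."""
    mean = sum(GRADE_POINTS[g] for g in component_grades) / len(component_grades)
    # Map the averaged points back to the nearest letter grade.
    return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - mean))

def high_poverty_share(schools, component, cutoff):
    """Percent of the schools receiving each grade on `component` that fall
    in the highest-poverty quintile (the bars in the graph)."""
    out = {}
    for grade in GRADE_POINTS:
        group = [s for s in schools if s[component] == grade]
        if group:
            high = sum(1 for s in group if s["frl_rate"] > cutoff)
            out[grade] = round(100 * high / len(group), 1)
    return out

# Toy usage: two A-rated schools on a component, one high-poverty, one not.
schools = [{"va_grade": "A", "frl_rate": 95.0},
           {"va_grade": "A", "frl_rate": 10.0}]
print(high_poverty_share(schools, "va_grade", cutoff=80.0))  # {'A': 50.0}
```

If the component measures were unbiased, the bars produced this way would be roughly flat across grades; the post's point is that for the absolute-performance components, they are anything but.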


Who are the businesses making threats to children?

Gongwer, March 12th, quoting Governor John Kasich:

"I think what's going to happen in Cleveland (if the legislation doesn't pass), I've been told the business community is walking away," Mr. Kasich said. "They're not going to support levies; they're done; they're finished with what's happening there."

These businesses that the Governor references ought to step forward from behind the skirts of the chamber of commerce. People have a right to know who is making threats against their children's future, so they can decide whether that's a business they wish to continue to support.

Ohio can't wait to start misusing value add

The Columbus Dispatch ran an article "Ratings start to ID effective teachers", which discusses the recent use of teacher level value add scores, primarily as part of RttT, but which also will feature heavily in teacher evaluations going forward.

The article covers a lot of common ground, but not until the 17th of its 27 paragraphs does it even mention how inappropriate value add is for this use:

Officials involved in producing the new effectiveness ratings say they should not be used to label a teacher as good or bad. This year’s rating is a statement of a teacher’s effectiveness with his or her students from last school year, and nothing more, said Mary Peters, senior director of research and innovation at Battelle for Kids. The Columbus-based nonprofit organization is helping the Education Department develop the effectiveness system.

“We need to be careful about making judgments about one year of data,” Peters said. “These measures were intended for diagnostic purposes, to provide information to help teachers reflect on their practice and determine with whom they are being successful.”

Despite these constant warnings from academics and researchers, policy makers and some government bureaucrats continue to see teacher-level value add as a primary tool for teacher evaluation, and it looks for all the world as though Ohio can't wait any longer to begin misusing this tool.