Ohio Teacher Evaluation System: Dishonest, Unrealistic, and Not Fully Supported by Academic Research

A great article that appeared on Daily Kos a few days ago:

I've spent the past three days at an OTES (Ohio Teacher Evaluation System) training. This system is being phased in over the next two years, and will serve as the vehicle by which all teachers in Ohio are evaluated. The workshop culminates with a post-assessment, taken some time after the classes end, resulting in licensure and the ability to evaluate instructional staff. OTES is described by ODE as a system that will
provide educators with a richer and more detailed view of their performance, with a focus on specific strengths and opportunities for improvement.

I talked to a number of administrators and teachers who had already taken the training before attending. Without exception, they were all struck by the rigidity of the rubric. I agree, but there's more here. Any system that wields so much power must be realistic, honest, and rooted in the consensus of academic research. The OTES rubric fails this basic test.

Words Matter
Check out the Ohio Standards for the Teaching Profession (starting on page 16) approved in October of 2005. Now look at the OTES rubric. The first thing you will notice is that the OTES rubric has four levels, and that the Ohio Standards only have three. I think it's fair to say that the Ohio Standards did not include the lowest level. (The document says as much.) The top three levels of the OTES Rubric align with the three levels of the Ohio Standards. The snag? The terminology used in the OTES rubric. Proficient has been replaced by Developing, Accomplished by Proficient, and Distinguished by Accomplished. Each level has been relegated!

One might argue that this doesn't matter. But, it does. Teacher evaluations are public record. School performance, or at least the percentage of teachers that fall into each category, will be published. Newspapers will ask for names of teachers and their ratings. And, as we will see as I unpack the rubric in greater detail, the very best teachers are likely to fall into the Proficient category. What's the one relationship between public education and the word Proficient already burned into the minds of parents? The minimal level of performance required to pass the Ohio Graduation Test. Dishonest.

[readon2 url=""]Continue reading...[/readon2]

3rd grade retention plan could cost $500 million

One of the signature policy initiatives in the Kasich MBR is the proposed change in the 3rd grade reading guarantee.

Specifically, according to LSC analysis, the bill (SB316) makes several changes to the third grade reading guarantee beginning with the 2012-2013 school year. Under current law, the third grade reading guarantee requires school districts and community schools to retain in third grade a student who scores in the "limited" range on the third grade English language arts assessment, unless the student's principal and reading teacher agree that the student is academically prepared for fourth grade or the student will receive intervention services in fourth grade. The bill changes the "cut" score and applies the guarantee to all students who do not receive at least a "proficient" (or passing) score on the assessment. The "limited" score, which currently triggers the guarantee, is the lowest of five scoring ranges and two levels below "proficient."

In short, more students will be held back, and less flexibility will be granted to educators in determining if a student who misses the proficient level can proceed to the fourth grade.

None of this expansion is funded, so let's look at what this policy could additionally cost cash-strapped schools.

In October 2011, a total of 126,569 3rd grade Ohio public school students participated in the Reading Achievement Test. Here are the aggregated results, according to ODE statistics.

Level        Number    Percent
Advanced     22,987    18.2%
Accelerated  23,619    18.7%
Proficient   28,038    22.2%
Basic        23,574    18.6%
Limited      28,351    22.4%

In recent years the number of students scoring proficient or higher has varied from a high of 67.5% to a low of 53.3%. Remember, according to the new proposal, any student scoring below proficient is likely to be held back and made to repeat 3rd grade.

Again, according to ODE statistics, the median cost per pupil in Ohio per year in 2011 was $9,567.89, with an average of $9,961.57.

Under the new rules, the 51,925 students who failed to reach the minimum proficiency standard would have been at risk of being held back. At the average cost of $9,961 per student, districts could be on the hook for a total of $517,224,925 to fund that many students repeating 3rd grade.

To put that into some perspective, the crisis in Cleveland public schools is caused by a budget shortfall of $65 million. This unfunded mandate could pay for that shortfall nearly 8 times over.
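The back-of-envelope arithmetic above can be sketched in a few lines. This is only a sketch using the ODE figures quoted in this post; the $9,961 figure is the statewide average cost per pupil, rounded down as in the article.

```python
# Cost sketch: retain every 3rd grader who scored below "proficient"
# on the October 2011 ODE reading test (figures quoted above).
below_proficient = {"Basic": 23_574, "Limited": 28_351}
at_risk = sum(below_proficient.values())   # students at risk of repeating 3rd grade
avg_cost_per_pupil = 9_961                 # 2011 statewide average, rounded down
total_cost = at_risk * avg_cost_per_pupil

cleveland_shortfall = 65_000_000
print(f"{at_risk:,} students -> ${total_cost:,}")  # 51,925 students -> $517,224,925
print(f"{total_cost / cleveland_shortfall:.1f}x the Cleveland shortfall")  # 8.0x
```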

Science Fact

Corporate education reform's science fiction is having an unintended(?) science-fact effect.

First the science

If VAM scores are at all accurate, there ought to be a significant correlation between a teacher's score one year and the next. In other words, good teachers should have consistently higher scores, and poor teachers ought to remain poor. The analysis plotted each teacher's 2009 rating on one axis and the 2010 rating on the other. What should we expect here? If there is a correlation, we should see some sort of upward-sloping line.
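For intuition, here is a minimal sketch of that check using synthetic data (randomly generated numbers, not the actual NYC ratings): if year-to-year scores were essentially noise, the Pearson correlation between the two years would sit near zero and the scatter plot would show no upward slope.

```python
# Illustrative only: two years of "ratings" drawn independently at random,
# mimicking the case where VAM scores carry no year-to-year signal.
import random

random.seed(0)
ratings_2009 = [random.gauss(50, 15) for _ in range(500)]
ratings_2010 = [random.gauss(50, 15) for _ in range(500)]  # independent of 2009

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson_r(ratings_2009, ratings_2010)
print(round(r, 3))  # near 0: last year's score says little about this year's
```

A stable, meaningful measure would instead produce a correlation well above zero across years.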

There is one huge takeaway from all this. VAM ratings are not an accurate reflection of a teacher's performance, even on the narrow indicators on which they focus. If an indicator is unreliable, it is a farce to call it "objective."

This travesty has the effect of discrediting the whole idea of using test score data to drive reform. What does it say about "reformers" when they are willing to base a large part of teacher and principal evaluations on such an indicator?

That travesty is now manifesting itself in real personal terms.

In 2009, 96 percent of P.S. 146's fifth graders were proficient in English, 89 percent in math. When the New York City Education Department released its numerical ratings recently, it seemed a sure bet that the P.S. 146 teachers would be at the very top.

Actually, they were near the very bottom.
Though 89 percent of P.S. 146 fifth graders were rated proficient in math in 2009, the year before, as fourth graders, 97 percent were rated as proficient. This resulted in the worst thing that can happen to a teacher in America today: negative value was added.

The difference between 89 percent and 97 percent proficiency at P.S. 146 is the result of three children scoring a 2 out of 4 instead of a 3 out of 4.
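To see why so few children can move the needle that far, here is the back-of-envelope arithmetic (a sketch that assumes the two percentages describe a single cohort of roughly constant size):

```python
# Rough cohort size implied by the article's numbers: a 3-child swing
# moved P.S. 146's proficiency rate by 8 percentage points (97% -> 89%).
children_moved = 3
swing_points = 97 - 89                            # percentage-point drop
implied_cohort = 100 * children_moved / swing_points
points_per_child = 100 / implied_cohort

print(implied_cohort)    # 37.5 -> roughly 37-38 fifth graders
print(points_per_child)  # each child swings the rate by about 2.7 points
```

In a cohort this small, a single off day for a handful of students shows up as a large "drop" in proficiency, which is exactly the kind of noise value-added models then attribute to the teacher.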

While Ms. Allanbrook does not believe in lots of test prep, her fourth-grade teachers do more of it than the rest of the school.

In New York City, fourth-grade test results can determine where a child will go to middle school. Fifth-grade scores have never mattered much, so teachers have been free to focus on project-based learning. While that may be good for a child’s intellectual development, it is hard on a teacher’s value-added score.

These teachers are not the only ones.

Bill Turque tells the story of teacher Sarah Wysocki, who was let go by D.C. public schools because her students got low standardized test scores, even though she received stellar personal evaluations as a teacher.

She was evaluated under the D.C. teacher evaluation system, called IMPACT, a so-called "value-added" method of assessing teachers that uses complicated mathematical formulas that purport to tell how much "value" a teacher adds to how much a student learns.

As more data is demanded, more analysis can be done to demonstrate how unreliable it is for these purposes, and consequently we are guaranteed to read more stories of good teachers becoming victims of bad measurements. It's unfortunate we're going to have to go through all this to arrive at this understanding.

'Nation's Report Card' Distracts From Real Concerns For Public Schools

Imagine you’re a parent of a seven-year-old who has just come home from school with her end-of-year report card. And the report card provides marks for only two subjects, and for children who are in grade-levels different from hers. Furthermore, there's nothing on the report card to indicate how well these children have been progressing throughout the year. There are no teacher comments, like "great participation in class" or "needs to turn in homework on time." And to top it off, the report gives a far harsher assessment of academic performance than reports you've gotten from other sources.

That's just the sort of "report card" that was handed to America yesterday in the form of the National Assessment of Educational Progress. And while the NAEP is all well and good for what it is -- a biennial norm-referenced, diagnostic assessment of fourth and eighth graders in math and reading -- the results of the NAEP invariably get distorted into all kinds of completely unfounded "conclusions" about the state of America's public education system.

'Nation's Report Card' Is Not A Report Card

First off, let's be clear on what the NAEP results that we got yesterday actually entail. As Diane Ravitch explains, there are two different versions of NAEP: 1) the Main NAEP, which we got yesterday, given every other year in grades 4 and 8 to measure national and state achievement in reading and math based on guidelines that change from time to time; and 2) the Long-Term Trend NAEP given less frequently at ages 9, 13, and 17 to test reading and math on guidelines that have been tested since the early 1970s. (There are also occasional NAEPs given in other subjects.) So in other words, be very wary of anyone claiming to identify "long term trends" based on the Main NAEP. This week's release was not the "long term" assessment.

Second, let's keep in mind the NAEP's limits in measuring "achievement." NAEP reports results in terms of the percent of students attaining Advanced, Proficient, Basic, and Below Basic levels. What's usually reported by the media is the "proficient and above" figure. After all, don't we want all children to be "proficient"? But what does that really mean? Proficiency as defined by NAEP is actually quite high -- in fact, much higher than the bar most states set and higher than the standards used by other nations such as Sweden and Singapore.

Third, despite its name, NAEP doesn't really show "progress." Because NAEP is a norm-referenced test, its purpose is comparison -- to see how many children fall above or below a "cut score." Repeated applications of NAEP provide periodic points of comparison of the percentages of students falling above and below the cut score, but does tracking that variance really show "progress"? Statisticians and researchers worth their salt would say no.

Finally, let's remember that NAEP proficiency levels defined the targets that all states were to aim for under the No Child Left Behind legislation. That policy has now been mostly scrapped, or at least significantly changed, because its proficiency goals were widely called "unrealistic."

Does this mean that NAEP is useless? Of course not. As a diagnostic tool it certainly has its place. But as the National Center on Fair and Open Testing (FairTest) has concluded, "NAEP is better than many state tests but is still far from the 'gold standard' its proponents claim for it."

[readon2 url=""]Continue reading...[/readon2]