
Gates Foundation Wastes More Money Pushing VAM

Makes it hard to trust the corporate ed reformers when they goose their stats as badly as this.

Any attempt to evaluate teachers that is spoken of repeatedly as being "scientific" is naturally going to provoke rebuttals that verge on technical geek-speak. The MET Project's "Ensuring Fair and Reliable Measures of Effective Teaching" brief does just that. MET was funded by the Bill & Melinda Gates Foundation.

At the center of the brief's claims are a couple of figures (“scatter diagrams” in statistical lingo) that show remarkable agreement in VAM scores for teachers in Language Arts and Math for two consecutive years. The dots form virtual straight lines. A teacher with a high VAM score one year can be relied on to have an equally high VAM score the next, so Figure 2 seems to say.

Not so. The scatter diagrams are not dots of teachers' VAM scores but of averages of groups of VAM scores. For some unexplained reason, the statisticians who analyzed the data for the MET Project report divided the 3,000 teachers into 20 groups of about 150 teachers each and plotted the average VAM scores for each group. Why?

And whatever the reason might be, why would one do such a thing when it has been known for more than 60 years that correlating averages of groups grossly overstates the strength of the relationship between two variables? W.S. Robinson documented this problem in 1950, and it is now known as the "ecological correlation fallacy." Look it up on Wikipedia. Robinson's classic example: the correlation between %-African-American and %-illiterate was extremely high when measured at the level of states, and that figure was used decades ago to argue that African-Americans were illiterate. In truth, at the level of individual persons, the correlation is very much lower; we're talking about differences as great as .90 for aggregates versus .20 for persons.

The fact that the average VAM score of 150 teachers agrees with the next year's average for the same 150 teachers gives us no confidence that an individual teacher's VAM score is reliable across years. In fact, such scores are not, as several studies have repeatedly shown.
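To see how much aggregation can inflate a correlation, here is a quick simulation. The data are made up, and I am assuming the 20 groups were formed by ranking teachers on one year's score, which is the usual way such binned scatterplots are built; the point is only that a weak individual-level relationship (around .2) can look nearly perfect (around .9) once you correlate group averages.

```python
import numpy as np

# Made-up data mirroring the figures described above: 3,000 teachers,
# 20 groups of 150, and a weak individual-level year-to-year correlation.
rng = np.random.default_rng(42)

n_teachers, r_individual = 3000, 0.2
year1 = rng.normal(size=n_teachers)
year2 = r_individual * year1 + np.sqrt(1 - r_individual**2) * rng.normal(size=n_teachers)

print("Individual-level correlation:", round(np.corrcoef(year1, year2)[0, 1], 2))

# Sort teachers by their year-1 score, bin them into 20 groups of 150,
# and correlate the group averages instead of the individual scores.
# (Assumption: the MET groups were formed by ranking on one variable.)
order = np.argsort(year1)
avg1 = year1[order].reshape(20, 150).mean(axis=1)
avg2 = year2[order].reshape(20, 150).mean(axis=1)

print("Group-average correlation:  ", round(np.corrcoef(avg1, avg2)[0, 1], 2))
```

Run it and the individual-level correlation comes out near .2 while the group-average correlation lands near .9, exactly the kind of gap Robinson described.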

[readon2 url="http://ed2worlds.blogspot.com/2013/01/gates-foundation-wastes-more-money.html"]Continue reading...[/readon2]

Value-Added Versus Observations

Value-Added Versus Observations, Part One: Reliability

Although most new teacher evaluations are still in various phases of pre-implementation, it’s safe to say that classroom observations and/or value-added (VA) scores will be the most heavily-weighted components toward teachers’ final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many – perhaps most – teachers strongly prefer the former (observations, especially peer observations) over the latter (VA).

One of the most common arguments against VA is that the scores are error-prone and unstable over time – i.e., that they are unreliable. And it’s true that the scores fluctuate between years, with much of this instability due to measurement error, rather than “real” performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class.

These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Actually, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe “true” teacher performance, it’s tough to say which is “better” or “worse,” despite the certainty with which both “sides” often present their respective cases. And, the fact that both entail some level of measurement error doesn’t by itself speak to whether they should be part of evaluations.

Nevertheless, many states and districts have already made the choice to use both measures, and in these places, the existence of imprecision is less important than how to deal with it. Viewed from this perspective, VA and observations are in many respects more alike than different.

[readon2 url="http://shankerblog.org/?p=5621"]Continue reading part I[/readon2]

Value-Added Versus Observations, Part Two: Validity

In a previous post, I compared value-added (VA) and classroom observations in terms of reliability – the degree to which they are free of error and stable over repeated measurements. But even the most reliable measures aren’t useful unless they are valid – that is, unless they’re measuring what we want them to measure.

Arguments over the validity of teacher performance measures, especially value-added, dominate our discourse on evaluations. There are, in my view, three interrelated issues to keep in mind when discussing the validity of VA and observations. The first is definitional – in a research context, validity is less about a measure itself than the inferences one draws from it. The second point might follow from the first: The validity of VA and observations should be assessed in the context of how they’re being used.

Third and finally, given the difficulties in determining whether either measure is valid in and of itself, as well as the fact that so many states and districts are already moving ahead with new systems, the best approach at this point may be to judge validity in terms of whether the evaluations are improving outcomes. And, unfortunately, there is little indication that this is happening in most places.

Let’s start by quickly defining what is usually meant by validity. Put simply, whereas reliability is about the precision of the answers, validity addresses whether we’re using them to answer the correct questions. For example, a person’s weight is a reliable measure, but this doesn’t necessarily mean it’s valid for gauging the risk of heart disease. Similarly, in the context of VA and observations, the question is: Are these indicators, even if they can be precisely estimated (i.e., they are reliable), measuring teacher performance in a manner that is meaningful for student learning?

[readon2 url="http://shankerblog.org/?p=5670"]Continue reading part II[/readon2]

Linking Student Data to Teachers a Complex Task, Experts Say

As more and more states push legislation tying teacher evaluations to student achievement – a policy incentivized by the federal Race to the Top program – many are scrambling to put data systems in place that can accurately connect teachers to their students. But in a world of student mobility, teacher re-assignments, co-teaching, and multiple service providers, determining the roster of students to attribute to a teacher is more complicated than it may sound.
[...]
Jane West, vice president of policy, programs, and professional issues for the American Association of Colleges of Teacher Education, stressed that while there's a need to track the performance of teacher-education graduates, "we have a long way to go" before the data can be considered reliable.

Teachers who leave the state, teach out-of-field, or move to private schools are nearly impossible to track, she said. And teachers in non-tested subjects and grades are out of the mix as well. Last year, the University of Central Florida was only able to get student-achievement data for 12 percent of its graduating class, yet that information was reported publicly. "What's the threshold?" West asked. "Where's the check to ensure that's a valid and reliable measure? It needs to be more than 12 percent."

In all, the Data Quality Campaign’s conference was tightly managed and left little opportunity for audience participation, offering attendees a controlled (though still controversial) takeaway: that improved student achievement hinges on improving the teacher-student data link.

[readon2 url="http://aacte.org/index.php?/Media-Center/AACTE-in-the-News/linking-student-data-to-teachers-a-complex-task-experts-say.html"]Read the entire article..[/readon2]

Students are not widgets

We wrote the other day about the biannual teacher observation provision in S.B. 5 that, if implemented, would put a serious administrative strain on schools. Today, prompted by a Dispatch article, we want to expand our look at the other proposed teacher evaluation policies being pushed by the governor and his education czar.

Gov. John Kasich wants teachers to be paid based on performance: They should earn more if they can prove that their students are learning.

But the tool at the heart of Kasich's merit-pay proposals is reliable with only 68 percent confidence. That's why the state plans an upgrade to make "value-added" results 95 percent reliable.

With 146,000 teachers in Ohio, even at 95% accuracy (if that figure can be believed), roughly 7,300 teacher evaluations would be based on inaccurate data. That would be bad enough if it were the only problem.
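As a back-of-the-envelope check (my own sketch, not the state's methodology), treat the 68 percent and 95 percent figures as the coverage of confidence intervals of roughly one and two standard errors around each teacher's estimate. The standard error below is an arbitrary assumption; the only point is the scale of the misses at each coverage level.

```python
import numpy as np

# Back-of-the-envelope sketch, not the state's actual value-added model.
rng = np.random.default_rng(0)

n_teachers = 146_000        # roughly the number of Ohio teachers cited above
std_error = 0.5             # assumed (arbitrary) estimation noise
true_effect = rng.normal(0.0, 1.0, n_teachers)           # hypothetical "true" effects
estimate = true_effect + rng.normal(0.0, std_error, n_teachers)

for z, label in [(1.0, "68% interval (about 1 SE)"), (1.96, "95% interval (about 2 SE)")]:
    misses = np.abs(estimate - true_effect) > z * std_error
    print(f"{label}: {int(misses.sum()):,} of {n_teachers:,} estimates fall outside it "
          f"({misses.mean():.0%})")
```

Even at 95 percent coverage, about 5 percent of 146,000 estimates, roughly 7,300 teachers, end up with an interval that misses their true effect; at 68 percent coverage it is closer to 47,000.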

But let's take a step back for a second. What is value-added assessment?

Value-added assessment assumes that changes in test scores from one year to the next accurately reflect student progress in learning. It tracks that progress, links it to schools and teachers, and produces estimates that can be used as indicators of teachers’ and schools’ effectiveness. Sounds good, right?
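For readers who want to see the mechanics, here is a deliberately simplified sketch of the idea. It is not Ohio's actual value-added model, which is a far more elaborate proprietary affair; the column names and scores are invented for the example, and a plain regression stands in for the real model's many controls.

```python
import numpy as np
import pandas as pd

# Simplified illustration of the value-added idea (not Ohio's actual model).
# Column names and scores are made up for the example.
df = pd.DataFrame({
    "student_id":    [1, 2, 3, 4, 5, 6],
    "teacher_id":    ["A", "A", "A", "B", "B", "B"],
    "score_prior":   [410, 455, 500, 430, 470, 520],
    "score_current": [430, 470, 520, 435, 480, 525],
})

# Step 1: predict each student's current score from the prior year's score.
# (A plain least-squares fit stands in for the real model's many controls.)
slope, intercept = np.polyfit(df["score_prior"], df["score_current"], 1)
df["predicted"] = intercept + slope * df["score_prior"]

# Step 2: a student's "value added" is the gap between actual and predicted.
df["residual"] = df["score_current"] - df["predicted"]

# Step 3: a teacher's estimate is the average residual across his or her students.
print(df.groupby("teacher_id")["residual"].mean())
```

The trouble, as the rest of this post argues, starts with Step 3: deciding which teacher those residuals actually belong to.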

In theory. In practice, many teachers do not teach classes that are tested, and in many schools, as this terrific article points out, who is responsible for a given student's progress isn't so cut and dried either:

In the school where I work, teachers are expected to teach reading “across the curriculum,” meaning that all teachers are supposed to teach reading. Also, all teachers are supposed to teach writing “across the curriculum.” So, students would have to be tested in those areas as well. But if it is taught across the curriculum, how would we know to which teacher to attribute the child’s performance?

Indeed, how would we know?

When you get beyond these obvious problems with value-added assessments, there are also serious methodological problems, as is brought to light by this paper from the Economic Policy Institute:

there is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed.

For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers.

Oh.

Back to that Dispatch article:

Robert Sommers, Kasich's top education adviser, said he thinks Ohio's accountability system is ready for merit pay. Value-added has been used in Ohio only to rate schools, not teachers.

"As far as I'm concerned, it is a very, very solid system," he said. "It has had lots of years of maturation."

The governor's education czar is simply not correct. As it pertains to teacher evaluation, the system is not accurate enough, has demonstrable statistical problems, and requires deeper study.

Students are not widgets being processed on a production line by a single teacher. Modern education is a team effort, and any attempt to isolate individual contributions to that effort is going to require far more robust approaches than the ones on offer.