
How Should Educators Interpret Value-Added Scores?


Highlights

  • Each teacher, in principle, possesses one true value-added score each year, but we never see that "true" score. Instead, we see a single estimate within a range of plausible scores.
  • The range of plausible value-added scores (the confidence interval) can overlap considerably for many teachers. Consequently, we often cannot readily distinguish between teachers with respect to their true value-added scores.
  • Two conditions would give value-added estimates high reliability: more precise measurement of each teacher's value-added, and greater variation in teachers' true value-added scores than actually exists.
  • Two kinds of errors of interpretation are possible when classifying teachers based on value-added: a) “false identifications” of teachers who are actually above a certain percentile but who are mistakenly classified as below it; and b) “false non-identifications” of teachers who are actually below a certain percentile but who are classified as above it. Falsely identifying teachers as being below a threshold poses risk to teachers, but failing to identify teachers who are truly ineffective poses risks to students.
  • Districts can follow a procedure to see how uncertainty about true value-added scores translates into potential classification errors. First, specify the group of teachers you wish to identify. Then, specify the fraction of false identifications you are willing to tolerate. Finally, specify the likely correlation between value-added scores this year and next year. In most real-world settings, the degree of uncertainty will lead to considerable rates of misclassification of teachers; the simulation sketch after this list illustrates why.
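To make that procedure concrete, here is a minimal Monte Carlo sketch of how imperfect reliability turns into classification errors. The teacher count, reliability value, and percentile cutoff below are hypothetical choices for illustration, not figures from the brief or from any district.

```python
# A minimal simulation sketch (not any district's actual procedure) of how
# imperfect reliability produces misclassification. All numbers are
# hypothetical assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 10_000
reliability = 0.4          # assumed year-to-year correlation of the estimates
cutoff_percentile = 20     # try to identify teachers below the 20th percentile

# Reliability = var(true) / (var(true) + var(error)).
# With var(true) fixed at 1, the implied error variance is:
error_var = (1 - reliability) / reliability

true_va = rng.normal(0.0, 1.0, n_teachers)
estimate = true_va + rng.normal(0.0, np.sqrt(error_var), n_teachers)

true_flag = true_va < np.percentile(true_va, cutoff_percentile)
est_flag = estimate < np.percentile(estimate, cutoff_percentile)

false_identification = est_flag & ~true_flag        # flagged, but truly above the cutoff
false_non_identification = true_flag & ~est_flag    # truly below the cutoff, but missed

print("Share of flagged teachers who are false identifications:",
      round(false_identification.sum() / est_flag.sum(), 2))
print("Share of truly low-value-added teachers who are missed:",
      round(false_non_identification.sum() / true_flag.sum(), 2))
```

Changing the assumed correlation or the cutoff shifts both error rates, which is exactly the trade-off the procedure above is meant to expose.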

Introduction

A teacher's value-added score is intended to convey how much that teacher has contributed to student learning in a particular subject in a particular year. Different school districts define and compute value-added scores in different ways. But all of them share the idea that teachers who are particularly successful will help their students make large learning gains, that these gains can be measured by students' performance on achievement tests, and that the value-added score isolates the teacher's contribution to these gains.
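Although districts' models differ, many take a covariate-adjustment form along these lines; the specification below is only an illustrative sketch, not the formula used by any particular district.

```latex
% Illustrative covariate-adjustment form (an assumption for exposition,
% not a specific district's model): student i taught by teacher j
y_{ij} = \beta_0 + \beta_1\, y_{ij}^{\text{prior}} + X_{ij}\gamma + \theta_j + \varepsilon_{ij}
```

Here the outcome is the student's current test score, the right-hand side includes prior achievement and other student characteristics, the teacher effect θ_j is the quantity reported as the teacher's value-added, and the error term absorbs everything the model cannot explain.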

A variety of people may see value-added estimates, and each group may use them for different purposes. Teachers themselves may want to compare their scores with those of others and use them to improve their work. Administrators may use them to make decisions about teaching assignments, professional development, pay, or promotion. Parents, if they see the scores, may use them to request particular teachers for their children. And, finally, researchers may use the estimates for studies on improving instruction.

Using value-added scores in any of these ways can be controversial. Some people doubt the validity of the achievement tests on which the scores are based, some question the emphasis on test scores to begin with, and others challenge the very idea that student learning gains reflect how well teachers do their jobs.

Our purpose is not to settle these controversies, but, rather, to answer a more limited, but essential, question: How might educators reasonably interpret value-added scores? Social science has yet to come up with a perfect measure of teacher effectiveness, so anyone who makes decisions on the basis of value-added estimates will be doing so in the midst of uncertainty. Making choices in the face of doubt is hardly unusual – we routinely contend with weather forecasts, financial projections, medical diagnoses, and election polls. But as in these other areas, in order to sensibly interpret value-added scores, it is important to do two things: understand the sources of uncertainty and quantify its extent. Our aim is to identify possible errors of interpretation, to consider how likely these errors are to arise, and to help educators assess how consequential they are for different decisions.

We'll begin by asking how value-added scores are defined and computed. Next, we'll consider two sources of error: statistical bias and statistical imprecision.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/interpreting-value-added/"]Continue reading...[/readon2]

The weakest "linkage"

Many changes are starting to ripple down to the classroom level as Ohio moves forward with its efforts to implement corporate education reform. One of those changes is the creation and increasing use of teacher-level value-added reports. We provided some basic background on value-added here if you need a refresher.

One of the most important steps in producing these complex reports for each teacher is to know which teacher taught which student, in which subject, and for how long. We need to know this for every student and every teacher. It's a process called "linkage". Without this linkage teachers could not be credited with the instruction they provided to each student.
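As a concrete picture of what each linkage must capture, here is a hypothetical sketch of a single linkage record; the field names and structure are illustrative assumptions, not Ohio's or Battelle for Kids' actual data schema.

```python
# A hypothetical sketch of what one "linkage" record needs to capture.
# Field names are illustrative, not an actual state or vendor schema.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class LinkageRecord:
    student_id: str        # statewide student identifier
    teacher_id: str        # statewide educator identifier
    subject: str           # e.g., "Grade 5 Math"
    school_year: str       # e.g., "2012-13"
    start_date: date       # when this teacher began instructing this student
    end_date: Optional[date] = None    # None while the enrollment is still active
    share_of_instruction: float = 1.0  # fraction of instruction credited to this teacher

# One student can generate many records per year: one per subject per teacher,
# plus new records every time the student changes schools or classrooms.
```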

By 2013, it will not be just RttT districts and Battelle for Kids’ projects that require this linkage to occur; all school districts must “implement a classroom-level value-added program” (HB 153; Section 3319.112(A)(7)).

These teacher-level value-added reports will be used to determine teacher effectiveness and will be a significant factor in teacher evaluations. So it is clear that accurately linking each student to each teacher, subject by subject, is going to be critical if this system has any hope of working fairly.

If one imagines common scenarios such as students moving or teachers getting sick and being covered by substitutes, and multiplies that by Ohio's more than 120,000 teachers and almost 2 million students, the opportunity for linkage error is simply massive, surpassed only by the sheer magnitude of the administrative effort needed to keep this whole enterprise from unravelling.

Battelle for Kids has spent part of the summer providing some training and webinars on this issue.

In spring 2010, more than 125,000 rosters were verified by educators in South Carolina, Texas, Ohio and Oklahoma. Recent analyses of linkage results from schools across the country yield alarming findings, including:

Every time a student moves, someone will have to go into a computer system and remove them from each of their teachers' rosters, and when that student enrolls in a new school, someone will have to go into a computer system and add them to each of their new teachers' rosters. Ditto for students changing classes, ditto for teachers needing to be replaced, and on and on. Hundreds of thousands of changes will need to be made throughout the school year, and all of this on top of hoping that the initial setup of millions of teacher-student linkages is error-free each year to begin with!
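To see how much bookkeeping a single mid-year move implies, here is a minimal sketch; the roster structure and function names are hypothetical, not any vendor's actual system.

```python
# A minimal sketch of the bookkeeping one student transfer implies.
# The roster structure and function names are hypothetical illustrations.
from collections import defaultdict

# rosters[teacher_id][subject] -> set of student_ids
rosters = defaultdict(lambda: defaultdict(set))

def withdraw_student(student_id, old_schedule):
    """Remove the student from every (teacher, subject) pair at the old school."""
    for teacher_id, subject in old_schedule:
        rosters[teacher_id][subject].discard(student_id)

def enroll_student(student_id, new_schedule):
    """Add the student to every (teacher, subject) pair at the new school."""
    for teacher_id, subject in new_schedule:
        rosters[teacher_id][subject].add(student_id)

# One mid-year move touches every class on the student's schedule at both
# schools; each touch is a chance for a missed or mistyped update, and the
# same edits recur for class changes, substitutes, and staffing changes.
```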

Because value-added is longitudinal (results from previous years are used in current-year calculations), any errors from previous years are carried forward, so it isn't as though each year allows for a fresh start either. Indeed, as time rolls by, the errors may compound.

According to Battelle's own presentation, this system hasn't worked to date in South Carolina, Texas, Oklahoma, or Ohio, which is nevertheless set to expand it to every school district.

How much confidence can anyone have in a system that will be used for high-stakes decisions such as pay and employment, yet relies on gargantuan administrative tracking that has so far proven utterly unreliable?

Link Before You Leap