
The Foolish Endeavor of Rating Ed Schools by Graduates’ Value-Added

Via School Finance 101.

Knowing that I’ve been writing a fair amount about various methods for attributing student achievement to their teachers, several colleagues forwarded to me the recently released standards of the Council for the Accreditation of Educator Preparation, or CAEP. Specifically, they pointed me toward Standard 4.1, Impact on Student Learning:

4.1. The provider documents, using value-added measures where available, other state-supported P-12 impact measures, and any other measures constructed by the provider, that program completers contribute to an expected level of P-12 student growth.

http://caepnet.org/commission/standards/standard4/

Now, it’s one thing when relatively under-informed pundits, think tankers, politicians and their policy advisors pitch a misguided use of statistical information for immediate policy adoption. It’s yet another when professional organizations are complicit in this misguided use. There’s just no excuse for that (political pressure, public polling data, or otherwise)!

The problems associated with attempting to derive any reasonable conclusions about teacher preparation program quality based on value-added or student growth data (of the students they teach in their first assignments) are insurmountable from a research perspective.

Worse, the perverse incentives likely induced by such a policy are far more likely to do real harm than any good, when it comes to the distribution of teacher and teaching quality across school settings within states.

First and foremost, the idea that we can draw the simple line below between preparation and practice contradicts nearly every reality of modern-day teacher credentialing and progression into and through the profession:

one teacher prep institution –> one teacher –> one job in one school –> one representative group of students

The modern-day teacher collects multiple credentials from multiple institutions, may switch jobs a handful of times early in his or her career, and may serve a very specific type of student, unlike those taught by peers from the same credentialing program or by graduates of other programs. This model also relies heavily on there being minimal to no migration of teachers across state borders (well, either little or none, or a ton of it, so that a state would have a large enough share of teachers from specific out-of-state institutions to compare). I discuss these issues in earlier posts.

Setting aside that none of the oversimplified assumptions of the linear diagram above hold (a lot to ignore!), let’s probe the more geeky technical issues of trying to use VAM to evaluate ed school effectiveness.

There exist a handful of recent studies that attempt to tease out certification program effects on graduates’ students’ outcomes, most of which encounter the same problems. Here’s a look at one of the better studies on this topic.

  • Mihaly, K., McCaffrey, D. F., Sass, T. R., & Lockwood, J. R. (2012). Where You Come From or Where You Go?

Specifically, this study tries to tease out the problem that arises when graduates of credentialing programs don’t sort evenly across a state. In other words, a problem that ALWAYS occurs in reality!

Researchy language tends to downplay these problems by phrasing them only in technical terms and always assuming there is some way to overcome them with a statistical tweak or two. Sometimes there just isn’t, and this is one of those times!

[readon2 url="http://schoolfinance101.wordpress.com/2013/02/25/revisiting-the-foolish-endeavor-of-rating-ed-schools-by-graduates-value-added/"]Continue reading...[/readon2]

ODE publishes propaganda

prop·a·gan·da
/ˌpräpəˈgandə/
Noun
1. Information, esp. of a biased or misleading nature, used to promote or publicize a particular political cause or point of view.
2. The dissemination of such information as a political strategy.

That aptly describes the latest document published by the Ohio Department of Education, titled "Myths vs. Facts about the Ohio Teacher Evaluation System". The document lists 10 alleged myths about the teacher evaluation system being created. We thought we'd take a closer look at some of these alleged "myths".

1. Myth: The state is telling us what to do in local evaluations.

ODE, under a bulleted list discussing local board flexibility in creating evaluations, states: "The percentages within the given range for student growth measures for the teachers in that district;" This is no longer true for teachers who have value-added scores. These teachers (over 30% of Ohio's teaching corps) will have 50% of their evaluation based on student test scores. On this, local boards have zero flexibility; it's a state mandate. We judge aspects of this myth to actually be true.

2. Myth: This is just a way to fire teachers.

ODE goes to great lengths to discuss how these evaluations will be great for teachers in identifying areas of improvement (though no money has been allocated for professional development). Utterly lacking is any discussion of the provision within HB153 that prohibits giving preference based on seniority in determining the order of layoffs or in rehiring teachers when positions become available again, except when choosing between teachers with comparable evaluations. It is no secret that corporate education reformers such as Michelle Rhee desperately want to use evaluations as the basis for firing what they purportedly measure to be "ineffective" teachers. After all, this is exactly the process used in Washington, DC, where she came from. It's far too soon to call this a myth; it's more like a corporate education reformer's goal.

3. Myth: One test in the spring will determine my fate.

It's nice that ODE stresses the importance of using multiple measures, but once again they fail to acknowledge that HB555 removed those multiple measures for 30% of Ohio's teachers. Those teachers' fate will be determined by tests. This myth is therefore true.

5. Myth: The state has not done enough work on this system – there are too many unanswered questions.

How can it be a myth when even this document fails to state that "we're ready"? SLOs have yet to be developed, Common Core is almost upon us but no one knows what the tests will be, the legislature keeps changing the rules of the game, and nowhere near enough evaluator training has taken place to evaluate all of Ohio's teachers. Ohio isn't ready for this, and that's a fact, not a myth.

6. Myth: “Value-Added” is a mysterious formula and is too volatile to be trusted.

This is perhaps one of the most egregious points of all. Study after study after study has demonstrated that value-added is volatile, unreliable, and inappropriate for measuring teacher effectiveness. ODE's explanation conflates the use of value-added as a diagnostic tool with its use in evaluating teachers. Those are two very different use cases indeed.

As for it being mysterious, the formula used in Ohio is secret and proprietary - it doesn't get more mysterious than that! This claim by ODE is simply untrue and ridiculous; they ought to be embarrassed for publishing it. This myth is totally true and real and backed up by all the available scientific evidence.

7. Myth: The current process for evaluating teachers is fine just as it is.

Their explanation: "Last year, 99.7 percent of teachers around the country earned a “satisfactory” evaluation, yet many students didn’t make a year’s worth of progress in reading and are not reading at grade level." Right out of the corporate education reformers' message book. Blame the teacher. Still think this isn't going to end up being about firing teachers? This myth is a straw man: no one argues the current system is ideal, but the proposed OTES is dangerously constructed.

8. Myth: Most principals (or other evaluators) don’t have time to do this type of evaluation, so many will just report that teachers are proficient.

ODE states "Fact: Most principals are true professionals who want the teachers in their buildings to do well." But wait a minute, in Myth #7 these very same principals were handing out "satisfactory" grades like candy to 99.7% of teachers. Which is it? Are they professionals who can fairly evaluate teachers, or aren't they? We wrote about the massive administrative task faced by school administrators almost 2 years ago. Nothing has happened to alleviate those burdens, other than a $2 billion budget cut. This myth is 100% true.

9. Myth: This new evaluation system is like building the plane while we’re flying it.

ODE states: "Fact: Just as the Wright brothers built a plane, tried it by flying it, landed it, and then refined the plane they built, the new evaluation system was built, tried and revised. "

We'll just point out that 110 years have passed since the Wright Brothers first flew and the world has developed better design and project management tools since then.

10. Myth: It will be easy to implement the new teacher evaluation system.

Has anyone, anywhere said this? Or did the ODE brainstorming session run out of bad ideas at 9, and this is all they could come up with? Talk about ending with a straw man, which, frankly, given the rest of the document, is probably the most appropriate ending.

ODE ought to withdraw this piece of propaganda from public view.

How Do Value-Added Indicators Compare to Other Measures of Teacher Effectiveness?

Via the Carnegie Knowledge Network.

Highlights

  • Value-added measures are positively related to almost all other commonly accepted measures of teacher performance such as principal evaluations and classroom observations.
  • While policymakers should consider the validity and reliability of all their measures, we know more about value-added than about the other measures.
  • The correlations appear fairly weak, but this is due primarily to lack of reliability in essentially all measures (a toy illustration of this attenuation effect follows the list).
  • The measures should yield different performance results because they are trying to measure different aspects of teaching, but they differ also because all have problems with validity and reliability.
  • Using multiple measures can increase reliability; validity is also improved so long as the additional measures capture aspects of teaching we value.
  • Once we have two or three performance measures, the costs of more measures for accountability may not be justified. But additional formative assessments of teachers may still be worthwhile to help these teachers improve.
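
To see why weak correlations don’t necessarily mean the measures are capturing unrelated things, here is a minimal sketch of the attenuation and multiple-measures points above. All reliability and correlation values are invented for illustration; they are not estimates from the brief.

```python
# Illustrative sketch only (assumed numbers, not figures from the brief):
# low reliability attenuates observed correlations, and averaging several
# parallel measures raises reliability.

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Classical attenuation: observed r = true r * sqrt(rel_x * rel_y)."""
    return true_r * (reliability_x * reliability_y) ** 0.5

def spearman_brown(reliability, k):
    """Reliability of the average of k parallel measures."""
    return k * reliability / (1 + (k - 1) * reliability)

# Suppose value-added and, say, principal ratings both track "teaching quality"
# with a hypothetical true correlation of 0.9, but each one-shot measure has a
# reliability of only 0.4.
true_r, rel_single = 0.9, 0.4
print(attenuated_correlation(true_r, rel_single, rel_single))  # ~0.36: looks "weak"

# Averaging three years of scores (or three observations) raises reliability,
# and the observed correlation climbs accordingly.
rel_three = spearman_brown(rel_single, 3)                      # ~0.67
print(attenuated_correlation(true_r, rel_three, rel_three))    # ~0.60
```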

Introduction

In the recent drive to revamp teacher evaluation and accountability, measures of a teacher’s value added have played the starring role. But the star of the show is not always the best actor, nor can the star succeed without a strong supporting cast. In assessing teacher performance, observations of classroom practice, portfolios of teachers’ work, student learning objectives, and surveys of students are all possible additions to the mix.

All these measures vary in what aspect of teacher performance they measure. While teaching is broadly intended to help students live fulfilling lives, we must be more specific about the elements of performance that contribute to that goal – differentiating contributions to academic skills, for instance, from those that develop social skills. Once we have established what aspect of teaching we intend to capture, the measures differ in how valid and reliable they are in capturing that aspect.

Although there are big holes in what we know about how evaluation measures stack up on these two criteria, we can draw some important conclusions from the evidence collected so far. In this brief, we will show how existing research can help district and state leaders who are thinking about using multiple measures of teacher performance to guide them in hiring, development, and retention.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/value-added-other-measures/"]Continue reading...[/readon2]

How Stable Are Value-Added Estimates?

Via the Carnegie Knowledge Network.

Highlights

  • A teacher’s value-added score in one year is partially but not fully predictive of her performance in the next.
  • Value-added is unstable because true teacher performance varies and because value-added measures are subject to error.
  • Two years of data does a meaningfully better job at predicting value added than does just one (a toy simulation of this point follows the list). A teacher’s value added in one subject is only partially predictive of her value added in another, and a teacher’s value added for one group of students is only partially predictive of her value added for others.
  • The variation of a teacher’s value added across time, subject, and student population depends in part on the model with which it is measured and the source of the data that is used.
  • Year-to-year instability suggests caution when using value-added measures to make decisions for which there are no mechanisms for re-evaluation and no other sources of information.
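
As a rough illustration of the stability points above, and not a description of any state’s actual model, the toy simulation below treats each teacher’s measured value added as a stable effect plus year-specific noise, then checks how well one prior year versus the average of two prior years predicts the next year. The variance numbers are assumptions chosen only to make the pattern visible.

```python
# Toy simulation (assumed numbers): measured value added = stable teacher
# effect + year-specific noise of the same size. Compare how well one prior
# year vs. the average of two prior years predicts the next year's score.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000
true_effect = rng.normal(0.0, 1.0, n_teachers)   # stable component
noise_sd = 1.0                                    # year-specific error (assumed)

def one_year():
    return true_effect + rng.normal(0.0, noise_sd, n_teachers)

year1, year2, year3 = one_year(), one_year(), one_year()

corr_single = np.corrcoef(year1, year3)[0, 1]                  # ~0.50
corr_two_year = np.corrcoef((year1 + year2) / 2, year3)[0, 1]  # ~0.58

print(f"one prior year predicts next year:        r = {corr_single:.2f}")
print(f"average of two prior years predicts next: r = {corr_two_year:.2f}")
```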

Introduction

Value-added models measure teacher performance by the test score gains of their students, adjusted for a variety of factors such as the performance of students when they enter the class. The measures are based on desired student outcomes such as math and reading scores, but they have a number of potential drawbacks. One of them is the inconsistency in estimates for the same teacher when value added is measured in a different year, or for different subjects, or for different groups of students.
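
To make that gain-adjustment idea concrete, here is a deliberately stripped-down sketch using simulated data. It is not Ohio’s proprietary model or any vendor’s specification; real value-added models condition on more prior years and other factors, but the core move is the same: regress current scores on prior scores plus teacher indicators and read off the teacher coefficients.

```python
# Minimal value-added sketch on simulated data (illustration only, not any
# state's or vendor's actual model): current score = f(prior score) + teacher
# effect + noise, estimated by least squares with teacher dummy variables.
import numpy as np

rng = np.random.default_rng(1)
n_teachers, students_per_teacher = 50, 25
teacher_effect = rng.normal(0.0, 0.2, n_teachers)   # assumed "true" effects

teacher_id = np.repeat(np.arange(n_teachers), students_per_teacher)
prior = rng.normal(0.0, 1.0, teacher_id.size)        # entering achievement
current = 0.7 * prior + teacher_effect[teacher_id] + rng.normal(0.0, 0.5, teacher_id.size)

# Design matrix: prior-score column plus one dummy column per teacher.
X = np.column_stack([prior, (teacher_id[:, None] == np.arange(n_teachers)).astype(float)])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)
estimated_va = coef[1:]                               # one estimate per teacher

# The estimates track the true effects, but imperfectly, which is exactly
# where the instability discussed below comes from.
print(np.corrcoef(teacher_effect, estimated_va)[0, 1])
```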

Some of the differences in value added from year to year result from true differences in a teacher’s performance. Differences can also arise from classroom peer effects; the students themselves contribute to the quality of classroom life, and this contribution changes from year to year. Other differences come from the tests on which the value-added measures are based; because test scores are not perfectly accurate measures of student knowledge, it follows that they are not perfectly accurate gauges of teacher performance.

In this brief, we describe how value-added measures for individual teachers vary across time, subject, and student populations. We discuss how additional research could help educators use these measures more effectively, and we pose new questions, the answers to which depend not on empirical investigation but on human judgment. Finally, we consider how the current body of knowledge, and the gaps in that knowledge, can guide decisions about how to use value-added measures in evaluations of teacher effectiveness.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/value-added-stability/"]Continue reading...[/readon2]

Do Value-Added Methods Level the Playing Field for Teachers?

Via the Carnegie Knowledge Network.

Highlights

  • Value-added measures partially level the playing field by controlling for many student characteristics. But if they don't fully adjust for all the factors that influence achievement and that consistently differ among classrooms, they may be distorted, or confounded. (An estimate of a teacher’s effect is said to be confounded when her contribution cannot be separated from other factors outside of her control, namely the students in her classroom.)
  • Simple value-added models that control for just a few test scores (or only one score) and no other variables produce measures that underestimate teachers with low-achieving students and overestimate teachers with high-achieving students (a toy simulation of this pattern follows the list).
  • The evidence, while inconclusive, generally suggests that confounding is weak. But it would not be prudent to conclude that confounding is not a problem for all teachers. In particular, the evidence on comparing teachers across schools is limited.
  • Studies assess general patterns of confounding. They do not examine confounding for individual teachers, and they can't rule out the possibility that some teachers consistently teach students who are distinct enough to cause confounding.
  • Value-added models often control for variables such as average prior achievement for a classroom or school, but this practice could introduce errors into value-added estimates.
  • Confounding might lead school systems to draw erroneous conclusions about their teachers – conclusions that carry heavy costs to both teachers and society.
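
To see how the confounding described above can arise, here is a toy simulation with invented numbers; it is not evidence about any real system. Classroom composition (the classroom’s mean prior achievement) boosts growth, but the simple model controls only for each student’s own prior score, so equally effective teachers end up with different estimates depending on who they teach.

```python
# Toy confounding simulation (all numbers invented): peers matter, but the
# simple model adjusts only for each student's own prior score. Every teacher
# here is equally effective, yet the estimates differ by classroom type.
import numpy as np

rng = np.random.default_rng(2)
n_teachers, n_students = 100, 30
# First half of teachers get low-achieving classrooms, second half high-achieving.
class_mean_prior = np.where(np.arange(n_teachers) < n_teachers // 2, -0.5, 0.5)

teacher_id = np.repeat(np.arange(n_teachers), n_students)
prior = class_mean_prior[teacher_id] + rng.normal(0.0, 1.0, teacher_id.size)
# True teacher effects are all zero; classroom composition adds to growth.
current = 0.7 * prior + 0.3 * class_mean_prior[teacher_id] + rng.normal(0.0, 0.5, teacher_id.size)

# "Simple" model: control for own prior score only, then average residuals by teacher.
slope, intercept = np.polyfit(prior, current, 1)
residual = current - (slope * prior + intercept)
estimated_va = np.array([residual[teacher_id == t].mean() for t in range(n_teachers)])

print("mean estimate, low-achieving classrooms: ", estimated_va[:n_teachers // 2].mean())
print("mean estimate, high-achieving classrooms:", estimated_va[n_teachers // 2:].mean())
# The gap between the two groups is pure confounding, not real effectiveness.
```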

Introduction

Value-added models have caught the interest of policymakers because, unlike using student test scores for other means of accountability, they purport to "level the playing field." That is, they supposedly reflect only a teacher's effectiveness, not whether she teaches high- or low-income students, for instance, or students in accelerated or standard classes. Yet many people are concerned that a teacher's value-added estimate will be sensitive to the characteristics of her students. More specifically, they believe that teachers of low-income, minority, or special education students will have lower value-added scores than equally effective teachers who are teaching students outside these populations. Other people worry that the opposite might be true - that some value-added models might cause teachers of low-income, minority, or special education students to have higher value-added scores than equally effective teachers who work with higher-achieving, less at-risk populations.

In this brief, we discuss what is and is not known about how well value-added measures level the playing field for teachers by controlling for student characteristics. We first discuss the results of empirical explorations. We then address outstanding questions and the challenges to answering them with empirical data. Finally, we discuss the implications of these findings for teacher evaluations and the actions that may be based on them.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/level-playing-field/"]Continue reading...[/readon2]