
Why Test Scores CAN'T Evaluate Teachers

From the National Education Policy Center. The entire post is well worth a read; here's the synopsis:

The key element here that distinguishes Student Growth Percentiles from some of the other things that people have used in research is the use of percentiles. It's there in the title, so you'd expect it to have something to do with percentiles. What does that mean? It means that these measures are scale-free. They get away from psychometric scaling in a way that many researchers - not all, but many - say is important.

Now, these researchers are not psychometricians, and they aren't arguing against the psychometricians. The psychometricians who create our tests build a scale, and they use scientific formulae and theories and models to come up with that scale. It's like the SAT, where you can score between 200 and 800. And the idea there is that the difference in learning or achievement between a 200 and a 300 is the same as between a 700 and an 800.

There is no proof that that is true. There is no proof that that is true. There can't be any proof that that is true. But, if you believe their model, then you would agree that that's a good estimate to make. There are a lot of people who argue... they don't trust those scales. And they'd rather use percentiles because it gets them away from the scale.

Let's state this another way so we're absolutely clear: there is, according to Jonah Rockoff, no proof that a gain on a state test like the NJASK from 150 to 160 represents the same amount of "growth" in learning as a gain from 250 to 260. If two students have the same numeric growth but start at different places, there is no proof that their "growth" is equivalent.

Now there's a corollary to this, and it's important: you also can't say that two students who have different numeric levels of "growth" are actually equivalent. I mean, if we don't know whether the same numerical gain represents equivalent learning at different points on the scale, how can we know whether one gain is actually "better" or "worse" than another? And if that's true, how can we possibly compare different numerical gains?
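
The "scale-free" point can be made concrete with a toy example. The sketch below is not the actual SGP methodology (real Student Growth Percentiles are estimated with quantile regression over one or more prior years of scores); it simply ranks each student's current score among peers with similar prior scores, using made-up data. What it illustrates is the core claim: raw score gains change if you re-express the test on a different scale, while percentile-style growth measures do not.

```python
# Toy illustration (not the actual SGP method): percentile-style growth is
# invariant to any strictly increasing rescaling of the test; raw gains are not.
import numpy as np

rng = np.random.default_rng(0)
prior = rng.normal(500, 50, 1000)            # hypothetical prior-year scale scores
current = prior + rng.normal(10, 20, 1000)   # hypothetical current-year scale scores

def growth_percentiles(prior, current, bins=10):
    """Rank each student's current score among peers with similar prior scores."""
    edges = np.quantile(prior, np.linspace(0, 1, bins + 1)[1:-1])
    groups = np.digitize(prior, edges)
    pct = np.empty_like(current)
    for g in np.unique(groups):
        idx = groups == g
        ranks = current[idx].argsort().argsort()   # 0 .. n_group-1 (no ties expected)
        pct[idx] = 100.0 * ranks / (idx.sum() - 1)
    return pct

# Any strictly increasing rescaling of the test (i.e., a different reporting scale).
rescale = lambda x: 100.0 * (x / 100.0) ** 3

same_raw_gains = np.allclose(current - prior, rescale(current) - rescale(prior))
same_percentiles = np.allclose(growth_percentiles(prior, current),
                               growth_percentiles(rescale(prior), rescale(current)))
print(same_raw_gains)    # False: raw score gains depend on the chosen scale
print(same_percentiles)  # True: peer-group percentile ranks do not
```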

[readon2 url="http://nepc.colorado.edu/blog/why-test-scores-cant-evaluate-teachers"]Continue reading...[/readon2]

How Should Educators Interpret Value-Added Scores?

Via

Highlights

  • Each teacher, in principle, possesses one true value-added score each year, but we never see that "true" score. Instead, we see a single estimate within a range of plausible scores.
  • The range of plausible value-added scores (the confidence interval) can overlap considerably for many teachers. Consequently, we cannot readily distinguish among many teachers with respect to their true value-added scores.
  • Two conditions would enable us to achieve value-added estimates with high reliability: first, if teachers' value-added measurements were more precise, and second, if teachers’ true value-added scores varied more dramatically than they do.
  • Two kinds of errors of interpretation are possible when classifying teachers based on value-added: a) “false identifications” of teachers who are actually above a certain percentile but who are mistakenly classified as below it; and b) “false non-identifications” of teachers who are actually below a certain percentile but who are classified as above it. Falsely identifying teachers as being below a threshold poses risk to teachers, but failing to identify teachers who are truly ineffective poses risks to students.
  • Districts can conduct a procedure to identify how uncertainty about true value-added scores contributes to potential errors of classification. First, specify the group of teachers you wish to identify. Then, specify the fraction of false identifications you are willing to tolerate. Finally, specify the likely correlation between a teacher's value-added score this year and next year. In most real-world settings, the degree of uncertainty will lead to considerable rates of misclassification of teachers (see the sketch after this list).
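
To make the highlights above concrete, here is a minimal simulation sketch. It is not the brief's actual procedure: the reliability value, the bottom-20-percent cutoff, and the distributions are all hypothetical. It shows how, when estimates are imprecise relative to the spread of true scores, classifying teachers by their estimated score produces both kinds of errors.

```python
# Minimal simulation sketch (hypothetical parameters, not the brief's procedure):
# how noisy value-added estimates misclassify teachers against a percentile cutoff.
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 10_000
reliability = 0.4                    # assumed share of estimate variance that is true signal

true_va = rng.normal(0.0, 1.0, n_teachers)                  # true value-added (standardized)
noise_sd = np.sqrt((1 - reliability) / reliability)         # implied measurement noise
estimate = true_va + rng.normal(0.0, noise_sd, n_teachers)  # the score we actually observe

cutoff = 20  # flag teachers whose *estimate* falls in the bottom 20 percent
flagged = estimate < np.percentile(estimate, cutoff)
truly_low = true_va < np.percentile(true_va, cutoff)

false_identifications = np.mean(flagged & ~truly_low)       # flagged, but not truly low
false_non_identifications = np.mean(~flagged & truly_low)   # truly low, but not flagged
print(f"false identifications:     {false_identifications:.1%} of all teachers")
print(f"false non-identifications: {false_non_identifications:.1%} of all teachers")
```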

Introduction

A teacher's value-added score is intended to convey how much that teacher has contributed to student learning in a particular subject in a particular year. Different school districts define and compute value-added scores in different ways. But all of them share the idea that teachers who are particularly successful will help their students make large learning gains, that these gains can be measured by students' performance on achievement tests, and that the value-added score isolates the teacher's contribution to these gains.
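
As a rough illustration of that shared idea (not any district's actual model, and with invented student data), one simple "covariate adjustment" approach predicts each student's score from prior achievement and then credits the teacher with the average amount by which their students beat the prediction.

```python
# Deliberately simplified sketch of a covariate-adjustment style value-added
# calculation; real district models are more elaborate (more covariates,
# shrinkage, multiple years). All data below are hypothetical.
import numpy as np

def value_added(prior, current, teacher_ids):
    # Predict current score from prior score with ordinary least squares.
    X = np.column_stack([np.ones_like(prior), prior])
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    residual = current - X @ beta                 # student over/under-performance
    # Teacher's score = mean residual of the students in that teacher's class.
    return {t: residual[teacher_ids == t].mean() for t in np.unique(teacher_ids)}

# Hypothetical example: three teachers, a handful of students each.
prior   = np.array([410., 450., 500., 520., 430., 470., 510., 540., 400.])
current = np.array([430., 470., 505., 545., 445., 500., 530., 585., 420.])
teacher = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])
print(value_added(prior, current, teacher))
```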

A variety of people may see value-added estimates, and each group may use them for different purposes. Teachers themselves may want to compare their scores with those of others and use them to improve their work. Administrators may use them to make decisions about teaching assignments, professional development, pay, or promotion. Parents, if they see the scores, may use them to request particular teachers for their children. And, finally, researchers may use the estimates for studies on improving instruction.

Using value-added scores in any of these ways can be controversial. Some people doubt the validity of the achievement tests on which the scores are based, some question the emphasis on test scores to begin with, and others challenge the very idea that student learning gains reflect how well teachers do their jobs.

Our purpose is not to settle these controversies, but, rather, to answer a more limited, but essential, question: How might educators reasonably interpret value-added scores? Social science has yet to come up with a perfect measure of teacher effectiveness, so anyone who makes decisions on the basis of value-added estimates will be doing so in the midst of uncertainty. Making choices in the face of doubt is hardly unusual – we routinely contend with projected weather forecasts, financial predictions, medical diagnoses, and election polls. But as in these other areas, in order to sensibly interpret value-added scores, it is important to do two things: understand the sources of uncertainty and quantify its extent. Our aim is to identify possible errors of interpretation, to consider how likely these errors are to arise, and to help educators assess how consequential they are for different decisions.

We'll begin by asking how value-added scores are defined and computed. Next, we'll consider two sources of error: statistical bias and statistical imprecision.

[readon2 url="http://www.carnegieknowledgenetwork.org/briefs/value-added/interpreting-value-added/"]Continue reading...[/readon2]

Why the ‘market theory’ of education reform doesn’t work

Modern education reform is being driven by people who believe that competition, privatization and other elements of a market economy will improve public schools. In this post, Marc Tucker, president of the non-profit National Center on Education and the Economy and an internationally known expert on reform, explains why this approach is actually harming rather than helping schools.

Years ago, Milton Friedman and others opined that the best possible education reform would be one based on good old market theory. Public education, the analysis went, was a government monopoly, and teachers and school administrators, freed from the discipline of the market as in all government monopolies, had no incentive to control costs or deliver high quality. That left them free to feather their own nest. Obviously, the solution was to subject public education to the rigors of the market. Put the money the public collected for the schools into the hands of the parents. Let them choose the best schools for their children. Given a genuine choice among schools, parents would have a strong incentive to choose the ones that were able to produce the highest achievement at the lowest possible cost, driving achievement up and costs down.

At first, there was little appetite among the public for this approach. But, in time, many people, both Republicans and Democrats, seeing the cost of public education steadily rise with no corresponding improvement in student performance, began to blame the school bureaucracy and the teachers’ unions. They saw charter schools as a way to get away from both. All of these people, both those driven by ideology in the form of market theory and those driven by anger at the “educrats” and the teachers unions, found that they could agree on charter schools. A coalition of Silicon Valley entrepreneurs and Wall Street investors put their money behind the cause and the die was cast. The U.S. Department of Education then jumped in with both feet. Choice and markets, in the form of the charter movement, began to drive the American education reform agenda in a big way.

The theory is as neat as a pin and as American as apple pie. But what if it is not true? What if it does not predict what actually happens when it is put into practice?

For the theory to work, parents would have to make their decisions largely on the basis of information about student performance at the schools from which they can choose. But it turns out that they don’t do that. American parents seem to care most about their children’s safety. Wouldn’t you? Then they prefer a school that is close to home. At the secondary school level, many appear to care a lot more about which schools have the most successful competitive sports programs than about which of them produce the most successful scholars. How many of the trophies in the lobbies of our schools are for academic contests? If the theory were working the way it is supposed to, you would expect the first schools to be in trouble to be the worst schools, the ones with the worst academic performance. But any school superintendent will tell you that the most difficult task a superintendent faces is shutting down a school — any school — even if its academic performance is in the basement. How could this be? Does it mean that parents don’t care at all about academic performance? I don’t think so.

But it does mean that, if they have met teachers at that school who seem to really care about their children, take a personal interest in them, and seem to be decent people, they are likely to place more value on those things than on district league tables of academic performance based on standardized tests of basic skills, especially if they perceive that school to be safe and it is close to home.

The theory doesn’t work. It doesn’t work in theory (because most parents don’t place academic performance at the top of their list of things they are looking for in a school) and it doesn’t work in practice, either. How do we know that? Because, when we look at large-scale studies of the academic performance of charter schools versus regular public schools, taking into account the background of the students served, the results come out within a few points of each other, conferring a decisive advantage on neither. It is certainly true that some charter schools greatly outperform the average regular public school, but it is also true that some regular public schools greatly outperform the average charter school.

[readon2 url="http://www.washingtonpost.com/blogs/answer-sheet/wp/2012/10/12/why-the-market-theory-of-education-reform-doesnt-work/?wprss=rss_answer-sheet"]Continue Reading...[/readon2]

Erasures demonstrate huge sensitivity in ratings

The Dispatch had another speculative piece of reporting on the attendance erasure issue. We'll leave educator Greg Mild at Plunderbund to go over the substance, or lack thereof, of the article itself. We want to concentrate on something else mentioned in the article which stood out.

At the heart of the controversy is this

Though 7 percent [number of deleted records] may not sound like a lot, it could have a big effect: There are students behind those numbers, and some of their standardized test scores were likely discounted when their attendance records were deleted. That means Columbus’ school grades could have been artificially inflated because of records-tampering.

From this, the Department of Education had a remarkable comment

“The math indicates that removing one student could affect the overall rating,” said John Charlton, spokesman for the Ohio Department of Education. “In some districts, it could be one kid. It’s about dropping the right kid.”

Can this be true? Can the rating of an entire district truly be affected by just a handful of students, or even one? We have a real-life example showing that, yes, a small number of students can indeed cause an entire district's rating to change:

In Lockland near Cincinnati, removing 36 kids from the rolls — about 5 percent of their student population — lifted the district from a C to a B on the state report card.
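
To see how a handful of records can move a whole district's grade, here is a back-of-the-envelope sketch. The numbers below are entirely made up (the state's actual rating calculation is more involved); the point is only that when the denominator is small, deleting a few low-scoring students' records can push an aggregate rate across a grade cutoff.

```python
# Back-of-the-envelope sketch with hypothetical numbers (not Lockland's actual data).
n_students = 720
n_proficient = 540                 # hypothetical: 75.0% proficient, a "C"-band rate
grade_b_threshold = 0.78           # hypothetical cutoff for a "B"

removed = 36                       # records deleted, assumed all non-proficient here
rate_before = n_proficient / n_students
rate_after = n_proficient / (n_students - removed)
print(f"before: {rate_before:.1%}, after: {rate_after:.1%}, "
      f"B threshold: {grade_b_threshold:.0%}")
# before: 75.0%, after: 78.9% -> the rate crosses the hypothetical B cutoff
```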

Can a school rating system that is so sensitive to just a handful of students truly be measuring the district as a whole? This growing controversy over attendance data is revealing a lot more than people realize.

How Socrates would fare on new teacher evaluation plan

This is a pretty entertaining piece

The upstart Gates-funded organization Educators 4 Excellence has just put forth a proposal for teacher evaluations in New York City. They would accord 25 percent of the evaluation to student value-added growth data; 15 percent to data from local assessments; 30 percent to administrator observations; 15 percent to independent outside observations; 10 percent to student surveys; and 5 percent to support from the community.
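
Those percentages describe a weighted composite, and they sum to 100 percent. As a quick sketch, with a hypothetical teacher's component scores on an assumed common 0-100 scale, the overall rating would be computed roughly like this:

```python
# Sketch of the weighted composite implied by the proposal's percentages.
# The component scores below are hypothetical and the 0-100 scale is assumed.
weights = {
    "value_added_growth": 0.25,
    "local_assessments":  0.15,
    "administrator_obs":  0.30,
    "independent_obs":    0.15,
    "student_surveys":    0.10,
    "community_support":  0.05,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights total 100 percent

example_scores = {  # a hypothetical teacher
    "value_added_growth": 62, "local_assessments": 75, "administrator_obs": 84,
    "independent_obs": 80, "student_surveys": 70, "community_support": 90,
}
composite = sum(weights[k] * example_scores[k] for k in weights)
print(f"composite rating: {composite:.1f} / 100")
```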

The observations, they say, should follow a rubric. What sort of rubric should this be? The proposal states:

Observations should focus on three main criteria:

1. Observable teacher behaviors that have been demonstrated to impact student learning. For example, open-ended questions are more effective at improving student learning than closed questions.

2. Student behaviors in response to specific teacher behaviors and overall student engagement.

3. Teacher language that is specific and appropriate to the grade level and content according to a taxonomy, such as Bloom’s. For example, kindergarten teachers should use different language than high school biology teachers.

Let’s see how Socrates might fare under these conditions. As I recall, he asked a fair number of closed questions. He did this to show his interlocutors a contradiction between what they assumed was true and what they subsequently reasoned to be true.

[readon2 url="http://www.washingtonpost.com/blogs/answer-sheet/post/how-socrates-would-fare-on-new-teacher-evaluation-plan/2011/06/13/AGGdhfTH_blog.html#pagebreak"]Continue reading...[/readon2]

The Exaggerations of TFA

A former TFA'er digs into some wild TFA claims

In response to critics that TFA teachers don’t have enough long-term impact, TFA replies with the statement from their annual survey “Nearly two-thirds of Teach For America alumni work in the field of education, and half of those in education are teachers. Teaching remains the most common profession among our alumni.”

Now a statement like this is pretty strong and probably shuts up those critics, though it also probably leaves them scratching their heads: taken at face value, two-thirds in education with half of those teaching works out to roughly one in three alumni still in the classroom. How could this statement possibly be true?

[readon2 url="http://garyrubinstein.teachforus.org/2011/05/07/two-out-of-three-aint-bad-but-is-it-true/"]Keep reading...[/readon2]