
Shame on the PD and NPR

When the Cleveland Plain Dealer and NPR decided to publish the names of 4,200 Ohio teachers along with their value-added grades, their reasoning was specious and self-serving. Most of all, it was damaging to the teaching profession in Ohio.

Despite pointing out all the flaws, caveats, and controversies with the use of value-added measures as a means to evaluate teachers, both publications decided to go ahead and shame these 4,200 teachers anyway. The publication of teachers' names and scores isn't new. It was first done by the LA Times, and was a factor in the suicide of one teacher. The LA Times findings and analysis were then discredited:

The research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings. Using the same L.A. Unified School District data and the same methods as the Times, this study probes deeper and finds the earlier research to have serious weaknesses.

DUE DILIGENCE AND THE EVALUATION OF TEACHERS by National Education Policy Center

The Plain Dealer analysis is weaker than the LA Times's, relying on just two years' worth of data rather than seven. In fact, the Plain Dealer and NPR stated they published only 4,200 teachers' scores, and not the 12,000 scores they had data for, because most teachers had only one year's worth of data. This is a serious error, as value-added scores are known to be highly unreliable and subject to massive variance.
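To see why so few years of data is a problem, here is a minimal simulation sketch. The teacher count, effect sizes, and noise level below are assumptions chosen for illustration, not Ohio's actual model: each teacher's "true" effect is treated as fixed, and each annual value-added score as that effect plus substantial measurement noise. Averaging more years pulls the estimate closer to the true effect; with one or two years, the ranking is dominated by noise.

```python
import random
import statistics

random.seed(42)

# Assumed parameters for illustration only (not Ohio's actual model).
TEACHERS = 1000
TRUE_SD = 1.0    # spread of real teacher effects
NOISE_SD = 2.0   # year-to-year measurement noise, assumed larger than the signal

true_effects = [random.gauss(0, TRUE_SD) for _ in range(TEACHERS)]

def observed_score(true_effect, years):
    """Average of `years` noisy annual value-added estimates for one teacher."""
    return statistics.mean(
        true_effect + random.gauss(0, NOISE_SD) for _ in range(years)
    )

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

for years in (1, 2, 3):
    scores = [observed_score(t, years) for t in true_effects]
    print(f"{years} year(s): correlation with true effect = "
          f"{pearson(true_effects, scores):.2f}")
```

Under these assumptions, the one-year correlation comes out noticeably weaker than the three-year one, which is exactly why the vendor's own guidance calls for three years of data.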

Beyond the questionable statistical analysis, the publication of teachers' names and value-added scores has been criticized by a great number of people, including corporate education reformer Bill Gates, in a New York Times op-ed titled "Shame Is Not the Solution":

LAST week, the New York State Court of Appeals ruled that teachers’ individual performance assessments could be made public. I have no opinion on the ruling as a matter of law, but as a harbinger of education policy in the United States, it is a big mistake.

I am a strong proponent of measuring teachers’ effectiveness, and my foundation works with many schools to help make sure that such evaluations improve the overall quality of teaching. But publicly ranking teachers by name will not help them get better at their jobs or improve student learning. On the contrary, it will make it a lot harder to implement teacher evaluation systems that work.

Gates isn't the only high-profile corporate education reformer who is critical of such shaming. Wendy Kopp, CEO of Teach for America, has also spoken out against the practice:

Kopp is not shy about saying what she'd do differently as New York City schools chancellor. While the Bloomberg administration is fighting the United Federation of Teachers in court for the right to release to the news media individual teachers' "value added" ratings—an estimate of how effective a teacher is at improving his or her students' standardized test scores—Kopp says she finds the idea "baffling" and believes doing so would undermine trust among teachers and between teachers and administrators.

"The principals of very high performing schools would all say their No. 1 strategy is to build extraordinary teams," Kopp said. "I can't imagine it's a good organizational strategy to go publish the names of teachers and one data point about whether they are effective or not in the newspaper."

Indeed, if the editors of the Plain Dealer and NPR had read their own reporting, they would have realized the public release of this information was unsound, unfair, and damaging. Let's look at the warning signs in their own reporting:

...scores can vary from year to year.

Yet they relied upon only one year's worth of data for much of their analysis, and just two years for the teachers whose names they published.

...decided it was more important to provide information — even if flawed.

How can it be useful to the layperson to be provided with flawed information? Why would a newspaper knowingly publish flawed information?

...these scores are only a part of the criteria necessary for full and accurate evaluation of an individual teacher.

And yet they published 4,200 teachers' scores based solely on value-added data, which at best makes up only 35% of a teacher's evaluation. Lay people will not understand that these scores are only a partial measurement of a teacher's effectiveness, and a poor one at that.

...There are a lot of questions still about the particular formula Ohio.

Indeed, so many questions that one would be best advised to wait until those questions are answered before publicly shaming teachers who were part of a pilot program being used to answer those questions.

...variables beyond a teacher’s control need to be considered in arriving at a fair and accurate formula.

Yet none of these reporters considered any of these factors in publishing teachers' names, and readers will wholly miss that necessary context.

...The company that calculates value-added for Ohio says scores are most reliable with three years of data.

Again, the data is unreliable, especially with less than three years' worth of data, yet the Plain Dealer and NPR decided they should shame teachers using just two years' worth.

...Ohio’s value-added ratings do not account for the socioeconomic backgrounds of students, as they do in some other states.

How many "ineffective" teachers are really just working in classrooms serving economically depressed communities? The reporters seem not to care, and published the names anyway.

...Value-added scores are not a teacher’s full rating.

Nowhere in the publication of these names are the teachers' full ratings indicated. This again leaves lay people and site visitors to think these flawed value-added scores are the final reflection of a teacher's quality.

...ratings are still something of an experiment.

How absurd is the decision to publish now seeming? Shaming people on the basis of the results of an experiment! By their very nature, experiments can demonstrate that something is wrong, not right.

...The details of how the scores are calculated aren’t public.

We don't even know if the value-added scores are correct and accurate, because the formula is secret. How can it be fair for the results of a secret formula to be public? Did that not raise any alarm bells for the Plain Dealer and NPR?

...The department’s top research official, Matt Cohen, acknowledged that he can’t explain the details of exactly how Ohio’s value-added model works.

But somehow NPR listeners and Cleveland Plain Dealer readers are supposed to understand the complexities, and read the necessary context into the publication of individual teacher scores?

...StateImpact/Plain Dealer analysis of initial state data suggests.

"Initial." "Suggests." They decided to shame teachers without properly vetting the data or their own analysis - exactly the same problem the LA Times ran into, which we highlighted at the top of this article.

It doesn't take a lot of "analysis" to understand that a failing newspaper needed controversy and eyeballs, and that their decision to shame teachers was made in their own economic interests and not that of the public good. In the end, then, the real shame falls not on teachers who are working hard every day, often in difficult situations made worse by draconian budget cuts, endless political meddling, and student poverty - but on the editors of these two publications for putting their own narrow self-interest above that of Ohio's children.

It's a disgrace that they ought to make 4,200 apologies for.

Ohio Third Graders Face Retention Ultimatum

PBS recently ran a report on the new third-grade reading guarantee.

Watch Ohio Third Graders Must Learn to Read or Repeat the Year on PBS. See more from PBS NewsHour.

This exchange with the Senate Education Committee chair was interesting:

PEGGY LEHNER: I'm hoping that we can put some additional money in.

JOHN TULENKO: How much is it going to take?

PEGGY LEHNER: I think, frankly, we might be looking at $50 million, 60 million.

JOHN TULENKO: Lehner also acknowledges educators' other concerns about the reading guarantee: lack of preschool and parents who don't do their part.

There are so many questions around this.

PEGGY LEHNER: Sure.

JOHN TULENKO: Do you ever feel like you are stepping out on a limb on this one?

PEGGY LEHNER: It is a risk. And I think we have to take a risk. We have to change what we are doing, because what we have been doing is not working.

JOHN TULENKO: Can you give us a guarantee that this will work?

PEGGY LEHNER: Of course not. Of course not.

The budget will be a good opportunity to right some of these problems.

The Educational Path of Our Nation

Education plays a fundamental role in American society. Here we take a look at school enrollment, costs, and educational outcomes. How does school enrollment today compare with 1970, when the baby boom generation was in its prime years of school attendance (ages 6 to 24) and made up 90 percent of all students enrolled in schools? The American Community Survey and other Census Bureau surveys provide us with the information to answer these and other valuable questions. Education statistics are vital to communities in determining funding allocations and guiding program planning.

[Education infographic. Source: U.S. Census Bureau]

High stakes failure

It might be becoming apparent to any rational observer that high stakes corporate education policies are failing catastrophically. Where once various data and tests were used to inform educators and provide diagnostic feedback, they are increasingly being used to rank, grade, and even punish.

This is leading to the inevitable behaviors that are always present when such systems are created - whether it was in the world of energy companies such as Enron, or other accounting scandals including those affecting Tyco International, Adelphia, Peregrine Systems and WorldCom, to the more recent scandals involving Lehman Brothers, JPM or Barclays bank.

Here's another example, in news from Pennsylvania:

After authorities imposed unprecedented security measures on the 2012 statewide exams, test scores tumbled across Pennsylvania, The Inquirer has learned.

At some schools, Pennsylvania Secretary of Education Ronald Tomalis said, the drops are "noticeable" - 25 percent or more.

In some school systems, investigators have found evidence of outright doctoring of previous years' tests - and systemic fraud that took place across multiple grades and subjects.

In Philadelphia and elsewhere, some educators have already confessed to cheating, and investigators have found violations ranging from "overcoaching" to pausing a test to reteach material covered in the exam, according to people familiar with the investigations.

When trillions of dollars of the world's money is at stake, investing in tight oversight and regulation is imperative, but when it comes to evaluating the progress of a 3rd grader, do we really want to spend valuable education dollars measuring the measurers?

The question becomes even more pertinent when one considers that the efficacy of many of the measures is questionable at best. Article after article, study after study, places significant questions at the feet of value-added proponents, and now a new study even places questions at the feet of the tests themselves:

Now, in studies that threaten to shake the foundation of high-stakes test-based accountability, Mr. Stroup and two other researchers said they believe they have found the reason: a glitch embedded in the DNA of the state exams that, as a result of a statistical method used to assemble them, suggests they are virtually useless at measuring the effects of classroom instruction.

Pearson, which has a five-year, $468 million contract to create the state’s tests through 2015, uses “item response theory” to devise standardized exams, as other testing companies do. Using I.R.T., developers select questions based on a model that correlates students’ ability with the probability that they will get a question right.

That produces a test that Mr. Stroup said is more sensitive to how it ranks students than to measuring what they have learned. That design flaw also explains why Richardson students’ scores on the previous year’s TAKS test were a better predictor of performance on the next year’s TAKS test than the benchmark exams were, he said. The benchmark exams were developed by the district, the TAKS by the testing company.
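The "item response theory" mechanism the study describes can be sketched in a few lines. This uses the simple one-parameter (Rasch) logistic model purely as an illustration; Pearson's actual test-assembly procedure is proprietary and more elaborate.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (one-parameter IRT) model: probability that a student of the
    given ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Under IRT, items are selected for how sharply they separate students
# around a target ability level, not for curricular coverage.
for d in (-1.0, 0.0, 1.0):
    print(f"difficulty {d:+.1f}: "
          f"weak student {p_correct(-1.0, d):.2f}, "
          f"strong student {p_correct(+1.0, d):.2f}")
```

Because items are chosen for how well they discriminate among students at particular ability levels, a test assembled this way can rank students consistently while remaining insensitive to what any particular classroom taught, which is the core of Stroup's critique.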

We have built a high stakes system on questionable tests, measured using questionable statistical models, subject to gaming and cheating, and further goosed by the scrubbing of other student data. We've seen widespread evidence of it in New York, California, Washington DC, Georgia, Tennessee, Pennsylvania, and now Ohio.

Policymakers are either going to have to spend more and more money developing better tests, better models, tighter security and more bureaucratic data handling policies, or return to thinking about the core mission of providing a quality education to all students. Either way, when you have reached the point where the State Superintendent talks of criminalizing the corporate education system, things have obviously gone seriously awry.

State Superintendent Stan Heffner, who leads the department, has launched his own investigation and has said the probe could lead to criminal charges against educators who committed fraud.

Testing Profits

Now that states and the federal government are attaching high stakes to standardized tests, these tests are coming under increasing scrutiny. They don't appear to be holding up well to this additional scrutiny:

A top New York state education official acknowledged Wednesday that the mounting number of errors found on this year's math and English tests has eroded public trust in the statewide exams.

"The mistakes that have been revealed are really disturbing," New York State Board of Regents Chancellor Merryl Tisch said at a Midtown breakfast sponsored by Crain's New York Business.

"What happens here as a result of these mistakes is that it makes the public at large question the efficacy of the state testing system," said Ms. Tisch, whose board sets education policy for the state.

Still, Ms. Tisch said testing experts have told state officials that the exams are valid and can be used to evaluate students and, in some cases, teachers.

Over the past several weeks, a series of errors by test-maker Pearson PLC have come to light, ranging from typographical mistakes to a now-infamous nonsensical reading passage about a pineapple. This is the first year of a five-year, $32 million contract the state awarded to Pearson, which also publishes textbooks.

To date, 29 questions have been invalidated on various third- through eighth-grade math and English tests, which are used in New York City to determine whether students are promoted to the next grade.

Pearson didn't return a request for comment.

Mistake-riddled tests are not the only problem being highlighted:

Is it okay to ask a child to reveal a secret? Richard Goldberg doesn’t think so. Goldberg, the father of 8-year-old twin boys, was dismayed to learn his third-grade sons were asked to write an essay about a secret they had and why it was hard to keep. The unusual question, which Goldberg called "entirely inappropriate," was on the standardized tests given to public school students in the third through eighth grade every spring.
[...]
The question will not, however, appear on any future versions of the test, Barra said. "We’ve looked at this question in light of concerns raised by parents, and it is clear that this is not an appropriate question for a state test," Barra said.

Increasingly, calls are being made to make these tests public, so they can be fully vetted.

I learned that the tests themselves are being kept secret because the state Department of Education and Pearson, their test development contractor, wrote strong confidentiality provisions into the contract. My understanding is that this was so that they both could reuse test questions in the future. In order for the questions to be reusable, they have to be kept secret, otherwise students could prep too easily for the tests, and Pearson’s other customers would be able to get the tests from the public domain.

We only know about the gaffes because students exposed them. Educators have been sworn to secrecy. The Education Department has emphasized their concerns about test prep, but to me the secrecy seems rooted in economics: Secrecy saves New York on future test development costs and makes it easier for Pearson to re-sell the questions it created for New York (at New York taxpayers’ expense) in other states.

Two things strike me as odd about this. First, it’s uncommon to keep tests completely secret after the fact of their administration. Letting people see the test is a basic part of education.

The purpose of testing is to measure how well a student knows subject matter and to identify what areas need work. If the only thing one knows about a child’s performance on a test is his grade, and one can’t review the actual test, the test is pedagogically useless and can only serve a punitive purpose.

If the broader community of parents, educators and researchers can’t see tests, then we have no way of judging the connection between them and curricula or how to help our children.

A paper by the National Board on Educational Testing and Public Policy titled "Errors in Standardized Tests: A Systemic Problem" found:

This paper contains a sizable collection of testing errors made in the last twenty-five years. It thus offers testimony to counter the implausible demands of educational policy makers for a single, error-free, accurate, and valid test used with large groups of children for purposes of sorting, selection, and trend-tracking.

No company can offer flawless products. Even highly reputable testing contractors that offer customers high-quality products and services produce tests that are susceptible to error. But while a patient dissatisfied with a diagnosis or treatment may seek a second or third opinion, for a child in a New York City school (and in dozens of other states and hundreds of other cities and towns), there is only one opinion that counts – a single test score. If that is in error, a long time may elapse before the mistake is brought to light – if it ever is.

This paper has shown that human error can be, and often is, present in all phases of the testing process. Error can creep into the development of items. It can be made in the setting of a passing score. It can occur in the establishment of norming groups, and it is sometimes found in the scoring of questions.
[…]
Measuring trends in achievement is an area of assessment that is laden with complications. The documented struggles experienced by the National Center for Education Statistics (NCES) and Harcourt Educational Measurement testify to the complexity inherent in measuring changes in achievement. Perhaps such measurement requires an assessment program that does only that. The National Center of Educational Statistics carefully tries to avoid even small changes in the NAEP tests, and examines the impact of each change on the test’s accuracy. Many state DOEs, however, unlike NCES, are measuring both individual student achievement and aggregate changes in achievement scores with the same test – a test that oftentimes contains very different questions from administration to administration. This practice counters the hard-learned lesson offered by Beaton,“If you want to measure change, do not change the measure”(Beaton et al., 1990, p. 165).

Furthermore, while it is a generally held opinion that consumers should adhere to the advice of the product developers (as is done when installing an infant car seat or when taking medication), the advice of test developers and contractors often goes unheeded in the realm of high-stakes decision-making. The presidents of two major test developers – Harcourt Brace and CTB McGraw Hill – were on record that their tests should not be used as the sole criterion for making high-stakes educational decisions (Myers, 2001; Mathews, 2000a). Yet more than half of the state DOEs are using test results as the basis for important decisions that, perhaps, these tests were not designed to support.

Finally, all of these concerns should be viewed in the context of the testing industry today. Lines (2000) observed that errors are more likely in testing programs with greater degrees of centralization and commercialization, where increased profits can only be realized by increasing market share,“The few producers cannot compete on price, because any price fall will be instantly matched by others .... What competition there is comes through marketing”(p. 1). In Minnesota, Judge Oleisky (Kurvers et al. v. NCS, Inc., 2002) observed that Basic Skills Test errors were caused by NCS’ drive to cut costs and raise profits by delivering substandard service – demonstrating that profits may be increased through methods other than marketing.

It clearly appears that profit is winning the day over quality when it comes to standardized tests.

Here's the full paper.

Errors in Standardized Tests: A Systemic Problem

Education News for 04-12-2012

Statewide Education News

  • School achievement tests to get tougher in 2014 (Newark Advocate)
  • The tests Ohio's third- through eighth-graders are preparing to take later this month will look vastly different in a few years. No. 2 pencils and bubbled sheets will be replaced with computers; simple multiple choice questions will be replaced with questions requiring more thought. The tests also will be more difficult. Much more difficult. Read More…

  • Ohio Continues to Fall Short on Providing High-Quality Preschool (State Impact Ohio)
  • Ohio isn’t doing a great job of getting children, particularly low-income children, into good, state-funded preschool programs. Sound familiar? That’s because it’s been true for several years running. Steven Barnett is the director of the National Institute for Early Education Research. His group’s new annual report on the state of preschool doesn’t do Ohio any favors. Read More…

  • Test question raises concerns among Jews (Cleveland Jewish News)
  • An Ohio Graduation Test question asking for the Arabs’ perspective on the founding of the state of Israel has raised concerns among members of the Jewish community. Objections range from bias to over-simplification of history. Tenth-graders in public and private schools across Ohio took the OGT March 12 to 16 in five subject areas. Makeup testing took place the following week. Read More…

Local Issues

  • ‘Realistic’ financial projection requested by Liberty schools panel (Vindicator)
  • The fiscal commission prodded and picked at the latest revision of the Liberty school district’s five-year forecast Wednesday, telling the district’s treasurer it wants a more-detailed projection to ensure it is receiving adequate information for future cuts. Roger Nehls, chairman of the fiscal commission charged with guiding the district out of fiscal emergency, said districts sometimes will use the forecast as a budgetary planning tool. Read More…

Editorial & Opinion

  • Complex evaluation (Akron Beacon Journal)
  • Public schools are a favorite target of politicians fixed on accountability, on showing the worth of money spent. Last year, Ohio lawmakers approved in the budget bill provisions that require the State Board of Education to develop a new framework for evaluating teachers. The new assessment will apply, beginning in 2013, to school districts, plus charter schools participating in the federal Race to the Top initiative. Read More…

  • Raise the bar
  • State Auditor Dave Yost is right that Ohio needs higher standards and stricter accountability for charter-school treasurers. As some recent high-profile cases involving ruined schools and misspent tax funds make clear, it’s easy for hundreds of thousands of dollars to be lost before corrective action takes place. Read More…