Is Ohio ready for computer testing?

The Cincinnati Enquirer has a report on how Ohio schools are not going to be ready for the new online PARCC tests that are scheduled to be deployed next year.

Ohio public schools appear to be far short of having enough computers to have all their students take new state-mandated tests within a four-week period beginning in the 2014-15 school year.

“With all the reductions in education funds over the last several years and the downturn in the economy, districts have struggled to be able to bring their (computer technology) up to the level that would be needed for this,” said Barbara Shaner, associate executive director of the Ohio Association of School Business Officials.

Districts could seek state permission to deliver the new tests on paper if they can’t round up enough computers, tablets and gadgets to go around, Jim Wright, director of curriculum and assessment for the Ohio Department of Education, said. A student taking a paper test could be at a disadvantage, though. While the paper tests won’t have substantially different questions, a student taking the test online will have the benefit of audio and visual prompts as well as online tasks that show their work on computer, said Chad Colby, a spokesman for the Partnership for Assessment of Readiness for College and Careers.

The state really does need to step up and help districts fund this costly mandate that has been foisted upon them. Compounding the problem, the computer industry is going through significant changes as more and more people move away from traditional desktops and laptops in favor of simpler, more portable tablets. School districts could find themselves having to make costly investments again in the near future if they pick the wrong technologies.

The article notes the possibility that paper-based test takers could be at a disadvantage compared to those taking the computer-based tests. There has been a significant amount of research on this over the years, and the results seem to indicate the opposite effect: computer-based test takers tend to score lower than paper-based test takers.

The comparability of test scores based on online versus paper testing has been studied for more than 20 years. Reviews of the comparability literature research were reported by Mazzeo and Harvey (1988), who reported mixed results, and Drasgow (1993), who concluded that there were essentially no differences in examinee scores by mode-of-administration for power tests. Paek (2005) provided a summary of more recent comparability research and concluded that, in general, computer and paper versions of traditional multiple-choice tests are comparable across grades and academic subjects. However, when tests are timed, differential speededness can lead to mode effects. For example, a recent study by Ito and Sykes (2004) reported significantly lower performance on timed web-based norm-referenced tests at grades 4-12 compared with paper versions. These differences seemed to occur because students needed more time on the web-based test than they did on the paper test. Pommerich (2004) reported evidence of mode differences due to differential speededness in tests given at grades 11 and 12, but in her study online performance on questions near the end of several tests was higher than paper performance on these same items. She hypothesized that students who are rushed for time might actually benefit from testing online because the computer makes it easier to respond and move quickly from item to item.

A number of studies have suggested that no mode differences can be expected when individual test items can be presented within a single screen (Poggio, Glassnapp, Yang, & Poggio, 2005; Hetter, Segall & Bloxom, 1997; Bergstrom, 1992; Spray, Ackerman, Reckase, & Carlson, 1989). However, when items are associated with text that requires scrolling, such as is typically the case with reading tests, studies have indicated lower performance for students testing online (O’Malley, 2005; Pommerich, 2004; Bridgeman, Lennon, & Jackenthal, 2003; Choi & Tinkler, 2002; Bergstrom, 1992).

After the evaluation binge, the hangover

You don't have to search far or wide to find articles, papers, and studies critical of corporate education reformers' push for rigid, test-based teacher evaluations of the kind currently being deployed in Ohio. Our document archive is full of them. But it is unusual to read a paper, published by a right-wing think tank with a reputation for being anti-teacher, that raises many of the same points teachers themselves have been raising about the headlong rush to implement corporate education reform principles in the area of teacher evaluations.

But that's exactly what the American Enterprise Institute (AEI) has just done with a paper titled "The Hangover: Thinking about the Unintended Consequences of the Nation’s Teacher Evaluation Binge". The paper opens with a warning that the recent push might have been too much, too soon, and gone too far:

Yet the recent evaluation binge is not without risks.

By nature, education policymaking tends to lurch from inattention to overreach. When a political moment appears, policymakers and advocates rush to take advantage as quickly as they can, knowing that opportunities for real change are fleeting. This is understandable, and arguably necessary, given the nature of America’s political system. But headlong rushes inevitably produce unintended consequences—something akin to a policy hangover as ideas move from conception to implementation.

Welcome to teacher evaluation’s morning after.

The paper discusses a number of problematic areas that will be familiar to JTF readers:

Flexibility versus control: There is a temptation to prescribe and legislate details of evaluations to ensure rigor and prevent evaluations from being watered down in implementation. But overly prescriptive policies may also limit school autonomy and stifle innovation that could lead to the development of better evaluations.

Evaluation in an evolving system: Poorly designed evaluation requirements could pose an obstacle to blended learning and other innovative models in which it is difficult or impossible to attribute student learning gains in a particular subject to a particular teacher.

Purposes of evaluations: New evaluation systems have been sold as a way both to identify and dismiss underperforming teachers and to provide all teachers with useful feedback to help them improve their performance. But there are strong tensions between these purposes that create trade-offs in evaluation system design.

Evaluating teachers as professionals: Advocates argue that holding teachers responsible for their performance will bring teaching more in line with norms in other fields, but most professional fields rely on a combination of data and managerial judgment when making evaluation and personnel decisions, and subsequently hold managers accountable for those decisions, rather than trying to eliminate subjective judgments as some new teacher evaluation systems seek to do.

Take one look at the evaluation framework inspired by the Ohio legislature and one can see how prescriptive Ohio's teacher evaluations have become.

Ohio has also fallen into many of the traps this paper highlights: the failure to account for team teaching, a lack of focus on and funding for professional development, and a lack of resources for administrators to provide adequate feedback, to name just a handful.

AEI offers some useful recommendations, though some might be too late to implement in Ohio:

Recognizing these tensions and trade-offs, this paper offers several policy recommendations:
  • Be clear about the problems new evaluation systems are intended to solve.
  • Do not mistake processes and systems as substitutes for cultural change.
  • Look at the entire education ecosystem, including broader labor-market impacts, pre- and in-service preparation, standards and assessments, charter schools, and growth of early childhood education and innovative school models.
  • Focus on improvement, not just deselection.
  • Encourage and respect innovation.
  • Think carefully about waivers versus umbrellas.
  • Do not expect legislation to do regulation’s job.
  • Create innovation zones for pilots—and fund them.

One might find it gratifying to read reasoned words of caution regarding corporate education reforms from some of the very people responsible for pushing them in the first place, and we can only hope to see more of it. But it is hard not to suspect that this is the slow dawning of a realization drawn from the very real evidence of ongoing struggles and failures of corporate education reform policies now being seen across the state and the country.

The Hangover: Thinking about the Unintended Consequences of the Nation’s Teacher Evaluation Binge

Research-Based Options for Education Policy Making

The first in a new series of two-page briefs summarizing the state of play in education policy research offers suggestions for policymakers designing teacher evaluation systems.

The paper is written by Dr. William Mathis, managing director of the National Education Policy Center, housed at the University of Colorado Boulder School of Education.

Teachers are important, and policies mandating high-stakes evaluations of teachers are at the forefront of popular school reforms. Today’s dominant approach labels teachers as effective or ineffective based in large part on a statistical analysis of students’ test-score performance. Teachers judged effective are rewarded, and those found ineffective are sanctioned.

While such summative evaluations can be useful, lawmakers should be wary of approaches based in large part on test scores: the error in the measurements is large—which results in many teachers being incorrectly labeled as effective or ineffective; relevant test scores are not available for the students taught by most teachers, given that only certain grade levels and subject areas are tested; and the incentives created by high-stakes use of test scores drive undesirable teaching practices such as curriculum narrowing and teaching to the test.

Summative initiatives should also be balanced with formative approaches, which identify strengths and weaknesses of teachers and directly focus on developing and improving their teaching. Measures that de-emphasize test scores are more labor intensive but have far greater potential to enrich instruction and improve education.

The paper goes on to give some key research points and advice for policymakers:

  • If the objective is improving educational practice, formative evaluations that guide a teacher’s improvement provide greater benefits than summative evaluations.
  • If the objective is to improve educational performance, outside-school factors must also be addressed. Teacher evaluation cannot replace or compensate for these much stronger determinants of student learning. The importance of these outside-school factors should also caution against policies that simplistically attribute student test scores to teachers.
  • The results produced by value-added (test-score growth) models alone are highly unstable. They vary from year to year, from classroom to classroom, and from one test to another. Substantial reliance on these models can lead to practical, ethical and legal problems.
  • High-stakes evaluations based in substantial part on students’ test scores narrow the curriculum by diminishing or pushing out non-tested subjects, knowledge, and skills.
  • Teacher evaluation systems necessarily involve trade-offs, and specific design choices are controversial, so it is important to involve all key stakeholders in system design or selection.
  • To be successful, schools must invest in their teacher evaluation systems. An adequate number of highly trained evaluators must be available.
  • Given the wide variety of teacher roles and the many factors that influence learning that are outside the control of the teacher, a wide variety of measures of teacher effectiveness is also indicated. By diversifying, the weakness of any single measure is offset by the strengths of another.
  • High-quality research on existing evaluative programs and tools should inform the design of teacher evaluation systems. States and districts should investigate balanced models such as PAR and the Danielson Framework, closely examine the evidence concerning strengths and weaknesses of each model, and never attach high-stakes consequences to teachers which the evidence cannot validly support.

The paper can be read in full below.

Research-Based Options for Education Policy Making

Packed Virtual Classrooms

Later today, Apple will unveil its plans for digital textbooks.

Steve Jobs described textbooks as an '$8 billion a year industry ripe for digital destruction', in conversations with his biographer Walter Isaacson.

Given how much dead-tree weight students have to carry around, and how expensive textbooks have become, this is an area ripe for a solution. But as Apple lays out its plans for capturing some of the profits to be had from education, likely with an innovative technology-based solution, corporate education reformers have set their sights on using technology to capture profits in an altogether different way.

The Fordham Foundation recently released "Creating Sound Policy for Digital Learning," a working paper series from the Thomas B. Fordham Institute. The piece begins:

Online learning, in its many shapes and sizes, is quickly becoming a typical part of the classroom experience for many of our nation’s K-12 students. As it grows, educators and policymakers across the country are beginning to ask the question: What does online learning cost? While the answer to this question is a key starting point, by itself it has limited value. Of course there are cheaper ways to teach students. The key question that will eventually have to be addressed is: Can online learning be better and less expensive?

At that point the paper descends into the usual rote corporate-ed talking points, using anecdote to try to capture the costs and quality of virtual education. The total lack of innovative thought is captured in its first graph.

You will note that it is not technology driving the savings, but the slashing of spending on educators. The entire difference between the traditional model and the virtual model lies in the category of faculty and admin expenditures. Stephen Dyer, at his new blog "10th Period", points out that actual e-school spending in Ohio follows this exact model:

Over at Innovation Ohio, I helped write and research a report that pointed out Ohio pays these major eSchool operators enough money for them to provide 15:1 student:teacher ratios, $2,000 laptops and still clear about 30% profit.

However, they don't do that. On average, they have 37:1 student:teacher ratios. Ohio Virtual Academy (run by the infamous, national for-profit K-12, Inc.) has a student:teacher ratio of 51:1, if you can believe it. Anyway, of the $183 million Ohio's taxpayers sent to these eSchools last school year, the schools spent a grand total of $27.5 million on teacher salaries, or about 15% of its money.
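A quick back-of-the-envelope check confirms the arithmetic in Dyer's figures (the dollar amounts are his; the percentage is simple division):

```python
# Sanity check of the eSchool spending figures quoted above.
total_funding_millions = 183.0    # taxpayer money sent to Ohio eSchools
teacher_salary_millions = 27.5    # amount spent on teacher salaries

share = teacher_salary_millions / total_funding_millions
print(f"Teacher salaries as a share of funding: {share:.1%}")  # → 15.0%
```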

E-schooling as envisaged by corporate education reformers doesn't rely on any technological innovation to deliver high-quality education; instead, its proponents use the virtual nature of the model to obscure the fact that class sizes can become huge. It's hard for a parent to know their child is crammed into a packed class with 50 other students when he is sitting alone in his bedroom. What you don't see won't hurt you, right?

It's never explained how a teacher can deliver quality instruction to such large classes, in a situation where the virtual nature of the classes already makes teaching naturally more difficult and challenging.

We know for a fact that Ohio's e-schools are appallingly bad. Even the Fordham Foundation itself found e-schools to be terrible.

Perhaps before we even begin to consider cost, we ought to sort out the very serious problems we have with quality. What does it matter how cheap something is if it is not fit for purpose? One might even argue, with tongue not so firmly planted in cheek, that Ohio's e-schools are breaking consumer laws:

In common law jurisdictions, an implied warranty is a contract law term for certain assurances that are presumed to be made in the sale of products or real property, due to the circumstances of the sale. These assurances are characterized as warranties irrespective of whether the seller has expressly promised them orally or in writing. They include an implied warranty of fitness for a particular purpose, an implied warranty of merchantability for products, implied warranty of workmanlike quality for services, and an implied warranty of habitability for a home.

Test Scores Often Misused In Policy Decisions

Education policies that affect millions of students have long been tied to test scores, but a new paper suggests those scores are regularly misinterpreted.

According to the new research out of Mathematica, a statistical research group, the comparisons sometimes used to judge school performance are more indicative of demographic change than actual learning.

For example: last week's release of National Assessment of Educational Progress scores led to much finger-pointing about what's working and what isn't in education reform. But according to Mathematica, policy assessments based on raw test data are extremely misleading -- especially because year-to-year comparisons measure different groups of students.

"Every time the NAEP results come out, you see a whole slew of headlines that make you slap your forehead," said Steven Glazerman, an author of the paper and a senior fellow at Mathematica. "You draw all the wrong conclusions over whether some school or district was effective or ineffective based on comparisons that can't be indicators of those changes."


Merit Pay: The End Of Innocence?

The current teacher salary scale has come under increasing fire, and for good reason. Systems where people are treated more or less the same suffer from two basic problems. First, there will always be a number of "free riders". Second, and relatedly, some people may feel their contributions aren’t sufficiently recognized. So, what are good alternatives? I am not sure; but based on decades worth of economic and psychological research, measures such as merit pay are not it.

Although individual pay for performance (or merit pay) is a widespread practice among U.S. businesses, the research on its effectiveness shows it to be of limited utility (see here, here, here, and here), mostly because it’s easy for its benefits to be swamped by unintended consequences. Indeed, psychological research indicates that a focus on financial rewards may serve to (a) reduce intrinsic motivation, (b) heighten stress to the point that it impairs performance, and (c) promote a narrow focus reducing how well people do in all dimensions except the one being measured.

In 1971, a research psychologist named Edward Deci published a paper concluding that, while verbal reinforcement and positive feedback tends to strengthen intrinsic motivation, monetary rewards tend to weaken it. In 1999, Deci and his colleagues published a meta-analysis of 128 studies (see here), again concluding that, when people do things in exchange for external rewards, their intrinsic motivation tends to diminish. That is, once a certain activity is associated with a tangible reward, such as money, people will be less inclined to participate in the task when the reward is not present. Deci concluded that extrinsic rewards make it harder for people to sustain self-motivation.
