The first in a new series of two-page briefs summarizing the state of play in education policy research offers suggestions for policymakers designing teacher evaluation systems.

The paper is written by Dr. William Mathis, managing director of the National Education Policy Center, housed at the University of Colorado Boulder School of Education.

Teachers are important, and policies mandating high-stakes evaluations of teachers are at the forefront of popular school reforms. Today’s dominant approach labels teachers as effective or ineffective based in large part on a statistical analysis of students’ test-score performance. Teachers judged effective are rewarded, and those found ineffective are sanctioned.

While such summative evaluations can be useful, lawmakers should be wary of approaches based in large part on test scores, for three reasons: the error in the measurements is large, which results in many teachers being incorrectly labeled as effective or ineffective; relevant test scores are not available for the students taught by most teachers, given that only certain grade levels and subject areas are tested; and the incentives created by high-stakes use of test scores drive undesirable teaching practices such as curriculum narrowing and teaching to the test.

Summative initiatives should also be balanced with formative approaches, which identify strengths and weaknesses of teachers and directly focus on developing and improving their teaching. Measures that de-emphasize test scores are more labor intensive but have far greater potential to enrich instruction and improve education.

The paper goes on to give key research points and advice for policymakers:

If the objective is improving educational practice, formative evaluations that guide a teacher’s improvement provide greater benefits than summative evaluations.
If the objective is to improve educational performance, outside-school factors must also be addressed. Teacher evaluation cannot replace or compensate for these much stronger determinants of student learning. The importance of these outside-school factors should also caution against policies that simplistically attribute student test scores to teachers.
The results produced by value-added (test-score growth) models alone are highly unstable. They vary from year to year, from classroom to classroom, and from one test to another. Substantial reliance on these models can lead to practical, ethical and legal problems.
High-stakes evaluations based in substantial part on students’ test scores narrow the curriculum by diminishing or pushing out non-tested subjects, knowledge, and skills.
Teacher evaluation systems necessarily involve trade-offs, and specific design choices are controversial, so it is important to involve all key stakeholders in system design or selection.
To be successful, schools must invest in their teacher evaluation systems. An adequate number of highly trained evaluators must be available.
Given the wide variety of teacher roles and the many factors influencing learning that are outside the teacher’s control, a wide variety of measures of teacher effectiveness is also called for. Diversifying offsets the weakness of any single measure with the strengths of another.
High-quality research on existing evaluative programs and tools should inform the design of teacher evaluation systems. States and districts should investigate balanced models such as Peer Assistance and Review (PAR) and the Danielson Framework, closely examine the evidence concerning strengths and weaknesses of each model, and never attach to teachers high-stakes consequences that the evidence cannot validly support.

The paper can be read in full below.

Research-Based Options for Education Policy Making