This paper from the Journal of Educational and Behavioral Statistics takes a look at longitudinal individual teacher effects:

One of the most challenging aspects of modeling longitudinal achievement data is how to address the persistence of the effects of past educational inputs on future achievement outcomes. In this article, we are concerned primarily with the effects of individual teachers and how best to model the accumulation of those effects across a longitudinal series of student achievement measures. For example, if a teacher improves student reading comprehension by teaching comprehension strategies, then we might expect the strategies to be useful for improving achievement in both current and future years. However, it is less clear how much the effects will persist and how the effects on future achievement will relate to the effects for the current year. The utility of comprehension strategies might diminish over time as students develop other methods for reading comprehension and the teacher’s effect on future scores might decrease and eventually fade to zero.

Results from these kinds of studies continue to raise concerns about using longitudinal test scores to make high-stakes inferences about individual teachers:

As the prospect of using longitudinal achievement data to make potentially high-stakes inferences about individual teachers becomes more of a reality, it is important that statistical methods be flexible enough to account for the complexities of the data. The increasing frequency of tests that are not developmentally scaled across grades, as well as the concerns about the properties of developmental scales, suggests that longitudinal data series may need to be treated as repeated correlated measures of different constructs rather than repeated measures of a consistently defined unidimensional construct. Coupled with the inherent complexity of the accumulation of past educational inputs, models that assume equality or otherwise perfect correlation between proximal and future year effects of individual teachers may be inappropriate and run the risk of leading to misleading inferences about teachers. The GP model developed in this article tackles these issues head-on by generalizing existing value-added models to handle both scaling inconsistencies across repeated test scores and potential decay in the effects of past educational inputs on future test scores.
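To make the "perfect correlation" assumption concrete, here is a toy simulation. It is not the GP model from the paper, only a rough sketch of the distinction the authors draw. The function names, the number of teachers, the scaling factor `alpha`, and the correlation `rho` are all illustrative assumptions, not values from the article. In the first case a teacher's future-year effect is just a rescaled copy of the current-year (proximal) effect; in the second, the future-year effect is only partly determined by the proximal one.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_effects(n_teachers=5000, alpha=0.5, rho=0.6, seed=0):
    """Simulate proximal and future-year teacher effects (hypothetical values).

    alpha: assumed scaling of the future-year effect relative to the proximal one.
    rho:   assumed correlation between proximal and future-year effects.
    """
    rng = random.Random(seed)
    # Each teacher's effect on scores in the year they teach the student.
    proximal = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    # Perfect correlation: the future effect is a rescaled copy of the proximal effect.
    future_perfect = [alpha * t for t in proximal]
    # Imperfect correlation: the future effect mixes the proximal effect with
    # an independent teacher-specific component.
    future_general = [
        alpha * (rho * t + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        for t in proximal
    ]
    return proximal, future_perfect, future_general


proximal, future_perfect, future_general = simulate_effects()
# Correlation is essentially 1 when future effects are rescaled copies.
print(round(pearson(proximal, future_perfect), 3))
# Correlation is near rho when future effects have their own component.
print(round(pearson(proximal, future_general), 3))
```

Under the first setup, ranking teachers by proximal effect and by future-year effect gives the same ordering; under the second, the two rankings can disagree, which is why the choice of assumption matters for inferences about individual teachers.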

The results of our empirical investigations suggest that the assumption of perfect correlation between proximal and future effects of individual teachers is not entirely consistent with the data.

Source: Journal of Educational and Behavioral Statistics, 2010, Mariano, pp. 253-79