Value-Added Accountability Requires Context
by Jackie Bennett
For every human problem, there is a neat, simple solution; and it is always wrong.
H. L. Mencken
Before Obama’s announcement of Arne Duncan as his choice for Secretary of Education, newspapers and blogs were full of accusations that teacher unions obstruct “real” education reform. The most notable criticism came from David Brooks of the New York Times, but there were others as well. Scratch the surface of these criticisms and they generally boil down to one issue: whether or not administrations should be able to fire and reward teachers based on how well their students do on standardized tests. Education reductionists say they should. The rest of the education world (and not just unions by a long shot) says, “Great sound bite. Tempting. But not a good idea.”
So, why is that?
First, a quick primer on how test-score accountability works: in order to determine whether or not a teacher is improving her students’ test scores, statisticians have created value-added models (VAs). These models begin with each student’s raw progress from one exam to the next, and then try to weight that number by factoring in some of the things teachers can’t control, such as the level at which the student started, whether or not he lives in poverty, and how large his class was. Theoretically, factoring in these aspects of student achievement should level the playing field so that teachers can be fairly compared, and then rewarded or terminated based on the result.
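To make the primer concrete, here is a minimal sketch in Python of the kind of calculation it describes. It is not the DoE's model or any published VA formula. It is a toy under stated assumptions: a plain least-squares fit of each student's raw gain on three things the teacher can't control (starting score, a poverty flag, class size), with a teacher's "value added" taken as the average amount by which her students beat or miss the predicted gain. The function names, the field names, and the eight made-up students are all hypothetical.

```python
from collections import defaultdict


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def value_added(students):
    """Toy value-added score: mean (actual gain - predicted gain) per teacher.

    Assumption: predicted gain comes from ordinary least squares on
    starting score, poverty flag, and class size. Real models differ.
    """
    # One row per student: intercept plus the factors outside the teacher's control.
    rows = [[1.0, s["prior"], s["poverty"], s["class_size"]] for s in students]
    gains = [s["current"] - s["prior"] for s in students]  # raw progress

    # Normal equations (X'X) coef = X'y for the least-squares fit.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * g for r, g in zip(rows, gains)) for i in range(k)]
    coef = solve(xtx, xty)

    # Group each student's residual (actual minus predicted gain) by teacher.
    residuals = defaultdict(list)
    for s, r, g in zip(students, rows, gains):
        predicted = sum(c * x for c, x in zip(coef, r))
        residuals[s["teacher"]].append(g - predicted)
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Hypothetical data: two teachers whose students have identical profiles
# (starting score, poverty flag, class size) but different raw gains.
profiles = [(50, 0, 20), (65, 1, 20), (70, 0, 30), (75, 1, 30)]
students = []
for teacher, gain in (("A", 12), ("B", 8)):
    for prior, poverty, class_size in profiles:
        students.append({
            "teacher": teacher,
            "prior": prior,
            "current": prior + gain,
            "poverty": poverty,
            "class_size": class_size,
        })

scores = value_added(students)
```

In this made-up data the fit predicts the same gain for every student, so teacher A comes out about 2 points above expectation and teacher B about 2 below. Notice what the sketch leaves out: everything about these classrooms that isn't one of its three columns.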
That’s the theory, and theoretically at least it sounds like a good idea. Granted, the formulas are extremely limited in what they can capture (compare a portrait by Cézanne to Colorforms to get some idea of their reductionism). Granted too that their limits make them statistically unreliable measures of the impact of teachers on their students’ scores. Still, if we keep that in mind and understand VA as just one sliver of flawed information that must be viewed in context with all the other flawed information we have about a particular teacher and her students, then VA might be helpful for teachers themselves, and the schools in which they work.
But that’s the difficulty. Given its limits, context is everything with VA, and yet the scientific feel of numbers has a tendency to overwhelm all other information and leave context far behind. With VA, even when we intellectually know we must consider context, we tend to ignore it, and the teacher and her teaching become a Colorform.
Is this obstructionism, the notion that VA must be read in context? Absolutely not. In fact, in the education world, it’s almost universally accepted. Here in NYC, for example, Joel Klein’s DoE (an institution with a love of numbers to rival Wall Street’s) cautions principals that VA results can be uncertain because of personal life-changes for students, other learning experiences like tutors and help at home, and the measurement error inherent in state exams. VA, like any other measure, cannot give us “the full story,” says the DoE. Rather “… the various pieces come together to create a more reliable picture.” The DoE tells principals to look at VA data in light of other things they know about a teacher from classroom observations, the quality of students’ day-to-day work, and the quality of lesson planning.
And context doesn’t end there. If many teachers in a school seem to have underperformed, for example, the real problem may lie with the poor curriculum or pedagogy imposed on teachers by administrators. Or maybe the class is on the noisy side of the building. And since research shows that regular attendance affects student achievement, the administrator who has planned too many assemblies may have only herself to blame for the results.
But this is not about shifting blame, and it’s not about excuses. Rather it is about context, and context can work just as much to confirm a bad report as to contradict it or shift the blame to someone else. For example, consider a teacher who has seen little progress in day-to-day work with her top students. When her VA report arrives, it might support other evidence from her class that indicates she’s not reaching those kinds of kids. In context, the report makes sense to her and her principal. Then, she might focus on what changes she can make.
Ultimately, context entails asking ourselves two questions about VA data. First, does the information seem accurate? (Does it truly reflect how well Ms. Jones’s students did?) Second, if it does seem accurate, then what caused those results? (Something else? Or Ms. Jones?)
How important is context? Consider the November 2008 study by Dan Goldhaber and Michael Hansen in which the researchers followed VA data on teachers in North Carolina over several years. Comparing the teachers’ scores from their years before tenure to their scores from later on, the researchers found that the VA formulas yielded troublesome variations over time. Some of the teachers whom VA judged to be bad at teaching reading (in the lowest quintile) before they got tenure were judged by VA after tenure to be among the best. Specifically, 11% wound up in the top quintile (80th – 100th percentile) after they got tenure. An additional 16% wound up in the quintile just below that.
Goldhaber and Hansen suggest these swings say more about the instability of VA formulas than they do about the teachers, and they probably do. But whatever the cause, they are a good illustration of why VA information can’t be used for high-stakes decisions like termination. North Carolina does not “sort and fire” teachers based upon VA data, but if it did, it might wind up firing some of its best teachers. And in Dallas, where teachers apparently have been terminated based upon VA, one can’t help wondering if the terminators, in their zeal for simple solutions to complicated issues, got it right.
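For readers who want to see how instability alone could shuffle quintiles, here is a small simulation sketch in Python. It is emphatically not Goldhaber and Hansen's method or data, and it makes no attempt to reproduce their 11% and 16% figures. It simply assumes each hypothetical teacher has a fixed "true" effect and that every VA estimate is that effect plus random measurement noise. The function name, the noise level, and the number of teachers are arbitrary assumptions.

```python
import random


def quintile_churn(n_teachers=1000, noise_sd=1.0, seed=0):
    """Share of period-1 bottom-quintile teachers in each period-2 quintile.

    Assumption: a teacher's true effect never changes; only the noise in
    the estimate differs between the two periods.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Same true effect in both periods, with fresh noise each time.
    est1 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    est2 = [t + rng.gauss(0, noise_sd) for t in true_effect]

    def quintiles(scores):
        # Rank teachers by score, then cut the ranking into five equal bands.
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        q = [0] * len(scores)
        for rank, i in enumerate(order):
            q[i] = rank * 5 // len(scores)  # 0 = lowest quintile, 4 = highest
        return q

    q1, q2 = quintiles(est1), quintiles(est2)
    bottom = [i for i in range(n_teachers) if q1[i] == 0]
    return [sum(1 for i in bottom if q2[i] == k) / len(bottom) for k in range(5)]


# With no noise, nobody leaves the bottom quintile.
stable = quintile_churn(noise_sd=0.0)
# With noise, some bottom-quintile teachers drift upward without changing at all.
noisy = quintile_churn(noise_sd=1.0)
```

Each returned list runs from the lowest quintile to the highest. Comparing `stable` with `noisy` shows the point in miniature: in the noisy run, teachers change quintiles even though, by construction, not one of them taught any differently.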
So. Context matters, and pretty much everyone knows that. And — here’s the problem — everyone promptly forgets. As I’ve worked with VA formulas over the past few months, I have been appalled at how often well-intentioned people speak of context, and then forget about it. That’s because VA is all about numbers, and numbers have a way of looking more serious than other information, and more true. We observe a teacher surrounded by children who love him; we applaud the profoundly thoughtful assignments he gives, and the profoundly thoughtful way he grades them; we observe the deft pacing of his lesson, the edge-of-the-seat engagement of his kids; the laughter that erupts from his room occasionally; we overhear children in the lunchroom discussing what he taught that day; we overhear the comment of a parent (My son never read a single book until the year he spent with Mr. X). And then next October, when the VA numbers are released, all the rest is forgotten, and all anyone sees are the numbers on the sheet.
Speaking of Wall Street’s derivatives, Warren Buffett famously warned, “Beware of geeks bearing formulas.” Partly Buffett meant that you really have to look at context to understand the value of a stock. The same can be said of teachers, but education has no Warren Buffett. It’s got us.
Here in NYC, we are safe from the abuses of VA. But teachers and their students elsewhere are vulnerable. That may change as more educators work with VA and its flaws become more apparent. But we aren’t quite there yet.
And yes, I did just write that students as well as teachers are vulnerable to out-of-context abuses. VA in isolation sends schools down the wrong path for improving our students’ education by distracting us from the educational mission, focusing on the wrong teachers, and creating a culture in which a teacher’s livelihood will be determined by the answers her students have placed on a multiple-choice test.
If VA is read in context, that won’t happen.
But just try keeping context in mind.
 This information is taken from Introduction to NYC Teacher Data Initiative, a DoE PowerPoint that is not online.
 A landmark study recently cited in the New York Times showed how the reading levels of children on the noisy side of a school fell several months behind the reading levels of their schoolmates on the quiet side.
 The DoE and UFT intelligently worked together to stop the use of VA in evaluation. NY students take tests mid-year, so that in the formulas the contributions of two teachers become tangled.