Wednesday, October 3, 2012

Assessing Multiple Measures

If you follow education news at all these days, it’s hard to avoid hearing about “multiple measures” - usually in the context of developing teacher evaluation tools, but sometimes as another way of assessing student “achievement” (a discussion for another day).  There appears to be a consensus that multiple measures are a good thing, but why?

The idea seems reasonable to most people, but there’s good reason to examine it more closely.

The argument for multiple measures - which, obviously and importantly, are far more time-consuming and expensive to produce than, say, a single standardized test - is that no one has confidence that any single measure will accurately capture what we’re trying to “assess”; in this case, teacher “effectiveness”.

It would be one thing if each of the measures in the current PDE proposal addressed a particular aspect of so-called “effectiveness”. If that were the case, you would have a potentially useful way of determining that a teacher is strong in one area, but less so in another. But no one is saying that. Instead, PDE appears to be engaged in wishful thinking: that the shortcomings of one measurement tool will somehow cancel out the shortcomings of another. I should note that the law of GIGO (garbage-in, garbage-out) has not been repealed.

Multiple measures are of no value if the individual measures don't measure anything useful!

The fallacy of this thinking is demonstrated by the fact that where similar “multiple measures” have been used in pilot studies, it is not at all uncommon for an individual teacher to move forty percentile points up or down the scale from one year to the next. If you’re a teacher whose job is on the line, that’s somewhat disconcerting.  Not good for morale.

In addition to standardized student test scores (never designed or validated for the purpose of evaluating teachers, and which, at best, measure only a tiny sliver of what we’d like students to know), PDE has also proposed including comprehensive principal observations (still in the early stages of development), as well as building-level data such as attendance and graduation rates. (!!)

I’ve already discussed the absurdity of the latter, so clearly ‘principal observations’ have the most potential. But the occasional “drive-by” observation – the traditional approach – is what everyone is complaining about, which suggests that if we’re going to do this right, principals would have to spend a lot more time in the classroom. Where’s that time going to come from? And even if it were possible to adequately train every principal in the country, you could never completely eliminate subjectivity (the reason for multiple measures) or the potential for abuse (although I’m sure that never happens). So you really do need multiple measures.

But wouldn’t it be nice if there were a set of such measures that produced useful results (i.e., helped teachers to improve their practice) and wasn’t prohibitively expensive? Here’s a suggestion, courtesy of Ilana Garon:
  1. Professional observations (i.e., principals)
  2. Peer-to-peer observations (other teachers)
  3. Teaching portfolios
  4. Student work
  5. Ask the kids (who provide surprisingly reliable and useful information)

Montgomery County has successfully developed a teaching evaluation system based on similar principles. A huge benefit of such an approach is that it avoids the punitive mind-set that is so counter-productive to - well, teacher effectiveness! Having strong teacher participation produces a sense of ownership and helps ensure that the process works. I’m waiting to hear a good reason why it wouldn’t.
