This Academic Life: Judging and grading

12.29.2004

Judging and grading

Taking a quick break from grading -- but actually, not so much of a break…and, as it turned out, not so quick.

This article from the Washington Post (free subscription required) provides yet another example of perhaps my least-favorite misuse of a philosophical term in popular discourse (right up there with the conflation of "deconstruct" and "analyze"):

[Michelle] Kwan has yet to face the sport's new computer-oriented judging system in which a panel of judges grades each element of a skater's program as it unfolds. The system made its debut more than a year ago in the aftermath of a judging scandal at the 2002 Salt Lake Olympics … It is aimed at bringing more objectivity to a sport once considered rife with corruption and cheating. … Kwan said final preparations for the Jan. 11-16 U.S. figure skating championships in Portland, Ore., her only tuneup before the world championships, have been anything but relaxed. The U.S. championships will be contested under the old scoring system, a far more subjective one in which judges rank skaters only in two broad categories on a 6.0 scale.

The key terms here are "objectivity" and "subjective." What bothers me here -- not a surprise, I'm sure -- is the somewhat bizarre notion that a judging system based on point ratings for individual performance components is somehow more "objective" than one based on two broad categories. This is a nonsensical claim, inasmuch as the judges are still evaluating the skater as she or he skates, and as such the element of personal discretion is in no way eliminated.

To make a system of evaluation "objective," all personal discretion would have to be removed, so that "the facts" alone would determine how a performance was to be scored. This follows quite simply from the very definition of "objectivity," which means -- if it means anything at all -- that the truth of a statement derives not from the personal whims and impressions of an observer (which would be "subjectivity"), but from some inherent characteristics of the object under observation. To be more precise, this is what we might call classical objectivity, where "classical" means a number of things (pre-quantum physics, pre-poststructuralism, pre-interpretivism). And it is ordinarily opposed to "subjectivity," i.e. an epistemological stance in which the truth of a statement derives from the knowing subject's personal and potentially idiosyncratic habits of thought.

So let's look at this new judging system for figure skating a bit, courtesy of the International Skating Union's official website:

2) Technical Score

When a skater/couple performs, each element of their program is assigned a base value. Base values of all recognised elements are published annually by the ISU. During the program, Judges will evaluate each element within a range of +3 to -3. This evaluation will either add to or deduct from the base value of the element. … When a skater executes an element, the Technical Specialist, monitored by the Technical Controller, will identify the element, and its respective point value will be listed on each Judge’s screen. … The Judge then grades the quality of the element within the range of +3 to -3. The sum of the base value added to the trimmed mean of the Grade of Execution (GOE) of each performed element will form the Total Element Score.
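For the arithmetically inclined, the calculation described in that passage can be sketched roughly as follows. This is only my reading of the quoted text, not the ISU's actual procedure: the excerpt does not say how the mean is trimmed, so I assume that one highest and one lowest grade are dropped, and I assume the averaged grade is added to the base value point for point. The function names and the base values in the usage lines are made up for illustration.

```python
from statistics import mean


def trimmed_mean(grades):
    """Average the grades after dropping one highest and one lowest.

    ASSUMPTION: the quoted ISU text says only "trimmed mean"; dropping
    exactly one grade from each end is a guess at what that means.
    """
    if len(grades) < 3:
        raise ValueError("need at least three grades to trim both ends")
    ordered = sorted(grades)
    return mean(ordered[1:-1])


def element_score(base_value, goe_grades):
    """Base value plus the trimmed mean of the judges' GOE grades.

    ASSUMPTION: a grade is added directly as points; the excerpt does
    not say whether grades are first converted to some other value.
    """
    for grade in goe_grades:
        if not -3 <= grade <= 3:
            raise ValueError(f"GOE grade {grade} is outside the -3..+3 range")
    return base_value + trimmed_mean(goe_grades)


def total_element_score(elements):
    """Sum element scores over (base_value, goe_grades) pairs."""
    return sum(element_score(base, grades) for base, grades in elements)


# Hypothetical program of two elements, five judges each.
program = [
    (5.0, [0, 1, 1, 1, 2]),      # judges mostly liked this one
    (3.0, [-2, -1, -1, -1, 0]),  # judges mostly did not
]
print(total_element_score(program))
```

Note what goes into the function: a list of grades, each of which is one judge's call on how well an element was executed. The arithmetic is mechanical; the inputs are not.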

Pardon me for being obtuse, but where's the "objectivity" here? What I see is a set of criteria that individual judges can apply using their (presumably expert) discretion. And this means that a skater's performance rating depends on how she or he is judged by the experts to have executed particular elements.

Well, how about in the portion of the score that replaces the old "artistry" rating -- the place where Michelle Kwan traditionally cleaned up and thus defeated skaters whose technical competence was arguably superior?

3) Program Components

In addition to the technical score, the Judges will award points on a scale from 0 to 10 with increments of .25 for the five Program Components to grade the overall presentation of the performance. These Program Components are skating skills, transitions, performance/execution, choreography and interpretation. Several factors, as detailed below, are to be taken into account when the Judges consider each component.

Hmm. Still no "objectivity" in sight. And when we start looking at the specific criteria that judges are supposed to take into account when issuing a score for a program component, the issue becomes even clearer. Let me just take two Program Components to illustrate the point:

Skating Skills include:
- Overall skating quality
- Multi-directional skating
- Speed and power
- Cleanness and sureness of edges
- Glide and flow
- Balance in ability of partners (pair skating and ice dancing)
- Unison (pair skating)
- Depth and quality of edges and ice coverage (ice dancing)
- One foot skating (ice dancing)

Choreography includes:
- Harmonious composition of the program
- Creativity and originality
- Conformity of elements, steps and movements to the music
- Originality, difficulty and variety of program pattern
- Distribution of highlights
- Utilization of space and ice surface
- Unison (pair skating)

All it looks like the ISU has done is to spell out the elements that were previously wrapped up to form "artistry" as an omnibus rating. While this is undoubtedly an improvement, inasmuch as it gives a skater a better shot at seeing what is desired from the judges instead of remaining in the dark and simply receiving scores more or less at random, it has zippo implications for the "objectivity" of the score.

Precisely the same thing can be said of "grading rubrics," a craze presently sweeping its way through academia. I'm all for spelling out course requirements and for telling students that when I grade an essay I am looking for a clear thesis, argumentative support for that thesis, an effort to refute counter-arguments, a judicious use of examples when required, etc. But I am under no illusions that this makes my grading more "objective." More precise, yes. More defensible, in that I can point to specific components where the student's performance was lacking, yes. Hence, more transparent. But in the end I, like the figure-skating judges, am still doing the same old thing: making expert judgments about how well a particular performance was executed.

Is there a way to eliminate this? It'd sure make grading go faster … what about giving solely multiple-choice exams? That would be the equivalent of replacing all baseball umpires with the QuesTec automated system for evaluating balls and strikes, I think. In so doing, it would convert a "ball" and a "strike" into purely positional claims: a ball went wide of the plate or was too high or too low, a strike managed to cross the plate at the proper height. And it would make the pitching of balls and strikes into a very mechanical affair indeed, one from which virtually every element of human discretion and judgment had been removed (or, at any rate, formalized and institutionalized into a locally stable structure of rules that restricted agency much more than is presently the case in baseball). Ditto for a multiple-choice test, which would basically eliminate the elements of individual discretion and judgment from an evaluation of a student's performance, and instead simply record a measurement of how many of the pencil-marks attributed to the movement of her or his hand ended up in the proper space on the page.

But would such a system be "objective"? Would it generate "objective" measurements of performance? I don't think so. While such a system would go well beyond the channelling of discretion exemplified in the ISU judging system or in the grading rubric on any of my syllabi, and would essentially eliminate the need for a human judge to make a determination, it would in no way eliminate discretion and judgment per se. Instead, it would hard-wire a certain set of standards into the apparatus of evaluation, and thus take the immediate need to interpret results out of the hands of the observers present at the time. But it would in no way eliminate the observer-dependence of the results thus obtained, even though the "observer" in question would now be a machine -- a machine that was implementing standards generated by ordinary processes of social transaction.
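To see what "hard-wiring a set of standards" amounts to, here is a toy machine umpire. It is emphatically not how QuesTec works, which I do not claim to know; it is a minimal sketch in which a pitch is reduced to a single point where it crosses the plate and the ball's own width is ignored. The 17-inch plate width is the real figure, but the class, the function, and the two sets of height limits are invented for illustration.

```python
from dataclasses import dataclass

PLATE_WIDTH_IN = 17.0  # width of home plate in inches


@dataclass(frozen=True)
class StrikeZone:
    """A rule-makers' decision about where strikes live.

    The height limits are hypothetical parameters: somebody has to
    choose them before the machine can call anything.
    """
    bottom_in: float
    top_in: float
    half_width_in: float = PLATE_WIDTH_IN / 2


def call_pitch(x_in, z_in, zone):
    """Return "strike" or "ball" for a pitch crossing the plate.

    x_in is the horizontal offset from the center of the plate and
    z_in the height above the ground, both in inches.
    SIMPLIFICATION: the ball is treated as a point.
    """
    in_width = abs(x_in) <= zone.half_width_in
    in_height = zone.bottom_in <= z_in <= zone.top_in
    return "strike" if in_width and in_height else "ball"


# Two committees, two zones (both made up).
generous_zone = StrikeZone(bottom_in=18.0, top_in=42.0)
stingy_zone = StrikeZone(bottom_in=20.0, top_in=40.0)

# The same pitch, over the middle of the plate and 41 inches high.
print(call_pitch(0.0, 41.0, generous_zone))
print(call_pitch(0.0, 41.0, stingy_zone))
```

The machine never hesitates and never plays favorites, but the same pitch is a strike under one committee's numbers and a ball under the other's. The judgment has not gone anywhere; it has moved into the parameters.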

To equate a procedure for taking evaluation out of the hands of particular individuals -- or even out of the hands of any particular individual -- with "objectivity" in the classical sense is a peculiar bit of philosophical sleight-of-hand that simply reinforces the popular notion that there are only two options for a knowledge-claim: either the claim is "objective," meaning true because it corresponds to some innate dispositional characteristic of the universe, or "subjective," meaning true because someone arbitrarily declared that it was true. The sleight of hand here involves the notion of a "dispositional essence," whether this involves a student's "intelligence" or a pitcher's "talent" or a skater's "skill"; if we grant that such occult essences are the targets of our evaluation techniques, then it follows that if we all agree on a standard and then step out of the way, the dispositional essence of the student/pitcher/skater will shine through. Voila, "objectivity" -- in the classical, dualist sense.

[This is how John Searle manages to sustain the argument that one can make epistemically objective statements about ontologically subjective -- i.e. socially constructed -- phenomena: as long as individual discretion is minimized or eliminated, the only other option is "objectivity."]

But do we need to make such an assumption -- and is it even helpful to do so? I am not convinced that it is. After all, figure skating and baseball, as well as final exams, are clearly social products, and in this sense to use a word like "objectivity" in its classical sense when referring to any component of them seems a bit misplaced. Social practices that have more or less firm sets of rules that constitute and govern them generate, not "objective" evaluations, but intersubjective ones; any consensuses that they generate are at least as much a product of interpretive activities between and among the participants as they are a product of anything happening "out there in the world" (wherever that might be). Any stability that we perceive in baseball or figure-skating or in a student's performance is a result of our ongoing interpretive activities, our transactions with the world, and cannot be definitively attributed to "the world" itself (whatever that might be).

This applies equally to evaluation situations involving a lot of individual discretion (old-style judging of figure skating, and most baseball umpiring situations), more structured and transparent forms of discretion (new-style judging, grading rubrics), and virtually no individual discretion (QuesTec, multiple-choice exams). None of these generate "objective" results. A student's grades, like a figure skater's scores or a baseball player's stats, provide a record of her or his performance in specific situations. No occult essences needed.

Of course, none of this means that Michelle Kwan isn't a terrific skater, or that my best students aren't fantastically intelligent and capable scholars. Of course they are -- that's how we define the terms. A .300 hitter hits .300 over the long term, and we know this because they, well, hit .300 over the long term. This doesn't explain anything; "being a .300 hitter" is as little an occult essence as "being an A student" is. ("She got an A because she's an A student" is an empty tautology.) But it does provide information that we can use to classify the person and evaluate her or his performance, and perhaps spur them to try to do better in the future. And isn't this what grading and judging are for?

¶ 12/29/2004 05:31:00 PM
