
Monday, April 3, 2017

Maybe the signal is the noise: on U.S. standardized tests, from best to awfulest

I have three part-time teaching jobs, one of which, for Princeton Review, involves preparing students for standardized tests. Friday night, the Princeton Review job also involved staying awake the entire night and taking three Advanced Placement tests in a row (because I hadn't been able to find any other free time to take them), and reflecting on how exceptionally well-designed they are, compared to the highly dubious tests I often help with.

Since teaching is one of my passions, and tests are a key force shaping (or mis-shaping) teaching in the United States, I'll share some of my opinions on the tests I'm familiar with. Ranked from best to worst:

1. Advanced Placement Exams (high school). The best standardized tests I know, and the ideal to which the others should aspire. For one thing, half of the grade in each test I took Friday night (US History, World History, and Statistics) involves writing -- and writing that's harder than usual to bullshit.

The statistics writing means that you have to be able to explain how you're analyzing an experiment, why you're choosing the approach you're choosing, and what the results of calculations you perform mean. The history writing involves two sections. The first requires you to read, process, and form patterns out of 8 to 10 primary documents in order to create a thesis on the spot that incorporates all of them. Admittedly the thesis itself will be bullshit (because you can't conclude anything fair from having a stranger hand-select documents for you), but the reading and pattern-finding skills with which you weave the nonsense have to be real. The other section involves writing an original essay on, for example, changes and constants in Middle Eastern nationalism from 1920 to the present, or an analysis of Andrew Jackson's presidency in terms of his claimed devotion to the "common man" (common men of his era being apparently all white). If you can handle a topic like that on the spot, well, odds are you know some stuff about history.

Even the multiple-choice 50% of the test is nicely done, I think. I found that, most of the time, the wrong answers are just plausible enough to fool someone who doesn't know the subject, but clearly enough wrong not to fool someone who's pretty well informed. For example, one question presents a famous quote about suburban housewives (one I recognize from *The Feminine Mystique*) and asks who it's from: the wrong choices include Angelina Grimke, Susan B. Anthony, and Phyllis Schlafly. All of them are known for having opinions about feminism! But the first two came around long before suburban housewives were a large and recognized class, and Phyllis Schlafly, who did have opinions on Betty Friedan's favorite topics, had rather, well, *contrasting* views. (Not that she was a passive suburban homemaker herself, mind you.) This is an ideal range of answers, in my opinion: one that rewards knowledge without demanding superhuman precision.

Basically, to study for an AP exam, you need to (1) learn to write well and (2) learn the subject. Great goals. I approve. If it turns out -- karma may not like my sleep pattern -- that I scored less than 97th percentile on any of these, I reserve the right to change my opinion. Whether that will reflect on the test, or on me, will be an exercise for the reader.

2. The Medical College Admission Test (college/ postcollege). 80% of this test is incredibly dull, and requires vast feats of memorization. In contrast to my usual teaching attitude, I'm basically okay with that. Doctors *should* have achieved vast feats of memorization, because human bodies are incredibly complicated, and symptoms vary tremendously. It's nice if the doctor's memory helps her zero in quickly on plausible theories about the patient's issues and needs.

I don't teach that 80%. I teach the fifth section, known as Critical Analysis and Reasoning Skills (CARS). CARS is pretty much like its equivalent sections on the SAT and ACT and GRE: you read passages, and answer multiple-choice questions about the passage. Questions involve anything from simple reading-comprehension details; to the central theme of the passage; to the structural purpose of a paragraph in the passage's context; to the localized meaning of a word or phrase; to what out-of-subject analogy might be made to a process described in the passage. One of the five choices is always supposed to be correct, and there are always specific reasons why the other four answers are wrong. Those reasons are often nefariously subtle, hidden by deliberately tricky language.

On the whole, I've come to the conclusions that

(a) To answer these questions, you do need to learn reasonably valuable skills. It is possible to think like the test-maker, and learning to do so *does* make you a better reader. It makes you more aware of structure and intent -- and therefore better able to start searching for bias, for deception, or (on the other hand) for the pleasure of a well-phrased and poetic truth, or an amusing detail.

(b) There is little good educational purpose in making so many of the wrong answers *close to* being right. Certainly, I think a few nasty tricks scattered here and there are fine: reward the very best detail-mongers by letting them have a few extra questions right on a long test. But given the deliberately tight time constraints for readers, I think CARS/ GRE/ SAT/ ACT make it too easy for a reader who did a good job understanding the passage quickly to, nonetheless, bomb the questions.

(c) Related to (b), this makes the CARS test unfairly biased towards people with the money and time to hire, say, a delightful Princeton Review tutor such as myself. Much of the separation between the good readers who get a great score, and the good readers who get a poor one, will come down to "the great score goes to she who recognized the vicious little snares that get littered across the test". Which often means "she who had a chance to fail, and be corrected, a lot of times in advance".

(d) Of the four tests with these reading passages, I've found the MCAT CARS the most fun to teach, which I think indicates its passages may be the least unfair. They're at least the ones most likely to be interesting to read, and their difficulty-of-reading -- the passages themselves, I mean -- seems a fair compromise between "grad school academese is hard" and "the student has 11 minutes to read this and answer questions".

At any rate, I'm ranking the MCAT #2 for the combination of CARS -- which teaches genuine life-enhancing reading skills, if only you can afford the preparation classes, which maybe you can't -- plus the dull and overwhelming persistence of the other 80% of the test. Together they seem like a highly relevant, if unfair, preparation for the very-unfair life of a medical student -- and the even deadlier life of a resident afterward.

3. The ACT (high school). The ACT and SAT are very similar tests. They both involve

(A) Reading comprehension with all its trap questions;
(B) Grammar questions that also involve sentences from a continuous passage, and are intermixed with word-usage questions and how-better-to-structure-this-passage questions;
(C) Mathematics questions that focus on algebra, plane geometry, and word-problem applications of arithmetic, including a lot of multi-step problems that involve multiple skills.

(C) is a problem. Not because of the questions, which are often very nicely designed, but because math skills shouldn't be measured by multiple choice. A student can learn how to use process of elimination to get right answers on questions they haven't the foggiest idea how to solve. This, in turn, rewards schools (or Princeton Review employees) for teaching process of elimination, instead of math.

(D is for Difference) The ACT and SAT both involve a writing section that gets its own, separate score, not figured into your Verbal score. Here is one way I prefer the ACT to the SAT: its writing prompts are a far better test of what the student knows about the world, how curious she is, and how quickly she can think. The ACT prompt outlines some ongoing historical change in the world; asks you to think about how it's affecting our world and will affect the near future; offers you three real-life perspectives to consider; asks you to write an essay that thoughtfully acknowledges all three; and then asks you to announce and defend your own answer, whether it's one of the three or something else.

I'm not saying it's fun for the student -- it's pressured and unpleasant -- but I think it's very good practice. It demands listening and empathy skills, as well as the ability to make and explain a decision. These are skills schools should be teaching their high schoolers anyway, so using the test to incentivize those lessons is a good idea. I'm not saying students *are* taught these skills. But if they aren't -- which I suspect is often the case when it comes to "acknowledging and synthesizing multiple opinions" -- at least it's despite the ACT designers' best efforts.

(E) Additionally, the ACT offers a "Science" section, which requires no knowledge of science, but good skills interpreting charts and graphs. Chart and graph interpretation is, again, something the schools should try to teach. Credit to the ACT for doing so.

4. The GRE (college/ post-college). The GRE is like the ACT in terms of question types (A), (B), and (C). This is automatically a way it ranks behind the ACT, in terms of fairness. The ACT is administered to students who have, relatively recently, been doing a lot of algebra, plane geometry, and multi-step word problems involving clever manipulation of arithmetic. The GRE is administered to people who often haven't done those forms of math in years.

To be clear about what I mean: I've taught two GRE prep classes dominated by engineering and science majors -- people who know how to think mathematically. But did they remember how to do 9th grade geometry or algebra? No, because that's not the kind of math their professions-in-progress required. A test showing them to be bad at math is a severely problematic test, I suspect.

The GRE writing section gives you very abstract claims about the world, and asks you to discuss them at length. It's decent prep for being a motivational speaker, I guess.

I ended up ranking the GRE as high as 4th because it contains a unique section that I'd love to see reproduced across the other tests. In it, students are presented with a series of rather brief arguments, each of which is loaded with logical fallacies. Their task is to rip those arguments apart: to identify the shaky assumptions on which they are based, the nonsensical leaps of logic, the kinds of evidence being ignored -- and, sometimes, the ways the arguments could be strengthened.

Of all of the units I teach for Princeton Review (reminding you that I'm not certified to teach Advanced Placement yet), this is the one where I feel best about the life skills I am passing on. We live in a capitalist society with endless media sources: terrible arguments are the cultural air we breathe. They deserve a scorn they too rarely receive, and while other things about the GRE are bad ideas, its help here is valuable.

5. ALEKS math curriculum (K-12). Arguably ALEKS doesn't belong in this article, as -- while it can be used for graded testing if a teacher wants -- the bulk of it is tasks the student needs to score 100% on. It feels test-like in other ways, though, and there's something to be said for a test that won't just dismiss you with a bad score and give up on you forever.

Given my own druthers, I probably wouldn't have chosen to assign my middle school math classes computer-based homework: the ALEKS system is something my school buys subscriptions for and asks me to use as a supplement. It's not bad, though: I'd say 1/4 of my students find it really helpful, 1/4 hate it, and 1/2 are alright with it. It's a self-paced set of lessons that I assign to match what I'm teaching over the course of a trimester. The students answer a series of questions on a topic, get credit only if their answer is exactly right, and are shown what to do if they're wrong, so they can try again. Eventually they have to get several questions right on every mini-topic, for 100 to 200 mini-topics over the trimester.

I have mixed feelings. It's boring, and unlike my teaching, it makes no effort not to be boring. Unlike my teaching, it gives no partial credit for a good process that's botched at the last moment into a wrong answer -- which can be frustrating for a student who's almost but not quite mastered something. I think that the more alienating math is, the harder it is for a student to master it.

On the other hand, I *do* set it to duplicate what I'm teaching, so it's extra practice. And there's something to be said for the combination of "the teacher sympathizes with my efforts" and "but the computer's a dick". Good cop/ bad cop is a thing for a reason: it's useful! As long as my students don't have to feel they're in jail.

6. The SAT (high school). See above: it's the ACT without the ACT's best features. Questions of type (A), (B), and (C) all involve valid life skills, which is good. But because they're often written in strange-looking ways, and because so many of the wrong answers are designed to be traps, it's often useful to suspend lessons on "here is how to be a better reader and math user" in favor of "here is what those College Board jackasses do because they hate you". And "here is how to identify wrong answers without even understanding the question".

As for the writing section, it's entirely about analyzing rhetoric -- what specific tactics a speaker (in some really well-written passage) is using, to what effect. It could be like the GRE's logical fallacies section, but it's not: the tone is entirely admiring, and the arguments being "analyzed" are ones you're supposed to approve of.

For most students, that's going to be an unfamiliar and weird task, thus making the test even more a measure of "Did you hire a tutor? Was he good at his job?" It's also deadening to creativity: in any given speech, there are going to be a certain number of tactics that the speaker is objectively using, so the task is nothing more than to find them, like a Where's Waldo puzzle in which there's ten of him hiding but nobody bothers to tell you what he looks like first.

7. Pearson MyMathLab (high school/ college). MyMathLab is more like ALEKS, but with elements of the other tests: a computer-based math curriculum in which you have to get 100% on a lot of pre-tests before also compiling evaluative scores -- which you can indeed fail -- on regular tests. I've done individual tutoring for students required to use MyMathLab. It is awful.

Even at best, it is dull, like ALEKS. Its explanations are worse than ALEKS's, too, although there's a limit to how good any silent visual explanation of a problem, with no human contact, can be.

It is often *not* at its best, and when it's not, it can stall progress helplessly. Example: in the mortgage-calculation section, many multi-step problems require you to get totals in the thousands of dollars correct to the nearest cent. The problem is that sometimes the later steps expect you to reuse the rounded answers you gave in the early steps -- which means that if you carry the unrounded figures forward instead, your final totals come out wrong by a few cents. Nowhere does MyMathLab tell you that; I had to figure it out myself for a student. As a special bonus, even once you understand that, its answers are still not guaranteed to match the calculations Google Spreadsheets gets. And if you miss too many problems (by three cents each), you have to do the test over.
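To make the trap concrete, here's a minimal sketch of the arithmetic. The loan figures are my own invention, and this is my reconstruction of the behavior, not MyMathLab's actual internals:

```python
# Sketch: how rounding an intermediate step shifts a multi-step total
# by whole dollars and cents. All loan figures below are hypothetical.

principal = 185_000.00   # hypothetical loan amount
annual_rate = 0.045      # hypothetical 4.5% annual rate
n_payments = 360         # 30 years of monthly payments

# Standard fixed-payment formula: M = P*r*(1+r)^n / ((1+r)^n - 1)
r = annual_rate / 12
monthly = principal * r * (1 + r) ** n_payments / ((1 + r) ** n_payments - 1)

total_full = monthly * n_payments            # carrying full precision forward
total_rounded = round(monthly, 2) * n_payments  # reusing the rounded payment first

print(f"payment, full precision:        {monthly:.6f}")
print(f"total, carrying full precision: {total_full:.2f}")
print(f"total, rounding payment first:  {total_rounded:.2f}")
print(f"difference:                     {abs(total_full - total_rounded):.2f}")
```

Both totals are defensible answers; the software simply accepts only the one built from the rounded intermediate, and never says so.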

Another example: in a section involving gambling and probability, you find in step one that, say, a certain bet has an expected value of negative eight dollars. In the next step, you're asked what negative eight dollars represents: if you say "expected profit", you're wrong. If you say "expected loss", you're right -- even though a negative loss would be a profit. Nowhere is this explained either.
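Here's a toy version of that calculation. The bet itself -- a $10 stake with a 10% chance of doubling -- is my invention, chosen only so the numbers come out to negative eight dollars:

```python
# Toy expected-value calculation; the bet is invented for illustration.
p_win = 0.10          # assumed 10% chance of winning
net_if_win = 10.00    # win: stake returned plus $10
net_if_lose = -10.00  # lose: the $10 stake is gone

ev = p_win * net_if_win + (1 - p_win) * net_if_lose
print(f"expected value: ${ev:.2f}")      # -> expected value: $-8.00

# The software's convention: report |ev| as an "expected loss" when ev < 0.
# Strictly speaking, a *negative* loss would be a gain -- the inconsistency
# complained about above.
if ev < 0:
    print(f"expected loss: ${-ev:.2f}")  # the answer MyMathLab wants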

Another: I also had to help a student who was being called wrong on a calculation where the number he entered was correct, and the number cited as the answer was wrong. There was nothing ambiguous: my student was right and the machine blew it. This is understandable in itself; mistakes happen. But the need for a person to record 100% marks on a pretest where even the computer can't score 100%? That is *not* understandable.

Besides which, there was a problem where a student was asked whether it was better to accept a $1200 bill in payment, or a gambling opportunity with an expected value (once you did the math) of $333.33. The obviously correct answer is "take the gambling opportunity, because everyone will recognize a $1200 bill is a forgery". This is not what MyMathLab wants from you.

It is not literally impossible to learn from MyMathLab. But it does little to teach a student, much to frustrate him, and allows the human teacher to assume her job is being done elsewhere. It is the worst of testing, compressed so densely that little else can squeeze in.

8. North Carolina's end-of-grade math evaluation for the No Child Left Behind Act. (At least the one I'm used to, as this is in potential flux.) At first it seems typical: a bunch of different problems, multiple choice. Maybe you solve them by doing the math, maybe you solve them by recognizing trap answers (or by finding workarounds that only require doing part of the math). None of that's good. Also, there's essentially nothing in the way of multi-step problems that make a student think beyond "what's the formula here?" -- already worse than the other math tests mentioned.

What makes (made?) the NCLB tests especially dreadful, though, is that they are pass/ fail -- and the number you need to get right, although not something I'm supposed to tell you, is embarrassingly low.

The result is a test designed to encourage teachers to try desperately to help the worst students acquire some minimal knowledge of math, just enough to get by. Maybe the principle isn't 100% wrong; certainly at my middle school, I work hard to get everyone caught up, and if a student bombs a test, I allow them to take a second, similar test and average the old and new grades in place of the old one. But I'm also pushing students along to think creatively, to master new ideas once they've got the old ones, even if my attention's slightly divided.

Whereas the literal incentive the NCLB test creates is to flat-out ignore everyone who's doing a pretty decent job -- since there's no reason to do more. What that means will vary, of course: at my current school, no one's ever seemed even slightly bothered by the test's existence. Then again, at the rural high school where I had my first math teaching job, the principal obsessively micromanaged me in an effort to focus only on the test's pass rate. It made me miserable -- I left classroom teaching for the next 11 years. And judging by the school's NCLB rating, I don't even think it did anything for its own nominal goal.
 
**********
Tests have value to the degree that they incentivize the things we ought to want from our students: knowledge. Pattern recognition. Logical organization of ideas. Empathy. Creativity. Failing that, they should at least incentivize the things so many of us want instead from our students: enough understanding to follow rote procedures that will be useful, we hope, after they've left school.

Some tests are designed for the first. Some tests are designed for the second. Some tests are designed, best I can tell, to let a bunch of underpaid, anonymous, overeducated hacks feel like at least someone else's life is, for a few hours at a time, crappier than theirs. All of these tests are allowed roughly equal influence in our society. That seems to me like something important to fix.
