Friday, May 06, 2005

TESTING: None of the Above

It's worth looking at this piece. It speaks directly to the validity of the TAKS exam. Can you imagine how much more compromised this exam becomes when it gets translated into Spanish for English language learners? -Angela

April 24, 2005

TESTING: None of the Above
By LISA GUERNSEY

TWO years ago, fifth graders taking Texas's annual standardized science test faced this multiple-choice question: "Which two planets are closest to Earth?" The four choices were "Mercury and Saturn," "Mars and Jupiter," "Mercury and Venus" and "Venus and Mars."

Simple, right? The Texas Education Agency thought so; every fifth grader should know that Venus and Mars orbit on either side of Earth's orbit (remember the mnemonic "My Very Excellent Mother Just Sent Us Nine Pizzas"?). "Venus and Mars," therefore, would have been a good pick.

But wait, said Mark Loewe, a Dallas physicist who was curious about what students are expected to know and so took the test. The question asked which planets - not which planets' orbits - were closest to Earth. So the correct answer depends on when the question is asked.

"Mercury, which orbits closest to the Sun, is closest to Earth most often," Dr. Loewe said, and sure enough, during that test week in spring 2003, Mercury and Mars were the planets closest to Earth, according to the National Aeronautics and Space Administration Web site. That pair was not among the possible answers.

So is the question valid? Perhaps, since the problem was written for a typical 10-year-old, not someone with Dr. Loewe's understanding of science. On the other hand, the problem ignores the physical world woven into the question, and that might trip up brighter fifth graders.

Beware the perils of ambiguity. It is a mantra that is increasingly pertinent to tests in mathematics and science. The two fields might seem immune from imprecision. But in mathematics, for example, today's tests assess more than a student's ability to do "naked computation," as Cathy Seeley, president of the National Council of Teachers of Mathematics, puts it. In many places, calculators have rendered meaningless the testing of basic computational tasks. Instead, more questions test students' comprehension in real-world contexts. A triangle is a corner garden bed. A rectangular object intersected by a line is a juice box, with a straw. A sloped line on a graph represents a year's worth of payments to the power company.

With these scenarios come variables, and mathematicians and scientists from British Columbia to Boston spend much time picking apart the questions, particularly in online discussion groups. If students are asked how many seeds can be planted in the surface area of a triangular garden, do you put seeds in the corners where there isn't room for plants to take root? What about relevant considerations like seasonality of utility bills or position of the planets? Multiple-choice questions, with no place to show your work and thinking, make such realities more vexing.

"To the lay eye, it may appear that I am being picky, criticizing the minutest detail of the exam," wrote James A. Middleton, a mathematics education professor at Arizona State University, in an online critique of his state's high school exams. "I am being picky. Any first-semester student of psychometrics (the statistical study of test design, administration and analysis) could tell you that if a test is to provide reliable and valid data, its items must be designed well, reflect the standards of the content, and clearly allow students who understand the content to demonstrate that understanding."

Professor Middleton determined that a quarter of the questions he analyzed had mistakes in content or context (he has just completed an analysis of recently released questions and says there is improvement).

"It's an increasingly severe problem," says Walter M. Haney, a senior researcher at Boston College's Center for the Study of Testing, Evaluation and Educational Policy.

The process by which questions are vetted is long and costly. "On most of the tests that are created today, the people who write them and the people who review them do a conscientious and good job," says Gregory Cizek, an education professor at the University of North Carolina at Chapel Hill, who has been an elementary school teacher and a test writer for ACT Inc. But, he adds, "stuff always slips through."

Part of the challenge is writing appropriate questions for a particular grade level while not misleading a student who happens to know more. A 10th grader with a sibling in 12th grade may know some higher-level math; a 12th grader taking a physics course at a local college or online may look at a question differently than another student in the same grade.

Consider, for example, another question critiqued by Dr. Loewe on the Texas Assessment of Knowledge and Skills in 2003. Students in 11th grade were asked to calculate how much force a frog would exert against a river bank while leaping off. Dr. Loewe, who has co-written a textbook on quantum mechanics, says that when he worked out the problem, he included the frog's gravitational weight (its force at rest, computed from the acceleration due to gravity provided at the beginning of the test). But the answer key made clear that the question writer did not expect students to consider that.
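To see what the disagreement amounts to, here is a hedged sketch of the two readings, treating the leap as straight up; the frog's mass and takeoff acceleration below are invented placeholders, since the exam's actual numbers are not reproduced here.

```python
# Two readings of the frog problem, treating the leap as vertical.
# The mass and acceleration are hypothetical, not the exam's actual values.
m = 0.05   # kg, assumed frog mass
a = 20.0   # m/s^2, assumed takeoff acceleration
g = 9.8    # m/s^2, acceleration due to gravity (a formula the exam supplied)

force_key   = m * a        # answer key: only the force producing the acceleration
force_loewe = m * (a + g)  # Dr. Loewe: the legs must also support the frog's weight

print(f"Answer-key reading:   {force_key:.2f} N")
print(f"Including the weight: {force_loewe:.2f} N")
```

On these made-up numbers the two readings differ by nearly 50 percent, which is why the dispute is more than a quibble.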

Dr. Loewe found several other problems and informed the Texas Education Agency. "Texans are ill-served by such incompetence or dishonesty," he wrote. The agency responded by hiring university professors to review exam questions, says Victoria Young, director of instructional coordination in the agency's testing office. The professors review tests that relate to their areas of expertise but are also familiar with what is taught in high school.

Nonetheless, Ms. Young disputes Dr. Loewe's specific complaints. Asked whether the planets question might have led students to Dr. Loewe's answer, she responded, "Not a realistic viewpoint, in my opinion." And of the leaping frog, she says that physics educators have told her that "only if you brought a very advanced level of college physics to the table would you know enough to know that the answer could be arrived at differently."

Not so, says Dr. Haney of Boston College. Students should have known to take the frog's gravitational weight into account, and so argued a few Texans in letters to The Fort Worth Star-Telegram, which reported on the dispute. Dr. Haney says one reason for the problem is that "a lot of the people who may be writing the math and science questions may not have a deep understanding of the math and science that they are trying to test."

In most cases, the people who write the questions are or have been teachers. Often, they are paid to attend summer workshops led by companies that have contracted with the states to develop the tests.

Marilyn Rindfuss, national senior mathematics consultant at Harcourt Assessment, which creates standardized tests for dozens of states, emphasizes a process she calls "tightening up." Ms. Rindfuss asks every question writer to go through a checklist that includes such questions as "Is the fictional information realistic? (no 75-pound housecats)" and "Is there one, and only one, clearly correct answer?" She also rejects questions that ask test takers to extrapolate patterns, because some people see a pattern that a question writer does not, leading the scorer to start "counting answers wrong that are, in fact, correct."

(To get a sense of such patterns, try this question, which has been kicking around New York City high schools for decades: What comes next in this sequence: 28, 23, 18, 14, __? Read on to find the answer.)

Once questions are written, they are typically reviewed by multiple groups that include test writers, teachers, editors, statisticians and content specialists. And then most developers test the questions on real students in real exam settings. In field testing, statisticians may discover that most top-scoring students selected answer "d" when answer "c" was deemed correct. What made "d" so appealing to the advanced students? Could a flaw in the question have led them to arrive at an equally correct answer? In most cases, the incongruity is a red flag, prompting developers to discard the question.
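One way to spot that red flag is sketched below, on invented field-test records: rank students by total score, then compare the keyed answer with the choice the top quartile actually favored.

```python
# Minimal sketch of a distractor analysis; the records are invented.
from collections import Counter

# (total_score, choice_on_this_item) for each field-test student
records = [(92, "d"), (88, "d"), (85, "c"), (81, "d"), (55, "b"),
           (47, "a"), (44, "c"), (40, "b"), (35, "a"), (30, "d")]
keyed_answer = "c"

records.sort(reverse=True)                        # strongest students first
top_quartile = [choice for _, choice in records[: len(records) // 4]]
modal_choice = Counter(top_quartile).most_common(1)[0][0]

if modal_choice != keyed_answer:
    print(f"Red flag: top students favor '{modal_choice}', key says '{keyed_answer}'")
```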

INSUFFICIENT field testing was blamed in part for the controversy in New York high schools in 2003, when two-thirds of test takers failed the Math A exam, which is required for a diploma. As a result, the state Board of Regents overhauled the math curriculum, and last month announced that the state would abandon its approach of integrated mathematics in favor of the old-fashioned curriculum of algebra for freshmen, geometry for sophomores and algebra II and trigonometry for juniors. The first revised exam based on those standards will be given in 2007.

Daniel Jaye, an assistant principal at Stuyvesant High School in Manhattan, served on the panel charged with investigating the troubled exam. He recalls a telling moment during the administration of the test at his school. "One of the kids raised his hand and the proctor called me into the room," Dr. Jaye says. The student was puzzled by a question about a straw that rested diagonally in a rectangular box, 3 by 4 by 8 inches. The question asked for the length of the straw to the nearest 10th of an inch. The answer, according to the Board of Regents, was 9.4 inches.
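The Regents' 9.4 comes from treating the straw as a zero-width line segment and taking the box's space diagonal:

```python
# The intended computation: the space diagonal of a 3 x 4 x 8 box,
# with the straw idealized as a zero-width line segment.
import math

length = math.sqrt(3**2 + 4**2 + 8**2)   # sqrt(89)
print(f"{length:.1f} inches")            # prints 9.4
```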

But, the student asserted, there was not enough information to answer the question correctly. If the question asked about the length of a line, he figured he could solve the problem. But because it asked about the length of a straw, he needed the radius of the straw to determine where it would touch the corner of the juice box.

"How am I expected to come up with an accurate answer?" the student asked.

"I started laughing because he was right," Dr. Jaye recalls. "I said, 'I don't know. I really don't know.' "

Jerry P. Becker, a professor of curriculum and instruction at Southern Illinois University, wrote this in an online discussion about ambiguous math questions: "We're talking about kids who are desperately trying to penetrate the minds of adults and figure out what is 'really' being asked and what 'the trick' is."

Professor Becker particularly dislikes multiple-choice questions. They are inexpensive to score, because they can be run through machines, but they are built around plausible incorrect choices designed to lure students astray, he says. Worse yet, many educators and scientists say, the multiple-choice format does not allow for creative, unexpected solutions.

Which brings us to the pattern question posed earlier: What comes next after 28, 23, 18 and 14? Actually, it's a math teacher's idea of a joke.

The answer is Christopher Street, the next stop downtown on the 1 and 9 lines in the New York City subway system.


Lisa Guernsey contributes articles on education and technology for The Times.


Copyright 2005 The New York Times

http://www.nytimes.com/2005/04/24/education/edlife/guernsey24.html?ex=1115524800&en=09c10f265b40574f&ei=5070&oref=login
