Berliner: Why Rising Test Scores May Not Mean Increased Learning
The Answer Sheet asked prominent researcher and educational psychologist David Berliner of Arizona State University to explain why rising standardized test scores may not mean that students have learned more. Here is his post:
By David Berliner
A rise in test scores leads most people to believe good things are happening in their schools. Not unreasonably, politicians and parents alike infer that students have learned more when test scores go up. But since the No Child Left Behind (NCLB) law was passed, that inference may be unwarranted. Sadly, there are numerous reasons why rising test scores may not be related to increases in student learning.
1) Virtually all states have changed the passing score on tests so that more children are classified as proficient.
NCLB is bizarre, since it requires that 100 percent of the children at a school be “proficient” in reading and mathematics by 2014. But at what score is one proficient?
Choosing the score for “passing” a test, or for being labeled “proficient,” “basic” or “exemplary,” is a value-laden decision. It has little to do with statistics and measurement; it is, instead, a political decision.
Think of it this way: If your students, teachers, and schools were faced with being labeled, inevitably and inescapably, as failures under a law whose requirements are impossible to meet, but could instead be seen as successful by lowering the score at which students are classified as proficient, wouldn’t you be tempted to do that? Almost every state has done exactly that.
By fiddling with passing scores, states declare more children proficient. But an increasing percentage of students declared proficient does not necessarily mean that students’ scores have actually risen, or that greater learning has taken place. It may only mean that more students are classified as proficient because the score needed to be classified “proficient” was lowered.
2) School districts across the nation engage in excessive, perhaps unethical, and, in some cases, illegal test preparation. This results in higher test scores, but not necessarily greater learning.
NCLB has resulted in large increases in test preparation activities to ensure that student scores go up, as required by the law. But these tests were built to assess learning under “normal” conditions, not conditions in which students are drilled daily on tasks known to be similar to those on their state test. Normal conditions do not mean daily or weekly testing with exams suspiciously similar to those used by the state.
Data from all over the country suggest that it is not uncommon for 20 to 60 school days per year to be spent on test preparation. Children can certainly be trained to answer questions in a certain way if they are drilled enough on items like those that will appear on their test. In that way, test scores can be made to increase. But that is not due to education; it is due to training. Under such circumstances it is not clear that authentic learning has taken place.
3) Familiarity with the objectives and the items on a test invariably results in increased test scores.
Teachers and administrators are not fools. If a test is given every year with the same objectives, built to the same curriculum standards, and using many of the same items from one administration to the next, teachers and administrators come to know what will be on the test.
Unless the testing company employed by the state is willing to change items frequently, test scores will inevitably rise every year. Test score increases may be due to teachers’ knowledge of what is on the test, or to students really having learned more than the year before, but we do not know which explanation is correct.
It is expensive to change test designs and items frequently, so that usually is not done. The inevitable result is that a larger and larger percentage of the children in a state score above the average obtained the first time the test was given.
The result is called the “Lake Wobegon Effect,” named after Garrison Keillor’s fictional Lake Wobegon, where “all the women are strong, all the men are good looking, and all the children are above average.”
4) The test items we use do not tap the knowledge we really want to assess.
Most NCLB tests consist of many multiple-choice items, rather than essays or other techniques for the assessment of complex performance.
The reason for that is simple: multiple-choice items are cheap to produce and cheap to score. Also, including many such items in a single testing session makes a test more reliable. That is, the scores obtained are more dependable as indicators of whatever is being measured. In contrast, because essay items take so long to answer, a test session can include only one or a few of them.
Because of the small number of essay items on a test, essay exams are usually not reliable indicators of learning. Unreliable tests should never be used to make decisions about students, teachers, or schools.
So we don’t see much in the way of essay testing. Furthermore, essay test items are much more expensive and time-consuming to score. It boils down to this: Extensive reliance on multiple-choice questions makes it harder to be sure that deeper, more complex learning has taken place.
5) Afraid they could be fired or their schools closed because of NCLB test scores, district and school administrators invent ways to prevent the poorest performing students from taking tests.
Students are dropped or pushed out of school, certain students are suspended, and some students are moved to other schools mid-year so their scores will not be counted.
For example, because they were expected not to do well, more than 500 students from high schools in Birmingham, Ala., were dropped just before state testing. And in New York City, political leaders had to apologize for policies that pushed thousands of academically weak students out of the schools. Pushing out the weakest students raises district or school scores.
Meanwhile, high-performing students are pushed to retake tests they have already passed, a second and even a third time, so that the average scores of a school or district will go up. Manipulation of test scores makes interpretation of those scores problematic. Untrustworthy data means we do not know if higher scores really equal greater learning.
Furthermore, the pressure that NCLB puts on teachers transforms the relationship between teachers and students from a caring one into an instrumental one. With high-stakes testing children are too frequently seen positively only if they can increase school test scores, and they are too often seen negatively if they cannot. A child’s worth has become their test score.
6) It is common for scores to go up because of cheating.
For example, there are companies that look for anomalies in test scoring. They often find incidents such as a low-scoring student suddenly getting seven items right in a row, or a class in a low-performing school suddenly outperforming classes in a neighboring high-performing school. These may or may not be instances of cheating, but several hundred such anomalies have been associated with NCLB tests in many states.
In Texas, the State Department of Education refused to investigate one such discovery. This was not surprising, since it came to light that the head of the State Department of Education had for many years been turning in false scores from the district in which she had previously been superintendent.
Wesley Elementary School in Houston provides another example. It won accolades for teaching low-income students how to read. The school was even featured on an Oprah Winfrey show about schools that “defy the odds.”
In 2003, Wesley’s fifth-graders performed in the top 10 percent in the state on reading exams. The next year, as sixth-graders at Williams Middle School, the same students fell to the bottom 10 percent in the state. Confronted with the data, Wesley teachers admitted that cheating was standard operating procedure.
Conclusion: It is likely that there will always be some corruption associated with the people and the tests used to assess learning when so much pressure is on administrators and teachers to increase test scores.
This means that when scores do go up, we need to be wary. We need to investigate whether the rise in test scores is a real indicator of greater learning or some form of deception.
Unfortunately, research suggests that deception and cheating in contemporary American culture, including our schools, have become more acceptable. Among the many problems associated with this cultural shift is that it is not easy to interpret the test scores associated with NCLB. We are no longer certain that a rise in test scores means that more student learning took place.
Caveat Interpretis! Let the interpreter of scores beware!
David C. Berliner is co-author with Sharon L. Nichols of Collateral Damage: How High-Stakes Testing Corrupts America’s Schools (2007). He is also co-author with Bruce J. Biddle of The Manufactured Crisis: Myths, Fraud, and the Attack on America’s Public Schools (1995).
For more on cheating in contemporary American schools, see Callahan, D. (2004). The Cheating Culture: Why More Americans Are Doing Wrong to Get Ahead; and Nichols, S. L., & Berliner, D. C. (2006). The Pressure to Cheat in a High-Stakes Testing Environment. In E. M. Anderman & T. B. Murdock (Eds.), The Psychology of Academic Cheating.