By Benjamin Herold
(Edweek) Students who took the 2014-15 PARCC exams via computer tended to score lower than those who took the exams with paper and pencil—a revelation that prompts questions about the validity of the test results and poses potentially big problems for state and district leaders.
Officials from the multistate Partnership for Assessment of Readiness for College and Careers acknowledged the discrepancies in scores across different formats of its exams in response to questions from Education Week.
“It is true that this [pattern exists] on average, but that doesn’t mean it occurred in every state, school, and district on every one of the tests,” Jeffrey Nellhaus, PARCC’s chief of assessment, said in an interview.
“There is some evidence that, in part, the [score] differences we’re seeing may be explained by students’ familiarity with the computer-delivery system,” Nellhaus said.
In general, the pattern of lower scores for students who took PARCC exams by computer is the most pronounced in English/language arts and middle- and upper-grades math.
Hard numbers from across the consortium are not yet available. But the advantage for paper-and-pencil test-takers appears in some cases to be substantial, based on independent analyses conducted by one prominent PARCC state and a high-profile school district that administered the exams.
In December, the Illinois state board of education found that 43 percent of students there who took the PARCC English/language arts exam on paper scored proficient or above, compared with 36 percent of students who took the exam online. The state board has not sought to determine the cause of those score differences.
Meanwhile, in Maryland’s 111,000-student Baltimore County schools, district officials found similar differences, then used statistical techniques to isolate the impact of the test format.
They found a strong “mode effect” in numerous grade-subject combinations: Baltimore County middle-grades students who took the paper-based version of the PARCC English/language arts exam, for example, scored almost 14 points higher than students who had equivalent demographic and academic backgrounds but took the computer-based test.
“The differences are significant enough that it makes it hard to make meaningful comparisons between students and [schools] at some grade levels,” said Russell Brown, the district’s chief accountability and performance-management officer. “I think it draws into question the validity of the first year’s results for PARCC.”
4 of 5 PARCC Exams Taken Online
Last school year, roughly 5 million students across 10 states and the District of Columbia sat for the first official administration of the PARCC exams, which are intended to align with the Common Core State Standards. Nearly 81 percent of those students took the exams by computer.
Scores on the exams are meant to be used for federal and state accountability purposes, to make instructional decisions at the district and school levels, and, in some cases, as an eventual graduation requirement for students and an eventual evaluation measure for teachers and principals.
Several states have since dropped all or part of the PARCC exams, which are being given again this year.
PARCC officials are still working to determine the full scope and causes of last year’s score discrepancies, which may partly result from demographic and academic differences between the students who took the tests on computers and those who took them on paper, rather than from the testing format itself.
Assessment experts consulted by Education Week said the remedy for a “mode effect” is typically to adjust the scores of all students who took the exam in a particular format, to ensure that no student is disadvantaged by the mode of administration.
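To make the idea of a format-wide adjustment concrete, here is a minimal sketch of one very simple approach, a constant shift that lines up the average score of one format with the other. It is not PARCC's method or any expert's stated procedure; the function name `mean_shift_adjust` and the scale scores are hypothetical, invented purely for illustration.

```python
from statistics import mean


def mean_shift_adjust(scores, reference_scores):
    """Shift every score by one constant so the group mean matches the reference mean."""
    shift = mean(reference_scores) - mean(scores)
    return [s + shift for s in scores], shift


# Hypothetical scale scores, invented for illustration only (not real PARCC data).
paper = [750, 760, 740, 770]
computer = [735, 745, 730, 750]

# Every computer-based score moves by the same amount.
adjusted, shift = mean_shift_adjust(computer, paper)
print(shift)
print(adjusted)
```

A shift like this only makes sense if the two groups of students are otherwise comparable. Where they differ in demographic or academic background, those differences would need to be accounted for before any of the gap is attributed to the format.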
PARCC officials, however, said they are not considering such a solution. It will be up to district and state officials to determine the scope of any problem in their schools’ test results, as well as what to do about it, Nellhaus said.
Such uncertainty is bound to create headaches for education leaders, said Michael D. Casserly, the executive director of the Council of the Great City Schools, which represents 67 of the country’s largest urban school systems.
“The onus should be on PARCC to make people aware of what these effects are and what the guidelines are for state and local school districts to adjust their data,” Casserly said.
Comparing Online and Paper Tests a Longstanding Challenge
The challenges associated with comparing scores across traditional and technology-based modes of test administration are not unique to PARCC.
The Smarter Balanced Assessment Consortium, for example, told Education Week that it is still investigating possible mode effects in the results from its 2014-15 tests, taken by roughly 6 million students in 18 states. That consortium—which, like PARCC, offers exams aligned with the common core—has yet to determine how many students took the SBAC exam online, although the proportion is expected to be significantly higher than in PARCC states.
Officials with Smarter Balanced are in the early stages of preparing technical reports on that and other matters.
“We’ll analyze the operational data. I can’t speculate in advance what that implies,” Tony Alpert, the executive director of Smarter Balanced, said in an interview. “We don’t believe that differences in scores, if there are any, will result in different decisions that [states and districts] might make based on the test.”
States that administer their own standardized exams, meanwhile, have for years conducted comparability studies while making the transition from paper- to computer-based tests. Past studies in Minnesota, Oregon, Texas, and Utah, for example, have returned mixed results, generally showing either a slight advantage for students who take the tests with paper and pencil, or no statistically significant differences in scores based on mode of administration.
The National Center for Education Statistics, meanwhile, is studying similar dynamics as it moves the National Assessment of Educational Progress, or NAEP, from paper to digital-administration platforms.
An NCES working paper released in December found that high-performing 4th graders who took NAEP’s computer-based pilot writing exam in 2012 scored “substantively higher on the computer” than similar students who had taken the exam on paper in 2010. Low- and middle-performing students did not similarly benefit from taking the exam on computers, raising concerns that computer-based exams might widen achievement gaps.
A still-in-progress analysis of 2015 NAEP pilot test items, which were used only for research purposes, has also found some signs of a mode effect, the acting NCES commissioner, Peggy G. Carr, told Education Week.
“The differences we see across the distribution of students who got one format or another is minimal, but we do see some differences for some subgroups of students, by race or socioeconomic status,” she said.
One key factor, according to Carr: students’ prior exposure to and experience with computers.
“If you are a white male and I am a black female, and we both have familiarity with technology, we’re going to do better [on digitally based assessment items] than our counterparts who don’t,” she said.
The NCES is conducting multiple years of pilot studies with digitally based items before making them live, in order to ensure that score results can be compared from year to year.
A PARCC spokesman said the consortium did analyze data from a 2014 field test of the exam to look for a possible mode effect, but only on an item-by-item basis, rather than by analyzing the exam taken as a whole. The analysis found no significant differences attributable to the mode of administration.
When asked why 2014-15 test scores were released to the public before a comprehensive analysis of possible mode effects was conducted, Nellhaus, PARCC’s chief of assessment, said responsibility rests with the states in the consortium. “People were very anxious to see the results of the assessments, and the [state education] chiefs wanted to move forward with reporting them,” Nellhaus said. “There was no definitive evidence at that point that any [score] differences were attributable to the platform.”
Illinois, Baltimore County Find Differences in PARCC Scores By Testing Format
The Illinois state school board made its PARCC results public in mid-December. In a press release, it made indirect mention of a possible mode effect, writing that the board “expects proficiency levels to increase as both students and teachers become more familiar with the higher standards and the test’s technology.”
A comparison of online and paper-and-pencil scores done by the state board’s data-analysis division was also posted on the board’s website, but does not appear to have been reported on publicly.
That analysis shows often-stark differences by testing format in the percentages of Illinois students who demonstrated proficiency (by scoring a 4 or 5) on PARCC English/language arts exams across all tested grades. Of the 107,067 high school students who took the test online, for example, 32 percent scored proficient. That’s compared with 50 percent for the 17,726 high school students who took the paper version of the exam.
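The high school figures above can be worked through in a few lines. This sketch uses only the counts and rounded percentages reported in the Illinois analysis. The variable names are illustrative, and the combined rate is approximate because the published rates are rounded to whole percents.

```python
# Figures reported in the Illinois analysis (high school English/language arts).
online_n, online_rate = 107_067, 0.32
paper_n, paper_rate = 17_726, 0.50

total_n = online_n + paper_n

# Share of high school test-takers who took the exam online.
online_share = online_n / total_n

# Gap between formats, in percentage points.
gap_points = (paper_rate - online_rate) * 100

# Approximate overall proficiency rate across both formats.
combined_rate = (online_n * online_rate + paper_n * paper_rate) / total_n

print(f"online share: {online_share:.1%}")
print(f"gap: {gap_points:.0f} percentage points")
print(f"combined rate: {combined_rate:.1%}")
```

Because roughly six in seven of these students tested online, the combined rate sits much closer to the online figure than to the paper figure.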
The differences by format are not so pronounced in elementary-grades math; in grades 3-5, in fact, slightly higher percentages of students scored proficient on the online version of the PARCC exam than on the paper version.
But proficiency rates among paper-and-pencil test-takers were 7 to 9 points higher on the 8th grade and high school math exams.
The Illinois board has not conducted any further analysis of the results to determine the cause of those discrepancies. Board officials declined to be interviewed.
“The statewide results in Illinois suggest some differences in performance between the online and paper administrations of the assessment,” according to a statement provided by the board. “There is no consistent relationship from district to district. … Both versions of the test provide reliable and valid information that teachers and parents can use to identify student strengths and areas needing improvement.”
In Maryland, meanwhile, more than 41,000 Baltimore County students in grades 3-8 took the PARCC exams in 2014-15. Fifty-three percent of students took the math exam online, while 29 percent took the English/language arts exam online. The mode of test administration was decided on a school-by-school basis, based on the ratio of computers to students in each building’s largest grade.
Like Illinois, Baltimore County found big score differences by mode of test administration. Among 7th graders, for example, the percentage of students scoring proficient on the ELA test was 35 points lower among those who took the test online than among those who took the test on paper.
To identify the cause of such discrepancies, district officials compared how students and schools with similar academic and demographic backgrounds did on each version of the exams.
They found that after controlling for student and school characteristics, students were between 3 percent and 9 percent more likely to score proficient on the paper-and-pencil version of the math exam, depending on their grade levels. Students were 11 percent to 14 percent more likely to score proficient on the paper version of the ELA exam.
“It will make drawing comparisons within the first year’s results difficult, and it will make drawing comparisons between the first- and second-year [PARCC results] difficult as well,” said Brown, the accountability chief for the Baltimore County district.
“This really underscores the need to move forward” with the district’s plan to move to an all-digital testing environment, he said.
A Big ‘Bug in the System’
In the meantime, what should state and district leaders, educators, and parents make of such differences?
The test results still have value, said Nellhaus of PARCC.
“This is still useful and important information providing a wealth of information for schools to improve instruction and identify students who need assistance or enrichment,” he said.
But possible mode effects on multistate-consortia exams should be taken seriously, at least in the short term, and especially if they have not been accounted for before test results are reported publicly, said assessment experts consulted by Education Week.
“Because we’re in a transition stage, where some kids are still taking paper-and-pencil tests, and some are taking them on computer, and there are still connections to high stakes and accountability, it’s a big deal,” said Derek Briggs, a professor of research and evaluation methodology at the University of Colorado at Boulder.
“In the short term, on policy grounds, you need to come up with an adjustment, so that if a [student] is taking a computer version of the test, it will never be held against [him or her],” said Briggs, who serves on the technical-advisory committees for both PARCC and Smarter Balanced.
Such a remedy is not on the table within PARCC, however.
“At this point, PARCC is not considering that,” Nellhaus said. “This needs to be handled very locally. There is no one-size-fits-all remedy.”
But putting that burden on states and school districts will likely have significant implications on the ground, said Casserly of the Council of the Great City Schools.
“I think it will heighten uncertainty, and maybe even encourage districts to hold back on how vigorously they apply the results to their decisionmaking,” he said.
“One reason many people wanted to delay the use [of PARCC scores for accountability purposes] was to give everybody a chance to shake out the bugs in the system,” Casserly added. “This is a big one.”
Associate Editor Catherine Gewertz contributed to this article.
Coverage of the implementation of college- and career-ready standards and the use of personalized learning is supported in part by a grant from the Bill & Melinda Gates Foundation. Education Week retains sole editorial control over the content of this coverage.