Les Perelman is the retired Director of Writing Across the Curriculum at the Massachusetts Institute of Technology. He is not only a good friend but also a nationally recognized expert on computer-based scoring. Along with his students, he invented BABEL, the Basic Automated BS Essay Language Generator.
The PARCC and Les Perelman’s BABEL machine.
Les Perelman.
-This is an excerpt from The Washington Post’s Valerie Strauss column featuring a post by Leonie Haimson. Les Perelman is an old high school pal, retired director of the multi-discipline writing program at MIT and an expert on computer scoring. Read the entire article here.
According to Les Perelman, retired director of a writing program at MIT and an expert on computer scoring, the PARCC/Pearson study is particularly suspect because its principal authors were the lead developers for the ETS and Pearson scoring programs. Perelman said: “It is a case of the foxes guarding the hen house. The people conducting the study have a powerful financial interest in showing that computers can grade papers.”
In addition, the Pearson study, based on the spring 2014 field tests, showed that the average scores assigned by either a machine or a human scorer were “very low: below 1 for all of the grades except grade 11, where the mean was just above 1.”
Given the overwhelmingly low scores, the results of human and machine scoring would of course be closely correlated in any scenario.
Les Perelman concludes: “The study is so flawed, in the nature of the essays analyzed and, particularly, the narrow range of scores, that it cannot be used to support any conclusion that Automated Essay Scoring is as reliable as human graders. Given that almost all the scores were 0’s or 1’s, someone could obtain close to the same reliability simply by giving a 0 to the very short essays and flipping a coin for the rest.”
As for the AIR study, it makes no particular claims as to the reliability of the computer scoring method, and omits the analysis necessary to assess this question.
As Perelman said: “Like previous studies, the report neglects to give the most crucial statistics: when there is a discrepancy between the machine and the human reader, when the essay is adjudicated, what percentage of instances is the machine right? What percentage of instances is the human right? What percentage of instances are both wrong? … If the human is correct, most of the time, the machine does not really increase accuracy as claimed.”
Moreover, the AIR executive summary admits that “optimal gaming strategies” raised the scores of otherwise low-scoring responses by a significant amount. The study then concludes that, because one computer scoring program was not fooled by the most basic of gaming strategies (repeating parts of the essay), computers can be made immune to gaming. The Pearson study doesn’t mention gaming at all.
Indeed, research shows it is easy to game these systems by writing long, nonsensical essays with abstruse vocabulary. See, for example, this gibberish-filled prose that received the highest score from the GRE computer scoring program. The essay was composed by the BABEL generator, an automatic writing machine invented by Les Perelman and colleagues that churns out gobbledygook. [A complete pair of BABEL-generated essays along with their top GRE scores from ETS’s e-rater scoring program is available here.]
In a Boston Globe opinion piece, Perelman describes how he tested another automated scoring system, IntelliMetric, which was similarly unable to distinguish coherent prose from nonsense and awarded high scores to essays containing the following phrases:
“According to professor of theory of knowledge Leon Trotsky, privacy is the most fundamental report of humankind. Radiation on advocates to an orator transmits gamma rays of parsimony to implode.’’
Unable to analyze meaning, narrative, or argument, computer scoring instead relies on length, grammar, and arcane vocabulary to assess prose. Perelman asked Pearson if he could test its computer scoring program, but he was denied access. Perelman concluded:
If PARCC does not insist that Pearson allow researchers access to its robo-grader and release all raw numerical data on the scoring, then Massachusetts should withdraw from the consortium. No pharmaceutical company is allowed to conduct medical tests in secret or deny legitimate investigators access. The FDA and independent investigators are always involved. Indeed, even toasters have more oversight than high stakes educational tests.
A paper dated March 2013 from the Educational Testing Service (one of the SBAC sub-contractors) concluded:
Current automated essay-scoring systems cannot directly assess some of the more cognitively demanding aspects of writing proficiency, such as audience awareness, argumentation, critical thinking, and creativity…A related weakness of automated scoring is that these systems could potentially be manipulated by test takers seeking an unfair advantage. Examinees may, for example, use complicated words, use formulaic but logically incoherent language, or artificially increase the length of the essay to try and improve their scores.
The inability of machine scoring to distinguish between nonsense and coherence may lead to a debasement of instruction, with teachers and test prep companies engaged in training students on how to game the system by writing verbose and pretentious prose that will receive high scores from the machines. In sum, machine scoring will encourage students to become poor writers and communicators.
There is nothing objective about standardized testing.
-By Karl-Heinz Gabbey
Please allow me to add an observation regarding this thing you call “objectivity”: It doesn’t exist! “Objectivity” exists nowhere in the private or public sectors, including education. Beyond state mandates, school boards and administrators, usually in negotiation and collaboration with teacher representatives, set standards and rules for teachers. Teachers, in turn, set standards and rules for students. In principle, bosses in the private sector do the same for their employees. Reasonable individuals strive for what is valid, fair, and moral and for what yields the best results, though in the end beauty is still in the eye of the beholder.
That’s called subjectivity. The best we can do to prevent a clash of subjectivities and to operate within a reasonable framework is to collaboratively create certain ground rules, embodied in a legal contract that binds all parties. In truth, where is the “objectivity” here?
No matter how hard we try to be totally objective, we will never achieve it due to mankind’s imperfections. Human beings are not perfect machines, and what they produce even with great precision always has flaws. Ever hear of car recalls?
Among the greater imperfections or follies of mankind are standardized tests like PARCC, particularly when they abuse children and are misused as teacher evaluation instruments. One fact that I learned from a very wise former colleague is that if these standardized tests “test” anything, “they really test the community in which the school is located.” Give that some thought.
Spare yourself the agony in the attempt to achieve “objectivity.” End your useless obsession. I have a feeling that you could find far better things to do.
April 1st was a lost day of PARCC testing.
Stop lying about 4/1 being a lost day of instruction. It was a lost day of PARCC testing.
— Karen Lewis (@KarenLewisCTU) April 2, 2016
Fred,
So – is assessment part of teaching?
Did you ever grade your art students? Based on what data?
What if I didn’t like the methodology you used, like you don’t like PARCC?
Is teacher evaluation a valid part of education? How were you evaluated (or weren’t you – being a Union boss)? Is student achievement/improvement a valid component of teacher assessment? If not, what should teacher assessment be about? Union Membership?
Just asking; don’t expect answers – I think I know what yours are.
-Akvida
—————
Dear Akvida,
1. Of course assessment is a part of teaching. Standardized tests like the PARCC have very little to do with teaching and learning. They are neither valid nor reliable. PARCC is created by those far from the teaching and learning process and the results are of little use for either assessing learning or improving teaching.
2. I graded my art students as little as possible. I gave them lots of feedback and encouragement with an understanding that assigning a score or a letter grade to their work provided neither useful feedback nor encouragement.
3. It is not a matter of *liking* a methodology. Everything about teaching and learning should be morally and educationally defensible. More than anything else, teaching is a moral act with a moral purpose.
4. However you propose to evaluate a teacher, it is not morally or educationally defensible to do it on the basis of their students’ individual scores on one set of tests.
5. Teacher assessment, like evaluating all the social work people do, should be based on a conversation among those involved. And yes, union membership provides a way to have that conversation based on the collective bargaining process. In my district, for example, we spent several years developing a process with our board and administration for evaluating, improving or dismissing employees that was fair, required documentation, and meant that our district had a high quality staff of empowered teachers and administrators. Once Illinois adopted a state-wide evaluation system as demanded by Arne Duncan and the Department of Education, our local evaluation system had to be thrown out.
-Fred
“Yikes,” said Peter Cunningham.
Peter Cunningham.
I don’t normally get into Twitter exchanges. I read ’em. I post ’em. I leave ’em be.
The other day I came across this Peter Cunningham tweet:
Yikes! With low grad rates and high college remediation rates, Oregon signs nation’s first stand-alone opt-out bill. https://t.co/JZdnuWr4ad
— Peter Cunningham (@PCunningham57) December 17, 2015
I was genuinely puzzled by the “Yikes,” because I couldn’t figure out what the relationship was between parents having the right to opt their kids out from stupid tests like PARCC and low graduation rates or remediation rates.
It was a timely “Yikes,” because the results of Illinois students’ PARCC tests were just released last week. These are scores from tests given last year. It is now the second trimester for most students in the state. I cannot imagine, nor can anyone tell me, what use these results have other than as punishment.
Which is what I tweeted to Peter.
Peter Cunningham is a former advisor to Arne Duncan, first in Chicago and then at the Department of Education. He now runs an organization and blog that promote the standard corporate reform agenda of phony accountability and charter schools.
@fklonsky My point is that the test tracks progress and the other data shows Oregon falling short. Not saying tests boost grad rates.
— Peter Cunningham (@PCunningham57) December 17, 2015
I was puzzled again. If, as Peter says, Oregon shows low graduation rates and high college remediation rates, what are we tracking by annual testing that we don’t already know?
@fklonsky It also tells you which teachers are struggling so you can do something about that if you have the will and the courage. Alas.
— Peter Cunningham (@PCunningham57) December 17, 2015
I think Peter just likes tests. Alas.
Peter has been involved in education for many years (and by “involved,” I don’t mean teaching or spending time in a classroom in front of students), so he must surely know that a test designed to measure what students know is not designed to tell us how teachers teach.
That’s not how it works.
The fact that a student performs poorly on a test does not necessarily mean that the teacher is the one struggling. Unless Peter thinks the PARCC is a test of teachers instead of students.
That is exactly the fallacy of value added measures.
Testing. The 2% solution.
President Obama’s announcement that students should be tested less is a victory for all those who have seen the destructive results of the education accountability movement, which was born with the A Nation at Risk report more than 30 years ago.
He said it because a movement demanded he say it.
But there may be less here than meets the eye.
When I was still teaching just a few short years ago, our school district had an Assistant Superintendent of Student Learning – a title that had been changed a few years earlier from Assistant Superintendent of Curriculum – who was fond of repeating, “We value what we measure and we measure what we value.”
I’m sure she picked that up at a breakout session at ASCD or saw it on a wall somewhere.
What she meant by measure was standardized testing. Like schools everywhere, we did a lot of it.
To sound progressive, she spoke of multiple measures. She meant multiple kinds of standardized tests that had multiple initials.
She was right though. We tested what we valued.
President Obama says he wants to limit the amount of standardized testing to 2% of student time, not counting test prep.
That happens to be about the same percentage of the school year that the district I retired from gave students in the Art room. And we were a suburban district that supported the Arts.
2%.
That is what supporting the Arts means in American schools.
None of the standardized tests our district used measured what I taught or what my students learned in Art.
Thank goodness.
Every time I sat in a meeting with the Assistant Superintendent of Student Learning and she said that “we value what we measure and we measure what we value,” I was reminded that she didn’t value Art. Or Music. Or play.
Not because we didn’t test it. But because we were spending more of the students’ day measuring and testing than we were spending making Art, Music or play.
Even if the President is serious about turning around the monster we have created, or is even capable of it, it will take years and generations of students.
We now have a national system of education in which every part, from admission requirements to teacher evaluation, is rooted in a monstrous system of standardized testing. Undoing it will take a major effort, the dedication of teachers and parents at street level, and leaders with the will to see it through.
The movement below exists to stop it.
We need more than the words of a lame duck president who allowed toxic testing to proceed and grow for seven years if we want to turn this sucker around.
When there is more time for art, music and play than there is for measuring and testing, then we will know we have gotten somewhere.
I predict John Oliver’s takedown of standardized tests will be on every teacher’s Facebook page by tomorrow.
Little Rock 58 years later. LRSD taken over by Waltons and Rockefeller Foundation.
Little Rock Public Schools have been taken over by the state of Arkansas – the Waltons and the Rockefeller Foundation. The reason? Poor test scores at four schools. The union had a plan in place to improve those schools, negotiated with the school board, but the state intervened before the intended results could be achieved. Students are marching and protesting.
Statement from the Little Rock School District Student Association:
On February 1st, students from the Little Rock School District (LRSD) met to organize the foundations of the Little Rock School District Student Association (LRSDSA). The team of students, working throughout Sunday afternoon, represented three of the five high schools in the district (Hall High School, Little Rock Central High School, and Parkview Art/Science Magnet High School). The students capitalized on momentum generated by the Arkansas State Board of Education’s recent takeover of the LRSD, and the subsequent dissolution of the district school board, to create a groundbreaking camaraderie among students.
The LRSDSA plans to provide representation for the students of the district in the political bodies that dictate the future of education. The working mission statement of the LRSDSA was drafted during the meeting and reads, “The LRSDSA is an association of students united to amplify our voices and dedicated to empowering students to speak out in their classrooms, schools, and community in order to create continual implementation of reform in our district.” The students of the LRSDSA are students who stand, “dedicated to ensuring our voice and our vote in our education.”
The students founding the new association feel that their collective voices have gone unheard by the Arkansas State Department of Education. Over the past several weeks, these students spoke out at LRSD Board of Directors meetings, community forums, and a special meeting of the State Board of Education to plead for the continuation of the LRSD Board of Directors. The LRSDSA believes that those in charge of a school district must possess an intimate knowledge of the communities surrounding struggling schools and be willing to recognize student voices as equal to those of administrators and teachers. This intimate connection is easily lost in bureaucracy, as demonstrated by the decision of five members of the State Board of Education to vote for a State takeover, thereby disregarding the voices of students who spoke out and implored the members of the Arkansas State Board of Education to allow students from each high school to work with the LRSD Board of Directors, community members, teachers, and administrators to improve education across the district.
The Little Rock School District Board of Directors was a democratically elected body and provided a seat for a student ex officio at every meeting. Several students engaged in forming the LRSDSA worked on the campaigns of school board members, and many students formed personal connections with the board. The Arkansas State Board of Education currently allows no official student representative at its meetings and often schedules these meetings during school hours, making it impossible for students to attend meetings concerning their education. The LRSDSA seeks to change that.
Additionally, the LRSDSA plans to hold a peaceful demonstration on Thursday, February 5th, 2015, to make known to the Arkansas State Board of Education and to the public that its members are displeased with both the dissolution of the LRSD Board of Directors and the silencing of student voices. At 5pm, students will march from the Arkansas State Board of Education at 4 Capitol Mall to the LRSD Central Office at 810 West Markham, where LRSD Board of Directors meetings are held. The organizing students emphasize that this demonstration will be done peacefully and encourage any community supporters to join them.
Written by Hannah Burdette, founding member of the LRSDSA, on behalf of her constituents.
25 hours of standardized testing. A question for the IEA: Whatever happened to New Business Item 1?
Fred,
I can’t write a better op-ed about the PARCC than this one:
http://educationopportunitynetwork.org/whos-really-failing-students/
Please, please, please post this essential article. It’s PERFECT. IT’S THE TRUTH. All of this, as you know, is a corrupt, tangled web that in the end will affect ALL of us, active or retired. As public schools are shuttered because they are “failing” and teachers are forced to work outside of the TRS system, we will all be screwed.
As you know, this past March at the IEA RA New Business Item #1 was adopted.
“The IEA RA directs the IEA Executive Director with assistance from the Government Relations department to advocate for a moratorium on accountability measures related to the PARCC exam. The IEA RA supports the Illinois Learning Standards (Common Core); however, the moratorium would prevent PARCC results for the next several years from being used as accountability measures for students, educators, or schools.”
Between now and the end of the school year, my students will be subjected to at least 25 hours of standardized testing, including the PARCC, which accounts for about 16 hours of the total. This excessive, unnecessary testing is not in the best interest of the children, the teachers, the district, or the taxpayers. It is guaranteed to interfere with real learning opportunities and experiences for students. And as we all know, standardized testing should only be used as a tool to inform instruction; it is not reliable as a measure of student growth or teacher proficiency.
What is being done to stop the implementation of the PARCC in Illinois? What is being done to “prevent PARCC results for the next several years from being used as accountability measures for students, educators, or schools”? What is our school board doing to oppose this testing madness? What do parents do who want to opt their children out of the PARCC?
Teachers, administrators, and school boards are concerned. Please view the video and read the op-ed at these links:
http://educationopportunitynetwork.org/whos-really-failing-students/
– Let me teach