
Les Perelman’s BABEL generator. You try it.

October 25, 2014


My friend Les Perelman and some grad students came up with software that generates gibberish. They named the program BABEL, the Basic Automated BS Essay Language Generator.

Why would MIT grad students want to generate gibberish?

To demonstrate the problem with computer-based scoring of test essays.

Les submitted the results to Automated Essay Scoring (AES) engines.

“It works spectacularly well in producing nonsense that received high scores from various AES machines,” wrote Les.

“BABEL even works well with ETS’ e-rater. Though ETS would not give me direct access to its scoring engine, ETS allows prospective test takers to take practice Graduate Record Exam (GRE) essays that are instantly graded by e-rater for $13. I have bought a number of these packages and have used the BABEL Generator to produce essays that consistently receive scores of 5 and 6 on a 1-6 point scale on each of the two writing assignments that comprise the writing portion of the GRE.”

Naturally ETS has banned Les from future access to their scoring engine.

I tried it.

BABEL asks for three keywords.

I picked Chicago, politics and schools.

Here is what BABEL produced:

Chicago has not, and no doubt never will be misplaced. Human society will always enlightenment teachers; some of intercessions and others for a accumulation. a lack of Windy City lies in the field of literature but also the field of philosophy. Teachers is the most prototypically collaborating commencement of mankind.

Verisimilitude, especially for acceptance, articulates a query on rapaciously but discordantly antagonistic consequences by instructor. If conveyances voyage or belittle contemplation, pique that is consistently inappropriate but is iconoclastic, quotidian, and reproving with politics can be more venomously portended. Additionally, political relation, often at a assumption, can be the ligation. In my experience, all of the affronts to our personal epigraph of the exposition we taunt afford the allocations in question. Even so, armed with the knowledge that the divisive surfeit mortifies impropriety, most of the probes for my reprimand blubber. Our personal injunction to the aggregation we disparage acquiesces. Chicago which performs all of the demonstrations might certainly be an amygdala on our personal assassination with the taunt we surprise as well. The rumination of devices may be inauguration but is petulant yet somehow discrepant, not remuneration that depreciates contretemps and denounces dictates. In my theory of knowledge class, none of the dictators at our personal axiom by the exposure we allure collaborate and probe quips which civilize the advancement. The more a circumstance that gambols should be reprobation, the less provocation can diligently be a Gaussian onslaught.

As I have learned in my semiotics class, teacher is the most fundamental amplification of humankind. Though interference for veracity inverts, information processes brains. The same pendulum may process two different orbitals to process an orbital. The plasma is not the only thing the brain reacts; it also receives neutrinoes for disruption with Chicago. Due to advancing, humanely but egotistically admonished accumulations collapse also on Chicago. a startling teacher changes the dictum at Windy City.

The authentication, frequently to a retort, contravenes politics. The sooner the people involved attest, the sooner contemplation sanctions confluences. Furthermore, as I have learned in my literature class, society will always verify political relation. Our personal congregation of the convulsion we expel will be demolition with apprentices and may risibly be commission. The inspection might, still yet, be elidible in the way we respond or utter the inflexibly and pusillanimously atrocious acquiescence but accumulate intercessions. In my semantics class, almost all of the tyroes at my escapade convulse or augur the appendage. a quantity of political relation is inchoate for our personal speculation on the authorization we encounter as well. The avocation denigrates conjecture, not a ligation. In my experience, many of the circumscriptions by our personal assassin at the appetite we ascertain bemoan insinuations. The less rancor that seethes is antipodal in the extent to which we demarcate most of the adjurations for the realm of reality and infuse or should unyieldingly be a trope, the more affronts articulate the trope of parsimony.

Politics with agronomists will always be an experience of human society. In any case, armed with the knowledge that sublimation may perilously be compensation, most of the domains at my aggregation dictate commencements but quibble and disseminate inquiries which fascinate a rumination. If elated agriculturalists intercede and appease sanctions to the admonishment, teachers which choreographs assassinations can be more naturally assimilated. Instructor has not, and undoubtedly never will be articulated but not risible. Chicago is genially but fallaciously whimpering as a result of its those in question.

Would this get me into Harvard? Who knows?

But Les’ research suggests it would score well with an AES engine.

You try it.

http://babel-generator.herokuapp.com/

 

Keeping retirement weird. Intermittent blogging.

October 25, 2014


Six months ago we made plans for an autumn trip to Mexico City and Puebla.

We have been to various parts of Mexico over the years, but never to Mexico City. Everyone who has been to Mexico City has told us that they loved it.

Traveling off-season when prices are lower is one of the benefits of retirement.

But those who think they can spend their retirement years traveling all the time are kidding themselves.

Unless they have the deep pockets of Bruce Rauner or his friend, Rahm Emanuel.

Although the idea of Rahm Emanuel in retirement and going somewhere else should not be discouraged.

In fact, I’m working on it.

But travel in retirement is not free. We are on a strict yearly travel budget. And that includes our favorite travel. Trips to see kids and grandkids.

Starting next week, posts will be intermittent. Although they might include photos of Mexico City sights and Day of the Dead celebrations in the town of Puebla.

And perhaps protests.

On September 26th, 43 students went missing in the southern Mexican state of Guerrero.

From what little the U.S. media has bothered to report, we have learned that activist students in Guerrero had been collecting money in the town of Iguala for an Oct. 2 demonstration. The demonstration opposed cuts to school funding. The students’ government school opened in 1926 and has always been a center for social justice movements. This time, students were attacked by local police when they tried to take buses to and from the demonstration.

The police opened fire on the students and six were killed.

Then the 43 students were apparently abducted. Most believe they were killed by the police or by narco-criminals who work with the police.

The Mayor and his wife were in cahoots with the narco-criminals and were indicted. They have gone into hiding and are now fugitives.

The Governor of the state of Guerrero has resigned.

But the students have not been found.

Will Rahm’s guy get 15 years in the federal pen?

October 24, 2014


Educational Testing Service censors Les Perelman’s exposé of their B.S.

October 24, 2014


Les Perelman (second from right) on a recent visit to the Art Institute of Chicago.

Les Perelman is not only an old high school pal. We are still friends. In fact, we spent time together just recently on one of his trips through Chicago. Les worked for years as director of undergraduate writing at MIT. He is now a research affiliate in the Comparative Media Studies/Writing program at MIT.

Les and his students created BABEL, the Basic Automated B.S. Essay Language Generator. Les uses BABEL to show how limited computer-based assessments are in evaluating student writing.

This does not make institutions like Educational Testing Service (ETS), the world’s largest testing company, very happy. 

Today Les tells his story in Valerie Strauss’ Washington Post column, The Answer Sheet.

ETS supplies Strauss with a response to Les.

——–

By Les Perelman

The Educational Test Service (ETS) won’t let me continue to test a product that they are trying to sell to schools and colleges across America. Specifically, the company will not allow me access to the Automated Scoring Engine (AES) unless I agree to let them censor my findings.

All I want to do is test the claim by ETS that the feedback their automated essay scoring engine gives students is more precise than that available to anyone through Microsoft Word. Their website says: “The Microsoft® Word Spelling and Grammar tool can provide writers with a quick analysis of common errors. However, the Criterion service, as an instructional tool used to improve writing, targets more precise feedback.”

I submitted a proposal to test this claim to ETS. Previously, access to Criterion was easy to obtain. In 2012, in a series of experiments publicized in The New York Times, I demonstrated that Criterion was oblivious to factual errors and intentional incoherence. In fact, the scoring system preferred long pretentious language and verbosity.

I also discovered that Criterion’s feedback was often inaccurate and sometimes just plain wrong. It categorized perfectly appropriate uses of the definite article as a “missing or extra article,” told me that the phrase “opinions about a film” contained a preposition error, and was almost always incorrect in identifying the thesis sentence of an essay. In addition, it chided me for writing a paragraph that had only three sentences. Nevertheless, I was quoted in The New York Times praising ETS for allowing me access. In the same article, David Williamson, the senior research director for the Assessment Innovations Center, was quoted as stating, “At E.T.S., we pride ourselves in being transparent about our research.”

Well, ETS is no longer so transparent. Earlier this year, I asked for access to e-rater, Criterion’s scoring engine, as part of a series of experiments to show that computer generated nonsense could receive high scores from Automated Essay Scoring (AES) computers. I even imagined that the nonsense generator could become a mobile app. ETS turned down my request, stating that I was developing a commercial product. In response, I pledged not to commercialize the research and appealed their decision. A one-sentence email informed me that the appeal was denied.

Three very smart undergraduates, two at MIT and one at Harvard, developed the computer gibberish tool, which we dubbed the Basic Automated BS Essay Language Generator or BABEL Generator. It works spectacularly well in producing nonsense that received high scores from various AES machines.

BABEL even works well with ETS’ e-rater. Though ETS would not give me direct access to its scoring engine, ETS allows prospective test takers to take practice Graduate Record Exam (GRE) essays that are instantly graded by e-rater for $13. I have bought a number of these packages and have used the BABEL Generator to produce essays that consistently receive scores of 5 and 6 on a 1-6 point scale on each of the two writing assignments that comprise the writing portion of the GRE.

For example, one essay containing this language:

Competition which mesmerizes the reprover, especially of administrations, may be multitude. As a result of abandoning the utterance to the people involved, a plethora of cooperation can be more tensely enjoined. Additionally, a humane competition changes assemblage by cooperation. In my semiotics class, all of the agriculturalists for our personal interloper with the probe we decry contend.. . .

is followed by these canned comments that the essay:

articulates a clear and insightful position on the issue in accordance with the assigned task
develops the position fully with compelling reasons and/or persuasive examples
sustains a well-focused, well-organized analysis, connecting ideas logically
conveys ideas fluently and precisely, using effective vocabulary and sentence variety
demonstrates superior facility with the conventions of standard written English (i.e., grammar, usage, and mechanics) but may have minor errors

My success with the BABEL Generator spurred me to test the efficacy of the bold claims made for classroom applications of Automated Essay Scoring.

Not only were computers going to test students but they were going to teach them as well, allowing students to write more and in classes in which an over-burdened instructor had 35 students and could just look at final drafts. As a Writing Program Administrator for over 30 years, I laugh at this naiveté, knowing that if such a technology were deployed, the most likely result would be that a superintendent or dean would double the class size to 70 students. In addition, a recently published article by two researchers showed that the Criterion was substantially less accurate than expert human scorers in identifying errors in papers written by advanced non-native English speakers. [Semire Dikli, Susan Bleyle, Automated Essay Scoring feedback for second language writers: How does it compare to instructor feedback?, Assessing Writing, Vol.22, October 2014, Pages 1-17. ] Indeed, it also confirmed the patterns I found in my earlier studies that over forty percent of the errors identified by Criterion, especially those regarding articles, were not errors at all. While this defect may just be annoying to native speakers, it is devastating to those learning English.

I submitted a detailed proposal to compare the accuracy of Criterion to that of the Microsoft® Word Spelling and Grammar tool. I would conduct the study with a colleague from MIT who has a Ph.D. in linguistics from MIT and who worked with Noam Chomsky and Morris Halle, the founders of modern linguistics.

Instead of the easy access I received two years ago, I received a long email from Chaitanya Ramineni, a research scientist in the Innovations in the Development and Evaluation of Automated Scoring (IDEAS) Group that included some disturbing provisions. Unlike my previous experiences with Criterion, I would not be allowed access to Criterion but would be required to give the data to ETS for them to process and return to me. Even more disconcerting, was provision 3C:

All presentations/manuscripts must be submitted for ETS review at least two weeks prior to public dissemination. ETS will retain the right to comment on the article to correct any errors, and as a result of the review, ETS can require that the ETS name, the Criterion name, and any identifying information about the particular company/product be removed from the publication / presentation / public dissemination (article, blog, etc.).

My reply to ETS labeled this censorship.

Dr. Ramineni responded, “This provision is a common policy/practice at ETS for external researchers who are not working directly with ETS staff.” This stance was reiterated in a phone conference I subsequently had with Dr. Ramineni, David Williamson, and several silent ETS executives and then repeated by the same group in another phone conference in which I was absent to the Chair of my professional organization a few weeks later.

Over the next few months, I discovered that the provisions ETS had told me were common practice were not consistently applied. Around the same time, another researcher had applied to use Criterion and had no problem gaining access. Indeed, ETS even provided her with training on how to use the latest release. The Criterion Non-Commercial Research Software License agreement [attached] she was asked to sign contained no language censoring content. And, during a phone call with Dr. Ramineni, she was asked out-of-the-blue if she was working with Les Perelman. She is not.

All I want to do is what organizations like Consumers Union and the Underwriters Laboratory do all the time: determine 1) if an advertised product meets its claims and 2) whether or not it is defective. Considering that the product in question is being used by school children and bought largely through public funds, free access should be limited solely to concerns about intellectual property. Yet ETS will not allow me access.

ETS is not alone. Pearson Educational Technologies wouldn’t even reply to my request to test their WriteToLearn® software, and Peter Foltz, a Pearson Vice President, was quoted in the 2012 New York Times article as justifying Pearson’s refusal to give me access to their product because “He wants to show why it doesn’t work.”

Although no company should prevent consumers from discovering whether or not its products work, ETS’s refusal is particularly alarming. ETS is a tax-exempt 501(c)(3) non-profit corporation. ETS claims this status as a non-profit educational institution, that, according to its charter, among other activities, encourages research in major areas of assessment.

The IRS, however, puts additional requirements on educational institutions:

The method used by an organization to develop and present its views is a factor in determining if an organization qualifies as educational within the meaning of section 501(c)(3). The following factors may indicate that the method is not educational.

The presentation of viewpoints unsupported by facts is a significant part of the organization’s communications.
The facts that purport to support the viewpoint are distorted.

I am trying to verify the factual accuracy of important educational claims made by ETS but the company is trying to prevent me from doing so. I hope that someone at the IRS is reading this.

Here’s a response from ETS Corporate Spokesperson Thomas Ewing:

Our policy is not censorship, but is actually designed to be as transparent as possible while still protecting ETS from dissemination of erroneous information.

What we have is a difference of viewpoint on policies relating to how reports which draw upon ETS-provided data can be published. We encouraged Mr. Perelman to use Criterion® for research purposes, but with the proviso that ETS be given the opportunity to review the results before they are published. The purpose of this policy is to ensure that there are no factual inaccuracies in public materials.

For example, if an external researcher were to find a correlation between Criterion scores and performance in college they may claim that Criterion is good for college admissions testing. However, this would be a misuse of Criterion and so ETS would be obligated to discourage this kind of claim.

Such a right of review for public material is a standard requirement for all such requests and something which other researchers accept, including our own, who submit papers for internal peer review before publication. We require that all presentations or manuscripts that rely upon ETS data be submitted for ETS to review at least two weeks prior to public dissemination. Obviously we have the right to comment on the research and to correct any errors or misrepresentations as a result of that review.

If we find the results of any study to be flawed or incorrect, the author can either correct the problems or they can remove any reference to ETS and replace them with general descriptions of the data that do not identify ETS as the source. This still allows the author to publish the results. This policy is progressive and more permissive than many other organizations who simply don’t allow external researchers to study their data, or publish on the basis. Mr. Perelman refused to accept this requirement and that’s where it stands.

National Lead Poisoning Prevention Week.

October 24, 2014


CPS is not healthy for our children.

October 24, 2014


Gale parents protest possible school closing last year. Meanwhile Gale students were learning in an unsafe environment known to CPS.

As a severe contagious bronchial infection spread across the nation, Rahm Emanuel’s hand-picked school board laid off hundreds of school custodians who keep our schools clean. They had already handed over the work to a private company, Aramark, ending the practice of custodians being part of a single school’s staff and staff culture.

Counters have not been wiped down. Soap dispensers have not been refilled. Toilet paper cannot be found in many student bathrooms.

This is how they treat our children.

Did you read all the fuss yesterday about a cockroach at a City Council meeting?

What? Have the aldermen never seen a cockroach before? I could be snarky and suggest that there have been at least 44 cockroaches sitting there all along. What city do these Chicago aldermen and reporters live in?

Haven’t they been told we live in Rat City?

Of way more importance than some city bureaucrat being mortified about a cockroach in the council chambers is the poisoning of students at Gale School.

For five years CPS has known about peeling lead-based paint at Gale.

For five years they did nothing about it until pressured by protests and FOIA requests.

Lead-based paint is poisonous. When the lead enters the body’s system, it never leaves.

It impacts brain and learning development.

Five years.

A student entering Gale in first grade will have been exposed to lead-based paint until the fifth grade.

Chicago Public Schools knew there was lead paint in first-floor bathrooms of Gale Math and Science Academy at least five years ago, but the district didn’t remove the paint until this summer, records show.

Neighborhood activists, who have been fighting for years for more resources for a struggling Gale, released the records Wednesday after obtaining the results of lead testing done at the school through a Freedom of Information Act request.

“We know that this is not something that’s going to be limited to Gale, and it’s incredibly dangerous and needs to be addressed at other schools,” said Daniel Dilliplane, a member of the Chicago Light Brigade, which published a report critical of CPS.

Here is the FOIA documentation.

Did I mention that this is National Lead Poisoning Prevention Week?

Slime Time.

October 23, 2014

