A computer scientist, a database, and a couple of open source tools he developed helped provide evidence for something that had long been rumored: that Timothy Parker, the world’s most syndicated crossword puzzle developer, was copying from other people.
It’s not proof, and Parker denies it, saying it was just a coincidence. But in the meantime, he said he would “step back” from his position as crossword puzzle developer for USA Today and Universal Crosswords while this gets sorted out.
It all started over a decade ago. In 2003, crossword puzzle developer Marjorie Richter first called out Parker for one of his crossword puzzles bearing a striking similarity to hers.
If you aren’t a crossword puzzle aficionado (or a cruciverbalist, which is a word only a crossword puzzle aficionado would know), that may not sound like much, but them’s fightin’ words. In the rarefied world of crossword puzzle developers, where there are only about 300 of them and they all seem to know each other, there is apparently no more heinous crime than stealing each other’s ideas.
Since then, rumors of plagiarism by Parker have been simmering, but they were hard to prove. You can type a phrase into Google and find all the places it’s been used online, and you can use other tools to find all the places a particular image has been used, but how do you search for all the identical instances of a crossword puzzle? Especially if only some of the components are duplicated?
Crossword puzzles use multiple components, such as a theme, the grid or design, placement of the words in the grid, the clues, and of course the answers themselves, writes crossword puzzle designer Matt Gaffney in Slate. In the crossword puzzles in question, from one to all of the components were similar or identical.
In case you were thinking it seems like they could write a program to automate crossword puzzle design, they can, after a fashion. Turns out that computer scientists love crossword puzzles (which, incidentally, have only been around for a hundred years). “A very large percentage of crossword puzzle constructors are into computers or math as professions,” says New York Times puzzle editor Will Shortz in an interview by Cornelia Dean. “Crossword making involves having this huge amount of data and synthesizing it into a grid.”
But when it comes to developing arcane themes and the pun-filled clues for them, apparently not even Watson can handle it. “The best-loved American puzzles are constructed around themed answers, clued with precisely the allusive wit for which you need a human—or another human to copy from,” writes Alan Connor in The Guardian (which has an entire blog devoted to crossword puzzles).
Computers, though, were great at uncovering the issue. The duplication was discovered by programmer Saul Pwanson, who created a database of more than 52,000 puzzles, including the collection of puzzle fan Barry Haldiman, to help him figure out how crossword puzzles get made. “Until recently, puzzling plagiarism would have been easier,” Connor writes. “As newsprint found its way to the dumpster, so too would have any evidence. But the databases of clues assembled by devoted crossword fans and some clever code have made it possible to pull a [Inspector] Morse on anyone who might be cribbing crosswords. After all, these are people who are used to cracking clues.”
“The database that helped uncover the repetition holds tens of thousands of puzzles published by 11 outlets over various time periods — for example, it holds puzzles from The New York Times starting in 1942 and from the Los Angeles Times starting in 1996,” writes Oliver Roeder in FiveThirtyEight, which broke the story. “The engineer who created the database also wrote a computer program that identifies similar puzzles and assigns each pair of similar puzzles a similarity score, essentially the percentage of letters and black squares that are shared by two puzzles’ grids.”
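To make that similarity score concrete, here is a minimal sketch of how such a score could be computed. It is not Pwanson's actual program. The function name `grid_similarity`, the use of `#` for black squares, the toy grids, and the choice to score differently shaped grids as 0 are all assumptions made for illustration.

```python
def grid_similarity(grid_a, grid_b):
    """Return the fraction of cells (letters and black squares) that match.

    Each grid is a list of equal-length strings, one per row, with '#'
    standing in for a black square (an assumed encoding).
    """
    same_shape = len(grid_a) == len(grid_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    )
    if not same_shape:
        return 0.0  # assumption: grids of different shapes never match

    total_cells = sum(len(row) for row in grid_a)
    if total_cells == 0:
        return 0.0

    # Count the positions where both grids hold the same character.
    matching_cells = sum(
        cell_a == cell_b
        for row_a, row_b in zip(grid_a, grid_b)
        for cell_a, cell_b in zip(row_a, row_b)
    )
    return matching_cells / total_cells


# Toy 3x4 grids invented for this example; they differ in a single letter.
original = ["CAT#", "ARIA", "#END"]
suspect = ["CAT#", "AREA", "#END"]
print(f"{grid_similarity(original, suspect):.0%} of cells match")
```

A score like this only looks at the filled grid. It says nothing about the clues or the theme, which are the other components Gaffney lists.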
It is possible to copy parts of a crossword puzzle accidentally, as Gaffney demonstrated in 2009. He found to his horror that a particular puzzle of his essentially replicated someone else’s puzzle.
So Gaffney laid out all the parameters for crossword puzzle design (and there are a lot, written and unwritten) to show how likely it is that his duplication was innocent. He then put that to the test by asking another crossword puzzle designer to create a puzzle using the same theme, and that designer’s puzzle came out almost identical as well.
But in the analysis of Parker’s work, we’re talking about a lot of puzzles, and a lot of similarity. “More broadly, 1,090 Universal puzzles and 447 USA Today puzzles were at least a 75 percent match to an earlier puzzle in the database,” Roeder writes. “That’s 16 percent of all the Universal puzzles in the database (about one out of every six) and 8 percent of all the USA Today puzzles (one out of every 12).”
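Counts like these could come from a scan along the following lines: compare each puzzle with every earlier one, flag it if any pair reaches the 75 percent threshold, and report the flagged share per outlet. This is a rough sketch, not the analysis FiveThirtyEight describes. The tuple layout, the function names, the outlet labels, and the sample puzzles are all invented for illustration, and it assumes every grid has the same shape.

```python
from datetime import date


def grid_similarity(grid_a, grid_b):
    """Fraction of matching cells; assumes both grids have the same shape."""
    cells = [(a, b) for ra, rb in zip(grid_a, grid_b) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in cells) / len(cells) if cells else 0.0


def flag_later_matches(puzzles, threshold=0.75):
    """Return (outlet, date) for each puzzle that matches an earlier one.

    `puzzles` is a list of (outlet, published_date, grid) tuples.
    """
    ordered = sorted(puzzles, key=lambda p: p[1])
    flagged = []
    for i, (outlet, published, grid) in enumerate(ordered):
        for _, earlier_date, earlier_grid in ordered[:i]:
            if (earlier_date < published
                    and grid_similarity(grid, earlier_grid) >= threshold):
                flagged.append((outlet, published))
                break  # one earlier match is enough to flag this puzzle
    return flagged


def flagged_share(puzzles, outlet, threshold=0.75):
    """Share of one outlet's puzzles that match an earlier puzzle."""
    total = sum(1 for p in puzzles if p[0] == outlet)
    hits = sum(1 for o, _ in flag_later_matches(puzzles, threshold) if o == outlet)
    return hits / total if total else 0.0


# Invented sample data: three toy puzzles from two hypothetical outlets.
sample = [
    ("Outlet A", date(2000, 1, 1), ["CAT#", "ARIA", "#END"]),
    ("Outlet B", date(2005, 6, 1), ["CAT#", "AREA", "#END"]),
    ("Outlet B", date(2006, 6, 1), ["DOG#", "OBOE", "#GEL"]),
]
print(flag_later_matches(sample))
print(flagged_share(sample, "Outlet B"))
```

Comparing every pair is quadratic in the number of puzzles, which is still workable for a database in the tens of thousands but would need indexing or pre-filtering to scale much further.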
To crossword puzzlers, it seemed pretty unlikely that this much overlap occurred randomly. “If pure coincidence and accident were to blame, we would likely see similar rates of duplication at other crossword puzzle creators,” writes Jonathan Bailey in Plagiarism Today. “However, the highest rate at any other provider is Newsday, which was 1.1 percent, still less than 1/12 that of USA Today.”
While some have wondered what the fuss was all about, this case has opened the eyes of a number of people to the power of big data and electronic data in finding plagiarism. “Crossword authors are where reporters and writers were in the 90s as the first large-scale plagiarism detection tools were being created,” Bailey writes. “Prior to the development of document fingerprinting, there was just no way to check a text work against a large database of other works (such as the Internet). That technological leap has unveiled plagiarisms new and old. Now something similar is starting to happen in the field of crossword puzzles.”
“This would never have come to light except in the electronic age, where you can track these things,” Shortz tells Roeder.
More generally, the case helps demonstrate the value of big data. “When you get the data into a nice, clean, dense form, stuff just falls out of it,” Pwanson tells Roeder.
“I guess that’s the nature of any data set,” Haldiman adds. “You might find things you’d rather not see.”