Spend some time on social media and you’ll see them: memes that set jokes or sayings against Victorian-style woodcut art. Now there’s a guy who wants to create a database to automate that process, using real Victorian jokes and real Victorian art.

Dr. Bob Nicholson, of Edge Hill University, won this spring’s annual British Library Labs Competition with his Victorian Meme Machine. Working with British Library Labs, he proposes to create a database of Victorian jokes, including metadata about each joke and where it was found.

We’ve seen a lot of digitization of ephemera in historic preservation lately. But Nicholson wants to go even further. He intends to take the British Library’s collection of Victorian art, develop a program to pair jokes with appropriate art, send them out into social media, and see which ones go viral. The reaction will also be incorporated into the program to help it do a better job of pairing jokes with art.

To digitize the jokes and put them in the database with metadata, Nicholson first has to find them. His goal is to house one million. That is not as daunting a task as one might think. Nicholson, who specializes in Victorian culture, insists that Victorians were a lot funnier than we realize.

“Jokes circulated at all levels of Victorian culture,” he writes in the British Library Digital Scholarship blog. “While most of them have now been lost to history, a significant number have survived in the pages of books, periodicals, newspapers, playbills, adverts, diaries, songbooks, and other pieces of printed ephemera. There are probably millions of Victorian jokes sitting in libraries and archives just waiting to be rediscovered.”

So far, Nicholson is focusing on looking for the jokes in the British Library’s collection of newspapers. “Many Victorian newspapers carried weekly joke columns containing around 30 gags at a time,” he writes. “Over the course of a year, a regularly printed column yields more than 1,500 jests. If we can develop an efficient way to extract jokes from these texts then we’ll have a good chance of meeting our target of 1 million gags.”
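As a rough check on that arithmetic, here is a minimal sketch. It assumes a column runs once a week, every week of the year (the quote says only "weekly" and "regularly printed"), and the variable names are purely illustrative.

```python
import math

GAGS_PER_COLUMN = 30       # "around 30 gags at a time" (from the quote)
ISSUES_PER_YEAR = 52       # assumption: one column a week, no gaps
TARGET_JOKES = 1_000_000   # the project's stated goal

# Jokes produced by a single column over one year of weekly issues.
jokes_per_column_year = GAGS_PER_COLUMN * ISSUES_PER_YEAR

# Number of "column-years" needed to reach the target, rounded up.
column_years_needed = math.ceil(TARGET_JOKES / jokes_per_column_year)

print(jokes_per_column_year)   # 1560
print(column_years_needed)     # 642
```

Under those assumptions, one column yields 1,560 jokes a year, which matches the "more than 1,500" figure, and roughly 640 column-years of material would be enough to reach one million.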

For the moment, Nicholson is limiting his search to dedicated joke columns in the papers. His plan is to discover the titles of such columns and then search the newspapers for them. “Obvious keywords like ‘jokes’ and ‘jests’ have proven to be effective, but we’ve also found material using words like ‘quips,’ ‘cranks,’ ‘wit,’ ‘fun,’ ‘jingles,’ ‘humour,’ ‘laugh,’ ‘comic,’ ‘snaps,’ and ‘siftings,’” he writes.

“However, while these general search terms are useful, they don’t catch everything.” One column, for example, is called “Buckwheat Cakes” because its jokes were imported from America, and Americans all ate buckwheat cakes. Get it?

Victorian humor is sometimes subtle.
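The keyword search described above can be sketched in a few lines. This is only an illustration, not the project's actual code: the function name is hypothetical, the matching rule (whole words, with an optional plural "s") is an assumption, and every sample title except "Buckwheat Cakes" is made up.

```python
import re

# Keywords quoted in the article, reduced to singular stems.
STEMS = ["joke", "jest", "quip", "crank", "wit", "fun", "jingle",
         "humour", "laugh", "comic", "snap", "sifting"]

# Whole-word match with an optional plural "s", case-insensitive.
# Assumption: whole-word matching keeps "fun" from matching "funeral".
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in STEMS) + r")s?\b",
    re.IGNORECASE,
)

def looks_like_joke_column(title: str) -> bool:
    """Return True if a column title contains one of the known keywords."""
    return KEYWORD_PATTERN.search(title) is not None

# Hypothetical titles, apart from "Buckwheat Cakes", which the article mentions.
for title in ["Jests and Jingles", "Wit and Humour",
              "Buckwheat Cakes", "Funeral Notices"]:
    print(title, "->", looks_like_joke_column(title))
```

A title like "Buckwheat Cakes" slips straight through a filter of this kind, which is why the column titles have to be discovered by hand before they can be searched for.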

Eventually, Nicholson hopes to develop a program that can scan the newspapers automatically and “recognize” a joke. “As our project develops, we’d like to experiment with some kind of joke-detection tool that picks out content with similar formatting and linguistic characteristics to the jokes we’ve already found,” he explains.

Next, the jokes must be digitized. That’s more challenging because the quality of the newspaper imagery is poor and optical character recognition (OCR) tools don’t do a good job, Nicholson explained in a lecture at the British Library. Consequently, he’s expecting to have to use manual transcription for this process, possibly through crowdsourcing.

Finally, Nicholson will have the jokes entered into a database, with metadata describing where the joke was found—which newspaper, what date, and so on. But it will also include information about the joke itself, such as the names of the characters and where they’re located. This is relevant because certain cities had particular reputations and this information was telegraphed to the astute reader, he explains.

“Women from Boston are always presented as over-educated and verbose,” he writes in a Reddit post. “Women from Chicago are often presented as being too forward or lacking the grace and refinement of other women. Men from Texas are always getting into gun fights.” The metadata will also record a category for each joke.
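To make the metadata concrete, here is one way such a record might be modeled. The project's real schema has not been published, so every field name below is an assumption, and the sample entries are placeholders rather than real jokes or real newspapers.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class JokeRecord:
    """One joke plus the metadata the article describes (hypothetical fields)."""
    text: str
    newspaper: str                  # where the joke was found
    published: date                 # issue date
    characters: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    category: str = "uncategorized"

def jokes_set_in(records: List[JokeRecord], place: str) -> List[JokeRecord]:
    """Return the records whose locations include the given place."""
    wanted = place.casefold()
    return [r for r in records
            if any(loc.casefold() == wanted for loc in r.locations)]

# Placeholder data for illustration only.
records = [
    JokeRecord("(placeholder joke text)", "Example Weekly Gazette",
               date(1890, 1, 1), ["a Boston lady"], ["Boston"], "courtship"),
    JokeRecord("(placeholder joke text)", "Example Weekly Gazette",
               date(1890, 1, 8), ["a Texan"], ["Texas"], "gun fights"),
]

print(len(jokes_set_in(records, "boston")))   # 1
```

Storing locations as their own field is what would let a researcher pull every joke that trades on a city's reputation without rereading the text.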

There’s no word on how long Nicholson expects the cataloging process to take, or when we can expect to start seeing real Victorian jokes and real Victorian art in our social media feeds. He began collaborating with the library after winning the contest in May and plans to showcase his work at the British Library Conference Centre on November 3, 2014. The database could also end up being used by groups such as historians and novelists looking for Victorian verisimilitude.

What is the point of cataloging Victorian jokes, metadata and all? Why would anyone bother to pair them with real Victorian art and see what takes off on social media?

Cataloging the jokes helps us to better understand our history, Nicholson says. If the project demonstrates that computers can recognize jokes in archived newspapers, it will advance text analysis—and the technique could be applied to other fields. “All code developed will be open source and posted on GitHub; all the data will be open access and available for re-use in whatever way people want,” he writes. “So, it should be possible to take a plain text file of all the jokes (or a particular sample) and plug it into things like text mining software.”

Similarly, generating the memes and then refining the program’s pairing strategy based on how people respond also helps develop artificial intelligence capabilities. And it offers a better understanding of humor itself.
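How responses might feed back into the pairing is not spelled out, so the following is only a guess at the simplest possible version: track the average number of shares for each combination of joke category and art tag, then prefer the best-performing art. The class, the method names, the tags, and the use of share counts as the signal are all assumptions.

```python
from collections import defaultdict

class PairingStats:
    """Running average of shares per (joke category, art tag) pairing."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, joke_category, art_tag, shares):
        """Log how many shares one posted meme received."""
        key = (joke_category, art_tag)
        self._totals[key] += shares
        self._counts[key] += 1

    def mean_shares(self, joke_category, art_tag):
        """Average shares so far; 0.0 for a pairing never tried."""
        key = (joke_category, art_tag)
        count = self._counts.get(key, 0)
        return self._totals.get(key, 0.0) / count if count else 0.0

    def best_art_for(self, joke_category, candidate_tags):
        """Pick the candidate art tag with the highest average so far."""
        return max(candidate_tags,
                   key=lambda tag: self.mean_shares(joke_category, tag))

# Made-up responses for illustration.
stats = PairingStats()
stats.record("courtship", "ballroom", 10)
stats.record("courtship", "ballroom", 30)
stats.record("courtship", "street", 15)

print(stats.best_art_for("courtship", ["ballroom", "street", "kitchen"]))  # ballroom
```

A version like this always favors whatever has worked before, so a real system would presumably also need to try untested pairings from time to time.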

It’s said that “man is the animal who laughs.” In the process of understanding what makes these jokes funny, we will better understand ourselves—and our Victorian ancestors.