When the guy known as “the father of the Internet” tells you to worry about something, it’s a good idea to pay attention.
Dr. Vint Cerf, who helped develop the TCP/IP protocols on which the Internet is based, and is now vice president and chief Internet evangelist for Google, warned attendees at a recent American Association for the Advancement of Science meeting of a possible upcoming “digital dark ages.”
If future generations lose access to either the storage media that holds so much of our data or the software needed to read it, because the programs required to view those files become defunct, mounds of digital material will be lost forever. In fact, he went so far as to call the 21st century “the forgotten century” because of the risk of losing its history.
“We’re going to have to build into our thinking the concept of preservation writ large,” Cerf told the group. “We don’t want our digital lives to fade away. If we want to preserve them, we need to make sure that the digital objects we create today can still be rendered far into the future.”
While some are calling for a return to paper, that would be throwing the baby out with the bathwater. First, paper has its own vulnerabilities—just recall how much trouble people are having reading volcano-crisped scrolls and water-damaged palimpsests (which Cerf also cited in his talk). And let’s not even talk about the library at Alexandria.
Second, born-digital and scanned documents offer many advantages over paper documents, such as ease in sharing and distributing the information. A paper document can be held by only one person at a time, while a scanned document can be shared among many. Indeed, many institutions are scanning their paper holdings, ranging from Darwin’s library to the Vatican’s collections, to make them more accessible to others.
Keep in mind, too, that Cerf is thinking about preservation in the context of decades, centuries or even more. “If we’re thinking 1,000 years, 3,000 years ahead in the future, we have to ask ourselves, how do we preserve all the bits that we need in order to correctly interpret the digital objects we create?” he said.
That stipulated, what’s the best way to ensure that your digital records stay readable for years to come?
“At a high level, the way to solve this would be to maintain, at a minimum, read compatibility with older data even as new technologies are introduced without worrying about performance, capacity or cost,” Eric Burgener, a research director with IDC, told CIO. “The devil, of course, is in the details.”
So what are some of those details?
- Keep up to date with storage technology, and in the process, migrate data to the new technology. Idaho’s Ada County, for example, is just now migrating from Zip disks, which were introduced in 1995 and stopped being made in 2003, leaving the county scouring eBay and Craigslist to find replacement drives.
- While you’re migrating data to new storage formats, save it in newer data formats as well.
- “A longer term option may be to store your data in the cloud with giant Internet corporations (e.g. Google, Amazon, Microsoft),” as those companies will update storage technology for you, writes Michael Zhang at PetaPixel.
- Organizations such as the Library of Congress and the Internet Archive are storing data such as Twitter posts and web pages, notes the Washington Post. The Library of Congress is also recommending archival formats, reports GCN.
- The industry is working on open data formats, rather than proprietary ones, which will make it easier to read data files even if the vendor goes out of business, writes the Guardian, which adds that the biggest problem the future may have is trying to pick the wheat from the chaff of our voluminous digital files.
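When migrating data to new media, it’s also worth verifying that nothing was silently corrupted in transit. As one illustration of the idea (not any specific product’s method), here is a minimal Python sketch that checksums each file before and after copying; the paths and the `migrate` helper are hypothetical examples:

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir and verify the copy bit-for-bit."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    original_digest = sha256_of(src)
    shutil.copy2(src, dest)  # copy2 preserves timestamps along with content
    if sha256_of(dest) != original_digest:
        raise IOError(f"Checksum mismatch migrating {src}")
    return dest
```

Storing the digests alongside the archive also lets future curators confirm, decades later, that the bits themselves are still intact—even if interpreting them remains the harder problem Cerf describes.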
Ultimately, the solution may be what Cerf calls “digital vellum”: projects intended to preserve both data and the means to read it for long periods to come. Stored under the right conditions, vellum documents can reportedly last for more than 1,000 years. Cerf has been promoting this concept for the past year or so.
One such project is Carnegie Mellon University’s OLIVE (Open Library of Images for Virtualized Execution), which performs regular snapshots of the data, as well as the software and operating system required to read it. “Take a snapshot of the entire computer, including the document, the settings, the program, the operating system itself and store it safely,” CNN writes. “All of that information is essential because computers in the future won’t have any context for understanding the programs we rely on today.”
Of course, such a system creates another problem, Cerf admits. “The files required to store these digital snapshots forever will be huge, but that will not be a fundamental problem, according to Mr. Cerf,” writes the Financial Times. “Data storage is getting so cheap that I don’t worry about that,” he told the paper. “I worry about how to find something in it.”
Simplicity 2.0 is where we examine the intricate and transitory world of technology—through a Laserfiche lens. By keeping an eye on larger trends, we aim to make software that’s relevant to modern day workers, rather than build technology for technology’s sake.
Subscribe to Simplicity 2.0 and follow us on Twitter. If what we’re saying piques your interest, head over to Laserfiche.com where you’ll see how we apply the lessons learned on Simplicity 2.0 to our own processes, products and industry.