Here are a few of our favorite things: libraries, old documents, maps, open government data, and crowdsourcing. What’s not to like?
That’s why we are so intrigued by the Building Inspector project at the New York Public Library (NYPL). The map room of the library—which houses more than 500,000 maps and 20,000 books and atlases—stores a series of insurance atlases from the mid-19th to the mid-20th centuries. These atlases were created by surveyors to assess property value for the various buildings in New York City, and included data such as their shape, their address, the building material used, and so on. The library wanted to digitize these maps, put them online and give people access to them.
But digitizing the maps manually takes a long time. In three years, staff and volunteers were able to digitize 120,000 buildings—which was, basically, Manhattan and Queens for the year 1853. Seeing that it was going to take too long to do this by hand, the library looked for a way to automate the process.
First, the library’s geospatial specialists had to decide on a definition of a “building” for the purpose. It ended up being: enclosed by black lines, larger than 20 sq. meters, smaller than 3000 sq. meters, and different from the color of the paper background. Then they wrote a program that looked at a black-and-white version of the maps, turned the images into vectors, synced them up with geographic coordinates, turned those into shapes, and ultimately decided which shapes were actually buildings.
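The definition above amounts to a simple filter over candidate shapes. What follows is a minimal sketch of that idea, not the library's actual program: the `Shape` class, the paper color, and the color tolerance are hypothetical, and it assumes the vectorizing step has already produced closed outlines measured in meters. Only the two area limits come from the definition itself.

```python
from dataclasses import dataclass

MIN_AREA_M2 = 20.0            # stated limit: larger than 20 sq. meters
MAX_AREA_M2 = 3000.0          # stated limit: smaller than 3000 sq. meters
PAPER_RGB = (236, 224, 200)   # assumed paper color, for illustration only
COLOR_TOLERANCE = 30          # assumed cutoff for "different from the paper"


@dataclass
class Shape:
    """Hypothetical candidate shape: a closed outline plus its average fill color."""
    points: list     # outline as [(x, y), ...] in meters
    fill_rgb: tuple  # average (r, g, b) inside the outline


def polygon_area(points):
    """Shoelace formula: area of a simple (non-self-intersecting) polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def differs_from_paper(rgb):
    """True when any color channel is far enough from the assumed paper color."""
    return max(abs(a - b) for a, b in zip(rgb, PAPER_RGB)) > COLOR_TOLERANCE


def is_building(shape):
    """Apply the size and color tests; closed outlines are assumed upstream."""
    area = polygon_area(shape.points)
    return MIN_AREA_M2 < area < MAX_AREA_M2 and differs_from_paper(shape.fill_rgb)


house = Shape([(0, 0), (10, 0), (10, 8), (0, 8)], (220, 150, 150))
speck = Shape([(0, 0), (2, 0), (2, 2), (0, 2)], (220, 150, 150))
blank = Shape([(0, 0), (10, 0), (10, 8), (0, 8)], PAPER_RGB)
print(is_building(house), is_building(speck), is_building(blank))
```

In this sketch the 80-square-meter pink rectangle passes, the 4-square-meter speck is rejected for size, and the paper-colored rectangle is rejected for color.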
This was a much faster process. Instead of producing 120,000 buildings in three years, the program could produce 66,056 buildings in a single day. However, the results weren’t perfect. The software didn’t enforce adjacency, so some building shapes that should have touched each other didn’t, while others overlapped, which real buildings of course don’t. There were also some false positives and false negatives: the program thought something was a building when it wasn’t, or failed to recognize something that actually was one.
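To put the two rates on the same footing, here is a rough back-of-the-envelope calculation. It assumes the manual work was spread evenly over three 365-day years, which is a simplification for illustration rather than a figure from the library.

```python
MANUAL_BUILDINGS = 120_000   # digitized by hand
MANUAL_YEARS = 3
AUTOMATED_PER_DAY = 66_056   # produced by the program in one day
DAYS_PER_YEAR = 365          # assumption: work spread evenly across every day

manual_per_day = MANUAL_BUILDINGS / (MANUAL_YEARS * DAYS_PER_YEAR)
speedup = AUTOMATED_PER_DAY / manual_per_day

print(f"manual: about {manual_per_day:.0f} buildings per day")
print(f"automated: about {speedup:.0f} times faster")
```

Under that assumption the manual effort averaged roughly 110 buildings a day, so the program was on the order of 600 times faster.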
That’s when the library decided crowdsourcing was the way to fix the maps. The Building Inspector program (from which the project gets its name) lays a copy of the shapes that the vectorizing software created over the map itself, and then invites people to say whether the software drew the building correctly, identified something that wasn’t a building, or produced a shape that was a little off but could be fixed, a process known as “rectifying.”
The next question: Would people be willing to help by checking the shapes against the insurance atlas maps?
The answer was a resounding yes: In a single day, more than 77,000 buildings were “inspected” by comparing the computer-drawn shapes with the original maps; in three days, more than 163,000 were done. The most important finding was that much of the vectorized data (84 percent) was actually correct, and that only 7 percent needed to be fixed; the remaining 9 percent weren’t really buildings.
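The article does not say how individual inspections were combined into those percentages. One common approach in crowdsourcing is a simple majority vote per shape, sketched below as an assumption; the labels `"yes"`, `"fix"`, and `"no"`, the three-vote minimum, and the function names are all hypothetical.

```python
from collections import Counter


def consensus(votes, min_votes=3):
    """Return the strict-majority label, or None if the shape is still undecided."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def verdict_shares(votes_by_shape):
    """Percentage of decided shapes that received each consensus label."""
    decided = [v for v in map(consensus, votes_by_shape.values()) if v is not None]
    if not decided:
        return {}
    return {label: 100 * n / len(decided) for label, n in Counter(decided).items()}


votes = {
    "shape-a": ["yes", "yes", "yes"],
    "shape-b": ["yes", "fix", "yes"],
    "shape-c": ["no", "no", "yes"],
    "shape-d": ["fix", "yes"],        # too few votes, left undecided
}
print(verdict_shares(votes))
```

Here two of the three decided shapes come out as correct and one as not a building, while the fourth waits for more inspections.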
Now the library has moved to the next phase: In addition to saying whether the shape, or “footprint,” of the building is correct, people can also type in the address, if it’s shown; fix the shape of the building, if it is incorrect; and specify the color shown in the original map. In addition, the library has added Brooklyn to the project.
Beyond chronicling change and discovering lost cities, the project has a practical purpose. The information is useful during natural disasters. For example, older maps combined with the expertise developed in this project were used in Haiti after the earthquake to help determine where people might need to be rescued by showing where buildings were, or had been.
The library’s staff is also excited about the vast array of historical georeferenced data, and how it could be coordinated. “The mind boggles when one extrapolates outward because what is being imagined is a kind of time machine: detailed, drillable layers of urban history down to the individual address or landmark,” writes Ben Vershbow, Manager, NYPL Labs, in his paper Hacking the Library. “And when the lens expands outward to include other library collections with a geographical dimension (both at NYPL and beyond)—residential and business directories, local newspapers and periodicals, literary archives, corporate records, photographs, prints, menus, playbills, church registries, the list goes on—one begins to see an intricate needlework of inter-stitched data, cross-referencing through time and space.”
Simplicity 2.0 is where we examine the intricate and transitory world of technology—through a Laserfiche lens. By keeping an eye on larger trends, we aim to make software that’s relevant to modern day workers, rather than build technology for technology’s sake.