The Internet Archive is working on two projects, the U.S. Medical Heritage Library and the U.K. Medical Heritage Library, that are intended to scan millions of pages of medical knowledge gleaned over decades and make them accessible to anyone over the Internet.

Admittedly, the U.S. and U.K. Medical Heritage Libraries are more “heritage” than “medical” these days—it’s unlikely that they will contain any information that would supersede the medical knowledge we have today. That said, it’s fascinating to see how people gradually put together the medical knowledge we now have. The project has been funded by grants. The Medical Heritage Library also features a Twitter feed that shares several items per day from the collection. 

Both libraries were put together by combining the collections of several other libraries. The U.S. project started in 2009 and first went online in 2010. Its initial participants included the National Library of Medicine and the libraries of Columbia, Harvard, and Yale Universities, writes Kevin O’Brien in the Journal of the Medical Library Association.

The U.K. Medical Heritage Library began in 2013 with the libraries of the Royal College of Physicians of London, the Royal College of Physicians of Edinburgh, and the Royal College of Surgeons of England. It has since added UCL (University College London), the University of Leeds, the University of Glasgow, the London School of Hygiene & Tropical Medicine, King’s College London, and the University of Bristol, according to the Wellcome Library, which is sponsoring the project. The U.K. library is intended to be complete this year.

“The MHL, modeled on the highly successful Biodiversity Heritage Library, is a collection of scanned public domain books on medicine, pharmacy, nursing, and allied areas,” O’Brien writes. “Its curators maintain a regularly updated home page and a Facebook page featuring news about the project, images from recently released books, and links to articles concerned with the history of medicine. MHL book scans are contributed by the participating institutions and are made available for in-browser reading and file download in a dedicated section of the Internet Archive. Basic metadata for each text is included, and downloads are available in portable document format (PDF), Kindle, and a variety of other file formats.”

The “allied areas” cover a broad range, including consumer health and sports and fitness, as well as some more arcane aspects of medical practice, from phrenology (diagnosing someone by the shape of their head) to hydrotherapy (using water for pain relief and treatment). Works on food and nutrition will also include around 1,400 cookbooks from the University of Leeds.

The U.K. books have been scanned by hand by a team of a dozen people at a rate of 800 pages per hour, writes Victoria Turk in Motherboard. The project is expected to include more than 15 million scanned pages by the time it’s completed.
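To get a feel for what those figures imply, here is a rough back-of-the-envelope sketch. It assumes the 15 million pages all pass through this team, and it treats 15 million as a lower bound. The article doesn't say whether 800 pages per hour is the whole team's rate or each person's, so the sketch works out both readings; the function and variable names are illustrative only.

```python
# Figures quoted in the article; everything else here is an assumption.
PAGES_TOTAL = 15_000_000   # "more than 15 million scanned pages" (lower bound)
PAGES_PER_HOUR = 800       # quoted scanning rate
OPERATORS = 12             # "a dozen people"


def scanning_hours(pages, pages_per_hour):
    """Hours needed to scan `pages` at a steady `pages_per_hour`."""
    return pages / pages_per_hour


# Reading 1: 800 pages/hour is the combined output of the whole team.
combined = scanning_hours(PAGES_TOTAL, PAGES_PER_HOUR)

# Reading 2: 800 pages/hour is what each of the 12 people manages.
per_operator = scanning_hours(PAGES_TOTAL, PAGES_PER_HOUR * OPERATORS)

print(f"Team rate of 800 pages/hour: {combined:,.0f} hours")
print(f"Individual rate of 800 pages/hour: {per_operator:,.1f} hours")
```

Either way, the scanning alone adds up to thousands of working hours, before any checking or cataloguing.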

“The books arrive in orange crates, having been pre-checked to make sure there are no duplicates already online,” Turk writes. “Each is given a stable URL from the start as a unique identifier. Some really thick tomes won’t work, as the scanner can’t reach right into the ‘gutter’ of the pages, leaving words chopped off. Many have a bandage of white ribbon holding their pages together so they don’t crumble apart.” And some, she adds, have uncut pages: After all this time, they’ve never been opened.

The people doing the scanning use a special machine rather than a typical scanner, Turk writes. “The book scanner puts a book, open, on a V-shaped platform, then uses the foot pedal to lift it to a V-shaped glass plate,” she writes. “Two Nikon cameras snap the two pages at once.” The machines use LED light to prevent any ultraviolet damage, she adds.

Once the books are digitized, readers can search by a particular word, title, or author, as well as by subject and keyword, O’Brien writes. The scanned material also features high-quality images that preserve the colors used in the original books.

Besides making all the information available over the Internet to anyone who might want it, the project serves other purposes that apply to a broad range of libraries, writes Simon Chaplin in the Guardian. First of all, it helps reduce duplication. “By matching against the archive’s existing holdings, as well as each other’s, the UK Medical Heritage Library partners can avoid scanning the same book several times,” he writes.

Second, working together helps libraries decide which books to keep. “Matching of catalogues against one another also means that, for the first time, libraries will start to get a picture of how much overlap there is between their collections,” Chaplin writes. “By working together, they can reduce duplication while ensuring that enough physical copies survive in sufficient geographical locations to provide security for the future. The net result is that a greater number of different books are preserved.”
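For readers curious how this kind of catalogue matching might work in practice, here is a minimal sketch. It is not the project's actual method, which the article doesn't describe. It assumes each catalogue record is a simple (title, author, year) tuple, and it matches records on a crudely normalized version of those fields. The library names, titles, and function names are all made up for illustration.

```python
from collections import defaultdict


def normalize(title, author, year):
    """Build a crude matching key: lowercase, drop punctuation, collapse spaces."""
    def clean(text):
        kept = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
        return " ".join(kept.split())
    return (clean(title), clean(author), str(year))


def match_catalogues(catalogues, already_online=()):
    """Return (to_scan, overlap) for a set of library catalogues.

    catalogues: dict of library name -> list of (title, author, year).
    already_online: records that are already digitized and need no rescan.
    to_scan: key -> libraries holding a book that is not yet online.
    overlap: key -> libraries, for books held by more than one library.
    """
    online = {normalize(*record) for record in already_online}
    holders = defaultdict(set)
    for library, records in catalogues.items():
        for record in records:
            holders[normalize(*record)].add(library)
    to_scan = {key: libs for key, libs in holders.items() if key not in online}
    overlap = {key: libs for key, libs in holders.items() if len(libs) > 1}
    return to_scan, overlap


# Hypothetical catalogues with one shared title and one title already online.
catalogues = {
    "Library A": [("On Fevers", "Author A", 1820),
                  ("Household Remedies", "Author B", 1801)],
    "Library B": [("On fevers.", "Author A", 1820),
                  ("Water Cures", "Author C", 1845)],
}
already_online = [("Household Remedies", "Author B", 1801)]

to_scan, overlap = match_catalogues(catalogues, already_online)
for key, libs in sorted(to_scan.items()):
    print("scan once:", key, "held by", sorted(libs))
for key, libs in sorted(overlap.items()):
    print("held in several places:", key, sorted(libs))
```

Real catalogue records are much messier than this, with variant titles, multiple editions, and inconsistent author names, so an actual matching effort would need far more careful rules than a simple normalized key.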

Finally, comparing how widespread particular titles are helps researchers determine which books were more canonical and trace the spread of medical knowledge, Chaplin writes. “It will be interesting to see whether matching the collection of, say, the Royal College of Physicians of Edinburgh against that of the Royal College of Physicians in London will show that doctors in Scotland and England had broadly similar views of what was important to read in the 19th century,” he writes. “This kind of understanding isn’t applicable solely to medicine; we’re well-placed to beat a path for others to follow.”
