A Publication of the Public Library Association Public Libraries Online

Artificial Intelligence Used to Search Handwritten Manuscripts

January 17, 2018

Adam Matthew Digital is a UK company that digitizes unique primary sources, including periodicals, correspondence, photographs, and even handwritten manuscripts, held in archives around the world. It shares its collections, which range from Medieval Travel Writings to World War Propaganda and Eighteenth Century Journals, with researchers, universities, and libraries. Last month, the company announced the launch of Handwritten Text Recognition (HTR), an artificial intelligence (AI) technology that searches the full text of its handwritten manuscript collections. It is the first primary source publisher to use AI to enhance its search capabilities.[1]

Glyn Porritt, head of technical for Adam Matthew, said, “It continues to return really remarkable results on even poor quality handwriting. We have undertaken research on samples of material and our estimates are an equivalent of 90 percent accuracy.”[2]

Handwriting recognition software has been in use for a couple of decades for signature verification at banks and mail sorting at post offices. Such systems have advantages, however: the address system for sorting mail can validate its results against ZIP code databases, and modern note-taking apps use machine learning to adapt to the handwriting of their most frequent user.

HTR has neither of these advantages, because it works with manuscripts written in a variety of handwriting styles and has no pre-existing database to cross-check against. Instead, HTR uses complex algorithms to determine possible character combinations in the text. This allows handwritten text to be identified at document level, so users can easily navigate search results. Search results are displayed as snippets from the manuscript; selecting a snippet takes the user to the corresponding page of the manuscript.
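The article does not describe Adam Matthew's actual algorithm, so the following is only a toy sketch of the general idea: if a recognizer keeps several candidate characters per position instead of a single transcription, a search can still match a word whose top reading is wrong. Every name here (`WordCandidates`, `search`, the confidence values, the threshold) is a hypothetical chosen for illustration, not part of the HTR product.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple


@dataclass
class WordCandidates:
    """One handwritten word (hypothetical structure): its page number and,
    for each character position, candidate characters with a confidence."""
    page: int
    positions: List[Dict[str, float]]

    def best_guess(self) -> str:
        # Most confident character at each position.
        return "".join(max(p, key=p.get) for p in self.positions)

    def score(self, term: str) -> float:
        # Product of the confidences for the term's characters;
        # 0.0 if the lengths differ or a character is not a candidate.
        if len(term) != len(self.positions):
            return 0.0
        return prod(p.get(ch, 0.0) for p, ch in zip(self.positions, term))


def search(words: List[WordCandidates], term: str,
           threshold: float = 0.1, context: int = 1) -> List[Tuple[int, float, str]]:
    """Return (page, score, snippet) for each word that plausibly reads as term,
    sorted from most to least confident."""
    term = term.lower()
    hits = []
    for i, word in enumerate(words):
        s = word.score(term)
        if s >= threshold:
            start = max(0, i - context)
            snippet = " ".join(w.best_guess() for w in words[start:i + context + 1])
            hits.append((word.page, s, snippet))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Made-up sample: three words on page 12; the middle one is ambiguous.
words = [
    WordCandidates(12, [{"t": 0.9}, {"h": 0.8, "b": 0.2}, {"e": 0.9}]),
    WordCandidates(12, [{"t": 0.9, "l": 0.1}, {"e": 0.6, "c": 0.4}, {"a": 0.55, "o": 0.45}]),
    WordCandidates(12, [{"t": 0.8}, {"a": 0.7}, {"x": 0.9}]),
]

hits = search(words, "teo")
```

In this sample the middle word's best guess is "tea", yet a search for "teo" still finds it, because "o" survives as a lower-confidence candidate. Each hit carries a page number and a short snippet, mirroring the snippet-then-page navigation described above.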

Standard Optical Character Recognition (OCR) software can fail to decipher text in printed documents that have uncommon scripts, unusual spacing, water damage, stains, or fading. HTR has similar issues and sometimes can’t decipher even manuscripts written in legible script.

The first collection available with this new search feature is Colonial America, Module III: The American Revolution. This collection was sourced from the National Archives UK and offers thousands of documents on North America from 1606 to 1822, making HTR a vital tool for navigating this content. “Manuscript volumes rarely have indexes,” Porritt said. “Keywords and metadata have traditionally brought the researchers towards the relevant document but then they have to find pertinent area of that work themselves. With HTR technology, the user can be taken straight to a highlighted word or words.”

This game-changing technology will be released in both new and selected existing collections published by Adam Matthew, including Medical Services and Warfare, East India Company, and Mass Observation Online.


References

  1. Adam Matthew, a Sage Company. (2017). Artificial intelligence transforms discoverability of 17th and 18th century manuscripts using handwritten text recognition. [Press release]. Retrieved from http://blog.apastyle.org/apastyle/2010/09/how-to-cite-a-press-release-in-apa-style.html
  2. Enis, M. (2017, October). Adam Matthew Enables Full-Text Search of Handwritten Manuscripts. Library Journal. Retrieved from http://lj.libraryjournal.com/2017/10/technology/adam-matthew-enables-full-text-search-handwritten-manuscripts/#_
