A Publication of the Public Library Association Public Libraries Online

Creating a Digital Archive: It’s Harder Than You Think

Published on March 9, 2017

History buffs get excited whenever a state or local agency announces the digitization of a huge collection of newspapers, birth and death records, and other archives. We all want the Holy Grail: convenient online access from home that lets us drill down and find information ranging from genealogy records to crime reports. Such was the buzz when the State Historical Society of Iowa announced it was partnering with a Cedar Rapids business, Advantage Companies, to digitize 12 million pages of Iowa newspaper history. However, the reality of creating a digital archive is much more complex, and making it available online is risky and expensive. For a library, a museum, or any other state agency, creating an online archive is harder than you think.

Managing the Project

We’ve all seen the sign on someone’s desk reading “the buck stops here.” But how did that sign get there? Who made that person responsible? It comes down to project management and who is best at it. The Iowa newspaper collection is still owned by the State Historical Society of Iowa, but is loaned to Advantage to scan and digitize. The company bears the cost in exchange for potential profit. So who is in charge of the transfer of the papers and microfiche? Who trains the archivists in the handling of the information? And who is responsible if something gets damaged? The library has experts at archiving and preserving. Advantage has experts in scanning and digitization, along with the equipment to do so. Both organizations have project managers of varying skill levels, and even an untrained project manager can succeed with the right tools. The key is that, from the beginning, one project manager must be selected and given final authority over decisions.

Managing the Resource

Once digital files have been created, they need to be stored in more than one place. The resources must also be cataloged and managed, keywords chosen and added, and some kind of database or other search portal created. This work can be done by Advantage or by the library. Using a program called FileMaker, the library can manage the data and control how much of it is accessible through the cloud, so that it does not have to maintain a server onsite. All those pages of newspaper articles add up to a lot of data, and migrating that much information to the cloud takes time, even if you do so incrementally. Making it easily searchable is another issue, requiring either the creation of a proprietary program or the modification of an existing one. If this were not problematic enough, there is the issue of security and access.
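To get a feel for why migrating this much data takes time, a back-of-the-envelope estimate helps. The sketch below is only an illustration: the 12 million pages come from the Iowa project, but the average file size per page, the upload speed, and the share of the connection actually available for the transfer are all assumed figures, not numbers reported by the State Historical Society of Iowa or by Advantage.

```python
def estimate_upload_days(pages, mb_per_page, uplink_mbps, utilization=0.5):
    """Estimate how many days it takes to upload a scanned collection.

    pages        -- number of scanned pages
    mb_per_page  -- average size of one scanned page, in megabytes
    uplink_mbps  -- nominal upload speed, in megabits per second
    utilization  -- fraction of that speed realistically available
    """
    total_megabits = pages * mb_per_page * 8      # 8 bits per byte
    seconds = total_megabits / (uplink_mbps * utilization)
    return seconds / 86_400                       # seconds in a day


PAGES = 12_000_000            # from the article
ASSUMED_MB_PER_PAGE = 5       # assumption: one scanned page image
ASSUMED_UPLINK_MBPS = 100     # assumption: the institution's upload speed

total_tb = PAGES * ASSUMED_MB_PER_PAGE / 1_000_000
days = estimate_upload_days(PAGES, ASSUMED_MB_PER_PAGE, ASSUMED_UPLINK_MBPS)
print(f"{total_tb:.0f} TB, about {days:.0f} days of continuous uploading")
```

Under these assumed figures the collection comes to roughly 60 TB and about 111 days of uninterrupted uploading. Different scan settings or a faster connection would change the answer considerably, which is why the estimate is worth running with your own numbers before committing to a schedule.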

Managing Access

Passwords prevent casual or accidental access to data. Administrative passwords that allow data to be modified or even deleted must be extremely secure. Backing up data in several locations is essential to preservation: digital data is no more secure than the server or servers where it is held. It is essential to review the basics of security management when undertaking any project.
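Keeping copies in several locations only helps if the copies are actually intact. One common way to check is to compare checksums of the original files against each backup. The following is a minimal sketch of that idea using Python's standard library; the function names and the folder layout are hypothetical, and nothing here describes how the Iowa project or Advantage verifies its backups.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(primary_dir, backup_dir):
    """List files in primary_dir that are missing or altered in backup_dir.

    Returns (relative_path, problem) pairs, sorted by path.
    Files that exist only in the backup are not reported.
    """
    primary_dir, backup_dir = Path(primary_dir), Path(backup_dir)
    problems = []
    for original in sorted(primary_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(primary_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(original) != sha256_of(copy):
            problems.append((relative.as_posix(), "checksum differs"))
    return problems
```

Calling `find_mismatches("scans", "backup_copy")` on two hypothetical folders returns an empty list when every file in the first has an identical copy in the second. Run on a schedule against each backup location, a check like this can catch silent corruption before the last good copy is gone.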

No matter how secure they make us feel, servers are just servers. There really is no cloud: the cloud is merely someone else’s computer. If you do not have the room or capability to maintain a large enough server onsite, there should still be a physical backup of the cloud data somewhere, and perhaps a copy on more than one cloud service as well. At the same time, information that cannot be accessed is essentially useless. Users must be able to read and search the archives for the purposes of research, even if they must do so at a library rather than from home. In the Iowa instance, Advantage is struggling with this issue. Their business model was set up to provide access at the library location, and they made a web app almost as an afterthought. It easily handled smaller datasets, but was not designed for one this large. The decision of how to manage this tension between access and security will vary by product, but it will also vary by how you pay for it.

Paying for it All

The Iowa circumstances are unique. The archive is on loan to a company which shoulders the costs in exchange for what they hope will be future profits. This creates a tension for them: they must determine how to monetize the data they gain. The arrangement makes things easier for the State Historical Society of Iowa, but the digitization process is just the beginning. The costs then turn to operations and maintenance: the daily, monthly, and yearly costs of keeping data secure and accessible, and of adding to it over time. These costs can be enormous. While it is easy to get grants for an initial project, it is much more difficult to get them for day-to-day operations. With many library budgets shrinking, this means they must get creative with how they fund this type of project.

However, if the project is funded with a federal grant initially, often one of the conditions in the grant language is that archives must be open and accessible to the public online and free of charge. Many library and university collections are getting around this by allowing access, but charging for copies of files or downloads of pictures. Still, this is often not enough to pay daily costs. Archiving is an expensive process, and even maintaining a digital archive is not cheap. Someone must bear that financial burden. In Iowa, being able to search 12 million pages of archives sounds like a dream come true. Until these obstacles can be fully overcome, it may remain just that: a dream.

