Google Books: Far More Than Just Books
Dorothy A. Mays is the Head of Public Services at the Olin Library at Rollins College in Winter Park, Florida. She writes historical fiction under the pseudonym Elizabeth Camden. firstname.lastname@example.org.
Google Books is such a game-changing addition to the world of librarianship that we are only beginning to grasp the wealth of its potential benefits.
Some scholars are designing techniques for algorithmic searching, text mining, and statistical analysis of the digitized books in hopes of better understanding historical eras of literature.1 Much of the press about Google Books has been consumed with its legal quagmires and copyright concerns.2 Librarians often bemoan the woefully inadequate metadata, poor search capabilities, and quality control issues.3
Putting these issues aside, I’d like to explore the hidden bounty contained within Google Books that can enrich what a public library can offer its patrons. The first stumbling block in understanding the value of this database is its name: Google Books. Yes, Google Books contains plenty of fiction and nonfiction books, but there is a wealth of non-monograph ephemera, including government documents, retail catalogs, maps, city reports, directories, and illustrations that can be mined for genealogical and historical research.
One of the beauties of Google Books is the ability to search the entire text of millions of items, bypassing the necessity of hunting down known items or even familiarity with the published literature on the topic. All the patron needs is the name of an ancestor or a historical curiosity to begin the search. This article will focus on ways average readers, librarians, and genealogists can enrich their research by drawing on the variety of materials beyond mere monographs contained in Google Books.
Size and Scope of Google Books
Within its first ten years, Google Books has grown to contain an estimated twenty million items, something it took the Library of Congress two hundred years to achieve. Given the pace at which Google is scanning and adding material, it is well on its way to becoming the world’s largest collection of books within the next decade. In partnership with more than forty research libraries and over thirty-five thousand publishers worldwide, Google is scanning and making searchable the cultural heritage of nations from around the globe. Although the majority of the books are in English, it currently contains books written in over four hundred languages.4 In a 2010 research study comparing Google Books against other major research collections, Edgar Jones found that pre-1872 content available at Google Books was comparable or superior to that of the control libraries.5
Google scans two types of materials: pre-1923 works that are in the public domain and books published in 1923 or after that are likely to still be under copyright protection. The pre-1923 books are fully searchable and almost always can be read in their entirety online. With the cooperation of its publisher and library partners, Google is also scanning books that still retain copyright. The full text of these books is searchable, but due to copyright concerns, most will display only limited pages or snippet views of the keyword searched. It is estimated that approximately 80 percent of the items in Google Books fall into this limited-access category.6
Google scans materials going back as far as the fifteenth century, although people searching for material from the nineteenth century are going to find the largest treasure trove of full-text information. The nineteenth century was an era of emerging bureaucracies, research organizations, inexpensive printing, and an explosion of commercial endeavors. Much of the paperwork generated by these groups has been scanned and included in Google Books.
Few public libraries have the funds or space to provide large collections of historic documents or primary sources for their patrons. Google Books solves this problem nicely, but it may take a little digging to find the relevant information.
An excellent example of the richness of Google Books for nineteenth-century research can be demonstrated by a research project I undertook to reconstruct the lives of average citizens in the months following the great Chicago Fire of 1871. Surprisingly little research has been done on this crisis, which displaced one hundred thousand people in an era before emergency relief services. In the immediate aftermath of the fire, the city government of Chicago was overwhelmed with the need to clear the rubble, rebuild streets and railroad lines, provide emergency shelter for one hundred thousand people, and import a massive amount of food and building material into the city.
A simple search in Google Books for “Chicago Fire” will pull in over three hundred thousand hits. The algorithm tends to favor currency and will push modern materials to the forefront of the results list, but the real treasures are buried deeper. Using the “search tools” limiting feature that appears under the search bar, it is possible to select a custom date range using the “any time” drop-down feature. By searching for items published only in 1871 and 1872, I retrieved only a few hundred items, but these were mostly primary documents and pure gold for someone wanting insight into what life was like in Chicago immediately after the fire. The results contain records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs.
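The same date-limiting idea can be sketched in code for readers who collect results programmatically. The following is a minimal sketch, assuming result records shaped like the items returned by Google's Books API, where each item carries a `volumeInfo.publishedDate` string such as `1871` or `1871-10-09`. The function names and the sample records are hypothetical illustrations, not real search results.

```python
from typing import Optional


def published_year(volume: dict) -> Optional[int]:
    """Return the four-digit year that starts publishedDate, or None."""
    date = volume.get("volumeInfo", {}).get("publishedDate", "")
    year = date[:4]
    return int(year) if len(year) == 4 and year.isdigit() else None


def limit_to_years(volumes: list, first: int, last: int) -> list:
    """Keep only volumes published between first and last, inclusive."""
    kept = []
    for volume in volumes:
        year = published_year(volume)
        if year is not None and first <= year <= last:
            kept.append(volume)
    return kept


# Hypothetical records for illustration only.
sample = [
    {"volumeInfo": {"title": "Relief society report (made up)", "publishedDate": "1872"}},
    {"volumeInfo": {"title": "Modern history (made up)", "publishedDate": "2005-03-01"}},
    {"volumeInfo": {"title": "Undated pamphlet (made up)"}},
]

for volume in limit_to_years(sample, 1871, 1872):
    print(volume["volumeInfo"]["title"])
```

Records without a usable date are dropped rather than guessed at, which mirrors the caution needed with unevenly described items.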
Perhaps most surprising, I found reports from cities all over the country, as various relief organizations and town councils banded together to send help to Chicago. The fire was a nationwide catastrophe due to the ripple effect as hundreds of Chicago companies were plunged into bankruptcy, resulting in contracts that had to be canceled, disruption in the timber and beef industries, and the diversion of railway traffic. The crisis threw a wrench into the works of companies and industries all over the nation, and it is doubtful I would have gained this perspective had I done my research in person at Chicago-area libraries and archives. Google Books let me easily expand my search to archives across the country, leading me to serendipitous discoveries that let me study the catastrophe through a lens I had never anticipated. Many of the finds underscored the scope of the disaster as well as adding sometimes heartbreaking personal details.
In the years following the fire, a number of survivors wrote personal memoirs or accounts of their experiences. These memoirs were often cheaply published with print runs of only a few hundred copies, very few of which are extant today. These books have been scanned and made digitally available with the click of a mouse. One such book contained the texts of telegrams that flew in and out of the city during the chaotic first few days after the fire. Here is one from a shop owner telling his wife (who was visiting relatives in New York at the time of the fire) that they have lost everything: “Store and contents, dwelling and everything lost. Insurance worthless. Buy all the coffee you can and ship this afternoon by express. Don’t cry.” Stumbling across these rare and highly personal glimpses of historical life makes Google Books such a boon to people looking for a sense of life in an earlier era.
Having written a number of historical novels, I have come to rely on Google Books to reconstruct the details of nineteenth-century cities. While researching a novel set in Washington, DC, I found travel brochures that provided opening and closing times of the local museums, ticket prices for various theatres, and streetcar routes for navigating the city. Items such as telephone directories, budget reports, shopping catalogs, and social registers can also be found. In an early congressional directory I found several detailed floor plans for the U.S. Capitol in 1891. Try finding that in a post-9/11 world!
Prior to the advent of Google Books, this research would have required a trip to the cities in question, spending several weeks combing the archives and courthouse records. I estimate that my research via Google Books, due to full-text scanning capabilities and done from the comfort of my Florida home office, was faster and more complete than had I traveled to the cities in question.
A number of features in Google Books make it a godsend for genealogists, who are accustomed to prowling through massive archives on the hunt for fleeting references to their ancestors. Many of the resources genealogists rely on are not in Google Books: there is no systematic inclusion of census records, church archives, or passenger arrival lists. Nevertheless, Google Books contains many resources not typically used by genealogists, and the ease of full-text searching makes the hunt for serendipitous finds well worth any genealogist’s time.
Cities, states, and counties were often required to compile annual reports of their activities, and a good many such reports have been scanned into Google Books. Examples include reports from police departments, public schools, telephone companies, commodity exchanges, labor unions, and professional organizations, any of which may contain chance glimpses of a long-ago family member. Such reports chronicle the lives of ordinary people. For example, a search on my grandfather’s name and city turned up his application to have electricity added to his backyard garage in 1922. This reference appeared in a list of electrical licenses included in an annual report from the city engineers. This is not the sort of material typically at the forefront of a genealogist’s hunt, but a few clicks in Google Books may turn up many unexpected glimpses into the everyday lives of ancestors.
Full-text searching works best for people with unusual names. My great-grandfather, an immigrant from Germany, had the unusual name of Josef Auchter. A search on his name reveals only a handful of references, mostly from German-language regimental histories from the late nineteenth century. He gave his son an anglicized name, and there are hundreds of references to “Joseph Auchter,” most of whom are not my grandfather. The problem of duplicate names is a familiar one for genealogists, so narrowing the search by adding a known city, profession, or additional family member is a good way to refine the initial search. Because I knew the city where my grandfather was born, I was able to identify a handful of records merely by adding “Milwaukee” to the search.
Some of the best genealogical data is held on the county level, so searching Google Books by the county name, state, and limiting it to a decade in the nineteenth century is likely to yield interesting results. A search on “Wood County” and Ohio reveals probate records, regimental histories, commemorative histories, and some court records. Not everything is available in full-text, but there are links to buy, borrow, or order a print-on-demand copy.
This brings us to another value-added feature of Google Books. A number of companies are partnering with Google to provide inexpensive paper copies of out-of-print books. I have found these services to be fast and comparatively inexpensive. When I need an old nineteenth-century book for research, I am reluctant to purchase an antique copy and subject it to the abuse and scribbled marginalia that is my preferred style of research. For ten dollars I am often able to obtain a print-on-demand copy that will allow my librarian’s soul to rest easier as I underline and dog-ear at will.
A rich source of genealogy information can be the published oral histories; family genealogies; or the histories of a county, township, or village. These monographs are generally still under copyright protection, and Google may display only the snippet containing the search term. In such cases, this snippet view may provide enough information for the patron to request a scan of the relevant pages from the Family History Library in Salt Lake City, the largest genealogical research library in the world. Patrons who can provide a complete citation to a desired piece of information may email their request to the Family History Library and receive up to five image shots per month. The library reports that following their announcement of this service they received thousands of requests located through these limited snippet views in Google Books.
The lack of organization in Google Books may prove frustrating to librarians accustomed to complete catalog records, as these items usually lack metadata and reflect a wildly uneven collection of items. Because Google leans heavily on their forty participating research libraries, the geographic regions surrounding those universities are better represented than other areas. Lack of reliable cataloging aside, the full-text search capabilities make moving through the records comparatively quick and painless.
The erratic nature of the quality and quantity of materials found using Google Books cannot be emphasized enough. A city like Chicago will have a rich set of results because of Google’s partnership with Northwestern University. Other geographic regions and subject areas will not be so well served, but with over a million new items being scanned and added each year, researchers should periodically revisit Google Books to see if anything of interest has appeared.
For a comprehensive search on a specific topic, Google Books is once again likely to produce a colorful and diverse set of documents that trace a historic event as it unfolds. A good example is the massive engineering project to fill in Boston’s Back Bay. Beginning in the early nineteenth century, Boston undertook the nation’s most ambitious landfill project by filling many of the bays and inlets along its shoreline, a project that ultimately created over a thousand acres of new land. Most of this work was done between 1855 and 1894, and was well-documented.
A search on “Back Bay” and Boston, limited to full-text items from the nineteenth century, will yield a tremendous variety of documents, including:
- maps of the ongoing landfill progress;
- full text of city council reports on the project;
- reports from the city’s engineering office;
- financial documents relating to funding and expenditures;
- guide books for the city; and
- real estate brochures for the newly available tracts of land, houses, and shops built in the Back Bay.
Because of Google’s full-text searching, some of the results from the above query will be only tenuously related to the Back Bay, such as Clark’s Boston Blue Book: Ladies Visiting and Shopping Guide, published in 1900 and containing over six hundred pages of club memberships, photographs, seating charts for Boston theatres, and the addresses and functions of local municipal agencies. The Back Bay is mentioned only for some of its sporting clubs and dining establishments. Perhaps the most interesting feature of this book for those thirsty for trivia is the advertisements, which promoted everything from banking, pharmaceuticals, and millinery to cab services.
One of the frustrations with Google Books is the lack of traditional cataloging. For example, trying to find an early telephone directory for a particular city can be a challenge because in the late nineteenth century this item might be called a guide, customer list, telegraphic address book, register, or perhaps “the Buffalo Directory.” People using Google Books need to have patience and the desire to hunt through immense lists of items in hopes of serendipitously stumbling across something of interest.
The lack of traditional cataloging also means there is no safety net to catch variations in spelling. This is especially important to keep in mind when searching British versus American spelling. As most genealogists are aware, variants in proper names are common in census and other historical records, but search operators can help. For example, Schwartz Josef OR Joseph OR Josephus will retrieve all three spelling variants.
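When a family name has several variants, the query string can be assembled mechanically. The following is a minimal sketch: the helper names `or_group` and `build_query` are hypothetical, and the code only builds the text to paste into the search box. It makes no claim about how Google parses the operators beyond the example above.

```python
def or_group(variants):
    """Join spelling variants with OR, quoting any multi-word variant."""
    return " OR ".join(f'"{v}"' if " " in v else v for v in variants)


def build_query(surname, given_variants, extra_terms=()):
    """Assemble extra terms, the surname, and the OR-joined given names."""
    parts = list(extra_terms) + [surname, or_group(given_variants)]
    return " ".join(parts)


print(build_query("Schwartz", ["Josef", "Joseph", "Josephus"]))
# prints: Schwartz Josef OR Joseph OR Josephus

print(build_query("Auchter", ["Josef", "Joseph"], extra_terms=["Milwaukee"]))
# prints: Milwaukee Auchter Josef OR Joseph
```

Narrowing terms such as a city are placed first so that the chain of OR variants stays together at the end of the query, in the same form as the Schwartz example.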
Although Google Books suffers from poor metadata, all the items are coded with year of publication and can be searched by specific date range. This means if you wish to research the state of technical or social awareness during a specific time frame, it is easy to search for scientific reports limited to a chronological era. Google Books is particularly rich in US government documents, and the abundance of scientific reports from the Smithsonian and research organizations (for example, the Agriculture Department, the Weather Bureau, the Patent Office, the Geological Survey, the Bureau of Ethnology) makes it easy to generate a set of concise reports. These government reports are excellent windows into the state of knowledge during your defined time frame.
Another way to glean insight into the state of knowledge during a particular era is to consult an early edition of the Encyclopedia Britannica for the year closest to your time period. Hunting through this monumental encyclopedia can be a challenge, but it is an excellent and comprehensive source for documenting the state of human knowledge from 1768 to the present (although full-text searching ceases with 1923).
Although the majority of items in Google Books are indeed “books,” I think it is more useful if librarians consider it to be a source of information rather than any kind of traditional set of books. Google Books is full of brochures, product manuals, directories, maps, church records, newsletters, government documents, and a huge range of ephemera. Other than the ability to refine your search by a specific year of publication, there are no sophisticated search capabilities. It is like dipping a bucket into a deep well, holding your breath, and praying you’ll find something of interest in the vast results you pull up into the light of day. You will generally find something interesting, but it will require plenty of hunting and pecking.
One of the biggest frustrations is managing our users’ expectations for full text. There are two possible solutions:
- Limit results to “free Google eBooks” under the “Any Book” dropdown option.
- Make use of the print-on-demand option. For out-of-print books, it is usually affordable and has a fast turnaround.
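For readers who script their searches rather than use the web menus, a comparable full-text restriction can be sketched against Google's public Books API, which accepts a `filter` parameter with a `free-ebooks` value. This is a minimal sketch under that assumption. The helper name `free_ebook_search_url` is hypothetical, and the code only builds the request URL without contacting any server.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Books API volumes search.
API_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"


def free_ebook_search_url(query: str, max_results: int = 20) -> str:
    """Build a volumes-search URL restricted to free, full-view ebooks."""
    params = {
        "q": query,                 # same search terms as in the web interface
        "filter": "free-ebooks",    # assumption: mirrors the free eBooks menu option
        "maxResults": max_results,  # the API caps this value at 40
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(free_ebook_search_url('"Back Bay" Boston'))
# prints: https://www.googleapis.com/books/v1/volumes?q=%22Back+Bay%22+Boston&filter=free-ebooks&maxResults=20
```

Whether the API filter returns exactly the same set as the web menu option is not something this sketch verifies, so results are worth spot-checking against the web interface.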
For the librarians who fear Google Books or bemoan its lack of satisfactory cataloging, perhaps it will be comforting to learn of the experience at the Complutense University of Madrid. After partnering with Google and having thousands of its books loaded into Google Books, the university noticed a spike in circulation of the items that were made available in digital copies. This is curious because Complutense provided Google only with books in the public domain, and these books were viewable in their entirety online. The mere presence of these digital copies appeared to have sparked interest in the paper copies, an effect not seen among the university’s books of similar age and topics that had not been digitized by Google. The university’s assumption is that Google generates more exposure for the book, which ultimately redounds to the print copy.7
Despite its immense size, Google Books is still in its infancy. Since its introduction in 2004, it has been the target of copyright lawsuits and deep suspicion of its potential to create a corporate monopoly over the world’s cultural heritage. Love it or hate it, Google Books represents one of the most significant developments in the last century of librarianship. Its poor indexing and limited search refinements are overshadowed by the ease of its full-text searching and the wonderful ephemera that enrich its holdings far beyond mere “books.”
References and Notes
1. See for example, Mark Davies, “Making Google Books n-grams Useful for a Wide Range of Research on Language Change,” International Journal of Corpus Linguistics 19, no. 3 (2014): 401–16; Paula Findlen, “How Google Rediscovered the 19th Century,” Chronicle of Higher Education (Aug. 2, 2013): B2; Andrew Stauffer, “The Nineteenth-Century Archive in the Digital Age,” European Romantic Review 23, no. 3 (2012): 335–41.
2. See for example, Clarice Castro and Ruy de Queiroz, “The Song of Sirens: Google Books Project and Copyright in a Digital Age,” Information, Communication & Society 16, no. 9 (2013): 1441–455; Marina Lao, “The Perfect is the Enemy of the Good: The Antitrust Objections to the Google Book Settlement,” Antitrust Law Journal 78 (2012): 397–442; Alok Sharma, “Google Book and Copyright: A Critical Perspective,” Social Science Research Network (2013), accessed Sept. 23, 2014.
3. See for example, Millie Jackson, “Using Metadata to Discover the Buried Treasure in Google Book Search,” Journal of Library Administration 47, nos. 1–2 (2008): 165–73; Ryan James and Andrew Weiss, “An Assessment of Google Books’ Metadata,” Journal of Library Metadata 12, no. 1 (2012): 15–22; Julia T. Pope and Robert P. Holley, “Google Book Search and Metadata,” Cataloging and Classification Quarterly 49, no. 1 (2011): 1–13.
4. Peter Baron, “The Library of the Future: Google’s Vision for Books,” Learned Publishing 24, no. 3 (2011): 198.
5. Edgar Jones, “Google Books as a General Research Collection,” Library Resources & Technical Services 54, no. 2 (2010): 77–89.
6. Castro and de Queiroz, “Song of the Sirens,” 1448.
7. Suzanne Bjørner, “Complutense University of Madrid: Different Language, Similar Experience,” Searcher 15, no. 4 (2007): 22.