A Publication of the Public Library Association Public Libraries Online

Library of Congress Slims Down Twitter Archive

March 2, 2018

The Library of Congress announced in December 2017 that it would no longer collect every stray thought, joke, announcement, or governmental policy change posted to Twitter, scaling back a collection it had developed over the previous eight years.

A Brief History of the Library’s Tweet Collection

The Library of Congress announced its acquisition of the Twitter archive in April 2010 with a blog post titled “How Tweet It Is!” At the time, Twitter saw more than 50 million tweets every day, and the acquisition comprised billions of tweets.[i]

According to the gift agreement, tweets older than six months could be posted to the Library’s website or made available to approved researchers for specifically non-commercial use. Tweets could not be made available in a way that would allow for bulk downloading of the information.[ii] The transfer of the tweets from Twitter to the Library of Congress began in February 2011 with tweets from December 2010. The full 2006-2010 archive was received in February 2012, comprising about 21 billion tweets. By the end of 2012, the Library had received a further 150 billion tweets.[iii]

By January 2013, the Library had finished the acquisition and preservation of the original 2006-2010 archive, including establishing systems for receiving, preserving, and organizing the incoming tweets. At that point, the full archive comprised about 170 billion tweets and over 300 terabytes of information. The number of incoming tweets had grown to 140 million per day in February 2011 and then to almost 500 million per day in October 2012.

Researchers were not given access, although the Library had received about 400 inquiries on topics ranging from elected officials’ communications to predicting stock market activity.[iv] At the time, a single search of the smaller 2006-2010 archive could take 24 hours, which made it untenable for research.[v]

In December 2017, the Library decided to change its collection practice for Twitter. The Library will now limit the scope of the collection by acquiring tweets on a selective basis, matching its practices for the collection of websites.[vi] The Library plans to collect tweets around themes or events, including elections and public policy. The 2006-2017 archive of tweets will continue to exist as a standing collection, but it will remain unavailable to the public and researchers until the searching and access problems are resolved.[vii]


References

[i] “How Tweet It Is!: Library Acquires Entire Twitter Archive.” Library of Congress Blog. April 14, 2010. https://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/

[ii] “Gift Agreement.” Library of Congress. Accessed January 20, 2018. https://blogs.loc.gov/loc/files/2010/04/LOC-Twitter.pdf

[iii] Library of Congress. Update on the Twitter Archive at the Library of Congress. January 2013. https://www.loc.gov/static/managed-content/uploads/sites/6/2017/02/twitter_report_2013jan.pdf

[iv] “Update on the Twitter Archive at the Library of Congress.” Library of Congress Blog. January 4, 2013. https://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/

[v] Library of Congress. Update on the Twitter Archive at the Library of Congress. January 2013. https://www.loc.gov/static/managed-content/uploads/sites/6/2017/02/twitter_report_2013jan.pdf

[vi] “Update on the Twitter Archive at the Library of Congress.” Library of Congress Blog. December 26, 2017. https://blogs.loc.gov/loc/2017/12/update-on-the-twitter-archive-at-the-library-of-congress-2/

[vii] Library of Congress. Update on the Twitter Archive at the Library of Congress. December 2017. https://blogs.loc.gov/loc/files/2017/12/2017dec_twitter_white-paper.pdf

