More than Buzz Words: Big Data and Data Science
Data science isn’t a common phrase, so let’s start with an increasingly popular one: big data. Big data earned buzz word status with employers several years ago, and numerous vendors are now talking about big data in libraries. Big data generally refers to the storage and management of large data sets.[1] In this field, it would not be uncommon to work with sizable data sets of five terabytes or larger. By comparison, five terabytes would hold approximately one million music tracks (85,000 hours of music).
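For readers who like to check the math, the comparison above can be worked through in a few lines of Python. This is only a rough sketch: it assumes decimal units (one terabyte as 10**12 bytes), which the comparison itself does not specify, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "five terabytes holds about one million tracks" comparison.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TERABYTE = 10**12
MEGABYTE = 10**6


def per_track_stats(total_bytes, tracks, hours):
    """Return (megabytes per track, minutes per track) for a music collection."""
    mb_per_track = total_bytes / tracks / MEGABYTE
    minutes_per_track = hours * 60 / tracks
    return mb_per_track, minutes_per_track


# Figures taken from the comparison: 5 TB, one million tracks, 85,000 hours.
mb, minutes = per_track_stats(5 * TERABYTE, 1_000_000, 85_000)
print(f"{mb:.1f} MB per track, {minutes:.1f} minutes per track")
```

Under that assumption, each track works out to about five megabytes and about five minutes, or roughly one megabyte per minute of music, so the two figures in the comparison agree with each other.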
Big data’s companion field, data science, focuses on extracting knowledge from these large data sets, and its practitioners are called data scientists. Much like big data, data science emerged when the right conditions developed: robust computing power, massive data sets, theoretical algorithms to extract knowledge, and powerful and flexible programming languages. In practice, data science often focuses on predicting customer behavior and financial outcomes using data sets that previously would have been too large to process for analytical purposes. Performing such tasks draws on a number of skill sets, including machine learning, database programming, and predictive analytics. According to Levi Bowles, practicing data scientist and author of DataScienceNotes.com, “The core abilities for a data scientist include higher level math statistics skills (calculus and beyond), computer programming, understanding business principles, as well as the scientific method and experimental design.”[2] Additionally, communication skills to translate highly technical findings for stakeholders throughout the business or organization are a huge plus. This combination of skills, encompassing expertise from a broad range of fields, is a tall order.
Because the field of data science evolved from diverse roots, including mathematics and computer programming, there hasn’t been a clear educational pathway for practitioners. Recognizing this gap, three academic units at the University of Illinois at Urbana–Champaign created a Master of Computer Science in Data Science (MCS-DS) degree in collaboration with Coursera, an online service offering massive open online courses.[3] The three units joining forces to create this area of study are the Department of Computer Science, the Department of Statistics, and the Graduate School of Library and Information Science. Unlike traditional graduate programs, the coursework is “stackable,” offering opportunities for students to focus on specific areas and earn certificates for study without the requirement to commit to the entire master’s program course load.[4] This flexibility allows students new to the field to pursue a robust academic program in data science, and it allows practicing professionals to return to the classroom to focus on their specific areas of interest.
There is rich potential for collaboration between data science and library science. Data science offers powerful text analysis tools, and library science has created sizable digital collections of significant works. Together, they offer an opportunity to understand the content of these collections more deeply by looking for patterns across millions of documents or more. No individual scholar could review in a lifetime what data science’s tools can analyze in a relatively short time, so the deep analyses of digital collections that such a collaboration can produce would complement individual scholarly study of documents.
Similarly, collaboration between data science and library science could yield valuable information about citation patterns, such as the most influential scholars and journals. Relatedly, this collaboration could also identify citation patterns that are likely fraudulent. Work in this vein is already in progress at Louisiana State University, where the Department of Mathematics and the School of Library & Information Science partnered to produce the presentation “Bibliometric Models and Preferential Attachment.”[5]
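To make the idea of finding the most influential scholars a little more concrete, here is a minimal Python sketch that ranks authors by how many citations they receive. The citation pairs, the author names, and the `most_cited` helper are all hypothetical, invented for illustration. This is a simple count, not the bibliometric model from the LSU presentation, and a real study would draw on a citation database and a more careful measure of influence.

```python
from collections import Counter

# Hypothetical (citing_paper, cited_author) pairs, invented for illustration.
# A real analysis would load these from a citation database.
citations = [
    ("paper_a", "Author X"),
    ("paper_b", "Author X"),
    ("paper_c", "Author Y"),
    ("paper_d", "Author X"),
    ("paper_d", "Author Z"),
]


def most_cited(pairs, top_n=2):
    """Rank cited authors by the number of citations they receive."""
    counts = Counter(cited for _, cited in pairs)
    return counts.most_common(top_n)


ranking = most_cited(citations)
print(ranking)  # the first entry is the most-cited author and their citation count
```

The same counting approach could be pointed at journals instead of authors by swapping what is stored in the second position of each pair.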
A final example of an area ripe for collaboration is result relevancy and recommendations. The tools of data science allow us to better predict user behavior. Capitalizing on this knowledge, we can better refine the search results and suggestions that library catalogs and online portals offer our patrons.
In summary, Urbana–Champaign’s Master of Computer Science in Data Science program seeks to fill a significant gap in the educational marketplace for the new and growing field of data science. This program found natural partners in statistics, computer science, and library science. Future collaboration in this vein could produce a valuable understanding of library collections and citation behavior and could enhance library services.
References
[1] Gil Press, “12 Big Data Definitions: What’s Yours?” Forbes Tech, September 3, 2014.
[2] Levi Bowles, practicing data scientist, in an interview with the author, April 7, 2016.
[3] “GSLIS partners with CS, Statistics to offer first MOOC-based master’s degree in data science,” press release courtesy of CS@Illinois, March 30, 2016.
[5] “Department of Mathematics Partners with SLIS for Research Presentation,” March 18, 2016, retrieved April 26, 2016, from http://www.lsu.edu/chse/slis/news/smolinsky-research.php.