by Mark Dahl
Mark Dahl, director of the Aubrey R. Watzek Library at Lewis & Clark College and a NITLE Fellow, outlines guidelines to help college libraries move from building digital collections to developing digital initiatives centered on faculty and student scholarship. Mr. Dahl has presented and written extensively on library technology and digital initiatives. His professional interests include digital initiatives, student engagement with library resources, and the future of the liberal arts college library.
The April 2013 Association of College and Research Libraries biennial conference in Indianapolis featured no fewer than fourteen sessions about academic library data services. Topics ranged from data and statistical sources for reference and instruction, to data literacy for scientists, to the development of data curation services.
Clearly, data services are a hot area in academic libraries. But how is this trend playing out in libraries at teaching-focused institutions, specifically liberal arts colleges? As I will illustrate below, there are rich opportunities to expand library reference and instruction services to support quantitative reasoning initiatives and data-intensive undergraduate research. Data curation and management services, a major interest at research libraries, are also an emerging opportunity at liberal arts institutions, as are the collection and management of field research data.
First, it’s worth stepping back a bit to reflect on the bigger picture. We live in a world where networked technology makes it possible for organizations ranging from Walmart to Facebook to the NSA to ingest and store enormous amounts of data about society and the natural world. With abundant data, the scarce factor is the expertise needed to make sense of it and apply it. According to Google’s chief economist, Hal Varian, “The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids.”
In the academy, the term “e-science” has come to encompass what big data can do for the natural sciences. According to one definition:
E-science is the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, preprints, and print and/or electronic publications.
The National Science Foundation’s introduction of data management plan requirements to its grant applications in 2011 affirms data access and preservation as essential aspects of the scientific research process.
In the social sciences, quantitative analysis is well established. But the Internet has made a wide range of datasets available to researchers ranging from high school students to established scholars. Furthermore, big data stores that capture social phenomena, from Twitter posts to sales transactions, are now fodder for analysis by economists, sociologists, political scientists, and others.
Data-intensive analysis is also an important aspect of the digital humanities. Winning projects in the National Endowment for the Humanities’ initial Digging Into Data Challenge included analysis of Enlightenment correspondence, linguistic parsing of large quantities of speech harvested from the web, and analysis of unstructured texts pertaining to railroads from 19th-century newspapers.
Librarians for Data Literacy
Librarians have a role to play in helping students find and evaluate data and in advocating for data literacy, a natural subcomponent of information literacy, quantitative literacy, and empirical reasoning. Librarians can effectively support students as they tackle the data-intensive assignments that are part of quantitative reasoning initiatives. They can also help students develop data literacy as they pursue research projects that employ quantitative methods and data.
Many U.S. liberal arts colleges have embraced a movement towards quantitative literacy in undergraduate education endorsed by organizations such as the Association of American Colleges and Universities and the National Numeracy Network. Carleton College’s Quantitative Inquiry, Reasoning, and Knowledge (QuIRK) initiative is designed to strengthen quantitative reasoning in the curriculum. It focuses on how quantitative reasoning (QR) is used in the development, evaluation, and presentation of a principled argument.
QuIRK incentivizes faculty to redesign courses in ways that develop quantitative skills. A number of these courses include assignments that involve locating and evaluating data, such as a brief research assignment in an introduction to quantitative reasoning course or a research proposal assignment in empirical economics. That’s where Kristin Partlo, Reference and Instruction Librarian for the Social Sciences and Data at Carleton’s Gould Library, comes in. Partlo serves as the main librarian for economics, sociology, and anthropology at Carleton and as a data specialist for the library overall. She often meets directly with classes in the QuIRK initiative that are working on assignments that require students to find data sources. Partlo has seen demand for individual and group consultations with students working on data-intensive assignments grow since the inception of the QuIRK initiative about a decade ago.
Barnard College’s recently created Empirical Reasoning Lab supports an initiative similar to Carleton’s QuIRK but with a broader focus on both qualitative and quantitative research methodology. The Lab, located in Barnard’s library, is headed by Heather Van Volkinburg, who recently earned a Ph.D. in psychology and is now the library’s associate director for learning initiatives and data services. The Lab helps students locate data and formulate research questions, and it also guides them on research methodology.
Barnard’s Lab also trains students on data and statistical software, including Excel and SPSS. After students find the data they need, the challenge of parsing, analyzing, and presenting it remains. Some libraries, such as Rutgers, have developed successful services supporting technology for data analysis and presentation. At other institutions, including Carleton, the library partners with instructional technology to provide such support.
Like Partlo, Van Volkinburg works closely with faculty to integrate data into courses in a variety of disciplines. In her case, these range from environmental studies to urban studies, chemistry, sociology, and beyond. For example, in the fall of 2012, she collaborated with faculty in the Urban Studies Program to develop a new project for their junior colloquium, The Shaping of the Modern City, which looks at urban growth from the mid-19th century to the present. The Empirical Reasoning Lab put together a workshop in which students developed the Excel skills they would use to analyze mortality rates in five American cities from the early 19th to the early 20th century.
With myriad data sources available on the Internet, undergraduates now have the potential to apply data in ambitious ways to answer their own research questions. Take, for example, an award-winning economics thesis by a Macalester student that examined the effect of wind turbines on home values in an Illinois county by bringing together data from a county assessor’s office, an organization that tracks wind turbine locations, and the U.S. Census Bureau.
Even with seemingly abundant data, aligning the right data sources with student research questions remains difficult, and one still has to look hard to find undergraduate capstone projects that make sophisticated use of multiple datasets. Partlo emphasizes that her job is to teach students how to find and evaluate data sources on their own rather than simply giving them the answers. She has even mapped out her formula for a successful “data reference interview” on a worksheet that can guide students and also serve as a prompt for librarians.
According to Partlo, students’ interest is often piqued by data they glimpse in an Internet search. “Google is making very visible particular types of data that aren’t accessible to our students. We need to help them uncover the accessible data that may sometimes be harder to find,” she says. Van Volkinburg says that Barnard students often come into the Empirical Reasoning Lab with “broad unanswerable questions.” She and her staff work with them to reformulate these as research questions that can be answered with available data.
Data Management and Curation: A Teachable Moment?
If students need help with finding, analyzing, and presenting data, liberal arts faculty engaged in research need assistance managing their research data. This is where the other major area of library data services comes in: data curation and management. As a recent article in Nature points out, libraries are positioning themselves to provide data management services for scientists, but they are by no means established in this new role.
It is not yet clear how successful this reinvention will be, especially given the tight budgets facing both libraries and researchers. And as they step into the data-curation business, libraries are entering a crowded market of commercial publishers, information-storage companies and discipline-specific data repositories such as GenBank, which archives DNA sequences. But many say that libraries have a natural role in the data world, and that their importance will only grow with the push to unlock the products of research.
Middlebury College is a forerunner among liberal arts colleges in the area of library data management services. Science Data Librarian Wendy Shook arrived there in August 2012 with a background in archiving data produced at astronomical observatories. Shook’s initial strategy has been to educate researchers about data management through workshops, websites, and individual meetings. She says that disciplinary repositories exist for some of the research data faculty produce, but for data that lack such a home, a local data repository is key. When Shook surveyed faculty members about their data needs, she was impressed by their interest in a local data repository, and she has developed a pilot local data repository service for launch in 2014.
Shook also plans to develop a digital preservation policy for Middlebury that will demonstrate to faculty and funding agencies how seriously the institution takes the long-term preservation of digital assets. According to Shook, data preservation is a relatively “new field” for librarians, one that combines librarianship, information technology, and archiving. Data management practices remain very much in flux, and a “checkerboard” of solutions is evolving, says Shook.
Trinity University is another pioneer in library data services among liberal arts institutions. Like Middlebury, Trinity began implementing data services with a faculty needs assessment. So far its services focus on data management plan consultation, data management education, and repository services. Its outreach efforts target faculty in all disciplines, including the social sciences and humanities. Indeed, there will likely be more uptake of data management services in the humanities as digital humanities initiatives gain ground at liberal arts colleges and as digital humanities funders increasingly require data management plans.
With students often involved as research assistants in science laboratories at liberal arts colleges, it is worth considering how data management might become a learning opportunity for them. At ACRL 2013, no fewer than four sessions focused on developing data management literacy among future scientists. For example, Oregon State University’s library is developing a curriculum for graduate students on research data management. Along similar lines, Trinity’s librarians conducted a data management workshop for undergraduate summer research students in 2013.
Data gathering is another aspect of the research process that has been reshaped by technology in recent years. In particular, mobile devices allow researchers to take measurements in the field and gather data with new levels of control, speed, and efficiency. In a collaboration among the departments of Biology, Environmental Studies, and Computing and Information Studies, Washington & Jefferson College created a mobile application to collect field data on salamanders and myriapods at the Abernathy Field Station near its campus in southwestern Pennsylvania. The application guides students through the data gathering process and retains the data until an Internet connection is available.
In a similar vein, Lewis & Clark’s Watzek Library has provided technical support, metadata expertise, and training for students collecting data in the field in an introductory biology class, as well as in its Digital Field Scholarship and Lewis & Clark Around the World programs. While gathering data in the field may seem far removed from traditional library activities, the practice in fact demands careful attention to classification, metadata, and associated technologies, all areas of potential library expertise.
Building a library data services program often requires a creative approach to staffing. Reinventing current positions with new data services duties can be an effective tactic. Social science librarians such as those at Carleton and Barnard are well placed within the curriculum to provide data reference and education in data literacy. Science liaison librarians such as those at Middlebury and Trinity are well positioned to develop data management programs for scientists. Data services librarians do need special skills, and experience in data-intensive research or data management can serve as a strong foundation for building them. At the same time, data reference and data curation practices are changing quickly, giving eager professionals an opportunity to shape a new subfield of librarianship.
Lewis & Clark’s recently hired Science/Data Services Librarian, Christine Malinowski, plays a crossover role as a liaison to the hard sciences and economics. Malinowski, who has a background in science communication and research program management, also collaborates with other liaison librarians to bring data literacy to other disciplines. In her first six months, she has co-written NSF data management plans and taught data literacy in social science courses. As data services mature and faculty come to the library looking for more specialized services, sharing expertise across institutions may become a fruitful avenue for collaboration among regional liberal arts consortia.
The hype around data services in the library field focuses primarily on managing data for established researchers. At liberal arts colleges, the big opportunities around data services center on students. Teaching students how to find, analyze, and present data supports advanced forms of inquiry in the liberal arts and builds critical competencies that students will need beyond college. Support for research data management and field research projects usually begins with faculty but, again, can come back to students. Training students to implement data management practices as apprentices in the research process is an opportunity to build data literacy among future scientists and scholars.
As faculty employ ever more sophisticated means of inquiry in their teaching and scholarship, they increasingly depend on partnerships to get the job done. Even in the humanities, team-based work is becoming more prevalent. As this article has demonstrated, libraries can be valuable partners as faculty develop data-intensive assignments in the classroom and as they employ data in the laboratory and the field. If a library succeeds in this endeavor, it can meaningfully elevate the quality and sophistication of both student and faculty work at its institution.
Distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Hal Varian, “Hal Varian on how the Web Challenges Managers,” Insights and Publications, McKinsey and Company, January 2009, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers.
 Shannon Bohle, “What is E-Science and How Should it be Managed?,” SciLogs (blog), 12 June 2013, http://www.scilogs.com/scientific_and_medical_libraries/what-is-e-science-and-how-should-it-be-managed/.
 Steven Ovadia, “The Role of Big Data in the Social Sciences,” Behavioral & Social Sciences Librarian 32, no. 2 (2013): 130-134.
Christa Williford and Charles Henry, “One Culture. Computationally Intensive Research in the Humanities and Social Sciences. A Report on the Experiences of First Respondents to the Digging Into Data Challenge,” Council on Library and Information Resources Publication 151 (June 2012), http://www.clir.org/pubs/reports/pub151.
 Mark Dahl, Kristin Partlo, Jeremy McWilliams, Wendy Shook, Heather Van Volkinburg, “Data Services In Liberal Arts College Libraries,” NITLE Shared Academics (online seminar), 24 April 2013.
Minglu Wang, “Supporting the research process through expanded library data services,” Program: Electronic Library and Information Systems 47, no. 3 (July 2013): 282-303, doi:10.1108/PROG-04-2012-001.
“Introducing Barnard’s Empirical Reasoning Lab,” Barnard College, 8 February 2013, http://barnard.edu/news/introducing-barnards-empirical-reasoning-lab.
 Natalie Camplair, “Do wind turbines affect the sale price of single-family homes? Evidence from McLean County, Illinois,” (undergraduate thesis, Macalester College, 2013), http://www.minneapolisfed.org/mea/contest/2013papers/camplair.pdf.
Dahl et al., “Data Services in Liberal Arts College Libraries.”
 Heather Van Volkinburg, phone interview by author, 12 December 2013.
Richard Monastersky, “Publishing frontiers: The library reboot,” Nature 495, no. 7442 (2013): 430-432.
Dahl et al., “Data Services in Liberal Arts College Libraries.”
Megan Toups and Michael Hughes, “When Data Curation Isn’t: A Redefinition for Liberal Arts Universities,” Journal of Library Administration 53, no. 4 (May 2013): 223-233, doi:10.1080/01930826.2013.865386.
Samuel B. Fee and Christian Griffith, “Custom Mobile Applications for Collection of Field Data,” CUR Quarterly 34, no. 2 (Winter 2013), http://www.cur.org/assets/1/23/Winter2013_v34.2_Fee.Griffith.pdf.