Charleston Insights in Library, Archival, and Information Sciences

Practical Strategies for Information Professionals
About This Book

It has become increasingly accepted that important digital data must be retained and shared in order to preserve and promote knowledge, advance research in and across all disciplines of scholarly endeavor, and maximize the return on investment of public funds. To meet this challenge, colleges and universities are adding data services to existing infrastructures by drawing on the expertise of information professionals who are already involved in the acquisition, management and preservation of data in their daily jobs. Data services include planning and implementing good data management practices, thereby increasing researchers' ability to compete for grant funding and ensuring that data collections with continuing value are preserved for reuse. This volume provides a framework to guide information professionals in academic libraries, presses, and data centers through the process of managing research data from the planning stages through the life of a grant project and beyond. It illustrates principles of good practice with use-case examples and illuminates promising data service models through case studies of innovative, successful projects and collaborations.


Part 1

UNDERSTANDING THE POLICY CONTEXT

1

The Policy and Institutional Framework

JAMES L. MULLINS

INTRODUCTION

This chapter is in two parts. Part 1 addresses the policy framework at the national level, from the policies of funding agencies to the collective response of research libraries, through the Association of Research Libraries (ARL), to position members to be actively engaged in data management planning and services. Part 2 provides a general overview of how Purdue University Libraries responded, as a case study demonstrating how administrative policy within a university and the positioning of one research library can meet this changing environment.

PART 1: SCIENTIFIC AND TECHNICAL RESEARCH: THE NEED FOR AND DEVELOPMENT OF POLICIES FOR DATA MANAGEMENT

Setting the stage.

In 1999, John Taylor, director general of the United Kingdom's Office of Science and Technology, coined the phrase e-science to describe major funded projects in many areas of the physical and social sciences, including particle physics, bioinformatics, earth sciences, and the social sciences. In the United States the term is used less frequently than computational science, which denotes the deep integration of computer modeling and simulation into scientific methodologies. In recent years, scientists and technological researchers have tended not to recognize or identify e-science or computational science as unique within research methodology: computational research is simply how research is done.
In June 2005, the President's Information Technology Advisory Committee (PITAC) issued a report titled Computational Science: Ensuring America's Competitiveness, which provided then, and still provides today, a succinct account of the development of computational research methods that advanced and facilitated research in areas that would have been impossible even 30 years earlier.
The breakthrough in mapping the human genome would not have been possible without sophisticated algorithms that deduced relationships within the human genome. In order to map the human genome, massive datasets were created that drew upon the research skills of computer scientists, statisticians, and information technologists. It also created a new role, at first not apparent, for an information/data specialist to determine how data could be described, identified, organized, shared, and preserved.
Concurrent with the 2005 PITAC report, Congress was raising questions with federal funding agencies about the high cost of research, directed specifically at major funders such as the National Science Foundation (NSF), the National Institutes of Health (NIH), and the Department of Energy (DOE). The inquiry from Congress focused on the cost of collecting data in multiple research projects, projects that on the surface appeared to be connected or supportive of each other. If these projects were collaborative or complementary, why would it be necessary to fund a research team to generate new datasets when an existing dataset could answer the question, or could be mined to test a model or an algorithm? Was it really necessary to create a dataset that would be used by one research team for one project and then be discarded? If the dataset were known to the larger research community, couldn't it be reused or mined multiple times, thereby reducing cost and possibly speeding up the research process?

How did the transition from bench science to computational science take place?

There is an oft-recounted comment by a biology professor at a major research university that 20 years ago she could tell a new graduate student to "spit into that petri dish and research that," requiring little more than a microscope and standard research methodologies and reference sources. Now, the professor opined, a new graduate student arrives and immediately must work with a lab team that includes computer scientists, statisticians, and information technologists. Exploring and testing a research question generally requires the creation of a dataset, and creating that dataset requires complicated equipment and the talent of many people, resulting in a very high cost.
Scientists, engineers, and social scientists embraced this new method of gaining insight into their data. By analyzing and identifying patterns in massive amounts of data, or in one large dataset, new hypotheses could be tested. Using data and the proper algorithm, it was no longer always necessary to replicate a bench experiment; rather, by drawing upon standard methodologies and the requisite data, a problem could be researched and a finding determined. Each project generated one or more datasets, with variables defined through metadata. The collection of accurate metadata is critical, because minor differences in how an experiment is conducted affect the data output. The challenge came when there was no consistent method for describing the process, storing the dataset, and describing the content within the dataset.
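To make the role of such metadata concrete, the following is a minimal sketch (not drawn from the chapter; all field names and values are hypothetical and follow no particular metadata standard) of a record that documents a dataset's variables and the experimental conditions that influence its output:

```python
# A minimal, hypothetical metadata record for an experimental dataset.
# Field names are illustrative only; a real record might follow Dublin
# Core or a discipline-specific schema.
dataset_metadata = {
    "title": "Gene expression under heat stress, trial 3",
    "creator": "Example Lab, Example University",
    "date_created": "2005-06-15",
    "description": "Expression levels for 1,200 genes at two temperatures.",
    # Each variable is described so another researcher can interpret
    # the columns without guesswork.
    "variables": [
        {"name": "gene_id", "type": "string",
         "description": "identifier of the gene measured"},
        {"name": "temperature_c", "type": "float",
         "units": "degrees Celsius",
         "description": "incubation temperature"},
        {"name": "expression_level", "type": "float",
         "units": "normalized counts",
         "description": "measured expression level"},
    ],
    # Experimental conditions: the minor differences in how an
    # experiment is run that affect the resulting data.
    "protocol": "30-minute incubation before sampling",
    "instrument": "microarray scanner (model hypothetical)",
}
```

Recording conditions such as the protocol and instrument alongside the variable definitions is what allows a later user to judge whether two datasets are actually comparable.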

Concurrent increase in managing data, or the lack thereof, by the researchers, the transition, and awareness that storage is not archiving.

The work of the researchers generated a massive number of datasets, often stored on an individual researcher's computer, a lab server, or, less often, a university data storage facility or a disciplinary data archive. In the early 2000s it was becoming an ever-greater challenge for scientists and engineers in many research arenas to know how to share and retrieve datasets, and an even greater challenge to retrieve and share datasets that were only a few years old. Research datasets were often lost in the transition of a lab's postdoc or graduate student: when the person who had developed the methodology for describing, retrieving, and archiving the data from the past three years or so of research departed, access to and usability of the research dataset went out the door as well.
Researchers assumed that the situation in which they found themselves could be managed by their college, school, or central information technology organization. All things considered, that was not an unreasonable assumption. When researchers met with information technology specialists, they were assured that storage would not be a problem: space was inexpensive, and as long as the dataset was in use, storage support would not be an issue. But as the researcher continued to explore how best to identify, describe, and share the dataset, it became, once again, a problem the researcher had to manage. The information technologist was not prepared to create metadata, or even to advise the researcher on how to create it or what elements it should contain. The storage space used by the researcher typically was not searchable on the web and was therefore hidden and inaccessible until the researcher responded to a request from a colleague to share the dataset, resulting in a file transfer that could be a challenge for both the researcher and the colleague to accommodate.
So the researcher had identified several important collaborators in undertaking computational science: the information technologist to manage storage, the computer scientist to create the algorithms needed to test the data, and the statistician to advise on and run tests of the data's reliability. But an important part of this continuum was missing: how to identify, describe, retrieve, share, archive, and preserve the dataset.

The environment that has created the demand from funding agencies that a data management plan must be included with proposals.

In the early 2000s, Congress began to raise questions about the inefficiency and duplication of research projects funded by federal agencies such as the NSF. In response, the NSF held hearings and appointed task forces to assess the challenges in data management, specifically data mining, and how waste could be reduced. The initial study, Revolutionizing Science and Engineering through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, was issued in January 2003 (Atkins et al., 2003). Daniel Atkins, dean of the School of Information at the University of Michigan, chaired the Blue-Ribbon Task Force. The report was groundbreaking, and now, more than 10 years later, reference is still made to this seminal work, typically referred to as the Atkins Report.
The Atkins Report for the first time brought together the challenges faced by investigators using cyberinfrastructure: large-scale computational facilities coupled with a strong Internet backbone to transfer data wherever, and to whomever, it was needed. The report also identified challenges in fulfilling the potential of the cyberinfrastructure: how to identify, describe, locate, share, and preserve large amounts of data, and who the players were that had to be brought together to work through this dilemma and ensure that federal research dollars were not wasted on duplicate research projects across the United States. Among the Atkins Report's findings was an expansion of the Digital Libraries program created by the Defense Advanced Research Projects Agency (DARPA), the NSF, and the National Library of Medicine (NLM), with an initial allocation of $10 million per year that was increased to $30 million when others, including the Library of Congress, joined the effort. The Atkins Report recommended the increase to $30 million in recognition of the value of providing access and long-term stewardship.
The Atkins Report was followed by another report, issued by the ARL from a workshop funded by the NSF: To Stand the Test of Time: Long-term Stewardship of Digital Data Sets in Science and Engineering, published in 2006. For the first time, a report detailed the proposed role of academic and research libraries in the management of datasets and as collaborators in the cyberinfrastructure, computational, or e-science arena. Nor was it only in the United States that attention had turned to this problem; in the United Kingdom and elsewhere in Europe, attention was being given to solving, or at least understanding, the challenges of data management.
To Stand the Test of Time made three overarching recommendations for the NSF to act upon: "research and development required to understand, model and prototype the technical and organizational capacities needed for data stewardship … ; supporting training and educational programs to develop a new workforce in data science … ; developing, supporting, and promoting educational efforts to effect change in the research enterprise" (ARL, 2006, p. 12). Any one of these three recommendations could have had an impact on the role of research libraries in data management. The challenge for research libraries was how to tackle one without the others also being advanced at the same time. How could libraries help effect change in the research enterprise's use of data if libraries did not have staff who understood, or were interested in, the problems of data management? Without staff who understood the challenge, how could the organization, the library, modify its processes or its role within the university to facilitate the management of data? And without these two objectives coming together, how could researchers be expected to change the manner in which they did their research?
The two recommendations that particularly resonated with the library community were the management of data and the ability of library staff to collaborate with researchers on managing data. Librarians had participated in building massive data repositories, albeit of textual data, such as the Online Computer Library Center, Inc. (OCLC), or of numeric data, such as the Inter-university Consortium for Political and Social Research (ICPSR). However, those two efforts were accomplished by a cohort of libraries, supported by information technologists and computer science professionals, with a shared goal and a common understanding of what the end product was to be. No such common understanding or defined goal existed within the research library community for how to come together to answer the apparent need of the researchers.

Response by the library community by accepting that data management would benefit from the application of library science principles.

The initial response to the challenge posed to the research library community was heard and acted upon by a few universities, including Cornell University, Johns Hopkins University, the Massachusetts Institute of Technology, Purdue University, the University of California-San Diego, and the University of Minnesota. For the training of librarians, two library/information schools, the University of Illinois at Urbana-Champaign and the University of Michigan, took the lead in identifying curricula and programs that would prepare library professionals to participate in data management.
Where to start was the question raised by many research libraries; or, even more elemental, did researchers really need libraries at all, and was this really something with which research libraries should involve themselves? University libraries were, and are, challenged to define their role among the various players (e.g., the faculty, the office of the vice president for research, and information technology). How can these diverse groups and individuals work together? More will be said about these relationships below and in succeeding chapters.
There was discussion within the university library community about whether there should be a role for the librarian or the library at all, since libraries traditionally had entered the research process at the end, by identifying and preserving the results of research in journals, conference proceedings, and books. Why get involved at the front end of the research process? An awareness emerged when consideration was given to the role libraries had long played in preserving the manuscripts and records of authors, scholars, and famous individuals. These manuscripts and records were, more or less, raw bits of data until a researcher "mined" them to answer a research question. Libraries and archives, therefore, had been partners in data management for a long time; now the library was to be the archive of nontangible data for science and engineering (Mullins, 2009).

Federal funding agencies provide support to study data management challenge.

Soon after the release of To Stand the Test of Time, two federal funding agencies responded to the call for a better way to steward massive amounts of data: the NSF and the Institute of Museum and Library Services (IMLS). The NSF sought the development of the underlying infrastructure that would enable the storage, retrieval, and sharing of data. The IMLS sought to fund projects and research that would prepare the library community to collaborate on the challenge of data management. These initiatives from the NSF and the IMLS moved to address the first two concerns expressed in the report: establishing a prototype to manage data, and the education and training of specialists to man...

Table of contents

  1. Cover Page
  2. Halftitle Page
  3. Title Page
  4. Copyright Page
  5. Contents
  6. Introduction to Research Data Management
  7. Part 1: Understanding the Policy Context
  8. Part 2: Planning for Data Management
  9. Part 3: Managing Project Data
  10. Part 4: Archiving and Managing Research Data in Repositories
  11. Part 5: Measuring Success
  12. Part 6: Bringing it All Together: Case Studies
  13. Closing Reflections: Looking Ahead
  14. About the Contributors
  15. Index