The dramatic increase over the last two decades or so in computing power, in wired and wireless connectivity, and in the availability of data has affected all aspects of our lives. Our aim in this book is to provide an accessible introduction to how social science researchers are harnessing innovations in digital technologies to transform their research methods. In this chapter we provide an overview of how and why e-Research methods have emerged, including an account of the drivers that have motivated their development and the barriers to their successful adoption. The chapters that follow examine how innovations in digital technologies are enabling the emergence of more powerful research infrastructure, services and tools, and how social science researchers are exploiting them.
1.1.1 Digital Data
As everyone exposed to the Internet is aware, the amount of digital data available is expanding very rapidly, both through the digitization of past records and by the accretion of ‘born digital’ materials that are in machine-readable form from the outset. The digital universe – the data we create and copy annually – is estimated to be doubling in size every two years and projected to reach 44 trillion gigabytes by 2020 (where a trillion is a million million, or 10¹²) (IDC, 2014). For social scientists, the prediction that more data will be generated in the next five years than in the entire history of human endeavour represents both an opportunity and a challenge.
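To give a feel for what ‘doubling every two years’ implies, the short Python sketch below works backwards from the projection quoted above. It is an illustration only: it assumes a perfectly constant doubling period, which the estimate does not claim, and the function name `projected_size_gb` is a hypothetical helper introduced here rather than anything taken from the IDC report. It also converts gigabytes to zettabytes using decimal units (one zettabyte is 10^21 bytes, or 10^12 gigabytes), so that 44 trillion gigabytes corresponds to 44 zettabytes.

```python
# Rough sketch: constant-doubling growth, anchored on the quoted 2020 projection.
# Assumption (not from the source): the doubling period is exactly two years.

GB_PER_ZB = 10**12  # 1 zettabyte = 10^21 bytes = 10^12 gigabytes (decimal units)


def projected_size_gb(known_size_gb, known_year, target_year, doubling_years=2.0):
    """Return the size in gigabytes at target_year, assuming the size
    doubles every `doubling_years` years relative to a known data point."""
    return known_size_gb * 2 ** ((target_year - known_year) / doubling_years)


size_2020_gb = 44 * 10**12  # 44 trillion gigabytes, as quoted in the text

for year in (2014, 2016, 2018, 2020):
    size_gb = projected_size_gb(size_2020_gb, 2020, year)
    print(f"{year}: {size_gb / GB_PER_ZB:.1f} zettabytes")
```

Under these assumptions, the digital universe would have been roughly an eighth of its projected 2020 size in 2014, which illustrates how quickly exponential growth overtakes everything that came before it.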
Today, vast amounts of data are generated as people go about their daily activities, both data that are deliberately produced and data that are generated automatically by embedded systems. For example, use of public services is captured in administrative records; in the private sector, patterns of consumption of goods and services are captured in credit and debit card records; patterns of personal communications are captured in telephone records; patterns of movement are logged by sensors, such as traffic cameras, satellites and mobile phones; the movement of goods is increasingly tracked by devices such as radio-frequency identification (RFID) tags; and the advent of the ‘Social Web’ has led to an explosion of citizen-generated content in blogs and on social networking sites.
Currently, these data sources are barely exploited for social research purposes. The potential benefits to researchers are enormous, offering opportunities to mount multidisciplinary investigations into major social and scientific issues on a hitherto unrealizable scale by marshalling artificially produced and naturally occurring ‘big data’ of multiple kinds from multiple sources. However, exploiting these digital data sources to their full research potential requires new mechanisms for ensuring secure and confidential access to sensitive data, and new analysis tools for mining, integrating, structuring and visualizing data from multiple sources.
1.1.2 e-Infrastructure
Since the beginning of the new millennium, a world-wide effort has been underway to create the research infrastructure and to develop the research methods that will be needed if the ‘data deluge’ is to be harnessed effectively for research. A new generation of distributed digital technologies is leading to the development of interoperable, scalable computational tools and services that increasingly make it possible for researchers to locate, access, share, aggregate, manipulate and visualize digital data seamlessly across the Internet on a scale that was unthinkable only a decade or so ago.
e-Infrastructure comprises the information and communication technologies (ICTs) – the networked computing hardware and software – and the digital data that are deployed to support research. A very broad definition has been adopted by Research Councils UK (2014), which spells out more fully the components that are brought together:
e-Infrastructure refers to a combination and interworking of digitally-based technology (hardware and software), resources (data, services, digital libraries), communications (protocols, access rights and networks), and the people and organisational structures needed to support modern, internationally leading collaborative research be it in the arts and humanities or the sciences.
This definition highlights the complexity of e-Infrastructure and, correspondingly, the sheer scale of the socio-technical effort required to integrate distributed computers, data, people and organizations efficiently in order to deliver tools and services that scientists can readily adopt to their advantage in pursuing their research. (In the US, the term cyberinfrastructure is more commonly used than e-Infrastructure.)
e-Research is the generic term that has been coined for the innovations in research methods that are emerging to take advantage of this new and vastly more powerful e-Infrastructure. Similarly, e-Social Science is the social science research facilitated by this e-Infrastructure. The ‘e’ in all these terms is short for ‘electronic’, although it is sometimes rendered as ‘enhanced’.
The scope of the book is the application of e-Research methods across the social sciences, including both quantitative and qualitative data collection and analysis. The aim is to introduce the reader to the application of innovative digital research methods throughout the research lifecycle, from resource discovery, through the collection, manipulation and analysis of data, to the presentation and publication of results.