1
Cyber Society, Big Data, and Evaluation: An Introduction
Gustav Jakob Petersson, Frans Leeuw, Jonathan Breul and H.B.M. Leeuw
We are living in a new world of data. Crime cameras and other sensors generate data as we move. The global usage of mobile devices, social media activity, and Internet behavior leave behind a digital trail of “data exhaust.” The economic world has largely moved online, reducing transaction costs and enhancing cross-border money flows. New “peer-to-peer” markets have been developed, including Internet auction “houses” such as eBay (Margetts, 2009: 3). Enormous amounts of data are created also through online media, radio, telecommunications, energy consumption, remote sensing, and program data (e.g. logistics). Machine-generated real-time data thereby constitute an increasing share of the data stored globally. Further expanding the digital world of data, registers as sources of (administrative) data have been increasingly digitalized and made more available.
The resulting dramatic growth in digital data volumes spans nearly every part of our lives from gene sequencing to consumer behavior (King, 2011). The article on the Internet and public policy by Margetts (2009: 3) showed that “for many people across the world, large chunks of their social, economic and political life have moved online.” This is sometimes referred to as a “datafication” of our everyday lives.
Datafication is, however, not only about people starting to do things online rather than offline but also about people doing new things, particularly with the growth of the so-called Web 2.0 applications, where users can easily produce as well as consume content by themselves. Examples include social networking sites used by around one-third of Internet users; photo- and video-sharing sites and social media; and peer-produced information goods such as the online user-generated encyclopedia Wikipedia, the English language version of which has over three million articles and eleven million registered users (Margetts, 2009: 3). Such developments concern a large and increasing share of the world’s population, as Internet penetration has now reached approximately fifty percent globally, almost seventy-five percent in Europe¹ and close to eighty-seven percent in the United States.²
Another aspect of the ongoing datafication is that we increasingly monitor ourselves. The Quantified Self-Movement “involves individuals engaged in the self-tracking of any kind of biological, physical, behavioral, or environmental information as individuals or in groups. Health is an important but not exclusive focus, where objectives may range from general tracking to pathology resolution and to physical and mental performance enhancement.”³
The Quantified Self-Movement works through technology: wearable sensors, mobile apps, software interfaces, and online communities, ranging from simple things like smart watches and electronic T-shirts to electronic tattoos and wearable personal information ecosystems (Swan, 2013).⁴
When such highly diversified sources of data are combined to form massive data sets, they are referred to as Big Data. And data sets indeed grow big. In fact, “big” may be translated into “too big.” Data sets grow bigger than what can be processed by the memories of individual computers; the development of tools to make the most of Big Data, such as machine learning, data mining, and Big Data analytics, is sometimes depicted as just as transformative as the development of the Internet (for instance, Cukier and Mayer-Schoenberger, 2013).
When Big Data is anonymized, aggregated, and analyzed, it can reveal significant new insights and trends about human behavior. The basic idea is that Big Data makes it possible to learn things that we could not comprehend with smaller amounts of data, creating new insights and value in ways that change markets, organizations, the relationship between citizens and government, and more (Mayer-Schoenberger and Cukier, 2013). We can also learn about phenomena that have been previously difficult to capture, for example, personal connections, such as those within Facebook, and geolocation, such as the place from which a “tweet” was sent via Twitter (Taylor et al., 2014).
Private businesses have found use for Big Data in various fields as the development of new data products has grown into big business. A data product may, for instance, estimate the potential of various business strategies by assessing how likely an individual with certain characteristics is to respond to a certain marketing campaign or how likely an investment is to yield the expected return (see for instance Siegel, 2013). And such analyses may be performed almost in real time.
At the same time, it is well known that in our world of evaluations, our work sometimes does not meet the informational needs of decision makers. There is a need for varied and rapidly delivered information to inform decision making. Studies taking years to finish are increasingly believed to be of limited societal or political relevance in a society where every (Internet) year lasts “three months,” as the proverb goes. Ex ante evaluations, including regulatory impact assessments, are sometimes even done and reported before the program has been fully implemented. Real-time evaluations and “RIPI-evaluations” (evaluating recently implemented program interventions) are also increasingly seen as answers to these challenges. Another central innovation has been to evaluate theories underlying policies and programs (also before they are fully implemented) with the help of knowledge repositories where systematic reviews and synthesis studies have been collected, reviewed, and summarized. The Campbell Collaboration, EPPI, 3ie, and other “secondary knowledge production institutes” (Hansen and Rieper, 2009) provide syntheses that aggregate the samples of different studies, for instance, in the form of meta-analysis or realist synthesis. And yet the calls for more rapidly delivered and broadly generalizable results persist. Meanwhile, Big Data provides new opportunities to take the pulse of communities in real time.⁵
Big Data therefore seems to hold the potential to meet needs already noted and experienced by the evaluation community. And yet we will show that Big Data is hardly utilized by evaluators. This is in spite of the never-ending call for evidence-based policymaking and in spite of Big Data permitting a dramatically increased range of other agents to analyze social developments (Burrows and Savage, 2014), implying competition from evaluation-like activities. In the adjacent discipline of empirical sociology, the challenge is considered profound and very real (Burrows and Savage, 2014).
In the face of the rapid growth in global data production, and given the development of analytical techniques to make the most of Big Data, we believe that evaluators should work with and use Big Data. Otherwise, the evaluation community will see tightening competition from more innovative communities representing a rival form of knowledge production that may promise insights that are quicker, cheaper, and more useful than those of evaluators.
The lack of attention to Big Data also seems surprising since there has been a debate within the evaluation community on the need for integrating evaluation and monitoring. In our view that debate illustrates the need to incorporate also Big Data since Big Data, just like monitoring, refers to a continuous collection of data.
We, therefore, encourage evaluators to engage with Big Data, and to keep turning the pages of this book.
The Purpose
The purpose of this book is, first, to highlight lessons learned from rare but valuable examples of how Big Data has been rewardingly utilized in evaluation and, second, to discuss how Big Data could be used more widely and systematically in evaluation in various policy fields. We will also discuss how the advent of Big Data may transform the role of evaluators in knowledge production and policy making.
We see at least four reasons why evaluators should engage with Big Data. First, Big Data sources are frequently available in real time, improving the opportunities of evaluators to produce their results while they are still relevant for decision making. Second, the size of the data makes statistical analysis more powerful and possibly more accurate. Third, Big Data often involves aspects of human behavior that have been previously difficult to observe, for example, personal connections and geolocation. This characteristic makes Big Data important to enhance the relevance of evaluation. Fourth, since the data are already there, they may be cheap to use compared with other sources of data, such as surveys.
We will discuss Big Data in relation to various evaluative activities, such as impact analysis, effectiveness assessments, monitoring, and predictive work (ex ante evaluation). Distinctions between such activities are not our primary focus. This book focuses on the use of Big Data in the production of evidence-based knowledge for decision making.
Getting started with Big Data is of course associated with a number of challenges. Often discussed in relation to Big Data are ethical concerns associated with having access to data that mirrors numerous aspects of people’s lives, that is, privacy concerns. Challenges of other sorts must also be met. The fact that data have been stored does not guarantee that they will be accessible for evaluators or other analysts, as they may be owned, for instance, by business firms. Another pertinent concern is whether the evaluation community is equipped to make valuable contributions also in the Big Data era. Would we best fade out and leave room for Big Data analysts to help produce evidence-based policies? We believe not. Evaluators harness skills which will remain valuable and which make a good starting point for collaboration with Big Data analysts, when needed. And sometimes collaboration will not be needed, if evaluators acquire the basic insights and skills needed to work with Big Data. We will discuss both challenges and valuable evaluation skills throughout this volume and summarize in the concluding chapter.
This chapter proceeds by first discussing the meaning of the concept of Big Data. It then gives some first indications of how Big Data may be of value in evaluation. Thereafter, some critique of Big Data is discussed. The chapter concludes by outlining the structure of this volume.
The Concept of Big Data
The term Big Data is today used in a number of different ways. While some observers emphasize characteristics of the data as such, others highlight the implications of the data and their derivatives (processing algorithms and data products) for decision making as well as society as a whole. Chapter 2 of the present volume will provide a discussion of different definitions of Big Data. For now, we will illustrate the breadth by referring to a definition presented by Boyd and Crawford (2012: 663):
We define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of:
(1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.
(2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.
Other observers, however, use the term Big Data to refer only to data, frequently with an emphasis on automatically generated data. Bail (2014: 469) uses the term Big Data “to refer to the increasingly large volume of text-based data that is often – though not always – produced through digital sources. … These data are also unique because they are ‘naturally occurring,’ unlike survey data which result from the intrusion of researchers into everyday life.”
Different chapters of the present volume will analyze different aspects of Big Data in relation to evaluation. Some chapters dig into data sources seldom used by evaluators. Other chapters use combinations of data sources which would not have been conceivable only a few years ago, when processing capacity and algorithms were less developed. Yet other chapters discuss the political and cultural framework within which evaluation meets Big Data.
Which of these aspects of Big Data are new? Some observers hold that the data as such are not very revolutionary and that the revolutionary power lies rather in (1) the development of a new frame of mind where data are conceived as a commodity with high scientific, economic, political, and social value and (2) the development of new methods, infrastructures, technologies, skills, and knowledge to handle data (Leonelli, 2014). We will develop our perspective on what is new and what is not so new about Big Data in Chapter 2.
Using Big Data
Big Data has received substantial attention in different spheres. In the scientific community, Big Data journals have been established, and funding for the development of new analytical tools to utilize Big Data is growing. Private businesses utilize Big Data to create various data products, drawing on the development of machine learning, data mining, and Big Data analytics. Such tools make it possible to manage far larger data volumes than previously and to analyze non-structured data, such as free-text documents, images, motion pictures, and sound recordings (Mayer-Schoenberger and Cukier, 2013; O’Reilly Media, 2011). Big Data has also been used in ex ante evaluation, for instance, in global development and epidemiological work (Kirkpatrick, 2012). Still there is no doubt that Big Data generally remains under-utilized in policymaking, as highlighted, for instance, in Decker’s APPAM Presidential Address (Decker, 2014). The potential of Big Data has probably not yet been fully understood by policy makers and evaluators, as there is no reason to hold that Big Data is less promising for the public sector than for the for-profit sector (Mayer-Schoenberger and Cukier, 2013). For one, Big Data provides as yet unexplored opportunities for manipulating and controlling individuals and communities on a large scale (Leonelli, 2014).
Giving some examples of how Big Data has been utilized for evaluation-like activities may give a first glimpse of the potential of Big Data in evaluation. Let’s look into a few developments.
First, Big Data has made it possible to capture aspects of human behavior that have previously been difficult to observe. Affecting such phenomena (for instance, human interaction) is frequently a focus of public interventions. Interaction and networks may be captured, for instance, over social media, but various difficulties such as validity issues have been obstacles in utilizing the potential of such data. Work is however being done to manage such difficulties. Note, for instance, that Bail (2015) provides an example of how Big Data generated through social media may be effectively combined with conventional survey techniques to enable more comprehensive analysis of collective behavior online. Bail had noted that although social media websites such as Facebook and Twitter provide an unprecedented amount of qualitative data about organizations and collective behavior, these new data sources lack critical information about the broader social context of collective behavior, or protect it behind strict privacy barriers. Therefore, Bail introduced the idea of social media survey apps (SMSAs), which may be used to (1) request permission to access public and non-public data from users of an organization’s social media page and (2) distribute a survey among the users in order to capture additional data of interest to a researcher. Finally, social media survey apps may be used to return the results of a scholarly analysis back to the organization as an incentive to share data and participate in social science research. Bail concludes that app technology provides a powerful new platform for social science research. Why should evaluators not use similar technology to map changes in norms or behavior related to interventions?
Another example of a difficult-to-observe phenomenon is deforestation or failure to preserve biodiversity, simply because vast areas would have to be monitored. So how are we to evaluate interventions to tackle such problems? An emerging trend is the develo...