Data Abundance and Its Consequences
In the spring of 2014, a professor heading up a new data analytics centre at a top UK university told an audience gathered to find out more about big data that if all of the data that existed were printed out on A4 paper, it would make a pile so high that it would extend to Jupiter and back seven times. By the time readers encounter these words, the figure will be much higher. Around the same time, the online technology dictionary Webopedia stated that in 2011, we created 1.8 trillion gigabytes of data, enough to fill nearly 60 billion 32-gigabyte iPads. That's 'enough iPads to build a Great iPad Wall of China twice as tall as the original' (Webopedia 2014). In our big data times, such tales about data quantities abound.1
Stories about the qualitative consequences of data quantities also abound, as data get mined, analysed and aggregated by an increasingly diverse range of actors for an equally diverse range of purposes. One widely circulated anecdote tells the tale of the US department store Target and a teenage girl who changed her shopping habits. Where she once bought scented hand cream, she switched to unscented creams. She began to buy dietary supplements: calcium, magnesium and zinc. Target data analysts had previously identified 25 products whose purchase contributed to a 'pregnancy prediction score' (one journalist described this as Target 'data-mining its way to your womb' (Hill 2012)) and this young woman's score was high. The analysts concluded that she was pregnant and the store started to target pregnancy-related products at the teenager, a move to which her father vehemently objected. But the store was right: the teenager was pregnant, and the store knew this before her family did (Duhigg 2012; Hill 2012).
Another consequence of data abundance, this time on social media, can be seen in the story of a young man who joined a choir when he started college and Facebook's unveiling of his actions. Taylor McCormick joined the Queer Chorus when he started studying at the University of Texas. The president of the chorus added McCormick to the chorus's Facebook group, unaware that McCormick's membership of this group would subsequently become visible to his Facebook friends, including his strict Christian father. A member of a conservative church, McCormick's father did not speak to his son for weeks after the revelation. According to an article in the Wall Street Journal, McCormick was the victim of the lack of control we have over our data once they are digitised, or over our lives once they are datafied (Fowler 2012). McCormick 'lost control of his secrets', according to the article.
Digital data mining is used to predict a wide range of phenomena. Increasingly, influence and reputation are matters of numerical prediction through digital reputation measurement platforms like Klout, Kred and Peer Index. These systems produce 'scores' that serve not only as measures of present influence but also as predictive targets of the future. These scores are then used in a number of ways: by hotel chains to determine upgrade rates; by events organisers to give preferential access to parties; in the evaluation of job applications in the digital industries; and by customer services departments to decide how quickly to reply to enquiries. The logic here is that it is better to respond quickly to someone with a high reputation score, as that person will influence more people when talking positively or negatively about his or her experience with a given brand (Gerlitz and Lury 2014). Writing about transactional and financial data, Mark Andrejevic (2013) points to other ways in which data is used to make predictions, including a story from the New York Times about credit card companies watching purchasing habits for signs of imminent divorce (Duhigg 2009). Did you use your credit card to pay a marriage counsellor? If you did, your credit might be reduced, because divorce is expensive and affects people's ability to make credit payments. In another example cited by Andrejevic, the organisers of the Heritage Health Prize set up a competition, hosted by Kaggle, a company that specialises in crowdsourcing predictive algorithms, using anonymised health data to find the algorithm that best predicts which patients might be hospitalised. Such prediction could result in useful, pre-emptive health intervention or, more ominously, in reduced options for health insurance, Andrejevic points out.
In addition to stories about data abundance, their predictive capacities and increasingly disturbing significance, there are also signs of a growth in public awareness of data mining practices. At the time of writing, Take This Lollipop (Zada 2011), an interactive Facebook application, had more than 15.5 million Facebook likes, somewhat ironically, given that Facebook is the target of the app's critical message. On the Take This Lollipop homepage, users are asked, or dared, to take a lollipop by signing in with their Facebook account details. A video runs, in which a menacing-looking man, with dirty, chewed nails, taps on a keyboard and stares at a screen. He's staring at you, looking at your Facebook content, accessed via Facebook's Application Programming Interface (or API), which allows software applications to interact with Facebook data. He's looking at photos of you, or perhaps of your children, tagged as you, and of your friends. He looks at a Google map, identifies where you live, and gets in a car. He's coming after you, getting closer. The video ends with the name of one of your Facebook friends randomly selected by the app's algorithm: 'you're next', you are told about your friend.
A number of similar applications, usually subjecting the user to less visceral experiences than Take This Lollipop, emerged in the 2010s. These testify both to growing awareness of the possibilities and consequences of the mining of social media and other web-based data and to a desire to spread such awareness among web and social media users. They include sites such as We Know What You're Doing … (Hayward 2012) and Please Rob Me (Borsboom et al. 2010), which re-publish public data shared on social media, including views about employers, information about personal alcohol and drug consumption, new phone numbers and absences from home. We Know What You're Doing … describes itself as a 'social networking privacy experiment' designed to highlight the publicness and mine-ability of social media content. If users scroll down to the bottom of the webpage, they discover that the sentence started in the site's title ends with the words '… and we think you should stop'. Similarly, the footer of the Please Rob Me website declares 'our intention is not, and never has been, to have people burgled'. On the contrary, both of these sites aim to raise awareness of the consequences of over-sharing. There are many other examples of this kind.
Another indication of the growth in awareness of digital data mining can be seen in the kinds of articles and reports that populate the pages of the mainstream press with growing regularity. A quick glance at the tabs I have open in my browser on the day I write these words, Thursday, 9 October 2014, demonstrates this. They include: a feature from The Guardian newspaper online, from 8 October, entitled 'Sir Tim Berners-Lee speaks out on data ownership' (Hern 2014), with a subtitle which highlights that the inventor of the web believes that data must be owned by their subjects, rather than corporations, advertisers and analysts; a report from Wired magazine's website which asserts that our colleagues pose bigger threats to our privacy than hackers (Collins 2014); and another report from that day, this time on the BBC's news webpages, about Facebook vowing to 'aggressively get rid of fake likes' on its platform (BBC 2014).
At the time of writing, anecdotes, apps and articles such as those discussed here, which point to an abundance of digital data, some of the consequences of data mining, and efforts to respond to these phenomena, are ever more common. They attest to the fact that data on our online behaviour are increasingly available and mined, and that the people whose data are mined appear, at least at first glance, increasingly aware of it. They also show that data mining has consequences, which go beyond the outing of young gay people, the withdrawal of credit and the refusal of entry to networking events. As many writers have argued, data mining and analytics are about much more than this: they also offer new and opaque opportunities for discrimination and control. Numerous writers have made this case, including Andrejevic (2013), Beer and Burrows (2013), boyd and Crawford (2012), Gillespie (2014), Hearn (2010), Turow (2012) and van Dijck (2013a), to name only a few. The expansion of data mining practices quite rightly gives rise to criticisms of the possibilities that they open up for regimes of surveillance, privacy invasion, exclusion and inequality, concerns implicit in one way or another in the above examples.
These criticisms, which I discuss in detail in Chapter 3, are entirely justified when it comes to the spectacular forms of data mining and analysis that have hit the headlines in recent years, as carried out by the National Security Agency (NSA) in the US and Government Communications Headquarters (GCHQ) in the UK, as well as governments, law enforcers and the major social media corporations themselves (Lyon 2014; van Dijck 2014). However, at the time of writing, there are many more forms of data mining than these. The expansion of data mining in recent times means that a diverse range of data mining practices exists today, carried out by a variety of actors, in distinct contexts, for distinct purposes, and some of them are more troubling than others. Therefore, we need to differentiate types of data mining, actors engaged in such practices, institutional and organisational contexts in which it takes place, and the range of purposes, intentions and consequences of data mining. Writing specifically about one source of data, social media, José van Dijck and Thomas Poell (2013, p. 11) state that 'all kinds of actors – in education, politics, arts, entertainment, and so forth', as well as police, law enforcers and activists, are increasingly required to act within what they define as 'social media logic'. Such logic, they argue, is constituted by the norms, strategies, mechanisms and economies that underpin the incorporation of social media activities into an ever broader range of fields. One such norm or mechanism is data mining. Given the ubiquity of social media logic, we need to be attentive to the diversity of social media data mining that takes place within the varied fields identified by van Dijck and Poell, in order to fully comprehend data mining in its contemporary formation.
Couldry and Powell (2014) make a similar argument about the need to ground studies of data mining and analytics in real-world, everyday practices and contexts. Acknowledging that enthusiastic belief in the power of data needs to be tempered, they nonetheless argue that:

However misleading or mythical some narratives around Big Data (…), the actual processes of data-gathering, data-processing and organisational adjustment associated with such narratives are not mythical; they constitute an important, if highly-contested 'fact' with which all social actors must deal. (Couldry and Powell 2014, p. 1)
They argue that the focus in much critical debate on the ability of algorithms to act with agency (they give the work of Scott Lash (2007) as an example) leaves little room to explore the agency of small-scale actors who are making organisational adjustments to accommodate the rise of data's power. In contrast, Couldry and Powell argue that these actors deserve to be examined, alongside 'the variable ways in which power and participation are constructed and enacted' (2014, p. 1) in data mining's practices. In this, they acknowledge that they are echoing Beer's response to Lash, in which he argued that there is a need to focus not only on the power of algorithms, but also on 'those who engage with the software in their everyday lives' (Beer 2009, p. 999). Couldry and Powell propose the same in relation to data mining, arguing that what is needed is an open enquiry into 'what actual social actors, and groups of actors, are doing under these conditions in a variety of places and settings' (2014, p. 2). Evelyn Ruppert and others (2013) make the same ar...