1 Noise, data, silence, privacy
Thomas N. Cooke
Silence, in the digital realm, is an odd but welcome postulation. What does silence look like online? Where does it come from? Why does it, or should it, matter? These are all exceedingly difficult questions to answer, perhaps because “digital silence” is something of an oxymoron. The “digital” is an expression of networked computer technologies constantly transmitting 1s and 0s, a realm of endless data transfer that never sleeps. To invoke silence in such a formulation implies that data flows stop, and that is precisely the point. Because the Internet is oft characterized as a “noisy” infrastructure, the notion of silence is indeed an intervention into the Internet itself: a necessary one that promotes user-first privacy within the virtually boundless reach of Big Data. Of central interest is ProtectMyPrivacy (PMP), a smartphone application that freezes data flows inside of smartphones so as to put the user in control of how and whether that data ought to be circulated, collected, and analyzed. I argue that silence is a temporary but crucial digital intervention that allows the user a previously unrealized opportunity to take digital privacy into her own hands. This technological intervention is particularly important because it prevents data from being mistreated, mishandled, and, as such, misplaced in the sea of data noise.
Data has a turbulent relationship with noise. As a measurement of a larger picture, one intrinsically colored by numerous cultural, political, technical, and economic shades, hues, and tones, data is but a color sample. It is but a fractional representation of the complexities of a much larger story. Working with data can facilitate focused analyses, create new ways of accentuating underappreciated details, and provide insight into some otherwise unobvious ingredients of life itself. However, when data is taken out of context it becomes confusing and ambiguous, especially so to the regimes of logic informing the usage of various data measurement, calculation, organization, and judgment tools. When taken out of context, data’s meaningfulness is constructed via interpretation. I argue that, as data is mined from a user’s smartphone, it tends to be misinterpreted, mishandled, and even exaggerated in its meaning and utility, thereby becoming less like “data” and more akin to “noise,” particularly as data is aggregated in massive databases.
The relationship between noise and data is thus a privacy problem. As Big Data mentalities commit corporations and governments toward the pursuit of information mastery in the name of profit, entertainment, security, and surveillance, these actors are inescapably confronted with the problem of the mass accumulation and mass interpretation of data: processes that unavoidably build meaning outside of the context in which the data emerged. The propensity to mistreat, misunderstand, and mishandle data leads to inaccurate user profiling, which in turn leads to an array of legal problems and ethical conundrums. While some of these issues are believed to be addressable via policy interventions and discursive developments, such responses fail to recognize an inescapable epistemological problem surrounding the relationship between data creation and data analysis: data is inescapably bound to, and offset by, noise itself.
The chapter is structured in three parts. Part 1 explores the nature of the relationship between noise and data on aural, technical, and theoretical levels. It begins by turning to communications studies and critical data studies to demonstrate that (despite an intellectual tendency to argue otherwise) data is not meaningful on its own terms. Data is always constitutive of various ranges of noise, misrepresentations, and assumptions. Part 2 of the chapter turns to philosophical work on noise by Serres (1980, 1982) to theorize how the turbulent relationship between noise and society translates into the digital realm, and thus becomes a privacy problem. Part 3 discusses the ways in which temporary silences of data flows not only interrupt Big Data but also reposition users to build a relationship with, and act upon, data in their smartphones. I also discuss how these momentary silences compel us as analysts to rediscover how they create moments of agency for entire collectives of privacy-oriented smartphone users, and to relate the predicament of data mining in real time to prevailing discursive lessons regarding the nature of contemporary digital surveillance. The subsection exploring PMP demonstrates precisely how the process of temporary silences, as micro-interventions, unfolds. The chapter concludes by identifying how the power flows inherent in Big Data information transmission channels can indeed be interrupted and even harnessed to empower the user over the outcome of her own digital profiles and digital privacy. It is here that I implore future research to consider the sociopolitical ramifications of reifying digital privacy as a “top-down” matter of the regulation and control of data for users, as opposed to seeking out new horizons of possibility afforded by hacktivist technologies that deploy silence in noisy networks filled with noisy data.
Part 1: noise and data
Between May and October 2014 at the Centre de Cultura Contemporània de Barcelona art center in Barcelona, a six-minute film was projected onto three adjacent walls. Upon entering the partially enclosed space, viewers stepped into a virtual tour of Telefonica’s data center in Alcalá, Spain, one of the largest of its kind in the world. The project, created by Timo Arnall and entitled “Internet Machine,” not only allowed viewers to “see” the architecture of the Internet but also allowed them to “hear” data being created, accessed, analyzed, and circulated. “I wanted to look beyond the childish myth of the ‘cloud,’” says Arnall, as he encourages his participants and readers to think about the nuts, bolts, energies, and moving parts of the Internet. “Think of the sound of the fan of your computer. Multiply that by 20 times or more. Think what thousands of those all going at once would sound like” (Arnall, 2014). For someone sitting at home on a tablet or in front of a laptop in an office cubicle, the belief that the Internet is a quiet, discreetly operated network instantly collapses upon experiencing the materiality of the Internet. Arnall’s exhibit is not merely an allusion, nor is it merely allegorical. It is, at once, a critical artistic exposition on networks as well as a documentation of the noise of the Internet itself. The multitude of fans, processors, and storage devices produces a cacophony of aural noise as a result of hundreds of different vibrational frequencies connecting and clashing. These devices are, in turn, affected by micro-noises within their component parts and wiring as well.
As electrical currents flow through wires, capacitors, and transistors embedded within their various component parts, they produce magnetic fields that cause their own micro-vibrations. Beyond the aurally measurable dimensions of the noisiness of networked machines, any electrical flow in a wire that is not discernible as a signal is considered noise as well. Signals exist as such because they are designed to be differentiated from random electrical current fluctuations. Signals are thus composed of rhythmically timed releases of voltage representing binary units (1s and 0s). Arranged into groups, the binary units constitute the data transmitted through network cables. If there is any interference, or noise, in a channel, it impedes the distinguishability of the binaries being sent. It can corrupt them, obfuscate them, redirect them, or prevent them from being received altogether. Electrical interference can be created if wires are damaged or too long, if different conductors are connected by different wires, or if signals from neighboring transmissions interfere with other flows electromagnetically. All of these phenomena interfere with a signal and degrade it, thereby making it increasingly less distinguishable from noise itself.
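The way interference erodes the distinguishability of binaries can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a description of any real network hardware: the voltage levels, the decision threshold, the Gaussian shape of the fluctuation, and every function name below are hypothetical choices made for illustration only.

```python
import random

# All three values are assumptions chosen for illustration.
HIGH_VOLTS = 1.0   # voltage level standing for a binary 1
LOW_VOLTS = 0.0    # voltage level standing for a binary 0
THRESHOLD = 0.5    # receiver reads "1" above this level, "0" otherwise


def transmit(bits, noise_std, rng):
    """Turn bits into voltage levels and add a random (Gaussian) fluctuation."""
    return [(HIGH_VOLTS if b else LOW_VOLTS) + rng.gauss(0.0, noise_std)
            for b in bits]


def receive(voltages):
    """Recover bits by comparing each received voltage to the threshold."""
    return [1 if v > THRESHOLD else 0 for v in voltages]


def bit_errors(sent, received):
    """Count positions where the received bit differs from the sent bit."""
    return sum(1 for s, r in zip(sent, received) if s != r)


def error_count(noise_std, n_bits=10_000, seed=0):
    """Send n_bits random bits through the noisy channel and count mistakes."""
    rng = random.Random(seed)
    sent = [rng.randint(0, 1) for _ in range(n_bits)]
    received = receive(transmit(sent, noise_std, rng))
    return bit_errors(sent, received)


if __name__ == "__main__":
    # As the fluctuation grows relative to the gap between the two voltage
    # levels, more binaries are misread at the receiving end.
    for noise_std in (0.0, 0.1, 0.5, 2.0):
        print(noise_std, error_count(noise_std))
```

With no fluctuation every bit is read back as sent. As the fluctuation approaches and exceeds the gap between the two voltage levels, the receiver's readings drift toward guesswork, which is the sense in which a degraded signal becomes less distinguishable from noise.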
This kind of noise problem is not merely an aural problem, as depicted by Arnall’s exhibit. Rather, it is a technical problem whereby the reception of a signal (and, as such, data itself) becomes a target for intervention by various walks of the communications sciences. The issue was first taken up via Claude Shannon’s (1948) groundbreaking technical analysis of “signal noise” within electrical communications systems. To Shannon, noise was indeed something akin to Arnall’s unwanted sound. But it was also a very specific kind of unwanted sound. It was an unwanted reverberation. To Shannon, noise was the irregular but measurable fluctuation inherent in transmitted electrical signals.
Noise here refers not only to interference within a transmission channel, but also to the quality or status of the channel’s content: data. While data can become temporarily distorted by “noise” to various degrees, data is argued to exist because it can be distinguished from the background noise of neighboring signals and randomly fluctuating energy. Data is thus believed to be “meaningful” because it is not noise (Shannon, 1948). Shannon’s logic dichotomizes noise and data as distinctly different and completely separate objects. But, as the following will argue, the line between noise and data is particularly blurry when dealing with noisy machines on a noisy Internet, laden with algorithms that often make errors, mishandle data, and misrepresent it altogether. The “meaning” of data, so to speak, is not self-evident.
Noise and the meaning of data
The matter of how and whether data is meaningful, on its own terms, is a problematic one. A wide array of technical literature aligns closely with Shannon’s formulation, whereby the presence of noise in electrical communications systems precludes the availability, integrity, and discernibility of data, essentially destroying its meaning as such (Minkoff, 1992; McDonough and Whalen, 1995; Pierce, 1980). However, a crucial distinction must be made between normative and critical treatments regarding how and whether data is meaningful.
A postpositivist, critical perspective argues that data, on its own terms, is not self-evidently meaningful. Data is not considered information until it is arranged, visualized, contextualized, and subsequently interpreted. How data becomes meaningful is thus an issue of how it is derived and rationalized. The extent to which data can be considered distinctly unique (that is, to be considered not noise) is a semantic concern, not merely a technical concern. To Kitchin (2014), whose etymological excavation of the term data traces its origin to the Latin word dare, “to give,” data may indeed be a raw element of a technical system that can be taken through various techniques of recording, measuring, computation, and so on. Considerable emphasis is placed upon taken, in the sense that data is but a sample of all possible measurements available. In other words, recorded and collected data used to analyze and make assumptions about any given phenomenon, social, technical, or otherwise, can never represent an entire picture accurately. Data is always partially representative to Kitchin. It is always selective, and the criteria used to distinguish what data is taken have consequences. Data is a product of modernity, spawned out of the seventeenth- and eighteenth-century scientific modes of producing knowledge, a child of the positivist postulation whereby only that which can be verified by science and math can represent verifiable knowledge and truth (Kitchin, 2014).
Various walks of the hard sciences conceptualize data as a self-contained embodiment that exists prior to argument and interpretation, the human preoccupations that convert data into facts, evidence, and information. According to this intellectual treatment, data is thus believed to be meaningful on its own terms, independent of format, medium, producer, and context. But, as Kitchin argues, this is but one modality for understanding data. To understand it beyond a “narrow rhetorical view” is to contend with the ways in which data exists as materially and technically as it does socially; data is never simply data. How data is created and used varies depending upon who and what analyzes, uses, and draws conclusions from it (Kitchin, 2014).
Claims regarding the “meaningfulness” of data are thus paradoxically bound to issues regarding the context of its origin. As data is produced by gyroscopes and accelerometers in a smartphone, or keylogging algorithms embedded within a smartphone API, it is amalgamated and (re)presented within different frames of reference and understanding so as to make it intelligible to data analysts. For example, metadata about how many times a smartphone user “taps” a region of a smartphone screen is translated into median coordinates representing averages of usage. That translation process allows analysts to draw inferences and conclusions about reward systems, content accessibility, and possible future user behavioral tendencies. But the means through which these inferences and conclusions emerge are informed by their own systems of assumptions, ideas, logics, theories, and goals about how some data ought to be valued and how other data ought to be dismissed. Each of these systems is a consensus regime designated by corporate priorities, programming sensibilities, logical paradigms, and privileged computer coding cultures. Perhaps most importantly, the experience of the user herself, as this data is produced as a result of her interactions with the smartphone, exists in a context that is fundamentally different from the side story that the data attempts to tell.
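The translation of taps into median coordinates can be sketched in a few lines. This is a hypothetical illustration only: the function name, the coordinate values, and the two sessions are invented for the example and do not describe any actual analytics pipeline.

```python
from statistics import median


def summarize_taps(taps):
    """Reduce a list of (x, y) screen taps to a count and median coordinates."""
    xs = [x for x, _ in taps]
    ys = [y for _, y in taps]
    return {"taps": len(taps), "median_x": median(xs), "median_y": median(ys)}


# Hypothetical session A: three deliberate taps on the same button.
session_a = [(100, 200), (100, 200), (100, 200)]
# Hypothetical session B: one tap on that button plus two stray touches.
session_b = [(0, 0), (100, 200), (300, 900)]

summary_a = summarize_taps(session_a)
summary_b = summarize_taps(session_b)

print(summary_a)
print(summary_b)
# True: the two sessions are indistinguishable once summarized.
print(summary_a == summary_b)
```

Two quite different experiences of the screen collapse into the same summary. The analyst who sees only the median has no way to recover which session produced it, which is one concrete sense in which the user's context is lost in translation.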
The user’s context is the larger story that data mining is assumed to reveal. Mistakes, misrepresentations, and partial insights are thus inevitable and unavoidable. Accordingly, noise is always a register of data. It is not a purely external phenomenon that interferes with and impedes data but a property of data itself. The tools used to analyze data are thus seeking to mitigate noise. For example, content data, such as the photos and videos found on social media platforms, is created with an intention, vision, and value specific to the user and her perspective upon the objects and subjects she is recording. The context of her recording is constituted by the wide range of lived conditions and experiences surrounding the recording. When she uploads content data online, or when that content data is copied from her device, its context changes. Most problematic is the moment that same content data is intercepted, circulated, analyzed, and then recast into a variety of visual metrics to create assumptions about her shopping preferences or, perhaps more insidiously, to generate “predictions” about her threat level to national security. Mined data thus becomes a different object than that which was initially constructed by the user herself. Mined data, from the perspective of the user, is as meaningless as it is meaningful. This is particularly the case for “data about data,” or “metadata.”
Metadata is collected in the moment a user creates content data: the coordinates approximating her geophysical location, the number of times she viewed the video, and the percentages of friends who saw her photos. Metadata is often trivial in terms of its meaningfulness from the perspective of the user, and it almost always means something different to the user than it does to data harvesters. Noise may indeed be a technical problem, as Shannon argued, but it is also a social problem, because the matter of whether the information constituting data is discernible or noisy can never be settled. Consider how metadata about a user’s Internet browsing behavior is never manually removed from her smartphone, nor is it ever voluntarily provided by the user herself. It is extracted via data collection mechanisms, such as HTTP or JavaScript cookies (thousands of byte-sized data packets recursively installed and retrieved by 95 percent of the nearly one billion websites in existence) that track...