1 Data and the city
Rob Kitchin, Tracey P. Lauriault and Gavin McArdle
Introduction
There is a long history of governments, businesses, science and citizens producing and utilizing data in order to monitor, regulate, profit from, and make sense of the urban world. Data have traditionally been time-consuming and costly to generate, analyse and interpret, and generally have provided static, often coarse, snapshots of urban phenomena. Recently, however, we have entered the age of big data, with data related to knowing and governing cities increasingly becoming a deluge; a wide, deep torrent of timely, varied, resolute and relational data (Kitchin 2014a; Batty 2016). This has been accompanied by an opening up of state data, and to a much lesser degree, business data, the production of volunteered geographic information, and the emergence of open data cultures and practices (Goodchild 2007; Bates 2012). As a result, ever more aspects of everyday life – work, consumption, travel, communication, leisure – and the worlds we inhabit are being captured and stored as data, made sense of through new data analytics, mediated through data-driven technologies, normalized through data-driven infrastructures, and shared through data infrastructures and data brokers (Amoore 2013; Kitchin 2014b; Offenhuber and Ratti 2014).
This data revolution has produced multiple challenges that require critical and technical attention – how best to produce, manage, analyse and act on urban big and open data, make sense of data infrastructures, data cultures and practices, and understand their consequences with respect to city governance, economy, politics and everyday life. However, to date, there has been relatively little critical reflection on the new emerging relationship between data and the city, and how we come to know and understand cities through data in the present era.
In the rush to create so-called ‘smart cities’, wherein core city services and infrastructures become digitally mediated and data-driven – generating, processing and acting on data in real-time to algorithmically manage systems and calibrate performance – much of the attention has been on how to technically create and implement suitable smart city technologies, and associated institutional and infrastructural supports such as data standards, protocols, policies, and a variety of telecom networks. Such data-driven technologies include: urban control rooms, e-government systems, city operating systems, coordinated emergency response systems, intelligent transport systems, integrated ticketing, real-time passenger information, smart parking, fleet and logistics management, city dashboards, predictive policing, digital surveillance, energy smart grids, smart meters, smart lighting, sensor networks, building management systems and a plethora of locative and spatial media. Collectively these technologies are generating an ever-growing tsunami of indexical data (uniquely linked to people, objects, territories, transactions) that can be repurposed in diverse ways – for example, in predictive profiling and social sorting of citizens and neighbourhoods, creating urban models and simulations, for policing and security purposes, etc. (CIPPIC 2006; Batty 2013; Kitchin 2014b; 2016). These data are in addition to large quantities of administrative and statistical data, more traditional sampled survey data, polling and public opinion data, and any other data the city may collect as part of reporting and delivering services.
Rather less attention has been paid to more epistemological, normative, ethical and political questions concerning how data-driven cities and urban issues are framed and approached; how city development and progress are envisaged; what kinds of data are being produced and to what purposes they are being employed; what kinds of cities we ideally want to create and live in (not simply from an instrumental perspective – solving particular issues such as traffic congestion – but with respect to issues such as fairness, equity, justice, citizenship, democracy and governance); how these data-driven technologies and processes work in practice on the ground; what kinds of social and spatial relations they produce; whom they benefit and disadvantage or exclude; what kinds of subjectivity, citizenship, participation and political action they support; and how they reshape many aspects of urban life. This is not to say that there has been no consideration of such questions – as the chapters that follow and the work they reference attest, there is a growing body of research that critically examines urban data and their use. However, the work to date is still relatively formative in theoretical and empirical terms, and often considers data-driven systems within the context of smart cities in general terms rather than focusing specifically on the unfolding relationship between data and cities. Meanwhile, the development and rollout of data-driven urbanism are largely outpacing critical reflection and interventions.
Data and the city
This volume is designed to help to fill this lacuna through an interdisciplinary examination of the relationship between data and contemporary urbanism. The focus is not smart city technologies per se, but rather the essays concentrate on how to make sense of urban data and the emerging era of data-driven urbanism. As well as providing synoptic analyses and new conceptual thinking, the chapters detail a number of illustrative examples of urban data, data-driven systems and related issues, including data infrastructures, urban blockchains, mapping, urban modelling, data provenance, data quality, data citizenship, citizen science, data practices, data cultures, data frictions and city dashboards. Importantly, given the wide-ranging, diverse and complex relationship between data and the city, and the need to bring various expertise and knowledge into dialogue, the contributors are drawn from a number of disciplines (Geography, Geographic Information Science, Planning, Sociology, Information Science, Design, Media Studies, Law and Computer Science).
All but three of the chapters were prepared initially for a workshop at the National University of Ireland Maynooth in September 2015, funded by the European Research Council through an Advanced Investigator Award to Rob Kitchin for The Programmable City project (ERC-2012-AdG-323636-SOFTCITY). Each essay was pre-prepared and submitted in advance of the meeting, then extensively discussed at the workshop, and subsequently revised for publication. While the book is designed to work as a standalone text, there is a companion book, Code and the City (Kitchin and Perng 2016), that focuses predominately on the relationship between software and the city. To provide a structure, we have divided the book into four parts.
Data-driven cities
The first part considers the relationship between data and the city in a broad sense, focusing on the creation of real-time cities and data-driven urbanism and how the ever-greater flows of data are transforming city services, infrastructures, urban life and how we understand and govern cities.
In the opening chapter, Martijn de Waal examines the creation of ‘real-time cities’, wherein computation is embedded into the fabric of cities producing real-time data flows that can be used to know and manage city services in the here-and-now. He argues that such data-driven systems are changing how we understand cities in three ways. The first is the adoption of an action-orientated epistemology wherein the production of real-time data, along with machine learning techniques, enables a new kind of scientific knowledge about cities that treats them as complex systems which can be made actionable through smart city technologies. The second approach is more critical in orientation and, on the one hand, challenges the scientific principles and epistemology of the first, and on the other, considers more ontological questions concerning how real-time data and data-driven systems transform the production of space, the nature of place, and the experience of living in the city. The third approach asks more normative questions and argues that cities cannot be conceptualized and approached as being analogous to other complex systems, such as galaxies and rainforests, because they are social-cultural-political in nature. Instead, it is contended that a new science of cities needs to frame data-driven cities with respect to wider concerns about the kinds of cities we want to create and how to produce particular kinds of ‘cityness’. De Waal argues that more attention needs to be paid to this third kind of knowledge making and its praxes.
Mike Batty considers the nature of urban big data and the epistemological challenges of using them to make sense of the city, placing his discussion in historical context. Adopting an approach that is perhaps best characterized as fitting within de Waal’s first mode of understanding data-driven cities, Batty argues that we have always been struggling to extract insights from ever-larger and more dynamic data as urban technologies evolve and urban computational research struggles to keep up. He notes that what might be considered small data – sampled in time, space and by category – soon become very large once the interactions between data points are examined. Using the concept of a data cube, Batty examines the characteristics of urban flow data between locations. In particular, he illustrates his arguments by detailing the difficulties of making sense of traditional transport interaction data, such as origin (home) to destination (work) flows across a city, and more dynamic and massive datasets, such as the tap-ins and tap-outs of travellers on the London Underground (one of his datasets consists of nearly 10 billion records generated over 86 days in the summer of 2012). In both cases, urban science is still struggling to extract and communicate meaningful insight. He concludes that rather than abandoning theory for an empiricist form of data science, there is a pressing need to develop a theoretically insightful urban science.
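The point that modest data become very large once interactions are examined can be made concrete with a small sketch. The following Python fragment is illustrative only and is not drawn from Batty's chapter: it assumes a flow cube indexed by origin, destination and time slice, and the function names, zone labels and figures are hypothetical.

```python
from collections import Counter


def cube_cells(n_zones: int, n_time_slices: int = 1) -> int:
    """Number of possible cells in an origin x destination x time cube."""
    return n_zones * n_zones * n_time_slices


def build_flow_cube(trips):
    """Count trips per (origin, destination, time_slice) cell.

    `trips` is an iterable of (origin, destination, time_slice) tuples.
    Only non-empty cells are stored, so the cube stays sparse.
    """
    return Counter(trips)


# Hypothetical figures: 1,000 zones observed over 24 hourly slices.
pairs = cube_cells(1000)
cells = cube_cells(1000, 24)

# A toy set of trip records (invented for illustration).
trips = [("A", "B", 8), ("A", "B", 8), ("B", "A", 17), ("A", "C", 8)]
cube = build_flow_cube(trips)
```

Under these assumptions, a thousand zones already imply a million origin-destination pairs, and adding a time dimension multiplies that again, which is why interaction data outgrow the observations from which they are derived.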
In his chapter, Rob Kitchin argues that while there have long been forms of urbanism that are data-informed, a new era of data-driven urbanism is unfolding as cities become ever more instrumented and networked, their systems interlinked and integrated, and vast troves of big urban data are generated and used to manage and control urban life in real-time. He contends that data-driven urbanism is the key mode of production for what have widely been termed smart cities. Adopting an approach that largely maps onto de Waal’s third approach, Kitchin critically examines a number of urban data issues, including: the politics of urban data and production of data assemblages; data ownership, data control, data coverage and access; the creation of buggy, brittle, hackable urban systems (data security, data integrity); and social, political, ethical effects (data protection and privacy, dataveillance, and data uses including social sorting and anticipatory governance). He concludes that whilst data-driven urbanism purports to produce a common-sense, pragmatic, neutral, apolitical, evidence-based form of responsive urban governance, it is nonetheless selective, crafted, flawed, normative and politically inflected. Consequently, whilst data-driven urbanism provides a set of solutions for urban problems, it does so within limitations and in the service of particular interests, or it relies too heavily on mathematical and engineered models that do not factor in a city’s social, cultural, historical, institutional and political complexities; those very things that give cities their character.
Urban data
The second part focuses attention on the nature of urban data, examining them from ontological, political, practical and technical points of view. Importantly, the analysis does not conceive of urban data from a common-sense, essentialist position, wherein they are seen to faithfully and validly represent the state of the world, but rather considers the ways in which data are produced and framed within socio-technical systems.
Teresa Scassa provides a critical overview of crime data and their sharing through open data sites, interactive visualizations, and other media. She details how crime data are far from neutral, objective records of criminal, policing and legal activity, but rather are shaped significantly by legal, institutional and cultural factors. She argues that crime data are subjective and contested, record certain kinds of information but exclude others, and are known to be full of gaps and errors. Moreover, capturing, analysing and acting upon crime data requires human interpretation and judgement, framed within societal and institutional contexts. And yet, despite these issues, crime data are often taken at face value and are used to drive social, policing, security and legal policy and programmes and to underpin new interventions such as predictive policing. While the data do hold value and are important in revealing levels of crime and society’s institutional response, she contends that they need to be treated with caution, with users considering how, by whom, and for what purposes the data were generated to gauge their veracity and trustworthiness.
Jim Thatcher and Craig Dalton similarly consider the issues of data veracity and trustworthiness by considering data provenance. They note that data provenance is presently largely instrumental in nature and concerns information about the production and history of a dataset. Such information allows users to know how the data were captured, by whom, using what techniques and technologies, how they were processed and handled, and so on, enabling them to judge their quality, shortcomings and suitability for use. Typically, such information is stored as metadata – that is, data about the data. However, they contend that such an instrumental approach to data provenance is limited and too technically orientated, ignoring the wider context in which the data are produced and used. Instead, they suggest the use of a more-than-technical form of provenance that not only documents traditional metadata, but also includes situated contextual factors such as motivation, value and power. They formulate this version of data provenance as the recording of ‘data encounters’ which capture the always already-cooked nature of data and the contextual nature of their use.
Jim Merricks White likewise is interested in data encounters, but rather than focus on provenance, he seeks to follow data from their generation through to their various uses, exposing how they are cleaned, recombined and put to work. Using an empirical example of infant mortality data and their use in city indicator initiatives, he charts the translation and circulation of data, seeking to document what he terms ‘data threads’, highlighting the entanglement of data infrastructures and geography, and their inherent materiality and relationality. He traces how infant mortality data are generated by messy human and computational practices shaped by a framework of definitions and standards. These data are then used in varying ways, reworked to create new derived data, and used in ways not anticipated with respect to their original generation. He notes that the devastating loss of a child’s life is rendered first as trace, then as data point, and then as input to derivative calculations and distant ambitions, in this case various health and city indicator initiatives. With each transformation, he argues, the data become increasingly alienated from their material associations and their meaning mutates to reflect new discourses and ideologies. Comparing his notion of data threads to that of ‘data journeys’ detailed by Bates et al. (2016), White provides a useful epistemological avenue for thickening the description of data assemblages and how data translate and are woven together across such assemblages.
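The contrast between instrumental provenance and a record of a ‘data encounter’ can be sketched as a simple data structure. The Python fragment below is a hypothetical illustration rather than Thatcher and Dalton's own schema: the class names, field names and example values are all assumptions, chosen only to show contextual factors (motivation, value, power) sitting alongside conventional metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalProvenance:
    """Conventional, instrumental metadata about a dataset."""
    captured_by: str
    capture_method: str
    technology: str
    processing_steps: List[str] = field(default_factory=list)


@dataclass
class DataEncounter:
    """Technical provenance plus situated context (hypothetical fields)."""
    technical: TechnicalProvenance
    motivation: str
    value: str
    power: str

    def describe(self) -> str:
        """Return a one-line, human-readable summary of the encounter."""
        steps = ", ".join(self.technical.processing_steps) or "none recorded"
        return (
            f"Captured by {self.technical.captured_by} "
            f"via {self.technical.capture_method} ({self.technical.technology}); "
            f"processing: {steps}; motivation: {self.motivation}"
        )


# Invented example values, for illustration only.
encounter = DataEncounter(
    technical=TechnicalProvenance(
        captured_by="Example City Transport Office",
        capture_method="roadside count",
        technology="inductive loop sensor",
        processing_steps=["deduplicated", "aggregated hourly"],
    ),
    motivation="justify a congestion scheme",
    value="supports a funding bid",
    power="agency decides what is counted and released",
)
print(encounter.describe())
```

Keeping the technical record as a nested object, rather than flattening everything into one list of fields, is one possible design choice: it leaves existing metadata practice intact while making the contextual layer explicit.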
Considering the nature of urban data further, Dietmar Offenhuber examines what makes urban data meaningful, the extent to which data are always cooked and never raw, and concerns with respect to the repurposing of data. Utilizing the concept of ‘data friction’, he examines the issues that arise when data and metadata generated by different organizations, that utilize different formats and standards, are moved or brought into contact. He notes that despite difficulties and limitations, data sets can develop a life of their own and be repurposed in diverse ways, often as data proxies for other phenomena. Offenhuber examines these issues with respect to Twitter data, which have become widely used in social science research, and satellite imagery generated by the Operational Linescan System (OLS) of the US Air Force’s Defense Meteorological Satellite Program (DMSP). He contends that Twitter data, despite their widespread repurposing, are ‘sticky data’, that is, meaningful when discussed in their original context, but problematic to interpret, extrapolate and generalize otherwise. In contrast, OLS/DMSP data are relatively non-sticky, being used extensively to identify city street lighting and act as a proxy for population density and economic activity, though they are not without problems. Offenhuber thus concludes that as proxies for urban phenomena, both data sources offer only partial perspectives and need to be used with caution.
Urban data technologies and infrastructures
The third part examines the constellation of existing and emerging urban data technologies and infrastructures. The chapters explore a range of political, practical and technical issues and epistemological and theoretical approaches with respect to building, operating and making sense of such data-driven systems.
One way in which a plethora of urban data are made sense of by city managers and shared with citizens is through city dashboards that provide a variety of visualization and analytic tools which enable these data to be explored. While such dashboards provide useful tools for evaluating and managing urban services, understanding and formulating policy, and creating public knowledge and counter-narratives, Rob Kitchin and Gavin McArdle’s analysis reveals a number of conceptual and practical shortcomings. They critically examine six issues with respect to the building and use of city dashboards: epistemology, scope and access, veracity and validity, usability and literacy, use and utility, and ethics. Drawing on their experience of building the Dublin Dashboard, they advocate a shift in thinking and praxis that openly situates the epistemology and instrumental rationality of city dashboards and addresses more technical shortcomings.
Pouria Amirian and Anahid Basiri also consider the sharing and analysis of urban big data, though their focus is more technical in nature. Given the wide variety of different data-driven platforms being utilized across a nu...