III
Uses of Big Data for Sectoral Measurement
9
Nowcasting the Local Economy
Using Yelp Data to Measure Economic Activity
Edward L. Glaeser, Hyunjin Kim, and Michael Luca
9.1 Introduction
Public statistics on local economic activity, provided by the US Census Bureau's County Business Patterns (CBP), the Bureau of Economic Analysis (BEA), the Federal Reserve System (FRS), and state agencies, provide invaluable guidance to local and national policy makers. Whereas national statistics, such as the Bureau of Labor Statistics' (BLS) monthly job report, are reported in a timely manner, local datasets are often published only after long lags. These datasets are also aggregated to coarse geographic areas, which imposes practical limitations on their value. For example, as of August 2017, the latest available CBP data were from 2015, aggregated to the zip code level, and much of the zip code data were suppressed for confidentiality reasons. Similarly, the BEA's metropolitan area statistics have limited value to the leaders of smaller communities within a large metropolitan area.
Data from online platforms such as Yelp, Google, and LinkedIn raise the possibility that researchers and policy makers could supplement official government statistics with granular crowdsourced data that are available years before the official statistics are released. A growing body of research has demonstrated the potential of digital exhaust to predict economic outcomes of interest (e.g., Cavallo 2018; Choi and Varian 2012; Einav and Levin 2014; Goel et al. 2010; Guzman and Stern 2016; Kang et al. 2013; Wu and Brynjolfsson 2015). Online data sources also make it possible to measure new outcomes that were never included in traditional data sources (Glaeser et al. 2018).
In this paper, we explore the potential for crowdsourced data from Yelp to measure the local economy. Relative to the existing literature on various forecasting activities, our key contribution is to evaluate whether online data can forecast government statistics that provide traditional measures of economic activity, at geographic scale. Previous related work has been less focused on how predictions perform relative to traditional data sources, especially for core local datasets like the CBP (Goel et al. 2010). We particularly focus on whether Yelp data predict more accurately in some places than in others.
By the end of 2016, Yelp listed over 3.7 million businesses with 65.4 million recommended reviews.1 These data are available on a daily basis and with addresses for each business, raising the possibility of measuring economic activity day-by-day and block-by-block. At the same time, it is a priori unclear whether crowdsourced data will accurately measure the local economy at scale, since changes in the number of businesses reflect both changes in the economy and the popularity of a given platform. Moreover, to the extent that Yelp does have predictive power, it is important to understand the conditions under which Yelp is an accurate guide to the local economy.
To shed light on these questions, we test the ability of Yelp data to predict changes in the number of active businesses as measured by the CBP. We find that changes in the number of businesses and restaurants reviewed on Yelp can help to predict changes in the number of overall establishments and restaurants in the CBP, and that predictive power increases with zip code level population density, wealth, and education level.
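As a rough illustration of what "predicting changes" means here, the sketch below fits a one-variable least-squares line relating year-over-year changes in CBP establishment counts to changes in Yelp business counts across zip codes. It is a minimal sketch under stated assumptions, not the specification estimated in this chapter: the function names `ols_fit` and `r_squared` and the five-zip-code sample are hypothetical and chosen only for illustration.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot


# Hypothetical data: year-over-year change in business counts for five
# zip codes (illustrative values only, not taken from Yelp or the CBP).
yelp_change = [2, -1, 0, 4, 1]
cbp_change = [3, -1, 1, 5, 2]

intercept, slope = ols_fit(yelp_change, cbp_change)
fit_quality = r_squared(yelp_change, cbp_change, intercept, slope)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={fit_quality:.3f}")
```

A slope near one with a high R-squared would indicate that changes on the platform track changes in the official counts closely; splitting the sample by population density, income, or education would be one way to examine how that fit varies across places.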
In section 9.2, we discuss the data. We use the entire set of businesses and reviews on Yelp, which we merged with CBP data on the number of businesses open in a given zip code and year. We first assess the completeness of Yelp data relative to the CBP, beginning with the restaurant industry where Yelp has significant coverage. In 2015, the CBP listed 542,029 restaurants in 24,790 zip codes, and Yelp listed 576,233 restaurants in 22,719 zip codes. Yelp includes restaurants without paid employees that may be overlooked by the US Census Bureau's Business Register. We find that there are 4,355 zip codes with restaurants in the CBP that do not have any restaurants in Yelp. Similarly, there are 2,284 zip codes with Yelp restaurants and no CBP restaurants.
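The zip code comparison described above can be sketched with plain set operations. This is a minimal illustration, assuming each source has been reduced to a mapping from zip code to restaurant count; the function name `compare_zip_coverage` and the toy zip codes are hypothetical. The final lines check that the totals reported in the text are mutually consistent: both sources imply the same number of zip codes covered by both.

```python
def compare_zip_coverage(cbp_counts, yelp_counts):
    """Count zip codes with restaurants in only one source or in both.

    Each argument maps a zip code (kept as a string so leading zeros
    survive) to the number of restaurants recorded by that source.
    """
    cbp_zips = {z for z, n in cbp_counts.items() if n > 0}
    yelp_zips = {z for z, n in yelp_counts.items() if n > 0}
    return {
        "cbp_only": len(cbp_zips - yelp_zips),
        "yelp_only": len(yelp_zips - cbp_zips),
        "both": len(cbp_zips & yelp_zips),
    }


# Hypothetical toy data, not actual CBP or Yelp counts.
cbp = {"00001": 120, "00002": 3, "00003": 40}
yelp = {"00001": 150, "00003": 38, "00004": 1}
overlap = compare_zip_coverage(cbp, yelp)
print(overlap)

# Consistency check on the 2015 figures reported in the text.
both_from_cbp = 24_790 - 4_355    # CBP zip codes minus CBP-only zip codes
both_from_yelp = 22_719 - 2_284   # Yelp zip codes minus Yelp-only zip codes
print(both_from_cbp, both_from_yelp)
```

Both subtractions give 20,435, so the reported figures imply that 20,435 zip codes had restaurants in both sources in 2015.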
We find that regional variation in Yelp coverage is strongly associated with the underlying variation in population density. For example, there are more Yelp restaurants than CBP restaurants in New York City, while rural areas like New Madison, Ohio have limited Yelp coverage. In 2015, 95 percent of the US population lived in zip codes in which Yelp counted at least 50 percent of the number of restaurants that the CBP recorded. This cross-sectional analysis suggests that Yelp data are likely to be more useful for policy analyses in areas with higher populat...