Big Data in Cognitive Science

About This Book

While laboratory research is the backbone of collecting experimental data in cognitive science, a rapidly increasing amount of research is now capitalizing on large-scale and real-world digital data. Each piece of data is a trace of human behavior and offers us a potential clue to understanding basic cognitive principles. However, we have to be able to put the pieces together in a reasonable way, which necessitates both advances in our theoretical models and development of new methodological techniques.

The primary goal of this volume is to present cutting-edge examples of mining large-scale and naturalistic data to discover important principles of cognition and evaluate theories that would not be possible without such a scale. This book also has a mission to stimulate cognitive scientists to consider new ways to harness big data in order to enhance our understanding of fundamental cognitive processes. Finally, this book aims to warn of the potential pitfalls of using, or being over-reliant on, big data and to show how big data can work alongside traditional, rigorously gathered experimental data rather than simply supersede it.

In sum, this groundbreaking volume offers cognitive scientists and those in related fields an exciting, detailed, stimulating, and realistic introduction to big data, and shows how it may greatly advance our understanding of the principles of human memory, perception, categorization, decision-making, language, problem-solving, and representation.

Information

Year: 2016
ISBN: 9781315413556
Edition: 1

1
DEVELOPING COGNITIVE THEORY BY MINING LARGE-SCALE NATURALISTIC DATA
Michael N. Jones
Abstract
Cognitive research is increasingly coming out of the laboratory. It is becoming much more common to see research that repurposes large-scale and naturalistic data sources to develop and evaluate cognitive theories at a scale not previously possible. We now have unprecedented availability of massive digital data sources that are the product of human behavior and offer clues to understanding basic principles of cognition. A key challenge for the field is to properly interrogate these data in a theory-driven way to reverse engineer the cognitive forces that generated them; this necessitates advances in both our theoretical models and our methodological techniques. The arrival of Big Data has been met with healthy skepticism by the field, but has also been seen as a genuine opportunity to advance our understanding of cognition. In addition, theoretical advancements from Big Data are heavily intertwined with new methodological developments: new techniques for answering questions with Big Data also open up questions that could not previously have been asked. The goal of this volume is to present emerging examples from across the field that use large and naturalistic data to advance theories of cognition in ways that would not be possible in the traditional laboratory setting.
While laboratory research is still the backbone for tracking causation among behavioral variables, more and more cognitive research is now relinquishing experimental control in favor of mining large-scale, real-world datasets. We are seeing an exponential expansion of data available to us that is the product of human behavior: social media, mobile device sensors, images, RFID tags, linguistic corpora, web search logs, and consumer product reviews, to name just a few streams. As of 2012, about 2.5 exabytes of digital data were created every day (McAfee, Brynjolfsson, Davenport, Patil, & Barton, 2012). Each little piece of data is a trace of human behavior and offers us a potential clue to understanding basic cognitive principles; but we have to be able to put all those pieces together in a reasonable way. This approach necessitates both advances in our theoretical models and the development of new methodological techniques adapted from the information sciences.
Big Data sources are now allowing cognitive scientists to evaluate theoretical models and make new discoveries at a resolution not previously possible. For example, we can now use online services like Netflix, Amazon, and Yelp to evaluate theories of decision-making in the real world and at an unprecedented scale. Wikipedia edit histories can be analyzed to explore information transmission and problem solving across groups. Linguistic corpora allow us to quantitatively evaluate theories of language adaptation over time and generations (Lupyan & Dale, 2010) and models of linguistic entrainment (Fusaroli, Perlman, Mislove, Paxton, Matlock, & Dale, 2015). Massive image repositories are being used to advance models of vision and perception based on natural scene statistics (Griffiths, Abbott, & Hsu, 2016; Khosla, Raju, Torralba, & Oliva, 2015). Twitter and Google search trends can be used to track the outbreak and spread of “infectious” ideas, memory contagion, and information transmission (Chen & Sakamoto, 2013; Masicampo & Ambady, 2014; Wu, Hofman, Mason, & Watts, 2011). Facebook feeds can be manipulated to explore information diffusion in social networks (Bakshy, Rosenn, Marlow, & Adamic, 2012; Kramer, Guillory, & Hancock, 2014). Theories of learning can be tested at large scales and in real classroom settings (Carvalho, Braithwaite, de Leeuw, Motz, & Goldstone, 2016; Fox, Hearst, & Chi, 2014). Speech logs afford both theoretical advancements in auditory speech processing and practical advancements in automatic speech comprehension systems.
The primary goal of this volume is to present cutting-edge examples that use large and naturalistic data to uncover fundamental principles of cognition and evaluate theories that would not be possible without such scale. A more general aim of the volume is to take a very careful and critical look at the role of Big Data in our field. Hence contributions to this volume were handpicked to be examples of advancing theory development with large and naturalistic data.
What is Big Data?
Before trying to evaluate whether Big Data could benefit cognitive science, a very fair question is simply: what is Big Data? Big Data is a very popular buzzword in the contemporary media, producing much hype and many misconceptions. Whatever Big Data is, we are told it is having a revolutionary impact on a wide range of sciences, that it is a “game-changer” transforming the way we ask and answer questions, and that it is a must-have in any modern scientist’s toolbox. But when pressed for a definition, there seems to be no solid consensus, particularly among cognitive scientists. We know it probably doesn’t fit in a spreadsheet, but opinions diverge beyond that. The issue is now almost humorous, with Dan Ariely’s popular quip comparing Big Data to teenage sex, in that “everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”
As scientists, we are quite fond of careful operational definitions. However, Big Data and data science are still-evolving concepts, and are moving targets for formal definition. Definitions tend to vary depending on the field of study. A strict interpretation of Big Data from the computational sciences typically refers to datasets that are so massive and rapidly changing that our current data processing methods are inadequate. Hence, it is a drive for the development of distributed storage platforms and algorithms to analyze datasets that are currently out of reach. The term extends to challenges inherent in data capture, storage, transfer, and predictive analytics. As a loose quantification, data under this interpretation currently become “big” at scales north of the exabyte.
Under this strict interpretation, work with true Big Data is by definition quite rare in the sciences; it consists mostly of developing architectures and algorithms to manage rapidly approaching scale challenges that are still, for the most part, on the horizon (NIST Big Data Working Group, 2014). At this scale, it isn’t clear that there are yet any problems in cognitive science that are true Big Data problems. Perhaps the largest data project in the cognitive and neural sciences is the Human Connectome Project (Van Essen et al., 2012), an ambitious project aiming to construct a network map of anatomical and functional connectivity in the human brain, linked with batteries of behavioral task performance. Currently, the project is approaching a petabyte of data. By comparison, the Large Hadron Collider project at CERN records and stores over 30 petabytes of experimental data each year.
More commonly, the Gartner “3 Vs” definition of Big Data is used across multiple fields: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization” (Laney, 2012). Volume reflects the fact that Big Data records and observes everything within a recording register, in contrast to the sampling methods commonly used in the behavioral sciences. Velocity refers to the characteristic that Big Data is often a real-time stream of rapidly captured data. The final characteristic, variety, denotes that Big Data draws from multiple qualitatively different information sources (text, audio, images, GPS, etc.) and uses joint inference or fusion to answer questions that no single source can answer alone. But far from being expensive to collect, Big Data is usually a natural byproduct of digital interaction.
So while a strict interpretation puts Big Data currently out of reach, by more liberal interpretations it is simultaneously everywhere. Predictive analytics based on machine learning has been hugely successful in many applied settings (see Hu, Wen, & Chua, 2014, for a review). Newer definitions summarize Big Data as being focused more on repurposing naturalistic digital footprints; the size of “big” is relative across different fields (NIST Big Data Working Group, 2014). The NIH BD2K (Big Data to Knowledge) program is explicit that a Big Data approach is best defined by what is large and naturalistic for a specific subfield, not by an absolute value in bytes. In addition, BD2K notes that a core Big Data problem involves joint inference across multiple databases. Such combinatorial problems are clearly Big Data, and they are perfectly suited for theoretically driven cognitive models: many answers to current theoretical and practical questions may be hidden in the complementary relationships between data sources.
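To make the idea of joint inference concrete, the sketch below (hypothetical values, not drawn from the book) fuses two sources that cognitive scientists commonly repurpose: corpus-derived word frequencies and behavioral lexical decision times. Neither table alone can address whether more frequent words are recognized faster; linked on a shared key, they can.

```python
import pandas as pd

# Hypothetical corpus-derived word frequencies (one data source).
corpus_freq = pd.DataFrame({
    "word": ["dog", "cat", "apothecary", "zephyr"],
    "log_freq": [5.1, 4.9, 1.8, 1.2],
})

# Hypothetical lexical decision reaction times (a second, independent source).
lexical_rt = pd.DataFrame({
    "word": ["dog", "cat", "apothecary", "zephyr"],
    "mean_rt_ms": [512, 498, 743, 801],
})

# Joint inference starts by linking the sources on a shared key,
# then asking a question neither source answers alone.
merged = corpus_freq.merge(lexical_rt, on="word")
print(merged)
print("frequency-RT correlation:", merged["log_freq"].corr(merged["mean_rt_ms"]))
```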
What is Big Data to Cognitive Science?
Much of the publicity surrounding Big Data has focused on its insight power for business analytics. Within the cognitive sciences, we have been considerably more skeptical of Big Data’s promise, largely because we place such a high value on explanation over prediction. A core goal of any cognitive scientist is to fully understand the system under investigation, rather than being satisfied with a simple descriptive or predictive theory.
Understanding the mind is what makes an explanatory cognitive model distinct from a statistical predictive model—our parameters often reflect hypothesized cognitive processes or representations (e.g. attention, memory capacity, decision thresholds, etc.) as opposed to the abstract predictive parameters of, say, weights in a regression model. Predictive models are able to make predictions of new data provided they are of the same sort as the data on which the model was trained (e.g. predicting a new point on a forgetting curve). Cognitive models go a step further: An explanatory model should be able to make predictions of how the human will behave in situations and paradigms that are novel and different from the situations on which the model was built but that recruit the same putative mechanism(s) (e.g. explaining the process of forgetting).
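As a concrete illustration of this distinction, the sketch below (hypothetical retention values; the power-law form is one common descriptive choice, not a model endorsed here) fits a forgetting curve purely as a predictive model. The fit can interpolate or extrapolate new points on the same curve, but its parameters make no claim about the memory processes that produced the data, which is precisely what an explanatory cognitive model would add.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law_retention(t, a, b):
    # Descriptive forgetting curve: proportion retained after a delay of t hours,
    # R(t) = a * t**(-b). The parameters are purely predictive quantities here.
    return a * np.power(t, -b)

delays = np.array([1, 2, 4, 8, 24, 48, 96], dtype=float)           # hours since study
retention = np.array([0.81, 0.72, 0.64, 0.55, 0.42, 0.36, 0.29])   # hypothetical data

params, _ = curve_fit(power_law_retention, delays, retention, p0=[0.8, 0.2])
a_hat, b_hat = params

# Predicting a new point on the same curve: a predictive success,
# but not an explanation of why forgetting takes this form.
print(f"fitted a = {a_hat:.2f}, b = {b_hat:.2f}")
print(f"predicted retention at 72 h: {power_law_retention(72.0, a_hat, b_hat):.2f}")
```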
Marcus and Davis (2014) have argued rather convincingly that Big Data is a scientific idea that should be retired. While it is clear that large datasets are useful in discovering correlations and predicting common patterns, more data do not on their own yield explanatory causal relationships. Big Data and machine learning techniques are excellent bedfellows to make predictions with greater fidelity and accuracy. But the match between Big Data and cognitive models is less clear; because most cognitive models strive to explain causal relationships, they may be much better paired with experimental data, which shares the same goal. Marcus and Davis note several ways in which paying attention to Big Data may actually lead the scientist astray, compared to a much smaller amount of data from a well-controlled laboratory scenario.
In addition, popular media headlines are chock-full of statements about how theory is obsolete now that Big Data has arrived. But theory is a simplified model of empirical phenomena—theory explains data. If anything, cognitive theory is more necessary to help us understand Big Data in a principled way given that much of the data were generated by the cognitive systems that we have carefully studied in the laboratory, and cognitive models help us to know where to search and what to search for as the data magnitude grows.
Despite initial skepticism, Big Data has also been embraced by cognitive science as a genuine opportunity to develop and refine cognitive theory (Griffiths, 2015). Criticism of research that uses Big Data in an atheoretic way is a fair critique of how some scientists (and many outside academia) are currently using it. However, there are also scientists making use of large datasets to test theory-driven questions: questions that would be unanswerable without access to large naturalistic datasets and new machine learning approaches. Cognitive scientists are, by training, [experimental] control freaks. But the methods the field uses to achieve laboratory control also serve to distract it from exploring cognitive mechanisms through data mining methods applied to Big Data.
Certainly, Big Data is considerably more information than we typically collect in a laboratory experiment. But it is also naturalistic, a footprint of cognitive mechanisms operating in the wild (see Goldstone & Lupyan, 2016, for a recent survey). There is a genuine concern in the cognitive sciences that many of the models we are developing may be overfit to specific laboratory phenomena that neither exist nor generalize beyond the walls of the lab. The standard cognitive experiment takes place in one hour in a well-controlled setting, with variables that normally covary in the real world held constant. This allows us to determine conclusively that the flow of causation runs from our manipulated variable(s) to the dependent variable, often by testing discrete settings (“factorology”; Balota, Yap, Hutchison, & Cortese, 2012).
It is essential to remember that the cognitive mechanisms we study in the laboratory evolved to handle real information-processing problems in the real world. By “capturing” and studying a mechanism in a controlled environment, we risk discovering experiment- or paradigm-specific strategies that are a response to experimental factors the mechanism did not evolve to handle, in a situation that does not exist in the real world. While deconfounding factors is an essential part of an experiment, the mechanism may well have evolved to thrive in a rich, statistically redundant environment. In this sense, cognitive experiments in the lab may be somewhat analogous to studying captive animals in a zoo and then extrapolating to behavior in the wild...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Contents
  5. Contributors
  6. 1 Developing Cognitive Theory by Mining Large-scale Naturalistic Data
  7. 2 Sequential Bayesian Updating for Big Data
  8. 3 Predicting and Improving Memory Retention: Psychological Theory Matters in the Big Data Era
  9. 4 Tractable Bayesian Teaching
  10. 5 Social Structure Relates to Linguistic Information Density
  11. 6 Music Tagging and Listening: Testing the Memory Cue Hypothesis in a Collaborative Tagging System
  12. 7 Flickr® Distributional Tagspace: Evaluating the Semantic Spaces Emerging from Flickr® Tag Distributions
  13. 8 Large-scale Network Representations of Semantics in the Mental Lexicon
  14. 9 Individual Differences in Semantic Priming Performance: Insights from the Semantic Priming Project
  15. 10 Small Worlds and Big Data: Examining the Simplification Assumption in Cognitive Modeling
  16. 11 Alignment in Web-based Dialogue: Who Aligns, and How Automatic Is It? Studies in Big-Data Computational Psycholinguistics
  17. 12 Attention Economies, Information Crowding, and Language Change
  18. 13 Decision by Sampling: Connecting Preferences to Real-World Regularities
  19. 14 Crunching Big Data with Fingertips: How Typists Tune Their Performance Toward the Statistics of Natural Language
  20. 15 Can Big Data Help Us Understand Human Vision?
  21. Index