Applying Language Technology in Humanities Research

Design, Application, and the Underlying Logic

About This Book

This book presents established and state-of-the-art methods in Language Technology (including text mining, corpus linguistics, computational linguistics, and natural language processing), and demonstrates how they can be applied by humanities scholars working with textual data. The landscape of humanities research has recently changed thanks to the proliferation of big data and large textual collections such as Google Books, Early English Books Online, and Project Gutenberg. These resources have yet to be fully explored by new generations of scholars, and the authors argue that Language Technology has a key role to play in the exploration of large-scale textual data. The authors use a series of illustrative examples from various humanistic disciplines (mainly but not exclusively from History, Classics, and Literary Studies) to demonstrate basic and more complex use-case scenarios. This book will be useful to graduate students and researchers in humanistic disciplines working with textual data, including History, Modern Languages, Literary Studies, Classics, and Linguistics. It is also a very useful book for anyone teaching or learning Digital Humanities who is interested in the basic concepts of computational linguistics, corpus linguistics, and natural language processing.

© The Author(s) 2020
B. McGillivray, G. M. Tóth, Applying Language Technology in Humanities Research, https://doi.org/10.1007/978-3-030-46493-6_1

1. Introducing Language Technology and Humanities

Barbara McGillivray (1, 3; corresponding author) and Gábor Mihály Tóth (2)
(1) Faculty of Modern and Medieval Languages, University of Cambridge, Cambridge, UK
(2) Viterbi School of Engineering, Signal Analysis Lab (SAIL), University of Southern California, Los Angeles, CA, USA
(3) The Alan Turing Institute, London, UK

Abstract

This chapter outlines the relevance of language technology for the exploration and study of big textual data sets in the humanities. We also discuss the importance of understanding the logic underlying the use of language technology to resolve research problems in the humanities. Finally, we outline the three pillars of the approach we follow throughout the book: focus on application through both simplified and more complex use-case examples; discussion of both the potential and the limitations of language technology; and explanation of how to translate humanities research questions into research problems using language technology.
Keywords
Big data, Distant reading, Textual resource, Language technology, Humanities research

1.1 Why Language Technology for the Humanities?

In the last two decades, the humanities have seen an unprecedented change that opens up new directions for the study of human cultures and their histories: the availability of digitized humanistic texts, which has yet to be fully explored. Thanks to the mass digitization of analogue resources preserved in libraries and archives, large textual collections, such as Google Books, Early English Books Online, and Project Gutenberg, have become available on the World Wide Web. The rise of digital humanities as a new academic field has contributed to the proliferation of research infrastructures and centres dedicated to the study and distribution of textual resources in the humanities. The mission of digital humanities initiatives such as the CLARIN European Research Infrastructure, DARIAH, and the ESRC Centre for Corpus Approaches to Social Science is to make textual resources not only available but also investigable for scholars. Digital humanists have proposed the method of distant reading or macro analysis for learning from large textual resources (Jockers 2013; Moretti 2015). Alongside a growing interest in large textual resources, there is an increasing demand from (digital) humanities researchers for quantitative and computational skills. The current offering in this space is rich, with a range of training options (including dedicated summer schools such as the digital humanities training events at Oxford, DHSI at Victoria, and the European Summer School in Digital Humanities in Leipzig) and publications (examples include Bird et al. 2009; Gries 2009; Hockey 2000; Jockers 2014; Piotrowski 2012). Nonetheless, textual resources in the humanities and beyond raise a key challenge: they are too big to be read by the humans interested in analysing them. The potential of exploring large textual collections has not yet been fully realized; realizing it remains a key task for the current and next generations of humanities scholars.
To explore tens of thousands of books or millions of historical documents, humanities scholars inevitably need the power of computing technologies. Among these technologies, one has had, and will continue to have, a pivotal role in the exploration of big textual resources: language technology. Language technology, which can help unlock and investigate large amounts of textual data, is a truly interdisciplinary enterprise. It is not an academic field per se; rather, it is a collection of methods that deal with textual data. Language technology sits at the crossroads between corpus and computational linguistics, natural language processing and text mining, data science and data visualization. As we will demonstrate throughout this book, language technology can be used to address a great variety of research problems involved in the investigation of textual data in the humanities and beyond.

1.2 Structure of the Book

This book examines research problems that are relevant to the humanities and can be addressed with the help of language technology. Chapter 2 demonstrates how language technology can help structure raw textual data and represent them as a resource meaningful for both humans and computers. For instance, the lyrics of thousands of popular songs are now available in plain text on the World Wide Web. But lyrics in plain text format do not distinguish the title and the refrain of a song. This is an example of unstructured data, because the various components of a song are not marked in a way that allows computers to extract them automatically. Language technology can help detect structural components within a text, such as the refrain of a song; it can also help represent a song in digital form so that different structural components are distinguished and readily available for further computational investigations. Language technology also supports word-level investigations of textuality. The lyrics of a song consist not only of structural units, but also of different types of words such as nouns, verbs, and names of people. In plain text format, word-level information about lyrics is not readily usable by computing tools; for instance, it is not possible to extract all proper names from a collection of lyrics in plain text. As Chapter 2 explains, language technology helps attach different types of information to each word of a text; it also offers ways to record this information in well-established data formats.
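The step from unstructured to structured lyrics can be illustrated with a minimal Python sketch. It is not the procedure described in the book: it rests on three simplifying assumptions of our own, namely that the first line is the title, that stanzas are separated by blank lines, and that the refrain is the stanza repeated most often. The lyrics and the function name `structure_lyrics` are invented for illustration.

```python
import json
import re
from collections import Counter

# Invented lyrics, used for illustration only.
RAW_LYRICS = """Morning Song

The sun comes up on the hill
The birds begin to sing

Sing along, sing along
All day long

The river runs to the sea
The wind goes where it will

Sing along, sing along
All day long
"""


def structure_lyrics(raw: str) -> dict:
    """Turn plain-text lyrics into a record with a title and typed stanzas.

    Assumptions (ours, not the book's): first line = title, blank lines
    separate stanzas, the most frequently repeated stanza is the refrain.
    """
    lines = raw.strip().splitlines()
    if not lines:
        return {"title": "", "stanzas": []}
    title = lines[0].strip()
    body = "\n".join(lines[1:])
    # Split the body wherever one or more blank lines occur.
    stanzas = [s.strip() for s in re.split(r"\n\s*\n", body) if s.strip()]
    # A stanza counts as the refrain only if it occurs more than once.
    counts = Counter(s.lower() for s in stanzas)
    top, n = counts.most_common(1)[0] if counts else ("", 0)
    refrain = top if n > 1 else None
    return {
        "title": title,
        "stanzas": [
            {"type": "refrain" if s.lower() == refrain else "verse", "text": s}
            for s in stanzas
        ],
    }


song = structure_lyrics(RAW_LYRICS)
# JSON is one possible structured format; others would serve equally well.
print(json.dumps(song, indent=2))
```

Once the title and refrain are marked explicitly, a program can retrieve them directly, which the plain-text version does not allow. Real lyrics would need more robust heuristics, for example to handle refrains that vary slightly between repetitions.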
Language technology also facilitates the bottom-up exploration of textual resources and textuality. For instance, finding terms that are significant elements of a text is an important component of bottom-up explorations. We will discuss how the investigation of word frequency can support this in Chapter 3. Language technology methods can map terms closely related to a given concept in thousands of texts. This form of bottom-up exploration is discussed in Chapter 4. Language technology methods can also help in bottom-up studies of word meaning. For instance, the meaning of a concept can be investigated by drawing on a dictionary definition, but it can also be inferred from the way authors used that concept in their works. Chapter 5 examines how language technology enables this type of exploration of meaning. Finally, language technology has tools to detect patterns recurring over thousands of texts. As the proverb says, there is nothing new under the sun. Similar themes and ideas recur over texts from different historical times. However, detecting them in large textual resources is a tedious (or sometimes impossible) task for human readers. As Chapter 6 illustrates, language technology supports humans in their efforts to detect recurrence and similarity in texts.
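Two of the ideas above, word frequency and similarity between texts, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the method the later chapters present: words are lower-cased runs of letters, and similarity is measured as the cosine between raw word-count vectors. The example sentences and the function names are invented.

```python
import math
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs (lower-cased, letters only)."""
    # [^\W\d_]+ matches runs of Unicode letters; a real project would
    # probably use a proper tokenizer and lemmatizer instead.
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two word-count vectors: 0.0 (no shared words) to 1.0."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Invented example sentences, not taken from any corpus.
texts = {
    "a": "There is nothing new under the sun.",
    "b": "Nothing under the sun is new.",
    "c": "The harvest failed in the northern provinces.",
}
freqs = {name: word_frequencies(t) for name, t in texts.items()}

print(freqs["c"].most_common(1))  # most frequent word in text c
print(cosine_similarity(freqs["a"], freqs["b"]))  # near-paraphrases
print(cosine_similarity(freqs["a"], freqs["c"]))  # unrelated content
```

Texts a and b share almost all their words and therefore score much higher than a and c, whose only shared word is "the". Raw counts are dominated by such function words, so a more careful sketch would weight or filter them before comparing texts.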
To realize the rich potential that language technology offers, humanists need to bridge two interrelated gaps. The first is the conceptual gap between humanities research problems and language technology methods. As a simple example, language technology can detect how many times a given term is used in a given set of historical sources. In more technical terms, with language technology we can study word frequency. But historians rarely ask how many times a term occurs in their source texts. Rather, they inquire about the prevailing social concepts in a given historical time. There is a conceptual gap between word frequency and the prevailing social concepts. This simple example also sheds light upon the second gap, which lies between qualitative and quantitative approaches. The insights that language technology can deliver are very often quantitative and difficult to interpret within a qualitative framework. Bridging these gaps is a daunting task for scholars, and this publication seeks to assist them in this task. We believe that the potential of language technology can be realized if there is a clear understanding of the logic underlying it. The overall goal of this book is therefore to apply the logic of language technology to the resolution of humanistic research problems. We will attempt to convey this logic by following a didactic approach with three pillars.
First, we guide you through various research procedures involved in the application of language technology. Chapter 2 looks at the design of language resources, the first step in the application of language technology. The following chapters study specific humanities-related research problems and show how to design quantitative research procedures to address them. We believe that an understanding of how to design a research process in language technology is one of the key steps to understanding its overall logic. We do not, however, explain the technical implementation of the research procedures discussed throughout the book. Thanks to the development of computing tools in popular programming languages, such as Python and R, many of the technological procedures presented here have been (at least partially) automated, and their implementation can be learnt by following excellent on-line tutorials and manuals. But what is difficult to learn from on-line resources is ...

Table of contents

  1. Introducing Language Technology and Humanities
  2. Design of Text Resources and Tools
  3. Frequency
  4. Collocation
  5. Word Meaning in Texts
  6. Mining Textual Collections
  7. The Innovative Potential of Language Technology for the Humanities