Doing Corpus Linguistics

Name: Doing Corpus Linguistics
ISBN: 9781317688051

Eniko Csomay, William J. Crawford

164 pages
English

About this book

Doing Corpus Linguistics offers a practical step-by-step introduction to corpus linguistics, making use of widely available corpora and of a register analysis-based theoretical framework to provide students in Applied Linguistics and TESOL with the understanding and skills necessary to meaningfully analyze corpora and carry out successful corpus-based research. Divided into three parts – Introduction to Doing Corpus Linguistics and Register Analysis; Searches in Available Corpora; and Building Your Own Corpus, Analyzing Your Quantitative Results, and Making Sense of Data – the book emphasizes hands-on experience with performing language analysis research and in interpreting findings in a meaningful and engaging way. Readers are given multiple opportunities to analyze and apply language data by completing smaller tasks and corpus projects using publicly available corpora. The book also takes readers through the process of building a specialized corpus designed to answer a specific research question and provides detailed information on completing a final research project that includes both a written paper and an oral presentation of their specific research projects. Doing Corpus Linguistics provides students in applied linguistics and TESOL with the opportunity to gain proficiency in the technical and interpretive aspects of corpus research and to encourage them to participate in the growing field of corpus linguistics.

Information

Publisher: Taylor & Francis
Year: 2015
eBook ISBN: 9781317688051
Topic: Languages & Linguistics
Subtopic: Linguistics

Part 1 Introduction to Doing Corpus Linguistics and Register Analysis

Chapter 1 Linguistics, Corpus Linguistics, and Language Variation

1.1 Language and Rules/Systems
1.2 What Is Corpus Linguistics?
1.3 Outline of the Book

1.1 Language and Rules/Systems

While all humans use language to communicate, the ability to describe language is not nearly as advanced as our ability to actually use language. One defining component of the scientific study of language (i.e., linguistics) includes a description of how language works. Native speakers of English are able to produce plural nouns that end in different sounds: we say batS and bagZ, not batZ and bagS. These same speakers can also produce plurals of nonsense words that we have never heard before: we would say bligZ and not bligS. Native speakers of English also know that We worked out the problem and We worked the problem out are both acceptable sentences but We worked it out and We worked out it may not be equally acceptable (the latter is likely to sound strange to many native speakers of English). The fact that we can agree on these aspects of English related to the pronunciation of plurals and word order points to the fact that language, in many respects, is predictable (i.e., systematic). Such aspects are not only related to sounds and the order of words, but they are also related to how we might use language in different contexts and for different purposes. For example, we would not be likely to ask a professor for an extension on an assignment by saying: “Hey, man. Gimme an extension.” Instead, we are more likely to make such a request by saying: “Would you please allow me to hand in that assignment tomorrow? I have been so busy that I completely forgot about it.”

While it may be difficult to explain these particular aspects of the English language, native speakers apply these “rules” of language flawlessly. In other words, one important component of linguistic description is to make implicit “rules” or patterns of language (knowledge we use) explicit (knowledge we can describe). It is safe to say that language users follow rules (and sometimes choose not to follow rules) for specific reasons even though they may not be able to explain the rules themselves. An important part of linguistic study focuses on analyzing language and explaining what may seem on the surface to be a confusing set of facts that may not make much sense.

When many people think of language rules, they may think of the grammar and spelling rules that they learned in school, such as “don’t end a sentence with a preposition” or “don’t start a sentence with the word and.” Very often people have strong opinions about these types of rules. For example, consider the excerpt below taken from a grammar website on whether or not to follow the grammar rule of “don’t split an infinitive.”

Even if you buy the sales pitch for language being descriptive rather than prescriptive, splitting infinitives is at the very best inelegant and most certainly hound-dog lazy. It is so incredibly simple to avoid doing it with a second or two of thought that one wonders why it is so common. There are two simple solutions.

(1) “The President decided to not attend the caucus” can be fixed as easily as moving the infinitive: “The President decided not to attend the caucus.” I’d argue that works fine, and not using that simple fix is about as hound-dog lazy as a writer can get, but split infinitives can be avoided entirely with just a bit more thought. How about:

(2) “The President has decided he will not attend the caucus.” What the heck is wrong with that?

It’s hound-dog lazy, I say. Where has the sense of pride gone in writers? (https://gerryellenson.wordpress.com/2012/01/02/to-not-split-infinitives/)

Examples such as these are not uncommon. One would only have to look at letters to the editor in newspapers or at blog posts to find many more instances of people who have very strong opinions about the importance of following particular grammar rules.

So far, we have looked at “rules” as doing two different things: 1) describing implicit, naturally occurring language patterns and 2) prescribing specific, socially accepted forms of language. Although both descriptive and prescriptive perspectives make reference to language rules, prescriptive rules attempt to dictate language use while descriptive rules provide judgment-free statements about language patterns. Both prescriptive and descriptive aspects of language are useful. When writing an academic paper or formal letter, certain language conventions are expected. A prescriptive rule can provide useful guidelines for effective communication. However, descriptive approaches can be useful in uncovering patterns of language that are implicit (as in the examples described above). Descriptive approaches can also be used to see how prescriptive rules are followed by language users.

The concept of language rules raises another interesting question: Why are these rules sometimes followed and sometimes “violated”? Consider the prescriptive infinitive rule described above. Is it accurate to say that those who write to not attend are not following a rule? In some respects, this may be the case, but there is another—perhaps somewhat misunderstood—issue related to language that deserves some attention and serves as a basis for this book: the role of language variation. It is an incontrovertible fact that language varies and changes. The type of English used in the British Isles is quite different from the type of English used in the United States. The type of English used in the British Isles or the United States also varies from region to region or among people from different socio-economic classes. The type of English used 150 years ago in the United States is quite different from the type of English used in the United States today. Language even changes and varies in a single person. The study of language variation seeks to gain an understanding of how language changes and varies for different reasons and in different contexts. There are different perspectives on how to investigate and understand language variation. The perspective that we will take is, as you can tell from the title of the book, related to an area of language study called corpus linguistics.

1.2 What Is Corpus Linguistics?

One way to understand linguistic analysis and language is through corpus linguistics, which looks at how language is actually used in certain contexts and how it can vary from context to context. While understanding variation and contextual differences is a goal shared by researchers in other areas of linguistic research, corpus linguistics describes language variation and use by looking at large amounts of texts that have been produced in similar circumstances. The concept of a “circumstance” or “context” or “situation” depends on how each researcher defines it. Corpus linguistic studies have frequently noted the general distinction between two different modes of language production—written language and spoken language. From a written perspective, one may be interested in contexts such as news writing, text messaging or academic writing. From an oral perspective, one may be interested in language such as news reporting, face-to-face conversation or academic lectures. Although text messaging and academic writing are both written, the purpose of text messaging is quite different from the purpose of academic writing and we would expect, therefore, some degree of language variation in these different written contexts. The same may be said with face-to-face conversation and academic lectures; both are spoken but they have different purposes and consequently have different linguistic characteristics. More generally, we might also expect that spoken language (in all of its various purposes and contexts) would likely differ from written forms of language. Spoken language does not generally have the same type of planning and opportunities for revision that we find in many types of written language. We will consider how different circumstances (or situational variables) can affect language use in the following chapter. But, before we do, we would like to briefly describe what we mean by a corpus.

A corpus is a representative collection of language that can be used to make statements about language use. Corpus linguistics is concerned with understanding how people use language in various contexts. A corpus is a collection of a fairly large number of examples (or, in corpus terms, texts) that share similar contextual or situational characteristics. These texts are then analyzed collectively in order to understand how language is used in these different contexts. The result of this analysis is a collection of language patterns that are recurrent in the corpus and either provide an explanation of language use or serve as the basis for further language analysis. One common method used in corpus research is to look at the environment of a particular word or phrase to see what other words are found (i.e., “collocate”) with the reference word. As an example, we will use the Corpus of Contemporary American English (available at http://corpus.byu.edu/coca/), a publicly available collection of over 450 million words of American English, to investigate the use of two words: equal and identical.

In many respects, equal and identical can mean the same thing (two things that are similar to each other), and they are often taken as synonyms of one another. For example, we can use both of these words in a sentence such as: These two students are equal/identical in their performance on the exam with the same general meaning. If we were asked to define the word equal we may use the word identical in our definition (and vice versa). However, if we use a corpus and look at how these words are actually used, a different picture emerges. The Corpus of Contemporary American English (COCA) shows us that, although they may sometimes be synonyms, these two words behave very differently. We are more likely to use expressions such as equal opportunity, equal rights, and equal protection rather than identical opportunity, identical rights, or identical protection. We are not likely to talk about equal twins or equal copies but instead use the phrase identical twins and identical copies. A consideration of the words that follow equal and identical suggests that equal is more likely to modify abstract concepts such as opportunities, rights, and protection while identical is more likely to modify concrete nouns such as twins, items, and houses. Without reference to large amounts of texts, we would likely not be able to make such an observation. This is one example of how corpus linguistics can provide information about language use that can help linguists understand how language is actually used in authentic contexts.
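To make the idea of looking at the words that follow a reference word more concrete, the short Python sketch below counts right-hand collocates in a handful of sentences. It is only an illustration under stated assumptions: the function name right_collocates and the four toy sentences are invented for this example and are not drawn from COCA, and the COCA web interface does this kind of counting for you over a far larger collection.

```python
import re
from collections import Counter


def right_collocates(texts, node, top_n=5):
    """Count the words that immediately follow `node` in a list of texts."""
    counts = Counter()
    for text in texts:
        # Very simple tokenizer (an assumption): lowercase runs of letters.
        tokens = re.findall(r"[a-z']+", text.lower())
        for current, following in zip(tokens, tokens[1:]):
            if current == node:
                counts[following] += 1
    return counts.most_common(top_n)


# Invented toy sentences for illustration only; NOT data from COCA.
toy_corpus = [
    "The law guarantees equal rights and equal protection.",
    "Equal rights matter as much as equal opportunity.",
    "The identical twins bought identical houses.",
    "She kept identical copies for the identical twins.",
]

equal_collocates = right_collocates(toy_corpus, "equal")
identical_collocates = right_collocates(toy_corpus, "identical")

print("equal:", equal_collocates)
print("identical:", identical_collocates)
```

Even in this tiny sample, the two words attract different sets of nouns. A real study would need a large, representative corpus before drawing any conclusion about usage.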

Additionally, the corpus can tell us about frequency differences between equal and identical (see Table 1.1). The top five collocates of equal occur between 950 and 405 times in the COCA corpus and the top five collocates of identical occur between 417 and 20 times in the corpus. In other words, we can see that the word equal is more frequent than the word identical because the frequency of collocates shows a large difference between the two words. In fact, the word equal occurs 22,480 times in the corpus, and the word identical occurs 8,080 times.
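Raw counts such as these are often converted to a rate per million words so that they can be compared across corpora of different sizes. The sketch below works through that arithmetic for the two totals given above. It assumes a corpus size of exactly 450 million words, which is a rounding of "over 450 million," so the resulting rates are approximate; the name per_million is chosen for this example.

```python
# Assumption: corpus size rounded to 450 million words ("over 450 million").
CORPUS_SIZE = 450_000_000

# Raw frequencies reported in the text for the two words.
raw_counts = {"equal": 22_480, "identical": 8_080}


def per_million(count, corpus_size=CORPUS_SIZE):
    """Convert a raw count to a frequency per one million words."""
    return count / corpus_size * 1_000_000


rates = {word: per_million(count) for word, count in raw_counts.items()}
ratio = raw_counts["equal"] / raw_counts["identical"]

for word, rate in rates.items():
    print(f"{word}: about {rate:.1f} occurrences per million words")
print(f"equal is about {ratio:.1f} times as frequent as identical")
```

Under that assumption, equal occurs roughly 50 times per million words and identical roughly 18 times per million, so equal is a little under three times as frequent.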

In addition to information on collocation and frequency, a corpus will also allow us to examine the extent to which certain types of prescriptive rules are followed. Let us look at what a corpus might tell us about splitting infinitives. Earlier in this chapter, we saw that this rule can raise the ire of some people, to the point of attributing some serious character flaws to those writers who do not follow it. The Corpus of Contemporar...

Cover Page
Half Title Page
Title Page
Copyright Page
Contents
List of Tables
List of Figures
Preface
Acknowledgments
Part 1 Introduction to Doing Corpus Linguistics and Register Analysis
Part 2 Searches in Available Corpora
Part 3 Building Your Own Corpus, Analyzing Your Quantitative Results, and Making Sense of Data
Index
