Doing Corpus Linguistics

  1. 164 pages
  2. English

About This Book

Doing Corpus Linguistics offers a practical step-by-step introduction to corpus linguistics, making use of widely available corpora and of a register analysis-based theoretical framework to provide students in applied linguistics and TESOL with the understanding and skills necessary to meaningfully analyze corpora and carry out successful corpus-based research. Divided into three parts – Introduction to Doing Corpus Linguistics and Register Analysis; Searches in Available Corpora; and Building Your Own Corpus, Analyzing Your Quantitative Results, and Making Sense of Data – the book emphasizes hands-on experience in performing language analysis research and in interpreting findings in a meaningful and engaging way. Readers are given multiple opportunities to analyze and apply language data by completing smaller tasks and corpus projects using publicly available corpora. The book also takes readers through the process of building a specialized corpus designed to answer a specific research question and provides detailed information on completing a final research project that includes both a written paper and an oral presentation. Doing Corpus Linguistics gives students in applied linguistics and TESOL the opportunity to gain proficiency in the technical and interpretive aspects of corpus research and encourages them to participate in the growing field of corpus linguistics.

Information

Authors
Eniko Csomay, William J. Crawford
Publisher
Routledge
Year
2015
ISBN
9781317688051
Edition
1

Part 1

Introduction to Doing Corpus Linguistics and Register Analysis

Chapter 1

Linguistics, Corpus Linguistics, and Language Variation

  • 1.1 Language and Rules/Systems
  • 1.2 What Is Corpus Linguistics?
  • 1.3 Outline of the Book

1.1 Language and Rules/Systems

While all humans use language to communicate, the ability to describe language is not nearly as advanced as our ability to actually use language. One defining component of the scientific study of language (i.e., linguistics) includes a description of how language works. Native speakers of English are able to produce plural nouns that end in different sounds: we say batS and bagZ, not batZ and bagS. These same speakers can also produce plurals of nonsense words that we have never heard before: we would say bligZ and not bligS. Native speakers of English also know that We worked out the problem and We worked the problem out are both acceptable sentences but We worked it out and We worked out it may not be equally acceptable (the latter is likely to sound strange to many native speakers of English). The fact that we can agree on these aspects of English related to the pronunciation of plurals and word order points to the fact that language, in many respects, is predictable (i.e., systematic). Such aspects are not only related to sounds and the order of words, but they are also related to how we might use language in different contexts and for different purposes. For example, we would not be likely to ask a professor for an extension on an assignment by saying: “Hey, man. Gimme an extension.” Instead, we are more likely to make such a request by saying: “Would you please allow me to hand in that assignment tomorrow? I have been so busy that I completely forgot about it.”
While it may be difficult to explain these particular aspects of the English language, native speakers apply these “rules” of language flawlessly. In other words, one important component of linguistic description is to make implicit “rules” or patterns of language (knowledge we use) explicit (knowledge we can describe). It is safe to say that language users follow rules (and sometimes choose not to follow rules) for specific reasons even though they may not be able to explain the rules themselves. An important part of linguistic study focuses on analyzing language and explaining what may seem on the surface to be a confusing set of facts that may not make much sense.
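As a rough illustration of what making an implicit rule explicit can look like, the plural pattern behind batS, bagZ, and bligZ can be written down as a small function. This sketch is not taken from the book. It assumes the usual simplified description of the rule (an extra syllable after hissing sounds, S after other voiceless sounds, Z elsewhere), and the sound symbols and the function name plural_ending are ad hoc labels chosen for this example.

```python
# Simplified, assumed sound classes using ad hoc ASCII symbols (not IPA).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}   # hissing/hushing sounds
VOICELESS = {"p", "t", "k", "f", "th"}          # voiceless, non-sibilant sounds


def plural_ending(final_sound: str) -> str:
    """Return the plural ending predicted for a noun's final sound."""
    if final_sound in SIBILANTS:
        return "IZ"  # e.g. bus -> busIZ
    if final_sound in VOICELESS:
        return "S"   # e.g. bat -> batS
    return "Z"       # all other (voiced) sounds, e.g. bag -> bagZ


# The final sound is supplied by hand; the sketch does not derive it from spelling.
for word, final_sound in [("bat", "t"), ("bag", "g"), ("blig", "g")]:
    print(word + plural_ending(final_sound))
```

The nonsense word blig is handled the same way as bag because the function looks only at the final sound, which is the sense in which the pattern is systematic.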
When many people think of language rules, they may think of the grammar and spelling rules that they learned in school. Rules such as “don’t end a sentence with a preposition” or “don’t start a sentence with the word and” are rules that many people remember learning in school. Very often people have strong opinions about these types of rules. For example, consider the excerpt below taken from a grammar website on whether or not to follow the grammar rule of “don’t split an infinitive.”
Even if you buy the sales pitch for language being descriptive rather than prescriptive, splitting infinitives is at the very best inelegant and most certainly hound-dog lazy. It is so incredibly simple to avoid doing it with a second or two of thought that one wonders why it is so common. There are two simple solutions.
(1) “The President decided to not attend the caucus” can be fixed as easily as moving the infinitive: “The President decided not to attend the caucus.” I’d argue that works fine, and not using that simple fix is about as hound-dog lazy as a writer can get, but split infinitives can be avoided entirely with just a bit more thought. How about:
(2) “The President has decided he will not attend the caucus.” What the heck is wrong with that?
It’s hound-dog lazy, I say. Where has the sense of pride gone in writers? (https://gerryellenson.wordpress.com/2012/01/02/to-not-split-infinitives/)
Examples such as these are not uncommon. One would only have to look at letters to the editor in newspapers or at blog posts to find many more instances of people who have very strong opinions about the importance of following particular grammar rules.
So far, we have looked at “rules” as doing two different things: 1) describing implicit, naturally occurring language patterns and 2) prescribing specific, socially accepted forms of language. Although both descriptive and prescriptive perspectives make reference to language rules, prescriptive rules attempt to dictate language use while descriptive rules provide judgment-free statements about language patterns. Both prescriptive and descriptive aspects of language are useful. When writing an academic paper or formal letter, certain language conventions are expected. A prescriptive rule can provide useful guidelines for effective communication. However, descriptive approaches can be useful in uncovering patterns of language that are implicit (as in the examples described above). Descriptive approaches can also be used to see how prescriptive rules are followed by language users.
The concept of language rules raises another interesting question: Why are these rules sometimes followed and sometimes “violated”? Consider the prescriptive infinitive rule described above. Is it accurate to say that those who write to not attend are not following a rule? In some respects, this may be the case, but there is another—perhaps somewhat misunderstood—issue related to language that deserves some attention and serves as a basis for this book: the role of language variation. It is an incontrovertible fact that language varies and changes. The type of English used in the British Isles is quite different from the type of English used in the United States. The type of English used in the British Isles or the United States also varies from region to region or among people from different socio-economic classes. The type of English used 150 years ago in the United States is quite different from the type of English used in the United States today. Language even changes and varies in a single person. The study of language variation seeks to gain an understanding of how language changes and varies for different reasons and in different contexts. There are different perspectives on how to investigate and understand language variation. The perspective that we will take is, as you can tell from the title of the book, related to an area of language study called corpus linguistics.

1.2 What Is Corpus Linguistics?

One way to understand linguistic analysis and language is through corpus linguistics, which looks at how language is actually used in certain contexts and how it can vary from context to context. While understanding variation and contextual differences is a goal shared by researchers in other areas of linguistic research, corpus linguistics describes language variation and use by looking at large numbers of texts that have been produced in similar circumstances. The concept of a “circumstance” or “context” or “situation” depends on how each researcher defines it. Corpus linguistic studies have frequently noted the general distinction between two different modes of language production: written language and spoken language. From a written perspective, one may be interested in contexts such as news writing, text messaging or academic writing. From an oral perspective, one may be interested in language such as news reporting, face-to-face conversation or academic lectures. Although text messaging and academic writing are both written, the purpose of text messaging is quite different from the purpose of academic writing and we would expect, therefore, some degree of language variation in these different written contexts. The same may be said of face-to-face conversation and academic lectures; both are spoken but they have different purposes and consequently have different linguistic characteristics. More generally, we might also expect that spoken language (in all of its various purposes and contexts) would likely differ from written forms of language. Spoken language does not generally have the same type of planning and opportunities for revision that we find in many types of written language. We will consider how different circumstances (or situational variables) can affect language use in the following chapter. But, before we do, we would like to briefly describe what we mean by a corpus.
A corpus is a representative collection of language that can be used to make statements about language use. Corpus linguistics is concerned with understanding how people use language in various contexts. A corpus is a collection of a fairly large number of examples (or, in corpus terms, texts) that share similar contextual or situational characteristics. These texts are then analyzed collectively in order to understand how language is used in these different contexts. The result of this analysis is a collection of language patterns that are recurrent in the corpus and either provide an explanation of language use or serve as the basis for further language analysis. One common method used in corpus research is to look at the environment of a particular word or phrase to see what other words are found (i.e., “collocate”) with the reference word. As an example, we will use the Corpus of Contemporary American English (available at http://corpus.byu.edu/coca/), a publicly available collection of over 450 million words of American English, to investigate the use of two words: equal and identical.
In many respects, equal and identical can mean the same thing (two things that are similar to each other), and they are often taken as synonyms of one another. For example, we can use both of these words in a sentence such as: These two students are equal/identical in their performance on the exam with the same general meaning. If we were asked to define the word equal we may use the word identical in our definition (and vice versa). However, if we use a corpus and look at how these words are actually used, a different picture emerges. The Corpus of Contemporary American English (COCA) shows us that, although they may sometimes be synonyms, these two words behave very differently. We are more likely to use expressions such as equal opportunity, equal rights, and equal protection rather than identical opportunity, identical rights, or identical protection. We are not likely to talk about equal twins or equal copies but instead use the phrase identical twins and identical copies. A consideration of the words that follow equal and identical suggests that equal is more likely to modify abstract concepts such as opportunities, rights, and protection while identical is more likely to modify concrete nouns such as twins, items, and houses. Without reference to a large number of texts, we would likely not be able to make such an observation. This is one example of how corpus linguistics can provide information about language use that can help linguists understand how language is actually used in authentic contexts.
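The kind of search described above, looking at the words that immediately follow equal and identical, can be sketched in a few lines of code. This is only a minimal illustration and does not reproduce how the COCA interface works: the function name right_collocates is hypothetical, the tokenizer is deliberately crude, and the three sentences are invented toy data, not corpus results.

```python
import re
from collections import Counter


def right_collocates(texts, node, top_n=5):
    """Count the words that directly follow `node` across a list of texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
        # Punctuation is dropped, so pairs may span sentence boundaries within a text.
        tokens = re.findall(r"[a-z']+", text.lower())
        for current, following in zip(tokens, tokens[1:]):
            if current == node:
                counts[following] += 1
    return counts.most_common(top_n)


# Invented toy sentences for illustration only; these are NOT data from COCA.
toy_texts = [
    "The law guarantees equal rights and equal protection.",
    "They are identical twins with identical copies of the key.",
    "Equal rights matter, and equal opportunity matters too.",
]

print(right_collocates(toy_texts, "equal"))
print(right_collocates(toy_texts, "identical"))
```

With a real corpus, the list passed in would hold many thousands of texts, and the counts would only become informative at that scale.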
Additionally, the corpus can tell us about frequency differences between equal and identical (see Table 1.1). The top five collocates of equal occur between 950 and 405 times in the COCA corpus and the top five collocates of identical occur between 417 and 20 times in the corpus. In other words, we can see that the word equal is more frequent than the word identical because the frequency of collocates shows a large difference between the two words. In fact, the word equal occurs 22,480 times in the corpus, and the word identical occurs 8,080 times.
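The two raw counts above can also be expressed as rates per million words, a common way of reporting corpus frequencies so that figures from corpora of different sizes can be compared. The sketch below is an illustration, not a calculation from the book: the text says only that the corpus holds "over 450 million words", so the corpus size of 450,000,000 is an assumption and the resulting rates are approximate.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Convert a raw frequency into a rate per million words."""
    return raw_count / corpus_size * 1_000_000


# Assumption: the text says only "over 450 million words", so this size is approximate.
CORPUS_SIZE = 450_000_000

# Raw counts as reported in the text.
equal_count = 22_480
identical_count = 8_080

print("equal:", round(per_million(equal_count, CORPUS_SIZE), 1), "per million words")
print("identical:", round(per_million(identical_count, CORPUS_SIZE), 1), "per million words")
print("ratio equal/identical:", round(equal_count / identical_count, 2))
```

Because both words come from the same corpus, the ratio between them does not depend on the assumed corpus size; only the per-million rates do.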
In addition to information on collocation and frequency, a corpus will also allow us to examine the extent to which certain types of prescriptive rules are followed. Let us look at what a corpus might tell us about splitting infinitives. Earlier in this chapter, we saw that this rule can raise the ire of some people—to the point of associating some serious character flaws in those writers who do not follow it. The Corpus of Contemporar...

Table of contents

  1. Cover Page
  2. Half Title Page
  3. Title Page
  4. Copyright Page
  5. Contents
  6. List of Tables
  7. List of Figures
  8. Preface
  9. Acknowledgments
  10. Part 1 Introduction to Doing Corpus Linguistics and Register Analysis
  11. Part 2 Searches in Available Corpora
  12. Part 3 Building Your Own Corpus, Analyzing Your Quantitative Results, and Making Sense of Data
  13. Index