Part 1
Introduction to Doing Corpus Linguistics and Register Analysis
Chapter 1
Linguistics, Corpus Linguistics, and Language Variation
- 1.1 Language and Rules/Systems
- 1.2 What Is Corpus Linguistics?
- 1.3 Outline of the Book
1.1 Language and Rules/Systems
While all humans use language to communicate, the ability to describe language is not nearly as advanced as our ability to actually use language. One defining component of the scientific study of language (i.e., linguistics) is a description of how language works. Native speakers of English are able to produce plural nouns that end in different sounds: we say batS and bagZ, not batZ and bagS. These same speakers can also produce plurals of nonsense words that they have never heard before: we would say bligZ and not bligS. Native speakers of English also know that We worked out the problem and We worked the problem out are both acceptable sentences, but We worked it out and We worked out it may not be equally acceptable (the latter is likely to sound strange to many native speakers of English). The fact that we can agree on these aspects of English, the pronunciation of plurals and word order, points to the fact that language, in many respects, is predictable (i.e., systematic). Such aspects are not only related to sounds and the order of words; they are also related to how we use language in different contexts and for different purposes. For example, we would not be likely to ask a professor for an extension on an assignment by saying: "Hey, man. Gimme an extension." Instead, we are more likely to make such a request by saying: "Would you please allow me to hand in that assignment tomorrow? I have been so busy that I completely forgot about it."
While it may be difficult to explain these particular aspects of the English language, native speakers apply these "rules" of language flawlessly. Thus, one important component of linguistic description is to make implicit "rules" or patterns of language (knowledge we use) explicit (knowledge we can describe). It is safe to say that language users follow rules (and sometimes choose not to follow rules) for specific reasons, even though they may not be able to explain the rules themselves. An important part of linguistic study focuses on analyzing language and explaining what may, on the surface, seem to be a confusing set of facts.
When many people think of language rules, they may think of the grammar and spelling rules that they learned in school. Rules such as "don't end a sentence with a preposition" or "don't start a sentence with the word and" are ones that many people remember learning in school. Very often, people have strong opinions about these types of rules. For example, consider the excerpt below, taken from a grammar website, on whether or not to follow the grammar rule "don't split an infinitive."
Even if you buy the sales pitch for language being descriptive rather than prescriptive, splitting infinitives is at the very best inelegant and most certainly hound-dog lazy. It is so incredibly simple to avoid doing it with a second or two of thought that one wonders why it is so common. There are two simple solutions.
(1) "The President decided to not attend the caucus" can be fixed as easily as moving the infinitive: "The President decided not to attend the caucus." I'd argue that works fine, and not using that simple fix is about as hound-dog lazy as a writer can get, but split infinitives can be avoided entirely with just a bit more thought. How about:
(2) "The President has decided he will not attend the caucus." What the heck is wrong with that?
It's hound-dog lazy, I say. Where has the sense of pride gone in writers? (https://gerryellenson.wordpress.com/2012/01/02/to-not-split-infinitives/)
Examples such as these are not uncommon. One would only have to look at letters to the editor in newspapers or at blog posts to find many more instances of people who have very strong opinions about the importance of following particular grammar rules.
So far, we have looked at "rules" as doing two different things: 1) describing implicit, naturally occurring language patterns and 2) prescribing specific, socially accepted forms of language. Although both descriptive and prescriptive perspectives refer to language rules, prescriptive rules attempt to dictate language use, while descriptive rules provide judgment-free statements about language patterns. Both prescriptive and descriptive aspects of language are useful. When we write an academic paper or a formal letter, certain language conventions are expected, and a prescriptive rule can provide useful guidelines for effective communication. Descriptive approaches, however, can be useful in uncovering patterns of language that are implicit (as in the examples described above). Descriptive approaches can also be used to see whether, and to what extent, language users actually follow prescriptive rules.
The concept of language rules raises another interesting question: Why are these rules sometimes followed and sometimes "violated"? Consider the prescriptive infinitive rule described above. Is it accurate to say that those who write to not attend are not following a rule? In some respects, this may be the case, but there is another, perhaps somewhat misunderstood, issue related to language that deserves some attention and serves as a basis for this book: the role of language variation. It is an incontrovertible fact that language varies and changes. The type of English used in the British Isles is quite different from the type of English used in the United States. The type of English used in the British Isles or the United States also varies from region to region and among people from different socio-economic classes. The type of English used in the United States 150 years ago is quite different from the type of English used there today. Language even changes and varies within a single person. The study of language variation seeks to understand how and why language varies and changes in different contexts. There are different perspectives on how to investigate and understand language variation. The perspective that we will take is, as you can tell from the title of the book, related to an area of language study called corpus linguistics.
1.2 What Is Corpus Linguistics?
One way to understand linguistic analysis and language is through corpus linguistics, which looks at how language is actually used in certain contexts and how it can vary from context to context. While understanding variation and contextual differences is a goal shared by researchers in other areas of linguistics, corpus linguistics describes language variation and use by looking at large numbers of texts that have been produced in similar circumstances. The concept of a "circumstance," "context," or "situation" depends on how each researcher defines it. Corpus linguistic studies have frequently noted the general distinction between two different modes of language production: written language and spoken language. From a written perspective, one may be interested in contexts such as news writing, text messaging, or academic writing. From a spoken perspective, one may be interested in contexts such as news reporting, face-to-face conversation, or academic lectures. Although text messaging and academic writing are both written, the purpose of text messaging is quite different from the purpose of academic writing, and we would therefore expect some degree of language variation between these two written contexts. The same may be said of face-to-face conversation and academic lectures: both are spoken, but they have different purposes and consequently different linguistic characteristics. More generally, we might also expect spoken language (in all of its various purposes and contexts) to differ from written language. Spoken language does not generally have the same type of planning and opportunities for revision that we find in many types of written language. We will consider how different circumstances (or situational variables) can affect language use in the following chapter. But, before we do, we would like to briefly describe what we mean by a corpus.
A corpus is a representative collection of language that can be used to make statements about language use. More specifically, a corpus is a collection of a fairly large number of examples (or, in corpus terms, texts) that share similar contextual or situational characteristics, and corpus linguistics is concerned with understanding how people use language in these various contexts. The texts are analyzed collectively in order to understand how language is used in a given context. The result of this analysis is a set of language patterns that recur in the corpus and that either provide an explanation of language use or serve as the basis for further language analysis. One common method used in corpus research is to look at the environment of a particular word or phrase to see which other words occur with it (i.e., "collocate" with the reference word). As an example, we will use the Corpus of Contemporary American English (available at http://corpus.byu.edu/coca/), a publicly available collection of over 450 million words of American English, to investigate the use of two words: equal and identical.
In many respects, equal and identical can mean the same thing (two things that are similar to each other), and they are often taken as synonyms of one another. For example, we can use either word in a sentence such as These two students are equal/identical in their performance on the exam with much the same meaning. If we were asked to define the word equal, we might use the word identical in our definition (and vice versa). However, if we use a corpus to look at how these words are actually used, a different picture emerges. The Corpus of Contemporary American English (COCA) shows us that, although they may sometimes be synonyms, these two words behave very differently. We are more likely to use expressions such as equal opportunity, equal rights, and equal protection than identical opportunity, identical rights, or identical protection. Conversely, we are not likely to talk about equal twins or equal copies; instead, we use the phrases identical twins and identical copies. A consideration of the words that follow equal and identical suggests that equal is more likely to modify abstract concepts such as opportunities, rights, and protection, while identical is more likely to modify concrete nouns such as twins, items, and houses. Without reference to large amounts of text, we would likely not be able to make such an observation. This is one example of how corpus linguistics can help linguists understand how language is actually used in authentic contexts.
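The basic idea behind this kind of collocate search can be sketched in a few lines of Python. The sketch below is a toy illustration under stated assumptions, not a description of how COCA's own interface works: the function name right_collocates, the simple regular-expression tokenizer, and the four sample sentences are all hypothetical, and the resulting counts say nothing about COCA itself. It simply counts the words that appear within a given number of words (the span) to the right of a node word such as equal or identical.

```python
import re
from collections import Counter


def right_collocates(texts, node, span=1):
    """Count words occurring up to `span` tokens to the right of `node`.

    Hypothetical helper for illustration only: tokenization is a simple
    lowercase regex, and no lemmatization or association statistics are used.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                # Collect the next `span` tokens (fewer near the end of a text).
                counts.update(tokens[i + 1 : i + 1 + span])
    return counts


# Hypothetical mini-corpus written for this sketch; not drawn from COCA.
sample = [
    "The law guarantees equal rights and equal protection.",
    "Everyone deserves equal opportunity at work.",
    "The identical twins bought identical houses.",
    "She printed two identical copies of the report.",
]

print(right_collocates(sample, "equal").most_common())
print(right_collocates(sample, "identical").most_common())
```

With this sample, equal is followed by rights, protection, and opportunity, while identical is followed by twins, houses, and copies. Real corpus tools typically also look to the left of the node word, group word forms into lemmas, and rank collocates with association measures rather than raw counts; the sketch leaves these out to keep the core idea visible.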
Additionally, the corpus can tell us about frequency differences between equal and identical (see Table 1.1). The top five collocates of equal each occur between 405 and 950 times in COCA, whereas the top five collocates of identical each occur between 20 and 417 times. These collocate frequencies already suggest that equal is the more frequent word, and the overall counts confirm it: the word equal occurs 22,480 times in the corpus, and the word identical occurs 8,080 times.
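As a quick check on the size of that difference, the two overall counts reported above can be compared directly as a ratio:

$$\frac{f(\textit{equal})}{f(\textit{identical})} = \frac{22{,}480}{8{,}080} \approx 2.78$$

In other words, equal occurs roughly 2.8 times as often as identical in COCA. A direct ratio of raw counts is appropriate here only because both counts come from the same corpus; comparing frequencies across corpora of different sizes would first require normalizing them, for example to occurrences per million words.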
In addition to information on collocation and frequency, a corpus also allows us to examine the extent to which certain types of prescriptive rules are followed. Let us look at what a corpus might tell us about splitting infinitives. Earlier in this chapter, we saw that this rule can raise the ire of some people, to the point of attributing serious character flaws to those writers who do not follow it. The Corpus of Contemporar...