- 202 pages
- English
About This Book
Corpus Linguistics for Grammar provides an accessible and practical introduction to the use of corpus linguistics to analyse grammar, demonstrating the wider application of corpus data and providing readers with all the skills and information they need to carry out their own corpus-based research.
This book:
- explores the kinds of corpora available and the tools which can be used to analyse them;
- looks at specific ways in which features of grammar can be explored using a corpus through analysis of areas such as frequency and colligation;
- contains exercises, worked examples and suggestions for further practice with each chapter;
- provides three illustrative examples of potential research projects in the areas of English Literature, TESOL and English Language.
Corpus Linguistics for Grammar is essential reading for students undertaking corpus-based research into grammar, or studying within the areas of English Language, Literature, Applied Linguistics and TESOL.
Corpus Linguistics for Grammar by Christian Jones and Daniel Waller is available in PDF and/or ePUB format.
Part 1 Defining grammar and using corpora
DOI: 10.4324/9781315713779-2
Chapter 1 What is a corpus? What can a corpus tell us?
DOI: 10.4324/9781315713779-3
1.1 Introduction
Suppose you had an argument with a friend as to whether Sherlock Holmes ever said ‘elementary, my dear Watson’ (he didn’t!) and you wanted to prove your case; how would you go about doing it? One option would be to read all of the novels and short stories, but presumably you would have to get your friend to do the same to verify the truth of what you find. The other would be to turn to an electronic database that contained all of the Holmes stories and then search for the phrase. Essentially, this is how a corpus can help you.
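The search described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular corpus tool works: the function name `find_phrase` and the two sample texts are invented for the example, and a real search would load the full stories instead.

```python
import re

def find_phrase(texts, phrase, context=30):
    """Return (title, snippet) pairs for every occurrence of `phrase`.

    Matching ignores case and allows any whitespace (including line
    breaks) between the words of the phrase.
    """
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    hits = []
    for title, text in texts.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append((title, text[start:end]))
    return hits

# Invented placeholder texts (assumption): stand-ins for the real stories.
texts = {
    "sample_a": "This invented sentence says the case was elementary.",
    "sample_b": "This invented sentence addresses my dear\nWatson across a line break.",
}

# The full phrase does not occur in either sample, so the list is empty.
print(find_phrase(texts, "elementary, my dear Watson"))

# A shorter phrase is found once, in sample_b.
for title, snippet in find_phrase(texts, "my dear watson"):
    print(title, repr(snippet))
```

An empty result is only as convincing as the collection searched: the phrase can be said never to occur only if every story is included.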
This chapter will explain what a corpus is and why we may wish to consult one when trying to analyse grammatical and lexico-grammatical patterns. We will demonstrate what different types of corpora exist, including examples of various spoken and written corpora with different designs. We will then move on to an explanation of what information a corpus can provide us with and why we might want to use one to analyse areas such as frequency or grammatical patterns, to provide robust evidence of language in use. We will also examine how corpora have been used within the development of corpus-informed dictionaries and grammars. All the samples we use will be taken from open-access corpora (corpora on the internet that are free to access). By using resources that anyone can access, we aim to encourage the reader to look at these corpora for themselves.
1.2 What is a corpus?
A corpus is simply an electronically stored, searchable collection of texts. These texts may be written or spoken and may vary in length but generally they will be longer than a single speaking turn or single written sentence. They are normally measured in terms of the number of words they contain or to use a word common in most corpora, the number of tokens. Consider an analysis of the sentence above:
They are normally measured in terms of the number of words they contain or to use a word common in most corpora, the number of tokens.
This sentence has a total of twenty-six tokens in it.
We can also measure a corpus by the number of different word types it may contain, i.e. how many adjectives, how many verbs, etc. If we look at the sentence above, we can see how many different types there are in the sentence.
- Pronouns: they × 2
- Verbs: are, measured, contain, use
- Nouns: terms, number × 2, words, word, corpora, tokens
- Adjectives: common
- Adverbs: normally
- Determiners: the × 2, a, most
- Prepositions: in × 2, of × 3, to
- Conjunction: or
Therefore there are twenty different types in the text.
Types and tokens can also be compared by dividing the number of types by the number of tokens, giving us a type-token ratio. In this case that is 20 divided by 26, multiplied by 100, which gives a type-token ratio of approximately 77%. Obviously, in this example we have used only one sentence, a sample far smaller than most researchers would use. When looking at a corpus, the type-token ratio simply allows a researcher to see how varied a collection of texts is; in general, the more types there are in comparison to the number of tokens, the more lexically varied the text.
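The counts for the example sentence can be checked with a short Python sketch. It rests on two assumptions that are ours rather than the book's: a token is any run of letters (punctuation is ignored), and a type is a distinct word form once the text is lower-cased. The function names `tokenize` and `type_token_ratio` are invented for the illustration.

```python
import re

def tokenize(text):
    """Split text into lower-cased word tokens, ignoring punctuation."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def type_token_ratio(text):
    """Return (number of types, number of tokens, ratio as a percentage)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    types = set(tokens)  # each distinct word form counts once
    return len(types), len(tokens), 100 * len(types) / len(tokens)

sentence = (
    "They are normally measured in terms of the number of words they "
    "contain or to use a word common in most corpora, the number of tokens."
)

n_types, n_tokens, ratio = type_token_ratio(sentence)
print(n_types, n_tokens, round(ratio, 1))  # prints: 20 26 76.9
```

Different tokenisation choices (for example, how contractions or hyphenated words are split) give different counts, so ratios are only comparable when the same rules are applied to every text.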
Corpora vary enormously in size and there is no minimum limit on how many tokens they should contain or indeed no set maximum size. In general, written corpora tend to be larger due to the relative ease of locating and storing electronic texts and the time-consuming nature of transcribing spoken data. It is also fair to say that a small corpus can be just as effective as a large one, depending on the purpose for which it is used and the principles behind its construction, a point we shall go on to discuss in 1.3. However, at this stage, it is instructive to compare the size of many of the corpora we will use in this book, alongside some others that are commonly used by publishers. These details are shown in Table 1.1 and there is more information given on the open-access corpora in Chapter 2.
1.3 Different types of corpora and good corpus design
Corpora can be mono-modal (through one medium, typically text) or multi-modal (through more than one medium, typically text and video), as described by Adolphs and Carter (2013). Due to costs, most corpora are mono-modal, although multi-modal corpora are increasingly being developed (see Adolphs and Carter, 2013, for examples). According to Sinclair (1991), a corpus should consist of a principled collection of texts. This means that a corpus should contain texts selected so that they can answer the questions we want to ask of it.
Corpus name | Spoken/written or both | Number of tokens | Text types | Availability | Dates |
---|---|---|---|---|---|
Brigham Young University-British National Corpus (BYU-BNC) (Davies, 2004) | Both | 100 million | Newspapers, fiction, journals, academic books, published and unpublished letters, school and university essays, unscripted conversation, meetings, radio phone-ins and shows | Open-access (registration needed) | 1980s–1993 |
Corpus of Contemporary American English (COCA) (Davies, 2008) | Both | 450 million | Fiction, newspapers, magazines, academic texts, unscripted conversations | Open-access (registration needed) | 1990–2012 |
Corpus of Global Web-Based English (GloWbE) (Davies, 2013) | Written | 1.9 billion | Web pages from 20 English-speaking countries | Open-access (registration needed) | 2013 |
Vienna-Oxford International Corpus of English (VOICE) (Seidlhofer et al., 2013) | Spoken (English used as a Lingua Franca) | 1 million | Interviews, press conferences, service encounters, seminar discussions, working group discussions, workshop discussions, meetings, panels, question-answer sessions, conversations | Open-access (registration needed) | 2008–2011 |
Cambridge English Corpus (CEC) | Spoken and written | Multi-billion | Learner English, business English, academic English, unscripted conversations | No general access | No dates given |
The Cambridge English Profile Corpus (CEPC) | Spoken and written (learner data) | 10 million | Spoken and written texts from English language tests | Access to the English vocabulary profile available. Once complete, parts of the CEPC will be open-access | 2005–present |
By way of example, if we wished to analyse the performance of learners in a set of English language tests, we would need samples of their written and spoken work from the tests to be able to make realistic statements about the language in use. We would also need to make decisions about whether to include students who pass or fail tests with a particular mark. Other variables we would need to acknowledge and control for are the age and nationalities of the candidates. If the test is taken by a range of nationalities, for example, we would need a sample of tests that give a representative sample of those nationalities. We would also need to make a decision about how many words (or tokens) to include. This should be based upon two aspects: what we intend to use the corpus for and, practically, how many texts we can collect in the time available to us.
In the hypothetical example of the corpus of tests, should we wish to make statements about how a grammatical pattern is used across different levels, then clearly we would need a lot more words than if we wished to investigate how a particular pattern was used only in a written test at one particular level. Finally, we would need to decide upon the type of corpus we need. For example, a mono-modal corpus of texts would give us information about candidates’ writing and speech but, in the case of speech, we would be unable to comment upon their use of body language and how this acts to reinforce their message.
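One way to obtain a sample that reflects the range of nationalities in the hypothetical test corpus is to draw the same proportion of scripts from each group. The Python sketch below is an illustration under stated assumptions, not a recommended procedure: the function name `stratified_sample`, the record layout, the nationality labels A, B and C, and the 10% sampling fraction are all invented for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(scripts, key, fraction, seed=0):
    """Draw the same fraction of scripts from every group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    groups = defaultdict(list)
    for script in scripts:
        groups[script[key]].append(script)
    sample = []
    for members in groups.values():
        # Keep at least one script per group so no group disappears.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical test scripts: 60, 30 and 10 candidates from three nationalities.
scripts = (
    [{"id": i, "nationality": "A"} for i in range(60)]
    + [{"id": i, "nationality": "B"} for i in range(60, 90)]
    + [{"id": i, "nationality": "C"} for i in range(90, 100)]
)

sample = stratified_sample(scripts, "nationality", fraction=0.1)
counts = Counter(script["nationality"] for script in sample)
print(dict(counts))  # prints: {'A': 6, 'B': 3, 'C': 1}
```

The same idea extends to the other variables mentioned above, such as age or test result, by grouping on more than one field.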
Try it yourself 1.1
Imagine you wish to construct a corpus to represent the following types of English and purposes. What types of texts would you need and approximately how large would each corpus need to be? A suggested answer is available at the back of the book.
- A corpus of British spoken academic English. Purpose: to discover the most frequent words used by lecturers.
- A corpus of Dickens’ fiction. Purpose: to discover the way lexical and grammatical patterns are used to reinforce themes.
- A corpus of written requests made by colleagues in a UK university. Purpo...