Part I
Short Introductions to Corpus-Based Sociolinguistics and the BNC2014
1 Corpus Linguistics and Sociolinguistics
Introducing the Spoken BNC2014
Vaclav Brezina, Robbie Love and Karin Aijmer
1.1 Sociolinguistics Meets Corpus Linguistics
Systematic, large-scale exploration of sociolinguistic features in everyday language use has been made possible by the availability of corpora representing informal speech, such as the demographically sampled spoken component of the British National Corpus (the Spoken BNC1994DS) and indeed the new Spoken BNC2014 (see Section 1.2). These corpora include rich metadata about social characteristics of the speakers and a large volume of data, which can be analysed using different techniques. Sociolinguistic exploration of large corpus data is, however, not without its challenges (e.g., Brezina & Meyerhoff, 2014). Language represents a dynamic system with variation occurring simultaneously at multiple levels, reflecting both conscious and unconscious choices by speakers as well as the requirements of the mode of communication, genre and a specific linguistic context (see Chapter 3 in this volume). Capturing socially meaningful variation is therefore a difficult task, requiring a good understanding of social and linguistic processes as well as familiarity with the dataset. The analysis often needs to shift between showing general patterns in the data and providing specific examples of language use to arrive at an interpretation that does justice to the complexity of the data. Bringing corpus linguistics and sociolinguistics together (cf. Baker, 2010) to investigate current spoken British English creates a unique opportunity to gain insight into everyday language use of people from different parts of the UK and different ‘corners’ of society. It is a fascinating exploration to which this volume intends to contribute.
1.2 The Spoken BNC2014: Full Dataset and Sample
For over twenty years, the British National Corpus (BNC) has been one of the most widely known corpora used as a representative sample of current British English. Focusing on the five-million-token Spoken BNC1994DS, Love, Dembry, Hardie, Brezina, and McEnery (2017) show that no other orthographically transcribed spoken corpus compiled since its release has matched it in size, representativeness or usefulness. However, as the same authors argue, the ageing Spoken BNC1994DS no longer serves the requirements of the research community well, and a new dataset reflecting current usage is needed.
The Spoken BNC2014 is a response to this need. Publicly released in September 2017, initially via CQPweb (Hardie, 2012), the corpus is the result of collaboration between the ESRC Centre for Corpus Approaches to Social Science (CASS)1 at Lancaster University and Cambridge University Press (CUP). Love, Dembry, Hardie, Brezina, and McEnery (2017) describe in greater detail how the Spoken BNC2014 was designed and built within the Lancaster/Cambridge partnership; the BNC2014 user guide (Love, Hawtin, & Hardie, 2017) includes information about the structure of the full 11.5-million-word corpus.
The studies in this volume are based on a five-million-token sample of the Spoken BNC2014 data, referred to as the Spoken BNC2014 Sample (Spoken BNC2014S), which contains transcripts from conversations recorded between 2012 and 2015. The Spoken BNC2014S was made available on a competitive basis to the authors of this volume, who focused on a variety of sociolinguistic applications (see Section 1.3). The Spoken BNC2014S consists of 4,784,691 tokens (including punctuation), approximately 60% of which were produced by female speakers and 40% by male speakers. A wide range of age groups is represented in the dataset, with the largest proportion of the data (41%) produced by speakers aged between 19 and 29. Information is also available about the speakers' socio-economic status and region. A detailed breakdown of these categories is provided in the Appendix at the end of this chapter.
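To give a sense of scale, the rounded percentages above can be converted into approximate token counts. The following is a minimal sketch only: the function and variable names are illustrative assumptions, and because the percentages in the text are rounded, the resulting figures are rough estimates rather than the exact counts reported in the Appendix.

```python
# Size of the Spoken BNC2014S as stated in the text (punctuation included).
TOTAL_TOKENS = 4_784_691

# Rounded shares quoted in the text; the labels are illustrative only.
APPROX_SHARES = {
    "female speakers": 0.60,
    "male speakers": 0.40,
    "speakers aged 19-29": 0.41,
}


def approx_tokens(total: int, share: float) -> int:
    """Estimate a token count from a rounded share of the corpus."""
    return round(total * share)


# Approximate token counts per group (estimates, not exact corpus figures).
estimates = {
    group: approx_tokens(TOTAL_TOKENS, share)
    for group, share in APPROX_SHARES.items()
}

for group, count in estimates.items():
    print(f"{group}: roughly {count:,} tokens")
```

On these assumptions, each of the three groups mentioned contributes a subcorpus of well over a million tokens, which is why comparisons across speaker groups remain feasible within the sample.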
1.3 Sociolinguistic Studies of the Spoken BNC2014
This volume offers four short theoretical/methodological pieces and eight empirical studies. It demonstrates a corpus-based sociolinguistic approach to the Spoken BNC2014 and provides a snapshot of sociolinguistic variation in spoken British English in the 2010s, often contrasted with the situation in the 1990s. The volume is divided into three broad sections: (i) Introductions, (ii) Discourse, Pragmatics and Interaction, and (iii) Morphosyntax.
I Introductions to Corpus-Based Sociolinguistics and the Spoken BNC2014
In addition to this introduction, the first section of this volume comprises three short contributions, which reflect on the state of the art in corpus linguistics and sociolinguistics and provide context for the empirical chapters that follow. McEnery offers a compelling account of the major design decisions taken when building the Spoken BNC2014; this chapter lays out principles of spoken corpus design and highlights the importance of data in corpus linguistics. Busse's contribution outlines different sociolinguistic perspectives on British English, with a focus on current debates in the field. Finally, Hardie highlights some of the main features of CQPweb, the online corpus analysis system which hosts the Spoken BNC2014.
II Discourse, Pragmatics and Interaction
This section is devoted to studies dealing with language use in context and the dynamics of discourse. In Chapter 5, Culpeper and Gillings focus on a well-known stereotype about British politeness. Whilst politeness in Britain is often thought of as a monolithic phenomenon characterised by indirectness, there is an assumption in lay discourse that northerners have very different politeness practices from southerners, practices which, broadly, are characterised by friendliness. The authors put this assumption to the test by selecting fourteen key British formulaic politeness expressions, each belonging to one of three different types of politeness (tentativeness, deference or solidarity), and then examining their frequencies in the combined north and south components of the Spoken BNC2014S and the Spoken BNC1994DS.
In Chapter 6, Aijmer draws attention to new and unusual intensifiers in present-day English which appear to be undergoing delexicalisation and grammaticalisation. The intensifiers in this category are fucking, super, dead, real, well (good) and so(+NP), in their roles as intensifiers before adjectives. Aijmer's method involves a comparison of the intensifiers in the Spoken BNC1994DS and Spoken BNC2014S.
The aim of Axelsson's contribution (Chapter 7) is to provide an in-depth analysis of the frequencies and formal features of tag questions (including instances with innit) as well as their distribution across gender, age, dialect and socio-economic status. This study complements the evidence in her previous work, which is based on the BNC1994DS. The study thus explores diachronic change in informal discourse and its dynamics.
In Chapter 8, the final chapter in this section, Wong and Kruger examine structural categories derived from the number of words, non-words and partial forms that contribute to a backchannel. They seek to establish the factors that condition the selection of various backchannel structures in British English, using a multifactorial method. With the help of corpus annotation, they identify backchannel structures, and then use grammatical and speaker metadata associated with each utterance as predictors of backchannel choice.
III Morphosyntax
The final section in this volume is concerned with morphosyntactic features in British speech. Säily, González-Díaz and Suomela's chapter (Chapter 9) investigates the use of adjective comparison. It focuses on the productivity of two comparative strategies in English: the inflectional -er and the periphrastic more strategy. The study builds on recent research using novel methodologies that shows sociolinguistic variation in the productivity of extremely productive derivational suffixes.
Jenset, McGillivray and Rundell's contribution (Chapter 10) investigates English verbs whose argument structure preferences include the dative alternation (Give me the money/Give the money to me). Although this is a well-researched topic, most published work draws either on introspection or on data from written sources. Using contemporary unscripted spoken data will therefore take the research into fresh territory and will bring new insights about the dative alternation in spoken English, with attention being paid to sociolinguistic variation.
Caines, McCarthy and Buttery (Chapter 11) investigate zero auxiliary use in progressive aspect interrogatives in spoken British English, e.g., You talking to me? Where we going? What you been doing? The authors outline the situation of the progressive aspect in English, including the zero auxiliary, and offer two comparable empirical studies on the use of zero auxiliary in British speech, one dealing with data from the 1990s, the other with 2010s data.
Finally, Paterson (Chapter 12) explores the use of untriggered reflexives in current British English from the sociolinguistic perspective, i.e., the use of untriggered reflexives by particular demographic groups (defined by age, gender, etc.). The analysis provides a snapshot of current usage of untriggered reflexives and facilitates comparison with the existing corpus-based research of this grammatical phenomenon.
Note
References
Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh: Edinburgh University Press.
Brezina, V., & Meyerhoff, M. (2014). Significant or random? A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics, 19(1), 1–28.
Hardie, A. (2012). CQPweb: Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3), 380–409.
Love, R., Dembry, C., Hardie, A., Brezina, V., & McEnery, T. (2017). The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22(3), 319–344.
Love, R., Hawtin, A., & Hardie, A. (2017). The British National Corpus 2014: User manual and reference guide (version 1.0). Lancaster: ESRC Centre for Corpus Approaches to Social Science.
Appendix:...