- 793 pages
- English
- PDF
Corpus Linguistics. Volume 1
About This Book
This volume provides an up-to-date survey of the field of corpus linguistics, a field whose methodology has revolutionized much of the empirical work done in most fields of linguistic study over the past decade.
Corpus linguistics investigates human language by starting out from large collections of texts - spoken, written, or recorded. These language corpora, which are now regularly available in electronic form, are the basis for quantitative and qualitative research on almost any question of linguistic interest. Many techniques that are in use in corpus linguistics today are rooted in the tradition of the late 18th and the 19th centuries, when linguistics began to make use of mathematical and empirical methods. Modern corpus linguistics has used and developed these methods in close connection with computer science and computational linguistics.
The handbook sketches the history of corpus linguistics, shows its potential, discusses its problems, and describes various methods of collecting, annotating, and searching corpora as well as processing corpus data. It also reports case studies that illustrate the wide range of linguistic research questions addressed in corpus linguistics. The handbook's more than 60 articles are divided into five sections:
(1) the origins and history of corpus linguistics and surveys of its relationship to central fields of linguistics
(2) corpus compilation
(3) corpus types
(4) preprocessing of corpora
(5) the use and exploitation of corpora.
The final section gives an overview of the results of corpus studies obtained in phonetics, phonology, morphology, syntax, semantics, sociolinguistics, historical linguistics, stylometry, dialectology, and discourse analysis. It also reports on recent advances made in human and machine translation, contrastive studies, computer-assisted language learning, and automatic summarization.
The contributors to the volume are internationally known experts in their respective fields. The handbook is intended for a wide audience ranging from teachers, university students, and scholars to anyone interested in the use of computers in linguistic analyses and applications.
Table of contents
- Frontmatter
- Contents
- 1. Pre-electronic corpora
- 2. Early generative linguistics and empirical methodology
- 3. Some aspects of the development of corpus linguistics in the 1970s and 1980s
- 4. Corpus linguistics and historical linguistics
- 5. Theory-driven and corpus-driven computational linguistics, and the use of corpora
- 6. Corpus linguistics and sociolinguistics
- 7. Corpora and language teaching
- 8. Corpus linguistics and lexicography
- 9. Collection strategies and design decisions
- 10. Text corpora
- 11. Speech corpora and spoken corpora
- 12. Multimodal corpora
- 13. Treebanks
- 14. Historical corpora
- 15. Learner corpora
- 16. Parallel and comparable corpora
- 17. Corpora of computer-mediated communication
- 18. Web linguistics
- 19. Large text networks as an object of corpus linguistic studies
- 20. Well-known and influential corpora
- 21. Corpora of less studied languages
- 22. Annotation standards
- 23. Development of tag sets for part-of-speech tagging
- 24. Tokenizing and part-of-speech tagging
- 25. Lemmatising and morphological tagging
- 26. Sense and semantic tagging
- 27. Corpora for anaphora and coreference resolution
- 28. Syntactic preprocessing
- 29. Pragmatic annotation
- 30. Preprocessing speech corpora: Transcription and phonological annotation
- 31. Preprocessing multimodal corpora
- 32. Preprocessing multilingual corpora
- 33. Searching and concordancing
- 34. Searching treebanks and other structured corpora
- 35. Linguistically annotated corpora: Quality assurance, reusability and sustainability