Translation-Driven Corpora

Corpus Resources for Descriptive and Applied Translation Studies

  1. 244 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Electronic texts and text analysis tools have opened up a wealth of opportunities to higher education and language service providers, but learning to use these resources continues to pose challenges to scholars and professionals alike. Translation-Driven Corpora aims to introduce readers to corpus tools and methods which may be used in translation research and practice. Each chapter focuses on specific aspects of corpus creation and use. An introduction to corpora and overview of applications of corpus linguistics methodologies to translation studies is followed by a discussion of corpus design and acquisition. Different stages and tools involved in corpus compilation and use are outlined, from corpus encoding and annotation to indexing and data retrieval, and the various methods and techniques that allow end users to make sense of corpus data are described. The volume also offers detailed guidelines for the construction and analysis of multilingual corpora.

Corpus creation and use are illustrated through practical examples and case studies, with each chapter outlining a set of tasks aimed at guiding researchers, students and translators to practice some of the methods and use some of the resources discussed. These tasks are meant as hands-on activities to be carried out using the materials and links available in accompanying downloadable resources. Suggested further readings at the end of each chapter are complemented by an extensive bibliography at the end of the volume.

Translation-Driven Corpora is designed for use by teachers and students in the classroom or by researchers and professionals for self-learning. It is an invaluable resource for anyone interested in this fast-growing area of scholarly and professional activity.

Translation-Driven Corpora by Federico Zanettin is available in PDF and/or ePUB format.

Information

Publisher
Routledge
Year
2014
ISBN
9781317639848
Edition
1

1. Introduction

Electronic texts and text analysis tools have opened up a wealth of opportunities to higher education and language services providers, but learning to use these resources continues to pose challenges to scholars and professionals alike. This book is concerned with the creation of electronic text corpora and their exploitation in descriptive and applied translation research. As such, it takes a broad perspective on the use of technologies for analyzing texts, ranging from the applications of corpus linguistics to translation studies to the use of corpora in translator training as well as the use of corpus resources by translators, language services providers and computational linguists.
Almost 20 years have passed since Mona Baker’s (1993) seminal article on the application of insights from corpus linguistics to translation studies. Since then, corpus methodologies have become almost mainstream in descriptive translation studies, and corpus-based language instruction has become a standard component in many translator training university courses. Concurrently, computational applications increasingly based on corpora such as translation memories (TMs) and machine translation (MT) systems have become part of life for all language services providers, not only for specialized and technical translators. Monographs and collected volumes have appeared on both theoretical and descriptive aspects (e.g. Laviosa 2002; Olohan 2004; Anderman and Rogers 2008) and pedagogical applications (e.g. Bowker and Pearson 2002; Zanettin et al. 2003; Beeby et al. 2009; Tengku Mahadi et al. 2010), and monographs on corpus linguistics often include sections on translation and contrastive linguistics (e.g. Tognini-Bonelli 2001; Meyer 2002; Hunston 2002). The online Translation Studies Abstracts database (TSA Online) lists over 800 entries in the Corpus-Based Studies category.
In this volume, corpus creation and use are illustrated through practical examples and case studies, with each chapter outlining a set of tasks aimed at guiding researchers, students and translators to practise some of the methods and use some of the resources discussed. These tasks are meant as hands-on activities to be carried out using the materials and links available in the accompanying DVD. Suggested texts for further reading at the end of each chapter are complemented by an extensive bibliography at the end of the volume.
The main focus is on the creation and use of corpus resources by researchers and scholars as well as university students following advanced translator training and translation studies courses. However, the volume may prove of interest not only in translation-oriented academic settings but also to language and translation professionals. While some familiarity with translation studies research may be helpful, it is not taken for granted. No knowledge of corpus linguistics is assumed.

1.1 Book outline

Following this Introduction, the book is divided into six main chapters, each focusing on specific aspects of corpus creation and use, and containing a number of practical tasks and a list of suggested further reading and links to online corpus resources. The book can be read sequentially, and basic concepts are defined as they are encountered. However, each chapter has an autonomous structure and some topics, tools and methods are discussed or mentioned in more than one place. In these cases, the reader is consistently referred to the other places in the volume where these aspects are discussed. All chapters include a Tasks section inviting researchers, students and translators to practise some of the methods discussed and use the materials. The tasks are related to the examples presented in each chapter, and they are meant as hands-on activities to be carried out on the accompanying DVD (see Section 1.2). Practical activities are reproduced in print for the reader’s convenience and ease of reference, and it is assumed that users will have access to online computing facilities in order to carry out (part of) the tasks.
Chapter 2 offers an introduction to corpora and applications of corpus linguistics methodologies to translation studies. The various types of corpora used in descriptive and applied translation research are presented, and examples from a number of corpus-based projects are surveyed and discussed. A typology of translation-driven corpora is sketched out, starting from a variety of corpora used in descriptive research. These usually contain two or more subcorpora which are compared in order to find similarities and differences between source and target texts or languages, to isolate potential distinguishing features of translated texts or languages, or to study translation styles and genres. Some studies investigate varieties of translated language produced by specific types of language users such as interpreters, translation trainees or language learners. There follows a brief overview of the types of corpora typically used in translation teaching and learning, namely large monolingual corpora and small, ad hoc, disposable do-it-yourself (DIY) corpora, either monolingual or bilingual. This general introduction to translation-related corpus typology and corpus-based research ends with a short overview of the use of corpora in machine-assisted translation and computational linguistics.
Three different tasks are proposed as a way to practise the research methodologies discussed. The first consists in replicating a piece of research using the Translational English Corpus (TEC) hosted at the University of Manchester and accessible online. In the second, the same techniques and procedures are employed to compare research findings with those obtained from COMPARA, a bilingual, bidirectional parallel Portuguese-English corpus hosted in Lisbon and also accessible online. In this experiment, only the English language components (both translations and non-translations) of the corpus are selected and used. Finally, the online interface of the Learner Translation Corpus (LTC) is examined. This contains original and translated texts in many European languages, produced by both translation trainees and professional translators.
Chapter 3 deals with corpus design and acquisition. After the main phases of corpus construction are introduced, various issues regarding the size and composition of corpora are discussed. The evaluation of the internal composition of a corpus in relation to its size is required when assessing representativeness and comparability. This is especially relevant for studies based on translation-driven corpora, which usually involve a comparison of findings derived from subcorpora in the same and different languages. To exemplify the implications of decisions that can be taken when designing a corpus, a detailed case study is presented. This concerns the design of CEXI, an Italian-English bilingual bidirectional parallel and translation-driven corpus.
Ideal criteria for corpus design often need to be adjusted to practical constraints such as project funding, copyright restrictions or lack of appropriate corpus material or tools. These considerations lead to an examination of the implications of creating corpora from the Web, which contains enormous quantities of textual material already available in electronic format. First, the Web is examined as a ‘surrogate corpus’ with respect to issues of size and representativeness. The tools available to use it as a language rather than a content resource are also discussed.
This is followed by an analysis of the Web as a source of corpus data. Corpus linguists and translation researchers, as well as translation practitioners, can in fact create monolingual corpora as well as bi- and multilingual comparable ones, both general and specialized, by downloading and processing Web documents retrieved using Internet search engines and directories. Such corpora can also be created through semi-automatic routines implemented by ad hoc programs and online services. Further issues concerning the design and acquisition of bilingual and multilingual corpora, both parallel and comparable, are explored in relation to corpus alignment and processing in Chapter 6.
The tasks presented in Chapter 3 involve the drafting of a corpus creation project, and the design of two DIY Web corpora. It is up to the individual reader to decide upon the precise nature of the project. A grid for outlining a corpus building project is provided to guide the prospective corpus developer through all the main stages. This task is followed up in Chapter 6 where the reader will be asked to reconsider some of the issues addressed, with a focus on the construction of multilingual corpora. The two DIY corpora will be created in one case by manually sifting results from Internet searches (in English), in the other by semi-automatically compiling a bilingual comparable corpus.
Chapter 4 goes through the different stages of corpus compilation and use, from corpus encoding and annotation to indexing and data retrieval, focusing on methods and standards for the annotation of robust corpora to be used in descriptive translation studies. It is suggested that common encoding standards should be adopted by the research community, and that a modular approach accommodating different layers of annotation can be used to encode different textual features. This approach is illustrated by a short introduction to some existing standards for corpus annotation, i.e. the Text Encoding Initiative (TEI) guidelines and the XML Corpus Encoding Standard (XCES). A model header is presented, followed by a summary introduction to how structural and linguistic annotation can be recorded in an XML TEI conformant document. Different layers of annotation can also be stored by implementing a model in which annotation is kept separate from the running text. This is illustrated through examples of annotation from the Learner Translation Corpus (LTC), first introduced in Chapter 2.
The practical tasks designed for this chapter allow the user to create and search a single-document corpus. First, an XML TEI conformant document is created from a source PDF file by manually marking up documentary and structural information in the text. The document is then linguistically annotated, validated and indexed, and finally the very small corpus created is explored through a couple of sample searches. The different pieces of software used to process the text at various stages (text conversion, manual and automatic annotation, indexing, text retrieval) are freely available on the Web and partly included on the accompanying DVD.
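As a rough illustration of the kind of document these tasks produce, the following Python sketch builds a minimal TEI-style XML file with a header and a body of paragraphs. It is only a sketch under stated assumptions: the function name `build_tei`, the sample title and the source note are hypothetical, the element set is a bare minimum taken from the TEI guidelines, and the result is not validated against any schema. It does not reproduce the model header or the software used in the chapter.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def q(tag):
    """Return the namespace-qualified form of a TEI element name."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, source_note, paragraphs):
    """Build a minimal TEI tree: a header (documentary information)
    followed by the text body (structural markup, here only <p>)."""
    root = ET.Element(q("TEI"))

    # Header: title, publication and source statements.
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished working file."
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = source_note

    # Body: one <p> element per paragraph of running text.
    text = ET.SubElement(root, q("text"))
    body = ET.SubElement(text, q("body"))
    for para in paragraphs:
        ET.SubElement(body, q("p")).text = para
    return root


# Hypothetical sample data, for illustration only.
doc = build_tei(
    "Sample text",
    "Converted from a PDF file (hypothetical example).",
    ["First paragraph.", "Second paragraph."],
)
xml_string = ET.tostring(doc, encoding="unicode")
print(xml_string)
```

Keeping the header separate from the body mirrors the distinction between documentary and structural information drawn above. Linguistic annotation would be a further layer on top of this skeleton.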
Chapter 5 offers an overview of software tools which can be used to create, manage and analyze corpora, and describes methods and techniques which allow end users to make sense of corpus data. After a discussion of the hardware and software requirements which have to be met in order to successfully carry out the various stages of corpus construction, the chapter focuses on corpus analysis. Basic corpus analysis tools and techniques as well as more advanced ones are presented and illustrated through practical examples, in order to show how they can be used to investigate lexical patterning. The concepts of collocation, colligation, semantic preference and semantic prosody are also introduced and briefly discussed. Like Chapter 3, this chapter focusses on monolingual corpora and subcorpora. The tools and techniques described provide the background for a more detailed discussion on the construction and analysis of multilingual corpora in the following two chapters.
Practical examples of how to investigate phenomena such as collocation and semantic preference using corpus analysis software are provided in the Tasks section. The reader will be shown how to create, manipulate and explore wordlists and concordances using as data the bilingual comparable corpus created in Chapter 3, and a text-only version of the Open American National Corpus (OANC) (provided on the DVD). These tasks can be carried out using freely available text analysis software (copies on the DVD) or commercial software. More advanced computational tools are tried out in order to investigate lexicogrammatical relations through the analysis of word clusters and word profiles.
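To give a flavour of what wordlists and concordances are, the short Python sketch below computes a frequency list and a keyword-in-context (KWIC) display from a sample sentence. It is an illustrative assumption rather than a description of the software supplied on the DVD: the function names, the sample text and the very crude tokenization rule are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"\w+", text.lower())


def wordlist(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=3):
    """Return KWIC lines as (left context, node word, right context),
    with up to `width` tokens of context on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, for illustration only.
sample = "The corpus was aligned. The aligned corpus was then searched."
tokens = tokenize(sample)

for word, freq in wordlist(tokens):
    print(f"{freq:>3}  {word}")

for left, node, right in concordance(tokens, "corpus", width=2):
    print(f"{left:>20}  [{node}]  {right}")
```

Sorting or counting the words that appear in the left and right context columns is one simple way of starting to look at collocation.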
Chapter 6 focusses on the creation and use of bilingual parallel corpora. After a discussion of the terms and concepts of comparable and parallel corpora, it provides a survey of procedures and tools for the alignment of parallel corpora at ‘sentence’ level, which illustrates issues in parallel corpus processing. Various aspects are exemplified through reference to the creation of various parallel corpora. The OPUS collection of parallel multilingual corpora is presented as a case study of tools and procedures that can be used to build an aligned version of parallel corpora. There follows an examination of the difference between parallel corpora and translation memories, and a discussion of ‘word alignment’ in both comparable and parallel corpora.
The tasks in this chapter include the alignment of a parallel corpus using texts provided on the DVD and already partially processed in a previous task. Two different alignment programs (both available on the DVD) are used to align three text pairs of different length and processing ease. The alignment of the three parallel texts involves different approaches to automatic alignment and different degrees of interaction between the user and the alignment application. Readers may either start the process from scratch, after selecting parallel texts of their choice and performing basic preparatory processing, or use the files in plain text format available on the DVD, and to which the activities described refer. A further task consists in revising the corpus building project outlined in Chapter 3, in light of the information acquired in the following chapters and using the checklist provided.
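One family of approaches to automatic alignment relies on sentence length alone. The Python sketch below is a much simplified, hypothetical example of that idea: it uses dynamic programming to pair sentences so that the character lengths on each side are as close as possible, allowing one-to-one, one-to-two, two-to-one and unmatched segments. The penalty values and function name are arbitrary assumptions, and the sketch is not the method implemented by the alignment programs on the DVD.

```python
def align(src, tgt, skip_penalty=20, merge_penalty=5):
    """Align two lists of sentences by character length.

    Returns a list of (source indices, target indices) pairs.
    The penalties are arbitrary illustrative values.
    """
    # Allowed segment shapes: (source sentences, target sentences, fixed penalty).
    shapes = [
        (1, 1, 0),
        (1, 0, skip_penalty),
        (0, 1, skip_penalty),
        (2, 1, merge_penalty),
        (1, 2, merge_penalty),
    ]
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i source and j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + abs(src_len - tgt_len) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the end to recover the alignment.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return pairs


# Hypothetical placeholder "sentences", for illustration only.
source = ["aaaa", "bbbbbbbbbb", "cc"]
target = ["xxxx", "yyyyy", "zzzzz", "ww"]
for src_ids, tgt_ids in align(source, target):
    print(src_ids, "<->", tgt_ids)
```

A purely length-based procedure of this kind will make mistakes on real texts, which is one reason why alignment tools differ in how much user interaction they require.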
Chapter 7 deals with tools and techniques for using multilingual corpora in descriptive and applied translation studies. It also addresses issues concerning the display and analysis of parallel concordances. After an overview of some of the tools which can be used to search parallel corpora and retrieve parallel concordances, two case studies are presented to illustrate the types of analysis which can be carried out with parallel corpora, depending on the level of annotation and on the software used for retrieving and displaying parallel concordances. First, the methodologies that can be adopted for investigating the descriptive features of translated texts are illustrated through examples from a parallel corpus comprising some novels of Salman Rushdie and their Italian translations. These are analyzed using the ParaConc parallel concordancer. Then, a contrastive analysis of the words ‘eye’ and occhio is carried out using the search interface to the OPUS Multilingual Word Alignment Database. Finally, the use of multilingual corpora as resources for professional translators is briefly examined. It is argued that comparable and parallel corpora can help translators deal with translation problems for which they may not find a solution elsewhere.
The tasks for this chapter include hands-on explorations of two different parallel corpora: the small literary English-Italian parallel corpus created and aligned in the Tasks section of Chapter 6, and the multilingual parallel corpus Europarl, which contains several hundred million words of the EU parliamentary proceedings from 1996 onwards in 20 languages. The corpus of literary texts is searched using the Demo version of ParaConc included on the DVD, while the Europarl corpus is searched using the OPUS online multilingual search interface.
The concluding chapter looks at foreseeable developments of applications of computer technology to the retrieval of textual and linguistic information in electronic texts of relevance for descriptive and applied translation studies. Some recommendations are also made as to possible future developments of corpus-based projects in translation research. A reference section and a names and subjects index are appended at the end of the volume.

1.2 How to use the DVD

The DVD is divided into three main sections:
Tasks
Software
Texts and corpora
Each Tasks section contains directions on how to carry out the activities and links to the software applications, online services, texts and corpora needed. Hands-on activities can be done individually or in pairs. It is advisable to open external links in separate tabs or pages, so as to always keep directions in view. Users should also save their texts and working files in a personal folder (and subfolders if necessary) on their computer or on a server.
The Software section contains an index to the programs which are needed to carry out the activities in the Tasks sections. Copies of these programs, on which the activities proposed in the Tasks sections have been tested, are stored on the DVD. These programs are available as either freeware, shareware or demo...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Dedication
  5. Table of Contents
  6. List of figures and tables
  7. Acknowledgements
  8. 1. Introduction
  9. 2. Corpus linguistics and translation studies
  10. 3. Corpus design and acquisition
  11. 4. Corpus encoding and annotation
  12. 5. Corpus tools and corpus analysis
  13. 6. Creating multilingual corpora
  14. 7. Using multilingual corpora
  15. 8. Conclusions
  16. References
  17. Index