PART I
Corpus-based translation studies
Introduction
Federico Zanettin
The first section of this volume collects five articles, written between the early 1990s and the early 2000s, which together articulate a vision of how corpus linguistics insights and techniques can be applied to the study of translation. This vision proved highly influential: hardly any article on corpus-based translation studies fails to refer to one or more of these pioneering publications, which trace the birth and first developments of one of the fastest-growing subfields in translation studies (Zanettin et al. 2015).
The first three articles sketch out a rationale and possible lines of research, while the last two discuss in more detail methodological aspects of applying corpus tools and resources in translation studies. The first article, published in 1993, appeared in a Festschrift volume honoring John Sinclair, the late British linguist and founder of the COBUILD project, which revolutionized contemporary lexicography and opened the way to a new type of corpus-based dictionary. Perhaps not coincidentally, Mona Baker’s article similarly opened the way to what has since become known as corpus-based translation studies, or CTS. Baker’s article was not, strictly speaking, the first publication to discuss translation as an object of investigation using corpus linguistics methodologies; earlier publications include Lindquist (1984) and Gellerstam (1986). It was, however, the first to depart from a contrastive linguistics approach and consider corpora from a translation studies perspective. In Baker’s approach, corpus linguistics is used to explore features which are ‘linked to the nature of the translation process itself rather than to the confrontation of specific linguistic systems’ (1993: 243), and translation is seen as a variety of language worthy of study because of its specific features, rather than as a deviant, distorted version of ‘real’, ‘natural’ language.
The article is divided into two parts: the first provides an overview of the state of the art in translation studies, at the time still an emerging discipline, and the second offers some ‘suggestions for future research’, which indeed spawned much research in the decades that followed. Baker notes how debates on the notion of equivalence evolved from a search for isomorphism and semantic sameness across languages to an expansion of the concept and to various classifications of how different levels and types of equivalence may be realized. A move away from an essentialist view of meaning towards an understanding of meaning as function in context, in the British tradition of linguistics which extends from J.R. Firth to M.A.K. Halliday and John Sinclair, can be seen as paralleling the shift from source-oriented notions of equivalence to target-oriented approaches, the latter stressing the role and function of translation in the receiving culture. This shift occurred both in literary translation, where descriptive translation studies underlined the position of translated texts as part of the target system, and in non-literary translation, where functionalist theories underlined the role of the receiving culture. Especially important is the connection with polysystem theory and with the notion of norms introduced by Gideon Toury: polysystem theory presupposes a view of literature as a conglomerate of systems, and hence the need to study translated texts as part of such systems rather than in isolation and only in comparison with the source text. Thus, as the status of the source text was being undermined, translation was increasingly seen as a non-derivative activity, both by ‘systemic’ scholars (Hermans 1999) and by scholars advocating greater visibility for translation and translators (Venuti 1995).
Baker notes that ‘the notion of norms is very similar to that of typicality, a notion which has emerged from recent work on corpus-based lexicography’ (1993: 239). She argues that, since norms can only be investigated on the basis of a corpus of texts rather than of individual texts, computerized corpora and corpus linguistics tools and methods make it possible to overcome the limitations inherent in manual investigations of printed collections of text, limitations which had led Toury to state that ‘we are in no position to point to strict statistical methods for dealing with translational norms’ (Toury 1978: 96). Baker suggests several lines of research which could be fruitfully pursued with the help of electronic corpora. For instance, corpora may help unearth sociocultural norms, such as whether various target cultures consistently show different attitudes towards the use of loan words. The evolution of translations through different stages and versions, including revision and editing by different agents, is another aspect that may be investigated with corpus linguistics techniques. However, among Baker’s suggested research avenues, it is what she then termed ‘universal features of translation’ that have attracted the most attention. Numerous articles have since been written on this topic, and the very notion of ‘universals of translation’, as they have sometimes also been called, has faced intense scrutiny (see e.g. Mauranen & Kujamäki 2004). The debate has led to the reclassification, redefinition and questioning of supposed translation universals, including by Baker herself (Baker 2007), and in some cases to their outright rejection as a viable theoretical framework.
The issue of universality aside, Baker suggested that computerized corpora can help explicate the phenomenon of translation, that is, establish whether translated texts share characteristics which result from translation being a distinct and distinctive practice. Drawing on previous literature, she lists several features which have been posited to pertain to all translated texts, irrespective of source language or text type. These include a supposed tendency of translated texts to be more explicit, simpler and more conventional than non-translated texts, as well as to avoid repetition and to show a distinct distribution of conventional TL (target language) features, for instance of specific cohesive patterns. While the list is still presented as ‘a very tentative list of suggestions’ (1993: 343), Baker argues that corpora in fact have the potential to address a number of descriptive hypotheses in translation studies, and that cumulative findings may allow scholars to make generalizations that would otherwise be very difficult to arrive at.
While the 1993 article was included in a collection of articles on linguistics and addressed corpus linguists, the essay published in 1995 in the journal Target was addressed to the translation studies community. Baker points out the potential of corpora in theoretical and pedagogical areas of translation studies and, after noting how corpora are already firmly established as a basis for new developments in terminology and machine translation, devotes some space to defining and explaining the main terms and concepts of corpus linguistics. Specifically, after introducing corpora, concordances and basic statistical counts, she proposes a number of criteria for classifying different types of corpora for translation studies purposes. In addition to multilingual corpora and parallel corpora, the latter including translations together with their source texts, she introduces a new variety, which she terms a comparable corpus, consisting of two separate collections of texts in the same language: one composed of original texts and one consisting of texts translated into that language. The two corpus components must be comparable according to a set of criteria. This is the type of corpus which, Baker argues, can support research into norms and distinctive features of translation.
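To make the notion of a concordance concrete, the following Python sketch produces keyword-in-context (KWIC) lines, the basic display offered by concordancers. It is a minimal illustration under stated assumptions: the function name `kwic`, the fixed character window and the regular-expression matching are choices made here for the example, not features of any particular corpus tool.

```python
import re

def kwic(text, node, width=30):
    """Return keyword-in-context (KWIC) lines for each occurrence of `node`.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the node word.
    """
    # Collapse line breaks and runs of spaces so each context is one line
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on either side of the node word
        left = text[max(0, match.start() - width):match.start()].rstrip()
        right = text[match.end():match.end() + width].lstrip()
        # Right-align the left context so node words line up in one column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "He said that she left. That was all."
    for line in kwic(sample, "that", width=10):
        print(line)
```

Run in turn on a translated and a non-translated subcorpus, the aligned lines make it easier to compare how the same word behaves in each.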
Having described corpus composition, Baker provides examples of investigations which could be carried out to test hypothesized features. Thus, a higher frequency of the optional ‘that’ in reported structures in translated English texts, compared with a corpus of non-translated English texts, may provide support for the explicitation hypothesis, while a lower type/token ratio and lower lexical density, i.e. ‘the percentage of lexical as opposed to grammatical items’ (1995: 237), may be seen as supporting the simplification hypothesis.
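The measures mentioned here can be computed with a few lines of code. The Python sketch below is illustrative only: the short list of grammatical words is a hypothetical stand-in for the closed-class word list or part-of-speech tagging a real study would use, and `that_after_say` is a crude proxy for optional ‘that’, which only checks whether a form of SAY is immediately followed by ‘that’ and ignores cases such as ‘told him that’.

```python
import re

# Hypothetical, deliberately tiny list of grammatical (function) words.
# A real study would use a part-of-speech tagger or a fuller closed-class list.
GRAMMATICAL_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "that", "this", "it", "he", "she", "they",
    "we", "you", "i", "is", "was", "are", "were", "be", "been", "has",
    "have", "had", "not", "as", "his", "her", "their",
}

# Forms of the reporting verb SAY used by the crude optional-'that' proxy
SAY_FORMS = {"say", "says", "said", "saying"}

def tokenize(text):
    """Lower-case word tokens; an apostrophe is kept inside a word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_density(tokens):
    """Share of tokens that are lexical rather than grammatical items."""
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in GRAMMATICAL_WORDS]
    return len(lexical) / len(tokens)

def that_after_say(tokens):
    """Proportion of SAY forms immediately followed by 'that' (crude proxy)."""
    say_positions = [i for i, t in enumerate(tokens) if t in SAY_FORMS]
    if not say_positions:
        return 0.0
    with_that = sum(1 for i in say_positions
                    if i + 1 < len(tokens) and tokens[i + 1] == "that")
    return with_that / len(say_positions)
```

Because the raw type/token ratio falls as texts get longer, comparisons between translated and non-translated texts are only meaningful on samples of equal size, or with a standardized ratio averaged over fixed-size chunks.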
Aspects of corpus creation and of features to be investigated are discussed further in the 1996 essay. As far as corpus resources are concerned, Baker focuses on the creation of a comparable monolingual English corpus. She describes the specific parameters being implemented and documented in what became known as the Translational English Corpus (TEC), and its non-translational counterpart, assembled by selecting a comparable subcorpus of English non-translated texts. Baker stresses that the careful selection and documentation of texts belonging to different text types, from different languages and by different authors and translators, would allow scholars not only to search for assumed distinctive features of translation, but also to focus on and make sense of atypical patterns and unusual examples. While in this article Baker brings into sharper focus four possible specific features of translated texts per se, namely explicitation, simplification, normalization/conservatism and levelling out, she is well aware that these are rather vague and abstract concepts, and that the real challenge consists in devising techniques for isolating the surface expressions that constitute concrete manifestations of such higher-level features. She suggests, for instance, that one of the features which could be seen as pointing to translated texts being more explicit than non-translated texts is the overuse of explanatory vocabulary and conjunctions. To detect whether translated texts are simpler and easier to read than non-translated texts, one could instead look not only at simple statistical features such as type/token ratio, lexical density and average sentence length, but also at the use of punctuation. As far as normalization/conservatism is concerned, she suggests taking into consideration the use of unmarked grammatical structures, punctuation and collocational patterns, which might be more frequent in translated texts than in non-translated texts.
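Two of the further simplification indicators mentioned, average sentence length and the use of punctuation, can be sketched in the same way. The Python functions below are hypothetical helpers that assume a naive sentence splitter (a full stop, question mark or exclamation mark followed by whitespace) which does not handle abbreviations; the set of punctuation marks counted is likewise an arbitrary assumption for illustration.

```python
import re

# Word pattern: letters, optionally with one internal apostrophe
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sents = sentences(text)
    if not sents:
        return 0.0
    counts = [len(WORD.findall(s)) for s in sents]
    return sum(counts) / len(sents)

def punctuation_profile(text, marks=",;:"):
    """Occurrences of each punctuation mark per 1,000 word tokens."""
    words = WORD.findall(text)
    if not words:
        return {m: 0.0 for m in marks}
    return {m: 1000 * text.count(m) / len(words) for m in marks}
```

Normalizing per 1,000 words lets punctuation counts be compared across subcorpora of different sizes.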
The fourth feature discussed, levelling out, would be confirmed if translated texts were found to be more homogeneous and similar to each other than non-translated texts are among themselves, so that by looking at statistical features such as lexical density and type/token ratio it could be shown that âtranslation tends to pull various textual features towards the centre, to move away from extremesâ (1996: 184).
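Levelling out concerns the spread of scores across texts rather than their average. A minimal Python sketch, assuming one score per text (for instance lexical density) has already been computed for each subcorpus, compares the coefficient of variation, i.e. standard deviation divided by mean, of the two sets of scores; the function names are hypothetical.

```python
import statistics

def dispersion(values):
    """Sample standard deviation and coefficient of variation of per-text scores."""
    sd = statistics.stdev(values)  # needs at least two values
    mean = statistics.mean(values)
    return sd, (sd / mean if mean else float("nan"))

def levelling_out(translated_scores, original_scores):
    """True if translated texts cluster more tightly (lower CV) than originals.

    Each argument is a list holding one statistic (e.g. lexical density
    or type/token ratio) per text in the translated or non-translated
    subcorpus.
    """
    _, cv_translated = dispersion(translated_scores)
    _, cv_original = dispersion(original_scores)
    return cv_translated < cv_original
```

A lower coefficient of variation in the translated subcorpus is only descriptive evidence; a fuller study would also test whether the difference in variance is statistically significant.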
While these first articles projected possible investigations at a time when little suitable corpus data was available, the article published in 2000 takes stock of the Translational English Corpus (TEC), whose construction was by then well under way in Manchester, and uses a comparable subcorpus of texts extracted from the British National Corpus (BNC) as a reference point. This time, however, Baker explores a different path, providing tentative examples of how such a corpus resource can be used to investigate individual style in literary translation, rather than norms and distinctive features per se. Baker argues that while the concept of style in translation had usually been treated as a matter of quality assessment, that is, as an evaluation of how well a translation renders the style of the original, style of translation can be described as referring to the characteristic use of language by individual translators, their ‘profile of linguistic habits’ (2000: 246) as manifested in often subconscious ‘recurring patterns of linguistic behaviour’ (ibid.), as compared to other translators. Baker suggests, however, that the concept of style of translation should not be restricted to differences in linguistic patterning, which may also be carried over, at least partly, from the source language and author, and that the very choice of themes, literary genres and texts could be included in the notion of (literary) translator style. Baker highlights how, while the study of corpora may appear to be driven by exclus...