1 Developing multiple language versions of instruments for intercultural research
Sumru Erkut
Scientific and ethical concerns
The scientific issue in translating instruments is avoiding one of the threats to internal validity, commonly referred to as “instrumentation” (Campbell & Stanley, 1963; Shadish, Cook, & Campbell, 2002). The instrumentation threat can occur if different respondents receive a different version of the measure, making it invalid to infer that differences in answers are owing to the respondents’ characteristics because the differences in the versions of the instrument are a viable alternative explanation. Attempts to achieve equivalence in different language versions of a measure address this instrumentation threat. For example, if a vocabulary test for young children were to use the word hair in the English version and translate it as cabello into Spanish for use with Puerto Rican children, the two language versions would have different levels of difficulty, thereby introducing an instrumentation threat. This is because whereas hair is widely used in English, the average Puerto Rican refers to hair as pelo. Cabello is more common among well-educated Puerto Ricans, which would raise concerns for a social class bias as well, an ethical concern if the vocabulary test were to be used to place children in academic tracks.
Taking on the scientific challenge translation poses to validity, Peña (2007) has framed the ethical issue in terms of fairness. The American Educational Research Association’s (1999) definitions of fairness, articulated in Standards for Educational and Psychological Testing, include the notion of equal treatment in context and purpose of testing and comparable opportunity for all undergoing testing to demonstrate their abilities on the construct the test is intended to measure. If the translated version is different, scores on the test are not comparable across different language versions and using such scores to make educational decisions violates the principles of equal treatment and comparable opportunity. Arguing that it is possible to apply these principles to intercultural research, Peña has made an important contribution to the developmental literature on translating instruments into languages other than English. She draws attention to the need to go beyond linguistic equivalence to include functional equivalence, cultural equivalence, and metric equivalence to improve internal validity.
I will not review Peña’s (2007) work but invite readers to read the original. Rather, I will expand on the fairness framework to encompass an ethical concern for avoiding a cultural bias when producing multiple language versions of instruments for intercultural research.
Potential for Western bias
There is a potential for bias when researchers from one language or culture group wish to measure some aspect of the psychological development of the members of a different group by using a translation of an instrument developed in the researchers’ culture. Foucault (1975) commented on the “clinical gaze” to draw attention to power differences between the observer and observed, whereby the clinician views the observed through a lens that reflects the history, norms, and economic circumstances of the observer’s culture. Gergen, Gülerce, Lock, and Misra (1996) have argued that when Western concepts and methods guide research, the resulting product can be of little relevance to other cultures and may disregard and undermine alternate cultural traditions. Greenfield (1994) explained this phenomenon as the product of psychologists’ familiarity with their own culture when she argued that psychologists tend to base their intercultural research on an implicit understanding of the culture in which they grew up. Rogler (1999) has suggested that these unexamined insiders’ perspectives often become the basis for norms, in that they can set the standard for what is studied in other culture groups and how it is studied.
I contend that to minimize the influence of Western perspectives, generating multiple language versions of an instrument can begin with examining the motivation for the research. Researchers need to be able to answer why they are pursuing their research goals. It may be easy to answer the “why” question with, “We want to compare …,” but I urge caution with research questions and hypotheses that can lead to invidious comparisons, which have been the fodder of the much criticized deficiency model (e.g., García Coll & Magnuson, 1997; Kağıtçıbaşı, 2007). The deficiency model refers to studies whose results “explain” why members of less powerful culture groups (such as Third World societies, indigenous populations of First World societies, immigrants, and minorities) are deficient in some aspect of growth and development. Paraphrasing Foucault (1975), when we privilege one culture or language as the source and the other as the target, we give primacy to the source culture’s history, norms, and economic circumstances. One useful heuristic for not falling into an unintended deficiency paradigm is to ask what aspects of development members of the “other” culture group would deem important to study. Psychologists from other cultures rarely if ever “study” North Americans. Consider an example from the practice of age mixing in education, which is more widespread in Russia than in the United States. I believe there would be resistance if Russian psychologists came to the United States with an English translation of their instruments for a comparative study of the impact of younger students learning from older students. As with the deficiency model, people might feel this is a setup to highlight the superiority of a Russian pedagogical practice.
Horizontal collaboration
What is a conscientious researcher to do? It is not a trivial matter that financial resources for research reside mostly in the West (Moghaddam, 1987), and within Western societies they are more available to members of the educated elite of the dominant White culture group. The answer can be found in nonhierarchical “horizontal collaboration,” which Sinha (1984) proposed to manage one culture’s domination of the others.
Horizontal collaboration requires researchers from each culture and language group to come together and jointly decide what constructs to research. Indigenous coleaders, who are full members of the team, can provide a safeguard against the unexamined exportation of ideas and methods because people from the cultures under study take a leading role in defining the goals and methods of the study. It is important for the collaborative research team to examine the constructs underlying the instruments to be translated. Although most discussions of translating instruments focus on item wording to achieve equivalence, a more fundamental concern is whether their conceptual foundations have comparable relevance to development in the cultures under study. Substantive problems can occur as a result of an unexamined transfer of constructs and concepts from one culture and language system to another. For example, the Japanese and Western constructions of the “self” are not strictly comparable (DeVos, 1985). This need to focus on constructs is reinforced by the long tradition in psychometrics that gives constructs a primary role in validity studies (Campbell & Fiske, 1959). If a serious examination of the constructs in the cultures to be studied reveals that the underlying concepts are not equivalent, the research need not be abandoned. Rather, the research questions can be revised. In such cases, a worthy research question may be what social, cultural, and physical environmental conditions have given rise to different conceptualizations in the different language groups.
A comparison of methods for generating multiple language versions of instruments
Direct, one-way translation is the most basic approach, but it is not recommended as a technique for translating instruments; I do not include it in Table 1.1, which presents the characteristics of alternative translation techniques.
Back translation
When researchers want to go beyond direct translations, back translation (Brislin, 1970, 1986) is currently the most widely used technique. The back translation method works as follows: To create a Portuguese version of a measure originally developed in English, one person (or a team of translators) translates from English into Portuguese, and a different person (or a team of translators) translates from Portuguese back into English. It is recommended to use several iterations of back translation until the last back translation matches the source language. Because the translation centers on the source language, which remains unchanged, this approach is most appropriate for translating established instruments that have a long history of use in the source language. Back translation has had its detractors (see Bontempo, 1993; Olmedo, 1981). Maxwell (1996) provides a compelling example of the potential pitfalls of relying solely on back translation in the following item on a science test:
Back translation’s main weaknesses include the absence of input from researchers knowledgeable about the subject matter, the lack of provisions for examining whether underlying constructs are equivalent in the cultures being studied, and the failure to consider how the potential for bias interacts with scientific issues.
Back translation with decentering
This technique begins with back translation from the source to the target language and back. Discrepancies between the source and back-translated versions are dealt with through “decentering.” The instrument is decentered or moved away from the idiosyncrasies of the source language by subjecting both the source and target language versions to modification through a process of several iterations (Werner & Campbell, 1970). One example of decentering is the hypothetical item, “able to meet deadlines,” from a hypothetical measure of attentional processes. In Turkish, a literal translation would be “ölüm çizgisi ile buluşma yeteneğine sahiptir.” This can be back translated as “has the ability to get together with the line of death,” indicating a serious need for decentering. The appropriate rendering in Turkish requires specifying what the deadline is for. Is it homework, a job, or a task? If it is homework, “able to meet deadlines” can be approximated in Turkish with a phrase that back translates as “finishes homework on time.” At this point in the decentering process, the translators become aware that the Turkish version has dropped “able to.” They debate whether “finishes on time” has the same meaning as “able to meet deadlines.” The next iteration might be to add words to the Turkish version to recapture “able to.” They can try, “Ödevini vaktinde bitirebilme yeteneğine sahiptir.” This rendition back translates into “has the ability to finish homework on time.” Although grammatically correct, the Turkish version is awkward. Translators may experiment with a different wording that back translates into “always finishes homework on time,” and the iterations will continue until the translators are satisfied. When they are satisfied, we have a case where the idiosyncrasies of both the source and target languages have led to changes in the other.
Compared with back translation, decentering is more likely to yield functionally equivalent instruments. It is better suited to translating new instruments: because decentering modifies the source version as well as the target, the technique is unsuitable for established measures when researchers feel compelled to preserve the original wording in the source language. It shares with the back translation method the absence of provisions for input from bilingual e...