1
Evaluation:
For Public Good or Professional Power?
Jan-Eric Furubo and Ove Karlsson Vestman
“The definition of the alternatives is the supreme instrument of power”
E. E. Schattschneider, 1960
Evaluation has come of age. Most of us, not only evaluators, would probably have some difficulty imagining a society in which evaluation is not used on a daily basis, everywhere from school to work, and from local authorities to parliamentary committees. Evaluation is also regarded generally as something for the public good. While university researchers or public servants might think that there just might be too many types of evaluation, rankings and reviews, evaluation is nonetheless viewed positively, and evaluators are seen to be among the “good guys.” One way or another, evaluation is perceived as a tool for improvement, or in the words of one of the most influential theorists: “Evaluators are dedicated to using their knowledge for the benefit of society” (Hirschon Weiss, 2004: 153).
At the same time, evaluation has an intrinsic relationship to something more ambiguous and even ominous: power. Many evaluation practitioners, and certainly scholars in the field, have been very much aware of this relationship. For many, a power relationship actually underpins the rationale for evaluation. Evaluators often say, alluding to Wildavsky’s famous book, that evaluation must Speak Truth to Power (Wildavsky, 1979).
It is also said that evaluation can change power relations in society and give power to powerless groups (House & Howe, 1999). Evaluation can bring knowledge to others, and knowledge is important for the exercise of power. It can change how power is allocated in a range of relationships and settings. This is an important but not very new idea. After all, it is exactly what Francis Bacon meant 400 years ago when he said that knowledge is power.
While this perspective informs this book, we have added a second one, namely the degree to which evaluation seeks power for its own interests. This perspective is based on a simple assumption: If you are in possession of an asset that can give you power, why not use it for your own interests? In his discussion in this volume, Peter Dahler-Larsen expresses it very bluntly: “If institutions are interested in protecting and expanding their resources, autonomy, and reputation, so are evaluation institutions. This logic will surprise no political observer of institutions.”
Can we really trust evaluation to be a force for the good? To what degree can we talk about self-interest in evaluation, and is this self-interest something that contradicts other interests such as “the benefit of society”? If so, what can be done to control this?
It is understandable that some readers will question the necessity of such critical questions about evaluation. We think two answers justify this endeavor. But first, we have to address a more fundamental question about what we actually mean by “evaluation.”
What Do We Mean by Evaluation?
Every reader is probably aware that there is no one authoritative definition of evaluation. It is therefore impossible to state in a precise manner which activities can be called evaluation and which cannot. It is also obvious that the term “evaluation” becomes more elastic the longer it is used, partly because the word itself is loaded with positive connotations that make it advantageous to use it in many situations. So, we find ourselves in a situation similar to that of Michael Power in The Audit Society, where he states that the
study of auditing is actually unable to be precise about what it is talking about. However, and this is the point, it is precisely this fuzziness in the idea of auditing that enables its migration and importation into a wide variety of organizational context. The ambiguity of auditing is not a methodological fact but it is a substantive fact (Power, 1997: 6).
A similar observation can be made about evaluation more generally. This is true not only for audit and evaluation, but also for many other phenomena, and is more evident when discussing evaluation in an international context. Due to varying historical and institutional circumstances, evaluation has different connotations in different countries. In many nations, evaluation is regarded as a retrospective analysis, a proposition with which we tend to agree, but in the European Union, ex ante analysis is also called “evaluation” (Summa & Toulemonde, 2002: 410). Chelimsky wrote in 1995 that “we now evaluate both ex ante and ex post” (Chelimsky, 1995: 14).
Most evaluators also acknowledge that we have to draw borderlines between evaluation and other forms of intellectual inquiry. For example, descriptive statistics as such cannot be labeled as evaluation. Statistics, which in many countries have been produced for more than 200 years, can be an important part of evaluation and often have fundamental importance for how we conceive society and how we define social problems. But statistics about housing standards, or the number of students in different education systems and so on, are not regarded as evaluation. At the same time, such lines are indistinct. Many would regard an effort to monitor what actually happens after, let us say, a governmental intervention in the health sector as an evaluation, even if explanatory and causal questions are absent.
A similarly tricky question arises in relation to statements about empirical situations or about causal relations, and whether these have to be the result of the application of certain (scientific) methods in order to be called “evaluation.” A statement about a social problem does not become an evaluation merely because it is linked to an assessment as to whether a policy of the present government is stupid or brilliant. In other words, evaluation has to be a carefully conducted assessment of the merit, worth, and value of the evaluand (Vedung, 1997: 3). When we state that something is an evaluation, we imply some sort of quality in the evaluative product and in the evaluative process. But there is a danger that we will be trapped in an overly rigid position by saying that evaluation should be conducted according to a set of specific criteria that reflect certain epistemological perspectives or methods. Such a position implies that the answer to the question of whether a study is an evaluation or not depends on the choice of method. We think this is too narrow a perspective.
In short, evaluation today, in both praxis and discourse, is very diverse. At the same time, overly vague use of the term makes evaluation more or less synonymous with all forms of knowledge production emanating from academic institutions, consultant firms and so on. While the activities undertaken by the latter groups certainly can be described as systematic and careful endeavors, there is an additional salient feature of evaluation, namely that it is an investigative activity conducted in relation to something which reflects a particular kind of purpose and intention. The following examples illustrate the point.
A researcher can be interested in studying which factors will influence reading skills without any relation to governmental interventions, existing or planned. In such studies, the researcher can ask questions about interactions between parts of the brain, movements of the eyes, and so on. The fact that a study can be used or is intended to be used in the construction of different governmental interventions does not mean that such studies are labeled as “evaluation.” The motive for a certain study can be pure curiosity about social, psychological or physical mechanisms. But the motive can also be the search for knowledge that can be used in a political context.
One can study many things that explain differences in reading skills without thinking of interventions to improve these skills, but the purpose can also be to build knowledge that can then be used in the design of future interventions. It is possible to study the relation between obedience, authority and the willingness to hurt others in different experiments without any intention to use this research in the construction of an intervention aimed at reducing bullying in schools. However, irrespective of intention, such studies can be useful in a discussion of how we can influence behavior like bullying. This usage does not mean that we regard as “evaluation” the studies of different social and psychological phenomena such as Durkheim’s study of suicide, or Milgram’s classical study of obedience, or Rosenthal & Jacobson’s study of the importance of the expectations of teachers (Durkheim, 1979; Milgram, 1963; Rosenthal & Jacobson, 1968). However, evaluations which are motivated by a desire to build knowledge about the effects and causal mechanisms of a certain intervention can ask similar questions.
To summarize, evaluation:
- has to be something more than description; it has to involve some form of explanation and judgment;
- has to be conducted carefully;
- is related to some form of intervention or action which reflects purpose and intention.
Of course, purpose and intention can have different moral values. Evaluative techniques could have been applied in political settings such as the Third Reich, had they existed at the time. But based on a study of the development of evaluation, we agree with Weiss and many others that evaluators are dedicated to using their knowledge for the benefit of society. Or as Mark, Henry and Julnes (2000) put it:
The raison d’être of evaluation (...) is to contribute indirectly to social betterment by providing assisted sense-making to the democratic institutions that are directly charged with defining and seeking that betterment (p. 7).
And we also believe that evaluation can be conducted only if society permits an open discussion about goals and means. Only in such an environment can evaluation have a meaningful role in questions about the right thing to do. And as Peter Dahler-Larsen points out in his chapter, evaluation can, if it works according to its best ideals, help to create transparency about public institutions and interventions, and thus pave the way for their improvement.
Evaluation is more than the producer of briefing material about how we can do things the right way.1
Why So Self-Critical?
The critical questions about evaluation raised earlier are important because the actual praxis of evaluation does not always fulfill the moral and ethical ambitions of social betterment or doing good. It is understandable that some readers will question the necessity of such critical questions about evaluation. We think that we can give two answers that justify this endeavor.
Quantity and Institutionalization
Our first answer has to do with the fact that evaluation today is a well-established, systematic and institutionalized public management instrument. It is integrated into budgeting systems, accounting and reporting procedures, and into appropriate audit mechanisms. Indeed, in many countries it is today hard to imagine any serious public policy effort being undertaken without evaluation and a stream of evaluative information. Evidence for this assertion can be ascertained easily by looking at the proliferation of evaluations and evaluation bodies, the size of evaluation budgets and the number of evaluations undertaken. These developments have been described in the International Atlas of Evaluation (Furubo, Sandahl & Rist, 2002) and in From Studies to Streams (Rist & Stame, 2006).
Even so, the lack of more precise figures about what we spend on evaluation at the local/regional, national, and international levels is striking. The International Atlas of Evaluation, noted earlier, mentions the number of studies and the actual costs of evaluation, but the picture is very fragmentary. In 2007, Kim Forss made a very rough estimate of how much was spent on evaluation on national and local levels in Sweden, a country with nine million inhabitants: The cost was about 800 million euros (Forss, 2007). To put this into perspective and to compare it with other forms of knowledge production, this would pay the salaries of 5,000-plus professors of economics, political science and behavioral sciences.
But despite the lack of precise data (a problem in itself), we are on safe ground when we state that much more attention is paid to evaluation and to evaluative information more generally than was previously the case. This is principally due to the role that the European Union (EU) has played as an entrepreneur for evaluation. The World Bank, the Organisation for Economic Co-operation and Development (OECD) and other institutions have played similar roles in many other countries. In Canada and the United States, federal governments have adopted performance measurement and “results-based management” as approaches to more accountable government, resulting in higher demand for evaluation. Bouckaert & Halligan show in their comparative analysis Managing Performance that the praxis of measuring performance has become more extensive but also more intensive (Bouckaert & Halligan, 2008).
The number of international and national evaluation societies has boomed. Twenty years ago, the American Evaluation Association and the Canadian Evaluation Society dominated the scene. Since then, new societies have been formed, including the European Evaluation Society, as well as the Australian, U.K., and Danish evaluation societies, to name a few.
Both the sheer quantity of evaluation and its “embeddedness” in administrative structures, management systems, budget processes and so on, raise questions about its actual role and consequences in terms of power. When we today talk about evaluation we are not speaking about a few research reports, but about a fundamental praxis in many societies and governance structures. So even though evaluation has long been part of the political game, the reason to discuss the relationship to power has become much more pressing today.
A Moral Obligation
Our second answer is that evaluators, like other professionals, tend to be least critical of their own work. As evaluators, we are in the business of asking fundamental questions about what others do, both in terms of theory and implementation. It is important that we raise similar questions about what we ourselves do.
It may even be a moral obligation to critique our own professional endeavors. Evaluation brings a critical perspective to many other fields, such as education, regional development, agriculture, international aid, health and so on, acting as a devil’s advocate. But the discourse in the field of evaluation is only developed by evaluators and evaluation scholars themselves. So we should “do unto ourselves what we would do unto others.”
The importance of this is further underlined in Dahler-Larsen’s discussion, with reference to Luhmann and Lindeberg: we are now witnessing a development where evaluation institutions are evaluated by evaluation institutions, which are, in turn, evaluated by evaluation institutions. Evaluation becomes self-referential, but that does not necessarily mean that the evaluation is perceived as legitimate by others in society.
If evaluation, as a form of social praxis, wants to be regarded as an emerging profession, there is a need to ask questions that deepen our understanding of evaluation’s own role. Current developments in the field of evaluation show that the actual praxis of evaluation is increasingly embedded in policy development and public administration: evaluators themselves risk becoming part of the very power structures they are examining.
In this discussion, it is possible to move between different levels. On one level evaluation is a social phenomenon, like education or the media. A second level is the organization that commissions or conducts the evaluation, or that builds evaluation systems. A third level is the individual evaluator. Ethical codes are basically focused on the third level and address the individual evaluator and his or her work. But in a way, the two other levels can also be discussed in terms of individual responsibilities. The notion of ethical dilemmas, or the unintended effects of a social praxis or a system, cannot be observed and discussed by the social praxis or system itself. These phenomena can only be observed and discussed by individuals, who are not absolved from responsibility simply because a problem is caused at a...