eBook - ePub

High-Stakes Testing in Education

ISBN: 9781317682127

Value, fairness and consequences

Theo J.H.M. Eggen, Gordon Stobart

144 pages
English

About this book

High-stakes educational testing is a global phenomenon which is increasing in both scale and importance. Assessments are high-stakes when there are serious consequences for one or more stakeholders. Historically, tests have largely been used for selection or for providing a 'licence to practise', making them high-stakes for the test takers. Testing is now also used for the purposes of improving standards of teaching and learning and of holding schools accountable for their students' results. These tests then become high-stakes for teachers and schools, especially when they have to meet externally imposed targets. More recent has been the emergence of international comparative testing, which has become high-stakes for governments and policy makers as their education systems are judged in relation to the performances of other countries.

In this book we draw on research which examines each of these uses of high-stakes testing. The articles evaluate the impact of such assessments and explore the issues of value and fairness which they raise. To underline the international appeal of high-stakes testing the studies are drawn from Australia, Africa, the Caribbean, Europe, former Soviet republics and North America. Collectively they illustrate the power of high-stakes assessment in shaping, for better or for worse, policy making and schooling.

This book was originally published as a special issue of Assessment in Education: Principles, Policy & Practice.

High-stakes testing – value, fairness and consequences

High-stakes testing has been with us for over two thousand years and is steadily increasing both in scale and range. This special issue considers some of the main uses of these tests (a term used loosely to cover many forms of assessment) and their impact. Tests become ‘high-stakes’ when the results lead to serious consequences for at least one key stakeholder. These consequences could be educational or occupational life chances for individual candidates. This is the case when, for example, testing is used for selection in education and training or to gain credentials that provide ‘a licence to practise’. Where tests are used for accountability purposes in evaluating performance and to determine whether targets have been met, they become high-stakes for institutions such as schools and colleges, especially if they affect funding and recruitment. Current international comparisons, for instance the Programme for International Student Assessment (PISA), have introduced a new high-stakes phenomenon – tests that are low-stakes for the individuals taking them and for their schools but high-stakes for politicians, policy makers and governments.

While assessments are often used for multiple purposes (Newton 2007), this special issue focuses on three main high-stakes uses: selection and placement, raising standards and accountability. It provides case studies from a range of countries on the impact of high-stakes tests, many of them recently introduced to make selection fairer or to improve educational standards and accountability. Alongside these studies we include two papers which look at technical aspects, one examining models of validation in complex high-stakes assessment contexts and the other examining the reliability of the awarding and certification process in a high-stakes school graduation diploma.

Selection, placement and certification

Historically these have been the original, and dominant, uses of high-stakes assessment. Much of the selection was for occupational progression. Its origins are found in the Chinese Imperial Examination System for civil service selection, which goes back to the competitive examinations during the Han dynasty two thousand years ago (see Carless 2011, for a fuller account). At the heart of this was the attempt to provide fairer selection for schooling, government positions and the military than the nepotism that prevailed (though women and many males in manual work were excluded). By the time of the Ming dynasty (1368–1644 AD) the examination system was fully developed and extremely rigorous, including being locked in an examination cell for three days, the candidates’ scripts being copied so that handwriting would not be recognised, and double marking.

As with all competitive assessments in which the results affect life chances, there is the risk that candidates will seek to gain advantages over others. Like its modern counterparts, the Chinese system had to reckon with the development of a coaching industry, which would advantage the more affluent, and with cheating. The predictability of the examinations made them a target for smuggling in answers, countered by robust body searches. The cells were exposed so that behaviour could be observed (and the rain come in), but still there were examples of the use of tunnelling. Collusion between test-takers and officials led to severe penalties (in 1657 it led to seven officials being given the death sentence).

We recount these examples because these issues remain with us today and feature in several of the articles. George Bethell and Algirdas Zabulionis provide us with a fascinating account of how high-stakes university selection examinations have been introduced in former Soviet Republics to make selection fairer and to move away from political nepotism. In some of these countries corruption remains part of the social context, so what steps have to be taken to keep the process fair and reliable? What we find are modern equivalents of the Chinese system, with CCTV replacing surveillance towers, examiners being kept away from the public until the exam is taken and scripts being marked anonymously through the use of sophisticated digital technology. The stakes in all this are extremely high: Bethell and Zabulionis observe that ‘a single mark can make the difference between, for example, a university place and a year in military service’ (p. 17), and so security and reliability become paramount – sometimes with costs to construct validity.

Iasonas Lamprianou’s country profile of Cyprus focuses on the unintended consequences of the rapid implementation of a new high-stakes university selection examination, which was introduced at short notice because of a European Union decision. When policy makers sought to combine it with the school graduation examination, so that it had a dual purpose, a number of unintended consequences followed. Because of its high-stakes selection role the new examination has become a very public, and political, concern. Lamprianou also examines how the intention to reduce the influence of the private exam preparation industry, which has traditionally advantaged the more prosperous, has fared.

A third case study of the impact of high-stakes selection examinations is Jerome De Lisle, Peter Smith, Carol Keller and Vena Jules’ analysis of the outcomes of the 11+ selection examinations in Trinidad and Tobago. In terms of life chances, the selection examination for secondary schooling could historically claim to carry the highest stakes for individuals. When secondary schooling was rationed it meant the difference between finishing formal education and gaining all the opportunities that came with additional schooling. In an age of universal secondary education its role in many countries is placement as it may determine which educational track students will enter or at which school they get accepted. These are powerful drivers across the world, with parents desperate to get their children into prestigious schools. Again this has led to a coaching industry as parents seek to maximise their children’s chances.

The fairness and reliability of secondary selection tests have always been a concern (Gardner and Cowan 2005) given the impact on life chances. This is particularly the case when the outcome rests on a single result from a single assessment. Bourdieu (1991) observed: ‘between the last person to pass and the first person to fail, the competitive examination creates differences of all or nothing that can last a lifetime’ (120). De Lisle et al.’s analysis looks at some of the equity issues that could undermine the validity of the 11+ assessment, focusing on gender, geography and assessment design. Assessment design covers what is assessed and how changing this, to improve equity, has social consequences.

Setting and raising standards

The use of assessments to evaluate and improve the performance of schools, colleges and training institutions is a widely recognised, and very public, purpose. While it may have a very contemporary feel, there are plenty of historical precedents (see Stobart 2008). Twice-yearly written examinations were introduced at Cambridge University at the end of the eighteenth century to improve the performance of its students. The use of external written examinations to raise standards then percolated down to schools, leading in England to the development of the university examination boards which set school examinations and used them for selection as admissions to university became more open.

The use of high-stakes testing for school accountability is exemplified in the United States through the No Child Left Behind (NCLB) legislation, with its financial consequences for schools and teachers. This ‘incentive’ approach is not without its history. ‘Payment-by-results’ was introduced in England through the 1862 Revised Code at a time of increasing demand for elementary schooling. The scheme introduced grants for schools, which directly affected teachers’ salaries, based on the performance of pupils in reading, writing and arithmetic tests. The assessments were conducted by visiting school inspectors. Like the current NCLB legislation, the intentions were good; teachers would have to prepare all their pupils, not just favour the higher-achieving ones.

As with other high-stakes tests, the consequences of payment-by-results, a scheme that continued for 30 years, were mixed. The main negative impact was how it affected teaching and learning, which soon became focused on drilling for the tests. In a scathing indictment of its effects on learning a Chief Inspector commented:

The children ...were drilled in the contents of those books until they knew them almost by heart. In arithmetic they worked abstract sums, in obedience to formal rules, day after day, and month after month; and they were put up to various tricks and dodges which would, it was hoped, enable them to know by what precise rules the various questions on the arithmetic card were to be answered... Not a thought was given, except in a small minority of schools, to the real training of the child, to the fostering of his mental (and other) growth. To get him through the yearly examination by hook or by crook was the one concern of the teacher... To leave a child to find out anything for himself, to work out anything for himself, would have been regarded as proof of incapacity, not to say insanity, on the part of the teacher, and would have led to results which, from the ‘percentage’ point of view, would probably have been disastrous.

(Holmes 1911, 107–8)

Research into current accountability testing suggests that similar risks are still with us. While there may be positive consequences in terms of teachers working harder and more effectively to cover more material (Koretz, McCaffrey, and Hamilton 2001), this may also restrict the curriculum to those subjects that will be tested (Boyle and Bragg 2006) with an emphasis on coaching to the test (Beverton et al. 2005). The most negative consequence would be cheating, either through directly aiding students or through ‘playing the system’ by manipulating entries, for example by retaining students in the year below the test year or encouraging them to drop out (Hursh 2005).

A recent major US review conducted by the National Academies of Sciences (Hout and Elliot 2011) on the impact of incentives and test-based accountability in education reports similar mixed benefits. Their main conclusion is:

(1) Test-based incentive programs ... have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries. When evaluated using relevant low-stakes tests, which are less likely to be inflated by the incentives themselves, the overall effects on achievement tend to be small and are effectively zero for a number of programs.

(S-3)

Despite this kind of evidence, policy makers are still drawn to high-stakes tests which offer relatively simple accountability measures and allow comparisons to be made between schools and local administrations. One such recent development is the introduction of national tests in Australia, a country which previously has only had state-based assessment. Val Klenowski and Claire Wyatt-Smith provide a telling account of the impact of the introduction of the National Assessment Program – Literacy and Numeracy (NAPLAN), which includes school ‘league tables’ which have not previously been available. Given what we know about the consequences of high-stakes accountability testing, has Australia learned any lessons that would help to mitigate negative impacts?

When low-stakes become high-stakes: the impact of international comparative studies

A more recent high-stakes phenomenon has been the way in which international comparative studies such as the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS) and PISA have become high-stakes. Here the consequences are not for the students or schools, since the sampling methodologies mean no direct consequences for them as they are not identified in the results. The consequences are for politicians and policy makers who have to respond to their country’s position in the league tables that become the focus of public and policy concern. This is particularly the case when a country does worse than expected, either by sliding down the league table or by doing worse than neighbouring countries. An example of this would be the ‘PISA Shock’ experienced by Norway in the 2000 PISA study. When a country that has one of the highest per capita investments in education scored below the PISA average and was ranked below its fellow Scandinavian countries, there were extensive political repercussions as the opposition seized on these findings (Baird et al. 2011). This also led to a programme of educational and assessment reform, as it has in other countries, for example Germany and Denmark.

Sarah Howie’s study of the impact of South Africa participating, for the first time and as the first African country, in the 1995 TIMSS study and then, as one of two African countries, in the 2006 PIRLS study provides a powerful case study of the impact of taking part in such studies. As a developing nation seeking to overcome its historical legacy, this was a brave commitment to monitor standards. The poor results had a major political and policy impact, as did deciding not to take part in the 2007 TIMSS study. It has led to the setting up of other national monitoring approaches that were seen as more constructive in how they reflect what is very slow progress in the struggle to raise standards.

The quality of high-stakes testing

The consequences of high-stakes testing mean that the testing instruments, the awarding procedures and the interpretation of the results have to be of the highest quality. We see in several of the articles how those responsible for the tests seek to make them as fair as possible for the candidates. The task is to optimise what can be assessed (construct validity), how best it can be assessed (fitness-for-purpose) and how reliability can best be ensured. These raise important theoretical and technical issues.

Two of the articles directly address the theoretical issues around validity and reliability in high-stakes assessments. Martha Koch and Christopher DeLuca argue that the multiple purposes that high-stakes tests are used for require a re-thinking of conventional validity theorising. Much validity theorising focuses on the use of a single instrument in relation to a specific purpose (for example, Crooks, Kane, and Cohen 1999). When there are multiple purposes, the approach is to validate each one separately. Koch and DeLuca argue that this approach does not do justice to the interactions between these purposes. They propose a model of validation that addresses this, using narrative case description as a better representation of the complexity of large-scale assessment systems. They demonstrate this approach in relation to Ontario’s Grade 9 Mathematics assessment. In an era when assessment is regularly used for multiple purposes, this article offers important new thinking on validation.

The reliability of awarding procedures when results of several tests are combined is the focus of the article by Peter van Rijn, Anton Béguin, and Huub Verstralen. Their case study is of another high-stakes assessment, the Dutch secondary school leaving diploma. This is awarded on a pass/fail basis but represents the combination of examination results and teacher assessments in a variety of subjects. How can these diverse results be combined in a way that reduces the risk of misclassifying candidates in the final pass/fail result? The authors consider a variety of technical approaches drawing on test theory and model the alternatives to establish which are the most appropriate decision rules for aggregating results. The article serves as a valuable reminder of the importance of establishing valid assessment procedures, especially when the results generated are of such importance to students’ life chances.

These articles are complemented by Gordon Stanley’s review of Secondary School External Examination Systems (2009), edited by Vlaardingerbroek and Taylor. From the 16 case studies of different countries Stanley identifies some of the main themes common to these end-of-secondary selection examinations. These include a concern with standards as universal secondary education sees an increasing proportion of the cohort taking s...

Cover
Title
Copyright
Contents
Citation Information
Notes on Contributors
1. High-stakes testing - value, fairness and consequences
2. The evolution of high-stakes testing at the school-university interface in the former republics of the USSR
3. Unintended consequences of forced policy-making in high stakes examinations: the case of the Republic of Cyprus
4. Differential outcomes in high-stakes eleven plus testing: the role of gender, geography, and assessment design in Trinidad and Tobago
5. The impact of high stakes testing: the Australian story
6. High-stakes testing in South Africa: friend or foe?
7. Rethinking validation in complex high-stakes assessment contexts
8. Educational measurement issues and implications of high stakes decision making in final examinations in secondary education in the Netherlands
Index
