Handbook of Human and Social Conditions in Assessment
About This Book

The Handbook of Human and Social Conditions in Assessment is the first book to explore the assessment issues and opportunities arising from real-world human, cultural, historical, and societal influences upon assessment practices, policies, and statistical modeling. With chapters written by experts in the field, this book engages with numerous forms of assessment, from classroom-level formative assessment practices to national accountability and international comparative testing practices, all of which are significantly influenced by social and cultural conditions. A unique and timely contribution to the field of educational psychology, the Handbook of Human and Social Conditions in Assessment is written for researchers, educators, and policy makers interested in how social and human complexity affect assessment at all levels of learning.

Organized into four sections, this volume examines assessment in relation to teachers, students, classroom conditions, and cultural factors. Each section is comprised of a series of chapters, followed by a discussant chapter that synthesizes key ideas and offers directions for future research. Taken together, the chapters in this volume demonstrate that teachers, test creators, and policy makers must account for the human and social conditions that shape assessment if they are to implement successful assessment practices which accomplish their intended outcomes.


Information

Publisher: Routledge
Year: 2016
ISBN: 9781317608172

1
Volume Introduction

The Human and Social Experience of Assessment: Valuing the Person and Context

Lois R. Harris and Gavin T. L. Brown
The human condition is not perfect. We are not perfect specimens, any of us. We’re not robots.
Michael Ovitz (Zeman, 2001, paragraph 70)

Introduction

In our quest to improve educational assessment (e.g., increasing its validity, authenticity, accuracy, and learning benefits), it is easy to secretly wish that human beings (like robots) behaved in a consistent, predictable, and rational fashion. Rather than being content with total scores or ranking, educational stakeholders are currently seeking more diagnostic results from testing and assessment, but understanding how such methods might cope with the dynamism and complexity of classroom interactions is still a major challenge. Ideally, assessments should be able to measure human knowledge and behaviour with the same level of accuracy that we can achieve when calculating the speed of a ball descending a ramp. Imagine test designers’ glee if students always interpreted and responded to items in consistent ways or if panels of teachers marking essay responses generated identical feedback and scores. Classroom teachers would love to know in advance exactly how their students would interpret and respond to their formative feedback and be able to plan knowing precisely what kind of assessment was best for each student.
However, we live in a complex social and psychological world where people are not governed by universal laws of motion, but instead by their intricate (and ever changing) systems of values and beliefs, cultural and social norms, political contexts, personal motivations, relationships with others, and perceptions of the world around them, just to name a few. Within educational contexts, these factors influence all areas of assessment, including: decisions within educational systems, schools, and classrooms about how learning will be assessed; individuals’ (e.g., teachers and students) understandings of and kind of participation in assessment practices; and how responses to these assessment practices are evaluated, interpreted, and used. Educational assessment also has multiple and sometimes overlapping purposes (e.g., improvement, student accountability, school accountability [Brown, 2008]) and functions (Newton, 2007).
While these multiple purposes create difficulties for assessment validity and reliability (Bonner, 2013; Brookhart, 2003; Parkes, 2013), assessment models still generally expect the world of task and response or question and answer to be stable. When administering an assessment, educators anticipate student answers will be predictable in response to the difficulty of the question, its curriculum content, its cognitive demand, the probability of a student guessing the answer, and the sensitivity of the question to small changes in ability. In relying on observation and judgement, teachers expect that the scoring rubric or rules will be sufficient to consistently explain and describe student learning or diagnose needs. These models of behaviour make testing and assessment easier since they allow us to pretend the complexity of human existence does not exist within the world of educational assessment.
However, our observation, based on years of teaching and research, is that assessment is not so simple given the social environments within and surrounding classrooms at compulsory and tertiary education levels, let alone in vocational and workplace learning contexts. Classmates influence each other, often for good and sometimes, sadly, for ill. Although educators might wish for student honesty, truthfulness, and consideration, responses to teachers in the public forum of the classroom can be distorted by reluctance to exhibit non- or misunderstanding in front of others (Harris & Brown, 2013) in both face-to-face and online settings. Teachers may be frustrated or angered when, despite best efforts, students fail to grasp a new concept or skill; pupils’ fear of the teacher’s disappointment or anger may create further distortions in their responses to assessment practices in classroom settings. Student concerns about parent reactions to or the high-stakes consequences of their assessment results (e.g., promotion, retention, tertiary entrance, financial costs or rewards) can motivate further dishonest behaviour (e.g., plagiarism or various forms of cheating). Furthermore, teachers as evaluators may not always assess consistently or accurately due to factors including fatigue, time constraints, mood, biases (e.g., having a positive or negative relationship with or opinion of the student, the halo effect, etc.), inexperience, and differing interpretations of performance standards or criteria (Kahneman, 2011; Wyatt-Smith & Klenowski, 2012; Wyatt-Smith, Klenowski, & Gunn, 2010). Even when scoring is mechanical (e.g., computer marking of tests) or objective, a person originally designed the questions and set the parameters of what would be considered a correct answer (Brown, 2015), a set of decisions that might not be universally accepted as appropriate. 
Hence, human conditions impact assessment from design to implementation to scoring and these take place within a complex social environment.
Within this handbook, the term human conditions refers to how individuals understand, respond to, and interpret assessment. These individuals include assessment takers (normally students), assessment givers (usually teachers or administrators), and assessment users (i.e., teachers, administrators, students, or parents). To understand human conditions, multiple aspects must be explored. First, it is important to consider the beliefs, attitudes, perceptions, and/or conceptions of assessment that diverse stakeholders hold. Additionally, one must take into account their experiences, responses to, behaviours around, and emotions towards assessment. Hence, when we speak of human conditions, we are talking specifically about emotions, experiences, and beliefs which occur WITHIN an individual and influence how that person understands, engages with, and interprets assessment experiences and results.
In contrast, the term social conditions is used to refer to how assessment is experienced in group settings (e.g., classrooms) and how cultural, historical, and societal influences mediate those experiences. Students are educated and assessed in social settings which include adult teachers (who enact assessment practices within particular policy, social, and cultural environments) and peers (who impact on students’ motivation, self-esteem, self-image, collaborative learning opportunities, etc.). These settings are contextualised by the educational and assessment policies of the jurisdiction, which may or may not align with the values, beliefs, and motivations of students, their parents, and their teachers. Hence, by social conditions, we refer to the interplay between the experiences of the individual and collectives to which those individuals belong at the level of classroom, policy, society, and culture.
Both human and social conditions must be understood as many contemporary educational policy reforms (especially Assessment for Learning [AfL]) highlight the need to consider both the intra- and interpersonal dynamics of assessment. AfL explicitly requires classroom assessment practices (e.g., peer and self-assessment) that involve students taking on the teacher’s traditional role as assessor (Assessment Reform Group, 2002; Berry, 2008; Black et al., 2004; Stiggins, 2002; Swaffield, 2011); the interpersonal dynamics of teacher and learner now have to be extended to account for experiences as students assess each other or assess themselves in front of their classmates.
Understanding what assessment means in light of these human and social conditions is important for the design of assessments, the preparation and development of teachers, the quality of scoring and marking systems, the creation of appropriate policies, and the design of statistical models used to generate scores. This chapter will further explore the need for research into these important areas and provide an introduction to the subsequent chapters within this volume.

The Need for Research Into Human and Social Factors

Test responses are a function not only of the items, tasks, or stimulus conditions, but of the persons responding and the context of measurement. This latter context includes factors in the environmental background as well as the assessment setting.
(Messick, 1989, p. 14)

Any educational assessment, from classroom-based teacher observations to high-stakes standardised tests, needs to be valid and reliable if it is to be used in educational decision-making. The modern approach to validity positions it as a series of arguments (Kane, 2006, 2013); specifically, validity is “an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989, p. 13). Reliability is generally considered to be the “consistency, stability, dependability, and accuracy of assessment results” (McMillan, 2001, p. 65). However, as Bonner (2013) accurately points out, there is a need to rethink exactly how validity is constructed within the realm of classroom assessment in light of the differences between assessments administered for summative evaluation versus those for formative improvements in instruction or learning. While the consequences of some educational assessments may be greater than others (e.g., a student held back due to low test scores vs. placed in the wrong within-class reading group vs. spending slightly more time on a concept than necessary because the teacher thought he had not understood), all invalid or unreliable assessments carry some type of inappropriate consequence for learners, teachers, and/or schools.
But what causes assessments to be invalid or unreliable? While invalid assessment sometimes occurs because inappropriate instruments are used to evaluate learning (e.g., instrument does not assess what students had actually been taught) or because of cultural or linguistic biases (e.g., students are tested in their second language prior to attaining adequate competence in the language of testing), problems can also arise when students respond to the assessment in particular ways or when inappropriate inferences are drawn from it, rather than from inherent flaws within the instrument or assessment method itself. For example, issues can occur because of inappropriate student responses to assessment tasks (e.g., cheating or lack of effort) or physical and psychological issues which negatively affect the student’s performance on the day (e.g., illness, anxiety about the test, or unrelated personal worries which distract the student). When stakeholders are unaware of the human and social conditions influencing assessment, they are far more likely to fail to consider these threats to validity and reliability, undermining the appropriateness of their inferences from the data.
Also, without understanding human and social conditions, inferences based on assessment results are less likely to be valid. One particularly concerning misconception is around the accuracy of a score. Over 30 years ago, Messick (1984, p. 215, emphasis added) noted:
Because student characteristics as well as social and educational experiences influence current performance, the interpretation and implications of educational achievement measures must be relative to intrapersonal and situational contexts. These points imply a strategy of comprehensive assessment in context that focuses on the processes and structures involved in subject-matter competence as moderated in performance by personal and environmental influences.
However, this message appears to have been largely overlooked within the mainstream assessment culture, especially in environments where high-stakes testing is the primary form of student and school accountability and evaluation. Current three-parameter item response psychometric models account for item difficulty, item discrimination, and the probability of guessing (Hambleton, Swaminathan, & Rogers, 1991). There are multiple statistical models for handling polytomous, as opposed to dichotomous, scoring systems, and for identifying the impact of human raters, tasks themselves, and interactions of tasks and raters upon the accuracy of those polytomous scores. Tools and procedures exist to minimise error introduced by emotional or biased markers. However, current models do not have the ability to detect alignment (e.g., were students taught what was tested?), interference caused by variation in students’ emotional or physical states, students’ inappropriate responses to assessment, and a host of other human variability in relation to academic evaluation. We still have little understanding of the intrapersonal and interpersonal factors in various situations such as classrooms, and this absence weakens the statistical models we use to estimate attainment or proficiency.
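To make the psychometric model referenced above concrete: the three-parameter logistic (3PL) model expresses the probability of a correct item response as a function of the test taker's ability and the item's discrimination, difficulty, and guessing parameters. The sketch below is illustrative only; the function name and example values are ours, not drawn from Hambleton, Swaminathan, and Rogers:

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic (3PL) probability of a correct response.

    theta: test taker ability (on the same latent scale as b)
    a: item discrimination, b: item difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A four-option multiple-choice item (guessing floor c = 0.25) of average
# difficulty (b = 0): a test taker whose ability exactly matches the item's
# difficulty answers correctly a little over 60% of the time, not 50%.
print(round(p_correct(theta=0.0, a=1.0, b=0.0, c=0.25), 3))  # 0.625
```

Note how the guessing parameter lifts the lower asymptote: even a very low-ability test taker retains roughly a 25% chance of answering correctly, while none of the three parameters says anything about alignment, emotional state, or the other human factors discussed above.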

The Challenges to Assessment Validity

There seem to be three major types or conditions of assessment that need to be distinguished. Classroom assessments include formal (e.g., tests, assignments) and informal (e.g., observations, conversations, interactions) methods to inform teacher and student improvement action (i.e., formative) and evaluate and report on performance (i.e., summative). Some national (e.g., the United States’ National Assessment of Educational Progress [NAEP] or New Zealand’s National Monitoring Study of Student Achievement [NMSSA]) and international (e.g., PISA, TIMSS) external monitoring assessments aim to lightly sample and monitor the overall health of the schooling system. These assessments generally have low-stakes consequences for the school-level participants on whose performance inferences depend, but may have significant impact at the system level. Then, there are external accountability assessments (e.g., the United States’ No Child Left Behind [NCLB]), where student testing is used to directly evaluate the efficacy of schools and teachers and which carry high-stakes consequences for school administrators and teachers. Interestingly, education research, especially from the USA, has uncovered that many of the consequences of school accountability testing have been negative (Nichols & Berliner, 2007). Parallel to these high-stakes school accountability tests are high-stakes graduation (e.g., the Hong Kong Diploma of Secondary Education), certification (e.g., the Texas Examinations of Educator Standards), or entry (e.g., China’s university entrance examination, the gaokao) tests used to evaluate the competence or proficiency of individual students.
Throughout this volume and within the field, it is evident that there is much more research about the human and cultural responses and constraints within formal testing than about informal, interactive classroom assessments. Interestingly, there is also much less research into the effects of the formal testing used in low-stakes monitoring than into high-stakes formal testing. Hence, throughout this volume, the authors have been careful to delineate which class of assessment the research they review applies to, and they have been challenged to consider the generalisability and meaningfulness of their results beyond the contexts this research has investigated.

Challenges for Testing Systems

While information about validity and reliability of high-stakes testing systems may be contained within detailed technical reports about particular educational assessment tools, in many cases, such details are seldom released to the public (Chamberlain, 2013; Gebril, this volume) or are not provided in a language that lay people are likely to understand. In cases where technical information about test reliability has been made public (e.g., England) and efforts have been made to engage with the public around these important issues, there have been huge challenges, with the media often sensationalising, misinterpreting, or misreporting test data (Newton, 2013). Newton (2013) has noted that while such media reports often contain some useful information, there is a risk that readers take away only the sensationalised headlines and data about measurement error (e.g., 30% of students may have received the wrong score); these simplistic take-home messages could potentially erode public confidence in the system. Such was the case with the innovative California Learning Assessment System, which was determined to have a tolerable standard error of 0.55 on a 1 to 6 scale (Cronbach, Bradburn, & Horvitz, 1995), and yet, the system was deemed by politicians to have failed and was subsequently abandoned. Such data certainly can undermine commonly held public assumptions about testing: that it is objective and that it is appropriate to use such data comparatively and competitively (Brookhart, 2013). While Brookhart’s (2013) work focuses on an American context, the global trend to use testing for school accountability purposes (e.g., the Australian National Assessment Program—Literacy and Numeracy [NAPLAN]; Klenowski & Wyatt-Smith, 2012; Thompson, 2013) indicates these concerns apply to many other countries.
Diverse important decisions, including academic promotion, acceptance into tertiary or other selective programs, school funding, and in some jurisdictions, teacher promotion or dismissal, often rely heavily upon test scores. Without test scores, it is highly possible such decisions might be subject to collusion, nepotism, or corruption and, thus, lead to even worse outcomes; these concerns provide an argument for maintaining formal assessment systems, especially in conditions of rationed resources. Hence, it is relatively easy for policy makers and non-experts to treat assessment scores and grades as clear, reliable, and unequivocal measurements of...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Contents
  5. Foreword
  6. Acknowledgments
  7. Chapter 1 Volume Introduction: The Human and Social Experience of Assessment: Valuing the Person and Context
  8. Section 1 Teachers and Assessment
  9. Section 2 Students and Assessment
  10. Section 3 Classroom Conditions
  11. Section 4 Cultural and Social Contexts
  12. Contributing Authors
  13. Author Index
  14. Subject Index