Handbook of Test Development

About This Book

The second edition of the Handbook of Test Development provides graduate students and professionals with an up-to-date, research-oriented guide to the latest developments in the field. Comprising thirty-two chapters by well-known scholars and practitioners, it is divided into five sections, covering the foundations of test development, content definition, item development, test design and form assembly, and the processes of test administration, documentation, and evaluation. Keenly aware of developments in the field since the publication of the first edition, including changes in technology, the evolution of psychometric theory, and the increased demands placed on tests by educational policy, the editors of this edition include new chapters on assessing noncognitive skills, measuring growth and learning progressions, automated item generation and test assembly, and computerized scoring of constructed responses. The volume also includes expanded coverage of performance testing, validity, fairness, and numerous other topics.

Edited by Suzanne Lane, Mark R. Raymond, and Thomas M. Haladyna, The Handbook of Test Development, 2nd edition, is based on the revised Standards for Educational and Psychological Testing, and is appropriate for graduate courses and seminars that deal with test development and usage, professional testing services and credentialing agencies, state and local boards of education, and academic libraries serving these groups.


Information

Publisher: Routledge
Year: 2015
ISBN: 9781136242564
Edition: 2

Part I
Foundations

1
Test Development Process

Suzanne Lane, Mark R. Raymond, Thomas M. Haladyna and Steven M. Downing

Test development requires a systematic approach to ensure the validity of test score interpretations and the uses based on those interpretations. The 12 components outlined in this chapter provide a framework for test development and validation. In the first edition of the Handbook (Downing & Haladyna, 2006), these components were labeled as steps (Downing, 2006). To better reflect their related and interactive nature, we refer to them here as components of test development. One of the components presented in the first edition, item banking, is now subsumed under item development, and test security has been added as an essential component of the test development process. This chapter draws on the corresponding chapter in the first edition (Downing, 2006).
The first component in the test development process—overall plan—requires careful attention to all remaining components, from defining the domain and claim statements to test documentation in support of the validity of the score interpretations and uses. To effectively develop an overall plan, test developers need to consider the purpose of the test, the claims to be made about examinee performance, the score(s) that are needed and the evidence to support the validity of test score interpretations and uses.
The Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association & National Council on Measurement in Education, 2014) provide criteria and guidelines for all test development and validation tasks and should be adhered to by test developers and users. Test developers need to document their efforts to adhere to the Standards. Throughout the volume the Standards (AERA et al., 2014) are cited and used to help ensure the validity of the interpretations and uses of test scores. It is worth noting that most chapters in the 2014 Standards contain one overarching standard, and that in every chapter the standards are divided into topical sections, which provide an organizational framework. The chapter by Wise and Plake (this volume) provides an overview of the 2014 Standards.
The purpose of this chapter is to provide a high-level overview of the test development process through the 12 coordinated components that are needed in the development of any test. Each of these 12 components can be used to provide a framework for collecting and organizing evidence to support the psychometric quality of the test and the validity of the test score interpretations and uses. Table 1.1 describes each component, cites example standards that apply to each task and identifies the chapters in this volume that discuss each component in detail. Although these components are listed sequentially, they are interrelated and some tasks may occur simultaneously or in some other order. For example, the reporting of test scores should be considered in the overall test plan and in the delineation of claim statements; test security should be attended to throughout the test development process; and the type of evidence that is needed to document the psychometric quality and the validity of the test score interpretations and uses should be delineated initially and refined throughout test development.

Table 1.1 Test Development Process

| Test development component | Test development recommendation | Example relevant standards | Relevant chapters |
| --- | --- | --- | --- |
| Overall Plan | Develop a detailed plan for the entire test development project, including information on all test components, a rationale for each component and the specific methods to be used to evaluate the validity of all intended test score interpretations and uses and the psychometric quality of the test. | 1.0, 2.0, 3.0, 4.0, 5.0, 11.1, 12.2, 13.4 | 2, 3, 4, 5, 27 |
| Domain Definition and Claims Statements | Name and define the domain to be measured. Provide a clear statement of the claims to be made about examinee knowledge, skills and abilities (KSAs). | 1.0, 4.1, 11.2, 11.3, 11.13, 12.4 | 3, 4, 5, 7, 8, 9, 10 |
| Content Specifications | Develop content specifications to guide item development, form assembly, score reporting and other activities. | 4.1, 4.2, 11.3, 12.4 | 7, 8 |
| Item Development | Identify suitable item formats and materials. Develop items and obtain validity evidence to support item use. | 3.2, 4.7-4.14 | 3, 5, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 |
| Test Design and Assembly | Design and create test forms based on test specifications; attend to issues related to test content, format, scoring rules, scaling and equating. | 4.3, 5.0, 5.1-5.20, 11.15, 12.11, 13.2 | 3, 7, 8, 20, 21, 22, 23, 24, 25, 26 |
| Test Production | Produce a clear, accurate and accessible test form. | 4.0 | 28 |
| Test Administration | Administer the test in a standardized way. Avoid threats to validity that may arise during administration. | 3.0, 3.4, 4.3, 4.15-4.17, 6.1-6.7, 12.16 | 6, 29 |
| Scoring | Establish a quality control policy and procedures for scoring and tabulating item responses. Ensure accurate and consistent scoring where judgment is required. | 4.3, 4.18-4.23, 6.8-6.9 | 14, 15, 17, 20, 22, 23 |
| Cut Scores | Establish defensible cut scores consistent with the purpose of the test. | 2.16, 5.21-5.23, 11.16 | 11 |
| Test Score Reports | Develop accessible and understandable test score reports. | 2.0, 2.3-2.4, 2.13-2.14, 5.1-5.5, 6.10-6.16, 8.7-8.8, 12.18 | 30 |
| Test Security | Establish policies and procedures for ensuring test security during test development and administration. | 6.7, 6.14, 6.16, 7.9, 8.5-8.6, 8.9-8.12, 9.0, 9.21-9.23 | 6, 12 |
| Test Documentation | Prepare technical reports and other documentation supporting validity, fairness and the technical adequacy of the test. | 4.0, 7.0, 7.1-7.14, 12.6 | 31, 32 |

Overall Plan

The overall plan provides a systematic framework for all major activities associated with test development, makes explicit the most important a priori decisions, outlines a realistic timeline and emphasizes test security and quality control procedures from the outset. The fundamental questions to address in this phase are: What is the construct to be measured? What is the population for which the test is intended? Who are the test users, and what are the intended interpretations and uses of test scores? What test content, cognitive demands and format will support the intended interpretations and uses? The test developer should provide a rationale for the decision on each of these questions. For entities that do not develop the test themselves (e.g., state agencies, certification boards) but instead contract for services, an overall plan provides the essential ingredients for developing a contract for test services. The chapter by Roeber and Trent (this volume) provides useful guidance for specifying requirements, obtaining bids and evaluating contracts for testing services.
The overall plan requires an explicit delineation of the validity evidence that is needed to support each of the intended score interpretations and uses. The claims made about how test scores can be interpreted and used and the validity evidence needed to support these claims must guide the subsequent decisions made in the overall plan. As explained in the chapter by Kane (this volume), an interpretation/use argument (IUA) and validity argument (Kane, 2013) provide a foundation for all test development activities. The IUA specifies the intended interpretations and uses of test scores for the intended population and contexts, and the validity argument provides an evaluation of the IUA. Evidence required for the validity argument should be obtained throughout the test development process, with the recognition that the evidence collected may lead to subsequent refinements.
Fairness should also be considered in the overall test plan because it is a fundamental validity issue (AERA et al., 2014). The four views of fairness proposed by the Standards (AERA et al., 2014) and discussed in the chapter by Zieky (this volume)—equitable treatment of test takers, absence of bias or differential validity for subgroups of test takers, accessibility, and validity of score interpretations for all test takers—should be addressed throughout the test development plan. Fairness in testing is achieved if a given test score has the same meaning for all examinees and is not substantially influenced by factors not relevant to the examinee’s performance. As indicated by Standard 4.0 in the “Test Design and Development” chapter,
Tests and testing programs should be designed and developed in a way that supports the validity of interpretations of the test scores for their intended uses. Test developers and publishers should document steps taken during the design and development process to provide evidence of fairness, reliability, and validity for intended uses for individuals in the intended examinee population.
(AERA et al., 2014, p. 85)
Decisions also need to be made about test administration procedures, scoring procedures, what scores are to be reported and how, and the psychometric methods to be used, along with a rationale for their use. Other fundamental decisions include: Who develops and reviews the test specifications, items, scoring procedures, test administration materials and score reports? How are examinee confidentiality and test security maintained? What quality controls are needed to ensure accuracy? What documentation is needed for all test development activities to support the intended interpretations and uses? Timelines and the identification of those responsible for each task need to be clearly stated to ensure an effective test development plan.
The Standards (AERA et al., 2014) articulate the importance of specifying the intended score interpretations and uses, the construct to be measured and all subsequent activities in the test development process, including rationales for all decisions, to ensure the validity of such score interpretations and uses. As the Standards indicate, “Ultimately, the validity of an intended interpretation of test scores relies on all the available evidence relevant to the technical quality of a testing system” (p. 22). Over the past decade, principled test design has emerged as a rigorous paradigm for guiding the planning and design of testing programs. The chapter by Riconscente, Mislevy and Corrigan (this volume) provides an overview of evidence-centered design, one approach to principled design that has guided test development for some notable testing programs. Evidence-centered design provides a systematic framework and thorough processes for defining the construct domain, articulating the claim statements and designing the assessment tasks that will furnish the requisite evidence that examinees have obtained the specified knowledge, skills and abilities (KSAs).

Domain Definition and Claims Statements

A critical early activity for test developers is to articulate the construct domain to be tested and to specify the claims to be made about examinee KSAs. These claim statements help define the domain to be tested and articulate the intended interpretations of the scores. As noted in the chapter by Kane (this volume), achievement tests and credentialing tests rely heavily on validity evidence based on test content to make fundamental arguments to support or refute specific interpretations and uses of test scores. The effectiveness of all other test development activities relies on how well the domain is defined and claim statements are delineated. The validity of test score interpretations and uses rests on the adequacy and defensibility of the methods used to define the domain and claim statements, and the successful implementation of procedures to systematically and sufficiently sample the domain.
Defining the domain for educational achievement tests is typically guided by local, state or national content standards. Statements are made about the KSAs that differentiate students in performance categories, such as “advanced,” “proficient” and “basic.” This is accomplished through the delineation of claims about student KSAs at each performance level; these performance-level descriptions are then used to guide the development of the test content specifications and item and test development. The chapter by Perie and Huff (this volume) provides a thorough discussion of specifying the content of tests and student claims through the use of performance-level descriptors. Once developed, these descriptors serve as the basis for item development, standard setting and score reporting. The development of performance-level descriptions requires an understanding of how students learn and progress within a given domain in order to delineate the degree to which students have acquired the intended KSAs. The chapter by Graf and van Rijn (this volume) defines learning progressions for linear functions, presents sample tasks for measuring that progression and then illustrates strategies for empirically verifying the ordering of levels in that progression. Of course, the learning progressions for a particular grade level do not stop at the end of the academic year; the KSAs continue to develop in subsequent years. Therefore, as the chapter by Young and Tong (this volume) points out, learning progressions also span grade levels, as does the need to develop vertical score scales and report scores that allow for examining student growth across those grades. Their chapter on vertical scaling provides a thorough yet accessible summary of the factors that need to be considered when planning to report scores that encourage interpretations related to student growth over time.
For credentialing tests, in contrast, the domain to be defined is job performance in the work setting. Domain definitions in credentialing are obtained through practice analyses in which subject-matter experts (SMEs) analyze real-world work activities...

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. CONTENTS
  5. List of Contributors
  6. Preface
  7. PART I Foundations
  8. PART II Content
  9. PART III Item Development and Scoring
  10. PART IV Test Design and Assembly
  11. PART V Production, Preparation, Administration, Reporting, Documentation and Evaluation
  12. Author Index
  13. Subject Index