The Role of Constructs in Psychological and Educational Measurement

About This Book

Contributors to the volume represent an international "who's who" of research scientists from the fields of psychology and measurement. The volume offers the insights of these leading authorities regarding cognition and personality. In particular, they address the roles of constructs and values in clarifying the theoretical and empirical work in these fields, as well as their relation to educational assessment. It is intended for professionals and students in psychology and assessment, and for anyone doing research in cognition and personality.

Information

Editors: Henry I. Braun, Douglas N. Jackson, David E. Wiley
Publisher: Routledge
Year: 2001
ISBN: 9781135649890
Edition: 1

IV
VALUES—THEORY AND ASSESSMENT

13
Constructs and Values in Standards-Based Assessment*

Robert L. Linn
University of Colorado at Boulder



*An earlier version of this chapter was presented at a conference in honor of Samuel Messick, “Under Construction: The Role of Constructs in Psychological and Educational Testing,” held at Educational Testing Service, Princeton, NJ, September 19, 1997. Preparation of this chapter was partially supported under the Educational Research and Development Center Program, PR/Award Number R305B600002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The findings and opinions expressed in this publication do not reflect the position or policies of the National Institute on Student Achievement, the Office of Educational Research and Improvement, or the U.S. Department of Education.
Although I was quick to say yes when Ann Jungeblut asked me to participate in this celebration of Sam Messick’s contributions to educational and psychological measurement, I must admit it is a bit intimidating to think of talking about issues of validity, constructs, and values in this context. Many people have contributed to the refinement of the way that the field thinks about validity. Two people, Sam Messick and Lee Cronbach, however, stand head and shoulders above the crowd in this regard. In the space available, it would be difficult to do justice to a summary of the ways in which Sam has advanced our thinking about the interplay of constructs and values or the process of validating inferences and actions based on test scores. The intimidation, however, comes from the desire to build on and extend his work in these complex areas with him sitting here. Nonetheless, I, of course, told Ann that it was a great idea and that I wouldn’t miss the chance to participate.
Although it may not be evident from my remarks, I have learned a great deal from Sam. I am indebted to him for the support he gave me early in my career when I was at ETS. He always gave me encouragement and the freedom to pursue my research interests. In addition to teaching me a lot about validity, Sam also taught me some more mundane and practical things. For example, he taught me that an ETS Senior Research Psychologist (the title he held when I was new to ETS in 1965) is obviously equivalent to a full professor, because we always waited for him to arrive a full 15 minutes after the time he set for the start of a meeting. When he became a vice president, I learned that there is no university equivalent of that exalted title, because no one would wait that long in a university setting. I also learned, as editor of the Third Edition of Educational Measurement, that although a chapter by Sam will certainly not arrive on the editor’s desk on schedule, it will be clear when it does arrive that the contribution it makes is more than worth the wait. Although I haven’t checked it, I am confident that Sam’s chapter is the most cited, and certainly the most influential, chapter in the Third Edition of Educational Measurement.
Of course, there were good reasons people were willing to wait for Sam to show up for a meeting, just as there are for an editor of a book to be willing to wait for a chapter. The most important of these reasons is the quality of his thinking on issues ranging from the mundane and bureaucratic to the profound. Sam’s contributions to the discussion are consistently worth the wait. It is no wonder that when ETS has had a problem or issue that required the best the organization had to offer, Sam has long been the person that every ETS President (Henry Chauncey, Bill Turnbull, Greg Anrig, and now Nancy Cole) has turned to for help. Thus, there was no answer other than “yes” when Ann Jungeblut asked me to participate in this celebration. I am delighted to be part of it.
The focus of my chapter is on issues of values, constructs, and validity in the context of standards-based assessment programs. I take as my starting point Sam’s one-sentence definition of validity in his 1989 chapter in the third edition of Educational Measurement, which Shepard (1993) describes as the currently “most cited authoritative reference on the topic” (p. 423): “Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989, p. 13, emphasis in original). This statement, which is the first sentence of Sam’s influential chapter, is so packed with meaning that it took him 91 pages of an oversized book with relatively small type to elaborate the concept.
As I’m sure is familiar to most readers, Sam’s comprehensive definition of validity is elaborated in a two-by-two table corresponding to the adequacy/appropriateness and inferences/actions distinctions of the definition. The two rows of the table distinguish two bases of support for validity claims, the evidential basis and the consequential basis, which are used to support claims of adequacy and appropriateness. The two columns of the table distinguish between interpretations of assessment results (e.g., the latest NAEP history results show students have a “striking lack of knowledge about their heritage”; Innerst, Washington Times, November 2, 1995) and uses of results (e.g., the cash awards given to teachers in Kentucky based on increases in assessment scores by successive cohorts of students).
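For readers who do not have the chapter at hand, the four cells can be sketched roughly as follows (cell labels paraphrased from Messick, 1989):

                        Test Interpretation     Test Use
  Evidential basis      Construct validity      Construct validity + Relevance/utility
  Consequential basis   Value implications      Social consequences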
Although there is some disagreement in the field regarding the desirability of including consequences as a part of validity (see, for example, Green, 1990; Moss, 1994; Wiley, 1991), there is broad consensus both on the other parts of Messick’s comprehensive formulation and on the importance of investigations of consequences as part of the overall evaluation of particular uses and interpretations of assessment results (Baker, O’Neil, & Linn, 1993; Cronbach, 1980, 1988; Linn, 1994; Linn & Baker, 1998; Linn, Baker, & Dunbar, 1991; Moss, 1992, 1994; Shepard, 1993). Of course, affirmation of the primacy of validity based on a comprehensive framework is one thing; validity practice is quite another. Validity practice is too often more in keeping with outmoded notions of validity that base validity claims on a demonstration that test items correspond to the cells of a matrix of test specifications or a demonstration that scores on a test are correlated with other relevant measures (e.g., teacher ratings, or another test administered at a later time). Although both content-related evidence and criterion-related evidence are relevant to validity judgments, they do not provide a sufficient basis for the kind of “integrated evaluative judgment” that Messick demands.
Standards-based assessments currently being introduced in states around the country are frequently mandated by legislation that includes a requirement that assessments be “valid, reliable, and fair.” However, the approach to validation is often limited to comparisons of the assessment content to the content standards that are supposed to determine what gets assessed. In the old terminology, such practice treats content validity, or what Sam would prefer to call content relevance and content representativeness, as if it were sufficient. The new slogan for that emphasis on content relevance and representativeness is alignment. There is considerable emphasis on developing assessments that are aligned with content standards. Far less attention is paid, however, to accumulating the evidence needed to judge the adequacy and appropriateness of interpretations and uses made of assessment results.
There are a number of reasons for the large discrepancy between the theory and practice of validity. When taken seriously, the job of validation is daunting; its very comprehensiveness can seem overwhelming. Because of that scope, it is often useful to identify a series of components or facets of validity.
The CRESST validity criteria (Baker, O’Neil, & Linn, 1993; Linn, Baker, & Dunbar, 1991) were developed in response to questions raised by the new emphases on forms of assessment that emerged in the early 1990s. In particular, the movement toward increased reliance on performance-based assessments, characterized by judgmentally rated student responses on a relatively small number of open-ended tasks, was stimulated by changing priorities. Those changing priorities demanded increased attention to the consequences of uses made of assessment results and to fundamental questions of fairness. The judgmental aspects of scoring and the limited number of tasks also gave new salience to questions of generalizability and transfer. Questions of content quality, the adequacy of content coverage, and the cognitive processes being measured were also highlighted.
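Stated as a compact list, the CRESST criteria encompass consequences, fairness, transfer and generalizability, cognitive complexity, content quality, content coverage, meaningfulness, and cost and efficiency (Linn, Baker, & Dunbar, 1991).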
It is not that different forms of assessment require different conceptions of validity. Messick (1994) has cogently argued that “performance assessments must be evaluated by the same validity criteria, both evidential and consequential, as are other assessments. Indeed, such basic assessment issues as validity, reliability, comparability, and fairness need to be uniformly addressed for all assessments because they are not just measurement principles, they are social values that have meaning and force outside of measurement wherever evaluative judgment and decisions are made” (1994, p. 13). New forms of assessment that were emphasized in the early 1990s did require new technical approaches, not only as the result of changes in forms of assessment, but as the result of shifts in emphases and intended uses of assessments that together provided a different context for raising validity questions. The challenge, however, remains to establish priorities for validity studies that will provide adequate evidence for making the overall evaluative judgment that Messick calls for in his definition of validity.
In the time remaining, I focus on current standards-based assessment programs and try to identify some of the issues that would seem to demand high priority in planning validation efforts. I will use as a prime example the Voluntary National Tests (VNT) in 4th-grade reading and 8th-grade mathematics that the Clinton administration is trying to make a reality. Although the VNT are not based on government-adopted national content standards, a reasonable case can be made that they follow a standards-based approach to assessment, using as the starting place the frameworks of the National Assessment of Educational Progress (NAEP). That is, the NAEP frameworks are expected to play the role of their close cousin, national content standards, and the NAEP achievement levels play the role of performance standards. Indeed, the NAEP frameworks were selected as a starting point in an effort to finesse the question of which content standards should be used as the basis for developing the VNT and thereby jump-start the development process so that it could meet the fast-track expectations to have operation...

Table of contents

  1. COVER PAGE
  2. TITLE PAGE
  3. COPYRIGHT PAGE
  4. CONTRIBUTORS
  5. PREFACE
  6. I: PERSONALITY—THEORY AND ASSESSMENT
  7. II: INTELLECT—THEORY AND ASSESSMENT
  8. III: VALIDITY AND VALUES IN PSYCHOLOGICAL AND EDUCATIONAL MEASUREMENT
  9. IV: VALUES—THEORY AND ASSESSMENT