eBook - ePub

Comprehending Test Manuals

Name: Comprehending Test Manuals
Author: Ann Silverlake

A Guide and Workbook

122 pages
English

About This Book

• Your students will get valuable practice in interpreting actual excerpts from published test manuals.

• Each of the 39 exercises begins with a guideline that helps students review the measurement concepts they will need in order to complete the exercise.

• Background notes on each exercise describe the purpose of the test from which the excerpt was drawn.

• Students answer questions that require them to locate and interpret important points in the excerpt.

• The excerpts are largely unabridged so that students practice interpreting material as it is actually presented by test makers.

• The skills they learn with this book can be easily transferred to other test manuals they may be using in the future.

• Students have an ethical responsibility to be thoroughly familiar with the technical characteristics of the tests they will use. This book prepares them for this responsibility.

• All major topics are covered, including:

· validity

· reliability

· standard error of measurement

· norm group composition

· derived scores

· scales to detect faking

· item analysis

· cultural bias

• The excerpts are drawn from tests such as:

· Wechsler Intelligence Scale for Children

· Peabody Picture Vocabulary Test

· 16PF

· Stanford-Binet Intelligence Scale

· MMPI

· Beck Depression Inventory

· Stanford Achievement Test Series

· KeyMath

· and many others!

Information

Publisher: Routledge
Year: 2016
ISBN: 9781351970860
Topic: Psychology
Subtopic: Research & Methodology in Psychology

Exercise 1
Test-Retest Reliability

Behavior Rating Profile¹

Guideline

Test-retest reliability measures the stability of test scores over time. To estimate this type of reliability, a test is administered twice to a group of examinees — generally with a week or two between the two administrations. The degree of reliability is usually expressed with a correlation coefficient. Note that when a correlation coefficient is used to describe reliability, it is called a "reliability coefficient," or, in this case, a "test-retest reliability coefficient." See Appendix A to review the correlation coefficient before attempting this exercise.
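As a rough illustration of the guideline above, the following sketch computes a test-retest reliability coefficient as a Pearson correlation between two administrations of the same test. The function name `pearson_r` and the six pairs of scores are hypothetical and are not taken from any test manual.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six examinees, tested two weeks apart
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability coefficient: {test_retest_r:.2f}")
```

Each position in the two lists must refer to the same examinee; the coefficient is only meaningful when the scores are paired this way.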

Background Notes

The Behavior Rating Profile, 2nd edition (BRP-2) is designed for rating students who display disturbed behavior. On one scale, parents rate their children on items such as "Is verbally aggressive to parents" and "Is shy; clings to parents." On another scale, teachers rate the children on items such as "Is an academic underachiever" and "Doesn't follow class rules." On three other scales, children rate themselves in relation to their home lives (e.g., "I often break rules set by my parents"), school lives (e.g., "My teachers give me work that I cannot do"), and peers (e.g., "Other kids don't seem to like me very much.").

Excerpt from the Manual

In the test-retest method, a test is administered to the same group of students on two occasions. A specified period of time is permitted to elapse between administrations, and the results are analyzed to test for mean differences or to determine the correlation of the two sets of data. Kaufman (1980) used this procedure to investigate the stability reliability [i.e., test-retest reliability] of the BRP-2 scales with 36 Indiana high school students, 27 of their parents, and 36 of their teachers, permitting two weeks to intervene between administrations... The resulting coefficients, reported in Table 4.3, range from .78 to .91 with only one coefficient falling below the .80 demarcation. These data provide evidence of the stability of the BRP-2 scales when they are used with adolescents. [See Table 4.3 on the next page.]

Table 4.3 Delayed Test-Retest Reliability of the BRP-2 Scales with Adolescents (decimals omitted)

BRP-2 Scale	r

Parent Rating Scale	84
Teacher Rating Scale	91
Student Rating Scales: Home	78
Student Rating Scales: School	83
Student Rating Scales: Peer	86

Questions:

Which one of the scales is the most reliable? Explain.
Which one of the scales is the least reliable? Explain.
In your opinion, are all the scales adequately reliable? Explain.
The excerpt presents the results of only one of a number of reliability studies described in the manual for the BRP-2. In your opinion, is this one study sufficient or are others needed? Explain.
In Table 4.3, decimals have been omitted. If they were not omitted, what would the reliability coefficient be for the Parent Rating Scale?
The test-retest reliability coefficients are based on a two-week interval. Do you think the coefficients would be higher or lower if a two-month interval had been used? Explain.
Speculate on why test makers usually allow an interval of a week or two between the two administrations of the test instead of giving the same test twice in a row at one sitting.
If you were considering using this instrument, what other types of reliability coefficients, if any, would you like to see in the manual? Explain.
In general, how important is test-retest reliability information for selecting a scale or test? Would you consider it a serious flaw if a manual did not contain information on this topic? Explain.
If you have a measurement textbook, do the authors suggest a minimum acceptable value for a test-retest reliability coefficient? If yes, what is it? If yes, do all of the coefficients in the excerpt exceed the minimum value?

Exercise 2
Interscorer Reliability

Wechsler Preschool and Primary Scale of Intelligence¹

Guideline

Scoring some tests involves making subjective judgments. For example, some subjectivity often enters into scoring essays, and, as a result, one English teacher might give an essay a grade of A while another might give it a grade of B. Such a lack of agreement indicates a weakness in interscorer reliability (i.e., the consistency of scores from one scorer to another).

Interscorer reliability is usually judged by having a set of examinees' responses to the test scored by two or more scorers and correlating the resulting sets of scores by computing a correlation coefficient. Note that when a correlation coefficient is used for this purpose, it is called an "interscorer reliability coefficient." See Appendix A to review the correlation coefficient before attempting this exercise.
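The procedure described above might be sketched as follows, assuming two scorers have independently scored the same five responses. The scores and the function name `interscorer_r` are hypothetical. The correlation is computed here as the mean product of standardized scores, which is one of several equivalent ways to obtain a Pearson coefficient.

```python
from statistics import mean, pstdev


def interscorer_r(scores_a, scores_b):
    """Correlate two scorers' scores for the same set of responses."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    # Population standard deviations, so the mean of z-score products equals r
    sd_a, sd_b = pstdev(scores_a), pstdev(scores_b)
    products = [
        ((a - mean_a) / sd_a) * ((b - mean_b) / sd_b)
        for a, b in zip(scores_a, scores_b)
    ]
    return mean(products)


# Hypothetical scores given by two scorers to the same five responses
scorer_a = [8, 5, 9, 4, 7]
scorer_b = [7, 5, 9, 5, 8]

reliability = interscorer_r(scorer_a, scorer_b)
print(f"Interscorer reliability coefficient: {reliability:.2f}")
```

The sketch assumes that neither scorer gave every response the same score; in that case the standard deviation is zero and the coefficient is undefined.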

Background Notes

The Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) is an individually administered intelligence test for young children. The test administrator observes an individual examinee's responses and scores them.

Excerpt from the Manual

Most WPPSI-R subtests involve straightforward and quite objective scoring; however, some subtests are subjectively scored, and are therefore more vulnerable to scoring error. For these subtests, which include Comprehension, Vocabulary, Similarities, and Mazes, it was necessary to evaluate interscorer reliability. In addition, previous research with the WPPSI indicated a low rate of scoring agreement on the Geometric Design subtest (Sattler, 1976). A more objective set of scoring rules and procedures was created for this subtest, and its effect on scorer agreement also was evaluated.

To assess the interscorer reliability of the Comprehension, Vocabulary, Similarities, and Mazes subtests, a sample of 151 cases (83 males and 68 females) stratified by age was randomly selected from all cases collected for the standardization. For the Geometric Design subtest, a sample of 188 cases (105 males and 83 females) was randomly selected. A group of research scorers was trained and given practice in scoring the subtests. The cases were subdivided by age to control for age effects, and two scorers were selected at random to score all the cases in each age group.

To ensure that scorings were independent, any previous scoring notations on standardization Record Forms were masked, leaving only the verbatim responses on the Verbal subtests, the performance times and tracing on Mazes, and the actual drawings on Geometric Design. Scorers in the study recorded their scores on separate forms so that they never saw each other's scores. . . .

Interscorer reliability coefficients were as follows: .96 on Comprehension, .94 on Vocabulary, .96 on Similarities, .94 on Mazes, and .88 on Geometric Design. These results indicate that the scoring rules for these subtests are objective enough for different scorers to produce similar results.

Questions:

Why was scorer agreement examined for only some of the WPPSI-R subtests?
Cases were selected at random. What is random selection?
Cases were selected from all cases collected for the standardization. What do you think the "standardization" is?
Is it important to know that the research scorers were trained and given practice in scoring the subtests? Explain.
How many scorers scored the cases in each age group? In your opinion, is this an adequate number?
The responses had been previously scored. Is it important to know that the research scorers were not allowed to see the previous scoring notations? Why? Why not?
Is it important to know that the research scorers did not see each other's scores? Why? Why not?
On which subtest was the interscorer reliability the lowest? Explain.
Overall, do you think that the interscorer reliability is adequate? Explain.

¹Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R). Copyright © 1993 by The Psychological Corporation. Reproduced by permission. All rights reserved. "Wechsler Preschool and Primary Scale of Intelligence" and "WPPSI-R" are registered trademarks of The Psychological Corporation.

Cover
Title
Copyright
Table of Contents
Introduction
1. Test-Retest Reliability Behavior Rating Profile
2. Interscorer Reliability Wechsler Preschool and Primary Scale of Intelligence
3. Internal Consistency and Test-Retest Reliability Occupational Aptitude Survey and Interest Schedule
4. Internal Consistency Reliability (Cronbach's Alpha) The Sixteen Personality Factor Questionnaire
5. Concurrent Validity and Test-Retest Reliability Reading and Arithmetic Indexes (12)
6. Concurrent Validity Thurstone Test of Mental Alertness
7. Predictive Validity Wechsler Intelligence Scale for Children
8. Content Validity: I Test of Written Language
9. Content Validity: II Boehm Test of Basic Concepts
10. Construct Validity: I Comprehensive Receptive and Expressive Vocabulary Test
11. Construct Validity: II Gray Oral Reading Tests
12. Construct Validity: III Beck Depression Inventory
13. Percentile Ranks Test of Pragmatic Language
14. Stanines Flanagan Aptitude Classification Test
15. IQ Scores Wechsler Intelligence Scale for Children
16. Derived Scores and the Normal Curve Peabody Picture Vocabulary Test
17. Grade Equivalents KeyMath
18. Age Equivalents Vineland Adaptive Behavior Scales
19. Norm Group Composition: I The Adaptive Behavior Evaluation Scale
20. Norm Group Composition: II The Sixteen Personality Factor Questionnaire
21. Standard Error of Measurement: I Peabody Picture Vocabulary Test
22. Standard Error of Measurement: II Behavior Dimensions Scale-School Version
23. Standard Error of Measurement and Alternate-Forms Reliability Stanford Achievement Test Series
24. Significance of Intra-Ability Difference Scores Gray Oral Reading Tests
25. Use of a Bias Review Panel Personality Assessment Inventory
26. Pretesting Items to Reduce Bias SRA Pictorial Reasoning Test
27. Procedures to Eliminate Bias Stanford Achievement Test Series
28. Scales for Detecting Faking Tennessee Self-Concept Scale
29. Experiment on Faking Survey of Interpersonal Values
30. Social Desirability Scale The Sixteen Personality Factor Questionnaire
31. Item Omissions and Validity Minnesota Multiphasic Personality Inventory
32. Lie Scale Minnesota Multiphasic Personality Inventory
33. Item Analysis: I The Attention Deficit Disorders Evaluation Scale-Home Version
34. Item Analysis: II Comprehensive Receptive and Expressive Vocabulary Test
35. Equivalence of Editions Self-Directed Search
36. Presenting Intelligence Test Items Stanford-Binet Intelligence Scale
37. Testing Conditions Minnesota Multiphasic Personality Inventory
38. Establishing Rapport During Test Administration Woodcock-Johnson Tests of Cognitive Ability
39. Responsibility for Test Security Woodcock Language Proficiency Battery
Appendix A Review of Basic Statistics
