Comprehending Test Manuals
eBook - ePub

A Guide and Workbook

  1. 122 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

• Your students will get valuable practice in interpreting actual excerpts from published test manuals.

• Each of the 39 exercises begins with a guideline that helps students review the measurement concepts they will need in order to complete the exercise.

• Background notes on each exercise describe the purpose of the test from which the excerpt was drawn.

• Students answer questions that require them to locate and interpret important points in the excerpt.

• The excerpts are largely unabridged so that students practice interpreting material as it is actually presented by test makers.

• The skills they learn with this book can be easily transferred to other test manuals they may be using in the future.

• Students have an ethical responsibility to be thoroughly familiar with the technical characteristics of the tests they will use. This book prepares them for this responsibility.

• All major topics are covered, including:

· validity

· reliability

· standard error of measurement

· norm group composition

· derived scores

· scales to detect faking

· item analysis

· cultural bias

• The excerpts are drawn from tests such as:

· Wechsler Intelligence Scale for Children

· Peabody Picture Vocabulary Test

· 16PF

· Stanford Binet Intelligence Scale

· MMPI

· Beck Depression Inventory

· Stanford Achievement Test Series

· KeyMath

· and many others!

Author: Ann Silverlake. Subject area: Psychology; Research & Methodology in Psychology.

Information

Publisher
Routledge
Year
2016
ISBN
9781351970860
Edition
1

Exercise 1
Test-Retest Reliability

Behavior Rating Profile¹

Guideline

Test-retest reliability measures the stability of test scores over time. To estimate this type of reliability, a test is administered twice to a group of examinees — generally with a week or two between the two administrations. The degree of reliability is usually expressed with a correlation coefficient. Note that when a correlation coefficient is used to describe reliability, it is called a "reliability coefficient," or, in this case, a "test-retest reliability coefficient." See Appendix A to review the correlation coefficient before attempting this exercise.
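As a rough illustration of the computation described above, the following sketch correlates scores from two administrations of the same test. The scores are invented for illustration only and do not come from any test manual; the function name `pearson_r` is likewise an assumption of this sketch.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same six examinees, tested two weeks apart
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"Test-retest reliability coefficient: {test_retest_r:.2f}")
```

A coefficient near 1.00 would suggest that examinees kept roughly the same relative standing across the two administrations; a coefficient near 0 would suggest little stability.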

Background Notes

The Behavior Rating Profile, 2nd edition (BRP-2) is designed for rating students who display disturbed behavior. On one scale, parents rate their children on items such as "Is verbally aggressive to parents" and "Is shy; clings to parents." On another scale, teachers rate the children on items such as "Is an academic underachiever" and "Doesn't follow class rules." On three other scales, children rate themselves in relation to their home lives (e.g., "I often break rules set by my parents"), school lives (e.g., "My teachers give me work that I cannot do"), and peers (e.g., "Other kids don't seem to like me very much").

Excerpt from the Manual

In the test-retest method, a test is administered to the same group of students on two occasions. A specified period of time is permitted to elapse between administrations, and the results are analyzed to test for mean differences or to determine the correlation of the two sets of data. Kaufman (1980) used this procedure to investigate the stability reliability [i.e., test-retest reliability] of the BRP-2 scales with 36 Indiana high school students, 27 of their parents, and 36 of their teachers, permitting two weeks to intervene between administrations... The resulting coefficients, reported in Table 4.3, range from .78 to .91 with only one coefficient falling below the .80 demarcation. These data provide evidence of the stability of the BRP-2 scales when they are used with adolescents. [See Table 4.3 on the next page.]
Table 4.3 Delayed Test-Retest Reliability of the BRP-2 Scales with Adolescents (decimals omitted)

BRP-2 Scale                      r
Parent Rating Scale              84
Teacher Rating Scale             91
Student Rating Scales: Home      78
Student Rating Scales: School    83
Student Rating Scales: Peer      86

Questions:
  1. Which one of the scales is the most reliable? Explain.
  2. Which one of the scales is the least reliable? Explain.
  3. In your opinion, are all the scales adequately reliable? Explain.
  4. The excerpt presents the results of only one of a number of reliability studies described in the manual for the BRP-2. In your opinion, is this one study sufficient or are others needed? Explain.
  5. In Table 4.3, decimals have been omitted. If they were not omitted, what would the reliability coefficient be for the Parent Rating Scale?
  6. The test-retest reliability coefficients are based on a two-week interval. Do you think the coefficients would be higher or lower if a two-month interval had been used? Explain.
  7. Speculate on why test makers usually allow an interval of a week or two between the two administrations of the test instead of giving the same test twice in a row at one sitting.
  8. If you were considering using this instrument, what other types of reliability coefficients, if any, would you like to see in the manual? Explain.
  9. In general, how important is test-retest reliability information for selecting a scale or test? Would you consider it a serious flaw if a manual did not contain information on this topic? Explain.
  10. If you have a measurement textbook, do the authors suggest a minimum acceptable value for a test-retest reliability coefficient? If yes, what is it? If yes, do all of the coefficients in the excerpt exceed the minimum value?
¹ Brown, L., & Hammill, D. D. (1990). Examiner's Manual: Behavior Rating Profile (Second Edition). Austin, TX: Pro-Ed. Excerpt reprinted with permission. Copyright © 1990 by Pro-Ed.

Exercise 2
Interscorer Reliability

Wechsler Preschool and Primary Scale of Intelligence¹

Guideline

Scoring some tests involves making subjective judgments. For example, some subjectivity often enters into scoring essays, and, as a result, one English teacher might give an essay a grade of A while another might give it a grade of B. Such a lack of agreement indicates a weakness in interscorer reliability (i.e., the consistency of scores from one scorer to another).
Interscorer reliability is usually judged by having a set of examinees' responses to the test scored by two or more scorers and correlating the two sets of scores by computing a correlation coefficient. Note that when a correlation coefficient is used for this purpose, it is called an "interscorer reliability coefficient." See Appendix A to review the correlation coefficient before attempting this exercise.
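The procedure can be sketched in a few lines of code. In this minimal example, two scorers independently score the same eight responses, and the two sets of scores are correlated. All scores are hypothetical, and the function name `interscorer_r` is an assumption of this sketch. The proportion of exact agreement is included only as a supplementary descriptive figure; it is not part of the definition given above.

```python
from math import sqrt
from statistics import mean


def interscorer_r(scorer_a, scorer_b):
    """Pearson correlation between two scorers' scores for the same responses."""
    mean_a, mean_b = mean(scorer_a), mean(scorer_b)
    dev_a = [s - mean_a for s in scorer_a]
    dev_b = [s - mean_b for s in scorer_b]
    numerator = sum(a * b for a, b in zip(dev_a, dev_b))
    denominator = sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))
    return numerator / denominator


# Hypothetical scores that two scorers assigned to the same eight responses
scorer_a = [4, 7, 5, 9, 6, 3, 8, 5]
scorer_b = [5, 7, 4, 9, 6, 3, 7, 6]

r = interscorer_r(scorer_a, scorer_b)

# Supplementary: share of responses on which both scorers gave the identical score
exact_agreement = sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)

print(f"Interscorer reliability coefficient: {r:.2f}")
print(f"Proportion of exact agreement: {exact_agreement:.2f}")
```

Note that the two figures can diverge: a correlation reflects whether the scorers rank the responses similarly, so it can be high even when the scorers often differ by a point.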

Background Notes

The Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) is an individually administered intelligence test for young children. The test administrator observes an individual examinee's responses and scores them.

Excerpt from the Manual

Most WPPSI-R subtests involve straightforward and quite objective scoring; however, some subtests are subjectively scored, and are therefore more vulnerable to scoring error. For these subtests, which include Comprehension, Vocabulary, Similarities, and Mazes, it was necessary to evaluate interscorer reliability. In addition, previous research with the WPPSI indicated a low rate of scoring agreement on the Geometric Design subtest (Sattler, 1976). A more objective set of scoring rules and procedures was created for this subtest, and its effect on scorer agreement also was evaluated.
To assess the interscorer reliability of the Comprehension, Vocabulary, Similarities, and Mazes subtests, a sample of 151 cases (83 males and 68 females) stratified by age was randomly selected from all cases collected for the standardization. For the Geometric Design subtest, a sample of 188 cases (105 males and 83 females) was randomly selected. A group of research scorers was trained and given practice in scoring the subtests. The cases were subdivided by age to control for age effects, and two scorers were selected at random to score all the cases in each age group.
To ensure that scorings were independent, any previous scoring notations on standardization Record Forms were masked, leaving only the verbatim responses on the Verbal subtests, the performance times and tracing on Mazes, and the actual drawings on Geometric Design. Scorers in the study recorded their scores on separate forms so that they never saw each other's scores. . . .
Interscorer reliability coefficients were as follows: .96 on Comprehension, .94 on Vocabulary, .96 on Similarities, .94 on Mazes, and .88 on Geometric Design. These results indicate that the scoring rules for these subtests are objective enough for different scorers to produce similar results.
Questions:
  1. Why was scorer agreement examined for only some of the WPPSI-R subtests?
  2. Cases were selected at random. What is random selection?
  3. Cases were selected from all cases collected for the standardization. What do you think the "standardization" is?
  4. Is it important to know that the research scorers were trained and given practice in scoring the subtests? Explain.
  5. How many scorers scored the cases in each age group? In your opinion, is this an adequate number?
  6. The responses had been previously scored. Is it important to know that the research scorers were not allowed to see the previous scoring notations? Why? Why not?
  7. Is it important to know that the research scorers did not see each other's scores? Why? Why not?
  8. On which subtest was the interscorer reliability the lowest? Explain.
  9. Overall, do you think that the interscorer reliability is adequate? Explain.
¹ Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R). Copyright © 1993 by The Psychological Corporation. Reproduced by permission. All rights reserved. "Wechsler Preschool and Primary Scale of Intelligence" and "WPPSI-R" are registered trademarks of The Psychological Corporation.

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Table of Contents
  5. Introduction
  6. 1. Test-Retest Reliability Behavior Rating Profile
  7. 2. Interscorer Reliability Wechsler Preschool and Primary Scale of Intelligence
  8. 3. Internal Consistency and Test-Retest Reliability Occupational Aptitude Survey and Interest Schedule
  9. 4. Internal Consistency Reliability (Cronbach's Alpha) The Sixteen Personality Factor Questionnaire
  10. 5. Concurrent Validity and Test-Retest Reliability Reading and Arithmetic Indexes (12)
  11. 6. Concurrent Validity Thurstone Test of Mental Alertness
  12. 7. Predictive Validity Wechsler Intelligence Scale for Children
  13. 8. Content Validity: I Test of Written Language
  14. 9. Content Validity: II Boehm Test of Basic Concepts
  15. 10. Construct Validity: I Comprehensive Receptive and Expressive Vocabulary Test
  16. 11. Construct Validity: II Gray Oral Reading Tests
  17. 12. Construct Validity: III Beck Depression Inventory
  18. 13. Percentile Ranks Test of Pragmatic Language
  19. 14. Stanines Flanagan Aptitude Classification Test
  20. 15. IQ Scores Wechsler Intelligence Scale for Children
  21. 16. Derived Scores and the Normal Curve Peabody Picture Vocabulary Test
  22. 17. Grade Equivalents KeyMath
  23. 18. Age Equivalents Vineland Adaptive Behavior Scales
  24. 19. Norm Group Composition: I The Adaptive Behavior Evaluation Scale
  25. 20. Norm Group Composition: II The Sixteen Personality Factor Questionnaire
  26. 21. Standard Error of Measurement: I Peabody Picture Vocabulary Test
  27. 22. Standard Error of Measurement: II Behavior Dimensions Scale-School Version
  28. 23. Standard Error of Measurement and Alternate-Forms Reliability Stanford Achievement Test Series
  29. 24. Significance of Intra-Ability Difference Scores Gray Oral Reading Tests
  30. 25. Use of a Bias Review Panel Personality Assessment Inventory
  31. 26. Pretesting Items to Reduce Bias SRA Pictorial Reasoning Test
  32. 27. Procedures to Eliminate Bias Stanford Achievement Test Series
  33. 28. Scales for Detecting Faking Tennessee Self-Concept Scale
  34. 29. Experiment on Faking Survey of Interpersonal Values
  35. 30. Social Desirability Scale The Sixteen Personality Factor Questionnaire
  36. 31. Item Omissions and Validity Minnesota Multiphasic Personality Inventory
  37. 32. Lie Scale Minnesota Multiphasic Personality Inventory
  38. 33. Item Analysis: I The Attention Deficit Disorders Evaluation Scale-Home Version
  39. 34. Item Analysis: II Comprehensive Receptive and Expressive Vocabulary Test
  40. 35. Equivalence of Editions Self-Directed Search
  41. 36. Presenting Intelligence Test Items Stanford-Binet Intelligence Scale
  42. 37. Testing Conditions Minnesota Multiphasic Personality Inventory
  43. 38. Establishing Rapport During Test Administration Woodcock-Johnson Tests of Cognitive Ability
  44. 39. Responsibility for Test Security Woodcock Language Proficiency Battery
  45. Appendix A Review of Basic Statistics