Theory of Mental Tests
eBook - ePub

  1. 508 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

This classic volume outlines, for both students and professionals, the mathematical theories and equations that are necessary for evaluating a test and for quantifying its characteristics. The author presents formulas for evaluating both the reliability and the validity of tests. He also provides the means for evaluating the reliability and validity of total test scores and for analyzing individual items. The work remains one of the few books on classical test theory to discuss applications, "true score" theory, the effect of test length on reliability and validity, and the effects of univariate and multivariate selection on validity.

Theory of Mental Tests by Harold Gulliksen is available in PDF and/or ePUB format.

Information

Publisher: Routledge
Year: 2013
ISBN: 9781136461040
Edition: 1

1
Introduction

It is interesting to note that during the 1890’s several attempts were made in this country to utilize the new methods of measurement of individual differences in order to predict college grades. J. McKeen Cattell and his student Clark Wissler tried a large number of psychological tests and correlated them with grades in various subjects at Columbia University; see Cattell (1890), Cattell and Farrand (1896), and Wissler (1901). The correlations between the psychological tests and the grades were around zero, the highest correlation being .19. A similar attempt by Gilbert (1894), at Yale, produced similarly disappointing results.
Scientific confidence in the possibilities of measuring individual differences revived in this country with the introduction of the Binet scale and the quantitative techniques developed by Karl Pearson and Charles Spearman at the beginning of the twentieth century. Nearly all the basic formulas that are particularly useful in test theory are found in Spearman’s early papers; see Spearman (1904a), (1904b), (1907), (1910), and (1913). Since then, development of both the theory and the practical aspects of aptitude and achievement testing has progressed rapidly. Aptitude and achievement tests are widely used in education and in industry.
Since 1900 great progress has been made toward a unified quantitative theory that describes the behavior of test items and test scores under various conditions. This mathematical rationale applicable to mental tests should not be confused with statistics. A good foundation in elementary statistics and elementary mathematics is a prerequisite for work in the theory of mental tests. In addition, as the theory of mental tests is developed, the necessity arises for various statistical criteria to determine whether or not a given set of test data agrees with the theory, within reasonable sampling limits. The theory, however, must first be developed without consideration of sampling errors, and then the statistical problems in conjunction with sampling can be considered.
This book deals with the mathematical theory and statistical methods used in interpreting test results. There are numerous non-quantitative problems involved in constructing aptitude or achievement tests that are not considered here. Non-quantitative problems such as choice of item types or matching the examination to the objectives of a curriculum are discussed in the University of Chicago Manual of Examination Methods (1937); Englehart (1942); Hawkes, Lindquist, and Mann (1936); Hull (1928); Orleans (1937); Ruch (1929); and others. Therefore, no attempt is made here to familiarize the student with the various psychological and educational tests now available or with the scope of the many testing programs. Such material is surveyed in yearbooks by Buros (1936), (1937), (1938), (1941), and (1949); Hildreth (1939); Lee and Symonds (1934); the National Society for the Study of Education, the 17th Yearbook (1918); Ruger (1918); Whipple (1914), (1915); Freeman (1939); Mursell (1947); Ross (1947); Goodenough (1949); Cronbach (1949); and other general textbooks listed in the bibliography.
In constructing tests and in analyzing and interpreting their results, there are five major types of problems:
  1. Writing and selecting the test items.
  2. Assigning a score to each person.
  3. Determining the accuracy (reliability or error of measurement) of the test scores.
  4. Determining the predictive value of the test scores (validity or error of estimate).
  5. Comparing the results with those obtained using other tests or other groups of subjects. In making these comparisons, it is necessary to consider the effect of test length and group heterogeneity on the various measures of the accuracy and the predictive value of the test scores.
In dealing with any given test these problems would arise chronologically in the order in which they are given above. However, the theory of the selection of test items depends upon comparing them with some test score or scores; therefore it is convenient to consider first the theory dealing with the accuracy of these test scores. Similarly the evaluation of experimental methods of determining reliability and the discussion of practical methods of setting up parallel tests depend upon a theoretical concept of reliability and of parallel tests. Therefore, instead of beginning with practical problems of item selection, experimental methods of determining reliability, or of setting up parallel tests, we shall begin with the theoretical constructs.
An ideal model will be set up giving the measures of accuracy of test scores and the theoretical effects of changes in test length and in group heterogeneity. The theory of these changes will be derived from assumptions regarding parallel tests and selection procedures, without inquiring very closely into the experimental methods that are appropriate for realizing these assumptions. Beginning with Chapter 14, various practical problems relating to the construction of parallel tests, criteria for parallel tests, experimental methods of determining reliability, etc., will be considered. It is felt that postponing such practical considerations until the latter part of the book has the advantage of giving the student a firm foundation in theory first. Then on the basis of this familiarity with the ideal situation, various practical procedures can be evaluated in terms of the closeness with which they approximate the theoretically perfect method. To consider practical experimental procedures without such a grounding in the theoretical foundation leaves these procedures as approximations to something that is not yet clearly stated or understood.
The basic theoretical material on accuracy of test scores is presented in Chapters 2 through 5, which deal with the topics of test reliability and the error of measurement. The effect of test length upon reliability and validity is considered in Chapters 6 through 9, and the effect of group heterogeneity on measures of accuracy in Chapters 10 through 13. In these chapters we give only a theoretical definition of parallel tests, and we define reliability as the correlation between two parallel forms. This simplified presentation of the concept of parallel tests and of reliability makes it possible to concentrate on the theory of test reliability and test validity before taking up the short-cuts and approximations that are frequently used in actual practice. Practical problems of criteria for parallel tests are given in Chapter 14, and experimental methods of determining reliability when a parallel form is not used are considered in Chapters 15 and 16. Methods of scoring, scaling, and equating tests are considered in Chapters 18 and 19. Problems dealing with batteries of tests are considered in Chapter 20, and problems of item selection in Chapter 21.
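As a rough illustration of defining reliability as the correlation between two parallel forms, the sketch below simulates two such forms and correlates them. The normal distributions, the standard deviations of 10 and 5, the sample size, and the names `pearson`, `form_a`, and `form_b` are assumptions made for this example, not values from the text. Each form's score is taken to be a shared stable component plus an independent error.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed population: stable component with SD 10, error with SD 5.
stable = [rng.gauss(50.0, 10.0) for _ in range(5000)]

# Two parallel forms: same stable component, independent errors
# drawn with the same spread on each form.
form_a = [s + rng.gauss(0.0, 5.0) for s in stable]
form_b = [s + rng.gauss(0.0, 5.0) for s in stable]

reliability = pearson(form_a, form_b)
print(f"correlation between parallel forms: {reliability:.3f}")
```

Under these assumed variances the correlation should fall near 100 / (100 + 25) = 0.8, up to sampling fluctuation. A smaller error spread would push it toward 1.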

2
Basic Equations Derived from a Definition of Random Error

1. Introduction

We shall begin by assuming the conventional objective testing procedure in which the person is presented with a number of items to be answered. Each answer is scored as correct or incorrect, and a simple or a weighted sum of the correct answers is taken as the test score. The various procedures for determining which items to use and the best weighting methods will be considered later. For the present we assume that the numerical score is based on a count, one or more points for each correct answer and zero for each incorrect answer, and we turn our attention to the determination of the accuracy of this score.
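The scoring rule just described can be sketched in a few lines. The function name `score_test`, the five-item answer key, and the weights are hypothetical and chosen only to illustrate a simple count versus a weighted sum.

```python
def score_test(responses, key, weights=None):
    """Sum the weights of correctly answered items.

    With no weights given, every item counts one point, so the
    result is a simple count of correct answers. Incorrect answers
    contribute zero.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    if weights is None:
        weights = [1] * len(key)
    if len(weights) != len(key):
        raise ValueError("weights and key must have the same length")
    return sum(w for r, k, w in zip(responses, key, weights) if r == k)


# Hypothetical five-item test: items 1, 2, and 4 are answered correctly.
key = ["b", "d", "a", "c", "a"]
responses = ["b", "d", "c", "c", "b"]

simple_score = score_test(responses, key)
weighted_score = score_test(responses, key, weights=[1, 2, 1, 3, 1])
print(simple_score, weighted_score)  # prints "3 6"
```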
When psychological measurement is compared with the type of measurement found in physics, many points of similarity and difference are found. One of the very important differences is that the error of measurement in most psychological work is very much greater than it is in physics. For example, Jackson and Ferguson (1941) resorted to specially constructed “rubber rulers” in order to reduce the reliability of length measurements to values appreciably below .99. The estimation of the error in a set of test scores and the differentiation between “error” and “true” score on a test are central problems in mental measurement.

2. The basic assumption of test theory

It is necessary to make some assumption regarding the relationship between true scores and error scores. Let us define three basic symbols.
  • $X_i$ = the score of the $i$th person on the test under consideration.
  • $T_i$ = the true score of the $i$th person on this test.
  • $E_i$ = the error component for the same person.
In defining these symbols it is assumed that the gross score has two components. One of these components (T) represents the actual ability of the person, a quantity that will be relatively stable from test to test as long as the tests are measuring the same thing. The other component (E) is an error. It is due to the various factors that may cause a person sometimes to answer correctly an item that he does not know, and sometimes to answer incorrectly an item that he does know. So far, it will be observed, there is no proposition subject to any experimental check. We have simply said that there is some number T that would be the person’s correct score, and that the obtained score (X) does not necessarily equal T.
It is possible to make many different assumptions regarding the relationship between the three terms X, T, and E. The one made in test theory is the simplest possible assumption, namely, that
$$X_i = T_i + E_i$$
This equation may be regarded as an assumption that states the relationship between true and error score; or it may be regarded as an equation defining what we are going to mean by error. In other words, once we accept the concept of a tr...
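A minimal sketch of the additive relation among X, T, and E follows. It draws hypothetical true and error scores, forms the observed score as their sum, and then recovers the error as the difference between observed and true score, which is the sense in which the equation can be read as a definition of error. The normal distributions, the zero-mean error, the numeric parameters, and the function name `simulate_scores` are assumptions of this example, not statements from the text.

```python
import random


def simulate_scores(n_persons, rng, true_sd=10.0, error_sd=5.0):
    """Return (observed, true, error) score lists for n_persons."""
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_persons)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_persons)]
    # Observed score is the sum of the true and error components.
    observed = [t + e for t, e in zip(true_scores, errors)]
    return observed, true_scores, errors


rng = random.Random(0)  # fixed seed so the run is repeatable
observed, true_scores, errors = simulate_scores(1000, rng)

# Reading the equation as a definition: error is whatever separates
# the obtained score from the true score.
recovered_errors = [x - t for x, t in zip(observed, true_scores)]
max_gap = max(abs(r - e) for r, e in zip(recovered_errors, errors))

# How many persons have an obtained score that differs from the true score.
n_different = sum(1 for x, t in zip(observed, true_scores) if x != t)
print(f"largest recovery gap: {max_gap:.2e}; scores differing from T: {n_different}")
```

In practice only the obtained scores are available, so a simulation like this is useful mainly for checking formulas against a case where the true scores are known by construction.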

Table of contents

  • Cover
  • Title
  • Copyright
  • Contents
  • 1. INTRODUCTION
  • 2. BASIC EQUATIONS DERIVED FROM A DEFINITION OF RANDOM ERROR
  • 3. FUNDAMENTAL EQUATIONS DERIVED FROM A DEFINITION OF TRUE SCORE
  • 4. ERRORS OF MEASUREMENT, SUBSTITUTION, AND PREDICTION
  • 5. VARIOUS INTERPRETATIONS OF THE ERROR OF MEASUREMENT
  • 6. EFFECT OF DOUBLING TEST LENGTH ON OTHER TEST PARAMETERS
  • 7. EFFECT OF TEST LENGTH ON MEAN AND VARIANCE (GENERAL CASE)
  • 8. EFFECT OF TEST LENGTH ON RELIABILITY (GENERAL CASE)
  • 9. EFFECT OF TEST LENGTH ON VALIDITY (GENERAL CASE)
  • 10. EFFECT OF GROUP HETEROGENEITY ON TEST RELIABILITY
  • 11. EFFECT OF GROUP HETEROGENEITY ON VALIDITY (BIVARIATE CASE)
  • 12. CORRECTION FOR UNIVARIATE SELECTION IN THE THREE-VARIABLE CASE
  • 13. CORRECTION FOR MULTIVARIATE SELECTION IN THE GENERAL CASE
  • 14. A STATISTICAL CRITERION FOR PARALLEL TESTS
  • 15. EXPERIMENTAL METHODS OF OBTAINING TEST RELIABILITY
  • 16. RELIABILITY ESTIMATED FROM ITEM HOMOGENEITY
  • 17. SPEED VERSUS POWER TESTS
  • 18. METHODS OF SCORING TESTS
  • 19. METHODS OF STANDARDIZING AND EQUATING TEST SCORES
  • 20. PROBLEMS OF WEIGHTING AND DIFFERENTIAL PREDICTION
  • 21. ITEM ANALYSIS
  • BIBLIOGRAPHY
  • APPENDIX A. EQUATIONS FROM ALGEBRA, ANALYTICAL GEOMETRY, AND STATISTICS, USED IN TEST THEORY
  • APPENDIX B. TABLE OF ORDINATES AND AREAS OF THE NORMAL CURVE
  • APPENDIX C. SAMPLE EXAMINATION QUESTIONS IN STATISTICS FOR USE AS A REVIEW EXAMINATION AT THE BEGINNING OF THE COURSE IN TEST THEORY
  • APPENDIX D. SAMPLE EXAMINATION ITEMS IN TEST THEORY
  • ANSWERS TO PROBLEMS
  • AUTHOR INDEX
  • TOPIC INDEX