Section IV
Psychological tests
Lists, descriptions and evaluations
In this final section of the handbook I scrutinise some of the best-known tests within the various categories which have been discussed throughout this book. I describe them, without destroying test confidentiality, and discuss how they were constructed, which has considerable bearing on their utility. I examine them against the criteria of psychometric efficiency - reliability, validity and quality of standardisation - and conclude by discussing where each may be best used in applied psychology and research or whether it should be used at all, since many famous tests turn out to be quite inadequate. Where relevant these considerations embrace critical research findings with the tests.
In a book of this length I have had to be selective in my discussion of tests. I have chosen those tests which are generally regarded as the best within their fields and other tests which are of the highest technical standards or are of particular interest either on account of their method of construction or because they attempt to measure variables of psychological importance.
However, the fact that this section does not include every published test rests on more than convenience. This is a handbook of psychological testing and I hope that it will enable readers to make their own informed judgements about tests, both in respect of their technical qualities and their use and value in applied psychology. Furthermore, as examination of the Mental Measurements Yearbooks has always shown, most psychological tests are neither reliable nor valid.
Recently the British Psychological Society has published two books reviewing ability and personality tests in respect of their suitability for use in occupational psychology and vocational guidance (Bartram et al., 1992, 1995). These reviews by various psychometric specialists cover almost all tests which are available in this country. These are valuable reference books, but they are complementary to the discussions of tests in this handbook which are from the rigorous psychometric viewpoint that tests must measure only externally validated, psychologically meaningful factors rather than simply reliable variables.
Chapter 23
Intelligence tests
As was made clear in Chapter 12, intelligence testing is the longest established field of psychological testing. I shall not describe a large number of intelligence tests, simply because there are a few tests which are regarded as valid measures by almost all workers in the field and which, in terms of convenience and length, are suited to a wide variety of applications.
Individual Intelligence Tests
Two individual intelligence tests have dominated the measurement of intelligence, especially in the clinical and developmental fields: the Wechsler Tests (Wechsler, 1958) and the Stanford-Binet (Terman and Merrill, 1960). Both these tests were developed before the modern elaborations of the factor analytic account of intelligence, and more recently a new test, the British Ability Scale (Elliot, 1983), has been constructed to take these elaborations into account. These three tests will now be described and evaluated.
The Wechsler Scales
The first of Wechsler's scales was the Wechsler Bellevue Scale of 1938. Since then, a new adult scale, the Wechsler Adult Intelligence Scale (Wechsler, 1958), the WAIS, and two children's scales, the WISC and the WPPSI, for pre-school and primary school children, have been produced. I have not given exact dates and references for these tests, since they are constantly under revision, but the latest editions and manuals are available from the Psychological Corporation, New York. These are, inter alia, the WISC-Rs, for Scottish children, the WISC-Ruk for use in other parts of the UK, the WPPSI-Ruk, the WAIS-Ruk and computerised forms of the adult and children's scales, the WAIS-R Micro and the WISC-R Micro. Wechsler (1944, 1958, 1974) has clear accounts of the rationale of these tests, as do the test manuals. These tests together allow intelligence testing from the ages of 4 years to beyond 70.
The WAIS
I shall describe the adult scale, the WAIS, since the other scales follow this format as far as is possible, their items simply being suited for children and thus of less difficulty.
Variables Verbal intelligence, performance intelligence and an overall intelligence score. In addition subscale scores and profiles of scores can be used.
Subscales There are eleven subscales, divided into verbal and performance scales. The verbal scales are: information, comprehension, arithmetic, similarities, digit span, and vocabulary. The performance tests are: digit symbol, picture completion, block design, picture arrangement and object assembly.
Description of subscales I shall describe the kinds of items found in these scales. However, in this chapter and the others describing ability tests I shall not discuss the actual items in the test, as this would destroy their confidentiality. Instead I shall describe similar types of items.
1. The verbal subtests
a. Information: 29 items This is basically a test of general knowledge, involving the authorship of books, parts of the Bible and geographic locations. Such a test is clearly a test of gc, crystallised ability, rather than the basic reasoning ability of the brain, fluid ability, gf. Although scores on this scale are likely to be affected by social class and education, the rationale for its inclusion in an intelligence test is that, other things being equal, the intelligent person will be more knowledgeable since she or he learns more easily and can make more connections between new and old material than the less intelligent.
It should also be realised that the efficiency of the WAIS in predicting educational success is augmented by such scales, which test what Cattell (1971) calls the investment of g in skills valued by the culture, itself a basic demand of educational success.
b. Comprehension: 14 items This test requires the subject to explain the meaning of proverbs, e.g. 'it's no good crying over spilt milk', and to solve practical problems. Some of the questions involve perhaps slightly more than general knowledge, for example, of taxes and the price of land but, in general, this is a good test of reasoning where subjects are required to get to the heart of a problem and, as such, it would be expected to load highly on crystallised ability.
c. Arithmetic: 14 items These items involve traditional arithmetic problem solving: if water flows into a bath at 2 gallons a minute and the bath holds 200 pints how long will it take to fill? Obviously such a test depends on knowledge of simple arithmetic but such knowledge does depend on gc at least in part.
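The bath item above mixes units, which is part of what makes it a problem rather than a mere sum. A minimal worked sketch follows, assuming 8 pints to the gallon; the function and parameter names are illustrative only and are not part of any test.

```python
PINTS_PER_GALLON = 8  # assumption stated in the lead-in

def minutes_to_fill(capacity_pints, flow_gallons_per_minute):
    """Time in minutes to fill a bath, converting the capacity to gallons first."""
    capacity_gallons = capacity_pints / PINTS_PER_GALLON
    return capacity_gallons / flow_gallons_per_minute

# 200 pints = 25 gallons; at 2 gallons a minute this takes 12.5 minutes.
print(minutes_to_fill(200, 2))  # 12.5
```

The subject must notice the change of unit before dividing, which is why the item demands more than recall of number facts.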
d. Similarities: 13 items In this test subjects are required to point out what is similar about two things, for example a sonnet and a sonata. The correct answer involves hitting on the essential similarity, in this case that they are both creations, rather than the superficial, that they have a similar first syllable. Again this is likely to be a test of gc rather than gf.
e. Digit span: 17 items Subjects are required to repeat strings of digits after they have been read to them at a steady rate. One set of items is repeated as heard, another is repeated backwards. Clearly this is not a test of crystallised ability since few cultures value such an ability. However the fact that it loads on fluid ability is of considerable theoretical interest from the viewpoint of the nature of intelligence. It is noteworthy that, as Jensen (1980) points out, the backward digit span loads more highly on gf than does the forward span, presumably because the task requires more, and more complex, mental processing, a feature of all gf loaded tasks. Jensen (1980) argues that the digit span loading on g supports the notion that intelligence reflects the speed and integrity of information processing in the brain, a view, however, which is perhaps too simple in the light of EEG findings and work with reaction times (as is fully argued in Kline, 1990).
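The difference between the forward and backward tasks can be made concrete with a short sketch. It assumes, purely for illustration, that a subject's span is the length of the longest string repeated correctly; this scoring rule, the function names and the digit strings are my assumptions and are not taken from the WAIS manual.

```python
def is_correct(presented, response, backward=False):
    """True if the response reproduces the digits in the required order."""
    target = list(reversed(presented)) if backward else list(presented)
    return list(response) == target

def span(trials, backward=False):
    """Length of the longest string repeated correctly (0 if none).

    `trials` is a list of (presented, response) pairs of digit sequences.
    """
    passed = [len(p) for p, r in trials if is_correct(p, r, backward)]
    return max(passed, default=0)

# Made-up digit strings, not items from the test.
forward_trials = [([5, 8, 2], [5, 8, 2]), ([6, 4, 3, 9], [6, 4, 9, 3])]
backward_trials = [([2, 4], [4, 2]), ([6, 2, 9], [9, 2, 6]),
                   ([3, 2, 7, 9], [9, 7, 3, 2])]

print(span(forward_trials))                  # 3
print(span(backward_trials, backward=True))  # 3
```

The backward condition adds a transformation (reversal) to the retention that the forward condition already demands, which is the extra processing referred to above.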
f. Vocabulary: 40 items This is a straightforward vocabulary test and one which many practical psychologists would use if they were forced to estimate intelligence with a brief test, since it loads highly on both g factors. As was the case with information this is because the intelligent person learns better because she or he can connect new information to what is already possessed. It is this process which leads to the ever-widening disparity between the highly intelligent and those of low intelligence, as life proceeds.
Indeed as Cattell (1971) demonstrates, vocabulary loads highly on gf in early life but gradually shifts over to crystallised ability for obvious reasons. Jensen (1980) has argued that in adulthood vocabulary scores reflect childhood rather than present fluid ability since subjects with a high vocabulary score may be poor at dealing with novel problems. This is a possible explanation of this finding but it should be pointed out that this disparity in performance may arise from the fact that some subjects whose work is closely involved with words may simply develop a good vocabulary for this reason alone, intelligence not being much implicated. This scale is also useful in the assessment of premorbid intelligence in brain damaged patients (Crawford et al., 1988).
Conclusions concerning the verbal subtests These descriptions make it clear that the verbal intelligence score of the WAIS is essentially a measure of crystallised ability. The only exception to this is the digit span test. All these tests are concerned with intelligence as it is evinced in Western culture. It is clear, further, that these scales are to some extent influenced by social class and education, although to a lesser extent than is the case with school attainment tests. It is for all these reasons that the verbal intelligence scores of the WAIS and WISC correlate well with academic and occupational success (e.g. Vernon, 1961; Ghiselli, 1966), since these criteria are also examples of the investment of intelligence in culturally valued activities.
2. The performance subtests
That the WAIS has performance tests as well as the verbal measures is, undoubtedly, one of the great assets of the test. These tests are far less likely than the verbal measures to be influenced by social class or education. Indeed, as shall be seen below, these tests exemplify the definition of Cattell (1971) of how to test fluid ability - namely with items that are so familiar in a culture that social factors are of no importance or with items so unfamiliar that no subject will have encountered anything like them before taking the test.
g. Digit symbol: 90 items in 90 seconds In this speeded test symbols are paired with digits. With the examples before them subjects have to enter the correct digit against each symbol. The mental processing required for this task is within the capacity of most subjects so that the emphasis of the test is on speed. This test is likely to load highest on the cognitive speed factor, as well as gf, but it is entirely sensible that it should correlate with real life success.
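Because the coding itself is trivial, the score on such a test is in effect a count of items correctly completed before time runs out. The sketch below illustrates this under my own assumptions: the symbols, the key and the scoring function are hypothetical and are not those of the WAIS.

```python
def digit_symbol_score(key, items, responses):
    """Count correctly coded items among those attempted.

    key       -- mapping from symbol to digit (the examples shown to the subject)
    items     -- symbols in the order they appear on the answer sheet
    responses -- digits entered before time ran out, in order
    """
    # zip stops at the last attempted item, so unattempted items score nothing.
    return sum(1 for symbol, digit in zip(items, responses) if key[symbol] == digit)

key = {"+": 1, "o": 2, "x": 3}        # made-up symbols
items = ["x", "+", "o", "o", "x", "+"]
responses = [3, 1, 2, 3, 3]           # five attempted, the fourth wrong

print(digit_symbol_score(key, items, responses))  # 4
```

A faster subject simply attempts more items, so individual differences in the score largely reflect speed rather than the difficulty of the pairing.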
h. Picture completion: 21 items in 20 seconds Subjects are presented with pictures each with an element missing which has to be recognised within 20 seconds of presentation. This is a scale which demands familiarity with pictorial representation so that it could not be used in cross-cultural studies without first demonstrating that subjects could recognise the pictures (Deregowski, 1980). Such a test would be expected to load on fluid ability and perceptual speed. In Great Britain only highly deprived subjects would be obviously at a disadvantage.
i. Block design: 10 items with bonuses for rapid completion In this test subjects are presented with blocks and some designs which have to be made from the blocks. Bonus points can be scored for rapid completion. This is a skill that is largely unpractised so the test should load on fluid ability as well as on spatial and perceptual factors. There is an element of specific variance in this test since I have seen some subjects divide the pictures of the designs into rectangles, thus greatly simplifying the task.
j. Picture arrangement: 8 items with time bonuses for rapid completion Each item consists of a set of pictures which have to be arranged into a sequence such that they tell a story. Again for most subjects this is a novel task, so that it should load on gf.
k. Object assembly: 4 items with bonuses for rapid completion This test requires subjects to assemble pieces into a whole object, like a jigsaw. Such a test should load on spatial ability and gf.
Conclusions concerning performance subtests These performance tests are the type of tests that load on fluid ability, and which are unlikely to yield scores inflated by good educational and familial factors. Indeed the differences between the verbal and performance scores are often considered to be important but this will be discussed in a later section on the WAIS.
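The structure of the WAIS described above can be summarised in a small data structure. The item counts are those given in the subtest descriptions; the variable names are mine, and the sketch is only a convenient way of checking the description, not a scoring scheme.

```python
# Item counts as given in the subtest descriptions above.
WAIS_SUBTESTS = {
    "verbal": {
        "information": 29,
        "comprehension": 14,
        "arithmetic": 14,
        "similarities": 13,
        "digit span": 17,
        "vocabulary": 40,
    },
    "performance": {
        "digit symbol": 90,
        "picture completion": 21,
        "block design": 10,
        "picture arrangement": 8,
        "object assembly": 4,
    },
}

n_subtests = sum(len(scale) for scale in WAIS_SUBTESTS.values())
print(n_subtests)  # 11, i.e. six verbal and five performance subtests
```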
Norms
The 1955 version of the test contained norms based upon a sample of 1700 A...