1 Why Measurement Is Fundamental
Doctoral students in Malaysia tell of a group of rather hard-nosed social science professors who, during dissertation defenses, insist on cross-examining candidates on the nature of the data they are analyzing. In particular, they enquire as to whether the data are really interval or merely ordinal in nature. Apparently, this rather old-fashioned disposition has been the undoing of a number of doctoral defenses; candidates who could not argue for the interval nature of their data were required to redo their statistical analyses, replacing Pearson's r with Spearman's rho and so on. Most professors in the Western world, at least in education, psychology, and the other human sciences, seem to have given up quibbling about such niceties: Pearson's r seems to work just as well with all sorts of data, as SPSS doesn't know where the data come from, and apparently many of its users don't either. The upside of this hard-nosed disposition is that many of these professors now realise that measures derived from Rasch analyses may be considered interval and therefore permit the use of the wide array of statistical calculations that abound in the social sciences. Unfortunately, however, measurement is not routinely taught in standard curricula in the Western world, and the fallback position is to analyze ordinal data as if they were interval measures.
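The distinction those professors press, Pearson's r versus Spearman's rho, can be made concrete in a few lines. The sketch below (pure Python, with hypothetical ratings assumed for illustration, not drawn from the text) shows that Spearman's rho is simply Pearson's r computed on ranks, so it depends only on the ordering of the scores: stretching the spacing between ordinal categories changes Pearson's r but leaves rho untouched.

```python
# Spearman's rho is Pearson's r applied to ranks, so it respects only
# the ordering of the data, not the (possibly unequal) spacing.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# Hypothetical ordinal scores: same ordering, very different spacing.
raw = [1, 2, 3, 4, 5]
stretched = [1, 2, 3, 4, 50]  # a monotone relabelling of 'raw'
covariate = [2, 1, 3, 5, 4]

print(pearson_r(raw, covariate))        # 0.8
print(pearson_r(stretched, covariate))  # approx 0.39: spacing matters
print(spearman_rho(raw, covariate))     # 0.8
print(spearman_rho(stretched, covariate))  # 0.8: ordering is all that counts
```

Pearson's r silently assumes the numerals carry interval information; rho does not, which is exactly why those examiners demanded the substitution.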
It seems, notwithstanding those old-fashioned professors and a small number of measurement theorists, that for more than half a century, social science researchers have managed to delude themselves about what measurement actually is. In our everyday lives, we rely both explicitly and implicitly on calibrated measurement systems to purchase gasoline, buy water, measure and cut timber, buy lengths of cloth, assemble the ingredients for cooking, and administer appropriate doses of medicine to ailing relatives. So how is it that when we go to university or the testing company to conduct social science research, undertake some psychological investigation, or implement a standardised survey, we then go about treating and analyzing those data as if the requirements for measurement that served us so well at home in the morning no longer apply in the afternoon? Why do we change our definition of and standards for measurement when the human condition is the focus of our attention?
Measurement systems are ignored when we routinely express the results of our research interventions in terms of probability levels of p < 0.01 or p < 0.05, or, better yet, as effect sizes. Probability levels indicate only how likely (or unlikely) it is that A is more than B or that C is different from B, and effect size is meant to tell us by how much the two samples under scrutiny differ. Instead of focusing on constructing measures of the human condition, psychologists and others in the human sciences have focused on applying sophisticated statistical procedures to their data. Although statistical analysis is a necessary and important part of the scientific process, and the authors in no way wish to replace the role that statistics play in examining relations between variables, the argument throughout this book is that quantitative researchers in the human sciences focus too narrowly on statistical analysis and are not concerned nearly enough about the nature of the data on which they use these statistics. Therefore, it is not the authors' purpose to replace quantitative statistics with Rasch measurement but rather to refocus some of the time and energy used for data analysis on the prerequisite construction of quality scientific measures.
Those hard-nosed professors mentioned earlier, of course, fall back on the guidelines learned from S.S. Stevens (1946). Every student of Psychometrics 101 or Quantitative Methods 101 has Stevens's lesson ingrained forever. In short, Stevens defined measurement as the assignment of numbers to objects or events according to a rule, and held that some form of measurement thereby exists at each of four levels: nominal, ordinal, interval, and ratio. By now, most of us accept that ratio-level measurement is likely to remain beyond our capacity in the human sciences, yet most of us assume that the data we have collected belong to interval-level scales.
Still, it remains puzzling that those who set themselves up as scientists of the human condition, especially those in psychological, health, and educational research, would accept their ordinal-level "measures" without any apparent critical reflection, when they are not really measures at all. Perhaps we should all read Stevens himself (1946) a little more closely. "As a matter of fact, most of the scales used widely and effectively by psychologists are ordinal scales" (p. 679). He then specified that the only statistics "permissible" for ordinal data were medians and percentiles, leaving means, standard deviations, and correlations appropriate for interval or ratio data only. And, even more surprisingly, "The rank-order correlation coefficient is usually deemed appropriate to an ordinal scale, but actually this statistic assumes equal intervals between successive ranks and therefore calls for an interval scale" (p. 678). Can it be clearer than this: "With the interval scale we come to a form that is 'quantitative' in the ordinary sense of the word" (p. 679)? This is also our point: only with "interval" do we get "quantitative" in the ordinary sense, the sense in which we use scientific measures in our everyday lives. So why are social scientists left in a state of confusion?
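Stevens's restriction of ordinal data to medians and percentiles has a simple operational reading: a statistic is defensible for an ordinal scale only if it survives any order-preserving recoding of the categories, since only the ordering, not the spacing, is known. A minimal sketch (with hypothetical Likert responses assumed for illustration) shows the median passing this test while the mean fails it.

```python
# A statistic "permissible" for ordinal data should be invariant under
# any strictly increasing relabelling of the categories.
from statistics import mean, median

# Hypothetical Likert responses coded 1-5 (illustrative, not from the text).
responses = [1, 2, 2, 3, 4, 4, 5]

# A strictly increasing recoding: the ordinal information is unchanged.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 100}
recoded = [recode[r] for r in responses]

print(median(responses), median(recoded))  # both pick the middle category, 3
print(mean(responses), mean(recoded))      # 3 vs approx 18.3: spacing-dependent
```

The mean (and with it the standard deviation and Pearson correlation) changes with every arbitrary choice of category spacing, which is precisely why Stevens reserved those statistics for interval or ratio scales.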
Unfortunately, in this same seminal article, Stevens then blurred these ordinal/interval distinctions by allowing us to invoke "a kind of pragmatic sanction: In numerous instances it leads to fruitful results" (p. 679). He added a hint of a proviso: "When only rank order of data is known, we should proceed cautiously with our statistics, and especially with the conclusions we draw from them" (p. 679). It appears that his implicit "permission" to treat ordinal data as if they were interval was the only conclusion to reach the social scientists: scientists who were so obviously desperate to use their sophisticated statistics on their profusion of attitude scales.
One reasonably might expect that those who see themselves as social scientists would aspire to be open-minded, reflective, and, most importantly, critical researchers. In empirical science, the issue of measurement would seem to be somewhat paramount. However, many attempts to raise these and "whether our data constitute measures" issues result in the abrupt termination of opportunities for further discussion, even in forums specifically identified as focusing on measurement, quantitative methods, or psychometrics. Is the attachment of our field to the (mis?)interpretation of Stevens, the blatant disregard of the fact that ordinal data do not constitute measurement, merely another case of the emperor's new clothes (Stone, 2002)? Let's look at the individual components of that tradition: what is routine practice, what the definition of measurement implies, and the status of each of the ubiquitous four levels of measurement.
Under the pretense of measuring, the common practice has been for psychologists to describe the raw data at hand. They report how many people answered the item correctly (or agreed with the prompt), how highly related one response is to another, and what the correlation is between each item and the total score. These mere descriptions have chained our thinking to the level of raw data, and raw data are not measures. Although psychologists generally accept counts as "measurement" in the human sciences, this usage cannot replace measurement as it is known in the physical sciences. Instead, the flurry of activity and the weight of scientific importance have been unduly assigned to statistical analyses rather than to measurement. This misemphasis, coupled with unbounded faith in the attribution of numbers to events as sufficing for measurement, has blinded psychologists, in particular, to the inadequacy of these methods. Michell (1997) is quite blunt about this in his paper, titled "Quantitative Science and the Definition of Measurement in Psychology", in which psychologists' "sustained failure to cognize relatively obvious methodological facts" is termed "methodological thought disorder" (p. 374). The question remains: Is it po...