The term ‘statistics’ is used in three senses. In everyday language it is usually taken to mean, in the words of the Concise Oxford Dictionary, ‘numerical facts systematically collected’. It may also refer, as in the title of this book, to the methods of collecting, classifying and analysing quantitative data (i.e. information). Or it may be the plural of ‘statistic’, a technical term (its purpose is explained on p. 3) referring to a characteristic of a sample which has been selected. In this book the methods of dealing with data concerned with people and their relationships are at the centre of attention. However, before these are described we need to establish two things: first, the purpose of statistical methods, and secondly, the kinds of data generated in the social field.
The Functions of Statistics
Statistical methods are used for two main purposes: to describe or summarise data in the most suitable way, and to make inferences from such data. As far as the former is concerned, the social scientist is often confronted with large quantities of data and has to know how to make sense of them. This is the case, for instance, when a social survey has been carried out, and it is also true of information obtained from official sources such as the Census of Population. Essentially, we need to change the form of the data so that their meaning can more easily be grasped. This can be done by calculating measures, such as percentages and averages, which summarise the data. We might also present the data in tabular or pictorial form, since these forms have greater impact. Thus, if we were interested in incomes, we might calculate the average income and a measure of how the incomes were spread out about that average value. We might also produce a table showing the numbers of people with incomes between specified limits, as is done in most official publications, and then draw a picture of this table so that the distribution of incomes could be observed more easily.
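As a rough illustration of these summary measures, the sketch below computes an average income, a measure of spread about that average and a banded frequency table. It is written in Python for concreteness; the income figures and the band width of 10,000 are hypothetical, chosen only so the arithmetic is easy to follow, and the population standard deviation (`statistics.pstdev`) is assumed here as one possible measure of spread.

```python
import statistics


def summarise_incomes(incomes, band_width):
    """Return the mean, the standard deviation and a frequency table of incomes."""
    mean = statistics.mean(incomes)
    # Population standard deviation: how far incomes are spread about the mean.
    spread = statistics.pstdev(incomes)
    # Count incomes falling in each band [lower, lower + band_width).
    table = {}
    for income in incomes:
        lower = (income // band_width) * band_width
        table[lower] = table.get(lower, 0) + 1
    return mean, spread, dict(sorted(table.items()))


# Hypothetical annual incomes (illustrative only, not real data).
incomes = [8000, 12000, 12500, 15000, 18000, 21000, 24000, 30000, 45000, 64500]
band_width = 10000
mean, spread, table = summarise_incomes(incomes, band_width)
print(f"Average income: {mean:.0f}")
print(f"Standard deviation: {spread:.0f}")
for lower, count in table.items():
    print(f"{lower:>6} to under {lower + band_width:>6}: {count}")
```

Here the average is 25,000 and most of the ten incomes fall in the 10,000 to 20,000 band; the table is exactly the kind of summary that could then be drawn as a chart. Empty bands (50,000 to 60,000 here) are simply omitted by this sketch.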
Where our interest lies in the relationships between aspects such as income, occupation, housing and social mobility, there is again a need to summarise data. For the simplest case of two items, the correlation coefficient (p. 154), which specifies the strength of the association between them, is a useful summarising measure, and patterns of correlation lend themselves well to graphical presentation. However, descriptive statistics is even more necessary when the relationship between more than two aspects is being examined, since in undigested form the data would be very difficult to absorb.
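To show how a single number can summarise the association between two items, the following Python sketch computes the Pearson product-moment correlation coefficient directly from its definition. Whether this is the exact form developed on p. 154 is not assumed; the schooling and income figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: years of schooling and income (thousands) for five people.
schooling = [10, 11, 12, 13, 16]
income = [14, 16, 19, 22, 30]
print(f"r = {pearson_r(schooling, income):.3f}")
```

The coefficient runs from -1 (perfect negative association) through 0 (no linear association) to +1 (perfect positive association); these invented figures give a value close to +1.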
The second function of statistics, which will in fact be the subject of the major part of this book, is to make estimates of, and draw inferences about, the characteristics of populations on the basis of information obtained from samples which have been drawn from those populations according to a specific criterion. In statistical language the term population does not necessarily refer to people, but is used to describe the complete set of things (individuals, social groups, relationships, objects or observations) in which we are interested and to which we wish our results to apply, e.g. all the households in a city. A sample, on the other hand, is simply a part (or sub-set) of the population. Undoubtedly, the use of methods of statistical inference has been a great help in the development of the social sciences. There are several reasons for this, the most obvious of which is essentially practical.
The sociologist seeks to describe and analyse social relationships and group life in industrial societies. However, the modern nation state incorporates millions of inhabitants linked by a vast network of relationships. Considerations of time and money dictate that the social investigator cannot contact every resident of the country (or even of a city) in order to study a phenomenon of interest, e.g. marital breakdown. What is possible, though, is for him to specify precisely the category of persons or social groups, e.g. families and households, about which he wishes to generalise (the population), and then draw a (perhaps relatively small) sample from it. He can then proceed to investigate the phenomenon in appropriate depth by contacting the members of the sample. He may find, for instance, that within the sample marital breakdown has occurred more often for those couples where the partners come from diverse social backgrounds. However, since the primary interest is in the defined population and not simply in the sample, there is a need to make certain inferences from the characteristics of the latter to those of the former. This can be achieved with the help of statistical theory. Perhaps the most widely known examples where this type of inference is involved are provided by opinion polls, which seek to indicate the voting intentions of the nation’s electorate on the basis of information from samples of approximately 2,000 adults.
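The following Python sketch illustrates the basic idea of drawing a sample of about the size used in opinion polls and using it to estimate a population proportion. It assumes simple random sampling without replacement (one common method, not necessarily the one a given poll uses) and a hypothetical population of 100,000 electors in which exactly 30 per cent intend to vote for a particular party; in real work that population value is, of course, unknown.

```python
import random


def sample_proportion(population, n, rng):
    """Draw a simple random sample of size n (without replacement) and
    return the proportion of sampled units having the characteristic (coded 1)."""
    sample = rng.sample(population, n)
    return sum(sample) / n


# Hypothetical population: 1 = intends to vote for the party, 0 = does not.
population = [1] * 30_000 + [0] * 70_000
rng = random.Random(42)  # fixed seed so that the run is reproducible
estimate = sample_proportion(population, 2000, rng)
print(f"Sample estimate: {estimate:.3f} (population value 0.300)")
```

A different seed would give a slightly different sample and hence a slightly different estimate; describing how large such sample-to-sample variation is likely to be is precisely what the statistical theory referred to above provides.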
Sometimes the situation is rather different, in that the social scientist wishes to investigate the causal processes which may have generated the data. Yet the aim will still often be to generalise, so the procedures of inferential statistics are again required. In experimental research, for instance, the intention is to draw conclusions about relationships which will hold under similar conditions. Again, in non-experimental studies the social scientist may examine some, or even all, of the available cases but nevertheless seek to generalise more widely. For instance, perusal of the criminal statistics for England and Wales reveals that crime rates tend to be higher in urban than in rural areas, but inferential statistics are needed if one is to conclude that this association may be expected to apply generally in similar circumstances. Of course, the notion of ‘similar circumstances’ is vague here, and in advancing from the initial statistical evidence a basic task of the sociologist and criminologist is to clarify it and to explain the link between the determinants of crime and the differing features of urban and rural life.
This type of example can help to indicate the differing ways in which statistical procedures are used in social subjects as compared with disciplines such as physics and chemistry. In these latter fields experimentation is the basic method, but for a variety of reasons it can be used less often in the study of social phenomena. The sociologist most often seeks to analyse either data derived from naturally occurring situations (as is the case with official statistics or the findings of direct observation), or those generated by social surveys. In these cases there are typically many relevant factors, and statistical procedures are needed to analyse their relative influence. For instance, crime rates are no doubt affected not simply by urban and rural conditions, but also by the effectiveness of local police forces, local sentencing policies and other factors (some unrelated to the urban-rural dimension). What may initially appear to be a simple relationship often proves to be highly complex and difficult to interpret. Though procedures of statistical inference are used in virtually all scientific fields (e.g. to handle problems of measurement error), the need for them in the social field is particularly acute.
Once the broad nature of statistical inference is clarified, it becomes easier to specify, in a preliminary way, the two types of problem with which it deals: the estimation of population parameters using sample statistics, and tests of statistical hypotheses. In connection with the first topic, the notion of a parameter refers simply to some measurable characteristic of a population (e.g. a proportion or average). A statistic, on the other hand, is defined as a quantity calculated from the raw sample data (again a proportion or average would be an example) which may be used to estimate a parameter. In a study of overcrowding in a city the population might be defined as all dwellings within an identifiable geographical boundary. The parameter to be estimated might be the proportion of those dwellings which are overcrowded, according to a specific criterion. In proceeding to estimate this parameter, the social scientist might select a sample of dwellings in the city from a suitable list or map. He would then determine by direct investigation the proportion of overcrowded dwellings in the sample. The latter quantity would be the sample statistic. If the sample had been appropriately selected, he could then use statistical theory to estimate the parameter. This usually involves specifying a range of values (perhaps centred on the statistic) and attaching a high degree of confidence to the claim that the parameter, if its value became known, would be found to lie within that range. In summary, the investigator is able to estimate the required quantity, but because only part of the population has been covered, the estimate is subject to a certain calculable error: the sampling error.
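The range of values described above can be sketched as an approximate confidence interval for a proportion. This Python example assumes a simple random sample, a sample large enough for the normal approximation to hold, and no finite population correction; it uses the familiar ‘estimate plus or minus about two standard errors’ interval, which is one common approach rather than necessarily the one this book goes on to develop. The survey figures (48 overcrowded dwellings out of 400 sampled) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Approximate confidence interval for a population proportion
    (normal approximation; assumes a simple random sample and large n)."""
    p_hat = successes / n  # the sample statistic
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 per cent
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error  # allowance for sampling error
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 48 of 400 sampled dwellings are overcrowded.
low, high = proportion_confidence_interval(48, 400)
print(f"Estimate 0.120; 95% interval {low:.3f} to {high:.3f}")
```

With these figures the sample statistic is 0.12 and the interval runs from roughly 0.088 to 0.152; the width of that range is a direct expression of the sampling error.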
The second type of problem indicated above, the testing of statistical hypotheses, can be illustrated as follows. Suppose that, in a study of political attitudes and voting behaviour, one wishes to determine whether the distribution of opinions is the same among the men and the women resident in a city. To this end a sample is chosen by appropriate means from the electoral roll and the political views of those selected are determined, e.g. the proportions intending to vote Conservative, Labour, and so on. One is almost certain to find some (possibly slight) aggregate differences between the patterns of responses of the men and the women sampled. The basic issue, if differences are present, is whether they are simply a feature of the particular sample selected, and might be expected to be absent from other samples, or whether they can be taken as a firm indication of real differences in the population itself. The decision as to which of these alternatives to accept is made following a ‘statistical test of significance’ (see Chapter 6). Among other things, the decision depends upon the sample size and the magnitude of the differences within the sample.
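One widely used test for a table of this kind is Pearson's chi-square test of independence; whether it matches the treatment in Chapter 6 is not assumed here. The Python sketch below computes the statistic for a hypothetical sample of 300 men and 300 women classified by voting intention (Conservative, Labour, Other). It relies on the fact that, with exactly 2 degrees of freedom, the upper-tail chi-square probability equals exp(-x/2), so the p-value shortcut is valid only for a 2 by 3 (or 3 by 2) table.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts
    (rows = groups, columns = response categories)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if voting intention were unrelated to sex.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical sample: rows are men and women; columns are
# Conservative, Labour, Other.
counts = [[120, 110, 70],
          [130, 90, 80]]
stat, df = chi_square_independence(counts)
# Valid only because df == 2: the chi-square tail probability is exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

Although the men and women in this invented sample do respond differently, the probability of differences at least this large arising by chance, if the population distributions were identical, is about 0.22. At a conventional 5 per cent level the sample differences would therefore not be taken as firm evidence of differences in the population; the same proportions in a much larger sample could lead to the opposite decision.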
Examples such as these may help to make clear just how useful descriptive and inferential statistical procedures can be in the analysis of empirical data in the social field. However, it is necessary to stress that statistical issues do not arise simply at the final stages of an investigation. It is very important that statistical considerations be brought in at the planning stage of such projects, especially in relation to the precise definition of the population to be studied; the question of whether to draw any samples and, if so, the method of drawing them and their size; and the procedures of estimation and/or testing which will be used. It is essential that decisions on these issues be taken before rather than after the start of data collection.
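One planning-stage decision that can be made concrete in advance is the sample size needed to estimate a proportion to a chosen precision. The Python sketch below uses the standard large-sample formula n = z²p(1 − p)/e², where e is the desired margin of error. It assumes simple random sampling from a large population (no finite population correction) and, when nothing is known beforehand, the cautious guess p = 0.5, which gives the largest required sample.

```python
import math
from statistics import NormalDist


def required_sample_size(expected_p, margin, confidence=0.95):
    """Smallest simple random sample giving the stated margin of error
    for a proportion (normal approximation, large population)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 per cent
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin ** 2)


for margin in (0.05, 0.03, 0.02):
    print(f"margin {margin:.2f}: n = {required_sample_size(0.5, margin)}")
```

Under these assumptions a margin of 5 percentage points requires 385 cases, 3 points requires 1,068 and 2 points requires 2,401, which is of the same order as the poll samples of about 2,000 adults mentioned earlier. Because halving the margin roughly quadruples the sample, such figures are best settled before data collection begins.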