Part I The Foundations
The chapters in this first part of the book provide the general framework for the specific statistical tests presented in Parts II and III. Part I reveals the plot and the main characters encountered throughout the book in its evolving episodes, each a surprising sequel. Despite the increasing twists and turns of each instalment in Parts II and III, the general plot and character types remain the same.
In Chapter 1 the logic which applies to all forms of data analysis covered in this book is described. The basic form of the logic revolves around four interrelated questions: What is expected? What is observed? What is the difference between the expected and the observed? How much of a difference between the expected and the observed can be expected due to chance alone? In this first chapter the concept of randomness is introduced along with other key concepts and the most common forms of research design.
In Chapter 2 the pictorial and numerical techniques researchers use to summarize their data are described. An informative summary is the first and, arguably, the most important stage in all forms of data analysis. Most statistical tests can be viewed as estimates of the reliability of the picture painted by the summary.
The coverage of probability theory and its laws in Chapter 3 is the basis for estimating the reliability or replicability of the picture portrayed in the summary. Parts II and III are built upon the foundation constructed in Part I. Each chapter in Parts II and III represents an application of probability theory (Chapter 3) to the summary of specific data (Chapter 2) to be analysed within a variation and adaptation of the general framework (Chapter 1).
1 Overview
Chapter contents
- 1.1 Purpose
- 1.2 The general framework
- 1.3 Recognizing randomness
- 1.4 Lies, damned lies, and statistics
- 1.5 Testing for randomness
- 1.6 Research design and key concepts
- 1.7 Paradoxes
- 1.8 Chapter summary
- 1.9 Recommended readings
Key Concepts: randomness, experiment, quasi-experiment, observational designs, question of difference, question of association, categorical data, measurement data, null hypothesis, Simpson’s paradox, Type I error, Type II error, variables, population, sample, random sample, independent variable, dependent variable.
1.1 Purpose
The first purpose of this chapter is to introduce you to a few concepts and themes that will be present, directly or indirectly, throughout this book. If there is one concept that is omnipresent, if not explicitly then at least implicitly, it is randomness. As will be seen, the concept underlies other phrases used either to refer to the presence of randomness or to its absence. To claim that two groups of people differ in some respect is to say that group membership is not completely random; for example, height is not random with respect to basketball players versus non-basketball players. To say that two groups do not differ in some regard is to say that group membership is random; for example, the maximum speed at which a car can travel is probably unrelated to the car’s colour. To assert that two events are related is to say that they do not occur randomly with respect to each other; for example, tsunamis are associated with earthquakes. To state that two events are unrelated is to say that they occur randomly with respect to each other. Related to our use of randomness are four key questions: What is expected? What is observed? What is the difference between the expected and the observed? How much of a difference can be expected due to chance alone?
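The four questions can be made concrete with a small simulation. What follows is only a minimal sketch under invented assumptions: a fair coin tossed 100 times and an observed count of 60 heads, neither of which comes from this chapter. The function name `chance_differences` and all of the numbers are hypothetical.

```python
import random


def chance_differences(n_tosses, n_repeats, seed=0):
    """Repeat a fair-coin experiment and return, for each repeat,
    the absolute gap between the number of heads and the expected number."""
    rng = random.Random(seed)
    expected_heads = n_tosses / 2
    gaps = []
    for _ in range(n_repeats):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        gaps.append(abs(heads - expected_heads))
    return gaps


# Hypothetical numbers, invented purely for illustration.
n_tosses = 100
expected = n_tosses / 2                 # What is expected?
observed = 60                           # What is observed?
difference = abs(observed - expected)   # What is the difference?

# How much of a difference can be expected due to chance alone?
gaps = chance_differences(n_tosses, n_repeats=10_000)
share = sum(gap >= difference for gap in gaps) / len(gaps)

print(f"expected={expected}, observed={observed}, difference={difference}")
print(f"share of chance-only experiments with a gap this large: {share:.3f}")
```

The final figure is the proportion of purely random experiments that stray from the expected value at least as far as the observed one did. The smaller that proportion, the harder it is to attribute the observed difference to chance alone.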
The second purpose of this chapter is to review some basic strategies and principles of empirical research. We differentiate the basic forms of research (experimental, quasi-experimental, and observational designs) and review the main characteristics of each.
1.2 The general framework
Statistics, which are the numbers researchers use to describe their data and to test the trustworthiness or replicability of their findings, can feel convoluted and mysterious both for students and for researchers. In this section I offer you a three-part framework; if you use it, it will make the material in this book, and the statistics you encounter in everyday life, more easily understood.
Part 1. As a researcher, you begin with a question (or questions) about the nature of the world, or at least that aspect of the world which interests you. Let us start with the simplest type of research, where there is only one question. Regardless of your topic, your question will take one of two basic forms.
The first form is one of difference. For example, imagine yourself as a political science professor who wishes to know if your students prefer term papers or essay examinations as a means of evaluation. Or think of yourself as a clinical psychologist wishing to know if cognitive behavioural therapy (CBT) reduces your patients’ anxiety symptoms more than does the most commonly prescribed anxiolytic (a medication to reduce anxiety). In both of these examples you suspect that one set of scores will be different from the other: more students will prefer one form of assessment over the other; the CBT group on average will show fewer symptoms than the anxiolytic group.
The second form a research question may take is one of relation or association. For example, you may manage a coffee shop and be interested in customer behaviour. Is the choice of beverage (coffee versus tea) associated with gender (men versus women)? Or if you are an educational psychologist you may suspect that there is an association between the number of hours per week a student works off-campus and his or her grades at the end of term. In both of these examples you suspect that one set of scores will be related to (or will predict) the other set of scores. Perhaps a greater proportion of women than of men will prefer tea. Perhaps the more hours a student works off-campus the lower his or her grade point average will tend to be. The type of question – difference versus association – orients you towards appropriate statistical procedures. Questions of differences are linked with one family of statistical tests, and questions of association are linked with another family of tests.
Questions of differences and questions of association are not as dissimilar as they may appear. They are usually two sides of a single coin, with one question implying the other. Furthermore, a research project in psychology and in the social sciences often entails more than one question, and it may involve both questions of differences and questions of associations. For example, you may be a ‘sportologist’ wishing to know why some baseball players hit more home runs than others. You suspect that the taller the player, the more home runs he will hit (this is a question of a possible association between height and the number of home runs). You may also suspect that players who use aluminium bats will hit more home runs than will players who use the old-fashioned wooden bats (this is a question of a possible difference between types of bats).
Part 2. As an empirical researcher you collect data. Regardless of your area of interest, the observations usually take one of two general forms.
The first form your observations can take is that of frequency data or categorical data. Remember, as a political science professor you wished to know if among your students term papers are more popular than essay examinations as a form of evaluation. You are keeping count of the number of students in the two categories: those who prefer a term paper versus those who prefer an essay examination. As a manager of a coffee shop you were keeping track of the frequencies in four categories: the number of women who prefer coffee, the number of women who prefer tea, the number of men who prefer coffee, and the number of men who prefer tea.
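To see what frequency data look like in practice, here is a minimal sketch of the coffee shop tally. The customer records are invented for illustration and the variable names are hypothetical; only the four categories come from the example above.

```python
from collections import Counter

# Hypothetical customer records (gender, beverage), invented for illustration.
orders = [
    ("woman", "tea"), ("man", "coffee"), ("woman", "coffee"),
    ("man", "coffee"), ("woman", "tea"), ("man", "tea"),
]

# Frequency data: each observation adds one to the count of its category.
counts = Counter(orders)

for gender in ("woman", "man"):
    for beverage in ("coffee", "tea"):
        print(f"{gender:5s} {beverage:6s} {counts[(gender, beverage)]}")
```

Each customer contributes nothing more than membership in one of the four categories, so the data reduce to four counts.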
The second form your observations can take is that of measurement data. As a clinical psychologist you wished to know if two groups of patients (CBT versus anxiolytic) differ in terms of their average number of anxiety symptoms. You are recording the number of symptoms each patient exhibits. It is possible that no two patients will exhibit the same number of symptoms. As an educational psychologist interested in hours worked and academic performance, you are recording the actual number of hours per week each student works off-campus and his or her grade. It is possible that no two students in your study will have worked the same number of hours or have exactly the same grade.
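By contrast, measurement data record a score for each individual. The sketch below uses invented symptom counts for two hypothetical groups of five patients; the numbers and variable names are assumptions made for illustration and are not results from any study.

```python
from statistics import mean

# Hypothetical symptom counts per patient, invented for illustration.
symptoms = {
    "CBT": [4, 7, 5, 9, 6],
    "anxiolytic": [8, 6, 10, 7, 11],
}

# Measurement data: every patient has a score, summarized here by the group average.
group_means = {group: mean(scores) for group, scores in symptoms.items()}
mean_difference = group_means["anxiolytic"] - group_means["CBT"]

for group, average in group_means.items():
    print(f"{group}: average of {average} symptoms")
print(f"difference between the averages: {mean_difference:.1f}")
```

Here the summary is an average of individual scores rather than a count of people in a category.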
I need to warn you: the two types of data are not as different as they may at first appear, nor do they encompass all possible types of data. And often one type of data can be transformed or treated as if it were the other type. Examples of this transformation will appear at the end of Chapter 3.
As we will see in Chapter 2, frequency/categorical data and measurement data can be further divided into four types of number scales: nominal, ordinal, interval, and ratio. Whereas nominal and ordinal number scales are described as frequency/categorical data, interval and ratio scales are considered measurement data. As will become apparent in Part II of this book, for purposes of analysis ordinal data (such as percentile scores on an examination) are often treated as an intermediate type of data or are transformed into a type of measurement data called z-scores, which are discussed in detail in Chapter 3.
We now have two basic research questions and two types of data. Earlier we said that each research question is linked with its own family of statistical tests. The same may be said with respect to the two types of data. Frequency data are linked with one family of statistical tests and measurement data are associated with another family of tests.
Crossing the two types of question with the two types of data gives four families of statistical tests:
- Tests for a question of difference with frequency data
- Tests for a question of relation with frequency data
- Tests for a question of difference with measurement data
- Tests for a question of relation with measurement data.
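The four families listed above can be sketched as a simple lookup on the type of question and the type of data. The class and function names below are hypothetical and chosen only for illustration; the four examples are the ones used earlier in this section.

```python
from enum import Enum


class Question(Enum):
    DIFFERENCE = "difference"
    RELATION = "relation"


class Data(Enum):
    FREQUENCY = "frequency"
    MEASUREMENT = "measurement"


def family_of_tests(question, data):
    """Name the family of tests implied by a question type and a data type."""
    return f"Tests for a question of {question.value} with {data.value} data"


# The running examples from this section, classified by question and data.
examples = {
    "term paper versus essay examination": (Question.DIFFERENCE, Data.FREQUENCY),
    "beverage choice and gender": (Question.RELATION, Data.FREQUENCY),
    "CBT versus anxiolytic": (Question.DIFFERENCE, Data.MEASUREMENT),
    "hours worked and grades": (Question.RELATION, Data.MEASUREMENT),
}

for label, (question, data) in examples.items():
    print(f"{label}: {family_of_tests(question, data)}")
```

The lookup is a guide to the layout of the book and not a rule for choosing a specific test.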
Keep in mind that this framework is not carved in stone, nor are the boundaries between the four categories impermeable. Rather, the framework is a guideline for following the flow of this book. It will help you to cut through what appear to be so many unrelated procedures and formulae and to see the general storyline and character types.
Part 3. We have seen that there are different families of statistical tests which reflect an intersection of the type of question the researcher asks and the type of data he or she has collected. Surprisingly, almost all statistical tests – at least those covered in this book – have the same underlying logic based on a few simple questions.
Question 1: What do you as a researcher expect to find?
You may have taken a course that introduces you to research methodology and know that...