1.1 About this Book
This book covers types of analysis that apply predominantly to data gathered via quantitative surveys. It is intended for non-experts who may be conducting survey research for the first time, for example for a student dissertation. As such, it has relatively few details of the mathematical underpinning of these methods (although some are essential to aid understanding), and concentrates more on the key principles of when and how the analysis should be done, and how it can be interpreted.
Although this text covers all of the key issues specific to analysis of quantitative survey data using classical test theory, many aspects of a broader study may be covered elsewhere. In particular, most types of analysis that would be used to describe data, or to test hypotheses, are covered in another book in this series (Scherbaum and Shockley, 2015). However, this book will cover aspects of analysing survey data that other basic guides to data analysis might miss: in particular, data cleaning, reliability analysis, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).
Although it does not assume you have read a previous book in this series on questionnaires (Ekinci, 2015), it would be advisable to do this if you are starting by constructing and administering your own questionnaire as this book is concerned with data analysis rather than collection. There will be some points in this book where I refer back to that text rather than explaining something fully again.
1.2 Survey Data and Questionnaires
Surveys have long been used as an important method of data gathering in social science and other fields. As such, a wide range of methods have been developed over recent decades to enable the appropriate analysis of such data; in particular, in disciplines such as psychology this has become an important area of research in its own right (the field of psychometrics is devoted to studying the theory and technique of psychological measurement, which is predominantly survey-based).
One of the reasons surveys are so popular is their flexibility. They can be used to collect both quantitative and qualitative data (this volume is concerned only with the quantitative part); they are sometimes the only valid way of collecting some quantitative data (e.g. opinions, attitudes, perceptions: things that cannot be measured directly); and they allow a large amount of data to be collected using consistent and relatively inexpensive methods.
Quantitative data within questionnaires can take on different forms, however, as it can with any source of data. Before going into detail on the types of methods that can be used, it is essential to understand these different data types.
The first distinction it is important to understand is that between categorical and numerical data. Categorical data refers to variables that can take on different categories, for example sex, nationality and occupation. Each value represents a different thing, or category, but this is not a numerical quantity (although we may choose to use numbers as labels for these categories). Numerical data refers to variables that have a meaningful numerical value, whether on a naturally occurring or constructed scale, for example age, income, number of children, or well-being measured on a scale from 1 to 10.
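The difference between numbers used as labels and numbers used as quantities can be sketched in a few lines of Python. This is a minimal illustration only: the occupation labels, the codes and the ages are all hypothetical values invented for the example, not data from any study.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses from five people (illustrative values only).
# Occupation is categorical: the numbers are labels, not quantities.
OCCUPATION_LABELS = {1: "Teacher", 2: "Nurse", 3: "Engineer"}
occupation_codes = [1, 3, 3, 2, 3]

# Age is numerical: the values are meaningful quantities.
ages = [23, 35, 41, 29, 52]


def summarise_categorical(codes, labels):
    """Count how many respondents fall into each labelled category."""
    return Counter(labels[code] for code in codes)


def summarise_numerical(values):
    """Return the arithmetic mean of a numerical variable."""
    return mean(values)


occupation_counts = summarise_categorical(occupation_codes, OCCUPATION_LABELS)
mean_age = summarise_numerical(ages)

print(occupation_counts)
print(mean_age)
```

Counting is the sensible summary for the occupation codes, whereas an average of those codes would have no meaning; for age, the average is meaningful.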
However, within these two major groups, there are sub-groups that it is important to understand: these are described and exemplified within Table 1.1.
Table 1.1
*Of course, this is not necessarily a simple binary variable, as it fails to take into account transgender people; however, it is often measured in a binary way.
The distinction between these different types and sub-types is crucial for making decisions about analysis. Some types of analysis (e.g. correlations) only make sense with continuous data; some (e.g. chi-squared tests) only make sense with categorical data; some (e.g. one-way ANOVA) require a mixture of the two. However, these types of analysis are covered in Scherbaum and Shockley (2015), and will only be mentioned briefly in this book.
For the majority of this book we will concern ourselves with one particular type of data that is very common in questionnaires, but doesn’t fit neatly into the categorization in Table 1.1: Likert scales.
1.3 Likert Scales
Likert scales, named after the twentieth-century American psychologist Rensis Likert, are a method of measuring a variable (construct) that cannot be directly measured, by asking respondents to what extent they agree with a series of statements. Each statement is known as an ‘item’; technically, a Likert scale is the summation (or average) of the different items, although the term is sometimes used to refer to an individual item as well.
For example, a set of three items relating to extraversion (Lang et al., 2011) is:
- I see myself as someone who is talkative.
- I see myself as someone who is outgoing, sociable.
- I see myself as someone who is reserved.
For each item, the respondent would typically choose from one of the following options:
- Strongly disagree.
- Disagree.
- Neither agree nor disagree.
- Agree.
- Strongly agree.
Sometimes a different set of response categories might be used: for example, options ranging from ‘Very dissatisfied’ to ‘Very satisfied’, or from ‘Not at all’ to ‘All the time’. Technically these should be referred to as Likert-type scales, as the original definition of Likert scales refers to items asking about the level of agreement only; however, it is common for all such scales to be referred to as Likert scales, and so this book will use the term ‘Likert scales’ to refer to all sets of items with such rating scales.
The number of response options may vary: in the above example there were five, but this may be four, six, seven or indeed any higher number. Note that if there is an odd number of response options, then a symmetrical scale will yield a neutral or middle option (e.g. ‘Neither agree nor disagree’), whereas an even number of response options (e.g. six) will not.
In any case, each item should be considered as an ordinal variable. That is, the respondent chooses from one of a number of ordered categories. However, when taken together this changes. The purpose of asking three separate questions about extraversion here is not that the individual items are themselves of particular interest, but that between them they should give a better overall indication of the level of extraversion. Therefore a single score for the construct (extraversion) needs to be created.
For this purpose, a number is assigned to each of the responses. Typically these would be 1, 2, 3, 4, 5 for the above example, although for the third question (which measures the extent to which someone is reserved, the opposite of what we would expect for an extravert) a response of ‘strongly disagree’ means higher extraversion, and therefore we would code these responses as 5, 4, 3, 2, 1 respectively. This is often referred to as a ‘negatively worded question’.
Figure 1.1 Lang et al.’s (2011) extraversion scale in questionnaire form
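The coding rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `RESPONSE_CODES` and `code_response` are hypothetical, and the reversal formula (number of options + 1 − code) is simply a compact way of turning 1, 2, 3, 4, 5 into 5, 4, 3, 2, 1.

```python
# Response options in order, coded 1 (Strongly disagree) to 5 (Strongly agree).
RESPONSE_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def code_response(response, reverse=False):
    """Convert a response label to its numerical code.

    For a negatively worded item (reverse=True) the code is flipped,
    so that 1 becomes 5, 2 becomes 4, and so on.
    """
    code = RESPONSE_CODES[response]
    if reverse:
        n_options = len(RESPONSE_CODES)
        code = n_options + 1 - code
    return code


# A positively worded item keeps its code; a negatively worded item is flipped.
print(code_response("Agree"))
print(code_response("Strongly disagree", reverse=True))
```

Writing the reversal in terms of the number of options means the same function would work for a four-, six- or seven-point scale if the dictionary were changed.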
The overall score for extraversion, then, would be calculated as the average (the arithmetic mean) or, alternatively, the sum of these three item scores. So, for example, if someone responded ‘Strongly agree’ to ‘I see myself as someone who is talkative’ this would be scored 5; if they responded ‘Neither agree nor disagree’ to ‘I see myself as someone who is outgoing, sociable’ this would be scored 3; and if they responded ‘Disagree’ to ‘I see myself as someone who is reserved’, this would be scored 4 (because it is a negatively worded question). The overall extraversion score for this individual, then, would be (5 + 3 + 4) ÷ 3 = 4.0.
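The worked example above can be reproduced with a short Python sketch. The function name `scale_score`, its parameters and the short item names are hypothetical, chosen only for illustration; the item codes are those given in the text, entered as answered (so ‘Disagree’ on the reserved item is entered as 2 and reversed inside the function).

```python
from statistics import mean


def scale_score(raw_codes, reversed_items, n_options=5, use_sum=False):
    """Combine item codes (1..n_options) into a single scale score.

    raw_codes: dict mapping item name to the code as answered,
        before any reversal.
    reversed_items: names of negatively worded items to flip.
    use_sum: return the sum of item scores instead of the mean.
    """
    scored = [
        (n_options + 1 - code) if item in reversed_items else code
        for item, code in raw_codes.items()
    ]
    return sum(scored) if use_sum else mean(scored)


# Strongly agree (5), Neither agree nor disagree (3), Disagree (2).
answers = {"talkative": 5, "outgoing": 3, "reserved": 2}

extraversion_mean = scale_score(answers, reversed_items={"reserved"})
extraversion_sum = scale_score(answers, reversed_items={"reserved"}, use_sum=True)

print(extraversion_mean)
print(extraversion_sum)
```

The mean keeps the score on the same 1 to 5 range as the individual items, which can make it easier to interpret than the sum.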
You may notice that, by doing this, we are treating the ordinal measurement of the individual items in a more numerical way, and the eventual score for extraversion is no longer categorical, but actually resembles an interval variable. In fact, it is then usually treated in analysis as if it is a continuous, numerical variable.
Is this justified? Research suggests that it can be, but only if the Likert scale (or just ‘scale’) has good reliability and validity. These concepts are hugely important in survey research, and will be the focus for large sections of this book. They will be introduced formally in Chapter 2.
1.4 Classical Test Theory
Classical test theory (CTT) is the measurement theory that underlies the techniques being described in this book. Its codification was first published by Mel Novick in 1966 (Novick, 1966). It is based around the idea that any measured score consists of two parts: a true score and an error. The ‘error’ may represent measurement error, and other types of random or systematic error too; examples of these will be shown in the next chapter. As we will also see in Chapter 2, this idea can be written mathematically, which allows concepts like reliability and validity to be defined more formally.
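As a preview of that formal treatment, the two-part idea is conventionally written as a single equation. The symbols used here (X for the measured score, T for the true score and E for the error) are the customary notation and are an assumption at this point; Chapter 2 may use different ones.

```latex
% Measured (observed) score = true score + error
X = T + E
```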
A key concept in CTT is that, even though our measurement may be constrained by the tools we use, the underlying (true) score is on a continuum. Thus, when someone answers a question such as ‘I see myself as someone who is talkative’, their true perception of how talkative they are could fall anywhere within a given range, between the point where they would only answer ‘strongly disagree’ and the point where they would answer ‘strongly agree’. If they consider themselves to be quite talkative, for instance, but perhaps ...