A Primer in Longitudinal Data Analysis
eBook - ePub

  1. 176 pages
  2. English

About This Book

‘The author has done a remarkable job of writing a very accessible introduction to a broad literature. As such, he should be congratulated on achieving his objective to provide the "ideal primer for this growing area of social research"’ - Kwantitatieve Methoden

This accessible introduction to the theory and practice of longitudinal research takes the reader through the strengths and weaknesses of this kind of research, making clear: how to design a longitudinal study; how to collect data most effectively; how to make the best use of statistical techniques; and how to interpret results.

Although the book provides a broad overview of the field, the focus is always on the practical issues arising out of longitudinal research. This book supplies the student with all that they need to get started and acts as a manual for dealing with opportunities and pitfalls. It is the ideal primer for this growing area of social research.

Frequently asked questions

A Primer in Longitudinal Data Analysis by Toon W Taris is available in PDF and/or ePUB format.

1

Longitudinal Data and Longitudinal Designs

This chapter deals with some of the issues and complexities involved in the collection of longitudinal data. It aims to provide guidance, ideas, and perhaps some sense of confidence to investigators who expect a longitudinal design to help them in obtaining valid answers to their research questions, but are as yet uncertain about the best design for such a study. In this chapter I first distinguish between longitudinal research designs and longitudinal data, showing that the latter do not necessarily imply the former, and vice versa. After discussing some of the advantages of longitudinal data, seven basic designs for collecting such data are addressed. Finally, I provide a short checklist of the issues to be considered before undertaking a longitudinal study.

Longitudinal data versus longitudinal designs

Basically, longitudinal data present information about what happened to a set of research units (such as people, business firms, nations, or cars) during a series of time points (for simplicity, I will refer to human subjects throughout the remainder of this text). In contrast, cross-sectional data refer to the situation at one particular point in time. Longitudinal data are usually (but not exclusively) collected using a longitudinal research design. The participants in a typical longitudinal study are asked to provide information about their behavior and attitudes regarding the issues of interest on a number of separate occasions in time (also called the ‘phases’ or ‘waves’ of the study). The number of occasions is often quite small – longitudinal studies in the behavioral and social sciences usually involve just two or three waves. The amount of time between the waves can be anything from several weeks (or even days, minutes, or seconds, depending on the aim of a study) to several decades or more. Finally, the number of participants in the study is usually fairly large (say, 200 participants or over; sometimes even tens of thousands).
Although longitudinal research designs can take on very different shapes, they share the feature that the data describe what happened to the research units during a series of time points. That is, data are collected for the same set of research units for (but not necessarily at) two or more occasions, in principle allowing for intra-individual comparison across time. Note that the research units may or may not correspond with the sampling units. For example, in a two-wave longitudinal study on the quality of the care provided by a children’s day care center (the research unit), a different sample of parents (the sampling units) may be interviewed at each occasion. The aggregate of the parents’ judgements at each time point will allow for conclusions about changes in the quality of the care provided by the center, even if no single parent has been interviewed twice.
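The day care example can be sketched in a few lines of Python. This is only an illustration: the parent identifiers, the ratings, and the helper `aggregate_change` are hypothetical and do not come from any actual study. The sketch shows that the research unit (the center) is followed across two waves even though the sampling units (the parents) do not overlap.

```python
from statistics import mean

# Hypothetical ratings (1-10) of one day care center, given by two
# different samples of parents; no parent appears in both waves.
wave1 = {"parent_a": 6, "parent_b": 7, "parent_c": 5}
wave2 = {"parent_d": 8, "parent_e": 7, "parent_f": 9}


def aggregate_change(first, second):
    """Change in the mean rating of the research unit between two waves."""
    return mean(second.values()) - mean(first.values())


# The two samples share no sampling units ...
shared_parents = set(wave1) & set(wave2)
# ... yet the center itself is observed at both occasions.
change = aggregate_change(wave1, wave2)

print(f"parents interviewed twice: {len(shared_parents)}")
print(f"change in mean rating: {change:+.1f}")
```

Because no parent is interviewed twice, only the aggregate comparison is possible here; an intra-individual comparison would require matched respondents.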
As another example, take the consumer panel that is frequently used in marketing research. The participants in such panels provide the researchers on a regular basis with information about their level of consumption of particular brands or products. These levels are then monitored in time. However, the consecutive measurements are usually not matched at the micro-level of households (Van de Pol, 1989). Although this example presents a longitudinal study at the level of the research units (the brands under examination, the levels of consumption of these being followed across time), a series of cross-sectional studies would have given us the same information.
Thus, there is not necessarily a one-to-one correspondence between the design of a study and the type of data collected. The data obtained using a longitudinal research design (involving multiple interviews with the same participants) may be analyzed in such a way that no intra-individual comparisons are made; it may even be pointless to attempt to do so (as in the consumer panel). Conversely, longitudinal data may be collected in a single-wave study, by asking questions about what happened in the past (so-called retrospective questions, see below for a discussion). Although such data are collected at the same occasion, they may cover an extended period of time. As Campbell (1988: 47) argued, ‘To define “longitudinal” and “repeated measures” synonymously is to confuse the design of a particular study with the form of the data one wishes to obtain’.

Covariation and causation

A distinction can be made between studies that are mainly of a descriptive nature, and studies that more or less explicitly aim to explain the occurrence of a particular phenomenon (Baltes and Nesselroade, 1979). In descriptive studies, the association (or covariation) between particular characteristics of the persons under study is described. Thus, researchers are satisfied with describing how the values of one variable are associated with the values of other variables. Conclusions in this type of research typically take the form of ‘if X is the case, Y is usually the case as well’, and ‘members of group A have on average more of property X than members of group B’. Such statements simply describe what is the case; in a longitudinal context they would tell you what has happened to whom. The strength of the association between variables X and Y can be expressed through association measures such as the correlation coefficient (if both variables are measured at least at the ordinal level) or the chi-square value (if both variables are qualitative, that is, categorical).
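As a minimal sketch of the two association measures just mentioned, the following Python code computes a product-moment correlation and a chi-square value by hand. The function names and the data are hypothetical and chosen for illustration only. The correlation shown is the Pearson coefficient, which treats the scores as numeric; for strictly ordinal data one might instead apply the same formula to ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def chi_square(table):
    """Chi-square value for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical scores on X and Y for five persons.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
print(round(pearson_r(x_scores, y_scores), 3))

# Hypothetical cross-classification of group (rows) by property (columns).
group_by_property = [[30, 10], [10, 30]]
print(round(chi_square(group_by_property), 3))
```

Both numbers describe how strongly X and Y go together; neither says anything about why they do.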
It is often unsatisfactory to observe a particular association without being able to say why this particular association exists. Further, from a practical point of view it is much more helpful to know that phenomenon Y is affected by X, rather than to know that X and Y tend to coincide. Therefore, it is not surprising that much research aims to explain the occurrence of events, to understand why particular events happen, and to make predictions when the situation changes (Marini and Singer, 1988). Stated differently, much research describes the association between pairs of variables in causal terms. It is generally accepted that at least the following three criteria must have been satisfied before a particular association between two variables can be interpreted in causal terms (Blalock, 1964; Menard, 1991).
  1. Covariation. There must be a statistically significant association between the two variables of interest. It makes little sense to speak of a ‘causal’ relationship if there is no relationship at all.
  2. Non-spuriousness. The association between the two variables must not be due to the effects of other variables. In experimental contexts this is ascertained by random allocation of participants to conditions. If successful, this results in a situation in which there are no pre-treatment differences between the experimental group and the control group, thus ruling out alternative explanations for a post-treatment difference. In non-experimental contexts, the association between two phenomena must hold up, even when other (sets of) variables are controlled. For example, a statistically significant relationship between the number of rooms in one’s house and the price of the car that one drives will probably fully be accounted for by one’s income. A statistical association between two variables that disappears after controlling a third variable is called ‘spurious’.
  3. Temporal order of events. The ‘causal’ variable must precede the ‘effect’ variable in time. That is, a change in the causal variable must not occur after a corresponding change in the effect variable (but see below).
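The non-spuriousness criterion can be illustrated with the rooms, car price, and income example. The sketch below uses the first-order partial correlation, which is one common way of controlling for a third variable; it is an assumption here, not necessarily the technique the text has in mind. The data are invented and deliberately constructed so that income accounts for the whole association.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with z held constant (first-order partial)."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data for four households (income and car price in thousands).
rooms = [2, 4, 4, 6]
car_price = [10, 20, 30, 20]
income = [30, 30, 50, 50]

zero_order = pearson_r(rooms, car_price)
controlled = partial_r(rooms, car_price, income)
print(f"rooms vs car price: {zero_order:.2f}")
print(f"same, controlling for income: {abs(controlled):.2f}")
```

In these constructed data the zero-order correlation is clearly positive, whereas the partial correlation vanishes (up to rounding error) once income is controlled; this is the pattern the text labels ‘spurious’.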
A fourth criterion is not usually mentioned, perhaps because it is so obvious. Causal inferences cannot directly be made from empirical designs, irrespective of the research design that has been used to collect the data or the statistical techniques used to analyze the data. In non-experimental research, causal statements are based primarily on substantive hypotheses which the researcher develops about the world. Causal inference is theoretically driven; causal statements need a theoretical argument specifying how the variables affect each other in a particular setting across time (Blossfeld and Röhwer, 1997; Freeman, 1991). Thus, causal processes cannot be demonstrated directly from the data; the data can only present relevant empirical evidence serving as a link in a chain of reasoning about causal mechanisms.
The first two criteria (there is a statistically significant association between two variables that is not accounted for by other variables) can in principle be tested using data from cross-sectional studies. Evidence relevant to the third criterion (cause precedes effect) can usually only be obtained using longitudinal data. Thus, one great advantage of longitudinal data over cross-sectional data would seem to be that the former provide information relevant to the temporal order of the designated ‘causal’ and ‘effect’ variables. Indeed, some authors (e.g., Baumrind, 1983) maintain that causal sequences cannot usually be established unambiguously without incorporating across-time measurement. However, there has been some debate about whether the causal order of events is accurately reflected in their temporal order (Griffin, 1992): Is it really informative to know the order in which events occurred?
According to Marini and Singer (1988), causal priority may be established in the mind in a way that is not reflected in the temporal sequence of behavior. Willekens (1991) argued that present behavior may be determined by the anticipation of future events, rather than by these events themselves. For example, one common finding is that women tend to quit their jobs after the birth of their first child. These two events (leaving the labor market and having a baby) tend to coincide, with empirically occurring patterns in which childbirth both precedes and follows leaving the job. The first sequence would suggest that having a baby ‘produces’ a change of labor market status, whereas the second would imply that leaving the labor market leads to childbirth. However, it would seem that both events are the result of anticipations and decisions taken long before the occurrence of either. If this is correct, the temporal order of these events may not say much about their causal relation (Campbell, 1988).
The take-home message is that, although longitudinal data do provide information on the temporal order of events, it still may or may not be the case that there is a causal connection between these events. We still need to develop a more or less explicit theory that spells out the causal processes that produce empirically occurring patterns of events. A cautious investigator will consider these processes before the study is actually carried out – that is, in the design phase: a priori consideration of the possible relations among the study variables may lead them to conclude that other variables must be measured as well.

Designs for collecting longitudinal data

Any study can only be as good as its design. This obvious (albeit often neglected) point applies strongly to longitudinal research, as the design of a longitudinal study must usually be fixed long before the last wave of this study has been conducted. Errors in the design phase may be costly and difficult (if not impossible) to correct – it is awkward to find out afterwards that it would have been very convenient had variable X been measured at the first wave of the study, rather than only at its final wave.
At a more basic level, investigators must decide in advance about the number of waves of their study; whether it is really necessary to measure the variables of interest at different times for the same set of sampling units; and about the number of sampling units for which data should be collected (taking into account that sampling units have the sad tendency to drop out of the study, see Chapter 2). Below I describe seven basic design strategies, all of which are frequently employed in practice (Kessler and Greenberg, 1981; Menard, 1991). Some of these are truly longitudinal, in that they involve multiple measurements from the same set of sampling units; others are not usually thought of as ‘longitudinal’ designs.

The simultaneous cross-sectional study

In this type of research, a cross-sectional study involving several distinct age groups is conducted. Each age sample is observed regarding the variables of interest. Although this design does not result in data describing change across time (it is therefore not a truly longitudinal design), it does yield data relevant to describing change across age groups. As such, it may be used to obtain understanding of development or growth across time. Any cross-sectional study in which participant age is measured might be considered an example of this design. However, in a simultaneous cross-sectional study, respondent age is the key variable, whereas in most ‘standard’ cross-sectional designs age is just another variable to be controlled.
There are many threats to the validity of inferences based on this type of study. For example, different age groups have usually experienced different historical circumstances, and these may also result in differences among the age groups (this point is elaborated below, in the discussion of the cohort study). Further, in this design, age effects are confounded with developmental effects, because the two concepts are measured with the same variable.

The trend study

In a trend study (which is sometimes also referred to as a ‘repeated cross-sectional study’), two or more cross-sectional studies are conducted at two or more occasions. The participants in the cross-sectional studies are comparable in terms of their age. Usually a different sample is drawn from the population of interest for each cross-sectional study. In order to ensure the comparability of the measurements of the concept of interest across time, the same questionnaires must be used in all cross-sectional surveys (see also Chapter 3). This type of design is suitable to provide answers to questions like ‘are adolescents becoming more sexually permissive?’, or ‘how does voters’ support for right-wing parties vary across time?’.
In a typical trend study, researchers are not interested in examining change at the individual level (it is impossible to know what happened to whom, assuming that the study did not include retrospective questions). The trend study design is therefore not suited to resolve issues of causal order or to study developmental patterns. Its principal advantage over a single cross-sectional design is that it allows for the detection of change at the aggregate level. Thus, the trend study is a typical instance of a design that is cross-sectional at the level of the sampling units, but longitudinal at the level of the research units.
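The difference between change at the aggregate level and change at the individual level can be made concrete with a small sketch. The respondents and answers below are hypothetical, and the labels ‘net change’ and ‘gross change’ are terms introduced here for illustration rather than taken from the text. Only the per-wave shares would be visible in a trend study; the cross-classification requires that the same respondents are matched across waves.

```python
from collections import Counter

# Hypothetical matched data: respondent id -> (wave 1 answer, wave 2 answer)
# to the question of whether they support a particular party.
panel = {
    1: ("yes", "no"),
    2: ("no", "yes"),
    3: ("yes", "yes"),
    4: ("no", "no"),
    5: ("yes", "no"),
    6: ("no", "yes"),
}


def share_yes(data, wave):
    """Share answering 'yes' at one wave: all that a trend study observes."""
    answers = [pair[wave] for pair in data.values()]
    return answers.count("yes") / len(answers)


def turnover(data):
    """Cross-classification of wave 1 by wave 2 answers (needs matched ids)."""
    return Counter(data.values())


# Aggregate (net) change: difference between the two per-wave shares.
net_change = share_yes(panel, 1) - share_yes(panel, 0)

# Individual (gross) change: share of respondents who switched answers.
table = turnover(panel)
switchers = sum(n for (first, second), n in table.items() if first != second)
gross_change = switchers / len(panel)

print(f"net change: {net_change:+.2f}")
print(f"gross change: {gross_change:.2f}")
```

In this constructed example the share of supporters is identical at both waves, so a trend study would report no change, although most respondents changed their answer.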

Time series analysis

In time series analysis, repeated measurements are taken from the same set of participants. The measurements are not necessarily equally spaced in time. In comparison to the two preceding designs, the time series design allows for the assessment of intra-individual change, because the same participants are observed across time. If different age groups are involved in the study, differences between groups with respect to intra-individual development may be examined. The time series design is very general and flexible. The intervention study and the panel study (see below) may be considered as variations on the time series design, involving many participants, many variables, and a limited number of measurements. In contrast, the term ‘time series analysis’ is usually reserved for studies in which a very limited number of subjects is followed through time at a large number of occasions and for a small number of variables.

The intervention study

The classic example of an intervention study is the pretest–posttest control group design (Campbell and Stanley, 1963). In this design, there is an experimental and a control group. The effects of a particular intervention (also termed treatment or manipulation) are studied by comparing the pretest and posttest scores of the experimental and the control group. In experimental (laboratory) studies, random assignment of participants to the control and experimental groups ensures that there are no important differences between the groups as regards possible confounding variables. This means that this design is a powerful means of assessing causal relations; if the experimental and the control group were comparable in terms of their pretest scores and participants were randomly assigned to these groups, a difference between the groups on the posttest measurement must be attributed to the experimental manipulation.
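The comparison of pretest and posttest scores across the two groups can be sketched as follows. This is a simplified illustration with invented scores and hypothetical helper names; it compares mean gains and is not meant as the definitive analysis for this design (other approaches, such as adjusting the posttest for the pretest, are also used).

```python
from statistics import mean

# Hypothetical (pretest, posttest) scores; participants are assumed to have
# been randomly assigned to the two groups.
experimental = [(10, 15), (12, 16), (11, 17)]
control = [(10, 11), (12, 12), (11, 13)]


def mean_gain(group):
    """Average posttest-minus-pretest difference within one group."""
    return mean(post - pre for pre, post in group)


def treatment_effect(treated, untreated):
    """Difference between the mean gains of the two groups."""
    return mean_gain(treated) - mean_gain(untreated)


# Check that the groups are comparable at the pretest.
pretest_gap = mean(pre for pre, _ in experimental) - mean(pre for pre, _ in control)
effect = treatment_effect(experimental, control)

print(f"pretest difference between groups: {pretest_gap}")
print(f"estimated effect of the intervention: {effect}")
```

Subtracting the control group's gain removes change that would have occurred without the intervention, which is why the remaining difference can be attributed to the manipulation under random assignment.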
In survey research, however, random assignment of participants to experimental and control groups is usually unethical, impractical, or impossible, whereas the occurrence of the manipulation is often beyond the investigator’s control (compare Chapter 5). Conscience will not let experimenters randomly assign children to experimental and control groups in order to examine the effects of growing up in a one-parent family on, say, substance abuse. In practice, some of the participants experience a particular event during the observed interval (such as the death of their spouse, the separation of their parents, etc.), whereas others do not. It is likely that the ‘experimental’ group (comprising the participants who experienced the event of interest) differed initially from the ‘control’ group. For example, if the event of interest is the death of a spouse, it would seem likely that the experimental group is on average quite a bit older than the control group. Insofar as such differences are relevant to the research question, they must be statistically controlled in order to ensure valid inferences. This ‘non-equivalent control group design’ (Cook and Campbell, 1979) is currently very popular in quasi-experimentation and survey research.

The panel study

In the panel study, a particular set of participants is repeatedly interviewed using the same questionnaires. The term ‘panel study’ was coined by the famous sociologist Paul H. Lazarsfeld when he reflected on the presumed effect of radio advertising on product sales. Traditionally, hearing the radio advertisement was assumed to increase the likelihood that the listeners would buy the corresponding product. Lazarsfeld considered the reverse relationship (people who have purchased the product might notice the advertisement, whereas others would not) plausible as well, casting doubts on the causal direction of this relationship. Lazarsfeld proposed that repeatedly interviewing the same set of people (the ‘panel’) might clarify this issue (Lazarsfeld and Fiske, 1938). However, long before Lazarsfeld, researchers routinely conducted studies involving repeated measurements (for example, in studies on childhood development: Nesselroade and Baltes, 1979; Sontag, 1971). Menard (1991) notes that national censuses have been taken at periodic intervals for more than three hundred ye...

Table of contents

  1. Cover Page
  2. Title
  3. Copyright
  4. Contents
  5. Preface
  6. 1 LONGITUDINAL DATA AND LONGITUDINAL DESIGNS
  7. 2 NONRESPONSE IN LONGITUDINAL RESEARCH
  8. 3 MEASURING CONCEPTS ACROSS TIME: ISSUES OF STABILITY AND MEANING
  9. 4 ISSUES IN DISCRETE-TIME PANEL ANALYSIS
  10. 5 ANALYSIS OF REPEATED MEASURES
  11. 6 ANALYZING DURATIONS
  12. 7 ANALYZING SEQUENCES
  13. References
  14. Author index
  15. Subject index