1
Time course data
CONTENTS
1.1 Chapter overview
1.2 What are “time course data”?
1.3 Key challenges in analyzing time course data
1.3.1 Trade-off between power and resolution
1.3.2 Possibility of experimenter bias
1.3.3 Statistical thresholding
1.3.4 Individual differences
1.4 Visualizing time course data
1.5 Formatting data for analysis and plotting
1.5.1 A note on data aggregation
1.6 Chapter recap
1.7 Exercises
1.1 Chapter overview
This chapter will describe the main problems that growth curve analysis is meant to address. First, it will define a particular kind of data, called time course data or longitudinal data, which involve systematic relationships between observations at different time points. These relationships pose problems for simple traditional analysis methods like t-tests.
Section 1.3 will discuss four kinds of problems and illustrate them with concrete examples. First, using separate analyses for individual time bins or time windows creates a trade-off between power (more data in each bin) and temporal resolution (smaller time bins). Second, flexibility in selection of time bins or windows for analysis introduces experimenter bias. Third, statistical thresholding (p < 0.05 is significant but p > 0.05 is not) makes gradual change look abrupt and creates the illusion that continuous processes are discrete. Fourth, there is no clear way to quantify individual differences, which are an important source of constraints for theories in the behavioral sciences.
Section 1.4 will provide a brief introduction to ggplot2, a powerful and flexible package for graphing data in R. Section 1.5 will distinguish between wide and long data formats and describe how to use the melt function to convert data from the wide to the long format, which is the right format for growth curve analysis and for plotting with ggplot2. The rest of this book will describe growth curve analysis, a multilevel regression method that addresses the challenges discussed in this chapter, provide a guide to applying growth curve analysis to time course data, and demonstrate how to use ggplot2 to visualize time course data and growth curve model fits.
1.2 What are “time course data”?
Time course data are the result of making repeated observations or measurements at multiple time points. These sorts of data are also called longitudinal or, more generally, repeated measures data. Imagine that you measured a child’s height annually from birth to 18 years old. You would have a series of 19 data points that describe how that child’s height changed over time during those 18 years. In other words, you would have the growth (height) time course for that child.
Two key properties distinguish time course data from other kinds of data. The first is that groups of observations all come from one source; data with this structure are called nested data. In the height example, the source was a particular child. If you repeated this procedure for another child, you would now have two nested series of data points corresponding to the two children in your study. The heights of two randomly selected children may be uncorrelated, but the height of a child at time t is strongly correlated with that child’s height at time t − 1. Nested observations are not independent, and this non-independence needs to be taken into account during data analysis. Capturing this nested structure allows quantifying the particular pattern of correlation among data points for an individual, which can reveal potentially interesting individual differences: a taller child compared to a shorter child, whether the child had an earlier or later growth spurt, etc.
In this example, the data were nested or grouped at the individual participant level. The grouping can also be at a higher level. For example, if you measured the weights of newborns at different hospitals every month for a year, you would have data grouped by hospital, rather than by individual child (each child was only weighed once, but each hospital’s newborns were weighed every month). Groupings can also be at multiple levels; for example, if you followed those children as they grew, you would have measurements grouped by child and children grouped by hospital.
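The nesting described above can be made concrete with a small simulation. The following is only a sketch in base R: the number of children, the birth lengths, the growth rates, and the assumption of perfectly linear growth are all illustrative choices, not real growth norms. It stores the heights in one row per child per age and then compares the correlation between consecutive measurements of the same children with the correlation obtained when the measurements are paired across different children.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible

n_children <- 50
ages <- 0:18  # annual measurements from birth to 18: 19 observations per child

# Hypothetical child-specific parameters (illustrative values only)
birth_length <- rnorm(n_children, mean = 50, sd = 2)     # cm at birth
growth_rate  <- rnorm(n_children, mean = 6.5, sd = 0.5)  # cm per year

# One row per child per age, so the nesting (rows within child) is explicit.
# expand.grid varies its first argument fastest, so rows are ordered by
# age within child.
heights <- expand.grid(age = ages, child = seq_len(n_children))
heights$height <- birth_length[heights$child] +
  growth_rate[heights$child] * heights$age +   # linear growth: a simplification
  rnorm(nrow(heights), sd = 1)                 # measurement noise

# Heights of the same children at ages 9 and 10 (both vectors ordered by child)
h9  <- heights$height[heights$age == 9]
h10 <- heights$height[heights$age == 10]
r_nested <- cor(h9, h10)

# Pair each child's age-10 height with a randomly chosen child's age-9 height
r_shuffled <- cor(sample(h9), h10)

c(nested = r_nested, shuffled = r_shuffled)
```

Under these assumptions the nested correlation should be close to 1 and the shuffled one much weaker, which is the non-independence that an analysis of nested data has to take into account.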
The second key property of longitudinal data is that the repeated measurements are related by a continuous variable. Usually that variable is time, as in the child growth example, but it can be any continuous variable. For example, if you asked participants to name letters printed in different sizes, you could examine the outcome (letter recognition accuracy) as a function of the continuous predictor size. On the other hand, if you had presented letters from different alphabets (Latin, Cyrillic, Hebrew, etc.), that would be a categorical predictor. For categorical predictors, one can only assess whether the outcome was different between different categories (for example, if recognition of Latin letters was better or worse than recognition of Cyrillic letters). For continuous predictors, one can do that kind of simple comparison, but it is also possible to assess the shape of the change: whether the relationship between letter recognition accuracy and letter size follows a straight line, or accuracy improves rapidly for smaller sizes and then reaches a plateau, or follows a U-shape. Because time is so frequently that critical continuous variable, this book will typically refer to these sorts of data as “time course data” even though the same issues apply to other continuous predictors.
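To illustrate the difference between the two kinds of predictors, here is a minimal base R sketch using simulated letter recognition data. Every number in it (the sizes, the plateau-shaped accuracy function, the noise level, the mean accuracy per alphabet) is an assumption made up for the example. With the continuous predictor, a straight-line model can be compared against a model that allows curvature; with the categorical predictor, the model can only estimate differences between category means.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Continuous predictor: letter size in hypothetical units, 10 observations each.
# Simulated accuracy rises quickly for small sizes and then levels off.
size <- rep(seq(2, 20, by = 2), each = 10)
accuracy <- 0.5 + 0.45 * (1 - exp(-size / 5)) + rnorm(length(size), sd = 0.03)

# Question about shape: does allowing curvature improve on a straight line?
fit_line   <- lm(accuracy ~ size)
fit_curve  <- lm(accuracy ~ poly(size, 2))  # linear + quadratic terms
shape_test <- anova(fit_line, fit_curve)    # F-test for the added curvature
shape_test

# Categorical predictor: alphabets have no order or spacing, so the model
# can only estimate differences between category means.
alphabet  <- factor(rep(c("Latin", "Cyrillic", "Hebrew"), each = 30))
acc_alpha <- rnorm(length(alphabet), mean = 0.8, sd = 0.05)
fit_cat   <- lm(acc_alpha ~ alphabet)
coef(fit_cat)  # one reference-level mean plus two differences from it
```

The quadratic term is just one simple way to ask about shape in this sketch; it is not meant to represent the full modeling approach developed later in the book.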
As we will see, growth curve analysis (GCA) is a way to analyze nested data that takes the grouping into account and provides a way to quantify and assess the shapes of time course curves. Before getting into GCA, it will help to understand the challenges of analyzing time course data in a little more detail, that is, to understand why traditional methods like t-tests and analysis of variance (ANOVA) are not well-suited to these sorts of data. To do that, the next section goes over some examples of the kinds of problems that come up when analyzing time course data.
1.3 Key challenges in analyzing time course data
How should time course data be analyzed? A simple approach is to apply traditional data analysis techniques like t-tests or ANOVAs. For example, we could independently compare conditions at each time bin or time window. This approach has a number of problems, which are easiest to demonstrate with concrete examples.
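As a point of reference, the bin-by-bin approach just described might look like the following base R sketch. The data are simulated, and the number of participants, the learning slopes, the noise level, and the condition labels "A" and "B" are hypothetical choices made only for illustration. A separate t-test is run at each time bin, treating every bin as if it were unrelated to the others.

```r
set.seed(3)  # arbitrary seed for reproducibility

n_subj <- 20   # hypothetical participants per condition
bins   <- 1:10 # ten time bins

# Simulated accuracy: both conditions start near 0.5 (chance) and
# condition "A" improves slightly faster than condition "B".
sim <- expand.grid(subject = seq_len(n_subj), bin = bins,
                   condition = c("A", "B"))
slope <- ifelse(sim$condition == "A", 0.045, 0.035)
sim$accuracy <- pmin(1, 0.5 + slope * (sim$bin - 1) +
                        rnorm(nrow(sim), sd = 0.08))  # capped at 100% correct

# One independent t-test per time bin, ignoring how bins relate to each other
p_by_bin <- sapply(bins, function(b) {
  d <- sim[sim$bin == b, ]
  t.test(accuracy ~ condition, data = d)$p.value
})

data.frame(bin = bins, p = round(p_by_bin, 3), significant = p_by_bin < 0.05)
```

Note that this sketch produces ten separate p-values, one per bin, and never uses the fact that the same participants contribute to every bin.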
1.3.1 Trade-off between power and resolution
The data in Figure 1.1 are based on an experiment that examined whether words with high “transitional probability” (TP) would be learned faster than words with low TP (Mirman, Magnuson, Graf Estes, & Dixon, 2008). Word learning was predicted to be faster in the high TP condition than the low TP condition. The training trials were grouped into blocks to examine the gradual learning. The data in Figure 1.1 are the word “learning curves”: the participants started out near chance (50% correct, because there are two response choices on each trial) and gradually got better, reaching about 90% correct at the end of 10 blocks of training trials. Importantly, it looks like this learning was faster for high TP words.
What kind of statistical test would provide the quantitative test of the effect of TP on word learning? Faster word learning means that participants in the High TP condition generally have higher accuracy, so we could do a t-test comparing the High and Low TP conditions on...