1 Introduction
Over the last three decades, social scientists have been collecting and analyzing event history data with increasing frequency. This trend is not accidental, nor does it reflect the prevailing fashion in survey research or statistical analysis. Instead, it indicates a growing recognition among social scientists that event history data are often the most appropriate empirical information one can get on the substantive process under study.
James Coleman (1981, p. 6) characterized this kind of substantive process in the following general way: (1) There is a collection of units (which may be individuals, organizations, societies, or whatever), each moving among a finite (usually small) number of states; (2) these changes (or events) may occur at any point in time (i.e., they are not restricted to predetermined points in time); and (3) there are time-constant and/or time-dependent factors influencing the events.
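Coleman's characterization can be made concrete with a small data sketch. The Python fragment below is only an illustration under stated assumptions: the `Episode` record, its field names, and the toy labor market histories are hypothetical and are not taken from Coleman or from the data sets used later in this book. It shows units occupying one of a small number of states, with transitions recorded at arbitrary points on a continuous time axis. Factors influencing the events (Coleman's third point) are omitted here; they could be stored as additional fields per episode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Episode:
    """One spell that a unit spends in a single state (hypothetical layout)."""
    unit_id: int
    origin: str                 # state occupied during the episode
    destination: Optional[str]  # state entered at the end; None if no event was observed
    start: float                # entry time on a continuous scale (e.g., months)
    end: float                  # time of the event, or end of the observation window

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def has_event(self) -> bool:
        return self.destination is not None


# Made-up histories for two units observed over a 36-month window.
episodes = [
    Episode(1, "employed", "unemployed", 0.0, 14.5),
    Episode(1, "unemployed", "employed", 14.5, 20.25),
    Episode(1, "employed", None, 20.25, 36.0),
    Episode(2, "unemployed", "employed", 0.0, 3.75),
    Episode(2, "employed", None, 3.75, 36.0),
]


def transition_counts(eps):
    """Count observed events by (origin state, destination state)."""
    counts = {}
    for e in eps:
        if e.has_event:
            key = (e.origin, e.destination)
            counts[key] = counts.get(key, 0) + 1
    return counts


counts = transition_counts(episodes)
time_employed = sum(e.duration for e in episodes if e.origin == "employed")
```

In this layout each row is one episode rather than one unit, so a unit that experiences several events contributes several rows, and an episode without an observed event is kept rather than dropped.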
Illustrative examples of this type of substantive process can be given for a wide variety of social research fields: In labor market studies, workers move between unemployment and employment,1 full-time and part-time work,2 or among various kinds of jobs;3 in social inequality studies, people become home-owners over the course of their lives (life course);4 in demographic analyses, men and women enter into consensual unions, marriages, or into father-/motherhood or get divorced;5 in sociological mobility studies, employees shift through different occupations, social classes, or industries;6 in studies of organizational ecology, firms, unions, or organizations are founded or closed down;7 in political science research, governments break down, voluntary organizations are founded, or countries go through a transition from one political regime to another;8 in migration studies, people move between different regions or countries;9 in marketing applications, consumers switch from one brand to another or purchase the same brand again; in criminological studies, prisoners are released and commit another criminal act after some time; in communication analyses, interaction processes such as interpersonal and small group processes are studied;10 in educational studies, students drop out of school before completing their degrees, enter into a specific educational track, or start a program of further education later in life;11 in analyses of ethnic conflict, incidences of racial and ethnic confrontation, protest, riot, and attack are studied;12 in social-psychological studies, aggressive responses are analyzed;13 in psychological studies, human development processes are studied;14 in psychiatric analyses, people may show signs of psychoses or neuroses at a specific age;15 in social policy studies, entry to and exit from poverty, transitions into retirement, or the changes in living conditions in old age are analyzed;16 in medical and epidemiological applications, patients switch between the states “healthy” and “diseased” or go through various phases of an addiction career;17 and so on.
Technically speaking, in all of these diverse examples, units of analysis occupy a discrete state in a theoretically meaningful state space, and transitions between these states can occur virtually at any time.18 Given an event history data set, the typical problem of the social scientist is to use appropriate statistical methods for describing this process of change, to discover the causal relationships among events, and to assess their importance.
This book was written to help the applied social scientist achieve these goals. In this introductory chapter, we first discuss different observation plans and their consequences for causal modeling. We also summarize the fundamental concepts of event history analysis and show that the change in the transition rate is a natural way to represent the causal effect in a statistical model. The remaining chapters are organized as follows:
- Chapter 2 describes event history data sets and their organization. It also shows how to use such data sets with Stata.
- Chapter 3 discusses basic nonparametric methods used to describe event history data, mainly the life table and the Kaplan-Meier (product-limit) estimation methods as well as cumulative incidence functions, and finally an appropriate nonparametric method for processes with competing risks.
- Chapter 4 deals with the basic exponential transition rate model. Although this very simple model is almost never appropriate in practical applications, it serves as an important starting point for all other transition rate models.
- Chapter 5 describes a simple generalization of the basic exponential model, called the piecewise constant exponential model. In our view, this is one of the most useful models for empirical research, and we devote a full chapter to discussing it.
- Chapter 6 discusses time-dependent covariates. The examples are restricted to exponential and piecewise exponential models, but the topic, and part of the discussion, is far more general. In particular, we introduce the problem of how to model parallel and interdependent processes.
- Chapter 7 introduces a variety of models with a parametrically specified duration-dependent transition rate, in particular the Gompertz–Makeham, Weibull, log-logistic, and log-normal models.
- Chapter 8 discusses the question of goodness-of-fit checks for parametric transition rate models. In particular, the chapter describes simple graphical checks based on transformed survivor functions and generalized residuals.
- Chapter 9 introduces basic and advanced semiparametric transition rate models based on an estimation approach proposed by D. R. Cox (1972). The advanced part on Cox models describes fixed effects models for multiepisode data.
- Chapter 10 discusses problems of model specification, in particular, transition rate models with unobserved heterogeneity. The discussion on specifying more than one error term on the same level is mainly critical, and examples are restricted to using a gamma mixing distribution. This chapter also provides a short introduction to transition rate models with random coefficients in a multilevel framework.
- Chapter 11 by Brendan Halpin introduces sequence analysis as a complementary alternative to event history analysis, with exploratory and descriptive advantages. Sequence analysis treats longitudinal data such as life course trajectories as wholes and calculates distances between sequences. These distances can be used to create data-based typologies using cluster analysis, to calculate distance to theoretically or empirically defined reference sequences, to compare trajectory diversity across variables such as cohort, to compare similarity of dyadic pairs of sequences such as couples’ time use, and so on.
- The Appendix contains many exercises designed to help the reader gain a deeper understanding of the (basic) concepts, more familiarity with how to handle event history data and estimation commands in Stata, and greater skill in estimating the results.
1.1 Causal modeling and observation plans
In event history modeling, design issues regarding the type of substantive process are of crucial importance. It is assumed that the methods of data analysis (e.g., estimation and testing techniques) cannot depend on the particular type of data alone (e.g., cross-sectional data or panel data), as has been the case when applying more traditional statistical methodologies. Rather, the characteristics of the specific kind of social process itself must “guide” both the design of data collection and the way these data are analyzed and interpreted (Coleman, 1973, 1981, 1990).
Different observation plans have been used to collect ...