CHAPTER 1
WHAT IS A REGRESSION?
1.0 What We Need to Know When We Finish This Chapter
1.1 Why Are We Doing This?
1.2 Education and Earnings
1.3 What Does a Regression Look Like?
1.4 Where Do We Begin?
1.5 Where’s the Explanation?
1.6 What Do We Look for in This Explanation?
1.7 How Do We Interpret the Explanation?
1.8 How Do We Evaluate the Explanation?
1.9 R² and the F-statistic
1.10 Have We Put This Together in a Responsible Way?
1.11 Do Regressions Always Look Like This?
1.12 How to Read This Book
1.13 Conclusion
Exercises
1.0 What We Need to Know When We Finish This Chapter
This chapter explains what a regression is and how to interpret it. Here are the essentials.
1. Section 1.4: The dependent or endogenous variable measures the behavior that we want to explain with regression analysis.
2. Section 1.5: The explanatory, independent, or exogenous variables measure things that we think might determine the behavior that we want to explain. We usually think of them as predetermined.
3. Section 1.5: The slope estimates the effect of a change in the explanatory variable on the value of the dependent variable.
4. Section 1.5: The t-statistic indicates whether the explanatory variable has a discernible association with the dependent variable. The association is discernible if the p-value associated with the t-statistic is .05 or less. In this case, we say that the slope is statistically significant. This generally corresponds to an absolute value of approximately two or greater for the t-statistic itself. If the t-statistic has a p-value that is greater than .05, the associated slope coefficient is insignificant. This means that the explanatory variable has no discernible effect.
5. Section 1.6: The intercept is usually uninteresting. It represents what everyone has in common, rather than characteristics that might cause individuals to be different.
6. Section 1.6: We usually interpret only the slopes that are statistically significant. We usually think of them as indicating the effect of their associated explanatory variables on the dependent variable ceteris paribus, or holding constant all other characteristics that are included in the regression.
7. Section 1.6: Continuous variables take on a wide range of values. Their slopes indicate the change that would be expected in the dependent variable if the value of the associated explanatory variable increased by one unit.
8. Section 1.6: Discrete variables, sometimes called categorical variables, indicate the presence or absence of a particular characteristic. Their slopes indicate the change that would occur in the dependent variable if an individual who did not have that characteristic were given it.
9. Section 1.7: Regression interpretation requires three steps. The first is to identify the discernible effects. The second is to understand their magnitudes. The third is to use this understanding to verify or modify the behavioral understanding that motivated the regression in the first place.
10. Section 1.7: Statistical significance is necessary in order to have interesting results, but not sufficient. Important slopes are those that are both statistically significant and substantively large. Slopes that are statistically significant but substantively small indicate that the effects of the associated explanatory variable can be reliably understood as unimportant.
11. Section 1.7: A proxy is a variable that is related to, but not exactly the variable we really want. We use proxies when the variables we really want aren’t available. Sometimes this makes interpretation difficult.
12. Section 1.8: If the p-value associated with the F-statistic is .05 or less, the collective effect of the ensemble of explanatory variables on the dependent variable is statistically significant.
13. Section 1.8: Observations are the individual examples of the behavior under examination. All of the observations together constitute the sample on which the regression is based.
14. Section 1.8: The R², or coefficient of determination, represents the proportion of the variation in the dependent variable that is explained by the explanatory variables. The adjusted R² modifies the R² in order to take account of the numbers of explanatory variables and observations. However, neither measures statistical significance directly.
15. Section 1.9: F-statistics can be used to evaluate the contribution of a subset of explanatory variables, as well as the collective statistical significance of all explanatory variables. In both cases, the F-statistic is a transformation of R² values.
16. Section 1.10: Regression results are useful only to the extent that the choices of variables in the regression, variable construction, and sample design are appropriate.
17. Section 1.11: Regression results may be presented in one of several different formats. However, they all have to contain the same substantive information.
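Points 3, 4, and 14 above can be made concrete with a small numerical sketch. In the simple one-variable case, the slope, the intercept, the R², and the slope's t-statistic can all be computed by hand from a handful of sums. The tiny schooling/earnings data below are invented purely for illustration and are not from the book's sample:

```python
import math

# Hypothetical data: years of schooling (x) and hourly earnings (y).
# These numbers are made up for illustration only.
x = [10, 12, 12, 14, 16, 16, 18, 20]
y = [9.0, 11.5, 10.8, 13.9, 15.2, 16.1, 18.4, 21.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope and intercept from the usual least-squares formulas (points 3 and 5).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# R-squared (point 14): the proportion of the variation in y explained by x.
fitted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# t-statistic for the slope (point 4): the estimate divided by its
# standard error. By the rule of thumb, an absolute value of roughly
# two or more indicates statistical significance.
sigma2 = ss_res / (n - 2)          # residual variance
se_slope = math.sqrt(sigma2 / s_xx)
t_stat = slope / se_slope

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R^2 = {r_squared:.3f}, t-statistic = {t_stat:.2f}")
```

With this invented sample, the slope's t-statistic comes out well above two in absolute value, so by the rule of thumb in point 4 we would call the slope statistically significant; whether it is also substantively large (point 10) is a separate judgment about its magnitude.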
1.1 Why Are We Doing This?
The fundamental question that underlies most of science is, how does one thing affect another? This is the sort of question that we ask ourselves all the time. Whenever we wonder whether our grade will go up if we study more, whether we’re more likely to get into graduate school if our grades are better, or whether we’ll get a better job if we go to graduate school, we are asking questions that econometrics can answer with elegance and precision.
Of course, we probably think we have answers to these questions already. We almost surely do. However, they’re casual and even sloppy. Moreover, our confidence in them is almost certainly exaggerated.
Econometrics is a collection of powerful statistical tools that are devoted to helping provide answers to the question of how one thing affects another. Econometrics not only teaches us how to answer questions like this more accurately but also helps us understand what is necessary in order to obtain an answer that we can legitimately treat as accurate.
We begin in this chapter with a primer on how to interpret regression results. This will allow us to read work based on regression and even to begin to perform our own analyses. We might think that this would be enough.
However, this chapter will not explain why the interpretations it presents are valid. That requires a much more thorough investigation. We prepare for this investigation in chapter 2. There, we review the summation sign, the most important mathematical tool for the purposes of this book.
We actually embark on this investigation in chapter 3, where we consider the precursors to regression: the covariance and the correlation. These are basic statistics that measure the association between two variables, without regard to causation. We might have seen them before. We return to them in detail because they are the mathematical building blocks from which regressions are constructed.
Our primary focus, however, will be on the fundamentals of regression analysis. Regression is the principal tool that economists use to assess the responsiveness of some outcome to changes in its determinants. We might have had an introduction to regression before as well. Here, we devote chapters 4, 5, and 7 through 14 to a thorough discussion.
Chapter 6 intervenes with a discussion of confidence intervals and hypothesis tests. This material is relevant to all of statistics, rather than specific to econometrics. We introduce it here to help us complete the link between the regression calculations of chapter 4 and the behavior that we hope they represent, discussed in chapter 5.
Chapter 15 discusses what we can do in a common situation where we would like to use regression, but where the available information isn’t exactly appropriate for it. This discussion will introduce us to probit analysis, an important relative of regression. More generally, it will give us some insight as to how we might proceed when faced with other situations of this sort.
As we learn about regression, we will occasionally need concepts from basic statistics. Some of us may have already been exposed to them. For those of us in this category, chapters 3 and 6 may seem familiar, and perhaps even chapter 4. For those of us who haven’t studied statistics before, this book introduces and reviews each of the relevant concepts when our discussion of regression requires them.1
1.2 Education and Earnings
Few of us will be interested in econometrics purely for its theoretical beauty. In fact, this book is based on the premise that what will interest us most is how econometrics can help us organize the quantitative information that we observe all around us. Obviously, we’ll need examples.
There are two ways to approach the selection of examples. Econometric analysis has probably been applied to virtually all aspects of human behavior. This means that there is something for everyone. Why not provide it?
Well, this strategy would involve a lot of examples. Most readers wouldn’t need that many to get the hang of things, and they probably wouldn’t be interested in a lot of them. In addition, so many examples could make the book a lot bigger, which might make it seem intimidating.
The alternative is to focus principally on one example that may have relatively broad appeal and develop it throughout the book. That’s the choice here. We will still sample a variety of applications over the course of the entire text. However, our running example returns, in a larger sense, to the question of section 1.1: Why are we doing this? Except now, let’s talk about college, not this course.
Presumably, at least some of the answer to that question is that we believe college prepares us in an important way for adulthood. Part of that preparation is for jobs and careers. In other words, we probably believe that education has some important effect on our ability to support ourselves.
This is the example that we’ll pursue throughout this book. In the rest of this chapter, we’ll interpret a somewhat complicated regression that repres...