eBook - ePub

Linear Regression Models

Name: Linear Regression Models
ISBN: 9781000438109

Applications in R

John P. Hoffmann

432 pages
English

About this book

Research in social and behavioral sciences has benefited from linear regression models (LRMs) for decades to identify and understand the associations among a set of explanatory variables and an outcome variable. Linear Regression Models: Applications in R provides you with a comprehensive treatment of these models and indispensable guidance about how to estimate them using the R software environment.

After furnishing some background material, the author explains how to estimate simple and multiple LRMs in R, including how to interpret their coefficients and understand their assumptions. Several chapters thoroughly describe these assumptions and explain how to determine whether they are satisfied and how to modify the regression model if they are not. The book also includes chapters on specifying the correct model, adjusting for measurement error, understanding the effects of influential observations, and using the model with multilevel data. The concluding chapter presents an alternative model—logistic regression—designed for binary or two-category outcome variables. The book includes appendices that discuss data management and missing data and provides simulations in R to test model assumptions.

Features

Furnishes a thorough introduction and detailed information about the linear regression model, including how to understand and interpret its results, test assumptions, and adapt the model when assumptions are not satisfied.

Uses numerous graphs in R to illustrate the model's results, assumptions, and other features.

Does not assume a background in calculus or linear algebra; an introductory statistics course and familiarity with elementary algebra are sufficient.

Provides many examples using real-world datasets relevant to various academic disciplines.

Fully integrates the R software environment in its numerous examples.

The book is aimed primarily at advanced undergraduate and graduate students in social, behavioral, health sciences, and related disciplines, taking a first course in linear regression. It could also be used for self-study and would make an excellent reference for any researcher in these fields. The R code and detailed examples provided throughout the book equip the reader with an excellent set of tools for conducting research on numerous social and behavioral phenomena.

John P. Hoffmann is a professor of sociology at Brigham Young University where he teaches research methods and applied statistics courses and conducts research on substance use and criminal behavior.

Information

Subtopic: Statistics for Business and Economics

1 Introduction

DOI: 10.1201/9781003162230-1

Think about how often we’re exposed to data of some sort. Reports of studies in newspapers, magazines, and online provide data about people, animals, or even abstract entities such as cities, counties, or countries. Life expectancies, crime rates, pollution levels, the prevalence of diseases, unemployment rates, election results, and numerous other phenomena are presented with overwhelming frequency and in painful detail. Understanding statistics—or at least being able to talk intelligently about percentages, means, and margins of error—has become nearly compulsory for the well-informed person. Yet, few people understand enough about statistics to fully grasp not only the strengths but also the weaknesses of the way data are collected and analyzed. What does it mean to say that the life expectancy in the U.S. is 78.7 years? Should we trust exit polls that claim that Wexton will win the election over Comstock by 5% (with a “margin of error” of ± 2%)? When someone claims that “taking calcium supplements is not associated with a significantly lower risk of bone fractures in elderly women,” what are they actually saying? These questions, as well as many others, are common in today’s world of statistical analysis and numeracy.

For the budding social or behavioral scientist, whether sociologist, psychologist, geographer, political scientist, or economist, avoiding quantitative analyses that move beyond simple statistics such as percentages, means, standard deviations, and t-tests is almost impossible. A large proportion of studies found in professional journals employ statistical models that are designed to predict or explain the occurrence of one variable with information about other variables. The most common type of prediction tool is a regression model. Many books and articles describe, for example, how to conduct a linear regression analysis (LRA) or estimate an LRM,¹ which, as noted in the Preface, is designed to account for or predict the values of a single outcome variable with information from one or more explanatory variables. Students are usually introduced to this model in a second course on applied statistics, and it is the main focus of this book. Before beginning a detailed description of LRMs, though, let’s address some general issues that all researchers and consumers of statistics should bear in mind.

¹The word linear is defined as “capable of being represented by a straight line on a graph” (Oxford English Dictionary, definition 3, https://www.oed.com). Why the definition specifies a straight line will become clear in Chapter 3.

Our Doubts are Traitors and Make Us Lose the Good We Oft Might Win²

²William Shakespeare, Measure for Measure, Act I, Scene IV.

A critical issue I hope readers will ponder as they study the material in the following chapters involves perceptions of quantitative research. Statistics has, for better or worse, been maligned by a variety of observers in recent years. For one thing, the so-called “replication crisis” has brought to light the problem that the results of many studies in the social and behavioral sciences cannot be confirmed by subsequent studies.³ Books with titles such as How to Lie with Statistics are also popular⁴ and can lend an air of disbelief to many studies that use statistical models. Researchers and statistics educators are often to blame for this disbelief. We frequently fail to impart some important caveats to students and consumers, including:

³See, for example, Ed Yong (2018), “Psychology’s Replication Crisis Is Running Out of Excuses,” The Atlantic, November 19 (retrieved from https://www.theatlantic.com/science/archive/2018/11/psychologys-replication-crisis-real/576223).

⁴The book’s cover notes that it has sold “over half a million copies” (see Darrell Huff (1993), How to Lie with Statistics, New York: W.W. Norton). I suspect this makes it one of the best (if not the best) selling statistics books of all time.

A single study is never the end of the story; multiple studies are needed before we can (or should) reach defensible conclusions about social and behavioral phenomena.
Consumers and researchers need to embrace a healthy dose of skepticism when considering the results of research studies.⁵ They should ask questions about how data were collected, how variables were measured, and whether the appropriate statistical methods were used. We should also realize that random or sampling “error” (see Chapter 2) affects the results of even the best designed studies.
People should be encouraged to use their common sense and reasoning skills when assessing data and the results of analyses. Although it’s important to minimize confirmation bias and similar cognitive tendencies that (mis)shape how we process and interpret information, we should still consider whether research findings are based on sound premises and follow a logical pattern given what we already know about a phenomenon.

⁵Healthy is the operative word here. Unfortunately, I fear that many people have become overly skeptical about the results of scientific studies, even from those that are rigorously designed and executed. In the U.S. there have been increasing skepticism, mistrust of people and institutions, and political polarization that affect worldviews and beliefs. This can motivate some to dismiss important research findings that might otherwise be beneficial (see Esteban Ortiz-Ospina and Max Roser (2019), “Trust,” at https://ourworldindata.org/trust; Gleb Tsipursky (2018), “(Dis)trust in Science,” Scientific American, July 5; and Jan Mewes et al. (2021), “Experiences Matter: A Longitudinal Study of Individual-Level Sources of Declining Social Trust in the United States,” Social Science Research, retrieved from https://doi.org/10.1016/j.ssresearch.2021.102537).

Best Statistical Practices⁶

⁶Adapted from E. Ashley Steel, Martin Liermann, and Peter Guttorp (2019), “Beyond Calculations: A Course in Statistical Thinking,” American Statistician 73(S1): 392–401.

In the spirit of these three admonitions, it is wise to heed the following advice regarding data analysis in general and regression analysis in particular.

1. Plot your data, early and often.
2. Understand that your dataset is only one of many possible sets of data that could have been observed.
3. Understand the context of your dataset: what is the background science and how were measurements taken (for example, survey questions or direct measures)? What are the limitations of the measurement tools used to collect the data? Are some data missing? Why?
4. Be thoughtful in choosing summary statistics.
5. Decide early which parts of your analysis are exploratory and which parts are confirmatory, and preregister⁷ your hypotheses, if not formally then at least in your own mind.
6. If you use p-values,⁸ which can provide some evidence regarding statistical results, follow these principles:
   1. Report effect sizes and confidence intervals (CIs);
   2. Consider providing graphical evidence of predicted values or effect sizes to display for your audience the magnitude of differences furnished by the analysis;
   3. Report the number of tests you conduct (formal and informal);
   4. Interpret the p-value in light of your sample size (and power);
   5. Don’t use p-values to claim that the null hypothesis of no difference is true; and
   6. Consider the p-value as, at best, only one source of evidence regarding your conclusion rather than the conclusion itself.
7. Consider creating customized, simulation-based statistical tests for answering your specific question with your particular dataset.
8. Use simulations to understand the performance of your statistical plan on datasets like yours and to test various assumptions.
9. Read results with skepticism, remembering that patterns can easily occur by chance (especially with small samples), and that unexpected results based on small sample sizes are often wrong.
10. Interpret statistical results or patterns in data as being consistent or inconsistent with a conceptual model or hypothesis instead of claiming that they reveal or prove some phenomenon or relationship (see Chapter 2 for an elaboration of this recommendation).
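
The advice above about using simulations and about reading small-sample results with skepticism can be made concrete with a short simulation. The sketch below is not taken from the book, whose examples use R; it uses only Python's standard library, and the function names, the 0.3 cutoff for a "large" correlation, and the sample sizes are assumptions chosen for illustration. It repeatedly draws two variables that are unrelated by construction and records how often their sample correlation nonetheless looks sizable.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def chance_correlations(n, reps=2000, seed=1):
    """Return |r| from `reps` samples of size n of two unrelated variables."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of x
        results.append(abs(pearson_r(x, y)))
    return results


def share_large(n, threshold=0.3, reps=2000, seed=1):
    """Share of simulated samples whose |r| exceeds the (assumed) threshold."""
    rs = chance_correlations(n, reps, seed)
    return sum(r > threshold for r in rs) / reps


if __name__ == "__main__":
    # Illustrative sample sizes; any pattern found here is pure chance.
    for n in (10, 50, 500):
        print(f"n = {n:>3}: share of |r| > 0.3 = {share_large(n):.3f}")
```

Because the true correlation is zero in every replication, any sizable correlation in the output arises by chance alone. Running the script shows how the share of such spurious patterns changes as the sample size grows.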

⁷Preregistration is a growing trend wherein researchers publicly identify the hypotheses that guide their work early in the process. They then restrict the analysis to testing those hypotheses and not others. A common view is that researchers must distinguish the hypothesis generating process from the hypothesis testing process. Preregistration is designed to guard against “fishing expeditions”: the tendency to estimate several statistical models and then choose one to report that seems the most interesting or innovative. The point in practice 5 is that we should always preregister hypotheses and the conceptual models or theories that guide them, even if informally, and avoid the temptation to keep estimating statistical models until we “confirm” some attractive, yet post hoc, hypothesis. For additional information, see Brian A. Nosek et al. (2018), “The Preregistration Revolution,” PNAS 115(11): 2600–2606.

⁸These, as well as effect sizes, CIs, and hypothesis tests, are described in detail in Chapter 2.

The material presented in the following chapters is not completely faithful to these practices. For example, we don’t cover how variables are measured, hypothesis generation, or simulations (but see Appendix B), and we are at times too willing to trust p-values (see Chapter 2). These practices should, nonetheless, be at the forefront of all researchers’ minds as they consider how to plan, execute, and report their own research.

I hope readers of subsequent chapters will be comfortable thinking about the results of quantitative studies as they consider this material and as they embark on their own studies. In fact, I never wish to underemphasize the importance of careful reasoning among those assessing and using statistical techniques. Nor should we suspend our common sense and knowledge of the research literature simply because a set of numbers supports some unusual conclusion. This is not to say that statistical analysis is not valuable or that the results are generally misleading. Numerous findings from research studies that did not comport with accepted knowledge have been shown to be valid in subsequent studies. Statistical analyses have also led to many noteworthy discoveries in the social, behavioral, and health sciences, and have informed policy in productive ways. The point I wish to impart is that we need a combination of tools (including statistical methods, a clear comprehension of previous research, and our own ideas and reasoning abilities) to help us understand social and behavioral issues.

Statistical Software

I have taught courses on regression models for many years. When I first started out, most social and behavioral scientists used SPSS or SAS to estimate statistical models. I had used both but became a diehard Stata user. So, after a few years teaching students to use SPSS for statistical modeling I switched to Stata. But the tide has turned and the statistical software R (www.r-project.org)—a descendant of SPlus—is on the rise in my field. I therefore opted to prepare this book using R for the analytic examples. Since this is not a book on statistical software, however, I strongly urge readers to take the necessary time to learn to use R, which, among its many capabilities, performs all of the analyses presented herein. It has a rather steep learning curve, but once you’ve mastered the basics of R, estimating univariate, bivariate, and multivariable statistics, including LRMs, is a straightforward task. Learning to use R is easier if you have experience with another statistical software package such as SAS, SPSS, or Stata; but even a diligent novice can l...

Cover
Half-Title
Series
Title
Copyright
Contents
Preface
Acknowledgments
Author Biography
1 Introduction
2 Review of Elementary Statistical Concepts
3 Simple Linear Regression Models
4 Multiple Linear Regression Models
5 The ANOVA Table and Goodness-of-Fit Statistics
6 Comparing Linear Regression Models
7 Indicator Variables in Linear Regression Models
8 Independence
9 Homoscedasticity
10 Collinearity and Multicollinearity
11 Normality, Linearity, and Interaction Effects
12 Model Specification
13 Measurement Errors
14 Influential Observations: Leverage Points and Outliers
15 Multilevel Linear Regression Models
16 A Brief Introduction to Logistic Regression
17 Conclusions
Appendix A: Data Management
Appendix B: Using Simulations to Examine Assumptions of Linear Regression Models
Appendix C: Selected Formulas
Appendix D: User-Written R Packages Employed in the Examples
References
Index
