Linear Regression Models
eBook - ePub

Linear Regression Models

Applications in R

John P. Hoffmann

  1. 432 pages
  2. English
  3. ePUB (available in the app)
  4. Available on iOS and Android

About the book

Research in social and behavioral sciences has benefited from linear regression models (LRMs) for decades to identify and understand the associations among a set of explanatory variables and an outcome variable. Linear Regression Models: Applications in R provides you with a comprehensive treatment of these models and indispensable guidance about how to estimate them using the R software environment.

After furnishing some background material, the author explains how to estimate simple and multiple LRMs in R, including how to interpret their coefficients and understand their assumptions. Several chapters thoroughly describe these assumptions and explain how to determine whether they are satisfied and how to modify the regression model if they are not. The book also includes chapters on specifying the correct model, adjusting for measurement error, understanding the effects of influential observations, and using the model with multilevel data. The concluding chapter presents an alternative model—logistic regression—designed for binary or two-category outcome variables. The book includes appendices that discuss data management and missing data and provides simulations in R to test model assumptions.

Features



  • Furnishes a thorough introduction and detailed information about the linear regression model, including how to understand and interpret its results, test assumptions, and adapt the model when assumptions are not satisfied.
  • Uses numerous graphs in R to illustrate the model's results, assumptions, and other features.
  • Does not assume a background in calculus or linear algebra; an introductory statistics course and familiarity with elementary algebra are sufficient.
  • Provides many examples using real-world datasets relevant to various academic disciplines.
  • Fully integrates the R software environment in its numerous examples.

The book is aimed primarily at advanced undergraduate and graduate students in social, behavioral, health sciences, and related disciplines, taking a first course in linear regression. It could also be used for self-study and would make an excellent reference for any researcher in these fields. The R code and detailed examples provided throughout the book equip the reader with an excellent set of tools for conducting research on numerous social and behavioral phenomena.

John P. Hoffmann is a professor of sociology at Brigham Young University where he teaches research methods and applied statistics courses and conducts research on substance use and criminal behavior.

Information

Year
2021
ISBN
9781000438109

1 Introduction

DOI: 10.1201/9781003162230-1
Think about how often we’re exposed to data of some sort. Reports of studies in newspapers, magazines, and online provide data about people, animals, or even abstract entities such as cities, counties, or countries. Life expectancies, crime rates, pollution levels, the prevalence of diseases, unemployment rates, election results, and numerous other phenomena are presented with overwhelming frequency and in painful detail. Understanding statistics—or at least being able to talk intelligently about percentages, means, and margins of error—has become nearly compulsory for the well-informed person. Yet, few people understand enough about statistics to fully grasp not only the strengths but also the weaknesses of the way data are collected and analyzed. What does it mean to say that the life expectancy in the U.S. is 78.7 years? Should we trust exit polls that claim that Wexton will win the election over Comstock by 5% (with a “margin of error” of ± 2%)? When someone claims that “taking calcium supplements is not associated with a significantly lower risk of bone fractures in elderly women,” what are they actually saying? These questions, as well as many others, are common in today’s world of statistical analysis and numeracy.
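To make the "margin of error" in the polling question concrete, here is a minimal R sketch. It assumes a simple random sample and the usual normal approximation for a proportion; the sample size and the candidate's share are made-up values for illustration and do not come from any actual poll.

```r
# Hypothetical poll (made-up values): 2,400 respondents, 52.5% favor one candidate
n     <- 2400
p_hat <- 0.525

# Standard error of a sample proportion under simple random sampling
se <- sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% margin of error (normal approximation)
moe <- qnorm(0.975) * se

# Interval for the candidate's share of the vote
ci <- c(lower = p_hat - moe, upper = p_hat + moe)

round(100 * moe, 1)   # margin of error in percentage points
round(100 * ci, 1)    # interval in percent
```

With these assumed numbers the margin of error works out to about two percentage points. Note that it describes the uncertainty in one candidate's share; the uncertainty in the lead between two candidates is wider.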
For the budding social or behavioral scientist, whether sociologist, psychologist, geographer, political scientist, or economist, avoiding quantitative analyses that move beyond simple statistics such as percentages, means, standard deviations, and t-tests is almost impossible. A large proportion of studies found in professional journals employ statistical models that are designed to predict or explain the occurrence of one variable with information about other variables. The most common type of prediction tool is a regression model. Many books and articles describe, for example, how to conduct a linear regression analysis (LRA) or estimate an LRM [1], which, as noted in the Preface, is designed to account for or predict the values of a single outcome variable with information from one or more explanatory variables. Students are usually introduced to this model in a second course on applied statistics, and it is the main focus of this book. Before beginning a detailed description of LRMs, though, let’s address some general issues that all researchers and consumers of statistics should bear in mind.
[1] The word linear is defined as “capable of being represented by a straight line on a graph” (Oxford English Dictionary, definition 3, https://www.oed.com). Why the definition specifies a straight line will become clear in Chapter 3.
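As a preview of what estimating an LRM looks like in R, the following sketch fits a model with one explanatory variable using the base function `lm()`. The data are simulated and the variable names `x` and `y` are hypothetical; this is an illustration under those assumptions, not an example taken from the book.

```r
set.seed(123)                      # make the simulated data reproducible

# Simulated data (hypothetical): outcome y depends linearly on x plus noise
n   <- 200
x   <- rnorm(n, mean = 50, sd = 10)
y   <- 5 + 0.8 * x + rnorm(n, sd = 6)
dat <- data.frame(x = x, y = y)

# Estimate a linear regression model with one explanatory variable
fit <- lm(y ~ x, data = dat)
coef(fit)                          # estimated intercept and slope

# Predicted outcomes at equally spaced values of x
new_x <- data.frame(x = c(40, 50, 60))
pred  <- predict(fit, newdata = new_x)

# Equal steps in x produce equal steps in the prediction: a straight line
diff(pred)
```

The two differences printed by `diff(pred)` are identical, which is what "capable of being represented by a straight line" means in practice: each additional unit of `x` shifts the predicted outcome by the same amount.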

Our Doubts are Traitors and Make Us Lose the Good We Oft Might Win [2]

[2] William Shakespeare, Measure for Measure, Act I, Scene IV.
A critical issue I hope readers will ponder as they study the material in the following chapters involves perceptions of quantitative research. Statistics has, for better or worse, been maligned by a variety of observers in recent years. For one thing, the so-called “replication crisis” has brought to light the problem that the results of many studies in the social and behavioral sciences cannot be confirmed by subsequent studies [3]. Books with titles such as How to Lie with Statistics are also popular [4] and can lend an air of disbelief to many studies that use statistical models. Researchers and statistics educators are often to blame for this disbelief. We frequently fail to impart some important caveats to students and consumers, including:
[3] See, for example, Ed Yong (2018), “Psychology’s Replication Crisis Is Running Out of Excuses,” The Atlantic, November 19 (retrieved from https://www.theatlantic.com/science/archive/2018/11/psychologys-replication-crisis-real/576223).
[4] The book’s cover notes that it has sold “over half a million copies” (see Darrell Huff (1993), How to Lie with Statistics, New York: W.W. Norton). I suspect this makes it one of the best (if not the best) selling statistics books of all time.
  1. A single study is never the end of the story; multiple studies are needed before we can (or should) reach defensible conclusions about social and behavioral phenomena.
  2. Consumers and researchers need to embrace a healthy dose of skepticism when considering the results of research studies [5]. They should ask questions about how data were collected, how variables were measured, and whether the appropriate statistical methods were used. We should also realize that random or sampling “error” (see Chapter 2) affects the results of even the best designed studies.
  3. People should be encouraged to use their common sense and reasoning skills when assessing data and the results of analyses. Although it’s important to minimize confirmation bias and similar cognitive tendencies that (mis)shape how we process and interpret information, we should still consider whether research findings are based on sound premises and follow a logical pattern given what we already know about a phenomenon.
[5] Healthy is the operative word here. Unfortunately, I fear that many people have become overly skeptical about the results of scientific studies, even from those that are rigorously designed and executed. In the U.S. there have been increasing skepticism, mistrust of people and institutions, and political polarization that affect worldviews and beliefs. This can motivate some to dismiss important research findings that might otherwise be beneficial (see Esteban Ortiz-Ospina and Max Roser (2019), “Trust,” at https://ourworldindata.org/trust; Gleb Tsipursky (2018), “(Dis)trust in Science,” Scientific American, July 5; and Jan Mewes et al. (2021), “Experiences Matter: A Longitudinal Study of Individual-Level Sources of Declining Social Trust in the United States,” Social Science Research, retrieved from https://doi.org/10.1016/j.ssresearch.2021.102537).

Best Statistical Practices [6]

[6] Adapted from E. Ashley Steel, Martin Liermann, and Peter Guttorp (2019), “Beyond Calculations: A Course in Statistical Thinking,” American Statistician 73(S1): 392–401.
In the spirit of these three admonitions, it is wise to heed the following advice regarding data analysis in general and regression analysis in particular.
  1. Plot your data—early and often.
  2. Understand that your dataset is only one of many possible sets of data that could have been observed.
  3. Understand the context of your dataset—what is the background science and how were measurements taken (for example, survey questions or direct measures)? What are the limitations of the measurement tools used to collect the data? Are some data missing? Why?
  4. Be thoughtful in choosing summary statistics.
  5. Decide early which parts of your analysis are exploratory and which parts are confirmatory, and preregister [7] your hypotheses, if not formally then at least in your own mind.
  6. If you use p-values [8], which can provide some evidence regarding statistical results, follow these principles:
    1. Report effect sizes and confidence intervals (CIs);
    2. Consider providing graphical evidence of predicted values or effect sizes to display for your audience the magnitude of differences furnished by the analysis;
    3. Report the number of tests you conduct (formal and informal);
    4. Interpret the p-value in light of your sample size (and power);
    5. Don’t use p-values to claim that the null hypothesis of no difference is true; and
    6. Consider the p-value as, at best, only one source of evidence regarding your conclusion rather than the conclusion itself.
  7. Consider creating customized, simulation-based statistical tests for answering your specific question with your particular dataset.
  8. Use simulations to understand the performance of your statistical plan on datasets like yours and to test various assumptions.
  9. Read results with skepticism, remembering that patterns can easily occur by chance (especially with small samples), and that unexpected results based on small sample sizes are often wrong.
  10. Interpret statistical results or patterns in data as being consistent or inconsistent with a conceptual model or hypothesis instead of claiming that they reveal or prove some phenomenon or relationship (see Chapter 2 for an elaboration of this recommendation).
[7] Preregistration is a growing trend wherein researchers publicly identify the hypotheses that guide their work early in the process. They then restrict the analysis to testing those hypotheses and not others. A common view is that researchers must distinguish the hypothesis generating process from the hypothesis testing process. Preregistration is designed to guard against “fishing expeditions”: the tendency to estimate several statistical models and then choose one to report that seems the most interesting or innovative. The point in practice 5 is that we should always preregister hypotheses and the conceptual models or theories that guide them, even if informally, and avoid the temptation to keep estimating statistical models until we “confirm” some attractive, yet post hoc, hypothesis. For additional information, see Brian A. Nosek et al. (2018), “The Preregistration Revolution,” PNAS 115(11): 2600–2606.
[8] These, as well as effect sizes, CIs, and hypothesis tests, are described in detail in Chapter 2.
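Practices 1 and 6 in the list above can be sketched in a few lines of R. The example is only an illustration: the data are simulated, and the variable names `hours` and `score` are hypothetical. It plots the data first, then reports the estimated slope with its confidence interval alongside the p-value rather than the p-value alone.

```r
set.seed(2021)

# Hypothetical data: 'score' rises modestly with 'hours' (names are made up)
n     <- 60
hours <- runif(n, min = 0, max = 10)
score <- 50 + 1.5 * hours + rnorm(n, sd = 8)
dat   <- data.frame(hours = hours, score = score)

# Practice 1: plot the data before modeling
plot(score ~ hours, data = dat, main = "Simulated data")

# Fit the model, then collect the effect size, its 95% CI, and the p-value
fit <- lm(score ~ hours, data = dat)
est <- coef(fit)["hours"]
ci  <- confint(fit, "hours", level = 0.95)
p   <- summary(fit)$coefficients["hours", "Pr(>|t|)"]

# Add the fitted line so readers can judge the magnitude of the association
abline(fit, lwd = 2)

# Report estimate, interval, and p-value together
data.frame(estimate = est, lower = ci[1], upper = ci[2], p_value = p)
```

Presenting the interval next to the estimate keeps attention on how large the association plausibly is, which the p-value by itself does not convey.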
The material presented in the following chapters is not completely faithful to these practices. For example, we don’t cover how variables are measured, hypothesis generation, or simulations (but see Appendix B), and we are at times too willing to trust p-values (see Chapter 2). These practices should, nonetheless, be at the forefront of all researchers’ minds as they consider how to plan, execute, and report their own research.
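Although the book leaves simulations largely to Appendix B, a short sketch may help show what practices 8 and 9 have in mind. The code below is not the author's; it assumes two unrelated, normally distributed variables and a hypothetical helper named `sim_cor()`, and it compares how much the sample correlation varies by chance alone in small and large samples.

```r
set.seed(42)

# Hypothetical helper: correlations between two unrelated variables,
# recomputed over many simulated samples of size n
sim_cor <- function(n, reps = 2000) {
  replicate(reps, cor(rnorm(n), rnorm(n)))
}

r_small <- sim_cor(n = 10)     # small samples
r_large <- sim_cor(n = 500)    # large samples

# Spread of the chance correlations in each setting
sd(r_small)
sd(r_large)

# Share of samples showing a sizable correlation (|r| > 0.3)
# even though the true association is zero
mean(abs(r_small) > 0.3)
mean(abs(r_large) > 0.3)
```

Under these assumptions the small samples produce sizable correlations far more often than the large ones, which is one reason unexpected results from small samples deserve extra skepticism.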
I hope readers of subsequent chapters will be comfortable thinking about the results of quantitative studies as they consider this material and as they embark on their own studies. In fact, I never wish to underemphasize the importance of careful reasoning among those assessing and using statistical techniques. Nor should we suspend our common sense and knowledge of the research literature simply because a set of numbers supports some unusual conclusion. This is not to say that statistical analysis is not valuable or that the results are generally misleading. Numerous findings from research studies that did not comport with accepted knowledge have been shown valid in subsequent studies. Statistical analyses have also led to many noteworthy discoveries in social, behavioral, and health sciences, as well as informed policy in a productive way. The point I wish to impart is that we need a combination of tools—including statistical methods, a clear comprehension of previous research, and our own ideas and reasoning abilities—to help us understand social and behavioral issues.

Statistical Software

I have taught courses on regression models for many years. When I first started out, most social and behavioral scientists used SPSS or SAS to estimate statistical models. I had used both but became a diehard Stata user. So, after a few years teaching students to use SPSS for statistical modeling I switched to Stata. But the tide has turned and the statistical software R (www.r-project.org)—a descendant of SPlus—is on the rise in my field. I therefore opted to prepare this book using R for the analytic examples. Since this is not a book on statistical software, however, I strongly urge readers to take the necessary time to learn to use R, which, among its many capabilities, performs all of the analyses presented herein. It has a rather steep learning curve, but once you’ve mastered the basics of R, estimating univariate, bivariate, and multivariable statistics, including LRMs, is a straightforward task. Learning to use R is easier if you have experience with another statistical software package such as SAS, SPSS, or Stata; but even a diligent novice can l...

Table of contents

  1. Cover
  2. Half-Title
  3. Series
  4. Title
  5. Copyright
  6. Contents
  7. Preface
  8. Acknowledgments
  9. Author Biography
  10. 1 Introduction
  11. 2 Review of Elementary Statistical Concepts
  12. 3 Simple Linear Regression Models
  13. 4 Multiple Linear Regression Models
  14. 5 The ANOVA Table and Goodness-of-Fit Statistics
  15. 6 Comparing Linear Regression Models
  16. 7 Indicator Variables in Linear Regression Models
  17. 8 Independence
  18. 9 Homoscedasticity
  19. 10 Collinearity and Multicollinearity
  20. 11 Normality, Linearity, and Interaction Effects
  21. 12 Model Specification
  22. 13 Measurement Errors
  23. 14 Influential Observations: Leverage Points and Outliers
  24. 15 Multilevel Linear Regression Models
  25. 16 A Brief Introduction to Logistic Regression
  26. 17 Conclusions
  27. Appendix A: Data Management
  28. Appendix B: Using Simulations to Examine Assumptions of Linear Regression Models
  29. Appendix C: Selected Formulas
  30. Appendix D: User-Written R Packages Employed in the Examples
  31. References
  32. Index
Citation styles for Linear Regression Models

APA 6 Citation

Hoffmann, J. (2021). Linear Regression Models (1st ed.). CRC Press. Retrieved from https://www.perlego.com/book/2555010/linear-regression-models-applications-in-r-pdf (Original work published 2021)

Chicago Citation

Hoffmann, John. (2021) 2021. Linear Regression Models. 1st ed. CRC Press. https://www.perlego.com/book/2555010/linear-regression-models-applications-in-r-pdf.

Harvard Citation

Hoffmann, J. (2021) Linear Regression Models. 1st edn. CRC Press. Available at: https://www.perlego.com/book/2555010/linear-regression-models-applications-in-r-pdf (Accessed: 15 October 2022).

MLA 7 Citation

Hoffmann, John. Linear Regression Models. 1st ed. CRC Press, 2021. Web. 15 Oct. 2022.