
Multiple Regression Analysis

Multiple regression analysis is a statistical method used to examine the relationship between a dependent variable and multiple independent variables. It allows for the identification of the strength and direction of the relationships between the variables, enabling predictions and insights into the impact of the independent variables on the dependent variable.

Written by Perlego with AI assistance

8 Key excerpts on "Multiple Regression Analysis"

  • Applied Regression and Modeling
    CHAPTER 5 Multiple Regression: Computer Analysis
    This chapter provides an in-depth analysis of the multiple regression model, one of the most widely used prediction techniques in data analysis and decision making. Multiple regression enables us to explore the relationship between a response variable and two or more independent variables, or predictors. The multiple regression model can be used to predict a response variable using two or more predictors. In this chapter we will:
    • Outline the difference between simple and multiple regression;
    • Explain the multiple regression model and how to establish the multiple regression equation;
    • Use the multiple regression model to make inferences;
    • Assess the quality of the multiple regression model by calculating different measures;
    • Interpret the computer results from computer packages, such as Excel and MINITAB;
    • Test the hypotheses to assess the overall significance of the multiple regression model (F-test);
    • Test the hypotheses to determine whether each of the independent variables is significant (t-tests);
    • Explain the multicollinearity problem in multiple regression and how to detect it;
    • Outline the underlying assumptions of multiple regression; and
    • Perform residual analysis to check whether the assumptions of multiple regression are met.
    Introduction to Multiple Regression
    In the previous chapter, we explored the relationship between two variables using simple regression and correlation analysis. We demonstrated how the estimated regression equation can be used to predict a dependent variable (y) using an independent variable (x). We also discussed correlation, which explains the degree of association between the two variables. In this chapter, we expand the concept of simple linear regression to include Multiple Regression Analysis. A multiple linear regression involves one dependent or response variable and two or more independent variables or predictors.
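    The chapter above interprets output from packages such as Excel and MINITAB; as a rough, language-neutral illustration, here is a minimal Python/NumPy sketch (all data values invented, the names x1, x2, y purely illustrative) of estimating a multiple regression equation with two predictors by ordinary least squares and using it to predict the response.
```python
import numpy as np

# Invented data: response y and two predictors x1, x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimates b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Estimated equation: y_hat = %.2f + %.2f*x1 + %.2f*x2" % tuple(b))

# Predict the response for a new observation (x1 = 3.5, x2 = 2.0)
print("Prediction:", np.array([1.0, 3.5, 2.0]) @ b)
```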
  • Applied Statistics Using Stata
    A Guide for the Social Sciences

    Multiple linear regression analysis is an extension of simple linear regression analysis. While simple regression is used to examine the relationship between a dependent and an independent variable, Multiple Regression Analysis is a technique used to examine the relationship between a continuous dependent variable and two or more continuous and/or categorical independent variables. One reason why we may want to include more than one independent variable in our conceptual model is that human behaviour is a complex phenomenon influenced by various factors that we need to consider in order to get a more complete picture of the phenomenon under study. The second, and more salient, reason is to be able to estimate the effect of a factor (e.g., gender) on a phenomenon (e.g., annual salary) by taking into account or controlling for other relevant factors (e.g., experience, educational level) that may also influence the phenomenon (Keith, 2006).
    The regression concepts that we treated in the simple regression framework in the previous chapter are (with only slight adjustments) directly transferable to a multiple regression situation. Here, too, we first develop a theory-driven conceptual model and state it mathematically as follows:
    Yi = β0 + β1 X1 i + β2 X2 i +…+ βk Xki .     (4.1)
    As in the simple regression situation, what we more realistically can claim is that1
    1 E[Yi ] should actually be read E[Yi |X1 i ,…,Xki ].
    E[Yi ] = β0 + β1 X1 i + β2 X2 i +…+ βk Xki .     (4.2)
    The term β0 (intercept/constant) is the mean-Y value when all the independent variables in the model are equal to zero (X1, X2, …, Xk = 0). β1 is the regression coefficient showing the amount of change in mean-Y for every unit increase in X1, while holding the value of all other independent variables in the model constant. This applies also to the interpretation of the coefficients (β2, …, βk
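    A small illustrative sketch of the coefficient interpretation just described, using Python with NumPy and entirely simulated data (the names X1, X2, Y are assumptions, not from the text): the difference between fitted mean-Y values at X1 = x + 1 and at X1 = x, with X2 held fixed, is exactly the estimated b1.
```python
import numpy as np

# Simulated data for Y and two regressors X1, X2
rng = np.random.default_rng(0)
X1 = rng.normal(size=50)
X2 = rng.normal(size=50)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.3, size=50)

# Fit Y = b0 + b1*X1 + b2*X2 by least squares
X = np.column_stack([np.ones(50), X1, X2])
b0, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]

# Fitted mean at (x1, x2) and at (x1 + 1, x2): the difference is b1
x1, x2 = 0.2, -1.0
diff = (b0 + b1 * (x1 + 1) + b2 * x2) - (b0 + b1 * x1 + b2 * x2)
print(b1, diff)  # equal up to floating point: one-unit increase in X1, X2 held constant
```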
  • The Reviewer's Guide to Quantitative Methods in the Social Sciences
    • Gregory R. Hancock, Laura M. Stapleton, Ralph O. Mueller (Authors)
    • 2018 (Publication Date)
    • Routledge
      (Publisher)
    23 Multiple Regression (Ken Kelley and Scott E. Maxwell)
    Multiple regression has been described as a general data analytic system (e.g., Cohen, 1968), primarily because many commonly used statistical models can be regarded as its special cases (e.g., the single-sample t-test, the two-independent-samples t-test, one-way analysis of variance), the independent variables can be categorical (e.g., groups) or quantitative (e.g., level of treatment), and the model can be used for observational or experimental studies. Furthermore, many advanced models have multiple regression as a special case (e.g., path analysis, structural equation modeling, multilevel models, analysis of covariance). The ubiquity of multiple regression makes this model one of the most important and widely used statistical methods in social science research. In general, the idea of the multiple regression model is to relate a set of regressor (independent or predictor) variables to a criterion (dependent or outcome) variable, for purposes of explanation and/or prediction, with an equation linear in its parameters. More formally, the population multiple regression model is given as
    Yi = β0 + β1 X1i +…+ βK XKi + εi ,     (1)
    where β0 is the population intercept, βk is the population regression coefficient for the kth regressor (k = 1, …, K), Xki is the kth regressor for the ith individual (i = 1, …, N), and εi is the error for the ith individual, generally assumed to be normally distributed with mean 0 and population variance σε². The intercept is the model-implied expected value of Y when each of the K X variables is at a value of zero. The intercept may have a meaningful substantive interpretation: when the regressor variables are centered around 0, the intercept represents the grand mean on the outcome, and when the regressor variables are dummy variables, the intercept represents the expected value of the outcome for the referent group; otherwise it serves as a scalar so that the sum of the squared errors can be minimized. For contemporary treatments of multiple regression applied to a wide variety of examples, we recommend Cohen, Cohen, West, and Aiken (2003), Pedhazur (1997), Harrell (2001), Fox (2008), Rencher and Schaalje (2008), Gelman and Hill (2007), and Muller and Fetterman (2002). Specific desiderata for applied studies that utilize multiple regression are presented in Table 23.1
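    A brief sketch of the intercept interpretations mentioned above, using Python with NumPy and simulated data (all values and variable names are invented): centering the regressors makes the estimated intercept equal the grand mean of the outcome, and with a single dummy regressor the intercept equals the mean outcome of the referent group.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
group = rng.integers(0, 2, size=n)          # dummy regressor: 0 = referent group, 1 = other group
x = rng.normal(10.0, 2.0, size=n)           # quantitative regressor
y = 5.0 + 3.0 * group + 0.8 * x + rng.normal(size=n)

def intercept(design, y):
    # Return the estimated intercept from an ordinary least squares fit
    return np.linalg.lstsq(design, y, rcond=None)[0][0]

# Raw regressors: intercept = expected outcome when group = 0 and x = 0 (an extrapolation here)
b0_raw = intercept(np.column_stack([np.ones(n), group, x]), y)

# Centered regressors: intercept equals the grand mean of y (up to rounding)
b0_centered = intercept(np.column_stack([np.ones(n), group - group.mean(), x - x.mean()]), y)
print(b0_centered, y.mean())

# Dummy regressor only: intercept equals the mean outcome in the referent group (group == 0)
b0_dummy = intercept(np.column_stack([np.ones(n), group]), y)
print(b0_dummy, y[group == 0].mean())
```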
  • Applied Statistics Using R
    A Guide for the Social Sciences

    In this chapter, we provide the main reasons for the popularity of Multiple Regression Analysis in the social sciences, and explain how to build, estimate, and evaluate a multiple regression model. In doing so, we emphasize the novel concepts emerging in multiple regression, such as adjusted R², partial slope coefficients, and the relative importance of regression coefficients. After the conceptual treatment of these issues, we illustrate how to use R to estimate a multiple regression model as well as how to interpret the resulting R output.
    8.1 Multiple Regression Analysis
    Multiple linear regression analysis is an extension of simple linear regression analysis. While simple regression is used to examine the relationship between a dependent and an independent variable, Multiple Regression Analysis is a technique used to examine the relationship between a continuous dependent variable and two or more continuous and/or categorical independent variables. One reason why we may want to include more than one independent variable in our conceptual model is that human behaviour is a complex phenomenon influenced by various factors that we need to consider in order to get a more complete picture of the phenomenon under study. The second, and more salient, reason is to be able to estimate the effect of a factor (e.g. gender) on a phenomenon (e.g. annual salary) by taking into account, or controlling for, other relevant factors (e.g. experience, educational level) that may also influence the phenomenon (Keith, 2006). The regression concepts that we treated in the simple regression framework in the previous chapter are (with only some slight adjustments) directly transferable to a multiple regression situation
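    The excerpt works in R; as a rough cross-language sketch in Python with NumPy (simulated data, variable names assumed), the adjusted R² it mentions can be computed from the ordinary R² as 1 − (1 − R²)(n − 1)/(n − k − 1), which penalizes R² for the number of predictors k.
```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k))                      # three hypothetical predictors
y = 2.0 + X @ np.array([1.5, 0.0, -0.7]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(design, y, rcond=None)[0]
resid = y - design @ b

ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
print(r2, adj_r2)      # adjusted R^2 is the smaller of the two, reflecting the penalty for k predictors
```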
  • Statistics for Dental Clinicians
    • Michael Glick, Alonso Carrasco-Labra, Olivia Urquhart (Authors)
    • 2023 (Publication Date)
    • Wiley-Blackwell
      (Publisher)
    11 Understanding and interpreting a regression analysis
    If two variables change in tandem in the same direction (directly or positively) or in different directions (inversely, indirectly, or negatively), they can be considered correlated, and the magnitude of this correlation can be quantified (e.g., with the Pearson product-moment correlation coefficient or Spearman's rank correlation coefficient) (Chapter 10). Regression is an extension of this concept in which an equation or model is derived to describe how the expected value of some variable, a dependent variable, is related to the values of one or more other variables, the independent variables. A dependent variable depends on the value of some other variable(s) and is also referred to as an outcome, response, or predicted variable. Independent variables can help explain the variability in the values of the dependent variable and are sometimes described as predictors, exposures, covariates, or explanatory variables. The terminology used to describe these variables will primarily depend on the purpose of the research, but for simplicity the terms dependent or outcome variables and independent variables are used herein.
    Regression analysis is a useful tool for researchers who want to go further than just summarizing the frequency of variables in a sample (i.e., descriptive statistics); regression can also estimate the magnitude of relationships between interventions or exposures and outcomes, and can be used to make outcome predictions.

    Estimation

    Clinicians want to know which interventions are the most effective for treating a disease or which behavior changes can prevent future disease. A number of study designs (Chapters 15, 16, 17, and 18
  • Applied Multivariate Statistics for the Social Sciences
    Analyses with SAS and IBM's SPSS, Sixth Edition

    • Keenan A. Pituch, James P. Stevens (Authors)
    • 2015 (Publication Date)
    • Routledge
      (Publisher)
    Since, as we have indicated earlier, least squares regression can be quite sensitive to outliers, some researchers prefer regression techniques that are relatively insensitive to outliers, that is, robust regression techniques. Since the early 1970s, the literature on these techniques has grown considerably (Hogg, 1979; Huber, 1977; Mosteller & Tukey, 1977). Although these techniques have merit, we believe that use of least squares, along with the appropriate identification of outliers and influential points, is a quite adequate procedure.

    3.18 Multivariate Regression

    In multivariate regression we are interested in predicting several dependent variables from a set of predictors. The dependent variables might be differentiated aspects of some variable. For example, Finn (1974) broke grade point average (GPA) up into GPA required and GPA elective, and considered predicting these two dependent variables from high school GPA, a general knowledge test score, and attitude toward education. Or, one might measure "success as a professor" by considering various aspects of success such as: rank (assistant, associate, full), rating of the institution worked at, salary, rating by experts in the field, and number of articles published. These would constitute the multiple dependent variables.

    3.18.1 Mathematical Model

    In multiple regression (one dependent variable), the model was
    y = Xβ + e,
    where y was the vector of scores for the subjects on the dependent variable, X was the matrix with the scores for the subjects on the predictors, e was the vector of errors, and β was the vector of regression coefficients.
    In multivariate regression the y, β, and e vectors become matrices, which we denote by Y, B, and E:
    Y = XB + E
    The first column of Y gives the scores for the subjects on the first dependent variable, the second column the scores on the second dependent variable, and so on. The first column of B
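    A minimal sketch of the multivariate regression setup Y = XB + E, using Python with NumPy and simulated data loosely patterned on the GPA example above (all numbers invented): each column of Y is one dependent variable, and a single least squares solve returns the coefficient matrix B with one column per outcome.
```python
import numpy as np

rng = np.random.default_rng(3)
n = 120

# Predictors: intercept column plus three made-up predictor variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])

# Two dependent variables (e.g. GPA required, GPA elective), as the columns of Y
B_true = np.array([[2.5, 2.8],
                   [0.6, 0.2],
                   [0.3, 0.1],
                   [0.1, 0.4]])
Y = X @ B_true + rng.normal(scale=0.2, size=(n, 2))

# Least squares handles all columns of Y at once: B_hat has one column per dependent variable
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(B_hat.shape)   # (4, 2): intercept plus three predictors, for two outcomes
```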
  • Quantitative Analysis in Archaeology
    • Todd L. VanPool, Robert D. Leonard (Authors)
    • 2011 (Publication Date)
    • Wiley-Blackwell
      (Publisher)
    11 Linear Regression and Multivariate Analysis
    As useful as measures of central tendency and dispersion are, they cannot characterize all of the relationships that interest archaeologists. We may ask questions about the structure of a single variable of an assemblage (e.g., rim angles of different pottery types), but archaeological analysis often focuses on the relationships among two or more variables. Does rim angle change as maximum vessel height changes? Are longer projectile points also wider, or does the hafting element constrain a point’s maximum width? Do settlements in an area get bigger through time? Do they get bigger moving down slope towards river flood plains? All of these questions might be interesting to an archaeologist, but they require the analyst to consider the relationships evident among two or more variables. As helpful as the mean, standard deviation, and other measurements we have discussed thus far are, they are not adequate for such tasks. The analyst needs additional statistical tools that can be called, as a group, multivariate analyses.
    Perhaps the simplest and most straightforward multivariate method is linear regression, a technique that is one of the most widely used in archaeological (and other) analyses. Most people are familiar with it, at least in an abstract sense. It is useful in so many different contexts that we find it impossible to read a newspaper or a financial report, or to listen to the daily news, without encountering it (or at least the newsworthy results of its application). As common as it is, however, it does have certain limitations and assumptions, which are unfortunately often ignored, leading to hidden, but severe, analytic difficulties. Still, it is a powerful and flexible tool that is indispensable for archaeological analysis.
    Simply put, linear regression allows us to examine the relationship between two continuously measured variables where we believe one variable influences the values of another. For example, we might expect the absolute number of hearths to increase in large habitation sites relative to small sites simply because more hearths are needed for cooking and heating in larger settlements. If this is so, then settlement size and hearth frequency share a functional relationship, in the sense that one of the variables (number of hearths) is dependent
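    A tiny sketch of the kind of two-variable regression described here, with invented numbers (the settlement sizes and hearth counts are made up purely for illustration), fitting hearth frequency as a linear function of settlement size.
```python
import numpy as np

# Invented data: settlement size (hundreds of square metres) and number of hearths
size = np.array([2.0, 3.5, 5.0, 7.5, 10.0, 14.0, 18.0])
hearths = np.array([1, 2, 3, 4, 6, 8, 10], dtype=float)

# Fit hearths = b0 + b1 * size by least squares
X = np.column_stack([np.ones_like(size), size])
b0, b1 = np.linalg.lstsq(X, hearths, rcond=None)[0]
print(f"hearths ~ {b0:.2f} + {b1:.2f} * size")   # b1: extra hearths per unit of settlement size
```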
  • Statistics at Square Two
    • Michael J. Campbell, Richard M. Jacques (Authors)
    • 2023 (Publication Date)
    • Wiley-Blackwell
      (Publisher)
    error term.
    The models are fitted by choosing estimates b0, b1, …, bp which minimise the sum of squares (SS) of the predicted error. These estimates are termed ordinary least squares estimates. Using these estimates we can calculate the fitted values yi^fit and the observed residuals ei = yi − yi^fit as discussed in Chapter 1. Here it is clear that the residuals estimate the error term. Further details are given in, for example, Draper and Smith.2
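    A short sketch, in Python with NumPy on simulated data (names and values assumed, not from the text), of the quantities just defined: the ordinary least squares estimates, the fitted values yi^fit, the observed residuals ei = yi − yi^fit, and the residual sum of squares that the estimates minimise.
```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 1.2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares estimates b0, b1, b2

y_fit = X @ b                  # fitted values
e = y - y_fit                  # observed residuals, estimating the error term
print("residual sum of squares:", np.sum(e ** 2))
print("residuals sum to ~0:", e.sum())     # a property of least squares with an intercept
```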

    2.2 Uses of Multiple Regression

    Multiple regression is one of the most useful tools in a statistician's armoury. Its main uses are as follows:
    1. To estimate the relationship between an input (independent) variable and a continuous output (dependent) variable adjusting for the effects of potential confounding variables. For example, to investigate the effect of diet on weight allowing for smoking habits. Here the dependent variable is the outcome from a clinical trial. The independent variables could be the two treatment groups (as a 0/1 binary variable), smoking (as a continuous variable in numbers of packs per week) and baseline weight. The multiple regression model allows one to compare the outcome between groups, having adjusted for differences in baseline weight and smoking habit. This is also known as analysis of covariance. (A sketch of this kind of adjustment appears after this list.)
    2. To analyse the simultaneous effects of a number of categorical variables on a continuous output variable. An alternative technique is the analysis of variance but the same results can be achieved using multiple regression.
    3. To predict a value of a continuous outcome for given inputs. For example, an investigator might wish to predict the forced expiratory volume (FEV1) of a subject given age and height, so as to be able to calculate the observed FEV1 as a percentage of predicted, and to decide if the observed FEV1 is below, say, 80% of the predicted one.
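    A minimal sketch of the first use listed above (analysis of covariance via multiple regression), in Python with NumPy on invented data: a 0/1 treatment indicator, smoking in packs per week, and baseline weight enter the same regression, so the treatment coefficient is the group difference in outcome adjusted for the other two variables. All names and values are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
treatment = rng.integers(0, 2, size=n).astype(float)    # 0/1 binary group indicator
smoking = rng.poisson(3.0, size=n).astype(float)         # packs per week (invented)
baseline = rng.normal(80.0, 10.0, size=n)                # baseline weight in kg (invented)

# Outcome weight: a true treatment effect of -2 kg plus dependence on the covariates
outcome = baseline - 2.0 * treatment + 0.4 * smoking + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), treatment, smoking, baseline])
b = np.linalg.lstsq(X, outcome, rcond=None)[0]
print("adjusted treatment effect:", b[1])   # group difference after adjusting for smoking and baseline weight
```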

    2.3 Two Independent Variables

    We will start off by considering two independent variables, which can be either continuous or binary. There are three possibilities: both variables continuous, both binary (0/1), or one continuous and one binary. We will anchor the examples in some real data.
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.