Technology & Engineering

Multiple Regression

Multiple regression is a statistical technique used to analyze the relationship between a dependent variable and two or more independent variables. It extends simple linear regression by allowing for the consideration of multiple predictors simultaneously. This method is commonly employed in technology and engineering to model complex relationships and make predictions based on multiple input factors.

Written by Perlego with AI assistance

10 Key excerpts on "Multiple Regression"

  • Applied Regression and Modeling
    CHAPTER 5 Multiple Regression: Computer Analysis
    This chapter provides an in-depth analysis of the Multiple Regression model, one of the most widely used prediction techniques in data analysis and decision making. Multiple Regression enables us to explore the relationship between a response variable and two or more independent variables, or predictors, and to predict the response variable from those predictors. In this chapter we will:
    • Outline the difference between simple and Multiple Regression;
    • Explain the Multiple Regression model and how to establish the Multiple Regression equation;
    • Use the Multiple Regression model to make inferences;
    • Assess the quality of the Multiple Regression model by calculating different measures;
    • Interpret the results from computer packages such as Excel and MINITAB;
    • Test hypotheses to assess the overall significance of the Multiple Regression model (F-test);
    • Test hypotheses to determine whether each of the independent variables is significant (t-tests);
    • Explain the multicollinearity problem in Multiple Regression and how to detect it;
    • Outline the underlying assumptions of Multiple Regression; and
    • Perform residual analysis to check whether the assumptions of Multiple Regression are met.
    Introduction to Multiple Regression
    In the previous chapter, we explored the relationship between two variables using simple regression and correlation analysis. We demonstrated how the estimated regression equation can be used to predict a dependent variable (y) using an independent variable (x). We also discussed correlation, which measures the degree of association between two variables. In this chapter, we expand the concept of simple linear regression to Multiple Regression analysis. A multiple linear regression involves one dependent or response variable, and two or more independent variables or predictors
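A quick, hedged sketch of what the chapter describes: fitting a Multiple Regression with two predictors and reading off the measures the bullet list mentions (coefficients, t-tests, the overall F-test, R²). The data and variable names are invented for illustration, and the sketch assumes Python with `numpy` and `statsmodels` rather than the chapter's Excel/MINITAB workflow.

```python
import numpy as np
import statsmodels.api as sm

# Invented example data: a response y driven by two predictors x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=50)

X = sm.add_constant(np.column_stack([x1, x2]))  # prepend the intercept column
model = sm.OLS(y, X).fit()                      # ordinary least squares fit

# The summary reports the estimated equation, per-coefficient t-tests,
# the overall F-test, and R-squared -- the items the chapter walks through.
print(model.summary())
```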
  • The Reviewer's Guide to Quantitative Methods in the Social Sciences
    • Gregory R. Hancock, Laura M. Stapleton, Ralph O. Mueller (Authors)
    • 2018(Publication Date)
    • Routledge
      (Publisher)
    23 Multiple Regression
    Ken Kelley and Scott E. Maxwell
    Multiple Regression has been described as a general data analytic system (e.g., Cohen, 1968), primarily because many commonly used statistical models can be regarded as its special cases (e.g., single-sample t-test, two-independent-samples t-test, one-way analysis of variance), the independent variables can be categorical (e.g., groups) or quantitative (e.g., level of treatment), and the model can be used for observational or experimental studies. Furthermore, many advanced models have Multiple Regression as a special case (e.g., path analysis, structural equation modeling, multilevel models, analysis of covariance). The ubiquity of Multiple Regression makes this model one of the most important and widely used statistical methods in social science research. In general, the idea of the Multiple Regression model is to relate a set of regressor (independent or predictor) variables to a criterion (dependent or outcome) variable, for purposes of explanation and/or prediction, with an equation linear in its parameters. More formally, the population Multiple Regression model is given as
    Y_i = β_0 + β_1 X_1i + … + β_K X_Ki + ε_i,   (1)
    where β_0 is the population intercept, β_k is the population regression coefficient for the kth regressor (k = 1, . . ., K), X_ki is the kth regressor for the ith individual (i = 1, . . ., N), and ε_i is the error for the ith individual, generally assumed to be normally distributed with mean 0 and population variance σ_ε². The intercept is the model-implied expected value of Y when each of the K X variables is at a value of zero. The intercept may have a meaningful substantive interpretation: for example, when the regressor variables are centered around 0, the intercept represents the grand mean of the outcome, and when the regressor variables are dummy variables, the intercept represents the expected value of the outcome for the referent group. Otherwise it serves as a scalar so that the sum of the squared errors can be minimized. For contemporary treatments of Multiple Regression applied to a wide variety of examples, we recommend Cohen, Cohen, West, and Aiken (2003), Pedhazur (1997), Harrell (2001), Fox (2008), Rencher and Schaalje (2008), Gelman and Hill (2007), and Muller and Fetterman (2002). Specific desiderata for applied studies that utilize Multiple Regression are presented in Table 23.1.
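As a hedged illustration of Equation (1), the sketch below simulates data from the population model and recovers the coefficients by minimizing the sum of squared errors; it also checks the excerpt's remark that centering the regressors makes the intercept the grand mean of the outcome. All names and numbers are our own placeholders, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 2
X = rng.normal(size=(N, K))
beta = np.array([3.0, 1.5, -2.0])            # [beta_0, beta_1, beta_2]
eps = rng.normal(scale=1.0, size=N)          # errors: mean 0, variance sigma_eps^2
y = beta[0] + X @ beta[1:] + eps             # Equation (1)

D = np.column_stack([np.ones(N), X])         # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)  # minimizes sum of squared errors
print(beta_hat)                              # close to [3.0, 1.5, -2.0]

# Centering the regressors around 0 makes the intercept the grand mean of y.
Dc = np.column_stack([np.ones(N), X - X.mean(axis=0)])
beta_c, *_ = np.linalg.lstsq(Dc, y, rcond=None)
print(beta_c[0], y.mean())                   # these two agree
```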
  • Business Forecasting
    A Practical Approach

    • A. Reza Hoshmand (Author)
    • 2009(Publication Date)
    • Routledge
      (Publisher)
    8 Forecasting with Multiple Regression
    In the previous chapter, it was shown how regression and correlation analysis are used for purposes of prediction and planning. The techniques and concepts presented were used as a tool in analyzing the relationship that may exist between two variables. A single independent variable was used to estimate the value of the dependent variable. In this chapter, you will learn about the concepts of regression and correlation where two or more independent variables are used to estimate the dependent variable and make a forecast. There is only one dependent variable (as in the simple linear regression), but several independent variables. This improves our ability not only to estimate the dependent variable, but also to explain more fully its variations.
    Since Multiple Regression and correlation are simply an extension of simple regression and correlation, we will first show how to derive the Multiple Regression equation using two or more independent variables. Second, attention will be given to calculating the standard error of estimate and related measures. Third, the computation of the multiple coefficient of determination and correlation will be explained. Finally, we will discuss the use of Multiple Regression in business and economic forecasting.
    The advantage of Multiple Regression over simple regression analysis is in enhancing our ability to use more available information in estimating the dependent variable. To describe the relationship between a single variable Y and several variables X , we may write the Multiple Regression equation as:
    Y = a + b_1 X_1 + b_2 X_2 + … + b_k X_k + ε   [8-1]
    where
    Y = the dependent variable
    X_1 … X_k = the independent variables
    ε = the error term, which is a random variable with a mean of zero and a standard deviation of σ_ε.
    The numerical constants, a, b_1 to b_k
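The excerpt breaks off here, but the quantities it promises to compute, the standard error of the estimate and the multiple coefficient of determination, follow directly from the residuals of equation [8-1]. Below is a minimal sketch under invented data, assuming NumPy; it is not the book's worked example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3                                    # n observations, k predictors
X = rng.normal(size=(n, k))
y = 5.0 + X @ np.array([1.0, -0.7, 0.3]) + rng.normal(scale=2.0, size=n)

D = np.column_stack([np.ones(n), X])             # [1, X_1, ..., X_k]
coef, *_ = np.linalg.lstsq(D, y, rcond=None)     # a, b_1, ..., b_k
resid = y - D @ coef

sse = np.sum(resid**2)                           # sum of squared errors
sst = np.sum((y - y.mean())**2)                  # total sum of squares
r_squared = 1.0 - sse / sst                      # multiple coefficient of determination
std_error = np.sqrt(sse / (n - k - 1))           # standard error of the estimate
print(r_squared, std_error)
```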
  • Applied Multivariate Statistics for the Social Sciences
    Analyses with SAS and IBM's SPSS, Sixth Edition

    • Keenan A. Pituch, James P. Stevens (Authors)
    • 2015(Publication Date)
    • Routledge
      (Publisher)
    Since, as we have indicated earlier, least squares regression can be quite sensitive to outliers, some researchers prefer regression techniques that are relatively insensitive to outliers, that is, robust regression techniques. Since the early 1970s, the literature on these techniques has grown considerably (Hogg, 1979; Huber, 1977; Mosteller & Tukey, 1977). Although these techniques have merit, we believe that use of least squares, along with the appropriate identification of outliers and influential points, is a quite adequate procedure.

    3.18 Multivariate Regression

    In multivariate regression we are interested in predicting several dependent variables from a set of predictors. The dependent variables might be differentiated aspects of some variable. For example, Finn (1974) broke grade point average (GPA) up into GPA required and GPA elective, and considered predicting these two dependent variables from high school GPA, a general knowledge test score, and attitude toward education. Or, one might measure “success as a professor” by considering various aspects of success such as: rank (assistant, associate, full), rating of institution working at, salary, rating by experts in the field, and number of articles published. These would constitute the multiple dependent variables.

    3.18.1 Mathematical Model

    In Multiple Regression (one dependent variable), the model was
    y = Xβ + e,
    where y was the vector of scores for the subjects on the dependent variable, X was the matrix with the scores for the subjects on the predictors, e was the vector of errors, and β was the vector of regression coefficients.
    In multivariate regression the y, β, and e vectors become matrices, which we denote by Y, B, and E:
    Y = XB + E
    The first column of Y gives the scores for the subjects on the first dependent variable, the second column the scores on the second dependent variable, and so on. The first column of B
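A hedged sketch of the multivariate model Y = XB + E: in NumPy, the same least-squares call that fits one outcome fits several at once, because `lstsq` accepts a matrix of right-hand sides and returns one coefficient column per dependent variable. The dimensions and data below are simulated placeholders, not Finn's GPA example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 120, 3, 2                        # n subjects, p predictors, m outcomes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
B_true = rng.normal(size=(p + 1, m))       # one coefficient column per outcome
Y = X @ B_true + rng.normal(scale=0.5, size=(n, m))   # Y = XB + E

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
# Column j of B_hat equals the ordinary Multiple Regression fit of
# Y[:, j] on X alone: multivariate regression stacks per-outcome fits.
print(B_hat.shape)                         # (p + 1, m)
```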
  • Introductory Statistics and Analytics
    • Peter C. Bruce (Author)
    • 2015(Publication Date)
    • Wiley
      (Publisher)
    Regression is a ubiquitous procedure—it is used in a wide variety of fields. Most uses fall into one of two categories:
    1. Explaining relationships. Researchers want to understand whether x is related to y. For example, is race a factor in criminal sentencing? How do sex and other factors affect earnings?
    2. Predictive modeling. Data scientists and business analysts, for example, want to predict how much a customer will spend.

    13.1 Regression as Explanation

    For answers, we will turn to regression, a technique that was introduced earlier for one variable and which we will now extend to multiple input variables. The term “input variable” is used here; elsewhere, the term “independent variable” is used. Essentially, we are attempting to predict or explain the behavior of a variable of interest—the outcome—in terms of the levels of other variables—inputs. In different fields, such as biostatistics, data mining, and machine learning, these variables have different names (Figure 13.1 ).
    Figure 13.1: Different terms for variables in regression.
    In these problems, we have measurement data on an outcome variable—a single dependent variable of interest. We wish to model how that variable depends on other variables—input or independent variables. We also wish to measure how reliable our model is—how much it might differ if we were to select a different dataset. We may also wish to test whether an apparent relationship between input variables and an outcome variable could be due to chance. We will first review simple linear regression and address such problems via a resampling technique. Then, we will discuss the process of going from the single independent variable you already know about, to multiple independent variables.
    The diagram shown earlier talks of "causation" and the terminology refers to one outcome variable "depending" on other variables. The directional nature of this relationship is a product of our belief, presumably on the basis of theory or knowledge, but regression does not prove it. The mathematics of regression merely describes a relationship; it does not prove a direction of causation. So the logical train of thought is the following: (i) We have a theory that y depends on a set of x-variables. (ii) Regression analysis may confirm that there is a relationship, and it may also describe the strength of that relationship. (iii) If so, we take this as evidence that our theory is correct. However, you can see that there is no guarantee that the theory that y depends on x is correct. The direction of the relationship could be the reverse. Or both x and y
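The passage above mentions testing, via resampling, whether an apparent relationship could be due to chance. One such approach is a permutation test: shuffle the input so any real link with the outcome is broken, and see how often chance alone produces a slope as large as the observed one. The function below is our own minimal sketch assuming NumPy, not the book's procedure.

```python
import numpy as np

def _slope(x, y):
    # OLS slope of y on x (with an intercept), via least squares.
    A = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1]

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Approximate two-sided p-value for the x-y slope by shuffling x."""
    rng = np.random.default_rng(seed)
    observed = _slope(x, y)
    perm = np.array([_slope(rng.permutation(x), y) for _ in range(n_perm)])
    # Fraction of chance-only slopes at least as extreme as the observed one.
    return float(np.mean(np.abs(perm) >= np.abs(observed)))
```

A small p-value says the observed relationship would rarely arise by chance; as the passage stresses, it still says nothing about the direction of causation.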
  • Introductory Regression Analysis
    with Computer Application for Business and Economics

    • Allen Webster (Author)
    • 2013(Publication Date)
    • Routledge
      (Publisher)
    Chapter 4
    Multiple Regression: Using Two or More Predictor Variables
    Introduction
    Additional Assumptions
    4.1 The Multiple Regression Model
      The Adjusted Coefficient of Determination
      Analyzing the Model
      A Change in the Coefficient for GDP
    4.2 The Issue of Multicollinearity
      The Problems of Multicollinearity
      Detecting Multicollinearity
      Treating the Problem of Multicollinearity
    4.3 Analysis of Variance: Using the F-Test for Significance
    4.4 Dummy Variables
      Allowing for More Responses in a Qualitative Variable
      Using Dummy Variables to Deseasonalize Time Series Data
      Interpreting a Computer's Printout
    4.5 Interaction Between Independent Variables
    4.6 Incorporating Slope Dummies
    4.7 Control Variables
    4.8 A Partial F-Test
    4.9 Computer Applications
      Excel
      Minitab
      SPSS
    4.10 Review Problem
    Chapter Problems
      Conceptual Problems
      Computational Problems
      Computer Problem
    INTRODUCTION
    This chapter offers an extension of the simple regression model we examined in Chapters 2 and 3 by introducing additional explanatory variables. The use of two or more right-hand-side (RHS) variables results in a Multiple Regression model . Additional explanatory variables provide many advantages not offered by our simple, bivariate model consisting of only one RHS variable. If one independent variable proves useful in explaining changes in the dependent variable, integrating additional predictor variables could enhance the explanatory power of the model even further.
    A Multiple Regression model allows us to control for other variables that affect the dependent variable. Rarely in business and economic matters do we find that a dependent variable is influenced by only one factor. Economic models are generally the consequence of several variables that interact to produce a combined effect. Multiple Regression allows us to assimilate the aggregated results of these influential elements.
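The outline above lists detecting multicollinearity among its topics. A common diagnostic is the variance inflation factor (VIF), computed by regressing each predictor on the others; the from-scratch sketch below is a hedged illustration assuming NumPy, not the chapter's own procedure.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all remaining predictors. As a common rule of thumb, values well
    above 10 signal troublesome multicollinearity.
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - np.sum(resid**2) / np.sum((X[:, j] - X[:, j].mean())**2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```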
  • Statistics at Square Two
    • Michael J. Campbell, Richard M. Jacques (Authors)
    • 2023(Publication Date)
    • Wiley-Blackwell
      (Publisher)
    error term.
    The models are fitted by choosing estimates b_0, b_1, …, b_p which minimise the sum of squares (SS) of the predicted error. These estimates are termed ordinary least squares estimates. Using these estimates we can calculate the fitted values y_i^fit and the observed residuals e_i = y_i − y_i^fit, as discussed in Chapter 1. Here it is clear that the residuals estimate the error term. Further details are given in, for example, Draper and Smith.²

    2.2 Uses of Multiple Regression

    Multiple Regression is one of the most useful tools in a statistician’s armoury.
    1. To estimate the relationship between an input (independent) variable and a continuous output (dependent) variable adjusting for the effects of potential confounding variables. For example, to investigate the effect of diet on weight allowing for smoking habits. Here the dependent variable is the outcome from a clinical trial. The independent variables could be the two treatment groups (as a 0/1 binary variable), smoking (as a continuous variable in numbers of packs per week) and baseline weight. The Multiple Regression model allows one to compare the outcome between groups, having adjusted for differences in baseline weight and smoking habit. This is also known as analysis of covariance.
    2. To analyse the simultaneous effects of a number of categorical variables on a continuous output variable. An alternative technique is the analysis of variance but the same results can be achieved using Multiple Regression.
    3. To predict a value of a continuous outcome for given inputs. For example, an investigator might wish to predict the forced expiratory volume (FEV₁) of a subject given age and height, so as to be able to calculate the observed FEV₁ as a percentage of predicted, and to decide if the observed FEV₁ is below, say, 80% of the predicted one (a hedged sketch of this calculation follows below).
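Use 3 might look like the following sketch. The coefficients are invented stand-ins, not a published FEV₁ reference equation, and plain Python is assumed in place of the book's notation.

```python
# Hypothetical coefficients from a regression of FEV1 (litres) on
# age (years) and height (cm) -- invented numbers, not a clinical equation.
INTERCEPT, B_AGE, B_HEIGHT = -2.0, -0.03, 0.04

def percent_predicted(fev1_observed, age, height):
    fev1_predicted = INTERCEPT + B_AGE * age + B_HEIGHT * height
    return 100.0 * fev1_observed / fev1_predicted

# A subject aged 45, 175 cm tall, with observed FEV1 of 2.8 L:
# predicted = -2.0 - 1.35 + 7.0 = 3.65 L, so observed is about 77% of
# predicted, below the 80% threshold the excerpt mentions.
print(f"{percent_predicted(2.8, age=45, height=175):.0f}% of predicted")
```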

    2.3 Two Independent Variables

    We will start off by considering two independent variables, which can be either continuous or binary. There are three possibilities: both variables continuous, both binary (0/1), or one continuous and one binary. We will anchor the examples in some real data.
  • Statistical Methods for Communication Science
    • Andrew F. Hayes (Author)
    • 2020(Publication Date)
    • Routledge
      (Publisher)
    Y unaccountable for by the predictor variables in the model.
    It is convenient to drop the residual term and rewrite equation 13.1 in terms of the estimated or expected Y value:
    E(Y) = Ŷ_i = a + b_1 X_1i + b_2 X_2i + b_3 X_3i + … + b_k X_ki   (13.2)
    where Ŷ_i is case i's estimated Y. Just as in simple regression, the Multiple Regression model assumes that the dependent measure is quantitative in nature and measured at the interval or ratio level, and that the predictor variables are either quantitative and measured at the pseudo-interval level or higher, or dichotomous. But communication researchers often conduct a Multiple Regression analysis of an outcome variable that is quantitative but only ordinal or pseudo-interval. In many circumstances, probably not much harm is done in analyzing ordinal outcome data using OLS regression. There are methods that are more appropriate for the regression analysis of ordinal outcome variables that you should eventually familiarize yourself with (see section 13.6.7).
    Figure 13.1: A three-dimensional regression plane.
    In simple regression, the regression equation is a description of the relationship between X and Y that can be characterized visually and described mathematically as a line. In Multiple Regression, the result is still a regression equation, but rather than being a line, it is best conceptualized as a regression surface. For example, consider a Multiple Regression model of the form Ŷ = 2 + 0.5(X_1) − 0.2(X_2). This regression equation will produce a Ŷ value as a function of both X_1 and X_2. Figure 13.1 represents this regression equation visually. As can be seen, when there are two predictor variables the regression equation yields a regression plane, with the Ŷ from the regression model residing somewhere on this plane in three-dimensional space. It becomes difficult to visually represent or even cognitively conceptualize what the regression surface would look like with more than two predictor variables. With k predictor variables, the regression surface requires k + 1 dimensions in space to represent it visually, something very difficult to imagine much less illustrate when k
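To make the regression plane concrete, the sketch below simply evaluates the excerpt's equation Ŷ = 2 + 0.5(X_1) − 0.2(X_2) over a grid of predictor values; it is plain arithmetic on the quoted equation, assuming NumPy.

```python
import numpy as np

x1, x2 = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 5))
y_hat = 2 + 0.5 * x1 - 0.2 * x2   # the plane from the excerpt

# Every (x1, x2) pair maps to one height on the plane;
# e.g. at x1 = 4, x2 = 5: 2 + 2.0 - 1.0 = 3.0.
print(y_hat.shape)                # a 5-by-5 grid of fitted values
```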
  • A Primer of Multivariate Statistics
    2 Multiple Regression: Predicting One Variable From Many
    As pointed out in Chapter 1, Multiple Regression is a technique used to predict scores on a single outcome variable Y on the basis of scores on several predictor variables, the X_i's. To keep things as concrete as possible for a while (there will be plenty of time for abstract math later), let us consider the hypothetical data in Data Set 1.

    Data Set 1

    Say that X_1, X_2, and X_3 represent IQ as measured by the Stanford-Binet, age in years, and number of speeding citations issued the subject in the past year, respectively. From these pieces of information, we wish to predict Y—proficiency rating as an assembly-line worker in an auto factory. In fact, these are purely hypothetical data. Hopefully you will be able to think of numerous other examples of situations in which we might be interested in predicting scores on one variable from scores on 3 (or 4 or 2 or 25) other variables. The basic principles used in making such predictions are not at all dependent on the names given the variables, so we might as well call them X_1, X_2, . . ., X_n, and Y. The Xs may be called independent variables, predictor variables, or just predictors, and Y may be referred to as the dependent variable, the predicted variable, the outcome measure, or the criterion. Because in many applications of Multiple Regression all measures (Xs and Y) are obtained at the same time, thus blurring the usual independent-dependent distinction, predictor-predicted would seem to be the most generally appropriate terminology. In Data Set 1 we have three predictor variables and one predicted variable.
    At this point you might question why anyone would be interested in predicting Y at all. Surely it is easier (and more accurate) to look up a person's score on Y than it is to look up his or her score on three other measures (X_1, X_2, and X_3) and then plug these numbers into some sort of formula to generate a predicted score on Y. There are at least three answers to this question. First, we may be more interested in the prediction formula itself than in the predictions it generates. The sine qua non of scientific research has always been the successive refinement of mathematical formulae relating one variable to one or more other variables, for example, P = VT/C, E = mc², Stevens's power law versus Thurstonian scaling procedures, and so on. Second, and probably the most common reason for performing Multiple Regression, is that we may wish to develop an equation that can be used to predict values on Y for subjects for whom we do not already have this information. Thus, for instance, we might wish to use the IQ, age, and number of speeding citations of a prospective employee to predict his probable performance on the job as an aid in deciding whether to hire him. It seems reasonable to select as our prediction equation for this purpose the formula that does the best job of predicting the performance of our present and past employees from these same measures. The classic way of approaching this problem is to seek from our available data the best possible estimates of the parameters (free constants not specified on a priori grounds) of the population prediction equation. Not unexpectedly, this "best" approximation to the population prediction equation is precisely the same as the equation that does the best job of predicting Y scores in a random sample from the population to which we wish to generalize. Finally, we may wish to obtain a measure of the overall degree of relationship between Y, on the one hand, and the Xs, on the other. An obviously relevant piece of information on which to base such a measure is just how good (or poor) a job we can do of predicting Y from the Xs. Indeed, one of the outputs from a Multiple Regression analysis (MRA) is a measure called the coefficient of multiple correlation, which is simply the correlation between Y and our predicted scores on Y
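The excerpt's closing point, that the coefficient of multiple correlation is simply the correlation between Y and the predicted Y scores, is easy to verify numerically. The sketch below uses simulated stand-in data (not Data Set 1) and assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
X = rng.normal(size=(n, 3))                  # stand-ins for X_1, X_2, X_3
y = X @ np.array([0.8, 0.3, -0.5]) + rng.normal(size=n)

D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ coef                             # predicted scores on Y

R = np.corrcoef(y, y_hat)[0, 1]              # coefficient of multiple correlation
r_squared = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print(R**2, r_squared)                       # R squared equals R^2 from the fit
```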
  • Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences
    • Jacob Cohen, Patricia Cohen, Stephen G. West, Leona S. Aiken (Authors)
    • 2013(Publication Date)
    • Routledge
      (Publisher)

    Multiple Regression/Correlation With Two or More Independent Variables

    DOI: 10.4324/9780203774441-3

    3.1 Introduction: Regression and Causal Models

    In Chapter 2 we examined the index of linear correlation between two variables, the Pearson product moment correlation r and the regression equation for estimating Y from X. Because of the simplicity of the two-variable problems, we did not need to go into detail regarding the interpretive use of these coefficients to draw substantive inferences. The inferences were limited to the unbiased estimation of their magnitudes in the population; the assertion, in the case of the regression coefficient, that one variable was, in part, related to or dependent on the other; and the demonstration of the significance of the departure of the coefficients from zero. When we move to the situation with more than one independent variable, however, the inferential possibilities increase more or less exponentially. Therefore, it always behooves the investigator to make the underlying theoretical rationale and goals of the analysis as explicit as possible. Fortunately, an apparatus for doing so has been developed in the form of the analysis of causal models. Because the authors advocate the employment of these models in virtually all investigations conducted for the purpose of understanding phenomena (as opposed to simple prediction), this chapter begins with an introduction to the use of causal models. A more complete presentation is found in Chapter 12 .

    3.1.1 What Is a Cause?

    Conceptions of causality and definitions of cause and effect have differed among proponents of causal analysis, some offering no explicit definitions at all. Causal analysis as a working method apparently requires no more elaborate a conception of causality than that of common usage. In our framework, to say that X is a cause of Y