
Simple Linear Regression Model

The simple linear regression model is a statistical method used to model the relationship between a single independent variable and a dependent variable. It assumes a linear relationship between the two variables and finds the best-fitting straight line for making predictions. The model is widely used across fields, including technology and engineering, to analyze and predict the behavior of systems and processes.

Written by Perlego with AI-assistance

11 Key excerpts on "Simple Linear Regression Model"

  • Theory of Linear Models
    CHAPTER 1

    Simple Linear Regression

    Linear models are used for studying the explanation of a given variable in terms of a linear combination of given explanatory variables. In the present chapter, we discuss the case of one explanatory variable, as a preparation for the general case, which is treated in Chapter 2 and onwards. Readers already familiar with simple linear regression may want to go directly to Chapter 2, with occasional reference to Chapter 1 as necessary.

    1.1 The linear regression model

    Consider an experiment in which we make simultaneous measurements of two variables x and y for a range of different experimental conditions. If we make n measurements, let (x₁, y₁), …, (xₙ, yₙ) denote the corresponding n pairs of observations. We construct a statistical model for the situation where the relationship between x and y is thought to be linear or approximately so.
    Often x represents the experimental conditions, and y represents the outcome of the experiment. The variable y is called the response variable or the dependent variable. We assume that y₁, …, yₙ are realizations of independent random variables Y₁, …, Yₙ. In contrast, the values x₁, …, xₙ are considered constant (non-random), and x is called the explanatory variable or the independent variable. We must hence make a clear distinction between the response and explanatory variables in regression analysis. Even if x₁, …, xₙ are realizations of random variables X₁, …, Xₙ, we may think of x₁, …, xₙ as fixed, in the sense that we consider the conditional distribution of Y₁, …, Yₙ given X₁ = x₁, …, Xₙ = xₙ.
    The first step in the analysis is to make a scatterplot of y versus x. A typical scatterplot is shown in Figure 1.1
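    To make this setup concrete, here is a minimal sketch in Python (NumPy and Matplotlib assumed available; the sample size, true intercept and slope, and noise level are all invented for illustration) that generates n pairs (xᵢ, yᵢ) from an approximately linear relationship and draws the scatterplot described above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)

# Hypothetical experiment: n fixed x-values; the responses y are
# realizations of independent random variables Y_1, ..., Y_n.
n = 25
x = np.linspace(0.0, 10.0, n)       # explanatory variable (non-random)
alpha, beta, sigma = 2.0, 0.5, 1.0  # invented true intercept, slope, noise sd
y = alpha + beta * x + rng.normal(0.0, sigma, size=n)

# First step of the analysis: a scatterplot of y versus x.
plt.scatter(x, y)
plt.xlabel("x (explanatory variable)")
plt.ylabel("y (response variable)")
plt.show()
```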
  • Applied Medical Statistics
    • Jingmei Jiang (Author)
    • 2022 (Publication Date)
    • Wiley (Publisher)
    13 Simple Linear Regression
    CONTENTS
    • 13.1 Concept of Simple Linear Regression
    • 13.2 Establishment of Regression Model
      • 13.2.1 Least Squares Estimation of a Regression Coefficient
      • 13.2.2 Basic Properties of the Regression Model
      • 13.2.3 Hypothesis Testing of Regression Model
    • 13.3 Application of Regression Model
      • 13.3.1 Confidence Interval Estimation of a Regression Coefficient
      • 13.3.2 Confidence Band Estimation of Regression Model
      • 13.3.3 Prediction Band Estimation of Individual Response Values
    • 13.4 Evaluation of Model Fitting
      • 13.4.1 Coefficient of Determination
      • 13.4.2 Residual Analysis
    • 13.5 Summary
    • 13.6 Exercises
    In this chapter, we present analyses to determine the strength of the relationship between two variables. The magnitude of one of the variables (the dependent variable y) is assumed to be determined by a function of the magnitude of the other variable (the independent variable x), whereas the reverse is not true. In particular, we will look for straight-line (or linear) changes in y as x changes. The term “dependent” does not necessarily imply a cause-and-effect relationship between the two variables. Such a dependence relationship is called simple linear regression, or linear regression for short. The term “simple” is used because there is only one independent variable x. Starting from the basic concepts, we will systematically introduce the modeling principles of linear regression, statistical inference of parameters, and the application of the regression model. Multiple linear regression, which considers two or more independent variables, will be introduced in Chapter 15. For convenience, in this chapter, we use lower case letters x and y to denote the independent and dependent variables, where y
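    As a companion to the outline above, here is a minimal sketch (Python with NumPy; the six data points are invented) of the least squares estimation of the regression coefficients that Section 13.2.1 refers to:

```python
import numpy as np

# Invented example data: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

# Least squares estimates for the straight-line model y = a + b*x:
#   b = S_xy / S_xx,   a = mean(y) - b * mean(x)
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

b = S_xy / S_xx        # estimated slope (regression coefficient)
a = y_bar - b * x_bar  # estimated intercept

print(f"fitted line: y = {a:.3f} + {b:.3f} x")
```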
  • Statistical Techniques in Geographical Analysis
    • Dennis Wheeler, Gareth Shaw, Stewart Barr (Authors)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    Figure 8.5, some degree of departure from a perfect relationship must always be expected.
    In essence, simple linear regression methods ‘fit’ a straight line through scatters of points. In section 2.5 we saw that straight lines can be described in algebraic terms that allow them to be plotted on sheets of graph paper. It was also shown that the Y term (in the context of regression methods this is the dependent variable) is related to the term X (the independent variable) through two constants, a and b. If we know the values of the two constants we can plot the line and predict Y for any value of X (see, for example, Table 2.7). The problem is, how to determine these two values? Many different results will be obtained if we rely on a visual judgement to determine the best-fit line. Clearly, we require an objective method, based on a consistent criterion. Statisticians, for very important reasons that we cannot discuss here, adopt the criterion of least squares, whereby the best-fit line passes through the scatter of plotted points in such a way that the sum of the squared departures of each point from the line (always measured in terms of Y) is at a minimum. For each set of two variables there is only one such solution.
    Figure 9.1 is a simplified expression of the above principles, in which we have indicated the difference (V) between each observation of Y and the best-fit line. We should also notice that the constant b is, in the current context, described as the regression coefficient, but that it also describes the slope of the line. The constant a is known as the intercept term, and is the point on the Y axis through which the line passes. In algebraic terms it is the value of Y when X
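    The least squares criterion is easy to demonstrate numerically. A minimal sketch (Python/NumPy; the five points and the "visual judgement" line are invented) showing that the least-squares line gives a smaller sum of squared departures, measured in terms of Y, than an eyeballed alternative:

```python
import numpy as np

# Invented scatter of points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 3.1, 3.9, 5.2, 5.8])

def sum_sq_departures(a, b):
    """Sum of squared vertical departures of the points from y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# Least squares estimates of the intercept a and slope b.
b_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_ls = y.mean() - b_ls * x.mean()

# An arbitrary line chosen "by eye" for comparison.
a_eye, b_eye = 1.0, 0.9

print("least-squares line:", sum_sq_departures(a_ls, b_ls))    # the minimum
print("eyeballed line:    ", sum_sq_departures(a_eye, b_eye))  # always larger
```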
  • International Business Research
    • James P. Neelankavil (Author)
    • 2015 (Publication Date)
    • Routledge (Publisher)
    Regression and correlation analysis are techniques that assist researchers in studying relationships between variables. Regression analysis is a technique that links the variables. Correlation analysis is a technique that measures the strength of the relationship between the variables.
    Linear relations or approximated linear relations are commonly found in business situations. The term “linear” implies that the relationship between two variables can be functionally described in the form of an equation. In simple regression, the variable whose values we wish to estimate is referred to as the dependent/criterion variable, denoted by Y, and the variable from which the forecasts (predictions) are made is called the independent/predictor variable, denoted by X.

    SIMPLE LINEAR REGRESSION

    The linear equation used in regression is in the form of Y = a + bX , where a and b are constants (these constants will be explained later) that describe the average relationship between the two variables. This implies that for a given value of X , one can establish the value of Y. For example, consider the linear model Y = 2 + 3X . If the value of X = 10, then the value of Y = 2 + 3(10), that is, 32.
    Example 1. Simple regression is best explained through an example. The vice president of operations for a large manufacturing company is attempting to forecast factory workers’ average output per day in one of his plants. The average output for 10 randomly selected workers is presented below.
    The values of average working hours (X ) and worker output (Y ) are plotted on a graph as shown in Figure 16.1 , called a scatter diagram.
    The pattern of the plot appears to be linear. No straight line will pass through all 10 points; however, a single line seems to pass reasonably close to all 10 points.

    SLOPE AND THE INTERCEPT

    Referring back to the linear equation Y = a + bX , the two constants a and b can now be explained. The first constant, a , is the intercept, and the second, b , is the slope. The intercept, a , is defined as the value of Y when the value of X is 0 (the starting point of the straight line that intersects the y-axis). The slope, b , is the rate of change of Y for a unit change in the value of X , which provides the angle of the straight line. Given the starting point and the angle of the straight line, the linear function can be used to predict the values of the dependent variable (Y ) for a given value of the independent variable (X
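    A minimal sketch of this prediction rule in Python (the constants a = 2 and b = 3 are taken from the worked example above):

```python
def predict(x: float, a: float = 2.0, b: float = 3.0) -> float:
    """Predict Y from X using the linear equation Y = a + b*X.

    a is the intercept (the value of Y when X = 0) and b is the slope
    (the change in Y for a unit change in X).
    """
    return a + b * x

print(predict(10))  # Y = 2 + 3(10) = 32.0
```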
  • Business Forecasting: A Practical Approach
    • A. Reza Hoshmand (Author)
    • 2009 (Publication Date)
    • Routledge (Publisher)
    7 Forecasting with Simple Regression
    In many business and economic situations, you will be faced with a problem where you are interested in the relationship that exists between two different random variables X and Y. This type of relationship is known as a bivariate relationship. In the bivariate model, we are interested in predicting the value of a dependent or response variable based upon the value of one independent or explanatory variable. For example, a marketing manager may be interested in the relationship between advertising and sales. A production manager may want to predict steel production as it relates to household appliance output. A financial analyst may be interested in the bivariate relationship between investment X and its future returns Y; or an economist may look at consumer expenditure as a function of personal disposable income. Regression models are also called causal or explanatory models. In this case, forecasters use regression analysis to quantify the behavioral relationship that may exist between economic and business variables. They may use regression models to evaluate the impact of shifts in internal (company level) variables, such as discount prices and sales, and external economic factors, such as interest rates and income, on company sales.
    To determine if one variable is a predictor of another variable, we use the bivariate modeling technique. The simplest model for relating a variable Y to a single variable X is a straight line. This is referred to as a linear relationship. Simple linear regression analysis is used as a technique to judge whether a relationship exists between Y and X . Furthermore, the technique is used to estimate the mean value of Y , and to predict (forecast) a future value of Y for a given value of X .
    In simple regression analysis, we are interested in describing the pattern of the functional nature of the relationship that exists between two variables. This is accomplished by estimating an equation called the regression equation. The variable to be estimated in the regression equation is called the dependent variable and is plotted on the vertical (or Y ) axis. The variable used as the predictor of Y , which exerts influence in explaining the variation in the dependent variable, is called the independent variable. This variable is plotted on the horizontal (or X
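    A minimal sketch of that workflow (Python/NumPy; the advertising and sales figures are invented): estimate the regression equation from past data, then forecast a future value of Y for a given X:

```python
import numpy as np

# Invented bivariate data: advertising spend (X) and sales (Y).
advertising = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([10.2, 14.1, 17.8, 22.5, 25.9])

# Estimate the regression equation Y = a + b*X by least squares.
b, a = np.polyfit(advertising, sales, deg=1)  # returns (slope, intercept)

# Forecast sales at a future advertising level.
x_new = 6.0
print(f"Y = {a:.2f} + {b:.2f} X; forecast at X = {x_new}: {a + b * x_new:.2f}")
```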
  • Practical Statistics and Experimental Design for Plant and Crop Science
    • Alan G. Clewer, David H. Scarisbrick (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    Chapter 7

    Linear Regression and Correlation

    7.1 BASIC PRINCIPLES OF SIMPLE LINEAR REGRESSION (SLR)

    The applied biologist frequently finds it necessary to consider the simultaneous variation within a number of different parameters. He or she may wish to investigate relationships between yield, environmental data, and yield components. For example, such studies can be used to assess whether an increase or decrease in one variable, such as number of pods per plant, can be explained by change in another, such as plant population or light intensity.
    Study of simultaneous change in two (or more) variables can be carried out using techniques of regression and correlation. A regression equation is a mathematical relationship which can be used to determine the expected value of a dependent (or response) variable for a given value of a correlated independent (or predictor) variable. The independent variable is generally denoted by X and the dependent variable by Y . For example, a study of the effect of change in plant population (X ) on the number of pods per plant (Y ) is referred to as the regression relationship of Y on X . The results of a regression analysis provide information on the average rate of change in number of pods per unit increase or decrease in plant population. Regression equations can be used to provide a succinct summary of large amounts of experimental and observational data.
    SLR can be used in the following circumstances:
    • To find out whether two variables are connected by a straight-line relationship. If so, we are interested in the fitted equation.
    • If the X -values are fixed by the experimenter (for example, different fertiliser concentrations) and the Y -values are measured (for example, yields) then the relationship between the two variables can be studied and predictions made about future Y -values for given X -values.
    • Suppose one variable is difficult, and another is easy, to measure. We can carry out a study to find out if these variables are connected by a straight-line relationship. If they are, then future values of one variable can be estimated from measurements on the other. For example, the leaf area of linseed, which has small ovate leaves, is more difficult to measure than stem length. If the equation connecting these variables is known, it can be used to estimate leaf area from measurements of stem length, as in the sketch below.
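    A minimal sketch of that third use case (Python/NumPy; the stem-length and leaf-area values are invented placeholders):

```python
import numpy as np

# Invented calibration data: stem length (cm) is easy to measure,
# leaf area (cm^2) is hard to measure.
stem_length = np.array([10.0, 14.0, 18.0, 22.0, 26.0, 30.0])
leaf_area = np.array([4.1, 5.8, 7.2, 9.1, 10.4, 12.2])

# Fit the straight-line relationship leaf_area = a + b * stem_length.
b, a = np.polyfit(stem_length, leaf_area, deg=1)

# Later, estimate leaf area from a new stem-length measurement alone.
new_stem = 20.0
print(f"estimated leaf area at {new_stem} cm: {a + b * new_stem:.2f} cm^2")
```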
  • Biostatistics for Clinical and Public Health Research
    • Melody S. Goodman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)
    13 Linear regression
    This chapter will focus on investigating the change in a continuous variable in response to a change in one or more predictor variables and will include the following topics:
    • Simple linear regression
    • Regression concepts
    • Methods of least squares
    • Linear relationship
    • Inference for predicted values
    • Evaluation of the model
    • Multiple linear regression
    • Model evaluation
    • Other explanatory variables
    • Model selection

    Terms

    • collinearity
    • indicator variable

    Simple linear regression

    Simple linear regression measures the association between two continuous variables.
    • One variable is treated as the response (dependent or outcome) variable, commonly denoted as y .
    • The other is the explanatory (independent or predictor) variable, commonly denoted as x .
    The concept is similar to correlation; however, regression enables us to investigate how a change in the response variable corresponds to a given change in the explanatory variable. Correlation analysis makes no such distinction. It can only determine whether a linear relationship exists between the two variables of interest, and it determines the strength of that association. The objective of regression is to predict the value of the response variable that is associated with a fixed value of an explanatory variable. Linear regression is used to examine the relationship between a variable (continuous, categorical, or binary) and a continuous outcome, specifically how a change in the explanatory (predictor) variable affects a change in the response (outcome) variable.
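    To make the contrast concrete, here is a minimal sketch (Python/NumPy, invented data) computing both quantities; the regression slope and the correlation coefficient are linked by b = r·(s_y/s_x), but only the regression slope answers how much y changes per unit change in x:

```python
import numpy as np

# Invented paired measurements of two continuous variables.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9])

# Correlation: strength and direction of the linear association (unitless).
r = np.corrcoef(x, y)[0, 1]

# Regression: expected change in y per one-unit change in x.
b = r * y.std(ddof=1) / x.std(ddof=1)  # slope, b = r * s_y / s_x
a = y.mean() - b * x.mean()            # intercept

print(f"correlation r = {r:.3f}")
print(f"regression line: y = {a:.3f} + {b:.3f} x")
```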
  • Data Mining and Predictive Analytics for Business Decisions
    Not all relationships may be linear or logistic. In certain circumstances, a power law relationship may exist, or an exponential growth relationship is apparent. Sometimes the equation of a parabola (quadratic) or a cubic relationship is more appropriate. Then, using a regression algorithm to obtain the parameters, we may find that a model other than linear or logistic is more accurate. We will discuss this using a goodness-of-fit measure called R-squared later in the chapter.

    LINEAR REGRESSION

    A simple linear regression (SLR) machine is the linear mathematical equation that defines the best linear relationship between two variables, X and Y, as shown in Figure 7.1 .
    Here, Y is the response (outcome, output, label, or dependent) variable, what we are predicting (e.g., catalog expenditures or lifetime value of a customer). X is the predictor (input, feature, or independent) variable (e.g., age or income). b is a constant numerical value and denotes the Y-intercept when X takes on a value of zero. m is the slope of the regression line.
    FIGURE 7.1 The equation of a straight line showing the coefficients and their relationships.

    SIMPLE LINEAR REGRESSION

    Let us assume we have a data set of customers’ purchasing behavior versus their income level. We extract a sample of 10 customers to build a predictive model of 1-year Lifetime Value (LTV), which is their cumulative purchases for the previous year, as the outcome, or predicted variable. We use their income level as the predictor. Figure 7.2
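    A minimal sketch of that model (Python with scikit-learn, an assumption since the excerpt does not name a library; the ten income and LTV figures are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented sample of 10 customers: income level (predictor, in $k)
# and 1-year Lifetime Value (outcome, in $).
income = np.array([[35], [42], [48], [55], [61], [68], [74], [80], [88], [95]])
ltv = np.array([310, 380, 420, 500, 540, 610, 650, 700, 780, 830])

# Fit the simple linear regression LTV = b + m * income.
model = LinearRegression().fit(income, ltv)
print(f"intercept b = {model.intercept_:.1f}, slope m = {model.coef_[0]:.2f}")

# Predict LTV for a new customer from income alone.
print(model.predict(np.array([[70.0]])))
```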
  • Regression Analysis: A Practical Introduction
    • Jeremy Arkes (Author)
    • 2023 (Publication Date)
    • Routledge (Publisher)
    Chapter 7 describes the strategies to use when estimating regressions for the other objectives.

    2.3 The Simple Regression Model

    2.3.1 The components and equation of the Simple Regression Model

    If you remember 8th-grade Algebra, a straight line is represented by the equation:
    y = a + b x
    where:
    • x is the horizontal-axis variable
    • y is the vertical-axis variable
    • a is the y-intercept
    • b is the slope of the line.
    A Simple Regression is similar in that there is one X and one Y variable, but it differs in that not all points fall on a straight line. Rather, the Simple Regression line indicates the line that best fits the data.
    The Simple Regression Model (also known as the Bivariate Regression Model) is:
    Yᵢ = β₀ + β₁Xᵢ + εᵢ    (2.1a)    (i = 1, 2, 3, …, N)
    This equation describes each of the N data points (observations), not just the line. The five components of the model are:
    • The dependent variable (Y), which is also called the outcome, response variable, regressand, or Y variable. (It is the Y-axis variable, or “income” in Figure 2.1 .)
    • The explanatory variable (X), which is also called the independent variable, explanatory variable, treatment variable, regressor, or simply X variable. Personally, I do not like the term “independent variable” because: (1) it is not descriptive of what the variable does; (2) sometimes, the X variable is “dependent” on the dependent (Y) variable or other factors that are related to the dependent variable; and (3) it is too close to and often gets confused with “dependent variable.” I prefer “explanatory variable” or simply “X variable.” (It is the X-axis variable, or “years-of-schooling” in Figure 2.1
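    A minimal sketch (Python/NumPy; the sample size, true coefficients, and error spread are invented) that generates data according to equation (2.1a) and recovers least-squares estimates of β₀ and β₁:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate N observations from Y_i = beta0 + beta1 * X_i + eps_i  (2.1a).
N = 200
beta0, beta1 = 5.0, 2.0             # invented true coefficients
X = rng.uniform(0.0, 10.0, size=N)  # explanatory (X) variable
eps = rng.normal(0.0, 3.0, size=N)  # error term
Y = beta0 + beta1 * X + eps         # dependent (Y) variable

# Least squares estimates of the two coefficients.
b1_hat, b0_hat = np.polyfit(X, Y, deg=1)
print(f"estimates: beta0 ~ {b0_hat:.2f}, beta1 ~ {b1_hat:.2f}")
```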
  • Statistics for Compensation: A Practical Guide to Compensation Analysis
    • John H. Davis (Author)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    Chapter 7 Linear Model
    7.1 Examples
    If the plot of two variables indicates that a straight line can describe the observed trend, then we fit a linear model or straight line to the data. This process is called simple linear regression. The term “simple” means that there is just one x -variable. Here are two examples of linear models.
    The first model, in Figure 7.1 , was used to make a decision on the salary range for the BPD VP of Human Resources.
    Figure 7.1 Linear model of market data for VP Human Resources
    The second model, in Figure 7.2, was used to make decisions on targeting BPD's communication efforts to lower-paid employees on the benefits of contributing to a 401(k).
    Figure 7.2 Linear Model of Employee Contribution to 401(k)
    Case Study 5, Part 2 of 3
    Recall from Chapter 6 that the engineering manager of BPD needs to hire some mid-level chemical engineers with 5–10 years of experience and wants to know what the market pay is, so he knows the range of pay to offer. This statement summarizes the first two steps in the model building process—to specify the problem or issue (what is the pay for mid-level chemical engineers?) and to generate the critical factors that may impact the problem (for this example, we focus on experience of BS chemical engineers).
    You know that pay varies with experience, so you collect survey data that include both salary and experience. We will use the same data from Chapter 6 that were used to illustrate the model building process. The data and corresponding plot are repeated here in Table 7.1 and Figure 7.3
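    A minimal sketch of the kind of market-pay model the case study builds (Python/NumPy; the experience and salary values are invented placeholders, not the BPD survey data):

```python
import numpy as np

# Invented survey data: years of experience and salary (in $k)
# for BS chemical engineers.
experience = np.array([2, 4, 5, 6, 7, 8, 9, 10, 12, 15])
salary = np.array([68, 75, 80, 84, 88, 93, 96, 101, 109, 121])

# Fit the linear model salary = a + b * experience.
b, a = np.polyfit(experience, salary, deg=1)

# Market-pay estimates at the ends of the 5-10 years experience range.
for yrs in (5, 10):
    print(f"{yrs} years of experience: about ${a + b * yrs:.0f}k")
```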
  • Python: Advanced Predictive Analytics
    • Joseph Babcock, Ashish Kumar (Authors)
    • 2017 (Publication Date)
    • Packt Publishing (Publisher)
    Several complexities complicate this analysis in practice. First, the relationships we fit usually involve not one, but several inputs. We can no longer draw a two-dimensional line to represent this multi-variate relationship, and so must increasingly rely on more advanced computational methods to calculate this trend in a high-dimensional space. Secondly, the trend we are trying to calculate may not even be a straight line – it could be a curve, a wave, or even more complex patterns. We may also have more variables than we need, and need to decide which, if any, are relevant for the problem at hand. Finally, we need to determine not just the trend that best fits the data we have, but also the one that generalizes best to new data.
    In this chapter we will learn:
    • How to prepare data for a regression problem
    • How to choose between linear and nonlinear methods for a given problem
    • How to perform variable selection and assess over-fitting

    Linear regression

    The classic fitting method is Ordinary Least Squares (OLS). We will start with the simplest model of linear regression, where we will simply try to fit the best straight line through the data points we have available. Recall that the formula for linear regression is:

    y = βX

    where y is a vector of n responses we are trying to predict, X is a vector of our input variable, also of length n, and β is the slope response (how much the response y increases for each 1-unit increase in the value of X). However, we rarely have only a single input; rather, X will represent a set of input variables, and the response y is a linear combination of these inputs. In this case, known as multiple linear regression, X is a matrix of n rows (observations) and m columns (features), and β is a vector of slopes or coefficients which, when multiplied by the features, gives the output. In essence, it is just the trend line incorporating many inputs, but it also allows us to compare the magnitude of the effect of different inputs on the outcome. When we are trying to fit a model using multiple linear regression, we also assume that the response incorporates a white noise error term ε, which has a normal distribution with mean 0 and a constant variance for all data points.
    To solve for the coefficients β in this model, we can perform the following calculation:

    β = (XᵀX)⁻¹Xᵀy

    The value of β is known as the ordinary least squares estimate of the coefficients. The result will be a vector of coefficients β for the input variables. We make the following assumptions about the data:
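    A minimal sketch of that calculation (Python/NumPy; the design matrix, true coefficients, and noise are invented):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Invented data: n observations, m features, plus a white noise error term.
n, m = 100, 3
X = rng.normal(size=(n, m))
true_beta = np.array([1.5, -2.0, 0.7])
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Ordinary least squares estimate: beta = (X'X)^(-1) X'y.
# Solving the normal equations is numerically safer than explicit inversion.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS coefficient estimates:", beta_hat)
```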