Mathematics

Least Squares Linear Regression

Least Squares Linear Regression is a statistical method used to find the best-fitting line through a set of data points. It minimizes the sum of the squares of the vertical distances between the data points and the line. This technique is commonly used for modeling and predicting relationships between variables in various fields such as economics, engineering, and social sciences.

Written by Perlego with AI-assistance

11 Key excerpts on "Least Squares Linear Regression"

  • Applied Regression Analysis
    We shall see how the method of analysis called the method of least squares can be used to examine data and to draw meaningful conclusions about dependency relationships that may exist. This method of analysis is often called regression analysis. (For historical remarks, see Section 1.8.)
    Throughout this book we shall be most often concerned with relationships of the form
    response = model function + random error.
    The model function will usually be “known” and of specified form and will involve the predictor variables as well as parameters to be estimated from data. The distribution of the random errors is often assumed to be a normal distribution with mean zero, and errors are usually assumed to be independent. All assumptions are usually checked after the model has been fitted and many of these checks will be described.
    (Note: Many engineers and others call the parameters constants and the predictors parameters. Watch out for this possible difficulty in cross-discipline conversations!)
    We shall present the least squares method in the context of the simplest application, fitting the “best” straight line to given data in order to relate two variables X and Y , and will discuss how it can be extended to cases where more variables are involved.

    1.1. STRAIGHT LINE RELATIONSHIP BETWEEN TWO VARIABLES

    In much experimental work we wish to investigate how the changes in one variable affect another variable. Sometimes two variables are linked by an exact straight line relationship. For example, if the resistance R of a simple circuit is kept constant, the current I varies directly with the voltage V applied, for, by Ohm’s law, I = V/R. If we were not aware of Ohm’s law, we might obtain this relationship empirically by making changes in V and observing I, while keeping R fixed, and then observing that the plot of I against V more or less gave a straight line through the origin. We say “more or less” because, although the relationship actually is exact, our measurements may be subject to slight errors and thus the plotted points would probably not fall exactly on the line but would vary randomly about it. For purposes of predicting I for a particular V (with R fixed), however, we should use the straight line through the origin. Sometimes a straight line relationship is not exact (even apart from error) yet can be meaningful nevertheless. For example, suppose we consider the height and weight of adult males for some given population. If we plot the pair (Y1, Y2) = (height, weight), a diagram something like Figure 1.1 will result. (Such a presentation is conventionally called a scatter diagram
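The Ohm's-law illustration above lends itself to a short numerical sketch. The following Python snippet is not taken from the excerpt; the resistance value, voltage range, and noise level are invented for illustration. It simulates slightly noisy (V, I) measurements at a fixed R and fits the least squares line through the origin, recovering 1/R.

```python
# A minimal sketch (not from the excerpt): simulate noisy Ohm's-law data at a
# fixed resistance and recover the slope 1/R by least squares through the
# origin. All names and values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
R_true = 50.0                                      # ohms (assumed value)
V = np.linspace(1.0, 10.0, 20)                     # applied voltages
I = V / R_true + rng.normal(0.0, 0.001, V.size)    # measured currents with small error

# Least squares for a line through the origin: slope = sum(V*I) / sum(V^2)
slope = np.sum(V * I) / np.sum(V ** 2)
print("estimated 1/R:", slope, "-> estimated R:", 1.0 / slope)
```

For a line constrained through the origin, the least squares slope reduces to ΣVI/ΣV², which is exactly what the snippet computes.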
  • Applied Univariate, Bivariate, and Multivariate Statistics Using Python
    eBook - ePub
    • Daniel J. Denis(Author)
    • 2021(Publication Date)
    • Wiley
      (Publisher)
    matrix algebra will only lead to more enlightenment down the road, just as understanding the logical precursors that must be in place before prediction is entertained for a given research problem will foster a richer understanding. Mathematically, knowing how to pump out derivatives and integrals, for instance, is not nearly as important as understanding how regression works on an intuitive level, while also not ignoring the technicalities.

    7.2  The Least-Squares Principle

    Though regression analysis may use one of several methods of estimating parameters, by far the one that dominates is ordinary least-squares. In fact, much of the history of regression analysis dates back to the origins of the least-squares principle in one form or another. Many early astronomers, for instance, worked on ways of minimizing error in predicting astronomic phenomena (Stigler, 1986). Though the mathematics behind least-squares can get rather complex, the principle is very easy to understand. When we say “least-squares,” what we mean is minimizing (i.e. “least”) a quantity. The quantity we are wanting to minimize is a squared quantity (i.e. the “squares” part of the principle). But, what quantity do we want to minimize when making predictions? Remarkably, it is analogous to making predictions without least-squares, and we have already seen it before. If you follow our development here, you will have a solid and intuitive understanding of least-squares.
    Recall the sample variance discussed earlier in the book:
    $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
    In computing the variance, recall that, in the numerator, we are summing squared deviations from the mean. It is entirely reasonable to ask why we are summing deviations from the mean, and not from any other number. Why we square the deviations is well understood: it ensures the sum does not always equal 0 (if we did not square the deviations, then the sum of deviations $\sum (y_i - \bar{y})$ would always equal 0 regardless of how much variability we have in our data; try it yourself with some sample data). But why deviations from the mean in particular? Believe it or not, the answer to this question helps form the basis of least-squares regression, and the answer is this: because taking squared deviations from the mean ensures for us a smaller average squared deviation than if we took deviations from any other value, such as the median or the mode
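The claim above, that the mean yields a smaller sum of squared deviations than any other centre, is easy to check numerically. The sketch below uses an arbitrary made-up sample (any numbers would do) and compares squared deviations taken around the mean, the median, and another fixed value; it also confirms that the unsquared deviations from the mean sum to zero.

```python
# Quick numerical check of the claim above (arbitrary sample data): squared
# deviations from the mean are smaller in total than squared deviations from
# any other centre, such as the median or an arbitrary constant.
import numpy as np

y = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0])

def sum_sq_dev(data, centre):
    """Sum of squared deviations of the data around a chosen centre."""
    return np.sum((data - centre) ** 2)

print("sum of (unsquared) deviations from mean:", np.sum(y - y.mean()))  # ~0
print("SS around mean:  ", sum_sq_dev(y, y.mean()))
print("SS around median:", sum_sq_dev(y, np.median(y)))
print("SS around 6.0:   ", sum_sq_dev(y, 6.0))
```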
  • Presenting Statistical Results Effectively
    5 The Linear Regression Model
    5.1 Introduction
    The volume of knowledge that has been built on regression models is staggering. Most quantitative social science employs some form of regression analysis. At its core, regression is a relatively simple idea. It provides a specific and elegant summary of how the values of a dependent variable y are expected to change as an independent variable x changes. The pillar of regression methods is linear regression. Despite the linearity constraint, linear regression is very flexible, making it a sensible choice for many situations. It also forms the foundation of a vast array of methods for which linear regression itself is not appropriate. The linear model is simple to estimate, has a straightforward interpretation, and its insights can be easily presented to audiences of varying levels of sophistication. All quantitative social scientists are well served by having a solid understanding of linear regression. This chapter provides a brief overview of linear regression and the assumptions of the linear model. Our emphasis is on ordinary least squares (OLS) regression. We provide enough statistical foundation to ensure a sound understanding of how linear models are estimated, and the properties of their estimates (for more technical but accessible treatments, see Wooldridge, 2010; Baltagi, 2011; Fox, 2016). We also discuss basic model interpretation and evaluation (i.e., model fit and testing). These discussions foreshadow the diagnostic, presentation and interpretation tasks that are the focus of Chapters 6–9. We end the chapter with a discussion of the linear probability model. This controversial use of OLS to predict binary outcomes is gaining in popularity (again) in the social sciences after decades of limited use
  • Python: Advanced Predictive Analytics
    • Joseph Babcock, Ashish Kumar(Authors)
    • 2017(Publication Date)
    • Packt Publishing
      (Publisher)
    Several complexities complicate this analysis in practice. First, the relationships we fit usually involve not one, but several inputs. We can no longer draw a two-dimensional line to represent this multi-variate relationship, and so must increasingly rely on more advanced computational methods to calculate this trend in a high-dimensional space. Secondly, the trend we are trying to calculate may not even be a straight line: it could be a curve, a wave, or an even more complex pattern. We may also have more variables than we need, and must decide which, if any, are relevant for the problem at hand. Finally, we need to determine not just the trend that best fits the data we have, but also the one that generalizes best to new data.
    In this chapter we will learn:
    • How to prepare data for a regression problem
    • How to choose between linear and nonlinear methods for a given problem
    • How to perform variable selection and assess over-fitting

    Linear regression

    Ordinary Least Squares (OLS).
    We will start with the simplest model of linear regression, where we will simply try to fit the best straight line through the data points we have available. Recall that the formula for linear regression is
    $y = \beta X$
    Where y is a vector of n responses we are trying to predict, X is a vector of our input variable also of length n, and β is the slope response (how much the response y increases for each 1-unit increase in the value of X). However, we rarely have only a single input; rather, X will represent a set of input variables, and the response y is a linear combination of these inputs. In this case, known as multiple linear regression, X is a matrix of n rows (observations) and m columns (features), and β is a vector set of slopes or coefficients which, when multiplied by the features, gives the output. In essence, it is just the trend line incorporating many inputs, but will also allow us to compare the magnitude effect of different inputs on the outcome. When we are trying to fit a model using multiple linear regression, we also assume that the response incorporates a white noise error term ε, which is a normal distribution with mean 0 and a constant variance for all data points.
    To solve for the coefficients β in this model, we can perform the following calculations:
    $\beta = (X^{T}X)^{-1}X^{T}y$
    The value of β is known as the ordinary least squares estimate of the coefficients. The result will be a vector of coefficients β for the input variables. We make the following assumptions about the data:
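The estimate just described can be sketched in a few lines of NumPy; the data here are simulated purely for illustration and the variable names are not from the book. The snippet builds the design matrix with an intercept column and solves the normal equations for β = (XᵀX)⁻¹Xᵀy.

```python
# Minimal sketch of the ordinary least squares estimate for multiple linear
# regression, beta_hat = (X^T X)^(-1) X^T y. Data are simulated purely for
# illustration; a column of ones is prepended for the intercept.
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 3
X = rng.normal(size=(n, m))                          # n observations, m features
beta_true = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ beta_true + rng.normal(0.0, 1.0, n)    # white-noise error term

X1 = np.column_stack([np.ones(n), X])                # design matrix with intercept
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)      # solve the normal equations
print("estimated [intercept, coefficients]:", beta_hat)
```

In practice a numerically safer route (for example np.linalg.lstsq, which uses an SVD) is preferred over forming XᵀX explicitly; the explicit form is shown here only to mirror the formula.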
  • Understanding Econometrics
    • Jon Stewart(Author)
    • 2018(Publication Date)
    • Routledge
      (Publisher)
    The least squares principle states that the line (and hence the parameter estimates) should be chosen so as to make the sum of squared residuals as small as possible. Formally, this can be expressed as: choose α̂ and β̂ to minimize
    $\sum e_t^2 = \sum (Y_t - \hat{\alpha} - \hat{\beta} X_t)^2$  (2.1.6)
    There exists a well-defined procedure in calculus for solving a problem such as 2.1.6. What the calculus does (see exercise 2.3) is to show that the values of α̂ and β̂ that satisfy the minimization condition are those values that satisfy a pair of equations, known as the normal equations:
    $\hat{\alpha} n + \hat{\beta} \sum X_t = \sum Y_t$
    $\hat{\alpha} \sum X_t + \hat{\beta} \sum X_t^2 = \sum X_t Y_t$  (2.1.7)
    In these equations, α̂ and β̂ are unknowns, and ΣY_t, n, ΣX_t, ΣX_tY_t and ΣX_t² are all known in the sense that, for any given set of data, a value can be calculated for each quantity. There are many values of α̂ and β̂ which would satisfy one of the equations, but the least squares principle implies that the correct choice of estimates is the particular pair of values of α̂ and β̂ which satisfies both equations. This is one of the contexts in which it is important to be able to recognize linearity, for if two unknown quantities have to satisfy two linear equations, the only possible solution is a pair of values which define a point lying on both straight lines. Since two distinct straight lines can only cross at a single point, there can only be one value, for each unknown, which satisfies both equations. The only cases in which this conclusion does not hold are (1) when the two lines are parallel or (2) when the two equations represent exactly the same line. In the first case, no solution is possible and the equations are inconsistent. In the second case, there is really only one equation and any point on the corresponding line would be a solution: but such a solution is not unique and, to fit a single estimated line to observed points, we do need a unique solution
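To make the normal equations concrete, here is a minimal sketch (with invented data, not from the text) that builds the two equations in (2.1.7) from the known sums and solves them as a 2×2 linear system for α̂ and β̂.

```python
# Sketch: form the normal equations (2.1.7) from the data sums and solve the
# resulting 2x2 linear system for alpha_hat and beta_hat. The (X, Y) data
# below are invented purely for illustration.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# Known quantities appearing in the normal equations
A = np.array([[n,       X.sum()],
              [X.sum(), (X ** 2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])

alpha_hat, beta_hat = np.linalg.solve(A, b)
print("alpha_hat:", alpha_hat, "beta_hat:", beta_hat)

# The sum of squared residuals that the principle minimizes:
residuals = Y - alpha_hat - beta_hat * X
print("sum of squared residuals:", np.sum(residuals ** 2))
```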
  • Basic Experimental Strategies and Data Analysis for Science and Engineering
    Chapter 8 Regression Analysis
    8.1 Introduction
    The goal of the vast majority of experimentation is (or ought to be) to develop a model which adequately describes the system being studied. The model can then be used for whatever the intended objective is: optimization, troubleshooting, control, etc. The model is essentially a concise summary of all the data that was taken. The model smooths out the noise (variability) in the data, and it elucidates the underlying relationships between the factors and the response(s).
    As a rule, the form of the model is known (or specified) before any experiments are run. It could be a simple straight line, a complicated polynomial, or anything in between. It could even be a mechanistic model, although those are, in general, beyond the scope of this text. But, even though the form is known, the model has constants in it whose values are NOT known. One way to think of the goal of experimentation is that the goal is to allow the estimation of those constants (which we sometimes also call by other names like coefficients or parameters). The main thrust of this chapter describes the process we use to distill the values of those constants from our data.
    We will begin this chapter with a description of the method, which is called Least Squares. We will then apply the method to a simple problem of fitting a straight line (Section 8.3). After that we will cover the more useful situation of fitting a model with several factors in it (Section 8.4). We will look at describing how well the model fits the data in Section 8.5, and how good any assumptions are that we had to make along the way (Section 8.6). It should be mentioned up front that the equations that are used for regression analysis can be expressed very succinctly in matrix form, and so matrices will be used extensively except for the very simplest of cases. If you are unfamiliar with (or just rusty on) matrix notation and matrix manipulation, there are many good and short tutorials available online. See, for example, stattrek.com
  • A First Course in the Design of Experiments
    eBook - ePub
    • John H. Skillings, Donald Weber(Authors)
    • 2018(Publication Date)
    • CRC Press
      (Publisher)
    It can be shown that these solutions do, in fact, minimize the sum of the squared residuals.
    We call the values of b0 and b1 given in (2.3.3) the least squares estimates for β0 and β1 . The value ŷ, obtained from ŷ = b0 + b1 x, is called the least squares estimate for Y. The line ŷ = b0 + b1 x is called the least squares line, or regression line, or line of best fit.
    Example 2.3.1 Refer to the TV data of Example 2.2.1. As before we let x represent the educational level and y represent the number of hours watching TV. We proceed to find the least squares estimates and the corresponding least squares line for these data. Performing some summary calculations we obtain
    $\sum_{i=1}^{10} y_i = 21.9, \quad \sum_{i=1}^{10} x_i = 133, \quad \sum_{i=1}^{10} x_i^2 = 1841, \quad \text{and} \quad \sum_{i=1}^{10} x_i y_i = 277.7.$
    Using the (2.3.3) formulas, we find the least squares estimates
    $b_1 = \frac{10(277.7) - (133)(21.9)}{10(1841) - (133)^2} = -0.1882$
    $b_0 = \frac{21.9}{10} - (-0.1882)\,\frac{133}{10} = 4.6932.$
    Hence the least squares line is
    $\hat{y} = 4.6932 - 0.1882\,x.$
    The actual data and the least squares line are plotted in Figure 2.3.2 . From this plot we see that this least squares line “fits” the data quite well.
    Figure 2.3.2 Television Least Squares Line
    We can use the least squares line, alternately called the regression line, in several ways. To illustrate, in the television study example, the slope is negative. It therefore follows that hours watching TV tends to decrease as the educational level increases. Secondly, the regression line can be used to predict values of y for given values of x. For example, if the educational level of a person is x = 11 years, we predict that his or her evening TV watching averages ŷ = 4.6932 − 0.1882(11) = 2.6 hours.
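As a quick check of the worked example, the estimates can be reproduced from the summary sums reported above (the ten raw observations themselves are not shown in the excerpt) using the standard closed-form expressions for b1 and b0.

```python
# Quick check of the worked TV example using only the summary sums reported
# above (the raw 10 observations are not given in the excerpt).
n = 10
sum_y, sum_x, sum_x2, sum_xy = 21.9, 133.0, 1841.0, 277.7

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
print("b1:", round(b1, 4))                                   # about -0.1882
print("b0:", round(b0, 4))                                   # about  4.6932
print("predicted hours at x = 11:", round(b0 + b1 * 11, 1))  # about  2.6
```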
    Matrix Representation
    We close this section with a note concerning the matrix representation in simple linear regression. The sample form (2.3.1)
  • An Introduction to Probability and Statistical Inference
    • George G. Roussas(Author)
    • 2003(Publication Date)
    • Academic Press
      (Publisher)
    Chapter 13

    A Simple Linear Regression Model

    This is a rather extensive chapter on an important subject matter with an abundance of diverse applications. The basic idea involved may be described as follows. There is a stimulus, denoted by x, and a response to it, denoted by y. At different levels of x, one observes the respective responses. How are the resulting (x, y) pairs related, if they are related at all? There are all kinds of possibilities, and the one discussed in this chapter is the simplest such possibility, namely, that the pairs are linearly related.
    In reality, what one, actually, observes at x , due to errors, is a value of a r.v. Y, and then the question arises as to how we would draw a straight line, which would lie “close” to most of the (x, y ) pairs. This leads to the Principle of Least Squares. On the basis of this principle, one is able to draw the so-called fitted linear regression line by computing the Least Squares Estimates of parameters involved. Also, some properties of these estimates are established. These things are done in the first two sections of the chapter.
    Up to this point, the errors are not required to have any specific distribution, other than having zero mean and finite variance. However, in order to proceed with statistical inference about the parameters involved, such as constructing confidence intervals and testing hypotheses, one has to stipulate a distribution for the errors; this distribution, reasonably enough, is assumed to be Normal. As a consequence of it, one is in a position to specify the distribution of all estimates involved and proceed with the inference problems referred to above. These issues are discussed in Sections 13.3 and 13.4.
    In the following section, Section 13.5, the problem of predicting the expected value of the observation Y_0 at a given point x_0 and the problem of predicting a single value of Y_0
  • Basic Computational Techniques for Data Analysis
    eBook - ePub
    • D Narayana, Sharad Ranjan, Nupur Tyagi(Authors)
    • 2023(Publication Date)
    • Routledge India
      (Publisher)
    Regression analysis can be linear or non-linear based on the relationship between the variables. An important assumption of linear regression is that the relationship between the variables is linear, that is, it follows a straight line. On the other hand, a non-linear regression model studies variables having no linear relationship; that is, the curve of regression is not a straight line. The present chapter focuses on the linear regression model throughout the analysis. In addition to understanding the concept of regression analysis, its relevance, and applicability, we will also learn how to develop, estimate, and interpret a linear regression model. Based on this, we will make predictions and determine the fit of the model.

    8.1 Regression Analysis

    Regression analysis is a statistical technique that quantifies the relationship between the variables and predicts the value of one variable from another set of variables. The variable that predicts the other variable’s value is known as the independent variable, the explanatory variable, the predictor, or simply the X variable. Likewise, the variable whose value is predicted is known as the dependent variable, or variable of interest or the Y variable. Linear regression analysis models the relationship between a dependent and an independent variable by fitting a linear equation. Let us look at the steps involved in developing a complete linear regression model.

    8.2 Developing a Linear Regression Model

    To create a regression model, we first identify and establish the relationship between the dependent and independent variables. The simple linear regression equation can be written in the form of:
    $Y = \beta_0 + \beta_1 X$
    Here,
    Y = Dependent variable
    X = Independent variable
    β0 = Y-Intercept
    β1 = Slope coefficient of Y with respect to X
    Correspondingly, the multiple linear regression equation for ‘n’ number of independent variables can be written in the form of:
    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$
    Here,
    Y = Dependent variable
    X_i = ith independent variable, i ∈ {1, 2, …, n}
    β0 = Y-Intercept
    β_i = Slope coefficient of Y on X_i
    The regression coefficient, denoted as β_i, is known as the regression coefficient of Y on X (when the value of variable Y depends on X). β0, known as the constant or the Y-intercept, estimates the value of Y when the value of X is zero. The magnitude of β_i signifies the change in Y due to a change in X_i by one unit, provided all the other explanatory variables are held constant. Together, β0 and β1 are the parameters of the regression model that mathematically describe the relationship between the dependent and independent variables. The regression model can be shown graphically as in Figure 8.1
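A brief sketch of fitting such a model in practice follows; the library choice (scikit-learn) and the simulated data are assumptions for illustration, not the chapter's own code. The fitted intercept_ corresponds to β0 and each entry of coef_ to a slope coefficient β_i.

```python
# Minimal sketch (library choice is an assumption, not the book's own code):
# estimate the intercept beta_0 and slope coefficients beta_i of a multiple
# linear regression with scikit-learn, using simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 2))                                   # two independent variables X_1, X_2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.5, n)

model = LinearRegression().fit(X, y)
print("beta_0 (intercept):", model.intercept_)
print("beta_1, beta_2 (slopes):", model.coef_)

# Each slope is the change in Y for a one-unit change in the corresponding
# X_i, holding the other explanatory variables constant.
```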
  • Principles of Statistics
    x ; the purpose of regression analysis is to make inferences about the form of this graph.
    The simplest and most important type of regression is the straight line
    $y = \alpha + \beta x,$
    where β is the slope of the line and α its intercept at x = 0. As we remarked above, the regression of comb-growth on log dose seems to be approximately linear within the range of doses from mg. to 8 mg.; it cannot, however, be linear over the entire range of doses since (1) there must be an upper limit to the comb-growth as the dose is indefinitely increased, and (2) the comb-growth for zero dose, when log dose = −∞, must be zero and not −∞! This example illustrates the danger of extrapolation.
    Let us suppose then that we have n pairs of observations, (x 1 , y 1 ), (x 2 , y 2 ), ..., (x n , y n ), on two variables of which x is the independent and y the dependent variable and that we wish to estimate the regression of y on x which is assumed to be linear,
    The standard procedure is to choose as the estimated regression line
    that line which minimises the sum of the squared deviations of observed from estimated values of y, that is to say the line which minimises the quantity S² (the sum of these squared deviations).
    These deviations are shown graphically in Fig. 31 . This method is known as the method of least squares . It was first considered in connection with errors of astronomical observations by Legendre in 1806 and by Gauss in 1809.
    FIG. 31. The ‘best’ line is the line which minimises the sum of the squares of the deviations in the direction shown
    To find the line which minimises S² we must solve the pair of simultaneous equations:
    The solution of the first equation is
    which tells us that the line passes through the point (x̄, ȳ). Substituting this expression in the second equation we find
    It is also convenient to have a formula for S²
  • Using R for Introductory Statistics
    This can be expressed as y_i has a Normal(β_0 + β_1 x_i, σ) distribution. If the x values are random, the model assumes that, conditionally on knowing these random values, the same is true about the distribution of the y_i.
    Estimating the parameters in simple linear regression
    One goal when modeling is to “fit” the model by estimating the parameters based on the sample. For the regression model the method of least squares is used. With an eye toward a more general usage, suppose we have several predictors, x_1, x_2, …, x_k; several parameters, β_0, β_1, …, β_p; and some function, f, which gives the mean for the variables y_i. That is, the statistical model is
    $y_i = f(x_{1i}, x_{2i}, \ldots, x_{ki} \mid \beta_0, \beta_1, \ldots, \beta_p) + \epsilon_i.$
    The method of least squares finds values for the β’s that minimize the squared difference between the actual values, y_i, and those predicted by the function f. That is, the following sum is minimized:
    $\sum_i \left[ y_i - f(x_{1i}, x_{2i}, \ldots, x_{ki} \mid \beta_0, \beta_1, \ldots, \beta_p) \right]^2.$
    For the simple linear regression model, the formulas are not difficult to write (they are given below). For the more general model, even if explicit formulas are known, we don’t present them. The simple linear regression model for y_i has three parameters, β_0, β_1, and σ². The least-squares estimators for these are
    $\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2},$  (11.1)
    $\hat\beta_0 = \bar y - \hat\beta_1 \bar x,$  (11.2)
    $\hat\sigma^2 = \frac{1}{n-2} \sum \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right]^2.$  (11.3)
    We call ŷ = β̂_0 + β̂_1 x the prediction line; a value ŷ_i = β̂_0 + β̂_1 x_i the predicted value for x_i; and the difference between the actual and predicted values, e_i = y_i − ŷ_i, the residual. The residual sum of squares is denoted RSS and is equal to Σ_i e_i². Quickly put, the regression line is chosen to minimize the residual sum of squares, RSS; it has slope β̂_1, intercept β̂_0, and goes through the point (x̄, ȳ)
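Formulas (11.1) to (11.3), the prediction line, and the residual sum of squares translate directly into code. The sketch below is a Python transcription with simulated data (the book itself works in R); it also checks that the fitted line passes through (x̄, ȳ).

```python
# Small Python transcription of formulas (11.1)-(11.3) and the residual sum of
# squares (the book itself works in R). Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, n)

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # (11.1)
beta0_hat = y.mean() - beta1_hat * x.mean()                                        # (11.2)

y_hat = beta0_hat + beta1_hat * x      # prediction line evaluated at each x_i
residuals = y - y_hat                  # e_i = y_i - y_hat_i
RSS = np.sum(residuals ** 2)
sigma2_hat = RSS / (n - 2)                                                         # (11.3)

print("slope:", beta1_hat, "intercept:", beta0_hat)
print("RSS:", RSS, "sigma^2 estimate:", sigma2_hat)
# The fitted line passes through the point (x_bar, y_bar):
print(np.isclose(beta0_hat + beta1_hat * x.mean(), y.mean()))
```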
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.