
Covariance and Correlation

Covariance and correlation are statistical measures used to quantify the relationship between two variables. Covariance measures how much two variables change together, while correlation standardizes this measure to a range of -1 to 1, indicating the strength and direction of the relationship. Both are important in analyzing data and understanding the associations between different technological and engineering factors.

Written by Perlego with AI-assistance

8 Key excerpts on "Covariance and Correlation"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), and each adds context and meaning to key research topics.
  • Understanding Quantitative Data in Educational Research

    ...The table or scatterplot should be carefully examined to compare the variables and to see whether the paired data points follow a straight line, which indicates that the value of one variable is linearly associated with the value of the other variable. If an association or a relationship exists between variables, the strength and direction of the relationship will be measured by a coefficient of correlation. To see if the relationship occurs by chance, a null hypothesis is formulated, and then the p-value is computed from the data. We cannot go directly from statistical correlation to causation, and further investigations are required. 13.1 Covariance and Correlation between two variables Covariance and Correlation describe the association (relationship) between two variables; they are closely related statistics, but not the same. The covariance measures only the directional relationship between the two variables and reflects how they change together. A direct or positive covariance means that paired values of the two variables move in the same direction, while an indirect or negative covariance means they move in opposite directions. The formula for covariance is: $$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$ where $x_i$ is the $i$th x-value in the data set, $\bar{x}$ is the mean of the x-values, $y_i$ is the $i$th y-value in the data set, $\bar{y}$ is the mean of the y-values and n is the number of data values in each data set. If cov(X, Y) > 0 there is a positive relationship between the dependent and independent variables, and if cov(X, Y) < 0 the relationship is negative. Example 13.1 Computing the covariance Data file: Ex13_1.csv Suppose that a physics teacher would like to convince her students that the amount of time they spend studying for a written test is related to their test score...
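
    A minimal sketch of this computation in Python, assuming equal-length lists of paired values; the study-time and score figures below are invented for illustration and are not the book's Ex13_1.csv data:

    ```python
    # Sample covariance: the average product of paired deviations from the means,
    # divided by n - 1. Hypothetical study-time (hours) and test-score data.
    study_hours = [1.0, 2.5, 3.0, 4.5, 6.0]
    test_scores = [52.0, 58.0, 61.0, 70.0, 79.0]

    n = len(study_hours)
    mean_x = sum(study_hours) / n
    mean_y = sum(test_scores) / n

    # cov(X, Y) = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(study_hours, test_scores)) / (n - 1)

    print(f"cov(X, Y) = {cov_xy:.2f}")  # positive: study time and scores rise together
    ```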

  • A Conceptual Guide to Statistics Using SPSS

    ...4 Correlation CHAPTER OUTLINE Behind the Scenes: Conceptual Background of Correlation Covariance Versus Correlation Computing Correlation (and Covariance) in SPSS Interpreting the Correlation Output A Closer Look: Partial Correlations Visualizing Correlations BEHIND THE SCENES: CONCEPTUAL BACKGROUND OF CORRELATION Correlation has many definitions, and we’ll give you a few different ones, but the only one that you really need to know is this: the degree of the linear relationship between two variables. One reason people often get confused about correlation is that the equation used to compute it has no intuitive relationship to the conceptual meaning of “linear relationship between two variables.” So we will begin this chapter by providing (or at least attempting to provide) an intuitive explanation of how the formula for a correlation coefficient relates to the concept of a correlation. To do this, it is helpful to first understand covariance. In class or in your textbook, you may have learned that the covariance between two variables, X and Y, is defined as $\mathrm{COV}(X, Y) = E(XY) - E(X)E(Y)$, where $E(X) = \sum_x x\,P(x)$ and $E(Y) = \sum_y y\,P(y)$. In English, this says that the covariance between X and Y is the difference between the expected value of their product and the product of their expected values. In order to simplify this, suppose for now that both X and Y are centered around 0 (i.e., we have subtracted the mean from each observation). Furthermore, suppose that X and Y are distributed symmetrically around the (zero) mean. Now, look again at the equations for E(X) and E(Y). These quantities are the sum of the products of each value with its probability. Under the assumptions that we made (observations are centered symmetrically around the zero mean), each positive value in the distribution is mirrored by a negative value, and each has the same probability of appearing in a sample...
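
    As a quick numerical check of this identity, here is a sketch with arbitrary made-up values, treating each observation as equally likely so that E(·) is a plain average:

    ```python
    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])
    y = np.array([1.0, 3.0, 2.0, 6.0])

    # E(XY) - E(X)E(Y): expected product minus product of expectations.
    cov_identity = np.mean(x * y) - np.mean(x) * np.mean(y)

    # E[(X - mu_X)(Y - mu_Y)]: mean product of deviations (population form).
    cov_deviation = np.mean((x - x.mean()) * (y - y.mean()))

    print(cov_identity, cov_deviation)  # both 3.5: the two forms agree
    ```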

  • Probability, Statistics and Other Frightening Stuff
    • Alan Jones (Author)
    • 2018 (Publication Date)
    • Routledge (Publisher)

    ...However, in order to understand what the Correlation Coefficient is telling us – and, importantly, what it is not telling us (Section 5.2) – it is useful at least to have a mental image of what Covariance is telling us. The reason for this will become clearer in Section 5.2, but in short, Covariance is embedded within the calculation for the Correlation Coefficient. We may recall from Chapter 3 (unless we have successfully blanked that memory out) that we considered what the statistic Variance was doing; we concluded that it was looking at an average area of the squares around the Arithmetic Mean formed by plotting a variable against itself. This gave us a measure of scatter. Covariance is very similar, and so we will repeat the exercise, but this time we will plot the two variables against each other (which will probably seem eminently more sensible to most of us than plotting something against itself!). This time we will be looking at the area of the rectangles formed with the joint Means at one corner. Definition 5.2 Covariance The Covariance between a set of paired values is a measure of the extent to which the paired data values are scattered around the paired Arithmetic Means. It is the average of the product of the deviations of each paired variable from its Arithmetic Mean. A high positive Covariance (relative to the values of the two constituent Arithmetic Means) suggests that the data has a tendency to move in the same direction. In contrast, a high negative Covariance (‘high’ in an Absolute Value sense) suggests that the paired values have a tendency to move in opposite directions. A Covariance close to zero (again relative to the value of the Arithmetic Means) suggests that the paired data values are unrelated (i.e...
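
    The "average rectangle area" picture translates directly into code: each paired observation contributes a signed area (x − x̄)(y − ȳ) with one corner at the joint means, and the covariance is the average of those areas. A small sketch with made-up numbers:

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

    # Signed rectangle areas: positive when both deviations share a sign
    # (upper-right or lower-left of the joint means), negative otherwise.
    areas = (x - x.mean()) * (y - y.mean())

    print(areas)         # mix of positive and negative areas
    print(areas.mean())  # their average: the (population) covariance
    ```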

  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...Hyun Joo Jung and Jennifer Randall: Correlation If one wants to know the degree of a relationship, the correlation between two variables can be examined. Correlations can be quantified by computing a correlation coefficient. This entry first describes a concept central to correlation, covariance, and then discusses calculation and interpretation of correlation coefficients. Covariance indicates the tendency in the linear relationship for two random variables to covary (or vary together), represented in deviations measured in the unstandardized units in which X and Y are measured. Specifically, it is defined as the expected product of the deviations of each of two random variables from its expected value or mean. The population covariance between two variables, X and Y, can be written as: $$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$$ where E is the expected value or population mean. Similarly, the sample covariance between x and y is given by: $$s_{xy} = \frac{1}{N - 1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$$ where N is the number of observations and $\bar{x}$, $\bar{y}$ are the sample means of x and y. When one interprets covariances, a zero covariance indicates that the variables are not linearly related; it arises when the variables are statistically independent, and can also arise when they are associated only nonlinearly. On the other hand, a nonzero covariance indicates a tendency to covary. If the sign of the covariance is positive, the two variables tend to vary in the same direction. If a covariance value is negative, the two variables tend to move in opposite directions. The covariance is not independent of the units used to measure x and y, and so the magnitude of the covariance depends on the measurement units of the two variables...
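
    The unit dependence noted here is easy to demonstrate: rescaling one variable rescales the covariance by the same factor, so raw covariances from different measurement scales cannot be compared. A sketch with invented height/weight figures:

    ```python
    import numpy as np

    height_m = np.array([1.60, 1.70, 1.75, 1.82, 1.90])   # metres (invented)
    weight_kg = np.array([55.0, 68.0, 72.0, 80.0, 95.0])  # kilograms (invented)

    cov_m = np.cov(height_m, weight_kg)[0, 1]
    cov_cm = np.cov(height_m * 100, weight_kg)[0, 1]  # same data, height in cm

    print(cov_m, cov_cm)  # cov_cm is exactly 100x cov_m: magnitude tracks the units
    ```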

  • Business Statistics Using EXCEL and SPSS

    ...But how do we know how ‘strong’ the association is? The problem with covariance is that it depends on the scale of measurement used. So, for example, the exam mark is expressed in percentage terms. However, if I had marked out of 10, or 200, the covariance would have been different. Thus it is not possible to compare two covariances and say objectively whether one is bigger than the other, unless the same measurement scales are used. Correlation The solution to our thorny problem with covariance is to standardize it – or in other words to convert it somehow into a standard unit. In fact, you’ve already done this when you converted various data sets into z-score data. The principle is the same here. Remember: the standard deviation can be used to convert any distance from the mean, in whatever scale you have, into standard deviation units. What you do is divide the deviation from the mean for any individual data point by the standard deviation, and you get the deviation from the mean in standard units, rather than the units of the original scale. Take an example from Table 10.1. Student 3 scored 70 on the exam, which had a mean score of 65. This means the deviation from the mean in the original units is 5. The standard deviation is 14.56, so divide 5 by 14.56 and you get 0.34. So we can see that the deviation from the mean for student 3 is 0.34 standard deviations. Of course, this logic applies to the covariance as a whole. In this case, because there are two variables, we divide the covariance by the product of the standard deviations for each variable. This standardized covariance is called the correlation coefficient, and the formula is: $$r = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$ where $s_x$ and $s_y$ represent the standard deviations of each of the two variables. There are various different formulae for correlation coefficients. This particular one is called the Pearson product moment correlation coefficient, or just the Pearson correlation for short, and is denoted by r...
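
    A sketch of this standardization in Python (the marks and hours below are invented, not the book's Table 10.1 data), with numpy's built-in Pearson correlation as a cross-check:

    ```python
    import numpy as np

    exam = np.array([70.0, 65.0, 50.0, 80.0, 60.0])  # hypothetical exam marks (%)
    hours = np.array([8.0, 7.0, 3.0, 10.0, 5.0])     # hypothetical revision hours

    cov_xy = np.cov(exam, hours)[0, 1]                   # sample covariance (n - 1)
    r = cov_xy / (exam.std(ddof=1) * hours.std(ddof=1))  # divide by s_x * s_y

    print(r)
    print(np.corrcoef(exam, hours)[0, 1])  # numpy's Pearson r: same value
    ```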

  • Mechanical Vibration
    Analysis, Uncertainties, and Control, Fourth Edition

    • Haym Benaroya, Mark Nagurka, Seon Han (Authors)
    • 2017 (Publication Date)
    • CRC Press (Publisher)

    ...This coefficient characterizes how linear the relationship is between the two random variables. We begin by considering the two random variables X and Y with their respective joint second moment, $$E\{XY\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x, y)\, dx\, dy. \quad (9.19)$$ If X and Y are statistically independent then the joint density function can be separated into the product of the respective marginal densities, $f_{XY}(x, y) = f_X(x) f_Y(y)$, and using Equation 9.19, $E\{XY\} = E\{X\}E\{Y\}$. The covariance is defined as the joint second moment about the mean values $\mu_X$ and $\mu_Y$, $$\mathrm{Cov}(XY) = E\{(X - \mu_X)(Y - \mu_Y)\} = E\{XY\} - \mu_X \mu_Y. \quad (9.20)$$ Note that if the variables are independent, Cov(XY) = 0. The correlation coefficient ρ is defined as the normalized and dimensionless Cov(XY), that is, $$\rho_{XY} = \frac{\mathrm{Cov}(XY)}{\sigma_X \sigma_Y}. \quad (9.21)$$ To better understand the correlation coefficient we assume that X and Y are linearly related by the equation X = aY, where a is a positive constant. Then, $E\{XY\} = aE\{Y^2\}$, and $\mathrm{Cov}(XY) = aE\{Y^2\} - aE^2\{Y\}$. Using the definition of variance, the covariance becomes $\mathrm{Cov}(XY) = a\sigma_Y^2$, and Equation 9.21 becomes $$\rho_{XY} = \frac{a\sigma_Y^2}{\sigma_X \sigma_Y} = \frac{a\sigma_Y}{\sigma_X} = +1, \quad (9.22)$$ since $\sigma_X = a\sigma_Y$. The random variables X and Y are completely correlated. The last equality in Equation 9.22 is found using the property of the standard deviation of a random variable multiplied by a constant: if Y has standard deviation $\sigma_Y$, then aY has standard deviation $a\sigma_Y$. Had we defined X = −aY, then we would have found that $\rho_{XY} = -1$, or X and Y are completely negatively correlated. We conclude that $-1 \leq \rho_{XY} \leq +1$. Figure 9.19 depicts representative correlations between data points for random, nonlinear, and perfect linear relationships. Example 9.10 Jointly Distributed Variables Two random variables X and Y are jointly distributed according to the joint density $f_{XY}(x, y) = \frac{1}{2}e^{-y}$, $y > |x|$, $-\infty < x < \infty$, as plotted in Figure 9.20...
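
    The perfect-correlation argument can be verified numerically: with X = aY for positive a, the computed ρ comes out +1 (up to floating-point error), and X = −aY gives −1. A minimal sketch using simulated data:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(size=1000)  # any random sample will do
    a = 2.5

    for x, label in [(a * y, "X = aY"), (-a * y, "X = -aY")]:
        # rho = Cov(X, Y) / (sigma_X * sigma_Y), as in Equation 9.21
        rho = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
        print(label, round(rho, 6))  # +1.0 and -1.0: complete (anti)correlation
    ```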

  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Philip Jarvis (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    ...14 MEASURING CORRELATIONS 14.1 The meaning of correlation Many variables in nature are related; examples from biology include the mass of a growing organism and its volume, the length of an otolith (‘ear-stone’) and the length of the fish it is taken from, the structural complexity of a plant community and latitude. Relationships or associations between variables such as these are referred to as correlations. Correlations are measured on ordinal or interval scales. When an increase in one variable is accompanied by an increase in another, the correlation is said to be positive or direct. The length of an otolith and the length of the fish are positively correlated. When an increase in one variable is accompanied by a decrease in another, the correlation is said to be negative or inverse. The mass of body fat of a migrating bird and the distance flown since its last feed are negatively correlated. The fact that variables are associated or correlated does not necessarily mean that one causes the other. Otolith length and body length in a population of fish may be correlated but one cannot be said to cause the other; both are undoubtedly related to some underlying genetic factor. In common usage, the word ‘correlation’ describes any type of relationship between objects and events. In statistics however, correlation has a precise meaning; it refers to a quantitative relationship between two variables measured on ordinal or interval scales. 14.2 Investigating correlation Bivariate observations of variables measured on ordinal or interval scales can be displayed as a scattergram (Figs 4.8 and 14.1). Just as a simple dot-diagram gives both a useful indication of whether a sample of observations is roughly symmetrically distributed about a mean and the extent of the variability, a scattergram gives an impression of correlation. Figure 14.1 (a) shows a clear case of a positive correlation, whilst Fig...
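
    A scattergram of the kind described takes a few lines with matplotlib; the otolith and fish-length numbers below are invented purely to show the shape of a positive correlation:

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    fish_length = rng.uniform(10, 40, size=50)                   # cm (invented)
    otolith_length = 0.1 * fish_length + rng.normal(0, 0.3, 50)  # mm (invented)

    plt.scatter(fish_length, otolith_length)
    plt.xlabel("Fish length (cm)")
    plt.ylabel("Otolith length (mm)")
    plt.title("Scattergram suggesting a positive correlation")
    plt.show()
    ```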

  • The SAGE Encyclopedia of Industrial and Organizational Psychology

    ...Measures of Association/Correlation Coefficient Ronald S. Landis In many situations, researchers are interested in evaluating the relationship between variables of interest. Such associations are important for testing theories and hypotheses in which changes in one variable are tied to changes in another. In other words, is an increase in one variable associated with a systematic increase or decrease in the other? The most frequently reported measure of association within industrial and organizational psychology is the correlation coefficient (r). Correlation is a standardized index of the extent to which two sets of scores vary together. As an index, correlation can vary between −1 (i.e., a perfect negative relationship) and +1 (i.e., a perfect positive relationship). Correlations near zero indicate the absence of a linear relationship between the variables of interest. Squaring the correlation (i.e., r²) provides an indication of the percentage of variance in one variable that can be explained by the other variable. For example, if the correlation between height and weight is .50, then 25% of the variance in height can be explained by weight, or vice versa. Numerical Representation of Correlations The correlation between two variables can be described in one of two ways: numerically or graphically. The following example illustrates how correlation is computed and what the numerical value indicates...
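
    The squaring step in the height/weight example is simple arithmetic, spelled out below as a sketch:

    ```python
    r = 0.50      # correlation between height and weight, from the entry's example
    r_squared = r ** 2

    # r^2 = 0.25, i.e. 25% of the variance in one variable
    # can be explained by the other.
    print(f"r = {r}, r^2 = {r_squared:.2f} ({r_squared:.0%} of variance explained)")
    ```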