
Product Moment Correlation Coefficient

The Product Moment Correlation Coefficient is a measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. It is commonly denoted by the symbol "r" and is used in statistics to analyze the association between variables.
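As a quick check on the definition above, here is a minimal Python sketch (NumPy, with made-up data) that computes r for two small paired samples; the numbers are invented purely for illustration:

```python
import numpy as np

# Two small paired samples (hypothetical values for illustration).
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.5, 3.0, 4.5, 6.5, 8.0])

# np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")  # close to +1: a strong positive linear relationship
```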

Written by Perlego with AI-assistance

10 Key excerpts on "Product Moment Correlation Coefficient"

  • Using Statistics in Small-Scale Language Education Research
    eBook - ePub
    • Jean L. Turner (Author)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    Section IV: Analyzing Patterns Within a Variable and Between Two Variables

    10 The Parametric Pearson’s Product Moment Correlation Coefficient Statistic

    In Chapter Ten, we take a look at one of the statistics designed for investigating the correlational relationship between variables, Pearson’s Product Moment Correlation Coefficient, also known as Pearson’s r. Pearson’s r is a parametric statistic used to calculate the strength of the correlational relationship between two normally distributed variables, an independent variable and a dependent variable. When it’s used in the context of statistical logic, Pearson’s r allows a researcher to make a probability statement about the degree of correspondence between two sets of data. Unlike the statistics addressed in Chapters Six through Nine, which allow a researcher to explore differences among groups, correlation formulas are used to determine the extent to which two sets of data vary together. A correlation is a numerical expression of the strength of the relationship between the two variables.
    Researchers in language education often want to know whether there’s a significant relationship between variables; they can gain a deeper understanding of the learners in their classes and their learning environment by knowing how the variables are related to one another. There’s a practical application, too, of knowing the strength of the relationship between two variables—when two variables are strongly related, we can estimate or predict a person’s behavior on the second variable, given his or her performance on the first. It’s important, though, to remember that even a strong relationship between two variables can’t be interpreted as evidence of causality. To illustrate, I’d like to tell you about a little study I did quite a while ago, just out of curiosity. I was teaching oral skills courses for international undergraduate students at a university and it seemed to me that the type of shoe a student usually wore was related to the level of his or her oral skills. I collected some data and found there was a statistically significant relationship between the two variables, a strong one—the participants who wore a particular kind of sport shoe definitely tended to have a higher degree of oral language proficiency than did people who wore other types of shoes. So, yes, on the basis of that small study, I can say that there was a statistically significant relationship between the type of shoe an international undergraduate student wears and the level of his or her oral skills, but there’s no causality there—buying a different type of shoe isn’t going to help anyone become more fluent in English!
  • Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences
    • Jacob Cohen, Patricia Cohen, Stephen G. West, Leona S. Aiken (Authors)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    5 Note that this equation is slightly different from that in earlier editions. The n/(n − 1) term is necessary because the sd used here is the sample estimate of the population sd rather than the sample sd, which uses n in the denominator.
    Although it is clear that this index, ranging from 0 (for a perfect positive linear relationship) through 2 (for no linear relationship) to 4 (for a perfect negative one), does reflect the relationship between the variables in an intuitively meaningful way, it is useful to transform the scale linearly to make its interpretation even more clear. Let us reorient the index so that it runs from −1 for a perfect negative relationship to +1 for a perfect positive relationship. If we divide the sum of the squared discrepancies by 2(n − 1) and subtract the result from 1, we have
    $$ r = 1 - \frac{\sum (z_X - z_Y)^2}{2(n - 1)} \qquad (2.2.4) $$
    which for the data of Table 2.2.2 gives
    $$ r = 1 - \frac{9.614}{28} = .657. $$
    r is the Product Moment Correlation Coefficient, invented by Karl Pearson in 1895.⁶ This coefficient is the standard measure of the linear relationship between two variables and has the following properties:
    6 The term product moment refers to the fact that the correlation is a function of the product of the first moments of X and Y, respectively. See the next sections.
    1. It is a pure number and independent of the units of measurement.
    2. Its value varies between zero, when the variables have no linear relationship, and +1.00 or −1.00, when each variable is perfectly estimated by the other. The absolute value thus gives the degree of relationship.
    3. Its sign indicates the direction of the relationship. A positive sign indicates a tendency for high values of one variable to occur with high values of the other, and low values to occur with low. A negative sign indicates a tendency for high values of one variable to be associated with low values of the other. Reversing the direction of measurement of one of the variables will produce a coefficient of the same absolute value but of opposite sign. Coefficients of equal value but opposite sign (e.g., +.50 and −.50) thus indicate equally strong linear relationships, but in opposite directions.
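Equation (2.2.4) above can be verified numerically. The sketch below (made-up data) standardizes both variables with the sample estimate of the sd, per the n/(n − 1) point in footnote 5, and confirms that the discrepancy formula reproduces the usual Pearson r:

```python
import numpy as np

x = np.array([3.0, 5.0, 6.0, 8.0, 9.0])   # hypothetical data
y = np.array([2.0, 4.0, 7.0, 6.0, 10.0])
n = len(x)

# z-scores using the sample estimate of the population sd (n - 1 denominator),
# as footnote 5 in the excerpt specifies.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Equation (2.2.4): r = 1 - sum((zx - zy)^2) / (2(n - 1))
r_discrepancy = 1 - np.sum((zx - zy) ** 2) / (2 * (n - 1))

# Standard Pearson r for comparison.
r_pearson = np.corrcoef(x, y)[0, 1]
print(r_discrepancy, r_pearson)  # the two values agree
```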
  • Sensory Evaluation of Food
    eBook - ePub
    Statistical Methods and Procedures
    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)
    Zero correlation coefficients can also be obtained in other ways; consider the examples of nonrandom points shown in Figures 15.7 and 15.8. It is difficult to know where to draw a line in these figures. Note: The correlation coefficient is a coefficient of linear correlation. It is high only when the points fall on a straight line. If the relationship between X and Y is curvilinear, as in Figure 15.8, the correlation will be zero. More complex coefficients are required to describe such a relationship. [Figure 15.8: Curvilinear relationship yields zero correlation.]

    15.3 How to Compute Pearson’s Product-Moment Correlation Coefficient

    To measure the correlation between a set of Y and X values, Pearson’s product-moment correlation coefficient, developed by Karl Pearson, is used. We will simply give the formula for the correlation coefficient; we will not derive it formally. It is given by the formula
    $$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_x S_y} $$
    where $\bar{X}$ and $\bar{Y}$ are the means of the X and Y values being correlated, $S_x$ and $S_y$ are their standard deviations, and N is the number of X scores (or the number of Y scores, but not the number of X scores + the number of Y scores), generally the number of subjects tested. Note:
    $$ S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum X^2 - (\sum X)^2 / N}{N - 1}} $$
    This formula is inconvenient to use. However, it can be rearranged into a more convenient, albeit longer, form:
    $$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - (\sum X)^2 \right]\left[ N \sum Y^2 - (\sum Y)^2 \right]}} $$
    Although this formula is not derived here, we can see that it is of an intuitively sensible form. Looking at the numerator, we can see that the larger the value of ΣXY, the larger the value of the numerator and hence r. Should the two sets of scores (X and Y) be positively correlated, large X and large Y values will be associated, giving large XY values, and ΣXY will be correspondingly large. This can be clarified by an example
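Both forms of the formula in this excerpt are easy to check numerically. Below is a minimal Python sketch with invented scores; the one liberty taken is pairing the mean-deviation form with (N − 1)-denominator standard deviations, which requires an (N − 1) rather than N in the denominator for the two forms to agree exactly:

```python
import math

# Hypothetical paired scores (e.g., two ratings per subject); made-up numbers.
X = [4, 7, 6, 8, 5, 9]
Y = [3, 8, 5, 9, 4, 8]
N = len(X)

# Convenient raw-score form from the excerpt:
# r = (N·ΣXY − ΣX·ΣY) / sqrt([N·ΣX² − (ΣX)²][N·ΣY² − (ΣY)²])
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2, sum_y2 = sum(x * x for x in X), sum(y * y for y in Y)
r = (N * sum_xy - sum_x * sum_y) / math.sqrt(
    (N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2)
)

# Cross-check with the mean-deviation form. Pairing the cross-product sum
# with (N − 1)-denominator standard deviations needs (N − 1) here, not N,
# for the two forms to match exactly.
mean_x, mean_y = sum_x / N, sum_y / N
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in X) / (N - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in Y) / (N - 1))
r_check = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(X, Y)) / ((N - 1) * s_x * s_y)

print(round(r, 6), round(r_check, 6))  # identical
```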
  • Statistical Power Analysis for the Behavioral Sciences
    • Jacob Cohen (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    3 The Significance of a Product Moment r_s
    3.1 INTRODUCTION AND USE
    Behavioral scientists generally, and particularly psychologists with substantive interests in individual differences in personality, attitude, and ability, frequently take recourse to correlational analysis as an investigative tool in both pure and applied studies. By far the most frequently used statistical method of expressing the relationship between two variables is the Pearson product-moment correlation coefficient, r.
    r is an index of linear relationship, the slope of the best-fitting straight line for a bivariate (X, Y) distribution where the X and Y variables have each been standardized to the same variability. Its limits are −1.00 to +1.00. The purpose of this handbook precludes the use of space for a detailed consideration of the interpretations and assumptions of r. For this, the reader is referred to a general textbook, such as Cohen & Cohen (1983), Hays (1981), or Blalock (1972).
    When used as a purely descriptive measure of degree of linear relationship between two variables, no assumptions need be made with regard to the shape of the marginal population distributions of X and Y, nor of the distribution of Y for any given value of X (or vice versa), nor of equal variability of Y for different values of X (homoscedasticity). However, when significance tests come to be employed, assumptions of normality and homoscedasticity are formally invoked. Despite this, it should be noted that, as in the case of the t test with means, moderate assumption failure here, particularly with large n, will not seriously affect the validity of significance tests, nor of the power estimates associated with them.
    In this chapter we consider inference from a single correlation coefficient, r_s, obtained from a sample of n pairs (X, Y) of observations. There is only one population parameter involved, namely r, the population correlation coefficient. It is possible to test the null hypothesis that the population r equals any value c (discussed in Chapter 4). In most instances, however, the behavioral scientist is interested in whether there is any (linear) relationship between two variables, and this translates into the null hypothesis, H₀: r = 0. Thus, in common statistical parlance, a significant r_s is one which leads to a rejection of the null hypothesis that the population r is zero. It is around this null hypothesis that this chapter and its tables are oriented. (For the test on a difference between two r’s, see Chapter 4.)
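The significance test described here, of the null hypothesis that the population r is zero, is conventionally carried out by converting the sample r to a t statistic on n − 2 degrees of freedom. A minimal sketch with SciPy, assuming an illustrative sample r of .42 from n = 30 pairs; the t conversion is the standard textbook test, not spelled out in this excerpt:

```python
from math import sqrt
from scipy import stats

# Hypothetical sample correlation and sample size (illustrative values).
r, n = 0.42, 30

# Under H0: population r = 0 (with bivariate normality),
# t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 df.
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed p-value
print(f"t = {t:.3f}, p = {p:.4f}")
```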
  • Sports Research with Analytical Solution using SPSS
    • J. P. Verma (Author)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    Researchers in the area of sports science are always engaged in finding ways and means to improve the capability of athletes to enhance their performance. It is therefore important to know the parameters that affect the performance in different sports. Once the parameters responsible for performance are identified, an effective training schedule can be developed to improve the performance. For instance, if a coach trains his budding athletes for the middle distance events, his first priority would be to develop their endurance and then try to improve their other parameters like strength, skills, and related techniques and tactics. This is so because endurance is highly associated with performance in the middle distance events. Thus, it is important to identify the parameter that is highly related to the performance. This can be achieved by knowing the strength of relationship between a parameter and the performance. This strength of relationship between the two variables can be computed by a measure known as the Product Moment Correlation Coefficient. In short, it is referred to as the correlation coefficient and is denoted by “r.”
    The correlation coefficient gives a fair estimate of the extent of relationship between the two variables if the subjects are chosen randomly. But in most situations samples are purposive, and therefore the correlation coefficient in general may not give the correct picture of the real relationship. If a study is to be conducted on university students for developing a regression equation for estimating shot put performance on the basis of some predictors, a sample may be drawn from all the university students who have participated in the interuniversity tournaments. The next job is to first identify the most contributing parameter to the shot put performance. And if the correlation coefficient between the performance and height comes out to be 0.8, it cannot be interpreted that height is highly related to the performance. The subjects might have other favourable attributes as well; the higher correlation might also be due to their higher coordinative ability and leg strength. Thus, in this situation the product moment correlation may not be considered a good indicator of the real relationship between height and shot put performance, because the sample was purposive in nature. The sample is called “purposive” because it is not randomly chosen from the population of interest, but rather has been obtained from a specific domain and for a specific purpose.
    Since correlation does not explain the cause and effect relationship, another measure, known as partial correlation, is computed to overcome this problem. This provides the real relationship between two variables after partialling out the effect of other independent variables. Partial correlation is a statistical technique for eliminating the effects of independent variables after the data are collected. Another method of eliminating the effect of independent variables is to hold them constant while collecting the data, but this is not always feasible. Let us understand this fact through an example. Consider a situation where the height and weight of 20 children with age ranging from 12 to 18 years are recorded, and the correlation between height and weight is computed as 0.75. Although this correlation is quite high, it cannot be considered an indicator of the real relationship between height and weight. This higher correlation has been observed because all the children belong to the developmental age, and during this age, in general, if the height increases the weight also increases. Thus, in order to find the real relationship between height and weight, age needs to be held constant. Age can be made constant by taking all the subjects from the same age category, but this is not possible once the data collection is over. Even if an experimenter tries to control the effect of one or more variables manually, it may not be possible to control the effect of other variables; otherwise, one might end up with only one or two cases for the study.
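The partial correlation described above can be computed from the three pairwise correlations. Here is a sketch of the height–weight–age example, with simulated data standing in for the 20 children; the partial_corr helper and all numbers are illustrative, not from the source:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(0)

# Hypothetical data mimicking the example: age drives both height and weight.
age = rng.uniform(12, 18, size=20)
height = 100 + 4 * age + rng.normal(0, 3, size=20)
weight = 10 + 3 * age + rng.normal(0, 3, size=20)

print(np.corrcoef(height, weight)[0, 1])  # inflated by the shared age effect
print(partial_corr(height, weight, age))  # relationship with age held constant
```

Because the simulated height and weight are related only through age, the partial correlation falls toward zero once age is partialled out, while the zero-order correlation stays high.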
  • Introduction to Statistics for Nurses
    • John Maltby, Liz Day, Glenn Williams (Authors)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    We now know what a correlation generally is, and the description of the relationship, but so far we have vague information. However, the Pearson and Spearman correlation statistics can give us much more information. The correlation coefficient provides a statistic that tells us the direction, the strength and the significance of the relationship between two variables.
    What do we mean by direction and strength? Well, both the Pearson and Spearman correlation statistics use the number of the data and present a final figure that indicates the direction and strength of the relationship. This final figure is always known as the correlation coefficient and is represented by r (Pearson) or rho (Spearman). The correlation coefficient will always be a value ranging from +1.00 through 0.00 to −1.00.
    • A correlation of +1.00 would be a ‘perfect’ positive relationship.
    • A correlation of −1.00 would be a ‘perfect’ negative relationship.
    • A correlation of 0.00 would be no relationship (no single straight line can sum up the almost random distribution of points).
    All correlations, then, will range from −1.00 to +1.00. So a correlation statistic could be −0.43 (a negative relationship) or 0.53 (a positive relationship). It could be 0.01, close to no relationship. This figure can then be used in significance testing to see whether the relationship between the two variables is statistically significant. Let us show you how one of these correlational statistics works.

    Pearson product-moment correlation

    Remember from Figure 7.1 that the Pearson product-moment correlation is a test used with parametric data, and is used when you have two continuous-type variables that you believe should be used in a parametric test (for example, each shows a normal distribution, or rather the data are not skewed). Remember, too, that the key idea of the correlation statistical test is to determine whether there is a relationship between the two variables.

    Performing the Pearson product-moment correlation on SPSS for Windows

    In the following example we will show how a Pearson product-moment correlation works by using some of the data from the adult branch dataset. One aspect of these data looks at the relationship between four dimensions of physical health as measured by the SF-36 Health Survey, Version 2 (Ware et al., 1994, 2000).
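As a rough non-SPSS counterpart to the procedure the excerpt begins to describe, scipy.stats.pearsonr returns direction, strength, and significance in one call. The scores below are invented placeholders, not the adult branch dataset or actual SF-36 data:

```python
from scipy.stats import pearsonr

# Hypothetical scores on two SF-36-style health dimensions for ten respondents
# (made-up numbers; the excerpt's actual dataset is not reproduced here).
physical_functioning = [55, 62, 70, 48, 80, 66, 59, 73, 68, 77]
general_health       = [50, 58, 72, 45, 76, 60, 55, 70, 64, 74]

r, p = pearsonr(physical_functioning, general_health)
print(f"r = {r:.2f}, p = {p:.4f}")  # direction, strength, and significance
```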
  • Data Analysis for the Social Sciences
    eBook - ePub
    Integrating Theory and Practice
    If we calculate cov from the data in Panel 2 of Figure 7.20 we find that cov = 4320.
    [Figure 7.22]
    From the two covariances alone, we might be inclined to conclude that there is a stronger association in Panel 2 than there is in Panel 1. Examination of the two scatterplots produced from the two panels, however, reveals otherwise. Figures 7.21 and 7.22 reveal that it is only in terms of the units of measurement on the x-axis that the two scatterplots differ. The association between errors and latency depicted in the two scatterplots is identical. In the first panel latency was recorded in seconds; in the second panel latency was recorded in milliseconds. Just as we used indices such as φ, Cramér’s V, and Γ to standardize the association between two categorical variables, we need an index for standardizing the association between two measurement variables.
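The point about units can be reproduced directly: rescaling latency from seconds to milliseconds changes the covariance but leaves r untouched. A small simulation (all values invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reaction-time experiment: latency (seconds) vs. error score.
latency_s = rng.uniform(0.5, 2.0, size=25)
errors = 3 * latency_s + rng.normal(0, 0.5, size=25)

latency_ms = latency_s * 1000  # same measurements, different units

# Covariance scales with the units of measurement...
print(np.cov(latency_s, errors)[0, 1], np.cov(latency_ms, errors)[0, 1])

# ...while the standardized index r is unchanged.
print(np.corrcoef(latency_s, errors)[0, 1],
      np.corrcoef(latency_ms, errors)[0, 1])
```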

    7.6 The Pearson Product Moment Correlation Coefficient

    In this section we will examine:
    1. the Pearson Product Moment Correlation Coefficient or r,
    2. how r is related to z-scores,
    3. how r can be transformed into an index of the reduction of error in prediction,
    4. how to test the significance of r, and
    5. nonparametric alternatives to r.
    Where φ and Γ are standardized measures of the association between two categorical (nominal and ordinal) variables, the Pearson Product Moment Correlation Coefficient (r) (often simply referred to as the correlation) is the most common index of a linear association between two measurement variables.
    The word ‘correlation’ comes from two words in Latin, cor (com) meaning ‘together’ and relatio which means relation.
    Like the other measures of association we have discussed, r can take any absolute value between 0 and 1. And like the ordinal Γ, r can be either negative or positive. One way to begin to understand r
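Picking up point 2 of the list above, r can be computed as the average cross-product of paired z-scores. A minimal sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])   # hypothetical measurements
y = np.array([2.0, 3.0, 3.5, 6.0, 7.5])
n = len(x)

# Standardize each variable (sample sd, n - 1 denominator).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# r is the average cross-product of z-scores: r = Σ(zx·zy) / (n − 1).
r = np.sum(zx * zy) / (n - 1)
print(r, np.corrcoef(x, y)[0, 1])  # both print the same value
```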
  • Basic Computational Techniques for Data Analysis
    eBook - ePub
    • D Narayana, Sharad Ranjan, Nupur Tyagi (Authors)
    • 2023 (Publication Date)
    • Routledge India (Publisher)
    negative correlation. It means that when one variable increases by some percentage, the other variable decreases, but by a lesser percentage; or if one variable decreases by some percentage, the other variable increases, but by a lesser percentage.
    As mentioned before, a correlation coefficient of 0 indicates no correlation between the variables. It means that the change in one variable is not linked to the other variable. The correlation coefficient can be estimated with the help of the following methods.

    7.3 Karl Pearson’s Coefficient of Correlation

    Karl Pearson’s correlation coefficient is the most widely used method to measure the correlation between two variables. It is also referred to as the Pearson product-moment correlation coefficient, denoted by ‘r’. The Pearson correlation coefficient has no unit of measurement. Thus, relationships across variables can be measured regardless of the units of measurement they are in. For example, the correlation coefficient between height (in cm) and weight (in kg) is neither expressed in cm nor kg. It is a number independent of the unit of measurement.
    It may be noted that the Pearson Coefficient is also independent of the change in the scale and the origin of the variables. Thus, the magnitude of the coefficient is not affected by the change in the unit of measurement of the variables. For example, the correlation coefficient between height (in cm) and weight (in kg) will be equal to the correlation coefficient between height (in m) and weight (in g).
    Karl Pearson’s coefficient of correlation between two variables, X and Y, is calculated using the formula given next:
    $$ r_{xy} = \frac{\text{Covariance between } X \text{ and } Y}{\text{Product of standard deviations of } X \text{ and } Y} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} $$
    or,
    $$ r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{Y})^2}} $$
  • Econometrics
    eBook - ePub
    • K. Nirmal Ravi Kumar (Author)
    • 2020 (Publication Date)
    • CRC Press (Publisher)
    •  The correlation coefficient is unit free. That is, if we change the values of the variables into different units (say, meters to centimeters or quintals to kgs), the correlation coefficient would remain the same.
    •  Correlation measures the strength of a linear relationship only between the variables. The closer ‘r’ is to 0, the weaker the relationship; the closer to +1 or −1, the stronger the positive or negative relationship respectively. So, the sign of the correlation provides direction only.
    •  The value of the correlation coefficient ranges from −1 to +1, i.e., −1 ≤ r ≤ +1 (Figures 2.5 and 2.6). If two sets of data (sample) have r = +1, they are said to be perfectly correlated positively, and if r = −1, they are said to be perfectly correlated negatively; if r = 0, they are uncorrelated. That is, a correlation coefficient quite close to 0, but either positive or negative, implies little or no relationship between the two variables. A correlation coefficient close to +1 means a strong positive relationship between the two variables (Figure 2.5.4). A correlation coefficient close to −1 indicates a strong negative relationship between two variables. Note that when r = 0, we may not assert that there is no relation at all between X and Y. For example, in Panel B of Figure 2.4, there is a relation between X (distance) and Y (height), but still ‘r’ is zero, as the relationship between X and Y is non-linear or non-monotonic. So, Pearson’s correlation coefficient is meant to measure linear relationship only. It should not be used in the case of non-linear relationships, since it will obviously lead to an erroneous interpretation.
    Fig. 2.5.4:    Extent of overlap indicates Positive Correlation of A and B variables
    •  The sign of the correlation coefficient determines whether the correlation is positive or negative (i.e., direction). The magnitude of the correlation coefficient determines the strength of the correlation. The extreme values of r, that is, when r = ±1, indicate that there is perfect (positive or negative) correlation between X and Y (Appendix 2.A.1). However, if r is 0, we say that there is no or zero correlation. The remaining values, falling in sub-intervals of [−1, +1], describe the relationship in terms of its strength, and Figure 2.6 may be used as a rough guideline as to what adjective should be used for the values of ‘r’ obtained after calculation to describe the relationship. Say, for example, r
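The excerpt's Figure 2.6 (its adjective guideline) is not reproduced on this page, so the cut-offs in the sketch below are illustrative conventions only; different texts draw these lines differently:

```python
def describe_r(r: float) -> str:
    """Rough verbal label for a correlation coefficient.

    The cut-offs below are illustrative conventions only (the excerpt's
    Figure 2.6 is not reproduced here); other texts use different lines.
    """
    strength = abs(r)
    if strength == 1.0:
        label = "perfect"
    elif strength >= 0.7:
        label = "strong"
    elif strength >= 0.4:
        label = "moderate"
    elif strength > 0.0:
        label = "weak"
    else:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"

for r in (0.85, -0.5, 0.1, 0.0):
    print(r, "->", describe_r(r))
```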
  • A Beginner's Guide to Structural Equation Modeling
    • Tiffany A. Whittaker, Randall E. Schumacker (Authors)
    • 2022 (Publication Date)
    • Routledge (Publisher)
    Tankard, 1984). In fact, the basis of association between two variables (i.e., the bivariate correlation) has played a major role in statistics. The Pearson correlation coefficient provides the basis for point estimation (and test of significance), explanation (variance accounted for in a dependent variable by an independent variable), prediction (of a dependent variable from an independent variable through linear regression), reliability estimates (test–retest, equivalence), and validity (factorial, predictive, concurrent). The Pearson correlation coefficient also provides the basis for establishing and testing models among measured and/or latent variables.
    Although the Pearson correlation coefficient has had a major impact in the field of statistics, other correlation coefficients have emerged to accommodate the different levels of variable measurement. Stevens (1968) provided the properties of scales of measurement that have become known as nominal, ordinal, interval, and ratio. The distinguishing features of the four levels of measurement were discussed in Chapter 2. The types of correlation coefficients developed for these various levels of measurement are categorized in Table 3.1.
    Table 3.1: Types of Correlation Coefficients

    Correlation Coefficient       Level of Measurement
    Pearson product-moment        Both variables interval
    Spearman rank, Kendall’s tau  Both variables ordinal
    Phi                           Both variables dichotomous
    Point biserial                One variable interval, one variable dichotomous
    Gamma, rank biserial          One variable ordinal, one variable nominal
    Biserial                      One variable interval, one variable artificial*
    Polyserial                    One variable interval, one variable ordinal with underlying continuity
    Tetrachoric                   Both variables dichotomous (artificial*)
    Polychoric                    Both variables ordinal with underlying continuities

    Note: * artificial refers to recoding variable values into a dichotomy with underlying continuity.
    Many popular computer programs, for example SAS and SPSS, typically do not compute all of these correlation types. Therefore, you may need to check a statistics book or look for a computer program (e.g., R) that will compute the type of correlation coefficient you need. For example, the phi and biserial coefficients are usually calculated with available macros (e.g., in SPSS or SAS) or using available functions in R. The Pearson correlation, tetrachoric or polychoric correlation, and biserial or polyserial correlation can all be used in SEM analysis with both LISREL and Mplus software. The correlations that are most commonly taught in a correlation and regression methods course are presented in Table 3.2
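Several of the coefficients in Table 3.1 have direct counterparts in SciPy, so the table can be read as a dispatch guide. A sketch with simulated variables at different measurement levels (variable names and data are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

interval_a = rng.normal(50, 10, size=40)             # interval-level variable
interval_b = interval_a + rng.normal(0, 5, size=40)  # related interval variable
ordinal_b = stats.rankdata(interval_b)               # ranks: an ordinal version
dichotomous_a = (interval_a > 50).astype(int)        # a dichotomized version

# Both variables interval -> Pearson product-moment.
print(stats.pearsonr(interval_a, interval_b))

# Ordinal data -> Spearman rank or Kendall's tau.
print(stats.spearmanr(ordinal_b, interval_a))
print(stats.kendalltau(ordinal_b, interval_a))

# One variable interval, one dichotomous -> point biserial.
print(stats.pointbiserialr(dichotomous_a, interval_b))
```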
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.