
Chi Square Test for Goodness of Fit

The Chi Square Test for Goodness of Fit is a statistical test used to determine whether an observed frequency distribution differs from a theoretical distribution. It compares the observed frequency in each category with the frequency expected under the hypothesized distribution to assess whether the discrepancy is statistically significant. The test is commonly used in fields such as biology, business, and the social sciences to analyze categorical data.

Written by Perlego with AI-assistance

8 Key excerpts on "Chi Square Test for Goodness of Fit"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), with each extract adding context and meaning to key research topics.
  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...While the idea of determining whether standard distributions gave acceptable fits to data sets was well established early in Pearson’s career, detailed in his 1900 paper, he was determined to derive a test procedure that further advanced the problem of goodness of fit. As a result, the formulation of the chi-square statistic stands as one of the greatest statistical achievements of the 20th century. Basic Principles and Applications Generally speaking, a chi-square test (also commonly referred to as χ²) refers to a bevy of statistical hypothesis tests where the objective is to compare a sample distribution to a theorized distribution to confirm (or refute) a null hypothesis. Two important conditions that must exist for the chi-square test are independence and sample size or distribution. For independence, each case that contributes to the overall count or data set must be independent of all other cases that make up the overall count. Second, each particular scenario must have a specified number of cases within the data set to perform the analysis. The literature points to a number of arbitrary cutoffs for the overall sample size. The chi-square test has most often been utilized in two types of comparison situations: a test of goodness of fit or a test of independence. One of the most common uses of the chi-square test is to determine whether a frequency data set can be adequately represented by a specified distribution function. More clearly, a chi-square test is appropriate when you are trying to determine whether sample data are consistent with a hypothesized distribution...
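
    As a minimal illustration of the goodness-of-fit use described above, the sketch below tests whether sample counts are consistent with a hypothesized distribution. It is written in Python with SciPy (not taken from the encyclopedia entry), and the counts and proportions are invented for the example.

    ```python
    # Minimal goodness-of-fit sketch (hypothetical data, not from the excerpt).
    from scipy import stats

    observed = [18, 55, 27]            # observed counts in three categories (invented)
    hypothesized = [0.25, 0.50, 0.25]  # hypothesized population proportions (invented)
    n = sum(observed)

    expected = [n * p for p in hypothesized]  # expected counts under the null hypothesis

    # scipy.stats.chisquare compares observed with expected frequencies;
    # here df = number of categories - 1 = 2.
    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
    ```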

  • Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvements
    • Mikel J. Harry, Prem S. Mann, Ofelia C. De Hodgins, Richard L. Hulbert, Christopher J. Lacke (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)

    ...This test will be conducted using the procedures discussed in Section 16.4. As mentioned earlier, the frequencies obtained from the performance of an experiment are called the observed frequencies. They are denoted by Oᵢ. To make a goodness-of-fit test, we calculate the expected frequencies for all categories of the experiment. The expected frequency for category i, denoted by Eᵢ, is given by the product of n and pᵢ, where n is the total number of trials and pᵢ is the probability for category i. The degrees of freedom for a goodness-of-fit test are df = k − 1, where k equals the number of possible categories in the experiment. The test statistic for a goodness-of-fit test is χ², and its value is calculated as χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ = observed frequency for category i and Eᵢ = expected frequency for category i = npᵢ. The procedure for performing a goodness-of-fit test involves the same five steps that were used in the preceding chapters. The chi-square goodness-of-fit test is always a right-tailed test. Whether or not the null hypothesis is rejected depends on how much the observed and expected frequencies differ from each other. To find how large the difference between the observed frequencies and the expected frequencies is, we do not look only at Oᵢ − Eᵢ because some of the Oᵢ − Eᵢ values will be positive and others will be negative. The net result of the sum of these differences will always be zero. Therefore, we square each of the Oᵢ − Eᵢ values to obtain (Oᵢ − Eᵢ)² and then weight them according to the reciprocals of their expected frequencies...
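
    The formulas in this excerpt translate directly into code. The sketch below is an illustration only (the observed counts and probabilities are invented, and the book itself does not present this code): it computes Eᵢ = n·pᵢ, the χ² statistic, df = k − 1, and the right-tail critical value.

    ```python
    # Hand-rolled goodness-of-fit statistic following the formulas above:
    # E_i = n * p_i, chi-square = sum((O_i - E_i)**2 / E_i), df = k - 1.
    # The counts and probabilities are hypothetical, for illustration only.
    from scipy.stats import chi2 as chi2_dist

    observed = [22, 31, 17, 30]          # O_i for k = 4 categories (invented)
    probs = [0.25, 0.25, 0.25, 0.25]     # hypothesized p_i (invented)

    n = sum(observed)
    expected = [n * p for p in probs]    # E_i = n * p_i

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1               # df = k - 1

    # The test is right-tailed: reject H0 if the statistic exceeds the critical value.
    critical = chi2_dist.ppf(0.95, df)
    print(f"chi-square = {chi_square:.3f}, df = {df}, critical (alpha = 0.05) = {critical:.3f}")
    ```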

  • Statistics for the Behavioural Sciences: An Introduction to Frequentist and Bayesian Approaches
    • Riccardo Russo (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)

    ...Other statistical tests, for example, the Friedman's test (see Siegel & Castellan, 1988), also make use of the χ² distribution in the process of statistical inference. 12.4 The Pearson's χ² goodness of fit test In the previous section our example used the Pearson's χ² test to assess whether or not there was a good fit between a set of observed and expected frequencies. In this section we generalise the use of this test to situations where the phenomenon studied has more than two outcomes. Consider a fictitious study where a sample of 120 university lecturers are asked to indicate what they think is the most effective way to assess undergraduate students. Each lecturer has to select only one of these 3 types of assessment: only by end-of-year exams; only by course-work submitted during the year; by a mixture of course-work and end-of-year exams. The results of this study are shown in Table 12.3. Our aim is to assess whether there is a preferred method of assessment. The null hypothesis states that each of the three methods of assessment has the same probability of being selected, i.e., 1/3. Thus, the expected frequency of each outcome is given by: 120 × 1/3 = 40. Table 12.3 Observed and expected frequencies for the study of the most effective way to assess undergraduate students described above. The differences between observed and expected frequencies, their squared values, and their squared values divided by the expected frequency are also reported for each outcome. To calculate the Pearson's χ² statistic we need to obtain the differences between observed and expected frequencies and their squared values. These are given in Table 12.3. Applying the appropriate formula we obtain: χ² = Σ (O − E)² / E = (35 − 40)²/40 + (23 − 40)²/40 + (62 − 40)²/40 = 25/40 + 289/40 + 484/40 = 19.95. We now need to compare the obtained χ² statistic against the critical value of the relevant χ² distribution with df = 2, where df = m − 1 = 3 − 1 = 2...
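
    The worked lecturer example above can be reproduced in a few lines. The sketch below (Python with SciPy, not part of Russo's text) uses the same observed frequencies of 35, 23, and 62 with equal expected frequencies of 40.

    ```python
    # Reproducing the lecturer example: observed 35, 23, 62; expected 40 each.
    from scipy import stats

    observed = [35, 23, 62]              # observed frequencies from the excerpt
    # With no f_exp given, chisquare assumes equal expected frequencies (120/3 = 40).
    chi2, p_value = stats.chisquare(observed)
    df = len(observed) - 1               # df = m - 1 = 2

    critical = stats.chi2.ppf(0.95, df)  # critical value at the 5% level
    print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.5f}, critical = {critical:.3f}")
    # chi-square = 19.95 exceeds the critical value of 5.991, so the null hypothesis
    # of equal preference for the three assessment methods is rejected.
    ```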

  • Biostatistical Design and Analysis Using R

    ...Random sampling should address this. (ii) No more than 20% of the expected frequencies are less than five. χ² distributions do not reliably approximate the distribution of all possible chi-square values under those circumstances. Since the expected values are a function of sample sizes, meeting this assumption is a matter of ensuring sufficient replication. When sample sizes or other circumstances beyond control lead to a violation of this assumption, numerous options are available (see section 16.5). 16.2 Goodness of fit tests 16.2.1 Homogeneous frequencies tests Homogeneous frequencies tests (often referred to as goodness of fit tests) are used to test null hypotheses that the category frequencies observed within a single variable could arise from a population displaying a specific ratio of frequencies. The null hypothesis (H₀) is that the observed frequencies come from a population with a specific ratio of frequencies. 16.2.2 Distributional conformity - Kolmogorov-Smirnov tests Strictly, goodness of fit tests are used to examine whether a frequency/sampling distribution is homogeneous with some declared distribution. For example, we might use a goodness of fit test to formally investigate whether the distribution of a response variable deviates substantially from a normal distribution. In this case, frequencies of responses in a set of pre-defined bin ranges are compared to those frequencies expected according to the mathematical model of a normal distribution. Since calculations of these expected frequencies also involve estimates of population mean and variance (both required to determine the mathematical formula), a two degree of freedom loss is incurred (hence df = n – 2). 16.3 Contingency tables Contingency tables are used to investigate the associations between two or more categorical variables...
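
    The "no more than 20% of expected frequencies below five" rule of thumb mentioned above is easy to check before running the test. The sketch below is an illustration in Python (the book itself works in R); the observed counts and the 9:3:3:1 hypothesized ratio are invented for the example.

    ```python
    # Check the expected-frequency assumption before a goodness-of-fit test.
    # Counts and hypothesized ratio are invented for illustration.
    from scipy import stats

    observed = [12, 9, 4, 3]                 # observed category counts (invented)
    ratio = [9, 3, 3, 1]                     # hypothesized 9:3:3:1 ratio (invented)
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]

    # Flag the test as questionable if more than 20% of expected counts fall below 5.
    small = sum(e < 5 for e in expected)
    if small / len(expected) > 0.20:
        print("Warning: too many expected frequencies below 5; "
              "consider pooling categories or an exact test.")

    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    print(f"expected = {[round(e, 2) for e in expected]}, "
          f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
    ```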

  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Philip Jarvis (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    ...13 ANALYSING FREQUENCIES 13.1 The chi-square test Field biologists spend a good deal of their time counting and classifying things on nominal scales such as species, colour and habitat. Statistical techniques which analyse frequencies are therefore especially useful. The classical method of analysing frequencies is the chi-square test. This involves computing a test statistic which is compared with a chi-square (χ²) distribution that we outlined in Section 11.11. Because there is a different distribution for every possible number of degrees of freedom (df), tables in Appendix 3 showing the distribution of χ² are restricted to the critical values at the significance levels we are interested in. There we give critical values at P = 0.05 and P = 0.01 (the 5% and 1% levels) for 1 to 30 df. Between 30 and 100 df, the critical values are estimated by interpolation, but the need to do this arises infrequently. Chi-square tests are variously referred to as tests for homogeneity, randomness, association, independence and goodness of fit. This array is not as alarming as it might seem at first sight. The precise applications will become clear as you study the examples. In each application the underlying principle is the same. The frequencies we observe are compared with those we expect on the basis of some Null Hypothesis. If the discrepancy between observed and expected frequencies is great, then the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom. We are then obliged to reject the Null Hypothesis in favour of some alternative. The mastery of the method lies not so much in the computation of the test statistic itself but in the calculation of the expected frequencies. We have already shown some examples of how expected frequencies are generated. They can be derived from sample data (Example 7.5) or according to a mathematical model (Section 7.4)...
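
    Where the excerpt relies on a printed table of critical values, the same numbers can be generated directly. The short sketch below (an addition, not from the book) prints the 5% and 1% critical values for a few degrees of freedom, matching the kind of table described.

    ```python
    # Critical values of the chi-square distribution, as in a printed table.
    from scipy.stats import chi2

    for df in (1, 2, 5, 10, 30):
        crit_05 = chi2.ppf(0.95, df)   # critical value at P = 0.05
        crit_01 = chi2.ppf(0.99, df)   # critical value at P = 0.01
        print(f"df = {df:2d}:  5% -> {crit_05:6.3f},  1% -> {crit_01:6.3f}")
    ```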

  • Sensory Evaluation of Food: Statistical Methods and Procedures
    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)

    ...6 Chi-Square 6.1 What is Chi-Square? We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”). Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples. In general, chi-square is given by the formula Chi-square = Σ [(O − E)² / E], where O = observed frequency and E = expected frequency. We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used. 6.2 Chi-Square: Single-Sample Test-One-Way Classification In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female...
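
    The single-sample case introduced at the end of this excerpt (16 males and 6 females out of 22, against an expected even split) works out as follows. The code is an illustration in Python with SciPy, not taken from the book.

    ```python
    # Single-sample, one-way classification: 16 males and 6 females, H0: even split.
    from scipy import stats

    observed = [16, 6]
    expected = [11, 11]                  # 22 people split evenly under H0

    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    # chi-square = (16-11)^2/11 + (6-11)^2/11 = 50/11 = 4.545..., with df = 1
    print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
    ```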

  • Statistics: The Essentials for Research

    ...Second, we are often in a position where we know only that someone “graduated” or “failed to graduate”, so we cannot use a test that utilizes finer distinctions. Finally, our data may consist of categories that differ qualitatively: non-orderable countables not amenable to true measurement, such as male-female. Chi square is relatively easy to calculate and, although it is frequently used incorrectly, its prevalence in the literature makes it an important test to know about. 10.11 Overview This is the third distribution we have studied. We have discussed the binomial distribution, the normal distribution, and now the chi square distribution. The use of all of these distributions in tests of statistical significance is quite similar. The distributions provide us with a theoretical relative frequency of events; for the binomial it is the relative frequency, or probability, of obtaining any proportion of events in a sample of size n, given the proportion of events in the population from which the sample was randomly drawn; for the normal distribution it is the relative frequency, or probability, of obtaining samples yielding values of z as deviant as those listed in Table N; for the chi square distribution with various df it is the probability of obtaining χ² values as large or larger than those listed in Table C. In each case, when we select an appropriate test of significance, we assume that if the null hypothesis is true, our data should conform to that theoretical sampling distribution. When the test is significant, it means that on the basis of the hypothesized sampling distribution, the results are quite improbable. However, before we can reject hypotheses about the population parameters, it is quite important that the remaining assumptions about the distribution have been met, for example, that observations are randomly obtained and that we have the proper df...
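
    The "probability of obtaining χ² values as large or larger" described above is the upper-tail area of the chi-square distribution. This brief sketch (not from the book; the statistic and df are arbitrary examples) computes that probability directly rather than looking it up in a table.

    ```python
    # Upper-tail probability P(X >= chi2_value) for a chi-square distribution with
    # df degrees of freedom -- the quantity tabled as "values as large or larger".
    from scipy.stats import chi2

    chi2_value = 7.5   # an arbitrary example statistic
    df = 3             # arbitrary degrees of freedom

    p = chi2.sf(chi2_value, df)   # survival function = 1 - CDF
    print(f"P(chi-square >= {chi2_value} with df = {df}) = {p:.4f}")
    ```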

  • Social Statistics: Managing Data, Conducting Analyses, Presenting Results
    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)

    ...Chi-square, chi-squared, I’ve seen both, and I’m not picky. I am picky about pronunciation: say chiropractor and then take off the ropractor. Although I like to drink chai, that’s not what we’re doing here. Although I appreciate tai chi, that’s not what we’re doing here. In the world of statistical tests, the chi-square test is a relatively easy one to use. It contrasts the frequencies you observed in the crosstab with the frequencies you would expect if there were no relationship among the variables in your crosstab. It makes this contrast with each cell in the crosstab. We’ll use the third sex/gun crosstab from earlier, the one where your gut wasn’t completely sure if there was a generalizable relationship. Here it is, with its frequencies expected crosstab next to it: ■ Exhibit 4.12: Frequencies Observed and Frequencies Expected. Let’s first find the difference between the frequencies observed (hereafter referred to as fₒ) and the frequencies we would expect (hereafter referred to as fₑ): ■ Exhibit 4.13: Differences between Observed and Expected Frequencies. Top left: fₒ = 56, fₑ = 49, fₒ − fₑ = 7; top right: fₒ = 91, fₑ = 98, fₒ − fₑ = −7; bottom left: fₒ = 44, fₑ = 51, fₒ − fₑ = −7; bottom right: fₒ = 109, fₑ = 102, fₒ − fₑ = 7. Then we’re going to square each of these and divide it by its corresponding fₑ: ■ Exhibit 4.14: Calculating the Chi-Square Value. The sum of the last column of numbers is our value for chi-square: 1.00 + 0.50 + 0.96 + 0.48 = 2.94. Here is the formula for what we just did: χ² = Σ (fₒ − fₑ)² / fₑ. Notice that the symbol for chi-square is χ². It looks like an x with some attitude. Our chi-square value of 2.94 is not an end in itself but rather a means to an end. For now we are going to go shopping, or at least an activity that I consider similar to shopping. When you go shopping (let’s say shirt shopping, because everyone loves shirts), you go into a store with one thing (money) and you come out of the store with something else (a shirt)...
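
    The crosstab calculation above can be checked in a few lines. The sketch below (not part of Linneman's text) rebuilds the 2 × 2 table from the observed frequencies in Exhibit 4.13 and reproduces the chi-square value of 2.94; the Yates continuity correction is switched off so the result matches the hand calculation.

    ```python
    # Reproducing the crosstab chi-square from the observed frequencies in the excerpt.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[56, 91],    # top left, top right
                         [44, 109]])  # bottom left, bottom right

    # correction=False disables the Yates continuity correction for this 2x2 table,
    # so the result matches the hand calculation of sum((f_o - f_e)^2 / f_e).
    chi2, p_value, df, expected = chi2_contingency(observed, correction=False)
    print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
    print("expected frequencies:\n", expected)
    ```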