Mathematics

Chi Square Test for Homogeneity

The Chi Square Test for Homogeneity is a statistical test used to determine whether the distribution of a categorical variable is the same across two or more groups or populations. It compares the observed frequencies in each category with the frequencies expected under the assumption of homogeneity, and assesses whether the groups differ significantly in their distributions.
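The mechanics just described can be sketched in a few lines of Python. The counts below are invented for illustration (two groups of 100 respondents, three response categories); they do not come from any of the excerpts on this page:

```python
# Chi-square test for homogeneity on invented counts: two groups of 100
# respondents each, classified into three response categories.
observed = [
    [30, 50, 20],  # group 1
    [45, 35, 20],  # group 2
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected frequency under homogeneity: (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(statistic, 3), df)  # prints: 5.647 2
```

Here the statistic (about 5.65, with 2 degrees of freedom) falls short of the standard 5% critical value of 5.991, so the hypothesis of homogeneity would not be rejected for these invented counts.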

Written by Perlego with AI-assistance

8 Key excerpts on "Chi Square Test for Homogeneity"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Philip Jarvis (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    ...13 ANALYSING FREQUENCIES 13.1 The chi-square test Field biologists spend a good deal of their time counting and classifying things on nominal scales such as species, colour and habitat. Statistical techniques which analyse frequencies are therefore especially useful. The classical method of analysing frequencies is the chi-square test. This involves computing a test statistic which is compared with a chi-square (χ²) distribution that we outlined in Section 11.11. Because there is a different distribution for every possible number of degrees of freedom (df), tables in Appendix 3 showing the distribution of χ² are restricted to the critical values at the significance levels we are interested in. There we give critical values at P = 0.05 and P = 0.01 (the 5% and 1% levels) for 1 to 30 df. Between 30 and 100 df, the critical values are estimated by interpolation, but the need to do this arises infrequently. Chi-square tests are variously referred to as tests for homogeneity, randomness, association, independence and goodness of fit. This array is not as alarming as it might seem at first sight. The precise applications will become clear as you study the examples. In each application the underlying principle is the same. The frequencies we observe are compared with those we expect on the basis of some Null Hypothesis. If the discrepancy between observed and expected frequencies is great, then the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom. We are then obliged to reject the Null Hypothesis in favour of some alternative. The mastery of the method lies not so much in the computation of the test statistic itself but in the calculation of the expected frequencies. We have already shown some examples of how expected frequencies are generated. They can be derived from sample data (Example 7.5) or according to a mathematical model (Section 7.4)...
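The decision rule the excerpt describes (compare the computed statistic with the tabled critical value at the appropriate df) can be sketched as follows. The counts and the 3:1 Mendelian ratio are hypothetical, and the critical values are the standard 5% table entries:

```python
# Goodness-of-fit sketch: observed counts compared with frequencies expected
# from a mathematical model (here a hypothetical 3:1 Mendelian ratio).
observed = [84, 16]   # e.g. 100 offspring scored for a trait
expected = [75, 25]   # 3:1 ratio applied to n = 100

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# 5% critical values of chi-square for df = 1..4 (standard table values).
critical_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}
reject = statistic > critical_005[df]

print(round(statistic, 2), reject)  # prints: 4.32 True
```

Because 4.32 exceeds 3.841 (the 5% critical value for 1 df), the null hypothesis would be rejected for these invented counts.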

  • Sensory Evaluation of Food: Statistical Methods and Procedures

    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)

    ...6 Chi-Square 6.1 What is Chi-Square? We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”). Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples. In general, chi-square is given by the formula χ² = Σ [(O − E)² / E], where O = observed frequency and E = expected frequency. We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used. 6.2 Chi-Square: Single-Sample Test-One-Way Classification In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female...
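The excerpt's single-sample example (16 men and 6 women in a sample of 22) can be worked through directly with the formula χ² = Σ[(O − E)²/E]:

```python
# The excerpt's single-sample example: 22 people, 16 male and 6 female.
# Under H0 (equal numbers of men and women) we expect 11 in each category.
observed = [16, 6]
expected = [11, 11]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_square, 3))  # prints: 4.545
```

The result, 50/11 ≈ 4.55, exceeds the standard 5% critical value of 3.841 for 1 df, so on the excerpt's logic H₀ (equal numbers of men and women) would be rejected.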

  • Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvements
    • Mikel J. Harry, Prem S. Mann, Ofelia C. De Hodgins, Richard L. Hulbert, Christopher J. Lacke (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)

    ...The numbers entered in the cells are usually called the joint frequencies. For example, 12 workers belong to the joint category of males and have no opinion. Hence, it is the joint frequency of this category. 18.4 TESTS OF INDEPENDENCE AND HOMOGENEITY This section is concerned with tests of independence and homogeneity, which are performed using contingency tables. Except for a few modifications, the procedure used for such tests is almost the same as the one applied in Section 18.2 for a goodness-of-fit test. 18.4.1 Test of Independence In a test of independence for a contingency table, we test the null hypothesis that the two attributes (characteristics) of the elements of a given population are not related (i.e., are independent) against the alternative hypothesis that the two characteristics are related (i.e., are dependent). For example, we may want to test whether the gender and opinions of workers about the labor management contract mentioned in Table 18.3 are dependent. We perform such a test by using the chi-square distribution. As another example, we may want to test whether or not an association exists between the job satisfaction index and the absentee rate of employees. As we mentioned earlier, the methods used in a test of independence are similar to the methods used in a goodness-of-fit test. The three differences involve the hypotheses, the calculation of the number of degrees of freedom, and the method for calculating the expected frequencies. In the case of a test of independence, the null hypothesis states that the two attributes are independent and the alternative hypothesis states that they are dependent. The number of degrees of freedom for a test of independence is df = (R − 1)(C − 1), where R and C are the numbers of rows and columns, respectively, in the contingency table. In order to calculate the expected frequencies, we must first calculate the row and column totals for the contingency table...
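The two bookkeeping steps the excerpt singles out (expected frequencies from row and column totals, and df from the table's dimensions) can be sketched as below. The 2 × 3 table is invented for illustration; it is not the book's Table 18.3:

```python
# Test-of-independence bookkeeping on an invented 2 x 3 contingency table
# (illustrative numbers only, not those of the book's Table 18.3).
table = [
    [15, 29, 12],  # e.g. male: favor / oppose / no opinion
    [4, 26, 14],   # e.g. female
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Expected frequency for each cell: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Degrees of freedom: (R - 1)(C - 1)
R, C = len(table), len(table[0])
df = (R - 1) * (C - 1)

print(df)                        # prints: 2
print(round(expected[0][0], 2))  # prints: 10.64
```

Note that the expected frequencies sum to the same grand total as the observed ones, which is a quick arithmetic check on the marginals.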

  • Essential Statistics for Public Managers and Policy Analysts
    • Evan M. Berman, XiaoHu Wang (Authors)
    • 2016 (Publication Date)
    • CQ Press (Publisher)

    ...Chi-square is but one statistic for testing a relationship between two categorical variables. Once analysts have determined that a statistically significant relationship exists through hypothesis testing, they need to assess the practical relevance of their findings. Remember, large datasets easily allow for findings of statistical significance. Practical relevance deals with the relevance of statistical differences for managers; it addresses whether statistically significant relationships have meaningful policy implications. Key Terms Alternate hypothesis (p. 182) Chi-square (p. 178) Chi-square test assumptions (p. 186) Critical value (p. 184) Degrees of freedom (p. 184) Dependent samples (p. 186) Expected frequencies (p. 179) Five steps of hypothesis testing (p. 184) Goodness-of-fit test (p. 191) Independent samples (p. 186) Kendall’s tau-c (p. 193) Level of statistical significance (p. 183) Null hypothesis (p. 181) Purpose of hypothesis testing (p. 180) Sample size (and hypothesis testing) (p. 188) Statistical power (p. 190) Statistical significance (p. 183) Appendix 11.1: Rival Hypotheses: Adding a Control Variable We now extend our discussion to rival hypotheses. The following is but one approach (sometimes called the “elaboration paradigm”), and we provide other (and more efficient) approaches in subsequent chapters. First mentioned in Chapter 2, rival hypotheses are alternative, plausible explanations of findings...

  • Statistics: The Essentials for Research

    ...Second, we are often in a position where we know only that someone “graduated” or “failed to graduate” so we cannot use a test that utilizes finer distinctions. Finally, our data may consist of categories that differ qualitatively — non-orderable countables not amenable to true measurement, such as male-female. Chi square is relatively easy to calculate and, although it is frequently used incorrectly, its prevalence in the literature makes it an important test to know about. 10.11 Overview This is the third distribution we have studied. We have discussed the binomial distribution, the normal distribution, and now the chi square distribution. The use of all of these distributions in tests of statistical significance is quite similar. The distributions provide us with a theoretical relative frequency of events; for the binomial it is the relative frequency, or probability, of obtaining any proportion of events in a sample of size n, given the proportion of events in the population from which the sample was randomly drawn; for the normal distribution it is the relative frequency, or probability, of obtaining samples yielding values of z as deviant as those listed in Table N; for the chi square distribution with various df it is the probability of obtaining χ² values as large or larger than those listed in Table C. In each case, when we select an appropriate test of significance, we assume that if the null hypothesis is true, our data should conform to that theoretical sampling distribution. When the test is significant, it means that on the basis of the hypothesized sampling distribution, the results are quite improbable. However, before we can reject hypotheses about the population parameters, it is quite important that the remaining assumptions about the distribution have been met, for example, that observations are randomly obtained and that we have the proper df...
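For 1 df, the chi-square probabilities the excerpt describes are exactly the two-tailed normal probabilities, since χ² with 1 df is the distribution of z². A minimal sketch using only the standard library (the function name is ours, not from any of these books):

```python
import math

# For 1 df, a chi-square value x corresponds to |z| = sqrt(x), so
# P(chi-square >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
def chi_square_sf_df1(x):
    return math.erfc(math.sqrt(x / 2.0))

# The tabled 5% critical value for 1 df is 3.841 because it is the square
# of the two-tailed 5% z value, 1.96 (1.96**2 = 3.8416).
print(round(chi_square_sf_df1(3.841), 3))  # prints: 0.05
```

This is why the z table and the 1-df row of the χ² table always agree: the same tail probability is being read off in two notations.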

  • Statistics for the Behavioural Sciences: An Introduction to Frequentist and Bayesian Approaches

    • Riccardo Russo (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)

    ...12 The chi-square distribution and the analysis of categorical data 12.1 Introduction In this chapter, a new continuous distribution is described. This is the chi-square (or alternatively chi-squared) distribution. We will show how this continuous distribution can be used in the analysis of discrete categorical (or alternatively frequency) data. First, the general characteristics of the chi-square distribution are presented. Then the Pearson's chi-square test is described. Examples of its application in the assessment of how well a set of observed frequencies matches a set of expected frequencies (i.e., goodness of fit test), and in the analysis of contingency tables (Frequentist and Bayesian) are provided. 12.2 The chi-square (χ²) distribution “Chi” stands for the Greek letter χ and is pronounced as either “key” or “kai”. “Square” or, alternatively, “squared”, means raised to the power of two, hence the notation χ². The chi-square distribution is obtained from the standardised normal distribution in the following way. Suppose we could sample a z score from the z distribution, we square it and its value is recorded. The sampling process is performed an infinite number of times, allowing for the possibility that any z score can be sampled again (i.e., independent sampling). If the z² scores obtained are then plotted, the resulting distribution is the χ² distribution with one degree of freedom (denoted as χ₁²). Now suppose we independently sample two χ² scores from the χ₁² distribution and we add their values, as done above in the case of the z scores. This process is performed an infinite number of times, and all the sums obtained are plotted. The resulting distribution is the χ² distribution with two degrees of freedom (denoted as χ₂²). This process can be generalised to the distribution of any sum of k random variables each having the χ₁² distribution...
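The construction in the excerpt (sum k independent squared z scores to obtain a χ² variate with k df) can be checked by simulation. A sketch using Python's standard library, with an arbitrary seed and finitely many draws standing in for the excerpt's infinite sampling:

```python
import random

# Monte Carlo sketch of the construction in the excerpt: summing k squared
# z scores yields a chi-square variate with k degrees of freedom.
random.seed(42)

def chi_square_draw(k):
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

k = 3
draws = [chi_square_draw(k) for _ in range(100_000)]
mean = sum(draws) / len(draws)

# A chi-square distribution with k df has mean k (and variance 2k),
# so the sample mean should land very close to 3 here.
print(round(mean, 2))
```

Every draw is non-negative (each term is a square), which is one reason the χ² distribution lives entirely on the positive axis.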

  • Social Statistics: Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)

    ...By making reference to the percentages in the cells of the crosstab, explain why the chi-square test turned out the way it did. Exercise 19 Using the PewSocialMedia dataset, create a crosstab and chi-square test to address this hypothesis: if someone is against the government’s surveillance program (as measured by the variable survatt), they will also be likely to be bothered by others posting pictures of their children without permission (as measured by the variable fbbother4). Explain your results. Exercise 20 Using the PewSocialMedia dataset, create a crosstab and chi-square test to address this hypothesis: do whites and nonwhites (as measured by the recoded variable XYracewnw) differ with regard to how often they felt they could not cope with all the things they had to do (as measured by the variable upset6). Then, rerun the crosstab, controlling for sex. Explain your results. Exercise 21 Using the WVS dataset, I created a dichotomy of the ETHDIVORCE variable called ETHDIVORCE2CAT, where one category contained countries with lower acceptability of divorce, and the other category contained countries with higher acceptability of divorce. Use this variable as your dependent variable and the countries-in-six-categories variable as the independent variable in a crosstab, running a chi-square test as well. Explain your results. Exercise 22 Using the WVS dataset, I created two dichotomous variables, using two of the variables related to what is important for children: self-expression and obeying. I classified each country in terms of whether it felt these two issues were less important or more important. The two variables are called CHILDEXPR2CAT and CHILDOBEY2CAT. Use these variables in a crosstab and run a chi-square test...

  • Social Statistics: Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2021 (Publication Date)
    • Routledge (Publisher)

    ...In other words, we cannot say that there is a difference in the population; we have to conclude that there is not a difference. If the chi-square test proves to be statistically significant, with p being below 0.05, we conclude that we can reject H₀ in favor of Hₐₗₜ. In other words, we can say that there is a difference in the population. Let’s take a moment to make some connections here: if your observed frequencies don’t differ enough from the frequencies you would have expected if there were no difference, you will have a low chi-square value and a p greater than .05, and you will fail to reject H₀, which claims that there is no difference. So, if you hear people using this language, don’t let it throw you. It’s actually quite similar to p-style conclusions. The Chi-Square Distribution Now that we’ve gone through the chi-square procedure a few times, I want to talk about what’s really going on here. Where do these various “significance cutoffs” come from? How can we claim that a difference is large enough to be considered statistically significant? To address these questions, I’m going to reverse what we’ve been doing. Instead of starting with a sample and then saying something about the population from which it is drawn, I’m going to show you a population and then draw samples from it. Let’s say we have a small town whose entire population is 300 people, and we’re able to ask all 300 of them whether they own a gun. We get the following crosstab for the population: Exhibit 4.35 Crosstab of Gun Ownership by Sex for a Hypothetical Population Notice that in this population crosstab, the two groups do not differ: the population of men and the population of women each have a 35% chance of owning a gun. From this population, I randomly drew a sample of 100 people and created a crosstab based on my sample information: Exhibit 4.36 Crosstab of Gun Ownership by Sex for First Sample Notice how similar the sample results are to the population...
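The excerpt's thought experiment can be repeated in code: build a population in which men and women each have a 35% chance of owning a gun, draw many samples, and see how often the chi-square statistic crosses the 5% cutoff by chance alone. The 50/50 sex split, sample size of 100, and number of trials are assumptions for illustration, not the book's exact setup:

```python
import random

random.seed(1)

# Pearson chi-square statistic for a 2 x 2 table (no continuity correction).
def chi_square_2x2(table):
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )

significant = 0
trials = 2000
for _ in range(trials):
    table = [[0, 0], [0, 0]]  # rows: men / women, cols: owns / does not own
    for _ in range(100):
        sex = 0 if random.random() < 0.5 else 1
        gun = 0 if random.random() < 0.35 else 1  # same 35% for both sexes
        table[sex][gun] += 1
    if 0 in (sum(table[0]), sum(table[1])) or 0 in [sum(c) for c in zip(*table)]:
        continue  # skip degenerate tables with an empty margin
    if chi_square_2x2(table) > 3.841:  # 5% critical value, 1 df
        significant += 1

# By chance alone, roughly 5% of samples cross the 5% cutoff,
# even though the population shows no difference at all.
print(significant / trials)
```

This is exactly where the "significance cutoffs" come from: the critical value is chosen so that samples from a no-difference population exceed it only about 5% of the time.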