Mathematics

Chi Square Test for Independence

The Chi Square Test for Independence is a statistical test used to determine whether there is a significant association between two categorical variables. It compares the observed frequencies of the variables with the frequencies that would be expected if there were no relationship between them. The test is commonly used in research and data analysis to assess the independence of variables.

Written by Perlego with AI-assistance

8 Key excerpts on "Chi Square Test for Independence"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), and each adds context and meaning to a key research topic.
  • Essential Statistics for Public Managers and Policy Analysts
    • Evan M. Berman, XiaoHu Wang (Authors)
    • 2016 (Publication Date)
    • CQ Press (Publisher)
    When analysts are confronted with two categorical variables, which can also be used to make a contingency table, chi-square is a widely used test for establishing whether a relationship exists (see the Statistics Roadmap at the beginning of the book). Chi-square has three test assumptions: (1) that variables are categorical, (2) that observations are independent, and (3) that no cells have fewer than five expected frequency counts. Remember, violation of test assumptions invalidates any test result. Chi-square is but one statistic for testing a relationship between two categorical variables.
    Once analysts have determined that a statistically significant relationship exists through hypothesis testing, they need to assess the practical relevance of their findings. Remember, large datasets easily allow for findings of statistical significance. Practical relevance deals with the relevance of statistical differences for managers; it addresses whether statistically significant relationships have meaningful policy implications.
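    To make the test and its assumptions concrete, here is a minimal sketch in Python (not from the book; the counts are hypothetical) that runs the chi-square test on a contingency table and checks assumption (3), the requirement that no cell have an expected count below five:

        import numpy as np
        from scipy.stats import chi2_contingency

        # Hypothetical 2x2 contingency table of observed counts
        table = np.array([[30, 70],
                          [45, 55]])

        chi2_stat, p, df, expected = chi2_contingency(table)

        # Assumption (3): no cell should have an expected count below five
        if (expected < 5).any():
            print("Warning: some expected counts are below 5; the test may be invalid")

        print(f"chi-square = {chi2_stat:.3f}, df = {df}, p = {p:.4f}")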

    Key Terms

    • Alternate hypothesis
    • Chi-square
    • Chi-square test assumptions
    • Critical value
    • Degrees of freedom
    • Dependent samples
    • Expected frequencies
    • Five steps of hypothesis testing
    • Goodness-of-fit test
    • Independent samples
    • Kendall’s tau-c
    • Level of statistical significance
    • Null hypothesis
    • Purpose of hypothesis testing
    • Sample size (and hypothesis testing)
    • Statistical power
    • Statistical significance

    Appendix 11.1: Rival Hypotheses: Adding a Control Variable

    We now extend our discussion to rival hypotheses. The following is but one approach (sometimes called the “elaboration paradigm”), and we provide other (and more efficient) approaches in subsequent chapters. First mentioned in Chapter 2, rival hypotheses are alternative, plausible explanations of findings. We established earlier that men are promoted faster than women, and in Chapter 8
  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Philip Jarvis (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    13 ANALYSING FREQUENCIES

    13.1 The chi-square test

    Field biologists spend a good deal of their time counting and classifying things on nominal scales such as species, colour and habitat. Statistical techniques which analyse frequencies are therefore especially useful. The classical method of analysing frequencies is the chi-square test. This involves computing a test statistic which is compared with a chi-square (χ²) distribution that we outlined in Section 11.11. Because there is a different distribution for every possible number of degrees of freedom (df), tables in Appendix 3 showing the distribution of χ² are restricted to the critical values at the significance levels we are interested in. There we give critical values at P = 0.05 and P = 0.01 (the 5% and 1% levels) for 1 to 30 df. Between 30 and 100 df, the critical values are estimated by interpolation, but the need to do this arises infrequently.
    Chi-square tests are variously referred to as tests for homogeneity, randomness, association, independence and goodness of fit. This array is not as alarming as it might seem at first sight. The precise applications will become clear as you study the examples. In each application the underlying principle is the same. The frequencies we observe are compared with those we expect on the basis of some Null Hypothesis. If the discrepancy between observed and expected frequencies is great, then the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom. We are then obliged to reject the Null Hypothesis in favour of some alternative.
    The mastery of the method lies not so much in the computation of the test statistic itself but in the calculation of the expected frequencies. We have already shown some examples of how expected frequencies are generated. They can be derived from sample data (Example 7.5) or according to a mathematical model (Section 7.4). The versions of the test which compare observed frequencies with those expected from a model are called goodness of fit tests. All versions of the chi-square test assume that samples are random and observations are independent.
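    As a minimal sketch of a goodness-of-fit test of the kind described above (the counts are hypothetical, and the expected frequencies come from an assumed equal-frequency model), in Python:

        from scipy.stats import chisquare

        observed = [22, 18, 29, 11]      # hypothetical counts in four categories
        expected = [20, 20, 20, 20]      # equal-frequency model, same total

        result = chisquare(observed, f_exp=expected)
        print(f"chi-square = {result.statistic:.3f}, p = {result.pvalue:.4f}")
        # Reject the null hypothesis if the statistic exceeds the critical
        # value for df = 4 - 1 = 3 at the chosen significance level.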
  • Sensory Evaluation of Food
    Statistical Methods and Procedures

    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)
    6 Chi-Square

    6.1 What is Chi-Square?

    We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”).
    Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples.
    In general, chi-square is given by the formula

        χ² = Σ [ (O − E)² / E ]

    where
    O = observed frequency
    E = expected frequency
    We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.

    6.2 Chi-Square: Single-Sample Test – One-Way Classification

    In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female. We use the same logic as with a binomial test. We calculate the probability of getting our result on H₀, and if it is small, we reject H₀. From Table G.4.b, the two-tailed binomial probability associated with this is 0.052, so we would not reject H₀ at p < 0.05. However, we can also set up a chi-square test. If H₀ is true, there is no difference in the numbers of men and women; the expected number of males and females from a sample of 22 is 11 each. Thus we have our observed frequencies (O = 16 and 6) and our expected frequencies (E
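    The excerpt breaks off mid-calculation, but its arithmetic can be reproduced directly; here is a minimal sketch in Python (not from the book). Note that the uncorrected chi-square p-value comes out smaller than the exact binomial probability of 0.052 quoted above, a known difference for small samples:

        from scipy.stats import chisquare

        observed = [16, 6]      # 16 men, 6 women
        expected = [11, 11]     # 11 of each expected if H0 is true

        # chi-square = (16-11)^2/11 + (6-11)^2/11 = 50/11 ≈ 4.545, df = 1
        result = chisquare(observed, f_exp=expected)
        print(f"chi-square = {result.statistic:.3f}, p = {result.pvalue:.4f}")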
  • A Conceptual Guide to Statistics Using SPSS
    3 The Chi-Squared Test for Contingency Tables

     
    CHAPTER OUTLINE
    Introduction to the Chi-Squared Test
    Computing the Chi-Squared Test in SPSS
    A Closer Look: Fisher’s Exact Test
    The Chi-Squared Test for Testing the Distribution of One Categorical Variable

    Introduction to the Chi-Squared Test

    In the chapter on descriptive statistics, we drew a distinction between categorical and continuous variables. Most of the inferential statistics we discuss in this book assume that your outcome variable is continuous. However, sometimes we have outcomes that fall into categories (e.g., was someone on trial for a crime convicted or not? Or did a participant choose to open door number one, two, or three?). In these cases, and when our predictor variable is also categorical, the chi-squared test is appropriate.
    The raw data typically analyzed using the chi-squared test are counts of the same outcome in each of two or more conditions. For example, if you wanted to know whether gender affected traffic court convictions, you could tally up the number of men who were and weren’t convicted on a given day, then separately tally up the number of women who were and weren’t convicted on that same day. Those four counts would then be entered into a chi-squared analysis, and it would tell you whether the proportion of men who were convicted that day was different from the proportion of women who were convicted.
    The chi-squared test can also be used to answer questions about proportions within a single variable. In other words, the test can be used to tell you whether the percentage of cases in each category differs from some hypothesized distribution. Suppose that a court claims that it convicts 90% of people who come up with a traffic violation. If you tallied the number of people who were convicted or not for a given period, the chi-squared test could tell you whether the proportion of people convicted in that period was significantly different from 90%.
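    Here is a minimal sketch of that single-variable use in Python (the tallies are hypothetical; the court’s claimed 90% conviction rate supplies the expected split):

        from scipy.stats import chisquare

        convicted, not_convicted = 170, 30       # hypothetical tallies, n = 200
        n = convicted + not_convicted

        observed = [convicted, not_convicted]
        expected = [0.90 * n, 0.10 * n]          # 90% / 10% claimed distribution

        result = chisquare(observed, f_exp=expected)
        print(f"chi-square = {result.statistic:.3f}, p = {result.pvalue:.4f}")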
  • Statistics for Psychologists
    An Intermediate Course

    pairs of observations that both have A, and so on.
  • To test whether the probability of having A differs in the matched populations, the relevant test statistic is McNemar’s statistic (the squared difference between the two discordant-pair counts divided by their sum) which, if there is no difference, has a chi-squared distribution with a single degree of freedom.
  • We can illustrate the use of Fisher’s exact test on the data on suicidal feelings in Table 9.4 because this has some small expected values (see Section 9.4 for more comments). The p-value from applying the test is .235, indicating that diagnosis and suicidal feelings are not associated.
    To illustrate McNemar’s test, we use the data shown in Table 9.6 . For these data the test statistic takes the value 1.29, which is clearly not significant, and we can conclude that depersonalization is not associated with prognosis where endogenous depressed patients are concerned.
    Table 9.6   Recovery of 23 Pairs of Depressed Patients
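    Table 9.6’s counts are not reproduced in this excerpt, but the McNemar calculation itself is short; a minimal sketch in Python with hypothetical discordant-pair counts b and c:

        from scipy.stats import chi2

        b, c = 10, 5                          # hypothetical discordant-pair counts
        statistic = (b - c) ** 2 / (b + c)    # McNemar's chi-squared, df = 1
        p = chi2.sf(statistic, df=1)
        print(f"McNemar chi-square = {statistic:.2f}, p = {p:.4f}")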
    9.3.  Beyond the Chi-Square Test: Further Exploration of Contingency Tables by Using Residuals and Correspondence Analysis
    A statistical significance test is, as implied in Chapter 1, often a crude and blunt instrument. This is particularly true in the case of the chi-square test for independence in the analysis of contingency tables, and after a significant value of the test statistic is found, it is usually advisable to investigate in more detail why the null hypothesis of independence fails to fit. Here we shall look at two approaches, the first involving suitably chosen residuals and the second that attempts to represent the association in a contingency table graphically.
    9.3.1.  The Use of Residuals in the Analysis of Contingency Tables
    After a significant chi-squared statistic is found and independence is rejected for a two-dimensional contingency table, it is usually informative to try to identify the cells of the table responsible, or most responsible, for the lack of independence. It might be thought that this can be done relatively simply by looking at the deviations of the observed counts in each cell of the table from the estimated expected values under independence, that is, by examination of the residuals:
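    The excerpt ends before the formula, but the residuals it describes are the cell-by-cell deviations O − E, commonly standardized as Pearson residuals (O − E)/√E. A minimal sketch in Python (the table is hypothetical):

        import numpy as np
        from scipy.stats import chi2_contingency

        observed = np.array([[25, 15],
                             [10, 30]])

        # Expected counts under independence
        _, _, _, expected = chi2_contingency(observed, correction=False)

        raw = observed - expected              # O - E for each cell
        pearson = raw / np.sqrt(expected)      # standardized (Pearson) residuals

        print("raw residuals:\n", raw)
        print("Pearson residuals:\n", pearson)
        # Cells with large absolute Pearson residuals (roughly |r| > 2)
        # contribute most to the lack of independence.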
  • Statistics
    The Essentials for Research

    qualitatively—non-orderable countables not amenable to true measurement, such as male-female. Chi square is relatively easy to calculate and, although it is frequently used incorrectly, its prevalence in the literature makes it an important test to know about.
    10.11 Overview
    This is the third distribution we have studied. We have discussed the binomial distribution, the normal distribution, and now the chi square distribution. The use of all of these distributions in tests of statistical significance is quite similar. The distributions provide us with a theoretical relative frequency of events; for the binomial it is the relative frequency, or probability, of obtaining any proportion of events in a sample of size n, given the proportion of events in the population from which the sample was randomly drawn; for the normal distribution it is the relative frequency, or probability, of obtaining samples yielding values of z as deviant as those listed in Table N; for the chi square distribution with various df it is the probability of obtaining χ² values as large or larger than those listed in Table C.
    In each case, when we select an appropriate test of significance, we assume that if the null hypothesis is true, our data should conform to that theoretical sampling distribution. When the test is significant, it means that on the basis of the hypothesized sampling distribution, the results are quite improbable. However, before we can reject hypotheses about the population parameters, it is quite important that the remaining assumptions about the distribution have been met, for example, that observations are randomly obtained and that we have the proper df. If we have not met these assumptions, we shall be dealing with an unknown distribution and obtain meaningless levels of significance.
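    The tabled critical values the excerpt refers to can be generated directly from the chi-square distribution; a minimal sketch in Python:

        from scipy.stats import chi2

        for df in (1, 2, 5, 10):
            crit_05 = chi2.ppf(0.95, df)    # critical value at the 5% level
            crit_01 = chi2.ppf(0.99, df)    # critical value at the 1% level
            print(f"df = {df:2d}: {crit_05:6.3f} (5%), {crit_01:6.3f} (1%)")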
  • Statistics for the Behavioural Sciences
    An Introduction to Frequentist and Bayesian Approaches

    • Riccardo Russo (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)
    12  The chi-square distribution and the analysis of categorical data

    12.1 Introduction

    In this chapter, a new continuous distribution is described. This is the chi-square (or alternatively chi-squared) distribution. We will show how this continuous distribution can be used in the analysis of discrete categorical (or alternatively frequency) data.
    First, the general characteristics of the chi-square distribution are presented. Then the Pearson's chi-square test is described. Examples of its application in the assessment of how well a set of observed frequencies matches a set of expected frequencies (i.e., goodness of fit test), and in the analysis of contingency tables (Frequentist and Bayesian) are provided.

    12.2 The chi-square (χ²) distribution

    “Chi” stands for the Greek letter χ and is pronounced as either “key” or “kai”. “Square” or, alternatively, “squared”, means raised to the power of two, hence the notation χ². The chi-square distribution is obtained from the standardised normal distribution in the following way. Suppose we sample a z score from the z distribution, square it, and record its value. The sampling process is performed an infinite number of times, allowing for the possibility that any z score can be sampled again (i.e., independent sampling). If the z² scores obtained are then plotted, the resulting distribution is the χ² distribution with one degree of freedom (denoted χ²(1)). Now suppose we independently sample two scores from the χ²(1) distribution and add their values, as done above in the case of the z scores. This process is performed an infinite number of times, and all the sums obtained are plotted. The resulting distribution is the χ² distribution with two degrees of freedom (denoted χ²(2)). This process can be generalised: the distribution of a sum of k independent random variables, each with the χ²(1) distribution, is itself a χ² distribution with k degrees of freedom (denoted χ²(k)). Furthermore, the distribution of the sum of two independent random variables distributed as χ²(a) and χ²(b) has a χ²(a+b) distribution. For example, if two independent random variables are distributed as χ²(2) and χ²(3), then their sum is distributed as χ²(5).
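    The construction described above is easy to verify by simulation; here is a minimal sketch in Python (the choice of k = 5 is arbitrary):

        import numpy as np
        from scipy.stats import chi2

        rng = np.random.default_rng(0)
        k, n = 5, 100_000                   # degrees of freedom, number of draws

        # Each draw is the sum of k squared independent z scores
        draws = (rng.standard_normal((n, k)) ** 2).sum(axis=1)

        # The simulated mean should be close to the chi2(k) mean, which is k
        print(f"simulated mean = {draws.mean():.3f}, theoretical mean = {chi2.mean(k):.3f}")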
  • Social Statistics
    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)
    ropractor. Although I like to drink chai, that’s not what we’re doing here. Although I appreciate tai chi, that’s not what we’re doing here. In the world of statistical tests, the chi-square test is a relatively easy one to use. It contrasts the frequencies you observed in the crosstab with the frequencies you would expect if there were no relationship among the variables in your crosstab. It makes this contrast with each cell in the crosstab. We’ll use the third sex/gun crosstab from earlier, the one where your gut wasn’t completely sure if there was a generalizable relationship. Here it is, with its frequencies-expected crosstab next to it:
    Exhibit 4.12:
    Frequencies Observed and Frequencies Expected
    Let’s first find the difference between the frequencies observed (hereafter referred to as fₒ) and the frequencies we would expect (hereafter referred to as fₑ):
    Exhibit 4.13:
    Differences between Observed and Expected Frequencies
    Cell            fₒ     fₑ     fₒ − fₑ
    Top left        56     49       7
    Top right       91     98      −7
    Bottom left     44     51      −7
    Bottom right   109    102       7
    Then we’re going to square each of these and divide it by its corresponding fₑ:
    Exhibit 4.14:
    Calculating the Chi-Square Value
    The sum of the last column of numbers is our value for chi-square:
    1.00 + 0.50 + 0.96 + 0.48 = 2.94
    Here is the formula for what we just did:
        χ² = Σ [ (fₒ − fₑ)² / fₑ ]
    Notice that the symbol for chi-square is χ². It looks like an x
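    The hand calculation above is easy to reproduce; a minimal sketch in Python, turning off the continuity correction so the statistic matches the 2.94 computed in Exhibit 4.14:

        import numpy as np
        from scipy.stats import chi2_contingency

        observed = np.array([[56, 91],        # top left, top right
                             [44, 109]])      # bottom left, bottom right

        chi2_val, p, df, expected = chi2_contingency(observed, correction=False)
        print(f"chi-square = {chi2_val:.2f}, df = {df}, p = {p:.4f}")
        # expected matches the f_e column above: [[49, 98], [51, 102]]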