Two Categorical Variables

Two categorical variables are a pair of variables whose values each fall into distinct categories or groups rather than on a numerical scale. In statistical analysis, such pairs are examined to find associations between the categories of one variable and those of the other. Common tools are contingency tables, which cross-tabulate the counts, and chi-square tests, which assess whether an observed association is statistically significant.

Written by Perlego with AI-assistance

10 Key excerpts on "Two Categorical Variables"

  • Exploratory and Multivariate Data Analysis
    Chapter 4

    2-D Statistical Data Analysis

    1 Introduction

    In practice, many users stop their statistical investigations after having studied the variables independently from each other. However, they have used only 1-D analysis, and usually cannot put forward any explanations of any causality for their data. For example, a questionnaire with two questions can be analyzed using two frequency distributions. However, studying each frequency distribution individually cannot provide any relation between the two questions. Another example is given by the study of two quantitative variables, for which as many statistical characteristics or graphics as required can be built (cf. Chapter 3). They cannot help, however, to explain the relation between the two variables. The only way to approach the explanation of how one variable is related to another is to build a relation between the two variables. That is the objective of 2-D statistical data analysis, where two variables are analyzed according to the following points of view:
    1.  To express and highlight the relationship between two variables, in order to show the statistical dependence between them.
    2.  When possible, to sum up the relations by a law of variation or a statistical dependence, and to characterize them by a numerical coefficient independent of the units of measure of the variables.
    These studies vary according to the type of variables involved (quantitative, categorical, chronological, logical, etc.), and are presented in what follows.

    2 2-D Analysis of Two Categorical Variables

    2.1 Contingency Data Sets

    The way to express a relation between two categorical variables is to compute a contingency data set as follows. Let the two categorical variables be denoted by $V_1$ and $V_2$:
    $V_1$ has $h$ forms denoted by $A_1, A_2, \ldots, A_h$;
    $V_2$ has $k$ forms denoted by $B_1, \ldots, B_k$.
    For each couple of forms $(A_i, B_j)$, we compute the number of observations, denoted by $n_{ij}$, that possess the forms $A_i$ and $B_j$
  • Introductory Statistics and Analytics
    eBook - ePub
    • Peter C. Bruce (Author)
    • 2015 (Publication Date)
    • Wiley (Publisher)
    Chapter 5 Relationship Between Two Categorical Variables
    In this chapter, we look at two-way tables, also called 2 × 2 tables, in which rows and columns represent binary values of two different variables. 2 × 2 tables are a subset of r × c tables (short for row × column), where the rows and columns can represent more than two values of their variables. After completing this chapter, you should be able to
    • build and interpret 2 × 2 tables,
    • specify how to do a resampling test for a difference between two proportions,
    • perform probability calculations involving conditional probabilities,
    • perform basic Bayesian calculations,
    • define and test for statistical independence.
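The resampling test for a difference between two proportions named in the objectives can be sketched as a permutation test. This is a minimal illustration rather than the book's own procedure: the function names, the number of resamples, and the seed are assumptions, and the data are the eight applicants listed in Table 5.1 of this excerpt, coded 1 for admitted and 0 for rejected.

```python
import random


def diff_in_proportions(groups, outcomes, group_a, group_b):
    """Proportion of 1s in group_a minus the proportion of 1s in group_b."""
    a = [o for g, o in zip(groups, outcomes) if g == group_a]
    b = [o for g, o in zip(groups, outcomes) if g == group_b]
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(groups, outcomes, group_a, group_b,
                     n_resamples=10_000, seed=0):
    """Two-sided permutation test (hypothetical helper, not from the book).

    Outcomes are shuffled across the fixed group labels; the p-value is the
    share of shuffles whose absolute difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = diff_in_proportions(groups, outcomes, group_a, group_b)
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        resampled = diff_in_proportions(groups, shuffled, group_a, group_b)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(resampled) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_resamples


# The eight applicants of Table 5.1: gender and admission (1 = admitted).
groups = ["M", "M", "M", "F", "M", "F", "M", "F"]
admitted = [1, 0, 1, 0, 1, 0, 1, 1]

observed, p_value = permutation_test(groups, admitted, "M", "F")
print(f"observed difference = {observed:.3f}, p-value = {p_value:.3f}")
```

With only eight cases the test has very little power, which matches the chapter's caution that eight cases out of thousands say little on their own.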

    5.1 Two-Way Tables

    We now return to the data previously mentioned on admission to graduate schools. The data are for the six largest academic departments, and the issue under consideration was admission rates for men and women. We begin with the two categorical variables, Gender and Admit. As before, we look at eight folks in a fragment of the database (Table 5.1).
    Table 5.1
    Applicants to Graduate School (Small Subset)
    Gender   Dept.   Admit
    Male     A       Admitted
    Male     B       Rejected
    Male     A       Admitted
    Female   C       Rejected
    Male     A       Admitted
    Female   B       Rejected
    Male     C       Admitted
    Female   B       Admitted
    Ignoring the department variable for now, the first person is a male who was admitted, so he goes in Table 5.2.
    Table 5.2
    Building a 2 × 2 Table
                Female   Male
    Admitted             1
    Rejected
    Then, we have a rejected male, another admitted male, and a rejected female. We will enter these data as counts in each cell (Table 5.3).
    Table 5.3
                Female   Male
    Admitted             2
    Rejected    1        1
    Finishing the table and adding row and column totals gives results that certainly look discriminatory (Table 5.4). However, these are only eight cases out of thousands. Table 5.5 is the full table for all 4526 applicants. Table 5.6 gives the data by percent. The column and row labeled “All” are termed marginal
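The tallying described above can be sketched in a few lines. The code below counts the eight rows of Table 5.1 into a 2 × 2 table and adds the row and column totals; the variable names are illustrative assumptions, and the printed layout is only one possible rendering of the finished table.

```python
from collections import Counter

# The eight applicants from Table 5.1, ignoring the department column.
applicants = [
    ("Male", "Admitted"),
    ("Male", "Rejected"),
    ("Male", "Admitted"),
    ("Female", "Rejected"),
    ("Male", "Admitted"),
    ("Female", "Rejected"),
    ("Male", "Admitted"),
    ("Female", "Admitted"),
]

genders = ["Female", "Male"]
decisions = ["Admitted", "Rejected"]

# Count each (gender, decision) pair, then arrange the counts as a table
# with decisions as rows and genders as columns.
counts = Counter(applicants)
table = {d: {g: counts[(g, d)] for g in genders} for d in decisions}

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = {d: sum(table[d].values()) for d in decisions}
col_totals = {g: sum(table[d][g] for d in decisions) for g in genders}
grand_total = sum(row_totals.values())

print(f"{'':10}{'Female':>8}{'Male':>8}{'All':>8}")
for d in decisions:
    print(f"{d:10}{table[d]['Female']:>8}{table[d]['Male']:>8}{row_totals[d]:>8}")
print(f"{'All':10}{col_totals['Female']:>8}{col_totals['Male']:>8}{grand_total:>8}")
```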
  • Presenting Statistical Results Effectively
    4 Exploring and Describing Relationships
    Just as exploring univariate distributions is an important undertaking, so too is visualizing relationships among variables. Understanding relationships among variables at an early stage can facilitate proper model specification. It can also shed light on features yet to be uncovered by model diagnostics later in the research process. The primary focus of this chapter is graphical methods for exploring bivariate relationships, though when appropriate, information from more variables will also be incorporated. We will discuss both displays used largely for diagnostics purposes, which typically are not communicated to wider audiences, and graphs that are intended for presentations to larger audiences, including publication.
    The chapter is organized into three main sections that highlight the three types of relationships that we will encounter – two categorical variables, one categorical and one continuous variable, and two continuous variables. We also offer some advice for multivariate displays. Many of the methods discussed in Chapter 3 will be adapted (and sometimes extended) to explore relationships, especially relationships involving categorical variables. We will introduce some new plots, including, but not limited to, mosaic plots, heatmaps, some useful enhancements and extensions to scatterplots, and bivariate and three-dimensional density estimation.

    4.1 Two Categorical Variables

    4.1.1 Cross-tabulation

    Cross-tabulation is a well-known method for presenting the relationship between two categorical variables. A cross-tabulation (also known as a crosstab, contingency table or joint frequency distribution) displays the counts (and proportions or percentages) of observations of an explanatory variable, x, that fall into each category of a response variable, y. Although the table can be oriented otherwise, it is usually most effective to position the dependent variable on the vertical axis (or y-axis). By doing so, the dependent variable defines the rows of the table. The explanatory variable is then necessarily placed along the x
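When the response variable defines the rows and the explanatory variable the columns, percentages are usually computed within each column so that the categories of the explanatory variable can be compared. A minimal sketch follows; the counts, labels, and function name are invented for illustration and do not come from the book.

```python
def column_percentages(table):
    """Convert counts to percentages within each column.

    `table` maps a row label (response category) to a dict that maps a
    column label (explanatory category) to a count.
    """
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }


# Hypothetical crosstab: response (rows) by explanatory group (columns).
crosstab = {
    "Yes": {"Group A": 30, "Group B": 10},
    "No": {"Group A": 20, "Group B": 40},
}

percentages = column_percentages(crosstab)
for response, row in percentages.items():
    print(response, {group: f"{value:.1f}%" for group, value in row.items()})
```

Each column of the result sums to 100, so reading across a row compares the groups directly.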
  • Statistics for Politics and International Relations Using IBM SPSS Statistics
    4 Describing categorical data
    Chapter summary
    This chapter introduces you to the production and interpretation of frequency tables and crosstabs using categorical data. Categorical data is some of the most common data in social statistics and is often used to describe populations. This can be done through simple frequencies of occurrences of a single variable or through bivariate tables that show us pairings of options between two variables. The results can be expressed through counts of the number of times an answer or pair of answers occur, or through percentages that represent this as a proportion. Each of the options conveys the same data differently and helps us to make different points, so it’s very important to be able to produce and interpret the data correctly.

    Objectives

    In this chapter, you will learn:
    • How to produce a table with one categorical variable
    • How to produce a crosstabulation with two categorical variables
    • How to produce and interpret a variety of percentages
    • How to recode variables to create categorical variables or to combine into fewer categories
    • How to customize the appearance of the output tables.

    Introduction

    Many politics datasets, like the ESS, are dominated by categorical variables and have very few continuous variables. Politics researchers frequently want to answer questions about voting intention, history and party identification; educational qualifications; religion; marital status; ethnicity and citizenship; and public opinion on a range of issues. All of these common variables are categorical. This chapter works through the most common ways of describing this data by producing univariate frequency tables and bivariate crosstabulations.
  • Quantitative Data Analysis with SPSS 12 and 13
    eBook - ePub
    • Alan Bryman, Duncan Cramer (Authors)
    • 2004 (Publication Date)
    • Routledge (Publisher)
    Chapter 8 Bivariate analysis: exploring relationships between two variables
    • Crosstabulation
    • Crosstabulation and statistical significance: the chi-square (χ²) test
    • Correlation
    • Other approaches to bivariate relationships
    • Regression
    • Overview of types of variable and methods of examining relationships
    • Exercises
    THIS CHAPTER FOCUSES on relationships between pairs of variables. Having examined the distribution of values for particular variables through the use of frequency tables, histograms, and associated statistics as discussed in Chapter 5, a major strand in the analysis of a set of data is likely to be bivariate analysis – how two variables are related to each other. The analyst is unlikely to be satisfied with the examination of single variables alone, but will probably be concerned to demonstrate whether variables are related. The investigation of relationships is an important step in explanation and consequently contributes to the building of theories about the nature of the phenomena in which we are interested. The emphasis on relationships can be contrasted with the material covered in the previous chapter, in which the ways in which cases or subjects may differ in respect to a variable were described. The topics covered in the present chapter bear some resemblance to those examined in Chapter 7, since the researcher in both contexts is interested in exploring variance and its connections with other variables. Moreover, if we find that members of different ethnic groups differ in regard to a variable, such as income, this may be taken to indicate that there is a relationship between ethnic group and income. Thus, as will be seen, there is no hard-and-fast distinction between the exploration of differences and of relationships.
    What does it mean to say that two variables are related? We say that there is a relationship between two variables when the distribution of values for one variable is associated with the distribution exhibited by another variable. In other words, the variation exhibited by one variable is patterned in such a way that its variance is not randomly distributed in relation to the other variable. Examples of relationships that are frequently encountered are: middle-class individuals are more likely to vote Conservative than members of the working class; infant mortality is higher among countries with a low per capita income than those with a high per capita income; work alienation is greater in routine, repetitive work than in varied work. In each case, a relationship between two variables is indicated: between social class and voting behaviour; between the infant mortality rate and one measure of a nation’s prosperity (per capita income); and between work alienation and job characteristics. Each of these examples implies that the variation in one variable is patterned, rather than randomly distributed, in relation to the other variable. Thus, in saying that there is a relationship between social class and voting behaviour from the above example, we are saying that people’s tendency to vote Conservative is not randomly distributed across categories of social class. Middle-class individuals are more likely to vote for this party; if there was no relationship we would not be able to detect such a tendency since there would be no evidence that the middle and working classes differed in their propensity to vote Conservative.
  • Statistical Methods for Communication Science
    • Andrew F. Hayes (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)
    Of course, you do not know whether readers of the magazines the ad is going to be run in would have the same preferences. For this reason, it would have been wise to obtain the participants from a pool of likely readers of the magazine. But even so, if the things that lead the 150 participants to prefer magazine A in relatively large numbers are likely to have the same effect on people who read these magazines (such as color features, font size, etc), it is probably safe to assume that readers of the target magazines will be more attracted to advertisement A than they would have been to the other advertisements. If you aren’t confident that such processes would be at work in people who will actually be exposed to the advertisement, then which of the advertisements the actual magazine readers will find most attention grabbing remains anyone’s guess.

    11.2 Association Between Two Categorical Variables

    In Chapters 4 and 5, I introduced the concepts of association and independence. Two variables X and Y are said to be associated or dependent if certain values on X are paired more frequently with certain values of Y. For example, if two variables are positively associated, this means that relatively high scores on X tend to be paired relatively frequently with relatively high scores on Y, and relatively low scores on X tend to be paired more frequently with relatively low scores on Y.
    In Chapter 4, I focused entirely on association between quantitative variables. However, we are often interested in quantifying and testing for association between variables that are categorical. For instance, a researcher might want to know whether an advertisement that advocates the importance of regular screening for a particular type of cancer is more effective in getting people to take the test when it emphasizes the peace of mind it brings relative to when it attempts to scare the person into acting. If the outcome variable is measured as whether or not the person took the test within 6 months of hearing the message, then both of the variables are categorical. Or a researcher might want to know whether males are more likely than females to agree to a request when they are told whether or not their friends agreed relative to when they are not so told. Again, both variables in this case are categorical.
  • Quantitative Data Analysis with Minitab
    eBook - ePub

    A Guide for Social Scientists

    • Alan Bryman, Duncan Cramer (Authors)
    • 2003 (Publication Date)
    • Routledge (Publisher)

    Chapter 8 Bivariate analysis: exploring relationships between two variables

    This chapter focuses on relationships between pairs of variables. Having examined the distribution of values for particular variables through the use of frequency tables, histograms, and associated statistics as discussed in Chapter 5, a major strand in the analysis of a set of data is likely to be bivariate analysis – how two variables are related to each other. The analyst is unlikely to be satisfied with the examination of single variables alone, but will probably be concerned to demonstrate whether variables are related. The investigation of relationships is an important step in explanation and consequently contributes to the building of theories about the nature of the phenomena in which we are interested. The emphasis on relationships can be contrasted with the material covered in the previous chapter, in which the ways in which cases or subjects may differ in respect to a variable were described. The topics covered in the present chapter bear some resemblance to those examined in Chapter 7, since the researcher in both contexts is interested in exploring variance and its connections with other variables. Moreover, if we find that members of different ethnic groups differ in regard to a variable, such as income, this may be taken to indicate that there is a relationship between ethnic group and income. Thus, as will be seen, there is no hard-and-fast distinction between the exploration of differences and of relationships.
    What does it mean to say that two variables are related? We say that there is a relationship between two variables when the distribution of values for one variable is associated with the distribution exhibited by another variable. In other words, the variation exhibited by one variable is patterned in such a way that its variance is not randomly distributed in relation to the other variable. Examples of relationships that are frequently encountered are: middle class individuals are more likely to vote Conservative than members of the working class; infant mortality is higher among countries with a low per capita income than those with a high per capita income; work alienation is greater in routine, repetitive work than in varied work. In each case, a relationship between two variables is indicated: between social class and voting behaviour; between the infant mortality rate and one measure of a nation’s prosperity (per capita income); and between work alienation and job characteristics. Each of these examples implies that the variation in one variable is patterned, rather than randomly distributed, in relation to the other variable. Thus, in saying that there is a relationship between social class and voting behaviour from the above example, we are saying that people’s tendency to vote Conservative is not randomly distributed across categories of social class. Middle class individuals are more likely to vote for this party; if there was no relationship we would not be able to detect such a tendency since there would be no evidence that the middle and working classes differed in their propensity to vote Conservative.
  • Data Driven Statistical Methods
    12.1) for all patients so treated in the past month. Again, however, our method of testing for no association using conditional inference would be the same, i.e. conditional upon observed marginal totals, but the validity of any statistical inferences is restricted by the nonrandomness of the method of data selection. Strictly the inferences apply to that data only.

    12.2   Inferences in 2×2 tables

    The 2×2 table is one of the simplest data formats in statistics. Simple indicators and measures of association for these tables often extend readily to r×c tables and also to comparisons of sets of k tables, each 2×2, which may be looked upon as three-way 2×2×k tables. Many also extend with modification or added complexity to three-way r×c×k tables and also to tables in more than 3 dimensions.
    Even in a 2×2 table there are basic differences between appropriate models for the situation where one category (rows) is explanatory and the other (columns) is a response and the situation where both categories are responses. An example of the former was given by (12.1) while (12.2) gives an example of the latter and records the numbers of people in a random sample of 100 who, in the previous 12 months, have or have not used two forms of public transport at least once:
                   Train
                   Yes   No
    Bus     Yes    43    27      (12.2)
            No     19    11
    The distinction between explanatory and response variables is not always clear cut. For example if people are classified into two groups according to salary earned (e.g. £20 000 per annum or less, more than £20 000 per annum) and by job satisfaction (high or low) the salary division may be regarded by some as a variable that explains a response job satisfaction or by others as a response to an explanatory variable job satisfaction in that satisfaction is likely to motivate workers in a way that leads to promotion and better prospects of higher salary. Fortunately for some inferences we need not make such distinctions, for certain inferences about association will not be affected. The distinction may be more important in general r×c tables. What is then often of more impact (section 12.3
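As one illustration of an inference about association that treats both variables symmetrically, the sketch below computes Pearson's chi-square statistic for the bus and train counts in (12.2). Note that this is the large-sample chi-square test, not the conditional (exact) inference the excerpt refers to; the function name is an assumption, and the p-value formula used here holds only for one degree of freedom, that is, for a 2×2 table.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts.

    Expected counts are (row total * column total) / n. The p-value uses
    the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable
    with one degree of freedom, so it is valid for 2x2 tables only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from (12.2): rows are Bus (Yes, No), columns are Train (Yes, No).
bus_train = [[43, 27], [19, 11]]
statistic, p_value = chi_square_2x2(bus_train)
print(f"chi-square = {statistic:.4f}, p-value = {p_value:.4f}")
```

The observed counts lie very close to the expected ones, so the statistic is small and the data give no evidence of association between bus and train use.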
  • Introductory Statistics for Health and Nursing Using SPSS

    8

    COMPARING TWO CATEGORICAL VARIABLES

    INTRODUCTION

    Many variables from health surveys and questionnaires produce categorical data such as gender, age group or smoking status. Initial analysis using frequencies as in Chapter 6 should be the starting point with this type of data. Following that, the simplest analysis of these data is to determine how many participants and what percentage fall into the groups created by the two variables.
    Once an analysis plan has been established, then significance tests can be set up. This chapter will explain the general process involved in setting up significance tests. Where one variable is the outcome (for example survival status, died versus survived) and the other is a possible explanatory variable (for example gender, male versus female) then the most appropriate test (ignoring external variables, which will be explored in Chapter 12) is the chi-square test or Fisher’s exact test depending on whether the assumptions of the chi-square test have been met. Depending on the study design, it is often useful to present statistics to quantify the difference between the two groups. This can be done using risks and relative risks.
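Risks and relative risks are simple ratios of counts. The sketch below uses invented numbers (30 deaths among 100 men and 15 among 100 women) purely to show the arithmetic; the function names and the counts are assumptions, not data from the book.

```python
def risk(events, total):
    """Risk: the proportion of a group that experienced the outcome."""
    return events / total


def relative_risk(events_a, total_a, events_b, total_b):
    """Relative risk: the risk in group A divided by the risk in group B."""
    return risk(events_a, total_a) / risk(events_b, total_b)


# Hypothetical counts, for illustration only.
risk_men = risk(30, 100)
risk_women = risk(15, 100)
rr = relative_risk(30, 100, 15, 100)
print(f"risk (men) = {risk_men:.2f}, risk (women) = {risk_women:.2f}, RR = {rr:.2f}")
```

A relative risk of 1 means the two groups have the same risk; values above 1 mean the outcome is more common in the first group.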
    Cohen’s Kappa can be used to quantify the level of agreement between two or more assessors who are assessing the same participants using the same criteria. Alternately (with care), kappa can be used to see whether participants have the same opinions or beliefs when assessed at two or more time points.
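For the two-assessor case, Cohen's kappa compares the observed proportion of agreement with the agreement expected by chance from the marginal totals. The sketch below covers two assessors only; the agreement counts and the function name are made up for illustration.

```python
def cohens_kappa(agreement_table):
    """Cohen's kappa for two assessors.

    agreement_table[i][j] is the number of participants placed in
    category i by the first assessor and category j by the second.
    """
    size = len(agreement_table)
    n = sum(sum(row) for row in agreement_table)
    # Observed agreement: share of participants on the diagonal.
    observed = sum(agreement_table[i][i] for i in range(size)) / n
    # Chance agreement: product of the two assessors' marginal proportions.
    row_totals = [sum(row) for row in agreement_table]
    col_totals = [sum(col) for col in zip(*agreement_table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 50 participants by two assessors.
ratings = [[20, 5], [10, 15]]
kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.2f}")
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance.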
    Finally, this chapter explains sensitivity and specificity, which are used to determine what percentage of the time a screening test for a given condition/disease gives the same diagnosis as the gold standard for diagnosing that condition/disease. This will be extended to show how cut-offs can be established by maximising sensitivity and specificity. This will be shown visually using ROC curves.
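Sensitivity and specificity follow directly from the 2 × 2 table of screening result against gold-standard diagnosis. The counts below are hypothetical and the function names are assumptions; the sketch only shows how the two percentages are formed.

```python
def sensitivity(true_positives, false_negatives):
    """Share of people with the condition whom the screening test flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of people without the condition whom the screening test clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical screening results against the gold standard:
# 50 people with the condition (45 detected), 100 without (90 cleared).
sens = sensitivity(true_positives=45, false_negatives=5)
spec = specificity(true_negatives=90, false_positives=10)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Moving the cut-off of a screening test typically trades one quantity against the other, which is what a ROC curve displays.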
    This chapter will be illustrated using a number of datasets. The first is a study of younger (18–49 years old) women staff working at a university and women students’ knowledge of and attitudes to breast cancer. It will also use a workplace study of back pain. The final dataset that will be used in this chapter comes from a physiotherapy setting, where the mobility of a group of older people was assessed and related to their falls status.
  • Quantitative Data Analysis with IBM SPSS 17, 18 & 19
    eBook - ePub
    • Alan Bryman, Duncan Cramer (Authors)
    • 2012 (Publication Date)
    • Routledge (Publisher)

    Chapter 8

    Bivariate analysis: exploring relationships between two variables

    ■  Crosstabulation
    ■  Crosstabulation with statistical significance: the chi-square (χ²) test
    ■  Correlation
    ■  Other approaches to bivariate relationships
    ■  Regression
    ■  Overview of types of variable and methods of examining relationships
    ■  Exercises
    THIS CHAPTER FOCUSES on relationships between pairs of variables. Having examined the distribution of values for particular variables through the use of frequency tables, histograms, and associated statistics as discussed in Chapter 5, a major strand in the analysis of a set of data is likely to be bivariate analysis – how two variables are related to each other. The analyst is unlikely to be satisfied with the examination of single variables alone, but will probably be concerned to demonstrate whether variables are related. The investigation of relationships is an important step in explanation and consequently contributes to the building of theories about the nature of the phenomena in which we are interested. The emphasis on relationships can be contrasted with the material covered in the previous chapter, in which the ways in which cases or participants may differ in respect to a variable were described. The topics covered in the present chapter bear some resemblance to those examined in Chapter 7
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.