
Inference for Distributions of Categorical Data

Inference for distributions of categorical data uses statistical methods to draw conclusions about how categorical variables are distributed in a population, based on sample data. Typical techniques include chi-square tests and confidence intervals for proportions; the goal is to infer the underlying population distribution from the observed sample.
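As a concrete illustration of one of these techniques, the sketch below computes a large-sample (Wald) confidence interval for a population proportion. The survey counts are invented for illustration, and the normal approximation assumes the sample is large enough that both np and n(1-p) are sizeable:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a population proportion
    (Wald/normal approximation; assumes a reasonably large sample)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 132 of 300 respondents answered "yes".
low, high = proportion_ci(132, 300)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these (invented) numbers the interval runs from roughly 0.38 to 0.50, quantifying the uncertainty in the sample proportion of 0.44.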

Written by Perlego with AI-assistance

8 Key excerpts on "Inference for Distributions of Categorical Data"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Recipes for Science

    An Introduction to Scientific Methods and Reasoning

    • Angela Potochnik, Matteo Colombo, Cory Wright (Authors)
    • 2018(Publication Date)
    • Routledge
      (Publisher)

    ...Coin tosses, dice throws, LeBron’s free throws, voting intentions, temperatures on the days of September, and the decibel level in a bar can all be treated as random variables. For any of these variables, inferential statistics allows us to analyze relevant data sets to predict yet-to-be-measured values of those variables. For example, one might assess from a sequence of heads and tails whether the coin is fair, predict from LeBron’s past record whether his free throw success will improve over time, infer the efficacy of a medical drug from observed treatment effects, or predict from an opinion poll which candidate will win an election. In brief, statistical inference is a form of inductive inference that employs probability to better understand the real-world phenomenon underlying a known data set. It allows scientists to formulate expectations about what they would observe in a new data set or in the larger population and to assess how confident they can be about those expectations. Frequency Distributions and Probability Distributions The starting point for using inferential statistics is a properly organized data set. Frequency distributions offer one way to organize a given set of raw data before we can use it to make predictions about new observations. Frequency distributions are lists that include every possible value of a variable and the number of times each value of that variable appears in the data set, often organized into tables— Tables 5.2 and 5.3 in the previous chapter are examples of frequency distributions of students’ grades. Relative frequency distributions are frequency distributions that record the proportion of occurrences of the value of a certain variable instead of the absolute number of occurrences. By using relative frequency distributions, we record how often different values occur for the variable under consideration, relative to the total number of values in the data set...
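The frequency and relative frequency distributions described in this excerpt are straightforward to build in code. This minimal sketch uses an invented set of letter grades (echoing the students'-grades example the excerpt mentions):

```python
from collections import Counter

# Hypothetical data set: letter grades for a class of 20 students.
grades = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A",
          "D", "B", "C", "A", "B", "C", "B", "A", "C", "B"]

freq = Counter(grades)          # frequency distribution: value -> count
n = len(grades)
rel_freq = {g: count / n for g, count in freq.items()}  # proportions

for grade in sorted(freq):
    print(f"{grade}: {freq[grade]:2d}  ({rel_freq[grade]:.2f})")
```

The relative frequencies necessarily sum to 1, which is what lets a relative frequency distribution serve as an empirical estimate of a probability distribution.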

  • Statistics without Mathematics

    ...8 Introduction to the Ideas of Inference Summary The key idea underlying this chapter is the sampling distribution. This chapter outlines some of the basic ideas of hypothesis testing and confidence intervals and introduces some terminology, including significance and P -value. Introduction Statistical inference concerns drawing conclusions about a population on the evidence of a sample taken from it. The possibility of doing this depends upon how the sample has been obtained. In the previous chapter we faced the problem of how to choose a sample which was representative, and concluded that a simple random sample was the simplest way of doing this. It is the method of drawing the sample which provides the bridge from the data that we have, the sample, to the data we would like to have had, the population. Such sampling methods are sometimes described as probability sampling because of the implicit probabilistic element in the definition in the sampling method. We have deliberately played down this aspect because probability ideas have not been used so far, although the notion underlies the more intuitive idea of all samples being ‘equally likely’. Inference is often seen as one of the more difficult aspects of Statistics, but it plays a dominant role in many presentations of the subject, including many at the elementary level. We shall therefore approach it in easy stages, contending that most of the ideas are familiar in everyday reasoning. The only difference here is that they are refined and made more precise. A central idea is that of the sampling distribution, but we have deferred this to the next chapter because it is possible to lay the foundation before bringing that concept into the picture. The Essence of the Testing Problem We begin by posing a very simple question, the answer to which introduces the essential ideas without introducing any new concepts...
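The sampling distribution this excerpt treats as central can be made concrete with a small simulation. In this sketch the population proportion (0.60), sample size, and number of repeated samples are all invented for illustration; each iteration draws a fresh simple random sample and records its sample proportion:

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Assume, hypothetically, that 60% of a large population would answer "yes".
POP_P, SAMPLE_SIZE, N_SAMPLES = 0.60, 100, 2000

sample_props = []
for _ in range(N_SAMPLES):
    sample = [random.random() < POP_P for _ in range(SAMPLE_SIZE)]
    sample_props.append(sum(sample) / SAMPLE_SIZE)

# The collection of sample proportions approximates the sampling distribution.
mean_of_props = sum(sample_props) / N_SAMPLES
print(f"mean of sample proportions: {mean_of_props:.3f}")  # close to 0.60
```

The spread of these simulated proportions around 0.60 is exactly the variability that hypothesis tests and confidence intervals account for.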

  • Statistical Literacy at School
    • Jane M. Watson (Author)
    • 2013(Publication Date)
    • Routledge
      (Publisher)

    ...6 Beginning Inference: Supporting a Conclusion Instructional programs from prekindergarten through grade 12 should enable all students to develop and evaluate inferences and predictions that are based on data. In grades 6–8 all students should— use observations about differences between two or more samples to make conjectures about populations from which the samples were taken; make conjectures about possible relationships between two characteristics of a sample on the basis of scatterplots of the data and approximate lines of fit; use conjectures to formulate new questions and plan new studies to answer them. 1 6.1 Background Inference is a term that is usually associated with chapters near the end of introductory statistics texts where formal statistical tests are introduced with p -values to judge significance. At the end of secondary schooling some students meet z -tests, t -tests, chi-square tests, and correlation coefficients. These techniques are used to evaluate formally stated hypotheses and are based on quite a few underlying assumptions and complex calculating formulas. For students who have not been introduced to ideas of inference in less complex circumstances, the big picture is often lost in the procedural details, and arithmetical errors can result in absurd conclusions because there is no intuition about the story the data are telling. Although in this circumstance teachers may encourage students to sketch graphs of data sets before rushing to calculate statistics, students are notoriously reluctant to follow such advice. 2 One of the big advantages of following advice such as that given at the start of this chapter by the National Council of Teachers of Mathematics is that students are more likely to learn good habits of data handling and analysis before they learn the details of hypothesis testing or forming confidence intervals...

  • Foundations of Crime Analysis

    Data, Analyses, and Mapping

    • Jeffery T. Walker, Grant R. Drawve (Authors)
    • 2018(Publication Date)
    • Routledge
      (Publisher)

    ...Chapter 9 Making inferences from one place to another Chapter Outline Making inferences Concepts Assumptions Normal curve Probability Sampling Probability sampling Nonprobability sampling Central limit theorem Confidence intervals Hypothesis testing Null and research hypotheses Types of hypothesis tests Type I and Type II errors Conclusion Questions and exercises This chapter focuses on introducing two topics, inferential analysis and hypothesis testing, which are not as often employed by crime analysts. This is not to say that these topics are not important, however. We discuss these topics to provide an overview of key concepts. We will not do calculations of these concepts, but provide the principles to expand your statistical knowledge and assist in advancing crime analysis techniques by providing a foundational understanding. If you are interested in calculating some of these concepts or using inferential analyses, you can find them in any introductory statistics textbook. Making Inferences In previous chapters of the book, you read about data sources and how to examine data. In essence, you have learned how to describe data, either about people or places. You are also able to describe general characteristics related to a data set. Crime analysts typically have access to jurisdiction-wide data, providing a population jurisdictional data set. What if data across a jurisdiction do not exist but you have a sample? Say, for example, you only have good data on 20 of the 99 neighborhoods in the city. Can you draw conclusions about the whole city from a sample of data? In this chapter, we will discuss concepts and assumptions pertaining to inferential analyses. Inferential analyses are used because collecting data for an entire population is often unfeasible (because of time and/or cost). Imagine a police department that wants to conduct a survey of residents’ perceptions of officers. Surveying every resident would be a difficult and expensive task for any agency...
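The hypothesis-testing idea outlined in this excerpt can be sketched for a sample proportion. All the numbers here are invented (a sample of 20 neighborhoods tested against a claimed rate of 0.50), and the normal approximation is rough for a sample this small, so treat it as an illustration of the mechanics rather than a template:

```python
import math

def z_test_proportion(successes, n, p0):
    """Two-sided z-test of H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: 14 of 20 sampled neighborhoods show a pattern,
# tested against a claimed city-wide rate of 0.50.
z, p = z_test_proportion(14, 20, 0.50)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Here the p-value (about 0.07) exceeds the conventional 0.05 threshold, so under this sketch we would not reject the null hypothesis, illustrating how a small sample limits what can be inferred about all 99 neighborhoods.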

  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...As an example, gender is measured on a nominal scale, as the following two outcomes, male and female, cannot be ordered in a meaningful way. Ordinal categorical variables, in contrast, contain categories of responses that have a natural ordering to them. The observed outcomes can be ranked or ordered based on this natural ordering, which provides meaning to the categories. As an example, age categories are measured using an ordinal scale, as the following two age categories, 20–29 and 30–39, have a meaningful order to them. The second category, 30–39, represents subjects who are older than those in the first category, 20–29. Probability Distributions The use of inferential statistics requires an assumption of the distributional properties of the variables of interest. The distributional assumption of the categorical dependent variable provides the theoretical distribution of responses in the population, which is the basis for the statistical analysis being performed. For categorical data, the four most common distributions utilized in inferential statistics are the binomial distribution, the multinomial distribution, the hypergeometric distribution, and the Poisson distribution. Binomial Distribution The binomial distribution for random variable X calculates the probability of observing the count, X, of the number of successes in a fixed number of trials of a Bernoulli experiment. A Bernoulli experiment is a random event in which there are two outcomes that have a fixed probability of occurring. In a binomial distribution, one of those outcomes is deemed a “success.” In a total of n trials, these successes are counted and the outcome X is the frequency of occurrence of a successful outcome...
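The binomial probability described in this excerpt has a closed form, P(X = k) = C(n, k) p^k (1 - p)^(n - k), which this minimal sketch implements directly (the coin-toss numbers are illustrative):

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n
    independent Bernoulli trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 3 heads in 10 tosses of a fair coin.
prob = binomial_pmf(3, 10, 0.5)
print(f"P(X = 3) = {prob:.4f}")
```

Summing the pmf over k = 0, …, n gives exactly 1, a quick sanity check that the distribution is well formed.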

  • Statistical Methods for Communication Science
    • Andrew F. Hayes (Author)
    • 2020(Publication Date)
    • Routledge
      (Publisher)

    ...This statistic is also used to assess the discrepancy between a crosstabulation of frequencies and what that crosstabulation should look like if the two categorical variables are statistically independent. Rejection of the null hypothesis of independence leads to the inference that the two categorical variables are statistically related, dependent, or associated. Association between variables is one of the more common statistical forms that a research hypothesis takes, and we will explore additional means of testing for association in several of the upcoming chapters....
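The discrepancy statistic this excerpt describes, comparing an observed crosstabulation with the counts expected under independence, can be computed directly. The 2x2 table below is invented, and instead of a p-value this sketch compares the statistic with the usual critical value for 1 degree of freedom:

```python
def chi_square_independence(table):
    """Chi-square statistic for independence in a contingency table
    (a list of rows of observed counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * col total / n
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 crosstab: rows = group, columns = outcome.
observed = [[30, 20],
            [15, 35]]
stat = chi_square_independence(observed)
print(f"chi-square = {stat:.2f}")  # df = (2-1)*(2-1) = 1; 3.841 is the 0.05 critical value
```

Here the statistic exceeds 3.841, so with these (invented) counts the null hypothesis of independence would be rejected, leading to the inference that the two variables are associated.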

  • Statistics for Politics and International Relations Using IBM SPSS Statistics

    ...6 Inference with categorical data Chapter summary This chapter continues with Chapter 4 ’s introduction to crosstabs using categorical data, expanding to creating tables with more than two variables and providing some techniques for working with missing data with crosstabs. The rest of the chapter is focused on statistical inference with categorical data, moving from describing what is, to assessing the strength of, the relationship between two or more variables. We look at a range of inferential statistics, starting with statistical significance (p-values) and chi squared tests, then looking at measures of strength of association (Cramer’s V and adjusted standardized residuals). Objectives In this chapter, you will learn: How to work with missing data when using crosstabs How to produce a crosstabulation with more than two categorical variables How to produce and interpret measures of statistical significance How to produce and interpret measures of strength of association for relationships between categorical variables. Introduction This chapter uses advanced crosstab techniques to explore research questions about voter behaviour such as: Do Conservatives really have a higher voter turnout rate? Do Conservative voters have higher incomes? Do people with higher incomes turn out to vote more? This chapter starts by using some familiar variables from the previous chapter to illustrate why and how we might need to change the way that our missing data is classified when using crosstabs. The second part of the chapter shows how to produce and interpret a multivariate crosstab to test three hypotheses about voter behaviour drawn from the political science literature. The third part of the chapter introduces the concept of statistical significance and looks at the most commonly used test for statistical significance using categorical variables, the chi squared test...
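Alongside significance, this excerpt mentions Cramer's V as a measure of strength of association. The sketch below computes it from scratch (the vote-by-turnout table is invented) using the standard formula V = sqrt(chi-square / (n * (min(rows, cols) - 1))):

```python
import math

def cramers_v(table):
    """Cramer's V: strength of association between two categorical
    variables, computed from a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical crosstab: vote choice (rows) by turnout (columns).
table = [[120, 80],
         [90, 110]]
print(f"Cramer's V = {cramers_v(table):.3f}")
```

V ranges from 0 (no association) to 1 (perfect association), which is why it complements a chi squared test: the test says whether an association is detectable, V says how strong it is.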

  • Mathematics for Biological Scientists
    • Mike Aitken, Bill Broadhurst, Stephen Hladky(Authors)
    • Mike Aitken, Bill Broadhurst, Stephen Hladky (Authors)
    • 2009(Publication Date)
    • Garland Science
      (Publisher)

    ...These terms can be misleading, as they make students think that the conclusions are different in the two cases. For example, a common mistake is to think that a statistically significant result for a two-tailed test means we decide that ‘the means are different’, but not which is larger, whereas we can only conclude that ‘one is larger than the other’ from a one-tailed test. This cannot be true, because a result that is significant with the two-tailed test will also be significant if we used the one-tailed test. For either test, we simply conclude from a statistically significant result that we have evidence against H 0 : that is, evidence that there is a difference, in the direction observed. 11.5 Hypothesis tests for categorical data So far, we have looked only at hypothesis tests for comparing measurements of amount; that is, measurements on interval or ratio scales that represent the magnitude of some quality (height, weight, blood cell count, and so on). The final techniques we will introduce are those for assessing measurements on a purely nominal scale. Here each measurement is of the type of a given observation (hair color, sex, blood group, and so on), rather than its magnitude. When we have this kind of measurement, we end up with data that tell us how many observations are of each type, so we know the frequency, or proportion, of a sample that falls within each category. There are research hypotheses that we might like to test for such data, including the very first example of a hypothesis test in this chapter ‘are red fish more common than blue fish?’. Although we were able to work out a p value for this type of question directly, it is much less obvious how to do this for a hypothesis with more than two categories, for example ‘do the proportions of offspring from crossbreeding between heterozygous parents occur in the predicted genotype ratio of 1:2:1?’. It turns out that there is a very simple way to test this type of hypothesis...
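The "very simple way" this excerpt alludes to for testing a hypothesised ratio such as 1:2:1 is the chi-square goodness-of-fit test. This sketch uses invented offspring counts; with three categories there are 2 degrees of freedom, for which 5.991 is the 0.05 critical value:

```python
def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for observed counts against
    a hypothesised ratio (e.g. 1:2:1 for a cross of heterozygous parents)."""
    n = sum(observed)
    total_ratio = sum(expected_ratio)
    expected = [n * r / total_ratio for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical offspring counts for the three genotypes (AA, Aa, aa):
observed = [28, 56, 16]
stat = chi_square_gof(observed, [1, 2, 1])
print(f"chi-square = {stat:.2f}")  # df = 2; 5.991 is the 0.05 critical value
```

With these (invented) counts the statistic falls below 5.991, so the data are consistent with the predicted 1:2:1 genotype ratio.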