Mathematics

Sample Proportion

The sample proportion is a statistical measure that represents the proportion of a specific attribute within a sample. It is calculated by dividing the number of items with the attribute by the total sample size. The sample proportion is often used to estimate the proportion of the attribute in the entire population.

Written by Perlego with AI-assistance

10 Key excerpts on "Sample Proportion"

  • Statistical Concepts - A First Course
    • Debbie L. Hahs-Vaughn (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)
    p, is defined as
    $$p = \frac{f}{n}$$
    where f is the number of frequencies in the sample that fall into the category of interest (e.g., the number of individuals who support the candidate), and n is the total number of units (e.g., individuals) in the sample. The Sample Proportion p is thus a sample estimate of the population proportion, π. One way we can estimate the population variance is by the sample variance $s^2 = p(1 - p)$, and the population standard deviation of a proportion can be estimated by the sample standard deviation
    $$s = \sqrt{p(1 - p)}.$$
    The next concept to discuss is the sampling distribution of the proportion. This is comparable to the sampling distribution of the mean discussed in Chapter 5 . If one were to take many samples, and for each sample compute the Sample Proportion p , then we could generate a distribution of p . This is known as the sampling distribution of the proportion . For example, imagine that we take 50 samples of size 100 and determine the proportion for each sample. That is, we would have 50 different Sample Proportions each based on 100 observations. If we construct a frequency distribution of these 50 proportions, then this is actually the sampling distribution of the proportion.
    In theory, the Sample Proportions for this example could range from .00 (p = 0/100) to 1.00 (p
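As a minimal sketch of the definitions in this excerpt, the snippet below computes the sample proportion, the sample variance p(1 − p), and its square root. The counts (60 supporters out of 100 respondents) and the function name `sample_proportion` are hypothetical and chosen only for illustration.

```python
import math


def sample_proportion(f: int, n: int) -> float:
    """Return p = f / n, the share of sampled units in the category of interest."""
    if n <= 0 or not 0 <= f <= n:
        raise ValueError("need n > 0 and 0 <= f <= n")
    return f / n


# Hypothetical poll: 60 of 100 sampled individuals support the candidate.
f, n = 60, 100
p = sample_proportion(f, n)
s_squared = p * (1 - p)    # sample variance of a proportion
s = math.sqrt(s_squared)   # sample standard deviation of a proportion

print(f"p = {p:.2f}, s^2 = {s_squared:.2f}, s = {s:.4f}")
```

The variance p(1 − p) is largest at p = 0.5 and shrinks toward zero as p approaches 0 or 1.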
  • An Introduction to Statistical Concepts
    • Debbie L. Hahs-Vaughn, Richard Lomax (Authors)
    • 2020 (Publication Date)
    • Routledge (Publisher)
    The next step is to discuss the corresponding sample statistics for the proportion. The Sample Proportion, p, is defined as $p = f/n$, where f is the number of frequencies in the sample that fall into the category of interest (e.g., the number of individuals who support the candidate), and n is the total number of units (e.g., individuals) in the sample. The Sample Proportion p is thus a sample estimate of the population proportion, π. One way we can estimate the population variance is by the sample variance $s^2 = p(1 - p)$, and the population standard deviation of a proportion can be estimated by the sample standard deviation $s = \sqrt{p(1 - p)}$.
    The next concept to discuss is the sampling distribution of the proportion. This is comparable to the sampling distribution of the mean discussed in Chapter 5. If one were to take many samples, and for each sample compute the Sample Proportion p, then we could generate a distribution of p. This is known as the sampling distribution of the proportion. For example, imagine that we take 50 samples of size 100 and determine the proportion for each sample. That is, we would have 50 different Sample Proportions each based on 100 observations. If we construct a frequency distribution of these 50 proportions, then this is actually the sampling distribution of the proportion.
    In theory, the Sample Proportions for this example could range from .00 (p = 0/100) to 1.00 (p = 100/100), given that there are 100 observations in each sample. One could also examine the variability of these 50 Sample Proportions. That is, we might be interested in the extent to which the Sample Proportions vary. We might have, for one example, most of the Sample Proportions falling near the mean proportion of .60. This would indicate for the candidate data that (a) the samples generally support the candidate, as the average proportion is .60, and (b) the support for the candidate is fairly consistent across samples, as the Sample Proportions tend to fall close to .60
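The thought experiment in this excerpt (50 samples of 100 observations each) can be sketched as a small simulation. The true population proportion of 0.60, the random seed, and the helper name `draw_sample_proportion` are assumptions made for this illustration; the excerpt only says the sample proportions might cluster near .60.

```python
import random
import statistics


def draw_sample_proportion(rng: random.Random, true_proportion: float, n: int) -> float:
    """Draw n independent yes/no observations and return the share of 'yes'."""
    successes = sum(1 for _ in range(n) if rng.random() < true_proportion)
    return successes / n


rng = random.Random(2020)   # fixed seed so the sketch is repeatable
TRUE_PROPORTION = 0.60      # assumed population proportion (hypothetical)

# 50 samples of size 100, one sample proportion per sample.
proportions = [draw_sample_proportion(rng, TRUE_PROPORTION, 100) for _ in range(50)]

mean_p = statistics.mean(proportions)
sd_p = statistics.stdev(proportions)
print(f"mean of the 50 proportions: {mean_p:.3f}")
print(f"standard deviation of the 50 proportions: {sd_p:.3f}")
print(f"smallest and largest: {min(proportions):.2f}, {max(proportions):.2f}")
```

A frequency distribution of `proportions` is one realization of the sampling distribution described above; a small spread corresponds to support that is consistent across samples.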
  • Statistics from A to Z
    Confusing Concepts Clarified

    • Andrew A. Jawlik (Author)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    • Smaller Margin of Error specified
    • Estimated Proportion closer to 0.5
    1. After a certain point, larger Sample Sizes yield diminishing returns in accuracy.

    Explanation

    1. Minimum Sample Sizes are calculated very differently for Count data and Measurement Data. This article is about Sample Sizes for Proportions of Count data.
      The Part 2 article is about Sample Sizes for Measurement/Continuous data.
    A Proportion is a percentage expressed as a decimal. So 50% is 0.50 and 100% is 1.0. Statistical formulas usually use the Proportion format.
    Proportions are calculated from Count (aka Discrete) data. These are non-negative integer numbers, e.g., 0, 1, 2, 3, etc.

    Examples of Proportions of Count Data

    Count                                        Sample Size                   Proportion
    66 people said they'd vote for Candidate A   120 people were surveyed      66/120 = 0.55
    8 people preferred strawberry ice cream      20 people in a focus group    8/20 = 0.40
    6 defective items                            production run of 1000        6/1000 = 0.006
    The symbol for a Proportion is p. That is also the symbol for Probability. The two concepts are related. If the Proportion of people favoring Candidate A is 0.55 then the Probability of any one person favoring Candidate A is 0.55.
    If all you want is a quick number – without understanding what's behind it – here are the minimum Sample Sizes for a 95% Confidence Level (the most common) and for several values of the Margin of Error (symbol MOE or E).

    95% Confidence Level (the most common)

    MOE               1%     2%     3%     4%    5%    6%    7%    8%    9%    10%
    Sample Size (n)   9604   2401   1068   601   385   267   196   151   119   97
    These results assume you don't know the Population Size (N). If you do, divide the Sample Size above by 1 + n/N. But if you have to do that, you might as well just do a web search on “Sample Size Calculator” and just enter the relevant numbers on one of those websites.
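The table values can be reproduced with the usual sample-size formula for a proportion, n = z² · p(1 − p) / MOE², rounded up. This is a sketch under stated assumptions: the excerpt is cut off before it shows its own formula, so the formula, z = 1.96 for 95% confidence, the worst-case p = 0.5, and the function names are assumptions. Exact fractions are used so that rounding up is not disturbed by floating-point error.

```python
import math
from fractions import Fraction

Z_95 = Fraction("1.96")  # z value for a 95% Confidence Level


def min_sample_size(moe, p=Fraction(1, 2), z=Z_95) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= moe (assumed formula)."""
    moe, p = Fraction(moe), Fraction(p)
    return math.ceil(z * z * p * (1 - p) / (moe * moe))


def adjust_for_population(n: int, population_size: int) -> int:
    """Divide n by 1 + n/N as the excerpt describes; rounding up is an assumption."""
    return math.ceil(Fraction(n) / (1 + Fraction(n, population_size)))


# Margins of error from 1% to 10%, as in the table.
sizes = {percent: min_sample_size(Fraction(percent, 100)) for percent in range(1, 11)}
for percent, n in sizes.items():
    print(f"MOE {percent:>2}%: n = {n}")

# Hypothetical known Population Size of 1000 with a 5% MOE.
print("adjusted for N = 1000:", adjust_for_population(sizes[5], 1000))
```

Because n scales with 1/MOE², halving the Margin of Error roughly quadruples the required Sample Size, which is the diminishing return the excerpt mentions.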
    1. The report of the results of our statistical analysis might use a statement like this:
    As we'll see in Key to Understanding #3, the formula for calculating n, the Sample Size, includes four symbols, α, p, MOE, and z. The statement above helps explain what α, p, and MOE (Margin of Error) are about. z is derived from α
  • A Guide to Business Statistics
    • David M. McEvoy (Author)
    • 2018 (Publication Date)
    • Wiley (Publisher)
    Therefore, on average, the Sample Proportion will equal the population proportion. This means that the Sample Proportion is an unbiased estimate of the population proportion. The importance of this result is that even if we did not know the value of the population proportion (which will be the case in all real-world applications of inferential statistics), we know that the collection of all possible Sample Proportions will be distributed evenly around it. This property will hold for any sample size.
    6.5.2 The Shape
    Like the sampling distribution of the mean, the shape of the sampling distribution of a proportion depends on the sample size. What is cool (if statistics can be cool) is that if the sample size is large enough, then the sampling distribution can be approximated as normal. How large does the sample need to be? As long as and, then we can assume that the shape of the sampling distribution is normal. When these two conditions are satisfied, we call that the normal approximation of the binomial distribution. In those cases, the distribution will follow a z-distribution, which means that we already have a good understanding of what it looks like and how to calculate the probabilities. You may be asking yourself, in the real-world situations in which we do not know the value of the population proportion, how can we determine if our sampling distribution can be approximated as normal? Well, in those cases, we simply plug in our Sample Proportion. This would be and.
    6.5.3 The Standard Deviation
    We call the standard deviation of a sampling distribution the standard error. It measures the average deviation of the Sample Proportion from the population proportion for a given sample size (see formula A.7). The intuition is that as the sample size goes up, the closer the Sample Proportion is going to be clustered around the population proportion. Therefore, as the sample size goes up, the standard error of a proportion goes down.
    Let us take a look at a few graphs in order to get a better feel for the relationship between the sample size and the sampling distribution of a proportion
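To make the relationship between sample size and standard error concrete, here is a minimal sketch. Formula A.7 is not reproduced in the excerpt, so the usual expression for the standard error of a proportion, the square root of π(1 − π)/n, is assumed; the population proportion of 0.5 and the sample sizes are hypothetical.

```python
import math


def standard_error(proportion: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(proportion * (1 - proportion) / n)."""
    return math.sqrt(proportion * (1 - proportion) / n)


PROPORTION = 0.5  # hypothetical population proportion

# Each step quadruples the sample size.
errors = {n: standard_error(PROPORTION, n) for n in (25, 100, 400, 1600)}
for n, se in errors.items():
    print(f"n = {n:>4}: standard error = {se:.4f}")
```

Quadrupling the sample size halves the standard error, because n sits under a square root.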
  • Painless Statistics
    p (the proportion of the population that possesses that characteristic).
    Is this a good estimate? The short answer is yes, because the Sample Proportion is an unbiased estimator of the population proportion. This means there is no systematic underestimating or overestimating associated with the Sample Proportion. But just how good of an estimate is it? Statistics can help give a clearer answer to this question. Again, the key is using facts about the sampling distribution.
    Sampling and Variability
    To understand how good your estimate of the population proportion is, you have to understand the effect that variability in sampling can have on the Sample Proportion. Here’s an example that helps illustrate that.
    Example 5:
    Recall the scenario from Example 4, but now imagine that you took a second sample of 100 American adults and asked them, “Do you support this spending bill?” This time, 63 said yes. In this case, the Sample Proportion is $\hat{p} = 63/100 = 0.63$. This is different from the first Sample Proportion of 0.72, but this isn’t cause for alarm. This is a result of the inherent variability in the sampling process.
    You have learned that random processes always involve some random variation. The probability of flipping tails on a fair coin is 0.5, but if you flip a coin 100 times, you wouldn’t expect to see exactly 50 tails every time. The number will be around 50. If you repeat this experiment over and over, though, sometimes the number will be higher than 50 and sometimes it will be lower.
    Taking a random sample of size 100 from a population is just like flipping a coin 100 times. If you take repeated random samples of 100 Americans and ask them if they support this spending bill, you aren’t going to get the exact same proportion of yes’s every single time. Sometimes, the proportion might be a bit high; other times, it may be a bit low. It’s natural for two different samples to produce two different sample statistics because of the variability inherent in the sampling process. The consequence of this variability can be seen in the spread of the sampling distribution, and there are tools for understanding the spread of the data.
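The coin-flip comparison can be quantified with the binomial distribution. The sketch below computes the chance of exactly 50 tails in 100 fair flips and the chance of landing anywhere from 40 to 60 tails. The 40-to-60 window and the helper name `binomial_pmf` are choices made for this illustration, not something the excerpt specifies.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


N_FLIPS = 100
P_TAILS = 0.5

prob_exactly_50 = binomial_pmf(50, N_FLIPS, P_TAILS)
prob_40_to_60 = sum(binomial_pmf(k, N_FLIPS, P_TAILS) for k in range(40, 61))

print(f"P(exactly 50 tails)  = {prob_exactly_50:.4f}")
print(f"P(40 to 60 tails)    = {prob_40_to_60:.4f}")
```

Exactly 50 tails turns out to be uncommon, while a count somewhere near 50 is very likely, which is the same pattern seen when repeated samples give proportions such as 0.72 and 0.63.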
  • Working With Sample Data
    As municipal, state, and national election polls predict popular support for favored candidates and causes, the estimation and comparison of population proportions play an important role in anticipating and expressing the voice of the people and their democratic decisions. Preelection polls boast support for this or that candidate or cause at a certain level, within plus or minus a stated percent. There are few statistical topics as widely publicized or as important to the conduct of political processes as the estimation and comparison of population proportions. The difference between two population proportions plays an equally important role in business, in market research, business forecasting, financial auditing, and analysis of comparative defect rates, to name a few. At the heart of the proportion is a count, not a measurement, of sampled elements. When we sort a sample into subgroups—those elements that do meet a certain criterion and those that do not—and then produce a count of these subgroups, we use a proportion to compare the results.
    Inferences about $(p_1 - p_2)$ are based on two random samples from two unrelated populations. The two samples do not have to be the same size. We summarize the sample statistics for each, including sample sizes, the number of successes in each sample, and the Sample Proportions. The best estimate for the population parameter $(p_1 - p_2)$ is the sample statistic $(\hat{p}_1 - \hat{p}_2)$, where $p_1$ is the proportion for population 1, $p_2$ is the proportion for population 2, $\hat{p}_1$ is the proportion for sample 1, and $\hat{p}_2$ is the proportion for sample 2. Since we assume the population proportions are equal in the null hypothesis, we combine the two samples and form a single pooled estimate of the population proportion, $\hat{p}_{\text{pooled}}$, which we use in the calculation of the test statistic, if the sample sizes are sufficiently large.
    If the sample sizes are sufficiently large, the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ can be approximated by the standard normal, or z, distribution. As with the considerations for a single population proportion, what constitutes “sufficiently large” depends on both the size of the sample and the proportion of its population that satisfies the characteristic of interest. In the case of two population proportions, all four computations must generate a minimum expected count of 5: $n_1 \hat{p}_1 \ge 5$, $n_1 (1 - \hat{p}_1) \ge 5$, $n_2 \hat{p}_2 \ge 5$, $n_2 (1 - \hat{p}_2) \ge 5$.
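A pooled two-proportion z test along these lines can be sketched as follows. The excerpt's own formulas did not survive extraction, so the standard pooled form is assumed: the pooled estimate is total successes over total sample size, and the standard error uses that pooled value. The counts (30 of 200 versus 45 of 200) and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def large_enough(x1: int, n1: int, x2: int, n2: int, minimum: int = 5) -> bool:
    """Check the four counts: n * p_hat equals x, and n * (1 - p_hat) equals n - x."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: p1 = p2, using the pooled estimate."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 30 successes in 200 trials versus 45 successes in 200 trials.
counts = (30, 200, 45, 200)
conditions_met = large_enough(*counts)
z, p_value = two_proportion_z_test(*counts)
print(f"conditions met: {conditions_met}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Pooling is appropriate here only because the null hypothesis assumes the two population proportions are equal; a confidence interval for the difference would typically use the unpooled standard error instead.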
    Example 5.6.
    A local baking company is testing some new cracker recipes. Their goal is to deliver boxes of crackers that pass consumer taste tests but have no broken crackers after shipping. From taste tests, the bakers narrowed the competition to two recipes, A and B. Bakers then produced each recipe, boxed the crackers, and loaded them onto delivery trucks. Drivers were told to take the boxes on the normal route but to deliver them back to the bakery at the end of their route rather than off-load them at consumer outlets. Once returned, the boxes were opened and inspected. A box with no broken crackers was scored a “0” and a box with one or more broken was scored a “1.” Results are as follows:
  • Illuminating Statistical Analysis Using Scenarios and Simulations
    • Jeffrey E. Kottemann (Author)
    • 2017 (Publication Date)
    • Wiley (Publisher)
    Answer: Just like the survey example above. Just replace the words “agree” and “disagree” with the words “heads” and “tails.”
    Below are two more key terms. (From now on, I will underline and explain new terminology as it comes up.)
    Population is the group(s) you are investigating and want to generalize your results to. In the current example, it is the entire community. If, instead, you wanted to investigate only community females, you would say that you will sample from the population of community females. If you wanted to investigate registered voters across the United States, you would say that you will sample from the population of registered voters in the United States. You sample randomly from the population in order to get unbiased sample statistics that are used as estimates for the true population statistics. (There are other sampling strategies, but we'll stick with random sampling for our examples.)
    Population statistic is the unknown, true overall value that you are trying to estimate using sample statistics. At the beginning of this chapter, we said the true population (community) proportion was 0.40, but that only imaginary omniscient beings can know such things. The earthly surveyors used surveys to get Sample Proportions to use as estimates of the population proportion.

    11 It Has Been the Normal Distribution All Along

    I remarked in an earlier chapter that this “trick” formula
    works because the shape of the sampling distribution histograms we have been seeing are expected to become the extra-special bell-shape called the normal distribution when our sample size is large enough.
    The 1.96 in the formula is there instead of some other number because the distribution is a normal distribution. And with a normal distribution of binomial proportions, we expect 95% of its contents to lie within the interval of times . This is a fact about the normal distribution. It is a fact much like pi (π) is a fact about the relationship between a circle's diameter and circumference (
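The role of 1.96 can be checked directly against the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`. The interval function assumes the formula in question has the common form of a proportion plus or minus 1.96 standard errors, which the excerpt does not show; the proportion 0.40 and sample size 100 are hypothetical inputs.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal distribution lying within 1.96 standard deviations of its mean.
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the cut-off that leaves 2.5% in each tail.
critical_value = standard_normal.inv_cdf(0.975)


def interval_95(proportion: float, n: int):
    """Assumed form: proportion +/- 1.96 * sqrt(proportion * (1 - proportion) / n)."""
    half_width = 1.96 * math.sqrt(proportion * (1 - proportion) / n)
    return proportion - half_width, proportion + half_width


lower, upper = interval_95(0.40, 100)
print(f"coverage within 1.96 standard deviations: {coverage:.4f}")
print(f"critical value for 95%: {critical_value:.4f}")
print(f"interval for proportion 0.40, n = 100: ({lower:.3f}, {upper:.3f})")
```

A different confidence level would swap 1.96 for another cut-off, for example `inv_cdf(0.995)` for 99%.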
  • Social Statistics
    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2021 (Publication Date)
    • Routledge (Publisher)
    Chapter 5

    Using a Sample Mean or Proportion to Talk About a Population

    Confidence Intervals

    DOI: 10.4324/9781003220770-5
    This chapter covers …
    • … building a probability distribution of sample means
    • … how to find and interpret the standard error of a sampling distribution
    • … what the Central Limit Theorem is and why it is important
    • … population claims and how to put them to the test
    • … how to build and interpret confidence intervals
    • … how a researcher used confidence intervals to study transgender health inequality
    • … how researchers used confidence intervals to study Uber and traffic fatalities

    Introduction

    In this chapter, we continue our exploration of inference, going through some procedures that you will find strikingly similar to those in the chi-square chapter. Whereas in the chi-square chapter, we dealt with variables of the nominal or ordinal variety, here we deal with ratio-level variables. Our attention turns away from sample crosstabs and toward sample means (and, at the end of the chapter, proportions). But keep in mind that the inference goal remains the same: we will use sample means in order to make claims about population means. Just as we talked about the chi-square probability distribution, we’ll start this chapter with a distribution of sample means.

    Sampling Distributions of Sample Means

    Imagine a hypothetical class with 100 students in it. These students will serve as our population: it is the entire group of students in which we are interested. They get the following hypothetical grades:
    Exhibit 5.1 Grades for a Population of 100 Students: Frequency Distribution
    Grade   # of Students Receiving This Grade
    1.0 1
    1.1 1
    1.2 1
    1.3 2
    1.4 2
    1.5 2
    1.6 2
    1.7 3
    1.8 3
    1.9 3
    2.0 4
    2.1 4
    2.2 5
    2.3 6
    2.4 7
    2.5 8
    2.6 7
    2.7 6
    2.8 5
    2.9 4
    3.0 4
    3.1 3
    3.2 3
    3.3 3
    3.4 2
    3.5 2
    3.6 2
    3.7 2
    3.8 1
    3.9 1
    4.0 1
    Source: Hypothetical data.
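As a sketch, the frequency table above can be expanded into the full population of 100 grades and then sampled repeatedly to build a distribution of sample means. The sample size of 10, the 1,000 repetitions, and the random seed are assumptions made for this illustration, not values taken from the chapter.

```python
import random
import statistics

# Number of students at each grade from 1.0 to 4.0 in steps of 0.1 (Exhibit 5.1).
FREQUENCIES = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8,
               7, 6, 5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]
GRADES = [(10 + i) / 10 for i in range(len(FREQUENCIES))]

# Expand the table into one entry per student.
population = [grade for grade, count in zip(GRADES, FREQUENCIES) for _ in range(count)]
population_mean = statistics.mean(population)

rng = random.Random(5)  # fixed seed so the sketch is repeatable
# Draw 1,000 samples of 10 students (without replacement) and keep each sample mean.
sample_means = [statistics.mean(rng.sample(population, 10)) for _ in range(1000)]

print(f"population size: {len(population)}, population mean: {population_mean:.2f}")
print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
print(f"standard deviation of sample means: {statistics.stdev(sample_means):.3f}")
```

The sample means cluster around the population mean, and their standard deviation is an empirical estimate of the standard error for samples of this size.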
    Here is a bar graph of this frequency distribution:
  • Social Statistics
    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)
    Chapter 5

    Using a Sample Mean or Proportion to Talk About a Population

    Confidence Intervals

    This chapter covers ...

    • . . . building a probability distribution of sample means
    • . . . how to find and interpret the standard error of a sampling distribution
    • . . . what the Central Limit Theorem is and why it is important
    • . . . population claims and how to put them to the test
    • . . . how to build and interpret confidence intervals
    • . . . how a researcher used confidence intervals to study popular films
    • . . . how researchers used confidence intervals to study Uber and traffic fatalities

    Introduction

    In this chapter we continue our exploration of inference, going through some procedures that you will find strikingly similar to those in the chi-square chapter. Whereas in the chi-square chapter we dealt with variables of the nominal or ordinal variety, here we deal with ratio-level variables. Our attention turns away from sample crosstabs and toward sample means (and, at the end of the chapter, proportions). But keep in mind that the inference goal remains the same: we will use sample means in order to make claims about population means. Just as we talked about the chi-square probability distribution, we’ll start this chapter with a distribution of sample means.

    Sampling Distributions of Sample Means

    Imagine a hypothetical class with 100 students in it. These students will serve as our population: it is the entire group of students in which we are interested. They get the following hypothetical grades:
    Exhibit 5.1: Grades for a Population of 100 Students: Frequency Distribution
    Grade   # of Students Receiving This Grade
    1.0 1
    1.1 1
    1.2 1
    1.3 2
    1.4 2
    1.5 2
    1.6 2
    1.7 3
    1.8 3
    1.9 3
    2.0 4
    2.1 4
    2.2 5
    2.3 6
    2.4 7
    2.5 8
    2.6 7
    2.7 6
    2.8 5
    2.9 4
    3.0 4
    3.1 3
    3.2 3
    3.3 3
    3.4 2
    3.5 2
    3.6 2
    3.7 2
    3.8 1
    3.9 1
    4.0 1
    Source: Hypothetical data.
    Here is a bar graph of this frequency distribution:
  • Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvements
    • Mikel J. Harry, Prem S. Mann, Ofelia C. De Hodgins, Richard L. Hulbert, Christopher J. Lacke (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    GLOSSARY
    central-limit theorem: The theorem from which it is inferred that for a large sample size (n ≥ 30), the shape of the sampling distribution of $\bar{x}$ is approximately normal. Also, by the same theorem, the shape of the sampling distribution of $\hat{p}$ is approximately normal for a sample for which np ≥ 10 and n(1 − p) ≥ 10.
    estimator: A sample statistic that is used to estimate a population parameter.
    finite population correction factor: Multiple, denoted by $\sqrt{\frac{N - n}{N - 1}}$, that is used in calculating the standard deviation of the sampling distribution of $\hat{p}$ or $\bar{x}$ when the sample size is more than 5% of the population size.
    mean of $\hat{p}$: The mean of the sampling distribution of $\hat{p}$, denoted by $\mu_{\hat{p}}$, is equal to the population proportion p.
    mean of $\bar{x}$: The mean of the sampling distribution of $\bar{x}$, denoted by $\mu_{\bar{x}}$, is equal to the population mean μ.
    population distribution: The probability distribution of the population data.
    population proportion p: The ratio of the number of elements in a population with a specific characteristic to the total number of elements in the population.
    Sample Proportion $\hat{p}$: The ratio of the number of elements in a sample with a specific characteristic to the total number of elements in that sample.
    sampling distribution of $\hat{p}$: The probability distribution of all the values of $\hat{p}$ calculated from all possible samples of the same size selected from a population.
    sampling distribution of $\bar{x}$: The probability distribution of all the values of $\bar{x}$ calculated from all possible samples of the same size selected from a population.
    sampling error
  • Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.