Mathematics

Sampling Distribution

A sampling distribution is a probability distribution of a sample statistic, such as the mean or standard deviation, calculated from multiple samples of the same size taken from a population. It provides information about the variability of the sample statistic and is used to make inferences about the population parameter. Sampling distributions are fundamental in statistical inference and hypothesis testing.

Written by Perlego with AI-assistance

10 Key excerpts on "Sampling Distribution"

  • Foundations of Statistics for Data Scientists
    3 Sampling Distributions
    DOI: 10.1201/9781003159834-3
    Chapter 2 introduced probability distributions, which summarize probabilities of possible values for a random variable. Before the data are gathered, any statistic is a random variable. It has a set of possible values, and in a study employing randomization, a probability distribution applies to them. This chapter introduces probability distributions for statistics, which are called Sampling Distributions. Chapters 4 and 5 show that Sampling Distributions provide the basis for evaluating how precisely statistics estimate population parameters.
    The Sampling Distribution of the sample mean enables us to evaluate how close it is likely to be to the population mean. The law of large numbers states that the sample mean converges to the population mean as the sample size increases. The main reason for the importance of the normal distribution is the Central Limit Theorem, a remarkable result stating that with studies employing randomization, the sample mean has approximately a normal Sampling Distribution. The delta method shows that Sampling Distributions of many statistics other than the sample mean are also approximately normal.
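The law of large numbers and the Central Limit Theorem described above are easy to see in a short simulation. The sketch below (illustrative Python; the exponential population and the sample sizes are assumptions, not from the chapter) draws many random samples from a skewed population and looks at the distribution of the sample mean:

```python
import random
import statistics

random.seed(1)

# Skewed population: exponential with mean 1 and standard deviation 1.
# Draw many random samples of size n and record each sample's mean.
n, num_samples = 30, 10_000
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# Law of large numbers: sample means cluster near the population mean (1.0).
# Central Limit Theorem: their spread is about sigma/sqrt(n) = 1/sqrt(30) ≈ 0.18,
# and a histogram of sample_means would look approximately normal.
print(round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Increasing n tightens the distribution of the sample mean around the population mean, which is what makes the sample mean useful for estimating the population mean precisely.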

    3.1 Sampling Distributions: Probability Distributions for Statistics

    The value of any particular statistic varies according to the sample chosen. The next example shows that, with randomization, a probability distribution for the possible values of a statistic enables us to evaluate how far the actual sample outcome of that statistic is likely to fall from the population parameter that it estimates.

    3.1.1 Example: Predicting an Election Result from an Exit Poll

    For major elections, television networks use exit polls of voters to help them predict winners well before all the votes are counted. For the 2020 Presidential election in the U.S. with Democratic candidate Joe Biden and Republican candidate Donald Trump, in an exit poll of California voters,1
  • A Guide to Business Statistics
    • David M. McEvoy(Author)
    • 2018(Publication Date)
    • Wiley
      (Publisher)
    Recall from Chapter 3 on descriptive statistics that a distribution of data is simply an organized dataset. Imagine a column of data which contains people's annual income, sorted from the smallest to the largest. That column is a distribution of data. In order to better visualize the data, suppose you take that column of income values and construct a histogram. With this histogram in mind, recall that a distribution of data has three important characteristics: (1) its shape, (2) its mean, and (3) its standard deviation.
    A Sampling Distribution is a particular type of dataset created by drawing different samples of the same size from a given population. Each time a new sample is drawn, a statistic (e.g., a mean or proportion) is calculated and added to the dataset. A complete Sampling Distribution contains statistics from all possible samples of the same size taken from a single population. If the population is finite (i.e., a fixed number of values), then the number of unique samples of a given size is also finite. When a population is infinite, then the number of unique samples is infinite and so is the Sampling Distribution. As long as population sizes are very large and sample sizes are relatively small, we can treat finite and infinite populations equivalently. We will discuss what is meant by “large” and “small” later in this chapter.
    To illustrate the concept of a Sampling Distribution, suppose the population of interest is a large statistics class of 100 students. Let us say we take a random sample of 10 students from this population and calculate the average grade point average (GPA). What we just calculated is a statistic, and that statistic is random because it comes from a random sample. If we drew another random sample of 10 students, we would likely get a different average GPA value. If everyone in the class were identical, different samples would lead to the same average GPA. Of course, this is not the case. The population is made up of all kinds of students, from bookworms to slackers. And each has the same chance of being in the sample of 10. When we say a unique sample, we mean a collection of 10 students that will not all be together in another sample. So, a single student, call him Johnny Crabcakes, can be part of many unique samples, but never with the same nine people more than once.
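The GPA example lends itself to a quick sketch. The Python below (hypothetical data; the GPA values are simulated, not real students) draws repeated samples of 10 from a class of 100 and shows that the sample mean changes from draw to draw, while the number of possible unique samples stays finite:

```python
import math
import random
import statistics

random.seed(7)

# Hypothetical population: GPAs for a class of 100 students.
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(100)]

# Each random sample of 10 students yields a (usually different) mean GPA.
for _ in range(3):
    sample = random.sample(population, k=10)  # sampling without replacement
    print(round(statistics.fmean(sample), 2))

# Finite population => finitely many unique samples of size 10: C(100, 10).
print(math.comb(100, 10))
```

The binomial coefficient printed at the end counts every possible "collection of 10 students", which is exactly the number of statistics a complete Sampling Distribution for this class would contain.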
  • Introductory Probability and Statistics

    Applications for Forestry and Natural Sciences (Revised Edition)

    • Robert Kozak, Antal Kozak, Christina Staudhammer, Susan Watts(Authors)
    • 2019(Publication Date)
    7     Sampling Distributions
    The Foundation of Inference
    As discussed earlier, one of the main purposes of statistics is to obtain information about a population by looking at partial or incomplete evidence that is provided by a subset of the population. It is simply too costly and time-consuming to consider every element of a population. Thus, we usually estimate one or more unknown characteristics (parameters) of the population by observing only a subset of the population (a sample) and by computing the appropriate statistics for characterizing the population from this sample. Since a sample is only a portion of a population, sample values (statistics) will change from sample to sample. In other words, the value of any statistic calculated from a sample is expected to vary. Despite this uncertainty, generalizations from a sample statistic to an unknown population parameter can be made with confidence if the probability distribution of the sample statistic is known. In this chapter, we will study the Sampling Distributions of means, proportions, differences of two means, differences of two proportions, variances and ratios of variances. These distributions will provide the basic tools to understand subsequent chapters of this book in which we will be dealing with the two most important practical applications of statistics: estimation and hypothesis testing.

    7.1    Sampling and Sampling Distributions

    Most statisticians agree that statistics calculated from simple random sampling usually provide sound, logical and reliable generalizations about population parameters. We will provide a formal definition of simple random sampling shortly, but for all intents and purposes, it can be thought of as randomly drawing sample elements from a hat. For reasons of simplicity, most procedures discussed in this book will be based on simple random sampling. However, it should be noted that in many practical situations, simple random sampling is not the preferred procedure; in fact, it is oftentimes undesirable. A brief discussion of common sampling techniques, such as systematic sampling, stratified random sampling and two-stage sampling, is provided in Chapter 13.
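The "drawing from a hat" picture of simple random sampling corresponds directly to `random.sample` in Python (a minimal sketch; the numbered population is assumed for illustration):

```python
import random

random.seed(42)

# A hat containing 100 numbered slips; simple random sampling means every
# subset of size n has the same chance of being drawn.
population = list(range(1, 101))
sample = random.sample(population, k=12)  # one simple random sample, n = 12
print(sorted(sample))
```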
  • Learning From Data

    An Introduction To Statistical Reasoning

    • Arthur Glenberg, Matthew Andrzejewski(Authors)
    • 2007(Publication Date)
    • Routledge
      (Publisher)
    The probability distribution can be used to calculate the probability that a future random sample of that specific size from that specific population will have a statistic (for example, M) that meets certain characteristics. There are multiple Sampling Distributions, because each time you change the statistic (for example, from M to s²), you specify a different Sampling Distribution; each time you change the sample size, you get a different Sampling Distribution; and each time you change the population from which the random samples are drawn (for example, from the population of CO content to the population of weights to the population of IQ scores), you get a different Sampling Distribution.
    There are many things that a Sampling Distribution is not. It is not a single number—it is a distribution. It is not the distribution of scores in a sample. Simply drawing a random sample (of any size) and constructing the relative frequency distribution of the scores in that sample does not give you a Sampling Distribution. Finally, a Sampling Distribution is not the distribution of scores in the population. What is it? It is the probability (relative frequency) distribution of a statistic computed from all possible random samples, all of the same size, all drawn from the same population.

    Two Sampling Distributions
    Constructing a Sampling Distribution of the Sample Mean

    Imagine a population with a total of 25 observations. The population consists of one 1, one 2, one 3, two 4s, three 5s, four 6s, six 7s, four 8s, and three 9s. Each of the 25 observations is written on a standard-size slip of paper, and the slips are well mixed in a hat. The relative frequency distribution of this population is illustrated as a histogram at the top of Figure 7.1. Note that the distribution is negatively skewed (skewed to the left). The mean of this population (computed in the usual way) is 6.16.
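The 25-slip population above can be written down directly, which makes it easy to check the stated mean of 6.16 and to approximate a sampling distribution of the mean by repeatedly drawing from the "hat" (the sample size of 5 below is an assumed choice for illustration):

```python
import random
import statistics

# The population from the text: one 1, one 2, one 3, two 4s, three 5s,
# four 6s, six 7s, four 8s, and three 9s -- 25 observations in all.
population = [1, 2, 3] + [4] * 2 + [5] * 3 + [6] * 4 + [7] * 6 + [8] * 4 + [9] * 3
print(len(population), statistics.fmean(population))  # 25 observations, mean 6.16

# Approximate the sampling distribution of the mean by repeatedly drawing
# slips from the hat (sample size 5 is an assumption for illustration).
random.seed(0)
means = [statistics.fmean(random.sample(population, k=5)) for _ in range(10_000)]
print(round(statistics.fmean(means), 2))  # centers near the population mean
```

Even though the population is negatively skewed, the distribution of the sample means is noticeably more symmetric, which previews the Central Limit Theorem.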
  • Practitioner's Guide to Statistics and Lean Six Sigma for Process Improvements
    • Mikel J. Harry, Prem S. Mann, Ofelia C. De Hodgins, Richard L. Hulbert, Christopher J. Lacke(Authors)
    • 2011(Publication Date)
    • Wiley
      (Publisher)
    GLOSSARY
    central-limit theorem The theorem from which it is inferred that for a large sample size (n ≥ 30), the shape of the Sampling Distribution of x̄ is approximately normal. Also, by the same theorem, the shape of the Sampling Distribution of p̂ is approximately normal for a sample for which np ≥ 10 and n(1 – p) ≥ 10.
    estimator A sample statistic that is used to estimate a population parameter.
    finite population correction factor Multiplier, denoted by √((N – n)/(N – 1)), that is used in calculating the standard deviation of the Sampling Distribution of p̂ or x̄ when the sample size is more than 5% of the population size.
    mean of p̂ The mean of the Sampling Distribution of p̂, denoted by μp̂, is equal to the population proportion p.
    mean of x̄ The mean of the Sampling Distribution of x̄, denoted by μx̄, is equal to the population mean μ.
    population distribution The probability distribution of the population data.
    population proportion p The ratio of the number of elements in a population with a specific characteristic to the total number of elements in the population.
    sample proportion The ratio of the number of elements in a sample with a specific characteristic to the total number of elements in that sample.
    Sampling Distribution of p̂ The probability distribution of all the values of p̂ calculated from all possible samples of the same size selected from a population.
    Sampling Distribution of x̄ The probability distribution of all the values of x̄ calculated from all possible samples of the same size selected from a population.
    sampling error
  • Mathematical Statistics with Resampling and R
    • Laura M. Chihara, Tim C. Hesterberg(Authors)
    • 2018(Publication Date)
    • Wiley
      (Publisher)
    The most likely outcome is 0.5, followed by 0.4 and 0.6, and so on – values farther from 0.5 are less likely. The distribution is bell shaped and centered approximately at 0.50; the sample mean is 0.504. The standard deviation is 0.157; we call the standard deviation of a statistic a standard error.
    Definition 4.1 Let X1, X2, …, Xn be a random sample and let T = T(X1, X2, …, Xn) denote some statistic. The Sampling Distribution of T is its probability distribution. ∥
    (Here the sample may be a vector; it may also represent samples from multiple groups or populations.) The permutation distributions in Chapter 3 are Sampling Distributions, as is the above example. The key point is that a Sampling Distribution is the distribution of a statistic that summarizes a data set and represents how the statistic varies across many random data sets. A histogram of one set of observations drawn from a population does not represent a Sampling Distribution. A histogram of permutation means, each from one sample, does represent a Sampling Distribution.
    Definition 4.2 The standard deviation of a Sampling Distribution is called the standard error. We use the notation SE to denote the standard error for the Sampling Distribution of a statistic and ŜE to denote any estimate of that standard error. ∥
    Let us consider another example.
    Example 4.1 Suppose a population consists of four numbers, 3, 4, 6, and 6; the population mean and standard deviation are μ = 4.75 and σ ≈ 1.299, respectively. If we draw samples of size n = 2 (with replacement), there are 16 samples, with the following sample means:

        3    3.5  4.5  4.5
        3.5  4    5    5
        4.5  5    6    6
        4.5  5    6    6

    The Sampling Distribution for the mean of samples of size 2 (with replacement) from the given population is shown in Figure 4.3
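The table of sample means can be reproduced exactly by enumeration. In the sketch below the four population values (3, 4, 6, and 6) are read off from the listed sample means; the code lists all 16 ordered samples of size 2 drawn with replacement and confirms that the mean of the sampling distribution equals the population mean:

```python
import itertools
import statistics

# Population inferred from the listed sample means: 3, 4, 6, 6.
population = [3, 4, 6, 6]
mu = statistics.fmean(population)  # 4.75

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))
means = [statistics.fmean(s) for s in samples]
print(sorted(means))

# The expected value of the sample mean equals the population mean.
print(statistics.fmean(means) == mu)  # True
```

This unbiasedness of the sample mean holds for any population, not just this small example.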
  • Statistical Inference

    A Short Course

    • Michael J. Panik(Author)
    • 2012(Publication Date)
    • Wiley
      (Publisher)
    As a practical matter, how do we actually extract a random sample from a (finite) population? Those readers interested in the details of this process can consult Appendix 7.A (using a table of random numbers). In addition, if we are provided with a data set, how do we know that it has been drawn randomly? A test to assess the randomness of a particular sample appears in Appendix 10.A.

    7.2 The Sampling Distribution of The Mean

    Suppose μ is some unknown population mean. We want to determine its “true value.” How should we proceed to find μ? A reasonable approach is to take a simple random sample and make an inference from the sample to the population. How is this inference made? We use an estimator for μ. Think of an estimator as a function of the sample values used to estimate μ. A logical estimator for μ is the sample mean X̄. (Specifying an estimator is a form of data reduction—we summarize the information about μ contained in a sample by determining some essential characteristic of the sample values. So for purposes of inference about μ, we employ the realization x̄ of X̄ rather than the entire set of observed data points.) Under random sampling, X̄ is a random variable and thus has a probability distribution that we shall term the Sampling Distribution of the mean—a distribution showing the probabilities of obtaining different sample means from random samples of size n taken from a population of size N (Fig. 7.1).
    Figure 7.1 Sampling Distribution of the mean.
    We shall call X̄ a point estimator for μ since it reports a single numerical value as the estimate of μ. The value of X̄ typically varies from sample to sample (since different samples of size n have different sample values). Hence, we cannot expect X̄ to be exactly on target—an error arises that we shall call the sampling error of X̄. It amounts to x̄ − μ. For some samples, the sampling error is positive while for others it will be negative; rarely will it be zero. Hence the values of X̄ will tend to cluster around μ, that is, the Sampling Distribution of the mean is concentrated about μ (Fig. 7.1).
    Is X̄ a “good” estimator for μ? It will be if X̄ has certain desirable properties (which we will get to as our discussion progresses). Interestingly enough, these so-called desirable properties are expressed in terms of the mean and variance of the Sampling Distribution of X̄.
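The behavior of the sampling error described above (positive for some samples, negative for others, rarely zero, clustering around zero) can be illustrated with a short simulation; the population below is hypothetical:

```python
import random
import statistics

random.seed(3)

# Hypothetical finite population of size N = 1000 with known mean mu.
population = [random.gauss(50, 10) for _ in range(1000)]
mu = statistics.fmean(population)

# Sampling error of the point estimator: xbar - mu, for many samples of n = 25.
errors = [
    statistics.fmean(random.sample(population, k=25)) - mu
    for _ in range(5_000)
]

print(round(statistics.fmean(errors), 3))  # near 0: xbar clusters around mu
print(sum(e > 0 for e in errors), sum(e < 0 for e in errors))  # both signs occur
```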
  • Understanding Statistics
    • Bruce J. Chalmer(Author)
    • 2020(Publication Date)
    • CRC Press
      (Publisher)
    4

    Some Distributions Used in Statistical Inference

    4.1    Knowing the Sampling Distribution of a statistic allows us to draw inferences from sample data.

    Distributions in statistical inference

    We noted in Chapter 2 that knowing the Sampling Distribution of a statistic is the key to using the statistic to draw inferences about the parameter of interest. We noted also that the central limit theorem assures us that statistics calculated by summing or averaging have (at least approximately) normal Sampling Distributions.
    In this chapter we discuss the details of how to use the normal distribution to find the proportion of individual scores in any given interval. This will lay the groundwork for the inferential procedures of estimation and hypothesis testing which we cover in Chapter 5 . We also discuss the binomial distribution, which is another important distribution used in statistical inference. We conclude the chapter with a description of the relationship between the binomial and normal distributions.

    4.2    The standard normal distribution is used to find areas under any normal curve.

    Characteristics of normal distributions

    In Chapter 3 we noted that the mean and standard deviation completely specify a normal distribution. That is, once you know the mean and standard deviation of a distribution known to be normal in shape, you can say exactly what proportion of scores in the distribution are in any given range. Let’s consider how this is done.
    First, it is handy to consider some general characteristics. (In fact, you will find it convenient to memorize these characteristics of a normal distribution, since you will be using them very frequently.) Refer to Figure 4.1
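The claim that the mean and standard deviation fully determine normal-curve proportions can be sketched numerically. The helper below (an illustration, not from the book) uses the error function to evaluate the normal CDF, so any "proportion of scores in a given range" question reduces to two function calls:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Proportion of scores within one standard deviation of the mean,
# the same for every normal distribution (about 0.6827):
print(round(normal_cdf(1) - normal_cdf(-1), 4))

# A specific normal, e.g. mu = 100, sigma = 15, proportion between 85 and 115:
print(round(normal_cdf(115, 100, 15) - normal_cdf(85, 100, 15), 4))
```

Standardizing to z-scores, as one does with a standard normal table, is exactly the `(x - mu) / sigma` step inside the helper.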
  • Statistics for the Behavioural Sciences

    An Introduction to Frequentist and Bayesian Approaches

    • Riccardo Russo(Author)
    • 2020(Publication Date)
    • Routledge
      (Publisher)
    The Student's t-distribution, and the application of the t-test to the above type of problem, will also be considered. The t-test, and its associated t-distribution, are used to examine hypotheses about means when the standard deviation of the population of the individual observations is not known. In the majority of cases where means are compared, we do not know the population standard deviation, so it is estimated using the sampled data, and the t-test is an appropriate way to test hypotheses about the mean. Then, we will show how to use sample data to construct intervals that have a given probability of containing the true population mean, to calculate indexes of the size of the effect of the independent variable on the dependent variable, and to use this in performing statistical power analysis for the one-sample t-test.

    7.2 The Sampling Distribution of the Mean and the Central Limit Theorem

    Imagine we know that the distribution of the population of individual scores in a manual dexterity test is normal with µ = 50 and σ = 8. Now imagine that we draw an infinite number of independent samples, each of 16 observations, from this population, and compute the mean of each sample. We then record these means and plot their values. What would the distribution of these sample means look like? It turns out that the distribution of these sample means (also called the Sampling Distribution of the mean) is normal with a mean of 50 (i.e., equal to the mean of the distribution of the population of the individual scores; the mean of the Sampling Distribution of the mean is usually denoted as μx̄) and a standard deviation of 2 (note that the standard deviation of the Sampling Distribution of the mean is usually called the standard error of the mean and is denoted as σx̄).
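The numbers in this passage (µ = 50, σ = 8, n = 16, standard error 8/√16 = 2) are easy to verify by simulation; the sketch below approximates the "infinite number of samples" with 20,000:

```python
import random
import statistics

random.seed(5)

# Population of individual scores: normal with mu = 50, sigma = 8.
# Draw many samples of n = 16 and record each sample's mean.
means = [
    statistics.fmean(random.gauss(50, 8) for _ in range(16))
    for _ in range(20_000)
]

print(round(statistics.fmean(means), 1))  # about 50: equals the population mean
print(round(statistics.stdev(means), 1))  # about 2.0: sigma / sqrt(n) = 8 / 4
```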
  • Probability and Statistics for Economists
    • Yongmiao Hong(Author)
    • 2017(Publication Date)
    • WSPC
      (Publisher)
    6.2. A community has five families whose annual incomes are 1, 2, 3, 4, and 5 respectively. Suppose a survey is to be made to two of the five families and the choice of the two is random. Find the Sampling Distribution of the sample mean of the family income. Give your reasoning clearly.
    6.3. Suppose the return of asset i is given by Ri = α + βiRm + Xi, where Ri is the return on asset i, α is the return on the risk-free asset, Rm is the return on the market portfolio which represents the market risk, and Xi represents an idiosyncratic risk peculiar to the characteristics of asset i. Assume 0 < βi < ∞.
    We consider an equal-weighting portfolio that consists of n assets. The return on such an equal-weighting portfolio is then given by Rp = n−1 Σi Ri = α + β̄Rm + X̄n, where β̄ = n−1 Σi βi and X̄n is the sample mean of the random sample Xn = (X1, · · ·, Xn). Assume the random sample Xn is an independent and identically distributed random sample with population mean µ and population variance σ2. Also, assume that Rm and Xn are mutually independent.
    The total risk of the equal-weighting portfolio is measured by its variance. (1) Show var(Rp) = β̄2 var(Rm) + σ2/n.
    That is, the risk of the portfolio contains a market risk and a component contributed by n individual risks;
    (2) Show that idiosyncratic risks can be eliminated by forming a portfolio with a large number of assets, that is, by letting n → ∞.
    6.4. Suppose there are k IID random samples from a population Bernoulli(p) distribution, with sample sizes equal to n1, · · ·, nk respectively. Assume these k random samples are independent of each other. Based on these k random samples, define k sample means X̄n1, · · ·, X̄nk, respectively. Define an overall sample mean X̄ = (Σi ni X̄ni)/(Σi ni). Find: (1) the mean of X̄; (2) the variance of X̄.
    6.5. Suppose Xn = (X1, · · ·, Xn) is an IID N(µ1, σ12) random sample, Y
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.