Mathematics

Sampling

Sampling is the process of selecting a subset of individuals or items from a larger population to make inferences or generalizations about the entire group. In mathematics, sampling is used to gather data for statistical analysis and to draw conclusions about a population based on the characteristics of the sample. It is a fundamental concept in probability and statistics.

Written by Perlego with AI-assistance

10 Key excerpts on "Sampling"

  • Statistics for Business
    12 Theory of Sampling
    12.1    Introduction
    In this chapter, we discuss the concepts of sampling and sampling distributions, which are the basis of statistical estimation and hypothesis testing. The main purpose of sampling is to allow us to use the information gathered from the sample to draw inferences about the entire population. One can define a ‘population’ as a collection of objects having a certain well-defined set of attributes. A ‘sample’ is any subset of a given population. It is possible to estimate the population parameters from the limited sample statistics with the help of statistical methods and concepts. This falls under the category of statistical inference (i.e., inductive statistics). The inferential process is not error free because the estimation or inference is based on the limited data obtained from samples.
    We should evaluate such errors to have a measure of confidence in our inferences. If we take random samples, these errors occur randomly, and thus they can be quantified probabilistically.
    In this chapter, the concepts of sampling are developed, sampling distributions for sample statistics such as the sample mean and proportion are described, and well-known sampling distributions such as the chi-square, F, t, and standard normal distributions are introduced. These distributions describe certain sample statistics that play a major role in estimation and hypothesis testing.
    12.2    Why Sample?
    In many situations, even though we are interested in some characteristic of a specific population, we cannot physically examine the entire population because of cost, time, or other limitations. In such instances, we examine a part of the population by means of a sample, with the expectation that it will be representative of the population under study.
    12.3    How to Choose It?
    One way is to use simple random sampling. Simple random sampling gives every possible sample of the specified size an equal chance of being selected. From a given random sample, one can compute a sample statistic such as the mean or variance and use it to estimate the corresponding population parameter. Every statistic is a random variable with its own probability distribution. The probability distribution of a sample statistic is known as a ‘sampling distribution’. It has well-defined properties like any probability model. Based on these properties, one can evaluate the chance errors involved in drawing an inference from a sample.
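The idea of a sampling distribution can be made concrete with a small simulation. The following is a minimal sketch, not taken from the excerpt: the population (the integers 1 to 100), the sample size, and the helper name `sampling_distribution_of_mean` are all assumptions chosen for illustration. It repeatedly draws simple random samples without replacement and collects the sample means.

```python
import random
import statistics


def sampling_distribution_of_mean(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n (without replacement)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]


# Hypothetical population: the integers 1..100, whose true mean is 50.5.
population = list(range(1, 101))
means = sampling_distribution_of_mean(population, n=10, draws=5000)

print("population mean:", statistics.fmean(population))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("spread of sample means:", round(statistics.stdev(means), 2))
```

The collected means approximate the sampling distribution of the sample mean: they centre on the population mean, and their spread is the chance error the excerpt refers to.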
  • Essential Statistics for Non-STEM Data Analysts
    Get to grips with the statistics and math knowledge needed to enter the world of data science with Python

    Chapter 4 : Sampling and Inferential Statistics
    In this chapter, we focus on several difficult Sampling techniques and basic inferential statistics associated with each of them. This chapter is crucial because in real life, the data we have is, most likely, only a small portion of a whole set. Sometimes, we also need to perform Sampling on a given large dataset. Common reasons for Sampling are listed as follows:
    • The analysis can run quicker when the dataset is small.
    • Your model doesn't benefit much from having gazillions of pieces of data.
    Sometimes, you also don't want Sampling. For example, Sampling a small dataset with sub-categories may be detrimental. Understanding how Sampling works will help you to avoid various kinds of pitfalls.
    The following topics will be covered in this chapter:
    • Understanding fundamental concepts in Sampling techniques
    • Performing proper Sampling under different scenarios
    • Understanding statistics associated with Sampling
    We begin by clarifying the concepts.

    Understanding fundamental concepts in Sampling techniques

    In Chapter 2 , Essential Statistics for Data Assessment , I emphasized that statistics such as mean and variance were used to describe the population. The intent is to help you distinguish between the population and samples. With a population at hand, the information is complete, which means all statistics you calculated will be authentic since you have everything. With a sample, the information you have only relates to a small portion, or a subset of the population.
    What exactly is a population?
    A population is the whole set of entities under study
  • Research Methods for Public Administrators
    • Gary Rassel, Suzanne Leland, Zachary Mohr, Elizabethann O'Sullivan (Authors)
    • 2020 (Publication Date)
    • Routledge (Publisher)
    Perhaps its most widespread use is in survey research. Sample surveys are used to provide statistical data on a wide range of subjects for research and administrative purposes. A relatively small number of individuals are interviewed in order to gather data that will allow an investigator to find out something about the larger population. Given today’s widespread use of surveys, people are often surprised to learn that the sample survey has a relatively short history. Even at the beginning of the 20th century, statisticians debated whether anything less than investigation of a complete population was acceptable. Sampling has since become widely accepted, and a number of techniques have been developed and refined. Sampling involves several interrelated factors. These include the type of sample, its size, the population of interest, the accuracy desired, and the confidence the investigator wishes to have in the results. We will discuss these topics in this chapter. Before describing the methods and techniques of sampling, however, we need to introduce and define a number of terms.
    Defining the Population
    A sample is a subset of units selected from a larger set of the same units. They are the units studied and provide data for use in estimating the characteristics of the larger set. For example, polling organizations, such as the Gallup Poll, use samples of about 1,500 or fewer people to describe the opinions of over 200 million Americans. The population is the total set of units in which the investigator is interested, that is, the larger set from which the sample is drawn. The population’s characteristics and the relationships among these characteristics are inferred from the sample data. Investigators wish to generalize from the sample units studied to the entire population of units
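To see why a sample of about 1,500 can describe a very large population, it helps to relate sample size, accuracy, and confidence numerically. The sketch below uses the standard large-sample margin of error for a proportion under simple random sampling. The 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and the function name `margin_of_error` are assumptions for illustration; this is not a description of how any particular polling organization computes its figures.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling from a population large enough
    that the finite population correction can be ignored.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 1500, 6000):
    print(f"n = {n:>5}: about ±{100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the margin depends on the sample size rather than on the population size, and quadrupling the sample only halves the margin.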
  • Dissertation Research Methods
    A Step-by-Step Guide to Writing Up Your Research in the Social Sciences

    • Philip Adu, D. Anthony Miles (Authors)
    • 2023 (Publication Date)
    • Routledge (Publisher)
    Sekaran & Bougie, 2013 ).
    A sampling frame is a list that identifies the individual elements of the population. A sampling frame should contain all the elements of the population. It is from a frame, which is a list of all the units of the population to be surveyed, that a sample is selected. The frame should contain all the units of the population under consideration (Pedhazur & Pedhazur Schmelkin, 1991; Kerlinger & Lee, 1999; Rao, 2000; Thomas & Brubaker, 2000). A sampling design is a mathematical function that gives you the probability of any given sample being drawn. Considering that sampling is the foundation of nearly every research project, the study of sampling design is a crucial part of statistics. Sampling design involves not only learning how to derive the probability functions that describe a given sampling method but also understanding how to design a best-fit sampling method for a real-life situation (Glen, 2020).
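As an illustration of a sampling design as a function from samples to probabilities, the sketch below enumerates the design for simple random sampling without replacement from a tiny frame. The five-unit frame and the function name `srs_design` are hypothetical and chosen only to keep the enumeration small.

```python
from itertools import combinations
from math import comb


def srs_design(units, n):
    """Return {sample: probability} for simple random sampling
    without replacement: every size-n subset is equally likely."""
    samples = list(combinations(units, n))
    probability = 1 / len(samples)
    return {sample: probability for sample in samples}


units = ["A", "B", "C", "D", "E"]  # hypothetical sampling frame
design = srs_design(units, 2)

# Inclusion probability of unit "A": total probability of samples containing it.
incl_A = sum(p for sample, p in design.items() if "A" in sample)

print("number of possible samples:", len(design), "=", comb(len(units), 2))
print("probability of each sample:", design[("A", "B")])
print("inclusion probability of A:", round(incl_A, 3))
```

For this design the inclusion probability of any unit equals n/N, here 2/5. Other designs, such as stratified or cluster sampling, would assign different probabilities to the same set of candidate samples.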
    A sampling error is a statistical error that occurs when a sample used in a study does not represent the entire population. More formally, sampling error measures the departure of the possible estimates of a probability sampling procedure from the population quantity being estimated. An important feature of probability sampling is that, in addition to providing an estimate of the unknown population quantity, it enables the assessment of the sampling error of the estimate, the standard error. These errors arise in the process of sampling, that is, of analyzing a selected number of observations from a larger population. For a single sample, the sampling error is the difference between the sampled value and the true or total population value (Rao, 2000; Calabrese, 2009; Beins & McCarthy, 2012; Lepcha, 2022
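The distinction between the sampling error of one sample and the standard error estimated from that sample can be sketched as follows. The population here is simulated (hypothetical heights in centimetres), so the true mean is known and the sampling error can be computed directly. In a real study only the standard error would be available.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 heights (cm); values are simulated.
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# One simple random sample of 100 units.
sample = rng.sample(population, 100)
estimate = statistics.fmean(sample)

# Sampling error: sampled value minus the true population value.
sampling_error = estimate - true_mean

# Standard error: estimated from the sample alone, s / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print("sampling error of this sample:", round(sampling_error, 2))
print("estimated standard error:", round(standard_error, 2))
```

The standard error indicates the typical size of the sampling error across repeated samples, which is why probability sampling allows the error to be assessed without knowing the population value.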
  • Research Methodology
    Techniques and Trends

    6 Sampling Design
    DOI: 10.1201/9781315167138-6

    6.1 Introduction to Sampling

    Where it is not possible to study the entire population, researchers use sampling. For a variety of reasons, researchers usually cannot make direct observations of every unit of the population they are studying. Instead, they collect data from a subset of the population, called a sample, and use these observations to make inferences about the entire population.
    Ideally, the characteristics of a sample should correspond to the characteristics of a population from which the sample was drawn. In that case, the conclusions drawn from a sample are probably applicable to the entire population.
    Sampling is the backbone of marketing research. In this chapter, you will be introduced to various sampling concepts. A brief mention of sampling and nonsampling errors will be made. The various probability and nonprobability sampling designs applicable to marketing research will be introduced. Since the choice of sample size involves various elements such as time, money, and accuracy, an important decision when taking a sample is how large it should be. Therefore, the determination of sample size will also be discussed.

    6.2 Basic Definitions and Concepts

    Researchers usually cannot make direct observations of every individual in the population under study. Instead, they collect data from a subset of individuals, called a sample, and make inferences about the entire population using those observations.

    6.2.1 Element

    The unit about which information is collected is called an element; it provides the basis for analysis according to a well-defined procedure. Elements should be well defined, and it is important that they can be identified physically. For example, in a retail stores survey, a shop may be considered as a unit, whereas in a family budget enquiry a household may be treated as a unit.
  • The Essentials of Biostatistics for Physicians, Nurses, and Clinicians
    • Michael R. Chernick (Author)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    CHAPTER 2 Sampling from Populations
    One of the key aspects of statistics and statistical inference is to draw conclusions about a population based on a sample. Our ability to make good inferences requires an intelligent design and must include some form of random Sampling. Random Sampling is needed so that the sample can be analyzed based on the probability mechanism that generates the sample. This way, estimates based on the sample data can be obtained, and inference drawn based on the probability distribution associated with the sample.
    To illustrate, suppose we select five students at random from a math class of 40 students. We will formally define random sampling later. If we give a math test to these students based on the material they have studied in the class, and we average the five scores, we will have a prediction of what the class average for that test will be. This prediction will be unbiased (meaning that if we repeatedly took samples of five and averaged them, the average of the averages would approach the class average).
    In practice, we do not repeat the process, but we do draw inference based on the properties of the Sampling procedure. On the other hand, suppose we selected the five students to be the ones with the highest class average thus far in the class. In that case, we would not have a random sample, and the average of this group could be expected to be higher than the class average. The amount that it is higher is the bias of the prediction. Bias is something we want to avoid because usually we cannot adjust our estimate to get a good prediction.
    In addition to bias (which can be avoided by randomization), an estimate or prediction will have a variance. The variance is a measure of the variability in estimates that would be obtained by repeating the sampling process. While bias cannot be controlled by the sample size, the variance can. The larger the sample size is, the smaller is the variance of the estimate or, in the example above, of the prediction of the class average.
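The class example can be simulated to show both effects. This is a minimal sketch under stated assumptions: the 40 test scores are randomly generated stand-ins rather than real data, and the helper names `random_sample_mean` and `top_n_mean` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical test scores for a class of 40 students.
scores = [rng.randint(40, 100) for _ in range(40)]
class_mean = statistics.fmean(scores)


def random_sample_mean(n):
    """Average score of n students chosen at random (without replacement)."""
    return statistics.fmean(rng.sample(scores, n))


def top_n_mean(n):
    """Average score of the n highest-scoring students (a non-random sample)."""
    return statistics.fmean(sorted(scores, reverse=True)[:n])


reps = 10_000
means_5 = [random_sample_mean(5) for _ in range(reps)]
means_20 = [random_sample_mean(20) for _ in range(reps)]

print("class average:", round(class_mean, 2))
print("average of random-sample averages (n=5):", round(statistics.fmean(means_5), 2))
print("bias of top-5 selection:", round(top_n_mean(5) - class_mean, 2))
print("variance of estimate, n=5:", round(statistics.variance(means_5), 2))
print("variance of estimate, n=20:", round(statistics.variance(means_20), 2))
```

The random-sample averages centre on the class average (no bias), the top-five selection overshoots it by a fixed amount that no increase in repetitions removes, and the variance shrinks when the sample size grows from 5 to 20.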
  • Interpreting Statistics for Beginners
    A Guide for Behavioural and Social Scientists

    • Vladimir Hedrih, Andjelka Hedrih (Authors)
    • 2022 (Publication Date)
    • Routledge (Publisher)
    nonprobabilistic Sampling procedures use a variety of Sampling procedures in which selection is not based on chance or probability.
    At this point it is also important to point to the concept of resampling. Resampling refers to a number of procedures where a new sample or samples are created from an existing sample, typically in order to simulate what would happen if additional samples were drawn from the same population. As we are aware that no matter how we create a sample its properties may more or less differ from the population, resampling can be a convenient way to assess the likely magnitude of these differences and thus make more precise inferences about what might be expected from future studies of the same topic using different samples. While there are many different methods of resampling, those most commonly seen in scientific research and statistical software include bootstrapping, jackknifing and cross validation.
    Bootstrapping is the name of a set of procedures where a (typically large) number of new samples is sampled with replacement from the one sample at hand (e.g. Good, 2006). Essentially, the existing sample is treated as if it were the population and then a number of new samples are drawn from the entities in it. This is usually done by using random sampling (see later), and because it is done through sampling with replacement there is no limit on the number of new samples that can be drawn or on their size. Bootstrapping is most commonly used in procedures for making inferences about parameters (i.e. values of statistical indicators in the population) based on the statistics calculated from the sample. In a way, bootstrapping can be taken to simulate what would happen if we drew a large number of samples from the same population, with the only difference being that here we are not sampling from the population but from a sample taken from that population. This will be discussed in more detail in the part of this book about inferential statistics. The bootstrapping procedure was first proposed by Efron (1979)
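A basic bootstrap of the sample mean can be sketched in a few lines. The ten data values, the number of resamples, and the function names `bootstrap_means` and `percentile_interval` are assumptions for illustration. The percentile interval shown is one simple variant among several bootstrap interval methods, not necessarily the one the book goes on to describe.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    # rng.choices draws WITH replacement, so the same value can recur.
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


sample = [12, 15, 9, 20, 14, 17, 11, 13, 16, 18]  # hypothetical observations
boot = bootstrap_means(sample)
low, high = percentile_interval(boot)

print("sample mean:", statistics.fmean(sample))
print("bootstrap 95% percentile interval:", (round(low, 2), round(high, 2)))
```

Here the existing sample plays the role of the population, exactly as the passage describes, and the spread of the resampled means stands in for the spread that repeated sampling from the real population would produce.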
  • Survey Methods in Social Investigation
    • C.A. Moser, G. Kalton (Authors)
    • 2017 (Publication Date)
    • Routledge (Publisher)
    A decision to cover only a sample, rather than every member, of a population means leaving the field of description and certainty and entering that of inference and probability. An extreme illustration will make this point clear. Suppose a survey in a factory employing 3,000 workers is conducted to estimate the proportion of workers who smoke cigarettes. If coverage is complete, i.e. if every worker is interviewed, and 1,500 are found to be cigarette smokers, one can state as a fact that the proportion of smokers in the population is 50 per cent. Yet if we interviewed only 2,998 of the workers and found 1,499, that is 50 per cent, to be cigarette smokers, the proportion in the whole population, i.e. in the 3,000 workers, would still be taken as 50 per cent, but this would be an estimate and not a statement of fact. Knowledge regarding some of the population members would be lacking; any conclusion about the population must be given in terms of probability.
    Secondly, it must be emphasized that the principles outlined in this chapter rest on the assumption that a random method of selection (or, in American terminology, probability sampling; see Chapter 5) is employed. A random method is one in which each member of the population has a known (and non-zero) chance of being selected into the sample. To a sample selected by non-random methods the theory and its convenient consequences cannot be applied.
    4.3. Accuracy, bias and precision
    The basic ideas of sampling are best made clear by considering a small model population and confining ourselves to what is called simple random sampling, a method of selection whereby each possible sample of n units from a population of N units has an equal chance of being selected
  • Basic Statistical Methods and Models for the Sciences
    Chapter 3 Sampling and Descriptive Statistics
    3.1 Representative and Random Samples
    Much of the data that we gather to gain information can be thought of as a sample from some larger collection of objects, which we will refer to as the target population. When we sample from such a population we would like the sample to be representative for some attribute. By this we mean that we want the proportion of objects in the sample with this attribute to be close to the proportion of the population with this attribute, because it is usually the population proportion with the given attribute that is of interest. For instance, just recently we had good reason to believe that our dog (a not too discriminating greyhound) had eaten an entire container of Tums (this is the same dog that chewed a full tube of crazy glue). The veterinarian took a sample of his blood. Unless this sample’s proportion of calcium (the main ingredient in Tums) was likely to be close to the proportion of calcium in his entire bloodstream, it might be difficult to decide on the proper treatment
  • Quantitative Research Methods in Communication
    The Power of Numbers for Social Justice

    • Erica Scharrer, Srividya Ramasubramanian (Authors)
    • 2021 (Publication Date)
    • Routledge (Publisher)
    In order to proceed with the logic and a bit of the math behind these processes, it is necessary first to define parameters and statistics. A parameter is what a variable would look like if you were able to know how it exists fully in the entire population. A statistic is what the variable looks like in the sample you have drawn for your research. Now we will talk about how to determine the distance between those two things: how likely is it that the statistic matches the parameter? The difference between the population parameter and the sample statistic is sampling error.
    Figure 4.8 Types of sampling methods. Source: Stephen Warren
    With probability samples, you can actually use some basic mathematical principles to estimate sampling error, so that you know just how likely your results are to stand in well for a larger population. Nonprobability samples, alas, simply do not have the necessary qualities to allow for these calculations. What are those qualities, you ask? The most important quality is that for probability samples, the central limit theorem shows that taking multiple samples from the population, and thereby deriving multiple statistics on a variable of interest, will result in statistics that cluster around the population parameter. Once enough samples are selected from the population, those statistics will be distributed in a bell-shaped curve with what we call a normal distribution, clustered around the population value and then tailing out from that center point on each side, as seen in Figure 4.10. A real-world example should help illustrate this point. Assume that we sample 100 women residents of a local town and we ask them a yes or no question: Have you ever experienced sexual harassment on the job? In that sample, we yield a statistic that shows 38% answered in the affirmative. We now sample another 100 women, and this time 45% say they have been sexually harassed. We sample again, with 200 participants this time, and 40% say yes
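The clustering of sample statistics around the population value can be checked with a short simulation. The sketch below assumes a hypothetical true proportion of 40% answering "yes" in the population. That figure is an assumption for illustration, not a value reported in the excerpt, and the function name `sample_proportions` is likewise hypothetical.

```python
import random
import statistics


def sample_proportions(p_true, n, n_samples, seed=0):
    """Simulate n_samples independent samples of size n from a population in
    which a share p_true answers 'yes'; return each sample's proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_true for _ in range(n)) / n
        for _ in range(n_samples)
    ]


# Assumed population proportion of 0.40, samples of 100 respondents each.
props = sample_proportions(p_true=0.40, n=100, n_samples=2000)

center = statistics.fmean(props)
spread = statistics.stdev(props)
share_within_2sd = sum(abs(p - 0.40) <= 2 * spread for p in props) / len(props)

print("average of sample proportions:", round(center, 3))
print("standard deviation of sample proportions:", round(spread, 3))
print("share within two standard deviations:", round(share_within_2sd, 3))
```

Individual samples give different answers, as in the 38%, 45%, and 40% results above, but across many samples the proportions pile up around the population value in a roughly bell-shaped pattern.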
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.