Mathematics

Normal Distribution Hypothesis Test

A normal distribution hypothesis test is a statistical method for determining whether a given set of data follows a normal distribution. It compares the sample data with a theoretical normal distribution to assess how likely it is that the sample was drawn from a normally distributed population. Such tests are commonly used to check the normality assumption that underlies many standard statistical procedures.
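As a minimal sketch of such a test, the Shapiro-Wilk test (one common normality test, here via SciPy) can be run on a sample; the data and parameters below are purely illustrative.

```python
import numpy as np
from scipy import stats

# Synthetic sample that is normal by construction (illustrative numbers).
rng = np.random.default_rng(42)
sample = rng.normal(loc=50.0, scale=5.0, size=200)

# H0: the data come from a normally distributed population.
w_stat, p_value = stats.shapiro(sample)

# A small p-value (< 0.05) would be evidence against normality.
reject_normality = p_value < 0.05
```

A W statistic close to 1 and a large p-value are consistent with the null hypothesis of normality; other tests (Anderson-Darling, Kolmogorov-Smirnov) follow the same accept/reject pattern.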

Written by Perlego with AI-assistance

11 Key excerpts on "Normal Distribution Hypothesis Test"

  • Understanding Least Squares Estimation and Geomatics Data Analysis
    • John Olusegun Ogundare(Author)
    • 2018(Publication Date)
    • Wiley
      (Publisher)
    [Figure 3.11: Bivariate normal probability density function.]

    3.5 Concepts of Statistical Hypothesis Tests

    An assumption, guess, or statement that one makes about the unknown parameters of the probability distribution of a population, or about the distribution itself, is called a statistical hypothesis. This assumption or guess may or may not be true. A statistical hypothesis can concern whether or not a population has a particular probability distribution, or whether a parameter such as the mean or the variance takes a particular value. A decision-making procedure that consists of making statistical hypotheses and carrying out statistical tests to determine whether these hypotheses are acceptable is known as a statistical hypothesis test (or test of significance). Mathematical and stochastic models used in adjustments are based on some statistical hypotheses, with statistical tests conducted to verify those hypotheses.
    Two complementary hypotheses are commonly made and tested: the null hypothesis (or statistical hypothesis), H0, and the alternative hypothesis, HA. The null hypothesis, H0, is a specific statement (or theory) about a property of a population that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. This hypothesis must be specific enough to be testable. The alternative hypothesis, HA, is the statement that a statistical hypothesis test is set up to establish; it is the statement to be accepted if H0 fails. The H0 and HA hypotheses are formulated in terms of population parameters, not sample statistics or values estimated from the sample. For example, the hypotheses can be set up as H0: μ = 0 vs. HA: μ > 0 or HA: μ < 0 (but not as H0: x̄ = 0 vs. HA: x̄ > 0 or HA: x̄ < 0, since one cannot make an assumption about a quantity like the sample mean x̄, which is computed from the sample)
  • Probability, Statistics, and Reliability for Engineers and Scientists
    • Bilal M. Ayyub, Richard H. McCuen(Authors)
    • 2016(Publication Date)
    • CRC Press
      (Publisher)
    The alternative hypothesis of step 1 provides for a difference between specified populations or parameters. To test the two hypotheses, it is necessary to develop a test statistic that reflects the difference suggested by the alternative hypothesis. The computed value of a test statistic varies from one sample to the next; therefore, the test statistic is a random variable and has a sampling distribution. A hypothesis test should be based on a theoretical model that defines the distribution function of the test statistic and the parameters of the sampling distribution. Based on the distributions of the test statistic, probability statements about computed values may be made.
    Theoretical models are available for all of the more frequently used hypothesis tests. In cases where theoretical models are not available, approximations have usually been developed. In any case, a model or theorem that specifies the test statistic, its distribution, and its parameters must be found. The test statistic reflects the hypotheses and the data that are usually available. Also, because the test statistic is a random variable, it has a distribution function that is defined by a functional form and one or more parameters.
    Examples of test statistics were introduced in Chapter 8: specifically, the Z statistic (Equation 8.31) for a mean value and the Z statistic (Equation 8.33) for comparing two means. A number of other test statistics will be introduced in this chapter.

    9.2.3 Step 3: Level of Significance

    A set of hypotheses was formulated in step 1. In step 2, a test statistic and its distribution were selected to reflect the problem for which the hypotheses were formulated. In step 4, data will be collected to test the hypotheses. Before the data are collected, however, it is necessary to provide a probabilistic framework for accepting or rejecting the null hypothesis and, subsequently, making a decision. The framework should reflect the allowance to be made for the chance variation that can be expected in a sample of data. This chance variation is referred to as sampling variation.
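The probabilistic framework described above can be sketched in code: pick a significance level, derive the matching critical value, and compare it with the computed statistic. The computed z of 2.31 below is hypothetical, purely for illustration.

```python
from statistics import NormalDist

# Step 3 sketch: significance level and two-sided critical value for a z test.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided 5% critical value, ~1.96

z_computed = 2.31          # hypothetical value of the test statistic
reject_h0 = abs(z_computed) > z_crit
```

The significance level alpha is exactly the allowance made for sampling variation: under H0, a statistic beyond the critical value occurs by chance only alpha of the time.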
  • Essential Mathematics and Statistics for Forensic Science
    • Craig Adam(Author)
    • 2011(Publication Date)
    • Wiley
      (Publisher)
    t-test for the statistical comparison of experimental data, which will be discussed later in the chapter.
    9.1 The normal distribution
    We saw in Section 6.3 how frequency data, represented by a histogram, may be rescaled and interpreted as a probability density histogram. This continuous function emerges as the column widths in the histogram are reduced and the stepped appearance transforms into a smooth curve. This curve may often be described by some mathematical function of the x-axis variable, which is the probability density function. Where the variation around the mean value is due to random processes, this function is given by an exact expression called the normal or Gaussian probability density function. The obvious characteristic of this distribution is its symmetric “bell-shaped” profile centred on the mean. Two examples of this function are given in Figure 9.1.
    Interpreting any normal distribution in terms of probability reveals that measurements around the mean value have a high probability of occurrence while those further from the mean are less probable. The symmetry implies that results greater than the mean will occur with equal frequency to those smaller than the mean. On a more quantitative basis, the width of the distribution is directly linked to the standard deviation as illustrated by the examples in the figure. To explore the normal distribution in more detail it is necessary to work with its mathematical representation, though we shall see later that to apply the distribution to the calculation of probabilities does not normally require mathematics at this level of detail. This function is given by the following mathematical expression, which describes how it depends on both the mean value μ and the standard deviation σ:
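The expression referred to above is the standard Gaussian density, f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)), and it is short enough to compute directly; the evaluation points below are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal (Gaussian) probability density with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

peak = normal_pdf(0.0, 0.0, 1.0)   # maximum of the standard normal, ~0.3989
tail = normal_pdf(3.0, 0.0, 1.0)   # three standard deviations out, much smaller
```

The symmetry noted in the text falls out of the formula: only (x − μ)² appears, so points equally far above and below the mean have equal density.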
  • Business Statistics For Dummies
    • Alan Anderson(Author)
    • 2023(Publication Date)
    • For Dummies
      (Publisher)
    In many business applications, variables are assumed to be normally distributed. For example, returns to stocks are often assumed to be normally distributed by investors, portfolio managers, financial analysts, risk managers, and so on. The assumption of normality is not only convenient, but many standard statistical techniques require it in order to generate valid results. For example, computing a confidence interval for the mean of a population may be based on the normal distribution. Many of the techniques used in regression analysis to check the validity of the results are based on the normal distribution. As a result, even when the assumption of normality is not perfectly accurate, the normal distribution is often used to perform statistical analyses due to its convenience.

    Getting to know the standard normal distribution

    The standard normal distribution is the special case where μ = 0 and σ = 1. For example, suppose that the daily returns to a stock follow the standard normal distribution. The mean return over a single trading day is 0 percent, and the standard deviation is 1 percent; as a result:
    • The probability that tomorrow’s return will be between −1 percent and +1 percent is 0.6827 or 68.27 percent. −1 percent represents one standard deviation below the mean, while +1 percent represents one standard deviation above the mean.
    • The probability that tomorrow’s return will be between −2 percent and +2 percent is 0.9544 or 95.44 percent. −2 percent represents two standard deviations below the mean, while +2 percent represents two standard deviations above the mean.
    • The probability that tomorrow’s return will be between −3 percent and +3 percent is 0.9973 or 99.73 percent. −3 percent represents three standard deviations below the mean, while +3 percent represents three standard deviations above the mean.
    By convention, the letter Z represents a standard normal random variable, whereas the letter X represents any other normal random variable.
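The three probabilities quoted above can be checked directly with the standard normal cumulative distribution function, available in Python's standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

within_1sd = Z.cdf(1) - Z.cdf(-1)  # ~0.6827
within_2sd = Z.cdf(2) - Z.cdf(-2)  # ~0.9545
within_3sd = Z.cdf(3) - Z.cdf(-3)  # ~0.9973
```

(The middle value is 0.95450 to five places; the 0.9544 in the text is the same quantity truncated rather than rounded.)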

    Computing standard normal probabilities

    One approach to computing probabilities for the standard normal distribution is to use statistical tables. (For the mathematically inclined, the tables result from applying calculus to the normal distribution.) The standard normal table is designed to show cumulative probabilities; i.e., the probability that a standard normal random variable Z is less than or equal to a specified value, such as P(Z ≤ 2.50). Standard normal tables are divided into two parts; the first shows positive values for Z , and the second shows negative values for Z
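A table row such as P(Z ≤ 2.50) is just an evaluation of the cumulative distribution function, and the reverse lookup (finding the z value for a given cumulative probability) is the inverse CDF; both are sketched below with the standard library.

```python
from statistics import NormalDist

Z = NormalDist()
p = Z.cdf(2.50)                 # P(Z <= 2.50), ~0.9938, as a table would show
z_upper_5pct = Z.inv_cdf(0.95)  # z with 95% of the area to its left, ~1.645
```

The negative-Z half of the table is redundant in code: by symmetry, P(Z ≤ −z) = 1 − P(Z ≤ z).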
  • Econometrics
    • K. Nirmal Ravi Kumar(Author)
    • 2020(Publication Date)
    • CRC Press
      (Publisher)
    In general, there are two basic types of distributions: the normal distribution and the t distribution. Of the two, the normal distribution is the ideal model (Figures 7.3, 7.5). The t distribution is flatter than the normal distribution, and its shape varies with the sample size. The t distribution is used when the sample size is less than 30; as the sample size increases, it loses its flatness and gradually approaches the normal distribution. Thus, the normal distribution is the appropriate probability distribution when the sample size exceeds 30.
    • After calculating the statistic (say, a correlation coefficient), conduct a significance test (say, a t test) to estimate the significance of the statistic.
    • Obtain the table (critical) value from the statistical tables at a given df (say, n − 2 df for two variables, e.g. the price of a commodity and its quantity demanded) and at a given level of significance.
    • Compare the calculated t value with the table (critical) value to make the decision. If the calculated t value exceeds the table value, reject H0 and accept HA; otherwise, accept H0.
    • To generalize the sample statistic back to the population parameter, compute confidence intervals.
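The convergence of the t distribution toward the normal can be seen by comparing their critical values (here with SciPy; the degrees of freedom chosen are illustrative):

```python
from scipy import stats

# Two-sided 5% critical values: the t critical value shrinks toward the
# normal z value of ~1.96 as the degrees of freedom grow.
t_crit_small = stats.t.ppf(0.975, df=5)     # noticeably larger than 1.96
t_crit_large = stats.t.ppf(0.975, df=1000)  # essentially equal to 1.96
z_crit = stats.norm.ppf(0.975)
```

This is why t tables stop tabulating beyond moderate df: past roughly 30 degrees of freedom, the normal critical value is an adequate substitute.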
  • Integrative Statistics for the Social and Behavioral Sciences
    PART II

    INTRODUCTION TO HYPOTHESIS TESTING

    6.   Sampling Distribution of the Mean and the Single-Sample z Statistic
    7.   Inferential Statistics
    8.   Single-Sample Tests
    9.   Two-Sample Tests

    CHAPTER 6

    Sampling Distribution of the Mean and the Single-Sample z Statistic

      How Are Sampling Distributions Created?
    Single-Sample z Statistic
    Summary Chapter 6 Homework  
    Get ready to do your first “real” statistical assessment. You will now be able to statistically answer questions such as “Is this child’s birth weight normal?” and “Are the murder rates of my city higher than the national average?” Before we get to those answers, we need to provide some conceptual groundwork so you know how we are able to answer these questions statistically.
    In the previous chapter, you learned about the normal distribution, standardized (z) scores, and probability. The definition of a normal distribution and the concepts of probability will be important throughout this book since many inferential tests require that your data be normally distributed so that the probability of obtaining a particular result by chance can be calculated. Thus, you will repeatedly calculate statistics on your data but interpret them based on the likelihood of obtaining that statistic simply due to chance, not because of your experimental manipulations. If the chance (or odds) is low, then you will interpret your results as unlikely due to chance or more likely as being the result of your experimental manipulation.
    The best way to summarize data from a sample is to look at the mean and standard deviation of a sample, and the best way to compare a sample and a population is to compare their means. Thus, instead of comparing one score to a population mean, we want to compare a sample mean to a population mean, and we want to evaluate the difference between the sample and the population means based on chance. How do we do that? Essentially, we need to evaluate our calculated statistic (the difference between the sample and the population means divided by the standard deviation of the population) with an appropriate sampling distribution.
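The comparison described above (sample mean vs. population mean, scaled by sampling variability) can be sketched with the birth-weight question from the chapter opener; all numbers here are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical population of birth weights, in grams.
mu, sigma = 3500.0, 500.0
# Hypothetical sample: 25 newborns with mean 3400 g.
x_bar, n = 3400.0, 25

# Standardize the gap between sample mean and population mean
# using the standard deviation of the sampling distribution, sigma/sqrt(n).
z = (x_bar - mu) / (sigma / math.sqrt(n))       # = -1.0

# Probability of a gap at least this large in either direction, by chance.
p_two_sided = 2 * NormalDist().cdf(-abs(z))
```

A p-value this large (about 0.32) would be interpreted as a difference quite plausibly due to chance alone.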
  • Understanding Statistics
    • Bruce J. Chalmer(Author)
    • 2020(Publication Date)
    • CRC Press
      (Publisher)
    4

    Some Distributions Used in Statistical Inference

    4.1    Knowing the sampling distribution of a statistic allows us to draw inferences from sample data.

    Distributions in statistical inference

    We noted in Chapter 2 that knowing the sampling distribution of a statistic is the key to using the statistic to draw inferences about the parameter of interest. We noted also that the central limit theorem assures us that statistics calculated by summing or averaging have (at least approximately) normal sampling distributions.
    In this chapter we discuss the details of how to use the normal distribution to find the proportion of individual scores in any given interval. This will lay the groundwork for the inferential procedures of estimation and hypothesis testing which we cover in Chapter 5 . We also discuss the binomial distribution, which is another important distribution used in statistical inference. We conclude the chapter with a description of the relationship between the binomial and normal distributions.
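The central limit theorem's claim (that averages have approximately normal sampling distributions) is easy to see by simulation; the sample sizes below are arbitrary choices for illustration.

```python
import random
import statistics

# Average 40 uniform(0, 1) draws, many times over; the CLT says these
# means pile up in a roughly normal shape even though each draw is uniform.
random.seed(0)
means = [statistics.fmean(random.random() for _ in range(40))
         for _ in range(5000)]

grand_mean = statistics.fmean(means)  # ~0.5, the mean of the uniform
spread = statistics.stdev(means)      # ~sqrt(1/12)/sqrt(40), shrinking with n
```

The spread of the simulated means matches the theoretical standard error (population standard deviation divided by the square root of the sample size), which is the quantity the inferential procedures in Chapter 5 rely on.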

    4.2    The standard normal distribution is used to find areas under any normal curve.

    Characteristics of normal distributions

    In Chapter 3 we noted that the mean and standard deviation completely specify a normal distribution. That is, once you know the mean and standard deviation of a distribution known to be normal in shape, you can say exactly what proportion of scores in the distribution are in any given range. Let’s consider how this is done.
    First, it is handy to consider some general characteristics. (In fact, you will find it convenient to memorize these characteristics of a normal distribution, since you will be using them very frequently.) Refer to Figure 4.1
  • Quantitative Techniques in Business, Management and Finance
    • Umeshkumar Dubey, D P Kothari, G K Awari(Authors)
    • 2016(Publication Date)
    For a large sample, the value of z for a test of hypothesis for μ is

    z = (x̄ − μ) / σ_x̄   if σ (the population standard deviation) is known
    z = (x̄ − μ) / s_x̄   if σ is not known

    where σ_x̄ = σ/√n and s_x̄ = s/√n. This is the observed value of z.
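The two cases (σ known vs. unknown) differ only in which standard deviation enters the denominator, which a short helper makes explicit; the numbers in the example call are hypothetical.

```python
import math

def observed_z(x_bar, mu, n, sigma=None, s=None):
    """Observed z for a large-sample test of H0: population mean = mu.

    Uses sigma/sqrt(n) when the population standard deviation is known,
    otherwise the sample estimate s/sqrt(n).
    """
    sd = sigma if sigma is not None else s
    return (x_bar - mu) / (sd / math.sqrt(n))

# Hypothetical: sample mean 105 from n = 100 observations, H0 mean 100,
# known population standard deviation 20.
z_known = observed_z(105.0, 100.0, 100, sigma=20.0)  # = 2.5
```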
    9.4.8 Test for Equality of Means for Small and Independent Samples
    To do this, one must make a confidence interval and test a hypothesis about the difference between two population means when samples are small and independent.
    9.4.8.1 Assumption
    The two populations from which the two samples are drawn are approximately normal.
    1. Case I: When population standard deviations are known. If the above assumption is true and the population standard deviations are known, then we can use the normal distribution for inference about the difference of population means.
    2. Case II: When population standard deviations are unknown. If the standard deviations of the populations are not known, then the normal distribution is replaced by the t-distribution to make inferences about the difference of population means.
    Here, we will make one more assumption, that the standard deviations of both populations are equal (σ1 = σ2 = σ).
    When the standard deviations of two populations are equal, then we can use a common standard deviation for both (σ for both σ1 and σ2 ). Because σ is unknown, we replace it by its point estimator, sp
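The pooled estimator described above, and the two-sample t statistic built from it, can be sketched as follows; the sample values in the example call are hypothetical.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation sp under the equal-variances assumption."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """t statistic for the difference of two means, small independent samples."""
    sp = pooled_sd(s1, n1, s2, n2)
    return (x1 - x2) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))

# Hypothetical samples: means 12 and 10, both with s = 2 and n = 10.
t = two_sample_t(12.0, 2.0, 10, 10.0, 2.0, 10)
```

The resulting t is referred to the t distribution with n1 + n2 − 2 degrees of freedom, as in Case II above.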
  • Applied Statistics for the Social and Health Sciences
    • Rachel A. Gordon(Author)
    • 2023(Publication Date)
    • Routledge
      (Publisher)
    One origin of the name may reflect the distribution’s frequent appearance. Some characteristics, like height and weight, have been found to follow the distribution’s symmetric bell shape in many samples. Some sample statistics, such as the mean, also follow a “normal” distribution even when the population distribution is not normally distributed. When statisticians can mathematically represent a distribution like the “normal” distribution, the properties of the distribution can be used to test hypotheses and construct confidence intervals. In this section, we illustrate use of the “normal” distribution when the variable itself follows its shape. In later sections, we will use similar techniques when our sample statistic (the sample mean) follows the “normal” distribution.
    The term can also lead to over-reliance on the bell shape of the “normal” distribution, to the extent that scholars equate normal with best. Karl Pearson (1920) discussed the origins of his use of the term while covering the contributions of Pierre-Simon Laplace and Johann Carl Friedrich Gauss to formalizing the mathematical formulas of the distribution:
    Many years ago I called the Laplace-Gaussian curve the normal curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal.’ That belief is, of course, not justifiable. It has led many writers to try and force all frequency by aid of one or another process of distortion into a ‘normal’ curve.
    (Pearson, 1920 , 25, Notes on the History of Correlation, Biometrika)
    We include this quote for several reasons. At a pragmatic level, as noted, you may see some writers refer to the distribution by the name Gaussian, the most common alternative we see to the term “normal.” Yet Pearson’s quote indicates that he also attributed precedence to Laplace. Either way, attaching a single name to an empirically ubiquitous distribution is problematic. How many others may have “discovered” this distribution, and its mathematical representation, yet are unrecognized? Can we say someone “discovered” something that simply exists? For these reasons, we also do not find “Gaussian distribution,” the second most common term for the distribution, satisfactory.
  • Understanding Statistical Concepts Using S-plus
    • Randall E. Schumacker, Allen Akers(Authors)
    • 2001(Publication Date)
    • Psychology Press
      (Publisher)
    Gossett, who worked for a brewery in Dublin, Ireland, determined that when the sample size was large, the sampling distribution of the z statistic was normal; however, when the sample size was small, the sampling distribution was leptokurtic, or peaked. He referred to this slightly non-normal distribution as the t distribution (see Chapter 20). W. S. Gossett further discovered that as he increased the sample sizes, the sampling distribution became normal, and therefore the t values equaled the z values. W. S. Gossett signed his pivotal work “Student,” and today small-sample tests of mean differences are referred to as the “Student t-test,” or simply the t-test. Sir Ronald Fisher, using the early work of “Student,” extended his ideas into modern-day analysis of variance techniques (see Chapter 32).
    One-sample t-test
    The sampling distribution of the mean can be used to determine the expected value of the population mean and variance when they are unknown. The Central Limit Theorem supports this assumption because as sample size increases, the sampling distribution of the mean becomes a normal distribution with mean = μ and variance = σ²/N. Knowledge of this permits one to test whether the sample mean is statistically different from a population mean. The test is called a one-sample t-test, which is computed as t = (x̄ − μ)/(s/√N). Suppose a class of 25 students was given instruction using a new, enriched mathematics curriculum and then took a nationally standardized math achievement test. If the new, enriched mathematics curriculum leads to better learning in the population, then this sample of 25 students should perform better on the math achievement test than students in the larger population who learned mathematics under the traditional mathematics curriculum. Assume the 25 students had a mean math achievement test score of 110 with a standard deviation of 15. The mean math achievement test score for students in the population taking the traditional math curriculum was 100
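Plugging the excerpt's numbers into the one-sample t formula t = (x̄ − μ)/(s/√N):

```python
import math

# Values from the excerpt: 25 students, sample mean 110, sample sd 15,
# population mean under H0 of 100.
x_bar, mu, s, n = 110.0, 100.0, 15.0, 25

t = (x_bar - mu) / (s / math.sqrt(n))  # = 10 / 3, about 3.33
```

A t this large, on 24 degrees of freedom, would exceed the usual tabled critical values, supporting the claim that the enriched curriculum led to better scores.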
  • Researching Education: Perspectives and Techniques

    • Kanka Mallick, Gajendra Verma(Authors)
    • 2005(Publication Date)
    • Routledge
      (Publisher)
    In this example, we do not need t-tests since we are comparing entire populations. If, however, the figures arose from samples, the differences should be tested for significance. Assuming that the letter writer’s figures were correct and applied to whole populations or to large samples, we may speculate on the cause of these differences. Is there perhaps a tendency, where entry to schools and universities is competitive, to encourage the brightest at the expense of the relatively less able? In grammar schools, independent schools and direct grant schools, the less able are in fact well above average for the whole population, but they may be referred to and treated as stupid and so become discouraged. Alternatively, or in addition, teachers may tend to teach classes at an average pace, so that those below average (a very high proportion) fall behind and so do not obtain the successes which their abilities merit. Thus, if the figures were correct, it might be worthwhile to make further enquiries to see whether some schools avoid this unfortunate effect.

    Some Other Theoretical Distributions

    So far, we have discussed the normal distribution and have seen how to use areas under the normal curve in studying populations or large samples. The ‘t’ distribution has been introduced and the ‘t’ distribution used in comparing the mean of a small sample with an expected value. Sometimes, in addition, variances of samples have to be compared. To test whether one sample is more variable than another, i.e. whether the variances in the two samples differ significantly, the F-test is employed.
    The theoretical distribution, known as the F-distribution, is not even nearly normal for small samples and differs in shape depending on the relative sizes of the samples being compared. Three tables are commonly available showing the probability of obtaining different values of F for samples of different sizes; the three tables correspond with probability levels 0.05, 0.01 and 0.001.

    Example 5

    Scores for 25 pupils taking a history test have a standard deviation of 17; scores for 19 of these pupils who took a maths test had a standard deviation of 11. Do the variances differ significantly? The variances are 17² and 11² respectively, i.e. 289 and 121, giving F = 289/121 ≈ 2.4.
    The significance of the F-ratio obtained is checked against the appropriate ‘table value’, this being dependent on the degrees of freedom (df) for each of the two variances. In the present example, there are 24 degrees of freedom for the larger variance (history, 25 − 1) and 18 for the smaller variance (mathematics, 19 − 1). Reference to Table 8.10, an abridged version of significant F-ratio values, in which the upper numbers of each pair are the figures for the 5 per cent level and the lower ones (in bold type) are those for the 1 per cent level, shows that the value F = 2.4 lies between the 5 per cent level (2.15) and the 1 per cent level (3.00) for 24 and 18 df. This means that the null hypothesis (see the Glossary) that the variances do not differ is rejected at the 5 per cent level: the difference is significant, being one that would occur by chance less than 5 times in 100. Make sure to match up the degrees of freedom accurately when reading from a table of F-values: the df for the larger variance (history) run across the table, and those for the smaller variance (mathematics) run down
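Example 5 reduces to a ratio of variances compared against the tabled critical values quoted in the passage:

```python
# The excerpt's F-test: history sd 17 (n = 25) vs maths sd 11 (n = 19).
s_hist, n_hist = 17, 25
s_math, n_math = 11, 19

F = s_hist**2 / s_math**2                 # 289 / 121, larger variance on top
df_num, df_den = n_hist - 1, n_math - 1   # 24 and 18

# Table values quoted in the passage for 24 and 18 df.
beats_5pct_table = F > 2.15
beats_1pct_table = F > 3.00
```

F works out to about 2.39 (the passage's 2.4, to one decimal): significant at the 5 per cent level but not at the 1 per cent level, matching the conclusion above.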