
Z Test

The Z test is a statistical method used to determine whether a sample mean differs significantly from a known population mean, or whether the means of two samples differ from each other. It involves calculating the Z score, which measures how many standard deviations (or standard errors) a value lies from the mean. The test is commonly used in research and analysis to make inferences about population means from sample data, and it assumes that the population standard deviation is known or that the samples are large.

Written by Perlego with AI-assistance

9 Key excerpts on "Z Test"

  • Statistical Power Analysis for the Social and Behavioral Sciences
    • Xiaofeng Steven Liu (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    The hypothesis test is then called a one-sided test. So hypothesis testing can be either two-sided or one-sided, depending on the sidedness of the alternative hypothesis. By default, the alternative hypothesis is two-sided.
    Once the hypotheses are set up, the empirical data are combined into a test statistic. The statistic Z divides the sample mean difference by its standard error:

    Z = (Ȳ1 − Ȳ2) / (σ √(1/n1 + 1/n2)),    (2.1)

    where Ȳ1 and Ȳ2 are the sample means of the treatment and control groups, σ is the population standard deviation, and n1 and n2 are the sample sizes of the two respective groups. We will use the upper case for a random variable and the lower case for the realized value of the random variable. Z means a normal random variable, and z a realized value of the Z random variable.
    The test statistic follows a standard normal distribution Z when the null hypothesis is true. The most expected values of the test statistic are concentrated around zero, the mean of the standard normal distribution. This is intuitively easy to understand: the equal-means assumption in the null hypothesis implies that the sample mean difference Ȳ1 − Ȳ2 in the numerator of the Z statistic should not be far from zero. A large discrepancy between the realized value z of the Z statistic and its most expected value, zero, is construed as contradicting the null hypothesis. How incompatible the test statistic is with the null hypothesis is represented by the probability of obtaining a test statistic at least as contradictory as the realized z; in other words, the probability of obtaining z values as far from zero as the realized z value.

    We use the cumulative probability of obtaining a range of values to describe the relative standing of a z because the probability of getting an exact z cannot be defined. The continuous random variable Z can take an unlimited number of realized values; if each realized value occurred with a finite probability, the sum of the probabilities of all these values would not converge to one but exceed one, which violates the probability rule. It is analogous to gauging someone's relative height by checking the percentage of people taller than this person, for the probability of finding a person of exactly the same height cannot be defined. So we use the probability of obtaining a test statistic at least as contradictory as the realized z, which is called the p-value.
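    To make equation (2.1) and the p-value reasoning concrete, here is a minimal Python sketch; the summary values (ybar1, ybar2, sigma, n1, n2) are invented for illustration, and the common σ is assumed known:

```python
import math
from statistics import NormalDist

def two_sample_z(ybar1, ybar2, sigma, n1, n2):
    """Z statistic of equation (2.1): mean difference over its standard error."""
    se = sigma * math.sqrt(1 / n1 + 1 / n2)  # standard error of Ybar1 - Ybar2
    z = (ybar1 - ybar2) / se
    # Two-sided p-value: probability under H0 of a value at least as far from zero
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical summary values, purely for illustration
z, p = two_sample_z(ybar1=105.0, ybar2=100.0, sigma=15.0, n1=40, n2=40)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")  # z ≈ 1.491, p ≈ 0.136
```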
  • Experimental Design and Statistics
    • Steve Miller (Author)
    • 2005 (Publication Date)
    • Routledge (Publisher)
    where s1 and s2 are the standard deviations of the two samples whose means are being compared. The zero in the numerator has no effect on the calculation; it is there to remind us that we are comparing the difference in the means with the difference that would be expected if the samples came from the same population, i.e. 0. For practical purposes the formula for a two-sample Z test is therefore:

    Z = (X̄1 − X̄2) / √(s1²/N1 + s2²/N2)    (B)
    The Z statistic is then referred to the normal distribution tables to determine the probability that the samples came from the same population of scores.
    (2) If the samples are small, say less than thirty observations in each group, the estimate of σ used above is too inaccurate. For small samples it can be shown that the best estimate of the population standard deviation is given by:

    σ̂ = √[ (Σ(X1 − X̄1)² + Σ(X2 − X̄2)²) / (N1 + N2 − 2) ]

    where the symbol ^ means ‘an estimate of’.
    When this expression is substituted in formula A the result is a more complicated test statistic which deviates slightly from the normal distribution. This statistic is known as t and is given by:

    t = (X̄1 − X̄2) / ( σ̂ √(1/N1 + 1/N2) )    (C)
    As with Z, the value of t is converted into a probability by reference to the appropriate tables (t tables). In this case, however, the sample sizes N1 and N2 as well as the t value are needed when looking up the probability (see pp. 83–6).
    We can summarize this section on the derivation of Z and t-tests as follows. Under the null hypothesis the two samples of scores have been drawn from a single population. By making certain assumptions about that population we can discover the distribution that would be obtained if many pairs of samples were randomly selected and their means compared. The observed difference between the two means obtained in an experiment is then compared with these expected differences by using the Z or t formula. If the observed difference is well within the range of differences expected by chance then Z or t will have low values. If the observed difference in means is larger than would normally be obtained by taking two samples from the same population, then the Z or t
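    A short sketch of the contrast this section draws, under the standard reading of formulas (B) and (C) above; the two samples below are invented for illustration:

```python
import math
from statistics import mean

def z_statistic(x1, x2):
    """Large-sample Z of formula (B): sample SDs stand in for the population SDs."""
    n1, n2 = len(x1), len(x2)
    v1 = sum((v - mean(x1)) ** 2 for v in x1) / (n1 - 1)  # sample variance, group 1
    v2 = sum((v - mean(x2)) ** 2 for v in x2) / (n2 - 1)  # sample variance, group 2
    return (mean(x1) - mean(x2)) / math.sqrt(v1 / n1 + v2 / n2)

def t_statistic(x1, x2):
    """Small-sample t of formula (C): pooled estimate sigma-hat of the population SD."""
    n1, n2 = len(x1), len(x2)
    ss1 = sum((v - mean(x1)) ** 2 for v in x1)
    ss2 = sum((v - mean(x2)) ** 2 for v in x2)
    sigma_hat = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # the ^ estimate of sigma
    return (mean(x1) - mean(x2)) / (sigma_hat * math.sqrt(1 / n1 + 1 / n2))

group1 = [12.1, 11.4, 13.0, 12.7, 11.9]
group2 = [10.8, 11.2, 10.5, 11.9, 10.1]
print(f"Z = {z_statistic(group1, group2):.2f}, t = {t_statistic(group1, group2):.2f}")
```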
  • Understanding Educational Statistics Using Microsoft Excel and SPSS
    • Martin Lee Abbott (Author)
    • 2014 (Publication Date)
    • Wiley (Publisher)
    Up to now, we have been dealing with population values (parameters) that are known to the researcher. As I mentioned in Chapter 9, however, this is fairly rare. If you do have access to parameter values, you can use the Z Test to help you make a statistical decision about whether your sample values are likely to be taken from that known population.
    If you do not know the population values, we will learn in this chapter how to estimate them so that we can use them in our statistical decision-making process. This will seem a little strange at first, since we will use sample values to help estimate the population values, but we are going to make use of our sampling distribution as well. In addition, we will learn to make small “adjustments” based on the sample size as a way to better understand the population values.
    As we have discussed, the Z distribution consists of the known areas in the standard normal distribution. We learned in Chapter 9 that we could use the features of this distribution to help us understand whether a sample value could likely come from a distribution with known population parameters (Z test). Now we turn to a related distribution, the T distribution, to help us understand whether a sample value could likely come from a distribution with unknown population parameters. This is typically the situation researchers encounter in real-life problem solving, because knowledge of the parameters is unlikely in most situations.
    The T test for a single sample is a statistical procedure similar to the Z test, but with some limitations:
    1. Population parameters are unknown. Typically, the population mean may be known as a general estimate based on similar research or for some other reason. However, the standard deviation of the population is not known. Therefore, a T test uses estimates of population parameters based on sample values.
    2. Sample size is small. Sample size is very important in statistics because it is used as a denominator in many calculations. Large sample sizes typically result in better estimates of population means. According to the Central Limit Theorem, repeated large samples will more closely approximate a normal distribution. But how large is large? Researchers and statisticians vary on that score. Typically, a sample size of 30 is considered large for statistical procedures by many researchers. Other researchers suggest higher values. I use N =
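    To illustrate the single-sample case the excerpt describes, here is a hedged sketch using SciPy's one-sample t test; the sample values and the assumed population mean are invented, and σ is treated as unknown:

```python
from scipy import stats

# Hypothetical sample; population mean assumed from prior research, sigma unknown
sample = [98, 102, 97, 105, 101, 99, 103, 100, 96, 104]
mu_0 = 100

# One-sample t test: the population SD is estimated from the sample itself
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```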
  • Optimizing the Display and Interpretation of Data
    • Robert Warner (Author)
    • 2015 (Publication Date)
    • Elsevier (Publisher)
    P value, whether the data point is larger or smaller than the mean of the population to which the data point is being compared.
    Z scores are broadly analogous to the well-known T test in statistics. Both Z scores and the T test are used to test the null hypothesis, i.e., to determine the likelihood that there is no meaningful difference from the mean of a comparison population. Whereas a Z score tests the null hypothesis concerning the difference between each data point and the mean of a comparison population, the T test tests the null hypothesis concerning the means of two different populations of data.
    It is important to be able to test the null hypothesis for individual data points. This is because the individual values in most sets of data can be expected to vary to some extent. Using Z scores permits one to determine for each data point that differs from a population’s mean, whether it is likely to represent a true abnormality or to merely be an example of the expected random variation among the values in a population of data.
    One way to regard the Z score is as a tool for solving the classical problem of distinguishing between signal and noise. In the context of data analysis, noise consists of chance variations among the values of data points that fail to reveal meaningful information. In the present context, statistically significant Z scores indicate which members of a set of data are most likely to represent an actual signal, i.e., to convey meaningful information. Conversely, statistically insignificant Z scores are more likely to indicate data points that represent random noise. The determination of what P value should be considered to indicate a statistically significant difference depends upon various factors. One factor is whether it is considered more important not to miss a given abnormality (such as a very serious medical condition) or more important not to incorrectly identify apparent abnormalities whose presence would be inconsequential. Another factor is whether one is making comparisons that involve only one versus several different parameters. Comparisons involving multiple parameters require more stringent P values and therefore higher absolute values of Z scores. If the context in which the comparison of the data is made requires greater stringency, then lower P
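    A minimal sketch of this signal-versus-noise use of Z scores; the population parameters, the data points, and the two-sided cutoff are all illustrative assumptions:

```python
from statistics import NormalDist

mu, sigma = 50.0, 8.0                        # assumed comparison-population parameters
data = [48.2, 51.0, 73.5, 49.9, 30.1, 52.4]  # hypothetical data points

alpha = 0.05                                 # illustrative significance level
z_cut = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for a two-sided comparison

for x in data:
    z = (x - mu) / sigma                     # Z score of the individual data point
    label = "signal" if abs(z) > z_cut else "noise"
    print(f"x = {x:5.1f}  z = {z:+5.2f}  {label}")
```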
  • Biostatistics Decoded
    • A. Gouveia Oliveira (Author)
    • 2020 (Publication Date)
    • Wiley (Publisher)
    rejection region of the null hypothesis.
    People tend to become anxious when looking at mathematical formulas, so now is a good opportunity to show that formulas are really quite harmless. Figure 4.1 shows the formula for the z‐test. The z‐test is the statistical test we have just described and it is used to test for the differences between two means from large samples. We reject the null hypothesis when the result of the formula (the z‐value) is greater than 1.96. How does that work?
    Figure 4.1 Rationale of the z-test for large samples.
    Look again at the formula. You will recognize that the numerator is the difference between the two sample means. The modulus sign means that we are not concerned with the direction of the difference, only with its absolute value. You will also recognize that the denominator is the standard error of the sample differences, calculated from the sample standard deviation, s. What is the meaning of the value 1.96 for z, as the value above which we reject the null hypothesis? Well, z has that value when the observed difference between sample means is 1.96 times its standard error, or larger. When that happens we reject the null hypothesis, because this means that the difference between sample means falls into the rejection region. The z-test formula, therefore, is nothing more than an expeditious way of deciding upon the existence of a difference between population means without the need for calculating confidence limits.
    Here is an example. Two random samples of 50 and 60 individuals with hypercholesterolemia had each been receiving treatment with one of two different lipid‐lowering drugs. On an occasional observation, the mean serum cholesterol level in the first group was 190 mg/dl, with a standard deviation of 38 mg/dl. In the second group, the mean cholesterol level was 170 mg/dl and the standard deviation 42 mg/dl. The question is whether this difference of 20 mg/dl between the two sample means is inconsistent with the null hypothesis of no difference between mean cholesterol levels in the populations treated with one or the other drug.
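    Running that example through the large-sample z-test formula, a quick Python sketch using only the summary values given in the text:

```python
import math

# Summary values from the cholesterol example in the text
mean1, sd1, n1 = 190, 38, 50   # first lipid-lowering drug
mean2, sd2, n2 = 170, 42, 60   # second lipid-lowering drug

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
z = abs(mean1 - mean2) / se
print(f"z = {z:.2f}")  # ≈ 2.62, greater than 1.96, so H0 is rejected
```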
  • Practical Statistics Simply Explained
    when to use them. The only way to learn this is by practice. Accordingly, a fairly extensive set of miscellaneous questions is provided, specifically to enable you to become proficient with handling the sort of situations you will meet in real life. The guide to these miscellaneous questions is given in the form of a Guide to Significance Tests at the very back of this book, where it serves as a kind of index. But you will need practice at using this Table, and the miscellaneous questions are there to provide just that, and at the same time will raise a number of additional points of interest. As always, answers and comments are provided.
    The time has now come to let the statistical cat out of the bag.
    zM TEST (1733)
    Purpose
    The Z Test for Measurements (hence ‘zM’) compares a random sample of 1 or more measurements with a large parent group whose mean and standard deviation are known.
    Principle
    All things that can be measured have an unlimited number of possible measurements, which together form a potential parent group of measurements. The 50 measurements of my desk (p. 33 ) were a sample from such a parent group. The frequency distribution of the measurements making up a parent group usually follows the pattern of the Normal Curve (p. 35 ). The most important dimensions of this pattern are the mean (the measure of central tendency) and the standard deviation (the measure of dispersion around the mean).
    The difference between the mean of a parent group and the mean of a random sample is given by simple subtraction. Dividing this difference by the standard deviation of the parent group converts the units of measurement (inches, volts, etc.) into standard deviation units. This ‘standardized difference’ between the parent group mean and the sample mean is the value called z. The further the sample mean is from the parent mean, the larger will be the value of z.
    In the case of parent groups whose measurements have a logarithmic distribution (p. 78 ), the above calculation can be carried out using logarithmic means and the logarithmic standard deviation (pp. 84
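    A small sketch of the zM comparison; the parent-group parameters and measurements below are invented, and the scaling by √n for samples of more than one measurement is an assumption that goes beyond this excerpt:

```python
import math

def z_m(sample_mean, parent_mean, parent_sd, n=1):
    """Standardized difference between a sample mean and the parent-group mean.
    For n = 1 this divides by the parent SD, as the passage describes; for n > 1
    the parent SD is scaled to the standard error parent_sd / sqrt(n) (assumed)."""
    return (sample_mean - parent_mean) / (parent_sd / math.sqrt(n))

print(z_m(63.2, parent_mean=60.0, parent_sd=2.5))        # single measurement
print(z_m(61.0, parent_mean=60.0, parent_sd=2.5, n=9))   # sample of 9 measurements
```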
  • Social and Behavioral Statistics: A User-Friendly Approach
    • Steven P. Schacht (Author)
    • 2018 (Publication Date)
    • Routledge (Publisher)
    Putting this into English, say the above bovine (cow) example has a preset alpha level of .05 and we find that the difference between the number of times per day a Holstein touches an electric fence (7) and the national average (10) is statistically significant. As such, Holsteins are far less likely, on average, to touch the “damn” electric fence, and thus (because of this difference) we infer that they are more intelligent. Moreover, we are 95% confident that we have correctly rejected the null hypothesis. On the other hand, we also recognize that there is a 5% chance that we have rejected the null hypothesis when in fact it is true: a Type I Error. When the first actual hypothesis test is undertaken, in a moment, this becomes clearer.
    Although beyond the scope of this text in terms of how it is numerically calculated, it is nevertheless important to discuss a second type of statistical error that can occur when hypothesis testing: a Type II Error. This type of error occurs when we do not reject the null hypothesis when in fact we should have. In other words, we fail to recognize a statistically significant difference between two means when in fact there is one. A great deal of research undertaken hopes to find a difference (some researchers’ funding may even be contingent upon finding a difference), so this can be very problematic. Both the Type II Error and the previously discussed Type I Error outcomes along with the relationships of the correct rejection and acceptance of the null and research hypothesis are summarized in the 2 x 2 contingency table.
    Table 9.1 2 × 2 (Two-by-Two) Contingency Table of Type I and Type II Errors

    Z Test

    Having discussed the abstract theoretical aspect of hypothesis testing, we now can turn to an actual application. To this end we use the bovine data (summarized below) from the discussion that has preceded, and offer the accompanying statistical formula, called a Z Test. As denoted by the formula, the only way we can use a Z test is when the population mean, the population standard deviation, the sample mean, and the sample size values are all known.
    What the Z test enables us to determine is whether the difference between the two means (μ, mu, and X̄, x-bar) is statistically significant. As the formula states, we first subtract the population mean from the sample mean. The population standard deviation is then divided by the square root of the sample size; this is once again referred to as the standard error of the mean. Recall from the previous chapter that, simply stated, the standard error of the mean is a standardized measure of the amount of variability and potential sampling error found around the mean of a given data set. In terms of how it is calculated, the format of the standard error of the mean differs and is dependent upon the type of test being performed. Nevertheless, all of the statistical tests in this chapter use a form of the standard error of the mean in the denominator portion of the given formula. Finally, we simply divide the resultant numerator value by the resultant denominator value to get what is called a Z obtained.
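    In symbols, the procedure just described is Z obtained = (X̄ − μ) / (σ/√N). A minimal sketch with the bovine means from the text; the population SD and sample size are not given in the excerpt, so the values below are purely hypothetical:

```python
import math

x_bar, mu = 7, 10         # sample mean and population mean from the bovine example
sigma, n = 4.0, 36        # hypothetical population SD and sample size (assumed)

sem = sigma / math.sqrt(n)               # standard error of the mean
z_obtained = (x_bar - mu) / sem
print(f"Z obtained = {z_obtained:.2f}")  # -4.50 under these assumed values
```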
  • Sensory Evaluation of Food: Statistical Methods and Procedures
    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)
    t tests.
    In Section 4.2 concerning z tests, we noted that for a normally distributed population, a given score (X) can be represented in terms of how many standard deviations it is above or below the mean (μ). This is called a z score. If X has a z score of 1.5, it is 1.5 standard deviations above the mean.
    Table G.1, a table of probabilities for the normal distribution, can be used to determine from the z score the chance probability (if H0 is true) of getting a score so high or higher in the population (or a score so low or even lower). This exercise is called a z test. If this probability is very low, it is unlikely that the large difference between X and μ is merely due to chance. It is more likely that X does not belong to the population of which μ is the mean. We reject a null hypothesis saying that X and μ are from the same population.
    In Section 4.5 we repeated this line of reasoning except that instead of examining a score (X) and a normally distributed population with a mean of μ, we examined a mean of a sample (X̄) and a normally distributed sampling distribution with a mean μ. We used Table G.1 in the same way to see the probability of getting a sample mean as high as X̄ or higher on the null hypothesis. In effect, we used Table G.1 to determine whether X̄ was significantly different from μ, whether the sample mean actually belonged to the sampling distribution, or in other words, whether the sample was drawn from that population.
    In the single-score case (X), the population (mean = μ, standard deviation = σ) must be normal for Table G.1 to be used. In the sample-mean case (X̄), the sampling distribution will always be normal (by the central limit theorem) regardless of the population, as long as the sample size (N) is not too small. The sampling distribution has the same mean (μ) as the population, but being a distribution of means, its standard deviation (called the standard error, σ/√N) is smaller than that of the population (σ). However, to calculate a z score and hence use Table G.1, the standard deviation of the population (σ) must be known. It must be known to be able to determine the number of standard deviations (σ) a score (X) is above μ, or the number of standard errors (σ/√N) a sample mean (X̄) is above μ
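    The two cases the passage distinguishes, as a short sketch; the population parameters, the single score, and the sample values are invented (σ must be known in both cases):

```python
import math

mu, sigma = 100.0, 15.0   # assumed known population mean and SD

# Single score X: how many standard deviations above mu
x = 122.5
z_score = (x - mu) / sigma            # 1.5, as in the passage's example

# Sample mean: how many standard errors (sigma / sqrt(N)) above mu
x_bar, n = 104.0, 25
z_mean = (x_bar - mu) / (sigma / math.sqrt(n))

print(f"single-score z = {z_score:.2f}, sample-mean z = {z_mean:.2f}")
```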
  • Econometrics
    • K. Nirmal Ravi Kumar (Author)
    • 2020 (Publication Date)
    • CRC Press (Publisher)
    The sample mean X̄ and sample SD (s) are 45.2 and 3.02 respectively. Using this information and assuming a 5 per cent level of significance, determine whether the sample mean differs significantly from the population mean. In this case, given:
    ○  Population mean (μ) = 45
    ○  Sample mean (X̄) = 45.2
    ○  Sample SD (s) = 3.02
    ○  Considering the above, the following hypotheses are formulated:
    •  H0: the sample mean X̄ is statistically equal to the population mean (μ), i.e. X̄ = μ.
    •  HA: the sample mean X̄ is statistically not equal to the population mean (μ), i.e. X̄ ≠ μ.
    In the above case, since the sample size n = 50 (i.e., n > 30), the researcher has to employ the Z-statistic to study the statistical difference between the sample mean and the population mean. The following formula is employed to compute the Z-statistic (as per Equation 7.52):

    Zcal = (X̄ − μ) / (s/√n) = (45.2 − 45) / (3.02/√50) = 0.468
    The above value, Z = 0.468, is called the Z-calculated value. Since the assumed level of significance α is 0.05 and this is a two-tailed test, the researcher frames two critical regions, one on the left side and one on the right side of the normal distribution curve (Figure 7.5), with an area equal to α/2 = 0.025 in each tail. Since the Zcal value (0.468) is positive, we look at the right-hand tail of the normal distribution curve and find that it is less than the Ztab value (+1.96 at the 5% level of significance, i.e., Z0.025 = +1.96). So H0 is accepted, and we conclude that X̄
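    The arithmetic of this example, checked in a few lines of Python (only the values stated above are used):

```python
import math
from statistics import NormalDist

mu, x_bar, s, n = 45.0, 45.2, 3.02, 50

z_cal = (x_bar - mu) / (s / math.sqrt(n))      # 0.468
z_tab = NormalDist().inv_cdf(1 - 0.025)        # two-tailed critical value, ≈ 1.96
print(f"Zcal = {z_cal:.3f} < Ztab = {z_tab:.2f} -> H0 is not rejected")
```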