Wilcoxon Rank Sum Test

The Wilcoxon Rank Sum Test is a non-parametric statistical test used to compare two independent samples. It is typically used when the assumptions of the t-test cannot be met, for example when the data are not normally distributed. The test works on the ranks of the pooled observations rather than the raw values. It is often described as comparing the medians of the two samples, although strictly the null hypothesis is that the two populations have identical distributions.

Written by Perlego with AI-assistance

10 Key excerpts on "Wilcoxon Rank Sum Test"

  • Discovering Statistics Using SAS
    This process results in high scores being represented by large ranks, and low scores being represented by small ranks. The analysis is then carried out on the ranks rather than the actual data. This process is an ingenious way around the problem of using data that break the parametric assumptions. Some people believe that non-parametric tests have less power than their parametric counterparts, but as we will see in Jane Superbrain Box 15.1 this is not always true. In this chapter we’ll look at four of the most common non-parametric procedures: the Mann–Whitney test, the Wilcoxon signed-rank test, Friedman’s test and the Kruskal–Wallis test. For each of these we’ll discover how to carry out the analysis on SAS and how to interpret and report the results.

    15.3. Comparing two independent conditions: the Wilcoxon rank-sum test and Mann–Whitney test

    When you want to test differences between two conditions and different participants have been used in each condition then you have two choices: the Mann–Whitney test (Mann & Whitney, 1947) and Wilcoxon’s rank-sum test (Wilcoxon, 1945; Figure 15.2). These tests are the non-parametric equivalent of the independent t-test. In fact both tests are equivalent, and there’s another, more famous, Wilcoxon test, so it gets extremely confusing for most of us. To make life slightly more confusing, there are two Wilcoxon tests – the Wilcoxon rank-sum test, and the Wilcoxon signed-rank test. As you can imagine, these similar names mean that it’s very easy to confuse things – many people prefer to never mention the rank-sum test, always calling it the Mann–Whitney test. However SAS calls them both Wilcoxon tests. To make it easier to tell which test I am talking about, I’m going to refer to the Wilcoxon two-sample test when we have independent groups (i.e. two samples), and the Wilcoxon signed-rank test when we have repeated measures (i.e
  • Sensory Evaluation of Food
    Statistical Methods and Procedures

    • Michael O'Mahony(Author)
    • 2017(Publication Date)
    • CRC Press
      (Publisher)
    2 = 5 are
    2 and 23 for p = 0.05
    0 and 25 for p = 0.01
    Our values of 20 and 5 are neither equal to, nor do they exceed the values in the table (e.g., 20 < 23; 5 > 2; etc.), so we cannot reject H0 at the 5% or 1% levels of significance. In fact, we cannot even reject it at the 10% level (critical values: 4, 21). Thus tasters and nontasters of PTC are not significantly different in their sensitivities to the second chemical (p > 0.1).

    16.6 The Wilcoxon-Mann-Whitney Rank Sums Test

    Relationship to the Mann-Whitney U Test
    This test, rather unfortunately, has a lot of different names. It was apparently devised by Frank Wilcoxon and has been called the rank sums test or the Wilcoxon rank sums test (not to be confused with the Wilcoxon test of Section 16.2). However, because it is equivalent to the U test of Mann and Whitney (Section 16.4), it is also sometimes called the Mann-Whitney test. But do not confuse it with the Mann-Whitney U test. It is not the same test; it is merely equivalent. To try and avoid confusion with these other tests, as well as diplomatically credit all possible authors, we will give this test the unwieldy name of the Wilcoxon-Mann-Whitney rank sums test. If that seems too long, it can be called the rank sums test. Consult the Key to Statistical Tests to see the relationship of this test to others.
    This independent-samples, two-sample difference test is used as an alternative to the Mann-Whitney U test. It is also related to the Mann-Whitney U test; in fact, it can be considered as a Mann-Whitney U test in disguise. The index for the rank sums test, S, is merely the difference between the two Mann-Whitney U values.
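The relationship between S and the two U values can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the function names are hypothetical, the U values use one common convention (rank sum minus n(n + 1)/2), and taking S as an absolute difference is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def u_values(sample_a, sample_b):
    """Return (U_a, U_b), computed from the rank sum of sample_a."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b


def rank_sums_index(sample_a, sample_b):
    """S taken as the absolute difference between the two U values (assumed)."""
    u_a, u_b = u_values(sample_a, sample_b)
    return abs(u_a - u_b)
```

With completely separated samples such as [1, 2, 3] and [4, 5, 6], one U value is 0 and the other is 9, so S takes its largest possible value for those sample sizes. With well-mixed samples the two U values are close and S is near zero.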
  • SPSS Explained
    • Perry R. Hinton, Isabella McMurray, Charlotte Brownlow(Authors)
    • 2014(Publication Date)
    • Routledge
      (Publisher)
    Through the Options button you can select the means and standard deviations for the variables. While these descriptive statistics can be performed on ordinal data sets, we recommend a degree of caution when examining and interpreting them.
    SPSS advanced
    Mann-Whitney U is a commonly used independent design nonparametric test. However, a brief description is given below of further analyses that you may wish to use to examine the nature of your data.
    The Kolmogorov-Smirnov test is used in a variety of analyses, but in the case of Mann-Whitney and two sample tests, it is used to determine whether the two sets of scores come from the same distribution.
    The Wald-Wolfowitz runs test is used to show how many ‘runs’ we have in the rank ordering of the data (a run is a series of ranks from the same group). As we go through the ranks, we can see if consecutive ranks come from the same group. If one group has all the bottom ranks and the other group has all the top ranks, we only have two runs. In our data, the number of runs is six, indicating some mixing of the groups. With 16 participants, the worst mixing would give us 16 runs.
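Counting runs as described above is straightforward to sketch. The snippet below is illustrative only: the function name and the group labels "A" and "B" are hypothetical, and it covers the counting step, not the significance test that SPSS applies to the count.

```python
def count_runs(labels_in_rank_order):
    """Count runs: maximal stretches of consecutive ranks from the same group."""
    runs = 0
    previous = object()  # sentinel that never equals a real label
    for label in labels_in_rank_order:
        if label != previous:
            runs += 1
            previous = label
    return runs


# Group labels listed from the lowest rank to the highest rank.
fully_separated = ["A"] * 7 + ["B"] * 9   # one group holds all the bottom ranks
fully_alternating = ["A", "B"] * 8        # two equal groups of 8, perfectly mixed

print(count_runs(fully_separated))    # 2
print(count_runs(fully_alternating))  # 16
```

The 16-run case shown here uses two equal groups of 8. When the groups are unequal, some neighbouring ranks must come from the larger group, so the largest achievable count is slightly lower.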
    The Moses extreme reactions test takes one of the groups as a control group and the second as an experimental group, and checks to see whether the experimental group has more extreme values than the control group.

    SPSS output

    The first table generated by SPSS is a description of the data giving the Mean Rank for each group and the Sum of Ranks for each group.
    Ranks
                      social club           N    Mean Rank   Sum of Ranks
    enjoyment rating  Valley Social Club    9    11.22       101.00
                      Hilltop Social Club   7    5.00        35.00
                      Total                 16
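The entries in the Ranks table can be checked against one another with a little arithmetic. The sketch below is only a consistency check on the printed figures, not SPSS's own computation. The helper names are hypothetical, and the U formula (sum of ranks minus n(n + 1)/2) is one common convention.

```python
def mean_rank(sum_of_ranks, n):
    """Mean Rank is the Sum of Ranks divided by the group size."""
    return sum_of_ranks / n


def u_from_rank_sum(sum_of_ranks, n):
    """Mann-Whitney U for a group, derived from its Sum of Ranks."""
    return sum_of_ranks - n * (n + 1) / 2


valley_n, valley_sum = 9, 101.0
hilltop_n, hilltop_sum = 7, 35.0
total_n = valley_n + hilltop_n

# Ranks 1..16 add up to 16 * 17 / 2 = 136, which the two sums must share.
all_ranks_total = total_n * (total_n + 1) / 2

u_valley = u_from_rank_sum(valley_sum, valley_n)
u_hilltop = u_from_rank_sum(hilltop_sum, hilltop_n)

print(round(mean_rank(valley_sum, valley_n), 2))    # 11.22
print(round(mean_rank(hilltop_sum, hilltop_n), 2))  # 5.0
print(u_valley, u_hilltop)                          # 56.0 7.0
```

The two U values add up to 9 × 7 = 63, as they must. The smaller one, 7, is the value usually looked up or reported.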
    It may be worth consulting a statistics book to refresh your memory on the logic of the Mann-Whitney test to enable a fuller understanding of this test – for example, Hinton (2014).
    See Chapter 17, Hinton (2014)
    SPSS essential
    N indicates the number of participants in each group, and the total number of participants.
    The Mean Rank indicates the mean rank of scores within each group.
    The Sum of Ranks
  • Statistics for Psychologists
    An Intermediate Course

    not normal. A number of commonly used distribution-free procedures will be described in Sections 8.2–8.5.
    A further class of procedures that do not require the normality assumption are those often referred to as computationally intensive for reasons that will become apparent in Sections 8.6 and 8.7. The methods to be described in these two sections use repeated permutations of the data, or repeated sampling from the data to generate an appropriate distribution for a test statistic under some null hypothesis of interest.
    Although most of the work on distribution-free methods has concentrated on developing hypothesis testing facilities, it is also possible to construct confidence intervals for particular quantities of interest, as is demonstrated at particular points throughout this chapter.
    8.2.  The Wilcoxon–Mann–Whitney Test and Wilcoxon’s Signed Ranks Test
    First, let us deal with the question of what’s in a name. The statistical literature refers to two equivalent tests formulated in different ways as the Wilcoxon Rank Sum Test and the Mann–Whitney test. The two names arise because of the independent development of the two equivalent tests by Wilcoxon (1945) and by Mann and Whitney (1947). In both cases, the authors’ aim was to come up with a distribution-free alternative to the independent samples t test.
    The main points to remember about the Wilcoxon–Mann–Whitney test are as follows.
    1. The null hypothesis to be tested is that the two populations being compared have identical distributions. (For two normally distributed populations with common variance, this would be equivalent to the hypothesis that the means of the two populations are the same.)
  • Statistics Explained
    • Perry R. Hinton(Author)
    • 2014(Publication Date)
    • Routledge
      (Publisher)
    Sometimes samples are not suitable for parametric analysis. It might be that the assumptions are not met, as the data are clearly not normally distributed. In this case a nonparametric test may be suitable. These tests do not require the same assumptions as a parametric test. One of the most common situations for using a nonparametric test is when the data are ordinal. In this case the scores can be used to order the data but we do not assume that the scores come from an interval scale. If someone ticks 8 (on a 1–10 scale) on a questionnaire for ‘how much do you like tennis?’ and 4 for ‘how much do you like golf?’ we know they like tennis more than golf but we might not be able to say they like it twice as much if we have concerns that the human ability to judge in combination with the questionnaire as a measuring device is not accurate enough for us to assume interval or ratio data. So the scores are converted to ranks and the analysis is performed on these ranks.
    In the Mann-Whitney test for independent samples, all the scores are ranked from lowest to highest and a U value calculated for each sample. If one sample is producing the high scores, and the other sample the low scores, then one U value will be large and the other small. If they are jumbled up then the two U values will be similar. When one U value is small we can work out the exact probability of obtaining such a small value or smaller under the null hypothesis and make a statistical decision of significance.
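For small samples, the exact probability mentioned above can be obtained by brute force: list every way the pooled ranks could be split between the two groups and count how often the smaller U is at least as small as the one observed. The sketch below assumes there are no tied scores and reports a two-tailed probability; the function name is hypothetical.

```python
from itertools import combinations


def exact_u_test(sample_a, sample_b):
    """Return (smaller U, exact two-tailed probability), assuming no ties."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted(list(sample_a) + list(sample_b))
    # With no ties, each score maps to a unique rank from 1 to n_a + n_b.
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    offset = n_a * (n_a + 1) // 2

    def smaller_u(rank_sum):
        u = rank_sum - offset
        return min(u, n_a * n_b - u)

    observed = smaller_u(sum(rank_of[v] for v in sample_a))

    # Under the null hypothesis every split of the ranks is equally likely.
    splits = list(combinations(range(1, n_a + n_b + 1), n_a))
    as_extreme = sum(1 for split in splits if smaller_u(sum(split)) <= observed)
    return observed, as_extreme / len(splits)
```

For two fully separated samples of three scores each, only 2 of the 20 possible splits give a U of 0, so the two-tailed probability is 0.1. With samples that small, a result significant at the 5% level is not attainable.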
    The Wilcoxon test also analyses the ranks but as it is used with paired samples, the differences between the subjects’ scores on each sample are ranked. We would expect these to be consistent if there is an effect of the independent variable, and we would expect one large value and one small value for the sum of the ranks of the positive differences and the sum of the ranks of the negative differences. We label the small value as T. If the two values were similar then we would not expect a significant difference between the samples. We can work out the exact probability of finding a value as small or smaller than the calculated value of T
  • Statistical Analysis for Education and Psychology Researchers
    Tools for researchers in education and psychology

    • Ian Peers(Author)
    • 2006(Publication Date)
    • Routledge
      (Publisher)
    The alternative hypothesis may be directional (a one-tailed test), for example, the majority of larger rank scores are found in one sample and this sample would have a larger mean rank score, or nondirectional, for example, this simply states that the two sample distributions of rank scores are different. The test statistic, S_R, is the rank sum for the sample (group) which has the smallest sample size. With small sample sizes, <10, this test statistic, S_R, has an exact sampling distribution, however S_R rapidly approaches a normal distribution as the sample size of one or both of the groups increases—for sample sizes ≥20. The Wilcoxon M-W test is based on the idea that if there are two populations and not one (i.e., H0 is false) the rank order scores in one sample will generally be larger than the rank scores in the other sample. This difference, that is higher ranking scores found mostly in one sample, could be detected by ranking all scores irrespective of what group they belong to and then summing the rank scores according to group membership. If H0 is true, we would expect the rank scores to be similarly represented in both samples (groups) and the average ranks in each of the two groups to be about equal. We would not reject the null hypothesis and conclude that there is no difference in the two distributions being compared. If the two samples were different, that is having come from two distinct populations, then we would expect higher (or lower) rank sum totals (allowing for differences in sample size) in one of the samples. The sampling distribution of the rank sum S_R is known and hence the probability associated with extreme values of the test statistic
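The large-sample normal approximation to the rank sum can be sketched as follows. This is a minimal illustration using the standard mean and variance of a rank sum under the null hypothesis; it applies no continuity or tie correction, and the function name and example figures are assumptions made for illustration.

```python
import math


def rank_sum_z(rank_sum_small, n_small, n_large):
    """Normal approximation for the rank sum of the smaller group."""
    total = n_small + n_large
    expected = n_small * (total + 1) / 2             # mean of the rank sum under H0
    variance = n_small * n_large * (total + 1) / 12  # variance under H0, no ties
    z = (rank_sum_small - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # two-tailed normal probability
    return z, p_two_sided


# Illustrative figures: a group of 7 with rank sum 35 among 16 observations.
z, p = rank_sum_z(35, 7, 9)
print(round(z, 2))  # -2.59
```

A rank sum well below its expected value (59.5 here) gives a large negative z. For groups this small the exact sampling distribution would normally be preferred, so the example only shows the mechanics.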
  • Applied Statistical Inference with MINITAB®, Second Edition
    You may think that since nonparametric methods do not rely on strict distributional assumptions, these methods would obviously be preferred over the standard parametric methods of inference. However, this is not usually the case because nonparametric methods do rely on some basic distributional assumptions. Furthermore, nonparametric methods are not as powerful as parametric methods when all the model assumptions have been met. So, in practice, nonparametric methods only tend to be used when there are gross violations of the model assumptions, as is often the case when there are a significant number of outliers that could impact the analysis or when smaller sample sizes are used.
           

    11.2 Wilcoxon Signed-Rank Test

    The first nonparametric test that we will be considering is called the Wilcoxon signed-rank test. The Wilcoxon signed-rank test is similar to a t-test for comparing a population mean against a specific hypothesized value. However, the difference between the Wilcoxon signed-rank test and a one-sample t-test is that while the population median is tested against some hypothesized value in the former, the population mean is tested against some hypothesized value in the latter. The possible null and alternative hypotheses for a Wilcoxon signed-rank test are as follows:
    H0: η = η0    H1: η ≠ η0
    H0: η = η0    H1: η < η0
    H0: η = η0    H1: η > η0
    Where η is the symbol for the population median and η0 is the median being tested under the null hypothesis.
    The Wilcoxon signed-rank test relies on the assumption that the sample is drawn from a population that has a symmetric distribution, but no specific shape of the distribution is required. Recall in Chapter 4 that for a one-sample t-test with a small sample size (usually when n < 30), we assumed that the population we sampled from was approximately normally distributed. If the population sampled from does not come from a normal distribution, then we may want to consider using the Wilcoxon signed-rank test as an alternative to a one-sample t
  • Introductory Probability and Statistics
    Applications for Forestry and Natural Sciences (Revised Edition)

    • Robert Kozak, Antal Kozak, Christina Staudhammer, Susan Watts(Authors)
    • 2019(Publication Date)
    Section 14.1, uses only plus and minus signs to identify differences between the observations and their median. In the 1940s, Frank Wilcoxon created a similar but more sophisticated test that uses both the direction and the magnitude of the differences between the observations and their median.
    The null and alternative hypotheses for the so-called Wilcoxon signed rank test are the same as those in Section 14.1. As with the sign test, the Wilcoxon signed rank test can be used to test the null hypothesis that the median equals c in a one-sample test and the corresponding null hypothesis in a paired difference two-sample test. Also, if the samples (either one-sample or paired) are taken from a continuous symmetric population, the signed rank test is applicable for testing unknown population means, as well as medians. In the one-sample case, the absolute values of the differences between the observations and the unknown hypothetical population median (or mean) are ranked. In the paired sample cases, the absolute values of the paired differences (d_i) are ranked. In both cases, zero differences are discarded in the process of ranking. If there are ties, we assign the average of the ranks that would have been assigned if the differences were distinguishable. This concept of ‘ties’ is perhaps best illustrated with an example. In the following problem, the lowest ranked differences are –2, –2 and 2. Since we are looking at absolute values only, this is considered a three-way tie. If the differences between these values were distinguishable, they would be ranked 1, 2 and 3. However, since they are tied, each receives an average rank of (1 + 2 + 3)/3 = 2.
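The ranking rule for ties can be sketched in code. The differences used below are hypothetical, except that the three smallest are –2, –2 and 2 as in the text. The function name is also hypothetical. The sketch discards zero differences, ranks the absolute values with average ranks for ties, and then totals the ranks separately for positive and negative differences.

```python
def signed_rank_sums(differences):
    """Return (rank by absolute value, positive rank total, negative rank total)."""
    nonzero = [d for d in differences if d != 0]  # zero differences are discarded
    ordered = sorted(nonzero, key=abs)

    rank_of_abs = {}
    position = 0
    while position < len(ordered):
        end = position
        # Extend the block while the next absolute difference is tied.
        while end + 1 < len(ordered) and abs(ordered[end + 1]) == abs(ordered[position]):
            end += 1
        # Tied values share the average of ranks position+1 .. end+1.
        rank_of_abs[abs(ordered[position])] = (position + end) / 2 + 1
        position = end + 1

    w_plus = sum(rank_of_abs[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(rank_of_abs[abs(d)] for d in nonzero if d < 0)
    return rank_of_abs, w_plus, w_minus


ranks, w_plus, w_minus = signed_rank_sums([0, -2, -2, 2, 5, -7, 9])
print(ranks[2])         # 2.0, the shared rank of the three-way tie
print(w_plus, w_minus)  # 12.0 9.0
```

The two totals always add up to n(n + 1)/2 for the n non-zero differences, here 6 × 7 / 2 = 21, which is a quick check on the ranking.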
    If the null hypothesis is true, the total of the ranks corresponding to the positive differences (w+) should be approximately equal to the total of the ranks of the negative differences (w−). When repeated samples are taken from a population, the w+ and w− are considered individual values of W+ and W−
  • Sample Size Determination and Power
    • Thomas P. Ryan(Author)
    • 2013(Publication Date)
    • Wiley
      (Publisher)
    That is, the one sample can be a sample of differences, computed by subtracting each of the second set of observations from each of the corresponding observations in the first sample. Then the null hypothesis is the same as when the starting point is just a single sample. This is probably the most common use of the Wilcoxon test. The set of differences might be correlated, however, so Rosner, Glynn, and Lee (2003, 2006) proposed a modified Wilcoxon test for paired comparisons of clustered data and Rosner and Glynn (2011) proposed sample size determination methods for the test by extending their methods for the regular Wilcoxon test that were given in Rosner and Glynn (2009).

    10.2 WILCOXON TWO-SAMPLE TEST (MANN–WHITNEY TEST)

    The Wilcoxon two-sample test is more commonly referred to as the Mann–Whitney (1947) test and is sometimes called the Wilcoxon–Mann–Whitney test [as in Rahardja, Zhao, and Qu (2009) and Shieh, Jan, and Randles (2006)]. It is used for two independent samples to test whether the corresponding two populations have the same distribution when it is not reasonable to assume a normal distribution. As such, it is the most commonly used nonparametric test for comparing two populations. Okeh (2009) surveyed five biomedical journals and concluded that the Mann–Whitney two-sample test and the Wilcoxon one-sample test should be used more frequently in medical research, in which nonnormal data are widespread, with much data being ordinal (Rabbee, Coull, Mehta, Patel, and Senchaudhuri, 2003). Posten (1982) studied the power of the test relative to the independent sample t-test for various nonnormal distributions and concluded that the former is superior to the latter, although not for U- and J-shaped distributions
  • Beginning Statistics with Data Analysis
    • Frederick Mosteller, Stephen E. Fienberg, Robert E.K. Rourke, Stephen E. Fienberg, Robert E.K. Rourke(Authors)
    • 2013(Publication Date)
    B. The students are then tested for recall, and the following scores are obtained:
    Use the t test of Chapter 10 to analyze these data.
    8. Continuation. Reanalyze the data from Problem 7 using the sign test, and compare the observed significance level with that from the t test.
    9. Continuation. Suppose we were to add two additional pairs of scores to the data in Problem 7:
    How do these new observations affect the sign test of Problem 8?

    16-3 THE MANN-WHITNEY-WILCOXON TWO-SAMPLE TEST

    When we have two independent samples, we may want to know if the populations have much the same location or if they are separated.
    When we are willing to suppose that our measurements are approximately normally distributed without wild observations, the two-sample t test suits us well for this purpose. But when we have no such comfortable views about the samples and their populations, we may prefer an approach in which a few wild observations will cause only limited damage. The Mann-Whitney-Wilcoxon two-sample rank test offers such an approach.
    EXAMPLE 4 Samples of sizes 2 and 4. Sample A contains two measurements, 6 and 24; sample B has four measurements, 14, 33, 74, and 105. Compare the samples for evidence that sample B comes from a population slipped to the right of that of sample A.
    SOLUTION. The ranking approach considers all six measurements as a population. Ranks are assigned from least to greatest, here rank 1 to the measurement 6, rank 2 to 14, and so on up to rank 6 for 105. Then we form all possible situations that divide the six into two samples of sizes 2 and 4. Finally, we compute the distribution of their summed ranks for the samples of size 2.
    This program has been carried out in Table 16-1
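The enumeration described in the solution can be sketched directly for the data of Example 4. The code below is an illustration of the approach, not a reproduction of the book's table. The variable names are hypothetical, and the lower tail is used because a small rank sum for sample A is what would suggest that B is shifted to the right.

```python
from fractions import Fraction
from itertools import combinations

sample_a = [6, 24]
sample_b = [14, 33, 74, 105]

# Rank all six measurements together, from least (rank 1) to greatest (rank 6).
pooled = sorted(sample_a + sample_b)
rank_of = {value: position + 1 for position, value in enumerate(pooled)}
observed_sum = sum(rank_of[v] for v in sample_a)  # ranks 1 and 3

# Every way of choosing which two of the six ranks belong to the sample of size 2.
rank_sum_counts = {}
for pair in combinations(range(1, len(pooled) + 1), len(sample_a)):
    total = sum(pair)
    rank_sum_counts[total] = rank_sum_counts.get(total, 0) + 1

n_splits = sum(rank_sum_counts.values())
as_small = sum(count for s, count in rank_sum_counts.items() if s <= observed_sum)
lower_tail = Fraction(as_small, n_splits)

print(observed_sum)  # 4
print(n_splits)      # 15
print(lower_tail)    # 2/15
```

Only the rank pairs (1, 2) and (1, 3) give a sum of 4 or less, so the chance of a rank sum this small under random assignment is 2/15, about 0.13. With samples this small, even the most extreme split has probability 1/15.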
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.