Non-parametric alternative: Mann-Whitney u-test

Written by:

Ylva B Almquist

Quick facts

Number of variables
One group variable
One test variable

Scales of variable(s)
Group variable: categorical with two values (binary)
Test variable: continuous or categorical (ordinal)

Introduction

It is not uncommon for at least one of the assumptions behind the independent samples t-test to be violated. While you will usually be able to conduct the test anyway, it is important to be aware of the possible problems.

Alternatively, you can use the Mann-Whitney u-test, which is a non-parametric alternative to the independent samples t-test that relaxes some of the assumptions that were presented earlier.

The Mann-Whitney u-test is typically used when the test variable is not sufficiently normally distributed, or when it is measured on the ordinal scale.
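One way to judge whether the test variable is sufficiently normally distributed is to inspect it separately for each group. The following is only a sketch, assuming that StataData1.dta is in the working directory and using the variables `cognitive` and `sex` from the practical example further down:

```stata
* Assumption: StataData1.dta is stored in the current working directory
use StataData1.dta, clear

* One histogram per group, to inspect the shape of the distribution
histogram cognitive, by(sex)

* Detailed summary per group, including skewness and kurtosis
bysort sex: summarize cognitive, detail
```

Clearly skewed histograms, or skewness far from 0, would speak in favour of also running the Mann-Whitney u-test.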

Note
The Mann-Whitney u-test is sometimes referred to as the Wilcoxon-Mann-Whitney test or the Wilcoxon Rank-Sum test.

Z-distribution

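The Mann-Whitney u-test is evaluated against the z-distribution: if there is no difference between the groups, the test statistic approximately follows a z-distribution, provided that the samples are not too small.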

The z-distribution is a special form of a normal distribution, where the mean is 0 and the standard deviation is 1.

U-values and z-statistic

To perform the Mann-Whitney u-test, the rankings of the individual values first need to be determined. In other words, the test starts by ordering the values across the two groups and assigning each individual a rank. These rankings are then added up for each of the two groups and transformed into u-values.

From the u-values, we can calculate a z-statistic.
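For reference, the calculation can be written out in its standard textbook form. The notation is our own and not part of the Stata output: n_1 and n_2 are the group sizes, R_1 and R_2 are the rank sums of the two groups, and ties are ignored for simplicity.

```latex
\begin{align*}
U_1 &= R_1 - \frac{n_1 (n_1 + 1)}{2}, &
U_2 &= R_2 - \frac{n_2 (n_2 + 1)}{2}, &
U_1 + U_2 &= n_1 n_2, \\[1ex]
z &= \frac{U_1 - \dfrac{n_1 n_2}{2}}
          {\sqrt{\dfrac{n_1 n_2 \,(n_1 + n_2 + 1)}{12}}}
\end{align*}
```

The numerator compares the observed u-value with the value expected if there is no difference between the groups, and the denominator is the standard deviation of the u-value under that same assumption.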

P-value

For each z-statistic, there is a corresponding p-value. A p-value that is lower than 0.05 means that we can reject the null hypothesis (which stipulates that there is no difference between the groups) at the 5% level.
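The link between a z-statistic and its p-value can be checked directly in Stata with the built-in functions `normal()` and `invnormal()`. In this small sketch, the z-statistic of 2.5 is a made-up value used only for illustration:

```stata
* Critical value for a two-sided test at the 5% level (about 1.96)
display invnormal(0.975)

* Two-sided p-value for a hypothetical z-statistic of 2.5 (about 0.012)
display 2 * normal(-abs(2.5))
```

Any z-statistic that is further from 0 than about 1.96 thus gives a two-sided p-value below 0.05.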

Note
There is also something called “ties” that is relevant for the Mann-Whitney u-test. It basically means that two individuals can share the same rank (because they have the same value for the test variable). In this case, the calculation needs to be adjusted for ties, which the `ranksum` command does automatically.
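To see how ranks, ties, and rank sums work in practice, the steps can be reproduced by hand on a small toy dataset. All values and variable names below (`group`, `score`, `rnk`) are hypothetical and chosen only for illustration:

```stata
* Toy data (made-up values): six observations in two groups
clear
input byte group double score
0 12
0 15
0 15
1 15
1 18
1 20
end

* rank() orders the values across both groups; tied values share
* their average rank, so the three 15s all get rank 3 here
egen double rnk = rank(score)
list group score rnk, sepby(group)

* Rank sum and number of observations per group
tabstat rnk, by(group) statistics(sum n)

* u-value for group 0: rank sum minus n*(n+1)/2
quietly summarize rnk if group == 0
display "U (group 0) = " r(sum) - r(N) * (r(N) + 1) / 2

* The built-in test on the same data
ranksum score, by(group)
```

In this toy example, the ranks are 1, 3, 3, 3, 5, and 6, so the rank sums are 7 for group 0 and 14 for group 1.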

Function

Basic command

ranksum testvar, by(groupvar)

Explanations
`testvar`	Insert the name of the variable that you want to test.
`groupvar`	Insert the variable defining the two groups.

More information
help ranksum
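To try the command without any data of your own, you can use the auto dataset that is shipped with Stata. This is only an illustration and not part of the guide's own example; `mpg` and `foreign` are variables in that dataset, and `porder` is an option of `ranksum`:

```stata
* Load the example dataset that comes with Stata
sysuse auto, clear

* Basic command: compare mileage between domestic and foreign cars
ranksum mpg, by(foreign)

* porder additionally reports the probability that a value from the
* first group is larger than a value from the second group
ranksum mpg, by(foreign) porder
```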

Practical example

Dataset

StataData1.dta

Variable name	cognitive
Variable label	Cognitive test score (Age 15, Year 1985)
Value labels	N/A

Variable name	sex
Variable label	Sex
Value labels	0=Man 1=Woman

ranksum cognitive, by(sex)

The z-statistic in this example is 5.100, with a p-value that is displayed as 0.0000 (i.e., p<0.0001). Since the p-value is below 0.05, this allows us to reject the null hypothesis (which postulates that there is no difference in the distribution of the test variable between the two groups). Note that the Mann-Whitney u-test is based on ranks, which means that it does not compare means.

In other words, there is a significant difference in cognitive test scores between men and women in this example, to the advantage of men (in line with what the previous t-test showed).
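A p-value displayed as 0.0000 is not exactly zero; it is just too small to show with four decimals. As a sketch, assuming that StataData1.dta is in the working directory, the stored results of `ranksum` can be used to display it with more precision:

```stata
* Assumption: StataData1.dta is stored in the current working directory
use StataData1.dta, clear
ranksum cognitive, by(sex)

* Show everything that ranksum has stored
return list

* Display the z-statistic and the two-sided p-value derived from it
display "z = " %6.3f r(z)
display "two-sided p = " %9.2e 2 * normal(-abs(r(z)))
```

The p-value is derived from `r(z)` here because that stored result is available across Stata versions.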

Note
We tend to recommend a pragmatic approach to the choice between the parametric and the non-parametric test. If an assumption of the parametric test (i.e., the independent samples t-test) is violated, we strongly encourage you to perform the non-parametric alternative (i.e., the Mann-Whitney u-test) as a sensitivity analysis. If the latter leads to the same conclusion, it is preferable to use the former, since it allows for further specifications. However, you might still exercise a bit of caution when it comes to reporting the exact mean differences.
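Such a sensitivity analysis could look like the sketch below, again assuming that StataData1.dta is in the working directory. The local macro names `p_t` and `p_u` are our own:

```stata
* Assumption: StataData1.dta is stored in the current working directory
use StataData1.dta, clear

* Parametric test: independent samples t-test
ttest cognitive, by(sex)
local p_t = r(p)

* Non-parametric alternative: Mann-Whitney u-test
ranksum cognitive, by(sex)
local p_u = 2 * normal(-abs(r(z)))

* Compare the two-sided p-values side by side
display "t-test p = " %6.4f `p_t' "   Mann-Whitney p = " %6.4f `p_u'
```

If both p-values fall on the same side of 0.05, the two tests lead to the same conclusion.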