Non-parametric alternative: Kruskal-Wallis ANOVA

Written by:

Ylva B Almquist

Quick facts

Number of variables
One group variable
One test variable

Scales of variable(s)
Group variable: categorical
Test variable: continuous or categorical (ordinal)

Introduction

It is not uncommon that at least one of the assumptions behind the one-way ANOVA is violated. In most cases you can still conduct the test, but it is important to be aware of the possible problems.

Alternatively, you can use a Kruskal-Wallis ANOVA, which is a non-parametric type of ANOVA that relaxes some of the assumptions that were presented earlier.

The Kruskal-Wallis ANOVA is specifically used when the test variable is not sufficiently normally distributed (e.g., when you have a test variable on the ordinal scale).

Note
The Kruskal-Wallis ANOVA is sometimes referred to as the Kruskal-Wallis test, Kruskal-Wallis one-way ANOVA by ranks test, or Kruskal-Wallis H test.

Note
While this test is robust to moderate violations of the normality assumption, the group sizes should be approximately equal, and the distributions of the groups should have approximately the same shape (they cannot be skewed in different directions, i.e., one positively skewed and another negatively skewed).
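One way to inspect group sizes and distribution shapes before running the test is sketched below. This is only an illustration: `testvar` and `groupvar` are placeholders for your own test variable and group variable, and the choice of summary statistics is an assumption rather than a fixed rule.

```stata
* Histogram of the test variable, drawn separately for each group
histogram testvar, by(groupvar)

* Group sizes, medians and skewness per group
tabstat testvar, by(groupvar) statistics(n median skewness)
```

If the skewness values have opposite signs across groups, the distributions are skewed in different directions.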

Note
The Kruskal-Wallis ANOVA will only tell you whether there is a difference between the groups, but not which of the groups differ from one another.

Note
The Kruskal-Wallis ANOVA is very similar to the Mann-Whitney U test, but adapted to accommodate a comparison between more than two groups.

Chi-square distribution

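The Kruskal-Wallis ANOVA relies on the chi-square distribution: if there is no difference between the groups, its test statistic approximately follows a chi-square distribution.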

A chi-square distribution is a continuous probability distribution. The main purpose of this type of distribution is not to describe real-world scenarios, but to be used in hypothesis testing.

The shape of the distribution is determined by the degrees of freedom, while the chosen alpha level (i.e., significance level) determines the critical value that the test statistic has to exceed. The fewer the degrees of freedom, the closer the peak of the distribution lies to 0.

Note
For the Kruskal-Wallis ANOVA, the degrees of freedom correspond to the number of groups minus 1 (k-1).
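As a rough illustration of how the degrees of freedom and the alpha level work together, Stata's built-in functions `invchi2tail()` and `chi2tail()` can be used to look up critical values and p-values. The sketch below assumes three groups (and thus 2 degrees of freedom) and an alpha level of 0.05. The test statistic of 7.2 is a made-up value used only for illustration.

```stata
* Critical value: the chi-square value that cuts off the upper 5% of
* the distribution with 2 degrees of freedom (three groups minus 1)
display invchi2tail(2, 0.05)

* p-value for a hypothetical test statistic of 7.2 with 2 degrees of freedom
display chi2tail(2, 7.2)
```

The first command returns roughly 5.99, so with three groups a test statistic above that value is significant at the 5% level.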

H-statistic and chi-squared value

Coming soon!

Note
There is also something called “ties” that is relevant for the Kruskal-Wallis ANOVA. A tie means that two or more observations have the same value and therefore receive the same rank. When ties are present, the calculation needs to be adjusted for them.
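For reference, the textbook form of the H-statistic and its adjustment for ties can be written as follows. The notation is an assumption introduced here, not taken from this guide: N is the total number of observations, k the number of groups, n_j the number of observations in group j, R_j the sum of ranks in group j, and t_i the number of observations sharing the i-th tied value.

```latex
H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^{2}}{n_j} - 3(N+1)

H_{\text{ties}} = \frac{H}{1 - \dfrac{\sum_{i} \left(t_i^{3} - t_i\right)}{N^{3} - N}}
```

Without ties, the denominator equals 1 and the two values coincide. Both are compared against a chi-square distribution with k-1 degrees of freedom.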

Function

Basic command

kwallis testvar, by(groupvar)

Explanations
`testvar`	Insert the name of the variable that you want to test.
`groupvar`	Insert the variable defining the groups.

More information
help kwallis
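To see what the command does, it can help to run it on a very small dataset first. The sketch below uses made-up data (the variables `y` and `g` and all values are hypothetical) with three groups of three observations and no ties. It first reproduces the rank sums by hand and then displays the results that `kwallis` stores in `r()`.

```stata
* Toy data (hypothetical): three groups with three observations each
clear
input y g
1 1
2 1
3 1
4 2
5 2
6 2
7 3
8 3
9 3
end

* Rank all observations together, then sum the ranks within each group
egen y_rank = rank(y)
tabstat y_rank, by(g) statistics(n sum)

* Run the test and show the stored results
kwallis y, by(g)
display "H = " r(chi2) ", adjusted for ties = " r(chi2_adj) ", df = " r(df)
```

Here the rank sums are 6, 15 and 24, which gives H = 12/(9×10) × (36/3 + 225/3 + 576/3) - 3×10 = 7.2 with 2 degrees of freedom. Since there are no ties, the adjusted value is the same.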

Practical example

Dataset

StataData1.dta

Variable name	income
Variable label	Annual salary income (Age 40, Year 2010)
Value labels	N/A

Variable name	educ
Variable label	Educational level (Age 40, Year 2010)
Value labels	1=Compulsory 2=Upper secondary 3=University

kwallis income, by(educ)

The table provides some summary statistics. Here we can see the number of observations per group (Obs) and the sum of ranks for each group (Rank Sum). The ranks are assigned across all observations, regardless of group, and are then added up within each group.

Below the table, two sets of chi2 values and probabilities are reported. If the ranked variable (in this case income) does not uniquely define individuals (i.e., individuals can have the same income and thus the same rank), then we should focus on the latter set (“chi-squared with ties”).

In this example, the chi2 value is 549.333 and the associated probability is 0.0001. Since this is below 0.05, there is a statistically significant difference in income between the educational levels (confirming what we found for the one-way ANOVA).
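Because the Kruskal-Wallis ANOVA does not show which groups differ, one common follow-up is to run pairwise Mann-Whitney U tests with `ranksum` and adjust the significance level for the number of comparisons. This is not part of the original example and is only one of several possible approaches. The sketch below assumes that StataData1.dta is loaded, and the Bonferroni adjustment is an assumption of this illustration.

```stata
* Pairwise comparisons of income between educational levels
ranksum income if inlist(educ, 1, 2), by(educ)
ranksum income if inlist(educ, 1, 3), by(educ)
ranksum income if inlist(educ, 2, 3), by(educ)

* Bonferroni-adjusted alpha level for three comparisons
display 0.05/3
```

Each pairwise probability is then compared against the adjusted alpha level (about 0.0167) instead of 0.05.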