Ordinal regression in short

If you have only one x, it is called simple regression, and if you have more than one x, it is called multiple regression.  

Regardless of whether you are doing a simple or a multiple regression, x-variables can be categorical (nominal/ordinal) and/or continuous (ratio/interval). 

Key information from ordinal regression

Effect
  Odds ratio (OR): the exponentiated log odds
  Log odds: the logarithm of the odds
  Odds: the probability of the outcome being a case divided by the probability of the outcome being a non-case
  Probability: the probability of an event happening

Direction
  Negative: OR below 1
  Positive: OR above 1

Statistical significance
  P-value: p<0.05 means statistically significant at the 5% level; p<0.01 at the 1% level; p<0.001 at the 0.1% level
  95% confidence intervals: an interval that does not include 1 is statistically significant at the 5% level; an interval that includes 1 is statistically non-significant at the 5% level
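
To make these quantities concrete, here is a small worked example (the numbers are made up purely for illustration). Suppose the probability of being a case is 0.8. The conversions can be verified directly in Stata:

  display 0.8 / (1 - 0.8)   // odds: probability of case divided by probability of non-case = 4
  display ln(4)             // log odds: the logarithm of the odds (about 1.39)
  display exp(ln(4))        // exponentiating the log odds returns the odds (4)

An odds ratio is then simply the ratio of two such odds, for example the odds in one group divided by the odds in a reference group.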

Odds ratio (OR) 

In ordinal regression analysis, the effect that x has on y is reflected by an odds ratio (OR): 

OR below 1: For every unit increase in x, the odds of being in a higher ordered category of y decrease.
OR above 1: For every unit increase in x, the odds of being in a higher ordered category of y increase.

Exactly how one interprets the OR in plain writing depends on the measurement scale of the x-variable. That is why we will present examples later for continuous, binary, and categorical (non-binary) x-variables. 
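
Ahead of those examples, here is a minimal sketch of what the command looks like in Stata (the variable names are hypothetical): ologit fits the ordinal regression, the or option displays odds ratios instead of log odds, and the c. and i. prefixes tell Stata to treat an x-variable as continuous or categorical, respectively:

  * Hypothetical example: y is ordinal, age is continuous,
  * sex is binary, and education is categorical (non-binary)
  ologit y c.age i.sex i.education, or

Each OR is then interpreted per one-unit increase for the continuous x-variable, and relative to the reference category for the binary and categorical ones.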

Note
Unlike linear regression, where the null value (i.e. value that denotes no difference) is 0, the null value for ordinal regression is 1. 
Note
An OR can never be negative – it can range between 0 and infinity.

How not to interpret odds ratios 

Odds ratios are not the same as risk ratios (see Attributable proportion). Compared with the corresponding risk ratio, an OR tends to be inflated when it is above 1 and understated when it is below 1. This becomes more problematic the more common the outcome is (i.e. the more “cases” we have). However, the rarer the outcome is (<10% is usually considered a reasonable cut-off here), the closer odds ratios and risk ratios become.

Many find it tempting to interpret ORs in terms of percentages. For example, an OR of 1.20 might lead to the interpretation that the odds of being in a higher ordered category of the outcome increase by 20%. If the OR is 0.80, some would then suggest that the odds of being in a higher ordered category of the outcome decrease by 20%. We would urge you to reflect carefully on the latter kind of interpretation, since odds ratios are not symmetrical: an OR above 1 can take any value up to infinity, whereas an OR below 1 is confined to the interval between 0 and 1. Thus, the choice of reference category might lead to quite misleading conclusions about effect size. The percentage interpretation is usually considered reasonable when ORs are below 2. If they are above 2, it is better to refer to “times”, i.e. an OR of 4.07 could be interpreted as “more than four times the odds of…”.
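
A quick way to see the asymmetry is to flip the reference category, i.e. take the reciprocal of the OR, and note that the percentages do not mirror each other:

  display 1/0.80   // = 1.25: “20% lower odds” one way becomes “25% higher odds” the other way
  display 1/0.20   // = 5: an OR of 0.20 corresponds to five times the odds in the other direction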

Take home messages
  • Do not interpret odds ratios as risk ratios, unless the outcome is rare (<10%, but even then, be careful). 
  • It is completely fine to discuss the results more generally in terms of higher or lower odds/risks. However, if you want to give exact numbers to exemplify, you need to consider the asymmetry of odds ratios as well as the size of the OR. 

P-values and confidence intervals

In ordinal regression analysis you can get information about statistical significance, in terms of both p-values and confidence intervals (also see P-values).  

Note
The p-values and the confidence intervals will give you partly different information, but they are not contradictory. If the p-value is below 0.05, the 95% confidence interval will not include 1 and, if the p-value is above 0.05, the 95% confidence interval will include 1.
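
The reason the two agree is that the 95% confidence interval for an OR is obtained by exponentiating the confidence interval for the log odds, so the OR interval excludes 1 exactly when the log-odds interval excludes 0, which is what the p-value tests. As a sketch (x1 is a hypothetical variable name, and an estimation command such as ologit must have been run first; the stored coefficients are on the log-odds scale):

  display exp(_b[x1] - 1.96 * _se[x1])   // lower bound of the 95% confidence interval for the OR
  display exp(_b[x1] + 1.96 * _se[x1])   // upper bound of the 95% confidence interval for the OR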

When you look at the p-value, you can rather easily distinguish between the significance levels (i.e. you can directly say whether you have statistical significance at the 5% level, the 1% level, or the 0.1% level).  

When it comes to confidence intervals, Stata will by default report 95% confidence intervals. It is, however, possible to change the confidence level of the intervals. For example, you may instruct Stata to show 99% confidence intervals instead.
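
For example (with hypothetical variable names):

  ologit y x1 x2, or             // odds ratios with the default 95% confidence intervals
  ologit y x1 x2, or level(99)   // the same model with 99% confidence intervals instead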

R-Squared

R-squared (or R2) does not work very well for ordinal regression, due to the assumptions behind the model. Stata produces a pseudo R2 instead, but because of its inherent bias it is seldom used.
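
If you nevertheless want to inspect it, the pseudo R2 shown in the header of the ologit output is also stored after estimation (hypothetical variable names):

  ologit y x1 x2
  display e(r2_p)   // the pseudo R2 reported in the output header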

Simple versus multiple regression models 

The difference between simple and multiple regression models is that, in a multiple regression, each x-variable’s effect on y is estimated while accounting for the other x-variables’ effects on y. We then say that these other x-variables are “held constant”, “adjusted for”, or “controlled for”. Because of this, multiple regression analysis is a way of dealing with the issue of confounding variables, and to some extent also mediating variables (see Z: confounding, mediating and moderating variables).

It is highly advisable to run a simple regression for each of the x-variables before including them in a multiple regression. Otherwise, you will not have anything to compare the adjusted estimates with (i.e. you cannot see what happened to the coefficients when the other x-variables were included in the analysis). Including multiple x-variables in the same model usually (but not always) means that their effects become weaker, which is exactly what you would expect if the x-variables overlap in their effect on y.
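
A minimal sketch of this workflow (hypothetical variable names):

  ologit y x1, or      // simple (unadjusted) model for x1
  ologit y x2, or      // simple (unadjusted) model for x2
  ologit y x1 x2, or   // multiple (adjusted) model including both

Comparing the ORs for x1 and x2 between the simple and the multiple models then shows how much their effects on y overlap.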

A note

Remember that a regression analysis should follow from theory as well as a comprehensive set of descriptive statistics and knowledge about the data. In the following sections, we will – for the sake of simplicity – not form any elaborate analytical strategy where we distinguish between x-variables and z-variables (see Z: confounding, mediating and moderating variables). However, we will define an analytical sample and use a so-called pop variable (see From study sample to analytical sample).