As we discussed earlier (see Missing data: attrition and non-response), it is common to have missing data. Missing data is sometimes called attrition (particularly in register studies) and sometimes non-response (particularly in survey/questionnaire studies). Missing data can be external or internal:
| External | Occurs when individuals have been sampled from the population but, for various reasons, they do not get included in the register study (they have emigrated, died, moved, are imprisoned, etc.) or do not participate in the survey (they decline, are too sick, cannot be reached, etc.). |
| Internal | Occurs when individuals who are part of the study, for various reasons (they missed a page of the questionnaire, they refuse to answer specific questions, etc.), have no information for a specific variable or a set of variables. |
As shown above, there are many reasons for missing data. Whether the missingness is problematic depends on what type of missing data we have. In statistical analysis, there are three types of missing data:
| MCAR | Missing Completely At Random: The probability of missing data is unrelated to both observed and unobserved data; it occurs by chance alone |
| MAR | Missing At Random: The probability of missing data is unrelated to unobserved data but may be related to observed data |
| MNAR | Missing Not At Random: The probability of missing data is related to unobserved data |
This was probably a bit confusing, so let us exemplify the differences between MCAR, MAR and MNAR. Suppose we examine the distribution of income in the Swedish population. If missing data were MCAR, it would mean that the missingness is unrelated to both observed data (e.g. gender, employment status) and unobserved data (e.g. having a lower income does not influence the risk of missingness). If missing data were MAR, it would mean that missingness could be related to other variables in the dataset (e.g. employment status), but that, once those variables are taken into account, the probability of missingness is not increased by certain values of the variable itself (e.g. individuals having lower incomes). Finally, if individuals who had certain values of the variable itself (e.g. lower incomes) were more likely to be missing, we would have MNAR.
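To make the income example more concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration only: the income model (employed individuals earn more on average), the missingness probabilities, the cut-off of 25000, and the function names `p_mcar`, `p_mar`, `p_mnar` and `compare_mechanisms`. It is not based on real Swedish data. The sketch removes incomes under each of the three mechanisms and compares the mean of the remaining incomes with the mean in the full data.

```python
import random
import statistics

# Hypothetical missingness mechanisms: each returns the probability that a
# person's income is missing, given employment status and the true income.
def p_mcar(employed, income):
    return 0.3                                # same probability for everyone

def p_mar(employed, income):
    return 0.1 if employed else 0.6           # depends only on observed data

def p_mnar(employed, income):
    return 0.6 if income < 25000 else 0.1     # depends on the income itself

def simulate_population(n, rng):
    """Draw (employed, income) pairs from an invented income model."""
    people = []
    for _ in range(n):
        employed = rng.random() < 0.7
        income = rng.gauss(30000 if employed else 15000, 5000)
        people.append((employed, income))
    return people

def apply_missingness(people, p_missing, rng):
    """Keep only the people whose income is observed under the mechanism."""
    return [(e, inc) for e, inc in people if rng.random() >= p_missing(e, inc)]

def mean_income(people, employed=None):
    """Mean income, optionally restricted to one employment group."""
    return statistics.mean(
        inc for e, inc in people if employed is None or e == employed
    )

def compare_mechanisms(n=20000, seed=1):
    """Return (overall mean, mean among the employed) for each scenario."""
    rng = random.Random(seed)
    full = simulate_population(n, rng)
    result = {"full": (mean_income(full), mean_income(full, employed=True))}
    for name, mech in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
        observed = apply_missingness(full, mech, rng)
        result[name] = (mean_income(observed),
                        mean_income(observed, employed=True))
    return result

if __name__ == "__main__":
    for name, (overall, among_employed) in compare_mechanisms().items():
        print(f"{name:5s} overall: {overall:8.0f}   employed only: {among_employed:8.0f}")
```

Under these assumptions, the MCAR means should stay close to those of the full data. With MAR, the overall mean is distorted because the unemployed are missing more often, but the mean among the employed remains close to the full-data value, since missingness does not depend on income once employment status is taken into account. With MNAR, the mean is distorted even within the employed group, because low incomes themselves are more likely to be missing.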