How to deal with missing data?

It is not easy to determine statistically whether missingness is MCAR, MAR or MNAR. The most important advice is to know your data well: produce descriptive statistics for your study variables to see the extent of missingness in the data. If your data contain only a small number of individuals, a couple of missing values will have more serious consequences than the same number of missing values in data based on the total population of a country.
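As a minimal sketch of what such descriptive statistics could look like, the Python example below counts missing values per variable and the number of individuals with complete information. It assumes the data are held in a pandas DataFrame; the variable names (`gender`, `age`, `income`) and the values are hypothetical and only serve as illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; variable names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["man", "woman", "woman", "man", "woman", "man"],
    "age": [34, 41, np.nan, 29, 52, np.nan],
    "income": [310, np.nan, 280, np.nan, 450, np.nan],
})

# Number and percentage of missing values for each variable.
missing_summary = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": (df.isna().mean() * 100).round(1),
})
print(missing_summary)

# Individuals with information on every variable (complete cases).
n_complete = len(df.dropna())
print(f"Complete cases: {n_complete} of {len(df)}")
```

A summary like this shows which variables drive the loss of individuals, which is useful to know before deciding how to handle the missingness.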

A sound strategy for mapping out and illustrating potential problems with missingness is first to find out everything you can about the reasons for external attrition. Why are some individuals not included in your dataset? Is it likely that they are similar in any important way, or is the missingness due to technical reasons?

Then you get into the issue of internal attrition. Analysing internal attrition is simply called attrition analysis or non-response analysis. Here, you pick one or more variables for which all individuals in the study sample have information, such as gender, age, or some other socio-demographic variable. Produce descriptive statistics for those variables for all individuals in the study sample (the appropriate type of descriptive statistics depends on the measurement scale). Then produce descriptive statistics for the same variables, but now only for the individuals in the analytical sample (Section 11.5 describes how to define an analytical sample).

For example, we have a study sample that contains 5,000 individuals. Approximately 49% are men and 51% are women. The mean age is 38 years. Due to missing data on some of the variables we want to include in our analysis, our analytical sample is reduced to 4,500 individuals. In this sample, 46% are men and 54% are women. The mean age is 40 years. You can illustrate this in a simple descriptive table:

                 Sample (n=5,000)     Analytical sample (n=4,500)
Gender
   Man           49%                  46%
   Woman         51%                  54%
Age (mean)       38 years             40 years

If we compare the distribution of gender and age in the study sample with the distribution of gender and age in the analytical sample, we can conclude that women and older individuals are more likely to be included in our analysis. This is information that could be important to have when we interpret our results.
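The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are hypothetical, the variable names (`gender`, `age`, `outcome`) are invented, and the analytical sample is assumed to consist of the individuals with no missing value on the single analysis variable `outcome`. The helper function `describe` is likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical study sample; names and values are illustrative only.
study = pd.DataFrame({
    "gender": ["man", "woman", "man", "woman", "man", "woman", "woman", "man"],
    "age": [25, 44, 31, 52, 28, 39, 47, 36],
    "outcome": [1.2, 0.8, np.nan, 1.5, np.nan, 0.9, 1.1, 1.4],
})

# Assumed definition of the analytical sample: individuals with
# complete information on the analysis variable(s).
analytical = study.dropna(subset=["outcome"])


def describe(sample):
    """Return sample size, gender distribution (%) and mean age."""
    shares = sample["gender"].value_counts(normalize=True) * 100
    return pd.Series({
        "n": len(sample),
        "Man (%)": shares.get("man", 0.0),
        "Woman (%)": shares.get("woman", 0.0),
        "Age (mean)": sample["age"].mean(),
    })


# Side-by-side table: study sample versus analytical sample.
table = pd.DataFrame({
    "Study sample": describe(study),
    "Analytical sample": describe(analytical),
}).round(1)
print(table)
```

In this toy data, the two individuals lost to internal attrition are both young men, so the analytical sample has a larger share of women and a higher mean age than the study sample, the same pattern as in the example above.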