Statistical analysis

Written by:

Sydney Ross

After all of our preparations, we are now ready to begin the analysis. We start with some descriptive analysis. The following pages then cover simple and multiple logistic regression analysis, interaction analysis, and model diagnostics.

Step 1: Cross-tabulation

To begin the statistical analysis, we first create a cross-tabulation of conflict_dich and health_dich. This is only an initial exploration to see whether there is any sign of an association, before we investigate it with regression analyses.

Note
Remember that the cross-tabulation should only be based on individuals with a value of 1 for the pop variable.

tab conflict_dich health_dich if pop==1, row

The cross-tabulation shows that 69% of those who report conflicting demands at work to a large extent have poor self-rated health, compared with 25% of those who report conflicting demands to a small extent. Overall, 66% report good health and 34% report poor health.

Next, we will add a chi-squared test to the cross-tabulation.

tab conflict_dich health_dich if pop==1, chi2

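The chi-squared test shows a statistically significant association between conflict_dich and health_dich. Stata displays the p-value rounded to three decimals (0.000), so it should be reported as p<0.001 rather than as exactly zero.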
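As a small sketch, the row percentages and the chi-squared test can also be requested in a single command, and the unrounded p-value can be displayed from the results that tabulate stores after the test. The display formats below are only a suggestion.

```stata
* Row percentages and chi-squared test in one table, analytical sample only
tab conflict_dich health_dich if pop==1, row chi2

* tabulate stores the test statistic in r(chi2) and the p-value in r(p)
display "chi2 = " %9.3f r(chi2) "   p = " %12.3e r(p)
```

The display line has to be run directly after the tab command, since the stored results are overwritten by the next command that stores results in r().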

Step 2: Descriptive statistics of analytical sample

Descriptive statistics may feel a bit unnecessary when we have already inspected the variables with the codebook command. However, the analytical sample needs to be described later in the results section, so below we give examples of how to produce descriptive statistics for your analytical sample. It may also be relevant to describe the independent variable and the covariates in relation to the outcome variable, which is what we will do here.

Note
For extra review, please re-visit this section: Designing descriptive tables and graphs.

Note
Remember that the descriptive statistics should only be based on those with a value of 1 for the pop variable.

First, we look at all of the variables with codebook to describe the distribution across the categories of each variable.

codebook conflict_dich health_dich sex year age_cat qual stress if pop==1

Here are two examples of what the output looks like. In the Stata output, we can note the number of observations in each category of a variable and then calculate the percentage in each category.
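If you would rather let Stata calculate the percentages, one possible alternative to codebook is tab1, which produces a one-way frequency table with percentages for every variable listed. This is a sketch that assumes all of the listed variables are categorical. The count command is added only to confirm the size of the analytical sample.

```stata
* Number of individuals in the analytical sample
count if pop==1

* One-way frequency tables (frequencies and percentages) for each variable
tab1 conflict_dich health_dich sex year age_cat qual stress if pop==1
```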

Next, we describe how the categories of each variable are distributed across the categories of the outcome variable. We can also add a chi-squared test to see whether there is a statistically significant association between each variable and the outcome. We start with the independent variable (the same command as in Step 1).

tab conflict_dich health_dich if pop==1, chi2

tab sex health_dich if pop==1, chi2

tab year health_dich if pop==1, chi2

tab age_cat health_dich if pop==1, chi2

tab qual health_dich if pop==1, chi2

tab stress health_dich if pop==1, chi2

These frequencies can be noted, converted to percentages where needed, and used in the descriptive table of the analytical sample in the results section.
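The six tabulations above can also be written as a loop, and adding the row option makes Stata report the percentages directly. This is a minimal sketch. Whether row or column percentages suit your descriptive table best is a judgment call, and the local macro name var is arbitrary.

```stata
* Cross-tabulate each variable against the outcome in the analytical sample,
* with row percentages and a chi-squared test
foreach var of varlist conflict_dich sex year age_cat qual stress {
    tab `var' health_dich if pop==1, row chi2
}
```

With row, each table shows the percentage with good and poor health within each category of the variable. Replacing row with col would instead show how the categories of the variable are distributed within each health group.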

Now that the analytical sample has been determined and described, we are ready to perform the regression analyses.