Function step 1
| Basic command |
| factor varlist |
| Useful options |
| factor varlist, mineigen(number) |
| factor varlist, pcf or ipf or ml |
| Explanations | |
| varlist | List the variables that you want to include in the analysis. |
| mineigen(number) | Set the minimum eigenvalue for a factor to be retained. Default is 1 with pcf. |
| pcf or ipf or ml | Specify the estimation method. Default is pf. |
| Short names | |
| pf | Principal factor method |
| pcf | Principal-component factor method |
| ipf | Iterated principal-factor method |
| ml | Maximum-likelihood factor method |
| Note | Options can be combined, e.g.: factor varlist, mineigen(number) pcf |
| More information | help factor |
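As a minimal sketch of how the options above can be combined, the commands below use the dataset and variables from the practical example that follows. The option factors(2) is not listed in the table; it is part of Stata's factor command and is included here only as an illustration of fixing the number of factors.

```stata
* Sketch only: dataset and variable names are taken from the practical example
use StataData2.dta, clear

* Principal-component factor method, with the eigenvalue threshold set to 1
factor imp_ideas-imp_trad, pcf mineigen(1)

* Iterated principal-factor method, retaining at most two factors
* (factors() is an illustrative addition, not part of the table above)
factor imp_ideas-imp_trad, ipf factors(2)
```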
Performing a factor analysis can be seen as an iterative process: you conduct the analysis, evaluate it, might tweak it a bit, and then conduct it again. We will start by performing a simple factor analysis with the principal-component factor method (pcf).
Practical example
| Dataset |
| StataData2.dta |
| Variable name | Variable label |
| imp_ideas | Important to think up new ideas |
| imp_rich | Important to be rich |
| imp_secure | Important living in secure surroundings |
| imp_good | Important to have a good time |
| imp_help | Important to help people |
| imp_success | Important to be successful |
| imp_risk | Important with adventure and taking risks |
| imp_behave | Important to always behave properly |
| imp_environ | Important looking after the environment |
| imp_trad | Important with tradition |
| factor imp_ideas-imp_trad, pcf |

In the first table, we start with the column called Eigenvalue. We see that Factor1 and Factor2 produce eigenvalues above 1 (2.98870 and 1.61967, respectively). Next, focusing on the column called Proportion, we see that Factor1 accounts for 30% (0.2989) and Factor2 for 16% (0.1620) of the variance.
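The link between the two columns can be checked by hand. The sketch below assumes that, because pcf analyses the correlation matrix of the ten items, the eigenvalues sum to 10, so each proportion is the eigenvalue divided by 10.

```stata
* Proportion = eigenvalue / number of items (10 in this example)
display %6.4f 2.98870/10
display %6.4f 1.61967/10
```

The two results should match the values reported in the Proportion column for Factor1 and Factor2.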
In the second table, we get the factor loadings for each item. When we use the option pcf, factor loadings are only shown for factors with eigenvalues above 1. For Factor1, loadings range between 0.4091 and 0.6658. For Factor2, they range between -0.3716 and 0.5807. The uniqueness values range between 0.4809 and 0.6089. Earlier, we suggested that factor loadings between 0.5 and 1 were acceptable, as well as uniqueness values between 0 and 0.5. Thus, our factor solution is quite poor. Moreover, it is not entirely clear which item belongs to which factor – we might need some rotation here.
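To inspect the loadings and uniqueness values outside the output table, one option is to look at the results that factor stores. The sketch below assumes the stored matrices e(L) (unrotated loadings) and e(Psi) (uniqueness), and recomputes uniqueness as 1 minus the sum of squared loadings across the retained factors.

```stata
use StataData2.dta, clear
factor imp_ideas-imp_trad, pcf

* Stored results: loadings and uniqueness values
matrix list e(L)
matrix list e(Psi)

* Uniqueness recomputed by hand: 1 - (loading on Factor1)^2 - (loading on Factor2)^2
mata: 1 :- rowsum(st_matrix("e(L)"):^2)
```

An item with low loadings on both factors therefore ends up with a high uniqueness value, which is why the two criteria tend to point in the same direction.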
Function step 2
| Basic command |
| rotate |
| Useful options |
| rotate, quartimax |
| Explanations | |
| quartimax | Orthogonal rotation with the quartimax option. |
| equamax | Orthogonal rotation with the equamax option. |
| promax(number) | Oblique rotation with the promax option; replace “number” with the preferred power (default is 3). |
| oblimin(number) | Rotation with the oblimin option; replace “number” with the preferred gamma (default is 0). Add the option oblique to obtain an oblique rotation. |
| Note | Orthogonal rotation with the varimax option is the default. To clear the results from rotation, use: rotate, clear |
| More information | help rotate |
The next step is to rotate the results to minimize the complexity of the factor structure and facilitate interpretation. Since it is unlikely that our factors are uncorrelated (they seldom are, in the social sciences), we will go with an oblique rotation (more specifically, we try out promax).
Practical example
| Dataset |
| StataData2.dta |
| rotate, promax |

The rotation made the factor loadings more clearly reflect the two factors.
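Since the oblique rotation was chosen on the assumption that the factors are correlated, it may be worth checking that assumption. A minimal sketch, assuming the postestimation command estat common, which displays the correlation matrix of the common factors:

```stata
use StataData2.dta, clear
factor imp_ideas-imp_trad, pcf
rotate, promax

* Correlation between the rotated factors
estat common
```

If the correlation between the factors turns out to be close to zero, an orthogonal rotation such as the default varimax would probably give a very similar picture.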
If we identify on which factor each item has the higher loading, we can conclude that the two factors contain the following items:
Factor 1
- Important living in secure surroundings (security)
- Important to help people (benevolence)
- Important to always behave properly (conformity)
- Important looking after the environment (universalism)
- Important with tradition (tradition)
Factor 2
- Important to think up new ideas (self-direction)
- Important to be rich (power)
- Important to have a good time (hedonism)
- Important to be successful (achievement)
- Important with adventure and taking risks (stimulation)
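Reading off the item assignment can be made easier by hiding small loadings in the output. The sketch below assumes the display option blanks(); the cut-off of 0.4 is an arbitrary value chosen for illustration and not a recommendation from this guide.

```stata
use StataData2.dta, clear
factor imp_ideas-imp_trad, pcf

* Show loadings as blank when their absolute value is below 0.4 (illustrative cut-off)
rotate, promax blanks(0.4)
```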
The ten variables used in this factor analysis actually stem from a theory of human values, developed by Schwartz. According to this theory, the variables should be categorised in the following way:
- Conservation: security, tradition, and conformity
- Openness to change: self-direction, stimulation, and hedonism
- Self-enhancement: power and achievement
- Self-transcendence: benevolence and universalism
If we compare the theoretical categories with the factors derived from the factor analysis, we see that Factor 1 includes all variables theoretically associated with conservation and self-transcendence, whereas Factor 2 includes all variables theoretically associated with openness to change and self-enhancement.
What do we do with this information then? Well, we need to examine possible reasons as to why the factor analysis did not reveal the same factors as the theory proposes. If we find no apparent problems with the empirics (e.g. missing data, problems with the questionnaire itself, etc.) we may suggest that the theory needs to be modified. At least it is important to discuss the differences between the theory and the empirics.
Sometimes, we do not have a clear theory guiding the factor analysis and, thus, we have no a priori understanding of which factors are reasonable to expect. In that case, it is common practice to focus on a factor solution with good properties (i.e. clear factor structure and high factor loadings). There is always a trade-off between theory and empirics: if theory has precedence over empirics, we may be more disposed to accept lower factor loadings.
In practice, all of this might mean that we go on to create two indices (e.g. sum score, or mean score), with each reflecting one factor, which we can then include in another analysis (such as regression analysis).
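A minimal sketch of how such indices could be created, based on the item assignment listed above. The variable names index_f1 and index_f2 are hypothetical, and mean scores are only one of several reasonable choices.

```stata
use StataData2.dta, clear

* Mean score per factor; rowmean() averages over the non-missing items
egen index_f1 = rowmean(imp_secure imp_help imp_behave imp_environ imp_trad)
egen index_f2 = rowmean(imp_ideas imp_rich imp_good imp_success imp_risk)

* A sum score would use rowtotal() instead; note that it treats missing values as 0
summarize index_f1 index_f2
```

The resulting indices can then be entered as ordinary variables in, for example, a regression analysis.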
Function step 3
| Basic command |
| estat kmo |
| screeplot |
| Explanations | |
| kmo | Kaiser-Meyer-Olkin measure of sampling adequacy. |
| screeplot | Plot eigenvalues. |
| More information | help estat factor |
The third step is to run some postestimation commands, such as the Kaiser-Meyer-Olkin measure of sampling adequacy and a screeplot, to see if our two-factor solution makes sense.
Note that if we find any problems here with our factor analysis or the chosen number of factors, we should go back and make some adjustments in order to find a better solution. For instance, we can try out different estimation methods, rotate the solution differently, or remove one or several of the items.
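The adjustments mentioned above could look roughly as follows. This is a sketch under assumptions: the option factors(2) and the choice to drop imp_trad are purely illustrative, and nothing in the results so far singles out that item.

```stata
use StataData2.dta, clear

* Different estimation methods
factor imp_ideas-imp_trad, ipf
factor imp_ideas-imp_trad, ml factors(2)

* A different rotation of the most recent solution
rotate, oblimin oblique

* Re-running the analysis without one item (imp_trad chosen arbitrarily)
factor imp_ideas imp_rich imp_secure imp_good imp_help imp_success imp_risk imp_behave imp_environ, pcf
rotate, promax
```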
Practical example
| Dataset |
| StataData2.dta |
| estat kmo |

The KMO measure produces an overall value of 0.7918, which suggests that the data are suited to factor analysis (values below 0.5 are conventionally regarded as unacceptable).
| screeplot |

In the screeplot, we can see that the “elbow” begins with the third factor, thus reflecting that a two-factor solution seems feasible.
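To relate the screeplot to the eigenvalue criterion used in step 1, a reference line can be added at 1. A minimal sketch, assuming that screeplot accepts the standard graph option yline():

```stata
use StataData2.dta, clear
factor imp_ideas-imp_trad, pcf

* Scree plot with a horizontal reference line at eigenvalue = 1
screeplot, yline(1)
```

Factors plotted above the line are those with eigenvalues above 1, which in this example should be Factor1 and Factor2.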