A GUIDE TO APPLIED STATISTICS WITH STATA
Latent class analysis
Coming soon!
Welcome!
Contributions
Contents
Versions, datasets, and citations
Advice
PART I: THE BASIC STUFF
The Stata environment
File types
Dataset
Do-file
Log
Graph
Package
Creating a new dataset
From questionnaire to dataset
Variable structure
Manage variables
Coding the questionnaires
Adjusting an existing dataset
Review dataset
Convert variables
Rename variables
Delete variables
Sort dataset
Create an id number variable
Order variables
Generate
Copy of an existing variable
New variable with a specific value
New variable based on an expression
Rounding
Logarithmic transformation
Substring
Date variables
Egen
Standardization: z-scores
Recode
Recode numeric variables
Recode string variables
Condition the data with if
Descriptive statistics with if
Recode with if
By
Combining datasets
Merge
Append
Basic statistical concepts
Study design
Experimental design
Observational design
A comparison between study designs
Population and sampling
Population
Sampling
Missing data: attrition and non-response
Measurement scales
Types of scales
Differences between scales
Types of values
Distributions
Probability distributions
Empirical distributions
Descriptive analysis
Introduction
Frequency table
Bar chart
Pie chart
Histogram
Measures of central tendency and variation
Central tendency
Variation
Summarize
Tabstat
Epidemiological measures
Ratios, proportions, and rates
Morbidity
Mortality
Natality
Risks and odds
Attributable proportion
Designing descriptive tables and figures
Tables
Figures
Statistical significance
Hypothesis testing
Hypotheses
Outcomes
Errors
Statistical hypothesis testing
P-values
Significance levels and confidence levels
Practical importance
Confidence intervals
The “unknown population parameter”
Limits and levels
Confidence and precision
Choice between p-values and confidence intervals
Calculate confidence intervals for descriptive statistics
Confidence intervals for means
Confidence intervals for median
Confidence intervals for variances and standard deviations
Confidence intervals for counts
Confidence intervals for proportions
Power analysis
Compare groups
Descriptives
Box plot
Crosstable
T-test: Independent samples
Non-parametric alternative: Mann-Whitney u-test
T-test: Paired samples
Non-parametric alternative: Wilcoxon signed rank test
One-way ANOVA
Non-parametric alternative: Kruskal-Wallis ANOVA
Chi-square test
Correlation analysis
Descriptives
Scatterplot
Correlation analysis
Non-parametric alternatives: Spearman’s rank correlation and Kendall’s rank correlation
PART II: REGRESSION ANALYSIS
X, y, and z
Introduction
X and y
Z: confounding, mediating and moderating variables
Confounding variables
Mediating variables
Moderating (or effect modifying) variables
A note on causal inference
(M)AN(C)OVA
ANCOVA
MANOVA
MANCOVA
Preparations for regression analysis
What type of regression should be used?
Dummies
Dummy variables
Factor variables
A note on the choice of reference category
Analytical strategy
Missing data
How to deal with missing data?
From study sample to analytical sample
The “pop” variable
Imputation
Linear regression
Introduction
Linear regression in short
Function
Simple linear regression
Simple linear regression with a continuous x
Simple linear regression with a binary x
Simple linear regression with a categorical (non-binary) x
Multiple linear regression
Model diagnostics
Link test
Residual plot
Breusch-Pagan/Cook-Weisberg test
Density plot, normal probability plot, and normal quantile plot
Variance inflation factor and correlation matrix
Logistic regression
Introduction
Logistic regression in short
Function
Simple logistic regression
Simple logistic regression with a continuous x
Simple logistic regression with a binary x
Simple logistic regression with a categorical (non-binary) x
Multiple logistic regression
Model diagnostics
Link test
Box-Tidwell and exponential regression models
Deviance and leverage
Correlation matrix
The Hosmer and Lemeshow test
ROC curve
Linear probability modelling
Ordinal regression
Introduction
Ordinal regression in short
Function
Simple ordinal regression
Simple ordinal regression with a continuous x
Simple ordinal regression with a binary x
Simple ordinal regression with a categorical (non-binary) x
Multiple ordinal regression
Model diagnostics
Link test
Correlation matrix
Brant test
Multinomial regression
Introduction
Multinomial regression in short
Function
Simple multinomial regression
Simple multinomial regression with a continuous x
Simple multinomial regression with a binary x
Simple multinomial regression with a categorical (non-binary) x
Multiple multinomial regression
Alternative base outcomes
Model diagnostics
Assess model fit
Correlation matrix
Poisson regression
Introduction
Poisson regression in short
Function
Simple Poisson regression
Simple Poisson regression with a continuous x
Simple Poisson regression with a binary x
Simple Poisson regression with a categorical (non-binary) x
Multiple Poisson regression
Model diagnostics
Link test
Correlation matrix
Deviance goodness-of-fit test and Pearson goodness-of-fit test
Alternatives to Poisson regression
Negative binomial regression model
Zero-inflated Poisson regression
Compare fit of alternative count models
Hurdle regression
Cox regression
Introduction
Observational time and censoring
Survival function
Hazard function
Tied failure times
Non-parametric, parametric, and semi-parametric models
The Cox regression model
Cox regression in short
Declare that the data are time-to-event data
Descriptive analysis
Kaplan-Meier curves
Nelson-Aalen cumulative hazard function
Function
Simple Cox regression
Simple Cox regression with a continuous x
Simple Cox regression with a binary x
Simple Cox regression with a categorical (non-binary) x
Multiple Cox regression
Model diagnostics
Link test
Correlation matrix
Log-log plot of survival
Kaplan-Meier and predicted survival plot
Schoenfeld residuals
Tied failure times – cox
Laplace regression
Mediation analysis
Introduction
Type of regression analysis
Rescaling bias
Function
Practical example with logistic regression
Practical example with ordinal regression
Interaction analysis
Introduction
Type of regression analysis
Primary approaches to interaction analysis
Two ways of generating the interaction term
Interpretation
Approach A
Practical example with linear regression
Practical example with logistic regression
Approach B
Practical example with logistic regression
Practical example with Cox regression
PART III: TAKING IT ONE STEP FURTHER
Factor analysis
Introduction
Assumptions
Number of factors
Factor loadings
Rotation
Postestimation
Factor analysis vs principal component analysis
A practical example
Cronbach’s alpha
Latent class analysis
Structural equation modelling
Group-based trajectory modelling
Sequence analysis
Time-series analysis
Difference-in-differences
PART IV: TEST YOUR SKILLS
Data management and description
Stata and basic concepts
Descriptive analysis
Basic statistical analysis
Statistical significance
Differences and associations
Statistical data modelling
Linear regression
Logistic regression
PART V: FROM START TO FINISH
Practical example with linear regression
Aim and research questions
Data and methods
Data material
Variables
Statistical analysis
Simple linear regression analyses
Multiple linear regression analysis
Interaction analysis
Model diagnostics
Results
Discussion
Practical example with logistic regression
Aim and research question
Data and methods
Variables
Statistical analysis
Simple logistic regression
Multiple logistic regression
Interaction analysis
Model diagnostics
Results
Discussion
Practical example with Cox regression
Aim and research questions
Data and methods
Variables
Descriptive analysis
Statistical analysis
Simple and multiple Cox regression
Interaction analysis
Model diagnostics
Results
Discussion