T-test: Independent samples

Quick facts

Number of variables
One group variable 
One test variable

Scales of variable(s)
Group variable: categorical with two values (binary) 
Test variable: continuous

Introduction

The independent samples t-test is a parametric method for comparing the mean of one variable between two (unrelated) groups.

Example
Mean income among men vs. mean income among women
Let us assume that you want to see if the income of teachers differs between men and women. The independent samples t-test can be used to compare the mean income between the two groups.
Note
The independent samples t-test is sometimes referred to as the two-sample t-test, independent t-test, or Student’s t-test.
T-distribution

The independent samples t-test assumes a t-distribution (see T-distribution).

T-statistic

A t-test will produce a t-statistic (t). This is a standardised value that is calculated based on the study sample we have.

The t-distribution is based on the assumption that the null hypothesis is true (see Hypotheses). A t-statistic that is 0 means that the result from the t-test exactly reflects the null hypothesis (i.e., there is no difference between the groups). The further the t-value is from 0, in either the positive or the negative direction, the further we get from the null hypothesis.

A t-statistic does not mean much in itself. It is difficult to directly assess whether it is high or not.
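
For reference, the t-statistic can be written out in its standard equal-variance (pooled) form, which is what `ttest` with `by()` reports unless other options are given. The symbols are notation introduced for this sketch and do not appear in the original text: \bar{x}_1 and \bar{x}_2 are the two group means, s_1^2 and s_2^2 the group variances, and n_1 and n_2 the group sizes.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

The numerator is the difference between the group means, so t is 0 when the means are equal. The denominator is the standard error of that difference, which is why t is a standardised value.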

Degrees of freedom

Degrees of freedom is a rather tricky concept to make sense of. Applied to the independent samples t-test, the degrees of freedom equal the total number of observations (i.e., individuals) in the two groups minus 2 (n-2), since one mean is estimated for each group.
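
As a minimal sketch, the degrees of freedom can be checked by hand in Stata. For the two-group test run with `by()`, one degree of freedom is used up per group mean, so the count comes out as the total number of observations minus 2. The example assumes Stata's built-in `auto` dataset (not the dataset used on this page), with `mpg` as the test variable and `foreign` as the group variable; the scalar names `n1`, `n2` and `df_hand` are made up for illustration.

```stata
* Built-in example data: mpg is the test variable, foreign the group variable
sysuse auto, clear

* Count the non-missing observations in each group
quietly count if foreign == 0 & !missing(mpg)
scalar n1 = r(N)
quietly count if foreign == 1 & !missing(mpg)
scalar n2 = r(N)

* Degrees of freedom computed by hand: total observations minus 2
scalar df_hand = n1 + n2 - 2
display "Computed by hand:   " df_hand

* Degrees of freedom stored by ttest after it has run
quietly ttest mpg, by(foreign)
display "Reported by ttest:  " r(df_t)
```

The two displayed numbers should agree.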

P-value

When performing a t-test, we want to examine whether there is sufficient evidence to reject the null hypothesis (which stipulates that there is no difference between the groups).

The further the t-statistic is from 0, the lower the two-sided p-value. A p-value that is lower than 0.05 means that we can reject the null hypothesis at the 5 percent significance level.

In Stata, there are three p-values that are reported when we perform a t-test:

Ha: diff != 0 (two-sided t-test)
Ha: diff < 0 (one-sided t-test, left tail)
Ha: diff > 0 (one-sided t-test, right tail)
Note
Generally, we focus on the two-sided t-test since we make no assumption of the direction of the association/relationship, i.e., we do not specifically assume that Group A has either a lower or a higher mean value than Group B.
Here, diff is the mean of Group A minus the mean of Group B. For the one-sided (left tail) t-test, we assume that Group A has a lower mean value than Group B. For the one-sided (right tail) t-test, we assume that Group A has a higher mean value than Group B.
For statistical reasons, the one-sided t-tests increase the power to obtain p-values below 0.05, but we also risk missing associations that go in the opposite direction.
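
The link between the t-statistic, the degrees of freedom and the three p-values can be sketched with Stata's `ttail()` function, which returns the upper-tail probability of the t-distribution. The t-statistic (4.5949) and degrees of freedom (8877) are taken from the practical example on this page; the scalar names are made up for illustration.

```stata
* t-statistic and degrees of freedom from the practical example on this page
scalar tstat = 4.5949
scalar dfree = 8877

* ttail(df, t) returns the probability of a value above t
* Ha: diff > 0 (right tail)
scalar p_right = ttail(dfree, tstat)
* Ha: diff < 0 (left tail)
scalar p_left  = 1 - ttail(dfree, tstat)
* Ha: diff != 0 (both tails, based on the absolute value of t)
scalar p_two   = 2 * ttail(dfree, abs(tstat))

display "Ha: diff < 0   p = " %6.4f p_left
display "Ha: diff != 0  p = " %6.4f p_two
display "Ha: diff > 0   p = " %6.4f p_right
```

The left-tail and right-tail p-values always sum to 1, and the two-sided p-value is twice the smaller of the two.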

Assumptions

First, you have to check your data to see that the assumptions behind the independent samples t-test hold. If your data “passes” these assumptions, you will have a valid result. 

Below is a checklist for these assumptions. 

Continuous test variable: Your test variable should be continuous (i.e. interval/ratio). For example: income, height, weight, number of years of schooling, and so on. Although they are not really continuous, it is still very common to use ratings as continuous variables, such as: “How satisfied with your income are you?” (on a scale 1-10) or “To what extent do you agree with the previous statement?” (on a scale 1-5).
Normal distribution: The test variable should be approximately normally distributed. Use a histogram to check (see Histogram).
Two unrelated categories in the group variable: Your group variable should be categorical and consist of only two groups. Unrelated means that the two groups should be mutually exclusive: no individual can be in both groups. For example: men vs. women, employed vs. unemployed, low-income earner vs. high-income earner, and so on.
No outliers: An outlier is an extreme (low or high) value. For example, if most individuals have a test score between 40 and 60, but one individual has a score of 96 or another individual has a score of 1, this will distort the test.
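
The checklist above can be worked through with a few standard commands. This is only a sketch, assuming Stata's built-in `auto` dataset with `mpg` as the test variable and `foreign` as the group variable; substitute your own variables.

```stata
* Built-in example data: mpg is the test variable, foreign the group variable
sysuse auto, clear

* Group variable: confirm that there are exactly two categories
tabulate foreign

* Normal distribution: inspect the shape of the test variable in each group
histogram mpg, by(foreign)

* Outliers: percentiles plus the four smallest and four largest values
summarize mpg, detail

* Outliers: box plots by group show extreme values as separate points
graph box mpg, over(foreign)
```

None of these commands gives a formal pass or fail. They are meant for visual and descriptive inspection before running the test.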

Functions

Basic command
ttest testvar, by(groupvar)
Explanations
testvar: Insert the name of the variable that you want to test.
groupvar: Insert the variable defining the two groups.
More information
help ttest

Practical example

Dataset
StataData1.dta
Variable name: cognitive
Variable label: Cognitive test score (Age 15, Year 1985)
Value labels: N/A
Variable name: sex
Variable label: Sex
Value labels: 0=Man, 1=Woman
ttest cognitive, by(sex)

To start with, the overall mean is 308.4708. As can be seen, men have a slightly higher mean value compared to women (311.943 vs. 304.9106; a difference of 7.032464).

We can note that the t-statistic (t) in this example is 4.5949, with 8877 degrees of freedom.

The corresponding p-value is reported as 0.0000 (look below “Ha: diff != 0”), which means that it is smaller than 0.00005 when rounded to four decimals. This is below 0.05, which allows us to reject the null hypothesis (which postulates that there is no mean difference between the two groups).

In other words, there is a statistically significant difference in mean cognitive test scores between men and women in this example, to the advantage of men.
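
The numbers read off the output above (group means, difference, t-statistic, degrees of freedom and p-value) are also stored by `ttest` and can be retrieved afterwards. The sketch below assumes Stata's built-in `auto` dataset so that it runs without StataData1.dta; with that dataset loaded, the command would instead be `ttest cognitive, by(sex)`. The scalar name `mean_diff` is made up for illustration.

```stata
* Built-in example data: mpg is the test variable, foreign the group variable
sysuse auto, clear
quietly ttest mpg, by(foreign)

* Difference in means: first group minus second group
scalar mean_diff = r(mu_1) - r(mu_2)

display "Mean, first group:   " r(mu_1)
display "Mean, second group:  " r(mu_2)
display "Difference in means: " mean_diff
display "t-statistic:         " r(t)
display "Degrees of freedom:  " r(df_t)
display "Two-sided p-value:   " r(p)
```

Typing `return list` directly after `ttest` shows everything that has been stored, including the two one-sided p-values.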