Merge

Written by:

Ylva B Almquist

Sometimes, it is necessary to combine two or more datasets. That is quite common for those of us working with register data, where different variables are kept in different files. The merge command is used for this purpose.

More information
help merge

For merge to work, the datasets must share one or more variables to merge on (the key variables). Most of the time, you have two datasets that contain the same number of individuals, who are identified through an id variable.

Open the dataset that you want to merge something to, with the following command:

use "path\filename.dta"

Change “path\filename” to the full path (i.e. the folder on your computer that contains the file), and specify the file name, such as:

use "C:\Users\yerik\Stata Guide\TestDataMA.dta"

Then you can use the following command:

merge 1:1 varlist using "path\filename.dta"

Note
1:1 means that you do a one-to-one merge on specified key variables. For varlist, you specify the variable(s) that you want to merge on.

Change “path\filename” to the full path and file name of the dataset that you want to merge in (called the “using” dataset). The dataset that you already have open is called the “master” dataset. For example:

merge 1:1 id using "C:\Users\yerik\Stata Guide\TestDataMB.dta"
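Since a 1:1 merge requires the key variable to identify each observation uniquely in both datasets, it can be useful to check this first. The following is a minimal sketch using the example files above, assuming that id is meant to be unique in both of them. The isid command stops with an error if it is not.

```stata
* Check that id uniquely identifies observations in the using dataset
use "C:\Users\yerik\Stata Guide\TestDataMB.dta", clear
isid id

* Do the same for the master dataset, then merge
use "C:\Users\yerik\Stata Guide\TestDataMA.dta", clear
isid id
merge 1:1 id using "C:\Users\yerik\Stata Guide\TestDataMB.dta"
```

Note that the clear option discards any unsaved data in memory, so save your work before running these lines.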

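This adds a variable called _merge to the merged dataset in memory. It shows, for each observation, whether it was found in the master dataset only (1), in the using dataset only (2), or in both (3). We also get a frequency table of this variable in the Results window. In our example, the table shows that all 10 observations in the two datasets have been matched successfully.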
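After a merge, you will often want to inspect _merge and then remove it. The lines below are one possible way to do this. Keeping only the matched observations is an analytical choice made here for illustration, not something that merge requires.

```stata
* Tabulate the merge result: 1 = master only, 2 = using only, 3 = matched
tab _merge

* Optional: keep only the observations found in both datasets
keep if _merge == 3

* Drop _merge so that a later merge can create it again
drop _merge
```

Dropping _merge matters if you plan to merge in yet another dataset, since merge stops with an error when a variable called _merge already exists.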

Note
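Merging datasets can, of course, be a bit more complicated than this. If the two datasets contain different numbers of individuals, 1:1 still works as long as the key variable is unique in both: unmatched observations are kept and flagged in _merge. If the key variable(s) do not uniquely identify the observations in one of the datasets, you need to use m:1 or 1:m instead of 1:1 (m=many). There is also m:m, but the Stata documentation advises against using it.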
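A many-to-one merge could be sketched as follows. The file names and the variable municipality are hypothetical and only serve as an illustration: the master dataset is assumed to contain several individuals per municipality, while the using dataset is assumed to contain one row per municipality.

```stata
* Hypothetical master dataset: many individuals per municipality
use "path\individuals.dta", clear

* municipality is unique in the using dataset, but not in the master dataset
merge m:1 municipality using "path\municipalities.dta"

* Check how many observations were matched
tab _merge
```

If the roles of the two files were reversed, with the municipality file open as the master dataset, the same merge would be written with 1:m instead.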