SIPP Extraction Programs

The zip files below, organized by panel, include all the programs needed to read each entire SIPP panel (Core, Topical, and Longitudinal files) into Stata for the 1990 through 2008 SIPP panels. The programs also generate a unique id variable for SIPP respondents by concatenating the person id, the household entry number, and the person number. This id facilitates merging data from the Core, Topical, and Longitudinal files.

Each panel has its own extraction program, which reads the raw data, assigns the appropriate variable names and labels, reshapes all data into “person-month” format, and saves the dataset as a Stata file. The program does this for all waves of the panel, including Core, Topical, and Longitudinal files. Before running the programs, make sure you have enough free space on your hard drive to hold a full second copy of the data.
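The disk-space check mentioned above can be done by hand, but a short script makes it repeatable. The following is a minimal sketch in Python (not part of the CEPR programs, which are Stata do-files). It assumes the extracted files need roughly as much space as the raw data already occupies; the function names are hypothetical.

```python
import shutil
from pathlib import Path


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def has_room_for_copy(raw_dir, target_dir):
    """Return True if the drive holding target_dir has at least as much
    free space as the files under raw_dir currently occupy.

    Assumption: the extracted Stata files are about the size of the raw data.
    """
    needed = directory_size(raw_dir)
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

Call has_room_for_copy with the raw data directory and the directory where the Stata files will be saved. If it returns False, free up space or extract one panel at a time.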

Step 1

Download the extraction programs listed below and place them in the appropriate directories.

Directions for running the extracts for each panel are embedded in the first lines of the program. To use the programs, you need to be familiar with the directory structure they require, which is defined in a series of macros at the top of the program (where “01” is any two-digit SIPP panel year). Download the CEPR SIPP Uniform Extract programs below and unzip them into the directory indicated by the macros. For example, if the macros state that the program files should be placed in


then be sure to place the unzipped programs in that directory. You can of course change the names of the directories pointed to by the macros, but don’t change the actual macro names.

The extraction program creates datasets in what is termed person-month, or long, format: there is one observation per person per month. So, in the 1996 panel, which has 48 months of data, there are 48 observations per person. This might be confusing to those with previous experience with the SIPP, because we have reshaped the longitudinal data into person-month (long) form for ease of use. This way, the data produced by the cleaning programs can be merged more easily.
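To make the person-month layout concrete, here is a minimal sketch in Python of reshaping a wide record (one row per person, one column per month) into long form. It only illustrates the idea and is not the CEPR Stata code. The variable stub "earn" and the function name are hypothetical.

```python
def to_person_month(wide_rows, n_months, stub="earn"):
    """Turn one-row-per-person records into one-row-per-person-per-month.

    Each wide row is expected to hold an "id" plus columns named
    stub1, stub2, ..., stub<n_months> (for example earn1 ... earn48).
    """
    long_rows = []
    for row in wide_rows:
        for month in range(1, n_months + 1):
            long_rows.append(
                {"id": row["id"], "month": month, stub: row[f"{stub}{month}"]}
            )
    return long_rows


# Two hypothetical respondents with 48 months of data each.
people = [
    {"id": pid, **{f"earn{m}": 100 * m for m in range(1, 49)}}
    for pid in ("A", "B")
]
rows = to_person_month(people, n_months=48)
print(len(rows))  # 96, that is, 48 observations for each of the 2 people
```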

The extract programs make a few changes to the raw SIPP data to facilitate merges. For all panels, we create a unique id variable, stored as a string, from the person id, entry id, and person number variables. We also rename the reference month variable in the 1992 and 1993 panels from refmth to srefmon, the name used in the 1996 and later panels. With these changes, the variables id and srefmon, along with wave, can be used to match individuals across the longitudinal, core, and topical module data files. The crosswalk [xls format] identifies other changes.
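The logic of the id variable and the merge keys can be sketched as follows. This is an illustrative Python example, not the CEPR Stata code. Only the names id, wave, refmth, and srefmon come from the text above. The function names, the sample values, and the "earn" and "assets" fields are hypothetical, and the identifiers are assumed to be stored as strings already.

```python
def make_id(person_id, entry_id, person_number):
    """Concatenate the three identifiers into one string id."""
    return f"{person_id}{entry_id}{person_number}"


def harmonize_month(row):
    """Return a copy of row with refmth renamed to srefmon, if present."""
    row = dict(row)
    if "refmth" in row:
        row["srefmon"] = row.pop("refmth")
    return row


def merge_on_keys(core_rows, topical_rows, keys=("id", "wave", "srefmon")):
    """Left-join topical fields onto core rows that share the same key values."""
    lookup = {tuple(r[k] for k in keys): r for r in topical_rows}
    merged = []
    for row in core_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        merged.append({**match, **row})  # core values win on any name clash
    return merged


# Hypothetical records: one core row using the older refmth name, one topical row.
core = [harmonize_month(
    {"id": make_id("1001", "11", "0101"), "wave": 1, "refmth": 4, "earn": 500}
)]
topical = [{"id": "1001110101", "wave": 1, "srefmon": 4, "assets": 9000}]
merged = merge_on_keys(core, topical)
```

Because the id is a string, the key comparison is an exact text match, so both files must build the id from the same components in the same order.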

Step 2

Download and review the documentation.

We highly recommend that users download the SIPP raw data codebooks either from the NBER or directly from the U.S. Census. CEPR’s Uniform Data Files come with codebooks, but please refer directly to the SIPP data dictionaries for more complete information on any given variable.

We also highly recommend that users read the SIPP User’s Guide 2001 [pdf format]. There are a number of places where CEPR’s programs refer to information in this guide that will be helpful for the user.

Step 3

Download and decompress the raw SIPP data files into the appropriate directory.

The compressed data files are available from NBER. Note that the files are quite large and users may need to unzip and work with the panels one at a time.

Ensure that the unzipped NBER data is placed in the required directory. For example, if the file macros indicate that the NBER raw data should be placed in


then be sure to place the raw data in that directory, or change the macro directory structure as appropriate.
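Working through the panels one at a time, as suggested in this step, can be scripted. The sketch below is in Python and assumes the downloaded archives are in zip format, which may not hold for every file. The function name is hypothetical, and the target directory should be whatever the macros in the extraction program specify.

```python
import zipfile
from pathlib import Path


def extract_panel(archive_path, data_dir):
    """Extract one zip archive into data_dir and return the file names it held.

    Assumption: the archive is a standard zip file. data_dir is created
    if it does not exist yet.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(data_dir)
        return archive.namelist()
```

Calling extract_panel once per archive, running the extraction program for that panel, and then removing the raw files before moving on keeps disk use to one panel at a time.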

CEPR SIPP Extraction Programs
Panel Do and Dictionary Files