SIPP Programs

The programs were developed in Stata version 8/9 and will run on Windows, Linux, Unix, and Mac machines.

These programs are the main source of information on our extract. The Stata code shows the changes we have made to the original raw SIPP variables in order to create our extract.


Unfortunately, the two researchers who created the SIPP extracts for CEPR no longer work here, and none of the current staff has experience with the SIPP. For the time being, we will not be updating our SIPP extracts and are unable to field questions about the SIPP. Please visit the SIPP page at the National Bureau of Economic Research to find extraction programs and further information on the SIPP. The SIPP page at the Census Bureau should also be useful.


One of the clear advantages of the CEPR SIPP Uniform Extracts is that researchers have to spend less time focusing on the way that Census prepares the data, and can spend more time on their analysis. However, there are a number of issues that researchers must be clear on before they begin to use the SIPP. An excellent resource is the Census Bureau’s SIPP User Guide, which provides details on the SIPP.

Data Structure

The SIPP is structured so that one fourth of the sample is interviewed every month; each four-month interval in which all sample members are interviewed is termed a *wave*. Each wave includes four months of data for each respondent, although not all questions are asked by month: some are asked only “once per wave.” During each wave, respondents are asked a set of core questions, which cover labor market participation, wages, and participation in income support programs, along with questions from topical modules that change each wave. The first topical module, for example, covers employment and welfare history, with questions that allow identification of prior welfare use as well as labor market experience before the panel. Other modules focus on childcare, assets, training history, etc. You can see a full list of topical module topics, and the wave in which each was asked, at the Census Bureau. Typically, topical module questions are asked once per wave; child care usage, for example, applies to the time of the interview, not to each month during the wave.

For each panel, Census typically releases three kinds of data: a Core and a Topical file for each wave, and either a Longitudinal file for the entire panel or a series of Longitudinal files by wave. The typical researcher will need variables from all three kinds of files, which requires intensive learning about the structure of each and how to merge variables across them. For the pre-1996 panels, Census provided longitudinal files that cleaned some, but not all, of the core data for longitudinal consistency. For example, they made sure that a respondent’s age did not jump around, but was consistent across the panel. However, this means that a researcher can find an age variable in both the longitudinal and the core data. The CEPR SIPP Uniform Extracts always use the longitudinal variable, if available.
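
The rule of preferring the longitudinal variable can be sketched in Stata as follows. This is only an illustration on made-up data, not the code in CEPR's programs: the variable names (id, month, age_core, age_long, age) and the values are hypothetical, and the `merge 1:1` syntax assumes Stata 11 or later rather than the version 8/9 the original programs were written in.

```stata
* Hypothetical core data: person 1's age "jumps" in month 2
clear
input id month age_core
1 1 34
1 2 43
1 3 34
2 1 51
2 2 51
2 3 52
end
tempfile core
save `core'

* Hypothetical longitudinal data: edited ages, available for person 1 only
clear
input id month age_long
1 1 34
1 2 34
1 3 34
end
tempfile longit
save `longit'

* Attach the longitudinal values to the core person-month records
use `core', clear
merge 1:1 id month using `longit', keep(master match) nogenerate

* Use the longitudinal value when it exists; otherwise fall back to core
generate age = age_long
replace age = age_core if missing(age_long)

list id month age_core age_long age, sepby(id)
```

Here person 1 ends up with the edited age in every month, while person 2, who has no longitudinal record in this toy example, keeps the core values.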

The CEPR SIPP Uniform Data Extracts pull variables from the SIPP Core, Topical, and Longitudinal files. The data are converted into “person-month” format (one observation per person per month in the panel) for ease of programming. A researcher who wants to convert the data back to a longitudinal structure (one observation per person) can easily do so with the Stata “reshape” command. When we pull variables from the SIPP files, we also create a unique identification variable for each SIPP respondent, as outlined in Census’s technical documentation for the SIPP.
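
As a minimal sketch of the conversion between the two layouts, the example below builds a tiny person-month dataset and reshapes it to one observation per person and back. The variable names (id, month, earn) and values are invented for illustration and do not come from the extracts.

```stata
* Toy person-month data: two people, three months each
clear
input id month earn
1 1 1200
1 2 1250
1 3 1300
2 1 800
2 2 0
2 3 950
end

* Confirm the person-month structure: one row per id-month pair
isid id month

* Longitudinal (wide) layout: one row per person, with earn1 earn2 earn3
reshape wide earn, i(id) j(month)
list

* Back to person-month (long) layout
reshape long earn, i(id) j(month)
list, sepby(id)
```

With real data, every variable that changes from month to month has to be named in the `reshape wide` variable list; variables not listed must be constant within person, or Stata will stop with an error.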

Generating Your Own Data

You can always download our CEPR SIPP Uniform Extracts for free. To generate your own CEPR SIPP Uniform Extracts from scratch, proceed in two stages: extract the necessary variables from the Census SIPP files, then use our Stata programs to recode them.


CEPR provides a set of extraction programs to facilitate importing the data from the raw (ASCII) files into Stata format. Raw SIPP data can be downloaded from either the National Bureau of Economic Research or the Census Bureau. Researchers can then run CEPR’s extraction programs for the panel that they are working with. These are based on the programs written by the National Bureau of Economic Research, but we made a number of important changes. First, we reshape all files into person-month format, including the longitudinal files. Second, we generate the unique ID variables (from the sample unit identifier, entry ID, and person number). This way, once the data are in Stata, it is very easy to merge Core and Topical module waves together, or to merge a number of waves to form a panel.
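
To illustrate why a single person identifier makes merging easy, here is a rough sketch on invented data. It assumes the three components are stored as strings under the names ssuid, eentaid, and epppnum (the naming used in the 1996 and later panels, as far as we know; earlier panels use different names), and it builds the ID by simple concatenation. The combined variable name (id), the topical variable (kidcare), and all values are hypothetical, and CEPR's extraction programs may construct the ID differently. The `merge m:1` syntax assumes Stata 11 or later.

```stata
* Toy core file in person-month format: two people in the same sample unit
clear
input str12 ssuid str3 eentaid str4 epppnum wave month earn
"000000000101" "011" "0101" 1 1 1200
"000000000101" "011" "0101" 1 2 1250
"000000000101" "011" "0102" 1 1 800
"000000000101" "011" "0102" 1 2 0
end
* One person-level ID from sample unit, entry ID, and person number
generate str19 id = ssuid + eentaid + epppnum
isid id wave month
tempfile core
save `core'

* Toy topical file: one record per person per wave
clear
input str12 ssuid str3 eentaid str4 epppnum wave kidcare
"000000000101" "011" "0101" 1 1
"000000000101" "011" "0102" 1 0
end
generate str19 id = ssuid + eentaid + epppnum
isid id wave
tempfile topical
save `topical'

* Spread each once-per-wave topical answer over that wave's person-months
use `core', clear
merge m:1 id wave using `topical', keepusing(kidcare) nogenerate
list id wave month earn kidcare, sepby(id)
```

Because the topical answer is recorded once per wave, the many-to-one merge repeats it on every month of that wave; it still describes the time of the interview, not each month.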


After extracting the raw SIPP data, the researcher runs CEPR’s Stata programs that recode SIPP variables to generate files with uniform variables across SIPP panels. CEPR’s technical documentation is in the associated codebooks and is also annotated within the programs themselves. Each Uniform Extract has user notes that explain the variables and compare them to other data sources, so that the researcher has a clear sense of potential problems.