SIPP Frequently Asked Questions

When will your SIPP extracts be updated?
Why do the variable names not match the SIPP documentation from the Census?
What are the hardware and software constraints?
How do I convert the unique string ID variable to a numeric one?
What are the experimental state weights?

Next Update

Unfortunately, the two researchers who created the SIPP extracts for CEPR no longer work here, and none of the current staff has experience with the SIPP. For the time being, we will not be updating our SIPP extracts and are unable to field questions about the SIPP. Please visit the SIPP page at the National Bureau of Economic Research to find extraction programs and further information on the SIPP. The SIPP page at the Census Bureau should also be useful.

Variable Names

Our Stata programs are the main source of information on our extracts. For the SIPP, the programs can be found here. The code shows the changes we made to the original raw SIPP variables in order to create our extracts.

Hardware and Software Constraints

Our programs were developed in Stata 8 and 9, and will run on Windows, Linux, Unix, and Mac machines.

The SIPP data is relatively large. For each panel, there is a Core and a Topical data file for each wave, ranging from 300MB to 500MB per file. From the 1996 panel onward, there is also a longitudinal data file for each wave. For the 1992 and 1993 panels, however, the longitudinal file is a single, larger data file. To give a sense of the space needed, the full 1996 panel, unzipped and in Stata format, is 14.7GB; in addition, our recoding programs will create about 4GB of Uniform Extracts for the 1996 panel, for a total of roughly 18.7GB. To use the programs to create the Uniform Extracts, it is best to run Intercooled Stata or Stata SE with at least 512MB of RAM.
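As a rough sketch of what working within these limits can look like in Stata 8 or 9, where memory has to be allocated before a dataset is opened: the file name, variable names, and memory value below are placeholders chosen for illustration, not values taken from our programs.

```stata
* Illustrative only: the file name, variable names, and memory value
* are placeholders and do not come from the CEPR programs.
clear

* Stata 8/9 require memory to be allocated before data are loaded.
* (Stata 12 and later manage memory automatically and ignore this setting.)
set memory 400m

* List the variables in a file without loading it into memory
describe using "sipp96_core_w1.dta"

* Load only the variables you need rather than the whole file
use id wave month using "sipp96_core_w1.dta", clear
```

Loading a subset of variables with "use varlist using" is one way to stay within a modest memory allocation when a single wave file runs to several hundred megabytes.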

If you are seeking the data in some form other than Stata .dta files, please contact us.

Destringing the Unique ID Variable

Our recode programs keep the unique ID variable as a string variable, the format in which the U.S. Census Bureau provides the data. However, certain procedures require a numeric ID variable. In Stata, the length of the string variable precludes simply “destringing” the ID. Instead, you can run the idx.do program to generate a unique numeric ID.
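For readers who want to see the general idea, the sketch below shows one common way to build a numeric ID from a long string ID in Stata, using the built-in egen group() function. It is not a copy of idx.do and may differ from what that program does; the variable names id and id_num are hypothetical, so substitute the names used in your extract.

```stata
* Sketch only, not the contents of idx.do.
* "id" is a hypothetical name for the string ID variable.

* Assign consecutive integers (1, 2, 3, ...) to each distinct value of id
egen long id_num = group(id)

* Check that the two variables identify exactly the same units
bysort id (id_num): assert id_num[1] == id_num[_N]
bysort id_num (id): assert id[1] == id[_N]
```

Note that group() numbers the IDs according to the data currently in memory, so the same person can receive a different number in a different file. If you need numeric IDs that match across waves or files, combine the files first and generate the ID once.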

Experimental State Weights

The program used to pull variables for Uniform Extract Set A can also pull weights to generate state-level estimates with the same degree of sampling accuracy as in the national sample. This section of the code is currently commented out. If you wish to use these weights, uncomment that section and make sure you have the experimental state weight data from the U.S. Census Bureau. Our user notes for Set A contain an analysis of these state weights and Census Bureau contact information for obtaining them.