About the SIPP
The SIPP is a multi-panel, nationally representative dataset created by the U.S. Census Bureau. A panel dataset is one that follows the same individuals over time. The first SIPP panel began in the mid-1980s and the latest one began in 2008. The SIPP tracks individuals for two to four years, depending on the panel. SIPP respondents are interviewed every fourth month about their experiences over the prior four months.
SIPP respondents are asked about their participation in income maintenance programs, such as welfare and unemployment compensation; their household and family composition; employment and earnings; access to services, including health insurance and child care; assets; and other topics. Some of these questions are asked only once or twice during the panel, but demographics, program participation, and employment-related information are asked about at every interview.
The SIPP is the preferred dataset for some specific questions:
- The SIPP contains much more information on individuals over time than commonly used datasets such as the Current Population Survey. For example, the topical modules provide information on the assets of individuals as well as their work schedules and child care usage.
- Usually, labor market data is “static” in that it covers only one point in time. For example, from the Current Population Survey, we can determine someone’s wages in March 2002. With the SIPP, however, we can follow an individual over time, so we know how much that person’s wages grew from March 2000 to March 2002, how long they have been at their job, and whether they had training earlier in the year. This enriches our analysis of what is happening in the labor market.
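To make the panel idea concrete, here is a minimal sketch in Python of computing wage growth for the same individuals between two dates. It is only an illustration: the records are made up, and the field names (person_id, year, month, wage) are hypothetical, not the variable names used in the SIPP or in the Extracts.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per person per month.
# Field names and values are illustrative only; they are not SIPP variables.
records = [
    {"person_id": 1, "year": 2000, "month": 3, "wage": 10.00},
    {"person_id": 1, "year": 2002, "month": 3, "wage": 12.00},
    {"person_id": 2, "year": 2000, "month": 3, "wage": 20.00},
    {"person_id": 2, "year": 2002, "month": 3, "wage": 21.00},
    {"person_id": 3, "year": 2002, "month": 3, "wage": 15.00},  # observed only once
]


def wage_growth(rows, start, end):
    """Return {person_id: percent wage growth} for people observed at both dates."""
    # Index each person's wages by (year, month).
    by_person = defaultdict(dict)
    for row in rows:
        by_person[row["person_id"]][(row["year"], row["month"])] = row["wage"]

    growth = {}
    for pid, wages in by_person.items():
        # Growth is only defined for people seen at both points in time.
        if start in wages and end in wages and wages[start] > 0:
            growth[pid] = 100.0 * (wages[end] - wages[start]) / wages[start]
    return growth


growth = wage_growth(records, start=(2000, 3), end=(2002, 3))
for pid, pct in sorted(growth.items()):
    print(f"person {pid}: {pct:.1f}% wage growth")
```

Person 3 appears at only one date, so no growth can be computed for them. That is exactly the limitation of a single cross-section, where every respondent looks like person 3.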
The Extracts are divided thematically, with each one covering a separate topic, to make them easier to use and to keep them to a manageable size. For example, Extract B: Demographics contains demographic information on all SIPP respondents. The program for each Extract runs for SIPP panels 1990 through 2008, across platforms (Unix, Linux, Mac, Windows), creating a set of variables that, to the extent possible, are consistent across panels. Inconsistencies are clearly noted in the codebooks. The codebooks also note which raw SIPP variables were used to generate each variable. A separate document, Crosswalk [xls format], shows how the raw SIPP variables are matched across panels.
For example, the education variable changes considerably between the 1993 and 1996 panels. We have generated a new education variable that is more consistent across panels. The User Notes show how this variable compares to other published education data. Further, since we provide all our Stata programs, other researchers can reject our “fix” and use their own, but at least they know of the problem, know what we think is the best solution, and have documentation on why we think so.
To use CEPR’s SIPP Uniform Extracts, you should first do a few things. Take a look at the documentation to get a sense of what variables we include and how the SIPP compares to other data sources. Then, take a look at the programs to learn how we construct our SIPP Uniform Extracts and how to make the most efficient use of them. Finally, take a look at our page of publications to see what is possible with the SIPP and what others have done. Please feel free to contact us at ceprdata [at] cepr [dot] net if you have questions or comments.
When using this data for analysis, please cite it as:
- Center for Economic and Policy Research. 2012. SIPP Uniform Extracts, Version 2.1.7. Washington, DC.