Variable Names

Our Stata programs are the major source for information on our extracts. For the CPS from 1994-on, the programs can be found here. Our Stata code show the changes we made to the original raw CPS variables in order to create our extract.


Preferred Wage Variable

The CEPR preferred wage variable is rw_ot when an analysis uses only data from 1994 to the present and rw when an analysis includes data before and after 1994. Both of these variables are converted to the most recent dollars using the CPI-U-RS.

For a full discussion of wage variables, see the 2003 paper by John Schmitt, Creating a Consistent Hourly Wage Series from the Current Population Survey’s Outgoing Rotation Group, 1979-2002.


Calculating Standard Errors

The CPS is not a random sample of U.S. households –it is a “multistage stratified sample.” As a result, the procedures that statistical packages usually use to calculate standard errors will produce estimates that are systematically lower than they should be. (See Davern, Jones, Lepkowski, Davidson, and Blewett, 2007, LINK: http://www.jstor.org.proxyau.wrlc.org/stable/29773307, and Ludington, 1992, LINK:http://www.amstat.org/Sections/Srms/Proceedings/papers/1992_127.pdf, for example.)

Statistical packages (such as Stata) sometimes have procedures that take the survey design into account in order to produce more accurate estimates of standard errors. Unfortunately, the Census Bureau does not release the information about the CPS stratification method that is needed to use these procedures.

Several researchers, however, have developed procedures that can be used to approximate key features of the CPS design and allow the use of Stata (and presumably other statistical package’s) survey design routines. (Our procedure, below, is based on recommendations from Austin Nichols LINK:http://www.stata.com/statalist/archive/2008-04/msg00444.html.)

Beginning with version 1.8 of the CEPR CPS ORG extracts, for survey years from 2006 to the present, we have included variables that can be used in conjunction with the Stata svy commands to calculate more accurate standard errors than those produced by the usual procedures that do not take the CPS design into account.

The two new variables are:

cbsasz: categorical variable to identify Metropolitan Area size

cmsacode: Consolidated Statistical Area code which identifies 30 metropolitan areas

Using the following Stata code together with Stata’s svy commands will improve the accuracy of CPS standard errors:

egen psu=group(cbsasz cmsacode)

svyset [pw=orgwgt], strat(cbsasz) psu(psu)

Here is an example using data from the 2012 ORG extract. First, we calculate the weighted mean real wage for men age 16-64, without taking the CPS design into account:

. mean rw_ot [aw=orgwgt] if female==0 & (16<=age & age<=64)


Next, we perform the same calculation allowing for the survey design using Stata’s svy command:


 In both cases, the mean wage is identical: $24.11. But, the standard error using the canned procedure yields a standard error ($0.06) that is one-twelfth of the standard error calculated after taking the survey design into account ($0.74).

Here is another example, using a binary variable for union membership for the same population. The variable unmem takes the value 1 if the respondent is a union member, 0 if the respondent is currently working but is not a union member.

Using the standard ci command, without factoring in the survey design:


Using the proportion command and taking the survey design into account:

Once again, the estimated union membership share is identical across the two calculations (0.121 or 12.1 percent), but the standard error is much larger after adjusting for the survey design (0.007 versus 0.001).