Version 0.9.8.1 of the CEPR CPS March program files for 1980-2014 has been released. With this update, we are now using the most recently available raw data for the sample with redesigned income questions. Download the data or view the program files for changes to the final extracts.
Within a single file, the best way to uniquely identify households is to use the hhseq variable.
Here’s an explanation from Unicon Research Corporation (which provides the underlying source of our extract from 1980-2013) – “Beginning in 1994 through the current year, there is a problem with duplicate HHIDs for different household units. The problem is particularly severe starting with the SCHIP-expanded sample in 2001 (the 2001s data)… When identifying a household within a file, the variable hhseq should be used. This variable appears to have no problem with duplicate values. However, when matching records across files [if you’re trying to track the same household in multiple years], it is necessary to use HHID. Adding geographic variables to the sort (state, county) may aid in uniquely identifying household units. When that is not enough, we suggest that household units be identified using both hhid and hhseq, then use demographic variables (sex, age, race) to match up individuals within the household, thus insuring that the proper hhid/hhseq units are matched across years.”
Although the same hhid can appear for different households, hhseq has no duplicates once you look within a single year.
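As a rough illustration of the point above, here is a minimal Python sketch that checks which identifiers repeat. The records and their values are invented for illustration; only the variable names year, hhid, and hhseq come from the text, and the helper duplicated_keys is hypothetical rather than part of the CEPR programs.

```python
from collections import Counter

# Hypothetical household-level records; all values are invented for illustration.
households = [
    {"year": 2001, "hhseq": 1, "hhid": "A100", "state": 6},
    {"year": 2001, "hhseq": 2, "hhid": "A100", "state": 36},  # same hhid, different household
    {"year": 2001, "hhseq": 3, "hhid": "B200", "state": 6},
    {"year": 2002, "hhseq": 1, "hhid": "A100", "state": 6},   # in this toy data, hhseq restarts each year
]


def duplicated_keys(rows, fields):
    """Return the key tuples (built from `fields`) that appear on more than one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# hhid repeats within a year, so it cannot identify households on its own.
dup_hhid = duplicated_keys(households, ["year", "hhid"])

# hhseq is unique within a year ...
dup_hhseq = duplicated_keys(households, ["year", "hhseq"])

# ... but not once several years are pooled without the year in the key.
dup_hhseq_pooled = duplicated_keys(households, ["hhseq"])

print(dup_hhid, dup_hhseq, dup_hhseq_pooled)
```

In a pooled multi-year file, the same check suggests keying households on year and hhseq together rather than on hhseq alone.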
Version 0.9.8 of the CEPR CPS March program files for 1980-2014 has been released. Download the data or view the program files for changes to the final extracts. CEPR CPS Basic Monthly program files for 1979-2014 are also available for download.
There have been several notable changes to the files. First, we now use the raw CPS March data as the sole source of our extract starting in 2014. We still use Unicon’s extract as the underlying source for our extract from 1980-2013.
In addition, the 2014 March CPS included redesigned health insurance and income questions. All of the addresses received the redesigned health insurance questions. However, the income questions were fielded using a split panel design, where 5/8ths of the sample (approximately 68,000 addresses) were given the same income questions as the previous year, while 3/8ths (approximately 30,000 addresses) were part of the “Research File” and were given redesigned income questions. Those who were asked the redesigned income questions gave responses that were significantly different from those who were given the original questions. Therefore, the Census Bureau urges researchers not to combine the two files.
We provide the files separately here. There is a regular CPS March 2014 file (cps_march_2014.zip) that is based on the 5/8ths sample – this is what the Census Bureau has used for their normal publications on income and poverty. We also have available the 3/8ths or Research File (cepr_march_2014_research.zip) that has the redesigned income questions.
We recommend that you use the 5/8ths sample with original income questions for your publications.
For more on these changes, and the Census Bureau’s recommendations on how to handle the data, please see here.
Version 2.0 of the CEPR CPS ORG program files for 1979-2014 has been released. Download the data or view the program files for changes to the final extracts. CEPR CPS Basic Monthly program files for 1994-2014 have also been updated and are available for download.
This update represents a major overhaul of our extract. We have made many minor coding corrections across a number of variables, and have also dropped some variables from our extract. A full list is in the changelog at the end of the master program (which we’ve also pasted in below).
The biggest change is that we now use the Basic CPS data as the sole source for our extract from 1994 to the present. Previously we used NBER’s MORG extract as the underlying source for our extract from 1979-2002, while merging some variables from the Basic CPS into the NBER extract. With this update, we continue to use the NBER MORG extract for 1979-1993, but from 1994-present, we use the raw CPS Basic data directly from the Census.
With this update, we have also moved to the most recent available version of the NBER MORG extract for 1979-1993 (accessed July 2014).
We have also made significant changes to our wage variables. Most importantly, we went from carrying over 25 hourly wage variables to carrying just six: wage1, wage2, wage3, wage4, rw, and rw_ot. We believe these variables are more straightforward and do a better job of measuring overtime, tips, commissions, and bonuses (otc) for hourly workers.
Full details on the new wage variables are available in cepr_org_wages.do.
Briefly, wage1 is hourly earnings for workers paid by the hour; it excludes otc and is available only for hourly workers.
wage2 is the usual hourly earnings, including otc, for nonhourly workers; it is available only for nonhourly workers.
wage3 combines the usual hourly earnings for hourly workers (excluding otc) in wage1 with those for nonhourly workers (including otc) in wage2; wage3 is available for all workers and attempts to match the NBER’s recommendation for the most consistent hourly wage series from 1979 to the present.
wage4 is the usual hourly earnings, including otc, for both hourly and nonhourly workers. From 1994 to the present, this series uses hourly workers’ reported usual amounts of overtime, tips, commissions, and bonuses in order to estimate a wage for hourly workers that includes otc. From 1979 to 1993, this series attempts to estimate otc for hourly workers based on differences between weekly pay and the implied weekly pay at usual hours and straight pay. We do not place great faith in the wage4 series before 1994.
(The names wage1, wage2, wage3, and wage4 are borrowed from Economic Policy Institute terminology.)
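The relationship among the four series can be sketched in Python as follows. This is only an illustration of the definitions above, not the code in cepr_org_wages.do. The function and parameter names are hypothetical, and the formula used for hourly workers in wage4 (wage1 plus weekly otc divided by usual weekly hours) is an assumption about how the reported otc amounts might be folded in for 1994 onward.

```python
from typing import Optional


def wage3(paid_hourly: bool,
          wage1: Optional[float],
          wage2: Optional[float]) -> Optional[float]:
    """wage1 (excluding otc) for hourly workers, wage2 (including otc) for nonhourly workers."""
    return wage1 if paid_hourly else wage2


def wage4_1994_on(paid_hourly: bool,
                  wage1: Optional[float],
                  wage2: Optional[float],
                  otc_weekly: Optional[float],
                  usual_hours: Optional[float]) -> Optional[float]:
    """Usual hourly earnings including otc for all workers (assumed 1994-present logic).

    Nonhourly workers: wage2 already includes otc.
    Hourly workers: ASSUMPTION for illustration, add weekly otc spread over usual hours.
    Returns None when a needed input is missing.
    """
    if not paid_hourly:
        return wage2
    if wage1 is None or otc_weekly is None or usual_hours is None or usual_hours <= 0:
        return None
    return wage1 + otc_weekly / usual_hours


# An hourly worker earning $15.00 an hour who usually receives $80 a week in otc
# and usually works 40 hours a week.
hourly_w3 = wage3(True, 15.0, None)
hourly_w4 = wage4_1994_on(True, 15.0, None, 80.0, 40.0)

# A nonhourly worker whose usual hourly earnings (including otc) are $25.00.
salaried_w3 = wage3(False, None, 25.0)
salaried_w4 = wage4_1994_on(False, None, 25.0, None, 40.0)

print(hourly_w3, hourly_w4, salaried_w3, salaried_w4)
```

Under these assumptions wage3 and wage4 coincide for nonhourly workers and differ for hourly workers only by the otc component.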
We have retained a slightly modified version of the rw variable, which is based on wage3 with a number of adjustments. First, rw converts hourly wages to constant 2014 dollars using the CPI-U-RS. Second, for workers who report top-coded weekly earnings, we assign our estimate of the mean above the top-code, rather than the top-coded value, in order to calculate hourly earnings; our procedure uses a lognormal approximation and is applied separately by gender. (See cepr_org_topcode_lognormal.do and cps_basic_topcode_lognormal.do. We do not adjust earnings for the very small number of hourly workers whose hourly pay is top-coded.) Third, rw includes respondents who report that their weekly “hours vary.” For these workers, we use reported hourly pay or, if necessary, weekly pay together with imputed usual weekly hours; for details, see cepr_basic_hours.do. Finally, we trim observations where the hourly wage in 1989 dollars is below $0.50 or above $200. (For a longer, somewhat dated, discussion of the top-coding, “hours vary,” and trimming procedures, see this 2003 paper.)
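Two of the adjustments above lend themselves to a short Python sketch. The first function gives the standard conditional mean of a lognormal variable above a cutoff, which is the quantity a lognormal top-code approximation needs. It takes the parameters mu and sigma as given and does not reproduce how cepr_org_topcode_lognormal.do estimates them. The second function applies the trimming bounds stated in the text. All function names and the example parameter values are hypothetical.

```python
from math import exp, log
from statistics import NormalDist
from typing import Optional


def lognormal_mean_above(topcode: float, mu: float, sigma: float) -> float:
    """E[X | X > topcode] when ln(X) ~ Normal(mu, sigma^2).

    mu and sigma are assumed to be known here; in practice they would be
    estimated (for example, separately by gender) from the censored data.
    """
    std = NormalDist()
    z = (log(topcode) - mu) / sigma
    prob_above = std.cdf(-z)              # P(X > topcode)
    weighted_tail = std.cdf(sigma - z)    # tail mass after shifting the mean by sigma^2
    return exp(mu + sigma ** 2 / 2.0) * weighted_tail / prob_above


def trim_wage_1989(wage_1989_dollars: Optional[float],
                   low: float = 0.50,
                   high: float = 200.0) -> Optional[float]:
    """Return the wage, or None if it is missing or outside [low, high] in 1989 dollars."""
    if wage_1989_dollars is None:
        return None
    if wage_1989_dollars < low or wage_1989_dollars > high:
        return None
    return wage_1989_dollars


# Invented example values: a weekly top-code of 2885 and illustrative parameters.
example_topcode = 2885.0
example_mean = lognormal_mean_above(example_topcode, mu=6.5, sigma=0.7)
print(example_mean > example_topcode)
```

The estimated mean always exceeds the top-code, which is why assigning it instead of the top-coded value raises measured earnings at the top of the distribution.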
rw_ot is based on wage4 (which includes otc for all workers) and otherwise makes the same adjustments as rw.
Internally, we generally use rw_ot when an analysis uses only data from 1994 to the present and rw when an analysis includes data before and after 1994.
Changelog from cepr_org_master.do:
1. Added 2014 data
2. Now uses NBER MORG extract only for 1979-1993; previously was 1979-2002. Uses CPS Basic from 1994-present
3. Corrected coding error in 2012 for race variables: wbho, wbhao, wbhom, wbhaom, racehpia
4. Corrected coding error for vet in 2005
5. Corrected coding error for selfemp, selfinc, pubsect, pubfed, pubst, publoc
6. Dropped student, studpt variables
7. Added new school enrollment variables: schenrl, schhs, schcol, schft, schpt
8. Added multjob variable – has more than one job
9. Added multjobn variable – number of jobs
10. Added paid employees info variables: pdemp1, pdemp2, nmemp1, nmemp2
11. Corrected coding error in ownchild, ch02, ch05, ch35, ch614, ch1417
12. Corrected coding error in famrel94
13. Corrected coding error in 2004 for metro, centcity, suburb, rural
14. Extended principalcty variable back to 1994
15. Extended fipscountry and cbsasz back to Sep 1995
16. Renamed smsastat06 to smsastat05, and extended back to 2005
17. Dropped chi variable for Chicago
18. Corrected coding error in nyc and la variables
19. Corrected coding error in hourslwa
20. Dropped hrmgfail – no longer nec. since we use CPS Basic from 94-on
21. Corrected coding error in blsimpt
22. uhourse: for years 94-present, now uses pehrusl1 for non-hourly, and peernhro for hourly workers
23. Added longitudinal weight (lonwgt) and family weight (famwgt) to keepord program
24. Added ind_m03 variable “major industry recode” for 2003-present
25. Dropped ind11
26. Added ind09, which is valid from Jan 09-April 2012
27. Corrected coding error in ind12 – now available May 2012-December 2013
28. Added ind14 variable
29. Dropped occ13 variable
30. Added occ12, which is valid May 2012-present
31. Corrected coding error in occ11, now available Jan 2011-April 2012.
32. Added occ_m03 variable “major occupation recode” for 2003-present
33. Added peernuot to keepord program
34. Dropped earnhre
35. Corrected coding error in blsimph and blsimpw
36. Added wage1, hourly earnings if paid by the hour, excluding otc
37. Added weekpay, usual weekly earnings for hourly and non-hourly workers, including otc
38. Added wage2, usual hourly earnings for nonhourly workers, including otc
39. Added wage3, nber-style wage variable for usual hourly earnings, excluding otc for hourly workers, but including otc for nonhourly workers
40. Added otcrec – usually receives otc. Different methodology for 79-93 vs. 94-present
41. Added otcamt (formerly wkotc) – weekly earnings from otc
42. Dropped wkotc
43. Added wage4 – usual hourly earnings for hourly and nonhourly workers, including otc
44. Dropped the following wage variables: w_nber, w_no_no, w_no_ot, w_ln_no, w_ln_ot, w_p7_no, w_p7_ot, w_p8_no, w_p8_ot, w_p9_no, w_p9_ot, rw_p8_no, rw_p8_ot, w_ln_noa, w_ln_ota, w_p7_noa, w_p7_ota, w_p8_noa, w_p8_ota, w_p9_noa, w_p9_ota, rwa, rw_ota, rw_p8_noa, rw_p8_ota
45. Changed rw and rw_ot, trims wage observations below $0.50 and above $200 in 1989$
46. Added proxy “Self or proxy response”, replacing lf_proxy
47. Added wholine “Line number of respondent”, replacing resp_lno
48. Added reltoref “Relationship to reference person”, replacing rel_ref
49. Dropped lf_proxy, resp_lno, rel_refp
50. Uses newer version of NBER MORG extract for 1979-1993 (accessed July 2014)