How to uniquely identify households in the March CPS


Within a single file, the best way to uniquely identify households is to use the hhseq variable.

Here’s an explanation from Unicon Research Corporation, which provides the underlying source data for the 1980-2013 portion of our extract: “Beginning in 1994 through the current year, there is a problem with duplicate HHIDs for different household units. The problem is particularly severe starting with the SCHIP-expanded sample in 2001 (the 2001s data)… When identifying a household within a file, the variable hhseq should be used. This variable appears to have no problem with duplicate values. However, when matching records across files [if you’re trying to track the same household in multiple years], it is necessary to use HHID. Adding geographic variables to the sort (state, county) may aid in uniquely identifying household units. When that is not enough, we suggest that household units be identified using both hhid and hhseq, then use demographic variables (sex, age, race) to match up individuals within the household, thus insuring that the proper hhid/hhseq units are matched across years.”
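The cross-file matching that Unicon describes could be sketched roughly as follows. This is only an illustration, not Unicon's procedure or ours: it assumes person-level records stored as dictionaries, the field names (hhid, hhseq, state, sex, age, race) are placeholders for whatever your extract calls them, and the rule that age should rise by 0 to 2 years between adjacent files is an assumption made for the example.

```python
from collections import defaultdict


def match_households(year1_persons, year2_persons, max_age_gain=2):
    """Pair households across two files on (hhid, state), confirmed by demographics.

    Returns sorted (hhid, hhseq_in_year1, hhseq_in_year2) tuples.
    max_age_gain is a hypothetical tolerance, not an official rule.
    """

    def group(persons):
        # (hhid, state) -> hhseq -> list of person records
        households = defaultdict(lambda: defaultdict(list))
        for p in persons:
            households[(p["hhid"], p["state"])][p["hhseq"]].append(p)
        return households

    def same_person(a, b):
        # Same sex and race, and age advanced by a plausible amount.
        return (
            a["sex"] == b["sex"]
            and a["race"] == b["race"]
            and 0 <= b["age"] - a["age"] <= max_age_gain
        )

    first, second = group(year1_persons), group(year2_persons)
    matches = []
    for key in first.keys() & second.keys():
        for seq1, people1 in first[key].items():
            for seq2, people2 in second[key].items():
                # Accept the pairing if at least one person lines up.
                if any(same_person(a, b) for a in people1 for b in people2):
                    matches.append((key[0], seq1, seq2))
    return sorted(matches)


# Made-up records: two different households share hhid "A1" in each year.
year1 = [
    {"hhid": "A1", "hhseq": 1, "state": 6, "sex": 1, "age": 40, "race": 1},
    {"hhid": "A1", "hhseq": 2, "state": 6, "sex": 2, "age": 70, "race": 2},
]
year2 = [
    {"hhid": "A1", "hhseq": 5, "state": 6, "sex": 1, "age": 41, "race": 1},
    {"hhid": "A1", "hhseq": 9, "state": 6, "sex": 2, "age": 71, "race": 2},
]

print(match_households(year1, year2))  # [('A1', 1, 5), ('A1', 2, 9)]
```

Requiring only one matching person per household is a deliberately loose rule; a real match would likely want stricter agreement, for example most members lining up, before treating two hhid/hhseq units as the same household.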

Although the same hhid can appear on different households, hhseq has no duplicates within a given year: if you identify households by year and hhseq together, each household is uniquely identified.
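One way to check this on your own extract is to count how often each candidate key occurs. The sketch below assumes one record per household, stored as dictionaries with fields named year, hhid and hhseq; the sample values are made up for illustration.

```python
from collections import Counter


def duplicate_keys(households, key_fields):
    """Return the key tuples shared by more than one household record."""
    counts = Counter(tuple(h[f] for f in key_fields) for h in households)
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up household-level records: two distinct 2001 households share an hhid.
households = [
    {"year": 2001, "hhid": "A1", "hhseq": 1},
    {"year": 2001, "hhid": "A1", "hhseq": 2},
    {"year": 2001, "hhid": "B7", "hhseq": 3},
    {"year": 2002, "hhid": "C3", "hhseq": 1},
]

print(duplicate_keys(households, ["year", "hhid"]))   # [(2001, 'A1')]
print(duplicate_keys(households, ["year", "hhseq"]))  # []
print(duplicate_keys(households, ["hhseq"]))          # [(1,)]
```

The last call shows why year has to be part of the key when files are pooled: hhseq restarts in each file, so it is only unique within a year.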
