- Why do the variable names not match the ACS documentation from the Census?
- What set of income variables should I use?
- How do I identify counties in the ACS?
- What weights should I use?
Our Stata programs are the primary source of information on our extracts. For the ACS, the programs can be found here. The Stata code shows the changes we made to the original raw ACS variables to create our extract.
The CEPR-preferred income variables are those that incorporate two adjustments: the Census Bureau’s internal inflation adjustment factor, which puts income into constant calendar-year dollars, and CEPR’s real wage program, which uses the CPI-U-RS to convert dollar amounts to current-year dollars. These variables have both an “r” prefix and an “_adj” suffix (for example, rincp_all_adj is total person income with both adjustments applied).
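Because the preferred variables share that naming pattern, a Stata wildcard can list them. This is a minimal sketch that assumes the CEPR ACS extract is already in memory. Apart from rincp_all_adj, this FAQ does not name the other adjusted variables, so read them off the `describe` output rather than assuming which income concepts are present.

```stata
* List every variable that starts with "r" and ends with "_adj"
* (the CEPR-preferred income variables, e.g. rincp_all_adj)
describe r*_adj

* Weighted summary of total person income in adjusted dollars,
* using the person weight discussed under the weights question
summarize rincp_all_adj [aw=perwgt]
```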
The ACS does not contain a county variable. It does, however, have variables for state and PUMA (public use microdata area). You can use these variables to identify geographic areas with at least 100,000 people.
Here’s a link to more information on PUMAs from the Census Bureau. The PUMA names file tells you which areas each PUMA covers. PUMA codes are only unique within a state, so to uniquely identify a PUMA in our ACS files you need to combine the state variable (2 digits) with the puma variable (5 digits).
Keep in mind that each state-PUMA combination identifies an area with 100,000 or more people, not necessarily a specific county. You’ll need to consult the links above to see what each combination represents. In some cases a single county is split across several PUMAs; in others, several counties make up a single PUMA.
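One way to build a single state-PUMA identifier is to shift the state code left by five digits and add the PUMA code. The sketch below runs on a few made-up records (hypothetical values, not real ACS data) and assumes state and puma are stored as numbers; with the real extract, skip the `input` step. The name `stpuma` is our own, not a variable in the extract. If either code is stored as a string, `egen stpuma = group(state puma)` also yields a unique key, though its values carry no geographic meaning.

```stata
* Toy records standing in for the ACS extract (hypothetical values)
clear
input byte state long puma
 6   101
 6   102
36   101
end

* Combine the 2-digit state code and 5-digit PUMA code into one key:
* state 6, PUMA 101 -> 600101; state 36, PUMA 101 -> 3600101
generate long stpuma = state*100000 + puma

* Display with leading zeros so the state and PUMA parts line up
format stpuma %07.0f
list
```

Because PUMA 101 appears in both states, only the combined key tells those records apart.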
Generally, you’ll want to use the person weight (perwgt) if you’re trying to determine the characteristics of individuals, and the housing weight (hsgwgt) if you’re trying to determine the characteristics of households.
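A minimal sketch of the two cases, assuming the extract is loaded. The household example needs exactly one record per household; `hh_record` below is a hypothetical 0/1 flag, not a documented CEPR variable, so substitute whatever identifies that record in your extract. These commands give point estimates only; their standard errors do not use the replicate weights described next.

```stata
* Person-level estimate: mean adjusted personal income across people
mean rincp_all_adj [pw=perwgt]

* Household-level estimate: weighted count of households by state.
* hh_record is a HYPOTHETICAL flag marking one record per household.
tabulate state [iw=hsgwgt] if hh_record == 1
```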
However, to generate more accurate standard error estimates for hypothesis testing and confidence intervals, replicate weights should be used.
The ACS includes 160 replicate weights: 80 for the analysis of individuals and 80 for the analysis of households. The commands below use the person replicate weights but can easily be adapted for household-level estimates.
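For reference, the Census Bureau’s documented successive difference replication (SDR) formula for ACS microdata computes the standard error of an estimate from its 80 replicate versions. Here \(\hat{X}\) is the estimate using the full-sample weight (perwgt for persons) and \(\hat{X}_r\) is the same estimate recomputed with the r-th replicate weight.

```latex
\widehat{\mathrm{SE}}(\hat{X}) = \sqrt{\frac{4}{80}\sum_{r=1}^{80}\left(\hat{X}_r - \hat{X}\right)^2}
```

By default, Stata’s SDR variance measures deviations from the mean of the replicate estimates; adding the `mse` option to `svyset` measures them from the full-sample estimate \(\hat{X}\), as in the formula above, so the two can differ.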
To use replicate weights in Stata, you must first describe the survey design using the svyset command:
svyset [iw=perwgt], sdr(pwgtp1-pwgtp80) vce(sdr)
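For household-level estimates, the same declaration can be adapted to the housing weight. This is a sketch under two assumptions: the household replicate weights are named wgtp1-wgtp80 (their names in the raw ACS PUMS housing file; check your extract), and `hh_record` is a hypothetical 0/1 flag marking one record per household. The option is written here with its full name, `sdrweight()`.

```stata
* Household counterpart of the person-level declaration.
* ASSUMPTION: household replicate weights are named wgtp1-wgtp80.
svyset [iw=hsgwgt], sdrweight(wgtp1-wgtp80) vce(sdr)

* Estimate over one record per household; hh_record is HYPOTHETICAL.
* subpop() keeps all records in the design while estimating only
* for the flagged ones.
svy, subpop(hh_record): proportion state
```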
Using the replicate weights allows the data to be treated as a single stratum, so no primary sampling unit (PSU) needs to be specified. The full-sample weight, perwgt, must be identified. Once the survey design has been declared, place the svy prefix before a command to use the replicate weights in estimation. For example:
svy: reg rincp_all female
svy: mean educ
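To see what the svy prefix does with the replicate weights, the SDR standard error of a mean can be computed by hand: estimate with the full-sample weight, re-estimate with each of the 80 replicate weights, and apply the Census multiplier of 4/80 to the sum of squared deviations. This is a sketch assuming the person extract is loaded; the scalar `sdr_se` is our own name. It uses iweights, matching the svyset command above, and with the `mse` option we expect the svy result to agree with the hand calculation.

```stata
* Full-sample estimate with the main person weight
tempname theta0 ss
quietly summarize rincp_all_adj [iw=perwgt]
scalar `theta0' = r(mean)

* Re-estimate with each replicate weight and accumulate squared
* deviations from the full-sample estimate
scalar `ss' = 0
forvalues r = 1/80 {
    quietly summarize rincp_all_adj [iw=pwgtp`r']
    scalar `ss' = `ss' + (r(mean) - `theta0')^2
}

* SDR standard error: sqrt(4/80 * sum of squared deviations)
scalar sdr_se = sqrt(4/80 * `ss')
display "SDR standard error (by hand): " sdr_se

* Same quantity via svy; mse centers on the full-sample estimate
svyset [iw=perwgt], sdrweight(pwgtp1-pwgtp80) vce(sdr) mse
svy: mean rincp_all_adj
```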
The examples above use the successive difference replication (SDR) method, but bootstrap and balanced repeated replication (BRR) methods can also be applied. For these examples, the SDR method produced the largest standard errors, almost twice the size of those estimated with bootstrapping and BRR. Not all commands can be used with the svy prefix; type help svy_estimation in Stata for a list.
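A comparison like the one described above might be set up by re-declaring the design with each method on the same 80 person replicate weights and tabulating the results side by side. This is a sketch of one way to do it, not necessarily the code behind the reported figures. One likely contributor to the roughly two-fold gap: with the default fay(0), Stata’s BRR variance multiplies the sum of squared replicate deviations by 1/80 instead of the 4/80 used for SDR, which by itself halves the standard error. The Census Bureau documents the SDR formula for the ACS.

```stata
* SDR, as in the examples above
svyset [iw=perwgt], sdrweight(pwgtp1-pwgtp80) vce(sdr)
svy: mean educ
estimates store sdr

* BRR using the same replicate weights
svyset [iw=perwgt], brrweight(pwgtp1-pwgtp80) vce(brr)
svy: mean educ
estimates store brr

* Bootstrap using the same replicate weights
svyset [iw=perwgt], bsrweight(pwgtp1-pwgtp80) vce(bootstrap)
svy: mean educ
estimates store boot

* Point estimates with standard errors for each method
estimates table sdr brr boot, se
```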
For more information on replicate weights in the ACS, see the IPUMS page.