Creating household or family characteristic measures

NAPP includes several variables that summarize the number of people within a household who have a particular characteristic, such as being a father or mother, or being a servant.

Because the number of variables of this type is limited only by users' imagination, and our files are already very large, we have opted not to create other variables that count characteristics of individuals within a household or family unit. This page contains instructions on how to create your own variables summarizing the number of people within a household or family unit who have a characteristic of interest. This guide assumes you have read and understood the Introduction to Family Interrelationship Variables. Please note that these instructions will also work for IPUMS datasets.

Identifying households and families

Within a single dataset, individuals living in the same household share a SERIAL number. Family units, identified by FAMUNIT, are nested within households. In many cases only one family lives in the household, so FAMUNIT is 1 for everyone in it.

If you are using only one dataset, SERIAL alone identifies households. If you are using more than one dataset, you must combine SERIAL with the variable SAMPLE, which uniquely identifies each sample (country, year, and dataset).

Note that if you are using United States data, households sampled as large units will not contain records for all persons in the household. Counts of characteristics will not be accurate for these households. You can identify these households using GQTYPE. The complete count NAPP censuses contain all people in group quarters, making this restriction unnecessary.

To summarize, a household is uniquely identified by the combination of SAMPLE and SERIAL. A family unit is identified by the combination of SAMPLE, SERIAL, and FAMUNIT.
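As a minimal sketch of this summary, the identifying variables can be collapsed into single grouping variables in Stata. The names hhid and famid are hypothetical, and the sketch assumes your extract contains SAMPLE, SERIAL, and FAMUNIT under lower-case names.

```stata
* Assumption: sample, serial, and famunit are present in the extract.
* hhid and famid are illustrative names, not NAPP variables.

* One id per household: unique combinations of SAMPLE and SERIAL
egen long hhid = group(sample serial)

* One id per family unit: unique combinations of SAMPLE, SERIAL, and FAMUNIT
egen long famid = group(sample serial famunit)
```

Either id can then stand in for the full list of identifying variables in a by() option or a by prefix.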

Overview of creating household or family unit counts

To create counts of characteristics within households or families you will need to:

  1. Create an indicator/binary/dummy variable for each individual, identifying whether they have the characteristic you are interested in.
  2. If your statistical package requires it, sort your data so that people in the same household or family are grouped together. Within a household, sort individuals by PERNUM.
  3. Sum the indicator variable across the unique combinations of SAMPLE, SERIAL, [and FAMUNIT].

Stata code for creating household or family counts

gen byte <your-new-indicator-variable> = <variable-of-interest>==<value-of-interest> <if> <in>

quietly replace <indicator-variable> = 0 if <indicator-variable> == .

egen <household-sum> = total(<indicator-variable>), by(country year datanum serial famunit)

label variable <household-sum> "Number of people in household/family who have <variable-of-interest>==<value-of-interest>"

Example: Number of farm laborers in a household

gen byte isfarmlab=occ==62110

egen num_farmlabs_in_hh = total(isfarmlab), by(country year datanum serial)

label variable num_farmlabs_in_hh "Number of farm laborers (OCC=62110) in household"
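The sort-and-sum logic from the overview can also be written out explicitly, and the same indicator can be summed within family units instead of households. The following is a sketch only: it assumes isfarmlab has already been created as in the example above, that PERNUM and FAMUNIT are in your extract, and the new variable names (num_farmlabs_alt, num_farmlabs_in_fam, hh_tag) are hypothetical.

```stata
* Sort people within each household by PERNUM and build a running sum
bysort country year datanum serial (pernum): gen num_farmlabs_alt = sum(isfarmlab)

* The last record in each household holds the household total; copy it to all members
bysort country year datanum serial (pernum): replace num_farmlabs_alt = num_farmlabs_alt[_N]

* Family-unit version: add FAMUNIT to the grouping variables
egen num_farmlabs_in_fam = total(isfarmlab), by(country year datanum serial famunit)

* Flag one record per household so households are not counted once per person
egen byte hh_tag = tag(country year datanum serial)
tabulate num_farmlabs_alt if hh_tag
```

The egen approach does not require sorted data, which is why the example above skips the sort; the explicit version is shown only to make the three steps visible.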

SAS code for creating household or family counts

Example: Number of farm laborers in a household

DATA <your-data-set> ;
LENGTH isfarmlab 3 ;
SET <your-data-set> ;
IF occ=62110 THEN isfarmlab=1 ;
ELSE isfarmlab=0 ;
RUN ;

PROC SQL ;
CREATE TABLE temp
AS SELECT country, year, datanum, serial, SUM(isfarmlab) AS num_farmlabs_in_hh
FROM <your-data-set>
GROUP BY country, year, datanum, serial
;

CREATE TABLE <your-augmented-data-set>
AS SELECT a.*, b.num_farmlabs_in_hh
FROM <your-data-set> AS a INNER JOIN temp AS b
ON a.country=b.country AND a.year=b.year AND a.datanum=b.datanum AND a.SERIAL=b.SERIAL
;
QUIT ;

Note to SPSS users

SPSS users should follow the logic of the SAS and Stata code above. Create an indicator variable for each individual, and then use the AGGREGATE command across the variables identifying households.