This page contains notes on data and documentation issues, fixes, and revisions. Please inform the NAPP staff (at firstname.lastname@example.org) of any problems with the database, so we can improve it in future releases.
November 2016: Added new dataset
Released the full-count dataset for United States 1850.
November 2015: Added new and revised datasets
Released new full-count datasets for Great Britain 1911, Denmark 1787 and 1801, Iceland 1703 and 1910, and Sweden 1880.
Released a new dataset for Iceland 1729 which contains full-count data for three counties: Rangárvallasýsla, Árnessýsla, Hnappadalssýsla.
Released a new 5% sample for Canada 1911.
Released a revised full-count dataset for United States 1880. The revision includes the following additions and improvements:
- The revised file includes the variable OCCHISCO, which provides occupation using the Historical International Standard Classification of Occupations (HISCO) coding scheme.
- The revised file also includes several variables that were not present in the original Church of Jesus Christ of Latter-day Saints complete-count database for 1880: DEAF, BLIND, MAIMED, IDIOTIC, INSANE, SICKNESS, MARRINYR, SCHOOL, LIT, MOUNEMP, and QTRUNEMP.
- Several minor edits to individual cases, correcting entry errors. Most notably, most of the cases formerly coded as "Adopted, n.s." (RELATE code 0304) have been changed to "Adopted Child". This change also affects the variables STEPMOM and STEPPOP, as some children coded as "Adopted, n.s." were considered step-children when adoption status could not be determined.
- Some boarders and lodgers were coded as "Relative of Employee" in RELATE in cases where there was no employee in the household. These codes have been corrected. There were also several cases where an original RELATE code was missing and allocated, but the RELATE codes of the individuals who followed in the household were incorrectly edited to Boarder/Lodger. These codes have been restored to the original input values for RELATE. This change to RELATE also affects variables that were generated based on family interrelationship.
- Data from one missing reel of microfilm was restored to the 1880 complete-count database.
Released a revised full-count dataset for Iceland 1901. The revision includes improved household breaks as well as refined versions of parish and farm ID variables. The dataset also includes a new parish of birth variable.
June 2012: Added new datasets and linked samples
Released a new full-count dataset for Norway 1910.
Released a new full-count dataset for Sweden 1890 and a slightly revised full-count dataset for Sweden 1900.
Released samples of linked males, females and couples across the 1851 and 1881 Great Britain datasets.
July 2011: Added new datasets
Released new full-count datasets for Iceland 1801 and 1901.
Released a new full-count dataset for Norway 1801.
Released two new samples for Canada. The 1852 sample is a systematic 1-in-5 sample of the national population. The 1891 sample combines three slightly overlapping subsamples of 5, 10 and 100% into one national sample.
February 2011: Improved web interface
Introduced a new version of the web user interface for browsing variables and creating data extracts. The new system is explicitly designed around the concept of a "data cart" to which one adds variables and samples while browsing, and from which one "checks out" to generate a data extract.
July 2010: Added new datasets, released expanded dataset, and added linked data samples
Released new datasets from the United States from 1850 to 1870 and 1900 to 1910.
Added an expanded version of the 1880 United States dataset, with additional education and disability variables and a 1-in-5 oversample of the minority population.
Released a new sample from Mecklenburg-Schwerin 1819, which includes full count data for the city of Rostock.
Linked datasets across samples in the United States and Norway. The datasets for Norway include linked males and couples across all three census years from 1865 to 1900. The datasets for the United States include seven linked pairs of census years involving the 1880 complete count data. The linked years are: 1850-1880, 1860-1880, 1870-1880, 1880-1900, 1880-1910, 1880-1920, and 1880-1930. We have created three independent linked samples for each paired year: linked men, linked women, and linked married couples. For more information on the linked samples, refer to the linked samples page.
October 2008: Released new complete count dataset, revised existing datasets, and revised the data extraction system
Released a complete count dataset of Sweden 1900.
Revised existing complete count datasets (Canada 1881, Norway 1865, Norway 1900, United States 1880, England and Wales 1881, and Scotland 1881) and sample datasets (Canada 1871, Canada 1901, and Norway 1875).
Significantly altered our web interface to accommodate the growing number of samples and variables and to give users greater control while browsing variables or defining a data extract. The NAPP data extract system now replicates the IPUMS-International data extract system.
Released four new datasets (Canada 1871 and 1901, Norway 1865, Scotland 1881).
Revised all existing datasets (Canada 1881, England and Wales 1881, Norway 1900, United States 1880). New and revised datasets contain a substantial number of newly constructed variables:
Family interrelationship variables: Added variables on number of couples, mothers, and fathers in household. Added grandparent pointers (analogous to the existing MOMLOC, POPLOC, and SPLOC pointer variables, but for grandparents). Number of sons or daughters married or unmarried is now available for all datasets with relationship information; previously, these variables were available only for England and Wales 1881. Added a new variable for number of children under age 10, analogous to the existing NCHLT5 variable. Relationship to household head codes are now available in IPUMS-International format.
Geographic variables: For Canada and the United States, urban residence, (IPUMS compatible) city codes, and city populations are now available. Enumeration districts and supervisor's districts are available for the United States.
Work and employment variables: Labor force participation is now available for all samples. Harmonized occupational codes (adapted from the HISCO coding scheme) are available for the United States, Canada, and Norway. PRODUCT codes for sales workers are now available for both the United States and Norway. Standardized occupational strings (OCCLABEL) are available for the United States; this variable corrects spelling mistakes, expands abbreviations, and standardizes common phrases, to allow researchers searching for very specific occupations a better chance of finding these individuals.
Ethnicity and migration variables: Simplified country of birth codes identifying individuals as being born in a specific NAPP country, or in any other country (NAPPSTER), are now available for all samples. SPANNAME is now available for the United States.
Other variables: AGEMONTH now available for the United States.
2005: Added data, harmonized additional occupational data, added constructed family relationship variables
Completed additions to the 2004 data release, including adding Scotland to the Great Britain sample, harmonizing occupational data for Great Britain, and adding imputed relationships for Canada 1881.
December 2004: Posted revisions to the Canadian and United States datasets
Corrected missing values in Canada 1881.
Corrected missing values and improved the code for constructing household inter-relationship pointers for the United States 1880 data. Also added the following new variables:
LABFORCE: Added labor force participation variable based on the gainful occupation definition. This variable is consistent with the IPUMS LABFORCE variable for all pre-1940 censuses.
SEIUS: This variable for the Duncan Socioeconomic Index is consistent with the IPUMS variable SEI.
OCSCORUS: This occupational income score variable reports median total income in 1950 for the occupation (OCC50US). The unit of this variable is hundreds of 1950 dollars. Thus, an occupational income score of 70 means that the median total income of all people with the same occupation in 1950 was $7000. This variable is consistent with the IPUMS variable OCCSCORE.
NAPPSTER: This recode of country of birth identifies the five NAPP countries, assigns all other birthplaces to one code, and retains a code for unknown birthplace. This variable allows users to easily select all people born in any NAPP country.
SEAUS: This variable for state economic area (a grouping of contiguous counties that had close economic ties at the 1940 and 1950 censuses) is consistent with the IPUMS variable SEA, except in the Dakota Territory.
YEAR: Reports the year the census was conducted. Note that this is a four digit variable, while the IPUMS YEAR variable uses two digits.
RELEASED: Reports the date this version of the data was released.
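The OCSCORUS unit described above (hundreds of 1950 dollars) can be illustrated with a short Python sketch. The function name is hypothetical and is not part of any NAPP or IPUMS tool; it only restates the arithmetic given in the variable description.

```python
def occscore_to_dollars(score):
    """Convert an occupational income score to 1950 dollars.

    The score is expressed in hundreds of 1950 dollars, so the
    conversion is a multiplication by 100.
    """
    return score * 100


# The example from the variable description: a score of 70.
dollars = occscore_to_dollars(70)
print(dollars)  # 7000
```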
November 2004: Released two new census datasets
All variables that are common to the censuses of Great Britain, Canada, Norway, and the United States are coded in harmonized coding schemes.
Norway 1900: users should be aware of the following data characteristics.
- The Norwegian enumeration of relationship to head does not correspond to the household breaks. Specifically, not everyone who is the first person in a household is enumerated as a head. The Norwegian enumeration of relationships recorded relationships within dwellings, which sometimes contained multiple households.
- To generate statistics at the household level, users should select people with a PERNUM of 1. Selecting people with a RELATE code of 0101 will not select all households.
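As a minimal sketch of the two notes above, the following Python compares the two selection rules on a few made-up person records. The records, the household identifier field SERIAL, and the placeholder code "XXXX" (standing for any RELATE code other than head) are assumptions for illustration only; only PERNUM, RELATE, and the head code 0101 come from the notes.

```python
# Hypothetical person records. SERIAL (household identifier) is an assumed
# field name; "XXXX" is a placeholder for any RELATE code other than head.
HEAD = "0101"
people = [
    {"SERIAL": 1, "PERNUM": 1, "RELATE": HEAD},
    {"SERIAL": 1, "PERNUM": 2, "RELATE": "XXXX"},
    # Second household in the same dwelling: its first person is not a head.
    {"SERIAL": 2, "PERNUM": 1, "RELATE": "XXXX"},
    {"SERIAL": 2, "PERNUM": 2, "RELATE": "XXXX"},
    {"SERIAL": 3, "PERNUM": 1, "RELATE": HEAD},
]


def households_by_pernum(records):
    """Return household IDs found by keeping the first person (PERNUM == 1)."""
    return {r["SERIAL"] for r in records if r["PERNUM"] == 1}


def households_by_head(records):
    """Return household IDs found by keeping enumerated heads (RELATE == 0101)."""
    return {r["SERIAL"] for r in records if r["RELATE"] == HEAD}


by_pernum = households_by_pernum(people)
by_head = households_by_head(people)
# Households that the RELATE-based rule fails to select.
missed = sorted(by_pernum - by_head)
print(len(by_pernum), len(by_head), missed)
```

In this toy data the PERNUM rule finds all three households, while the RELATE rule misses the household whose first person was not enumerated as a head.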
Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLNO retains some information on the U.S. states where people were born, though this information was not required.
- Ethnic origin and language spoken were not collected in all areas of Norway. See the variable descriptions for ORIGIN and LANGUAGE variables for further detail.
- Added a NAPP version of HISCO for the first occupation specified.
- The following variables are alpha-numeric: RECTYP, RESNAMNO, OCCSTRNG, NAMEFRST and NAMELAST.
England and Wales 1881: users should be aware of the following data characteristics.
- The data for Scotland and occupational harmonization have yet to be added to this sample.
- Two sets of household relationship codes are available for this census: the codes used by the 1881 British census project (RELAGB) and a harmonized set of codes used in all NAPP censuses (RELATE).
- We have included the pointer variables constructed by the 1881 British census project (MOTHERGB, FATHERGB, SPOUSEGB) for comparison with the NAPP pointers constructed by the Minnesota Population Center. NAPP-constructed pointers follow the IPUMS conventions and are constructed similarly for all datasets.
- The following variables are alpha-numeric: RECTYP, BPSTPAGB, DISABGB, OCCSTRNG, NAMEFRST and NAMELAST.
December 2003: Released Canada 1881 data
Released a new dataset for Canada 1881. All variables that are common to the censuses of the United States and Canada are coded in harmonized coding schemes. Users should note the following issues:
- Relationship to household head was not enumerated in the 1881 Canadian census. Without relationship information, all spouses are coded as 2, "married, spouse absent". An imputed version of relationship, similar to IMPREL in IPUMS, will be constructed in the future, and will help identify whether or not a spouse is present.
- Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLCA retains information on birthplace below province, although respondents were only required to state their province.
- The majority of respondents stated only one religion and one ethnic origin. If a second ethnic origin or religion was entered, that information is recorded in a second variable, ORIGN2CA and RELIGON2 respectively. Because most people did not list a second entry, the vast majority of cases are blank.
- Canadian occupations have been coded into a version of HISCO modified by NAPP.
- The following variables are alpha-numeric: RECTYP, SDSTCA, SDSTNMCA, OCLANGCA, OCCSTRNG, NAMEFRST and NAMELAST.
August 2003: Updated the United States 1880 data
We have posted some corrections to the previous dataset.
- All states and territories are now available, except territories that were not enumerated or not part of the United States: Alaska, Hawaii, and Oklahoma/Indian Territory.
- Corrected codes in MARST for people who are married, but whose spouses are absent from the household (2, "Married, spouse absent").
- Corrected NATIVITY codes.
- Corrected an issue with 13-year-olds (see below) in the AGE variable.
July 2003: Released preliminary United States 1880 data
Released preliminary data from the U.S. Census of 1880. These preliminary data are not complete. Users should be aware of several issues.
- Preliminary data include the states alphabetically from Alabama through Ohio.
- This is a very large file, and your extracts will take some time to process, particularly if you are doing any case selection.
- Missing data values have not been allocated. Users may encounter a "Z" in numeric fields where there are missing values in MARST, RELATE, FBPLDTUS, and MBPLDTUS.
- A coding error has changed the ages of 13-year-olds to 12.
- Occupation coding is incomplete for approximately 180,000 people. People with occupation code 997 also have temporary codes for industry.
- Pointer and associated constructed variables should be used with caution in this beta release.
- At this time, we are undertaking a final round of geography checking on the complete-count data, so small geographic units are not yet available. If your research requires data from geographic areas smaller than counties, you should download the hyphen-delimited string variable RECIDUS. This variable includes information on microfilm reel number, microfilm sequence number and stamped page number from the microfilmed manuscripts, which provides enough information to identify smaller geographic areas.
- Users should download the alphabetic strings NAMEFRST, NAMELAST, and OCCSTRNG only if they are necessary for research. These 32-character variables increase file sizes significantly.
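Two of the notes above lend themselves to a short illustration: the "Z" marker in numeric fields and the hyphen-delimited RECIDUS string. The Python below is only a sketch under stated assumptions. The function names are hypothetical, any field containing "Z" is treated as missing, and the order of the three RECIDUS parts (reel, sequence, page) is assumed from the order in which the note lists them, not confirmed by the documentation. The sample RECIDUS value is made up.

```python
def parse_numeric(raw):
    """Return an int, or None when the field holds the "Z" missing marker.

    Assumption: any field containing "Z" is treated as missing.
    """
    text = raw.strip()
    if "Z" in text.upper():
        return None
    return int(text)


def split_recidus(recidus):
    """Split a hyphen-delimited RECIDUS string into its three parts.

    Assumed order: reel number, sequence number, stamped page number.
    Parts are kept as strings because their exact format is not documented here.
    """
    parts = recidus.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 3 hyphen-delimited parts, got {len(parts)}")
    reel, sequence, page = parts
    return {"reel": reel, "sequence": sequence, "page": page}


# Made-up values for illustration only.
print(parse_numeric("Z"))
print(parse_numeric(" 2 "))
print(split_recidus("0001-00023-045A"))
```

Grouping records by the reel and page parts would then give units smaller than a county, provided the assumed field order matches the actual extract.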