Revision History
This page contains notes on data and documentation issues, fixes, and revisions. Please inform the NAPP staff (napp@pop.umn.edu) of any problems with the data so that we can improve it in future releases.
Revised release of existing datasets, release of new complete count dataset, and a new data extraction system: October 2008
We have revised all our existing complete count datasets (Canada 1881, Norway 1865, Norway 1900, United States 1880, England and Wales 1881, and Scotland 1881) and sample datasets (Canada 1871, Canada 1901, and Norway 1875). We have also released a complete count dataset of Sweden 1900.
Additionally, we have made significant changes to our web interface to accommodate the growing number of samples and variables and to give users greater control while browsing variables or defining a data extract. The NAPP data extract system now replicates the IPUMS-International data extract system. Existing users should have automatic access to the new data extraction system. If you have any problems logging in or any additional questions, please do not hesitate to contact us.
Revisions to existing datasets, release of four new datasets, and new extract system: 25 October 2006
We have posted revisions to all existing datasets (Canada 1881, England and Wales 1881, Norway 1900, United States 1880), released four new datasets (Canada 1871 and 1901, Norway 1865, Scotland 1881), and moved the data to a completely new extract system.
The NAPP extract system is now the same system as the IPUMS-USA and IPUMS-International extract systems. Because of the move to a new extract system, we require existing users to re-register the first time they use the new system, and choose their own password. We apologize for the inconvenience to existing users.
The new datasets contain all variables previously available and a range of new constructed variables discussed below. The current release of the 1901 Canadian census does not include all variables collected in that census and is limited to variables that overlap with the 19th century complete-count censuses. We will release the additional variables in winter 2007.
We have included a substantial number of new variables describing geographic location, family relationships within households, work and employment status, and ethnicity and migration for all datasets. The Canadian censuses of 1871 and 1881 did not enumerate relationships between household members and lack the constructed variables derived from the original enumeration of relationship. The newly available variables are:
- Family interrelationship variables: Number of couples, mothers, and fathers in household. Grandparent pointers analogous to the existing MOMLOC, POPLOC and SPLOC but for grandparents. Number of sons or daughters married or unmarried is now available for all datasets with relationship information; previously these variables were only available for England and Wales 1881. Number of own children less than 10, analogous to the existing NCHLT5 variable. Relationship to household head codes now available in IPUMS-International format.
- Geographic variables: County, municipality or district areas. Urban residence is now available for Canada and the United States. City codes and populations for the United States (IPUMS compatible) and Canada. Enumeration and supervisors' districts for the United States.
- Work and employment variables: Labor force participation now available for all samples. Harmonized occupational codes (adapted from the HISCO coding scheme) now available for the United States, as well as all Canadian and Norwegian samples. PRODUCT codes for sales workers now available for both the United States and Norway. Standardized occupational strings (OCCLABEL) available for the United States: this variable corrects spelling mistakes, expands abbreviations, and standardizes common phrases to allow researchers searching for very specific occupations a better chance of finding these individuals.
- Ethnicity and migration variables: Simplified country of birth codes identifying individuals as being born in a specific NAPP country, or any other country (NAPPSTER) now available for all samples. SPANNAME now available for the United States.
- Other variables: AGEMONTH now available for the United States.
Revisions to United States and Canadian data, 15 December 2004
We have posted revisions to the Canadian and United States data sets.
Canada, 1881: Most missing values have been corrected. Around 2000 cases have a missing value in one variable. These values will be corrected when the Canadian 1881 Census Project completes checking the source data.
United States, 1880: We have corrected all missing values, improved the code for constructing household inter-relationship pointers, and added the following new variables:
- LABFORCE: Labor force participation, based on the gainful occupation definition. This variable is consistent with the IPUMS LABFORCE variable for all pre-1940 censuses.
- SEIUS: Duncan Socioeconomic Index. Consistent with the IPUMS variable SEI.
- OCSCORUS: Occupational income score. Median total income in 1950 for the occupation (OCC50US) reported. The unit of this variable is hundreds of 1950 dollars. Thus an occupational income score of 70 means that the median total income of all people with the same occupation in 1950 was $7000. This variable is consistent with the IPUMS variable OCCSCORE.
- NAPPSTER: A recode of country of birth that identifies the five NAPP countries, assigns all other known birthplaces to one code, and retains a code for unknown birthplace. This variable allows users to easily select all people born in any NAPP country.
- SEAUS: State economic area: A grouping of contiguous counties within a state that had close economic ties at the 1940 and 1950 census. This variable is consistent with the IPUMS variable SEA, except in the Dakota Territory.
- YEAR: Year the census was conducted. Note that this is a four digit variable (the IPUMS uses two digits).
- RELEASED: Date this version of the data was released.
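The unit conversion described for OCSCORUS above can be illustrated with a short Python sketch. This is only an illustration of the arithmetic stated in the variable description; the function name is hypothetical and is not part of any NAPP or IPUMS tool.

```python
def ocscorus_to_dollars(score):
    """Convert an OCSCORUS value (hundreds of 1950 dollars) to 1950 dollars."""
    return score * 100


# The example from the variable description: a score of 70.
example_dollars = ocscorus_to_dollars(70)
print(example_dollars)
```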
1900 Norwegian census, 8 November 2004
The 1900 Norwegian census was released on our beta site today. All variables that are common to the censuses of Norway, Canada, Great Britain and the United States are coded in harmonized coding schemes. Users should note the following issues:
- The Norwegian enumeration of relationship to head does not correspond to the household breaks. Specifically, not everyone who is the first person in a household is enumerated as a head. The Norwegian enumeration of relationships recorded relationships within dwellings which sometimes contained multiple households.
- To generate statistics at the household level, users should select people with a PERNUM of 1. Selecting people with a RELATE code of 0101 will not select all households.
- Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLNO retains some information on the states of the United States people were born in, though this information was not required.
- Ethnic origin and language spoken were not collected in all areas of Norway. See the variable descriptions for these variables for further detail.
- Coding of occupations into a version of HISCO modified by NAPP is complete for the first occupation specified. This variable is consistent with the encoding of Canadian occupations.
- Coding of secondary occupations is not yet complete.
- The following variables are alpha-numeric: RECTYP, RESNAMNO, OCCSTRNG, NAMEFRST and NAMELAST.
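The household-level selection advice above (use PERNUM rather than RELATE) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: PERNUM, RELATE, and the code 0101 come from the notes above, while the SERIAL household identifier, the other RELATE values, and the records themselves are invented for the example.

```python
# Hypothetical person records. SERIAL and all values except RELATE "0101"
# are assumptions made for illustration only.
people = [
    {"SERIAL": 1, "PERNUM": 1, "RELATE": "0101"},  # first person, enumerated as head
    {"SERIAL": 1, "PERNUM": 2, "RELATE": "0201"},
    {"SERIAL": 2, "PERNUM": 1, "RELATE": "1200"},  # first person, NOT enumerated as head
    {"SERIAL": 2, "PERNUM": 2, "RELATE": "0301"},
]

# Recommended: one record per household, taken from the first person listed.
first_persons = [p for p in people if p["PERNUM"] == 1]

# Not recommended: selecting on the head code can miss households.
heads = [p for p in people if p["RELATE"] == "0101"]

households_by_pernum = {p["SERIAL"] for p in first_persons}
households_by_relate = {p["SERIAL"] for p in heads}
missed = households_by_pernum - households_by_relate
print(sorted(missed))
```

In this made-up data the second household has no person coded 0101, so a RELATE-based selection drops it while the PERNUM-based selection keeps it.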
1881 England and Wales census, 8 November 2004
The 1881 England and Wales census was released on our beta site today. All variables that are common to the censuses of Great Britain, Canada, Norway and the United States are coded in harmonized coding schemes. Users should note the following issues:
- The data for Scotland are not yet available. We are currently fixing a problem with the geographic codes, and will release this data in early 2005. Please contact napp@pop.umn.edu if you have particular need for Scottish data before this date.
- Occupations are not yet coded to a harmonized coding scheme. Users wishing to compare occupations between Great Britain and the other NAPP countries should carefully consult the domestic coding scheme and identify similar titles. Harmonized occupational data will be available in 2005.
- Two sets of household relationship codes are available for this census: the codes used by the 1881 British census project (RELAGB) and a harmonized set of codes used in all NAPP censuses (RELATE).
- We have included the pointer variables constructed by the 1881 British census project (MOTHERGB, FATHERGB, SPOUSEGB) for comparison with the NAPP pointers constructed by the Minnesota Population Center. NAPP constructed pointers follow the IPUMS conventions, and are constructed similarly for all datasets. Please alert us to any significant differences you find between the two sets of pointers.
- The following variables are alpha-numeric: RECTYP, BPSTPAGB, DISABGB, OCCSTRNG, NAMEFRST and NAMELAST.
1881 Canadian Census, 23 December 2003
The 1881 Canadian census was released on our beta site today. All variables that are common to the censuses of the United States and Canada are coded in harmonized coding schemes. Users should note the following issues:
- Relationship to household head was not enumerated in the 1881 Canadian census. An imputed version, similar to IMPREL in IPUMS, will be constructed in 2005.
- Users should note that because relationship was not enumerated, all spouses in Canada are coded as 2 ("Married, spouse absent"). When IMPREL is constructed, we will be able to identify which spouses are present or not.
- Users should note that several coding schemes and variables retain information that was not consistently enumerated, but which was nevertheless recorded by some enumerators. In particular:
- BPLCA retains information on birthplace below province, although respondents were only required to state their province.
- The majority of respondents stated only one religion and one ethnic origin. To preserve information recorded by people who stated more than one religion or ethnic origin, we have created the variables ORIGN2CA and RELIGON2. These are blank for more than 4 million people out of a total population of 4.28 million. The coding schemes for ORIGIN and ORIGN2CA are identical, as are the codes for RELIGION and RELIGON2.
- Canadian occupations have been coded into a version of HISCO modified by NAPP, whereas United States occupations are currently only available in the domestic U.S. scheme. However, many occupations can be distinguished under both schemes although the codes or groups of codes needed to select them are different.
- The following variables are alpha-numeric: RECTYP, SDSTCA, SDSTNMCA, OCLANGCA, OCCSTRNG, NAMEFRST and NAMELAST.
1880 U.S. Census, 4 August 2003
We have posted some corrections to the previous data set.
- All states and territories are now available, except for the territories that were not enumerated or not part of the United States: Alaska, Hawaii, and Oklahoma/Indian Territory.
- People who are married, but whose spouses are absent from the household, now have the correct code for MARST (2, "Married, spouse absent").
- NATIVITY codes are now correct.
- Please note that there is an error in the variable YNGCH, which means that all YNGCH values are set to 0. We will correct this problem shortly.
- The problem with 13 year olds (see below) has been corrected.
1880 U.S. Census, 15 July 2003
Data from the U.S. Census of 1880 is now available through an extraction system. Users should note several issues in this preliminary release.
- Currently only Alabama through Ohio are available. Remaining states are being processed, and will be available shortly.
- This is a very large file, and your extracts will take some time to process, particularly if you are doing any case selection.
- Missing data values have not been allocated. Users may encounter a "Z" in numeric fields where there are missing values. This affects MARST, RELATE, FBPLDTUS, and MBPLDTUS more than other variables. You can set your statistical package's program options to explicitly recognize this as the missing value code. In SAS you can insert the line "MISSING Z ;" in your data step. Alternatively, most packages should write character data as missing values if the field has been initialized as numeric. Stata users can execute the command "set more off" before reading in the data file so that the program will scroll past these errors. This issue affects a small percentage of cases, and we are working on correcting this error.
- There may be problems with case selection. Please let us know if your extracts are not created properly so that we can diagnose this problem.
- A coding error has changed the ages of 13 year olds to 12 years old. We will correct this error shortly.
- Approximately 180,000 people with potentially valid occupations have their occupations coded as "Occupation missing/unknown" (997) as coding of the occupation dictionary is not complete. A fully coded version of the occupational data will be available in winter 2003/4. People with occupation code 997 also have temporary codes for industry.
- Pointer and associated constructed variables should be used with caution in this beta release. In particular, because the LDS transcription of the 1880 census did not record non-familial household relationships, there may be problems in constructing variables such as FAMUNIT.
- At this time, we are undertaking a final round of geography checking on the complete-count data. This involves checking total populations for small geographic areas with the published volumes, and reconciling any errors. Consequently, the data that is available now does not include the variables MCIVDIV, INCPLACE, SUPDIST or ENUMDIST. We anticipate making this information available in 2004.
If your research requires data from smaller geographic areas than counties, you should download the string variable (hyphen delimited) RECIDUS. This variable includes information on microfilm reel number, microfilm sequence number and stamped page number from the microfilmed manuscripts which provides enough information to identify smaller geographic areas in conjunction with indices to the microfilm. Please contact the NAPP staff for further information.
- Users should only download the alphabetic strings, NAMEFRST, NAMELAST, and OCCSTRNG if they are necessary for research. These 32 character variables increase file sizes significantly.
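The "Z" missing-value code described in the notes above can also be handled when reading an extract in a general-purpose language. The following is a minimal Python sketch, not part of the NAPP extract system: the function name parse_numeric and the sample values are hypothetical, and it assumes that any field containing "Z" (or an empty field) should be treated as missing.

```python
def parse_numeric(field, missing_code="Z"):
    """Return the field as an int, or None if it is blank or holds the missing code.

    Assumption: any occurrence of the missing code marks the whole field as missing.
    """
    value = field.strip()
    if value == "" or missing_code in value:
        return None
    return int(value)


# Hypothetical raw values for a numeric variable such as MARST.
raw_marst = ["1", "Z", "2", " 6"]
marst = [parse_numeric(v) for v in raw_marst]
print(marst)
```

Keeping missing values as None, rather than as a numeric sentinel, prevents them from being counted silently in tabulations.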
