By on 8.11.21 in Census 2020

We recently received a question from a reporter in North Carolina asking how to compare 2010 Census redistricting data with the 2020 Census redistricting data that will be released on Thursday, August 12.

This is a great question, and it can be localized. Here’s our guide on how to find the data, how to analyze it, and what questions you can ask. For more great information, check out Tyler Dukes’s GitHub, which contains a collection of data, methodology and other information related to the release of the 2020 Census redistricting file. (Tyler is a data journalist at the News & Observer and well-versed in Census data; his GitHub is a veritable treasure trove of material to explore and use.)

This tutorial steps you through how to do this using R (where possible…bear with me, I’m a slowly-learning Stata convert). In it, we will cover: 

  1. How to access 2020 Census data 
  2. NC Special: Adjusting 2020 Census data to reflect North Carolina’s 2019 redistricting plans
  3. How to obtain historical Census data for comparison
  4. How to convert historical Census data into 2020 Census boundaries
  5. How you can analyze the data

Skip the steps: North Carolina Data Files (CSVs) (will be updated as available):

2020 Census Population & Housing for Redistricting (NC Only)

File layout details (geographic variables vary from file to file)

Blocks | Counties | Places (Cities, Towns, and CDPs) | Place by County | Townships (Minor Civil Divisions) | VTDs | School Districts

2010 and 2020 Census Data for NC

Tracts | Counties | Congressional Districts | State House Districts | State Senate Districts | File Layout

Note: For data not shown here, check out Big Local News (“The 2020 Census Data Co-Op Project”, free with sign up and covers all states) or this GitHub from Tyler Dukes.

Step 1: How to access 2020 Census data 

The 2020 Census data will be available for download from the U.S. Census Bureau in pipe-delimited legacy format at 1:00 p.m. EDT on Thursday, August 12, 2021. The Bureau has provided detailed information on how to extract data from these files using Microsoft Access, SAS, or R (see “Legacy Format Support Materials”).

The guidance here uses a modified version of the R import script provided by the Bureau (it requires the dplyr package). The script works with the complete set of redistricting data and subsets a key set of variables (race and ethnicity for the total population and for the population 18+, housing units and occupancy, and group quarters) for census blocks, census tracts, places, and counties. To use this script:

  1. Identify the directory on your computer where you will be working and saving your information and scripts 
  2. Download the data of interest from the U.S. Census Bureau and save it to your directory. You’ll need all four files in the redistricting data release: 
    1. Geoheader 
    2. Segment 1 (includes Hispanic or Latino by Race for the Total Population) 
    3. Segment 2 (includes Hispanic or Latino by Race for the Population 18+ and Housing Occupancy Status) 
    4. Segment 3 (includes Group Quarters population) 
  3. Unzip the files 
  4. Save this R file to your directory
    Note: This R file builds on the file pl_all_4_2020_dar provided by the Bureau (lines 11-503 are directly from the Bureau’s original file) 
  5. Modify necessary elements in the R code 
    1. line 8: change working directory to your file path (step 1, above) 
    2. line 9: install dplyr if you do not already have it installed (install.packages("dplyr")) 
    3. lines 504-509: These export total population and total housing units for counties and blocks. Comment them out if not desired, or change the summary level (common SUMLEV codes) to select an alternate geographic type. 
    4. lines 552-555: Identify selected variables (see lines 516-550) for inclusion in the data subset. Add additional variable names of interest if not already on the list or remove those not of interest (full list of names starts on page 99). 
    5. lines 560-593: Renames variables (add/delete to this list as necessary) 
    6. lines 598-601: Creates data sets for select geographies (census blocks, census tracts, counties, and places); change geographies by changing SUMLEV. 
    7. lines 603-607: Exports CSVs of the selected geographic outputs for use in other programs
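The core of the steps above (read the pipe-delimited files, link the geoheader to a data segment, keep one summary level) can be sketched in a few lines. This is a minimal illustration in Python rather than R, and it is not the Bureau's script or ours: the column lists are a small hand-picked subset supplied for the example, the population numbers are made up, and joining on LOGRECNO alone is a simplifying assumption.

```python
import csv
import io

# Toy stand-ins for two legacy-format files. Real files have many more
# columns and no header row, so column names must be supplied by the reader.
# These short column lists are assumptions for illustration only.
GEO_COLUMNS = ["STUSAB", "SUMLEV", "LOGRECNO", "GEOCODE", "NAME"]
SEG1_COLUMNS = ["STUSAB", "LOGRECNO", "P0010001"]

# Made-up records: one state row, two county rows, one tract row.
geo_text = """NC|040|0000001|37|North Carolina
NC|050|0000002|37001|Alamance County
NC|050|0000003|37003|Alexander County
NC|140|0000004|37001020100|Census Tract 201
"""
seg1_text = """NC|0000001|1000
NC|0000002|600
NC|0000003|400
NC|0000004|250
"""


def read_pipe_delimited(text, columns):
    """Parse pipe-delimited text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    return [dict(zip(columns, row)) for row in reader]


def join_and_subset(geo_rows, seg_rows, sumlev):
    """Link geoheader rows to segment rows and keep one summary level."""
    seg_by_recno = {row["LOGRECNO"]: row for row in seg_rows}
    selected = []
    for geo in geo_rows:
        if geo["SUMLEV"] != sumlev:
            continue
        seg = seg_by_recno[geo["LOGRECNO"]]
        selected.append({
            "GEOCODE": geo["GEOCODE"],
            "NAME": geo["NAME"],
            "total_pop": int(seg["P0010001"]),
        })
    return selected


geo_rows = read_pipe_delimited(geo_text, GEO_COLUMNS)
seg_rows = read_pipe_delimited(seg1_text, SEG1_COLUMNS)

# SUMLEV "050" selects counties; "140" would select census tracts.
counties = join_and_subset(geo_rows, seg_rows, "050")
```

Changing the last argument is the equivalent of changing SUMLEV in the R script to pick a different geography.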

Additional resources:

Step 2: NC Special: Adjusting 2020 Census data to reflect North Carolina’s 2019 redistricting plans 

The Census Bureau’s congressional and state legislative district boundaries in the 2020 Census redistricting data release are the 116th congressional and the 2018 state legislative districts. For North Carolina, this means that they do not reflect the 2019 redistricting plans, so North Carolina data users cannot rely on the aggregate data in the P.L. 94-171 files or the Name Look-Up Tables provided by the Bureau.

There are two methods that we could use to get the 2020 data into the current NC redistricting shapes:

I used the backcasting method (option 2) to develop files for North Carolina’s congressional districts, state House, and state Senate (see my work or compare the population totals from the two approaches).
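For readers who want to see the general shape of this kind of adjustment, here is a minimal sketch of one generic approach: summing block-level counts using a table that assigns each block to a district. It is not necessarily either of the methods referred to above, and the block IDs, populations, and district labels are all made up for illustration.

```python
from collections import defaultdict

# Made-up block populations keyed by 15-digit block GEOID.
block_pop = {
    "370010201001000": 120,
    "370010201001001": 80,
    "370010202002000": 50,
}

# Hypothetical block-to-district assignments for a redistricting plan.
block_to_district = {
    "370010201001000": "District 1",
    "370010201001001": "District 2",
    "370010202002000": "District 2",
}


def sum_by_district(pop_by_block, district_by_block):
    """Add up block populations within each assigned district."""
    totals = defaultdict(int)
    for block_id, pop in pop_by_block.items():
        totals[district_by_block[block_id]] += pop
    return dict(totals)


district_pop = sum_by_district(block_pop, block_to_district)
```

A useful check on any such aggregation is that the district totals add up to the same statewide total as the blocks.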

Step 3: How to obtain historical Census data for comparison

You can download historical redistricting data files directly from the U.S. Census Bureau (2010 and 2000), but the best resource for historical Census data is the National Historical Geographic Information System (NHGIS). NHGIS maintains easy-to-access summary tables and boundary files for all levels of U.S. Census geography and is free with signup. You can access historical files using the following steps: 

  1. Click on “Get Data” from the NHGIS home page.
  2. Filter the data to focus on the geography of interest. For example, we want the block, which is the building block of all other census geographies. The screenshot below shows an example of how the data sets are filtered to show just block-level data from the 2010 census.
  3. Choose tables to download. I wanted a subset of tables that would match most of what is being released in the redistricting file. I downloaded the following:
    2000 (SF1b) | 2010 (SF1a)
    NP001A. Total Population | NP001A. Total Population
    NP005A. Total Population 18 Years and Over | NP005A. Total Population 18 Years and Over
    NP008A. Population by Hispanic or Latino and Not Hispanic or Latino by Race | NP008A. Population by Hispanic or Latino and Not Hispanic or Latino by Race
    NP027H. Population in Group Quarters by Group Quarter Type | NP027H. Population in Group Quarters by Group Quarter Type
    NH001A. Total Housing Units | NH001A. Total Housing Units
  4. Use the statistical software of your choice to read in the data and clean up the variables so they align with the 2020 data of interest. Example: TXT file of my Stata code for 2010 extract.

Step 4: How to convert historical Census data into 2020 Census boundaries

One of the biggest challenges is that census geographies such as census tracts change from decade to decade. This can make it difficult to compare population changes without additional manipulation. Tyler Dukes has a great write-up of how to do this on his GitHub. The files we are releasing reflect this same approach (TXT file of my Stata code for reweighting 2010 blocks).
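The general idea behind reweighting can be sketched as follows, assuming a crosswalk that says what share of each 2010 unit falls inside each 2020 unit. This is an illustration of the technique only, not the Stata code linked above: the unit IDs, counts, and weights are hypothetical, and real crosswalk files have their own layouts.

```python
from collections import defaultdict

# Made-up 2010 counts for three source units.
counts_2010 = {"2010-A": 100, "2010-B": 40, "2010-C": 60}

# Hypothetical crosswalk rows: (2010 unit, 2020 unit, weight).
# The weights for each 2010 unit sum to 1, so no population is lost.
crosswalk = [
    ("2010-A", "2020-X", 1.0),
    ("2010-B", "2020-X", 0.25),
    ("2010-B", "2020-Y", 0.75),
    ("2010-C", "2020-Y", 0.5),
    ("2010-C", "2020-Z", 0.5),
]


def reweight(counts, rows):
    """Allocate each source count to target units in proportion to its weights."""
    estimates = defaultdict(float)
    for source_id, target_id, weight in rows:
        estimates[target_id] += counts[source_id] * weight
    return dict(estimates)


# 2010 counts expressed in 2020 boundaries.
estimates_2020_geography = reweight(counts_2010, crosswalk)
```

Because the weights split counts proportionally, the results are estimates and can be fractional; the total across all 2020 units should still match the 2010 total.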

Step 5: How you can analyze the data

This section contains a list of questions we have received from reporters and community members. We’ll be updating this as we do our own analysis and receive and respond to questions, so check back for additional code and resources! (Have a question you’d like to see answered? Let us know at demography@unc.edu).

What is the right level of geography to understand local change?

Recent guidance from the Census Bureau on their Disclosure Avoidance System indicates that smaller geographies, such as blocks, may be “fuzzy” due to privacy protections; they recommend that users interested in small-area population changes analyze census tracts or larger geographies.
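If you start from block-level data, rolling it up to census tracts is straightforward, because a 15-digit block GEOID begins with the 11-digit GEOID of its tract (2 digits for state, 3 for county, 6 for tract). A minimal sketch with made-up populations:

```python
from collections import defaultdict

# Made-up block populations keyed by 15-digit block GEOID.
block_pop = {
    "370010201001000": 120,
    "370010201001001": 80,
    "370010202002000": 50,
}


def blocks_to_tracts(pop_by_block):
    """Sum block populations by the tract GEOID embedded in each block GEOID."""
    totals = defaultdict(int)
    for block_geoid, pop in pop_by_block.items():
        if len(block_geoid) != 15:
            raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
        tract_geoid = block_geoid[:11]  # state (2) + county (3) + tract (6)
        totals[tract_geoid] += pop
    return dict(totals)


tract_pop = blocks_to_tracts(block_pop)
```

Keep GEOIDs as text rather than numbers so that leading zeros are not dropped.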

How can I visualize this data on a map?

We generally use ArcGIS (paid) to make shapefiles, in combination with free visualization tools such as Tableau Public and Datawrapper. Other free resources, such as R and QGIS, may be alternatives for making shapefiles.

What sorts of questions can you answer?

This really depends on what you are interested in (there are a lot of possibilities!). For example, if you want to understand where we are in 2020:

  • How many people live in my community?
  • What is the race and ethnic composition of our population?
  • How many children live in my town?
  • How many residents in my community are in prison?
  • How does a part compare to the whole, e.g., a census tract to its county, or a county to the state or nation?
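Most of these 2020 snapshot questions come down to dividing a part by a whole. A minimal sketch, using made-up counts and placeholder group labels:

```python
# Made-up county counts; the group labels are placeholders.
county_counts = {"total": 1000, "Group A": 500, "Group B": 250, "Group C": 250}

# Made-up total population for one census tract in that county.
tract_total = 250


def shares(counts, total_key="total"):
    """Return each group's count as a fraction of the total."""
    total = counts[total_key]
    return {group: n / total for group, n in counts.items() if group != total_key}


# Composition of the county's population by group.
group_shares = shares(county_counts)

# Part to whole: the tract's share of the county population.
tract_share_of_county = tract_total / county_counts["total"]
```

Multiply by 100 to report the shares as percentages.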

Alternatively, if looking at 2020 in comparison to 2010:

  • How did my community change and grow over the decade?
  • Did all groups in my community grow at the same rate? Which groups grew the fastest?
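Once the 2010 and 2020 counts are on the same boundaries, these comparison questions reduce to a change and a percent change for each group. A minimal sketch with made-up counts and placeholder group labels:

```python
# Made-up counts for the same area in both years.
pop_2010 = {"Group A": 200, "Group B": 80, "Group C": 0}
pop_2020 = {"Group A": 250, "Group B": 120, "Group C": 10}


def change_table(before, after):
    """Compute numeric and percent change per group.

    Percent change is left as None when the starting count is zero,
    because growth from zero has no defined rate.
    """
    table = {}
    for group in before:
        change = after[group] - before[group]
        pct = change / before[group] * 100 if before[group] else None
        table[group] = {"change": change, "pct_change": pct}
    return table


changes = change_table(pop_2010, pop_2020)

# The fastest-growing group among those with a defined percent change.
fastest = max(
    (g for g in changes if changes[g]["pct_change"] is not None),
    key=lambda g: changes[g]["pct_change"],
)
```

Report the numeric change alongside the percent change: a small group can post a large percent gain while adding few people.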