Prepare the Demographics Dataset

With the bike station data in place, let’s prepare data on the number of people living in DC at a granular level.

Create a new Uploaded Files dataset from the Demographics file. Accept the default name block_group_demog.

This dataset contains the number of people in the US Census ACS5Y 2013 at the block group level. For reference, note that the US Census generally follows a hierarchy of States > Counties > Census Tracts > Block Groups > Census Blocks.

The fully qualified block group identification is contained within the geoid column. We can split this column to obtain the block group.

Note

We could alternatively build up the block group from the state, county, tract, and blkgrp columns. As a stretch goal, see if you can figure out how to do that. Then consider which method you prefer. There are many means to an end in data science, and you will need to assess what works best in each situation.

Create a new Prepare recipe from this dataset with the default output block_group_demog_prepared and the following steps in its Script:

  • Use the Split column processor on the geoid column with US as the delimiter. Choose to Truncate, keeping only one of the output columns, starting from the End.

    • The new geoid_0 column should keep everything after the “US”.

  • Use the Rename columns processor on a few columns:

    Old name

    New name

    geoid_0

    block_group

    name

    block_name

    BOOOO1_001E

    nbPeople

  • Ensure the block_group column storage type is set to “string” and NOT “bigint” by editing the schema from the menu directly beneath the column header.

  • Use the Round numbers processor on the nbPeople column to round values to integers (0 significant digits, 0 decimal places).

  • Remove five more columns that will no longer be required: state, county, tract, blkgrp, and geoid

  • Run the recipe, updating the schema to four columns.

../../../_images/compute_block_group_demog_prepared.png

Now that we have the number of people in every census block group in DC, the demographic data is ready!