Geo-Joining Datasets

In this step, we enrich each accident observation with its nearest rental agency and its nearest garage, so that we can simulate our operating model and plan capacity at the station or geographic level.

The Geo-join processor makes it easy to calculate the distance between an accident and the nearest rental agency, or between an accident and the nearest garage.
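
For intuition about what "distance" means here, a common way to measure the distance between two latitude/longitude points is the haversine (great-circle) formula. Whether the Geo-join processor uses exactly this formula is an assumption; the symbols below are introduced only for illustration. With latitudes \varphi_1, \varphi_2 and longitudes \lambda_1, \lambda_2 in radians, and R the mean Earth radius (about 6371 km):

```latex
d = 2R \arcsin\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
```

As a sanity check, one degree of longitude along the equator works out to roughly 111 km.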

From the accidents_database_prepared dataset, create a new Prepare recipe with accidents_joined as the output and the following steps in the script.

  1. Add a new step with the Geo-join processor:

    • Specify latitude and longitude as the latitude and longitude columns from “this” dataset.

    • Select rental_agencies_geocode as the “Dataset to Join with”.

    • geolatitude and geolongitude identify the latitude and longitude coordinates in the “other” dataset.

    • The “Columns to copy from the other dataset” are agency_name, geolatitude, and geolongitude.

    • Specify station_ as the output column prefix.

  2. Add another Geo-join processor step:

    • latitude and longitude remain the columns from “this” dataset.

    • garage_locations_prepared is the dataset to join with.

    • latitude and longitude identify the latitude and longitude columns in the “other” dataset.

    • Copy the name and geopoint columns from the garage dataset.

    • Specify garage_ as the output column prefix.
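
The two steps above can be pictured as two nearest-neighbor lookups applied one after the other. The following is a minimal Python sketch of that idea, not the processor's actual implementation. The sample rows, the helper names haversine_km and geo_join_nearest, and the extra distance_km output column are all hypothetical; only the column names and the station_ and garage_ prefixes come from the steps above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_join_nearest(rows, others, lat_col, lon_col,
                     other_lat_col, other_lon_col, copy_cols, prefix):
    """Hypothetical helper: attach the nearest row of `others` to each row of `rows`.

    Copied columns are renamed with `prefix`; a prefixed distance_km column
    (an assumption of this sketch) records how far away the match is.
    """
    joined = []
    for row in rows:
        def dist(other):
            return haversine_km(row[lat_col], row[lon_col],
                                other[other_lat_col], other[other_lon_col])

        nearest = min(others, key=dist)  # brute-force scan over all candidates
        out = dict(row)
        for col in copy_cols:
            out[prefix + col] = nearest[col]
        out[prefix + "distance_km"] = dist(nearest)
        joined.append(out)
    return joined


# Made-up sample data using the column names from the steps above.
accidents = [{"latitude": 48.8566, "longitude": 2.3522}]
rental_agencies = [
    {"agency_name": "Agency A", "geolatitude": 48.86, "geolongitude": 2.35},
    {"agency_name": "Agency B", "geolatitude": 45.76, "geolongitude": 4.84},
]
garages = [
    {"name": "Garage X", "geopoint": "POINT(5.37 43.30)", "latitude": 43.30, "longitude": 5.37},
    {"name": "Garage Y", "geopoint": "POINT(2.30 48.80)", "latitude": 48.80, "longitude": 2.30},
]

# Step 1: nearest rental agency, copied columns prefixed with station_.
joined = geo_join_nearest(accidents, rental_agencies, "latitude", "longitude",
                          "geolatitude", "geolongitude",
                          ["agency_name", "geolatitude", "geolongitude"], "station_")
# Step 2: nearest garage, copied columns prefixed with garage_.
joined = geo_join_nearest(joined, garages, "latitude", "longitude",
                          "latitude", "longitude", ["name", "geopoint"], "garage_")
```

The brute-force scan compares every accident with every candidate location, which is fine for a sketch but slow on large datasets; a spatial index would be the usual choice at scale.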

Run the recipe, updating the schema to 25 columns.

../../../_images/compute_accidents_joined.png

Data preparation is done! We now have one dataset that includes not only the date, time, and location of every car accident, but also the names and locations of the nearest rental agency (or station) and garage for each accident.