Geo-Joining Datasets
In this step, the objective is to enrich each accident observation with the nearest rental agency and the nearest garage so as to simulate our operating model and plan capacity at the station or geographic level.
With the Geo-join processor, we’ll be able to easily calculate the distance between an accident and the nearest rental agency, or an accident and the nearest garage.
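The tutorial does not say which distance formula the Geo-join processor uses internally. As an assumption for illustration, this sketch uses the haversine great-circle formula (mean Earth radius 6371 km) and a brute-force scan to find the nearest candidate point for one accident. The coordinates in the usage lines are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not a Dataiku setting


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(lat: float, lon: float,
            candidates: Sequence[Tuple[float, float]]) -> Tuple[int, float]:
    """Return (index, distance_km) of the candidate closest to (lat, lon), by brute force."""
    best_idx, best_dist = -1, math.inf
    for i, (clat, clon) in enumerate(candidates):
        d = haversine_km(lat, lon, clat, clon)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist


if __name__ == "__main__":
    # Hypothetical accident and rental agency coordinates, for illustration only.
    accident = (48.8566, 2.3522)
    agencies = [(48.8606, 2.3376), (45.7640, 4.8357)]
    idx, dist = nearest(*accident, agencies)
    print(idx, round(dist, 2))
```

A brute-force scan is O(accidents × agencies); a spatial index (for example a k-d tree or ball tree) would scale better, but it is not needed to show the idea.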
From the accidents_database_prepared dataset, create a new Prepare recipe with accidents_joined as the output, and add the following steps to its script.
Add a new step with the Geo-join processor:
- Specify the latitude and longitude columns from “this” dataset.
- Select rental_agencies_geocode as the “Dataset to Join with”.
- geolatitude and geolongitude identify the latitude and longitude coordinates in the “other” dataset.
- The “Columns to copy from the other dataset” are agency_name, geolatitude, and geolongitude.
- Specify station_ as the output column prefix.
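To make the effect of these settings concrete, here is a minimal plain-Python sketch of a nearest-neighbour join over rows stored as dictionaries. It is not Dataiku's implementation: the function name geo_join, the haversine distance, and the sample rows are all hypothetical. It only mirrors the settings above (the “this” and “other” coordinate columns, the columns to copy, and the station_ prefix).

```python
import math
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km (assumed metric; Dataiku's exact formula is not stated)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def geo_join(rows: List[Row], this_lat: str, this_lon: str,
             other: List[Row], other_lat: str, other_lon: str,
             columns_to_copy: Sequence[str], prefix: str) -> List[Row]:
    """Hypothetical helper: copy selected columns of the nearest 'other' row, prefixed."""
    out = []
    for row in rows:
        best: Optional[Row] = None
        best_dist = math.inf
        for cand in other:
            d = haversine_km(row[this_lat], row[this_lon], cand[other_lat], cand[other_lon])
            if d < best_dist:
                best, best_dist = cand, d
        enriched = dict(row)  # keep every original column
        for col in columns_to_copy:
            enriched[prefix + col] = best[col] if best is not None else None
        out.append(enriched)
    return out


# Hypothetical sample rows.
accidents = [{"latitude": 48.8566, "longitude": 2.3522}]
rental_agencies_geocode = [
    {"agency_name": "Agency A", "geolatitude": 48.8606, "geolongitude": 2.3376},
    {"agency_name": "Agency B", "geolatitude": 45.7640, "geolongitude": 4.8357},
]
joined = geo_join(accidents, "latitude", "longitude",
                  rental_agencies_geocode, "geolatitude", "geolongitude",
                  ["agency_name", "geolatitude", "geolongitude"], "station_")
print(joined[0])
```

Each accident row keeps its own columns and gains station_agency_name, station_geolatitude, and station_geolongitude from the closest agency.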
Add another Geo-join processor step:
- latitude and longitude remain the columns from “this” dataset.
- garage_locations_prepared is the dataset to join with.
- latitude and longitude identify the needed columns in the “other” dataset.
- Copy the name and geopoint columns from the garage dataset.
- Specify garage_ as the output column prefix.
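The second step works the same way, but it copies the garage's geopoint column as-is. This sketch assumes, without confirmation from the tutorial, that geopoint holds a WKT string of the form POINT(longitude latitude), with longitude listed first as WKT requires. It also shows how such a value could be parsed back into coordinates. The helper names and sample rows are hypothetical.

```python
import math
import re
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km (an assumed metric for illustration)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Parse an assumed 'POINT(lon lat)' string into (lat, lon); WKT lists longitude first."""
    m = re.fullmatch(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", wkt, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(m.group(1)), float(m.group(2))
    return lat, lon


def join_nearest_garage(accident: Row, garages: List[Row]) -> Row:
    """Hypothetical second step: copy name and geopoint of the nearest garage as garage_*."""
    # min() raises ValueError if garages is empty.
    best = min(garages, key=lambda g: haversine_km(accident["latitude"], accident["longitude"],
                                                     g["latitude"], g["longitude"]))
    enriched = dict(accident)
    enriched["garage_name"] = best["name"]
    enriched["garage_geopoint"] = best["geopoint"]
    return enriched


# Hypothetical rows: an accident already enriched by the first step, and two garages.
accident = {"latitude": 48.8566, "longitude": 2.3522, "station_agency_name": "Agency A"}
garage_locations_prepared = [
    {"name": "Garage X", "latitude": 48.85, "longitude": 2.35, "geopoint": "POINT(2.35 48.85)"},
    {"name": "Garage Y", "latitude": 43.30, "longitude": 5.37, "geopoint": "POINT(5.37 43.3)"},
]
row = join_nearest_garage(accident, garage_locations_prepared)
print(row["garage_name"], parse_wkt_point(row["garage_geopoint"]))
```

Because the join itself runs on the latitude and longitude columns, the copied geopoint is just carried along for later use, for example on a map.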
Run the recipe, updating the schema to 25 columns.
Data preparation is done! We now have one dataset that includes not only the date, time, and location of every car accident, but also the names and locations of the nearest rental agency (or station) and garage for each accident.
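Since the goal stated at the start of this step is to plan capacity at the station level, the joined columns can be aggregated directly. The following sketch counts accidents per nearest station. It assumes the joined rows expose a station_agency_name column (agency_name with the station_ prefix); the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def accidents_per_station(rows: List[Dict[str, Any]]) -> Counter:
    """Count joined accident rows by their nearest station (station_agency_name)."""
    return Counter(row["station_agency_name"] for row in rows)


# Hypothetical rows shaped like accidents_joined.
accidents_joined = [
    {"station_agency_name": "Agency A", "garage_name": "Garage X"},
    {"station_agency_name": "Agency A", "garage_name": "Garage Y"},
    {"station_agency_name": "Agency B", "garage_name": "Garage Y"},
]
for station, n in accidents_per_station(accidents_joined).most_common():
    print(station, n)
```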