Solution | Distribution Spatial Footprint#
Overview#
Business Case#
Distribution spatial footprint analysis is a powerful way to optimize retail networks based on the locations of customers, competitors, and distribution centers. This optimization is particularly critical for retailers.
Fueled with the right data, it can generate up to a 20% increase in sales. Achieving these results relies on both global and local network optimizations, which can be implemented through several use cases, such as opening, closing, or relocating stores, finding the best places for new distribution centers, and tailoring marketing campaigns to the specifics of local networks.
A fundamental aspect of this solution is the computation of isochrone areas in order to enrich the input data for geospatial analysis. Isochrone areas are a type of catchment area which represents the area from which a location is reachable by someone within a given amount of time, using a particular mode of transportation.
The solution consists of a data pipeline that computes isochrone areas, further enriches the input data using these computed areas, and in doing so opens up a wide range of geospatial analyses. Analysts can input their own data and surface the outputs in a dashboard or interactive WebApp in order to analyze their organization’s own distribution networks. Data Scientists should use this solution as an initial building block to develop advanced analytics and support decision making. Roll-out and customization services can be offered on demand.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
This solution is not available on Dataiku Cloud. Although you may try to import the zip file found in the self-managed instructions onto a Cloud instance, Dataiku offers no support in this case.
After meeting the technical requirements below, self-managed users can install the Solution with the following instructions:
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Distribution Spatial Footprint.
Click Install, changing the project folder into which the solution will be installed if needed.
From the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Distribution Spatial Footprint.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 12.5+ instance.
Have an API key for Openrouteservice or HERE, a location platform for developers.
Have a Python 3.9 code environment named solution_distribution-spatial-footprint with the following required packages:
openrouteservice==2.3.3
folium==0.12.1
geopy==2.1.0
geopandas==0.8.2
Shapely==1.7.1
Flask==1.1.2
flexpolyline==0.1.0
scikit-learn==0.24.2
Install the following plugins:
-
To install this custom plugin, select Add Plugin > Upload and select the zip file. You will be asked to create a Python 3.6 environment for the plugin.
Note: This plugin is required for the webapp but is not necessary for the project, Dataiku app, and dashboards to run.
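Because the code environment pins exact package versions, it can help to confirm that the environment matches the list above before running the Flow. The following is a minimal sketch, not part of the Solution itself: the helper names `installed_version` and `find_mismatches` are hypothetical, and the check relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINNED = {
    "openrouteservice": "2.3.3",
    "folium": "0.12.1",
    "geopy": "2.1.0",
    "geopandas": "0.8.2",
    "Shapely": "1.7.1",
    "Flask": "1.1.2",
    "flexpolyline": "0.1.0",
    "scikit-learn": "0.24.2",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) tuple. `lookup` is injectable for testing."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` from a notebook or recipe that uses the code environment returns an empty dictionary when every pinned package is installed at the expected version.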
Data Requirements#
The Dataiku Flow was initially built using publicly available data on French grocery store locations in the Burgundy region of France, together with fictional customer data. However, this project is meant to be used with your own data, which can be uploaded using the Dataiku Application. Your input data should meet the general data requirements and will be renamed to the following datasets:
Dataset | Description |
---|---|
locations_dataset | Meets the following data requirements: |
customers_dataset (optional) | Meets the following data requirements: |
Workflow Overview#
You can follow along with the sample project in the Dataiku gallery.
The project has the following high-level steps:
Input your data and select your analysis parameters via the Dataiku Application.
Ingest and pre-process the data to be compatible with geospatial analysis.
Compute requested isochrones per location using the selected API service.
(If customer data is provided) Identify and count the customers located within your distribution network isochrones.
Visualize the overlapping isochrones in your distribution network as well as the locations of customers (if applicable) using pre-built dashboards.
Interactively analyze your distribution spatial footprint using a pre-built WebApp.
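The customer-counting step above amounts to a point-in-polygon test repeated for every customer and every isochrone. The sketch below illustrates the idea in pure Python with a ray-casting test. It is not the Solution's own code: the Solution's code environment includes geopandas and Shapely, which would normally handle this kind of spatial predicate, and the data, identifiers, and function names here are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_customers(isochrones, customers):
    """Count the customers falling inside each isochrone ring."""
    return {
        iso_id: sum(point_in_polygon(lon, lat, ring) for lon, lat in customers)
        for iso_id, ring in isochrones.items()
    }


# Hypothetical data: one square "isochrone" and three customers.
isochrones = {"store_1_10min": [(0, 0), (4, 0), (4, 4), (0, 4)]}
customers = [(1, 1), (3, 2), (5, 5)]
counts = count_customers(isochrones, customers)
```

Here two of the three customers fall inside the square, so `counts` is `{"store_1_10min": 2}`. Real isochrones are irregular polygons, possibly with holes, which is one reason a dedicated geometry library is preferable in practice.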
Walkthrough#
Note
In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Plug and play with your own data and parameter choices#
To begin, you will need to create a new instance of the Distribution Spatial Footprint Dataiku Application. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.
Once the new instance has been created, you can walk through the steps of the Application to add your data and select the analysis parameters to be run.
In the Inputs section of the Application, upload your distribution network dataset and, optionally, customer dataset. Refer to the Data Requirements section above for the specific formatting requirements for your data. In this section, you will also need to specify the identifier column(s) (i.e. the name of the column(s) containing location data) and how your locations are defined (i.e. latitude/longitude or addresses). It is important that you select the location definition as this will impact which preprocessing steps are run in the Flow.
Once your data has been uploaded and the parameters entered, the data can be preprocessed by clicking the PREPROCESS button(s) in the Preprocessing section of the App.
Once preprocessing has finished, move on to the Isochrones section, where you will be asked to select an API service. Copy your API key and paste it into the corresponding field of the App. Here you can also select the mode of transportation on which to base your isochrones, the isochrones to compute (defined by travel time from a central location), and any other isochrone attributes of interest. Note that the available isochrone attributes may vary between API providers. Once everything looks good, click Build.
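To make the Isochrones parameters more concrete, the sketch below shows how an API key, a transportation mode, and a set of travel times could translate into a single isochrone request. It is an illustration only, under stated assumptions: the endpoint, `Authorization` header, and body fields follow the public openrouteservice v2 isochrones API as documented at the time of writing and should be checked against the current API reference; the function name and coordinates are hypothetical; and the Solution's own request logic is not shown in this document. The request is built but never sent.

```python
import json
from urllib import request

# Assumed openrouteservice v2 endpoint; verify against the provider's docs.
ORS_URL = "https://api.openrouteservice.org/v2/isochrones/{profile}"


def build_isochrone_request(api_key, lon, lat, minutes, profile="driving-car"):
    """Prepare (but do not send) an isochrone request for one location."""
    body = {
        # openrouteservice expects [longitude, latitude] order.
        "locations": [[lon, lat]],
        # Travel-time ranges are expressed in seconds.
        "range": [m * 60 for m in sorted(minutes)],
        "range_type": "time",
    }
    return request.Request(
        ORS_URL.format(profile=profile),
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Illustrative values only: a point in Dijon with 5, 10, and 15 minute drive times.
req = build_isochrone_request(
    "YOUR_API_KEY", lon=5.0415, lat=47.3220, minutes=[15, 5, 10]
)
# Sending it would be request.urlopen(req); the response is expected to be GeoJSON.
```

Each location row would produce one such request, with one polygon returned per requested travel time. Other providers such as HERE use different endpoints and parameter names, which is why the available isochrone attributes differ between services.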
If you uploaded a customer dataset, use the final Customers section of the App to select the specific isochrones in which to search for customers.
Once all elements of the Dataiku Application have been built, you can either continue to the Project View to explore the generated datasets or go straight to the dashboards and WebApp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip over the next section.
Under the Hood: What happens in the Dataiku Application’s underlying Flow?#
The Dataiku Application is built on top of a Dataiku Flow that has been optimized to accept input datasets and respond to your selected parameters. Let’s quickly walk through the different Flow zones to get an idea of how this was done.
Flow zone | Description |
---|---|
inputs_zone | Contains the uploaded dataset(s) for your distribution network (locations_dataset) and customer data (customers_dataset). |
Default | Includes elements of your data that are not needed in the preprocessing zone (i.e., data that is already geocoded). |
preprocessing_zone | Is dependent on the parameters you input to the Dataiku Application to define the type of your location identifiers. |
isochrones_zone | Includes 3 datasets of interest that are created by sending each row of the locations_prepared dataset to the selected isochrone API service to compute the requested isochrones. |
customers_zone | If you provided customer data to the Dataiku Application, this zone will: |
dashboards_zone | Is dedicated to the isolation of all the datasets needed to build visualizations for the two project dashboards (see below for more details). |
webapp_zone | Isolates the datasets used by the solution WebApp and can largely be ignored unless you wish to make changes to the WebApp. Editing these datasets will break the WebApp. |
Expanding analysis with an interactive WebApp#
Although this solution already contains high-value visualizations in the pre-built dashboards, the geospatial analytic capabilities are taken further by enabling you to conduct your own visual analysis using a pre-built WebApp for spatial analysis. The WebApp has several fields that can be used to change the real-time map visualization. Please note that, at this time, the WebApp does not save your previous searches and will return to its default empty state each time you reload it.
To begin, select the isochrones you would like to focus on from the full list of computed isochrones. The icon will change to reflect the transportation mode you previously selected in the Dataiku Application.
Within the Network Analysis section you can either individually add locations from your full list OR apply filters based on your location data (e.g. city, shop type, etc.).
The number of visualized points of sale, out of the total sample of points of sale, will update with the map. You can also increase or decrease the random sample size, but be aware that larger samples might cause the WebApp to slow down.
More information about a location can be displayed by clicking on the location pin.
When displaying locations based on filters, clicking on the location pin will also allow you to deselect a specific location (i.e. remove it from the map) or focus on a specific location (i.e. remove all other locations from the map). This will switch you to the From Location selection option. Switching back to From Filters will add all deselected locations back to the map.
Similarly, clicking on an area within an isochrone will display a card with the isochrone information.
The Comparative Network Analysis section is turned on or off by clicking the slider button. Here you can add locations for comparison using the same fields as in the Network Analysis section.
Doing so has many benefits, including identifying isochrone cannibalization between points of sale or identifying strategic distribution points to support all your points of sale.
The sample size can also be independently increased or decreased here.
Lastly, if customer data was provided, Customer Analysis can be turned on with the slider button. Customers cannot be added individually but can populate the map using filters based on customer information in your customer dataset.
Only customers contained in the isochrones of a location will display on the map, so a location must be selected.
Customer detail increases as you zoom in, and individual customer points can be clicked to display the full card of customer information.
The sample size can be independently increased or decreased, and the value you select is applied as a random sample of customers per location (e.g. a sample size of 100 when 2 locations are displayed will result in 200 customers being displayed on the map).
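The per-location sampling rule described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the WebApp's implementation: the function name, the data, and the choice to cap the sample at the number of available customers are all assumptions.

```python
import random


def sample_customers_per_location(customers_by_location, sample_size, seed=None):
    """Draw up to `sample_size` customers independently for each location."""
    rng = random.Random(seed)
    sampled = {}
    for location, customers in customers_by_location.items():
        # Assumption: never request more customers than a location has.
        k = min(sample_size, len(customers))
        sampled[location] = rng.sample(customers, k)
    return sampled


# Hypothetical data: two displayed locations with 500 customers each.
customers_by_location = {
    "store_A": [f"A{i}" for i in range(500)],
    "store_B": [f"B{i}" for i in range(500)],
}
sampled = sample_customers_per_location(
    customers_by_location, sample_size=100, seed=0
)
# 100 customers per location, so 200 points in total for two locations.
total_displayed = sum(len(v) for v in sampled.values())
```

Because the sample is drawn per location, the number of customer points on the map grows with the number of displayed locations, which is worth keeping in mind when the WebApp starts to slow down.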