Solution | Demand Forecast#

Overview#

Business Case#

Predicting how your business will behave in the future, whether over the short, medium, or long term, is hard. Yet it is critical for all companies to be able to forecast future trends reliably in order to answer a broad range of strategic questions. How can companies plan for those trends? By leveraging demand forecasting.

In this plug-and-play solution, a transactions dataset, product metadata, seasonal events, and point-of-sale information are used to train a model to forecast demand. Users can quickly model different forecasting scenarios, enrich their model with demand drivers, leverage seasonality effects, and pave the road to optimization use cases.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Demand Forecast.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+ instance.

  • To benefit natively from the solution, your data (see Data Requirements) should be stored in one of the following connections:

    • Snowflake

    • Google Cloud Platform: BigQuery + GCS (Both are required if you want to leverage BigQuery)

    • PostgreSQL

    • However, the solution comes with demo data available on the filesystem managed connection.

  • No code environment is needed to use this solution if your instance runs Python 3 as the built-in environment. If your instance has a Python 2 built-in environment, you should create a basic Python 3 code environment and update the project settings.

Data Requirements#

The Dataiku Flow was initially built using publicly available data. However, this project is meant to be used with your own data, which can be uploaded using the Dataiku Application. Below are the input datasets that the solution has been built with:

Mandatory Datasets

  • transactions

  • products

Optional Datasets

  • forecast_granularity

  • products_inventory

  • events_calendar

  • products_pricing_information

  • company_forecasts

Workflow Overview#

You can follow along with the sample project in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Connect your data as an input and select your analysis parameters via the Dataiku Application.

  2. Frame your demand forecast modeling problem (regular and/or cold start predictions).

  3. Ingest and pre-process the data to be available for demand forecasting.

  4. Identify seasonality and gather forecast features.

  5. Train a demand forecasting model and score data.

  6. Interactively explore the forecast model and the predicted forecast of products with a pre-built dashboard.

Walkthrough#

Note

In addition to reading this document, it is recommended to read the project wiki before beginning, to get a deeper technical understanding of how this solution was created and more detailed explanations of solution-specific vocabulary.

Plug and play with your own data and parameter choices#

To begin, you will need to create a new instance of the Demand Forecast Dataiku Application. This can be done by selecting the Dataiku Application from your instance home, and clicking Create App Instance.

Once the new instance has been created, you can walk through the steps of the Application to add your data and select the analysis parameters to be run. Users of any skill level can experiment with multiple approaches to quickly find the parameters that best fit an organization’s needs. You could also instantiate multiple Demand Forecast projects to more easily compare your feature engineering and modeling approaches.

Dataiku screenshot of part of the Dataiku Application for Demand Forecasting

Once you’ve built all elements of the Dataiku Application, you can either continue to the project view to explore the generated datasets or go straight to the dashboards and webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip over the next few sections.

Under the Hood: How do we extract the necessary data to train a forecast model?#

The Dataiku Application is built on top of a Dataiku Flow that has been optimized to accept input datasets and respond to your selected parameters. Let’s quickly walk through the different Flow zones to get an idea of how this was done.

  • inputs: Centralizes the ingestion of all data sources involved in the use case (mandatory and optional).

  • products_metadata_preprocessing: Employs a Prepare recipe to process the products/SKUs metadata so it can be used by the webapp and as input features for the forecast model.

  • products_sales_quality_audit: Assesses product sales regularity and sales volume quality so that we can choose which product sales data to exclude from model training.

  • forecast_preprocessing: Takes the data from the input transactions history dataset and generates multiple datasets that allow us to resample and frame the data in a way that is relevant to the demand forecast model.

  • seasons_preprocessing: Pushes the information contained in the editable seasons dataset to a dataset in your preferred connection (e.g. filesystem managed, Snowflake).

  • sales_resampling_&_preparation: Resamples and prepares the sales data so that it is in the correct format for time feature engineering and enrichment further down the Flow.

  • sales_windows: Computes time windows over the sales data so that, for each period of time, we can assess the min, max, average, standard deviation, count, and lag of sales in preceding periods and identify sales from previous years (a minimal sketch follows this list).
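
To make the sales_windows idea concrete, here is a minimal pandas sketch of these window features. The schema (product_id, date, sales) and the 4-period window are illustrative assumptions, not the solution’s actual implementation:

```python
# Illustrative pandas sketch of window features over past sales.
# Column names and the 4-period window size are assumptions.
import pandas as pd

df = pd.DataFrame({
    "product_id": ["A"] * 6 + ["B"] * 6,
    "date": list(pd.date_range("2023-01-01", periods=6, freq="W")) * 2,
    "sales": [10, 12, 9, 15, 14, 11, 3, 4, 2, 5, 6, 4],
}).sort_values(["product_id", "date"])

# Shift by one period so each row's features use strictly past sales only
past = df.groupby("product_id")["sales"].shift(1)
roll = past.groupby(df["product_id"]).rolling(4, min_periods=1)

df["sales_lag_1p"] = past                      # previous period's sales
df["sales_avg_4p"] = roll.mean().reset_index(level=0, drop=True)
df["sales_std_4p"] = roll.std().reset_index(level=0, drop=True)
df["sales_min_4p"] = roll.min().reset_index(level=0, drop=True)
df["sales_max_4p"] = roll.max().reset_index(level=0, drop=True)
df["sales_cnt_4p"] = roll.count().reset_index(level=0, drop=True)
print(df.head())
```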

Dataiku screenshot showing how we identify product/SKU seasonality with KMeans.

The following Flow zones are optional and will not be used if the related optional datasets and/or parameters in the Dataiku Application are not included.

  • forecast_granularity_preprocessing: Preprocesses the offline locations data and maps it to the forecast granularity defined in the Dataiku App. With this, we can enrich the historical sales data with location information, as geospatial features to be included in model training.

  • products_inventory_feature_engineering: Prepares inventory data so that it can be joined with all other demand forecast features.

  • products_seasonality: Assesses each product/SKU’s individual seasonality by aggregating the fraction of monthly or seasonal sales relative to yearly sales. A KMeans clustering model is then trained over these sales fractions to identify seasonal product/SKU clusters (see the first sketch after this list).

  • calendar_events_feature_engineering: Transforms your calendar event data into time features exploitable by the forecast model. These features encode how close each demand forecast period is to the events that surround it (see the second sketch after this list).

  • known_periods_products_pricing_imputation: Aggregates the products/SKUs pricing information over the past five periods, to be used later to fill in missing data.

  • unknown_period_products_pricing_imputation: Contains two branches that are relevant if price is a key component of the demand forecast. If we have known sales prices, we can impute the forecast period’s pricing information with that known data. If we only have the last known pricing information, we can use that instead, although it is the less robust option (see the third sketch after this list).
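
The seasonality clustering can be pictured with a small sketch: describe each product by the fraction of its yearly sales that falls in each month, then cluster those profiles with KMeans. The dataset and column names here are illustrative, not the solution’s schema:

```python
# Illustrative sketch of the seasonal clustering: each product is described
# by the fraction of its yearly sales in each month, then clustered.
import pandas as pd
from sklearn.cluster import KMeans

sales = pd.DataFrame({
    "product_id": ["A", "A", "B", "B", "C", "C"],
    "month":      [1,   7,   1,   7,   1,   7],
    "sales":      [90,  10,  12,  88,  85,  15],
})

# Pivot to one row per product: fraction of yearly sales per month
profile = sales.pivot_table(index="product_id", columns="month",
                            values="sales", aggfunc="sum", fill_value=0)
profile = profile.div(profile.sum(axis=1), axis=0)

# Cluster the seasonal profiles; the number of clusters is a free choice
profile["season_cluster"] = KMeans(n_clusters=2, n_init=10,
                                   random_state=0).fit_predict(profile)
print(profile)
```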
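
For the event features, one plausible encoding (an assumption, not necessarily the solution’s exact method) is the distance in days between each forecast period and the nearest past and upcoming events:

```python
# Illustrative sketch: day distances from each forecast period to the
# nearest past and upcoming calendar events, via pandas merge_asof.
import pandas as pd

periods = pd.DataFrame(
    {"period_start": pd.date_range("2023-11-01", periods=6, freq="W")})
events = pd.DataFrame({
    "event_date": pd.to_datetime(["2023-11-24", "2023-12-25"]),
    "event_name": ["Black Friday", "Christmas"],
}).sort_values("event_date")

# Nearest event at or before each period (backward), and at or after (forward)
prev_evt = pd.merge_asof(periods, events, left_on="period_start",
                         right_on="event_date", direction="backward")
next_evt = pd.merge_asof(periods, events, left_on="period_start",
                         right_on="event_date", direction="forward")

periods["days_since_event"] = (periods["period_start"] - prev_evt["event_date"]).dt.days
periods["days_to_event"] = (next_evt["event_date"] - periods["period_start"]).dt.days
print(periods)
```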
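
And the last-known-price branch of the imputation boils down to a per-product forward fill, sketched here with illustrative names:

```python
# Illustrative sketch of the fallback branch: forward-fill each product's
# last known price into periods where the price is unknown.
import pandas as pd

prices = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "B"],
    "period": list(pd.period_range("2023-10", periods=3, freq="M")) * 2,
    "price": [9.99, 10.49, None, 4.50, None, None],  # None = unknown period
}).sort_values(["product_id", "period"])

prices["price_imputed"] = prices.groupby("product_id")["price"].ffill()
print(prices)
```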

Once your data has been appropriately ingested, pre-processed, and engineered via the aforementioned Flow zones, we can begin to extract the features that will be used to train our model.

Under the Hood: Turning past data into future demand#

Several Flow zones are used to take the prepared data and train a demand forecast model.

  • forecast_features_gathering: Gathers all the features previously computed in the Flow and resamples the product/SKU sales dataset. A Prepare recipe then cleans the data into the format needed by the next Flow zone.

  • demand_forecast_split: Splits the data into several sets to train and evaluate the demand forecast models.

  • demand_forecast: Trains models to predict the demand of each product/SKU in a target period of time (set in the Dataiku App). The solution uses an XGBoost model, since that is what performed best on our data, but it is recommended to train other algorithms on your own data to find the one with the best performance (see the sketch after this list).

Dataiku screenshot of the trained Demand Forecast Model

  • demand_forecast_evaluation: Stores the results of the deployed models’ evaluations.

  • webapp_zone: Is not involved in training the forecast model, but maps all past sales information to the recently scored demand forecasts so that we can visualize the results of the Flow in the solution’s webapp.
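
Outside of Dataiku, the split-and-train steps look roughly like the following sketch: a chronological split (never shuffle time-ordered data), followed by an XGBoost regressor on the engineered features. The synthetic data, feature names, and hyperparameters are illustrative assumptions:

```python
# Illustrative sketch of the split-and-train steps on synthetic data:
# chronological holdout, then an XGBoost regressor on engineered features.
import numpy as np
import pandas as pd
import xgboost as xgb

rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "period": pd.date_range("2020-01-05", periods=n, freq="W"),
    "sales_avg_4p": rng.gamma(5.0, 10.0, n),
    "sales_lag_1p": rng.gamma(5.0, 10.0, n),
    "days_to_event": rng.integers(0, 60, n),
})
data["demand"] = (0.6 * data["sales_lag_1p"] + 0.3 * data["sales_avg_4p"]
                  + rng.normal(0, 5, n))

features = ["sales_avg_4p", "sales_lag_1p", "days_to_event"]
cutoff = data["period"].quantile(0.8)       # hold out the last 20% of periods
train, test = data[data["period"] <= cutoff], data[data["period"] > cutoff]

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(train[features], train["demand"])
preds = model.predict(test[features])
mape = (abs(preds - test["demand"]) / test["demand"].clip(lower=1)).mean()
print(f"MAPE on held-out periods: {mape:.2%}")
```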

Further explore your Demand Forecast with shareable visualizations#

The Demand Forecast solution comes with two prebuilt dashboards:

The Demand Forecast Dashboard includes the following tabs:

  • The Webapp | Forecast explorer tab provides a prebuilt webapp to enable assessment of our individual products’ forecasted demand at each forecast granularity. The webapp allows us to visualize multiple products together and compare the forecasted demand to past sales.

    Dataiku screenshot showing the interactive webapp for forecast exploration.
  • The Forecast models benchmark tab allows us to compare the application’s deployed models with other baseline models.

  • The Forecast model evaluation tab allows for a more global analysis of the demand forecast model by leveraging Dataiku’s subpopulation analysis capability to look at the model predictions based on categorical attributes.

  • The Forecast model interpretation tab uses a feature importance graph to present the most important variables driving product demand, and partial dependence plots to assess the relationship between input features and the model’s predictions.

  • Lastly, we can globally monitor our sales and quickly identify our most popular items via the Sales monitoring tab.

The Products Seasonality Dashboard includes the Seasonal clustering tab, which allows us to observe the results of the products/SKUs’ seasonal clustering and assess the distribution of the clustering features.

A short note on automation#

It is possible to automate the Flow of this solution to be triggered based on new data, a specific time, etc., via the Dataiku Application. All of these trigger parameters can be tuned in the Scenarios menu of the project. Additionally, reporters can be created to send messages to Teams, Slack, email, etc., to keep the full organization informed. These scenarios can also be run ad hoc as needed, as in the sketch below. Full detail on the scenarios and project automation can be found in the wiki.
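
As a sketch, an ad hoc run could also be triggered from outside the instance with Dataiku’s public Python API; the host, API key, project key, and scenario ID below are placeholders, not values from this solution:

```python
# Placeholder sketch: trigger a scenario ad hoc through Dataiku's public
# Python API. Host, API key, project key, and scenario ID are hypothetical.
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("DEMAND_FORECAST")       # hypothetical project key
scenario = project.get_scenario("REBUILD_AND_SCORE")  # hypothetical scenario id
scenario.run_and_wait()                               # raises if the run fails
```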

Going Beyond Demand Forecast#

Demand forecasting alone provides organizations with the ability to make highly impactful and critical decisions based on historical data, but its impact doesn’t end with this solution. Once your Demand Forecast solution has been appropriately tuned to your needs, its outputs can be used as an input to Dataiku’s Markdown Optimization solution for automated pricing and promotion insights.

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable marketing teams to have a plug-and-play solution built with Dataiku to forecast the demand of products over a period of time. By creating a single solution that can benefit and influence the decisions of a variety of teams in an organization, smarter and more holistic strategies can be designed to optimize sourcing and production planning, inventory management, pricing and marketing strategies, and much more.

We’ve provided several suggestions on how to use transaction data to forecast demand, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.