Solution | Demand Forecast#
Overview#
Business Case#
Predicting how your business will behave in the future, whether in the short, medium, or long term, is hard. Yet every company needs to forecast future trends reliably in order to answer a broad range of strategic questions. Answering those questions means planning for future trends. How? By leveraging Demand Forecast.
In this plug-and-play solution, a transactions dataset, product metadata, seasonal events, and point-of-sale information are used to train a model to forecast demand. Users can quickly model different forecasting scenarios, enrich their model with demand drivers, leverage seasonality effects, and pave the road to optimization use cases.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution with the following instructions:
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Demand Forecast.
Click Install, changing the project folder into which the solution will be installed if needed.
From the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Demand Forecast.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 13.2+* instance.
To benefit natively from the solution, your data (see Data Requirements) should be stored in one of the following connections:
Snowflake
Google Cloud Platform: BigQuery + GCS (Both are required if you want to leverage BigQuery)
PostgreSQL
However, the solution comes with demo data available on the filesystem managed connection.
A Python 3.8 code environment named solution_demand-forecast with the following required packages (please check that the version of pandas is <2):
MarkupSafe<2.2.0
Jinja2>=2.11,<3.2
cloudpickle>=1.3,<1.6
flask>=1.0,<2.3
itsdangerous<2.1.0
lightgbm>=3.2,<3.3
scikit-learn>=1.0,<1.4
scikit-optimize>=0.7,<=0.10.2
scipy>=1.5,<1.11
statsmodels>=0.12.2,<0.15
Werkzeug<3.1
xgboost>=1.5.1,<2
gluonts>=0.8.1,<0.14
pmdarima>=1.2.1,<2.1
mxnet>=1.8.0.post0,<1.10
prophet>=1.1.1,<1.2
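The pandas constraint mentioned above can be checked from inside the code environment. The following is a minimal sketch using only the Python standard library; the helper names `major_version` and `pandas_is_below_2` are illustrative and are not part of the Solution.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.5.3'."""
    return int(version.split(".")[0])


def pandas_is_below_2() -> Optional[bool]:
    """True if the installed pandas is <2, False if not, None if pandas is absent."""
    try:
        installed = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None
    return major_version(installed) < 2


if __name__ == "__main__":
    print("pandas < 2:", pandas_is_below_2())
```

Running this in a notebook or Python recipe that uses the code environment should indicate whether the installed pandas satisfies the constraint.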
Data Requirements#
The Dataiku Flow was initially built using publicly available data. However, this project is meant to be used with your own data, which can be uploaded using the Dataiku Application. The solution was built with the following input datasets:
Mandatory Datasets
transactions
products
Optional Datasets
forecast_granularity
products_inventory
events_calendar
products_pricing_information
company_forecasts
Revenue forecasting: Which Solution is right for me?#
Revenue forecasting can be approached in two main ways:
Bottom-Up Approach: This method predicts the quantity of “product” sold at a granular level (e.g., units per product or category) and multiplies by price to estimate revenue. Forecasts are based on prior sales data and drivers at the same or higher granularity, such as weather, holidays, or customer/product attributes. If this describes your use case, then this Demand Forecast Solution can be used.
Top-Down Approach: This method forecasts total revenue directly for a business unit, without using unit sales * price as an intermediary. Forecasts are based on historical revenue figures and numerical drivers relevant to revenue streams. If this describes your usage then the Financial Forecasting Solution should be used instead.
Each approach lends itself more readily to different industries and expectations. Top-down approaches are most often seen when firms are selling bundles of products, or generating streams of revenue, such that counting individual products with consistent specific prices is not well matched to their business reality.
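The bottom-up arithmetic described above (forecast units multiplied by price, summed over products) can be sketched as follows. This is an illustration only: the function name, product identifiers, and numbers are made up and do not come from the Solution.

```python
from typing import Dict


def bottom_up_revenue(forecast_units: Dict[str, float], unit_prices: Dict[str, float]) -> float:
    """Sum forecast units x unit price over every product in the forecast."""
    missing = set(forecast_units) - set(unit_prices)
    if missing:
        raise KeyError(f"No price for products: {sorted(missing)}")
    return sum(units * unit_prices[sku] for sku, units in forecast_units.items())


# Made-up numbers for illustration only.
forecast_units = {"sku_a": 120, "sku_b": 40}
unit_prices = {"sku_a": 2.5, "sku_b": 10.0}
revenue = bottom_up_revenue(forecast_units, unit_prices)  # 120 * 2.5 + 40 * 10.0
```

A top-down forecast, by contrast, would model the revenue series directly and never pass through per-product units.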
Workflow Overview#
You can follow along with the sample project in the Dataiku gallery.
The project has the following high-level steps:
Connect your data as an input and select your analysis parameters via the Dataiku Application.
Frame your demand forecast modeling problem (regular and/or cold start predictions).
Ingest and pre-process the data to be available for demand forecasting.
Identify seasonality and gather forecast features.
Train a demand forecasting model and score data.
Interactively explore the forecast model and the predicted forecast of products with a pre-built dashboard.
Walkthrough#
Note
In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Plug and play with your own data and parameter choices#
To begin, you will need to create a new instance of the Demand Forecast Dataiku Application. This can be done by selecting the Dataiku Application from your instance home, and clicking Create App Instance.
Once the new instance has been created, you can walk through the steps of the Application to add your data and select the analysis parameters to be run. Users of any skill level can experiment with multiple approaches to quickly find the parameters that best fit an organization’s needs. You could also instantiate multiple Demand Forecast projects to more easily compare your feature engineering and modeling approaches.
Once all elements of the Dataiku Application have been built, you can either continue to the Project View to explore the generated datasets or go straight to the dashboards and webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip over the next few sections.
Under the Hood: How do we extract the necessary data to train a forecast model?#
The Dataiku Application is built on top of a Dataiku Flow that has been optimized to accept input datasets and respond to your selected parameters. Let’s quickly walk through the different Flow zones to get an idea of how this was done.
| Flow zone | Description |
|---|---|
| inputs | Centralizes the ingestion of all data sources involved in the use case (mandatory and optional). |
| products_metadata_preprocessing | Employs a Prepare recipe to process the products/SKUs metadata information to be used by the webapp and as input features for the forecast model. |
| products_sales_quality_audit | Assesses products sales regularity and sales volume quality so that we can choose which product sales data can be rejected from model training. |
| forecast_preprocessing | Takes the data from our input transactions history dataset and generates multiple datasets which will allow us to resample and frame the data in a way that is relevant to the demand forecast model. |
| seasons_preprocessing | Pushes information contained in the editable seasons dataset to a dataset in your preferred connection (e.g. filesystem managed, Snowflake). |
| sales_resampling_&_preparation | Resamples and prepares our sales data so that they are in the correct format for time feature engineering and enrichment further down the Flow. |
| sales_windows | Computes time windows over our sales information so that, for each period of time, we can assess the min, max, average, standard deviation, count, and lag of sales in preceding time periods and identify sales of previous years. |
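To make the sales_windows idea concrete, here is a minimal sketch of window statistics computed only over preceding periods. It is not the Solution's implementation, which is built from Flow recipes; the function name, the window length, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def window_features(sales: List[float], window: int, lag: int = 1) -> List[Dict[str, Optional[float]]]:
    """For each period, summarize the `window` periods that precede it.

    Only past periods are used, so no feature leaks the value being predicted.
    """
    rows = []
    for t in range(len(sales)):
        past = sales[max(0, t - window):t]  # strictly before period t
        rows.append({
            "lag": sales[t - lag] if t - lag >= 0 else None,
            "count": len(past),
            "min": min(past) if past else None,
            "max": max(past) if past else None,
            "avg": mean(past) if past else None,
            # Population standard deviation is an assumption made for this sketch.
            "std": pstdev(past) if past else None,
        })
    return rows


weekly_sales = [10, 12, 8, 15, 11]  # made-up units sold per week
features = window_features(weekly_sales, window=3)
```

With weekly data, calling the same function with `lag=52` would be one way to look up the sales of the same week in the previous year.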
The following Flow zones are optional and will not be used if the related optional datasets and/or parameters in the Dataiku Application are not included.
| Flow zone | Description |
|---|---|
| forecast_granularity_preprocessing | Preprocesses the offline locations data and maps it to the forecast granularity defined in the Dataiku App. With this, we can enrich the sales historical data with our locations information, as geospatial features to be included in the model training. |
| products_inventory_feature_engineering | Prepares inventory data so that it can be joined with all other demand forecast features. |
| products_seasonality | Assesses each product/SKU’s individual seasonality by aggregating the fraction of our monthly or seasonal sales compared to our yearly sales. Then a KMeans clustering model is trained over these sales fractions to identify season product/SKU clusters. |
| calendar_events_feature_engineering | Transforms your calendar event data into time features exploitable by the forecast model. These features will transcribe the closeness of each *Demand Forecast* period to all the events that surround it. |
| known_periods_products_pricing_imputation | Aggregates the products/SKUs pricing information over the past 5 periods to be later used to fill missing data. |
| unknown_period_products_pricing_imputation | Contains two branches that are relevant if price is a key component of our demand forecast. If we have known sales prices we can impute the forecast period pricing information with our known data. If we only have our last known pricing information we can use that to impute but it is the less robust option. |
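The sales fractions used by products_seasonality can be illustrated with a short sketch. It computes, for each product, the share of total sales that falls in each calendar month. The function name, row layout, and sample rows are assumptions made for this example, not the Solution's own code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_sales_share(rows: Iterable[Tuple[str, int, float]]) -> Dict[str, List[float]]:
    """Return, per product, 12 values giving each month's share of total sales.

    `rows` yields (product_id, month, units) with month in 1..12.
    """
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0] * 12)
    for product_id, month, units in rows:
        totals[product_id][month - 1] += units
    shares = {}
    for product_id, by_month in totals.items():
        year_total = sum(by_month)
        # Products with no sales get a flat zero profile rather than a division error.
        shares[product_id] = [m / year_total if year_total else 0.0 for m in by_month]
    return shares


# Made-up rows: a summer-heavy product and an end-of-year product.
rows = [("sunscreen", 6, 30), ("sunscreen", 7, 50), ("sunscreen", 8, 20),
        ("gift_wrap", 11, 25), ("gift_wrap", 12, 75)]
shares = monthly_sales_share(rows)
```

Each product's 12-value profile is the kind of vector a clustering algorithm such as KMeans can group, so that products with similar seasonal shapes end up in the same cluster.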
Once your data has been appropriately ingested, pre-processed, and engineered via the aforementioned Flow zones, we can begin to extract the features that will be used to train our model.
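As one example of such a feature, the closeness of a forecast period to calendar events can be expressed as a signed number of days. The sketch below is a hypothetical illustration of that idea, not the feature definition used by calendar_events_feature_engineering; the function name and the dates are invented.

```python
from datetime import date
from typing import Iterable, Optional


def days_to_nearest_event(period_start: date, event_dates: Iterable[date]) -> Optional[int]:
    """Signed day gap to the closest event: negative if the event is in the past."""
    gaps = [(event - period_start).days for event in event_dates]
    if not gaps:
        return None
    return min(gaps, key=abs)


events = [date(2024, 11, 29), date(2024, 12, 25)]  # made-up event calendar
gap = days_to_nearest_event(date(2024, 12, 20), events)
```

Keeping the sign lets a model distinguish the run-up to an event from the days that follow it.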
Under the Hood: Turning past data into future demand#
Several Flow zones are used to take the prepared data and train a demand forecast model.
| Flow zone | Description |
|---|---|
| forecast_features_gathering | Gathers all the features that have been previously computed in the Flow and resamples the product/SKU’s sales dataset. We then apply a Prepare recipe to clean up our data in the format needed to pass it along to the next Flow zone. |
| demand_forecast_split | Splits the data into several sets to train and evaluate the Demand Forecast models. |
| demand_forecast | Trains models to predict the demand of each product/SKU in a target period of time (set in the Dataiku App). The solution uses an XGBoost model since that is what performed the best on our data, but it is recommended to train other algorithms on your own data to find the one with the best performance. |
| demand_forecast_evaluation | Stores the results of the deployed models’ evaluations. |
| webapp_zone | Is not involved in the training of the forecast model but does map all past sales information with the recently scored forecast demands so that we can visualize the results of our Flow in the solution’s webapp. |
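The split and evaluation steps can be illustrated with a generic sketch. How the Solution actually splits its data and which metrics it stores are not detailed here, so the time-ordered holdout and the mean absolute error below are assumptions chosen because they are common in forecasting. All names and values are invented.

```python
from typing import List, Sequence, Tuple


def time_ordered_split(periods: Sequence[str], n_test_periods: int) -> Tuple[List[str], List[str]]:
    """Hold out the most recent `n_test_periods` distinct periods for evaluation."""
    ordered = sorted(set(periods))  # ISO dates sort chronologically as strings
    if not 0 < n_test_periods < len(ordered):
        raise ValueError("n_test_periods must leave at least one training period")
    return ordered[:-n_test_periods], ordered[-n_test_periods:]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between actual and forecast demand."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


weeks = ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"]
train_weeks, test_weeks = time_ordered_split(weeks, n_test_periods=1)
error = mean_absolute_error([10, 20], [12, 17])
```

Splitting by time rather than at random keeps future periods out of the training set, which mirrors how a forecast is used in practice.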
A short note on automation#
The Flow of this solution can be automated via the Dataiku Application so that it is triggered by new data, at a specific time, etc. All of these trigger parameters can be tuned in the Scenarios menu of the project. Additionally, reporters can be created to send messages to Teams, Slack, email, etc. to keep the full organization informed. These scenarios can also be run ad hoc as needed. Full details on the scenarios and project automation can be found in the wiki.
Going Beyond Demand Forecast#
Demand Forecast alone gives organizations the ability to make highly impactful and critical decisions based on historical data, but the impact of Demand Forecast doesn’t end with this solution. Once your Demand Forecast solution has been appropriately tuned to your needs, its outputs can be used as an input to Dataiku’s Markdown Optimization solution for automated pricing and promotion insights.
Reproducing these Processes With Minimal Effort For Your Own Data#
The intent of this project is to enable marketing teams to have a plug-and-play solution built with Dataiku to forecast the demand of products over a period of time. By creating a singular solution that can benefit and influence the decisions of a variety of teams in a single organization, smarter and more holistic strategies can be designed in order to optimize sourcing and production planning, inventory management, pricing, and marketing strategies, and much more.
We’ve provided several suggestions on how to use transaction data to forecast demand, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.