Solution | Demand Forecast#
Overview#
Business case#
Predicting how your business will behave in the future, whether over the short, medium, or long term, is hard. Yet it's critical for every company to be able to forecast future trends reliably in order to answer a broad range of strategic questions. Demand forecasting gives companies the ability to plan for those trends.
In this plug-and-play Solution, you use transactions data, product metadata, seasonal events, and point of sale information to train a model to forecast demand. Users can quickly model different forecasting scenarios, enrich their model with demand drivers, leverage seasonality effects, and pave the road to optimization use cases.
Installation#
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Demand Forecast.
If needed, change the folder into which the Solution will be installed, and click Install.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
If you don't see + Dataiku Solutions on the homepage, click + New Project instead.
Select Dataiku Solutions.
Search for and select Demand Forecast.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical requirements#
To leverage this Solution, you must meet the following requirements:
Have access to a Dataiku 13.2+ instance.
To benefit natively from the Solution, you should store your data (see Data requirements) in one of the following connections:
Snowflake
Google Cloud Platform: BigQuery + GCS (Both are required if you want to leverage BigQuery)
PostgreSQL
However, the Solution comes with demo data available on the filesystem managed connection.
A Python 3.8 code environment named solution_demand-forecast with the following required packages (please check that the version of pandas is <2):
MarkupSafe<2.2.0
Jinja2>=2.11,<3.2
cloudpickle>=1.3,<1.6
flask>=1.0,<2.3
itsdangerous<2.1.0
lightgbm>=3.2,<3.3
scikit-learn>=1.0,<1.4
scikit-optimize>=0.7,<=0.10.2
scipy>=1.5,<1.11
statsmodels>=0.12.2,<0.15
Werkzeug<3.1
xgboost>=1.5.1,<2
gluonts>=0.8.1,<0.14
pmdarima>=1.2.1,<2.1
mxnet>=1.8.0.post0,<1.10
prophet>=1.1.1,<1.2
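The pandas constraint is easy to overlook when an environment is rebuilt. As a quick sanity check, you could run a snippet like the following from a notebook or recipe that uses the solution_demand-forecast environment (a minimal sketch, not part of the Solution itself):

```python
import pandas

# The Solution's package pins expect pandas 1.x; fail fast if the
# environment resolved to pandas 2 or later.
major = int(pandas.__version__.split(".")[0])
assert major < 2, (
    f"pandas {pandas.__version__} is too recent; this Solution expects pandas < 2"
)
print(f"pandas {pandas.__version__} OK")
```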
Data requirements#
The Dataiku Flow was initially built using publicly available data. However, we intend for you to use this project with your own data, which you can upload using the Dataiku application. Below are the input datasets that the Solution has been built with:
Mandatory Datasets
transactions
products
Optional Datasets
forecast_granularity
products_inventory
events_calendar
products_pricing_information
company_forecasts
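For orientation only, here is a hypothetical minimal shape for the two mandatory datasets. The column names below are illustrative assumptions, not the Solution's actual schema; see the project wiki for the exact expected columns.

```python
import pandas as pd

# Illustrative only: hypothetical minimal columns for the mandatory inputs.
transactions = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03"]),
    "product_id": ["SKU-001", "SKU-002", "SKU-001"],
    "quantity": [3, 1, 5],
})

products = pd.DataFrame({
    "product_id": ["SKU-001", "SKU-002"],
    "category": ["beverages", "snacks"],
})
```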
Revenue forecasting: The right Solution for me#
You can approach revenue forecasting in two main ways:
Bottom-Up Approach: This method predicts the quantity of “product” sold at a granular level (for example, units per product or category) and multiplies by price to estimate revenue. Forecasts are based on prior sales data and drivers at the same or higher granularity, such as weather, holidays, or customer/product attributes. If this describes your use case, then you can use this Demand Forecast Solution.
Top-Down Approach: This method forecasts total revenue directly for a business unit, without using unit sales * price as an intermediary. Forecasts are based on historical revenue figures and numerical drivers relevant to the revenue streams. If this describes your use case, you should use Solution | Financial Forecasting instead.
Each approach lends itself more readily to different industries and expectations. Top-down approaches are most often seen when firms are selling bundles of products, or generating streams of revenue, such that counting individual products with consistent specific prices isn’t well matched to their business reality.
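To make the distinction concrete, here is a minimal sketch of the bottom-up arithmetic this Solution implements: forecast units at product granularity, then multiply by price and aggregate to estimate revenue. The numbers and column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-product unit forecasts for one future period.
forecast = pd.DataFrame({
    "product_id": ["SKU-001", "SKU-002", "SKU-003"],
    "forecast_units": [120, 45, 300],
    "unit_price": [9.99, 24.50, 2.75],
})

# Bottom-up: revenue is derived from unit demand * price, then aggregated.
forecast["forecast_revenue"] = forecast["forecast_units"] * forecast["unit_price"]
total_revenue = forecast["forecast_revenue"].sum()
print(f"Bottom-up revenue estimate: {total_revenue:.2f}")

# A top-down approach would instead model total revenue directly from
# historical revenue series (see Solution | Financial Forecasting).
```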
Workflow overview#
You can follow along with the sample project in the Dataiku gallery.

The project has the following high-level steps:
Connect your data as an input and select your analysis parameters via the Dataiku application.
Frame your demand forecast modeling problem (regular and/or cold start predictions).
Ingest and pre-process the data to be available for demand forecasting.
Identify seasonality and gather forecast features.
Train a demand forecasting model and score data.
Interactively explore the forecast model and the predicted forecast of products with a pre-built dashboard.
Walkthrough#
Note
In addition to reading this document, it’s recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Plug and play with your own data and parameter choices#
To begin, you will need to create a new instance of the Demand Forecast Dataiku application. You can do this by selecting the Dataiku application from your instance home, and clicking Create App Instance.
Once you have created a new instance, you can walk through the steps of the application to add your data, and select the analysis parameters to run. Users of any skill level can experiment with multiple approaches to quickly find the parameters that best fit an organization’s needs. You could also instantiate multiple Demand Forecast projects to compare your feature engineering and modeling approaches.

Once you've built all elements of the Dataiku application, you can either continue to the Project View to explore the generated datasets or go straight to the dashboards and webapp to visualize the data. If you're mainly interested in the visual components of this pre-packaged Solution, feel free to skip over the next few sections.
Under the hood: Extracting the necessary data to train a forecast model#
The Dataiku application is built on top of a Dataiku Flow that has been optimized to accept input datasets and respond to your selected parameters. Let's quickly walk through the different Flow zones to get an idea of how this was done.
Flow zone | Description
---|---
inputs | Centralizes the ingestion of all data sources involved in the use case (mandatory and optional).
products_metadata_preprocessing | Employs a Prepare recipe to process the products/SKUs metadata information to be used by the webapp and as input features for the forecast model.
products_sales_quality_audit | Assesses product sales regularity and sales volume quality so that you can choose which products' sales data to reject from model training.
forecast_preprocessing | Takes the data from the input transactions history dataset and generates multiple datasets that let you resample and frame the data in a way that's relevant to the demand forecast model.
seasons_preprocessing | Pushes information contained in the editable seasons dataset to a dataset in your preferred connection (for example, filesystem managed or Snowflake).
sales_resampling_&_preparation | Resamples and prepares the sales data so that it's in the correct format for time feature engineering and enrichment further down the Flow.
sales_windows | Computes time windows over the sales information so that, for each period of time, you can assess the min, max, average, standard deviation, count, and lag of sales in preceding time periods and identify sales of previous years. See the sketch after this table.
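As an illustration of what the sales_windows zone computes, here is a minimal pandas sketch of window aggregates and lag features over a weekly sales series. The data and column names are assumptions; the actual Flow computes these with visual recipes per product/SKU.

```python
import pandas as pd

# Hypothetical weekly sales for one product.
sales = pd.DataFrame({
    "week": pd.date_range("2023-01-02", periods=8, freq="W-MON"),
    "units": [12, 15, 9, 20, 18, 22, 17, 25],
})

# Window statistics over the 4 preceding periods, shifted so each row
# only sees past information (no leakage from the period being forecast).
window = sales["units"].shift(1).rolling(window=4, min_periods=1)
sales["units_min_4w"] = window.min()
sales["units_max_4w"] = window.max()
sales["units_avg_4w"] = window.mean()
sales["units_std_4w"] = window.std()

# Simple lag features for the preceding periods.
sales["units_lag_1"] = sales["units"].shift(1)
sales["units_lag_2"] = sales["units"].shift(2)

print(sales)
```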

The following Flow zones are optional. They're used only if the related optional datasets and/or parameters are provided in the Dataiku application.
Flow zone | Description
---|---
forecast_granularity_preprocessing | Preprocesses the offline locations data and maps it to the forecast granularity defined in the Dataiku App. With this, you can enrich the historical sales data with the locations information, as geospatial features to be included in model training.
products_inventory_feature_engineering | Prepares inventory data so that it can be joined with all other demand forecast features.
products_seasonality | Assesses each product/SKU's individual seasonality by aggregating the fraction of monthly or seasonal sales relative to yearly sales. A KMeans clustering model is then trained on these sales fractions to identify seasonal product/SKU clusters. See the sketch after this table.
calendar_events_feature_engineering | Transforms your calendar event data into time features exploitable by the forecast model. These features transcribe the closeness of each demand forecast period to all the events that surround it.
known_periods_products_pricing_imputation | Aggregates the products/SKUs pricing information over the past 5 periods to be used later to fill missing data.
unknown_period_products_pricing_imputation | Contains two branches that are relevant if price is a key component of the demand forecast. If you have known sales prices, you can impute the forecast period's pricing information with known data. If you only have the last known pricing information, you can use that for imputation, but it's the less robust option.
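To illustrate the products_seasonality idea, here is a minimal sketch that clusters products by their monthly sales fractions with scikit-learn's KMeans (which is pinned in the Solution's code environment). The data is hypothetical; the real zone works on the Flow's aggregated sales.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical rows: one product each; columns: fraction of yearly sales
# per month (each row sums to 1). Two products peak in summer, two in winter.
monthly_fractions = np.array([
    [0.02, 0.02, 0.05, 0.08, 0.12, 0.18, 0.20, 0.15, 0.08, 0.05, 0.03, 0.02],
    [0.03, 0.03, 0.06, 0.09, 0.11, 0.17, 0.19, 0.14, 0.08, 0.05, 0.03, 0.02],
    [0.18, 0.16, 0.10, 0.05, 0.03, 0.02, 0.02, 0.03, 0.05, 0.09, 0.12, 0.15],
    [0.17, 0.15, 0.11, 0.06, 0.03, 0.02, 0.02, 0.02, 0.06, 0.09, 0.12, 0.15],
])

# Cluster products with similar seasonal profiles together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(monthly_fractions)
print(labels)  # e.g. [0 0 1 1]: one summer cluster, one winter cluster
```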
Once your data has been appropriately ingested, pre-processed, and engineered via the aforementioned Flow zones, you can begin to extract the features that will be used to train the model.
Under the hood: Turning past data into future demand#
Several Flow zones take the prepared data and train a demand forecast model.
Flow zone | Description
---|---
forecast_features_gathering | Gathers all the features that have been previously computed in the Flow and resamples the product/SKU sales dataset. A Prepare recipe then cleans the data into the format needed to pass it along to the next Flow zone.
demand_forecast_split | Splits the data into several sets to train and evaluate the demand forecast models.
demand_forecast | Trains models to predict the demand of each product/SKU in a target period of time (set in the Dataiku App). The Solution uses an XGBoost model since that's what performed best on our data, but it's recommended to train other algorithms on your own data to find the one with the best performance. See the sketch after this table.
demand_forecast_evaluation | Stores the results of the deployed models' evaluations.
webapp_zone | Isn't involved in the training of the forecast model, but maps all past sales information with the recently scored forecast demands so that you can visualize the results of the Flow in the Solution's webapp.
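For intuition on the modeling step, here is a minimal XGBoost regression sketch on synthetic data. The feature names and data are hypothetical; the actual model is trained in Dataiku's visual ML on the features engineered above, with xgboost available in the Solution's code environment.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)

# Hypothetical features: e.g. lagged sales, 4-week average, days to next event.
X = rng.normal(size=(500, 3))
# Hypothetical target: units sold in the forecast period.
y = 50 + 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=2, size=500)

# Split by position, mimicking a temporal split: never evaluate a
# forecaster on periods seen during training.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

preds = model.predict(X_test)
mae = np.mean(np.abs(preds - y_test))
print(f"Holdout MAE: {mae:.2f} units")
```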
A short note on automation#
It’s possible to automate the Flow of this Solution based on new data, a specific time, etc. via the Project Setup. You can tune all trigger parameters in the Scenarios menu of the project.
Additionally, you can create reporters to send messages to Teams, Slack, email, etc. to keep your full organization informed. You can also run these scenarios ad-hoc as needed. You can find full details on the scenarios and project automation in the wiki.
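If you prefer to trigger those runs programmatically, Dataiku's Python API can start a scenario ad hoc. A minimal sketch follows; "build_forecast" is a hypothetical scenario ID, so substitute one listed in the project's Scenarios menu.

```python
import dataiku

# Connect to the current instance and fetch the project hosting the Solution.
client = dataiku.api_client()
project = client.get_default_project()  # or client.get_project("YOUR_PROJECT_KEY")

# "build_forecast" is a hypothetical scenario ID; use the IDs you see in
# the project's Scenarios menu.
scenario = project.get_scenario("build_forecast")
scenario.run_and_wait()  # fires the scenario and blocks until it finishes
```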
Going beyond demand forecasting#
Demand forecasting alone gives organizations the ability to make highly impactful, critical decisions based on historical data, but its value doesn't end with this Solution. Once your Demand Forecast Solution has been appropriately tuned to your needs, its outputs can be used as an input to Dataiku's Markdown Optimization Solution for automated pricing and promotion insights.
Reproducing these processes with minimal effort for your data#
The intent of this project is to enable marketing teams to have a plug-and-play Solution built with Dataiku to forecast the demand of products over a period of time.
By creating a singular Solution that can benefit and influence the decisions of a variety of teams in a single organization, you can design smarter and more holistic strategies to optimize sourcing and production planning, inventory management, pricing, and marketing strategies, and much more.
This documentation has provided several suggestions on how to derive value from this Solution. Ultimately, however, the "best" approach will depend on your specific needs and data. If you're interested in adapting this project to the specific goals and needs of your organization, Dataiku offers roll-out and customization services on demand.