Solution | Financial Forecasting


Business Case

The financial forecasting processes managed by finance teams play a central role in helping companies make appropriate cost management and investment decisions. Yet 40 percent of CFOs feel their forecasts are not accurate and that the process takes too much time. Forecasts that are more precise and less costly to produce are therefore of immediate value.

However, connecting to the data and tapping into the different techniques needed to achieve them can feel out of reach. This can be the result of too little time available to set up a new forecasting project or a lack of confidence in the statistical and machine learning techniques involved.

Enhancing the efficiency of financial forecasting requires:

  • Improving the capacity for finance teams to quickly access data and automate data pipelines. Rather than relying on manual checks and merges via spreadsheets, teams can streamline their processes, save time and reduce errors, allowing them to focus more on analysis and decision making.

  • Making it easier to compare traditional and advanced statistical or machine learning forecasting techniques, with simple tests and selection of appropriate drivers, so that teams develop more accurate projections while retaining full ownership and explainability.

Dataiku’s Financial Forecasting Solution offers finance teams an opportunity to confidently make a transformative shift in their business impact while retaining full control of process and outputs.

Technical Requirements

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 11.0+ instance.

  • Have a Python 3.7+ code environment named solution_financial-forecasting with the following required package:

pmdarima == 2.0.2


When creating a new code environment, please be sure to use the name solution_financial-forecasting or remapping will be required.


If the technical requirements are met, this solution can be installed in one of two ways:

  • On your Dataiku instance click + New Project > Dataiku Solutions > Search for Financial Forecasting.

  • Download the .zip project file and upload it directly to your Dataiku instance as a new project.

Data Requirements

The input data should be separated into two distinct groups of datasets: historical_* and to_forecast_* datasets with the same time-frequency:

3 historical datasets: historical time series data about the financial variable to forecast, manual forecasts, and drivers.

  • historical_actual_value_dataset

  • historical_forecasts_dataset

  • historical_drivers_dataset

2 to_forecast datasets: time series data about the period we want to forecast, including drivers’ expected values and manual forecasts.

  • to_forecast_forecasts_dataset

  • to_forecast_drivers_dataset

These datasets include the following features:

  • date: sequence of dates taken at successive equally spaced points in time

  • actual_value: historical value of the quantitative variable the user wants to forecast

  • category: subcategories into which the target value is split

  • manual_forecast: forecast figures computed manually by the user

  • (optional) driver_n_name: company specific or macroeconomic information chosen by the user
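Before loading data into the application, it can help to check that it follows the data model above. The following is a minimal sketch in plain Python, not part of the Solution itself. The mapping of columns to datasets in REQUIRED_COLUMNS is an assumption made for illustration (the project wiki is the authoritative reference), as is the monthly frequency used by is_regular_monthly.

```python
from datetime import date

# Assumed column layout per dataset (hypothetical; check the project wiki).
REQUIRED_COLUMNS = {
    "historical_actual_value_dataset": {"date", "category", "actual_value"},
    "historical_forecasts_dataset": {"date", "category", "manual_forecast"},
    "to_forecast_forecasts_dataset": {"date", "category", "manual_forecast"},
}


def missing_columns(dataset_name, columns):
    """Return the required columns absent from a dataset, sorted by name."""
    return sorted(REQUIRED_COLUMNS[dataset_name] - set(columns))


def is_regular_monthly(dates):
    """True if the dates advance by exactly one calendar month at each step.

    Monthly frequency is an assumption; adapt the step for other frequencies.
    """
    months = [d.year * 12 + d.month for d in sorted(dates)]
    return all(b - a == 1 for a, b in zip(months, months[1:]))
```

A dataset that passes both checks has the expected columns and no gaps in its date sequence, which is what "equally spaced points in time" requires.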

Workflow Overview

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Configure the project with data and adjustments via the Dataiku Application.

  2. Clean and prepare financial data.

  3. Explore historical data to identify trends and seasonality.

  4. Create two distinct forecast options and cross-evaluate performance.

  5. Visualize financial forecasts to compare approaches and analyze the relationship between drivers and value.



In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this solution was created, the different types of data enrichment available, longer explanations of solution-specific vocabulary, and suggested future direction for the solution.

Tailor the Project to Our Own Needs

To begin, you will need to create a new instance of the Financial Forecasting Application. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance. The project is delivered with sample data that should be replaced with your own data, assuming that it adopts the data model described above. This can be done in one of two ways:

  1. Data can be uploaded directly from our filesystem in the first section of the Dataiku app.

  2. Data can be connected to your database of choice by selecting an existing connection.

With either option, after loading the data, be sure to refresh the page so that the app can dynamically take your data into account.

With our data selected and loaded into the Flow, we can move to the final App section, Forecast financial data. Here we can select the number of lag values and the drivers from our historical data to include in the advanced forecasting model. Clicking Run will initiate a series of scenarios that rebuild the entire Flow and update the dashboard. If you’re only interested in the dashboard, you can skip the following sections where we’ll dive into the underlying Flow that supports the Dataiku App.

Dataiku screenshot of the accompanying Dataiku Application for this solution

Cleaning and Preparing our Historical Financial Data

In total, ten Flow zones are involved in data preparation and cleaning for this Solution. We won’t go into heavy detail about each Flow zone as this information can be found in the wiki of the project. However, at a high level, the first seven Flow zones found towards the start of our Flow perform data normalization and lag values creation from our originating input datasets.

To deal with potential differences in the magnitude of values within each provided category, the Solution applies a min-max normalization technique to the target variable. By applying a linear transformation to the original time series data, we can scale numeric features into a given range (e.g., 0 to 1 or -1 to 1). Since we are working with time series data, lag value creation is also an important part of data preparation, as it relates historical data to a given reference point. In this solution, lag values refer to the number of previous periods used by the advanced forecast method to predict the value of the next horizons.
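To make the two preparation steps concrete, here is a minimal sketch in plain Python, assuming a single category's series is held as a list of numbers. The function names are hypothetical and this is not the Solution's actual implementation, which is documented in the project wiki.

```python
def min_max_normalize(values):
    """Linearly rescale values to the range [0, 1].

    Returns the scaled values with the original minimum and maximum, which
    are needed later to convert results back. A constant series maps to zeros.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def add_lags(values, n_lags):
    """For each period, collect the n_lags previous values (None if unavailable)."""
    rows = []
    for t, value in enumerate(values):
        lags = [values[t - k] if t - k >= 0 else None for k in range(1, n_lags + 1)]
        rows.append({"value": value, "lags": lags})
    return rows
```

With two lag values, the third period of the series [1, 2, 3] is paired with the lags [2, 1], while the first period has no history and gets [None, None].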

After our data has been used to train, validate, and score our forecast models, three final Data Prep Flow zones are called upon to flag the date of the last actual value, convert normalized values back into actual values, and prepare our data for metric calculation. These three Flow zones are important to get our data into a format that can be used to generate visualizations and metrics for the dashboard.
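Two of these final steps can be sketched as follows, assuming min-max scaling to the range 0 to 1 and rows stored as dictionaries with ISO-formatted date strings. Both function names and the row layout are hypothetical and chosen only for illustration.

```python
def denormalize(scaled, lo, hi):
    """Invert a [0, 1] min-max scaling using the original minimum and maximum."""
    return [lo + s * (hi - lo) for s in scaled]


def last_actual_date(rows):
    """Return the latest date that still has an observed actual value.

    Dates are assumed to be ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    observed = [row["date"] for row in rows if row["actual_value"] is not None]
    return max(observed) if observed else None
```

Rows dated after the flagged date belong to the forecast period, which is how a dashboard can separate history from forecasts.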

Simple and Advanced Forecasting

This Solution creates two distinct forecasts:

  • Simple Forecast uses time series models to forecast future values over the next horizons per category.

  • Advanced Forecast uses an Extra Random Tree regression model to forecast the next horizons.

The Simple Forecast creates an ARIMA model for each category using the AutoARIMA functionality in Python to select the best parameters. This forecast provides an initial approach to financial forecasting and, on its own, can still be a valuable comparison point against manual forecasting in terms of accuracy and time saved for finance teams. Additionally, the forecast and horizon values output by the Simple Forecast are included as predictors in the Advanced Forecast.
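The per-category approach can be sketched with the auto_arima function of the pmdarima package listed in the technical requirements. This is an illustration under stated assumptions, not the Solution's actual recipe: the row layout, the function names, and the choice of seasonal=False are all assumptions. pmdarima is imported inside the function so that the grouping helper can be used without the package installed.

```python
from collections import defaultdict


def series_by_category(rows):
    """Group rows into one date-ordered list of actual values per category."""
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["date"]):
        grouped[row["category"]].append(row["actual_value"])
    return dict(grouped)


def simple_forecast(rows, horizon):
    """Fit one automatically selected ARIMA model per category.

    Returns a dict mapping each category to its `horizon` forecast values.
    """
    import pmdarima as pm  # imported lazily; requires the pmdarima package

    forecasts = {}
    for category, values in series_by_category(rows).items():
        # seasonal=False is an assumption; enable seasonality (with m set to
        # the season length) if your data calls for it.
        model = pm.auto_arima(
            values, seasonal=False, suppress_warnings=True, error_action="ignore"
        )
        forecasts[category] = list(model.predict(n_periods=horizon))
    return forecasts
```

Fitting a separate model per category lets each series get its own ARIMA order, at the cost of one parameter search per category.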

The Advanced Forecast uses an Extra Random Tree regression model. Unlike a Random Forest, which searches for the best threshold at each split, this model samples split thresholds at random. If we set lag values and drivers when inputting parameters to the Dataiku Application, the Advanced Forecast will also take these inputs into account when training the model.
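The difference between the two splitting strategies can be illustrated on a single feature. The sketch below is a simplified, hypothetical illustration in plain Python rather than the model's real training code: best_threshold performs the exhaustive search typical of a Random Forest, while random_threshold draws a threshold uniformly between the feature's minimum and maximum, as extremely randomized trees do for each candidate feature before keeping the best of those random splits.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def split_cost(xs, ys, threshold):
    """Total squared error after splitting the samples at the threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sse(left) + sse(right)


def best_threshold(xs, ys):
    """Exhaustive search over midpoints between sorted distinct feature values."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))


def random_threshold(xs, rng):
    """Random draw between the feature's minimum and maximum values."""
    return rng.uniform(min(xs), max(xs))
```

Random thresholds make each tree cheaper to build and less correlated with the others, which is why averaging many of them can still produce accurate predictions.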

Dataiku screenshot of the training job for the Regression Model.

The inevitable question is, which approach is better? In anticipation of this question, this Solution computes the mean absolute percentage error (MAPE) to compare the performance between the forecasting methods (including manual forecast). The lower the MAPE, the better the forecast is. On our data, the Advanced Forecast performed the best, but performance can differ based on your data and additional improvements you may make to the model.
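For reference, MAPE averages the absolute errors expressed as a percentage of the actual values. The following is a minimal sketch with hypothetical function names. Skipping periods whose actual value is zero is an assumption made here to avoid division by zero, and may differ from how the Solution handles them.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent.

    Periods with an actual value of zero are skipped (an assumption),
    because the percentage error is undefined for them.
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined without non-zero actual values")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def best_method(scores):
    """Return the name of the forecasting method with the lowest MAPE."""
    return min(scores, key=scores.get)
```

For example, forecasts of 110 and 180 against actual values of 100 and 200 are each off by 10 percent, giving a MAPE of 10.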

Visualize Approach Comparisons and Drivers

In the same dashboard as the previously mentioned Data Exploration slide, we can find four other slides to support our understanding of the Solution Approaches and Outputs. Filters are available on every slide by date, category, and/or driver.

The Forecast Comparison slide presents the results of all Forecasting approaches (Simple, Advanced, and Manual) for a side-by-side comparison of their MAPE, the alignment to the Actual Value overall, and by category. We can also analyze the performance of forecasts over different time horizons and gain insights into cumulative errors and average error percentages.

Dataiku screenshot of the slide used to compare forecast approaches.

With the Drivers slide, we can analyze the relationship between the drivers contained in our data (and selected in the Dataiku app) and the actual value. Through this, we can determine which drivers may have a positive or negative impact on the advanced forecast. Users interested in running multiple analyses with different drivers can either re-run the Dataiku application with new parameters for each analysis or create multiple App instances, each with its own set of drivers.

Dataiku screenshot of the Dashboard slide allowing us to analyze the impact of drivers.

Our final two slides Simple Forecast and Advanced Forecast offer isolated metrics and visualizations per approach type regarding their accuracy as well as information about the models through explainability visualizations.

Dataiku screenshot of the Advanced Forecast explainability visualizations.

Responsible AI Considerations

This financial forecasting solution is intended for use at a business level and should not be used to evaluate individual performance. Misuse of this solution, such as using it to make decisions that could harm individuals, may result in inaccurate or unreliable forecasts.

Reproducing these Processes With Minimal Effort For Your Own Data

The intent of this project is to enable finance teams to understand how Dataiku can be used to improve accuracy and reduce the required effort for forecasting processes. By creating a single solution that can benefit and influence the decisions of various teams in an organization, finance teams can design smarter and more holistic strategies to transform existing forecasting processes, automate and streamline workflows, and focus on more strategic tasks.

We’ve provided several suggestions on how to use your historical financial data, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.