Solution | Financial Forecasting#
Overview#
Business Case#
The financial forecasting processes managed by finance teams play a central role in helping companies make appropriate cost management and investment decisions. Yet 40 percent of CFOs feel their forecasts are not accurate and that the process takes too much time. More precise, less costly-to-produce forecasts are of immediate value.
However, connecting to the data and tapping into the different techniques needed to achieve them can feel out of reach. This can be a result of too little time available to set up a new forecasting project or a lack of confidence in the statistical and machine learning techniques involved.
Enhancing the efficiency of financial forecasting requires:
Improving the capacity for finance teams to quickly access data and automate data pipelines. Rather than relying on manual checks and merges via spreadsheets, teams can streamline their processes, save time and reduce errors, allowing them to focus more on analysis and decision making.
Easing the comparison of traditional and advanced statistical / machine-learning forecasting techniques, with simple tests and selection of appropriate drivers, developing more accurate projections alongside full ownership and explainability.
Dataiku’s Financial Forecasting solution offers finance teams an opportunity to confidently transition through a transformative shift in their business impact while retaining full control of process and outputs.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution with the following instructions:
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Financial Forecasting.
Click Install, changing the project folder into which the solution will be installed if needed.
From the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Financial Forecasting.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 13.1+* instance.
A Python 3.9+ code environment named solution_financial-forecasting with the following required packages:
pmdarima == 2.0.2
numpy==1.22.4
urllib3<2
Data Requirements#
The input data should be separated into two distinct groups of datasets: historical_* and to_forecast_* datasets with the same time-frequency:
Three historical datasets: historical time series data about the financial variable to forecast, manual forecasts, and drivers.
historical_actual_value_dataset
historical_forecasts_dataset
historical_drivers_dataset
Two to_forecast datasets: time series data about the period we want to forecast, including drivers’ expected values and manual forecasts.
to_forecast_forecasts_dataset
to_forecast_drivers_dataset
These datasets include the following features:
| Feature | Description |
|---|---|
| date | Sequence of dates taken at successive equally spaced points in time. |
| actual_value | Historical value of the quantitative variable the user wants to forecast. |
| category | Subcategories into which the target value is split. |
| manual_forecast | Forecast figures computed manually by the user. |
| driver_n_name (optional) | Company-specific or macroeconomic information chosen by the user. |
Revenue forecasting: Which Solution is right for me?#
Revenue forecasting can be approached in two main ways:
Bottom-Up Approach: This method predicts the quantity of “product” sold at a granular level (e.g., units per product or category) and multiplies by price to estimate revenue. Forecasts are based on prior sales data and drivers at the same or higher granularity, such as weather, holidays, or customer/product attributes. If this describes your usage, then the Demand Forecast Solution should be used instead.
Top-Down Approach: This method forecasts total revenue directly for a business unit, without using unit sales * price as an intermediary. Forecasts are based on historical revenue figures and numerical drivers relevant to revenue streams. If this describes your usage, then this Financial Forecasting Solution can be used.
Each approach lends itself more readily to different industries and expectations. Top-down approaches are most often seen when firms are selling bundles of products, or generating streams of revenue, such that counting individual products with consistent specific prices is not well matched to their business reality.
Workflow Overview#
You can follow along with the solution in the Dataiku gallery.
The project has the following high-level steps:
Configure the project with data and adjustments via the Dataiku Application.
Clean and prepare financial data.
Explore historical data to identify trends and seasonality.
Create two distinct forecast options and cross-evaluate performance.
Visualize financial forecasts to compare approaches and analyze the relationship between drivers and value.
Walkthrough#
Note
In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Tailor the Project to Our Own Needs#
To begin, you will need to create a new instance of the Financial Forecasting Application. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance. The project is delivered with sample data that should be replaced with your own data, assuming that it adopts the data model described above. This can be done in one of two ways:
Data can be uploaded directly from your filesystem in the first section of the Dataiku app.
Data can be connected to your database of choice by selecting an existing connection.
With either option, after loading the data, be sure to refresh the page so that the app can dynamically take your data into account.
With our data selected and loaded into the Flow, we can move to the final App section, Forecast financial data. Here we can select the number of lag values and the drivers from our historical data to include in the advanced forecasting model. Clicking Run will initiate a series of scenarios that rebuild the entire Flow and update the dashboard. If you’re only interested in the dashboard, you can skip the following sections where we’ll dive into the underlying Flow that supports the Dataiku App.
Cleaning and Preparing our Historical Financial Data#
In total, ten Flow zones are involved in data preparation and cleaning for this solution. We won’t go into heavy detail about each Flow zone as this information can be found in the wiki of the project. However, at a high level, the first seven Flow zones found towards the start of our Flow perform data normalization and lag values creation from our originating input datasets.
To deal with potential differences in the magnitude of values across the provided categories, the solution applies min-max normalization to the target variable. This linear transformation of the original time series scales numeric values into a given range (e.g., 0 to 1 or -1 to 1). Since we are working with time series data, lag value creation is also an important part of data preparation, as it relates each observation to the values that preceded it. In this solution, the number of lag values refers to the number of previous periods used by the advanced forecast method to predict the value of the next horizons.
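The two preparation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the solution's actual recipe code: the function names (`min_max_scale`, `min_max_invert`, `make_lag_rows`) are hypothetical, the target range is fixed to 0 to 1, and the series is assumed to belong to a single category.

```python
def min_max_scale(values):
    """Scale a series into [0, 1]; return the scaled series and the bounds."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant series has no spread; map every point to 0.0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def min_max_invert(scaled, lo, hi):
    """Convert normalized values back into the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def make_lag_rows(values, n_lags):
    """Pair each value with its n_lags previous values (most recent first)."""
    rows = []
    for t in range(n_lags, len(values)):
        lags = [values[t - k] for k in range(1, n_lags + 1)]
        rows.append((values[t], lags))
    return rows


# Hypothetical single-category series, for illustration only.
series = [10.0, 20.0, 30.0]
scaled, lo, hi = min_max_scale(series)
restored = min_max_invert(scaled, lo, hi)
lag_rows = make_lag_rows([1, 2, 3, 4], n_lags=2)
```

Keeping the bounds `lo` and `hi` for each category is what makes it possible to convert normalized forecasts back into actual values later in the Flow.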
After our data has been used to train, validate, and score our forecast models, three final Data Prep Flow zones are called upon to flag the date of the last actual value, convert normalized values back into actual values, and prepare our data for metric calculation. These three Flow zones are important to get our data into a format that can be used to generate visualizations and metrics for the dashboard.
Identifying Trends in Historical Data#
Before training a financial forecasting model, it’s important first to explore our historical data. Doing so allows us to manually identify trends and seasonality of the financial values we want to forecast. The Exploratory data analysis Flow zone computes all metrics and values needed to generate charts for the Data Exploration page of the dashboard. Here we can zoom in on specific categories, filter data by date range and category, and compare actual values against manual forecast values over time. Additionally, we can see a breakdown of the actual value by category and the percentage of change in actual values over a period of time. Explanation boxes accompany each graph of this page.
Simple and Advanced Forecasting#
This solution creates two distinct forecasts:
Simple Forecast uses time series models to forecast future values over the next horizons per category.
Advanced Forecast uses an Extra Random Tree regression model to forecast the next horizons.
The Simple Forecast creates an ARIMA model for each category using the AutoARIMA functionality in Python to select the best parameters. The creation of this forecast serves to provide us with an initial approach for financial forecasting and, on its own, can still be a valuable comparison point against manual forecasting in terms of accuracy and time saved for finance teams. Additionally, the forecast and horizon values output by the Simple Forecast are included as predictors in the Advanced Forecast.
The Advanced Forecast uses an Extra Random Tree regression model. Unlike a Random Forest, which searches for the best threshold at each split, this model samples a random threshold for each candidate feature and keeps the best of those random splits. If we set lag values and drivers when inputting parameters to the Dataiku Application, the Advanced Forecast will also take these inputs into account when training the model.
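To make the difference in split selection concrete, here is a small single-feature sketch in plain Python. It is an illustration only and does not reproduce the solution's model: the helper names are hypothetical, the data is made up, and a real Extra Trees implementation repeats the random draw across several candidate features at every node.

```python
import random


def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty side)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(xs, ys, threshold):
    """Total squared error after splitting the targets at the threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum_squared_error(left) + sum_squared_error(right)


def searched_threshold(xs, ys):
    """Random Forest style: try every midpoint and keep the cheapest split."""
    unique = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))


def sampled_threshold(xs, rng):
    """Extra Trees style: draw the threshold uniformly within the feature range."""
    return rng.uniform(min(xs), max(xs))


# Made-up feature and target values, for illustration only.
xs = [1, 2, 3, 10, 11, 12]
ys = [1, 1, 1, 5, 5, 5]
best = searched_threshold(xs, ys)
drawn = sampled_threshold(xs, random.Random(0))
```

The sampled threshold is usually a worse split than the searched one for any single tree, but averaging many such trees tends to reduce variance, and skipping the search makes training cheaper.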
The inevitable question is, which approach is better? In anticipation of this question, this solution computes the mean absolute percentage error (MAPE) to compare the performance between the forecasting methods (including manual forecast). The lower the MAPE, the better the forecast is. On our data, the Advanced Forecast performed the best, but performance can differ based on your data and additional improvements you may make to the model.
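As a reference for how this metric works, the sketch below computes MAPE in plain Python and picks the method with the lowest score. The function name `mape` and all figures are hypothetical, chosen only for illustration, and periods with an actual value of zero are skipped here because the percentage error is undefined for them; the solution may handle that case differently.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent, ignoring zero actuals."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Made-up actuals and forecasts for two periods, for illustration only.
actuals = [100.0, 200.0]
forecasts_by_method = {
    "manual": [120.0, 160.0],
    "simple": [110.0, 180.0],
    "advanced": [105.0, 190.0],
}

scores = {name: mape(actuals, f) for name, f in forecasts_by_method.items()}
best_method = min(scores, key=scores.get)  # lowest MAPE wins
```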
Visualize Approach Comparisons and Drivers#
In the same dashboard as the previously mentioned Data Exploration page, we can find four other pages to support our understanding of the solution's approaches and outputs. Filters are available on every page by date, category, and/or driver.
The Forecast Comparison page presents the results of all Forecasting approaches (Simple, Advanced, and Manual) for a side-by-side comparison of their MAPE, the alignment to the Actual Value overall, and by category. We can also analyze the performance of forecasts over different time horizons and gain insights into cumulative errors and average error percentages.
With the Drivers page, we can analyze the relationship between the drivers contained in our data (and selected in the Dataiku app) and the actual value. Through this, we can determine which drivers may have a positive or negative impact on the advanced forecast. Users interested in running multiple analyses with different drivers can either re-run the Dataiku application with new parameters for each analysis or create multiple App instances, each with its own set of drivers.
Our final two pages, Simple Forecast and Advanced Forecast, offer isolated metrics and visualizations on the accuracy of each approach, as well as information about the models through explainability visualizations.
Responsible AI Considerations#
This financial forecasting solution is intended for use at a business level and should not be used to evaluate individual performance. Misuse of this solution, such as using it to make decisions that may lead to potential harm to individuals, may result in inaccurate or unreliable forecasts.
Reproducing these Processes With Minimal Effort For Your Own Data#
The intent of this project is to enable finance teams to understand how Dataiku can be used to improve accuracy and reduce the required effort for forecasting processes. By creating a singular solution that can benefit and influence the decisions of various teams in a single organization, smarter and more holistic strategies can be designed to transform existing forecasting processes, automate and streamline workflows, and focus on more strategic tasks.
We’ve provided several suggestions on how to use your historical financial data, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.