Solution | Batch Performance Optimization


Business Case

Whether they produce bulk chemicals and packaged goods or perform critical cleaning processes in food and drug production, batch processes form a critical part of the manufacturing value chain, and their inefficiencies cost billions of dollars each year.

At a time when supply chains are stressed and raw material prices are increasingly volatile, the need to maximize equipment utilization by reducing downtime and to improve yield by reducing unnecessary waste becomes even more critical. The proliferation of IoT devices and centralized data collection systems for plant automation networks has led to unprecedented opportunities for enterprise manufacturers.

The challenge now is to turn the mountain of data produced by automation networks into actionable insights for engineers and other professionals running batch manufacturing processes.

With this solution, organizations can quickly enhance their capacity to dissect vast volumes of production process data. They can develop actionable insights that help technicians, operators, and reliability and process engineers understand the root causes of failures and predict batch outcomes, accelerating the move from reaction to anticipation in batch manufacturing.


Installation

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on Dataiku Cloud.

  1. The Cloud Launchpad automatically meets the technical requirements listed below and adds the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions, then search for Batch Performance Optimization.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical Requirements

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+* instance.

  • Dataiku instances whose built-in environment uses Python 2 instead of Python 3 cannot run the webapp. On such instances, create a basic Python 3 code env and set the project to use it.

Data Requirements

The solution takes two input datasets.

The first dataset contains all sensor values from all machines, in the following format:

  • equipment_id (string): ID of the machine

  • sensor_id (string): ID of the sensor

  • timestamp (date): time at which the value was emitted

  • sensor_value (double): the sensor value


The second dataset contains the general parameters of each batch, in the following format:

  • batch_id (string): ID of the batch

  • equipment_id (string): ID of the machine

  • start_time (date): timestamp of the start time of the batch

  • end_time (date): timestamp of the end time of the batch

  • failure (bigint): the batch outcome, 1 for a failure and 0 for a success

You can add as many columns to this second dataset as you have batch parameters (string type). These parameters are used to analyze your batches and predict failure.
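Before connecting your data, it can help to check it against this data model. The Dataiku Application provides its own check for this, and its rules may differ. The following Python sketch is only an illustration. It assumes rows are already parsed into dicts, with date columns as `datetime` objects, bigint as `int`, and double as `float`. The constant and function names are hypothetical, and the end-before-start rule is an added sanity check that is not part of the stated requirements.

```python
from datetime import datetime

# Hypothetical encoding of the Data Requirements: column name -> Python type.
# "date" -> datetime, "bigint" -> int, "double" -> float, "string" -> str.
SENSOR_SCHEMA = {
    "equipment_id": str,
    "sensor_id": str,
    "timestamp": datetime,
    "sensor_value": float,  # strict: an int value is reported as a problem
}
BATCH_SCHEMA = {
    "batch_id": str,
    "equipment_id": str,
    "start_time": datetime,
    "end_time": datetime,
    "failure": int,
}


def check_rows(rows, schema):
    """Return a list of problems: missing columns or values of the wrong type."""
    problems = []
    for i, row in enumerate(rows):
        for col, expected in schema.items():
            if col not in row:
                problems.append(f"row {i}: missing column {col!r}")
            elif not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: {col!r} should be {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return problems


def check_batches(rows):
    """Schema check plus batch rules: failure in {0, 1}, string parameters,
    and (an added sanity check) end_time not before start_time."""
    problems = check_rows(rows, BATCH_SCHEMA)
    for i, row in enumerate(rows):
        if "failure" in row and row["failure"] not in (0, 1):
            problems.append(f"row {i}: failure must be 0 or 1")
        start, end = row.get("start_time"), row.get("end_time")
        if isinstance(start, datetime) and isinstance(end, datetime) and end < start:
            problems.append(f"row {i}: end_time is before start_time")
        # Every extra column is treated as a batch parameter and must be a string.
        for col, value in row.items():
            if col not in BATCH_SCHEMA and not isinstance(value, str):
                problems.append(f"row {i}: parameter {col!r} should be a string")
    return problems
```

An empty list from `check_batches(rows)` or `check_rows(rows, SENSOR_SCHEMA)` means every row matches the assumed schema.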

Workflow Overview

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Connect your data as input and select your analysis parameters via the Dataiku Application.

  2. Ingest and prepare the data.

  3. Train, score, and use a failure prediction model.

  4. Analyze historical batch data.

  5. Explore batch history, analyze sensors, and predict failure risk with an interactive Webapp.



In addition to this document, we recommend reading the project wiki before beginning. It gives a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Plug and play with your own data and parameter choices

To begin, you will need to create a new Batch Performance Optimization Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.

Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in a user-friendly Webapp. You could also instantiate multiple Batch Performance Optimization projects to compare your feature engineering and modeling approaches.

Dataiku screenshot of part of the Dataiku Application for Batch Performance Optimization

We first configure our data connection to retrieve and load our input datasets, then click a button to confirm that our data fits the solution's data model. With the correct data connected, we can load all available batch parameters and select the batch attributes to consider in the model prediction. Pressing Run trains the model, which can take several hours depending on the data size. Finally, we can click the Open Webapp button to go directly to the interactive visual analysis tool delivered with this solution.

Once all elements of the Dataiku Application are built, you can either continue to the Project View to explore the generated datasets or go straight to the Dashboards and Webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip the following few sections.

Ingest and Prepare our Data

The first two Flow zones of our project are fairly straightforward. We begin in the Data Ingestion Flow zone by bringing in the two input datasets described in the Data Requirements section above.

Now that we have access to all of our input datasets, we can move to the Data preparation Flow zone. We first pivot the sensor dataset so that there is one column per sensor, then join it with the batch dataset. To feed our model a manageable amount of data, we group the sensor values by batch and compute the min, max, avg, stddev, and count of each sensor. This results in a dataset with one row per batch and one column per sensor aggregation. A final window recipe links each batch to information from the previous batch and to the failure outcome of the next batch. This final dataset is used to train our model. Separately, a group recipe is applied to the original batch dataset to compute all the different batch parameter combinations used as filters in the Webapp.
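The project implements these preparation steps with visual recipes. As an illustration only, here is a minimal Python sketch of the same logic. It assumes both datasets are loaded as lists of dicts using the Data Requirements column names, and that a reading belongs to a batch when it comes from the same equipment and falls between the batch's start_time and end_time. The function names, the `prev_` prefix, and the `next_failure` label are hypothetical. Standard deviation here is the sample standard deviation, which may differ from the recipe's definition.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def within(ts: datetime, start: datetime, end: datetime) -> bool:
    """Assumed join condition: the reading falls inside the batch window."""
    return start <= ts <= end


def aggregate_sensors(sensor_rows, batches):
    """Pivot + join + group: one {"<sensor>_<stat>": value} dict per batch_id."""
    values = defaultdict(lambda: defaultdict(list))  # batch_id -> sensor_id -> values
    for b in batches:
        for r in sensor_rows:  # O(batches x readings): fine for a sketch only
            if r["equipment_id"] == b["equipment_id"] and within(
                r["timestamp"], b["start_time"], b["end_time"]
            ):
                values[b["batch_id"]][r["sensor_id"]].append(r["sensor_value"])
    features = {}
    for b in batches:
        row = {}
        for sensor, vals in values[b["batch_id"]].items():
            row[f"{sensor}_min"] = min(vals)
            row[f"{sensor}_max"] = max(vals)
            row[f"{sensor}_avg"] = statistics.mean(vals)
            # Sample standard deviation; 0.0 when a batch has a single reading.
            row[f"{sensor}_stddev"] = statistics.stdev(vals) if len(vals) > 1 else 0.0
            row[f"{sensor}_count"] = len(vals)
        features[b["batch_id"]] = row
    return features


def add_previous_batch(batches, features, params):
    """Window step: pair each batch with the previous batch on the same equipment."""
    by_equipment = defaultdict(list)
    for b in batches:
        by_equipment[b["equipment_id"]].append(b)
    rows = []
    for eq_batches in by_equipment.values():
        eq_batches.sort(key=lambda b: b["start_time"])
        for prev, cur in zip(eq_batches, eq_batches[1:]):
            row = {f"prev_{k}": v for k, v in features[prev["batch_id"]].items()}
            row["prev_failure"] = prev["failure"]
            row.update({p: cur[p] for p in params})  # parameters of the next batch
            row["batch_id"] = cur["batch_id"]
            row["next_failure"] = cur["failure"]  # label to predict
            rows.append(row)
    return rows


def parameter_combinations(batches, params):
    """Distinct parameter combinations, e.g. to populate Webapp filters."""
    return sorted({tuple(b[p] for p in params) for b in batches})
```

Each output row of `add_previous_batch` describes one "previous batch, next batch" transition, so the first batch on each machine produces no training row.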

Minimizing Risk of Failure: Training a Batch Failure Prediction Model

We train a model on our prepared batch and sensor data, using the failure_last column as the target and all of the previous batch's information, together with the batch parameters, as features. The model can learn to detect anomalies in sensor values that signal an imminent failure, helping technicians plan preventive maintenance.
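This section does not specify which algorithm the project's model uses. As a stand-in, the sketch below fits a plain logistic regression with batch gradient descent on numeric feature vectors, such as the previous batch's aggregated sensor features. Treat it as an illustration of training a binary failure classifier, not as the project's model. The function names are hypothetical. Categorical batch parameters would first need encoding (for example, one-hot), which is omitted here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that the batch described by feature vector x fails."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss.

    X: list of numeric feature vectors (one per batch); y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(log-odds)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Features on very different scales, such as a temperature average next to a reading count, slow gradient descent down, so standardizing them first is usually worthwhile.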

The Prediction Flow zone is rebuilt by the interactive prediction scenario every time a user initiates a new analysis from the Next Batch Prediction tab of the interactive Webapp. The scenario uses a Python recipe to prepare the batch and sensor data to match the user selection, then uses the previously trained model to score the data and output the failure risk prediction and Shapley values. A group recipe filters the training data on the selected batch parameters and computes the average value of each sensor aggregation, separately for batches that ended in failure and batches that ended in success.
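Two parts of this scoring step can be sketched compactly. First, if the model were linear in log-odds and the features independent, Shapley values would have a closed form. In that case, the contribution of feature j is w_j * (x_j - mean_j). The project's model may not be linear, and non-linear models need an approximation method, so this is only illustrative. Second, a small function mirrors the group recipe by filtering on the selected parameters and averaging each feature per outcome. Rows are assumed to be dicts, and the function names and the `next_failure` label are hypothetical.

```python
import statistics


def linear_shap(w, x, feature_means):
    """Shapley values of a linear model's log-odds, assuming independent features.

    phi_j = w_j * (x_j - mean_j). With baseline = b + sum(w_j * mean_j),
    baseline + sum(phi) equals the model's log-odds for x.
    """
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, feature_means)]


def outcome_profiles(rows, selection, feature_names, label="next_failure"):
    """Group-recipe analogue: filter on the selected parameters, then average
    each feature separately for successful (0) and failed (1) batches."""
    matching = [r for r in rows if all(r.get(k) == v for k, v in selection.items())]
    profiles = {}
    for outcome in (0, 1):
        group = [r for r in matching if r[label] == outcome]
        if group:  # skip an outcome with no matching batches
            profiles[outcome] = {
                f: statistics.mean([r[f] for r in group]) for f in feature_names
            }
    return profiles
```

Comparing a new batch's features with the two profiles shows which sensor aggregations sit closer to the historical failure averages.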

Dataiku screenshot of the Next Batch Prediction tab of the Webapp.

Analyzing Batch and Sensor Data

In addition to predicting future batch failures with our trained model, we can analyze our historical batch and sensor data using the Sensor Values Analysis and General Analysis tabs of the Webapp. Launching a new analysis from the Sensor Values Analysis tab rebuilds the Batch Analysis Flow zone. Specifically, this Flow zone:

  • Computes the batch duration, which is displayed in the General Analysis tab.

  • Filters the data to match the user selection.

  • Resamples the sensor values so that they are indexed by elapsed time in seconds.
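The three Batch Analysis steps listed above can be sketched in Python as follows. The sketch assumes batches and sensor readings are lists of dicts using the Data Requirements column names. The function names and the one-second default bucket are assumptions. Averaging readings that fall into the same bucket is one reasonable resampling choice, not necessarily the project's.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def select_batches(batches, selection):
    """Keep batches whose parameters match every value selected in the Webapp."""
    return [b for b in batches if all(b.get(k) == v for k, v in selection.items())]


def elapsed_seconds(ts: datetime, start: datetime) -> float:
    return (ts - start).total_seconds()


def batch_duration_seconds(batch):
    return elapsed_seconds(batch["end_time"], batch["start_time"])


def resample_elapsed(sensor_rows, batch, step_seconds=1):
    """Return {sensor_id: {elapsed_bucket: mean_value}} for one batch.

    Elapsed time is measured from start_time and floored to a multiple of
    step_seconds; readings that share a bucket are averaged.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for r in sensor_rows:
        if r["equipment_id"] != batch["equipment_id"]:
            continue
        if not batch["start_time"] <= r["timestamp"] <= batch["end_time"]:
            continue
        elapsed = elapsed_seconds(r["timestamp"], batch["start_time"])
        bucket = int(elapsed // step_seconds) * step_seconds
        buckets[r["sensor_id"]][bucket].append(r["sensor_value"])
    return {
        sensor: {t: statistics.mean(vals) for t, vals in sorted(series.items())}
        for sensor, series in buckets.items()
    }
```

Indexing by elapsed time rather than wall-clock time lets batches that started at different moments be overlaid on the same chart.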

Dataiku screenshot showing the General Analysis for batches

In addition to the batch duration, the General Analysis tab shows the Success Rate by Batch Parameter and the Success Rate over Time. For the Sensor Values Analysis, the outputs of the Batch Analysis Flow zone provide a chart for each machine sensor, aggregated over all batches that match the selected parameters.
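These charts reduce to simple group-bys. This hypothetical sketch makes three assumptions. The success rate is the share of batches whose failure value is 0. "Over time" is bucketed by calendar month of start_time. The aggregated sensor chart is the point-by-point mean of per-batch series keyed by elapsed second. All function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def success_rate_by(batches, key):
    """Share of successful batches (failure == 0) for each value of key(batch)."""
    totals, successes = defaultdict(int), defaultdict(int)
    for b in batches:
        k = key(b)
        totals[k] += 1
        successes[k] += 1 - b["failure"]
    return {k: successes[k] / totals[k] for k in sorted(totals)}


def by_parameter(name):
    """Key function for Success Rate by Batch Parameter."""
    return lambda b: b[name]


def month_of(ts: datetime) -> str:
    return ts.strftime("%Y-%m")


def by_month(b):
    """Key function for Success Rate over Time (assumed monthly granularity)."""
    return month_of(b["start_time"])


def mean_curve(series_per_batch):
    """Point-by-point mean of {elapsed_second: value} series from several batches."""
    acc = defaultdict(list)
    for series in series_per_batch:
        for t, v in series.items():
            acc[t].append(v)
    return {t: sum(vs) / len(vs) for t, vs in sorted(acc.items())}
```

Passing a different key function to `success_rate_by` gives the other General Analysis views without new grouping code.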

Dataiku screenshot showing the Sensor Values Analysis

Reproducing these Processes With Minimal Effort For Your Own Data

The intent of this project is to show how Dataiku can be used to analyze sensor data, estimate the risk of failure, and identify equipment anomalies. With a single solution that informs the decisions of a variety of teams within one organization, companies can implement strategies focused on preventive maintenance, rapid response to equipment failures, and process optimization.

We’ve provided several suggestions on how to use your sensor and batch data to optimize batch performance, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.