Solution | Batch Performance Optimization¶
Overview¶
Business Case¶
Whether producing bulk chemicals and packaged goods or performing critical cleaning processes in food and drug production, batch processes form a critical part of the manufacturing value chain, where inefficiencies cost billions of dollars each year.
At a time when supply chains are stressed and raw material prices are increasingly volatile, the need to maximize equipment utilization by reducing downtime and to improve yield by reducing unnecessary waste becomes even more critical. The proliferation of IoT devices and centralized data collection systems for plant automation networks has led to unprecedented opportunities for enterprise manufacturers.
The challenge now is to turn the mountain of data produced by automation networks into insights that engineers and the other professionals running batch manufacturing processes can act on.
With this solution, organizations can quickly enhance their capacity to dissect vast volumes of production process data. They can readily develop actionable insights that help technicians, operators, and reliability and process engineers understand the root causes of failures and predict batch outcomes, accelerating the move from reaction to anticipation in batch manufacturing.
Technical Requirements¶
Tip
Dataiku Cloud users should follow the instructions for installing solutions on cloud. The Cloud Launchpad will automatically meet the technical requirements and add the solution to your Dataiku instance. Once the solution has been added to your space, move ahead to Data Requirements.
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 12.0+ instance
Please also note that Dataiku instances with a built-in environment of Python 2, rather than Python 3, will be unable to use the webapp. On such instances, users should create a basic Python 3 code env and set the project to use it.
Self-managed Installation¶
Once self-managed users meet these requirements, they can install the solution in one of two ways:
Within the Platform¶
On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Batch Performance Optimization.
From the Downloads Site¶
Alternatively, Solutions can be installed by downloading the .zip project file from the downloads site and uploading it directly to your Dataiku instance as a new project. All requirements must be pre-installed for this to succeed.
Data Requirements¶
The solution takes in two input datasets:
The input_sensor_data dataset contains all the sensor values from all the machines in the following format:
equipment_id (string): ID of the machine
sensor_id (string): ID of the sensor
timestamp (date): timestamp at which the value was emitted
sensor_value (double): the sensor value
The input_batch_data dataset contains information about batch general parameters in the following format:
batch_id (string): ID of the batch
equipment_id (string): ID of the machine
start_time (date): timestamp of the start time of the batch
end_time (date): timestamp of the end time of the batch
failure (bigint): the outcome of the batch: 1 if it was a failure, 0 if it was a success
In the second dataset, you can add as many columns as you have batch parameters (string type). These parameters will be used to analyze your batches and predict failure.
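For concreteness, here is a minimal Python sketch of what the two inputs could look like under this data model. All IDs, timestamps, and values are illustrative, and the recipe column stands in for any string-typed batch parameter you might add.

```python
import pandas as pd

# Illustrative input_sensor_data: one row per sensor reading.
input_sensor_data = pd.DataFrame(
    {
        "equipment_id": ["EQ-01", "EQ-01", "EQ-01"],
        "sensor_id": ["temp", "temp", "pressure"],
        "timestamp": pd.to_datetime(
            ["2023-01-01 00:00:00", "2023-01-01 00:00:10", "2023-01-01 00:00:00"]
        ),
        "sensor_value": [71.2, 71.9, 1.03],
    }
)

# Illustrative input_batch_data: one row per batch, plus an optional
# string-typed batch parameter column ("recipe" is hypothetical).
input_batch_data = pd.DataFrame(
    {
        "batch_id": ["B-001", "B-002"],
        "equipment_id": ["EQ-01", "EQ-01"],
        "start_time": pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 02:00:00"]),
        "end_time": pd.to_datetime(["2023-01-01 01:30:00", "2023-01-01 03:30:00"]),
        "failure": [0, 1],
        "recipe": ["A", "B"],
    }
)
```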
Workflow Overview¶
You can follow along with the solution in the Dataiku gallery.

The project has the following high-level steps:
Connect your data as input and select your analysis parameters via the Dataiku Application.
Ingest and Prepare our Data
Train, Score, and Use a Failure Prediction Model
Analyze our Historical Batch Data
Explore batch history, analyze sensors, and predict failure risk with an interactive Webapp
Walkthrough¶
Note
In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this solution was created and more detailed explanations of solution-specific vocabulary.
Plug and play with your own data and parameter choices¶
To begin, you will need to create a new Batch Performance Optimization Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.
Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in a user-friendly Webapp. You could also instantiate multiple Batch Performance Optimization projects to compare your feature engineering and modeling approaches.

To begin, we can configure our data connection to retrieve and load our input datasets. A button can be selected to confirm that our data fits the data model of the Solution. With the correct data connected, we can load all available batch parameters and select the batch attributes we want to consider in the model prediction. Pressing Run will train the model, which can take several hours depending on the data size. Finally, we can click the Open Webapp button to be taken directly to the interactive visual analysis tool delivered with this Solution.
Once we’ve built all elements of our Dataiku Application, we can either continue to the Project View to explore the generated datasets or go straight to the Dashboards and Webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip over the following few sections.
Ingest and Prepare our Data¶
The first two Flow zones of our project are fairly straightforward. We begin in the Data Ingestion Flow zone by bringing in the two input datasets detailed in the previous Data Requirements section.
Now that we have access to all of our input datasets, we can move to the Data preparation Flow zone. We first pivot the sensor dataset so that there is only one column per sensor before joining it with the batch dataset. To feed our model a reasonable amount of data, we group by batch and compute the min, max, avg, stddev, and count for each sensor over each batch. This results in a dataset with one row per batch and one column per sensor aggregation. A final window recipe transforms the dataset so that each batch connects to information from the previous batch and to the next batch’s failure. This final dataset is used to train our model. Separately, a group recipe is applied to the original batch dataset to compute all the different batch parameter combinations used as filters in the Webapp.
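Outside of Dataiku’s visual recipes, the pivot-and-aggregate step can be approximated in plain pandas. The sketch below is a simplified stand-in under the schemas above, not the solution’s actual recipe chain: it assigns each reading to the batch whose time window contains it, then computes the five aggregates per sensor per batch.

```python
import pandas as pd

def aggregate_sensors(sensors: pd.DataFrame, batches: pd.DataFrame) -> pd.DataFrame:
    """Rough equivalent of the pivot + group steps: one row per batch,
    one column per (sensor, aggregate) pair, e.g. temp_mean."""
    # Attach each reading to the batch whose time window contains it.
    merged = sensors.merge(
        batches[["batch_id", "equipment_id", "start_time", "end_time"]],
        on="equipment_id",
    )
    in_window = merged[
        (merged["timestamp"] >= merged["start_time"])
        & (merged["timestamp"] <= merged["end_time"])
    ]

    # min / max / avg / stddev / count per sensor per batch.
    agg = (
        in_window.groupby(["batch_id", "sensor_id"])["sensor_value"]
        .agg(["min", "max", "mean", "std", "count"])
    )

    # Pivot so each sensor aggregation becomes its own column.
    wide = agg.unstack("sensor_id")
    wide.columns = [f"{sensor}_{stat}" for stat, sensor in wide.columns]
    return wide.reset_index()
```

A per-equipment shift of these columns by one batch would then reproduce the window recipe’s previous/next-batch linkage.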
Minimizing Risk of Failure: Training a Batch Failure Prediction Model¶
We train a model on our prepared batch and sensor data, specifically training on the failure_last column using all of the previous batch’s information and the following batch’s parameters. The model can learn to detect anomalies in the sensor values to predict an imminent failure and help a technician with preventive maintenance.
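Within Dataiku, this training happens in the visual ML interface. Purely to illustrate the underlying learning task, here is a minimal scikit-learn sketch on a hypothetical prepared dataset; every column name and value below is made up.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical prepared dataset: one row per batch, previous-batch sensor
# aggregates as predictors, failure_last as the binary target.
features = pd.DataFrame(
    {
        "temp_mean": [71.0, 74.2, 70.5, 75.1, 71.3, 74.8],
        "temp_std": [0.4, 1.9, 0.3, 2.2, 0.5, 2.0],
        "pressure_max": [1.05, 1.22, 1.04, 1.25, 1.06, 1.23],
        "failure_last": [0, 1, 0, 1, 0, 1],
    }
)

X = features.drop(columns=["failure_last"])
y = features["failure_last"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```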
The Prediction Flow zone is rebuilt by the interactive prediction scenario every time a user initiates a new analysis from the Next Batch Prediction tab of the interactive Webapp. The scenario uses a Python recipe to prepare the batch and sensor data to match the user selection before using the previously trained model to score the data and output the failure risk prediction and Shapley values. A group recipe is used to filter the training data on the selected batch parameters and compute the average values for each sensor aggregation when it has led to a failure or a success.
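The scoring step can be pictured the same way. This sketch reuses model and X_test from the training sketch above, and approximates the explanation step with the third-party shap package; that package is our stand-in for illustration, not necessarily what the scenario uses internally.

```python
import shap  # third-party package (pip install shap); an assumed stand-in

# Stand-in for the batch the user selected in the Next Batch Prediction tab.
new_batch = X_test.iloc[[0]]

# Failure risk: predicted probability that failure_last == 1.
risk = model.predict_proba(new_batch)[0, 1]
print(f"Predicted failure risk: {risk:.1%}")

# Per-feature Shapley contributions to this single prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(new_batch)
```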

Analyzing Batch and Sensor Data¶
In addition to predicting future batch failures with our trained model, we can also analyze our historical batch and sensor data using the Sensor Values Analysis and General Analysis tabs of the Webapp. We can launch a new analysis from the Sensor Values tab, which will rebuild the Batch Analysis Flow zone. Specifically, this Flow zone:
Computes the batch duration, which is displayed in the General Analysis tab
Filters the data to match the user selection
Resamples the sensor values onto an elapsed-time axis in seconds, as sketched below
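As a rough picture of that last step, here is a pandas sketch (the function name and column handling are ours, assuming the input_sensor_data schema) that puts one batch’s readings for a single sensor onto an elapsed-seconds axis:

```python
import pandas as pd

def resample_to_elapsed_seconds(
    batch_sensors: pd.DataFrame, start_time: pd.Timestamp, freq: str = "1s"
) -> pd.DataFrame:
    """Re-index one batch's readings for one sensor onto elapsed seconds,
    so batches with different start times can be overlaid on one axis."""
    series = (
        batch_sensors.set_index("timestamp")["sensor_value"]
        .resample(freq).mean()   # put readings on a regular grid; gaps become NaN
        .interpolate()           # fill the gaps between readings
    )
    out = series.reset_index()
    out["elapsed_seconds"] = (out["timestamp"] - start_time).dt.total_seconds()
    return out[["elapsed_seconds", "sensor_value"]]
```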

In addition to the batch duration, the General Analysis tab contains the Success Rate by Batch Parameter and the Success Rate over Time charts. The outputs of the Batch Analysis Flow zone allow us to see charts for each sensor of the machines, aggregated over all the batches corresponding to our selected parameters for the Sensor Values Analysis.
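The Success Rate by Batch Parameter chart boils down to a per-parameter average of batch outcomes. Here is a short pandas sketch, reusing the illustrative input_batch_data from earlier (the recipe column is hypothetical):

```python
# Share of successful batches (failure == 0) per value of one batch parameter.
success_by_param = (
    input_batch_data
    .assign(success=lambda df: 1 - df["failure"])
    .groupby("recipe")["success"]
    .mean()
    .rename("success_rate")
)
print(success_by_param)
```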

Reproducing these Processes With Minimal Effort For Your Own Data¶
The intent of this project is to show how Dataiku can be used to analyze sensor data, estimate the risk of failure, and identify anomalies in equipment. By creating a single solution that can benefit and influence the decisions of a variety of teams in an organization, strategies can be implemented that focus on preventive maintenance, rapid response to equipment failures, and process optimization.
We’ve provided several suggestions on how to use your sensor and batch data to improve batch performance, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.