Solution | Production Quality Control#


Business Case#

Providing products of consistent quality is a top priority in manufacturing. And it’s no surprise: the consequences of a quality drop are numerous, ranging from increased production costs, pressure on supply chain management, greater waste generation, and reduced sales, all the way to serious injuries and fatalities.

Overall, because quality drops alone can lastingly damage customer trust and brand reputation, they demand tight surveillance. With Industry 4.0 accelerating the possibilities for gathering data across factories and supply chains, industrial companies have an opportunity to become faster and more flexible, producing higher-quality goods at reduced costs.

On this journey to efficiency, Production Quality Control offers production engineers, quality engineers, and maintenance teams a way to quickly integrate the forecasts of AI models into the surveillance of key manufacturing processes. Thanks to full AI explainability, this solution allows production engineers to identify the parameters that most influence the quality of their manufacturing processes. By moving to near real-time insights and developing scenario-based alerts, relevant operators receive timely notifications and can act early if any changes are detected. This is a needed step towards embedding AI in day-to-day operations.

The goal of this plug-and-play solution is to show Factory Management and Operations teams how Dataiku can be used to improve production quality and adjust manufacturing processes in near real time.


Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Production Quality Control.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+ instance.

  • The version of the solution that’s available for download is built using filesystem-managed datasets. It’s possible to integrate Streaming with this project for real-time monitoring.

Data Requirements#

The data contained in this project represents the overmolding process where silicone is injected into a toothbrush handle, resulting in a seamless combination of plastic and rubber. The injection process depends on the right control of injection pressure, temperature, and curing time to avoid defects. Although the delivered data may be specific to toothbrush handle manufacturing, the use case represents a well-known problem in the factory of injection time drift caused by a machine that needs recalibration. Furthermore, thanks to the Dataiku Application and generalized data model in this solution, it is adaptable to any production quality control use case and data.

The solution takes in 2 input data sources.




The process-data dataset contains measurements from the machines and is composed of 6+ columns:

  • timestamp: Timestamp of the manufacturing

  • product_ID: Unique identifier of a product

  • machine_name: Name of the machine

  • Recipe: Type of manufactured part

  • Campaign_ID: Unique identifier of the campaigns

  • process_parameter_1: Process parameter

  • process_parameter_n: Process parameter

As illustrated, you can add as many columns to this dataset as you have process parameters. These parameters will be used to build the machine learning model and will be ranked by how much they influence quality.


The quality-data dataset contains a simplified view of the quality results, composed of 2 columns:

  • product_ID: Unique identifier of a product

  • defect: Indicates whether or not the part has a defect
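Before uploading, it can help to sanity-check that your files match the data model above. A minimal sketch in pandas (the toy rows and file contents are illustrative; only the column names come from the data model):

```python
import pandas as pd

# Columns required by the data model; process_parameter_* columns vary per use case
PROCESS_REQUIRED = {"timestamp", "product_ID", "machine_name", "Recipe", "Campaign_ID"}
QUALITY_REQUIRED = {"product_ID", "defect"}

def check_schema(df, required, name):
    """Return one message per required column missing from an input dataset."""
    missing = sorted(required - set(df.columns))
    return [f"{name}: missing column {c}" for c in missing]

process = pd.DataFrame({
    "timestamp": ["2023-01-01 08:00"], "product_ID": ["P-001"],
    "machine_name": ["M1"], "Recipe": ["soft-grip"], "Campaign_ID": ["C-01"],
    "process_parameter_1": [102.5],
})
quality = pd.DataFrame({"product_ID": ["P-001"], "defect": [0]})

issues = check_schema(process, PROCESS_REQUIRED, "process-data") \
       + check_schema(quality, QUALITY_REQUIRED, "quality-data")
print(issues)  # → []
```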

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Connect your data as input and select your analysis parameters via the Dataiku Application.

  2. Ingest and prepare our data.

  3. Train and score a defect prediction model.

  4. Detect and raise alerts for drifts in injection time.

  5. Predict defects based on data from the last 24h.

  6. Evaluate our model, identify injection drifts, and analyze our production quality with pre-built dashboards.

  7. Automate the solution for continuous, real-time analysis.



In addition to reading this document, it is recommended to read the project wiki before beginning, in order to get a deeper technical understanding of how this solution was created, along with more detailed explanations of solution-specific vocabulary.

Plug and play with your own data and parameter choices#

To begin, you will need to create a new Production Quality Control Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.

Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in pre-built user-friendly Dashboards. You could also instantiate multiple Production Quality Control projects to monitor multiple manufacturing processes in parallel.

Dataiku screenshot of part of the Dataiku Application for Production Quality Control

To begin, upload the Process and Quality data as CSV files. These files must conform to the data model detailed in the Data Requirements section of this article. Once the two files are uploaded, clicking the green INTEGRATE NOW button loads the schema and column list for the next step. There, specify the columns from the uploaded datasets to include in the Flow (e.g., equipment ID, batch ID) and enter the date on which to split the data between historical and new data. Pressing RUN builds the full Flow along with the Defects prediction dashboard. To enable the Production Quality Control dashboard, press the subsequent LAUNCH BACKEND button. Before moving on to the dashboards, a final section of the Dataiku App lets you configure the drift detection alerting system, which is important for continuous monitoring of the in-production solution. Here you can edit the reporter on which to receive alerts and optionally define a custom parameter for the machine learning algorithm to monitor.

Once all elements of the Dataiku Application have been built, you can either continue to the Project View to explore the generated datasets or go straight to the Dashboards and Webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, skip over the next few sections.

Ingest and Prepare our Data#

The first two Flow zones of our project are fairly straightforward. We begin in the Data Ingestion Flow zone by bringing in 2 initial datasets that are detailed in the previous Data Requirements section.

Dataiku screenshot of the Flow zone dedicated to preparing the Production and Quality datasets.

Now that we have access to all of our input datasets, we can move along to the Data preparation Flow zone. We first round values in the Process-data dataset before joining it with quality-data on the InjectionID. This dataset is the source for several charts available in the Dashboard. Three branches for data preparation result from this dataset:

  • Computation of the overall defect rate per machine (top branch) used by the Webapp to display the defect rate per machine.

  • Resampling of the historical data (middle branch) used to train the models.

  • Computation of the defects and number of injections per day (bottom branch) used by the Webapp to display the daily production rate for the past 7 days.
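The first and third branches can be sketched in a few lines of pandas; a toy stand-in for the joined dataset (column names follow the data model, the values are illustrative):

```python
import pandas as pd

# Toy stand-in for the joined process/quality dataset
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 08:00", "2023-01-01 09:00", "2023-01-02 08:30", "2023-01-02 10:00"]),
    "machine_name": ["M1", "M1", "M2", "M2"],
    "defect": [0, 1, 0, 0],
})

# Top branch: overall defect rate per machine
defect_rate = df.groupby("machine_name")["defect"].mean()

# Bottom branch: defects and number of injections per day
daily = (df.set_index("timestamp")
           .resample("D")["defect"]
           .agg(["count", "sum"])
           .rename(columns={"count": "injections", "sum": "defects"}))

print(defect_rate["M1"])  # → 0.5
```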

Proactive Quality Control: Training a Defect Prediction Model#

Having previously created the process-data-joined-historical-resampled dataset, we are ready to train and score a model for predicting defects made by our machines in the Defect prediction Flow zone. The training data was downsampled to a 50/50 class ratio, which is quite aggressive but biases the model towards false positives rather than false negatives. When training the model, we also used the cost matrix available in the visual ML recipe to optimize for the business cost of misclassifications. Ultimately, we chose to deploy a Random Forest model which, despite taking 5x longer to train than the XGBoost model, still performed better.
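The two ideas at play here, 50/50 downsampling and cost-aware thresholding, can be sketched outside Dataiku with scikit-learn. This is a minimal illustration on synthetic data, not the solution's actual training recipe; the cost values are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic, imbalanced data: roughly 10% defects (class 1)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 1.3).astype(int)

# Downsample the majority class to a 50/50 ratio, as in the solution
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[idx], y[idx])

# Cost-matrix idea: a missed defect (false negative) costs far more than a
# false alarm, so the optimal decision threshold on the defect probability
# drops to FP_COST / (FP_COST + FN_COST).
FN_COST, FP_COST = 10.0, 1.0
threshold = FP_COST / (FP_COST + FN_COST)   # 1/11, far below the default 0.5
preds = (model.predict_proba(X)[:, 1] >= threshold).astype(int)
```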

Dataiku screenshot of our trained model's metrics and explainability values.

We can test our model by using it to score the process-data-joined-new dataset, which contains all of the new data split out in the previous Flow zone.

Detecting Drifts in Our Data#

In addition to predicting defects with our trained model, we can also detect drifts in the injection time as a way to monitor product quality. We begin the Drift detection & alerting Flow zone by using a window recipe to compute the average and standard deviation of the injection time per recipe. Then we compute the Upper Control Limit as a distance from the average. Finally, we set an Alert variable to true whenever the injection time exceeds that limit. On its own, this raised far too many alerts, so we refined the monitoring by setting an alert-count threshold (currently set to 800) that must be passed before we consider a drift to be occurring. The generated alerts can be used in a scenario, described below, to send notifications to a maintenance team. The limits can be tuned to achieve optimal results on your own data.
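The control-limit logic can be sketched in pandas. This is an illustration, not the window recipe itself: the toy series and the 2-sigma distance are assumptions, and the solution computes the statistics per recipe exactly as shown here:

```python
import pandas as pd

# Toy injection-time series for a single recipe (values are illustrative)
df = pd.DataFrame({
    "recipe": ["A"] * 6,
    "injection_time": [10.0, 10.2, 9.9, 10.1, 10.0, 14.0],
})

# Average and standard deviation of injection time per recipe
stats = df.groupby("recipe")["injection_time"].agg(["mean", "std"])
df = df.join(stats, on="recipe")

# Upper Control Limit as a distance from the average (2 sigma here)
df["ucl"] = df["mean"] + 2 * df["std"]

# Raise an alert whenever the injection time exceeds the limit
df["alert"] = df["injection_time"] > df["ucl"]
```

In the Flow, a drift is only declared once the number of such alerts passes the configured threshold, which filters out isolated spikes.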

Dataiku screenshot showing drifts in the Injection Time of one Machine

Focusing on Relevant Updates#

When putting a solution like this into production, it’s handy to have a Flow zone like Last 24h data to perform faster data updates and scoring. The Flow zone takes process-data-joined-new_prepared as input and filters it to data from the last 24 hours. From this filtered dataset we create 3 branches, each with its own goal:

  • Compute the average Injection Time, Pressure, and Temperature for the last 24 hours.

  • Compute the defect rate per machine for the last 24 hours.

  • Score data from the last 24 hours using the previously trained and deployed model.

All of the resulting metrics from these branches are available in the Webapp for monitoring.
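The filtering step and the first two branches can be sketched with pandas; the reference time and toy rows are illustrative:

```python
import pandas as pd

now = pd.Timestamp("2023-01-02 12:00")  # stand-in for the current time
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 10:00", "2023-01-02 09:00", "2023-01-02 11:30"]),
    "machine_name": ["M1", "M1", "M2"],
    "injection_time": [10.1, 10.3, 9.8],
    "defect": [0, 1, 0],
})

# Keep only the rows from the last 24 hours
last_24h = df[df["timestamp"] >= now - pd.Timedelta(hours=24)]

# Branch 1: average process values over the window
avg_injection_time = last_24h["injection_time"].mean()

# Branch 2: defect rate per machine over the window
defect_rate = last_24h.groupby("machine_name")["defect"].mean()
```

The third branch simply scores `last_24h` with the deployed model, exactly as in the Defect prediction Flow zone.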

Dataiku screenshot of the Flow zone used to isolate and compute metrics for our last 24 hours of data

Visual Monitoring and Data Exploration#

This solution comes with 2 pre-built Dashboards to assist you in monitoring and controlling your production quality. The first Dashboard, Defects Prediction, uses Dataiku’s native charts to give a more detailed understanding of our production data and the model used to predict defects. This Dashboard is composed of 4 tabs:

  • Model eval + Feature importance: Provides some explainability values about the trained model.

  • What if: Includes a What if visualization for our model.

  • Injection time drifts: Shows the drift of injection time and examples of how charts specific to the performance and alerting of individual machines can be created.

  • Last 24h campaigns: Shows the distributions of the last 24 hours’ campaigns and whether any drifts have been detected.


If your Dataiku instance’s built-in Python environment is Python 2 instead of Python 3, there will be errors when trying to use the Explanations tab of the Dashboard. These errors can be fixed by retraining the model and deploying the newly trained model. There is no other impact on the performance of the solution.

Dataiku screenshot of the webapp delivered with this solution to monitor Product Quality

The second Dashboard, Production Quality Dashboard, contains the Webapp, which has been specifically designed with easy-to-understand, customizable visualizations. Out of the box, it displays common KPIs about total production and defect rates, as well as information on the top 3 most influential factors for defects.

Making this Solution Reactive to Real-Time Data#

For a solution like this to be usable for real-world production quality control, we need to think about automated alerting and streaming. The former has been mentioned several times already: the Flow is configured to support alert generation based on drifts in injection time. This solution comes with a scenario that runs every 900 seconds to check whether the alert limits have been crossed. The scenario can additionally be configured to send alerts via email, Slack, or Microsoft Teams. This shows the reactiveness of the solution to changing data but does not incorporate true streaming.
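The check the scenario performs can be sketched as plain Python. This is not the scenario's actual code; the function name and per-machine alert counts are hypothetical, and in production the message would go through a Dataiku reporter rather than being printed:

```python
# Alert-count threshold mirroring the one configured in the Flow (800 by default)
ALERT_LIMIT = 800

def drift_message(alerts_per_machine, limit=ALERT_LIMIT):
    """Return a notification message if any machine crossed the alert limit,
    or None when no drift is detected."""
    drifting = {m: n for m, n in alerts_per_machine.items() if n > limit}
    if not drifting:
        return None
    lines = [f"- {m}: {n} alerts (limit {limit})" for m, n in sorted(drifting.items())]
    return "Injection time drift detected:\n" + "\n".join(lines)

# Illustrative alert counts for two machines
msg = drift_message({"M1": 120, "M2": 950})
print(msg)  # → Injection time drift detected: / - M2: 950 alerts (limit 800)
```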

Going Further with Streaming#


Streaming is an experimental feature of Dataiku and not natively provided in this solution. An instance admin will need to activate Streaming on your Dataiku instance before the following is possible, and we recommend reaching out to us for assistance.

To reach a near real-time prediction, implement the Streaming component by connecting your stream-processing platform to the solution. Download the project and implement it in your existing Production Quality Control solution in three simple steps:

  1. Share the model and the dataset from your implemented Production Quality Control solution.

  2. Set up the streaming endpoints to fit your Kafka topics.

  3. Adjust the project’s variables to match your project, dataset, and model names.

Once this is done, your Flow will be updated, allowing you to get real-time alerts on whether a product has a defect based on your production data.
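The shape of this real-time loop can be sketched without a running Kafka cluster. Here a plain iterator stands in for the Kafka consumer and a stub function stands in for the deployed defect model; both are illustrative, not part of the solution:

```python
# In the real setup, records would arrive from a Kafka topic via a streaming
# endpoint; a list of dicts stands in for that consumer here.

def stub_model(record):
    """Stand-in for the deployed defect model: flag long injection times."""
    return 1 if record["injection_time"] > 12.0 else 0

def score_stream(records, model):
    """Yield (product_ID, prediction) for each incoming record."""
    for record in records:
        yield record["product_ID"], model(record)

stream = [
    {"product_ID": "P-001", "injection_time": 10.2},
    {"product_ID": "P-002", "injection_time": 13.1},
]
results = dict(score_stream(stream, stub_model))
print(results)  # → {'P-001': 0, 'P-002': 1}
```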

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable an understanding of how Dataiku can be used to improve the quality of your production and adjust your manufacturing processes in near real-time. By creating a singular solution that can benefit and influence the decisions of a variety of teams in a single organization, production efficiency can be improved and changes in production can be detected with sufficient lead time.

We’ve provided several suggestions on how to use your production data to improve production quality control, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.