Solution | Production Quality Control
Delivering products of consistent quality is a top priority in manufacturing, and for good reason: the consequences of a quality drop are numerous, ranging from increased production costs, pressure on supply chain management, and greater waste generation to reduced sales and even serious injuries or fatalities. Because quality drops alone can lastingly damage customer trust and company brand reputation, they demand tight surveillance. With Industry 4.0 accelerating the gathering of data across factories and supply chains, industrial companies have an opportunity to become faster and more flexible, producing higher-quality goods at reduced cost.
On this journey to efficiency, Production Quality Control offers production engineers, quality engineers, and maintenance teams a way to quickly integrate the forecasts of AI models into the surveillance of key manufacturing processes. Thanks to full AI explainability, this solution allows production engineers to identify the parameters that most influence the quality of their manufacturing processes. By moving to near real-time insights and developing scenario-based alerts, relevant operators receive timely notifications and can act early if any changes are detected. This is a needed step toward embedding AI in day-to-day operations.
The goal of this plug-and-play solution is to show Factory Management and Operations teams how Dataiku can be used to improve production quality and adjust manufacturing processes in near real time.
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 11+ instance.
Please also note that Dataiku instances whose built-in environment uses Python 2 instead of Python 3 will encounter errors when opening the Explanations tab of the Defects prediction Dashboard and will be unable to use the webapp. On such instances, users should create a basic Python 3 code env and set the project to use it.
The version of the Solution that’s available for download is built using filesystem-managed datasets. It’s possible to integrate Streaming with this project for real-time monitoring.
If the technical requirements are met, this solution can be installed in one of two ways:
On your Dataiku instance, click + New Project > Dataiku Solutions > Search for Production Quality Control.
Download the .zip project file and upload it directly to your Dataiku instance as a new project.
The data in this project represents an overmolding process in which silicone is injected into a toothbrush handle, resulting in a seamless combination of plastic and rubber. The injection process depends on the right control of injection pressure, temperature, and curing time to avoid defects. Although the delivered data is specific to toothbrush handle manufacturing, the use case represents a well-known factory problem: injection time drift caused by a machine in need of recalibration. Furthermore, thanks to the Dataiku Application and the generalized data model in this solution, it is adaptable to any production quality control use case and data.
The solution takes in 2 input data sources:
The process-data dataset contains measurements from the machines and is composed of 6+ columns:
timestamp: Timestamp of the manufacturing
product_ID: Unique identifier of a product
machine_name: Name of the machine
Recipe: Type of manufactured part
Campaign_ID: Unique identifier of the campaigns
process_parameter_1: Process parameter
process_parameter_n: Process parameter
As illustrated, you can add as many columns to this dataset as you have process parameters. These parameters will be used to train the machine learning model and will be ranked by how strongly they influence quality.
The quality-data dataset contains a simplified view of the quality results, composed of 2 columns:
product_ID: Unique identifier of a product
defect: Indicates if the parts have defects or not
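To make the data model concrete, here is a minimal synthetic sample conforming to the two schemas above, sketched in pandas. All values and the extra process-parameter names are invented for illustration; the only structural assumption is that the two datasets share product_ID as their join key, as the data model indicates.

```python
import pandas as pd

# Hypothetical process-data rows; column names follow the data model,
# values are invented for illustration.
process_data = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 08:00:00", "2023-01-01 08:01:00"]),
    "product_ID": ["P-0001", "P-0002"],
    "machine_name": ["press_01", "press_02"],
    "Recipe": ["handle_A", "handle_A"],
    "Campaign_ID": ["C-01", "C-01"],
    "process_parameter_1": [12.3, 12.7],    # e.g., injection pressure
    "process_parameter_2": [180.0, 181.5],  # e.g., temperature
})

# quality-data: one row per product with a binary defect flag
quality_data = pd.DataFrame({
    "product_ID": ["P-0001", "P-0002"],
    "defect": [0, 1],
})

# The two datasets line up one-to-one on product_ID
joined = process_data.merge(quality_data, on="product_ID", how="left")
print(joined.shape)  # (2, 8)
```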
You can follow along with the solution in the Dataiku gallery.
The project has the following high-level steps:
Connect your data as input and select your analysis parameters via the Dataiku Application.
Ingest and prepare the data.
Train and score a defect prediction model.
Detect and raise alerts for drifts in injection time.
Predict defects based on data from the last 24 hours.
Evaluate the model, identify injection drifts, and analyze production quality with pre-built Dashboards.
Automate the Solution for continuous, near real-time analysis.
In-depth technical details can be found in the wiki of the project.
To begin, you will need to create a new Production Quality Control Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.
Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in pre-built user-friendly Dashboards. You could also instantiate multiple Production Quality Control projects to monitor multiple manufacturing processes in parallel.
To begin, upload your Process and Quality data as CSV files. These files must conform to the data model detailed in the Data Requirements section of this article. Once the two files are uploaded, clicking the green INTEGRATE NOW button loads the schema and column list for the next step. Next, specify the columns from the uploaded datasets that you want included in the Flow (e.g., equipment ID, batch ID) and enter the date on which to split the data between historical and new data. Pressing RUN builds the full Flow along with the Defects prediction Dashboard. To enable the Production Quality Control dashboard, press the subsequent LAUNCH BACKEND button. Before moving on to the dashboards, a final section of the Dataiku App lets you configure the drift detection alerting system, which is important for continuous monitoring of the in-production Solution. Here you can edit the reporter through which you want to receive alerts and, optionally, define a custom parameter for the machine learning algorithm to monitor.
Once all elements of the Dataiku Application have been built, you can either continue to the Project View to explore the generated datasets or go straight to the Dashboards and Webapp to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, you can skip the following few sections.
The first two Flow zones of our project are fairly straightforward. We begin in the Data Ingestion Flow zone by bringing in 2 initial datasets that are detailed in the previous Data Requirements section.
Now that we have access to all of our input datasets, we can move along to the Data preparation Flow zone. We first round values in the process-data dataset before joining it with quality-data on the product identifier. The joined dataset is the source for several charts available in the Dashboard, and three data preparation branches start from it:
Computation of the overall defect rate per machine (top branch) used by the Webapp to display the defect rate per machine
Resampling of the historical data (middle branch) used to train the models
Computation of the defects and number of injections per day (bottom branch) used by the Webapp to display the daily production rate for the past 7 days
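In Dataiku these branches are visual recipes, but the underlying aggregations can be sketched in pandas. The column names and sample values below are illustrative assumptions, not the project's exact schema.

```python
import pandas as pd

# Tiny stand-in for the joined process/quality dataset
joined = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 08:00", "2023-01-01 09:00", "2023-01-02 08:00", "2023-01-02 09:00"]
    ),
    "machine_name": ["press_01", "press_01", "press_02", "press_02"],
    "defect": [0, 1, 0, 0],
})

# Top branch: overall defect rate per machine (displayed in the Webapp)
defect_rate_per_machine = joined.groupby("machine_name")["defect"].mean()

# Bottom branch: defects and number of injections per day
daily = (
    joined.set_index("timestamp")
    .resample("D")["defect"]
    .agg(["sum", "count"])
    .rename(columns={"sum": "defects", "count": "injections"})
)

print(defect_rate_per_machine["press_01"])  # 0.5
```

The middle branch (resampling historical data to balance classes before training) is shown in the next section.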
Having created the process-data-joined-historical-resampled dataset, we are ready to train and score a model for predicting defects in the Defect prediction Flow zone. The training data was downsampled to a 50/50 class ratio, which is quite aggressive but biases the model toward false positives rather than false negatives. When training the model, we also used the cost matrix available in the visual ML recipe to optimize for the relative cost of misclassifications. Ultimately we chose to deploy a Random Forest model which, despite taking five times longer to train than the XGBoost model, performed better.
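The 50/50 downsampling mentioned above can be sketched as follows: keep every defective row and sample an equal number of non-defective rows. This is an illustrative pandas sketch with synthetic values, not the project's actual resampling recipe.

```python
import pandas as pd

# Synthetic historical data with a heavy class imbalance
historical = pd.DataFrame({
    "injection_time": [1.0, 1.1, 1.2, 1.3, 1.4, 2.5, 2.6, 2.7, 1.05, 1.15],
    "defect":         [0,   0,   0,   0,   0,   1,   1,   1,   0,    0],
})

defects = historical[historical["defect"] == 1]
# Sample as many non-defective rows as there are defective ones
non_defects = historical[historical["defect"] == 0].sample(
    n=len(defects), random_state=42
)
balanced = pd.concat([defects, non_defects])

print(balanced["defect"].mean())  # 0.5 — classes are now balanced
```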
We can test our model by using it to score the process-data-joined-new dataset, which contains all of the new data split out in the previous Flow zone.
In addition to predicting defects with our trained model, we can also detect drifts in the injection time as a way to monitor product quality. The Drift detection & alerting Flow zone begins with a window recipe that computes the average and standard deviation of the injection time per recipe. We then compute the Upper Control Limit as a distance from the average and set an Alert variable to true whenever the injection time exceeds this limit. On its own, this raised far too many alerts, so we refined the monitoring by requiring the number of alerts to pass a threshold (currently set to 800) before considering a drift to be occurring. The generated alerts can be used in a Scenario, described below, to send notifications to a maintenance team. The limits can be tuned to achieve optimal results on your own data.
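The control-limit logic can be sketched like this. The baseline window, the 3-sigma distance, and the tiny alert threshold are assumptions chosen to make the sample work; the actual Solution uses a window recipe and a threshold of 800.

```python
import pandas as pd

# Synthetic injection times for one recipe; the last point drifts upward
df = pd.DataFrame({
    "recipe": ["A"] * 6,
    "injection_time": [1.0, 1.02, 0.98, 1.01, 1.0, 1.6],
})

# Baseline statistics per recipe from in-control history (first 5 rows here)
baseline = df.iloc[:5]
stats = baseline.groupby("recipe")["injection_time"].agg(["mean", "std"])
df = df.join(stats, on="recipe")

K = 3  # Upper Control Limit distance in standard deviations (assumed)
df["ucl"] = df["mean"] + K * df["std"]
df["alert"] = df["injection_time"] > df["ucl"]

# Only declare a drift once enough alerts accumulate
ALERT_THRESHOLD = 1  # the Solution uses 800; lowered for this tiny sample
drift_detected = int(df["alert"].sum()) >= ALERT_THRESHOLD
print(drift_detected)  # True — the 1.6s injection breaches the limit
```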
When putting a Solution like this into production, it’s handy to have a Flow zone like Last 24h data to perform faster data updates and scoring. As input to this Flow zone, we take process-data-joined-new_prepared and filter it to focus on data from the last 24 hours. From this filtered dataset we create three branches, each with its own goal:
Compute the average Injection Time, Pressure, and Temperature for the last 24 hours
Compute the defect rate per machine for the last 24 hours
Score data from the last 24 hours using the previously trained and deployed model.
All of the resulting metrics from these branches are available in the Webapp for monitoring.
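The filter-then-aggregate pattern behind the first two branches can be sketched in pandas (the model-scoring branch is omitted since it depends on the deployed model). Column names and timestamps below are illustrative assumptions.

```python
import pandas as pd

# Fixed "now" so the example is reproducible
now = pd.Timestamp("2023-01-02 12:00:00")

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 06:00", "2023-01-02 06:00", "2023-01-02 10:00"]
    ),
    "machine_name": ["press_01", "press_01", "press_02"],
    "injection_time": [1.0, 1.1, 1.2],
    "defect": [0, 1, 0],
})

# Keep only rows from the last 24 hours
last_24h = df[df["timestamp"] >= now - pd.Timedelta(hours=24)]

# Branch 1: average process values over the window
avg_injection_time = last_24h["injection_time"].mean()

# Branch 2: defect rate per machine over the window
defect_rate = last_24h.groupby("machine_name")["defect"].mean()
print(avg_injection_time)  # 1.15
```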
This Solution comes with two pre-built Dashboards to assist you in monitoring and controlling your production quality. The first Dashboard, Defects Prediction, uses Dataiku’s native charts to give a more detailed understanding of the production data and the model used to predict defects. This Dashboard is composed of 4 tabs:
Model eval + Feature importance: provides some explainability values about the trained model
Explanations: includes a What if visualization for our model
Injection time drifts: shows the drift of injection time and examples of how charts specific to the performance and alerting of individual machines can be created
Last 24h campaigns: shows the distributions of the last 24 hours’ campaigns and whether any drifts have been detected.
If your Dataiku instance’s built-in Python environment is Python 2 instead of Python 3, you will see errors when using the Explanations tab of the Dashboard. These errors can be fixed by retraining the model and deploying the newly trained model; there is no other impact on the performance of the Solution.
The second Dashboard, Production Quality Dashboard, contains the Webapp, which has been designed with easy-to-understand, customizable visualizations. Out of the box, it displays common KPIs about total production and defect rates, as well as the top three most influential factors for defects.
For a Solution like this to be usable for real-world production quality control, we need to think about automated alerting and Streaming. The former has been mentioned several times already: the Flow is configured to generate alerts based on drifts in injection time, and the Solution comes with a Scenario that runs every 900 seconds to check whether the alert limits have been crossed. The Scenario can additionally be configured to send alerts via email, Slack, or Microsoft Teams. This demonstrates the Solution’s responsiveness to changing data but does not incorporate true Streaming.
Streaming is an experimental feature of Dataiku and is not natively provided in this Solution. An instance admin will need to activate Streaming on your Dataiku instance before the following is possible, and we recommend reaching out to us for assistance.
To reach near real-time predictions, implement the Streaming component by connecting your stream-processing platform to the solution. Download the project and implement it in your existing Production Quality Control solution in three simple steps:
Share the model and the dataset from your implemented Production Quality Control solution
Set up the streaming endpoints to fit your Kafka topics
Adjust the project’s variables to fit the project, dataset, and model names
Once this is done, your Flow will be updated, allowing you to receive real-time alerts on whether a product has a defect based on your production data.
The intent of this project is to enable an understanding of how Dataiku can be used to improve the quality of your production and adjust your manufacturing processes in near real-time. By creating a singular solution that can benefit and influence the decisions of a variety of teams in a single organization, production efficiency can be improved and changes in production can be detected with sufficient lead time.
We’ve provided several suggestions on how to use your production data to improve production quality control, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services are available on demand.