Solution | Parameters Analyzer#

Overview#

Business case#

For the average manufacturer, the cost of poor quality and downtime equates to more than 20% of total sales. Thus, it’s critical for manufacturing operations teams to identify how best to run their equipment for the desired outcome and to equip their operators with the right information at the right time. This means understanding ideal startup settings and equipment operating parameters, with product quality as a continuous objective.

Often, these insights are elusive. Machines have a plethora of settings and, in a high-volume production line, several daily changeovers. Optimal machine settings are highly specific to each individual piece of equipment and are also significantly influenced by external factors such as the quality of input materials. Additionally, paper checklists and disparate Excel worksheets remain the norm despite investments made in building strong data collection systems.

With the Dataiku Parameters Analyzer solution, operations teams can adopt a repeatable, scalable, and automated process for understanding critical control parameters and their ranges. Process engineers and quality engineers benefit from an intuitive analysis tool that uses machine learning and AI to mine data, understand parameter adjustments, and make documented decisions that enhance data-driven work in manufacturing operations. By understanding equipment performance and better defining ideal control and startup settings, teams take a key step towards putting data to work for yield improvement and overall quality optimization.

Installation#

This Solution is currently in a private preview phase. If you’re interested in accessing this Solution, please reach out to your Dataiku account manager or use the general Contact Us form.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+ instance

Data Requirements#

In this solution, two data models can be used:

  • (Recommended) Process parameters and Event information datasets

  • Process events-aggregated dataset: It’s necessary to have data aggregated to the events, not time series data.

Process events-aggregated dataset#

The solution can also work with a single dataset aggregated at the process-event level, for example per product. The data must be aggregated to events rather than being raw time series.

Here, only one dataset is expected: the Product Database dataset.

This dataset includes aggregated process information at the event level, along with outcome measurements, such as quality measurements. The schema is quite flexible, facilitating a variety of specific use cases. However, it has the following mandatory requirements:

  • A product identifier

  • At least one date related to the manufacturing process

  • At least one process measure or parameter

The general schema of the database looks like this:

  • product_identifier (string): Unique identifier of a product

  • identifier_1 … identifier_n (string): Various identifiers related to the product or process

  • process_parameter_1 … process_parameter_n (varies): Several process parameters

  • process_outcome_1 … process_outcome_n (decimal, integer, boolean, etc.): Several process outcomes

  • date_1 … date_n (Date): Several dates related to the manufacturing process

  • Identifiers are unique values that allow us to pinpoint a specific entity, be it a product, a process, a machine, etc.

  • Process parameters are values that can be adjusted during the manufacturing process and can impact the end quality of the product.

  • Process measures are event-aggregated properties recorded from the product or process — for example, weight, length, temperature, etc.

  • Dates are timestamps related to the manufacturing process, such as the start date of the process, end date, inspection date, etc.

You can add as many columns as necessary for identifiers, process parameters, process measures, and dates. However, dates are not analyzed as part of this solution.
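
To make this data model concrete, below is a minimal, hypothetical example of an event-aggregated dataset built with pandas. Apart from product_identifier, the column names (machine_id, oven_temperature, measured_length_mm, defect_count, production_date) are purely illustrative and are not required by the solution.

import pandas as pd

# Hypothetical event-aggregated product database: one row per event (here, per product),
# with an identifier, process parameters/measures, an outcome, and a manufacturing date.
product_database = pd.DataFrame({
    "product_identifier": ["P-0001", "P-0002", "P-0003"],  # mandatory: unique product identifier
    "machine_id": ["M-01", "M-01", "M-02"],                 # optional additional identifier
    "oven_temperature": [221.5, 219.8, 224.1],              # process parameter (adjustable setting)
    "measured_length_mm": [120.2, 119.7, 121.4],            # process measure (recorded property)
    "defect_count": [0, 2, 1],                              # process outcome (e.g. quality result)
    "production_date": pd.to_datetime([
        "2023-05-02 08:14:00", "2023-05-02 09:03:00", "2023-05-02 09:47:00",
    ]),                                                      # mandatory: a date of the process
})

print(product_database.dtypes)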

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project flow.

The project has the following high-level steps:

  1. Connect your data as input, and select your analysis parameters via the Dataiku Application.

  2. Run your analysis in the web application.

  3. Get the saved rules from the last tab of the web application.

Walkthrough#

Note

In addition to reading this document, we recommend reading the project wiki before beginning, to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Dataiku Application setup#

To begin, you must create a new Dataiku Application instance. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.

It will help you connect your data regardless of the connection type and seamlessly configure the whole Flow according to your specific parameters.

Step 1: Input data configuration#

Option A: Upload a dataset

Option A.

Drag and drop a file in any supported format that respects the event-aggregated data model.

You can review the auto-inferred schema by clicking on View settings of dataset product_database_upload.

Regarding dates: if Dataiku does not automatically recognize one or more date columns, the most common date formats are handled when you click the Load & Check button. The supported formats include:

yyyy-MM-dd HH:mm:ss
yyyy-MM-dd'T'HH:mm:ss
dd-MM-yyyy HH:mm:ss
yyyy/MM/dd
dd/MM/yyyy
dd-MM-yyyy
dd-MM-yy
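
If you want to sanity-check a date column before uploading, the sketch below maps the patterns above to approximate Python strptime equivalents. The mapping and helper function are illustrative only; they are not part of the solution.

from datetime import datetime

# Approximate strptime equivalents of the common date patterns listed above.
FORMAT_MAP = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd'T'HH:mm:ss": "%Y-%m-%dT%H:%M:%S",
    "dd-MM-yyyy HH:mm:ss": "%d-%m-%Y %H:%M:%S",
    "yyyy/MM/dd": "%Y/%m/%d",
    "dd/MM/yyyy": "%d/%m/%Y",
    "dd-MM-yyyy": "%d-%m-%Y",
    "dd-MM-yy": "%d-%m-%y",
}

def matches_common_format(value: str) -> bool:
    """Return True if the string parses with one of the common formats."""
    for fmt in FORMAT_MAP.values():
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False

print(matches_common_format("2023-05-02 08:14:00"))  # True
print(matches_common_format("May 2, 2023"))          # False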

(Recommended) Option B: Connect to an existing database respecting the two-dataset data model

Option B.
  1. Select your connections:

    • Select the connection that gives access to the two input datasets.

    • Click on the Reconfigure button.

  2. Now that the connections are configured, you need to select the table/file that should be used as inputs within this connection:

    • Click on process_parameters. It will open the dataset configuration window, where you can select the table or file (depending on your connection type) that will be used as process parameters data. Once selected, click on Save in the top right corner, and come back to the application configuration page by clicking on the name of the Dataiku application instance.

    • Do the same with events_informations to connect sensor data.

Option C: Connect to an existing event-aggregated database

Option C.

It follows the same logic as Option B, but with a single dataset:

  1. Select your connections:

    • Select the connection that gives access to the input dataset.

    • Click on the Reconfigure button.

  2. Now that the connection is configured, you need to select the table/file that should be used as input within this connection:

    • Click on product_database. It will open the dataset configuration window, where you can select the table or file (depending on your connection type) that will be used as the event-aggregated product data. Once selected, click on Save in the top right corner, and come back to the application configuration page by clicking on the name of the Dataiku application instance.

Step 2: Analysis preparation#

To prepare the analysis, you need to define a reference date, which you can select from the menu. If you don’t see your variables, click on the Refresh button.

You can also add filters here by clicking on Edit filters. This opens a Filter recipe configuration, in which you can define as many filters as you want. Once selected, click on Save in the top right corner, and come back to the application configuration page by clicking on the name of the Dataiku application instance.

Analysis setup.

An optional step is the analysis of empty values, which computes a report showing the percentage of empty values per column. Once computed, you can click on Open Report to see it; it will open in a new tab. You can then optionally enter a percentage of empty values above which columns are automatically filtered out.

Empty values analysis.
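
Conceptually, this report is equivalent to computing the share of null values per column and dropping the columns above the chosen threshold. The pandas sketch below illustrates the idea; the function names and the 30% threshold are hypothetical and do not reflect the solution’s internal implementation.

import pandas as pd

def empty_value_report(df: pd.DataFrame) -> pd.Series:
    """Percentage of empty (null) values per column, highest first."""
    return df.isna().mean().mul(100).sort_values(ascending=False)

def drop_mostly_empty_columns(df: pd.DataFrame, max_empty_pct: float) -> pd.DataFrame:
    """Drop the columns whose share of empty values exceeds the threshold."""
    report = empty_value_report(df)
    kept_columns = [col for col in df.columns if report[col] <= max_empty_pct]
    return df[kept_columns]

# Example: keep only the columns with at most 30% empty values.
# filtered = drop_mostly_empty_columns(product_database, max_empty_pct=30)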

Webapp usage#

Once the Dataiku Application is configured, the web application should be easy to use.

Launch a new analysis#

Target definition

In the period section, select the range of the reference date chosen in the Dataiku Application. By default, the entire range of this date is selected.

Then, select a target variable, the outcome you want to analyze, and click Apply.

If you have a limited number of points to display (<50,000), you’ll see the time series chart that will help you select the outcome OK range:

Target definition.

You can now select the OK range of your target variable with the slider on the left and click Apply. The chart will be colored, and you’ll get the target variable distribution:

Target definition.
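
Conceptually, choosing an OK range splits the target variable into OK and NOK points, and the NOK rate is simply the share of points outside the range. The sketch below illustrates this labeling with pandas; the column name and range values are hypothetical.

import pandas as pd

def label_nok(target: pd.Series, ok_min: float, ok_max: float) -> pd.Series:
    """Flag a point as NOK (True) when the target falls outside the OK range."""
    return ~target.between(ok_min, ok_max)

# Hypothetical target variable and OK range:
# nok = label_nok(product_database["measured_length_mm"], ok_min=119.0, ok_max=121.0)
# print(f"Average NOK rate: {nok.mean():.1%}")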

Variables selection & results

The next step is in the second tab on the left, in which you’ll select the variables to analyze. You can type directly into this field to filter variables in and out. Once your variable selection is done, you can adjust the number of classes from 10 to 30; this is the number of points displayed on the charts.

Click on Run to get the results: the variables with the most impact on the selected outcome are ranked by a correlation factor. On each chart, you have:

  • The name and type of the variable in the title

  • The average NOK rate: the percentage of NOK points calculated on non-empty rows

  • The rank and correlation of this variable

In the default bubble charts, the size of a bubble shows the number of points it contains; the horizontal line through the bubble shows the interval; the dashed line shows the variable’s average NOK rate; and a bubble is colored green if the NOK rate of its interval is below the average NOK rate.

The last option is to display the charts in expert mode, a histogram view of the same information: the gray histogram shows the number of points, and the bars start from the average NOK rate line, pointing upward if the defect rate is higher than the average NOK rate and downward if it is lower.

Below is a comparison of the two possible charts:

Standard mode chart

Expert mode chart
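
Both chart modes rest on the same underlying computation: each variable is binned into the chosen number of classes, and the NOK rate of every interval is compared with the variable’s average NOK rate computed on non-empty rows. The pandas sketch below approximates that logic; it is an illustration under assumed names, not the solution’s actual code.

import pandas as pd

def interval_nok_rates(values: pd.Series, nok: pd.Series, n_classes: int = 20) -> pd.DataFrame:
    """Bin a numeric variable into classes and compare each interval's NOK rate
    with the variable's average NOK rate (computed on non-empty rows)."""
    mask = values.notna()
    values, nok = values[mask], nok[mask]
    avg_nok_rate = nok.mean()

    intervals = pd.cut(values, bins=n_classes)
    summary = (
        pd.DataFrame({"interval": intervals, "nok": nok})
        .groupby("interval", observed=True)
        .agg(n_points=("nok", "size"), nok_rate=("nok", "mean"))
        .reset_index()
    )
    summary["avg_nok_rate"] = avg_nok_rate
    summary["below_average"] = summary["nok_rate"] < avg_nok_rate  # shown in green in standard mode
    return summary

# Hypothetical usage with the variables defined earlier:
# result = interval_nok_rates(product_database["oven_temperature"], nok, n_classes=20)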

Define a new rule#

Once the results have been obtained, you can start defining a rule, that is, a combination of several parameter intervals. Just select a variable, adjust the range (or select the modalities for a categorical variable), and click on Save Range.

Define a new rule.

At the top, you’ll have:

  • The initial population (number of points and NOK rate in the analysis)

  • The currently selected population (number of points and NOK rate in the selected ranges)

  • The ratio: the percentage of the initial population that falls within the selected ranges, and the percentage of improvement within them

We always show the result of the combination of the selected ranges.
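
Conceptually, a rule is the intersection of the saved ranges, and the figures above compare the resulting sub-population with the initial one. The sketch below approximates that computation; the function name, the rule format, and the column names are hypothetical.

import pandas as pd

def evaluate_rule(df: pd.DataFrame, nok: pd.Series, rule: dict) -> dict:
    """Combine several parameter intervals (a rule, mapping column -> (min, max))
    and compare the selected population with the initial one."""
    selected = pd.Series(True, index=df.index)
    for column, (low, high) in rule.items():
        selected &= df[column].between(low, high)

    initial_nok_rate = nok.mean()
    selected_nok_rate = nok[selected].mean()
    return {
        "initial_points": len(df),
        "initial_nok_rate": initial_nok_rate,
        "selected_points": int(selected.sum()),
        "selected_nok_rate": selected_nok_rate,
        "population_ratio": selected.mean(),                   # share of the initial population kept
        "improvement": initial_nok_rate - selected_nok_rate,   # reduction of the NOK rate in the selection
    }

# Hypothetical rule combining two parameter intervals:
# evaluate_rule(product_database, nok, {"oven_temperature": (219, 223), "measured_length_mm": (119, 121)})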

Explore saved studies and work on a copy#

Instead of launching a new analysis, you can check the existing saved studies by going directly to the fourth tab. Select the study you want to display and, from there, work on a copy. This will reload the study with the saved analysis parameters.

Explore saved studies and work on a copy.

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this solution is to show how process experts, such as process and quality engineers, can use Dataiku to analyze the impact of process parameter changes.

By creating a single solution that can benefit and influence the decisions of a variety of teams in an organization, you can implement strategies to better understand your manufacturing process, know and control the parameters with the most impact, and ultimately drive continuous improvement.