Solution | Predictive Maintenance#

Overview#

Business Case#

Equipment reliability is crucial for manufacturers: it underpins responsible, safe, and consistent production. Maintenance is key to mitigating unplanned downtime and ensuring safe, continuous operation. However, finding the right time for maintenance means balancing operational and cost objectives.

In many industries, maintenance is either reactive or driven by excessive time-based preventative routines. Both options erode asset performance and lower operational efficiency, costing billions each year.

Time-based preventative maintenance and reactive firefighting are the two default strategies, but they no longer need to be the norm. Using AI and ML, manufacturers can refine their maintenance tactics by leveraging service history and equipment attributes. Techniques such as survival analysis transform static time-based maintenance schedules into tailored plans that reflect each asset's true risk of mechanical failure.

With Dataiku’s Predictive Maintenance solution, organizations can quickly turn vast volumes of maintenance history into optimized maintenance plans. Using common performance metrics such as mean time between failures (MTBF), mean time to repair (MTTR), and task Paretos, reliability engineers can explore their fleet's behavior with descriptive analytics. ML algorithms estimate remaining useful life from maintenance history and recommend a maintenance schedule per asset, allowing service managers to adjust their strategies. Whether for internal equipment maintenance or improving customer service, Dataiku’s Predictive Maintenance solution enables organizations to promptly revisit their manufacturing strategies.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+* instance.

  • Have a Python 3.8 code environment named solution_predictive-maintenance with the following required packages:

lifelines==0.26.4
nbformat==5.9.1
numpy==1.23.5
pandas==1.0.5
plotly-calplot==0.1.16
plotly==5.15.0
scikit-learn==0.23.2

Data Requirements#

The solution takes in two input data sources.

Dataset: maintenance_operations

Description: Contains logs of all the maintenance operations in the following format:

  • equipment_id (string): ID of the machine

  • equipment_stop_time (date): Start of maintenance

  • equipment_restart_time (date): End of maintenance

  • is_planned (boolean): Indicates planned maintenance

  • maintenance_operation (string): Maintenance category. Entries should be predefined types or part names, not raw text.

Dataset: equipment_information

Description: Encompasses pertinent static details of each piece of equipment. There should be one row per equipment, and date columns should be pre-parsed to achieve the following data model:

  • equipment_id (string): Equipment’s unique ID

  • XXX (string / date / boolean / float): Include any additional columns essential for predictive maintenance analysis. Feel free to add multiple columns as necessary.
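Before connecting data, it can help to check that each log row matches the maintenance_operations data model described above. The following is a minimal Python sketch that uses only the standard library. The column names come from the data model; the function name validate_operation, the sample values, and the rule that a restart cannot precede a stop are illustrative assumptions, not part of the solution.

```python
from datetime import datetime

# Column names and types follow the maintenance_operations data model.
REQUIRED_FIELDS = {
    "equipment_id": str,
    "equipment_stop_time": datetime,
    "equipment_restart_time": datetime,
    "is_planned": bool,
    "maintenance_operation": str,
}


def validate_operation(row):
    """Return a list of problems found in one maintenance_operations row."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    # Assumed sanity rule: maintenance cannot end before it starts.
    if not problems and row["equipment_restart_time"] < row["equipment_stop_time"]:
        problems.append("equipment_restart_time is before equipment_stop_time")
    return problems


# Hypothetical sample row.
good_row = {
    "equipment_id": "P-01",
    "equipment_stop_time": datetime(2024, 1, 11, 8, 0),
    "equipment_restart_time": datetime(2024, 1, 11, 14, 0),
    "is_planned": False,
    "maintenance_operation": "bearing",
}
bad_row = dict(good_row, equipment_restart_time=datetime(2024, 1, 10, 8, 0))

print(validate_operation(good_row))
print(validate_operation(bad_row))
```

An empty list means the row passed every check; any other result names the column or rule that failed.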

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Connect your data as input and select your analysis parameters via the Dataiku Application.

  2. Compute and incorporate features for survival analysis into our data.

  3. Train the survival model.

  4. Execute effective maintenance management with the support of pre-built dashboards.

Walkthrough#

Note

In addition to reading this document, we recommend reading the project wiki before you begin. It gives a deeper technical understanding of how this Solution was created and explains Solution-specific vocabulary in more detail.

Define planned vs. unplanned maintenance goals via a Dataiku Application#

To begin, create a new Predictive Maintenance Dataiku Application instance: select the Dataiku Application from your instance home and click Create App Instance.

Warning

This solution has been designed to be usable through the Dataiku Application delivered with it. Users should not try to use this solution by interacting with the project itself.

Once the new instance has been created, you can walk through the application steps to add your data and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in a user-friendly webapp. You can also instantiate multiple Predictive Maintenance projects to compare your feature engineering and modeling approaches.

Dataiku screenshot of part of the Dataiku Application for Predictive Maintenance

Start by selecting the connection for your input datasets. For example, if your dataset is stored as a local file, you would select the “filesystem_managed” connection. Once the connections are established, you will select the specific table or file within the connection that should be used as input data. This process is done by clicking on the respective dataset names and selecting the relevant tables or files in the dataset configuration window. Remember to save your changes.

In the Predictive Maintenance Application, you can set key parameters. First, the Planned Maintenance Ratio target defines your ideal balance of planned versus unplanned maintenance operations. Second, the Max Maintenance Interval is the longest period a machine can run before needing planned maintenance. If the model predicts a longer interval, the value you set here overrides it, so choose it with your operational needs and risk tolerance in mind.

You should also choose equipment attributes that might affect the risk of maintenance. These selections help refine the predictions of the model. After setting these parameters, click on the Run button to start the analysis.
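To make the interplay between the two parameters concrete, here is a small Python sketch. This document does not describe how the solution combines them internally, so the logic below is an assumption: it picks the longest interval at which the survival probability is still at or above the Planned Maintenance Ratio target, then caps it at the Max Maintenance Interval. The function name and the sample curve are hypothetical.

```python
def recommend_interval(survival_curve, planned_ratio_target, max_interval_days):
    """Pick a maintenance interval from a survival curve.

    survival_curve: list of (days, survival_probability), sorted by days.
    Assumed rule: keep the longest interval whose survival probability
    still meets the target, then cap it at max_interval_days.
    """
    recommended = 0
    for days, probability in survival_curve:
        if probability < planned_ratio_target:
            break
        recommended = days
    return min(recommended, max_interval_days)


# Hypothetical survival curve for one machine.
curve = [(30, 0.98), (60, 0.93), (90, 0.85), (120, 0.70), (150, 0.55)]

print(recommend_interval(curve, planned_ratio_target=0.8, max_interval_days=365))
print(recommend_interval(curve, planned_ratio_target=0.8, max_interval_days=60))
```

In the first call the curve drives the result; in the second, the cap overrides the longer model-based interval.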

Censored Events and Uptime: Getting our data ready for Survival Analysis#

In the context of survival analysis, censored values represent instances where the precise time to event (in this case, maintenance or failure of equipment) is not known. This could occur when a machine is still functional at the end of the observation period or when a machine is taken offline for planned maintenance. Because the machine didn’t fail, these instances are treated as censored data, indicating that the actual time to failure is potentially longer than what is observed.

To handle these censored values effectively, “boundary events” are computed. These are introduced at the start and end of the observation period and are marked as planned maintenance in the dataset, which delimits periods of equipment uptime without recorded maintenance events. Incorporating boundary events is critical for capturing all uptime periods, particularly those that overlap with the observation boundaries, and makes the survival probability curve more comprehensive and accurate.
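The idea can be sketched in a few lines of Python. This is an illustration, not the solution's actual recipe: the function name add_boundary_events, the zero-duration representation of a boundary, and the "boundary_event" label are all assumptions.

```python
from datetime import datetime


def add_boundary_events(events, observation_start, observation_end):
    """Add one planned, zero-duration event per machine at each boundary."""
    equipment_ids = sorted({event["equipment_id"] for event in events})
    boundaries = []
    for equipment_id in equipment_ids:
        for boundary_time in (observation_start, observation_end):
            boundaries.append({
                "equipment_id": equipment_id,
                "equipment_stop_time": boundary_time,
                "equipment_restart_time": boundary_time,
                "is_planned": True,  # planned, so treated as censored
                "maintenance_operation": "boundary_event",  # hypothetical label
            })
    combined = events + boundaries
    combined.sort(key=lambda e: (e["equipment_id"], e["equipment_stop_time"]))
    return combined


# Hypothetical log: a single unplanned stop during January 2024.
log = [{
    "equipment_id": "P-01",
    "equipment_stop_time": datetime(2024, 1, 11),
    "equipment_restart_time": datetime(2024, 1, 12),
    "is_planned": False,
    "maintenance_operation": "bearing",
}]

combined = add_boundary_events(log, datetime(2024, 1, 1), datetime(2024, 1, 31))
for event in combined:
    print(event["equipment_stop_time"].date(), event["maintenance_operation"])
```

With the boundaries in place, the uptime before the first recorded stop and after the last recorded restart is no longer lost.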

Dataiku screenshot of identified boundary events in our historical data

Once all the boundary events have been computed and concatenated with the maintenance logs, we can compute the uptimes for each machine in two Flow zones:

  1. Analysis by equipment: This Flow zone focuses on the time to maintenance for each piece of equipment. It computes uptime and handles boundary events to generate a comprehensive survival probability curve for each machine.

  2. Analysis by equipment & maintenance operation: This zone emphasizes the time to maintenance for each maintenance operation. It incorporates uptime and boundary events to assess the likelihood of each maintenance operation being required over time.
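As a rough sketch of the per-equipment uptime computation, the Python below measures the time between one event's restart and the next event's stop, and flags whether that stop was an observed failure. It assumes that an unplanned stop counts as a failure and a planned stop (including a boundary event) counts as censored; the function name and sample data are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def compute_uptimes(events):
    """Return one uptime record per gap between consecutive events of a machine."""
    ordered = sorted(events, key=lambda e: (e["equipment_id"], e["equipment_stop_time"]))
    uptimes = []
    for equipment_id, group in groupby(ordered, key=lambda e: e["equipment_id"]):
        group = list(group)
        for previous, current in zip(group, group[1:]):
            delta = current["equipment_stop_time"] - previous["equipment_restart_time"]
            uptimes.append({
                "equipment_id": equipment_id,
                "uptime_days": delta.total_seconds() / 86400,
                # Unplanned stop: observed failure. Planned stop: censored.
                "failure_observed": not current["is_planned"],
            })
    return uptimes


def make_event(stop, restart, is_planned):
    return {"equipment_id": "P-01", "equipment_stop_time": stop,
            "equipment_restart_time": restart, "is_planned": is_planned}


# Hypothetical log with boundary events at both ends of January 2024.
events = [
    make_event(datetime(2024, 1, 1), datetime(2024, 1, 1), True),
    make_event(datetime(2024, 1, 11), datetime(2024, 1, 12), False),
    make_event(datetime(2024, 1, 31), datetime(2024, 1, 31), True),
]

uptimes = compute_uptimes(events)
for record in uptimes:
    print(record)
```

The analysis by equipment and maintenance operation would follow the same pattern, grouping by both equipment and operation instead of equipment alone.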

Anticipating Machine Failure: Training a Survival Analysis Model for Predictive Maintenance#

Survival analysis, a statistical technique used to evaluate time to event, is particularly pertinent in predictive maintenance applications. It deals effectively with censored data, a common occurrence in maintenance scenarios, such as when a machine has not yet failed or is taken offline for planned maintenance. Treating these instances as censored, rather than discarding them or counting them as failures, lets the model use every observed uptime period.

Moreover, survival analysis generates a survival probability curve that offers a detailed understanding of equipment longevity. Rather than merely predicting a single failure point, this curve allows for risk assessment over a duration, enabling more effective planning for maintenance schedules, resource allocation, and spare parts inventory management.
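To show what a survival probability curve looks like and how censoring enters the calculation, here is a minimal Kaplan-Meier estimator in plain Python. Note that Kaplan-Meier is a simpler, non-parametric technique swapped in for illustration; the solution itself builds a Cox proportional hazards model. The function name and the sample durations are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an observed failure.

    durations: uptime of each period.
    observed: True if the period ended in a failure, False if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Periods still running just before time t are at risk.
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical uptimes in days; False marks a censored period.
durations = [10, 20, 20, 30, 40]
observed = [True, False, True, True, False]

curve = kaplan_meier(durations, observed)
for days, probability in curve:
    print(days, round(probability, 2))
```

Censored periods never lower the curve directly, but they do count in the at-risk group until they end, which is how they contribute information without being treated as failures.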

The training of the survival analysis model involves several key steps:

  1. The uptime dataset is joined with the equipment information dataset, integrating relevant equipment data for the analysis.

  2. A ‘prepare’ phase detects date columns and calculates the number of days elapsed between each date and the machine’s restart time, effectively converting date values into duration metrics for modeling. Unneeded columns are also removed at this stage.

  3. A ‘python’ phase handles feature engineering, embeds categorical columns, and builds a Cox proportional hazards model. The models are then stored in a Model folder. This step also generates insights into feature importance and the relative risk multipliers for each equipment value.

These steps ensure the model is adequately trained to leverage all maintenance logs and equipment data, thereby improving the overall reliability and precision of the predictive maintenance solution.
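The date-to-duration conversion in step 2 can be illustrated with a short Python sketch. It is an assumption about the general approach, not the solution's actual Prepare logic: every datetime value other than the restart time is replaced by the number of days between it and the restart time. The function name, the "_days_elapsed" suffix, and the installation_date column are hypothetical.

```python
from datetime import datetime


def dates_to_elapsed_days(row, reference_column="equipment_restart_time"):
    """Replace each other datetime value with days elapsed up to the reference date."""
    reference = row[reference_column]
    converted = {}
    for name, value in row.items():
        if name != reference_column and isinstance(value, datetime):
            converted[name + "_days_elapsed"] = (reference - value).days
        else:
            converted[name] = value
    return converted


# Hypothetical joined row: uptime record plus static equipment information.
row = {
    "equipment_id": "P-01",
    "equipment_restart_time": datetime(2023, 1, 1),
    "installation_date": datetime(2020, 1, 1),
    "site": "B",
}

converted = dates_to_elapsed_days(row)
print(converted)
```

Expressing dates as durations gives the model a numeric covariate, such as the age of the equipment at restart, instead of a raw calendar date.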

Dataiku screenshot of the Flow zone dedicated to training a Survival Model for Predictive Maintenance.

Design smarter and more effective maintenance strategies#

The pre-built Maintenance Operations Analysis and Predictions dashboard packaged with this solution is designed to support reliability engineers in managing maintenance across their equipment fleet. The dashboard is divided into three pages.

Dataiku screenshot of a proposed maintenance schedule available in the General Overview page.

The General Overview page serves as an initial summary and a strategic tool for managing and optimizing maintenance operations. It provides a high-level view of the key performance indicators (KPIs), enabling users to quickly gauge equipment reliability, issue resolution efficiency, and overall equipment availability. By tracking maintenance trends and using predictive models to forecast future maintenance schedules, it supports data-driven decision-making.
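For readers unfamiliar with these KPIs, the sketch below computes MTBF, MTTR, and availability using common textbook definitions. The exact formulas used by the dashboard are not stated in this document, so treat these as assumptions; the function name and sample values are hypothetical.

```python
def fleet_kpis(uptime_hours, repair_hours):
    """Compute reliability KPIs from textbook definitions.

    uptime_hours: running time before each unplanned stop.
    repair_hours: duration of each repair.
    Assumes both lists are non-empty.
    """
    mtbf = sum(uptime_hours) / len(uptime_hours)  # mean time between failures
    mttr = sum(repair_hours) / len(repair_hours)  # mean time to repair
    availability = mtbf / (mtbf + mttr)
    return {"mtbf_hours": mtbf, "mttr_hours": mttr, "availability": availability}


# Hypothetical history for one machine: three failures and their repairs.
kpis = fleet_kpis(uptime_hours=[100, 140, 120], repair_hours=[4, 8, 6])
print(kpis)
```

A higher MTBF indicates more reliable equipment, a lower MTTR indicates faster issue resolution, and availability combines the two.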

Dataiku screenshot of charts summarizing the descriptive statistical analysis computed on our fleet.

The Deep Dive into Maintenance page provides a comprehensive analysis of maintenance operations and equipment performance, helping users to uncover detailed insights and make informed decisions. It offers a platform to investigate individual equipment performance, identify failure-prone equipment, and understand the dynamics of various maintenance operations.

By helping users pinpoint patterns and anticipate potential problems, it enables the formulation of targeted maintenance strategies. This dashboard is particularly useful for those seeking to enhance operational efficiency, reduce unscheduled downtime, and improve overall equipment lifespan.

Dataiku screenshot exposing the influence of different attributes on unplanned maintenance.

The Relative Risk Multipliers page provides an analytical tool for understanding the impact of specific factors, or covariates, on the likelihood of equipment failure. It uses a Cox proportional hazards model to compute risk multipliers, which indicate how much a covariate increases or decreases the risk of failure.

For categorical covariates, each category has its own risk multiplier compared to a baseline category, while for numerical covariates, the risk multiplier quantifies how a one-unit increase or decrease in the covariate alters the hazard, assuming other factors remain constant.
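In a Cox proportional hazards model, the risk multiplier (hazard ratio) of a covariate is the exponential of its fitted coefficient. The short Python sketch below shows how to read such values. The coefficients and covariate names are made up for illustration and do not come from the solution.

```python
import math


def risk_multiplier(coefficient, units=1.0):
    """Hazard ratio for a change of `units` in a covariate, other factors held constant."""
    return math.exp(coefficient * units)


# Hypothetical fitted coefficients.
coefficients = {
    "age_years": 0.10,  # numerical covariate
    "site=B": -0.25,    # categorical level, compared with baseline site A
}

for name, beta in coefficients.items():
    print(name, round(risk_multiplier(beta), 3))

# Effect of a five-year age difference.
print(round(risk_multiplier(coefficients["age_years"], units=5), 3))
```

A multiplier above 1 means the covariate raises the hazard, a multiplier below 1 means it lowers it, and a multiplier of exactly 1 means no effect.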

This dashboard is essential for those aiming to identify and understand the factors that significantly influence equipment failure. By interpreting these risk multipliers, decision-makers can focus on the key variables affecting equipment longevity, thereby enhancing maintenance strategies and extending equipment life.

Reproducing these Processes With Minimal Effort For Your Own Data#

This solution is intended to show how Dataiku can be used to reduce unplanned downtime and optimize maintenance plans for your equipment. Because a single solution can inform the decisions of several teams within an organization, it helps them implement strategies that mitigate unplanned downtime, ensure safe, continuous operations, and improve service to customers.

We’ve provided several suggestions on how to use your operations and equipment data to improve and optimize maintenance plans, but ultimately, the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.