Overview

Business Case

We are part of a data team working on a predictive maintenance use case at a car rental company.

Unexpected breakdowns on the road add to a rental company's costs through repairs, vehicle unavailability, and inconvenience to customers. With this in mind, the company wants to replace the cars that are most likely to break down before a problem occurs, minimizing the chance that a rental car fails on a customer. At the same time, replacing otherwise healthy vehicles too often would not be cost-effective.

The company has some information on past failures, as well as on car usage and maintenance. As the data team, we are here to offer a data-driven approach. More specifically, we want to use the information we have to answer the following questions:

  • What are the most common factors behind these failures?

  • Which cars are most likely to fail?

These questions are interrelated. As a data team, we are looking to isolate and understand which factors can help predict a higher probability of vehicle failure. To do so, we’ll build end-to-end predictive models in Dataiku DSS. We’ll see an entire advanced analytics workflow from start to finish. Hopefully, its results will end up as a data product that promotes customer safety and has a direct impact on the company’s bottom line!

Supporting Data

We’ll need three datasets in this tutorial. Find their descriptions and links to download below:

  • usage: number of miles the cars have been driven, collected at various points

  • maintenance: records of when cars were serviced, which parts were serviced, the reason for service, and the quantity of parts replaced during maintenance

  • failure: whether a vehicle had a recorded failure (not all cases are labelled)

An Asset ID, available in each file, uniquely identifies each car. Some datasets are organized at the vehicle level; others are not. A bit of data detective work might be required!

Workflow Overview

By the end of this walkthrough, your workflow in Dataiku DSS should mirror the one below. Moreover, the completed project can be found in the Dataiku gallery.

[Image: flow3.png, the completed workflow in Dataiku DSS]

In order to achieve this workflow, we will complete the following high-level steps:

  • Import the data

  • Clean, restructure, and merge the input datasets

  • Split the merged dataset by whether the outcome is known or unknown, i.e., labelled or unlabelled

  • Train and analyze a predictive model on the known cases

  • Score the unlabelled cases using the predictive model
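
To make the data-shaping steps above more concrete, here is a minimal plain-Python sketch of the restructure, merge, and split logic. It is only an illustration of the idea, not how Dataiku DSS performs these steps. The sample rows, the field names (asset_id, miles, failed), and both helper functions are hypothetical; only the Asset ID key comes from the dataset descriptions. It also assumes that mileage readings are cumulative, so the largest reading per car is kept.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are assumptions made for illustration.
usage = [
    {"asset_id": "A1", "miles": 1200},
    {"asset_id": "A1", "miles": 5400},
    {"asset_id": "A2", "miles": 800},
    {"asset_id": "A3", "miles": 9100},
]
# Known outcomes keyed by Asset ID; A3 has no recorded outcome (unlabelled).
failure = {"A1": 1, "A2": 0}


def max_miles_per_asset(rows):
    """Collapse several usage readings into one value per vehicle.

    Assumes readings are cumulative, so the maximum is the latest mileage.
    """
    latest = defaultdict(int)
    for row in rows:
        latest[row["asset_id"]] = max(latest[row["asset_id"]], row["miles"])
    return dict(latest)


def merge_and_split(miles_by_asset, labels):
    """Attach labels to each vehicle, then split by whether a label exists."""
    labelled, unlabelled = [], []
    for asset_id, miles in sorted(miles_by_asset.items()):
        record = {
            "asset_id": asset_id,
            "miles": miles,
            "failed": labels.get(asset_id),  # None when the outcome is unknown
        }
        if record["failed"] is None:
            unlabelled.append(record)
        else:
            labelled.append(record)
    return labelled, unlabelled


labelled, unlabelled = merge_and_split(max_miles_per_asset(usage), failure)
print(len(labelled), "labelled;", len(unlabelled), "unlabelled")
```

In this sketch, a model would be trained on the labelled records and then used to score the unlabelled ones, mirroring the last two steps in the list.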

Prerequisites

You should be familiar with:

  • The Basics courses

  • Machine learning in Dataiku DSS

Technical Requirements

To complete this walkthrough, the following requirements need to be met:

  • Access to a Dataiku DSS instance. That’s it!