Overview
Business Case
We are part of a data team working on a predictive maintenance use case at a car rental company.
Unexpected problems on the road are costly for a rental company because of the associated repairs, vehicle unavailability, and inconvenience to customers. With this in mind, the company wants to replace the cars that are most likely to break down before a problem occurs, thereby minimizing the chance of a rental car breaking down on a customer. At the same time, replacing otherwise healthy vehicles too often would not be cost-effective either.
The company has some information on past failures, as well as on car usage and maintenance. As the data team, we are here to offer a data-driven approach. More specifically, we want to use the information we have to answer the following questions:
What are the most common factors behind these failures?
Which cars are most likely to fail?
These questions are interrelated. As a data team, we are looking to isolate and understand which factors can help predict a higher probability of vehicle failure. To do so, we’ll build end-to-end predictive models in Dataiku DSS. We’ll see an entire advanced analytics workflow from start to finish. Hopefully, its results will end up as a data product that promotes customer safety and has a direct impact on the company’s bottom line!
Supporting Data
We’ll need three datasets in this tutorial. Find their descriptions and links to download below:
usage: number of miles the cars have been driven, collected at various points
maintenance: records of when cars were serviced, which parts were serviced, the reason for service, and the quantity of parts replaced during maintenance
failure: whether a vehicle had a recorded failure (not all cases are labelled)
An Asset ID, available in each file, uniquely identifies each car. Some datasets are organized at the vehicle level; others are not. A bit of data detective work might be required!
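One way to start that detective work is to check whether an Asset ID repeats within a dataset. The sketch below is illustrative only: the sample rows and the column names (`asset_id`, `miles`, `failed`) are invented for the example and are not the actual schema of the tutorial files.

```python
from collections import Counter


def is_vehicle_level(rows, key="asset_id"):
    """Return True if every value of `key` appears in exactly one row."""
    counts = Counter(row[key] for row in rows)
    return all(n == 1 for n in counts.values())


# Hypothetical sample rows; the real files may use different column names.
usage = [
    {"asset_id": "A1", "miles": 1200},
    {"asset_id": "A1", "miles": 3400},  # same car, later reading
    {"asset_id": "A2", "miles": 800},
]
failure = [
    {"asset_id": "A1", "failed": 1},
    {"asset_id": "A2", "failed": None},  # unlabelled case
]

usage_is_vehicle_level = is_vehicle_level(usage)
failure_is_vehicle_level = is_vehicle_level(failure)
print(usage_is_vehicle_level, failure_is_vehicle_level)
```

A dataset that fails this check has several rows per car and would need to be aggregated before it can be joined one-to-one with a vehicle-level dataset.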
Workflow Overview
By the end of this walkthrough, your workflow in Dataiku DSS should mirror the one below. Moreover, the completed project can be found in the Dataiku gallery.
In order to achieve this workflow, we will complete the following high-level steps:
Import the data
Clean, restructure and merge the input datasets together
Split the merged dataset by whether outcomes are known or unknown, i.e., labelled or unlabelled
Train and analyze a predictive model on the known cases
Score the unlabelled cases using the predictive model
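The restructure, merge, and split steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the walkthrough's method, which carries out these steps inside Dataiku DSS. The sample rows, the column names, and the choice of aggregates (highest mileage reading, total parts replaced) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; column names are assumptions, not the tutorial's schema.
usage = [
    {"asset_id": "A1", "miles": 1200},
    {"asset_id": "A1", "miles": 3400},
    {"asset_id": "A2", "miles": 800},
    {"asset_id": "A3", "miles": 5000},
]
maintenance = [
    {"asset_id": "A1", "part": "brakes", "quantity": 2},
    {"asset_id": "A3", "part": "filter", "quantity": 1},
    {"asset_id": "A3", "part": "brakes", "quantity": 4},
]
failure = [
    {"asset_id": "A1", "failed": 1},
    {"asset_id": "A2", "failed": 0},
    {"asset_id": "A3", "failed": None},  # outcome unknown
]

# Restructure: collapse usage and maintenance to one value per vehicle.
max_miles = defaultdict(int)
for row in usage:
    max_miles[row["asset_id"]] = max(max_miles[row["asset_id"]], row["miles"])

parts_replaced = defaultdict(int)
for row in maintenance:
    parts_replaced[row["asset_id"]] += row["quantity"]

# Merge: one record per vehicle, keyed on the shared asset ID.
merged = [
    {
        "asset_id": row["asset_id"],
        "max_miles": max_miles.get(row["asset_id"], 0),
        "parts_replaced": parts_replaced.get(row["asset_id"], 0),
        "failed": row["failed"],
    }
    for row in failure
]

# Split: labelled rows are used for training, unlabelled rows are scored later.
labelled = [r for r in merged if r["failed"] is not None]
unlabelled = [r for r in merged if r["failed"] is None]
```

Here `labelled` would feed model training and `unlabelled` would be scored afterwards. Cars with no maintenance records receive a count of zero rather than being dropped from the merge.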
Technical Requirements
To complete this walkthrough, the following requirements need to be met:
Have access to a Dataiku DSS instance. That’s it!