Prepare the failure dataset#

The failure dataset has only two columns: Asset and failure_bin.

  • Here Asset IDs are unique (i.e., one row for each ID), so we are already structured at the level of individual cars. The Analyze tool is one quick method to verify this property.

  • The failure_bin column is the target variable. A score of 1 represents the failure of the associated Asset. We can use this variable as a label to model predictions for failures among the fleet.

Only one preparation step is needed here.

  • As done with the previous two datasets, Infer Types from Data within the Settings > Schema tab so that failure_bin is stored as a bigint.

Dataiku screenshot of the schema tab of a dataset.

Next, we’ll join all of our datasets together!