Preparing the Failure Dataset

The Failure dataset has only two columns: Asset ID and failure_bin.

  • Here Asset IDs are unique (i.e., one row per ID), so the data is already structured at the level of individual cars. The Analyze tool is one quick way to verify this important property.

  • The failure_bin variable contains 0s and 1s indicating whether the associated asset failed. We can use this variable as the label when building a model to predict failures across the fleet.
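For readers who prefer to confirm these two properties in code rather than through the Analyze tool, the checks can be sketched in plain Python. This is only an illustration: the function name check_failure_rows and the sample rows are hypothetical, and it assumes the dataset has been read into a list of dictionaries keyed by column name.

```python
from collections import Counter


def check_failure_rows(rows, id_col="Asset ID", label_col="failure_bin"):
    """Return (duplicate_ids, bad_labels) for a list of row dicts.

    duplicate_ids: IDs that appear on more than one row.
    bad_labels: label values other than "0" or "1".
    """
    counts = Counter(row[id_col] for row in rows)
    duplicate_ids = sorted(asset_id for asset_id, n in counts.items() if n > 1)
    bad_labels = sorted({str(row[label_col]) for row in rows} - {"0", "1"})
    return duplicate_ids, bad_labels


# Hypothetical sample rows, not taken from the actual Failure dataset.
sample_rows = [
    {"Asset ID": "A001", "failure_bin": "0"},
    {"Asset ID": "A002", "failure_bin": "1"},
    {"Asset ID": "A003", "failure_bin": "0"},
]

duplicate_ids, bad_labels = check_failure_rows(sample_rows)
print("Duplicate IDs:", duplicate_ids)
print("Unexpected labels:", bad_labels)
```

Two empty lists mean that each ID appears on exactly one row and that the label is strictly binary.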

Only one preparation step is needed here. As with the previous two datasets, use Infer Types from Data in the Settings > Schema tab so that failure_bin is stored as a bigint.
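To see why the storage type matters, here is a minimal sketch of the idea behind type inference. It is not the tool's actual inference logic: the helper infer_integer_column and the sample values are hypothetical, and the values are assumed to arrive as strings.

```python
def infer_integer_column(values):
    """Return the string values as ints if every one parses, else None."""
    try:
        return [int(v) for v in values]
    except (TypeError, ValueError):
        return None


# Hypothetical raw values, as they might look while still stored as strings.
raw_labels = ["0", "1", "0", "1"]

typed_labels = infer_integer_column(raw_labels)

# Once the column is numeric, the label can be aggregated directly.
failure_count = sum(typed_labels)
print("Failures:", failure_count)
```

A column with any value that does not parse as an integer would return None here, which corresponds to leaving the column as a string.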

Our workflow is now beginning to look like a data pipeline. Next we’ll merge all of our datasets together!