Understanding the Model

Once you are satisfied with the design of the model, return to the Results tab. Dataiku DSS provides model metrics auto-magically! For example, we can compare how the models performed against each other. By default, the AUC is graphed for each model. By that metric, Random Forest has performed better than Logistic Regression.

We can switch from the Sessions view to the Table view to see a side-by-side comparison of model performance across a number of metrics. In this case, Random Forest has performed better across a number of different metrics. Let’s explore this model in greater detail.
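To make the side-by-side comparison concrete, here is a minimal sketch of how several common classification metrics can be derived from confusion-matrix counts and laid out in a small table. The counts below are invented for illustration and are not the tutorial's results; the exact set of metrics shown in the Table view may differ.

```python
def metrics(tp, fp, tn, fn):
    """Derive common classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, invented for illustration (not the tutorial's results).
models = {
    "Random forest": metrics(tp=40, fp=10, tn=140, fn=10),
    "Logistic regression": metrics(tp=30, fp=20, tn=130, fn=20),
}

# Print one row per metric and one column per model.
names = list(models)
print(f"{'metric':<10}" + "".join(f"{name:>22}" for name in names))
for metric in ["accuracy", "precision", "recall", "f1"]:
    print(f"{metric:<10}" + "".join(f"{models[n][metric]:>22.3f}" for n in names))
```

With these made-up counts, the first model scores higher on every row, which is the kind of pattern the Table view makes easy to spot.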


From any view in the Results tab, clicking on the name of a model reveals a great deal of under-the-hood insight and ready-made analysis. Besides providing information on features and on the training and validation strategies, this analysis also helps us interpret the model and understand its performance.

For example, we can obtain the number of correct and incorrect predictions made by this Random Forest model in a Confusion matrix. We can see the ROC curve used to calculate the AUC, the metric used to select our top model, or explore Detailed metrics.
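As a rough illustration of what these two views summarize, the sketch below builds a confusion matrix at a 0.5 cut-off and computes the AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The labels and scores are toy values made up for this example, and the function names are hypothetical; this is not how DSS computes the figures internally.

```python
# Toy data, invented for illustration: 1 = failure, 0 = no failure.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predicted failure probabilities from a classifier.
y_score = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]


def confusion_matrix(y_true, y_score, threshold=0.5):
    """Count true/false positives and negatives at a given cut-off."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for l, s in zip(y_true, y_score) if l == 1]
    negatives = [s for l, s in zip(y_true, y_score) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


cm = confusion_matrix(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(cm)   # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(auc)  # 0.9375
```

The confusion matrix depends on the chosen cut-off, whereas the AUC summarizes ranking quality across all possible cut-offs, which is one reason it is convenient for comparing models.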

We can dive into the Decision trees that are aggregated to calculate which features are important and to what extent. At the same time, we don’t want to miss the important details of this Random Forest for the trees! So let’s discuss some of the implications of this model.

For tree-based methods, the Variable importance chart displays the importance of each feature in the model. Not surprisingly, some of the most important features relate to a car’s age and usage. A car’s time_in_service, as well as its last known age and distance, helps predict whether or not the car fails. The last known and total mileage (distance_last_known and distance) are also important. Finally, data from maintenance records are also useful. Among them, R193_Quantity_sum, i.e., the total number of parts used for reason code 193, is important in predicting failure at a later date.
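One common way tree-based models score features is by how much splitting on a feature reduces impurity, averaged over the trees. The sketch below computes that reduction for a single split using Gini impurity. Whether DSS derives its chart exactly this way is an assumption here, and the data, the noise_feature column, and the threshold are invented for illustration; only the time_in_service name comes from the tutorial.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def impurity_decrease(values, labels, threshold):
    """Weighted drop in Gini impurity when splitting on values <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy data, invented for illustration (not from the tutorial's dataset).
failed = [0, 0, 0, 1, 1, 1, 0, 1]
time_in_service = [1, 2, 3, 7, 8, 9, 2, 6]  # separates the two classes cleanly
noise_feature = [5, 1, 4, 2, 6, 3, 7, 8]    # hypothetical uninformative column

gain_time = impurity_decrease(time_in_service, failed, threshold=4)
gain_noise = impurity_decrease(noise_feature, failed, threshold=4)
print(gain_time)   # 0.5
print(gain_noise)  # 0.0
```

In this toy case the split on time_in_service removes all impurity while the noise column removes none, so an importance ranking built from such decreases would place time_in_service first.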

Contextual knowledge becomes critical at this point. For example, knowing the rental company’s acquisition strategy for vehicles can help us judge whether these features are truly important and why they make sense. For now, let’s use the model to make some predictions.