Hands-On: Evaluate the Model

In the previous hands-on lesson, you built your first model. Before trying to improve its performance, let’s look at ways to interpret it, and understand prediction quality and model results.

To return to where we left off, find the model report of the random forest model built in the 103 Machine Learning project.

One way to find it is to:

  • In the Flow, navigate to the Actions tab of the customers_labeled dataset.

  • Enter the Lab where you will find the previously-created High revenue analysis.

  • Once in the visual analysis, navigate to the Models tab.

  • Then open the random forest model from the first training session.


Model Interpretation

Once you have returned to the model report, you will find a left sidebar panel with a range of tabs providing different insights into the model, beginning with a Summary.

Going down the list in the left panel, you will find a first section called Interpretation. This section provides information for assessing the behavior of the model and the contribution of features to the model outcome.

Some of the panels in this section are algorithm-dependent; for example, a linear model will display information about the model’s coefficients, while a tree-based model will display information about decision trees and variable importance.

To understand the random forest model, let’s begin by looking at the Variables importance panel.


We notice that some variables seem to have a strong relationship with being a high-value customer. Notably, the age at the time of first purchase (age_first_order) seems to be a good indicator.


The Interpretation section also contains panels for creating partial dependence plots, performing subpopulation analysis, and providing individual explanations at the row level. We’ll cover these in detail in the Explainable AI section.

Model Performance

Following the Interpretation section, you will find a Performance section.

Once again, some sections are algorithm-dependent. Here we discuss options for a classification task, but a regression task would include a scatterplot and error distribution. A clustering task would have a heatmap and cluster profiles.

The Confusion matrix compares the actual values of the target variable with the predicted values (yielding counts such as false positives and false negatives), along with some associated metrics: precision, recall, and F1-score.

A classification model usually outputs a probability of belonging to one of the two groups. The actual predicted value depends on the cut-off threshold applied to this probability; in other words, at which probability do we decide to classify a customer as high-value?

The Confusion matrix shown depends on the chosen threshold, which can be changed using the slider at the top:
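To make this concrete, here is a minimal Python sketch of how a confusion matrix and its associated metrics follow from a chosen threshold. The probabilities and labels below are toy values invented for illustration, not actual Dataiku output:

```python
# Toy predicted probabilities and true labels (illustration only,
# not real model output).
probs  = [0.10, 0.35, 0.62, 0.48, 0.91, 0.80, 0.25, 0.70]
actual = [0,    1,    1,    0,    1,    0,    0,    1]

def confusion_at(threshold, probs, actual):
    """Count TP/FP/TN/FN when classifying prob >= threshold as positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, actual):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

tp, fp, tn, fn = confusion_at(0.5, probs, actual)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
```

Moving the slider in the model report re-computes these counts and metrics in exactly this way: raising the threshold trades false positives for false negatives, and vice versa.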


The Decision Chart plots precision, recall, and F1-score for all possible cut-offs:
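The computation behind such a chart can be sketched by sweeping candidate cut-offs and scoring each one. Again, the probabilities and labels are toy values for illustration only:

```python
# Toy probabilities and labels (illustration only, not real model output).
probs  = [0.10, 0.35, 0.62, 0.48, 0.91, 0.80, 0.25, 0.70]
actual = [0,    1,    1,    0,    1,    0,    0,    1]

def f1_at(t):
    """F1-score when classifying prob >= t as positive."""
    tp = sum(1 for p, y in zip(probs, actual) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, actual) if p >= t and y == 0)
    fn = sum(1 for p, y in zip(probs, actual) if p < t and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Sweep every candidate cut-off, as the decision chart does visually,
# and pick the threshold with the best F1-score.
best_threshold = max(sorted(set(probs)), key=f1_at)
```

This is the kind of trade-off the chart makes visible: a low threshold favors recall, a high threshold favors precision, and the curves cross somewhere in between.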


The Lift charts and ROC curve are visual aids, and perhaps the most useful ones, for assessing the performance of your model. A longer discussion of how the Lift charts and ROC curve are constructed and interpreted is available separately; for now, remember that in both cases, the steeper the curve at the beginning of the graph, the better the model.
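As a rough illustration of how an ROC curve is built, the sketch below sweeps thresholds from high to low and records one (false positive rate, true positive rate) point per threshold. The data is made up for illustration, not real model output:

```python
# Toy scores and labels (illustration only, not real model output).
probs  = [0.10, 0.35, 0.62, 0.48, 0.91, 0.80, 0.25, 0.70]
actual = [0,    1,    1,    0,    1,    0,    0,    1]

def roc_points(probs, actual):
    """One (FPR, TPR) point per candidate threshold, high to low."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = []
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, actual) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, actual) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

A good model’s curve rises steeply at first, reaching a high true positive rate while the false positive rate is still low, which is why a steep start signals a better model.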

In our example again, the results look pretty good:


Finally, the Density chart shows the distribution of the probability of being a high-value customer, compared across the two actual groups. A good model will separate the two curves as much as possible, as we can see here:


The last section, Model Information, is a recap of how the model was built. If you go to the Features tab, you will notice some interesting things:


By default, all the available variables except customerID have been used to predict our target. Dataiku DSS rejected customerID because this feature was detected as a unique identifier and was not helpful for predicting high-value customers.

Furthermore, a feature like the geopoint is probably not useful in a predictive model, because it will not generalize well to new records. We may want to refine the settings of the model.

What’s next?

You have discovered ways to interpret your model and understand prediction quality and model results. Next, we’ll look at ways to improve your model’s performance.