Concept | Model comparisons
When searching for the best performance, it is common to iteratively build a range of models with different parameters. To select the best model for a particular use case, you then need to compare the candidate models side by side across key metrics.
Let’s walk through the process in Dataiku for creating a model comparison, interpreting its information, and using it to select a champion model. The example shown in the screenshots below is a simple credit card fraud project. The Flow contains a binary classification model that predicts which transactions will be authorized (labeled as “1”) and which will fail authorization (labeled as “0”).
Creating a model comparison
You can create a model comparison by selecting models from:
Saved model versions from the Flow.
Evaluations from model evaluation stores.
You can also select models trained in the Lab, from the Models page of a visual analysis. In the screenshot below, we have trained many models, but want to compare random forest models from three different sessions side by side.
From the Result tab of the Models page within a visual analysis, check the box next to each model you want to compare, and then select Compare from the Actions menu.
You have the option of comparing the models in a new or an existing comparison. We’ll choose new and click Compare.
Using a model comparison
Once a model comparison has been created, you can open it at any time from the top navigation bar, under Visual analysis > Model Comparisons.
A model comparison includes key information not only about the performance of the chosen models, but also about their training and feature handling.
For example, you can see that the baseline model included certain features, such as card_active_first_month, which were rejected in the other two candidate models.
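The kind of feature handling difference described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Dataiku's API: the `feature_handling` dictionary, the role names, and the `purchase_amount` feature are hypothetical, and only `card_active_first_month` comes from the example.

```python
# Hypothetical feature-handling settings per model (not exported from Dataiku).
# Only card_active_first_month comes from the example in the text.
feature_handling = {
    "baseline": {"card_active_first_month": "input", "purchase_amount": "input"},
    "2nd iteration": {"card_active_first_month": "rejected", "purchase_amount": "input"},
}


def features_handled_differently(handling_by_model):
    """Return the features whose role is not the same in every model."""
    all_features = set().union(*(h.keys() for h in handling_by_model.values()))
    differences = {}
    for feature in sorted(all_features):
        # A feature missing from a model's settings is reported as "absent".
        roles = {
            model: handling.get(feature, "absent")
            for model, handling in handling_by_model.items()
        }
        if len(set(roles.values())) > 1:
            differences[feature] = roles
    return differences


for feature, roles in features_handled_differently(feature_handling).items():
    print(feature, roles)
```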
Many of the performance visualizations available for an individual model in the Lab can now be viewed side by side for all compared models. For example, below is a decision chart showing precision, recall, and F1 scores for the three candidate models.
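To make these three metrics concrete, here is a minimal Python sketch that computes precision, recall, and F1 for the positive class (“1”) at several classification thresholds. It assumes the chart plots these metrics against the threshold; the labels and predicted probabilities are made up for illustration and do not come from the project shown.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Return (precision, recall, f1) for class 1 at the given threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # true positive
        elif predicted == 1 and label == 0:
            fp += 1  # false positive
        elif predicted == 0 and label == 1:
            fn += 1  # false negative
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up true labels and predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

for threshold in (0.25, 0.5, 0.75):
    p, r, f = precision_recall_f1(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Raising the threshold in this toy data increases precision and lowers recall, which is the trade-off to keep in mind when reading the curves for each candidate.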
Choosing a champion
Comparing performance metrics side-by-side makes it easier to choose a champion model.
For detecting credit card fraud (where a fraudulent transaction is labeled as “0” and an authorized transaction is labeled as “1”), precision may be the most valuable metric. Here, a false positive is a fraudulent transaction that the model predicts as authorized, so a high precision minimizes the number of such errors. Among these three candidates, precision is quite similar. Therefore, let’s choose the “2nd iteration” model in green, which has the highest recall while precision remains high.
In the Summary panel of the Model Comparison, click to assign champion status to any model, or to remove it.
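The selection rule used here (keep the candidates whose precision is close to the best, then take the one with the highest recall) can be written as a short Python sketch. The `Candidate` class, the `pick_champion` helper, the tolerance, the “other candidate” name, and all metric values are hypothetical; they are not read from the comparison shown in the screenshots.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    precision: float
    recall: float


def pick_champion(candidates, precision_tolerance=0.02):
    """Among models with near-best precision, return the one with the highest recall."""
    best_precision = max(c.precision for c in candidates)
    shortlist = [
        c for c in candidates
        if best_precision - c.precision <= precision_tolerance
    ]
    return max(shortlist, key=lambda c: c.recall)


# Made-up metric values for illustration only.
candidates = [
    Candidate("baseline", precision=0.95, recall=0.80),
    Candidate("2nd iteration", precision=0.95, recall=0.88),
    Candidate("other candidate", precision=0.94, recall=0.85),
]

champion = pick_champion(candidates)
print(champion.name)
```

The tolerance makes “quite similar” explicit: a model with much higher recall but clearly lower precision would be dropped from the shortlist before recall is compared.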
In other situations, the saved model deployed to the Flow may already be the current champion. In that case, you can also use the model comparison feature to evaluate the champion against possible challenger models being developed in the Lab.