AutoML Model Results
After training a machine learning model in Dataiku’s lab, you’ll need to interpret and evaluate the results. Explore these resources to learn how to determine the effectiveness of your models before deploying them to the Flow.
- Concept | The Result tab within the visual ML tool
- Concept | Model summaries within the visual ML tool
- Concept | Explainable AI
- Concept | Partial dependence plots
- Concept | Subpopulation analysis
- Concept | Individual prediction explanations
- Concept | What if? analysis
- Concept | Advanced What if? simulators
- Concept | Interpretation of regression model output
- Concept | Model comparisons
After building a model, you may want to let dashboard consumers perform What if? analyses with it. To do so, publish the What if? feature of visual ML to a dashboard.
From the What if? panel of a deployed model, click Publish and then choose the dashboard and slide where you want to add the analysis.
You can edit the properties of the What if? tile to reorder the list of features or hide certain features from view.
A useful practice is to also provide a dataset tile on the slide, so that dashboard consumers can copy rows from the dataset and build What if? scenarios more easily.
When training a model using the Visual ML interface, have you ever noticed that the reported value of your optimization metric for a given algorithm does not always exactly match the final value in the line chart? Or that the algorithm that visually performs best in the chart is not necessarily the one reported as the model champion for that session?
For example, in the image above, the Random Forest algorithm shows an AUC of 0.780, beating the Logistic Regression's AUC of 0.753. You can see this score of 0.780 in three places (highlighted with pink boxes), yet if you hover over the individual data points in the line chart, the AUC is reported as 0.793 (gold box). Why the difference?
This is because the line chart plots the cross-validation score of each individual experiment (each executed with a specific set of hyperparameters), computed on the cross-validation set, which is a subset of the train data.
Once grid-search is complete and Dataiku finds the optimal set of hyperparameters, it retrains the model on the whole train set and scores the holdout sample to produce the final results for that session (pink boxes).
Therefore, the values in the line chart cannot be directly compared to the test scores that you see on the other parts of the page, because:
- They are not computed on the same data set. The grid-search is computed on a subset of the train set, and the final scores are computed on a holdout / test sample.
- They are not computed with the same model. After grid-search, Dataiku retrains the model on the whole train set.
So while it is likely that the algorithm that looks best on the chart will also perform best on the holdout data and that the metric values will match, it is not always the case.
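To see the same effect outside of Dataiku, here is a minimal sketch that uses scikit-learn as a stand-in. It is not Dataiku's implementation: the synthetic dataset, the hyperparameter grid, and the split sizes are all assumptions chosen for illustration. It reports the best cross-validation score found during grid search next to the score of the refitted model on a holdout sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (illustrative assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)

# Hold out a test sample that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: each combination is one "experiment", scored by
# 3-fold cross-validation on subsets of the train set only.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6], "min_samples_leaf": [1, 10]},
    scoring="roc_auc",
    cv=3,
    refit=True,  # retrain the best combination on the whole train set
)
search.fit(X_train, y_train)

# Score seen during grid search: mean AUC over the cross-validation folds.
cv_auc = search.best_score_

# Final score: the refitted model evaluated on the holdout sample.
holdout_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

print(f"Best hyperparameters: {search.best_params_}")
print(f"Cross-validation AUC: {cv_auc:.3f}")
print(f"Holdout AUC:          {holdout_auc:.3f}")
```

The two printed AUC values usually differ slightly, for the two reasons listed above: they come from different data, and from models trained on different amounts of data.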
We suggest you instead use this line chart to guide you in the following ways:
- Review the optimal set of hyperparameters found during grid search optimization on the training subset using the fly-over tooltips.
- Use the slope of the line to evaluate the relative gains across each epoch in the optimization process.
- Compare the relative training time each algorithm required to converge.
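As a rough analogue of reading the chart's tooltips, the sketch below lists every grid-search experiment with its hyperparameters, mean cross-validation score, and mean fit time. It again uses scikit-learn rather than Dataiku itself, and the dataset and the grid over `C` are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic train set (illustrative assumption).
X_train, y_train = make_classification(
    n_samples=1000, n_features=10, random_state=0
)

# Hypothetical grid over the regularization strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# One row per experiment: hyperparameters, mean CV AUC, mean fit time.
results = search.cv_results_
rows = sorted(
    zip(results["params"], results["mean_test_score"], results["mean_fit_time"]),
    key=lambda row: row[1],
    reverse=True,  # best cross-validation score first
)

for params, mean_auc, fit_seconds in rows:
    print(f"{params}  cv_auc={mean_auc:.3f}  fit_time={fit_seconds:.4f}s")
```

Each row is a cross-validation result, so these values are useful for comparing experiments against each other rather than for predicting the final holdout score.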
Visit our documentation for a complete description of the visualization of grid search results.