Iterate on the design of a model training session
Thus far, Dataiku has produced quick prototypes. Starting from these baseline models, you can iteratively adjust the design, train new sessions of models, and evaluate the results.
Switch to the Design tab.
Tour the Design tab
From the Design tab, you have full control over the design of a model training session. Take a quick tour of the available options. Some examples include:
In the Train / Test Set panel, you could apply a k-fold cross validation strategy.
In the Feature reduction panel, you could apply a reduction method like Principal Component Analysis.
In the Algorithms panel, you could select different machine learning algorithms or import custom Python models.
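To make the k-fold option more concrete, here is a minimal sketch in plain Python of how a k-fold strategy partitions rows. It is an illustration only, not Dataiku's implementation; the function name `k_fold_indices` and the row count are assumptions made for this example.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; Dataiku handles this internally
    when you choose k-fold in the Train / Test Set panel.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    # Spread rows as evenly as possible: the first (n_rows % k) folds
    # each receive one extra row.
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 10 rows split into 3 folds.
folds = list(k_fold_indices(n_rows=10, k=3))
for train, test in folds:
    print(f"train on {len(train)} rows, test on {len(test)} rows")
```

Each row lands in the test set exactly once, so every model score is computed on data that model did not see during training. With 10 rows and 3 folds, the first fold holds out four rows and the other two hold out three each. A real split would usually shuffle the rows first; this sketch keeps them in order for readability.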
Reduce the number of features
Instead of adding complexity, let’s simplify the model by including only the most important features. Having fewer features could hurt the model’s predictive performance, but it may bring other benefits, such as greater interpretability, faster training times, and reduced maintenance costs.
In the Design tab, navigate to the Features handling panel.
Click the box at the top left of the feature list to select all features.
For the role, click Reject to toggle off all features.
Click the box at the top left of the feature list to de-select all features.
Turn On the three most influential features according to the Feature importance chart seen earlier: country, has_company_logo, and len_company_profile.
Tip
Your top three features may be slightly different. Feel free to choose these three or the three most important from your own results.
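The selection logic behind these steps can be sketched in a few lines of plain Python: rank features by importance, keep the top three, and reject the rest. The importance scores and the `other_feature_*` names below are made up for illustration; read the real values from your own Feature importance chart. The helper names `top_features` and `assign_roles` are also assumptions, not part of any Dataiku API.

```python
# Hypothetical importance scores, for illustration only.
# Replace them with the values from your own Feature importance chart.
importances = {
    "country": 0.31,
    "has_company_logo": 0.22,
    "len_company_profile": 0.18,
    "other_feature_a": 0.09,  # placeholder name, not from the dataset
    "other_feature_b": 0.04,  # placeholder name, not from the dataset
}


def top_features(scores, n=3):
    """Return the n feature names with the highest importance scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def assign_roles(scores, n=3):
    """Mark the top n features as "On" and every other feature as "Reject"."""
    keep = set(top_features(scores, n))
    return {name: ("On" if name in keep else "Reject") for name in scores}


roles = assign_roles(importances)
for name, role in roles.items():
    print(f"{name}: {role}")
```

This mirrors the manual steps: reject everything first, then turn on only the features that rank highest. If your own chart ranks the features differently, the same logic applies to whichever three come out on top.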
Train a second session
Once you have just the top three features in the model design, you can kick off another training session.
Click Train.
Click Train once more to confirm.