Retrain machine learning model

In the previous section, we resolved schema inconsistencies in the Flow and updated the input to our machine learning model (the train dataset). The train dataset now includes the new features that we generated. In this section, we will retrain the model using these new features.

Let’s return to the Lab to see the new features for training.

  • From the Flow, open the deployed model (the green diamond object).

  • Click View Origin Analysis.

  • Go to the Design tab and open the Features handling panel.

In the “Features Handling” panel, you can see the list of features that Dataiku has selected. This list includes the new features. We will keep this selection.
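To confirm outside the Lab which of the selected features are new, you can compare the column lists of the original and current train datasets. The sketch below is a minimal illustration of that check. It assumes you already have both column lists. The function name `new_features` and the column names are hypothetical and do not come from this tutorial.

```python
# Minimal sketch: list the columns that appear in the current train dataset
# but not in the original one. Function and column names are hypothetical.

def new_features(original_columns: list[str], current_columns: list[str]) -> list[str]:
    """Return columns present in the current schema but not the original, keeping their order."""
    original = set(original_columns)
    return [col for col in current_columns if col not in original]


if __name__ == "__main__":
    # Hypothetical schemas, for illustration only.
    original = ["customer_id", "age", "balance"]
    current = ["customer_id", "age", "balance", "age_bucket", "balance_log"]
    print(new_features(original, current))  # ['age_bucket', 'balance_log']
```

Using a set for the original columns keeps each membership check fast. Iterating over `current_columns` preserves the order in which the columns appear in the new schema.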

Design tab showing that Dataiku has automatically selected the newly generated features for training.

  • Click Train to launch a new training session.

Results of the training session in the Result tab of the Lab.

The results of the training session suggest that the new features improved the performance of all the trained models, with the Logistic Regression model achieving the highest accuracy.
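Accuracy is the metric used to compare the models here. It is the share of predictions that match the true labels. The following is a minimal Python sketch of that definition and of picking the model with the highest score. The helper names, model names, and labels are hypothetical toy values, not results from this tutorial.

```python
# Minimal sketch: accuracy as the fraction of correct predictions, and
# selection of the model with the highest accuracy. All names and labels
# below are hypothetical toy values.

def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_model(scores: dict[str, float]) -> str:
    """Return the name of the model with the highest score."""
    return max(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]
    predictions = {
        "model_a": [1, 0, 0, 1, 0],  # 4 of 5 correct
        "model_b": [1, 1, 0, 1, 1],  # 2 of 5 correct
    }
    scores = {name: accuracy(y_true, p) for name, p in predictions.items()}
    print(scores)              # {'model_a': 0.8, 'model_b': 0.4}
    print(best_model(scores))  # model_a
```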

  • Click the Logistic Regression model to open its Report page.

  • Deploy the model to the Flow, selecting the Update Existing Training Recipe option when prompted.

  • Keep the default selection to activate the new model version, then click Update.

    Update the deployed model in the Flow.

  • In the Flow, open the Evaluate recipe and run it once more.

  • Explore the metrics dataset.

The metrics dataset now has a second row of metrics, computed with the new model version. Comparing the two rows shows that the accuracy of the new model is higher than that of the first model we trained.
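Each run of the Evaluate recipe adds a row to the metrics dataset, so comparing model versions comes down to comparing the latest row with the previous one. Here is a minimal sketch of that comparison. It assumes the rows are available as a list of dictionaries with an `accuracy` key. This layout and the helper name `accuracy_change` are hypothetical and do not reflect the exact schema of Dataiku's metrics dataset.

```python
# Minimal sketch: compare the two most recent evaluation runs.
# Assumes (hypothetically) that each run is one dict with an "accuracy" key,
# appended in chronological order; this is not Dataiku's exact schema.

def accuracy_change(metrics_rows: list[dict[str, float]]) -> float:
    """Return latest accuracy minus previous accuracy (positive means improvement)."""
    if len(metrics_rows) < 2:
        raise ValueError("need at least two evaluation runs to compare")
    previous, latest = metrics_rows[-2], metrics_rows[-1]
    return latest["accuracy"] - previous["accuracy"]


if __name__ == "__main__":
    # Hypothetical toy rows, for illustration only.
    rows = [{"accuracy": 0.75}, {"accuracy": 0.80}]
    delta = accuracy_change(rows)
    print("improved" if delta > 0 else "not improved")  # improved
```

Only the last two rows are compared. With more evaluation runs, the same list could be scanned to track how accuracy evolves across model versions.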

Updated metrics dataset.