Transfer Learning to Retrain the Model

So far, we have classified a set of images using only a pre-trained model. Let’s improve that model with transfer learning.

To do this, we’ll use the folder containing our training images, Images for retraining. This folder contains more labelled images of lions and tigers. A simple Python recipe has also been created to extract the label (“lion” or “tiger”) from the filename into the Labels dataset.
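The project's actual recipe is not reproduced here, but the idea behind it can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the class name appears somewhere in each file name (for example lion_012.jpg). The function names and the KNOWN_LABELS constant are hypothetical; the path and label keys mirror the column names used in the recipe settings later on.

```python
from pathlib import PurePosixPath

# Assumption: every file name contains its class name, e.g. "lion_012.jpg".
KNOWN_LABELS = ("lion", "tiger")


def label_from_path(path):
    """Return the first known label found in the file name, or None."""
    name = PurePosixPath(path).name.lower()
    for label in KNOWN_LABELS:
        if label in name:
            return label
    return None


def build_label_rows(paths):
    """Build one row per image with the two columns the retrain recipe expects."""
    return [{"path": p, "label": label_from_path(p)} for p in paths]
```

In a real DSS Python recipe, the list of paths would come from the managed folder and the rows would be written to the Labels dataset; both steps are omitted here to keep the sketch self-contained.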

  • With the Images for retraining folder selected, click on the Deep learning on images recipe in the Actions menu.

  • Choose Classification model retrain on images (v2).

  • Set Labels as the “Label dataset”, Images for retraining as the “Image folder”, and Pre-trained model (imagenet) as the “Model folder”.

  • Name the newly-retrained “Model folder” output Retrained model, and click Create.

Dataiku screenshot of the dialog to create a recipe that will retrain the pretrained model.

Now just a few settings to adjust:

  • Under Dataset with labels, set the “Image filename column” to path and the “Label column” to label.

  • Under Training, reduce “Batch Size” to 10, “Steps per epoch” to 10, and “Number of validation steps” to 5 to speed up retraining.

  • Be sure the “Use TensorBoard” checkbox is selected so that you can access TensorBoard via a DSS webapp.

  • Run the recipe.

Dataiku screenshot of recipe settings for retraining the classification model.
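To see why these reduced settings shorten training, it helps to work out how many images the model processes per epoch. The arithmetic below is a rough sketch that assumes each step consumes exactly one batch (the usual meaning of "steps" in Keras-style training) and that validation uses the same batch size. The helper name is hypothetical.

```python
def images_per_epoch(batch_size, steps):
    """Images processed in one pass, assuming one batch per step."""
    return batch_size * steps


# Values entered in the recipe settings above.
BATCH_SIZE = 10
STEPS_PER_EPOCH = 10
VALIDATION_STEPS = 5

train_images = images_per_epoch(BATCH_SIZE, STEPS_PER_EPOCH)   # 10 * 10 = 100
val_images = images_per_epoch(BATCH_SIZE, VALIDATION_STEPS)    # 10 * 5 = 50
print(train_images, val_images)
```

Under these assumptions, each epoch covers about 100 training images and 50 validation images, which keeps the run short at the cost of a noisier model.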

Classification after Transfer Learning

Now let’s classify the original test set images once more, but using the model that was retrained on the additional images instead of the purely pretrained model.

  • Select the compute_Classification recipe, and click Copy from the Actions menu.

  • Change the “Model folder” to Retrained model.

  • Create a new output dataset Classification_after_retrain.

  • Click Create Dataset, and then click Create.

  • The copied recipe already has “Max number of class labels” set to 1, so simply run the recipe.

Prepare the Output from the Retrained Model

Did the retrained model do any better? To find out, let’s apply the same Prepare recipe steps that we used for the first classification.

  • Select the compute_Classification_results recipe, and click Copy from the Actions menu.

  • Change the Input dataset to Classification_after_retrain.

  • Name the new output dataset Classification_after_retrain_results.

  • Click Create Recipe.

Results will vary each time you retrain the model. The retrained model might still misclassify some of the images, most likely because of the small retraining dataset and the settings we reduced to shorten the retraining time.

In this case, the retrained model missed only one image instead of three!

Dataiku screenshot of the output dataset after transfer learning.
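If you would rather count the misses than scan the dataset by eye, a comparison along these lines is one option. It is only a sketch: the prediction key is a hypothetical column name, the true label is again assumed to be embedded in the file name, and the toy rows are made up for illustration rather than taken from the tutorial's results.

```python
def true_label(path, labels=("lion", "tiger")):
    """Assumption: the file name contains the true class name."""
    name = path.lower()
    return next((label for label in labels if label in name), None)


def count_misses(rows):
    """Count rows whose predicted label differs from the label in the file name."""
    return sum(1 for row in rows if row["prediction"] != true_label(row["path"]))


# Made-up rows standing in for the two prepared output datasets.
toy_before = [
    {"path": "lion_1.jpg", "prediction": "lion"},
    {"path": "lion_2.jpg", "prediction": "tiger"},
    {"path": "tiger_1.jpg", "prediction": "lion"},
]
toy_after = [
    {"path": "lion_1.jpg", "prediction": "lion"},
    {"path": "lion_2.jpg", "prediction": "lion"},
    {"path": "tiger_1.jpg", "prediction": "lion"},
]

print(count_misses(toy_before), count_misses(toy_after))
```

Running the same count on Classification_results and Classification_after_retrain_results gives a quick before-and-after comparison each time you retrain.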