Concept | Custom modeling within the visual ML tool

Dataiku’s visual machine learning (ML) tool lets you apply several built-in machine learning algorithms (from libraries such as scikit-learn, TensorFlow, and XGBoost), as well as custom algorithms, to your data.

This lesson shows how to access and implement custom models in Dataiku’s visual ML tool.

Custom modeling in the visual ML tool

Dataiku lets you import custom algorithms into the visual ML tool. This gives you complete control over the algorithm’s design while still taking advantage of the visual ML tool’s capabilities, such as feature preprocessing and model interpretability.

From the Design tab of the visual ML interface, you can access custom models by clicking the Algorithms panel.

../../_images/custom_modeling_concept_algorithms_list.png

From the Algorithms panel, you can create custom models by:

  • Importing custom algorithms defined in your project library or the global Python library of the Dataiku instance.

  • Importing custom algorithms from the Python libraries included in the code environment used by the visual ML tool.

  • Using a prediction algorithm that is part of a plugin.

When you click the Algorithms panel in the visual ML tool, the list of algorithms begins with the built-in algorithms. At the bottom of the list is an option for adding a custom model.

../../_images/custom_modeling_concept_add_custom_model.png

Clicking this option opens a code window where you instantiate the model object.

../../_images/custom_modeling_constraints.png

Note

We do not recommend that you define your ML algorithm in this code window; rather, you should define custom algorithms in libraries.
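Following that advice, the code window itself can stay short: it only needs to create the model object. The sketch below assumes the window expects the estimator in a variable named `clf` (confirm this against the Code Samples in your Dataiku version) and that scikit-learn is available in the selected code environment. The hyperparameter values are arbitrary, and the commented-out module and class names are hypothetical.

```python
# Sketch of the custom model code window contents.
# Assumption: the visual ML tool reads the estimator from a variable named `clf`;
# check the Code Samples in your Dataiku version for the exact convention.
from sklearn.ensemble import ExtraTreesClassifier

# Only instantiate the model here; the visual ML tool takes care of
# preprocessing, training, and evaluation.
clf = ExtraTreesClassifier(n_estimators=200, max_depth=8, random_state=0)

# For an algorithm defined in your project library, the window would import it
# instead (hypothetical module and class names):
# from my_custom_algorithms import MyClassifier
# clf = MyClassifier()
```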

Custom algorithms from libraries

When defining and importing algorithms for use in the visual ML tool, note that:

  • The algorithms must be scikit-learn compatible, with both a fit() and a predict() method.

  • Classifiers must also have a classes_ attribute and can implement a predict_proba() method in addition to the fit() and predict() methods.

  • Algorithms can be defined in the project’s Python library or the global Python library of the data directory and imported into the visual ML tool.

  • Algorithms can also be imported from any library that is included in the code environment used by the visual ML tool.
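A minimal sketch of a classifier that meets these constraints, built on scikit-learn’s `BaseEstimator` and `ClassifierMixin`. The class name `NearestCentroidSketch`, its `temperature` parameter, and the nearest-centroid logic are hypothetical choices for illustration; a class like this could live in your project library and be imported in the code window.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class NearestCentroidSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier: predicts the class whose mean feature vector is closest."""

    def __init__(self, temperature=1.0):
        # Store hyperparameters unchanged so get_params()/set_params() work.
        self.temperature = temperature

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # classes_ is required for classifiers used in the visual ML tool.
        self.classes_ = np.unique(y)
        # One centroid (mean row) per class, in the same order as classes_.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def _sq_distances(self, X):
        check_is_fitted(self, "centroids_")
        X = check_array(X)
        # Shape (n_rows, n_classes): squared Euclidean distance to each centroid.
        diffs = X[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :]
        return (diffs ** 2).sum(axis=2)

    def predict(self, X):
        # Return the class label of the nearest centroid for each row.
        return self.classes_[np.argmin(self._sq_distances(X), axis=1)]

    def predict_proba(self, X):
        # Optional method: softmax over negative distances; columns follow classes_.
        logits = -self._sq_distances(X) / self.temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        return weights / weights.sum(axis=1, keepdims=True)


# Usage on a tiny toy dataset.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array(["a", "a", "b", "b"])
model = NearestCentroidSketch().fit(X, y)
print(model.predict([[0.05, 0.1], [5.1, 5.0]]))  # ['a' 'b']
```

Inheriting from `BaseEstimator` provides `get_params()` and `set_params()`, which scikit-learn’s `clone()` relies on; keeping `__init__` limited to storing hyperparameters keeps that machinery working.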

Tip

To help you instantiate model objects, Dataiku provides Code Samples that you can insert into the code window. The Code Samples button is located at the top right corner of the code window.

Custom algorithms from a plugin

A custom model can also be created from a plugin’s prediction algorithm component.

In this approach, the plugin must first be installed on the Dataiku instance. The algorithm then becomes available in the Algorithms panel of the visual ML tool’s Design tab. For example, the Linear Discriminant Analysis and LightGBM Classification algorithms in the following figure are plugin components.

../../_images/custom_modeling_concept_plugin_components.png

Note

You need to specify the proper code environment when using custom models in the visual ML tool. This code environment must include the required Python libraries for your machine learning algorithm.

You can specify the code environment by clicking the Runtime environment panel of the Design tab.
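One way to confirm that a code environment includes what your algorithm needs is a quick import check, run from a notebook or Python recipe that uses the same code environment. A minimal sketch using only the standard library; the package list is hypothetical, and it uses import names (for example `sklearn`, not `scikit-learn`):

```python
import importlib.util

# Hypothetical list: replace with the import names your custom algorithm uses.
REQUIRED_PACKAGES = ["sklearn", "numpy", "lightgbm"]


def missing_packages(packages):
    """Return the top-level packages that cannot be found in the current environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing from this code environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```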

Model training metrics

When you train a custom model, the training dashboard may not track the model’s metrics during training. This can happen when hyperparameter optimization is not enabled.

../../_images/custom_modeling_concept_training_metrics.png

Assessing custom model performance

Once a custom model is built, open it to visualize its performance and all associated visual insights, just as you would with a built-in model.

Detailed output for custom model.

Custom models can be deployed in the Flow and used just like a standard built-in model!

What’s next?

Continue learning about custom models in the visual ML tool by working through the Tutorial | Custom modeling within the visual ML tool.