Concept | Custom modeling within the visual ML tool#

The visual machine learning (ML) tool (or interface) of Dataiku lets you apply both built-in machine learning algorithms (from libraries like Scikit-Learn, TensorFlow, and XGBoost) and custom algorithms to your data.

This article will show how to access and implement custom models in the visual ML tool of Dataiku.

Custom modeling in the visual ML tool#

Dataiku allows you to import custom algorithms into the visual ML tool. This gives you complete control over the ML algorithm’s design while still leveraging the visual ML tool’s capabilities, such as feature preprocessing and model interpretability.

From the Design tab of the visual ML interface, you can access custom models by clicking the Algorithms panel.

From the Algorithms panel, you can create custom models by:

Importing custom algorithms defined in your project library or the global Python library of the Dataiku instance.
Importing custom algorithms from the Python libraries included in the code environment used by the visual ML tool.
Using a prediction algorithm that’s part of a plugin.

When you click the Algorithms panel in the visual ML tool, the list of algorithms begins with the built-in algorithms. At the bottom of the list is an option for adding a custom model.

Clicking this option opens a code window where you can instantiate the model object.

Note

We don’t recommend that you define your ML algorithm in this code window. Rather, you should define custom algorithms in libraries.

Custom algorithms from libraries#

When defining and importing algorithms for use in the visual ML tool, note that:

The algorithms must be scikit-learn compatible, having both a fit() and a predict() method.
Classifiers must also have a classes_ attribute and can implement a predict_proba() method in addition to the fit() and predict() methods.
Algorithms can be defined in the project’s Python library or the global Python library of the data directory and imported into the visual ML tool.
Algorithms can also be imported from any library that’s included in the code environment used by the visual ML tool.
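To make these requirements concrete, here is a minimal sketch of a scikit-learn compatible classifier as it might be written in a library module. The class name `NearestCentroidSketch` and its `temperature` parameter are hypothetical and chosen only for illustration; the point is the interface (`fit()`, `predict()`, `predict_proba()`, and `classes_`), not the algorithm itself.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class NearestCentroidSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical example: assign each row to the class with the closest mean."""

    def __init__(self, temperature=1.0):
        # Store hyperparameters unchanged so get_params()/set_params() work.
        self.temperature = temperature

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # classes_ gives the label order used for the predict_proba() columns.
        self.classes_ = np.unique(y)
        self.centroids_ = np.vstack(
            [X[y == label].mean(axis=0) for label in self.classes_]
        )
        return self

    def _distances(self, X):
        X = np.asarray(X, dtype=float)
        # Shape (n_samples, n_classes): Euclidean distance to each centroid.
        diffs = X[:, None, :] - self.centroids_[None, :, :]
        return np.linalg.norm(diffs, axis=2)

    def predict(self, X):
        return self.classes_[np.argmin(self._distances(X), axis=1)]

    def predict_proba(self, X):
        # Softmax over negative distances, so each row sums to 1.
        scores = -self._distances(X) / self.temperature
        scores -= scores.max(axis=1, keepdims=True)
        exp_scores = np.exp(scores)
        return exp_scores / exp_scores.sum(axis=1, keepdims=True)


# Small usage example on made-up data.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["a", "a", "b", "b"])
model = NearestCentroidSketch().fit(X_train, y_train)
predictions = model.predict([[0.05, 0.1], [5.1, 5.0]])
```

Inheriting from `BaseEstimator` provides `get_params()` and `set_params()`, which scikit-learn utilities such as `clone()` rely on.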

Tip

Dataiku assists with instantiating model objects in the code window by providing Code Samples that you can use within the code window. The code samples button is located at the top right corner of the code window.
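As a rough illustration of what instantiating a model object can look like, the sketch below imports an estimator from a library in the code environment (scikit-learn here) and creates it. It assumes the code window expects the estimator in a variable named `clf`; check the Code Samples in your instance for the exact convention. The commented import shows the same pattern for a class defined in a project library, where `my_models` is a hypothetical module name.

```python
# Import an estimator from a library available in the code environment.
from sklearn.ensemble import ExtraTreesClassifier

# Alternatively, import a class defined in a project library
# (hypothetical module and class names):
# from my_models import NearestCentroidSketch

# Assumption: the visual ML tool reads the estimator from a variable named `clf`.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
```

Only the instantiation lives in the code window; the algorithm's definition stays in the library, in line with the recommendation above.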

Custom algorithms from a plugin#

A custom model can also be created from a plugin’s prediction algorithm component.

In this approach, the plugin first needs to be installed on the Dataiku instance. The algorithm then becomes accessible from the Algorithms section of the visual ML tool’s Design tab. For example, the Linear Discriminant Analysis and the LightGBM Classification algorithms in the following figure are plugin components.

Note

You need to specify the proper code environment when using custom models in the visual ML tool. This code environment must include the required Python libraries for your machine learning algorithm.

You can specify the code environment by clicking the Runtime environment panel of the Design tab.
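One way to confirm that a code environment contains what a custom model needs is to run a quick check from a notebook that uses the same environment. The helper below is a sketch using only the Python standard library; the function name `missing_packages` and the package names passed to it are illustrative, not part of Dataiku.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: replace the list with the packages your algorithm imports.
required = ["json", "math"]
not_found = missing_packages(required)
```

An empty `not_found` list suggests the listed packages can be imported in that environment; any name that appears in it would need to be added to the code environment before training.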

Model training metrics#

When you train a custom model, the model training dashboard may not track the model’s metrics while training is in progress. This can happen if hyperparameter optimization isn’t enabled.

Assessing custom model performance#

Once a custom model is built, open it to visualize its performance and all associated visual insights, just as you would with a built-in model.

Custom models can be deployed in the Flow and used just like a standard built-in model!

Next steps#

Continue learning about custom models in the visual ML tool by working through Tutorial | Custom preprocessing & modeling within visual ML.