AutoML Model Design
In these resources, you will learn about creating models within the visual ML tool. You’ll learn about automated options for quickly building prototype models, as well as how to customize them in a wide variety of ways.
Tip
Validate your knowledge of this area by registering for the Dataiku Academy course, Machine Learning Basics. Then challenge yourself to earn a certification!
How-to | Distributed hyperparameter search
Optimizing ML algorithm hyperparameters is a numbers game: the more options you explore, the higher the likelihood of landing on the best model. Learn how to distribute hyperparameter computations across your Kubernetes cluster.
You will need a Dataiku instance set up to run with containerized execution and permission to run on that configuration.
When training the model, you need to select a containerized execution configuration on the Runtime environment panel of the Design tab.

On the Hyperparameters panel, choose to distribute the hyperparameter search and select the number of Kubernetes containers you wish to use.

While the model trains, the Results page displays the containers spinning up and shutting back down. Training otherwise proceeds and produces output normally.
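To build intuition for why distributing the search helps, here is a minimal conceptual sketch in Python. It is not how Dataiku schedules work on Kubernetes; the search space, the `build_grid` and `partition` helpers, and the worker count are all hypothetical. It only illustrates the general idea of expanding a grid of candidates and splitting it evenly across a chosen number of workers.

```python
from itertools import product


def build_grid(space):
    """Expand a dict of hyperparameter lists into a list of candidate dicts."""
    names = sorted(space)
    return [dict(zip(names, values))
            for values in product(*(space[name] for name in names))]


def partition(candidates, n_workers):
    """Assign candidates round-robin so worker loads differ by at most one."""
    return [candidates[i::n_workers] for i in range(n_workers)]


# Hypothetical search space, for illustration only.
space = {
    "max_depth": [3, 6, 9],
    "n_estimators": [50, 100],
    "learning_rate": [0.1, 0.3],
}

grid = build_grid(space)
shards = partition(grid, n_workers=4)

for worker_id, shard in enumerate(shards):
    print(f"worker {worker_id}: {len(shard)} candidates")
```

Each candidate is trained and scored independently of the others, so the shards can be evaluated in parallel. That is why adding containers lets you explore more options in the same wall-clock time.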

FAQ | How does the AutoML tool automatically select or reject features when training a model?
When training an automated machine learning task in the Lab, such as a quick prototype, Dataiku uses a few heuristics to automatically select or reject features by default. You can of course then manually reject features that Dataiku accepted (and vice versa).
You’ll notice in the Features handling panel of the Design tab that, by default, Dataiku rejects columns with any of the following characteristics:
Zero variance
Too many missing values (≥ 95% missing)
Categorical column suspected to contain identifiers based on:
  High cardinality (e.g. ≥ 95% of rows have a unique value) and column name starts or ends with “id”
  All unique values
Long text (natural language, not short categories)
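The heuristics above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not Dataiku's actual implementation: the `rejection_reason` function, the way missing values are detected, the reading of "high cardinality" as a ratio of distinct to non-missing values, and the `long_text_chars` cutoff for long text are all hypothetical choices made for illustration.

```python
def is_missing(value):
    """Treat None and empty strings as missing (an assumption of this sketch)."""
    return value is None or value == ""


def rejection_reason(name, values, is_categorical,
                     missing_threshold=0.95,
                     cardinality_threshold=0.95,
                     long_text_chars=100):
    """Return a reason string if the column would be rejected, else None."""
    n = len(values)
    present = [v for v in values if not is_missing(v)]
    missing_ratio = 1.0 if n == 0 else (n - len(present)) / n
    if missing_ratio >= missing_threshold:
        return "too many missing values"

    distinct = set(present)
    if len(distinct) == 1:
        return "zero variance"

    if is_categorical:
        unique_ratio = len(distinct) / len(present)
        if unique_ratio == 1.0:
            return "all unique values"
        lowered = name.lower()
        if unique_ratio >= cardinality_threshold and (
                lowered.startswith("id") or lowered.endswith("id")):
            return "likely identifier"
        # Hypothetical long-text test: average value length above a cutoff.
        average_length = sum(len(str(v)) for v in present) / len(present)
        if average_length > long_text_chars:
            return "long text"
    return None


# Small made-up columns to exercise the rules.
print(rejection_reason("constant", [1, 1, 1, 1], is_categorical=False))
print(rejection_reason("color", ["red", "blue", "red", "green"], is_categorical=True))
```

A column that returns `None` here would be kept. As in the Features handling panel, the result is only a default that you can override by hand.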
Troubleshoot | In visual ML, I get the error “All values of the target are equal” when they are not
The error “All values of the target are equal” means that the target column you chose to predict is constant in the data used for training.
If you know that there is more than one target value in your full dataset, then the current sample is likely causing the issue by returning the same value for every row of the target column. This can happen because of the way the data is sorted: the default sampling method for the Train/Test set takes the first N records of your dataset.
Fortunately, the sampling method is easily configured. To work around the issue, try random sampling instead, which you can specify in the Train / Test Set tab within the model design in the Lab.

Note that this takes longer because Dataiku must read the entire input dataset to compute the sample, which is then cached. The resulting sample should contain several different values in the target column.
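The effect of the sampling method is easy to reproduce outside Dataiku. The following sketch uses a made-up target column sorted by class, with hypothetical `head_sample` and `random_sample` helpers, to show how a first-N sample of sorted data can contain a single target value while a random sample of the same size does not.

```python
import random

# Hypothetical dataset sorted by target: 900 rows of class 0, then 100 of class 1.
targets = [0] * 900 + [1] * 100


def head_sample(rows, n):
    """First-N sampling: take the first n records in dataset order."""
    return rows[:n]


def random_sample(rows, n, seed=0):
    """Random sampling: draw n records without replacement from the full data."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


head = head_sample(targets, 200)
rand = random_sample(targets, 200)

print("first-N target values:", sorted(set(head)))
print("random target values:", sorted(set(rand)))
```

The first-N sample only ever sees the leading block of class 0, which is the situation that triggers the error. The random sample has to consider every record, which is also why it costs a full pass over the data.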
If the issue persists, then you may want to review your workflow and dataset to check if the column you chose as a target does indeed contain the values that you expect.