AutoML Model Design

In these resources, you will learn about creating models within the visual ML tool. You’ll learn about automated options for quickly building prototype models, as well as how to customize them in a wide variety of ways.


Validate your knowledge of this area by registering for the Dataiku Academy course, Machine Learning Basics. Then challenge yourself to earn a certification!

FAQ | How does the AutoML tool automatically select or reject features when training a model?

When training an automated machine learning task in the Lab, such as a quick prototype, Dataiku uses a few heuristics to automatically select or reject features by default. You can of course then manually reject features that Dataiku accepted (and vice versa).

You’ll notice in the Features handling panel of the Design tab that, by default, Dataiku rejects columns with any of the following characteristics:

  • Zero variance

  • Too many missing values (≥ 95% missing)

  • Categorical column suspected to contain identifiers based on:

    • High cardinality (e.g. ≥ 95% of rows have a unique value) and column name starts or ends with “id”

    • All unique values

  • Long text (natural language, not short categories)

Troubleshoot | In visual ML, I get the error “All values of the target are equal” when they are not

This error “All values of the target are equal” means that the target column you’re choosing to predict is a constant.

If you know that there is more than one target value in your full dataset, then it’s likely that the current sample is causing the issue by returning all the same values for the target column. This could be because of the way the data is sorted since the default sampling method for the Train/Test set is to take the first N records of your dataset.

Fortunately, the sampling method is easily configured. To workaround the issue at hand, you can try using random sampling instead, which you can specify in the Train / Test Set tab within the model design in the Lab.


Note that this will take a longer time because Dataiku must read the entire input dataset to compute a sample, which will cache. This sample should then have several different values in the target column.

If the issue persists, then you may want to review your workflow and dataset to check if the column you chose as a target does indeed contain the values that you expect.