FAQ | How does the AutoML tool automatically select or reject features when training a model?

When you train an automated machine learning task in the Lab, such as a quick prototype, Dataiku applies a few heuristics to select or reject features by default. You can then override these choices manually, rejecting features that Dataiku accepted or accepting features that it rejected.

You’ll notice in the Features handling panel of the Design tab that, by default, Dataiku rejects columns with any of the following characteristics:

  • Zero variance

  • Too many missing values (≥ 95% missing)

  • Categorical column suspected to contain identifiers based on:

    • High cardinality (e.g. ≥ 95% of rows have a unique value) and column name starts or ends with “id”

    • All unique values

  • Long text (natural language, not short categories)
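To make the rules above concrete, here is a minimal Python sketch of how such heuristics could be expressed. It is an illustration only, not Dataiku's actual implementation: the function name `rejection_reason`, the treatment of `None` and empty strings as missing, the "all values are strings" test for categorical columns, and the `LONG_TEXT_MEAN_LENGTH` cutoff are all assumptions made for this example. Only the two 95% thresholds come from the list above.

```python
from typing import Optional, Sequence

MISSING_THRESHOLD = 0.95      # from the list above: >= 95% missing
CARDINALITY_THRESHOLD = 0.95  # from the list above: >= 95% unique values
LONG_TEXT_MEAN_LENGTH = 100   # hypothetical cutoff, not a documented Dataiku value


def rejection_reason(name: str, values: Sequence[object]) -> Optional[str]:
    """Return why a column would be rejected, or None if it would be kept."""
    # Assumption: None and empty strings count as missing.
    present = [v for v in values if v is not None and v != ""]
    if not values or (len(values) - len(present)) / len(values) >= MISSING_THRESHOLD:
        return "too many missing values"

    distinct = set(present)
    if len(distinct) <= 1:
        return "zero variance"

    # Assumption: a column is categorical or text when every value is a string.
    if all(isinstance(v, str) for v in present):
        mean_length = sum(len(v) for v in present) / len(present)
        if mean_length >= LONG_TEXT_MEAN_LENGTH:
            return "long text"

        unique_ratio = len(distinct) / len(present)
        lowered = name.lower()
        id_like_name = lowered.startswith("id") or lowered.endswith("id")
        if unique_ratio == 1.0:
            return "suspected identifier (all unique)"
        if unique_ratio >= CARDINALITY_THRESHOLD and id_like_name:
            return "suspected identifier (high cardinality, id-like name)"

    return None


if __name__ == "__main__":
    # Made-up columns, one per rule.
    columns = {
        "customer_id": [f"C{i:04d}" for i in range(100)],
        "country": ["FR", "US", "DE", "US"] * 25,
        "constant": [1] * 100,
        "mostly_empty": [None] * 96 + [1, 2, 3, 4],
        "price": [float(i) for i in range(100)],
    }
    for column_name, column_values in columns.items():
        print(column_name, "->", rejection_reason(column_name, column_values) or "kept")
```

In this sketch the long-text check runs before the identifier checks, because free text is usually unique per row and would otherwise be reported as an identifier. Numeric columns such as `price` are never flagged as identifiers, since the identifier rules in the list apply to categorical columns only.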