AutoML Model Design
These resources cover creating models within the visual ML tool: automated options for quickly building prototype models, as well as the many ways you can customize them.
To validate your knowledge of this area, register for the Machine Learning Basics course, part of the ML Practitioner learning path, on the Dataiku Academy.
Concepts & tutorials
- Concept | Quick models in Dataiku
- Concept | The Design tab within the visual ML tool
- Tutorial | Create the model (ML Practitioner part 1)
- Concept | Feature handling within the visual ML tool
- Concept | Feature generation & reduction
- Concept | Algorithm and hyperparameter selection within the visual ML tool
- Tutorial | Tune the model (ML Practitioner part 3)
- Tutorial | Visual ML assertions
- Tutorial | Visual ML features
- Tutorial | Cluster models with visual ML
- Tutorial | MLlib with Dataiku
How-to | Distributed hyperparameter search
Optimizing ML algorithm hyperparameters is a numbers game; the more options you explore, the higher the likelihood of landing on the best model. Learn how to distribute hyperparameter computations across your Kubernetes cluster.
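To see why exploring more candidates pays off, consider the standard random-search argument: if a fraction p of the hyperparameter space yields a near-optimal model, the chance that n independent random trials hit that region at least once is 1 - (1 - p)^n. A back-of-envelope sketch (illustrative numbers only, not specific to Dataiku):

```python
# Probability that at least one of n random hyperparameter trials lands in the
# top-p fraction of the search space. The 5% figure is an illustrative
# assumption, not a Dataiku default.
def hit_probability(n: int, p: float = 0.05) -> float:
    return 1 - (1 - p) ** n

for n in (10, 30, 60):
    print(f"{n} trials -> {hit_probability(n):.2f} chance of a top-5% config")
```

With a top-5% target region, 60 trials already give better than a 95% chance of a hit, which is why spreading many trials across containers is worthwhile.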
You will need a Dataiku DSS instance set up to run with containerized execution and permission to run on that configuration.
When training the model, you need to select a containerized execution configuration on the Runtime environment panel of the Design tab.
On the Hyperparameters panel, choose to distribute the hyperparameter search and select the number of Kubernetes containers you wish to use.
When you train the model, the Results page shows the containers spinning up and shutting back down; training otherwise proceeds and produces output as usual.
Troubleshoot | In visual ML, I get the error “All values of the target are equal” when they are not
This error means that the target column you chose to predict is constant in the data used for training.
If you know that there is more than one target value in your full dataset, then it’s likely that the current sample is causing the issue by returning the same value for every record in the target column. This could be because of the way the data is sorted, since the default sampling method for the Train/Test set is to take the first N records of your dataset.
Fortunately, the sampling method is easily configured. To work around the issue at hand, you can try using random sampling instead, which you can specify in the Train / Test Set tab within the model design in the Lab.
Note that this will take longer, because DSS must read the entire input dataset to compute the sample, which is then cached. The new sample should contain several different values in the target column.
If the issue persists, then you may want to review your workflow and dataset to check if the column you chose as a target does indeed contain the values that you expect.
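The pitfall is easy to reproduce outside Dataiku. In the toy dataset below (not Dataiku code), the rows are sorted by target and the minority class appears only at the end, so first-N sampling sees a constant target while a random sample of the same size almost always sees both values:

```python
import random

# A dataset sorted by the target column: the minority class "B" only
# appears in the last 10 of 100 rows.
rows = [{"target": "A"}] * 90 + [{"target": "B"}] * 10

# "First records" sampling (the default) sees a single target value.
head_sample = rows[:50]
print(len({r["target"] for r in head_sample}))  # 1 distinct value

# A random sample of the same size will almost always see both values.
random.seed(42)
random_sample = random.sample(rows, 50)
print(len({r["target"] for r in random_sample}))
```

This is exactly the situation described above: the full dataset has two target values, but the default sample does not, so switching to random sampling resolves the error.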