Concept: Quick Models
You can use visual machine learning in DSS to train several machine learning models in just a few steps. Building your machine learning model happens in the Lab of DSS. The Lab is a place for drafting your work, whether it is preliminary data preparation or machine learning model creation.
In the Lab, you can select the Quick Model option to let DSS make smart modeling choices for you, such as the train/test split or the preprocessing of features.
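To make the train/test split concrete, here is a minimal standard-library sketch of a random split. It is only an illustration of the idea, not the way DSS implements it; the function name `train_test_split`, the 80/20 ratio, and the fixed seed are assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle rows reproducibly, then split them into (train, test) lists.

    Hypothetical helper for illustration; the ratio and seed are assumptions.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))                         # stand-in for dataset rows
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed means the same rows land in the test set on every run, which keeps scores comparable from one experiment to the next.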
The next step is to define the machine learning task to perform. More specifically, you will need to select between Prediction and Clustering. Prediction relates to supervised learning problems where the variable to predict is available in a labelled train dataset. On the other hand, clustering refers to unsupervised learning problems where the target is unknown, and you’re looking to find patterns and similarities in your data points.
Different kinds of modeling tasks
Prediction models are supervised learning algorithms, i.e. they are trained on past examples for which the actual values (the target column) are known. The nature of the target variable drives the kind of prediction task.
Regression is used to predict a real-valued quantity (e.g. a duration, a quantity, an amount spent…).
Two-class classification is used to predict a boolean quantity (e.g. presence / absence, yes / no…).
Multiclass classification is used to predict a variable with a finite set of values (red/blue/green, small/medium/big…).
Clustering models infer a function that describes hidden structure in “unlabeled” data. These unsupervised learning algorithms group similar rows based on their features.
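The way the target variable drives the prediction task can be sketched with a simple heuristic. This is an illustrative assumption, not the rule DSS applies: the function `infer_prediction_type` is hypothetical, and real tools also consider things like the number of distinct values in a numeric column.

```python
def infer_prediction_type(target_values):
    """Guess the prediction task from the values of a target column.

    Hypothetical heuristic for illustration only:
      - exactly two distinct values  -> two-class classification
      - all numeric values           -> regression
      - anything else                -> multiclass classification
    """
    distinct = {v for v in target_values if v is not None}  # ignore missing values
    if len(distinct) < 2:
        raise ValueError("a target needs at least two distinct values")
    if len(distinct) == 2:
        return "two-class classification"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "regression"
    return "multiclass classification"


print(infer_prediction_type([12.5, 3.0, 7.25, 40.0]))       # amounts spent
print(infer_prediction_type(["yes", "no", "yes"]))          # yes / no
print(infer_prediction_type(["red", "blue", "green"]))      # finite set of labels
```

A clustering task has no target column at all, so there is nothing to pass to a function like this one.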
When building a supervised learning model, you’ll also need to select a target variable. The target variable is the variable whose values are to be modeled and predicted by your model using the other variables. It’s what you want to predict.
In the prediction style tab, you will be able to select between the Automated Machine Learning mode, where DSS makes many optimized choices for you, and the Expert mode, where you have full control over the details of your model and can write your own algorithms in code or use deep learning models.
In the Automated Machine Learning mode, you’ll still be able to define the types of algorithms DSS will train. This will let you choose between fast prototypes, interpretable models, or high performing models with less interpretability.
You can also define the computation engine used to train the models. You’ll be able to leverage your machine’s Python-based back-end or, depending on the integrations made at the admin level, offload training to your Spark cluster using Spark MLlib or H2O Sparkling Water.
You can now launch your first training session and train a few models on your training dataset.
A session is one iteration of your experiment. It will include and save all the parameters, the dataset, features, and algorithms used during training, as well as relevant training information. You’ll be able to create many training sessions to experiment and try to improve your baseline model’s performance.
It is good practice to give each session an explicit name so that you can identify and explore it later.
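As a rough mental model, a session can be pictured as a record that bundles everything used in one training run. The sketch below is an assumption made for illustration and does not reflect how DSS stores sessions internally; the class `TrainingSession`, its fields, the dataset and column names, and the scores are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSession:
    """Hypothetical record of one training iteration (not a DSS class)."""
    name: str                  # explicit name, so the session is easy to find later
    dataset: str
    target: str
    features: list
    algorithms: list
    scores: dict = field(default_factory=dict)   # algorithm -> test metric

    def best_algorithm(self):
        """Return the highest-scoring algorithm, or None if nothing is scored."""
        if not self.scores:
            return None
        return max(self.scores, key=self.scores.get)


# Made-up values, for illustration only.
sessions = [
    TrainingSession(
        name="baseline, all features",
        dataset="customers",
        target="churned",
        features=["age", "plan", "tenure"],
        algorithms=["logistic_regression", "random_forest"],
        scores={"logistic_regression": 0.71, "random_forest": 0.78},
    ),
]

for session in sessions:
    print(session.name, "->", session.best_algorithm())
```

Keeping the name next to the parameters and scores is what makes it practical to compare a later session against the baseline.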