Generating Features

Before making our first model on the training dataset, let’s create a few more features that may be useful in predicting failure outcomes.

Because we are still designing this workflow, we’ll work in a sandbox environment that won’t create an output dataset yet. In the Lab, we can test transformations like these, try out modeling strategies, and much more. Nothing is added back to the Flow until we are done testing and ready to deploy!

Note

The Basics courses cover how steps in an analytics workflow can move between the Lab and the Flow.

With the training dataset selected, find the Lab in the Actions menu or in the right-click menu.
Under the Visual Analysis side, select New and accept the default name Analyze training.
On the screen, which looks similar to a Prepare recipe, create two new variables with the formula processor:

distance, from the expression Use_max - Use_min

time_in_service, from the expression Time_max - Time_min

Use the Fill empty cells with fixed value processor to replace empty values with 0 in columns starting with Reason. The regular expression ^R.*_Quantity_sum$ is handy here.
Lastly, to make the model results more interpretable, use the Rename columns processor according to the table below.

Old column name	New column name
count	times_measured
Time_min	age_initial
Time_max	age_last_known
Use_min	distance_initial
Use_max	distance_last_known
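
For readers who want to see the logic of these three steps outside the visual interface, here is a minimal pandas sketch. It is only an illustration of what the processors compute, not how the Lab runs them. The sample values and the column name Reason_A_Quantity_sum are made up for the example; the other column names come from the steps above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows: values are invented, and
# "Reason_A_Quantity_sum" is an assumed column name for illustration.
training = pd.DataFrame({
    "count": [3, 5],
    "Time_min": [10, 0],
    "Time_max": [40, 25],
    "Use_min": [100.0, 0.0],
    "Use_max": [250.0, 80.0],
    "Reason_A_Quantity_sum": [2.0, np.nan],
})

# Formula step: two new columns computed from existing ones.
training["distance"] = training["Use_max"] - training["Use_min"]
training["time_in_service"] = training["Time_max"] - training["Time_min"]

# Fill step: select columns by regular expression, then replace
# empty (NaN) values with 0 in those columns only.
reason_cols = training.filter(regex=r"^R.*_Quantity_sum$").columns
training[reason_cols] = training[reason_cols].fillna(0)

# Rename step: mapping taken from the table above.
training = training.rename(columns={
    "count": "times_measured",
    "Time_min": "age_initial",
    "Time_max": "age_last_known",
    "Use_min": "distance_initial",
    "Use_max": "distance_last_known",
})

print(training)
```

Note that the formulas run before the rename, so they refer to the original column names. Also note that the pattern ^R.*_Quantity_sum$ matches any column that begins with R and ends with _Quantity_sum, which is slightly broader than "starting with Reason".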

Note

It is not necessary to deploy a script created in the Lab to the Flow in order to make use of the new features in the modeling process. Any models created in a Visual Analysis have access to any features created in the same Visual Analysis.

Next, we’ll make some models!