Generate features

Before making any models with the joined dataset, let’s create a few more features that may be useful in predicting failure outcomes.

Because we are still designing this workflow, we’ll create a sandbox environment that won’t create an output dataset—yet. By going into the Lab, we can test out such transformations, plus much more. Nothing is added back to the Flow until we are done testing!

Note

See the Concept | Visual analyses in the Lab and Tutorial | Visual analyses in the Lab articles to learn how steps in an analytics workflow can move from the Lab to the Flow.

  • With the data_by_Asset dataset selected, navigate to the Lab in the right panel.

  • Create a New Analysis, and accept the default name Analyze data_by_Asset.

Dataiku screenshot of the dialog for creating a new visual analysis.

  • In the Script tab, which looks very similar to a Prepare recipe, let’s first use the “Rename columns” processor to rename the following 5 columns.

Old name    New name
count       times_measured
Time_min    age_initial
Time_max    age_last_known
Use_min     distance_initial
Use_max     distance_last_known
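For readers who think in code, the renaming step can be sketched in pandas. This is only an illustration of what the processor does, not how Dataiku implements it. The sample rows and the Asset column are hypothetical stand-ins for the data_by_Asset dataset.

```python
import pandas as pd

# Old-to-new column names, copied from the table above.
RENAMES = {
    "count": "times_measured",
    "Time_min": "age_initial",
    "Time_max": "age_last_known",
    "Use_min": "distance_initial",
    "Use_max": "distance_last_known",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the five columns renamed."""
    return df.rename(columns=RENAMES)


# Hypothetical two-row sample standing in for data_by_Asset.
sample = pd.DataFrame({
    "Asset": ["A1", "A2"],
    "count": [3, 5],
    "Time_min": [10, 20],
    "Time_max": [40, 90],
    "Use_min": [100.0, 250.0],
    "Use_max": [900.0, 1250.0],
})

renamed = rename_columns(sample)
print(list(renamed.columns))
```

Columns that are not listed in the mapping, such as Asset here, keep their original names.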

  • Create two new features with the Formula processor.

    • distance, from the expression distance_last_known - distance_initial

    • time_in_service, from the expression age_last_known - age_initial

  • Use the Fill empty cells with fixed value processor to replace empty values with 0 in the columns representing reasons. The regular expression ^R.*_Quantity_sum$ is handy for selecting those columns.
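The two Formula steps and the fill step above can be approximated in pandas as follows. This is a minimal sketch and assumes the columns have already been renamed. The sample values and the column names R1_Quantity_sum, R2_Quantity_sum, and notes_count are hypothetical, chosen only to show which columns the regular expression does and does not match.

```python
import pandas as pd

# Same pattern as in the Fill empty cells step.
REASON_PATTERN = r"^R.*_Quantity_sum$"


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add distance and time_in_service, then zero-fill the reason columns."""
    out = df.copy()
    out["distance"] = out["distance_last_known"] - out["distance_initial"]
    out["time_in_service"] = out["age_last_known"] - out["age_initial"]

    # Select columns whose names match the pattern, and fill only those.
    reason_cols = out.filter(regex=REASON_PATTERN).columns
    out[reason_cols] = out[reason_cols].fillna(0)
    return out


# Hypothetical sample; None marks an empty cell.
sample = pd.DataFrame({
    "age_initial": [10, 20],
    "age_last_known": [40, 90],
    "distance_initial": [100.0, 250.0],
    "distance_last_known": [900.0, 1250.0],
    "R1_Quantity_sum": [2.0, None],
    "R2_Quantity_sum": [None, 4.0],
    "notes_count": [None, 1.0],  # does not match the pattern
})

featured = add_features(sample)
print(featured[["distance", "time_in_service", "R1_Quantity_sum", "R2_Quantity_sum"]])
```

Because the pattern is anchored at both ends, a column such as notes_count is left untouched and keeps its empty cell.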

Dataiku screenshot of a visual analysis script.

Once we’re satisfied with these features, we can deploy the script of the visual analysis as a Prepare recipe so all collaborators can see these data preparation steps in the Flow.

  • From the Script tab of the visual analysis, click the yellow Deploy Script button at the top right.

  • Accept the default output name, and click Deploy.

  • Then click Run to build the output of the Prepare recipe.

Note

Here we’ve taken the approach of deploying the Lab script to the Flow as a Prepare recipe. It’s also possible, though, to keep these data preparation steps within the visual analysis as part of the model object (which we’ll see). This can be helpful in some situations, such as deploying models as prediction endpoints.