Tutorial | Visual analyses in the Lab#

Keeping all of your work in the Flow can lead to overcrowding. The Lab, by contrast, is the place for experimentation and preliminary work.

For data preparation and visualization, you can create a visual analysis in the Lab, and then deploy this work as a Prepare recipe in the Flow.

Objectives#

In this tutorial, you will:

  • Prepare data in a visual analysis from the Lab.

  • Deploy a visual analysis from the Lab to the Flow as a Prepare recipe.

Prerequisites#

To reproduce the steps in this tutorial, you’ll need:

  • Access to an instance of Dataiku 12+.

  • Basic knowledge of Dataiku (Core Designer level or equivalent).

  • You may also want to review this tutorial’s associated concept article.

Create the project#

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Visual analysis.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Next, build the Flow.

  1. Click Flow Actions at the bottom right of the Flow.

  2. Click Build all.

  3. Keep the default settings and click Build.

Create a visual analysis#

The tx_joined dataset, which joins the transaction, credit card, and merchant data, requires further preparation. Although we could do this directly in a Prepare recipe, we are still in an exploratory stage.

To avoid cluttering the production Flow with outputs we may not use, let’s take an alternative approach: a visual analysis in the Lab.

  1. From the Flow, select tx_joined.

  2. In the Actions tab on the right, click on the Lab button. Alternatively, navigate to the Lab tab of the right side panel (shown below).

  3. In the Visual analyses section, click New Analysis.

  4. Click Create Analysis, accepting the default name.

Dataiku screenshot of the dialog for creating a new visual analysis in the Lab.

Add preparation steps to a script#

We now see an interface that appears very similar to a Prepare recipe. We can add steps to it in exactly the same way.

Parse a date column#

Let’s start by parsing dates.

  1. Click on the column header for purchase_date.

  2. Click on Parse date.

  3. With the format yyyy-MM-dd selected, click Use Date Format.

  4. In the new Parse date step on the left, remove the name of the output column to parse the column in its original place.

Dataiku screenshot of parsing a date in a visual analysis in the Lab.
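To make the yyyy-MM-dd format concrete, here is a minimal Python sketch of the same idea outside Dataiku. It is not how the Parse date processor is implemented; the function name and sample values are hypothetical, and `%Y-%m-%d` is the Python equivalent of the yyyy-MM-dd pattern.

```python
from datetime import datetime


def parse_purchase_date(value, fmt="%Y-%m-%d"):
    """Parse a yyyy-MM-dd string; return None when the value does not match."""
    try:
        return datetime.strptime(value, fmt).date()
    except (TypeError, ValueError):
        # Assumption: values that do not match the format are left empty.
        return None


# Hypothetical sample values, not taken from tx_joined.
raw_dates = ["2017-01-02", "2017-13-40", ""]
parsed = [parse_purchase_date(v) for v in raw_dates]
```

Only the first value matches the format, so the other two entries come back as None.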

Convert currencies#

One handy processor converts currencies based on a date column.

  1. Near the bottom left, click Add a New Step.

  2. Search for Convert currencies.

  3. Give purchase_amount as the column.

  4. Select USD as the input currency.

  5. Select EUR as the output currency.

  6. For input date source, select From Column (Date), and give purchase_date as the date column.

  7. Name the output column purchase_amount_eur.

Dataiku screenshot of converting currency in a visual analysis in the Lab.
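Conceptually, a date-based conversion looks up the exchange rate for each row's date and multiplies the amount by it. The sketch below illustrates that logic in plain Python. The rate table holds placeholder values rather than real market data, the helper name is hypothetical, and the handling of missing rates is an assumption, not documented processor behavior.

```python
from datetime import date

# Hypothetical USD -> EUR rates keyed by date (placeholder values only).
USD_TO_EUR = {
    date(2017, 1, 2): 0.95,
    date(2017, 1, 3): 0.96,
}


def convert_usd_to_eur(amount, on_date, rates=USD_TO_EUR):
    """Convert amount using the rate for on_date; return None if unknown."""
    if amount is None or on_date is None:
        return None
    rate = rates.get(on_date)
    if rate is None:
        # Assumption: no rate for this date means no converted value.
        return None
    return amount * rate


rows = [
    {"purchase_amount": 100.0, "purchase_date": date(2017, 1, 2)},
    {"purchase_amount": 50.0, "purchase_date": date(2017, 1, 3)},
    {"purchase_amount": 20.0, "purchase_date": date(2017, 1, 4)},
]
for row in rows:
    # Mirror the tutorial's output column name.
    row["purchase_amount_eur"] = convert_usd_to_eur(
        row["purchase_amount"], row["purchase_date"]
    )
```

This is why the purchase_date column needs to be parsed first: the lookup depends on a real date value rather than a string.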

Round numbers#

Let’s also round the original purchase_amount values for simplicity.

  1. Click on the column header for purchase_amount.

  2. Click on Round to integer.

  3. Increase the decimal places to 2.

Dataiku screenshot of rounding numbers in a visual analysis in the Lab.
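As a rough illustration of rounding to two decimal places, the following Python sketch uses the built-in `round`. The sample amounts are hypothetical, and Python's tie-breaking rules may differ from those of the Dataiku processor, so treat it as an analogy only.

```python
# Hypothetical purchase amounts, not taken from tx_joined.
amounts = [12.3456, 7.0, 0.999]

# Round each amount to 2 decimal places, as configured in the step above.
rounded = [round(amount, 2) for amount in amounts]
```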

Simplify text#

If we want to work with a text or natural language column, a good starting point is often to normalize the text.

  1. Click on the column header for product_title.

  2. Click Simplify text.

  3. Leave the default option to normalize the text.

Dataiku screenshot of the Simplify text step in a visual analysis.
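For intuition, text normalization can be sketched in Python as lowercasing plus accent removal. This assumes that is what normalizing means here; the processor may do more (for example, handle punctuation), and the function name is hypothetical.

```python
import unicodedata


def normalize_text(text):
    """Lowercase text and strip accents via Unicode decomposition."""
    # NFD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


simplified = normalize_text("Café CRÈME")
```

Normalizing first makes later matching simpler, since patterns only need to cover lowercase, unaccented text.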

Extract with a regular expression#

We can manipulate string data with regular expressions in many places throughout Dataiku, including in data preparation. As an example, let’s use a regular expression to extract the first mention of a common Apple product into a new column.

  1. Near the bottom left, click Add a New Step.

  2. Search for Extract with regular expression.

  3. Give product_title as the input column.

  4. Give apple as the prefix for the output column.

  5. Give \b(apple|macbook|ipad|iphone|ipod)\b as the regular expression.

Dataiku screenshot of regex extraction step in a visual analysis in the Lab.

Note

Here we’ve provided the regular expression for you, but you can explore how to use the smart pattern builder on your own.
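To see what this pattern matches, here is a small Python sketch that applies the same regular expression with the standard `re` module. The product titles are hypothetical and already lowercased, as they would be after the Simplify text step. The helper name is also hypothetical.

```python
import re

# Same pattern as in the step above; \b restricts matches to whole words.
APPLE_PATTERN = re.compile(r"\b(apple|macbook|ipad|iphone|ipod)\b")


def first_apple_product(title):
    """Return the first matching product keyword, or None if there is none."""
    match = APPLE_PATTERN.search(title)
    return match.group(1) if match else None


# Hypothetical, already-normalized product titles.
titles = [
    "apple iphone 6s 64gb",
    "case for ipad mini",
    "samsung galaxy s7",
    "pineapple slicer",
]
extracted = [first_apple_product(t) for t in titles]
```

The word boundaries prevent a false match on "pineapple", and only the first match in each title is kept.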

Deploy a visual analysis script#

Of course, we could continue preparing this data in a number of ways, but let’s stop here. Assuming we are satisfied, let’s now transfer these experimental steps into an actual Prepare recipe, where they can transform data in the Flow.

  1. Near the top right, click Deploy Script.

  2. Name the output dataset tx_prepared.

  3. Check the box to build the new dataset now.

  4. Click Deploy & Build.

Dataiku screenshot of the dialog for deploying a visual analysis from the Lab.

What’s next?#

Your Flow should include a Prepare recipe with the same script of steps you’ve added, as well as the output dataset!

Now that you have a prepared dataset, the next step is to dive further into the toolkit of other visual recipes. Try the Pivot recipe next!