Tutorial | Dataiku applications

Get started

Dataiku applications allow users to package Dataiku projects into reusable applications with customizable inputs and pre-defined actions. Let’s build one together to demonstrate the power of this feature!

Objectives

In this tutorial, you will:

  • Convert a Dataiku project into a reusable Dataiku application.

  • Design a Dataiku application that allows a user to edit a project variable, run a scenario, and download the output data.

  • Create additional app instances to see the similarities and differences between a Dataiku project and an application.

  • Convert a Dataiku project into a Dataiku application-as-recipe.

Prerequisites

To reproduce the steps in this tutorial, you’ll need:

  • Access to an instance of Dataiku 12+.

  • Intermediate knowledge of Dataiku (we recommend the courses in the Advanced Designer learning path, or equivalent experience).

Create the project

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications.

  2. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Use case summary

The project has three data sources:

  • tx: Each row is a unique credit card transaction with information such as the card that was used and the merchant where the transaction was made. It also indicates whether the transaction was authorized (a score of 1 in the authorized_flag column) or flagged for potential fraud (a score of 0).

  • merchants: Each row is a unique merchant with information such as the merchant’s location and category.

  • cards: Each row is a unique credit card ID with information such as the card’s activation month or the cardholder’s FICO score (a common measure of creditworthiness in the US).

Build a visual Dataiku application

Dataiku applications can be surfaced either as a visual application or an application-as-recipe. Let’s first create a visual application.

Review the project’s Flow and scenario

At a high level, this project’s Flow combines transaction, merchant, and credit card data. Using visual recipes, it prepares this data and transforms it through a data pipeline that ends with the tx_windows dataset.

Dataiku screenshot of the starting Flow of the project.

Before designing a Dataiku application, let’s highlight what is of greatest interest for this tutorial:

  • The Join recipe includes a pre-filter step that limits transactions to a specific month set by a project variable tx_month.

  • The tx_prepared dataset includes a data quality rule (or, for a pre-12.6 user, a check) on the record count. It currently returns an error.

  • The Data Refresh scenario first verifies data quality rules or runs checks (for pre-12.6 users) on the upstream tx dataset. Then, if all upstream rules or checks pass, it builds the downstream tx_windows.
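To make the pre-filter and the record-count rule concrete, here is a plain-Python approximation of what they do to the rows. It is only a sketch, not Dataiku’s implementation: the column name purchase_date, the sample rows, and the 1,000-row threshold are all hypothetical.

```python
# Hypothetical sample rows; the real tx dataset has more columns, and the
# column name "purchase_date" is an assumption for this sketch.
transactions = [
    {"card_id": "C1", "purchase_date": "2017-01-15", "authorized_flag": 1},
    {"card_id": "C2", "purchase_date": "2017-02-03", "authorized_flag": 0},
    {"card_id": "C3", "purchase_date": "2017-02-27", "authorized_flag": 1},
]


def pre_filter(rows, tx_month):
    """Keep rows whose purchase_date falls in tx_month (format YYYY-MM)."""
    # Appending "-" ensures that, e.g., "2017-1" never matches "2017-10-...".
    prefix = tx_month + "-"
    return [row for row in rows if row["purchase_date"].startswith(prefix)]


def record_count_rule(rows, min_records):
    """Mimic a record-count rule: pass only if enough rows remain."""
    return len(rows) >= min_records


february = pre_filter(transactions, "2017-02")
print(len(february))                      # 2
print(record_count_rule(february, 1000))  # False: the hypothetical rule fails
```

Filtering to a single month shrinks the data, which is one plausible reason a fixed record-count threshold downstream can start failing.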

The data quality rule or check on the record count of the tx_prepared dataset will not be relevant for our purposes. Let’s instruct the scenario to ignore any failures that a rule or check might produce.

  1. Open the Verify rules or run checks step of the Data Refresh scenario.

  2. Check the box to Ignore failure so that the scenario always proceeds with building the tx_windows dataset.

  3. Click Save.

Dataiku screenshot of the Steps tab of the Data Refresh scenario.
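The Ignore failure option changes how a failing step affects the rest of the run. The toy model below, with hypothetical function names, shows the difference: a failing step halts the run unless its ignore_failure flag is set. It only models step ordering and does not reproduce how Dataiku reports the overall outcome of the scenario.

```python
class StepFailed(Exception):
    """Raised by a step whose rule or check does not pass."""


def run_scenario(steps):
    """Run (name, action, ignore_failure) steps in order and return a log.

    A failing step stops the run unless its ignore_failure flag is True.
    """
    log = []
    for name, action, ignore_failure in steps:
        try:
            action()
            log.append((name, "SUCCESS"))
        except StepFailed:
            log.append((name, "FAILED"))
            if not ignore_failure:
                break  # later steps never run
    return log


built = []


def verify_rules():
    # Stand-in for the record-count rule that currently fails.
    raise StepFailed("record count rule on tx_prepared failed")


def build_tx_windows():
    built.append("tx_windows")


# With ignore_failure=True on the first step, the build still happens.
log = run_scenario([
    ("Verify rules or run checks", verify_rules, True),
    ("Build tx_windows", build_tx_windows, False),
])
print(log)
# [('Verify rules or run checks', 'FAILED'), ('Build tx_windows', 'SUCCESS')]
print(built)  # ['tx_windows']
```

Setting the first flag to False instead would stop the loop after the failed step, so tx_windows would never be built.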

Convert the project into a Dataiku application

Converting a project into a Dataiku application allows other users (including those unfamiliar with the internal project details) to customize and execute some core functionality provided by the project.

For example, imagine other users on our team routinely need access to customized versions of this project’s assets, such as the final output data. We can create a Dataiku application from the project at hand so that anyone can easily download refreshed data for their own month of interest.

Let’s get started!

  1. From the top navigation bar, select the More Options menu (…) > Application Designer.

  2. Select Convert into a visual application.

Configure the application header

Let’s start with defining some parameters in the application header.

  1. Hover over the title. Click the pencil icon, and rename it Export transactions data.

  2. You can also define settings such as who can create, discover, and execute the application. As you are both the app’s creator and end user in this case, the default settings should be OK!

Dataiku screenshot of the header of a Dataiku application.

Configure included content

Because all of the starting data resides in the actual project (as opposed to an external connection like a database or a bucket in the cloud), we’ll need to include those resources for the application to function.

  1. In the Included content tile, check the box to Export all ‘input’ datasets.

  2. Enable the Export all ‘input’ managed folders data checkbox.

Dataiku screenshot of the included content tile of a Dataiku application.

Note

We could also have explicitly selected which datasets or managed folders to include. This process is analogous to including additional content in a project bundle.
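Roughly speaking, the ‘input’ datasets are the ones at the start of the Flow, which no recipe in the project builds. The sketch below computes that set from a simplified, hypothetical version of this Flow; the intermediate dataset tx_joined and the recipe ordering are assumptions, not the project’s exact Flow.

```python
# A simplified, hypothetical Flow: each recipe maps its input datasets to its
# output datasets. "tx_joined" is an invented name, and the real Flow
# contains more recipes than this.
recipes = [
    (["tx", "merchants", "cards"], ["tx_joined"]),
    (["tx_joined"], ["tx_prepared"]),
    (["tx_prepared"], ["tx_windows"]),
]


def flow_inputs(recipes):
    """Return datasets that recipes consume but no recipe produces."""
    consumed = {name for inputs, _ in recipes for name in inputs}
    produced = {name for _, outputs in recipes for name in outputs}
    return consumed - produced


print(sorted(flow_inputs(recipes)))  # ['cards', 'merchants', 'tx']
```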

Design the application

Now we can design the functionality of the application by adding tiles of pre-defined actions.

  1. At the bottom of the page, click Add Section.

  2. Give it the title Export windowed transaction data by month.

Add a tile for editing variables

First, we need a tile that lets users choose their month of interest (in other words, set a project variable).

  1. Click Add Tile.

  2. Select Edit project variables.

  3. Give it the title Select transaction month.

  4. Replace the auto-generated controls with the following code block:

[
   {
      "name": "tx_month",
      "label": "Set the tx_month variable",
      "type": "STRING",
      "description": "Format YYYY-MM"
   }
]
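Because the controls are JSON, a stray comma or missing quote makes the definition invalid. The following sketch parses the same controls with Python’s json module and adds a YYYY-MM check for the value a user might enter. The format check is this sketch’s own addition: as far as this tutorial shows, the description field only tells users the expected format, and the STRING type does not enforce it. Requiring name and type on each control is likewise an assumption of the sketch, not Dataiku’s full schema.

```python
import json
import re

# The controls entered in the tile, as a Python string.
controls_json = """
[
   {
      "name": "tx_month",
      "label": "Set the tx_month variable",
      "type": "STRING",
      "description": "Format YYYY-MM"
   }
]
"""

# YYYY-MM with a month between 01 and 12.
MONTH_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def check_controls(text):
    """Parse the controls JSON and return the variable names it exposes."""
    controls = json.loads(text)  # raises ValueError on malformed JSON
    for control in controls:
        # Assumption of this sketch: every control needs a name and a type.
        if "name" not in control or "type" not in control:
            raise ValueError(f"incomplete control: {control}")
    return [control["name"] for control in controls]


def is_valid_tx_month(value):
    """True if value looks like YYYY-MM, e.g. 2017-02."""
    return MONTH_PATTERN.fullmatch(value) is not None


print(check_controls(controls_json))  # ['tx_month']
print(is_valid_tx_month("2017-02"))   # True
print(is_valid_tx_month("Feb 2017"))  # False
```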

Add a tile for running the scenario

Next, we want the user to be able to run the scenario that rebuilds the Flow.

  1. Click Add Tile.

  2. Select Run scenario.

  3. Give it the title Run the Data Refresh scenario.

  4. Select Data Refresh as the scenario.

Dataiku screenshot of the run scenario tile in a Dataiku application.

Add a tile for downloading the results

Lastly, we want the user to be able to export the final output dataset.

  1. Click Add Tile.

  2. Select Download dataset.

  3. Give it the title Download the output dataset.

  4. Select tx_windows as the dataset (since this is the dataset that the scenario builds).

  5. Click Save.

Dataiku screenshot of the download dataset tile in a Dataiku application.

Test the application

With these three tiles in place, we are ready to test the application.

  1. Near the top right of the Application Designer, click Test.

  2. In the test instance of the application, click Edit Project Variables.

  3. Enter a new month, 2017-02, and click Commit.

  4. Click Run Now to trigger the scenario run.

  5. When the scenario run finishes, click Run details. Note that the data quality rule (or check) fails as expected, but the scenario run continues.

  6. Use the back arrow of the browser to return to the application test instance.

  7. Click Download to export a copy of the tx_windows dataset rebuilt by the latest scenario run.

Dataiku screenshot of an app test instance.

Tip

Import the CSV file you just downloaded into a Dataiku project to verify that it includes only records from the chosen month, 2017-02!
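If you prefer to check the download programmatically, a short script can list the months present in the file. This sketch assumes the export is a comma-separated file with a header row and a date column named purchase_date whose values start with YYYY-MM; the column name is hypothetical, so adjust it to match your actual data.

```python
import csv
import io


def months_in_csv(fileobj, date_column="purchase_date"):
    """Return the set of YYYY-MM values found in date_column of a CSV file."""
    reader = csv.DictReader(fileobj)
    return {row[date_column][:7] for row in reader}


# Stand-in for the downloaded file; in practice, pass the file you downloaded,
# opened with open(path, newline="").
sample = io.StringIO(
    "card_id,purchase_date,authorized_flag\n"
    "C2,2017-02-03,0\n"
    "C3,2017-02-27,1\n"
)
print(months_in_csv(sample))  # {'2017-02'}
```

A result containing exactly one month, the one you committed, confirms that the pre-filter used your variable.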

Create a new app instance

It’s important to recognize the difference between the parent Dataiku project and the child instance of the Dataiku application that we just created. To make this distinction concrete, let’s browse the available Dataiku applications and then create a new app instance.

  1. From the top navigation bar, open the waffle menu, and select Dataiku Applications to browse Dataiku applications available on your instance.

  2. Select Export transactions data to return to the application’s home.

    Note

    Depending on your instance settings, you may also be able to discover Dataiku applications from the Dataiku Design homepage.

  3. Click Create App Instance.

  4. Give it a unique name like {YOURNAME} transaction export.

  5. Click Create.

    Dataiku screenshot of the dialog for creating an instance of a Dataiku application.

    Important

    Just like Dataiku projects, instances of Dataiku applications have project keys that cannot be changed and that must be unique across your Dataiku instance.

  6. Once you have another copy of the application, test it out once more. Change the variable to a new month (like 2017-03), run the scenario, and export the results.

  7. When finished, click the Actions tab at the top right of the instance, and then Delete the project to clean up unneeded instances of the application.
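The key idea is that each app instance starts as its own copy of the application, so changing tx_month in one instance affects neither the parent project nor other instances. The toy model below illustrates that isolation and the unique-key constraint with Python dictionaries; the class name, project keys, and default month are hypothetical, and real instances are full Dataiku projects rather than dictionaries.

```python
import copy


class AppInstances:
    """Toy model of app instances: each is an independent copy of the app."""

    def __init__(self, template_variables):
        self._template = template_variables
        self.instances = {}

    def create(self, project_key):
        # Like Dataiku project keys, keys in this model must be unique.
        if project_key in self.instances:
            raise ValueError(f"project key {project_key!r} is already taken")
        # deepcopy so later edits never leak back into the template.
        self.instances[project_key] = copy.deepcopy(self._template)
        return self.instances[project_key]


parent_variables = {"tx_month": "2017-01"}  # hypothetical default value
apps = AppInstances(parent_variables)

mine = apps.create("JANE_TX_EXPORT")   # hypothetical project keys
theirs = apps.create("BOB_TX_EXPORT")
mine["tx_month"] = "2017-03"

print(theirs["tx_month"], parent_variables["tx_month"])  # 2017-01 2017-01
```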

Tip

In the Actions menu at the top right of the application home (not the individual app instance), you’ll also find the option to publish the application to a workspace, which can be a helpful collaboration space.

Recap of a visual Dataiku application

Congratulations on creating your first Dataiku application!

Rather than focus on the functionality of this specific application, try to recognize the value that this work pattern brings:

  • One user can design a complex project that, for example, prepares data, builds models, or creates dashboards.

  • The app’s creator can then enable many other users to access results or assets from that project in their own way, without interfering with other users or the original project.

Tip

You can find another tutorial including a visual Dataiku application in the AI Collaboration Quick Start.

Build a Dataiku application-as-recipe

Although a visual Dataiku application suits a wide variety of use cases, a second way to surface a Dataiku application may also interest you: a Dataiku application-as-recipe.

You may recall seeing this option before converting the existing project into a visual application.

Dataiku screenshot of the interface for choosing which type of application to create.

Convert a project into a Dataiku application-as-recipe

Instead of structuring the application as a separate instance with its own dedicated user interface, we can also package the application as a recipe that can be used in the Flow of other projects.

  1. Return to the Dataiku Applications project (the project from which you created the application). Unless you manually chose a different key, the project key begins with TUT_DKU_APPS.

  2. From the More options (…) menu in the top navigation bar, select Application Designer.

  3. Navigate to the Advanced tab near the top right.

  4. Toggle ON the “Use as recipe” field.

  5. Click Save.

Dataiku screenshot of the advanced tab of a Dataiku application.

Define the application-as-recipe

Since we have already handled the application header and included content sections when creating the visual application, we just need to define the contents of the recipe.

  1. After saving, navigate back to the Content tab of the Application Designer.

  2. Select the Application-as-recipe panel on the left.

  3. Click + Add New Input, and select the tx dataset.

  4. Click + Add New Output, and select the tx_windows dataset.

  5. For the scenario, select Data Refresh.

  6. For the Auto-generated controls, copy-paste the same JSON as before:

    [
       {
          "name": "tx_month",
          "label": "Set the tx_month variable",
          "type": "STRING",
          "description": "Format YYYY-MM"
       }
    ]
    
  7. Click Save.

Dataiku screenshot of the page to define a Dataiku application as recipe.
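Conceptually, running the application-as-recipe means: put the caller’s input data where the application expects tx, apply the control values as project variables, run the Data Refresh scenario, and return tx_windows as the recipe output. The sketch below models that sequence in plain Python. The AppAsRecipe class and data_refresh function are hypothetical, the stand-in scenario only filters by month (the real windowing logic is omitted), and this is a mental model rather than how Dataiku executes the recipe internally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AppAsRecipe:
    """Hypothetical model of the definition entered in the Application Designer."""
    input_name: str       # dataset replaced by the caller's input ("tx")
    output_name: str      # dataset handed back to the caller ("tx_windows")
    control_names: list   # variables the caller may set
    scenario: Callable[[dict, dict], None]  # builds datasets from variables

    def run(self, input_rows, params):
        unknown = set(params) - set(self.control_names)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        datasets = {self.input_name: list(input_rows)}  # substitute the input
        variables = dict(params)                        # apply the controls
        self.scenario(datasets, variables)              # run the scenario
        return datasets[self.output_name]               # return the output


def data_refresh(datasets, variables):
    """Stand-in scenario: filter tx by month; real windowing logic is omitted."""
    prefix = variables["tx_month"] + "-"
    datasets["tx_windows"] = [
        row for row in datasets["tx"] if row["purchase_date"].startswith(prefix)
    ]


recipe = AppAsRecipe("tx", "tx_windows", ["tx_month"], data_refresh)
tx_2018 = [{"purchase_date": "2018-01-05"}, {"purchase_date": "2018-02-11"}]
print(recipe.run(tx_2018, {"tx_month": "2018-01"}))
# [{'purchase_date': '2018-01-05'}]
```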

Create a second project to test the application-as-recipe

You’ve now created a visual recipe that can be used in other projects on the same instance! To demonstrate its usage, let’s create a new project containing a dataset whose schema matches that of tx, the dataset defined as the input to the application-as-recipe.

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications-as-recipes.

  2. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Use the application-as-recipe in the Flow

This second project includes just one dataset: the transaction data from 2018. Its schema exactly matches the transaction data defined as an input to the application-as-recipe.
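Schema compatibility is what makes this substitution work. A quick way to reason about it is to compare column names and types, as in this sketch. The column names and type labels are hypothetical, and the comparison takes the strict view the tutorial relies on (an exact match); it is not Dataiku’s own validation logic.

```python
def schema_differences(expected, actual):
    """Compare two schemas given as lists of (column_name, type) pairs."""
    expected_types = dict(expected)
    actual_types = dict(actual)
    problems = []
    for name, col_type in expected:
        if name not in actual_types:
            problems.append(f"missing column: {name}")
        elif actual_types[name] != col_type:
            problems.append(f"{name}: expected {col_type}, got {actual_types[name]}")
    for name, _ in actual:
        if name not in expected_types:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical schemas; the real tx schema has more columns.
tx_schema = [("card_id", "string"), ("purchase_date", "date"), ("authorized_flag", "int")]
tx_2018_schema = [("card_id", "string"), ("purchase_date", "date"), ("authorized_flag", "int")]

print(schema_differences(tx_schema, tx_2018_schema))  # []
```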

We can use the application-as-recipe just like any other visual recipe.

  1. From the Flow, click + Recipe > Applications > Export transactions data.

    Tip

    If you don’t see the new application-as-recipe as an option, try a hard refresh of your browser.

  2. Under Inputs, click + Add, and select tx_2018.

  3. Under Outputs, click + Add, and name it tx_windows_2018.

  4. Click Create Dataset.

  5. Click Create.

Dataiku screenshot of the dialog for adding the application-as-recipe.

Run the recipe

The only setting to configure in this recipe is the tx_month variable.

  1. Set tx_month to a month such as 2018-01.

  2. Click Run to execute the application-as-recipe.

Note

Depending on your instance settings, you may observe pop-up notifications indicating that the Data Refresh scenario is running.

When the recipe is finished running, explore the output and see your progress in the Flow.

Dataiku screenshot of the Flow after adding the application-as-recipe.

What’s next?

Let’s recap what we achieved: with both varieties of Dataiku application, we abstracted the functionality of a Dataiku project behind a simple interface that colleagues can reuse without interfering with the original project.

  • For a visual application, that abstraction comes in the form of its own instance and user interface.

  • For an application-as-recipe, that abstraction comes in the form of a visual recipe.

Now you’ve seen both varieties of Dataiku applications!

See also

See the reference documentation to learn more about Dataiku Applications.