Tutorial | Dataiku applications

Get started

Dataiku applications allow users to package Dataiku projects into reusable applications with customizable inputs and pre-defined actions. Let’s build one together to demonstrate the power of this feature!

Objectives

In this tutorial, you will:

  • Convert a Dataiku project into a reusable Dataiku application.

  • Design a Dataiku application that allows a user to edit a project variable, run a scenario, and download the output data.

  • Create additional app instances to see the similarities and differences between a Dataiku project and an application.

  • Convert a Dataiku project into a Dataiku application-as-recipe.

Prerequisites

To reproduce the steps in this tutorial, you’ll need:

  • Access to an instance of Dataiku 12+.

  • Intermediate knowledge of Dataiku (recommended courses in the Advanced Designer learning path or equivalent).

  • You may also want to review this tutorial’s associated concept article.

Create the project

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Build a visual Dataiku application

Dataiku applications can be surfaced either as a visual application or an application-as-recipe. Let’s first create a visual application.

Review the project’s Flow and scenario

At a high level, this project’s Flow combines transaction, customer, and credit card data. Using visual recipes, it prepares this data and transforms it through a data pipeline in a number of different ways.

Dataiku screenshot of the starting Flow of the project.

Before designing a Dataiku application, let’s highlight what is of greatest interest for this tutorial:

  • The Join recipe includes a pre-filter step that limits transactions to a specific month set by a project variable tx_month.

  • The tx dataset includes a check on the record count. It currently returns an error.

  • The Data Refresh scenario first runs checks on the upstream tx dataset. Then, if all upstream checks pass, it builds the downstream tx_windows.
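To make the pre-filter concrete, here is a minimal Python sketch of what limiting transactions to one month means. It is only an illustration of the idea, not the Join recipe's actual implementation. The sample rows and the `purchase_date` field name are assumptions, since the real column names of the tx dataset may differ.

```python
from datetime import date

# Hypothetical sample rows; the real tx dataset's columns may differ.
transactions = [
    {"id": 1, "purchase_date": date(2017, 1, 15), "amount": 20.0},
    {"id": 2, "purchase_date": date(2017, 2, 3), "amount": 35.5},
    {"id": 3, "purchase_date": date(2017, 2, 28), "amount": 12.0},
]


def filter_by_month(rows, tx_month, date_field="purchase_date"):
    """Keep only rows whose date falls in the YYYY-MM month given by tx_month."""
    return [row for row in rows if row[date_field].strftime("%Y-%m") == tx_month]


february = filter_by_month(transactions, "2017-02")
print([row["id"] for row in february])  # prints [2, 3]
```

Changing the `tx_month` argument plays the same role as changing the project variable: the downstream data only ever sees the selected month.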

The check on the record count of the tx dataset will not be relevant for our purposes. Let’s instruct the scenario to ignore any failures that a check might produce.

  1. Open the Run checks step of the Data Refresh scenario.

  2. Check the box to Ignore failure so that the scenario always proceeds with building the tx_windows dataset.

  3. Click Save.

Dataiku screenshot of the Steps tab of the Data Refresh scenario.
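The effect of Ignore failure can be pictured with a small conceptual model in Python. This is a sketch of the control flow only, not Dataiku's scenario engine. The step name "Build tx_windows" and the two stand-in actions are hypothetical.

```python
def run_scenario(steps):
    """Run steps in order and stop at the first failure that is not ignored.

    Each step is a (name, action, ignore_failure) tuple, where action()
    returns True on success. Returns the names of the steps that ran.
    """
    executed = []
    for name, action, ignore_failure in steps:
        executed.append(name)
        if not action() and not ignore_failure:
            break  # a non-ignored failure halts the remaining steps
    return executed


def failing_check():
    return False  # stands in for the failing record-count check


def build_output():
    return True  # stands in for building tx_windows


# Without "Ignore failure", the build step never runs.
print(run_scenario([("Run checks", failing_check, False),
                    ("Build tx_windows", build_output, False)]))
# prints ['Run checks']

# With "Ignore failure" on the checks step, the build still runs.
print(run_scenario([("Run checks", failing_check, True),
                    ("Build tx_windows", build_output, False)]))
# prints ['Run checks', 'Build tx_windows']
```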

Convert the project into a Dataiku application

Converting a project into a Dataiku application allows other users (including those unfamiliar with the internal project details) to customize and execute some core functionality provided by the project.

For example, imagine other users on our team routinely need access to customized versions of this project’s assets, such as the final output data. We can create a Dataiku application from the project at hand so that anyone can easily download refreshed data for their own month of interest.

Let’s get started!

  1. From the top navigation bar, select the More Options menu (…) > Application Designer.

  2. Select Convert into a visual application.

Configure the application header

Let’s start with defining some parameters in the application header.

  1. Hover over the title. Click the pencil icon, and rename it Export transactions data.

  2. You can also define settings such as who can create, discover, and execute the application. As you are both the app’s creator and end user in this case, the default settings should be OK!

Dataiku screenshot of the header of a Dataiku application.

Configure included content

Because all of the starting data resides in the actual project (as opposed to an external connection like a database or a bucket in the cloud), we’ll need to include those resources for the application to function.

  1. In the Included content tile, check the box to Export all ‘input’ datasets.

  2. Enable the Export all ‘input’ managed folders data checkbox.

Dataiku screenshot of the included content tile of a Dataiku application.

Note

We could also have explicitly selected which datasets or managed folders to include.

Design the application

Now we can design the functionality of the application by adding tiles of pre-defined actions.

  1. At the bottom of the page, click Add Section.

  2. Give the title Export windowed transaction data by month.

Add a tile for editing variables

First we need a tile that allows users to choose their month of interest (in other words, set a project variable).

  1. Click Add Tile.

  2. Select Edit project variables.

  3. Give the title Select transaction month.

  4. Replace the auto-generated controls with the following code block:

[
   {
      "name": "tx_month",
      "label": "Set the tx_month variable",
      "type": "STRING",
      "description": "Format YYYY-MM"
   }
]
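Because the control above is a free-text STRING, nothing in this definition enforces the YYYY-MM format given in the description. As a hedged illustration, a value could be checked with a few lines of Python before use. The `is_valid_tx_month` helper is hypothetical and is not part of Dataiku.

```python
import re

# A four-digit year, a hyphen, and a month from 01 to 12.
TX_MONTH_PATTERN = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")


def is_valid_tx_month(value):
    """Return True if value looks like YYYY-MM, for example 2017-02."""
    return TX_MONTH_PATTERN.fullmatch(value) is not None


print(is_valid_tx_month("2017-02"))  # prints True
print(is_valid_tx_month("2017-13"))  # prints False
```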

Add a tile for running the scenario

Next, we want the user to be able to run the scenario that rebuilds the Flow.

  1. Click Add Tile.

  2. Select Run scenario.

  3. Give the title Run the Data Refresh scenario.

  4. Select Data Refresh as the scenario.

Dataiku screenshot of the run scenario tile in a Dataiku application.

Add a tile for downloading the results

Lastly, we want the user to be able to export the final output dataset.

  1. Click Add Tile.

  2. Select Download dataset.

  3. Give the title Download the output dataset.

  4. Select tx_windows as the dataset (since this is the dataset that the scenario builds).

  5. Click Save.

Dataiku screenshot of the download dataset tile in a Dataiku application.

Test the application

With these three tiles in place, we are ready to test the application.

  1. Near the top right of the Application Designer, click Test.

  2. In the test instance of the application, click Edit Project Variables.

  3. Enter a new month, such as 2017-02, and click Commit.

  4. Click Run Now to trigger the scenario run.

  5. When the scenario run finishes, click Run details. Note how the check fails as expected, but the scenario run continues.

  6. Use the back arrow of the browser to return to the application test instance.

  7. Click Download to export a copy of the tx_windows dataset rebuilt by the latest scenario run.

Dataiku screenshot of an app test instance.
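The same set-variable-then-run sequence can also be scripted. The sketch below assumes `project` is a project handle from the `dataikuapi` Python client, which exposes `get_variables()`, `set_variables()`, and `get_scenario()`. It also assumes that tx_month is stored among the "standard" project variables. The scenario ID, host, API key, and project key in the commented usage are placeholders, not values from this tutorial.

```python
def refresh_for_month(project, tx_month, scenario_id):
    """Set the tx_month project variable, then run the scenario and wait.

    `project` is assumed to behave like a dataikuapi project handle.
    """
    variables = project.get_variables()
    # Assumption: tx_month is a "standard" (not "local") project variable.
    variables["standard"]["tx_month"] = tx_month
    project.set_variables(variables)

    scenario = project.get_scenario(scenario_id)
    # no_fail=True avoids raising when the run does not end in plain success,
    # which may be the case here because the ignored check still fails.
    return scenario.run_and_wait(no_fail=True)


# Hypothetical usage against a real instance (placeholders throughout):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# project = client.get_project("YOUR_APP_INSTANCE_KEY")
# refresh_for_month(project, "2017-02", "YOUR_SCENARIO_ID")
```

Passing the project handle in as an argument keeps the function easy to try against any object that offers the same three methods.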

Tip

Import the CSV file you just downloaded into a Dataiku project to verify that it includes only records from the chosen month, 2017-02!
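If you would rather check the download outside Dataiku, a short Python script can do the same verification. This is a minimal sketch that assumes the date column holds ISO-style strings beginning with YYYY-MM. The `purchase_date` column name and the inline sample data are hypothetical, so substitute the real column name and file.

```python
import csv
import io


def months_in_csv(csv_file, date_column):
    """Return the set of YYYY-MM prefixes found in date_column."""
    reader = csv.DictReader(csv_file)
    # Assumption: dates are strings that start with YYYY-MM.
    return {row[date_column][:7] for row in reader}


# Inline sample standing in for the downloaded file.
sample = io.StringIO("id,purchase_date\n1,2017-02-03\n2,2017-02-28\n")
print(months_in_csv(sample, "purchase_date"))  # prints {'2017-02'}
```

For a real file, open it with `open(path, newline="")` and pass the file object instead of the sample. A result containing exactly one month confirms the filter worked.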

Create a new app instance

It’s important to recognize the difference between the parent Dataiku project and the child instance of the Dataiku application that we just created. Let’s browse available Dataiku applications, and then create a new app instance to make sure this is clear.

  1. From the top navigation bar, open the Applications (waffle) menu, and select Applications to browse Dataiku applications available on your instance.

  2. Select Export transactions data to return to the application’s home.

    Note

    Depending on your settings, you may also be able to discover Dataiku applications from the Dataiku Design homepage.

  3. Click Create App Instance.

  4. Give it a unique name like {YOURNAME} transaction export.

  5. Click Create.

    Dataiku screenshot of the dialog for creating an instance of a Dataiku application.

    Note

    Just like Dataiku projects, instances of Dataiku applications have project keys that cannot be changed and that must be unique to the instance.

  6. Once you have another copy of the application, test it out once more. Change the variable to a new month (like 2017-03), run the scenario, and export the results.

  7. When finished, click the Actions tab at the top right of the instance, and then Delete the project to clean up unneeded instances of an application.
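As noted in the steps above, every app instance needs its own unique project key. If you ever create instances in bulk, a helper along these lines could propose keys. It is a hypothetical sketch, not how Dataiku itself derives keys, and the uppercase-with-underscores style is an assumed convention.

```python
import re


def suggest_instance_key(name, existing_keys):
    """Derive an uppercase key from name, adding a numeric suffix if taken."""
    # Collapse anything other than letters and digits into underscores.
    base = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").upper() or "APP"
    key, suffix = base, 1
    while key in existing_keys:
        suffix += 1
        key = f"{base}_{suffix}"
    return key


print(suggest_instance_key("Ana transaction export", set()))
# prints ANA_TRANSACTION_EXPORT
print(suggest_instance_key("Ana transaction export", {"ANA_TRANSACTION_EXPORT"}))
# prints ANA_TRANSACTION_EXPORT_2
```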

Tip

In the Actions menu at the top right of the application home (not the individual app instance), you’ll also find the option to publish the application to a workspace, which can be a helpful collaboration space.

Recap of a visual Dataiku application

Congratulations on creating your first Dataiku application!

Rather than focus on the functionality of this specific application, try to recognize the value that this work pattern brings. One user can design a highly complex project that may, for example, prepare data, build models, or create dashboards.

The app’s creator can then enable many other users to access results or assets from that project in their own way, without interfering with other users or the original project.

Tip

You can find another tutorial including a visual Dataiku application in the AI Collaboration Quick Start.

Build a Dataiku application-as-recipe

Although a visual Dataiku application suits a wide variety of use cases, a second way to surface a Dataiku application may also interest you: a Dataiku application-as-recipe.

You may recall seeing this option before converting the existing project into a visual application.

Dataiku screenshot of the interface for choosing which type of application to create.

Convert a project into a Dataiku application-as-recipe

Instead of structuring the application as a separate instance with its own dedicated user interface, we can also package the application as a recipe that can be used in the Flow of other projects.

  1. Return to the Dataiku Applications project (the project from which you created the application).

  2. From the More Options (…) menu in the top navigation bar, select Application Designer.

  3. Navigate to the Advanced tab near the top right.

  4. Toggle ON the “Use as recipe” field.

  5. Click Save.

Dataiku screenshot of the advanced tab of a Dataiku application.

Define the application-as-recipe

Since we have already handled the application header and included content sections when creating the visual application, we just need to define the contents of the recipe.

  1. After saving, navigate back to the Content tab of the Application Designer.

  2. Select the Application-as-recipe panel on the left.

  3. Click + Add New Input, and select the tx dataset.

  4. Click + Add New Output, and select the tx_windows dataset.

  5. For the scenario, select Data Refresh.

  6. For the auto-generated controls, copy-paste the same JSON as before:

    [
       {
          "name": "tx_month",
          "label": "Set the tx_month variable",
          "type": "STRING",
          "description": "Format YYYY-MM"
       }
    ]
    
  7. Click Save.

Create a second project to test the application-as-recipe

You’ve now created a visual recipe that can be used in other projects on the same instance! To demonstrate its usage, let’s create a new project containing a dataset whose schema matches that of the tx dataset, which we defined as the input to the application-as-recipe.

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications-as-recipes.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Use the application-as-recipe in the Flow

This second project includes just one dataset: the transaction data from 2018. Its schema exactly matches the transaction data defined as an input to the application-as-recipe.
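What "exactly matches" means can be sketched in Python as a comparison of column names and types, in order. The `{"columns": [...]}` layout below is assumed to mirror how Dataiku describes a dataset schema, and the column names and types are invented for illustration rather than taken from tx or tx_2018.

```python
def schemas_match(expected, actual):
    """Return True if both schemas list the same (name, type) pairs in order."""
    def signature(schema):
        return [(col["name"], col["type"]) for col in schema["columns"]]
    return signature(expected) == signature(actual)


# Hypothetical schemas for illustration only.
tx_schema = {"columns": [{"name": "id", "type": "bigint"},
                         {"name": "purchase_date", "type": "date"}]}
tx_2018_schema = {"columns": [{"name": "id", "type": "bigint"},
                              {"name": "purchase_date", "type": "date"}]}
other_schema = {"columns": [{"name": "id", "type": "string"},
                            {"name": "purchase_date", "type": "date"}]}

print(schemas_match(tx_schema, tx_2018_schema))  # prints True
print(schemas_match(tx_schema, other_schema))    # prints False
```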

We can use the application-as-recipe just like any other visual recipe.

  1. From the Flow, click + Recipe > Applications > Export transactions data.

  2. Under Inputs, click + Add, and select tx_2018.

  3. Under Outputs, click + Add, and name it tx_windows_2018.

  4. Click Create Dataset, and then Create.

Dataiku screenshot of the dialog for adding the application-as-recipe.

Run the recipe

The only step to configure in this recipe is to set the tx_month variable.

  1. Enter a value for the tx_month variable, such as 2018-01.

  2. Click Run to execute the application-as-recipe.

Note

Depending on your instance settings, you may observe pop-up notifications indicating that the Data Refresh scenario is running.

When the recipe is finished running, explore the output and see your progress in the Flow.

Dataiku screenshot of the Flow after adding the application-as-recipe.

What’s next?

Let’s recap what we achieved here: With both varieties of Dataiku application, we were able to abstract the functionality of a Dataiku project behind a simple interface that is reusable by colleagues and that does not interfere with the original project.

  • For a visual application, that abstraction comes in the form of its own instance and user interface.

  • For an application-as-recipe, that abstraction comes in the form of a visual recipe.

Now you’ve seen both varieties of Dataiku applications!

Note

Learn more about both varieties of Dataiku applications in the reference documentation.