Tutorial | Dataiku applications#
Get started#
Dataiku applications allow users to package Dataiku projects into reusable applications with customizable inputs and pre-defined actions. Let’s build one together to demonstrate the power of this feature!
Objectives#
In this tutorial, you will:
Convert a Dataiku project into a reusable Dataiku application.
Design a Dataiku application that allows a user to edit a project variable, run a scenario, and download the output data.
Create additional app instances to see the similarities and differences between a Dataiku project and an application.
See how version management works for a Dataiku application.
Convert a Dataiku project into a Dataiku application-as-recipe.
Prerequisites#
To reproduce the steps in this tutorial, you’ll need:
Dataiku 12.0 or later.
Intermediate knowledge of Dataiku (recommended courses in the Advanced Designer learning path or equivalent).
Create the project#
From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications.
From the project homepage, click Go to Flow (or g + f).
Note
You can also download the starter project from this website and import it as a zip file.
Use case summary#
The project has three data sources:
Dataset | Description |
---|---|
tx | Each row is a unique credit card transaction with information such as the card that was used and the merchant where the transaction was made. It also indicates whether the transaction has either been: |
merchants | Each row is a unique merchant with information such as the merchant’s location and category. |
cards | Each row is a unique credit card ID with information such as the card’s activation month or the cardholder’s FICO score (a common measure of creditworthiness in the US). |
Build a visual Dataiku application#
Dataiku applications can be surfaced either as a visual application or an application-as-recipe. Let’s first create a visual application.
Review the project’s Flow and scenario#
At a high level, this project’s Flow combines transaction, merchant, and credit card data. Using visual recipes, it prepares this data and transforms it through a data pipeline in a number of different ways.
Before designing a Dataiku application, let’s highlight what is of greatest interest for this tutorial:
The Join recipe includes a pre-filter step that limits transactions to a specific month set by a project variable tx_month.
The Data Refresh scenario automates the build of the tx_windows dataset.
Open the Data Refresh scenario.
Navigate to the Steps tab to review its actions.
Click Run to trigger it.
Navigate to the Last runs tab to confirm its success.
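To make the role of the tx_month variable concrete, here is a minimal sketch of the idea behind a variable-driven pre-filter. Dataiku expands `${...}` references to project variables, and Python's `string.Template` happens to use the same placeholder syntax. The filter expression and the `purchase_month` and `purchase_date` column names are hypothetical, since the project's actual formula is not shown here.

```python
from string import Template

# Hypothetical filter expression; the project's real pre-filter is not shown
# in this tutorial. Only the ${tx_month} reference mirrors the variable name.
FILTER_TEMPLATE = Template('purchase_month == "${tx_month}"')


def expand_filter(variables):
    """Substitute ${name} placeholders with the given variable values."""
    return FILTER_TEMPLATE.substitute(variables)


def keep_row(row, tx_month):
    """Keep a row when its date (assumed to start with YYYY-MM) is in tx_month."""
    return row["purchase_date"][:7] == tx_month


# Illustrative rows, not data from the tutorial project.
rows = [
    {"id": 1, "purchase_date": "2017-01-15"},
    {"id": 2, "purchase_date": "2017-02-03"},
    {"id": 3, "purchase_date": "2017-02-27"},
]

expanded = expand_filter({"tx_month": "2017-02"})
kept_ids = [r["id"] for r in rows if keep_row(r, "2017-02")]
print(expanded)
print(kept_ids)
```

Changing the single variable value changes which rows survive the filter, which is what lets one Flow serve many months.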
Convert the project into a Dataiku application#
Converting a project into a Dataiku application allows other users (including those unfamiliar with the internal project details) to customize and execute some core functionality provided by the project.
For example, imagine other users on our team routinely need access to customized versions of this project’s assets, such as the final output data. We can create a Dataiku application from the project at hand so that anyone can easily download refreshed data for their own month of interest.
Let’s get started!
From the top navigation bar, select the More Options menu (…) > Application Designer.
Select Convert into a visual application.
Configure the application header#
Let’s start with defining some parameters in the application header.
Click the pencil icon next to the title, and rename it Export transactions data.
You can also define settings such as who can create, discover, and execute the application. As you are both the app’s creator and end user in this case, the default settings should be OK!
Configure included content#
Because all of the starting data resides in the actual project (as opposed to an external connection like a database or a bucket in the cloud), we’ll need to include those resources for the application to function.
Navigate to the Included content tile.
Check the box to Export all ‘input’ datasets.
Check the box to Export all ‘input’ managed folders data.
Note
We could also have explicitly selected which datasets or managed folders to include. This process is analogous to including additional content in a project bundle.
Design the application#
Now we can design the functionality of the application by adding tiles of pre-defined actions.
At the bottom of the page, click Add Section.
Give the title Export windowed transaction data by month.
Add a tile for editing variables#
First we need a tile that allows users to choose their month of interest (in other words, set a project variable).
Click Add Tile.
Select Edit project variables.
Give the title Select transaction month.
Replace the auto-generated controls with the following JSON array:
[
  {
    "name": "tx_month",
    "label": "Set the tx_month variable",
    "type": "STRING",
    "description": "Format YYYY-MM"
  }
]
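The control above is typed as STRING, so the "Format YYYY-MM" description is a hint to the user rather than something the JSON itself enforces. As a rough sketch of what the hint means, the following Python parses the same JSON and checks a candidate value against the YYYY-MM shape. The `is_valid_month` helper is hypothetical and is not part of Dataiku.

```python
import json
import re

# The same controls array as in the tile definition.
CONTROLS_JSON = """
[
  {
    "name": "tx_month",
    "label": "Set the tx_month variable",
    "type": "STRING",
    "description": "Format YYYY-MM"
  }
]
"""

# Four digits, a hyphen, then a month between 01 and 12.
MONTH_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def is_valid_month(value):
    """Hypothetical helper: True if value has the YYYY-MM shape."""
    return MONTH_PATTERN.fullmatch(value) is not None


controls = json.loads(CONTROLS_JSON)
control_names = [control["name"] for control in controls]
print(control_names)
print(is_valid_month("2017-02"), is_valid_month("2017-13"))
```

Parsing the array before pasting it into the tile is also a quick way to catch a missing comma or quote.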
Add a tile for running the scenario#
Next, we want the user to be able to run the scenario that rebuilds the Flow.
Click Add Tile.
Select Run scenario.
Give the title Run the Data Refresh scenario.
Select Data Refresh as the scenario.
Add a tile for downloading the results#
Lastly, we want the user to be able to export the final output dataset.
Click Add Tile.
Select Download dataset.
Give the title Download the output dataset.
Select tx_windows as the dataset (since this is the dataset that the scenario builds).
Click Save.
Test the application#
With these three tiles in place, we are ready to test the application.
Near the top right of the Application Designer, click Test.
In the test instance of the application, click Edit Project Variables.
Enter a new month, 2017-02, and click Commit.
Click Run Now to trigger the scenario.
When the scenario finishes, click Run details to view the results.
Use the back arrow of the browser to return to the application test instance.
Click Download to export a copy of the tx_windows dataset rebuilt by the latest scenario run.
Tip
Import the CSV file you just downloaded into a Dataiku project to verify that it includes only records from the chosen month, 2017-02!
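If you would rather check the export outside Dataiku, a few lines of Python can do the same verification. This is only a sketch: the `purchase_date` column name and the sample rows are hypothetical, and it assumes the date values start with YYYY-MM (as ISO-formatted dates do).

```python
import csv
import io


def months_in_export(csv_text, date_column):
    """Return the set of YYYY-MM prefixes found in the given date column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_column][:7] for row in reader}


# Stand-in for the downloaded file's contents; with a real download you would
# read the file instead, for example with open(path, newline="").read().
SAMPLE_EXPORT = (
    "transaction_id,purchase_date\n"
    "1,2017-02-03\n"
    "2,2017-02-27\n"
)

found_months = months_in_export(SAMPLE_EXPORT, "purchase_date")
print(found_months == {"2017-02"})
```

A result containing any month other than the one you entered would suggest the scenario did not rebuild with the new variable value.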
Create a new app instance#
It’s important to recognize the difference between the parent Dataiku project and the child instance of the Dataiku application that we just created. Let’s browse available Dataiku applications, and then create a new app instance to make sure this is clear.
From the top navigation bar, open the waffle menu, and select Dataiku Applications to browse Dataiku applications available on your instance.
Select Export transactions data to navigate to the application’s home.
Click Create App Instance.
Give it a unique name like {YOURNAME} transaction export.
Click Create.
Important
Just like Dataiku projects, instances of Dataiku applications have project keys that cannot be changed and that must be unique to the instance.
Once you have another copy of the application, test it out once more. Change the variable to a new month (like 2017-03), run the scenario, and export the results.
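The same edit-then-run sequence can be expressed in code. The sketch below assumes a project handle that exposes `get_variables()`, `set_variables()`, and `get_scenario(...).run_and_wait()`, as the Dataiku public Python client does. To stay runnable without a Dataiku instance, it uses in-memory stand-ins. The scenario ID `DATA_REFRESH`, the variable scope, and the string the fake scenario returns are all assumptions made for illustration.

```python
def refresh_for_month(project, month, scenario_id, scope="standard"):
    """Set tx_month on the project, then run the scenario and wait for it."""
    variables = project.get_variables()
    variables[scope]["tx_month"] = month  # which scope an app uses is assumed
    project.set_variables(variables)
    return project.get_scenario(scenario_id).run_and_wait()


# In-memory stand-ins that mimic only the method names used above.
class FakeScenario:
    def __init__(self, project):
        self.project = project

    def run_and_wait(self):
        # A real run returns a run object; this string is invented.
        month = self.project.variables["standard"]["tx_month"]
        return f"built tx_windows for {month}"


class FakeProject:
    def __init__(self):
        self.variables = {"standard": {"tx_month": "2017-01"}, "local": {}}

    def get_variables(self):
        return {scope: dict(values) for scope, values in self.variables.items()}

    def set_variables(self, variables):
        self.variables = variables

    def get_scenario(self, scenario_id):
        return FakeScenario(self)


# "DATA_REFRESH" is a hypothetical scenario ID, not taken from the project.
result = refresh_for_month(FakeProject(), "2017-03", "DATA_REFRESH")
print(result)
```

Against a real instance, you would pass an actual project handle instead of `FakeProject()` and look up the scenario's real ID first.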
Version a Dataiku application#
As the app’s creator iterates on their project, they’ll find a need for version management. Let’s introduce an arbitrary change to the application in order to simulate this situation.
Return to the original Dataiku project. If not changed, the project key begins with TUT_DKU_APPS.
From the top navigation bar, select the More Options menu (…) > Application Designer.
Make an arbitrary change to the application, such as renaming the title Export new transactions data.
Click Update Version.
You can use any convention for naming versions, such as MAJOR.MINOR.PATCH. This was a minor change so rename the new version as 1.1.
Give New title as the notification message.
Click Update.
Click Save on the Dataiku application.
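Because any naming convention is allowed, how versions are ordered is up to you. As one possible reading of MAJOR.MINOR.PATCH, the sketch below treats a missing part as 0 (so 1.1 means 1.1.0) and compares versions numerically. The helper names are hypothetical, and nothing here reflects how Dataiku itself compares versions.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints, padding missing parts with 0."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer(candidate, current):
    """True if candidate is a later version than current."""
    return parse_version(candidate) > parse_version(current)


print(parse_version("1.1"))
print(is_newer("1.1", "1.0.0"))
print(is_newer("1.10.0", "1.9.3"))
```

Comparing integer tuples rather than raw strings avoids the classic mistake of ranking 1.10 below 1.9.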
Once the app’s creator has issued a new version, downstream app users have the ability to update their own instances of the application.
Return to your child app instance, the one named {YOURNAME} transaction export.
Seeing the alert about the new version, click Recreate App Instance to upgrade the app instance.
When finished with this section of the tutorial, click the Actions tab at the top right of the instance, and then Delete the project to clean up unneeded instances of the application.
Tip
In the Actions menu at the top right of the application home (not the individual app instance), you’ll also find the option to publish the application to a workspace, which can be a helpful collaboration space.
Recap of a visual Dataiku application#
Rather than focus on the functionality of this specific application, try to recognize the value that this work pattern brings:
One user can design a highly complex project that may prepare data, build models, or create dashboards for example.
The app’s creator can then enable many other users to access results or assets from that project in their own way — without interfering with other users or the original project.
As the app’s creator iterates on their project, app users can update their instances.
Tip
You can find another tutorial including a visual Dataiku application in the AI Collaboration Quick Start.
Build a Dataiku application-as-recipe#
Although a visual Dataiku application suits a wide variety of use cases, a second way to surface a Dataiku application may also interest you: a Dataiku application-as-recipe.
You may recall seeing this option before converting the existing project into a visual application.
Convert a project into a Dataiku application-as-recipe#
Instead of structuring the application as a separate instance with its own dedicated user interface, we can also package the application as a recipe that can be used in the Flow of other projects.
Return to the Dataiku Applications project (the project from which you created the application). Unless you manually chose a different key, the project key begins with TUT_DKU_APPS.
From the More Options (…) menu in the top navigation bar, select Application Designer.
Navigate to the Advanced tab near the top right.
Toggle ON the “Use as recipe” field.
Click Save.
Define the application-as-recipe#
Since we have already handled the application header and included content sections when creating the visual application, we just need to define the contents of the recipe.
After saving, navigate back to the Content tab of the Application designer.
Select the Application-as-recipe panel on the left.
Click + Add New Input, and select the tx dataset.
Click + Add New Output, and select the tx_windows dataset.
For the scenario, select Data Refresh.
For the Auto-generated controls, copy-paste the same JSON as before:
[ { "name": "tx_month", "label": "Set the tx_month variable", "type": "STRING", "description": "Format YYYY-MM" } ]
Click Save.
Create a second project to test the application-as-recipe#
You’ve now created a visual recipe that can be used in other projects on the same instance! To demonstrate its usage, let’s create a new project containing a dataset whose schema matches that of the tx dataset defined as the input to the application-as-recipe.
From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dataiku Applications-as-recipes.
From the project homepage, click Go to Flow (or g + f).
Note
You can also download the starter project from this website and import it as a zip file.
Use the application-as-recipe in the Flow#
This second project includes just one dataset: the transaction data from 2018. Its schema exactly matches the transaction data defined as an input to the application-as-recipe.
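To illustrate what an exact schema match involves, here is a small sketch that compares two schemas expressed as lists of (column name, type) pairs. The column names and types are hypothetical placeholders, not the real tx schema, and the `schema_mismatches` helper is not a Dataiku function.

```python
def schema_mismatches(expected, actual):
    """List the differences between two schemas given as (name, type) pairs."""
    problems = []
    actual_types = dict(actual)
    for name, col_type in expected:
        if name not in actual_types:
            problems.append(f"missing column: {name}")
        elif actual_types[name] != col_type:
            problems.append(
                f"type mismatch for {name}: {col_type} vs {actual_types[name]}"
            )
    expected_names = {name for name, _ in expected}
    for name, _ in actual:
        if name not in expected_names:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical schemas for illustration only.
tx_schema = [("transaction_id", "string"), ("purchase_amount", "double")]
tx_2018_schema = [("transaction_id", "string"), ("purchase_amount", "double")]
broken_schema = [("transaction_id", "string"), ("purchase_amount", "string")]

print(schema_mismatches(tx_schema, tx_2018_schema))
print(schema_mismatches(tx_schema, broken_schema))
```

An empty list means the new dataset can stand in for the original input. Any reported difference is a likely cause of a failing run.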
We can use the application-as-recipe just like any other visual recipe.
From the Flow, click + Recipe > Applications > Export transactions data.
Tip
If you don’t see the new application-as-recipe as an option, try a hard refresh of your browser.
Under Inputs, click + Add, and select tx_2018.
Under Outputs, click + Add, and name it tx_windows_2018.
Click Create Dataset.
Click Create.
Run the recipe#
The only step to configure in this recipe is to set the tx_month variable.
Provide a month for the variable, such as 2018-01.
Click Run to execute the application-as-recipe.
Note
Depending on your instance settings, you may observe pop-up notifications indicating that the Data Refresh scenario is running.
When the recipe is finished running, explore the output and see your progress in the Flow.
What’s next?#
Let’s recap what we achieved here: With both varieties of Dataiku application, we were able to abstract the functionality of a Dataiku project behind a simple interface that is reusable by colleagues and that does not interfere with the original project.
For a visual application, that abstraction comes in the form of its own instance and user interface.
For an application-as-recipe, that abstraction comes in the form of a visual recipe.
Dataiku applications are one way to operationalize a project. Another method is to batch deploy a project bundle to a production environment. Learn more in Concept | Batch deployment.
See also
See the reference documentation to learn more about Dataiku Applications.