Dataiku User Interface

Here you’ll find resources to help you get comfortable navigating the Dataiku interface. Learn about core elements of Dataiku, such as projects, the Flow, and the catalog, to become a more productive user.

Reference | Right panel navigation

In Dataiku, datasets and recipes together make up the Flow of any project. On the right side of the Flow is a collapsible panel packed with more functionality than you may be aware of.

The Right Panel is contextual. Its content changes according to the kind of object selected in the Flow.

Actions tab

The most important tab is the Actions tab. With a dataset selected, you can find controls for building, exporting, or sharing the dataset, or initiating a Lab analysis. You can also start visual, code, or plugin recipes, or perform other actions such as changing a connection.

Screenshot of the Actions tab options for a dataset.

Selecting a different kind of object changes the functionality found in the Actions tab. With a recipe selected, you can now find actions such as editing the recipe or changing the engine.

Screenshot of the Actions tab options for a recipe.

With a machine learning model selected, you can retrain, share, or publish the model. You can also initiate a Score recipe to make predictions on new data with the model, or start an Evaluate recipe to see how the model performs on already-known data.

Screenshot of the Actions tab options for a model.

Details tab

Beneath the Actions tab, the Details tab shows metadata about the object, such as its creation and modification dates. For a dataset, this tab will highlight the parent recipe and any associated Labs or Charts, along with the size and number of records.

Screenshot of the Details tab information for a dataset.

For a recipe, this tab also shows the corresponding input and output datasets. For some recipes, including Group, Prepare, Join, and Stack, the Details tab also shows a summary of actions completed in the recipe.

Screenshot of the Details tab summary for a recipe.

Schema tab

For a dataset, the Schema tab provides a list of all columns and their storage types.

Screenshot of the Schema tab for a dataset.
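
The column list shown in the Schema tab can also be read from a script. The sketch below is a minimal illustration, assuming the `dataikuapi` Python client, whose dataset handle exposes `get_schema()` and returns a dictionary with a `columns` list of `name`/`type` entries. The helper names `list_columns` and `format_schema` are hypothetical, and the host URL, API key, project key, and dataset name in the commented usage are placeholders.

```python
def list_columns(dataset):
    """Return (column name, storage type) pairs for a dataset handle.

    Assumption: `dataset.get_schema()` returns a dict with a "columns"
    list whose entries carry "name" and "type" keys.
    """
    schema = dataset.get_schema()
    return [(col["name"], col["type"]) for col in schema.get("columns", [])]


def format_schema(columns):
    """Render the pairs as aligned text, one column per line."""
    if not columns:
        return ""
    width = max(len(name) for name, _ in columns)
    return "\n".join(
        f"{name.ljust(width)}  {storage_type}" for name, storage_type in columns
    )


# Usage against a live instance (placeholder values, not executed here):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# dataset = client.get_project("MYPROJECT").get_dataset("customers")
# print(format_schema(list_columns(dataset)))
```

Keeping the API call in one small function makes it easy to swap in a stub dataset when no instance is available.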

Discussions tab

The Discussions tab shows any existing discussions attached to the object, and lets you start a new one.

Screenshot of the Discussions tab.

Lab tab

For datasets, you can navigate to the Lab tab to start a visual machine learning task or start a new code notebook.

Screenshot of the Lab tab options for a dataset.

Timeline tab

The Timeline tab shows the date and time of each action performed on the object, along with the user who took the action.

Screenshot of the Timeline tab information for a model training session.

How-to | Duplicate a Dataiku project

In many situations, an existing project can serve as a useful template for a new project. Fortunately, it is very easy to duplicate a Dataiku project so you never need to manually replicate a Flow.

Whether you want to copy the project to the same or another instance, detailed instructions for duplicating a project can be found in the reference documentation.

You can initiate this process from the project homepage or the Dataiku projects page.

From the project homepage

From the homepage of the project you want to copy, select Actions > Duplicate project. This will produce a dialog window through which you can edit the project name and key and specify a destination where the project should be copied.

You can also specify advanced options such as duplication mode (should all of the data be duplicated or only the Flow?) and any connections that may need to be re-mapped.
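
The same choices (new key, new name, duplication mode) can in principle be made from a script. The following is a minimal sketch, assuming the `dataikuapi` client's project handle offers a `duplicate(target_project_key, target_project_name, duplication_mode=...)` method. The mode value `"MINIMAL"` used as a default is an assumption to check against the reference documentation, and `duplicate_project` is a hypothetical helper name. All keys and names in the commented usage are placeholders.

```python
def duplicate_project(client, source_key, target_key, target_name,
                      duplication_mode="MINIMAL"):
    """Duplicate the project `source_key` under a new key and name.

    Assumption: `client.get_project(key)` returns a handle with a
    `duplicate()` method accepting the new key, the new name, and a
    `duplication_mode` keyword. The return value is passed through as-is.
    """
    project = client.get_project(source_key)
    return project.duplicate(
        target_key,
        target_name,
        duplication_mode=duplication_mode,
    )


# Usage against a live instance (placeholder values, not executed here):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# duplicate_project(client, "SALES", "SALES_COPY", "Sales (copy)")
```

Connection remapping is left out of the sketch on purpose, since its exact structure should be taken from the reference documentation rather than guessed.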


From the projects page

You can also start the same process from the Projects page: right-click the tile of the project to be copied and choose Duplicate project.



Tip | Navigating the Flow

You don’t always need to go back to the Flow to navigate. Several shortcuts are available:

  • In a dataset, try the “Parent recipe” button.

  • In a recipe, click “Inputs outputs”, then the name of a dataset to open it.

  • Use the shortcut Shift+A or click the compass (next to the name of the recipe or dataset you’re working on) to bring up the navigator.

Tip

From the Flow view, press Z to zoom the Flow to the last dataset you browsed. See more useful shortcuts like this one.

Tip | Refactoring the Flow

Refactoring will be reworked in future versions to make things easier.

While most refactoring operations are possible today, some require multiple steps where a single-click operation would be preferable.

To keep the Flow simple and clean:

  • Refactor from time to time.

  • “Temporary” copies of a recipe tend to be not so temporary… Copying is fine when you are in a hurry, but set aside time after the deadline to come back and clean up. If you are not in a hurry, it is usually better to put in the extra effort to rework the Flow than to copy and paste a recipe.

  • Talk! When creating a new branch in the Flow, explain what it is needed for, and especially when it can be deleted.

  • Conversely, clean up your own work. Reuse what other people have already built in the project to avoid reinventing the wheel.

  • Know what your colleagues have built, and ask whether you may modify their work so that it suits your needs too.

Tip | Anchoring for Flow management

When your Flow is composed of multiple short branches, it can be a pain to see them stretch all the way to the right of the screen.

Of course, the ideal shape for a Flow is a pipe, with clear inputs and outputs, and no dangling branches.

However, not all projects lead to such clean Flows, and sometimes you need to fork your work at multiple steps.

To better visualize your Flow in that case, navigate to the project homepage, open the settings menu, and, under the config tab, uncheck the Anchor Flow graph option.

Flow with anchors activated vs. without anchors activated.

Tip | Hide or show Flow items

Dataiku provides the means to hide or show parts of your Flow, as needed. This is especially useful when working with large Flows.

By right-clicking a dataset or a recipe, you can access options that include Hide all upstream and Hide all downstream. These hide all of the upstream or downstream objects connected to the selected object.

Dataiku screenshot showing the option to hide objects upstream or downstream from a point.

To show the hidden objects again, simply click the plus (+) sign.

Tip | Exposing datasets and models

If major parts of your Flow are independent and relatively complex, you can split your work across multiple projects. For example, you could have a Flow for data collection, then a Flow for machine learning and analysis, and finally a Flow for visualization, each in its own separate project.

To use a project's datasets in another project, go to project settings → config → exposed elements and expose the datasets (or models) to that project.

Exposed datasets become a source element in the other project.

This is a way to split a big undertaking into a few projects and to allow each user to work in their own project.

Some users also create a “datamart” project, containing mostly datasets meant to be exposed to other projects.

Beware that the dependencies between projects can quickly become a maze. Get the global picture before exposing datasets.
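
Exposing objects can also be scripted, which helps when a “datamart” project shares many datasets at once. The sketch below assumes the `dataikuapi` client's project settings object provides `add_exposed_object(object_type, object_id, target_project)` followed by `save()`. The `"DATASET"` type string is likewise an assumption to verify against the Python API documentation. `expose_objects` is a hypothetical helper, and the project and dataset names in the commented usage are placeholders.

```python
def expose_objects(project, object_ids, target_project_key, object_type="DATASET"):
    """Expose several objects of one project to a target project.

    Assumption: `project.get_settings()` returns a settings handle with
    `add_exposed_object(object_type, object_id, target_project)` and
    `save()`. Returns the list of object ids that were submitted.
    """
    ids = list(object_ids)
    settings = project.get_settings()
    for object_id in ids:
        settings.add_exposed_object(object_type, object_id, target_project_key)
    # Nothing is persisted until the settings are saved.
    settings.save()
    return ids


# Usage against a live instance (placeholder values, not executed here):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# datamart = client.get_project("DATAMART")
# expose_objects(datamart, ["customers", "orders"], "CHURN_ANALYSIS")
```

Saving once after the loop keeps the change atomic from the script's point of view: either all listed objects are submitted in one settings update, or none are.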

Tip | Using project folders

As the number of projects on a Dataiku instance grows, it is important to maintain an organizational structure easily understood by all contributors. At the same time, any organizational structure should also support the data governance policies of the enterprise.

As detailed in the reference documentation, project folders in Dataiku support both of these objectives.

Improve organization

Project folders allow you to organize projects in a hierarchy of folders of unlimited depth. Just drag and drop projects (or existing folders of projects) into or out of folders. Hold the Shift key while clicking to select multiple projects at a time.

Alternatively, you can manage project folders via the Public API. Programmatically create empty folders to generate an organizational structure from a script. See the Python API documentation to learn more.
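
As an illustration of generating a folder structure from a script, here is a minimal sketch. It assumes the `dataikuapi` client exposes `get_root_project_folder()` and that a folder handle offers `create_sub_folder(name)` returning the new folder. Both should be checked against the Python API documentation. `create_folder_tree` is a hypothetical helper, and the folder names, host URL, and API key are placeholders.

```python
def create_folder_tree(parent_folder, tree):
    """Recursively create empty sub-folders under `parent_folder`.

    `tree` maps each folder name to a nested dict of the same shape
    (an empty dict means no children). Returns the created paths,
    relative to `parent_folder`, in creation order.

    Assumption: `parent_folder.create_sub_folder(name)` returns a handle
    to the newly created folder.
    """
    created = []
    for name, children in tree.items():
        sub_folder = parent_folder.create_sub_folder(name)
        created.append(name)
        for child_path in create_folder_tree(sub_folder, children):
            created.append(name + "/" + child_path)
    return created


# Usage against a live instance (placeholder values, not executed here):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# root = client.get_root_project_folder()
# create_folder_tree(root, {"Marketing": {"Churn": {}, "Campaigns": {}}, "Finance": {}})
```

Describing the hierarchy as a nested dictionary keeps the desired structure readable in one place and separate from the API calls that build it.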

Control access

You can also control access to folders using the same groups-based permissions framework applied to projects.

Project folders in Dataiku have read, write, and admin permissions. You can grant these permissions to any groups on the instance.

The permissions of a new subfolder default to the permissions of the parent folder, and can be changed as needed.

While you can define permissions to view or edit a project folder, these will not affect the permissions to view or edit the individual projects that live in that folder.
