Dataiku User Interface¶
Here you’ll find resources to help you get comfortable navigating the Dataiku interface. Learn about core elements of Dataiku, such as projects, the Flow, and the catalog, to become a more productive user.
In many situations, an existing project can serve as a useful template for a new project. Fortunately, it is very easy to duplicate a Dataiku project so you never need to manually replicate a Flow.
Whether you want to copy the project to the same or another instance, detailed instructions for duplicating a project can be found in the reference documentation.
You can initiate this process from the project homepage or the Dataiku projects page.
From the project homepage¶
From the homepage of the project you want to copy, select Actions > Duplicate project. This will produce a dialog window through which you can edit the project name and key and specify a destination where the project should be copied.
You can also specify advanced options such as duplication mode (should all of the data be duplicated or only the Flow?) and any connections that may need to be re-mapped.
From the projects page¶
Alternatively, from the Projects page, right-click the tile of the project you want to copy and choose Duplicate project.
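Duplication can also be scripted instead of clicked through. The following is a minimal sketch, assuming the `dataikuapi` Python client and its `DSSProject.duplicate` method. The helper name `duplicate_project` is hypothetical, and both the accepted `duplication_mode` values and the structure of `remapping` are assumptions to check against the reference documentation for your Dataiku version.

```python
def duplicate_project(client, source_key, target_key, target_name,
                      duplication_mode="MINIMAL", remapping=None):
    """Duplicate the project `source_key` into a new project.

    `client` is expected to behave like a `dataikuapi.DSSClient`.
    `duplication_mode` is assumed to control how much data is copied
    along with the Flow; "MINIMAL" is assumed to be the lightest mode.
    `remapping` is passed through unchanged when provided (assumed to
    describe connections to re-map in the copy).
    """
    project = client.get_project(source_key)

    # Only forward optional arguments that were actually supplied, so the
    # client's own defaults apply otherwise.
    options = {"duplication_mode": duplication_mode}
    if remapping is not None:
        options["remapping"] = remapping

    return project.duplicate(target_key, target_name, **options)
```

With a real instance, `client` would typically be created with `dataikuapi.DSSClient(host, api_key)`, and a call might look like `duplicate_project(client, "SOURCE_KEY", "COPY_KEY", "Copy of my project")`, where both project keys are placeholders.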
Refactoring will be reworked in future versions to make things easier.
While most refactoring operations are possible today, some require multiple steps where a single click would be preferable.
To keep the flow simple and clean:
Refactor from time to time.
“Temporary” copies of a recipe tend to be not so temporary… It’s OK when you are in a hurry, but make sure you have time after the deadline to come back and clean up. If you are not in a hurry, it’s often better to work a bit harder and rework the flow than to copy-paste a recipe.
Talk! When creating a new branch in the flow, explain what it is needed for, and especially when it can be deleted.
Conversely, clean up your own work. Reuse what other people have built in the project to avoid reinventing the wheel.
Know what your colleagues did, and ask whether you may modify their work so that it suits your needs too.
When your flow is composed of multiple short branches, it can be a pain to see them stretch all the way to the right of the screen.
Of course the ideal shape for a flow is a pipe, with clear inputs and outputs, and no branches dangling.
However, not all projects lead to such clean flows, and sometimes you are required to fork your work at multiple steps.
To better visualize your flow in that case, go to the project homepage, open the Settings menu, and uncheck the Anchor Flow graph option under the Config tab.
Dataiku provides the means to hide or show parts of your Flow, as needed. This is especially useful when working with large Flows.
By right-clicking a dataset or a recipe, you can access options that include Hide all upstream and Hide all downstream. These hide all the upstream or downstream objects connected to the selected object.
To show the hidden objects again, simply click the plus (+) sign.
If major parts of your Flow are independent and relatively complex, you can use multiple projects to split your work. For example, you could have a Flow for data collection, then a Flow for machine learning and analysis, and finally a Flow for visualization, each in its own separate project.
In order to use the datasets of a project in another project, go to project settings → config → exposed elements and expose datasets (or models) to another project.
Exposed datasets become a source element in the other project.
This is a way to split a big undertaking into a few projects and allow each user to work in their own project.
Some users also create a “datamart” project, containing mostly datasets meant to be exposed to other projects.
Beware that the dependencies between projects can quickly become a maze. Get the global picture before exposing datasets.
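One way to get that global picture is to write down which project exposes datasets to which, then check what depends on what. The sketch below is plain Python and does not call Dataiku at all. The `(source, target)` pairs and the project keys are hypothetical and would have to be collected by hand or from your own inventory.

```python
from collections import defaultdict, deque


def downstream_projects(exposures, start):
    """Return every project that directly or indirectly consumes `start`.

    `exposures` is an iterable of (source_project, target_project) pairs,
    meaning "source exposes at least one dataset to target".
    If `start` itself appears in the result, the dependencies are circular.
    """
    consumers = defaultdict(set)
    for source, target in exposures:
        consumers[source].add(target)

    reached = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries for projects with no consumers.
        for consumer in consumers.get(current, ()):
            if consumer not in reached:
                reached.add(consumer)
                queue.append(consumer)
    return reached


# Hypothetical layout: a collection project feeds a datamart project,
# which in turn feeds a machine learning project and a visualization project.
exposures = [
    ("COLLECT", "DATAMART"),
    ("DATAMART", "ML"),
    ("DATAMART", "VIZ"),
    ("ML", "VIZ"),
]
impacted = downstream_projects(exposures, "DATAMART")
```

Here `impacted` contains the projects that could be affected by a change in the datamart project, which is worth knowing before exposing yet another dataset from it.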
As the number of projects on a Dataiku instance grows, it is important to maintain an organizational structure easily understood by all contributors. At the same time, any organizational structure should also support the data governance policies of the enterprise.
As detailed in the reference documentation, project folders in Dataiku support both of these objectives.
Project folders allow you to organize projects in a hierarchy of folders of unlimited depth. Just drag and drop projects (or existing folders of projects) into or out of folders. Hold the Shift key while clicking to select multiple projects at once.
Alternatively, you can manage project folders via the Public API. Programmatically create empty folders to generate an organizational structure from a script. See the Python API documentation to learn more.
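As an illustration of generating a folder structure from a script, here is a minimal sketch. It assumes the `dataikuapi` Python client, where a folder handle offers a `create_sub_folder(name)` method and the root folder is obtained with `client.get_root_project_folder()`. The helper name `create_folder_tree` and the folder names are hypothetical.

```python
def create_folder_tree(parent_folder, tree):
    """Recursively create empty sub-folders under `parent_folder`.

    `tree` maps each folder name to a nested dict describing its children
    (an empty dict for a leaf folder). `parent_folder` is expected to
    behave like a dataikuapi project folder handle.
    Returns a dict mapping slash-separated paths to the created handles.
    """
    created = {}
    for name, children in tree.items():
        folder = parent_folder.create_sub_folder(name)
        created[name] = folder
        # Recurse into the new folder and prefix child paths with its name.
        for sub_path, sub_folder in create_folder_tree(folder, children).items():
            created[f"{name}/{sub_path}"] = sub_folder
    return created
```

Against a real instance, a call might look like `create_folder_tree(client.get_root_project_folder(), {"Finance": {"Reporting": {}}, "Sandbox": {}})`. The sketch does not check whether a folder with the same name already exists, so it is meant for building a fresh structure, not for repeated runs.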
You can also control access to folders using the same groups-based permissions framework applied to projects.
Project folders in Dataiku have read, write, and admin permissions. You can grant these permissions to any groups on the instance.
The permissions of a new subfolder default to the permissions of the parent folder, and can be changed as needed.
While you can define permissions to view or edit a project folder, these will not affect the permissions to view or edit the individual projects that live in that folder.