Hands-On: Flow Zones, Tags, & More Flow Views

Flow views provide various options for displaying the Flow with different levels of detail. These views can help with organizing large Flows so that they are easier to navigate. They can also guide optimization of a Flow.

Let’s Get Started!

In this hands-on lesson, you will learn to:

  • Create and manage tags for better data governance.

  • Create and manage Flow Zones to create a higher-level view of the Flow.

  • Use Flow Zones to isolate experimental branches of the Flow.

  • Leverage available view options to highlight details about your Flow.

Prerequisites

This lesson assumes that you have basic knowledge of working with Dataiku DSS datasets and recipes.

Note

If you are not already on the Advanced Designer learning path, we recommend completing the Core Designer Certificate first.

You’ll need access to an instance of Dataiku DSS (version 8.0 or above) with the plugins used in this Flow installed (the tip below mentions US Census and Reverse geocoding).

These plugins are available through the Dataiku Plugin store, and you can find the instructions for installing plugins in the reference documentation. To check whether the plugins are already installed on your instance, go to the Installed tab in the Plugin Store to see a list of all installed plugins.

../../_images/adv-designer-plugins.png

Tip

Users of Dataiku Online should note that while plugin installation is not directly available, you can still explore available plugins from your launchpad:

  • From your instance launchpad, open the Features panel on the left-hand side.

  • Click Add a Feature and choose “US Census” from the Extensions menu. (Reverse geocoding is already available by default.)

  • You can see what plugins are already installed by searching for “installed plugins” in the DSS search bar.

We also recommend that you complete the Flow Views: Zones, Tags, & More lesson beforehand.

Create the Project

Rather than starting from scratch, we’ll use an existing Flow.

  • Click +New Project > DSS Tutorials > Advanced Designer > Flow Views & Actions (Tutorial).

Note

You can also use a successfully completed project from the Plugin Store course.

Change Connections (Optional)

Aside from the input datasets, all datasets in the Flow are empty managed filesystem datasets.

You are welcome to leave the storage connection of these datasets in place, but you can also use another storage system depending on the infrastructure available to you.

To use another connection, such as a SQL database, follow these steps:

  • Select the empty datasets from the Flow. (On a Mac, hold Shift to select multiple datasets.)

  • Click Change connection in the “Other actions” section of the Actions sidebar.

  • Use the dropdown menu to select the new connection.

  • Click Save.

Note

For a dataset that is already built, changing to a new connection clears the dataset, so it needs to be rebuilt.

../../_images/adv-designer-change-connection.png

Note

Another way to select datasets is from the Datasets page (G+D). There are also programmatic ways of doing operations like this that you’ll learn about in the Developer learning path.

The screenshots below demonstrate using a PostgreSQL database.

Build Your Project

If you have chosen to import a new project, you’ll notice that this project only has the skeleton of the Flow. The datasets have not yet been built.

../../_images/tags-views-starting-flow.png

  • To build the Flow, click Flow Actions at the bottom right corner of the Flow.

  • Select Build all.

  • Build with the default “Build required dependencies” option for handling dependencies.
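Conceptually, a dependency-aware build processes the Flow in topological order: a dataset is built only after every dataset its parent recipe reads. The sketch below models this with Python's standard graphlib module on a hypothetical, simplified slice of this Flow. It is not how DSS schedules jobs internally, and the dataset name transactions_joined is an assumption inferred from the compute_transactions_joined recipe name.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified slice of the tutorial Flow: each dataset maps to the
# set of datasets read by the recipe that produces it.
FLOW = {
    "transactions_copy": {"transactions"},
    "cardholder_info_copy": {"cardholder_info"},
    "merchant_info_copy": {"merchant_info"},
    "transactions_joined": {
        "transactions_copy",
        "cardholder_info_copy",
        "merchant_info_copy",
    },
}

# Input datasets already hold data, so they are never built.
INPUTS = {"transactions", "cardholder_info", "merchant_info"}


def build_order(flow, inputs):
    """Return the datasets to build, each one placed after all of its parents."""
    order = TopologicalSorter(flow).static_order()
    return [name for name in order if name not in inputs]


print(build_order(FLOW, INPUTS))
```

Whatever order the three copies come out in, transactions_joined always comes last, because it cannot be built until all of its inputs exist.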

Note

See the article on Dataset Building Strategies and the product documentation on Rebuilding Datasets to learn more about strategies for building datasets.

Tags

Tags are a universal property of all DSS objects. Tags help you to organize your work within projects and make it discoverable across the Dataiku DSS instance. We’ll begin by showing how to:

  • Create and assign tags to DSS objects.

  • Use tag views within the Flow.

  • Create tag categories for better data governance.

Creating and Assigning Tags

One of the easiest ways to create and assign tags is from the Summary of a DSS object. To create your first tag:

  • Go to the project homepage.

  • Click + Add tags to open the tag editor. From here, you can remove tags currently assigned to the project, add tags that already exist on this instance, and create and assign entirely new tags.

  • Since this project deals with a classification problem, add classification as a tag, creating it if necessary.

  • Since this project deals with customer transaction data, add transaction analytics as a tag, creating it if necessary.

  • Click Save.

These tags are useful for filtering search results in the Catalog. They make the project more discoverable for colleagues looking for projects that involve classification problems and/or transaction data.

Note

The list of available existing tags depends upon the DSS instance you’re working on, so what you see may differ from the image below.

../../_images/tags-homepage.png
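The and/or distinction above matters when searching: a colleague may want projects carrying either tag, or only those carrying both. The sketch below models that filtering over a hypothetical list of projects (only the first entry mirrors this tutorial). It illustrates the idea and is not the Catalog's actual search implementation.

```python
# Hypothetical catalog entries; only the first one mirrors this tutorial project.
PROJECTS = [
    {"name": "Flow Views & Actions (Tutorial)",
     "tags": {"classification", "transaction analytics"}},
    {"name": "Churn prediction", "tags": {"classification"}},
    {"name": "Store footfall", "tags": {"forecasting"}},
]


def filter_by_tags(projects, wanted, match="any"):
    """Return names of projects carrying any (OR) or all (AND) of the wanted tags."""
    wanted = set(wanted)
    names = []
    for project in projects:
        common = wanted & project["tags"]
        # "all": every wanted tag is present; "any": at least one is present.
        hit = common == wanted if match == "all" else bool(common)
        if hit:
            names.append(project["name"])
    return names


print(filter_by_tags(PROJECTS, {"classification"}))
print(filter_by_tags(PROJECTS, {"classification", "transaction analytics"}, match="all"))
```

The first call returns both classification projects; the second returns only the tutorial project, since it is the only one carrying both tags.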

You can also apply tags to lists of objects.

  • From the top navigation bar, click Recipes.

  • Select the checkbox for score_transactions_unknown and train_Prediction__RANDOM_FOREST_CLASSIFICATION__on_transactions_known.

  • From the Actions menu, select Tag.

  • Since these recipes deal with a classification problem, type classification as the tag and click Add (or select it if it already exists).

  • Click Save.

../../_images/tags-object-list.png

You can also apply tags directly to objects in the Flow. Our goal is to tag the objects in the ML branch of the Flow according to their stage: Connecting to data, Processing data, and ML/Scoring.

  • Return to the Flow of this project.

  • Select the nine objects leading to the compute_transactions_joined recipe. This includes three input datasets (transactions, cardholder_info, merchant_info), three Sync recipes, and the corresponding copies (transactions_copy, cardholder_info_copy, merchant_info_copy).

Hint

An easy way to select multiple objects in the Flow is to hold the ‘shift’ or ‘command’ key while dragging a box around the objects.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:Connections (or select it, if already existing) and click Add.

  • Click Save.

../../_images/tags-flow.png

Now we’ll repeat the process.

  • Select all the objects from the Join recipe up to and including the transactions_windows dataset.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:Data processing and click Add.

  • Click Save.

  • Select all the objects from the Split recipe up to and including the transactions_unknown_scored dataset.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:ML/Scoring and click Add.

  • Click Save.
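Selecting “the objects leading to” a recipe amounts to walking the Flow graph upstream from it. The sketch below does this with a breadth-first search over a hypothetical edge list and then attaches a stage tag to the result. The Sync recipe names are made up for illustration, and the code only models the selection you make by hand; it does not call any DSS API.

```python
from collections import deque

# Hypothetical edges for the start of the tutorial Flow, mapping each object to
# the objects it feeds. The Sync recipe names are made up for illustration.
EDGES = {
    "transactions": ["sync_transactions"],
    "cardholder_info": ["sync_cardholder_info"],
    "merchant_info": ["sync_merchant_info"],
    "sync_transactions": ["transactions_copy"],
    "sync_cardholder_info": ["cardholder_info_copy"],
    "sync_merchant_info": ["merchant_info_copy"],
    "transactions_copy": ["compute_transactions_joined"],
    "cardholder_info_copy": ["compute_transactions_joined"],
    "merchant_info_copy": ["compute_transactions_joined"],
}


def upstream(edges, target):
    """Return every object that directly or indirectly feeds `target`."""
    parents = {}
    for source, children in edges.items():
        for child in children:
            parents.setdefault(child, []).append(source)
    found, queue = set(), deque([target])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


def add_tag(tags, objects, tag):
    """Attach `tag` to each object in a {object: set_of_tags} mapping."""
    for obj in objects:
        tags.setdefault(obj, set()).add(tag)
    return tags


selection = upstream(EDGES, "compute_transactions_joined")
flow_tags = add_tag({}, selection, "stage:Connections")
print(len(selection))  # 9: three inputs, three Sync recipes, three copies
```

The count of nine matches the selection described in the first tagging step; the other two stages correspond to similar traversals bounded by the Join and Split recipes.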

In the default view of the Flow, the tags are invisible. We can only see them by hovering over an object or from the Details panel of the Actions sidebar. However, we can use the Tags view to enrich our overall understanding of the Flow.

Tags View

There is a special Tags view in the Flow:

  • From the View menu in the lower left corner of the Flow, select Tags.

  • Select all three stage tags.

You can thus get a view of the Flow with the objects colored by tag, rather than by type of object. There is a small dot at the lower right of each icon so that even when a tag is not selected in the view, you can still see that the object is tagged.

../../_images/tags-view.png

We can immediately see that some of the objects in the Flow are not tagged. Now that you’ve seen how to create tags, feel free to tag other Flow objects with an appropriate category (e.g., stage:transactions analytics for the lower branch).

Note that the training recipe and the scoring recipe have two tags applied because they are tagged as part of the machine learning stage and involve classification models.
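The Tags view effectively sorts each object into one of three groups: colored by a selected tag, tagged but with no selected tag (shown only with the small dot), or untagged. A small sketch of that grouping, over hypothetical tag assignments, makes it easy to list the objects still missing a tag. It models what the view shows and is not DSS code.

```python
def tags_view(object_tags, selected):
    """Group Flow objects the way the Tags view presents them.

    object_tags maps each Flow object to its set of tags; selected is the set
    of tags ticked in the View menu.
    """
    view = {"highlighted": [], "dot_only": [], "untagged": []}
    for obj, tags in object_tags.items():
        if tags & selected:
            view["highlighted"].append(obj)  # colored by a selected tag
        elif tags:
            view["dot_only"].append(obj)     # tagged, but no tag selected
        else:
            view["untagged"].append(obj)     # candidates for tagging
    return view


# Hypothetical tags on a few objects from this Flow.
OBJECT_TAGS = {
    "transactions": {"stage:Connections"},
    "transactions_windows": {"stage:Data processing"},
    "score_transactions_unknown": {"stage:ML/Scoring", "classification"},
    "merchant_by_state": set(),
}

print(tags_view(OBJECT_TAGS, {"classification"}))
```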

So far, we have been creating tags on an ad hoc basis. But what happens if another colleague creates their own tags like stage:Inputs, stage:Processing data, stage:Machine Learning? The intent behind these tags is the same as the ones we’ve created, but because the names of the tags are different, it will make discoverability more difficult. Fortunately, we can use tag categories to create common tags across projects on an instance.

Tag Categories

Tag categories are an administrative tool to improve governance and consistency. They are set at a global level and apply across the entire instance.

Note

You must have the Administrator general permission on your DSS instance in order to manage tag categories.

Within the admin settings, you can create tag categories and define the tags within a category.

  • From the application menu, choose Administration.

  • Navigate to Settings > Global tag categories.

  • Click + Add a Category.

  • Name the category stage.

  • Click + Add a Tag and type Connections as the name of the tag.

  • Repeat to create tags Data processing and ML/Scoring.

  • Tag categories can be applied to all Dataiku DSS objects or a subset of them. Choose to apply these categories just to the Flow.

  • Click Save.

../../_images/tags-global-tag-categories.png

Once created and saved, tags from global tag categories act like any other tag. Return to the Flow to see that the new tag categories have superseded our old tags. This works because we created the old tags using the special format <category>:<tag>, so when we created the category stage with the tag Connections, it immediately took the place of stage:Connections.

../../_images/tags-view-global-categ.png
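Because the match relies only on the <category>:<tag> naming convention, it can be modeled as a string split. The sketch below uses a hypothetical helper (not part of DSS) to show why stage:Connections is picked up by the new category, while an ad hoc tag such as stage:Inputs from the earlier example would not match any tag defined in it.

```python
# Global tag categories as configured above (category -> allowed tag names).
GLOBAL_CATEGORIES = {
    "stage": {"Connections", "Data processing", "ML/Scoring"},
}


def resolve(tag, categories):
    """Map a '<category>:<tag>' string onto a global category, if one matches.

    Splits on the first colon only, so a tag name may itself contain a colon.
    Returns (category, name), or None for tags outside any category.
    """
    category, sep, name = tag.partition(":")
    if sep and name in categories.get(category, set()):
        return category, name
    return None


for tag in ["stage:Connections", "stage:ML/Scoring", "stage:Inputs", "classification"]:
    print(tag, "->", resolve(tag, GLOBAL_CATEGORIES))
```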

We have created tags to understand our work within the project. Let’s do some more organization by dividing the Flow into Zones.

Flow Zones

Flow Zones help you to organize large Flows so that they are easier to navigate. We’ll begin by showing how to:

  • Create Flow Zones and move objects into zones to create a higher-level view of the Flow.

  • Manage the contents and properties of existing Flow Zones.

  • Use Flow Zones to isolate experimental branches of the Flow.

Creating Flow Zones

To create your first zone:

  • From the top right corner of the Flow, click + Zone.

  • Type Fraud detection as the name of the zone.

  • Click Confirm.

../../_images/fz-create-a-zone.png

This creates an empty zone named Fraud detection and reveals the Default zone, which contains the rest of the Flow.

../../_images/fz-first-zone.png

To move objects into the Fraud detection flow zone:

  • Select the branch of the Flow starting with the three datasets transactions, cardholder_info, and merchant_info and ending with the dataset transactions_unknown_scored. (Hint: hold down ‘shift’ or ‘command’ while dragging a box to select several Flow items at once, and then click again on unwanted objects while holding shift.)

  • Right-click to open the context menu and select Move to a flow zone.

  • Confirm in the modal dialog that Fraud detection is selected as the zone to move the objects to, and then click Move.

../../_images/fz-move-into-zone.png

You can create further zones in a similar fashion, or directly from selected objects in the Flow. For example:

  • Select the two datasets about income-per-tract information: income_per_tract_usa and income_per_tract_usa_copy.

  • Also select the three datasets about merchant information: merchant_by_state, merchant_census_tracts, and merchant_with_tract_income.

  • From the right panel, select Move to a Flow Zone.

  • Within the modal dialog, click New Zone and type Merchant analysis as the name.

  • The dialog warns that moving these datasets also moves several recipes into the new zone: their parent recipes. A recipe and its outputs always live in the same zone.

  • Click Confirm.

../../_images/fz-move-into-zone2.png
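The rule that a recipe and its outputs always share a zone is why the move pulls in extra objects. The sketch below expands a selection under that rule. The recipe names and the producer map are hypothetical, and datasets missing from the map are treated as having no parent recipe in this simplified model.

```python
# Hypothetical producer map for the Merchant analysis branch: each dataset maps
# to the recipe that builds it. Recipe names are made up for illustration.
# Datasets absent from the map are treated as having no parent recipe.
PRODUCER = {
    "income_per_tract_usa_copy": "sync_income_per_tract_usa",
    "merchant_census_tracts": "geocode_merchants",
    "merchant_with_tract_income": "join_merchant_income",
}


def objects_to_move(selection, producer):
    """Expand a selection so every recipe ends up in the same zone as its outputs."""
    outputs = {}
    for dataset, recipe in producer.items():
        outputs.setdefault(recipe, set()).add(dataset)
    moved = set(selection)
    for obj in selection:
        recipe = producer.get(obj)
        if recipe:
            # A dataset pulls in its parent recipe and that recipe's other outputs.
            moved.add(recipe)
            moved |= outputs[recipe]
        if obj in outputs:
            # A selected recipe pulls in all of its outputs.
            moved |= outputs[obj]
    return moved


selection = {
    "income_per_tract_usa", "income_per_tract_usa_copy",
    "merchant_by_state", "merchant_census_tracts", "merchant_with_tract_income",
}
print(sorted(objects_to_move(selection, PRODUCER)))
```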

We can immediately see the labeling benefits of Flow Zones. While we might infer the purpose of datasets such as merchant_by_state and income_per_tract_usa from their names, the zone names (which label the parts of the Flow that create these datasets) tell us their purpose at a glance.

Finally, we can rename the Default zone to something more descriptive.

  • Right-click on the Default zone and select Edit from the context menu.

  • Type Transactions analysis as the new name.

../../_images/fz-move-and-rename-zone.png

Using Zones to break up the Flow allows you to see things at a higher level of abstraction. This can help to quickly onboard new team members to projects, as they will be able to grasp the overall purpose of the Flow before getting into the details.
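One way to picture this higher level of abstraction is to collapse object-level links into links between zones: links inside a zone disappear, and links that cross zones remain. The sketch below does exactly that. The zone names come from this lesson, but the objects and edges (including report_recipe) are hypothetical and do not claim to reproduce this tutorial's exact Flow.

```python
def zone_graph(edges, zone_of):
    """Collapse object-level Flow edges into edges between zones.

    edges: iterable of (upstream_object, downstream_object) pairs.
    zone_of: maps every object to the zone it lives in.
    Edges inside a single zone are dropped; cross-zone edges are kept once.
    """
    links = set()
    for src, dst in edges:
        a, b = zone_of[src], zone_of[dst]
        if a != b:
            links.add((a, b))
    return sorted(links)


# Hypothetical objects and edges, using the zone names from this lesson.
ZONE_OF = {
    "transactions_copy": "Fraud detection",
    "compute_transactions_joined": "Fraud detection",
    "merchant_with_tract_income": "Merchant analysis",
    "report_recipe": "Transactions analysis",
}
EDGES = [
    ("transactions_copy", "compute_transactions_joined"),
    ("merchant_with_tract_income", "report_recipe"),
    ("transactions_copy", "report_recipe"),
]

print(zone_graph(EDGES, ZONE_OF))
```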

Flow Zone Views

There is also a special Flow Zones view:

  • From the View menu in the lower left corner of the Flow, select Flow Zones.

  • Click Hide Zones.

You can thus get a view of the entire Flow with the zone boxes hidden, but with the Flow objects colored according to their assigned zone.

../../_images/fz-hide-zones-view.png

Hiding Flow Details and Zooming Into Flows

You can also hide the details of zones to avoid distracting from the high-level view of the Flow.

  • Close the Flow Zone view to return to the default view.

  • Right-click on any of the zones, then select Collapse all.

You can later expand zones individually, or all at once, when you want to see their details. This feature is particularly useful in large Flows with many zones.

../../_images/fz-flow-zone-collapsed.png

You can also open a single zone to see the details of that part of the Flow.

  • In the title bar of the Fraud detection zone, click the Open icon next to the expand/collapse icon.

While zoomed in to the Zone, it’s easier to work with the objects in this part of the Flow.

../../_images/fz-flow-zone-zoomed-in.png

Note

Since they are DSS objects, you can give Flow Zones descriptions and tags, or hold discussions on them. You can access these functions in the right panel, as you can with other Flow objects.

Isolating Experimental Work

Lastly, you can use Flow Zones to mark off “experimental” work within a Flow. Rather than moving the objects you want to experiment with, you can share them to a new zone.

  • Close the Fraud detection zone to return to the main view of the Flow, where all zones are visible.

  • Expand the Merchant analysis zone and right-click on the last output dataset merchant_with_tract_income.

  • From the context menu, select Share to a flow zone.

  • In the modal dialog, click New Zone and type Experimental as the name.

  • Click Confirm.

../../_images/fz-flow-zone-experimental.png

Now this dataset is accessible in a new Flow Zone. We have a fresh space to begin experimental work without disturbing the original Flow.
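The difference between sharing and moving comes down to ownership: a shared dataset stays in its original zone (together with its parent recipe, since a recipe and its outputs share a zone) and simply also appears in the new one. A minimal, hypothetical model of that distinction follows; the class is an illustration, not a DSS object.

```python
from dataclasses import dataclass, field


@dataclass
class FlowObject:
    """Minimal model: one owning zone, plus any zones the object is shared to."""
    name: str
    zone: str
    shared_to: set = field(default_factory=set)

    def move_to(self, zone):
        # Moving changes ownership; the object is no longer just shared there.
        self.zone = zone
        self.shared_to.discard(zone)

    def share_to(self, zone):
        # Sharing leaves ownership unchanged and adds another place it appears.
        if zone != self.zone:
            self.shared_to.add(zone)

    def visible_in(self):
        return {self.zone} | self.shared_to


ds = FlowObject("merchant_with_tract_income", "Merchant analysis")
ds.share_to("Experimental")
print(ds.zone, sorted(ds.visible_in()))
```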

Other Flow Views

We have seen Tags and Flow Zones, but Dataiku DSS offers many other informative views, such as connections, recipe engines, and code environments. Let’s have a look at the connections used here.

  • From the “View” menu in the lower left corner of the Flow, select Connections.

../../_images/views-connections.png

Since this Flow uses only a single connection (in this case, a PostgreSQL database, but in yours possibly the managed filesystem), this view provides little information. In production, however, we can imagine a situation where different datasets are stored in different connections. In such a situation, the Connections view provides an overview of where the datasets are stored.
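In data terms, the Connections view answers the question “which datasets live on which connection?”, which is an inversion of the dataset-to-connection mapping. The sketch below performs that inversion over hypothetical connection names for a Flow that mixes storage systems.

```python
from collections import defaultdict

# Hypothetical dataset -> connection assignments for a Flow that mixes storage.
DATASET_CONNECTIONS = {
    "transactions": "uploads",
    "transactions_copy": "warehouse_pg",
    "merchant_info_copy": "warehouse_pg",
    "income_per_tract_usa": "archive_s3",
}


def by_connection(dataset_connections):
    """Invert dataset -> connection into connection -> sorted dataset names."""
    groups = defaultdict(list)
    for dataset, connection in dataset_connections.items():
        groups[connection].append(dataset)
    return {conn: sorted(names) for conn, names in sorted(groups.items())}


print(by_connection(DATASET_CONNECTIONS))
```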

Another informative view is the Recipe engines view. If you changed the connection to a SQL database, you can expect the recipes to use the SQL engine. Let’s check whether that is the case.

  • First expand all Flow Zones by right-clicking the header of any zone, and selecting Expand all.

  • Then, from the View menu, click Recipe engines, and select only the checkbox for the “Sql” engine.

../../_images/views-recipe-engines.png

We can see that all recipes with SQL datasets as input and output use the SQL engine, except for the Prepare recipes. This is likely because some of the processors in these Prepare recipes are not SQL-compatible (see Details on the in-database (SQL) engine to learn more).
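The observation can be summarized as a rule of thumb: a recipe can run in-database only when its data stays on one SQL connection and every step can be translated to SQL. The sketch below encodes that simplified rule. The connection and processor names are hypothetical, and the engine selection in DSS weighs more factors than this, so treat it as a mental model only.

```python
# Hypothetical names; the real engine selection in DSS considers more factors.
SQL_CONNECTIONS = {"warehouse_pg"}
SQL_COMPATIBLE = {"filter_rows", "rename_columns"}


def pick_engine(connections, processors=()):
    """Return "SQL" only when every input and output sits on one SQL connection
    and every (assumed) processor id is SQL-translatable; otherwise "DSS"."""
    conns = set(connections)
    on_one_sql = len(conns) == 1 and conns <= SQL_CONNECTIONS
    if on_one_sql and set(processors) <= SQL_COMPATIBLE:
        return "SQL"
    return "DSS"


print(pick_engine(["warehouse_pg", "warehouse_pg"]))
print(pick_engine(["warehouse_pg"], ["filter_rows", "extract_with_regex"]))
```

Under this rule, a Join between two tables on the same database stays in-database, while a Prepare recipe containing a single non-translatable step falls back to the DSS engine.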

Tip

On your own, try out some of the other available views to see what value they can bring to managing complicated Flows.

Learn More

Great job! Now you have some hands-on experience working with Tags, Flow Zones, and some of the other available Views.

If you have not already done so, register for the Academy course on Flow Views & Actions to validate your knowledge of this material.

You can also explore additional articles in the Flow Views & Actions page of the Knowledge Base to learn more about this topic.