Tags

Tags help you to organize your work within projects and make it discoverable across the Dataiku DSS instance. This tutorial will show you how to:

  • Create and assign tags to DSS objects.

  • Use tag views within the Flow.

  • Create tag categories for better data governance.

Create the Project

Let’s use the Haiku Starter project as it was at the end of the Flow Zone lesson.

Note

If you haven’t done the Flow Zone lesson, you can start with a fresh version of the project by clicking + New Project > DSS Tutorials > General Topics > Haiku Starter. As you work through this lesson, your Flow will appear a little different from the images, but otherwise, your experience with the lesson should be the same.

../../_images/flow-zone-experimental2.png

To have a clean starting point for this tutorial:

  • Delete the Experimental Flow Zone.

  • Expand all Flow Zones.

Creating and Assigning Tags

Tags are a property of every DSS object. One of the easiest ways to create and assign them is from the Summary of a DSS object. To create your first tag:

  • Go to the project homepage for Haiku Starter.

  • Click + Add tags to open the tag editor. From here, you can remove tags currently assigned to the project, add tags that already exist on this instance, and create and assign entirely new tags.

  • Since this project deals with a classification problem, add classification as a tag, creating it if necessary.

  • Since this project deals with customer data, add customer analytics as a tag, creating it if necessary.

  • Click Save.

These tags can be used to filter search results in the Catalog, which makes the project easier to find for colleagues looking for projects that involve classification problems or customer data.

Note

The list of available existing tags depends upon the DSS instance you’re working on, so what you see may differ from the image below.

../../_images/tags-project1.png

You can also apply tags to lists of objects.

  • From the top navigation bar, click Visual Analyses.

  • Select the checkbox for High revenue analysis.

  • From the Actions menu, select Tag.

  • Since this is a classification problem, select classification as the tag.

  • Click Save.

../../_images/tags-object-list2.png

You can also apply tags directly to objects in the Flow. Let’s set a goal to tag objects within the Flow to indicate the stage of work each one belongs to: connecting to data, processing data, and ML/scoring.

  • Return to the Flow of this project.

  • Select the Orders and Customers folders and datasets.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:Connections and click Add.

  • Click Save.

../../_images/tags-flow2.png

Now we’ll repeat the process.

  • Select all the objects from the Join recipe to the Orders_enriched_prepared dataset, inclusive.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:Data processing and click Add.

  • Click Save.

  • Select all the objects from the Group recipe to the High revenue prediction deployed model, inclusive.

  • In the Actions right panel, click Tag.

  • In the modal dialog, type stage:ML/Scoring and click Add.

  • Click Save.
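The three tagging passes above amount to a mapping from Flow objects to stage tags. The following is a minimal sketch of that idea in plain Python, not Dataiku's API. The helper assign_tags is hypothetical, the metadata dictionaries are simplified stand-ins that assume each object carries a "tags" list, and only a few of the tutorial's objects are listed.

```python
def assign_tags(metadata, new_tags):
    """Return a copy of `metadata` with `new_tags` appended, skipping duplicates."""
    tags = list(metadata.get("tags", []))
    for tag in new_tags:
        if tag not in tags:
            tags.append(tag)
    updated = dict(metadata)
    updated["tags"] = tags
    return updated


# Hypothetical, simplified stand-in for the metadata of a few Flow objects.
flow_metadata = {
    "Orders": {"tags": []},
    "Customers": {"tags": []},
    "Orders_enriched_prepared": {"tags": []},
}

# Which stage tag each object should receive (illustrative subset of the Flow).
stage_by_object = {
    "Orders": "stage:Connections",
    "Customers": "stage:Connections",
    "Orders_enriched_prepared": "stage:Data processing",
}

for name, stage in stage_by_object.items():
    flow_metadata[name] = assign_tags(flow_metadata[name], [stage])

# Applying the same tag a second time leaves the tag list unchanged.
flow_metadata["Orders"] = assign_tags(flow_metadata["Orders"], ["stage:Connections"])
```

Because assign_tags skips tags that are already present, the same mapping can be applied repeatedly without producing duplicates.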

In the default view of the Flow, tags are not visible. Next, we’ll use tag views to see how they can enrich our understanding of the Flow.

Tags View

The Flow has a dedicated Tags view. To open it:

  • From the View menu in the lower left of the Flow, select Tags.

  • Select all three tags.

The Flow now shows objects colored by tag rather than by object type. A small dot at the lower right of each icon indicates that an object is tagged, even when its tag is not selected in the view.

../../_images/tags-view2.png

We can immediately see that some of the objects in the Flow are not tagged. We can tag them from the Tags view.

  • Deselect the stage:ML/Scoring and stage:Connections tags.

  • Select the following objects:

    • The Orders_by_Country_Category dataset and its parent Group recipe.

    • The Orders_by_Customer dataset and its parent Group recipe.

    • The Filter recipe and Orders_filtered dataset.

  • Right-click, and from the context menu, select Add tag stage: Data processing to selected items.

Now every object is tagged. Note in particular that the Orders_by_Customer dataset and its parent Group recipe have two tags applied because they involve data processing as part of a lead-up to a machine learning model.

../../_images/tags-view-021.png

So far, we have been creating tags on an ad hoc basis. But what happens if a colleague creates their own tags, like stage:Inputs, stage:Processing data, and stage:Machine Learning? The intent behind these tags is the same as the ones we’ve created, but because the names are different, objects become harder to discover. Fortunately, we can use tag categories to create common tags across the instance.
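To make the discoverability problem concrete, here is a small sketch in plain Python. It assumes that a tag lookup matches on the exact tag name. The helper objects_with_tag and the assignment of the colleague's tag to a particular dataset are illustrative only and are not part of DSS.

```python
def objects_with_tag(tags_by_object, tag):
    """Return the sorted names of objects that carry exactly `tag`."""
    return sorted(name for name, tags in tags_by_object.items() if tag in tags)


# Two datasets at the same stage of work, tagged by different people.
tags_by_object = {
    "Orders_filtered": ["stage:Data processing"],              # our tag
    "Orders_by_Country_Category": ["stage:Processing data"],   # a colleague's tag
}

# Searching for our tag misses the colleague's dataset, even though
# both tags were meant to express the same stage.
found = objects_with_tag(tags_by_object, "stage:Data processing")
```

Here found contains only Orders_filtered, which is the gap that a shared tag vocabulary is meant to close.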

Tag Categories

Tag categories are an administrative tool to improve governance and consistency. They are set at a global level and apply across the entire instance.

Note

You must have the Administrator general permission on your DSS instance in order to manage tag categories.

Within the admin settings, you can create tag categories and define the tags within the category.

  • From the application menu, choose Administration.

  • Navigate to Settings > Global tag categories.

  • Click + Add a Category.

  • Name the category stage.

  • Click + Add a Tag and type Connections as the name of the tag.

  • Repeat to create tags Data processing and ML/Scoring.

  • Tag categories can be applied to all Dataiku DSS objects or to a subset of them. Choose to apply this category to the Flow only.

  • Click Save.

../../_images/global-tag-categories1.png

Once created and saved, tags from global tag categories act like any other tag. Return to the Flow to see that the tags from the new category have superseded our old tags. This is because of the format we used when creating the tags: <category> + “:” + <tag>. When we created the category stage with the tag Connections, it immediately took the place of stage:Connections.

../../_images/tags-view-031.png
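The <category> + “:” + <tag> naming convention can be illustrated with a short plain-Python sketch. This is not how DSS implements the matching, which the tutorial does not describe. The helpers split_tag and matches_global_category are hypothetical, and splitting on the first colon is an assumption.

```python
def split_tag(name):
    """Split 'category:tag' on the first colon.

    Returns (None, name) for tags that contain no colon.
    """
    category, sep, tag = name.partition(":")
    if not sep:
        return None, name
    return category, tag


def matches_global_category(name, categories):
    """True if `name` has the form category:tag for a defined category and tag."""
    category, tag = split_tag(name)
    return category in categories and tag in categories[category]


# The category and tags defined in this tutorial.
global_categories = {"stage": {"Connections", "Data processing", "ML/Scoring"}}

# Ad hoc tags from earlier in the tutorial, plus a colleague's variant.
ad_hoc_tags = ["stage:Connections", "stage:Inputs", "classification"]

superseded = [t for t in ad_hoc_tags if matches_global_category(t, global_categories)]
```

Under these assumptions, only stage:Connections lines up with the global category. stage:Inputs names a tag the category does not define, and classification has no category prefix at all.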