Concept | Surfacing Dataiku items in the Govern node#

Readiness for AI governance requires first raising awareness of an organization’s existing assets.

The Govern node enables organizations to track in one place all their projects, models, bundles, and GenAI items across Dataiku nodes. It does this by fetching the metadata of Dataiku items found in connected nodes and maintaining a synchronized view.

Governable items#

The Govern node’s Governable items (Governable items icon.) page surfaces metadata of Dataiku items that aren’t yet governed. You can think of this page as an inbox of Dataiku items eligible for governance.

On this page, you can:

  1. View governable Dataiku items by type (project, model, bundle, etc.).

  2. Hide items you don’t want to govern using the eye (View impact icon.) icon at the end of an item’s row.

  3. Design and save filters (Add filter icon.) to sift through ungoverned items.

  4. Review item metadata in the Source objects (Source objects icon.) tab of the right panel.

  5. For some types of items, review additional tabs in the right panel for Deployments (Deployments icon.) and/or Model metrics (Model metrics icon.).

  6. Most importantly, add a governance layer to a Dataiku item. In other words, govern (Gavel icon.) the item.

Dataiku screenshot of the Governable items page.
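
To make the inbox analogy concrete, here is a minimal sketch in plain Python. It isn’t the Govern node’s API. The `SyncedItem` class, its fields, and the `governable_inbox` function are hypothetical names invented for illustration. The sketch also assumes that hidden items drop out of the default view. It only models the idea that the page lists synced items that aren’t yet governed, optionally narrowed by type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyncedItem:
    """Hypothetical stand-in for the metadata synced for one Dataiku item."""
    name: str
    item_type: str          # e.g. "project", "model", "bundle"
    governed: bool = False  # True once a governance layer has been added
    hidden: bool = False    # True if a user chose to hide the item


def governable_inbox(items: List[SyncedItem],
                     item_type: Optional[str] = None) -> List[SyncedItem]:
    """Return items that are neither governed nor hidden, optionally by type."""
    return [
        item for item in items
        if not item.governed
        and not item.hidden
        and (item_type is None or item.item_type == item_type)
    ]


items = [
    SyncedItem("Churn project", "project", governed=True),
    SyncedItem("Churn model", "model"),
    SyncedItem("Sandbox project", "project", hidden=True),
    SyncedItem("Release v1", "bundle"),
]

inbox_names = [item.name for item in governable_inbox(items)]
model_names = [item.name for item in governable_inbox(items, item_type="model")]
print(inbox_names)
print(model_names)
```

In this toy model, governing an item (setting `governed=True`) removes it from the inbox, which mirrors how the page only surfaces items that aren’t yet governed.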

Types of governable items#

The type of metadata collected in the Govern node depends on the type of Dataiku item. The table below reports various examples of the metadata fetched for each type of Dataiku item.

| Dataiku item | Description | Metadata examples |
| --- | --- | --- |
| Projects | A Dataiku project is a container for all work on a specific activity. It organizes objects like datasets, processing logic, notebooks, analyses, models, agents, dashboards, etc. around one activity. | Name, key, creator, creation date, etc. |
| Bundles | A bundle is a versioned snapshot of a project’s configuration intended for deployment to a production environment. It contains the necessary components to “replay” the project in a production setting. | Project standards report, release notes, etc. |
| ML saved models | An ML saved model is the overall model entity represented by a diamond in the Flow. You might think of it as a placeholder for the model’s “verb” (predicting churn, clustering segments, forecasting revenue, etc.). | Associated node, last modification date, etc. |
| ML saved model versions | An ML saved model version is the model package (or model artifact) for an ML task. You might think of it as the actual algorithm. | Model metrics, deployments, etc. |
| LLM & Agent items | Learn more in the reference documentation on Generative AI and LLM Mesh and Agentic AI. | Last modifier, related objects, etc. |

Tip

The Govern node only syncs the metadata of Dataiku items. It doesn’t store the actual items. The actual models, for example, never enter the Govern node. Within the Governable items page, you can review the available metadata for each type of item in the right Details panel.

Dataiku item hierarchy#

Dataiku items follow a specific information hierarchy. They conform to certain parent-child relationships, which, as you’ll see, have important consequences for governance. Use the diagram and table below to understand this hierarchy.

Slide of the Dataiku item hierarchy.

| Dataiku item | Parent-child relationships | Diagram interpretation |
| --- | --- | --- |
| Projects | A project is at the top of the Dataiku information hierarchy. It has no parent item. Possible child items include bundles, models, LLMs, and agents. | Dataiku Project A is the parent of Bundle B, Model C, LLM D, and Agent E. |
| Bundles | A Dataiku project can have any number of child bundles, but a bundle belongs to exactly one parent project. | Bundle B is a child item of its parent, Dataiku Project A. |
| Saved models | A Dataiku project can have any number of saved models, but a saved model belongs to exactly one parent project. | Model C is a child item of its parent, Dataiku Project A. |
| Saved model versions | A saved model can have any number of child model versions, but a model version belongs to exactly one parent model. | Model versions F and G are child items of their parent, Model C. |
| Fine-tuned LLMs | A Dataiku project can have any number of fine-tuned LLMs, but a fine-tuned LLM belongs to exactly one parent project. | LLM D is a child item of its parent, Dataiku Project A. |
| Fine-tuned LLM versions | A fine-tuned LLM can have any number of child LLM versions, but an LLM version belongs to exactly one parent LLM. | LLM versions H and J are child items of their parent, LLM D. |
| Agents | A Dataiku project can have any number of agents, but an agent belongs to exactly one parent project. | Agent E is a child item of its parent, Dataiku Project A. |
| Agent versions | An agent can have any number of child agent versions, but an agent version belongs to exactly one parent agent. | Agent versions K and L are child items of their parent, Agent E. |
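
The parent-child rules above can be sketched as a small tree in plain Python. This is an illustration only, not how the Govern node stores items. The `Item` class, the `ALLOWED_PARENT` mapping, and the type names are hypothetical. The sketch enforces two rules from the hierarchy: each item has exactly one parent, and that parent must be of the expected type.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Expected parent type for each item type (hypothetical type names).
# None means the item sits at the top of the hierarchy.
ALLOWED_PARENT = {
    "project": None,
    "bundle": "project",
    "saved_model": "project",
    "saved_model_version": "saved_model",
    "fine_tuned_llm": "project",
    "fine_tuned_llm_version": "fine_tuned_llm",
    "agent": "project",
    "agent_version": "agent",
}


@dataclass
class Item:
    name: str
    item_type: str
    # Excluded from repr/compare to avoid infinite recursion through children.
    parent: Optional["Item"] = field(default=None, repr=False, compare=False)
    children: List["Item"] = field(default_factory=list)

    def add_child(self, name: str, item_type: str) -> "Item":
        """Create a child item, enforcing the parent type from ALLOWED_PARENT."""
        if ALLOWED_PARENT[item_type] != self.item_type:
            raise ValueError(f"{item_type} can't be a child of {self.item_type}")
        child = Item(name, item_type, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> List[str]:
        """Names from the direct parent up to the top-level project."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


project_a = Item("Dataiku Project A", "project")
bundle_b = project_a.add_child("Bundle B", "bundle")
model_c = project_a.add_child("Model C", "saved_model")
version_f = model_c.add_child("Model version F", "saved_model_version")

print(version_f.ancestors())
```

Walking up from a model version always ends at a single project, which is why the hierarchy matters once governance decisions are attached to parent items.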

Non-governable items#

Looking at the hierarchy of governable items synced from connected Dataiku nodes, you may be wondering about other Dataiku items not present, such as datasets.

The Govern node does sync basic metadata from Dataiku datasets. Within the Governable items (Governable items icon.) page, you’ll find a list of datasets in a project’s Source objects (Source objects icon.) tab of the right Details panel.

Although the Govern node syncs the metadata of Dataiku datasets, datasets themselves aren’t governable items. Monitoring at the dataset level occurs on the Dataiku side of the platform, where project builders may use a combination of features.

Tip

As with other synced items, the Govern node doesn’t store actual Dataiku datasets. It only syncs their metadata.

Item registries#

At times, you may want to view all your organization’s bundles, models, or GenAI items, whether they’re governed or not. Three registries serve this need.

Bundle registry#

The Govern node’s Bundle registry (Bundle icon.) provides a complete list of all bundles from connected Dataiku nodes, regardless of governance status.

  1. Nested within each parent project, you’ll find all of its bundles, along with whether each one is deployed.

  2. For a deployed bundle, the Deployments tab of the right Details panel reports information such as the deployment infrastructure and Govern policy.

  3. You can filter the page to include only deployed bundles.

Dataiku screenshot of the Details panel of a bundle in the Bundle registry page.

Tip

You’ll encounter the role of Govern policies in Tutorial | Governance lifecycle.

Model registry#

The Govern node’s Model registry (Model registry icon.) provides a complete list of models from your connected Dataiku nodes, organized by project and regardless of governance status.

  1. Nested within each parent project and saved model, you’ll find all saved model versions, including results for the focus metric of your choice.

  2. Saved model versions have additional views in the right details panel, including for Model Metrics (Model metrics icon.) and Deployments (Deployments icon.).

  3. You can filter the page to include only deployed model versions.

  4. By default, ROC AUC is the Metric to Focus included in the row of each model version. You can switch to other metrics, such as data drift, precision, accuracy, etc.

Dataiku screenshot of the Details panel of a model version in the Model registry page.

Note

Most model metrics show the initial metric values drawn from the Design node or Automation node when building the model version. However, drift metrics come from the model evaluations stored in a model evaluation store (MES).

The MES must exist in the same project as the saved model of the model version being evaluated. You can configure the MES to opt out of the Govern sync if needed. Otherwise, metrics update anytime an evaluation runs.
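
The rules in this note can be summarized as a small decision function. The sketch below is plain Python with hypothetical names (`EvaluationStore`, `displayed_metric`, and the `data_drift` metric key). It isn’t a Dataiku API, and it makes no claim about how the Govern node implements the sync. It only encodes the stated conditions: non-drift metrics use the initial build values, while drift metrics require an MES in the same project that hasn’t opted out of the Govern sync.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: an illustrative key for drift metrics, not an official name.
DRIFT_METRICS = {"data_drift"}


@dataclass
class EvaluationStore:
    """Hypothetical stand-in for a model evaluation store (MES)."""
    project_key: str
    synced_to_govern: bool = True       # False models an opt-out of the sync
    latest_drift: Optional[float] = None


def displayed_metric(metric: str,
                     model_project_key: str,
                     initial_metrics: Dict[str, float],
                     store: Optional[EvaluationStore] = None) -> Optional[float]:
    """Return the value that would be shown for a metric, or None if unavailable."""
    if metric not in DRIFT_METRICS:
        # Most metrics come from the values recorded when the version was built.
        return initial_metrics.get(metric)
    if store is None or not store.synced_to_govern:
        return None
    if store.project_key != model_project_key:
        # The MES must live in the same project as the saved model.
        return None
    return store.latest_drift


initial = {"roc_auc": 0.91}
mes = EvaluationStore(project_key="CHURN", latest_drift=0.12)

roc_value = displayed_metric("roc_auc", "CHURN", initial, mes)
drift_value = displayed_metric("data_drift", "CHURN", initial, mes)
other_project = displayed_metric("data_drift", "OTHER", initial, mes)
print(roc_value, drift_value, other_project)
```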

GenAI registry#

Important

To take full advantage of governance of GenAI items, you’ll need an advanced license.

Governance is a key obstacle to enterprise-wide deployment of Generative AI applications for a variety of reasons, including their associated costs. Accordingly, governance of GenAI use cases is a motivating factor behind Dataiku’s LLM Mesh.

The Govern node plays an important complementary role in this mission:

  1. All pages display a pink LLM badge on Dataiku items that use LLMs in recipes or Answers webapps.

  2. All pages include the ability to filter for items using LLMs.

  3. Most specifically, the Govern node includes a GenAI registry (AI icon with stars.) to manage the governance of GenAI items such as fine-tuned LLMs, agents, and augmented LLMs. It functions just like the Model and Bundle registry pages.

Dataiku screenshot of the GenAI registry.

Next steps#

Now that you know about centralization in Dataiku Govern, learn how to launch item governance in Concept | Adding a governance layer to Dataiku items.