Concept | Surfacing Dataiku metadata in the Govern node#
Readiness for AI governance requires first raising awareness of an organization’s existing assets.
The Govern node enables organizations to track in one place all their projects, models, bundles, and GenAI items. It does this by fetching the metadata of Dataiku items found in connected nodes and maintaining a synchronized view.
Governable items#
The Govern node’s Governable items page surfaces metadata of Dataiku items that aren’t yet governed. You can think of this page as an inbox of Dataiku items eligible for governance.
On this page, you can:
View governable Dataiku items by type (project, model, bundle, etc.).
Design and save filters to sift through ungoverned items.
Review item metadata in the Source objects tab of the right panel.
For some types of items, the right panel also includes tabs for Model metrics and/or Deployments.
Most importantly, add a governance layer to a Dataiku item. In other words, govern the item.
Types of governable items#
The type of metadata displayed in the Govern node depends on the type of Dataiku item. The table below reports various examples of the metadata fetched for each type of Dataiku item.
| Dataiku item | Description | Metadata examples |
|---|---|---|
| Projects | A Dataiku project is a container for all work on a specific activity. It organizes objects like datasets, processing logic, notebooks, analyses, models, agents, dashboards, etc. around one activity. | Name, key, creator, creation date, etc. |
| Bundles | A bundle is a versioned snapshot of a project’s configuration intended for deployment to a production environment. It contains the necessary components to “replay” the project in a production setting. | Project standards report, release notes, etc. |
| ML saved models | An ML saved model is the overall model entity represented by a diamond in the Flow. You might think of it as a placeholder for the model’s “verb” (predicting churn, clustering segments, forecasting revenue, etc.). | Associated node, last modification date, etc. |
| ML saved model versions | An ML saved model version is the model package (or model artifact) for an ML task. You might think of it as the actual algorithm. | Model metrics, deployments, etc. |
| LLM & Agent items | Learn more in the reference documentation on Generative AI and LLM Mesh and Agentic AI. | Last modifier, related objects, etc. |
Tip
The Govern node only syncs the metadata of Dataiku items. It doesn’t store the actual items. The actual models, for example, never enter the Govern node. Within the Governable items page, you can review the available metadata for each type of item in the right Details panel.
Dataiku item hierarchy#
Dataiku items follow a specific information hierarchy. They conform to certain parent-child relationships, which, as you’ll see, have important consequences for governance. Use the diagram and table below to understand this hierarchy.
| Dataiku item | Parent-child relationships | Diagram interpretation |
|---|---|---|
| Projects | A project is at the top of the Dataiku information hierarchy. It has no parent item. Possible child items include bundles, models, LLMs, and agents. | Dataiku Project A is the parent of Bundle B, Model C, LLM D, and Agent E. |
| Bundles | A Dataiku project can have any number of child bundles, but a bundle belongs to exactly one parent project. | Bundle B is a child item of its parent, Dataiku Project A. |
| Saved models | A Dataiku project can have any number of saved models, but a saved model belongs to exactly one parent project. | Model C is a child item of its parent, Dataiku Project A. |
| Saved model versions | A saved model can have any number of child model versions, but a model version belongs to exactly one parent model. | Model versions F and G are child items of their parent, Model C. |
| Fine-tuned LLMs | A Dataiku project can have any number of fine-tuned LLMs, but a fine-tuned LLM belongs to exactly one parent project. | LLM D is a child item of its parent, Dataiku Project A. |
| Fine-tuned LLM versions | A fine-tuned LLM can have any number of child LLM versions, but an LLM version belongs to exactly one parent LLM. | LLM versions H and J are child items of their parent, LLM D. |
| Agents | A Dataiku project can have any number of agents, but an agent belongs to exactly one parent project. | Agent E is a child item of its parent, Dataiku Project A. |
| Agent versions | An agent can have any number of child agent versions, but an agent version belongs to exactly one parent agent. | Agent versions K and L are child items of their parent, Agent E. |
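To make the parent-child rules concrete, here is a minimal Python sketch of the hierarchy described in the table. It's an illustration only, not part of any Dataiku or Govern API: the `Item` class, the `PARENT_TYPE` mapping, and the type labels such as `saved_model_version` are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical type labels. Each item type maps to the single parent type
# it may belong to; None means the item sits at the top of the hierarchy.
PARENT_TYPE = {
    "project": None,
    "bundle": "project",
    "saved_model": "project",
    "saved_model_version": "saved_model",
    "fine_tuned_llm": "project",
    "fine_tuned_llm_version": "fine_tuned_llm",
    "agent": "project",
    "agent_version": "agent",
}


@dataclass(eq=False)
class Item:
    """Illustrative stand-in for a synced Dataiku item (not a real API class)."""

    name: str
    item_type: str
    parent: Item | None = field(default=None, repr=False)
    children: list[Item] = field(default_factory=list)

    def add_child(self, name: str, item_type: str) -> Item:
        # A child is accepted only if this item's type is its allowed parent type.
        if PARENT_TYPE.get(item_type) != self.item_type:
            raise ValueError(f"{item_type} cannot be a child of {self.item_type}")
        child = Item(name, item_type, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> list[str]:
        # Walk up the single-parent chain, nearest parent first.
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Rebuild part of the example from the table.
project_a = Item("Project A", "project")
bundle_b = project_a.add_child("Bundle B", "bundle")
model_c = project_a.add_child("Model C", "saved_model")
version_f = model_c.add_child("Model version F", "saved_model_version")

print(version_f.ancestors())  # ['Model C', 'Project A']
```

Because every item has exactly one parent, walking upward from any item always ends at a single project. In this sketch, attaching a model version directly to a bundle raises a `ValueError`.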
Non-governable items#
Looking at the hierarchy of governable items synced from connected Dataiku nodes, you may be wondering about other Dataiku items not present, such as datasets.
The Govern node does sync basic metadata from Dataiku datasets. Within the Governable items page, you’ll find a list of datasets in a project’s Source objects tab of the right Details panel.
Although the Govern node syncs the metadata of Dataiku datasets, datasets themselves aren’t governable items. Monitoring at the dataset level occurs on the Dataiku side of the platform. Project builders may use a combination of features including but not limited to:
The Data Catalog to find relevant datasets.
Data lineage to trace a column’s transformations up and down data pipelines.
Data quality rules, teamed with scenarios, to automate notifications and actions based on dataset characteristics.
Unified Monitoring to trigger alerts when the status of those rules or scenarios changes.
Tip
As with other synced items, the Govern node doesn’t store the actual Dataiku datasets. It syncs only their metadata.
Item registries#
At times, you may want to view all your organization’s bundles, models, or GenAI items — whether they’re governed or not. Three different registries serve this need.
Bundle registry#
The Govern node’s Bundle registry provides a complete list of all bundles from connected Dataiku nodes, regardless of governance status.
Nested within each parent project, you’ll find all of its bundles, along with whether each one is deployed.
For a deployed bundle, the Deployments tab of the right details panel reports information such as the deployment infrastructure and Govern policy.
You can filter the page to include only deployed bundles.
Tip
You’ll encounter the role of Govern policies in Tutorial | Governance lifecycle.
Model registry#
The Govern node’s Model registry provides a complete list of models from your connected Dataiku nodes, regardless of governance status, organized by project.
Nested within each parent project and saved model, you’ll find all saved model versions, including results for the focus metric of your choice.
Saved model versions have additional views in the right details panel, including for Model Metrics and Deployments.
You can filter the page to include only deployed model versions.
By default, ROC AUC is the Metric to Focus included in the row of each model version. You can switch to other metrics, such as data drift, precision, accuracy, etc.
Note
Most model metrics show the initial metric values drawn from the Design node or Automation node when building the model version. However, drift metrics come from the model evaluations stored in a model evaluation store (MES).
The MES must exist in the same project as the saved model of the model version being evaluated. You can configure the MES to opt out of the Govern sync if needed. Otherwise, metrics update anytime an evaluation runs.
GenAI registry#
Important
To take full advantage of governance of GenAI items, you’ll need an advanced license.
For a variety of reasons, including their associated costs, governance is a key obstacle to enterprise-wide deployment of Generative AI applications. Accordingly, governance of GenAI use cases is a motivating factor behind Dataiku’s LLM Mesh.
The Govern node plays an important complementary role in this mission:
All pages identify Dataiku items that include LLM usage in recipes or Answers webapps with a pink LLM badge.
All pages include the ability to filter for items using LLMs.
Most specifically, the Govern node includes a GenAI registry to manage the governance of GenAI items such as fine-tuned LLMs, agents, and augmented LLMs. It functions just like the Model and Bundle registry pages.
The effect of instance-level governance settings#
Thus far, this article has introduced how synced items appear in the Governable items and registry pages. However, your Govern node administrator can determine which items appear on these pages by defining instance-level governance settings.
Before items reach any end users, the Govern node administrator may automatically hide certain kinds of items from view. Alternatively, the administrator may automatically govern certain items, thereby bypassing the Governable items page. These choices depend entirely on the needs of the organization and its governance strategy.
The screenshot below illustrates one possible arrangement that can affect which items actually appear in the pages discussed so far:
In all cases, the Govern node administrator defines these rules from the Governance settings page of the waffle menu.
In this example, projects will have a recommendation to be governed according to the Dataiku Standard governance template. Under this setting, all projects appear in the Governable items page.
Bundles will be automatically hidden from end users. As a consequence, you won’t find them in the Governable items page.
Models will be automatically governed according to a custom governance template. They too would be absent from the Governable items page (but for the opposite reason!).
Model versions will be governed according to a Python script. Their appearance in the Governable items or registry pages would depend on the script’s behavior.
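The four rules in this example can be sketched in plain Python to show how they decide what lands in the Governable items page. This is a hypothetical model, not the Govern node’s settings format or its scripting API: the `Action` enum, the `SETTINGS` mapping, the item dictionaries, and the `model_version_script` rule (which assumes a `deployed` flag) are all invented for illustration.

```python
from enum import Enum


class Action(Enum):
    RECOMMEND = "recommend"      # item stays in the Governable items page
    HIDE = "hide"                # item is hidden from end users
    AUTO_GOVERN = "auto_govern"  # item is governed without passing through the page


def model_version_script(item):
    # Hypothetical script-based rule: auto-govern deployed versions and
    # leave the others for manual review. A real script could behave differently.
    return Action.AUTO_GOVERN if item.get("deployed") else Action.RECOMMEND


# Assumed settings mirroring the example: a fixed action per item type,
# or a callable standing in for a script.
SETTINGS = {
    "project": Action.RECOMMEND,
    "bundle": Action.HIDE,
    "saved_model": Action.AUTO_GOVERN,
    "saved_model_version": model_version_script,
}


def resolve_action(item, settings):
    rule = settings[item["type"]]
    # A fixed rule is returned as-is; a script-like rule is evaluated per item.
    return rule if isinstance(rule, Action) else rule(item)


def in_governable_items(item, settings):
    # Only items that are neither hidden nor auto-governed remain in the inbox.
    return resolve_action(item, settings) is Action.RECOMMEND


items = [
    {"name": "Project A", "type": "project"},
    {"name": "Bundle B", "type": "bundle"},
    {"name": "Model C", "type": "saved_model"},
    {"name": "Model version F", "type": "saved_model_version", "deployed": True},
    {"name": "Model version G", "type": "saved_model_version", "deployed": False},
]

inbox = [item["name"] for item in items if in_governable_items(item, SETTINGS)]
print(inbox)  # ['Project A', 'Model version G']
```

In this sketch, Bundle B and Model C are both missing from the inbox, but for opposite reasons: one is hidden and the other is already governed.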
Important
Custom templates and the ability to govern items according to a script are available only to holders of advanced licenses.
Tip
Some terminology, such as “governance template,” may be unfamiliar to you. Continue progressing with later articles for an introduction to these topics!
Next steps#
Now that you know the Govern node surfaces the metadata of synced items, learn how to apply item governance in Concept | Adding a governance layer to Dataiku items.
