Tutorial | Dataiku Govern framework

Get started

Dataiku Govern enables analytics leaders and project managers to track the progress of multiple data initiatives at the same time. We’ll show this by walking through how workflows and processes function in the machine learning model lifecycle.

Objectives

In the following exercises, you will:

  • Govern a project, model, and model versions.

  • Advance through a project and model version workflow to prepare for deployment.

  • Understand the sign-off process for the model version you want to deploy.

  • Monitor metrics of your model version to ensure model health.

Caution

If you are using a shared instance of Dataiku Govern, be mindful that other users will be able to view what you have created in this tutorial.

Use case

Throughout the tutorial, you will go through the steps to manage a Dataiku project dealing with credit card fraud modeling. Specifically, we’ll use our governance framework to make sure that the bundles and predictive models in this project have proper oversight before, during, and after deployment.

Prerequisites

To reproduce the steps outlined in this tutorial, you will need:

  • Dataiku 13.3 or later.

  • A Dataiku Govern instance that is connected to a Design node.

  • The Reverse Geocoding plugin on your connected Design node.

  • A Full Designer or Governance Manager user profile.

We’ll start in the Design node to import your project.

Create the project

  1. From the Dataiku Design homepage, click + New Project.

  2. Select Learning projects.

  3. Search for and select Dataiku Govern.

  4. Click Install.

  5. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Once you have the project:

  1. On your project homepage, rename the project <YOUR_INITIALS> Govern Tutorial to make it easier to identify.

  2. From the waffle menu, open Dataiku Govern.

Your new project in the Design node should appear automatically in your Govern node.

Govern items

The first step is choosing which items need a governance layer.

Govern a project

The Dataiku items that you may wish to govern appear in the Governable items tab.

  1. On the Govern homepage, click on the Governable items tab.

  2. Find your project in the Projects to Govern section of the Governable items tile. Use the search function if necessary.

  3. Click on the gavel icon to govern the project.

  4. If necessary, for the Project template, select Dataiku Standard.

  5. In the Rules for existing items column of the Govern children table, ensure that both the saved model and model versions are checked to apply the standard template to these items.

  6. Click Govern.

Dataiku screenshot of the dialog to govern a project.

Govern a model and model version

Next, let’s confirm that the model and model versions in your project are governed. A model version can be governed only if its parent project and saved model are governed as well.

When adding the governance layer to the Dataiku project, we chose to also govern the existing saved model and model versions, so they should already be governed.

  1. Navigate to the Model registry tab.

  2. Expand the row corresponding to your project.

  3. Confirm the Govern status of the project’s saved model.

Screenshot highlighting the model row that needs to be governed in this project.
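
The parent-child rule described above can be pictured with a small Python sketch. This is a conceptual model only, not the Dataiku Govern API: the `Item` class and `can_govern` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a project, saved model, or model version."""
    name: str
    parent: Optional["Item"] = None
    governed: bool = False


def can_govern(item: Item) -> bool:
    """Return True only if every ancestor of the item is already governed."""
    ancestor = item.parent
    while ancestor is not None:
        if not ancestor.governed:
            return False
        ancestor = ancestor.parent
    return True


project = Item("Govern Tutorial")
model = Item("Predict authorized_flag (binary)", parent=project)
version = Item("Random forest (s1) - v2", parent=model)

print(can_govern(version))  # False: the project and model are not governed yet
project.governed = True
model.governed = True
print(can_govern(version))  # True
```

Checking the Rules for existing items boxes when governing the project corresponds, in this sketch, to marking the project and model as governed in one step so that the model versions qualify too.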

Execute workflows

Next, we’ll learn why workflows are the key to item management in Dataiku Govern.

Follow the standard project workflow

Before deploying your model, let’s advance through the initial parent project workflow steps.

  1. To find your Govern project, navigate to the Governed projects tab.

  2. Locate your project, and click on its title to open its overview.

Note

Because we aren’t designing a project in this tutorial, we have imported a completed project. However, in a real-world context, we would have begun these steps before completing the project.

Exploration

A project should be in the Exploration step when a team is formulating specifications for the project.

  1. Click the Exploration step under Workflow in the left panel.

  2. Click Edit.

  3. In the Notes section, paste something like: This project will use a data pipeline to model credit card fraud.

  4. Click Save.

Tip

For this use case, a requirements document might be a great file to upload under Supporting Documents in the Exploration step.

Once the project is defined, we can move on to the Qualification step. This step allows you to decide whether or not to continue the project.

Qualification

Perhaps you discover that this project is too high effort for too little value. You could then decide to not move forward with project development. However, in our use case, let’s say that the project has a high feasibility and a high value.

  1. Click Edit and then Set as Finished to mark the Exploration step as complete.

  2. Scroll to the Qualification step.

  3. Set the Value Rating field to High.

  4. Set the Feasibility Rating field to High.

  5. Click Set as Finished to mark this step complete as well.

  6. Click Save, bringing you to the In Progress step.

Screenshot showing the filled out exploration and qualification steps of the project.
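
As a rough mental model, a workflow is an ordered list of steps, each of which can hold fields and be marked as finished. The sketch below is hypothetical and is not the Dataiku Govern API. The step names come from this tutorial (the template may define later steps that are not listed here), while the `Workflow` class and its methods are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Step names as they appear in this tutorial; later template steps are omitted.
PROJECT_STEPS = ["Exploration", "Qualification", "In Progress"]


@dataclass
class Workflow:
    """Hypothetical model of a Govern workflow: ordered steps with fields."""
    steps: list
    current: int = 0
    fields: dict = field(default_factory=dict)

    @property
    def current_step(self) -> str:
        return self.steps[self.current]

    def set_field(self, name: str, value: str) -> None:
        """Record a field value on the current step (e.g. Notes)."""
        self.fields.setdefault(self.current_step, {})[name] = value

    def set_as_finished(self) -> str:
        """Mark the current step complete and move to the next one, if any."""
        if self.current < len(self.steps) - 1:
            self.current += 1
        return self.current_step


wf = Workflow(PROJECT_STEPS)
wf.set_field("Notes", "This project will use a data pipeline to model credit card fraud.")
wf.set_as_finished()                   # Exploration -> Qualification
wf.set_field("Value Rating", "High")
wf.set_field("Feasibility Rating", "High")
print(wf.set_as_finished())            # In Progress
```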

Begin the model version workflow

Now that your Govern project is in progress, it is appropriate to begin your model version workflow.

  1. Above the project workflow, click Overview.

  2. Scroll to Related items, and click to open the Predict authorized_flag (binary) model. (We want to stay within the Govern node for now, so don’t click the link to the saved model in the Design node.)

  3. Under Governed model versions, click to open the active version of the model, Random forest (s1) - v2. (Once again, we want to stay within the Govern node.)

  4. Navigate to the Development step of the workflow.

Development

The Development step lets you store information about a model version that is in development. For instance, imagine that you want to note the purpose of a model version to keep the development on track. You can do so in this step.

  1. Click Edit.

  2. In the Notes field, copy-paste: Design model to detect credit card fraud.

  3. Let’s assume this step is now complete, so click Set as Finished.

  4. Click Save, thereby advancing to the Review step.

Screenshot showing the Development step of a model version.

Review items

The AI management lifecycle benefits from a thorough review process and consistent monitoring of model metrics and health. Let’s practice!

Configure sign-off

As a reminder, our project uses a model to predict whether certain credit card transactions are fraudulent. Imagine this model in production: you need to ensure that the model is working correctly (flagging only fraudulent charges) before and after deployment. We can address this concern using sign-offs and model monitoring in Dataiku Govern.

First, you need to assign yourself to the sign-off. To do so:

  1. Return to the Govern Tutorial project overview page.

  2. Click Edit on the project overview and scroll to the Sign-off reviewers and approvers section.

  3. Add yourself as a reviewer in any slot.

  4. Add yourself as a final approver.

  5. Click Save when finished.

A screenshot of the sign-off configuration in the Govern project overview.

Request feedback

Next, request feedback on the model version:

  1. Return to the Random forest (s1) - v2 Govern model version page. One way to do this from the parent project is to go to Source Objects > Related items > Saved models.

  2. Go to the Review step. Notice the Not Started label under this step.

  3. Click on Request Feedback.

  4. In the Request feedback window, choose the Select users to notify option.

  5. Do not select any users. This ensures that no users on your instance will receive an email notification about this dummy request.

  6. Click Request Feedback.

Dataiku screenshot of the screen to request feedback on a model version.

Now you should see that your review has been requested!

Add feedback

Assume you find out the model might have some location bias that causes imbalanced fraud alerts depending on the merchant_geopoint value.

  1. Click Review next to “You have been requested to review this object.”

  2. Set the Status to Minor issue.

  3. In the Optional comment box, copy-paste: This model might have some location bias. Please review if you should keep merchant_geopoint as a model feature.

  4. Click Submit Review.

Dataiku screenshot of the dialog for adding feedback on a model version review.

Tip

At this point, it is possible to add, edit, or delete your feedback using the More Options menu next to your feedback.

Request approval

For additional practice, you will also act as the final approver. To start this process:

  1. Click Request Final Approval.

  2. As before, choose Select users to notify.

  3. Leave all users unchecked.

  4. Click Request Final Approval.

Note

If you want to go back to the feedback stage from the final approval stage, you can open the More Options menu and select Go back to feedback stage.

Add final approval

Now you will see the option to add a review. Let’s say you started to investigate the potential location bias based on the feedback review. Assume you find out the location is a proxy variable for race.

  1. Click Review on the final approval request.

  2. Set the Status to Rejected.

  3. Copy-paste the following comment: The model must be redesigned to account for location and racial bias.

  4. Click Submit Review.

Important

If a Deployer node is connected, this sign-off status then syncs to it. Whether the model version can be deployed depends on the Govern policy defined in the Deployer node.
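
To make the interaction between sign-off status and deployment policy concrete, here is a minimal sketch. It is not the Deployer's implementation: the policy names (`PREVENT`, `WARN`, `NO_CHECK`), the `APPROVED` status, and the `deployment_decision` function are assumptions chosen for illustration, so check your Deployer settings for the options actually offered.

```python
from enum import Enum


class SignOffStatus(Enum):
    APPROVED = "Approved"        # assumed status, not shown in this tutorial
    REJECTED = "Rejected"
    NOT_STARTED = "Not Started"


class Policy(Enum):
    """Hypothetical Govern policies a Deployer node might apply."""
    PREVENT = "prevent"      # block anything that is not approved
    WARN = "warn"            # allow, but surface a warning
    NO_CHECK = "no_check"    # ignore the sign-off status


def deployment_decision(status: SignOffStatus, policy: Policy) -> str:
    """Combine the synced sign-off status with the Deployer-side policy."""
    if policy is Policy.NO_CHECK or status is SignOffStatus.APPROVED:
        return "deploy"
    if policy is Policy.WARN:
        return "deploy with warning"
    return "block"


# The model version in this tutorial was rejected at final approval.
print(deployment_decision(SignOffStatus.REJECTED, Policy.PREVENT))  # block
print(deployment_decision(SignOffStatus.REJECTED, Policy.WARN))     # deploy with warning
```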

Abandon sign-off

Imagine this credit card fraud project has been de-prioritized. To completely quit the sign-off:

  1. Find the More Options menu of the model version sign-off at the beginning of the Review step.

  2. Click Abandon sign-off > Abandon.

A screenshot of the Abandon button in Dataiku Govern.

You should then see that reviews and other configurations are no longer editable.

Tip

If you have changed your decision, you can open the More Options menu and click Cancel abandon. This will reopen the sign-off with all of the previous feedback and approval data.

Reset sign-off

Imagine that later in the year, this project becomes active again. Additionally, there have been new members added to the organization. Let’s reset the sign-off completely to remove any previous configurations and restart the process.

  1. Open the More Options menu at the top of the model version sign-off in the Review step.

  2. Click Reset sign-off.

  3. In the Confirm reset window, make sure that the Reload the latest sign-off configuration checkbox is not selected.

  4. Click Reset.

A screenshot of the Reset sign-off button in Dataiku Govern.
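
The sign-off stages practiced in this section can be summarized as a small state machine. The sketch below is one reading of the behavior described in this tutorial, not the Dataiku Govern API. The `SignOff` class, its method names, and the stage labels other than Not Started are hypothetical.

```python
class SignOff:
    """Hypothetical model of a sign-off's stages."""

    def __init__(self):
        self.stage = "Not Started"
        self.reviews = []                  # (stage, reviewer, status, comment)
        self._stage_before_abandon = None

    def _require_active(self):
        if self.stage == "Abandoned":
            raise RuntimeError("An abandoned sign-off is not editable")

    def request_feedback(self):
        self._require_active()
        self.stage = "Feedback"

    def request_final_approval(self):
        self._require_active()
        self.stage = "Final Approval"

    def add_review(self, reviewer, status, comment=""):
        self._require_active()
        self.reviews.append((self.stage, reviewer, status, comment))

    def abandon(self):
        self._require_active()
        self._stage_before_abandon = self.stage
        self.stage = "Abandoned"

    def cancel_abandon(self):
        """Reopen the sign-off, keeping earlier feedback and approval data."""
        if self.stage != "Abandoned":
            raise RuntimeError("Sign-off is not abandoned")
        self.stage = self._stage_before_abandon
        self._stage_before_abandon = None

    def reset(self):
        """Discard all reviews and restart the process from the beginning."""
        self.stage = "Not Started"
        self.reviews = []
        self._stage_before_abandon = None


signoff = SignOff()
signoff.request_feedback()
signoff.add_review("me", "Minor issue", "Possible location bias.")
signoff.request_final_approval()
signoff.add_review("me", "Rejected", "Redesign to address bias.")
signoff.abandon()
signoff.cancel_abandon()
print(signoff.stage, len(signoff.reviews))  # Final Approval 2
signoff.abandon()
signoff.reset()
print(signoff.stage, len(signoff.reviews))  # Not Started 0
```

The key difference the sketch highlights is that cancelling an abandon restores the earlier reviews, whereas a reset discards them.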

At this point, you have successfully practiced an end-to-end sign-off! Now let’s move on to the governance lifecycle after deployment.

Review metrics

Monitoring model metrics is vital to ensure the longevity and robustness of your ML models. In our case, we want to ensure that the credit card fraud predictive model is still doing a good job after being deployed.

  1. Navigate to the Model registry tab within the Govern node.

  2. Expand your project row.

  3. Expand the Predict authorized_flag (binary) model.

  4. Select the active Random forest (s1) - v2 model version.

  5. Click on the small chart icon in the Details panel to view the Model metrics.

Screenshot highlighting where to find metrics in the Details panel.

Tip

If you want to see a specific metric in the row of a model version, use the Metric to Focus dropdown near the top right. Of course, you can also click the external link icon to view these assets in the Design node. If these assets are unfamiliar to you, see the ML Practitioner and MLOps Practitioner learning paths.
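
Reviewing metrics usually comes down to comparing observed values against thresholds your team considers acceptable. The sketch below illustrates that idea in plain Python. The function name, metric names, and all numbers are hypothetical and are not taken from the tutorial project or the Dataiku Govern API.

```python
def failing_metrics(metrics: dict, minimums: dict) -> dict:
    """Return the metrics that fall below their minimum acceptable value."""
    return {
        name: value
        for name, value in metrics.items()
        if name in minimums and value < minimums[name]
    }


# Hypothetical values for illustration only.
observed = {"auc": 0.71, "precision": 0.90}
minimums = {"auc": 0.75, "precision": 0.80}

print(failing_metrics(observed, minimums))  # {'auc': 0.71}
```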

Summary

Congratulations! You have successfully:

  • Governed your first items.

  • Leveraged standard item workflows.

  • Prepared a model version for deployment.

  • Reviewed model metrics in the Govern node.