Tutorial | Dataiku Govern framework#

Get started#

Dataiku Govern enables analytics leaders and project managers to track the progress of multiple data initiatives at the same time. We’ll show this by walking through how workflows and processes function in the machine learning model lifecycle.

Objectives#

In the following exercises, you will:

  • Govern a project, model, and model versions.

  • Advance through a project and model version workflow to prepare for deployment.

  • Understand the sign-off process for the model version you want to deploy.

  • Monitor metrics of your model version to ensure model health.

Use case#

Throughout the tutorial, you will go through the steps to manage a Dataiku project dealing with credit card fraud modeling. Specifically, we’ll use our governance framework to make sure that the bundles and predictive models in this project have proper oversight before, during, and after deployment.

Prerequisites#

To reproduce the steps outlined in this tutorial:

  • You must have the Enterprise license for Dataiku (version 12.2 or above).

  • Your user profile should be either DESIGNER or VISUAL_DESIGNER as defined by your admin.

  • You need access to a Dataiku Govern instance that is connected to a Design node.

  • You have to install the Reverse Geocoding plugin on your connected Design node.

Create the project#

We’ll start in the Design node to import your project.

  1. From the Dataiku homepage, click +New Project > DSS tutorials > MLOps Practitioner > Dataiku Govern.

    Note

    You can also download the project zip file and import it directly.

  2. On your project homepage, rename the project <YOUR_INITIALS> Govern Tutorial.

  3. From the Applications menu, open Dataiku Govern.

    Dataiku screenshot of the Dataiku Govern option in the Applications menu.

Your new project in the Design node should appear automatically in your Govern node.

Govern items#

Govern a project#

Let’s see our centralized projects and learn how to govern them.

  1. On the Govern homepage, click on the Governable items tile.

  2. Find your project on this page. Use the search function if necessary.

  3. Click on the Govern button in this row.

  4. If the Governance template dropdown appears, select Dataiku Standard.

  5. Uncheck the Govern all of its Bundles checkbox.

  6. Click Govern.

An image highlighting the Govern tutorial project on the Governable items page.

This creates a new Govern item for the project. Now, you’ll be able to manage your workflow for that project.

Govern a model and model version#

Let’s govern the model and model versions in your project. Remember that the parent project and model must already be governed before you can govern their model versions.

  1. Select Model registry from the navigation bar and find your project on the page.

  2. Expand your project row and select the Govern button next to the Predict authorized_flag (binary) model.

  3. Keep the Govern all of its Saved Model Versions checkbox selected.

  4. If the Governance template dropdown appears, select Dataiku Standard.

  5. Click Govern.

Screenshot highlighting the model row that needs to be governed in this project.

Note

You can govern individual model versions if needed. For instance, you can decide to only govern the active model version by selecting the Govern button in the model version’s respective row.
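The dependency between governed items can be pictured as a simple parent check. The following is a minimal sketch in plain Python, not the Dataiku Govern API: the `Item` class and the `can_govern` and `govern` functions are hypothetical names introduced only to illustrate the rule that a model version can be governed only once its parent model and project are governed.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a governable item (not a Dataiku class)."""
    name: str
    parent: "Item | None" = None
    governed: bool = False


def can_govern(item: Item) -> bool:
    """Return True only if every ancestor of the item is already governed."""
    ancestor = item.parent
    while ancestor is not None:
        if not ancestor.governed:
            return False
        ancestor = ancestor.parent
    return True


def govern(item: Item) -> None:
    """Mark the item as governed, refusing if a parent is still ungoverned."""
    if not can_govern(item):
        raise ValueError(f"Govern the parents of {item.name!r} first")
    item.governed = True


project = Item("Govern Tutorial")
model = Item("Predict authorized_flag (binary)", parent=project)
version = Item("Random forest (s1) - v2", parent=model)

# Before the project and model are governed, the version cannot be.
blocked_before = not can_govern(version)

# Governing top-down, as in the steps above, satisfies the rule.
govern(project)
govern(model)
govern(version)
```

Governing the items in the order project, model, model version mirrors the order of the steps in this section.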

Execute workflows#

Next, we’ll learn why workflows are the key to item management in Dataiku Govern.

Follow the standard project workflow#

Before deploying your model, let’s advance through the initial parent project workflow steps.

  1. To find your Govern project, navigate to the Governed projects page.

  2. Locate your project and click on it to open the project page.

Note

Because we aren’t designing a project in this tutorial, we have imported a completed project. However, in a real-world context, we would have begun these steps before completing the project.

Exploration#

A project should be in the Exploration step when a team is formulating specifications for the project.

  1. Click on the Exploration step under Workflow in the left panel.

  2. Select the Edit button.

  3. In the Notes section of Step 1 - Exploration, type: This project will use a data pipeline to model credit card fraud.

  4. Click the Save button.

Tip

For this use case, a requirements document might be a great file to upload under Supporting Documents in the Exploration step.

Once the project is defined, we can move on to the Qualification step. This step allows you to decide whether or not to continue the project.

Qualification#

Perhaps you discover that this project is too high effort for too little value. You could then decide to not move forward with project development. However, in our use case, let’s say that the project has a high feasibility and a high value.

  1. Click Edit and select Set as Finished to mark the Exploration step as done.

  2. Under Step 2, select High for the Value Rating field and High for the Feasibility Rating field.

  3. Using this information, we’ll decide to continue the project and select Go under the Resulting Decision field.

  4. Select Set as Finished on Step 2 - Qualification and Save.

Screenshot showing the filled out exploration and qualification steps of the project.

Now, the project is on Step 3, or In Progress.
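One way to think about what just happened is as an ordered list of steps in which the current step is the first one not yet set as finished. The sketch below is plain Python and purely illustrative: the `Workflow` class is a hypothetical name, it lists only the three steps named so far in this tutorial, and it assumes steps are finished in order.

```python
class Workflow:
    """Hypothetical model of an ordered Govern workflow (not a Dataiku API)."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.notes = {step: "" for step in self.steps}
        self.finished = set()

    @property
    def current_step(self):
        # The current step is the first step not yet set as finished.
        for step in self.steps:
            if step not in self.finished:
                return step
        return None  # every step is finished

    def set_as_finished(self, step):
        if step not in self.steps:
            raise ValueError(f"Unknown step: {step!r}")
        self.finished.add(step)


# Only the steps named in this tutorial; a real template may define more.
project_workflow = Workflow(["Exploration", "Qualification", "In Progress"])
project_workflow.notes["Exploration"] = (
    "This project will use a data pipeline to model credit card fraud."
)

project_workflow.set_as_finished("Exploration")
step_after_exploration = project_workflow.current_step

project_workflow.set_as_finished("Qualification")
step_after_qualification = project_workflow.current_step
```

After both calls, the current step of this toy workflow is In Progress, matching the state of the Govern project at this point.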

Begin the model version workflow#

Now that your Govern project is In Progress, it is appropriate to begin your Govern model version workflow.

  1. Return to the Overview tab of your Govern project.

  2. Under Related items, click to open the Predict authorized_flag (binary) model.

  3. Next to Governed model versions, click to open the Random forest (s1) - v2 model version.

  4. Navigate to the Development step under the workflow in the left menu.

Development#

The Development step lets you store information about a model version that is in development. For instance, imagine that you want to note the purpose of a model version to keep the development on track. You can do so in this step.

  1. Select Edit.

  2. In the Notes field, write Design model to detect credit card fraud.

  3. Click Save.

Screenshot showing the Development step of a model version.

As a data scientist, once the model is ready in the Design node, you can mark the Development step as finished.

  1. Select Edit and click Set as Finished next to Step 1 - Development.

  2. Save to finish this step.

Review items#

The AI management lifecycle benefits from a thorough review process and consistent monitoring of model metrics and health. Let’s practice!

Configure sign-off#

As a reminder, our project uses a model to predict whether certain credit card transactions are fraudulent. Imagine this model in production: you need to ensure that the model is working correctly (flagging only fraudulent charges) before and after deployment. Fortunately, we can address this concern using sign-offs and model monitoring in Dataiku Govern.

First, you need to assign yourself to the sign-off. To do so:

  1. Open your Govern Tutorial project page.

  2. Click Edit on the project overview and scroll to the Sign-off reviewers and approvers section.

  3. Add yourself as a reviewer in any slot.

  4. Add yourself as a final approver and Save.

A screenshot of the sign-off configuration in the Govern project overview.

Request feedback#

Next, return to the Random forest (s1) - v2 Govern model version page. An easy way to do this is by clicking through the project’s related items. When you’re there:

  1. Go to Step 2 - Review. Notice the Not Started label under this step.

  2. Click on Request Feedback.

  3. In the Request feedback window, choose the Select users to notify option.

  4. Do not select any users. This ensures that no users on your instance will receive an email notification about this request.

  5. Select Request Feedback.

A screenshot of the email notification dialogue window.

Now you should see that your review has been requested!

Add feedback#

Assume you find out the model might have some location bias that causes imbalanced fraud alerts depending on the merchant_geopoint value.

  1. Click on the Review button next to “You have been requested to review this object.”

  2. Set the Status to Minor issue.

  3. In the Optional comment box, write This model might have some location bias. Please review if you should keep merchant_geopoint as a model feature.

  4. Then click Submit Review.

Edit feedback#

At this point, it is possible to add, edit, or delete your feedback.

  1. Click on the More Options menu next to your feedback.

  2. Select Edit feedback.

  3. Change the Status to Major issue.

  4. Now, Save Changes.

A screenshot of the edited feedback that has been submitted.

Request approval#

For additional practice, you will also act as the Final Approver. To start this process:

  1. Click Request Final Approval.

  2. As before, choose Select users to notify, leave the list of users empty, and click Request Final Approval.

Note

If you want to go back to the feedback stage from the final approval stage, you can open the More Options menu and select Go back to feedback stage.

Add final approval#

Now you will see the option to add a review. Let’s say you started to investigate the potential location bias based on the feedback review. Assume you find out the location is a proxy variable for race.

  1. Click on Review.

  2. Set the Status to Rejected.

  3. Add the following comment: The model must be redesigned to account for location and racial bias.

  4. Click Submit Review.

A screenshot of the "Complete review" dialogue window with filled out fields.

If a Deployer node is connected, the sign-off status will then sync to it. Whether the model version can be deployed depends on the Govern policy defined in the Deployer node.
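To make the role of the policy concrete, here is a minimal sketch in plain Python of how a deployment check could combine a policy with a sign-off status. It is not Dataiku code: the `GovernPolicy` enum, its three values, the `check_deployment` function, and the "Approved" status string are all assumptions chosen for illustration, so check your Deployer settings for the options actually available.

```python
from enum import Enum


class GovernPolicy(Enum):
    """Hypothetical policy levels; the names are illustrative assumptions."""
    NO_CHECK = "deploy without checking the sign-off status"
    WARN = "warn before deploying an unapproved item"
    PREVENT = "prevent deploying an unapproved item"


def check_deployment(policy, signoff_status):
    """Return (allowed, message) for a deployment attempt."""
    approved = signoff_status == "Approved"
    if policy is GovernPolicy.NO_CHECK or approved:
        return True, ""
    if policy is GovernPolicy.WARN:
        return True, f"Warning: sign-off status is {signoff_status!r}"
    return False, f"Blocked: sign-off status is {signoff_status!r}"


# The model version in this tutorial was rejected at final approval.
results = {
    policy.name: check_deployment(policy, "Rejected")[0]
    for policy in GovernPolicy
}
```

Under these assumptions, the rejected model version would be deployable under the two more permissive policies and blocked only under the strictest one.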

Abandon sign-off#

Imagine this credit card fraud project has been deprioritized. To completely quit the sign-off:

  1. Select the More Options menu at the top of the sign-off.

  2. Click Abandon sign-off > Abandon.

You should then see that reviews and other configurations are no longer editable.

A screenshot of the Abandon button in Dataiku Govern.

Note

If you have accidentally abandoned the sign-off and want to “undo” this action, you can open the More Options menu and click Cancel abandon. This will reopen the sign-off with all of the previous feedback and approval data.

Reset sign-off#

Now, pretend that later in the year, this project becomes active again. Additionally, there have been new members added to the organization. Let’s reset the sign-off completely to remove any previous configurations and restart the process.

  1. Select the More Options menu at the top of the sign-off.

  2. Click Reset sign-off.

  3. In the Confirm reset window, make sure that the Reload the latest sign-off configuration checkbox is not selected.

  4. Lastly, Reset.

A screenshot of the Reset sign-off button in Dataiku Govern.

At this point, you have successfully practiced an end-to-end sign-off! Now let’s move on to the governance lifecycle after deployment.
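The sign-off you just walked through behaves like a small state machine. The sketch below models it in plain Python as a way to recap the stages. It is an illustration rather than the Govern implementation: the `SignOff` class, its method names, and every state name other than Not Started and Rejected are assumptions, and the allowed transitions are simplified.

```python
class SignOff:
    """Hypothetical model of a Govern sign-off; state names are illustrative."""

    def __init__(self):
        self.state = "Not Started"
        self.feedback = []  # one dict per review: reviewer, status, comment
        self.final_review = None
        self._state_before_abandon = None

    def _require(self, *allowed):
        if self.state not in allowed:
            raise RuntimeError(f"Action not available while {self.state!r}")

    def request_feedback(self):
        self._require("Not Started")
        self.state = "Waiting for feedback"

    def add_feedback(self, reviewer, status, comment=""):
        self._require("Waiting for feedback")
        self.feedback.append(
            {"reviewer": reviewer, "status": status, "comment": comment}
        )

    def request_final_approval(self):
        self._require("Waiting for feedback")
        self.state = "Waiting for final approval"

    def go_back_to_feedback(self):
        self._require("Waiting for final approval")
        self.state = "Waiting for feedback"

    def submit_final_review(self, approver, status, comment=""):
        self._require("Waiting for final approval")
        if status not in ("Approved", "Rejected"):
            raise ValueError("Final status must be 'Approved' or 'Rejected'")
        self.final_review = {
            "approver": approver, "status": status, "comment": comment
        }
        self.state = status

    def abandon(self):
        if self.state == "Abandoned":
            raise RuntimeError("Sign-off is already abandoned")
        self._state_before_abandon = self.state
        self.state = "Abandoned"

    def cancel_abandon(self):
        # Reopens the sign-off with all previous feedback and approval data.
        self._require("Abandoned")
        self.state = self._state_before_abandon
        self._state_before_abandon = None

    def reset(self):
        # Discards all feedback and approvals and restarts the process.
        self.__init__()


signoff = SignOff()
signoff.request_feedback()
signoff.add_feedback("me", "Minor issue", "Possible location bias.")
signoff.feedback[0]["status"] = "Major issue"  # edit the feedback
signoff.request_final_approval()
signoff.submit_final_review("me", "Rejected", "The model must be redesigned.")

signoff.abandon()
signoff.cancel_abandon()  # undo: the earlier data is still there
state_after_cancel = signoff.state
feedback_kept = len(signoff.feedback)

signoff.abandon()
signoff.reset()  # start over from a clean sign-off
```

The difference between the last two actions is the point of the sketch: canceling an abandon restores the earlier state with its reviews, while a reset returns to Not Started with nothing kept.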

Review metrics#

Monitoring model metrics is vital to ensure the longevity and robustness of your ML models. In our case, we want to ensure that the credit card fraud predictive model is still doing a good job after being deployed. Let’s return to the Model registry page once again to review model metrics.

  1. Expand your project row and the Predict authorized_flag (binary) model to access model versions.

  2. Select the Random forest (s1) - v2 row to open the Details panel on the right.

  3. Click on the small chart icon in the Details panel to open the Model metrics tab.

Screenshot highlighting where to find metrics in the Details panel.

Here, you can see the relevant metrics associated with the model version. Take note that the drift metrics appear graphically in the Model metrics tab.

View row metrics#

If you want to see a specific metric in the row of a model version:

  1. Click on the Metric to Focus dropdown.

  2. Select any metric that you wish to see.

Screenshot directing you to the Metric to Focus dropdown.

Open metric source#

  1. Click on the small icon next to the model version name in the Details panel to open the source model in the Design node.

  2. Review the panels under the Performance section of the left menu. Here, you can see many different criteria that measure model performance.

  3. Next, go to the Flow and open the mes_for_ground_truth model evaluation store (MES) in the Ground Truth Monitoring Flow zone.

Here, you can find more information about the different drift metrics of the model. Note that these won’t always be available for the model version.

Screenshot highlighting drift metric links in Dataiku.

Note

Metrics from our example project may not reflect outcomes of real-world conditions. To better understand the MES and drift metrics, visit our documentation on model evaluation stores and drift metrics.

Summary#

Congratulations! You have successfully:

  • Governed your first items.

  • Leveraged standard item workflows.

  • Prepared a model version for deployment.

  • Reviewed model metrics in the Govern node and the Design node.