Tutorial | Test scenarios#

Get started#

Organizations with AI ambitions know that data projects must be treated not as experiments, but as IT-critical platforms. To make that shift, these organizations must find ways to safely scale operations, moving from development sandboxes to properly monitored and regulated production environments.

Test scenarios are one piece in Dataiku’s broader MLOps ecosystem aimed at helping organizations meet this objective.

Objectives#

In this tutorial, you will:

  • Create test scenarios to verify a Dataiku project is ready for production.

  • Execute these test scenarios on a dedicated quality assurance (QA) infrastructure.

  • Publish dashboards and reports on the results of these test scenarios.

  • Use the results of these tests to control a project’s movement through the stages of a deployment pipeline.

Prerequisites#

  • Dataiku 13.3 or later.

  • You only need a Design node to create the test scenarios shown here. However, to walk through the deployment stages, you’ll need access to at least one (and preferably two) Automation nodes connected to your Design node.

  • If interested in using Python test scenarios, you’ll need a code environment including pytest.

Create the project#

  1. From the Dataiku Design homepage, click + New Project.

  2. Select Learning projects.

  3. Search for and select Test Scenarios.

  4. Click Install.

  5. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Use case summary#

The Flow in this project is about predicting churn for a fictitious telecom company. However, the actual Flow is not important. You can use test scenarios for any kind of Dataiku project that you want to advance into production.

Notably, this Flow happens to include saved models, but test scenarios do not have specific capabilities for testing models. Dataiku already offers many components dedicated specifically to evaluating and testing models: from simple AutoML results to code-based experiment tracking, packaging a model with diagnostics and assertions, and model evaluation stores.

In this context, model testing does not mean testing the model and its metrics. Rather, it means treating the Score and/or Evaluate recipes like any other data transformation and verifying that their output matches what is expected.

Design test scenarios#

Scenarios offer three types of steps for testing various aspects of a Dataiku project:

Tip

You can review these three sections in any order depending on your needs.

Test a Flow#

Any change in a recipe can cause an unexpected change somewhere downstream in the Flow. To ensure that the Flow is still performing as expected before moving it to a production environment, Dataiku provides an integration test step in a scenario.

Craft test datasets#

Ideally, this kind of testing requires crafting reference datasets that cover as many relevant processing cases as possible. This can be a time-consuming process, but it ensures the relevance of the results. Without such reference datasets, test scenarios may provide a false sense of security.

  1. Navigate to the Flow to inspect the datasets in the Test reference inputs & outputs Flow zone.

  2. Observe the relationship between the following Flow datasets and their test references:

| Flow dataset | Test reference |
| --- | --- |
| unlabelled_customers | test_input_unlabelled_customers |
| revenue_loss | test_reference_revenue_loss |

Dataiku screenshot of reference datasets in the Flow.

Important

In this case, we have taken small samples as reference datasets. This may be a good way to get started. However, as you progress, the best practice is to specifically craft these datasets.
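
If you want to bootstrap such reference datasets programmatically, below is a minimal sketch of one way to copy a small sample of a Flow input into its test reference dataset from a Python recipe or notebook on the Design node. The dataset names match this project, but the sample size is arbitrary, and hand-crafted edge cases remain the better practice.

import dataiku

# Read a small sample of the current Flow input.
flow_input = dataiku.Dataset("unlabelled_customers")
sample_df = flow_input.get_dataframe(limit=100)

# Ideally, append hand-crafted edge cases to sample_df here before writing.

# Write the sample into the reference dataset used by the integration test step.
reference = dataiku.Dataset("test_input_unlabelled_customers")
reference.write_with_schema(sample_df)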

Create a test scenario#

The project already includes a scenario that we can use for a Flow integration test. However, it hasn’t yet been designated as a test scenario, and so won’t appear in the project’s test dashboard (which we’ll see later). Let’s do that now.

  1. From the Jobs menu of the top navigation bar, select Scenarios.

  2. Open the scenario Flow Test.

  3. On the Settings tab of the scenario, check the box Mark as a test scenario.

  4. Navigate to the Steps tab.

Dataiku screenshot of the Settings tab of a test scenario.

Configure the integration test step#

The integration test step has three key parts to understand:

| To be configured | Purpose |
| --- | --- |
| Reference input(s) | The crafted test dataset(s) that will be used instead of the current input in the Flow. |
| Build action(s) | The Dataiku item(s) in the Flow that the scenario will build (as in an ordinary scenario). |
| Reference output(s) | Output dataset(s) to validate the results. Using the new reference input, the scenario will run the requested build, creating a new output dataset. It will then check if this new output matches the reference output. |

To see this in action, let’s choose an input dataset at the start of the pipeline and an output dataset at the end.

  1. Click Add Step > Run integration test.

  2. Under Reference inputs, click + Add Remapping.

    • For the current input, select unlabeled_customers.

    • For the reference input, select test_input_unlabelled_customers.

  3. Under Builds, click + Add Item.

    • Select the dataset revenue_loss.

    • Click Add Item.

  4. Under Results validation, click + Add Remapping.

    • For the current output, select revenue_loss.

    • For the reference output, select the dataset test_reference_revenue_loss.

  5. Click Save, but don’t run the scenario yet.

Dataiku screenshot of a Flow test step in a scenario.

Tip

In this case, we’ve chosen to perform a content comparison across all columns. Depending on the nature of the tests, and therefore the shape of the corresponding reference datasets, you may only want to test a selection of columns.

Run the integration test scenario#

Before running the scenario, let’s introduce an arbitrary Flow change to show how the test would detect it.

  1. Go to the Flow (g + f).

  2. Open the Prepare recipe that outputs the revenue_loss dataset.

  3. Make and save a change to the Formula step to create an output that the reference dataset won’t account for, such as:

    prediction * ( Total_Charge - (proba_1 * Total_Charge))
    
  4. Return to the Flow Test scenario, and click Run.

  5. Go to the Last runs tab, and see the failure.

  6. Click View step log to see the reported problem.

Dataiku screenshot of a failed integration test scenario step.

Let’s review the scenario’s activities in detail:

  • The scenario swapped an upstream input (unlabeled_customers) with a new test input (test_input_unlabelled_customers).

  • It built an output (revenue_loss) using the new test input.

  • It compared the new output (revenue_loss) to the provided reference output (test_reference_revenue_loss).

  • Due to the change in the Prepare recipe, this comparison failed.

Tip

Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.

Test a Python library#

Python users may already be familiar with the pytest testing framework. You can use the same framework for unit tests in Dataiku.

Store tests in a project library#

The first step is writing Python unit tests according to the pytest framework and making them accessible from the project’s code library. This step has already been done for you.

  1. From the Code menu of the top navigation bar, select Libraries (or g + l).

  2. Explore the sample code in the python/ folder.

Tip

See the Developer Guide on Project libraries to get started working with code in this way.
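
For context, below is a minimal sketch of what a pytest-style test in such a library could look like. The helper function and values here are hypothetical; the actual tests for this project live in python/test/test_drift_functions.py.

def average_drift_percent(reference, current):
    """Toy drift measure: average absolute percentage difference between two series."""
    diffs = [abs(r - c) / abs(r) * 100 for r, c in zip(reference, current) if r != 0]
    return sum(diffs) / len(diffs)

def test_drift_stays_under_threshold():
    reference = [100, 105, 98, 110]
    current = [101, 104, 99, 108]
    max_drift_percent_expected = 5.0
    avg_drift_percent = average_drift_percent(reference, current)
    # The test passes as long as the observed drift stays under the threshold.
    assert avg_drift_percent < max_drift_percent_expected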

Create a test scenario#

Once the actual tests are in place, the next step is having a scenario execute them. An empty scenario has been started for you, but it’s not yet a test scenario.

  1. From the Jobs menu of the top navigation bar, select Scenarios.

  2. Open the Python Test scenario.

  3. On the Settings tab of the scenario, check the box Mark as a test scenario.

  4. Navigate to the Steps tab.

Dataiku screenshot of the Settings tab of a test scenario.

Configure the Python test step#

Next, add the dedicated step to execute the selected Python tests whenever this scenario runs.

  1. Click Add Step > Execute Python test.

  2. Specify the unit tests to run with a pytest selector. In this case, give the folder containing all your tests, which is python/test. Other common selector forms are shown below.

  3. Select a code environment that includes the pytest library.

  4. Click Run.

Dataiku screenshot of a Python test step in a scenario.
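
The pytest selector can take any standard form. Assuming the field accepts ordinary pytest node IDs, all of the following would be valid ways to scope the tests in this project:

python/test                                                               # every test in the folder
python/test/test_drift_functions.py                                       # a single test module
python/test/test_drift_functions.py::test_drift_integer_option_percent    # a single test function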

Run the Python test scenario#

Let’s demonstrate both success and failure.

  1. Navigate to the Last runs tab to review the scenario’s progress. The run you’ve just triggered should have succeeded.

  2. Return to the python/test/test_drift_functions.py file in the project library.

  3. Make and save any edit to introduce a failed test. For example, invert the check in the test_drift_integer_option_percent function so that it reads:

    assert avg_drift_percent > max_drift_percent_expected
    
  4. Return to the Python Test scenario, and click Run.

  5. View the failure in the Last runs tab.

  6. Click Show log tail or View scenario log to find the cause of the failure.

Dataiku screenshot of a failed Python test scenario step.

Tip

Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.

Test a webapp#

Webapps can also be tested. No reference dataset or self-written tests are required. This type of step checks whether the webapp is able to start and respond to a ping.

Create a test scenario#

As with the Python test, an empty scenario has been started for you, but it’s not yet a test scenario.

  1. From the Jobs menu of the top navigation bar, select Scenarios.

  2. Open the Webapp Test scenario.

  3. On the Settings tab of the scenario, check the box Mark as a test scenario.

  4. Navigate to the Steps tab.

Dataiku screenshot of the Settings tab of a test scenario.

Configure the webapp test step#

Next, add a step to start and ping the webapp whenever this scenario runs.

  1. Click Add Step > Test Webapp.

  2. For Webapp to test, select WebApp1.

  3. Click Run.

Dataiku screenshot of a webapp test step in a scenario.

Run the webapp test scenario#

Let’s demonstrate both success and failure.

  1. Navigate to the Last runs tab to review the scenario’s progress. It should succeed.

  2. Open WebApp1, and go to the Settings tab.

  3. Make and save a change that will cause a failure. For example, in the Python tab, uncomment the line to introduce invalid syntax.

  4. Return to the Webapp Test scenario, and click Run.

  5. View the failure in the Last runs tab.

  6. Click View step log to find the cause of the failure.

Dataiku screenshot of a failed webapp test scenario step.

Tip

Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.

Prepare for deployment to production environments#

Now that you’ve seen how these test steps work, you can review the overall test status before moving a project into production.

View the test dashboard#

A summary of results from scenarios specifically marked as test scenarios can be found on the test dashboard.

  1. From the Jobs menu of the top navigation bar, select Automation Monitoring.

  2. Navigate to the Test Dashboard tab.

  3. View the latest result of every test scenario in the project.

  4. Note how reports can be downloaded in XML or HTML formats.

Dataiku screenshot of a test dashboard on a Design node.

Important

Assuming all tests pass, your next step would be to deploy to a QA infrastructure and run the tests there. However, for the purpose of this demonstration, ensure at least one test is failing before continuing.

Create and publish a bundle#

Once you are confident in your test results, you can follow the standard batch deployment process for creating and publishing a project bundle to the Deployer.

  1. From the More Options menu of the top navigation bar, select Bundles.

  2. Click + New Bundle.

  3. Name it v1.

  4. Add to the bundle the test reference datasets (test_input_unlabelled_customers and test_reference_revenue_loss), plus the initial uploaded files, labelled_customers and unlabelled_customers.

  5. Also add the two saved models from the Flow.

  6. Click Create.

  7. Select the bundle, and click Publish on Deployer. Confirm to finish.

Important

Only publish the bundle to the Deployer. Do not yet deploy the bundle to an infrastructure. We’ll do that next.

See also

You can review the process for batch deployment, including the rationale for adding additional content to the bundle, in Tutorial | Batch deployment.

Leverage test scenarios in deployment pipelines#

Running test scenarios on a Design node is only the beginning. To derive maximum value from test scenarios, you need to incorporate them into a production workflow.

An organization might incorporate test scenarios into the overall framework of its deployment processes in many ways. Rather than outline detailed procedures, the remainder of this tutorial illustrates one possibility, which may inspire you to consider how test scenarios can fit into Dataiku’s existing MLOps ecosystem.

Deploy to a QA infrastructure#

The Deployer allows organizations to structure their deployments into lifecycle stages. By default, these stages are named Dev, Test, and Prod, but organizations may use their own terminology. The key point is that multiple stages allow for additional levels of scrutiny in the deployment lifecycle.

This matters because development and production environments may differ significantly in settings like connections, code environments, plugins, recipe engines, and user permissions.

In our setup, we have two Automation nodes: one for QA and one for production. Accordingly, our first step was to deploy the bundle to the QA infrastructure.

  1. We previously published a bundle from the Design node to the Project Deployer.

  2. From the Project Deployer, we deployed the bundle to a QA Automation node called xops-qa.

  3. During this deployment, a post-deployment hook that runs the test scenarios reported a failure.

Dataiku screenshot of a failed post deployment hook.

Use a post-deployment hook to run test scenarios#

Deployment hooks are custom Python actions executed before or after a deployment to an infrastructure.

In this case, our QA infrastructure has a post-deployment hook, meaning it runs only after the bundle has been deployed. Accordingly, the deployment of the failing bundle was still able to succeed: in this setup, we allow failures on the QA infrastructure; we just want to know about them.

Let’s take a look at where the post-deployment hook warning comes from.

  1. On the Project Deployer, go to Infrastructures.

  2. Select your infrastructure.

  3. Navigate to the Settings tab.

  4. Go to the Deployment hooks panel.

  5. View the Post-Deployment Hooks.

  6. The one of interest in our case is called Running all test scenarios.

Dataiku screenshot of the deployment hooks panel of an infrastructure.

Below is a sample of this kind of hook.

import time

def execute(requesting_user, deployment_id, deployment_report, deployer_client, automation_client, deploying_user, deployed_project_key, **kwargs):
    project = automation_client.get_project(deployed_project_key)
    cnt = 0
    time.sleep(3)
    for scn in project.list_scenarios(as_type="objects"):
        if scn.get_settings().get_raw()['markedAsTest']:
            print(f"Executing test scenario {scn.id}")
            cnt += 1
            run_handler = scn.run_and_wait()
            if run_handler.outcome != "SUCCESS":
                # Report the failure so the Deployer surfaces it on the deployment.
                return HookResult.error(f"Failure when running test scenario {scn.id}")
            else:
                print(" --> Execution successful")

    return HookResult.success(f"All test scenarios have been executed successfully ({cnt} scenarios)")

Run test scenarios on the QA infrastructure#

Now that we have a bundle running on the QA infrastructure, the next step was to execute the test scenarios.

  1. From the QA Automation node, we executed one of the test scenarios.

  2. We checked the results in the test dashboard.

  3. On an Automation node, this test dashboard enables us to switch between bundles of the same project.

  4. Note that you can download XML or HTML versions of the report.

Dataiku screenshot of the test dashboard on an Automation node.

Tip

One possible use of the report available for download is as an input to sign-off processes in the Govern node.

Prevent a deployment with a pre-deployment hook#

Although this bundle has failing tests, we attempted to deploy it to a production infrastructure. This demonstrates how a pre-deployment hook can act as an extra layer of security to prevent an unwanted deployment.

  1. From the Project Deployer, we attempted to deploy the failing bundle to a production infrastructure.

  2. This time, a pre-deployment hook caused the project update to fail. The bundle was not deployed to the production infrastructure.

Dataiku screenshot of a failed deployment message.

If we look at the deployment hooks attached to the production infrastructure, we find a pre-deployment hook called Validate Test Status with the following code:

import dataikuapi

QA_INFRA_ID = "xops-qa"

def execute(requesting_user, deployment_id, deployment_report, deployer_client, automation_client, deploying_user, deployed_project_key, **kwargs):
    print("----------CONTROLLING TEST STATUS-----------------")
    # Fetch deployment details
    pdpl = deployer_client.get_projectdeployer()
    deplyt = pdpl.get_deployment(deployment_id).get_settings()

    # Identify the bundle that is being deployed
    bundle_id = deplyt.get_raw()['bundleId']
    pub_prj = pdpl.get_project(deplyt.get_raw()['publishedProjectKey'])

    # Find data on the corresponding deployment on the QA node
    qa_dpl = pub_prj.get_status().get_deployments(infra_id=QA_INFRA_ID)
    if not qa_dpl:
        return HookResult.error("Cannot find deployment of this project on QA instance, please deploy and test first")

    qa_status = qa_dpl[0].get_testing_status(bundle_id)
    print(f"Found test report for bundle '{bundle_id}' on infra '{QA_INFRA_ID}':\n{qa_status}")

    if qa_status['nbScenariosPerOutcome']['FAILED'] > 0:
        return HookResult.error(f"Deployment not allowed as test status of bundle {bundle_id} is not OK ({qa_status})")

    return HookResult.success("All tests are OK, validating deployment to production")

Fix the failing test scenarios on the Design node#

When faced with failing test scenarios, unless the source of the failure is the production environment itself, it is important to correct the problem in the Design node project and create a new bundle to insert into your deployment pipeline.

  1. On the Design node, fix any problems leading to failures in test scenarios and re-execute them.

  2. Confirm that the test dashboard on the Design node shows no failures.

  3. Create a new bundle.

  4. Deploy the new bundle to the QA infrastructure. The post-deployment hook should report no errors.

  5. Check that the test dashboard on the QA Automation node shows no errors.

  6. Deploy the same bundle to the production infrastructure.

  7. Without a failing test, the deployment to the production infrastructure should pass, with the pre-deployment hook succeeding.

Automate the deployment pipeline#

Once you understand how test scenarios can fit into a deployment pipeline, you can automate all steps as your processes mature.

As a demonstration, the project contains a scenario called Automate Deployment that aims to achieve this objective. When executed, this scenario:

  1. Creates a new project bundle and publishes it to the Deployer.

  2. Updates the deployment on the QA infrastructure (which also runs the post-deployment hook to automatically run all test scenarios, and stop the scenario’s execution if one test fails).

  3. Updates the deployment on the production infrastructure (after running the pre-deployment hook, which ensures all test scenarios have been run successfully on the QA automation node).

Dataiku screenshot of a scenario to update project deployments.
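
The first of these steps can also be scripted with the Dataiku Python API, for example from a custom Python step or an external automation job. Below is a hedged sketch covering only bundle creation and publication; the instance URL, API key, project key, and bundle ID are placeholders, and updating the QA and production deployments would still follow as described above.

import dataikuapi

# Placeholders: point this at your own Design node, API key, and project key.
client = dataikuapi.DSSClient("https://design-node.example.com", "YOUR_API_KEY")
project = client.get_project("TEST_SCENARIOS")

# Create a new bundle on the Design node, then publish it to the Project Deployer.
bundle_id = "v2"
project.export_bundle(bundle_id)
project.publish_bundle(bundle_id)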

Tip

If testing this scenario yourself, be sure to update the deployment IDs to match your project key and infrastructure names.

See also

To test out these scenario steps in a simpler context, see Tutorial | Deployment automation.

Integrate test scenarios with Govern workflows#

Many organizations also use a Govern node as a key part of their overall MLOps strategy. Test scenarios can be incorporated into Govern workflows.

For example, some organizations may find that the reports generated by test scenarios on a QA infrastructure (as shown above) can be a vital input to deployment sign-off processes occurring in the Govern node.

For organizations keeping a human in the loop of their deployment processes, these kinds of test reports can help build trust and awareness as they scale their efforts.

What’s next?#

Congratulations! In this tutorial, you saw how test scenarios can be incorporated into a production workflow within Dataiku’s MLOps ecosystem.

In conclusion, the overall extensibility of these scenarios should not be overlooked. The scenarios shown here focused only on using the test steps. However, any relevant scenario step, including Python steps or custom plugins, can be used in a test scenario.

Test reports can also be accessed through the Python API. See the Developer Guide to get started.
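
As a starting point, the same calls used in the pre-deployment hook above can retrieve a bundle’s test status from any script with access to the Deployer. In this sketch, the instance URL, API key, published project key, infrastructure ID, and bundle ID are placeholders to adapt to your own setup.

import dataikuapi

# Placeholders: point this at your own Deployer node and credentials.
client = dataikuapi.DSSClient("https://deployer-node.example.com", "YOUR_API_KEY")
deployer = client.get_projectdeployer()

# Look up the deployment of the published project on the QA infrastructure.
published_project = deployer.get_project("TEST_SCENARIOS")
deployments = published_project.get_status().get_deployments(infra_id="xops-qa")

if deployments:
    # Fetch the test status of a given bundle, as in the Validate Test Status hook.
    testing_status = deployments[0].get_testing_status("v1")
    print(testing_status["nbScenariosPerOutcome"])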

Finally, it’s important to recognize that although the test steps themselves are simple to use, writing effective tests (such as with Python) or crafting reference datasets for testing the Flow can be a considerable investment of time. Accordingly, test scenarios may yield the greatest benefits to projects that have reached a certain level of stability.

See also

For more information, including best practices for testing, see the reference documentation on Testing a project.