Tutorial | Test scenarios#
Get started#
Organizations with AI ambitions know that data projects must not be treated as experiments, but rather as IT-critical platforms. To get there, these organizations must find ways to scale operations safely, moving from development sandboxes to properly monitored and regulated production environments.
Test scenarios are one piece in Dataiku’s broader MLOps ecosystem aimed at helping organizations meet this objective.
Objectives#
In this tutorial, you will:
Create test scenarios to verify a Dataiku project is ready for production.
Execute these test scenarios on a dedicated quality assurance (QA) infrastructure.
Publish dashboards and reports on the results of these test scenarios.
Use the results of these tests to control a project’s movement through the stages of a deployment pipeline.
Prerequisites#
Dataiku 13.3 or later.
You only need a Design node to create the test scenarios shown here. However, to walk through the deployment stages, you’ll need access to at least one, and preferably two, Automation nodes, connected to your Design node.
If you want to use Python test scenarios, you’ll need a code environment that includes pytest.
Create the project#
From the Dataiku Design homepage, click + New Project.
Select Learning projects.
Search for and select Test Scenarios.
Click Install.
From the project homepage, click Go to Flow (or g + f).
Note
You can also download the starter project from this website and import it as a zip file.
Use case summary#
The Flow in this project is about predicting churn for a fictitious telecom company. However, the actual Flow is not important. You can use test scenarios for any kind of Dataiku project that you want to advance into production.
Notably, this Flow happens to include saved models, but test scenarios do not have specific capabilities for testing models. Dataiku already offers many components dedicated to evaluating and testing models, from simple AutoML results to code-based experiment tracking, packaging a model with diagnostics and assertions, and model evaluation stores.
Rather than testing the model and its metrics, model testing, in this context, means treating the Score and/or Evaluate recipes as any other kind of data transformation and verifying that the output of such recipes matches what is expected.
Design test scenarios#
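Scenarios offer three types of steps for testing various aspects of a Dataiku project:
Run integration test, to check that the Flow still produces the expected outputs.
Execute Python test, to run pytest unit tests stored in the project library.
Test webapp, to check that a webapp starts and responds to a ping.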
Tip
You can review these three sections in any order depending on your needs.
Test a Flow#
Any change in a recipe can cause an unexpected change somewhere downstream in the Flow. To ensure that the Flow is still performing as expected before moving it to a production environment, Dataiku provides an integration test step in a scenario.
Craft test datasets#
Ideally, this kind of testing requires crafting reference datasets that cover as many relevant processing cases as possible. This can be a time-consuming process, but it ensures the relevance of the result. Without such reference datasets, test scenarios may provide a false sense of security.
Navigate to the Flow to inspect the datasets in the Test reference inputs & outputs Flow zone.
Observe the relationship between the following Flow datasets and their test references:
Flow dataset | Test reference |
---|---|
unlabelled_customers | test_input_unlabelled_customers |
revenue_loss | test_reference_revenue_loss |
Important
In this case, we have taken small samples as reference datasets. This may be a good way to get started. However, as you progress, the best practice is to specifically craft these datasets.
Create a test scenario#
The project already includes a scenario that we can use for a Flow integration test. However, it hasn’t yet been designated as a test scenario, and so won’t appear in the project’s test dashboard (which we’ll see later). Let’s do that now.
From the Jobs menu of the top navigation bar, select Scenarios.
Open the scenario Flow Test.
On the Settings tab of the scenario, check the box Mark as a test scenario.
Navigate to the Steps tab.
Configure the integration test step#
The integration test step has three key parts to understand:
To be configured | Purpose |
---|---|
Reference input(s) | The crafted test dataset(s) that will be used instead of the current input in the Flow. |
Build action(s) | The Dataiku item(s) in the Flow that the scenario will build (as in an ordinary scenario). |
Reference output(s) | Output dataset(s) to validate the results. Using the new reference input, the scenario will run the requested build, creating a new output dataset. It will then check if this new output matches the reference output. |
To see this in action, let’s choose an input dataset at the start of the pipeline and an output dataset at the end.
Click Add Step > Run integration test.
Under Reference inputs, click + Add Remapping.
For the current input, select unlabeled_customers.
For the reference input, select test_input_unlabelled_customers.
Under Builds, click + Add Item.
Select the dataset revenue_loss.
Click Add Item.
Under Results validation, click + Add Remapping.
For the current output, select revenue_loss.
For the reference output, select the dataset test_reference_revenue_loss.
Click Save, but don’t yet run it.
Tip
In this case, we’ve chosen to perform a content comparison across all columns. Depending on the nature of the tests, and therefore the shape of the corresponding reference datasets, you may only want to test a selection of columns.
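When only some columns are meaningful to compare (for example, when an output contains a timestamp that changes on every build), the comparison can be restricted to those columns. The following plain-Python sketch illustrates the idea with hypothetical data; compare_on_columns is an illustrative helper, not a Dataiku API, and it assumes both datasets are lists of row dictionaries already in the same row order, which may not reflect how Dataiku aligns rows.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def compare_on_columns(actual: List[Row], reference: List[Row],
                       columns: Optional[Sequence[str]] = None) -> List[str]:
    """List mismatches between two datasets; an empty list means they match.

    Assumes rows are already aligned (same order). If columns is None,
    every column present in either row is compared.
    """
    if len(actual) != len(reference):
        return [f"row count differs: {len(actual)} != {len(reference)}"]
    mismatches: List[str] = []
    for i, (a, r) in enumerate(zip(actual, reference)):
        cols = columns if columns is not None else sorted(set(a) | set(r))
        for col in cols:
            if a.get(col) != r.get(col):
                mismatches.append(f"row {i}, column {col!r}: {a.get(col)!r} != {r.get(col)!r}")
    return mismatches


# Hypothetical data: "scored_at" legitimately changes on every build.
actual = [{"id": 1, "loss": 50.0, "scored_at": "2024-06-01"}]
reference = [{"id": 1, "loss": 50.0, "scored_at": "2024-01-15"}]

print(compare_on_columns(actual, reference))                  # all columns: one mismatch
print(compare_on_columns(actual, reference, ["id", "loss"]))  # selected columns: []
```

Restricting the comparison keeps the test focused on the values the recipe is supposed to control, at the cost of no longer noticing changes in the excluded columns.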
Run the integration test scenario#
Before running the scenario, let’s introduce an arbitrary Flow change to show how the test would detect it.
Go to the Flow (g + f).
Open the Prepare recipe that outputs the revenue_loss dataset.
Make and save a change to the Formula step to create an output that the reference dataset won’t account for, such as:
prediction * ( Total_Charge - (proba_1 * Total_Charge))
Return to the Flow Test scenario, and click Run.
Go to the Last runs tab, and see the failure.
Click View step log to see the reported problem.
Let’s review the scenario’s activities in detail:
The scenario swapped an upstream input (unlabeled_customers) with a new test input (test_input_unlabelled_customers).
It built an output (revenue_loss) using the new test input.
It compared the new output (revenue_loss) to the provided reference output (test_reference_revenue_loss).
Due to the change in the Prepare recipe, this comparison failed.
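To make these activities concrete, here is a stdlib-only sketch of the logic an integration test step follows: remap inputs, rebuild, compare. It is not Dataiku's implementation. Datasets are modelled as lists of row dictionaries, and the toy recipes prepare_original and prepare_edited (including their formulas) are hypothetical stand-ins for the Prepare recipe.

```python
from typing import Any, Callable, Dict, List

# Each dataset is modelled as a list of rows; each row is a dict.
Datasets = Dict[str, List[Dict[str, Any]]]


def run_integration_test(
    flow_inputs: Datasets,
    build: Callable[[Datasets], Datasets],
    reference_inputs: Datasets,
    reference_outputs: Datasets,
) -> Dict[str, bool]:
    """Swap inputs, rebuild, and compare each output with its reference."""
    # 1. Reference inputs: the crafted test datasets replace the Flow inputs.
    inputs = {**flow_inputs, **reference_inputs}
    # 2. Build actions: recompute the downstream datasets from those inputs.
    outputs = build(inputs)
    # 3. Results validation: each requested output must equal its reference.
    return {name: outputs.get(name) == expected
            for name, expected in reference_outputs.items()}


def prepare_original(inputs: Datasets) -> Datasets:
    """Hypothetical stand-in for the Prepare recipe."""
    rows = inputs["unlabelled_customers"]
    return {"revenue_loss": [{"id": r["id"], "loss": r["prediction"] * r["charge"]}
                             for r in rows]}


def prepare_edited(inputs: Datasets) -> Datasets:
    """The same recipe after an arbitrary formula change."""
    rows = inputs["unlabelled_customers"]
    return {"revenue_loss": [{"id": r["id"], "loss": r["prediction"] * r["charge"] / 2}
                             for r in rows]}


flow_inputs = {"unlabelled_customers": [{"id": 99, "prediction": 1, "charge": 10.0}]}
test_input = {"unlabelled_customers": [{"id": 1, "prediction": 1, "charge": 50.0},
                                       {"id": 2, "prediction": 0, "charge": 80.0}]}
test_reference = {"revenue_loss": [{"id": 1, "loss": 50.0}, {"id": 2, "loss": 0.0}]}

print(run_integration_test(flow_inputs, prepare_original, test_input, test_reference))
print(run_integration_test(flow_inputs, prepare_edited, test_input, test_reference))
```

Because the remapped input replaces the Flow input entirely, the "production" row (id 99) never reaches the build; only the crafted test rows do, which is what makes the reference output predictable.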
Tip
Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.
Test a Python library#
Python users will be familiar with the pytest testing framework. You can use the same framework for unit tests in Dataiku.
Store tests in a project library#
The first step is writing Python unit tests according to the pytest framework and making them accessible from the project’s code library. This step has already been done for you.
From the Code menu of the top navigation bar, select Libraries (or g + l).
Explore the sample code in the python/ folder.
Tip
See the Developer Guide on Project libraries to get started working with code in this way.
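For readers new to pytest, the following is a hypothetical example of the kind of test such a library might contain. compute_avg_drift_percent and test_drift_within_tolerance are illustrative names, not the code shipped in the project's python/ folder. What matters is the convention: pytest discovers test_* functions in test_*.py files and reports a failing assert as a failed test.

```python
from typing import Sequence


def compute_avg_drift_percent(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Mean absolute relative change between paired values, in percent.

    Baseline values must be non-zero.
    """
    if not baseline or len(baseline) != len(current):
        raise ValueError("inputs must be non-empty and of equal length")
    changes = [abs(c - b) / abs(b) * 100 for b, c in zip(baseline, current)]
    return sum(changes) / len(changes)


# pytest collects functions named test_* from files named test_*.py;
# a plain assert is all a test needs.
def test_drift_within_tolerance() -> None:
    baseline = [100, 200, 400]
    current = [102, 198, 404]
    max_drift_percent_expected = 2.0
    avg_drift_percent = compute_avg_drift_percent(baseline, current)
    assert avg_drift_percent <= max_drift_percent_expected
```

Inverting the final comparison, as this tutorial later does with test_drift_integer_option_percent, is enough to turn such a test into a failing one.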
Create a test scenario#
Once the actual tests are in place, the next step is having a scenario execute them. An empty scenario has been started for you, but it’s not yet a test scenario.
From the Jobs menu of the top navigation bar, select Scenarios.
Open the Python Test scenario.
On the Settings tab of the scenario, check the box Mark as a test scenario.
Navigate to the Steps tab.
Configure the Python test step#
Next, add the dedicated step to execute the selected Python tests whenever this scenario executes.
Click Add Step > Execute Python test.
Specify the unit tests to run with a pytest selector. In this case, give the folder containing all your tests: python/test.
Select a code environment that includes the pytest library.
Click Run.
Run the Python test scenario#
Let’s demonstrate both success and failure.
Navigate to the Last runs tab to review the scenario’s progress. The run you’ve just triggered should have succeeded.
Return to the python/test/test_drift_functions.py file in the project library.
Make and save any edit to introduce a failed test. For example, invert the check in the test_drift_integer_option_percent function so that it reads: assert avg_drift_percent > max_drift_percent_expected
Return to the Python Test scenario, and click Run.
View the failure in the Last runs tab.
Click Show log tail or View scenario log to find the cause of the failure.
Tip
Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.
Test a webapp#
Webapps can also be tested. No reference dataset or self-written tests are required. This type of step checks whether the webapp is able to start and respond to a ping.
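This tutorial does not describe how Dataiku implements the ping internally. As a generic, hypothetical illustration of what "respond to a ping" can mean, the sketch below sends an HTTP GET and treats any 2xx answer as healthy. The ping helper and the local stand-in server are assumptions for demonstration only, not Dataiku APIs.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def ping(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


class _OkHandler(BaseHTTPRequestHandler):
    """Stand-in for a running webapp backend: answers every GET with 200."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args) -> None:  # keep the demo output quiet
        pass


def demo() -> Tuple[bool, bool]:
    """Ping a local stand-in server while it runs, then again after it stops."""
    server = HTTPServer(("127.0.0.1", 0), _OkHandler)  # port 0: pick a free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{port}/"
    while_up = ping(url)
    server.shutdown()
    server.server_close()
    after_stop = ping(url, timeout=1.0)
    return while_up, after_stop


print(demo())
```

A check like this only proves that the backend starts and answers; it says nothing about whether the webapp's logic is correct, which is why it complements rather than replaces the other test steps.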
Create a test scenario#
As with the Python test, an empty scenario has been started for you, but it’s not yet a test scenario.
From the Jobs menu of the top navigation bar, select Scenarios.
Open the Webapp Test scenario.
On the Settings tab of the scenario, check the box Mark as a test scenario.
Navigate to the Steps tab.
Configure the webapp test step#
Next, add a step to start and ping the webapp whenever this scenario executes.
Click Add Step > Test Webapp.
For Webapp to test, select WebApp1.
Click Run.
Run the webapp test scenario#
Let’s demonstrate both success and failure.
Navigate to the Last runs tab to review the scenario’s progress. It should succeed.
Open WebApp1, and go to the Settings tab.
Make and save a change that will cause a failure. For example, in the Python tab, uncomment the line that introduces invalid syntax.
Return to the Webapp Test scenario, and click Run.
View the failure in the Last runs tab.
Click View step log to find the cause of the failure.
Tip
Feel free to fix the cause of the failing test scenario and confirm it succeeds. Note though that having at least one failing test scenario will be useful before moving to the deployment stage of the tutorial.
Prepare for deployment to production environments#
Now that you’ve seen how these test steps work, you can review the overall test status before moving a project into production.
View the test dashboard#
A summary of results from scenarios specifically marked as test scenarios can be found on the test dashboard.
From the Jobs menu of the top navigation bar, select Automation Monitoring.
Navigate to the Test Dashboard tab.
View the latest result of every test scenario in the project.
Note that reports can be downloaded in XML or HTML format.
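A downloaded XML report can feed other tools, such as a CI job. The sketch below assumes the report follows the common JUnit-style layout (testcase elements with optional failure, error, or skipped children); that layout, the SAMPLE_REPORT content, and the summarize_junit helper are assumptions, so check them against an actual exported report before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical report content, assuming a JUnit-style layout
SAMPLE_REPORT = """<testsuites>
  <testsuite name="TEST_SCENARIOS">
    <testcase name="Flow Test"><failure message="Output mismatch"/></testcase>
    <testcase name="Python Test"/>
    <testcase name="Webapp Test"/>
  </testsuite>
</testsuites>"""


def summarize_junit(xml_text: str) -> Dict[str, List[str]]:
    """Group test case names by result: passed, failed, or skipped."""
    root = ET.fromstring(xml_text)
    summary: Dict[str, List[str]] = {"passed": [], "failed": [], "skipped": []}
    for case in root.iter("testcase"):
        name = case.get("name", "<unnamed>")
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"].append(name)
        elif case.find("skipped") is not None:
            summary["skipped"].append(name)
        else:
            summary["passed"].append(name)
    return summary


summary = summarize_junit(SAMPLE_REPORT)
print(summary)
```

A CI job could, for example, exit with a non-zero status whenever summary["failed"] is not empty.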
Important
Assuming all tests pass, your next step would be to deploy to a QA infrastructure and run them. However, for the purpose of this demonstration, ensure at least one test is failing before continuing.
Create and publish a bundle#
Once you are confident in your test results, you can follow the standard batch deployment process for creating and publishing a project bundle to the Deployer.
From the More Options menu of the top navigation bar, select Bundles.
Click + New Bundle.
Name it v1.
Add to the bundle the test reference datasets (test_input_unlabelled_customers and test_reference_revenue_loss), plus the initial uploaded files, labelled_customers and unlabelled_customers.
Also add the two saved models from the Flow.
Click Create.
Select the bundle, and click Publish on Deployer. Confirm to finish.
Important
Only publish the bundle to the Deployer. Do not yet deploy the bundle to an infrastructure. We’ll do that next.
See also
You can review the process for batch deployment, including the rationale for adding additional content to the bundle, in Tutorial | Batch deployment.
Leverage test scenarios in deployment pipelines#
Running test scenarios on a Design node is only the beginning. To get the most value from test scenarios, you need to incorporate them into a production workflow.
An organization might incorporate test scenarios into its deployment processes in many ways. Rather than outline detailed procedures, for the remainder of this tutorial, we’ll illustrate one possibility, which may inspire you to consider how test scenarios can fit into Dataiku’s existing MLOps ecosystem.
Deploy to a QA infrastructure#
The Deployer allows organizations to structure their deployments into lifecycle stages. By default, these stages may be named Dev, Test, and Prod, though organizations may use their own terminology. The key point is that multiple stages allow for additional levels of scrutiny in the deployment lifecycle.
This is important as development and production environments may have important differences when it comes to settings like connections, code environments, plugins, recipe engines, and user permissions.
In our setup, we have two Automation nodes: one for QA and one for production. Accordingly, our first step was to deploy the bundle to the QA infrastructure.
We previously published a bundle from the Design node to the Project Deployer.
From the Project Deployer, we deployed the bundle to a QA Automation node called xops-qa.
During this deployment, a post-deployment hook that runs the test scenarios failed.
Use a post-deployment hook to run test scenarios#
Deployment hooks are custom Python actions executed before or after a deployment to an infrastructure.
In this case, our QA infrastructure has a post-deployment hook. Because this hook runs only after the bundle has been deployed, the deployment of the failing bundle still succeeded, and the hook's failure was reported as a warning. In this setup, we allow failures on the QA infrastructure; we just want to know about them.
Let’s take a look at where the post-deployment hook warning comes from.
On the Project Deployer, go to Infrastructures.
Select your infrastructure.
Navigate to the Settings tab.
Go to the Deployment hooks panel.
View the Post-Deployment Hooks.
The one of interest in our case is called Running all test scenarios.
Below is a sample of this kind of hook.
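```python
import time


def execute(requesting_user, deployment_id, deployment_report, deployer_client,
            automation_client, deploying_user, deployed_project_key, **kwargs):
    # Handle on the project that was just deployed to the Automation node
    project = automation_client.get_project(deployed_project_key)
    cnt = 0
    # Short pause before running scenarios, presumably to let the new deployment settle
    time.sleep(3)
    for scn in project.list_scenarios(as_type="objects"):
        # Only run scenarios flagged with "Mark as a test scenario"
        if scn.get_settings().get_raw().get('markedAsTest', False):
            print(f"Executing test scenario {scn.id}")
            cnt += 1
            # no_fail=True returns the run instead of raising, so its outcome can be checked
            run_handler = scn.run_and_wait(no_fail=True)
            if run_handler.outcome != "SUCCESS":
                # Stop at the first failing test scenario and report it
                return HookResult.error(f"Failure when running test scenario {scn.id}")
            print(" --> Execution successful")
    # HookResult is not imported: the hook environment is expected to provide it
    return HookResult.success(f"All test scenarios have been executed successfully ({cnt} scenarios)")
```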
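The sample hook stops at the first failing scenario. If you would rather run every test scenario and report all failures at once, the decision logic could be factored out as below. evaluate_outcomes is a hypothetical helper, not part of the Dataiku API: in a hook, you would collect outcomes from each run_and_wait() result and map the returned flag to HookResult.success or HookResult.error.

```python
from typing import Dict, List, Tuple


def evaluate_outcomes(outcomes: Dict[str, str]) -> Tuple[bool, str]:
    """Given {scenario_id: outcome}, decide whether every test scenario passed."""
    if not outcomes:
        # Design choice: finding no test scenarios at all is treated as a failure
        return False, "No test scenarios found"
    # Anything other than SUCCESS counts as a failure, as in the sample hook
    failed: List[str] = sorted(sid for sid, outcome in outcomes.items()
                               if outcome != "SUCCESS")
    if failed:
        return False, f"{len(failed)} of {len(outcomes)} test scenarios failed: {', '.join(failed)}"
    return True, f"All test scenarios have been executed successfully ({len(outcomes)} scenarios)"


print(evaluate_outcomes({"FLOW_TEST": "FAILED", "PYTHON_TEST": "SUCCESS", "WEBAPP_TEST": "FAILED"}))
```

Treating an empty set of test scenarios as a failure differs from the sample hook, which reports success with 0 scenarios; choose whichever matches your policy.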
Run test scenarios on the QA infrastructure#
Now that we have a bundle running on the QA infrastructure, the next step was to execute the test scenarios.
From the QA Automation node, we executed one of the test scenarios.
We checked the results in the test dashboard.
On an Automation node, this test dashboard enables us to switch between bundles of the same project.
Note that you can download XML or HTML versions of the report.
Tip
One possible use of the report available for download is as an input to sign-off processes in the Govern node.
Prevent a deployment with a pre-deployment hook#
Although this bundle has failing tests, we attempted to deploy it to a production infrastructure. This demonstrates how a pre-deployment hook can act as an extra layer of security to prevent an unwanted deployment.
From the Project Deployer, we attempted to deploy the failing bundle to a production infrastructure.
This time, a pre-deployment hook caused the project update to fail. The bundle was not deployed to the production infrastructure.
If we look at the deployment hooks attached to the production infrastructure, we find a pre-deployment hook called Validate Test Status with the following code:
```python
import dataikuapi

# ID of the QA infrastructure on which the bundle must have been tested first
QA_INFRA_ID = "xops-qa"


def execute(requesting_user, deployment_id, deployment_report, deployer_client,
            automation_client, deploying_user, deployed_project_key, **kwargs):
    print("----------CONTROLLING TEST STATUS-----------------")
    # Fetch the settings of the deployment being updated
    pdpl = deployer_client.get_projectdeployer()
    deplyt = pdpl.get_deployment(deployment_id).get_settings()
    # Identify the bundle being deployed and the published project it belongs to
    bundle_id = deplyt.get_raw()['bundleId']
    pub_prj = pdpl.get_project(deplyt.get_raw()['publishedProjectKey'])
    # Find the deployment of the same project on the QA infrastructure
    qa_dpl = pub_prj.get_status().get_deployments(infra_id=QA_INFRA_ID)
    if not qa_dpl:
        return HookResult.error("Cannot find deployment of this project on QA instance, please deploy and test first")
    # Test results recorded on QA for this exact bundle
    qa_status = qa_dpl[0].get_testing_status(bundle_id)
    print(f"Found test report for bundle '{bundle_id}' on infra '{QA_INFRA_ID}':\n{qa_status}")
    # A missing FAILED entry is treated as zero failures
    if qa_status['nbScenariosPerOutcome'].get('FAILED', 0) > 0:
        return HookResult.error(f"Deployment not allowed as test status of bundle {bundle_id} is not OK ({qa_status})")
    return HookResult.success("All tests are OK, validating deployment to production")
```
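The sample hook blocks only on FAILED results. A stricter gate might also refuse a bundle with no recorded test results, or treat other outcomes as blocking. The sketch below assumes, as the hook's indexing suggests, that nbScenariosPerOutcome maps outcome names to counts; the gate_on_test_status helper and the choice of blocking outcomes (FAILED and ABORTED) are hypothetical and meant to be adapted.

```python
from typing import Dict, Iterable, Tuple

# Assumption: the outcomes you consider blocking for a production deployment
BLOCKING_OUTCOMES = ("FAILED", "ABORTED")


def gate_on_test_status(nb_per_outcome: Dict[str, int],
                        blocking: Iterable[str] = BLOCKING_OUTCOMES) -> Tuple[bool, str]:
    """Decide whether a bundle may go to production from per-outcome counts."""
    if sum(nb_per_outcome.values()) == 0:
        # No results at all: the bundle was never tested on QA
        return False, "No test scenario results found for this bundle"
    bad = {o: nb_per_outcome.get(o, 0) for o in blocking if nb_per_outcome.get(o, 0) > 0}
    if bad:
        return False, f"Blocking outcomes found: {bad}"
    return True, "All tests are OK"


print(gate_on_test_status({"SUCCESS": 2, "FAILED": 1}))
```

In a hook, the returned flag would select between HookResult.success and HookResult.error, exactly as the sample does.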
Fix the failing test scenarios on the Design node#
Unless the source of the failing tests is the production environment itself, when faced with failing test scenarios, it is important to correct them in the Design node project and create a new bundle to insert into your deployment pipeline.
On the Design node, fix any problems leading to failures in test scenarios and re-execute them.
Confirm that the test dashboard on the Design node shows no failures.
Create a new bundle.
Deploy the new bundle to the QA infrastructure. The post-deployment hook should report no errors.
Check that the test dashboard on the QA Automation node shows no errors.
Deploy the same bundle to the production infrastructure.
Without a failing test, the deployment to the production infrastructure should pass, with the pre-deployment hook succeeding.
Automate the deployment pipeline#
Once you understand how test scenarios can fit into a deployment pipeline, you can automate all steps as your processes mature.
As a demonstration, the project contains a scenario called Automate Deployment that aims to achieve this objective. When executed, this scenario:
Creates a new project bundle and publishes it to the Deployer.
Updates the deployment on the QA infrastructure (which also runs the post-deployment hook to automatically run all test scenarios, and stop the scenario’s execution if one test fails).
Updates the deployment on the production infrastructure (after running the pre-deployment hook, which ensures all test scenarios have run successfully on the QA Automation node).
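The key property of this scenario is ordering with a fail-fast stop: production is touched only if every earlier stage succeeded. The plain-Python sketch below models only that control flow; run_pipeline, the stage names, and the lambda bodies are hypothetical placeholders, not calls to the Dataiku API.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails."""
    log: List[str] = []
    for name, action in stages:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages (such as production) are never attempted
    return log


# Hypothetical stages: here the QA tests fail, so production is skipped.
stages: List[Stage] = [
    ("create and publish bundle", lambda: True),
    ("update QA deployment", lambda: False),
    ("update production deployment", lambda: True),
]
for line in run_pipeline(stages):
    print(line)
```

In the real scenario, the pre-deployment hook on production adds a second safeguard, so even a misordered pipeline could not push an untested bundle.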
Tip
If testing this scenario yourself, be sure to update the deployment IDs to match your project key and infrastructure names.
See also
To test out these scenario steps in a simpler context, see Tutorial | Deployment automation.
Integrate test scenarios with Govern workflows#
Many organizations also use a Govern node as a key part of their overall MLOps strategy. Test scenarios can be incorporated into Govern workflows.
For example, some organizations may find that the reports generated by test scenarios on a QA infrastructure (as shown above) can be a vital input to deployment sign-off processes occurring in the Govern node.
For an organization keeping a human in the loop of their deployment processes, these kinds of test reports can help build trust and awareness as they try to scale their efforts.
What’s next?#
Congratulations! In this tutorial, you saw how test scenarios can be incorporated into a production workflow within Dataiku’s MLOps ecosystem.
In conclusion, the overall extensibility of these scenarios should not be overlooked. The scenarios shown here focused only on using the test steps. However, any relevant scenario step, including Python steps or custom plugins, can be used in a test scenario.
Test reports can also be accessed through the Python API. See the Developer Guide to get started.
Finally, it’s important to recognize that although the test steps themselves are simple to use, writing effective tests (such as with Python) or crafting reference datasets for testing the Flow can be a considerable investment of time. Accordingly, test scenarios may yield the greatest benefits to projects that have reached a certain level of stability.
See also
For more information, including best practices for testing, see the reference documentation on Testing a project.