Tutorial | MLOps introduction & prerequisites (MLOps part 0)¶
Welcome! Imagine that you’ve successfully built a data pipeline that includes a model. What comes next? Throughout this series of MLOps tutorials, you’ll consider questions like:
How do I deploy my project and models to a production environment?
How do I monitor and evaluate the models in my project while in production?
How do I automate these processes?
Objectives¶
To answer these questions, you’ll gain experience using Dataiku features, such as:
the Evaluate recipe, model evaluation store, and drift analysis
the Project Deployer and the Automation node
the API Deployer and API nodes
the Event server for fetching logs from an API node
Tip
The content in this series of tutorials can also be found in the MLOps Practitioner learning path, alongside concept lessons and quizzes. Register there to track your progress and verify your knowledge. You can also use those courses to prepare for the MLOps Practitioner certification.
Use case¶
You’ll work with a simple credit card fraud use case. Using data about transactions, merchants, and cardholders, the project’s Flow includes a model that predicts which transactions should be authorized and which are potentially fraudulent.
A value of 1 for the target variable, authorized_flag, represents an authorized transaction.
A value of 0, on the other hand, represents a transaction that failed authorization.
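To make the label convention concrete, here is a minimal sketch of how scored transactions might look in a pandas DataFrame. This is purely illustrative: the column names other than authorized_flag, and the sample values, are assumptions, not the tutorial dataset itself.

```python
import pandas as pd

# Hypothetical sample of scored transactions; only authorized_flag
# mirrors the tutorial's target variable convention.
transactions = pd.DataFrame({
    "transaction_id": [101, 102, 103, 104],
    "authorized_flag": [1, 1, 0, 1],  # 1 = authorized, 0 = failed authorization
})

# Share of transactions that failed authorization (potentially fraudulent)
fraud_rate = (transactions["authorized_flag"] == 0).mean()
print(f"Potentially fraudulent share: {fraud_rate:.0%}")
```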
Putting this model “into production” can enable two different styles of use cases commonly found in machine learning workflows:
batch scoring (think of bank employees creating monthly fraud reports)
real-time scoring (think of a bank’s internal computer system authorizing each transaction as it happens)
This series of tutorials will demonstrate how to implement both of these scoring methods with Dataiku.
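The difference between the two scoring styles can be sketched in plain Python. Note that this is a conceptual illustration only: the stand-in `predict` function and its amount-based rule are invented for the example, whereas in the tutorials the actual scoring is handled by Dataiku (a Score recipe for batch, an API endpoint for real time).

```python
# Stand-in model: flags large transactions as potentially fraudulent.
# In the tutorials, scoring is done by Dataiku, not hand-written code.
def predict(transaction: dict) -> int:
    """Return 1 (authorize) or 0 (decline) for one transaction."""
    return 0 if transaction["amount"] > 1000 else 1

# Batch scoring: score an entire dataset at once,
# e.g. for a monthly fraud report.
monthly_transactions = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": 4200},
]
report = [{**t, "authorized_flag": predict(t)} for t in monthly_transactions]

# Real-time scoring: score one transaction as it happens,
# e.g. inside a bank's authorization system.
incoming = {"id": 3, "amount": 80}
decision = predict(incoming)
```

Both styles call the same model; what differs is when and how it is invoked, which is exactly what the deployment tutorials in this series address.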
Tip
This use case is just an example to practice monitoring and deploying MLOps projects into production. Rather than thinking about the data here, consider how you’d apply the same techniques and Dataiku features to solve problems that matter to you!
Technical prerequisites¶
Below is a list of the technical prerequisites needed to fully reproduce all of the tutorials in the MLOps Practitioner learning path. Specific tutorials may require only a subset of these prerequisites, as noted at the start of each tutorial.
A business or enterprise license for Dataiku (version 11 or above).
Warning
The free edition and free trials of Dataiku Cloud are not fully compatible with these tutorials.
The starter project for the MLOps tutorials was built on an instance where the built-in Python environment is Python 3.6. (The default Python environment for Dataiku Cloud is 3.6.) If the built-in environment for your instance is Python 2.7, you will need your instance administrator to upgrade the instance to Python 3.6.
The Reverse Geocoding plugin (version 2.1 or above) installed on your Dataiku instance. (This plugin is installed by default on Dataiku Cloud).
A Design node connected to an Automation node and an API node, along with the corresponding deployment infrastructure.
Note
A user with administrative privileges on your Dataiku instance can follow the reference documentation for setting up the Deployer and API Deployer.
Dataiku Cloud users can consult additional documentation for setting up an Automation node and API node from the Launchpad.
A user profile belonging to a group with access to a deployment infrastructure.
The last tutorial in the series requires the installation and configuration of the Event server. See the API endpoint monitoring tutorial for more details.
In addition, for the batch deployment tutorials, you’ll also need:
The ability to create a project bundle. On Dataiku 12.1 and above, this requires the Write project content permission on the TUT_MLOPS project used throughout the learning path; on instances prior to 12.1, it requires the project admin permission.
Optional: A mail reporter configured to send emails via a scenario on both the Design and Automation nodes. Dataiku Cloud users can add a mail reporter from the Launchpad by clicking Extensions > Add an Extension > Email channel, and then enabling this feature on both their Design and Automation nodes.
Section Preview¶
Tutorial | Update a project deployment automatically (MLOps part 3)
Tutorial | Create an API endpoint and test queries (MLOps part 4)
Tutorial | Add an enrichment to a prediction endpoint (MLOps part 6)
Tutorial | Add a dataset lookup endpoint to an API service (MLOps part 7)
Tutorial | Update an API deployment automatically (MLOps part 8)
Tutorial | Manage multiple versions of an API service (MLOps part 9)