Concept | Real-time APIs#

In this lesson, we’ll introduce the basics of real-time APIs and how they work in Dataiku.

Suppose we have a project with a prediction model. We’ve built the model in a Design node using Dataiku’s visual ML tool, covering data cleansing, feature preprocessing, and model training.

We may want to deploy this model from the Design node into production so that it can be used to score data in real time.

In a credit card fraud use case, for example, as the model receives incoming credit card transactions, it immediately tells us whether each transaction is fraudulent or authorized.

Image showing real-time scoring for incoming transactions.

Dataiku lets us implement this kind of real-time scoring strategy by exposing our prediction model as an API endpoint. This endpoint, possibly alongside other endpoints (which can perform tasks beyond scoring), is exposed within an API service, which is then deployed to the API node.

Image showing deployment for real-time scoring.
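For example, once such an API service is deployed, a client application can score a transaction by posting its features to the prediction endpoint over HTTP. The sketch below uses Python’s requests library; the host, service ID, endpoint ID, and feature names are hypothetical, and the exact URL pattern and payload shape can vary with your Dataiku version and setup, so check the reference documentation for your deployment.

```python
import requests

# Hypothetical API node host, service ID, and endpoint ID -- replace with your own.
API_NODE_URL = "https://apinode.example.com:12000"
SERVICE_ID = "fraud_detection"
ENDPOINT_ID = "predict_fraud"

# A typical API node prediction URL follows this pattern (verify the exact form
# for your own deployment).
url = f"{API_NODE_URL}/public/api/v1/{SERVICE_ID}/{ENDPOINT_ID}/predict"

# One incoming credit card transaction, sent as the endpoint's input features.
payload = {
    "features": {
        "amount": 129.99,
        "merchant_category": "electronics",
        "card_country": "US",
        "purchase_country": "FR",
    }
}

response = requests.post(url, json=payload, timeout=5)
response.raise_for_status()

# The response carries the model's prediction (e.g. fraudulent vs. authorized).
print(response.json())
```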

To compare real-time processing with batch processing, recall that batch processing works on the available records together (in batches), at specified times (such as daily or weekly), to return a batch of results. You implement batch deployment in Dataiku by deploying your project bundle to the Automation node. For real-time processing, in contrast, you deploy an API service to the API node.
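To make the contrast concrete, here is a minimal conceptual sketch in plain Python (not Dataiku-specific code; the function and variable names are illustrative): batch scoring processes all available records on a schedule, while real-time scoring answers one request at a time.

```python
# Conceptual contrast only -- not Dataiku-specific code.

def batch_score(model, transactions):
    # Batch processing: runs at a scheduled time (e.g. nightly, on the
    # Automation node) over all available records, returning a batch of results.
    return [model.predict(t) for t in transactions]

def score_request(model, transaction):
    # Real-time processing: runs on demand (on the API node), scoring a single
    # incoming record and returning the answer immediately.
    return {"prediction": model.predict(transaction)}
```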

Note

For review, return to the article on How the Dataiku Architecture Supports MLOps.

For the rest of the lesson, we’ll focus on real-time APIs and how they work in Dataiku.

Key terminology#

When deploying to the API node in Dataiku, there are some important terms to know.

API terminology#

API (Application Programming Interface): A software intermediary that allows two applications to talk to each other and exchange data over a network. For example, one can use a Google API to get weather data from a Google server.

API endpoint: A single path on the API, that is, a URL to which HTTP requests are posted and from which a response is expected. It is the place where two applications interact. Each endpoint fulfills a single function, e.g. performing a dataset lookup.

API service: The unit of management and deployment for the API node. One API service can host several endpoints.

API Designer: The interface for creating, designing, and developing API services. An API Designer is available in each Dataiku project, whether the project is on the Design or Automation node.

API Deployer: One component of the Deployer, and the interface for deploying API services and their endpoints to API nodes. The API Deployer manages several deployment infrastructures, which can be either static API nodes or containers that run API nodes in a Kubernetes cluster.

API node: The application server that does the actual job of answering HTTP requests. Once an API service has been designed, you can deploy it on the API node and query its endpoints, as sketched after this table.
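To see how these pieces fit together in practice, here is a minimal sketch that queries a deployed prediction endpoint through Dataiku’s dataikuapi Python client. The host, API key, service ID, endpoint ID, and feature names are hypothetical placeholders, and the available methods and their options may differ by version, so treat this as an illustration rather than a definitive reference.

```python
from dataikuapi import APINodeClient

# Hypothetical connection details -- replace with your API node URL, the ID of
# the deployed API service, and an API key if the service requires one.
client = APINodeClient(
    "https://apinode.example.com:12000", "fraud_detection", api_key="YOUR_API_KEY"
)

# Query the prediction endpoint with one record's features.
result = client.predict_record("predict_fraud", {
    "amount": 129.99,
    "merchant_category": "electronics",
    "card_country": "US",
    "purchase_country": "FR",
})

print(result)
```

Whether a client uses this wrapper or a plain HTTP call like the one shown earlier, it reaches the same endpoint on the API node; the client simply handles the request and response plumbing for you.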

Summary#

To tie it all together:

  • API services are designed with the API Designer in a Dataiku project, on one or more Design or Automation nodes.

  • Each API service can contain several endpoints, with each endpoint fulfilling a single function.

  • The API service is then pushed to the API Deployer, which in turn deploys the API service to one or more API nodes.

  • Finally, the API nodes are the application servers that do the actual job of answering API calls.

What’s next?#

Now that you’ve had an introduction to the basics of real-time APIs and how they work in Dataiku, continue to the lesson on API Endpoints to learn how to create an API service. Later, you can follow the tutorials to gain hands-on experience doing this yourself!

Note

The reference documentation provides more details about Real-time APIs in Dataiku.