Concept | Real-time APIs#

On our Design node, suppose we have a Dataiku project that includes a prediction model in the Flow. We can deploy this model from the Design node into a production environment using a batch or real-time scoring framework. Let’s focus on the latter.

Use case#

In a credit card fraud use case, for example, as the model receives an incoming credit card transaction, it can immediately generate a prediction as to whether the transaction should be authorized or flagged as fraudulent.

Image showing real-time scoring for incoming transactions.

Dataiku lets us implement this type of real-time scoring strategy by exposing our prediction model as an API endpoint. This endpoint, along with possibly other endpoints (that perform tasks not limited to scoring), can be exposed on an API service, which will then be deployed on an API node (or potentially an external infrastructure).
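To make this concrete, a client application might score a transaction by POSTing JSON to the service's prediction endpoint. The sketch below builds such a request with Python's standard library. The host, service ID, endpoint ID, and feature names are placeholders invented for this illustration, and the URL pattern is a common one for prediction endpoints; verify the exact scheme for your own API node in the reference documentation.

```python
import json
from urllib import request

# Hypothetical values -- substitute your own API node URL,
# service ID, endpoint ID, and feature names.
API_NODE = "http://api-node.example.com:12000"
SERVICE_ID = "fraud_detection"
ENDPOINT_ID = "predict_fraud"

# One incoming credit card transaction, expressed as endpoint features.
transaction = {"features": {"amount": 129.99, "merchant_id": "M-1042", "country": "FR"}}

# Assumed URL pattern for a prediction endpoint; check your installation.
url = f"{API_NODE}/public/api/v1/{SERVICE_ID}/{ENDPOINT_ID}/predict"
body = json.dumps(transaction).encode("utf-8")

req = request.Request(url, data=body, headers={"Content-Type": "application/json"})

# Uncomment to send the request against a live API node:
# with request.urlopen(req) as resp:
#     prediction = json.load(resp)
```

Because the endpoint fulfills a single function, the client only needs to know this one URL and the expected feature schema; everything else (model version, infrastructure) is managed behind the service.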

Image showing deployment for real-time scoring.

Comparing real-time processing with batch processing, recall that batch processing works on a set of available records at once (in batches), often at scheduled times (such as daily or weekly), to return a batch of results.

You implement batch deployment in Dataiku by deploying your project bundle to the Automation node. In contrast, for real-time processing, you deploy an API service to an API node (or external infrastructure).

Note

For review, return to Concept | Dataiku architecture for MLOps.

Key terminology#

When implementing a real-time processing workload, there are some important terms to know.

API terminology#

  • API (Application Programming Interface): A software intermediary that allows two applications to talk to each other and exchange data over a network. For example, an application can use a Google API to get weather data from a Google server.

  • API endpoint: A single path on the API: that is, a URL to which HTTP requests are posted, and from which a response is expected. It is the place where two applications interact. Each endpoint fulfills a single function, e.g. returning a prediction or looking up a value in a dataset.

  • API service: The unit of management and deployment for the API node. One API service can host several API endpoints.

  • API Designer: The interface, available in each Dataiku project, for creating API services and the API endpoints within them.

  • API Deployer: A component of the Deployer, and the interface for deploying API services to API nodes or other external infrastructure. The API Deployer manages several deployment infrastructures, which can be static API nodes or containers running API nodes in a Kubernetes cluster.

  • API node: The Dataiku application server that does the actual job of answering HTTP requests. Once an API service has been designed, you deploy it to an API node via the API Deployer.

Summary#

To tie it all together:

  • API services are designed in one or more Design or Automation nodes using the API Designer in a Dataiku project.

  • Each API service can contain several API endpoints, with each endpoint fulfilling a single function.

  • The API service is then pushed to the API Deployer, which in turn deploys the API service to one or more API nodes (or to external infrastructure).

  • Finally, the API nodes are the application servers that do the actual job of answering API calls.
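To make the request/response loop above tangible, the sketch below stands up a tiny mock "API node" with Python's standard library and posts one transaction to it. This mocks only the HTTP pattern described in this article: it is not Dataiku code, and the URL path, feature names, and response shape are invented for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

class MockEndpointHandler(BaseHTTPRequestHandler):
    """Answers POST requests the way a scoring endpoint might (shape is illustrative)."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        record = json.loads(self.rfile.read(length))
        # Toy stand-in for a model: flag large transactions as fraudulent.
        amount = record["features"]["amount"]
        result = {"prediction": "fraud" if amount > 1000 else "authorized"}
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Bind to port 0 so the OS picks a free port, and serve in the background.
server = HTTPServer(("127.0.0.1", 0), MockEndpointHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One client call: POST a record, get a prediction back immediately.
url = f"http://127.0.0.1:{server.server_port}/mock_service/mock_endpoint/predict"
body = json.dumps({"features": {"amount": 1500.0}}).encode("utf-8")
req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
with request.urlopen(req) as resp:
    answer = json.load(resp)

server.shutdown()
print(answer)  # {'prediction': 'fraud'}
```

The round trip is the whole point of real-time scoring: one record in, one prediction out, within a single HTTP request, rather than a scheduled batch of results.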

What’s next?#

Now that you’ve had an introduction to the basics of real-time APIs and how they work in Dataiku, learn more about the API endpoints within an API service.

See also

See the reference documentation for more details about API Node & API Deployer: Real-time APIs.