Concept: The Public API

In another lesson, we introduced how the dataiku package enables low-level interaction with Dataiku DSS objects, such as datasets, managed folders, and saved models.

The public API, on the other hand, can accomplish a wide variety of administration and automation tasks.

To name a few examples, it can be used to manage:

  • users and groups,

  • projects and project folders,

  • connections and code environments,

  • jobs, notebooks, and scenarios,

  • the lifecycle of machine learning models.

The product documentation covers its many other capabilities.

A slide suggesting all of the areas in which the public API can be a benefit.

The public API is an HTTP REST API, but it is also available through a Python API client, which is the recommended way to interact with it. That client, the dataikuapi Python package, wraps the public REST API. In this lesson, however, we’ll use dataikuapi under the hood by obtaining a DSS client from within the dataiku package.

A slide introducing how to interact with the public API through the Python client.

Public API Examples

Let’s see how this works in a notebook. When using methods from the public API inside Dataiku DSS, we only need to import the dataiku package, and then establish the API client with this line of code.

Instead of providing an API key, the API client will automatically inherit connection credentials from the current context.

import dataiku
client = dataiku.api_client()
client

Jupyter notebook output showing the output from the code above creating a client handle.
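For comparison, a script running outside Dataiku DSS has no context to inherit credentials from, so it would pass the instance URL and an API key to dataikuapi.DSSClient. The sketch below wraps both cases in one helper. The helper name get_dss_client is hypothetical (it is not part of either package), and any URL or key passed to it is a placeholder you would supply yourself.

```python
def get_dss_client(host=None, api_key=None):
    """Return a public API client (hypothetical helper, not part of either package).

    With no host, assume the code runs inside DSS and inherit its credentials.
    With a host, connect remotely through the dataikuapi package.
    """
    if host is None:
        import dataiku  # available by default inside DSS (notebook, recipe, scenario)
        return dataiku.api_client()
    if not api_key:
        raise ValueError("an API key is required when connecting from outside DSS")
    import dataikuapi  # the Python wrapper around the public REST API
    return dataikuapi.DSSClient(host, api_key)
```

Inside a DSS notebook, get_dss_client() with no arguments behaves like the two lines above. From a remote machine, you would call it with your own instance URL and API key.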

Once we have a DSS client, we can perform any action that our credentials authorize. For example, we can list all project keys on the instance with the line below.

client.list_project_keys()

Jupyter notebook output showing the output from the code above listing project keys.
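Because list_project_keys() returns plain strings, ordinary Python is enough to narrow the result. As a minimal sketch, the hypothetical helper below (find_project_keys is not a method of the API) keeps only the keys that start with a given prefix.

```python
def find_project_keys(client, prefix=""):
    """Return the sorted project keys that start with `prefix`, ignoring case.

    `client` is any object exposing list_project_keys(), such as the
    handle returned by dataiku.api_client().
    """
    wanted = prefix.upper()
    return sorted(
        key for key in client.list_project_keys()
        if key.upper().startswith(wanted)
    )
```

For instance, find_project_keys(client, "DKU_") would return only the keys beginning with DKU_, if the instance has any.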

From this client handle, we can work our way to other objects, such as a specific project. From a project handle, we have a wide range of methods we can use.

project = client.get_project("DKU_HAIKU")

Jupyter notebook output showing methods available for a project handle.
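In a notebook, tab completion reveals the methods on a handle. Outside an interactive session, a rough equivalent can be built with plain Python introspection. The helper below is a hypothetical sketch (public_methods is not part of the API) and works on any Python object, including a project handle.

```python
def public_methods(handle):
    """List the names of the callable, non-underscore attributes of `handle`."""
    return sorted(
        name for name in dir(handle)
        if not name.startswith("_") and callable(getattr(handle, name))
    )
```

Calling public_methods(project) would then return the method names as a sorted list of strings.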

For example, let’s create a handle on a dataset in this project. Note that this object, created with a method from the public API, is not the same as the dataiku.Dataset object that you typically see in code recipes.

dataikuapi_ds = project.get_dataset("Orders")
dataikuapi_ds

Jupyter notebook output showing the output from creating a dataset handle.

With a dataset handle, we can build, clear, copy, or delete it. We can create ML tasks or statistics worksheets. We can get and set metadata, schema, settings, and zone information. In other words, we have a wide range of high-level capabilities available.

Jupyter notebook output showing methods available to a dataset handle.
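As one illustration of these capabilities, a dataset handle's get_schema() method returns the schema as a dictionary. The sketch below assumes that dictionary holds a "columns" list whose entries have "name" and "type" keys; describe_columns is a hypothetical helper name.

```python
def describe_columns(dataset_handle):
    """Return (name, type) pairs read from a public API dataset handle's schema.

    Assumes get_schema() returns a dict with a "columns" list, where each
    column is a dict holding at least "name" and "type".
    """
    schema = dataset_handle.get_schema()
    return [(col["name"], col["type"]) for col in schema.get("columns", [])]
```

With the handle created above, describe_columns(dataikuapi_ds) would list the columns of the Orders dataset without reading any of its rows.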

For the sake of comparison, let’s create the more familiar dataiku.Dataset object for the same dataset.

dataiku_ds = dataiku.Dataset("Orders")
print(dataiku_ds)

Jupyter notebook output showing the output from creating a core Dataiku dataset object.

Now we have a narrower range of methods, such as get_dataframe(). With a dataiku.Dataset object, the scope is focused on different ways to read or write the dataset.

Jupyter notebook output showing methods available to a core Dataiku dataset object.

Let’s give a more administrative example.

  • We can obtain a list of all groups on an instance, each described by a dictionary;

dss_groups = client.list_groups()

  • create a group;

new_group = client.create_group('new_group', description='test group', source_type='LOCAL')

  • modify its settings;

group_definition = new_group.get_definition()
group_definition['description'] = 'New description'
new_group.set_definition(group_definition)

  • and when needed, delete it.

group = client.get_group('new_group')
group.delete()
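These group calls can be combined into a repeatable automation step. The sketch below creates a group only if it does not exist yet and otherwise updates its description. The helper name ensure_group is hypothetical, and the code assumes each entry returned by list_groups() carries a "name" key. Managing groups generally requires administrator rights.

```python
def ensure_group(client, name, description):
    """Create the group if it is missing, otherwise update its description.

    Assumes each entry returned by client.list_groups() is a dict with a
    "name" key. Returns a handle on the group in both cases.
    """
    existing_names = {group["name"] for group in client.list_groups()}
    if name not in existing_names:
        return client.create_group(name, description=description, source_type="LOCAL")

    group = client.get_group(name)
    definition = group.get_definition()
    if definition.get("description") != description:
        # Only write back when something actually changed.
        definition["description"] = description
        group.set_definition(definition)
    return group
```

Running ensure_group(client, "new_group", "test group") twice leaves a single group in place, which makes the step safe to repeat in a scheduled script.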

This lesson introduced the scope of possibilities available by coding with the public API. You’ll have a chance to explore more of these methods in the hands-on exercises.