Concept: The Public API

In another lesson, we introduced how the dataiku package enables low-level interaction with Dataiku DSS objects, such as datasets, managed folders, and saved models.

The public API, on the other hand, can accomplish a wide variety of administration and automation tasks.

To name a few examples, it can be used to manage:

  • users and groups,

  • projects and project folders,

  • connections and code environments,

  • jobs, notebooks, and scenarios,

  • the lifecycle of machine learning models.

The product documentation covers its many other capabilities.

The public API is an HTTP REST API. It also ships with a Python client, the dataikuapi package, which wraps the REST API and is the recommended way to use it. In this lesson, we won’t import dataikuapi directly. Instead, we’ll obtain a DSS client through the dataiku package, which uses dataikuapi under the hood.
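
For contrast, outside of Dataiku DSS (for example, from a script on a laptop), the client is typically built directly with dataikuapi, which needs the instance URL and an API key. The following is a minimal sketch, assuming the dataiku-api-client package is installed. The connect_from_outside helper and the DSS_HOST and DSS_API_KEY environment variable names are hypothetical, chosen only for illustration.

```python
import os


def connect_from_outside(host=None, api_key=None):
    """Return a DSS client for a remote instance.

    The helper name and the DSS_HOST / DSS_API_KEY environment variables
    are illustrative assumptions, not Dataiku conventions.
    """
    # Imported lazily so this module loads even where dataikuapi is absent.
    import dataikuapi

    host = host or os.environ["DSS_HOST"]            # e.g. "http://localhost:11200"
    api_key = api_key or os.environ["DSS_API_KEY"]   # an API key for the instance
    return dataikuapi.DSSClient(host, api_key)
```

Reading the key from the environment rather than hard-coding it keeps credentials out of the script itself.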

Public API Examples

Let’s see how this works in a notebook. When using methods from the public API inside Dataiku DSS, we only need to import the dataiku package and then create the API client, as in the code below.

Instead of requiring an API key, the API client automatically inherits connection credentials from the current context.

import dataiku
client = dataiku.api_client()
client

Once we have a DSS client, we can perform any action that our credentials authorize. For example, we can list all project keys on the instance with the line below.

client.list_project_keys()

From this client handle, we can work our way to other objects – such as a specific project. From a project handle, we have a wide range of methods we can use.

project = client.get_project("DKU_HAIKU")
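
As far as we can tell, get_project() returns a handle without first checking that the project exists, so a mistyped key may only surface as an error on a later call. Below is a small sketch of a guard that checks list_project_keys() first. The get_project_if_exists helper is a hypothetical name, not part of the API.

```python
def get_project_if_exists(client, project_key):
    """Return a project handle, or None if the key is not visible to us.

    `client` is expected to be the object returned by dataiku.api_client().
    The helper itself is illustrative and not part of the Dataiku API.
    """
    # Only look the project up if its key appears in the instance listing.
    if project_key not in client.list_project_keys():
        return None
    return client.get_project(project_key)
```

In a notebook, this could be called as get_project_if_exists(dataiku.api_client(), "DKU_HAIKU").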

For example, let’s create a handle on a dataset in this project. Note that this object, created with a method from the public API, is a dataikuapi dataset handle. It is not the same as the dataiku.Dataset object that you typically see in code recipes.

dataikuapi_ds = project.get_dataset("Orders")
dataikuapi_ds

With a dataset handle, we can build, clear, copy, or delete it. We can create ML tasks or statistics worksheets. We can get and set metadata, schema, settings, and zone information. In other words, we have a wide range of high-level capabilities available.
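
As one illustration of these capabilities, the sketch below reads a dataset’s schema through the handle and returns its column names and types. It assumes that get_schema() returns a dictionary with a "columns" list whose entries carry "name" and "type" keys. The describe_columns helper is a hypothetical name.

```python
def describe_columns(dataset_handle):
    """Map column name -> storage type for a public API dataset handle.

    `dataset_handle` is expected to come from project.get_dataset(...).
    """
    schema = dataset_handle.get_schema()
    # Each column entry is assumed to look like {"name": ..., "type": ...}.
    return {col["name"]: col["type"] for col in schema.get("columns", [])}
```

With the handle created earlier, describe_columns(dataikuapi_ds) would return one entry per column of the Orders dataset.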

For the sake of comparison, let’s create the more familiar dataiku.Dataset object for the same dataset.

dataiku_ds = dataiku.Dataset("Orders")
print(dataiku_ds)

Now we have a narrower range of methods, such as get_dataframe(). The scope of a dataiku.Dataset object is focused on reading and writing the dataset’s data.
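
For instance, a quick preview of the data can be read into a pandas DataFrame. This is a minimal sketch, assuming get_dataframe() accepts a limit argument that caps the number of rows read. The preview_rows helper is a hypothetical name.

```python
def preview_rows(dataset, n_rows=5):
    """Read at most n_rows rows of a dataset into a pandas DataFrame.

    `dataset` is expected to be a dataiku.Dataset,
    for example dataiku.Dataset("Orders").
    """
    # Capping the read avoids loading a large dataset fully into memory.
    return dataset.get_dataframe(limit=n_rows)
```

With the object created above, preview_rows(dataiku_ds, 10) would return up to ten rows.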

Let’s look at a more administrative example.

  • We can obtain a list of all groups on an instance, each described by a dictionary;

dss_groups = client.list_groups()
  • create a group;

new_group = client.create_group('new_group', description='test group', source_type='LOCAL')
  • modify its settings;

group_definition = new_group.get_definition()
group_definition['description'] = 'New description'
new_group.set_definition(group_definition)
  • and when needed, delete it.

group = client.get_group('new_group')
group.delete()
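
The group steps above can be combined into a single helper that is safe to run more than once. This is a sketch under the assumption that each entry returned by list_groups() is a dictionary with a "name" key. The ensure_group helper is a hypothetical name, not part of the API, and managing groups generally requires administrator rights on the instance.

```python
def ensure_group(client, name, description):
    """Create a local group if it is missing, then set its description.

    `client` is expected to be the object returned by dataiku.api_client().
    Returns the group handle.
    """
    existing = {g["name"] for g in client.list_groups()}
    if name in existing:
        group = client.get_group(name)
    else:
        group = client.create_group(name, description=description,
                                    source_type="LOCAL")

    # Read-modify-write: fetch the full definition, change one field, save it.
    definition = group.get_definition()
    if definition.get("description") != description:
        definition["description"] = description
        group.set_definition(definition)
    return group
```

Fetching the full definition before saving it back mirrors the get_definition() and set_definition() pattern shown above, so other group settings are left untouched.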

This lesson introduced the scope of possibilities available by coding with the public API. You’ll have a chance to explore more of these methods in the hands-on exercises.