Concept | Dataiku public API

In Concept | The dataiku package, we introduced how the dataiku package enables low-level interaction with Dataiku objects, such as datasets, managed folders, and saved models.

The public API, on the other hand, can accomplish a wide variety of administration and automation tasks.

To name a few examples, it can be used to manage:

users and groups,
projects and project folders,
connections and code environments,
jobs, notebooks, and scenarios,
the lifecycle of machine learning models.

The reference documentation covers its many other capabilities.

The public API is an HTTP REST API, but it’s also available as a Python API client, which is the recommended way to interact with this API.

The dataikuapi Python package is the wrapper for the public REST API. In this lesson, however, we won't import it directly. Instead, we'll obtain a Dataiku client from inside the dataiku package, which uses dataikuapi under the hood.

Public API examples

Let’s see how this works in a notebook. When using methods from the public API inside Dataiku, we only need to import the dataiku package, and then create the API client with the code below.

No API key is required: the API client automatically inherits connection credentials from the current context.

import dataiku
client = dataiku.api_client()
client

Once we have a Dataiku client, we can perform any action that the current user is authorized to perform. For example, we can list all project keys on the instance with the line below.

client.list_project_keys()
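
Because the result is an ordinary Python list, it can be post-processed like any other. The following is a minimal sketch, assuming the client handle created above; the helper name project_keys_with_prefix and the "DKU_" prefix are hypothetical and not part of the public API.

```python
def project_keys_with_prefix(client, prefix):
    """Return the sorted project keys that start with `prefix`.

    `client` only needs to expose list_project_keys(), as the Dataiku
    API client does. This helper is illustrative, not part of the API.
    """
    return sorted(
        key for key in client.list_project_keys() if key.startswith(prefix)
    )
```

In a notebook, this could be called as project_keys_with_prefix(client, "DKU_") to narrow the list down to the projects of interest.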

From this client handle, we can work our way to other objects, such as a specific project. From a project handle, we have a wide range of methods we can use.

project = client.get_project("DKU_HAIKU")

For example, let’s create a handle on a dataset in this project. Note that this object, created with a method from the public API, isn’t the same as the dataiku.Dataset object typically used in code recipes.

dataikuapi_ds = project.get_dataset("Orders")
dataikuapi_ds

With a dataset handle, we can build, clear, copy, or delete it. We can create ML tasks or statistics worksheets. We can get and set metadata, schema, settings, and zone information. In other words, we have a wide range of high-level capabilities available.
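
As one illustration of reading schema information from such a handle, here is a minimal sketch. It assumes that the handle's get_schema() method returns a dictionary with a "columns" list whose entries carry "name" and "type" keys; the helper name column_types is hypothetical.

```python
def column_types(dataset_handle):
    """Map each column name to its storage type.

    Assumption: get_schema() returns a dict with a "columns" list whose
    entries have "name" and "type" keys. This helper is illustrative.
    """
    schema = dataset_handle.get_schema()
    return {col["name"]: col["type"] for col in schema.get("columns", [])}
```

With the handle from above, column_types(dataikuapi_ds) would give a quick overview of the dataset's columns without reading any data.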

For the sake of comparison, let’s create the more familiar dataiku.Dataset object for the same dataset.

dataiku_ds = dataiku.Dataset("Orders")
print(dataiku_ds)

Now we have a narrower range of methods, such as get_dataframe(). With a dataiku.Dataset object, the scope is focused on different ways to read or write the dataset.

Let’s give a more administrative example.

We can obtain a list of all groups on an instance, with each group described by a dictionary;

dss_groups = client.list_groups()

create a group;

new_group = client.create_group('new_group', description='test group', source_type='LOCAL')

modify its settings;

group_definition = new_group.get_definition()
group_definition['description'] = 'New description'
new_group.set_definition(group_definition)

and when needed, delete it.

group = client.get_group('new_group')
group.delete()
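
In an automation script, the steps above are often combined so that the script can be rerun safely. The following is a minimal sketch of that idea, using only the client methods shown in this article. It assumes that each dictionary returned by list_groups() has a "name" key; the helper name ensure_group is hypothetical.

```python
def ensure_group(client, name, description):
    """Create the group if it is missing, otherwise update its description.

    Returns "created" or "updated" so that callers can log what happened.
    Assumption: each dict from client.list_groups() has a "name" key.
    """
    existing_names = {group["name"] for group in client.list_groups()}
    if name not in existing_names:
        # Same call as in the article: a local group with a description.
        client.create_group(name, description=description, source_type="LOCAL")
        return "created"

    # The group already exists: fetch its definition, change it, save it.
    group = client.get_group(name)
    definition = group.get_definition()
    definition["description"] = description
    group.set_definition(definition)
    return "updated"
```

Calling ensure_group(client, "new_group", "test group") twice would create the group the first time and only update its description the second time.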

What’s next

This article introduced the range of tasks you can automate by coding with the public API. You’ll have a chance to explore more of these methods in Tutorial | Dataiku public API.