Build Your Security Model - Connections - Usage Parameters

Usage parameters let you control how Dataiku DSS (DSS) interacts with the datasets behind a connection. In this article, we’ll discuss the following usage parameters, which are common to all connection types:

  • Allow write

  • Allow managed datasets

Usage Parameters section of the Connections tab within DSS settings.

“Allow Write” Usage Parameter

The Allow write usage parameter specifies whether DSS is allowed to write datasets through the connection. Examples of writes include uploading a dataset and modifying its metadata.

Note

Even if the dedicated service account used to configure the connection has write permissions on the backend, you still need to activate the Allow write usage parameter before DSS can write datasets through the connection.
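To make the note concrete, here is a minimal Python sketch of the two independent conditions. The `Connection` class and its fields (`allow_write`, `service_account_can_write`) are hypothetical names chosen for illustration, not part of the DSS API; the sketch only captures the rule above: backend permissions alone are not enough.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical model of a DSS connection (not the real DSS API)."""
    name: str
    allow_write: bool                # the "Allow write" usage parameter in DSS
    service_account_can_write: bool  # permission granted on the backend


def dss_can_write(conn: Connection) -> bool:
    """A write succeeds only if the usage parameter is on AND the backend allows it."""
    return conn.allow_write and conn.service_account_can_write


# The service account could write, but the usage parameter is off: DSS must not write.
lake = Connection("datalake", allow_write=False, service_account_can_write=True)
print(dss_can_write(lake))  # False
```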

“Allow Managed Datasets” Usage Parameter

The Allow managed datasets usage parameter specifies whether users can create managed datasets through the connection. For more information, visit Managed and external datasets.

The settings available under Allow managed datasets vary by connection type. The following sections look at two examples: cloud storage and SQL databases.

Cloud Storage Configuration (S3)

As an example, let’s look at the usage parameters for an Amazon S3 connection.

Managed datasets and folders section of the Connections tab within DSS settings.

Managed Datasets & Folders

With an S3 connection, you can define a default bucket and path to force managed datasets to be created in a specific zone.

Naming Rules for New Datasets and Folders

You can add a prefix and a suffix to the names of new datasets, using fixed values or paths. These can include variables such as ${projectKey}, which refers to the project key (the project’s identifier).

You can also define a specific metastore database name and apply the same prefix and suffix mechanism to metastore table names.
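As an illustration of how these naming rules combine, the Python sketch below expands ${projectKey} with the standard library’s `string.Template` (which happens to use the same `${...}` syntax) and builds a location and a metastore table name. The function names, the bucket and path values, and the assumption that the resolved prefix and suffix are simply concatenated around the dataset name are all hypothetical; DSS’s exact composition rules may differ.

```python
from string import Template


def resolve(pattern: str, variables: dict) -> str:
    """Expand ${...} variables such as ${projectKey} in a naming rule."""
    return Template(pattern).substitute(variables)


def managed_dataset_location(bucket: str, base_path: str, prefix: str,
                             dataset_name: str, suffix: str, variables: dict) -> str:
    """Build a hypothetical S3 URI for a new managed dataset.

    Assumes the resolved prefix and suffix are concatenated around the dataset
    name, under the connection's default bucket and path.
    """
    name = resolve(prefix, variables) + dataset_name + resolve(suffix, variables)
    return f"s3://{bucket}/{base_path.strip('/')}/{name}"


def metastore_table_name(prefix: str, dataset_name: str, suffix: str,
                         variables: dict) -> str:
    """Apply the same prefix/suffix mechanism to a metastore table name."""
    return resolve(prefix, variables) + dataset_name + resolve(suffix, variables)


variables = {"projectKey": "SALES"}
# A path-style prefix places each project's datasets in its own folder.
print(managed_dataset_location("dss-managed", "/dss/", "${projectKey}/", "orders", "", variables))
# s3://dss-managed/dss/SALES/orders
print(metastore_table_name("${projectKey}_", "orders", "_v1", variables))
# SALES_orders_v1
```

Using the project key in the prefix is one way to keep managed datasets from different projects from colliding in a shared bucket.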

Note

These settings apply only when a user creates a new managed dataset or folder. To apply them to an existing managed dataset or folder, edit that dataset’s or folder’s own settings.

SQL Databases Configuration

As another example, let’s look at the usage parameters for a PostgreSQL connection.

New PostgreSQL connection page in the Connections tab within DSS settings.

Naming Rules for New Datasets and Folders

You can define a default schema and optionally allow users to override it. Table names support the same prefix and suffix mechanism as metastore table names.
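The Python sketch below models how a schema and table name might be resolved for a new managed dataset on a SQL connection. The function name, its parameters, and the choice to reject an override when the option is off are hypothetical assumptions for illustration, not DSS behavior confirmed by this article; the double-quoted `"schema"."table"` form is standard PostgreSQL identifier quoting.

```python
from string import Template
from typing import Optional


def resolve_table(default_schema: str, allow_schema_override: bool,
                  requested_schema: Optional[str], prefix: str, dataset_name: str,
                  suffix: str, variables: dict) -> str:
    """Return a hypothetical quoted "schema"."table" for a new managed dataset."""
    if requested_schema and requested_schema != default_schema:
        # Assumption: an override is rejected when the connection disallows it.
        if not allow_schema_override:
            raise ValueError("schema override is not allowed on this connection")
        schema = requested_schema
    else:
        schema = default_schema
    # Same prefix/suffix mechanism as metastore table names, with ${...} variables.
    table = (Template(prefix).substitute(variables) + dataset_name
             + Template(suffix).substitute(variables))
    return f'"{schema}"."{table}"'


variables = {"projectKey": "SALES"}
print(resolve_table("dss", False, None, "${projectKey}_", "orders", "", variables))
# "dss"."SALES_orders"
print(resolve_table("dss", True, "analytics", "${projectKey}_", "orders", "", variables))
# "analytics"."SALES_orders"
```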

Read-Only Data Lake Example

The following diagram depicts an example of a strategy with a read-only data lake and a dedicated DSS bucket for managed datasets.

Diagram showing a read-only data lake and a dedicated DSS bucket for managed datasets.

This basic example demonstrates how connections can be separated to address security or governance concerns. In this example, two connections are created in DSS.

Connection A is a DSS connection to the cloud storage that serves as the company’s reference data lake. DSS will not be able to modify this storage.

Connection B points to a dedicated storage zone where DSS, through the service account used by both connections, will be able to write its managed datasets.

Service Account Configuration

When you create the service account, attach policies to it that allow the following actions:

  • Read access on the data lake storage (and the metastore if needed).

  • Read/Write access on the dedicated DSS storage so that managed datasets can be created and managed (deletion, metadata update…).
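If the storage is Amazon S3, these permissions could be expressed as an AWS IAM policy. The Python sketch below builds one as a dictionary and prints it as JSON. The bucket names are hypothetical, metastore (for example AWS Glue) permissions are left out, and the action list is only a starting point: DSS may need additional S3 actions in practice (for example for multipart uploads), so check the DSS documentation before using it.

```python
import json


def service_account_policy(datalake_bucket: str, dss_bucket: str) -> dict:
    """Build a hypothetical IAM policy for the shared DSS service account.

    - Read-only access on the data lake bucket.
    - Read/write access on the dedicated DSS bucket for managed datasets.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN itself, not to its objects.
                "Sid": "ListBothBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{datalake_bucket}",
                             f"arn:aws:s3:::{dss_bucket}"],
            },
            {
                # Data lake: read objects only, no write or delete.
                "Sid": "ReadDataLake",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{datalake_bucket}/*"],
            },
            {
                # DSS bucket: create, read, and delete managed dataset files.
                "Sid": "ReadWriteDssBucket",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{dss_bucket}/*"],
            },
        ],
    }


print(json.dumps(service_account_policy("company-datalake", "dss-managed"), indent=2))
```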

Connections Configuration

Both connections will use the same service account. Each will point to its own object storage.

The connection used for managed datasets will be configured with the following:

  • Allow write activated

  • Allow managed datasets activated

  • Optional extra configuration specific to managed datasets (path, prefix, metastore…)

The connection used to reach the data lake will be configured with the following:

  • Allow write deactivated

  • Allow managed datasets deactivated

Because it does not allow managed datasets, this connection will not be offered when selecting where to store an output dataset in the Flow.
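The two connections in this example can be summarized with a small Python model. The `Connection` class, its field names, and the connection and service account names are hypothetical and do not reflect the real DSS API. The filter shows why Connection A never appears as an output option: it does not allow managed datasets. Also requiring `allow_write` is an assumption of this sketch, on the reasoning that building a managed dataset means writing through the connection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """Hypothetical model of a DSS connection's usage parameters."""
    name: str
    service_account: str
    allow_write: bool
    allow_managed_datasets: bool
    managed_settings: Dict[str, str] = field(default_factory=dict)


def output_connections(connections: List[Connection]) -> List[str]:
    """Names of connections offered when storing a new managed output dataset."""
    return [c.name for c in connections
            if c.allow_write and c.allow_managed_datasets]


# Connection A: read-only data lake.
connection_a = Connection("datalake", "svc-dss", allow_write=False,
                          allow_managed_datasets=False)
# Connection B: dedicated storage for managed datasets, same service account.
connection_b = Connection("dss-managed", "svc-dss", allow_write=True,
                          allow_managed_datasets=True,
                          managed_settings={"path": "/dss", "prefix": "${projectKey}/"})

print(output_connections([connection_a, connection_b]))  # ['dss-managed']
```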