Concept | Default, fallback, and forced dataset connections

A dataset is the core object users will manipulate in DSS. There are different ways that dataset connections can be chosen:

  • By default, DSS suggests a dataset connection consistent with the input dataset when a user creates a new managed output dataset. This enables functionality such as push-down computation of SQL-based operations.

  • Administrators can configure fallback connections for cases where no obvious consistent connection is available for the input dataset.

  • Administrators can force a default connection to be used. This overrules the connection that DSS would otherwise have chosen. Where possible, it is recommended to rely on preferred connections rather than strictly enforcing a connection.

  • In addition, administrators can define the preferred file format to be used for new managed datasets.
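The order of precedence described above can be illustrated with a minimal Python sketch. This is not DSS code or a DSS API: the `ConnectionSettings` class, the `choose_connection` function, and the connection names are hypothetical. As a simplifying assumption, "consistent with the input dataset" is modeled as reusing the input dataset's connection name, and `None` stands for "no obvious consistent connection".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical container for the administrator settings (not a DSS class)."""
    forced_connection: Optional[str] = None
    fallback_connection: Optional[str] = None


def choose_connection(input_connection: Optional[str],
                      settings: ConnectionSettings) -> Optional[str]:
    """Return the connection name for a new managed output dataset.

    Assumption: input_connection is None when no consistent connection
    can be derived from the input dataset.
    """
    # A forced connection overrules whatever would otherwise be chosen.
    if settings.forced_connection is not None:
        return settings.forced_connection
    # Otherwise stay consistent with the input dataset when possible.
    if input_connection is not None:
        return input_connection
    # Only then use the administrator-defined fallback (may be None).
    return settings.fallback_connection


# Usage with made-up connection names.
settings = ConnectionSettings(fallback_connection="s3_managed")
print(choose_connection("postgres_dwh", settings))  # postgres_dwh
print(choose_connection(None, settings))            # s3_managed
```

In this sketch, setting `forced_connection` makes the input dataset's connection irrelevant, which mirrors why forcing a connection can prevent a better contextual choice.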

Note

Cloud-based object storage such as Amazon S3 and network-attached storage devices are good choices for the default connection. It is not recommended to use the local filesystem of the server running DSS as the default storage location.

Administrators can configure these settings globally and on a per-project basis. Find the options to define a default connection for datasets in DSS under Administration > Settings > Engines & connections.

Here, you can:

  • Set a preferred fallback connection for datasets.

  • Force a connection (overriding a contextual connection that might have been a better fit).

  • Define a preferred storage format.