Concept | Connection changes

Let’s learn how to change dataset connections in Dataiku.

Connection changes

Suppose your Flow consists of datasets stored in a filesystem, but you want to store them in another location, such as a SQL database or HDFS.

Slide depicting a Flow with filesystem datasets changed to a SQL database.

Dataiku allows you to change the connection of multiple datasets at the same time, either from the Datasets page or directly from the Flow. You can choose a new connection depending on the database connections that have been set up in your instance.

Dataiku screenshot of change connection option in the right Actions panel.

You can also Drop data from the original storage location, which is useful for preventing unused datasets from taking up storage space.

If you choose to Reuse connection settings if possible, Dataiku will reuse the file format settings that were previously set up in the Format/Preview page of the dataset.

Risks

When you change a dataset’s connection, only its schema is transferred. The dataset itself is empty in the new location and needs to be rebuilt. However, certain changes will cause errors when you attempt to rebuild the dataset.

Dataiku warns you that changing dataset connections can break the computations or lead to different results.

Dataiku screenshot showing warning when changing the connection of a dataset.

This can happen, for instance, if you try to store a dataset with an array type column in a PostgreSQL database. Even though the connection change itself is saved successfully, you will get an error message when you try to build the dataset, because the new SQL connection does not recognize the array storage type in the dataset’s schema.

Slide highlighting how changing connections can lead to problems if the new connection does not recognize the same storage types.
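The storage type mismatch described above can be checked before changing a connection. The following is a minimal sketch in plain Python, not a Dataiku API: the helper `find_risky_columns`, the schema layout (a list of name/type dictionaries), and the set of risky types are all assumptions made for illustration.

```python
# Hypothetical helper: flag columns whose storage type may not survive a
# move to a SQL connection. The set of risky types below is an assumption
# made for illustration, not an official compatibility list.
RISKY_TYPES_FOR_SQL = {"array", "map", "object"}


def find_risky_columns(schema, risky_types=RISKY_TYPES_FOR_SQL):
    """Return the names of columns whose type appears in risky_types."""
    return [col["name"] for col in schema if col["type"] in risky_types]


# Example schema, written as a list of {"name", "type"} dictionaries
# (an assumed layout for this sketch).
example_schema = [
    {"name": "customer_id", "type": "bigint"},
    {"name": "tags", "type": "array"},
    {"name": "signup_date", "type": "date"},
]

risky = find_risky_columns(example_schema)
print(risky)
```

Any column reported by a check like this is a candidate for transformation before the dataset is rebuilt on the new connection.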

Luckily, you can apply a variety of transformations to your dataset in Dataiku to make it compatible with other databases.
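One such transformation is to serialize an array column into a text column before writing it to the new connection. The sketch below shows the idea in plain Python with the standard `json` module. It is not Dataiku-specific, and the function name `arrays_to_json_strings` and the sample rows are hypothetical.

```python
import json


def arrays_to_json_strings(rows, column):
    """Return new rows in which list values in `column` become JSON strings."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        value = new_row.get(column)
        if isinstance(value, list):
            new_row[column] = json.dumps(value)
        converted.append(new_row)
    return converted


# Sample rows with an array-valued "tags" column (illustrative data only).
rows = [
    {"customer_id": 1, "tags": ["new", "vip"]},
    {"customer_id": 2, "tags": []},
]

converted_rows = arrays_to_json_strings(rows, "tags")
print(converted_rows[0]["tags"])
```

The resulting column holds ordinary strings, which a SQL connection can store as text, and the original arrays can be recovered later with `json.loads`.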

What’s next?

To learn more about Flow view and actions through lessons and tutorials, please register for the free Academy course on this subject found in the Advanced Designer learning path.

The reference documentation also contains more information about topics such as supported connections and folding flows.