Concept | Connection changes

In this lesson, you’ll learn to perform advanced Flow actions, such as:

  • changing dataset connections,

  • reusing Flow items, and

  • hiding parts of your Flow.

Tip

This content is also included in a free Dataiku Academy course on Flow Views & Actions, which is part of the Advanced Designer learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.

Connection changes

Suppose your Flow consists of datasets stored on a filesystem, but you want to store them in another location, such as a SQL database or HDFS.

Slide depicting a Flow with filesystem datasets changed to a SQL database.

Dataiku DSS allows you to change the connection of multiple datasets at the same time, either from the Datasets page or directly from the Flow. The options available for the new connection depend on the connections that have already been set up on the instance.

Dataiku screenshot of change connection option in the right Actions panel.

When changing connections, you also have the option to “Drop data” from the original storage location, which is useful for preventing unused datasets from taking up storage space.

Further, you have the option to “reuse connection settings if possible”, which lets you reuse the file format settings that were previously configured on the dataset’s Format/Preview page.

In addition, Dataiku DSS warns you that “Changing dataset connections can break the computations or lead to different results”.

Dataiku screenshot showing warning when changing the connection of a dataset.

This situation can happen, for instance, if you try to store a dataset with an ‘array’ type column in a PostgreSQL database. Even though the connection change is saved successfully, you will get an error message when you try to build the dataset, because the new SQL connection does not support the array storage type.

Slide highlighting how changing connections can lead to problems if the new connection does not recognize the same storage types.
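
A mismatch like this can be spotted before changing a connection. The following is a minimal sketch in plain Python, not a call into Dataiku's API. It assumes a schema represented as a list of column dictionaries with `name` and `type` keys. The set `UNSUPPORTED_BY_TARGET`, the function `find_incompatible_columns`, and the column names are all hypothetical and only illustrate the idea.

```python
# Hypothetical pre-check, not part of the Dataiku API.
# Assumption: the target connection cannot store these storage types.
# Extend the set to match what your target connection actually rejects.
UNSUPPORTED_BY_TARGET = {"array"}


def find_incompatible_columns(schema_columns, unsupported_types=UNSUPPORTED_BY_TARGET):
    """Return the names of columns whose type the target is assumed to reject."""
    return [
        column["name"]
        for column in schema_columns
        if column["type"] in unsupported_types
    ]


# Illustrative schema with one 'array' column.
schema_columns = [
    {"name": "customer_id", "type": "string"},
    {"name": "purchase_dates", "type": "array"},
    {"name": "total_spent", "type": "double"},
]

incompatible = find_incompatible_columns(schema_columns)
print(incompatible)
```

An empty result suggests that, under the stated assumption, none of the columns would block the build after the connection change. A non-empty result lists the columns to convert or drop first.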

When you change a dataset’s connection, only its schema is transferred, not its data. After the change, the dataset is empty and needs to be rebuilt.
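
To make the schema-versus-data distinction concrete, here is a small toy model in plain Python. It is only a sketch of the behavior described above, not Dataiku code. The class `DatasetModel`, the functions `change_connection` and `needs_rebuild`, and the connection names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetModel:
    """Toy stand-in for a dataset. This is NOT a Dataiku class."""
    name: str
    connection: str
    schema: tuple      # (column name, column type) pairs
    rows: tuple = ()   # stored data; empty means the dataset must be built


def change_connection(dataset, new_connection):
    """Keep the schema, point to the new connection, and start with no data."""
    return replace(dataset, connection=new_connection, rows=())


def needs_rebuild(dataset):
    """A dataset with no stored rows has to be rebuilt before it can be used."""
    return len(dataset.rows) == 0


# Illustrative dataset on a filesystem connection (names are made up).
orders = DatasetModel(
    name="orders",
    connection="filesystem_managed",
    schema=(("order_id", "bigint"), ("amount", "double")),
    rows=((1, 9.5), (2, 3.0)),
)

moved = change_connection(orders, "postgres_conn")
print(moved.schema == orders.schema)  # the schema is carried over
print(needs_rebuild(moved))           # the data is not
```

The original `orders` object is left unchanged here. In Dataiku, whether the old data remains in its original location depends on the “Drop data” option.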

You can repeat the previous steps to change the connection of the datasets back to a filesystem, or to a different database, as needed.

Learn more

Congrats! Now you’ve seen how to change dataset connections, reuse Flow items, and hide or show parts of your Flow as needed.

To learn more about Flow Views & Actions, including through hands-on exercises, please register for the free Academy course on this subject found in the Advanced Designer learning path.

The product documentation also contains more information about topics such as supported connections and folding flows.