Concept | Connection changes
Let’s learn how to change dataset connections in Dataiku.
Suppose your Flow consists of datasets stored in a filesystem, but you want to store them in another location, such as a SQL database or HDFS.
Dataiku allows you to change the connection of multiple datasets at the same time, either from the Datasets page or directly from the Flow. You can choose a new connection depending on the database connections that have been set up in your instance.
You can also Drop data from the original storage location, which is useful for preventing unused datasets from taking up storage space.
If you choose to Reuse connection settings if possible, Dataiku will reuse the file format settings that were previously set up in the Format/Preview page of the dataset.
Changing a dataset’s connection transfers only its schema, not its data. The dataset is empty in the new location and needs to be rebuilt. However, certain changes will cause errors when you attempt to rebuild the dataset.
Dataiku warns you that changing dataset connections can break computations or lead to different results.
This can happen, for instance, if you try to store a dataset with an array type column in a PostgreSQL database. Even though the connection change itself is saved successfully, you will get an error message when you try to build the dataset, because the array column cannot be stored in the SQL table as is.
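A check along the following lines could flag such columns before you change the connection. This is a minimal sketch in plain Python, not a Dataiku API call: it assumes the schema is available as a list of column dictionaries with `name` and `type` keys, and that `array`, `map`, and `object` are the type names used for complex columns. The function name `find_complex_columns` and the sample schema are hypothetical.

```python
# Type names assumed (not confirmed by this article) to denote complex columns.
COMPLEX_TYPES = {"array", "map", "object"}


def find_complex_columns(schema_columns):
    """Return the names of columns whose type may not fit into a SQL table."""
    return [
        col["name"]
        for col in schema_columns
        if col.get("type") in COMPLEX_TYPES
    ]


# Hypothetical schema, for illustration only.
sample_schema = [
    {"name": "order_id", "type": "bigint"},
    {"name": "tags", "type": "array"},
    {"name": "customer", "type": "string"},
]

flagged = find_complex_columns(sample_schema)
print(flagged)
```

Any column name returned by the check is a candidate for transformation before the dataset is rebuilt on the new connection.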
Luckily, you can apply a variety of transformations to your dataset in Dataiku to make it compatible with other databases.
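For instance, one possible transformation is to serialize each array into a JSON string so that the value fits into an ordinary text column. The sketch below shows the idea in plain Python rather than as a Dataiku recipe; the function name `arrays_to_json` and the sample rows are hypothetical.

```python
import json


def arrays_to_json(rows, array_columns):
    """Return copies of the rows with the given columns serialized as JSON strings."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name in array_columns:
            value = new_row.get(name)
            # Only list-like values are serialized; None and scalars pass through.
            if isinstance(value, (list, tuple)):
                new_row[name] = json.dumps(list(value))
        converted.append(new_row)
    return converted


# Hypothetical rows, for illustration only.
rows = [
    {"order_id": 1, "tags": ["new", "priority"]},
    {"order_id": 2, "tags": []},
    {"order_id": 3, "tags": None},
]

flat_rows = arrays_to_json(rows, ["tags"])
```

The original arrays can be recovered later with `json.loads`, so no information is lost by storing them as text.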