Concept | Data connections
Dataiku lets you change, shape, and analyze your data through a variety of actions. It also enables you to perform these manipulations on externally stored datasets through connections.
Let’s see how Dataiku manages connections to make this possible.
Importing a new dataset
You can import a new dataset in the Flow by uploading your own files or accessing data through any previously established connections, such as SQL databases, cloud storage, or NoSQL sources. You might also have plugins allowing you to import data from other non-native sources.
While importing a dataset, you can browse connections and available file paths, and preview the dataset and its schema.
Once you have done that, the user interface for exploring, visualizing, and preparing the data is the same for all kinds of datasets. This is because the processing logic that acts upon a dataset is decoupled from its underlying storage infrastructure.
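The decoupling described above can be illustrated with a minimal, self-contained Python sketch. It is not Dataiku's API or implementation: the `Connection` protocol, the `CsvTextConnection` and `SqliteConnection` classes, and the `count_by` helper are hypothetical names invented for illustration. The point is only that one piece of processing logic can run unchanged on rows coming from a file-style source and from a SQL-style source.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, Protocol

Row = Dict[str, str]


class Connection(Protocol):
    """Hypothetical storage interface: anything that can yield rows."""

    def read_rows(self) -> Iterable[Row]: ...


class CsvTextConnection:
    """Hypothetical file-style storage: rows are parsed from CSV text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read_rows(self) -> Iterable[Row]:
        return csv.DictReader(io.StringIO(self.text))


class SqliteConnection:
    """Hypothetical SQL-style storage: rows are read from a table."""

    def __init__(self, db: sqlite3.Connection, table: str) -> None:
        self.db = db
        self.table = table  # assumed to be a trusted table name in this sketch

    def read_rows(self) -> Iterator[Row]:
        cursor = self.db.execute(f"SELECT * FROM {self.table}")
        columns = [description[0] for description in cursor.description]
        for values in cursor:
            # Convert values to strings so both sources yield the same row shape.
            yield {col: str(value) for col, value in zip(columns, values)}


def count_by(rows: Iterable[Row], column: str) -> Dict[str, int]:
    """Processing logic: it only sees rows, never the storage behind them."""
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


def build_examples():
    """Create the same three made-up customer rows in two different storages."""
    csv_conn = CsvTextConnection("id,country\n1,FR\n2,US\n3,FR\n")
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
    db.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "FR"), (2, "US"), (3, "FR")],
    )
    return csv_conn, SqliteConnection(db, "customers")
```

Calling `count_by(conn.read_rows(), "country")` on either object returned by `build_examples()` gives the same counts, because the function depends only on the row interface and not on where the rows are stored.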
Instance administrators can configure connections in:
The Connections menu from the launchpad of Dataiku Cloud.
The Applications > Administration > Connections menu from the top navigation bar in a self-managed Dataiku installation.
From here, they can control settings such as credentials, security settings, naming rules, and usage parameters. Admins can also establish new connections to SQL and NoSQL databases, cloud storage, and other sources. Additional connection types for non-native sources are available in the plugin store.
One benefit of this system is a clearer division of labor between those who manage data connections and those who work with data.
While understanding a dataset’s storage is often beneficial (particularly with large datasets), those working with data do not necessarily need expertise in how their organization warehouses its data.
In this lesson, you learned to work with datasets in Dataiku that have different underlying connections or storage infrastructure. Continue getting to know the basics of Dataiku by learning about dataset schemas.