This content is also included in the free Dataiku Academy course Basics 101, which is part of the Core Designer learning path. Register for the course if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.
The processing logic that acts upon a DSS dataset is decoupled from its underlying storage infrastructure. The way in which DSS manages connections helps make this possible.
You can import a new dataset into the Flow by uploading your own files or by accessing data through any previously established connection, such as a SQL database, cloud storage, or a NoSQL source. You might also have plugins installed that let you import data from other, non-native sources.
While importing a dataset, you can browse connections and available file paths, and preview the dataset and its schema. Once you have done that, the user interface for exploring, visualizing, and preparing the data is the same for all kinds of datasets.
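The idea that the same tools apply to every kind of dataset can be pictured with a small conceptual sketch. This is not the DSS API: the `Connection` interface, the `CsvConnection` and `SqlConnection` classes, and the `total_amount` function are hypothetical names invented for illustration, with an in-memory CSV string standing in for an uploaded file and an in-memory SQLite table standing in for a SQL connection.

```python
import csv
import io
import sqlite3
from typing import Iterator, Protocol, Tuple


class Connection(Protocol):
    """Hypothetical storage interface (an assumption, not part of DSS)."""

    def read_rows(self) -> Iterator[dict]: ...


class CsvConnection:
    """Reads rows from CSV text; stands in for an uploaded file."""

    def __init__(self, text: str) -> None:
        self.text = text

    def read_rows(self) -> Iterator[dict]:
        for row in csv.DictReader(io.StringIO(self.text)):
            yield {"name": row["name"], "amount": float(row["amount"])}


class SqlConnection:
    """Reads rows from a SQL table; stands in for a database connection."""

    def __init__(self, db: sqlite3.Connection, table: str) -> None:
        self.db = db
        self.table = table

    def read_rows(self) -> Iterator[dict]:
        # The table name is trusted here because this is only an illustration.
        cursor = self.db.execute(f"SELECT name, amount FROM {self.table}")
        for name, amount in cursor:
            yield {"name": name, "amount": float(amount)}


def total_amount(conn: Connection) -> float:
    """Processing logic: it never needs to know where the rows are stored."""
    return sum(row["amount"] for row in conn.read_rows())


def demo() -> Tuple[float, float]:
    """Run the same logic against two different storage backends."""
    csv_conn = CsvConnection("name,amount\nalice,10\nbob,5.5\n")

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (name TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?)", [("alice", 10), ("bob", 5.5)]
    )
    sql_conn = SqlConnection(db, "orders")

    return total_amount(csv_conn), total_amount(sql_conn)
```

In this sketch, `total_amount` is written once and works unchanged whether the rows come from a file or a database, which mirrors how the processing side stays the same while the storage side varies.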
Admin users can manage the connections on an instance from a centralized location. From there, they can control credentials, security settings, naming rules, and usage parameters. Admins can also establish new connections to SQL and NoSQL databases, cloud storage, and other sources. Connection types for many non-native sources are available as plugins in the Plugin Store.
One benefit of this system is a clearer division of labor between those who manage data connections and those who work with data. While some understanding of a dataset’s storage is often beneficial, particularly for very large datasets, those working with data do not necessarily need expertise in how their organization warehouses its data.