Connect to Data

When creating a new project in Dataiku DSS, you’ll likely have data coming from various sources such as SQL databases. Dataiku DSS makes it easy for you to connect to your data. In this section, we’ll remap the dataset connections and then build the datasets in the Flow.

Note

Non-coders can use the point-and-click interface to ingest and prepare data for model training, evaluation, and deployment.

Remap Dataset Connections

The datasets in this project are stored in a filesystem storage layer. We want to change the dataset connections so that the datasets are stored in an SQL database. If you are using the Dataiku Online Trial from Snowflake Partner Connect, that database is Snowflake; otherwise, you can map the datasets in the Flow to your own SQL database storage layer.

Tip

In Dataiku DSS, dataset connections in the Flow can be changed to any supported connector.

To remap the dataset connections:

  • Select all datasets in the Flow except for the first two input datasets:

      • Click the flight_data_input dataset in the first Flow Zone to select it.

      • Hold down the Shift key and select the next dataset.

      • Continue until all datasets in both Flow Zones are selected.

      • Do not select the first two input datasets in the first Flow Zone.

  • With all the datasets selected, open the side panel by clicking the arrow at the top-right corner of the page, then scroll down to Other actions.

../../../_images/select-datasets-in-the-flow.png
  • In Other actions, click Change connection.

  • Select New connection to view the available connections.

  • If you are using the Dataiku Online Trial from Snowflake Partner Connect, select the Snowflake connection; otherwise, select your SQL connection.

../../../_images/change-dataset-connections.png

Note

Changing the dataset connection copies the data from the original connection to the new one. The Drop data option drops the data from the original connection, which can help reduce the amount of storage space used.

  • Click Save.

  • Close the right panel.

../../../_images/flow-after-changing-connections.png
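
The selection rule in the steps above (every dataset except the first two inputs) can be written down as a small filter. This is only an illustrative sketch in plain Python: it does not call Dataiku DSS, and every dataset name other than flight_data_input is a hypothetical placeholder, not a name taken from this project.

```python
def datasets_to_remap(flow_datasets, excluded_inputs):
    """Return the dataset names to remap, keeping their Flow order.

    flow_datasets: all dataset names in the Flow, in order.
    excluded_inputs: names that must keep their original connection.
    """
    excluded = set(excluded_inputs)
    return [name for name in flow_datasets if name not in excluded]


# Hypothetical Flow contents; only flight_data_input comes from the tutorial.
flow_datasets = [
    "input_one",             # hypothetical first input dataset
    "input_two",             # hypothetical second input dataset
    "flight_data_input",
    "flight_data_prepared",  # hypothetical downstream dataset
    "flight_data_scored",    # hypothetical downstream dataset
]

# Exclude the first two input datasets, as in the steps above.
selected = datasets_to_remap(flow_datasets, excluded_inputs=flow_datasets[:2])
print(selected)
```

Keeping the excluded inputs as an explicit argument makes it easy to check the selection before changing any connection.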

Build the Datasets

Now that the dataset connections have been remapped, we’ll build the datasets in the Flow.

To do this:

  • Click Flow Actions from the bottom-right corner of your window.

  • Select Build all and keep the default selection for handling dependencies.

../../../_images/initial-build-all.png
  • Click Build.

  • Wait for the build to finish, and then refresh the page to see the built Flow. The build may take more than five minutes when using the Dataiku Online Trial from Snowflake Partner Connect.
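
For readers who prefer code, the build steps above can be approximated with a short script. This is a minimal sketch under several assumptions: it expects a project object shaped like the one returned by the Dataiku Python API client (get_dataset(name) returning an object with a build(job_type=...) method), it assumes the "RECURSIVE_BUILD" job type is the closest match to the default dependency handling in Build all, and the dataset names are hypothetical. The stand-in classes exist only so the sketch runs without a DSS instance.

```python
class FakeDataset:
    """Stand-in for a DSS dataset handle; records build requests."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def build(self, job_type="NON_RECURSIVE_FORCED_BUILD"):
        # A real handle would start a build job; here we only log the call.
        self._log.append((self.name, job_type))


class FakeProject:
    """Stand-in for a DSS project handle."""

    def __init__(self):
        self.build_log = []

    def get_dataset(self, name):
        return FakeDataset(name, self.build_log)


def build_datasets(project, dataset_names, job_type="RECURSIVE_BUILD"):
    """Request a build of each named dataset and return the names built."""
    built = []
    for name in dataset_names:
        project.get_dataset(name).build(job_type=job_type)
        built.append(name)
    return built


# Against a real instance, the project handle would come from something like
# (host, API key, and project key are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
#   project = client.get_project("PROJECT_KEY")
project = FakeProject()
built = build_datasets(project, ["flight_data_prepared", "flight_data_scored"])
print(built)
```

Passing the project in as an argument keeps the helper independent of how the connection to DSS is made.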

Note

Whenever we change recipes within a Flow, or new data arrives in our source datasets, there are multiple options for propagating these changes. Visit Rebuilding Datasets to find out more.