Tutorial | Data transfer with visual recipes
Once you have configured a data connection, you’ll want to learn about visual methods for moving data to a database.
In this tutorial, you will:
Move data to a SQL database via a Sync recipe.
Perform basic preprocessing steps with the Prepare recipe and send the output to a SQL database.
You will need access to a Dataiku instance with an available SQL connection. If you don’t have one yet, see Tutorial | Configure a connection between Dataiku and a SQL database.
You may also want to review this tutorial’s associated concept article.
Create the project
From the Dataiku Design homepage, click + New Project > DSS tutorials > Core Designer > Data Transfer.
From the project homepage, click Go to Flow.
You can also download the starter project from this website and import it as a zip file.
Use case summary
The Flow begins from two uploaded CSV files:
Order information for t-shirts, including the order’s date, the type of t-shirt, and the customer who placed the order.
Additional information about the customers who have placed a t-shirt order.
Move data to a database with a Sync recipe
The Sync recipe is convenient when you need to copy an existing dataset into a database without any data preparation.
Infer storage types
The orders dataset in the starter project is a CSV file that has been uploaded into Dataiku. CSV files do not contain any kind of typing information. Therefore, Dataiku assumes by default that all columns have a string storage type.
The Sync recipe maps the storage type in the input dataset to a similar type in the output database. Let’s first set the storage types of numeric columns before syncing the dataset to the database.
One way to handle this is to infer the storage types from the data and save the updated schema.
Open the orders dataset.
Navigate to the Settings tab.
Navigate to the Schema subtab.
Click Check Now to confirm the schema and data are consistent.
Click Infer Types from Data and then Confirm.
Save your dataset, noting the change in storage types for the numeric columns.
Type inference in this manner is performed against a sample of the data, and you should check that the inferred types correspond to your actual data.
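The behavior described above can be illustrated outside Dataiku. The sketch below uses pandas as a stand-in (the column names and values are made up, not from the tutorial's datasets): reading a CSV with no hints infers numeric types from the values, much as Infer Types from Data does against a sample, while forcing everything to string mirrors Dataiku's default for uploaded CSV files.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the orders dataset.
csv_data = io.StringIO(
    "order_id,tshirt_price\n"
    "1,19.99\n"
    "2,24.50\n"
)

# No dtype hints: pandas infers numeric types from the sampled values.
df = pd.read_csv(csv_data)
print(df.dtypes["tshirt_price"])  # float64

# Forcing strings mirrors the all-string default for uploaded CSVs.
csv_data.seek(0)
df_str = pd.read_csv(csv_data, dtype=str)
print(df_str.dtypes["tshirt_price"])  # object
```

Because inference only sees a sample of rows, a column that looks numeric early on may still contain non-numeric values further down, which is why the inferred types should always be checked against the actual data.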
Create a Sync recipe
With the updated storage types, let’s sync the uploaded dataset to the database.
From the orders dataset, open the Actions sidebar.
From the menu of visual recipes, select Sync.
If necessary, adjust the storage location of the output to your SQL database (in this example, a Snowflake connection).
Click Create Recipe.
On the Configuration tab of the recipe, click Run.
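Conceptually, the Sync recipe copies the input dataset's rows into a database table, mapping each column's storage type to a similar database type. A minimal stand-in for that operation, using Python's built-in sqlite3 with hypothetical table and column names (Dataiku handles this for you; this is illustration only):

```python
import sqlite3

# Rows standing in for the synced orders dataset.
rows = [(1, "2023-01-05", "White M"), (2, "2023-01-06", "Black L")]

conn = sqlite3.connect(":memory:")

# Column types in the target table correspond to the input storage types.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, tshirt_type TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 2
```

The key point is that no transformation happens along the way: the output table is a typed copy of the input dataset.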
Move data to a database with a Prepare recipe
In cases when you have some preprocessing to do prior to loading your data into a database, you can use a Prepare recipe instead of a Sync recipe.
Create a Prepare recipe
Let’s start by creating a Prepare recipe.
From the Flow, select the customers_stacked dataset.
From the menu of visual recipes in the Actions sidebar, select Prepare.
If necessary, adjust the storage location of the output to your SQL database.
Click Create Recipe.
Add data preparation steps
The actual data preparation steps are not important for our purposes; any number of steps would do. As an example:
From the birthdate column header dropdown, select Parse date.
In the smart date dialog, keep the default format selection, and click Use Date Format.
In the step settings on the left, clear the output column to parse the column in place.
At the bottom left, click Run.
The Prepare recipe operations are run using the DSS engine, and the data are then pushed into the database. See the reference documentation to learn more about how the Prepare recipe creates the schema of the output dataset with respect to choosing storage types.
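The "parse the date in place, then load" pattern can be sketched with pandas and sqlite3 as stand-ins for the Prepare recipe and the target database (the customer data and column names are invented for illustration):

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the customers_stacked dataset.
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "birthdate": ["1985-03-12", "1992-11-30"],
})

# Parse in place: the parsed values replace the original string column,
# analogous to clearing the output column in the Prepare step settings.
customers["birthdate"] = pd.to_datetime(customers["birthdate"], format="%Y-%m-%d")

# Push the prepared data into the database.
conn = sqlite3.connect(":memory:")
customers.to_sql("customers_prepared", conn, index=False)

loaded = pd.read_sql("SELECT * FROM customers_prepared", conn)
print(len(loaded))  # 2
```

As in the recipe, the preparation runs first on the engine side, and only the transformed, typed result lands in the database table.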
Congratulations! You’ve successfully transferred data to a database using two different visual methods.
Once you are comfortable moving data into databases, you’ll want to understand how to use the in-database engine for other operations, such as data transformation and visualization.