Tutorial | Data transfer with the Sync recipe (SQL part 1)
Once you have configured a connection, you’ll want to learn about visual methods for moving data to a database. We’ll start with the Sync recipe.
In this tutorial, you will:
infer storage types from an uploaded dataset;
move it to a SQL database via a Sync recipe.
The first step is to create a new project.
From the Dataiku homepage, click +New Project > DSS tutorials > Developer > SQL in Dataiku.
You can also download the starter project from this website and import it as a zip file.
In the Flow, you see the uploaded Haiku T-Shirt orders and customer data. In addition, the labeled and unlabeled customer data have been stacked into a single dataset. Let’s get the uploaded data into the SQL database.
A screencast below walks through the actions described here.
Let’s start by opening the orders dataset. It is a CSV file that has been uploaded into Dataiku. CSV files carry no type information, so the columns of this dataset are not yet typed, and Dataiku assigns every column the string storage type by default.
However, when we sync this dataset to the database, we want pages_visited, tshirt_price, and tshirt_quantity to have integer, double, and integer storage types, respectively.
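To see why every column starts out as a string, here is a minimal sketch using Python's standard csv module, outside of Dataiku. The sample rows are hypothetical; only the three column names come from this tutorial.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the tutorial.
raw = io.StringIO(
    "pages_visited,tshirt_price,tshirt_quantity\n"
    "5,19.0,2\n"
    "12,23.5,1\n"
)

rows = list(csv.DictReader(raw))

# A CSV parser returns text for every field, even when it looks numeric.
value_types = {name: type(value).__name__ for name, value in rows[0].items()}
print(value_types)
```

Each value comes back as a Python str, which mirrors why an uploaded CSV needs its storage types set before it is synced to a typed database table.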
The Sync recipe maps each storage type in the input dataset to a similar type in the output database, so let’s first set the storage types in the input dataset. One way to handle this is to infer the storage types from the data and save the updated schema.
Note that type inference is performed against a sample of the data, so you should check that the inferred types match your actual data.
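The caveat about sampling can be illustrated with a small sketch in plain Python. This is not Dataiku's inference logic. The functions infer_storage_type and infer_schema, the sample_size parameter, and the rows are all hypothetical, and the labels integer, double, and string simply reuse the wording of this tutorial.

```python
from itertools import islice


def infer_storage_type(values):
    """Return the narrowest label (integer, double, string) fitting all values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    for label, parse in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                parse(v)
            return label
        except ValueError:
            continue  # this type does not fit; try the next, wider one
    return "string"


def infer_schema(rows, sample_size):
    """Infer one label per column from the first sample_size rows only."""
    sample = list(islice(rows, sample_size))
    if not sample:
        return {}
    return {
        column: infer_storage_type([row[column] for row in sample])
        for column in sample[0]
    }


# Hypothetical rows: the third one holds a non-numeric pages_visited value.
rows = [
    {"pages_visited": "5", "tshirt_price": "19.0", "tshirt_quantity": "2"},
    {"pages_visited": "12", "tshirt_price": "23.5", "tshirt_quantity": "1"},
    {"pages_visited": "unknown", "tshirt_price": "17", "tshirt_quantity": "3"},
]

sampled = infer_schema(rows, sample_size=2)
full = infer_schema(rows, sample_size=3)
print(sampled["pages_visited"], full["pages_visited"])
```

With a sample of two rows, pages_visited looks like an integer column. Once the third row is included, it can only be stored as a string. This is the kind of mismatch worth checking for after inferring types from a sample.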
Open the Settings tab of the orders dataset.
In the Schema subtab, click Check Now to confirm the schema is consistent.
Then click Infer types from data, confirm, and then save your dataset.
Return to the Explore tab to confirm the new storage types.
With the updated storage types, let’s sync the dataset to the database.
From the orders dataset, select Sync from the Actions sidebar.
Leave the default dataset name of
Store the new dataset into a SQL connection. In the video below, we use the PostgreSQL_tshirt connection.
Create and run the recipe.
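Conceptually, the result of these steps is a typed table in the database that holds a copy of the input rows. The sketch below imitates that outcome with Python's built-in sqlite3 module as a stand-in for the SQL connection. It is not how the Sync recipe is implemented. The sync_to_sql function, the orders_demo table name, the type mapping, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Assumed mapping from the tutorial's storage types to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "double": "REAL", "string": "TEXT"}
# Python casts used to convert the CSV text before inserting it.
CASTS = {"integer": int, "double": float, "string": str}


def sync_to_sql(conn, table, schema, rows):
    """Create a typed table from schema and copy the rows into it."""
    columns = list(schema)
    column_defs = ", ".join(f'"{c}" {SQL_TYPES[schema[c]]}' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(CASTS[schema[c]](row[c]) for c in columns) for row in rows],
    )
    conn.commit()


# Storage types as requested in the tutorial; the row values are hypothetical.
schema = {
    "pages_visited": "integer",
    "tshirt_price": "double",
    "tshirt_quantity": "integer",
}
rows = [
    {"pages_visited": "5", "tshirt_price": "19.0", "tshirt_quantity": "2"},
    {"pages_visited": "12", "tshirt_price": "23.5", "tshirt_quantity": "1"},
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real connection
sync_to_sql(conn, "orders_demo", schema, rows)

stored = conn.execute('SELECT * FROM "orders_demo"').fetchall()
declared = [(r[1], r[2]) for r in conn.execute('PRAGMA table_info("orders_demo")')]
print(stored)
print(declared)
```

Because the schema was set before the copy, the table's columns are declared as numeric types instead of text, which is the reason for inferring storage types before running the Sync recipe.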
To review these steps, please see the video below.