Tutorial | Data transfer with the Prepare recipe (SQL part 2)

The Sync recipe is convenient when you need to copy an existing dataset into a database without any data preparation. When you need to do some preprocessing before loading your local data into a database, you can use a Prepare recipe instead.

In this tutorial, you will:

  • perform basic preprocessing steps with the Prepare recipe;

  • send the output to a SQL database.

Starting here?

If you skipped the previous sections, you need to:

  • Configure a SQL connection in Dataiku if one is not already available. This tutorial uses a PostgreSQL connection named PostgreSQL_tshirt.

  • Create the project (+New Project > DSS tutorials > Developer > SQL in Dataiku).

Create a Prepare recipe

  • From the Flow, select the customers_stacked dataset.

  • Choose Prepare from the list of visual recipes in the actions sidebar.

  • Leave customers_stacked_prepared as the default dataset name.

  • Choose to store the new dataset into an available SQL connection.

  • Click Create Recipe.

Add data preparation steps

Let’s add a few basic preparation steps. See the screencast below for full details. In summary, we:

  • Parse birthdate.

  • Classify the user_agent column, keeping the resulting user_agent_brand and user_agent_os columns.

  • Resolve the GeoIP of the ip_address column, keeping the resulting ip_address_country and ip_address_geopoint columns.

  • Click Run.
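Outside of Dataiku, the same three steps can be sketched in pandas. The user-agent rules and the GeoIP lookup table below are simplified, hypothetical stand-ins for the DSS processors, which use a full user-agent parser and a bundled GeoIP database:

```python
import pandas as pd

# Toy sample with the column names used in this tutorial.
df = pd.DataFrame({
    "birthdate": ["1990-05-12", "1985-11-03"],
    "user_agent": [
        "Mozilla/5.0 (Windows NT 6.1) Chrome/33.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9) Safari/537.36",
    ],
    "ip_address": ["203.0.113.10", "198.51.100.7"],
})

# 1. Parse birthdate into a proper date type.
df["birthdate"] = pd.to_datetime(df["birthdate"])

# 2. Classify the user agent (simplified string rules, not the DSS parser).
def classify_os(ua):
    if "Windows" in ua:
        return "Windows"
    if "Mac OS X" in ua:
        return "Mac OS X"
    return "Other"

df["user_agent_os"] = df["user_agent"].map(classify_os)
df["user_agent_brand"] = df["user_agent"].map(
    lambda ua: "Chrome" if "Chrome" in ua
    else "Safari" if "Safari" in ua
    else "Other"
)

# 3. Resolve GeoIP (hypothetical lookup table; DSS resolves against a real
#    GeoIP database and can also produce a geopoint column).
GEOIP = {"203.0.113.10": "France", "198.51.100.7": "United States"}
df["ip_address_country"] = df["ip_address"].map(GEOIP)
```

In the visual recipe, each of these is a single step added from the processors library rather than hand-written code.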

The Prepare recipe operations are run in the DSS engine, and the data are then pushed into the PostgreSQL database. The Prepare recipe infers the storage type of each column from a sample of the data, so you typically don’t need to make any manual adjustments.
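The final load step can be sketched with pandas' `to_sql`. An in-memory SQLite database stands in for the PostgreSQL connection here so the example is self-contained; with SQLAlchemy, the same call would target a `postgresql://` URL instead:

```python
import sqlite3
import pandas as pd

# Prepared data with the storage types already inferred (dates, strings, ints).
df = pd.DataFrame({
    "customer_id": [1, 2],
    "birthdate": pd.to_datetime(["1990-05-12", "1985-11-03"]),
    "ip_address_country": ["France", "United States"],
})

# SQLite stand-in for the PostgreSQL connection; to_sql maps each pandas
# dtype to a corresponding SQL column type when it creates the table.
conn = sqlite3.connect(":memory:")
df.to_sql("customers_stacked_prepared", conn, index=False)

rows = conn.execute(
    "SELECT customer_id, ip_address_country FROM customers_stacked_prepared"
).fetchall()
```

In DSS none of this is written by hand; the engine performs the equivalent inserts against the configured connection when the recipe runs.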

What’s next?

Once you are comfortable moving data into databases, you’ll want to explore using SQL code recipes to run SQL queries in-database.