Hands-On Tutorial: Prepare Recipe for Loading a Database

Note

This lesson is part of a series of “Usage of SQL and Dataiku tutorials” that begins with the Hands-On Tutorial: Sync Recipe.

Tip

This content is also included in a free Dataiku Academy course, Dataiku & SQL, which is an optional part of the Core Designer learning path. Register for the course if you prefer to track and validate your progress alongside concept videos, text summaries, hands-on tutorials, and quizzes.

The Sync recipe is convenient when you need to copy an existing dataset into a database without any data preparation. When you need to preprocess your local data before loading it into a database, use a Prepare recipe instead.

  • From the Flow, select the customers_stacked dataset.

  • Choose Prepare from the list of visual recipes in the Actions sidebar.

  • Leave customers_stacked_prepared as the default dataset name.

  • Choose to store the new dataset in an available SQL connection.

  • Click Create Recipe.

Let’s take a few basic preparation steps. See the screencast below for the full details. In summary:

  • Parse birthdate.

  • Classify the user_agent column, keeping the resulting user_agent_brand and user_agent_os columns.

  • Resolve the GeoIP of the ip_address column, keeping the resulting ip_address_country and ip_address_geopoint columns.
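To make the first two steps more concrete, here is a minimal sketch in plain Python of what date parsing and user-agent classification might look like conceptually. It is not how the Prepare recipe is implemented: the candidate date formats, the substring rules, and the function names `parse_birthdate` and `classify_user_agent` are all assumptions made for illustration. The GeoIP step is left out because it depends on an external GeoIP database.

```python
from datetime import datetime

# Hypothetical candidate formats; the real birthdate column may use another one.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_birthdate(value, formats=CANDIDATE_FORMATS):
    """Return the date as an ISO string, or None if no format matches."""
    if not value:
        return None
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def classify_user_agent(user_agent):
    """Rough substring-based classification, for illustration only."""
    ua = user_agent.lower()

    # Order matters: Chrome user agents also mention Safari,
    # and Edge user agents also mention Chrome.
    if "edg/" in ua:
        brand = "Edge"
    elif "firefox" in ua:
        brand = "Firefox"
    elif "chrome" in ua:
        brand = "Chrome"
    elif "safari" in ua:
        brand = "Safari"
    else:
        brand = "Other"

    # Android user agents also mention Linux; iPhone ones mention Mac OS X.
    if "android" in ua:
        os_name = "Android"
    elif "iphone" in ua or "ipad" in ua:
        os_name = "iOS"
    elif "windows" in ua:
        os_name = "Windows"
    elif "mac os x" in ua:
        os_name = "MacOS"
    elif "linux" in ua:
        os_name = "Linux"
    else:
        os_name = "Other"

    return {"user_agent_brand": brand, "user_agent_os": os_name}
```

For example, `parse_birthdate("03/12/1985")` yields an ISO-formatted date string, and a typical desktop Chrome user agent is classified with a brand of `Chrome`. In the tutorial itself, the built-in processors handle these cases for you.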

Click Run. The Prepare recipe’s operations run in the DSS engine, and the resulting data is then pushed into the PostgreSQL database. The Prepare recipe infers the storage type of each column from a sample, so you typically don’t have to make any manual adjustments.
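The idea of inferring a storage type from a sample can be sketched as follows. This is only an illustration of the general technique, not the logic the Prepare recipe actually uses: the function name `infer_storage_type`, the type labels, and the single ISO date format are assumptions.

```python
from datetime import datetime


def _all_parse(values, parser):
    """Return True if parser accepts every value without raising ValueError."""
    for v in values:
        try:
            parser(v)
        except ValueError:
            return False
    return True


def _parse_iso_date(v):
    # Assumes dates are written as YYYY-MM-DD.
    return datetime.strptime(v, "%Y-%m-%d")


def infer_storage_type(sample_values):
    """Pick the narrowest type label that fits every non-empty sampled value."""
    values = [v.strip() for v in sample_values if v and v.strip()]
    if not values:
        return "string"
    if _all_parse(values, int):
        return "bigint"
    if _all_parse(values, float):
        return "double"
    if _all_parse(values, _parse_iso_date):
        return "date"
    return "string"


# Hypothetical sampled column values.
print(infer_storage_type(["1", "2", ""]))               # bigint
print(infer_storage_type(["1.5", "2"]))                 # double
print(infer_storage_type(["1985-03-12", "1990-01-01"])) # date
print(infer_storage_type(["a", "1"]))                   # string
```

Because any approach of this kind only looks at a sample, a value outside the sample could in principle fail to fit the inferred type, which is one reason to review the output schema if a load into the database fails.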