Concept: Join Recipe

Tip

This content is also included in the free Dataiku Academy course, Basics 103, which is part of the Core Designer learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.

The primary use case for the Join recipe is to enrich one dataset with columns from another. DSS matches values using a key column that is common to both datasets. The Left join is a common join type for data enrichment. It keeps all the records in your main dataset, regardless of whether there is a match in the enrichment dataset.
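To make the Left join behavior concrete, here is a minimal sketch in plain Python. It is not how DSS implements the recipe. The `orders` and `customers` datasets, their columns, and the `left_join` helper are all hypothetical, and the sketch assumes at most one enrichment row per key.

```python
# Hypothetical data: "orders" is the main dataset, "customers" the enrichment dataset.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5},
    {"order_id": 3, "customer_id": "c9", "amount": 12.0},  # no matching customer
]
customers = [
    {"customer_id": "c1", "country": "FR"},
    {"customer_id": "c2", "country": "US"},
]


def left_join(left, right, key):
    """Keep every left row; add right-hand columns where the key matches."""
    # Index the right rows by key (assumes one right row per key value).
    index = {row[key]: row for row in right}
    right_columns = [col for col in right[0] if col != key]
    joined = []
    for row in left:
        match = index.get(row[key])
        # Unmatched rows are kept, with None in the enrichment columns.
        extra = {col: (match[col] if match else None) for col in right_columns}
        joined.append({**row, **extra})
    return joined


enriched = left_join(orders, customers, "customer_id")
for row in enriched:
    print(row)
```

All three orders appear in `enriched`. The third one has `country` set to `None` because customer `c9` is missing from the enrichment dataset.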

While the default join type is a Left join, you can set the join type that best fits your use case.

../../../_images/join-join-types.png
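The effect of the join type can be sketched with Python's built-in `sqlite3` module. This is only an illustration on hypothetical `orders` and `customers` tables, comparing a Left join with an Inner join. The other join types the recipe offers are not shown.

```python
import sqlite3

# In-memory database with two small hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id TEXT);
    CREATE TABLE customers (customer_id TEXT, country TEXT);
    INSERT INTO orders VALUES (1, 'c1'), (2, 'c2'), (3, 'c9');
    INSERT INTO customers VALUES ('c1', 'FR'), ('c2', 'US'), ('c5', 'DE');
""")


def run(join_type):
    """Join orders to customers with the given join type (LEFT or INNER)."""
    query = f"""
        SELECT o.order_id, o.customer_id, c.country
        FROM orders o {join_type} JOIN customers c
          ON o.customer_id = c.customer_id
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()


left_rows = run("LEFT")    # keeps order 3, with NULL (None) for country
inner_rows = run("INNER")  # drops order 3, which has no matching customer

print(left_rows)
print(inner_rows)
```

The Left join returns three rows and the Inner join returns two, which is the difference to keep in mind when choosing a join type for enrichment.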

You can always change the detected key column by selecting your own columns to match on and setting the conditions.

../../../_images/join-join-conditions.png
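As a rough sketch of what custom join conditions express, the plain-Python example below matches rows on two column pairs whose names differ between the datasets. The `sales` and `targets` datasets, the `conditions` list, and the `left_join_on` helper are hypothetical. Only equality conditions are covered, and at most one right-hand row per key combination is assumed.

```python
# Hypothetical datasets whose key columns have different names.
sales = [
    {"store": "A", "year": 2023, "revenue": 100},
    {"store": "A", "year": 2024, "revenue": 120},
    {"store": "B", "year": 2024, "revenue": 80},
]
targets = [
    {"shop": "A", "yr": 2024, "target": 110},
    {"shop": "B", "yr": 2024, "target": 90},
]

# Each condition pairs a left column with a right column; all must match.
conditions = [("store", "shop"), ("year", "yr")]


def left_join_on(left, right, conditions, keep):
    """Left join on several equality conditions, adding only the `keep` columns."""
    index = {}
    for rrow in right:
        key = tuple(rrow[right_col] for _, right_col in conditions)
        index[key] = rrow
    joined = []
    for lrow in left:
        key = tuple(lrow[left_col] for left_col, _ in conditions)
        match = index.get(key)
        extra = {col: (match[col] if match else None) for col in keep}
        joined.append({**lrow, **extra})
    return joined


joined = left_join_on(sales, targets, conditions, keep=["target"])
for row in joined:
    print(row)
```

The 2023 row for store A has no target, so its `target` value is `None`. The two 2024 rows pick up their targets.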

In the Selected columns step, you can tell DSS which columns you want to see in the output dataset.

../../../_images/join-selected-columns.png

There are a few other options, including Pre-filters, which let you keep or drop rows from the input datasets based on your criteria before they are joined.

../../../_images/join-pre-filter.png

In the Post-filter step, you can tell DSS whether duplicate rows are allowed in the output, and you can keep only the joined rows that match a condition.

../../../_images/join-post-filter.png
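Filtering before the join and filtering after it can give different results, especially with a Left join. The sketch below illustrates this with Python's built-in `sqlite3` module. The tables and the `country = 'FR'` condition are hypothetical, and the queries are not the SQL that DSS generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL);
    CREATE TABLE customers (customer_id TEXT, country TEXT);
    INSERT INTO orders VALUES (1, 'c1', 20.0), (2, 'c2', 35.5), (3, 'c9', 12.0);
    INSERT INTO customers VALUES ('c1', 'FR'), ('c2', 'US');
""")

# Pre-filter: restrict the enrichment input before the join runs.
pre_filtered = conn.execute("""
    SELECT o.order_id, c.country
    FROM orders o
    LEFT JOIN (SELECT * FROM customers WHERE country = 'FR') c
      ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# Post-filter: restrict the joined rows after the join has run.
post_filtered = conn.execute("""
    SELECT o.order_id, c.country
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.country = 'FR'
    ORDER BY o.order_id
""").fetchall()

print(pre_filtered)   # all three orders; only order 1 is enriched
print(post_filtered)  # only order 1 remains
```

With the pre-filter, every order survives and only the matching one gets a country. With the post-filter, the unmatched and non-French rows are removed from the output.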

Finally, you can use the Output step to review the execution specs, such as the generated SQL query and the execution plan.

../../../_images/join-output.png

There are many reasons to use joins when building a Flow. In the following hands-on lesson, you can practice using the Join recipe to enrich the customers dataset.