Concept: Join Recipe
Tip
This content is also included in the free Dataiku Academy course, Basics 103, which is part of the Core Designer learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.
The primary use case for the Join recipe is to enrich one dataset with columns from another. DSS matches values using a key column that is common to both datasets. The Left join is a common join type for data enrichment. It keeps all the records in your main dataset, regardless of whether there is a match in the enrichment dataset.
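To make the Left join behavior concrete, here is a minimal sketch in plain Python. It is not how DSS implements the recipe. The customers and loyalty data, the column names, and the left_join helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample data; column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Linus"},
]
loyalty = [
    {"customer_id": 1, "tier": "gold"},
    {"customer_id": 3, "tier": "silver"},
]


def left_join(left, right, key):
    """Keep every row of `left`; add columns from `right` where `key` matches."""
    # Index the enrichment rows by key value for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Columns contributed by the enrichment dataset (everything but the key).
    extra_cols = [c for c in right[0] if c != key] if right else []

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **{c: match[c] for c in extra_cols}})
        else:
            # No match: keep the row and leave the new columns empty.
            joined.append({**row, **{c: None for c in extra_cols}})
    return joined


enriched = left_join(customers, loyalty, "customer_id")
for row in enriched:
    print(row)
```

All three customers appear in the output. The customer without a loyalty record keeps its row and gets an empty tier value.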
While the default join type is a Left join, you can set the join type that best fits your use case.
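As a rough illustration of how the join type changes the output, the sketch below runs a left join and an inner join over the same two tables, using Python's built-in sqlite3 module. The table names, columns, and the run_join helper are assumptions made for this example, and the SQL is hand-written rather than generated by DSS.

```python
import sqlite3

# In-memory database with hypothetical sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE loyalty (customer_id INTEGER, tier TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO loyalty VALUES (1, 'gold'), (3, 'silver'), (4, 'bronze');
    """
)


def run_join(join_type):
    """Run the same join with a different join type (hypothetical helper)."""
    # join_type only ever comes from the fixed strings used below.
    query = f"""
        SELECT c.customer_id, c.name, l.tier
        FROM customers c
        {join_type} JOIN loyalty l ON c.customer_id = l.customer_id
        ORDER BY c.customer_id
    """
    return conn.execute(query).fetchall()


left_rows = run_join("LEFT")    # every customer, matched or not
inner_rows = run_join("INNER")  # only customers with a loyalty record

print(len(left_rows), "rows with a left join")
print(len(inner_rows), "rows with an inner join")
```

The left join returns one row per customer, while the inner join drops the customer that has no match. The loyalty record with no matching customer appears in neither result.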
You can always change the detected key column by selecting your own columns to match on and setting the conditions.
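The idea of choosing your own match columns and conditions can be sketched as follows. This is an illustrative example in plain Python, not DSS code. It assumes two hypothetical datasets whose key columns have different names, and a hypothetical left_join_on helper that requires every listed column pair to be equal.

```python
# Hypothetical sample data; the key columns have different names on each side.
customers = [
    {"id": 1, "country": "FR", "name": "Ada"},
    {"id": 1, "country": "US", "name": "Alan"},
    {"id": 2, "country": "FR", "name": "Grace"},
]
orders = [
    {"cust_id": 1, "order_country": "FR", "total": 40.0},
    {"cust_id": 2, "order_country": "US", "total": 15.5},
]

# Each pair is (left column, right column); all pairs must be equal to match.
conditions = [("id", "cust_id"), ("country", "order_country")]


def left_join_on(left, right, conditions):
    """Left join where a match requires every condition pair to be equal."""
    right_cols = list(right[0]) if right else []
    joined = []
    for l_row in left:
        matches = [
            r_row
            for r_row in right
            if all(l_row[lc] == r_row[rc] for lc, rc in conditions)
        ]
        if matches:
            joined.extend({**l_row, **m} for m in matches)
        else:
            # No match on all conditions: keep the left row, empty right columns.
            joined.append({**l_row, **{c: None for c in right_cols}})
    return joined


result = left_join_on(customers, orders, conditions)
for row in result:
    print(row)
```

Only the first customer matches on both conditions. The other two rows are kept with empty order columns, because matching on the ID alone is not enough here.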
In the Selected columns step, you can tell DSS which columns you want to see in the output dataset.
There are a few other options, including Pre-filters, which allow you to keep or drop rows based on your criteria.
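A pre-filter can be thought of as a row filter applied to an input before the join runs. The sketch below is a simplified plain-Python illustration under assumed data: the active flag, the column names, and the one-order-per-customer shortcut are all hypothetical.

```python
# Hypothetical sample data; the "active" flag is an assumed column.
customers = [
    {"customer_id": 1, "name": "Ada", "active": True},
    {"customer_id": 2, "name": "Grace", "active": False},
    {"customer_id": 3, "name": "Linus", "active": True},
]
orders = [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 2, "total": 15.5},
]

# Pre-filter: keep only active customers before the join happens.
active_customers = [c for c in customers if c["active"]]

# Simplification: assume at most one order per customer, so a dict lookup works.
totals = {o["customer_id"]: o["total"] for o in orders}

# Left join of the filtered customers with their order total.
joined = [{**c, "total": totals.get(c["customer_id"])} for c in active_customers]

for row in joined:
    print(row)
```

Because the filter runs first, the inactive customer never reaches the join, even though a matching order exists.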
You can use the Post-filter step to tell DSS whether duplicate rows are allowed, and to keep only the rows that match a condition.
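The two post-filter ideas, removing duplicate rows and keeping only rows that satisfy a condition, can be sketched on an already joined output. This is a plain-Python illustration with made-up rows. The drop_duplicates helper and the threshold of 20 are hypothetical and do not reflect DSS internals.

```python
# Hypothetical output of a left join, before any post-filtering.
joined = [
    {"customer_id": 1, "name": "Ada", "total": 40.0},
    {"customer_id": 1, "name": "Ada", "total": 40.0},    # exact duplicate
    {"customer_id": 2, "name": "Grace", "total": None},  # unmatched row
    {"customer_id": 3, "name": "Linus", "total": 15.5},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        # Column names are unique, so sorting the items orders by name only.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


deduplicated = drop_duplicates(joined)

# Post-filter condition: keep rows whose order total is at least 20.
filtered = [
    row for row in deduplicated
    if row["total"] is not None and row["total"] >= 20
]

print(len(deduplicated), "rows after removing duplicates")
print(filtered)
```

Note that a condition on an enrichment column also removes the unmatched rows that a left join preserved, since their value is empty.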
Finally, you can use the Output step to review the execution specs, such as the generated SQL query and the execution plan.
There are many reasons to use joins when building a Flow. In the following hands-on lesson, you can practice using the Join recipe to enrich the customers dataset.