Concept | Join recipe
The primary use case for the Join recipe is to enrich one dataset with columns from another. Dataiku matches values using a key column that is common to both datasets. The Left join is a common join type for data enrichment. It keeps all the records in your main dataset, regardless of whether there is a match in the enrichment dataset.
While the default join type is a Left join, you can set the join type that best fits your use case.
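To make the difference between join types concrete, here is a minimal sketch in plain Python rather than in Dataiku itself. The sample rows, the column names, and the `join` helper are all hypothetical and only illustrate the matching logic.

```python
# Hypothetical sample data; the dataset and column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"customer_id": 1, "total": 40},
    {"customer_id": 1, "total": 15},
    {"customer_id": 3, "total": 99},
]


def join(left, right, key, how="left"):
    """Join two lists of dicts on a shared key column ("left" or "inner")."""
    # Index the right-hand rows by key so that each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Columns contributed by the right dataset, used to pad unmatched rows.
    right_columns = [c for c in (right[0] if right else {}) if c != key]

    output = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            # One output row per matching right-hand row.
            for match in matches:
                output.append({**row, **match})
        elif how == "left":
            # Left join: keep the unmatched row and fill new columns with None.
            output.append({**row, **{c: None for c in right_columns}})
    return output


left_rows = join(customers, orders, "customer_id", how="left")
inner_rows = join(customers, orders, "customer_id", how="inner")
```

In this sketch, the left join keeps the customer with no orders and leaves `total` empty for that row, while the inner join drops that customer entirely.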
You can always change the detected key column by selecting your own columns to match on and setting the conditions in the Join step.
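As a rough illustration of matching on your own columns and conditions, the sketch below joins on two columns with a case-insensitive comparison. It is plain Python, not Dataiku configuration; the `left_join_on` helper, the `same_person` condition, and the sample data are assumptions made for the example.

```python
def left_join_on(left, right, condition):
    """Left join where condition(left_row, right_row) decides a match."""
    output = []
    for l_row in left:
        matches = [r_row for r_row in right if condition(l_row, r_row)]
        if matches:
            for r_row in matches:
                # Prefix right-hand columns to avoid clashing column names.
                prefixed = {f"right_{k}": v for k, v in r_row.items()}
                output.append({**l_row, **prefixed})
        else:
            # Unmatched left rows are kept as they are (no right-hand columns).
            output.append(dict(l_row))
    return output


# Hypothetical datasets that share two name columns with inconsistent casing.
customers = [
    {"first": "Ana", "last": "Diaz", "country": "ES"},
    {"first": "Ben", "last": "Okoye", "country": "NG"},
]
loyalty = [
    {"first": "ana", "last": "DIAZ", "tier": "gold"},
]


def same_person(l_row, r_row):
    """Match on both name columns, ignoring case."""
    return (
        l_row["first"].lower() == r_row["first"].lower()
        and l_row["last"].lower() == r_row["last"].lower()
    )


result = left_join_on(customers, loyalty, same_person)
```

Passing the condition as a function keeps the join logic separate from the matching rule, which loosely mirrors how the key columns and their conditions are set independently of the join type.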
When performing a left, right, or inner join, you can add a dataset to capture any unmatched rows.
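The idea of capturing unmatched rows can be sketched as a join that returns two outputs. This is a simplified illustration in plain Python with hypothetical data; it collects only unmatched rows from the left dataset, and it does not claim to reproduce which side Dataiku captures for each join type.

```python
def inner_join_with_unmatched(left, right, key):
    """Inner join that also returns the left rows that found no match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined, unmatched = [], []
    for row in left:
        matches = index.get(row[key])
        if matches:
            joined.extend({**row, **match} for match in matches)
        else:
            # Send the row to a second output instead of discarding it.
            unmatched.append(row)
    return joined, unmatched


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [{"customer_id": 1, "total": 40}]

joined, unmatched = inner_join_with_unmatched(customers, orders, "customer_id")
```

Keeping the unmatched rows in their own output makes it easy to check why they failed to match, for example because of a missing or misspelled key.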
In the Selected columns step, you can tell Dataiku which columns you want to see in the output dataset.
There are a few other options, including the Pre-filter step, which lets you keep or drop rows based on your criteria. This can be useful for removing unwanted rows from large datasets before joining them.
You can use the Post-filter step to filter the results of the Join operation before writing the output dataset. For example, you can tell Dataiku whether duplicate rows are allowed and whether to return only rows that match a condition.
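The filtering and column selection steps can be pictured as stages around the join. The following is a minimal sketch in plain Python under stated assumptions: the data, the `active` flag, and the `total >= 10` condition are hypothetical, and the stage order simply follows the order in which the steps are described here.

```python
# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Ana", "active": True},
    {"customer_id": 2, "name": "Ben", "active": False},
    {"customer_id": 3, "name": "Cy", "active": True},
]
orders = [
    {"customer_id": 1, "total": 40},
    {"customer_id": 1, "total": 40},  # duplicate order row
    {"customer_id": 3, "total": 5},
]

# Pre-filter: drop unwanted rows before the join so that less data is matched.
active_customers = [c for c in customers if c["active"]]

# Join: left join on customer_id.
orders_by_customer = {}
for order in orders:
    orders_by_customer.setdefault(order["customer_id"], []).append(order)

joined = []
for customer in active_customers:
    matches = orders_by_customer.get(customer["customer_id"], [{"total": None}])
    for match in matches:
        joined.append({**customer, **match})

# Selected columns: keep only the columns wanted in the output.
selected = [{"name": row["name"], "total": row["total"]} for row in joined]

# Post-filter: keep only rows that match a condition on the joined result.
filtered = [
    row for row in selected
    if row["total"] is not None and row["total"] >= 10
]

# Post-filter: remove duplicate rows while preserving order.
seen = set()
output = []
for row in filtered:
    fingerprint = tuple(row.items())
    if fingerprint not in seen:
        seen.add(fingerprint)
        output.append(row)
```

Here the pre-filter removes the inactive customer before any matching happens, while the post-filter and duplicate removal act on the joined rows.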
Finally, you can use the Output step to review the execution specs, for example the generated SQL query and execution plan.
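For readers unfamiliar with what a join query and its execution plan look like, here is a small sketch using Python's built-in `sqlite3` module. The tables and the query are hypothetical, and this is not the SQL that Dataiku generates; the actual query and plan depend on your datasets and execution engine.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 40);
    """
)

# A left join written by hand: every customer is kept, matched or not.
query = """
    SELECT c.customer_id, c.name, o.total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
"""

rows = conn.execute(query).fetchall()

# Ask SQLite how it intends to run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

conn.close()
```

The customer without an order comes back with `None` in the `total` position, and `plan` holds SQLite's description of how it scans the two tables.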
There are a lot of reasons to use joins when building a Flow.
In this hands-on tutorial, you can practice using the Join recipe to enrich the customers dataset.