Concept | Join recipe

The primary use case for the Join recipe is to enrich one dataset with columns from another. Dataiku matches rows using a key column that is common to both datasets. The Left join is a common join type for data enrichment: it keeps all the records in your main dataset, whether or not there is a match in the enrichment dataset.

While the default join type is a Left join, you can set the join type that best fits your use case.

Screenshot showing the join types in Dataiku.
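As a rough illustration of how the join type changes the result, here is a minimal sketch in pandas rather than in Dataiku itself. The `customers` and `segments` datasets and their columns are hypothetical, and pandas is used only as an analogy for what the Join recipe computes.

```python
import pandas as pd

# Hypothetical main dataset and enrichment dataset (not from the article).
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cho"]}
)
segments = pd.DataFrame(
    {"customer_id": [1, 3], "segment": ["gold", "silver"]}
)

# Left join: every row of the main dataset is kept.
# Customers without a match get a missing value in the enrichment column.
left = customers.merge(segments, on="customer_id", how="left")

# Inner join, for comparison: only rows with a match in both datasets are kept.
inner = customers.merge(segments, on="customer_id", how="inner")

print(left)
print(inner)
```

In this sketch the left join returns all three customers, while the inner join drops the customer that has no segment.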

You can always change the detected key column by selecting your own columns to match on and setting the conditions in the Join step.

Screenshot showing the join conditions dialog.
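Choosing your own key columns is comparable to naming the columns to match on explicitly. The sketch below uses pandas as an analogy, assuming two hypothetical datasets whose key columns have different names (`cust_id` and `customer_id`).

```python
import pandas as pd

# Hypothetical datasets whose key columns are named differently.
orders = pd.DataFrame({"order_id": [10, 11, 12], "cust_id": [1, 2, 2]})
customers = pd.DataFrame({"customer_id": [1, 2], "country": ["FR", "US"]})

# Match orders.cust_id against customers.customer_id explicitly,
# instead of relying on a column name shared by both datasets.
joined = orders.merge(
    customers, left_on="cust_id", right_on="customer_id", how="left"
)

print(joined)
```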

When performing a left, right, or inner join, you can add a dataset to capture any unmatched rows.

Screenshot showing the option to Send unmatched rows to other output dataset(s).
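The idea of routing unmatched rows to a separate output can be sketched in pandas with the `indicator` option of `merge`. This is only an analogy under stated assumptions: the datasets are hypothetical, the example mimics an inner join, and which rows count as unmatched in the Join recipe depends on the join type you selected.

```python
import pandas as pd

# Hypothetical datasets: customer 2 has no segment, segment 4 has no customer.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cho"]}
)
segments = pd.DataFrame(
    {"customer_id": [1, 3, 4], "segment": ["gold", "silver", "bronze"]}
)

# A full outer merge with indicator=True adds a "_merge" column that says
# whether each row came from the left side, the right side, or both.
outer = customers.merge(segments, on="customer_id", how="outer", indicator=True)

# Rows found in both datasets: what an inner join would write to the main output.
inner_output = outer[outer["_merge"] == "both"].drop(columns="_merge")

# Rows an inner join would drop, kept here as separate "unmatched" outputs.
unmatched_left = outer.loc[outer["_merge"] == "left_only", list(customers.columns)]
unmatched_right = outer.loc[outer["_merge"] == "right_only", list(segments.columns)]

print(inner_output)
print(unmatched_left)
print(unmatched_right)
```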

In the Selected columns step, you can tell Dataiku which columns you want to see in the output dataset.

Screenshot showing the Selected columns step of a Join recipe.
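Selecting columns for the output is similar in spirit to keeping only the columns you need from each input. A minimal pandas sketch, with hypothetical dataset and column names:

```python
import pandas as pd

# Hypothetical inputs with more columns than the output needs.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "name": ["Ana", "Ben"],
        "email": ["ana@example.com", "ben@example.com"],
    }
)
segments = pd.DataFrame(
    {"customer_id": [1, 2], "segment": ["gold", "silver"], "internal_score": [0.1, 0.9]}
)

# Keep only the columns wanted in the output; the key must stay on both sides.
output = customers[["customer_id", "name"]].merge(
    segments[["customer_id", "segment"]], on="customer_id", how="left"
)

print(output)
```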

There are a few other options, including Pre-filters, which allow you to keep or drop rows based on your criteria. This is useful for removing unwanted rows from large datasets before joining them.

Screenshot showing the pre-filter options in the Join recipe.
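The effect of a pre-filter can be pictured as filtering an input before the join runs, so fewer rows take part in the join. The pandas sketch below is an analogy only; the `orders` and `customers` datasets and the `status` condition are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: only paid orders are of interest.
orders = pd.DataFrame(
    {
        "order_id": [10, 11, 12],
        "customer_id": [1, 2, 3],
        "status": ["paid", "cancelled", "paid"],
    }
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "country": ["FR", "US", "DE"]}
)

# Pre-filter: drop unwanted rows from the input before joining.
paid_orders = orders[orders["status"] == "paid"]

# The join then only processes the rows that passed the filter.
joined = paid_orders.merge(customers, on="customer_id", how="left")

print(joined)
```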

You can use the Post-filter step to filter the results of the Join operation before writing the output dataset. For example, you can specify whether duplicate rows are allowed and whether to return only rows that match a condition.

Screenshot showing the post-filter options in Dataiku.
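By contrast, a post-filter acts on the joined result. As a sketch in pandas (an analogy, with hypothetical data and a hypothetical condition on `country`), removing duplicate rows and keeping only matching rows after the join might look like this:

```python
import pandas as pd

# Hypothetical inputs; the first two orders are exact duplicates.
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [50, 50, 20]})
customers = pd.DataFrame({"customer_id": [1, 2], "country": ["FR", "US"]})

# The join runs first, on the unfiltered inputs.
joined = orders.merge(customers, on="customer_id", how="left")

# Post-filter, part 1: remove duplicate rows from the joined result.
deduplicated = joined.drop_duplicates()

# Post-filter, part 2: keep only the rows that match a condition.
result = deduplicated[deduplicated["country"] == "FR"]

print(result)
```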

Finally, you can use the Output step to review the execution specs, such as the generated SQL query and the execution plan.

Screenshot showing the Output options.

What’s next?

There are a lot of reasons to use joins when building a Flow.

In the hands-on tutorial, you can practice using the Join recipe to enrich the customers dataset.