Learn more about many of the natively available visual recipes for data preparation.
Concepts & tutorials¶
- Concept | Recipes in Dataiku
- Concept | Group recipe
- Tutorial | Group the data (Core Designer part 6)
- Concept | Join recipe
- Tutorial | Enrich the dataset (Core Designer part 8)
- Concept | Distinct recipe
- Concept | Pivot recipe
- Tutorial | Pivot Recipe (Advanced Designer part 4)
- Tutorial | Excel-style pivot tables with the Pivot recipe
- Tutorial | Reshaping data from long to wide format with the Pivot recipe
- Concept | Filter recipe
- Concept | Sample recipe
- Concept | Sort recipe
- Concept | Split recipe
- Concept | Stack recipe
- Concept | Top N recipe
- Tutorial | Top N Recipe (Advanced Designer part 5)
- Concept | Window recipe
- Tutorial | Window recipe deep dive (Advanced Designer part 2)
- Tutorial | Window Recipe (Advanced Designer part 1)
- Tutorial | Fuzzy Join recipe
- Concept | Common recipe steps: Pre-filter, Post-filter & Computed columns
- Tutorial | In-database operations with visual recipes
How-to | Copy a recipe in the Flow¶
Do you have recipes that you want to re-use elsewhere in a project? You can copy recipes from the Flow for use in the same project.
From the Flow, click the recipe you want to copy, and a Copy action will appear in the Actions sidebar on the right.
You will be asked to choose on which dataset the recipe should be applied before the recipe is copied.
Note that, in many cases, you can avoid keeping multiple identical recipes up to date by first stacking your data with the Stack recipe, applying the shared recipes once, and then splitting the data again with the Split recipe. This is particularly helpful when managing training and testing datasets for machine learning. More information can be found on the Dataiku Academy in the Visual Recipes 101 course.
If you are looking to copy the steps of a Prepare recipe, you can use the method described above, but you also have the option of copying the steps themselves. Once the steps are copied to your clipboard, you can paste them into Prepare recipes in other projects. That process is described in detail in the reference documentation.
There are other methods of duplicating recipe steps or even entire projects:
How-to | Segment your data using statistical quantiles¶
You can create statistical quantiles without code in Dataiku in two ways:
- The Split recipe allows you to dispatch each quantile into a separate dataset, so it can be useful if you’re planning to handle a small number of quantiles separately, such as quartiles or deciles.
- The Window recipe allows you to create a new column containing the quantile value, which can be easier to set up for a large number of quantiles, such as centiles.
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. In the two examples below, let’s assume that you want to create quantiles based on a numerical column called “score”.
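To make the definition concrete, here is a minimal sketch in Python, outside of Dataiku, using the standard library's `statistics.quantiles`. The sample values are made up for illustration and simply stand in for the “score” column.

```python
import statistics

# Made-up sample values standing in for the "score" column.
scores = [55, 10, 90, 35, 70, 20, 80, 45]

# n=4 asks for quartiles, so three cut points are returned.
quartile_cuts = statistics.quantiles(scores, n=4)

# Count how many observations fall at or below each cut point.
counts_below = [sum(s <= cut for s in scores) for cut in quartile_cuts]
print(quartile_cuts, counts_below)
```

With eight values split into quartiles, each of the four intervals should hold two observations. This only illustrates the concept and says nothing about how Dataiku computes quantiles internally.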
Using a window recipe¶
Configure the Window recipe to order the rows by the score column, enable the window frame with no limits set, and, on the aggregations screen, retrieve all the existing columns and set the number of quantiles you want.
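For readers who think in code, the result of this configuration can be approximated with the following plain-Python sketch. It is not Dataiku's implementation: the function name `add_quantile_column`, the output column name `quantile`, and the sample rows are all hypothetical, and rows that land on a bucket boundary may be assigned differently than in the Window recipe.

```python
def add_quantile_column(rows, column, n_quantiles, out_column="quantile"):
    """Return copies of the rows, ordered by `column`, with a 1-based quantile index."""
    ordered = sorted(rows, key=lambda r: r[column])
    total = len(ordered)
    result = []
    for position, row in enumerate(ordered):
        # Map the row's position in the sorted data to one of n equal-sized buckets.
        bucket = position * n_quantiles // total + 1
        # Keep every existing column and add the quantile index.
        result.append({**row, out_column: bucket})
    return result


# Hypothetical sample data with a numerical "score" column.
rows = [{"id": i, "score": s} for i, s in enumerate([55, 10, 90, 35, 70, 20, 80, 45])]
with_quartiles = add_quantile_column(rows, "score", 4)
for row in with_quartiles:
    print(row)
```

Like the recipe setup described above, the sketch keeps all existing columns and only adds one new column, so the output stays a single dataset regardless of how many quantiles you request.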
Using a split recipe¶
Configure the Split recipe with the “Dispatch percentiles of sorted data” mode, order the rows by the score column, and assign each portion of the rows to a separate dataset.
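The same idea can be sketched in plain Python, with one list per quantile playing the role of one output dataset. This is an illustration under assumptions, not the Split recipe's actual logic: the function name `split_by_quantile` and the sample rows are hypothetical, and the portions here are always equal-sized, whereas the recipe lets you choose the share of rows sent to each output.

```python
def split_by_quantile(rows, column, n_quantiles):
    """Sort the rows by `column` and dispatch them into n equal-sized parts."""
    ordered = sorted(rows, key=lambda r: r[column])
    total = len(ordered)
    # One list per quantile, standing in for one output dataset each.
    parts = [[] for _ in range(n_quantiles)]
    for position, row in enumerate(ordered):
        parts[position * n_quantiles // total].append(row)
    return parts


# Hypothetical sample data with a numerical "score" column.
rows = [{"score": s} for s in [55, 10, 90, 35, 70, 20, 80, 45]]
quartile_parts = split_by_quantile(rows, "score", 4)
for index, part in enumerate(quartile_parts, start=1):
    print(index, [r["score"] for r in part])
```

Because every quantile becomes its own output, this approach stays manageable for quartiles or deciles but quickly becomes unwieldy for centiles.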
To compute statistical quantiles interactively, you can also use the quantiles table in an Interactive Statistics worksheet.
For more details about interactive statistics, please refer to this course.
You can read more about different Dataiku recipes:
You can also watch this presentation on Customer Predictive Analytics to learn how Dataiku was used to perform data preparation. The prepared data was then used with a machine learning algorithm to assess the probability of a customer returning to the website a certain number of days after their visit.