Learn more about many of the natively available visual recipes for data preparation.
- Concept | Recipes in Dataiku
- Concept | Group recipe
- Concept | Join recipe
- Concept | Distinct recipe
- Concept | Pivot recipe
- Concept | Filter recipe
- Concept | Sample recipe
- Concept | Sort recipe
- Concept | Split recipe
- Concept | Stack recipe
- Concept | Top N recipe
- Concept | Window recipe
- Concept | Common recipe steps: Pre-filter, Post-filter & Computed columns
- Tutorial | Group the data (Core Designer part 6)
- Tutorial | Enrich the dataset (Core Designer part 8)
- Tutorial | Pivot Recipe (Advanced Designer part 4)
- Tutorial | Top N Recipe (Advanced Designer part 5)
- Tutorial | Window recipe (Advanced Designer part 1)
- Tutorial | Window recipe deep dive (Advanced Designer part 2)
- Tutorial | Fuzzy Join recipe
- Tutorial | In-database operations with visual recipes (SQL part 4)
You can create statistical quantiles without code in Dataiku in two ways:
- The Split recipe breaks the data into one dataset per quantile, which is useful when you plan to handle a small number of quantiles, such as quartiles or deciles, separately.
- The Window recipe creates a new column containing each row's quantile, which can be easier to set up for a large number of quantiles, such as percentiles.
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. In the two examples below, let’s assume that you want to create quantiles based on a numerical column called “score”.
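As a minimal sketch of the definition above, the following Python snippet computes quartile cut points for a small sample and counts the observations in each interval. It uses the standard library's `statistics.quantiles`. The score values and variable names are made up for illustration, and this is not a description of how Dataiku computes quantiles internally.

```python
import bisect
import statistics

# Hypothetical values for the "score" column (made-up data).
scores = [12, 35, 47, 51, 58, 63, 70, 77, 84, 88, 91, 95]

# Three cut points divide the sample into four quartiles.
cut_points = statistics.quantiles(scores, n=4)

# Count how many observations fall into each of the four intervals.
counts = [0] * (len(cut_points) + 1)
for s in scores:
    counts[bisect.bisect_left(cut_points, s)] += 1

print(cut_points)
print(counts)
```

With equal-probability intervals, each quartile of this sample holds the same number of observations.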
Using a window recipe
Configure the Window recipe to order the rows by the “score” column, enable the window frame with no limits set, and, on the Aggregations screen, set the number of quantiles you want and retrieve all the existing columns.
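For readers who think in code, the effect of this configuration can be approximated in plain Python: sort the rows by score, then number them into equally sized groups, in the style of SQL's `NTILE`. This is only a sketch under assumptions. The function name `add_quantile_column`, the output column name `quantile`, and the sample rows are hypothetical, and the way remainders and ties are handled here is not confirmed to match the Window recipe.

```python
def add_quantile_column(rows, column, n_quantiles):
    """Return rows sorted by `column`, each with an added 'quantile' number (1-based)."""
    ordered = sorted(rows, key=lambda row: row[column])
    # When the row count is not divisible, the first `extra` groups get one more row.
    base, extra = divmod(len(ordered), n_quantiles)
    result = []
    start = 0
    for q in range(1, n_quantiles + 1):
        size = base + (1 if q <= extra else 0)
        for row in ordered[start:start + size]:
            # Keep all existing columns and append the quantile number.
            result.append({**row, "quantile": q})
        start += size
    return result


# Hypothetical input dataset with a "score" column (made-up data).
rows = [
    {"id": i, "score": s}
    for i, s in enumerate([70, 12, 95, 51, 35, 88, 63, 47, 58, 84])
]
with_quartiles = add_quantile_column(rows, "score", 4)
for row in with_quartiles:
    print(row)
```

All rows stay in a single output, which is why this approach scales more easily to many quantiles than producing one dataset per quantile.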
Using a split recipe
Configure the Split recipe with the “Dispatch percentiles of sorted data” mode, order the rows by the “score” column, and assign each portion of the rows to a separate dataset.
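The same idea can be sketched in plain Python: sort the rows by score, then cut the sorted list at cumulative percentage boundaries so that each slice becomes its own output. The function name `dispatch_by_percentiles`, the sample rows, and the integer rounding of boundaries are assumptions made for illustration, not the recipe's actual implementation.

```python
def dispatch_by_percentiles(rows, column, percentages):
    """Sort rows by `column` and split them into consecutive slices by percentage."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    ordered = sorted(rows, key=lambda row: row[column])
    total = len(ordered)
    outputs = []
    start = 0
    cumulative = 0
    for pct in percentages:
        cumulative += pct
        # Boundary index for this cumulative percentage (rounded down).
        end = cumulative * total // 100
        outputs.append(ordered[start:end])
        start = end
    return outputs


# Hypothetical input dataset with a "score" column (made-up data).
rows = [
    {"id": i, "score": s}
    for i, s in enumerate([70, 12, 95, 51, 35, 88, 63, 47])
]
quartile_datasets = dispatch_by_percentiles(rows, "score", [25, 25, 25, 25])
for number, dataset in enumerate(quartile_datasets, start=1):
    print(number, [row["score"] for row in dataset])
```

Each element of `quartile_datasets` plays the role of one output dataset, so the number of outputs to manage grows with the number of quantiles.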
For more details about interactive statistics, please refer to this course.
You can read more about different Dataiku recipes:
You can also watch this presentation on Customer Predictive Analytics to learn how Dataiku was used to perform data preparation. The prepared data was then used with a machine learning algorithm to assess the probability of a customer returning to the website a certain number of days after their visit.