Concept: Code Recipes in Dataiku

Dataiku recipes contain the transformation steps, or processing logic, that act upon datasets or folders. In the Flow, they are represented by circles connecting the input and output datasets or folders.

Code recipes are a type of recipe that executes user-defined code written in a language such as Python, R, or SQL. Code recipes are represented by orange circles in the Flow. Like other recipes, they typically connect an input and an output dataset, but they can also produce output datasets or folders without a Dataiku object as input.

../../../_images/code-recipes-flow.png

Types of Code Recipes

Dataiku is a technology-agnostic tool, allowing users to code in the language of their choice. As such, it offers the following code recipes:

../../../_images/code-recipe-types.png

Create a Code Recipe

You can create a new code recipe in one of two ways:

  • from the Flow, by clicking on the New Recipe button; or

  • in the Actions menu of a dataset, which can be accessed from the Flow or from the dataset itself.

To create a code recipe:

  • First, select the input dataset or datasets. This step is optional, as a code recipe doesn’t always need to have an input dataset.

  • Then, create or select one or more output datasets or folders, and choose the connection in which they will be stored.

../../../_images/create-code-recipe.png

Edit Code in a Recipe

All code recipes share a common layout, which includes a code editor.

Starter Code

Once you have created your recipe, it is auto-filled with “starter” code.

../../../_images/code-recipe-starter-code.png

For example, in a Python recipe, the starter code uses the dataiku.Dataset class to simplify connecting to and setting up the data. This class abstracts away the underlying data storage and lets you easily obtain the input as a pandas DataFrame.

Similarly, to write output at the end of the recipe, you only need to provide a DataFrame and tell Dataiku which dataset to write it to, regardless of storage or connection type.

The starter code is there to help you get started, in particular with reading and writing data, but you need to add your own code to it to suit your needs.
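As an illustration of the read, transform, and write pattern described above, here is a minimal sketch of what a completed Python recipe might look like. It is not the exact starter code that Dataiku generates: the function names run_recipe and drop_incomplete_rows are hypothetical, as is the choice of transformation, and the logic is wrapped in functions only to keep the example self-contained. It relies on dataiku.Dataset with its get_dataframe and write_with_schema methods.

```python
def drop_incomplete_rows(df):
    """Hypothetical transformation: keep only rows without missing values."""
    return df.dropna()


def run_recipe(input_name, output_name):
    """Read one dataset, transform it, and write the result to another."""
    # The dataiku package is only importable inside a Dataiku environment,
    # so it is imported here rather than at module level.
    import dataiku

    # Read the input dataset as a pandas DataFrame, whatever its storage.
    input_df = dataiku.Dataset(input_name).get_dataframe()

    # Custom logic goes between the read and the write.
    output_df = drop_incomplete_rows(input_df)

    # Write the DataFrame, along with its schema, to the output dataset.
    dataiku.Dataset(output_name).write_with_schema(output_df)
```

In a recipe whose input and output datasets are named, say, customers and customers_clean (assumed names), the body of the recipe would amount to calling run_recipe("customers", "customers_clean"). The dataset names passed in must match the inputs and outputs declared when the recipe was created.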

Execute Code in a Recipe

To execute the code, you need to run the recipe. Code recipes have a “Run” button that automatically appears as soon as at least one output dataset has been defined for the recipe.

Most recipes also have a “Validate” button that can be used before running a recipe to perform consistency checks. Some recipes are also able to automatically compute the output schema of datasets. If the current output schema does not match what the recipe wants to output, you’ll get prompts to update the output datasets’ schemas.

When you click the Run button, a new job is started which executes the recipe’s code. When it’s finished, Dataiku will display either a success or an error message, and you can explore the generated output datasets.

../../../_images/python-recipe-run.png

You can also navigate to the Jobs menu to observe and monitor the activities triggered while a recipe is running, and use the Job logs for potential troubleshooting.

Learn More

To learn more about code recipes, you might want to check out: