Concept | Dataiku formulas

Often in a Prepare recipe, you will want to create new columns based on those already present in your dataset. In the world of machine learning, this is called feature generation.

Similar to what you might find in a spreadsheet tool like Excel, Dataiku has its own Formula language.

It is an expression language for performing calculations, manipulating strings, and much more.

From the processor library, you can add a Formula step and provide the name of the output column.

You can write simple formulas directly in the Expression box. The Editor, however, offers extra support. The first is code completion: as soon as you start typing, Dataiku suggests columns from the dataset or functions to apply. The Editor also alerts you if the formula is invalid.

The Formula language allows you to craft expressions of considerable complexity. For example, you can use:

  • common mathematical functions, such as round, sum and max

  • comparison operators, such as >, <, >=, <=

  • logical operators, such as AND and OR

  • tests for missing values, such as isBlank() or isNull()

  • string operations with functions like contains(), length(), and startsWith()

  • conditional if-then statements
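
As a minimal sketch of how these pieces combine, the expressions below could each be entered as a separate Formula step. The column names price, quantity, and product_code are hypothetical and stand in for columns of your own dataset; the functions are the ones listed above, plus if(condition, valueIfTrue, valueIfFalse) for the conditional. Check the reference documentation for exact signatures before relying on them.

```
if(isBlank(quantity), 0, round(price * quantity))

if(startsWith(product_code, "A"), "category A", "other")

length(product_code)
```

The first expression returns 0 when quantity is missing and otherwise the rounded product of the two columns. The second labels rows by the prefix of product_code, and the third returns the number of characters in product_code. Each result is written to the output column named in the step.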

You can always visit the reference documentation for help, or visit the Academy to view common use cases with examples.

Learn More

In this lesson, you learned how to use Dataiku’s spreadsheet-like formula language to perform calculations, manipulate strings, and much more. Continue getting to know the basics of Dataiku by learning about statistics worksheets.