How to Remove Scientific Notation in a Column


In version 11.1 and above, in columns of type Decimal, numbers below 1015 (i.e. up to 999,999,999,999,999) are automatically formatted using the decimal notation in the dataset Explore view. Similarly, numbers above 10-15 (i.e. down to 0.000000000000002) are now automatically formatted using the decimal notation. In version 11.0 and earlier, the thresholds were 107 and 10-7.

You can also switch scientific notation on or off for numbers between 10-7 and 10-15 and between 107 and 1015 by selecting Toggle scientific notation from the column name dropdown in a Prepare recipe or dataset Explore view.

Formatting numbers can often be a tedious data cleaning task.

It can be made easier with the format() function of the Dataiku Formula language. This function takes a printf format string and applies it to any value.

Format strings are immensely powerful, as they allow you to truncate strings, change precision, switch between numerical notations, left-pad strings, pad numbers with zeros, etc. More specifically, Dataiku formulas use the Java variant of format strings.

For example, you may have a column of very small numbers represented in scientific notation. If instead you wanted to convert this column to 5 decimal places, you can use:

format("%.5f", my_column_name)


For the change in format to persist in the output of the Prepare recipe, you must change the storage type of the formatted column to a string. You can do this from the storage type dropdown of the column header, while in the recipe editor.

For a deeper explanation of this behavior, please see the product documentation on variable typing and auto-typing in the formula language.

What’s Next?

For more information, see: