How to remove scientific notation in a column

Formatting numbers can often be a tedious data cleaning task.

It can be made easier with the format() function of the Dataiku DSS Formula language. This function takes a printf format string and applies it to any value.

Format strings are immensely powerful, as they allow you to truncate strings, change precision, switch between numerical notations, left-pad strings, pad numbers with zeros, etc. More specifically, Dataiku DSS formulas use the Java variant of format strings.
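Because these are Java-style format strings, plain Java's String.format offers a rough way to preview what a given specifier does. The sketch below is an illustration only: the class FormatStringSketch and its fmt helper are hypothetical names, the locale is pinned to Locale.ROOT so the decimal separator is always a dot, and it assumes that format() in DSS treats these specifiers the same way String.format does.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Dataiku DSS.
class FormatStringSketch {
    // Apply a Java format string to one value, using a fixed locale so the
    // decimal separator is always a dot.
    static String fmt(String pattern, Object value) {
        return String.format(Locale.ROOT, pattern, value);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%.3s", "dataiku"));  // truncate a string to 3 characters
        System.out.println(fmt("%.2f", 3.14159));    // keep 2 decimal places
        System.out.println(fmt("%e", 12345.678));    // switch to scientific notation
        System.out.println(fmt("[%8s]", "abc"));     // left-pad with spaces to width 8
        System.out.println(fmt("%05d", 42));         // pad an integer with zeros to width 5
    }
}
```

Each call corresponds to one of the capabilities listed above; the same patterns should carry over as the first argument of format() in a formula, under the assumption stated.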

For example, you may have a column of very small numbers represented in scientific notation. To display these values in fixed-point notation with 5 decimal places instead, you can use:

format("%.5f", my_column_name)
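To see what the "%.5f" specifier does to individual values, here is a minimal Java sketch of the same conversion. The class FixedPointSketch, the toFiveDecimals helper, and the sample values are hypothetical, and the sketch assumes the column holds numeric values and that DSS applies the specifier the way Java's String.format does.

```java
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Dataiku DSS.
class FixedPointSketch {
    // Mirrors format("%.5f", my_column_name) for a single numeric value.
    static String toFiveDecimals(double value) {
        return String.format(Locale.ROOT, "%.5f", value);
    }

    public static void main(String[] args) {
        double small = 1.2345E-4; // sample value written in scientific notation
        double tiny = 3.0E-7;     // sample value smaller than the chosen precision

        System.out.println(toFiveDecimals(small)); // 0.00012
        System.out.println(toFiveDecimals(tiny));  // 0.00000
    }
}
```

The second value shows a limit of a fixed precision: anything that rounds below the fifth decimal place is displayed as 0.00000, so a column of extremely small numbers may need a larger precision, such as "%.10f".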

Note

For the change in format to persist in the output of the Prepare recipe, you must change the storage type of the formatted column to string. You can do this in the recipe editor, from the storage type dropdown in the column header.

For a deeper explanation of this behavior, please see the product documentation on variable typing and auto-typing in the formula language.

What’s Next?

For more information, see: