How to remove scientific notation in a column
Formatting numbers can often be a tedious data cleaning task.
It can be made easier with the format() function of the Dataiku DSS Formula language. This function takes a printf format string and applies it to any value.
Format strings are immensely powerful, as they allow you to truncate strings, change precision, switch between numerical notations, left-pad strings, pad numbers with zeros, etc. More specifically, Dataiku DSS formulas use the Java variant of format strings.
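Because the format strings follow the Java variant, their behavior can be explored in plain Java. The sketch below is illustrative only: the class and the helper fmt are hypothetical names, it runs outside DSS, and it pins the locale to Locale.ROOT so the decimal separator is always a dot. How DSS itself handles locale or value types inside a formula is not covered here.

```java
import java.util.Locale;

public class FormatStringDemo {

    // Hypothetical helper: applies a Java format string with a fixed locale.
    static String fmt(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%.3s", "Dataiku"));  // truncate a string: Dat
        System.out.println(fmt("%.2f", 3.14159));    // change precision: 3.14
        System.out.println(fmt("%e", 12345.678));    // scientific notation: 1.234568e+04
        System.out.println(fmt("[%10s]", "abc"));    // left-pad with spaces: [       abc]
        System.out.println(fmt("%05d", 42));         // pad a number with zeros: 00042
    }
}
```

Each call maps to one of the capabilities listed above: a precision on %s truncates, a precision on %f rounds, %e switches notation, a width pads on the left, and a leading 0 flag pads with zeros.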
For example, you may have a column of very small numbers represented in scientific notation. To display these values in fixed-point notation with 5 decimal places instead, you can use:
format("%.5f", my_column_name)
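To see what the "%.5f" pattern does to individual values, here is a minimal Java sketch of the same format string. It is not DSS code: the class and the method toFixed5 are hypothetical, and the locale is fixed to Locale.ROOT as an assumption so the output always uses a dot as the decimal separator.

```java
import java.util.Locale;

public class ScientificToFixed {

    // Hypothetical helper: renders a double in fixed-point notation with 5 decimals.
    static String toFixed5(double value) {
        return String.format(Locale.ROOT, "%.5f", value);
    }

    public static void main(String[] args) {
        double small = 1.2345e-4;

        // Java's default rendering of a small double uses scientific notation.
        System.out.println(small);             // 1.2345E-4

        // The format string forces fixed-point output, rounded to 5 decimals.
        System.out.println(toFixed5(small));   // 0.00012

        // Values below the chosen precision collapse to zero.
        System.out.println(toFixed5(1.5e-7));  // 0.00000
    }
}
```

Note that the precision is a rounding choice: any value smaller than 0.000005 in magnitude is rendered as 0.00000, so pick the number of decimals according to the smallest values you need to keep visible.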
Note
For the change in format to persist in the output of the Prepare recipe, you must change the storage type of the formatted column to a string. You can do this from the storage type dropdown of the column header, while in the recipe editor.
For a deeper explanation of this behavior, please see the product documentation on variable typing and auto-typing in the formula language.
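One way to build intuition for why the string storage type matters is to look at what happens in plain Java when formatted text is parsed back into a number. This is only an analogy under the assumption that a numeric storage type causes the text to be read as a number again; the source does not describe the DSS internals, and the class and method names below are hypothetical.

```java
import java.util.Locale;

public class RoundTripDemo {

    // Hypothetical helper: the formatted text, as produced by "%.5f".
    static String formatted(double value) {
        return String.format(Locale.ROOT, "%.5f", value);
    }

    // Hypothetical helper: parse the text as a double, then render it
    // with Java's default double-to-string conversion.
    static String reparsedAsDouble(String text) {
        return Double.toString(Double.parseDouble(text));
    }

    public static void main(String[] args) {
        String text = formatted(1.2345e-4);
        System.out.println(text);                    // 0.00012 (kept as text)
        System.out.println(reparsedAsDouble(text));  // 1.2E-4 (numeric again)
    }
}
```

Kept as text, the value stays in fixed-point form. Once it is treated as a number again, the default rendering can switch back to scientific notation.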