How-to | Normalize number formats in a Prepare recipe
You can use a Prepare script (either in a visual analysis or a recipe) to handle datasets with various kinds of numeric representations. In particular, this is a job for the Convert number formats processor.
Here is a snippet of a dataset in a visual analysis containing decimals formatted in both US and French styles:
For the us_notation column, Dataiku predicts a meaning of “Decimal”, but the first two values are invalid. On the other hand, Dataiku predicts a meaning of “Decimal (comma)” for the fr_notation column. Our goal is for Dataiku to recognize both of these columns as valid decimals.
For the fr_notation column, Dataiku suggests a conversion from the French decimal format to a regular decimal. This step uses the Convert number formats processor to convert the column to a Decimal meaning.
The same processor can fix the us_notation column. Add a new step to the script and find the Convert number formats processor. The input format should be recognized as “English” and the output format set to “Raw”.
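To make the transformation concrete, here is a minimal Python sketch of what converting from an "English" or "French" input format to a "Raw" output amounts to. It is not Dataiku's implementation or API. The `to_raw` helper, the `FORMATS` table, and the sample values are hypothetical, and the separator conventions (comma grouping with a dot decimal for English, space grouping with a comma decimal for French) are assumptions made for illustration.

```python
from decimal import Decimal

# Assumed separator conventions: (grouping separators, decimal separator).
FORMATS = {
    "english": ((",",), "."),
    "french": ((" ", "\u00a0", "\u202f"), ","),
}


def to_raw(value: str, input_format: str) -> str:
    """Return `value` as a plain decimal string: no grouping, '.' as decimal mark."""
    group_seps, decimal_sep = FORMATS[input_format]
    text = value.strip()
    # Drop every grouping separator used by the input format.
    for sep in group_seps:
        text = text.replace(sep, "")
    # Normalize the decimal mark to a dot.
    text = text.replace(decimal_sep, ".")
    # Validate: Decimal raises decimal.InvalidOperation on a malformed number.
    Decimal(text)
    return text


# Hypothetical sample values, one per notation.
print(to_raw("1,234,567.89", "english"))  # 1234567.89
print(to_raw("1 234 567,89", "french"))   # 1234567.89
```

Both calls produce the same raw string, which is the point of the two processor steps: once the locale-specific separators are gone, every value in either column parses as an ordinary decimal.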
Now Dataiku recognizes all values in both output columns as having a Decimal meaning, so the columns can be processed as decimals by all Dataiku-supported compute engines.