How-to | Normalize number formats in a Prepare recipe
You can use a Prepare script (either in a visual analysis or a recipe) to handle datasets with various kinds of numeric representations. In particular, this is a job for the Convert number formats processor.
Here is a snippet of a dataset in a visual analysis containing decimals formatted in both US and French styles:
[Image: A dataset with decimal columns in US and French formats.]
For the us_notation column, Dataiku predicts a meaning of “Decimal”, but the first two values are invalid. On the other hand, Dataiku predicts a meaning of “Decimal (comma)” for the fr_notation column. Our goal is for Dataiku to recognize both of these columns as valid decimals.
For the fr_notation column, Dataiku suggests a conversion from the French decimal format to a regular decimal. This step uses the Convert number formats processor to convert the column to a Decimal meaning.
[Image: Context menu to convert French format to regular decimal format.]
The same processor can fix the us_notation column. Add a new step to the script and find the Convert number formats processor. The input format should be recognized as “English” and the output format set to “Raw”.
[Image: Prepare recipe output with converted number formats.]
Dataiku now recognizes all values in both output columns as having a Decimal meaning, so they can be processed as such by all Dataiku-supported compute engines.
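To make the two conversions concrete, here is a minimal Python sketch of the kind of normalization described above. It is not Dataiku's implementation of the Convert number formats processor. The function name `to_raw_decimal`, the format labels `"english"` and `"french"`, and the sample values are all hypothetical, chosen only for illustration. The sketch assumes that the English style uses a comma as the grouping separator and a period as the decimal mark, and that the French style uses spaces for grouping and a comma as the decimal mark.

```python
def to_raw_decimal(text: str, input_format: str) -> str:
    """Convert a formatted number string to a raw decimal string.

    Hypothetical helper for illustration only; the format labels are
    assumptions, not Dataiku option names.
    """
    cleaned = text.strip()
    if input_format == "english":
        # Assumed English style: "," groups digits, "." is the decimal mark.
        cleaned = cleaned.replace(",", "")
    elif input_format == "french":
        # Assumed French style: spaces group digits, "," is the decimal mark.
        # Remove regular, non-breaking, and narrow non-breaking spaces.
        for space in (" ", "\u00a0", "\u202f"):
            cleaned = cleaned.replace(space, "")
        cleaned = cleaned.replace(",", ".")
    else:
        raise ValueError(f"unsupported input format: {input_format}")
    # Parsing acts as a basic validity check; it raises ValueError
    # if the cleaned string is not a number.
    float(cleaned)
    return cleaned


# Hypothetical sample values, one per notation.
us_value = to_raw_decimal("1,234.56", "english")
fr_value = to_raw_decimal("1 234,56", "french")
print(us_value, fr_value)
```

Both calls produce the same raw string, which is the point of the normalization: once grouping separators are removed and the decimal mark is a period, the values from either column can be parsed the same way.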