Concept | Date handling in Dataiku

Introduction to date parsing challenges

Working with dates poses a number of data cleaning challenges.

There are many date formats, different time zones, and components like “day of the week” that can be difficult to extract. A human might recognize that “1/5/19”, “2019-01-05”, and “5 January, 2019” can all refer to the same date. However, to a computer, these are just three different strings.
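To make the ambiguity concrete, here is a minimal sketch in plain Python (standard library only, not Dataiku's own implementation). It shows that the string "1/5/19" resolves to different dates depending on the format you assume, which is why a format has to be chosen explicitly when parsing.

```python
from datetime import datetime

raw = "1/5/19"

# The same string yields two different dates depending on the assumed format.
as_month_first = datetime.strptime(raw, "%m/%d/%y").date()  # month/day/year
as_day_first = datetime.strptime(raw, "%d/%m/%y").date()    # day/month/year

# An ISO 8601 string has only one reading.
iso = datetime.strptime("2019-01-05", "%Y-%m-%d").date()

print(as_month_first)         # 2019-01-05
print(as_day_first)           # 2019-05-01
print(as_month_first == iso)  # True
print(as_day_first == iso)    # False
```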

Using the Prepare recipe to solve these challenges

Strings representing dates need to be parsed so that the computer can recognize the true, unambiguous meaning of the date. Dataiku addresses this problem in the Prepare recipe.

When a column appears to contain dates, Dataiku can recognize it as such. In this lesson's example, the meaning of the first column is an unparsed date.

You can proceed in two ways to parse it:

  • Open the processor library, filter for Dates, and search for a step that fits your situation. Here, we find the Parse date processor.

  • Take advantage of how Dataiku suggests transformation steps based on a column’s meaning. Because Dataiku has identified this column as an unparsed date, it suggests adding the Parse date processor to the script.

Both methods achieve the same result.
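As a rough illustration of what recognizing an unparsed date and proposing a format can involve, the sketch below tries a handful of candidate formats against a sample of values and keeps the ones that parse every value. This is an assumption-laden analogy in plain Python: the candidate list and the helper `matching_formats` are hypothetical, and Dataiku's actual detection logic is not described in this lesson.

```python
from datetime import datetime

# Hypothetical candidate formats; a real tool would consider many more.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%y", "%d/%m/%y"]

def matching_formats(values, formats=CANDIDATE_FORMATS):
    """Return the formats that successfully parse every value in the sample."""
    matches = []
    for fmt in formats:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        matches.append(fmt)
    return matches

# A single value can be ambiguous...
print(matching_formats(["1/5/19"]))
# ['%m/%d/%y', '%d/%m/%y']

# ...but a larger sample can rule formats out ("12/31/19" has no month 31).
print(matching_formats(["1/5/19", "12/31/19", "7/4/19"]))
# ['%m/%d/%y']
```

Looking at several values rather than one is what makes the guess reliable, which is also why it is worth checking the suggested format against the preview before accepting it.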

After choosing the processor, it takes just a few more clicks to select the correct settings: in this case, the format of the date and the time zone.
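The two settings, format and time zone, together determine which instant a string refers to. The following plain-Python sketch (not the Parse date processor itself) shows the idea. The helper `parse_with_timezone` is hypothetical, and it uses a fixed UTC offset for simplicity, whereas a named time zone would also account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def parse_with_timezone(raw: str, fmt: str, utc_offset_hours: float) -> datetime:
    """Parse `raw` with `fmt`, then attach a fixed-offset time zone."""
    naive = datetime.strptime(raw, fmt)  # no time zone information yet
    tz = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=tz)      # now an unambiguous instant

parsed = parse_with_timezone("2019-01-05 14:30", "%Y-%m-%d %H:%M", 1)

print(parsed.isoformat())                           # 2019-01-05T14:30:00+01:00
print(parsed.astimezone(timezone.utc).isoformat())  # 2019-01-05T13:30:00+00:00
```

The same wall-clock string parsed with a different offset would map to a different UTC instant, so the time zone setting matters as much as the format.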

Once you’ve added a step, a preview of the output is immediately visible. You can see how the format of the date has changed, and the meaning is now a Date.

Now, with the properly parsed date, you’re on your way! Dataiku will suggest new steps, such as Compute time since, Extract date components, and Filter on date.
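To show why these follow-up steps only become possible after parsing, here is a small plain-Python sketch of the three operations named above, applied to already-parsed dates. The sample dates and the reference date are made up for illustration, and the code is an analogy rather than how the Dataiku processors are implemented.

```python
from datetime import date

# Illustrative, already-parsed dates.
dates = [date(2019, 1, 5), date(2019, 5, 1), date(2020, 2, 29)]
reference = date(2020, 3, 1)

# "Compute time since": days elapsed between each date and a reference date.
days_since = [(reference - d).days for d in dates]

# "Extract date components": year, month, and ISO day of week (1 = Monday).
components = [
    {"year": d.year, "month": d.month, "day_of_week": d.isoweekday()}
    for d in dates
]

# "Filter on date": keep only the dates that fall within 2019.
in_2019 = [d for d in dates if date(2019, 1, 1) <= d <= date(2019, 12, 31)]

print(days_since)     # [421, 305, 1]
print(components[0])  # {'year': 2019, 'month': 1, 'day_of_week': 6}
print(in_2019)        # [datetime.date(2019, 1, 5), datetime.date(2019, 5, 1)]
```

None of these operations are meaningful on the raw strings: subtraction, component access, and range comparison all rely on the value being a real date.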


What’s next?

In this lesson, you learned how to handle and format dates in Dataiku. Continue getting to know the basics of Dataiku by learning about formulas.