Concept | Time series preparation#

Watch the video

Before we can unlock the power of modeling and forecasting techniques on time series data, we often need to ensure the data has certain properties. This can be achieved through time series preparation.

Let’s learn about how to achieve some common objectives of time series preparation using the Time Series Preparation plugin in Dataiku.

Time series data review#

To proceed with time series preparation, you must be familiar with time series data types and formats. In brief, time series data can be:

  • Single or multiple

  • Univariate or multivariate

  • Stored in wide or long format

Time series preparation recipes#

Dataiku provides five different recipes for time series preparation.

Recipe

Purpose

Visualization

Resampling

Transform data occurring in irregular intervals into equispaced data.

../../_images/resampling.png

Interval Extraction

Identify periods when data values are within a given range under certain conditions.

../../_images/interval-extraction.png

Windowing

Smooth the data in order to reduce the noise and volatility or enrich it to uncover hidden patterns.

../../_images/windowing.png

Extrema Extraction

Focus on specific sections of data that are of particular interest.

../../_images/extrema-extraction.png

Decomposition

Break your time series data down into trend, seasonality and residuals.

../../_images/decomposition.png

Common steps before preparation#

Let’s take a look at an example time series dataset in Dataiku.

We have a column of order dates and a column recording the amount spent on those orders. We also have a categorical identifier column.

../../_images/orders-raw.png

So, is this data ready for windowing, decomposition, and other preparation tasks?

Parsing the dates#

In order for the Time Series Preparation plugin to treat the contents of a date column as a timestamp, it must be parsed.

In this example, we can use the Prepare recipe to parse the order_date column so that it can be recognized as a date. Therefore, the values can be used as timestamps.

../../_images/parse-date.png

Converting to wide format#

Before using the Time Series Preparation plugin, we also need to know whether our data is in long or wide format.

Here, the data is stored long format because all of our measurements over time are stored in the same column, amount_spent.

If you want to convert the data to wide format, you could use the Pivot recipe or the Pivot processor in the Prepare recipe to reshape the data. In this case, you would want to pivot on the identifier column, tshirt_category, that tells us the individual time series to which each measurement belongs.

../../_images/orders-wide.png

Once in wide format, it is easier to see that there are actually six different time series aggregated in the long format — one for each t-shirt category.

Verifying time series data validity#

Another step in time series preparation is making sure that the data is valid and that each timestamp for each time series is unique. If you try to use a recipe from the Time Series Preparation plugin, and the data is invalid, Dataiku will throw an error about duplicate timestamps.

One way to check this is to use the Analyze tool on the whole dataset. Here, every date has no more than one measurement for each series of a product category.

../../_images/valid-ts.png

Checking for missing data#

Data in wide format lets us easily spot missing values.

Not only might there be missing orders from a date, but there may be missing dates according to your defined time step. For example, if our time step is one day, then we can quickly tell that there are not data points for each day.

../../_images/orders-wide-missing.png

Additionally, it is clear that there is unequal spacing between successive timestamps. If applicable, this can be handled with the Resampling recipe in the Time Series Preparation plugin.

What’s next?#

Try out the tasks and recipes outlined here in Tutorial | Time series preparation!