Concept | Time series preparation#
Watch the video
Before we can unlock the power of modeling and forecasting techniques on time series data, we often need to ensure the data has certain properties. This can be achieved through time series preparation.
Let’s learn about how to achieve some common objectives of time series preparation using the Time Series Preparation plugin in Dataiku.
Time series data review#
To proceed with time series preparation, you must be familiar with time series data types and formats. In brief, time series data can be:
Single or multiple
Univariate or multivariate
Stored in wide or long format
Time series preparation recipes#
Dataiku provides five different recipes for time series preparation.
Recipe |
Purpose |
Visualization |
---|---|---|
Resampling |
Transform data occurring in irregular intervals into equispaced data. |
|
Interval Extraction |
Identify periods when data values are within a given range under certain conditions. |
|
Windowing |
Smooth the data in order to reduce the noise and volatility or enrich it to uncover hidden patterns. |
|
Extrema Extraction |
Focus on specific sections of data that are of particular interest. |
|
Decomposition |
Break your time series data down into trend, seasonality and residuals. |
Common steps before preparation#
Let’s take a look at an example time series dataset in Dataiku.
We have a column of order dates and a column recording the amount spent on those orders. We also have a categorical identifier column.
So, is this data ready for windowing, decomposition, and other preparation tasks?
Parsing the dates#
In order for the Time Series Preparation plugin to treat the contents of a date column as a timestamp, it must be parsed.
In this example, we can use the Prepare recipe to parse the order_date column so that it can be recognized as a date. Therefore, the values can be used as timestamps.
Converting to wide format#
Before using the Time Series Preparation plugin, we also need to know whether our data is in long or wide format.
Here, the data is stored long format because all of our measurements over time are stored in the same column, amount_spent.
If you want to convert the data to wide format, you could use the Pivot recipe or the Pivot processor in the Prepare recipe to reshape the data. In this case, you would want to pivot on the identifier column, tshirt_category, that tells us the individual time series to which each measurement belongs.
Once in wide format, it is easier to see that there are actually six different time series aggregated in the long format — one for each t-shirt category.
Verifying time series data validity#
Another step in time series preparation is making sure that the data is valid and that each timestamp for each time series is unique. If you try to use a recipe from the Time Series Preparation plugin, and the data is invalid, Dataiku will throw an error about duplicate timestamps.
One way to check this is to use the Analyze tool on the whole dataset. Here, every date has no more than one measurement for each series of a product category.
Checking for missing data#
Data in wide format lets us easily spot missing values.
Not only might there be missing orders from a date, but there may be missing dates according to your defined time step. For example, if our time step is one day, then we can quickly tell that there are not data points for each day.
Additionally, it is clear that there is unequal spacing between successive timestamps. If applicable, this can be handled with the Resampling recipe in the Time Series Preparation plugin.
What’s next?#
Try out the tasks and recipes outlined here in Tutorial | Time series preparation!