Concept | Time series resampling#

Watch the video

Resampling is critical to time series analysis and forecasting. This is why Dataiku provides multiple ways to accomplish resampling, including:

  • Using the Resampling recipe from the Time Series Preparation plugin.

  • Configuring the Train / Test Set step of visual time series forecasting.

  • Writing custom formulas (not covered here).

Now, let’s start learning about resampling and its application through the use case of revenue from T-shirt sales over time!

Resampling introduction#

Time series resampling is a technique used to change the frequency of the time intervals in a time series dataset. In many cases, you’ll want to fill in missing dates in your datasets.

A slide showing a small dataset and graph that have irregularly space data points.

One reason is because many time series models (like ARIMA) assume that you provide equispaced data, meaning that each data point within a time series is uniformly separated by a consistent time interval.

A slide showing a small dataset and graph that have equispaced data points.

Once you have resampled your data, you need to add values to the new date records! Let’s see how to do so using data imputation.

Data imputation#

In the context of data analysis and statistics, data imputation refers to the process of replacing missing or incomplete data with estimated values. In Dataiku, you have the following built-in imputation methods:

Method

Definition

Interpolation

Estimates the values of a time series between observed data points.

Extrapolation

Predicts values beyond the observed range of a dataset.

Non-numerical imputation

Fills in categorical values during interpolation and extrapolation.

Imputation techniques#

The table below describes the few options you can choose from for each method.

Type

Description

Visualization

Linear

Connects the points with a straight (linear) line.

A slide showing a small dataset and graph with linearly imputed data values.

Previous

Carries forward the previous value.

A slide showing a small dataset and graph that uses previous values to fill in data points.

Next

Populates backwards from the next value.

A slide showing a small dataset and graph that uses next values to fill in data points.

Quadratic

Uses a quadratic model.

A slide showing a small dataset and graph that uses a quadratic model for data interpolation.

These are just a few of the possible options. The one you choose will depend on your own understanding and assumptions about the data.

Resampling recipe#

Let’s explore the Resampling recipe. From the example data below, in wide format, we can easily see that the timestamps are not equispaced. Many dates are missing.

A Dataiku screenshot of a dataset in wide format that has missing sequential dates.

To solve this issue, the Resampling recipe lets you:

  • Choose a timestamp column.

  • Define your time series format.

  • Input a time step.

  • Pick from various resampling methods.

  • Remove first or final values in the dataset.

Remember that if we don’t interpolate, extrapolate, or impute any values, we end up with new date rows with missing information — so let’s experiment with some techniques!

A Dataiku screenshot of a dataset with equispaced dates, but missing values.

Constant value interpolation#

One option is to interpolate a constant value like zero. The value you choose will depend on your use case.

In the output, we can see that values of 0 have been added in between the original data points. Note that there are null values because we did not choose an extrapolation method.

A Dataiku screenshot of a dataset that is filled with zeros between values.

As shown in the image, the first sale of men’s black T-Shirts was for 19 dollars. The next purchase was about one month later for 57 dollars. All of the dates in between those dates had a 0 filled in.

Extrapolation#

To reiterate, because we did not perform any extrapolation, Dataiku did not insert any values before any individual series began. That’s why there are still missing values before the first recorded value for some time series.

We can fill the missing values with zeros so we have a common start and end date for each time series in the dataset by using the same interpolation method for extrapolation.

A Dataiku screenshot of a dataset that is filled with zeros at the beginning of time series.

Clip series#

Before moving on to data analysis, we may want to shorten the series. For the first month, sales are very sparse. Perhaps we don’t want to include this data. The plugin makes it easy to clip the series from the beginning or the end in the same time unit as the resampling parameter.

In this case, we removed 31 days from the beginning of the time series:

A Dataiku screenshot of a dataset highlighting the new start date of the time series.

See also

You can try using the Resampling recipe yourself in Tutorial | Time series preparation.

Resampling during model design#

The Resampling recipe is a powerful tool, but you might not always need to create an extra recipe for resampling. In fact, resampling is a step that is embedded in the model design phase of visual time series modeling.

Most of the configurations in this panel are analogous to the Resampling recipe settings.

Dataiku screenshot of the resampling settings available in a time series forecast visual model.

What’s next?#

Now that you know the basics of resampling, you can do further data preparation like calculating total sales per month or a rolling 7-day average, for example.

Next, see how to identify periods when data values are within a given range under certain conditions using the Interval Extraction recipe!