Tutorial | Time series analysis

Get started

Let’s perform some statistical analyses on time series data. We’ll be using a time series containing weekly price information for three airline stocks: United Airlines (UAL), American Airlines (AAL), and Delta Airlines (DAL).

Objectives

In this tutorial, you will:

  • Learn about a multivariate time series dataset.

  • Perform built-in statistical tests on the data for trends, autocorrelation, and stationarity.

  • Interpret the output of these tests.

Prerequisites

  • A Dataiku instance (version 11 and above).

  • Previous exposure to Interactive Statistics in Dataiku is helpful, but not required.

Create the project

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > ML Practitioner > Time Series Analysis.

  2. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Explore the Flow

If you want to build models from data, it is always important to explore that data beforehand by plotting charts and performing statistical analyses. Time series data is no exception.

By exploring your time series data, you’ll better understand underlying trends, patterns, and correlations. These insights will guide your feature engineering and inform which kinds of algorithms would be best suited for modeling the data.

Review the starting Flow

In this project, the airline_stocks dataset is transformed into a prepared train dataset.

The train dataset is a time series dataset that includes three important columns:

| Column | Description |
|---|---|
| Ticker | Stores the stock symbol identifying three independent time series for the three airlines: American (AAL), Delta (DAL), and United (UAL). |
| Date | Stores weekly timestamps from 2008 to January 2022. |
| Adj_close | Stores the stock’s adjusted closing price at each weekly timestamp. |

Training dataset for the project.

See also

To learn more about time series data preparation, visit Concept | Time series preparation.

Review the existing charts

Creating a visual representation of time series data can be a valuable first step in identifying patterns, including trends and seasonalities. An initial chart has been provided for you.

  1. Navigate to the Charts tab of the train dataset.

  2. Interactively explore the existing line plots.

Dataiku screenshot of the Charts tab of a time series dataset.

We can observe that:

  • Airline stock prices fell in early 2020, likely due to the COVID-19 pandemic.

  • There appears to be a general upward trend from 2009 until early 2020 for the UAL and DAL time series.

  • AAL stock prices began to decline in 2017.

See also

See Tutorial | Time series preparation for an exercise on visualizing time series data. Visit Charts for more general information.

Test for autocorrelation in time series data

Tests for autocorrelation assess whether a time series is correlated with lagged versions of itself. You can create a plot to assess the autocorrelation of the adjusted closing price over one year (52 weekly lags).
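For reference, a standard estimator of the sample autocorrelation at lag k, for a series y_1, …, y_T with mean ȳ, is shown below. This is the textbook definition; the exact estimator used by the Dataiku card is not stated in this tutorial. By construction r_0 = 1.

$$
r_k = \frac{\sum_{t=k+1}^{T} \left(y_t - \bar{y}\right)\left(y_{t-k} - \bar{y}\right)}{\sum_{t=1}^{T} \left(y_t - \bar{y}\right)^{2}},
\qquad k = 0, 1, \dots, 52
$$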

Create an autocorrelation function plot

Let’s add an autocorrelation function plot card to the statistics worksheet.

  1. From the existing statistics worksheet, click + New Card.

  2. Select Time series analysis > Autocorrelation > Autocorrelation function plot.

  3. On the Settings tab, provide Adj_close as the series variable. Clear the checkbox that computes the lags automatically, and specify 52 as the number of lags.

  4. Switch to the Multiple series tab.

  5. Check the box for multiple series. Ticker should already be set as the series identifier.

  6. Click Create Card.
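To make the Multiple series option more concrete, here is a minimal pure-Python sketch of what computing one autocorrelation function per Ticker involves: group the long-format rows by series identifier, sort each group by date, and compute the sample autocorrelation up to 52 lags. The function names acf and acf_by_series are hypothetical, the input rows are synthetic (made-up values with integer week numbers instead of real dates), and this is not Dataiku’s implementation.

```python
from collections import defaultdict

def acf(values, nlags):
    """Sample autocorrelation r_0..r_nlags of a single series (textbook estimator)."""
    n = len(values)
    if nlags >= n:
        raise ValueError("nlags must be smaller than the series length")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)  # zero for a constant series, which has no defined ACF
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(nlags + 1)
    ]

def acf_by_series(rows, nlags, series_id="Ticker", time_col="Date", value_col="Adj_close"):
    """Hypothetical helper: one ACF per series in long-format rows (one row per series per date)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[series_id]].append((row[time_col], row[value_col]))
    result = {}
    for key, points in groups.items():
        points.sort()  # chronological order within each series
        result[key] = acf([value for _, value in points], nlags)
    return result

# Synthetic rows (made-up trending prices, integer week numbers instead of real dates):
rows = [
    {"Ticker": ticker, "Date": week, "Adj_close": base + 0.5 * week + week % 4}
    for ticker, base in [("AAL", 20.0), ("DAL", 40.0)]
    for week in range(120)
]
result = acf_by_series(rows, nlags=52)
print(len(result["AAL"]), result["AAL"][0])  # 53 1.0
```

Each series gets 53 values (lags 0 through 52), and grouping before computing keeps the three airlines from being treated as one long series.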

Dataiku screenshot of the dialog to create an autocorrelation statistics card.

Interpret the output

The spikes at each lag indicate autocorrelation. The autocorrelation appears to decrease as the lag increases. Intuitively, you can expect the stock price in a given week to be correlated with the prices in the previous weeks, and this correlation to weaken as the lag grows.
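As a textbook illustration of this pattern (not a model fitted to these stock prices), consider a stationary first-order autoregressive process, AR(1), where φ is the autoregressive coefficient and ε_t is white noise. Its theoretical autocorrelation decays geometrically with the lag k:

$$
y_t = \phi\, y_{t-1} + \varepsilon_t, \quad |\phi| < 1
\quad \Longrightarrow \quad
\rho_k = \phi^{k}
$$

For example, with φ = 0.9 the autocorrelation is 0.9 at lag 1, about 0.59 at lag 5, and about 0.35 at lag 10. A non-stationary series such as a random walk (φ = 1) typically produces a sample autocorrelation that decays much more slowly.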

Dataiku screenshot of the output of an autocorrelation function plot.

See also

You can also conduct the Durbin-Watson statistical test to confirm the presence of positive serial correlation in the time series.
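The Durbin-Watson statistic is d = Σ_{t=2..T} (e_t − e_{t−1})² / Σ_{t=1..T} e_t², where e_t are residuals. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive serial correlation. The minimal sketch below assumes, as a simplification, that the residuals are the series’ deviations from its mean (normally you would use a regression model’s residuals). The helper names are hypothetical and the inputs are toy values, not airline data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over the sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def demean(values):
    """Simplifying assumption: treat deviations from the mean as the residuals."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Toy inputs, not airline data:
print(durbin_watson(demean([1.0, 2.0, 3.0, 4.0])))  # 0.6 (smooth, positively correlated)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))         # 3.0 (alternating, negatively correlated)
```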

Test for stationarity in time series data

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test takes stationarity as its null hypothesis. If the test rejects this hypothesis, you can conclude that the series is not stationary. If it fails to reject, the test is inconclusive: the series may or may not be stationary. Non-stationary data is harder to model accurately because its statistical properties, such as its mean and variance, change over time.
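In its level-stationarity form, the KPSS statistic η is computed from the residuals e_t of the series y_t around its mean, their partial sums S_t, and a long-run variance estimate σ̂² (shown here with Bartlett weights over ℓ lags). Large values of η lead to rejecting stationarity. This is the textbook formulation; the trend-stationarity variant replaces the mean with a fitted linear trend, and the variant and lag choice used by Dataiku are not stated in this tutorial.

$$
e_t = y_t - \bar{y}, \qquad
S_t = \sum_{i=1}^{t} e_i, \qquad
\eta = \frac{1}{T^{2}\,\hat{\sigma}^{2}} \sum_{t=1}^{T} S_t^{2}
$$

$$
\hat{\sigma}^{2} = \hat{\gamma}_0 + 2 \sum_{j=1}^{\ell} \left(1 - \frac{j}{\ell + 1}\right) \hat{\gamma}_j,
\qquad
\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} e_t\, e_{t-j}
$$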

Create a KPSS test

Let’s add one last card to the worksheet.

  1. From the existing statistics worksheet, click + New Card.

  2. Select Time series analysis > Stationarity > Kwiatkowski-Phillips-Schmidt-Shin test.

  3. On the Settings tab, provide Adj_close as the series variable as before.

  4. Switch to the Multiple series tab.

  5. Check the box for multiple series. Ticker should already be set as the series identifier.

  6. Click Create Card.

Dataiku screenshot of the dialog to create a KPSS statistics card.

Interpret the output

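At the default 95% confidence level (a 5% significance level), the test rejects the null hypothesis of stationarity for each series, so you can conclude that none of the three series is stationary. This is likely related to the trends and the early-2020 drop visible in the line charts: a series whose mean shifts over time is not stationary.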
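The decision rule at the 5% significance level can be sketched in pure Python under stated assumptions: the level-stationarity form of the statistic, a lag count of int(12 · (T/100)^(1/4)) (a common rule of thumb, assumed here), and the tabulated 5% critical value of 0.463 for the level-stationary case. The function names are hypothetical and the inputs are toy series, so this sketch will not reproduce the numbers on the Dataiku card.

```python
def kpss_level(values, nlags=None):
    """Textbook KPSS statistic for the null hypothesis of level stationarity."""
    n = len(values)
    mean = sum(values) / n
    e = [v - mean for v in values]  # residuals around the mean
    if nlags is None:
        # Assumed rule of thumb; the lag choice in Dataiku may differ.
        nlags = int(12 * (n / 100) ** 0.25)

    # Partial sums S_t of the residuals.
    partial, running = [], 0.0
    for x in e:
        running += x
        partial.append(running)

    def autocov(j):
        # gamma_j = (1/T) * sum_{t=j+1}^{T} e_t * e_{t-j}
        return sum(e[t] * e[t - j] for t in range(j, n)) / n

    # Long-run variance with Bartlett weights (1 - j/(nlags+1)).
    lrv = autocov(0) + 2 * sum(
        (1 - j / (nlags + 1)) * autocov(j) for j in range(1, nlags + 1)
    )
    return sum(s * s for s in partial) / (n * n * lrv)

CRITICAL_VALUE_5PCT = 0.463  # tabulated 5% critical value, level-stationary case

def rejects_stationarity(values):
    """True if stationarity is rejected at the 5% significance level."""
    return kpss_level(values) > CRITICAL_VALUE_5PCT

# Toy series, not airline data:
trend = [0.5 * t for t in range(200)]          # steadily rising level
oscillating = [(-1) ** t for t in range(200)]  # fluctuates around a fixed level
print(rejects_stationarity(trend))        # True
print(rejects_stationarity(oscillating))  # False
```

Failing to reject for the oscillating series does not prove it is stationary; as with the Dataiku card, only a rejection is conclusive.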

Dataiku screenshot of the output of a KPSS test.

See also

See the reference documentation for other stationarity and unit root tests, such as the Augmented Dickey-Fuller test or the Zivot-Andrews test.

What’s next?

Congratulations on taking your first steps in performing statistical analyses on time series data! Now that you’ve tested statistical assumptions on this data, your next step may be to build forecasting models.

See how to do that in Tutorial | Time series forecasting (Visual ML)!

See also

You can find more information about Time Series Analysis in the reference documentation.