Tutorial | Time series analysis#

Get started#

This tutorial covers how to perform statistical analyses on a time series containing weekly price information for three airline stocks: United Airlines (UAL), American Airlines (AAL), and Delta Air Lines (DAL).

Objectives#

In this tutorial, you will:

  • Perform built-in statistical analyses on time series data, including tests such as those for trends, autocorrelation, and stationarity.

Prerequisites#

  • A Dataiku instance (version 11 or above).

  • Previous exposure to Interactive Statistics in Dataiku is helpful, but not required.

Create the project#

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > ML Practitioner > Time Series Analysis.

  2. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

You’ll next want to build the Flow.

  1. Click Flow Actions at the bottom right of the Flow.

  2. Click Build all.

  3. Keep the default settings and click Build.

Explore the Flow#

For any kind of data, before ever building models, it is important to explore the data by plotting charts and performing statistical analyses. Time series data is no exception.

By exploring your time series data, you’ll better understand its characteristics. For example, you can get insights into the underlying trends, patterns, and correlations. These insights will guide your feature engineering and inform which kinds of algorithms are best suited for modeling the data.

Review the starting Flow#

After a few brief preparation steps, the train dataset in the starter project includes three columns of importance:

| Column | Description |
| --- | --- |
| Ticker | Stores the stock symbol identifying three independent time series for the three airlines: American (AAL), Delta (DAL), and United (UAL). |
| Date | Stores weekly timestamps from 2008 to January 2022. |
| Adj_close | Stores the stock’s adjusted closing price for each week, which is the value we hope to forecast. |

Training dataset for the project.
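Because all three airlines share one table, each analysis has to treat the rows as three separate series keyed by Ticker. The sketch below illustrates that idea outside Dataiku using only the Python standard library. The rows and prices are made up for illustration, and `split_by_ticker` is a hypothetical helper, not a Dataiku function.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows in the same long format as the train dataset:
# (Ticker, Date, Adj_close). The prices are invented for illustration.
rows = [
    ("UAL", date(2021, 1, 11), 44.1),
    ("AAL", date(2021, 1, 4), 15.1),
    ("DAL", date(2021, 1, 4), 39.5),
    ("UAL", date(2021, 1, 4), 42.3),
    ("AAL", date(2021, 1, 11), 15.8),
    ("DAL", date(2021, 1, 11), 40.2),
]


def split_by_ticker(rows):
    """Group long-format rows into one date-ordered price list per ticker."""
    grouped = defaultdict(list)
    for ticker, day, price in rows:
        grouped[ticker].append((day, price))
    # Sort each series chronologically, then keep only the prices.
    return {t: [p for _, p in sorted(obs)] for t, obs in grouped.items()}


series = split_by_ticker(rows)
print(series["UAL"])  # [42.3, 44.1]
```

Each value in `series` is an independent, chronologically ordered time series, which is the shape that per-series statistics expect.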

Note

If building forecasting models later, note that the airline_stocks_prepared dataset contains all the training and validation data. This is helpful because Dataiku requires that the input datasets to the Evaluate and Scoring recipes include the historical (training) data used by the time series model.

Review the existing charts#

Plotting the data is a good initial step to see if you can observe any patterns in the time series, such as trends and seasonalities. This has already been done for you.

  1. Navigate to the Charts tab of the train dataset.

  2. Interactively explore the existing line plots.

Dataiku screenshot of the Charts tab of a time series dataset.

The plots show a dip in airline stock prices in early 2020, likely due to the COVID-19 pandemic. There also appears to be a general upward trend from 2009 to 2020 for the UAL and DAL time series, and perhaps less so for AAL.

See also

See the tutorial on time series preparation for an exercise on visualizing time series data. More generally, you can find resources on charts in the Knowledge Base.

Test for autocorrelation in time series data#

Tests for autocorrelation assess whether a time series is correlated with lagged versions of itself. You can create a plot to assess the autocorrelation of the adjusted closing price over a year (52 weekly lags).
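For reference, one common definition of the sample autocorrelation at lag k, for a series x_1, ..., x_n with mean x̄, is written out below. Whether Dataiku uses exactly this estimator is an assumption. The tutorial does not state the formula.

```latex
\hat{\rho}_k
  = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})\,(x_{t-k} - \bar{x})}
         {\sum_{t=1}^{n} (x_t - \bar{x})^2},
\qquad k = 0, 1, \dots, 52
```

By construction, the value at lag 0 is 1, and values near 0 at a given lag suggest little linear dependence between observations that far apart.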

Create an autocorrelation function plot#

Let’s add this kind of card to the statistics worksheet.

  1. From the existing statistics worksheet, click + New Card.

  2. Select Time series analysis > Autocorrelation > Autocorrelation function plot.

  3. On the Settings tab, provide Adj_close as the series variable. Uncheck the box to automatically compute the lags, and specify 52 as the number of lags.

  4. Switch to the Multiple series tab.

  5. Check the box for multiple series. Ticker should already be set as the series identifier.

  6. Click Create Card.

Dataiku screenshot of the dialog to create an autocorrelation statistics card.

Interpret the output#

The spikes at each lag indicate autocorrelation, which appears to decrease as the lag increases. Intuitively, you can expect the stock price for a given week to be correlated with the prices from previous weeks, and this correlation to weaken as the gap between observations grows.

Dataiku screenshot of output of an autocorrelation function plot.
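To see why a price-like series produces this slowly decaying pattern, here is a minimal standard-library sketch. It assumes a synthetic random walk as a stand-in for weekly prices (not the airline data), and `acf` is a hypothetical helper using a common biased estimator, not Dataiku's implementation. The ±1.96/√n band is a widely used rule of thumb for white noise.

```python
import random


def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


random.seed(0)
# Synthetic weekly "prices": a random walk, where each value is the
# previous value plus noise. This is made-up data for illustration.
walk = [100.0]
for _ in range(729):
    walk.append(walk[-1] + random.gauss(0, 1))
# Week-to-week changes of the walk, which are independent noise by construction.
changes = [b - a for a, b in zip(walk, walk[1:])]

walk_acf = acf(walk, 52)
changes_acf = acf(changes, 52)

# Rule-of-thumb 95% band for white noise: +/- 1.96 / sqrt(n).
band = 1.96 / len(changes) ** 0.5
outside = sum(abs(r) > band for r in changes_acf[1:])

print("walk, lag 1 and lag 52:", round(walk_acf[1], 3), round(walk_acf[52], 3))
print("changes, lags outside the band:", outside, "of 52")
```

The random walk stays strongly correlated with its recent past, while its week-to-week changes show little autocorrelation at any lag.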

See also

You can also conduct the Durbin-Watson statistical test to confirm the presence of a positive serial correlation in the time series.
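As a rough illustration of what the Durbin-Watson statistic measures, the sketch below computes it by hand on synthetic data. It assumes the statistic is applied to the mean-centered series, which may differ from what Dataiku does internally. `durbin_watson` and `center` are hypothetical helpers. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
import random


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def center(values):
    """Subtract the mean (residuals of a constant-only model)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


random.seed(1)
# Synthetic data for illustration only.
noise = [random.gauss(0, 1) for _ in range(500)]  # no serial correlation
walk, level = [], 0.0
for shock in noise:  # cumulative sum: strong positive correlation
    level += shock
    walk.append(level)

dw_noise = durbin_watson(center(noise))
dw_walk = durbin_watson(center(walk))
print("independent noise:", round(dw_noise, 2))
print("random walk:", round(dw_walk, 4))
```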

Test for stationarity in time series data#

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test takes stationarity as its null hypothesis. If the test rejects this hypothesis, you can conclude that the series is not stationary. Otherwise, the test is inconclusive. Non-stationary data is difficult to model accurately because its statistical properties, such as the mean and variance, change over time.
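The sketch below shows how the level-stationarity form of the KPSS statistic can be computed by hand, assuming a Bartlett-weighted long-run variance and a simple lag truncation rule (both are assumptions and may not match Dataiku's settings). `kpss_level_stat` is a hypothetical helper, and both series are synthetic. The 5% critical value of 0.463 comes from the published KPSS tables for the level-stationary case.

```python
import math


def kpss_level_stat(values, lags=None):
    """KPSS statistic for the null hypothesis of stationarity around a level."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]

    # Partial sums of the residuals.
    partial, total = [], 0.0
    for e in resid:
        total += e
        partial.append(total)
    eta = sum(s * s for s in partial) / n ** 2

    # Long-run variance with Bartlett weights.
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)  # assumed truncation rule
    lrv = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        gamma = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        lrv += 2 * (1 - s / (lags + 1)) * gamma
    return eta / lrv


CRITICAL_5PCT = 0.463  # level-stationarity critical value at the 5% level

# Synthetic series for illustration: a bounded oscillation, and the same
# oscillation riding on a steady upward trend.
oscillating = [math.sin(2 * math.pi * t / 10) for t in range(200)]
trending = [0.05 * t + x for t, x in enumerate(oscillating)]

stat_osc = kpss_level_stat(oscillating)
stat_trend = kpss_level_stat(trending)
print("oscillating rejects stationarity:", stat_osc > CRITICAL_5PCT)
print("trending rejects stationarity:", stat_trend > CRITICAL_5PCT)
```

A statistic above the critical value rejects stationarity, which is what happens for the trending series. The bounded oscillation stays well below it.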

Create a KPSS test#

Let’s add one last card to the worksheet.

  1. From the existing statistics worksheet, click + New Card.

  2. Select Time series analysis > Stationarity > Kwiatkowski-Phillips-Schmidt-Shin test.

  3. On the Settings tab, provide Adj_close as the series variable as before.

  4. Switch to the Multiple series tab.

  5. Check the box for multiple series. Ticker should already be set as the series identifier.

  6. Click Create Card.

Dataiku screenshot of the dialog to create a KPSS statistics card.

Interpret the output#

At the default 95% confidence level, the results indicate that each series is not stationary. This is consistent with the trends visible in the charts, and external events such as market shocks may also contribute.

Dataiku screenshot of the output of a KPSS test.

See also

See the reference documentation for other stationarity and unit root tests, such as the Augmented Dickey-Fuller test or the Zivot-Andrews test.

What’s next?#

Congratulations on taking your first steps with performing statistical analyses on time series data! Now that you’ve tested statistical assumptions on this data, your next step may be to build forecasting models.

See how to do that in Tutorial | Time series forecasting (Visual ML)!

See also

You can find more information about Time Series Analysis in the reference documentation.