# How-To: Perform Statistical Analysis on Time Series Data

Before building models on a dataset, it is important to explore the data by plotting charts and performing statistical analyses. This process applies to any dataset, including time series data.

By exploring your time series data, you’ll understand its characteristics better. For example, you can get insights into the underlying trends, patterns, correlations, etc. These insights will help you know which kinds of feature engineering to apply to your time series data and decide on the kinds of algorithms that would be best suited for modeling the data.

This article covers how to perform various statistical analyses on a dataset that contains weekly price information for three airline stocks: United Airlines (UAL), American Airlines (AAL), and Delta Airlines (DAL). The following figure shows plots of the adjusted closing price (Adj Close) of the three time series in the dataset.

The plots show a dip in airline stock prices in early 2020, likely due to the COVID pandemic. There also appears to be a general upward trend from 2009 to 2020 for the UAL and DAL time series, but less so for AAL.

Let’s now see how you can perform statistical analyses on the time series to supplement the insights you’ve gained from the charts. In the next few sections, you’ll run some of the available tests within Dataiku to investigate if there are trends and autocorrelation within the data.

Note

You can follow along using the Forecasting Time Series With Visual ML (Tutorial), which uses the same datasets. From the Dataiku homepage, click +New Project > DSS Tutorials > Time Series > Forecasting Time Series With Visual ML (Tutorial).

Note

For a complete list of the available time series tests in Dataiku, see Time Series Analysis in the product documentation.

## Test for Autocorrelation in Time Series Data

Tests for autocorrelation allow you to assess whether a time series is correlated with lagged versions of itself. For example, you can create a plot to assess the autocorrelation over a year (52 weeks) for the adjusted closing price of the UAL time series.

Dataiku creates a test card that contains the autocorrelation plot.

The spike at each lag shows the autocorrelation at that lag. The autocorrelation appears to decrease as the lag increases. Intuitively, because the data is weekly, you can expect the stock price for a given week to be correlated with the prices from the previous weeks, and this correlation to weaken as the lag grows.
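To make the quantity behind the plot concrete, here is a minimal sketch of the sample autocorrelation computed in plain Python. It is not how Dataiku computes the plot internally. The function name `autocorrelation` is hypothetical, and the synthetic random-walk series is an assumption standing in for the UAL prices.

```python
import math
import random


def autocorrelation(values, max_lag):
    """Return the sample autocorrelations for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    result = []
    for lag in range(max_lag + 1):
        # Sum of products between the series and its copy shifted by `lag`.
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        result.append(num / denom)
    return result


# Synthetic weekly "prices": a random walk, NOT the real UAL data (assumption).
random.seed(0)
prices = [50.0]
for _ in range(299):
    prices.append(prices[-1] + random.gauss(0, 1))

acf_values = autocorrelation(prices, max_lag=52)

# Rough 95% band under the assumption of white noise; values outside it
# suggest autocorrelation at that lag.
band = 1.96 / math.sqrt(len(prices))
print(f"approximate white-noise band: +/-{band:.3f}")
for lag in (1, 4, 13, 26, 52):
    print(f"lag {lag:2d}: {acf_values[lag]:+.3f}")
```

For a price-like series, you would typically expect values close to 1 at short lags that shrink as the lag grows, which is the pattern described above.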

Note

You can also conduct the Durbin-Watson statistical test to check for positive serial correlation in the time series.
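As a rough illustration of what the Durbin-Watson statistic measures, the sketch below computes it in plain Python. Applying it to the demeaned series (the residuals of a constant-only model) is an assumption made for this example, as are the helper names and the synthetic data.

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic, between 0 and 4.

    Values near 2 suggest no first-order serial correlation, values toward 0
    suggest positive correlation, and values toward 4 suggest negative
    correlation.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def demean(values):
    """Residuals of a constant-only model (an assumption for this sketch)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Synthetic data, not the airline dataset: white noise and its running sum.
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(300)]
walk = []
level = 50.0
for shock in noise:
    level += shock
    walk.append(level)

dw_noise = durbin_watson(demean(noise))
dw_walk = durbin_watson(demean(walk))
print(f"white noise: {dw_noise:.2f}")
print(f"random walk: {dw_walk:.2f}")
```

The random walk, which behaves more like a price series, should give a statistic much closer to 0 than the white noise does.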

## Test for Stationarity in Time Series Data

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test takes stationarity as its null hypothesis and tries to reject it. If the hypothesis is rejected, you can conclude that the series is not stationary; otherwise, the test is inconclusive. Non-stationary data is difficult to model and forecast accurately because its statistical properties, such as the mean and variance, change over time.
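The following sketch shows one way to compute the level-stationarity KPSS statistic by hand, to illustrate the logic of the test. It is not Dataiku's implementation. The function name, the rule-of-thumb lag truncation, and the synthetic series are assumptions. The critical values are the commonly tabulated asymptotic ones for the level-stationarity case.

```python
import random

# Asymptotic critical values for the level-stationarity KPSS test.
KPSS_CRITICAL_VALUES = {"10%": 0.347, "5%": 0.463, "2.5%": 0.574, "1%": 0.739}


def kpss_level_statistic(values, n_lags=None):
    """KPSS statistic for the null hypothesis of stationarity around a level."""
    n = len(values)
    if n_lags is None:
        # Rule-of-thumb lag truncation (an assumption for this sketch).
        n_lags = int(12 * (n / 100) ** 0.25)
    mean = sum(values) / n
    resid = [v - mean for v in values]

    # Sum of squared partial sums of the residuals, scaled by n^2.
    partial, cum_sq = 0.0, 0.0
    for e in resid:
        partial += e
        cum_sq += partial * partial
    eta = cum_sq / n**2

    # Long-run variance estimate with Bartlett (triangular) weights.
    long_run_var = sum(e * e for e in resid) / n
    for lag in range(1, n_lags + 1):
        weight = 1 - lag / (n_lags + 1)
        autocov = sum(resid[t] * resid[t - lag] for t in range(lag, n)) / n
        long_run_var += 2 * weight * autocov
    return eta / long_run_var


# Synthetic data, not the airline dataset.
random.seed(2)
noise = [random.gauss(0, 1) for _ in range(300)]
trending = [0.5 * t + e for t, e in enumerate(noise)]  # noise around a trend

stat_noise = kpss_level_statistic(noise)
stat_trend = kpss_level_statistic(trending)
for name, stat in (("noise", stat_noise), ("trending", stat_trend)):
    verdict = "reject stationarity" if stat > KPSS_CRITICAL_VALUES["5%"] else "inconclusive"
    print(f"{name}: statistic={stat:.3f} -> {verdict} at the 5% level")
```

A statistic above the chosen critical value rejects the null hypothesis of stationarity; a statistic below it leaves the test inconclusive, as described above.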

Dataiku lets you apply resampling when a series does not have a constant, regular time step. For example, suppose you want to create a plot to assess the stationarity of the adjusted closing price of the AAL time series, where the data for the week of 12/6/2021 is missing.

Dataiku creates a test card that contains the stationarity plot. The Time summary table indicates a Resampled time step.

At the default 95% confidence level, the test rejects the hypothesis of stationarity, indicating that Adj_close is not stationary. This could be due to external events, such as market fluctuations, that shift the level of the series over time.
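The resampling step mentioned above can be pictured as filling the gaps in a weekly grid. The sketch below is a simplified illustration, not Dataiku's resampling code. Linear interpolation is an assumption, the prices are made-up values, and the missing week is read as 6 December 2021, which is also an assumption about the date format.

```python
from datetime import date, timedelta


def resample_weekly(observations):
    """Fill missing weeks in a {date: value} mapping by linear interpolation.

    Assumes the existing dates already sit on a weekly grid.
    """
    dates = sorted(observations)
    step = timedelta(weeks=1)
    resampled = {}
    for left, right in zip(dates, dates[1:]):
        resampled[left] = observations[left]
        gap_weeks = (right - left) // step  # whole weeks between neighbors
        for k in range(1, gap_weeks):
            fraction = k / gap_weeks
            resampled[left + k * step] = observations[left] + fraction * (
                observations[right] - observations[left]
            )
    resampled[dates[-1]] = observations[dates[-1]]
    return resampled


# Made-up prices (NOT real AAL data); the week of 2021-12-06 is missing.
observations = {
    date(2021, 11, 22): 18.0,
    date(2021, 11, 29): 17.0,
    date(2021, 12, 13): 19.0,
    date(2021, 12, 20): 18.5,
}

resampled = resample_weekly(observations)
for day, value in resampled.items():
    print(day.isoformat(), value)
```

After resampling, every week has a value, so tests that require a regular time step can run on the series.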

Note

You may be interested in other time series tests provided by the visual time series interface such as the Augmented Dickey-Fuller test or the Zivot-Andrews test. These test for properties such as stationarity or the presence of a unit root.
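For intuition about unit-root tests, here is a sketch of the simplest (non-augmented) Dickey-Fuller regression with a constant. This is a simplification of the Augmented Dickey-Fuller test, which also includes lagged differences. The function name and synthetic data are assumptions, and the critical values are the commonly tabulated asymptotic ones for the constant-only case. Note that the hypotheses run in the opposite direction to KPSS: here the null hypothesis is that the series has a unit root.

```python
import math
import random

# Asymptotic critical values for the Dickey-Fuller test with a constant.
DF_CRITICAL_VALUES = {"1%": -3.43, "5%": -2.86, "10%": -2.57}


def dickey_fuller_t_statistic(values):
    """t-statistic of rho in: diff(y)[t] = alpha + rho * y[t-1] + error."""
    lagged = values[:-1]
    diffs = [values[t] - values[t - 1] for t in range(1, len(values))]
    m = len(diffs)
    mean_x = sum(lagged) / m
    mean_z = sum(diffs) / m
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(lagged, diffs))
    rho = sxz / sxx
    alpha = mean_z - rho * mean_x
    # Residual variance of the simple regression (two estimated parameters).
    ssr = sum((z - alpha - rho * x) ** 2 for x, z in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (m - 2) / sxx)
    return rho / std_err


# Synthetic data, not the airline dataset.
random.seed(3)
noise = [random.gauss(0, 1) for _ in range(300)]
walk = []
level = 50.0
for shock in noise:
    level += shock
    walk.append(level)

t_noise = dickey_fuller_t_statistic(noise)
t_walk = dickey_fuller_t_statistic(walk)
for name, stat in (("noise", t_noise), ("random walk", t_walk)):
    verdict = "reject unit root" if stat < DF_CRITICAL_VALUES["5%"] else "cannot reject unit root"
    print(f"{name}: t={stat:.2f} -> {verdict} at the 5% level")
```

A t-statistic below the critical value rejects the unit root, which points toward stationarity; a price-like random walk will usually fail to reject it.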

## What’s Next?

Congratulations on taking your first steps with performing statistical analyses on time series data.

You can continue learning by checking out the Time Series Analysis page in the product documentation and by trying out the other time series analysis cards in the Statistics worksheet of the dataset.