Tutorial | Time series analysis#
Get started#
This tutorial covers how to perform various statistical analyses on a time series containing weekly price information for three airline stocks: United Airlines (UAL), American Airlines (AAL), and Delta Airlines (DAL).
Objectives#
In this tutorial, you will:
Perform built-in statistical analyses on time series data, including tests such as those for trends, autocorrelation, and stationarity.
Prerequisites#
A Dataiku instance (version 11 or later).
Previous exposure to Interactive Statistics in Dataiku is helpful, but not required.
Create the project#
From the Dataiku Design homepage, click + New Project > DSS tutorials > ML Practitioner > Time Series Analysis.
From the project homepage, click Go to Flow (or use the keyboard shortcut g + f).
Note
You can also download the starter project from this website and import it as a zip file.
You’ll next want to build the Flow.
Click Flow Actions at the bottom right of the Flow.
Click Build all.
Keep the default settings and click Build.
Explore the Flow#
For any kind of data, before ever building models, it is important to explore the data by plotting charts and performing statistical analyses. Time series data is no exception.
By exploring your time series data, you'll understand its characteristics better. For example, you can get insights into the underlying trends, patterns, and correlations. These insights will guide your feature engineering and inform which kinds of algorithms would be best suited for modeling the data.
Review the starting Flow#
After a few brief preparation steps, the train dataset in the starter project includes three columns of importance:
| Column | Description |
|---|---|
| Ticker | Stores the stock symbol identifying three independent time series for the three airlines: American (AAL), Delta (DAL), and United (UAL). |
| Date | Stores weekly timestamps from 2008 to January 2022. |
| Adj_close | Stores the stock's adjusted closing price for each week, which is the value we hope to forecast. |
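Because of the Ticker column, the dataset is in long format: the three series are stacked in one table, and a series identifier tells them apart. This is what the Multiple series settings used later in this tutorial rely on. The following plain-Python sketch shows conceptually what splitting such a table into separate series means. The sample rows are made up for illustration, and the `split_series` helper is hypothetical, not part of the project or of Dataiku.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows in the same long format as the train dataset:
# one row per (Ticker, Date) pair. The prices are invented for illustration.
rows = [
    {"Ticker": "UAL", "Date": date(2021, 1, 11), "Adj_close": 45.0},
    {"Ticker": "DAL", "Date": date(2021, 1, 4), "Adj_close": 40.0},
    {"Ticker": "UAL", "Date": date(2021, 1, 4), "Adj_close": 42.0},
    {"Ticker": "AAL", "Date": date(2021, 1, 4), "Adj_close": 16.0},
    {"Ticker": "DAL", "Date": date(2021, 1, 11), "Adj_close": 41.5},
]


def split_series(rows, id_col="Ticker", time_col="Date", value_col="Adj_close"):
    """Group long-format rows into one date-ordered series per identifier."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_col]].append((row[time_col], row[value_col]))
    # Sort each series chronologically so that lag-based statistics make sense.
    return {key: sorted(points) for key, points in grouped.items()}


series = split_series(rows)
for ticker, points in sorted(series.items()):
    print(ticker, [value for _, value in points])
```

Each statistical card in this tutorial then runs its test once per resulting series, which is why the results are reported per ticker.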
Note
If you build forecasting models later, note that the airline_stocks_prepared dataset contains all the training and validation data. This is helpful because Dataiku requires that the input datasets to the Evaluate and Scoring recipes include the historical (training) data used by the time series model.
Review the existing charts#
Plotting the data is a good initial step to see if you can observe any patterns in the time series, such as trends and seasonalities. This has already been done for you.
Navigate to the Charts tab of the train dataset.
Interactively explore the existing line plots.
The plots show a dip in airline stock prices in early 2020, likely due to the COVID-19 pandemic. There also appears to be a general upward trend from 2009 to 2020 for the UAL and DAL time series, and perhaps less so for AAL.
See also
See the tutorial on time series preparation for an exercise on visualizing time series data. More generally, you can find resources on charts in the Knowledge Base.
Test for trends in time series data#
The previous charts suggest that the time series have a general upward trend. You can use the Mann-Kendall trend test to check this trend statistically.
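As a rough illustration of what the Mann-Kendall test computes, the following stdlib-only Python sketch counts increasing versus decreasing pairs of observations (the S statistic), standardizes S using its tie-corrected variance under the null hypothesis of no trend, and derives a two-sided p-value from the normal approximation. It is a textbook version written for illustration, not Dataiku's implementation; the function name `mann_kendall` and its return values are assumptions.

```python
import math
from collections import Counter


def mann_kendall(x, alpha=0.05):
    """Two-sided Mann-Kendall trend test using the normal approximation."""
    n = len(x)
    # S = (number of pairs i < j with x[j] > x[i]) - (pairs with x[j] < x[i]).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, corrected for tied values.
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if var_s == 0:
        # Too few points or all values tied: no evidence of a trend.
        return "no trend", s, 0.0, 1.0
    # Continuity-corrected standardized statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return trend, s, z, p


print(mann_kendall(list(range(20))))
```

For a strictly increasing series such as `list(range(20))`, every later value exceeds every earlier one, so S equals the number of pairs (190) and the test reports an increasing trend.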
Create a statistics worksheet#
The first step is to create the correct card in a statistics worksheet.
From the train dataset, navigate to the Statistics tab, and click + Create Your First Worksheet.
Select Time series analysis as the card type.
Select the Trend & Seasonality panel on the left of the dialog.
Choose the Mann-Kendall trend test.
Configure a Mann-Kendall trend test#
Next, supply the correct variables to the card.
On the Settings tab of the card, select Adj_close as the series variable and Date as the time variable.
Switch to the Multiple series tab.
Check the box indicating the data has multiple series (one for each airline in our case).
Select Ticker as the series identifier.
Click Create Card.
Interpret the output#
In this case, the test indicates an upward (increasing) trend in each of the three time series at a significance level of 0.05.
Test for autocorrelation in time series data#
Tests for autocorrelation assess whether a time series is correlated with lagged versions of itself. You can create a plot to assess the autocorrelation of the adjusted closing price over a year (52 weeks).
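The value plotted at lag k is the sample autocorrelation: the covariance between the series and a copy of itself shifted by k steps, divided by the series' variance. The stdlib-only sketch below computes it for lags 0 through 52 on a hypothetical weekly price series (a slow upward drift plus a small oscillation, invented for illustration). The ±1.96/√n band is a common rule of thumb for white noise; whether and how the Dataiku card draws significance bounds is not assumed here.

```python
import math


def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(nlags + 1)
    ]


# Hypothetical weekly prices over five years: upward drift plus an oscillation.
prices = [100 + 0.5 * week + 3 * math.sin(week / 4) for week in range(260)]
rho = acf(prices, nlags=52)
bound = 1.96 / math.sqrt(len(prices))  # rough 95% band for white noise

for lag in (1, 13, 26, 52):
    print(lag, round(rho[lag], 3))
print("white-noise bound:", round(bound, 3))
```

For a trending series like this one, the autocorrelation starts close to 1 and decays slowly as the lag grows, which mirrors the pattern described for the airline stocks.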
Create an autocorrelation function plot#
Let’s add this kind of card to the statistics worksheet.
From the existing statistics worksheet, click + New Card.
Select Time series analysis > Autocorrelation > Autocorrelation function plot.
On the Settings tab, provide Adj_close as the series variable as before. Uncheck the box to automatically compute the lags, and specify 52 as the number of lags.
Switch to the Multiple series tab.
Check the box for multiple series. Ticker should already be set as the series identifier.
Click Create Card.
Interpret the output#
The spikes at each lag indicate autocorrelation. The autocorrelation appears to decrease as the time lag increases. Intuitively, you can expect the stock price in a given week to be correlated with the prices from the previous weeks, and this correlation to weaken as the lag grows.
See also
You can also conduct the Durbin-Watson statistical test to confirm the presence of a positive serial correlation in the time series.
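The Durbin-Watson statistic compares successive differences with the overall variation: d = Σ(e_t - e_(t-1))² / Σ e_t². It is roughly 2(1 - r₁), where r₁ is the lag-1 autocorrelation, so values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive serial correlation, and values toward 4 suggest negative serial correlation. The sketch below assumes the statistic is applied to mean-centered values standing in for residuals; Dataiku's card may define its input differently, and the function name is hypothetical.

```python
def durbin_watson(x):
    """Durbin-Watson statistic on mean-centered values (assumed residuals)."""
    mean = sum(x) / len(x)
    e = [v - mean for v in x]
    # Sum of squared differences between consecutive values.
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    # Sum of squared values.
    den = sum(v * v for v in e)
    return num / den


# A smooth trend is strongly positively autocorrelated, so d is close to 0.
print(durbin_watson(list(range(50))))
# A series that flips sign every step is negatively autocorrelated, so d > 2.
print(durbin_watson([1, -1] * 5))
```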
Test for stationarity in time series data#
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test takes stationarity as its null hypothesis. If the test rejects that hypothesis, you can conclude the series is not stationary; otherwise, the test is inconclusive. Stationarity matters because many forecasting methods assume that statistical properties such as the mean and variance stay constant over time, so non-stationary data is harder to model accurately.
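To show what the KPSS statistic measures, the following stdlib-only sketch implements the level-stationarity version: it removes the mean, accumulates the residuals into partial sums, and scales the sum of squared partial sums by n² times a Newey-West long-run variance with Bartlett weights. A value above 0.463, the 5% critical value for level stationarity tabulated by Kwiatkowski et al., rejects stationarity. The lag rule int(12·(n/100)^(1/4)) is one common choice assumed here; Dataiku's card may use a different lag selection or test the trend-stationary variant, and the function name is hypothetical.

```python
def kpss_level(x, nlags=None):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(x)
    if nlags is None:
        # Schwert-style lag rule; one common choice, assumed here.
        nlags = int(12 * (n / 100) ** 0.25)
    mean = sum(x) / n
    e = [v - mean for v in x]
    # Partial sums S_t of the mean-removed series.
    partial, running = [], 0.0
    for v in e:
        running += v
        partial.append(running)
    # Newey-West long-run variance estimate with Bartlett weights.
    lrv = sum(v * v for v in e) / n
    for s in range(1, nlags + 1):
        weight = 1 - s / (nlags + 1)
        cov = sum(e[t] * e[t - s] for t in range(s, n)) / n
        lrv += 2 * weight * cov
    if lrv <= 0:
        raise ValueError("long-run variance is zero; is the series constant?")
    return sum(p * p for p in partial) / (n * n * lrv)


CRITICAL_5PCT = 0.463  # level-stationarity critical value at the 5% level

trending = list(range(300))                                # steady upward trend
periodic = [((t * 7) % 5) - 2 for t in range(300)]         # repeating, mean-reverting
for name, series in (("trending", trending), ("periodic", periodic)):
    stat = kpss_level(series)
    print(name, round(stat, 3), "rejects stationarity:", stat > CRITICAL_5PCT)
```

The trending series accumulates ever-larger partial sums, so its statistic far exceeds the critical value, while the repeating series keeps its partial sums bounded and does not reject stationarity.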
Create a KPSS test#
Let’s add one last card to the worksheet.
From the existing statistics worksheet, click + New Card.
Select Time series analysis > Stationarity > Kwiatkowski-Phillips-Schmidt-Shin test.
On the Settings tab, provide Adj_close as the series variable as before.
Switch to the Multiple series tab.
Check the box for multiple series. Ticker should already be set as the series identifier.
Click Create Card.
Interpret the output#
At the default 95% confidence level, the results indicate that each series is not stationary. This is consistent with the upward trend detected by the Mann-Kendall test, and could also reflect external events such as market shocks.
See also
See the reference documentation for other stationarity and unit root tests, such as the Augmented Dickey-Fuller test or the Zivot-Andrews test.
What’s next?#
Congratulations on taking your first steps with performing statistical analyses on time series data! Now that you’ve tested statistical assumptions on this data, your next step may be to build forecasting models.
See how to do that in Tutorial | Time series forecasting (Visual ML)!
See also
You can find more information about Time Series Analysis in the reference documentation.