# Cointegration

Cointegration is a statistical property of time series variables. Two or more time series are cointegrated if they share a common stochastic drift.

## Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common case is that the individual series are first-order integrated, I(1), but some (cointegrating) vector of coefficients exists that forms a stationary linear combination of them. For instance, a stock market index and the price of its associated futures contract move through time, each roughly following a random walk. The hypothesis that there is a statistically significant connection between the futures price and the spot price can then be examined by testing for the existence of a cointegrated combination of the two series. If such a combination has a low order of integration, in particular if it is I(0), this can signify an equilibrium relationship between the original series, which are said to be cointegrated.
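The idea above can be illustrated with a small simulation. In this sketch (an illustration, not from the source; the variable names and coefficients are assumptions), two I(1) series are built from a shared random walk, so the cointegrating vector (1, −2) cancels the stochastic trend and leaves a stationary spread:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
common = np.cumsum(rng.normal(size=n))   # shared stochastic trend (random walk)
x = common + rng.normal(size=n)          # I(1): trend plus stationary noise
y = 2.0 * common + rng.normal(size=n)    # I(1), cointegrating vector (1, -2)

spread = y - 2.0 * x                     # the common trend cancels out

# The individual series wander, so their sample variance is large;
# the spread has bounded variance (here roughly 5, the noise variance).
print(np.var(spread) < np.var(x))
```

Each series on its own is non-stationary, yet the fixed linear combination behaves like ordinary noise, which is exactly what cointegration asserts.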

Before the 1980s many economists used linear regressions on (de-trended) non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation, since standard detrending techniques can result in data that are still non-stationary. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).

The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data that had initially been differenced. This method is incorrect if the non-stationary variables are cointegrated, because differencing discards the long-run information contained in the levels of the series. Cointegration measures may be calculated over sets of time series using fast routines.

## Test

The three main methods for testing for cointegration are:

### Engle–Granger two-step method

If two time series $x_{t}$ and $y_{t}$ are cointegrated, a linear combination of them must be stationary. In other words:

$y_{t}-\beta x_{t}=u_{t}$

If we knew $u_{t}$, we could simply test it for stationarity with something like a Dickey–Fuller or Phillips–Perron test and be done. But because we do not know $\beta$, we must first estimate it, generally by ordinary least squares, and then run our stationarity test on the estimated $u_{t}$ series, often denoted ${\hat {u}}_{t}$.

A second regression is then run on the first-differenced variables from the first regression, with the lagged residual ${\hat {u}}_{t-1}$ included as a regressor.

This is the Engle–Granger two-step method.
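The two steps can be sketched in plain NumPy. This is a hedged illustration on simulated data (the series, sample size, and the −3.34 critical value, taken from published Engle–Granger/MacKinnon tables, are assumptions, not part of the source):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
trend = np.cumsum(rng.normal(size=n))
x = trend + rng.normal(size=n)
y = 1.5 * trend + rng.normal(size=n)

# Step 1: estimate the cointegrating coefficient beta by OLS of y on x.
X = np.column_stack([np.ones(n), x])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - alpha - beta * x            # estimated equilibrium errors

# Step 2: Dickey–Fuller-style regression of the differenced residuals on
# their own lag: du_t = rho * u_{t-1} + e_t.  A t-statistic on rho below
# the tabulated critical value (about -3.34 at the 5% level for two
# variables with a constant) rejects the null of no cointegration.
du = np.diff(u_hat)
u_lag = u_hat[:-1]
rho = (u_lag @ du) / (u_lag @ u_lag)
resid = du - rho * u_lag
se = np.sqrt(resid @ resid / (len(du) - 1) / (u_lag @ u_lag))
t_stat = rho / se
print(f"beta = {beta:.3f}, DF t-statistic = {t_stat:.2f}")
```

Note that because ${\hat {u}}_{t}$ is estimated rather than observed, standard Dickey–Fuller critical values do not apply; this is the point developed in the Phillips–Ouliaris section below.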

### Johansen test

The Johansen test allows for more than one cointegrating relationship, unlike the Engle–Granger method, but its justification is asymptotic, i.e. it relies on large samples. If the sample size is too small the results will not be reliable, and one should instead use an autoregressive distributed lag (ARDL) approach.

### Phillips–Ouliaris cointegration test

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration. Because of the spurious-regression phenomenon under the null hypothesis, these test statistics have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and critical values have been tabulated. In finite samples, a superior alternative to using these asymptotic critical values is to generate critical values from simulations.
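The simulation approach mentioned above can be sketched as follows. Under the null of no cointegration the two series are modeled as independent random walks; the residual-based Dickey–Fuller statistic is computed many times and its empirical quantiles serve as critical values. The replication count and sample size here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_df_stat(n, rng):
    """DF t-statistic on residuals of an OLS fit of one random walk on another."""
    x = np.cumsum(rng.normal(size=n))   # independent random walks: the null
    y = np.cumsum(rng.normal(size=n))
    X = np.column_stack([np.ones(n), x])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - a - b * x
    du, u_lag = np.diff(u), u[:-1]
    rho = (u_lag @ du) / (u_lag @ u_lag)
    res = du - rho * u_lag
    se = np.sqrt(res @ res / (len(du) - 1) / (u_lag @ u_lag))
    return rho / se

# Empirical 5% critical value: the 5th percentile of the null distribution.
stats = np.array([residual_df_stat(200, rng) for _ in range(2000)])
crit_5pct = np.quantile(stats, 0.05)
print(f"simulated 5% critical value: {crit_5pct:.2f}")
```

The simulated quantile lands near the tabulated asymptotic values, and matching the simulation's sample size to the data at hand is precisely what makes this finite-sample approach preferable.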