The attached file, dax-30-2006-2015, contains the closing levels of the German stock index, DAX-30, and the daily percentage changes for the period between February 2006 and the end of 2015. Develop ARIMA/GARCH models for the observed data.
The analysis should include
The model I arrived at for describing and forecasting the DAX stock market is ARIMA(0,1,0) (with a Box-Cox transformation) + GARCH(2,1). Below are the requested explanations and additional calculations.
dax <- read.csv("dax30dailyreturns-2006-2015.csv")
The first step is to look at the data. For simplicity, we are going to treat the daily data as a basic time series, ignoring actual dates as well as the gaps in the data caused by weekends.
daxTs <- ts(dax$price)
autoplot(daxTs)
Next we will run auto.arima to find two things: First, what the suggested arima model is, and also we will check the residuals for signs of volatility clustering, or conditional volatility.
The suggested model is ARIMA(0,1,0) random walk without drift, with a BoxCox transformation with a lambda of 0.86. The unconditional variance is 803.1. Checking residuals shows that the plot is close to white noise - the ACF is mostly below threshold levels. However, the residuals plot does appear to show clusters of volatility.
daxAr <- daxTs %>% auto.arima(allowdrift = T, lambda="auto")
daxAr
## Series: .
## ARIMA(0,1,0)
## Box Cox transformation: lambda= 0.8602771
##
## sigma^2 = 803.1: log likelihood = -12036.52
## AIC=24075.04 AICc=24075.04 BIC=24080.87
checkresiduals(daxAr)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,0)
## Q* = 12.202, df = 10, p-value = 0.2718
##
## Model df: 0. Total lags used: 10
To check more closely for residuals clustering we can plot the square of residuals. This seems to clearly show that volatility is clustered - in other words our unconditional variance term is likely not accurate.
plot(daxAr$residuals^2)
To deal with conditional volatility, we will model the data using the GARCH model. Because R does not provide a function to automatically pick parameters for the GARCH(p,q) model, the loop function below will test values of 1-3 for both p and q. The formula assumes an ARMA(0,0) (which applies no autoregression or moving average model) on an already differenced and BoxCox transformed DAX dataset.
Below, GARCH(2,1) is shown to provide the best fit, with the lowest AIC of 9.305.
daxDiff = daxTs %>% BoxCox(daxAr$lambda[1]) %>% diff()
for (p in 1:3){
for (q in 1:3){
formula_str <- sprintf("arma(0,0) + garch(%s, %s)", p, q) %>% reformulate()
daxFit <- garchFit(formula = formula_str, data = daxDiff , trace = F)
str <- sprintf("GARCH(%s, %s) - AIC: %s", p, q, daxFit@fit$ics["AIC"])
print(str)
}
}
## [1] "GARCH(1, 1) - AIC: 9.30612738370554"
## [1] "GARCH(1, 2) - AIC: 9.31151287056042"
## Warning in sqrt(diag(fit$cvar)): NaNs produced
## [1] "GARCH(1, 3) - AIC: 9.30812298993742"
## [1] "GARCH(2, 1) - AIC: 9.30479578848344"
## [1] "GARCH(2, 2) - AIC: 9.30558724079916"
## [1] "GARCH(2, 3) - AIC: 9.30634996672897"
## [1] "GARCH(3, 1) - AIC: 9.30575733159481"
## [1] "GARCH(3, 2) - AIC: 9.30586520535771"
## Warning in sqrt(diag(fit$cvar)): NaNs produced
## [1] "GARCH(3, 3) - AIC: 9.30567511667776"
Now we will assess the results from the ARMA(0,0) + GARCH(2,1). It is worth noting that GARCH(2,1) may be slightly overkill, because as mentioned in class, the “simple GARCH (1,1) models often works very well in representing volatility clustering occurring in financial markets where conditional variance is positively correlated.”
GARCH is applied to an ARMA model which does not include differencing, so I will run the garchFit function on already differenced data.
daxGarch = garchFit(~arma(0,0)+garch(2,1), data = daxDiff , trace = F)
summary(daxGarch)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~arma(0, 0) + garch(2, 1), data = daxDiff,
## trace = F)
##
## Mean and Variance Equation:
## data ~ arma(0, 0) + garch(2, 1)
## <environment: 0x000001ffafe60a80>
## [data = daxDiff]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 alpha2 beta1
## 1.536901 13.858488 0.037866 0.061404 0.885926
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 1.53690 0.44395 3.462 0.000536 ***
## omega 13.85849 3.62296 3.825 0.000131 ***
## alpha1 0.03787 0.01960 1.932 0.053352 .
## alpha2 0.06140 0.02415 2.542 0.011008 *
## beta1 0.88593 0.01569 56.465 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## -11751.61 normalized: -4.650419
##
## Description:
## Sat Dec 10 20:24:27 2022 by user: gideo
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 205.9121 0
## Shapiro-Wilk Test R W 0.9878889 7.891146e-14
## Ljung-Box Test R Q(10) 6.416474 0.7791441
## Ljung-Box Test R Q(15) 9.068859 0.8738896
## Ljung-Box Test R Q(20) 14.32942 0.8134087
## Ljung-Box Test R^2 Q(10) 4.147413 0.940452
## Ljung-Box Test R^2 Q(15) 8.986804 0.8782064
## Ljung-Box Test R^2 Q(20) 15.4441 0.7504572
## LM Arch Test R TR^2 6.977546 0.8590938
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## 9.304796 9.316341 9.304788 9.308985
Using this GARCH model we will assess some plots to check performance. Note that the residuals of the GARCH model are the same as the the differenced series - GARCH affects the variance, not the estimated value.
ACF The below ACF plot shows that most lag correlations do not exceed the 95% confidence interval correlation band, although some do by a little bit. The 5th lag correlation is most concerning. Although not shown for space reasons, the PACF chart shows similar levels of partial autocorrelations, also with a slightly concerning 5th lag.
ggAcf(daxGarch@residuals)
Ljung-Box test The large p-value of the Ljung Box (0.89) test suggest that the residuals are white noise.
Box.test(daxGarch@residuals)
##
## Box-Pierce test
##
## data: daxGarch@residuals
## X-squared = 0.020138, df = 1, p-value = 0.8872
QQPlot This plot shows some lack of normality at tail ends. Although it is useful to know that that the distribution of residuals is not perfectly normal, this result does not necessarily mean that the residuals are not white noise.
residuals(daxGarch) %>% qqnorm()
Using the “predict” function we can predict differenced data for 10 days into the future. A plot and tabular data is below. The plot shows the confidence interval gradually converging to +/- 55, which is 1.96 times the series’ unconditional standard deviation of 28.3, which was given to us by the original ARIMA function.
predict(daxGarch, plot=TRUE, nx=50)
## meanForecast meanError standardDeviation lowerInterval upperInterval
## 1 1.536901 45.34574 45.34574 -87.33912 90.41292
## 2 1.536901 44.52459 44.52459 -85.72968 88.80349
## 3 1.536901 44.40135 44.40135 -85.48814 88.56195
## 4 1.536901 44.23601 44.23601 -85.16408 88.23788
## 5 1.536901 44.07508 44.07508 -84.84868 87.92248
## 6 1.536901 43.91565 43.91565 -84.53620 87.61000
## 7 1.536901 43.75789 43.75789 -84.22699 87.30079
## 8 1.536901 43.60176 43.60176 -83.92098 86.99479
## 9 1.536901 43.44726 43.44726 -83.61817 86.69197
## 10 1.536901 43.29438 43.29438 -83.31852 86.39232
Note that these are predictions for the differenced series, or the change in each unit of time. To see the actual series prediction (without a 95% confidence interval) we can reverse the BoxCox and differencing transformations and then add these values to the final point of the DAX series.
predict(daxGarch)$meanForecast %>% diffinv() %>% InvBoxCox(lambda = daxAr$lambda[1]) + daxTs[length(daxTs)]
## [1] 10744.01 10745.67 10747.51 10749.45 10751.49 10753.59 10755.76 10757.98
## [9] 10760.24 10762.55 10764.90
First we should just describe what our model does, after the series has been appropriately transformed via Box-Cox and then differenced.
The ARMA(0,0) + GARCH(2,1) model (with differencing) describes a univariate time series where:
The below plot of volatility from the GARCH model alongside the original data demonstrates that the model used correctly assumes higher volatility during apparent volatility clusters.
par(mfrow = c(2,1))
plot(diff(daxTs))
plot(volatility(daxGarch), type="l")
The period of highest volatility (x-index between 500 and 1,000) aligns with the onset of a global recession in 2008-2009, which makes sense. The reason having conditional volatility is important for financial time series is that volatility changes based on economic and political conditions.
The high volatility at the end of the series is also shown by the wide confidence interval in the prediction plot, which then converges toward the unconditional variance.
Finally, here is the plot of daxGarch, which shows that most points fall withing the 95th confidence interval of the prediction. The biggest “miss” is the enormous drop at time period 500. Although such a sudden fall would be hard to capture in any model, this does point at one of the flaws in the standard GARCH model - which is that it does not allow for different reactions to negative or positive changes. In reality, large negative changes tend to create more volatility than large positive changes. 1
plot(daxGarch, which = 3)
Based on this analysis, I would say that our model: