### Data

Market data for several German stock indices (DAX, MDAX, SDAX, and TecDAX) were collected from Thomson Reuters Eikon (formerly Datastream). Because of data restrictions in the Thomson Reuters Eikon database, trading volume was measured as turnover by volume for the DAX, MDAX, and SDAX but as turnover by value for the TecDAX. For the same reason, the TecDAX trading volume series was shorter, with 2,144 observations compared with 3,565 observations for all other index and variable combinations. We also included control variables: dummy variables for Mondays (Wang et al. 1997), for January and December (Agrawal and Tandon 1994; Gultekin and Gultekin 1983), and for the 'Sell in May and go away' strategy, known as the Halloween anomaly (Bouman and Jacobsen 2002). Furthermore, the previous day's return on the Dow Jones Industrial Average (DJIA) was included as a variable (Drozdz et al. 2001). The DJIA returns were calculated on the basis of the performance index, and the German index returns were likewise calculated on a total return basis. Additionally, we added a dummy variable to capture a possible turn-of-the-month effect (Zwergel 2010). The weather data were obtained from the Climate Data Center (CDC) of the German Weather Service for the Frankfurt station (Station-ID 01420) and cover the period from August 2003 to August 2017. Frankfurt is Germany's financial center, and because Germany is relatively small compared with the US or China, a large proportion of domestic investors are exposed to the weather in Frankfurt or to similar weather conditions (Schneider 2014a,b). Schneider (2014a), for example, showed that air pressure conditions are highly correlated across Germany, and Klein (2005) found high correlations of sunshine duration and cloudiness between major German cities. Therefore, the weather in Frankfurt is a good proxy for that in other German cities.

The selected weather variables were sky cover, temperature, precipitation, air pressure, humidity, and wind speed. Sunshine duration was excluded because of its multicollinearity with cloud cover. To account for seasonal weather patterns, we followed Hirshleifer and Shumway (2003) and calculated, for each weather variable, the average value for each calendar week over the whole dataset. The corresponding weekly mean was then subtracted from each daily observation, so that the resulting variables measure the impact of abnormal weather conditions on stock markets. Table 5 summarizes the variables and their descriptions.
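The weekly deseasonalization described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `deseasonalize_by_week` is hypothetical, and calendar weeks are assumed to follow the ISO 8601 convention (`date.isocalendar()`), which the text does not specify.

```python
from collections import defaultdict
from datetime import date


def deseasonalize_by_week(observations):
    """Subtract the calendar-week mean from each daily observation.

    `observations` is a list of (date, value) pairs. Calendar weeks are
    identified by their ISO week number (an assumption), and each week's
    mean is pooled across all years in the sample.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in observations:
        week = day.isocalendar()[1]  # ISO week number, 1..53
        sums[week] += value
        counts[week] += 1
    weekly_mean = {week: sums[week] / counts[week] for week in sums}
    # Abnormal weather = observation minus the mean of its calendar week.
    return [(day, value - weekly_mean[day.isocalendar()[1]]) for day, value in observations]


if __name__ == "__main__":
    temps = [(date(2010, 1, 4), 2.0), (date(2011, 1, 3), 4.0), (date(2010, 7, 5), 20.0)]
    print([v for _, v in deseasonalize_by_week(temps)])  # [-1.0, 1.0, 0.0]
```

Both January dates fall in ISO week 1, so each is measured against their common mean of 3.0; the July observation is alone in its week and therefore has no abnormal component.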

**Table 5 Variables and their descriptions**

### Descriptives

We used the Jarque-Bera test to test the stock market data for normality. The return, trading volume, and volatility data were not normally distributed, which suggests that the residuals of subsequent regressions would not be normally distributed either; hence, we used robust standard errors for the significance tests. Autocorrelation was assessed with the Ljung-Box (LB) Q test, for which a significant result indicates autocorrelation, that is, the absence of white noise. The test results indicated both autocorrelation and the existence of volatility clusters. We tested for difference stationarity (a unit root) with the augmented Dickey-Fuller (ADF) test and for trend stationarity with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. A stationary series should yield a statistically significant ADF test (rejecting the unit-root null) and a nonsignificant KPSS test (not rejecting the stationarity null). These tests indicated nonstationarity for the turnover-by-volume and turnover-by-value series, so we transformed them by taking the first difference of the natural logarithm. On the basis of the test results, we then treated all time series as stationary. Table 6 shows the descriptives of the weather variables, and Tables 7 and 8 show the stock market descriptives and the remaining test results.
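The log-difference transformation and the Jarque-Bera and Ljung-Box statistics mentioned above can be sketched with the Python standard library. This is a hedged illustration rather than the software used in the study; the function names are hypothetical, and the ADF and KPSS tests are omitted because they rely on nonstandard critical values (statsmodels, for instance, provides `adfuller` and `kpss`).

```python
import math
import random


def log_diff(series):
    """First difference of the natural logarithm: ln(x_t) - ln(x_{t-1})."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


def jarque_bera(x):
    """Return (JB, p-value) with JB = n/6 * (S^2 + (K - 3)^2 / 4).

    Under normality JB is asymptotically chi-squared with 2 degrees of
    freedom, whose survival function is exp(-JB / 2).
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


def ljung_box(x, lags):
    """Ljung-Box Q = n(n + 2) * sum_{k=1}^{lags} r_k^2 / (n - k).

    Compare Q with the chi-squared critical value with `lags` degrees of
    freedom; a significant Q indicates autocorrelation (no white noise).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    random.seed(42)
    volume = [1000.0]
    for _ in range(500):
        volume.append(volume[-1] * math.exp(random.gauss(0.0, 0.05)))  # nonstationary level
    growth = log_diff(volume)  # transformed series of the kind used in the regressions
    print(jarque_bera(growth), ljung_box(growth, 10))
```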

**Table 6 Descriptives of weather variables**

**Table 7 Descriptives of stock market data I**

**Table 8 Descriptives of stock market data II**

### Model

Stock market returns have specific characteristics that classical time-series models or simple OLS regressions cannot adequately represent, including leptokurtic distributions, higher-order autocorrelation, and volatility clusters. The autoregressive conditional heteroskedasticity (ARCH) models introduced by Engle (1982) can handle time series with these characteristics: they assume that the conditional variance is a function of the information available from previous periods, so the variance of the error term varies over time. For modeling financial time series, ARCH models have largely been replaced by GARCH models, which allow a more parsimonious specification. Classic ARCH and GARCH models assume that positive and negative errors have symmetric effects on volatility; that is, good and bad news should affect the variation in the data in the same way. However, this assumption often does not withstand empirical scrutiny for capital market data. For stock returns, for example, volatility has been observed to react more strongly to falling prices (bad news) than to rising prices (good news). This asymmetric reaction of volatility is called the leverage effect (Black 1976) and is captured by the exponential GARCH (E-GARCH) model proposed by Nelson (1991) and the threshold GARCH (T-GARCH or GJR-GARCH) model proposed by Glosten et al. (1993).

Our dataset displayed these characteristics. The LB test revealed strong autocorrelation in the return and trading volume series, and the data showed volatility clustering. Therefore, following other studies (e.g., Chang et al. 2006; Floros 2011; Kang et al. 2010; Yoon and Kang 2009), we applied GARCH models (Bollerslev 1986) to capture this volatility clustering and to account for heteroskedasticity in the estimation.

To investigate the relationship between stock returns and abnormal weather conditions, we chose a linear second-order autoregressive (AR(2)) model with the GJR-GARCH(1,1) variance process of Glosten et al. (1993). In all models, following good empirical practice, we used Bollerslev-Wooldridge standard errors from the maximum likelihood estimation, which are robust to conditional nonnormality (Zivot 2009).

$$\text{RET}_{i,t}=\mu_{i,0}+w_{1}\text{RET}_{i,t-1}+w_{2}\text{RET}_{i,t-2}+w_{3}\text{WIND*}_{t}+w_{4}\text{PREC*}_{t}+w_{5}\text{SKC*}_{t}+w_{6}\text{PRES*}_{t}+w_{7}\text{TEMP*}_{t}+w_{8}\text{HUMI*}_{t}+w_{9}\text{DJIA}_{t-1}+w_{10}\text{MON}+w_{11}\text{DEC}+w_{12}\text{Halloween}+w_{13}\text{JAN}+w_{14}\text{TURN}+w_{15}\text{TUR*}+\epsilon_{i,t}, \tag{1}$$

$$\sigma^{2}_{t}=\alpha_{0}+\sum_{i=1}^{q}(\alpha_{i}+\gamma d_{t-i})\epsilon_{t-i}^{2}+\sum_{j=1}^{p}\beta_{j}\sigma^{2}_{t-j}. \tag{2}$$

Eq. (1) includes autoregressive terms to correct for the autocorrelation of returns, and the weather and control variables enter as explanatory variables. The error term \(\epsilon_{t}\) is assumed to be a zero-mean, normally distributed white noise process. Eq. (2) specifies the conditional variance \(\sigma_{t}^{2}\) at time \(t\). The coefficient \(\alpha\) on the lagged squared residuals can be interpreted as the news coefficient: higher values imply that recent news has a greater impact on volatility. The coefficient \(\beta\) on the conditional variance of previous periods captures the impact of past variance, and \(\alpha+\beta\) measures the persistence of volatility (Bollerslev 1986); in the GJR specification with symmetrically distributed errors, persistence is \(\alpha+\beta+\gamma/2\).

The GJR specification allows bad and good news to have asymmetric impacts on the conditional variance. The leverage effect \(\gamma\) enters via the dummy variable \(d\), where \(d_{t}=1\) if \(\epsilon_{t}<0\) and \(d_{t}=0\) otherwise. Good news (\(\epsilon_{t}\geq 0\)) thus has an impact of \(\alpha_{i}\), while bad news (\(\epsilon_{t}<0\)) has an impact of \(\alpha_{i}+\gamma\). If \(\gamma\) is significantly positive, a leverage effect exists: bad news increases volatility more than good news of the same magnitude. For \(\gamma=0\), the model reduces to a symmetric GARCH model. Nonnegativity of the conditional variance is ensured if \(\alpha_{0}>0\), \(\alpha_{i}\geq 0\), \(\alpha_{i}+\gamma\geq 0\), and \(\beta_{j}\geq 0\).
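To make the asymmetry concrete, the following Python sketch evaluates the GJR-GARCH(1,1) recursion of Eq. (2) for given residuals and parameters. It does not estimate the model, and the function names and parameter values are hypothetical, chosen only to show that a negative shock raises next-period variance by \(\gamma\epsilon_{t-1}^{2}\) more than a positive shock of the same size.

```python
def gjr_garch_variance(residuals, alpha0, alpha1, gamma, beta1, sigma2_init):
    """Conditional variances from the GJR-GARCH(1,1) recursion (Eq. 2, p = q = 1):

        sigma2_t = alpha0 + (alpha1 + gamma * d_{t-1}) * eps_{t-1}^2 + beta1 * sigma2_{t-1},

    with d_{t-1} = 1 if eps_{t-1} < 0 and 0 otherwise. For nonempty input,
    one variance is returned per residual, starting with sigma2_init.
    """
    if not (alpha0 > 0 and alpha1 >= 0 and alpha1 + gamma >= 0 and beta1 >= 0):
        raise ValueError("parameters violate the nonnegativity conditions")
    sigma2 = [sigma2_init]
    for eps_prev in residuals[:-1]:
        d_prev = 1.0 if eps_prev < 0 else 0.0  # bad-news indicator
        sigma2.append(alpha0 + (alpha1 + gamma * d_prev) * eps_prev ** 2 + beta1 * sigma2[-1])
    return sigma2


def gjr_persistence(alpha1, gamma, beta1):
    """Volatility persistence alpha1 + beta1 + gamma / 2 (symmetric zero-mean errors)."""
    return alpha1 + beta1 + gamma / 2.0


if __name__ == "__main__":
    params = dict(alpha0=0.01, alpha1=0.05, gamma=0.10, beta1=0.85, sigma2_init=1.0)
    print(round(gjr_garch_variance([-1.0, 0.0], **params)[1], 4))  # 1.01 (bad news)
    print(round(gjr_garch_variance([1.0, 0.0], **params)[1], 4))   # 0.91 (good news)
    print(round(gjr_persistence(0.05, 0.10, 0.85), 4))             # 0.95
```

Setting `gamma=0.0` makes both shocks produce the same variance, which is the symmetric GARCH special case.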

A similar model with a GJR-GARCH(1,1) process is adopted to assess the relationship between stock returns and daily changes in weather.

$$\text{RET}_{i,t}=\mu_{i,0}+w_{1}\text{RET}_{i,t-1}+w_{2}\text{RET}_{i,t-2}+w_{3}\Delta\text{WIND}_{t}+w_{4}\Delta\text{PREC}_{t}+w_{5}\Delta\text{SKC}_{t}+w_{6}\Delta\text{PRES}_{t}+w_{7}\Delta\text{TEMP}_{t}+w_{8}\Delta\text{HUMI}_{t}+w_{9}\text{DJIA}_{t-1}+w_{10}\text{MON}+w_{11}\text{DEC}+w_{12}\text{Halloween}+w_{13}\text{JAN}+w_{14}\text{TURN}+w_{15}\text{TUR*}+\epsilon_{i,t}. \tag{3}$$

To analyze the relationship between stock volatility and weather factors, we selected a linear autoregressive (AR) model with the E-GARCH(1,1) process of Nelson (1991) because it does not require nonnegativity constraints on the parameters of the variance equation, which now includes the weather and control variables. Modeling the logarithm of the conditional variance (Eq. 5) ensures that the variance is positive. Like GJR-GARCH models, E-GARCH models can capture asymmetry in volatility.

$$\text{RET}_{i,t}=\mu_{i}+\rho_{i}\text{RET}_{i,t-1}+\epsilon_{i,t}. \tag{4}$$

$$\ln(\sigma_{i,t}^{2})=\alpha_{0}+\alpha_{i}+g(z_{t-1})+\beta_{i}\ln\sigma_{i,t-1}^{2}+\sum_{k=1}^{l}m_{ik}M_{ik,t}, \tag{5}$$

$$\text{with}\;g(z_{t-1})=\Theta\left[\left|\frac{\epsilon_{t-1}}{\sigma_{t-1}}\right|-E\left(\left|\frac{\epsilon_{t-1}}{\sigma_{t-1}}\right|\right)\right]+\gamma\frac{\epsilon_{t-1}}{\sigma_{t-1}}. \tag{6}$$

Eq. (4) assumes that returns follow an AR(1) process with drift, as in Symeonidis et al. (2010). \(M\) represents the weather and control variables. In Eqs. (5) and (6), \(\gamma\) captures the sign (leverage) effect, \(\Theta\) the size effect, and \(\beta\) the persistence of volatility.
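The E-GARCH variance recursion with weather regressors can be sketched in Python as follows. This is a hedged illustration with hypothetical function names and parameter values, not an estimation routine. It assumes standard normal innovations, so that \(E|z|=\sqrt{2/\pi}\), and it writes the recursion in Nelson's standard form \(\ln\sigma_{t}^{2}=\alpha_{0}+g(z_{t-1})+\beta\ln\sigma_{t-1}^{2}+\sum_{k}m_{k}M_{k,t}\); the separate constant \(\alpha_{i}\) of Eq. (5) is assumed to be absorbed into \(\alpha_{0}\).

```python
import math

# Under standard normal innovations, E|z| = sqrt(2 / pi).
EXPECTED_ABS_Z = math.sqrt(2.0 / math.pi)


def egarch_g(z, theta, gamma):
    """News impact term g(z) = theta * (|z| - E|z|) + gamma * z, as in Eq. (6)."""
    return theta * (abs(z) - EXPECTED_ABS_Z) + gamma * z


def egarch_log_variance(residuals, exog, alpha0, theta, gamma, beta, m, log_sigma2_init):
    """Log conditional variances from an EGARCH(1,1) recursion with regressors:

        ln sigma2_t = alpha0 + g(z_{t-1}) + beta * ln sigma2_{t-1} + sum_k m_k * M_{k,t},

    where z_{t-1} = eps_{t-1} / sigma_{t-1} and exog[t] holds M_{1,t}, ..., M_{l,t}.
    No sign restrictions are needed: sigma2_t = exp(ln sigma2_t) is always positive.
    """
    log_s2 = [log_sigma2_init]
    for t in range(1, len(residuals)):
        z_prev = residuals[t - 1] / math.exp(0.5 * log_s2[t - 1])  # standardized shock
        exog_term = sum(mk * value for mk, value in zip(m, exog[t]))
        log_s2.append(alpha0 + egarch_g(z_prev, theta, gamma) + beta * log_s2[t - 1] + exog_term)
    return log_s2


if __name__ == "__main__":
    # Hypothetical parameters: a negative gamma makes bad news raise volatility more.
    print(egarch_g(-1.0, 0.2, -0.1) > egarch_g(1.0, 0.2, -0.1))  # True
    print(egarch_log_variance([0.0, 0.0], [[0.0], [1.0]], 0.0, 0.2, -0.1, 0.9, [0.5], 0.0))
```

In this parameterization the leverage effect corresponds to a negative \(\gamma\), because a negative standardized shock then adds \(|\gamma z|\) to the log variance while a positive shock subtracts it.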

We tested several models for the impact of weather on trading volume. On the basis of (unreported) tests, namely, Ljung-Box Q statistics and Engle's ARCH test, a linear AR(5) model with a GJR-GARCH(1,1) process was identified as the most appropriate specification. The first difference of the logarithmized trading volume (TUR*) was regressed on the weather and control variables, and the variance equation was specified as in the return models.

$$\text{TUR*}_{i,t}=\mu_{i,0}+\sum^{5}_{l=1}z_{il}\text{TUR*}_{i,t-l}+h_{1}\text{WIND*}_{t}+h_{2}\text{PREC*}_{t}+h_{3}\text{SKC*}_{t}+h_{4}\text{PRES*}_{t}+h_{5}\text{TEMP*}_{t}+h_{6}\text{HUMI*}_{t}+h_{7}\text{RET}_{i,t-1}+h_{8}\text{DJIA}_{t-1}+h_{9}\text{MON}+h_{10}\text{DEC}+h_{11}\text{Halloween}+h_{12}\text{JAN}+h_{13}\text{TURN}+\epsilon_{i,t}. \tag{7}$$

### Regression Diagnostics and Robustness

For maximum-likelihood-based procedures, model fit is commonly assessed with the Akaike and Bayesian information criteria (AIC and BIC, respectively). Because these criteria are mainly used for model selection and the detection of overfitting, they are not relevant for our purposes. To test for remaining heteroskedasticity in the residuals, we used the Lagrange multiplier (LM) test proposed by Engle (1982), for which nonsignificant results indicate homoskedastic residuals. Table 9 shows the ARCH-LM test results for different lag lengths. With the exception of lag 7 in the TecDAX trading volume model, all test results were nonsignificant, so we could assume homoskedastic residuals. Residual autocorrelation was tested with the LB test for different lags, using both standardized and squared standardized residuals (Table 10). The LB test on standardized residuals evaluates serial dependence in the first moment, whereas the LB test on squared standardized residuals, like the ARCH-LM test, evaluates serial dependence in the second moment.

The clearly significant results for the trading volume models of the DAX, MDAX, SDAX, and TecDAX reflect an autocorrelation problem that was already present at the model selection stage (see Sect. 3.3) and that our AR(5) model could not completely resolve. However, all further changes to the model specification (e.g., more lags or higher-order differencing of trading volume) did not improve but in fact worsened the diagnostic values. Therefore, we retained the AR(5) GJR-GARCH(1,1) model. The LB test on squared standardized residuals and the ARCH-LM test showed no problems. In summary, the regression diagnostics indicate that the models are adequate, even though autocorrelation problems remain in the trading volume models.

We tested the robustness of the results in two ways. First, we removed all outliers from the data and re-estimated the GARCH models; the results did not change when the outliers were excluded. Second, we varied the distributional assumption of the GARCH specification, estimating the models with the generalized error distribution (GED) and Student's t distribution instead of the normal distribution used in our main calculations. Except for the trading volume results, the effects changed only slightly under these alternative distributional assumptions. One reason for the lack of robustness in the trading volume results could be the residual autocorrelation problem discussed above. Apart from trading volume, we therefore see no evidence of a lack of robustness. Comprehensive robustness results are available upon request.

**Table 9 Regression diagnostics: ARCH-LM test**

**Table 10 Regression diagnostics: LB test**

### Results

The results are presented in detail in Tables 16–19 in the appendix and in concise form in Tables 11–14 in this section. For the interpretation of the results, we used only the abridged tables.

In contrast to the findings of earlier studies, we could not observe a sunshine or cloud cover effect. One reason may be that almost all earlier studies identifying such an effect used classic OLS or time-series models, which cannot accurately represent stock market data because these data are characterized by autocorrelation and volatility clustering. Consequently, it cannot be ruled out that the significant results reported in the prior literature are spurious. Only Yoon and Kang (2009) used a model appropriate for capital market data, namely, a GJR model; they identified a significant impact of cloud cover on stock returns in Korea in the period before the Asian financial crisis (1990–1997), but this impact disappeared in the post-crisis period (1998–2006).

In total, we found 8 significant effects that can be assigned to the theoretical construct of the AIM and 3 significant effects associated with the MMH. These findings can be taken as a weak indication that good weather leads to risk-seeking behavior and bad weather to risk-averse behavior in the stock market. A more detailed discussion is provided in Sect. 4.

### Returns

When returns were the dependent variable, the results showed almost no weather effects in the German stock markets. The only statistically significant effect was that of air pressure on SDAX returns. This suggests that good weather conditions may have a positive effect on returns (AIM), but the predominant absence of effects points to a rejection of H1a and H1b.

In addition, we modeled the effect of daily changes in weather on returns and found more significant effects. When air pressure increases, the returns of the DAX, MDAX, and SDAX increase (AIM); only for the TecDAX is there no significant relationship with air pressure. Our results also show positive effects of temperature increases on the DAX and MDAX (AIM). The literature (see Table 2) mainly found negative effects of temperature increases on returns. In the German market, by contrast, a temperature increase is associated with higher returns, which can be explained by Germany's temperate climate: there, rising temperatures represent a positive change in weather, whereas in Asian markets, for example, rising temperatures tend to denote a worsening of the weather. These effects of changes in air pressure and temperature point toward H1c. However, because the effects are not consistent across the large and small indices and the other weather variables show no effects, we cannot confirm H1c.

**Table 11 Regression results overview: Returns**

**Table 12 Regression results overview: Changes in weather and returns**

### Volatility

Among the weather variables, we observed three statistically significant effects on volatility (see Table 13). Wind speed reduced the volatility of the SDAX (AIM), and relative humidity reduced the volatility of the TecDAX (AIM). Thus, bad weather conditions may have had a negative effect on volatility, which is indicative of risk-averse behavior and therefore attributable to the AIM. In contrast, cloud cover had a positive impact on TecDAX volatility, which is attributable to the MMH. Since there were no weather effects for the DAX and MDAX and only three partly contradictory effects for the SDAX and TecDAX, we could not confirm H2a or H2b.

**Table 13 Regression results overview: Volatility**

### Trading volume

The regression results show significant negative effects of air pressure on trading volume for the SDAX and TecDAX. Because rising air pressure can be associated with good weather, this suggests that good weather leads to less trading (MMH). These effects are in line with H3b, which posits that good weather conditions lead to lower trading volume. However, we were not able to confirm H3b because none of the other weather variables had an effect, the effects appeared only for the SDAX and TecDAX, and some autocorrelation problems remained in the trading volume analysis (see Sect. 3.4).

**Table 14 Regression results overview: Trading volume**

### GARCH vs. OLS

The majority of past empirical studies of weather anomalies used OLS regression. In most cases, however, this approach is inadequate because of heteroskedasticity, even when White or Newey-West standard errors are used to control for it. Our literature review showed, for example, that fewer than one-third of the studies on returns used modern financial econometric methods (see also Sect. 2).

A comparative analysis shows how strongly the choice of method influences the results. Estimating our models with OLS and White standard errors led to completely different results from those obtained with the GARCH models. Table 15 gives an overview of the GARCH and OLS results; detailed regression tables are available upon request.

**Table 15 Regression results overview: GARCH vs. OLS**

Table 15 shows that OLS regression detects only one significant effect of weather on returns; for changes in weather, OLS indicates a positive influence of sky cover on the DAX. The GARCH models, in contrast, identified one positive effect of air pressure on SDAX returns and five positive effects of changes in air pressure and temperature on the DAX, MDAX, and SDAX. For trading volume, OLS regression likewise provided a completely different picture. Whereas the GARCH model showed only two negative effects of air pressure on trading volume for the SDAX and TecDAX, OLS regression showed one positive effect of wind on trading volume and eight negative effects of sky cover, air pressure, temperature, and humidity for the DAX, MDAX, and SDAX.

These different results make clear that the choice of method has a substantial impact on the results and that violating the assumptions of econometric models when detecting financial market anomalies can lead to incorrect conclusions. It is equally important to consider which control variables are used. In particular, calendar effects (e.g., the Halloween effect and Monday, January, and December effects) should be controlled for; otherwise, they could be incorrectly attributed to the weather.
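The OLS benchmark in Table 15 relies on White's heteroskedasticity-consistent standard errors. The Python sketch below illustrates what such a correction does and does not change. It is a hedged example with a hypothetical function name, a single regressor, and the HC0 variant assumed, since the text does not say which variant was used: the OLS slope itself is unchanged and only its standard error is adjusted, whereas a GARCH variance equation models the volatility clustering directly.

```python
import math


def ols_slope_with_white_se(x, y):
    """Simple OLS regression y = a + b*x + e.

    Returns the slope b, its conventional (homoskedastic) standard error and
    White's heteroskedasticity-consistent HC0 standard error:
        Var_HC0(b) = sum((x_i - xbar)^2 * e_i^2) / (sum((x_i - xbar)^2))^2
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance, n - 2 degrees of freedom
    se_classic = math.sqrt(s2 / sxx)
    se_white = math.sqrt(sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2)
    return b, se_classic, se_white


if __name__ == "__main__":
    b, se_classic, se_white = ols_slope_with_white_se([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
    print(round(b, 4), round(se_classic, 4), round(se_white, 4))  # 0.8 0.4243 0.18
```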