
diplomathesis's Introduction

DiplomaThesis

Can Machines Explain Stock Returns?

Abstract

Recent research shows that neural networks predict stock returns better than any other model. The networks' mathematically complicated nature is both their advantage, enabling them to uncover complex patterns, and their curse, making them less readily interpretable, which obscures their strengths and weaknesses and complicates their usage. This thesis is one of the first attempts at overcoming this curse in the domain of stock return prediction. Using some of the recently developed machine learning interpretability methods, it explains the networks' superior return forecasts. This gives new answers to the long-standing question of which variables explain differences in stock returns and clarifies the unparalleled ability of networks to identify future winners and losers among the stocks in the market. Building on 50 years of asset pricing research, this thesis is likely the first to uncover whether neural networks support the economic mechanisms proposed by the literature. To a finance practitioner, the thesis offers the transparency of decomposing any prediction into its drivers, while maintaining state-of-the-art profitability in terms of Sharpe ratio. Additionally, a novel metric is proposed that is particularly suited to interpreting return-predicting networks in financial practice. This thesis offers a usable and economically explainable account of how machines make stock return predictions.

diplomathesis's People

Contributors

karolinachalupova


diplomathesis's Issues

Correlated features problems

  • Feature importance measures are calculated by nulling a single input feature. But some features are highly correlated (calculated slightly differently from the same underlying data), so it does not make sense to leave the correlated features un-nulled while one of them is nulled.
  • A possible way around this is to choose only a single feature from each group of highly correlated features. I am considering doing this manually, using the correlation matrix and my own judgment to decide.
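The manual selection could be supported by a small helper that greedily keeps one representative per correlated group. This is only a sketch: the function name, the scan order, and the 0.9 threshold are illustrative assumptions, not part of the thesis pipeline.

```python
import pandas as pd

def select_representatives(corr: pd.DataFrame, threshold: float = 0.9):
    """Greedily keep one feature per group of highly correlated features.

    Features are scanned in column order; a feature is dropped if its
    absolute correlation with any already-kept feature exceeds `threshold`.
    """
    kept = []
    for col in corr.columns:
        if all(abs(corr.loc[col, k]) <= threshold for k in kept):
            kept.append(col)
    return kept

# Toy example: f1 and f2 are near-duplicates, f3 is independent.
corr = pd.DataFrame(
    [[1.0, 0.98, 0.1],
     [0.98, 1.0, 0.2],
     [0.1, 0.2, 1.0]],
    index=["f1", "f2", "f3"], columns=["f1", "f2", "f3"])
print(select_representatives(corr))  # ['f1', 'f3']
```

Scanning in a fixed order makes the choice of representative deterministic; the manual, correlation-matrix-plus-judgment approach described above could then be used to override individual picks.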

Use smaller number of features

It might also make sense to use a smaller set of features, because across Kelly's work (Gu et al., 2018), and in fact across the rest of the literature, the models seem to largely agree on which variables are important, so for now I could take those as given. With fewer features, the interpretability results would be much easier to survey, which I think is good, because both I and the reader will find them easier to navigate. I would aim for around 30 features instead of the current 150.

@martinhronec what do you think?

Can I somehow reduce time dimension to make my life easier?

Right now, the same model is trained multiple times (call it T times) on an expanding window, so there are essentially T models and T corresponding test sets (same as Gu et al., 2018; we have discussed this, @martinhronec). This makes sense from the perspective of using the model in practice for trading.
However, it is burdensome for purposes of interpretability. Given that I would like to try more architectures (simpler to more complex, as in Gu et al., 2018) and different seeds, there are a lot of models: the number of models is (number of architectures) × (number of seeds) × T. It would make my life easier if I could have only T = 1, in other words, a single train-validation-test split for the whole data. It would be a lot easier to code as well as to handle the results.
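For concreteness, the expanding-window scheme can be sketched as below; the function and the window lengths are illustrative placeholders, not the actual configuration used in the thesis.

```python
def expanding_window_splits(first_year, last_year, train_min, val_years):
    """Yield (train, val, test) year lists for an expanding window.

    The training window grows by one year per split; validation and test
    windows slide forward. This mirrors the Gu et al. (2018)-style scheme
    only in spirit -- the exact window lengths here are placeholders.
    """
    splits = []
    test_year = first_year + train_min + val_years
    while test_year <= last_year:
        train = list(range(first_year, test_year - val_years))
        val = list(range(test_year - val_years, test_year))
        splits.append((train, val, [test_year]))
        test_year += 1
    return splits

# Small sample: data 2000-2006, at least 3 training years, 2 validation years.
for train, val, test in expanding_window_splits(2000, 2006, 3, 2):
    print(len(train), val, test)
```

Each element of the returned list corresponds to one of the T retrainings; collapsing to T = 1 would mean keeping only the last split.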
The question is: can I do it? There are two ways of doing this:

  • ASSUMING IT: getting permission to do it from prior literature, e.g. Gu et al. (2018). Is the model stable in time? I need to look.
  • SHOWING IT: demonstrating myself that the model is stable in time. This means showing, e.g., that feature importance is stable in time.
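The SHOWING IT route could start with a simple check like the one below: rank-correlate the feature-importance vectors of consecutive expanding-window models. The helper name is hypothetical, and Spearman correlation is computed via manual ranking to stay dependency-light; values near 1 across all pairs would support time stability.

```python
import numpy as np

def importance_stability(importances):
    """Spearman rank correlation of feature importances between
    consecutive time-window models.

    `importances`: list of 1-D arrays, one per time window.
    Returns one correlation per consecutive pair of windows.
    """
    def rank(v):
        order = np.argsort(v)
        ranks = np.empty(len(v), dtype=float)
        ranks[order] = np.arange(len(v))
        return ranks

    corrs = []
    for a, b in zip(importances[:-1], importances[1:]):
        ra, rb = rank(np.asarray(a)), rank(np.asarray(b))
        corrs.append(float(np.corrcoef(ra, rb)[0, 1]))
    return corrs

# Three windows whose importance rankings are identical.
print(importance_stability([[0.5, 0.3, 0.2],
                            [0.6, 0.3, 0.1],
                            [0.4, 0.35, 0.25]]))
```

Rank correlation (rather than Pearson on raw importances) tolerates the importance scale drifting over time, as long as the ordering of features is preserved.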

@martinhronec any thoughts?

Liquidity filter - do I have the right data from @martinhronec?

@martinhronec Tomas mentioned that there was a problem with the liquidity filter. I do not have the code of your liquidity filter, just the dscodes that pass it. Tomas said that the filter was stricter than it should be. I just want to make sure I have the right data. Even if I don't, this may be immaterial, as the filter is not wrong, just strict.

A few models predict the same number for all inputs (just for some seeds). What should I do?

@martinhronec I took a closer look at the models by random seed, for models trained on 12 years and models trained on 13 years.
I discovered that a few of them (3 of 592 models, i.e. about 0.5 percent) learned to predict the same number no matter the input.
It appears in deep models in particular. The following models suffer from the issue:

  • among models trained on 12 years:
    • architecture with 4 hidden layers, 5th seed
    • architecture with 5 hidden layers, 8th seed
  • among models trained on 13 years:
    • architecture with 5 hidden layers, 9th seed

The predicted number is always very close to the mean of the training data.

What do you think I should do about these models?
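Whatever the remedy, a simple automated screen could flag these degenerate runs before they enter any ensemble or importance calculation. The function name and tolerance below are hypothetical, chosen only to illustrate the failure mode described above (near-constant predictions close to the training mean).

```python
import numpy as np

def is_degenerate(predictions, train_mean, tol=1e-6):
    """Flag a model whose predictions collapse to (almost) a single
    value close to the training-set mean."""
    preds = np.asarray(predictions, dtype=float)
    constant = float(preds.std()) < tol          # no variation across inputs
    near_mean = abs(float(preds.mean()) - train_mean) < tol
    return constant and near_mean

print(is_degenerate([0.012, 0.012, 0.012], train_mean=0.012))  # True
print(is_degenerate([0.05, -0.02, 0.01], train_mean=0.012))    # False
```

Run over every (architecture, seed, window) triple, this would catch the 3 affected models automatically instead of by manual inspection.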

Understanding Fisher et al. 2019: All Models are Wrong, but Many are Useful

I think this paper could be crucial. https://arxiv.org/abs/1801.01489
With a single measure:

  • it gives confidence intervals for feature importance
  • it helps understand why ensembles work
  • it helps with the problem of interpreting models with correlated features

If I understand correctly, MR (model reliance) is permutation feature importance (based on the decrease in loss). MCR (model class reliance) provides a confidence interval for MR, using the epsilon-Rashomon set.

  • Issue A: I cannot find a Python implementation. I can try to code it up. There is an R implementation from the authors: https://github.com/aaronjfisher/mcr
  • Issue B: I do not understand which models constitute the empirical epsilon-Rashomon set. What is their general family: different models, architectures, seeds?
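As a starting point for Issue A, MR-style permutation importance is straightforward to sketch in Python. This is only a rough sketch of the idea, not the authors' reference implementation: `model_fn` is a placeholder for any fitted predictor, and MSE stands in for whatever loss the thesis actually uses.

```python
import numpy as np

def permutation_importance(model_fn, X, y, n_repeats=5, seed=0):
    """Permutation feature importance in the spirit of Fisher et al.'s
    model reliance: the increase in MSE when one feature is shuffled.

    `model_fn`: callable mapping an (n, p) array to predictions.
    """
    rng = np.random.default_rng(seed)
    base = np.mean((model_fn(X) - y) ** 2)        # loss on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            losses.append(np.mean((model_fn(Xp) - y) ** 2))
        importances[j] = np.mean(losses) - base
    return importances

# Toy model that only uses the first of two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0]
imp = permutation_importance(lambda A: 2 * A[:, 0], X, y)
print(imp)  # first feature clearly important, second near zero
```

Computing MCR on top of this would then require evaluating MR across the whole epsilon-Rashomon set of near-optimal models, which is exactly the part Issue B asks about.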
