
deep_rl_trader's Introduction

Deep RL Trader (Duel DQN) Implemented using Keras-RL

This repo contains

  1. A trading environment (OpenAI Gym) for trading cryptocurrency
  2. A Dueling Deep Q-Network (Duel DQN) agent implemented using keras-rl (https://github.com/keras-rl/keras-rl)

The agent is expected to learn useful action sequences to maximize profit in the given environment.
At each step the environment limits the agent to one of three actions: buy, sell, or hold the asset (coin).
If the agent decides to take a

  • LONG position, it will initiate a sequence of actions such as buy - hold - hold - sell;
  • SHORT position, the sequence is reversed, e.g. sell - hold - hold - buy.

Only a single position can be open per trade.

  • An invalid action sequence such as buy - buy is therefore treated as buy - hold.
  • The default transaction fee is 0.0005 (0.05% per transaction).

A reward is given

  • when a position is closed, or
  • when an episode finishes.

This sparse reward scheme takes longer to train, but is more effective at learning long-term dependencies.
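
A minimal sketch of this bookkeeping, purely illustrative and not the repository's actual environment code (the names position and entry_price are assumptions, and the reward here is simplified; the exact fee-adjusted formula is shown under [Reward] further below):

# illustrative sketch: one open position at a time, invalid repeats coerced to hold,
# reward granted only when the position closes
def step_position(position, entry_price, action, price):
    """position is None, 'long' or 'short'; returns (position, entry_price, reward)."""
    reward = 0.0
    if action == 'buy':
        if position is None:                 # open a long
            return 'long', price, reward
        if position == 'short':              # closes the short -> reward is granted now
            reward = (entry_price - price) / price          # simplified; fees omitted here
            return None, None, reward
        # position == 'long': buy - buy is coerced to buy - hold
    elif action == 'sell':
        if position is None:                 # open a short
            return 'short', price, reward
        if position == 'long':               # closes the long -> reward is granted now
            reward = (price - entry_price) / entry_price    # simplified; fees omitted here
            return None, None, reward
        # position == 'short': sell - sell is coerced to sell - hold
    return position, entry_price, reward     # hold (or coerced hold): no reward yet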

The agent decides its optimal action by observing the environment.

  • The trading environment emits features derived from OHLCV candles (the window size is configurable).
  • Thus, the input given to the agent has the shape (window_size, n_features).

With some modification it can easily be applied to stocks, futures, or foreign exchange as well.
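
For intuition, the observation can be pictured as a rolling window over per-candle feature rows. A minimal numpy sketch (the feature count and contents here are assumptions, not the repo's actual feature set):

import numpy as np

WINDOW_SIZE = 30    # corresponds to TIME_STEP below
N_FEATURES = 5      # e.g. features derived from open/high/low/close/volume (assumed)

candles = np.random.rand(1000, N_FEATURES)     # dummy data: one feature row per candle
t = 100                                        # current tick
observation = candles[t - WINDOW_SIZE:t]       # shape: (WINDOW_SIZE, N_FEATURES)
assert observation.shape == (WINDOW_SIZE, N_FEATURES)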

Visualization / Main / Environment

The sample data provided consists of 5-minute OHLCV candles fetched from BitMEX.

  • train : './data/train/' (70,000 candles)
  • test : './data/test/' (16,000 candles)
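
A hedged sketch of inspecting one of the sample candle files with pandas (the file name appears in the issue logs further below; the exact column layout is an assumption, so check it before relying on specific names):

import pandas as pd

# assumed file name and column layout; inspect the CSV before relying on column names
df = pd.read_csv('./data/train/XBTUSD_5m_70000_train.csv')
print(df.shape)     # roughly (70000, n_columns) for the training file
print(df.head())    # expect open/high/low/close/volume style columns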

Prerequisites

keras-rl, numpy, tensorflow, etc.

pip install -r requirements.txt

# change "keras-rl/core.py" to "./modified/core.py"

Getting Started

Create Environment & Agent

# imports (exact module paths for the project-local classes may differ)
import numpy as np
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
from TraderEnv import OhlcvEnv                # project-local environment (assumed module name)
from util import NormalizerProcessor          # project-local processor (assumed module name)

# create environment
# OPTIONS
ENV_NAME = 'OHLCV-v0'
TIME_STEP = 30
PATH_TRAIN = "./data/train/"
PATH_TEST = "./data/test/"
env = OhlcvEnv(TIME_STEP, path=PATH_TRAIN)
env_test = OhlcvEnv(TIME_STEP, path=PATH_TEST)

# random seed
np.random.seed(123)
env.seed(123)

# create_model
nb_actions = env.action_space.n
model = create_model(shape=env.shape, nb_actions=nb_actions)
print(model.summary())


# create memory
memory = SequentialMemory(limit=50000, window_length=TIME_STEP)

# create policy
policy = EpsGreedyQPolicy()  # alternative: policy = BoltzmannQPolicy()

# create agent
# you can specify the dueling_type to one of {'avg','max','naive'}
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
               enable_dueling_network=True, dueling_type='avg', target_model_update=1e-2, policy=policy,
               processor=NormalizerProcessor())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

Train and Validate

# now train and test agent
while True:
    # train
    dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
    try:
        # validate
        info = dqn.test(env_test, nb_episodes=1, visualize=False)
        n_long, n_short, total_reward, portfolio = info['n_trades']['long'], info['n_trades']['short'], info[
            'total_reward'], int(info['portfolio'])
        np.array([info]).dump(
            './info/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.info'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward))
        dqn.save_weights(
            './model/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.h5f'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward),
            overwrite=True)
    except KeyboardInterrupt:
        continue

Configuring Agent

## simply plug in any keras model :)
from keras.models import Sequential
from keras.layers import Dense, Activation, CuDNNLSTM

def create_model(shape, nb_actions):
    model = Sequential()
    model.add(CuDNNLSTM(64, input_shape=shape, return_sequences=True))
    model.add(CuDNNLSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model
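
CuDNNLSTM requires a CUDA-enabled GPU. On a CPU-only machine (as mentioned in one of the issues below), a reasonable substitute is the plain LSTM layer, e.g. this sketch:

# CPU-only variant (sketch): swap CuDNNLSTM for LSTM; slower, but runs without a GPU
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

def create_model_cpu(shape, nb_actions):
    model = Sequential()
    model.add(LSTM(64, input_shape=shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model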

Running

[Verbose] While training or testing,

  • the environment prints out (current_tick, # Long, # Short, Portfolio).

[Portfolio]

  • the initial portfolio starts with 100 * 10,000 = 1,000,000 (KRW)
  • it reflects the change in portfolio value as if the agent had invested 100% of its balance every time it opened a position.

[Reward]

  • simply the percentage earning per trade.
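
Concretely, the reward for a closed SHORT position appears in the environment code quoted in the issues below; the LONG version shown here is the symmetric assumption. Both are net of the entry and exit fees:

fee = 0.0005
entry_price, exit_price = 4000.0, 4100.0    # example prices, not from the dataset

# closing a LONG position (assumed symmetric form)
reward_long = ((exit_price - entry_price) / entry_price + 1) * (1 - fee) ** 2 - 1

# closing a SHORT position (formula quoted from the environment code in the issues below)
reward_short = ((entry_price - exit_price) / exit_price + 1) * (1 - fee) ** 2 - 1

print(round(reward_long, 6), round(reward_short, 6))   # roughly +0.024 and -0.025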

Initial Result

Trade History : Buy (green) Sell (red)

(figure: trade)
(figure: partial_trade)

Cumulative Return, Max Drawdown Period (red)

(figure: cum_return)

  • total cumulative return: [0] -> [3.670099054203348]
  • portfolio value: [1000000] -> [29415305.46593453]

Wow! A 29-fold return and a 3.67 reward!
Disclaimer: it may have overfitted :(

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details

deep_rl_trader's People

Contributors

miroblog


deep_rl_trader's Issues

Cannot Run this

Using TensorFlow backend.
2020-04-26 16:18:38.050381: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-04-26 16:18:38.050737: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ImportError: numpy.core.multiarray failed to import
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
2020-04-26 16:18:38.671400: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr

1. Is this because I don't have a GPU?
2. My IDE is PyCharm; should I use Anaconda to match the environment? (PyCharm keeps telling me some requirements couldn't be installed, including "anaconda-client==1.6.0", "bitarray==0.8.1", ...)

error when running model.fit

Thank you so much for this great project.

When I try to run ddqn_rl_trader.py on Windows (my computer has no GPU, so I use LSTM instead of CuDNNLSTM), I get the following errors:

2019-01-17 17:06:16.101245: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary
start episode ... XBTUSD_5m_70000_train.csv at 0
Traceback (most recent call last):
File "ddqn_rl_trader.py", line 81, in
main()
File "ddqn_rl_trader.py", line 65, in main
dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=0)
File "C:\Python36\lib\site-packages\rl\core.py", line 182, in fit
if not np.isreal(value):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

May I ask what change I can make to fix this problem?

Thanks a lot

run on google colab "error"

Hi, thank you for this implementation of reinforcement learning.

I created a Google Colab notebook. I succeeded in running most of the code, but I got an error at the very last part.
This is the error I'm getting:
Training for 5500 steps ...
start episode ... XBTUSD_5m_70000_train.csv at 0


ValueError Traceback (most recent call last)

in ()
1 while True:
2 # train
----> 3 dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
4 try:
5 # validate

/usr/local/lib/python3.6/dist-packages/rl/core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
180 observation, r, done, info = self.processor.process_step(observation, r, done, info)
181 for key, value in info.items():
--> 182 if not np.isreal(value):
183 continue
184 if key not in accumulated_info:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

How can I fix this?

I shared the Colab file so it can maybe help other people too.
https://colab.research.google.com/drive/1DyURfsL9091Hx8IsEKwPGFxVl_aUEspp

Thank you for your help,
greg

pip install error

Could not find a version that satisfies the requirement anaconda-client==1.6.0

Traceback: DQN expects a model that has one dimension for each action, in this case 3.

After making all the pip changes needed to get the code running, this is not the first time this error has occurred:

Traceback (most recent call last):
File "y:/python_udemy/deep_rl_trader-master/deep_rl_trader/ddqn_rl_trader.py", line 80, in
main()
File "y:/python_udemy/deep_rl_trader-master/deep_rl_trader/ddqn_rl_trader.py", line 59, in main
processor=NormalizerProcessor())
File "C:\Users\danilo.martins\Anaconda3\lib\site-packages\rl\agents\dqn.py", line 111, in init
raise ValueError('Model output "{}" has invalid shape. DQN expects a model that has one dimension for each action, in this case {}.'.format(model.output, self.nb_actions))
ValueError: Model output "Tensor("dense_2/BiasAdd:0", shape=(?, 3), dtype=float32)" has invalid shape. DQN expects a model that has one dimension for each action, in this case 3.

What am I doing wrong, do you know?
I just tried to run it to see how it would work here.

I think there is a look-ahead bias

Hi there, nice work.
However I think there is a look-ahead bias.
At every timestep you get the state, and this state includes the current close price.
Then, in the step method, you calculate profit as:

self.exit_price = self.closingPrice
self.reward += ((self.entry_price - self.exit_price)/self.exit_price + 1)*(1-self.fee)**2 - 1 # calculate reward

In this case you are using the same information that you already used to predict the next action.
What do you think about it?

Training Data / Validation Data Overlap?

I noticed that in the /data folder, the training data in /train includes all of the validation data in /test. There is no validation split in the model, so I assume the validation datapoints may also be trained on.
Doesn't that lead to overfitting and exaggerated model performance?

How to use this model in a real trading env?

The project implements train and test functions. If the model is used in an actual trading environment, how do I use it to predict an action whenever a K bar (candle) closes? The DQNAgent has no predict function; should I use model.predict() or some other function?
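
One hedged way to act on a newly closed candle with a trained agent (a sketch, assuming the model and dqn objects from the Getting Started section above and an observation window preprocessed the same way NormalizerProcessor does during training):

import numpy as np

# dqn.load_weights(saved_weights_path)   # restore a trained checkpoint first (path is up to you)

obs_window = np.zeros(model.input_shape[1:], dtype=np.float32)   # placeholder: replace with the latest preprocessed window
q_values = model.predict(obs_window[np.newaxis, ...])            # shape: (1, nb_actions)
action = int(np.argmax(q_values[0]))                             # index into the environment's action space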
