
deep_rl_trader's Introduction

Deep RL Trader (Duel DQN) Implemented using Keras-RL

This repo contains

  1. A trading environment (OpenAI Gym) for trading cryptocurrency
  2. A Dueling Deep Q-Network (Duel DQN) agent implemented using keras-rl (https://github.com/keras-rl/keras-rl)

The agent is expected to learn useful action sequences to maximize profit in the given environment.
At each step the environment limits the agent to one of three actions: buy, sell, or hold the asset (coin).
If the agent decides to take a

  • LONG position, it will initiate a sequence of actions such as buy - hold - hold - sell;
  • SHORT position, the sequence is reversed, e.g. sell - hold - hold - buy.

Only a single position can be open per trade.

  • An invalid action sequence such as buy - buy is therefore treated as buy - hold.
  • The default transaction fee is 0.0005 (0.05% per transaction).

A reward is given

  • when a position is closed, or
  • when an episode finishes.

This sparse reward scheme takes longer to train, but is more effective at learning long-term dependencies.
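
A minimal sketch of this bookkeeping, purely illustrative and not the repository's actual environment code (the names position and entry_price are assumptions, and the reward here is simplified; the exact fee-adjusted formula is shown under [Reward] further below):

# illustrative sketch: one open position at a time, invalid repeats coerced to hold,
# reward granted only when the position closes
def step_position(position, entry_price, action, price):
    """position is None, 'long' or 'short'; returns (position, entry_price, reward)."""
    reward = 0.0
    if action == 'buy':
        if position is None:                 # open a long
            return 'long', price, reward
        if position == 'short':              # closes the short -> reward is granted now
            reward = (entry_price - price) / price          # simplified; fees omitted here
            return None, None, reward
        # position == 'long': buy - buy is coerced to buy - hold
    elif action == 'sell':
        if position is None:                 # open a short
            return 'short', price, reward
        if position == 'long':               # closes the long -> reward is granted now
            reward = (price - entry_price) / entry_price    # simplified; fees omitted here
            return None, None, reward
        # position == 'short': sell - sell is coerced to sell - hold
    return position, entry_price, reward     # hold (or coerced hold): no reward yet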

The agent decides its optimal action by observing the environment.

  • The trading environment emits features derived from OHLCV candles (the window size is configurable).
  • Thus, the input given to the agent has the shape (window_size, n_features).

With some modification it can easily be applied to stocks, futures, or foreign exchange as well.
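
For intuition, the observation can be pictured as a rolling window over per-candle feature rows. A minimal numpy sketch (the feature count and contents here are assumptions, not the repo's actual feature set):

import numpy as np

WINDOW_SIZE = 30    # corresponds to TIME_STEP below
N_FEATURES = 5      # e.g. features derived from open/high/low/close/volume (assumed)

candles = np.random.rand(1000, N_FEATURES)     # dummy data: one feature row per candle
t = 100                                        # current tick
observation = candles[t - WINDOW_SIZE:t]       # shape: (WINDOW_SIZE, N_FEATURES)
assert observation.shape == (WINDOW_SIZE, N_FEATURES)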

Visualization / Main / Environment

The sample data provided consists of 5-minute OHLCV candles fetched from BitMEX.

  • train : './data/train/' (70,000 candles)
  • test : './data/test/' (16,000 candles)
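
A hedged sketch of inspecting one of the sample candle files with pandas (the file name appears in the issue logs further below; the exact column layout is an assumption, so check it before relying on specific names):

import pandas as pd

# assumed file name and column layout; inspect the CSV before relying on column names
df = pd.read_csv('./data/train/XBTUSD_5m_70000_train.csv')
print(df.shape)     # roughly (70000, n_columns) for the training file
print(df.head())    # expect open/high/low/close/volume style columns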

Prerequisites

keras-rl, numpy, tensorflow, etc.

pip install -r requirements.txt

# change "keras-rl/core.py" to "./modified/core.py"

Getting Started

Create Environment & Agent

# imports (exact module paths for the project-local classes may differ)
import numpy as np
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
from TraderEnv import OhlcvEnv                # project-local environment (assumed module name)
from util import NormalizerProcessor          # project-local processor (assumed module name)

# create environment
# OPTIONS
ENV_NAME = 'OHLCV-v0'
TIME_STEP = 30
PATH_TRAIN = "./data/train/"
PATH_TEST = "./data/test/"
env = OhlcvEnv(TIME_STEP, path=PATH_TRAIN)
env_test = OhlcvEnv(TIME_STEP, path=PATH_TEST)

# random seed
np.random.seed(123)
env.seed(123)

# create_model
nb_actions = env.action_space.n
model = create_model(shape=env.shape, nb_actions=nb_actions)
print(model.summary())


# create memory
memory = SequentialMemory(limit=50000, window_length=TIME_STEP)

# create policy
policy = EpsGreedyQPolicy()  # alternative: policy = BoltzmannQPolicy()

# create agent
# you can specify the dueling_type to one of {'avg','max','naive'}
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=200,
               enable_dueling_network=True, dueling_type='avg', target_model_update=1e-2, policy=policy,
               processor=NormalizerProcessor())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

Train and Validate

# now train and test agent
while True:
    # train
    dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
    try:
        # validate
        info = dqn.test(env_test, nb_episodes=1, visualize=False)
        n_long, n_short, total_reward, portfolio = info['n_trades']['long'], info['n_trades']['short'], info[
            'total_reward'], int(info['portfolio'])
        np.array([info]).dump(
            './info/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.info'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward))
        dqn.save_weights(
            './model/duel_dqn_{0}_weights_{1}LS_{2}_{3}_{4}.h5f'.format(ENV_NAME, portfolio, n_long, n_short,
                                                                        total_reward),
            overwrite=True)
    except KeyboardInterrupt:
        continue

Configuring Agent

## simply plug in any keras model :)
from keras.models import Sequential
from keras.layers import Dense, Activation, CuDNNLSTM

def create_model(shape, nb_actions):
    model = Sequential()
    model.add(CuDNNLSTM(64, input_shape=shape, return_sequences=True))
    model.add(CuDNNLSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model
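
CuDNNLSTM requires a CUDA-enabled GPU. On a CPU-only machine (as mentioned in one of the issues below), a reasonable substitute is the plain LSTM layer, e.g. this sketch:

# CPU-only variant (sketch): swap CuDNNLSTM for LSTM; slower, but runs without a GPU
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

def create_model_cpu(shape, nb_actions):
    model = Sequential()
    model.add(LSTM(64, input_shape=shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32))
    model.add(Activation('relu'))
    model.add(Dense(nb_actions, activation='linear'))
    return model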

Running

[Verbose] While training or testing,

  • the environment prints out (current_tick, # Long, # Short, Portfolio).

[Portfolio]

  • the initial portfolio starts with 100 * 10,000 = 1,000,000 (KRW)
  • it reflects the change in portfolio value as if the agent had invested 100% of its balance every time it opened a position.

[Reward]

  • simply the percentage earning per trade.
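
Concretely, the reward for a closed SHORT position appears in the environment code quoted in the issues below; the LONG version shown here is the symmetric assumption. Both are net of the entry and exit fees:

fee = 0.0005
entry_price, exit_price = 4000.0, 4100.0    # example prices, not from the dataset

# closing a LONG position (assumed symmetric form)
reward_long = ((exit_price - entry_price) / entry_price + 1) * (1 - fee) ** 2 - 1

# closing a SHORT position (formula quoted from the environment code in the issues below)
reward_short = ((entry_price - exit_price) / exit_price + 1) * (1 - fee) ** 2 - 1

print(round(reward_long, 6), round(reward_short, 6))   # roughly +0.024 and -0.025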

Initial Result

Trade History : Buy (green) Sell (red)

(figure: trade)
(figure: partial_trade)

Cumulative Return, Max Drawdown Period (red)

(figure: cum_return)

  • total cumulative return: [0] -> [3.670099054203348]
  • portfolio value: [1000000] -> [29415305.46593453]

Wow! A 29-fold return and a 3.67 reward!
Disclaimer: it may have overfitted :(

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details

deep_rl_trader's People

Contributors

miroblog


deep_rl_trader's Issues

Cannot Run this

Using TensorFlow backend.
2020-04-26 16:18:38.050381: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-04-26 16:18:38.050737: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
ImportError: numpy.core.multiarray failed to import
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
2020-04-26 16:18:38.671400: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr

1. Is this because I don't have a GPU?
2. My IDE is PyCharm; should I use Anaconda to match the environment? (PyCharm keeps telling me some requirements couldn't be installed, including "anaconda-client==1.6.0", "bitarray==0.8.1", ...)

error when running model.fit

Thank you so much for this great project.

When I try to run ddqn_rl_trader.py on Windows (my computer has no GPU, so I use LSTM instead of CuDNNLSTM), I get the following errors:

2019-01-17 17:06:16.101245: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary
start episode ... XBTUSD_5m_70000_train.csv at 0
Traceback (most recent call last):
File "ddqn_rl_trader.py", line 81, in
main()
File "ddqn_rl_trader.py", line 65, in main
dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=0)
File "C:\Python36\lib\site-packages\rl\core.py", line 182, in fit
if not np.isreal(value):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

May I ask what change I can make to fix this problem?

Thanks a lot

run on google colab "error"

Hi, thank you for this implementation of reinforcement learning.

I created a Google Colab notebook. I succeeded in running most of the code, but I got an error at the very last part.
This is the error I'm getting:
Training for 5500 steps ...
start episode ... XBTUSD_5m_70000_train.csv at 0


ValueError Traceback (most recent call last)

in ()
1 while True:
2 # train
----> 3 dqn.fit(env, nb_steps=5500, nb_max_episode_steps=10000, visualize=False, verbose=2)
4 try:
5 # validate

/usr/local/lib/python3.6/dist-packages/rl/core.py in fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
180 observation, r, done, info = self.processor.process_step(observation, r, done, info)
181 for key, value in info.items():
--> 182 if not np.isreal(value):
183 continue
184 if key not in accumulated_info:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

How can I fix this?

I shared the Colab file so it can maybe help other people too.
https://colab.research.google.com/drive/1DyURfsL9091Hx8IsEKwPGFxVl_aUEspp

Thank you for your help,
greg

pip install error

Could not find a version that satisfies the requirement anaconda-client==1.6.0

Traceback: DQN expects a model that has one dimension for each action, in this case 3.

After making all the pip changes needed to get the code running, this is not the first time this error has occurred:

Traceback (most recent call last):
File "y:/python_udemy/deep_rl_trader-master/deep_rl_trader/ddqn_rl_trader.py", line 80, in
main()
File "y:/python_udemy/deep_rl_trader-master/deep_rl_trader/ddqn_rl_trader.py", line 59, in main
processor=NormalizerProcessor())
File "C:\Users\danilo.martins\Anaconda3\lib\site-packages\rl\agents\dqn.py", line 111, in init
raise ValueError('Model output "{}" has invalid shape. DQN expects a model that has one dimension for each action, in this case {}.'.format(model.output, self.nb_actions))
ValueError: Model output "Tensor("dense_2/BiasAdd:0", shape=(?, 3), dtype=float32)" has invalid shape. DQN expects a model that has one dimension for each action, in this case 3.

What am I doing wrong, do you know?
I just tried to run it to see how it would work here.

I think there is a look-ahead bias

Hi there, nice work.
However I think there is a look-ahead bias.
At every timestep you get the state, and this state includes the current close price.
Then, in the step method, you calculate profit as:

self.exit_price = self.closingPrice
self.reward += ((self.entry_price - self.exit_price)/self.exit_price + 1)*(1-self.fee)**2 - 1 # calculate reward

In this case you are using the same information that you already used to predict the next action.
What do you think about it?

Training Data / Validation Data Overlap?

I noticed that in the /data folder, the training data in /train includes all of the validation data in /test. There is no validation split in the model, so I assume the validation datapoints may also be trained on.
Doesn't that lead to overfitting and exaggerated model performance?

How to use this model in a real trading env?

The project implements train and test functions. If the model is used in an actual trading environment, how do I use it to predict an action whenever a K bar (candle) closes? The DQNAgent has no predict function; should I use model.predict() or some other function?
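
One hedged way to act on a newly closed candle with a trained agent (a sketch, assuming the model and dqn objects from the Getting Started section above and an observation window preprocessed the same way NormalizerProcessor does during training):

import numpy as np

# dqn.load_weights(saved_weights_path)   # restore a trained checkpoint first (path is up to you)

obs_window = np.zeros(model.input_shape[1:], dtype=np.float32)   # placeholder: replace with the latest preprocessed window
q_values = model.predict(obs_window[np.newaxis, ...])            # shape: (1, nb_actions)
action = int(np.argmax(q_values[0]))                             # index into the environment's action space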
