
aminhp / gym-anytrading

Stars: 2.0K · Watchers: 82 · Forks: 450 · Size: 3.73 MB

The simplest, most flexible, and most comprehensive OpenAI Gym trading environment (approved by OpenAI Gym)

License: MIT License

Python 100.00%
openai-gym reinforcement-learning q-learning dqn trading trading-environments forex stocks gym-environments trading-algorithms

gym-anytrading's People

Contributors

alex2782, aminhp, bionicles, sapiovesanunivision, super-pirata


gym-anytrading's Issues

Next Steps

After a good bit of looking, I haven't been able to find a way to illustrate what the training process has come up with, so that I can take it to the next step and write a trading script that uses the methodology/model the agent produced.

With custom imported data I've been able to get a relatively consistent 'explained variance' close to 1, which, to my knowledge, means the model and the actual data have very small discrepancies, and that the model could potentially be turned into a trading methodology with reasonably consistent wins.

My trouble is seeing exactly what the model the computer came up with contains. Using quantstats I can easily see its performance over the test period, but that's not as useful as seeing exactly how the agent traded to achieve the quoted returns.

Any guidance or advice would be appreciated!
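
One way to see the individual trades rather than just the aggregate performance is to step a trained model through a test environment and use the built-in render_all() plot (and, depending on your version, the env.history dict). A minimal sketch, assuming `model` is an agent you already trained with stable-baselines:

```python
import numpy as np
import gym
import gym_anytrading
import matplotlib.pyplot as plt

# `model` is assumed to be an already-trained agent; frame_bound/window_size are arbitrary.
env = gym.make('forex-v0', frame_bound=(500, 1000), window_size=10)

observation = env.reset()
while True:
    observation = observation[np.newaxis, ...]          # add the batch dim the policy expects
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    if done:
        print("info:", info)                            # total_reward, total_profit, position
        break

# Plot the price series with the long/short position taken at every tick, which shows
# where the agent traded rather than only its aggregate return.
plt.cla()
env.render_all()
plt.show()
```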

[QUESTION] Error in TradeEnv running check_env() from stable baselines

Hi there,

I've encountered an issue running the following code:

import gym
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions 
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK
from stable_baselines.common.env_checker import check_env


env = gym.make('forex-v0', frame_bound=(10, 500), window_size=10)
check_env(env, warn=False, skip_render_check=False)

The output I get is:
AssertionError: The observation returned by the reset() method does not match the given observation space

I've debugged your TradingEnv class and didn't see any issue, so I thought the problem could be in check_env().

I've debugged check_env() as well, but everything seems fine there.
Then I went for a last test, which was running check_env() on the classic CartPole-v0 from gym; there, check_env() didn't throw any exception and ran smoothly.

This is the code for the CartPole-v0:

import gym
from gym.envs.classic_control import CartPoleEnv
from stable_baselines.common.env_checker import check_env

env = gym.make('CartPole-v0')
check_env(env, warn=False, skip_render_check=False)

Do you have any clue why this is happening? I'm confused lol
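
For what it's worth, a minimal way to narrow this down (just a debugging sketch, not a fix) is to compare what reset() returns with the declared observation space directly; a shape or dtype mismatch there is what triggers that check_env() assertion:

```python
import gym
import gym_anytrading

env = gym.make('forex-v0', frame_bound=(10, 500), window_size=10)
obs = env.reset()

# Print both sides of the comparison that check_env() performs.
print("observation_space:", env.observation_space)
print("reset() shape/dtype:", obs.shape, obs.dtype)
print("contained in space:", env.observation_space.contains(obs))
```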

Issue with quantstats

Hi @AminHP
when trying your code from the example "a2c_quantstats.ipynb", the last part does not run and I get the following error:
File "", line 3, in
net_worth = pd.Series(env.history['total_profit'], index=df.index[start_index+1:end_index])

AttributeError: 'ForexEnv' object has no attribute 'history'

I cannot figure out how to fix it; do you have any suggestions?
Thank you
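
In case it helps while waiting for an answer: a rough workaround sketch, assuming your installed gym-anytrading version simply doesn't expose env.history, is to record total_profit from the info dict yourself while stepping (`model`, `env`, `df`, `start_index` and `end_index` as in the notebook):

```python
import pandas as pd

profits = []
observation = env.reset()
while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    profits.append(info['total_profit'])   # running total_profit at every step
    if done:
        break

# Rebuild the series the notebook expects from the recorded values.
net_worth = pd.Series(profits, index=df.index[start_index + 1:start_index + 1 + len(profits)])
```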

Loading a saved model the right way

I just want to confirm if I am loading a saved model the right way on a test set.

So, first, I ran my preferred model and saved it.

Before loading the model, my env variable is built only on the test set.

Then I do:

model = A2C.load(load_path, env=env)
obs = env.reset()
while True:
    obs = obs[np.newaxis, ...]
    action, _states = model.predict(obs)
    ........

Sorry I don't know how to indent lines of code in Github comments (I thought tab would do it)
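
(Code blocks in GitHub comments can be fenced with triple backticks.) For reference, a sketch of how the loop above could be completed, assuming `load_path` and a test-set `env` as described; this mirrors the example pattern used elsewhere in this repo rather than anything official:

```python
import numpy as np
from stable_baselines import A2C

model = A2C.load(load_path, env=env)

obs = env.reset()
while True:
    obs = obs[np.newaxis, ...]                  # add the batch dimension the policy expects
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        print("info:", info)                    # total_reward, total_profit, position
        break
```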

Request

I would really like to see binary options supported by this, and I am working on completing it myself (by hiring someone), as I lack the knowledge / am struggling with how to complete it. Nadex is my preferred binary options exchange; however, I can see others benefiting from support for something such as IQ Option.

Thank you!

DayTrade

Hello @AminHP, how are you?

I've been studying for some time now and researching more and more about this world of trading, moving more and more toward day trading (intraday).
In this case, would it be possible to use this project for training on a specific time window during the day?
I have the data, but I'm not sure how to use it to start this training and see how useful it could be!
Do you have any suggestions on how to do it, i.e. how to use this data for a specific period of the day, or even an example to point me in the right direction on how to use this project and learn more?
Thank you, sir!
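
A minimal sketch of one way to train on a specific session window each day, assuming your data has a DatetimeIndex and columns similar to the bundled datasets; the file name, column name and session times below are placeholders:

```python
import pandas as pd
import gym
import gym_anytrading

# Load intraday data with a DatetimeIndex, then keep only one session window per day.
df = pd.read_csv('my_intraday_data.csv', parse_dates=True, index_col='Time')   # hypothetical file/column
session = df.between_time('09:30', '12:00')

window_size = 24
env = gym.make(
    'forex-v0',
    df=session,
    window_size=window_size,
    frame_bound=(window_size, len(session)),
)
```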

Clarifications on Frequency of Algo trades

[attached screenshot: reward plot]

I am playing around with some financial data and testing various models, and I'd like to understand some things:

  1. Firstly, is there a way to plot all the training buy/sell positions against the prices the bot bought and sold at? I can only see reward over time plotted.

  2. In the attachment, can you explain why the reward jumps up from 378 to 379 on the x-axis when it goes from selling at a low price to buying at a high price? Or is the price info hidden, and is the bot actually buying at a low price and selling at a high price in the background?

  3. My biggest challenge is how often the bot should run, because every time it runs, it generates a signal. So if I run it every day, it will generate a signal every day; if I run it every 4 days, it will generate a signal every 4 days, etc. It basically generates a signal based on the timesteps of your dataset. So if you have hourly data, is it best practice to run it every hour, and if you have daily data, to run it every day? How can we change the reward function to penalize very short-term trades and make it trade only once in a while? (A minimal sketch of one possible approach follows below.)
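
A minimal sketch of one possible approach to question 3 (my own rough take, not a library feature): subclass the env and override _calculate_reward to penalize trades that are closed after only a few ticks. It relies on the internal attributes (_position, _current_tick, _last_trade_tick) that appear in the env source quoted later in this thread; the threshold and penalty values are arbitrary:

```python
from gym_anytrading.envs import ForexEnv, Actions, Positions

class SlowTradeForexEnv(ForexEnv):
    min_hold_ticks = 12       # assumed threshold; tune to your data frequency
    churn_penalty = 10.0      # assumed penalty size, in reward units

    def _calculate_reward(self, action):
        step_reward = super()._calculate_reward(action)

        # Same trade-detection condition used in the env source quoted in this thread.
        trade = (
            (action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)
        )
        if trade:
            held = self._current_tick - self._last_trade_tick
            if held < self.min_hold_ticks:
                step_reward -= self.churn_penalty
        return step_reward
```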

Have you tried using multiple CPUs in the A2C example?

I am trying to use multiple CPUs for the example provided in this link.

I tried to change the environment to multiple CPUs:

env = DummyVecEnv([env_maker for i in range(16)])

But I have a problem with done and info in stable baselines; it seems they have turned into arrays.

There is an error in this code. Any suggestions, or has anyone done this? It seems LSTM policies in stable baselines are like this.

#env = env_maker()
#observation = env.reset()

while True:
    #observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done:
        print("info:", info)
        break

------------------------------

Error:

```python
ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     10 
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
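
With a vectorized env, step() returns one entry per sub-environment, so done becomes an array and info a list of dicts. A minimal sketch of how the end of the loop changes (reusing the names from the snippet above):

```python
# done is a numpy bool array and info a list of dicts, one entry per sub-env.
while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    if done.all():                 # or done[0] if you only track one sub-env
        print("info:", info[0])
        break
```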

Alternate way of training the agent to reduce training time (Suggestion/Discussion)

Hello,
I've been wondering whether it would be possible to integrate this kind of feature into gym-anytrading; basically, it goes like this:

The problem

From what I know about RL, the policy gradients are initialized randomly and the agent is rewarded according to its actions. Within trading, this potentially means it can take millions of iterations across the dataset before the agent even comes up with a strategy that is remotely successful; thereafter it spends time optimizing that strategy, which again can take a long time. In the end you are presented with a model that is attempting to maximize its reward; depending on the reward structure, this can mean that if you maximize total net worth you might end up with a model applying a scalping strategy, where you personally would have liked a model that was swing trading. So how can we potentially adjust for this, and also make training faster in the process? Normally you would add several reward functions to reward the agent based on what kind of strategy you want it to employ; however, there might be another way.

The solution

A theoretical concept I have been turning over in my head: instead of defining reward functions by net worth, Sortino ratio, etc., we could go into our dataset and place buy and sell markers, either manually or mathematically. These buy and sell markers are where we ideally want the RL agent to enter and exit trades, so we reward the agent only when it trades at these points/prices. The obvious concern here is overfitting:
First, we have to address the fact that in all forms of ML you have to split your dataset into training and testing; this allows people to see whether the agent is actually overfitting on the training dataset.
Second, a precision parameter could be set; this parameter would determine the range, in percent from the specified buy and sell prices, within which we would still reward the RL agent for buying and selling.

An example

Take a daily chart of Apple: if we are going to apply a swing trading strategy, the perfect buy entry would occur on the 23rd of March 2020 at a price of $212.61. We would mark this as our buy entry. The perfect sell exit would occur on the 13th of July 2020 at a price of $399.82; we then mark this as the sell exit. You would keep doing this, either manually or mathematically, across the entire dataset you want to train on.
Next, we would set a precision parameter, in this case 1.5, meaning that we will still reward the agent if it buys at a price within ±1.5% of $212.61 and sells at a price within ±1.5% of $399.82.
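
A minimal sketch of what that reward scheme could look like as a gym-anytrading subclass (my own rough take on the idea above, not an existing feature; the marker prices and precision are the example values from this post):

```python
import numpy as np
from gym_anytrading.envs import StocksEnv, Actions, Positions

class LabelledEntryExitEnv(StocksEnv):
    target_prices = np.array([212.61, 399.82])   # example buy/sell markers from the text
    precision = 1.5                               # percent tolerance around a marker

    def _calculate_reward(self, action):
        # Same trade-detection condition used in the env source quoted in this thread.
        trade = (
            (action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)
        )
        if not trade:
            return 0.0

        price = self.prices[self._current_tick]
        near_marker = np.any(
            np.abs(price - self.target_prices) / self.target_prices * 100 <= self.precision
        )
        # Reward trades near a marker, penalize signals fired anywhere else.
        return 1.0 if near_marker else -1.0
```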

The impact on the model

So how would this impact our RL? My theory is that our RL agent will try to create a strategy that generates entry and exit signals according to these buy and sell points. This allows more controllability for the user, who can now specify what strategy they want to employ (swing trading, scalping, etc.). Besides controllability, the RL would presumably train faster, since it doesn't need to find out which entry and exit points generate the most reward (we did that for it); instead it would spend its time going over the dataset to find signals that trigger within the precision range of our entries and exits. Of course, if these signals also trigger outside the range, it gets punished, so as to avoid it constantly generating buy and sell signals on every bar.

Final words

This is of course just my take on things, and I am posting it here for two reasons. Number 1: this could potentially become a unique feature of gym-anytrading (I haven't seen it elsewhere), provided the second reason holds its ground.
Reason number 2 is to get feedback on this idea. I am still fairly new to RL, so if some of the experts out there think this won't work because of A, B, C or D, comment here and let's get a discussion going. After all, it is in everyone's best interest to make gym-anytrading as good as possible, even if that means implementing new ideas that might not have been tried before, since they could turn out to be the best ideas.

Problem with DummyVecEnv wrapped inside a VecNormalize wrapper with render_all() method

The problem is something inside the DummyVecEnv which resets the environment automatically after it is done.

Also, there was a mistake in your code. Try this:

env_maker = lambda: gym.make('forex-v0', frame_bound=(100, 5000), window_size=10)
env = DummyVecEnv([env_maker])

# Training Env
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)

# Testing Env 
env = env_maker()
observation = env.reset()

while True:
    observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break

# Plotting results
plt.cla()
env.render_all()
plt.show()

Originally posted by @AminHP in #1 (comment)

I saw this reply on a similar problem I had with the render_all() method, though in my case I am using a VecNormalize() wrapper around my DummyVecEnv. In the quoted solution, a DummyVecEnv was used for training and another plain env was instantiated for prediction/testing so render_all() could be used. In my case this won't work, since I need VecNormalize to normalize observations and rewards.

env = make_vec_env(env_maker, n_envs=1, monitor_dir=log_dir)
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO2('MlpLstmPolicy', env, verbose=1, nminibatches=1, policy_kwargs=policy_kwargs,)
callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir, env=env, verbose=1)
# model = PPO2('MlpLstmPolicy', env, verbose=1)

model.learn(total_timesteps=5000, callback=callback, log_interval=10)

env.norm_reward = False
env.training = False

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

# I get the expected reward here using evaluate_policy()

plt.figure(figsize=(15,6))
plt.cla()
env.render_all()
plt.show()

# This part doesn't work because of the same error

What can I do to use the render_all() method (or any other attribute, like env.history, for that matter) while keeping the VecNormalize() environment?
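
One possible workaround (a sketch, not a library feature, assuming the stable-baselines attributes VecNormalize.venv and DummyVecEnv.envs): because the vectorized env auto-resets when an episode ends, record the position field from the info dicts while stepping manually and rebuild a render_all()-style plot yourself; the model still receives normalized observations. `model` and `env` refer to the PPO2 model and VecNormalize env from the snippet above.

```python
import matplotlib.pyplot as plt
import numpy as np

base_env = env.venv.envs[0]            # underlying ForexEnv/StocksEnv instance
prices = base_env.prices               # fixed price array for this frame_bound
window_size = base_env.window_size

positions = []
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    positions.append(info[0]['position'])   # 0 = short, 1 = long
    if done[0]:
        break

positions = np.array(positions)
ticks = np.arange(window_size + 1, window_size + 1 + len(positions))

plt.plot(prices)
plt.scatter(ticks[positions == 0], prices[ticks[positions == 0]], c='red', s=12, label='short')
plt.scatter(ticks[positions == 1], prices[ticks[positions == 1]], c='green', s=12, label='long')
plt.legend()
plt.show()
```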

Using this environment for Course Project

Hi @AminHP ,

I wanted to use this environment as part of my course project for RL, so I wanted to ask whether anything has been implemented against this environment before. If possible, could you also give me some resources on algorithms implemented for this environment?

Best,
Kunal

Question: TODO on price validation

@AminHP First off, thank you for your amazing work on this. This has been very helpful for my understanding.

This is just a question about the TODO comment located in stocks_env (line 21) and forex_env (line 22).

What did you have in mind for "validating the indices"? Thank you.

What is total_reward and total_profit

Hi, I was going over the code; when we render our results, we get result benchmarks such as:

info {'total_reward': 8.100000000000023, 'total_profit': 0.7996927239889693, 'position': 1}

I just want to know what these parameters mean, especially total_reward and total_profit.

stable-baselines3 example

It would be very helpful if you could point me to some examples using stable-baselines3. I am still not sure how it compares to stable-baselines, as they have a big warning box about comparing performance. Appreciated.
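
A minimal stable-baselines3 sketch (not an official example from this repo); the frame_bound and window_size values are arbitrary, and depending on your gym / stable-baselines3 versions you may need compatibility wrappers:

```python
import gym
import gym_anytrading
from stable_baselines3 import A2C

env = gym.make('forex-v0', frame_bound=(50, 500), window_size=10)

model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10_000)

obs = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        print("info:", info)
        break
```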

Stable baselines with extended env

def my_process_data(df, window_size, frame_bound):
    prices = df.loc[:, 'NDX'].to_numpy()
    prices[frame_bound[0] - window_size]  # validate index (TODO: Improve validation)
    prices = prices[frame_bound[0]-window_size:frame_bound[1]]
    signal_features = df.to_numpy()#np.column_stack((prices, diff))
    return prices, signal_features

class MyForexEnv(StocksEnv):
    def __init__(self, prices, signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features

window_size = 30
start_index = window_size
end_index = len(df)

#env = MyForexEnv(df=df, window_size=10, frame_bound=(start_index, end_index))
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env = MyForexEnv( prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))

env_maker = lambda: gym.make('env')

env = DummyVecEnv([env_maker])

I'm trying to use the extended env with stable baselines, but I keep getting errors:
TypeError: argument of type 'MyForexEnv' is not iterable
or

class MyForexEnv(StocksEnv):
    def __init__(self, prices = prices, signal_features = signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features
window_size = 30
start_index = window_size
end_index = len(df)
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env_maker = lambda: gym.make(MyForexEnv,prices =prices, signal_features =signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index) )
env = DummyVecEnv([env_maker])

which returns:
TypeError: argument of type 'type' is not iterable

I have also tried using just the function inside the new class definition to get the data, but it makes no difference.
Or:

class MyForexEnv(gym.ActionWrapper):
    def __init__(self, env, prices = prices, signal_features = signal_features, **kwargs):
        self.trade_fee_bid_percent = 0.05
        self.trade_fee_ask_percent = 0.05
        self._prices = prices
        self._signal_features = signal_features
        super(MyForexEnv, self).__init__(env)
    def _process_data(self):
        return self._prices, self._signal_features
env = MyForexEnv(gym.make("stocks-v0"), prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))

Still not working. Any ideas?
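
For what it's worth, a sketch of one likely fix based only on the snippets above: gym.make expects a registered env id string, so construct the custom class directly inside the lambda and hand that callable to DummyVecEnv.

```python
from stable_baselines.common.vec_env import DummyVecEnv

# Uses the first MyForexEnv definition above, which accepts prices/signal_features plus
# the usual StocksEnv kwargs.
env_maker = lambda: MyForexEnv(
    prices=prices,
    signal_features=signal_features,
    df=df,
    window_size=window_size,
    frame_bound=(start_index, end_index),
)
env = DummyVecEnv([env_maker])
```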

Muzero Integration

Hello, I was trying to integrate gym-anytrading with muzero-general, but I got this error:

File "Development/Python/Muzero-GymAnytrading/self_play.py", line 137, in play_game
    ), f"Observation should be 3 dimensionnal instead of len(n_obs): {len(n_obs)} dimensionnal. Got observation of shape: n_obs: {n_obs}"
AssertionError: Observation should be 3 dimensionnal instead of len(n_obs): 4 dimensionnal. Got observation of shape: n_obs: (1, 1, 10, 2)

Do you know what it means and how to resolve it?

Thank you,
Marco.

What is the purpose of this code? len(frame_bound) == 2 (Question)

What is the purpose of this code in forex_env: assert len(frame_bound) == 2? I am getting this error.
My parameters are window_size: 10, frame_bound: (50, 100).

def __init__(self, df, window_size, frame_bound, unit_side='left'):
    print("df: ", df, " window_size: ", window_size, " frame_bound: ", frame_bound)
    assert len(frame_bound) == 2
    assert unit_side.lower() in ['left', 'right']

    self.frame_bound = frame_bound
    self.unit_side = unit_side.lower()
    super().__init__(df, window_size)

    self.trade_fee = 0.0003  # unit

How to implement a continuous action space?

Say I wanted the actions to be in a range between 0 and 1, where the number represents the percentage of my net worth that should be invested in the asset. The resulting action is then to buy or sell the difference.
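
A very rough sketch of the direction such a change could take (definitely not a drop-in feature): the built-in envs assume two discrete actions (Buy/Sell), so the trade/profit bookkeeping would need to be rewritten around a target-allocation reading of the action; only the action space and a toy reward are shown here.

```python
import numpy as np
import gym
from gym_anytrading.envs import StocksEnv

class ContinuousAllocationEnv(StocksEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # action = fraction of net worth to hold in the asset, in [0, 1]
        self.action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        self._allocation = 0.0

    def _calculate_reward(self, action):
        target = float(np.clip(action, 0.0, 1.0))
        price_diff = self.prices[self._current_tick] - self.prices[self._current_tick - 1]
        # Reward is the per-step PnL of the currently held allocation; a fee on
        # |target - self._allocation| could be charged here as well. Note the base
        # class's trade/profit logic still assumes discrete actions and is unchanged.
        reward = self._allocation * price_diff
        self._allocation = target
        return reward
```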

Buy/Sell Step Logic Not Working in Stocks?

Hey!

Just found the repo and love it, but I am wondering what's going on re: the stocks environment.

According to the step logic in the trading_env super class:

        trade = False
        if ((action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)):
            trade = True

...and after a reset, the initial position is a Short. So this to me reads that a Buy position will only be opened if the current position is a short, and a Buy action is generated. But when I review my render:

[render screenshot]

...you can see it starts with a bunch of red sells. How? Why? :'(

Also, using that same step logic, I would assume that if a Sell/Short position is already active, no other Short/Sells would be issued, but there's still consecutive red dots on that render, same as green for buys. What do these dots denote exactly?

I have another question about the quantstats report too. If the render says "Total Profit: 0.6828566" or whatever it's profited, how come the quantstats report is so down??

[quantstats report screenshot]

Thanks! Love the work!

Not only buy and sell actions

Hello, thanks for the great work. With all due respect, I believe your assumption is faulty and there needs to be a "do nothing" action. Imagine the market goes sideways and the price variance stays smaller than the spread for longer than the window; in that case, no action is the best action.

flatten()

I'm finding it difficult to use this because I need to flatten the signal_features into a single vector, so as to simplify later shapes for higher-order matrix multiplications.
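
One possible workaround sketch, assuming a gym version that ships the FlattenObservation wrapper: wrap the env so the (window_size, n_features) observation becomes a flat 1-D vector before it reaches your model.

```python
import gym
import gym_anytrading
from gym.wrappers import FlattenObservation

env = FlattenObservation(gym.make('forex-v0', frame_bound=(50, 500), window_size=10))
print(env.observation_space.shape)   # e.g. (20,) for window_size=10 and 2 default features
```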

Confusion over results if run multiple times

Hi,
Sorry if this is the wrong place to ask, but I couldn't find anywhere else. I love the package, excellent work, and I'm sure I am doing something wrong, but can you explain why I get such wildly different results when I run the same thing multiple times? For example, running this to train/test it 10 times:

import gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions 
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL

env = gym.make('forex-v0', frame_bound=(50, 100), window_size=10)

for i in range(10):
    observation = env.reset()
    while True:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("info:", info)
            break

I get:

info: {'total_reward': -50.99999999999439, 'total_profit': 0.9875980085384239, 'position': 0}
info: {'total_reward': 24.099999999995784, 'total_profit': 0.9886818462999193, 'position': 1}
info: {'total_reward': 24.499999999987313, 'total_profit': 0.9893252791394607, 'position': 0}
info: {'total_reward': 138.10000000000767, 'total_profit': 0.9953009801405461, 'position': 1}
info: {'total_reward': 107.10000000001328, 'total_profit': 0.9926679505350279, 'position': 1}
info: {'total_reward': 127.00000000000375, 'total_profit': 0.996177843192774, 'position': 0}
info: {'total_reward': -144.90000000000117, 'total_profit': 0.9813550423422519, 'position': 1}
info: {'total_reward': -128.90000000000293, 'total_profit': 0.9843355398695747, 'position': 0}
info: {'total_reward': 45.699999999999626, 'total_profit': 0.9912142586709967, 'position': 0}
info: {'total_reward': -39.39999999999389, 'total_profit': 0.9859639867316038, 'position': 1}

Wouldn't I expect them all to have the same position given it's the same data / training, or am I missing something fundamental here?

Thank you

Change reward function

Would it be possible to add a function comparable to add_signals for changing the reward function? It would be nice to use custom KPIs as rewards, for example risk-adjusted return.

Kind regards.
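
I'm not aware of an add_signals-style hook for the reward, but a sketch of the usual workaround is to subclass the env and override _calculate_reward with your own KPI; the volatility scaling below is just an arbitrary example of a crude risk adjustment, not a recommended metric:

```python
import numpy as np
from gym_anytrading.envs import StocksEnv

class RiskAdjustedRewardEnv(StocksEnv):
    vol_lookback = 20   # assumed lookback window for the volatility estimate

    def _calculate_reward(self, action):
        base_reward = super()._calculate_reward(action)
        start = max(0, self._current_tick - self.vol_lookback)
        recent = self.prices[start:self._current_tick + 1]
        volatility = np.std(np.diff(recent)) + 1e-8    # avoid division by zero
        # Scale the built-in step reward by recent volatility; swap in Sharpe,
        # Sortino, drawdown penalties, or any other KPI you prefer.
        return base_reward / volatility
```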

Extracting results for quantstats

Hello,
after creating a model and running results = model.learn(int(1000)), how do I use the results to compare against a benchmark in quantstats?
Currently the results don't hold the data that quantstats expects in order to be used with qs.reports.html(results, "SPY", output="D:\ReinforcementLearning\BaseLines\Trading\Myreport.html").
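
A sketch of one way to get there (loosely following the repo's a2c_quantstats example, with several assumptions): model.learn() returns the model itself, not performance data, so step the trained model through a test env, turn the running total_profit into a returns series, and hand that to quantstats. `model`, `env`, `df`, `start_index` and `end_index` are assumed to exist; the output path is shortened here.

```python
import pandas as pd
import quantstats as qs

profits = []
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    profits.append(info['total_profit'])
    if done:
        break

net_worth = pd.Series(profits, index=df.index[start_index + 1:start_index + 1 + len(profits)])
returns = net_worth.pct_change().iloc[1:]

qs.reports.html(returns, "SPY", output='Myreport.html')
```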

[QUESTION] Stable Baselines render of vectorized forex environment

Hello, I have a question...

I'm currently using the stable baselines library to train a model using your 'forex-v0' environment.

env = DummyVecEnv([lambda: gym.make('forex-v0', frame_bound=(10, 500), window_size=10)])
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=5000)

After training the model I perform a test using your code:

observation = env.reset()
while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break

# Plotting results
plt.cla()
env.render_all()
plt.show()

But unfortunately I get a "DummyVecEnv has no render_all()" error, which makes sense to me because the environment is now inside a vector. What I don't understand is how I can call env.render_all() through the vector.
My confusion comes from the fact that env.render() works fine, but env.render_all() does not.

Reproducibility of result calculation (total_profit) using fixed test data for ForexEnv

Hi, I played with the forex model using this gym.
I created an RL A2C model with stable-baselines3 and tested it against this gym. Somehow I always get a different 'total_profit' value, whilst 'max possible profit' is fixed. Can anybody advise on how to tweak the code so that I get a consistent 'total_profit' result? I have tried to fix the seed using env.seed(32) and env.action_space.seed(32), but I still get a different 'total_profit' each run.

Rgds,
Harry
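
A sketch of the usual reproducibility knobs, assuming stable-baselines3 and the env from the question: seed the model itself in addition to the env, and pass deterministic=True at prediction time so actions are argmax'd instead of sampled; sampling is the most common reason a fixed trained model gives different total_profit values on the same test data.

```python
from stable_baselines3 import A2C
from stable_baselines3.common.utils import set_random_seed

# env: the ForexEnv from the question (assumed to already exist)
set_random_seed(32)
model = A2C('MlpPolicy', env, seed=32, verbose=1)
model.learn(total_timesteps=10_000)

obs = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        print("info:", info)
        break
```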

Expiration

Do you believe expiration to be a significant factor in forex?

_update_profit - function inner workings

Greetings,

First of all, Many thanks to AminHP for sharing the project.

I have trouble understanding the ForexEnv's _update_profit function.

  1. I understand that, as I am using the Euro as my base currency with the EURUSD pair, I should use unit_side='left'. Am I correct in this assumption?

  2. The _total_profit variable is updated only when a Buy action is given and the existing position is Short. From these rules I understand that the _update_profit function takes only short trades into account, calculating _total_profit from the latest short trades only. Is this assumption correct?

Would you please be kind enough to clarify: does the _update_profit function take into account profits from long trades with the Euro as the base currency, and if so, how does it work?

def _update_profit(self, action):
    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
        (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True

    if trade or self._done:
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]

        if self.unit_side == 'left':
            if self._position == Positions.Short:
                # Here the _total_profit variable is updated only if the given action
                # is Buy and the existing position is Short.
                quantity = self._total_profit * (last_trade_price - self.trade_fee)
                self._total_profit = quantity / current_price

        elif self.unit_side == 'right':
            if self._position == Positions.Long:
                quantity = self._total_profit / last_trade_price
                self._total_profit = quantity * (current_price - self.trade_fee)

Reward computation for stockEnv

Hi AminHp,

Really great work, the code is very pleasant to read. I have a question regarding the _calculate_reward function in StocksEnv: why is the step_reward only updated when we sell a long position? As I understand it, buying after a short position should also generate a profit or loss, and thus the agent should be rewarded accordingly, but it is not taken into account if I'm correct. Forgive me if this is a noob question, I just got into finance and stock trading yesterday.

Best regards,

[Question] Is having a high reward and low profit a normal case?

Hello,

Is the reward calculation OK? I get a high reward but a losing profit.
[attached reward plot]

I am using stable baselines.

I am using these signal features:

def my_process_data(env):
    start = env.frame_bound[0] - env.window_size
    end = env.frame_bound[1]
    prices = env.df.loc[:, 'Close'].to_numpy()[start:end]
    # print(env.df)
    indi = Indicators(env.df)
    signal_features = env.df.loc[:, ['Close', 'Open', 'High', 'Low','Volume']].to_numpy()[start+1:end]
    #signal_features = env.df.loc[:, ['Close','Volume']].to_numpy()[start+1:end]
    
   
    rsi = indi.rsi(5,1)
    rsicolumn = rsi.to_numpy()[start:end].reshape(-1,1)
    print("rsi shape: ",rsicolumn.shape)
    signal_features = np.append(signal_features, rsicolumn, axis=1)
    
    # print(signal_features)
    return prices, signal_features      

I think there are serious issues with this ENV.

I was writing tests for this, and it's becoming more and more clear that this gym has some serious deficiencies. I don't think anyone should be using it in production, and your README ideally would reflect that. At a base level, the only two actions and states are long or short, which is very wrong and messes with whatever algorithm is being used to train. Many algorithms depend on a Gaussian action space, i.e. -1 or [0, 0, 1], 0 or [0, 0, 0], 1 or [1, 0, 0].

Clarifications: Confused on how to implement in production(live)

@AminHP I am having a hard time wrapping my head around how to implement this in a live environment for paper trading, just to hook everything up end to end.

env_maker = lambda: gym.make(
    'stocks-v0',
    df=test_df,
    window_size=window_size,
    frame_bound=(start_index, end_index)
)

The above snippet is how to create an environment for the agent/model to step through. But in order to create the environment, we have to pass in a DataFrame. In the real world, we won't know the current day's OHLCV until the markets close. So how would we be able to use a trained model in a current environment with up-to-date data and features (observations)? Unless its predicted actions are actually for the next day?

Side question: why, on observation = observation[np.newaxis, ...] while stepping through the env, do we have to change the observation's dimensions before predicting? I don't think the observations (signal_features) change in the environment.

Thank you!
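
A rough end-of-day paper-trading sketch, with loud assumptions: `model` is your trained agent, `latest_df` is a DataFrame you keep appending new bars to after each close (a hypothetical data feed, not something the library provides), and its columns must match the signal_features used in training. After each close, build the observation from the last window_size rows and treat the predicted action as the signal for the next session.

```python
import numpy as np

window_size = 10
feature_columns = ['Close', 'Open', 'High', 'Low', 'Volume']   # must match training features

# latest_df: DataFrame with the newest bars appended after each close (assumed to exist)
obs = latest_df[feature_columns].to_numpy()[-window_size:]

action, _states = model.predict(obs[np.newaxis, ...])           # `model` trained earlier
print("signal for next session:", action)
```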

Running into 1 or 0 trades on evaluation

I am running into 1 or 0 trades on evaluation. I am just using the sample code from the TF-Agents DQN tutorial. collect_step will trigger trades, but the evaluation step in compute_avg_return only produces 1 or 0 trades.


for _ in range(num_iterations):

  # Collect a few steps using collect_policy and save to the replay buffer.
  for _ in range(collect_steps_per_iteration):
    collect_step(train_env, agent.collect_policy, replay_buffer)

  # Sample a batch of data from the buffer and update the agent's network.
  experience, unused_info = next(iterator)
  train_loss = agent.train(experience).loss

  step = agent.train_step_counter.numpy()

  if step % log_interval == 0:
    print('Time = {0}, step = {1}: loss = {2}'.format(datetime.now(), step, train_loss))
  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
    print('Evaluate Time = {0}, step = {1}: Average Return = {2}'.format(datetime.now(), step, avg_return))
    returns.append(avg_return)

Regarding Reward and Profit

Hi,

I recently used anytrading with a custom environment and stable baseline's PPO2 algo.

After running the evaluation part 10 times my output was something like

info {'total_reward': 24392200.00000009, 'total_profit': 0.9844407417070604, 'position': 0}
info {'total_reward': 48881799.99999967, 'total_profit': 1.011612620710015, 'position': 0}
info {'total_reward': 51085300.00000165, 'total_profit': 1.013074701451891, 'position': 1}
info {'total_reward': 14793399.999999177, 'total_profit': 0.9767670021357563, 'position': 0}
info {'total_reward': 17957400.000001136, 'total_profit': 0.9815584135159401, 'position': 0}
info {'total_reward': -2354400.0000011073, 'total_profit': 0.9607471236716814, 'position': 1}
info {'total_reward': 20103799.9999998, 'total_profit': 0.9839828662099608, 'position': 0}
info {'total_reward': 19209400.000002127, 'total_profit': 0.9826626717429163, 'position': 1}
info {'total_reward': 14625800.00000124, 'total_profit': 0.9773373249065562, 'position': 1}
info {'total_reward': 53867999.99999998, 'total_profit': 1.0180095847348958, 'position': 1} 

As far as I understand profit, if total_profit is > 1 it is a gain; otherwise (< 1) it is a loss. My question is: why is total_reward positive in cases where total_profit is actually < 1 (a loss)?

Also, it seems like it trades too frequently, even when it shouldn't. Can we add an action like 'wait', so it waits for a bigger price difference or trend? (Sorry if there's a proper term for it, I am new to trading.)

Clarification: Position Prediction

I would like to clarify something. I imported the standard environment 'stocks-v0' but then overlaid a custom environment on top of it with extra features beyond just OHLCV. I split my dataframe into training and testing. Let's say the shape[0] of my df is 100,000, with row 100,000 as the latest trade data row. If I train on the first X rows, then test on rows X+1 up to row 99,999, is the position that the model spits out (1 or 0) for row 100,000?

Unable to plot in virtual Environment

QObject::moveToThread: Current thread (0x141cf30) is not the object's thread (0x1a84fd0).
Cannot move to target thread (0x141cf30)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/jothi/Software/btgym/venv/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.

Aborted (core dumped)

Model learns the opposite direction, worst possible reward

Hi, this is not an issue, but after days of trying to figure this out I wanted to ask in case someone has advice for me. I first found this problem on my own custom env. I tried DQN, A2C and PPO, and none of them figures out which way to go; the reward just fluctuates between the best and worst possible values. In a sense it learns perfectly, because when the reward is negative it is the worst possible outcome. Then I tried your env, which is very clean and easy to understand, but I am having the exact same issue. Do you have any experience with something like this? I'm probably doing something wrong but couldn't find it. Thanks.

ValueError: Cannot feed value of shape (1, 3926, 2) for Tensor 'deepq/input/Ob:0', which has shape '(?, 3927, 2)'

Hello,
with the code below I am presented with the error stated in the title. I want my window to be as big as my DataFrame:

custom_env = gym.make('stocks-v0', df = data, window_size = 3927, frame_bound = (1, 3927))

The error:

Traceback (most recent call last):
  File "d:\ReinforcementLearning\BaseLines\Trading\RL Trading.py", line 131, in <module>
    results = model.learn(int(100000))
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\dqn.py", line 216, in learn
    action = self.act(np.array(obs)[None], update_eps=update_eps, **kwargs)[0]
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\build_graph.py", line 159, in act
    return _act(obs, stochastic, update_eps)
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 287, in <lambda>
    return lambda *args, **kwargs: func(*args, **kwargs)[0]
  File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 330, in __call__
    results = sess.run(self.outputs_update, feed_dict=feed_dict, **kwargs)[:-1]
  File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 3926, 2) for Tensor 'deepq/input/Ob:0', which has shape '(?, 3927, 2)'

StocksEnv profit calculation does not consider short ?

Not sure if this is the best place to ask a question.

In the _calculate_reward function, the reward does not seem to account for shorting. trade is true, but it does not add reward when we are short and the price goes down, or subtract reward when the price goes up while shorting.

if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price

    if self._position == Positions.Long:
        step_reward += price_diff

Shouldn't it be changed to:

if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price

    if self._position == Positions.Long:
        step_reward += price_diff
    else:
        step_reward -= price_diff # Change here to account for shorting

Examples for RLLIB

I am struggling with how to use this environment with Ray's RLlib.

Any ideas or samples?
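
A minimal sketch of how RLlib can consume a gym-anytrading env, assuming a Ray 1.x-style API (ray.rllib.agents; newer Ray versions moved to ray.rllib.algorithms). The registration name, frame_bound and config values below are arbitrary.

```python
import gym
import gym_anytrading
import ray
from ray.tune.registry import register_env
from ray.rllib.agents import ppo

# Register a factory so RLlib workers can build the env by name.
def env_creator(env_config):
    return gym.make('forex-v0', frame_bound=(50, 500), window_size=10)

register_env("forex-anytrading", env_creator)

ray.init(ignore_reinit_error=True)
trainer = ppo.PPOTrainer(env="forex-anytrading", config={"num_workers": 1, "framework": "torch"})

for i in range(5):
    result = trainer.train()
    print(i, result["episode_reward_mean"])
```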
