aminhp / gym-anytrading
The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
License: MIT License
After a good bit of looking I haven't been able to find a way to illustrate what the training process has come up with so that I can take it into next steps and write a trading script to use the methodology/model the computer came up with.
With custom imported data I've been able to get a relatively consistent 'Explained Variance' close to 1, which, to my knowledge, means that the model and the actual data have very small discrepancies, so the model could potentially be used to build a trading methodology with perhaps consistent wins.
My trouble is seeing exactly what the model the computer came up with contains. Using quantstats I can easily see its performance over the test period, but that's not quite as useful as seeing exactly how the computer traded to achieve the quoted returns.
Any guidance or advice would be appreciated!
Hi there,
I've encountered an issue running the following code:
import gym
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK
from stable_baselines.common.env_checker import check_env
env = gym.make('forex-v0', frame_bound=(10, 500), window_size=10)
check_env(env, warn=False, skip_render_check=False)
The output I get is:
AssertionError: The observation returned by the reset() method does not match the given observation space
I've debugged your TradingEnv class and didn't see any issue, so I thought the problem could be in check_env().
I've debugged check_env() as well, but everything seems fine there.
Then I went for the last test, which was running check_env() with the classic CartPole-v0 from gym; here check_env() didn't throw any exception and ran smoothly.
This is the code for CartPole-v0:
import gym
from gym.envs.classic_control import CartPoleEnv
from stable_baselines.common.env_checker import check_env
env = gym.make('CartPole-v0')
check_env(env, warn=False, skip_render_check=False)
Do you have any clue why this is happening? I'm confused lol
Hi @AminHP
When trying your code in the example "a2c_quantstats.ipynb", the last part does not run and I get the following error:
File "", line 3, in
net_worth = pd.Series(env.history['total_profit'], index=df.index[start_index+1:end_index])
AttributeError: 'ForexEnv' object has no attribute 'history'
I can not figure out how to fix it, do you have any suggestion?
Thank you
How to register gym-anytrading env to Gym?
I see that you have the __init__.py. How do you do this?
Thank you,
Vic
I just want to confirm if I am loading a saved model the right way on a test set.
So firstly I ran my preferred model and saved it.
My env variable before the model load is only on the test set.
Then I do:
model = A2C.load(load_path, env=env)
obs = env.reset()
while True:
    obs = obs[np.newaxis, ...]
    action, _states = model.predict(obs)
    ........
Sorry I don't know how to indent lines of code in Github comments (I thought tab would do it)
I would really like to see binary options supported by this. I am working on trying to complete this myself (by hiring someone), as I lack the knowledge and am struggling with how to do it. Nadex is my preferred binary options exchange; however, I can see others benefiting from something such as IQ Option.
Thank you!
Hi .
Hello @AminHP, how are you?
I've been studying all this time and learning more and more about the world of trading, moving increasingly toward day trading (intraday).
In this case, for this project, would it be possible to train only on a specific time window during the day?
I have the data, but I'm not sure how to use it to start this training and see how it could be useful!
Do you have any suggestions on how to do this, that is, how to use this data for a period of the day, or even an example to point me in the right direction on how to use this project and learn more?
Thank you, sir!
I am playing around with some financial data and testing various models, and I'd like to understand some things:
Firstly, is there a way to plot all the training buy/sell positions against the prices the bot bought and sold at? I can only see reward over time plotted.
In the attachment, can you explain why the reward jumps up from 378 to 379 on the x-axis when it goes from selling a low price to buying a high price? Or is price info hidden and the bot is actually buying a low price and selling a high price in the background?
My biggest challenge is how often the bot should run, because every time it is run, it generates a signal. So if I run it every day, it will generate a signal every day; if I run it every 4 days, it will generate a signal every 4 days, and so on. It basically generates a signal based on the timesteps of your dataset. So if you have hourly data, is it best practice to run it every hour, and if you have daily data, is it best practice to run it every day? How can we change the reward function to penalize very short-term trades and make it trade only every once in a while?
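One possible answer to the last question is reward shaping: subtract a penalty whenever a position is closed sooner than some minimum holding period. The following is only a minimal sketch of that idea, not something gym-anytrading provides; the function name `shaped_reward` and the `min_hold` and `penalty` parameters are hypothetical.

```python
def shaped_reward(raw_reward, ticks_held, min_hold=24, penalty=1.0):
    """Return the trade reward, reduced when the position was closed too quickly.

    raw_reward: reward the environment would normally give for the closed trade
    ticks_held: number of timesteps between opening and closing the position
    min_hold, penalty: hypothetical tuning knobs, not gym-anytrading parameters
    """
    if ticks_held < min_hold:
        # Scale the penalty by how far the trade fell short of the minimum hold.
        shortfall = (min_hold - ticks_held) / min_hold
        return raw_reward - penalty * shortfall
    return raw_reward


quick = shaped_reward(raw_reward=2.0, ticks_held=6)     # closed after 6 ticks
patient = shaped_reward(raw_reward=2.0, ticks_held=48)  # held past the minimum
```

In a custom environment, logic like this could presumably be applied inside an overridden `_calculate_reward`, with the holding time derived from the ticks between trades.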
I am trying to use multiple CPUs for the example provided at this link.
I tried to change the environment to use multiple CPUs:
env = DummyVecEnv([env_maker for i in range(16)])
But I have a problem with done and info in stable baselines. It seems they have turned into arrays.
There is an error in this code. Any suggestions, or has any of you done this? It seems LSTM policies in stable baselines work like this.
#env = env_maker()
#observation = env.reset()
while True:
    #observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break
------------------------------
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     10
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
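The error above is consistent with a vectorized environment returning one done flag and one info dict per sub-environment, so that `if done:` is evaluated on a whole batch. The sketch below shows the indexing pattern only. `FakeVecEnv` is a hypothetical stand-in written with plain lists so that it runs without stable-baselines; it is not the library's DummyVecEnv.

```python
class FakeVecEnv:
    """Hypothetical stand-in for a vectorized env with n sub-environments."""

    def __init__(self, n_envs, episode_length):
        self.n_envs = n_envs
        self.episode_length = episode_length
        self.tick = 0

    def reset(self):
        self.tick = 0
        return [[0.0] for _ in range(self.n_envs)]

    def step(self, actions):
        self.tick += 1
        finished = self.tick >= self.episode_length
        observations = [[float(self.tick)] for _ in range(self.n_envs)]
        rewards = [0.0] * self.n_envs
        dones = [finished] * self.n_envs          # one flag per sub-environment
        infos = [{"tick": self.tick} for _ in range(self.n_envs)]
        return observations, rewards, dones, infos


env = FakeVecEnv(n_envs=16, episode_length=5)
observations = env.reset()
steps = 0
while True:
    actions = [0] * env.n_envs                    # placeholder for model.predict(...)
    observations, rewards, dones, infos = env.step(actions)
    steps += 1
    if dones[0]:                                  # test a single flag, not the whole batch
        final_info = infos[0]
        break
```

With NumPy arrays, `dones.any()` or `dones.all()` are the alternatives the error message itself suggests, depending on whether the loop should stop when one or all sub-environments finish.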
Hello,
I've been thinking about whether it would be possible to integrate this kind of feature into gym-anytrading. Basically, it goes like this:
From what I know about RL, the policy gradients are initialized randomly and the agent is rewarded according to its actions. Within trading this potentially means that it can take millions of iterations across the dataset before it even comes up with a strategy that is remotely successful; thereafter it spends time optimizing the strategy, which again can take a long time. In the end, you are presented with a model that is attempting to maximize its reward. Depending on the reward structure, this can mean that if you are maximizing total net worth you might end up with a model applying a scalping strategy, where you personally would have liked a model that was swing trading. So how can we potentially adjust for this and also make it faster in the process? Normally you would add several reward functions to reward it based on what kind of strategy you want it to employ; however, there might be another way.
A theoretical concept that I have been messing around with in my head is that, instead of defining reward functions by net worth, Sortino ratio, etc., we could go into our dataset and place buy and sell markers, either manually or mathematically. These buy and sell markers are where you ideally want the RL agent to enter and exit trades, so we reward the agent only when it trades at these points/prices. The obvious concern here is overfitting:
First, we have to address the fact that in all forms of ML you have to split your dataset into training and testing sets; this will allow people to see whether or not the agent is actually overfitting on the training dataset.
Second, a precision parameter could be set. This parameter would determine the range, in percent, around the specified buy and sell prices within which we would still reward the agent for buying and selling.
Take a daily chart of Apple, if we are going to apply a swing trading strategy, the perfect buy entry would occur on the 23rd of March 2020 at price 212,61$. We would mark this as our buy entry. The perfect sell exit would occur on the 13th of July 2020 at a price of 399,82$, we then mark this as the sell exit. You would keep doing this either manually or mathematically across the entire dataset on which you want to train on.
Next, we would set a precision parameter in this case we set it to 1,5 meaning that we will still reward the agent if it buys at a price of +-1,5% from 212,61$ and sells at a price of +-1,5% from 399,82$.
So how would this impact our RL? My theory is that our RL agent will try to create a strategy that generates entry and exit signals according to these buy and sell points. This will give more control to the user, who can now specify what strategy they want to employ (swing trading, scalping, etc.). Besides controllability, the RL would presumably train faster since it doesn't need to find out which entry and exit points generate the most reward (we did that for it). Instead, it would spend its time going over the dataset to find signals that trigger within the precision range of our entries and exits. Of course, if these signals also trigger outside the range, it gets punished for it, so as to avoid it just constantly generating buy and sell signals on each bar.
This is of course just my take on things, and I am posting it here for two reasons. Number 1 is that this could potentially become a unique feature of gym-anytrading (I haven't seen this elsewhere), provided that the other reason I'm posting holds its ground.
Reason number 2 is to get feedback on this idea. I am still fairly new to RL, so if some of the experts out there think that this won't work because of A, B, C or D, then comment here and let's get a discussion going. After all, it is in everyone's best interest to get gym-anytrading to be as good as possible, even if it means implementing new ideas that might not have been tried before, since they could turn out to be the best ideas.
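The marker-and-precision idea described in this post can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `within_precision` and `marker_reward` and the +1/-1 reward values are hypothetical, and nothing like this exists in gym-anytrading as far as this thread shows. The prices are the ones from the Apple example in the post.

```python
def within_precision(price, target, precision_pct):
    """True if price lies within +/- precision_pct percent of target."""
    return abs(price - target) <= target * precision_pct / 100.0


def marker_reward(action, price, buy_target, sell_target, precision_pct=1.5):
    """+1 for a trade inside the band around its marker, -1 for one outside."""
    target = buy_target if action == "buy" else sell_target
    return 1.0 if within_precision(price, target, precision_pct) else -1.0


# Values from the Apple example above: buy marker 212.61, sell marker 399.82.
good_buy = marker_reward("buy", 215.00, 212.61, 399.82)    # inside the 1.5% band
bad_buy = marker_reward("buy", 220.00, 212.61, 399.82)     # outside the band
good_sell = marker_reward("sell", 395.00, 212.61, 399.82)  # inside the 1.5% band
```

The penalty for trades outside the band corresponds to the punishment the post proposes to stop the agent from signalling on every bar.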
Hello,
Today I stumbled upon this mighty fine GitHub repo that allows you to visualize the training behaviour of an RL agent that runs on OpenAI's Gym: https://github.com/deepmind/bsuite
Just thought that it might be a useful addition to gym-anytrading!
The problem is something inside the DummyVecEnv which resets the environment automatically after it is done.
Also, there was a mistake in your code. Try this:
env_maker = lambda: gym.make('forex-v0', frame_bound=(100, 5000), window_size=10)
env = DummyVecEnv([env_maker])
# Training Env
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)
# Testing Env
env = env_maker()
observation = env.reset()
while True:
    observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break
# Plotting results
plt.cla()
env.render_all()
plt.show()
Originally posted by @AminHP in #1 (comment)
I saw this reply to a problem similar to one I had with the render_all() method, though in my case I am using a VecNormalize() wrapper around my DummyVecEnv. In the quoted solution, a DummyVecEnv was created for training, and then another env was instantiated for prediction/testing that could be used with render_all(). In my case this won't work, since I need VecNormalize to normalize observations and rewards.
env = make_vec_env(env_maker, n_envs=1, monitor_dir=log_dir)
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
model = PPO2('MlpLstmPolicy', env, verbose=1, nminibatches=1, policy_kwargs=policy_kwargs,)
callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir, env=env, verbose=1)
# model = PPO2('MlpLstmPolicy', env, verbose=1)
model.learn(total_timesteps=5000, callback=callback, log_interval=10)
env.norm_reward = False
env.training = False
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")
# I get the expected reward here using evaluate_policy()
plt.figure(figsize=(15,6))
plt.cla()
env.render_all()
plt.show()
# This part doesn't work because of the same error
What can I do to use render_all() method (Or any other attribute like env.history for that matter) while maintaining the VecNormalize() environment?
Hi @AminHP ,
I want to use this environment as part of my course project for RL, so I wanted to ask whether anything has already been implemented in this environment, and, if possible, whether you can point me to some resources on algorithms implemented for it.
Best,
Kunal
Do you plan to support multi-asset datasets?
Hi, I was going over the code, and when we render our results we get result benchmarks such as
info {'total_reward': 8.100000000000023, 'total_profit': 0.7996927239889693, 'position': 1}
I just want to know what these parameters mean, especially total_reward and total_profit.
This issue is just to notify that PIP package is not updated, it's still at the first commit dated 22/09/2019.
It would be very helpful if you could point me to some examples using stable-baselines3. I am still not sure how it compares to stable-baselines, as they have a big warning box about comparing performance. Much appreciated.
def my_process_data(df, window_size, frame_bound):
    prices = df.loc[:, 'NDX'].to_numpy()
    prices[frame_bound[0] - window_size]  # validate index (TODO: Improve validation)
    prices = prices[frame_bound[0]-window_size:frame_bound[1]]
    signal_features = df.to_numpy()#np.column_stack((prices, diff))
    return prices, signal_features
class MyForexEnv(StocksEnv):
    def __init__(self, prices, signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features
window_size = 30
start_index = window_size
end_index = len(df)
#env = MyForexEnv(df=df, window_size=10, frame_bound=(start_index, end_index))
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env = MyForexEnv( prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))
env_maker = lambda: gym.make('env')
env = DummyVecEnv([env_maker])
I'm trying to use the extended env with stable baselines but I keep getting errors:
TypeError: argument of type 'MyForexEnv' is not iterable
or
class MyForexEnv(StocksEnv):
    def __init__(self, prices = prices, signal_features = signal_features, **kwargs):
        self._prices = prices
        self._signal_features = signal_features
        super().__init__(**kwargs)
    def _process_data(self):
        return self._prices, self._signal_features
window_size = 30
start_index = window_size
end_index = len(df)
prices, signal_features = my_process_data(df=df, window_size=window_size, frame_bound=(start_index, end_index))
env_maker = lambda: gym.make(MyForexEnv,prices =prices, signal_features =signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index) )
env = DummyVecEnv([env_maker])
which returns:
TypeError: argument of type 'type' is not iterable
I have also tried using just the function definition inside the new class to get the data, but it makes no difference.
or:
class MyForexEnv(gym.ActionWrapper):
    def __init__(self, env, prices = prices, signal_features = signal_features, **kwargs):
        self.trade_fee_bid_percent = 0.05
        self.trade_fee_ask_percent = 0.05
        self._prices = prices
        self._signal_features = signal_features
        super(MyForexEnv, self).__init__(env)
    def _process_data(self):
        return self._prices, self._signal_features
env = MyForexEnv(gym.make("stocks-v0"), prices, signal_features, df=df, window_size=window_size, frame_bound=(start_index, end_index))
It's still not working. Any idea?
Hello, I was trying to integrate gym-anytrading with muzero-general, but I got this error:
File "Development/Python/Muzero-GymAnytrading/self_play.py", line 137, in play_game
), f"Observation should be 3 dimensionnal instead of len(n_obs): {len(n_obs)} dimensionnal. Got observation of shape: n_obs: {n_obs}"
AssertionError: Observation should be 3 dimensionnal instead of len(n_obs): 4 dimensionnal. Got observation of shape: n_obs: (1, 1, 10, 2)
Do you know what it means and how to resolve it?
Thank you,
Marco.
What is the purpose of this code in forex_env: assert len(frame_bound) == 2? I am getting this error.
What is the purpose of that? My parameters are window_size: 10, frame_bound: (50, 100).
def __init__(self, df, window_size, frame_bound, unit_side='left'):
    print("df: ",df," window_size: ",window_size," frame_bound: ", frame_bound)
    assert len(frame_bound) == 2
    assert unit_side.lower() in ['left', 'right']
    self.frame_bound = frame_bound
    self.unit_side = unit_side.lower()
    super().__init__(df, window_size)
    self.trade_fee = 0.0003  # unit
Say I wanted to train actions to be in a range between 0 and 1, where the number represents the percentage of net worth that should be invested in the asset. The resulting action is then to buy or sell the difference.
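The arithmetic behind "buy or sell the difference" can be sketched as follows. This is a hypothetical helper, not part of gym-anytrading: the name `rebalance_amount` and its parameters are assumptions, and wiring it into an environment would presumably require a continuous action space instead of the discrete Buy/Sell actions quoted elsewhere in this thread.

```python
def rebalance_amount(target_fraction, net_worth, position_value):
    """Cash amount to buy (positive) or sell (negative) to reach the target.

    target_fraction: action in [0, 1], share of net worth to hold in the asset
    net_worth: cash plus current value of the position
    position_value: current value of the position
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return target_fraction * net_worth - position_value


# Half of a 1000.0 net worth is currently invested.
buy = rebalance_amount(0.75, net_worth=1000.0, position_value=500.0)
sell = rebalance_amount(0.25, net_worth=1000.0, position_value=500.0)
```

A positive result means buying that much of the asset, and a negative result means selling it.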
Hi, I notice in the forexenv, there is a commission applied at the sell leg but not at the buy leg.
May I understand the reason?
Thanks in advance for your explanation
Rgds
Hey!
Just found the repo and love it, but I am wondering what's going on re: the stocks environment.
According to the step logic in the trading_env super class:
trade = False
if ((action == Actions.Buy.value and self._position == Positions.Short) or
    (action == Actions.Sell.value and self._position == Positions.Long)):
    trade = True
...and after a reset, the initial position is a Short. So this to me reads that a Buy position will only be opened if the current position is a short, and a Buy action is generated. But when I review my render:
...you can see it starts with a bunch of red sells. How? Why? :'(
Also, using that same step logic, I would assume that if a Sell/Short position is already active, no other Short/Sells would be issued, but there's still consecutive red dots on that render, same as green for buys. What do these dots denote exactly?
I have another question about the quantstats report too. If the render says "Total Profit: 0.6828566" or whatever it's profited, how come the quantstats report is so down??
Thanks! Love the work!
Hello, thanks for the great work. With all due respect, I believe your assumption is faulty and there needs to be a "do nothing" action. Imagine the market goes sideways and the price variance stays smaller than the spread for longer than the window. In that case, no action is the best action.
I'm finding it difficult to use this because of the need to flatten the signal_features into a single vector so as to simplify later shapes for higher order matrix multiplications.
Hi,
Sorry if this is the wrong place to ask this but I couldn't find anywhere else. I love the package, excellent work, and I'm sure I am doing something wrong but I wonder if you can explain why when I run the same thing multiple times I get such wildly different results. For example, running this to train / test it 10 times:
import gym
import gym_anytrading
from gym_anytrading.envs import TradingEnv, ForexEnv, Actions, Positions
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL
env = gym.make('forex-v0', frame_bound=(50, 100), window_size=10)
for i in range(10):
    observation = env.reset()
    while True:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("info:", info)
            break
I get:
info: {'total_reward': -50.99999999999439, 'total_profit': 0.9875980085384239, 'position': 0}
info: {'total_reward': 24.099999999995784, 'total_profit': 0.9886818462999193, 'position': 1}
info: {'total_reward': 24.499999999987313, 'total_profit': 0.9893252791394607, 'position': 0}
info: {'total_reward': 138.10000000000767, 'total_profit': 0.9953009801405461, 'position': 1}
info: {'total_reward': 107.10000000001328, 'total_profit': 0.9926679505350279, 'position': 1}
info: {'total_reward': 127.00000000000375, 'total_profit': 0.996177843192774, 'position': 0}
info: {'total_reward': -144.90000000000117, 'total_profit': 0.9813550423422519, 'position': 1}
info: {'total_reward': -128.90000000000293, 'total_profit': 0.9843355398695747, 'position': 0}
info: {'total_reward': 45.699999999999626, 'total_profit': 0.9912142586709967, 'position': 0}
info: {'total_reward': -39.39999999999389, 'total_profit': 0.9859639867316038, 'position': 1}
Wouldn't I expect them all to have the same position given it's the same data / training, or am I missing something fundamental here?
Thank you
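Note that the loop in this post draws every action from `env.action_space.sample()`, so no training takes place and each episode follows a different random action sequence. The toy sketch below illustrates that effect with the standard library only. The function `random_episode`, the price list, and the reward rule are hypothetical stand-ins, not gym-anytrading's logic.

```python
import random


def random_episode(prices, seed=None):
    """Sum price differences collected by a random buy/sell policy.

    A toy stand-in for the loop above: action 1 holds the asset for the
    next tick, action 0 stays out. Not gym-anytrading's reward logic.
    """
    rng = random.Random(seed)
    total = 0.0
    for previous, current in zip(prices, prices[1:]):
        action = rng.randint(0, 1)     # plays the role of action_space.sample()
        if action == 1:
            total += current - previous
    return total


prices = [1.10, 1.12, 1.11, 1.15, 1.13, 1.16, 1.14, 1.18]
unseeded = [random_episode(prices) for _ in range(10)]       # varies run to run
seeded = [random_episode(prices, seed=32) for _ in range(10)]  # identical every time
```

Fixing the seed of the random source makes the episodes repeat exactly; without it, differing totals and final positions are expected.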
Would it be possible to make a function comparable to add_signals for changing the reward function? It would be nice to use custom KPIs as rewards, for example risk-adjusted return.
Kind regards.
Hello,
After creating a model and running results = model.learn(int(1000)),
how do I use the results to compare against a benchmark in quantstats?
Currently the results don't hold the data that quantstats expects, so they can't be used with qs.reports.html(results, "SPY", output="D:\ReinforcementLearning\BaseLines\Trading\Myreport.html")
Hello, is there a way to implement live paper trading?
Or just a way to feed in the price in real time?
Hello, I have a question...
I'm currently using the stable baselines library to train a model using your 'forex-v0' environment.
env = DummyVecEnv([lambda: gym.make('forex-v0', frame_bound=(10, 500), window_size=10)])
policy_kwargs = dict(net_arch=[64, 'lstm',dict(vf=[128,128,128], pi=[64,64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=5000)
After training the model I perform a test using your code:
observation = env.reset()
while True:
    action = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break
# Plotting results
plt.cla()
env.render_all()
plt.show()
But unfortunately I get an error saying that DummyVecEnv has no render_all(), which makes sense to me because now the environment is in a vector.
The thing I don't understand is how I can call env.render_all() on the vector.
My confusion is because when I call env.render() everything works fine, but not when I call env.render_all().
Hi, I played with the forex model using this gym.
I created an RL A2C model with stable-baselines3 and tested against this gym. Somehow I always get different 'total_profit' calculation whilst 'max possible profit' is fixed. Can anybody advise on how do I tweak the code so that I can get consistent 'total_profit' result? I have tried to fix the seed using env.seed(32) and env.action_space.seed(32) but I still get different result in 'total_profit' calculation
Rgds,
Harry
I believe expiration is a significant factor in forex.
Greetings,
First of all, many thanks to AminHP for sharing the project.
I have trouble understanding the ForexEnv's _update_profit function.
I understand that, as I am using the Euro as my base currency with the EURUSD pair, I should use unit_side='left'. Am I correct in this assumption?
The _total_profit variable is updated only when a Buy action is given and the existing position is Short. From these rules, I understand that the _update_profit function takes only short trades into account, calculating _total_profit from the latest short trades only. Is this assumption correct?
Would you please be kind enough to clarify whether the _update_profit function takes into account profits from long trades with the Euro currency and, if it does, how it works?
def _update_profit(self, action):
    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
        (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True
    if trade or self._done:
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]
        if self.unit_side == 'left':
            if self._position == Positions.Short:
                # Here the _total_profit variable is updated only if given action is Buy and existing position is Short.
                quantity = self._total_profit * (last_trade_price - self.trade_fee)
                self._total_profit = quantity / current_price
        elif self.unit_side == 'right':
            if self._position == Positions.Long:
                quantity = self._total_profit / last_trade_price
                self._total_profit = quantity * (current_price - self.trade_fee)
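To make the arithmetic of the quoted unit_side == 'left' branch concrete, here is that update restated as a standalone function with two worked cases. The function name `update_profit_left` and the EURUSD prices are hypothetical; the fee of 0.0003 is the trade_fee value shown in the ForexEnv constructor quoted elsewhere in this thread.

```python
def update_profit_left(total_profit, last_trade_price, current_price, trade_fee):
    """Profit update from the quoted 'left' branch, applied when a Short is closed."""
    quantity = total_profit * (last_trade_price - trade_fee)
    return quantity / current_price


# Hypothetical EURUSD prices: short opened at 1.1000, closed at 1.0500.
falling = update_profit_left(1.0, 1.1000, 1.0500, trade_fee=0.0003)
# Same short closed after the price rose to 1.1500.
rising = update_profit_left(1.0, 1.1000, 1.1500, trade_fee=0.0003)
```

In this branch the profit factor grows when the price falls between opening and closing the short, and shrinks when it rises. The quoted code shows no update for a Long position when unit_side is 'left'.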
How can the trained OpenAI Gym model for stock trading be deployed as an app, or in backtesting frameworks like backtrader, to predict Buy or Sell?
Is it also possible to create an implementation for TF-Agents?
Hi AminHp,
Really great work, the code is very pleasant to read. I have a question regarding the _calculate_reward function in the StocksEnv: why is the step_reward only updated when we sell a long position? As I understand it, buying after a short position should also generate a profit or a loss, and thus the agent should be rewarded accordingly, but it is not taken into account if I'm correct. Forgive me if this is a noob question, I just got into finance and stock trading yesterday.
Best regards,
lee
A
Hello,
Is the reward calculation OK? I get a high reward but a loss in profit.
I am using stable baselines.
I am using these signal features.
def my_process_data(env):
    start = env.frame_bound[0] - env.window_size
    end = env.frame_bound[1]
    prices = env.df.loc[:, 'Close'].to_numpy()[start:end]
    # print(env.df)
    indi = Indicators(env.df)
    signal_features = env.df.loc[:, ['Close', 'Open', 'High', 'Low','Volume']].to_numpy()[start+1:end]
    #signal_features = env.df.loc[:, ['Close','Volume']].to_numpy()[start+1:end]
    rsi = indi.rsi(5,1)
    rsicolumn = rsi.to_numpy()[start:end].reshape(-1,1)
    print("rsi shape: ",rsicolumn.shape)
    signal_features = np.append(signal_features, rsicolumn, axis=1)
    # print(signal_features)
    return prices, signal_features
I was writing tests for this and it's becoming more and more clear that this gym has some serious deficiencies. I don't think anyone should be using it in production, and your README ideally would reflect that. At a base level, the only 2 actions and states are long or short, which is very wrong and messes with whatever algorithm is being used to train. Many algorithms depend on a Gaussian action space, i.e. -1 or [0, 0, 1], 0 or [0, 0, 0], 1 or [1, 0, 0].
@AminHP I am having a hard time wrapping my head around on how to implement this in a live environment for paper trading, just to hook everything up E2E.
env_maker = lambda: gym.make(
'stocks-v0',
df=test_df,
window_size=window_size,
frame_bound=(start_index, end_index)
)
The above snippet is how to create an environment for the agent/model to step through. But in order to create the environment, we have to pass in a DataFrame. In the real world, we won't know the current day's OHLCV until the markets close. So how would we be able to use a trained model in a current environment with up-to-date data and features (observations)? Unless its predicted actions are actually for the next day?
Side question: Why, on observation = observation[np.newaxis, ...] while stepping through the env, do we have to reduce observation by 1 dimension before predicting? I don't think observations (signal_features) is changing in the environment.
Thank you!
I am getting only 1 trade or no trades at all during evaluation. I am just using the sample code for TF DQN. The collect_step will trigger trades, but the evaluation step in compute_avg_return only has 1 or 0 trades.
for _ in range(num_iterations):
    # Collect a few steps using collect_policy and save to the replay buffer.
    for _ in range(collect_steps_per_iteration):
        collect_step(train_env, agent.collect_policy, replay_buffer)
    # Sample a batch of data from the buffer and update the agent's network.
    experience, unused_info = next(iterator)
    train_loss = agent.train(experience).loss
    step = agent.train_step_counter.numpy()
    if step % log_interval == 0:
        print('Time = {0}, step = {1}: loss = {2}'.format(datetime.now(), step, train_loss))
    if step % eval_interval == 0:
        avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
        print('Evaluate Time = {0}, step = {1}: Average Return = {2}'.format(datetime.now(), step, avg_return))
        returns.append(avg_return)
Hi,
I recently used gym-anytrading with a custom environment and stable baselines' PPO2 algorithm.
After running the evaluation part 10 times my output was something like
info {'total_reward': 24392200.00000009, 'total_profit': 0.9844407417070604, 'position': 0}
info {'total_reward': 48881799.99999967, 'total_profit': 1.011612620710015, 'position': 0}
info {'total_reward': 51085300.00000165, 'total_profit': 1.013074701451891, 'position': 1}
info {'total_reward': 14793399.999999177, 'total_profit': 0.9767670021357563, 'position': 0}
info {'total_reward': 17957400.000001136, 'total_profit': 0.9815584135159401, 'position': 0}
info {'total_reward': -2354400.0000011073, 'total_profit': 0.9607471236716814, 'position': 1}
info {'total_reward': 20103799.9999998, 'total_profit': 0.9839828662099608, 'position': 0}
info {'total_reward': 19209400.000002127, 'total_profit': 0.9826626717429163, 'position': 1}
info {'total_reward': 14625800.00000124, 'total_profit': 0.9773373249065562, 'position': 1}
info {'total_reward': 53867999.99999998, 'total_profit': 1.0180095847348958, 'position': 1}
As far as I understand profit, a total_profit above 1 is a profit and a total_profit below 1 is a loss. My question is why total_reward is positive in cases where total_profit is actually below 1 (a loss).
Also, it seems like it trades too frequently even when it shouldn't. Can we add an action like "wait", so that it waits to see a bigger price difference or trend? (Sorry if there's a proper term for it, I am new to trading.)
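One way a positive total_reward can coexist with a total_profit below 1 is that the two numbers are accumulated differently. The sketch below assumes that reward adds raw price differences while profit compounds relative returns and pays a proportional fee on each leg. These rules, the function name `summarize`, and the 1% fee are simplifying assumptions for illustration, not a copy of gym-anytrading's code.

```python
def summarize(trades, fee_pct):
    """Return (total_reward, total_profit) for a list of (buy, sell) prices.

    total_reward adds raw price differences; total_profit compounds the
    relative return of each trade and pays a proportional fee on both legs.
    Both rules are simplifying assumptions for illustration.
    """
    total_reward = 0.0
    total_profit = 1.0
    for buy_price, sell_price in trades:
        total_reward += sell_price - buy_price
        total_profit *= (sell_price / buy_price) * (1 - fee_pct) ** 2
    return total_reward, total_profit


# Two small winning trades with a hypothetical 1% fee per leg.
reward, profit = summarize([(100.0, 100.5), (101.0, 101.4)], fee_pct=0.01)
```

Here every trade gains on price, so the summed reward is positive, yet the fees outweigh the gains and the compounded profit factor ends below 1. Frequent trading makes this gap larger.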
I would like to clarify something. I imported the standard environment 'stocks-v0' but then overlaid a custom environment on top of it with extra features beyond just OHLCV. I split my dataframe into training and testing sets. Let's say the shape[0] of my df is 100,000, with row 100,000 as the latest trade data row. If I train on the first X rows, then test on rows X+1 up to row 99,999, is the position that the model spits out, 1 or 0, for row 100,000?
QObject::moveToThread: Current thread (0x141cf30) is not the object's thread (0x1a84fd0).
Cannot move to target thread (0x141cf30)
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/jothi/Software/btgym/venv/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl.
Aborted (core dumped)
Hi, this is not an issue, but after days of trying to figure this out, I wanted to ask in case someone has advice for me. I first found this problem in my own custom env. I tried DQN, A2C and PPO, and none of them knows which way to go. The result just fluctuates between the best and worst possible reward. It learns perfectly, because when it is negative it is the worst possible outcome. Then I wanted to try your env, which is very clean and easy to understand, but I am having the exact same issue. Do you have any experience with something like this? I'm doing something wrong but couldn't find it. Thanks.
Hello,
with the below code I am presented with the error stated in the title. I want my window to be as big as my df frame.
custom_env = gym.make('stocks-v0', df = data, window_size = 3927, frame_bound = (1, 3927))
The error:
Traceback (most recent call last):
File "d:\ReinforcementLearning\BaseLines\Trading\RL Trading.py", line 131, in <module>
results = model.learn(int(100000))
File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\dqn.py", line 216, in learn
action = self.act(np.array(obs)[None], update_eps=update_eps, **kwargs)[0]
File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\deepq\build_graph.py", line 159, in act
return _act(obs, stochastic, update_eps)
File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 287, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)[0]
File "D:\Anaconda\envs\RL\lib\site-packages\stable_baselines\common\tf_util.py", line 330, in __call__
results = sess.run(self.outputs_update, feed_dict=feed_dict, **kwargs)[:-1]
File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
run_metadata_ptr)
File "D:\Anaconda\envs\RL\lib\site-packages\tensorflow\python\client\session.py", line 1111, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 3926, 2) for Tensor 'deepq/input/Ob:0', which has shape '(?, 3927, 2)'
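The 3926 in the error matches what Python slicing produces when the start index `frame_bound[0] - window_size` goes negative, which is the slicing pattern used in the custom `my_process_data` snippets in this thread. The sketch below assumes the DataFrame has 3927 rows, as the post implies. Whether this is the exact cause inside the library is an assumption, and `check_frame_bound` is a hypothetical guard, not a gym-anytrading function.

```python
def slice_frame(rows, window_size, frame_bound):
    """Slice rows the way the snippets in this thread do."""
    start = frame_bound[0] - window_size   # negative when frame_bound[0] < window_size
    return rows[start:frame_bound[1]]


def check_frame_bound(n_rows, window_size, frame_bound):
    """Hypothetical guard: reject bounds that cannot supply a full window."""
    if frame_bound[0] < window_size:
        raise ValueError("frame_bound[0] must be >= window_size")
    if frame_bound[1] > n_rows:
        raise ValueError("frame_bound[1] must be <= number of rows")


rows = list(range(3927))
sliced = slice_frame(rows, window_size=3927, frame_bound=(1, 3927))
n_sliced = len(sliced)                     # one row short of the window

try:
    check_frame_bound(len(rows), window_size=3927, frame_bound=(1, 3927))
    rejected = False
except ValueError:
    rejected = True
```

A negative start index counts from the end of the data, so the slice silently comes out one row shorter than the requested window. Choosing frame_bound[0] at least as large as window_size avoids this, which also means the window cannot cover the entire DataFrame.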
Not sure if this is the best place to ask a question.
In the _calculate_reward function, the reward does not seem to consider shorting. Trade is true, but it does not add reward when we short and the price goes down, or remove reward if the price goes up when shorting.
if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price
    if self._position == Positions.Long:
        step_reward += price_diff
Shouldn't it be changed to:
if trade:
    current_price = self.prices[self._current_tick]
    last_trade_price = self.prices[self._last_trade_tick]
    price_diff = current_price - last_trade_price
    if self._position == Positions.Long:
        step_reward += price_diff
    else:
        step_reward -= price_diff  # Change here to account for shorting
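The proposed change can be checked in isolation with a small standalone function. This is a sketch of the poster's suggestion, not the library's current behaviour. The `Positions` enum here is a local stand-in for the one in gym-anytrading, and `trade_reward` is a hypothetical name.

```python
from enum import Enum


class Positions(Enum):
    """Local stand-in for the environment's Positions enum."""
    Short = 0
    Long = 1


def trade_reward(position, last_trade_price, current_price):
    """Reward for closing `position`, following the proposed change above."""
    price_diff = current_price - last_trade_price
    if position == Positions.Long:
        return price_diff       # long gains when the price rises
    return -price_diff          # short gains when the price falls


long_gain = trade_reward(Positions.Long, 100.0, 105.0)
short_gain = trade_reward(Positions.Short, 100.0, 95.0)
short_loss = trade_reward(Positions.Short, 100.0, 105.0)
```

With this rule a short closed after a price drop earns the same reward as a long closed after an equal rise, and a short closed after a rise is penalized.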
I am struggling on how to use this environment with Ray's RLLIB.
Any idea or sample?