About the action space (personae issue, closed)

ceruleanacg commented on August 23, 2024
About the action space

from personae.

Comments (6)

Ceruleanacg commented on August 23, 2024

For DDPG, the action space is self.codes_count * 3. DDPG is implemented here for a continuous action space, so for each stock code the output is a value in [-1, 1] that represents the likelihood of taking each action.

For PolicyGradient, the action space is still self.codes * 3, because in fact we can only take one action in each state. The same should hold for DDPG, so you may notice a logical problem there: DDPG tries to take self.codes actions in one state, which is not reasonable.

So I implemented a separate forward method for testing, forward_v2, to avoid this problem.
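As a rough illustration of "one action per state" over a space of size codes_count * 3, here is a minimal sketch. It is not taken from the personae code: the stock-major layout, the buy/sell/hold ordering, and the names `decode_action` and `pick_single_action` are all assumptions made for the example.

```python
# Hypothetical sketch, not the personae implementation.
# Assumed layout: index = stock_index * 3 + op_index.
OPS = ("buy", "sell", "hold")  # assumed ordering of the three operations


def decode_action(action_index, codes_count):
    """Map a flat action index to (stock_index, operation)."""
    if not 0 <= action_index < codes_count * len(OPS):
        raise ValueError("action index out of range")
    stock_index, op_index = divmod(action_index, len(OPS))
    return stock_index, OPS[op_index]


def pick_single_action(scores):
    """Pick exactly one flat action: the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two stocks -> six scores; only one (stock, operation) pair is chosen.
scores = [0.1, -0.3, 0.2, -0.8, 0.9, 0.0]
chosen = decode_action(pick_single_action(scores), codes_count=2)
```

With this layout a single argmax yields one operation on one stock, rather than one operation per stock in the same state.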

Thank you very much.

ewanlee commented on August 23, 2024

@Ceruleanacg Your explanation is very detailed, thank you very much. So you define an action as one operation (buy, sell, or hold) on one stock.

In the _get_next_info method, you compare the number of operations the trader has performed in the current state, self.trader.action_times, with the number of stocks, self.code_count. I guess you want to move to the next state only after you have operated on all the stocks? If the operations on all stocks are not complete, the current state does not change. But in the training phase of PolicyGradient, you use a greedy strategy (the use_prob parameter is False) to interact with the market. If the current state is unchanged, the action taken is unchanged too. Then self.code_count actions are performed on the same stock with the same operation. Isn't that unreasonable?

Ceruleanacg commented on August 23, 2024

In the _get_next_info method, two factors influence state_next. The first is current_date, which advances to the next date when self.trader.action_times reaches self.code_count. The second comes from the _get_scaled_stock_data_as_state method, which inserts self.trader.cash and self.trader.holding_value into state_next.

So for PolicyGradient, which uses forward_v2, every action taken influences the next state.
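To make the two factors concrete, here is a minimal sketch under stated assumptions. The class `MarketSketch`, its `step` signature, the starting cash, and the reset of action_times after a date change are hypothetical and not copied from personae; only the comparison of action_times with code_count and the presence of cash and holding value in the state come from the description above.

```python
# Hypothetical sketch, not the personae implementation.
class MarketSketch:
    def __init__(self, dates, code_count, cash=1000.0):
        self.dates = dates            # ordered trading dates
        self.code_count = code_count  # number of stock codes
        self.date_index = 0
        self.action_times = 0         # actions taken on the current date
        self.cash = cash
        self.holding_value = 0.0

    def step(self, cash_delta, holding_delta):
        """Apply one action and return the next state."""
        # Factor 2: the trader's cash and holding value change on every action.
        self.cash += cash_delta
        self.holding_value += holding_delta
        # Factor 1: the date advances only after code_count actions.
        self.action_times += 1
        if self.action_times == self.code_count:
            self.action_times = 0  # assumed reset
            self.date_index = min(self.date_index + 1, len(self.dates) - 1)
        return (self.dates[self.date_index], self.cash, self.holding_value)


market = MarketSketch(["2018-01-02", "2018-01-03"], code_count=2)
first = market.step(cash_delta=-100.0, holding_delta=100.0)   # date unchanged
second = market.step(cash_delta=0.0, holding_delta=0.0)       # date advances
```

In this sketch the state differs after the first action even though the date has not moved, because cash and holding value are part of it.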

ewanlee commented on August 23, 2024

I am sorry, I did not read the code carefully. I have two last questions:

  1. What is the purpose of comparing self.trader.action_times with self.code_count?
  2. Will PolicyGradient converge prematurely to a local optimum if use_prob is always set to False?

Ceruleanacg commented on August 23, 2024

For question 1, if self.trader.action_times == self.code_count, it means that self.current_date needs to be updated in order to get the stock data for the next date.

For question 2, yes, we will actually end up in a local optimum. You can also set use_prob to True if you want, but I found that PolicyGradient performs very badly when I do :)
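For reference, the difference between the two settings can be sketched as follows. This is a minimal sketch and not the personae code: the function `choose_action` and its signature are hypothetical, and it only assumes that use_prob=False means greedy selection while use_prob=True means sampling from the policy's output probabilities.

```python
# Hypothetical sketch, not the personae implementation.
import random


def choose_action(probs, use_prob, rng=random):
    """Return an action index from a list of action probabilities."""
    if use_prob:
        # Sample in proportion to the probabilities (exploration).
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Greedy: always take the most probable action (no exploration).
    return max(range(len(probs)), key=probs.__getitem__)


probs = [0.2, 0.5, 0.3]
greedy = choose_action(probs, use_prob=False)
sampled = choose_action(probs, use_prob=True, rng=random.Random(0))
```

Greedy selection returns the same index for the same probabilities every time, which is why it can lock onto a local optimum; sampling occasionally picks the less likely actions.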

If you have further questions, you can add me on WeChat (17392810723); we could learn more from each other.

ewanlee commented on August 23, 2024

Alright, thank you very much 👍
