
Comments (6)

GitPascalP avatar GitPascalP commented on May 29, 2024

As I understand it, the current implementation provides:

  • a reward for "normal control"

  • a violation reward for exceeding the limits (absolute value) given by the physical system

  • a violation reward that can be adjusted via the gamma parameter

       - all in the Reward / RewardFunction class

The idea is now:

  • a new Monitor class that checks for violations on behalf of the reward function, so the reward can react to them
  • defining limits in the monitor: not just physical system limits, but also newly defined limits (possibly user-defined, if that works with the motor)
  • extending the two existing cases (normal reward; violation => very high reward and termination) with a soft case:
    violation => higher reward than usual, but no termination.
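The three cases above can be sketched as a small classification helper. This is a minimal, hypothetical illustration of the proposed behavior; the function name and signature are not part of the actual gym-electric-motor API:

```python
# Hypothetical sketch: classify a scalar state into the three proposed
# cases - normal control, soft violation (no termination), and hard
# violation (terminate with a large violation reward).

def check_state(state, soft_limit, hard_limit):
    """Classify a scalar state against soft and hard absolute limits."""
    magnitude = abs(state)
    if magnitude > hard_limit:
        return "hard"   # terminate the episode, large violation reward
    if magnitude > soft_limit:
        return "soft"   # higher reward than usual, but no termination
    return "ok"         # normal control reward
```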

I think it is currently possible to simply override the violation reward with an external function, so any violation reward (e.g. a constant) can be used. An extension here could be to let the user choose the kind of reward more easily via an argument in the make() statement (if this topic is included in this issue).

Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.

from gym-electric-motor.

atra94 avatar atra94 commented on May 29, 2024

a new Monitor class that checks for violations on behalf of the reward function, so the reward can react to them

Yes, that is the intention behind this issue.

defining limits in the monitor: not just physical system limits, but also newly defined limits (possibly user-defined, if that works with the motor)

I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0, 1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own.
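The proposed design could look roughly like the following sketch: a monitor that calls a list of constraint functions on the system state and reports the worst violation in [0, 1]. All names here are illustrative, not the actual gym-electric-motor implementation:

```python
# Illustrative sketch of the proposed ConstraintMonitor: it calls a list
# of user-supplied constraint functions on the system state and combines
# their outputs. Each constraint returns a float in [0, 1], where 0.0
# means "no violation" and 1.0 means "hard violation".

class ConstraintMonitor:
    def __init__(self, constraints):
        self._constraints = list(constraints)

    def check(self, state):
        """Return the worst (largest) violation value over all constraints."""
        if not self._constraints:
            return 0.0
        return max(constraint(state) for constraint in self._constraints)


def absolute_limit_constraint(index, limit):
    """Predefined example constraint: hard 0/1 check of one state entry."""
    def constraint(state):
        return 1.0 if abs(state[index]) > limit else 0.0
    return constraint
```

A user could then pass either such predefined constraints or arbitrary custom functions into the monitor.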

I think it is currently possible to simply override the violation reward with an external function, so any violation reward (e.g. a constant) can be used. An extension here could be to let the user choose the kind of reward more easily via an argument in the make() statement (if this topic is included in this issue).

I think this is not necessarily part of this issue.

Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.

Yes, the RewardFunction is responsible for the reward and the ConstraintMonitor for the constraints. As already mentioned, a single value in [0, 1] could be sufficient to pass to the RewardFunction.


wallscheid avatar wallscheid commented on May 29, 2024

I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0, 1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own.

I totally agree with the proposed way of passing the states (and potentially also the latest action) to an arbitrary list of functions. This would enable a lot of flexibility in how we can define constraints, i.e. we could implement box/linear/nonlinear constraints on individual states or on combinations of multiple states. Very nice.

I also agree with the SoftConstraint idea: let's use a scalar feedback in [0, 1] from the ConstraintMonitor, where 1 equals a full hard constraint violation (leading to a termination of the episode, as currently implemented) and values above 0 but below 1 are considered a tolerable / near-limit violation that does not terminate the episode directly.
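That interpretation of the scalar feedback could be sketched as a small helper on the reward side (hypothetical code; the function name and return convention are illustrative assumptions):

```python
# Sketch: map the ConstraintMonitor's scalar feedback in [0, 1] to a
# (terminate, penalty_weight) pair, following the convention proposed
# above - 1.0 means a hard violation that ends the episode, values in
# (0, 1) are tolerable violations that are only penalized.

def interpret_violation(violation_degree):
    """Map the monitor's scalar feedback to a (terminate, penalty_weight) pair."""
    if violation_degree >= 1.0:
        return True, 1.0                 # hard violation: terminate the episode
    if violation_degree > 0.0:
        return False, violation_degree   # tolerable violation: penalize only
    return False, 0.0                    # no violation
```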


GitPascalP avatar GitPascalP commented on May 29, 2024

In my understanding, the scalar feedback can then be used by the reward function as a scaling factor (or an exponent, like reward_power).
To obtain the scalar feedback, the amount of limit violation (independent of the defined limitation functions or of which state is violated) can be evaluated by a penalty function (in the style of soft-constraint optimization) that can be tuned by a user parameter. The question is how complicated the penalty calculation should be. I think it is important in any case that the user can define the strictness of the penalty.
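One simple way to give the user such a tuning knob is a power law on the violation degree. This is only a sketch of the idea; the `strictness` parameter is a hypothetical name, not an existing option:

```python
# Hypothetical penalty function: the violation degree lies in [0, 1], so
# raising it to a power 'strictness' > 1 makes the penalty small for mild
# violations and steep near a full violation; strictness < 1 punishes
# even mild violations strongly.

def soft_penalty(violation_degree, strictness=2.0):
    """Penalty in [0, 1], tunable via the user-defined strictness parameter."""
    return violation_degree ** strictness
```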


wallscheid avatar wallscheid commented on May 29, 2024

The penalty calculation can be kept more or less simple for the moment. Since we will not implement a soft constraint feedback yet (i.e. the scalar feedback will be binary, 0 or 1, until the code is changed further), the penalty could just be a bias offset to the reward, e.g. -10000 or any other extreme value indicating that something is really wrong.
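With a binary feedback, that bias-offset scheme reduces to a one-liner. The -10000 value below mirrors the example in the comment; it is illustrative, not a fixed library constant:

```python
# Sketch of the simple binary-penalty scheme: subtract a large bias from
# the reward whenever the monitor reports a violation (feedback == 1).

def penalized_reward(base_reward, violated, bias=10000.0):
    """Return the reward, offset by a large negative bias on violation."""
    return base_reward - bias if violated else base_reward
```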

In the long run the soft constraint feedback can then be used in more sophisticated ways in order to foster safe RL algorithms.


GitPascalP avatar GitPascalP commented on May 29, 2024

Given the described requirements and ideas, a new ConstraintMonitor class is implemented. In particular, a check method for the specified constraints is called by the reward function. At the moment it checks the constraints or physical limits and returns a "hard" 0 or 1 to the reward function.

In further implementations it should be convenient to add a kind of penalty function for a soft return value. It has to be discussed which further extensions could be useful, e.g. time-variant constraints (stricter penalties at the beginning, to achieve smoother starting characteristics, ...), which could be implemented in the new class as well.
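A time-variant constraint as mentioned above could, for instance, tighten the limit at the start of an episode and relax it over time. The following is a purely hypothetical sketch of that idea (parameter names and the linear ramp are assumptions, not anything implemented in the library):

```python
# Hypothetical time-variant limit: start with a tighter limit for smoother
# starting behavior and linearly relax it to the full limit over
# 'ramp_steps' environment steps (k counts the steps taken so far).

def time_variant_limit(k, start_limit=0.8, final_limit=1.0, ramp_steps=1000):
    """Linearly relax the limit from start_limit to final_limit over ramp_steps."""
    fraction = min(k / ramp_steps, 1.0)
    return start_limit + fraction * (final_limit - start_limit)
```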

