
Comments (6)

GitPascalP avatar GitPascalP commented on May 29, 2024

As I understand it, the current implementation provides:

  • a reward for "normal control"

  • a violation reward for exceeding the limits (absolute value) given by the physical system

  • a violation reward that can be adjusted via the gamma parameter

       - all in the Reward / RewardFunction class

The idea is now:

  • a new Monitor class that checks for violations on behalf of the reward function, so the reward can react to them
  • defining limits in the monitor: not just physical system limits, but also newly defined limits (possibly user-defined, if that works with the motor)
  • extending the two existing cases (normal reward; violation => very high reward and termination) with a soft case:
    violation => higher reward than usual, but no termination.
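The three cases above can be sketched as a small classification helper. This is a minimal, hypothetical illustration of the proposed behavior; the function name and signature are not part of the actual gym-electric-motor API:

```python
# Hypothetical sketch: classify a scalar state into the three proposed
# cases - normal control, soft violation (no termination), and hard
# violation (terminate with a large violation reward).

def check_state(state, soft_limit, hard_limit):
    """Classify a scalar state against soft and hard absolute limits."""
    magnitude = abs(state)
    if magnitude > hard_limit:
        return "hard"   # terminate the episode, large violation reward
    if magnitude > soft_limit:
        return "soft"   # higher reward than usual, but no termination
    return "ok"         # normal control reward
```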

I think it is currently possible to simply override the violation reward with an external function, so any violation reward (e.g. a constant) can be used. An extension here could be to let the user choose the kind of reward more easily via an argument in the make() statement (if this topic is included in this issue).

Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.

from gym-electric-motor.

atra94 avatar atra94 commented on May 29, 2024

a new Monitor class that checks for violations on behalf of the reward function, so the reward can react to them

Yes, that is the intention behind this issue.

defining limits in the monitor: not just physical system limits, but also newly defined limits (possibly user-defined, if that works with the motor)

I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0, 1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own.
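The proposed design could look roughly like the following sketch: a monitor that calls a list of constraint functions on the system state and reports the worst violation in [0, 1]. All names here are illustrative, not the actual gym-electric-motor implementation:

```python
# Illustrative sketch of the proposed ConstraintMonitor: it calls a list
# of user-supplied constraint functions on the system state and combines
# their outputs. Each constraint returns a float in [0, 1], where 0.0
# means "no violation" and 1.0 means "hard violation".

class ConstraintMonitor:
    def __init__(self, constraints):
        self._constraints = list(constraints)

    def check(self, state):
        """Return the worst (largest) violation value over all constraints."""
        if not self._constraints:
            return 0.0
        return max(constraint(state) for constraint in self._constraints)


def absolute_limit_constraint(index, limit):
    """Predefined example constraint: hard 0/1 check of one state entry."""
    def constraint(state):
        return 1.0 if abs(state[index]) > limit else 0.0
    return constraint
```

A user could then pass either such predefined constraints or arbitrary custom functions into the monitor.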

I think it is currently possible to simply override the violation reward with an external function, so any violation reward (e.g. a constant) can be used. An extension here could be to let the user choose the kind of reward more easily via an argument in the make() statement (if this topic is included in this issue).

I think this is not necessarily part of this issue.

Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.

Yes, the RewardFunction is responsible for the reward and the ConstraintMonitor for the constraints. As already mentioned, a single value in [0, 1] could be sufficient to pass to the RewardFunction.


wallscheid avatar wallscheid commented on May 29, 2024

I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0, 1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own.

I totally agree with the proposed way of passing the states (and potentially also the latest action) to an arbitrary list of functions. This would enable a lot of flexibility in how we can define constraints, i.e. we could implement box/linear/nonlinear constraints on individual states or on combinations of multiple states. Very nice.

I also agree with the SoftConstraint idea: let's use a scalar feedback in [0, 1] from the ConstraintMonitor, where 1 equals a full hard constraint violation (leading to a termination of the episode, as currently implemented) and values above 0 but below 1 are considered a tolerable / near-limit violation that does not terminate the episode directly.
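That interpretation of the scalar feedback could be sketched as a small helper on the reward side (hypothetical code; the function name and return convention are illustrative assumptions):

```python
# Sketch: map the ConstraintMonitor's scalar feedback in [0, 1] to a
# (terminate, penalty_weight) pair, following the convention proposed
# above - 1.0 means a hard violation that ends the episode, values in
# (0, 1) are tolerable violations that are only penalized.

def interpret_violation(violation_degree):
    """Map the monitor's scalar feedback to a (terminate, penalty_weight) pair."""
    if violation_degree >= 1.0:
        return True, 1.0                 # hard violation: terminate the episode
    if violation_degree > 0.0:
        return False, violation_degree   # tolerable violation: penalize only
    return False, 0.0                    # no violation
```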


GitPascalP avatar GitPascalP commented on May 29, 2024

In my understanding, the scalar feedback can then be used by the reward function as a scaling factor (or an exponent, like reward_power).
To obtain the scalar feedback, the amount of limit violation (independent of the defined limitation functions or of which state is violated) can be evaluated by a penalty function (in the style of soft-constraint optimization) that can be tuned by a user parameter. The question is how complicated the penalty calculation should be. I think it is important in any case that the user can define the strictness of the penalty.
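One simple way to give the user such a tuning knob is a power law on the violation degree. This is only a sketch of the idea; the `strictness` parameter is a hypothetical name, not an existing option:

```python
# Hypothetical penalty function: the violation degree lies in [0, 1], so
# raising it to a power 'strictness' > 1 makes the penalty small for mild
# violations and steep near a full violation; strictness < 1 punishes
# even mild violations strongly.

def soft_penalty(violation_degree, strictness=2.0):
    """Penalty in [0, 1], tunable via the user-defined strictness parameter."""
    return violation_degree ** strictness
```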


wallscheid avatar wallscheid commented on May 29, 2024

The penalty calculation can be kept more or less simple for the moment. Since we will not implement a soft constraint feedback yet (i.e. the scalar feedback will be binary, 0 or 1, until the code is changed further), the penalty could just be a bias offset to the reward, e.g. -10000 or any other extreme value indicating that something is really wrong.
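With a binary feedback, that bias-offset scheme reduces to a one-liner. The -10000 value below mirrors the example in the comment; it is illustrative, not a fixed library constant:

```python
# Sketch of the simple binary-penalty scheme: subtract a large bias from
# the reward whenever the monitor reports a violation (feedback == 1).

def penalized_reward(base_reward, violated, bias=10000.0):
    """Return the reward, offset by a large negative bias on violation."""
    return base_reward - bias if violated else base_reward
```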

In the long run the soft constraint feedback can then be used in more sophisticated ways in order to foster safe RL algorithms.


GitPascalP avatar GitPascalP commented on May 29, 2024

Given the described requirements and ideas, a new ConstraintMonitor class is implemented. In particular, a check method for the specified constraints is called by the reward function. At the moment it checks the constraints or physical limits and returns a "hard" 0 or 1 to the reward function.

In further implementations it should be convenient to add a kind of penalty function for a soft return value. It has to be discussed which further extensions could be useful, e.g. time-variant constraints (stricter penalties at the beginning, to achieve smoother starting characteristics, ...), which could be implemented in the new class as well.
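A time-variant constraint as mentioned above could, for instance, tighten the limit at the start of an episode and relax it over time. The following is a purely hypothetical sketch of that idea (parameter names and the linear ramp are assumptions, not anything implemented in the library):

```python
# Hypothetical time-variant limit: start with a tighter limit for smoother
# starting behavior and linearly relax it to the full limit over
# 'ramp_steps' environment steps (k counts the steps taken so far).

def time_variant_limit(k, start_limit=0.8, final_limit=1.0, ramp_steps=1000):
    """Linearly relax the limit from start_limit to final_limit over ramp_steps."""
    fraction = min(k / ramp_steps, 1.0)
    return start_limit + fraction * (final_limit - start_limit)
```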

