Comments (6)
As I understand it, the current implementation provides:
- a reward for "normal control"
- a violation reward for exceeding the limits (absolute value) given by the physical system
- the violation reward can be changed with the gamma parameter
- all of this lives in the Reward / RewardFunction class
The idea now is:
- a new monitor class that checks, on behalf of the reward function, whether a violation occurs, so that the reward can react to it
- defining limits in the monitor: not just the physical system limits, but also newly defined limits (maybe user-defined, if that works with the motor)
- extending the two cases (normal reward; violation => very high reward, termination) with a soft case: violation => higher reward than usual, but no termination
I think it is currently possible to simply override the violation reward with an external function, to get any kind of violation reward (e.g. constant, ...). So a further extension could be to let the user choose the kind of reward more easily with an argument in the make() statement (if this topic is part of this issue).
Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.
from gym-electric-motor.
> a new monitor class that checks, on behalf of the reward function, whether a violation occurs, so that the reward can react to it
Yes, that is the intention behind this issue.
> defining limits in the monitor: not just the physical system limits, but also newly defined limits (maybe user-defined, if that works with the motor)
I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0..1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own constraints.
> I think it is currently possible to simply override the violation reward with an external function, to get any kind of violation reward (e.g. constant, ...). So a further extension could be to let the user choose the kind of reward more easily with an argument in the make() statement (if this topic is part of this issue).
I think this is not necessarily part of this issue.
> Regarding the distribution of tasks, I think it is a matter of defining what the monitor has to do. Maybe the monitor should really only monitor the states (with respect to the defined/used limits) and deliver all necessary information to the reward function, which can then calculate the reward.
Yes, the RewardFunction is responsible for the reward and the ConstraintMonitor for the constraints. As already mentioned, a single value in [0..1] could be sufficient to pass to the RewardFunction.
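To make the "list of constraint functions" idea concrete, here is a minimal sketch. All names (`ConstraintMonitorSketch`, `check_constraints`, `limit_constraint`, `current_circle`) are hypothetical and chosen only for illustration; this is not the gym-electric-motor API. It assumes each constraint takes the state and returns True/False or a value in [0..1], and that the monitor passes the worst value on to the reward function.

```python
from typing import Callable, Sequence

# Hypothetical type: a constraint maps a state to a violation degree in [0, 1].
Constraint = Callable[[Sequence[float]], float]


def limit_constraint(limits: Sequence[float]) -> Constraint:
    """Predefined constraint: 1.0 if any |state_i| exceeds its limit, else 0.0."""
    def check(state: Sequence[float]) -> float:
        return float(any(abs(s) > lim for s, lim in zip(state, limits)))
    return check


class ConstraintMonitorSketch:
    """Calls every constraint and reports the worst violation, clipped to [0, 1]."""

    def __init__(self, constraints: Sequence[Constraint]):
        self._constraints = list(constraints)

    def check_constraints(self, state: Sequence[float]) -> float:
        if not self._constraints:
            return 0.0
        worst = max(float(c(state)) for c in self._constraints)
        return min(max(worst, 0.0), 1.0)


def current_circle(state: Sequence[float]) -> float:
    """User-defined constraint combining two states (assumed normalized currents)."""
    return float((state[0] ** 2 + state[1] ** 2) ** 0.5 > 1.0)


monitor = ConstraintMonitorSketch([limit_constraint([1.0, 1.0, 1.0]), current_circle])
print(monitor.check_constraints([0.1, 0.2, 0.3]))
print(monitor.check_constraints([0.8, 0.8, 0.0]))
```

The second state stays inside every individual limit but still violates the combined constraint, which is the kind of case fixed per-state limits could not express.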
> I don't know if we need "newly defined limits". Basically, it is just checking the system's state for undesired values / value combinations. Fixed limits, as I understand them, would again restrict the ConstraintMonitor in its usage. In my opinion, the ConstraintMonitor could call a list of functions (the constraints) that take the system's state and return True/False, or a value in [0..1] if you want to consider soft constraints. The user can then choose predefined constraints (like a simple limit observation) or specify their own constraints.
I totally agree with the proposed way of passing the states (and potentially also the latest action) to an arbitrary list of functions. This would give us a lot of flexibility in how we define constraints, i.e. we could implement box/linear/nonlinear constraints on single states or on combinations of multiple states. Very nice.
I also agree with the soft constraint idea: let's use [0..1] as a scalar feedback from the ConstraintMonitor, where 1 equals a full hard constraint violation (leading to termination of the episode, as currently implemented) and values above 0 but below 1 are considered a tolerable / near-limit violation that does not terminate the episode directly.
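One possible reading of this [0..1] feedback, as a rough sketch: a linear ramp between a soft limit and the hard limit. The function names and the choice of a linear ramp are assumptions for illustration only; nothing like this exists in the code yet.

```python
def soft_limit_violation(value: float, soft_limit: float, hard_limit: float) -> float:
    """Violation degree in [0, 1] for one state.

    0.0 while |value| is at or below soft_limit, rising linearly to 1.0
    at hard_limit (assumption: a linear ramp; any monotone shape would do).
    """
    if not 0.0 <= soft_limit < hard_limit:
        raise ValueError("expected 0 <= soft_limit < hard_limit")
    excess = abs(value) - soft_limit
    return min(max(excess / (hard_limit - soft_limit), 0.0), 1.0)


def episode_terminates(violation_degree: float) -> bool:
    """Only a full (hard) violation ends the episode."""
    return violation_degree >= 1.0


for current in (0.5, 0.9, -1.2):
    degree = soft_limit_violation(current, soft_limit=0.8, hard_limit=1.0)
    print(current, round(degree, 3), episode_terminates(degree))
```

Values between the two limits give a degree strictly between 0 and 1, so the reward can react without the episode being terminated.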
As I understand it, the scalar feedback can then be used by the reward function as a scaling factor (or as an exponent, like reward_power).
To obtain the scalar feedback, the amount of limit violation (independent of the defined limit functions or of which state is violated) can be evaluated by a penalty function (in the style of soft-constraint optimization) that can be tuned via a user parameter. The question is how complicated the penalty calculation should be. In any case, I think it is important that the user can define the strictness of the penalty.
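As a rough sketch of what such a tunable penalty could look like: the violation degree is turned into a blending weight between the normal reward and the violation reward, and a user parameter controls how quickly the weight grows. The names `shaped_reward` and `strictness` and the power-law form are my assumptions, just one simple option among many.

```python
def shaped_reward(base_reward: float,
                  violation_degree: float,
                  strictness: float = 1.0,
                  violation_reward: float = -1.0) -> float:
    """Blend the normal reward with the violation reward.

    violation_degree is the scalar feedback in [0, 1]. strictness > 0 is a
    hypothetical user parameter: larger values penalize small violations
    more strongly (weight = degree ** (1 / strictness)).
    """
    if not 0.0 <= violation_degree <= 1.0:
        raise ValueError("violation_degree must lie in [0, 1]")
    if strictness <= 0.0:
        raise ValueError("strictness must be positive")
    weight = violation_degree ** (1.0 / strictness)
    return (1.0 - weight) * base_reward + weight * violation_reward


# Same state, same violation degree, two different strictness settings.
print(shaped_reward(-0.2, 0.25, strictness=1.0))
print(shaped_reward(-0.2, 0.25, strictness=2.0))
```

With a degree of 0 the normal reward is returned unchanged, and with a degree of 1 the full violation reward is returned, so the hard case stays a special case of the soft one.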
The penalty calculation can be kept fairly simple for now. Since we will not implement a soft constraint feedback yet (i.e. the scalar feedback will be binary, 0 or 1, until the code changes further), the penalty could be just a bias offset on the reward, e.g. -10000 or any other extreme value indicating that something is really wrong.
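In code, the simple variant amounts to something like the following sketch (the function name and default are placeholders, not existing code):

```python
def reward_with_penalty(base_reward: float, violated: int, penalty: float = -10000.0) -> float:
    """Add a fixed bias offset to the reward when the binary feedback is 1."""
    if violated not in (0, 1):
        raise ValueError("binary feedback expected: 0 or 1")
    return base_reward + penalty if violated else base_reward


print(reward_with_penalty(-0.3, 0))
print(reward_with_penalty(-0.3, 1))
```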
In the long run the soft constraint feedback can then be used in more sophisticated ways in order to foster safe RL algorithms.
Given the requirements and ideas described above, a new class ConstraintMonitor has been implemented. In particular, the reward function calls the class's check method for the specified constraints. At the moment it checks the constraints or physical limits and returns a "hard" 0 or 1 to the reward function.
In further implementations it would be convenient to add some kind of penalty function for a soft return value. It still has to be discussed which further extensions would be useful, e.g. time-variant constraints (harsher punishment at the beginning, to achieve smoother starting characteristics, ...), which could also be implemented in the new class.
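For readers following along, the hard-check flow (the reward function asks the monitor and gets 0 or 1 back) looks roughly like the sketch below. This is a simplified illustration with made-up class and method names and an assumed absolute-error reward, not the actual implementation in the repository.

```python
from typing import Sequence, Tuple


class LimitMonitor:
    """Hypothetical stand-in for the monitor: 1 on a limit violation, else 0."""

    def __init__(self, limits: Sequence[float]):
        self._limits = list(limits)

    def check(self, state: Sequence[float]) -> int:
        return int(any(abs(s) > lim for s, lim in zip(state, self._limits)))


class RewardFunctionSketch:
    """Asks the monitor first; computes the normal reward only if nothing is violated."""

    def __init__(self, monitor: LimitMonitor, violation_reward: float = -1.0):
        self._monitor = monitor
        self._violation_reward = violation_reward

    def reward(self, state: Sequence[float], reference: Sequence[float]) -> Tuple[float, bool]:
        if self._monitor.check(state):
            return self._violation_reward, True  # hard violation: episode ends
        # Assumed normal reward: negative mean absolute control error.
        error = sum(abs(r - s) for s, r in zip(state, reference)) / len(state)
        return -error, False


reward_function = RewardFunctionSketch(LimitMonitor([1.0, 1.0]))
print(reward_function.reward([0.4, 0.0], [0.5, 0.0]))
print(reward_function.reward([1.2, 0.0], [0.5, 0.0]))
```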