Comments (7)
Hi @alexbeloi, the step size is computed according to the TRPO paper: https://arxiv.org/pdf/1502.05477v4.pdf. You can find the formula in Appendix C.
How negative is the computed value of `descent_direction.dot(Hx(descent_direction))`, and can you describe more about your setup? This could happen if a bug in the code makes the mean KL nonzero (or not sufficiently close to zero) before taking the step. We've also observed it happen sometimes with recurrent networks, although adjusting the nonlinearity seems to have solved it.
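For reference, here is a minimal NumPy sketch of the step-size formula from Appendix C of the TRPO paper, beta = sqrt(2 * delta / (s^T H s)). It is an illustration under stated assumptions, not the rllab optimizer itself: the function name `initial_step_size`, the `hvp` callable, and the toy matrix `H` are all hypothetical. It shows why a negative curvature term `s.dot(Hx(s))` leads to NaN: the square root receives a negative argument.

```python
import numpy as np


def initial_step_size(descent_direction, hvp, max_kl, eps=1e-8):
    """Largest step along descent_direction allowed by the quadratic KL model.

    hvp is a callable returning H @ x, where H approximates the KL Hessian.
    (Hypothetical helper for illustration; not part of rllab.)
    """
    curvature = descent_direction.dot(hvp(descent_direction))
    if curvature <= 0:
        # np.sqrt of a negative number would silently return NaN,
        # so fail loudly and report how negative the value is.
        raise ValueError("non-positive curvature: %g" % curvature)
    return np.sqrt(2.0 * max_kl / (curvature + eps))


# Toy positive-definite stand-in for the KL Hessian (assumed values).
H = np.array([[2.0, 0.0], [0.0, 0.5]])
s = np.array([1.0, 2.0])

beta = initial_step_size(s, lambda x: H @ x, max_kl=0.01)
# Under the quadratic model, KL after the step is 0.5 * beta**2 * s^T H s,
# which should land on the max_kl budget.
predicted_kl = 0.5 * beta ** 2 * s.dot(H @ s)
print(beta, predicted_kl)
```

Raising on non-positive curvature is a design choice for debugging; it surfaces the bad value at its source instead of letting NaN propagate into the line search.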
from rllab.
Hi @dementrock, it appears that mean KL is nonzero before taking the step because of something I'm doing. This issue came up when debugging the ISSampler with TRPO.
What I'm doing is taking (off-policy) stored paths, computing the `agent_infos` for those paths with respect to the current policy using `_, agent_infos = policy.get_action(observations)`, and then passing those `agent_infos` to `old_dist_info_vars_list` in the optimizer.
What I expected was that the on-policy `agent_infos` that I computed would be identical to the `dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars)` evaluated by the optimizer before taking the step, so `kl = dist.kl_sym(old_dist_info_vars, dist_info_vars)` would be zero before the step, but this isn't the case.
Is there a difference between the `agent_infos` computed from `_, agent_infos = policy.get_action(observations)` and the evaluation of `dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars)` for `obs_var` evaluated at `observations`?
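The expectation above can be checked numerically. The following is a rough sketch assuming a diagonal Gaussian policy; the function `gaussian_kl` and all array values are hypothetical stand-ins, not rllab calls. It shows that the mean KL is exactly zero when the "old" distribution parameters are recomputed from the current policy, and becomes positive as soon as stale parameters are passed in instead.

```python
import numpy as np


def gaussian_kl(old_mean, old_log_std, new_mean, new_log_std):
    """KL(old || new) for diagonal Gaussians, summed over action dimensions."""
    old_var = np.exp(2.0 * old_log_std)
    new_var = np.exp(2.0 * new_log_std)
    numerator = (old_mean - new_mean) ** 2 + old_var - new_var
    return np.sum(numerator / (2.0 * new_var) + new_log_std - old_log_std, axis=-1)


rng = np.random.default_rng(0)
# Assumed batch: 5 observations, 2 action dimensions, unit standard deviation.
current_mean = rng.normal(size=(5, 2))
current_log_std = np.zeros((5, 2))

# "Old" parameters recomputed from the current policy: KL should be zero.
kl_fresh = gaussian_kl(current_mean, current_log_std,
                       current_mean, current_log_std).mean()

# "Old" parameters left over from a different policy (mean shifted by 0.1).
stale_mean = current_mean + 0.1
kl_stale = gaussian_kl(stale_mean, current_log_std,
                       current_mean, current_log_std).mean()

print(kl_fresh, kl_stale)
```

A check like this, run on the values actually fed to the optimizer, is one way to confirm whether the mean KL starts at zero before the step.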
I feel there is some confusion on my part. Where does the NPO algorithm get values for `old_dist_info_vars` and `dist_info_vars` from?
Oh wow, super silly bug on my part. The last line of `is_sampler.py` should be `return samples`, not `return paths`. This was the root of the issue.
@alexbeloi Re the difference between `agent_infos` and evaluating `dist_info_vars`: `agent_infos` may contain more entries than `dist_info_vars`, but for the common keys their values should be the same. Otherwise there is a bug somewhere.
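One way to test the "common keys should match" claim is sketched below. The helper `compare_common_keys` and the dictionary contents (including the key names `mean`, `log_std`, and `prev_action`) are assumptions made for illustration, not values taken from rllab.

```python
import numpy as np


def compare_common_keys(agent_infos, dist_info, atol=1e-6):
    """Return {key: bool} saying whether values agree for keys in both dicts.

    Keys that appear in only one of the two dicts are ignored.
    (Hypothetical helper for illustration; not part of rllab.)
    """
    common = sorted(set(agent_infos) & set(dist_info))
    return {
        key: bool(np.allclose(agent_infos[key], dist_info[key], atol=atol))
        for key in common
    }


# Assumed example data: agent_infos carries one extra entry.
agent_infos = {
    "mean": np.array([[0.1, -0.2], [0.3, 0.4]]),
    "log_std": np.zeros((2, 2)),
    "prev_action": np.zeros((2, 2)),
}
dist_info = {
    "mean": np.array([[0.1, -0.2], [0.3, 0.4]]),
    "log_std": np.zeros((2, 2)),
}

report = compare_common_keys(agent_infos, dist_info)
print(report)

# A mismatch on a shared key points to a bug somewhere upstream.
dist_info_bad = dict(dist_info, mean=dist_info["mean"] + 0.5)
report_bad = compare_common_keys(agent_infos, dist_info_bad)
print(report_bad)
```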
Does replacing `return paths` with `return samples` solve the NaN issue?
@dementrock yes, that one-line fix solves the NaN issue. I made a pull request with the patch and a (now working) example of TRPO with ISSampler.
Awesome, thanks!