Comments (4)
Greetings.
I would say that the "resampled actions from this part (right hand side)" you mentioned instead correspond to the following:
curr_actions_tensor, curr_log_pis = self._get_policy_actions(obs, num_actions=self.num_random, network=self.policy)
# skipped lines
q1_curr_actions = self._get_tensor_values(obs, curr_actions_tensor, network=self.qf1)
q2_curr_actions = self._get_tensor_values(obs, curr_actions_tensor, network=self.qf2)
So it should be current state with new actions.
That would correspond to the lines highlighted above I think.
Also, from the equation in your comment, the log-sum-exp is computed over multiple actions {a_i}, i \in {1, ..., num_actions}, sampled for a specific state s.
Therefore, if we were to follow that equation rigorously and compute new_curr_actions_tensor using next_obs, the log-sum-exp should also be taken with respect to those next_obs, I think.
Nevertheless, it would still "work", since the goal of the CQL objective is to minimize the Q-values for known states but "out of distribution" actions. Namely, new_curr_actions_tensor would indeed be "out of distribution" with respect to the states next_obs.
from cql.
I think this is OK, actually.
The name "q1_next_actions" is perhaps confusing, but it seems to refer to the resampled actions from this part (right-hand side) of the estimate of the log-sum-exp term (from Appendix F in the paper):
So it should be current state with new actions.
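For readers without the paper at hand, the kind of estimate being discussed can be sketched with the standard importance-sampling identity. This is written from that identity rather than copied from Appendix F, so the notation (in particular N, the assumed number of samples per source) may differ from the paper:

```latex
\log \sum_{a} \exp Q(s, a)
  \;\approx\;
  \log \left(
    \frac{1}{2N} \sum_{a_i \sim \mathrm{Unif}(a)}
      \frac{\exp Q(s, a_i)}{\mathrm{Unif}(a_i)}
    \;+\;
    \frac{1}{2N} \sum_{a_i \sim \pi(a \mid s)}
      \frac{\exp Q(s, a_i)}{\pi(a_i \mid s)}
  \right)
```

In both sums the Q-function is evaluated at the current state s; only the distribution the actions a_i are drawn from changes, and each term is divided by the density it was sampled from.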
Sorry for the late reply. It is mathematically correct, since it is just a third set of action samples passed in for computing the log-sum-exp. In this code version, the log-sum-exp is computed using three terms:
- Actions from the current policy. This one is clear, I guess.
- Uniform actions.
- Actions from the policy at the next state. Note that we can still use these next actions with the current state, since they are just action samples given to us. If we know the probabilities with which these actions were sampled, namely \pi(next_actions|next_obs), then the Q-function term should be Q(curr_state, next_action) - \log \pi(next_action|next_obs). This is fine: we sampled the next actions from the policy at the next state, but we are only using these samples to compute the log-sum-exp of the Q-function.
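The argument above can be checked numerically on a toy problem. The following is a minimal sketch, not the repository's code: it assumes a one-dimensional action in [-1, 1], a made-up Q-function for a single fixed current state, and a hypothetical policy family with linear density p(a) = (1 + k a) / 2, where the slope k stands in for the dependence on the state (one k for the current state, another for the next state). All function names are illustrative.

```python
import math
import random


def q_value(action):
    """Toy Q(s, a) for one fixed current state s (hypothetical)."""
    return -action ** 2


def sample_linear(k, rng):
    """Sample a in [-1, 1] from the density p(a) = (1 + k*a) / 2, 0 < |k| < 1.

    Uses the inverse CDF of that density.
    """
    u = rng.random()
    return (-1.0 + math.sqrt((1.0 - k) ** 2 + 4.0 * k * u)) / k


def log_prob_linear(k, action):
    """Log-density of the toy policy with slope k at the given action."""
    return math.log((1.0 + k * action) / 2.0)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_logsumexp_estimate(n, rng, k_curr=0.5, k_next=-0.7):
    """Estimate log of the integral of exp(Q(s, a)) over a, from three sources.

    Every term is Q(s, a_i) - log p(a_i), where p is the density the
    sample a_i was actually drawn from. Q always uses the current state.
    """
    terms = []
    for _ in range(n):
        # 1) Uniform actions on [-1, 1], density 1/2.
        a = rng.uniform(-1.0, 1.0)
        terms.append(q_value(a) - math.log(0.5))
        # 2) Actions from the policy at the current state.
        a = sample_linear(k_curr, rng)
        terms.append(q_value(a) - log_prob_linear(k_curr, a))
        # 3) Actions from the policy at the next state, still scored by
        #    Q at the current state, corrected by log pi(a | next state).
        a = sample_linear(k_next, rng)
        terms.append(q_value(a) - log_prob_linear(k_next, a))
    # Average the importance weights over all samples, in log space.
    return logsumexp(terms) - math.log(len(terms))


rng = random.Random(0)
estimate = cql_logsumexp_estimate(20000, rng)
# Closed form for this toy Q: integral of exp(-a^2) over [-1, 1].
exact = math.log(math.sqrt(math.pi) * math.erf(1.0))
print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

The estimate lands close to the closed-form value even though a third of the samples come from the next-state policy. What matters is that each sample is divided by the density it was drawn from, not which state that density was conditioned on.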
Thanks for the answer.