Comments (6)
Why do FOCOPS and CUP also use the Adam optimizer? Both CUP and FOCOPS are first-order optimization algorithms and depend heavily on their hyperparameters, so we believe that using Adam instead of the original SGD optimizer gives smoother training and thereby improves the algorithms' performance.
As for supporting the original implementation in the future: we are considering adding the original SGD optimizer as an option in our code, and we will share the ablation results with the community when we update the code.
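One common reason Adam tends to be less sensitive to tuning than plain SGD is that it normalizes each step by a running estimate of the gradient magnitude. The sketch below is only an illustration of that property, not the repository's code: `sgd_step`, `adam_step`, and `first_step_size` are hypothetical helpers, and the learning rate is an arbitrary example value.

```python
import math


def sgd_step(param, grad, lr):
    """Plain SGD: the step is proportional to the raw gradient."""
    return param - lr * grad


def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (standard bias-corrected form)."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def first_step_size(optimizer, grad, lr=3e-4):
    """Size of the first update from param = 0 (hypothetical helper)."""
    if optimizer == "sgd":
        return abs(sgd_step(0.0, grad, lr))
    if optimizer == "adam":
        fresh_state = {"t": 0, "m": 0.0, "v": 0.0}
        return abs(adam_step(0.0, grad, fresh_state, lr))
    raise ValueError(f"unknown optimizer: {optimizer}")


if __name__ == "__main__":
    for grad in (0.01, 100.0):
        print(grad, first_step_size("sgd", grad), first_step_size("adam", grad))
```

With SGD the first step scales with the gradient, so a badly scaled loss needs a retuned learning rate. With Adam the first step is close to `lr` for both gradients. Whether this carries over to a full FOCOPS or CUP run is exactly what the ablation mentioned above would have to show.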
from safe-policy-optimization.
In that case, can I take it that FOCOPS and CUP handle cost constraints no differently from Lagrangian methods such as PPO-Lagrangian, and that their main difference is how the actor is updated?
Yes. In the code, these three algorithms are strikingly similar; they differ only in the actor update.
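To make the point concrete, here is a rough per-sample sketch assuming the textbook forms of the losses: a shared Lagrange-multiplier update, a PPO-Lagrangian clipped actor loss, and a FOCOPS-style actor loss. All function names and default values are hypothetical and are not taken from the repository; CUP's two-stage actor update is left out for brevity.

```python
def update_multiplier(lmbda, mean_ep_cost, cost_limit, lr=0.05):
    """Shared constraint handling: gradient ascent on the multiplier, kept >= 0."""
    return max(0.0, lmbda + lr * (mean_ep_cost - cost_limit))


def ppo_lag_actor_loss(ratio, adv_r, adv_c, lmbda, clip=0.2):
    """PPO-Lagrangian: clipped surrogate on the cost-penalized advantage."""
    adv = (adv_r - lmbda * adv_c) / (1.0 + lmbda)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return -min(ratio * adv, clipped_ratio * adv)


def focops_actor_loss(ratio, kl, adv_r, adv_c, lmbda,
                      temperature=1.5, kl_bound=0.02):
    """FOCOPS-style loss: KL term minus scaled penalized advantage,
    zeroed out for samples whose KL exceeds the bound."""
    adv = adv_r - lmbda * adv_c
    mask = 1.0 if kl <= kl_bound else 0.0
    return (kl - ratio * adv / temperature) * mask


if __name__ == "__main__":
    lmbda = update_multiplier(0.0, mean_ep_cost=30.0, cost_limit=25.0)
    print(lmbda)
    print(ppo_lag_actor_loss(ratio=1.5, adv_r=1.0, adv_c=0.0, lmbda=0.0))
    print(focops_actor_loss(ratio=1.0, kl=0.05, adv_r=1.0, adv_c=0.0, lmbda=0.0))
```

Under these assumptions, both losses consume the same multiplier and the same reward and cost advantages; only the expression built from them changes.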
My last question is about TRPO-class algorithms such as TRPO, TRPO-Lagrangian, and CPO: should they update the critic networks with multiple epochs of full-batch updates or multiple epochs of mini-batch updates? Most TRPO-class implementations I have found online use multiple epochs of full-batch updates for the critic, while most PPO-class implementations use mini-batches. Your implementation uniformly uses multiple epochs of mini-batch updates for the critic. Is that because it is more effective, and because it keeps the comparison with the first-order PPO-class methods fair?
Also, I found that updating the critic of TRPO-like algorithms with multiple epochs of full-batch updates trains much faster than with multiple epochs of mini-batch updates, because the number of updates is much lower. Would it be possible to adopt full-batch critic updates in the TRPO-based algorithms?
In our implementation we followed Tianshou and Stable-Baselines, and we update the critic over multiple epochs of mini-batches. We experimented with full-batch critic updates in earlier tests, but they did not perform as well as the mini-batch approach.
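The speed difference raised in the question comes down to how many gradient steps each scheme takes per policy iteration. The sketch below only counts those steps; `critic_batches` and `count_updates` are hypothetical helpers, and the rollout size, epoch count, and batch size are made-up example values rather than the repository's defaults.

```python
import random


def critic_batches(num_samples, epochs, batch_size=None, seed=0):
    """Yield index batches for critic updates.

    batch_size=None means one full-batch update per epoch; otherwise the
    data is reshuffled each epoch and split into mini-batches.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for _ in range(epochs):
        if batch_size is None:
            yield indices[:]
        else:
            rng.shuffle(indices)
            for start in range(0, num_samples, batch_size):
                yield indices[start:start + batch_size]


def count_updates(num_samples, epochs, batch_size=None):
    """Number of critic gradient steps taken in one policy iteration."""
    return sum(1 for _ in critic_batches(num_samples, epochs, batch_size))


if __name__ == "__main__":
    print(count_updates(20000, epochs=10))                 # full batch
    print(count_updates(20000, epochs=10, batch_size=64))  # mini-batch
```

For these example values the full-batch scheme takes 10 critic steps per iteration and the mini-batch scheme takes 3130, which accounts for the shorter wall-clock time. The mini-batch scheme fits the value targets with many more (noisier) gradient steps, which may be part of why it performed better in the tests described above.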
I will verify the effectiveness of full-batch critic updates further. Thank you very much for your patient answers, which have resolved a long-standing question of mine.