Comments (6)

Gaiejj commented on June 9, 2024

Why do FOCOPS and CUP also use the Adam optimizer? Both CUP and FOCOPS are first-order optimization algorithms and depend heavily on hyperparameters, so we believe that using Adam instead of the original SGD optimizer gives smoother training and thereby improves the algorithms' performance.

As for supporting the original implementation in the future: we are considering adding the original SGD optimizer as an option in our code, and we will share the ablation results with the community when we update the code.

from safe-policy-optimization.

lijie9527 commented on June 9, 2024

In that case, can I conclude that FOCOPS and CUP handle cost constraints in the same way as Lagrangian methods such as PPO-Lagrangian, and that the main difference is how the actor is updated?

Gaiejj commented on June 9, 2024

Yes. In the code, these three algorithms are very similar; the only difference is the actor-update step.
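As a rough illustration of the shared constraint handling discussed here, a Lagrangian method keeps a non-negative multiplier and raises it when the measured cost exceeds the limit. The sketch below is not the repository's code: the function name, the plain gradient-ascent step, and the learning rate are assumptions for illustration only (an actual implementation may, for example, update the multiplier with an optimizer such as Adam).

```python
def update_lagrange_multiplier(multiplier, mean_episode_cost, cost_limit, lr=0.05):
    """Hypothetical helper: one projected gradient-ascent step on the multiplier.

    The multiplier grows when the mean episode cost exceeds the limit,
    shrinks when the policy is within the limit, and is clipped at zero.
    """
    violation = mean_episode_cost - cost_limit
    return max(0.0, multiplier + lr * violation)


# Cost above the limit: the multiplier increases.
raised = update_lagrange_multiplier(0.0, mean_episode_cost=30.0, cost_limit=25.0, lr=0.1)

# Cost below the limit: the multiplier decreases and is clipped at zero.
lowered = update_lagrange_multiplier(0.2, mean_episode_cost=20.0, cost_limit=25.0, lr=0.1)

print(raised, lowered)
```

Under this view, the algorithms would share the multiplier update and differ only in how the actor loss uses it.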

lijie9527 commented on June 9, 2024

My last question is about TRPO-class algorithms such as TRPO, TRPO-Lagrangian, and CPO: should they update the critic networks with multiple epochs of full-batch updates or multiple epochs of mini-batch updates? Most TRPO-class implementations I have found online use multiple epochs of full-batch updates for the critic, while most PPO-class implementations use mini-batches. Your implementation uniformly uses multiple epochs of mini-batch updates for the critic. Is that because it is more effective, and because it keeps the comparison with the first-order PPO-class methods fair?

Also, I found that when the critic of TRPO-like algorithms is updated with multiple epochs of full-batch updates, training is much faster than with multiple epochs of mini-batch updates, because the number of updates is much lower. Would it be possible to adopt full-batch critic updates in the TRPO-based algorithms?
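To make the difference in update counts concrete, here is a minimal sketch. It is not taken from the repository; the function name and the sample, epoch, and mini-batch sizes are assumptions chosen only for illustration.

```python
import math


def num_critic_updates(num_samples, epochs, minibatch_size=None):
    """Hypothetical helper: count critic gradient steps in one policy iteration.

    minibatch_size=None means full-batch training, i.e. one step per epoch.
    Otherwise each epoch takes one step per mini-batch (last batch may be partial).
    """
    if minibatch_size is None:
        return epochs
    return epochs * math.ceil(num_samples / minibatch_size)


# Illustrative numbers only: 20000 collected samples, 10 critic epochs.
full_batch_steps = num_critic_updates(20000, epochs=10)
mini_batch_steps = num_critic_updates(20000, epochs=10, minibatch_size=64)

print(full_batch_steps, mini_batch_steps)
```

With these assumed numbers, the mini-batch schedule performs a few hundred times more gradient steps per iteration, which is consistent with the training-time gap described above.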

Gaiejj commented on June 9, 2024

In our implementation, we referred to Tianshou and Stable-Baselines and update the critic over multiple epochs of mini-batches. We tried full-batch critic updates in earlier tests, but they did not perform as well as the mini-batch approach.

lijie9527 commented on June 9, 2024

I will test the effectiveness of full-batch critic updates further. Thank you very much for your patient answers, which have resolved a long-standing question of mine.
