
Log prob Computation (ddpo issue, closed)

jannerm commented on August 23, 2024
Log prob Computation

from ddpo.

Comments (2)

kvablack commented on August 23, 2024

Excellent question! I actually implemented it as `jnp.sum` first, and immediately got NaNs after the first training step. I quickly realized this is due to the unusually high dimensionality of our action space. Typical applications of PPO (e.g. for control) have $N < 100$, whereas our actions are entire latent images, so $N = 64 \times 64 \times 4 = 16{,}384$. Since the mean is just the sum divided by $N$, the ratio under `jnp.sum` is the ratio under `jnp.mean` raised to the power $N$. That means if our current `jnp.mean` implementation produces a ratio of just $r = 1.01$ for a given action, switching to `jnp.sum` would produce a ratio of $1.01^{16{,}384} \approx 6.33 \times 10^{70}$ (way above the maximum value for float32, about $3.4 \times 10^{38}$). It also means our "true" clip range is actually much larger than it looks. If currently we clip the ratio to $(0.9999, 1.0001)$, we're clipping the "true" ratio to $(0.9999^{16{,}384}, 1.0001^{16{,}384}) \approx (0.19, 5.15)$.
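
As a rough check of the arithmetic above, here is a minimal sketch in plain Python. It is not code from the ddpo repository: the helper name `sum_ratio_from_mean_ratio` and the constant `FLOAT32_MAX` are made up for illustration. It only uses the fact that averaging the per-dimension log-prob differences divides the log-ratio by $N$, so the sum-based ratio is the mean-based ratio raised to the power $N$.

```python
import math

# Dimensionality of one action (a 64 x 64 x 4 latent image), as in the comment above.
N = 64 * 64 * 4

# Largest finite float32 value, used here only as a reference threshold.
FLOAT32_MAX = 3.4028235e38


def sum_ratio_from_mean_ratio(mean_ratio: float, n: int = N) -> float:
    """Ratio implied by summing per-dimension log-prob differences,
    given the ratio obtained by averaging them over n dimensions.

    Hypothetical helper: mean-based log-ratio = (sum-based log-ratio) / n,
    so sum-based ratio = mean_ratio ** n. Computed in log space (float64).
    """
    return math.exp(n * math.log(mean_ratio))


# A mean-based ratio of 1.01 corresponds to an enormous sum-based ratio.
true_ratio = sum_ratio_from_mean_ratio(1.01)
overflows_float32 = true_ratio > FLOAT32_MAX

# Clipping the mean-based ratio to (0.9999, 1.0001) corresponds to a much
# wider clip range on the sum-based ("true") ratio.
clip_low = sum_ratio_from_mean_ratio(0.9999)
clip_high = sum_ratio_from_mean_ratio(1.0001)

print(f"N = {N}")
print(f"sum-based ratio for r = 1.01: {true_ratio:.3e}")
print(f"exceeds float32 max: {overflows_float32}")
print(f"implied clip range: ({clip_low:.2f}, {clip_high:.2f})")
```

The calculation is done in float64 on purpose, so the huge ratio can be represented and compared against the float32 limit. In float32 the same value would overflow to infinity, which is consistent with the NaNs described above.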

However, I think this is about more than numerical stability. Even if our numerical format could support arbitrarily large (or small) density ratios, that really doesn't seem like what we would want. Unfortunately, I can't really give a better intuitive or theoretical justification than this, but it seems like it makes sense to clip based on the "average" change per pixel rather than the total change of the image. It might also have to do with the fact that we model the policy as an isotropic Gaussian, where every pixel is independent, but in reality policy updates are very highly correlated across pixels. FWIW this has probably been studied by somebody in the past, I'm just not familiar with the relevant literature.


anschen1994 commented on August 23, 2024

Clear explanation~ Thanks!

