
Anttwo commented on May 26, 2024

Hi JuliusQv,

Thank you for your nice words!

In general, multiplying the position-learning rate by the scale of the scene (i.e. the radius of camera positions) is a very good idea as it makes the method invariant to the general scale of the scene. This was actually the idea of the authors of the original Gaussian Splatting paper.

Let me explain.
The position learning rate controls how much your Gaussians move in the scene at each training iteration: the larger the learning rate, the harder it is for the Gaussians to make very small moves; the smaller the learning rate, the harder it is for them to make large moves.

Let's say you have two instances of the same dataset, with the exact same SfM point cloud and the same camera poses, but you multiply the scale of one of the two identical scenes by 2 (i.e., you multiply by 2 the positions of the cameras and the coordinates of the points in the SfM point cloud).
If you also multiply the learning rate of the larger scene by 2, the two scenes will be optimized in exactly the same way.
If, on the contrary, you do not multiply the position learning rate by 2, the two scenes will behave differently, and the learning rate may be too small for the larger scene.

Consequently, multiplying the learning rate by the scale of the scene makes the method much more robust, as it does not depend on the scale of the SfM (so if you use an SfM method that outputs coordinates between -1 and 1, you'll get the same results as another SfM method that outputs coordinates between -10 and 10).
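The scale-invariance argument above can be sketched in a few lines of Python. This is a minimal illustration with assumed names (`spatial_lr_scale`, the `0.00016` default), not the exact SuGaR/Gaussian Splatting code; the idea is simply that the scale factor is derived from the radius of the camera positions, so scaling the scene scales the effective learning rate with it:

```python
import numpy as np

def spatial_lr_scale(camera_centers: np.ndarray) -> float:
    """Radius of the smallest sphere centred on the mean camera
    position that contains all camera centers (Nx3 array)."""
    center = camera_centers.mean(axis=0)
    return float(np.linalg.norm(camera_centers - center, axis=1).max())

# Illustrative base learning rate (assumed default value).
position_lr_init = 0.00016

cams = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
lr_small = position_lr_init * spatial_lr_scale(cams)

# Scaling the whole scene by 2 scales the effective learning rate
# by 2 as well, so both scenes are optimized identically.
lr_large = position_lr_init * spatial_lr_scale(2.0 * cams)
```

Because `spatial_lr_scale` is linear in the coordinates, `lr_large` is exactly twice `lr_small`, which is the invariance described above.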

OK, now we are convinced that taking position_lr_init * spatial_lr_scale as the learning rate is a good idea.
But what about the value of position_lr_init?

Similarly, the value of position_lr_init has an influence on the optimization: the larger its value, the harder it is for Gaussians to fit very small structures.
The default value of position_lr_init is tuned to produce good reconstructions on standard scenes of reasonable size. Let's say that, in a standard scene, the size of an object is between 1/100th and 1/10th of the size of the full scene. Then the default value of position_lr_init is tuned to allow Gaussians to fit elements whose size lies between 1/100th and 1/10th of the size of the full scene.

On the contrary, in a very large scene like a city district, the typical size of an object may be 1/1000th of the size of the full scene. If you use the default value of position_lr_init, it then becomes impossible for the Gaussians to fit the objects of your scene correctly. Therefore, lowering the value of position_lr_init greatly improves the quality of the reconstruction.
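A back-of-the-envelope calculation makes the point concrete. The numbers below are assumed for illustration (they are not measured values from the paper): the same effective learning rate represents a much larger step *relative to object size* when objects shrink to 1/1000th of the scene:

```python
# Assumed illustrative values, not the project's actual config.
scene_radius = 100.0                 # arbitrary units
lr = 0.00016 * scene_radius          # effective position learning rate

standard_object = scene_radius / 100.0   # object ~1/100th of the scene
city_object = scene_radius / 1000.0      # object ~1/1000th of a district

# Step size relative to the size of the object being fitted:
# 10x larger for the small objects of a city-scale scene, which is
# why lowering position_lr_init helps there.
rel_standard = lr / standard_object
rel_city = lr / city_object
```

With these numbers, each optimization step is ten times coarser relative to a city-scale object than to a standard one, so the Gaussians cannot settle onto fine structures unless the learning rate is reduced.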

Question: So why don't we always use a very small learning rate, so that details can be reconstructed in any scene?

Using a small learning rate forces Gaussians to make small moves and to stay small, so you will generally need more Gaussians to reconstruct your scene. For standard scenes, you will end up with too many unnecessary Gaussians, which is very inefficient in memory. On the contrary, when reconstructing a large scene like a city district, you generally have many more SfM points and want many more Gaussians in the scene.

I hope this message helps you!
Thanks!

JuliusQv commented on May 26, 2024

Thank you for your detailed explanation.
I have figured out this problem.
