
Comments (14)

zeux avatar zeux commented on August 19, 2024 9

Since I ended up looking into this a little bit, I'll share my findings in hopes that it will help.

Measured by clicking "Reimport" on the scene in an otherwise empty project, --verbose says import took 276 seconds (that's a little under 5 minutes).
Note that the scene has ~800 meshes that add up to ~39.3M triangles (~50k each, looks reasonably uniformly distributed). Overall I would have expected one mesh per scene here, but I'm not familiar with how Godot workflows work, and it's a good stress test regardless.

perf profile on Linux, editor build with default settings plus -fno-omit-frame-pointer. Please note that the timings add up to 45% (perf doesn't normalize them):

image

Renormalizing the percentages by dividing by 0.45, and focusing on significant underlying components, we get:

  • 5% scene save
  • 14% tangent space generation
  • 25% normal reprojection after LOD generation (raycasts)
  • 29% simplification (meshopt_simplify)
  • 24% the rest of generate_lods (it's inlined here, so the exact breakdown is hard to see from the profile)

In aggregate, LOD generation takes ~78% here, so definitely good to focus on that. When looking at something like a 5-minute import though, my expectations are usually that small gains are not terribly exciting, so something more significant needs to happen.
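As a quick check of the arithmetic above, here is a minimal sketch that sums the three LOD-related shares and converts each share into approximate seconds of the 276-second import. Mapping profile shares directly onto wall-clock time is an assumption made for illustration; the names SHARES, LOD_COMPONENTS, and seconds_for are hypothetical.

```python
# Normalized profile shares from the list above, in percent of import time.
SHARES = {
    "scene save": 5,
    "tangent space generation": 14,
    "normal reprojection (raycasts)": 25,
    "simplification (meshopt_simplify)": 29,
    "rest of generate_lods": 24,
}

# The three components that together make up LOD generation.
LOD_COMPONENTS = (
    "normal reprojection (raycasts)",
    "simplification (meshopt_simplify)",
    "rest of generate_lods",
)

TOTAL_SECONDS = 276  # measured reimport time reported by --verbose


def lod_share_percent(shares=SHARES):
    """Sum the shares that belong to LOD generation."""
    return sum(shares[name] for name in LOD_COMPONENTS)


def seconds_for(percent, total_seconds=TOTAL_SECONDS):
    """Assumption: a profile share maps linearly onto wall-clock seconds."""
    return total_seconds * percent / 100


if __name__ == "__main__":
    for name, percent in SHARES.items():
        print(f"{name}: {percent}% ~ {seconds_for(percent):.0f} s")
    lod = lod_share_percent()
    print(f"LOD generation: {lod}% ~ {seconds_for(lod):.0f} s")
```

The listed shares sum to 97%, so a few percent of the profile is not attributed to any of these components.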

A note on the scale here: each mesh gets approximately 6 LOD levels generated. The work for meshopt_simplify scales with that; the work for normal reprojection scales with the total number of rays, which scales with the total number of triangles in all LODs, times the area factor. It looks like we cast 16..64 rays per triangle, which is a lot of rays :)
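To get a feel for the ray counts involved, here is a rough sketch. The 16 and 64 rays-per-triangle bounds and the ~50k triangles per mesh come from the comment; the LOD chain itself (six levels, each half the size of the previous one) is a hypothetical assumption, as are the function names, so the totals are illustrative rather than measured.

```python
def hypothetical_lod_chain(base_triangles, levels=6):
    """Assumed chain: each LOD has half the triangles of the previous one."""
    counts = []
    current = base_triangles
    for _ in range(levels):
        current //= 2
        counts.append(current)
    return counts


def total_rays(lod_triangle_counts, rays_per_triangle):
    """Rays scale with the total triangle count across all LODs."""
    return sum(lod_triangle_counts) * rays_per_triangle


if __name__ == "__main__":
    chain = hypothetical_lod_chain(50_000)  # ~50k triangles per mesh
    low = total_rays(chain, 16)   # lower bound on rays per triangle
    high = total_rays(chain, 64)  # upper bound on rays per triangle
    print(f"per mesh: {low:,} to {high:,} rays")
    print(f"for ~800 meshes: {low * 800:,} to {high * 800:,} rays")
```

Under these assumptions a single mesh already needs hundreds of thousands to millions of rays, and the whole scene multiplies that by the mesh count.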

If I were tackling this problem, I would entertain the following projects:

  1. For scenes with many large meshes like this, my first goal would be to process meshes in parallel. I'm not familiar with the details of ImporterMesh code but superficially nothing should prevent fully generating each mesh in parallel. Maybe that requires refactoring some of this code to actually be thread-safe. It would also require making sure that the dependent code is thread-safe internally - meshopt definitely is, I assume so is Embree, but some care would be required. That alone would probably get this to be under a minute on an 8-core system if we discount tangent space generation.

  2. I'm skeptical that tangent space generation is efficient here. For a sense of scale, meshopt_simplify does a fair bit more work per call, and it's called ~6 times per mesh here and still only takes twice as much time. I would assume tangent space generation has internal algorithmic inefficiencies and could be improved, but I haven't looked at that code myself.
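The "under a minute on an 8-core system" estimate can be sanity-checked with an Amdahl-style calculation. This is a sketch under strong assumptions: LOD generation (78%) parallelizes perfectly across cores, scene save (5%) stays serial, and tangent space generation is discounted as in the comment. The function name estimated_seconds is hypothetical.

```python
def estimated_seconds(total_seconds, serial_share, parallel_share, cores):
    """Amdahl-style estimate: the serial part is unchanged,
    the parallel part is divided evenly across cores (ideal scaling)."""
    return total_seconds * (serial_share + parallel_share / cores)


if __name__ == "__main__":
    # 276 s import; scene save stays serial; LOD generation runs on 8 cores.
    estimate = estimated_seconds(276, serial_share=0.05, parallel_share=0.78, cores=8)
    print(f"estimated import time: {estimate:.1f} s")
```

Real scaling would be worse than this ideal, because of load imbalance between meshes and any shared state that needs locking, but the estimate leaves some headroom below one minute.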

I would not advise trying to optimize the internals of meshopt_simplify (trust me...). Some small future performance improvements are planned here in meshoptimizer but largely speaking unless this runs into some edge case, which it doesn't look like it does to me, it should be very well tuned already. Same for Embree - I would assume it's impractical to optimize that to the degree that is relevant here. However:

  3. I would certainly think of, at the minimum, reducing the amount of requested work from both meshopt_simplify and Embree here. Notably, meshopt_simplify is called approximately 6 times per mesh here and is asked to generate larger and larger meshes. Because of this, each call does more or less the same amount of work: simplifying the mesh 2x is almost the same effort as simplifying the mesh 10x (... well, not quite, but it gets there quickly). However, in LOD chain generation you can usually generate the LODs in the opposite direction: start by requesting a ~1.5x smaller mesh; if that target is reached, ask for a ~1.5x smaller mesh again, etc. I don't recall why the order here is reversed, but I would consider flipping it and simplifying from the last LOD. I don't think that's going to reduce the work here 6x, but I would expect something like a 3-4x improvement in the cost of calling simplify.

  4. In a similar vein, casting 16-64 rays per triangle is a lot, especially for higher levels of detail. I would probably reduce this in general, or at least scale it down as the LOD levels get closer to the original mesh: in the limit, we're casting at least 16 rays per triangle here for something that only has 1.5x fewer triangles than the original mesh, and that just feels wasteful. This has a risk of reducing the quality of the resulting normals because there's a higher chance of missing the mesh or hitting a wrong triangle. Maybe ray casts here aren't the right fit, and averaging triangle normals from triangles that are in a bounding sphere of the generated triangle is better, but this brings me to my final point:

  5. We've already discussed this at some point in another issue, but overall I'm not 100% sure the current normal processing in the importer for LODs is generally beneficial. With the normal-aware simplifier and the recent fixes, generally speaking I'd expect decent normals to come out of the simplifier itself. Sometimes that's not the case, but I'm not sure the ray cast logic is perfect either, and it's just a lot of complexity to always keep in mind. I do think the reindexing that happens in this code is beneficial for some faceted meshes though. So a good use of time would be to perhaps introduce an option for normal reprojection that would disable the ray cast based normal recreation (I'd expect that alone cuts half of the overhead of LOD generation here), test the option in a release, then maybe default it to skip the normal recreation and see if this comes up.
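The reversed LOD order suggested above (simplify each level from the previous one instead of from the original mesh) can be sketched as a loop. Everything here is a toy model: toy_simplify is a hypothetical stand-in that always reaches its target, not meshopt_simplify, and "work" is assumed to be proportional to the input triangle count, which real simplifiers only approximate.

```python
def toy_simplify(triangle_count, target_count):
    """Hypothetical stand-in for a simplifier: always reaches the target."""
    return min(triangle_count, target_count)


def chained_lods(base_triangles, ratio=1.5, max_levels=6, min_triangles=1):
    """Build LODs by repeatedly simplifying the previous level by `ratio`.

    Returns (lod_triangle_counts, work), where work is the sum of input
    sizes over all simplify calls (assumed cost model).
    """
    lods = []
    work = 0
    current = base_triangles
    for _ in range(max_levels):
        target = int(current / ratio)
        if target < min_triangles:
            break
        work += current  # each call pays for the mesh it is given
        result = toy_simplify(current, target)
        if result > target:
            break  # target not reached: stop extending the chain
        lods.append(result)
        current = result
    return lods, work


def from_original_work(base_triangles, levels):
    """Assumed cost when every level is simplified from the original mesh."""
    return base_triangles * levels


if __name__ == "__main__":
    lods, chained = chained_lods(50_000)
    direct = from_original_work(50_000, len(lods))
    print("LOD triangle counts:", lods)
    print("chained work:", chained, "from-original work:", direct)
```

The toy numbers are not a prediction of the real saving; they only show why feeding each call a smaller input reduces total work without reducing it by the full number of levels.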

Hopefully this is helpful :) I would be happy to discuss (3)/(5) further and/or maybe contribute a patch or two as I'm generally interested in making sure simplification integration is working well for Godot; I'll leave 1/2/4 to others if they are motivated to work on this.

from godot.

fire avatar fire commented on August 19, 2024 5

As someone who works on this, I support changes that improve quality and performance. I can review and help test.


lvcivs avatar lvcivs commented on August 19, 2024 4

I tried this on 4.3.beta2.official and although it was very slow, it did eventually load after about 6 minutes (during the whole time it appeared stuck at 0%).
image

Opening the scene took a couple more minutes:
image
This was on Ubuntu 24.04. Edit: Godot uses about 9 GB of RAM with this scene open.


Sluggernot avatar Sluggernot commented on August 19, 2024 2

Yes. I have been able to load the file. I did some quick benchmarking with Visual Studio and have made a couple of very small optimizations locally. I need to benchmark before and after once I have some really good changes.
The main finding is that _parse_meshes is the main function involved in loading this file. My changes are to GenerateSharedVerticesIndexList, plus one small one to static SVec3 GetPosition().


fire avatar fire commented on August 19, 2024 2

I will try to review any pull requests that improve load times on the 777 MB GLB without breaking anything.


zeux avatar zeux commented on August 19, 2024 1

On "I'm not 100% sure the current normal processing in the importer for LODs is generally beneficial", I decided to do a quick comparison on the scene from this file. It looks like it's easy to disable the normal override: basically you just need to disable the ray caster creation (as mentioned earlier, I believe the current splitting logic to be generally beneficial for faceted meshes). I then looked at a few low LODs (where the risk of picking a bad normal due to ray casts is maximized) by tuning the LOD bias to a very small value.

On the left (yes, left, I double checked!) is the import without using the raycaster. On the right is current master (raycaster enabled). Both levels are at ~2200 triangles. I see somewhat similar issues on a few other models - this is not universal, this happened to be the first model I checked, and some models from this scene look about the same with or without the raycaster enabled. But this to me is strong evidence that raycaster should be optional, and probably opt-in.

image
image

I've switched to using a smaller version of the scene from the original post. The original has 800 meshes, but each mesh is duplicated 8 times; the deduplicated version has only 100 meshes, which is easier to work with and faster to reimport. Reimport takes 37 seconds on master and 22 seconds without the raycaster enabled.


fire avatar fire commented on August 19, 2024

Can you check 4.3? The CoW data size limit was increased there.


Sluggernot avatar Sluggernot commented on August 19, 2024

Tried on the latest from GitHub (4 or 5 days ago). It hangs on import. Restarting the editor automatically restarts the import, which hangs again.
For some reason my Attach to Process is being disconnected, and reattaching it doesn't show me the Call Stack. (Mind currently blown.)
I just pulled the latest and am recompiling.


AllenDang avatar AllenDang commented on August 19, 2024

@lvcivs I created this file just for testing purposes; I wanted to see how Godot would handle it :P


JekSun97 avatar JekSun97 commented on August 19, 2024

After transferring the model to Godot 4.3 beta2, it still didn't load for me. I waited 28 minutes, then closed it.
I also tested this on Blender 3.6.2, waited 3 minutes and Blender closed itself, which didn't happen with Godot.

Godot v4.3.beta2 - Windows 10.0.19045 - Vulkan (Mobile) - dedicated Radeon RX 560 Series (Advanced Micro Devices, Inc.; 31.0.14001.45012) - Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz (4 Threads)


fire avatar fire commented on August 19, 2024

The next step is to get profiles for the load.

My recommendation is to use either https://github.com/mstange/samply or https://superluminal.eu/


Sluggernot avatar Sluggernot commented on August 19, 2024

Oh... Nothing broken? Ah, nevermind then.
Really, yes my first challenge is proving that it is faster.
Thanks!


Sluggernot avatar Sluggernot commented on August 19, 2024

OK, I didn't know GitHub would add these comments from my own fork because I referenced the issue in the description. I will avoid that in the future.


Sluggernot avatar Sluggernot commented on August 19, 2024

Wow, well that is surprising. Are there any examples where the raycaster was better in visual fidelity? (I understand that's somewhat subjective, but your screenshot above feels fairly objective as to which is "better.")
I've been diving further into this section of code throughout the day today, attempting to rally myself before trying multithreading. I really appreciate your write-up. This is absolutely great to see!

