
Comments (6)

tobias-kirschstein commented on July 25, 2024

Hi Lee,

I did some experiments to see which model configurations you can fit on an RTX 3090.
The most promising one I found uses only 8 instead of 32 hash encodings and slightly restricts the number of samples that are processed simultaneously:
--n_hash_encodings 8 --latent_dim_time 8 --max_n_samples_per_batch 19
This should still give you reasonable results, but they will be noticeably worse than the full model's when the observed movements are very complex.
In the paper, we already experimented with using 16 hash encodings, which only marginally impaired the results. Going further down to 8 will have a similar effect. The extreme case would be to use only a single hash encoding, which is equivalent to the NGP + Def. ablation in Table 3 of the paper. Performance suffers in that case, but it was still on par with DyNeRF in our experiments.
So, playing around with the number of hash encodings is a good way to address GPU memory concerns and will still give reasonable results.
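
For reference, these flags are just appended to the usual training call from the README (check the training section there for the real entry point). The script path and the ID/sequence arguments in the line below are placeholders, not the exact command:

python scripts/train.py <participant_id> <sequence_name> --n_hash_encodings 8 --latent_dim_time 8 --max_n_samples_per_batch 19  # script path is a placeholder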

So far, I haven't tried running the full model in a distributed manner. But the first thing I would try here is to distribute the hash encodings to different GPUs. The starting point would be the hash ensemble implementation:

for h, hash_encoding in enumerate(self.hash_encodings):

where we loop over the hash encodings and collect the spatial features. I guess it shouldn't be too hard to have the hash grids reside on separate GPUs and exchange the 3D positions as well as the queried spatial features with a dedicated main GPU.
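
A rough sketch of how that could look with PyTorch is below. The class and argument names are made up for illustration; this is not the actual NeRSemble code, just the general idea, assuming each hash encoding is a regular nn.Module:

import torch

# Sketch only: spread the hash grids over all visible GPUs and gather the
# queried features back on a main GPU. All names here are placeholders.
class MultiGPUHashEnsemble(torch.nn.Module):
    def __init__(self, hash_encodings, main_device="cuda:0"):
        super().__init__()
        n_gpus = torch.cuda.device_count()
        self.main_device = main_device
        self.devices = [f"cuda:{i % n_gpus}" for i in range(len(hash_encodings))]
        # Each hash grid lives on its assigned GPU
        self.hash_encodings = torch.nn.ModuleList(
            [enc.to(dev) for enc, dev in zip(hash_encodings, self.devices)])

    def forward(self, positions):
        # positions: (N, 3) query points residing on the main GPU
        features = []
        for hash_encoding, device in zip(self.hash_encodings, self.devices):
            # Send the 3D positions to the GPU holding this hash grid ...
            feats = hash_encoding(positions.to(device, non_blocking=True))
            # ... and collect the queried spatial features back on the main GPU
            features.append(feats.to(self.main_device, non_blocking=True))
        # How the collected features are combined afterwards stays exactly
        # as in the existing hash ensemble code; stacking is a placeholder.
        return torch.stack(features, dim=0)

Note that this still queries the grids one after another, so the point is fitting the grids into memory rather than gaining speed; overlapping the per-GPU queries (e.g., with CUDA streams) would be an additional step.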

Hope this helps


LeeHanmin commented on July 25, 2024

Thanks a lot!


LeeHanmin commented on July 25, 2024

Hi, I have successfully trained NeRSemble and it is awesome. Can I get the videos taken by each camera for one of the IDs?


tobias-kirschstein commented on July 25, 2024

Glad you like it!
Not exactly sure what you mean by "get the video taken by each camera with one of the IDs".
I assume you are talking about rendering the trained model from each camera?
You can get the predictions from the evaluation cameras by running the evaluation script (see section 3.2. in the README).
Use the flags
--skip_timesteps 3 --max_eval_timesteps -1
to tell the evaluation script that you want to render every 3rd timestep (=24.3fps).
The rendered images will be put into some subfolder in ${NERSEMBLE_MODELS_PATH}/NERS-XXX-${name}/evaluation.
From there, it should be straightforward to pack the rendered images into a video.
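
For example, a small script along these lines should do the packing (it assumes imageio with the imageio-ffmpeg backend is installed; the glob pattern is a placeholder you need to point at the actual image folder written by the evaluation script):

import glob
import imageio.v2 as imageio

# Placeholder pattern: point this at the rendered images of one viewpoint
# inside ${NERSEMBLE_MODELS_PATH}/NERS-XXX-${name}/evaluation
frame_paths = sorted(glob.glob("path/to/evaluation/<camera_folder>/*.png"))

writer = imageio.get_writer("rendering.mp4", fps=24)  # roughly matches rendering every 3rd timestep
for path in frame_paths:
    writer.append_data(imageio.imread(path))
writer.close()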


LeeHanmin commented on July 25, 2024

Sorry, I wasn't clear enough. What I mean is that I want the videos from the 16 monocular cameras for a certain ID, such as 124, from the first frame to the last frame. Could you please provide them to me?


tobias-kirschstein commented on July 25, 2024

Sorry, I still don't quite understand your request.
What exactly do you need?
Do you need the 16 videos of a person from the dataset to train NeRSemble? In that case, section 2 of the README describes how to obtain them.
But since you wrote "I have successfully trained NeRSemble" above, I assumed you just want to render a trained model from the 12 training and 4 evaluation viewpoints, and my last comment describes how to get those renderings.
I am not sure what other "video of 16 monocular cameras" you are referring to. Do you maybe mean the circular renderings as in the teaser image in the README?
