Comments (15)

Speierers avatar Speierers commented on July 17, 2024 9

When rendering with a high spp, you can split the rendering process into multiple passes using the samples_per_pass property of the integrator:

<integrator type="path">
    <integer name="samples_per_pass" value="64"/>
</integrator>

<sensor type="perspective">
    ...
    <sampler type="independent">
        <integer name="sample_count" value="256"/>
    </sampler>
    ...
</sensor>

Note that sample_count should be a multiple of samples_per_pass.

Here the renderer should perform 4 passes with 64 SPP each and accumulate the results.
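The pass-count arithmetic can be checked with a few lines of plain Python (nothing Mitsuba-specific):

```python
# Values from the XML snippet above.
sample_count = 256      # sampler's sample_count
samples_per_pass = 64   # integrator's samples_per_pass

# sample_count must be a multiple of samples_per_pass.
assert sample_count % samples_per_pass == 0

num_passes = sample_count // samples_per_pass
print(num_passes)  # 4 passes of 64 spp each
```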

I have to say that this feature hasn't been thoroughly tested so please let me know if it works for you!

from mitsuba2.

Speierers avatar Speierers commented on July 17, 2024 2

Here the autodiff computation graph is maintained between the passes, so it keeps growing until you eventually run out of memory on your GPU.

One solution would be to "perform autodiff on the image" inside the loop over the passes. You can then average the gradients you get from the different passes.
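The idea can be sketched with a toy stand-in for the renderer. Note that render_pass, theta, and the returned gradients below are hypothetical placeholders, not the Mitsuba/Enoki API; the real code would call the differentiable render() and run Enoki's backward pass inside the loop. Since differentiation is linear, averaging the per-pass gradients gives the gradient of the averaged image, while each pass's autodiff graph can be freed before the next pass starts:

```python
import numpy as np

rng = np.random.default_rng(0)

def render_pass(theta):
    """Toy stand-in for one low-spp differentiable rendering pass.

    Returns a noisy 'image' depending on a scene parameter theta,
    plus the gradient of that image w.r.t. theta (trivially ones here).
    """
    image = theta * np.ones(4) + rng.normal(0.0, 0.01, size=4)
    grad = np.ones(4)  # d(image)/d(theta) for this toy model
    return image, grad

passes, theta = 4, 1.5
image_sum = np.zeros(4)
grad_sum = np.zeros(4)
for _ in range(passes):
    img, g = render_pass(theta)  # backprop this pass only, ...
    image_sum += img
    grad_sum += g                # ... accumulate its gradient, then
                                 # discard the pass's autodiff graph
image = image_sum / passes
grad = grad_sum / passes         # average gradient over the passes
```

The key point is that only one pass's graph is ever alive at a time, so GPU memory no longer grows with the total spp.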


chaosink avatar chaosink commented on July 17, 2024 1

@Speierers There is a typo in samples_per_pass in the XML snippet; I copied it and got an XML parsing error. 😄

Now fixed.


Speierers avatar Speierers commented on July 17, 2024 1

Occasionally, when trying some of the above, my console would also crash while developing the HDRFilm. This seemed random to me: when I repeated the same rendering (after the terminal crashed), it worked.

If this is reproducible, could you open another issue for this with a full log? Thanks!


chaosink avatar chaosink commented on July 17, 2024

What is the spp for the cbox scene?
Maybe try using a smaller one.
On my PC with an NVIDIA 1080 Ti (same memory size as the 2080 Ti), the spp can only go up to 128 for the cbox scene.
The same error occurs here if the spp is more than 128.


philcn avatar philcn commented on July 17, 2024

That solves my problem. I have to keep it below 256x256 @ 128 spp. I understand that the gpu_autodiff backend can use a lot of VRAM, but it looks like gpu_rgb uses no less. Are we not supposed to render with large spp counts using the OptiX backends?


merlinND avatar merlinND commented on July 17, 2024

The gpu_rgb backend should use less GPU memory than gpu_autodiff (no autodiff graph to store), but a large wavefront is still created and stored in memory. It's possible to split rendering into several passes so that each wavefront fits in memory.
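A back-of-the-envelope estimate shows why: the wavefront holds per-sample state for every pixel at once, so its size scales with width x height x spp. The per-entry byte count below is a made-up placeholder (the real figure depends on the integrator and variant), but the scaling is the point:

```python
# Hypothetical per-ray state size; the real value depends on the
# integrator, the variant, and Enoki's internal buffers.
bytes_per_entry = 200

width, height, spp = 256, 256, 256
wavefront_bytes = width * height * spp * bytes_per_entry
print(f"{wavefront_bytes / 2**30:.2f} GiB")  # whole wavefront at once

# Splitting into 4 passes of 64 spp shrinks each wavefront by 4x.
per_pass_bytes = width * height * (spp // 4) * bytes_per_entry
```

Whatever the true per-entry size, halving samples_per_pass halves the peak wavefront memory, at the cost of some per-pass launch overhead.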


wjakob avatar wjakob commented on July 17, 2024

@merlinND Just a side remark: the gpu_autodiff and gpu_rgb backends should use the same amount of memory if nothing is being differentiated. @philcn Can this issue be closed?


merlinND avatar merlinND commented on July 17, 2024

Right, I meant in case something is being differentiated.


abhinavvs avatar abhinavvs commented on July 17, 2024

Thanks for raising this issue! I am facing the same problem as well. My scene is 512x384 and composed of simple objects. I am not able to increase my spp beyond 50.

Compared to scalar_ mode, the gpu_ variant is about 3x faster for my scene rendered at 50 spp. However, 50 spp does not yield a satisfactory output, and I need to increase the spp to reduce the noise. I am able to run mitsuba2 in scalar mode with a very large spp (>1024, which is what I need).

I was hoping to reap the benefits of GPU acceleration for such time-intensive rendering tasks (with high spp). Is there any hope of extending the gpu_ variants to handle this? If not, what is the best way to render scenes with low noise in Mitsuba 2 (any hack I can use other than simply increasing the spp)?

PS: Will using .ply or .serialized files for loading objects reduce this memory footprint as compared to .obj files?


garethwalkom avatar garethwalkom commented on July 17, 2024

I also had this issue when rendering the example cbox.xml scene using gpu_autodiff_spectral at 256x256 @ 256 samples, which gives me the error mentioned by @philcn:

2020-04-02 14:38:36 INFO  main  [mitsuba.cpp:194] Mitsuba version 2.0.0 (master[008cb4df], Windows, 64bit, 12 threads, 8-wide SIMD)
2020-04-02 14:38:36 INFO  main  [mitsuba.cpp:195] Copyright 2019, Realistic Graphics Lab, EPFL
2020-04-02 14:38:36 INFO  main  [mitsuba.cpp:196] Enabled processor features: cuda avx2 avx fma f16c sse4.2 x86_64
2020-04-02 14:38:36 INFO  main  [xml.cpp:1129] Loading XML file "resources\data\scenes\cbox\cbox.xml" ..
2020-04-02 14:38:36 INFO  main  [xml.cpp:1130] Using variant "gpu_autodiff_spectral"
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\regular.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\path.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\independent.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\gaussian.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\hdrfilm.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\perspective.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\diffuse.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\area.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\d65.dll" ..
2020-04-02 14:38:36 INFO  main  [PluginManager] Loading plugin "plugins\obj.dll" ..
2020-04-02 14:38:36 INFO  main  [Scene] Validating and building scene in OptiX.

Caught a critical exception: cuda_malloc(): out of memory!

(here, I don't need to restart my console)

Rendering the same as the above, but now with 128 samples, gives me a different error:

2020-04-02 14:07:12 INFO  main  [mitsuba.cpp:194] Mitsuba version 2.0.0 (master[008cb4df], Windows, 64bit, 12 threads, 8-wide SIMD)
2020-04-02 14:07:12 INFO  main  [mitsuba.cpp:195] Copyright 2019, Realistic Graphics Lab, EPFL
2020-04-02 14:07:12 INFO  main  [mitsuba.cpp:196] Enabled processor features: cuda avx2 avx fma f16c sse4.2 x86_64
2020-04-02 14:07:12 INFO  main  [xml.cpp:1129] Loading XML file "resources\data\scenes\cbox\cbox.xml" ..
2020-04-02 14:07:12 INFO  main  [xml.cpp:1130] Using variant "gpu_autodiff_spectral"
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\regular.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\path.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\independent.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\gaussian.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\hdrfilm.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\perspective.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\diffuse.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\area.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\d65.dll" ..
2020-04-02 14:07:12 INFO  main  [PluginManager] Loading plugin "plugins\obj.dll" ..
2020-04-02 14:07:12 INFO  main  [Scene] Validating and building scene in OptiX.
cuda_check(): runtime API error = 0002 "cudaErrorMemoryAllocation" in C:/Users/u0120819/mitsuba2/ext/enoki/src/cuda/horiz.cu:59.

(I need to restart my console after getting the above error)

However, the following fixed it (when using 128 samples):

When rendering with a high spp, you can split the rendering process into multiple passes using the samples_per_pass property of the integrator:

<integrator type="path">
    <integer name="samples_per_pass" value="64"/>
</integrator>

Thank you!

I ran some quick tests by adjusting the samples per pass with the same settings for 256x256 @ 128 samples:
max depth = 6:
64 samples per pass: took 2.606s to render
32 samples per pass: took 2.666s to render
16 samples per pass: took 2.811s to render

max depth = -1:
64 samples per pass: took 5.469s to render
32 samples per pass: took 6.385s to render
16 samples per pass: took 7.293s to render

I then took this a step further by adjusting the samples per pass, but now with 256x256 @ 256 samples:
max depth = -1:
128 samples per pass: gave the error from above of: cuda_check(): runtime API error = 0002 "cudaErrorMemoryAllocation" in C:/Users/u0120819/mitsuba2/ext/enoki/src/cuda/horiz.cu:59.
64 samples per pass: took 10.613s
32 samples per pass: took 12.502s

Occasionally, when trying some of the above, my console would also crash while developing the HDRFilm. This seemed random to me: when I repeated the same rendering (after the terminal crashed), it worked.


philcn avatar philcn commented on July 17, 2024

Thanks @merlinND and @wjakob for explaining the memory usage of gpu backends, and @Speierers for the multipass example! My problem is solved.


daralthus avatar daralthus commented on July 17, 2024

Is samples_per_pass relevant for the pathreparam integrator too?


Speierers avatar Speierers commented on July 17, 2024

Yes, it should be.


realWDC avatar realWDC commented on July 17, 2024

Is samples_per_pass also helpful for the gpu_autodiff_* variants when doing autodiff?

I am using the following script to render and to perform autodiff, according to the suggestions in this thread (if my understanding is correct). I found that passes is limited to a maximum value, beyond which the "out of memory" error occurs; this is confirmed by checking GPU memory usage while the program runs.

# create scene, optimizer, ...

# render image in multiple passes as suggested in this thread
passes = 10        # increasing this beyond 10 causes out-of-memory
spp_per_pass = 12

image = render(scene, optimizer=opt, spp=spp_per_pass)
image[image != image] = 0          # zero out NaN samples
for i in range(passes - 1):
    img_i = render(scene, optimizer=opt, spp=spp_per_pass)
    img_i[img_i != img_i] = 0      # zero out NaN samples
    image += img_i
    del img_i                      # free the per-pass buffer
image = image / passes

# perform autodiff on image ...

Any suggestions?

