OOM when validating (jnerf issue, open)

jittor commented on August 23, 2024
OOM when validating

from jnerf.

Comments (6)

Gword commented on August 23, 2024

Generally, the lego dataset uses about 8 GB of memory. Pull the latest version and try again, or reduce n_rays_per_batch in ngp_base.py to lower memory usage during validation.
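
To illustrate why a smaller n_rays_per_batch can help: peak memory during rendering grows with the number of rays processed at once, so splitting the rays into smaller chunks bounds it. The following is a minimal NumPy sketch of that idea, not JNeRF's actual code; `render_in_chunks` and `toy_render` are hypothetical names used only for illustration.

```python
import numpy as np


def render_in_chunks(render_fn, rays, chunk_size):
    """Apply render_fn to rays in slices of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs = []
    for start in range(0, rays.shape[0], chunk_size):
        # Only one chunk's intermediate arrays are alive at a time,
        # so peak memory scales with chunk_size, not with len(rays).
        outputs.append(render_fn(rays[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


def toy_render(rays):
    # Hypothetical stand-in for a NeRF forward pass:
    # maps (N, 6) rays (origin + direction) to (N, 3) colours in [0, 1].
    return np.tanh(rays[:, :3] + rays[:, 3:]) * 0.5 + 0.5


rays = np.random.default_rng(0).normal(size=(10_000, 6)).astype(np.float32)
full = toy_render(rays)                                        # all rays at once
chunked = render_in_chunks(toy_render, rays, chunk_size=1024)  # bounded batches
```

The chunked result matches the all-at-once result; only the peak working set changes. In a real renderer the trade-off is more, smaller kernel launches in exchange for lower peak memory.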


huangfuyang commented on August 23, 2024

> Generally, the lego dataset uses about 8 GB of memory. Pull the latest version and try again, or reduce n_rays_per_batch in ngp_base.py to lower memory usage during validation.

Thanks for the suggestion. I removed the validation stage and training now seems OK, but the rendered result is messy (attached image: lego_r_14). Any idea what could cause this?


Gword commented on August 23, 2024

What are your operating system, GPU, and CUDA version?


huangfuyang commented on August 23, 2024

> What are your operating system, GPU, and CUDA version?

Ubuntu 18.04, GTX 1080 Ti, CUDA 11.2


LeoRainly commented on August 23, 2024

> Generally, the lego dataset uses about 8 GB of memory. Pull the latest version and try again, or reduce n_rays_per_batch in ngp_base.py to lower memory usage during validation.

I also ran into this when training:
[i 0414 07:15:02.026124 56 compiler.py:955] Jittor(1.3.7.13) src: /usr/local/lib/python3.8/dist-packages/jittor
[i 0414 07:15:02.032066 56 compiler.py:956] g++ at /usr/bin/g++(9.4.0)
[i 0414 07:15:02.032149 56 compiler.py:957] cache_path: /root/.cache/jittor/jt1.3.7/g++9.4.0/py3.8.10/Linux-3.10.0-1x47/IntelRXeonRSilxb7/default
[i 0414 07:15:02.038467 56 init.py:411] Found nvcc(11.3.109) at /usr/local/cuda/bin/nvcc.
[i 0414 07:15:02.045088 56 init.py:411] Found addr2line(2.34) at /usr/bin/addr2line.
[i 0414 07:15:02.432493 56 compiler.py:1010] cuda key:cu11.3.109_sm_75
[i 0414 07:15:02.667230 56 init.py:227] Total mem: 62.39GB, using 16 procs for compiling.
[i 0414 07:15:02.872833 56 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0414 07:15:03.160802 56 init.cc:62] Found cuda archs: [75,]
[i 0414 07:15:03.355404 56 compile_extern.py:522] mpicc not found, distribution disabled.
[i 0414 07:15:05.770059 56 init.py:6] JNeRF(0.1.3.0) at /data3/liuyu/develop/JNeRF/python/jnerf
[i 0414 07:15:05.990121 56 cuda_flags.cc:39] CUDA enabled.
Loading config from: ./projects/ngp/configs/ngp_base.py
load train data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:07<00:00, 26.77it/s]
load val data
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00, 6.31it/s]
10%|███████████████████████▎ | 4094/40000 [00:50<07:12, 82.97it/s]/data3/liuyu/develop/JNeRF/python/jnerf/runner/runner.py:191: RuntimeWarning: invalid value encountered in cast
ndarr = (img*255+0.5).clip(0, 255).astype('uint8')
STEP=4096 | LOSS=nan | VAL PSNR=nan
20%|██████████████████████████████████████████████▋ | 8192/40000 [01:53<07:02, 75.34it/s]STEP=8192 | LOSS=nan | VAL PSNR=nan
31%|█████████████████████████████████████████████████████████████████████▋ | 12288/40000 [02:58<06:02, 76.48it/s]STEP=12288 | LOSS=nan | VAL PSNR=nan
41%|████████████████████████████████████████████████████████████████████████████████████████████▉ | 16377/40000 [04:08<05:30, 71.40it/s]STEP=16384 | LOSS=nan | VAL PSNR=nan
51%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 20475/40000 [05:26<05:19, 61.18it/s]STEP=20480 | LOSS=nan | VAL PSNR=nan
61%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 24572/40000 [06:53<04:34, 56.16it/s]STEP=24576 | LOSS=nan | VAL PSNR=nan
72%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 28670/40000 [08:48<04:37, 40.85it/s]STEP=28672 | LOSS=0.08867161720991135 | VAL PSNR=15.269938468933105
82%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 32767/40000 [10:40<02:56, 40.92it/s]STEP=32768 | LOSS=0.08883365988731384 | VAL PSNR=10.358024597167969
92%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 36863/40000 [12:31<01:42, 30.54it/s]STEP=36864 | LOSS=0.08840325474739075 | VAL PSNR=10.357995986938477
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40000/40000 [13:57<00:00, 47.79it/s]
load test data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:07<00:00, 27.13it/s]
rendering testset...
0%| | 0/200 [00:00<?, ?it/s][w 0414 07:29:31.006825 56 cuda_device_allocator.cc:30] Unable to alloc cuda device memory, use unify memory instead. This may cause low performance.
[i 0414 07:29:31.006853 56 cuda_device_allocator.cc:32]
=== display_memory_info ===
total_cpu_ram: 62.39GB total_device_ram: 7.795GB
hold_vars: 96 lived_vars: 243 lived_ops: 158
name: sfrl is_device: 1 used: 1.546GB(49.2%) unused: 1.597GB(50.8%) total: 3.144GB
name: sfrl is_device: 1 used: 3.912GB(99.5%) unused: 20.97MB(0.521%) total: 3.933GB
name: sfrl is_device: 0 used: 3.912GB(99.5%) unused: 20.97MB(0.521%) total: 3.933GB
name: sfrl is_device: 0 used: 81.33MB(80.5%) unused: 19.67MB(19.5%) total: 101MB
name: temp is_device: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: temp is_device: 1 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
cpu&gpu: 11.11GB gpu: 7.076GB cpu: 4.031GB
free: cpu( 11.7GB) gpu(209.4MB)
swap: total( 0 B) last( 0 B)
[w 0414 07:29:31.008137 56 cuda_device_allocator.cc:30] Unable to alloc cuda device memory, use unify memory instead. This may cause low performance.
[i 0414 07:29:31.008161 56 cuda_device_allocator.cc:32]
=== display_memory_info ===
total_cpu_ram: 62.39GB total_device_ram: 7.795GB
hold_vars: 96 lived_vars: 236 lived_ops: 158
name: sfrl is_device: 1 used: 3.454GB(68.4%) unused: 1.598GB(31.6%) total: 5.052GB
name: sfrl is_device: 1 used: 3.912GB(99.5%) unused: 20.97MB(0.521%) total: 3.933GB
name: sfrl is_device: 0 used: 3.912GB(99.5%) unused: 20.97MB(0.521%) total: 3.933GB
name: sfrl is_device: 0 used: 81.33MB(80.5%) unused: 19.67MB(19.5%) total: 101MB
name: temp is_device: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: temp is_device: 1 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
cpu&gpu: 13.02GB gpu: 8.984GB cpu: 4.031GB
free: cpu( 11.7GB) gpu(209.4MB)
swap: total( 0 B) last( 0 B)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [05:19<00:00, 1.60s/it]
TOTAL TEST PSNR====11.312249183654785

But the video rendered from params.pkl shows nothing.
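
For anyone debugging the same symptom: the log above reports LOSS=nan from step 4096 onward, and the RuntimeWarning at the uint8 cast is the warning NumPy emits when non-finite values reach an integer cast. The sketch below shows one way to surface this early; it assumes plain NumPy arrays, and `to_uint8_image` and `check_loss` are hypothetical helpers, not part of JNeRF. It only makes the problem visible and does not address whatever causes the NaN values.

```python
import numpy as np


def to_uint8_image(img):
    """Convert a float image in [0, 1] to uint8, replacing non-finite values first.

    Returns the uint8 image and the number of non-finite pixels found.
    """
    img = np.asarray(img, dtype=np.float32)
    n_bad = int((~np.isfinite(img)).sum())
    # nan -> 0, +inf -> 1, -inf -> 0, so the integer cast below is well defined.
    safe = np.nan_to_num(img, nan=0.0, posinf=1.0, neginf=0.0)
    return (safe * 255 + 0.5).clip(0, 255).astype("uint8"), n_bad


def check_loss(step, loss):
    """Raise as soon as the loss stops being finite, instead of training on."""
    if not np.isfinite(loss):
        raise FloatingPointError(f"non-finite loss at step {step}: {loss}")


img = np.array([[0.0, 0.5], [np.nan, 1.0]], dtype=np.float32)
pixels, n_bad = to_uint8_image(img)  # n_bad counts the NaN pixel
```

Calling something like `check_loss` inside the training loop would stop the run at the first NaN step rather than continuing for tens of thousands of iterations.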


ZhengHFei commented on August 23, 2024

Have you solved this problem? I have the same issue.

