
Debugging Error about pacnet (closed, 11 comments)

nvlabs commented on August 15, 2024
Debugging Error


Comments (11)

suhangpro commented on August 15, 2024

Are you able to run testing with the provided weights? For training, 11GB of memory should be enough for up to 8x upsampling, but might not be enough for 16x.
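
As a generic sanity check (not part of the pacnet code), you can confirm how much memory the visible GPU actually exposes to PyTorch:

import torch

# Query the first visible GPU; with CUDA_VISIBLE_DEVICES set, device 0 is the
# first GPU in that list. total_memory is reported in bytes.
props = torch.cuda.get_device_properties(0)
print('GPU:', props.name)
print('Total memory: %.2f GiB' % (props.total_memory / 1024 ** 3))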


josephdanielchang commented on August 15, 2024

That's odd. When I run testing, it gives a very similar error.

Command:
CUDA_VISIBLE_DEVICES=4 python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel

Output:

TEST LOADER START
TEST LOADER END

Model weights initialized from: weights_flow/x8_pac_weights_epoch_5000.pth
TEST START
BEFORE APPLY MODEL
BEFORE NET
AFTER NET
AFTER APPLY MODEL
BEFORE APPLY MODEL
BEFORE NET
Traceback (most recent call last):
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 362, in <module>
    main()
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 335, in main
    log_test = test(model, test_loader, device, last_epoch, init_lr, args.loss, perf_measures, args)                   # TEST
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 89, in test
    output = apply_model(model, lres, guide, args.factor)
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 23, in apply_model
    out = net(lres, guide)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/joseph/pacnet-master/task_jointUpsampling/models.py", line 245, in forward
    x = self.up_convts[i](x, guide_cur)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/joseph/pacnet-master/pac.py", line 786, in forward
    self.output_padding, self.dilation, self.shared_filters, self.native_impl)
  File "/home/joseph/pacnet-master/pac.py", line 498, in pacconv_transpose2d
    shared_filters)
  File "/home/joseph/pacnet-master/pac.py", line 252, in forward
    output = torch.einsum('ijklmn,jokl->iomn', (in_mul_k, weight))
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 10.92 GiB total capacity; 5.74 GiB already allocated; 1.86 GiB free; 2.78 GiB cached)
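
When an OOM like this happens, it can help to log PyTorch's allocator state next to the existing BEFORE NET / AFTER NET prints to see how close the model already is to the 11 GiB limit. A minimal, generic sketch (the helper name is made up, not part of main.py):

import torch

# Generic diagnostic: report current and peak allocations on the active GPU.
def report_gpu_memory(tag=''):
    allocated = torch.cuda.memory_allocated() / 1024 ** 3
    peak = torch.cuda.max_memory_allocated() / 1024 ** 3
    print('[%s] allocated: %.2f GiB, peak: %.2f GiB' % (tag, allocated, peak))

# e.g. call report_gpu_memory('before net') right before out = net(lres, guide)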


suhangpro commented on August 15, 2024

This is indeed odd ... which PyTorch version are you using? The code was originally developed for 0.4, but there is an experimental branch for 1.4 which you might try out.


josephdanielchang commented on August 15, 2024

Using python >> import torch >> print(torch.__version__), mine is 1.1.0.
I am running the optical flow test on Sintel data with weights_flow/x8_pac_weights_epoch_5000.pth:

python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel

How many GB of GPU memory would you estimate is necessary to run the test program?


suhangpro commented on August 15, 2024

11GB GPUs should be enough for both training (with the exception of some 16x models) and testing. PyTorch versions above 1.0 are not supported by the master branch (I expect some test cases to fail as well). The th14 branch is to be used with version 1.4, but it has not been thoroughly tested.


josephdanielchang commented on August 15, 2024

So 1.0 should work then, correct? Should I downgrade and test again, or do you have other suggestions?


suhangpro commented on August 15, 2024

You can downgrade to 1.0 or upgrade to 1.4 (and use the th14 branch).
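
For reference, the two options roughly amount to the following commands (assuming pip can find a matching CUDA build for your platform; the branch name th14 is taken from the comment above):

pip install torch==1.0.0
# or, to use the experimental branch instead:
pip install torch==1.4.0
git checkout th14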


josephdanielchang commented on August 15, 2024

I downgraded to 1.0.0 and it still gives a GPU out-of-memory error when testing flow. Is the data root supposed to be --data-root data/sintel? There are a lot of folders under the data root; should I specify a particular folder?


suhangpro commented on August 15, 2024

@josephdanielchang I just tested on an 11GB GPU and found that indeed the 8x and 16x flow tests won't work. Sorry that I didn't provide clear information before. With an 11GB GPU, you are able to run all the depth experiments but only the 4x flow experiments.

The data path is correct as is.
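
One generic way to trim memory at evaluation time, in case the test path keeps autograd state around, is to run the network under torch.no_grad(); this is a standard PyTorch pattern, not a statement about how main.py is currently written:

import torch

def run_inference(net, lres, guide):
    # Hypothetical helper: disable gradient tracking so intermediate
    # activations are not retained during the forward pass.
    with torch.no_grad():
        return net(lres, guide)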


josephdanielchang commented on August 15, 2024

Thanks, it does work with 4x for flow. Follow-up question: where do I find the results for these upsampled flows after running the flow test on the Sintel flow data? I only find a folder exp/sintel with test.log and train.log, but no .flo files are generated anywhere. Is there supposed to be no output?


suhangpro commented on August 15, 2024

Right, the code is for quantitative evaluation only and does not save results (the semantic segmentation code, though, does have an "--eval pred" option for this purpose).
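
If someone does want to dump the upsampled flow, the Middlebury .flo format is simple enough to write by hand. A hypothetical helper (write_flo is not part of pacnet), assuming the flow is a numpy array of shape (2, H, W):

import numpy as np

def write_flo(path, flow_uv):
    # flow_uv: numpy array of shape (2, H, W) holding the u and v components.
    u, v = flow_uv[0], flow_uv[1]
    h, w = u.shape
    with open(path, 'wb') as f:
        np.array([202021.25], dtype=np.float32).tofile(f)  # Middlebury magic number
        np.array([w, h], dtype=np.int32).tofile(f)         # width, height
        # Interleave u and v per pixel, row-major, as float32.
        np.stack((u, v), axis=-1).astype(np.float32).tofile(f)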

