
Debugging Error about pacnet (closed, 11 comments)

nvlabs commented on August 15, 2024
Debugging Error


Comments (11)

suhangpro commented on August 15, 2024

Are you able to run testing with the provided weights? For training, 11GB of memory should be enough for up to 8x upsampling, but might not be enough for 16x.
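
As a generic sanity check (not part of the pacnet code), you can confirm how much memory the visible GPU actually exposes to PyTorch:

import torch

# Query the first visible GPU; with CUDA_VISIBLE_DEVICES set, device 0 is the
# first GPU in that list. total_memory is reported in bytes.
props = torch.cuda.get_device_properties(0)
print('GPU:', props.name)
print('Total memory: %.2f GiB' % (props.total_memory / 1024 ** 3))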


josephdanielchang commented on August 15, 2024

That's odd. When I run testing, it gives a very similar error.

Command:
CUDA_VISIBLE_DEVICES=4 python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel

Output:

TEST LOADER START
TEST LOADER END

Model weights initialized from: weights_flow/x8_pac_weights_epoch_5000.pth
TEST START
BEFORE APPLY MODEL
BEFORE NET
AFTER NET
AFTER APPLY MODEL
BEFORE APPLY MODEL
BEFORE NET
Traceback (most recent call last):
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 362, in <module>
    main()
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 335, in main
    log_test = test(model, test_loader, device, last_epoch, init_lr, args.loss, perf_measures, args)                   # TEST
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 89, in test
    output = apply_model(model, lres, guide, args.factor)
  File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 23, in apply_model
    out = net(lres, guide)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/joseph/pacnet-master/task_jointUpsampling/models.py", line 245, in forward
    x = self.up_convts[i](x, guide_cur)
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/joseph/pacnet-master/pac.py", line 786, in forward
    self.output_padding, self.dilation, self.shared_filters, self.native_impl)
  File "/home/joseph/pacnet-master/pac.py", line 498, in pacconv_transpose2d
    shared_filters)
  File "/home/joseph/pacnet-master/pac.py", line 252, in forward
    output = torch.einsum('ijklmn,jokl->iomn', (in_mul_k, weight))
  File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 10.92 GiB total capacity; 5.74 GiB already allocated; 1.86 GiB free; 2.78 GiB cached)
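
When an OOM like this happens, it can help to log PyTorch's allocator state next to the existing BEFORE NET / AFTER NET prints to see how close the model already is to the 11 GiB limit. A minimal, generic sketch (the helper name is made up, not part of main.py):

import torch

# Generic diagnostic: report current and peak allocations on the active GPU.
def report_gpu_memory(tag=''):
    allocated = torch.cuda.memory_allocated() / 1024 ** 3
    peak = torch.cuda.max_memory_allocated() / 1024 ** 3
    print('[%s] allocated: %.2f GiB, peak: %.2f GiB' % (tag, allocated, peak))

# e.g. call report_gpu_memory('before net') right before out = net(lres, guide)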


suhangpro commented on August 15, 2024

This is indeed odd ... which PyTorch version are you using? The code was originally developed for 0.4, but there is an experimental branch for 1.4 which you might try out.


josephdanielchang commented on August 15, 2024

Using python >> import torch >> print(torch.__version__), mine is 1.1.0.
I am running the optical flow test on Sintel data with weights_flow/x8_pac_weights_epoch_5000.pth:

python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel

How many GB of GPU memory would you estimate is necessary to run the test program?


suhangpro commented on August 15, 2024

11GB GPUs should be enough for both training (with the exception of some 16x models) and testing. PyTorch versions above 1.0 are not supported by the master branch (I expect some test cases to fail as well). The th14 branch is to be used with version 1.4, but it has not been thoroughly tested.


josephdanielchang commented on August 15, 2024

So 1.0 should work then, correct? Should I downgrade and test again, or do you have other suggestions?


suhangpro commented on August 15, 2024

You can downgrade to 1.0 or upgrade to 1.4 (and use the th14 branch).
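
For reference, the two options roughly amount to the following commands (assuming pip can find a matching CUDA build for your platform; the branch name th14 is taken from the comment above):

pip install torch==1.0.0
# or, to use the experimental branch instead:
pip install torch==1.4.0
git checkout th14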


josephdanielchang commented on August 15, 2024

I downgraded to 1.0.0 and it still gives a GPU out-of-memory error when testing flow. Is the data root supposed to be --data-root data/sintel? There are a lot of folders under the data root; should I specify a particular folder?


suhangpro commented on August 15, 2024

@josephdanielchang I just tested on an 11GB GPU and found that indeed the 8x and 16x flow tests won't work. Sorry that I didn't provide clear information before. With an 11GB GPU, you are able to run all the depth experiments but only the 4x flow experiments.

The data path is correct as is.
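
One generic way to trim memory at evaluation time, in case the test path keeps autograd state around, is to run the network under torch.no_grad(); this is a standard PyTorch pattern, not a statement about how main.py is currently written:

import torch

def run_inference(net, lres, guide):
    # Hypothetical helper: disable gradient tracking so intermediate
    # activations are not retained during the forward pass.
    with torch.no_grad():
        return net(lres, guide)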


josephdanielchang commented on August 15, 2024

Thanks, it does work with 4x for flow. Follow-up question: where do I find the results for these upsampled flows after running the flow test on the Sintel flow data? I only find a folder exp/sintel with test.log and train.log, but no .flo files are generated anywhere. Is there supposed to be no output?


suhangpro commented on August 15, 2024

Right, the code is for quantitative evaluation only and does not save results (the semantic segmentation code, though, does have an "--eval pred" option for this purpose).
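
If someone does want to dump the upsampled flow, the Middlebury .flo format is simple enough to write by hand. A hypothetical helper (write_flo is not part of pacnet), assuming the flow is a numpy array of shape (2, H, W):

import numpy as np

def write_flo(path, flow_uv):
    # flow_uv: numpy array of shape (2, H, W) holding the u and v components.
    u, v = flow_uv[0], flow_uv[1]
    h, w = u.shape
    with open(path, 'wb') as f:
        np.array([202021.25], dtype=np.float32).tofile(f)  # Middlebury magic number
        np.array([w, h], dtype=np.int32).tofile(f)         # width, height
        # Interleave u and v per pixel, row-major, as float32.
        np.stack((u, v), axis=-1).astype(np.float32).tofile(f)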

