problem about yolof HOT 13 CLOSED

megvii-model commented on July 26, 2024
problem

from yolof.

Comments (13)

chensnathan commented on July 26, 2024

Hi,

You can refer to #12 to re-install cvpods.

BTW, we are working on a neat implementation in this pr (#13). It will be merged when it is ready.

xiexu0210 commented on July 26, 2024

Thanks for your reply, but I have another problem. Training fails with the following error:

ERROR [03/30 20:32:14 c2.engine.base_runner]: Exception during training:
Traceback (most recent call last):
  File "/DATA/xiexu/yf/YOLOF/cvpods/engine/base_runner.py", line 84, in train
    self.run_step()
  File "/DATA/xiexu/yf/YOLOF/cvpods/engine/base_runner.py", line 185, in run_step
    loss_dict = self.model(data)
  File "/home/xiexu/anaconda3/envs/yfb/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xiexu/anaconda3/envs/yfb/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/xiexu/anaconda3/envs/yfb/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../yolof_base/yolof.py", line 131, in forward
    anchors, pred_anchor_deltas, gt_instances)
  File "/home/xiexu/anaconda3/envs/yfb/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "../yolof_base/yolof.py", line 245, in get_ground_truth
    box_pred = self.box2box_transform.apply_deltas(box_delta, all_anchors)
  File "/DATA/xiexu/yf/YOLOF/cvpods/modeling/box_regression.py", line 93, in apply_deltas
    deltas).all().item(), "Box regression deltas become infinite or NaN!"
AssertionError: Box regression deltas become infinite or NaN!
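
The assertion at the bottom of this traceback fires once the predicted box deltas already contain inf or NaN. A minimal sketch, in plain Python, of a check that could be run on logged loss values to notice the problem as soon as it appears; `find_non_finite` and the loss names in the example are hypothetical and not part of cvpods:

```python
import math


def find_non_finite(losses):
    """Return the names of entries whose value is NaN or +/-inf."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Hypothetical loss values for two iterations.
healthy = {"loss_cls": 0.41, "loss_box_reg": 0.27}
diverged = {"loss_cls": float("nan"), "loss_box_reg": float("inf")}

print(find_non_finite(healthy))
print(find_non_finite(diverged))
```

An empty list means every logged value is finite; any name in the list marks the iteration where it is worth stopping and inspecting the learning rate and inputs.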

walynlee commented on July 26, 2024

Hello, when I run 'pods_train --num-gpus 1' I get the same problem: KeyError: "No object named 'RandomShift' found in 'transforms' registry!". I also followed #12, and the output ended with:
.....Requirement already satisfied: certifi>=2020.06.20 in ./.local/lib/python3.6/site-packages (from matplotlib>=3.1.1->lvis) (2020.6.20)
Requirement already satisfied: pillow>=6.2.0 in ./.local/lib/python3.6/site-packages (from matplotlib>=3.1.1->lvis) (7.2.0)
Installing collected packages: lvis
Successfully installed lvis-0.5.3
Everything installed successfully, but it still reports the same error (No object named 'RandomShift' found in 'transforms' registry!) and the problem is not solved.
How did the poster above solve the problem? I hope someone can tell me. Thanks so much.

xiexu0210 commented on July 26, 2024

Hi, I think you need to reinstall the environment. The steps are as follows:
Environment: pytorch==1.6, python==3.7
git clone https://github.com/thomasbrandon/mish-cuda
cd mish-cuda
python setup.py build install
cd ..
git clone git@github.com:megvii-model/YOLOF.git
cd YOLOF/
python setup.py develop
cd ./playground/detection/coco/yolof/yolof.res50.C5.1x
pods_train --num-gpus 2

chensnathan commented on July 26, 2024

@xiexu0210 Hi, have you modified any code in the repo? And does the bug occur every time you run YOLOF?

chensnathan commented on July 26, 2024

@walynlee Hi, maybe you should uninstall the previous cvpods first, then re-install YOLOF locally following the steps.
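
One way to confirm which cvpods installation Python actually resolves is to ask the import system where the package lives. This is only a sketch using the standard library; `package_location` is a hypothetical helper name, and the check says nothing about whether that copy is the right version:

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# "cvpods" is the package discussed in this thread; this prints None if it is not installed.
print(package_location("cvpods"))
```

If the printed path points somewhere other than the YOLOF checkout, an older cvpods is probably still being imported and should be uninstalled first.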

walynlee commented on July 26, 2024

@chensnathan Hello, I haven't modified any code yet, and it already reports errors. This is my environment. Should I uninstall PyTorch 1.7, install PyTorch 1.6, and update my Python version?

Environment info:


sys.platform linux
Python 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
numpy 1.19.3
cvpods 0.1 @/home/a303/cvpods/cvpods
cvpods compiler GCC 7.5
cvpods CUDA compiler 10.0
cvpods arch flags /home/a303/cvpods/cvpods/_C.cpython-36m-x86_64-linux-gnu.so; cannot find cuobjdump
cvpods_ENV_MODULE
PyTorch 1.7.0 @/home/a303/.local/lib/python3.6/site-packages/torch
PyTorch debug build True
CUDA available True
GPU 0 GeForce RTX 2080 Ti
CUDA_HOME :/usr/local/cuda-10.0
Pillow 7.2.0
torchvision 0.8.1 @/home/a303/.local/lib/python3.6/site-packages/torchvision
torchvision arch flags /home/a303/.local/lib/python3.6/site-packages/torchvision/_C.so; cannot find cuobjdump
cv2 4.4.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  • CuDNN 7.6.5
  • Magma 2.5.2

chensnathan commented on July 26, 2024

Could you post your training log?

xiexu0210 commented on July 26, 2024

Hi, the error was reported only once.
I ran your code with the number of GPUs changed to 3, and the result was very bad.
image

I have another question: how can I debug your code? So far I have only run it from the command line.

chensnathan commented on July 26, 2024

The model diverges during your training. When you use fewer GPUs, you should warm up for more iterations.
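
No exact rule is given here, so the following is only a sketch under stated assumptions: warmup length is scaled in inverse proportion to the GPU count, and the learning rate ramps linearly. The function names, the 1500-iteration baseline, the 8-GPU reference, and the 0.001 start factor are all hypothetical and not taken from the YOLOF configs:

```python
def scaled_warmup_iters(base_iters, base_gpus, gpus):
    """Lengthen warmup in inverse proportion to the GPU count (an assumed heuristic)."""
    return int(round(base_iters * base_gpus / gpus))


def linear_warmup_factor(iteration, warmup_iters, start_factor=0.001):
    """Learning-rate multiplier that rises linearly from start_factor to 1.0."""
    if iteration >= warmup_iters:
        return 1.0
    alpha = iteration / warmup_iters
    return start_factor * (1 - alpha) + alpha


# Hypothetical numbers: a schedule tuned for 8 GPUs, re-run on 2 GPUs.
warmup = scaled_warmup_iters(base_iters=1500, base_gpus=8, gpus=2)
print(warmup)
print(linear_warmup_factor(0, warmup), linear_warmup_factor(warmup, warmup))
```

With fewer GPUs the total batch size is usually smaller as well, so the base learning rate may also need lowering; that choice is left out of the sketch.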

qijindao commented on July 26, 2024

Hi, I hit the same problem after my model had been training for a short time: 'Box regression deltas become infinite or NaN!' suddenly occurs. How did you solve it?

tangjiuqi097 commented on July 26, 2024

Hi, if you debug with PyCharm, open Run/Debug Configurations and:

set the working directory to the directory of the experiment you want to run, e.g. YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x

set the script path to YOLOF/tools/train_net.py.
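
The same two settings can be approximated outside PyCharm with a small launcher, which is handy for pdb or other debuggers. This is a sketch, not part of YOLOF: `run_script_in_dir` is a hypothetical helper, the paths are relative to wherever YOLOF was cloned, and passing `--num-gpus 1` to train_net.py is an assumption based on the pods_train command used earlier in the thread:

```python
import os
import runpy
import sys


def run_script_in_dir(script_path, work_dir, args=()):
    """Run a Python script as __main__ from work_dir, like a debugger run configuration."""
    script_path = os.path.abspath(script_path)
    old_cwd, old_argv = os.getcwd(), sys.argv
    os.chdir(work_dir)
    sys.argv = [script_path, *args]
    try:
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        # Restore the process state so the caller is unaffected.
        os.chdir(old_cwd)
        sys.argv = old_argv


# Paths as described above; the call is skipped when they do not exist.
script = "YOLOF/tools/train_net.py"
work_dir = "YOLOF/playground/detection/coco/yolof/yolof.res50.C5.1x"
if os.path.isfile(script) and os.path.isdir(work_dir):
    run_script_in_dir(script, work_dir, ["--num-gpus", "1"])
```

Running with a single process keeps breakpoints simple; multi-GPU launches spawn worker processes that a debugger attached to the parent may not follow.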

shenhaibb commented on July 26, 2024

image

I configured it like this, so why does the problem still occur?
