
coincheung / bisenet

1.4K 17.0 304.0 3.68 MB

Add bisenetv2. My implementation of BiSeNet

License: MIT License

Python 67.90% C 0.27% C++ 22.89% Cuda 6.12% CMake 2.51% Shell 0.31%
bisenet cityscapes pytorch cocostuff tensorrt ncnn openvino triton-inference-server ade20k

bisenet's People

Contributors

coincheung


bisenet's Issues

Question for Bisenet

Hi cheung

  1. When I run the command python evaluate.py, I get the following errors:

Blade:~/BiSeNet-master$ python evaluate.py
Traceback (most recent call last):
File "evaluate.py", line 5, in <module>
from model import BiSeNet
File "/home/keshi/BiSeNet-master/model.py", line 8, in <module>
import torchvision
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/__init__.py", line 1, in <module>
from torchvision import models
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/models/__init__.py", line 11, in <module>
from . import detection
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
from .faster_rcnn import *
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
from torchvision.ops import misc as misc_nn_ops
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/ops/__init__.py", line 1, in <module>
from .boxes import nms, box_iou
File "/home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 2, in <module>
from torchvision import _C
ImportError: /home/keshi/anaconda3/lib/python3.7/site-packages/torchvision/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at7getTypeERKNS_6TensorE

Could you tell me the versions of the torchvision module and Ubuntu that you used?

  2. When I run the command python diss/evaluate.py, I get the following errors:
    keshi@keshi-Blade:~/BiSeNet-master$ python diss/evaluate.py
    Traceback (most recent call last):
    File "diss/evaluate.py", line 4, in <module>
    from logger import setup_logger
    ImportError: cannot import name 'setup_logger' from 'logger' (/home/keshi/anaconda3/lib/python3.7/site-packages/logger/__init__.py)

I do not know why this happens. Could you tell me the reason?
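
An "undefined symbol" error when importing torchvision usually points to a torch/torchvision version mismatch. A quick, generic check of the installed versions (not a script from this repository) is:

    # Print the installed torch/torchvision versions to spot a mismatch.
    import torch
    print("torch:", torch.__version__)
    try:
        import torchvision
        print("torchvision:", torchvision.__version__)
    except ImportError as e:
        # This is where a mismatch typically surfaces; reinstalling matching
        # torch and torchvision versions usually fixes it.
        print("torchvision failed to import:", e)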

help! Cityscapes Test Server

Is anything wrong with the Cityscapes server? I can't get my result. In the 'Eval status' column it shows 'pending'; after a very long time it changes to 'failed'. I wonder if I made some mistake with the submission format. Can anyone help me?
my Submissions:
[WeChat screenshot of the submissions page]

Thank you very much for sharing

Thank you very much for sharing. The ARM and FFM modules do not seem to be implemented exactly as the original authors describe. Am I right about that? It looks like you added your own ideas to this part of the implementation.

Some errors I have met

Hi,
I have run the code but I get an error. Have you ever met this error?
RuntimeError: Error building extension 'inplace_abn': [1/5] :/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /data/train/BiSeNet2/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
FAILED: inplace_abn_cuda.cuda.o
:/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /data/train/BiSeNet2/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
/bin/sh: 1: :/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc: not found
[2/5] :/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /data/train/BiSeNet2/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
FAILED: inplace_abn_cuda_half.cuda.o
:/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /data/train/BiSeNet2/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
/bin/sh: 1: :/usr/local/cuda-9.0:/usr/local/cuda-9.0/bin/nvcc: not found
[3/5] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /data/train/BiSeNet2/modules/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o
[4/5] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/TH -isystem /home/xinyawang/anaconda2/envs/HSI/lib/python3.6/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-9.0:/usr/local/cuda-9.0/include -isystem /home/xinyawang/anaconda2/envs/HSI/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -O3 -c /data/train/BiSeNet2/modules/src/inplace_abn.cpp -o inplace_abn.o
ninja: build stopped: subcommand failed.
Thanks, your code is pretty useful.

about pretrained diss_model weights

Hi, buddy, huge respect to you. Your implementation is great.
I am wondering if you could upload the pretrained weights of the diss_model mentioned in the "diss this paper" section. The computation resources I have access to are limited, so could you please help me with this? It would be really nice of you to upload the pretrained weights.

There is a question about the BN layer.

Thank you very much for your open source sharing.
Why do you write a separate script for the BN layer? Isn't there a BN layer already defined in PyTorch? Is there something wrong with PyTorch's BN layer? When I use PyTorch's BN layer and the batch size is too small, the model tests very badly. I have never encountered this problem in other deep learning frameworks.
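
For context, the custom layer in this repo is a synchronized, in-place batch norm; with very small per-GPU batches, plain nn.BatchNorm2d statistics become noisy. A rough sketch of the stock PyTorch alternative, synchronized BN, shown here on a stand-in module rather than the repo's model:

    # Sketch: convert ordinary BatchNorm2d layers to synchronized BN so the
    # statistics are pooled across GPUs (helps when the per-GPU batch is tiny).
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    # SyncBatchNorm only synchronizes when training under DistributedDataParallel.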

Continue training

Hi, I don't know how to continue training from a checkpoint we saved. Please help me.
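
A minimal save/resume sketch in plain PyTorch, with a stand-in model and a hypothetical checkpoint name, not the repository's train.py:

    import torch
    import torch.nn as nn

    net = nn.Conv2d(3, 19, 1)                    # stand-in for the real BiSeNet
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

    # Save a checkpoint (e.g. every few thousand iterations inside the training loop).
    torch.save({'it': 1000,
                'model': net.state_dict(),
                'optim': optimizer.state_dict()}, 'ckpt.pth')   # hypothetical file name

    # To continue training: rebuild net/optimizer the same way, then restore their states.
    state = torch.load('ckpt.pth', map_location='cpu')
    net.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optim'])
    start_it = state['it'] + 1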

run with cpu

Thank you for sharing.
When I run with CPU, something goes wrong in "functions.py":

File "inference.py", line 13, in <module>
from model import BiSeNet
File "/data/Parker/BiSeNet-master/diss/model.py", line 10, in <module>
from resnet import Resnet18
File "/data/Parker/BiSeNet-master/diss/resnet.py", line 9, in <module>
from modules.bn import InPlaceABNSync as BatchNorm2d
File "/data/Parker/BiSeNet-master/diss/modules/__init__.py", line 1, in <module>
from .bn import ABN, InPlaceABN, InPlaceABNSync
File "/data/Parker/BiSeNet-master/diss/modules/bn.py", line 10, in <module>
from .functions import *
File "/data/Parker/BiSeNet-master/diss/modules/functions.py", line 18, in <module>
extra_cuda_cflags=["--expt-extended-lambda"]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 645, in load
is_python_module)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
with_cuda=with_cuda)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 838, in _write_ninja_file_and_build
check_compiler_abi_compatibility(os.environ.get('CXX', 'c++'))
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 162, in check_compiler_abi_compatibility
if not check_compiler_ok_for_platform(compiler):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 138, in check_compiler_ok_for_platform
which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
File "/opt/conda/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/opt/conda/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.

It looks like the code uses a CUDA program.
Could you tell me how to run inference on the CPU?
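
The failure happens while PyTorch tries to JIT-compile the inplace_abn extension (the log shows it cannot even find a c++ compiler). One possible CPU-only workaround, sketched under the assumption that you rebuild the model with stock layers (this is not the repository's supported path), is to substitute nn.BatchNorm2d and map the weights to the CPU:

    import torch
    import torch.nn as nn

    # Stand-in for: from modules.bn import InPlaceABNSync as BatchNorm2d
    BatchNorm2d = nn.BatchNorm2d

    # A tiny stand-in block just to show the stock layers run fine on CPU.
    block = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), BatchNorm2d(8), nn.ReLU())
    with torch.no_grad():
        y = block(torch.randn(1, 3, 64, 64))
    print(y.shape)

    # When loading pretrained weights on a CPU-only machine, map them explicitly:
    #   state = torch.load('model_final.pth', map_location='cpu')   # hypothetical file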

TCP address is taken after training

Hi, thanks for your great work.
I have a question about distributed training. When I use distributed training and the program finishes, if I try to re-run distributed training, the error log says the TCP address is already taken. Is there any way to free the TCP address without rebooting the machine? I'm not very familiar with distributed training in PyTorch; thanks for your help.
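
The usual causes are a leftover worker process still holding the rendezvous port, or reusing a port that is still in TIME_WAIT. A generic way to pick a free port and pass it to the launcher (the port-picking helper is illustrative, not part of this repo):

    import socket

    def find_free_port() -> int:
        # Bind to port 0 so the OS chooses an unused port, then report it.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(('', 0))
            return s.getsockname()[1]

    print(find_free_port())
    # Then launch with that port, e.g.:
    #   python -m torch.distributed.launch --nproc_per_node=2 --master_port <PORT> train.py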

about the module "inplace_abn"

I trained the model with only one GPU.

When I train the code with "CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 train.py", there are some errors:

/home/min/JHW/code/modules/src/inplace_abn_cuda.cu(318): error: no instance of overloaded function "thrust::transform_if" matches the argument list
            argument types are: (<error-type>, thrust::device_ptr<float>, thrust::device_ptr<float>, thrust::device_ptr<float>, lambda [](const float &)->float, lambda [](const float &)->__nv_bool)
          detected during instantiation of "void elu_backward_impl(T *, T *, int64_t) [with T=float]" 
(330): here

12 errors detected in the compilation of "/tmp/tmpxft_00002758_00000000-7_inplace_abn_cuda.cpp1.ii".
ninja: build stopped: subcommand failed.

However, when I trained it with “CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=2 train.py”, the traceback is:

Traceback (most recent call last):
  File "train.py", line 6, in <module>
    from model import BiSeNet
  File "/home/min/JHW/code/model.py", line 10, in <module>
    from resnet import Resnet18
  File "/home/min/JHW/code/resnet.py", line 9, in <module>
    from modules.bn import InPlaceABNSync as BatchNorm2d
  File "/home/min/JHW/code/modules/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/min/JHW/code/modules/bn.py", line 10, in <module>
    from .functions import *
  File "/home/min/JHW/code/modules/functions.py", line 18, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/home/min/anaconda2/envs/py3/lib/python3.5/site-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/home/min/anaconda2/envs/py3/lib/python3.5/site-packages/torch/utils/cpp_extension.py", line 825, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/min/anaconda2/envs/py3/lib/python3.5/site-packages/torch/utils/cpp_extension.py", line 964, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/min/anaconda2/envs/py3/lib/python3.5/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'inplace_abn'

So, I am not sure whether it is because of Ninja or because of the inplace_abn module. I guess it may be related to the versions of CUDA and cuDNN (I use 9.0 and 7.4.1).

Thank you for your help!

Question about lr_mul in optimizer.py

In the README you say that you 'use a 10 times larger lr at the model output layers.'
When I look at your model.py, I see that you set lr_mul to True for these layers.
Why do you then multiply the lr by 10 for layers where lr_mul is set to False in the step function of your optimizer?

if pg.get('lr_mul', False):
  pg['lr'] = self.lr * 10
else:
  pg['lr'] = self.lr

To my understanding, it should be exactly the other way around:

if pg.get('lr_mul', False):
  pg['lr'] = self.lr
else:
  pg['lr'] = self.lr * 10
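
For reference, a minimal sketch of the behaviour the question expects (flagged groups get the 10x rate), written against plain torch.optim rather than the repository's optimizer class:

    import torch
    import torch.nn as nn

    backbone = nn.Conv2d(3, 16, 3)
    head = nn.Conv2d(16, 19, 1)
    base_lr = 1e-2

    # Extra keys such as 'lr_mul' are allowed in parameter groups and simply stored.
    optimizer = torch.optim.SGD([
        {'params': backbone.parameters(), 'lr_mul': False},
        {'params': head.parameters(), 'lr_mul': True},    # output layers
    ], lr=base_lr, momentum=0.9)

    # Re-apply the schedule each step so flagged groups keep the larger rate.
    for pg in optimizer.param_groups:
        pg['lr'] = base_lr * 10 if pg.get('lr_mul', False) else base_lr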

The technique of reproducing the author's accuracy

Although this is not the first time I have dropped by the author's repositories, I would like to thank the author again for the code.
I have almost reproduced the accuracy of the open-source code here.
Single-scale test accuracy: 76.17%
Multi-scale test accuracy: 77.90%
It is very easy to get this precision: I downloaded the code directly, and my training environment is similar to the one mentioned in the author's code, so I could run the code directly without any modification.
Be sure to remember to train directly, without any modification.
To add, my GPUs' memory is a little small: each GPU's batch_size is 6, so the total across the two cards is 12. So it is normal that my accuracy is a little bit lower.

about the mIoU?

@CoinCheung hi, thanks for your work. When I reproduce your result with the default hyper-parameters in this repo, the mIoU is 75.68% with your evaluate.py, which does not reach your reported 78+%. Can you give me some suggestions?

about the cityscapes_info.json

I trained on the Cityscapes dataset and it worked. How can I train on my own data, and how do I generate my own JSON file like cityscapes_info.json? Thanks a lot.
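
The authoritative schema is whatever cityscapes_info.json in this repo actually contains, so mirror that file; the sketch below only illustrates the general idea of mapping raw label ids in your masks to contiguous training ids, and the field names are assumptions rather than the repo's exact keys:

    import json

    # Illustrative label table for a custom dataset; adjust the fields to match
    # the structure of the repo's cityscapes_info.json.
    labels = [
        {'name': 'background', 'id': 0, 'trainId': 255, 'ignoreInEval': True},
        {'name': 'road',       'id': 1, 'trainId': 0,   'ignoreInEval': False},
        {'name': 'building',   'id': 2, 'trainId': 1,   'ignoreInEval': False},
    ]

    with open('my_dataset_info.json', 'w') as f:    # hypothetical file name
        json.dump(labels, f, indent=2)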

inference time

Could anyone kindly share the inference time of this implementation and the corresponding GPU used? Thanks a lot!
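
For anyone measuring it themselves, a minimal GPU timing sketch (with a stand-in module in place of the real network, plus warm-up and synchronization so the numbers are meaningful):

    import time
    import torch
    import torch.nn as nn

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    net = nn.Conv2d(3, 19, 3, padding=1).to(device).eval()   # stand-in for BiSeNet
    x = torch.randn(1, 3, 1024, 2048, device=device)

    with torch.no_grad():
        for _ in range(10):                  # warm-up
            net(x)
        if device == 'cuda':
            torch.cuda.synchronize()         # wait for queued kernels
        t0 = time.time()
        for _ in range(20):
            net(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    print('ms per forward:', (time.time() - t0) / 20 * 1000)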

Diss Back

Hello,

I am the author of this paper BiSeNet.

First, thanks for your awesome practice. The tricks mentioned in the README file are our common tricks in the implementation of semantic segmentation algorithms.

However, I don't agree with your "diss" part, because the result you mention is the performance compared with other non-real-time algorithms. Actually, this paper, several months of hard work by me and my collaborators, mainly focuses on the real-time scenario, which was also our original motivation. It is precisely because of the constraints of the real-time scenario that we designed the two-branch architecture. In the non-real-time setting, without constraints on computational resources, we would not need to design this type of architecture. Furthermore, for the real-time results, we did not use evaluation tricks such as multi-scale and flip testing. If you read the paper carefully, I think you would not make these mistakes. Finally, at the request of the reviewers and to validate the effectiveness of this architecture, we also made a detailed comparison with other non-real-time algorithms.

Next, I admit that this paper is not a significant improvement. However, the advance of the research community depends on the hard work of each researcher. I think each of us, as a qualified researcher, should show enough tolerance and respect to these efforts. Of course, I also sincerely hope each researcher, including you and me, can make major progress to push forward the whole community.

Finally, send one quote I just saw tonight to you: "Be kind, always. Everyone you meet is fighting a battle you know nothing about."

Train on single GPU

Thanks for your sharing. Did you try training BiSeNet using one GPU? Besides, could you please share your model trained on the Cityscapes dataset?

Agree with you

Therefore, I feel that the real contribution of this paper is the successful usage of the training and evaluation tricks, though the authors made little mention of these tricks and only advocated their model structure in the paper.

This phenomenon is very common now, which is very bad!!

How to test on my own pictures with your pretrained model?

Hello, thanks for your work and your humor!

Now I need to get segmentation results on my own pictures (.bmp), and I need to use your pretrained model. Can you give me some advice? (My coding ability is very poor right now.) Thanks very much!
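
A generic single-image inference sketch; the normalization constants, checkpoint name, and class count are placeholders to adapt, not values taken from this repo's scripts:

    import torch
    import torchvision.transforms as T
    from PIL import Image

    # In practice, build the model the way the repo's evaluate/demo script does and
    # load the pretrained weights, e.g. (hypothetical names):
    #   from model import BiSeNet
    #   net = BiSeNet(n_classes=19)
    #   net.load_state_dict(torch.load('model_final.pth', map_location='cpu'))
    net = torch.nn.Conv2d(3, 19, 1)          # stand-in so this sketch runs on its own
    net.eval()

    to_tensor = T.Compose([
        T.ToTensor(),
        T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),   # ImageNet stats
    ])

    img = Image.new('RGB', (512, 512))       # replace with Image.open('your_image.bmp')
    x = to_tensor(img).unsqueeze(0)
    with torch.no_grad():
        out = net(x)                         # the real model may return a tuple; take the first output
    pred = out.argmax(dim=1).squeeze(0)      # per-pixel class ids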

On COCO dataset

Do you have any pretrained weights for the COCO dataset? If not, do you have a train.py or a script to load the COCO dataset? I appreciate any help.

Release training log

Thanks for your great work. Is it possible to release the training log together with the pretrained weights? That would be easier for further comparison.

loss problem

In OhemCELoss you set a parameter named score_thres; what is the role of this parameter?
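
Roughly, score_thres marks a pixel as "easy" once the predicted probability of its ground-truth class exceeds the threshold; online hard example mining (OHEM) then averages the loss only over the remaining hard pixels, subject to a minimum count. A simplified sketch of that idea, not the repository's exact OhemCELoss:

    import torch
    import torch.nn.functional as F

    def ohem_ce(logits, labels, score_thres=0.7, n_min=100, ignore_index=255):
        # Per-pixel cross entropy, unreduced so hard pixels can be selected.
        loss = F.cross_entropy(logits, labels, ignore_index=ignore_index,
                               reduction='none').view(-1)
        # A pixel is "hard" if its loss exceeds -log(score_thres), i.e. the predicted
        # probability of the true class is below score_thres.
        thresh = -torch.log(torch.tensor(score_thres, device=logits.device))
        n_min = min(n_min, loss.numel() - 1)
        loss, _ = torch.sort(loss, descending=True)
        if loss[n_min] > thresh:
            loss = loss[loss > thresh]
        else:
            loss = loss[:n_min]
        return loss.mean()

    logits = torch.randn(2, 19, 8, 8)
    labels = torch.randint(0, 19, (2, 8, 8))
    print(ohem_ce(logits, labels, n_min=32))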

Two-class segmentation, but the loss does not go down

Hi,
Thank you for your sharing.
I changed your network to do a two-class segmentation, but met some problems.
What I changed is:
dataloader: background as 0, target as 1;
lr_start = 2.5e-6; warmup_start_lr = 1e-7;
I have 7000 images to train;
set batch_size=6, epochs=20;
use resnet18 as backbone;
train on a single GPU;

What bothers me is the loss:
The loss drops from 2.3 to 1.7 in the first 2.5k steps, but after that it begins to oscillate with large amplitude and goes up. I adjusted the learning rate but that did not work. Could you give me some advice?
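
One thing worth checking in a two-class setup is foreground/background imbalance; if almost every pixel is background, an unweighted loss can behave erratically. A hedged sketch of class-weighted cross entropy (the pixel counts below are placeholders to compute from your own masks):

    import torch
    import torch.nn as nn

    pixel_counts = torch.tensor([9_000_000., 1_000_000.])   # background, target (placeholders)
    weights = pixel_counts.sum() / (2 * pixel_counts)        # inverse-frequency weights
    criterion = nn.CrossEntropyLoss(weight=weights, ignore_index=255)

    logits = torch.randn(4, 2, 64, 64)                        # N x C x H x W
    labels = torch.randint(0, 2, (4, 64, 64))
    print(criterion(logits, labels))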

Model is not saved during training

Hi there, I am a newbie at PyTorch. I did not find any code for saving models during training; only the final model is saved after 80k iterations. Is that correct? What if the training process is interrupted by accident? Where can I find the log or restore the trained parameters?

By the way, I am using 2 Tesla K80s to train the model, and it takes almost 2 days for the first 33950 iterations. However, the author claims it takes about 1 day using 2 1080 Tis. What am I missing?

About the inference time

I have tested the diss version of your model; its speed is quite slow, around 50 ms for a 1024x2048 input. Your results are very good, but some things are still not clear to me.
