xy-guo / liga-stereo
Code for LIGA-Stereo Detector, ICCV'21
License: Apache License 2.0
Hello,
Thanks a lot for your wonderful work. I followed the instructions to install mmdet and mmcv, but I get the error "cannot import name 'MultiScaleDeformableAttention' from 'mmcv.cnn.bricks.transformer'". It seems that this class is not defined in mmcv.cnn.
I tried other versions, but none of them satisfies all the requirements of the repository. Could you please share the versions of mmcv and mmdet that you used in your project?
Thanks in advance. Hoping to hear from you soon.
Best.
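Before trying more version combinations, it can help to check directly whether the installed mmcv exposes the class named in the error. A stdlib-only sketch; the helper names probe_symbol and module_version are hypothetical, not part of mmcv or mmdet:

```python
import importlib
from typing import Optional


def probe_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `symbol` is an attribute of `module_name`.

    Behaves like `from module_name import symbol` for classes and functions,
    but returns False instead of raising when the import fails.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the module itself is missing or fails to import
    return hasattr(module, symbol)


def module_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if unavailable."""
    try:
        return getattr(importlib.import_module(module_name), "__version__", None)
    except ImportError:
        return None


if __name__ == "__main__":
    print("mmcv version:", module_version("mmcv"))
    print("mmdet version:", module_version("mmdet"))
    # The symbol from the error message above.
    print("MultiScaleDeformableAttention importable:",
          probe_symbol("mmcv.cnn.bricks.transformer", "MultiScaleDeformableAttention"))
```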
Hi! Thanks for sharing your awesome code.
But I have a problem when running this code.
My error messages:
data/kitti/training/image_2/001773.png
data/kitti/training/image_2/001816.png
data/kitti/training/image_2/002829.png
data/kitti/training/image_3/001773.png
data/kitti/training/image_3/001816.png
data/kitti/training/image_3/002829.png
{'NAME': 'filter_truncated', 'AREA_RATIO_THRESH': None, 'AREA_2D_RATIO_THRESH': None, 'GT_TRUNCATED_THRESH': 0.98}
filter truncated ratio: null 3d boxes [[ 2.99 -3.87 -0.66499996 4.43 1.84 1.75
-0.2907964 ]] flipped False image idx 890 frame_id 001773
/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
data/kitti/training/image_2/004052.png
data/kitti/training/image_3/004052.png
Traceback (most recent call last):
File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/home/users/gaoshiyu01/anaconda3/envs/liga5/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/gaoshiyu01/anaconda3/envs/liga5/bin/python', '-u', 'tools/train.py', '--local_rank=1', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', './configs/stereo/kitti_models/liga.3d-and-bev.yaml', '--exp_name', 'test1']' died with <Signals.SIGSEGV: 11>.
This seems to be a common bug caused by mmdet, so I followed the instructions in the mmdet bug report and checked the libraries used for running and for compiling with nvcc. Everything seems all right, but I still have no idea how to fix it. Could you please provide more info? Thanks a lot :)
My environment:
Name Version Build Channel
libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
addict 2.4.0 pypi_0 pypi
blas 1.0 mkl defaults
ca-certificates 2022.07.19 h06a4308_0 defaults
certifi 2022.6.15 py37h06a4308_0 defaults
cudatoolkit 10.1.243 h6bb024c_0 defaults
cycler 0.11.0 pypi_0 pypi
cython 0.29.32 pypi_0 pypi
easydict 1.9 pypi_0 pypi
fire 0.4.0 pypi_0 pypi
fonttools 4.37.2 pypi_0 pypi
freetype 2.11.0 h70c0345_0 defaults
future 0.18.2 pypi_0 pypi
giflib 5.2.1 h7b6447c_0 defaults
imageio 2.21.3 pypi_0 pypi
importlib-metadata 4.12.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561 defaults
jpeg 9e h7f8727e_0 defaults
kiwisolver 1.4.4 pypi_0 pypi
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 defaults
libdeflate 1.8 h7f8727e_5 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libpng 1.6.37 hbc83047_0 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
libtiff 4.4.0 hecacb30_0 defaults
libwebp 1.2.2 h55f646e_0 defaults
libwebp-base 1.2.2 h7f8727e_0 defaults
liga 0.1.0+0 dev_0
llvmlite 0.39.1 pypi_0 pypi
lz4-c 1.9.3 h295c915_1 defaults
matplotlib 3.5.3 pypi_0 pypi
mkl 2021.4.0 h06a4308_640 defaults
mkl-service 2.4.0 py37h7f8727e_0 defaults
mkl_fft 1.3.1 py37hd3c417c_0 defaults
mkl_random 1.2.2 py37h51133e4_0 defaults
mmcv-full 1.2.1 pypi_0 pypi
mmdet 2.6.0 dev_0
mmpycocotools 12.0.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3 defaults
networkx 2.6.3 pypi_0 pypi
ninja 1.10.2 h06a4308_5 defaults
ninja-base 1.10.2 hd09550d_5 defaults
numba 0.56.2 pypi_0 pypi
numpy 1.21.5 py37h6c91a56_3 defaults
numpy-base 1.21.5 py37ha15fc14_3 defaults
opencv-python 4.6.0.66 pypi_0 pypi
openssl 1.1.1q h7f8727e_0 defaults
packaging 21.3 pypi_0 pypi
pillow 9.2.0 py37hace64e9_1 defaults
pip 22.1.2 py37h06a4308_0 defaults
protobuf 3.20.1 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.7.13 h12debd9_0 defaults
python-dateutil 2.8.2 pypi_0 pypi
pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
pywavelets 1.3.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1 defaults
scikit-image 0.19.3 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
setuptools 59.8.0 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1 defaults
spconv 1.2.1 pypi_0 pypi
sqlite 3.39.2 h5082296_0 defaults
tensorboardx 2.5.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
terminaltables 3.1.10 pypi_0 pypi
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torchvision 0.7.0 py37_cu101 pytorch
tqdm 4.64.1 pypi_0 pypi
typing-extensions 4.3.0 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0 defaults
xz 5.2.5 h7f8727e_1 defaults
yapf 0.32.0 pypi_0 pypi
zipp 3.8.1 pypi_0 pypi
zlib 1.2.12 h5eee18b_3 defaults
zstd 1.5.2 ha4553b6_0 defaults
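Two different failures appear in this thread: here the launcher reports a child that "died with <Signals.SIGSEGV: 11>", while the Docker report further down ends with "returned non-zero exit status 1". The first almost always means a crash in native code (for example a compiled C++/CUDA extension) rather than a Python exception; the second is an ordinary Python error. On POSIX, subprocess encodes death-by-signal as a negative return code, which this small sketch decodes (the helper name is hypothetical):

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Explain a subprocess return code.

    On POSIX, a child killed by signal N is reported as returncode -N;
    torch.distributed.launch prints this as e.g. <Signals.SIGSEGV: 11>.
    """
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"unknown signal {-returncode}"
    return f"killed by {name}"


if __name__ == "__main__":
    for rc in (0, 1, -11):
        print(rc, "->", describe_returncode(rc))
```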
Hi Xiaoyang! Thanks for your great work.
In the Introduction of LIGA-Stereo, you mentioned
'Comparing with traditional knowledge distillation for recognition tasks, we did not take the final erroneous classification and regression predictions from the LiDAR model as “soft” targets, which we found benefits little for training stereo detection networks.'
Could you please elaborate on your implementation process and experimental results?
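For reference, the "traditional" soft-target distillation that the quoted passage contrasts with is usually the temperature-scaled loss of Hinton et al. (2015): the student matches the teacher's softened class distribution. A minimal pure-Python sketch of that textbook loss for a single sample; it is illustrative only and not code from LIGA-Stereo, which according to the quote does not use such targets:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kd_loss(student_logits: Sequence[float],
                        teacher_logits: Sequence[float],
                        temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    print(soft_target_kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```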
Hi, thanks for your great work!
I have a question about the coordinate system.
I noticed that stereo_kitti_dataset.py introduces a pseudo-LiDAR coordinate system.
Why does it use rect_to_lidar_pseudo rather than rect_to_lidar? Is there any difference in labeling between the stereo and monocular settings?
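Some background that may help frame the question: in KITTI, the rectified camera frame has x pointing right, y down and z forward, while the Velodyne LiDAR frame has x forward, y left and z up, and the real conversion between them also applies the calibration-dependent R0_rect and Tr_velo_to_cam. One plausible reading of rect_to_lidar_pseudo is a frame with LiDAR-style axes but without those per-sequence extrinsics. That reading is an assumption, so check the actual function in the repository. Under that assumption, the mapping is a pure axis permutation (function names hypothetical):

```python
from typing import Tuple

Point = Tuple[float, float, float]


def rect_to_pseudo_lidar(p: Point) -> Point:
    """Rectified camera (x right, y down, z forward) -> pseudo-LiDAR (x forward, y left, z up).

    Assumption: a pure axis permutation, no extrinsic rotation or translation.
    """
    x, y, z = p
    return (z, -x, -y)


def pseudo_lidar_to_rect(p: Point) -> Point:
    """Inverse of rect_to_pseudo_lidar."""
    x, y, z = p
    return (-y, -z, x)


if __name__ == "__main__":
    # A point 1 m right, 2 m below and 3 m in front of the camera.
    print(rect_to_pseudo_lidar((1.0, 2.0, 3.0)))
```

Because no calibration matrices are involved, such a frame is identical for every image, which is one possible motivation for using it in a stereo-only pipeline.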
Thanks for your great work~
When I run the following commands:
python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only
I get an error:
No module named liga.datasets.kitti.kitti_dataset
I found that there are only stereo_kitti_dataset.py and lidar_kitti_dataset.py in the path: liga/datasets/kitti/
Any suggestions would be deeply appreciated!
Thanks again.
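Rather than guessing module names from the README, it can help to list what the installed package actually contains. A stdlib-only sketch (the helper name list_submodules is hypothetical); note that looking up a dotted name imports its parent packages:

```python
import importlib.util
import pkgutil
from typing import List


def list_submodules(package: str) -> List[str]:
    """Return the names of modules found directly inside `package`."""
    try:
        spec = importlib.util.find_spec(package)
    except ModuleNotFoundError:
        return []  # a parent package is missing
    if spec is None or spec.submodule_search_locations is None:
        return []  # not installed, or a plain module rather than a package
    return sorted(info.name
                  for info in pkgutil.iter_modules(spec.submodule_search_locations))


if __name__ == "__main__":
    print(list_submodules("liga.datasets.kitti"))
```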
If you hit build errors about <THC/THC.h>, you can make the following modifications:
uncomment this line:
define a new ceil_div function:
int ceil_div(int a, int b){
return (a + b - 1) / b;
}
replace this line with:
dim3 grid(std::min(ceil_div((long)(output_size / 2), 512), 4096));
replace this line with:
dim3 grid(std::min(ceil_div((long)(grad.numel()), 512) , 4096));
replace THCudaCheck(cudaGetLastError());
with AT_CUDA_CHECK(cudaGetLastError());
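The replacement grid lines launch ceil(n / 512) blocks of 512 threads, capped at 4096 blocks. The same arithmetic in Python, useful for sanity-checking the patch; whether the cap is safe for very large inputs depends on the kernel covering the remaining elements with a grid-stride loop, which is an assumption not verified here:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b, as in the C++ helper."""
    return (a + b - 1) // b


def grid_size(num_elements: int, threads_per_block: int = 512, max_blocks: int = 4096) -> int:
    """Blocks needed to cover num_elements, capped at max_blocks."""
    return min(ceil_div(num_elements, threads_per_block), max_blocks)


if __name__ == "__main__":
    for n in (1, 512, 513, 10**9):
        print(n, "->", grid_size(n))
```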
Hello,
What's the difference between the pseudo-LiDAR coordinate system and the LiDAR coordinate system?
Thank you
Thanks for sharing, but when I first run the following commands in my Docker container
'./scripts/dist_train.sh 1 dev configs/stereo/kitti_models/liga.yaml'
or
'./scripts/dist_test_ckpt.sh 1 ./configs/stereo/kitti_models/liga.yaml ./ckpt/pretrained_liga.pth'
nothing is printed!
If I cancel the process with Ctrl+C and run it again, it shows:
```bash
Traceback (most recent call last):
File "tools/train.py", line 211, in <module>
main()
File "tools/train.py", line 73, in main
args.tcp_port, args.local_rank, backend='nccl'
File "/root/LIGA-Stereo-master/liga/utils/common_utils.py", line 181, in init_dist_pytorch
world_size=num_gpus
File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 422, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 126, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Traceback (most recent call last):
File "/root/miniconda3/envs/liga/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/envs/liga/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/root/miniconda3/envs/liga/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/liga/bin/python', '-u', 'tools/train.py', '--local_rank=0', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', 'configs/stereo/kitti_models/liga.yaml', '--exp_name', 'dev']' returned non-zero exit status 1.
```
How should I solve it?
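The traceback shows TCPStore failing to bind the rendezvous port (args.tcp_port in tools/train.py), which usually means another process, for example the run interrupted with Ctrl+C, still holds it. A stdlib sketch to check a port and ask the OS for a free one; passing the result through a --tcp_port flag is an assumption based on the args.tcp_port attribute in the traceback, so check tools/train.py and scripts/dist_train.sh for how the port is actually set (helper names hypothetical):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    print("free port:", port, "in use:", port_in_use(port))
```

Note that a port found this way is only free at the moment of the check; another process could still grab it before training starts.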
Thank you for your great contribution.
I managed to compile everything in a Docker container with CUDA 11.0/PyTorch 1.7.1, including spconv (spconv builds and installs without errors).
But after training starts, the first step ends with an error:
CUDA_VISIBLE_DEVICES=0 ./scripts/dist_train.sh 1 exp_name configs/stereo/kitti_models/liga.3d-and-bev.yaml
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/train.py', '--local_rank=0', '--launcher', 'pytorch', '--fix_random_seed', '--sync_bn', '--save_to_file', '--cfg_file', 'configs/stereo/kitti_models/liga.3d-and-bev.yaml', '--exp_name', 'exp_name']' died with <Signals.SIGSEGV: 11>.
Then I rewrote your code for single-GPU training without distributed training (the rewritten code is in my fork). The behavior is the same, and it turns out to be a segmentation fault.
python3 tools/train.py --cfg configs/stereo/kitti_models/liga.3d-and-bev.yaml --launcher=none --batch_size 1
Segmentation fault (core dumped)
I have not fully investigated where it happens.
I then tried a lower CUDA version, but the 3090 only supports CUDA 11+, and the current model is too large to fit on a single 1080Ti/2080Ti (similar to DSGN?).
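To locate a segmentation fault like this, Python's standard faulthandler module can dump the Python-level stack of every thread when the process receives SIGSEGV. Applied here, that would mean adding -X faulthandler to the python3 tools/train.py command (or calling faulthandler.enable() at the top of the script). The sketch below demonstrates the mechanism on a deliberately crashing one-liner rather than on LIGA itself; run_with_faulthandler is a hypothetical helper:

```python
import subprocess
import sys

# Dereference a NULL pointer through ctypes to trigger a real SIGSEGV.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter started with `-X faulthandler`.

    On a fatal signal, faulthandler writes the Python traceback to stderr
    before the process dies, showing which Python call was running when
    the native crash happened.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_with_faulthandler(CRASH_SNIPPET)
    print("return code:", result.returncode)  # negative signal number on POSIX
    print(result.stderr)
```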
First, thank you for your great work and code.
I saw in your code that you force batch_size_per_gpu = 1. What's the reason for this setting? If I want to train with a larger batch size on a single GPU, which parts should I modify?
Looking forward to your answer. Thanks.
Hello, the LiDAR model configs use "VoxelBackBone4x", but LIGA uses "VoxelBackBone4xNoFinalBnReLU". Why is that?
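Hello, in part (b) of Figure 2, where in the code is the 2D Aggregation Network used between the first BEV feature and the second BEV feature? Could you point out the exact location (down to the line where it starts)? Thanks!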
Hello Xiaoyang,
Thanks a lot for your great contribution! I am facing a problem when I run the following commands:
python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only
First, I couldn't find "kitti_dataset" in ~/liga/dataset/kitti/; I have lidar_kitti_dataset.py and stereo_kitti_dataset.py instead. Then, when I ran "python -m liga.datasets.kitti.kitti_dataset create_kitti_infos", it returned the error: "AttributeError: module 'matplotlib.cbook' has no attribute '_rename_parameter'".
Any ideas and suggestions would be helpful.
Thanks in advance.
Hello,
Thanks for your excellent work!
I have several problems with distributed training.
When I run "CUDA_VISIBLE_DEVICE=0 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0 ./scripts/dist_train.sh 1 exp cfg_path", it works.
But when I run
"python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0,1,2,3 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or
"CUDA_VISIBLE_DEVICE=0,1,2,3 ./scripts/dist_train.sh 4 exp cfg_path", none of them work. How should I modify the code for distributed training?
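The commands above set CUDA_VISIBLE_DEVICE, but the variable the CUDA runtime reads is CUDA_VISIBLE_DEVICES (with a trailing S, as in the command used earlier in this thread). A misspelled name is silently ignored, so every GPU stays visible. Whether that explains these failures is not confirmed, but it is cheap to rule out. A small stdlib sketch; the helper names are hypothetical:

```python
import os
from typing import List, Mapping, Optional


def visible_devices(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the device list CUDA would honor, or None if unrestricted.

    Only CUDA_VISIBLE_DEVICES (plural) is read; an empty value hides all GPUs.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # all GPUs are visible
    return [d.strip() for d in value.split(",") if d.strip()]


def misspelled_cuda_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """List CUDA_VISIBLE_* variables that CUDA will not read."""
    return sorted(k for k in env
                  if k.startswith("CUDA_VISIBLE") and k != "CUDA_VISIBLE_DEVICES")


if __name__ == "__main__":
    print("visible:", visible_devices())
    print("ignored:", misspelled_cuda_vars())
```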
Hi, Xiaoyang! I'm trying to reimplement your awesome work.
In 'Getting Started', you mentioned 'Generate the data infos by running the following command:'
python -m liga.datasets.kitti.kitti_dataset create_kitti_infos
python -m liga.datasets.kitti.kitti_dataset create_gt_database_only
Unfortunately, there are only 'lidar_kitti_dataset' and 'stereo_kitti_dataset' in './liga/datasets/kitti/'. I successfully created kitti_infos and gt_database by running
python -m liga.datasets.kitti.lidar_kitti_dataset create_kitti_infos
and
python -m liga.datasets.kitti.lidar_kitti_dataset create_gt_database_only
However, I don't know how to create kitti_infos for the stereo detector. When I ran
python -m liga.datasets.kitti.stereo_kitti_dataset create_kitti_infos
I couldn't get the .pkl files (kitti_infos), because there is no 'create_kitti_infos()' or 'create_gt_database_only' in stereo_kitti_dataset.py.
More directly: if I want to train the whole LIGA-Stereo instead of just the modified SECOND, should I first create kitti_infos for the stereo detector and then run the following command?
./scripts/dist_train.sh ${NUM_GPUS} 'exp_name' ./configs/stereo/kitti_models/liga.3d-and-bev.yaml
Looking forward to your answer!
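OpenPCDet-style dataset modules commonly pick the subcommand with a hand-written check on sys.argv[1] under `if __name__ == '__main__'`. Assuming stereo_kitti_dataset.py follows that pattern but has no create_kitti_infos branch, running it with that argument would simply fall through and exit without writing anything, which matches the behavior described. A minimal sketch of such a dispatcher; the command table and function names are hypothetical:

```python
import sys
from typing import Callable, Dict, List


def dispatch(argv: List[str], commands: Dict[str, Callable[[], None]]) -> bool:
    """Run the command named by argv[1]; return False if it is missing or unknown.

    A module whose table lacks the requested name does nothing, so no
    .pkl files are produced and no error is raised.
    """
    if len(argv) < 2 or argv[1] not in commands:
        return False
    commands[argv[1]]()
    return True


if __name__ == "__main__":
    # Hypothetical table; the real modules define their own functions.
    table = {"create_kitti_infos": lambda: print("would write kitti_infos_*.pkl")}
    if not dispatch(sys.argv, table):
        print("unknown or missing command:", sys.argv[1:])
```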
Thanks for your excellent work!