
[CVPR 2022] CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation

License: MIT License

Python 99.15% Shell 0.85%
semantic-segmentation weakly-supervised-learning weakly-supervised-segmentation

clims's Introduction

CLIMS

Code repository for our paper "CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation" in CVPR 2022.

😍 Code for our paper "CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation" in CVPR 2022 is also available here.

Please note that this repository is an improved version of our camera-ready code (the original is kept in the previous_version/ directory). We recommend using the improved version of CLIMS rather than the camera-ready version.

Dataset

PASCAL VOC2012

You will need to download the images (JPEG format) of the PASCAL VOC2012 dataset from here; the train_aug ground truth can be found here. Make sure your data/VOC2012 folder is structured as follows:

├── VOC2012/
|   ├── Annotations
|   ├── ImageSets
|   ├── SegmentationClass
|   ├── SegmentationClassAug
|   └── SegmentationObject
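
A quick way to catch setup mistakes early is to check this layout programmatically. A minimal sketch (the voc_root path is a placeholder; the directory names come from the tree above):

import os

voc_root = "data/VOC2012"  # placeholder: point this at your dataset
expected = ["Annotations", "ImageSets", "SegmentationClass",
            "SegmentationClassAug", "SegmentationObject"]
missing = [d for d in expected if not os.path.isdir(os.path.join(voc_root, d))]
print("missing directories:", missing if missing else "none")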

MS-COCO 2014

You will need to download the images (JPEG format) of the MS-COCO 2014 dataset here; the ground-truth masks can be found here. Make sure your data/COCO folder is structured as follows:

├── COCO/
|   ├── train2014
|   ├── val2014
|   ├── annotations
|   |   ├── instances_train2014.json
|   |   └── instances_val2014.json
|   └── mask
|   |   ├── train2014
|   |   └── val2014
Training on PASCAL VOC2012

  1. Install CLIP.
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
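
To confirm that CLIP is installed correctly before training, a minimal sketch (the 'ViT-B/32' model name here is only an example; the repository selects its CLIP variant via the --clip argument):

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
print("CLIP loaded:", type(model).__name__)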
  2. Download the pre-trained baseline CAM ('res50_cam.pth') here and put it in the cam-baseline-voc12/ directory.
  3. Train CLIMS on the PASCAL VOC2012 dataset to generate initial CAMs.
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root /data1/xjheng/dataset/VOC2012/ --hyper 10,24,1,0.2 --clims_num_epoches 15 --cam_eval_thres 0.15 --work_space clims_voc12 --cam_network net.resnet50_clims --train_clims_pass True --make_clims_pass True --eval_cam_pass True
  4. Train IRNet and generate pseudo semantic masks.
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root /data1/xjheng/dataset/VOC2012/ --cam_eval_thres 0.15 --work_space clims_voc12 --cam_network net.resnet50_clims --cam_to_ir_label_pass True --train_irn_pass True --make_sem_seg_pass True --eval_sem_seg_pass True
  5. Train DeepLabv2 using the pseudo semantic masks (an example invocation follows).
cd segmentation/
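
The segmentation training scripts live in this directory. For example, the issues below invoke the following script for the COCO-pretrained setting (assuming the pseudo masks and pre-trained weights are already in place):

CUDA_VISIBLE_DEVICES=0 bash run_voc12_coco_pretrained.sh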

Evaluation Results

The quality of initial CAMs and pseudo masks on PASCAL VOC2012.

Method                Backbone  CAMs  + RW   + IRNet
CLIMS (camera-ready)  R50       56.6  70.5   -
CLIMS (this repo)     R50       58.6  ~73    74.1

Evaluation results on PASCAL VOC2012 val and test sets.

Please cite the results of the camera-ready version.

Method                Supervision  Network        Pretrained  val   test
AdvCAM                I            DeepLabV2      ImageNet    68.1  68.0
EDAM                  I+S          DeepLabV2      COCO        70.9  70.6
CLIMS (camera-ready)  I            DeepLabV2      ImageNet    69.3  68.7
CLIMS (camera-ready)  I            DeepLabV2      COCO        70.4  70.0
CLIMS (this repo)     I            DeepLabV2      ImageNet    70.3  70.6
CLIMS (this repo)     I            DeepLabV2      COCO        71.4  71.2
CLIMS (this repo)     I            DeepLabV1-R38  ImageNet    73.3  73.4

(Initial CAMs, pseudo semantic masks, and pre-trained models of the camera-ready version can be found on Google Drive.)

Training on MS-COCO 2014

  1. Download the pre-trained baseline CAM ('res50_cam.pth') here and put it in the cam-baseline-coco/ directory.
  2. Train CLIMS on the MS-COCO 2014 dataset to generate initial CAMs.
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run_sample_coco.py --work_space clims_coco --clims_network net.resnet50_clims --train_clims_pass True --make_clims_pass True --eval_cam_pass True --clims_num_epoches 8 --cam_eval_thres 0.15 --hyper 2,14,1.25,0.2 --cam_batch_size 16 --clims_learning_rate 0.0005 --use_distributed_train True --cbs_loss_thresh 0.285

If you are using our code, please consider citing our paper.

@InProceedings{Xie_2022_CVPR,
    author    = {Xie, Jinheng and Hou, Xianxu and Ye, Kai and Shen, Linlin},
    title     = {CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4483-4492}
}
@article{xie2022cross,
  title={Cross Language Image Matching for Weakly Supervised Semantic Segmentation},
  author={Xie, Jinheng and Hou, Xianxu and Ye, Kai and Shen, Linlin},
  journal={arXiv preprint arXiv:2203.02668},
  year={2022}
}

This repository is heavily based on IRNet; thanks to Jiwoon Ahn for the great code.


clims's Issues

How to obtain pre-trained baseline CAM

Hi, I have a question about how to obtain the pre-trained baseline CAM (res50_cam.pth).

Currently this repo directly provides the checkpoint, and I'd like to know how this model is trained. Could you please explain?

And if I want to use CLIMS on a custom dataset, do I need to retrain res50_cam.pth on that dataset? Thanks!

Training time

Thanks for your great work!

Could you share the approximate training time for each stage (CLIP finetuning, CLIMS, and AffinityNet)? This would help me a lot.

Could you provide trained weights for reproduction?

Hi, I was wondering whether you could provide the trained weights.

I followed your training steps and ran the reproduction several times, but the mean mIoU of my initial CAMs is only 54.449 (standard deviation 0.0811), which does not reach the 56.6 reported in the paper.

about deeplab setting

Hi, thanks for your great work.
I notice that in the final segmentation training step, DeepLabv1 with WResNet38 outperforms the others. However, I cannot find the WResNet38 backbone at the provided link, https://github.com/kazuto1011/deeplab-pytorch. Did you reproduce it yourselves? Could you provide the DeepLabv1 (WResNet38) code, or tell me where I can find it?
Thanks.

Creation of sem-seg

Thank you for the repository, it's very useful for my research work.

Do you have any suggestions for speeding up the sem-seg creation process? It is taking a lot of time.

Undefined Function get_dataset

Hi,

When I run CUDA_VISIBLE_DEVICES=0 bash run_voc12_coco_pretrained.sh to train DeepLab v2, I get the following error.

Traceback (most recent call last):
  File "main.py", line 26, in <module>
    from libs.datasets import get_dataset
ImportError: cannot import name 'get_dataset' from 'libs.datasets' (unknown location)

It looks like the function get_dataset is not defined. Could you take a look? Thanks!

Error when loading the dataset

Traceback (most recent call last):
  File "/media/jk1803/E/jc/CLIMS-master/run_sample.py", line 144, in <module>
    step.train_clims.run(args)
  File "/media/jk1803/E/jc/CLIMS-master/step/train_clims.py", line 69, in run
    train_dataset = voc12.dataloader.VOC12ClassificationDataset(args.train_list, voc12_root=args.voc12_root,
  File "/media/jk1803/E/jc/CLIMS-master/voc12/dataloader.py", line 167, in __init__
    super().__init__(img_name_list_path, voc12_root,
  File "/media/jk1803/E/jc/CLIMS-master/voc12/dataloader.py", line 118, in __init__
    self.img_name_list = load_img_name_list(img_name_list_path)
  File "/media/jk1803/E/jc/CLIMS-master/voc12/dataloader.py", line 62, in load_img_name_list
    img_name_list = np.loadtxt(dataset_path, dtype=np.int32)
  File "/home/jk1803/anaconda3/envs/CLIMs/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
  File "/home/jk1803/anaconda3/envs/CLIMs/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in _read
    arr = _load_from_filelike(
ValueError: could not convert string '2007_000032' to int32 at row 0, column 1.

test time

Hi, thank you very much for your code; it has been a great help to my work.
Roughly how long does testing on the COCO 2014 val set take with the DeepLabv2 and post-processing code you provide?

How to extract background image features

When calculating the cosine similarity between the background and the text, are only the background features extracted? And how do you remove the features of the foreground objects? I tried blacking out the foreground object while keeping the background, but CLIP sometimes still recognizes the object and assigns it a high score. So I do not understand how you extract image features from only the background of the image.
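
For reference, one common way to realize this is soft masking: instead of blacking out pixels, the predicted activation map is used as a soft mask, so background features are computed on (1 - mask) * image. A minimal sketch of that idea (my reading of CLIMS-style background suppression, not the authors' exact code; all names are hypothetical):

import torch

def background_text_score(clip_model, image, fg_mask, bg_text_features):
    # image: (N, 3, H, W); fg_mask: (N, 1, H, W) class activations in [0, 1].
    # Softly suppress the foreground rather than blacking it out, then
    # encode the remainder. In practice the masked image is resized and
    # normalized to CLIP's expected input first.
    bg_image = image * (1.0 - fg_mask)
    bg_features = clip_model.encode_image(bg_image)
    bg_features = bg_features / bg_features.norm(dim=-1, keepdim=True)
    # Cosine similarity against pre-normalized background text prompts.
    return bg_features @ bg_text_features.t()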

Hi, should I create a directory cam-baseline-voc12?

Should I create a cam-baseline-voc12 directory and put res50_cam.pth in it?
Was res50_cam.pth pre-trained on ImageNet by your team?
What is the difference between this res50_cam.pth and a net/resnet50_cam.py model that I train myself?
If I run the run_sample.py command line, am I training your .pth for more epochs, or training the model from scratch?
Sorry for asking so many basic questions; I am a beginner in WSSS. Thank you so much!

Dataset

The dataset download link is invalid. Could you please provide a new download link?

Error on load_img_name_list function

Hi, thanks for your great work!

However, I hit an error when running the new version of CLIMS. The error message is shown below.

Traceback (most recent call last):
  File "run_sample.py", line 144, in <module>
    step.train_clims.run(args)
  File "/home/wensheng/code/for_dev/CLIMS/step/train_clims.py", line 66, in run
    train_dataset = voc12.dataloader.VOC12ClassificationDataset(args.train_list, voc12_root=args.voc12_root,
  File "/home/wensheng/code/for_dev/CLIMS/voc12/dataloader.py", line 167, in __init__
    super().__init__(img_name_list_path, voc12_root,
  File "/home/wensheng/code/for_dev/CLIMS/voc12/dataloader.py", line 118, in __init__
    self.img_name_list = load_img_name_list(img_name_list_path)
  File "/home/wensheng/code/for_dev/CLIMS/voc12/dataloader.py", line 62, in load_img_name_list
    img_name_list = np.loadtxt(dataset_path, dtype=np.int32)
  File "/home/wensheng/anaconda3/envs/clims/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
  File "/home/wensheng/anaconda3/envs/clims/lib/python3.8/site-packages/numpy/lib/npyio.py", line 999, in _read
    arr = _load_from_filelike(
ValueError: could not convert string '2007_000032' to int32 at row 0, column 1.

The problem is in the load_img_name_list function (here).

The text in train_aug.txt looks like 2007_000032, while the load_img_name_list function tries to load the file with img_name_list = np.loadtxt(dataset_path, dtype=np.int32), which causes the problem. I guess we need to either modify train_aug.txt to remove the middle underscore _, or adjust the load_img_name_list function (a sketch of the latter follows below).

Could you check, thanks!
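
A sketch of the second option (a suggestion only, not the maintainers' fix):

import numpy as np

def load_img_name_list(dataset_path):
    # VOC image IDs such as '2007_000032' are not valid int32 values on
    # newer NumPy versions, so load them as strings instead.
    return np.loadtxt(dataset_path, dtype=str, ndmin=1)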

Ran out of input

Hi 😄, thanks for your code and paper, but I hit an error when I run:
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root /data1/xjheng/dataset/VOC2012/ --hyper 10,24,1,0.2 --clims_num_epoches 15 --cam_eval_thres 0.15 --work_space clims_voc12 --cam_network net.resnet50_clims --train_clims_pass True --make_clims_pass True --eval_cam_pass True

The error:
Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.8/site-packages/clip/clip.py", line 129, in load
    model = torch.jit.load(opened_file, map_location=device if jit else "cpu").eval()
  File "/home/anaconda3/lib/python3.8/site-packages/torch/jit/__init__.py", line 277, in load
    cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)
RuntimeError:
aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor):
Expected at most 12 arguments but found 13 positional arguments.
:
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py(420): _conv_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py(423): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(85): forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(709): _slow_forward
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py(725): _call_impl
/root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(221): visual_forward
/opt/conda/lib/python3.7/site-packages/torch/jit/_trace.py(940): trace_module
(36): export_torchscript_models
(3):
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3418): run_code
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3338): run_ast_nodes
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(3147): run_cell_async
/opt/conda/lib/python3.7/site-packages/IPython/core/async_helpers.py(68): _pseudo_sync_runner
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2923): _run_cell
/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py(2878): run_cell
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(555): interact
/opt/conda/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py(564): mainloop
/opt/conda/lib/python3.7/site-packages/IPython/terminal/ipapp.py(356): start
/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py(845): launch_instance
/opt/conda/lib/python3.7/site-packages/IPython/__init__.py(126): start_ipython
/opt/conda/bin/ipython(8): <module>
Serialized File "code/__torch__/torch/nn/modules/conv/___torch_mangle_9366.py", line 8
  def forward(self: __torch__.torch.nn.modules.conv.___torch_mangle_9366.Conv2d,
      input: Tensor) -> Tensor:
    x = torch._convolution(input, self.weight, None, [32, 32], [0, 0], [1, 1], False, [0, 0], 1, False, False, True, True)
        ~~~~~~~~~~~~~~~~~~ <--- HERE
    return x
  def forward1(self: __torch__.torch.nn.modules.conv.___torch_mangle_9366.Conv2d,

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_sample.py", line 144, in <module>
    step.train_clims.run(args)
  File "/home/lihaoyu/CLIMS/step/train_clims.py", line 101, in run
    clip_model, preprocess = clip.load(args.clip, device=device)
  File "/home/lihaoyu/anaconda3/lib/python3.8/site-packages/clip/clip.py", line 136, in load
    state_dict = torch.load(opened_file, map_location="cpu")
  File "/home/lihaoyu/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/lihaoyu/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

How can I solve it?
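
A likely cause (an assumption, not confirmed by the authors): the CLIP checkpoint download was interrupted, leaving a truncated file in the local cache, so torch.load hits end-of-file while unpickling. Deleting the cache and letting clip.load re-download the model usually resolves this kind of EOFError:

import os
import shutil

# CLIP caches downloaded checkpoints under ~/.cache/clip by default.
cache_dir = os.path.expanduser("~/.cache/clip")
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)  # forces a fresh download on the next clip.load()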

Need Coco baseline scores

Hi!

We tried running the COCO code and got an mIoU of roughly 0.34. I am not sure whether this is what we should expect, so it would be great if you could add baseline results to the repository. We did not change the default settings.

Thanks

How do you finetune the CLIP model?

Hi,
I read your paper, which states: "we use the text label descriptions in the training set to finetune the CLIP model (both image and text encoder) for 20 epochs, with an initial learning rate of 0.00005 and a weight decay of 0.003". However, the released code does not include this step. You do provide code for training resnet50_cam, but that is not mentioned in the paper. I am a bit confused: which of these steps is required to run your algorithm?

Thanks in advance for your answer.
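
For readers with the same question, the quoted setting could be approximated with the standard CLIP contrastive objective. A minimal sketch under those stated hyperparameters (my assumption of the procedure, not the authors' released finetuning code; batching and prompt construction are placeholders):

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model = model.float()  # CLIP loads fp16 weights on CUDA; train in fp32 for stability
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.003)
ce = torch.nn.CrossEntropyLoss()

def finetune_step(images, label_texts):
    # images: preprocessed batch (N, 3, H, W); label_texts: tokenized label
    # descriptions, e.g. clip.tokenize(["a photo of a dog", ...]).to(device)
    logits_per_image, logits_per_text = model(images, label_texts)
    targets = torch.arange(images.size(0), device=device)
    # Symmetric InfoNCE loss, as in CLIP pre-training.
    loss = (ce(logits_per_image, targets) + ce(logits_per_text, targets)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()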

How to train DeepLabV1-R38 ?

Thank you in advance for providing the open source code for this research, so that my research can proceed smoothly.
I would like to ask you about the training method of CLIMS(this repo) DeepLabV1-R38, how to improve the program in order to train it? Thank you.

About The quality of initial CAMs

Hi, thanks for sharing this great work. I have some detailed questions about the results in https://github.com/CVI-SZU/CLIMS#the-quality-of-initial-cams-and-pseudo-masks-on-pascal-voc2012. First, I believe these results (56.6 / 58.6) are evaluated on the train set, but which one: the original 1,464-image set or the 10,582-image augmented set? Second, are these results (56.6 / 58.6) obtained after dCRF or not? If not, does dCRF participate in your pipeline at all? As far as I understand, the following code:

pred = imutils.crf_inference_label(img, fg_conf_cam, n_labels=keys.shape[0])

means that dCRF does not contribute to the (56.6 / 58.6) results, but does contribute to the (70.5 / ~73) results. Am I right?
Looking forward to your reply. Thanks!

About gradients

Hi, I had also considered the idea of using CLIP to compute image-text similarity and then using it to supervise mask generation. But when I tried to implement it, CLIP's very deep network meant the resulting loss could not provide a strong gradient for optimizing the upstream model. Have you encountered this problem?
