
bas-extension's Introduction

Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation (IJCV)

PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation''. This repository contains the PyTorch training code, inference code, and pretrained models. The journal version is built upon our conference version (CVPR 2022).

πŸ“‹ Table of content

  1. πŸ“Ž Paper Link
  2. πŸ’‘ Abstract
  3. ✨ Motivation
  4. πŸ“– Method
  5. πŸ“ƒ Requirements
  6. ✏️ Usage
    1. Start
    2. Download Datasets
    3. WSOL task
    4. WSSS task
  7. πŸ“Šβ›Ί Experimental Results and Model Zoo
  8. βœ‰οΈ Statement
  9. πŸ” Citation

πŸ“Ž Paper Link

  • Background Activation Suppression for Weakly Supervised Object Localization (CVPR2022) (link)

    Authors: Pingyu Wu*, Wei Zhai*, Yang Cao

    Institution: University of Science and Technology of China (USTC)

  • Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation (IJCV) (link)

    Authors: Wei Zhai*, Pingyu Wu*, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha

    Institution: University of Science and Technology of China (USTC) & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

πŸ’‘ Abstract

Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Recently, a new paradigm has emerged that achieves localization by generating a foreground prediction map (FPM). Existing FPM-based methods use cross-entropy (CE) to evaluate the foreground prediction map and to guide the learning of the generator. We argue for using the activation value instead to achieve more efficient learning. This is based on the experimental observation that, for a trained network, CE converges to zero when the foreground mask covers only part of the object region, while the activation value keeps increasing until the mask expands to the object boundary, which indicates that more of the object region can be learned by using the activation value. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint (AMC) module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvements over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.

✨ Motivation


Motivation. (A) Experimental procedure and related definitions. (B) The CE loss value w.r.t. the foreground mask, and the foreground activation value w.r.t. the foreground mask. (C) The results with statistical significance. Implementation details of the experiment and further results are available in Section 3.5.

πŸ“– Method


The architecture of the proposed BAS in the training phase. The class-specific foreground prediction map Mf and the coupled background prediction map Mb are obtained by the generator according to the ground-truth class (GT), and are then fed into the Activation Map Constraint (AMC) module together with the feature map F.
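
To make the AMC objective concrete, here is a minimal PyTorch sketch of the idea, assuming a classification head clf_head that maps masked feature maps to class logits. The function name, the softmax-based suppression term, and the unit loss weights are illustrative assumptions, not the repository's actual implementation; see the training scripts for the real losses.

import torch
import torch.nn.functional as F

def bas_losses(feat, m_f, clf_head, target, lambda_area=1.0):
    """Sketch of the AMC objective: foreground region guidance +
    background activation suppression + area constraint.

    feat:     backbone feature map, shape (B, C, H, W)
    m_f:      class-specific foreground prediction map in [0, 1], (B, 1, H, W)
    clf_head: classification head mapping (B, C, H, W) features to (B, K) logits
    target:   ground-truth class indices, shape (B,)
    """
    m_b = 1.0 - m_f                               # coupled background map
    fg_logits = clf_head(feat * m_f)              # prediction from foreground
    bg_logits = clf_head(feat * m_b)              # prediction from background

    # Foreground region guidance: the foreground-masked features should
    # still be classified as the ground-truth class.
    loss_frg = F.cross_entropy(fg_logits, target)

    # Background activation suppression: push the ground-truth activation
    # of the background-masked features toward zero.
    bg_prob = torch.softmax(bg_logits, dim=1)
    loss_bas = bg_prob.gather(1, target.unsqueeze(1)).mean()

    # Area constraint: discourage the foreground mask from covering everything.
    loss_area = m_f.mean()

    return loss_frg + loss_bas + lambda_area * loss_area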


Applying BAS to the weakly supervised semantic segmentation task.

πŸ“ƒ Requirements

  • python 3.6.10
  • torch 1.4.0
  • torchvision 0.5.0
  • opencv 4.5.3
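
If you set up the environment with pip, something like the following should match the versions above (the OpenCV package name and build suffix are assumptions; on PyPI it ships as opencv-python):

pip install torch==1.4.0 torchvision==0.5.0 "opencv-python==4.5.3.*"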

✏️ Usage

Start

git clone https://github.com/wpy1999/BAS-Extension.git
cd BAS-Extension

Download Datasets

The experiments use CUB-200-2011, ILSVRC 2012, OpenImages, PASCAL VOC 2012, and MS COCO 2014. Download them from their official sources and arrange them as the training scripts expect.

WSOL task

cd WSOL

Training on CUB / OpenImages (single GPU)

python train.py --arch ${Backbone}

Training on ILSVRC / CUB / OpenImages (multiple GPUs)

CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node 4 train_ILSVRC.py

Inference

To test the localization accuracy on CUB / ILSVRC, download the trained models from the Model Zoo, then run evaluator.py:

python evaluator.py  
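
At inference time, BAS considers the prediction maps of different categories together to obtain the final localization result. Below is a minimal sketch of that idea; the top-k fusion, the 0.5 threshold, and the function name are illustrative assumptions rather than evaluator.py's actual logic.

import cv2
import numpy as np
import torch

def fuse_and_localize(maps, scores, topk=3, thr=0.5):
    """maps:   (K, H, W) per-class foreground prediction maps in [0, 1]
    scores: (K,) classification scores for the same image
    Returns a bounding box (x, y, w, h) from the fused map."""
    idx = torch.topk(scores, topk).indices
    fused = maps[idx].mean(dim=0)                  # combine top-k class maps
    mask = (fused >= thr * fused.max()).byte().cpu().numpy()
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0, 0, mask.shape[1], mask.shape[0]  # fall back to full image
    return cv2.boundingRect(max(contours, key=cv2.contourArea))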

To test the segmentation accuracy on CUB / OpenImages, download the trained models from the Model Zoo, then run count_pxap.py:

python count_pxap.py 

WSSS task

Training on PASCAL VOC

CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node 4 train_bas.py

Inference

To test the segmentation accuracy on PASCAL VOC, download the trained models from the Model Zoo, then run run_sample.py:

python run_sample.py  
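
For segmentation, the class-wise prediction maps are converted into pixel-level pseudo labels before refinement (e.g., with IRN, as in the Model Zoo below). Here is a rough sketch of deriving such a seed; the constant background threshold and the label convention are assumptions, not the repository's pipeline.

import numpy as np

def seed_from_maps(class_maps, bg_thr=0.3):
    """class_maps: dict {class_id: (H, W) map in [0, 1]} for the classes
    present in the image (from the image-level labels).
    Returns an (H, W) label map with 0 = background."""
    ids = sorted(class_maps)
    stack = np.stack([class_maps[c] for c in ids])           # (K, H, W)
    bg = np.full(stack.shape[1:], bg_thr, dtype=np.float32)  # background plane
    scores = np.concatenate([bg[None], stack], axis=0)       # (K+1, H, W)
    winner = scores.argmax(axis=0)                           # 0 = background
    lut = np.array([0] + ids)                                # channel -> class id
    return lut[winner]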

πŸ“Šβ›Ί Experimental Results and Model Zoo

You can download all the trained models here (WSOL: Google Drive, Baidu Drive; WSSS: Google Drive, Baidu Drive)

or download any one individually as follows:

CUB models

Backbone    Top1 Loc   Top5 Loc   GT-Known   Weights
VGG         70.90      85.36      91.04      Google Drive, Baidu Drive
MobileNet   70.54      86.71      93.04      Google Drive, Baidu Drive
ResNet      76.75      90.04      95.41      Google Drive, Baidu Drive
Inception   72.09      88.11      94.63      Google Drive, Baidu Drive

ILSVRC models

Backbone    Top1 Loc   Top5 Loc   GT-Known   Weights
VGG         52.94      65.38      69.66      Google Drive, Baidu Drive
MobileNet   53.05      66.68      72.03      Google Drive, Baidu Drive
ResNet      57.46      68.57      72.00      Google Drive, Baidu Drive
Inception   58.50      69.03      72.07      Google Drive, Baidu Drive

OpenImages

Backbone   PIoU    PxAP    Weights
ResNet     50.72   66.86   Google Drive, Baidu Drive

PASCAL VOC models

Results on the PASCAL VOC 2012 training set. Results for the other baseline methods can be obtained in the same way.

Method       Seed   Seed Weights                Mask   Mask Weights
Ours         57.7   Google Drive, Baidu Drive   -      -
Ours + IRN   58.2   Google Drive, Baidu Drive   71.1   Google Drive, Baidu Drive

On the PASCAL VOC 2012 val and test sets (DeepLabv2).

Method   Val    Test   Weights
Ours     69.6   69.9   Google Drive, Baidu Drive

βœ‰οΈ Statement

This project is for research purposes only; please contact us for a license for commercial use. For any other questions, please contact [email protected] or [email protected].

πŸ” Citation

@inproceedings{wu2022background,
  title={Background Activation Suppression for Weakly Supervised Object Localization},
  author={Wu, Pingyu and Zhai, Wei and Cao, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14248--14257},
  year={2022}
}
@article{zhai2023background,
  title={Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation},
  author={Zhai, Wei and Wu, Pingyu and Zhu, Kai and Cao, Yang and Wu, Feng and Zha, Zheng-Jun},
  journal={International Journal of Computer Vision},
  pages={1--26},
  year={2023},
  publisher={Springer}
}

bas-extension's People

Contributors

tiaotiao11-22, wpy1999


bas-extension's Issues

About MobileNet pre-trained weights

Thank you for sharing this awesome code.
Could you provide the pre-trained weight file for MobileNet v1? I cannot find the pre-trained weights used in your code (WSOL/Model/mobilenet.py) anywhere on the web.

question

Hello author, may I ask whether the sess/resnet50-19c8e357.pth file is available? Thank you!

weight

Hello author, the code requires the CAM weights. Could you please upload them? Thank you!

About a binary classification problem (one target class and background)

Hello, author! I'm very interested in your work.
Recently I tried to run a binary weakly supervised segmentation task with your code (only one target class, with everything else treated as background), but I ran into some problems:
(1) should num_class be set to 1 or 2?
(2) how should the labels be set?
I experimented with num_class set to 2 (setting it to 1 seemed to raise an error), but the results were poor (worse than the original IRN method), so I wanted to ask whether this work is appropriate for my task.
Thank you, and I look forward to your reply!

Hello, I have a question about your extended paper

First of all, I am very interested in your research and papers, and thank you for sharing these studies :)

I'm wondering: is your extended paper under revision now, or do you have no further plans for it? If so, I'd be sad :(

I would like to know about your research results and the contents of the paper; I'm still waiting for the official version.

Thanks for reading, and I'll be waiting for your answer. Have a good day!
