This project forked from mlpc-ucsd/boundaryformer

Code for CVPR2022 paper: Instance Segmentation with Mask-supervised Polygonal Boundary Transformers

License: Apache License 2.0

Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers

From Justin Lazarow (UCSD, now at Apple), Weijian Xu (UCSD, now at Microsoft), and Zhuowen Tu (UCSD).

This repository is an official implementation of the paper Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers presented at CVPR 2022.

Introduction

BoundaryFormer aims to provide a simple baseline for regression-based instance segmentation. Notably, we use Transformers to regress a fixed number of points along a simple polygonal boundary. This process makes continuous predictions and is thus end-to-end differentiable. Our method differs from previous work in the field in two main ways: it matches Mask R-CNN in mask AP for the first time, and it imposes no supervision or ground-truth requirements beyond those of Mask R-CNN. That is, our method achieves parity with mask-based baselines in both mask quality and supervision. We accomplish this by relying solely on a differentiable rasterization module (implemented in CUDA), which only requires access to ground-truth masks. We hope this can serve to drive further work in this area.

Installation

BoundaryFormer uses the same installation process as Detectron2. Please see installation instructions. This should generally require something like:

pip install -ve .

at the root of the source tree (as long as PyTorch, etc. are installed correctly).

BoundaryFormer also uses the deformable attention modules introduced in Deformable-DETR. If this is already installed on your system, no action is needed. Otherwise, please build their modules:

git clone https://github.com/fundamentalvision/Deformable-DETR
cd Deformable-DETR/models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
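
As a quick sanity check after the build, the following sketch tests whether the compiled extension is importable from Python. It assumes the Deformable-DETR build installs its extension under the top-level module name `MultiScaleDeformableAttention`; the helper `has_module` is hypothetical and not part of either repository.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


# Assumption: the extension built by make.sh is importable under this name.
EXTENSION_NAME = "MultiScaleDeformableAttention"

if has_module(EXTENSION_NAME):
    print(f"{EXTENSION_NAME} is installed")
else:
    print(f"{EXTENSION_NAME} not found; build it as described above")
```

This only confirms that the module can be located; the unit test shipped with Deformable-DETR remains the check for numerical correctness.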

Getting Started

BoundaryFormer follows the general guidelines of Detectron2; however, it lives under projects/BoundaryFormer.

Please make sure to set two additional environment variables on your system:

export DETECTRON2_DATASETS=/path/to/datasets
export DETECTRON2_OUTPUTS=/path/to/outputs
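
A small pre-flight check can catch a missing variable before a long training job starts. The sketch below is illustrative only: `resolve_dirs` and `REQUIRED_VARS` are hypothetical names, and the repository may read these variables differently.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

# The two variables named above.
REQUIRED_VARS = ("DETECTRON2_DATASETS", "DETECTRON2_OUTPUTS")


def resolve_dirs(env: Mapping[str, str] = os.environ) -> Dict[str, Path]:
    """Return the configured directories, or raise if a variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Please export: " + ", ".join(missing))
    return {name: Path(env[name]).expanduser() for name in REQUIRED_VARS}
```

Calling `resolve_dirs()` with no arguments reads the current process environment; passing a plain dictionary makes the check easy to test.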

For instance, to train on COCO using an R50 backbone at a 1x schedule:

python projects/BoundaryFormer/train_net.py --num-gpus 8 --config-file projects/BoundaryFormer/configs/COCO-InstanceSegmentation/boundaryformer_rcnn_R_50_FPN_1x.yaml COMMENT "hello model"

If you do not have 8 GPUs, adjust --num-gpus and your BATCH_SIZE accordingly. BoundaryFormer is trained with AdamW and we find the square-root scaling law to work well (i.e., a batch size of 8 should only induce a sqrt(2) change in LR).
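
The square-root rule can be written out as a one-line helper. This is a minimal sketch: the reference batch size of 16 and the reference learning rate of 1e-4 used in the example are assumptions for illustration, so please check the config file for the actual values.

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale the learning rate by the square root of the batch-size ratio."""
    return base_lr * math.sqrt(new_batch_size / base_batch_size)


# Assumed reference values (illustrative, not taken from the repository).
ASSUMED_BASE_LR = 1e-4
ASSUMED_BASE_BATCH_SIZE = 16

# Halving the batch size divides the learning rate by sqrt(2).
lr_for_batch_8 = sqrt_scaled_lr(ASSUMED_BASE_LR, ASSUMED_BASE_BATCH_SIZE, 8)
print(lr_for_batch_8)
```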

Relevant Hyperparameters/Configuration Options

BoundaryFormer has a few hyperparameter options. Generally, these are configured under cfg.MODEL.BOUNDARY_HEAD (see projects/BoundaryFormer/boundary_former/config.py). Please see the paper for ablations of these values.

Number of layers

cfg.MODEL.BOUNDARY_HEAD.NUM_DEC_LAYERS = 4

We generally find that 4 layers are sufficient for good performance. Reducing this to 3 loses a small amount of performance, while increasing it generally does not change performance.

NOTE: if upsampling is used, this value is generally ignored and the number of layers is instead computed from a combination of cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS and cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING_BASE_NUM_PTS.
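
To make the note concrete, the sketch below derives a layer count from the two point counts, assuming each layer after the first doubles the number of points (as the upsampling section describes). The function name is hypothetical and the repository's own computation may differ in detail.

```python
def num_layers_with_upsampling(poly_num_pts: int, base_num_pts: int) -> int:
    """Layers needed to grow from base_num_pts to poly_num_pts by repeated 2x steps."""
    ratio, remainder = divmod(poly_num_pts, base_num_pts)
    # The ratio must be an exact power of two (1, 2, 4, 8, ...).
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("poly_num_pts must be base_num_pts times a power of two")
    # For a power of two, bit_length() equals log2(ratio) + 1.
    return ratio.bit_length()


print(num_layers_with_upsampling(64, 8))
```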

Number of control points

cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS = 64

This defines the number of points at the final output layer. If upsampling (see next section) is not used, this also constitutes the number of points at any intermediate layer. Generally, we find Cityscapes to benefit from more than 64 points (e.g. 128) but COCO less so.

Upsampling behavior

Upsampling constitutes our coarse-to-fine strategy which can reduce memory and computation. Rather than using the same number of points at each layer, we start off with a small number of points and upsample (2x) the points in a naive manner (midpoints) at each subsequent layer. To enable:

cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING = True
cfg.MODEL.BOUNDARY_HEAD.UPSAMPLING_BASE_NUM_PTS = 8
cfg.MODEL.BOUNDARY_HEAD.POLY_NUM_PTS = 64

This will create a 4-layer (8 * 2 ** 3 = 64) coarse-to-fine model.
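
The midpoint upsampling described above can be illustrated in plain Python. This is a sketch under assumptions: it treats the polygon as a closed loop of (x, y) tuples and interleaves each original point with the midpoint to its successor. The actual implementation operates on batched tensors and may order points differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def upsample_midpoints(polygon: List[Point]) -> List[Point]:
    """Double the number of points by inserting the midpoint of every edge."""
    n = len(polygon)
    upsampled: List[Point] = []
    for i, (x0, y0) in enumerate(polygon):
        # Wrap around so the last point connects back to the first.
        x1, y1 = polygon[(i + 1) % n]
        upsampled.append((x0, y0))
        upsampled.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0))
    return upsampled


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(upsample_midpoints(square))
```

Applying the function three times to an 8-point polygon yields 64 points, matching the 4-layer configuration above.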

Rasterization resolution

BoundaryFormer uses differentiable rasterization to transform the predicted polygons into mask space for supervision. To control the resolution:

cfg.MODEL.DIFFRAS.RESOLUTIONS = [64, 64]

is a flattened (e.g. for X and Y resolutions) list. This can be modified per layer by expanding it. For a two-layer model:

cfg.MODEL.DIFFRAS.RESOLUTIONS = [32, 32, 64, 64]

would supervise the first layer at 32 x 32 and the second at 64 x 64.
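
The flattened layout can be read back into one pair per layer as follows. This is only an illustration of the list format; the helper name is hypothetical, and the ordering within each pair follows the X, Y description above.

```python
from typing import List, Sequence, Tuple


def per_layer_resolutions(flat: Sequence[int]) -> List[Tuple[int, int]]:
    """Group a flattened resolution list into one (x, y) pair per layer."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries (two per layer)")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


print(per_layer_resolutions([32, 32, 64, 64]))
```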

Rasterization smoothness

In the same way as SoftRas, we require some rasterization smoothness to differentiably rasterize the masks.

    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_SCHED = (0.001,)

will produce quite sharp rasterization (larger values will be "blurrier") which seems to work well. This can also be made to be dependent on the current iteration:

    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_SCHED = (0.15, 0.005)
    cfg.MODEL.DIFFRAS.INV_SMOOTHNESS_STEPS = (50000,)

to start at 0.15 and drop to 0.005 at iteration 50000. This hyperparameter is not particularly sensitive in our experience; however, values that are too large will decrease performance.
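
A piecewise-constant schedule of this kind can be looked up with a binary search over the step boundaries. The sketch below is an assumption about the semantics (the new value takes effect exactly at the listed iteration); the function name is hypothetical and the repository may implement the schedule differently.

```python
from bisect import bisect_right
from typing import Sequence


def inv_smoothness_at(iteration: int,
                      sched: Sequence[float],
                      steps: Sequence[int] = ()) -> float:
    """Return the schedule value active at `iteration`.

    `sched` needs one more entry than `steps`; sched[k] applies from steps[k - 1] onward.
    """
    if len(sched) != len(steps) + 1:
        raise ValueError("sched must have exactly one more entry than steps")
    # Count how many step boundaries have been reached so far.
    return sched[bisect_right(steps, iteration)]


print(inv_smoothness_at(10000, (0.15, 0.005), (50000,)))
print(inv_smoothness_at(50000, (0.15, 0.005), (50000,)))
```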

Model Zoo

We release models for MS-COCO and Cityscapes.

COCO

| Mask head | Backbone | lr sched | Control points | mask AP | download |
|---|---|---|---|---|---|
| BoundaryFormer | R50-FPN | | 64 | 36.1 | model |

Cityscapes

| Mask head | Backbone | lr sched | Control points | initialization | mask AP | download |
|---|---|---|---|---|---|---|
| BoundaryFormer | R50-FPN | | 64 | ImageNet | 34.7 | model |
| BoundaryFormer | R50-FPN | | 64 | COCO | 38.3 | model |

License

BoundaryFormer uses Detectron2 and is further released under the Apache 2.0 license.

Citing BoundaryFormer

If you use BoundaryFormer in your research, please use the following BibTeX entry.

@InProceedings{Lazarow_2022_CVPR,
    author    = {Lazarow, Justin and Xu, Weijian and Tu, Zhuowen},
    title     = {Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4382-4391}
}
