This project forked from microsoft/x-decoder

Official Implementation of X-Decoder for generalized decoding for pixel, image and language

License: MIT License

X-Decoder: Generalized Decoding for Pixel, Image, and Language

[Project Page] [Paper] [HuggingFace All-in-One Demo] [HuggingFace Instruct Demo] [Video]

by Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee^, Jianfeng Gao^ in CVPR 2023.

๐ŸŒถ๏ธ Getting Started

๐Ÿ‘‰ [New] Latest Checkpoints and Numbers:

Backbone | Checkpoint | COCO: PQ, mAP, mIoU | ADE: PQ, mAP, mIoU | Ref-COCO: mIoU | COCO-Karpathy: ir@1, tr@1, CIDEr
Focal-T last 50.8 39.5 62.4 9.6 23.9 63.2 30.0 48.3 83.3
Focal-T best_open_seg 48.8 37.0 60.2 10.1 29.1 61.6 30.2 48.36
Focal-L last 56.2 46.4 65.5 11.5 23.6 67.7 34.9 54.4
Focal-L best_open_seg 51.5 41.3 64.1 11.7 29.4 61.5 30.7 50.1

Note that the numbers in Table 1 of the main paper are reported after task-specific finetuning.
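
For readers unfamiliar with the mIoU columns, the following is a minimal sketch of how mean intersection-over-union is commonly computed for one flattened label map. The function name `mean_iou` and the toy labels are illustrative assumptions, not code from this repository; benchmark evaluation typically accumulates intersections and unions over the whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")  # the table reports percentages
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the sketch reports an mIoU of 50.0.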

👉 [New] Installation, Training, Evaluation, Dataset, and Demo Guide

🔥 News

  • [2023.07.19] 🎢 We are excited to release the X-Decoder training code (INSTALL.md, DATASET.md, TRAIN.md, EVALUATION.md)!
  • [2023.07.10] We released Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. Code and checkpoint are available!
  • [2023.04.14] We are releasing SEEM, a new universal interactive interface for image segmentation! You can use it for any segmentation task, way beyond what X-Decoder can do!

  • [2023.03.20] Building on X-Decoder, we developed OpenSeeD ([Paper][Code]) to enable open-vocabulary segmentation and detection with a single model. Check it out!
  • [2023.03.14] We released X-GPT, a conversational version of X-Decoder built through GPT-3 and LangChain!
  • [2023.03.01] The Segmentation in the Wild Challenge has been launched and is ready for result submissions!
  • [2023.02.28] We released the SGinW benchmark for our challenge. You are welcome to build your own models on the benchmark!
  • [2023.02.27] X-Decoder has been accepted to CVPR 2023!
  • [2023.02.07] We combined X-Decoder (strong image understanding), GPT-3 (strong language understanding) and Stable Diffusion (strong image generation) to make an instructional image editing demo. Check it out!
  • [2022.12.21] We released the inference code of X-Decoder.
  • [2022.12.21] We released the Focal-T pretrained checkpoint.
  • [2022.12.21] We released the open-vocabulary segmentation benchmark.

๐Ÿ–Œ๏ธ DEMO

๐Ÿซ [X-GPT] โ€‚ ๐Ÿ“[Instruct X-Decoder]

🎶 Introduction

X-Decoder is a generalized decoding model that can generate pixel-level segmentation and token-level texts seamlessly!

It achieves:

  • State-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets;
  • Finetuned performance that is better than or competitive with generalist and specialist models on segmentation and vision-language (VL) tasks;
  • Efficient finetuning and flexible composition of novel tasks.

It supports:

  • One suite of parameters pretrained for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, and Image-Text Retrieval;
  • One model architecture finetuned for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, Image-Text Retrieval and Visual Question Answering (with an extra cls head);
  • Zero-shot task composition for Region Retrieval, Referring Captioning, Image Editing.
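
One plausible reading of zero-shot task composition is that pretrained capabilities are chained without any extra training. The sketch below illustrates this for region retrieval: pick the image that best matches a text query, then segment the region that the query refers to. Every name here (`region_retrieval`, `toy_score`, `toy_segment`, `toy_captions`) is a hypothetical stand-in written for illustration; none of it is the X-Decoder API.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical capability signatures, NOT functions from the X-Decoder codebase.
ScoreFn = Callable[[str, str], float]   # (image_id, text) -> image-text similarity
SegmentFn = Callable[[str, str], Any]   # (image_id, text) -> mask for the referred region


def region_retrieval(
    images: Sequence[str], query: str, score: ScoreFn, segment: SegmentFn
) -> Tuple[str, Any]:
    """Compose image-text retrieval with referring segmentation."""
    # Step 1: retrieval picks the image that best matches the query.
    best = max(images, key=lambda img: score(img, query))
    # Step 2: referring segmentation grounds the same query in that image.
    return best, segment(best, query)


# Toy stand-ins so the sketch runs without any model (assumed data).
toy_captions = {"img_a": "a dog on grass", "img_b": "a red car on a road"}


def toy_score(image: str, text: str) -> float:
    # Word overlap with a toy caption plays the role of a similarity score.
    return len(set(toy_captions[image].split()) & set(text.split()))


def toy_segment(image: str, text: str) -> str:
    # A string placeholder plays the role of a pixel mask.
    return f"mask({image}, '{text}')"


result = region_retrieval(list(toy_captions), "red car", toy_score, toy_segment)
print(result)
```

In a real setting the two callables would wrap a single pretrained model, which is what makes the composition zero-shot: no new parameters are trained for the combined task.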

Acknowledgement

Citation

@article{zou2022xdecoder,
  author      = {Zou*, Xueyan and Dou*, Zi-Yi and Yang*, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Wang, Jianfeng and Yuan, Lu and Peng, Nanyun and Wang, Lijuan and Lee*, Yong Jae and Gao*, Jianfeng},
  title       = {Generalized Decoding for Pixel, Image and Language},
  publisher   = {arXiv},
  year        = {2022},
}
