
CLIP

A collection of recent papers on CLIP and prompt learning.

Core Ideas

  • Why does the CLIP prior work so well?
    • CLIP learns the correspondence between vision and language through contrastive pre-training on 400 million image-text pairs.
  • How is CLIP transferred to downstream tasks?
    • Zero-shot transfer
    • Prompt learning
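A minimal sketch of the two ideas above, in plain Python with toy vectors standing in for real image and text encoders. The encoder, the embedding values, and all helper names are hypothetical. `symmetric_contrastive_loss` illustrates the kind of symmetric cross-entropy objective CLIP optimizes over a batch of matched image-text pairs, and `zero_shot_classify` illustrates zero-shot transfer: each class name is wrapped in a hand-written prompt template, encoded, and the class whose prompt embedding is most similar to the image embedding is chosen. Prompt learning methods such as CoOp instead replace the hand-written context words with learnable continuous vectors. That variant is not shown here.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(image_emb, text_embs, logit_scale=100.0):
    """Scaled cosine similarity between one image embedding and each text embedding.

    logit_scale is fixed here for simplicity; in CLIP the temperature is learned.
    """
    img = normalize(image_emb)
    return [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def symmetric_contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Average of image->text and text->image cross-entropy; pair i is the positive for item i."""
    n = len(image_embs)
    # logits[i][j]: similarity of image i to text j (each row is one image)
    logits = [cosine_logits(img, text_embs, logit_scale) for img in image_embs]
    loss_img = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # Transpose so each row is one text scored against every image
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_txt = -sum(math.log(softmax(columns[j])[j]) for j in range(n)) / n
    return (loss_img + loss_txt) / 2


def zero_shot_classify(image_emb, class_names, encode_text, template="a photo of a {}."):
    """Return the class whose prompt embedding is closest to the image, plus class probabilities."""
    prompts = [template.format(name) for name in class_names]
    probs = softmax(cosine_logits(image_emb, [encode_text(p) for p in prompts]))
    best = max(range(len(class_names)), key=lambda k: probs[k])
    return class_names[best], probs


# Hypothetical toy "text encoder": a lookup table of made-up 3-d embeddings.
TOY_TEXT_EMB = {
    "a photo of a cat.": [1.0, 0.1, 0.0],
    "a photo of a dog.": [0.1, 1.0, 0.0],
    "a photo of a car.": [0.0, 0.1, 1.0],
}

label, probs = zero_shot_classify([0.9, 0.2, 0.05], ["cat", "dog", "car"], TOY_TEXT_EMB.__getitem__)
print(label)  # cat
```

No classifier is trained for the zero-shot case. The class names themselves, phrased as prompts, act as the classifier weights, which is why the choice of template (or a learned prompt) can change accuracy.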

Seminal Work

Title | Year | Venue | Code | Notes
Learning Transferable Visual Models From Natural Language Supervision | 2021 | ICML | Link | Link

Improved Work

Title | Year | Venue | Code
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm | 2022 | ICLR | Link
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | 2022 | arXiv | Link
SLIP: Self-supervision meets Language-Image Pre-training | 2022 | arXiv | Link

Applications

Image Classification

Title | Year | Venue | Code
Learning to Prompt for Vision-Language Models | 2021 | arXiv | Link
Neural Prompt Search | 2022 | arXiv | None
Prompt Distribution Learning | 2022 | CVPR | None
Conditional Prompt Learning for Vision-Language Models | 2022 | CVPR | Link
CLIP-Adapter: Better Vision-Language Models with Feature Adapters | 2022 | arXiv | Link
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling | 2022 | arXiv | Link
Unsupervised Prompt Learning for Vision-Language Models | 2022 | arXiv | Link

Detection

Title | Year | Venue | Code | Notes
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model | 2022 | CVPR | Link | Link
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | 2022 | CVPR | Link | Link

Image/Video Translation and Editing

Title | Year | Venue | Code
HairCLIP: Design Your Hair by Text and Reference Image | 2022 | CVPR | Link
FlexIT: Towards Flexible Semantic Image Translation | 2022 | CVPR | None
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance | 2022 | arXiv | Link

Image/Video Understanding

Title | Year | Venue | Code
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos | 2022 | CVPR | Link

Visual Grounding

Title | Year | Venue | Code
ClipCap: CLIP Prefix for Image Captioning | 2021 | arXiv | Link
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models | 2021 | arXiv | None
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | 2022 | ACL | None
