zjukg / duet

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Home Page: https://arxiv.org/abs/2207.01328

License: MIT License

Languages: Shell 2.06%, Python 97.94%

Topics: pretrained-language-model, pytorch, transformer, zero-shot-learning, cross-modal, grounding, semantic, knowledge-transfer, visual-grounding

duet's Introduction

DUET


In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-modal objectives.

  • Due to the page and format restrictions set by AAAI publications, we have omitted some details and appendix content. For the complete version of the paper, including the selection of prompts and experiment details, please refer to our arXiv version.
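The attribute-level contrastive objective in point (2) can be pictured as a supervised InfoNCE-style loss over attribute-grounded features. The sketch below is a minimal illustration under our own assumptions (the function name and the convention that rows sharing an attribute id are positives are ours), not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(feats, attr_labels, temperature=0.1):
    """Supervised InfoNCE-style loss over attribute-grounded features.

    Illustrative sketch, not the paper's exact code: rows of `feats` that
    share an attribute id in `attr_labels` are treated as positive pairs.
    """
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / temperature                    # pairwise cosine similarity
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))          # exclude self-pairs
    pos = (attr_labels.unsqueeze(0) == attr_labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)      # row-wise log-softmax
    return -(log_prob[pos].sum() / pos.sum().clamp(min=1))   # mean over positive pairs
```

Pulling same-attribute pairs together while pushing co-occurring but distinct attributes apart is what counteracts the attribute co-occurrence and imbalance mentioned above.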

πŸ”” News

πŸ€– Model Architecture

(Figure: DUET model architecture)

πŸ“š Dataset Download

  • The cache data for CUB, AWA2, and SUN are available here (Baidu cloud, 19.89 GB, code: s07d).

πŸ“• Code Path

Code Structures

The code has four parts:

  • model: the main files of the DUET network.
  • data: the data splits for the different datasets.
  • cache: cached prompt and image files.
  • script: the training scripts for DUET.
DUET
├── cache
│   ├── AWA2
│   │   ├── attributeindex2prompt.json
│   │   └── id2imagepixel.pkl
│   ├── CUB
│   │   ├── attributeindex2prompt.json
│   │   ├── id2imagepixel.pkl
│   │   └── mapping.json
│   └── SUN
│       ├── attributeindex2prompt.json
│       ├── id2imagepixel.pkl
│       └── mapping.json
├── data
│   ├── AWA2
│   │   ├── APN.mat
│   │   ├── TransE_65000.mat
│   │   ├── att_splits.mat
│   │   ├── attri_groups_9.json
│   │   ├── kge_CH_AH_CA_60000.mat
│   │   └── res101.mat
│   ├── CUB
│   │   ├── APN.mat
│   │   ├── att_splits.mat
│   │   ├── attri_groups_8.json
│   │   ├── attri_groups_8_layer.json
│   │   └── res101.mat
│   └── SUN
│       ├── APN.mat
│       ├── att_splits.mat
│       ├── attri_groups_4.json
│       └── res101.mat
├── log
│   ├── AWA2
│   ├── CUB
│   └── SUN
├── model
│   ├── log.py
│   ├── main.py
│   ├── main_utils.py
│   ├── model_proto.py
│   ├── modeling_lxmert.py
│   ├── opt.py
│   ├── swin_modeling_bert.py
│   ├── util.py
│   └── visual_utils.py
├── out
│   ├── AWA2
│   ├── CUB
│   └── SUN
└── script
    ├── AWA2
    │   └── AWA2_GZSL.sh
    ├── CUB
    │   └── CUB_GZSL.sh
    └── SUN
        └── SUN_GZSL.sh

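The per-dataset cache files in the tree above (attributeindex2prompt.json, id2imagepixel.pkl) can be read with the standard library. A minimal sketch; the helper name is ours, and the value types are assumptions inferred from the file names:

```python
import json
import os
import pickle

def load_cache(dataset_dir):
    """Read DUET's per-dataset cache files (illustrative helper, not from the repo).

    attributeindex2prompt.json is assumed to map attribute indices to text
    prompts, and id2imagepixel.pkl to map image ids to pixel data.
    """
    with open(os.path.join(dataset_dir, "attributeindex2prompt.json")) as f:
        attr2prompt = json.load(f)
    with open(os.path.join(dataset_dir, "id2imagepixel.pkl"), "rb") as f:
        id2pixel = pickle.load(f)
    return attr2prompt, id2pixel
```

For example, `load_cache("cache/CUB")` would return both mappings for the CUB dataset.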
πŸ”¬ Dependencies

  • Python 3
  • PyTorch >= 1.8.0
  • Transformers >= 4.11.3
  • NumPy
  • All experiments are performed on a single RTX 3090 Ti GPU.
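A quick way to check that the installed packages satisfy these minimums is to compare version strings numerically. This is an illustrative helper, not part of the repo:

```python
import re

def meets_minimum(installed, minimum):
    """Numerically compare dotted version strings, e.g. '1.13.1+cu117' >= '1.8.0'.

    Usage (assuming torch and transformers are importable):
        meets_minimum(torch.__version__, "1.8.0")
        meets_minimum(transformers.__version__, "4.11.3")
    """
    nums = lambda v: tuple(int(x) for x in re.findall(r"\d+", v)[:3])
    return nums(installed) >= nums(minimum)
```

Plain string comparison would get this wrong (e.g. "1.10" < "1.8" lexicographically), hence the numeric tuple comparison.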

🎯 Prerequisites

  • Dataset: download the datasets (CUB, AWA2, SUN) and set opt.image_root to the dataset root path on your machine.
    • ❗NOTE: for other required feature files such as APN.mat and id2imagepixel.pkl, please refer to here.
  • Data split: download the data folder and place it in ./data/.
  • Prompts: generate attributeindex2prompt.json and place it in ./cache/<dataset>/.
  • Download a pretrained vision Transformer to serve as the vision encoder.
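The README does not name a specific vision-Transformer checkpoint, so the one below is an illustrative choice only (the repo also ships swin_modeling_bert.py, suggesting a Swin variant may be what the authors actually use). A hedged sketch of fetching a ViT backbone via Hugging Face Transformers:

```python
from transformers import ViTConfig, ViTModel

def build_vision_encoder(pretrained=True, name="google/vit-base-patch16-224"):
    """Fetch a ViT backbone to use as the vision encoder.

    The checkpoint name is an assumption, not verified against the authors'
    setup; swap in whichever checkpoint your config expects.
    """
    if pretrained:
        return ViTModel.from_pretrained(name)  # downloads weights from the HF hub
    return ViTModel(ViTConfig())  # randomly initialized; useful offline
```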

πŸš€ Train & Eval

The training script for AWA2 (GZSL):

    bash script/AWA2/AWA2_GZSL.sh
        [--dataset {AWA2, SUN, CUB}] [--calibrated_stacking CALIBRATED_STACKING] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
        [--classifier_lr LEARNING_RATE] [--xe XE] [--attri ATTRI] [--gzsl] [--patient PATIENT] [--model_name MODEL_NAME] [--mask_pro MASK_PRO]
        [--mask_loss_xishu MASK_LOSS_XISHU] [--xlayer_num XLAYER_NUM] [--construct_loss_weight CONSTRUCT_LOSS_WEIGHT] [--sc_loss SC_LOSS] [--mask_way MASK_WAY]
        [--attribute_miss ATTRIBUTE_MISS]

πŸ“Œ Note:

  • You can open the .sh files to modify the parameters.
  • If you have any questions, feel free to let us know by opening an issue.

🀝 Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@inproceedings{DBLP:conf/aaai/ChenHCGZFPC23,
  author       = {Zhuo Chen and
                  Yufeng Huang and
                  Jiaoyan Chen and
                  Yuxia Geng and
                  Wen Zhang and
                  Yin Fang and
                  Jeff Z. Pan and
                  Huajun Chen},
  title        = {{DUET:} Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning},
  booktitle    = {{AAAI}},
  pages        = {405--413},
  publisher    = {{AAAI} Press},
  year         = {2023}
}


duet's People

Contributors: bighyf, guspan-tanadi, hackerchenzhuo, jethrojames, wencolani

duet's Issues

How to get pretrained weight for BertTokenizer?

When running experiments on the CUB dataset, I found that main.py needs to load pre-trained weights for the BertTokenizer. How can I obtain them?

    if opt.dataset == "CUB":
        mask_for_predict = torch.zeros(30522)
        tokenizer = BertTokenizer.from_pretrained("/home/hyf/data/PLMs/bert-base-uncased", do_lower_case=True)
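One possible fix (a suggestion, not an official answer from the maintainers): replace the hard-coded local path with the Hugging Face hub id, which `from_pretrained` downloads and caches automatically. Note that 30522 is exactly the bert-base-uncased vocabulary size, so that is the matching checkpoint. A local directory containing vocab.txt also works:

```python
from transformers import BertTokenizer

def load_bert_tokenizer(path_or_id="bert-base-uncased"):
    """Load the tokenizer from the HF hub or from a local directory.

    Passing the hub id ("bert-base-uncased") downloads and caches the files
    automatically, replacing the hard-coded /home/hyf/... path in main.py;
    a local directory containing vocab.txt works the same way.
    """
    return BertTokenizer.from_pretrained(path_or_id, do_lower_case=True)
```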
