feielysia / viecap

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023

Home Page: https://openaccess.thecvf.com/content/ICCV2023/html/Fei_Transferable_Decoding_with_Visual_Entities_for_Zero-Shot_Image_Captioning_ICCV_2023_paper.html

Python 80.56% Shell 2.51% Jupyter Notebook 16.93%
transferability vision-language-model object-hallucination zero-shot-captioning modality-biases

viecap's People

Contributors

feielysia

viecap's Issues

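The code is really well written

It is very readable, and I am studying it as a template. Thanks to the Hyperion captain programmer, hhh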

about infer by batch

Congrats on your paper being accepted by ICCV 2023!
Looking at the infer_by_batch.py file in your source code, I don't see where batched data is used as input. Or am I just being careless and misunderstanding something?

some question about 'ClipCaptionPrefix'

Thank you for sharing this exciting work! The code and comments are clean and consistent, and I learned a lot from them.

I would like to know: does the hyperparameter frozen_gpt mean that the whole GPT model is frozen during training? I noticed that the ClipCaptionPrefix code is as follows:

class ClipCaptionPrefix(ClipCaptionModel):

    def parameters(self, recurse: bool = True):
        return self.mapping_network.parameters()

    def train(self, mode: bool = True):
        super(ClipCaptionPrefix, self).train(mode)
        self.gpt.eval()
        return self

I think gpt.eval() only switches modules such as Batch Normalization and Dropout to evaluation mode. After setting frozen_gpt=True, I printed the parameters of gpt2 as follows:

for name, param in model.gpt.named_parameters():
  print(name, ":", param.requires_grad)

The output:

transformer.wte.weight : True
transformer.wpe.weight : True
transformer.h.0.ln_1.weight : True
...

So I'm wondering whether the whole gpt2 model is frozen, or only the BN and Dropout layers.

Thanks in advance!
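
One way to check this is the minimal sketch below. It is not the repository's code: TinyCaptionModel and TinyCaptionPrefix are hypothetical stand-ins with arbitrary layer sizes, and it assumes the optimizer is built from model.parameters(). Under that assumption, the overridden parameters() hands only the mapping network to the optimizer, so the language-model weights are never updated even though requires_grad still prints True. Setting requires_grad to False is a separate, optional step that also skips computing their gradients.

```python
import torch
from torch import nn


class TinyCaptionModel(nn.Module):
    """Hypothetical stand-in for ClipCaptionModel; layer sizes are arbitrary."""

    def __init__(self):
        super().__init__()
        self.mapping_network = nn.Linear(4, 4)
        # Stand-in for GPT-2: a Dropout layer followed by a Linear layer.
        self.gpt = nn.Sequential(nn.Dropout(0.1), nn.Linear(4, 2))

    def forward(self, x):
        return self.gpt(self.mapping_network(x))


class TinyCaptionPrefix(TinyCaptionModel):
    """Same two overrides as the ClipCaptionPrefix snippet quoted above."""

    def parameters(self, recurse: bool = True):
        # Only the mapping network is exposed to whoever calls parameters().
        return self.mapping_network.parameters()

    def train(self, mode: bool = True):
        super().train(mode)
        self.gpt.eval()  # keeps Dropout inside the language model inactive
        return self


torch.manual_seed(0)
model = TinyCaptionPrefix().train()
# Assumption: the optimizer is created from model.parameters().
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

gpt_before = [p.detach().clone() for p in model.gpt.parameters()]
map_before = [p.detach().clone() for p in model.mapping_network.parameters()]

loss = model(torch.randn(8, 4)).pow(2).sum()
loss.backward()
optimizer.step()

# requires_grad is untouched by the overrides, so every flag is still True.
flags = [p.requires_grad for p in model.gpt.parameters()]
# The optimizer never saw the gpt parameters, so their values are unchanged.
gpt_unchanged = all(
    torch.equal(old, new) for old, new in zip(gpt_before, model.gpt.parameters())
)
map_changed = any(
    not torch.equal(old, new)
    for old, new in zip(map_before, model.mapping_network.parameters())
)
print(flags, gpt_unchanged, map_changed)

# Optional explicit freeze: also stops gradients from being computed for gpt.
for p in model.gpt.parameters():
    p.requires_grad_(False)
```

In this sketch the language model is effectively frozen with respect to weight updates, while eval() only changes the behaviour of mode-dependent layers such as Dropout. Whether the same holds in the repository depends on how its training script builds the optimizer.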

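Problem loading the pretrained GPT2 model

Dear author, I ran into the following problem when running train_coco.sh. How can I solve it?
I think it is a problem with loading the pretrained GPT2 weights. I searched and tried many methods, but none of them worked. I hope you can help. Thank you!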
Traceback (most recent call last):
  File "main.py", line 168, in <module>
    main()
  File "main.py", line 152, in main
    datasets = CaptionsDataset(
  File "/private/ViECap-main/CaptionsDataset.py", line 31, in __init__
    tokenizer = AutoTokenizer.from_pretrained(language_model)
  File "/root/anaconda3/envs/Viecap/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 498, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/root/anaconda3/envs/Viecap/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 359, in get_tokenizer_config
    resolved_config_file = get_file_from_repo(
  File "/root/anaconda3/envs/Viecap/lib/python3.8/site-packages/transformers/utils/hub.py", line 678, in get_file_from_repo
    resolved_file = cached_path(
  File "/root/anaconda3/envs/Viecap/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/root/anaconda3/envs/Viecap/lib/python3.8/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
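
The traceback shows the tokenizer files, not the model weights, failing to download: the machine cannot reach the Hugging Face Hub and has no cached copy. A possible workaround, sketched below under assumptions, is to point from_pretrained at a local directory that already contains the files. The helper names and the ./checkpoints folder are hypothetical and do not come from the ViECap repository. AutoTokenizer.from_pretrained does accept a local directory, and local_files_only=True tells it not to attempt a download.

```python
import os


def resolve_language_model(name_or_path: str, local_root: str = "./checkpoints") -> str:
    """Return a local model directory if one exists, otherwise the original name.

    `local_root` is a hypothetical folder, not a path from the ViECap repository.
    """
    candidate = os.path.join(local_root, name_or_path)
    return candidate if os.path.isdir(candidate) else name_or_path


def load_tokenizer_offline(name_or_path: str, local_root: str = "./checkpoints"):
    """Load a tokenizer from local files only (requires the `transformers` package)."""
    # Imported lazily so that resolve_language_model has no third-party dependency.
    from transformers import AutoTokenizer

    path = resolve_language_model(name_or_path, local_root)
    # local_files_only=True makes the call fail fast instead of contacting the Hub.
    return AutoTokenizer.from_pretrained(path, local_files_only=True)
```

To fill the local folder, one option is to run AutoTokenizer.from_pretrained("gpt2").save_pretrained("./checkpoints/gpt2") once on a machine that can reach the Hub, then copy the folder over and pass its path as the language model argument. The model weights would need the same treatment if they are loaded by name as well.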
