Questions about ptp (OPEN, 8 comments)

sail-sg commented on August 26, 2024

Comments (8)

nqx12348 commented on August 26, 2024

Hi @FingerRec, I downloaded the original BLIP checkpoint trained on 14M data, performed zero-shot testing on COCO, and got the following result.
[image: zero-shot COCO retrieval results]
The result is very close to BLIP-PTP trained on 4M data, and much higher than BLIP trained on 4M data (according to the numbers provided in the paper). The performance gap between models trained on 4M and 14M pretraining data is quite surprising.
[image: table from the paper]
Could you kindly release the BLIP checkpoint trained on 4M data (without PTP) for comparison, so we can run more experiments to evaluate the effectiveness of PTP? Thanks!
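For context, zero-shot testing here means scoring the pretrained checkpoint on COCO retrieval without any fine-tuning: compute image-text similarities and report recall@K. A minimal sketch of the metric (the similarity matrix would come from the model; the 3×3 values below are made up):

```python
# Minimal recall@K for retrieval, assuming a precomputed similarity matrix.
def recall_at_k(sim, gt, k):
    """sim[i][j]: similarity of query i to candidate j; gt[i]: correct candidate index."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: -row[j])  # best candidates first
        if gt[i] in ranked[:k]:
            hits += 1
    return hits / len(sim)

# Toy 3x3 example: query 2's correct match is only ranked second.
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.1],
       [0.2, 0.7, 0.6]]
print(recall_at_k(sim, gt=[0, 1, 2], k=1))  # 2 of 3 queries hit at rank 1
print(recall_at_k(sim, gt=[0, 1, 2], k=2))  # all 3 correct matches fall within the top 2
```

The same recall@1/5/10 computation, run over the full COCO test split's similarity matrix, is what the released logs report.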


nqx12348 commented on August 26, 2024

Thanks for your reply! I'm considering experiments on WebVid. Have you tried pretraining with PTP on WebVid and evaluating on downstream datasets, e.g., MSRVTT? I also noticed that you use object information in OA-Trans. Will you release the extracted object tags and bounding boxes for WebVid?


nqx12348 commented on August 26, 2024

Thanks for your detailed explanation! Here are my zero-shot testing logs, from the pretrained checkpoint and the COCO zero-shot checkpoint, respectively.
pretrained_concated_pred_4m.log
coco_zero_shot.log


FingerRec commented on August 26, 2024

Hi @nqx12348:
Thanks for your good question.

  1. The preparation of the pretraining corpus follows OSCAR (https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md). All settings are kept consistent, which is common practice.
  2. Yes, I have the same observation! PTP relies heavily on the quality of the object tags. I previously focused on video-language pre-training and observed that it is hard to introduce object information during the fine-tuning stage (as in OA-Trans). Like you, I also tried incorporating PTP into fine-tuning for common VL tasks, and it did not help; it is best to introduce it at the pre-training stage. Still, you could try two experiments: a. use PTP with 50% probability; b. incorporate it into pretraining datasets such as WebVid and CC3M.

I'd like to see whether PTP helps on video-language tasks. Looking forward to further communication.
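Suggestion (a) above — applying PTP with 50% probability — could be sketched as a caption transform in the dataloader. The prompt template follows PTP's position-guided form ("The block [P] has a [O]"); the function name and interface here are hypothetical:

```python
import random

def maybe_add_ptp_prompt(caption, object_tag, block_idx, p=0.5, rng=random):
    """With probability p, append a PTP-style position-guided prompt
    naming an object tag and the image block it falls in."""
    if rng.random() < p:
        return f"{caption} The block {block_idx} has a {object_tag}."
    return caption

# Deterministic demo with p=1.0 (prompt always appended):
print(maybe_add_ptp_prompt("a dog runs on grass", "dog", 4, p=1.0))
# a dog runs on grass The block 4 has a dog.
```

With `p=0.5`, roughly half the training captions would carry the position prompt, so the model still sees plain captions and is less dependent on object tags at inference time.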


FingerRec commented on August 26, 2024

I did not explore PTP on video-text tasks, but it should work. Previously I saved the object features and tags together in numpy files, which took 10 TB of space. Since I have already offboarded and no longer have access to these data, you may need to follow https://github.com/FingerRec/OA-Transformer/blob/main/object_extraction.md for extraction.
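If the 10 TB footprint came mostly from saved region features, storing only tags and boxes is far lighter and may suffice for PTP-style prompts. A hypothetical JSON-Lines layout (field names are made up for illustration):

```python
import io
import json

# Hypothetical per-frame record: object tags + boxes only, no region features.
records = [
    {"video_id": "webvid_000001", "frame": 0,
     "tags": ["dog", "grass"],
     "boxes": [[10, 20, 110, 140], [0, 0, 320, 240]]},  # x1, y1, x2, y2
]

# Write one JSON object per line (jsonl); a real pipeline would use a file.
buf = io.StringIO()
for r in records:
    buf.write(json.dumps(r) + "\n")

# Reading back is a line-by-line parse, so the file can be streamed.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
print(loaded[0]["tags"])
```

Plain-text jsonl also compresses well with gzip, which matters at WebVid scale.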


nqx12348 commented on August 26, 2024

Thanks for your response. I'm still confused about the zero-shot setting on COCO.

  1. Comparing the released logs in this repo, I find 4M_ptp_coco_zero_shot.txt is identical to 4M_ptp_coco_ft.txt. Why? Does the model need to be trained during zero-shot testing on COCO? I notice there is no training process in zero-shot testing for Flickr30k.
  2. I also find two checkpoints (the pretrained checkpoint and the coco zero-shot checkpoint), but to my understanding, zero-shot testing on COCO needs no extra training procedure, so these two checkpoints should be identical. What's the difference between them? I notice there is no checkpoint for zero-shot Flickr30k.
  3. Given the above two questions, I'm a bit confused about the definition of the zero-shot retrieval task. In my opinion it means pretraining on a number of large datasets and testing on a new dataset (one not used in pretraining), without finetuning. But in PTP and ViLT, I find COCO is used both in the 4M training set and in "zero-shot" testing. Is this allowed in the "zero-shot" setting? I read the OSCAR and ViLT papers but still didn't find the answer. Could you kindly explain it? Thanks!
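The distinction in point 3 can be stated mechanically: strict zero-shot requires the test dataset's domain to be disjoint from the pretraining mix. A toy check (the dataset names reflect the usual "4M" mix; the helper function is hypothetical):

```python
# The usual "4M" pretraining mix: COCO, Visual Genome, CC3M, SBU.
pretrain_datasets = {"coco", "vg", "cc3m", "sbu"}

def is_strict_zero_shot(test_dataset, pretrain_datasets):
    """Strict zero-shot: the test dataset must not appear in the pretraining mix."""
    return test_dataset not in pretrain_datasets

print(is_strict_zero_shot("coco", pretrain_datasets))       # False: same-domain data was seen in pretraining
print(is_strict_zero_shot("flickr30k", pretrain_datasets))  # True
```

Even though only COCO's train split is used in pretraining and testing uses the val/test split, both splits come from the same domain, which is what point 3 questions.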


FingerRec commented on August 26, 2024

  1. Thanks a lot, nqx. Yes, I forgot to upload coco_zero_shot.txt. You are correct: zero-shot means testing directly without tuning. The log you found is for fine-tuning rather than zero-shot, which is why its performance is much higher. I'm looking for the zero-shot file; alternatively, could you post your test result here?
  2. You are correct. In general, pre-training produces multiple checkpoints, and their zero-shot results differ slightly; we select the checkpoint that performs best.
  3. I agree with you. The main reasons are the lack of high-quality datasets and historical convention. Conventional datasets like CC and YFCC are quite noisy, so previous work such as OSCAR introduced the human-annotated COCO and VG to help pre-training. Later works follow their setting, and this is indeed somewhat misleading: although downstream tasks are usually tested on the val/test set, those splits still come from the same domain (dataset). In image classification or domain adaptation, the zero-shot setting should not include data from the same domain.


FingerRec commented on August 26, 2024

Cool, I will upload the log you provided.

