Questions about ptp (OPEN, 8 comments)

sail-sg commented on August 26, 2024

Comments (8)

nqx12348 commented on August 26, 2024

Hi @FingerRec, I downloaded the original BLIP checkpoint trained on 14M data, performed zero-shot testing on COCO, and got the following result.
[image: zero-shot COCO retrieval results]
The result is very close to BLIP-PTP trained on 4M data, and much higher than BLIP trained on 4M data (according to the numbers provided in the paper). The performance gap between models trained on 4M and 14M pretraining data is quite surprising.
[image: table from the paper]
Could you kindly release the BLIP checkpoint trained on 4M data (without PTP) for comparison, so we can run more experiments to evaluate the effectiveness of PTP? Thanks!
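For context, zero-shot testing here means scoring the pretrained checkpoint on COCO retrieval without any fine-tuning: compute image-text similarities and report recall@K. A minimal sketch of the metric (the similarity matrix would come from the model; the 3×3 values below are made up):

```python
# Minimal recall@K for retrieval, assuming a precomputed similarity matrix.
def recall_at_k(sim, gt, k):
    """sim[i][j]: similarity of query i to candidate j; gt[i]: correct candidate index."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: -row[j])  # best candidates first
        if gt[i] in ranked[:k]:
            hits += 1
    return hits / len(sim)

# Toy 3x3 example: query 2's correct match is only ranked second.
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.1],
       [0.2, 0.7, 0.6]]
print(recall_at_k(sim, gt=[0, 1, 2], k=1))  # 2 of 3 queries hit at rank 1
print(recall_at_k(sim, gt=[0, 1, 2], k=2))  # all 3 correct matches fall within the top 2
```

The same recall@1/5/10 computation, run over the full COCO test split's similarity matrix, is what the released logs report.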


nqx12348 commented on August 26, 2024

Thanks for your reply! I'm considering experiments on WebVid. Have you tried pretraining with PTP on WebVid and evaluating on downstream datasets, e.g., MSRVTT? I also noticed that you use object information in OA-Trans. Will you release the extracted object tags and bounding boxes for WebVid?


nqx12348 commented on August 26, 2024

Thanks for your detailed explanation! Here are my zero-shot testing logs, from the pretrained checkpoint and the COCO zero-shot checkpoint, respectively.
pretrained_concated_pred_4m.log
coco_zero_shot.log


FingerRec commented on August 26, 2024

Hi @nqx12348:
Thanks for your good question.

  1. The preparation of the pretraining corpus follows OSCAR (https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md). All settings are kept consistent, which is common practice.
  2. Yes, I have the same observation! PTP relies heavily on the quality of the object tags. I previously focused on video-language pre-training and observed that it is hard to introduce object information during the fine-tuning stage (as in OA-Trans). Like you, I also tried incorporating PTP into fine-tuning for common VL tasks, and it did not help; it is best to introduce it at the pre-training stage. Still, you could try two experiments: a. use PTP with 50% probability; b. incorporate it into pretraining datasets such as WebVid and CC3M.

I'd like to see whether PTP helps on video-language tasks. Looking forward to further communication.
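Suggestion (a) above — applying PTP with 50% probability — could be sketched as a caption transform in the dataloader. The prompt template follows PTP's position-guided form ("The block [P] has a [O]"); the function name and interface here are hypothetical:

```python
import random

def maybe_add_ptp_prompt(caption, object_tag, block_idx, p=0.5, rng=random):
    """With probability p, append a PTP-style position-guided prompt
    naming an object tag and the image block it falls in."""
    if rng.random() < p:
        return f"{caption} The block {block_idx} has a {object_tag}."
    return caption

# Deterministic demo with p=1.0 (prompt always appended):
print(maybe_add_ptp_prompt("a dog runs on grass", "dog", 4, p=1.0))
# a dog runs on grass The block 4 has a dog.
```

With `p=0.5`, roughly half the training captions would carry the position prompt, so the model still sees plain captions and is less dependent on object tags at inference time.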


FingerRec commented on August 26, 2024

I did not explore PTP on video-text tasks, but it should work. Previously I saved the object features and tags together in numpy files, which took 10 TB of space. Since I have already offboarded and no longer have access to these data, you may need to follow https://github.com/FingerRec/OA-Transformer/blob/main/object_extraction.md for extraction.
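If the 10 TB footprint came mostly from saved region features, storing only tags and boxes is far lighter and may suffice for PTP-style prompts. A hypothetical JSON-Lines layout (field names are made up for illustration):

```python
import io
import json

# Hypothetical per-frame record: object tags + boxes only, no region features.
records = [
    {"video_id": "webvid_000001", "frame": 0,
     "tags": ["dog", "grass"],
     "boxes": [[10, 20, 110, 140], [0, 0, 320, 240]]},  # x1, y1, x2, y2
]

# Write one JSON object per line (jsonl); a real pipeline would use a file.
buf = io.StringIO()
for r in records:
    buf.write(json.dumps(r) + "\n")

# Reading back is a line-by-line parse, so the file can be streamed.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
print(loaded[0]["tags"])
```

Plain-text jsonl also compresses well with gzip, which matters at WebVid scale.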


nqx12348 commented on August 26, 2024

Thanks for your response. I'm still confused about the zero-shot setting on COCO.

  1. Comparing the released logs in this repo, I find 4M_ptp_coco_zero_shot.txt is identical to 4M_ptp_coco_ft.txt. Why? Does the model need to be trained during zero-shot testing on COCO? I notice there is no training process in zero-shot testing for Flickr30k.
  2. I also find two checkpoints (the pretrained checkpoint and the coco zero-shot checkpoint), but to my understanding, zero-shot testing on COCO needs no extra training procedure, so these two checkpoints should be identical. What's the difference between them? I notice there is no checkpoint for zero-shot Flickr30k.
  3. Given the above two questions, I'm a bit confused about the definition of the zero-shot retrieval task. In my opinion it means pretraining on a number of large datasets and testing on a new dataset (one not used in pretraining), without finetuning. But in PTP and ViLT, I find COCO is used both in the 4M training set and in "zero-shot" testing. Is this allowed in the "zero-shot" setting? I read the OSCAR and ViLT papers but still didn't find the answer. Could you kindly explain it? Thanks!
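The distinction in point 3 can be stated mechanically: strict zero-shot requires the test dataset's domain to be disjoint from the pretraining mix. A toy check (the dataset names reflect the usual "4M" mix; the helper function is hypothetical):

```python
# The usual "4M" pretraining mix: COCO, Visual Genome, CC3M, SBU.
pretrain_datasets = {"coco", "vg", "cc3m", "sbu"}

def is_strict_zero_shot(test_dataset, pretrain_datasets):
    """Strict zero-shot: the test dataset must not appear in the pretraining mix."""
    return test_dataset not in pretrain_datasets

print(is_strict_zero_shot("coco", pretrain_datasets))       # False: same-domain data was seen in pretraining
print(is_strict_zero_shot("flickr30k", pretrain_datasets))  # True
```

Even though only COCO's train split is used in pretraining and testing uses the val/test split, both splits come from the same domain, which is what point 3 questions.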


FingerRec commented on August 26, 2024

  1. Thanks a lot, nqx. Yes, I forgot to upload coco_zero_shot.txt. You are correct: zero-shot means testing directly without tuning. The log you found is for fine-tuning rather than zero-shot, which is why its performance is much higher. I'm looking for the zero-shot file; alternatively, could you post your test result here?
  2. You are correct. In general, pre-training produces multiple checkpoints, and their zero-shot results differ slightly; we select the checkpoint that performs best.
  3. I agree with you. The main reasons are the lack of high-quality datasets and historical convention. Conventional datasets like CC and YFCC are quite noisy, so previous work such as OSCAR introduced the human-annotated COCO and VG to help pre-training. Later works follow their setting, and this is indeed somewhat misleading: although downstream tasks are usually tested on the val/test set, those splits still come from the same domain (dataset). In image classification or domain adaptation, the zero-shot setting should not include data from the same domain.


FingerRec commented on August 26, 2024

Cool, I will upload the log you provided.

