Could you please provide a script for the ALBEF-to-CLIP_CNN attack?
To check transferability from ALBEF to CLIP_CNN, I changed the target_model in the Python script eval_albef2clip-vit_flickr.py from ViT-B/16 to RN101, and I got the following scores, which differ from the ones reported in the paper (Table 2, last column, for SGA). Could you please clarify this anomaly, or have I missed something?
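For reference, this is the only change I made (a minimal sketch; I am assuming the script loads the target model through the OpenAI `clip` package):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Original line in eval_albef2clip-vit_flickr.py loaded the ViT target:
# target_model, preprocess = clip.load("ViT-B/16", device=device)

# My change: load the CNN-based CLIP (RN101) as the target model instead.
target_model, preprocess = clip.load("RN101", device=device)
target_model.eval()
```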
Hi Zhiqiang @Zoky-2020,
I am encountering some issues while replicating Sep-Attack and Co-Attack for transferability validation. Could you please share the relevant code? Thank you very much.
I am confused about this step:
"we select the most matching caption pairs from the dataset of each image v to form an augmented caption set t = {t1, t2, ..., tM}"
Since the dataset only provides a single image-text pair, how can you find multiple matched texts for a single image?
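For context, here is how I imagine the augmented set could be built (a minimal sketch, assuming a CLIP-style encoder is used to rank all dataset captions by similarity to the image; the actual SGA code may do this differently):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

@torch.no_grad()
def top_m_captions(image, captions, M=5):
    """Rank every caption in the dataset pool by image-text similarity and
    keep the top M as the augmented caption set t = {t1, ..., tM}.
    `image` is a preprocessed tensor of shape (1, 3, 224, 224)."""
    img_feat = model.encode_image(image.to(device))                   # (1, d)
    txt_feat = model.encode_text(clip.tokenize(captions).to(device))  # (N, d)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.t()).squeeze(0)                       # (N,)
    top = sims.topk(M).indices.tolist()
    return [captions[i] for i in top]
```

Is this roughly what the paper means, i.e. the captions come from the whole dataset rather than from the single paired text?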
Hi @Zoky-2020
Thanks for responding to my previous issues.
I need a few clarifications regarding Table-5 results.
Upon inspecting the RefCOCO+ dataset, I found that refcoco+_test.json and refcoco+_val.json contain paths to images from the train set of MSCOCO. I created a JSON file containing the paths of these train images (along with their captions) and then generated adversarial images by attacking the ALBEF model.
Afterwards, I performed the evaluation using Grounding.py. I ensured that the dataset class loads the adversarial images during evaluation by modifying the image paths in __getitem__ of grounding_dataset.py, roughly as sketched below.
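This is approximately the change I made (a sketch; ADV_DIR and the exact annotation field names are from my local setup and may not match the repo's dataset class exactly):

```python
import os
from PIL import Image

ADV_DIR = "output/refcoco_adv"  # my local directory of saved adversarial images

def __getitem__(self, index):
    ann = self.ann[index]
    # original: image_path = os.path.join(self.image_root, ann['image'])
    # my change: load the pre-generated adversarial image with the same filename
    image_path = os.path.join(ADV_DIR, os.path.basename(ann['image']))
    image = Image.open(image_path).convert('RGB')
    image = self.transform(image)
    return image, ann['text'], ann['ref_id']
```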
I obtained the following results, which are not close to the ones reported in the paper. Could you please comment on whether I missed something while reproducing Table-5?
Thanks for your great work.
However, I have a question about the specific method used to generate the corresponding adversarial captions: is BERT-Attack used? I am also unclear about the last step of generating the corresponding adversarial captions; my current understanding is sketched below.
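To make my question concrete, this is my understanding of a BERT-Attack-style substitution step (a rough sketch using HuggingFace transformers, not the repo's actual code): mask a word and let a masked LM propose candidate replacements.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def candidate_substitutes(caption, position, k=10):
    """Mask the word at `position` and return the top-k tokens the masked
    LM proposes as replacements (candidate adversarial substitutions)."""
    words = caption.split()
    words[position] = tokenizer.mask_token
    inputs = tokenizer(" ".join(words), return_tensors="pt")
    logits = mlm(**inputs).logits
    mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[0, mask_idx].topk(k).indices
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Each candidate would then presumably be scored by how much it reduces the image-text matching score; is that what SGA does on the text side?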
Hi, thanks for the great work.
Could you please provide a script (and instructions) for cross-task transferability (ITR-to-IC and ITR-to-VG) to reproduce the results of Table-4 and Table-5 of the arXiv paper?
Thanks!