Could you please provide a script for the ALBEF-to-CLIP_CNN attack?
To check transferability from ALBEF to CLIP_CNN, I changed the target_model in the Python script eval_albef2clip-vit_flickr.py from ViT-B/16 to RN101, and I got the following scores, which differ from the ones reported in the paper (Table 2, last column, for SGA). Could you please clarify this anomaly, or have I missed something?
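For reference, this is the only change I made (a minimal sketch; I am assuming the script loads the target model through the OpenAI `clip` package):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Original line in eval_albef2clip-vit_flickr.py loaded the ViT target:
# target_model, preprocess = clip.load("ViT-B/16", device=device)

# My change: load the CNN-based CLIP (RN101) as the target model instead.
target_model, preprocess = clip.load("RN101", device=device)
target_model.eval()
```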
Hi Zhiqiang @Zoky-2020,
I am encountering some issues while replicating Sep-Attack and Co-Attack for transferability validation. Could you please share the relevant code? Thank you very much.
I am confused about this step:
"we select the most matching caption pairs from the dataset of each image v to form an augmented caption set t = {t1, t2, ..., tM}"
Since the dataset only provides a single image-text pair, how can you find multiple matched texts for a single image?
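For context, here is how I imagine the augmented set could be built (a minimal sketch, assuming a CLIP-style encoder is used to rank all dataset captions by similarity to the image; the actual SGA code may do this differently):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

@torch.no_grad()
def top_m_captions(image, captions, M=5):
    """Rank every caption in the dataset pool by image-text similarity and
    keep the top M as the augmented caption set t = {t1, ..., tM}.
    `image` is a preprocessed tensor of shape (1, 3, 224, 224)."""
    img_feat = model.encode_image(image.to(device))                   # (1, d)
    txt_feat = model.encode_text(clip.tokenize(captions).to(device))  # (N, d)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.t()).squeeze(0)                       # (N,)
    top = sims.topk(M).indices.tolist()
    return [captions[i] for i in top]
```

Is this roughly what the paper means, i.e. the captions come from the whole dataset rather than from the single paired text?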
Hi @Zoky-2020
Thanks for responding to my previous issues.
I need a few clarifications regarding Table-5 results.
Upon inspecting the RefCOCO+ dataset, I found that refcoco+_test.json and refcoco+_val.json contain paths to images from the train set of MSCOCO. I created a JSON file containing the paths of these train images (along with their captions) and then generated adversarial images by attacking the ALBEF model.
Afterwards, I performed the evaluation using Grounding.py. I ensured that the dataset class loads the adversarial images during evaluation by modifying the image paths in __getitem__ of grounding_dataset.py, roughly as sketched below.
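This is approximately the change I made (a sketch; ADV_DIR and the exact annotation field names are from my local setup and may not match the repo's dataset class exactly):

```python
import os
from PIL import Image

ADV_DIR = "output/refcoco_adv"  # my local directory of saved adversarial images

def __getitem__(self, index):
    ann = self.ann[index]
    # original: image_path = os.path.join(self.image_root, ann['image'])
    # my change: load the pre-generated adversarial image with the same filename
    image_path = os.path.join(ADV_DIR, os.path.basename(ann['image']))
    image = Image.open(image_path).convert('RGB')
    image = self.transform(image)
    return image, ann['text'], ann['ref_id']
```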
I obtained the following results, which are not close to the ones reported in the paper. Could you please comment on whether I missed something while reproducing Table-5?
Thanks for your great work.
However, I have a question about the specific method used to generate the corresponding adversarial captions: is BERT-Attack used? I am also unclear about the last step of generating the corresponding adversarial captions; my current understanding is sketched below.
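To make my question concrete, this is my understanding of a BERT-Attack-style substitution step (a rough sketch using HuggingFace transformers, not the repo's actual code): mask a word and let a masked LM propose candidate replacements.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def candidate_substitutes(caption, position, k=10):
    """Mask the word at `position` and return the top-k tokens the masked
    LM proposes as replacements (candidate adversarial substitutions)."""
    words = caption.split()
    words[position] = tokenizer.mask_token
    inputs = tokenizer(" ".join(words), return_tensors="pt")
    logits = mlm(**inputs).logits
    mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    top_ids = logits[0, mask_idx].topk(k).indices
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Each candidate would then presumably be scored by how much it reduces the image-text matching score; is that what SGA does on the text side?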
Hi, thanks for the great work.
Could you please provide a script (and instructions) for cross-task transferability (ITR-to-IC and ITR-to-VG) to reproduce the results of Table-4 and Table-5 of the arXiv paper?
Thanks!