In a VQA task, I want to input two images and ask a question about the two images, how to

How to realize multi-image correlation in a VQA task? (mplug-owl issue, open)

fansticOne commented on June 2, 2024

How can I realize multi-image correlation in a VQA task?

from mplug-owl.

Comments (4)

LukeForeverYoung commented on June 2, 2024

You can pass a list of images and place the same number of "<|image|>" in your prompt.
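The advice above can be sketched as a small prompt builder. This is only an illustration, not mPLUG-Owl's own API: the helper name `build_multi_image_prompt`, the file names, and the example question are hypothetical, and the `USER: ... ASSISTANT:` wrapper is assumed from the prompt quoted later in this thread, so it may need adjusting for your checkpoint.

```python
IMAGE_TOKEN = "<|image|>"  # placeholder string quoted in the reply above


def build_multi_image_prompt(question: str, num_images: int) -> str:
    """Return a prompt containing one image placeholder per input image."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # Repeat the placeholder once for every image that will be passed.
    placeholders = IMAGE_TOKEN * num_images
    # Assumed conversation template; adapt it to the model you are using.
    return f"USER: {placeholders}{question} ASSISTANT:"


# Hypothetical file names, used only to derive the number of images.
image_paths = ["dog_1.jpg", "dog_2.jpg"]
prompt = build_multi_image_prompt(
    "Are the two dogs the same color?", len(image_paths)
)
print(prompt)
```

Deriving the placeholder count from the image list keeps the two in sync when the number of images changes.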

fansticOne commented on June 2, 2024

I pass a list of images, say 2 images, and modify the prompt. After preprocessing, the image_tensor has a batch size of 2, while the input_ids has a batch size of 1. When I then run model.generate(), I do get a result, but the result is wrong. Am I misunderstanding something?

LukeForeverYoung commented on June 2, 2024

> I pass a list of images, say 2 images, and modify the prompt. After preprocessing, the image_tensor has a batch size of 2, while the input_ids has a batch size of 1. When I then run model.generate(), I do get a result, but the result is wrong. Am I misunderstanding something?

Could you provide an example and the incorrect response generated by the owl? Btw, the owl has not been trained on SFT data that includes multiple images. Therefore, it is reasonable to expect that it might fail in some cases.

fansticOne commented on June 2, 2024

Here are the two images I passed

The prompt is:
'USER: <|image|><|image|>{}\nAnswer the question using a single word or phrase. ASSISTANT:'.format('Does the dog in the first picture have same color with the dog in the second picture?')
The response generated by the owl is 'Yes'.
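As a quick sanity check on this setup, the number of `<|image|>` placeholders in the prompt can be compared with the number of images passed. The sketch below is plain Python and does not call the model; `check_placeholder_count` is a hypothetical helper name. It only rules out a placeholder mismatch and says nothing about why the model's answer is wrong.

```python
IMAGE_TOKEN = "<|image|>"


def check_placeholder_count(prompt: str, num_images: int) -> None:
    """Raise if the prompt does not hold exactly one placeholder per image."""
    found = prompt.count(IMAGE_TOKEN)
    if found != num_images:
        raise ValueError(
            f"prompt has {found} image placeholder(s) "
            f"but {num_images} image(s) were passed"
        )


question = (
    "Does the dog in the first picture have same color "
    "with the dog in the second picture?"
)
# Same prompt as quoted above, split across lines for readability.
prompt = (
    "USER: <|image|><|image|>{}\n"
    "Answer the question using a single word or phrase. ASSISTANT:"
).format(question)

# Two images were passed, so this call returns without raising.
check_placeholder_count(prompt, num_images=2)
```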
