About the details of the visual abstractor (mplug-owl, closed)

x-plug commented on May 19, 2024

About the details of the visual abstractor

from mplug-owl.

Comments (3)

yuqi657 commented on May 19, 2024

Could you give a detailed description (in text) of the implementation of your visual abstractor?

LukeForeverYoung commented on May 19, 2024

We put the query_embed in mPLUG_OwlModel and pass it to the Visual Abstractor during the forward pass. The implementation of the Visual Abstractor is similar to the Perceiver in Flamingo, except that we use the same FFNs as LLaMA.
Following mPLUG and mPLUG-2, we apply the abstractor to shorten the visual token sequence and to help the model learn visual knowledge in the language space.
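
For readers who want to see the shape of this design, here is a minimal PyTorch sketch, not the repository's actual code. It assumes a Perceiver-Resampler-style cross-attention (keys and values are the image features concatenated with the queries, as in Flamingo) followed by a LLaMA-style gated FFN (SwiGLU). The class names `SwiGLUFFN`, `AbstractorLayer`, and `ToyOwlModel`, the layer sizes, and the use of `LayerNorm` are all illustrative assumptions; only `query_embed` being owned by the outer model and passed to the abstractor comes from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFFN(nn.Module):
    """LLaMA-style gated feed-forward block (SwiGLU)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class AbstractorLayer(nn.Module):
    """One hypothetical abstractor layer: cross-attention, then a gated FFN."""

    def __init__(self, dim: int, num_heads: int, ffn_dim: int):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = SwiGLUFFN(dim, ffn_dim)

    def forward(self, queries: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        q = self.norm_q(queries)
        # Perceiver-Resampler style: attend over image features plus the queries.
        kv = torch.cat([self.norm_kv(image_feats), q], dim=1)
        attn_out, _ = self.attn(q, kv, kv, need_weights=False)
        queries = queries + attn_out
        return queries + self.ffn(self.norm_ffn(queries))


class ToyOwlModel(nn.Module):
    """Outer model that owns query_embed and passes it to the abstractor."""

    def __init__(self, dim: int = 64, num_queries: int = 8, num_layers: int = 2):
        super().__init__()
        # Learnable queries live on the outer model, not inside the abstractor.
        self.query_embed = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.abstractor = nn.ModuleList(
            [AbstractorLayer(dim, num_heads=4, ffn_dim=4 * dim) for _ in range(num_layers)]
        )

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # Broadcast the shared queries across the batch.
        queries = self.query_embed.expand(image_feats.size(0), -1, -1)
        for layer in self.abstractor:
            queries = layer(queries, image_feats)
        return queries


torch.manual_seed(0)
model = ToyOwlModel()
image_feats = torch.randn(2, 256, 64)  # (batch, patch tokens, dim)
out = model(image_feats)
print(out.shape)  # torch.Size([2, 8, 64])
```

The output length equals the number of queries regardless of how many patch tokens come in, which is the length reduction the abstractor is meant to provide.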

MAGAer13 commented on May 19, 2024

First of all, thanks for your great work. From the paper, I see that the visual abstractor uses learnable queries, so I think it may be similar to the Perceiver in Flamingo or the Q-Former in BLIP-2. However, I cannot find the implementation of the learnable queries in your code (mPLUG_OwlVisualAbstractorEncoder and mPLUG_OwlVisualAbstractorModel in modeling_mplug_owl.py). I am curious about the details of the visual abstractor. In other words, is it closer to the Q-Former or to the Perceiver? The details are not given in the paper, and I cannot find them in the code. Thanks again.

Hi, just an additional clarification. The aim of the visual abstractor is to reduce the number of image patch tokens passed to the LLM, which would otherwise be large (256 for ViT-L/14 at 224x224 resolution). The maximum context length of LLMs such as LLaMA and Bloom is 2048 tokens, so 256 is a relatively large share of it. This is not a problem for Flamingo, since it injects visual features through cross-attention rather than as input tokens, so the purpose there is different. Besides, we want the learnable queries to capture useful features from the image, such as region or object features. mPLUG-2 uses a similar idea, and it is verified there by visualizing the attention maps of the learnable queries.
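
The token budget argument above can be checked with a few lines of arithmetic. This is only a sketch: the helper `num_patch_tokens` is illustrative, it ignores any class token, and `NUM_QUERIES = 64` is a hypothetical value chosen for the example rather than a figure stated in this thread.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for a square image (no class token)."""
    per_side = image_size // patch_size
    return per_side * per_side


CONTEXT_LENGTH = 2048  # maximum context length cited above for LLaMA and Bloom
NUM_QUERIES = 64       # hypothetical number of learnable queries

vit_tokens = num_patch_tokens(224, 14)        # 16 * 16 patches
share_before = vit_tokens / CONTEXT_LENGTH    # fraction of context used by raw patches
share_after = NUM_QUERIES / CONTEXT_LENGTH    # fraction used after the abstractor

print(vit_tokens)    # 256
print(share_before)  # 0.125
print(share_after)   # 0.03125
```

Under these assumptions the raw patches would take one eighth of the context window for a single image, while the abstracted queries take about 3%.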
