Comments (3)
Could you give a detailed description (in text) of the implementation of your visual abstractor?
from mplug-owl.
We put query_embed in mPLUG_OwlModel and pass it to the Visual Abstractor during the forward pass. The implementation of the Visual Abstractor is similar to the Perceiver in Flamingo, except that we use the same FFNs as LLaMA.
Following mPLUG and mPLUG-2, we apply the abstractor to reduce the number of visual tokens and to help the model learn visual knowledge in the language space.
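For readers who want something concrete, here is a minimal PyTorch sketch of the idea described above: a parent model owns the learnable queries and hands them to a Perceiver-style abstractor whose feed-forward block is gated in the way LLaMA's is. It is not the mPLUG-Owl code. The class names (ToyOwlModel, ToyVisualAbstractor, AbstractorLayer, GatedFFN), the sizes, the use of LayerNorm, and the choice to attend over patches only are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFFN(nn.Module):
    """Gated feed-forward block in the LLaMA style: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class AbstractorLayer(nn.Module):
    """One block: queries cross-attend to image patches, then pass through the FFN."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = GatedFFN(dim, 4 * dim)

    def forward(self, queries: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        # Simplification (assumption): keys/values are the patch features only.
        kv = self.norm_kv(patches)
        attn_out, _ = self.attn(self.norm_q(queries), kv, kv, need_weights=False)
        queries = queries + attn_out
        return queries + self.ffn(self.norm_ffn(queries))


class ToyVisualAbstractor(nn.Module):
    """Stack of abstractor layers; it receives the queries as an argument."""

    def __init__(self, dim: int, num_layers: int, num_heads: int):
        super().__init__()
        self.layers = nn.ModuleList(
            [AbstractorLayer(dim, num_heads) for _ in range(num_layers)]
        )

    def forward(self, query_embed: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        # Broadcast the shared queries over the batch dimension.
        queries = query_embed.expand(patches.size(0), -1, -1)
        for layer in self.layers:
            queries = layer(queries, patches)
        return queries


class ToyOwlModel(nn.Module):
    """Hypothetical parent model that owns query_embed, as the reply describes."""

    def __init__(self, dim: int = 64, num_queries: int = 8,
                 num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        self.query_embed = nn.Parameter(torch.randn(1, num_queries, dim))
        self.abstractor = ToyVisualAbstractor(dim, num_layers, num_heads)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # The queries live on the parent and are passed in during forward.
        return self.abstractor(self.query_embed, patches)


if __name__ == "__main__":
    model = ToyOwlModel()
    patches = torch.randn(2, 256, 64)  # (batch, image patches, feature dim)
    print(model(patches).shape)  # torch.Size([2, 8, 64])
```

This layout would also explain why a search for learnable queries inside the abstractor classes comes up empty: the parameter belongs to the parent module and only enters the abstractor as a forward argument. The output length equals the number of queries, regardless of how many patches come in.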
First of all, thanks for your great work. In the paper, I see learnable queries in the visual abstractor, which I think may be similar to the Perceiver in Flamingo or the Q-Former in BLIP-2. But I can't find the implementation of the learnable queries in your code (mPLUG_OwlVisualAbstractorEncoder and mPLUG_OwlVisualAbstractorModel in modeling_mplug_owl.py). I am curious about the details of the visual abstractor. In other words, is it similar to the Q-Former or to the Perceiver? The paper does not contain these details and I cannot find them in the code. Thanks again.
Hi, just an additional clarification. The aim of the visual abstractor is to reduce the number of image patch tokens passed to the LLM, which would otherwise be large (256 for ViT-L/14 at 224x224 resolution). The maximum context length of LLMs such as LLaMA and BLOOM is 2048 tokens, so 256 is a relatively large share of it. This is not an issue for Flamingo, since it feeds in visual features through cross-attention, so the purpose there is different. In addition, we want to learn useful features such as region or object features from the image, as practiced by mPLUG-2, which uses a similar idea and verified it by visualizing the attention maps of the learnable queries.
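The arithmetic behind those numbers can be checked with a short sketch. The helper names are hypothetical, and the query count of 64 is only an illustrative value that is not stated in this thread.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT cuts from a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def context_share(num_tokens: int, context_length: int) -> float:
    """Fraction of the LLM context window taken up by num_tokens."""
    return num_tokens / context_length


if __name__ == "__main__":
    patches = num_patch_tokens(224, 14)   # 16 * 16 = 256
    print(patches, context_share(patches, 2048))  # 256 0.125

    # Illustrative assumption: the abstractor emits 64 query tokens per image.
    num_queries = 64
    print(num_queries, context_share(num_queries, 2048))  # 64 0.03125
```

So without the abstractor a single image takes one eighth of a 2048-token window, and the cost after the abstractor depends only on the number of queries.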