ytongbai / lvm Goto Github PK

View Code? Open in Web Editor NEW

1.7K 1.7K 51.0 9.21 MB

License: Apache License 2.0

Python 97.30% Shell 2.70%

lvm's People

Contributors

Stargazers

Watchers

Forkers

laoyangui aijia000 tukjet xiatianliang1024gm oztc frankgty suilin0432 2132660698 shanhedian2017 runngezhang kpup1710 robathanjames jingx8885 jundsinfo lishy66 cairuinhan bankxi hhy5277 luhaozhu tonyonst-t dekkolo-r 43prorhub mowervie cogentri33 a-flavoredbubble desksound39darisun udogger ytytyd sovy-i s-spoiledic comfyrdyni romancexoxox-p curiousli50 itselgispreamha anyaoha shirohige devomark cecyliameiden evelynmitchell jaedukseo snoopybingo jiafengttang glomquyet cuiwenbing songym2020 robotseye mindhome mindhome-inc strategist922 hanchunrui

lvm's Issues

release training data

Interesting work, do you have any plans to release the training data?

Question about inference details

Hi, yutong, firstly, very impressive work!
I have a question about the number of tokens generated in each inference step. Does LVM a) auto-regressively produces tokens one-by-one like normal LLM and then each 256 tokens are partitioned and grouped to decode an image? Or, b) directly generates 256 tokens in one step inference?

Huggingface transformers support

Hello,

Congratulations for the great work! Do you think it is possible to add the model to Huggingface transformers?

Are you planning on doing it?

Thanks a lot and looking forward to see the code!

Question of Fig. 13 in the Paper

Greetings,

Thanks for your awesome work! This work shed light on the AGI sparks with pure-vision models! Reading your paper, I found Fig. 13 about IQ testing amazing. Does the training set include similar images, i.e., IQ testing images? That is an important question about the reliability of AGI for this paper.

question about the data

P3，“……For datasets containing k different annotations for the same image we use a different approach: for each set of 1 + k images (input plus k annotations), we randomly select m elements, where m ≤ n + 1 ≤ 16. These m-tuples are then concatenated to form visual sequences”. What is 'n' here?
The ratio of colorization and mask about ImageNet can be randomly set？

Consider hosting the models and datasets in GitHub directly using XetData add-on

Hey there! I work at XetHub and we built a GitHub app that scales GitHub repos to handle large files (upto 100 TB). No cost for public repos.

Here's an example where we hosted a bunch of .onxx model files in the Git repo: onnx/models#632 (comment)

This way, your local folder doesn't need a .gitignore file and the GitHub state matches your local state since you can just keep your large files in the same folder. I'd be happy to coordinate and make this happen

Support！ Hoping for the emergence of a real large vision model!

Thanks for Interesting work

Thank you for your great work! Perhaps this will be the prototype for GPT-style scalable modeling in the visual domain.

Inquiry Regarding Release Timeline for Code, Models, and Datasets

Dear Authors,

I hope this message finds you well. I recently came across your repository while searching for scalable solutions for large vision models and was thoroughly impressed by the novel approach of using "visual sentences" for sequential modeling as detailed in your paper.

The concept of representing a diverse array of visual data as sequences and training models to predict the next token is quite intriguing. I am particularly interested in the potential applications of your Large Vision Model (LVM) for various vision tasks using visual prompts.

I understand from the README that the code, models, and datasets are being prepared for release. Could you kindly provide an estimated timeline for when these resources might become available? Access to these materials would be invaluable for researchers and practitioners alike who are eager to explore and build upon your work.

Additionally, if there is any possibility to access a pre-release or if there are any beta versions available for early testing, I would be highly interested in participating and providing feedback.

Thank you for your groundbreaking work in this field, and I look forward to your response.

Best regards,
yihong1120