ytongbai / lvm
License: Apache License 2.0
Interesting work, do you have any plans to release the training data?
Hi Yutong, first of all, very impressive work!
I have a question about the number of tokens generated in each inference step. Does LVM (a) autoregressively produce tokens one by one like a normal LLM, with every 256 tokens then partitioned and grouped to decode an image, or (b) directly generate all 256 tokens in a single inference step?
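For what it's worth, here is how I currently picture option (a), as a minimal sketch. `model.next_token` and `vq_decoder.decode` are hypothetical stand-ins, not the released LVM API:

```python
# Sketch of option (a): purely autoregressive generation, one token per
# forward pass, with every 256 generated tokens afterwards grouped and
# decoded into a single image. The model and decoder objects here are
# hypothetical placeholders for illustration only.

TOKENS_PER_IMAGE = 256  # assumed tokens per image from the VQ tokenizer

def generate_images(model, vq_decoder, prompt_tokens, n_images):
    tokens = list(prompt_tokens)
    for _ in range(n_images * TOKENS_PER_IMAGE):
        tokens.append(model.next_token(tokens))  # one token at a time
    new = tokens[len(prompt_tokens):]
    # partition the generated stream into 256-token groups, one per image
    groups = [new[i:i + TOKENS_PER_IMAGE]
              for i in range(0, len(new), TOKENS_PER_IMAGE)]
    return [vq_decoder.decode(g) for g in groups]
```

Option (b) would instead mean a single step emits all 256 tokens at once, which is not how standard decoder-only LLM inference works.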
Hello,
Congratulations on the great work! Do you think it would be possible to add the model to Hugging Face Transformers?
Are you planning on doing it?
Thanks a lot, and looking forward to seeing the code!
Greetings,
Thanks for your awesome work! This work sheds light on sparks of AGI with pure-vision models. Reading your paper, I found Fig. 13 on IQ testing amazing. Does the training set include similar images, i.e., IQ-test-style images? That seems like an important question for the reliability of this paper's AGI claims.
Hey there! I work at XetHub, and we built a GitHub app that scales GitHub repos to handle large files (up to 100 TB). No cost for public repos.
Here's an example where we hosted a bunch of .onnx model files in the Git repo: onnx/models#632 (comment)
This way, your local folder doesn't need a .gitignore file, and the GitHub state matches your local state, since you can just keep your large files in the same folder. I'd be happy to coordinate and make this happen.
Support! Hoping for the emergence of a real large vision model!
Thank you for your great work! Perhaps this will be the prototype for GPT-style scalable modeling in the visual domain.
Dear Authors,
I hope this message finds you well. I recently came across your repository while searching for scalable solutions for large vision models and was thoroughly impressed by the novel approach of using "visual sentences" for sequential modeling as detailed in your paper.
The concept of representing a diverse array of visual data as sequences and training models to predict the next token is quite intriguing. I am particularly interested in the potential applications of your Large Vision Model (LVM) for various vision tasks using visual prompts.
I understand from the README that the code, models, and datasets are being prepared for release. Could you kindly provide an estimated timeline for when these resources might become available? Access to these materials would be invaluable for researchers and practitioners alike who are eager to explore and build upon your work.
Additionally, if there is any possibility to access a pre-release or if there are any beta versions available for early testing, I would be highly interested in participating and providing feedback.
Thank you for your groundbreaking work in this field, and I look forward to your response.
Best regards,
yihong1120
Great Work!
Hi, thanks for this work.
When will all the code be released?
It is a very wonderful project.
Are there pre-trained model releases that can be used in downstream vision tasks?
It would be an honor to build follow-up work on the basis of this project.
Thanks a lot.
Momo
Impressive progress!
A question about whether LVM transfers to unseen tasks: for example, for a keypoint or segmentation task that is not in the training data, does the model demonstrate some degree of transfer ability to the unknown task?
Hi,
Thank you for the great work. Do you have a plan to release the model (even a small one) in the near future? Thanks again!
Thanks for your contributions! But I have a question about applying LVM to image classification: how should I use LVM with in-context learning (ICL) to predict the labels?
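To make the question concrete, here is one way I imagine it could work, assuming each class label is rendered as an image and the prediction is matched to the nearest label template in token space. All helper names (`tokenize`, `generate`, `label_images`) are hypothetical, not the actual LVM API:

```python
# Hypothetical sketch: classification as visual in-context learning.
# The prompt is a visual sentence of (example image, label image) pairs
# followed by the query image; the model's continuation is compared
# against each class's rendered label image.

def classify_icl(tokenize, generate, support, label_images, query):
    """support: list of (image, class_name) demonstration pairs.
    label_images: dict mapping class_name -> rendered label image."""
    prompt = []
    for img, cls in support:
        prompt += tokenize(img) + tokenize(label_images[cls])
    prompt += tokenize(query)
    pred_tokens = generate(prompt)  # model continues with a label image

    # pick the class whose rendered label is closest in token space
    def overlap(a, b):
        return sum(x == y for x, y in zip(a, b))

    return max(label_images,
               key=lambda c: overlap(pred_tokens, tokenize(label_images[c])))
```

Whether token-level matching (versus pixel-space comparison of the decoded image) is the right readout is exactly what I'd like the authors to clarify.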
First of all, thanks for the great work! I have a question about [BOS] and [EOS] tokens in your model training.
I'm slightly confused about whether [BOS] is used in your model.
And if it is, I wonder why [EOS] alone is not enough to help the model understand the sentence boundary.
Looking forward to the code release.