
lvm's People

Contributors: ytongbai


lvm's Issues

Question about inference details

Hi Yutong, firstly, very impressive work!
I have a question about the number of tokens generated at each inference step. Does LVM (a) auto-regressively produce tokens one by one like a normal LLM, with every 256 generated tokens then partitioned and grouped to decode an image? Or (b) directly generate all 256 tokens in a single inference step?
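To make the question concrete, here is a minimal Python sketch of option (a). The names `model.next_token` and `vqgan.decode` are placeholders of my own, not the actual LVM API:

```python
# Hypothetical sketch of option (a): purely autoregressive decoding,
# grouping every 256 generated tokens into one image.
# `model.next_token` and `vqgan.decode` are placeholder names.

TOKENS_PER_IMAGE = 256

def generate_images(model, vqgan, prompt_tokens, num_images):
    tokens = list(prompt_tokens)
    generated = []
    # One forward step per token, conditioned on everything so far
    for _ in range(num_images * TOKENS_PER_IMAGE):
        nxt = model.next_token(tokens)
        tokens.append(nxt)
        generated.append(nxt)
    # Partition the generated stream into 256-token groups, one per image
    return [
        vqgan.decode(generated[i : i + TOKENS_PER_IMAGE])
        for i in range(0, len(generated), TOKENS_PER_IMAGE)
    ]
```

Under option (b), the inner loop would instead be a single call emitting all 256 tokens at once.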

Huggingface transformers support

Hello,

Congratulations on the great work! Do you think it would be possible to add the model to Huggingface transformers?

Are you planning on doing it?

Thanks a lot, and looking forward to seeing the code!

Question of Fig. 13 in the Paper

Greetings,

Thanks for your awesome work! This work sheds light on sparks of AGI in pure-vision models! Reading your paper, I found Fig. 13 on IQ testing amazing. Does the training set include similar images, i.e., IQ-test-style images? This seems important for assessing how reliably the result reflects genuine generalization.

question about the data

  1. P3: "...For datasets containing k different annotations for the same image we use a different approach: for each set of 1 + k images (input plus k annotations), we randomly select m elements, where m ≤ n + 1 ≤ 16. These m-tuples are then concatenated to form visual sequences." What is 'n' here?
  2. Can the ratio of colorization and mask annotations for ImageNet be set arbitrarily?
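To make question 1 concrete, here is a hypothetical sketch of the sampling step under the assumption that 'n' is a typo for 'k' (the annotation count) — that reading is my guess, not something the paper confirms:

```python
import random

# Hypothetical sketch of the m-tuple sampling, assuming the constraint
# "m <= n + 1 <= 16" intends n = k (the number of annotations per image).

def sample_tuple(image, annotations, max_len=16):
    """Randomly pick m elements from {image} union annotations, m <= k + 1."""
    pool = [image] + list(annotations)          # the 1 + k candidate elements
    m = random.randint(1, min(len(pool), max_len))
    return random.sample(pool, m)               # one m-tuple for a visual sequence
```

The m-tuples produced this way would then be concatenated into a visual sequence, as the quoted passage describes.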

Consider hosting the models and datasets in GitHub directly using XetData add-on

Hey there! I work at XetHub, and we built a GitHub app that scales GitHub repos to handle large files (up to 100 TB). No cost for public repos.

Here's an example where we hosted a bunch of .onnx model files in the Git repo: onnx/models#632 (comment)

This way, your local folder doesn't need a .gitignore file, and the GitHub state matches your local state since you can just keep your large files in the same folder. I'd be happy to coordinate and make this happen.

Thanks for the interesting work

Thank you for your great work! Perhaps this will be the prototype for GPT-style scalable modeling in the visual domain.

Inquiry Regarding Release Timeline for Code, Models, and Datasets

Dear Authors,

I hope this message finds you well. I recently came across your repository while searching for scalable solutions for large vision models and was thoroughly impressed by the novel approach of using "visual sentences" for sequential modeling as detailed in your paper.

The concept of representing a diverse array of visual data as sequences and training models to predict the next token is quite intriguing. I am particularly interested in the potential applications of your Large Vision Model (LVM) for various vision tasks using visual prompts.

I understand from the README that the code, models, and datasets are being prepared for release. Could you kindly provide an estimated timeline for when these resources might become available? Access to these materials would be invaluable for researchers and practitioners alike who are eager to explore and build upon your work.

Additionally, if there is any possibility to access a pre-release or if there are any beta versions available for early testing, I would be highly interested in participating and providing feedback.

Thank you for your groundbreaking work in this field, and I look forward to your response.

Best regards,
yihong1120

All code

Hi, thanks for this work.

When will all the code be released?

About the released weights

It is a very wonderful project.

Is there a pre-trained model release that can be used for downstream vision tasks?

It would be an honor to build follow-up work on the basis of this project.

Thanks a lot.
Momo

Question about image classification

Thanks for your contributions! But I have a question about applying LVM to image classification: how should I use LVM with ICL to predict the labels?
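To make the question concrete: one plausible setup (my own assumption, not something the paper confirms) would be to render each class label as an image and build the in-context prompt as interleaved (example, label-image) pairs, letting the model continue the sequence with a label image for the query. `render_label` here is a caller-supplied placeholder, not a real API:

```python
# Hypothetical sketch: few-shot classification as next-image prediction.
# Each class label is rendered as an image (e.g. text drawn on a canvas),
# and the prompt interleaves (example image, label image) pairs.

def build_icl_prompt(examples, query_image, render_label):
    """examples: list of (image, label) pairs; returns the flat prompt sequence."""
    prompt = []
    for image, label in examples:
        prompt.append(image)                 # demonstration input
        prompt.append(render_label(label))   # demonstration "answer" image
    prompt.append(query_image)               # model should continue with a label image
    return prompt
```

The predicted label would then have to be read back out of the generated image, e.g. by nearest-neighbor matching against the rendered label images.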

Question about [BOS] and [EOS] tokens

First of all, thanks for the great work! I have a question about [BOS] and [EOS] tokens in your model training.

I have some slight confusion about whether [BOS] is used or not in your model.

And if it is used, I wonder why [EOS] alone is not enough to help the model understand sentence boundaries.
