
Comments (8)

zucchini-nlp commented on July 24, 2024

I checked with the code in the model doc for SigLip and also got an error. Supporting "device_map" in VLMs is indeed important. I believe _no_split_modules should be the same as in CLIPModel.

For Flash-Attention, afaik current VLMs in transformers use optimized attention implementations only for the LLM backbone (e.g. LLaVa supports Flash-Attn and SDPA even though CLIP doesn't). There's an issue for adding SDPA attention (#30565) to all VLMs. I can open another tracker issue for Flash-Attn but will not be able to work on it right now. Open to community contributions.

from transformers.

lucasjinreal commented on July 24, 2024

I have been using an empty list, which lets the model load with device_map="auto" on multiple GPUs, and inference currently works normally. I still don't understand why CLIPVisionModel needs to keep CLIPEncoderLayer from being split by the auto device map, though.


amyeroberts commented on July 24, 2024

Hi @lucasjinreal, thanks for opening a feature request!

Could you share a code snippet of how the model is being created with auto_map and the running environment (run transformers-cli env in the terminal and copy-paste the output)? SigLip should support device_map="auto"


lucasjinreal commented on July 24, 2024

I have worked around this error by simply adding _no_split_modules = [] as a class attribute.

But it would be better to add this inside transformers, since it's just a single line. I could submit a PR for this.

As for Flash-Attn, it is really needed; it can make VLM training much faster.


zucchini-nlp commented on July 24, 2024

@lucasjinreal cool, a PR would be nice, but you need to test in a multi-GPU setting that everything is being split correctly. I don't think an empty "_no_split_modules" will work, since CLIP, the most similar model, does not split at some modules. If you don't have multiple GPUs, I can run some tests after the PR is open :)

Flash-Attn noted, thanks, will add to my todo list!


zucchini-nlp commented on July 24, 2024

@lucasjinreal I just noticed that SigLip already has _no_split_modules in the TextModel and in the VisionModel, but not in SiglipModel. If I set _no_split_modules=[] as you tried, a device mismatch error is raised, so we have to add the text and vision models' _no_split_modules to enable it.

LMK if you're up to opening a PR :)
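
To make the role of _no_split_modules more concrete, here is a toy sketch in plain Python. It is not Accelerate's actual placement algorithm: the Block class, the toy_device_map function, the sizes, and the budgets are all hypothetical and exist only for illustration. It assumes a greedy planner that fills one device before moving to the next, and shows how an empty no-split list can place the two halves of one encoder layer on different devices, which is one way a device mismatch can arise.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str                 # dotted module path, e.g. "layer1.mlp"
    kind: str                 # class name, e.g. "EncoderLayer"
    size: int = 0             # made-up memory cost (only leaves carry a cost here)
    children: list = field(default_factory=list)


def total_size(block):
    """Cost of a block including everything nested inside it."""
    return block.size + sum(total_size(c) for c in block.children)


def toy_device_map(root, budgets, no_split):
    """Greedy toy planner: fill device 0, then device 1, and so on.

    A block is placed as one unit if it is a leaf or if its class name is
    listed in `no_split`; otherwise the planner descends into its children.
    """
    plan = {}
    device, used = 0, 0

    def visit(block):
        nonlocal device, used
        if block.children and block.kind not in no_split:
            for child in block.children:
                visit(child)
            return
        need = total_size(block)
        # Move on to the next device when the current one is full.
        while used + need > budgets[device]:
            device, used = device + 1, 0
        plan[block.name] = device
        used += need

    visit(root)
    return plan


def encoder_layer(i):
    return Block(f"layer{i}", "EncoderLayer", children=[
        Block(f"layer{i}.attn", "Attention", size=3),
        Block(f"layer{i}.mlp", "MLP", size=3),
    ])


model = Block("encoder", "Encoder", children=[encoder_layer(0), encoder_layer(1)])
budgets = [10, 100]  # hypothetical per-device capacities

split_plan = toy_device_map(model, budgets, no_split=set())
kept_plan = toy_device_map(model, budgets, no_split={"EncoderLayer"})
print(split_plan)  # layer1.attn and layer1.mlp end up on different devices
print(kept_plan)   # each EncoderLayer stays whole on a single device
```

In the first plan, layer1 is cut between its attention and MLP blocks; in the second, listing the layer class keeps each layer on one device.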


lucasjinreal commented on July 24, 2024

Hi, in my case I am just using SiglipVisionModel as a parent class, with a SiglipVisionModelSplit(SiglipVisionModel) subclass in my MLLM.

So I don't think my change is applicable inside transformers. Let me think of a better way to do this.


zucchini-nlp commented on July 24, 2024

I believe the best solution is to copy the _no_split_modules entries that are already defined in the text and vision components and add them to SiglipModel's _no_split_modules.
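
A minimal sketch of that idea in plain Python, using stand-in classes rather than the real transformers ones. The stub class names, the merge_no_split helper, and the list entries are all placeholders for illustration; the actual entries would have to be taken from the SigLip text and vision model classes.

```python
# Stand-in classes; the entries below are placeholders, not the real lists.
class TextModelStub:
    _no_split_modules = ["TextEmbeddings", "EncoderLayer"]


class VisionModelStub:
    _no_split_modules = ["VisionEmbeddings", "EncoderLayer"]


def merge_no_split(*components):
    """Collect _no_split_modules from each component, keeping order, no duplicates."""
    merged = []
    for component in components:
        for name in getattr(component, "_no_split_modules", None) or []:
            if name not in merged:
                merged.append(name)
    return merged


class CombinedModelStub:
    # The combined model reuses what its text and vision parts already declare.
    _no_split_modules = merge_no_split(TextModelStub, VisionModelStub)


print(CombinedModelStub._no_split_modules)
```

In a real PR the combined list would more likely be written out as a literal class attribute; the helper here only makes the "copy from both components" step explicit.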

