Feature request: how can SiglipVisionModel be made to support an automatic device map?

SiglipVisionModel does not support device_map="auto": no _no_split_modules attribute (transformers, CLOSED)

lucasjinreal commented on July 24, 2024

SiglipVisionModel does not support device_map="auto": it has no _no_split_modules attribute.

Comments (8)

zucchini-nlp commented on July 24, 2024

I checked with the code in the model doc for SigLip and also got an error. Supporting "device_map" in VLMs is indeed important. I believe _no_split_modules should be the same as in ClipModel.

As for Flash Attention: as far as I know, current VLMs in transformers use optimized attention implementations only for the LLM backbone (e.g. LLaVa supports Flash-Attn and SDPA even though CLIP doesn't). There's an issue for adding SDPA attention (#30565) to all VLMs. I can open another tracker issue for Flash-Attn, but I will not be able to work on it right now. Open to community contributions.

lucasjinreal commented on July 24, 2024

I have been using an empty list, which lets device_map="auto" work across multiple GPUs, and inference currently runs normally. I still don't understand why CLIPVisionModel needs to keep CLIPEncoderLayer from being split, though.

amyeroberts commented on July 24, 2024

Hi @lucasjinreal, thanks for opening a feature request!

Could you share a code snippet of how the model is being created with auto_map and the running environment (run transformers-cli env in the terminal and copy-paste the output)? SigLip should support device_map="auto"

lucasjinreal commented on July 24, 2024

I got past this error by simply adding _no_split_modules = [] as a class attribute.

But it would be better to add this inside transformers; it's just a single line. I could submit a PR for this.

As for Flash-Attn, it is really needed; it can make VLM training much faster.

zucchini-nlp commented on July 24, 2024

@lucasjinreal cool, a PR would be nice, but you need to test in a multi-GPU setting that everything is being split correctly. I don't think an empty "split_modules" will work, since CLIP, the most similar model, does not allow splitting at some modules. If you don't have multiple GPUs, I can run some tests after the PR is open :)

Flash-Attn noted, thanks, will add to my todo list!

zucchini-nlp commented on July 24, 2024

@lucasjinreal I just noticed that SigLip already has _no_split_modules in TextModel and in VisionModel, but not in SiglipModel. If I set _no_split_modules=[] as you tried, a device mismatch error is raised, so we have to add the text and vision models' _no_split_modules to enable it.

LMK if you're up to opening a PR :)

lucasjinreal commented on July 24, 2024

Hi, in my case I am just using SiglipVisionModel as a parent class, with a SiglipVisionModelSplit(SiglipVisionModel) subclass in my MLLM.

So I think it is not applicable inside transformers. Let me think of a better way to do this.
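A minimal sketch of the subclass workaround described in this comment might look like the following. The class name SiglipVisionModelSplit comes from the comment; the helper load_split_model and its checkpoint argument are hypothetical, added only for illustration. Note that an empty list was reported elsewhere in this thread to cause a device mismatch error with SiglipModel, so this should be treated as a local workaround to verify on your own multi-GPU setup, not as a recommended fix.

```python
# Sketch of the subclass workaround; the real thing requires `transformers`.
try:
    from transformers import SiglipVisionModel
except ImportError:
    # Placeholder so the sketch can be read without transformers installed.
    # It is NOT the real model class and cannot load weights.
    SiglipVisionModel = object


class SiglipVisionModelSplit(SiglipVisionModel):
    # An empty list declares that no module type has to stay on a single
    # device, so the automatic device-map logic may split anywhere.
    _no_split_modules = []


def load_split_model(checkpoint: str):
    """Hypothetical helper: load the subclass with device_map="auto".

    Assumes transformers and accelerate are installed and that
    `checkpoint` names a SigLIP checkpoint.
    """
    return SiglipVisionModelSplit.from_pretrained(checkpoint, device_map="auto")
```

After loading, it is worth running a forward pass on a real batch to confirm that no device mismatch error appears.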

zucchini-nlp commented on July 24, 2024

I believe the best solution is to copy the _no_split_modules that are already defined in the text and vision components and add them to SiglipModel's _no_split_modules.
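The idea of combining the components' lists can be sketched in plain Python as below. The stand-in classes and the helper merge_no_split_modules are hypothetical, and the module names in the lists are assumptions used for illustration; the actual values should be taken from the SigLIP source in transformers.

```python
# Stand-in classes that only mimic the `_no_split_modules` class attribute.
# The module names are illustrative assumptions, not copied from the library.
class TextComponent:
    _no_split_modules = ["SiglipTextEmbeddings", "SiglipEncoderLayer"]


class VisionComponent:
    _no_split_modules = ["SiglipVisionEmbeddings", "SiglipEncoderLayer"]


def merge_no_split_modules(*components):
    """Combine the components' lists, keeping order and dropping duplicates."""
    merged = []
    for component in components:
        # Treat a missing or None attribute as "nothing declared".
        for name in getattr(component, "_no_split_modules", None) or []:
            if name not in merged:
                merged.append(name)
    return merged


class CombinedModel:
    # The combined model declares the union of its components' entries.
    _no_split_modules = merge_no_split_modules(TextComponent, VisionComponent)
```

In the library itself the combined list would more likely be written out as a literal on the model class; the helper here only shows which entries end up in it.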
