zzxslp / som-llava

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 73.80% Shell 0.38% C++ 2.57% Cuda 23.26%

huggingface-transformers llms multimodal pytorch

som-llava's Issues

How to load the som-llava model using the transformers library?

I tried the following code, but it did not work:

from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("zzxslp/som-llava-v1.5-13b").to('cuda').eval()
processor = AutoProcessor.from_pretrained("zzxslp/som-llava-v1.5-13b")

Is it possible to load the som-llava model directly with the Transformers library? Is this currently supported, or is the checkpoint incompatible with this approach?
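One possible cause, offered as an assumption and not a confirmed diagnosis: LLaVA checkpoints are published in two layouts, one for the original LLaVA codebase and one native to Transformers, and `LlavaForConditionalGeneration` expects the latter. The sketch below inspects a locally downloaded `config.json` to tell them apart. The helper names `detect_llava_format` and `detect_from_file` are hypothetical, and the two architecture strings are taken from other public LLaVA checkpoints; they have not been verified against som-llava itself.

```python
import json

# Architecture names as they appear in config.json. These are assumed from
# other public LLaVA checkpoints, not verified against som-llava.
HF_ARCH = "LlavaForConditionalGeneration"   # Transformers-native layout
ORIGINAL_ARCH = "LlavaLlamaForCausalLM"     # original LLaVA codebase layout


def detect_llava_format(config: dict) -> str:
    """Classify a parsed config.json as 'transformers', 'original-llava', or 'unknown'."""
    architectures = config.get("architectures") or []
    if HF_ARCH in architectures:
        return "transformers"
    if ORIGINAL_ARCH in architectures:
        return "original-llava"
    return "unknown"


def detect_from_file(config_path: str) -> str:
    """Read a locally downloaded config.json and classify it."""
    with open(config_path, encoding="utf-8") as f:
        return detect_llava_format(json.load(f))
```

If the result is "original-llava", the checkpoint would probably need the loader from the LLaVA repository (`llava.model.builder.load_pretrained_model`) or a conversion step before the Transformers classes can read it.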

Annotated Images Download

Thank you very much for your awesome work. Would you mind providing download links for the annotated images?

Attention map extraction

Hello, thanks for sharing the work; it is very inspiring. Could you share the attention extraction and visualization script used to create Figure 2 in the paper?
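Until the authors share their script, here is a minimal generic sketch of one common way to turn attention weights into an image heatmap. It is not the method used for Figure 2. It assumes you already have the attention weights of one query token for every head, that the image tokens occupy one contiguous span starting at `image_start` (a hypothetical parameter), and that the image is encoded as a 24 x 24 patch grid, as in LLaVA-1.5 with a 336-pixel CLIP ViT-L/14 encoder.

```python
def image_attention_grid(attn, image_start, grid_size=24):
    """Average one query token's attention over heads and reshape it to a grid.

    attn: nested list of shape [num_heads][seq_len], the attention weights of
          a single query token (assumed to be extracted beforehand).
    image_start: index of the first image token in the sequence (assumption).
    grid_size: patches per side; 24 is assumed for LLaVA-1.5.
    Returns a grid_size x grid_size nested list of head-averaged weights.
    """
    num_heads = len(attn)
    num_image_tokens = grid_size * grid_size
    image_end = image_start + num_image_tokens
    if any(len(head) < image_end for head in attn):
        raise ValueError("sequence is too short to hold the image token span")

    # Mean over heads for each image-token position.
    mean_weights = [
        sum(head[i] for head in attn) / num_heads
        for i in range(image_start, image_end)
    ]
    # Row-major reshape into the patch grid.
    return [
        mean_weights[row * grid_size:(row + 1) * grid_size]
        for row in range(grid_size)
    ]
```

The returned grid can be passed to any heatmap plotting routine and upsampled to the image resolution for an overlay.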
