
Comments (6)

ZhangYuanhan-AI avatar ZhangYuanhan-AI commented on September 4, 2024 1

bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-34B-DPO mistral_direct 16 2 True XXX.mp4

works well at my side

from llava-next.

ZhangYuanhan-AI avatar ZhangYuanhan-AI commented on September 4, 2024 1

Hi, since our training data also includes images, many instructions contain phrases like "in the image", so the current model sometimes generates "in the image".

We are currently focusing on solving this!


sykuann avatar sykuann commented on September 4, 2024

I am facing the same issue.


gyfastas avatar gyfastas commented on September 4, 2024

Same issue here. How can it be fixed?

Update: I modified mm_utils.py as follows, and it now works for me:

def call_for_batch(self, output_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
    offset = min(output_ids.shape[1] - self.start_len, self.max_keyword_len)
    self.keyword_ids = [keyword_id.to(output_ids.device) for keyword_id in self.keyword_ids]
    for keyword_id in self.keyword_ids:
        # fix: if fewer tokens have been generated than the keyword is long,
        # the trailing slice is shorter than keyword_id and the element-wise
        # comparison would fail, so skip this keyword for now
        if output_ids[0, -keyword_id.shape[0]:].shape[0] != keyword_id.shape[0]:
            continue
        if (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
            return True
    outputs = self.tokenizer.batch_decode(output_ids[:, -offset:], skip_special_tokens=True)[0]
    for keyword in self.keywords:
        if keyword in outputs:
            return True
    return False

Note: this might change the behavior of the stopping criteria; in my case the model started to repeat words.
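For anyone puzzled by why the guard is needed: early in generation, fewer tokens may have been produced than a stop keyword is long, so the trailing slice of `output_ids` is shorter than the keyword tensor and an element-wise comparison would raise a shape error. A minimal, self-contained sketch (the token ids are made up, and `ends_with_keyword` is a hypothetical helper, not part of LLaVA-NeXT):

```python
import torch

def ends_with_keyword(output_ids: torch.LongTensor, keyword_ids: list) -> bool:
    """Return True if the generated sequence ends with any keyword's token ids."""
    for keyword_id in keyword_ids:
        tail = output_ids[0, -keyword_id.shape[0]:]
        # guard: with fewer generated tokens than the keyword length,
        # the tail is shorter than keyword_id, so skip the comparison
        if tail.shape[0] != keyword_id.shape[0]:
            continue
        if (tail == keyword_id).all():
            return True
    return False

stop_kw = torch.tensor([2, 7, 9])                 # made-up stop-keyword token ids
short_output = torch.tensor([[7, 9]])             # only 2 tokens generated so far
long_output = torch.tensor([[5, 1, 2, 7, 9]])     # sequence ending in the keyword

print(ends_with_keyword(short_output, [stop_kw]))  # False: guard skips the short tail
print(ends_with_keyword(long_output, [stop_kw]))   # True: suffix matches
```

Without the `continue`, the short-output case would compare a length-2 tail against a length-3 keyword, which is exactly the failure the patch avoids.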


ZhangYuanhan-AI avatar ZhangYuanhan-AI commented on September 4, 2024

Hi, please share with me the command you use.


Marlod390 avatar Marlod390 commented on September 4, 2024

I hardcoded the parameters into my inference code. I changed vicuna_v1 to mistral_direct as you did, and it worked. But compared to the 7B version, the 34B model's answers contain a lot of "in the image" and "in the frame", which may not be what a video VLM should output. Do you have the same problem? If not, there may be something wrong with my code.

