Comments (9)
We definitely plan to open-source everything (including the previous LLaVA-NeXT training code and data) to benefit the community.
However, there is still a lot of work to do, and more releases are to be expected~
from llava-next.
@jimchenhub @Luodian
I found that the training script was published in the HF model README:
https://huggingface.co/lmms-lab/llama3-llava-next-8b
Is there any way to actually run this training script?
Can you please tell me if there is any obvious difference between this training code and the one in https://github.com/haotian-liu/LLaVA? I'm trying to finetune it on my own dataset.
There is no major difference if you wish to finetune our 8B/72B/110B models. The only difference is that you need to apply the new conversation templates.
Indeed, I found that the special tokens of LLaMA3-8B are different from those of Vicuna-7B. Could you please tell me what conversation template you use to finetune LLaMA3-8B?
You can find it in llava/conversation.py; we use the llama-3 template.
But that is for inference. For training, you need to implement the masking logic for LLaMA-3 on your side.
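For anyone else stuck on this step, here is a minimal sketch of both pieces: rendering the publicly documented LLaMA-3 chat format, and masking out non-assistant tokens from the loss. The helper names and toy token ids below are hypothetical illustrations, not the repo's actual code; llava/conversation.py and the LLaVA training code are authoritative.

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss in HF/LLaVA training

def format_llama3(messages):
    """Render a conversation in the LLaMA-3 chat format (sketch of the
    publicly documented template)."""
    out = "<|begin_of_text|>"
    for role, content in messages:
        out += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    return out

def build_labels(segments):
    """Given [(role, token_ids), ...] for one conversation, return
    (input_ids, labels) where only assistant tokens are supervised."""
    input_ids, labels = [], []
    for role, toks in segments:
        input_ids.extend(toks)
        if role == "assistant":
            labels.extend(toks)  # train on model responses
        else:
            labels.extend([IGNORE_INDEX] * len(toks))  # mask prompt tokens
    return input_ids, labels

# Toy example: token ids are placeholders, not real LLaMA-3 ids.
ids, lbls = build_labels([
    ("user", [4, 5]),
    ("assistant", [6, 7, 8]),
])
```

The key point is that every token belonging to the system/user turns (including the header special tokens) gets IGNORE_INDEX as its label, so the loss is computed only on the assistant's response tokens.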
OK, thanks, I'll try it.
Hi,
From your provided checkpoint (https://huggingface.co/lmms-lab/llama3-llava-next-8b), I found that the pre-training config is
PROMPT_VERSION=plain
PRETRAIN_DATA_VERSION="blip558k"
So I referred to https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/pretrain.sh to pre-train with the LLaMA3-8B backbone using the following script:
deepspeed llava/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ckpts/Meta-Llama-3-8B \
--version plain \
--data_path ./playground/data/LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
--image_folder ./playground/data/LLaVA-Pretrain/images \
--vision_tower ckpts/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 True \
--output_dir ./checkpoints/llava-v1.5-llama-8b-pretrain \
--num_train_epochs 1 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 24000 \
--save_total_limit 1 \
--learning_rate 1e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to tensorboard
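For reference, the global batch size implied by these flags is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. The GPU count below is an assumption (the script does not specify it); with 8 GPUs the numbers line up with the LLaVA-1.5 pretraining recipe:

```python
# Effective global batch size implied by the flags above.
per_device_train_batch_size = 32
gradient_accumulation_steps = 1
num_gpus = 8  # hypothetical; not specified by the script

effective_batch = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * num_gpus)
# 256 matches the batch size of the original LLaVA-1.5 pretraining stage.
```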
However, the training loss is much larger than when pre-training with Vicuna-7B (2.40 vs. 2.06).
To adapt LLaMA3-8B to the training code in https://github.com/haotian-liu/LLaVA, I manually add a pad_token and unk_token like:

if tokenizer.pad_token is None:
    print("\n add unk_token \n\n")
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(unk_token="<unk>"),
        tokenizer=tokenizer,
        model=model,
    )
    tokenizer.pad_token = tokenizer.unk_token
I wonder whether this will hurt pre-training performance. If it is not correct, how should I set padding_value in the line input_ids = torch.nn.utils.rnn.pad_sequence(...)?
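Not the authors' confirmed recipe, but since LLaMA-3 tokenizers ship without a pad/unk token, one common workaround is to reuse an existing special token (e.g. the eos token) as pad_token instead of resizing the embedding matrix, and pass its id as padding_value. A sketch, with a placeholder pad id standing in for tokenizer.pad_token_id:

```python
import torch

# Hypothetical workaround: reuse an existing token as pad so no
# embedding resize is needed, e.g.
#   tokenizer.pad_token = tokenizer.eos_token
pad_id = 128001  # placeholder for tokenizer.pad_token_id

# pad_sequence right-pads every sequence in the batch with pad_id.
seqs = [torch.tensor([5, 6, 7]), torch.tensor([8, 9])]
batch = torch.nn.utils.rnn.pad_sequence(
    seqs, batch_first=True, padding_value=pad_id
)
```

Whichever token is used, its positions should also be excluded from the loss (via the attention mask / IGNORE_INDEX labels), otherwise training on pad positions can inflate the loss.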
Hi @jimchenhub, have you figured it out?