yiranyyu commented on August 22, 2024

I downloaded the model weights pre-trained on VG and COCO, and pre-processed the features following the instructions in the README. I then tested the zero-shot grounding performance of VL-T5 on the RefCOCOg dataset following the guidance. However, the accuracy on both the val and test splits is zero, which really confuses me.

I then tested the few-shot performance of VL-T5 and got a reasonable result (44.53% accuracy on the val split with four samples). Could the weights that are not used when initializing RefCOCOModel from the pre-trained checkpoint (see the log below) be what causes such a large gap between zero-shot and few-shot performance?

Command to Reproduce the Results

cd VL-T5/

# modify scripts/RefCOCOg_VLT5.sh to set the `lr` param to 0 and the number of epochs to 1
vim scripts/RefCOCOg_VLT5.sh

# modify line 304 of src/refcoco.py, changing `>` to `>=`, so that the zero-accuracy checkpoint is saved for testing
vim src/refcoco.py

# run the training script
cd VL-T5/
bash scripts/RefCOCOg_VLT5.sh 4
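
To illustrate why the comparison has to change from `>` to `>=`, here is a minimal sketch. It assumes the training loop tracks a running best validation accuracy that starts at 0 and saves a checkpoint only when the comparison succeeds. The function name `should_save` and its arguments are hypothetical and are not taken from src/refcoco.py.

```python
def should_save(valid_acc: float, best_acc: float, inclusive: bool) -> bool:
    """Return True if a checkpoint would be saved for this validation accuracy.

    inclusive=False models the original `>` comparison,
    inclusive=True models the modified `>=` comparison.
    """
    if inclusive:
        return valid_acc >= best_acc
    return valid_acc > best_acc


# With lr set to 0 the weights never change, so validation accuracy can stay at 0.
best_acc = 0.0  # assumed initial value of the running best
print(should_save(0.0, best_acc, inclusive=False))  # strict comparison: nothing is saved
print(should_save(0.0, best_acc, inclusive=True))   # inclusive comparison: checkpoint is saved
```

With the strict comparison, a run whose accuracy never rises above the initial best would leave no checkpoint behind for the test step.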

Logs and Other Information

Log

Building Model at GPU 0
Building Model at GPU 3
Building Model at GPU 1
Building Model at GPU 2
Some weights of VLT5RefCOCO were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.visual_embedding.feat_embedding.0.weight', 'encoder.visual_embedding.feat_embedding.0.bias', 'encoder.visual_embedding.absolute_vis_pos_embedding.0.weight', 'encoder.visual_embedding.absolute_vis_pos_embedding.0.bias', 'encoder.visual_embedding.obj_order_embedding.weight', 'encoder.visual_embedding.img_order_embedding.weight', 'encoder.visual_embedding.layer_norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model Launching at GPU 3
Model Launching at GPU 1
Model Launching at GPU 2
Model loaded from  snap/pretrain/VLT5/Epoch30.pth
_IncompatibleKeys(missing_keys=[], unexpected_keys=['encoder.visual_embedding.feat_embedding.1.weight', 'encoder.visual_embedding.absolute_vis_pos_embedding.1.weight'])
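
For readers unfamiliar with the last log line: PyTorch's `load_state_dict(strict=False)` returns the keys the model expects but the checkpoint lacks (`missing_keys`) and the keys the checkpoint holds but the model has no slot for (`unexpected_keys`). Unexpected entries are skipped rather than loaded. The sketch below reproduces that bookkeeping with plain Python sets so it runs without PyTorch. The helper name `diff_state_dict_keys` is hypothetical, and the key lists are a shortened illustration built from the names in the log, not the full state dict.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mimicking a non-strict load report."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model ignores
    return missing, unexpected


# Shortened, illustrative key lists based on the names in the log above.
model_keys = [
    "encoder.visual_embedding.feat_embedding.0.weight",
    "encoder.visual_embedding.feat_embedding.0.bias",
    "encoder.visual_embedding.absolute_vis_pos_embedding.0.weight",
    "encoder.visual_embedding.absolute_vis_pos_embedding.0.bias",
]
checkpoint_keys = model_keys + [
    "encoder.visual_embedding.feat_embedding.1.weight",
    "encoder.visual_embedding.absolute_vis_pos_embedding.1.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

An empty `missing` list means every parameter the model defines received a value from the checkpoint. The two unexpected entries are checkpoint tensors that the fine-tuning model simply does not use.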

Script

Content of scripts/RefCOCOg_VLT5.sh (only lr and epochs params changed):

# The name of experiment
name=VLT5

output=snap/refcocog/$name

PYTHONPATH=$PYTHONPATH:./src \
python -m torch.distributed.launch \
    --nproc_per_node=$1 \
    src/refcoco.py \
        --distributed --multiGPU \
        --train train \
        --valid val \
        --test test \
        --optim adamw \
        --warmup_ratio 0.1 \
        --clip_grad_norm 5 \
        --lr 0e-5 \
        --epochs 1 \
        --num_workers 4 \
        --backbone 't5-base' \
        --output $output ${@:2} \
        --load snap/pretrain/VLT5/Epoch30 \
        --batch_size 90 \

Platform

OS: Ubuntu
GPU: A100

Update:

It seems the unexpected_keys warning is not the cause of the low performance. The unexpected_keys message disappears when I use the model further pre-trained on VCR, but the val and test performance is still low (around 0.6% on both). We then tried constraining the decoding to generate only vis_extra_id_ tokens, which resulted in 1% accuracy on test.
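
The constrained decoding mentioned above can be sketched as masking every vocabulary entry except the allowed region tokens before taking the argmax. This is only an illustration of the idea in plain Python. The function name `constrained_argmax`, the five-token vocabulary, the scores, and the token ids are all hypothetical stand-ins, not the code or ids used in the experiment.

```python
import math


def constrained_argmax(logits, allowed_ids):
    """Pick the highest-scoring token id among allowed_ids only."""
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("allowed_ids must contain at least one token id")
    # Disallowed tokens get -inf so they can never win the argmax.
    masked = [score if i in allowed else -math.inf for i, score in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)


logits = [2.5, 0.1, 1.7, 0.3, 1.9]  # hypothetical scores over a 5-token vocabulary
region_token_ids = [2, 4]           # hypothetical ids standing in for vis_extra_id_ tokens

print(constrained_argmax(logits, range(len(logits))))  # unconstrained choice: 0
print(constrained_argmax(logits, region_token_ids))    # constrained choice: 4
```

Masking guarantees the output is always a region token, but it does not make the chosen region correct, which is consistent with accuracy staying low after the constraint was applied.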

from vl-t5.
