
mmtrack's Introduction

Towards Unified Token Learning for Vision-Language Tracking (MMTrack)

The official implementation for the TCSVT 2023 paper [Towards Unified Token Learning for Vision-Language Tracking].

[Models] [Raw Results]

Framework

☀️ Highlights

Performance

| Tracker | TNL2K (AUC) | LaSOT (AUC) | LaSOT-ext (AUC) | OTB99-Lang (AUC) |
|----------|-------------|-------------|-----------------|------------------|
| VLT_{TT} | 54.7 | 67.3 | 48.4 | 74.0 |
| JointNLT | 56.9 | 60.4 | - | 65.3 |
| MMTrack | 58.6 | 70.0 | 49.4 | 70.5 |

Install the environment

conda create -n mmtrack python=3.8
conda activate mmtrack
bash install.sh

Set project paths

Run the following command to set paths for this project

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

After running this command, you can also modify paths by editing these two files

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
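For reference, these generated files follow the usual PyTracking-style settings convention; the sketch below shows roughly what lib/train/admin/local.py looks like. The attribute names here are illustrative assumptions — check the file that create_default_local_file.py actually writes for the exact fields:

```python
import os

# Illustrative sketch of an auto-generated lib/train/admin/local.py.
# Attribute names are assumptions; the generated file lists the real ones.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '.'                      # --workspace_dir from the setup command
        self.data_dir = os.path.join('.', 'data')     # --data_dir from the setup command
        self.save_dir = os.path.join('.', 'output')   # --save_dir from the setup command
        # Per-dataset roots, matching the layout described in Data Preparation
        self.lasot_dir = os.path.join(self.data_dir, 'lasot')
        self.tnl2k_dir = os.path.join(self.data_dir, 'tnl2k')
        self.refcoco_dir = os.path.join(self.data_dir, 'refcoco')
        self.otb_lang_dir = os.path.join(self.data_dir, 'otb_lang')

settings = EnvironmentSettings()
```

Editing these attributes by hand is equivalent to re-running the setup command with different arguments.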

Data Preparation

  1. Download the preprocessed JSON file of the refcoco dataset. If the first link fails, you can download it here.

  2. Download the train2014 images used by refcoco from Joseph Redmon's MSCOCO mirror.

  3. Download the OTB_Lang dataset from Link

Put the tracking datasets in ./data. It should look like:

${PROJECT_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- tnl2k
         |-- test
         |-- train
     -- refcoco
         |-- images
         |-- refcoco
         |-- refcoco+
         |-- refcocog
     -- otb_lang
         |-- OTB_query_test
         |-- OTB_query_train
         |-- OTB_videos
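The skeleton above can be created in one go, then the downloaded datasets moved (or symlinked) into place; a minimal sketch:

```shell
# Create the expected directory skeleton under ./data; afterwards,
# move or symlink the downloaded datasets into these directories.
mkdir -p data/lasot
mkdir -p data/tnl2k/test data/tnl2k/train
mkdir -p data/refcoco/images data/refcoco/refcoco data/refcoco/refcoco+ data/refcoco/refcocog
mkdir -p data/otb_lang/OTB_query_test data/otb_lang/OTB_query_train data/otb_lang/OTB_videos
```

Symlinking is handy if the datasets already live on another disk, since the trackers only need the paths to resolve.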

Training

Download the pretrained OSTrack and RoBERTa-base models, and put them under $PROJECT_ROOT$/pretrained_networks.

python tracking/train.py \
--script mmtrack --config baseline --save_dir ./output \
--mode multiple --nproc_per_node 2 --use_wandb 0

Replace --config with the desired model config under experiments/mmtrack. If you want to use wandb to record detailed training logs, you can set --use_wandb 1.

Evaluation

Download the model weights from Google Drive

Put the downloaded weights under $PROJECT_ROOT$/output/checkpoints/train/mmtrack/baseline

Set the corresponding values in lib/test/evaluation/local.py to the actual benchmark paths

Some testing examples:

  • LaSOT_lang or other offline-evaluated benchmarks (change --dataset_name accordingly)
python tracking/test.py --tracker_name mmtrack --tracker_param baseline --dataset_name lasot_lang --threads 8 --num_gpus 2

python tracking/analysis_results.py # need to modify tracker configs and names
  • lasot_extension_subset_lang
python tracking/test.py --tracker_name mmtrack --tracker_param baseline --dataset_name lasot_extension_subset_lang --threads 8 --num_gpus 2
  • TNL2k_Lang
python tracking/test.py --tracker_name mmtrack --tracker_param baseline --dataset_name tnl2k_lang --threads 8 --num_gpus 2
  • OTB_Lang
python tracking/test.py --tracker_name mmtrack --tracker_param baseline --dataset_name otb_lang --threads 8 --num_gpus 2
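Since the four commands above differ only in --dataset_name, they can be generated in a loop. This sketch just prints each command so you can review it first; drop the echo to actually run them:

```shell
# Print the evaluation command for each language benchmark;
# remove the `echo` to execute them for real.
for ds in lasot_lang lasot_extension_subset_lang tnl2k_lang otb_lang; do
  echo python tracking/test.py --tracker_name mmtrack --tracker_param baseline \
       --dataset_name "$ds" --threads 8 --num_gpus 2
done
```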

Acknowledgments

Citation

If our work is useful for your research, please consider citing:

@ARTICLE{Zheng2023mmtrack,
  author={Zheng, Yaozong and Zhong, Bineng and Liang, Qihua and Li, Guorong and Ji, Rongrong and Li, Xianxian},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Towards Unified Token Learning for Vision-Language Tracking}, 
  year={2023},
}

mmtrack's People

Contributors

azong-hqu


mmtrack's Issues

How to set dataset correctly?

I have downloaded refcoco, but I still cannot run the code; maybe my dataset layout is wrong. Could you share more details about how to arrange this dataset? Hope you reply, thanks. This is my e-mail: [email protected]

Checkpoint issue

Hi, when I run inference with MMTrack_ep0150.pth.tar I get the error below. It looks like MMTrack_ep0150.pth.tar contains the variable text_encoder.embeddings.position_ids, which the roberta-base model does not have; my pretrained_networks/roberta-base was downloaded directly from Hugging Face and used as-is.
Also, the two variables ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias'] appear not to have been trained.
Am I missing some step in how roberta-base should be prepared?

Evaluating 1 trackers on 280 sequences
Tracker: mmtrack baseline None , Sequence: airplane-1
test config: {'MODEL': {'PRETRAIN_FILE': 'OSTrack_ep0300.pth.tar', 'EXTRA_MERGER': False, 'RETURN_INTER': False, 'RETURN_STAGES': [], 'BACKBONE': {'TYPE': 'vit_base_patch16_224_ce', 'STRIDE': 16, 'MID_PE': False, 'SEP_SEG': False, 'CAT_MODE': 'direct', 'MERGE_LAYER': 0, 'ADD_CLS_TOKEN': False, 'CLS_TOKEN_USE_MODE': 'ignore', 'CE_LOC': [3, 6, 9], 'CE_KEEP_RATIO': [0.7, 0.7, 0.7], 'CE_TEMPLATE_RANGE': 'CTR_POINT'}, 'TEXT_ENCODER': 'roberta-base', 'FREEZE_TEXT_ENCODER': True, 'VLFUSION_LAYERS': 1, 'VL_INPUT_TYPE': 'separate', 'DECODER': {'DEC_LAYERS': 3, 'HIDDEN_DIM': 256, 'MLP_RATIO': 8, 'NUM_HEADS': 8, 'DROPOUT': 0.1, 'VOCAB_SIZE': 1001, 'BBOX_TYPE': 'xyxy', 'MEMORY_POSITION_EMBEDDING': 'sine', 'QUERY_POSITION_EMBEDDING': 'learned'}, 'HEAD': {'TYPE': 'MLP', 'NUM_CHANNELS': 256}}, 'TRAIN': {'LR': 0.0004, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 150, 'LR_DROP_EPOCH': 125, 'BATCH_SIZE': 32, 'NUM_WORKER': 2, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'FREEZE_LAYERS': [0], 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 1000, 'GRAD_CLIP_NORM': 0.1, 'AMP': True, 'BBOX_TASK': True, 'LANGUAGE_TASK': True, 'AUX_LOSS': False, 'CE_START_EPOCH': 20, 'CE_WARM_EPOCH': 50, 'DROP_PATH_RATE': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'causal', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': 200, 'TRAIN': {'DATASETS_NAME': ['LASOT_Lang'], 'DATASETS_RATIO': [6], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 384, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5, 'NUMBER': 1}, 'TEMPLATE': {'NUMBER': 1, 'SIZE': 192, 'FACTOR': 2.0, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 192, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 384, 'EPOCH': 150}}
Some weights of RobertaModel were not initialized from the model checkpoint at pretrained_networks/roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Error(s) in loading state_dict for MMTrack:
Unexpected key(s) in state_dict: "text_encoder.embeddings.position_ids".
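A common workaround for this kind of mismatch (newer transformers releases dropped the persistent position_ids buffer from RoBERTa) is to filter the stale key out of the checkpoint before loading, or to pass strict=False to load_state_dict. Below is a minimal stand-in using plain dicts; the key names besides position_ids are illustrative, and in real code the same filtering would be applied to the dict returned by torch.load:

```python
# Stand-in for a loaded checkpoint dict (in real code: torch.load(...)).
# It contains one key that the current model no longer defines.
checkpoint = {
    "text_encoder.embeddings.position_ids": "stale buffer",
    "text_encoder.embeddings.word_embeddings.weight": "tensor...",
    "box_head.layers.0.weight": "tensor...",
}

# Keys the current model actually expects (illustrative subset).
model_keys = {
    "text_encoder.embeddings.word_embeddings.weight",
    "box_head.layers.0.weight",
}

# Drop keys the model does not define; a strict load_state_dict would
# otherwise raise "Unexpected key(s) in state_dict".
filtered = {k: v for k, v in checkpoint.items() if k in model_keys}
# Alternatively: model.load_state_dict(checkpoint, strict=False)
```

Pinning the transformers version the authors trained with would also avoid the mismatch.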

train log

Hi, could you share the training log?
On a single 3090 with batch size 128, training takes about 25 min/epoch for me; is that normal?
Thanks for your great work.
[train: 2, 50 / 468] FPS: 41.6 (138.7) , DataTime: 2.217 (0.097) , ForwardTime: 0.762 , TotalTime: 3.076 , Loss/cls: 5.12799 , Loss/total: 5.12799 , [email protected]: 52.12500
[train: 2, 100 / 468] FPS: 41.6 (45.9) , DataTime: 2.222 (0.096) , ForwardTime: 0.762 , TotalTime: 3.080 , Loss/cls: 5.10653 , Loss/total: 5.10653 , [email protected]: 53.73438
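For what it's worth, ~25 min/epoch is consistent with the log itself: SAMPLE_PER_EPOCH is 60000 in the config, so batch size 128 gives the 468 iterations shown in "[train: 2, ... / 468]", and at the logged TotalTime of ~3.08 s per iteration one epoch takes about 24 minutes. A quick check:

```python
# Sanity-check the reported 25 min/epoch against the logged numbers.
samples_per_epoch = 60000      # DATA.TRAIN.SAMPLE_PER_EPOCH from the test config
batch_size = 128               # single-GPU batch size reported in the issue
seconds_per_iter = 3.08        # TotalTime from the [train: 2, ...] log lines

iters_per_epoch = samples_per_epoch // batch_size   # 468, matching "/ 468" in the log
minutes_per_epoch = iters_per_epoch * seconds_per_iter / 60
print(iters_per_epoch, round(minutes_per_epoch, 1))  # 468 ~24.0
```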

About the refcoco dataset

Hi, regarding the refcoco dataset: from the second link you provided, do I only need to download the "2014 Training images" file?
And are refcoco, refcoco+, and refcocog the three files from the first link in the README, placed as shown in the layout above?
