azong-hqu / mmtrack
The official implementation for the paper "Towards Unified Token Learning for Vision-Language Tracking".
License: MIT License
Hi, when I run inference with MMTrack_ep0150.pth.tar, I get the error below. It looks like MMTrack_ep0150.pth.tar contains a text_encoder.embeddings.position_ids variable that does not exist in the roberta-base model; my pretrained_networks/roberta-base was downloaded directly from Hugging Face.
Also, the two variables ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias'] do not appear to have been trained.
Am I missing some step in preparing roberta-base?
Evaluating 1 trackers on 280 sequences
Tracker: mmtrack baseline None , Sequence: airplane-1
test config: {'MODEL': {'PRETRAIN_FILE': 'OSTrack_ep0300.pth.tar', 'EXTRA_MERGER': False, 'RETURN_INTER': False, 'RETURN_STAGES': [], 'BACKBONE': {'TYPE': 'vit_base_patch16_224_ce', 'STRIDE': 16, 'MID_PE': False, 'SEP_SEG': False, 'CAT_MODE': 'direct', 'MERGE_LAYER': 0, 'ADD_CLS_TOKEN': False, 'CLS_TOKEN_USE_MODE': 'ignore', 'CE_LOC': [3, 6, 9], 'CE_KEEP_RATIO': [0.7, 0.7, 0.7], 'CE_TEMPLATE_RANGE': 'CTR_POINT'}, 'TEXT_ENCODER': 'roberta-base', 'FREEZE_TEXT_ENCODER': True, 'VLFUSION_LAYERS': 1, 'VL_INPUT_TYPE': 'separate', 'DECODER': {'DEC_LAYERS': 3, 'HIDDEN_DIM': 256, 'MLP_RATIO': 8, 'NUM_HEADS': 8, 'DROPOUT': 0.1, 'VOCAB_SIZE': 1001, 'BBOX_TYPE': 'xyxy', 'MEMORY_POSITION_EMBEDDING': 'sine', 'QUERY_POSITION_EMBEDDING': 'learned'}, 'HEAD': {'TYPE': 'MLP', 'NUM_CHANNELS': 256}}, 'TRAIN': {'LR': 0.0004, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 150, 'LR_DROP_EPOCH': 125, 'BATCH_SIZE': 32, 'NUM_WORKER': 2, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'FREEZE_LAYERS': [0], 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 1000, 'GRAD_CLIP_NORM': 0.1, 'AMP': True, 'BBOX_TASK': True, 'LANGUAGE_TASK': True, 'AUX_LOSS': False, 'CE_START_EPOCH': 20, 'CE_WARM_EPOCH': 50, 'DROP_PATH_RATE': 0.1, 'SCHEDULER': {'TYPE': 'step', 'DECAY_RATE': 0.1}}, 'DATA': {'SAMPLER_MODE': 'causal', 'MEAN': [0.485, 0.456, 0.406], 'STD': [0.229, 0.224, 0.225], 'MAX_SAMPLE_INTERVAL': 200, 'TRAIN': {'DATASETS_NAME': ['LASOT_Lang'], 'DATASETS_RATIO': [6], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_votval'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 384, 'FACTOR': 5.0, 'CENTER_JITTER': 4.5, 'SCALE_JITTER': 0.5, 'NUMBER': 1}, 'TEMPLATE': {'NUMBER': 1, 'SIZE': 192, 'FACTOR': 2.0, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 192, 'SEARCH_FACTOR': 5.0, 'SEARCH_SIZE': 384, 'EPOCH': 150}}
Some weights of RobertaModel were not initialized from the model checkpoint at pretrained_networks/roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Error(s) in loading state_dict for MMTrack:
Unexpected key(s) in state_dict: "text_encoder.embeddings.position_ids".
Hello, regarding the refcoco dataset you provided: for the second link, do I only need to download the "2014 Training images" file?
And should refcoco, refcoco+, and refcocog, the three files from the first link in the README, be arranged that way?
Congrats!
It seems your paper was accepted in a very short time?
Thank you for your wonderful work. Could you open the access permissions?
I have downloaded refcoco, but I still cannot run the code; maybe my dataset placement is wrong. Could you share more details about how to arrange this dataset? Hope you reply, thanks. This is my e-mail: [email protected]
Hi, could you provide the training log?
On a single 3090 with batch size 128, training takes 25 min/epoch for me; is this normal?
Thanks for your great work.
[train: 2, 50 / 468] FPS: 41.6 (138.7) , DataTime: 2.217 (0.097) , ForwardTime: 0.762 , TotalTime: 3.076 , Loss/cls: 5.12799 , Loss/total: 5.12799 , [email protected]: 52.12500
[train: 2, 100 / 468] FPS: 41.6 (45.9) , DataTime: 2.222 (0.096) , ForwardTime: 0.762 , TotalTime: 3.080 , Loss/cls: 5.10653 , Loss/total: 5.10653 , [email protected]: 53.73438