goutamyg / mvt

[BMVC 2023] Mobile Vision Transformer-based Visual Object Tracking

License: Apache License 2.0

Python 100.00%
single-object-tracking vision-transformer mobile-vision-transformer bmvc bmvc2023 visual-object-tracking visual-tracking

News

11-03-2024: C++ implementation of our tracker is available now

10-11-2023: ONNX-Runtime and TensorRT-based inference code is released. Now, our MVT runs at ~70 fps on CPU and ~300 fps on GPU ⚡⚡. Check the page for details.

14-09-2023: The pretrained tracker model is released

13-09-2023: The paper is available on arXiv now

22-08-2023: The MVT tracker training and inference code is released

21-08-2023: The paper is accepted at BMVC2023

Installation

Install the dependency packages using the environment file mvt_pyenv.yml.

Generate the relevant files:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

After running this command, set the dataset paths by editing these files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
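
Before editing the paths, it can help to confirm that the generation step actually produced both files. The following is a minimal sketch, not part of the repository: the helper name `missing_local_files` is hypothetical, and it assumes the script is run from the repository root (the `--workspace_dir .` used above).

```python
from pathlib import Path

# The two files named in the instructions above, relative to the repository root.
LOCAL_FILES = (
    "lib/train/admin/local.py",
    "lib/test/evaluation/local.py",
)


def missing_local_files(repo_root="."):
    """Return the local settings files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LOCAL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_local_files(".")
    if missing:
        print("Missing (run create_default_local_file.py first):", ", ".join(missing))
    else:
        print("Both local.py files exist; edit the dataset paths inside them.")
```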

Training

  • Set the path of training datasets in lib/train/admin/local.py
  • Place the pretrained backbone model under the pretrained_models/ folder
  • For data preparation, please refer to this
  • Uncomment lines 63, 67, and 71 in the base_backbone.py file. Replace these lines with self.z_dict1 = template.tensors.
  • Run
python tracking/train.py --script mobilevit_track --config mobilevit_256_128x1_got10k_ep100_cosine_annealing --save_dir ./output --mode single
  • The training logs will be saved under the output/logs/ folder

Pretrained tracker model

The pretrained tracker model can be found here

Tracker Evaluation

  • Update the test dataset paths in lib/test/evaluation/local.py
  • Place the pretrained tracker model under the output/checkpoints/ folder
  • Run
python tracking/test.py --tracker_name mobilevit_track --tracker_param mobilevit_256_128x1_got10k_ep100_cosine_annealing --dataset got10k_test/trackingnet/lasot
  • Set the DEVICE variable in the --tracker_param file to cuda for GPU-based inference or to cpu for CPU-based inference
  • The raw results will be stored under the output/test/ folder
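
To evaluate on several benchmarks in a row, the test command can be wrapped in a small driver. This is a sketch under one assumption: that the slash-separated `got10k_test/trackingnet/lasot` in the command above lists alternative values, with one passed per run. The helper names `build_test_command` and `run_all` are hypothetical; the script path and flags are taken from the command above.

```python
import subprocess
import sys

TRACKER_NAME = "mobilevit_track"
TRACKER_PARAM = "mobilevit_256_128x1_got10k_ep100_cosine_annealing"
# Assumption: these are alternatives, one per invocation of tracking/test.py.
DATASETS = ("got10k_test", "trackingnet", "lasot")


def build_test_command(dataset):
    """Build the argument list for one evaluation run on a single dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        sys.executable, "tracking/test.py",
        "--tracker_name", TRACKER_NAME,
        "--tracker_param", TRACKER_PARAM,
        "--dataset", dataset,
    ]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for dataset in DATASETS:
        cmd = build_test_command(dataset)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

The default is a dry run so that the commands can be inspected first; call `run_all(dry_run=False)` from the repository root to execute them.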

Profile tracker model

  • To count the model parameters, run
python tracking/profile_model.py
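
The internals of profile_model.py are not shown here. As a rough illustration of what a parameter count involves, the sketch below sums the element counts of every parameter tensor, which is the usual approach for PyTorch modules. `count_parameters` is a hypothetical helper, not a function from the repository; it only assumes the model exposes `parameters()` whose items provide `numel()` and `requires_grad`, as `torch.nn.Module` does.

```python
def count_parameters(model, trainable_only=False):
    """Sum the number of elements over all parameters of a model.

    Works with any object exposing parameters(), e.g. a torch.nn.Module.
    """
    total = 0
    for param in model.parameters():
        # Optionally skip frozen parameters.
        if trainable_only and not param.requires_grad:
            continue
        total += param.numel()
    return total
```

With PyTorch installed, `count_parameters(torch.nn.Linear(4, 2))` would count the 8 weights plus the 2 biases of that layer.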

Acknowledgements

  • We use the Separable Self-Attention Transformer implementation and the pretrained MobileViT backbone from ml-cvnets. Thank you!
  • Our training code is built upon OSTrack and PyTracking

Citation

If our work is useful for your research, please consider citing:

@inproceedings{Gopal_2023_BMVC,
author    = {Goutam Yelluru Gopal and Maria Amer},
title     = {Mobile Vision Transformer-based Visual Object Tracking},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0800.pdf}
}

mvt's Issues

Inference

The results are very bad when running video_demo.py.

cvt onnx failed

When I used pytorch2onnx.py to convert the model to an ONNX file, the run reported an error. The error is below:

~/work/ai/MVT$ python tracking/pytorch2onnx.py
test config: {'MODEL': {'PRETRAIN_FILE': 'mobilevit_s.pt', 'EXTRA_MERGER': False, 'RETURN_INTER': False, 'RETURN_STAGES': [], 'BACKBONE': {'TYPE': 'mobilevit_s', 'STRIDE': 16, 'MID_PE': False, 'SEP_SEG': False, 'CAT_MODE': 'direct', 'MERGE_LAYER': 0, 'ADD_CLS_TOKEN': False, 'CLS_TOKEN_USE_MODE': 'ignore'}, 'NECK': {'TYPE': 'BN_FEATURE_FUSOR_LIGHTTRACK', 'NUM_CHANNS_POST_XCORR': 64}, 'HEAD': {'TYPE': 'CENTER', 'NUM_CHANNELS': 256}}, 'TRAIN': {'LR': 0.0004, 'WEIGHT_DECAY': 0.0001, 'EPOCH': 100, 'LR_DROP_EPOCH': 10, 'BATCH_SIZE': 128, 'NUM_WORKER': 10, 'OPTIMIZER': 'ADAMW', 'BACKBONE_MULTIPLIER': 0.1, 'GIOU_WEIGHT': 2.0, 'L1_WEIGHT': 5.0, 'FREEZE_LAYERS': [0], 'PRINT_INTERVAL': 50, 'VAL_EPOCH_INTERVAL': 10, 'GRAD_CLIP_NORM': 0.1, 'AMP': False, 'SCHEDULER': {'TYPE': 'cosine_anneal', 'DECAY_RATE': 0.5}}, 'DATA': {'SAMPLER_MODE': 'causal', 'MEAN': [0.0, 0.0, 0.0], 'STD': [1.0, 1.0, 1.0], 'MAX_SAMPLE_INTERVAL': 200, 'TRAIN': {'DATASETS_NAME': ['GOT10K_train_full'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 60000}, 'VAL': {'DATASETS_NAME': ['GOT10K_official_val'], 'DATASETS_RATIO': [1], 'SAMPLE_PER_EPOCH': 10000}, 'SEARCH': {'SIZE': 256, 'FACTOR': 4.0, 'CENTER_JITTER': 3, 'SCALE_JITTER': 0.25, 'NUMBER': 1}, 'TEMPLATE': {'NUMBER': 1, 'SIZE': 128, 'FACTOR': 2.0, 'CENTER_JITTER': 0, 'SCALE_JITTER': 0}}, 'TEST': {'DEVICE': 'cpu', 'TEMPLATE_FACTOR': 2.0, 'TEMPLATE_SIZE': 128, 'SEARCH_FACTOR': 4.0, 'SEARCH_SIZE': 256, 'EPOCH': 100}}
Converting tracking model now!
Traceback (most recent call last):
File "tracking/pytorch2onnx.py", line 223, in <module>
convert_tracking_model(network, params.checkpoint)
File "tracking/pytorch2onnx.py", line 188, in convert_tracking_model
opset_version=11, do_constant_folding=True, input_names=['z','x'], output_names=['cls','reg'])
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 320, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 111, in export
custom_opsets=custom_opsets, use_external_data_format=use_external_data_format)
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 729, in _export
dynamic_axes=dynamic_axes)
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 493, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/999/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 388, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/home/999/.local/lib/python3.6/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/jit/_trace.py", line 132, in forward
self._force_outplace,
File "/home/999/.local/lib/python3.6/site-packages/torch/jit/_trace.py", line 118, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "tracking/pytorch2onnx.py", line 52, in forward
x, z = self.backbone(x=search, z=template)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/work/ai/MVT/lib/models/mobilevit_track/base_backbone.py", line 93, in forward
x, z = self.forward_features(x, z,)
File "/home/999/work/ai/MVT/lib/models/mobilevit_track/base_backbone.py", line 74, in forward_features
x, z = self._forward_MobileViT_layer(self.layer_3, x, z)
File "/home/999/work/ai/MVT/lib/models/mobilevit_track/base_backbone.py", line 46, in _forward_MobileViT_layer
z = MobilenetV2_block(z)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/work/ai/MVT/lib/models/mobilevit_track/modules/mobilenetv2.py", line 240, in forward
return self.block(x)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/work/ai/MVT/lib/models/mobilevit_track/layers/conv_layer.py", line 236, in forward
return self.block(x)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 446, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/999/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [256, 64, 1, 1], expected input[1, 3, 128, 128] to have 64 channels, but got 3 channels instead
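
One way to read the final RuntimeError: a convolution weight is laid out as [out_channels, in_channels / groups, kH, kW], so a weight of size [256, 64, 1, 1] with groups=1 requires a 64-channel input. The input that arrived has shape [1, 3, 128, 128], which matches a raw 3-channel template image at TEMPLATE SIZE 128 from the config above. The sketch below only restates that shape arithmetic; `expected_input_channels` is a hypothetical helper, and it does not identify why the template reached this layer unprocessed.

```python
def expected_input_channels(weight_shape, groups=1):
    """Conv2d weights are laid out as [out_channels, in_channels // groups, kH, kW]."""
    return weight_shape[1] * groups


weight_shape = (256, 64, 1, 1)   # weight size from the error message
input_shape = (1, 3, 128, 128)   # input size from the error message, [N, C, H, W]

expected = expected_input_channels(weight_shape, groups=1)
actual = input_shape[1]
print(f"layer expects {expected} channels, input has {actual}")
```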

UserWarning

./lib/train/data/loader.py:87: UserWarning: An output with one or more elements was resized since it had shape [1572864], which does not match the required output shape [1, 96, 128, 128]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1656352464346/work/aten/src/ATen/native/Resize.cpp:17.)
return torch.stack(batch, 1, out=out)

live tracking demo

Hi, how can I run this tracker on a live feed or a live camera? Could you update the code to support object tracking on a live feed or camera?

Question about base_backbone.py

Hello, why do lines 63, 67, and 71 in base_backbone.py need to be uncommented for training? Could you please explain? Thanks.
