
apple / ml-cvnets

CVNets: A library for training computer vision networks

Home Page: https://apple.github.io/ml-cvnets

License: Other

Python 99.84% Makefile 0.16%
ade20k classification computer-vision deep-learning detection imagenet machine-learning mscoco pascal-voc pytorch

ml-cvnets's Introduction

CVNets: A library for training computer vision networks

CVNets is a computer vision toolkit that allows researchers and engineers to train standard and novel mobile- and non-mobile computer vision models for a variety of tasks, including object classification, object detection, semantic segmentation, and foundation models (e.g., CLIP).

Table of contents

What's new?

Installation

We recommend using Python 3.10+ and PyTorch (version >= v1.12.0).

The instructions below use Conda. If you don't have Conda installed, you can check out How to Install Conda.

# Clone the repo
git clone git@github.com:apple/ml-cvnets.git
cd ml-cvnets

# Create a virtual env. We use Conda
conda create -n cvnets python=3.10.8
conda activate cvnets

# install requirements and CVNets package
pip install -r requirements.txt -c constraints.txt
pip install --editable .

Getting started

  • General instructions for working with CVNets are given here.
  • Examples for training and evaluating models are provided here and here.
  • Examples for converting a PyTorch model to CoreML are provided here.

Supported models and Tasks

To see a list of available models and benchmarks, please refer to Model Zoo and examples folder.

ImageNet classification models
Multimodal Classification
Object detection
Semantic segmentation
Foundation models
Automatic Data Augmentation
Distillation
  • Soft distillation
  • Hard distillation

Maintainers

This code is developed by Sachin, and is now maintained by Sachin, Maxwell Horton, Mohammad Sekhavat, and Yanzi Jin.

Previous Maintainers

Research effort at Apple using CVNets

Below is the list of publications from Apple that use CVNets:

Contributing to CVNets

We welcome PRs from the community! You can find information about contributing to CVNets in our contributing document.

Please remember to follow our Code of Conduct.

License

For license details, see LICENSE.

Citation

If you find our work useful, please cite the following paper:

@inproceedings{mehta2022mobilevit,
     title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
     author={Sachin Mehta and Mohammad Rastegari},
     booktitle={International Conference on Learning Representations},
     year={2022}
}

@inproceedings{mehta2022cvnets, 
     author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
     title = {CVNets: High Performance Library for Computer Vision}, 
     year = {2022}, 
     booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
     series = {MM '22} 
}


ml-cvnets's Issues

The loss is nan

I'm trying to train the classification model on my machine using 8 RTX 2080 Ti GPUs. Because GPU memory is limited, I set the batch size for each GPU to 32. After training for several iterations or several epochs, the loss becomes NaN and training stops. Could you tell me how to solve this problem?

Two class object detection

Hi Team,
For a two-class object detection problem using MobileViTv2, I'm getting very low mAP and poor classification accuracy.
Can you suggest any hyperparameters I should change?
Specifically, what changes should I make in this code? I used the same COCO detection example, but I specified only two classes for detection instead of 81.

Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32

Hi,
I am getting the error below while trying to train MobileViT on a custom dataset with 2 classes.

cvnets-train --common.config-file config/classification/imagenet/mobilevit_v1.yaml --common.results-loc classification_results
The dataset structure is as below:

train
-class1
-class2
validation
-class1
-class2

Any idea?

2022-07-20 17:02:33 - INFO    - Configuration file is stored here: classification_results/run_1/config.yaml
===========================================================================
2022-07-20 17:02:35 - DEBUG    - Training epoch 0 with 9873 samples
2022-07-20 17:02:49 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:49 - ERROR   - Exiting!!!
2022-07-20 17:02:49 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:49 - ERROR   - Exiting!!!
2022-07-20 17:02:49 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:49 - ERROR   - Exiting!!!
2022-07-20 17:02:50 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:50 - ERROR   - Exiting!!!
2022-07-20 17:02:50 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:50 - ERROR   - Exiting!!!
2022-07-20 17:02:52 - LOGS    - Exception occurred that interrupted the training. DataLoader worker (pid(s) 30836, 16400, 36564, 21248, 32388) exited unexpectedly
DataLoader worker (pid(s) 30836, 16400, 36564, 21248, 32388) exited unexpectedly
Traceback (most recent call last):
  File "d:\tools\mobile\lib\site-packages\torch\utils\data\dataloader.py", line 1134, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "D:\Tools\PVPython\lib\multiprocessing\queues.py", line 105, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Sahar\ml-cvnets\engine\training_engine.py", line 682, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "E:\Sahar\ml-cvnets\engine\training_engine.py", line 298, in train_epoch
    for batch_id, batch in enumerate(self.train_loader):
  File "d:\tools\mobile\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
    data = self._next_data()
  File "d:\tools\mobile\lib\site-packages\torch\utils\data\dataloader.py", line 1330, in _next_data
    idx, data = self._get_data()
  File "d:\tools\mobile\lib\site-packages\torch\utils\data\dataloader.py", line 1296, in _get_data
    success, data = self._try_get_data()
  File "d:\tools\mobile\lib\site-packages\torch\utils\data\dataloader.py", line 1147, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 30836, 16400, 36564, 21248, 32388) exited unexpectedly
2022-07-20 17:02:52 - LOGS    - Training took 00:00:18.76
2022-07-20 17:02:52 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:52 - ERROR   - Exiting!!!
2022-07-20 17:02:52 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:52 - ERROR   - Exiting!!!
2022-07-20 17:02:54 - ERROR   - Unable to stack the tensors. Error: expected Tensor as element 0 in argument 0, but got numpy.int32
2022-07-20 17:02:54 - ERROR   - Exiting!!!
Traceback (most recent call last):
Traceback (most recent call last):
  File "D:\Tools\PVPython\lib\multiprocessing\queues.py", line 238, in _feed
    send_bytes(obj)
  File "D:\Tools\PVPython\lib\multiprocessing\queues.py", line 238, in _feed
    send_bytes(obj)
  File "D:\Tools\PVPython\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "D:\Tools\PVPython\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "D:\Tools\PVPython\lib\multiprocessing\connection.py", line 290, in _send_bytes
    nwritten, err = ov.GetOverlappedResult(True)
  File "D:\Tools\PVPython\lib\multiprocessing\connection.py", line 290, in _send_bytes
    nwritten, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended
BrokenPipeError: [WinError 109] The pipe has been ended
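A hedged reading of this log: the "expected Tensor as element 0 in argument 0, but got numpy.int32" error suggests the custom dataset returns labels as numpy.int32, which the batch-collation step cannot stack. A minimal generic PyTorch sketch of the usual fix (the class and names below are hypothetical, not CVNets APIs) is to convert the label to a torch tensor in __getitem__:

import torch
from torch.utils.data import Dataset

class FolderClassificationDataset(Dataset):
    """Hypothetical minimal dataset illustrating the fix: return labels as
    torch tensors (or plain ints), not numpy.int32, so batches stack."""

    def __init__(self, samples):
        # samples: list of (image_tensor, label) pairs; label may be numpy.int32
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        img, label = self.samples[index]
        # Cast to a torch scalar tensor so default collation can stack it.
        return img, torch.tensor(int(label), dtype=torch.long)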

Code Unfriendly for Downstream Tasks

Hi @sacmehta,

First, I'd like to thank you for your contribution of MobileViTs. But I find the code in this repo very unfriendly for downstream tasks: I cannot find which snippets are related to MobileViTs, nor their detailed configurations. Some other GitHub repos provide unofficial code for MobileViT-V1, but I couldn't load the pre-trained weights from this one. Could you please re-organise your code here to make it more useful for the community?

Many thanks,
Yiming

Inference time comparison between ResNet and MobileViT

Hello!

I'm Dae Hyun Kim, a Korean student.

I want to know the inference time (or FPS) of ResNet and MobileViT on the ImageNet-1k validation set.

Are there real results for this?

I ran the evaluation code (main_eval.py) myself using a single GeForce RTX 2080 Ti GPU.

The results are as follows.

MobileViT-XS: [screenshot of evaluation results, 2022-06-04]

MobileViT-S: [screenshot of evaluation results, 2022-06-04]

ResNet-50: [screenshot of evaluation results, 2022-06-04]

I know these are not inference times but evaluation times. However, based on this data, it can be estimated that MobileViT is slower than ResNet when measured on a GPU. Is this speculation true?

Choosing the number of warm-up iterations

Hi, congratulations on your excellent work!
I want to know how many warm-up iterations to use for different datasets. For example, if I have 10 thousand images and a batch size of 8, how should I choose the number of warm-up iterations for this model? If I train for 300 epochs, how many epochs should be used for warm-up? Is there a fixed ratio?
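For scale, a quick back-of-the-envelope conversion between warm-up iterations and epochs for the numbers in this question (the warm-up budget below is a hypothetical placeholder, not a recommendation):

import math

num_images, batch_size = 10_000, 8
iters_per_epoch = math.ceil(num_images / batch_size)  # 1250 iterations per epoch

warmup_iters = 3_000  # hypothetical placeholder value
warmup_epochs = warmup_iters / iters_per_epoch  # 2.4 epochs
print(iters_per_epoch, warmup_epochs)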

Too sensitive to hyperparameters?

Hi,
I trained your code on ImageNet-1k from scratch with your config file (mobilevit-small) with only one change: a new batch size of 32/GPU, for an effective batch size of 32*4. I get a top-1 accuracy of 74.23 on the ImageNet validation set. I suppose this changes the warm-up iterations, which I didn't adjust. But I got the exact reported accuracy for MobileNetV2 with the same changes [just changing the batch size, without changing the warm-up iterations]. Is it because transformers are susceptible to hyperparameter changes, or did you not notice such issues? Thank you.

If I use mobilevitv2's pretrained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1 (i.e., divided by 255.0)? And is the input image in BGR or RGB format?

Is the tensor of the input image for mobilevitv2's pre-trained weights in the range 0 to 255, or is it normalized to between 0 and 1 (i.e., divided by 255.0)?

In other words, if I use mobilevitv2 pre-trained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1, i.e., divided by 255.0?
thanks
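For context, a sketch of the common PyTorch/torchvision convention (an assumption for illustration; the authoritative answer is in CVNets' own image transforms and configs): images are read as RGB, and ToTensor() scales them to [0, 1]:

from PIL import Image
from torchvision import transforms

# ToTensor() converts an RGB PIL image (uint8, 0-255) to a float tensor in
# [0, 1]. Whether mean/std normalization is applied on top is an assumption
# to verify against the repo's transforms.
preprocess = transforms.Compose([
    transforms.Resize(288),
    transforms.CenterCrop(256),
    transforms.ToTensor(),  # RGB, scaled to [0, 1]
])

img = Image.open("example.jpg").convert("RGB")  # hypothetical input file
x = preprocess(img).unsqueeze(0)  # shape: (1, 3, 256, 256)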

Question about the dataloader on kinetics 400

Hi, thanks for such a great job!
I downloaded Kinetics-400 from this GitHub repo:
https://github.com/cvdfoundation/kinetics-dataset
and tried to modify the parameters in this config: config/video_classification/kinetics/mobilevit_st.yaml
When I modify the roots of the training set and validation set, it doesn't seem to load the data correctly. I am using one 1080 Ti.
[screenshot of error]

Does anything else need to be done to load the dataset?
Thank you!!

Integrate MobileViTv2 with retinanet for object detection

Hi, I noticed that you integrated MobileViTv2 with SSDLite for object detection. I'm wondering whether it is also an option to integrate MobileViTv2 with RetinaNet for object detection? I ask because RetinaNet seems to achieve better performance than SSDLite. Thanks

Significance of update_scales() during training

Hi, Thank you for the great work.

I have a question about the update_scales() function in the VariableBatchSamplerDDP class. I see that by default the code does not update scales during training, since self.scale_inc=False. Since the code is there, I assume you may have tried enabling it. If so, could you provide the ablation results you got with and without update_scales() during training? Thank you

Confusion matrix is None. Check code Error

Sorry to bother you, but I am hitting a (Confusion matrix is None. Check code) error.

When I use the following command on the ADE20K dataset (it works fine on PascalVOC 2012):
export CFG_FILE="PATH_TO_CONFIG_FILE"
export DEEPLABV3_MODEL_WEIGHTS="PATH_TO_MODEL_WEIGHTS"
CUDA_VISIBLE_DEVICES=0 cvnets-eval-seg --common.config-file $CFG_FILE --common.results-loc seg_results --model.segmentation.pretrained $DEEPLABV3_MODEL_WEIGHTS --evaluation.segmentation.resize-input-images --evaluation.segmentation.mode validation_set

It outputs:
Confusion matrix is None. Check code
.........
acc_global, acc, iu, = conf_mat.compute()
TypeError: cannot unpack non-iterable NoneType object

Maybe the error is caused by the ConfusionMatrix() class returning None; I don't know how to figure it out.
By the way, evaluation on a single image works properly.

#Operations for Self-Attention Layer

Hi @sacmehta, Thanks for the great work

# number of operations in QK^T
m_qk = (seq_len * in_channels * in_channels) * b_sz

As per the code above, the MAdds for QK^T are calculated as L*C*C, where L and C are the sequence length and number of channels respectively. But the QK^T product actually computes an NxN Gram matrix: we need to compute NxN elements, and each element requires C operations. So shouldn't the MAdds be NxNxC instead?

Thanks, please correct me if I am wrong.
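To make the two counts concrete, here is a small sketch (hypothetical shapes, not repo code) that forms QK^T and compares the N*N*C count argued for above with the L*C*C count in the quoted snippet:

import torch

b_sz, seq_len, in_channels = 1, 256, 96  # hypothetical shapes

q = torch.randn(b_sz, seq_len, in_channels)
k = torch.randn(b_sz, seq_len, in_channels)

# QK^T yields a (seq_len x seq_len) Gram matrix; each entry is a dot
# product over in_channels elements, hence N * N * C multiply-adds.
attn = torch.bmm(q, k.transpose(1, 2))
assert attn.shape == (b_sz, seq_len, seq_len)

madds_gram = b_sz * seq_len * seq_len * in_channels         # N * N * C
madds_snippet = b_sz * seq_len * in_channels * in_channels  # L * C * C
print(madds_gram, madds_snippet)  # equal only when seq_len == in_channels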

The accuracy of the two figures does not match

Hi, thanks for such a great job, but I have a question: the accuracies of the standard training method for MobileViT-S in (b) and (c) of Figure 9 in the paper seem to differ. The top-1 accuracy in (b) is about 77%, while in (c) it is around 78%.
[screenshot of Figure 9]

Can anyone answer my doubts?
thanks!

FLOPs calculation different from paper?

Hi, I trained your code using the MobileViT-XS model.
The FLOPs don't seem to match the paper.
For example, the paper reports 0.7G FLOPs, but with your code the FLOPs are 0.9G.
Specifically, flops = 986.269M.
Thank you.

Checkpoint cannot be loaded

Greetings!
I have problems loading pretrained weights for the detection task.
I took the weights and config from the Model Zoo. The chosen detection model is SSD MobileViTv2-0.75.

I tried to build the model with pretrained weights using the code below:

import sys

from cvnets import get_model
from options.opts import get_training_arguments
from options.utils import load_config_file

sys.argv = ['']
opts = get_training_arguments()
setattr(opts, 'common.config_file', <path-to-config-file>)
setattr(opts, 'model.detection.pretrained', <path-to-checkpoint>)

opts = load_config_file(opts)
model = get_model(opts)

However, an error occurs:

Unable to load pretrained weights from /content/ml-cvnets/coco-ssd-mobilevitv2-0.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 512, 1, 1]).
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([504]).
	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 256, 1, 1]). ...

As I understand it, there is a mismatch between the model and the checkpoint.

Is there anything I can do to fix this issue?
Thanks
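(A hedged observation on the shapes in the error: SSD-style heads typically emit num_anchors * (4 + num_classes) channels per location. With 6 anchors, 510 = 6 * (4 + 81) while 504 = 6 * (4 + 80), so the checkpoint and the instantiated model appear to disagree by exactly one class, e.g. whether a background class is counted, rather than by layer names. This is an inference from the error message, not a confirmed diagnosis.)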

adam optim ERROR: If capturable=False, state_steps should not be CUDA tensors.

Hi, congratulations on your excellent work!
I would really appreciate it if you could help me with this.
So I run

PYTHONWARNINGS="ignore" cvnets-train --common.config-file config/classification/imagenet/mobilevit_v2.yaml --common.results-loc mobilevitv2_results/width_1_0_0 --common.override-kwargs scheduler.cosine.max_lr=0.0075 scheduler.cosine.min_lr=0.00075 optim.weight_decay=0.013 model.classification.mitv2.width_multiplier=1.00 --common.tensorboard-logging --common.accum-freq 4 --common.auto-resume 

and trigger the auto-resume mode to continue my last training, and this error occurs:

2022-07-03 06:06:18 - LOGS    - Exception occurred that interrupted the training. If capturable=False, state_steps should not be CUDA tensors.
If capturable=False, state_steps should not be CUDA tensors.

Traceback (most recent call last):
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 682, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 353, in train_epoch
    self.gradient_scalar.step(optimizer=self.optimizer)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 285, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 161, in step
    adamw(params_with_grad,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 218, in adamw
    func(params,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 259, in _single_tensor_adamw
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."

And I am 100% sure that cuDNN is enabled and all GPUs are available; nothing went wrong when I first trained this.

And here's another problem: do you have any clue why the training process is so slow?
Thanks so much!

About image sizes at test time for the segmentation task

I am confused: what is the image size at test time for the segmentation task?
512x512, or the same aspect ratio as the original image?
And how is it handled if the feature map cannot be divided by the 2x2 patch size?
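On the divisibility question, a common workaround (to the best of my knowledge, the one MobileViT-style blocks use, though the sketch below is generic rather than the repo's exact code) is to bilinearly resize the feature map to the nearest multiple of the patch size before unfolding:

import math
import torch
import torch.nn.functional as F

def resize_to_patch_multiple(x, patch_h=2, patch_w=2):
    """Resize a feature map so its H and W are multiples of the patch size."""
    b, c, h, w = x.shape
    new_h = int(math.ceil(h / patch_h) * patch_h)
    new_w = int(math.ceil(w / patch_w) * patch_w)
    if (new_h, new_w) != (h, w):
        x = F.interpolate(x, size=(new_h, new_w), mode="bilinear", align_corners=False)
    return x

x = torch.randn(1, 96, 33, 47)            # spatial dims not divisible by 2
print(resize_to_patch_multiple(x).shape)  # torch.Size([1, 96, 34, 48])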

How to build the MobileViT model?

I am following the steps given in the README.md but am unable to build the model.

If I run the script mobilevit.py directly, it raises a relative import error:

ModuleNotFoundError: No module named 'utils'

Any solution? @sacmehta

ERROR - Nan encountered in the loss.

Hi, there are some problems when I try to train on my dataset: the model's output predictions become NaN. Do you know how to solve this kind of problem?

Some questions about your common __init__ file

I was wondering:
what is the difference between DEFAULT_ITERATIONS and DEFAULT_MAX_ITERATIONS,
and between DEFAULT_EPOCHS and DEFAULT_MAX_EPOCHS?

e.g.
DEFAULT_ITERATIONS = 300000
DEFAULT_EPOCHS = 300
DEFAULT_MAX_ITERATIONS = DEFAULT_MAX_EPOCHS = 10000000

thanks!

Q: Video MobileViTv2 Kinetics-400

Hi there, thank you for sharing your work! Very clean code, and easy to understand :]

Have you trained a video model based on MobileVitV2 on Kinetics-400 yet?
If so, are you going to share the weights?

Thank you.

Model stability problem

Hi!
Thank you for the great work. I used MobileViT blocks in my model for a low-level task. At first it performed well, but I get different performance when I run it again. My model is stable if I remove the MobileViT blocks. Do you know what could make the model unstable? I use the following basic parameters:
max_lr: 1e-4
min_lr: 1e-6
optim:
  name: adamw
scheduler:
  name: "cosine"
in_channels: 96
transformer_dim: 144
ffn_dim: 288
n_transformer_blocks: 2

Run mobilevit-v2 failed with ANE (iPhone 13 Pro and iOS 15.5)

Hi, with an iPhone 13 Pro on iOS 15.5, I got the error below when running mobilevit-v2. Do you have any experience dealing with it?

2022-06-27 15:12:57.196279+0800 CoreMLPerformance[10320:18583922] [espresso] [Espresso::handle_ex_plan] exception=ANECF error: ANECCompile(/var/mobile/Library/Caches/com.apple.aned/tmp/com.example.CoreMLPerformance/824F1B79C0CBD30CBE0EFD39FDD54AB15B3E93F859264B1FE69FDB3F02C8D241/C5992495EF614BA515A970B3EBE95F193A7C62E4F7478D853BED6F250D40E299/) FAILED: err=(
CompilationFailure
)
2022-06-27 15:12:57.196809+0800 CoreMLPerformance[10320:18583922] [coreml] Error plan build: -1.

The mobilevit-v2 model was produced by the following example command:
export CONFIG_FILE="https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/classification/mobilevitv2/imagenet1k/256x256/mobilevitv2-1.0.yaml"
export MODEL_WEIGHTS="https://docs-assets.developer.apple.com/ml-research/models/cvnets-v2/classification/mobilevitv2/imagenet1k/256x256/mobilevitv2-1.0.pt"
cvnets-convert --common.config-file $CONFIG_FILE --common.results-loc coreml_models_cls --model.classification.pretrained $MODEL_WEIGHTS --conversion.coreml-extn mlmodel

Recommendations for configuring heads/training on custom datasets?

Thanks for developing MobileVit!
I'm wondering if there are any specific tips/examples for fine-tuning the pre-trained MobileViT classification and detection models on custom datasets?
I see the n_classes reference in both the classifier (1000) and detection (80) configs, but can you provide a quick example of modifying them for custom datasets, and do you have a recommended LR for fine-tuning?
Thanks very much!

Effective batch size

In the README doc:

for ImageNet-1k, the effective batch size is 1k?
for ImageNet-21k, the effective batch size is 4k?

Why?
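For reference, "effective batch size" in distributed training conventionally means per-GPU batch size * number of GPUs * gradient-accumulation steps. The GPU counts behind the 1k/4k figures are not stated here, so the values below are illustrative assumptions only:

per_gpu_batch = 128    # assumed
num_gpus = 8           # assumed
accum_freq = 1         # CVNets exposes this as --common.accum-freq
effective_batch = per_gpu_batch * num_gpus * accum_freq
print(effective_batch)  # 1024, i.e. the "1k" scale mentioned above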

MobileViT (version 1) pre-training dataset

Hello! thanks for sharing the great work!

I have a small question about the pre-training dataset.
Were all MobileViT (version 1) models pre-trained on ImageNet-1k?
I guess they were, but I want to make sure.
Thanks!


Question about video_reader

Thanks for such a great job!
I want to ask some questions about the custom dataset and the parameters in this config: config/video_classification/kinetics/mobilevit_st.yaml
[screenshot of config]
In this file, clips_per_video and num_frames_per_clip are both 8.
Assume there are 2 videos, with 8 and 64 frames; will the dataset size be (1+8)?

Another question:
If a video is longer than 64 frames, do I have to separate it into shorter segments, or can I just modify the parameter ''clips_per_video''?

Thank you for your help!!!

Exception caught while training ResNet?

I'm trying to train the ResNet classification model on my machine using 2 TITAN RTX GPUs. I leave all configuration at the default values in the config file except the dataset path and the log_freq. After training for several iterations or several epochs, an exception was thrown, as below:
2022-02-07 14:21:38 - LOGS - Epoch: 5 [ 16462/10000000], loss: 1.0149, top1: 100.0000, top5: 100.0000, LR: [0.398905, 0.398905], Avg. batch load time: 2.070, Elapsed time: 3878.27
2022-02-07 14:24:18 - LOGS - Epoch: 5 [ 16512/10000000], loss: 1.0149, top1: 100.0000, top5: 100.0000, LR: [0.398905, 0.398905], Avg. batch load time: 2.054, Elapsed time: 4038.14
2022-02-07 14:26:52 - LOGS - Epoch: 5 [ 16562/10000000], loss: 1.0332, top1: 99.9978, top5: 99.9978, LR: [0.398905, 0.398905], Avg. batch load time: 2.071, Elapsed time: 4191.72
--Call--

/home/guan/miniconda3/envs/deit/lib/python3.6/site-packages/torch/autocast_mode.py(179)__exit__()
-> def __exit__(self, *args):
(Pdb)
2022-02-07 14:27:03 - LOGS - Exception occurred that interrupted the training.

2022-02-07 14:27:04 - LOGS - Training took 01:10:05.55

This problem occurred repeatedly when I resumed training many times. Strangely, the problem did not occur while training MobileViT. Could you tell me how to solve this problem?

Segmentation accuracies are inconsistent

In the published paper, for segmentation with DeepLabv3, the accuracies of MobileViT-XXS, MobileViT-XS and MobileViT-S are 73.6, 77.1 and 79.1 respectively. However, in readme-mobilevit.md, the authors state that the segmentation accuracies of the three mentioned models are 72.8, 77.1 and 79.3. For MobileViT-XXS, the mIOU changed from 73.6 to 72.8. What causes such a big drop? Is the training process very unstable?

Segmentation Head for Custom Dataset is not automatically connected!

I like your library. Thank you for it.

I tried to train on my custom dataset. I created a CustomDataset class; my dataset has 10 classes. I used pretrained mobilenetv2 weights. When I run it, I get: size mismatch for seg_head.classifier.block.conv.bias: copying a param with shape torch.Size([150]) from checkpoint, the shape in current model is torch.Size([10])

This is because ADE20K has 150 classes and mine has 10. I expected the segmentation head to be adapted automatically; most libraries do this. How can I fix this? Are you going to post a command for fine-tuning?
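Until such a command exists, a generic PyTorch workaround (not a CVNets API; the helper name is hypothetical) is to load the checkpoint but skip parameters whose shapes no longer match, so the 150-class ADE20K head is simply left at its fresh 10-class initialization:

import torch

def load_matching_weights(model, ckpt_path):
    """Copy only the checkpoint tensors whose names and shapes match the model."""
    state = torch.load(ckpt_path, map_location="cpu")
    model_state = model.state_dict()
    filtered = {
        k: v for k, v in state.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    model_state.update(filtered)
    model.load_state_dict(model_state)
    return model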

Does the batch sampler make the model learn the same weights in every training repetition?

Hi! Thanks for sharing great work!

I have a question about the sampler

I am working on some training examples with the variable_batch_sampler and the batch_sampler.
I'm trying to get the average ACC over 5 training runs (repeating training 5 times with the same settings).
The best validation ACCs may be similar (but not the same) across the repeated runs with both samplers.
But when I use the batch sampler, all the best val ACCs of the repeated models are the same. Is that right?

I'm working with this yaml file

0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.docx

with this shell script command

for iter in '1' '2' '3' '4' '5'
do


    CUDA_VISIBLE_DEVICES=3 cvnets-train --common.config-file ./config/classification/CBIS-DDSM_2c_womulti/0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.yaml --common.results-loc ./results/2class_iters_wo_multi/iter$iter --model.classification.finetune-pretrained-model --model.classification.n-pretrained-classes 1000 --model.classification.pretrained ./weights/mobilevit_s.pt
done

When I repeated training 5 times, the best val ACC (the same value, 72.5467) appeared at epoch 268 each time.
comparing_iterations.xlsx

Is this result right?

I also modified some code in order to track the training information.

The modified code is here:

code.zip

AMP settings

When I was training a MobileNetV3 model with mixed_precision = true, the program raised an error like this:

2022-08-16 03:13:22 - DEBUG - Training epoch 0 with 66072 samples
2022-08-16 03:14:03 - LOGS - Epoch: 0 [ 1/10000000], loss: 5.1851, LR: [0.1, 0.1], Avg. batch load time: 38.484, Elapsed time: 40.62
2022-08-16 03:14:06 - LOGS - Exception occurred that interrupted the training. CUDA error: an illegal memory access was encountered

Do you have any suggestions?

What is the difference between variable_batch_sampler and multi_scale_sampler?

Hello, Thanks for sharing your great work!

I want to train the MobileViT model with a multi-scale sampler, but I cannot find any .yaml file for it (for the classification task).
The MobileViT .yaml files in config/classification/imagenet use a variable batch sampler, and MobileViT V2 uses the same thing in training.

Is the variable batch sampler the same as the multi-scale sampler?
Which should I use: the multi-scale sampler or the variable batch sampler?

Running Training for Segmentation ScanNet

Hello @sacmehta,

Thank you for setting up this exhaustive repository!

I was looking to run the semantic segmentation training for the ScanNet dataset (for 10 selected classes). I have gone ahead and made the following changes specific to the ScanNet dataset:

  • Complete dataloader pipeline, as suggested in the README
  • Config files
  • Saved the provided MobileNetv2 & MobileViT models trained on ADE20K, and the MobileNetv2 & MobileViT models trained on ImageNet, from the README.

Query:
I want to use the pre-trained model weights (from the ADE20K dataset) to start training on the ScanNet dataset. You have pointed to this Apple resource for training the DeepLabv3 model. I have gone ahead and created a script file for it, with the following content:
export CONFIG_FILE="./config/segmentation/scannet/deeplabv3_mobilevitv2-0.5.yaml"
export IMAGENET_PRETRAINED_WTS="./model_weights/mobilevitv2-0.5.pt"
PYTHONWARNINGS="ignore" cvnets-train --common.config-file $CONFIG_FILE --common.results-loc deeplabv3_results --model.classification.pretrained $IMAGENET_PRETRAINED_WTS

  • When I run the above script on my terminal, I get the following error:
    set_env_segmentation.sh: 3: cvnets-train: not found

  • I was wondering if you could provide the sequence of commands to execute in the terminal to get training running on a GPU, using a pre-trained model trained on ImageNet and ADE20K.

Thank you again for the resource!

Nitin Bansal
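(A hedged note on the cvnets-train: not found error above: given the installation section at the top of this README, the likely cause is that the shell running the script does not have the Conda environment with the editable CVNets install active. Running conda activate cvnets and then pip install --editable . in the repo root should put cvnets-train on the PATH.)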

Training Time on ImageNet

Hi,
I wonder how long it takes you to train MobileViT-S with 8 GPUs?
I trained your MobileViT-S model with a batch size of 1024 (128*8) for 1 epoch on 8 V100 GPUs, but training is very slow.
It costs about 40 minutes/epoch; for 300 epochs, that means more than 8 days.
Is this normal?

Thank you
