jiangtaoxie / fast-mpn-cov

@CVPR2018: Efficient unrolling iterative matrix square-root normalized ConvNets, implemented in PyTorch (with code for B-CNN, compact bilinear pooling, etc.) for training from scratch and finetuning.

Home Page: http://peihuali.org/iSQRT-COV/index.html

License: MIT License

Python 85.92% Shell 14.08%

fast-mpn-cov's Issues

Should we use SVMs for FGVC?

Hi, I am reading your iSQRT paper and I think it is quite interesting. However, I am confused about the usage of SVMs.
You wrote "After finetuning, the outputs of iSQRT-COV layer are ℓ2−normalized before inputted to train k one-vs-all linear SVMs with hyperparameter C = 1" in your paper, but I couldn't find this step in your released code.

Loss and accuracy don't change

Hi, thank you for a great paper and code.

I tried to run the code on the Cars196 dataset, using MPNCOV and mpncovresnet50 in a Jupyter Notebook. At first I got an error in loss.backward(). After adjusting the code with detach(), I could run it.

However, the loss and accuracy don't change from epoch to epoch; they just oscillate around the same value. The best top-1 accuracy is 0.006 and the loss stays at 5.28 for all epochs.

Could you help me to make the code run properly?

Thank you in advance.
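
A quick sanity check on the numbers reported above: for a 196-class problem, a classifier that outputs a uniform distribution has cross-entropy ln(196) and top-1 accuracy 1/196. The sketch below only evaluates those two values. A match suggests the network is sitting at chance level, which would be consistent with (though not proof of) the detach() call cutting the gradient path to the weights. The class count of 196 is taken from the dataset name.

```python
import math

num_classes = 196  # Cars196, as named in the issue

# Cross-entropy of a uniform prediction over all classes.
chance_loss = math.log(num_classes)
# Expected top-1 accuracy when guessing uniformly at random.
chance_top1 = 1.0 / num_classes

print(f"chance-level loss:  {chance_loss:.2f}")   # 5.28
print(f"chance-level top-1: {chance_top1:.4f}")   # 0.0051
```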

Why is your manual implementation via autograd.Function even faster than PyTorch's autograd engine?

To make my code clean and easy to read, I tried to reimplement covpool, sqrtm and triuvec with native PyTorch operators as simple plain Python functions, as shown in #7.

After ensuring that the forward and backward results are equivalent between my auto-backward version (with the autograd engine) and your manual-backward version (with autograd.Function), I tested their speed and was surprised to find my auto-backward version slower.

Have you compared these two approaches before? Do you have any idea why the manual backward implementation is even faster than PyTorch's autograd engine?

The loss tends to be NaN

When I used mpncovresnet50 and MPNCOV, training converged.
But if I change the backbone to resnet* or VGG* while keeping MPNCOV unchanged, the training loss is NaN.
Besides, my dataset is for a mini-FGVC task. It contains 9 classes and is extremely imbalanced. When I fine-tune
in two stages, the test accuracy is about 78, which is lower than plain VGG. Could you give me some advice?
Thank you for your amazing work.

gradcheck for Sqrtm in MPNCOV.py

Hello,
I ran autograd.gradcheck on Sqrtm in MPNCOV.py and it returns False, but if I delete the line 'der_NSiter = der_NSiter.transpose(1, 2)', it returns True.

Can anybody explain this?

Thanks so much.

Model parameters

I have noticed a problem in finetune.sh.
If I don't change the settings, the model used is this:
(features): WITH ALL LAYER
(classifier): Linear(in_features=32896, out_features=70, bias=True)
(representation): MPNCOV()

I don't understand why the representation layer is listed after the classifier.
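
As a side note on the printout above, in_features=32896 equals d(d+1)/2 for d = 256, which is the number of upper-triangular entries (diagonal included) of a symmetric 256 x 256 matrix. The sketch below checks that arithmetic in both directions. The helper names are hypothetical, and the covariance dimension of 256 is an assumption inferred from the number itself, not read from the released code.

```python
import math

def triu_size(d):
    """Number of upper-triangular entries (with diagonal) of a d x d matrix."""
    return d * (d + 1) // 2

def channels_from_triu_size(n):
    """Invert n = d(d+1)/2 for d; raise if n is not a triangular number."""
    root = math.isqrt(8 * n + 1)
    if root * root != 8 * n + 1:
        raise ValueError(f"{n} is not of the form d(d+1)/2")
    return (root - 1) // 2

print(triu_size(256))                  # 32896
print(channels_from_triu_size(32896))  # 256
```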

RuntimeError: Error(s) in loading state_dict for DataParallel:

When I fine-tuned mpncovresnet50 in the second stage, the following error occurred.

loading checkpoint 'Finetune-c9-mpncovresnet50-MPNCOV-reproduce-lr0.001-bs40/checkpoint.pth.tar'
Traceback (most recent call last):
  File "main.py", line 436, in <module>
    main()
  File "main.py", line 179, in main
    model.load_state_dict(checkpoint['state_dict'])
  File "/home/wen/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Missing key(s) in state_dict: "module.representation.conv_dr_block.0.weight", "module.representation.conv_dr_block.1.weight", "module.representation.conv_dr_block.1.bias", "module.representation.conv_dr_block.1.running_mean", "module.representation.conv_dr_block.1.running_var".
    Unexpected key(s) in state_dict: "module.features.8.weight", "module.features.9.weight", "module.features.9.bias", "module.features.9.running_mean", "module.features.9.running_var", "module.features.9.num_batches_tracked".

two_stage_finetune.txt
This is my config.
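
To make a mismatch like the one in the error message easier to read, the two key sets can be compared directly before calling load_state_dict. The sketch below is a hypothetical helper working on plain lists of key names (copied from the traceback), not part of the released code. One plausible reading of its output is that the checkpoint stores the dimension-reduction layers under features.8/features.9 while the current model expects them under representation.conv_dr_block, but that is an assumption.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring load_state_dict's report."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model wants, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model lacks
    return missing, unexpected

# Key names taken from the error message; "shared" stands in for all matching keys.
model_keys = ["shared", "module.representation.conv_dr_block.0.weight"]
checkpoint_keys = ["shared", "module.features.8.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, the inputs would be model.state_dict().keys() and checkpoint['state_dict'].keys().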

What is the meaning of Implementations?

Hello, I don't understand the three points listed under Implementations.

1. Is the experiment implemented in three separate ways: in PyTorch, in TensorFlow, and in MatConvNet?
2. Or is PyTorch used first, then TensorFlow, and finally MatConvNet?
3. Or is MatConvNet used inside the PyTorch environment (or inside the TensorFlow environment)?
My question may be naive. I would be very grateful to hear from you.

When will you update the fine-grained classification results of resnet101?

When I reproduced the resnet101 experiments on CUB, I only got the same result as resnet50, which is 88.1. When will you update the results? The entry has always been marked TODO.

Downloading problem?

When I ran finetune.sh, it began downloading from jtxie.com, but the download was too slow, so I decided to use BaiduYun instead. However, I got the following problem:

Start finetuning!
Namespace(arch='mpncovresnet101', batch_size=10, benchmark='CUB', classifier_factor=5, data='/path/to/the/data/CUB', dist_backend='gloo', dist_url='tcp://224.66.41.62:23456', epochs=100, evaluate=False, freezed_layer=0, gpu=1, lr=0.0012, lr_method='step', lr_params=[[100.0]], modeldir='Results/Finetune-CUB-mpncovresnet101-MPNCOV-reproduce-lr1.2e-3-bs10', momentum=0.9, num_classes=200, pretrained=True, print_freq=100, representation='MPNCOV', resume='Results/Finetune-CUB-mpncovresnet101-MPNCOV-reproduce-lr1.2e-3-bs10/mpncovresnet101-ade9737a.pth.tar', seed=None, start_epoch=0, store_model_everyepoch=False, weight_decay=0.0001, workers=8, world_size=1)
main.py:127: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism.
  warnings.warn('You have chosen a specific GPU. This will completely '
=> loading checkpoint 'Results/Finetune-CUB-mpncovresnet101-MPNCOV-reproduce-lr1.2e-3-bs10/mpncovresnet101-ade9737a.pth.tar'
Traceback (most recent call last):
  File "main.py", line 503, in <module>
    main()
  File "main.py", line 206, in main
    checkpoint = torch.load(args.resume)
  File "/home/yzzc/.local/lib/python3.5/site-packages/torch/serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "/home/yzzc/.local/lib/python3.5/site-packages/torch/serialization.py", line 527, in _load
    return legacy_load(f)
  File "/home/yzzc/.local/lib/python3.5/site-packages/torch/serialization.py", line 441, in legacy_load
    tar.extract('storages', path=tmpdir)
  File "/usr/lib/python3.5/tarfile.py", line 2027, in extract
    tarinfo = self.getmember(member)
  File "/usr/lib/python3.5/tarfile.py", line 1738, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"
How can I solve this problem?
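
One thing worth ruling out with an error like this is a truncated or altered download. The sketch below assumes, without confirmation from the repository, that the checkpoint file name follows the torchvision-style convention in which the suffix after the last hyphen (here ade9737a) is the leading hex digits of the file's SHA-256 digest. All function names are hypothetical. If the convention does not apply to these files, a mismatch proves nothing.

```python
import hashlib
import os
import re

def filename_hash_prefix(path):
    """Extract a hex suffix such as 'ade9737a' from 'name-ade9737a.pth.tar'."""
    match = re.search(r"-([0-9a-f]{8,})\.", os.path.basename(path))
    return match.group(1) if match else None

def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def download_looks_intact(path):
    """True/False if the name carries a hash prefix, None if it does not."""
    prefix = filename_hash_prefix(path)
    if prefix is None:
        return None
    return sha256_of(path).startswith(prefix)
```

Calling download_looks_intact on the file passed to --resume would return False if the bytes on disk do not match the prefix in the name.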

About the post-compensation implementation using trace

Hi, I am trying to use the trace to implement pre-normalization and post-compensation, but I find that the trace of the covariance matrix can be negative, which makes the sqrt operation fail.

Can we guarantee that the trace of the covariance matrix is positive, or is there some operation that ensures it is positive?
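
For a true sample covariance, the trace equals the sum of the per-channel variances, so it cannot be negative. A negative value would suggest that the matrix being normalized is not actually a covariance, or that it has suffered numerical error. The pure-Python sketch below illustrates this with hypothetical helper names, assuming 1/N normalization and features laid out as d channels of N responses each. It is not taken from the released code.

```python
import math

def covariance_trace(features):
    """Trace of the sample covariance; features is a list of d channels,
    each a list of N responses. Equals the sum of per-channel variances."""
    trace = 0.0
    for channel in features:
        n = len(channel)
        mean = sum(channel) / n
        trace += sum((v - mean) ** 2 for v in channel) / n
    return trace

def sqrt_trace(features, eps=1e-12):
    """Square root of the trace; the clamp guards against round-off and
    eps keeps the result non-zero when every channel is constant."""
    return math.sqrt(max(covariance_trace(features), 0.0) + eps)

features = [[1.0, 2.0, 3.0, 4.0],       # variance 1.25
            [-2.0, -2.0, -2.0, -2.0]]   # constant channel, variance 0
print(covariance_trace(features))  # 1.25
print(sqrt_trace(features))
```

Note that negative feature values, as in the second channel, do not make the trace negative; only the spread around the mean matters.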

Model parameter download

The author has removed the model parameter download. Could you send me a copy? Many thanks. Email address: [email protected]

The result of combining mpncov and efficientnet did not meet expectations

Thanks for the great work.

I tried the pretrained mpncovresnet101 and mpncovresnet50, and the performance is impressive. But the performance is poor when I combine mpncov with efficient-net. I froze the backbone parameters and only updated the parameters of the reduce-layer and fc. I replaced layer_reduce_relu with Swish and trained on imagenet2012 for 55 epochs; the top-1 accuracy is only about 0.76.

I wonder why mpncov performs poorly on a better backbone. Any advice would be appreciated!

The performance of released model 'mpncovresnet50-15991845.pth'

Why can't I get the desired performance, i.e. error rate 21.71% when testing the released model 'mpncovresnet50-15991845.pth'?

About the implementation of MPNCOV meta layer in pytorch 0.3.1

I tried to add the mpncov layer to my network in PyTorch 0.3.1, but I got the following error:

Traceback (most recent call last):
  File "main.py", line 421, in <module>
    main()
  File "main.py", line 211, in main
    loss_temp, train_prec1_temp, train_prec5_temp = train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 269, in train
    output = model(input)
  File "/home/zhangli/anaconda3/envs/pytorch-0.3.1/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangli/anaconda3/envs/pytorch-0.3.1/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 71, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zhangli/anaconda3/envs/pytorch-0.3.1/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangli/wubanggu/darts/test_res/resnet.py", line 156, in forward
    x = self.representation(x)
  File "/home/zhangli/anaconda3/envs/pytorch-0.3.1/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangli/wubanggu/darts/test_res/MPNCOV.py", line 69, in forward
    x = self._cov_pool(x)
  File "/home/zhangli/wubanggu/darts/test_res/MPNCOV.py", line 60, in _cov_pool
    return Covpool.apply(x)
RuntimeError: save_for_backward can only save input or output tensors, but argument 1 doesn't satisfy this condition

How can I fix it? Thank you very much.

When will you release the TensorFlow implementation?

Where should I put the pre-trained model?

Hello,
I tried training the model myself, and it shows amazing performance.
So I am going to fine-tune the model.
After downloading the model from Google Drive, I assume I should put it in a specific path.
Where should I put the pre-trained model, or where should I set its path?
I couldn't find a clear hint about this.
I look forward to your reply.

Can't download the pretrained pth

Thank you so much for the excellent work! When I run finetune.sh, it tells me that http://jtxie.com/models/mpncovresnet50-15991845.pth can't be downloaded. When I type the link directly into my browser, the website doesn't load either. Could you please check the link? I would appreciate it if you could provide the pth file. That is really excellent work! Thank you so much!

About the comparison results

Could you please tell me where you got the results for the CBP-Resnet50 model? Did you run the experiments yourself? If so, could you please release your code or training files?

What are the suggested training parameters for CUB_200_2011?

I tried lr=0.1, 0.01, 0.001 with batch=16; after about 10 epochs, training runs into significant overfitting. What are the suggested training parameters for CUB_200_2011 to achieve top-1 accuracy = 0.88?
Thanks a lot.

1D version of Fast-MPN-Cov

Can you please also release a 1D version that we can use for time series rather than images?

Fatal IO error: client killed

When I run the code, the following error occurs and the program terminates: Fatal IO error: client killed. Is this a problem with my system environment? Do you know how to solve it? Thanks very much!

The error only occurs when I run the code a second time; the first run succeeds.

Fine-tune issue

When I ran finetune.sh in ./finetune/, I noticed that during finetuning the training process does not use the forward function in mpncovresnet.py but instead uses the forward function in base.py. So, if I am right, it ignores the following operations in mpncovresnet.py:
x = MPNCOV.CovpoolLayer(x)
x = MPNCOV.SqrtmLayer(x, 5)
x = MPNCOV.TriuvecLayer(x)
Why?

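Experiments on the CUB-200-2011 dataset

Hello, I read your paper and saw that you ran experiments on the CUB dataset. Could you release the code for this dataset? Thank you!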