
m3l's Introduction

m3l's People

Contributors

helioszhao

m3l's Issues

Something about baseline

Hi, nice job!
I am curious about how to implement the baseline.
for dataset in datasets:
    for index in range(iters):
        loss = cls + tri + cent

Is the baseline written like this?
Regarding line 2 of your Table 4 (screenshot omitted): did you use a unified classification loss over the label space of all the datasets, together with the triplet and center losses?
Looking forward to your reply.
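The joint-training baseline the question describes can be sketched as below. This is a minimal illustration of the loop structure only; the loss functions are placeholders, not M3L's actual implementations, and the optimizer step is omitted.

```python
# Hypothetical sketch of the joint-training baseline asked about above:
# one pass per source dataset, summing classification + triplet + center
# losses each iteration. All loss functions are stand-ins.

def cls_loss(batch): return 1.0   # placeholder: cross-entropy over the unified label space
def tri_loss(batch): return 0.5   # placeholder: triplet loss on features
def cent_loss(batch): return 0.1  # placeholder: center loss on features

def train_baseline(datasets, iters):
    history = []
    for dataset in datasets:          # iterate over source domains
        for _ in range(iters):        # fixed number of iterations per domain
            batch = dataset           # stand-in for sampling a mini-batch
            loss = cls_loss(batch) + tri_loss(batch) + cent_loss(batch)
            history.append(loss)      # an optimizer step would go here
    return history

losses = train_baseline(["market", "duke"], iters=3)
```

Whether the real baseline interleaves domains per iteration or sweeps them sequentially as above is exactly what the question asks the authors to confirm.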

Question after my run

I ran your project with three GPUs, and I did not change the code except for replacing msmt17v1 with msmt17v2, but I hit this error:

File "D:\papercode\M3Lmaster\reid\trainers.py", line 64, in train
f_out, tri_features = self.model(inputs, MTE='', save_index=save_index)
ValueError: too many values to unpack (expected 2)

Is there something wrong with my dataset? Have you encountered it before?
Thank you for reading!
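The `too many values to unpack (expected 2)` above means the forward call returned more than two values. A hedged way to diagnose this, assuming nothing about M3L's actual return signature, is to capture the whole return value first and check its arity:

```python
# Defensive unpacking when a model's forward may return a variable-length
# tuple. `fake_forward` stands in for self.model(inputs, ...); its
# three-element return is an assumption for illustration only.

def fake_forward(inputs):
    return ("f_out", "tri_features", "extra")  # hypothetical third value

out = fake_forward(None)
if isinstance(out, tuple) and len(out) >= 2:
    f_out, tri_features = out[0], out[1]  # take the first two explicitly
else:
    raise ValueError(f"unexpected return arity from forward: {out!r}")
```

Printing `len(out)` at the failing line would show whether the msmt17v2 swap (or the multi-GPU setup) changed what the model returns.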

A few questions

Hello, I am very interested in your work. I ran your code and have a few questions:
1. I see that your code implements some network layers itself, and these differ from the standard torch.nn layers in that their parameters can be plain tensors instead of nn.Parameter, which makes meta-learning possible. But I found that training speed varies greatly with and without meta-learning; that is, training with meta-learning becomes very slow. Do the network layers have to be written with buffers? Have you tried directly taking the gradient of the meta-test loss and writing an optimizer that updates the model parameters directly from the meta-test gradient together with grad_info (the meta-train gradient)?
2. When I run your code, there is a feature-fusion step in the meta-test phase, where the fused feature is sampled from a Normal distribution. I found that this sometimes raises the error "the parameter scale has invalid values", and the larger the learning rate, the more likely the error occurs. Have you encountered this?
3. Because of the error in 2, I deleted the feature-fusion code, and found that the results came out much higher than in your article: for example, MS+C+D → M reached mAP = 52.1%. At the same time, I found that without meta-learning I could reach mAP = 52.0%.

Thanks for reading!
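The combined update that question 1 asks about (applying the meta-train gradient `grad_info` and the meta-test gradient in a single optimizer step) can be sketched on a scalar toy problem. This is a first-order illustration only, with placeholder losses and a finite-difference gradient; it is not M3L's actual optimizer.

```python
# Toy scalar sketch of the combined update asked about above:
# theta <- theta - lr * (grad(mtr_loss) + grad(mte_loss at adapted theta)).
# Losses and the finite-difference gradient are placeholders, not M3L code.

def mtr_loss(theta): return (theta - 2.0) ** 2   # placeholder meta-train objective
def mte_loss(theta): return (theta + 1.0) ** 2   # placeholder meta-test objective

def grad(f, theta, eps=1e-6):
    # central finite difference, standing in for autograd
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

def meta_step(theta, inner_lr=0.1, outer_lr=0.1):
    g_mtr = grad(mtr_loss, theta)              # the "grad_info" gradient
    theta_adapted = theta - inner_lr * g_mtr   # inner (meta-train) step
    g_mte = grad(mte_loss, theta_adapted)      # meta-test gradient at adapted params
    return theta - outer_lr * (g_mtr + g_mte)  # single combined update

theta = meta_step(0.0)
```

A first-order update like this avoids differentiating through the inner step, which is one reason the tensor-instead-of-Parameter layers (needed for the full second-order path) can be so much slower.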

Getting ValueError randomly during training

Thank you for making the code available.
I was trying to run the repo as-is; I only changed the batch size from 64 to 32 due to memory constraints.
I am running the code on two Nvidia 1080Ti GPUs, each with 12 GB of memory.

However, after a few epochs I randomly get a ValueError:
ValueError: Expected parameter scale (Tensor of shape (2048,)) of distribution Normal(loc: torch.Size([2048]), scale: torch.Size([2048])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([1.2194e-04, 1.5050e-04, 2.8594e-03, ..., 3.8839e-05, 1.8705e-05,
1.1311e-05], device='cuda:0')

It appears randomly after about 10 epochs. The full stack trace is below.
Kindly help me in this regard to run your code.

Epoch: [25][160/200] Time 2.152 (2.173) Total loss 6.960 (7.223) Loss 3.233(3.638) LossMeta 3.728(3.585)
Epoch: [25][165/200] Time 2.192 (2.173) Total loss 7.966 (7.204) Loss 4.791(3.644) LossMeta 3.174(3.560)
Traceback (most recent call last):
File "main.py", line 286, in
main()
File "main.py", line 108, in main
main_worker(args)
File "main.py", line 202, in main_worker
print_freq=args.print_freq, train_iters=args.iters)
File "/home/sarosij/M3L/reid/trainers.py", line 89, in train
f_test, mte_tri = self.newMeta(testInputs, MTE=self.args.BNtype)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/M3L/reid/models/resMeta.py", line 180, in forward
bn_x = self.feat_bn(x, MTE, save_index)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/M3L/reid/models/MetaModules.py", line 362, in forward
Distri1 = Normal(self.meta_mean1, self.meta_var1)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/distributions/normal.py", line 50, in __init__
super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/distributions/distribution.py", line 56, in __init__
f"Expected parameter {param} "
ValueError: Expected parameter scale (Tensor of shape (2048,)) of distribution Normal(loc: torch.Size([2048]), scale: torch.Size([2048])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([1.2194e-04, 1.5050e-04, 2.8594e-03, ..., 3.8839e-05, 1.8705e-05,
1.1311e-05], device='cuda:0')
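The constraint failure above fires when the scale vector contains a zero or negative entry (the printed values are tiny, so some hidden entries must be non-positive). A common guard, shown here as an assumption rather than M3L's actual fix, is to clamp the scale to a small positive floor before constructing the Normal; in the PyTorch code this would correspond to something like `self.meta_var1.clamp(min=eps)`.

```python
# Clamp a standard-deviation vector to a positive floor so the
# GreaterThan(lower_bound=0.0) constraint of Normal cannot fail.
# Plain-Python stand-in for a tensor clamp; the fix location is hypothetical.

EPS = 1e-6

def safe_scale(scale):
    return [max(s, EPS) for s in scale]

scale = [1.2194e-04, 0.0, -3.8839e-05]  # values like those in the error above
clamped = safe_scale(scale)
assert all(s > 0 for s in clamped)
```

This matches the earlier issue's observation that larger learning rates make the error more likely: bigger updates push the estimated variance below zero more often.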
