
m3l's Introduction

m3l's People

Contributors

helioszhao

m3l's Issues

Something about baseline

Hi, nice job!
I am curious about how to implement the baseline.
for dataset in datasets:
    for index in range(iters):
        loss = cls + tri + cent

Is the baseline written like this?
Regarding line 2 of your Table 4 (screenshot omitted): did you use a unified classification loss over the label space of all the datasets, together with the triplet and center losses?
Looking forward to your reply.
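The joint-training baseline the question describes can be sketched as below. This is a minimal illustration of the loop structure only; the loss functions are placeholders, not M3L's actual implementations, and the optimizer step is omitted.

```python
# Hypothetical sketch of the joint-training baseline asked about above:
# one pass per source dataset, summing classification + triplet + center
# losses each iteration. All loss functions are stand-ins.

def cls_loss(batch): return 1.0   # placeholder: cross-entropy over the unified label space
def tri_loss(batch): return 0.5   # placeholder: triplet loss on features
def cent_loss(batch): return 0.1  # placeholder: center loss on features

def train_baseline(datasets, iters):
    history = []
    for dataset in datasets:          # iterate over source domains
        for _ in range(iters):        # fixed number of iterations per domain
            batch = dataset           # stand-in for sampling a mini-batch
            loss = cls_loss(batch) + tri_loss(batch) + cent_loss(batch)
            history.append(loss)      # an optimizer step would go here
    return history

losses = train_baseline(["market", "duke"], iters=3)
```

Whether the real baseline interleaves domains per iteration or sweeps them sequentially as above is exactly what the question asks the authors to confirm.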

Question after my run

I ran your project with three GPUs, and I did not change the code except for replacing msmt17v1 with msmt17v2, but I hit this error:

File "D:\papercode\M3Lmaster\reid\trainers.py", line 64, in train
f_out, tri_features = self.model(inputs, MTE='', save_index=save_index)
ValueError: too many values to unpack (expected 2)

Is there something wrong with my dataset? Have you encountered it before?
Thank you for reading!
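The `too many values to unpack (expected 2)` above means the forward call returned more than two values. A hedged way to diagnose this, assuming nothing about M3L's actual return signature, is to capture the whole return value first and check its arity:

```python
# Defensive unpacking when a model's forward may return a variable-length
# tuple. `fake_forward` stands in for self.model(inputs, ...); its
# three-element return is an assumption for illustration only.

def fake_forward(inputs):
    return ("f_out", "tri_features", "extra")  # hypothetical third value

out = fake_forward(None)
if isinstance(out, tuple) and len(out) >= 2:
    f_out, tri_features = out[0], out[1]  # take the first two explicitly
else:
    raise ValueError(f"unexpected return arity from forward: {out!r}")
```

Printing `len(out)` at the failing line would show whether the msmt17v2 swap (or the multi-GPU setup) changed what the model returns.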

A few questions

Hello, I am very interested in your work. I ran your code and have a few questions:
1. I see that your code implements some network layers itself, and these differ from the standard torch.nn layers in that their parameters can be plain tensors instead of nn.Parameter, which makes meta-learning possible. But I found that training speed varies greatly with and without meta-learning; that is, training with meta-learning becomes very slow. Do the network layers have to be written with buffers? Have you tried directly taking the gradient of the meta-test loss and writing an optimizer that updates the model parameters directly from the meta-test gradient together with grad_info (the meta-train gradient)?
2. When I run your code, there is a feature-fusion step in the meta-test phase, where the fused feature is sampled from a Normal distribution. I found that this sometimes raises the error "the parameter scale has invalid values", and the larger the learning rate, the more likely the error occurs. Have you encountered this?
3. Because of the error in 2, I deleted the feature-fusion code, and found that the results came out much higher than in your article: for example, MS+C+D → M reached mAP = 52.1%. At the same time, I found that without meta-learning I could reach mAP = 52.0%.

Thanks for reading!
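The combined update that question 1 asks about (applying the meta-train gradient `grad_info` and the meta-test gradient in a single optimizer step) can be sketched on a scalar toy problem. This is a first-order illustration only, with placeholder losses and a finite-difference gradient; it is not M3L's actual optimizer.

```python
# Toy scalar sketch of the combined update asked about above:
# theta <- theta - lr * (grad(mtr_loss) + grad(mte_loss at adapted theta)).
# Losses and the finite-difference gradient are placeholders, not M3L code.

def mtr_loss(theta): return (theta - 2.0) ** 2   # placeholder meta-train objective
def mte_loss(theta): return (theta + 1.0) ** 2   # placeholder meta-test objective

def grad(f, theta, eps=1e-6):
    # central finite difference, standing in for autograd
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

def meta_step(theta, inner_lr=0.1, outer_lr=0.1):
    g_mtr = grad(mtr_loss, theta)              # the "grad_info" gradient
    theta_adapted = theta - inner_lr * g_mtr   # inner (meta-train) step
    g_mte = grad(mte_loss, theta_adapted)      # meta-test gradient at adapted params
    return theta - outer_lr * (g_mtr + g_mte)  # single combined update

theta = meta_step(0.0)
```

A first-order update like this avoids differentiating through the inner step, which is one reason the tensor-instead-of-Parameter layers (needed for the full second-order path) can be so much slower.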

Getting ValueError randomly during training

Thank you for making the code available.
I was trying to run the repo as-is; I only changed the batch size from 64 to 32 due to memory constraints.
I am running the code on two Nvidia 1080Ti GPUs, each with 12 GB of memory.

However, after a few epochs I randomly get a ValueError:
ValueError: Expected parameter scale (Tensor of shape (2048,)) of distribution Normal(loc: torch.Size([2048]), scale: torch.Size([2048])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([1.2194e-04, 1.5050e-04, 2.8594e-03, ..., 3.8839e-05, 1.8705e-05,
1.1311e-05], device='cuda:0')

It appears randomly after about 10 epochs. The full stack trace is below.
Kindly help me in this regard to run your code.

Epoch: [25][160/200] Time 2.152 (2.173) Total loss 6.960 (7.223) Loss 3.233(3.638) LossMeta 3.728(3.585)
Epoch: [25][165/200] Time 2.192 (2.173) Total loss 7.966 (7.204) Loss 4.791(3.644) LossMeta 3.174(3.560)
Traceback (most recent call last):
File "main.py", line 286, in
main()
File "main.py", line 108, in main
main_worker(args)
File "main.py", line 202, in main_worker
print_freq=args.print_freq, train_iters=args.iters)
File "/home/sarosij/M3L/reid/trainers.py", line 89, in train
f_test, mte_tri = self.newMeta(testInputs, MTE=self.args.BNtype)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/M3L/reid/models/resMeta.py", line 180, in forward
bn_x = self.feat_bn(x, MTE, save_index)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sarosij/M3L/reid/models/MetaModules.py", line 362, in forward
Distri1 = Normal(self.meta_mean1, self.meta_var1)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/distributions/normal.py", line 50, in __init__
super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "/home/sarosij/anaconda3/envs/reid/lib/python3.6/site-packages/torch/distributions/distribution.py", line 56, in __init__
f"Expected parameter {param} "
ValueError: Expected parameter scale (Tensor of shape (2048,)) of distribution Normal(loc: torch.Size([2048]), scale: torch.Size([2048])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([1.2194e-04, 1.5050e-04, 2.8594e-03, ..., 3.8839e-05, 1.8705e-05,
1.1311e-05], device='cuda:0')
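The constraint failure above fires when the scale vector contains a zero or negative entry (the printed values are tiny, so some hidden entries must be non-positive). A common guard, shown here as an assumption rather than M3L's actual fix, is to clamp the scale to a small positive floor before constructing the Normal; in the PyTorch code this would correspond to something like `self.meta_var1.clamp(min=eps)`.

```python
# Clamp a standard-deviation vector to a positive floor so the
# GreaterThan(lower_bound=0.0) constraint of Normal cannot fail.
# Plain-Python stand-in for a tensor clamp; the fix location is hypothetical.

EPS = 1e-6

def safe_scale(scale):
    return [max(s, EPS) for s in scale]

scale = [1.2194e-04, 0.0, -3.8839e-05]  # values like those in the error above
clamped = safe_scale(scale)
assert all(s > 0 for s in clamped)
```

This matches the earlier issue's observation that larger learning rates make the error more likely: bigger updates push the estimated variance below zero more often.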
