open-mmlab / openunreid

PyTorch open-source toolbox for unsupervised or domain adaptive object re-ID.

License: Apache License 2.0

Python 99.12% Makefile 0.03% Shell 0.85%
unsupervised-learning unsupervised-domain-adaptation re-identification image-retrieval open-set-domain-adaptation pseudo-labeling domain-translation


OpenUnReID

Introduction

OpenUnReID is an open-source PyTorch-based codebase for both unsupervised learning (USL) and unsupervised domain adaptation (UDA) in the context of object re-ID tasks. It provides strong baselines and multiple state-of-the-art methods with highly refactored code for both pseudo-label-based and domain-translation-based frameworks. It works with Python >=3.5 and PyTorch >=1.1.

We are actively updating this repo, and more methods will be supported soon. Contributions are welcome.

Major features

  • Distributed training & testing with multiple GPUs and multiple machines.
  • High flexibility on various combinations of datasets, backbones, losses, etc.
  • GPU-based pseudo-label generation and k-reciprocal re-ranking at high speed.
  • Plug-and-play domain-specific BatchNorms for any backbone; sync BN is also supported (see the sketch after this list).
  • Mixed precision training is supported for higher efficiency.
  • A strong cluster baseline, providing high extensibility for designing new methods.
  • State-of-the-art methods and performance for both USL and UDA problems on object re-ID.
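
As a rough illustration of the domain-specific BatchNorm idea mentioned above (a minimal sketch, assuming one BN layer per domain dispatched by a domain index; this is not OpenUnReID's exact module or API):

    import torch.nn as nn

    class DSBN2d(nn.Module):
        # One BatchNorm2d per domain: every sample is normalized with the
        # running statistics of the domain it belongs to, while all other
        # backbone weights stay shared across domains.
        def __init__(self, num_features, num_domains=2):
            super().__init__()
            self.bns = nn.ModuleList(
                [nn.BatchNorm2d(num_features) for _ in range(num_domains)]
            )

        def forward(self, x, domain_idx):
            return self.bns[domain_idx](x)

Swapping such a layer in for every BN of a backbone is what makes the trick "plug-and-play": the rest of the network stays shared across domains.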

Supported methods

Please refer to MODEL_ZOO.md for trained models and download links, and please refer to LEADERBOARD.md for the leaderboard on public benchmarks.

Method           Reference            USL      UDA
UDA_TP           PR'20 (arXiv'18)     ✓        ✓
SPGAN            CVPR'18              n/a      ✓
SSG              ICCV'19              ongoing  ongoing
strong_baseline  Sec. 3.1 in ICLR'20  ✓        ✓
MMT              ICLR'20              ✓        ✓
SpCL             NeurIPS'20           ✓        ✓
SDA              arXiv'20             n/a      ongoing

Updates

[2020-08-02] Add the leaderboard on public benchmarks: LEADERBOARD.md

[2020-07-30] OpenUnReID v0.1.1 is released:

  • Support domain-translation-based frameworks, CycleGAN and SPGAN.
  • Support mixed precision training (torch.cuda.amp in PyTorch>=1.6), use it by adding TRAIN.amp True at the end of training commands.
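
For example, appending the flag to a training command of the form used throughout the issues below (the entry point and config path here are illustrative): GPUS=4 bash dist_train.sh SpCL SpCL/market1501 TRAIN.amp True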

[2020-07-01] OpenUnReID v0.1.0 is released.

Installation

Please refer to INSTALL.md for installation and dataset preparation.

Get Started

Please refer to GETTING_STARTED.md for the basic usage of OpenUnReID.

License

OpenUnReID is released under the Apache 2.0 license.

Citation

If you use this toolbox or models in your research, please consider citing:

@inproceedings{ge2020mutual,
  title={Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification},
  author={Yixiao Ge and Dapeng Chen and Hongsheng Li},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=rJlnOhVYPS}
}

@inproceedings{ge2020selfpaced,
  title={Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID},
  author={Yixiao Ge and Feng Zhu and Dapeng Chen and Rui Zhao and Hongsheng Li},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Acknowledgement

Some parts of OpenUnReID are adapted from torchreid and fastreid. We would like to thank them for their projects, which have greatly advanced research on supervised re-ID. We hope that OpenUnReID can likewise benefit the research community of unsupervised re-ID by providing strong baselines and state-of-the-art methods.

Contact

This project is developed by Yixiao Ge (@yxgeee), Tong Xiao (@Cysu), Zhiwei Zhang (@zwzhang121).


OpenUnReID's Issues

Loss becomes nan when I try to train MMT for 100 epochs

Hello:
I downloaded your code and the Market1501-UDA-MMT config.yaml from your Model Zoo.
At first I trained the model with the configuration exactly as downloaded, and I got the expected results on Market1501 after training for 50 epochs: mAP 81.0% / R-1 92.3%.
However, after increasing the total epochs from 50 to 100, all losses become nan at epoch 64.

Here's the log:
************************* Finished updating pseudo label *************************
Epoch: [64][ 0/400] Time 0.618 (0.618) Acc@1 46.88% (46.88%) cross_entropy 5.810 (5.810) soft_entropy 5.974 (5.974) softmax_triplet 0.095 (0.095) soft_softmax_triplet 0.113 (0.113)
Epoch: [64][ 10/400] Time 0.412 (0.432) Acc@1 70.31% (52.41%) cross_entropy 4.323 (5.268) soft_entropy 5.372 (6.020) softmax_triplet 0.236 (0.333) soft_softmax_triplet 0.338 (0.314)
Epoch: [64][ 20/400] Time 0.410 (0.422) Acc@1 51.56% (53.65%) cross_entropy 5.469 (5.231) soft_entropy 6.135 (5.920) softmax_triplet 0.829 (0.415) soft_softmax_triplet 0.786 (0.417)
Epoch: [64][ 30/400] Time 0.411 (0.418) Acc@1 59.38% (53.43%) cross_entropy 5.301 (5.288) soft_entropy 6.075 (5.977) softmax_triplet 0.159 (0.350) soft_softmax_triplet 0.162 (0.367)
Epoch: [64][ 40/400] Time 0.403 (0.416) Acc@1 48.44% (53.58%) cross_entropy 5.748 (5.311) soft_entropy 6.841 (6.009) softmax_triplet 0.650 (0.363) soft_softmax_triplet 0.841 (0.390)
Epoch: [64][ 50/400] Time 0.411 (0.421) Acc@1 90.62% (56.43%) cross_entropy 4.243 (5.240) soft_entropy 6.419 (6.050) softmax_triplet 0.711 (0.392) soft_softmax_triplet 0.642 (0.412)
Epoch: [64][ 60/400] Time 0.411 (0.420) Acc@1 84.38% (60.40%) cross_entropy 4.272 (5.112) soft_entropy 5.915 (6.013) softmax_triplet 0.178 (0.400) soft_softmax_triplet 0.181 (0.413)
Epoch: [64][ 70/400] Time 0.411 (0.419) Acc@1 82.81% (63.34%) cross_entropy 4.216 (5.005) soft_entropy 5.374 (5.953) softmax_triplet 0.143 (0.400) soft_softmax_triplet 0.147 (0.416)
Epoch: [64][ 80/400] Time 0.411 (0.418) Acc@1 68.75% (65.12%) cross_entropy 4.149 (4.915) soft_entropy 4.728 (5.885) softmax_triplet 0.135 (0.395) soft_softmax_triplet 0.136 (0.408)
Epoch: [64][ 90/400] Time 0.410 (0.420) Acc@1 79.69% (66.86%) cross_entropy 4.661 (4.848) soft_entropy 6.480 (5.843) softmax_triplet 0.494 (0.403) soft_softmax_triplet 0.562 (0.410)
Epoch: [64][100/400] Time 0.409 (0.420) Acc@1 89.06% (68.83%) cross_entropy 3.709 (4.748) soft_entropy 5.128 (5.814) softmax_triplet 0.023 (0.396) soft_softmax_triplet 0.024 (0.410)
Epoch: [64][110/400] Time 0.409 (0.419) Acc@1 87.50% (70.33%) cross_entropy 3.880 (4.690) soft_entropy 5.349 (5.803) softmax_triplet 0.034 (0.400) soft_softmax_triplet 0.036 (0.420)
Epoch: [64][120/400] Time 0.412 (0.418) Acc@1 89.06% (71.73%) cross_entropy 4.067 (4.626) soft_entropy 6.327 (5.783) softmax_triplet 0.033 (0.378) soft_softmax_triplet 0.067 (0.399)
Epoch: [64][130/400] Time 0.403 (0.417) Acc@1 81.25% (72.52%) cross_entropy 4.177 (4.597) soft_entropy 6.546 (5.798) softmax_triplet 0.301 (0.375) soft_softmax_triplet 0.345 (0.396)
Epoch: [64][140/400] Time 0.410 (0.419) Acc@1 98.44% (73.85%) cross_entropy 3.169 (4.514) soft_entropy 4.617 (5.756) softmax_triplet 0.005 (0.365) soft_softmax_triplet 0.006 (0.388)
Epoch: [64][150/400] Time 0.411 (0.419) Acc@1 96.88% (75.03%) cross_entropy 3.230 (4.454) soft_entropy 5.414 (5.738) softmax_triplet 0.034 (0.358) soft_softmax_triplet 0.042 (0.380)
Epoch: [64][160/400] Time 0.411 (0.418) Acc@1 79.69% (75.90%) cross_entropy 3.972 (4.407) soft_entropy 5.432 (5.724) softmax_triplet 0.157 (0.362) soft_softmax_triplet 0.159 (0.386)
Epoch: [64][170/400] Time 0.411 (0.418) Acc@1 82.81% (76.35%) cross_entropy 3.553 (4.380) soft_entropy 4.720 (5.720) softmax_triplet 0.021 (0.358) soft_softmax_triplet 0.022 (0.381)
Epoch: [64][180/400] Time 0.411 (0.419) Acc@1 89.06% (77.10%) cross_entropy 3.552 (4.347) soft_entropy 5.564 (5.717) softmax_triplet 0.395 (0.358) soft_softmax_triplet 0.557 (0.379)
Epoch: [64][190/400] Time 0.412 (0.419) Acc@1 90.62% (77.83%) cross_entropy 3.651 (4.306) soft_entropy 5.183 (5.714) softmax_triplet 0.300 (0.359) soft_softmax_triplet 0.304 (0.381)
Epoch: [64][200/400] Time 0.412 (0.418) Acc@1 90.62% (78.39%) cross_entropy 3.595 (4.274) soft_entropy 4.808 (5.704) softmax_triplet 0.449 (0.358) soft_softmax_triplet 0.450 (0.381)
Epoch: [64][210/400] Time 0.413 (0.418) Acc@1 92.19% (78.92%) cross_entropy 3.538 (4.244) soft_entropy 5.474 (5.696) softmax_triplet 0.094 (0.354) soft_softmax_triplet 0.184 (0.376)
Epoch: [64][220/400] Time 0.750 (0.419) Acc@1 87.50% (79.31%) cross_entropy 3.932 (4.221) soft_entropy 6.867 (5.692) softmax_triplet 1.262 (0.357) soft_softmax_triplet 1.503 (0.380)
Epoch: [64][230/400] Time 0.412 (0.419) Acc@1 93.75% (79.82%) cross_entropy 3.531 (4.186) soft_entropy 5.543 (5.678) softmax_triplet 0.045 (0.351) soft_softmax_triplet 0.052 (0.374)
Epoch: [64][240/400] Time 0.412 (0.419) Acc@1 95.31% (80.30%) cross_entropy 3.225 (4.156) soft_entropy 5.377 (5.669) softmax_triplet 0.010 (0.344) soft_softmax_triplet 0.106 (0.369)
Epoch: [64][250/400] Time 0.451 (0.419) Acc@1 92.19% (80.66%) cross_entropy 3.677 (4.138) soft_entropy 6.138 (5.661) softmax_triplet 0.128 (0.346) soft_softmax_triplet 0.222 (0.369)
Epoch: [64][260/400] Time 0.441 (0.421) Acc@1 87.50% (80.92%) cross_entropy 3.759 (4.119) soft_entropy 5.217 (5.653) softmax_triplet 0.451 (0.345) soft_softmax_triplet 0.512 (0.367)
Epoch: [64][270/400] Time 0.453 (0.423) Acc@1 85.94% (81.24%) cross_entropy 3.600 (4.103) soft_entropy 5.220 (5.654) softmax_triplet 0.057 (0.350) soft_softmax_triplet 0.062 (0.372)
Epoch: [64][280/400] Time 0.451 (0.424) Acc@1 100.00% (81.68%) cross_entropy 2.953 (4.078) soft_entropy 5.246 (5.647) softmax_triplet 0.054 (0.342) soft_softmax_triplet 0.060 (0.365)
Epoch: [64][290/400] Time 0.452 (0.425) Acc@1 89.06% (81.97%) cross_entropy 3.536 (4.061) soft_entropy 4.614 (5.650) softmax_triplet 0.003 (0.345) soft_softmax_triplet 0.005 (0.368)
Epoch: [64][300/400] Time 0.449 (0.426) Acc@1 93.75% (82.25%) cross_entropy 3.347 (4.047) soft_entropy 5.860 (5.653) softmax_triplet 0.228 (0.353) soft_softmax_triplet 0.345 (0.373)
Epoch: [64][310/400] Time 0.413 (0.427) Acc@1 89.06% (82.50%) cross_entropy 3.414 (4.033) soft_entropy 5.487 (5.649) softmax_triplet 0.049 (0.352) soft_softmax_triplet 0.055 (0.374)
Epoch: [64][320/400] Time 0.411 (0.427) Acc@1 98.44% (82.80%) cross_entropy 2.885 (4.014) soft_entropy 4.540 (5.649) softmax_triplet 0.002 (0.353) soft_softmax_triplet 0.003 (0.373)
Epoch: [64][330/400] Time 0.413 (0.426) Acc@1 96.88% (82.99%) cross_entropy 3.345 (4.004) soft_entropy 5.611 (5.654) softmax_triplet 0.732 (0.361) soft_softmax_triplet 0.729 (0.379)
Epoch: [64][340/400] Time 0.410 (0.426) Acc@1 89.06% (83.21%) cross_entropy 3.622 (3.990) soft_entropy 5.257 (5.652) softmax_triplet 0.116 (0.360) soft_softmax_triplet 0.207 (0.379)
Epoch: [64][350/400] Time 0.406 (0.425) Acc@1 93.75% (83.40%) cross_entropy 3.257 (3.978) soft_entropy 5.324 (5.645) softmax_triplet 0.071 (0.359) soft_softmax_triplet 0.072 (0.376)
Epoch: [64][360/400] Time 0.188 (0.422) Acc@1 45.31% (82.93%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][370/400] Time 0.182 (0.416) Acc@1 46.88% (81.94%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][380/400] Time 0.183 (0.410) Acc@1 39.06% (80.98%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][390/400] Time 0.181 (0.404) Acc@1 45.31% (80.07%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
==> Val on the no.0 model

************************* Start validating market1501 on epoch 64 *************************
Val: [ 0/18] Time 0.112 (0.112) Data 0.071 (0.071)
Val: [10/18] Time 0.030 (0.038) Data 0.000 (0.006)

Mean AP: 2.0%
CMC Scores:
top-1 0.4%
top-5 1.4%
top-10 1.4%
Validating time: 0:00:00.967822

************************* Finished validating *************************

==> Val on the no.1 model

************************* Start validating market1501 on epoch 64 *************************
Val: [ 0/18] Time 0.109 (0.109) Data 0.068 (0.068)
Val: [10/18] Time 0.031 (0.038) Data 0.000 (0.006)

Mean AP: 96.5%
CMC Scores:
top-1 98.1%
top-5 99.5%
top-10 99.9%
Validating time: 0:00:01.165367

************************* Finished validating *************************

* Finished epoch 64 mAP: 96.5% best: 96.5% *
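
A small debugging aid that may help localize such failures (a sketch under the assumption that the trainer exposes each iteration's losses as a dict of 0-dim tensors; this helper is not part of OpenUnReID): fail fast at the first non-finite loss, so the offending batch or pseudo-label update can be inspected before nan propagates through the running averages.

    import torch

    def check_finite(losses):
        # losses: dict mapping loss name -> 0-dim tensor for the current iter
        for name, value in losses.items():
            if not torch.isfinite(value):
                raise RuntimeError(f"{name} became non-finite: {value.item()}")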

K-means for SpCL

Have you tried k-means-based clustering to generate pseudo labels for SpCL? I tried it based on your open-source code, but with k-means-based pseudo labels SpCL got poor results: for example, mAP of 20.9% and 36.2%, respectively. When MMT's pseudo labels are generated by k-means, its results are comparable to those with DBSCAN. I can't understand or explain this. Could you give some suggestions? Thank you in advance.
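
For context, a minimal k-means pseudo-labeling sketch (assuming `features` is an L2-normalized (N, D) array; the function name and arguments are illustrative, not the repository's API). One relevant difference from DBSCAN: k-means assigns every sample to a cluster, so there is no outlier class for SpCL's reliability criterion to exploit, which may be one factor in the gap described above.

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_pseudo_labels(features, cluster_num):
        # Every sample receives a label; unlike DBSCAN there is no
        # outlier class (-1), so noisy samples still join some cluster.
        km = KMeans(n_clusters=cluster_num, n_init=10, random_state=0)
        return km.fit_predict(features)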

Question on SPCL+

Hi, I notice that on the given leaderboard, SpCL+ shows higher performance than SpCL on both the unsupervised and UDA re-ID tasks. What is the main difference between SpCL and SpCL+ that brings the performance boost? Thanks!

visualize test results

Hi, will a future version add an option to visualize test results, showing the rank lists of the query images and the activation maps of the retrieved images?

SpCL+ results cannot be reproduced on UDA

Hello Dr. Ge. I am using 4x 2080Ti GPUs with PyTorch 1.7 + CUDA 10.1 + Python 3.8.5. With SpCL+, duke->msmt mAP is only 22.0, and market->msmt only 23.3. My log file is below.

==========

Args:Namespace(config='SpCL/config_duke_msmt.yaml', launcher='pytorch', resume_from=None, set_cfgs=None, tcp_port='10010', work_dir='SpCL/duke_msmt/4gpu_16per/800iter')

==========
cfg.LOCAL_RANK: 0
cfg.DATA_ROOT: ../datasets
cfg.LOGS_ROOT: /data/OpenUnlogs/logs

cfg.MODEL = edict()
cfg.MODEL.backbone: resnet50
cfg.MODEL.pooling: gem
cfg.MODEL.embed_feat: 0
cfg.MODEL.dropout: 0.0
cfg.MODEL.dsbn: True
cfg.MODEL.sync_bn: True
cfg.MODEL.samples_per_bn: 16
cfg.MODEL.mean_net: False
cfg.MODEL.alpha: 0.999
cfg.MODEL.imagenet_pretrained: True
cfg.MODEL.source_pretrained: None

cfg.DATA = edict()
cfg.DATA.height: 256
cfg.DATA.width: 128
cfg.DATA.norm_mean: [0.485, 0.456, 0.406]
cfg.DATA.norm_std: [0.229, 0.224, 0.225]

cfg.DATA.TRAIN = edict()
cfg.DATA.TRAIN.is_autoaug: False
cfg.DATA.TRAIN.is_flip: True
cfg.DATA.TRAIN.flip_prob: 0.5
cfg.DATA.TRAIN.is_pad: True
cfg.DATA.TRAIN.pad_size: 10
cfg.DATA.TRAIN.is_blur: False
cfg.DATA.TRAIN.blur_prob: 0.5
cfg.DATA.TRAIN.is_erase: True
cfg.DATA.TRAIN.erase_prob: 0.5
cfg.DATA.TRAIN.is_mutual_transform: False
cfg.DATA.TRAIN.mutual_times: 2

cfg.TRAIN = edict()
cfg.TRAIN.seed: 1
cfg.TRAIN.deterministic: True
cfg.TRAIN.amp: False

cfg.TRAIN.datasets = edict()
cfg.TRAIN.datasets.msmt17: trainval
cfg.TRAIN.datasets.dukemtmcreid: trainval
cfg.TRAIN.unsup_dataset_indexes: [0]
cfg.TRAIN.epochs: 50
cfg.TRAIN.iters: 800

cfg.TRAIN.LOSS = edict()

cfg.TRAIN.LOSS.losses = edict()
cfg.TRAIN.LOSS.losses.hybrid_memory: 1.0
cfg.TRAIN.LOSS.temp: 0.05
cfg.TRAIN.LOSS.momentum: 0.2
cfg.TRAIN.val_dataset: msmt17
cfg.TRAIN.val_freq: 5

cfg.TRAIN.SAMPLER = edict()
cfg.TRAIN.SAMPLER.num_instances: 4
cfg.TRAIN.SAMPLER.is_shuffle: True

cfg.TRAIN.LOADER = edict()
cfg.TRAIN.LOADER.samples_per_gpu: 16
cfg.TRAIN.LOADER.workers_per_gpu: 2

cfg.TRAIN.PSEUDO_LABELS = edict()
cfg.TRAIN.PSEUDO_LABELS.freq: 1
cfg.TRAIN.PSEUDO_LABELS.use_outliers: True
cfg.TRAIN.PSEUDO_LABELS.norm_feat: True
cfg.TRAIN.PSEUDO_LABELS.norm_center: True
cfg.TRAIN.PSEUDO_LABELS.cluster: dbscan
cfg.TRAIN.PSEUDO_LABELS.eps: [0.58, 0.6, 0.62]
cfg.TRAIN.PSEUDO_LABELS.min_samples: 4
cfg.TRAIN.PSEUDO_LABELS.dist_metric: jaccard
cfg.TRAIN.PSEUDO_LABELS.k1: 30
cfg.TRAIN.PSEUDO_LABELS.k2: 6
cfg.TRAIN.PSEUDO_LABELS.search_type: 0
cfg.TRAIN.PSEUDO_LABELS.cluster_num: None

cfg.TRAIN.OPTIM = edict()
cfg.TRAIN.OPTIM.optim: adam
cfg.TRAIN.OPTIM.lr: 0.00035
cfg.TRAIN.OPTIM.weight_decay: 0.0005

cfg.TRAIN.SCHEDULER = edict()
cfg.TRAIN.SCHEDULER.lr_scheduler: single_step
cfg.TRAIN.SCHEDULER.stepsize: 20
cfg.TRAIN.SCHEDULER.gamma: 0.1

cfg.TEST = edict()
cfg.TEST.datasets: ['msmt17']

cfg.TEST.LOADER = edict()
cfg.TEST.LOADER.samples_per_gpu: 32
cfg.TEST.LOADER.workers_per_gpu: 2
cfg.TEST.dist_metric: euclidean
cfg.TEST.norm_feat: True
cfg.TEST.dist_cuda: True
cfg.TEST.rerank: False
cfg.TEST.search_type: 0
cfg.TEST.k1: 20
cfg.TEST.k2: 6
cfg.TEST.lambda_value: 0.3
cfg.launcher: pytorch
cfg.tcp_port: 10010
cfg.work_dir: /data/OpenUnlogs/logs/SpCL/duke_msmt/4gpu_16per/800iter
cfg.rank: 0
cfg.ngpus_per_node: 4
cfg.gpu: 0
cfg.total_gpus: 4
cfg.world_size: 4
The training is in a un/semi-supervised manner with 2 dataset(s) (['msmt17', 'dukemtmcreid']),
where ['msmt17'] have no labels.

Mean AP: 22.0%
CMC Scores:
top-1 46.6%
top-5 59.3%
top-10 64.6%
Testing time: 0:03:25.443005

******************************* Finished testing *******************************

Total running time: 5:10:33.865417

A small hang during training?

Hello, have you ever encountered this problem?
During training, the process hangs for a while after a fixed number of iterations (the frequency seems related to the dataset size and the iteration count; in one experiment it hangs every 320 iterations, in another every 20 iterations).
I am running on a cluster with 4 GPUs:
samples_per_gpu: 32
workers_per_gpu: 2
Experiment hanging every 320 iterations: 494354 images, 11968 iterations
Experiment hanging every 20 iterations: 227790 images, 5514 iterations
Thanks!

Implementation of SpCL

Sorry, I can't run the SpCL code successfully in this setting. The error is as follows:

loss = self.train_step(iter, batch)

  File "SpCL/main.py", line 97, in train_step
    results = self.model(inputs, targets)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

Could you give me advice? Thank you!

strong_baseline: accuracy suddenly drops to 0 and losses become nan late in training

Epoch: [12][ 380/1161] Time 0.496 (0.498) Acc@1 54.69% (60.06%) cross_entropy 3.617 (4.314) softmax_triplet 2.303 (3.466)
Epoch: [12][ 390/1161] Time 0.493 (0.498) Acc@1 57.03% (60.02%) cross_entropy 3.469 (4.292) softmax_triplet 1.859 (3.471)
Epoch: [12][ 400/1161] Time 0.501 (0.498) Acc@1 56.25% (59.95%) cross_entropy 3.451 (4.274) softmax_triplet 4.916 (3.471)
Epoch: [12][ 410/1161] Time 0.487 (0.498) Acc@1 53.12% (59.85%) cross_entropy 3.627 (4.257) softmax_triplet 3.050 (3.438)
Epoch: [12][ 420/1161] Time 0.489 (0.498) Acc@1 54.69% (59.79%) cross_entropy 3.693 (4.240) softmax_triplet 5.153 (3.429)
Epoch: [12][ 430/1161] Time 0.505 (0.498) Acc@1 58.59% (59.72%) cross_entropy 3.444 (4.225) softmax_triplet 1.498 (3.448)
Epoch: [12][ 440/1161] Time 0.488 (0.498) Acc@1 57.03% (59.68%) cross_entropy 3.482 (4.208) softmax_triplet 5.507 (3.431)
Epoch: [12][ 450/1161] Time 0.478 (0.498) Acc@1 60.94% (59.59%) cross_entropy 3.388 (4.195) softmax_triplet 0.360 (3.432)
Epoch: [12][ 460/1161] Time 0.487 (0.498) Acc@1 51.56% (59.52%) cross_entropy 3.739 (4.181) softmax_triplet 2.203 (3.410)
Epoch: [12][ 470/1161] Time 0.185 (0.493) Acc@1 0.00% (58.61%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 480/1161] Time 0.182 (0.487) Acc@1 0.00% (57.40%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 490/1161] Time 0.191 (0.481) Acc@1 0.00% (56.23%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 500/1161] Time 1.477 (0.477) Acc@1 0.00% (55.11%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 510/1161] Time 0.186 (0.472) Acc@1 0.00% (54.03%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 520/1161] Time 0.192 (0.466) Acc@1 0.00% (52.99%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 530/1161] Time 0.181 (0.461) Acc@1 0.00% (52.00%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 540/1161] Time 0.191 (0.456) Acc@1 0.00% (51.03%) cross_entropy nan (nan) softmax_triplet nan (nan)
Epoch: [12][ 550/1161] Time 0.196 (0.451) Acc@1 0.00% (50.11%) cross_entropy nan (nan) softmax_triplet nan (nan)

  • As above: when training strong_baseline on a single GPU, accuracy suddenly drops to 0 and all losses become nan.

  • The same thing happened earlier when training market-to-duke: around epoch 49 on a single GPU, accuracy suddenly dropped to 0 and the losses became nan for several iterations.

  • It never happened with 4-GPU training. Why might that be?

How can I enlarge the validation set?

After making some modifications to the model, the validation mAP and the test rank-1 differ by ten points, so I would like to enlarge the validation set to determine which checkpoint is best.

Validation freq

Hey,

When I run two trainings (strong baseline) with the same config file except for val_freq, the validation frequency (2 and 5 in my case), I surprisingly get two different curves for the training cross-entropy loss.
I believe that even though the model is put in eval mode, validation still changes some of its state. Am I wrong?

Thanks in advance.

Something about the Sampler

Hi, I have read the code for the dist_samplers, and I have a question about it. It seems that an epoch does not sample all qualified images in a dataset; the number of images per epoch seems to depend on num_instances and the number of PIDs in the dataset. For example, with Market-1501 we should sample more than 10000 images per epoch, but currently only num_instances * num_pids images are sampled. Will the reduced image diversity per epoch hurt the model and the final results?
Thanks!
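
For reference, a simplified sketch of the identity-balanced sampling described above (assumed behavior, not the exact dist_samplers code): each PID appears once per epoch with num_instances images, so an epoch holds num_pids * num_instances samples rather than the full dataset.

    import random
    from collections import defaultdict

    def identity_epoch(labels, num_instances=4):
        index_by_pid = defaultdict(list)
        for idx, pid in enumerate(labels):
            index_by_pid[pid].append(idx)
        epoch = []
        for idxs in index_by_pid.values():
            if len(idxs) < num_instances:
                # too few images for this PID: sample with replacement
                epoch.extend(random.choices(idxs, k=num_instances))
            else:
                epoch.extend(random.sample(idxs, num_instances))
        random.shuffle(epoch)
        return epoch  # length = num_pids * num_instances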

When testing multiple datasets, how do I add them in the optional arguments?

Hello, I tried adding the test-dataset parameter in the optional arguments, but it raises an error. The command is: GPUS=1 bash dist_train.sh source_pretrain /home/pj/OpenUnReID-master/tools/source_pretrain/logs/ TRAIN.LOADER.samples_per_gpu 64 TEST.datasets ['dukemtmc', 'market1501',]. Is my parameter format wrong?

Question about train loader

In the strong baseline, after generating pseudo labels, we need to update the train loader according to the clustering results. If only 4000 images were labeled, is the train loader generated only for those 4000 images? Would the remaining unlabeled images simply not be used? Thank you!

Questions about large-scale data

For the re-ID task, when the data scale is large (source 200k, target 400k images), computing the jaccard_distance raises a memory error (the loss computation has already been moved to CPU; 128 GB of RAM).
[screenshot: WeChat Work Screenshot_20200824221854]
Since I am not yet very familiar with your code, I would like to ask: when computing the loss, would computing batch by batch instead of with one matrix multiplication solve this problem?
Also, which other places do you think could be further optimized?
Thank you!
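
One way to avoid materializing the full N x N matrix on the GPU is to compute it block by block (a sketch, not the repository's code; note that for N around 400k even the CPU-side result is about N^2 * 4 bytes, roughly 640 GB, so a top-k sparse representation of the distances would ultimately be needed):

    import torch

    def chunked_euclidean(features, chunk=4096):
        # features: (N, D) tensor. Only a (chunk, N) distance slab lives
        # on the GPU at any time; the (N, N) result is filled in on CPU.
        feats = features.cuda()
        n = feats.size(0)
        dist = torch.empty(n, n)
        for i in range(0, n, chunk):
            dist[i:i + chunk] = torch.cdist(feats[i:i + chunk], feats).cpu()
        return dist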

Market1501->Duke: could you report single-GPU results?

Thank you very much for your excellent open-source project!
When I reproduce with a single GPU, with batch size set to 64 and nothing else changed,
Market1501->Duke gives mAP=64.5%, Rank-1=79.3%.
I don't understand why a single GPU is so much worse. Have you tested single-GPU performance?

severe overfitting with a single GPU

Hello,
I ran into severe overfitting when training DukeMTMC-reID -> Market-1501 UDA with the strong_baseline method.

These are my changes to the config (my device supports a batch size of at most 8):

  • samples_per_gpu: 16 -> 8
  • lr: 0.00035 -> 0.000035

Here is my training log:

# performance on val dataset
Mean AP: 81.4%
CMC Scores:
  top-1          91.4%
  top-5          97.5%
  top-10         98.8%
Validating time:  0:00:09.783462
* Finished epoch  49  mAP: 81.4%  best: 81.4% *

# performance on test dataset
Mean AP: 55.4%
CMC Scores:
  top-1          78.6%
  top-5          91.6%
  top-10         94.5%

The validation accuracy and the test accuracy differ greatly, and the final test accuracy is nearly 20% below the leaderboard.

How should I adjust things to fix this? Thank you very much!

AttributeError: module 'faiss._swigfaiss' has no attribute 'delete_SwigPyIterator'

Running with faiss==1.7.0:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,

about "select_cams"

Hello, thanks for your awesome work!
I have a question about select_cams. It seems that the sampler only uses images from different cameras. I tried replacing "select_cams = No_index(cams, i_cam)" with "select_cams = list(range(len(cams)))", but the performance was not good.
Correct me if I'm wrong. Thanks!
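
For readers of this thread, a minimal version of what No_index is assumed to do (not necessarily the repository's exact code): it keeps only the indices whose camera differs from the anchor's, which forces positives to come from other cameras; replacing it with list(range(len(cams))) removes that cross-camera constraint, which plausibly explains the performance drop.

    def no_index(cams, i_cam):
        # positions in `cams` whose camera id differs from the anchor's
        return [i for i, cam in enumerate(cams) if cam != i_cam]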

About splitting into chunks

Hello, we previously discussed the problem of the matrix being too large when computing the Jaccard distance, and you suggested splitting it into chunks. My current idea is to first coarsely partition the data into several chunks with mini-batch k-means. Do you think this approach is reasonable, or is there a better way to form the chunks? Thanks!

Question about Reliable Clusters

Hi, have you tried adding the reliable-clusters criterion (independence and compactness of clusters) directly on top of MMT?

Hello, I have a problem with faiss

My versions are CUDA 9.0 and faiss-gpu 1.5.0.
The error is: TypeError: bruteForceKnn() takes exactly 10 arguments (12 given)

But if I upgrade faiss (also installed with conda), I hit the same problem as an earlier issue: Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm

Should I just upgrade CUDA to 10.0 and try again?

About Leaderboard.

Hello, I have a question: have you run SpCL+ or MMT+ with the DukeMTMC-reID -> MSMT17 and Market-1501 -> MSMT17 experimental settings?
Looking forward to your reply. Thank you very much!

Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(

My environment is Ubuntu 18.04, PyTorch 1.5.0, CUDA 10.1. The error at runtime is shown below.
The training command I executed is:
GPUS=1 bash dist_train.sh SpCL SpCL/Market1501

bruteForceKnn is deprecated; call bfKnn instead
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(faiss::gpu::Tensor<T, 2, true, IndexType>&, bool, faiss::gpu::Tensor<float, 1, true, IndexType>&, bool, cudaStream_t) [with T = float; TVec = float4; IndexType = int; cudaStream_t = CUstream_st*] at gpu/impl/L2Norm.cu:292; details: CUDA error 11 invalid argument
Traceback (most recent call last):
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/bin/python', '-u', 'SpCL/main.py', 'SpCL/config.yaml', '--work-dir=SpCL/Market1501', '--launcher=pytorch', '--tcp-port=28211', '--set']' died with <Signals.SIGABRT: 6>.

A question about distance choices

Hello, I have two more questions about the distances:

  1. Is the Jaccard distance much better than the other two?
  2. I want to reduce memory when computing the Jaccard distance (to avoid an N x N matrix). If the k2 averaging step is skipped, will it greatly affect the results?
    [screenshot: WeChat Work Screenshot_20200828104857]
    Thanks!

SpCL unsupervised domain adaptation

Hello, when I change joint=False to joint=True in SpCL's main function, I get the following error:
File "/home/amax/OpenUnReID/openunreid/models/losses/memory.py", line 112, in forward
sim = torch.zeros(labels.max() + 1, B).float().cuda()
TypeError: zeros() received an invalid combination of arguments - got (Tensor, int), but expected one of:

  • (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
  • (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

Could you suggest a possible fix? The two datasets are duke and market, where duke is unlabeled.
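
A likely cause, judging from the traceback alone (an assumption, not a confirmed fix): torch.zeros expects Python ints for its size arguments, while labels.max() + 1 is a 0-dim Tensor. Converting it first avoids the TypeError:

    import torch

    labels = torch.tensor([0, 2, 5])  # stand-in for the pseudo labels
    B = 4                             # stand-in for the batch size

    # torch.zeros(labels.max() + 1, B) raises the TypeError above;
    # casting the size to a Python int resolves it (.cuda() can be
    # appended as in the original line):
    sim = torch.zeros(int(labels.max().item()) + 1, B).float()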

About num_parts

Hello, num_parts can be set in the config, but the code seems to split the channels rather than partition the feature map into horizontal stripes. Is my understanding correct? Thanks for clarifying.

Question on performance

Hi, thanks for your work.
I obtained poor performance with MMT and SpCL using only your code and the default settings.
For instance:
MMT(duke-to-market):
************************* Start validating market1501 on epoch 49 *************************
Val: [ 0/18] Time 0.997 (0.997) Data 0.353 (0.353)
Val: [10/18] Time 0.407 (0.493) Data 0.000 (0.032)

Mean AP: 4.2%
CMC Scores:
top-1 6.1%
top-5 15.2%
top-10 21.7%
Validating time: 0:00:22.068357
SpCL (unsupervised on market1501):
************************* Start validating market1501 on epoch 49 *************************
Val: [ 0/18] Time 0.475 (0.475) Data 0.299 (0.299)
Val: [10/18] Time 0.100 (0.147) Data 0.000 (0.027)

Mean AP: 15.8%
CMC Scores:
top-1 24.0%
top-5 44.2%
top-10 56.2%
Validating time: 0:00:14.673135

I suspect that I have overlooked something important. Could you give me some advice?

Why does source_pretrain fail?

Traceback (most recent call last):
  File "source_pretrain/main.py", line 142, in <module>
    main()
  File "source_pretrain/main.py", line 124, in main
    runner.run()
  File "/workspace/mnt/storage/zsl/debug_observe/OpenUnReID/openunreid/apis/runner.py", line 122, in run
    self.train()
  File "/workspace/mnt/storage/zsl/debug_observe/OpenUnReID/openunreid/apis/runner.py", line 212, in train
    self.train_loader.new_epoch(self._epoch)
  File "/workspace/mnt/storage/zsl/debug_observe/OpenUnReID/openunreid/data/utils/dataset_wrapper.py", line 89, in new_epoch
    self.iter = iter(self.loader)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SubPolicy.__init__.<locals>.<lambda>'
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)

Ask for help solving a faiss assertion error

Dear Author,
I suspect I have hit an environment-dependency problem. My machine has two RTX 3090s, CUDA 11.4, and Ubuntu 20.04.3 LTS. Every time I run "main.py" in the directory "tools/MMT", as soon as it reaches the step of computing the Jaccard distance, the error below occurs:

"""
Computing jaccard distance...
bruteForceKnn is deprecated; call bfKnn instead
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(faiss::gpu::Tensor<T, 2, true, IndexType>&, bool, faiss::gpu::Tensor<float, 1, true, IndexType>&, bool, cudaStream_t) [with T = float; TVec = float4; IndexType = int; cudaStream_t = CUstream_st*] at gpu/impl/L2Norm.cu:292; details: CUDA error 8 invalid device function

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
"""

I do not have enough experience with the faiss-gpu module to fix this frustrating error, and I could hardly find any relevant solutions online.
What might the bug be, and how can I solve it? Looking forward to your instruction; best regards, and thanks!

Questions about SpCL

Hello. This is a very good project for cross-domain re-ID,
but while using it I found the documentation sparse, which makes secondary development difficult.
When I downloaded the released SpCL model and tested it following the Get Started workflow, I got UserWarning: missing keys in state_dict and UserWarning: unexpected keys in checkpoint.
The results were:
Mean AP: 2.2%
CMC Scores:
top-1 6.7%
top-5 14.9%
top-10 20.1%
Testing time: 0:00:35.245021
Then I trained for 35 epochs myself with 'market1501' and 'dukemtmcreid' and obtained higher accuracy than the released model:
Mean AP: 91.3%
CMC Scores:
top-1 95.6%
top-5 98.9%
top-10 99.5%
Validating time: 0:00:05.402543
Could you update the released model?

question about "samples_per_bn"

There is a comparison between samples_per_bn and samples_per_gpu, but in most cases samples_per_bn is smaller than samples_per_gpu, which makes sync BN fail. Does this have an impact on performance?

Apart from controlling sync BN, samples_per_bn seems to have no other use. Why do we need it?

Hello, two small questions

1. Can iters in the config be understood as drawing iters mini-batches per epoch for training?
2. For large-scale data (hundreds of thousands of images), the hybrid memory's footprint becomes too large. Do you have any ideas for addressing this?

Thanks!

super strong baseline

Thank you for your open-source project! Is the super strong baseline a set of improvements built on top of SpCL?

AttributeError: Can't pickle local object 'SubPolicy.__init__.<locals>.<lambda>'

How can I fix it? Thank you.

==========
Args:Namespace(config='source_pretrain/config.yaml', launcher='none', resume_from=None, set_cfgs=None, tcp_port='5017', work_dir='source_pretrain/market1501')

cfg.LOCAL_RANK: 0
cfg.DATA_ROOT: /home/pj/re-id/data
cfg.LOGS_ROOT: ../logs

cfg.MODEL = edict()
cfg.MODEL.backbone: resnet50
cfg.MODEL.pooling: gem
cfg.MODEL.embed_feat: 0
cfg.MODEL.dropout: 0.0
cfg.MODEL.dsbn: False
cfg.MODEL.sync_bn: True
cfg.MODEL.samples_per_bn: 64
cfg.MODEL.mean_net: False
cfg.MODEL.imagenet_pretrained: True
cfg.MODEL.source_pretrained: None

cfg.DATA = edict()
cfg.DATA.height: 256
cfg.DATA.width: 128
cfg.DATA.norm_mean: [0.485, 0.456, 0.406]
cfg.DATA.norm_std: [0.229, 0.224, 0.225]

cfg.DATA.TRAIN = edict()
cfg.DATA.TRAIN.is_autoaug: True
cfg.DATA.TRAIN.is_flip: True
cfg.DATA.TRAIN.flip_prob: 0.5
cfg.DATA.TRAIN.is_pad: True
cfg.DATA.TRAIN.pad_size: 10
cfg.DATA.TRAIN.is_blur: False
cfg.DATA.TRAIN.blur_prob: 0.5
cfg.DATA.TRAIN.is_erase: False
cfg.DATA.TRAIN.erase_prob: 0.5
cfg.DATA.TRAIN.is_mutual_transform: False
cfg.DATA.TRAIN.mutual_times: 2

cfg.TRAIN = edict()
cfg.TRAIN.seed: 1
cfg.TRAIN.deterministic: True

cfg.TRAIN.datasets = edict()
cfg.TRAIN.datasets.market1501: trainval
cfg.TRAIN.unsup_dataset_indexes: None
cfg.TRAIN.epochs: 120
cfg.TRAIN.iters: 200

cfg.TRAIN.LOSS = edict()

cfg.TRAIN.LOSS.losses = edict()
cfg.TRAIN.LOSS.losses.cross_entropy: 1.0
cfg.TRAIN.LOSS.losses.softmax_triplet: 1.0
cfg.TRAIN.LOSS.margin: 0.0
cfg.TRAIN.val_dataset: market1501
cfg.TRAIN.val_freq: 40

cfg.TRAIN.SAMPLER = edict()
cfg.TRAIN.SAMPLER.num_instances: 4
cfg.TRAIN.SAMPLER.is_shuffle: True

cfg.TRAIN.LOADER = edict()
cfg.TRAIN.LOADER.samples_per_gpu: 16
cfg.TRAIN.LOADER.workers_per_gpu: 4

cfg.TRAIN.OPTIM = edict()
cfg.TRAIN.OPTIM.optim: adam
cfg.TRAIN.OPTIM.lr: 0.00035
cfg.TRAIN.OPTIM.weight_decay: 0.0005

cfg.TRAIN.SCHEDULER = edict()
cfg.TRAIN.SCHEDULER.lr_scheduler: warmup_multi_step
cfg.TRAIN.SCHEDULER.stepsize: [40, 70]
cfg.TRAIN.SCHEDULER.gamma: 0.1
cfg.TRAIN.SCHEDULER.warmup_factor: 0.01
cfg.TRAIN.SCHEDULER.warmup_steps: 10

cfg.TEST = edict()
cfg.TEST.datasets: ['market1501']

cfg.TEST.LOADER = edict()
cfg.TEST.LOADER.samples_per_gpu: 32
cfg.TEST.LOADER.workers_per_gpu: 4
cfg.TEST.dist_metric: euclidean
cfg.TEST.norm_feat: True
cfg.TEST.dist_cuda: True
cfg.TEST.rerank: False
cfg.TEST.search_type: 0
cfg.TEST.k1: 20
cfg.TEST.k2: 6
cfg.TEST.lambda_value: 0.3
cfg.launcher: none
cfg.tcp_port: 5017
cfg.work_dir: ../logs/source_pretrain/market1501
cfg.total_gpus: 1
The training is in a fully-supervised manner with 1 dataset(s) (['market1501'])
=> Loaded trainval from Market1501

ids | # images | # cameras
751 |    12936 |         6

=> Loaded the Joint Training Dataset

ids | # images | # cameras
751 |    12936 |         6

/home/pj/OpenUnReID-master/openunreid/utils/torch_utils.py:69: UserWarning: unexpected keys in checkpoint: {'fc.weight', 'fc.bias'}
warnings.warn("unexpected keys in checkpoint: {}".format(unexpected_keys))
/home/pj/OpenUnReID-master/openunreid/models/builder.py:255: UserWarning: Sync BN is switched off, since the program is running without DDP
warnings.warn(
=> Loaded val from Market1501

ids | # images | # cameras
150 |     2257 |         6

Traceback (most recent call last):
  File "source_pretrain/main.py", line 140, in <module>
    main()
  File "source_pretrain/main.py", line 122, in main
    runner.run()
  File "/home/pj/OpenUnReID-master/openunreid/apis/runner.py", line 94, in run
    self.train()
  File "/home/pj/OpenUnReID-master/openunreid/apis/runner.py", line 171, in train
    self.train_loader.new_epoch(self._epoch)
  File "/home/pj/OpenUnReID-master/openunreid/data/utils/dataset_wrapper.py", line 89, in new_epoch
    self.iter = iter(self.loader)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/pj/anaconda3/envs/reid/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SubPolicy.__init__.<locals>.<lambda>'
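
What usually causes this (an inference from the traceback, not a confirmed diagnosis): multiprocessing DataLoader workers must pickle the dataset, and a lambda defined inside SubPolicy.__init__ (the autoaugment transform; this config sets is_autoaug: True) cannot be pickled. Common workarounds are setting workers_per_gpu: 0, disabling is_autoaug, or replacing the lambda with a picklable module-level callable, e.g.:

    import functools
    import pickle

    def scale(x, factor):
        # module-level function: picklable, unlike a lambda closing over `factor`
        return x * factor

    class SubPolicyFixed:
        def __init__(self, factor):
            # instead of: self.op = lambda x: x * factor
            self.op = functools.partial(scale, factor=factor)

    pickle.dumps(SubPolicyFixed(2.0))  # succeeds; the lambda version would raise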
