tinyvision / solider

A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent

License: Apache License 2.0

Languages: Python 99.17%, Shell 0.83%
Topics: cvpr2023, human-centric, self-supervised-learning

solider's People

Contributors

cwhgn, ssl-solider, xianzhexu, xiuyu-sxy

solider's Issues

SOLIDER with ResNet50

Hi authors,
Thanks for sharing your method; it is really interesting. This question is outside the scope of the paper: have you tried your method with a ResNet architecture? Can it work? Thank you in advance.

SOLIDER Training Time?

How long did it take to train SOLIDER (in hours) using the settings from the paper, for both the DINO pre-training and the SOLIDER training? I see the number of epochs, but not the time in hours.

IndexError: tensors used as indices must be long, byte or bool tensors

When I run resume_solider.sh, it fails with the following traceback:
File "main_solider.py", line 398, in train_one_epoch
semantic_weight = [torch.cat(semantic_weight)[torch.from_numpy(mask_idxs)]]
IndexError: tensors used as indices must be long, byte or bool tensors
How can I deal with this?
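As a note for others hitting this: the error usually means the index tensor has a non-long integer dtype. A minimal sketch of the usual fix, assuming mask_idxs is a NumPy integer array as in the traceback (the shapes here are made up for illustration):

```python
import numpy as np
import torch

# Made-up shapes for illustration; the names follow the traceback above.
semantic_weight = [torch.randn(4, 8), torch.randn(4, 8)]
mask_idxs = np.array([0, 2, 5], dtype=np.int32)  # an int32 index triggers the error

# torch.from_numpy() keeps the NumPy dtype, and PyTorch only accepts
# long/byte/bool tensors as indices, so cast the index tensor to long:
idx = torch.from_numpy(mask_idxs).long()
semantic_weight = [torch.cat(semantic_weight)[idx]]
```

Equivalently, calling mask_idxs.astype(np.int64) before torch.from_numpy() should work.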

Request for PedestrianDetection pretrained model

Hi author!
I ran a test with the pretrained model from SOLIDER and got the following error:

/home/cddjjc/anaconda3/envs/pedestron_v2/bin/python /home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py configs/solider/cp/swin_base.py models_pretrained/solider_origin/swin_base/epoch_ 1 2 --out swin_base.json --show --mean_teacher 
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
No pre-trained weights for SwinBase, training start from scratch
unexpected key in source state_dict: backbone.norm0.weight, backbone.norm0.bias, head.mlp.0.weight, head.mlp.0.bias, head.mlp.2.weight, head.mlp.2.bias, head.mlp.4.weight, head.mlp.4.bias, head.last_layer.weight_g, head.last_layer.weight_v

missing keys in source state_dict: bbox_head.reg_convs.0.gn.bias, bbox_head.offset_scales.0.scale, bbox_head.cls_convs.0.conv.weight, neck.p3_l2.weight, bbox_head.reg_convs.0.conv.weight, bbox_head.cls_convs.0.gn.bias, bbox_head.csp_reg.weight, neck.p4_l2.weight, bbox_head.csp_cls.weight, neck.p5_l2.weight, bbox_head.cls_convs.0.gn.weight, bbox_head.offset_convs.0.conv.weight, bbox_head.csp_offset.bias, neck.p4.bias, bbox_head.csp_offset.weight, neck.p5.weight, bbox_head.reg_scales.0.scale, bbox_head.csp_cls.bias, neck.p4.weight, bbox_head.reg_convs.0.gn.weight, bbox_head.csp_reg.bias, bbox_head.offset_convs.0.gn.bias, neck.p3.weight, bbox_head.offset_convs.0.gn.weight, neck.p5.bias, neck.p3.bias

[                              ] 0/500, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 227, in <module>
    main()
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 195, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.save_img, args.save_img_dir)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 30, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 88, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/csp.py", line 203, in simple_test
    x = self.extract_feat(img)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/single_stage.py", line 42, in extract_feat
    x = self.neck(x)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/necks/csp_neck.py", line 73, in forward
    p3 = self.p3(inputs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 958, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [512, 256, 4, 4], expected input[1, 256, 128, 256] to have 512 channels, but got 256 channels instead

Process finished with exit code 1

It seems that the pretrained model from SOLIDER is missing the weights of the last few layers. Would it be possible to provide the complete trained PedestrianDetection model? Thanks!
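As a note for anyone debugging the same mismatch: the log above ("No pre-trained weights for SwinBase", unexpected backbone.norm0/head.* keys, and missing neck/bbox_head keys) suggests the checkpoint is a SOLIDER pre-training checkpoint containing only the backbone and the DINO-style head, not the detector's neck and head. A quick, generic way to confirm what a checkpoint contains (the nesting keys checked below are assumptions about common checkpoint layouts):

```python
import torch

ckpt = torch.load("swin_base.pth", map_location="cpu")  # placeholder path

# Pre-training checkpoints often nest the weights under a wrapper key.
state_dict = ckpt
for key in ("state_dict", "teacher", "model"):
    if isinstance(ckpt, dict) and key in ckpt:
        state_dict = ckpt[key]
        break

# Print the top-level module prefixes; a full detector checkpoint should list
# 'neck' and 'bbox_head' in addition to 'backbone'.
print(sorted({k.split(".")[0] for k in state_dict}))
```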

Using the SOLIDER-base model on ModelScope

Hi author, I see that you have released the pretrained SOLIDER-base model on ModelScope, but there is no corresponding Tasks.Id in the ModelScope pipeline.

What is the feature dimension for person ReID?

Were the feature embedding sizes defined here used for your person ReID experiments (link), i.e., 96 for Swin Tiny, 96 for Swin Small, and 128 for Swin Base? Or did you use embeddings of a higher dimension? It was not mentioned in the paper. Thanks.
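For context on those numbers: in the standard Swin design, 96/96/128 are the stage-one embedding widths, and the channel count doubles at each of the three patch-merging stages, so the final-stage feature dimension is 8x the embedding width. A quick check with the standard Swin hyper-parameters (not verified against this repo's ReID configs):

```python
# Standard Swin widths: embed_dim doubles after each of the 3 merging stages.
for name, embed_dim in [("swin_tiny", 96), ("swin_small", 96), ("swin_base", 128)]:
    dims = [embed_dim * 2 ** i for i in range(4)]
    print(f"{name}: per-stage dims {dims}, final feature dim {dims[-1]}")
# swin_tiny: [96, 192, 384, 768] -> 768
# swin_small: [96, 192, 384, 768] -> 768
# swin_base: [128, 256, 512, 1024] -> 1024
```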

question about Gait Recognition

Great job!
I would like to ask a question:
I don't see a gait recognition task among the six downstream tasks mentioned. Is SOLIDER not suitable for processing silhouette-based human gait features, or was this omitted for some other consideration?
I look forward to your reply; thank you.

Pose estimation training fails with KeyError: 'SwinTransformer is not in the models registry'

After downloading the pretrained solider_swin_base.pth, I get the following error when running the pose-estimation training:

fp16 = dict(loss_scale='dynamic')
work_dir = './work_dirs/swin_base_coco_384x288_lly'
gpu_ids = range(0, 1)

2023-09-25 16:03:15,197 - mmpose - INFO - Set random seed to 1071184448, deterministic: False
Traceback (most recent call last):
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/detectors/top_down.py", line 48, in init
self.backbone = builder.build_backbone(backbone)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

How should I solve this?
My environment information is as follows:
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.8.0
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMPose: 0.25.0+fd361ca
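
For reference, "X is not in the models registry" in mmcv-based frameworks usually means that the module which calls @BACKBONES.register_module() on the custom class was never imported. One common workaround supported by mmcv's config system is to force the import from the config file itself; the module path below is an assumption about where this repo defines its Swin backbone:

```python
# Added to the mmpose config; mmcv imports these modules when loading the
# config, which runs the @BACKBONES.register_module() decorator and makes
# SwinTransformer visible to the registry.
custom_imports = dict(
    imports=['mmpose.models.backbones.swin_transformer'],  # assumed path
    allow_failed_imports=False,
)
```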

The Semantic Head

Thanks for your excellent work! I have recently been trying to reproduce this repo. I found that the semantic head, which is defined as part_classifier in main_solider.py, seemingly is not optimized; could you please explain why? By the way, could you please provide the supplementary materials of the CVPR paper?
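One generic way to check whether a submodule's parameters are actually being updated is to compare them against the optimizer's parameter groups. A minimal, self-contained sketch in plain PyTorch (the toy modules below stand in for the objects built in main_solider.py):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in the repo these would be the real model and part_classifier.
backbone = nn.Linear(8, 8)
part_classifier = nn.Linear(8, 3)
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1)  # part_classifier omitted

# Parameters the optimizer knows about, compared by identity.
opt_param_ids = {id(p) for g in optimizer.param_groups for p in g["params"]}
for name, p in part_classifier.named_parameters():
    print(f"part_classifier.{name}: in_optimizer={id(p) in opt_param_ids}, "
          f"requires_grad={p.requires_grad}")
```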

inference (human parsing)

How do I run the model for inference on one of the tasks?
I found that demo.py only outputs the feature space.
I'm interested in the human parsing task.

Visualize and manually modify semantic clustering

Hello,

I was wondering if it is possible to visualize the semantic clustering results of the input images (and the attention maps) as in your paper. I have tried it, but I have not been able to visualize them.

Moreover, I was also thinking about replacing the clustering masks with my own so the model could learn to focus on the specified parts, but I'm having some problems. Do you think it is feasible?
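
Not one of the authors, but once you have per-patch cluster assignments, a generic way to visualize them is to upsample the label grid to pixel resolution and overlay it on the image. A minimal sketch, where the label source and shapes are assumptions rather than this repo's API:

```python
import matplotlib.pyplot as plt
import numpy as np

def show_cluster_mask(image, cluster_labels, patch_size=16, alpha=0.5):
    """Overlay per-patch cluster labels on an image.

    image: (H, W, 3) uint8 array.
    cluster_labels: (H // patch_size, W // patch_size) int array.
    """
    # Upsample each patch label to a patch_size x patch_size pixel block.
    mask = np.kron(cluster_labels, np.ones((patch_size, patch_size), dtype=int))
    plt.imshow(image)
    plt.imshow(mask, cmap="tab10", alpha=alpha, interpolation="nearest")
    plt.axis("off")
    plt.show()

# Demo with random stand-ins for a real image and a model's cluster output:
show_cluster_mask(
    np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    np.random.randint(0, 3, (14, 14)),
)
```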
