tinyvision / solider

A Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks to the maximum extent

License: Apache License 2.0

Languages: Python 99.17%, Shell 0.83%
Topics: cvpr2023, human-centric, self-supervised-learning

solider's People

Contributors

cwhgn, ssl-solider, xianzhexu, xiuyu-sxy

solider's Issues

SOLIDER with ResNet50

Hi authors,
Thanks for sharing your method; it is really interesting. This question is outside the scope of the paper: have you tried your method with a ResNet architecture? Can it work? Thank you in advance.

SOLIDER Training Time?

How long did it take to train SOLIDER (in hours) using the settings from the paper, for both the DINO pre-training and the SOLIDER training? I see the number of epochs, but not the time in hours.

IndexError: tensors used as indices must be long, byte or bool tensors

When I run resume_solider.sh, it fails with the following traceback:
File "main_solider.py", line 398, in train_one_epoch
semantic_weight = [torch.cat(semantic_weight)[torch.from_numpy(mask_idxs)]]
IndexError: tensors used as indices must be long, byte or bool tensors
How can I deal with this?
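As a note for others hitting this: the error usually means the index tensor has a non-long integer dtype. A minimal sketch of the usual fix, assuming mask_idxs is a NumPy integer array as in the traceback (the shapes here are made up for illustration):

```python
import numpy as np
import torch

# Made-up shapes for illustration; the names follow the traceback above.
semantic_weight = [torch.randn(4, 8), torch.randn(4, 8)]
mask_idxs = np.array([0, 2, 5], dtype=np.int32)  # an int32 index triggers the error

# torch.from_numpy() keeps the NumPy dtype, and PyTorch only accepts
# long/byte/bool tensors as indices, so cast the index tensor to long:
idx = torch.from_numpy(mask_idxs).long()
semantic_weight = [torch.cat(semantic_weight)[idx]]
```

Equivalently, calling mask_idxs.astype(np.int64) before torch.from_numpy() should work.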

Request for PedestrianDetection pretrained model

Hi author!
I ran a test with the pretrained model from SOLIDER and got the following error:

/home/cddjjc/anaconda3/envs/pedestron_v2/bin/python /home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py configs/solider/cp/swin_base.py models_pretrained/solider_origin/swin_base/epoch_ 1 2 --out swin_base.json --show --mean_teacher 
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
No pre-trained weights for SwinBase, training start from scratch
unexpected key in source state_dict: backbone.norm0.weight, backbone.norm0.bias, head.mlp.0.weight, head.mlp.0.bias, head.mlp.2.weight, head.mlp.2.bias, head.mlp.4.weight, head.mlp.4.bias, head.last_layer.weight_g, head.last_layer.weight_v

missing keys in source state_dict: bbox_head.reg_convs.0.gn.bias, bbox_head.offset_scales.0.scale, bbox_head.cls_convs.0.conv.weight, neck.p3_l2.weight, bbox_head.reg_convs.0.conv.weight, bbox_head.cls_convs.0.gn.bias, bbox_head.csp_reg.weight, neck.p4_l2.weight, bbox_head.csp_cls.weight, neck.p5_l2.weight, bbox_head.cls_convs.0.gn.weight, bbox_head.offset_convs.0.conv.weight, bbox_head.csp_offset.bias, neck.p4.bias, bbox_head.csp_offset.weight, neck.p5.weight, bbox_head.reg_scales.0.scale, bbox_head.csp_cls.bias, neck.p4.weight, bbox_head.reg_convs.0.gn.weight, bbox_head.csp_reg.bias, bbox_head.offset_convs.0.gn.bias, neck.p3.weight, bbox_head.offset_convs.0.gn.weight, neck.p5.bias, neck.p3.bias

[                              ] 0/500, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 227, in <module>
    main()
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 195, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.save_img, args.save_img_dir)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 30, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 88, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/csp.py", line 203, in simple_test
    x = self.extract_feat(img)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/single_stage.py", line 42, in extract_feat
    x = self.neck(x)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/necks/csp_neck.py", line 73, in forward
    p3 = self.p3(inputs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 958, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [512, 256, 4, 4], expected input[1, 256, 128, 256] to have 512 channels, but got 256 channels instead

Process finished with exit code 1

It seems that the pretrained model from SOLIDER is missing the weights of the last few layers. Would it be possible to provide the complete trained PedestrianDetection model? Thanks!
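As a note for anyone debugging the same mismatch: the log above ("No pre-trained weights for SwinBase", unexpected backbone.norm0/head.* keys, and missing neck/bbox_head keys) suggests the checkpoint is a SOLIDER pre-training checkpoint containing only the backbone and the DINO-style head, not the detector's neck and head. A quick, generic way to confirm what a checkpoint contains (the nesting keys checked below are assumptions about common checkpoint layouts):

```python
import torch

ckpt = torch.load("swin_base.pth", map_location="cpu")  # placeholder path

# Pre-training checkpoints often nest the weights under a wrapper key.
state_dict = ckpt
for key in ("state_dict", "teacher", "model"):
    if isinstance(ckpt, dict) and key in ckpt:
        state_dict = ckpt[key]
        break

# Print the top-level module prefixes; a full detector checkpoint should list
# 'neck' and 'bbox_head' in addition to 'backbone'.
print(sorted({k.split(".")[0] for k in state_dict}))
```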

Using the SOLIDER-base model on ModelScope

Hi author, I see that you have released the pretrained SOLIDER-base model on ModelScope, but there is no corresponding Tasks.Id in the ModelScope pipeline.

What is the feature dimension for person ReID?

Were the feature embedding sizes defined here used for your person ReID experiments (link), i.e., 96 for Swin Tiny, 96 for Swin Small, and 128 for Swin Base? Or did you use embeddings of a higher dimension? It was not mentioned in the paper. Thanks.
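For context on those numbers: in the standard Swin design, 96/96/128 are the stage-one embedding widths, and the channel count doubles at each of the three patch-merging stages, so the final-stage feature dimension is 8x the embedding width. A quick check with the standard Swin hyper-parameters (not verified against this repo's ReID configs):

```python
# Standard Swin widths: embed_dim doubles after each of the 3 merging stages.
for name, embed_dim in [("swin_tiny", 96), ("swin_small", 96), ("swin_base", 128)]:
    dims = [embed_dim * 2 ** i for i in range(4)]
    print(f"{name}: per-stage dims {dims}, final feature dim {dims[-1]}")
# swin_tiny: [96, 192, 384, 768] -> 768
# swin_small: [96, 192, 384, 768] -> 768
# swin_base: [128, 256, 512, 1024] -> 1024
```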

question about Gait Recognition

Great job!
I would like to ask a question:
I don't see a gait recognition task among the six downstream tasks mentioned. Is SOLIDER not suitable for processing silhouette-based human gait features, or was this omitted for some other consideration?
I look forward to your reply; thank you.

Pose estimation training fails with KeyError: 'SwinTransformer is not in the models registry'

After downloading the pretrained solider_swin_base.pth, I get the following error when running the pose-estimation training:

fp16 = dict(loss_scale='dynamic')
work_dir = './work_dirs/swin_base_coco_384x288_lly'
gpu_ids = range(0, 1)

2023-09-25 16:03:15,197 - mmpose - INFO - Set random seed to 1071184448, deterministic: False
Traceback (most recent call last):
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/detectors/top_down.py", line 48, in init
self.backbone = builder.build_backbone(backbone)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

How should I solve this?
My environment information is as follows:
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.8.0
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMPose: 0.25.0+fd361ca
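
For reference, "X is not in the models registry" in mmcv-based frameworks usually means that the module which calls @BACKBONES.register_module() on the custom class was never imported. One common workaround supported by mmcv's config system is to force the import from the config file itself; the module path below is an assumption about where this repo defines its Swin backbone:

```python
# Added to the mmpose config; mmcv imports these modules when loading the
# config, which runs the @BACKBONES.register_module() decorator and makes
# SwinTransformer visible to the registry.
custom_imports = dict(
    imports=['mmpose.models.backbones.swin_transformer'],  # assumed path
    allow_failed_imports=False,
)
```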

The Semantic Head

Thanks for your excellent work! I have recently been trying to reproduce this repo. I found that the semantic head, which is defined as part_classifier in main_solider.py, seemingly is not optimized; could you please explain why? By the way, could you please provide the supplementary materials of the CVPR paper?
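One generic way to check whether a submodule's parameters are actually being updated is to compare them against the optimizer's parameter groups. A minimal, self-contained sketch in plain PyTorch (the toy modules below stand in for the objects built in main_solider.py):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in the repo these would be the real model and part_classifier.
backbone = nn.Linear(8, 8)
part_classifier = nn.Linear(8, 3)
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1)  # part_classifier omitted

# Parameters the optimizer knows about, compared by identity.
opt_param_ids = {id(p) for g in optimizer.param_groups for p in g["params"]}
for name, p in part_classifier.named_parameters():
    print(f"part_classifier.{name}: in_optimizer={id(p) in opt_param_ids}, "
          f"requires_grad={p.requires_grad}")
```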

inference (human parsing)

How do I run the model for inference on one of the tasks?
I found that demo.py only outputs the feature space.
I'm interested in the human parsing task.

Visualize and manually modify semantic clustering

Hello,

I was wondering if it is possible to visualize the semantic clustering results of the input images (and the attention maps) as in your paper. I have tried it, but I have not been able to visualize them.

Moreover, I was also thinking about replacing the clustering masks with my own so the model could learn to focus on the specified parts, but I'm having some problems. Do you think it is feasible?
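
Not one of the authors, but once you have per-patch cluster assignments, a generic way to visualize them is to upsample the label grid to pixel resolution and overlay it on the image. A minimal sketch, where the label source and shapes are assumptions rather than this repo's API:

```python
import matplotlib.pyplot as plt
import numpy as np

def show_cluster_mask(image, cluster_labels, patch_size=16, alpha=0.5):
    """Overlay per-patch cluster labels on an image.

    image: (H, W, 3) uint8 array.
    cluster_labels: (H // patch_size, W // patch_size) int array.
    """
    # Upsample each patch label to a patch_size x patch_size pixel block.
    mask = np.kron(cluster_labels, np.ones((patch_size, patch_size), dtype=int))
    plt.imshow(image)
    plt.imshow(mask, cmap="tab10", alpha=alpha, interpolation="nearest")
    plt.axis("off")
    plt.show()

# Demo with random stand-ins for a real image and a model's cluster output:
show_cluster_mask(
    np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    np.random.randint(0, 3, (14, 14)),
)
```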
