hrnet / hrnet-image-classification

Train the HRNet model on ImageNet

Home Page: https://jingdongwang2017.github.io/Projects/HRNet/

License: MIT License

Python 100.00%

image-classification imagenet hrnets high-resolution-net

hrnet-image-classification's Introduction

High-resolution networks (HRNets) for Image classification

News

[2021/01/20] Added some stronger ImageNet pretrained models, e.g., HRNet_W48_C_ssld_pretrained.pth achieves a top-1 accuracy of 83.6%.
[2020/03/13] Our paper has been accepted by TPAMI: Deep High-Resolution Representation Learning for Visual Recognition.
Per request, we provide two small HRNet models. Their #parameters and GFLOPs are similar to those of ResNet18. The segmentation results using the two small models are also available at https://github.com/HRNet/HRNet-Semantic-Segmentation.
TensorFlow implementation available at https://github.com/yuanyuanli85/tf-hrnet. Thanks VictorLi!
ONNX export enabled after fixing issues. Thanks Baowen Bao!

Introduction

This is the official code for high-resolution representations for ImageNet classification. We augment HRNet with a classification head, shown in the figure below. First, the four-resolution feature maps are each fed into a bottleneck, which increases the number of output channels to 128, 256, 512, and 1024, respectively. Then, we downsample the highest-resolution representation with a 2-strided 3x3 convolution that outputs 256 channels and add the result to the second-highest-resolution representation. This process is repeated two more times to obtain 1024 channels at the lowest resolution. Finally, we transform the 1024 channels to 2048 channels through a 1x1 convolution, followed by global average pooling. The resulting 2048-dimensional representation is fed into the classifier.
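The channel and resolution bookkeeping of this head can be traced without building the network. The sketch below is only an illustration, not the repository's implementation: it assumes the HRNet-W18 branch widths (18, 36, 72, 144), a 224x224 input (so the four branches have spatial sizes 56, 28, 14, and 7), and a padding of 1 for the 2-strided 3x3 convolutions. The helper names `conv_out_size` and `trace_head` are hypothetical.

```python
# Shape trace of the classification head described above (no tensors involved).
# Assumptions: HRNet-W18 branch widths, a 224x224 input, padding of 1 for the
# 2-strided 3x3 convolutions. Helper names are hypothetical.

def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_head(branch_channels, branch_sizes,
               head_channels=(128, 256, 512, 1024), final_channels=2048):
    """Return (step, channels, spatial size) tuples for the head."""
    steps = []
    # 1) One bottleneck per branch widens the channels; resolution is unchanged.
    for i, (c_in, c_out, size) in enumerate(
            zip(branch_channels, head_channels, branch_sizes)):
        steps.append((f"bottleneck[{i}] {c_in}->{c_out}", c_out, size))
    # 2) Downsample the running sum and add it to the next branch, three times.
    size = branch_sizes[0]
    for i in range(1, len(head_channels)):
        size = conv_out_size(size)
        assert size == branch_sizes[i], "resolutions must match before adding"
        steps.append((f"downsample+add[{i}]", head_channels[i], size))
    # 3) 1x1 convolution to 2048 channels, then global average pooling.
    steps.append(("conv1x1", final_channels, size))
    steps.append(("global_avg_pool", final_channels, 1))
    return steps


if __name__ == "__main__":
    for step in trace_head((18, 36, 72, 144), (56, 28, 14, 7)):
        print(step)
```

Under these assumptions the trace ends with a 2048-channel 7x7 map that global average pooling reduces to the 2048-dimensional vector fed into the classifier.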

ImageNet pretrained models

HRNetV2 ImageNet pretrained models are now available!

model	#Params	GFLOPs	top-1 error	top-5 error	Link
HRNet-W18-C-Small-v1	13.2M	1.49	27.7%	9.3%	OneDrive/BaiduYun(Access Code:v3sw)
HRNet-W18-C-Small-v2	15.6M	2.42	24.9%	7.6%	OneDrive/BaiduYun(Access Code:bnc9)
HRNet-W18-C	21.3M	3.99	23.2%	6.6%	OneDrive/BaiduYun(Access Code:r5xn)
HRNet-W30-C	37.7M	7.55	21.8%	5.8%	OneDrive/BaiduYun(Access Code:ajc1)
HRNet-W32-C	41.2M	8.31	21.5%	5.8%	OneDrive/BaiduYun(Access Code:itc1)
HRNet-W40-C	57.6M	11.8	21.1%	5.5%	OneDrive/BaiduYun(Access Code:i58x)
HRNet-W44-C	67.1M	13.9	21.1%	5.6%	OneDrive/BaiduYun(Access Code:3imd)
HRNet-W48-C	77.5M	16.1	20.7%	5.5%	OneDrive/BaiduYun(Access Code:68g2)
HRNet-W64-C	128.1M	26.9	20.5%	5.4%	OneDrive/BaiduYun(Access Code:6kw4)

Newly added checkpoints:

model	#Params	GFLOPs	top-1 error	Link
HRNet-W18-C (w/ CosineLR + CutMix + 300epochs)	21.3M	3.99	22.1%	Link
HRNet-W48-C (w/ CosineLR + CutMix + 300epochs)	77.5M	16.1	18.9%	Link
HRNet-W18-C-ssld (converted from PaddlePaddle)	21.3M	3.99	18.8%	Link
HRNet-W48-C-ssld (converted from PaddlePaddle)	77.5M	16.1	16.4%	Link

In the table above, the first two checkpoints are trained with CosineLR and CutMix data augmentation for a longer schedule of 300 epochs. The other two checkpoints are converted from PaddleClas. Please refer to the SSLD tutorial for more details.
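For readers unfamiliar with cosine learning-rate schedules, the following is a minimal sketch of the standard cosine annealing formula. It is not taken from the training code for these checkpoints: the base learning rate of 0.05 is borrowed from the `lr5e-2` in the configuration file names, and the zero final learning rate and the absence of warmup are assumptions. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=300, base_lr=0.05, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-based epoch index.

    Assumed values: base_lr, min_lr and the lack of warmup are illustrative
    and not confirmed by the checkpoint description.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 75, 150, 225, 300):
        print(epoch, round(cosine_lr(epoch), 5))
```

The rate starts at `base_lr`, passes through half of it at the midpoint of training, and decays smoothly to `min_lr` at the end.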

Quick start

Install

Install PyTorch=0.4.1 following the official instructions
git clone https://github.com/HRNet/HRNet-Image-Classification
Install dependencies: pip install -r requirements.txt

Data preparation

You can follow the PyTorch implementation: https://github.com/pytorch/examples/tree/master/imagenet

The data should be under ./data/imagenet/images/.
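A quick sanity check of the data directory can catch layout mistakes before training starts. The sketch below assumes the layout used by the PyTorch ImageNet example, namely `train` and `val` subdirectories that each contain one folder per class; that layout and the helper name `check_imagenet_layout` are assumptions, not something this README states.

```python
from pathlib import Path


def check_imagenet_layout(root="./data/imagenet/images", splits=("train", "val")):
    """Return {split: number of class folders}; raise if a split is missing.

    Assumes one subdirectory per class inside each split directory.
    """
    root = Path(root)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```

For a complete ImageNet-1k copy one would expect 1000 class folders in each split, for example by calling `check_imagenet_layout()` from the repository root.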

Train and test

Please specify the configuration file.

For example, train the HRNet-W18 on ImageNet with a batch size of 128 on 4 GPUs:

python tools/train.py --cfg experiments/cls_hrnet_w18_sgd_lr5e-2_wd1e-4_bs32_x100.yaml

For example, test the HRNet-W18 on ImageNet on 4 GPUs:

python tools/valid.py --cfg experiments/cls_hrnet_w18_sgd_lr5e-2_wd1e-4_bs32_x100.yaml --testModel hrnetv2_w18_imagenet_pretrained.pth

Other applications of HRNet

Citation

If you find this work or code helpful in your research, please cite:

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{WangSCJDZLMTWLX19,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Jingdong Wang and Ke Sun and Tianheng Cheng and 
          Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and 
          Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
  journal={TPAMI},
  year={2019}
}

Reference

[1] Deep High-Resolution Representation Learning for Visual Recognition. Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao. Accepted by TPAMI.

hrnet-image-classification's Issues

Why is the forward time of HRNet-W18 2.5 times as long as that of ResNet-50?

Could you report training time?

Could you report the training time and the hardware used for the different models?

Convert to multi-label output

Thanks for sharing your work. Could you please give guidance on converting this model to train multi-label classification, i.e., multiple outputs for a single image?

Thanks in advance.

How to add TensorBoard or wandb to visualize the training process

How can I use TensorBoard or wandb to visualize the training process, as is done in YOLOv5?

How to train with CPU only

about self.class_weights

Hi! In ~\lib\datasets\cityscapes.py I can see:
self.class_weights = torch.FloatTensor([0.8373, 0.918, 0.866, 1.0345,
1.0166, 0.9969, 0.9754, 1.0489,
0.8786, 1.0023, 0.9539, 0.9843,
1.1116, 0.9037, 1.0865, 1.0955,
1.0865, 1.1529, 1.0507]).cuda()

What do these weights mean?

How to use get_cls_net?

Thanks for writing the code.
I want to use the function get_cls_net directly.
How can I define the parameters of config? It is not easy for me to pass the config variable to HighResolutionNet.

Can't find the w64 config YAML file

Could you please provide the HRNet-W64-C config file?

Which script is used to calculate the GFLOPs?

How To Perform Inference

I’ve been able to train my model, and perform validation, however, I cannot find a way to do inference. Even in validation, while it tells me the percentage it got wrong, I could not find any file or log that tells me which ones it got wrong. I’ve searched through the entire repo, and haven’t found a way to perform inference.

With that, I would like to ask the obvious question of how to perform inference and use the model.

How to train on my own dataset with different categories?

BRANCHES instead of RANCHES

In cls_hrnet_w18_small_v2_sgd_lr5e-2_wd1e-4_bs32_x100.yaml, the STAGE1 configuration should read NUM_BRANCHES instead of NUM_RANCHES. In fact, this does not affect the code at all, since _make_stage is not called for stage1; however, it would be good to change it for consistency.

about--- HRNet-W48-C-ssld (converted from PaddlePaddle)

Hello, how is this new checkpoint used, and does it require a new YAML file?

No module named 'utils.modelsummary'

No module named 'utils.modelsummary'？

Torch.jit.script not working with this model

I have tried to convert this model to TorchScript using torch.jit.script. However, I am getting this issue:

RuntimeError: 
Expected integer literal for index:
  File "/home/david/Documents/pry/models/archs/hrnet.py", line 228

        for i in range(self.num_branches):
            x[i] = self.branches[i](x[i])
                   ~~~~~~~~~~~~~~~ <--- HERE

        x_fuse = []

How to train on single GPU

Since I don't have multiple GPUs, I changed the setting to GPUS = (0,0,0,0), but then this error appeared: RuntimeError: inputs must be on unique devices. Evidently the network expects to run on more than one GPU. How do I change the code to run it on a single GPU?

Redistributing pre-trained model

Is it possible to redistribute your pretrained models ? Credit will be given to you of course.
Thanks!

The loss becomes NaN

When I was trying to reproduce Face X-ray, I modified HRNet-Image-Classification, but I ran into a bug where the loss is NaN.
This is what I added after stage4 in cls_hrnet.py:

        # Upsampling
        x0_h, x0_w = y_list[0].size(2), y_list[0].size(3)
        x1 = F.interpolate(y_list[1], size=(x0_h, x0_w), mode='bilinear',align_corners=True)
        x2 = F.interpolate(y_list[2], size=(x0_h, x0_w), mode='bilinear',align_corners=True)
        x3 = F.interpolate(y_list[3], size=(x0_h, x0_w), mode='bilinear',align_corners=True)

        x = torch.cat([y_list[0], x1, x2, x3], 1)
        x = self.one_conv2d(x) # one conv2d to make the channel to 1
       
        x = F.interpolate(x, size=(224,224),mode='bilinear',align_corners=True)
        xray = torch.sigmoid(x)

Then I found that xray is almost zero and the loss is NaN. What's wrong?

I wrote the loss function below:

    def criterion(pred,target):
        x = torch.add(torch.mul(target,torch.log(pred)),torch.mul(torch.sub(1,target),torch.log(torch.sub(1,pred))))
        loss = -torch.mean(x)
        return loss
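One possible cause, offered as a guess rather than a confirmed diagnosis: the loss above takes the logarithm of a sigmoid output, and once the sigmoid saturates to exactly 0 or 1 in floating point, `log` returns -inf and the product with a zero target yields NaN. A common remedy is to compute the binary cross-entropy directly from the logits. The sketch below shows the idea in plain Python; the function names are hypothetical, and in PyTorch the built-in `torch.nn.functional.binary_cross_entropy_with_logits` serves the same purpose when applied to the value before `torch.sigmoid`.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit/target pair.

    Mathematically equal to -(t*log(p) + (1-t)*log(1-p)) with
    p = sigmoid(logit), but it never evaluates log(0).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce_with_logits(logits, targets):
    """Average the stable loss over paired sequences of logits and targets."""
    return sum(bce_with_logits(x, t) for x, t in zip(logits, targets)) / len(logits)


if __name__ == "__main__":
    # Extreme logits that would produce inf/NaN with the naive formulation.
    print(mean_bce_with_logits([-1000.0, 1000.0, 0.0], [1.0, 0.0, 1.0]))
```

With this formulation a badly wrong prediction gives a large but finite loss, so gradients stay defined.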

Request for the training log

I modified the network structure and am training from scratch. Could I get your training log to compare against?

A solution: HRNet Backbone Adopt_different_blocks_bug(BASIC//BOTTLENECK)

https://github.com/HRNet/HRNet-Image-Classification/blob/8f158719e821836e21e6cba99a3241a12a13bc41/lib/models/cls_hrnet.py#L459~L473
If different block types are used in different stages, instead of the default bottleneck-basic-basic-basic in the original YAML file, the channel mismatch error shown in the figure below appears. To avoid this error, we change the transition layer and use a 3x3 convolution between stages to match the number of channels. The corrected code and results are shown in the figure below.
(The demonstration is only a proof of feasibility, not an actual demonstration of the code.)

Training Custom Dataset

I have been trying to find the format in which I can train on the RPC dataset with HRNet and run evaluation. The dataset is in COCO format. I am unable to use it with either the TensorFlow or the PyTorch version of the code. The only supported dataset is ImageNet, and that doesn't help either.

Why is default.py in the config folder written to describe pose_hrnet?

The default.py in the config folder is written to describe pose_hrnet. I would like to know whether it is suitable for image classification.

guide to use pre-trained model

Hello. I am a beginner in deep learning and pytorch, and I have a question about how to use HRNet.

I downloaded the pre-trained model you provided (HRNet-W64-C) from one-drive and cloned the HRNet repo and opened it in pycharm.

In this state, I would like to test the image using the model I downloaded.

If there is a detailed explanation of how to do this, could you share a link? If not, could you give some guidance for beginners?

How to convert the pretrained cls model to the required model for pose estimation

Thanks for such a great work.

I trained HRNet-W32-C on ImageNet and got the pretrained classification model (final_state.pth.tar), which has 1956 keys.

However, the pretrained model (hrnet_w32-36af842e.pth) provided by pose_hrnet_w32 [https://github.com/leoxiaobin/deep-high-resolution-net.pytorch] has 2000 keys.

I compared their keys, and the classification checkpoint lacks some of them. What should I do? Looking forward to your reply.
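To see exactly which keys differ, the two key sets can be compared directly. The sketch below works on plain collections of key names; with PyTorch, these would typically come from the dictionaries returned by `torch.load(path, map_location="cpu")`. The helper names and the example keys are hypothetical, and stripping a `module.` prefix (which `nn.DataParallel` adds when a wrapped model is saved) is an assumption about how the checkpoints were produced.

```python
def strip_prefix(keys, prefix="module."):
    """Drop a leading prefix, such as the one added by nn.DataParallel."""
    return {k[len(prefix):] if k.startswith(prefix) else k for k in keys}


def diff_keys(source_keys, target_keys):
    """Return (missing, unexpected) key lists relative to the target."""
    source = strip_prefix(source_keys)
    target = strip_prefix(target_keys)
    missing = sorted(target - source)     # required by target, absent in source
    unexpected = sorted(source - target)  # present in source, unused by target
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    cls_keys = ["module.conv1.weight", "classifier.bias"]
    pose_keys = ["conv1.weight", "final_layer.weight"]
    print(diff_keys(cls_keys, pose_keys))
```

Inspecting the `missing` list shows which layers the target model expects but the classification checkpoint does not provide.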

How to get class label values and a confusion matrix at test time? Accuracy alone is not sufficient.

I tried with "HRNet-Image-Classification/lib/core/function.py" (attached as function.txt). In this file, at line 104:

for i, (input, target) in enumerate(val_loader):
    output = model(input)
    batch_time.update(time.time() - end)
    target = target.cuda(non_blocking=True)
    loss = criterion(output, target)

Here I inspected the target variable. It shows correct values at training time, but at test time it shows the label of the class folder in which the image is stored rather than the predicted value. Can anyone tell me how to get class label values for "test.py"? The target variable shows inaccurate values during testing.
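A minimal sketch of how predicted labels and a confusion matrix could be collected, under the assumption that `target` holds the ground-truth labels from the dataset and that the predicted class is the index of the largest score in `output` (in PyTorch, `output.argmax(dim=1)`). The helper names are hypothetical and the example uses plain Python lists instead of tensors.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


def confusion_matrix(targets, predictions, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix


if __name__ == "__main__":
    outputs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # per-sample class scores
    targets = [1, 0, 0]                             # ground-truth labels
    predictions = [argmax(scores) for scores in outputs]
    print(predictions)
    print(confusion_matrix(targets, predictions, num_classes=2))
```

Accumulating `predictions` and `targets` over all validation batches and building the matrix once at the end gives per-class results in addition to the overall accuracy.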

Could you tell me the version of ImageNet?

To get the validation metrics, which ImageNet version should I use: ILSVRC2012 or ILSVRC2017?

new pretrained model "HRNet-W48-C (w/ CosineLR + CutMix + 300epochs)"

Hello. Thank you for offering newly added checkpoints! When I tried to use the one "HRNet-W48-C (w/ CosineLR + CutMix + 300epochs)", my pytorch model loader and tar extractor couldn't work. Could you please release another version, or tell me the correct way to use it? Thanks!

This code is very unfriendly to windows, and it took me two hours to make it run on windows.

Could someone help to put the pretrained models in google drive?

I am interested in the following models:
HRNETV2_W18: "./pretrained_models/hrnetv2_w18_imagenet_pretrained.pth"
HRNETV2_W32: "./pretrained_models/hrnetv2_w32_imagenet_pretrained.pth"
HRNETV2_W48: "./pretrained_models/hrnetv2_w48_imagenet_pretrained.pth"

I am not able to download them from baidu. I wonder if someone can help to put them in google drive.
Many thanks.

Would the results be better or worse if the nearest-neighbor resize were replaced with a transposed convolution? Nearest-neighbor upsampling seems somewhat inelegant to me.

Thank you for this excellent work!
If there are any experiments that answer the question above, please let us know :)