
pycls's Introduction

pycls

Support Ukraine

pycls is an image classification codebase, written in PyTorch. It was originally developed for the On Network Design Spaces for Visual Recognition project. pycls has since matured and been adopted by a number of projects at Facebook AI Research.

pycls provides a large set of baseline models across a wide range of flop regimes.

Introduction

The goal of pycls is to provide a simple and flexible codebase for image classification. It is designed to support rapid implementation and evaluation of research ideas. pycls also provides a large collection of baseline results (Model Zoo). The codebase supports efficient single-machine multi-gpu training, powered by the PyTorch distributed package, and provides implementations of standard models including ResNet, ResNeXt, EfficientNet, and RegNet.

Using pycls

Please see GETTING_STARTED for brief installation instructions and basic usage examples.
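
For a quick feel of the API, here is a minimal sketch of building and running a pretrained model via the model-zoo helpers; the helper name and the "400MF" identifier follow the usage patterns shown in the issues further down this page and may differ across versions, so see GETTING_STARTED for the authoritative examples.

    import torch
    import pycls.models

    # Build a pretrained RegNetY-400MF from the Model Zoo (downloads weights).
    model = pycls.models.regnety("400MF", pretrained=True)
    model.eval()

    # Run a dummy ImageNet-sized batch through it.
    with torch.no_grad():
        logits = model(torch.zeros(1, 3, 224, 224))
    print(logits.shape)  # expected: torch.Size([1, 1000])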

Model Zoo

We provide a large set of baseline results and pretrained models available for download in the pycls Model Zoo, including the simple, fast, and effective RegNet models that we hope can serve as solid baselines across a wide range of flop regimes.

Sweep Code

The pycls codebase now provides powerful support for studying design spaces and, more generally, population statistics of models, as introduced in On Network Design Spaces for Visual Recognition and Designing Network Design Spaces. The idea is that instead of planning a single pycls job (e.g., testing a specific model configuration), one can study the behavior of an entire population of models. This allows for quite powerful and succinct experimental design, and elevates the study of individual model behavior to the study of the behavior of model populations. Please see SWEEP_INFO for details.

Projects

A number of projects at FAIR have been built on top of pycls:

If you are using pycls in your research and would like to include your project here, please let us know or send a PR.

Citing pycls

If you find pycls helpful in your research or refer to the baseline results in the Model Zoo, please consider citing an appropriate subset of the following papers:

@InProceedings{Radosavovic2019,
  title = {On Network Design Spaces for Visual Recognition},
  author = {Ilija Radosavovic and Justin Johnson and Saining Xie and Wan-Yen Lo and Piotr Doll{\'a}r},
  booktitle = {ICCV},
  year = {2019}
}

@InProceedings{Radosavovic2020,
  title = {Designing Network Design Spaces},
  author = {Ilija Radosavovic and Raj Prateek Kosaraju and Ross Girshick and Kaiming He and Piotr Doll{\'a}r},
  booktitle = {CVPR},
  year = {2020}
}

@InProceedings{Dollar2021,
  title = {Fast and Accurate Model Scaling},
  author = {Piotr Doll{\'a}r and Mannat Singh and Ross Girshick},
  booktitle = {CVPR},
  year = {2021}
}

License

pycls is released under the MIT license. Please see the LICENSE file for more information.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

pycls's People

Contributors

alimbekovkz, amyreese, dmitryvinn, flystarhe, igorsugak, ir413, mannatsingh, pdollar, r-barnes, rahulg, rajprateek, sdebnathusc, shoufachen, stanislavglebik, thatch, theschnitz


pycls's Issues

size mismatch for stem.conv.weight: copying a param of torch.Size([64, 3, 7, 7]) from checkpoint, where the shape is torch.Size([16, 3, 3, 3]) in current model. size mismatch for stem.bn.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([16]) in current model.

size mismatch for stem.conv.weight: copying a param of torch.Size([64, 3, 7, 7]) from checkpoint, where the shape is torch.Size([16, 3, 3, 3]) in current model.
size mismatch for stem.bn.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([16]) in current model.

Waiting For the RegNet

Hi, thanks for the codebase.
I wonder when the RegNet pre-trained models will be released.

About Fig.5 in RegNet paper.

Hi, I'm confused about Fig. 5 (left, middle) in the RegNet paper.

I know Fig. 5 (left) shows results under the condition of a shared bottleneck ratio b_i = b, but which b did you choose to get the results? Or are the conclusions the same for every specific value of b? I have the same confusion about the group width g in the middle figure.

Did I miss something?

Cannot load pretrain model.

Thanks for the nice work!

My env:
Win10 + PyTorch 1.6

When I try to use
model = pycls.models.regnety(model_cate, pretrained=True)

I just get bad results, the same as when I set pretrained=False.

I also tried to load the weights myself with
model.load_state_dict(torch.load(load_path), strict=True)

and got the error below:

model.load_state_dict(torch.load(load_path), strict=True)
  File "F:\Software\Anaconda\envs\pth\lib\site-packages\torch\nn\modules\module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RegNet:
Missing key(s) in state_dict: "stem.conv.weight", "stem.bn.weight", "stem.bn.bias", "stem.bn.running_mean", "stem.bn.running_var", "s1.b1.proj.weight", "s1.b1.bn.weight", ... (every parameter and buffer of the model is listed as missing) ..., "head.fc.weight", "head.fc.bias".
Unexpected key(s) in state_dict: "epoch", "model_state", "optimizer_state", "cfg".

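The unexpected keys show that the downloaded file is a training checkpoint (a dict with "epoch", "model_state", "optimizer_state", and "cfg") rather than a bare state dict. Below is a minimal sketch of loading it, assuming the weights live under "model_state" and the architecture is rebuilt with the matching config via the model-zoo helper; the helper name and the "32GF" identifier are assumptions to be checked against your pycls version.

    import torch
    import pycls.models

    load_path = "RegNetY-32GF_dds_8gpu.pyth"  # hypothetical local checkpoint path
    checkpoint = torch.load(load_path, map_location="cpu")
    # dict_keys(['epoch', 'model_state', 'optimizer_state', 'cfg'])
    print(checkpoint.keys())

    # Build the matching architecture, then load the weights stored under "model_state".
    model = pycls.models.regnety("32GF", pretrained=False)
    model.load_state_dict(checkpoint["model_state"], strict=True)

Shape mismatches like the ones quoted at the top of this issue list usually mean the architecture being built does not match the checkpoint's config.
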
Release additional pre-trained models (if available)

First, thank you for this contribution and the release of code and pre-trained models. I have trained two of the smaller settings and get similar error rates to the released models.

In the MODEL_ZOO.md readme, you mention that "the reported errors are averaged across 5 reruns for robust estimates". If these additional models (5 models per setting) are saved somewhere and match the current codebase, it would be a valuable addition to what is currently released. For instance, research in ensemble methods or analysis in variations across models would benefit greatly from this contribution.

For the larger settings (e.g., RegNetX-32GF at 76 train hours for 8 GPUs), training 5 models would take over two weeks on 8 GPUs, making it difficult for most researchers to do. Thanks!

How to use another dataset to train and test?

How can I use another dataset for training and testing?
What is the size of ImageNet?

size mismatch for head.fc.bias: copying a param of torch.Size([1000]) from checkpoint, where the shape is torch.Size([10]) in current model.

Fail to use torch.utils.tensorboard when training with multi-gpu

Hi all,
I was trying to log information with TensorBoard, so I saved the loss and accuracy at the end of both train_epoch and test_epoch. Everything was fine when training with a single GPU, but it failed with multiple GPUs: the browser just shows "No dashboards are active for the current data set...".
Has anyone else run into this?

P.S. My environment was pulled from the nvidia-docker image nvcr.io/nvidia/pytorch:19.10-py3
(docker image details)

Thanks!
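
A common cause (a guess, not a confirmed diagnosis for this setup): with multi-process data-parallel training, every process writes its own event files, which can leave TensorBoard with no usable run. One usual pattern is to create the SummaryWriter only in the master process; here is a minimal sketch, assuming a torch.distributed setup.

    import torch.distributed as dist
    from torch.utils.tensorboard import SummaryWriter

    def is_master():
        # rank 0 only; falls back to True for single-process runs
        return not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0

    writer = SummaryWriter(log_dir="runs/exp") if is_master() else None

    # inside train_epoch / test_epoch, guard every write:
    if writer is not None:
        writer.add_scalar("train/loss", 0.123, global_step=0)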

module 'pycls.core' has no attribute 'builders'

I installed pycls following GETTING_STARTED.md.
I run "python ./tools/train_net.py --cfg ./configs/dds_baselines/regnetx/RegNetX-400MF_dds_8gpu.yaml OUT_DIR ./tmp".
I get this error:
Traceback (most recent call last):
  File "./tools/train_net.py", line 12, in <module>
    import pycls.core.trainer as trainer
  File "/home/ex/pycls-master/pycls/core/trainer.py", line 14, in <module>
    import pycls.core.builders as builders
  File "/home/ex/pycls-master/pycls/core/builders.py", line 12, in <module>
    from pycls.models.anynet import AnyNet
  File "/home/ex/pycls-master/pycls/models/__init__.py", line 10, in <module>
    from pycls.models.model_zoo import effnet, regnetx, regnety, resnet, resnext
  File "/home/ex/pycls-master/pycls/models/model_zoo.py", line 12, in <module>
    import pycls.core.builders as builders
AttributeError: module 'pycls.core' has no attribute 'builders'

Thanks

How to sample models for Figure 11 in RegNet paper

Hi, I noticed that 100 models are sampled to get the results shown in Figure 11 (Sec. 4).

However, since the flops in the figure span a wide range (0.2B to 12.8B), I don't know whether

the total number of models across all flop regimes is 100, or

you sampled 100 models for each flop regime.

would you tell us how to prepare imagenet dataset?

Hi,
After going through the code, I noticed this line:

self._class_ids = sorted(

It seems that the ImageNet val set does not store its images in per-class subdirectories the way the train set does. Why is the dataset implemented like this? Would you please tell us how to prepare the ImageNet dataset so that we can reproduce the results in the model zoo?
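
For reference, here is a minimal sketch of the per-class layout that loaders built around sorted class directories typically expect; the paths, split names, and file extension are assumptions for illustration, not pycls documentation.

    from pathlib import Path

    # Assumed layout: one subdirectory per class (WordNet ID) under each split:
    #   imagenet/train/n01440764/*.JPEG ... imagenet/train/n15075141/*.JPEG
    #   imagenet/val/n01440764/*.JPEG   ... imagenet/val/n15075141/*.JPEG
    root = Path("/path/to/imagenet")
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.exists():
            continue
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        n_images = sum(1 for d in class_dirs for _ in d.glob("*.JPEG"))
        print(split, len(class_dirs), "classes,", n_images, "images")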

How to add agent?

If I change the model_builder to a trainable reinforcement learning agent and want to use the multi-GPU training code, what should I do?
Thanks!

Plan to support the design space comparison

Hi @rajprateek, @ir413,
Thanks for your team's great work; it provides many insights to the community. I am sure the model zoo and the current codebase will inspire a lot of future research.

I am also a little curious about the future plans for this codebase. Do you have any plans to support design space comparison in this repo? For example, allowing users to sample and train models from different design spaces and then compare those design spaces, as described in Sec. 3.1 and shown in Figs. 5, 7, and 9 of the paper. I think this feature would help the community reproduce the comparison process and further increase the codebase's impact.

time_net.py gives different results from those in the Model Zoo

Hi - I appreciate there's already an open issue related to speed, but mine is slightly different.

When I run
python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml
having changed GPUS: from 8 to 1, I get the following dump. I am running this on a batch of size 64, with input resolution 224x224, on a V100, as stated in the paper.

[screenshot: time_net.py output for RegNetX-1.6GF]
This implies a forward pass of ~62ms, not the 33ms stated in MODEL_ZOO. Have I done something wrong? Not sure why the times are so different. The other numbers (acts, params, flops) all seem fine. The latency differences are seen for other models as well - here is 800MF (39ms vs model zoo's 21ms):
[screenshot: time_net.py output for RegNetX-800MF]

I am using commit a492b56, not the latest version of the repo, but MODEL_ZOO has not been changed since before this commit. This is because with that commit it is possible to time the models on dummy data rather than having to construct a dataset. Would it be possible to have an option to do this? I can open a separate issue as a feature request for consideration if necessary.


Any plans to enable transfer learning?

`get_loss_fun` err message

assert cfg.MODEL.LOSS_FUN in _loss_funs.keys(), err_str.format(cfg.TRAIN.LOSS)

should be changed to:

assert cfg.MODEL.LOSS_FUN in _loss_funs.keys(), err_str.format(cfg.MODEL.LOSS_FUN)

Question about speed

Hello, I tested the inference speed of RegNetX-8.0GF and RegNetX-600MF on a P40, both with batch_size = 1 and input_size = 224x224, averaged over 50 runs.

The result is that the average inference time of RegNetX-8.0GF is 22ms and that of RegNetX-600MF is 15ms. Why are the FLOPs of the two so different while the inference times are not?

Also, the inference time of MobileNetV1 is only 3.3ms. Are there plans to release a RegNet model that is this fast?

Thanks a lot~

Bottlenecked by Dataloader

Hello Everyone

I am running some experiments using pycls and despite my best efforts, I was not able to run RegNetX-200MF_dds_8gpu.yaml without being bottlenecked by the data loader.

As a minimal example I did the following:

I ran this config on PyTorch 1.4.0, CUDA 10.1 in accordance with #79. (Full env below)
python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-200MF_dds_8gpu.yaml
[screenshot: time_net.py output for RegNetX-200MF]

When I start training I get an ETA of roughly 3d20h, while you were able to train the same net in 2.8h on 8 GPUs, so I would expect a ballpark runtime of 20h.
python tools/train_net.py --cfg configs/dds_baselines/regnetx/RegNetX-200MF_dds_8gpu.yaml
[screenshot: train_net.py log output]

This minimal example was run on a Dell Precision 7730. But I have the same problem when executing remotely on a server with 8 GPUs.

I am a bit lost over here so any help would be greatly appreciated!
Best Lukas

environment.yml.txt
python -m cProfile -s cumtime tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-200MF_dds_8gpu.yaml
[screenshot: cProfile output]

implementation of empirical bootstrap

Hi, thank you very much for bringing us this codebase and the recently released model zoo. I'm really interested in your related work.

Would you please provide the empirical bootstrap implementation used in Designing Network Design Spaces?
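
For context while waiting for the official code, here is a minimal numpy sketch of one common form of the empirical bootstrap over a model population (estimating, with an empirical confidence band, the best error reachable when sampling n models); this illustrates the general technique only, not the authors' exact implementation.

    import numpy as np

    def bootstrap_min_error(errors, n=25, reps=10000, alpha=0.05, seed=0):
        """errors: top-1 errors of a population of trained models.
        Repeatedly draw n models with replacement, record the best error,
        and report the mean plus an empirical (alpha/2, 1 - alpha/2) band."""
        rng = np.random.default_rng(seed)
        errors = np.asarray(errors, dtype=float)
        samples = rng.choice(errors, size=(reps, n), replace=True)
        best = samples.min(axis=1)
        lo, hi = np.quantile(best, [alpha / 2, 1 - alpha / 2])
        return best.mean(), (lo, hi)

    # toy usage with synthetic "errors"
    toy_errors = np.random.default_rng(1).uniform(24, 40, size=500)
    print(bootstrap_min_error(toy_errors))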

With AutoAugment (RandAugment) or CutMix?

I have tried to train RegNet variants with strong augmentations, such as AutoAugment or CutMix.

But the performance cannot be improved with them.

For example, I have reproduced the paper's result for RegNetY-400MF, but with CutMix I get only around 69% top-1 accuracy, which is well below the vanilla RegNetY-400MF.

I also tried training the RegNetY for more epochs, but failed to improve the result.

Do you have any experience with strong data augmentations?

Maybe the SE width is wrong?

if se_r:
    w_se = int(round(w_in * se_r))
    self.se = SE(w_b, w_se)

In anynet.py line 192, should w_in be changed to w_b? The input width for the SE block is w_b, not w_in.
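
For reference, here is a minimal sketch of a squeeze-and-excitation block consistent with the snippet above and with the parameter names seen in the checkpoints (f_ex.0 / f_ex.2): the block operates on a tensor with w_b channels, while the reduced width w_se is computed elsewhere (in the quoted code, from w_in * se_r). This is an illustrative reimplementation, not the exact pycls code.

    import torch
    import torch.nn as nn

    class SE(nn.Module):
        """Squeeze-and-excitation: global pool, reduce to w_se, expand back, gate."""
        def __init__(self, w_in, w_se):
            super().__init__()
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.f_ex = nn.Sequential(
                nn.Conv2d(w_in, w_se, 1, bias=True),
                nn.ReLU(inplace=True),
                nn.Conv2d(w_se, w_in, 1, bias=True),
                nn.Sigmoid(),
            )

        def forward(self, x):
            return x * self.f_ex(self.avg_pool(x))

    # toy usage: 64 bottleneck channels, squeeze width 16
    out = SE(64, 16)(torch.randn(2, 64, 8, 8))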

Reproduce the result of RegNetY

Thanks for sharing the code.

I have tried to reproduce the result of RegNetY-400MF, but failed.

I changed num_gpus from 8 (in the original configuration) to 4 and ran the command as the readme suggests.

python tools/train_net.py --cfg configs/dds_baselines/regnety/RegNetY-400MF_dds_8gpu.yaml

I get only 68%-70% top-1 accuracy, which is well below the official result.

Here is my environment.

V100 x 4
pytorch 1.6
CUDA 10.1

What could be the reason?
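
One possible factor (a guess, not a confirmed diagnosis): the dds_8gpu configs specify the total batch size and base learning rate for 8 GPUs, so halving the GPU count without adjusting them changes the per-GPU load and the effective schedule. If the linear scaling rule is applied, both values are scaled by the same factor; here is a tiny sketch of the arithmetic, with placeholder reference values that should be read from the yaml.

    # Linear scaling rule sketch: going from 8 GPUs to 4, scale the total batch
    # size and the base LR by the same factor (reference values are hypothetical;
    # take them from the config file, e.g. TRAIN.BATCH_SIZE and OPTIM.BASE_LR).
    ref_gpus, new_gpus = 8, 4
    ref_batch, ref_lr = 1024, 0.8  # whatever the yaml specifies
    scale = new_gpus / ref_gpus
    print("TRAIN.BATCH_SIZE", int(ref_batch * scale), "OPTIM.BASE_LR", ref_lr * scale)

The scaled values can then be passed as command-line overrides, the same way OUT_DIR is overridden elsewhere on this page.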

Should the Residual Block drop the last conv1x1 if b=1?

Hello,
As the title describes, we can drop the last conv1x1 if no bottleneck (b=1) is used. I think I read this in the paper, but I could not find it in your implementation.

We also observe that the best models use a bottleneck ratio b of 1.0 (top-middle), which effectively removes the bottleneck (commonly used in practice).

Did you try to drop either the first or last conv1x1 in your experiments?
Thank you.

The data augmentation in dataloader

Hi, thanks for this repo. In https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md, you say that your primary goal is to provide simple and strong baselines that are easy to reproduce. However, I found that the repo still uses PCA random lighting for data augmentation. After removing PCA random lighting, the performance drops; I tested this on ResNet-50 (23.4677 vs 23.2) and EfficientNet-B0 (25.52 vs 24.9). I believe it would make reproduction easier if this repo also reported results with only basic transformations, considering that other data loaders (e.g., NVIDIA DALI) may find PCA lighting hard to implement :)
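
For context, here is a minimal sketch of the AlexNet-style PCA lighting augmentation being discussed; the eigenvalues/eigenvectors below are the commonly used ImageNet RGB statistics, and both they and the noise scale should be treated as assumptions rather than the repo's exact values.

    import numpy as np

    # Commonly used ImageNet RGB PCA statistics (AlexNet-style lighting noise).
    EIG_VALS = np.array([0.2175, 0.0188, 0.0045])
    EIG_VECS = np.array([
        [-0.5675,  0.7192,  0.4009],
        [-0.5808, -0.0045, -0.8140],
        [-0.5836, -0.6948,  0.4203],
    ])

    def pca_lighting(img, alpha_std=0.1, rng=np.random.default_rng()):
        """img: HxWx3 float array in [0, 1]. Adds one random RGB offset, drawn
        along the principal components of ImageNet pixel values, to all pixels."""
        alpha = rng.normal(0.0, alpha_std, size=3)
        offset = EIG_VECS @ (alpha * EIG_VALS)
        return np.clip(img + offset, 0.0, 1.0)

    # toy usage
    out = pca_lighting(np.random.default_rng(0).random((224, 224, 3)))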

Would you please list the accuracy of the models?

Hi,

Thanks for bringing this codebase to us. I noticed that, with the configuration in this codebase, EfficientNet-B0 is trained for 50 epochs, which is fewer than in the paper. Can this configuration reach the same accuracy as reported in the paper? Would you please list the accuracy of the provided configurations?

"top1_err": 0.0000, "top5_err": 0.0000

My errors become zero within a single epoch. Is this to be expected? I am training ResNeXt-101.

[trainer.py: 165]: Start epoch: 1
[meters.py: 153]: json_stats: {"_type": "train_iter", "epoch": "1/100", "eta": "21,12:49:36", "iter": "10/931", "loss": 0.9199, "lr": 0.0125, "mem": 10038, "time_avg": 19.9869, "time_diff": 7.1350, "top1_err": 3.1250, "top5_err": 1.5625}
[meters.py: 153]: json_stats: {"_type": "train_iter", "epoch": "1/100", "eta": "17,03:15:09", "iter": "20/931", "loss": 0.0001, "lr": 0.0125, "mem": 10038, "time_avg": 15.9058, "time_diff": 0.4157, "top1_err": 0.0000, "top5_err": 0.0000}
[meters.py: 153]: json_stats: {"_type": "train_iter", "epoch": "1/100", "eta": "17,18:20:00", "iter": "30/931", "loss": 0.0158, "lr": 0.0125, "mem": 10038, "time_avg": 16.4908, "time_diff": 5.4462, "top1_err": 0.0000, "top5_err": 0.0000}
[meters.py: 153]: json_stats: {"_type": "train_iter", "epoch": "1/100", "eta": "16,18:25:35", "iter": "40/931", "loss": 0.0339, "lr": 0.0125, "mem": 10038, "time_avg": 15.5678, "time_diff": 0.4202, "top1_err": 0.0000, "top5_err": 0.0000}

Question about stem_w

Does the stem width parameter stem_w stay as 32 for all RegNet models, or does it follow the initial width w_0?

Why did you compare RegNet with your EfficientNet results instead of original EfficientNet results from the paper?

1. Why did you compare RegNet with your own EfficientNet results instead of the original EfficientNet results from the paper (https://arxiv.org/abs/1905.11946, Table 2)?

2. Why didn't you use the enhancements (DropPath, more epochs, RMSProp, AutoAugment, ...) from Table 7 of the RegNet paper for training RegNet?

PyTorch 1.4 is OK; PyTorch 1.2 fails

PyTorch 1.2 hits the following error:

  File "tools/train_net.py", line 255, in <module>
    main()
  File "tools/train_net.py", line 251, in main
    single_proc_train()
  File "tools/train_net.py", line 229, in single_proc_train
    train_model()
  File "tools/train_net.py", line 161, in train_model
    model = model_builder.build_model()
  File "xxxx/pycls/pycls/core/model_builder.py", line 36, in build_model
    model = _models[cfg.MODEL.TYPE]()
  File "xxxx/pycls/pycls/models/resnet.py", line 234, in __init__
    self._construct_cifar()
  File "xxxx/pycls/pycls/models/resnet.py", line 248, in _construct_cifar
    self.s1 = ResStage(w_in=16, w_out=16, stride=1, d=d)
  File "xxxx/pycls/pycls/models/resnet.py", line 164, in __init__
    super(ResStage, self).__init__()
  File "xxxx/anaconda3/envs/pytorch12/lib/python3.7/site-packages/torch/nn/modules/module.py", line 72, in __init__
    self._construct()
TypeError: _construct() missing 6 required positional arguments: 'w_in', 'w_out', 'stride', 'd', 'w_b', and 'num_gs'

Figure 15. vs Eqn. (2-4)

Hi, I'm confused by the numbers in Figure 15.

Take RegNetX-3.2GF for example. The parameters are as follows:

d = [2, 6, 15, 2]
w = [96, 192, 432, 1008]
wa = 26
w0 = 88
wm = 2.2

I can't get w = 96 for the first stage from either Eqn. 2, u_j = w_0 + w_a * j, or Eqn. 4, w_j = w_0 * w_m^(s_j).

Did I miss something?
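
A possible explanation (hedged): the per-stage widths are not the raw outputs of Eqn. 2-4; they are first quantized (snapped to a power of w_m relative to w_0, then rounded to a multiple of 8) and afterwards adjusted to be divisible by the group width, which is what bumps 88 up to 96 for the first stage. Here is a simplified sketch of that procedure, assuming the unrounded RegNetX-3.2GF values are roughly w_a = 26.31, w_0 = 88, w_m = 2.25, d = 25, g = 48 (Figure 15 reports rounded parameters); this is an illustration, not the exact pycls code.

    import numpy as np

    def regnet_stage_widths(w_a, w_0, w_m, d, g, q=8):
        u = np.arange(d) * w_a + w_0                   # u_j = w_0 + w_a * j   (Eqn. 2)
        s = np.round(np.log(u / w_0) / np.log(w_m))    # s_j                   (Eqn. 3)
        w = w_0 * np.power(w_m, s)                     # w_j = w_0 * w_m^s_j   (Eqn. 4)
        w = (np.round(w / q) * q).astype(int)          # round to a multiple of q
        stage_ws = sorted(set(w.tolist()))             # unique per-stage widths
        # adjust each stage width to be divisible by the group width g
        return [int(round(ws / g) * g) for ws in stage_ws]

    print(regnet_stage_widths(26.31, 88, 2.25, 25, 48))  # -> [96, 192, 432, 1008]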

How to do transfer learning with these models?

I have tried to load the models but failed:

from pycls.models.regnet import RegNet

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = RegNet()
# optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load('model/RegNetY-32GF_dds_8gpu.pyth', map_location=device)['model_state']
model.load_state_dict(checkpoint)

but got this error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-19-2467f7a101a2> in <module>
      9         checkpoint[key.replace('model.', '')] = checkpoint[key]
     10         del checkpoint[key]
---> 11 model.load_state_dict(checkpoint)

c:\users\neo\.conda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in load_state_dict(self, state_dict, strict)
    828         if len(error_msgs) > 0:
    829             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 830                                self.__class__.__name__, "\n\t".join(error_msgs)))
    831         return _IncompatibleKeys(missing_keys, unexpected_keys)
    832 

RuntimeError: Error(s) in loading state_dict for RegNet:
	Missing key(s) in state_dict: "s1.b3.f.a.weight", "s1.b3.f.a_bn.weight", "s1.b3.f.a_bn.bias", "s1.b3.f.a_bn.running_mean", "s1.b3.f.a_bn.running_var", "s1.b3.f.b.weight", "s1.b3.f.b_bn.weight", "s1.b3.f.b_bn.bias", "s1.b3.f.b_bn.running_mean", "s1.b3.f.b_bn.running_var", "s1.b3.f.c.weight", "s1.b3.f.c_bn.weight", "s1.b3.f.c_bn.bias", "s1.b3.f.c_bn.running_mean", "s1.b3.f.c_bn.running_var", "s1.b4.f.a.weight", "s1.b4.f.a_bn.weight", "s1.b4.f.a_bn.bias", "s1.b4.f.a_bn.running_mean", "s1.b4.f.a_bn.running_var", "s1.b4.f.b.weight", "s1.b4.f.b_bn.weight", "s1.b4.f.b_bn.bias", "s1.b4.f.b_bn.running_mean", "s1.b4.f.b_bn.running_var", "s1.b4.f.c.weight", "s1.b4.f.c_bn.weight", "s1.b4.f.c_bn.bias", "s1.b4.f.c_bn.running_mean", "s1.b4.f.c_bn.running_var", "s2.b6.f.a.weight", "s2.b6.f.a_bn.weight", "s2.b6.f.a_bn.bias", "s2.b6.f.a_bn.running_mean", "s2.b6.f.a_bn.running_var", "s2.b6.f.b.weight", "s2.b6.f.b_bn.weight", "s2.b6.f.b_bn.bias", "s2.b6.f.b_bn.running_mean", "s2.b6.f.b_bn.running_var", "s2.b6.f.c.weight", "s2.b6.f.c_bn.weight", "s2.b6.f.c_bn.bias", "s2.b6.f.c_bn.running_mean", "s2.b6.f.c_bn.running_var". 
	**Unexpected key(s) in state_dict:** "s3.b1.proj.weight", "s3.b1.bn.weight", "s3.b1.bn.bias", "s3.b1.bn.running_mean", "s3.b1.bn.running_var", "s3.b1.bn.num_batches_tracked", "s3.b1.f.a.weight", "s3.b1.f.a_bn.weight", "s3.b1.f.a_bn.bias", "s3.b1.f.a_bn.running_mean", "s3.b1.f.a_bn.running_var", "s3.b1.f
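
A minimal sketch of one way transfer learning is often set up with these models; it relies on the model-zoo helper and on the head.fc naming visible in the error messages above, both of which should be verified against your pycls version.

    import torch.nn as nn
    import pycls.models

    # Build a pretrained model via the model-zoo helper (rather than RegNet()),
    # so the architecture matches the downloaded weights.
    model = pycls.models.regnety("32GF", pretrained=True)

    # Replace the classification head for a new task with, say, 10 classes.
    num_classes = 10
    model.head.fc = nn.Linear(model.head.fc.in_features, num_classes)

    # Optionally freeze the backbone and fine-tune only the new head.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("head.fc")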
