naver-ai / rdnet

[ECCV 2024] Official implementation of the paper "DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs".

Home Page: https://arxiv.org/abs/2403.19588

License: Apache License 2.0

Topics: backbone, classification, convolutional-neural-networks, dense-connections, revisit, rdnet, densenet, eccv2024

rdnet's Introduction

🎉🎉 Our paper has been accepted at ECCV 2024! Stay tuned for more updates!! 🎉🎉

Donghyun Kim¹*, Byeongho Heo², Dongyoon Han²*

¹NAVER Cloud AI, ²NAVER AI Lab

Figure: DenseNets Reloaded (DenseNet becomes RDNet).

We revitalize Densely Connected Convolutional Networks (DenseNets) and reveal their untapped potential to challenge the prevalent dominance of ResNet-style architectures. Our research indicates that DenseNets were previously underestimated, primarily due to conventional design choices and training methods that underexploited their full capabilities.


Figure: Trade-off between RDNet (ours) and SOTA models.


Figure: Trade-off between RDNet (ours) and mainstream models.

Key Highlights:

  • Our pilot study (§5.1) reveals the effectiveness of concatenation.
  • We have meticulously upgraded various aspects of DenseNets (§3.2) through architectural tweaks and block redesigns.
  • Our revitalized DenseNets (RDNets) outperform mainstream architectures such as Swin Transformer, ConvNeXt, and DeiT-III (§4.1).

Our work aims to reignite interest in DenseNets by demonstrating their renewed relevance and superiority in the current architectural landscape. We encourage the community to explore and build upon our findings, paving the way for further innovative contributions in deep learning architectures.

We believe that many architectural designs that have recently become popular could be combined successfully with dense connections.

Easy to use

RDNet is available on timm. You can easily use RDNet by installing the timm package.

import timm

model = timm.create_model('rdnet_large', pretrained=True)

For detailed usage, please refer to the Hugging Face model card.
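
As a quick sanity check after loading the model, the following minimal sketch runs a single forward pass; the 'rdnet_large' name comes from the snippet above, and the 1000-class ImageNet-1k head is assumed from timm's defaults.

import timm
import torch

# Minimal sketch: load pretrained RDNet-L from timm and classify a dummy image.
model = timm.create_model('rdnet_large', pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB image
with torch.no_grad():
    logits = model(x)            # shape: (1, 1000) for ImageNet-1k
print(logits.softmax(dim=-1).argmax(dim=-1))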

Updates

  • (2024.07.24): Pip-installable package added.
  • (2024.04.19): Initial release of the repository.
  • (2024.03.28): Paper is available on arXiv.

Coming Soon

  • More ImageNet-22k Pretrained Models.
  • More ImageNet-1k fine-tuned models.
  • Cascade Mask R-CNN with RDNet.
  • Transfer learning with RDNet (CIFAR-10, CIFAR-100, Stanford Cars, ...).

RDNet for Image Classification

For details on object detection and instance segmentation, please refer to detection/README.md.

For details on semantic segmentation, please refer to segmentation/README.md.

Model Zoo

We provide pretrained RDNet models; you can download them from the links below.

ImageNet-1K (pre-)trained models

Model     Image Size  Params  FLOPs  Top-1 (%)  Model Card  URL
RDNet-T   224         22M     5.0G   82.8       model_card  HFHub
RDNet-S   224         50M     8.7G   83.7       model_card  HFHub
RDNet-B   224         87M     15.4G  84.4       model_card  HFHub
RDNet-L   224         186M    34.7G  84.8       model_card  HFHub

ImageNet-1K fine-tuned models

Model          Fine-tuned From  Image Size  Params  FLOPs   Top-1 (%)  Model Card  URL
RDNet-L (384)  RDNet-L          384         186M    101.9G  85.8       model_card  HFHub
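
To check which RDNet variants and pretrained weights are registered in your installed timm version, you can query timm's model registry. This is a minimal sketch; the exact registered names (e.g., 'rdnet_tiny.nv_in1k') depend on the timm release.

import timm

# Minimal sketch: list RDNet variants known to the installed timm version.
print(timm.list_models('rdnet*'))
print(timm.list_models('rdnet*', pretrained=True))  # only variants with pretrained weights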

Training

We provide training curves for the training procedure, generated with the Weights & Biases service. You can view them via the link below.

https://api.wandb.ai/links/dhkim0225/822w2zsj

For training commands, please refer to TRAINING.md.

Acknowledgement

This repository is built on timm, MMDetection, and MMSegmentation.

Citation

@misc{kim2024densenets,
    title={DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs}, 
    author={Donghyun Kim and Byeongho Heo and Dongyoon Han},
    year={2024},
    eprint={2403.19588},
    archivePrefix={arXiv},
}

License

Copyright (c) 2024-present NAVER Cloud Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

License for Dataset

ImageNet - ImageNet Terms of access, https://image-net.org/download

Images from ADE20K - ADE20K Terms of Use, https://groups.csail.mit.edu/vision/datasets/ADE20K/terms/

MS COCO images dataset - Creative Commons Attribution 4.0 License, https://viso.ai/computer-vision/coco-dataset/

rdnet's People

Contributors

dhkim0225, dyhan0920

rdnet's Issues

Pretrained model in timm

Thanks for the great work!

Any plans to add the pretrained models to the timm repo so they can be used as an arbitrary backbone/embedder?
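
For what it is worth, timm's features_only mode can expose a registered model as a multi-scale backbone/embedder. Below is a minimal sketch; the 'rdnet_tiny' name and the default feature stages are assumptions based on timm's registry.

import timm
import torch

# Minimal sketch: use RDNet as a multi-scale feature backbone via timm's
# features_only API. Model name and default feature stages are assumed.
backbone = timm.create_model('rdnet_tiny', pretrained=True, features_only=True)
backbone.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = backbone(x)  # list of feature maps at decreasing spatial resolution
for f in feats:
    print(f.shape)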

Where can I find the CIFAR-10 training code?

I discovered this repository through the "Papers with Code" page for CIFAR-10 image classification. However, it appears that this repository primarily addresses the ImageNet dataset. Could you provide the CIFAR-10 training code to assist with my research? Thanks.

Too many total mult-adds

Thank you for the great research!

import timm
from torchinfo import summary  # torchinfo produces the summary shown below

model_t = timm.create_model('rdnet_tiny.nv_in1k', pretrained=False)
summary(model_t, input_size=(1, 3, 224, 224), device='cpu')

Total params: 23,541,024
Trainable params: 23,541,024
Non-trainable params: 0
Total mult-adds (Units.GIGABYTES): 78.87
Input size (MB): 0.60
Forward/backward pass size (MB): 2651.23
Params size (MB): 94.15
Estimated Total Size (MB): 2745.99


Your paper states that the tiny model's FLOPs are 5G, but the summary above reports about 78G mult-adds.
Of course, my estimation method might not be correct, but when I compare FLOPs against other models, RDNet reports much more.

I think this code may have some unexpected behavior. What do you think could be the cause?

Thanks.
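
For reference, one way to cross-check the paper's FLOPs figure is with a counter that reports multiply-accumulates for a single 224x224 input. Below is a minimal sketch using fvcore (assumed installed); it is not the authors' official measurement script.

import timm
import torch
from fvcore.nn import FlopCountAnalysis

# Minimal sketch: count multiply-accumulates for RDNet-Tiny with fvcore and
# compare against the paper's reported 5.0G. fvcore is assumed to be installed.
model = timm.create_model('rdnet_tiny.nv_in1k', pretrained=False)
model.eval()

flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224))
print(f"{flops.total() / 1e9:.2f} GFLOPs")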

About downstream task and DropPath

Thank you for your great ideas and work.

I have re-implemented the RDNet backbone in TensorFlow and am trying to apply it to Human Pose Estimation (HPE) as a downstream task. I have the following questions:

  1. From what I understand after reading the RDNet paper, the RDNet backbone uses Stochastic Depth (DropPath), but RDNet's DenseBlock class only declares DropPath and doesn't seem to use it. Is this intended? If so, please explain why it is not used. (See the sketch after this list for typical DropPath wiring.)
  2. As a downstream task, I am trying to train a model by applying RDNet to HPE, but it does not train well. I have tried modifying hyperparameters such as batch size and LR several times, but the model still does not train well. When applying the RDNet backbone to a downstream task, could training become sensitive or fail to converge?
    The training conditions are as follows:
    • Backbone: RDNet-Tiny (pretrained: no)
    • LR Scheduler: MultiStepLR
    • LR: 0.001 (also tried lower LRs)
    • Batch size: 64 (also tried 16 and 32)
    • Optimizer: Adam
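
For reference, below is a generic sketch of how DropPath is typically wired into a dense-style block, with the newly produced features stochastically dropped before concatenation. This is illustrative only and is not the repository's actual DenseBlock implementation; the class and argument names are made up.

import torch
import torch.nn as nn
from timm.layers import DropPath  # import path assumes timm >= 0.9

# Illustrative only: not the repository's DenseBlock.
class DenseBlockSketch(nn.Module):
    def __init__(self, in_dim, growth_rate, drop_path_rate=0.0):
        super().__init__()
        self.mixer = nn.Sequential(
            nn.Conv2d(in_dim, growth_rate, kernel_size=3, padding=1),
            nn.BatchNorm2d(growth_rate),
            nn.GELU(),
        )
        self.drop_path = DropPath(drop_path_rate) if drop_path_rate > 0.0 else nn.Identity()

    def forward(self, x):
        new_features = self.drop_path(self.mixer(x))  # drop the whole new branch per sample
        return torch.cat([x, new_features], dim=1)    # dense connection via concatenation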

