Giter Site home page Giter Site logo

cliquenet's Introduction

CliqueNet

This repository is for Convolutional Neural Networks with Alternately Updated Clique (to appear in CVPR 2018, Oral presentation),

by Yibo Yang, Zhisheng Zhong, Tiancheng Shen, and Zhouchen Lin.

citation

If you find CliqueNet useful in your research, please consider citing:

@article{yang18,
 author={Yibo Yang and Zhisheng Zhong and Tiancheng Shen and Zhouchen Lin},
 title={Convolutional Neural Networks with Alternately Updated Clique},
 journal={arXiv preprint arXiv:1802.10419},
 year={2018}
}

table of contents

Introduction

CliqueNet is a newly proposed convolutional neural network architecture where any pair of layers in the same block are connected bilaterally (Fig 1). Any layer is both the input and output another one, and information flow can be maximized. During propagation, the layers are updated alternately (Fig 2), so that each layer will always receive the feedback information from the layers that are updated more lately. We show that the refined features are more discriminative and lead to a better performance. On benchmark classification datasets including CIFAR-10, CIFAR-100, SVHN, and ILSVRC 2012, we achieve better or comparable results over state of the arts with fewer parameters. This repo contains the code of our project, and also provides some experimental results that are out of the paper.

Fig 1. An illustration of a block with 4 layers. Node 0 denotes the input layer of this block.

Fig 2. Alternate updating rule in CliqueNet. "{}" denotes the concatenating operator.

Usage

  • Our experiments are conducted with TensorFlow in Python 2.
  • Clone this repo: git clone https://github.com/iboing/CliqueNet
  • An example to train a model on CIFAR or SVHN:
python train.py --gpu [gpu id] --dataset [cifar-10 or cifar-100 or SVHN] --k [filters per layer] --T [all layers of three blocks] --dir [path to save models]
  • Additional techniques (optional): if you want to use attentional transition, bottleneck architecture, or compression strategy in our paper, add --if_a True, --if_b True, and --if_c True, respectively.

Ablation experiments

With the feedback connections, CliqueNet alternately re-update previous layers with updated layers, to enable refined features. The weights among layers are re-used for multiple times, so that a deeper representation space can be attained with a fixed number of parameters. In order to test the effectiveness of CliqueNet's feature refinement, we analyze the features generated in different stages by conducting experiments using different versions of CliqueNet. As illustrated by Fig 3, the CliqueNet(I+I) only uses Stage-I feature. The CliqueNet(I+II) uses Stage-I feature concatenated with input layer as the block feature, but transits Stage-II feature into the next block. The CliqueNet(II+II) only uses refined features.

Fig 3. A schema for CliqueNet(i+j), i,j belong to {I,II}.

Model block feature transit error(%)
CliqueNet(I+I) { X_0, Stage-I } Stage-I 6.64
CliqueNet(I+II) { X_0, Stage-I } Stage-II 6.10
CliqueNet(II+II) { X_0, Stage-II } Stage-II 5.76

Tab 1. Resutls of different versions of CliqueNets.

To run the experiments above, please modify train.py as:

from models.cliquenet_I_I import build_model

for CliqueNet(I+I), and

from models.cliquenet_I_II import build_model

for CliqueNet(I+II).

We further consider a situation where the feedback is not processed entirely. Concretely, when k=64 and T=15, we use the Stage-II feature, but only the first X steps, see Fig 2. Then X=0 is just the case of CliqueNet(I+I), and X=5 corresponds to CliqueNet(II+II).

Model CIFAR-10 CIFAR-100
CliqueNet(X=0) 5.83 24.79
CliqueNet(X=1) 5.63 24.65
CliqueNet(X=2) 5.54 24.37
CliqueNet(X=3) 5.41 23.75
CliqueNet(X=4) 5.20 24.04
CliqueNet(X=5) 5.12 23.73

Tab 2. Performance of CliqueNets with different X.

To run the experiments with different X, modify train.py as:

from models.cliquenet_X import build_model

and set the value of X in ./models/cliquenet_X.py

Comparison with state of the arts

The results listed below demonstrate the superiority of CliqueNet over DenseNet when there are no additional techniques (bottleneck, compression, etc.).

Model FLOPs Params CIFAR-10 CIFAR-100 SVHN
DenseNet (k = 12, T = 36) 0.53G 1.0M 7.00 27.55 1.79
DenseNet (k = 12, T = 96) 3.54G 7.0M 5.77 23.79 1.67
DenseNet (k = 24, T = 96) 13.78G 27.2M 5.83 23.42 1.59
CliqueNet (k = 36, T = 12) 0.91G 0.94M 5.93 27.32 1.77
CliqueNet (k = 64, T = 15) 4.21G 4.49M 5.12 23.98 1.62
CliqueNet (k = 80, T = 15) 6.45G 6.94M 5.10 23.32 1.56
CliqueNet (k = 80, T = 18) 9.45G 10.14M 5.06 23.14 1.51

Tab 3. Main results on CIFAR and SVHN without data augmentation.

Because larger T would lead to higher computation cost and slightly more parameters, we prefer using a larger k in our experiments. To make comparisons more fair, we also consider the situation where k and T of DenseNets and CliqueNets are exactly the same, see Tab 4.

Model Params CIFAR-10 CIFAR-100
DenseNet(k=12,T=36) 1.02M 7.00 27.55
CliqueNet(k=12,T=36) 1.05M 5.79 26.85
DenseNet(k=24,T=18) 0.99M 7.13 27.70
CliqueNet(k=24,T=18) 0.99M 6.04 26.57
DenseNet(k=36,T=12) 0.96M 6.89 27.54
CliqueNet(k=36,T=12) 0.94M 5.93 27.32

Tab 4. Comparisons with the same k and T.

Note that the result of DenseNet(k=12, T=36) is reported by original paper. The others are implementated by ourselves under the same experimental settings.

Results on ImageNet

Our code for experiments on ImageNet with TensorFlow will be released soon.

Here we provide a PyTorch version to train a CliqueNet on ImageNet. An example to run:

python train_imagenet.py [path to the imagenet dataset]

(As the default, CliqueNet-S3 is trained, batchsize is 160 and attentional transition is used.)

cliquenet's People

Contributors

iboing avatar zs-zhong avatar stc1995 avatar

Watchers

James Cloos avatar Jason Lee avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.