zeiss-microscopy / bsconv


Reference implementation for Blueprint Separable Convolutions (CVPR 2020)

License: BSD 3-Clause Clear License

cvpr2020 pytorch depthwise-separable-convolutions resnet mobilenet image-classification deep-learning efficient-neural-networks zeiss cifar10


bsconv's Issues

How is BSConv being utilized in MobileNet V2 and V3?

Great paper!
Just one small question:
It seems that you have not altered the structure of MobileNet V2 and V3, because they sort of already have BSConv built in?
Does this imply that the accuracy gain (especially on CIFAR) comes purely from the proposed orthonormal regularization loss?

About the PCA in section 3.1 of the paper.

Hi, thank you for releasing the code. I have a question and look forward to your answer:
This PCA code, in my opinion, reduces the dimensionality of the features (K*K) and shows the redundancy of the features within each kernel. How are the intra-kernel correlations derived from this?

step 1: split 3D kernel F into 2D kernels (assuming F is of size CxHxW)

import numpy as np

xs = [F[nChannel, :, :].flatten() for nChannel in range(F.shape[0])]
X = np.array(xs)

step 2: perform PCA

import sklearn.decomposition
pca = sklearn.decomposition.PCA(n_components=None)
pca.fit(X)

step 3: this is the variance of F which is explained by the first principal component (PC1)

v = pca.explained_variance_ratio_[0]
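Putting the three steps together, a minimal self-contained sketch (with a randomly generated kernel F standing in for a trained filter bank, so the numbers are illustrative only):

```python
import numpy as np
import sklearn.decomposition

# Hypothetical 3D kernel: C depth slices of size K x K (here C=64, K=3)
rng = np.random.default_rng(0)
F = rng.normal(size=(64, 3, 3))

# Step 1: flatten each 2D slice into a row -> data matrix X of shape (C, K*K)
X = np.array([F[c, :, :].flatten() for c in range(F.shape[0])])

# Step 2: PCA across the C slices
pca = sklearn.decomposition.PCA(n_components=None)
pca.fit(X)

# Step 3: fraction of the variance explained by the first principal component
v = pca.explained_variance_ratio_[0]
print(v)
```

For a trained kernel, a value of v close to 1 would indicate that the depth slices are highly correlated (i.e. well approximated by scaled versions of a single 2D blueprint).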

5.3. Fine-grained Recognition

batch size: 128, momentum: 0.9, weight decay: 10^-4, epochs: 100, learning rate: 0.1, linearly decayed at every epoch.
The results of the paper cannot be reproduced. Is there a problem with my hyperparameter settings?

scheduling the learning rate for sub_imagenet datasets.

In Section 5.3 of the paper:
For Stanford Dogs, Stanford Cars, and Oxford 102 Flowers, the learning rate is adjusted as below:
"The initial learning rate is set to 0.1
and linearly decayed at every epoch such that it approaches
zero after a total of 100 epochs."

How is the learning rate adjusted? Is it written in the script (bsconv_pytorch_train.py)?
Is the learning rate adjusted as below?
epoch 0, lr_rate = 0.1
epoch 1, lr_rate = 0.1-(0.1/100)*1
..
epoch n, lr_rate = 0.1-(0.1/100)*n
..
epoch 99, lr_rate = 0.1-(0.1/100)*99
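The schedule sketched above can be expressed as a small helper. This is an assumption about the rule quoted from the paper, not a confirmed excerpt from bsconv_pytorch_train.py:

```python
def linear_lr(epoch, base_lr=0.1, total_epochs=100):
    """Linear decay: starts at base_lr and reaches 0 after total_epochs.

    Matches the rule lr(epoch) = base_lr - (base_lr / total_epochs) * epoch.
    """
    return base_lr - (base_lr / total_epochs) * epoch

print(linear_lr(0))    # base_lr = 0.1
print(linear_lr(1))    # ~0.099
print(linear_lr(99))   # ~0.001
```

In PyTorch this rule could be plugged into `torch.optim.lr_scheduler.LambdaLR` via `lr_lambda=lambda e: 1.0 - e / total_epochs`, since LambdaLR multiplies the base learning rate by the lambda's value.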

About activation layer and inference:

Hello, thanks for releasing the code. I have a couple of questions:

  • I want to confirm the structure of BSConv-S. In my understanding, the module is:
    bsconvS = [(Conv1x1 + BN) --> (Conv1x1 + BN) --> (dw-Conv3x3)]
    and BN and ReLU are only applied after bsconvS; there is no activation in the middle of BSConv-S.
    Does this also hold for the BSConv residual inverted bottleneck of MobileNetV2? So that the transformed block has only one activation, at the end (while the original one has two ReLUs). Specifically:
    Inverted-Residual Block:
    x --> [conv1x1-BN-Act --> conv3x3-BN-Act --> conv1x1-BN] + x
    while BS-Inverted Bottleneck:
    x --> [conv1x1-BN --> conv1x1-BN --> conv3x3-BN-Act] + x

  • If there is no ReLU in the middle of BSConv-S, then during inference, can we merge the first two Conv1x1 layers into a single Conv1x1 (which reduces to BSConv-U) to save computation?

  • Did you compare inference speed between BSConv-S and a regular Conv?

Thank you.
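Regarding the merging question: two consecutive 1x1 convolutions with no activation in between are both linear maps over the channel dimension, so they can indeed be fused into a single 1x1 convolution at inference time. A minimal numpy sketch (assuming BN has already been folded into the weights, and treating each 1x1 conv as a channel-mixing matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_mid, C_out, H, W = 8, 4, 8, 5, 5

W1 = rng.normal(size=(C_mid, C_in))   # first 1x1 conv as a matrix
W2 = rng.normal(size=(C_out, C_mid))  # second 1x1 conv as a matrix
x = rng.normal(size=(C_in, H * W))    # input feature map, flattened spatially

y_two_step = W2 @ (W1 @ x)            # apply both convs in sequence
W_merged = W2 @ W1                    # fuse into a single 1x1 conv
y_merged = W_merged @ x

print(np.allclose(y_two_step, y_merged))  # True
```

The equality follows from matrix associativity; whether fusing actually saves time in practice also depends on C_mid relative to C_in and C_out.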

about Figure 2 in paper

How did you get the histogram of the variance along the depth axis of the filter kernels shown in Figure 2 of the paper? Can you share your code? Thanks!!
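One plausible way to produce such a histogram — purely an assumption about how Figure 2 might have been made, not the authors' code: for each spatial position of each filter, compute the variance along the depth (channel) axis, then histogram those values over all filters.

```python
import numpy as np

# Random stand-in for a trained filter bank of shape (num_filters, C, K, K)
rng = np.random.default_rng(0)
filters = rng.normal(size=(128, 16, 3, 3))

# Variance along the depth axis -> one K x K variance map per filter
depth_var = filters.var(axis=1)             # shape (128, 3, 3)

# Histogram of all per-position variances across all filters
hist, bin_edges = np.histogram(depth_var.ravel(), bins=20)
print(hist.sum())  # 128 * 3 * 3 = 1152 values binned
```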

Ask about adjusting learning rate

Hello!

I read your paper very well and got amazed by your work!

While googling about Wide-ResNet, I found this repo explaining how to train the best Wide-ResNet. What do you think about applying these training details in your bin/bsconv_pytorch_train.py, if and only if they improve results?

If you think this is a good idea, let me know and I will try it.

Thank you.

About BSConv-S

Hello, while reading this paper I ran into a question about BSConv-S: there is a choice of whether to use a BN and activation layer. So when should the BN and activation layer be used in BSConv-S?

MobileNetv3-large baseline accuracy

Hello Manuel! I have read your CVPR2020 paper, and your method is effective on ConvNets.

While following your work, I have problems reproducing the MobileNetV3-large CIFAR-100 baseline, which has 75% accuracy. However, with the following setting:
epochs = 200; SGD with momentum 0.9; weight decay of 10^-4; lr = 0.1, decayed by a factor of 0.1 at epochs 100, 150, and 180,
I can only get an accuracy of around 70%.
I also changed the first two stride=2 layers to stride=1 for MobileNetV3.

Can you share your parameter settings? Or is there anything wrong? Thanks for helping me.
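For reference, the step schedule quoted in this setting can be written as a small helper. This is a sketch of the quoted hyperparameters, not code from the bsconv repository:

```python
def step_lr(epoch, base_lr=0.1, milestones=(100, 150, 180), gamma=0.1):
    """Multiply the learning rate by gamma at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

print(step_lr(0))    # 0.1
print(step_lr(120))  # ~0.01
print(step_lr(181))  # ~0.0001
```

The equivalent in PyTorch would be `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150, 180], gamma=0.1)`.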

Models 'mobilenetv2_w1_bsconvs' and 'mobilenetv2_w1' are identical

I printed out the two models and don't see any difference. To reproduce:

import bsconv.pytorch
model1 = bsconv.pytorch.get_model('mobilenetv2_w1_bsconvs', num_classes=100)
print(model1)

and

import bsconv.pytorch
model2 = bsconv.pytorch.get_model('mobilenetv2_w1', num_classes=100)
print(model2)
