wy1iu / largemargin_softmax_loss

346 stars · 23 watchers · 115 forks · 25.97 MB

Implementation of "Large-Margin Softmax Loss for Convolutional Neural Networks" (ICML 2016).

License: Other

CMake 2.72% Makefile 0.69% Shell 0.42% C++ 79.87% Cuda 6.54% MATLAB 0.88% Python 8.81% Dockerfile 0.07%
l-softmax icml-2016 lsoftmax-loss caffe face-recognition image-recognition deep-learning

largemargin_softmax_loss's People

Contributors

wy1iu, ydwen

largemargin_softmax_loss's Issues

Pairs of testing

As I read in the article, the final representation of a testing face is obtained by concatenating its original face features and its horizontally flipped features.
Do I understand correctly: for the final embedding, should I sum the embedding of the original image and that of the horizontally flipped image, or should I make one vector by attaching one to the end of the other?
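
For what it's worth, the paper's wording suggests concatenation rather than summation. A minimal numpy sketch of the two options, with hypothetical feature vectors (my own illustration, not code from this repository):

import numpy as np

# Hypothetical 512-d features of a face crop and of its horizontally flipped copy.
feat = np.random.randn(512)
feat_flip = np.random.randn(512)

# "Concatenating" as described in the paper: append one vector to the other,
# giving a 1024-d representation.
embedding_concat = np.concatenate([feat, feat_flip])  # shape (1024,)

# The alternative asked about above: element-wise sum, which keeps 512 dims.
embedding_sum = feat + feat_flip                      # shape (512,)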

train accuracy decrease

Hi there!

The same problem as #18 happened to me when training the model, and my case was even more severe: my training accuracy decreased from 90% to 50%. Although it climbed back to a new optimum after a few hundred iterations, I am still confused by this phenomenon.

Thank you for your help!

how to test with input data: there is a problem with bottom: "label"

When I want to write deploy.prototxt, the input is:
input: "data"
input_dim: 25
input_dim: 3
input_dim: 224
input_dim: 224

However, the layer is defined as:
layer {
  name: "fc8"
  type: "LargeMarginInnerProduct"
  bottom: "fc7"
  bottom: "label"
  top: "fc8"
  top: "lambda"
  param {
    name: "fc8"
    lr_mult: 0
  }
  largemargin_inner_product_param {
    num_output: 100
    type: SINGLE
    base: 0
    gamma: 1
    iteration: 0
    lambda_min: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TEST
  }
}
There is an error at bottom: "label". How can I provide bottom: "label" when testing?

train mnist, loss is nan

When I use your mnist_train_test.prototxt to train on MNIST, the loss is nan. What should I do to overcome this issue?

Difficult to train with LargeMargin_Softmax_Loss on cifar10

I have tried to train myexamples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but it always goes like this:

I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244] Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244] Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244] Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244] Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244] Train net output #2: mean_length = inf

How can I tackle this problem?

evaluate LargeMargin_Softmax_Loss on lfw

Thank you for sharing!
I used LargeMargin_Softmax_Loss to train a model on CASIA-WebFace. But what criterion should I use to evaluate this model, Euclidean distance?
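
A common evaluation protocol, offered only as an assumption rather than the authors' exact setup: extract one embedding per image from the layer below the classifier and compare LFW pairs with cosine similarity, thresholding the score. A minimal sketch:

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (higher = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical embeddings of two face crops; the threshold would be chosen on
# a validation split before reporting accuracy on the LFW test pairs.
emb1, emb2 = np.random.randn(512), np.random.randn(512)
same_person = cosine_similarity(emb1, emb2) > 0.5  # threshold is illustrative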

Test the L-Softmax loss on new dataset

Hi,

I used the L-Softmax loss on a new dataset; the accuracy for m=1, m=2, m=3 and m=4 is 93.12%, 92.57%, 92.15% and 91.86% respectively. Why is the accuracy for m=4 worse than for m=1?

Thanks.

Angle margin

Greetings,

I'm reading the paper, and I would appreciate it if anyone could explain why the margin angle is equal to (m - 1) / (m + 1) * theta_{1,2}.

Thanks in advance.
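
A sketch of one way to see this, assuming the binary case discussed in the paper with \|W_1\| = \|W_2\| and zero biases: the L-Softmax decision boundary for class 1 is \cos(m\theta_1) = \cos(\theta_2), i.e. m\theta_1 = \theta_2, and symmetrically m\theta_2 = \theta_1 for class 2. Writing \theta_{1,2} = \theta_1 + \theta_2 for the angle between W_1 and W_2 (for x in their span),

\text{boundary 1: } \theta_1 = \frac{\theta_{1,2}}{m+1}, \qquad
\text{boundary 2: } \theta_1 = \frac{m\,\theta_{1,2}}{m+1}, \qquad
\text{angle margin} = \frac{m\,\theta_{1,2}}{m+1} - \frac{\theta_{1,2}}{m+1} = \frac{m-1}{m+1}\,\theta_{1,2}.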

prototxt for CASIA Webface

Many thanks for your work.

Could you please share the train/test prototxt for the CASIA experiment, or the LargeMarginInnerProduct parameters (largemargin_inner_product_param) used for this "complex" case?

Test the L-Softmax loss on my dataset

################## train ##################
layer {
  name: "fc8_2"
  type: "LargeMarginInnerProduct"
  bottom: "fc7"
  bottom: "label"
  top: "fc8"
  top: "lambda"
  param {
    name: "fc8"
    lr_mult: 10
  }
  largemargin_inner_product_param {
    num_output: 101
    type: DOUBLE
    base: 1000
    gamma: 0.00002
    power: 45
    iteration: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TRAIN
  }
}

Test net output #1: lambda = 0
Test net output #2: loss = 87.2935 (* 1 = 87.2935 loss)
Iteration 0, loss = 22.7132

Train net output #0: lambda = 996.407
Train net output #1: loss = 26.682 (* 1 = 26.682 loss)
Iteration 0, lr = 5e-05
Iteration 50, loss = 5.74099

Train net output #0: lambda = 832.577
Train net output #1: loss = 5.74778 (* 1 = 5.74778 loss)
Iteration 50, lr = 5e-05
Iteration 100, loss = 5.24638

Train net output #0: lambda = 696.185
Train net output #1: loss = 5.34603 (* 1 = 5.34603 loss)
Iteration 100, lr = 5e-05
Iteration 150, loss = 5.04773

Train net output #0: lambda = 582.55
Train net output #1: loss = 5.15273 (* 1 = 5.15273 loss)
Iteration 150, lr = 5e-05
I fine-tune from VGG on my new data, but the loss is too large, and I have tried different learning rates. From your experience, what should I do? Thanks.
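
In case it helps to interpret the logged lambda values: the layer anneals lambda from base toward lambda_min, and while lambda is large the output is dominated by the plain inner product rather than the margin term, so a large early loss mostly reflects the suppressed margin rather than divergence. A minimal Python sketch of the schedule I believe is being used (an assumption; the exact iteration-counter semantics may differ, e.g. it may advance once per forward pass rather than once per solver iteration):

def annealed_lambda(iteration, base=1000.0, gamma=0.00002, power=45.0, lambda_min=0.0):
    """Assumed annealing schedule: lambda = base * (1 + gamma * iter)^(-power), floored at lambda_min."""
    lam = base * (1.0 + gamma * iteration) ** (-power)
    return max(lam, lambda_min)

# With base=1000, gamma=2e-5, power=45 the value decays smoothly from ~1000,
# roughly matching the lambda values printed in the training log above.
for it in (0, 200, 400, 600):
    print(it, round(annealed_lambda(it), 3))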

Computation of k value from eq. (6)

Hello and thank you for the code and the awesome paper!

I haven't fully understood how you compute the value of k from Equation (6) in the paper. Could you please provide a small explanation?
I have followed luoyetx's implementation of the computation of k for MXNet, but in most cases I get k=0, even for m>1 on MNIST, and I think this may be the problem that keeps my TensorFlow implementation from reaching the high accuracy you report.

Thanks a lot in advance,
Magda
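
For what it's worth, a minimal sketch of how k can be computed directly from the interval condition in eq. (6), i.e. theta in [k*pi/m, (k+1)*pi/m]. This follows the paper's definition rather than the repo's exact C++ code (which, as I understand it, derives k by comparing cos(theta) against cos(k*pi/m) thresholds without calling acos):

import math

def k_from_eq6(cos_theta, m):
    """k such that theta lies in [k*pi/m, (k+1)*pi/m], clamped to [0, m-1]."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for numerical safety
    return min(int(m * theta / math.pi), m - 1)

# k is 0 whenever theta < pi/m (45 degrees for m=4); once the network starts to
# fit, most samples' angles to their target weight fall in that range, so seeing
# k=0 most of the time is not necessarily a bug.
print(k_from_eq6(0.9, 4))  # theta ~ 25.8 deg -> k = 0
print(k_from_eq6(0.1, 4))  # theta ~ 84.3 deg -> k = 1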

Activation function problem

When I train LeNet with ReLU as the activation function, the accuracy is 0.1, but when I use PReLU the accuracy is 0.98. I don't know why.

About A-Softmax

@wy1iu Hello!
The A-Softmax you proposed is also an excellent piece of work, but I encountered a big problem when I used it to fine-tune a face recognition CNN model. The pre-trained model had already achieved about 98% accuracy on LFW; I should mention that this 98% model was obtained by training A-Softmax with the SINGLE type. However, when I changed it to the QUADRUPLE type for fine-tuning, after 10,000 iterations with a small learning rate (0.0002) and a batch size of 64, the caffemodel I got was a total mess (about 50% on LFW). What might be the problem? I hope you can give me some guidance. Thanks a lot!

How to run

When I build it like Caffe, I get errors about hdf5/serial/hdf5.h.

hard to converge

I have reimplemented L-Softmax in TensorFlow, but I found it really hard to make it converge.

Why this particular construction?

Hi --

I was wondering where you got the idea for the specific construction of the L-softmax. It seems like maybe you could achieve a similar goal by enforcing a margin like

norm(W) * norm(x) * (m * cos(theta) - m + 1)

instead of

norm(W) * norm(x) * cos(m * theta)

as you do in the paper.

The former seems simpler because you don't have to worry about constructing a psi function that behaves well for all values of theta, m doesn't have to be integer valued, etc. Also, in the paper, the gradient of psi is 0 at pi/2, which AFAICT is an undesirable side effect of the choice of psi. Is that right, or is there some reason that grad psi(pi/2) should be 0?

The proposed alternative above would have the same shape as cos on [0, pi] but with a range of [1 - 2m, 1], matching the range of the paper's psi, which seems maybe more natural.

Thoughts? Am I missing something? Did you try this and it stunk in practice?

Thanks
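
To make the comparison concrete, a small sketch (my own, not from the repo) evaluating the paper's psi(theta) = (-1)^k * cos(m*theta) - 2k against the alternative m*cos(theta) - m + 1 for integer m; both decrease monotonically from 1 to 1 - 2m on [0, pi]:

import numpy as np

def psi_paper(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k, with theta in [k*pi/m, (k+1)*pi/m]."""
    k = np.minimum(np.floor(m * theta / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def psi_alt(theta, m):
    """The alternative suggested above: m*cos(theta) - m + 1."""
    return m * np.cos(theta) - m + 1.0

theta = np.linspace(0.0, np.pi, 7)
m = 4
print(np.round(psi_paper(theta, m), 3))  # 1.0 ... -7.0, piecewise in cos(m*theta)
print(np.round(psi_alt(theta, m), 3))    # 1.0 ... -7.0, a single rescaled cosine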

Can I know which part of the paper the sign_x_ blobs correspond to?

Blob<Dtype> sign_0_;               // sign_0 = sign(cos_theta)
// for DOUBLE type
Blob<Dtype> cos_theta_quadratic_;
// for TRIPLE type
Blob<Dtype> sign_1_;               // sign_1 = sign(abs(cos_theta) - 0.5)
Blob<Dtype> sign_2_;               // sign_2 = sign_0 * (1 + sign_1) - 2
Blob<Dtype> cos_theta_cubic_;
// for QUADRA type
Blob<Dtype> sign_3_;               // sign_3 = sign_0 * sign(2 * cos_theta_quadratic_ - 1)
Blob<Dtype> sign_4_;               // sign_4 = 2 * sign_0 + sign_3 - 3
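
For reference, here is how I read the correspondence to eq. (6) of the paper (a sketch of my interpretation, not an authoritative statement about the code). Eq. (6) defines

\psi(\theta) = (-1)^k \cos(m\theta) - 2k, \qquad \theta \in \left[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\right], \; k \in [0, m-1].

For m = 2 (DOUBLE), k = 0 when \cos\theta \ge 0 and k = 1 otherwise, so (-1)^k = \operatorname{sign}(\cos\theta) = \mathrm{sign}_0 and -2k = \mathrm{sign}_0 - 1, giving \psi(\theta) = \mathrm{sign}_0\,(2\cos^2\theta - 1) + \mathrm{sign}_0 - 1. The TRIPLE and QUADRA blobs appear to play the same roles for m = 3 and m = 4: sign_1 and sign_3 encode (-1)^k, while sign_2 and sign_4 encode -2k.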

train_accuracy decrease?

I0630 11:22:50.776134 23843 solver.cpp:337] Iteration 4000, Testing net (#0)
I0630 11:22:54.062695 23843 solver.cpp:404] Test net output #0: accuracy = 1
I0630 11:22:54.062719 23843 solver.cpp:404] Test net output #1: lambda = 60.444
I0630 11:22:54.062727 23843 solver.cpp:404] Test net output #2: loss = 1.59741e-09 (* 1 = 1.59741e-09 loss)
I0630 11:22:54.148296 23843 solver.cpp:225] Iteration 4000 (8.7286 iter/s, 11.4566s/100 iters), loss = 0.327719
I0630 11:22:54.148321 23843 solver.cpp:244] Train net output #0: accuracy = 0.9375
I0630 11:22:54.148326 23843 solver.cpp:244] Train net output #1: lambda = 1.37052
I0630 11:22:54.148346 23843 solver.cpp:244] Train net output #2: loss = 0.327719 (* 1 = 0.327719 loss)
I0630 11:22:54.148352 23843 sgd_solver.cpp:137] Iteration 4000, lr = 0.0001
I0630 11:23:02.281814 23843 solver.cpp:225] Iteration 4100 (12.295 iter/s, 8.13342s/100 iters), loss = 0.0623774
I0630 11:23:02.281839 23843 solver.cpp:244] Train net output #0: accuracy = 0.984375
I0630 11:23:02.281844 23843 solver.cpp:244] Train net output #1: lambda = 1.23743
I0630 11:23:02.281863 23843 solver.cpp:244] Train net output #2: loss = 0.0623773 (* 1 = 0.0623773 loss)

.....

I0630 11:23:59.213723 23843 solver.cpp:225] Iteration 4800 (12.2984 iter/s, 8.13116s/100 iters), loss = 1.32174
I0630 11:23:59.213876 23843 solver.cpp:244] Train net output #0: accuracy = 0.585938
I0630 11:23:59.213886 23843 solver.cpp:244] Train net output #1: lambda = 0.609189
I0630 11:23:59.213891 23843 solver.cpp:244] Train net output #2: loss = 1.32174 (* 1 = 1.32174 loss)
I0630 11:23:59.213896 23843 sgd_solver.cpp:137] Iteration 4800, lr = 0.0001
I0630 11:24:07.347139 23843 solver.cpp:225] Iteration 4900 (12.2953 iter/s, 8.13318s/100 iters), loss = 1.92986
I0630 11:24:07.347164 23843 solver.cpp:244] Train net output #0: accuracy = 0.429688
I0630 11:24:07.347169 23843 solver.cpp:244] Train net output #1: lambda = 0.551035
I0630 11:24:07.347175 23843 solver.cpp:244] Train net output #2: loss = 1.92986 (* 1 = 1.92986 loss)

......

I0630 11:24:59.453804 23843 solver.cpp:225] Iteration 5500 (12.2901 iter/s, 8.13665s/100 iters), loss = 2.45443
I0630 11:24:59.453843 23843 solver.cpp:244] Train net output #0: accuracy = 0.0078125
I0630 11:24:59.453848 23843 solver.cpp:244] Train net output #1: lambda = 0.30322
I0630 11:24:59.453868 23843 solver.cpp:244] Train net output #2: loss = 2.45443 (* 1 = 2.45443 loss)
I0630 11:24:59.453873 23843 sgd_solver.cpp:137] Iteration 5500, lr = 0.0001
I0630 11:25:07.590095 23843 solver.cpp:225] Iteration 5600 (12.2908 iter/s, 8.13617s/100 iters), loss = 2.40135
I0630 11:25:07.590245 23843 solver.cpp:244] Train net output #0: accuracy = 0
I0630 11:25:07.590270 23843 solver.cpp:244] Train net output #1: lambda = 0.274696
I0630 11:25:07.590275 23843 solver.cpp:244] Train net output #2: loss = 2.40135 (* 1 = 2.40135 loss)

layer {
  name: "fc10"
  type: "LargeMarginInnerProduct"
  bottom: "person"
  bottom: "label"
  top: "fc10"
  top: "lambda"
  param {
    name: "ip2"
    lr_mult: 1
  }
  largemargin_inner_product_param {
    num_output: 10
    type: TRIPLE
    weight_filler {
      type: "xavier"
    }
    base: 100
    gamma: 2.5e-05
    power: 45
    iteration: 0
    lambda_min: 0
  }
}

Why is lambda_min not used in Backward?

As the title describes: in the forward pass, lambda is not allowed to become smaller than lambda_min, but in the backward pass it is not compared with lambda_min. Is there a reason for this, or is it just a mistake?

the problem with deploy.prototxt (Unknown bottom blob 'label')

I have trained a model with LargeMargin_softmax_loss and want to deploy it on new data without labels, but the LargeMarginInnerProduct layer needs a label input. I tried using an InnerProduct layer instead, but then I get target_blobs.size() == source_layer.blobs_size() (2 vs. 1), which means the weights can't be copied from the LargeMarginInnerProduct layer to the InnerProduct layer.
Right now I have no idea how to use the model on new data without labels. What should I do to get around this problem? Can you share your deploy.prototxt?
Thank you!
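
For what it's worth, two workarounds I am aware of (assumptions, not the authors' official recipe): the 2-vs-1 blob mismatch comes from InnerProduct having a bias blob by default, so declaring the deploy layer as a plain InnerProduct with bias_term: false makes the blob counts match; alternatively, the single weight blob can be copied by hand in pycaffe, assuming it has the usual num_output x input_dim layout. A minimal sketch with hypothetical file names:

import caffe

# Hypothetical paths; the deploy net declares "fc8" as InnerProduct with
# bias_term: false and no "label" bottom.
train_net = caffe.Net('train_val.prototxt', 'lsoftmax.caffemodel', caffe.TEST)
deploy_net = caffe.Net('deploy.prototxt', caffe.TEST)

# Copy the LargeMarginInnerProduct weight blob into the deploy InnerProduct.
deploy_net.params['fc8'][0].data[...] = train_net.params['fc8'][0].data
deploy_net.save('deploy.caffemodel')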

Licensing

Hi,

What license are the largemargin_inner_product layers released under?

Can I use your implementation to train models for commercial use, or do I have to implement it myself before training commercial models?

Thanks.

reproduce cifar result

I only got an 8.8% error rate when reproducing the cifar10 example with this repository after training for 22000 iterations, and the loss exploded (87.3365) after 23000 iterations. Has anyone met similar problems or reproduced the paper's result successfully?
