wy1iu / largemargin_softmax_loss

346 stars · 23 watchers · 115 forks · 25.97 MB

Implementation of "Large-Margin Softmax Loss for Convolutional Neural Networks" (ICML 2016).

License: Other

CMake 2.72% Makefile 0.69% Shell 0.42% C++ 79.87% Cuda 6.54% MATLAB 0.88% Python 8.81% Dockerfile 0.07%
l-softmax icml-2016 lsoftmax-loss caffe face-recognition image-recognition deep-learning

largemargin_softmax_loss's People

Contributors

wy1iu, ydwen

largemargin_softmax_loss's Issues

Pairs of testing

As I read in the article, the final representation of a testing face is obtained by concatenating its original face features and its horizontally flipped features.
Do I understand correctly: for the final embedding, should I sum the embedding of the original image and that of the horizontally flipped image, or should I make one vector by attaching one to the end of the other?
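
For what it's worth, the paper's wording suggests concatenation rather than summation. A minimal numpy sketch of the two options, with hypothetical feature vectors (my own illustration, not code from this repository):

import numpy as np

# Hypothetical 512-d features of a face crop and of its horizontally flipped copy.
feat = np.random.randn(512)
feat_flip = np.random.randn(512)

# "Concatenating" as described in the paper: append one vector to the other,
# giving a 1024-d representation.
embedding_concat = np.concatenate([feat, feat_flip])  # shape (1024,)

# The alternative asked about above: element-wise sum, which keeps 512 dims.
embedding_sum = feat + feat_flip                      # shape (512,)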

train accuracy decrease

Hi there!

The same problem as #18 happened to me when training the model, and my case was even more severe: my training accuracy decreased from 90% to 50%. Although it climbed back to a new optimum after a few hundred iterations, I am still confused by this phenomenon.

Thank you for your help!

how to test with input data: there is a problem with bottom: "label"

When I want to write deploy.prototxt, the input is:
input: "data"
input_dim: 25
input_dim: 3
input_dim: 224
input_dim: 224

However, the layer is defined as:
layer {
  name: "fc8"
  type: "LargeMarginInnerProduct"
  bottom: "fc7"
  bottom: "label"
  top: "fc8"
  top: "lambda"
  param {
    name: "fc8"
    lr_mult: 0
  }
  largemargin_inner_product_param {
    num_output: 100
    type: SINGLE
    base: 0
    gamma: 1
    iteration: 0
    lambda_min: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TEST
  }
}
There is an error at bottom: "label". How can I provide bottom: "label" when testing?

train mnist, loss is nan

When I use your mnist_train_test.prototxt to train on MNIST, the loss is nan. What should I do to overcome this issue?

Difficult to train with LargeMargin_Softmax_Loss on cifar10

I have tried to train myexamples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but it always goes like this:

I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244] Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244] Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244] Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244] Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244] Train net output #2: mean_length = inf

How can I tackle this problem?

evaluate LargeMargin_Softmax_Loss on lfw

Thank you for sharing!
I used LargeMargin_Softmax_Loss to train a model on CASIA-WebFace. But what criterion should I use to evaluate this model, Euclidean distance?
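
A common evaluation protocol, offered only as an assumption rather than the authors' exact setup: extract one embedding per image from the layer below the classifier and compare LFW pairs with cosine similarity, thresholding the score. A minimal sketch:

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (higher = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical embeddings of two face crops; the threshold would be chosen on
# a validation split before reporting accuracy on the LFW test pairs.
emb1, emb2 = np.random.randn(512), np.random.randn(512)
same_person = cosine_similarity(emb1, emb2) > 0.5  # threshold is illustrative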

Test the L-Softmax loss on new dataset

Hi,

I used the L-Softmax loss on a new dataset; the accuracy for m=1, m=2, m=3 and m=4 is 93.12%, 92.57%, 92.15% and 91.86% respectively. Why is the accuracy for m=4 worse than for m=1?

Thanks.

Angle margin

Greetings,

I'm reading the paper, and I would appreciate it if anyone could explain why the margin angle is equal to (m - 1) / (m + 1) * theta_{1,2}.

Thanks in advance.
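
A sketch of one way to see this, assuming the binary case discussed in the paper with \|W_1\| = \|W_2\| and zero biases: the L-Softmax decision boundary for class 1 is \cos(m\theta_1) = \cos(\theta_2), i.e. m\theta_1 = \theta_2, and symmetrically m\theta_2 = \theta_1 for class 2. Writing \theta_{1,2} = \theta_1 + \theta_2 for the angle between W_1 and W_2 (for x in their span),

\text{boundary 1: } \theta_1 = \frac{\theta_{1,2}}{m+1}, \qquad
\text{boundary 2: } \theta_1 = \frac{m\,\theta_{1,2}}{m+1}, \qquad
\text{angle margin} = \frac{m\,\theta_{1,2}}{m+1} - \frac{\theta_{1,2}}{m+1} = \frac{m-1}{m+1}\,\theta_{1,2}.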

prototxt for CASIA Webface

Many thanks for your work.

Could you please share the train/test prototxt for the CASIA experiment, or the LargeMarginInnerProduct parameters (largemargin_inner_product_param) used for this "complex" case?

Test the L-Softmax loss on my dataset

################## train ##################
layer {
  name: "fc8_2"
  type: "LargeMarginInnerProduct"
  bottom: "fc7"
  bottom: "label"
  top: "fc8"
  top: "lambda"
  param {
    name: "fc8"
    lr_mult: 10
  }
  largemargin_inner_product_param {
    num_output: 101
    type: DOUBLE
    base: 1000
    gamma: 0.00002
    power: 45
    iteration: 0
    weight_filler {
      type: "msra"
    }
  }
  include {
    phase: TRAIN
  }
}

Test net output #1: lambda = 0
Test net output #2: loss = 87.2935 (* 1 = 87.2935 loss)
Iteration 0, loss = 22.7132

Train net output #0: lambda = 996.407
Train net output #1: loss = 26.682 (* 1 = 26.682 loss)
Iteration 0, lr = 5e-05
Iteration 50, loss = 5.74099

Train net output #0: lambda = 832.577
Train net output #1: loss = 5.74778 (* 1 = 5.74778 loss)
Iteration 50, lr = 5e-05
Iteration 100, loss = 5.24638

Train net output #0: lambda = 696.185
Train net output #1: loss = 5.34603 (* 1 = 5.34603 loss)
Iteration 100, lr = 5e-05
Iteration 150, loss = 5.04773

Train net output #0: lambda = 582.55
Train net output #1: loss = 5.15273 (* 1 = 5.15273 loss)
Iteration 150, lr = 5e-05
I fine-tune from VGG on my new data, but the loss is too large, and I have tried different learning rates. From your experience, what should I do? Thanks.
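
In case it helps to interpret the logged lambda values: the layer anneals lambda from base toward lambda_min, and while lambda is large the output is dominated by the plain inner product rather than the margin term, so a large early loss mostly reflects the suppressed margin rather than divergence. A minimal Python sketch of the schedule I believe is being used (an assumption; the exact iteration-counter semantics may differ, e.g. it may advance once per forward pass rather than once per solver iteration):

def annealed_lambda(iteration, base=1000.0, gamma=0.00002, power=45.0, lambda_min=0.0):
    """Assumed annealing schedule: lambda = base * (1 + gamma * iter)^(-power), floored at lambda_min."""
    lam = base * (1.0 + gamma * iteration) ** (-power)
    return max(lam, lambda_min)

# With base=1000, gamma=2e-5, power=45 the value decays smoothly from ~1000,
# roughly matching the lambda values printed in the training log above.
for it in (0, 200, 400, 600):
    print(it, round(annealed_lambda(it), 3))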

Computation of k value from eq. (6)

Hello and thank you for the code and the awesome paper!

I haven't fully understood how you compute the value of k from Equation (6) in the paper. Could you please provide a small explanation?
I have followed luoyetx's implementation of the computation of k for MXNet, but in most cases I get k=0, even for m>1 on MNIST, and I think this may be the problem that keeps my TensorFlow implementation from reaching the high accuracy you report.

Thanks a lot in advance,
Magda
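
For what it's worth, a minimal sketch of how k can be computed directly from the interval condition in eq. (6), i.e. theta in [k*pi/m, (k+1)*pi/m]. This follows the paper's definition rather than the repo's exact C++ code (which, as I understand it, derives k by comparing cos(theta) against cos(k*pi/m) thresholds without calling acos):

import math

def k_from_eq6(cos_theta, m):
    """k such that theta lies in [k*pi/m, (k+1)*pi/m], clamped to [0, m-1]."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for numerical safety
    return min(int(m * theta / math.pi), m - 1)

# k is 0 whenever theta < pi/m (45 degrees for m=4); once the network starts to
# fit, most samples' angles to their target weight fall in that range, so seeing
# k=0 most of the time is not necessarily a bug.
print(k_from_eq6(0.9, 4))  # theta ~ 25.8 deg -> k = 0
print(k_from_eq6(0.1, 4))  # theta ~ 84.3 deg -> k = 1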

Activation function problem

When I train LeNet with ReLU as the activation function, the accuracy is 0.1, but when I use PReLU the accuracy is 0.98. I don't know why.

About A-Softmax

@wy1iu Hello!
The A-Softmax you proposed is also an excellent piece of work, but I encountered a big problem when I used it to fine-tune a face recognition CNN model. The pre-trained model had already achieved about 98% accuracy on LFW; I should mention that this 98% model was obtained by training A-Softmax with the SINGLE type. However, when I changed it to the QUADRUPLE type for fine-tuning, after 10,000 iterations with a small learning rate (0.0002) and a batch size of 64, the caffemodel I got was a total mess (about 50% on LFW). What might be the problem? I hope you can give me some guidance. Thanks a lot!

How to run

When I build it like Caffe, I get errors about hdf5/serial/hdf5.h.

hard to converge

I have reimplemented L-Softmax in TensorFlow, but I found it really hard to make it converge.

Why this particular construction?

Hi --

I was wondering where you got the idea for the specific construction of the L-softmax. It seems like maybe you could achieve a similar goal by enforcing a margin like

norm(W) * norm(x) * (m * cos(theta) - m + 1)

instead of

norm(W) * norm(x) * cos(m * theta)

as you do in the paper.

The former seems simpler because you don't have to worry about constructing a psi function that behaves well for all values of theta, m doesn't have to be integer valued, etc. Also, in the paper, the gradient of psi is 0 at pi/2, which AFAICT is an undesirable side effect of the choice of psi. Is that right, or is there some reason that grad psi(pi/2) should be 0?

The proposed alternative above would have the same shape as cos on [0, pi] but with a range of [1 - 2m, 1], matching the range of the paper's psi, which seems maybe more natural.

Thoughts? Am I missing something? Did you try this and it stunk in practice?

Thanks
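
To make the comparison concrete, a small sketch (my own, not from the repo) evaluating the paper's psi(theta) = (-1)^k * cos(m*theta) - 2k against the alternative m*cos(theta) - m + 1 for integer m; both decrease monotonically from 1 to 1 - 2m on [0, pi]:

import numpy as np

def psi_paper(theta, m):
    """psi(theta) = (-1)^k * cos(m*theta) - 2k, with theta in [k*pi/m, (k+1)*pi/m]."""
    k = np.minimum(np.floor(m * theta / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def psi_alt(theta, m):
    """The alternative suggested above: m*cos(theta) - m + 1."""
    return m * np.cos(theta) - m + 1.0

theta = np.linspace(0.0, np.pi, 7)
m = 4
print(np.round(psi_paper(theta, m), 3))  # 1.0 ... -7.0, piecewise in cos(m*theta)
print(np.round(psi_alt(theta, m), 3))    # 1.0 ... -7.0, a single rescaled cosine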

Can I know which part of the paper the sign_x_ blobs correspond to?

Blob<Dtype> sign_0_;               // sign_0 = sign(cos_theta)
// for DOUBLE type
Blob<Dtype> cos_theta_quadratic_;
// for TRIPLE type
Blob<Dtype> sign_1_;               // sign_1 = sign(abs(cos_theta) - 0.5)
Blob<Dtype> sign_2_;               // sign_2 = sign_0 * (1 + sign_1) - 2
Blob<Dtype> cos_theta_cubic_;
// for QUADRA type
Blob<Dtype> sign_3_;               // sign_3 = sign_0 * sign(2 * cos_theta_quadratic_ - 1)
Blob<Dtype> sign_4_;               // sign_4 = 2 * sign_0 + sign_3 - 3
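
For reference, here is how I read the correspondence to eq. (6) of the paper (a sketch of my interpretation, not an authoritative statement about the code). Eq. (6) defines

\psi(\theta) = (-1)^k \cos(m\theta) - 2k, \qquad \theta \in \left[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\right], \; k \in [0, m-1].

For m = 2 (DOUBLE), k = 0 when \cos\theta \ge 0 and k = 1 otherwise, so (-1)^k = \operatorname{sign}(\cos\theta) = \mathrm{sign}_0 and -2k = \mathrm{sign}_0 - 1, giving \psi(\theta) = \mathrm{sign}_0\,(2\cos^2\theta - 1) + \mathrm{sign}_0 - 1. The TRIPLE and QUADRA blobs appear to play the same roles for m = 3 and m = 4: sign_1 and sign_3 encode (-1)^k, while sign_2 and sign_4 encode -2k.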

train_accuracy decrease?

I0630 11:22:50.776134 23843 solver.cpp:337] Iteration 4000, Testing net (#0)
I0630 11:22:54.062695 23843 solver.cpp:404] Test net output #0: accuracy = 1
I0630 11:22:54.062719 23843 solver.cpp:404] Test net output #1: lambda = 60.444
I0630 11:22:54.062727 23843 solver.cpp:404] Test net output #2: loss = 1.59741e-09 (* 1 = 1.59741e-09 loss)
I0630 11:22:54.148296 23843 solver.cpp:225] Iteration 4000 (8.7286 iter/s, 11.4566s/100 iters), loss = 0.327719
I0630 11:22:54.148321 23843 solver.cpp:244] Train net output #0: accuracy = 0.9375
I0630 11:22:54.148326 23843 solver.cpp:244] Train net output #1: lambda = 1.37052
I0630 11:22:54.148346 23843 solver.cpp:244] Train net output #2: loss = 0.327719 (* 1 = 0.327719 loss)
I0630 11:22:54.148352 23843 sgd_solver.cpp:137] Iteration 4000, lr = 0.0001
I0630 11:23:02.281814 23843 solver.cpp:225] Iteration 4100 (12.295 iter/s, 8.13342s/100 iters), loss = 0.0623774
I0630 11:23:02.281839 23843 solver.cpp:244] Train net output #0: accuracy = 0.984375
I0630 11:23:02.281844 23843 solver.cpp:244] Train net output #1: lambda = 1.23743
I0630 11:23:02.281863 23843 solver.cpp:244] Train net output #2: loss = 0.0623773 (* 1 = 0.0623773 loss)

.....

I0630 11:23:59.213723 23843 solver.cpp:225] Iteration 4800 (12.2984 iter/s, 8.13116s/100 iters), loss = 1.32174
I0630 11:23:59.213876 23843 solver.cpp:244] Train net output #0: accuracy = 0.585938
I0630 11:23:59.213886 23843 solver.cpp:244] Train net output #1: lambda = 0.609189
I0630 11:23:59.213891 23843 solver.cpp:244] Train net output #2: loss = 1.32174 (* 1 = 1.32174 loss)
I0630 11:23:59.213896 23843 sgd_solver.cpp:137] Iteration 4800, lr = 0.0001
I0630 11:24:07.347139 23843 solver.cpp:225] Iteration 4900 (12.2953 iter/s, 8.13318s/100 iters), loss = 1.92986
I0630 11:24:07.347164 23843 solver.cpp:244] Train net output #0: accuracy = 0.429688
I0630 11:24:07.347169 23843 solver.cpp:244] Train net output #1: lambda = 0.551035
I0630 11:24:07.347175 23843 solver.cpp:244] Train net output #2: loss = 1.92986 (* 1 = 1.92986 loss)

......

I0630 11:24:59.453804 23843 solver.cpp:225] Iteration 5500 (12.2901 iter/s, 8.13665s/100 iters), loss = 2.45443
I0630 11:24:59.453843 23843 solver.cpp:244] Train net output #0: accuracy = 0.0078125
I0630 11:24:59.453848 23843 solver.cpp:244] Train net output #1: lambda = 0.30322
I0630 11:24:59.453868 23843 solver.cpp:244] Train net output #2: loss = 2.45443 (* 1 = 2.45443 loss)
I0630 11:24:59.453873 23843 sgd_solver.cpp:137] Iteration 5500, lr = 0.0001
I0630 11:25:07.590095 23843 solver.cpp:225] Iteration 5600 (12.2908 iter/s, 8.13617s/100 iters), loss = 2.40135
I0630 11:25:07.590245 23843 solver.cpp:244] Train net output #0: accuracy = 0
I0630 11:25:07.590270 23843 solver.cpp:244] Train net output #1: lambda = 0.274696
I0630 11:25:07.590275 23843 solver.cpp:244] Train net output #2: loss = 2.40135 (* 1 = 2.40135 loss)

layer {
  name: "fc10"
  type: "LargeMarginInnerProduct"
  bottom: "person"
  bottom: "label"
  top: "fc10"
  top: "lambda"
  param {
    name: "ip2"
    lr_mult: 1
  }
  largemargin_inner_product_param {
    num_output: 10
    type: TRIPLE
    weight_filler {
      type: "xavier"
    }
    base: 100
    gamma: 2.5e-05
    power: 45
    iteration: 0
    lambda_min: 0
  }
}

Why is lambda_min not used in Backward?

As the title describes: in the forward pass, lambda is not allowed to become smaller than lambda_min, but in the backward pass it is not compared with lambda_min. Is there a reason for this, or is it just a mistake?

the problem with deploy.prototxt (Unknown bottom blob 'label')

I have trained a model with LargeMargin_softmax_loss and want to deploy it on new data without labels, but the LargeMarginInnerProduct layer needs a label input. I tried using an InnerProduct layer instead, but then I get target_blobs.size() == source_layer.blobs_size() (2 vs. 1), which means the weights can't be copied from the LargeMarginInnerProduct layer to the InnerProduct layer.
Right now I have no idea how to use the model on new data without labels. What should I do to get around this problem? Can you share your deploy.prototxt?
Thank you!
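
For what it's worth, two workarounds I am aware of (assumptions, not the authors' official recipe): the 2-vs-1 blob mismatch comes from InnerProduct having a bias blob by default, so declaring the deploy layer as a plain InnerProduct with bias_term: false makes the blob counts match; alternatively, the single weight blob can be copied by hand in pycaffe, assuming it has the usual num_output x input_dim layout. A minimal sketch with hypothetical file names:

import caffe

# Hypothetical paths; the deploy net declares "fc8" as InnerProduct with
# bias_term: false and no "label" bottom.
train_net = caffe.Net('train_val.prototxt', 'lsoftmax.caffemodel', caffe.TEST)
deploy_net = caffe.Net('deploy.prototxt', caffe.TEST)

# Copy the LargeMarginInnerProduct weight blob into the deploy InnerProduct.
deploy_net.params['fc8'][0].data[...] = train_net.params['fc8'][0].data
deploy_net.save('deploy.caffemodel')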

Licensing

Hi,

What license are the largemargin_inner_product layers released under?

Can I use your implementation to train models for commercial use, or do I have to implement it myself before training commercial models?

Thanks.

reproduce cifar result

I only got an 8.8% error rate when reproducing the cifar10 example with this repository after training for 22000 iterations, and the loss exploded (87.3365) after 23000 iterations. Has anyone met similar problems or reproduced the paper's result successfully?
