wy1iu / sphereface-plus
SphereFace+ implementation for "Learning towards Minimum Hyperspherical Energy" in NIPS'18.
License: MIT License
Dear authors,
Thanks for the wonderful work. I was wondering if you could explain the parameters of the following "inter_class_dist_output" layer.
layer {
  name: "inter_class_dist_output"
  type: "InterClass"
  bottom: "fc5"
  bottom: "label"
  top: "maximize_inter_class_dist"
  param {
    name: "fc6_w"
    lr_mult: 1
    decay_mult: 1
  }
  inter_class_param {
    num_output: 10572
    type: AMONG
    iteration: 16000
    alpha_start_iter: 16001
    alpha_start_value: 10 #among
  }
}
It would be great if you could answer the following questions:
1 -> fc6_w is a learnable parameter in the network. Could you please explain what this parameter is for?
2 -> Could you please explain the parameter type: AMONG?
3 -> It would also be great if you could explain the iteration, alpha_start_iter, and alpha_start_value parameters, and how we should tune them.
Thanks very much.
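(A note on question 1 while waiting for the authors: in Caffe, two layers whose param blocks carry the same name share a single weight blob, so fc6_w here presumably ties the InterClass layer to the classifier's weight matrix. A minimal sketch, assuming the network has a classification layer named fc6; the layer type and tops below are illustrative assumptions, not taken from this repo:)
layer {
  name: "fc6"
  type: "InnerProduct"      # assumed classifier layer
  bottom: "fc5"
  top: "fc6"
  param { name: "fc6_w" }   # same name => shares one weight blob with
                            # inter_class_dist_output above
  inner_product_param { num_output: 10572 }
}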
When I run the default files (solvers and shell script) that trigger the training, it converges with two GPUs. When I train with just one GPU, convergence is much slower - that is understandable.
Finally, when I train the model on just one GPU with the iter_size=2 that you mentioned, maximize_inter_class_dist in the console just becomes zero. Could you please reply on this?
Thanks.
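(For context on iter_size: Caffe's solver accumulates gradients over iter_size forward/backward passes before each weight update, emulating a larger effective batch on one GPU. A minimal solver sketch; the net path and base_lr are placeholder assumptions:)
# solver.prototxt (sketch)
net: "train.prototxt"   # placeholder path
base_lr: 0.01           # placeholder value
iter_size: 2            # effective batch = batch_size x iter_size x num_gpus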
Hello,
Thanks for the great work. One thing I noticed is that if I change the default mean and scale values, the softmax loss overflows (in sphereface training) and the model does not converge (in sphereface-plus). The problem is that I somehow cannot move away from 127.5 as the mean and 0.0078125 as the scale. Could you please suggest how to use a mean value of 127 and a scale of 1.0 (which layers should I change, or which parameters should I modify) and still make the model converge?
transform_param {
  mean_value: 127.5
  mean_value: 127.5
  mean_value: 127.5
  scale: 0.0078125
  mirror: true
}
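(One observation that may explain the overflow: 0.0078125 = 1/128, so the default transform maps pixels into roughly [-1, 1]; with scale 1.0 the network sees inputs about 128 times larger, which can blow up the fc activations and the softmax exponentials. A sketch of the setting the question asks about; whether it converges without also shrinking base_lr or re-initializing weights is an open assumption:)
transform_param {
  mean_value: 127
  mean_value: 127
  mean_value: 127
  scale: 1.0    # inputs now span roughly [-127, 128] instead of [-1, 1]
  mirror: true
}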
Hi wy1iu,
I'm trying to use MHE to finetune my FR model trained with AM-Softmax, as mentioned in the paper.
I have gotten about a 0.5% gain on my own dataset with the default parameters from the log file (as follows):
inter_class_param {
  num_output: 10572
  type: AMONG
  iteration: 16000
  alpha_start_iter: 16001
  alpha_start_value: 10
}
But it seems that the "iteration" and "alpha_start_iter" values are too large for finetuning, aren't they?
Thanks!
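(A hypothetical finetuning variant of the block above, with the schedule shifted to the start of training; these values are illustrative assumptions, not from the authors:)
inter_class_param {
  num_output: 10572
  type: AMONG
  iteration: 0          # assumed: internal counter starts at 0 when finetuning
  alpha_start_iter: 1   # assumed: enable the MHE term almost immediately
  alpha_start_value: 10
}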
Thanks for your great work. I was just wondering about the following snippet:
I0204 14:26:03.838340 6323 solver.cpp:218] Iteration 10980 (0.968637 iter/s, 20.6476s/20 iters), loss = 14.7497
I0204 14:26:03.838383 6323 solver.cpp:237] Train net output #0: lambda = 5
I0204 14:26:03.838428 6323 solver.cpp:237] Train net output #1: maximize_inter_class_dist = 14.3479 (* 1 = 14.3479 loss)
I0204 14:26:03.838443 6323 solver.cpp:237] Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss)
I0204 14:26:03.838527 6323 sgd_solver.cpp:105] Iteration 10980, lr = 0.01
I0204 14:26:22.732009 6323 solver.cpp:447] Snapshotting to binary proto file result/m_single_model_iter_11000.caffemodel
I0204 14:26:23.138741 6323 sgd_solver.cpp:273] Snapshotting solver state to binary proto file result/m_single_model_iter_11000.solverstate
I0204 14:26:24.393401 6323 solver.cpp:310] Iteration 11000, loss = 13.0881
I0204 14:26:24.393429 6323 solver.cpp:315] Optimization Done.
I0204 14:26:24.788636 6323 caffe.cpp:259] Optimization Done.
There is this Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss), and at the end it also shows I0204 14:26:24.393401 6323 solver.cpp:310] Iteration 11000, loss = 13.0881. What is this last loss, 13.0881? I think it is a combination of a few other losses. Which one should I watch during training, this loss or the softmax_loss?
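(A quick arithmetic check against the log above: at iteration 10980 the two weighted outputs sum exactly to the headline number, 14.3479 * 1 + 0.401827 * 1 = 14.7497, so the reported loss appears to be the weighted sum of maximize_inter_class_dist and softmax_loss.)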
Hi, I got confused by part of the code:
https://github.com/wy1iu/sphereface-plus/blob/master/tools/caffe-sfplus/src/caffe/layers/inter_class_layer.cpp#L137-L161
Dtype alpha_start_iter_ = this->layer_param_.inter_class_param().alpha_start_iter();
Dtype alpha_start_value_ = this->layer_param_.inter_class_param().alpha_start_value();
Dtype alpha_step_ = this->layer_param_.inter_class_param().alpha_step();
Dtype alpha_stepvalue_size = this->layer_param_.inter_class_param().alpha_stepvalue_size();
Dtype normalize_ = this->layer_param_.inter_class_param().normalize();
if (alpha_stepvalue_size != 0) {
  const int* alpha_stepvalue_data = alpha_stepvalues.cpu_data();
  if (alpha_start_iter_ == iter_) {
    alpha_ = alpha_start_value_;
  } else if (alpha_start_iter_ < iter_) {
    if (alpha_stepvalue_data[alpha_index_] == iter_ && alpha_index_ < alpha_stepvalue_size) {
      alpha_ += alpha_step_;
      alpha_index_ += (Dtype)1.;
    }
  }
} else {
  if (alpha_start_iter_ == iter_) {
    alpha_ = alpha_start_value_;
  }
}
if (top.size() == 2) {
  top[1]->mutable_cpu_data()[0] = alpha_;
}
It seems that you are doing something like annealing, right? Is alpha_ the weighting factor for the energy term?
Roughly, I understand that alpha_start_iter_ and alpha_start_value_ are the starting iteration and starting value of alpha_. But what does alpha_stepvalues mean? Did you do something like a multi-step LR scheduler that adjusts alpha_ by a fixed step after certain iterations? Though you left alpha_stepvalues empty in the configuration, I'm still curious about this part.
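(Reading the snippet above, a multi-step schedule would presumably be configured through the proto fields it reads: alpha_step plus a repeated alpha_stepvalue list, with alpha_ increased by alpha_step at each listed iteration. A hypothetical configuration; all values below are illustrative assumptions:)
inter_class_param {
  alpha_start_iter: 16001
  alpha_start_value: 10
  alpha_step: 5            # added to alpha_ at each step value below
  alpha_stepvalue: 20000   # hypothetical iterations at which alpha_ is bumped
  alpha_stepvalue: 24000
}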
Suppose we have new faces and want to apply this method to recognize each one. In this scenario, do I have to train the network on these new faces, or will the pretrained model give sufficient performance?
I ask this question because I can't get my desired accuracy, so I see two possibilities: 1) the pretrained model is not suitable for new faces; 2) I am not using the Caffe C++ interface properly.
Thank you for your great work and support.
As stated. Any explanation?
According to the prototxt file, the base value of lambda is 1000 in the log file that you attached on Google Drive here: https://drive.google.com/drive/folders/1kpGGvb5Nv0EmDicW2ue8LdUVjI0Ht1PP. However, in the same training log, lambda was set to 5 from the first iteration.
I was wondering: if we set the base to 1000, shouldn't lambda start at 1000 and then decrease to the minimum lambda, which is 5 in this case? Could you please look into your log file: https://drive.google.com/drive/folders/1kpGGvb5Nv0EmDicW2ue8LdUVjI0Ht1PP?
Thanks.
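(One possible reading, based on the lambda schedule in the original SphereFace margin layer, lambda = max(lambda_min, base * (1 + gamma * iter)^(-power)): if the layer's internal iteration counter starts at a large value, e.g. the iteration: 16000 seen in these configs, then the computed lambda is already below lambda_min and is clamped to 5 from the first logged step. A minimal C++ sketch; the gamma and power values are assumptions, not read from this repo's prototxt:)
#include <algorithm>
#include <cmath>
#include <cstdio>

// SphereFace-style annealing: lambda = max(lambda_min, base*(1+gamma*iter)^-power)
float lambda_at(int iter, float base = 1000.f, float gamma = 0.12f,
                float power = 1.f, float lambda_min = 5.f) {
  return std::max(lambda_min, base * std::pow(1.f + gamma * iter, -power));
}

int main() {
  std::printf("lambda(0)     = %.2f\n", lambda_at(0));      // 1000.00
  std::printf("lambda(16000) = %.2f\n", lambda_at(16000));  // 5.00 (clamped)
}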
Hi wy1iu, thanks for sharing your code. When I train on the CASIA-WebFace dataset, the loss converges, but when I train on the MS1M dataset, the loss diverges. The params are set as follows:
inter_class_param {
  num_output: 85164
  type: AMONG
  iteration: 16000
  alpha_start_iter: 20000
  alpha_start_value: 5.3
}
Should I modify the "iteration", "alpha_start_iter", and "alpha_start_value" params? Can you help me?
Are these params related to the size of the training dataset?
Thanks!