wy1iu / sphereface-plus


SphereFace+ Implementation for <Learning towards Minimum Hyperspherical Energy> in NIPS'18.

License: MIT License

MATLAB 0.49% Shell 0.31% CMake 1.22% Makefile 0.27% HTML 0.08% CSS 0.10% Jupyter Notebook 57.51% C++ 33.20% Python 4.03% Cuda 2.75% Dockerfile 0.03%
caffe face-recognition sphereface


sphereface-plus's Issues

Information about "inter_class_dist_output" layer

Dear Authors,
Thanks for the wonderful work. I was wondering if you could explain the parameters in the following "inter_class_dist_output" layer.

layer {
  name: "inter_class_dist_output"
  type: "InterClass"
  bottom: "fc5"
  bottom: "label"
  top: "maximize_inter_class_dist"
  param {
    name: "fc6_w"
    lr_mult: 1
    decay_mult: 1
  }
  inter_class_param {
    num_output: 10572
    type: AMONG
    iteration: 16000
    alpha_start_iter: 16001
    alpha_start_value: 10 #among
  }
}

Essentially, it would be great if you could answer the following questions:
1 -> fc6_w is a learnable parameter in the network. Could you please explain what it is used for?
2 -> Could you please explain the parameter type: AMONG?
3 -> It would also be great if you could explain the iteration, alpha_start_iter and alpha_start_value parameters and how we should tune them.

Thanks very much.
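(Not an official answer; the annotated copy of the block below is my own reading, based mainly on the inter_class_layer.cpp excerpt quoted in the "Hyper-param explained." issue further down. The comments are interpretations, not documentation.)

inter_class_param {
  num_output: 10572        # number of classes; 10572 matches the CASIA-WebFace identity list used by SphereFace
  type: AMONG              # selects which variant of the inter-class (MHE) term is used; other enum values are not shown in this issue
  iteration: 16000         # not referenced in the quoted C++ excerpt, so its exact role is unclear from this issue alone
  alpha_start_iter: 16001  # per the C++ excerpt, the iteration at which alpha_ is set to alpha_start_value
  alpha_start_value: 10    # the value alpha_ takes at that point; alpha_ appears to weight the inter-class term (the lambda_M asked about below)
}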

maximize_inter_class_dist becomes zero when training with just one GPU instead of two

When I run the default files (solvers and shell script) that trigger the training, it converges with two GPUs. When I train with just one GPU, the convergence is much slower - that is understandable.
Finally, when I train the model on just one GPU with the iter_size=2 setting that you mentioned, maximize_inter_class_dist in the console just becomes zero. Could you please comment on this?
Thanks.

Effect of mean and scale on the sphereface-plus training

Hello,
Thanks for the great work. One thing I noticed is that if I change the default mean and scale values, the softmax loss overflows (in sphereface training) and the model does not converge without overflowing (in sphereface-plus). The problem is that, somehow, I cannot use anything other than 127.5 as the mean and 0.0078125 as the scale. Could you please suggest how to use a mean value of 127 and a scale of 1.0 (which layers should I change, or which parameters should I modify) and still make the model converge?

  transform_param {
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    scale: 0.0078125
    mirror: true
  }
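(For reference rather than as an official recommendation: 0.0078125 is 1/128, so the default preprocessing maps a pixel x to (x - 127.5) / 128, i.e. roughly into [-1, 1]. With a mean of 127 and a scale of 1.0 the network inputs become about 128 times larger, which on its own can push the logits into overflow in the softmax unless the weight initialization and/or learning rate are rescaled to compensate. A minimal sketch of the alternative block being asked about, under that assumption:)

  transform_param {
    mean_value: 127
    mean_value: 127
    mean_value: 127
    scale: 1.0       # inputs now span roughly [-127, 128] instead of roughly [-1, 1]
    mirror: true
  }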

About the inter_class_param for finetuning

Hi wy1iu,

I'm trying to use MHE to finetune my FR model trained with AM-Softmax, as mentioned in the paper.
I have gotten about a 0.5% gain on my own dataset with the default parameters from the log file (as follows).
inter_class_param {
  num_output: 10572
  type: AMONG
  iteration: 16000
  alpha_start_iter: 16001
  alpha_start_value: 10
}
But it seems that "iteration" and "alpha_start_iter" are too large for finetuning, aren't they?

Thanks!
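(Not an authoritative answer, but one consequence of the inter_class_layer.cpp excerpt quoted in the "Hyper-param explained." issue below: alpha_ is only assigned alpha_start_value once the layer's iteration counter reaches alpha_start_iter. So if the finetuning run is much shorter than ~16000 iterations, the inter-class term would presumably never be switched on with these defaults, and scaling iteration and alpha_start_iter down to the length of the finetuning schedule looks like the natural adjustment.)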

Confusion in the display of the loss while training sphereface-plus

Thanks for your great work. I was just wondering about the following snippet:

I0204 14:26:03.838340  6323 solver.cpp:218] Iteration 10980 (0.968637 iter/s, 20.6476s/20 iters), loss = 14.7497
I0204 14:26:03.838383  6323 solver.cpp:237]     Train net output #0: lambda = 5
I0204 14:26:03.838428  6323 solver.cpp:237]     Train net output #1: maximize_inter_class_dist = 14.3479 (* 1 = 14.3479 loss)
I0204 14:26:03.838443  6323 solver.cpp:237]     Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss)
I0204 14:26:03.838527  6323 sgd_solver.cpp:105] Iteration 10980, lr = 0.01
I0204 14:26:22.732009  6323 solver.cpp:447] Snapshotting to binary proto file result/m_single_model_iter_11000.caffemodel
I0204 14:26:23.138741  6323 sgd_solver.cpp:273] Snapshotting solver state to binary proto file result/m_single_model_iter_11000.solverstate
I0204 14:26:24.393401  6323 solver.cpp:310] Iteration 11000, loss = 13.0881
I0204 14:26:24.393429  6323 solver.cpp:315] Optimization Done.
I0204 14:26:24.788636  6323 caffe.cpp:259] Optimization Done.

There is this Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss), and at the end it also shows I0204 14:26:24.393401 6323 solver.cpp:310] Iteration 11000, loss = 13.0881. What is this last loss, 13.0881? I think it is a combination of a few other losses. Should I pay attention to this loss or to the softmax_loss during training?
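(A quick sanity check on the log above: at iteration 10980 the two displayed terms sum to the displayed total, 14.3479 * 1 + 0.401827 * 1 = 14.7497, so the "loss" printed by solver.cpp appears to be simply the weighted sum of maximize_inter_class_dist and softmax_loss. The final 13.0881 at iteration 11000 would then be the same combined quantity, not a separate loss; the softmax_loss component is the part that tracks the classification objective.)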

Hyper-param explained.

Hi, I got confused by part of the code:
https://github.com/wy1iu/sphereface-plus/blob/master/tools/caffe-sfplus/src/caffe/layers/inter_class_layer.cpp#L137-L161

Dtype alpha_start_iter_ = this->layer_param_.inter_class_param().alpha_start_iter();
  Dtype alpha_start_value_ = this->layer_param_.inter_class_param().alpha_start_value();
  Dtype alpha_step_ = this->layer_param_.inter_class_param().alpha_step();
  Dtype alpha_stepvalue_size = this->layer_param_.inter_class_param().alpha_stepvalue_size();
  Dtype normalize_ = this->layer_param_.inter_class_param().normalize();
  if (alpha_stepvalue_size != 0){
    const int* alpha_stepvalue_data = alpha_stepvalues.cpu_data();
    if (alpha_start_iter_ == iter_){
      alpha_ = alpha_start_value_;
    }
    else if(alpha_start_iter_ < iter_) {
      if(alpha_stepvalue_data[alpha_index_] == iter_ && alpha_index_<alpha_stepvalue_size){
        alpha_ += alpha_step_;
        alpha_index_ += (Dtype)1.;
      }
    }
  }
  else{
    if (alpha_start_iter_ == iter_){
      alpha_ = alpha_start_value_;
    }
  }
  if (top.size() == 2) {
    top[1]->mutable_cpu_data()[0] = alpha_;
  }

It seems that you are doing something like annealing, right?

Is alpha_ the factor for energy term ($\lambda_M$ in Eq.(8) of the original paper)?

Roughly I understand that alpha_start_iter_ and alpha_start_value_ are the starting timestamp and starting value of alpha_.

But what does alpha_stepvalues mean?
Did you do something like a multi-step LR scheduler that decreases alpha_ by a factor after several iterations?

Though you left alpha_stepvalues empty in the configuration, I'm still curious about this part.
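(A reading of the quoted code rather than an authoritative answer: alpha_stepvalue looks like a repeated field of inter_class_param (hence the generated alpha_stepvalue_size()), and whenever the iteration counter hits one of the listed values, alpha_ is increased additively by alpha_step (alpha_ += alpha_step_). So it behaves like a multi-step schedule that raises alpha_ over time rather than decaying it. A hypothetical configuration exercising that branch might look like this; the numbers are made up for illustration:)

inter_class_param {
  num_output: 10572
  type: AMONG
  alpha_start_iter: 16001
  alpha_start_value: 10
  alpha_step: 5            # hypothetical: amount added to alpha_ at each step value
  alpha_stepvalue: 20000   # hypothetical: alpha_ would become 15 here...
  alpha_stepvalue: 24000   # ...and 20 here
}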

Do we have to train the model for new faces?

Suppose we have new faces on which to apply this method and recognize each one. In this scenario, do I have to train the network on these new faces, or will the pretrained model give sufficient performance?

I ask because I cannot get my desired accuracy, so I guess there are two possibilities: 1) the pretrained model is not suitable for new faces, or 2) I am not using the Caffe C++ interface properly.

Thank you for your great work and support.

Base lambda value in the log file

According to the prototxt file, the base value of lambda is 1000 in the log file that you have attached on Google Drive here: https://drive.google.com/drive/folders/1kpGGvb5Nv0EmDicW2ue8LdUVjI0Ht1PP. However, in the same training log lambda was 5 from the first iteration.
I was wondering: if we set the base to 1000, shouldn't it start from 1000 and then decrease to the minimum lambda, which is 5 in this case? Could you please look into your log file at the link above?
Thanks.
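(For context, and hedged because it concerns the SphereFace margin layer rather than this repo's InterClass layer: in the original SphereFace code the annealing follows lambda = max(lambda_min, base * (1 + gamma * iter)^(-power)). With base = 1000 this indeed starts at 1000 and decays toward lambda_min = 5, but how quickly the logged value reaches 5 depends on gamma and power, and if training is resumed from a snapshot the iteration counter starts high, so the first logged lambda can already sit at the floor.)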

How to modify the params of the "inter_class_dist_output" layer?

Hi wy1iu, thanks for sharing your code. When I train on the webface_casia dataset, the loss value converges, but when I train on the ms1m dataset, the loss value diverges. The params are set as follows:
inter_class_param {
  num_output: 85164
  type: AMONG
  iteration: 16000
  alpha_start_iter: 20000
  alpha_start_value: 5.3
}
Should I modify the "iteration", "alpha_start_iter" and "alpha_start_value" params? Can you help me?
Are these params related to the size of the training dataset?
Thanks!
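(A hedged observation rather than an answer: in the inter_class_layer.cpp excerpt quoted above, the alpha-related fields are compared against the raw iteration counter, not against epochs or dataset size. ms1m is roughly an order of magnitude larger than webface_casia, so the same iteration thresholds cover a much smaller fraction of training; if the intent is to switch the inter-class term on after a comparable fraction of the schedule, the thresholds would presumably need to be scaled with the total number of training iterations. The divergence itself may also simply call for a smaller alpha_start_value or a lower learning rate.)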
