wy1iu / sphereface-plus


SphereFace+ Implementation for <Learning towards Minimum Hyperspherical Energy> in NIPS'18.

License: MIT License

MATLAB 0.49% Shell 0.31% CMake 1.22% Makefile 0.27% HTML 0.08% CSS 0.10% Jupyter Notebook 57.51% C++ 33.20% Python 4.03% Cuda 2.75% Dockerfile 0.03%
caffe face-recognition sphereface


sphereface-plus's Issues

Information about "inter_class_dist_output" layer

Dear Authors,
Thanks for the wonderful work. I was wondering if you could explain the parameters in the following "inter_class_dist_output" layer.

layer {
  name: "inter_class_dist_output"
  type: "InterClass"
  bottom: "fc5"
  bottom: "label"
  top: "maximize_inter_class_dist"
  param {
    name: "fc6_w"
    lr_mult: 1
    decay_mult: 1
  }
  inter_class_param {
    num_output: 10572
    type: AMONG
    iteration: 16000
    alpha_start_iter: 16001
    alpha_start_value: 10 #among
  }
}

Essentially, it would be great if you could answer the following questions:
1 -> fc6_w is a learnable parameter in the network. Could you please explain what it is used for?
2 -> Could you please explain the parameter type: AMONG?
3 -> It would also be great if you could explain the iteration, alpha_start_iter and alpha_start_value parameters and how we should tune them.

Thanks very much.
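(Not an official answer; the annotated copy of the block below is my own reading, based mainly on the inter_class_layer.cpp excerpt quoted in the "Hyper-param explained." issue further down. The comments are interpretations, not documentation.)

inter_class_param {
  num_output: 10572        # number of classes; 10572 matches the CASIA-WebFace identity list used by SphereFace
  type: AMONG              # selects which variant of the inter-class (MHE) term is used; other enum values are not shown in this issue
  iteration: 16000         # not referenced in the quoted C++ excerpt, so its exact role is unclear from this issue alone
  alpha_start_iter: 16001  # per the C++ excerpt, the iteration at which alpha_ is set to alpha_start_value
  alpha_start_value: 10    # the value alpha_ takes at that point; alpha_ appears to weight the inter-class term (the lambda_M asked about below)
}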

maximize_inter_class_dist becomes zero when training with just one GPU instead of two

When I run the default files (solvers and shell script) that trigger the training, it converges with two GPUs. When I train with just one GPU, the convergence is much slower - that is understandable.
Finally, when I train the model on just one GPU with the iter_size=2 setting that you mentioned, maximize_inter_class_dist in the console just becomes zero. Could you please comment on this?
Thanks.

Effect of mean and scale on the sphereface-plus training

Hello,
Thanks for the great work. One thing I noticed is that if I change the default mean and scale values, the softmax loss overflows (in sphereface training) and the model does not converge without overflowing (in sphereface-plus). The problem is that, somehow, I cannot use anything other than 127.5 as the mean and 0.0078125 as the scale. Could you please suggest how to use a mean value of 127 and a scale of 1.0 (which layers should I change, or which parameters should I modify) and still make the model converge?

  transform_param {
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    scale: 0.0078125
    mirror: true
  }
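(For reference rather than as an official recommendation: 0.0078125 is 1/128, so the default preprocessing maps a pixel x to (x - 127.5) / 128, i.e. roughly into [-1, 1]. With a mean of 127 and a scale of 1.0 the network inputs become about 128 times larger, which on its own can push the logits into overflow in the softmax unless the weight initialization and/or learning rate are rescaled to compensate. A minimal sketch of the alternative block being asked about, under that assumption:)

  transform_param {
    mean_value: 127
    mean_value: 127
    mean_value: 127
    scale: 1.0       # inputs now span roughly [-127, 128] instead of roughly [-1, 1]
    mirror: true
  }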

About the inter_class_param for finetuning

Hi wy1iu,

I'm trying to use MHE to finetune my FR model trained with AM-Softmax, as mentioned in the paper.
I have gotten about a 0.5% gain on my own dataset with the default parameters from the log file (as follows).
inter_class_param {
  num_output: 10572
  type: AMONG
  iteration: 16000
  alpha_start_iter: 16001
  alpha_start_value: 10
}
But it seems that "iteration" and "alpha_start_iter" are too large for finetuning, aren't they?

Thanks!
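(Not an authoritative answer, but one consequence of the inter_class_layer.cpp excerpt quoted in the "Hyper-param explained." issue below: alpha_ is only assigned alpha_start_value once the layer's iteration counter reaches alpha_start_iter. So if the finetuning run is much shorter than ~16000 iterations, the inter-class term would presumably never be switched on with these defaults, and scaling iteration and alpha_start_iter down to the length of the finetuning schedule looks like the natural adjustment.)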

Confusion in the display of the loss while training sphereface-plus

Thanks for your great work. I was just wondering about the following snippet:

I0204 14:26:03.838340  6323 solver.cpp:218] Iteration 10980 (0.968637 iter/s, 20.6476s/20 iters), loss = 14.7497
I0204 14:26:03.838383  6323 solver.cpp:237]     Train net output #0: lambda = 5
I0204 14:26:03.838428  6323 solver.cpp:237]     Train net output #1: maximize_inter_class_dist = 14.3479 (* 1 = 14.3479 loss)
I0204 14:26:03.838443  6323 solver.cpp:237]     Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss)
I0204 14:26:03.838527  6323 sgd_solver.cpp:105] Iteration 10980, lr = 0.01
I0204 14:26:22.732009  6323 solver.cpp:447] Snapshotting to binary proto file result/m_single_model_iter_11000.caffemodel
I0204 14:26:23.138741  6323 sgd_solver.cpp:273] Snapshotting solver state to binary proto file result/m_single_model_iter_11000.solverstate
I0204 14:26:24.393401  6323 solver.cpp:310] Iteration 11000, loss = 13.0881
I0204 14:26:24.393429  6323 solver.cpp:315] Optimization Done.
I0204 14:26:24.788636  6323 caffe.cpp:259] Optimization Done.

There is this Train net output #2: softmax_loss = 0.401827 (* 1 = 0.401827 loss), and at the end it also shows I0204 14:26:24.393401 6323 solver.cpp:310] Iteration 11000, loss = 13.0881. What is this last loss, 13.0881? I think it is a combination of a few other losses. Should I pay attention to this loss or to the softmax_loss during training?
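(A quick sanity check on the log above: at iteration 10980 the two displayed terms sum to the displayed total, 14.3479 * 1 + 0.401827 * 1 = 14.7497, so the "loss" printed by solver.cpp appears to be simply the weighted sum of maximize_inter_class_dist and softmax_loss. The final 13.0881 at iteration 11000 would then be the same combined quantity, not a separate loss; the softmax_loss component is the part that tracks the classification objective.)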

Hyper-param explained.

Hi, I got confused by part of the code:
https://github.com/wy1iu/sphereface-plus/blob/master/tools/caffe-sfplus/src/caffe/layers/inter_class_layer.cpp#L137-L161

Dtype alpha_start_iter_ = this->layer_param_.inter_class_param().alpha_start_iter();
  Dtype alpha_start_value_ = this->layer_param_.inter_class_param().alpha_start_value();
  Dtype alpha_step_ = this->layer_param_.inter_class_param().alpha_step();
  Dtype alpha_stepvalue_size = this->layer_param_.inter_class_param().alpha_stepvalue_size();
  Dtype normalize_ = this->layer_param_.inter_class_param().normalize();
  if (alpha_stepvalue_size != 0){
    const int* alpha_stepvalue_data = alpha_stepvalues.cpu_data();
    if (alpha_start_iter_ == iter_){
      alpha_ = alpha_start_value_;
    }
    else if(alpha_start_iter_ < iter_) {
      if(alpha_stepvalue_data[alpha_index_] == iter_ && alpha_index_<alpha_stepvalue_size){
        alpha_ += alpha_step_;
        alpha_index_ += (Dtype)1.;
      }
    }
  }
  else{
    if (alpha_start_iter_ == iter_){
      alpha_ = alpha_start_value_;
    }
  }
  if (top.size() == 2) {
    top[1]->mutable_cpu_data()[0] = alpha_;
  }

It seems that you are doing something like annealing, right?

Is alpha_ the factor for energy term ($\lambda_M$ in Eq.(8) of the original paper)?

Roughly I understand that alpha_start_iter_ and alpha_start_value_ are the starting timestamp and starting value of alpha_.

But what does alpha_stepvalues mean?
Did you do something like a multi-step LR scheduler that decreases alpha_ by a factor after several iterations?

Though you left alpha_stepvalues empty in the configuration, I'm still curious about this part.
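(A reading of the quoted code rather than an authoritative answer: alpha_stepvalue looks like a repeated field of inter_class_param (hence the generated alpha_stepvalue_size()), and whenever the iteration counter hits one of the listed values, alpha_ is increased additively by alpha_step (alpha_ += alpha_step_). So it behaves like a multi-step schedule that raises alpha_ over time rather than decaying it. A hypothetical configuration exercising that branch might look like this; the numbers are made up for illustration:)

inter_class_param {
  num_output: 10572
  type: AMONG
  alpha_start_iter: 16001
  alpha_start_value: 10
  alpha_step: 5            # hypothetical: amount added to alpha_ at each step value
  alpha_stepvalue: 20000   # hypothetical: alpha_ would become 15 here...
  alpha_stepvalue: 24000   # ...and 20 here
}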

Do we have to train the model for new faces?

Suppose we have new faces on which to apply this method and recognize each one. In this scenario, do I have to train the network on these new faces, or will the pretrained model give sufficient performance?

I ask because I cannot get my desired accuracy, so I guess there are two possibilities: 1) the pretrained model is not suitable for new faces, or 2) I am not using the Caffe C++ interface properly.

Thank you for your great work and support.

Base lambda value in the log file

According to the prototxt file, the base value of lambda is 1000 in the log file that you have attached on Google Drive here: https://drive.google.com/drive/folders/1kpGGvb5Nv0EmDicW2ue8LdUVjI0Ht1PP. However, in the same training log lambda was 5 from the first iteration.
I was wondering: if we set the base to 1000, shouldn't it start from 1000 and then decrease to the minimum lambda, which is 5 in this case? Could you please look into your log file at the link above?
Thanks.
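(For context, and hedged because it concerns the SphereFace margin layer rather than this repo's InterClass layer: in the original SphereFace code the annealing follows lambda = max(lambda_min, base * (1 + gamma * iter)^(-power)). With base = 1000 this indeed starts at 1000 and decays toward lambda_min = 5, but how quickly the logged value reaches 5 depends on gamma and power, and if training is resumed from a snapshot the iteration counter starts high, so the first logged lambda can already sit at the floor.)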

How to modify the params of the "inter_class_dist_output" layer?

Hi wy1iu, thanks for sharing your code. When I train on the webface_casia dataset, the loss value converges, but when I train on the ms1m dataset, the loss value diverges. The params are set as follows:
inter_class_param {
  num_output: 85164
  type: AMONG
  iteration: 16000
  alpha_start_iter: 20000
  alpha_start_value: 5.3
}
Should I modify the "iteration", "alpha_start_iter" and "alpha_start_value" params? Can you help me?
Are these params related to the size of the training dataset?
Thanks!
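(A hedged observation rather than an answer: in the inter_class_layer.cpp excerpt quoted above, the alpha-related fields are compared against the raw iteration counter, not against epochs or dataset size. ms1m is roughly an order of magnitude larger than webface_casia, so the same iteration thresholds cover a much smaller fraction of training; if the intent is to switch the inter-class term on after a comparable fraction of the schedule, the thresholds would presumably need to be scaled with the total number of training iterations. The divergence itself may also simply call for a smaller alpha_start_value or a lower learning rate.)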
