tamerthamoqa / facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.

License: MIT License

Language: Python (100.00%)

Topics: face-recognition, facenet, lfw-dataset, multi-gpu, pretrained-model, pytorch, triplet-loss, vggface2-dataset

facenet-pytorch-glint360k's People

Contributors: agenchev, tamerthamoqa

facenet-pytorch-glint360k's Issues

torch.load pre-trained model error

Hello, thank you for your contribution. I downloaded your weight file and tried to load it to check its performance on LFW. However, when loading the .pt file according to your description, the error "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified." occurred. Could the pickle/save operation for your weight file be at fault?
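
For reference, this particular unpickling error usually means the file is a TorchScript archive (written with torch.jit.save) being opened with the plain torch.load. A minimal sketch of trying both loaders, with a placeholder file name:

import pickle
import torch

checkpoint_path = "model_resnet34_triplet.pt"  # placeholder name

try:
    # Works for checkpoints written with torch.save() (pickle-based)
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
except pickle.UnpicklingError:
    # "load persistent id" errors typically indicate a TorchScript
    # archive, which must be opened with torch.jit.load() instead
    model = torch.jit.load(checkpoint_path, map_location="cpu")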

Pre-trained model does not reproduce the results

Hi,

Thanks for sharing the repo. I tried to evaluate the pre-trained model on LFW. I couldn't reproduce the results you've reported.

The performance gap seems a bit large. I am sharing the ROC curve I've reproduced below:

[ROC curve image: lfw_roc_resnet34_epoch_0_triplet_evalonly_lfw]

Thanks for your help in advance!

Error with code in readme for pretrained model

Hello, I tried your pretrained model with the code in the readme file, but I get the error below. If I only pass 'img' to preprocess it works, but the results aren't that great. Is there something I'm doing wrong? Thanks!

img = preprocess(img.to(device))
AttributeError: 'numpy.ndarray' object has no attribute 'to'
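
For context, .to(device) is a torch.Tensor method, while an image loaded with OpenCV is a numpy.ndarray, so the device move has to happen after the transform produces a tensor. A minimal sketch of the likely intended order, assuming preprocess is the readme's torchvision transform pipeline and model is the pre-trained network (the file name is a placeholder):

import cv2
import torch
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

img = cv2.imread("face.jpg")                # numpy.ndarray in BGR order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB
img = Image.fromarray(img)                  # torchvision transforms expect PIL

# Apply the transform first, then move the resulting tensor to the device
tensor = preprocess(img).unsqueeze(0).to(device)
embedding = model(tensor)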

About make Triplet dataset

I have a question: when generating triplets, shouldn't the distance between the positive and the anchor be less than the distance between the anchor and the negative?
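
For context, the FaceNet triplet loss L = max(d(a,p) - d(a,n) + margin, 0) is zero exactly when d(a,p) + margin <= d(a,n), so triplets that already satisfy the constraint contribute no gradient; that is why implementations often generate triplets randomly and then keep only the ones that still violate the margin. A minimal sketch of such a filter (tensor names are illustrative):

import torch.nn.functional as F

margin = 0.2  # margin value used in the FaceNet paper

def violating_triplet_mask(anchor, positive, negative, margin=margin):
    """True for triplets where d(a,p) + margin > d(a,n), i.e. loss > 0."""
    pos_dist = F.pairwise_distance(anchor, positive, p=2)
    neg_dist = F.pairwise_distance(anchor, negative, p=2)
    return pos_dist + margin > neg_dist

# Keep only the triplets that still produce a nonzero loss:
# mask = violating_triplet_mask(anchor_emb, positive_emb, negative_emb)
# loss = F.triplet_margin_loss(anchor_emb[mask], positive_emb[mask],
#                              negative_emb[mask], margin=margin)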

Embeddings are getting clustered together in a small region after training

Hi @tamerthamoqa,

Thanks a lot for such a fantastic repo, which we could use for our work.
I was recently working on building a face verification system using a Siamese network. Using models pre-trained on the CASIA-WebFace and VGGFace2 datasets, I was able to achieve close to 90% accuracy on my dataset. I then used hard triplet batch sampling and training to fine-tune the network further, but for some reason, after training, the embeddings for all images are clustered together; in other words, the embeddings of two different persons end up too close to each other. For example, where the pre-trained models gave a cosine distance of 0.45 between two embeddings, after training with this triplet loss we get 0.006, and the distance barely changes between same-person and different-person pairs.

If you could give me any insights on this, that would be helpful.
Thanks
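
One general diagnostic for this kind of collapse (not from this repo, just a common sanity check) is to measure the spread of pairwise distances over a batch of embeddings; a near-zero mean and standard deviation confirms the features have collapsed into a small region. A minimal sketch:

import torch
import torch.nn.functional as F

def embedding_spread_stats(embeddings):
    """Mean/std of pairwise L2 distances for a batch of embeddings."""
    embeddings = F.normalize(embeddings, p=2, dim=1)   # L2-normalize first
    distances = torch.cdist(embeddings, embeddings, p=2)
    # Drop the zero diagonal before computing the statistics
    off_diag = distances[~torch.eye(len(distances), dtype=torch.bool)]
    return off_diag.mean().item(), off_diag.std().item()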

Upload raw glint360k?

Hi!

I would like to experiment with different alignment strategies and more powerful detection/alignment models. For that, it would be nice to have the raw glint360k dataset. However, it is hard to get access to: the torrent is pretty much dead, and Baidu is not very cooperative.

Would it be possible for you to upload raw glint360k to Google Drive?

triplet_loss_dataloader.py

Hello, I'm Daniel.
While running your project, one question arose.

In dataloader/triplet_loss_dataloader, the (pos, neg) classes are generated randomly for the number of triplets allocated to each process, and images are selected randomly. However, when using np.random.choice, I confirmed that the same random values are produced in every process. So I used np.random.RandomState() instead, and each process then drew different random values (see the sketch after this issue).

Please let me know whether I have understood this correctly.

Thank you.
Daniel
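
For reference, this is expected NumPy behaviour when worker processes are forked: each child inherits the parent's global random state, so np.random.choice yields identical sequences. A minimal sketch of giving each process its own stream, along the lines Daniel describes (the class count and triplet count are illustrative):

import numpy as np
from multiprocessing import Pool

def generate_triplet_classes(args):
    num_triplets, seed = args
    # Per-process generator: forked workers would otherwise share the
    # parent's global np.random state and draw identical values
    rng = np.random.RandomState(seed)
    return rng.choice(1000, size=(num_triplets, 2))  # (pos, neg) class pairs

if __name__ == "__main__":
    num_processes = 4
    with Pool(num_processes) as pool:
        # Seed each worker differently, e.g. from its process index
        results = pool.map(generate_triplet_classes,
                           [(100, i) for i in range(num_processes)])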

LICENCE

It would be very much appreciated if you could add a licence to this project. Thank you.

About maintaining the aspect ratio of face

Hi,
I found that the faces in your training and LFW test datasets are stretched. From my perspective, if the trained model then runs inference on faces with a normal aspect ratio, this may result in performance degradation.

Do you think it is necessary to keep the original face aspect ratio? (One aspect-ratio-preserving alternative is sketched below.)
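
For illustration, a common way to preserve the aspect ratio (not what this repo does) is a letterbox-style resize: scale the longer side to the target size and pad the remainder. A minimal sketch with PIL:

from PIL import Image

def letterbox_resize(img, target=224):
    """Resize keeping the aspect ratio, padding the remainder with black."""
    width, height = img.size
    scale = target / max(width, height)
    resized = img.resize((round(width * scale), round(height * scale)))
    canvas = Image.new("RGB", (target, target))  # black padding by default
    # Paste the resized face centred on the square canvas
    offset = ((target - resized.width) // 2, (target - resized.height) // 2)
    canvas.paste(resized, offset)
    return canvas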

Questions about running validate_lfw() function in train_triplets_loss.py

Hello @tamerthamoqa
I used the validate_lfw() function in my FaceNet project without changing anything, but when evaluating, it took almost 2 hours to calculate the distances and other metrics, and I still didn't get a result. So the first question is whether evaluation is expected to take this long because it computes on the CPU instead of the GPU; if it does, evaluating every epoch would be costly. I also wonder how long it takes to train the whole model. I would be very thankful if you could share your training details so I can figure out whether something is incorrect in my code.
Thanks sincerely!
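
As an aside, the embedding and distance computations themselves vectorise easily on the GPU; a minimal sketch (this is not the repo's validate_lfw() implementation, and the model/image batch names are illustrative):

import torch
import torch.nn.functional as F

@torch.no_grad()
def pair_distances(model, images_a, images_b, device="cuda"):
    """L2 distances between embeddings of paired image batches, on the GPU."""
    emb_a = F.normalize(model(images_a.to(device)), p=2, dim=1)
    emb_b = F.normalize(model(images_b.to(device)), p=2, dim=1)
    return F.pairwise_distance(emb_a, emb_b, p=2).cpu()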

Face alignment for increased TAR@FAR (after training) and a couple more thoughts

@tamerthamoqa
Hello again! Your pre-trained model is trained on the unaligned VGGFace2 dataset, so it performs well under pose variation. But many projects pre-process the images to obtain aligned faces, which helps them increase the TAR @ FAR score for a given CNN model.
So I wonder: are you interested in testing what we can get with face alignment?
I implemented face alignment as a transformation for torchvision.transforms (the pattern is sketched below), which let me test your pre-trained model on the raw LFW with this transform. It obtained TAR: 0.6640+-0.0389 @ FAR: 0.0010 without training and without face stretching, which I think is promising. Unfortunately, it cannot be used with the cropped VGGFace2 and LFW for training/testing, because those faces are deformed/stretched (although the transform could be made to stretch the faces as well), and some face detections fail.
The next thing I'm not sure about is whether we can obtain fewer false positives if the input faces are not stretched but preserve their shape. This leads to the next question: why was the input chosen to be a 224×224 square? Couldn't we change it to a rectangle (for example 208×240) that better fits the human face, instead of stretching the (aligned) faces?
I also see that the normalized tensors' RGB values have the range [-2, 2]; is this the best range?
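
For reference, any callable can be slotted into a torchvision.transforms pipeline, so an alignment step fits naturally; a minimal sketch, where align_face is a hypothetical stand-in for the detector/aligner actually used, and the Normalize values are chosen only so that the output range is [-2, 2] as noted above (the repo's actual values may differ):

import torchvision.transforms as transforms

class FaceAlign:
    """Callable transform wrapping a face aligner (align_face is hypothetical)."""
    def __call__(self, img):
        # Hypothetical: detect landmarks and warp the face to a canonical pose
        return align_face(img)

preprocess = transforms.Compose([
    FaceAlign(),                    # align before any geometric resize
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
])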

Questions about L2 Normalization

Hi @tamerthamoqa ,
I'm curious about L2 normalization, which constrains the embeddings onto the unit hypersphere in Euclidean feature space, so shouldn't the maximum distance between two features in the feature space be 2? Why does the threshold range from 0.0 to 4.0?
Thanks!
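
For reference, both bounds are consistent, depending on whether the distance or its square is meant. For unit-norm embeddings $u$ and $v$:

$$\|u - v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 - 2\,u^\top v = 2 - 2\,u^\top v \in [0, 4]$$

so the Euclidean distance $\|u - v\|_2$ lies in $[0, 2]$, while the squared distance lies in $[0, 4]$; sweeping thresholds over [0.0, 4.0] matches the squared distance (or is simply a conservative upper bound for the plain distance).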

Precision calculations

Hello @tamerthamoqa,
I am using this repo on a custom dataset, but I encountered some weird behaviour: every other metric changes during the epochs, but precision always stays the same at 0.5000+-0.5000. I have also defined a custom validation dataset for which I generated an equal number of positive and negative pairs, 422 pairs in total. Here's an example from one of the epochs:

100%|██████████| 100/100 [00:22<00:00, 4.45it/s]
Epoch 137: Number of valid training triplets in epoch: 4
Validating on LFW! ...
100%|██████████| 3/3 [00:01<00:00, 2.65it/s]
Accuracy on LFW: 0.8818+-0.0417 Precision 0.5000+-0.5000 Recall 0.4364+-0.4368 ROC Area Under Curve: 0.1977 Best distance threshold: 1.16+-0.03 TAR: 0.2068+-0.2145 @ FAR: 0.0000

I tried reducing the range of thresholds from the default to:
thresholds_roc = np.arange(0.5, 0.8, 0.1)
thresholds_val = np.arange(0.5, 0.8, 0.1)
But the precision stays the same. My question is: what is going on with the precision calculations? As far as I have reviewed it, the calculation logic checks out.
Thank you in advance.
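
For what it's worth, 0.5000+-0.5000 is the exact signature of every fold reporting either 0.0 or 1.0 (half each), which typically happens when some folds contain no predicted positives at the chosen threshold. A minimal sketch of the arithmetic (the per-fold values are hypothetical):

import numpy as np

# Hypothetical per-fold precisions: folds with no predicted positives
# report 0.0, folds where every predicted positive is correct report 1.0
fold_precisions = np.array([0.0, 1.0] * 5)

print(f"{fold_precisions.mean():.4f}+-{fold_precisions.std():.4f}")
# -> 0.5000+-0.5000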

Glint360k downloading issues

Hello, @tamerthamoqa!
I have tried to download glint360k (unpacked) from Google Drive, but each time the download fails in the middle due to network issues. I don't know why, but I have not found any other copy of the glint360k dataset except yours.

Could I kindly ask you to split the whole .zip file into pieces of 5 GB (with the command: split --verbose -b5G glint360_unpacked.zip glint360_unpacked.zip.) and upload those pieces to Google Drive so that I can download them without network errors?

This is really important for me: I am doing a thesis on face recognition, and this dataset shows great metrics on validation datasets.

Resnet 18

Hello @tamerthamoqa

May I ask why you chose ResNet-18 for training? From my understanding, the more data we have, the deeper the network we can use. Since VGGFace2 contains about 3M images, I think ResNet-50 would be a better choice, wouldn't it?

calculate_roc_values wrong function

Hello @tamerthamoqa,
I think there is an error in the calculate_roc_values function when it comes to the true_positive_rate and false_positive_rate calculation: the mean should be calculated outside the fold loop, since we average across all folds.

The corrected code (with the imports it needs; calculate_metrics comes from the repo's evaluation utilities):
import numpy as np
from sklearn.model_selection import KFold


def calculate_roc_values(thresholds, distances, labels, num_folds=10):
    num_pairs = min(len(labels), len(distances))
    num_thresholds = len(thresholds)
    k_fold = KFold(n_splits=num_folds, shuffle=False)

    true_positive_rates = np.zeros((num_folds, num_thresholds))
    false_positive_rates = np.zeros((num_folds, num_thresholds))
    precision = np.zeros(num_folds)
    recall = np.zeros(num_folds)
    accuracy = np.zeros(num_folds)
    best_distances = np.zeros(num_folds)

    indices = np.arange(num_pairs)

    for fold_index, (train_set, test_set) in enumerate(k_fold.split(indices)):
        # Find the best distance threshold for the k-fold cross validation
        # using the train set
        accuracies_trainset = np.zeros(num_thresholds)
        for threshold_index, threshold in enumerate(thresholds):
            _, _, _, _, accuracies_trainset[threshold_index] = calculate_metrics(
                threshold=threshold,
                dist=distances[train_set],
                actual_issame=labels[train_set],
            )
        best_threshold_index = np.argmax(accuracies_trainset)

        # Test on the test set using the best distance threshold
        for threshold_index, threshold in enumerate(thresholds):
            (
                true_positive_rates[fold_index, threshold_index],
                false_positive_rates[fold_index, threshold_index],
                _,
                _,
                _,
            ) = calculate_metrics(
                threshold=threshold,
                dist=distances[test_set],
                actual_issame=labels[test_set],
            )

        (
            _,
            _,
            precision[fold_index],
            recall[fold_index],
            accuracy[fold_index],
        ) = calculate_metrics(
            threshold=thresholds[best_threshold_index],
            dist=distances[test_set],
            actual_issame=labels[test_set],
        )

        best_distances[fold_index] = thresholds[best_threshold_index]

    # Calculate mean values of TPR and FPR across all folds
    # (outside the fold loop, which is the fix)
    true_positive_rate = np.mean(true_positive_rates, axis=0)
    false_positive_rate = np.mean(false_positive_rates, axis=0)

    return (
        true_positive_rate,
        false_positive_rate,
        precision,
        recall,
        accuracy,
        best_distances,
    )
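
A quick usage sketch with synthetic data (calculate_metrics must be importable from the repo's evaluation code for this to run):

import numpy as np

rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 4.0, size=1000)           # synthetic pair distances
labels = rng.integers(0, 2, size=1000).astype(bool)    # synthetic same/different labels
thresholds = np.arange(0.0, 4.0, 0.01)                 # sweep matching the [0, 4] range

tpr, fpr, precision, recall, accuracy, best_distances = calculate_roc_values(
    thresholds=thresholds, distances=distances, labels=labels, num_folds=10
)
print(f"Accuracy: {accuracy.mean():.4f}+-{accuracy.std():.4f}")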

embedding vectors dimension

Hi @tamerthamoqa ,

Thanks a lot for your great repo.
According to the FaceNet paper, the best dimension for the embedding vector is 128. I am curious to know whether there is any specific reason you used an embedding dimension four times bigger, 512?
