Siamese-Networks-for-One-Shot-Learning

I created this repository to familiarize myself with one-shot learning. The code uses the Keras library and the Omniglot dataset, and it attempts to implement Siamese Neural Networks for One-shot Image Recognition by Koch et al.

One-Shot Learning

Most current deep learning models need thousands of labeled samples per class, and data acquisition for most tasks is very expensive. Models that could learn from one or a few samples would remove the need to acquire and label thousands of examples. One could argue that a young child learns many concepts without needing a large number of examples. This is where one-shot learning comes in: the task of classifying when only one example of each possible class is available in each test task. This ability to learn from little data could be useful in many machine learning problems.

Although the paper focuses on images, the concept can be applied to many fields. To fully understand the problem, we should describe what counts as a one-shot task. Given a test sample X, a one-shot task aims to classify it into one of N categories. To do so, the model is given a support set containing one sample of each of the N categories (an N-way one-shot task) and must decide which class the test sample belongs to. Note that none of the samples used in a one-shot task have been seen by the model: the categories used in training and testing are different.

The Omniglot dataset is frequently used to evaluate models on one-shot learning tasks. Let's take a closer look at this dataset, since it is the one used in the paper (MNIST was also tested, but we will stick with Omniglot).

Omniglot Dataset

The Omniglot dataset consists of 50 different alphabets, 30 in a background set and 20 in an evaluation set. Each alphabet has between 14 and 55 characters, each drawn by 20 different subjects, resulting in 20 images of 105x105 pixels per character. The background set should be used in training, for hyperparameter tuning and feature learning, leaving the final results to the remaining 20 alphabets, which are never seen by the models trained on the background set. Despite this, the paper uses 40 background alphabets and 10 evaluation alphabets.

This dataset is sometimes described as a sort of MNIST transpose: the number of classes is considerably higher than the number of training samples per class, which makes it well suited to one-shot tasks.

The authors use a 20-way one-shot task to measure performance on the evaluation set. For each alphabet, 40 different one-shot tasks are performed, for a total of 400 tasks over the 10 evaluation alphabets. An example of a one-shot task in this dataset can be seen in the following figure:

One-Shot Task
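The evaluation procedure described above can be sketched in a few lines. This is only an illustrative sketch, not the repository's code: `toy_similarity` is a hypothetical stand-in for the trained network's "same class" score, and the random vectors stand in for character images.

```python
import random


def one_shot_classify(test_item, support_set, similarity):
    """Return the label of the support item most similar to test_item."""
    best_label, best_score = None, float("-inf")
    for label, item in support_set.items():
        score = similarity(test_item, item)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def toy_similarity(a, b):
    # Hypothetical stand-in for the network output: higher means "more alike".
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


def run_tasks(n_way=20, n_tasks=40, seed=0):
    """Run n_tasks random N-way one-shot tasks and return the accuracy."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        # One example per class; the test item is a noisy copy of one of them.
        support = {c: [rng.random() for _ in range(8)] for c in range(n_way)}
        true_label = rng.randrange(n_way)
        test = [v + rng.gauss(0, 0.01) for v in support[true_label]]
        predicted = one_shot_classify(test, support, toy_similarity)
        correct += int(predicted == true_label)
    return correct / n_tasks


if __name__ == "__main__":
    # 40 tasks per alphabet over 10 evaluation alphabets gives 400 tasks.
    print("tasks in total:", 40 * 10)
    print("toy accuracy:", run_tasks())
```

With a real model, `similarity` would be the Siamese network's prediction for the pair (test image, support image).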

Let's dive into the methodology proposed by Koch et al. to solve this one-shot task problem.

Methodology

To solve this problem, the authors propose a deep convolutional Siamese network. Siamese nets were introduced by Bromley and Yann LeCun in the 90s for a verification problem. Siamese nets are two twin networks that accept distinct inputs but are joined by an energy function that computes a distance metric between the outputs of the two nets. The weights of both networks are tied, so they compute the same function. In this paper, the weighted L1 distance between the twin feature vectors is used as the energy function, combined with a sigmoid activation.
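As a rough illustration of the energy function described above, the sketch below computes a sigmoid over a weighted L1 distance between two feature vectors. The names `alpha` and `bias` are assumptions made for this example; in a trained model the weights would be learned.

```python
import math


def siamese_probability(h1, h2, alpha, bias=0.0):
    """Sigmoid of the weighted L1 distance between two feature vectors.

    h1, h2: feature vectors produced by the twin networks (same length).
    alpha:  one weight per feature dimension (learned in a real model).
    bias:   optional offset, an assumption of this sketch.
    """
    if not (len(h1) == len(h2) == len(alpha)):
        raise ValueError("feature vectors and weights must have equal length")
    z = bias + sum(a * abs(x - y) for a, x, y in zip(alpha, h1, h2))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    alpha = [-2.0, -2.0, -2.0]
    print(siamese_probability([0.1, 0.5, 0.9], [0.1, 0.5, 0.9], alpha, bias=3.0))
    print(siamese_probability([0.1, 0.5, 0.9], [0.9, 0.1, 0.2], alpha, bias=3.0))
```

Because the absolute differences are never negative, identical inputs can only receive the highest score if the weights are mostly negative or a positive bias is present, which is why the example uses negative `alpha` values.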

This architecture seems to be designed for verification tasks, and this is exactly how the authors approach the problem.

The paper uses a convolutional neural network: three blocks of Conv-ReLU-MaxPooling followed by a Conv-ReLU block connected to a fully connected layer with a sigmoid activation. This layer produces the feature vectors that are fused by the weighted L1 distance layer. The output is fed to a final layer that outputs a value between 0 and 1 (different class or same class). To find the best architecture, Bayesian hyperparameter tuning was performed. The best architecture is depicted in the following image:

best_architecture
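To make the layer stack more concrete, the sketch below traces the feature-map sizes through the four convolutional blocks. The kernel sizes and filter counts are the ones I recall from Koch et al. (10x10, 7x7, 4x4 and 4x4 kernels with 64, 128, 128 and 256 filters, unpadded convolutions and 2x2 pooling); treat them as assumptions and check them against the figure in the paper.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool


def feature_map_sizes(input_size=105):
    """Return (height, width, channels) after each convolutional block."""
    # (kernel size, number of filters, followed by max pooling?)
    # Values assumed from the paper, not taken from this repository.
    layers = [(10, 64, True), (7, 128, True), (4, 128, True), (4, 256, False)]
    size, trace = input_size, []
    for kernel, filters, pooled in layers:
        size = conv_out(size, kernel)
        if pooled:
            size = pool_out(size)
        trace.append((size, size, filters))
    return trace


if __name__ == "__main__":
    for shape in feature_map_sizes():
        print(shape)
```

Under these assumptions the last block yields a 6x6x256 volume, that is 9216 values feeding the fully connected layer.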

L2 regularization is used in each layer, and the optimizer is stochastic gradient descent with momentum. As previously mentioned, Bayesian hyperparameter optimization was used to find the best values for the following hyperparameters:

  • Layer-wise Learning Rates (search from 0.0001 to 0.1)
  • Layer-wise Momentum (search from 0 to 1)
  • Layer-wise L2-regularization penalty (from 0 to 0.1)
  • Filter Size from 3x3 to 20x20
  • Number of filters from 16 to 256 (in multiples of 16)
  • Number of units in the fully connected layer from 128 to 4096 (in multiples of 16)
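The search ranges listed above can be written down as plain data. The sketch below uses a list of dictionaries with `name`, `type` and `domain` keys, which is, as far as I recall, the layout GPyOpt accepts for its search domain; check the GPyOpt documentation before relying on it. For brevity there is one entry per quantity, whereas a layer-wise search would need one entry per layer. The `sample_configuration` helper draws uniformly at random and is only a stand-in to show how the space is read, not Bayesian optimization.

```python
import random

# One entry per hyperparameter; a layer-wise search would repeat the first
# three entries once per layer (omitted here to keep the sketch short).
search_space = [
    {"name": "learning_rate", "type": "continuous", "domain": (1e-4, 1e-1)},
    {"name": "momentum", "type": "continuous", "domain": (0.0, 1.0)},
    {"name": "l2_penalty", "type": "continuous", "domain": (0.0, 0.1)},
    {"name": "filter_size", "type": "discrete", "domain": tuple(range(3, 21))},
    {"name": "n_filters", "type": "discrete", "domain": tuple(range(16, 257, 16))},
    {"name": "dense_units", "type": "discrete", "domain": tuple(range(128, 4097, 16))},
]


def sample_configuration(space, rng):
    """Draw one random configuration (random search, not Bayesian)."""
    config = {}
    for dim in space:
        if dim["type"] == "continuous":
            low, high = dim["domain"]
            config[dim["name"]] = rng.uniform(low, high)
        else:
            config[dim["name"]] = rng.choice(dim["domain"])
    return config


if __name__ == "__main__":
    print(sample_configuration(search_space, random.Random(0)))
```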

The training setup has the following details:

  • The learning rate is defined layer-wise and is decayed by 1% each epoch.
  • In every layer the momentum starts at 0.5 and is increased linearly each epoch until it reaches a value mu.
  • 40 alphabets were used for training and validation and 10 for evaluation.
  • The problem is treated as a verification task, since training consists of classifying pairs as same or different character.
  • In the evaluation phase, the test image is paired with each of the support set characters. The pair with the highest output probability determines the class of the test image.
  • Data augmentation was used with affine distortions (rotations, translations, shear and zoom).
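The two schedules in the list above can be sketched as simple functions of the epoch number. The `slope` argument is hypothetical: the list only says that the momentum grows linearly from 0.5 up to a final value, not how fast.

```python
def decayed_learning_rate(initial_lr, epoch, decay=0.01):
    """Learning rate after `epoch` epochs with a 1% decay per epoch."""
    return initial_lr * (1.0 - decay) ** epoch


def linear_momentum(epoch, final_momentum, start=0.5, slope=0.01):
    """Momentum that grows linearly from `start` and is capped at the final value.

    `slope` (increase per epoch) is an assumed parameter of this sketch.
    """
    return min(final_momentum, start + slope * epoch)


if __name__ == "__main__":
    for epoch in (0, 1, 10, 100):
        print(epoch, decayed_learning_rate(0.001, epoch), linear_momentum(epoch, 0.9))
```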

Implementation Details

When comparing to the original paper, there are some differences in this implementation, namely:

  • The organization of training/validation/evaluation is different from the original paper. In the paper they follow the division suggested by the paper that introduced the Omniglot dataset, while in this implementation I used a different approach: from the 30 alphabets of the background set, 80% (24) are used for training and 20% (6) are used for validation one-shot tasks.
  • In the paper it is said that the momentum evolves linearly along epochs, but no details about this are present. Therefore I introduced a momentum_slope parameter that controls how the momentum evolves across the epochs.
  • In the paper the learning rate decays 1% each epoch, while in this implementation it decays 1% every 500 iterations.
  • The hyperparameter optimization does not include the Siamese network architecture tuning. Since the paper already describes the best architecture, I decided to reduce the hyperparameter space search to just the other parameters.
  • Weight initialization: I found that it does not have a large influence on the final results. Therefore, in this implementation the default Glorot uniform initialization is used.
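Two of the differences above are easy to illustrate: the 80/20 split of the 30 background alphabets and the learning rate that decays by 1% every 500 iterations. The function names and the fixed seed are assumptions of this sketch, not the names used in the repository.

```python
import random


def split_alphabets(alphabets, train_fraction=0.8, seed=0):
    """Shuffle the alphabets and split them into training and validation lists."""
    shuffled = sorted(alphabets)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def learning_rate_at(iteration, initial_lr, decay=0.01, every=500):
    """Learning rate that drops by 1% after each block of `every` iterations."""
    return initial_lr * (1.0 - decay) ** (iteration // every)


if __name__ == "__main__":
    background = ["alphabet_%02d" % i for i in range(30)]  # placeholder names
    train, validation = split_alphabets(background)
    print(len(train), len(validation))
    print(learning_rate_at(1500, 0.001))
```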

Code Details

There are two main files to run the code in this repo:

  • train_siamese_networks.py that allows you to train a siamese network with a specific set of parameters.
  • bayesian_hyperparameter_optimization.py that does Bayesian hyperparameter optimization as described in the paper.

Both scripts write the TensorFlow training curves to a logs folder (created automatically) that can be inspected in TensorBoard. The models with the highest validation one-shot task accuracy are saved in a models folder, so the best models are kept.

Regarding the rest of the code:

  • omniglot_loader is a class used to load the dataset and prepare it for training and for the one-shot tasks.
  • image_augmentor is used by omniglot_loader to augment data as described in the paper. Most of this code is adapted from the Keras image generator code.
  • modified_sgd is an adaptation of the original Keras SGD, modified to allow layer-wise learning rates and momentums.
  • siamese_net is the main class that holds the model and trains it.

Notes:

  • I noticed that some combinations of hyperparameters (especially with high learning rates) lead to the training accuracy stabilizing at 0.5, with the network outputting the same probability for all images. Therefore I added some early stop conditions to the code.
  • Due to hardware and time limitations, I did not get to run a full optimization run with the parameters described in the paper. The code is available, though, for anyone who wants to play with it.
  • I have not been able to reproduce the results reported by the authors (>90% on the evaluation set). I was able to get results on the order of 70%+ with SGD with momentum and 80%+ with the Adam optimizer. I believe this happened because a good set of hyperparameters is harder to find with SGD. I believe that with proper hardware and more time for Bayesian optimization, the results would be much closer to the reported ones (or at least similar to the ones obtained with the Adam optimizer).
  • The code uses GPy and GPyOpt for Bayesian Hyperparameter Optimization.
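The notes mention early stop conditions for runs whose training accuracy gets stuck at 0.5, without describing them. One possible condition, shown here purely as a hypothetical sketch (the class name, window size and tolerance are all assumptions), is to stop when the mean accuracy over a recent window stays close to chance level.

```python
from collections import deque


class ChanceLevelStopper:
    """Signal a stop when recent training accuracy hovers around chance."""

    def __init__(self, window=1000, tolerance=0.02, chance=0.5):
        self.window = window
        self.tolerance = tolerance
        self.chance = chance
        self.history = deque(maxlen=window)  # keeps only the last `window` values

    def update(self, train_accuracy):
        """Record one accuracy value and return True if training should stop."""
        self.history.append(train_accuracy)
        if len(self.history) < self.window:
            return False  # not enough evidence yet
        mean = sum(self.history) / len(self.history)
        return abs(mean - self.chance) <= self.tolerance


if __name__ == "__main__":
    stopper = ChanceLevelStopper(window=5)
    for accuracy in (0.52, 0.48, 0.50, 0.51, 0.49):
        should_stop = stopper.update(accuracy)
    print("stop:", should_stop)
```

A window that is too short would also stop healthy runs, since accuracy is expected to sit near 0.5 during the first iterations.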

References

  • Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." ICML Deep Learning Workshop. Vol. 2. 2015.

Credits

I would like to give credit to a blog post that introduced me to this paper when I was searching for Siamese networks. The blog post also includes code for this paper, although it differs from this repo in some respects (the Adam optimizer is used, and the layer-wise learning rate option is not available). It is a great blog post, go check it out:

siamese-networks-for-one-shot-learning's Issues

Author's implementation

Hello,
Can you please let me know where I can find the authors' implementation? You mentioned that yours is slightly different. And what was the accuracy of your implementation? Thanks

Versions of tensorflow, cuda and cudnn

Hello,
I'm trying to run your training script, but cuDNN does not initialize. It is apparently due to a conflict with my TensorFlow version. Can you tell me which versions you used to run your scripts, please?

Herlderlord

Evaluation alphabets accuracy: 1.0, is that right?

I changed the maximum number of training iterations to 18000, and after training the evaluation results were as follows:

Making One Shot Task on evaluation alphabets:
Atemayar_Qelisayer alphabet, accuracy: 1.0
ULOG alphabet, accuracy: 1.0
Sylheti alphabet, accuracy: 1.0
Angelic alphabet, accuracy: 1.0
Glagolitic alphabet, accuracy: 1.0
Ge_ez alphabet, accuracy: 1.0
Tengwar alphabet, accuracy: 1.0
Oriya alphabet, accuracy: 1.0
Avesta alphabet, accuracy: 1.0
Kannada alphabet, accuracy: 1.0
Aurek-Besh alphabet, accuracy: 1.0
Keble alphabet, accuracy: 1.0
Mongolian alphabet, accuracy: 1.0
Gurmukhi alphabet, accuracy: 1.0
Manipuri alphabet, accuracy: 1.0
Malayalam alphabet, accuracy: 1.0
Atlantean alphabet, accuracy: 1.0
Old_Church_Slavonic_(Cyrillic) alphabet, accuracy: 1.0
Tibetan alphabet, accuracy: 1.0
Syriac_(Serto) alphabet, accuracy: 1.0

Mean global accuracy: 1.0
Final Evaluation Accuracy = 1.0

Do you get the same results?

[Question] about the difference between the paper and implementation

Hi, there!

Thanks for your great work, it helps a lot and the application of siamese networks for one-shot learning also deserves further research.

However, I see that you implement the loss function as binary_crossentropy, in the following way:

self.model.compile(loss='binary_crossentropy', metrics=['binary_accuracy'],

I suppose this is the expected behavior, but the paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf, oneshot1.pdf) says that the first part of the proposed loss function is:
[image]
which seems to be the negative of binary_crossentropy and confuses me a lot. Does the author mean:
[image]

Or could you please provide some more explanation?

Thanks&Best Regards
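The two images in this question were not preserved, so the following is an assumption about what they showed. As I recall it, the objective printed in Koch et al. and the binary cross-entropy used by Keras differ only in the sign of the data term:

```latex
\begin{align}
  \mathcal{L}(x_1, x_2) &= y \log p + (1 - y) \log (1 - p)
                           + \boldsymbol{\lambda}^{T} \lvert \mathbf{w} \rvert^{2} \\
  \mathrm{BCE}(y, p)    &= - \bigl[\, y \log p + (1 - y) \log (1 - p) \,\bigr]
\end{align}
```

Here y is the pair label (1 for same class, 0 for different) and p is the network output for the pair. Both log terms are at most zero, so the data term of the first expression is a log-likelihood, which one would maximize, while minimizing BCE is the equivalent minimization problem. For example, with y = 1 and p = 0.9 the log-likelihood is about -0.105 and the cross-entropy is about 0.105.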

pretrained model

Hi,
I tried to train the network on a server with two GPUs, but it is still very time consuming.
Could you please share the weights and biases of a pretrained network?

Thank you

The training went wrong.

First of all, thanks to your code I have a deeper understanding of the paper. I ran into a problem during training and I hope the author can help me solve it. Thank you. Here is the problem:
Traceback (most recent call last):
File "D:\PyCharm 5.0.3\helpers\pydev\pydevd.py", line 2407, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\PyCharm 5.0.3\helpers\pydev\pydevd.py", line 1798, in run
launch(file, globals, locals) # execute the script
File "D:\PyCharm 5.0.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/software/github/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 59, in <module>
main()
File "D:/software/github/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 31, in main
tensorboard_log_path=tensorboard_log_path
File "D:/software/github/Siamese-Networks-for-One-Shot-Learning\siamese_network.py", line 72, in __init__
l2_regularization_penalization)
File "D:/software/github/Siamese-Networks-for-One-Shot-Learning\siamese_network.py", line 142, in _construct_siamese_architecture
momentum=0.5)
File "D:/software/github/Siamese-Networks-for-One-Shot-Learning\modified_sgd.py", line 42, in __init__
self.lr = K.variable(lr, name='lr')
AttributeError: can't set attribute

Keras and TensorFlow versions.

Hey, thanks for the great work. Can you please tell me which versions of Keras and TensorFlow you used? I am facing issues running the code on the current versions. Thanks!

Low accuracy

Hi, I cloned the repo and tried running it as it is, but my accuracy doesn't seem to be improving. Is it because of the TensorFlow warning that I get? Is there any pre-processing that I'm supposed to do before running train_siamese_networks?

(mypython3) MacBook-Pro-3:one-shot Sabrinai$ python train_siamese_network.py
Using TensorFlow backend.
2018-08-14 18:21:16.735624: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Iteration 1/1000000: Train loss: 4.849324, Train Accuracy: 0.531250, lr = 0.001000
Iteration 2/1000000: Train loss: 4.848513, Train Accuracy: 0.453125, lr = 0.001000
Iteration 3/1000000: Train loss: 4.848382, Train Accuracy: 0.515625, lr = 0.001000
Iteration 4/1000000: Train loss: 4.848629, Train Accuracy: 0.546875, lr = 0.001000
Iteration 5/1000000: Train loss: 4.848142, Train Accuracy: 0.578125, lr = 0.001000
Iteration 6/1000000: Train loss: 4.848841, Train Accuracy: 0.468750, lr = 0.001000
Iteration 7/1000000: Train loss: 4.846568, Train Accuracy: 0.578125, lr = 0.001000
Iteration 8/1000000: Train loss: 4.847251, Train Accuracy: 0.484375, lr = 0.001000
Iteration 9/1000000: Train loss: 4.846010, Train Accuracy: 0.578125, lr = 0.001000
Iteration 10/1000000: Train loss: 4.846083, Train Accuracy: 0.562500, lr = 0.001000
Iteration 11/1000000: Train loss: 4.846534, Train Accuracy: 0.437500, lr = 0.001000
Iteration 12/1000000: Train loss: 4.846575, Train Accuracy: 0.484375, lr = 0.001000
Iteration 13/1000000: Train loss: 4.846043, Train Accuracy: 0.468750, lr = 0.001000
Iteration 14/1000000: Train loss: 4.844590, Train Accuracy: 0.609375, lr = 0.001000
Iteration 15/1000000: Train loss: 4.844895, Train Accuracy: 0.468750, lr = 0.001000
Iteration 16/1000000: Train loss: 4.844888, Train Accuracy: 0.468750, lr = 0.001000
Iteration 17/1000000: Train loss: 4.845278, Train Accuracy: 0.484375, lr = 0.001000
Iteration 18/1000000: Train loss: 4.843062, Train Accuracy: 0.562500, lr = 0.001000
Iteration 19/1000000: Train loss: 4.843076, Train Accuracy: 0.593750, lr = 0.001000
Iteration 20/1000000: Train loss: 4.842909, Train Accuracy: 0.437500, lr = 0.001000
Iteration 21/1000000: Train loss: 4.842513, Train Accuracy: 0.531250, lr = 0.001000
Iteration 22/1000000: Train loss: 4.841990, Train Accuracy: 0.484375, lr = 0.001000
Iteration 23/1000000: Train loss: 4.841151, Train Accuracy: 0.500000, lr = 0.001000
Iteration 24/1000000: Train loss: 4.841326, Train Accuracy: 0.500000, lr = 0.001000
Iteration 25/1000000: Train loss: 4.840984, Train Accuracy: 0.437500, lr = 0.001000
Iteration 26/1000000: Train loss: 4.838634, Train Accuracy: 0.515625, lr = 0.001000
Iteration 27/1000000: Train loss: 4.838871, Train Accuracy: 0.609375, lr = 0.001000
Iteration 28/1000000: Train loss: 4.837312, Train Accuracy: 0.609375, lr = 0.001000
Iteration 29/1000000: Train loss: 4.838040, Train Accuracy: 0.578125, lr = 0.001000
Iteration 30/1000000: Train loss: 4.836654, Train Accuracy: 0.531250, lr = 0.001000
Iteration 31/1000000: Train loss: 4.835261, Train Accuracy: 0.578125, lr = 0.001000
Iteration 32/1000000: Train loss: 4.836050, Train Accuracy: 0.515625, lr = 0.001000
Iteration 33/1000000: Train loss: 4.835767, Train Accuracy: 0.515625, lr = 0.001000
Iteration 34/1000000: Train loss: 4.834557, Train Accuracy: 0.500000, lr = 0.001000
Iteration 35/1000000: Train loss: 4.833904, Train Accuracy: 0.500000, lr = 0.001000
Iteration 36/1000000: Train loss: 4.832893, Train Accuracy: 0.515625, lr = 0.001000
Iteration 37/1000000: Train loss: 4.832811, Train Accuracy: 0.453125, lr = 0.001000
Iteration 38/1000000: Train loss: 4.829530, Train Accuracy: 0.625000, lr = 0.001000
Iteration 39/1000000: Train loss: 4.830757, Train Accuracy: 0.421875, lr = 0.001000
Iteration 40/1000000: Train loss: 4.829597, Train Accuracy: 0.468750, lr = 0.001000
Iteration 41/1000000: Train loss: 4.830768, Train Accuracy: 0.531250, lr = 0.001000
Iteration 42/1000000: Train loss: 4.826791, Train Accuracy: 0.515625, lr = 0.001000
Iteration 43/1000000: Train loss: 4.825655, Train Accuracy: 0.578125, lr = 0.001000
Iteration 44/1000000: Train loss: 4.825454, Train Accuracy: 0.562500, lr = 0.001000
Iteration 45/1000000: Train loss: 4.823873, Train Accuracy: 0.578125, lr = 0.001000
Iteration 46/1000000: Train loss: 4.821697, Train Accuracy: 0.531250, lr = 0.001000
Iteration 47/1000000: Train loss: 4.821634, Train Accuracy: 0.546875, lr = 0.001000
Iteration 48/1000000: Train loss: 4.819137, Train Accuracy: 0.593750, lr = 0.001000
Iteration 49/1000000: Train loss: 4.818835, Train Accuracy: 0.531250, lr = 0.001000
Iteration 50/1000000: Train loss: 4.813269, Train Accuracy: 0.468750, lr = 0.001000
Iteration 51/1000000: Train loss: 4.813827, Train Accuracy: 0.562500, lr = 0.001000
Iteration 52/1000000: Train loss: 4.813818, Train Accuracy: 0.500000, lr = 0.001000
Iteration 53/1000000: Train loss: 4.813799, Train Accuracy: 0.437500, lr = 0.001000
Iteration 54/1000000: Train loss: 4.810773, Train Accuracy: 0.515625, lr = 0.001000
Iteration 55/1000000: Train loss: 4.808959, Train Accuracy: 0.500000, lr = 0.001000
Iteration 56/1000000: Train loss: 4.810541, Train Accuracy: 0.453125, lr = 0.001000
Iteration 57/1000000: Train loss: 4.807623, Train Accuracy: 0.453125, lr = 0.001000
Iteration 58/1000000: Train loss: 4.804411, Train Accuracy: 0.531250, lr = 0.001000
Iteration 59/1000000: Train loss: 4.800980, Train Accuracy: 0.546875, lr = 0.001000
Iteration 60/1000000: Train loss: 4.802567, Train Accuracy: 0.484375, lr = 0.001000
Iteration 61/1000000: Train loss: 4.801285, Train Accuracy: 0.468750, lr = 0.001000
Iteration 62/1000000: Train loss: 4.798449, Train Accuracy: 0.453125, lr = 0.001000
Iteration 63/1000000: Train loss: 4.795244, Train Accuracy: 0.515625, lr = 0.001000
Iteration 64/1000000: Train loss: 4.795055, Train Accuracy: 0.468750, lr = 0.001000
Iteration 65/1000000: Train loss: 4.790056, Train Accuracy: 0.500000, lr = 0.001000
Iteration 66/1000000: Train loss: 4.792620, Train Accuracy: 0.515625, lr = 0.001000
Iteration 67/1000000: Train loss: 4.791032, Train Accuracy: 0.468750, lr = 0.001000
Iteration 68/1000000: Train loss: 4.788491, Train Accuracy: 0.515625, lr = 0.001000
Iteration 69/1000000: Train loss: 4.787354, Train Accuracy: 0.500000, lr = 0.001000
Iteration 70/1000000: Train loss: 4.785830, Train Accuracy: 0.484375, lr = 0.001000
Iteration 71/1000000: Train loss: 4.783831, Train Accuracy: 0.484375, lr = 0.001000
Iteration 72/1000000: Train loss: 4.781487, Train Accuracy: 0.500000, lr = 0.001000
Iteration 73/1000000: Train loss: 4.779665, Train Accuracy: 0.546875, lr = 0.001000
Iteration 74/1000000: Train loss: 4.779678, Train Accuracy: 0.500000, lr = 0.001000
Iteration 75/1000000: Train loss: 4.777548, Train Accuracy: 0.484375, lr = 0.001000
Iteration 76/1000000: Train loss: 4.774950, Train Accuracy: 0.515625, lr = 0.001000
Iteration 77/1000000: Train loss: 4.774406, Train Accuracy: 0.437500, lr = 0.001000
Iteration 78/1000000: Train loss: 4.768193, Train Accuracy: 0.515625, lr = 0.001000
Iteration 79/1000000: Train loss: 4.769056, Train Accuracy: 0.500000, lr = 0.001000
Iteration 80/1000000: Train loss: 4.764924, Train Accuracy: 0.546875, lr = 0.001000
Iteration 81/1000000: Train loss: 4.766854, Train Accuracy: 0.500000, lr = 0.001000
Iteration 82/1000000: Train loss: 4.763232, Train Accuracy: 0.515625, lr = 0.001000
Iteration 83/1000000: Train loss: 4.761155, Train Accuracy: 0.531250, lr = 0.001000
Iteration 84/1000000: Train loss: 4.760967, Train Accuracy: 0.484375, lr = 0.001000
Iteration 85/1000000: Train loss: 4.758671, Train Accuracy: 0.562500, lr = 0.001000
Iteration 86/1000000: Train loss: 4.754441, Train Accuracy: 0.562500, lr = 0.001000
Iteration 87/1000000: Train loss: 4.756378, Train Accuracy: 0.500000, lr = 0.001000
Iteration 88/1000000: Train loss: 4.754009, Train Accuracy: 0.531250, lr = 0.001000
Iteration 89/1000000: Train loss: 4.769855, Train Accuracy: 0.515625, lr = 0.001000
Iteration 90/1000000: Train loss: 4.751012, Train Accuracy: 0.546875, lr = 0.001000
Iteration 91/1000000: Train loss: 4.749597, Train Accuracy: 0.562500, lr = 0.001000
Iteration 92/1000000: Train loss: 4.745842, Train Accuracy: 0.515625, lr = 0.001000
Iteration 93/1000000: Train loss: 4.747173, Train Accuracy: 0.484375, lr = 0.001000
Iteration 94/1000000: Train loss: 4.744622, Train Accuracy: 0.546875, lr = 0.001000
Iteration 95/1000000: Train loss: 4.743729, Train Accuracy: 0.484375, lr = 0.001000
Iteration 96/1000000: Train loss: 4.740994, Train Accuracy: 0.531250, lr = 0.001000
Iteration 97/1000000: Train loss: 4.739508, Train Accuracy: 0.531250, lr = 0.001000
Iteration 98/1000000: Train loss: 4.727015, Train Accuracy: 0.515625, lr = 0.001000
Iteration 99/1000000: Train loss: 4.733810, Train Accuracy: 0.578125, lr = 0.001000
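One possible reading of the log above, offered as an assumption rather than a diagnosis: if the network outputs a probability close to 0.5 for every pair, the binary cross-entropy alone is ln 2, roughly 0.693. Since the README states that L2 regularization is used in each layer, and Keras (as far as I know) includes regularization penalties in the reported training loss, the rest of the 4.85 would be the penalty term, which shrinks only slowly at a learning rate of 0.001.

```python
import math


def chance_level_bce():
    """Binary cross-entropy when every prediction is exactly 0.5."""
    return -math.log(0.5)


def implied_penalty(reported_loss, data_loss=None):
    """Part of the reported loss not explained by the data term.

    Assumes the reported loss is data loss plus regularization penalty.
    """
    if data_loss is None:
        data_loss = chance_level_bce()
    return reported_loss - data_loss


if __name__ == "__main__":
    print("BCE at p = 0.5:", chance_level_bce())
    print("implied penalty at iteration 1:", implied_penalty(4.849324))
```

Under this reading, a slowly falling loss with accuracy near 0.5 mostly reflects the weights shrinking, not yet the pairs being told apart.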

requirements.txt

Hi, may I request a requirements.txt file so I know what to pip install?
And are you using python 3.5 or 3.7?

Thank you.

Version of tensorflow

Hello,
Can you tell me which version of tensorflow you used in this code?
Thanks for this code!

prediction code

Hi,
I am a beginner in this field and I am really curious about one-shot learning because I am facing a lack of data. Can you tell me how to use the resulting model to make predictions?

Thank you

Using with own dataset

Hi, I'm trying to run this with my own dataset (which only has characters and no alphabets).

I'm having trouble modifying this..

    # First let's take care of the train alphabets
    for alphabet in os.listdir(train_path):
        alphabet_path = os.path.join(train_path, alphabet)

        current_alphabet_dictionary = {}

        for character in os.listdir(alphabet_path):
            character_path = os.path.join(alphabet_path, character)

            current_alphabet_dictionary[character] = os.listdir(
                character_path)

        self.train_dictionary[alphabet] = current_alphabet_dictionary

I've tried changing this to

    for alphabet in os.listdir(train_path):
        alphabet_path = os.path.join(train_path, alphabet)

        current_alphabet_dictionary = {}
        
        for character in os.listdir(alphabet_path):
            character_path = os.path.join(alphabet_path, character)

            current_alphabet_dictionary[character] = character_path
        self.train_dictionary[alphabet] = current_alphabet_dictionary

My character path is something like this (with the number '498' being the label):

character path is data/train/raw/498/996.png

I'm getting a type error saying that

TypeError: list indices must be integers or slices, not str

This is because the dictionary is supposed to have keys and values, but what if I don't have any keys (alphabets) to work with, and only have values (characters)?
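The original loader shown in the question builds a nested dictionary of the form alphabet → character → list of image file names, whereas the modified version stores a path string in place of the list. One way to keep the expected structure with a flat dataset, sketched here under the assumption that each sub-folder of the training folder is one character, is to wrap all characters in a single pseudo-alphabet. The function name and the key "all" are hypothetical, and any code that samples across several alphabets may still need adjusting.

```python
import os


def load_flat_characters(train_path, alphabet_name="all"):
    """Build {alphabet: {character: [image files]}} from a flat folder layout.

    Expects train_path/<character>/<image files>, with no alphabet level.
    """
    characters = {}
    for character in sorted(os.listdir(train_path)):
        character_path = os.path.join(train_path, character)
        if not os.path.isdir(character_path):
            continue  # skip stray files next to the character folders
        # Keep a list of file names, as the original loader does.
        characters[character] = sorted(os.listdir(character_path))
    return {alphabet_name: characters}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        dictionary = load_flat_characters(sys.argv[1])
        print({name: len(chars) for name, chars in dictionary.items()})
```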

train epoch acc have no change

First of all, thanks for your work. Because of version differences I made some changes to your code, but they did not help: the accuracy does not change from epoch to epoch.

code problems

Traceback (most recent call last):
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 59, in <module>
main()
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 46, in main
model_name='siamese_net_lr10e-4')
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/siamese_network.py", line 235, in train_siamese_network
train_loss, train_accuracy = self.model.train_on_batch(images, labels)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1727, in train_on_batch
logs = self.train_function(iterator)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
*args, **kwds))
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
    return step_function(self, iterator)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
    outputs = model.train_step(data)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:757 train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:497 minimize
    loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:547 _compute_gradients
    with ops.name_scope_v2(self._name + "/gradients"):

TypeError: unsupported operand type(s) for +: 'Modified_SGD' and 'str'

Process finished with exit code 1

Hi, I ran into the problem shown above.

AttributeError: 'decode'

Hi, after the line 'Trained Ended!'
I ran into AttributeError: 'str' object has no attribute 'decode'.
I believe it happens when loading the weights

Thank you

Training loss does not change in the first 700 iterations

Hey,

I have started to train the network using the code in this repo.
I see that the training accuracy has not gone above 0.65 and mostly stays around 0.45-0.52
in the first 700 iterations. Is this normal? The loss is also changing very little, staying around 5.1.

Thanks for this code!
