Comments (9)

YuxianMeng commented on August 27, 2024

Did you try r=2 on MNIST data? I think this issue may be caused by your toy batch. For example, if all the inputs in the toy batch are identical while the targets are not, neither capsules nor a human could learn anything from it. I'd be glad to learn more details about this toy batch and the training process on MNIST, and to fix this bug (if it exists).
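
For context, a degenerate batch like that can be detected with a quick sanity check (a minimal sketch; the tensor names x and y are illustrative, not from the repository):

    import torch

    def check_toy_batch(x, y):
        # A batch where every input is identical but the targets differ
        # cannot be learned from, by capsules or by anything else.
        inputs_identical = bool((x == x[0]).all())
        targets_differ = y.unique().numel() > 1
        if inputs_identical and targets_differ:
            print("Degenerate toy batch: identical inputs, differing targets")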

JianboGuo commented on August 27, 2024

@menorashid @shzygmyx I also tried training on MNIST with r > 1 (e.g., r=2, 3), but all of the runs failed: the loss did not converge. I also tried different learning rates, but none of them worked. The reason may be an error that only appears when multiple EM iterations are used.

YuxianMeng commented on August 27, 2024

@JianboGuo
Hi, I've run my code again, and it seems to work quite well.
python train.py -batch_size=8 -lr=2e-2 -num_epochs=5 -r=2 -print_freq=5
returns something like

batch:5, loss:0.3607, acc:0/8
batch:10, loss:0.3606, acc:4/8
batch:15, loss:0.3608, acc:1/8
batch:20, loss:0.3595, acc:1/8
batch:25, loss:0.3545, acc:1/8
batch:30, loss:0.3494, acc:1/8
batch:35, loss:0.2936, acc:2/8
batch:40, loss:0.2784, acc:4/8
batch:45, loss:0.1938, acc:6/8
batch:50, loss:0.2515, acc:3/8
batch:55, loss:0.1760, acc:6/8
batch:60, loss:0.1039, acc:7/8
batch:65, loss:0.1357, acc:7/8
batch:70, loss:0.1372, acc:6/8
batch:75, loss:0.1030, acc:6/8
batch:80, loss:0.0486, acc:7/8
batch:85, loss:0.0496, acc:7/8
batch:90, loss:0.0379, acc:8/8

So it definitely converges. Would you mind providing more details after running the command above?
Also, please make sure that you are using the latest version of my code.

JianboGuo commented on August 27, 2024

@shzygmyx Thanks for your answer. What I actually get is:
[guojianbo@localhost Matrix-Capsules-pytorch]$ python train.py -batch_size=64 -lr=2e-2 -num_epochs=5 -r=1 -print_freq=5
activating cuda
Epoch 0
batch:5, loss:0.3525, acc:11/64
batch:10, loss:0.3546, acc:4/64
batch:15, loss:0.3207, acc:5/64
batch:20, loss:0.2899, acc:11/64
batch:25, loss:0.1925, acc:28/64
batch:30, loss:0.1050, acc:51/64
batch:35, loss:0.0987, acc:49/64
batch:40, loss:0.0776, acc:52/64
batch:45, loss:0.0526, acc:55/64
batch:50, loss:0.0267, acc:60/64
batch:55, loss:0.0242, acc:59/64
batch:60, loss:0.0218, acc:61/64
batch:65, loss:0.0341, acc:56/64
batch:70, loss:0.0279, acc:60/64
batch:75, loss:0.0452, acc:57/64
batch:80, loss:0.0307, acc:59/64
batch:85, loss:0.0162, acc:62/64
batch:90, loss:0.0466, acc:57/64
batch:95, loss:0.0135, acc:61/64
batch:100, loss:0.0130, acc:61/64

and

[guojianbo@localhost Matrix-Capsules-pytorch]$ python train.py -batch_size=64 -lr=2e-2 -num_epochs=5 -r=2 -print_freq=5
activating cuda
Epoch 0
batch:5, loss:0.3639, acc:5/64
batch:10, loss:0.3677, acc:9/64
batch:15, loss:0.3716, acc:6/64
batch:20, loss:0.3755, acc:2/64
batch:25, loss:0.3795, acc:5/64
batch:30, loss:0.3834, acc:2/64
batch:35, loss:0.3874, acc:7/64
batch:40, loss:0.3914, acc:6/64
batch:45, loss:0.3954, acc:4/64
batch:50, loss:0.3994, acc:12/64
batch:55, loss:0.4035, acc:6/64
batch:60, loss:0.4076, acc:2/64
batch:65, loss:0.4116, acc:8/64
batch:70, loss:0.4158, acc:8/64
batch:75, loss:0.4199, acc:5/64
batch:80, loss:0.4240, acc:9/64
batch:85, loss:0.4283, acc:6/64
batch:90, loss:0.4324, acc:9/64
batch:95, loss:0.4367, acc:5/64
batch:100, loss:0.4409, acc:3/64
batch:105, loss:0.4452, acc:2/64
batch:110, loss:0.4495, acc:5/64
batch:115, loss:0.4538, acc:5/64
batch:120, loss:0.4581, acc:11/64
batch:125, loss:0.4624, acc:9/64
batch:130, loss:0.4669, acc:4/64
batch:135, loss:0.4713, acc:3/64
batch:140, loss:0.4756, acc:6/64
batch:145, loss:0.4802, acc:5/64
batch:150, loss:0.4845, acc:7/64
batch:155, loss:0.4889, acc:9/64
batch:160, loss:0.4933, acc:9/64
batch:165, loss:0.4979, acc:6/64
batch:170, loss:0.5026, acc:7/64
batch:175, loss:0.5071, acc:6/64
batch:180, loss:0.5115, acc:7/64
batch:185, loss:0.5162, acc:9/64
batch:190, loss:0.5209, acc:5/64
batch:195, loss:0.5254, acc:8/64
batch:200, loss:0.5301, acc:8/64

I think the reason is the following:
since we run the dynamic routing procedure multiple times, i.e. we keep updating R, a, and x but only use the final a and x, the backpropagated gradient should, I think, depend only on the final a and x. In your code, however, the gradient backpropagates through every iteration of the EM algorithm, which contaminates the gradient w.r.t. the nn.Parameter self.W.
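
For what it's worth, one way to realize this suggestion is to detach the routing statistics between iterations, so that only the final EM step stays on the autograd graph (a minimal sketch under that assumption; em_step and its arguments are hypothetical names, not the repository's code):

    for i in range(num_iters):
        R, a, x = em_step(R, a, x, votes)  # one EM routing iteration
        if i < num_iters - 1:
            # Cut the graph so the gradient w.r.t. self.W flows only
            # through the final iteration's a and x.
            R, a, x = R.detach(), a.detach(), x.detach()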

JianboGuo commented on August 27, 2024

@shzygmyx Actually, I run the latest version under Python 2.7, and I corrected some division ops to make sure the results are the same as under Python 3. My colleague also tried it with Python 3; it failed as well.
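
For reference, the standard way to make Python 2's division match Python 3 is the __future__ import, rather than editing each expression:

    # In Python 2, `/` between ints truncates (1/2 == 0); in Python 3 it is
    # true division (1/2 == 0.5). This import gives Python 2 the Python 3
    # behaviour; `//` is floor division under both interpreters.
    from __future__ import division

    print(1 / 2)   # 0.5 under both Python 2 and Python 3
    print(1 // 2)  # 0 under both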

YuxianMeng commented on August 27, 2024

@JianboGuo
Thanks for your feedback. It turns out that I had run the wrong code myself; I'm very sorry about that. I've reproduced this problem and am trying to fix it. If you have any suggestions, please let me know. By the way, a pull request is also welcome.

JianboGuo commented on August 27, 2024

@shzygmyx Thanks for your reply. I think the problem lies in the procedure (the E-step) that updates the coefficients R. I am also trying to fix the bug.
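
For reference, the E-step in the paper's EM routing recomputes the assignment coefficients R from the parent activations and the Gaussian likelihoods of the votes; a rough sketch (shapes and names are illustrative, not the repository's):

    import torch

    def e_step(a, log_p, eps=1e-8):
        # a:     (B, 1, J)  parent-capsule activations
        # log_p: (B, I, J)  log-density of vote i under parent j's Gaussian
        # R:     (B, I, J)  assignment of each input capsule i over parents j
        logits = torch.log(a + eps) + log_p
        return torch.softmax(logits, dim=-1)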

YuxianMeng commented on August 27, 2024

@JianboGuo Hi, I find that this convergence problem may be caused by the schedule of lambda_ and m (mostly lambda_) in train.py. The previous schedule increased lambda_ and m by 2e-1 every epoch; changing the increment to 2e-2 helps the capsules converge in the r=2 and r=3 cases. Also, decrease the maximum values of lambda_ and m if the loss suddenly increases after several batches. Changing lines 84~87 to

                if lambda_ < 1.2e-3:
                    lambda_ += 2e-2/steps
                if m < 0.2:
                    m += 2e-2/steps

works for me, at least for the first few hundred batches. Note that this schedule is far from the best one. Good luck, and I look forward to your findings on better schedules!
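
For context, here is roughly where such a schedule sits in a training loop (a sketch only; the loop structure, starting values, and the loss call are assumed, not copied from train.py):

    lambda_, m = 1e-3, 0.1      # assumed starting values
    steps = len(train_loader)   # batches per epoch
    for epoch in range(num_epochs):
        for batch, target in train_loader:
            # Raise lambda_ and m by at most 2e-2 per epoch, with caps.
            if lambda_ < 1.2e-3:
                lambda_ += 2e-2 / steps
            if m < 0.2:
                m += 2e-2 / steps
            # ... the forward pass uses lambda_; the spread loss uses margin m ...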

dragonfly90 commented on August 27, 2024

@shzygmyx, I tried

if lambda_ < 1.2e-3:
  lambda_ += 2e-2/steps
if m < 0.2:
  m += 2e-2/steps

but the Epoch 1 test accuracy is 0.1135, so it seems it still does not converge.
