Comments (9)

YuxianMeng commented on August 27, 2024

Did you try r=2 on MNIST data? I think this issue may be caused by your toy batch. For example, if all the inputs in the toy batch are identical while the targets are not, neither capsules nor a human could learn anything from it. I'd be glad to learn more details about this toy batch and the training process on MNIST, and to fix this bug (if it exists).
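
For context, a degenerate batch like that can be detected with a quick sanity check (a minimal sketch; the tensor names x and y are illustrative, not from the repository):

    import torch

    def check_toy_batch(x, y):
        # A batch where every input is identical but the targets differ
        # cannot be learned from, by capsules or by anything else.
        inputs_identical = bool((x == x[0]).all())
        targets_differ = y.unique().numel() > 1
        if inputs_identical and targets_differ:
            print("Degenerate toy batch: identical inputs, differing targets")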

JianboGuo commented on August 27, 2024

@menorashid @shzygmyx I also tried training on MNIST with r > 1 (e.g., r=2, 3), but all of the runs failed: the loss did not converge. I also tried different learning rates, but none of them worked. The reason may be an error that only appears when multiple EM iterations are used.

YuxianMeng commented on August 27, 2024

@JianboGuo
Hi, I've run my code again, and it seems to work quite well.
python train.py -batch_size=8 -lr=2e-2 -num_epochs=5 -r=2 -print_freq=5
returns something like

batch:5, loss:0.3607, acc:0/8
batch:10, loss:0.3606, acc:4/8
batch:15, loss:0.3608, acc:1/8
batch:20, loss:0.3595, acc:1/8
batch:25, loss:0.3545, acc:1/8
batch:30, loss:0.3494, acc:1/8
batch:35, loss:0.2936, acc:2/8
batch:40, loss:0.2784, acc:4/8
batch:45, loss:0.1938, acc:6/8
batch:50, loss:0.2515, acc:3/8
batch:55, loss:0.1760, acc:6/8
batch:60, loss:0.1039, acc:7/8
batch:65, loss:0.1357, acc:7/8
batch:70, loss:0.1372, acc:6/8
batch:75, loss:0.1030, acc:6/8
batch:80, loss:0.0486, acc:7/8
batch:85, loss:0.0496, acc:7/8
batch:90, loss:0.0379, acc:8/8

So it definitely converges. Would you mind providing more details after running the command above?
Also, please make sure that you are using the latest version of my code.

JianboGuo commented on August 27, 2024

@shzygmyx Thanks for your answer. What I actually get is:
[guojianbo@localhost Matrix-Capsules-pytorch]$ python train.py -batch_size=64 -lr=2e-2 -num_epochs=5 -r=1 -print_freq=5
activating cuda
Epoch 0
batch:5, loss:0.3525, acc:11/64
batch:10, loss:0.3546, acc:4/64
batch:15, loss:0.3207, acc:5/64
batch:20, loss:0.2899, acc:11/64
batch:25, loss:0.1925, acc:28/64
batch:30, loss:0.1050, acc:51/64
batch:35, loss:0.0987, acc:49/64
batch:40, loss:0.0776, acc:52/64
batch:45, loss:0.0526, acc:55/64
batch:50, loss:0.0267, acc:60/64
batch:55, loss:0.0242, acc:59/64
batch:60, loss:0.0218, acc:61/64
batch:65, loss:0.0341, acc:56/64
batch:70, loss:0.0279, acc:60/64
batch:75, loss:0.0452, acc:57/64
batch:80, loss:0.0307, acc:59/64
batch:85, loss:0.0162, acc:62/64
batch:90, loss:0.0466, acc:57/64
batch:95, loss:0.0135, acc:61/64
batch:100, loss:0.0130, acc:61/64

and

[guojianbo@localhost Matrix-Capsules-pytorch]$ python train.py -batch_size=64 -lr=2e-2 -num_epochs=5 -r=2 -print_freq=5
activating cuda
Epoch 0
batch:5, loss:0.3639, acc:5/64
batch:10, loss:0.3677, acc:9/64
batch:15, loss:0.3716, acc:6/64
batch:20, loss:0.3755, acc:2/64
batch:25, loss:0.3795, acc:5/64
batch:30, loss:0.3834, acc:2/64
batch:35, loss:0.3874, acc:7/64
batch:40, loss:0.3914, acc:6/64
batch:45, loss:0.3954, acc:4/64
batch:50, loss:0.3994, acc:12/64
batch:55, loss:0.4035, acc:6/64
batch:60, loss:0.4076, acc:2/64
batch:65, loss:0.4116, acc:8/64
batch:70, loss:0.4158, acc:8/64
batch:75, loss:0.4199, acc:5/64
batch:80, loss:0.4240, acc:9/64
batch:85, loss:0.4283, acc:6/64
batch:90, loss:0.4324, acc:9/64
batch:95, loss:0.4367, acc:5/64
batch:100, loss:0.4409, acc:3/64
batch:105, loss:0.4452, acc:2/64
batch:110, loss:0.4495, acc:5/64
batch:115, loss:0.4538, acc:5/64
batch:120, loss:0.4581, acc:11/64
batch:125, loss:0.4624, acc:9/64
batch:130, loss:0.4669, acc:4/64
batch:135, loss:0.4713, acc:3/64
batch:140, loss:0.4756, acc:6/64
batch:145, loss:0.4802, acc:5/64
batch:150, loss:0.4845, acc:7/64
batch:155, loss:0.4889, acc:9/64
batch:160, loss:0.4933, acc:9/64
batch:165, loss:0.4979, acc:6/64
batch:170, loss:0.5026, acc:7/64
batch:175, loss:0.5071, acc:6/64
batch:180, loss:0.5115, acc:7/64
batch:185, loss:0.5162, acc:9/64
batch:190, loss:0.5209, acc:5/64
batch:195, loss:0.5254, acc:8/64
batch:200, loss:0.5301, acc:8/64

I think the reason is the following:
since we run the dynamic routing procedure multiple times, i.e. we keep updating R, a, and x but only use the final a and x, the backpropagated gradient should, I think, depend only on the final a and x. In your code, however, the gradient backpropagates through every iteration of the EM algorithm, which contaminates the gradient w.r.t. the nn.Parameter self.W.
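
For what it's worth, one way to realize this suggestion is to detach the routing statistics between iterations, so that only the final EM step stays on the autograd graph (a minimal sketch under that assumption; em_step and its arguments are hypothetical names, not the repository's code):

    for i in range(num_iters):
        R, a, x = em_step(R, a, x, votes)  # one EM routing iteration
        if i < num_iters - 1:
            # Cut the graph so the gradient w.r.t. self.W flows only
            # through the final iteration's a and x.
            R, a, x = R.detach(), a.detach(), x.detach()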

JianboGuo commented on August 27, 2024

@shzygmyx Actually, I run the latest version under Python 2.7, and I corrected some division ops to make sure the results are the same as under Python 3. My colleague also tried it with Python 3; it failed as well.
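
For reference, the standard way to make Python 2's division match Python 3 is the __future__ import, rather than editing each expression:

    # In Python 2, `/` between ints truncates (1/2 == 0); in Python 3 it is
    # true division (1/2 == 0.5). This import gives Python 2 the Python 3
    # behaviour; `//` is floor division under both interpreters.
    from __future__ import division

    print(1 / 2)   # 0.5 under both Python 2 and Python 3
    print(1 // 2)  # 0 under both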

YuxianMeng commented on August 27, 2024

@JianboGuo
Thanks for your feedback. It turns out that I had run the wrong code myself; I'm very sorry about that. I've reproduced this problem and am trying to fix it. If you have any suggestions, please let me know. By the way, a pull request is also welcome.

JianboGuo commented on August 27, 2024

@shzygmyx Thanks for your reply. I think the problem lies in the procedure (the E-step) that updates the coefficients R. I am also trying to fix the bug.
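
For reference, the E-step in the paper's EM routing recomputes the assignment coefficients R from the parent activations and the Gaussian likelihoods of the votes; a rough sketch (shapes and names are illustrative, not the repository's):

    import torch

    def e_step(a, log_p, eps=1e-8):
        # a:     (B, 1, J)  parent-capsule activations
        # log_p: (B, I, J)  log-density of vote i under parent j's Gaussian
        # R:     (B, I, J)  assignment of each input capsule i over parents j
        logits = torch.log(a + eps) + log_p
        return torch.softmax(logits, dim=-1)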

YuxianMeng commented on August 27, 2024

@JianboGuo Hi, I find that this convergence problem may be caused by the schedule of lambda_ and m (mostly lambda_) in train.py. The previous schedule increased lambda_ and m by 2e-1 every epoch; changing the increment to 2e-2 helps the capsules converge in the r=2 and r=3 cases. Also, decrease the maximum values of lambda_ and m if the loss suddenly increases after several batches. Changing lines 84~87 to

                if lambda_ < 1.2e-3:
                    lambda_ += 2e-2/steps
                if m < 0.2:
                    m += 2e-2/steps

works for me, at least for the first few hundred batches. Note that this schedule is far from the best one. Good luck, and I look forward to your findings on better schedules!
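
For context, here is roughly where such a schedule sits in a training loop (a sketch only; the loop structure, starting values, and the loss call are assumed, not copied from train.py):

    lambda_, m = 1e-3, 0.1      # assumed starting values
    steps = len(train_loader)   # batches per epoch
    for epoch in range(num_epochs):
        for batch, target in train_loader:
            # Raise lambda_ and m by at most 2e-2 per epoch, with caps.
            if lambda_ < 1.2e-3:
                lambda_ += 2e-2 / steps
            if m < 0.2:
                m += 2e-2 / steps
            # ... the forward pass uses lambda_; the spread loss uses margin m ...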

dragonfly90 commented on August 27, 2024

@shzygmyx, I tried

if lambda_ < 1.2e-3:
  lambda_ += 2e-2/steps
if m < 0.2:
  m += 2e-2/steps

but the Epoch 1 test accuracy is 0.1135, so it seems it still does not converge.
