term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y)).mean()
PRIOR = - (term_a + term_b) * self.gamma
"-(term_a + term_b)" is the loss of Discriminator, and “term_b” is the loss of encoder( similar as generator of gan )
In the code you only backward Discriminator's loss(part of prior distribution), and there is no backward of the loss that belongs to the encoder in the prior distribution.
loss.backward()    # loss = global + local + prior, where prior = -(term_a + term_b)
optim.step()
loss_optim.step()
term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y.detach())).mean()  # y must be detached for the Discriminator's loss
PRIOR = - (term_a + term_b) * self.gamma
# recomputed WITHOUT detach, otherwise no gradient reaches the encoder
encoder_loss_for_p = torch.log(1.0 - self.prior_d(y)).mean()
.............
loss.backward()                # loss = global + local + PRIOR, PRIOR = -(term_a + term_b)
optim.step()                   # updates the encoder from global + local only (y was detached, so PRIOR adds no encoder gradient)
loss_optim.step()              # updates the Discriminator from the PRIOR term
encoder_loss_for_p.backward()  # adversarial update: train the encoder to fool the Discriminator
optim.step()
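To make the two-step update concrete, here is a minimal self-contained sketch of the scheme proposed above. The module architectures, optimizer choices, and data are placeholders of my own (the repo's real `encoder` and `prior_d` differ); only the detach/no-detach pattern is the point:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(8, 4)                               # stand-in for the real encoder
prior_d = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # stand-in prior discriminator

d_optim = torch.optim.SGD(prior_d.parameters(), lr=0.1)
e_optim = torch.optim.SGD(encoder.parameters(), lr=0.1)

x = torch.randn(16, 8)
y = encoder(x)              # encoded samples
prior = torch.rand(16, 4)   # samples from the prior (uniform here, as an assumption)

# --- Discriminator step: y is detached, so no gradient reaches the encoder ---
d_optim.zero_grad()
term_a = torch.log(prior_d(prior)).mean()
term_b = torch.log(1.0 - prior_d(y.detach())).mean()
d_loss = -(term_a + term_b)
d_loss.backward()
d_optim.step()

# --- Encoder step: recompute term_b WITHOUT detaching, so the gradient flows
# back through the encoder; only the encoder optimizer is stepped, and the
# stray gradients this leaves on prior_d are cleared by its next zero_grad() ---
e_optim.zero_grad()
encoder_loss_for_p = torch.log(1.0 - prior_d(encoder(x))).mean()
encoder_loss_for_p.backward()
e_optim.step()
```

With `encoder_loss_for_p = term_b` from the detached graph, `backward()` would leave `encoder.weight.grad` empty; recomputing from the live `y` is what makes the adversarial encoder update actually happen.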