duanenielsen / deepinfomaxpytorch
Learning deep representations by mutual information estimation and maximization
Home Page: https://arxiv.org/abs/1808.06670
The following code:
term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y)).mean()
PRIOR = - (term_a + term_b) * self.gamma
"-(term_a + term_b)" is the loss of Discriminator, and “term_b” is the loss of encoder( similar as generator of gan )
In the code you only backward Discriminator's loss(part of prior distribution), and there is no backward of the loss that belongs to the encoder in the prior distribution.
loss.backward()  # loss = global + local + prior, where prior = -(term_a + term_b)
optim.step()
loss_optim.step()
I think it could be the following process instead:
term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y.detach())).mean()  # y should be detached for the discriminator loss
PRIOR = -(term_a + term_b) * self.gamma
encoder_loss_for_p = torch.log(1.0 - self.prior_d(y)).mean()  # non-detached y, so gradients reach the encoder
...
loss.backward()  # loss = global + local + prior, where prior = -(term_a + term_b)
optim.step()  # updates the encoder from global + local, but not from prior
loss_optim.step()
encoder_loss_for_p.backward()  # optimize the encoder adversarially
optim.step()
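To make the suggestion concrete, here is a self-contained sketch of the two-player update (my own sketch, not code from the repo; prior_d, y, and prior follow the snippets above, and the two separate optimizers disc_optim / enc_optim are my assumption):

import torch

# Discriminator step: maximize log D(prior) + log(1 - D(y)); y is detached
# so this loss does not update the encoder.
term_a = torch.log(prior_d(prior)).mean()
term_b = torch.log(1.0 - prior_d(y.detach())).mean()
disc_loss = -(term_a + term_b)
disc_optim.zero_grad()
disc_loss.backward()
disc_optim.step()

# Encoder step: fool the (now fixed) discriminator by minimizing log(1 - D(y)).
enc_loss = torch.log(1.0 - prior_d(y)).mean()
enc_optim.zero_grad()
enc_loss.backward()
enc_optim.step()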
Hi, I have checked the code, and I think everything is ok. But the results are inferior to those reported in the paper. For example, the accuracy for CIFAR-10 with this code is about 60% (DeepInfoMax-Local), while that in the paper is about 70%.
Do you have any idea why there is such a big difference?
The objective for matching the prior is a min-max objective, but in your code I cannot see the min-max procedure. I think there is a bug in your code.
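For reference, if I am reading the paper correctly, the prior-matching term is the GAN-style min-max over the encoder E_psi and the discriminator D_phi:

\min_{\psi} \max_{\phi} \; \mathbb{E}_{\mathbb{V}}\!\left[\log D_{\phi}(y)\right] + \mathbb{E}_{\mathbb{P}}\!\left[\log\!\left(1 - D_{\phi}\!\left(E_{\psi}(x)\right)\right)\right]

where V is the prior over representations: the discriminator maximizes this objective while the encoder minimizes it, which is exactly the alternation that a single loss.backward() does not perform.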
Excuse me, author. I ran into a problem when using the mutual information objective: its loss value is negative at the beginning of training. Is this normal?
First off, thanks for the implementation of this code, it's great!
I'm interested in applying DIM to non-image data, i.e., I just have a collection of feature vectors (not images) that I'd like to encode and maximise information between the original feature vectors and their new embeddings. I'm trying to translate the problem from 2D inputs to 1D inputs.
I have three questions:
Thanks in advance!
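Not the author, but for what it's worth, here is a minimal sketch of how the convolutional encoder might translate to 1D inputs; all layer shapes and names here are my own assumptions, not the repo's code:

import torch
import torch.nn as nn

class Encoder1D(nn.Module):
    """Hypothetical DIM-style encoder for 1D feature-vector inputs."""
    def __init__(self, in_channels=1, feature_dim=64):
        super().__init__()
        # "Local" feature map, analogous to the 2D conv feature map in the paper.
        self.local_net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # "Global" feature vector summarizing the whole input.
        self.global_net = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, feature_dim),
        )

    def forward(self, x):
        # x: (batch, in_channels, length)
        local_features = self.local_net(x)                 # (batch, 64, length)
        global_features = self.global_net(local_features)  # (batch, feature_dim)
        return global_features, local_features

The local/global MI losses would then pair each position of local_features with global_features, just as the 2D version pairs each spatial location of the feature map with the global vector.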
Amazing! Were you able to get up to 69% accuracy? Just asking.
This model worked perfectly before.
However, I recently updated my torch and torchvision to the latest versions and got this error.
Sorry, it didn't find the file:
FileNotFoundError: [Errno 2] No such file or directory: 'c:\data\deepinfomax\models\run5/encoder860.wgt'
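One guess, not verified against the repo: the mixed \ and / separators suggest the path is built by string concatenation, and the error itself usually just means the checkpoint encoder860.wgt was never written to that run directory. Building the path with pathlib would at least normalize the separators; a hypothetical sketch:

import torch
from pathlib import Path

# Hypothetical reconstruction of how the checkpoint path might be assembled.
model_dir = Path(r"c:\data\deepinfomax\models\run5")
checkpoint = model_dir / "encoder860.wgt"  # pathlib handles the separators

if checkpoint.exists():
    state_dict = torch.load(checkpoint)
else:
    print(f"No checkpoint at {checkpoint}; run training first so it gets saved.")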
Thanks for your generous contribution. I'd like to know how the performance was with the entire loss (I noticed that the result you provided in the README covers only the local part).
By the way, which dataset did you evaluate on?
It didn't train the discriminator for the prior distribution. I am very confused about this...
After we have trained our DIM encoder, we should train just a classifier on top of the representations. However, in your implementation you optimize the whole model. As far as I understand, we should use DIM's representations as-is, i.e. without fine-tuning. What am I missing?
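For reference, the linear-evaluation protocol described above would look roughly like this; the encoder's interface, feature_dim, num_classes, and train_loader are my assumptions, not the repo's actual evaluation code:

import torch
import torch.nn as nn

encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False  # freeze the encoder: representations are used as-is

classifier = nn.Linear(feature_dim, num_classes)
optim = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for x, labels in train_loader:
    with torch.no_grad():
        features, _ = encoder(x)  # no gradients flow into the encoder
    logits = classifier(features)
    loss = criterion(logits, labels)
    optim.zero_grad()
    loss.backward()
    optim.step()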
I'm using this model on a vehicle dataset, trying to create image embeddings. During training, at some point the loss suddenly becomes negative infinity and eventually becomes NaN. Have you encountered this issue? What do you think might cause it?
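Not sure this is the cause here, but one common source of -inf in this kind of objective is torch.log applied to a discriminator output that has saturated to exactly 0 or 1; clamping the input to log is a standard guard. A sketch, reusing the prior_d / y names from the snippets earlier in this thread:

eps = 1e-6  # small constant keeps log() finite

d_prior = torch.clamp(self.prior_d(prior), eps, 1.0 - eps)
d_y = torch.clamp(self.prior_d(y), eps, 1.0 - eps)

term_a = torch.log(d_prior).mean()
term_b = torch.log(1.0 - d_y).mean()
PRIOR = -(term_a + term_b) * self.gamma  # now bounded, never -inf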
Hi,
After I downloaded your project, I tried to run train.py with the default settings, but I found that there are no weight files in the directory. Should I run rcalland's deep-INFOMAX first to get the network weight files?
@DuaneNielsen I think you have not implemented adversarial matching of the distributions. You calculate the PRIOR loss here with the y that comes from here, so when you do loss.backward() here, the weights of the encoder are altered so that the PRIOR loss is minimized, i.e. the PRIOR loss from the paper is maximized, since you take the negative of the original loss. So you are altering the encoder's weights to maximize the PRIOR loss from the paper rather than minimize it. Have I understood this wrong?
Is the file suffix missing in line 24 of cluster.py?
Thanks for the amazing code.
I was wondering why there is no log term in LOCAL = (Em - Ej) * self.beta and GLOBAL = (Em - Ej) * self.alpha. I think there should be a log term with Em, since the Donsker-Varadhan bound is E_J[T_ω(x, y)] − log E_M[e^{T_ω(x, y)}]. And for E_J, why is the direct expectation taken rather than the expectation of an exponent?
Thanks in advance!
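Not the author, but a likely explanation: the DIM paper mostly uses the Jensen-Shannon MI estimator rather than the Donsker-Varadhan bound, and the JSD form has no log E_M[e^T] term. If the code follows that estimator, Ej and Em would come from softplus terms, roughly like the sketch below (T_joint / T_marginal are my placeholder names for the discriminator scores; I have not checked the repo's exact code):

import torch.nn.functional as F

# JSD-based estimator from the DIM paper: no log E_M[e^T] term appears.
# T_joint: discriminator scores on paired (joint) samples
# T_marginal: discriminator scores on shuffled (product-of-marginals) samples
Ej = -F.softplus(-T_joint).mean()   # E_J[-softplus(-T)]
Em = F.softplus(T_marginal).mean()  # E_M[softplus(T)]
mi_estimate = Ej - Em               # the loss is then Em - Ej, as in the code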
Thanks for sharing! I notice that you use batchnorm layers in the encoder but do not call model.eval() at test time. This may degrade performance.
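For anyone hitting this, the usual PyTorch pattern is below (a generic sketch; test_batch and the encoder's return signature are assumptions):

import torch

encoder.eval()              # batchnorm layers switch to running statistics
with torch.no_grad():       # no gradient tracking during evaluation
    features = encoder(test_batch)
encoder.train()             # switch back before resuming training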