Comments (10)
-
Computing a probability mass doesn't necessarily mean exclusive: the multinomial allows multiple non-zero entries. If two items tend to co-occur, the model can certainly learn to give both of them probability mass.
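For illustration (the logits here are made up), a softmax readily places large, near-equal mass on two co-occurring items at once; the multinomial is not forced to pick a single item:

```python
import numpy as np

# Item logits from some user representation; suppose items 0 and 1
# co-occur often, so the model has learned to score both highly.
logits = np.array([3.0, 3.0, -1.0, -1.0])

# Multinomial (softmax) probabilities, computed stably.
p = np.exp(logits - logits.max())
p /= p.sum()
# Both co-occurring items receive substantial, near-equal probability mass.
```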
-
In Eq (3), the purpose of c_{ui} is to down-weight the 0's (since c_{ui} != 0 even when x_{ui} = 0), which is equivalent to negative sampling. I am not sure why you think it only cares about the 1's. I didn't do logistic with negative sampling because I didn't find it much more helpful. If you can get better results with logistic plus negative sampling (all the necessary code should be available to you), please let me know and I am happy to include that in an updated version of the paper on arXiv. Whatever NCF used in its public source code is what I used.
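A minimal numpy sketch of the point about c_{ui} (this is an illustration, not the paper's exact Eq (3); the confidence form c_ui = 1 + alpha * x_ui and alpha = 40 follow the common WMF parameterization and are assumptions here). Every entry contributes to the weighted Gaussian objective; the zeros are merely down-weighted:

```python
import numpy as np

def weighted_gaussian_loss(X, U, V, alpha=40.0):
    # Confidence weights: c_ui = 1 + alpha * x_ui, so c_ui != 0 even
    # when x_ui == 0 -- every zero entry still contributes, just with
    # much less weight than the ones (soft negative sampling in effect).
    C = 1.0 + alpha * X
    pred = U @ V.T                 # reconstruction from latent factors
    return float(np.sum(C * (X - pred) ** 2))
```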
from vae_cf.
The first problem I understand.
For the second problem, I can understand the situation c_{ui} != 0. I used your code and replaced the loss function with a logistic likelihood with negative sampling:

log p(x_u | z_u) = sum_{i in Omega^{+}_{u}} log sigma(f_{ui}) + sum_{i in Omega^{-}_{u}} log(1 - sigma(f_{ui}))

Here Omega^{+}_{u} is the set of all positively observed items of user u, and Omega^{-}_{u} is a set of negative samples drawn at random from user u's zero entries (all items except the positives). I denote the negative-sampling ratio as K = |Omega^{-}_{u}| / |Omega^{+}_{u}|.
I ran some experiments with the logistic likelihood and negative sampling under different settings of K, and found that performance improves as K grows: the logistic likelihood performs best when all the zero entries are taken as negative samples.
But even its best performance is still worse than the multinomial likelihood's.
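The negative-sampling setup above can be sketched as follows (a sketch, assuming a per-user logit vector `logits` for all items and uniform sampling of negatives from the zero entries):

```python
import numpy as np

def neg_logistic_ll(x_u, logits, K, rng):
    """Negative logistic log-likelihood for one user with negative
    sampling: all positives (Omega+_u) plus K negatives per positive,
    drawn uniformly without replacement from the zero entries (Omega-_u)."""
    pos = np.flatnonzero(x_u == 1)
    zeros = np.flatnonzero(x_u == 0)
    n_neg = min(len(zeros), K * len(pos))       # K = |Omega-| / |Omega+|
    neg = rng.choice(zeros, size=n_neg, replace=False)
    sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
    ll = np.log(sigma(logits[pos])).sum() + np.log(1.0 - sigma(logits[neg])).sum()
    return -float(ll)                           # minimize the negative LL
```

Setting K large enough that `n_neg == len(zeros)` recovers the "all zeros as negatives" case.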
I also found a paper that uses a variational autoencoder with a logistic likelihood [1], similar to your work, and negative sampling improved their results a lot.
[1] Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information. CIKM 2017.
http://aai.kaist.ac.kr/xe2/module=file&act=procFileDownload&file_srl=18019&sid=4be19b9d0134a4aeacb9ef1ecd81c784&module_srl=1379
from vae_cf.
I am not sure I follow, but for the logistic likelihood, isn't what I did exactly using all the 0's as negatives?
from vae_cf.
I think I understand now, and maybe you misunderstood what I did -- for both Gaussian and logistic, I used all the 0's in training. With Gaussian, I applied the c_{ui} weight, which in effect down-weights all the negatives. With logistic, I simply used all the 0's, which I think corresponds to what you mean by setting K to its largest possible value.
from vae_cf.
Hi, I have some doubts about how to apply your data-splitting method to the WMF baseline. As I understand it, the splitting method you use is as below:
Each row in the matrix represents one user's interactions with all items. The interactions in the blue rectangle are used for training, those in the red rectangle are used to obtain the necessary representation for the test users, and those in the green rectangle are used to compute NDCG.
As far as I know, WMF needs to see all users during training. How do you use this data split with WMF as a baseline?
from vae_cf.
Your diagram looks correct. (One minor detail: the split between red and green is random per test user; it is not the case that certain items appear only in red or only in green for all test users. Just to make that clear.)
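The per-user random split described here can be sketched as follows (the 20% held-out proportion is an assumption for illustration):

```python
import numpy as np

def split_fold_in(item_ids, test_prop=0.2, rng=None):
    """For one held-out user: randomly split their interacted items into
    a fold-in part (red box, fed to the model) and a held-out part
    (green box, used to compute NDCG). The split is random per user."""
    if rng is None:
        rng = np.random.default_rng(0)
    items = np.asarray(item_ids)
    perm = rng.permutation(len(items))
    n_test = int(np.ceil(test_prop * len(items)))
    # (fold_in, held_out)
    return items[perm[n_test:]], items[perm[:n_test]]
```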
I think there is only one sensible way to do it. Rather than me directly feeding you the answer, maybe you can think about it first and tell me how you would do it?
from vae_cf.
You are right, the split is random. To make it easy to see, I drew the diagram above.
I have tried training WMF on the data in both the blue and red rectangles, since together they cover all users, and then predicting the green rectangle. But I wonder whether something is wrong here: with this training strategy, the test-set interactions (red rectangle) influence WMF's learnable parameters (the user and item latent embeddings), which means I leak the test set into training. For the VAE, although the red-rectangle data is used to obtain a test user's representation, it does not influence the VAE's learnable parameters.
This is how I see it, but training WMF this way seems to have the problems above. Is anything wrong with my reasoning, and how did you do it?
from vae_cf.
Yes, you are right that this would leak the validation data for WMF. A simple fix (this is how I did it) is to train WMF only on the blue box and keep only the item factors. Then, during evaluation, keep the item factors fixed, learn the validation users' factors (which corresponds to one ALS update) from the red box, and make predictions for the green box. This is known as strong generalization.
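A sketch of that fold-in step (assuming Hu et al.-style confidence weights c_u = 1 + alpha * x_u; `alpha` and the regularizer `lam` are illustrative, not the paper's exact settings):

```python
import numpy as np

def fold_in_user_factor(Y, x_u, alpha=40.0, lam=1e-2):
    """One ALS update for a held-out user under strong generalization:
    item factors Y (trained on the blue box only) stay fixed; solve the
    weighted ridge regression for this user's factor from the fold-in
    (red-box) interaction vector x_u."""
    k = Y.shape[1]
    c_u = 1.0 + alpha * x_u                       # confidence weights
    A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(k)
    b = Y.T @ (c_u * x_u)
    return np.linalg.solve(A, b)                  # user factor, shape (k,)
```

Scores for the green box are then `Y @ u` with the item factors still fixed.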
from vae_cf.
I wonder why you didn't use binary cross-entropy instead of (categorical) cross-entropy, since this is a multi-label problem.
I also wonder why negative sampling or a similar technique wasn't applied, given that your vocabulary is very large.
from vae_cf.
Also, in production, how do you represent new videos with this architecture?
from vae_cf.
Related Issues (18)
- Making a script out of the notebook?
- One problem reveals when I rerun the program
- Error in python 3.5+
- beta annealing
- getting NaN values in ndcg
- Normalization of multinomial probability
- Getting Error When importing "import apply_regularization, l2_regularizer"
- All kind of measure (Recall, NDCG_binary_at_k_batch) alway return NaN
- confused about split data
- Request for Modification: filter_triplets Function
- A question about the way of how to split data
- Not an issue, just a question about the other datasets
- some question
- Running on Python 3.5+
- Superscript “PR” means partial regularization or personal ranking?
- Implementation of CDAE (as a baseline)
- Question about l2 normalization of input