Comments (10)
-
Computing a probability mass doesn't necessarily mean exclusive: the multinomial allows multiple non-zero entries. If two items tend to co-occur, the model can certainly learn to give both of them probability mass.
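For illustration (the logits here are made up), a softmax readily places large, near-equal mass on two co-occurring items at once; the multinomial is not forced to pick a single item:

```python
import numpy as np

# Item logits from some user representation; suppose items 0 and 1
# co-occur often, so the model has learned to score both highly.
logits = np.array([3.0, 3.0, -1.0, -1.0])

# Multinomial (softmax) probabilities, computed stably.
p = np.exp(logits - logits.max())
p /= p.sum()
# Both co-occurring items receive substantial, near-equal probability mass.
```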
-
In Eq (3), the purpose of c_{ui} is to down-weight the 0's (since c_{ui} != 0 even when x_{ui} = 0), which is equivalent to negative sampling. I am not sure why you think it only cares about the 1's. I didn't do logistic with negative sampling because I didn't find it much more helpful. If you can get better results with logistic plus negative sampling (all the necessary code should be available to you), please let me know and I am happy to include that in an updated version of the paper on arXiv. Whatever NCF used in its public source code is what I used.
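A minimal numpy sketch of the point about c_{ui} (this is an illustration, not the paper's exact Eq (3); the confidence form c_ui = 1 + alpha * x_ui and alpha = 40 follow the common WMF parameterization and are assumptions here). Every entry contributes to the weighted Gaussian objective; the zeros are merely down-weighted:

```python
import numpy as np

def weighted_gaussian_loss(X, U, V, alpha=40.0):
    # Confidence weights: c_ui = 1 + alpha * x_ui, so c_ui != 0 even
    # when x_ui == 0 -- every zero entry still contributes, just with
    # much less weight than the ones (soft negative sampling in effect).
    C = 1.0 + alpha * X
    pred = U @ V.T                 # reconstruction from latent factors
    return float(np.sum(C * (X - pred) ** 2))
```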
from vae_cf.
The first problem I understand.
For the second problem, I can understand the situation c_{ui} != 0. I used your code and replaced the loss function with a logistic likelihood with negative sampling:

log p(x_u | z_u) = sum_{i in Omega^{+}_{u}} log sigma(f_{ui}) + sum_{i in Omega^{-}_{u}} log(1 - sigma(f_{ui}))

Here Omega^{+}_{u} is the set of all positively observed items of user u, and Omega^{-}_{u} is a set of negative samples drawn at random from user u's zero entries (all items except the positives). I denote the negative-sampling ratio as K = |Omega^{-}_{u}| / |Omega^{+}_{u}|.
I ran some experiments with the logistic likelihood and negative sampling under different settings of K, and found that performance improves as K grows: the logistic likelihood performs best when all the zero entries are taken as negative samples.
But even its best performance is still worse than the multinomial likelihood's.
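The negative-sampling setup above can be sketched as follows (a sketch, assuming a per-user logit vector `logits` for all items and uniform sampling of negatives from the zero entries):

```python
import numpy as np

def neg_logistic_ll(x_u, logits, K, rng):
    """Negative logistic log-likelihood for one user with negative
    sampling: all positives (Omega+_u) plus K negatives per positive,
    drawn uniformly without replacement from the zero entries (Omega-_u)."""
    pos = np.flatnonzero(x_u == 1)
    zeros = np.flatnonzero(x_u == 0)
    n_neg = min(len(zeros), K * len(pos))       # K = |Omega-| / |Omega+|
    neg = rng.choice(zeros, size=n_neg, replace=False)
    sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
    ll = np.log(sigma(logits[pos])).sum() + np.log(1.0 - sigma(logits[neg])).sum()
    return -float(ll)                           # minimize the negative LL
```

Setting K large enough that `n_neg == len(zeros)` recovers the "all zeros as negatives" case.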
I also found a paper that uses a variational autoencoder with a logistic likelihood [1], similar to your work, and negative sampling improved their results a lot.
[1] Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information. CIKM 2017.
http://aai.kaist.ac.kr/xe2/module=file&act=procFileDownload&file_srl=18019&sid=4be19b9d0134a4aeacb9ef1ecd81c784&module_srl=1379
from vae_cf.
I am not sure I follow, but for the logistic likelihood, isn't what I did exactly using all the 0's as negatives?
from vae_cf.
I think I understand now, and maybe you misunderstood what I did -- for both Gaussian and logistic, I used all the 0's in training. With Gaussian, I applied the c_{ui} weight, which in effect down-weights all the negatives. With logistic, I simply used all the 0's, which I think corresponds to what you mean by setting K to its largest possible value.
from vae_cf.
Hi, I have some doubts about how to apply your data-splitting method to the WMF baseline. As I understand it, the splitting method you use is as below:
Each row in the matrix represents one user's interactions with all items. The interactions in the blue rectangle are used for training, those in the red rectangle are used to obtain the necessary representation for the test users, and those in the green rectangle are used to compute NDCG.
As far as I know, WMF needs to see all users during training. How do you use this data split with WMF as a baseline?
from vae_cf.
Your diagram looks correct. (One minor detail: the split between red and green is random per test user; it is not the case that certain items appear only in red or only in green for all test users. Just to make that clear.)
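The per-user random split described here can be sketched as follows (the 20% held-out proportion is an assumption for illustration):

```python
import numpy as np

def split_fold_in(item_ids, test_prop=0.2, rng=None):
    """For one held-out user: randomly split their interacted items into
    a fold-in part (red box, fed to the model) and a held-out part
    (green box, used to compute NDCG). The split is random per user."""
    if rng is None:
        rng = np.random.default_rng(0)
    items = np.asarray(item_ids)
    perm = rng.permutation(len(items))
    n_test = int(np.ceil(test_prop * len(items)))
    # (fold_in, held_out)
    return items[perm[n_test:]], items[perm[:n_test]]
```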
I think there is only one sensible way to do it. Rather than me directly feeding you the answer, maybe you can think about it first and tell me how you would do it?
from vae_cf.
You are right, the split is random. To make it easy to see, I drew the diagram above.
I have tried training WMF on the data in both the blue and red rectangles, since together they cover all users, and then predicting the green rectangle. But I wonder whether something is wrong here: with this training strategy, the test-set interactions (red rectangle) influence WMF's learnable parameters (the user and item latent embeddings), which means I leak the test set into training. For the VAE, although the red-rectangle data is used to obtain a test user's representation, it does not influence the VAE's learnable parameters.
This is how I see it, but training WMF this way seems to have the problems above. Is anything wrong with my reasoning, and how did you do it?
from vae_cf.
Yes, you are right that this would leak the validation data for WMF. A simple fix (this is how I did it) is to train WMF only on the blue box and keep only the item factors. Then, during evaluation, keep the item factors fixed, learn the validation users' factors (which corresponds to one ALS update) from the red box, and make predictions for the green box. This is known as strong generalization.
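A sketch of that fold-in step (assuming Hu et al.-style confidence weights c_u = 1 + alpha * x_u; `alpha` and the regularizer `lam` are illustrative, not the paper's exact settings):

```python
import numpy as np

def fold_in_user_factor(Y, x_u, alpha=40.0, lam=1e-2):
    """One ALS update for a held-out user under strong generalization:
    item factors Y (trained on the blue box only) stay fixed; solve the
    weighted ridge regression for this user's factor from the fold-in
    (red-box) interaction vector x_u."""
    k = Y.shape[1]
    c_u = 1.0 + alpha * x_u                       # confidence weights
    A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(k)
    b = Y.T @ (c_u * x_u)
    return np.linalg.solve(A, b)                  # user factor, shape (k,)
```

Scores for the green box are then `Y @ u` with the item factors still fixed.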
from vae_cf.
I wonder why you didn't use binary cross-entropy instead of (categorical) cross-entropy, since this is a multi-label problem.
I also wonder why negative sampling or a similar technique wasn't applied, given that your vocabulary is very large.
from vae_cf.
Also, in production, how do you represent new videos with this architecture?
from vae_cf.
Related Issues (18)
- Making a script out of the notebook?
- One problem reveals when I rerun the program
- Error in python 3.5+
- beta annealing
- getting NaN values in ndcg
- Normalization of multinomial probability
- Getting Error When importing "import apply_regularization, l2_regularizer"
- All kind of measure (Recall, NDCG_binary_at_k_batch) alway return NaN
- confused about split data
- Request for Modification: filter_triplets Function
- A question about the way of how to split data
- Not an issue, just a question about the other datasets
- some question
- Running on Python 3.5+
- Superscript “PR” means partial regularization or personal ranking?
- Implementation of CDAE (as a baseline)
- Question about l2 normalization of input