Giter Site home page Giter Site logo

bns's Introduction

Bayesian Negative Sampling

Accepted by International Conference on Data Engineering (ICDE 2023)

@INPROCEEDINGS{Bin:ICDE:2023,

author={Liu, Bin and Wang, Bang},

booktitle={2023 IEEE 39th International Conference on Data Engineering (ICDE)},

title={Bayesian Negative Sampling for Recommendation},

year={2023},

pages={749-761},

doi={10.1109/ICDE55515.2023.00063}}

Required Packages

  • numpy : Implement BNS for Matrix Factorization (MF) (run main_MF.py );
  • pytorch: Implement BNS for light Graph Convolution Network (LightGCN) (run main_lightGCN.py ).

About BNS Framework:

We summarize the Bayesian negative sampling algorithm as: Randomly select a small set of candidates $\mathcal{M}_u$ from the negative instances set, for each negative instances $l \in \mathcal{M}_u$ do:

(i) Computing prior probability

By assuming $pop_l \sim B (N, P_{fn}(l))$, we compute prior probability as: $$P_{fn}(l) = \frac{pop_l}{N}$$ This step needs $\mathcal{O}(1)$.

(ii) Computing cumulative distribution function (likelihood)

$$ F_{n}(\hat{x} _ l) = \frac{1}{n} \sum I_ {|x_ \cdot \leq \hat{x}_l |} $$

This step needs $\mathcal{O}(|\mathcal{I}|)$.

The empirical distribution function $F_n (\cdot)$ converges to common cumulative distribution function $F(\cdot)$ almost surely by the strong law of large numbers. Glivenko Theorem (1933) strengthened this result by proving uniform convergence of $F_n(\cdot)$ to $F(\cdot)$. This property makes it possible for us to compute the $F(\cdot)$, even if we do not know its explicit expression. Given the observation $\hat{x}$, $F(\hat{x})$
assigns a probabilistic prediction valued in $[0,1]$ of $l$ being false negative (positive).

(iii) Computing negative signal (posterior probability)

$$ \mathtt{unbias}(l) = \frac{[1 - F(\hat{x} _ l)] [1-P _ {fn} (l)] }{1 - F(\hat{x}_ l) -P_{fn}(l) + 2F(\hat{x}_ l)P_{fn}(l)} $$

Note that $\mathtt{unbias}(l)$ is an unbiased estimator for $l$ being true negative (see Lemma 3.1). This step needs $\mathcal{O}(1)$.

(iv) Performing Bayesian negative sampling

$$ j = \mathop{\arg\min}\limits_{l \in\mathcal{M}_u}~ \mathtt{info}(l)\cdot [1-(1+\lambda)\mathtt{unbias}(l)]$$

If the sampler $h^*$ minimizes the conditional sampling risk $R(l|i)$, then the empirical sampling risk will be minimized (see Theorem 3.1). Thus the Bayesian optimal sampling rule is essentially sampling the instance with minimal conditional sampling risk. This step needs $\mathcal{O}(1)$.

If you have any questions, please feel free to contact contact "[email protected]; [email protected]". More details about BNS(Bin Liu & Bang Wang, 2022) see our paper or poster at https://doi.org/10.48550/arXiv.2204.06520 .

Distribution Analysis

The key is the class conditional density. On the basis of order relation analysis of negatives' scores, we derive the class conditional density of true negatives and that of false negatives, and provide an affirmative answer from a Bayesian viewpoint to distinguish true negatives from false negatives.

Real distribution

Theoretical distribution

We exhibits the distribution morphology of false negatives and true negatives derived from the ordinal relation with different types of $f(x)$: Gaussian distribution $x \sim \mathcal{N}(\mu,\sigma)$ (symmetrical), student distribution $x \sim t(n)$ (symmetrical), and Gamma distribution $x\sim Ga(\alpha,\lambda)$ (asymmetrical) . As we can see, during the training process, the real distribution of true/false negatives gradually exhibit the same structure as theoretical distribution. Although the probability density function $f(x)$ changes during the training process, this separated structure is sufficient for us to classify the true negative instances from false negative instances.

Dataset Description:

MovieLens100K; MovieLens1M; Yahoo!-R3.

Our Bayesian Negative Sampling algothrim does not depend on the datasets. Just split the dataset you are interested in into the training set and test set, replace the train.csv and test.csv files, you can run Bayesian negative sampling easily. You just need to make sure that you have chosen appropriate prior information for modeling prior probability $P_{tn}(\cdot)$ or $P_{fn}(\cdot)$.

bns's People

Contributors

liubin06 avatar

Stargazers

Sunsik Kim avatar Vuong Toan avatar  avatar Ramsey avatar HUST MinsLab avatar  avatar  avatar Yifan Zhuo avatar  avatar  avatar  avatar  avatar Wang  Zhe avatar  avatar Lingyun Lu avatar  avatar  avatar  avatar  avatar  avatar Che avatar resistzzz avatar Zhang Chong avatar  avatar  avatar 王家山 avatar  avatar Jimmy avatar  avatar  avatar  avatar  avatar Jason Yin avatar Kang Yan avatar  avatar  avatar Chaoyang avatar Fei Wang avatar  avatar cxiang avatar 张三 avatar

Watchers

Jason Yin avatar

Forkers

guoyi0

bns's Issues

关于BNS的一些疑问

尊敬的BNS作者您好,我是同济大学的一名本科生,最近拜读了您的作品——《Bayesian Negative Sampling for Recommendation》后深有感触,有一些疑问想询问您,所以冒昧给您的邮箱发了邮件,期待您的回信!祝您学业一帆风顺,A会发发发!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.