bine's Issues

Actual Python dependencies

I just had to spend a while getting versions to line up in order to run this code. For anyone else searching for this, here is a more accurate list of the required Python modules and versions; some were missing from the README.

python                    2.7.14
datasketch                1.4.1
futures                   3.2.0
networkx                  2.2
numpy                     1.16.0
pandas                    0.23.4
scikit-learn              0.20.0
scipy                     1.1.0
six                       1.11.0
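
A minimal way to pin these is a requirements file; the sketch below simply restates the versions listed above and assumes installation into a Python 2.7 environment with pip.

    # requirements.txt (Python 2.7 environment assumed)
    datasketch==1.4.1
    futures==3.2.0
    networkx==2.2
    numpy==1.16.0
    pandas==0.23.4
    scikit-learn==0.20.0
    scipy==1.1.0
    six==1.11.0

Install with: pip install -r requirements.txt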

A problem in the top_n function

I noticed the following code at line 180 of train.py:
tmp_t = sorted(test_rate[u].items(), lambda x, y: cmp(x[1], y[1]), reverse=True)[0:min(len(test_rate[u]),len(test_rate[u]))]
Here you use
min(len(test_rate[u]), len(test_rate[u]))
but the two arguments are identical, so the min has no effect.
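
If the intent was to cap the list at a top-n cutoff, a minimal sketch of the presumably intended slice is below; the name top_n is hypothetical and not taken from the repository, and the key-based sort is an equivalent, Python-2/3-compatible form of the cmp-style sort above.

    # Presumed intent: keep at most top_n of the highest-rated test items.
    # top_n is a hypothetical cutoff parameter, not a variable from train.py.
    tmp_t = sorted(test_rate[u].items(),
                   key=lambda x: x[1],
                   reverse=True)[0:min(top_n, len(test_rate[u]))]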

Dataset

I need a bit more information on how to prepare my dataset. I have a .dat file containing all the ratings, placed in the model folder so that it can be read, renamed, and split into train and test, but it seems nothing is being read.
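
For what it's worth, a minimal sketch of producing a train/test split yourself is below. It assumes the ratings file is tab-separated with one user, item, weight triple per line and no header; the column layout and the output file names are assumptions, not something confirmed by the repository.

    # split_ratings.py -- hypothetical helper; see the format assumptions above.
    import pandas as pd

    # Assumed layout: user <TAB> item <TAB> weight, no header line.
    ratings = pd.read_csv("ratings.dat", sep="\t", header=None,
                          names=["user", "item", "weight"])

    # Random 80/20 split into train and test.
    test = ratings.sample(frac=0.2, random_state=42)
    train = ratings.drop(test.index)

    train.to_csv("rating_train.dat", sep="\t", header=False, index=False)
    test.to_csv("rating_test.dat", sep="\t", header=False, index=False)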

Number of maximum iterations

Why was 50 chosen as the maximum number of iterations? When I run with more iterations, I get better recommendation results.

biadjacency_matrix issue

Any idea how to solve this?

  File "train.py", line 53, in walk_generator
    gul.homogeneous_graph_random_walks_for_large_bipartite_graph(datafile=args.train_data, percentage=args.p, maxT=args.maxT, minT=args.minT)
  File "/tmp2/cmchen/proRec/BiNE/model/graph_utils.py", line 111, in homogeneous_graph_random_walks_for_large_bipartite_graph
    A,row_index,item_index= bi.biadjacency_matrix(self.G, self.node_u, self.node_v, dtype=np.float,weight='weight', format='csr')
ValueError: too many values to unpack
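
A minimal sketch of one workaround, assuming the error comes from NetworkX versions in which bipartite.biadjacency_matrix returns only the sparse matrix rather than a (matrix, row_index, item_index) tuple: unpack a single value and rebuild the index dictionaries from the node orderings that were passed in. This is a patch sketch for line 111 of graph_utils.py, not a confirmed fix.

    # Inside homogeneous_graph_random_walks_for_large_bipartite_graph():
    # newer NetworkX returns just the scipy sparse matrix, so unpack one value...
    A = bi.biadjacency_matrix(self.G, self.node_u, self.node_v,
                              dtype=float, weight='weight', format='csr')
    # ...and rebuild the row/column index maps from the orderings we supplied.
    row_index = dict(zip(self.node_u, range(len(self.node_u))))
    item_index = dict(zip(self.node_v, range(len(self.node_v))))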

networkx.exception.NetworkXError: HITS: power iteration failed to converge in 102 iterations.

Is this the cause? And how can it be solved in this case? It does not seem to be solvable by simply replacing a function.

Traceback (most recent call last):
  File "D:/programming/BiNE/model/train.py", line 572, in <module>
    sys.exit(main())
  File "D:/programming/BiNE/model/train.py", line 569, in main
    train_by_sampling(args)
  File "D:/programming/BiNE/model/train.py", line 321, in train_by_sampling
    walk_generator(gul,args)
  File "D:/programming/BiNE/model/train.py", line 55, in walk_generator
    gul.calculate_centrality()
  File "D:\programming\BiNE\model\graph_utils.py", line 61, in calculate_centrality
    h, a = nx.hits(self.G)
  File "D:\Anaconda3.5\envs\BiNE\lib\site-packages\networkx\algorithms\link_analysis\hits_alg.py", line 111, in hits
    "HITS: power iteration failed to converge in %d iterations."%(i+1))
networkx.exception.NetworkXError: HITS: power iteration failed to converge in 102 iterations.
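
A minimal sketch of a common workaround, assuming the problem is simply slow convergence of the power iteration: give nx.hits more iterations and, if needed, a looser tolerance. Both max_iter and tol are standard parameters of networkx.hits; the specific values below are guesses rather than tuned settings.

    # In graph_utils.py, calculate_centrality(): allow HITS more room to converge.
    h, a = nx.hits(self.G, max_iter=1000, tol=1e-6)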

skip-gram center and context word?

Thank you very much for sharing the code.
I have a few questions about the skip-gram part that I would like to ask you.
Shouldn't I_z = {center: 1} be computed for the context node?
And V = np.array(node_list[contexts]['embedding_vectors']) should be the embedding of the center node, right?
What finally gets updated is
for z in context_u:
tmp_z, tmp_loss = skip_gram(u, z, neg_u, node_list_u, lam, alpha)
node_list_u[z]['embedding_vectors'] += tmp_z ## is this updating the embedding of the center node?

Looking forward to your reply!
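
For reference, here is a minimal, generic sketch of a skip-gram-with-negative-sampling step that makes the center/context distinction explicit. It is a textbook formulation in plain NumPy under simplified assumptions, not the repository's skip_gram function, and the names center_vecs / context_vecs are hypothetical.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_update(center, context, negatives, center_vecs, context_vecs, lr=0.025):
        """One SGNS step: the center vector and the context/negative vectors
        are all updated by gradient ascent on the SGNS objective."""
        v_c = center_vecs[center]
        grad_center = np.zeros_like(v_c)
        for node, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
            v_o = context_vecs[node]
            g = lr * (label - sigmoid(np.dot(v_c, v_o)))
            grad_center += g * v_o           # contribution to the center update
            context_vecs[node] += g * v_c    # context / negative vectors get updated too
        center_vecs[center] += grad_center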

About node visualization

Hi Leihui, I am quite interested in the node visualization results in your paper. However, I cannot reproduce the t-SNE plots shown there. Could you please share the node visualization code? Thanks.
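
A minimal sketch of how such a plot is typically produced with scikit-learn and matplotlib; the random matrix is only a placeholder for the learned BiNE vectors, and none of the settings below are the paper's actual parameters.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Placeholder embeddings: substitute the learned (n_nodes, dim) vectors here.
    embeddings = np.random.rand(200, 128)

    # Project to 2-D with t-SNE and scatter-plot the result.
    coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], s=5)
    plt.title("t-SNE projection of node embeddings")
    plt.show()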

Low speed

I used a dataset containing 0.2 million links to learn the embeddings, but after running for 8 hours the program was still stuck in graph construction.

Are there any ways to speed up the program?

Not suitable for large datasets...

I tried to use BiNE on a user-item interaction network with 1 million users and a similar number of items. The implementation gets stuck at graph construction. I set the "large" option to 2, and it didn't help.
Is there any way to speed up the training?
Also, there is still around 100 GB of memory unused on my machine, and only one CPU is fully used.
Hoping for an answer.

Why is the context vector updated by SGD?

Hi, I want to understand why we need to update the context vectors of user nodes and item nodes when running the skip-gram model.

This is not the case in node2vec, as far as I know (and node2vec uses one-hot vectors). May I ask if there is any reason behind it?
