bine's Issues

Actual Python dependencies

I just had to spend a while getting versions to line up in order to run this code. For anyone else searching for this, here is a more accurate list of the required Python modules and versions; some were missing from the README.

python                    2.7.14
datasketch                1.4.1
futures                   3.2.0
networkx                  2.2
numpy                     1.16.0
pandas                    0.23.4
scikit-learn              0.20.0
scipy                     1.1.0
six                       1.11.0
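
A minimal way to pin these is a requirements file; the sketch below simply restates the versions listed above and assumes installation into a Python 2.7 environment with pip.

    # requirements.txt (Python 2.7 environment assumed)
    datasketch==1.4.1
    futures==3.2.0
    networkx==2.2
    numpy==1.16.0
    pandas==0.23.4
    scikit-learn==0.20.0
    scipy==1.1.0
    six==1.11.0

Install with: pip install -r requirements.txt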

A problem in the top_n function

I noticed the following code at line 180 of train.py:
tmp_t = sorted(test_rate[u].items(), lambda x, y: cmp(x[1], y[1]), reverse=True)[0:min(len(test_rate[u]),len(test_rate[u]))]
Here you use
min(len(test_rate[u]), len(test_rate[u]))
but the two arguments are identical, so the min has no effect.
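
If the intent was to cap the list at a top-n cutoff, a minimal sketch of the presumably intended slice is below; the name top_n is hypothetical and not taken from the repository, and the key-based sort is an equivalent, Python-2/3-compatible form of the cmp-style sort above.

    # Presumed intent: keep at most top_n of the highest-rated test items.
    # top_n is a hypothetical cutoff parameter, not a variable from train.py.
    tmp_t = sorted(test_rate[u].items(),
                   key=lambda x: x[1],
                   reverse=True)[0:min(top_n, len(test_rate[u]))]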

Dataset

I need a bit more information on how to prepare my dataset. I have a .dat file containing all the ratings, placed in the model folder so that it can be read, renamed, and split into train and test, but it seems nothing is being read.
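
For what it's worth, a minimal sketch of producing a train/test split yourself is below. It assumes the ratings file is tab-separated with one user, item, weight triple per line and no header; the column layout and the output file names are assumptions, not something confirmed by the repository.

    # split_ratings.py -- hypothetical helper; see the format assumptions above.
    import pandas as pd

    # Assumed layout: user <TAB> item <TAB> weight, no header line.
    ratings = pd.read_csv("ratings.dat", sep="\t", header=None,
                          names=["user", "item", "weight"])

    # Random 80/20 split into train and test.
    test = ratings.sample(frac=0.2, random_state=42)
    train = ratings.drop(test.index)

    train.to_csv("rating_train.dat", sep="\t", header=False, index=False)
    test.to_csv("rating_test.dat", sep="\t", header=False, index=False)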

Number of maximum iterations

Why was 50 chosen as the maximum number of iterations? When I run with more iterations, I get better recommendation results.

biadjacency_matrix issue

Any idea how to solve this?

  File "train.py", line 53, in walk_generator
    gul.homogeneous_graph_random_walks_for_large_bipartite_graph(datafile=args.train_data, percentage=args.p, maxT=args.maxT, minT=args.minT)
  File "/tmp2/cmchen/proRec/BiNE/model/graph_utils.py", line 111, in homogeneous_graph_random_walks_for_large_bipartite_graph
    A,row_index,item_index= bi.biadjacency_matrix(self.G, self.node_u, self.node_v, dtype=np.float,weight='weight', format='csr')
ValueError: too many values to unpack
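
A minimal sketch of one workaround, assuming the error comes from NetworkX versions in which bipartite.biadjacency_matrix returns only the sparse matrix rather than a (matrix, row_index, item_index) tuple: unpack a single value and rebuild the index dictionaries from the node orderings that were passed in. This is a patch sketch for line 111 of graph_utils.py, not a confirmed fix.

    # Inside homogeneous_graph_random_walks_for_large_bipartite_graph():
    # newer NetworkX returns just the scipy sparse matrix, so unpack one value...
    A = bi.biadjacency_matrix(self.G, self.node_u, self.node_v,
                              dtype=float, weight='weight', format='csr')
    # ...and rebuild the row/column index maps from the orderings we supplied.
    row_index = dict(zip(self.node_u, range(len(self.node_u))))
    item_index = dict(zip(self.node_v, range(len(self.node_v))))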

networkx.exception.NetworkXError: HITS: power iteration failed to converge in 102 iterations.

Is this the cause? And how can it be solved in this case? It does not seem to be solvable by simply replacing a function.

Traceback (most recent call last):
  File "D:/programming/BiNE/model/train.py", line 572, in <module>
    sys.exit(main())
  File "D:/programming/BiNE/model/train.py", line 569, in main
    train_by_sampling(args)
  File "D:/programming/BiNE/model/train.py", line 321, in train_by_sampling
    walk_generator(gul,args)
  File "D:/programming/BiNE/model/train.py", line 55, in walk_generator
    gul.calculate_centrality()
  File "D:\programming\BiNE\model\graph_utils.py", line 61, in calculate_centrality
    h, a = nx.hits(self.G)
  File "D:\Anaconda3.5\envs\BiNE\lib\site-packages\networkx\algorithms\link_analysis\hits_alg.py", line 111, in hits
    "HITS: power iteration failed to converge in %d iterations."%(i+1))
networkx.exception.NetworkXError: HITS: power iteration failed to converge in 102 iterations.
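
A minimal sketch of a common workaround, assuming the problem is simply slow convergence of the power iteration: give nx.hits more iterations and, if needed, a looser tolerance. Both max_iter and tol are standard parameters of networkx.hits; the specific values below are guesses rather than tuned settings.

    # In graph_utils.py, calculate_centrality(): allow HITS more room to converge.
    h, a = nx.hits(self.G, max_iter=1000, tol=1e-6)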

skip-gram center and context word?

Thank you very much for sharing the code.
I have a few questions about the skip-gram part that I would like to ask you.
Shouldn't I_z = {center: 1} be computed for the context node?
And V = np.array(node_list[contexts]['embedding_vectors']) should be the embedding of the center node, right?
What finally gets updated is
for z in context_u:
tmp_z, tmp_loss = skip_gram(u, z, neg_u, node_list_u, lam, alpha)
node_list_u[z]['embedding_vectors'] += tmp_z ## is this updating the embedding of the center node?

Looking forward to your reply!
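
For reference, here is a minimal, generic sketch of a skip-gram-with-negative-sampling step that makes the center/context distinction explicit. It is a textbook formulation in plain NumPy under simplified assumptions, not the repository's skip_gram function, and the names center_vecs / context_vecs are hypothetical.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_update(center, context, negatives, center_vecs, context_vecs, lr=0.025):
        """One SGNS step: the center vector and the context/negative vectors
        are all updated by gradient ascent on the SGNS objective."""
        v_c = center_vecs[center]
        grad_center = np.zeros_like(v_c)
        for node, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
            v_o = context_vecs[node]
            g = lr * (label - sigmoid(np.dot(v_c, v_o)))
            grad_center += g * v_o           # contribution to the center update
            context_vecs[node] += g * v_c    # context / negative vectors get updated too
        center_vecs[center] += grad_center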

About node visualization

Hi Leihui, I am quite interested in the node visualization results in your paper. However, I cannot reproduce the t-SNE plots shown there. Could you please share the node visualization code? Thanks.
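
A minimal sketch of how such a plot is typically produced with scikit-learn and matplotlib; the random matrix is only a placeholder for the learned BiNE vectors, and none of the settings below are the paper's actual parameters.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Placeholder embeddings: substitute the learned (n_nodes, dim) vectors here.
    embeddings = np.random.rand(200, 128)

    # Project to 2-D with t-SNE and scatter-plot the result.
    coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], s=5)
    plt.title("t-SNE projection of node embeddings")
    plt.show()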

Low speed

I used a dataset containing 0.2 million links to learn the embeddings, but after running for 8 hours the program was still stuck in graph construction.

Are there any ways to speed up the program?

Not suitable for large datasets...

I tried to use BiNE on a user-item interaction network with 1 million users and a similar number of items. The implementation gets stuck at graph construction. I set the "large" option to 2, and it didn't help.
Is there any way to speed up the training?
Also, there is still around 100 GB of memory unused on my machine, and only one CPU is fully used.
Hoping for an answer.

Why is the context vector updated by SGD?

Hi, I want to understand why we need to update the context vectors of user nodes and item nodes when running the skip-gram model.

This is not the case in node2vec, as far as I know (and node2vec uses one-hot vectors). May I ask if there is any reason behind it?
