snowkylin / line


TensorFlow implementation of paper "LINE: Large-scale Information Network Embedding" by Jian Tang, et al.

Python 100.00%


line's Issues

sorry, this is not an issue. just my mistake.

if not self.g.has_edge(self.node_index_reversed[negative_node], self.node_index_reversed[edge[0]]), can you get negative samples for second-order?

loss function different from the paper

Thanks for your elegant implementation of LINE.

I notice that loss function in your code:

self.inner_product = tf.reduce_sum(self.u_i_embedding * self.u_j_embedding, axis=1)
self.loss = -tf.reduce_mean(tf.log_sigmoid(self.label * self.inner_product))

seems to implement a loss function like this:

$\log\sigma(u_j^Tu_i - \sum_{n=1}^Ku_n^Tu_i)$

which is different from the function in your slide:

$\log\sigma(u_j^Tu_i) + \sum_{n=1}^K\log\sigma(-u_n^Tu_i)$

However, the embeddings learned by this code appear reasonable in my experiments. Could anybody explain this? Thanks in advance.
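One way to check the question above is to evaluate both forms numerically. The sketch below is framework-free and rests on an assumption not confirmed in this thread: that `label` is +1 for the positive pair and -1 for each negative pair, with every pair occupying its own row of the batch. Under that assumption, `log_sigmoid(label * inner_product)` is applied per pair, so the batch mean matches the sum-of-log-sigmoids formula up to a constant factor rather than the collapsed single-sigmoid form. The helper names `batch_loss` and `paper_objective` are hypothetical.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def batch_loss(labels, inner_products):
    # Plain-Python stand-in for -reduce_mean(log_sigmoid(label * inner_product)).
    terms = [log_sigmoid(l * s) for l, s in zip(labels, inner_products)]
    return -sum(terms) / len(terms)


def paper_objective(pos_score, neg_scores):
    # log sigma(u_j^T u_i) + sum_n log sigma(-u_n^T u_i)
    return log_sigmoid(pos_score) + sum(log_sigmoid(-s) for s in neg_scores)


# Made-up inner products: one positive pair and K = 3 negative pairs.
pos_score = 1.3
neg_scores = [0.4, -0.7, 2.1]

# Assumed batch layout: label +1 for the positive row, -1 for negative rows.
labels = [1.0] + [-1.0] * len(neg_scores)
scores = [pos_score] + neg_scores

loss = batch_loss(labels, scores)
objective = paper_objective(pos_score, neg_scores)

# The batch loss is the negated objective divided by the number of rows.
assert math.isclose(loss, -objective / len(scores))

# The collapsed form log sigma(u_j^T u_i - sum_n u_n^T u_i) gives another value.
collapsed = log_sigmoid(pos_score - sum(neg_scores))
assert not math.isclose(collapsed, objective)
```

If the batches are built differently from what is assumed here, the comparison would need to be redone against the actual label layout.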

Hi, I am confused about the loss in your code.

second order similarity issue

When handling second-order similarity, why do we need to randomly switch the nodes of an edge (i.e., swap its beginning and ending nodes)?

line/utils.py

Line 41 in 4cdfa7a

    
           if np.random.rand() > 0.5:      # important: second-order proximity is for directed edge

The original paper already lets developers use a homogeneous graph directly ("an undirected edge can
be considered as two directed edges with opposite directions and equal weights").
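A minimal sketch of how the quoted statement and the random switch can be related: if each stored undirected edge is flipped with probability 0.5 when it is sampled, both directed versions of the edge are visited about equally often. This is an illustration under that assumption, not the repository's code; `sample_directed` is a hypothetical helper.

```python
import random


def sample_directed(edge, rng):
    # Treat an undirected edge as two directed edges with equal weight:
    # return (v, u) about half of the time and (u, v) otherwise.
    u, v = edge
    return (v, u) if rng.random() > 0.5 else (u, v)


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = {}
for _ in range(10000):
    directed = sample_directed(("a", "b"), rng)
    counts[directed] = counts.get(directed, 0) + 1

# Both orientations appear, each in roughly half of the draws.
print(counts)
```

Without the flip, only the orientation stored in the edge list would ever serve as the (source, context) pair.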

low efficiency of `tf.matmul`

tf.matmul(tf.one_hot(self.u_i, depth=args.num_of_nodes), self.embedding)

Is it better to use tf.gather?

tf.gather(self.embedding, self.u_i)
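The equivalence behind this suggestion can be sketched without TensorFlow. In the example below, `one_hot`, `matmul`, and `gather` are plain-Python stand-ins written for illustration, not TensorFlow calls, and the embedding values are made up. Multiplying a one-hot row by the embedding matrix selects one row of it, which is exactly what an index lookup returns.

```python
def one_hot(index, depth):
    # Row vector with a single 1.0 at position `index`.
    return [1.0 if k == index else 0.0 for k in range(depth)]


def matmul(a, b):
    # Naive product of an (n x m) and an (m x d) list-of-lists matrix.
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for row in a
    ]


def gather(params, indices):
    # Row lookup: pick params[i] for each index i.
    return [params[i] for i in indices]


# Made-up embedding matrix: 4 nodes, 2 dimensions.
embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
u_i = [2, 0, 2]  # batch of node indices

via_matmul = matmul([one_hot(i, len(embedding)) for i in u_i], embedding)
via_gather = gather(embedding, u_i)

assert via_matmul == via_gather
```

The one-hot route builds a batch-by-num_of_nodes matrix and does work proportional to batch size times number of nodes times embedding dimension, while the lookup only touches the selected rows, so the gap grows with the number of nodes.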

Hello

Do I need to construct the graph myself? Is the file in the data directory not usable as it is? I'm a beginner and hope to get your reply. Thanks!

Can I keep only the second-order proximity?

Hi! Sorry to bother you. I only want to keep the second-order proximity and drop the first-order proximity. How should I do that? Thanks!

About the embeddings of second-order similarity

According to the original paper, the second-order similarity should be the concatenation of the embeddings and the context embeddings. Perhaps the concatenation operation is missing.

How to get co-authorship_graph.pkl?

Can you share the source dataset?

How can I get embeddings between 0 and 1?

Hi, I have a problem getting embeddings with values between 0 and 1. Even when I initialize them between 0 and 1, it doesn't work. What should I do?

Run on CPU

Can we run the TensorFlow version on CPU?

data replace

I can run this program, but when I replace the data file with my own, I get the error KeyError: 'weight'. Why?

Loss only decreases from 2.2 to 1.8

I use the default settings and your dataset, running 20,000 batches. The loss doesn't seem to decrease much, only from 2.2 to 1.8. Is there something wrong with my experiment?

run on multiple cores?

Hi there,

I can run the code smoothly, but it is extremely slow in my case (I have around 400,000 nodes and 300,000 edges). Is it possible to set up multiple CPU cores to speed up the process?

Cheers,
Weisi

Traceback (most recent call last):
  File "line.py", line 69, in <module>
    main()
  File "line.py", line 23, in main
    train(args)
  File "line.py", line 29, in train
    data_loader = DBLPDataLoader(graph_file=args.graph_file)
  File "/home/yt/yantao/研究生课程/降维-可视化/LINE_code/line-master/utils.py", line 8, in __init__
    self.num_of_nodes = self.g.number_of_nodes()
  File "/usr/local/lib/python3.6/site-packages/networkx/classes/graph.py", line 798, in number_of_nodes
    return len(self._node)
AttributeError: 'Graph' object has no attribute '_node'

Can you tell me how to solve this problem?

Question about negative sampling in utils.py

Hi,

I read your algorithm and found one part I cannot understand. In the function fetch_batch, you do negative sampling for the center node edge[0], but in line 54 you check whether there is an edge between the negative node and edge[1]. The line is
if not self.g.has_edge(self.node_index_reversed[edge[0]], self.node_index_reversed[negative_node])
If I understand negative sampling correctly, we need to check whether there is an edge between the center node edge[0] and the negative node. If there is no link between them, we can add the negative node as a negative sample for edge[0].

Am I right, or did I miss something?

Looking forward to your reply.
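The check described in this question can be sketched as follows. This is an illustration, not the repository's fetch_batch: the graph is a plain dict of neighbour sets, candidates are drawn uniformly for simplicity (an assumption; a degree-based noise distribution could be swapped in), and `sample_negatives` is a hypothetical helper. A candidate is kept only if it is not the center node and has no edge to the center node.

```python
import random


def sample_negatives(adjacency, center, k, rng):
    # adjacency maps each node to the set of its neighbours.
    nodes = list(adjacency)
    valid = [n for n in nodes if n != center and n not in adjacency[center]]
    if not valid:
        raise ValueError("no node is available as a negative sample")
    negatives = []
    while len(negatives) < k:
        candidate = rng.choice(nodes)
        # Reject the center itself and anything linked to the center.
        if candidate != center and candidate not in adjacency[center]:
            negatives.append(candidate)
    return negatives


# Toy undirected graph: node 0 is linked to 1 and 2 only.
adjacency = {0: {1, 2}, 1: {0}, 2: {0}, 3: {4}, 4: {3}, 5: set()}
negatives = sample_negatives(adjacency, center=0, k=5, rng=random.Random(42))

# Every negative sample is a node with no edge to the center node 0.
assert all(n in {3, 4, 5} for n in negatives)
```

Rejection sampling like this stays cheap as long as the center node is linked to only a small share of the graph.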

Dataset

How was the dataset that is saved as a pkl file processed?

Some errors occur

Hello snowkylin,
Thank you for your code! I have run it on my dataset, which contains 10 sub-datasets, so I wrote a "for" loop to call the LINE procedure on each one. When it reaches the second sub-dataset, an error says that 'target_embedding' has already been defined. I added "auto reuse" to this variable, but then a second error appears: it expects the shape of 'target_embedding' to be (xxx,ppp) but finds (yyy,zzz), so the two shapes do not match. I couldn't figure it out. Could you help me resolve it?
Thank you so much!
Best,
Xiuling