snowkylin / line
TensorFlow implementation of paper "LINE: Large-scale Information Network Embedding" by Jian Tang, et al.
Thanks for your elegant implementation of LINE.
I noticed that the loss function in your code:
self.inner_product = tf.reduce_sum(self.u_i_embedding * self.u_j_embedding, axis=1)
self.loss = -tf.reduce_mean(tf.log_sigmoid(self.label * self.inner_product))
seems to implement a loss function like this:
which is different from the function in your slide:
However, the embeddings learned by this code work well in my experiments. Could anybody explain this? Thanks in advance.
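One possible reading, assuming `self.label` is +1 for an observed edge and -1 for each sampled negative (the snippet above does not show how the label is built): `log_sigmoid(label * inner_product)` then gives log σ(u_i·u_j) for a positive pair and log σ(-u_i·u_j) for a negative pair, which are the two kinds of terms in the negative-sampling objective of the LINE paper. The code averages them over the batch instead of summing. A minimal plain-Python check of that identity, using made-up inner products and hypothetical helper names:

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def batch_loss(labels, inner_products):
    # Mirrors -reduce_mean(log_sigmoid(label * inner_product)).
    terms = [log_sigmoid(y * s) for y, s in zip(labels, inner_products)]
    return -sum(terms) / len(terms)

def negative_sampling_loss(pos_score, neg_scores):
    # log sigma(pos) + sum_k log sigma(-neg_k), negated and
    # averaged over the 1 + K terms.
    total = log_sigmoid(pos_score) + sum(log_sigmoid(-s) for s in neg_scores)
    return -total / (1 + len(neg_scores))

# Hypothetical inner products: one observed edge, then three negatives.
scores = [1.2, -0.4, 0.3, -2.0]
labels = [1, -1, -1, -1]  # assumed label convention

loss_a = batch_loss(labels, scores)
loss_b = negative_sampling_loss(scores[0], scores[1:])
print(abs(loss_a - loss_b) < 1e-12)
```

Under that label assumption the two losses agree up to floating-point rounding, so the code would differ from the negative-sampling form only by the constant factor introduced by averaging.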
When handling second-order similarity, why do we need to randomly switch the two nodes of an edge (i.e. the beginning and ending nodes)?
Line 41 in 4cdfa7a
In the original paper, this lets developers apply the model to a homogeneous, undirected graph ("an undirected edge can
be considered as two directed edges with opposite directions and equal weights").
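A minimal sketch of that idea, assuming an unweighted edge list and a hypothetical helper `sample_directed` (this is not the repository's code): drawing an undirected edge and flipping it with probability 0.5 yields every directed version of every edge about equally often, which is the same as sampling from the doubled directed edge set.

```python
import random
from collections import Counter

def sample_directed(edges, rng):
    # Draw one undirected edge, then flip its direction with probability 0.5.
    u, v = rng.choice(edges)
    if rng.random() > 0.5:
        u, v = v, u
    return u, v

# Toy undirected graph; node names are made up for illustration.
edges = [("a", "b"), ("b", "c")]
rng = random.Random(0)

counts = Counter(sample_directed(edges, rng) for _ in range(20000))

# The doubled edge set: each undirected edge as two directed edges.
directed = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
print(set(counts) == directed)
```

Without the switch, a node that only ever appears in the second position of the edge list would never be used as a source node, so its vertex embedding would get no updates from the second-order objective.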
tf.matmul(tf.one_hot(self.u_i, depth=args.num_of_nodes), self.embedding)
Would it be better to use tf.gather?
tf.gather(self.embedding, self.u_i)
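The two expressions select the same rows. A plain-Python sketch with tiny made-up values (the helpers `one_hot`, `matmul` and `gather` are simplified stand-ins written for this illustration, not the TensorFlow ops themselves):

```python
def one_hot(index, depth):
    # Row vector with a single 1.0 at position `index`.
    return [1.0 if k == index else 0.0 for k in range(depth)]

def matmul(a, b):
    # Naive dense matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gather(table, indices):
    # Row lookup by index, copying each selected row.
    return [list(table[i]) for i in indices]

embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 nodes, dimension 2
u_i = [2, 0, 2]                                   # batch of node indices

via_matmul = matmul([one_hot(i, len(embedding)) for i in u_i], embedding)
via_gather = gather(embedding, u_i)
print(via_matmul == via_gather)
```

The one-hot route builds a batch-by-num_of_nodes matrix and performs num_of_nodes multiply-adds per output value, while the lookup just copies the selected rows, so the lookup should be cheaper in both memory and time as the number of nodes grows.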
Do I need to construct the graph myself? Is the file in the data directory not usable? I'm a beginner and hope to get your reply. Thanks!
Hi! Sorry to bother you. I only want to keep the second-order proximity and drop the first-order proximity. How should I do it? Thanks!
According to the original paper, the second-order similarity should be the concatenation of the embeddings and context embeddings. Maybe you missed the concatenation operation.
Can you share the source dataset?
Hi, I have a problem: I want the embeddings to lie between 0 and 1, but even when I initialize them between 0 and 1 it doesn't work. What should I do?
Can we run the TensorFlow version on CPU?
I can run this program, but when I replace the data file it fails with KeyError: 'weight'. Why can't I replace the data file?
I used the default settings and your dataset and ran 20,000 batches. The loss doesn't seem to decrease much, only from 2.2 to 1.8. Is there something wrong with my experiment?
Hi there,
I can run the code smoothly, but it is extremely slow in my case (I have around 400,000 nodes and 300,000 edges). Is it possible to set up multiple CPU cores to speed up the process?
Cheers,
Weisi
Traceback (most recent call last):
  File "line.py", line 69, in <module>
    main()
  File "line.py", line 23, in main
    train(args)
  File "line.py", line 29, in train
    data_loader = DBLPDataLoader(graph_file=args.graph_file)
  File "/home/yt/yantao/研究生课程/降维-可视化/LINE_code/line-master/utils.py", line 8, in __init__
    self.num_of_nodes = self.g.number_of_nodes()
  File "/usr/local/lib/python3.6/site-packages/networkx/classes/graph.py", line 798, in number_of_nodes
    return len(self._node)
AttributeError: 'Graph' object has no attribute '_node'
Can you tell me how to solve this problem?
Hi,
I read your algorithm and found one part I cannot understand. In the function fetch_batch, you do negative sampling for the center node edge[0], but in line 54 you check whether there is an edge between the negative node and edge[1]. The line is
if not self.g.has_edge(self.node_index_reversed[edge[0]], self.node_index_reversed[negative_node])
If I understand negative sampling correctly, we need to check whether there is an edge between the center node edge[0] and the negative node. If there is no link between them, we can add the negative node as a negative sample for edge[0].
Am I right? Or did I miss something?
Looking forward to your reply.
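The check described in this question can be sketched as follows. It is an illustration under stated assumptions, not the repository's fetch_batch: the graph is a plain dict of neighbour sets, `sample_negatives` is a hypothetical helper, ineligible nodes are filtered out before the draw rather than rejected after it, and negatives are drawn uniformly rather than from the degree-based noise distribution used in the paper.

```python
import random

def sample_negatives(adj, source, k, rng):
    # Eligible negatives: every node that is neither the source itself
    # nor one of the source's neighbours.
    eligible = [n for n in sorted(adj) if n != source and n not in adj[source]]
    if not eligible:
        raise ValueError("source is connected to every other node")
    return [rng.choice(eligible) for _ in range(k)]

# Toy undirected graph as neighbour sets; node names are made up.
adj = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": set(),
}

negatives = sample_negatives(adj, "a", 5, random.Random(0))
print(negatives)
```

For source node "a" only "c" and "d" can be returned, since "b" is a neighbour and "a" is the source itself.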
How was the dataset that is saved as a pkl file processed?
Hello snowkylin,
Thank you for your code! My dataset consists of 10 sub-datasets, so I wrote a "for" loop that calls the LINE procedure once per sub-dataset. On the second sub-dataset it fails with an error saying that 'target_embedding' has already been defined. I added "auto reuse" to this variable, but then I get a second error: it expects 'target_embedding' to have shape (xxx, ppp) but finds shape (yyy, zzz), and the two shapes are not the same. I couldn't figure it out. Could you help me address it?
Thank you so much!
Bests,
Xiuling