
OpenNE (sub-project of OpenSKL)

OpenNE is a sub-project of OpenSKL, providing an Open-source Network Embedding toolkit for network representation learning (NRL), with TADW as a key feature to incorporate text attributes of nodes.

Overview

OpenNE provides a standard training and testing toolkit for network embedding. We unify the input and output interfaces of different NE models and provide scalable options for each model. Moreover, we implement typical NE models with TensorFlow, which enables these models to be trained on GPUs.

Models

Besides TADW, which learns network embeddings with text attributes, we also implement typical models including DeepWalk, LINE, node2vec, GraRep, GCN, HOPE, GF, SDNE and LE. If you want to learn more about network embedding, see our NRL paper list.

Evaluation

To validate the effectiveness of this toolkit, we employ the node classification task for evaluation.

Settings

We show the node classification results of various methods on different datasets. We set the representation dimension to 128 and kstep=4 in GraRep. Note that both GCN (a semi-supervised NE model) and TADW need additional text features as input. Thus, we evaluate these two models on Cora, in which each node has text information. We use 10% of the labeled data to train GCN.

Wiki (provided by the LBC project; the original link is no longer available): 2405 nodes, 17981 edges, 19 labels, directed:

  • data/wiki/Wiki_edgelist.txt
  • data/wiki/Wiki_category.txt

Cora: 2708 nodes, 5429 edges, 7 labels, directed:

  • data/cora/cora_edgelist.txt
  • data/cora/cora.features
  • data/cora/cora_labels.txt

Running environment:
BlogCatalog: CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz.
Wiki, Cora: CPU: Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz.

Results

We report Micro-F1 and Macro-F1 scores to quantify effectiveness, and running time to evaluate efficiency. Overall, OpenNE reproduces the results in the original papers. Our proposed TADW achieves better performance than DeepWalk with the help of text attributes.

Wiki:

Algorithm                    Time    Micro-F1  Macro-F1
DeepWalk                     52s     0.669     0.560
LINE 2nd                     70s     0.576     0.387
node2vec                     32s     0.651     0.541
GraRep                       19.6s   0.633     0.476
OpenNE(DeepWalk)             42s     0.658     0.570
OpenNE(LINE 2nd)             90s     0.661     0.521
OpenNE(Node2vec)             33s     0.655     0.538
OpenNE(GraRep)               23.7s   0.649     0.507
OpenNE(GraphFactorization)   12.5s   0.637     0.450
OpenNE(HOPE)                 3.2s    0.601     0.438
OpenNE(LaplacianEigenmaps)   4.9s    0.277     0.073
OpenNE(SDNE)                 39.6s   0.643     0.498

Cora:

Algorithm      Dropout  Weight_decay  Hidden  Dimension  Time   Accuracy
DeepWalk       -        -             -       160        33.5s  0.713
TADW           -        -             -       80*2       13.9s  0.780
GCN            0.5      5e-4          16      -          4.0s   0.790
OpenNE(TADW)   -        -             -       80*2       20.8s  0.791
OpenNE(GCN)    0.5      5e-4          16      -          5.5s   0.789
OpenNE(GCN)    0        5e-4          16      -          6.1s   0.779
OpenNE(GCN)    0.5      1e-4          16      -          5.4s   0.783
OpenNE(GCN)    0.5      5e-4          64      -          6.5s   0.779

Usage

Installation

  • Clone this repo.
  • Enter the directory where you cloned it, and run the following commands:
    pip install -r requirements.txt
    cd src
    python setup.py install

General Options

You can check out the other options available to use with OpenNE using:

python -m openne --help
  • --input, the input file of a network;
  • --graph-format, the format of the input graph, adjlist or edgelist;
  • --output, the output file of representations (GCN doesn't need it);
  • --representation-size, the number of latent dimensions to learn for each node; the default is 128;
  • --method, the NE model to learn, including deepwalk, line, node2vec, grarep, tadw, gcn, lap, gf, hope and sdne;
  • --directed, treat the graph as directed; this is an action;
  • --weighted, treat the graph as weighted; this is an action;
  • --label-file, the file of node labels; ignore this option if not testing;
  • --clf-ratio, the ratio of training data for node classification; the default is 0.5;
  • --epochs, the training epochs of LINE and GCN; the default is 5.

Example

To run "node2vec" on BlogCatalog network and evaluate the learned representations on multi-label node classification task, run the following command in the home directory of this project:

python -m openne --method node2vec --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt --q 0.25 --p 0.25

To run "gcn" on Cora network and evaluate the learned representations on multi-label node classification task, run the following command in the home directory of this project:

python -m openne --method gcn --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features  --epochs 200 --output vec_all.txt --clf-ratio 0.1

Specific Options

DeepWalk and node2vec:

  • --number-walks, the number of random walks to start at each node; the default is 10;
  • --walk-length, the length of random walk started at each node; the default is 80;
  • --workers, the number of parallel processes; the default is 8;
  • --window-size, the window size of skip-gram model; the default is 10;
  • --q, only for node2vec; the default is 1.0;
  • --p, only for node2vec; the default is 1.0;

LINE:

  • --negative-ratio, the default is 5;
  • --order, 1 for the 1st-order, 2 for the 2nd-order and 3 for 1st + 2nd; the default is 3;
  • --no-auto-save, do not save the best embeddings during training; this is an action. By default, when training LINE, we calculate F1 scores every epoch and save the embeddings whenever the current F1 is the best so far.
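
For instance, a hypothetical LINE run on the Wiki dataset that uses only 2nd-order proximity could combine these flags with the general options above (the paths are the Wiki files from the Evaluation section; the epoch count and output name are illustrative):

python -m openne --method line --order 2 --epochs 10 --input data/wiki/Wiki_edgelist.txt --graph-format edgelist --label-file data/wiki/Wiki_category.txt --output vec_line.txt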

GraRep:

  • --kstep, use the k-step transition probability matrix (make sure representation-size % kstep == 0).

TADW:

  • --lamb, lamb is a hyperparameter in TADW that controls the weight of regularization terms.

GCN:

  • --feature-file, the file of node features;
  • --epochs, the training epochs of GCN; the default is 5;
  • --dropout, dropout rate;
  • --weight-decay, weight for l2-loss of embedding matrix;
  • --hidden, number of units in the first hidden layer.

GraphFactorization:

  • --epochs, the training epochs of GraphFactorization; the default is 5;
  • --weight-decay, weight for l2-loss of embedding matrix;
  • --lr, learning rate; the default is 0.01.

SDNE:

  • --encoder-list, a list of neuron counts for each encoder layer; the last number is the dimension of the output node representations; the default is [1000, 128];
  • --alpha, a hyperparameter in SDNE that controls the first-order proximity loss; the default is 1e-6;
  • --beta, used to construct the matrix B; the default is 5;
  • --nu1, controls the l1-loss of the weights in the autoencoder; the default is 1e-5;
  • --nu2, controls the l2-loss of the weights in the autoencoder; the default is 1e-4;
  • --bs, the batch size; the default is 200;
  • --lr, the learning rate; the default is 0.01.

Input

The supported input format is an edgelist or an adjlist:

edgelist: node1 node2 <weight_float, optional>
adjlist: node n1 n2 n3 ... nk

The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags.

If the model needs additional features, the supported feature input format is as follows (each feature_i should be a float):

node feature_1 feature_2 ... feature_n
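
For illustration, here is a minimal parsing sketch for these two file formats (our own helper functions, not OpenNE's loader):

    import networkx as nx

    def read_edgelist(path, weighted=False, directed=False):
        # Each line: node1 node2 <weight_float, optional>
        g = nx.DiGraph() if directed else nx.Graph()
        with open(path) as f:
            for line in f:
                parts = line.split()
                if not parts:
                    continue
                weight = float(parts[2]) if weighted and len(parts) > 2 else 1.0
                g.add_edge(parts[0], parts[1], weight=weight)
        return g

    def read_features(path):
        # Each line: node feature_1 feature_2 ... feature_n
        features = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                if parts:
                    features[parts[0]] = [float(x) for x in parts[1:]]
        return features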

Output

The output file has n+1 lines for a graph with n nodes. The first line has the following format:

num_of_nodes dim_of_representation

The next n lines are as follows:

node_id dim1 dim2 ... dimd

where dim1, ..., dimd is the d-dimensional representation learned by OpenNE.
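
As a reference for downstream use, a small sketch (ours, not shipped with OpenNE) that loads a file in this output format into a dictionary of numpy vectors:

    import numpy as np

    def load_embeddings(path):
        # First line: num_of_nodes dim_of_representation
        with open(path) as f:
            num_nodes, dim = map(int, f.readline().split())
            vectors = {}
            for line in f:  # node_id dim1 dim2 ... dimd
                parts = line.split()
                vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
        assert len(vectors) == num_nodes
        assert all(v.size == dim for v in vectors.values())
        return vectors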

Testing

If you want to evaluate the learned node representations, you can input the node labels. The toolkit will use a portion (default: 50%) of the nodes to train a classifier and calculate F1 scores on the rest of the dataset.

The supported input label format is

node label1 label2 label3...
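
For reference, a rough sketch of this evaluation protocol, assuming a one-vs-rest logistic regression (the Classifier(LogisticRegression) mentioned in the issues below) and that embeddings and labels are already loaded as dictionaries; OpenNE's own classifier may differ in details:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    def evaluate(vectors, labels, clf_ratio=0.5, seed=0):
        # vectors: {node: embedding array}; labels: {node: [label, ...]}
        nodes = sorted(labels)
        X = np.stack([vectors[n] for n in nodes])
        Y = MultiLabelBinarizer().fit_transform([labels[n] for n in nodes])
        X_train, X_test, Y_train, Y_test = train_test_split(
            X, Y, train_size=clf_ratio, random_state=seed)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(X_train, Y_train)
        Y_pred = clf.predict(X_test)
        return {avg: f1_score(Y_test, Y_pred, average=avg)
                for avg in ("micro", "macro")}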

Embedding visualization

To show how to apply dimensionality reduction methods like t-SNE and PCA to embedding visualization, we choose the 20 newsgroups dataset. Using the text features, we build the news network with kneighbors_graph in scikit-learn. We uploaded the results of different methods in t-SNE-PCA.pptx, where node colors represent node labels. A simple script is shown as follows:

cd visualization_example
python 20newsgroup.py
tensorboard --logdir=log/

After starting TensorBoard, visit localhost:6006 to view the result.
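
For context, the graph construction step described above might look like the following sketch; the TF-IDF features and k=10 are illustrative assumptions, not the exact settings behind t-SNE-PCA.pptx:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import kneighbors_graph

    # Build text features for the 20 newsgroups posts.
    news = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
    X = TfidfVectorizer(max_features=5000).fit_transform(news.data)

    # Connect each post to its k nearest neighbors in feature space.
    adj = kneighbors_graph(X, n_neighbors=10, mode="connectivity")

    # Dump the sparse adjacency as an OpenNE-style edgelist.
    with open("news_edgelist.txt", "w") as f:
        for i, j in zip(*adj.nonzero()):
            f.write("%d %d\n" % (i, j))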

Citation

If you find OpenNE useful for your research, please consider citing the following papers:

@inproceedings{perozzi2014deepwalk,
  title     = {DeepWalk: Online learning of social representations},
  author    = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
  booktitle = {Proceedings of KDD},
  pages     = {701--710},
  year      = {2014}
}

@inproceedings{tang2015line,
  title     = {LINE: Large-scale information network embedding},
  author    = {Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu},
  booktitle = {Proceedings of WWW},
  pages     = {1067--1077},
  year      = {2015}
}

@inproceedings{grover2016node2vec,
  title     = {node2vec: Scalable feature learning for networks},
  author    = {Grover, Aditya and Leskovec, Jure},
  booktitle = {Proceedings of KDD},
  pages     = {855--864},
  year      = {2016}
}

@article{kipf2016semi,
  title   = {Semi-supervised classification with graph convolutional networks},
  author  = {Kipf, Thomas N and Welling, Max},
  journal = {arXiv preprint arXiv:1609.02907},
  year    = {2016}
}

@inproceedings{cao2015grarep,
  title     = {GraRep: Learning graph representations with global structural information},
  author    = {Cao, Shaosheng and Lu, Wei and Xu, Qiongkai},
  booktitle = {Proceedings of CIKM},
  pages     = {891--900},
  year      = {2015}
}

@inproceedings{yang2015network,
  title     = {Network representation learning with rich text information},
  author    = {Yang, Cheng and Liu, Zhiyuan and Zhao, Deli and Sun, Maosong and Chang, Edward},
  booktitle = {Proceedings of IJCAI},
  year      = {2015}
}

@article{tu2017network,
  title   = {Network representation learning: an overview},
  author  = {Tu, Cunchao and Yang, Cheng and Liu, Zhiyuan and Sun, Maosong},
  journal = {SCIENTIA SINICA Informationis},
  volume  = {47},
  number  = {8},
  pages   = {980--996},
  year    = {2017}
}

@inproceedings{ou2016asymmetric,
  title        = {Asymmetric transitivity preserving graph embedding},
  author       = {Ou, Mingdong and Cui, Peng and Pei, Jian and Zhang, Ziwei and Zhu, Wenwu},
  booktitle    = {Proceedings of the 22nd ACM SIGKDD},
  pages        = {1105--1114},
  year         = {2016},
  organization = {ACM}
}

@inproceedings{belkin2002laplacian,
  title     = {Laplacian eigenmaps and spectral techniques for embedding and clustering},
  author    = {Belkin, Mikhail and Niyogi, Partha},
  booktitle = {Advances in Neural Information Processing Systems},
  pages     = {585--591},
  year      = {2002}
}

@inproceedings{ahmed2013distributed,
  title        = {Distributed large-scale natural graph factorization},
  author       = {Ahmed, Amr and Shervashidze, Nino and Narayanamurthy, Shravan and Josifovski, Vanja and Smola, Alexander J},
  booktitle    = {Proceedings of the 22nd International Conference on World Wide Web},
  pages        = {37--48},
  year         = {2013},
  organization = {ACM}
}

@inproceedings{wang2016structural,
  title        = {Structural deep network embedding},
  author       = {Wang, Daixin and Cui, Peng and Zhu, Wenwu},
  booktitle    = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  pages        = {1225--1234},
  year         = {2016},
  organization = {ACM}
}

About OpenSKL

The OpenSKL project aims to harness the power of both structured knowledge and natural languages via representation learning. All sub-projects of OpenSKL, under the categories of Algorithm, Resource and Application, are as follows.

  • Algorithm:
    • OpenKE
      • An effective and efficient toolkit for representing structured knowledge in large-scale knowledge graphs as embeddings, with TransR and PTransE as key features to handle complex relations and relational paths.
      • This toolkit also includes three related repositories.
    • ERNIE
      • An effective and efficient toolkit for augmenting pre-trained language models with knowledge graph representations.
    • OpenNE
      • An effective and efficient toolkit for representing nodes in large-scale graphs as embeddings, with TADW as a key feature to incorporate text attributes of nodes.
    • OpenNRE
      • An effective and efficient toolkit for implementing neural networks for extracting structured knowledge from text, with ATT as a key feature to consider relation-associated text information.
      • This toolkit also includes two related repositories.
  • Resource:
    • The embeddings of large-scale knowledge graphs pre-trained by OpenKE, covering three typical large-scale knowledge graphs: Wikidata, Freebase, and XLORE. The embeddings are free to use under the MIT license; please click the following links to submit download requests.
    • OpenKE-Wikidata
      • Wikidata is a free and collaborative database, collecting structured data to provide support for Wikipedia. The original Wikidata contains 20,982,733 entities, 594 relations and 68,904,773 triplets. In particular, Wikidata-5M is the core subgraph of Wikidata, containing 5,040,986 high-frequency entities from Wikidata with their corresponding 927 relations and 24,267,796 triplets.
      • TransE version: Knowledge embeddings of Wikidata pre-trained by OpenKE.
      • TransR version of Wikidata-5M: Knowledge embeddings of Wikidata-5M pre-trained by OpenKE.
    • OpenKE-Freebase
      • Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources. Freebase contains 86,054,151 entities, 14,824 relations and 338,586,276 triplets.
      • TransE version: Knowledge embeddings of Freebase pre-trained by OpenKE.
    • OpenKE-XLORE
      • XLORE is one of the most popular Chinese knowledge graphs developed by THUKEG. XLORE contains 10,572,209 entities, 138,581 relations and 35,954,249 triplets.
      • TransE version: Knowledge embeddings of XLORE pre-trained by OpenKE.
  • Application:
    • Knowledge-Plugin
      • An effective and efficient toolkit of plug-and-play knowledge injection for pre-trained language models. Knowledge-Plugin is general for all kinds of knowledge graph embeddings mentioned above. In the toolkit, we plug the TransR version of Wikidata-5M into BERT as an example of applications. With the TransR embedding, we enhance the knowledge ability of BERT without fine-tuning the original model, e.g., up to 8% improvement on question answering.


Issues

requirements

I get Illegal instruction (core dumped) with tensorflow 1.10. I think you need to update the tensorflow version to 1.5 in requirements.txt.

Some questions about directed graph

adj[look_up[edge[0]]][look_up[edge[1]]] = 1.0
adj[look_up[edge[1]]][look_up[edge[0]]] = 1.0
The code is found in tadw and grarep.
Does this mean that these methods can only be used on undirected graphs?

gridsearch

It is currently not convenient to do grid search (as in sklearn) when a model has many parameters. Could you add support for five-fold grid search?

NaN issue running LINE

I have encountered NaN in the loss as well as in the learned embedding when running LINE. The same issue does not occur when running the code provided by the LINE paper's authors.

The command I used was:

python src/main.py --input graph_memes.txt --graph-format edgelist --representation-size 64 --directed --epochs 20 --workers 32 --method line --output memes_line.txt

The command line output was:

Reading...
Pre-procesing for non-uniform negative sampling!
Instructions for updating:
Use the retry module or similar alternatives.
Pre-procesing for non-uniform negative sampling!
epoch:0 sum of loss:9389.645792722702
epoch:0 sum of loss:4735.700150832534
epoch:1 sum of loss:9331.65645313263
epoch:1 sum of loss:4152.164891295135
epoch:2 sum of loss:9321.187238454819
epoch:2 sum of loss:3999.570925347507
epoch:3 sum of loss:9314.46668368578
epoch:3 sum of loss:3914.669999137521
epoch:4 sum of loss:9310.313271224499
epoch:4 sum of loss:3865.22195328027
epoch:5 sum of loss:9308.589637458324
epoch:5 sum of loss:3833.3185774832964
epoch:6 sum of loss:9306.87627118826
epoch:6 sum of loss:3810.1825504228473
epoch:7 sum of loss:9304.062089920044
epoch:7 sum of loss:3792.079162888229
epoch:8 sum of loss:9303.782062470913
epoch:8 sum of loss:3780.837639503181
epoch:9 sum of loss:9303.198790431023
epoch:9 sum of loss:3770.1571313217282
epoch:10 sum of loss:9302.51797068119
epoch:10 sum of loss:3763.195652872324
epoch:11 sum of loss:9301.631554305553
epoch:11 sum of loss:3758.490410581231
epoch:12 sum of loss:9300.383490085602
epoch:12 sum of loss:3752.485915541649
epoch:13 sum of loss:9300.618901848793
epoch:13 sum of loss:3749.2992250844836
epoch:14 sum of loss:9299.43757379055
epoch:14 sum of loss:3747.6589295864105
epoch:15 sum of loss:9299.78179126978
epoch:15 sum of loss:3743.776451356709
epoch:16 sum of loss:9300.20017015934
epoch:16 sum of loss:3739.654469586909
epoch:17 sum of loss:nan
epoch:17 sum of loss:3739.8969665542245
epoch:18 sum of loss:nan
epoch:18 sum of loss:3736.818851336837
epoch:19 sum of loss:nan
epoch:19 sum of loss:3736.8378988951445
3631.65814495
Saving embeddings...

The input file is attached.

graph_memes.txt.zip

A bug for LINE

I find that in __main__.py, users can pass the parameter 'negative-ratio' into openne; however, when __main__.py calls line.py, it does not pass this parameter along.
Looking forward to your reply and a fix.

Why not use the Classifier(LogisticRegression) to evaluate the performance of GCN?

It seems that the accuracy of GCN is obtained directly from the GCN net, while the accuracy of other methods is obtained from the Classifier(LogisticRegression).
Is it fair to compare the performance of GCN and the others using different classifiers (i.e., the GCN net and LogisticRegression)?

questions about multiprocess in walker.py

In your walker.py, I find that you commented out the multiprocessing part. I'd like to run the random walk process many times, so I am wondering whether it is OK for me to just uncomment it and run the code on a server.
Looking forward to your reply~

How to run the code on GPU?

I simply use CUDA_VISIBLE_DEVICES to restrict it to a GPU, but it does not work.
I'm new to GPUs, could you please tell me how to run the code on a GPU?
Thanks!

TypeError: 'instancemethod' object has no attribute '__getitem__'

mldl@mldlUB1604:~/ub16_prj/OpenNE$ python src/main.py --method node2vec --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt --q 0.25 --p 0.25
Reading...
Traceback (most recent call last):
  File "src/main.py", line 124, in <module>
    main(parse_args())
  File "src/main.py", line 74, in main
    g.read_adjlist(filename=args.input)
  File "/home/mldl/ub16_prj/OpenNE/src/libnrl/graph.py", line 37, in read_adjlist
    self.encode_node()
  File "/home/mldl/ub16_prj/OpenNE/src/libnrl/graph.py", line 27, in encode_node
    self.G.nodes[node]['status'] = ''
TypeError: 'instancemethod' object has no attribute '__getitem__'

some problem about installing

I cloned it to my server (Ubuntu 16.04.4 LTS). Here is the path: “/home/ruancy/wrj/OpenNE-master/”.
I ran “python setup.py install” to install it. Then I ran "python -m openne --help"
and got this error:
Traceback (most recent call last):
  File "/home/ruancy/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ruancy/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ruancy/anaconda3/lib/python3.6/site-packages/openne-0.0.0-py3.6.egg/openne/__main__.py", line 7, in <module>
    import node2vec
ModuleNotFoundError: No module named 'node2vec'

A TypeError When I run example

I met the error below when I ran the example:
"""
File "src\libnrl\graph.py", line 27, in encode_node
  self.G.nodes[node]['status'] = ''
TypeError: 'method' object is not subscriptable
"""
I use python3.

AttributeError: module 'walker' has no attribute 'Walker'

Traceback (most recent call last):
  File "src/main.py", line 128, in <module>
    main(parse_args())
  File "src/main.py", line 84, in main
    workers=args.workers, p=args.p, q=args.q, window=args.window_size)
  File "D:\OpenNE-master\src\libnrl\node2vec.py", line 20, in __init__
    self.walker = walker.Walker(graph, p=p, q=q, workers=kwargs["workers"])
AttributeError: module 'walker' has no attribute 'Walker'

I use py3.

Error running LINE 'AttributeError: 'LINE' object has no attribute 'best_vector''

I got an error when running LINE with the following configuration:
python src/main.py --input graph.txt --graph-format edgelist --representation-size 64 --directed --epochs 20 --method line --no-auto-save --output embed_line_d_64.txt

Traceback (most recent call last):
  File "src/main.py", line 124, in <module>
    main(parse_args())
  File "src/main.py", line 86, in main
    model = line.LINE(g, epoch = args.epochs, rep_size=args.representation_size, order=args.order)
  File "/home/xxx/OpenNE/src/libnrl/line.py", line 233, in __init__
    self.vectors = self.best_vector
AttributeError: 'LINE' object has no attribute 'best_vector'

TADW meeting MemoryError

Hi~!
There is 32 GB of memory on my Ubuntu machine, and I have a graph with about 20k vertices and 40k edges. It came up with a MemoryError as follows:
Traceback (most recent call last):
  File "src/main.py", line 126, in <module>
    main(parse_args())
  File "src/main.py", line 98, in main
    model = tadw.TADW(graph=g, dim=args.representation_size, lamb=args.lamb)
  File "/code/OpenNE/src/libnrl/tadw.py", line 14, in __init__
    self.train()
  File "/code/OpenNE/src/libnrl/tadw.py", line 56, in train
    self.T = self.getT()
  File "/code/OpenNE/src/libnrl/tadw.py", line 39, in getT
    for i in range(g.number_of_nodes())]) 
  File "/home/ring/.local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError

May I have some suggestions for handling this problem?

Error when running example of LINE

When running the following cmd,
python src/main.py --method line --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt
I get this error:

File "/home/***/runspace/OpenNE/src/libnrl/graph.py", line 27, in encode_node
  self.G.nodes[node]['status'] = ''
TypeError: 'instancemethod' object has no attribute '__getitem__'

Confused by the loss in line.py

Hi @zzy14, thanks for your excellent code. I am confused by the loss function in line.py. Specifically:
1. The commented code in line.py (lines 43-48) is the original form from the paper, which is easy to understand. I want to know whether some minus signs were missed here (the ones shown below):
# self.sample_sum2 = -tf.reduce_sum(tf.log(tf.nn.sigmoid(-tf.reduce_sum(tf.multiply(self.pos_h_v_e, self.neg_t_e_context), axis=2))), axis=1)
# self.second_loss = tf.reduce_mean(-tf.log(tf.nn.sigmoid(tf.reduce_sum(tf.multiply(self.pos_h_e, self.pos_t_e_context), axis=1))) + self.sample_sum2)
# self.sample_sum1 = -tf.reduce_sum(tf.log(tf.nn.sigmoid(-tf.reduce_sum(tf.multiply(self.pos_h_v_e, self.neg_t_e), axis=2))), axis=1)
# self.first_loss = tf.reduce_mean(-tf.log(tf.nn.sigmoid(tf.reduce_sum(tf.multiply(self.pos_h_e, self.pos_t_e), axis=1))) + self.sample_sum1)
2. Is the loss function you actually use in the code (lines 49-54) a reduced form of the one in the original paper?

Example problem

My install is OK, but when I run the example, it stays in this status:
(openNE_env) ruancy@dell-SYS-7048GR-TR:~/OpenNE-master$ python src/main.py --method node2vec --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt --q 0.25 --p 0.25
Reading...
Preprocess transition probs...

About wiki dataset

Hi, I wonder about the original source of the Wiki dataset, or was the dataset generated by yourselves?

Baseline reproduction scripts

Can you provide scripts to reproduce the baselines shown in the README.md? I am trying to reproduce them, and some of the code does not even run properly.

why TADW needs label_file?

Hi~
While reviewing TADW's code, I'm puzzled about why TADW needs a label_file.
The following code is in main.py, around line 98.

elif args.method == 'tadw':
     assert args.label_file != ''
     assert args.feature_file != ''
     g.read_node_label(args.label_file)
     g.read_node_features(args.feature_file)
     model = tadw.TADW(graph=g, dim=args.representation_size, lamb=args.lamb)

Error when running node2vec

src/graph.py line 27
self.G.nodes[node]['status'] = ''

Should it be changed to
self.G.node[node]['status'] = ''

Problems with respect to different ratio of training images in GCN.

Hello there, first, thanks for making the code available online, but I have a few questions about testing the GCN model on the Cora dataset. If I change --clf-ratio to 0.01, the training and evaluation loss is NaN. How can that happen? Hopefully I can get your attention. Thanks!

GCN output problem

I can't find the output vec_all.txt when running GCN:
python src/main.py --method gcn --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features --epochs 200 --output vec_all.txt --clf-ratio 0.1

The procedure does not save the model or the embeddings, nor does it show how to predict on new data.

Dead cycle

Hi, friends. I have a problem:
1 4
1 5
1 6
1 7
1 9
1 10
1 11
1 12
2 4
2 5
2 6
2 7
2 8
2 9
2 10
2 11
3 7
3 8
3 9
3 10
3 11
3 12
4 7
5 7
6 8
6 11
9 6
7 12
8 12
11 5
10 8

I ran these data with the method “line”, and the program falls into an infinite loop. How can I solve this problem?

TypeError: 'method' object is not subscriptable

Hello, I ran an example with the following command, but I got an error.
My python version is Python 3.6.0 :: Anaconda custom (64-bit)

Here is the detail.

command:

python src/main.py --method gcn --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features --epochs 200 --output vec_all.txt --clf-ratio 0.1

error:

Using TensorFlow backend.
Reading...
Traceback (most recent call last):
  File "src/main.py", line 125, in <module>
    main(parse_args())
  File "src/main.py", line 77, in main
    g.read_edgelist(filename=args.input, weighted=args.weighted, directed=args.directed)
  File "/home/lzc/p/OpenNE/src/libnrl/graph.py", line 76, in read_edgelist
    self.encode_node()
  File "/home/lzc/p/OpenNE/src/libnrl/graph.py", line 27, in encode_node
    self.G.nodes[node]['status'] = ''
TypeError: 'method' object is not subscriptable

Some issue in LINE

    # self.h_e = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.embeddings, self.h), 1)
    # self.t_e = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.embeddings, self.t), 1)
    # self.t_e_context = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.context_embeddings, self.t), 1)
    self.h_e = tf.nn.embedding_lookup(self.embeddings, self.h)
    self.t_e = tf.nn.embedding_lookup(self.embeddings, self.t)
    self.t_e_context = tf.nn.embedding_lookup(self.context_embeddings, self.t)
    self.second_loss = -tf.reduce_mean(tf.log_sigmoid(self.sign*tf.reduce_sum(tf.multiply(self.h_e, self.t_e_context), axis=1)))

I see this code in the LINE method, and I am wondering about the same question: why can't we use l2_normalize in this step? Could you please give me a brief explanation of this point?

By the way, this is a really nice repository. Thank you for sharing your work.

minor suggestion on your implementation of line.py

When we apply both 1st- and 2nd-order LINE, the sampling table of nodes and the alias table of edges are constructed twice, since you put the construction step in the class _LINE and this class is instantiated twice. I think moving it to the class LINE would solve this, and it would not be difficult, so why not?

why do LINE's losses have different forms?

While reading/learning the implementation of LINE, I found the first-order loss is different from the LINE paper.

It involves negative samples, which are only used for second-order proximity, and the sigmoid function is not applied to the model vertices u_i and u_j (the 4 lines with the sigmoid function are commented out).
In your code, the loss looks like O_1 = -u_i*u_j + log(sum(exp(u_i*u_neg)))

In the LINE paper, except for the w_ij omitted via edge sampling, the first-order proximity objective is:
[image: first-order proximity objective from the LINE paper]

Is there any theoretical or practical explanation to do so?

TADW - dimension goof-up

I tried to generate an embedding of size 128. I set the dimension parameter accordingly, but the embeddings generated were of size 256. Then I tried keeping the dimension parameter at 64; surprisingly, the output generated had dimension 128.

It was working fine for dimension 32.

Problem with LINE

I am using LINE for link prediction on my graph. When I run LINE on the entire graph, I get no error. However, when I run it on the training graph for link prediction, I get the following error:

"ValueError: Variable model/embeddings1 already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?"

What I realized is that LINE keeps training multiple times, as if it never reaches a stopping loss. I guess that's where it needs to reuse the variable. I don't understand why this happens.

I set "reuse=tf.AUTO_REUSE" in tf.variable_scope, to see if it stops after multiple training sessions, but after 2-3 training times I get the following error:
"Trying to share variable model/embeddings1, but specified shape (518, 1) and found shape (517, 1)."

Do you have any idea what's happening here?

'DiGraph' object has no attribute 'edges_iter'

Traceback (most recent call last):
  File "src/main.py", line 124, in <module>
    main(parse_args())
  File "src/main.py", line 74, in main
    g.read_adjlist(filename=args.input)
  File "/home/zhu/Documents/git_repo/OpenNE/src/libnrl/graph.py", line 35, in read_adjlist                                                    
    for i, j in self.G.edges_iter():
AttributeError: 'DiGraph' object has no attribute 'edges_iter'
