

License: MIT License



Beyond GNNs with LRNNs

This repository contains materials to reproduce the results from Beyond Graph Neural Networks with Lifted Relational Neural Networks.

Prerequisites

  1. For the GNN frameworks, please follow the installation instructions at their respective pages, i.e. PyG and DGL.

    • Both are well-documented, user-friendly Python frameworks, so installation should go smoothly.

    • For reference, we used PyG 1.4.3 and DGL 0.4.3 (the current versions as of March 2020).

    • You will also need some basic Python libraries such as Pandas and Matplotlib to analyse the results, but you most likely already have those.

  2. For the LRNN framework, all you need is Java ≥ 1.8.

    • The whole learning engine is the small NeuraLogic.jar included directly in this repo. For its source, see the separate NeuraLogic repository.
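Before moving on, it may help to verify the prerequisites are in place. The following is a small hedged sketch (the module names `torch_geometric` and `dgl` are the usual import names for PyG and DGL; adjust if your installation differs):

```python
import subprocess


def report_versions():
    """Report installed versions of the Python prerequisites and Java.

    Missing packages are reported rather than raising, so this is safe to run
    on a partially set-up machine.
    """
    lines = []
    for name in ("torch_geometric", "dgl", "pandas", "matplotlib"):
        try:
            mod = __import__(name)
            lines.append(f"{name} {getattr(mod, '__version__', '?')}")
        except ImportError:
            lines.append(f"{name} MISSING")
    try:
        # `java -version` prints to stderr by convention.
        out = subprocess.run(["java", "-version"],
                             capture_output=True, text=True)
        lines.append(out.stderr.splitlines()[0])
    except (OSError, IndexError):
        lines.append("java MISSING")
    return lines


if __name__ == "__main__":
    print("\n".join(report_versions()))
```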

Download & Create the Datasets

Run a single script to download and process all 73 NCI datasets:

bash create_datasets.sh

  • Note that this takes considerable disk space (a few GB), so you might want to limit yourself, e.g. to the first 10 datasets as in the paper. For that, follow the individual steps below and remove the unwanted datasets (e.g. after download).

Alternative: individual steps, choosing only the datasets you want:
  1. go to ftp://ftp.ics.uci.edu/pub/baldig/learning/nci/gi50/, then download and unzip all the files into some folder DIR
  2. run java -jar mol2csv.jar DIR
    • this will convert all the molecular datasets in DIR into a simple csv representation, creating one folder per dataset
    • the source code of this simple utility can be found in the Molecule2csv repository
  3. run python csvs2graphs.py DIR OUTDIR
    • this will transform the csv representations into the respective graph objects (for PyG and DGL) and a textual (datalog) representation (for LRNN), and also split each dataset into 10 train-val-test folds
      • Note that since we use only the mol2 atom types here, and not the full feature vectors, this creates unnecessarily large files/graphs with many zero values, as we do not treat this as a special case (with sparse embedding indices instead of dense vectors). This is terribly space-inefficient, but has no influence on the main point of the experiments, since all the frameworks use the same representation. It does, however, cause LRNN to consume much more memory than it should, due to parsing of the bloated text files, so we might fix this in the near future.
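The 10-fold split mentioned above can be sketched as follows. This is only an illustration of the idea, not the actual logic in csvs2graphs.py; the rotation scheme (fold i as test, fold i+1 as validation, the rest as train) is an assumption for the sketch:

```python
import random


def split_folds(examples, n_folds=10, seed=0):
    """Shuffle examples and produce n_folds (train, val, test) splits.

    Each split uses one fold as the test set, the next fold (cyclically)
    as the validation set, and the remaining folds as the training set.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    # Deal examples round-robin into n_folds roughly equal folds.
    folds = [examples[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        val_idx = (i + 1) % n_folds
        test = folds[i]
        val = folds[val_idx]
        train = [x for j, fold in enumerate(folds)
                 if j not in (i, val_idx) for x in fold]
        splits.append((train, val, test))
    return splits
```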

Run the Experiments

  1. run the PyG script with some model (gcn/gsage/gin) on some of the processed datasets, e.g. for the first dataset (NCI 786_0):

    python run_script_pyg.py -sd DIR/786_0 -model gcn -lr 1.5e-05 -ts 2000 -out OUTDIR/pyg/786_0

  2. very similarly, run the DGL version by:

    python run_script_dgl.py -sd DIR/786_0 -model gcn -lr 1.5e-05 -ts 2000 -out OUTDIR/dgl/786_0

  3. run the LRNN framework on the same datasets and models (templates) by calling:

    java -Xmx5g -jar NeuraLogic.jar -sd DIR/786_0 -t ./templates/gcn.txt -ts 2000 -fp fold -init glorot -lr 1.5e-05 -ef XEnt -out OUTDIR/lrnn/786_0

  4. Change the parameters of the scripts as you like (models, datasets, batch sizes, training steps, learning rates, ...) to further compare the behavior and runtimes of the frameworks, as done in the additional experiments in the paper.

  • We do not recommend running the full batch of experiments on your local computer - it would take a very long time. Rather, run them on a cluster to parallelize over the individual instances. For convenience/reference, we include our batch job scripts for the Slurm cluster manager in the directory ./grid
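For sweeping over models and frameworks, the three command templates above can be generated programmatically. The following is a hypothetical driver sketch (DIR and OUTDIR are the same placeholders as above; submit or run the returned commands however your cluster expects):

```python
from itertools import product


def build_commands(dataset="786_0", lr="1.5e-05", ts="2000"):
    """Build the command lines for one dataset across all frameworks/models."""
    cmds = []
    # PyG and DGL share the same script interface.
    for framework, model in product(("pyg", "dgl"), ("gcn", "gsage", "gin")):
        cmds.append(
            f"python run_script_{framework}.py -sd DIR/{dataset} "
            f"-model {model} -lr {lr} -ts {ts} "
            f"-out OUTDIR/{framework}/{dataset}"
        )
    # LRNN selects the model via a template file instead of a -model flag.
    for model in ("gcn", "gsage", "gin"):
        cmds.append(
            f"java -Xmx5g -jar NeuraLogic.jar -sd DIR/{dataset} "
            f"-t ./templates/{model}.txt -ts {ts} -fp fold -init glorot "
            f"-lr {lr} -ef XEnt -out OUTDIR/lrnn/{dataset}"
        )
    return cmds
```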

Analyze the Results

All the relevant information from the experiments gets stored as JSON files in the respective OUTDIR directories. For reference, we include our original results from experiments run with the included batch job scripts.

You can analyze our results and your new ones from the JSON files by your own means, but we also include a convenience script that loads the results into DataFrames and produces the plots used in the paper, so you might want to bootstrap from there:

./analyse_results.py
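If you prefer to roll your own analysis, collecting the result JSON files is straightforward. A minimal sketch (the record fields depend on the actual experiment output; inspect your JSON files, or analyse_results.py, for the real keys):

```python
import json
from pathlib import Path


def load_results(outdir):
    """Recursively load every result JSON file under outdir.

    Each record is annotated with its source path so results from different
    frameworks/datasets stay distinguishable after merging.
    """
    results = []
    for path in sorted(Path(outdir).rglob("*.json")):
        with open(path) as f:
            record = json.load(f)
        record["_source"] = str(path)
        results.append(record)
    return results
```

The list of dicts can then be handed directly to pandas.DataFrame for plotting.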


You can find the Lifted Relational Neural Networks framework itself under active development at the NeuraLogic repository. Please let me know ([email protected]) if you find any bugs or anything interesting!

gnnwlrnns's Issues

The time measuring module in run_script_dgl.py is not accurate

Dear authors,

Recently I went through your paper Lossless Compression of Structured Convolutional Models via Lifting and found it very interesting.

However, I noticed that you did not correctly measure the DGL running time on GPU: most of the DGL time you measured comes from the graph conversion overhead (the DGLGraph is converted from the PyG format).

The following code section

GNNwLRNNs/run_script_dgl.py

Lines 305 to 319 in 0a62877

start = time.time()
train_results = train(model, train_loader, optimizer, epoch)
# train_loss2, train_acc = test(model, train_loader)
val_results = test(model, val_loader)
test_results = test(model, test_loader)
if val_results.loss < best_val_results.loss:
    print(f'improving validation loss to {val_results.loss} at epoch {epoch}')
    best_val_results = val_results
    best_test_results = test_results
    best_train_results = train_results
    print(f'storing respective test results with accuracy {best_test_results.accuracy}')
end = time.time()

will not correctly record the real GPU execution time, because in PyTorch the GPU operators are asynchronous. To get accurate GPU timings, you can either put torch.cuda.synchronize() before and after your time.time() statements, or use torch.cuda.Event for more accurate profiling (see https://pytorch.org/docs/stable/notes/cuda.html).
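The synchronize-based variant of the fix can be sketched as a small timing helper. This is a generic illustration, not the repo's code; the synchronization calls are no-ops when torch or CUDA is unavailable:

```python
import time

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # allow running the sketch without PyTorch installed
    torch = None
    _HAS_CUDA = False


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds).

    torch.cuda.synchronize() is called before starting and before stopping
    the clock, so queued asynchronous GPU kernels are counted in the
    measurement instead of silently running past the end timestamp.
    """
    if _HAS_CUDA:
        torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if _HAS_CUDA:
        torch.cuda.synchronize()
    return result, time.perf_counter() - start
```

In the loop above, one would wrap the train and test calls, e.g. `train_results, train_time = timed(train, model, train_loader, optimizer, epoch)`.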

The graph conversion overhead could be hidden by enabling a multi-processing data loader; I could help implement this if needed.

I know the DGL implementation only acts as a baseline in your paper, but I think a fair comparison looks better :)
By the way, in the upcoming DGL 0.5 release we have greatly improved the speed, so please stay tuned if you're interested.

Best,
Zihao
