This project forked from graph-learning-benchmarks/gli

🗂 Graph Learning Indexer: a contributor-friendly and metadata-rich platform for graph learning benchmarks. Dataloading, Benchmarking, Tagging, and more!

Home Page: https://graph-learning-benchmarks.github.io/gli/

License: MIT License

Graph Learning Indexer (GLI)

Graph Learning Indexer (GLI) is a benchmark curation platform for graph learning.

Design Objectives

In comparison to previous graph learning libraries, GLI highlights two design objectives.

  • GLI is designed to better serve dataset contributors by minimizing the effort of contributing and maintaining a dataset.
  • GLI is designed to create a knowledge base (as opposed to a simple collection) of benchmarks with rich meta information about the datasets.

Highlighted Features

File-Based Data API

GLI defines a file-based standard dataset API that is both efficient in storage and flexible enough for various graph structures. Compared with the common code-based dataset API, the file-based design can significantly reduce the maintenance effort required of dataset contributors.
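
To make the idea of a file-based dataset description concrete, here is a minimal sketch in which a graph is described entirely by a JSON file and read by one generic loader, with no dataset-specific code. The file name, the keys, and the helpers `load_graph_file` and `demo` are hypothetical and only illustrate the principle; they are not GLI's actual file schema or API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical description of a toy graph; these keys are illustrative only
# and are NOT GLI's real file schema.
TOY_DESCRIPTION = {
    "num_nodes": 4,
    "edges": [[0, 1], [1, 2], [2, 3]],
    "node_labels": [0, 0, 1, 1],
}


def load_graph_file(path):
    """Generic loader: read the JSON file and build an adjacency list."""
    with open(path, "r", encoding="utf-8") as f:
        desc = json.load(f)
    adjacency = {node: [] for node in range(desc["num_nodes"])}
    for src, dst in desc["edges"]:
        adjacency[src].append(dst)
        adjacency[dst].append(src)  # treat the toy graph as undirected
    return desc, adjacency


def demo():
    """Write the toy description to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "toy_graph.json"
        path.write_text(json.dumps(TOY_DESCRIPTION), encoding="utf-8")
        return load_graph_file(path)
```

In this sketch, contributing another dataset would mean adding another file rather than another Python class, which is the kind of maintenance saving the file-based design aims for.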

Explicit Separation of Data and Task

GLI makes an explicit separation between the data storage and the task configuration. For graph learning, there could often be multiple tasks (e.g., node classification and link prediction) defined on the same dataset, or there could be multiple settings for the same task (e.g., random split or fixed split).

The explicit separation of data and task provides a number of benefits:

  • The API becomes more extensible to new tasks.
  • The automated tests can be separated by task and become more modular.
  • General data loading schemes can be implemented for each task.
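
The separation can be pictured with a small sketch in which the graph data is stored once and several task settings are defined on top of it. The classes `GraphData` and `NodeClassificationConfig` and the function `make_split` are hypothetical names chosen for illustration; they are not part of GLI.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphData:
    """Data storage only: no notion of task or split (hypothetical class)."""
    num_nodes: int
    edges: tuple
    node_labels: tuple


@dataclass(frozen=True)
class NodeClassificationConfig:
    """Task configuration only: how to split the nodes (hypothetical class)."""
    train_ratio: float
    seed: int


def make_split(data, config):
    """Return (train_ids, test_ids) from a seeded random split of the nodes."""
    ids = list(range(data.num_nodes))
    random.Random(config.seed).shuffle(ids)
    cut = int(config.train_ratio * data.num_nodes)
    return sorted(ids[:cut]), sorted(ids[cut:])


# One copy of the data, two different task settings defined on top of it.
data = GraphData(num_nodes=10, edges=((0, 1), (1, 2)), node_labels=(0, 1) * 5)
split_a = make_split(data, NodeClassificationConfig(train_ratio=0.5, seed=0))
split_b = make_split(data, NodeClassificationConfig(train_ratio=0.8, seed=0))
```

Because the split lives in the task configuration, adding a new setting does not require touching or duplicating the stored graph.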

Automated Tests

GLI implements a wide range of automated tests for new dataset submissions. These tests give dataset contributors prompt and detailed feedback and make the contribution process smoother.
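
As an illustration of the kind of check such a test might perform, the sketch below validates a toy submission and reports every problem it finds. The function `validate_submission` and its checks are assumptions made for this example; GLI's actual test suite is not shown here and may look quite different.

```python
def validate_submission(num_nodes, edges, node_labels):
    """Return a list of human-readable problems; an empty list means success.

    Hypothetical checks for illustration only, not GLI's real tests.
    """
    problems = []
    if len(node_labels) != num_nodes:
        problems.append(
            f"expected {num_nodes} node labels, found {len(node_labels)}"
        )
    for i, (src, dst) in enumerate(edges):
        if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
            problems.append(
                f"edge {i} = ({src}, {dst}) references a missing node"
            )
    return problems


good = validate_submission(3, [(0, 1), (1, 2)], [0, 1, 0])
bad = validate_submission(3, [(0, 3)], [0, 1])
```

Collecting all problems instead of stopping at the first one is a deliberate choice in this sketch: it lets a contributor fix everything in a single round.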

Rich Meta Information

GLI also provides tools to calculate graph properties (such as clustering coefficients or homophily ratio) and benchmark popular models for newly contributed datasets, which can augment new datasets with rich meta information.
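
For intuition about one of the properties mentioned above, the following sketch computes an edge homophily ratio, taken here to be the fraction of edges whose endpoints share a label. This is one common definition and is assumed for illustration; the function `edge_homophily_ratio` is hypothetical, and GLI's own tools may use a different definition or implementation.

```python
def edge_homophily_ratio(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("the graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle: edges (0, 1) and (2, 3) join same-label nodes, the others do not.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
ratio = edge_homophily_ratio(edges, labels)  # 2 of 4 edges match -> 0.5
```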

Get Started

This is a quickstart for users who want to use the existing datasets hosted in GLI. For users who want to contribute a new dataset, please refer to our Contribution Guide.

Installation

Currently, we support installation from the source.

git clone https://github.com/Graph-Learning-Benchmarks/gli.git
cd gli
pip install -e .

To test the installation, run the following command:

python example.py --graph cora --task NodeClassification

The output should be something like the following:

> Graph(s) loading takes 0.0196 seconds and uses 0.9788 MB.
> Task loading takes 0.0016 seconds and uses 0.1218 MB.
> Combining(s) graph and task takes 0.0037 seconds and uses 0.0116 MB.
Dataset("CORA dataset. NodeClassification", num_graphs=1, save_path=~/.dgl/CORA dataset. NodeClassification)

Data Loading API

To load a dataset from the remote data repository, simply use the get_gli_dataset() function:

>>> import gli
>>> dataset = gli.get_gli_dataset(dataset="cora", task="NodeClassification", device="cpu")
>>> dataset
Dataset("CORA dataset. NodeClassification", num_graphs=1, save_path=/Users/jimmy/.dgl/CORA dataset. NodeClassification)

Alternatively, get_gli_graph() returns a single graph or a list of graphs rather than a wrapped dataset. GLI also provides abstractions for various tasks (GLITask), and get_gli_task() returns a task instance. Combining a graph with a task via combine_graph_and_task() yields a wrapped dataset identical to the one in the previous example.

>>> import gli
>>> g = gli.get_gli_graph(dataset="cora", device="cpu", verbose=False)
>>> g
Graph(num_nodes=2708, num_edges=10556,
      ndata_schemes={'NodeFeature': Scheme(shape=(1433,), dtype=torch.float32), 'NodeLabel': Scheme(shape=(), dtype=torch.int64)}
      edata_schemes={})
>>> task = gli.get_gli_task(dataset="cora", task="NodeClassification", verbose=False)
>>> task
<gli.task.NodeClassificationTask object at 0x100eff640>
>>> dataset = gli.combine_graph_and_task(g, task)
>>> dataset
Dataset("CORA dataset. NodeClassification", num_graphs=1, save_path=/Users/jimmy/.dgl/CORA dataset. NodeClassification)

The returned dataset inherits from DGLDataset, so it can be used with DGL's infrastructure directly:

>>> import dgl
>>> type(dataset)
<class 'gli.dataset.NodeClassificationDataset'>
>>> isinstance(dataset, dgl.data.DGLDataset)
True

Contributing

New Dataset, Feature Request, Bug Fix, or Better Documentation.

All kinds of improvements are welcome! Please refer to our Contribution Guide for details.

Citation

If you find GLI helpful for your research, please consider citing our paper below.

Graph Learning Indexer: A Contributor-Friendly and Metadata-Rich Platform for Graph Learning Benchmarks.

Jiaqi Ma*, Xingjian Zhang*, Hezheng Fan, Jin Huang, Tianyue Li, Ting Wei Li, Yiwen Tu, Chenshu Zhu, and Qiaozhu Mei. LOG 2022. (*Equal Contributions.)

BibTeX:

@inproceedings{ma2022graph,
      title={Graph Learning Indexer: A Contributor-Friendly and Metadata-Rich Platform for Graph Learning Benchmarks},
      author={Jiaqi Ma and Xingjian Zhang and Hezheng Fan and Jin Huang and Tianyue Li and Ting Wei Li and Yiwen Tu and Chenshu Zhu and Qiaozhu Mei},
      booktitle={The First Learning on Graphs Conference},
      year={2022},
      url={https://openreview.net/forum?id=ZBsxA6_gp3}
}

Please note that you should cite the corresponding data source if you are using any datasets hosted here.
