featgraph's Introduction

FeatGraph: Sparse kernels for GNNs based on TVM

Graph neural networks (GNNs) are gaining popularity in recent years as a promising approach to machine learning on graphs. Unlike traditional graph workloads where each vertex/edge is associated with a scalar, GNNs attach a feature tensor to each vertex/edge. This additional feature dimension, along with consequently more complex vertex- and edge-wise computations, has enormous implications for locality and parallelism, which existing graph processing systems fail to exploit.

To tackle the challenge, FeatGraph maps the building blocks of GNNs to generalized SpMM (sparse-dense matrix multiplication) and SDDMM (sampled dense-dense matrix multiplication) kernels, and provides high-performance implementations of these sparse kernels based on TVM.
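
As a point of reference (not FeatGraph's actual API; the function names, shapes, and the sum/dot operators below are illustrative assumptions), the two kernels can be sketched with NumPy and SciPy as follows:

import numpy as np
import scipy.sparse as sp

def spmm_reference(adj_csr, feat):
    # Generalized SpMM: aggregate the features of each vertex's neighbors,
    # here with a sum reduction. adj_csr is an N x N sparse adjacency matrix
    # in CSR format; feat is a dense N x D feature matrix.
    return adj_csr.dot(feat)

def sddmm_reference(adj_coo, src_feat, dst_feat):
    # Generalized SDDMM: compute one value per edge, here the dot product of
    # the source and destination vertex features, sampled by the adjacency.
    vals = np.einsum("ed,ed->e", src_feat[adj_coo.row], dst_feat[adj_coo.col])
    return sp.coo_matrix((vals, (adj_coo.row, adj_coo.col)), shape=adj_coo.shape)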

For more information, refer to our SC'20 paper.

@article{hu2020featgraph,
  title={FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems},
  author={Hu, Yuwei and Ye, Zihao and Wang, Minjie and Yu, Jiali and Zheng, Da and Li, Mu and Zhang, Zheng and Zhang, Zhiru and Wang, Yida},
  journal={International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2020}
}

Run the code

  1. Install TVM (instructions) and DGL (instructions).

TVM v0.7 is required. When you clone TVM:

git clone -b v0.7 --recursive https://github.com/apache/incubator-tvm tvm
  2. Install FeatGraph.
git clone git@github.com:amazon-research/FeatGraph.git
export PYTHONPATH=/path/to/FeatGraph/python:${PYTHONPATH}
  3. Prepare datasets.

The input to SpMM is an adjacency matrix in CSR format stored as a SciPy .npz file; the input to SDDMM is an adjacency matrix in COO format stored as a SciPy .npz file. You can run download_reddit_dataset.py under the benchmark folder to get the Reddit dataset; a sketch of producing such files with SciPy directly is given after the benchmark commands below.

  4. Run benchmark scripts.
cd benchmark
python bench_vanilla_spmm.py --dataset data/reddit_csr_float32.npz --feat-len 64 --target x86
python bench_vanilla_spmm.py --dataset data/reddit_csr_float32.npz --feat-len 64 --target cuda
python bench_vanilla_sddmm.py --dataset data/reddit_coo_float32.npz --feat-len 64 --target x86
python bench_vanilla_sddmm.py --dataset data/reddit_coo_float32.npz --feat-len 64 --target cuda
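
As mentioned in step 3, the expected .npz files can also be produced directly with SciPy. The sketch below is purely illustrative: the random stand-in graph and the file names are assumptions, not part of the benchmark scripts, which expect the Reddit files produced by download_reddit_dataset.py.

import numpy as np
import scipy.sparse as sp

# Stand-in graph: a small random sparse adjacency matrix.
adj = sp.random(1000, 1000, density=0.01, format="csr", dtype=np.float32)

# SpMM benchmarks load a CSR adjacency matrix saved with scipy.sparse.save_npz.
sp.save_npz("data/example_csr_float32.npz", adj.tocsr())

# SDDMM benchmarks load the same adjacency matrix in COO format.
sp.save_npz("data/example_coo_float32.npz", adj.tocoo())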


featgraph's Issues

Use end-to-end DGL scripts to run FeatGraph

Hi, I want to run FeatGraph end-to-end.
I have already built DGL (with FeatGraph) and run the test.py file successfully using the instructions posted in https://github.com/dmlc/dgl/tree/master/featgraph.

  • If I want to run end-to-end GCN training on the Pubmed or Reddit dataset, can I just use the DGL GCN benchmark script I already have without changing any kernel names? In other words, which parts of the DGL Python script do I need to change so that the end-to-end run uses FeatGraph (not DGL) kernels? Thank you.

Benchmark data download failure

I get an error when trying to run the download_reddit_dataset.py file

Traceback (most recent call last):
  File "download_reddit_dataset.py", line 12, in <module>
    assert adj_scipy_csr.has_canonical_format  # the matrix has sorted indices and no duplicates
AssertionError

Any help to solve this is appreciated.
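
A possible workaround, assuming the assertion fails only because the matrix built by the script is not in canonical form (this is a guess, not an official fix), is to canonicalize the CSR matrix before the assert:

import scipy.sparse as sp

def canonicalize(adj_scipy_csr: sp.csr_matrix) -> sp.csr_matrix:
    # Merge duplicate (row, col) entries and sort the column indices of each
    # row so that has_canonical_format becomes True.
    adj_scipy_csr.sum_duplicates()
    adj_scipy_csr.sort_indices()
    assert adj_scipy_csr.has_canonical_format
    return adj_scipy_csr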

Does FeatGraph still work when using mini-batches for GNN training?

The paper said:

FeatGraph generates kernel codes for a specific graph topology (i.e., the adjacency matrix);

If we use mini-batches for GNN training, the subgraph structure corresponding to each batch is not known in advance. Does FeatGraph still work in this case?

Can FeatGraph support GAT?

Hello, do you know whether FeatGraph can support end-to-end graph attention network (GAT) training? The paper did not mention GAT support, so I am curious about it. Thank you.

Two mistakes in my view

Hello, thanks for the open-sourced code! While reading it, I found two places that I don't understand.

  1. At this line, I think num_col_partitions should be assigned rather than num_row_partitions.

  2. While running bench_vanilla_sddmm.py, I noticed that the data in reddit_coo_float32.npz is not sorted. That is to say, in the loaded COO matrix, the destination vertex indices of each source vertex are not sorted. I don't quite understand why. Is it an adjacency matrix that was preprocessed by reordering the vertices? And I'm not sure whether this may cause issues when indexing the CSR matrix in the graph partition procedure (a quick check is sketched below).
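
For what it's worth, here is a quick check of the ordering; the path matches the benchmark invocation above, and the round trip through CSR is just one way to obtain a row-major-sorted COO matrix, not necessarily what the authors intended:

import numpy as np
import scipy.sparse as sp

adj_coo = sp.load_npz("data/reddit_coo_float32.npz")

# True if the (row, col) pairs are already sorted in row-major order.
order = np.lexsort((adj_coo.col, adj_coo.row))
print("row-major sorted:", np.array_equal(order, np.arange(adj_coo.nnz)))

# Converting to CSR and back yields a canonically sorted COO matrix.
adj_coo_sorted = adj_coo.tocsr().tocoo()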

TypeError

num_cuda_blocks: 64
num_threads_per_cuda_block: 32
Traceback (most recent call last):
  File "bench_vanilla_spmm.py", line 84, in <module>
    bench_vanilla_spmm_cuda(adj_scipy_csr, args.feat_len)
  File "bench_vanilla_spmm.py", line 68, in bench_vanilla_spmm_cuda
    _bench_vanilla_spmm_cuda(num_cuda_blocks, num_threads_per_cuda_block)
  File "bench_vanilla_spmm.py", line 49, in _bench_vanilla_spmm_cuda
    vanilla_spmm_module = VanillaSpMMcuda(adj_scipy_csr)
  File "/home/dsaha/FeatGraph/python/featgraph/module/spmm.py", line 214, in __init__
    super(VanillaSpMMcuda, self).__init__(adj_scipy, num_col_partitions=1)
  File "/home/dsaha/FeatGraph/python/featgraph/module/spmm.py", line 67, in __init__
    self._adj_indptr_tvm = tvm.nd.array(self._adj_indptr, ctx=self._ctx)
TypeError: array() got an unexpected keyword argument 'ctx'

I am getting this error when I run "python3 bench_vanilla_spmm.py --dataset data/reddit_csr_float32.npz --feat-len 64 --target cuda". Could you help me?
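
This looks like a TVM version mismatch: TVM v0.7, the version the README asks for, accepts a ctx keyword in tvm.nd.array, while later TVM releases renamed it to device. A minimal compatibility sketch, assuming that is the cause (the helper below is made up for illustration and is not part of FeatGraph):

import numpy as np
import tvm

def to_tvm_ndarray(np_array, dev):
    # Try the newer keyword first, then fall back to the TVM v0.7 spelling.
    try:
        return tvm.nd.array(np_array, device=dev)  # TVM >= 0.8
    except TypeError:
        return tvm.nd.array(np_array, ctx=dev)     # TVM v0.7

arr = to_tvm_ndarray(np.zeros(4, dtype="float32"), tvm.cpu(0))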

CheckError

Traceback (most recent call last):
  File "bench_vanilla_spmm.py", line 82, in <module>
    bench_vanilla_spmm_x86(adj_scipy_csr, args.feat_len)
  File "bench_vanilla_spmm.py", line 41, in bench_vanilla_spmm_x86
    _bench_vanilla_spmm_x86(num_col_partitions, num_feat_partitions)
  File "bench_vanilla_spmm.py", line 33, in _bench_vanilla_spmm_x86
    tcost = vanilla_spmm_module.measure_average_time(input_tvm_ndarrays, num_runs)
  File "\pythonProject\FeatGraph2\python\featgraph\module\spmm.py", line 179, in measure_average_time
    timer = self._func.time_evaluator(self._func.entry_name, ctx=self._ctx, number=num_runs)
  File "\Python\Python37\site-packages\tvm-0.7.0-py3.7-win-amd64.egg\tvm\runtime\module.py", line 220, in time_evaluator
    f_preproc,
  File "\Python\Python37\site-packages\tvm-0.7.0-py3.7-win-amd64.egg\tvm\_ffi\_ctypes\packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "\tvm0.7\tvm\src\target\llvm\llvm_module.cc", line 79
TVMError: Check failed: entry_name != nullptr: Symbol __tvm_main__ is not presented
I get this error when I run any of the benchmark scripts.
Is there something wrong with TVM 0.7? Could you help me?

Could you please specify dependencies?

Hi,
Thank you so much for this awesome work! Could you please specify the dependencies? Which versions of TVM, DGL, and Python are needed? It would be helpful.

Thanks.
Khaled
