gunrock / mini

mini is mini

License: Apache License 2.0

Languages: Makefile 6.75%, C++ 81.90%, Cuda 10.99%, Shell 0.36%
Topics: gunrock, mini-gunrock, gpu, traversal-operators, workload-mapping-strategies, graph-primitives, cuda

mini's Introduction

mini-gunrock

This little project is the result of my study of moderngpu 2.0 and my attempt to create a minimalist graph processing library for the GPU. I created it for three reasons:

  • The main Gunrock project has grown large, which makes it hard for new developers to grasp our original goal: a GPU graph processing library that offers both programmability and performance.
  • The multi-GPU integration and the template-based design make it difficult to run micro-benchmarks on individual workload mapping strategies and operations.
  • The main Gunrock project mixes C++03, C++11, C99, and an old-fashioned CUDA programming style. Rather than refactoring such a large project, an alternative is to build a small project with minimal components so that we can quickly try out new ideas.

Introduction

mini-gunrock's core components are graph traversal operators based on moderngpu 2.0 transforms. Using moderngpu 2.0 keeps the code small without losing performance. Here is a brief introduction to all the components in mini-gunrock.

  • graph_device_t contains CSR, CSC, and edge list representations. It is the main data structure for storing the graph topology and node/edge values, and it also holds auxiliary data, such as d_scanned_row_offsets, used by the traversal operators.
  • frontier_t is the input and output for all traversal operators. It maps to Gunrock's frontier_queue using 1DArray.
  • problem_t is the base class for different graph problems. It contains a shared_ptr to graph_device_t called gslice. Each graph primitive derives from this class and defines its own problem data structure, which holds per-node/per-edge data.
  • filter and advance map to Gunrock's filter and advance operators. Underneath, both use moderngpu 2.0 transforms: transform_compact for filter, and transform_scan + transform_lbs for advance. So far I have implemented the baseline (equivalent to Gunrock's LB strategy), the LB strategy with idempotence support, the dynamic grouping workload mapping strategy, and a flexible uniquification filter. Direction-optimizing advance, as in Ligra, has also been implemented. I will gradually add more features as I explore more workload mapping strategies.
  • neighborhood reduce is a new operator enabled by moderngpu 2.0's transform_segreduce. Unlike SpMV, where the user always multiplies a vector by the whole matrix (in graph terms, visits the neighbors of every node), this operator takes a dynamically generated input frontier, visits only those nodes' neighbors, and reduces over each neighborhood with a user-specified reduce operator defined by two functors: get_value_to_reduce and write_reduce_value.
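graph_device_t keeps the same topology in several layouts. The host-side sketch below is an illustration, not mini-gunrock's loader: it shows one common way to derive both CSR and CSC from an edge list with a counting sort. The names compressed_t and from_edge_list are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed sparse layout: per-vertex offsets plus neighbor ids.
struct compressed_t {
  std::vector<int> offsets;  // size num_nodes + 1
  std::vector<int> indices;  // size num_edges
};

// Counting sort of edges by key. Passing (src, dst) builds CSR (out-edges);
// passing (dst, src) builds CSC (in-edges) from the same edge list.
compressed_t from_edge_list(int num_nodes, const std::vector<int>& keys,
                            const std::vector<int>& vals) {
  compressed_t out;
  out.offsets.assign(num_nodes + 1, 0);
  for (int k : keys) ++out.offsets[k + 1];  // degree histogram
  for (int v = 0; v < num_nodes; ++v)       // prefix sum turns degrees into offsets
    out.offsets[v + 1] += out.offsets[v];
  out.indices.assign(keys.size(), 0);
  // Next free slot for each vertex, starting at its segment begin.
  std::vector<int> cursor(out.offsets.begin(), out.offsets.end() - 1);
  for (std::size_t e = 0; e < keys.size(); ++e)
    out.indices[cursor[keys[e]]++] = vals[e];
  return out;
}
```

For example, the edge list (0,1), (0,2), (1,2), (3,0) yields CSR offsets {0, 2, 3, 3, 4} and CSC offsets {0, 1, 2, 4, 4}.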
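To make the advance and filter descriptions concrete, here is a minimal serial C++ sketch, not mini-gunrock's actual code. It assumes a CSR graph stored in std::vector arrays; csr_graph_t, advance_lbs, and filter_compact are hypothetical names. advance_lbs mirrors the transform_scan + transform_lbs pattern: an exclusive scan of the frontier's out-degrees assigns each frontier vertex a range of output slots, and a per-slot binary search recovers which vertex owns that slot. filter_compact mirrors transform_compact with a flag, scan, and scatter pass. On the GPU each loop iteration would be a thread; here they run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR topology, similar in spirit to what graph_device_t stores.
struct csr_graph_t {
  std::vector<int> row_offsets;     // size num_nodes + 1
  std::vector<int> column_indices;  // size num_edges
};

// Advance sketch. cond(src, dst) decides whether dst is emitted;
// rejected slots hold -1 so a later filter can drop them.
template <typename Cond>
std::vector<int> advance_lbs(const csr_graph_t& g,
                             const std::vector<int>& frontier, Cond cond) {
  const int n = static_cast<int>(frontier.size());
  // Exclusive scan of out-degrees (the transform_scan step).
  std::vector<int> scanned(n + 1, 0);
  for (int i = 0; i < n; ++i) {
    const int v = frontier[i];
    scanned[i + 1] = scanned[i] + (g.row_offsets[v + 1] - g.row_offsets[v]);
  }
  const int total = scanned[n];
  std::vector<int> output(total, -1);
  // One work item per output edge (the load-balanced search step): find the
  // frontier vertex whose segment contains idx and the edge's rank inside it.
  for (int idx = 0; idx < total; ++idx) {
    const int seg = static_cast<int>(std::upper_bound(scanned.begin(),
                                                      scanned.end(), idx) -
                                     scanned.begin()) - 1;
    const int rank = idx - scanned[seg];
    const int src = frontier[seg];
    const int dst = g.column_indices[g.row_offsets[src] + rank];
    if (cond(src, dst)) output[idx] = dst;
  }
  return output;
}

// Filter sketch: flag each item, scan the flags, scatter the kept items.
template <typename Keep>
std::vector<int> filter_compact(const std::vector<int>& in, Keep keep) {
  std::vector<int> offsets(in.size() + 1, 0);
  for (std::size_t i = 0; i < in.size(); ++i)
    offsets[i + 1] = offsets[i] + (keep(in[i]) ? 1 : 0);
  std::vector<int> out(offsets.back());
  for (std::size_t i = 0; i < in.size(); ++i)
    if (offsets[i + 1] != offsets[i]) out[offsets[i]] = in[i];
  return out;
}
```

The binary search over the scanned degrees is what lets every output edge be processed independently, regardless of how skewed the frontier's degrees are.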
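The semantics of neighborhood reduce can be sketched serially as follows. This is an illustration under stated assumptions, not the operator itself (which runs on transform_segreduce): csr_graph_t and neighborhood_reduce are hypothetical names, and the three-argument shape of get_value_to_reduce (source, neighbor, edge id) is an assumption. Only vertices in the input frontier are visited, and each one's adjacency list forms one reduction segment.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical CSR topology.
struct csr_graph_t {
  std::vector<int> row_offsets;
  std::vector<int> column_indices;
};

// For each src in the frontier, fold get_value_to_reduce(src, dst, edge_id)
// over its out-edges with op, starting from identity, then hand the result to
// write_reduce_value(src, result). Vertices outside the frontier are untouched.
template <typename T, typename GetValue, typename WriteValue, typename Op>
void neighborhood_reduce(const csr_graph_t& g, const std::vector<int>& frontier,
                         GetValue get_value_to_reduce,
                         WriteValue write_reduce_value, Op op, T identity) {
  for (int src : frontier) {
    T acc = identity;
    for (int e = g.row_offsets[src]; e < g.row_offsets[src + 1]; ++e)
      acc = op(acc, get_value_to_reduce(src, g.column_indices[e], e));
    write_reduce_value(src, acc);
  }
}
```

With std::plus as the operator and a frontier that contains every vertex, this degenerates to the SpMV-style pass over the whole graph; a smaller frontier simply skips the other segments.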

mini/gunrock/tests/bfs/test_bfs.cu shows the power of mini-gunrock. After loading the graph and setting up the frontier and problem, the algorithm itself takes only 8 lines of code. This makes mini-gunrock a truly data-centric framework that largely achieves our original idea of "the flow of frontier between multiple operators".
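The frontier flow described above can be sketched as a serial C++ program. This is a hedged illustration, not the contents of test_bfs.cu: csr_graph_t, bfs_problem_t, advance_step, filter_step, and run_bfs are hypothetical names. bfs_problem_t imitates problem_t by holding a shared_ptr to the graph (named gslice, as in the component list) plus per-node labels. The operators are deliberately naive loops; the point is the shape of the main loop, advance followed by filter until the frontier is empty.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical CSR topology.
struct csr_graph_t {
  std::vector<int> row_offsets;
  std::vector<int> column_indices;
  int num_nodes() const { return static_cast<int>(row_offsets.size()) - 1; }
};

// Imitates problem_t: a shared_ptr to the graph plus per-node data.
struct bfs_problem_t {
  std::shared_ptr<csr_graph_t> gslice;
  std::vector<int> labels;  // BFS depth, -1 = unvisited
  bfs_problem_t(std::shared_ptr<csr_graph_t> g, int src)
      : gslice(std::move(g)), labels(gslice->num_nodes(), -1) {
    labels[src] = 0;
  }
};

// Naive advance: one output slot per out-edge of the frontier, -1 if rejected.
template <typename Cond>
std::vector<int> advance_step(const csr_graph_t& g, const std::vector<int>& in,
                              Cond cond) {
  std::vector<int> out;
  for (int src : in)
    for (int e = g.row_offsets[src]; e < g.row_offsets[src + 1]; ++e) {
      const int dst = g.column_indices[e];
      out.push_back(cond(src, dst) ? dst : -1);
    }
  return out;
}

// Naive filter: keep the items for which keep(v) is true.
template <typename Keep>
std::vector<int> filter_step(const std::vector<int>& in, Keep keep) {
  std::vector<int> out;
  for (int v : in)
    if (keep(v)) out.push_back(v);
  return out;
}

// BFS as a flow of frontiers: advance, then filter, until nothing is left.
std::vector<int> run_bfs(std::shared_ptr<csr_graph_t> g, int src) {
  bfs_problem_t problem(std::move(g), src);
  std::vector<int> frontier{src};
  for (int depth = 0; !frontier.empty(); ++depth) {
    std::vector<int> out =
        advance_step(*problem.gslice, frontier, [&](int, int dst) {
          if (problem.labels[dst] != -1) return false;  // already visited
          problem.labels[dst] = depth + 1;  // serial code: first visit wins
          return true;
        });
    frontier = filter_step(out, [](int v) { return v >= 0; });
  }
  return problem.labels;
}
```

Because this version is serial, marking the label inside the advance condition is enough to avoid duplicates; a parallel GPU version would need atomics or an idempotent strategy plus uniquification in the filter.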

TODOs (in no particular order)

  • Add a pure compute operator with no filter (maps to Gunrock's bypass_filter).
  • Add launch_box and restrict settings to further improve the performance.
  • Add the batch-intersection operator.
  • Add more graph primitives and their CPU validation code.

YZH

mini's People

Contributors

slashspirit, yzhwang

mini's Issues

PTX assembly error when compiling Mini Gunrock

I tried to compile Mini Gunrock BFS and got the following error:

ptxas /tmp/tmpxft_00009acd_00000000-5_test_bfs.ptx, line 8902; error   : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
ptxas /tmp/tmpxft_00009acd_00000000-5_test_bfs.ptx, line 8906; error   : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
ptxas fatal   : Ptx assembly aborted due to errors
Makefile:18: recipe for target 'bin/test_bfs__x86_64' failed
make: *** [bin/test_bfs__x86_64] Error 255

I compiled with CUDA 10 on daisy (V100):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

I suspect this is a moderngpu problem.

Thanks!
Yuxin

moderngpu stuck in a deadlock on Volta GPU

I am trying to run mini Gunrock BFS on daisy, which has a Volta GPU:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

However, it gets stuck in a deadlock. We found that the code hangs in moderngpu's transform_scan.

If I run the same code on Luigi, which has a Tesla K40:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Mini Gunrock BFS runs correctly and successfully.

Segmentation Fault 139 (core dumped) occurs when building the code

When building the code I receive this error:

$ make
mkdir -p bin
"/usr/local/cuda-7.5/bin/nvcc" -gencode=arch=compute_30,code="sm_30,compute_30" -std=c++11 -ccbin=/usr/bin/g++-4.8 -Xcompiler="-Wundef" -O2 -g -Xcompiler="-Werror" -lineinfo --expt-extended-lambda -use_fast_math -Xptxas="-v" -o bin/test_bfs__x86_64 test_bfs.cu -m64 -I.. -I../../src -I"../../../external/moderngpu/src"
Segmentation fault (core dumped)
make: *** [bin/test_bfs__x86_64] Error 139

My system:
Ubuntu 14.04, 64-bit
GeForce GT 740 : 1058.500 Mhz (Ordinal 0) Compute Capability sm_30 Mem Clock: 2500.000 Mhz x 128 bits ( 80.0 GB/s)

I can build the main Gunrock project successfully, so I cannot figure out what the problem is.

invalid __shared__ read in moderngpu/src/moderngpu/cta_load_balance.hxx:171

I compiled PR, ran it on the soc-LiveJournal1.mtx dataset, and got the following error:

cuda-memcheck ./bin/test_pr__x86_64 --file=../../../../gunrock/dataset/large/soc-LiveJournal1/soc-LiveJournal1.mtx

========= Invalid __shared__ read of size 4
=========     at 0x00002010 in /mnt/daisy-mount/mini/gunrock/tests/pr/../../../external/moderngpu/src/moderngpu/cta_load_balance.hxx:171:_ZN4mgpu16launch_box_cta_kINS_12launch_box_tIJNS_7arch_20INS_12launch_cta_tILi128ELi11ELi8ELi0EEENS_7empty_tEEENS_7arch_35INS3_ILi128ELi7ELi5ELi0EEES5_EENS_7arch_52IS4_S5_EEEEEZNS_13lbs_segreduceIS5_ZNS_13lbs_segreduceIS5_ZN7gunrock5oprtr12neighborhood19neighborhood_kernelINSF_2pr12pr_problem_tENSJ_12pr_functor_tEfNS_6plus_tIfEELb0ELb0EEEiSt10shared_ptrIT_ERSO_INSF_10frontier_tIiEEESU_PT1_SV_iRNS_18standard_context_tEEUliiiE_PiPfSN_fJEEEvT0_iSV_iT2_T3_T4_RNS_9context_tEDpT5_EUliiiNS_5tupleIJEEEE_S10_S1B_S11_SN_fJEEEvS12_iSV_iS13_S14_S15_S18_S17_DpT6_EUliiE_JEEEvS12_iDpSV_
=========     by thread (126,0,0) in block (29205,0,0)
=========     Address 0x09015178 is out of bounds

It seems to be a bug in moderngpu.

Deadlock problem in ballot function

The ballot function has a problem similar to issue #3:

https://github.com/yzhwang/moderngpu/blob/9a6c3167fc12ed8b459b7f4376dd89069cad3eb1/src/moderngpu/cta_segscan.hxx#L39

    if(tid < num_warps) {
      int cta_bits = ballot(0 != storage.delta[tid]);
      int warp_segment = 31 - clz(cta_mask & cta_bits);
      int start = (-1 != warp_segment) ?
        (31 - clz(storage.delta[warp_segment]) + 32 * warp_segment) : 0;
      storage.delta[num_warps + tid] = start;
    }

should be modified to:

    if(tid < num_warps) {
      unsigned mask = __activemask();
      int cta_bits = ballot(0 != storage.delta[tid], mask);
      int warp_segment = 31 - clz(cta_mask & cta_bits);
      int start = (-1 != warp_segment) ?
        (31 - clz(storage.delta[warp_segment]) + 32 * warp_segment) : 0;
      storage.delta[num_warps + tid] = start;
    }
