Giter Site home page Giter Site logo

shixuansun / rapidflow Goto Github PK

View Code? Open in Web Editor NEW
25.0 2.0 3.0 34 MB

Source code of "RapidFlow: An Efficient Approach to Continuous Subgraph Matching" published in VLDB'2022 - By Shixuan Sun, Xibo Sun, Bingsheng He and Qiong Luo

License: MIT License

CMake 0.17% C 4.21% C++ 95.03% Python 0.59%
dynamic-graphs streaming subgraph-matching subgraph-query-processing continuous-subgraph-matching

rapidflow's Introduction

RapidFlow

Introduction

Continuous subgraph matching (CSM) is an important building block in many real-time graph processing applications. Given a subgraph query ๐‘„ and a data graph stream, a CSM algorithm reports the occurrences of ๐‘„ in the stream. Specifically, when a new edge ๐‘’ arrives in the stream, existing CSM algorithms start from the inserted ๐‘’ in the current data graph ๐บ to search ๐‘„. However, this rigid matching order of always starting from ๐‘’ can lead to a massive number of partial results that will turn out futile. Also, if ๐‘„ contains automorphisms, there will be a lot of redundant computation in the matching process. To address these two problems, we propose RapidFlow, an effective approach to CSM. First, we design a query reduction technique, which reduces CSM to batch subgraph matching (BSM) where we enumerate all results in a region of ๐บ that will be affected by the update. The well-established BSM techniques can determine effective matching orders, not necessarily starting from the newly inserted edge. Second, to eliminate redundant computation caused by automorphisms in ๐‘„, we propose dual matching, which leverages the duality of ๐‘„ and ๐บ in the matching process. Extensive experiment results show that RapidFlow outperforms state-of-the-art algorithms, including TurboFlux and SymBi, by up to two orders of magnitude on various workloads.

For the details, please refer to our VLDB'2022 paper "RapidFlow: An Efficient Approach to Continuous Subgraph Matching" by Shixuan Sun, Xibo Sun, Bingsheng He, Qiong Luo. If you have any further questions, please feel free to contact us.

Compile

Under the root directory of the project, execute the following commands to compile the source code.

mkdir build
cd build
cmake ..
make

Test

Execute the following commands to test the correctness of the binary file. The script will execute 100 test cases on insert and delete streams, respectively. The test data is stored in the insert and delete folders. The test process will take around 15 minutes. As we have tested the correctness of the binary, execute the command unless you modified the source code.

cd test
python test.py ../build/streaming/RapidFlow.out

Execute

After compiling the source code, you can find the binary file 'RapidFlow.out' under the 'build/streaming' directory. Execute the binary with the following command './RapidFlow.out -d data_graphs -q query_graphs -u update_streams -num number_of_embeddings -time_limit time_in_seconds', in which '-d' specifies the input of the data graphs, '-q' specifies the input of the query graphs and '-u' specifies the input of the graph update stream. The '-num' parameter sets the maximum number of embeddings that you would like to find for each edge update. If the number of embeddings enumerated reaches the limit or all incremental results for the update have been found, then the program will process next update. Set '-num' as 'MAX' to find all incremental results for each update. The '-time_limit' parameter configures the time budget for the query. If the query cannot be completed within the time limit, then the program will terminate the query and return the number of results found. The default value is 3600 seconds (1 hour). This figure illustrates the output log of the binary.

Example: The time limit is 3600 seconds for each query. The target number of incremental matches is 429496729, which is the limit of uint32_t (Note that the target number is the limit of uint64_t if set -num to MAX. In our experiments, we set -num to 429496729 to obtain the query time).

./RapidFlow.out -d ../../test/insert/data_graph/data.graph -q ../../test/insert/query_graph/Q_0 -u ../../test/insert/data_graph/insertion.graph -num 429496729 -time_limit 3600

Example: The time limit is 3600 seconds for each query. The target number of incremental matches is 1 (In our experiments, we set -num to 1 to obtain the response time).

./RapidFlow.out -d ../../test/insert/data_graph/data.graph -q ../../test/insert/query_graph/Q_0 -u ../../test/insert/data_graph/insertion.graph -num 1 -time_limit 3600

The query time of processing a stream includes the global indexing time, which is the elapsed time of updating the global index, the local indexing time, which is the elapsed time of updating the local index, and the enumeration time, which is the elapsed time of enumerating the incremental results. If you want to measure the three metrics separately, add the macro '#define MEASURE_INDEXING_COST' to Line 18 in streaming_engine.cpp.

Experiment Datasets and Baseline Methods in our Experiments

The datasets and baseline methods in our experiments can be found in this repository, which is an in-depth study of existing continuous subgraph matching methods by our research team.

rapidflow's People

Contributors

shixuansun avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

rapidflow's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.