Giter Site home page Giter Site logo

sean-chester / skybench Goto Github PK

View Code? Open in Web Editor NEW
14.0 7.0 4.0 25.1 MB

Collection of algorithms in C++ for main-memory skyline computation

License: MIT License

Makefile 1.90% Shell 9.86% C++ 78.39% C 9.86%
database-management skyline-query multicore

skybench's Introduction

SkyBench

Version 1.1

© 2015-2016 Darius Šidlauskas, Sean Chester, and Kenneth S. Bøgh


Table of Contents


Introduction

The SkyBench software suite contains software for efficient main-memory computation of skylines. The state-of-the-art sequential (i.e., single-threaded) and multi-core (i.e., multi-threaded) algorithms are included.

The skyline operator [1] identifies so-called pareto-optimal points in a multi-dimensional dataset. In two dimensions, the problem is often presented as finding the silhouette of Manhattan:
if one has knows the position of the corner points of every building, what parts of which buildings are visible from across the river? The two-dimensional case is trivial to solve and not the focus of SkyBench.

In higher dimensions, the problem is formalised with the concept of dominance: a point p is dominated by another point q if q has better or equal values for every attribute and the points are distinct. All points that are not dominated are part of the skyline. For example, if the points correspond to hotels, then any hotel that is more expensive, farther from anything of interest, and lower-rated than another choice would not be in the skyline. In the table below, Marge's Hotel is dominated by Happy Hostel, because it is more expensive, farther from Central Station, and lower rated, so it is not in the skyline. On the other hand, The Grand has the best rating and Happy Hostel has the best price. Lovely Lodge does not have the best value for any one attribute, but neither The Grand nor Happy Hostel outperform it on every attribute, so it too is in the skyline and represents a good balance of the attributes.

Name Price per Night Rating Distance to Central Station In skyline?
The Grand $325 ⋆⋆⋆⋆⋆ 1.2km
Marge's Motel $55 ⋆⋆ 3.6km
Happy Hostel $25 ⋆⋆⋆ 0.4km
Lovely Lodge $100 ⋆⋆⋆⋆ 8.2km

As the number of dimensions/attributes increases, so too does the size of and difficulty in producing the skyline. Parallel algorithms, such as those implemented here, quickly become necessary.

SkyBench is released in conjunction with our recent ICDE paper [2]. All of the code and scripts necessary to repeat experiments from that paper are available in this software suite. To the best of our knowledge, this is also the first publicly released C++ skyline software, which will hopefully be a useful resource for the academic and industry research communities.


Algorithms

The following algorithms have been implemented in SkyBench:

  • Hybrid [2]: Located in src/hybrid. It is the state-of-the-art multi-core algorithm, based on two-level quad-tree partitioning of the data and memoisation of point-to-point relationships.

  • Q-Flow [2]: Located in src/qflow. It is a simplification of Hybrid to demonstrate control flow.

  • PSkyline [3]: Located in src/pskyline. It was the previous state-of-the-art multi-core algorithm, based on a divide-and-conquer paradigm.

  • BSkyTree [4]: Located in src/bskytree. It is the state-of-the-art sequential algorithm, based on a quad-tree partitioning of the data and memoisation of point-to-point relationships.

All four algorithms are implementations of the common interface defined in common/skyline_i.h and use common dominance tests from
common/common.h and common/dt_avx.h (the latter when vectorisation is enabled).


Datasets

For reproducibility of the experiments in [2], we include three datasets. The WEATHER dataset was originally obtained from The University of East Anglia Climatic Research Unit and preprocessed for skyline computation. We also include two classic skyline datasets, exactly as used in [2]: NBA and HOUSE.

The synthetic workloads can be generated with the standard benchmark skyline data generator [1] hosted on pgfoundry.


Requirements

SkyBench depends on the following applications:

  • A C++ compiler that supports C++11 and OpenMP (e.g., the newest GNU compiler)

  • The GNU make program

  • AVX or AVX2 if vectorised dominance tests are to be used


Usage

To run, the code needs to be compiled with the given number of dimensions.^ For example, to compute the skyline of the 8-dimensional NBA data set located in workloads/nba-U-8-17264.csv, do:

make all DIMS=8

./bin/SkyBench -f workloads/nba-U-8-17264.csv

By default, it will compute the skyline with all algorithms. Running ./bin/SkyBench without parameters will provide more details about the supported options.

You can make use of the provided shell script (/script/runExp.sh) that does all of the above automatically. For details, execute:

./script/runExp.sh

To reproduce the experiment with real datasets (Table II in [2]), do (assuming a 16-core machine):

./scripts/realTest.sh 16 T "bskytree pbskytree pskyline qflow hybrid"

^For performance reasons, skyline implementations that we obtained from other authors compile their code for a specific number of dimensions. For a fair comparison, we adopted the same approach.


License

This software is subject to the terms of The MIT License, which has been included in this repository.


Contact

This software suite will be expanded soon with new algorithms; so, you are encouraged to ensure that this is still the latest version. Please do not hesitate to contact the authors if you have comments, questions, or bugs to report.

SkyBench on GitHub


References

S. Börzsönyi, D. Kossmann, and K. Stocker. (2001) "The Skyline Operator." In Proceedings of the 17th International Conference on Data Engineering (ICDE 2001), 421--432. http://infolab.usc.edu/csci599/Fall2007/papers/e-1.pdf

S. Chester, D. Šidlauskas, I Assent, and K. S. Bøgh. (2015) "Scalable parallelization of skyline computation for multi-core processors." In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015), 1083--1094. http://cs.au.dk/~schester/publications/chester_icde2015_mcsky.pdf

H. Im, J. Park, and S. Park. (2011) "Parallel skyline computation on multicore architectures." Information Systems 36(4): 808--823. http://dx.doi.org/10.1016/j.is.2010.10.005

J. Lee and S. Hwang. (2014) "Scalable skyline computation using a balanced pivot selection technique." Information Systems 39: 1--21. http://dx.doi.org/10.1016/j.is.2013.05.005


skybench's People

Contributors

jnider avatar sean-chester avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

skybench's Issues

graphs not found after compiling and running the project

thank you for sharing this great project
i am using Ubuntu 18.04
i followed your description in the Readme file

when i run make all DIMS=8

the project compiled and the .o files created in bin directory
then after i run ./bin/SkyBench -f workloads/nba-U-8-17264.csv
and
./runExp.sh -p -i workloads/ -t 8 -c 1000000 -d "2 4 6 8 10 12 14 16 18 20 22 24" -s "bskytree hybrid"
i get .csv files in the result folders

i want to know how do you get the graphs in figure 5 fig 6 ... and figure 13 in your published paper ICDE 2015 (Scalable Parallelization of Skyline Computation for Multicore Processors).

please is this the code not enough to generate those graphs

if i compile the project in a 4 core laptop are there changes i can make to your code to get the results in your published paper ?

i changed threads="4"

but it not worked

bskytree not giving expected result

Hi Guys,

I'm trying to use your single threaded bskytree implementation to check the results against my own skyline algorithm implementation. I'm getting odd results when I input the following toy data set,

std::vector< std::vector<float> > vvf {
    {0.010,0.010,0.010,0.010},
    {0.001,0.001,0.001,0.001},
    {0.100,0.100,0.100,0.100}
};

The result I get back says that the skyline includes points (0, 1, 2) when I think it should just be point 1.

The code i'm using to make the call is based on what you've used in testdriver.cpp,

const uint32_t n = vvf.size();
const uint32_t d = vvf.front().size();

float** data = AllocateDoubleArray( n, d );
redistribute_data( vvf, data );
vvf.clear();

SkyTree s(n, d, data, true, false);
s.SkyTree::Init( data );
vector<int> res = s.SkyTree::Execute();

Appreciate if you could point out what i'm doing wrong.

Thanks,
David

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.