SIGMOD17Reproducibility

Code Information
Programming Language: Python
Compiler Info: Python 2.7 interpreter
Packages/Libraries Needed: Python Anaconda (see details below)

Datasets

The experiments require MySQL 5.6. The code does not use any version-specific features, so it should also work on later versions.

The dataset files are listed below. The scripts assume that no database named graph, qa, ssb, or tpch already exists. To import a database, run the following command, replacing <filename> with graph, qa, ssb, or tpch.

mysql -u <username> -p < <filename>.sql
  1. https://www.dropbox.com/s/7mb6snalnxndlxp/graph.sql?dl=0
  2. https://www.dropbox.com/s/aqop2af4i39pe1w/qa.sql?dl=0
  3. https://www.dropbox.com/s/y609n91exdishyf/ssb.sql?dl=0
  4. https://www.dropbox.com/s/kdk4iq9kngu5cr2/tpch.sql?dl=0

Alternatively, run the db.sh file in the root folder. If pass is your MySQL password, the command sudo ./db.sh pass downloads the files and sets up the databases.
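The manual import can also be scripted. The following is a minimal sketch and is not part of the repository: the helper names direct_download_url, build_import_command, and import_dump are hypothetical, and it assumes the four dumps have been saved in the current directory as graph.sql, qa.sql, ssb.sql, and tpch.sql. It also assumes Dropbox's usual convention that changing dl=0 to dl=1 in a share link yields a direct download.

```python
import subprocess

# Database names taken from the dataset list above.
DATABASES = ["graph", "qa", "ssb", "tpch"]


def direct_download_url(share_url):
    """Turn a Dropbox share link (dl=0) into a direct-download link (dl=1).

    Assumption: the link ends with the dl=0 query parameter, as the
    four links above do. Other links are returned unchanged.
    """
    if share_url.endswith("dl=0"):
        return share_url[:-1] + "1"
    return share_url


def build_import_command(username):
    """Argument list equivalent to `mysql -u <username> -p`."""
    return ["mysql", "-u", username, "-p"]


def import_dump(username, name):
    """Feed <name>.sql to mysql on stdin, like `mysql ... < <name>.sql`.

    mysql itself prompts for the password because of the -p flag.
    Returns the exit code of the mysql process.
    """
    with open(name + ".sql", "rb") as dump:
        return subprocess.call(build_import_command(username), stdin=dump)
```

Calling import_dump(username, name) for each name in DATABASES would then mirror running the command above four times.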

Hardware Information

All experiments were performed on a machine with 16 GB of memory running OS X 10.10.5. There are no special hardware requirements, and the results should be easily reproducible on any standard machine.

Processor: 2.2 GHz Intel Core i7
Memory: 16 GB 1600 MHz DDR3
Cores: quad core
Cache: 6 MB shared L3 cache

Environment Setup

  • Execute the database scripts. Once they have run, you should see four databases in MySQL.
  • Check out the code from GitHub to a folder on the machine.
  • Install Anaconda by following its official installation instructions.
  • Replicate the Python environment using a conda virtual environment. Execute the following command from the top level of the repository, where environment.yml is located. This creates a private, Python 2.7 based Anaconda environment with all dependencies installed.
conda env create -f environment.yml
  • Navigate to the SIGMOD17Reproducibility directory in a terminal.
  • Activate the environment using the command source activate sigmodduplicate.
  • Install mysql-python using the command pip install mysql-python.
  • Update the constants/db.py file with the username and password for the database. Setup is now complete.
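Before running any experiment, it can help to confirm that the four databases are actually visible to the account you configured. The following is a minimal sketch under stated assumptions, not code from the repository: the function names are hypothetical, the server is assumed to run on localhost, and the username and password are passed in explicitly because the variable names used inside constants/db.py are not shown here.

```python
# Database names created by the import step described in the Datasets section.
EXPECTED_DATABASES = ("graph", "qa", "ssb", "tpch")


def missing_databases(present, expected=EXPECTED_DATABASES):
    """Return the expected database names that are absent from `present`."""
    present = set(present)
    return [name for name in expected if name not in present]


def list_databases(user, password, host="localhost"):
    """Ask the MySQL server which databases it has.

    Requires the MySQLdb module, which `pip install mysql-python` provides.
    The import is local so the pure helper above works without it.
    """
    import MySQLdb
    connection = MySQLdb.connect(host=host, user=user, passwd=password)
    try:
        cursor = connection.cursor()
        cursor.execute("SHOW DATABASES")
        return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```

An empty list from missing_databases(list_databases(user, password)) suggests the setup matches what the experiments expect.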

Running Experiments

Below is the complete set of experiments needed to reproduce the results in the paper. Keep in mind that the results (time taken or price assigned to queries) will vary somewhat between runs because query parameters, data, or both are sampled. These variations are minor and never approach an order of magnitude. The important thing is therefore to check that the trends in the graphs are close to those in the paper.
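One way to make the "less than an order of magnitude" expectation concrete is to compare repeated measurements of the same quantity. The following is only an illustrative sketch: the function names are hypothetical, the factor of 10 is simply the usual meaning of an order of magnitude, and the repository itself does not perform this check.

```python
def spread_ratio(values):
    """Ratio of the largest to the smallest measurement.

    All measurements must be positive (times or prices), otherwise the
    ratio is not meaningful and a ValueError is raised.
    """
    values = list(values)
    if not values or min(values) <= 0:
        raise ValueError("need at least one measurement, all positive")
    return float(max(values)) / min(values)


def within_order_of_magnitude(values, factor=10.0):
    """True if repeated measurements differ by less than `factor`."""
    return spread_ratio(values) < factor
```

For example, timings of 2.0, 3.0, and 4.0 seconds for the same query across three runs give a ratio of 2.0, which is within the expected variability.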

Section 2.4

  • Execute the following commands
cd integration
python CombinerBenchmarkPriceBehavior.py

This set of experiments generates four figures (benchmarkselect.pdf, benchmarkproject.pdf, benchmarkjoin.pdf, and benchmarkgroup.pdf) corresponding to Figure 2 in the paper. The legend labels match those in the paper for ease of verification.

Section 5.1

  • Execute the following commands
cd integration
python PriceSelectivity.py #Generates benchmarkselectsupportsize.pdf corresponding to Figure 4a
python PriceAttributes.py #Generates benchmarkprojectsupportsize.pdf corresponding to Figure 4b
python SwapUpdateFraction.py #Generates benchmarkcellswapratio.pdf corresponding to Figure 4c
python SupportSetSize.py #Generates benchmarktimesssize.pdf corresponding to Figure 4d
  • Execute the following for the SSB experiments
cd integration_ssb
python CombinerReproduce.py #Generates ssbstatichistorytime.pdf (Figure 4f), ssbstatichistoryawareprice.pdf (Figure 4e) and barchartssbtime.pdf (Figure 5a)
python HistoryAwareQ11.py #Generates ssbq11.pdf corresponding to Figure 4g
  • Execute the following for the TPCH experiment
cd integration_tpch
python CombinerReproduce.py #Generates barcharttpchtimetest.pdf for Figure 5b
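After the Section 5.1 scripts finish, a quick check can confirm that every expected figure was produced. The following is a minimal sketch, not part of the repository. The function name missing_figures is hypothetical, and it assumes that each script writes its PDFs into the directory it is run from and that the three directories sit directly under the repository root. Adjust the paths if the scripts write elsewhere.

```python
import os

# Expected outputs per directory, as listed in the commands above.
EXPECTED_FIGURES = {
    "integration": [
        "benchmarkselectsupportsize.pdf",   # Figure 4a
        "benchmarkprojectsupportsize.pdf",  # Figure 4b
        "benchmarkcellswapratio.pdf",       # Figure 4c
        "benchmarktimesssize.pdf",          # Figure 4d
    ],
    "integration_ssb": [
        "ssbstatichistoryawareprice.pdf",   # Figure 4e
        "ssbstatichistorytime.pdf",         # Figure 4f
        "ssbq11.pdf",                       # Figure 4g
        "barchartssbtime.pdf",              # Figure 5a
    ],
    "integration_tpch": [
        "barcharttpchtimetest.pdf",         # Figure 5b
    ],
}


def missing_figures(repo_root, expected=EXPECTED_FIGURES):
    """Return sorted relative paths of expected PDFs that do not exist."""
    missing = []
    for directory, names in expected.items():
        for name in names:
            relative = os.path.join(directory, name)
            if not os.path.isfile(os.path.join(repo_root, relative)):
                missing.append(relative)
    return sorted(missing)
```

Calling missing_figures(".") from the repository root returns an empty list once all nine figures exist.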

Section 5.4

  • Execute the following commands
cd integration_dblp
python Combiner.py #Generates prices for queries Q^c_1, Q^c_2, Q^c_3, Q^c_4, Q^c_5, Q^c_6, Q^c_7
  • Execute the following commands
cd integration_crash
python Combiner.py #Generates prices for queries Q^d_1, Q^d_2, Q^d_3, Q^d_4
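The experiments in all sections can also be driven from a single script. The following is a minimal sketch under stated assumptions, not code from the repository: run_experiments is a hypothetical helper, the experiment directories are assumed to sit directly under the repository root, and python is assumed to resolve to the interpreter of the activated sigmodduplicate environment. Each script is started from inside its own directory, which mirrors the cd commands above.

```python
import os
import subprocess

# (directory, script) pairs in the order they appear in this README.
EXPERIMENTS = [
    ("integration", "CombinerBenchmarkPriceBehavior.py"),  # Section 2.4
    ("integration", "PriceSelectivity.py"),                # Section 5.1
    ("integration", "PriceAttributes.py"),
    ("integration", "SwapUpdateFraction.py"),
    ("integration", "SupportSetSize.py"),
    ("integration_ssb", "CombinerReproduce.py"),
    ("integration_ssb", "HistoryAwareQ11.py"),
    ("integration_tpch", "CombinerReproduce.py"),
    ("integration_dblp", "Combiner.py"),                   # Section 5.4
    ("integration_crash", "Combiner.py"),
]


def run_experiments(repo_root, experiments=EXPERIMENTS, runner=subprocess.call):
    """Run each script from inside its own directory; stop at the first failure.

    `runner` defaults to subprocess.call and can be replaced for a dry run.
    Returns a list of (directory, script, exit_code) for the scripts that ran.
    """
    results = []
    for directory, script in experiments:
        workdir = os.path.join(repo_root, directory)
        code = runner(["python", script], cwd=workdir)
        results.append((directory, script, code))
        if code != 0:
            break
    return results
```

Passing a custom runner that only records its arguments gives a dry run that lists what would be executed without starting any experiment.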
