Giter Site home page Giter Site logo

imdb_pg_dataset's Introduction

IMDB / JOB Workload

This repository contains a Vagrant machine that automatically pulls down and imports the IMDB dataset from that paper How Good are Query Optimizers, Really?. Note that, upon provisioning, the VM will download 1.2GB+ of data.

It will create a VM running Arch Linux, upgrade it, install the latest version of Postgres, configure itself to use 16GB of RAM (12GB for the Postgres shared_buffers) and 4 CPU cores, create a 100GB disk image to hold the data, and, finally, download and load an archive. It could break at any moment.

Note: if you would just like to download a Postgres pg_dump of the IMDB dataset, you can get it here: https://doi.org/10.7910/DVN/2QYZBT

To use, first install the persistent storage Vagrant plugin:

vagrant plugin install vagrant-persistent-storage

Next, modify vagrant/Vagrantfile to set a path to where you would like the VDI containing the database to go.

config.persistent_storage.enabled = true
config.persistent_storage.location = "/PATH/TO/STORAGE/LOCATION.vdi"
config.persistent_storage.size = 100000
config.persistent_storage.mountname = 'pg'
config.persistent_storage.filesystem = 'ext4'
config.persistent_storage.mountpoint = '/media/data'
config.persistent_storage.volgroupname = 'myvolgroup'

Then, start up the VM:

cd vagrant
vagrant up
cd ..

You can ignore the last few warnings (about /home/vagrant). Note that this VM will have an open Postgres server, with a single user, imdb, with no password. You don't want to leave it running on a network you don't trust (or without your own firewall).

To connect to the database from your host machine:

psql -U imdb -h localhost

To run one of the JOB queries:

psql -U imdb -h localhost < job/1a.sql

Citation

If you use the JOB dataset, please cite the original authors (no affiliation):

@article{JOB,
  series = {VLDB '15},
  title = {How {{Good Are Query Optimizers}}, {{Really}}?},
  volume = {9},
  issn = {2150-8097},
  doi = {10.14778/2850583.2850594},
  number = {3},
  journal = {Proc. VLDB Endow.},
  author = {Leis, Viktor and Gubichev, Andrey and Mirchev, Atanas and Boncz, Peter and Kemper, Alfons and Neumann, Thomas},
  month = nov,
  year = {2015},
  pages = {204--215}
}

If you use this VM or our prepared dataset, please cite our paper as well:

@inproceedings{rejoin,
  address = {Houston, TX},
  series = {aiDM '18},
  title = {Deep {{Reinforcement Learning}} for {{Join Order Enumeration}}},
  shorttitle = {{{ReJOIN}}},
  booktitle = {First {{International Workshop}} on {{Exploiting Artificial Intelligence Techniques}} for {{Data Management}}},
  author = {Marcus, Ryan and Papaemmanouil, Olga},
  month = jun,
  year = {2018}
}

If you use the CEB datasets, please cite the Flow Loss paper:

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2021. Flow-loss: learning cardinality estimates that matter. Proc. VLDB Endow. 14, 11 (July 2021), 2019–2032. https://doi.org/10.14778/3476249.3476259

@article{flowloss,
author = {Negi, Parimarjan and Marcus, Ryan and Kipf, Andreas and Mao, Hongzi and Tatbul, Nesime and Kraska, Tim and Alizadeh, Mohammad},
title = {Flow-Loss: Learning Cardinality Estimates That Matter},
year = {2021},
issue_date = {July 2021},
publisher = {VLDB Endowment},
volume = {14},
number = {11},
issn = {2150-8097},
url = {https://doi.org/10.14778/3476249.3476259},
doi = {10.14778/3476249.3476259},
journal = {Proc. VLDB Endow.},
month = {jul},
pages = {2019–2032},
numpages = {14}
}

If you use the JOB extended queries, please cite the Neo paper:

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: a learned query optimizer. Proc. VLDB Endow. 12, 11 (July 2019), 1705–1718. https://doi.org/10.14778/3342263.3342644

@article{neo,
author = {Marcus, Ryan and Negi, Parimarjan and Mao, Hongzi and Zhang, Chi and Alizadeh, Mohammad and Kraska, Tim and Papaemmanouil, Olga and Tatbul, Nesime},
title = {Neo: A Learned Query Optimizer},
year = {2019},
issue_date = {July 2019},
publisher = {VLDB Endowment},
volume = {12},
number = {11},
issn = {2150-8097},
url = {https://doi.org/10.14778/3342263.3342644},
doi = {10.14778/3342263.3342644},
journal = {Proc. VLDB Endow.},
month = {jul},
pages = {1705–1718},
numpages = {14}
}

If you use the JOB-D queries, please cite the HybridQO paper (no affiliation):

@article{DBLP:journals/pvldb/YuC0L22,
  author       = {Xiang Yu and
                  Chengliang Chai and
                  Guoliang Li and
                  Jiabin Liu},
  title        = {Cost-based or Learning-based? {A} Hybrid Query Optimizer for Query
                  Plan Selection},
  journal      = {Proc. {VLDB} Endow.},
  volume       = {15},
  number       = {13},
  pages        = {3924--3936},
  year         = {2022},
  url          = {https://www.vldb.org/pvldb/vol15/p3924-li.pdf},
  doi          = {10.14778/3565838.3565846},
  timestamp    = {Mon, 23 Oct 2023 15:31:40 +0200},
  biburl       = {https://dblp.org/rec/journals/pvldb/YuC0L22.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

imdb_pg_dataset's People

Contributors

bindscha avatar ryanmarcus avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

imdb_pg_dataset's Issues

Issue in installing Bao extension

I am following the tutorial "BAO for PostgreSQL"https://rmarcus.info/bao_docs/tutorial.html. I installed postgresql in archlinux box of Vagrant and loaded the imdb dataset. I tested the dataset with the test case specified in the tutorial and it worked fine. When I tried to install the Bao Extension https://rmarcus.info/bao_docs/tutorial/1_pg_setup.html#install-the-bao-extension, it gave a list of errors and warnings. I tried this setup in Ubuntu OS without using Vagrant but the faced with the same problem. The result is listed below:
[vagrant@archlinux BaoForPostgreSQL]$ cd pg_extension
[vagrant@archlinux pg_extension]$ make USE_PGXS=1 install
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fPIC -I. -I./ -I/usr/include/postgresql/server -I/usr/include/postgresql/internal -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -c -o main.o main.c
In file included from main.c:10:
bao_planner.h: In function ‘plan_arm’:
bao_planner.h:220:41: warning: passing argument 2 of ‘standard_planner’ makes pointer from integer without a cast [-Wint-conversion]
220 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~~~
| |
| int
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:64: note: expected ‘const char *’ but argument is of type ‘int’
41 | ern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ~~~~~~~~~~~~^~~~~~~~~~~~

In file included from main.c:10:
bao_planner.h:220:56: warning: passing argument 3 of ‘standard_planner’ makes integer from pointer without a cast [-Wint-conversion]
220 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:42:15: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
42 | int cursorOptions,
| ~~~~^~~~~~~~~~~~~
In file included from main.c:10:
bao_planner.h:220:12: error: too few arguments to function ‘standard_planner’
220 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~~~~~~
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:21: note: declared here
41 | extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ^~~~~~~~~~~~~~~~
In file included from main.c:10:
bao_planner.h:228:43: warning: passing argument 2 of ‘standard_planner’ makes pointer from integer without a cast [-Wint-conversion]
228 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~~~
| |
| int
bao_planner.h:22:5: note: in definition of macro ‘save_arm_options’
22 | { x }
| ^
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:64: note: expected ‘const char *’ but argument is of type ‘int’
41 | ern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ~~~~~~~~~~~~^~~~~~~~~~~~

In file included from main.c:10:
bao_planner.h:228:58: warning: passing argument 3 of ‘standard_planner’ makes integer from pointer without a cast [-Wint-conversion]
228 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
bao_planner.h:22:5: note: in definition of macro ‘save_arm_options’
22 | { x }
| ^
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:42:15: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
42 | int cursorOptions,
| ~~~~^~~~~~~~~~~~~
In file included from main.c:10:
bao_planner.h:228:14: error: too few arguments to function ‘standard_planner’
228 | plan = standard_planner(query_copy, cursorOptions, boundParams);
| ^~~~~~~~~~~~~~~~
bao_planner.h:22:5: note: in definition of macro ‘save_arm_options’
22 | { x }
| ^
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:21: note: declared here
41 | extern PlannedStmt *standard_planner(Query *parse, const char query_string,
| ^~~~~~~~~~~~~~~~
main.c: In function ‘_PG_init’:
main.c:61:16: warning: assignment to ‘planner_hook_type’ {aka ‘PlannedStmt * (
)(Query *, const char *, int, struct ParamListInfoData )’} from incompatible pointer type ‘PlannedStmt * ()(Query *, int, struct ParamListInfoData *)’ [-Wincompatible-pointer-types]
61 | planner_hook = bao_planner;
| ^
main.c: In function ‘bao_planner’:
main.c:166:37: warning: passing argument 2 of ‘prev_planner_hook’ makes pointer from integer without a cast [-Wint-conversion]
166 | return prev_planner_hook(parse, cursorOptions,
| ^~~~~~~~~~~~~
| |
| int
main.c:166:37: note: expected ‘const char *’ but argument is of type ‘int’
main.c:167:30: warning: passing argument 3 of ‘prev_planner_hook’ makes integer from pointer without a cast [-Wint-conversion]
167 | boundParams);
| ^~~~~~~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
main.c:167:30: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
main.c:166:12: error: too few arguments to function ‘prev_planner_hook’
166 | return prev_planner_hook(parse, cursorOptions,
| ^~~~~~~~~~~~~~~~~
main.c:175:36: warning: passing argument 2 of ‘standard_planner’ makes pointer from integer without a cast [-Wint-conversion]
175 | return standard_planner(parse, cursorOptions,
| ^~~~~~~~~~~~~
| |
| int
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:64: note: expected ‘const char *’ but argument is of type ‘int’
41 | ern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ~~~~~~~~~~~~^~~~~~~~~~~~

main.c:176:29: warning: passing argument 3 of ‘standard_planner’ makes integer from pointer without a cast [-Wint-conversion]
176 | boundParams);
| ^~~~~~~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:42:15: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
42 | int cursorOptions,
| ~~~~^~~~~~~~~~~~~
main.c:175:12: error: too few arguments to function ‘standard_planner’
175 | return standard_planner(parse, cursorOptions,
| ^~~~~~~~~~~~~~~~
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:21: note: declared here
41 | extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ^~~~~~~~~~~~~~~~
main.c:187:36: warning: passing argument 2 of ‘standard_planner’ makes pointer from integer without a cast [-Wint-conversion]
187 | return standard_planner(parse, cursorOptions, boundParams);
| ^~~~~~~~~~~~~
| |
| int
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:64: note: expected ‘const char *’ but argument is of type ‘int’
41 | ern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ~~~~~~~~~~~~^~~~~~~~~~~~

main.c:187:51: warning: passing argument 3 of ‘standard_planner’ makes integer from pointer without a cast [-Wint-conversion]
187 | return standard_planner(parse, cursorOptions, boundParams);
| ^~~~~~~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:42:15: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
42 | int cursorOptions,
| ~~~~^~~~~~~~~~~~~
main.c:187:12: error: too few arguments to function ‘standard_planner’
187 | return standard_planner(parse, cursorOptions, boundParams);
| ^~~~~~~~~~~~~~~~
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:21: note: declared here
41 | extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ^~~~~~~~~~~~~~~~
main.c: In function ‘bao_ExplainOneQuery’:
main.c:318:46: warning: passing argument 2 of ‘planner_hook’ makes pointer from integer without a cast [-Wint-conversion]
318 | plan = (planner_hook ? planner_hook(query, cursorOptions, params)
| ^~~~~~~~~~~~~
| |
| int
main.c:318:46: note: expected ‘const char *’ but argument is of type ‘int’
main.c:318:61: warning: passing argument 3 of ‘planner_hook’ makes integer from pointer without a cast [-Wint-conversion]
318 | plan = (planner_hook ? planner_hook(query, cursorOptions, params)
| ^~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
main.c:318:61: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
main.c:318:26: error: too few arguments to function ‘planner_hook’
318 | plan = (planner_hook ? planner_hook(query, cursorOptions, params)
| ^~~~~~~~~~~~
main.c:319:37: warning: passing argument 2 of ‘standard_planner’ makes pointer from integer without a cast [-Wint-conversion]
319 | : standard_planner(query, cursorOptions, params));
| ^~~~~~~~~~~~~
| |
| int
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:64: note: expected ‘const char *’ but argument is of type ‘int’
41 | ern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ~~~~~~~~~~~~^~~~~~~~~~~~

main.c:319:52: warning: passing argument 3 of ‘standard_planner’ makes integer from pointer without a cast [-Wint-conversion]
319 | : standard_planner(query, cursorOptions, params));
| ^~~~~~
| |
| ParamListInfo {aka struct ParamListInfoData *}
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:42:15: note: expected ‘int’ but argument is of type ‘ParamListInfo’ {aka ‘struct ParamListInfoData *’}
42 | int cursorOptions,
| ~~~~^~~~~~~~~~~~~
main.c:319:13: error: too few arguments to function ‘standard_planner’
319 | : standard_planner(query, cursorOptions, params));
| ^~~~~~~~~~~~~~~~
In file included from bao_util.h:8,
from main.c:8:
/usr/include/postgresql/server/optimizer/planner.h:41:21: note: declared here
41 | extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
| ^~~~~~~~~~~~~~~~
main.c:325:5: error: too few arguments to function ‘ExplainOnePlan’
325 | ExplainOnePlan(plan, into, es, queryString,
| ^~~~~~~~~~~~~~
In file included from main.c:17:
/usr/include/postgresql/server/commands/explain.h:90:13: note: declared here
90 | extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
| ^~~~~~~~~~~~~~
main.c:399:3: error: too few arguments to function ‘ExplainOnePlan’
399 | ExplainOnePlan(plan, into, es, queryString,
| ^~~~~~~~~~~~~~
In file included from main.c:17:
/usr/include/postgresql/server/commands/explain.h:90:13: note: declared here
90 | extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
| ^~~~~~~~~~~~~~
make: *** [: main.o] Error 1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.