Giter Site home page Giter Site logo

hyrise / sql-parser Goto Github PK

View Code? Open in Web Editor NEW
734.0 48.0 243.0 2.9 MB

SQL Parser for C++. Building C++ object structure from SQL statements.

License: MIT License

Makefile 0.95% Python 0.15% Shell 0.65% C++ 88.52% C 2.56% Yacc 6.15% Lex 1.03%
sql c-plus-plus sql-parser hyrise

sql-parser's Introduction

Build Status Coverage Status CodeFactor

Welcome to Hyrise

Hyrise is a research in-memory database system that has been developed by HPI since 2009 and has been entirely rewritten in 2017. Our goal is to provide a clean and flexible platform for research in the area of in-memory data management. Its architecture allows us, our students, and other researchers to conduct experiments around new data management concepts. To enable realistic experiments, Hyrise features comprehensive SQL support and performs powerful query plan optimizations. Well-known benchmarks, such as TPC-H or TPC-DS, can be executed with a single command and without any preparation.

This readme file focuses on the technical aspects of the repository. For more background on our research and for a list of publications, please visit the Hyrise project page.

You can still find the (archived) previous version of Hyrise on Github.

Citation

When referencing this version of Hyrise, please use the following bibtex entry:

(click to expand)
@inproceedings{DBLP:conf/edbt/DreselerK0KUP19,
  author    = {Markus Dreseler and
               Jan Kossmann and
               Martin Boissier and
               Stefan Klauck and
               Matthias Uflacker and
               Hasso Plattner},
  editor    = {Melanie Herschel and
               Helena Galhardas and
               Berthold Reinwald and
               Irini Fundulaki and
               Carsten Binnig and
               Zoi Kaoudi},
  title     = {Hyrise Re-engineered: An Extensible Database System for Research in
               Relational In-Memory Data Management},
  booktitle = {Advances in Database Technology - 22nd International Conference on
               Extending Database Technology, {EDBT} 2019, Lisbon, Portugal, March
               26-29, 2019},
  pages     = {313--324},
  publisher = {OpenProceedings.org},
  year      = {2019},
  url       = {https://doi.org/10.5441/002/edbt.2019.28},
  doi       = {10.5441/002/edbt.2019.28},
  timestamp = {Mon, 18 Mar 2019 16:09:00 +0100},
  biburl    = {https://dblp.org/rec/conf/edbt/DreselerK0KUP19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Supported Systems

Hyrise is developed for Linux (preferrably the most current Ubuntu version) and optimized to run on server hardware. We support Mac to facilitate the local development of Hyrise, but do not recommend it for benchmarking.

Supported Benchmarks

We support a number of benchmarks out of the box. This makes it easy to generate performance numbers without having to set up the data generation, loading CSVs, and finding a query runner. You can run them using the ./hyriseBenchmark* binaries.

Note that the query plans are generated in our CI pipeline with possibly many stages in parallel and different CI runs might be executed on different machines. Reported runtimes are not to be taken as solid benchmark performance numbers.

Benchmark Notes
TPC-DS Query Plans
TPC-H Query Plans
Join Order Query Plans
Star Schema Query Plans
JCC-H Call the hyriseBenchmarkTPCH binary with the -j flag.
TPC-C In development, no proper optimization done yet

Getting started

Have a look at our contributor guidelines.

You can find definitions of most of the terms and abbreviations used in the code in the glossary. If you cannot find something that you are looking for, feel free to open an issue.

The Step by Step Guide is a good starting point to get to know Hyrise.

Native Setup

You can install the dependencies on your own or use the install_dependencies.sh script (recommended) which installs all of the therein listed dependencies and submodules. The install script was tested under macOS Monterey (12.4) and Ubuntu 22.04.

See dependencies for a detailed list of dependencies to use with brew install or apt-get install, depending on your platform. As compilers, we generally use recent versions of clang and gcc (Linux only). Please make sure that the system compiler points to the most recent version or use cmake (see below) accordingly. Older versions may work, but are neither tested nor supported.

Nix Setup

You can build Hyrise using Nix. To do so, first install Nix on your current operating system. Afterward, run the following command in the root of the repository:

nix-shell resources/nix --pure

This will drop you into a shell with all dependencies installed. You can now build Hyrise as usual. Please note that using the --pure flag is recommended as it avoids using dependencies from the local system.

For more information on Nix, see Nix Packages.

Setup using Docker

If you want to create a Docker-based development environment using CLion, head over to our dedicated tutorial.

Otherwise, to get all dependencies of Hyrise into a Docker image, run

docker build -t hyrise .

You can start the container via

docker run -it hyrise

Inside the container, you can then checkout Hyrise and run ./install_dependencies.sh to download the required submodules.

Building and Tooling

It is highly recommended to perform out-of-source builds, i.e., creating a separate directory for the build. Advisable names for this directory would be cmake-build-{debug,release}, depending on the build type. Within this directory call cmake .. to configure the build. By default, we use very strict compiler flags (beyond -Wextra, including -Werror). If you use one of the officially supported environments, this should not be an issue. If you simply want to test Hyrise on a different system and run into issues, you can call cmake -DHYRISE_RELAXED_BUILD=On .., which will disable these strict checks. Subsequent calls to CMake, e.g., when adding files to the build will not be necessary, the generated Makefiles will take care of that.

Compiler choice

CMake will default to your system's default compiler. To use a different one, call cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .. in a clean build directory. See dependencies for supported compiler versions.

Unity Builds

Starting with cmake 3.16, you can use -DCMAKE_UNITY_BUILD=On to perform unity builds. For a complete (re-)build or when multiple files have to be rebuilt, these are usually faster, as the relative cost of starting a compiler process and loading the most common headers is reduced. However, this only makes sense for debug builds. See our blog post on reducing the compilation time for details.

ccache

For development, you may want to use ccache, which reduces the time needed for recompiles significantly. Especially when switching branches, this can reduce the time to recompile from several minutes to one or less. On the downside, we have seen random build failures on our CI server, which is why we do not recommend ccache anymore but merely list it as an option. To use ccache, add -DCMAKE_CXX_COMPILER_LAUNCHER=ccache to your cmake call. You will need to adjust some ccache settings either in your environment variables or in your ccache config so that ccache can handle the precompiled headers. On our CI server, this worked for us: CCACHE_SLOPPINESS=file_macro,pch_defines,time_macros CCACHE_DEPEND=1.

Build

Simply call make -j*, where * denotes the number of threads to use.

Usually debug binaries are created. To configure a build directory for a release build make sure it is empty and call CMake like cmake -DCMAKE_BUILD_TYPE=Release

Lint

./scripts/lint.sh (Google's cpplint is used for the database code. In addition, we use flake8 for linting the Python scripts under /scripts.)

Format

./scripts/format.sh (clang-format is used for the database code. We use black for formatting the Python scripts under /scripts.)

Test

Calling make hyriseTest from the build directory builds all available tests. The binary can be executed with ./<YourBuildDirectory>/hyriseTest. Subsets of all available tests can be selected via --gtest_filter=.

Coverage

./scripts/coverage.sh will print a summary to the command line and create detailed html reports at ./coverage/index.html

Requires clang on macOS and Linux.

Address/UndefinedBehavior Sanitizers

cmake -DENABLE_ADDR_UB_LEAK_SANITIZATION=ON will generate Makefiles with AddressSanitizer, LeakSanitizer, and Undefined Behavior options. Compile and run them as normal - if any issues are detected, they will be printed to the console. It will fail on the first detected error and will print a summary. To convert addresses to actual source code locations, make sure llvm-symbolizer is installed (included in the llvm package) and is available in $PATH. To specify a custom location for the symbolizer, set $ASAN_SYMBOLIZER_PATH to the path of the executable. This seems to work out of the box on macOS - if not, make sure to have llvm installed. The binary can be executed with LSAN_OPTIONS=suppressions=asan-ignore.txt ./<YourBuildDirectory>/hyriseTest.

cmake -DENABLE_THREAD_SANITIZATION=ON will work as above but with the ThreadSanitizer. Some sanitizers are mutually exclusive, which is why we use two configurations for this.

Compile Times

When trying to optimize the time spent building the project, it is often helpful to have an idea how much time is spent where. scripts/compile_time.sh helps with that. Get usage instructions by running it without any arguments.

Maintainers

  • Martin Boissier
  • Daniel Lindner
  • Marcel Weisgut

Contact: [email protected]

Maintainers Emeriti

  • Markus Dreseler
  • Stefan Halfpap
  • Jan Kossmann

Contributors

  • Yannick Bäumer
  • Lawrence Benson
  • Jasper Blum
  • Lukas Budach
  • Timo Djürken
  • Alexander Dubrawski
  • Fabian Dumke
  • Leonard Geier
  • Richard Ebeling
  • Fabian Engel
  • Ben-Noah Engelhaupt
  • Moritz Eyssen
  • Martin Fischer
  • Christian Flach
  • Pedro Flemming
  • Mathias Flüggen
  • Johannes Frohnhofen
  • Pascal Führlich
  • Carl Gödecken
  • Adrian Holfter
  • Theresa Hradilak
  • Ben Hurdelhey
  • Sven Ihde
  • Ivan Illic
  • Jonathan Janetzki
  • Michael Janke
  • Max Jendruk
  • Tobias Jordan
  • David Justen
  • Youri Kaminsky
  • Marvin Keller
  • Mirko Krause
  • Eva Krebs
  • Henok Lachmann
  • Sven Lehmann
  • Till Lehmann
  • Tom Lichtenstein
  • Alexander Löser
  • Jan Mattfeld
  • Arne Mayer
  • Dominik Meier
  • Julian Menzler
  • Torben Meyer
  • Leander Neiß
  • Vincent Rahn
  • Hendrik Rätz
  • Robert Richter
  • Niklas Riekenbrauck
  • Alexander Riese
  • Marc Rosenau
  • Johannes Schneider
  • David Schumann
  • Simon Siegert
  • Arthur Silber
  • Furkan Simsek
  • Toni Stachewicz
  • Daniel Stolpe
  • Jonathan Striebel
  • Nils Thamm
  • Hendrik Tjabben
  • Justin Trautmann
  • Carsten Walther
  • Leo Wendt
  • Lukas Wenzel
  • Fabian Wiebe
  • Tim Zimmermann

sql-parser's People

Contributors

ankushrayabhari avatar bensk1 avatar bmwiedemann avatar bouncner avatar cclauss avatar cdmh avatar chrisxu333 avatar d-justen avatar dencrash avatar dey4ss avatar f4lco avatar j-tr avatar javrucebo avatar julianmenzler avatar klauck avatar lawben avatar mengke-mk avatar mgonnav avatar mrks avatar msiomkin avatar mweisgut avatar nanxiao avatar santazhang avatar schwald avatar simonstadlinger avatar tennyzhuang avatar till-later avatar timzimmermann avatar torpedro avatar valsight-foss avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql-parser's Issues

Properly test the print methods

As seen in #111, the print* methods are entirely untested. Since we want this to be useful for people outside of Hyrise, too, we should make sure that we add tests for this (and any other untested parts).

how to support the sql syntax ?

the sql is following:

select * from (
SELECT o.OBJECT_NAME, o.OBJECT_ID ,'' short_name, decode(bitand(t.property, 32), 32, 'YES', 'NO') partitioned,
decode(bitand(t.property, 64), 64, 'IOT',
decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null))) iot_type,
o.OWNER OBJECT_OWNER, o.CREATED, o.LAST_DDL_TIME, O.GENERATED, O.TEMPORARY, case when xt.obj# is null then 'N' else 'Y' end EXTERNAL
FROM SYS.Dba_OBJECTS O ,sys.tab$ t, sys.external_tab$ xt
WHERE O.OWNER = :SCHEMA
and o.object_id = t.obj#(+)
and o.object_id = xt.obj#(+)
AND O.OBJECT_TYPE = 'TABLE'
union all
SELECT OBJECT_NAME, OBJECT_ID , syn.SYNONYM_NAME short_NAME, decode(bitand(t.property, 32), 32, 'YES', 'NO') partitioned,
decode(bitand(t.property, 64), 64, 'IOT',
decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null))) iot_type,
SYN.TABLE_OWNER OBJECT_OWNER, o.CREATED, o.LAST_DDL_TIME, O.GENERATED, O.TEMPORARY, case when xt.obj# is null then 'N' else 'Y' end EXTERNAL
FROM SYS.Dba_OBJECTS O, sys.user_synonyms syn,sys.tab$ t, sys.external_tab$ xt
WHERE syn.table_owner = o.owner
and syn.TABLE_NAME = o.object_NAME
and o.object_id = t.obj#
and o.object_id = xt.obj#(+)
and o.object_type = 'TABLE'
and :INCLUDE_SYNS = 1
)
WHERE OBJECT_NAME NOT IN (SELECT OBJECT_NAME FROM RECYCLEBIN)
AND not object_name like 'BIN$%'

when use the test :sql-parser-master/example "xxx", the xxx is the up sql , then run result is following

Given string is not a valid SQL query.
syntax error, unexpected $undefined (L0:507)

the current sql-parser-master can't parse the paramter field, the specail character as above "t.obj#", “xt.obj#(+)” ,"tab.$"

Segment fault with example while using |create table statement|

I built the code on Mac and tried executing the example with below statement

./example "CREATE TABLE Persons (PersonID int);"

And this results in a SEGFAULT. So I tried to debug and got the following stack

$ lldb example
(lldb) target create "example"
Current executable set to 'example' (x86_64).
(lldb) r "CREATE TABLE Persons (PersonID int);"
Process 45712 launched: '/Users/user/workspace/sql-parser/example/example' (x86_64)
Parsed successfully!
Number of statements: 1
CreateStatment
	Persons
Process 45712 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fffdae6ab52 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen:
->  0x7fffdae6ab52 <+18>: pcmpeqb (%rdi), %xmm0
    0x7fffdae6ab56 <+22>: pmovmskb %xmm0, %esi
    0x7fffdae6ab5a <+26>: andq   $0xf, %rcx
    0x7fffdae6ab5e <+30>: orq    $-0x1, %rax
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fffdae6ab52 libsystem_c.dylib`strlen + 18
    frame #1: 0x0000000100092ccd libsqlparser.so`hsql::inprint(char const*, unsigned long) + 93
    frame #2: 0x0000000100000c19 example`main + 489
    frame #3: 0x00007fffdae34235 libdyld.dylib`start + 1
    frame #4: 0x00007fffdae34235 libdyld.dylib`start + 1

License mismatch

sql-parser/LICENSE currently states at the top:

This computer program can be used by anyone in accordance with the OpenSource
licence of the Hasso-Plattner Institute. Its terms of use can be downloaded
under the URL https://github.com/hyrise/hyrise/blob/master/LICENSE

and then goes on to spell out the terms of the Hasso-Plattner OpenSource license. However, the referenced hyrise/LICENSE is the standard MIT license.

Could you please clarify which license - Hasso-Plattner OpenSource or MIT - currently applies to sql-parser? Thanks!

Join multiple tables

This typical OLAP query (join fact table with multiple dimensions) is not supported

SELECT City.name, Product.category, SUM(price) FROM fact
INNER JOIN City ON fact.city_id = City.id
INNER JOIN Product ON fact.product_id = Product.id
GROUP BY City.name, Product.category

It gives "syntax error, unexpected INNER, expecting $end", however it is a valid SQL syntax.

multiple UNION support

TPC-DS queries contain multiple, not nested UNION operations.

Example: TPC-DS query 71, netezza dialect:

select i_brand_id brand_id, i_brand brand,t_hour,t_minute,
 	sum(ext_price) ext_price
 from item, (select ws_ext_sales_price as ext_price, 
                        ws_sold_date_sk as sold_date_sk,
                        ws_item_sk as sold_item_sk,
                        ws_sold_time_sk as time_sk  
                 from web_sales,date_dim
                 where d_date_sk = ws_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 union all
                 select cs_ext_sales_price as ext_price,
                        cs_sold_date_sk as sold_date_sk,
                        cs_item_sk as sold_item_sk,
                        cs_sold_time_sk as time_sk
                 from catalog_sales,date_dim
                 where d_date_sk = cs_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 union all
                 select ss_ext_sales_price as ext_price,
                        ss_sold_date_sk as sold_date_sk,
                        ss_item_sk as sold_item_sk,
                        ss_sold_time_sk as time_sk
                 from store_sales,date_dim
                 where d_date_sk = ss_sold_date_sk
                   and d_moy=12
                   and d_year=2002
                 ) tmp,time_dim
 where
   sold_item_sk = i_item_sk
   and i_manager_id=1
   and time_sk = t_time_sk
   and (t_meal_time = 'breakfast' or t_meal_time = 'dinner')
 group by i_brand, i_brand_id,t_hour,t_minute
 order by ext_price desc, i_brand_id
 ;

SQL keyword as alias

Some TPC-DS queries contain sql keywords as aliases, e.g., order. The parser interprets these aliases as sql keywords for the corresponding sql operations.

Example, TPC-DS query 94, netezza dialect:

select  
   count(distinct ws_order_number) as "order count"
  ,sum(ws_ext_ship_cost) as "total shipping cost"
  ,sum(ws_net_profit) as "total net profit"
from
   web_sales ws1
  ,date_dim
  ,customer_address
  ,web_site
where
    d_date between '2002-5-01' and 
           (cast('2002-5-01' as date) + 60 days)
and ws1.ws_ship_date_sk = d_date_sk
and ws1.ws_ship_addr_sk = ca_address_sk
and ca_state = 'OK'
and ws1.ws_web_site_sk = web_site_sk
and web_company_name = 'pri'
and exists (select *
            from web_sales ws2
            where ws1.ws_order_number = ws2.ws_order_number
              and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
and not exists(select *
               from web_returns wr1
               where ws1.ws_order_number = wr1.wr_order_number)
order by count(distinct ws_order_number)
limit 100;

OVER (PARTITION BY …) support

TPC-DS queries contain the OVER() clause. Using this clause, PARTITION BY is often used additionally.

Example: TPC-DS query 12, natezza dialect:

select  i_item_id
      ,i_item_desc 
      ,i_category 
      ,i_class 
      ,i_current_price
      ,sum(ws_ext_sales_price) as itemrevenue 
      ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
          (partition by i_class) as revenueratio
from	
	web_sales
    	,item 
    	,date_dim
where 
	ws_item_sk = i_item_sk 
  	and i_category in ('Men', 'Books', 'Electronics')
  	and ws_sold_date_sk = d_date_sk
	and d_date between cast('2001-06-15' as date) 
				and (cast('2001-06-15' as date) + 30 days)
group by 
	i_item_id
        ,i_item_desc 
        ,i_category
        ,i_class
        ,i_current_price
order by 
	i_category
        ,i_class
        ,i_item_id
        ,i_item_desc
        ,revenueratio
limit 100;

why i run the make in the src/parser , fail ?

/home/cxx/sql-parser-master/src/parser
[root@thomas-dev parser]# make clean;make
rm -f bison_parser.cpp flex_lexer.cpp bison_parser.h flex_lexer.h *.output
bison (GNU Bison) 2.4.1
bison bison_parser.y --output=bison_parser.cpp --defines=bison_parser.h --verbose
bison_parser.y:63.18-21: syntax error, unexpected identifier
make: *** [bison_parser.cpp] 错误 1
[root@thomas-dev parser]# pwd
/home/cxx/sql-parser-master/src/parser

DB support list

Could you please help to list what databases are supported in this parser?
Oracle/Mysql/DB2/etc.

how can the bison_paser.y to support the follow insert syntax ?

./example/example "insert into XT_CZRZB (CZRZBH,CZLB,CZTJ,CZCT,CZR,CZDWMC,CXDWBM,CZSJ,CZRIP) VALUES ('RZ320100221612' || to_char('5287','FM000000'),'111','gmsfzhm = ''1'' and xm = ''2'' and xb = ''1'' and csrq = ''19901202'' and hjqh like ''11____'' and bmch = ''3'' and zjzl = ''100'' and zjhm = ''4'' and hjzrq like ''320102______'' and sf = ''400'' and mz = ''01'' and rylb = ''1'' and ryxzzzrq like ''320100______'' and zc = ''02'' and fwcs = ''123'' and lxdh = ''777777'' and jgqx like ''1101__'' and rysx = ''111100000000000000000000000000000000000000000000000000000000''','101010','yaoshibin','kjxxhc','320100220000','20161221133856','10.33.2.68')"
Given string is not a valid SQL query.
syntax error, unexpected CONCAT, expecting ')' or ',' (L0:99)

UTF-8 support

It is not possible to pass non ascii identifiers right now. It would be great to support utf-8 queries.

Flex can process utf-8 characters with regex like this

%option 8bit

([A-Za-z0-9_]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xf0-\xf4][\x80-\xbf][\x80-\xbf][\x80-\xbf])* {
    yylval->sval = strdup(yytext);
    return SQL_IDENTIFIER;
}

Improve memory management

Currently the parse tree contains a large number of raw pointers to maintain the nodes. This means that we always need to explicitly delete all children upon deletion of a node. While we do have an automatic memory leak checker, the risk of memory leaks is still there and handling memory this way is cumbersome.

Possible improvements:

  • Use std::string instead of char*
  • Use smart pointers instead of raw pointers

We should measure the performance impact of using smart pointers. In general, I think the advantages of smart memory management will outweigh possible performance drawbacks.

Related Blog Post: Feeding a Bison with tasty C++11 grAST!

Table Alias Ambiguity

This statement is currently ambiguous:

SELECT * FROM table AS foo

Is the foo referring to "table" or to the Select query?
Find out how to resolve this...

Arbitrary text allowed behind SQL query

(debug)> SELECT * FROM t ICANDOMAGICHERE;
=== Columns
|       a|       b|
|     int|     int|
=== Chunk 0 ===
|   12345|       1|
|     123|       2|
|    1234|       3|
===
3 rows total (COMPILE: 13 µs, EXECUTE: 207 µs (wall time))
(debug)> SELECT * FROM t WHERE a=b ICANDOMAGICHERE;
SQL query not valid.
SQL query:
==========
SELECT * FROM t WHERE a=b ICANDOMAGICHERE;
                          ^=== ERROR HERE!
==========
Error line: 0
Error column: 26
Error message: syntax error, unexpected IDENTIFIER, expecting $end

Run benchmark error

make benchmark

g++ -std=c++11 -Wall -Isrc/ -L./ -O3 benchmark/parser_benchmark.cpp benchmark/benchmark.cpp benchmark/queries.cpp benchmark/benchmark_utils.cpp -o bin/benchmark -lbenchmark -lpthread -lsqlparser -lstdc++ -lstdc++fs
benchmark/benchmark.cpp: In function ‘int main(int, char**)’:
benchmark/benchmark.cpp:11:82: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:11:82: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:11:82: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:13:85: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:13:85: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:13:85: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:20:82: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:20:82: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:20:82: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(p_name.c_str(), &BM_ParseBenchmark, query.second);
^
benchmark/benchmark.cpp:23:85: error: no matching function for call to ‘RegisterBenchmark(const char*, void ()(benchmark::State&, const string&), const std::basic_string&)’
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/benchmark.cpp:23:85: note: candidates are:
In file included from benchmark/benchmark.cpp:1:0:
/usr/local/include/benchmark/benchmark.h:958:29: note: benchmark::internal::Benchmark
benchmark::RegisterBenchmark(const char*, void ()(benchmark::State&))
inline internal::Benchmark
RegisterBenchmark(const char* name,
^
/usr/local/include/benchmark/benchmark.h:958:29: note: candidate expects 2 arguments, 3 provided
/usr/local/include/benchmark/benchmark.h:966:22: note: template benchmark::internal::Benchmark* benchmark::RegisterBenchmark(const char*, Lambda&&)
internal::Benchmark* RegisterBenchmark(const char* name, Lambda&& fn) {
^
/usr/local/include/benchmark/benchmark.h:966:22: note: template argument deduction/substitution failed:
benchmark/benchmark.cpp:23:85: note: candidate expects 2 arguments, 3 provided
benchmark::RegisterBenchmark(t_name.c_str(), &BM_TokenizeBenchmark, query.second);
^
benchmark/queries.cpp:3:35: fatal error: experimental/filesystem: No such file or directory
#include <experimental/filesystem>
^
compilation terminated.
make: *** [bin/benchmark] Error 1

make

the README says 'make build' should work, however there is no target for 'build'. Also:

[bray@localhost sql-parser]$ make
Compiling...
bison_parser.y:71.20-26: error: syntax error, unexpected {...}
make[2]: *** [bison_parser.cpp] Error 1
make[1]: *** [parser/bison_parser.cpp] Error 2
make: *** [test] Error 1

Will your parser cope with a specific kind of request?

For example, the query:

	|-- Найти КУТЫ у которых модель принадлежит другому производителю, нежели указано в карточке КУта(и товара) с доп информацией. по определенному производителю.
	|-- ВыбПроизводитель1=Справочник.Производители
	|SELECT 
	|	спрК1.ID as [Кут1 $Справочник.КлассификаторТоваров]
	|	, (SELECT COUNT(спрКО1.ID) FROM $Справочник.КлассифТоваров_КлючевыеХарактеристики as спрКО1 WHERE спрКО1.PARENTEXT = спрК1.ID) as КолвоКО1
	|	, $спрК1.КомментарийОптовогоПрайса as КомментарийОптовогоПрайса
	|	, спрК2.ID as [Кут2 $Справочник.КлассификаторТоваров]
	|	, (SELECT COUNT(спрКО2.ID) FROM $Справочник.КлассифТоваров_КлючевыеХарактеристики as спрКО2 WHERE спрКО2.PARENTEXT = спрК2.ID) as КолвоКО2
	|	, $спрК2.КомментарийОптовогоПрайса as КомментарийОптовогоПрайса
	|FROM 
	|	$Справочник.КлассификаторТоваров as спрК1(NOLOCK) 
	|	INNER JOIN $Справочник.КлассификаторТоваров спрК2(NOLOCK) on ($спрК1.Модель = $спрК2.Модель AND $спрК1.Производитель <> $спрК2.Производитель)
	|WHERE
	|	$спрК1.Производитель = :ВыбПроизводитель1

I need an output of the following form:
спрК1: $Справочник.КлассификаторТоваров
спрК2: $Справочник.КлассификаторТоваров

Dose "SQLStatementTransformer" have been implemented or not?

We read the paper provided in README.md and want to follow this work. However, I fail to find the SQLStatementTransformer and SQLQueryTask (in codes) which are described in your paper. So I wonder if these modulars have not been implemented?

Aliases starting with a digit are not supported

Aliases starting with a digit are evaluated as invalid.

Example 1:

SELECT x AS "2ndColumn"
FROM table1

Example 2 (TPC-DS query 50, netezza dialect):

select  
   s_store_name
  ,s_company_id
  ,s_street_number
  ,s_street_name
  ,s_street_type
  ,s_suite_number
  ,s_city
  ,s_county
  ,s_state
  ,s_zip
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end)  as "30 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and 
                 (sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end )  as "31-60 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and 
                 (sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end)  as "61-90 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 90) and
                 (sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end)  as "91-120 days" 
  ,sum(case when (sr_returned_date_sk - ss_sold_date_sk  > 120) then 1 else 0 end)  as ">120 days" 
from
   store_sales
  ,store_returns
  ,store
  ,date_dim d1
  ,date_dim d2
where
    d2.d_year = 2001
and d2.d_moy  = 8
and ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk
and ss_sold_date_sk   = d1.d_date_sk
and sr_returned_date_sk   = d2.d_date_sk
and ss_customer_sk = sr_customer_sk
and ss_store_sk = s_store_sk
group by
   s_store_name
  ,s_company_id
  ,s_street_number
  ,s_street_name
  ,s_street_type
  ,s_suite_number
  ,s_city
  ,s_county
  ,s_state
  ,s_zip
order by s_store_name
        ,s_company_id
        ,s_street_number
        ,s_street_name
        ,s_street_type
        ,s_suite_number
        ,s_city
        ,s_county
        ,s_state
        ,s_zip
limit 100;

Unable to use in Windows

I tried to use the parser in Windows with Visual studio 2012. I was finally somehow succeeded in compiling the basic project. But the flex_lexer.cpp file throws the fatal error that the limit of nested conditions exceeded. i.e. "flex_lexer.l(215): fatal error C1061: compiler limit : blocks nested too deeply"
Can you please help me out here what i am missing and how can i make this thing to work.

Parse error in multiple table join

This is my query

select * from events join sensors on events.sid = sensors.sid join entities on entities.eid = sensors.eid where entities.name = "Ball 1" and events.v < 1000000

and here is the error message:
syntax error, unexpected JOIN, expecting $end (L0:62)

The library still performs well with the simple joining of two table.

why the result is different in the whereclause between the select with count(*) and select *

[root@thomas-dev sql-parser-master]# ./example/example "select count(*) from t1 where doctor_dept_id = '0095' and doctor_name = 'jack'"
Parsed successfully!
Number of statements: 1
SelectStatement
Fields:
count
[root@thomas-dev sql-parser-master]#
[root@thomas-dev sql-parser-master]# ./example/example "select * from t1 where doctor_dept_id = '0095' and doctor_name = 'jack'"
Parsed successfully!
Number of statements: 1
SelectStatement
Fields:
*
Sources:
t1
Search Conditions:
AND
9
doctor_dept_id
0095
9
doctor_name
jack

Unable to make after renaming the 'sql-parser' folder

I forked your repository and use it as a submodule.
Before I ran make I rename the folder to 'sqlParser' and then run the make command.
Results:

[user@localhost sqlParser]$ make
make -C src/parser/
make[1]: Entering directory `/home/user/general/core/external/sqlParser/src/parser'
bison bison_parser.y -v
bison_parser.y:73.20-26: error: syntax error, unexpected {...}
make[1]: *** [bison_parser.cpp] Error 1
make[1]: Leaving directory `/home/user/general/core/external/sqlParser/src/parser'
make: *** [parser] Error 2
[user@localhost sqlParser]$ 

Clarify licsense.

Please make a statement with regard to the license this is released under.

SQL syntax caching

Does this parser (or flex/bison) support caching syntax? I realize it very take time when parsing the same syntax with different argument several times. It would be better if we cache them.

P/s: maybe the link about your paper already died(http://torpedro.com/paper/HyriseSQL-03-2015.pdf), it responsed code 404. Could you update it pls?

Use Makefile Generation Script

Use one of the following:

  • Cmake
  • automake
  • premake

Update 09 Dec 2019: cmake seems to have won the race in open-source software. Let's stick to that.

Multiple "WHEN" conditions not supported in a "CASE" statement

The following statement has two "WHEN" conditions and an error ("syntax error, unexpected WHEN") is returned when trying to parse it by calling SQLParser::parseSQLString:

SELECT (CASE
WHEN ("Calcs"."int0" = "NULL") OR ("Calcs"."int1" = "NULL") THEN "NULL"
WHEN "Calcs"."int0" > "Calcs"."int1" THEN "Calcs"."int0"
ELSE "Calcs"."int1" END) AS "TEMP_Test__1523549003__0_"
FROM "Calcs"

If the statement is changed to only contain one "WHEN" condition, then the parsing succeeds:
SELECT (CASE
WHEN ("Calcs"."int0" = "NULL") OR ("Calcs"."int1" = "NULL") THEN "NULL"
ELSE "Calcs"."int1" END) AS "TEMP_Test__1523549003__0_"
FROM "Calcs"

MSVC build

It seems that it is possible to build sql-parser with msvc (I built it with msvc 2015), however there is no <unistd.h> on Windows.

WITH support

Many TPC-DS queries (1, 2, 4, 5, 11, 14, ...) contain the WITH clause which is currently not supported by the parser.

TPC-DS query 14, natezza dialect:

with  cross_items as
 (select i_item_sk ss_item_sk
 from item,
 (select iss.i_brand_id brand_id
     ,iss.i_class_id class_id
     ,iss.i_category_id category_id
 from store_sales
     ,item iss
     ,date_dim d1
 where ss_item_sk = iss.i_item_sk
   and ss_sold_date_sk = d1.d_date_sk
   and d1.d_year between 1999 AND 1999 + 2
 intersect 
 select ics.i_brand_id
     ,ics.i_class_id
     ,ics.i_category_id
 from catalog_sales
     ,item ics
     ,date_dim d2
 where cs_item_sk = ics.i_item_sk
   and cs_sold_date_sk = d2.d_date_sk
   and d2.d_year between 1999 AND 1999 + 2
 intersect
 select iws.i_brand_id
     ,iws.i_class_id
     ,iws.i_category_id
 from web_sales
     ,item iws
     ,date_dim d3
 where ws_item_sk = iws.i_item_sk
   and ws_sold_date_sk = d3.d_date_sk
   and d3.d_year between 1999 AND 1999 + 2)
 where i_brand_id = brand_id
      and i_class_id = class_id
      and i_category_id = category_id
),
 avg_sales as
 (select avg(quantity*list_price) average_sales
  from (select ss_quantity quantity
             ,ss_list_price list_price
       from store_sales
           ,date_dim
       where ss_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all 
       select cs_quantity quantity 
             ,cs_list_price list_price
       from catalog_sales
           ,date_dim
       where cs_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2 
       union all
       select ws_quantity quantity
             ,ws_list_price list_price
       from web_sales
           ,date_dim
       where ws_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2) x)
  select  channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales)
 from(
       select 'store' channel, i_brand_id,i_class_id
             ,i_category_id,sum(ss_quantity*ss_list_price) sales
             , count(*) number_sales
       from store_sales
           ,item
           ,date_dim
       where ss_item_sk in (select ss_item_sk from cross_items)
         and ss_item_sk = i_item_sk
         and ss_sold_date_sk = d_date_sk
         and d_year = 1999+2 
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)
       union all
       select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales
       from catalog_sales
           ,item
           ,date_dim
       where cs_item_sk in (select ss_item_sk from cross_items)
         and cs_item_sk = i_item_sk
         and cs_sold_date_sk = d_date_sk
         and d_year = 1999+2 
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales)
       union all
       select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales
       from web_sales
           ,item
           ,date_dim
       where ws_item_sk in (select ss_item_sk from cross_items)
         and ws_item_sk = i_item_sk
         and ws_sold_date_sk = d_date_sk
         and d_year = 1999+2
         and d_moy = 11
       group by i_brand_id,i_class_id,i_category_id
       having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales)
 ) y
 group by rollup (channel, i_brand_id,i_class_id,i_category_id)
 order by channel,i_brand_id,i_class_id,i_category_id
 limit 100;
with  cross_items as
 (select i_item_sk ss_item_sk
 from item,
 (select iss.i_brand_id brand_id
     ,iss.i_class_id class_id
     ,iss.i_category_id category_id
 from store_sales
     ,item iss
     ,date_dim d1
 where ss_item_sk = iss.i_item_sk
   and ss_sold_date_sk = d1.d_date_sk
   and d1.d_year between 1999 AND 1999 + 2
 intersect
 select ics.i_brand_id
     ,ics.i_class_id
     ,ics.i_category_id
 from catalog_sales
     ,item ics
     ,date_dim d2
 where cs_item_sk = ics.i_item_sk
   and cs_sold_date_sk = d2.d_date_sk
   and d2.d_year between 1999 AND 1999 + 2
 intersect
 select iws.i_brand_id
     ,iws.i_class_id
     ,iws.i_category_id
 from web_sales
     ,item iws
     ,date_dim d3
 where ws_item_sk = iws.i_item_sk
   and ws_sold_date_sk = d3.d_date_sk
   and d3.d_year between 1999 AND 1999 + 2) x
 where i_brand_id = brand_id
      and i_class_id = class_id
      and i_category_id = category_id
),
 avg_sales as
(select avg(quantity*list_price) average_sales
  from (select ss_quantity quantity
             ,ss_list_price list_price
       from store_sales
           ,date_dim
       where ss_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all
       select cs_quantity quantity
             ,cs_list_price list_price
       from catalog_sales
           ,date_dim
       where cs_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2
       union all
       select ws_quantity quantity
             ,ws_list_price list_price
       from web_sales
           ,date_dim
       where ws_sold_date_sk = d_date_sk
         and d_year between 1999 and 1999 + 2) x)
  select  this_year.channel ty_channel
                           ,this_year.i_brand_id ty_brand
                           ,this_year.i_class_id ty_class
                           ,this_year.i_category_id ty_category
                           ,this_year.sales ty_sales
                           ,this_year.number_sales ty_number_sales
                           ,last_year.channel ly_channel
                           ,last_year.i_brand_id ly_brand
                           ,last_year.i_class_id ly_class
                           ,last_year.i_category_id ly_category
                           ,last_year.sales ly_sales
                           ,last_year.number_sales ly_number_sales 
 from
 (select 'store' channel, i_brand_id,i_class_id,i_category_id
        ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales
 from store_sales 
     ,item
     ,date_dim
 where ss_item_sk in (select ss_item_sk from cross_items)
   and ss_item_sk = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_week_seq = (select d_week_seq
                     from date_dim
                     where d_year = 1999 + 1
                       and d_moy = 12
                       and d_dom = 3)
 group by i_brand_id,i_class_id,i_category_id
 having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year,
 (select 'store' channel, i_brand_id,i_class_id
        ,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales
 from store_sales
     ,item
     ,date_dim
 where ss_item_sk in (select ss_item_sk from cross_items)
   and ss_item_sk = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_week_seq = (select d_week_seq
                     from date_dim
                     where d_year = 1999
                       and d_moy = 12
                       and d_dom = 3)
 group by i_brand_id,i_class_id,i_category_id
 having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year
 where this_year.i_brand_id= last_year.i_brand_id
   and this_year.i_class_id = last_year.i_class_id
   and this_year.i_category_id = last_year.i_category_id
 order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id
 limit 100;

CAST support

TPC-DS queries (5, 12, 18, 21, ...) contain the CAST function which is currently not supported by the parser.

TPC-DS query 12, natezza dialect:

select  i_item_id
      ,i_item_desc 
      ,i_category 
      ,i_class 
      ,i_current_price
      ,sum(ws_ext_sales_price) as itemrevenue 
      ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
          (partition by i_class) as revenueratio
from	
	web_sales
    	,item 
    	,date_dim
where 
	ws_item_sk = i_item_sk 
  	and i_category in ('Men', 'Books', 'Electronics')
  	and ws_sold_date_sk = d_date_sk
	and d_date between cast('2001-06-15' as date) 
				and (cast('2001-06-15' as date) + 30 days)
group by 
	i_item_id
        ,i_item_desc 
        ,i_category
        ,i_class
        ,i_current_price
order by 
	i_category
        ,i_class
        ,i_item_id
        ,i_item_desc
        ,revenueratio
limit 100;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.