ms705 / nom-sql Goto Github PK
View Code? Open in Web Editor NEWRust SQL parser written using nom
License: MIT License
Rust SQL parser written using nom
License: MIT License
Most of the Rust libraries for accessing MySQL allow named parameters using the :name
syntax, it would be ideal to be able to parse these as an ItemPlaceholder::Named(String)
.
When running the tests for the first time with cargo test
I noticed multiple warnings about unused doc comments:
warning: unused doc comment
--> src/common.rs:1021:1
|
1021 | /// Parse rule for a comment part.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1022 | / named!(pub parse_comment<CompleteByteSlice, String>,
1023 | | do_parse!(
1024 | | opt_multispace >>
1025 | | tag_no_case!("comment") >>
... |
1029 | | )
1030 | | );
| |__- rustdoc does not generate documentation for macro expansions
|
= help: to document an item produced by a macro, the macro must produce the documentation as part of its expansion
Should we follow the help suggestion and include the documentation line(s) in the macro or remove the documentation lines for now?
The join clauses appear to properly support nested selection but the from clause only looks for a table list. We will want to be able to support parsing these statements since nested selection shows up in the from clause fairly often from my experience.
Currently on master:
Line 58 in 026473b
In src/order.rs
, the order by clause allows for trailing commas, which is technically invalid SQL.
Eg, "select * from users order by name ascage desc,\n"
will parse, but should not
In the incoming bump nom to v5
PR, this is on line 76 of the same file:
76 many0(order_expr),
This placholder ?
working but :1
and $1
not work.
Example
let sql = "DELETE FROM users WHERE user = $1 AND password = $2";
let res = parse_query(sql);
assert!(res.is_ok());
// or
let sql = "DELETE FROM users WHERE user = :1 AND password = :2";
let res = parse_query(sql);
assert!(res.is_ok());
Result:
Err(
"failed to parse query"
)
As of MySQL 8 Common Table Expressions are supported with the added keyword WITH
(https://dev.mysql.com/doc/refman/8.0/en/with.html). It would be helpful to be able to parse this statement prior every place a SELECT
statement shows up (I don't believe there are any exceptions from what I have seen in the documentation.
It would be nice to be able to parse SQL statements with comments in them.
create--blah blah
table--blah blah
tt--blah blah
(--blah blah
x--blah blah
)--blah blah
;
In standard SQL you may select literals without a table, eg SELECT 1 + 1;
. This is not supported by nom-sql
but in a perfect world it would be.
comment from @spazm (on the bump nom to v5
PR):
nested_selection could attempt to parse with something more specific than field_definition_expr like field_definition_literals(hypothetical) which would parse without the FROM (or FROM as optional).
Hi!
Thank you for this great library.
I am trying to use it to parse wikipedia dumps in my project wikipedia-externallinks-fast-extraction.
Unfortunately, they contain mysql escape characters that are currently not supported by this library.
The escape characters are:
\0
\'
\"
\b
\n
\r
\t
\Z
\\
\%
\_
INSERT INTO externallinks VALUES (23481,120102,'http://home.arcor.de/jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'');
Currently parsers that accept subqueries call the nested_selection
parser which won't catch unions compound_selection
will. I believe anywhere that a subquery or a filter section of DML commands should always permit a compound selection.
Nom
recently released v5 and nom-sql
still uses v4. Just wondering if there is a plan to upgrade and what that plan might be.
One thing to note: nom v5's macros were rewritten to use the streaming version of the parsers. My assumption would be that nom-sql
would only want to deal with complete parsers when upgrading to nom v5.
Nom has good documentation on upgrading to nom v5 here
MySQL table options can be written in any order, and be separated by a comma.
CREATE TABLE t(x integer) AUTO_INCREMENT=1,ENGINE=,KEY_BLOCK_SIZE=8;
CREATE TABLE
statements without a space between the table name and the column definition fail to parse.
CREATE TABLE x(b integer);
i stumbled across this line.
would consumers ever need to know about "IS NULL" vs. "= NULL"?
select 1 from t where id is null
select 1 from t where id = null
The result of parsing these queries is the same but is null
is not an equivalent of = null
. The latter query actually returns "wrong" results.
https://stackoverflow.com/questions/9581745/sql-is-null-and-null
Hi, I'm working on implementing fmt::Display on SqlQuery and all the structs and enums underneath. I will make a PR sometime on the next few days.
The version of nom
currently used by nom-sql
is fairly old. In and of itself, this isn't so bad (though the newer nom presumably has performance and correctness fixes), but it causes some pain with dependencies. Specifically, the old version of num depends on regex 0.1
, whereas regex
is now at 0.2
(which is incompatible according to Rust's semver). regex 0.1
in turn has a bunch of outdated dependencies (compare 0.1 to current). These in turn pull in more old dependencies, etc. For crates that use nom-sql
this causes many dependencies to be pulled in twice (one old, one new), which has a detrimental effect on compilation time and space usage.
I know fixing this is a bit annoying, as the nom api has changed a fair bit, but it's probably eventually worth doing.
I found the following comment // TODO(malte): support REPLACE, nested selection, DEFAULT VALUES
in insert::insertion
Is this comment still relevant ?
Lines 58 to 59 in 791c0f5
Both ORDER BY AND GROUP BY can have a positional in place of a column name (the positional and column names can be used in the same list) and any conditional expression. Ideally, the parser would able to handle these situations.
#![feature(test)]
extern crate test;
use test::Bencher;
use sqlparser::dialect::MySqlDialect;
use sqlparser::parser::Parser;
#[bench]
fn bench_sqlparser(b: &mut Bencher) {
let pg_dialect = MySqlDialect {};
let sql = r#"SELECT COUNT(*) FROM t"#;
let _ = Parser::parse_sql(&pg_dialect, sql).unwrap();
b.iter(||{
Parser::parse_sql(&pg_dialect, sql).unwrap();
});
}
#[bench]
fn bench_nomsql(b: &mut Bencher) {
let sql = r#"SELECT COUNT(*) FROM t"#;
let s=nom_sql::parser::parse_query(sql);
if s.is_err(){
panic!("{}",s.err().unwrap())
}
b.iter(||{
nom_sql::parser::parse_query(sql).unwrap();
});
}
running 2 tests
test bench_nomsql ... bench: 10,228 ns/iter (+/- 352)
test bench_sqlparser ... bench: 2,832 ns/iter (+/- 64)
Per the following comment I stumbled upon for the creation
parser, // TODO(malte): support types, TEMPORARY tables, IF NOT EXISTS, AS stmt
I wanted to get an issue open to add these features along with a few other missing pieces. I will add comments as I figure out what missing pieces those may be.
Currently the parser has the HAVING clause parsed as part of the GROUP BY clause when it was valid, this is needed in cases where a condition on a calculated value is needed and a GROUP BY is inappropriate such as SELECT 1 + 2 as adder HAVING adder > 2
.
Right now parsing a CREATE TABLE
statement will fail:
Exemple:
CREATE TABLE withcomments (
...
field VARCHAR(100) COMMENT 'some field',
...
) COMMENT='table description';
It would be nice to support those.
I'm not yet really familiar with nom, but I would gladly help on this feature.
#[test]
fn test_nomsql() {
let sql = r#"SELECT COUNT(*) FROM (SELECT id, user_id FROM test WHERE id = 1) t"#;
let s=nom_sql::parser::parse_query(sql);
if s.is_err(){
panic!("{}",s.err().unwrap())
}
}
running 1 test
test test_nomsql ... FAILED
failures:
---- bench_nomsql stdout ----
thread 'main' panicked at 'failed to parse query', benches/bench_parser.rs:24:9
stack backtrace:
0: rust_begin_unwind
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/panicking.rs:64:14
2: core::panicking::panic_display
3: core::ops::function::FnOnce::call_once
4: core::ops::function::FnMut::call_mut
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/ops/function.rs:337:5
5: test::bench::Bencher::bench
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:56:22
6: test::bench::benchmark::{{closure}}
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:208:51
7: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/panic/unwind_safe.rs:271:9
8: std::panicking::try::do_call
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:483:40
9: std::panicking::try
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:447:19
10: std::panic::catch_unwind
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panic.rs:137:14
11: test::bench::benchmark
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:208:18
12: test::run_test
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:628:13
13: test::run_tests
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:465:31
14: test::console::run_tests_console
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/console.rs:293:5
15: test::test_main
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:140:15
16: test::test_main_static
at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:159:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
bench_nomsql
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.01s
error: bench failed, to rerun pass `-p jdb --bench bench_parser`
For some reason the parser isn't able to parse this fairly simple case statement: CASE WHEN country = 'USA' THEN state ELSE city END
. I couldn't find any obvious reason that the case_when_column
would miss this one.
As mentioned in the comments of arithmetic.rs
, operator precedence is not currently supported and neither is nested arithmetic expressions.
I plan on eventually submitting a PR to support this but first wanted to open an issue so others may be aware of what I am working on in case they want to also contribute.
I have minimal experience with parsers, but after some research found that Pratt parsers are probably best for this purpose, and plan to implement it using this approach barring any comments/suggestions from other contributors
These 2 queries
INSERT INTO user(name, password) VALUES('aaa','xxx');
INSERT INTO user (name, password) VALUES ('aaa','xxx');
are both valid, but nom-sql
only parses the 2nd correctly.
I'm guessing this is a bit of complex to do in nom
, where the multispace
becomes optional when tag!('(')
is in the next few bytes.
Lines 23 to 40 in 8fe4e92
MySQL 8 supports window functions (https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html) ideally the parser would be able to properly parse statements that use these in expressions.
In master (and on the incoming bump nom to v5
PR), compound_selection
will match unbalanced paren pairs. We only want "(nested)" or "nested" to match, but using opt for both paren tags means that "nested)" and "(nested" will also match.
There may be other similar unbalanced issues with parens in other areas of the code or trailing commas.
This might work if the incoming bump nom to v5
PR is merged, but is definitely not pretty. Maybe it could be extracted into a "opt_parenthetical", ore more general opt_delimited :
alt(
delimited(opt(tag("(")), nested_selection, opt(tag(")"))),
nested_selection,
)
(line 110 of src/compound_select
in the bump nom to v5
PR)
Hi,
I'm still trying to parse mysql dumps from wikipedia, and I'm having problems with mysql table options (table_option
in the docs).
CREATE TABLE x (b integer) ENGINE=InnoDB AUTO_INCREMENT=413856661 DEFAULT CHARSET=binary;
Currently, when parse fails, the user only gets a very uninformative error string as a result.
It would be nice to have a custom error type with useful functionality , such as knowing where the error happened.
SQL files can contain any byte sequence. However, this library only exposes a way to parse a rust string (that is, a sequence of unicode codepoints). This makes it impossible to parse some SQL files (such as the wikipedia dumps I am currently working with), as thay contain byte sequences that are not valid utf-8.
The api should expose a function that takes an &[u8]
instead of an &str
.
For handling byte sequences that are not valid utf8 in literal strings, I see two possibilities:
Blob(Vec<u8>)
(the information that the literal was a string and not a blob would be lost)rust-encoding
(invalid characters would be lost).Hi, your library seem to have a comprehensive structure of queries. I noticed that the DeleteStatement
is lacking though.
MySQL supports the ability to choose which partitions of a given table will be selected from/changed. We will want to be able to parse this when they show up anywhere in for DELETE, INSERT, REPLACE, SELECT and UPDATE, so I believe it will work best to parse this within the table_reference
and schema_table_reference
parsers.
I originally opened this as a ticket relating to some INSERT changes but felt it would be better under it's own ticket/commit since it is its own command.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.