dashbitco / nimble_parsec
A simple and fast library for text-based parser combinators
So we don't need to explicitly check that we are done every time.
The user should have access to the full tuple that parser combinators operate on, and that tuple should include a state element that the user can read and set. Something like this:
{:ok, acc, stack, state, rest, line, column}
If the user is meant to pattern match on this tuple directly, the tuple should probably be a record, which is easier to pattern match on.
I would like to parse something like the following string:
'<foo="bar",baz="xxx">'
into the following list:
["foo", "bar", "baz", "xxx"]
This works fine, as long as the "bar" and "baz" do not contain escaped strings. The following example does not work:
'<foo="baaa\"rrrrr",baz="xxx\"substring\"xxx">'
because it contains a \" and my combinator is defined as utf8_string([not: ?"], min: 1).
Is there a recommended way of parsing strings containing escaped characters, specifically \" characters?
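One possible approach, offered as a sketch rather than a recommended answer: treat the body of a quoted value as a repetition of either the escape sequence \" or any character that is neither a quote nor a backslash. The module name AttrParser, the entry point attrs, and the Mix.install version requirement are assumptions made for illustration; only the \" escape is handled.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir attrs.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule AttrParser do
  import NimbleParsec

  # A quoted value: an escaped quote (\") is replaced by a plain quote,
  # any other character must be neither a quote nor a backslash.
  quoted =
    ignore(ascii_char([?"]))
    |> repeat(
      choice([
        string(~S(\")) |> replace(?"),
        utf8_char([not: ?", not: ?\\])
      ])
    )
    |> ignore(ascii_char([?"]))
    |> reduce({List, :to_string, []})

  name = ascii_string([?a..?z], min: 1)

  pair = name |> ignore(string("=")) |> concat(quoted)

  # <name="value",name="value",...>
  defparsec :attrs,
            ignore(string("<"))
            |> concat(pair)
            |> repeat(ignore(string(",")) |> concat(pair))
            |> ignore(string(">"))
end
```

Calling AttrParser.attrs(~S(<foo="baaa\"rrrrr",baz="xxx">)) should yield the four strings in order, with the escaped quote kept as a literal quote inside the second one.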
As discussed here, NimbleParsec should have an expect combinator for better error reporting. I suggest something like this: expect(previous \\ [], expected). This combinator would raise an error if expected didn't match.
My use case: I'm currently working on a parser for the ICU Message Format. Currently, NimbleParsec backtracks so much that I get rather useless error messages. Suppose we have something like this:
{variable, plural, one {...} two {...} abcd {...}}
The above is invalid, because abcd is not a supported option for the plural argument type. I already know it won't be supported as soon as I parse {variable, plural, and I'd like to emit an error as soon as I find the abcd option. But currently there is no way to do this... NimbleParsec will just backtrack and fail with an error message that doesn't really help anyone (it actually says it expects an end of string on character 0...).
An expect combinator would allow me to emit the correct error message.
mix nimble_parsec.compile is great for removing nimble_parsec as a dependency for a package (as is currently the case for ex_cldr). However, compiling the resulting code can produce a lot of compiler unused variable warnings, which I think is not a good thing for a package if it can be avoided.
I have explored a little and can remove many of the warnings by replacing the following in NimbleParsec.Compiler:
defp build_proxy_to(name, next, n) do
  args = quote(do: [rest, acc, stack, context, line, offset])
  [_, acc, stack, _, _, _] = args
  body =
    quote do
      unquote(build_acc_depth(n, acc, stack)) = stack
      unquote(next)(rest, acc, stack, context, line, offset)
    end
  {name, args, true, body}
end
with
defp build_proxy_to(name, next, n) do
  args = quote(do: [rest, acc, stack, context, line, offset])
  [_, acc, stack, _, _, _] = args
  body =
    quote do
      # Is removed in a compiler optimisation pass
      _ = {acc, stack}
      unquote(build_acc_depth(n, acc, stack)) = stack
      unquote(next)(rest, acc, stack, context, line, offset)
    end
  {name, args, true, body}
end
However, there is still one case I haven't nailed down yet, which produces repeated warnings:
warning: variable "context" is unused
lib/cldr/language_tag/rfc5646_parser.ex:4116
warning: variable "line" is unused
lib/cldr/language_tag/rfc5646_parser.ex:4116
warning: variable "offset" is unused
lib/cldr/language_tag/rfc5646_parser.ex:4116
warning: variable "rest" is unused
lib/cldr/language_tag/rfc5646_parser.ex:4116
These warnings come from code generated with the following shape:
defp language_tag__35(
       inner_rest,
       inner_acc,
       [{rest, acc, context, line, offset} | stack],
       inner_context,
       inner_line,
       inner_offset
     ) do
  language_tag__27(
    inner_rest,
    [],
    [{inner_rest, inner_acc ++ acc, inner_context, inner_line, inner_offset} | stack],
    inner_context,
    inner_line,
    inner_offset
  )
end
I am happy to make a PR for this, but could also use some guidance on where to hunt down this last case, since the pattern [{rest, acc, context, line, offset} | stack] is used in 5 places for different combinator compilations.
This takes away from the nimble brand, but I think Elixir lacks a good parser combinator library in the style of NimbleParsec. Most libraries are specific to strings (which is weird because they don't actually leverage the BEAM's pattern matching capabilities). The most generic one I know is ExSpirit, and although it's very extensible, it's not as pleasant to use.
Would you like to add generic combinators to NimbleParsec or would you prefer that as a separate package?
Many parser combinator libraries maintain a set of the common combinators, like whitespace and word. Would you accept pull requests to add those kinds of combinators? Usually they are organized by some kind of domain, like NimbleParsec.Text and NimbleParsec.Number.
Over the last week or so I've been experimenting with using nimble_parsec as the basis for a new GraphQL parser for the absinthe project.
What I've built so far is here: https://github.com/bruce/absinthe/blob/nimble-parsec-experiment/lib/absinthe/parser.ex
At this point I have the rough (and probably very naive) implementation of (almost) the entire GraphQL specification built out as nimble_parsec combinators. The experience has been great; I've really enjoyed using the package and feel like this is working towards much more maintainable, enjoyable parser code than our current leex/yecc implementation.
I've gotten it to the point that it compiles, although it doesn't actually do a whole lot yet (I'm, for example, only using map/traverse in a few places to generate a few of the resulting structs we'll want eventually, and I'm not skipping ignored input in most of the combinators yet). I've exposed a few defparsecs for a few tests (and some necessary recursion), and it's feeling pretty good.
Except for the compile time (a few minutes on my new XPS 13) and the size of the resulting .beam, which is... 23MB. (Reducing the defparsec usage to those strictly needed for recursion and the main entry point reduces the .beam filesize to 22MB.)
It's quite possible that I'm doing something boneheaded with the way this is structured/built at this point, but on the off chance I'm not [completely] and this can help serve as a sample of a larger grammar to track down an issue (with generation or documentation), I'd love it if someone could take a peek at what I've built so far.
$ elixir --version
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.6.4 (compiled with OTP 20)
If I do the following:
defmodule MarcParser do
  import NimbleParsec
  defparsec :parse_marc,
    utf8_string([], 24)
end
and run mix compile, it never finishes.
Let's keep {line, last_line_byte} and the current_byte.
I'm trying to write a super-parser, something like this:
defmacro named_parser(name, parser, tag) do
  defparsec(
    name,
    symbol
    |> map({:extract, []})
    |> ignore(spaces)
    |> optional(int_t |> tag(:tag))
    |> ignore(spaces)
    |> ignore(ascii_char([?:]))
    |> ignore(spaces1)
    |> concat(parser)
    |> tag(tag)
    |> map({:data_field_prettify, []})
  )
end
but it doesn't work.
So I tried this:
defmacro named_parser(name, parser, tag) do
  quote do
    defparsec(
      unquote(name),
      symbol
      |> map({:extract, []})
      |> ignore(spaces)
      |> optional(int_t |> tag(:tag))
      |> ignore(spaces)
      |> ignore(ascii_char([?:]))
      |> ignore(spaces1)
      |> concat(unquote(parser))
      |> tag(unquote(tag))
      |> map({:data_field_prettify, []})
    )
  end
end
but I get this error:
== Compilation error in file lib/rulec/rulec_parser.ex ==
** (CompileError) lib/rulec/rulec_parser.ex:456: undefined function symbol/0
(stdlib) lists.erl:1354: :lists.mapfoldl/3
(stdlib) lists.erl:1354: :lists.mapfoldl/3
(elixir) expanding macro: Kernel.|>/2
Any suggestion?
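A possible way around the macro, sketched under assumptions: build the shared prefix in a plain function that lives in a separate helper module, then call that function inside defparsec. The modules RuleHelpers and RuleParser, and the simplified symbol and spaces combinators, are hypothetical stand-ins for the ones in the question.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir rules.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule RuleHelpers do
  import NimbleParsec

  # Hypothetical stand-ins for the `symbol` and `spaces` combinators.
  def symbol, do: ascii_string([?a..?z], min: 1)
  def spaces, do: ascii_string([?\s], min: 1)

  # Builds "<symbol> : <parser>" and tags the collected result.
  def named(parser, tag_name) do
    symbol()
    |> ignore(optional(spaces()))
    |> ignore(string(":"))
    |> ignore(optional(spaces()))
    |> concat(parser)
    |> tag(tag_name)
  end
end

defmodule RuleParser do
  import NimbleParsec
  import RuleHelpers

  # The helper runs at compile time and returns an ordinary combinator.
  defparsec :int_field, named(integer(min: 1), :int_field)
end
```

With this shape, RuleParser.int_field("count : 42") should return the symbol and the integer grouped under the :int_field tag. The helper module has to be compiled before the module that calls it, which is why it is kept separate.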
Hi, might be boneheaded but can't see any way to do a monadic bind. I'm writing a Bencoding library and e.g. for byte strings, it would be useful to parse the length and then construct a parser that only parses that number of characters.
Currently, if I have something like this:
@impl Makeup.Lexer
defparsec :root,
root_combinator
I get this:
warning: function root__0/6 is private, @impl attribute is always discarded for private functions/macros
test/lexer/fixtures/token_lexer.exs:31
It looks like this happens because nimble_parsec doesn't generate the def root(...) clause first, generating some helper function clauses instead. This makes it hard to annotate the function with docs or @impl attributes.
Using the options debug: true and inline: true at the same time in defparsec/3 causes a compile error:
== Compilation error in file lib/my_parser.ex ==
** (ArgumentError) cannot convert the given list to a string.
To be converted to a string, a list must contain only:
* strings
* integers representing Unicode codepoints
* or a list containing one of these three elements
Please check the given list or call inspect/1 to get the list representation, got:[datetime__2: 6]
(elixir) lib/list.ex:826: List.to_string/1
lib/nimble_parsec/recorder.ex:66: NimbleParsec.Recorder.format_defs/3
lib/nimble_parsec/recorder.ex:27: NimbleParsec.Recorder.record/6
lib/my_parser.ex:19: (module)
Is this problem related to the note on defparsec/3 in the documentation?
It (:inline) is disabled by default because of a bug in Elixir v1.5 and v1.6 where unused functions that are inlined cause a compilation error
Reading the docs:
https://github.com/plataformatec/nimble_parsec/blob/8005d1439e2479dcbcc13cb8c7bbd0fb833af875/lib/nimble_parsec.ex#L36
it seems it is not possible to call a "normal" function from defparsec.
Following the suggestion in the docs, I started using module variables, but this seems to be limited.
For example, I'm not able to represent mutually recursive parsers, something like:
lambda_t =
  ignore(string("fn"))
  |> ignore(ascii_char([?(]))
  |> concat(
    optional(
      symbol
      |> repeat(
        ignore(ascii_char([?,]))
        |> concat(symbol)
      )
      |> tag(:parameters)
    )
  )
  |> ignore(ascii_char([?)]))
  |> ignore(ascii_char([?{]))
  |> concat(
    statements
    |> tag(:body)
  )
  |> ignore(ascii_char([?}]))
  |> tag(:lambda_t)
statements = choice([basic_types, lambda_t]) ...
statements can contain a lambda_t, so to use it inside lambda_t we would have to declare it before lambda_t, while lambda_t itself has to be declared before statements. This approach is therefore not easy to follow.
Any idea how to achieve this?
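One way this kind of cycle can be expressed is with named combinators that refer to each other through parsec/2, which defers the reference to a function call instead of a module variable. The sketch below uses a deliberately small, hypothetical grammar (nested integer lists) rather than the lambda grammar above; the module name Nested is an assumption.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir nested.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule Nested do
  import NimbleParsec

  # value ::= integer | list
  defcombinatorp :value, choice([integer(min: 1), parsec(:list)])

  # list ::= "[" (value ("," value)*)? "]"
  # `:list` refers back to `:value`, and `:value` refers to `:list`.
  defcombinatorp :list,
                 ignore(string("["))
                 |> optional(
                   parsec(:value)
                   |> repeat(ignore(string(",")) |> parsec(:value))
                 )
                 |> ignore(string("]"))
                 |> wrap()

  defparsec :parse, parsec(:value) |> eos()
end
```

Nested.parse("[1,[2,3],[]]") should produce a single nested list. The order of the two defcombinatorp definitions does not matter, because each reference is resolved by name.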
I'd like to ship a parser with a Hex package but want to avoid runtime dependency on NimbleParsec. Another use case recently mentioned is to build a parser and ship it as part of Elixir.
Here's a proof-of-concept project that does this:
Is there/could there be a public API that makes such code generation easier?
I'm building a basic signed integer combinator. This (likely naive) attempt seems to work fine:
int_value =
optional(ascii_char([?-]))
|> integer(min: 1)
|> traverse({:sign_int_value, []})
defp sign_int_value(_rest, [int, _neg], context, _, _) do
{[int * -1], context}
end
defp sign_int_value(_rest, res, context, _, _) do
{res, context}
end
If I try to tag the result, however, by turning it into:
int_value =
optional(ascii_char([?-]))
|> integer(min: 1)
|> traverse({:sign_int_value, []})
|> tag(:int_value)
I see an error:
undefined function when/2
It's reported on the line in my parser module where I'm using defparsec. Using debug: true, here's what's generated:
defp __int_value____0(rest, acc, stack, context, line, offset) do
__int_value____1(rest, [], [acc | stack], context, line, offset)
end
defp __int_value____1(<<x0::integer, rest::binary>>, acc, stack, context, comb__line, comb__offset) when x0 === 45 do
__int_value____2(rest, [x0] ++ acc, stack, context, comb__line, comb__offset + 1)
end
defp __int_value____1(rest, acc, stack, context, line, offset) do
__int_value____2(rest, acc, stack, context, line, offset)
end
defp __int_value____2(rest, acc, stack, context, line, offset) do
__int_value____3(rest, [], [acc | stack], context, line, offset)
end
defp __int_value____3(<<x0::integer, rest::binary>>, acc, stack, context, comb__line, comb__offset) when x0 >= 48 and x0 <= 57 do
__int_value____4(rest, [(x0 - 48) * 1] ++ acc, stack, context, comb__line, comb__offset + 1)
end
defp __int_value____3(rest, _acc, _stack, context, line, offset) do
{:error, "expected byte in the range ?0..?9", rest, context, line, offset}
end
defp __int_value____4(<<x0::integer, rest::binary>>, acc, stack, context, comb__line, comb__offset) when x0 >= 48 and x0 <= 57 do
__int_value____6(rest, [x0] ++ acc, stack, context, comb__line, comb__offset + 1)
end
defp __int_value____4(rest, acc, stack, context, line, offset) do
__int_value____5(rest, acc, stack, context, line, offset)
end
defp __int_value____6(rest, acc, stack, context, line, offset) do
__int_value____4(rest, acc, stack, context, line, offset)
end
defp __int_value____5(rest, user_acc, [acc | stack], context, line, offset) do
__int_value____7(rest, (
[head | tail] = :lists.reverse(user_acc)
[:lists.foldl(fn x, acc -> x - 48 + acc * 10 end, head, tail)]
) ++ acc, stack, context, line, offset)
end
defp __int_value____7(rest, user_acc, [acc | stack], context, line, offset) do
case(with({acc, context} when is_list(acc) <- sign_int_value(rest, user_acc, context, line, offset), {acc, context} when is_list(acc) <- {[int_value: :lists.reverse(acc)], context}) do
{acc, context} when is_list(acc)
end) do
{user_acc, context} when is_list(user_acc) ->
__int_value____8(rest, user_acc ++ acc, stack, context, line, offset)
{:error, reason} ->
{:error, reason, rest, context, line, offset}
end
end
defp __int_value____8(rest, acc, _stack, context, line, offset) do
{:ok, acc, rest, context, line, offset}
end
I don't immediately see the use of the mythical when/2 that's causing the issue, but I'd guess it's somewhere in __int_value____7/6 in the midst of the case + with complexities.
Now, it's possible I'm missing something in my reading of the docs with regard to the compatibility of tag and traverse, but if they are incompatible for some reason, it seems to me that it's still worth reporting this bad code generation.
(It does appear I can just set the tag as part of the return value from traverse as a temporary workaround, but that feels like giving sign_int_value too much responsibility.)
I'm submitting this issue on the not-so-unlikely chance someone knows exactly what's going on and can fix it faster. Otherwise, I am happy to dig in deeper tomorrow, and learn more about the package in the process (which I'm finding a real joy to use!).
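As a side note, the same signed integer can be sketched without traverse at all, by folding the optional sign and the digits with reduce and tagging the single result afterwards. This is an alternative under assumptions, not a fix for the code generation problem reported above; the module name SignedInt and the helper apply_sign are hypothetical.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir signed.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule SignedInt do
  import NimbleParsec

  int_value =
    optional(ascii_char([?-]))
    |> integer(min: 1)
    |> reduce({:apply_sign, []})
    |> unwrap_and_tag(:int_value)

  defparsec :int_value, int_value

  # reduce passes the parsed tokens in input order: the sign (if any), then the integer.
  defp apply_sign([?-, int]), do: -int
  defp apply_sign([int]), do: int
end
```

SignedInt.int_value("-42") should then return the negated integer under the :int_value tag, and the sign handling stays separate from the tagging.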
It would be nice to be able to stream data into nimble_parsec as it arrives over the network.
Current Support: None (as far as I can tell)
Things to consider:
defparsec option
Hi everyone,
The idea of having a regex combinator is to make it easier to implement rules that are naturally expressed as a Regex, such as the names of XML elements, which have strict rules for naming.
As an example, Scala has a Regex parser, which makes parsing names of XML elements as easy as [a-zA-Z_:][a-zA-Z0-9\.-_:]*.
I'm not sure if it's viable to implement these, since we cannot do pattern matching on regexes on the BEAM, but I think this can be worthwhile to consider.
Parsec has some handy combinators between and sepBy.
Having these in the package should make it easy to parse bracketed / delimited text, eg:
array_of_ints = between(string("["), string("]"), sep_by(",", integer()))
or perhaps with the order of arguments reversed:
array_of_ints =
integer()
|> sep_by(",")
|> between(string("["), string("]"))
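Until something like this exists in the package, both helpers can be approximated with the existing combinators. The sketch below is only one possible shape: sep_by and between are hypothetical functions in a hypothetical ListHelpers module, the separator is passed as a combinator rather than a bare string, and sep_by here requires at least one item.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir arrays.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule ListHelpers do
  import NimbleParsec

  # One or more `item`s separated by `sep`; the separators are dropped.
  def sep_by(item, sep) do
    item |> repeat(ignore(sep) |> concat(item))
  end

  # `inner` surrounded by `open` and `close`; the delimiters are dropped.
  def between(inner, open, close) do
    ignore(open) |> concat(inner) |> ignore(close)
  end
end

defmodule Arrays do
  import NimbleParsec
  import ListHelpers

  defparsec :array_of_ints,
            integer(min: 1)
            |> sep_by(string(","))
            |> between(string("["), string("]"))
end
```

With these definitions, Arrays.array_of_ints("[1,22,3]") should return the three integers in order.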
It reads size_combinator from the input and then it reads N bytes from the binary. For example:
sized_binary(combinator, integer(min: 1))
We will also add:
sized_binary(combinator, bytes(4))
The combinator must always return a list with one element. This means this is possible:
sized_binary(combinator, integer(min: 1) |> ignore(string(":")))
I'd like to customize the doc of the entry function generated by defparsec, so I can include some examples of things it can and cannot parse. Is there a way to do this?
Hello,
I've recently been working on parsing https://projectfluent.org/ (detail, failing spec in description). I'm trying to improve error messages and add more validations, but I'm struggling with returning errors.
Failing spec:
describe "errors from combinators" do
  defcombinatorp(:errors,
    empty()
    |> post_traverse(:set_error)
  )
  defcombinatorp(:function_reference,
    repeat(
      empty()
      |> parsec(:errors)
    )
  )
  defparsec :parse_function_reference, parsec(:function_reference)
  defp set_error(_rest, _, _context, _line, _offset) do
    {:error, "something wrong"}
  end
  test "returns ok/error" do
    assert parse_function_reference("") == {:error, "something wrong", "", %{}, {1, 0}, 0}
  end
end
1) test errors from combinators returns ok/error (NimbleParsec.IntegrationTest)
test/integration_test.exs:121
Assertion with == failed
code: assert parse_function_reference("") == {:error, "something wrong", "", %{}, {1, 0}, 0}
left: {:ok, [], "", %{}, {1, 0}, 0}
right: {:error, "something wrong", "", %{}, {1, 0}, 0}
stacktrace:
test/integration_test.exs:122: (test)
It will pass if I either remove repeat in function_reference or change parsec(:errors) to post_traverse(:set_error).
Is there any way to deal with that and ensure that errors are returned, or is this expected behavior that I can deal with somehow differently? Of course, my combinators are more complex than this example.
Thank you for any help.
I am using the following code
import NimbleParsec
# Type ::= "int" "[" "]" | "boolean" | "int" | Identifier
array_int = string("int []") |> label("int[]")
int = string("int") |> label("int")
bool = string("boolean") |> label("boolean")
defparsec(:types, choice([array_int, int, bool]))
When I test it using int it says ok, but when I test it using intd it still says ok. I want to match the exact word int and reject any variation like intd, intuiw, intc... How can I achieve this?
Nimble Parsec Version: 0.5.3
Elixir Version: 1.10.0
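One possible approach, assuming a release that ships lookahead_not (which is newer than the 0.5.3 mentioned above): follow each keyword with a negative lookahead for any character that could extend it into a longer identifier. The set of identifier characters below is an assumption, and the labels from the original snippet are left out to keep the sketch short.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir types.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule Types do
  import NimbleParsec

  # Characters that would turn a keyword into a longer word (assumed set).
  ident_char = ascii_char([?a..?z, ?A..?Z, ?0..?9, ?_])

  # Each keyword only matches when it is not followed by an identifier character.
  array_int = string("int []") |> lookahead_not(ident_char)
  int = string("int") |> lookahead_not(ident_char)
  bool = string("boolean") |> lookahead_not(ident_char)

  defparsec :types, choice([array_int, int, bool])
end
```

With this, Types.types("int") should succeed while Types.types("intd") returns an error tuple. If the whole input must be a type, appending eos() to the choice is another option.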
Today I made a seemingly simple change to the cron parser in oban. Here is the complete diff:
+ whitespace = ascii_string([?\s, ?\t], min: 1)
+
defparsec(
:cron,
minutes
- |> ignore(string(" "))
+ |> ignore(whitespace)
|> concat(hours)
- |> ignore(string(" "))
+ |> ignore(whitespace)
|> concat(days)
- |> ignore(string(" "))
+ |> ignore(whitespace)
|> concat(months)
- |> ignore(string(" "))
+ |> ignore(whitespace)
|> concat(weekdays)
)
The new parser works exactly as expected, but compiling it generates this list of unused variable warnings:
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:797: Oban.Crontab.Parser.cron__105/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:801: Oban.Crontab.Parser.cron__107/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:1558: Oban.Crontab.Parser.cron__213/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:1562: Oban.Crontab.Parser.cron__215/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:2319: Oban.Crontab.Parser.cron__321/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:2323: Oban.Crontab.Parser.cron__323/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:3266: Oban.Crontab.Parser.cron__443/6
warning: variable "user_acc" is unused (if the variable is not meant to be used, prefix it with an underscore)
lib/oban/crontab/parser.ex:3270: Oban.Crontab.Parser.cron__445/6
The function definitions are rather non-descript. Here is a sample from the warning at line 797:
defp cron__105(rest, user_acc, [acc | stack], context, line, offset) do
cron__107(rest, acc, stack, context, line, offset)
end
In this particular instance I commit the compiled version, so I'm able to manually fix the deprecation warnings. I expect that the compiled code doesn't have any compilation errors, though.
I am struggling with dialyzer (as usual) and it appears all my combinators that use unwrap_and_tag/2 generate the following error:
lib/cldr/language_tag/rfc5646_grammar.ex:81:call_without_opaque
The call NimbleParsec.unwrap_and_tag('Elixir.NimbleParsec':t(),'script') does not have an opaque term of type 'Elixir.NimbleParsec':t() in 2nd.
The relevant combinator is below. It is one of many examples and it appears that all calls to unwrap_and_tag/2 cause the error.
def script do
  alpha4()
  |> unwrap_and_tag(:script)
  |> label("a script id of four alphabetic character")
end
Is it a typing error, or a misunderstanding on my side?
https://tools.ietf.org/html/rfc3501 requires a lot of case-insensitive string matching for keywords like "INBOX", "ALL", "BODY", "FROM". And all strings defined in ABNF are case-insensitive by default: https://tools.ietf.org/html/rfc5234#page-5. Is there an easy way to do case-insensitive string matching that I've been missing? If there isn't one, is it possible to add this feature?
Thanks a lot.
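In the meantime, a case-insensitive keyword can be assembled from ascii_char, one character at a time, accepting both cases of each letter. This is only a sketch for ASCII keywords: the helper anycase and the modules AnyCase and Imap are hypothetical names, not part of the library.

```elixir
# Assumption: a 1.x release of nimble_parsec; run as a script with `elixir imap.exs`.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule AnyCase do
  import NimbleParsec

  # Matches `word` regardless of ASCII letter case and returns the text
  # exactly as it appeared in the input.
  def anycase(word) do
    word
    |> String.to_charlist()
    |> Enum.reduce(empty(), fn char, acc -> ascii_char(acc, both_cases(char)) end)
    |> reduce({List, :to_string, []})
  end

  # For letters, accept the lower and upper case byte; otherwise the byte itself.
  defp both_cases(char) when char in ?a..?z, do: [char, char - 32]
  defp both_cases(char) when char in ?A..?Z, do: [char, char + 32]
  defp both_cases(char), do: [char]
end

defmodule Imap do
  import NimbleParsec
  import AnyCase

  defparsec :mailbox, anycase("INBOX")
end
```

Imap.mailbox("InBox") should succeed and return the input spelling, while Imap.mailbox("outbox") returns an error tuple.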
I think NimbleParsec doesn't currently support positive or negative lookahead. I don't think I'm using them much at the moment, but they're useful sometimes.
Hi,
a helper function that matches a possibly empty sequence of iterations of a combinator would be useful. I wrote a simple implementation.
defmodule LexerHelper do
  import NimbleParsec
  def possible(comb, to_poss) do
    comb
    |> repeat(to_poss)
    |> lookahead_not(to_poss)
  end
end
digit = ascii_char [?0..?9]
num = digit
      |> possible(
        concat(
          optional(ascii_char([?_])),
          digit
        )
      )
defparsec :num, num
iex(1)> Lexer.num "123"
{:ok, '123', "", %{}, {1, 0}, 3}
iex(2)> Lexer.num "123_234"
{:ok, '123_234', "", %{}, {1, 0}, 7}
iex(3)> Lexer.num "123_234_"
{:ok, '123_234', "_", %{}, {1, 0}, 7}
iex(4)> Lexer.num "123_234 34"
{:ok, '123_234', " 34", %{}, {1, 0}, 7}
The generated doc for an entry point states
Returns {:ok, [token], rest, context, line, byte_offset} or {:error, reason, rest, context, line, byte_offset}.
which is not correct because at the position of line I see a tuple. I suspect it is {line where accumulated result begins, col where it begins}?
I've defined a parser in this way:
int_t = integer(min: 1) |> tag(:int_t)
defparsec :parse_inttype, int_t
When I try to use it in this way:
RuleC.parse_inttype("123")
I get this:
{:ok, [int_t: '{'], "", %{}, {1, 0}, 3}
instead of:
{:ok, [int_t: 123], "", %{}, {1, 0}, 3}
But other cases work:
> RuleC.parse_inttype("213")
{:ok, [int_t: [213]], "", %{}, {1, 0}, 3}
I'm not an Elixir expert but seems that the internal type of integer is a char instead of int
Firstly, this is a brilliant library and I'm excited to find a use case for it. The composition abilities are really attractive given how unwieldy parsing can get sometimes.
To the point now, though: given the nature of strings in Elixir, it wasn't entirely clear to me that this library is strictly for parsing text data. I read a non-trivial amount of the high level documentation before deciding to try and use it for parsing raw binary data. It wasn't until I wrote my first integer combinator that I realized I had misunderstood:
iex(1)> App.Parser.version(<<1, 6>>)
{:error,
"expected byte in the range ?0..?9, followed by byte in the range ?0..?9",
<<1, 1>>, %{}, {1, 0}, 0}
I was disappointed since this rules out my current use case, but it made sense in hindsight after getting more familiar with the library.
I started to submit a PR for the docs that would clarify this important detail up front so that others wouldn't have the same confusion that I did, but then it occurred to me that I should probably ask before I presume this library will always be strictly for parsing text data. Are there hopes or plans to expand its use case to generic binary parsing? If not, I'll go ahead and submit that PR! :)
Motivation: say I want to parse a line which might be empty.
Currently, I have to do this:
line = choice([utf8_string([not: ?\n], min: 1), string("")]) |> ignore(string("\n"))
I'd like to be able to write:
line = utf8_string([not: ?\n], min: 0) |> ignore(string("\n"))
I propose defining utf8(chars, min: 0) as equivalent to choice([utf8_string([not: ?\n], min: 1), string("")]), although implemented in a more efficient way if possible.
The tag for v0.5.1 was not pushed to Github, so clicking "View Source" in the documentation results in a 404.
I created a project using mix new project and added the dependency in mix.exs. I found this example in the documentation:
defmodule MyParser do
import NimbleParsec
defparsec integer(min: 1) |> tag(:integer)
end
MyParser.integer("1234")
When I run this example from the project directory using iex -S mix, I get this error message:
** (CompileError) iex:12: undefined function defparsec/1
Elixir 1.9.1
NimbleParsec 0.5.1
I have the following small code, running with the nimble_parsec dependency at version 0.6:
defmodule Foo do
  import NimbleParsec
  defparsec :test,
    (
      string("START ")
      |> lookahead_not(string("MIDDLE/")) # Works
      # |> lookahead_not(concat(ascii_string([?A..?Z], min: 1), string("/"))) # Fails
      # |> lookahead_not(ascii_string([?A..?Z], min: 1) |> string("/")) # Fails
      |> ascii_string([?A..?Z], [min: 1, max: 7])
    )
end
This test parsec is supposed to match "START " followed by some A-Z characters that are NOT followed by a "/".
It works for the lookahead_not() commented with "# Works" (that is, it tells me "did not expect string ...") but does not work with the lookahead_not()s commented with "# Fails" (that is, it matches the A-Z up to but not including the "/", while instead it should tell me "did not expect string ..."):
With lookahead_not(string("MIDDLE/")) active:
4▶ Foo.test("START MIDDLE END")
{:ok, ["START ", "MIDDLE"], " END", %{}, {1, 0}, 12}
5▶ Foo.test("START MIDDLE/ END")
{:error, "did not expect string \"MIDDLE/\"", "MIDDLE/ END", %{}, {1, 0}, 6}
With lookahead_not(concat(ascii_string([?A..?Z], min: 1), string("/"))) active:
7▶ Foo.test("START MIDDLE END")
{:ok, ["START ", "MIDDLE"], " END", %{}, {1, 0}, 12}
8▶ Foo.test("START MIDDLE/ END")
{:ok, ["START ", "MIDDLE"], "/ END", %{}, {1, 0}, 12}
With lookahead_not(ascii_string([?A..?Z], min: 1) |> string("/")) active:
10▶ Foo.test("START MIDDLE END")
{:ok, ["START ", "MIDDLE"], " END", %{}, {1, 0}, 12}
11▶ Foo.test("START MIDDLE/ END")
{:ok, ["START ", "MIDDLE"], "/ END", %{}, {1, 0}, 12}
As you can see, in the second and third test, the "/" is seemingly not considered/detected/accounted for by the lookahead_not(), or the lookahead_not() doesn't seem to have any effect.
Using ascii_string([?A..?Z], including with other/more ?x notations inside it, has worked perfectly fine for the rest of the project, which is quite a lot of using it, so it doesn't strike me as a problem with encoding or similar, but more as a bug in the lookahead_not() combinator.
Let me know if there's anything I can do to try narrowing this down better!
Edit, in case it's relevant:
14▶ runtime_info
## System and architecture
Elixir version: 1.10.2
Erlang/OTP version: 22
ERTS version: 10.7
Compiled for: x86_64-apple-darwin18.7.0
Schedulers: 4
Schedulers online: 4
I don't know if this is a nimble_parsec bug, or an Elixir bug, or a Dialyzer bug, or just me doing something wrong, but here goes:
I'm playing with NimbleParsec, and I'm attempting to put something together that recognises boolean algebra. At the moment, I'm simply trying to get it to recognise two booleans catenated, and I've come up with this:
defmodule Exlox.Parser.Boolean do
  import NimbleParsec
  def boolean(combinator \\ empty()) do
    true_p = string("true") |> replace(true)
    false_p = string("false") |> replace(false)
    choice(combinator, [true_p, false_p])
  end
end
defmodule Exlox.Parser do
  import NimbleParsec
  import Exlox.Parser.Boolean
  defparsec :expression, boolean() |> boolean()
end
That is: it recognises "truefalse", "truetrue", "falsefalse" and "falsetrue". This is OK (though I wonder whether this is the best way to do this...).
But dialyzer doesn't like it:
lib/parser.ex:4: The call 'Elixir.Exlox.Parser.Boolean':boolean([]) does not have an opaque term of type 'Elixir.NimbleParsec':t() as 1st argument
What am I doing wrong?
If I use ascii_chars([?\n]) to expect a newline and there's no newline, the error that I get is "expected byte 10". I would expect the error to be something like expected ASCII character \n or something like that.
I don't have much experience with parsers, but given a super simple markdown headline like # Heading, I'm wondering why the first of the following examples doesn't work, while the second one does:
heading_1 =
  string("#")
  |> ignore()
  |> utf8_string([], min: 1)
defparsec(:markdown, heading_1)
heading_1 =
  string("#")
  |> ignore()
  |> concat(utf8_string([], min: 1))
defparsec(:markdown, heading_1)
Hello,
nimble_parsec 0.5.0 causes a Phoenix app (or makeup) to fail during compilation.
==> makeup
Compiling 43 files (.ex)
== Compilation error in file lib/makeup/lexer/combinators.ex ==
** (CompileError) lib/makeup/lexer/combinators.ex:116: undefined function repeat_until/2
(stdlib) lists.erl:1338: :lists.foreach/2
(stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
could not compile dependency :makeup, "mix compile" failed. You can recompile this dependency with "mix deps.compile makeup", update it with "mix deps.update makeup" or clean it with "mix deps.clean makeup"
Switching back to 0.4.0 resolves the error.
The following code works OK.
defmodule MyParser do
  @moduledoc false
  @debug false
  import NimbleParsec
  white_space =
    ignore(ascii_string([9, 32], min: 1))
  language_code =
    optional(white_space) |> # Needs to be here in order to make traverse work
    ascii_string([?a..?z], 2) |>
    traverse(:check_language_code) |>
    tag(:language)
  territory_code =
    ascii_string([?A..?Z], 2) |>
    tag(:territory)
  locale =
    choice([language_code, territory_code]) |>
    ignore(string(":"))
  locale_text =
    optional(locale) |>
    optional(white_space) |>
    utf8_string([], min: 1)
  defparsec :locale_text, locale_text, debug: @debug
  defp check_language_code(_rest, [language_code], context, _line, _offset) do
    case Cldr.validate_locale(language_code) do
      {:ok, %Cldr.LanguageTag{cldr_locale_name: cldr_locale_name}} -> {cldr_locale_name |> String.to_atom |> List.wrap, context}
      {:error, _} -> {:error, "Not a valid language code (#{language_code})"}
    end
  end
end
MyParser.locale_text("en:A text") works and returns a valid language code and MyParser.locale_text("qq:A text") fails with an error.
But in order to make it work, I had to put an extra part in front of ascii_string([?a..?z], 2) in language_code above, otherwise the compiler failed.
I have no idea what the difference is, and for me it does not matter, but it is not clear to me why traverse fails when it is second in the pipe chain.
With this code,
def some_parser do
  choice(empty(), [
    string("+62"),
    string("0")
  ])
end
Dialyzer complains that choice/2 breaks the contract (t(), t()) -> t(). It seems that choice/2 expects [t] instead of just t, so I think the typespec should be: @spec choice(t, [t]) :: t.
Is this correct?
Hey, I ran into this issue while writing a language parser in nimble_parsec. The issue was that chaining the repeat and optional combinators like so: repeat(optional(...)) never terminates execution. The reason is pretty clear: it repeatedly chooses the "None" option of optional, and just keeps going. A full replication of the issue can be found in this gist.
The symptoms of this bug are that the combinator never terminates, so as long as the user does rudimentary testing of the parser, they will identify the issue. It never terminates on any input, including inputs which can never match. Although I identified the issue quickly, a newer user might have real trouble with it. I somewhat doubt this error can be detected statically, but a warning in the docs might help save some people some time when they encounter this issue.
José, I've tried to work out these issues so I could send a PR, but my dialyzer-fu is poor and my understanding of the architecture of nimble_parsec is primitive at best.
I have omitted what I believe are duplicated errors to leave as simple a case as possible. The source links directly to where the error is indicated.
lib/cldr/language_tag/rfc5646_grammar.ex:13:no_return
Function langtag/0 has no local return.
________________________________________________________________________________
lib/cldr/language_tag/rfc5646_grammar.ex:59:call_without_opaque
The call NimbleParsec.concat('Elixir.NimbleParsec':t(),nonempty_maybe_improper_list()) does not have an opaque term of type 'Elixir.NimbleParsec':t() in 2nd.
________________________________________________________________________________
lib/cldr/language_tag/rfc5646_grammar.ex:82:call_with_opaque
The call NimbleParsec.label('Elixir.NimbleParsec':t(),<<_:320>>) contains an opaque term in 1st argument when terms of different types are expected in these positions}.
________________________________________________________________________________
lib/cldr/language_tag/rfc5646_grammar.ex:146:call_without_opaque
The call NimbleParsec.unwrap_and_tag(nonempty_maybe_improper_list(),'type') does not have an opaque term of type 'Elixir.NimbleParsec':t() in 1st.
________________________________________________________________________________
lib/cldr/language_tag/rfc5646_grammar.ex:271:call_without_opaque
The call NimbleParsec.choice([nonempty_maybe_improper_list(),...]) does not have a term of type ['Elixir.NimbleParsec':t(),...] (with opaque subterms) in 1st.
________________________________________________________________________________
Hi, I played a bit with quoting and the following code parses just fine using Code.string_to_quoted:
parser parseName do
  many1 letter
end

parser parseAbs do
  string "abs"
  spaces
  name <- parseName
  spaces
  body <- parseTerm
  return {:abs, name, body}
end

parser parseTerm do
  chainl1 (try(parseAbs) or parseVar or parens(parseTerm)), parseApp
end

parser parseVar do
  name <- parseName
  return {:var, name}
end

parser parens p do
  char '('
  p_ <- p
  char ')'
  return p_
end

parser parseApp do
  spaces
  return (fn f, a -> {:app, f, a} end)
end
Which looks basically the same as using Parsec in Haskell. Not suggesting anything, just wanted to share.
Looking at the documentation of the parsec function, it seems that it is not possible to invoke a parser defined in a different module.
To distribute the parsing logic across several files, and to use macros (as a superset of defparsec, for instance), I guess that we need to expand the functionality of defparsec.
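For comparison, a hedged sketch of what cross-module reuse can look like, assuming a NimbleParsec release in which parsec accepts a {module, function} tuple (the 1.x documentation describes this as remote combinators; the version this issue was filed against may not have had it). The module names Shared.Digits and VersionParser are hypothetical.

```elixir
# Assumption: run as a standalone script with network access for Mix.install.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule Shared.Digits do
  import NimbleParsec

  # defcombinator defines a public combinator, so other modules can refer to it.
  defcombinator :digits, ascii_string([?0..?9], min: 1)
end

defmodule VersionParser do
  import NimbleParsec

  # Hypothetical "major.minor" parser built from the combinator in Shared.Digits.
  defparsec :version,
            parsec({Shared.Digits, :digits})
            |> ignore(string("."))
            |> parsec({Shared.Digits, :digits})
end

IO.inspect(VersionParser.version("1.12"))
```

Splitting combinators this way keeps each module small. Whether it also covers the macro use case mentioned above is not something this sketch addresses.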
Rationale: the choice combinator currently accepts a list of two or more combinators. However, when writing a macro that can accept a variable number of choices, it is useful to have a combinator that never matches to handle the case of zero choices provided.
See here for a practical example:
https://github.com/derpibooru/philomena/blob/c66fe0ca39cd8d4f51b21c8fe939489c24bba892/lib/philomena/search/lexer.ex#L262-L285
I implemented a combinator (roughly) as per the ABNF described by the IETF:
IPv4address = d8 "." d8 "." d8 "." d8
d8 = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
IPv6address = 6(h16 ":") ls32
/ "::" 5(h16 ":") ls32
/ [ h16 ] "::" 4(h16 ":") ls32
/ [ *1(h16 ":") h16 ] "::" 3(h16 ":") ls32
/ [ *2(h16 ":") h16 ] "::" 2(h16 ":") ls32
/ [ *3(h16 ":") h16 ] "::" h16 ":" ls32
/ [ *4(h16 ":") h16 ] "::" ls32
/ [ *5(h16 ":") h16 ] "::" h16
/ [ *6(h16 ":") h16 ] "::"
ls32 = h16 ":" h16 / IPv4address
h16 = 1*4HEXDIG
ipv4_octet =
  ascii_string([?0..?9], min: 1, max: 3)

ipv4_address =
  times(ipv4_octet |> string("."), 3)
  |> concat(ipv4_octet)

ipv6_hexadectet =
  ascii_string('0123456789abcdefABCDEF', min: 1, max: 4)

ipv6_ls32 =
  choice([
    ipv6_hexadectet |> string(":") |> concat(ipv6_hexadectet),
    ipv4_address
  ])

ipv6_fragment =
  ipv6_hexadectet |> string(":")

ipv6_address =
  choice([
    times(ipv6_fragment, 6) |> concat(ipv6_ls32),
    string("::") |> times(ipv6_fragment, 5) |> concat(ipv6_ls32),
    optional(ipv6_hexadectet) |> string("::") |> times(ipv6_fragment, 4) |> concat(ipv6_ls32),
    optional(times(ipv6_fragment, max: 1) |> concat(ipv6_hexadectet)) |> string("::") |> times(ipv6_fragment, 3) |> concat(ipv6_ls32),
    optional(times(ipv6_fragment, max: 2) |> concat(ipv6_hexadectet)) |> string("::") |> times(ipv6_fragment, 2) |> concat(ipv6_ls32),
    optional(times(ipv6_fragment, max: 3) |> concat(ipv6_hexadectet)) |> string("::") |> concat(ipv6_fragment) |> concat(ipv6_ls32),
    optional(times(ipv6_fragment, max: 4) |> concat(ipv6_hexadectet)) |> string("::") |> concat(ipv6_ls32),
    optional(times(ipv6_fragment, max: 5) |> concat(ipv6_hexadectet)) |> string("::") |> concat(ipv6_hexadectet),
    optional(times(ipv6_fragment, max: 6) |> concat(ipv6_hexadectet)) |> string("::")
  ])
I have checked this combinator over and over again and I am fairly sure it is implemented exactly per the BNF, but I cannot get it to match addresses like fe80::362c:b162:1a49:bf12 or 2000:4000:6000:8000::a. What's going wrong?
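One likely explanation, offered as an assumption rather than a confirmed diagnosis: the ABNF relies on backtracking, while NimbleParsec combinators commit to what they have consumed. With input fe80::a, the h16 ":" group consumes "fe80:", the following h16 then fails on ":", the whole optional part is abandoned, and "::" is tried against "fe80". The sketch below reduces the grammar to a single rule and compares the reported shape with a hypothetical variant that uses lookahead_not so a group is rejected when its ":" is the first half of "::". The names Ipv6Demo, greedy, and guarded are made up for this illustration.

```elixir
# Assumption: run as a standalone script with network access for Mix.install.
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule Ipv6Demo do
  import NimbleParsec

  h16 = ascii_string([?0..?9, ?a..?f, ?A..?F], min: 1, max: 4)

  # Same shape as ipv6_fragment in the report: takes "fe80:" even when
  # another ":" follows, leaving nothing for the trailing h16.
  greedy_fragment = h16 |> string(":")

  # Hypothetical variant: refuse the group when its ":" begins a "::".
  guarded_fragment = h16 |> string(":") |> lookahead_not(string(":"))

  # Reduced form of the rule: [ *1(h16 ":") h16 ] "::" h16
  defparsec :greedy,
            optional(times(greedy_fragment, max: 1) |> concat(h16))
            |> string("::")
            |> concat(h16)

  defparsec :guarded,
            optional(times(guarded_fragment, max: 1) |> concat(h16))
            |> string("::")
            |> concat(h16)
end

IO.inspect(Ipv6Demo.greedy("fe80::a"))
IO.inspect(Ipv6Demo.guarded("fe80::a"))
```

The guarded fragment only shows the idea on one rule. Applying it to the full ipv6_address choice, and checking the ls32 alternatives, would still need to be worked through.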