andrewcooke / parsercombinator.jl
A parser combinator library for Julia
License: Other
Pkg.update() to 1.7.2 and now getting
julia> Pkg.test("LightGraphs")
INFO: Testing LightGraphs
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/AutoHashEquals.ji for module AutoHashEquals.
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/LightGraphs.ji for module LightGraphs.
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/ParserCombinator.ji for module ParserCombinator.
running /Users/seth/.julia/v0.5/LightGraphs/test/operators.jl ...
running /Users/seth/.julia/v0.5/LightGraphs/test/graphdigraph.jl ...
running /Users/seth/.julia/v0.5/LightGraphs/test/persistence.jl ...
ERROR: LoadError: LoadError: ParserCombinator.ParserException("cannot parse")
in once at /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:184
[inlined code] from /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:169
in single_result at /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:193
in parse_dot at /Users/seth/.julia/v0.5/ParserCombinator/src/dot/DOT.jl:216
in readdot at /Users/seth/.julia/v0.5/LightGraphs/src/persistence/dot.jl:26
in readdot at /Users/seth/.julia/v0.5/LightGraphs/src/persistence/dot.jl:25
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:392
[inlined code] from /Users/seth/.julia/v0.5/LightGraphs/test/runtests.jl:81
in anonymous at ./no file:4294967295
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:392
in process_options at ./client.jl:277
in _start at ./client.jl:377
while loading /Users/seth/.julia/v0.5/LightGraphs/test/persistence.jl, in expression starting on line 45
while loading /Users/seth/.julia/v0.5/LightGraphs/test/runtests.jl, in expression starting on line 78
Any ideas? I'll do some digging also.
Edit: also happening on 0.4...
If anyone wants write access, or wants to fork and take over this package (I don't know how Julia supports that) please contact me. I am very unlikely to be supporting this in the near/medium term. Sorry.
This is probably a big job, but:
WARNING: produce is now deprecated. Use Channels for inter-task communication.
Stacktrace:
[1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
[2] produce(::Array{Any,1}) at ./deprecated.jl:884
[3] #producer#27(::Bool, ::Function, ::ParserCombinator.NoCache{String,Int64}, ::ParserCombinator.Seq!) at /Users/seth/.julia/v0.6/ParserCombinator/src/core/parsers.jl:141
[4] (::ParserCombinator.#kw##producer)(::Array{Any,1}, ::ParserCombinator.#producer, ::ParserCombinator.NoCache{String,Int64}, ::ParserCombinator.Seq!) at ./<missing>:0
[5] (::ParserCombinator.##29#30{Bool,ParserCombinator.Seq!,ParserCombinator.NoCache{String,Int64}})() at /Users/seth/.julia/v0.6/ParserCombinator/src/core/parsers.jl:171
while loading /Users/seth/.julia/v0.6/LightGraphs/test/persistence/persistence.jl, in expression starting on line 106
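For reference, here is a minimal sketch of the Channel-based pattern that the deprecation warning points to. It is not the package's `producer` code: `emit_results` is a hypothetical name, and the sketch assumes a Julia version where `Channel(f)` runs `f` in a task bound to the channel.

```julia
# Hypothetical stand-in for a task that used to call `produce(x)`.
function emit_results(items)
    # Channel(f) starts `f` in a new task; the channel closes when `f` returns.
    return Channel() do ch
        for item in items
            put!(ch, item)   # takes the place of `produce(item)`
        end
    end
end

# Consumers iterate over the channel (or call `take!`) instead of calling `consume`.
results = collect(emit_results(["a", "b", "c"]))
```

The consumer side changes too: anything that called `consume(task)` would need to read from the channel instead.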
Hi,
In the grammar I am writing, there are a few places where I need to match against one of a large selection of constants, which I will call `a_long_list`. It might have 20 elements, it might have 200. In theory, I'm sure there are use cases for matching against one of thousands, or tens of thousands.
`Alt(Equals.(a_long_list)...)` works. But I understand that it will be O(n*m), for n the length of the list and m the maximum length of any element of that list. Which honestly isn't too bad, I think. But I figure it can be done better. If a Trie is used, I think this becomes just O(m).
I might be screwing up my math here, but I think that in the process of finding the longest match, one automatically finds all the shorter matches, which can be saved in case backoff is required. So there is no need to re-step through the source if one fails.
I've started working on this. I have the Trie stuff working to return which strings match; I just need to link it into a matcher, with the trampoline stuff (`Success`/`Fail`/`Execute`).
What I currently propose is a matcher:
EqualsOneOf(values; greedy::Bool=true)
where `values` are the values that it could be equal to.
If `greedy` is `false`, then this matches shortest-first. If it has to back off, it gives the second-shortest string in `values` that matches, etc. This is the natural order for a Trie.
If `greedy` is `true`, then it matches longest-first (the longest string that is in `values` first), and if that needs to be backed off from, it matches the second longest, and so forth. This is accomplished by `collect`ing all the values that match as Trie keys, and then `reverse`ing the order. (Which I guess does make this O(n+m).)
This is a bit less expressive than `Alt(Equals.(a_long_list)...)`, since that lets you choose a priority for the matchers, not just longest-first or shortest-first.
What do you think? When I have it working, should I make a PR?
One issue is that right now, Tries only support strings: JuliaCollections/DataStructures.jl#220
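The single-pass idea can be sketched in plain Julia. This is only an illustration: `TrieNode`, `build_trie` and `prefix_matches` are hypothetical names, the trie is a toy one rather than the DataStructures.jl `Trie`, and nothing here is wired into the `Success`/`Fail`/`Execute` trampoline.

```julia
# Toy character trie (hypothetical; not the DataStructures.jl Trie).
mutable struct TrieNode
    children::Dict{Char,TrieNode}
    terminal::Bool            # true if a stored word ends at this node
end
TrieNode() = TrieNode(Dict{Char,TrieNode}(), false)

function build_trie(words)
    root = TrieNode()
    for w in words
        node = root
        for c in w
            node = get!(node.children, c) do
                TrieNode()
            end
        end
        node.terminal = true
    end
    return root
end

# Walk `source` once from `start`, recording every stored word that is a
# prefix of the remaining input. Matches come out shortest-first, which is
# the natural trie order; `greedy=true` reverses them to longest-first.
# (An empty string stored in the trie is not reported by this sketch.)
function prefix_matches(root::TrieNode, source::AbstractString,
                        start::Int=firstindex(source); greedy::Bool=true)
    matches = String[]
    node = root
    i = start
    while i <= lastindex(source)
        c = source[i]
        haskey(node.children, c) || break
        node = node.children[c]
        node.terminal && push!(matches, source[start:i])
        i = nextind(source, i)
    end
    return greedy ? reverse(matches) : matches
end

trie = build_trie(["in", "int", "integer", "if"])
longest_first  = prefix_matches(trie, "integers")
shortest_first = prefix_matches(trie, "integers"; greedy=false)
```

A matcher built on this would hand back the first entry and keep the rest of the list as its backtracking state, so the source is never re-scanned on failure.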
I did some performance tests for reading a graph from a file using my package FatGraphs:
Pkg.clone("https://github.com/CarloLucibello/FatGraphs.jl")
For comparison, I write a graph in a simple text format (Pajek .net). Each of the following functions has been run twice to exclude compilation time:
```julia
julia> g = Graph(100,1000,seed=1)
Graph{Int64}(100, 1000)
julia> @time writegraph("test.net",g)
0.243075 seconds (37.02 k allocations: 1.487 MB)
1
julia> @time readgraph("test.net")
0.003040 seconds (18.54 k allocations: 550.281 KB)
Graph{Int64}(100, 1000)
```
Notice how performance degrades when **reading** from a .dot or a .gml file, which relies on ParserCombinator:
```julia
julia> @time writegraph("test.dot",g)
0.001826 seconds (15.23 k allocations: 664.031 KB)
1
julia> @time readgraph("test.dot")
2.408855 seconds (1.22 M allocations: 49.666 MB, 0.56% gc time)
Graph{Int64}(100, 1000)
julia> @time writegraph("test.gml",g)
0.001426 seconds (16.83 k allocations: 789.031 KB)
1
julia> @time readgraph("test.gml")
1.024898 seconds (511.11 k allocations: 18.279 MB, 0.64% gc time)
Graph{Int64}(100, 1000)
```
Probably ParserCombinator has some huge type instability issues. Can those be avoided?
Bye,
Carlo
I'm trying to define a grammar with a loop, so I'm using `Delayed()`. I get an error when trying to redefine the matcher for a delayed rule:
ERROR: `convert` has no method matching convert(::Type{Nullable{Matcher}}, ::Alt)
Here's the grammar:
expr = Delayed()
doubley = p"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?[dD]" > (x -> float64(x[1:end-1]))
floaty_dot = p"[-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?[Ff]" > (x -> float32(x[1:end-1]))
floaty_nodot = p"[-+]?[0-9]*[0-9]+([eE][-+]?[0-9]+)?[Ff]" > (x -> float32(x[1:end-1]))
floaty = floaty_dot | floaty_nodot
expr.matcher = doubley | float
I'm basically copying what you have in calc.jl
I have a GML file that has underscores in key names:
graph [
directed 1
id 42
label "splice graph of s-exons"
node [
id 1
label "start"
conservation 100.0
transcript_fraction 100.0
genes "ENSBTAG00000007876,ENSG00000107643,ENSGGOG00000011771,ENSMMUG00000004060,ENSMODG00000002193,ENSMUSG00000021936,ENSOANG00000012095,ENSRNOG00000020155,ENSSSCG00000010380,ENSXETG00000021691"
]
.
.
.
This is failing with the following error:
ParserError{Int64}("Expected ] at (11,21)\n transcript_fraction 100.0\n ^\n", 253)
Stacktrace:
[1] check_channel_state at ./channels.jl:125 [inlined]
[2] take_unbuffered(::Channel{Any}) at ./channels.jl:327
[3] take! at ./channels.jl:315 [inlined]
[4] iterate(::Channel{Any}, ::Nothing) at ./channels.jl:395
[5] iterate at ./channels.jl:394 [inlined]
[6] once at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/core/parsers.jl:184 [inlined]
[7] #single_result#36 at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/core/parsers.jl:192 [inlined]
[8] (::getfield(ParserCombinator, Symbol("#kw##single_result#38")))(::NamedTuple{(:debug,),Tuple{Bool}}, ::getfield(ParserCombinator, Symbol("#single_result#38")){getfield(ParserCombinator, Symbol("##single_result#36#37")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},UnionAll}}, ::String, ::Trace) at ./none:0
[9] #parse_raw#6(::Bool, ::Function, ::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:80
[10] #parse_raw at ./none:0 [inlined]
[11] #parse_dict#9(::Bool, ::Array{Symbol,1}, ::Bool, ::Function, ::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:162
[12] parse_dict(::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:162
[13] loadgml(::IOStream, ::String) at /home/elin/.julia/packages/GraphIO/IpSAL/src/GML/Gml.jl:33
[14] loadgraph at /home/elin/.julia/packages/GraphIO/IpSAL/src/GML/Gml.jl:95 [inlined]
[15] #119 at /home/elin/.julia/packages/LightGraphs/HsNig/src/persistence/common.jl:15 [inlined]
[16] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(LightGraphs, Symbol("##119#120")){String,GraphIO.GML.GMLFormat}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
[17] open at ./iostream.jl:367 [inlined]
[18] loadgraph(::String, ::String, ::GraphIO.GML.GMLFormat) at /home/elin/.julia/packages/LightGraphs/HsNig/src/persistence/common.jl:14
[19] top-level scope at In[17]:1
I think that the problem is that the key regex doesn't support underscores:
"[a-zA-Z][a-zA-Z0-9]*"
Is there a reason to avoid underscores (or other symbols) in key names?
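A quick check in plain Julia, outside the parser, shows the difference. The relaxed pattern is only an assumption about what a fix might look like, and both patterns are anchored here so that the whole key must match.

```julia
key_current = r"^[a-zA-Z][a-zA-Z0-9]*$"    # pattern quoted above, anchored for this test
key_relaxed = r"^[a-zA-Z][a-zA-Z0-9_]*$"   # assumption: also allow '_' after the first character

is_key(pattern, s) = occursin(pattern, s)

is_key(key_current, "conservation")          # true
is_key(key_current, "transcript_fraction")   # false: '_' is not in the character class
is_key(key_relaxed, "transcript_fraction")   # true
```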
Getting
ERROR: LoadError: LoadError: UndefVarError: Parsers not defined
on `Pkg.test("ParserCombinator")`. Not sure how to import `ParserCombinator.Parsers.GML`.
It may have to do with the fact that some OSes (like OSX) are case-insensitive, and you've got both `Parsers.jl` and `parsers.jl`.
ETA: confirmed - when I rename parsers.jl we're all good.
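A small sketch of how such clashes could be spotted ahead of time: group file names by their lowercased form and report any group with more than one member. `case_collisions` is a hypothetical helper, not part of the package.

```julia
# Return groups of names that differ only by letter case.
function case_collisions(names)
    groups = Dict{String,Vector{String}}()
    for n in names
        push!(get!(() -> String[], groups, lowercase(n)), n)
    end
    return sort([sort(v) for v in values(groups) if length(v) > 1])
end

clashes = case_collisions(["Parsers.jl", "parsers.jl", "matchers.jl"])
```

In practice the input could come from `readdir` on a source directory, run on a case-sensitive file system where both files can coexist.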
I guess this example has not been updated, so I get an error:
"ERROR: LoadError: syntax: extra token "Node" after end of expression"
How can I make a struct that reads the expression and gives me the result of the calculation?
On Julia v0.4.3 I installed Pkg.add("parserCombinators"); using ParserCombinator
and Pkg.installed("ParserCombinator") v"1.7.4"
Then the following test: parse_one("€", p".") # any non-ASCII character
ERROR: BoundsError: attempt to access 3-element Array{UInt8,1}:
0xe2
0x82
0xac
at index [4]
in schedule_and_wait at task.jl:343
in consume at task.jl:259
in once at ~/.julia/v0.4/ParserCombinator/src/core/parsers.jl:182
in single_result at ~/.julia/v0.4/ParserCombinator/src/core/parsers.jl:193
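The report does not state the cause. As background, the three bytes in the error are the UTF-8 encoding of "€", and index 4 is exactly one past them. A short sketch in current Julia (1.x, not the v0.4 from the report) shows how byte indices and character counts differ for such a string:

```julia
s = "€"
bytes = collect(codeunits(s))      # the three UTF-8 bytes 0xe2, 0x82, 0xac
nbytes = ncodeunits(s)             # 3
nchars = length(s)                 # 1
after = nextind(s, firstindex(s))  # 4: the index following the only character
```

Code that advances by character but indexes into the raw byte array needs to check against the byte length before reading at `after`.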
I'm testing a simple Boolean regex:
julia> parse_one("false", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])"+Eos())
1-element Array{Any,1}:
"false"
works fine, but this is weird:
julia> parse_one("true", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])"+Eos())
ERROR: ParserCombinator.ParserException("cannot parse")
in once at /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:184
[inlined code] from /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:169
in single_result at /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:193
in eval at ./boot.jl:264
Without the Eos()
it works:
julia> parse_one("true", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])")
1-element Array{Any,1}:
"true"
Any ideas?
P.S. Thanks for a great library!
I had a thought. Mike Innes's Flow.jl is looking promising: https://github.com/MikeInnes/Flow.jl. The package seems general enough to deal with any kind of Julia code, which is far more powerful than alternatives like TensorFlow.
Something that sort of bugs me when using ParserCombinator as a CFG-parser is the syntax. I sort of dislike writing, and even more dislike reading,
x = Delayed()
y = Star(x)
x.matcher = Seq(e"(", y, e")")
or similar. It would be cool if that could be written
matcher(@flow function()
x = Seq(e"(", Star(x), e")")
end)
letting Flow.jl figure out what the graph looks like. What are your thoughts? I'd be happy to work on a prototype when/if I get the time.
from PackageEvaluator: http://pkg.julialang.org/logs/ParserCombinator_0.4.log
and also Travis, if you trigger a new build: https://travis-ci.org/tkelman/ParserCombinator.jl/builds/105584168
ERROR: OverflowError()
in parse_raw at /Users/seth/dev/julia/tempwip/ParserCombinator/src/gml/GML.jl:74
using http://www-personal.umich.edu/~mejn/netdata/polblogs.zip - 85k lines, but only 1490 nodes and 19091 edges.
Firstly, thanks for this package! I'm playing around with it and it's exceptionally easy to use.
This is a very minor matter (as it's just syntax), but it seems the `Seq` combinator used here is a Cartesian product (correct me if I'm wrong). Consequently, wouldn't `*` or `×` be a more natural choice of symbols? What are your thoughts?
Consider the following program:
s = """
digraph "g" {
}
"""
ast = DOT.parse_dot(s)[1]
s = """
digraph g {
}
"""
ast = DOT.parse_dot(s)[1]
Their `id` should be different, but currently the parser does not parse the quotes; it just ignores them.
Hello and thank you for this great library. I'm trying to parse a language with fat arrow funcdefs as in:
a() => 1
However, they are never being used because funccall is a shorter version of this:
a()
I've verified they both work as removing funccall allows funcdef to work again. Is there any way to set precedence or otherwise allow funcdef to be used? Thanks for any advice or feedback.
Here are the relevant parser definitions:
arglist = (name | (name + E","))[0:end]
funccall = name + E"(" + arglist + E")" |> FuncCall
paramlist = ((name | assign) | ((name | assign) + E","))[0:end]
funcbody = stmt | (whitespacereq + stmt)[1:end]
funcdef = name + E"(" + paramlist + E")" + E"=>" + E"\n"[0:end] + funcbody |> FuncDef
I want to extract information from the GCN NASA website. Can ParserCombinator.jl be of help with my problem? Please see Discourse: https://discourse.julialang.org/t/web-scraping-of-gcn-nasa-circulars-text/100809
Hi - I was wondering if this project is alive or has been abandoned? I wanted to try to use a combinator-based parser in julia and found this one. However, there haven't been any git commits for a while. I've hit a snag where StarPlus tends to get stuck in stupidly deep recursions, and have also found the error reporting to be a bit difficult to follow but don't really want to end up as the only user/developer of a library.
Is there an alternative to this or any will to make this work with current Julia?
The Example yields:
┌ Info: Precompiling ParserCombinator [fae87a5f-d1ad-5cf0-8f61-c941e1580b46]
└ @ Base loading.jl:1260
syntax: extra token "Node" after end of expression
The current implementation
PFloat64() = Parse(p"-?(\d*\.?\d+|\d+\.\d*)([eE]\d+)?", Float64)
is missing the optional sign for the exponent: "-?(\d*\.?\d+|\d+\.\d*)([eE]-?\d+)?"
An implementation of PFloat16 is also missing.
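The difference between the two patterns can be checked with plain Julia regexes, outside the library. Both are anchored at the start here for the comparison, which is an assumption about how the matcher applies them; `matched` is a hypothetical helper.

```julia
current  = r"^-?(\d*\.?\d+|\d+\.\d*)([eE]\d+)?"     # pattern from the current PFloat64
proposed = r"^-?(\d*\.?\d+|\d+\.\d*)([eE]-?\d+)?"   # pattern with the signed exponent

# Text consumed by the pattern at the start of `s`.
matched(pattern, s) = match(pattern, s).match

matched(current, "1.5e-3")    # "1.5": the exponent is left unconsumed
matched(proposed, "1.5e-3")   # "1.5e-3"
value = parse(Float64, matched(proposed, "1.5e-3"))
```

Neither pattern accepts an explicit `+` in the exponent (as in `1e+5`); whether that should also be allowed is a separate question.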
Currently, parsing http://www-personal.umich.edu/~mejn/netdata/celegansneural.zip is taking a long time (~ 20 minutes) on my machine. This is a 15k-line file with 302 nodes and 2359 edges.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your `TagBot.yml` to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment `TagBot fix` on this issue.
I'll open a PR within a few hours, please be patient!
This parser fails to parse the following dot program, which is an example from https://graphviz.org/Gallery/directed/datastruct.html
digraph g {
fontname="Helvetica,Arial,sans-serif"
node [fontname="Helvetica,Arial,sans-serif"]
edge [fontname="Helvetica,Arial,sans-serif"]
graph [
rankdir = "LR"
];
node [
fontsize = "16"
shape = "ellipse"
];
edge [
];
"node0" [
label = "<f0> 0x10ba8| <f1>"
shape = "record"
];
"node1" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |-1"
shape = "record"
];
"node2" [
label = "<f0> 0xf7fc44b8| | |2"
shape = "record"
];
"node3" [
label = "<f0> 3.43322790286038071e-06|44.79998779296875|0"
shape = "record"
];
"node4" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |2"
shape = "record"
];
"node5" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node6" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |1"
shape = "record"
];
"node7" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |2"
shape = "record"
];
"node8" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node9" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node10" [
label = "<f0> (nil)| <f1> | <f2> |-1"
shape = "record"
];
"node11" [
label = "<f0> (nil)| <f1> | <f2> |-1"
shape = "record"
];
"node12" [
label = "<f0> 0xf7fc43e0| | |1"
shape = "record"
];
"node0":f0 -> "node1":f0 [
id = 0
];
"node0":f1 -> "node2":f0 [
id = 1
];
"node1":f0 -> "node3":f0 [
id = 2
];
"node1":f1 -> "node4":f0 [
id = 3
];
"node1":f2 -> "node5":f0 [
id = 4
];
"node4":f0 -> "node3":f0 [
id = 5
];
"node4":f1 -> "node6":f0 [
id = 6
];
"node4":f2 -> "node10":f0 [
id = 7
];
"node6":f0 -> "node3":f0 [
id = 8
];
"node6":f1 -> "node7":f0 [
id = 9
];
"node6":f2 -> "node9":f0 [
id = 10
];
"node7":f0 -> "node3":f0 [
id = 11
];
"node7":f1 -> "node1":f0 [
id = 12
];
"node7":f2 -> "node8":f0 [
id = 13
];
"node10":f1 -> "node11":f0 [
id = 14
];
"node10":f2 -> "node12":f0 [
id = 15
];
"node11":f2 -> "node1":f0 [
id = 16
];
}
Need to change to something else... q for quote? (near p for pattern).