com-lihaoyi / fastparse
Writing Fast Parsers Fast in Scala
Home Page: https://com-lihaoyi.github.io/fastparse
License: MIT License
There is no Scala version yet, but I'm posting this here for your awareness.
https://news.ycombinator.com/item?id=9602055
Nom is such a good name for a parser. Will version 2.0 be called NomNom?
Hi there, how easy would it be to adapt the library to support tab completion for e.g. a REPL grammar, similarly to what is supported by the sbt parsers? Thanks
I have a problem where I get stuck in an infinite loop while constructing a ParseError. The actual parsing completes and results in an error, but when I then try to construct the ParseError it gets stuck in a loop while building the error trace. I'm using FastParse version 0.3.4.
It looks like it's getting stuck here. The stack looks like this while it's stuck:
at scala.collection.generic.Growable$class.loop$1(Growable.scala:54)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.immutable.List.$colon$colon$colon(List.scala:128)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:444)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
- locked <0x961> (a fastparse.parsers.Combinators$Rule)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
- locked <0x964> (a fastparse.parsers.Combinators$Rule)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
at fastparse.core.Parsed$TracedFailure$.apply(Parsing.scala:199)
at fastparse.core.Parsed$Failure$Extra$Impl.traced$lzycompute(Parsing.scala:116)
- locked <0x982> (a fastparse.core.Parsed$Failure$Extra$Impl)
at fastparse.core.Parsed$Failure$Extra$Impl.traced(Parsing.scala:116)
at fastparse.core.ParseError.<init>(Parsing.scala:34)
To reproduce:
Clone this and run the tests
As discussed in Gitter on 29-02-2016, the solution might be to provide a flag to ParseError that can be plumbed down into failure.extra.traced to disable the construction of traceParsers, or to call .distinct on traceParsers at every step to stop it from blowing up to infinity in these cases.
~! "foo" and ~!" foo" and ~ !"foo" all parse, and all do different things, despite being nearly indistinguishable visually.
Either negation or the cut operator should change. Negation could be Not, or cut could be ~> or somesuch. (~> because it visually suggests it's one way, and it parses with the same precedence as ~!.)
If this feature already exists I apologize for missing it.
It would be nice to have a built-in construct for parsers with predicates, like this:
import fastparse._
object Parser {
def predicated[T](parser: Parser[T])(pred: T => Boolean): Parser[T] = P {
parser.flatMap(x => if (pred(x)) Pass.map(_ => x) else Fail)
}
val pDigits: Parser[Int] = P(CharIn('0' to '9').rep(1).!.map(_.toInt))
val pEven: Parser[Int] = predicated(pDigits)({x => x % 2 == 0})
val pOdd: Parser[Int] = predicated(pDigits)({x => x % 2 != 0})
}
object Run extends App {
import Parser._
println(pEven.parse("123"))
println(pEven.parse("124"))
println(pOdd.parse("123"))
println(pOdd.parse("124"))
}
Run...
Failure(predicated:0 / Fail:3 ..."", false)
Success(124, 3)
Success(123, 3)
Failure(predicated:0 / Fail:3 ..."", false)
It seems that the ParserApi implicit does not get invoked in Scala 2.10. The same source code compiles with Scala 2.11.
Sample build.sbt:
scalaVersion := "2.10.6"
libraryDependencies += "com.lihaoyi" %% "fastparse" % "0.3.4"
Hello.scala:
import fastparse.all._
object Hello {
val t: P[String] = P("boo" | "bar")
}
sbt compile
:
[info] Compiling 1 Scala source to /tmp/fp/target/scala-2.10/classes...
[error] /tmp/fp/Hello.scala:4: value | is not a member of String
[error] val t: P[String] = P("boo" | "bar")
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 1 s, completed Dec 27, 2015 5:36:13 PM
90 seconds to compile StringIn with a 17 character string is way too long.
println("string length,time to compile StringIn (ms)")
for (subset <- "aaaaaaaaaaaaaaaaa".tails.toSeq.reverse) {
val start = System.currentTimeMillis()
StringIn(subset)
val end = System.currentTimeMillis()
println(s"${subset.size},${end - start}")
}
string length,time to compile StringIn (ms)
0,16
1,5
2,1
3,2
4,4
5,10
6,16
7,17
8,23
9,52
10,79
11,155
12,438
13,1274
14,3264
15,10069
16,29726
17,90797
There appears to be no provision for using the captured result of a lookahead. In a context sensitive parser it may be better to jump ahead and get some value used in building the correct context parser. Is there some reason lookahead needs to throw away captured values?
Use case: In YAML a Scalar Block indentation needs to be detected by looking at the indentation of first non-empty line. The complicating factor is that empty lines are not just thrown away, and how they are parsed depends on the indentation of the block.
So doing something like the following:
def BlockScalar(indent:Int) = // builds a parser using the indent
val block_scalar = block_scalar_header ~
&((" ".rep ~ "\n").rep ~ " ".!.rep.map(_.length)).flatMap(BlockScalar)
I have this code:
import fastparse._
object playGround extends App{
someParsing()
def someParsing() = {
val ID =
P(!CharIn('0' to '9') ~ (
CharIn('0' to '9').rep ~
CharIn('a' to 'z').rep ~
CharIn('A' to 'Z').rep ~
"-".rep ~ "_".rep).rep ~ " "
)
ID.parse("a ")
}
}
and when I run it in SBT, it just hangs and doesn't return anything!
I often have to parse primitive Java/Scala types (Int, Boolean, Double), and while for some types the parser is written quickly, the parser for floating-point numbers is particularly complicated.
One great thing about the Scala stdlib parser library was its inclusion of JavaTokenParsers, I think.
I suggest adding parsers just for the primitive Java/Scala types. If you want to include these parsers, I'm happy to supply a pull request. Maybe these parsers could reside within a package object under fastparse.parsers.javatokens. I'm not sure whether to supply only the parser, or also the mapping to the number types.
The needed parsers would be:
Byte, Int, Long
Float, Double
Boolean
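To illustrate why the floating-point case is the fiddly one, here is a library-free sketch (my own simplification, not the proposed parser) of the literal grammar such a parser would have to implement, encoded as a regex; it ignores hex floats, NaN/Infinity, and f/d suffixes:

```scala
object FloatLit {
  // Simplified grammar for a floating-point literal:
  //   sign? ( digits ('.' digits?)? | '.' digits ) exponent?
  private val pattern = """[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?""".r

  def isFloatLit(s: String): Boolean =
    pattern.pattern.matcher(s).matches()
}
```

Even this simplified form needs three optional pieces and two alternatives, which is exactly the kind of thing that is tedious to rewrite by hand in every project.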
Hi @vovapolu . I heard you are hacking on FastParse this summer. One thing that is similar to the changes you are doing would be support for true stream parsing. I actually opened #39 for this last year. It worked, but we never investigated the performance implications. There may be some refinements for speedup needed to avoid the control flow via exceptions. Maybe you want to take a crack at it :)?
One concrete use case I have personally would be improving Scala error messages with a streaming error message parser. Combining https://github.com/cvogt/cbt/ with https://github.com/cvogt/scalac-cosmetics/
Under "Writing Parsers" - "Capturing":
captureOpt is a Parser[Opt[String]]
That should be
captureOpt is a Parser[Option[String]]
They're dumb but we probably have to support them
@sirthias is there any way to override the parsing of every single character or string to check for these silly \u0123 thingies? Maybe by modifying my current wspStr and wspChar thingies? I suppose I'd need to get rid of anyOf or noneOf because those don't support the stupid unicode escapes either.
I don't want to do a pre-processing stage if I can reasonably avoid it. Preprocessing will destroy all the source locations and require elaborate gymnastics to get them back.
This should be really easy if anyone wants to pick it up
> compile
Generating Scalatex Sources...
[info] Compiling 3 Scala sources to C:\dev\prj\fastparse\readme\target\scala-2.11\classes...
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:5: invalid escape character
[error] wd = ammonite.ops.Path("C:\dev\prj\fastparse"),
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:5: invalid escape character
[error] wd = ammonite.ops.Path("C:\dev\prj\fastparse"),
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error] output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error] output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error] output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error] def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error] def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error] ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error] def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error] ^
[error] 8 errors found
[error] (readme/compile:compileIncremental) Compilation failed
[error] Total time: 2 s, completed 01-Jun-2015 11:27:18
Derives from lihaoyi/Scalatex#9.
noticed in the context of the Scala community build:
[fastparse] Checking Dir target/repos/scala
[fastparse:error] <console>:2: warning: Detected apparent refinement of Unit; are you missing an '=' sign?
[fastparse:error] def f1(a: T): Unit { }
[fastparse:error] ^
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error] at java.util.zip.ZipFile.getInflater(ZipFile.java:455)
[fastparse:error] at java.util.zip.ZipFile.getInputStream(ZipFile.java:374)
[fastparse:error] at java.util.jar.JarFile.getInputStream(JarFile.java:447)
[fastparse:error] at sun.misc.URLClassPath$JarLoader$2.getInputStream(URLClassPath.java:940)
[fastparse:error] at sun.misc.Resource.cachedInputStream(Resource.java:77)
[fastparse:error] at sun.misc.Resource.getByteBuffer(Resource.java:160)
[fastparse:error] at java.net.URLClassLoader.defineClass(URLClassLoader.java:454)
[fastparse:error] at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
[fastparse:error] at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
[fastparse:error] at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
[fastparse:error] at java.security.AccessController.doPrivileged(Native Method)
[fastparse:error] at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
[fastparse:error] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[fastparse:error] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[fastparse:error] at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
[fastparse:error] at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
[fastparse:error] at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
[fastparse:error] at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[fastparse:error] at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[fastparse:error] at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[fastparse:error] at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
should I just crank up the heap size, or you think there's a real regression here you want to investigate?
When my code tries to match on the result of a parse, such as:
dumpfileP.parse(mySQL) match {
case Result.Success(stmts, _) => stmts
case Result.Failure(parser, index) => displayFailure(...)
}
The Scala compiler consistently kicks out a warning:
[warn] /home/jducoeur/GitHub/Querki/querki/scalajvm/app/querki/imexport/MySQLImport.scala:143: The outer reference in this type test cannot be checked at run time.
[warn] case Result.Success(stmts, _) => stmts
This warning may be correct -- I'm honestly unsure -- but I don't care about it, and definitely don't want to see it. My coding standards are "no warnings", so this is getting in the way of using FastParse for production code.
This can be worked around by using isInstanceOf + asInstanceOf, but that's rather boilerplatey. A more idiomatic-Scala solution would be preferable.
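For context, a library-free sketch of when scalac emits this warning and the usual way to avoid it (this is an illustration, not FastParse's actual class layout, which I haven't checked):

```scala
// The "outer reference" warning appears when case classes are nested inside
// a *class*: a pattern match on Outer#Inner cannot verify at runtime which
// Outer instance the value belongs to.
class Outer { case class Inner(x: Int) }

// Nesting the hierarchy inside an *object* gives a stable path, so the
// outer-reference check (and the warning) disappears:
object Results {
  sealed trait Result
  final case class Success(value: String) extends Result
  final case class Failure(index: Int)    extends Result
}

object MatchDemo {
  def show(r: Results.Result): String = r match {
    case Results.Success(v) => v
    case Results.Failure(i) => s"failed at index $i"
  }
}
```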
Any plans to support this? How hard would it be, what would be good places to look, if I wanted to add it?
I have a use case where I want to parse a stream of lines that are not followed by a line break but preceded by one. There can be a significant wait between individual lines, so I need to parse and process a line before its terminating newline is sent. Most line-based streaming stuff breaks on that, unfortunately, so I imagine a Stream[Char] would be the right thing here.
Also see tpolecat/atto#11 which supports Streams of lines, not chars if I understand correctly
When using .rep with min = 0 and max = 0:
println( P(" ".rep(min=0, max=0) ~ End).parse(" "))
println( P(" ".rep(min=1, max=1) ~ End).parse(" "))
the result is
Success((),2)
Failure(End:1:3 ..." ")
The second parser and the second result are correct.
The first parser, with min=0 and max=0, producing a Success is not correct, because
println( P("" ~ End).parse(" "))
delivers
Failure(End:1:1 ..." ")
as expected; the first parser should likewise deliver a Failure.
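A minimal library-free model of the semantics the report expects (the names repConsume and atEnd are made up for illustration): repeating a token at most zero times consumes nothing, so a following End must fail whenever input remains.

```scala
object RepModel {
  // Consume `token` between min and max times; Some(pos) on success.
  def repConsume(in: String, token: String, min: Int, max: Int): Option[Int] = {
    var pos = 0; var count = 0
    while (count < max && in.startsWith(token, pos)) {
      pos += token.length; count += 1
    }
    if (count >= min) Some(pos) else None
  }

  // Models the End parser: succeeds only when all input is consumed.
  def atEnd(in: String, pos: Int): Boolean = pos == in.length
}
```

Under this model, rep(min=0, max=0) over " " stops at position 0, and End at position 0 of a one-character input fails, matching the reporter's expectation.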
It appears git tags are missing for the following releases that are listed in the changelog:
0.1.6 - 0.1.7
0.3.1
and the following versions in maven:
0.1.2 - 0.1.7
0.3.0 - 0.3.1
I keep getting StackOverflowErrors while trying to compile scala-parser, with both Java 7 (OS X) and Java 8 (Ubuntu 12.04, 14.04), as far back as commit a0c39cd (I didn't try further back), on a clean config (Ivy cache cleared).
sbt compile produces output like:
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Set current project to scala-parser (in build file:/home/test/tmp/scala-parser/)
[info] Compiling 8 Scala sources to /home/test/tmp/scala-parser/target/scala-2.11/classes...
java.lang.StackOverflowError
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at scala.tools.nsc.typechecker.Namers$Namer.typeErrorHandler(Namers.scala:111)
at scala.tools.nsc.typechecker.Namers$Namer.typeSig(Namers.scala:1539)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply$mcV$sp(Namers.scala:778)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:777)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:777)
at scala.tools.nsc.typechecker.Namers$Namer.scala$tools$nsc$typechecker$Namers$Namer$$logAndValidate(Namers.scala:1565)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:777)
at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:769)
at scala.tools.nsc.typechecker.Namers$$anon$1.completeImpl(Namers.scala:1681)
at scala.tools.nsc.typechecker.Namers$LockingTypeCompleter$class.complete(Namers.scala:1689)
at scala.tools.nsc.typechecker.Namers$$anon$1.complete(Namers.scala:1679)
...
at scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1628)
at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:4911)
at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5295)
at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedInternal(Typers.scala:5322)
at scala.tools.nsc.typechecker.Typers$Typer.body$2(Typers.scala:5269)
at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5273)
at scala.tools.nsc.typechecker.Typers$Typer.typedByValueExpr(Typers.scala:5351)
at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedStat$1(Typers.scala:2977)
at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$60.apply(Typers.scala:3081)
at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$60.apply(Typers.scala:3081)
at scala.collection.immutable.List.loop$1(List.scala:172)
at scala.collection.immutable.List.mapConserve(List.scala:188)
at scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:3081)
at scala.tools.nsc.typechecker.Typers$Typer.typedBlock(Typers.scala:2340)
at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$typedOutsidePatternMode$1$1.apply(Typers.scala:5217)
at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$typedOutsidePatternMode$1$1.apply(Typers.scala:5217)
at scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1(Typers.scala:5216)
at scala.tools.nsc.typechecker.Typers$Typer.typedInAnyMode$1(Typers.scala:5252)
at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5259)
at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5295)
at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedInternal(Typers.scala:5322)
at scala.tools.nsc.typechecker.Typers$Typer.body$2(Typers.scala:5269)
at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5273)
at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5362)
at scala.tools.nsc.typechecker.Typers$Typer.computeType(Typers.scala:5453)
at scala.tools.nsc.typechecker.Namers$Namer.assignTypeToTree(Namers.scala:876)
[error] (compile:compile) java.lang.StackOverflowError
[error] Total time: 4 s, completed 6 mars 2015 10:16:29
Would you have any idea of what could cause that? Am I the only one getting these errors?
There are some cases where we need to fail gracefully within a map call. As an example, I have some code that parses a charset name and needs to turn it into an instance of java.nio.charset.Charset. The name itself might be valid syntactically but not actually be the name of a valid Charset, which I can only discover in the map call.
I think what I'm really asking for is for map to be able to return a failing parser - that is, for Parser to have a flatMap method.
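A library-free sketch of the flatMap pattern being requested, using Either as a stand-in for a parse result (parseCharsetName and parseCharset are hypothetical names for illustration, not FastParse API; a real parser's flatMap would also thread the input position):

```scala
import java.nio.charset.Charset

object CharsetParse {
  // Stand-in for the syntactic parse: accept names made of the characters
  // legal in charset names.
  def parseCharsetName(s: String): Either[String, String] =
    if (s.nonEmpty && s.head.isLetterOrDigit &&
        s.forall(c => c.isLetterOrDigit || "-_.:+".contains(c)))
      Right(s)
    else Left(s"syntactically invalid charset name: $s")

  // flatMap lets the mapping step itself fail, which plain map cannot do:
  def parseCharset(s: String): Either[String, Charset] =
    parseCharsetName(s).flatMap { name =>
      if (Charset.isSupported(name)) Right(Charset.forName(name))
      else Left(s"unknown charset: $name")
    }
}
```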
That would make it more general and let Ammonite parse Vector[Char]s without mangling them into strings first.
haoyi-haoyi@ object Foo{
import fastparse.all._
val plus = P( "+" )
val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
val side = P( "(" ~! expr ~! ")" | num ).log()
val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}.log()
}
haoyi-haoyi@ Foo.expr.parse("(1+(2+3x))+4").asInstanceOf[fastparse.core.Result.Failure].index
+expr:0
+side:0
+expr:1
+side:1
-side:1:Success(2)
+side:3
+expr:4
+side:4
-side:4:Success(5)
+side:6
-side:6:Success(7)
-expr:4:Success(7)
-side:3:Failure(side:3 / ")":3 ..."(2+3x))+4", cut)
-expr:1:Failure(expr:1 / side:3 / ")":1 ..."1+(2+3x))+", cut)
-side:0:Failure(side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
-expr:0:Failure(expr:0 / side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
res76: Int = 7
This should probably be ")":7 rather than ")":0. The final index seems right, so something is funky in the logging.
Tested on 0.2.1:
sealed trait Val
case class Var(value : String) extends Val
val startTime = new Date()
val vars = List("\r\n", "\n", "hl", "SA", "avxavxavxavxavxavx").toSeq
val variables : Parser[Var] = P ( StringIn(vars:_*).! ).map(Var)
val Result.Success(myVars, _) = variables.parse("SA")
val endTime = new Date()
val dateDiff = new SimpleDateFormat("mm:ss").format(new Date(endTime.getTime - startTime.getTime))
println (s"myVars = $myVars")
println (s"took $dateDiff to parse")
results in the following output:
myVars = Var(SA)
took 03:11 to parse
This only happens on long strings, shorter ones process almost instantly. I've tried various combinations of the long string and it always results in an exceedingly long parsing time.
Previously we needed to implement a bunch of these ourselves because they didn't work in Scala.js, and this was done using a pre-computed/serialized bitset, which makes the JS output somewhat large. Many now exist in Scala.js itself, and ours can be retired.
Hi!
I'm very new to Scala and a total newbie with FastParse.
I've created a very simple SBT project and tried to compile and run some simple parsers from the web examples, but I get a lot of compilation errors from sbt, like this:
[error] /home/freinn/parsertests/src/main/scala/Main.scala:30: No implicit view available from String => fastparse.core.Parser[V].
[error] val ab = P( "a".rep ~ "b" )
[error] ^
[error] one error found
[error] Compilation failed
Or another one:
[error] /home/freinn/tecsisaDSL/src/main/scala/Main.scala:22: value ~ is not a member of String
[error] val ab = P("a" ~ "b")
[error] ^
[error] /home/freinn/tecsisaDSL/src/main/scala/Main.scala:41: too many arguments for method println: (x: Any)Unit
[error] println(ParserSet.val1, ParserSet.val2)
[error] ^
[error] two errors found
I'm using 0.3.7 with the following line in build.sbt:
libraryDependencies += "com.lihaoyi" %% "fastparse" % "0.3.7"
I did the import (import fastparse.all._) at the beginning of my file and created an object ParserSet { ... } with all the code copied/pasted from the web examples.
Implement repN and repExactN instances that collect either N-or-more or exactly N instances of the given parser, by analogy with rep and rep(1).
It would be very useful to have some monadic functions to run on the parsed result (similar to how we work with Trys): things like map, recover, orElse, getOrElse, etc.
Sorry for being pedantic, but I think the XML parser could be part of a standalone project.
haoyi-haoyi@ object Foo{
import fastparse.all._
val plus = P( "+" )
val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
val side = P( "(" ~! expr ~! ")" | num ).log()
val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}.log()
}
haoyi-haoyi@ Foo.expr.parse("(1+(2+3x))+4").asInstanceOf[fastparse.core.Result.Failure].index
+expr:0
+side:0
+expr:1
+side:1
-side:1:Success(2)
+side:3
+expr:4
+side:4
-side:4:Success(5)
+side:6
-side:6:Success(7)
-expr:4:Success(7)
-side:3:Failure(side:3 / ")":3 ..."(2+3x))+4", cut)
-expr:1:Failure(expr:1 / side:3 / ")":1 ..."1+(2+3x))+", cut)
-side:0:Failure(side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
-expr:0:Failure(expr:0 / side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
res76: Int = 7
This should probably be ")":7 rather than ")":0. The final index seems right, so something is funky in the logging.
Cuts are a nice feature for generating error information and for simplifying parsing, but they appear to be global in nature. That is, you can't (as far as I can tell) provide a scope for the cut that essentially says: for this parsing branch you treat this as a cut, but if you bail out on this entire branch, there's still another one to consider.
A motivating example: data may usually be in a format that can be efficiently represented, such as an array of ints, but may (rarely) contain valid yet not-efficiently-represented data (e.g. a struct). Scoped cuts can allow you to get precise information when it really must be an array of ints without having to write a second parsing routine to handle the case where non-Int input is okay.
Of course there are ways to do this: do lax parsing, then filter by sub-parsing the captured string using cuts. This, however, is inelegant and inefficient.
There is an additional consideration which is if one has scoped cuts, can you "cut more deeply" to make a cut that will escape one or more levels of scoping (or perhaps can escape scopes of particular names)? I do not yet have an opinion on whether this is a good idea.
The minimal syntax would look something like
val Num = CharsWhile(c => c >= '0' && c <= '9').!.map(_.toInt)
val Str = ("\"" ~ CharsWhile(c => c != '"').! ~ "\"")
val NumArray = Num.rep(1, " " ~! Pass)
val AnyArray = Scoped(NumArray) | (Num | Str).rep(1, " " ~! Pass)
(a.! ~ b.! ~ c.!).! is a Parser[String], ignoring the inner captures. There should be a way to capture all 4 things, like in a regex: ((...)(...)(...)).
Apparently "it should be trivial to implement by taking the source code for Capture and making it append to a tuple instead of replacing the innards".
Just recording it here, hopefully I can contribute this some time.
index is enough for machines but hard to read for humans. It's trivial to get the line/col via:
val lines = f.input.take(f.index).lines.toVector
val line = lines.length
val col = lines.last.length
But it's probably worth chucking this (or some more-optimized version of it) into the Failure object for everybody's convenience.
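A self-contained version of that computation (plain Scala, no fastparse types; the 1-based line/column convention is my own choice) might look like:

```scala
// Sketch: derive a human-readable 1-based (line, column) pair from a flat
// character index. Assumes '\n' line endings.
object LineCol {
  def lineCol(input: String, index: Int): (Int, Int) = {
    val upto = input.substring(0, index)          // everything before the index
    val line = upto.count(_ == '\n') + 1          // newlines seen so far
    val col  = index - (upto.lastIndexOf('\n') + 1) + 1 // offset into the line
    (line, col)
  }
}
```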
[info] Failures:
[info] 90/93 pythonparse.ProjectTests.ansible
[info] java.lang.Exception: pythonparse/jvm/target/repos/ansible/lib/ansible/parsing/vault/__init__.py
[info] pythonparse.ProjectTests$.check(ProjectTests.scala:49)
[info] pythonparse.ProjectTests$$anonfun$16$$anonfun$apply$9.apply(ProjectTests.scala:61)
[info] pythonparse.ProjectTests$$anonfun$16$$anonfun$apply$9.apply(ProjectTests.scala:53)
[info] Tests: 93
[info] Passed: 92
[info] Failed: 1
I noticed this happening in the Scala community build: https://scala-ci.typesafe.com/job/scala-2.11.x-jdk8-integrate-community-build/123/consoleFull
I found that overriding WhitespaceApi also affects CharIn's behavior. Is this a bug?
import fastparse.WhitespaceApi
import fastparse.noApi._
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(" ".rep)
}
import White._
val x = P("abc" ~ CharIn("def").rep.!)
x.parse("abcdd d")
// result: res0: fastparse.core.Result[String] = Success(dd d,7)
Please try running this two ways.
The output should be the same for both but it is not.
<
Some text<
object BugReport extends App {
  import fastparse.all._
  import fastparse.core.Result

  object MyParser {
    val number = P( CharPred(ch => ch.isDigit || ch == '.').rep.! )
    val text   = P( CharPred(ch => ch.isSpaceChar || ch.isLetterOrDigit).rep.! )
    val textFirst   = (text | number).!
    val numberFirst = (number | text).!
    def target = numberFirst
    def parseItem(str: String) = target.parse(str)
  }

  val input = "Some text"
  MyParser.parseItem(input) match {
    case Result.Success(res, _) => println(">" + res + "<")
    case x => println("Could not parse the input string: " + x)
  }
}
We want to parse the following: the first line contains the size, the second line an integer sequence of that size. We use WhitespaceApi to define spaces and tabs as whitespace; line endings are considered significant and have their own parser.
We first parse the size, then the repeated sequence of integers using flatMap
. However, flatMap
does not eat whitespace between the first parser and the second parser. Written without an explicit whitespace token (see Scala code below), the following will parse:
2
3 4
but not the following:
2
3 4
Code below:
import fastparse.WhitespaceApi
object Test extends App {
// whitespace contains spaces and tabs
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(CharsWhile(" \t".contains(_)).?)
}
import White._
import fastparse.noApi._
// line endings
val lineEnding: P[Unit] = P("\r".? ~ "\n")
// non-negative integer
val nnInt: P[Int] = P( CharIn('0'to'9').repX(1).!.map(_.toInt) )
// sized sequence of integers, separated by whitespace
def seqInt(n: Int): P[Seq[Int]] = nnInt.rep(min=n, max=n)
// size followed by sized sequence
val sizeAndSeqInt: P[Seq[Int]] = (nnInt ~ lineEnding).flatMap( n => seqInt(n) )
val sizeAndSeqInt1: P[Seq[Int]] = (nnInt ~ lineEnding).flatMap( n => Pass ~ seqInt(n) )
sizeAndSeqInt.parse("2\n3 4").get // Success
sizeAndSeqInt1.parse("2\n 3 4").get // Success
sizeAndSeqInt.parse("2\n 3 4").get // Failure
}
The behavior is different from the Scala parser combinators and should be documented (or the flatMap semantics changed, if that makes sense).
Would it be possible to implement error recovery, like parboiled1 does (trying to synchronize the input by adding or deleting tokens from the token stream)? This would allow many more uses for fastparse, like creating very powerful editors for DSLs, in combination with scala.js and CodeMirror.
I happened to find bugs in syntax-error reporting for typos on def, val, or var, which do not occur if the method or variable has no type annotation. I assume these should be reported as syntax errors even when they're not type-annotated.
Here are examples of a method with a typo on def; the behavior for a variable with a typo on val or var is the same.
scala> val p0 = scalaparse.Scala.CompilationUnit
p0: fastparse.P0 = CompilationUnit
//Reporting typo of `def`, if return type is annotated
scala> p0.parse("object A{def i:Int = 1}",0,true)
res24: fastparse.core.Result[Unit] = Success((), 23)
scala> p0.parse("object A{de i:Int = 1}",0,true)
res25: fastparse.core.Result[Unit] = Failure(CompilationUnit:0 / Body:0 / TopStatSeq:0 / TopStat:0 / Tmpl:0 / ObjDef:0 / DefTmpl:8 / TmplBody:8 / }:17 / "}":18 ..."= 1}", true)
//Not reporting typo of `def`, if return type is NOT annotated
scala> p0.parse("object A{def i = 1}",0,true)
res26: fastparse.core.Result[Unit] = Success((), 19)
scala> p0.parse("object A{de i = 1}",0,true)
res27: fastparse.core.Result[Unit] = Success((), 18)
Currently, the readme (user manual) does not contain any information on P(...).
Currently, log() displays debugging information only when a Failure object is returned. It would be helpful for log() to additionally display, for Success objects, the text (or a summarized version of it) that a rule processed, for a couple of reasons:
- a rule can succeed on the wrong text, so a Success object is returned; it is very hard to locate this type of error without seeing the text the Success object processed;
- Success objects make it easier to see where you are looking in the parse.
These original text strings should not pollute the visual space with too much information, though, which would make log() output hard to read. Thus, the strings should:
I've written a patch that does this, by using regexes to:
Although the string may be up to 49 chars long, in practice it is shorter than that due to breaking on whitespace. Here are some sample summarizations:
"Sticks and stones may break my bones but names will never hurt me."
0:66 Success: "sticks and"..."will never hurt me."
"I'm going to go to this shop to go shopping while she goes shopping at that shop."
0:81 Success: "I'm going to"..."shopping at that shop."
"We really may not be all that hungry since we ate a lot already."
0:64 Success: "we really may"..."ate a lot already."
"I've been studying a parser combinator library for scala because it might be useful for my projects."
0:100 Success: "I've been studying"..."for my projects."
"A newspaper reported that the store is going to plan new studies on the project."
0:80 Success: "a newspaper"..."on the project."
Here is how the Success strings look in debugging a program with log()
. The example program is a simple NLP chunker with an input sentence:
"A newspaper reported that the firm plans new studies on the project."
Without Success strings:
+s:0
+clause:0
+np:0
+adjP:2
-adjP:2:Failure(adjP:1:3 / adj:1:3 / ws:1:6 / (CharIn(" \t\n.;:?!").rep(1) | &(",") | End):1:6 ..."newspaper ")
+pp:12
-pp:12:Failure(pp:1:13 / prep:1:13 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:13 ..."reported t")
-np:0:Success(12)
+vp:12
+vConj:12
-vConj:12:Success(21)
+pp:21
-pp:21:Failure(pp:1:22 / prep:1:22 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:22 ..."that the f")
+np:21
+adjP:26
-adjP:26:Failure(adjP:1:27 / adj:1:27 / StringIn("big", "small", "fast", "slow", "new", "old", "next", "red", "blue", "green", "orange", "yellow", "white", "black", "grey", "silver", "gold", "good", "bad", "great", "awful", "cool", "awesome", "worthless", "useful", "clever", "smart", "dumb", "stupid", "ridiculous", "fun", "interesting", "boring", "hungry", "thirsty", "firm"):1:27 ..."the firm p")
-np:21:Failure(np:1:22 / (det.? ~ Logged(adjP,adjP,<function1>).? ~ n.rep(1) ~ Logged(pp,pp,<function1>).? | pronoun):1:22 ..."that the f")
-vp:12:Success(21)
-clause:0:Success(21)
+clauseConnector:21
-clauseConnector:21:Success(26)
+clause:26
+np:26
+adjP:30
-adjP:30:Success(35)
+pp:41
-pp:41:Failure(pp:1:42 / prep:1:42 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:42 ..."new studie")
-np:26:Success(41)
+vp:41
+vConj:41
-vConj:41:Failure(vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
-vp:41:Failure(vp:1:42 / vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
+copulaP:41
-copulaP:41:Failure(copulaP:1:42 / be:1:42 / StringIn("am", "are", "is", "was", "were", "will be", "be", "'m", "'s", "'re", "'ll"):1:42 ..."new studie")
-clause:26:Failure(clause:1:27 / (Logged(vp,vp,<function1>) | Logged(copulaP,copulaP,<function1>)):1:42 ..."the firm p")
-s:0:Failure(s:1:1 / End:1:22 ..."a newspape")
With Success strings:
+s:0
+clause:0
+np:0
+adjP:2
-adjP:2:Failure(adjP:1:3 / adj:1:3 / ws:1:6 / (CharIn(" \t\n.;:?!").rep(1) | &(",") | End):1:6 ..."newspaper ")
+pp:12
-pp:12:Failure(pp:1:13 / prep:1:13 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:13 ..."reported t")
-np:0:12 Success: "a newspaper "
+vp:12
+vConj:12
-vConj:12:21 Success: "reported "
+pp:21
-pp:21:Failure(pp:1:22 / prep:1:22 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:22 ..."that the f")
+np:21
+adjP:26
-adjP:26:Failure(adjP:1:27 / adj:1:27 / StringIn("big", "small", "fast", "slow", "new", "old", "next", "red", "blue", "green", "orange", "yellow", "white", "black", "grey", "silver", "gold", "good", "bad", "great", "awful", "cool", "awesome", "worthless", "useful", "clever", "smart", "dumb", "stupid", "ridiculous", "fun", "interesting", "boring", "hungry", "thirsty", "firm"):1:27 ..."the firm p")
-np:21:Failure(np:1:22 / (det.? ~ Logged(adjP,adjP,<function1>).? ~ n.rep(1) ~ Logged(pp,pp,<function1>).? | pronoun):1:22 ..."that the f")
-vp:12:21 Success: "reported "
-clause:0:21 Success: "a newspaper reported "
+clauseConnector:21
-clauseConnector:21:26 Success: "that "
+clause:26
+np:26
+adjP:30
-adjP:30:35 Success: "firm "
+pp:41
-pp:41:Failure(pp:1:42 / prep:1:42 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:42 ..."new studie")
-np:26:41 Success: "the firm plans "
+vp:41
+vConj:41
-vConj:41:Failure(vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
-vp:41:Failure(vp:1:42 / vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
+copulaP:41
-copulaP:41:Failure(copulaP:1:42 / be:1:42 / StringIn("am", "are", "is", "was", "were", "will be", "be", "'m", "'s", "'re", "'ll"):1:42 ..."new studie")
-clause:26:Failure(clause:1:27 / (Logged(vp,vp,<function1>) | Logged(copulaP,copulaP,<function1>)):1:42 ..."the firm p")
-s:0:Failure(s:1:1 / End:1:22 ..."a newspape")
In the bottom version, it is much easier to follow the parse and ID the problem, which is not a Failure object, but the Success object: -np:26:41 Success: "the firm plans "
(i.e. an NP: "plans that are firm", rather than NP "the firm" + V "plans").
Right now "!" is the most confusing sign in fastparse, as it is used both for negation and for capturing. Moreover, !(something) does not provide any output, so to build something very common and trivial, like an "everything but space" parser, I have to do something like this:
val notSpace = (!" ").flatMap(v => AnyChar)
val stringWithoutSpaces = P( notSpace.rep.! )
I would be happy to have a less verbose way to do this.
For debugging purposes, I would like to assign the line number and the column to my tree nodes. How do I obtain these information?
Not everything that's useful lives in the library as a primitive, operator or class.
Some patterns aren't used widely enough, while others are difficult to encapsulate in a helper that's generic enough to be used in all cases, and others are so abstract that they're more developer workflows than code. Nevertheless, these are things that were learned while writing Scalaparse/Pythonparse/Scalatex and other parsers, and are worth writing down somewhere so others can learn from them.
Here are a few from gitter:
- Use CharsWhile with .rep to greatly improve performance chomping "mostly boring" stretches of characters
- Use cuts, which force left-factoring of your rules, which then makes them run faster since they backtrack/repeat less
- Use a NotNewline operator to force parsing of expressions to stop in certain contexts https://github.com/lihaoyi/fastparse/blob/master/scalaparse/shared/src/main/scala/scalaparse/Exprs.scala#L73
- Use .tupled with case class constructors to build the case classes conveniently https://github.com/lihaoyi/fastparse/blob/master/pythonparse/shared/src/main/scala/pythonparse/Expressions.scala#L121-L122

This task was given to me previously by the owner of the project @lihaoyi.
The parser will have to take either of http://code.jquery.com/jquery-2.2.1.min.js and http://code.jquery.com/jquery-2.2.1.js, parse it into an AST, and pretty-print it the same way http://jsbeautifier.org/ does.
I'm adding the issue now, but I'm thinking of doing it by the end of the summer, since I'm preoccupied by the GSoC 16 program. Hope that's ok!
When parsing long sections, types end up messy because there are no appropriate instances of Sequencer. This should be really easy if anyone wants to pick it up.
It's currently possible to specify a minimum number of repetitions for repeated sequences. It'd be great to also have a maximum number of repetitions.
My use case is parsing HTTP grammar, where for example language tags are defined as:
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
I haven't yet found a sane way to express that with fastparse, but maybe I'm missing something?
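For what it's worth, the 1*8ALPHA bound is exactly a {1,8} in a plain regex; the following sketch (not fastparse code, just pinning down the intended semantics of a rep with a max) expresses the language-tag grammar that way:

```scala
// The HTTP language-tag grammar as a plain regex: {1,8} is the bounded
// repetition the issue asks fastparse's rep to support.
object LanguageTag {
  private val subtag = "[A-Za-z]{1,8}"           // 1*8ALPHA
  private val tag = s"$subtag(?:-$subtag)*".r    // primary-tag *( "-" subtag )
  def matches(s: String): Boolean = tag.pattern.matcher(s).matches()
}
```

So a rep(min = 1, max = 8) over an alpha character class would be the direct fastparse analogue of the {1,8} quantifier here.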
val aa: P[String] = P("aa").!
val aaLA: P[(String, String)] = (&(aa)).! ~ aa
I'd have expected this to succeed, but it fails: val Success(("aa", "aa"), _) = aaLA.parse("aa")
Instead this succeeds: val Success(("", "aa"), _) = aaLA.parse("aa")