Giter Site home page Giter Site logo

lightyear's Introduction

Lightyear

Lightweight parser combinator library for Idris, inspired by Parsec.

Build Status

Module overview:

  • Lightyear.Core: central definitions + instances
  • Lightyear.Errmsg: error message formatting, mainly internal library
  • Lightyear.Combinators: generic combinators like many or sepBy
  • Lightyear.Char: char-bound parsers like char or space
  • Lightyear.Strings: string-bound parsers like strings or lexeme

Synopsis

This package is used (almost) the same way as Parsec, except for one difference: backtracking.

Commitment

  • Parsec combinators won't backtrack if a branch of <|> has consumed any input, hence Parsec parsers require an explicit try.

  • Lightyear parsers are backtrack-by-default and there is the commitTo combinator that makes the parser commit to that branch.

In other words, the following two pieces of code are equivalent (using illustrative combinator names):

Parsec:

elem :: Parser Int
elem = (try (string "0x") >> hexNumber) <|> decNumber

Lightyear:

elem : Parser Int
elem = (string "0x" $> commitTo hexNumber) <|> decNumber
-- which may be abbreviated as:
--   = (string "0x" >! hexNumber) <|> decNumber

After reading the prefix 0x, both parsers commit to reading a hexadecimal number or nothing at all — Parsec does this automatically, Lightyear uses the commitTo combinator for this purpose. On the other hand, Parsec requires the string "0x" to be wrapped in try because if we are reading 0123, we definitely don't want to commit to the left branch upon seeing the leading 0.

For convenience, commitTo is merged in monadic and applicative operators, yielding the operators >!=, >!, <$!>, <$!, and $!>. The ! in the names is inspired by the notation used for cuts in Prolog.

A parser that uses commitment might look like this (notice the leading char '@' that leads to commitment):

entry : Parser Entry
entry = char '@' >! do
  typ <- pack <@> some (satisfy (/= '{'))
  token "{"
  ident <- pack <@> some (satisfy (/= ','))
  token ","
  items <- item `sepBy` comma
  token "}"
  return $ En typ ident items

Lazy Branching with <|>|

It is worth noting that Idris itself is a strict language, and thus the <|> operator will evaluate both its arguments eagerly by default. In order to lazily evaluate different parsing branches we are required to use a special operator: <|>|. In general, all recursive calls of combinators have to occur in a lazy context. (With mutual recursion, this generalises to the rule that each call cycle has to be broken by laziness in at least one place.)

In the wild, it might look like this:

partial parseExpr : Parser SExpr
parseExpr = parseName <|>| ( MkSExpr <$> parens (many parseExpr) )

In the above example, the whole RHS of <|>| is lazy, and so the recursive occurrence of parseExpr in it will be evaluated only if the LHS of <|>| fails. Using <|> would cause infinite recursion.

For convenience, a version of <*> that lazily evaluates its second argument is included as <*>|. Conversely to <|>|, the RHS of <*>| will be evaluated only if the LHS of <*>| succeeds.

Example

Lightyear is used to parse BibTeX in bibdris.

Build

$ make clean
$ make test
$ make install

lightyear's People

Contributors

0xflotus avatar bartadv avatar bryce-anderson avatar david-christiansen avatar jfdm avatar leostera avatar melvar avatar oleks avatar steshaw avatar superfunc avatar termina1 avatar timjb avatar vyp avatar ziman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

lightyear's Issues

Lightyear Tests segfaults.

Using Idris HEAD (3c4bd14) and Lightyear HEAD (53c605) the lightyear tests seg fault.

$ make install && make test results in:

(cd tests; bash runtests.sh)
compiling lightyear tests...
compiled OK, running lightyear tests...
runtests.sh: line 18: 17864 Segmentation fault      timeout 5s ./test > output
* test failed or timed out *
make: *** [test] Error 1

Upon further exploration, I find that if main is executed in the REPL the tests succeed. If :exec is called the tests seg fault.

space parses zero or more whitespace, while it would make more sense to parse only one!

Currently Lightyear.Strings.space parses zero or more whitespace, which seems strange given it singular naming. Moreover, this is in contrast to Parsec which defines:

space

and

spaces

to separate these two cases. From Parsec.Text.Chars:

-- | Skips /zero/ or more white space characters. See also 'skipMany'.
spaces :: (Stream s m Char) => ParsecT s u m ()
spaces = skipMany space <?> "white space"

-- | Parses a white space character (any character which satisfies 'isSpace')
-- Returns the parsed character.

space :: (Stream s m Char) => ParsecT s u m Char
space = satisfy isSpace <?> "space"

Issue translating megaparsec example

Hi, I'm trying to translate the last megaparsec example on this page.

http://akashagrawal.me/beginners-guide-to-megaparsec/

I ended up with this, which typechecks.

boldP : Parser (List Char)
boldP = do
  _ <- ntimes 2 (char '*')
  txt <- some (alphaNum <|> char ' ')
  _ <- ntimes 2 (char '*')
  pure $ unpack "<strong>" ++ txt ++ unpack "</strong>"

foo : Either String (List Char)
foo = parse boldP "**idris is cool**"

But it seems to fail to do anything.

Expression: foo = case case uncons "idris is cool**" of
       Nothing => ue [Err pos "a token, not EOF"] (ST i pos tw)
       Just (t, rest) => case f t of
                           Nothing => ue [Err pos "a different token"] (ST i pos tw)
                           Just res => case updatePos tw pos t of
                                         (newPos, b) => us res (ST rest newPos tw) of
  Id r => case r of
            MkReply _ (Success x) => Right x
            MkReply _ (Failure es) => Left (concat (intersperse "\n" (map display es))) : Either String (List Char)

JSON as a separate library

I want to use JSON in my Idris application. I started writing a JSON library but then realised I should just use Lightyear, then I saw that it was even an example!

Json example

I think the json example is usefull, would it be a bad idea to add it to the package?

Universe inconsistency in Test.idr with Idris 0.9.19

compiling lightyear tests...
Test.idr:18:1:Universe inconsistency.
Working on: o39
Old domain: (7,7)
New domain: (7,6)
Involved constraints:
ConstraintFC {uconstraint = o39 < p39, ufc = Test.idr:18:1}
ConstraintFC {uconstraint = o39 < p39, ufc = Test.idr:18:1}
ConstraintFC {uconstraint = k39 <= o39, ufc = Test.idr:18:1}
ConstraintFC {uconstraint = m39 <= o39, ufc = Test.idr:18:1}

  • could not compile tests *

MonadState implementation for ParserT?

Would it be possible to add an implementation of MonadState for ParserT?

I took a look at the implementation for Monad, and just blinked. I'm not sure I'm up to defining this myself.

Ambiguous :: in Lightyear/String_.idr c2s

./Lightyear/String_.idr:37:5:When elaborating right hand side of c2s:
Can't disambiguate name: Prelude.List.::, Prelude.Stream.::, Prelude.Vect.::

It works if I change the [] to Prelude.List.Nil

Idris 0.9.10

documentation

Is there any documentation available for this library?

All string parsers are now partial

Every single string parser is now partial because #55 has injected its partial tab-counting code into the Stream Char String instance (and ignoring the tabwidth given to updatePos in favor of the default).

(Also, is there anywhere to read what that code is actually supposed to accomplish?)

Infinite loop

I haven't managed to prepare smaller repro, this is what I found out so far.

Given following code: http://lpaste.net/2559959787278172160 (with test cases in comments), one of the test cases causes infinite loop.

Removing one of those fixes it (that's why I haven't been able to nail it down, since it might be tad more complicated interplay between two parsers):

  1. Removing many from here (throwing entire sepBy, as it uses many indirectly):
pKey : Parser (List String)
pKey = sepBy1 (pack <$> many alphaNum) (char '.')
  1. Removing getPosition from here (I run the branch with latest getState fix):
pPartial : (Position -> Maybe Position) -> Parser Node
pPartial f = do
  pos <- f <$> getPosition
  key <- pTag ">" <?> "partial tag"
  pure (Partial key pos)

Alternative backtracks, but the transformer does not

Steps to reproduce

import Control.Monad.State
import Lightyear
import Lightyear.Char
import Lightyear.Combinators
import Lightyear.Strings

-- hide the one from Lightyear.Strings
%hide Parser

Parser : Type -> Type
Parser = ParserT String (State Integer)

pModify : Parser ()
pModify = do
  token "+"
  modify (+ 1)

pTextBlock : Parser String
pTextBlock = pack <$> many anyChar

parser : Parser (List String)
parser = manyTill (choice alts) eof
  where
    alts : List (Parser String)
    alts = [ pure "" <* ((pModify <* string "can'thappen") <|> pModify)
           , pTextBlock
           ]

testParse : Parser a -> String -> Either String (a, Integer)
testParse p src =
  let initialVal = 0
      Id (r,s) = flip runStateT initialVal $ execParserT p (initialState Nothing src 8)
  in case r of
    MkReply _ (Success x)  => Right (x,s)
    MkReply _ (Failure es) => Left $ concat $ intersperse "\n" $ map display es

Running

λΠ> :exec testParse parser "+ho ho"

gives

Right (["", "ho ho"], 2)

Observed behaviour

The returned value means that our state had been modified two times. And this is because of the alternative:

((pModify <* string "can'thappen") <|> pModify)

which first enters the left branch (which won't succeed due to dummy string parse at the end), and then the right branch kicks in, and succeeds.

Expected behavior

If the default behavior is to backtrack, then the result of the transformer needs to be discarded as well.

Build failure with idris 0.12.1

~/Projects/idris/lightyear (master)$ make install
idris --build lightyear.ipkg
Type checking ./Lightyear/StringFile.idr
./Lightyear/StringFile.idr:22:12:When checking type of Lightyear.StringFile.Lightyear.readFile:
Can't disambiguate since no name has a suitable type: 
        Effects.DepEff.Eff, Effects.SimpleEff.Eff, Effects.TransEff.Eff
make: *** [build] Error 1

I do not know the effects system yet, otherwise I would send a PR.

requireFailure consumes an input (does not do backtracking)

The documentation claims that requireFailure is called notFollowedBy in parsec and that it doesn't consume any input. However, I see it does.

For example, I want to parse a series of any 4 chars. However, these chars shouldn't form a specific string ("bb" in an example below). So "aaaa" and "abcd" should math, but neither "bbcd" nor "abbc" should not.

I composed a following parser:

ntimes 4 (requireFailure (string "bb") *> anyChar)

However, I noticed, that it "eats" single b chars. E.g.

parse (ntimes 4 (requireFailure (string "bb") *> anyChar)) "abcde"

results in ['a', 'c', 'd', 'e'] (it fails, however, on "bbcd" and "abbc" as expected).

If following implementation of requireFailure is used:

requireFailure' : ParserT str m tok -> ParserT str m ()
requireFailure' (PT f) = PT requireFailureHelper where
    requireFailureHelper r us cs ue ce ss@(ST i pos tw) =
        f r
            (\t, s => ue [Err pos "argument parser to fail"] s)
            (\t, s => ce [Err pos "argument parser to fail"] s)
            (\errs, _ => us () ss)
            (\errs, _ => cs () ss)
            ss
parse (ntimes 4 (requireFailure' (string "bb") *> anyToken)) "abcde"

gives ['a', 'b', 'c', 'd'] as I expect.

Asked previously on StackOverflow.

getPosition is invalidating the parser state?

Given parser:

pFrob : Parser String
pFrob = do
  pos <- getPosition
  string "frob"
  pure "frob!"

following test code

parse (pFrob <|> string "w00t") "w00t"

should fail on left branch and succeed on second. However, this happens:

Left "At 1:1:\n\tstring \"frob\"\nAt 1:1:\n\tcharacter 'f'\nAt 1:1:\n\ta different token" : Either String
                                                                                                   String

Now, if I remove the getPosition action:

pFrob : Parser String
pFrob = do
  string "frob"
  pure "frob!"

it succeeds:

Right "w00t" : Either String String

won't build with 0.99

$ git clone [email protected]:ziman/lightyear.git
Cloning into 'lightyear'...
remote: Counting objects: 611, done.
remote: Total 611 (delta 0), reused 0 (delta 0), pack-reused 611
Receiving objects: 100% (611/611), 108.43 KiB | 57.00 KiB/s, done.
Resolving deltas: 100% (345/345), done.
Checking connectivity... done.
$ cd lightyear/
$ make clean
idris --clean lightyear.ipkg
rm -f tests/*.ibc
$ make test
idris --build lightyear.ipkg
Type checking ./Lightyear/Core.idr
Type checking ./Lightyear/Combinators.idr
Type checking ./Lightyear/Errmsg.idr
Type checking ./Lightyear.idr
Type checking ./Lightyear/Char.idr
Type checking ./Lightyear/Strings.idr
Type checking ./Lightyear/StringFile.idr
idris --install lightyear.ipkg
Installing Lightyear.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear
Installing Lightyear/Core.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing Lightyear/Combinators.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing Lightyear/Errmsg.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing Lightyear/StringFile.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing Lightyear/Strings.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing Lightyear/Char.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear/Lightyear
Installing 00lightyear-idx.ibc to /home/k/.cabal/share/x86_64-linux-ghc-7.10.3/idris-0.99/libs/lightyear
(cd tests; bash runtests.sh)
TIMOUTCMD set to timeout
compiling lightyear tests...
compiled OK, running lightyear tests...
compiling the JSON test...
compiled OK, running the JSON test...
9d8
<
16d14
<
31d28
<

something FAIL

Makefile:10: recipe for target 'test' failed
make: *** [test] Error 1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.