ekmett / parsers


Generic parser combinators

License: Other

Haskell 100.00%

parsers's Introduction

parsers

Goals

This library provides convenient combinators for working with and building parsing combinator libraries.

Given a few simple instances, you get access to a large number of canned definitions.

Contact Information

Contributions and bug reports are welcome!

Please feel free to contact me through github or on the #haskell IRC channel on irc.freenode.net.

-Edward Kmett

parsers's Issues

Compilation errors with latest updates (attoparsec?)

$ cabal update
$ cabal install parsers

Resolving dependencies...
Configuring parsers-0.11.0.2...
Building parsers-0.11.0.2...
Failed to install parsers-0.11.0.2
Last 10 lines of the build log ( /home/fibonacci/.cabal/logs/parsers-0.11.0.2.log ):
Preprocessing library parsers-0.11.0.2...
[1 of 8] Compiling Text.Parser.Token.Highlight ( src/Text/Parser/Token/Highlight.hs, dist/build/Text/Parser/Token/Highlight.o )
[2 of 8] Compiling Text.Parser.Permutation ( src/Text/Parser/Permutation.hs, dist/build/Text/Parser/Permutation.o )
[3 of 8] Compiling Text.Parser.Combinators ( src/Text/Parser/Combinators.hs, dist/build/Text/Parser/Combinators.o )

src/Text/Parser/Combinators.hs:376:10:
Not in scope: type constructor or class ‘Att.Chunk’

src/Text/Parser/Combinators.hs:382:21:
Not in scope: ‘Att.endOfInput’
cabal: Error: some packages failed to install:
parsers-0.11.0.2 failed during the building phase. The exception was:
ExitFailure 1

Allow overlap between prefix and term in buildExpressionParser?

I have a rather strange request, perhaps; what would you think about allowing prefix operator and term overlap in results of buildExpressionParser? That is, change

          termP      = do{ pre  <- prefixP
                         ; x    <- term
                         ; post <- postfixP
                         ; return (post (pre x))
                         }

to something like

          preTermP   =     do { pre <- prefixP; x <- term; return $ pre x }
                       <|> term

          termP      = do { x <- preTermP
                         ; post <- postfixP
                         ; return (post x)
                         }

Feature request: instances for more parsing libraries

It would be nice to have instances of the classes exposed by this library for polyparse and uu-parsinglib. I find these libraries much nicer to work with because <|> backtracks by default. Unfortunately, they don't come with a set of predefined parsers for common tasks like parsing floats, and I often fall back on parsec for this reason. If the combinators exposed by this library worked together with polyparse or uu-parsinglib, I would never use parsec again.

double and scientific don't parse negatives

> runParser double mempty "-1.0" :: Result Double
Failure (ErrInfo {_errDoc = (interactive):1:1: error: expected: double
-1.0<EOF>
^       , _errDeltas = [Columns 0 0]})

Negative Integers parse just fine.

The Haskell Report appears to specify negative literals for both integer and float in the grammar, and the fix appears easy. Is there a reason for the current behaviour, or is it just an oversight?
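Until the library handles the sign itself, a caller can wrap the existing parser. The following is a minimal sketch, assuming the `double` token parser from Text.Parser.Token and `char` from Text.Parser.Char; the name `signedDouble` is hypothetical and not part of the library:

```haskell
import Control.Applicative ((<|>))
import Text.Parser.Char (char)
import Text.Parser.Token (TokenParsing, double)

-- | Hypothetical wrapper: parse an optional leading sign, then an unsigned double.
-- If the sign is consumed and 'double' then fails, parsec-style parsers will not
-- backtrack over the sign; wrap the whole parser in 'try' if that matters.
signedDouble :: TokenParsing m => m Double
signedDouble = sign <*> double
  where
    -- Each alternative yields a function to apply to the parsed magnitude.
    sign = negate <$ char '-' <|> id <$ char '+' <|> pure id
```

With a parser such as trifecta's, `signedDouble` would be expected to accept the "-1.0" input from the report above.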

Package description insufficient

Hi,

“Parsing combinators” is a bit meagre. Please consider elaborating the package description.

Thanks,
Joachim

Instances for ReadP?

Can we actually make the standard ReadP parser provide us with instances? This would give us an out of the box reference implementation.

`natural` parser parses 0

In fact, it accepts any number of 0s, and returns 0 as a result, when the documentation specifies that it should only parse positive integers.

Text.Parser.Char.lower documentation excludes lowercase Unicode characters outside of ASCII

From the docs:

Parses a lower case character (a character between 'a' and 'z')

But the implementation uses isLower, which is not restricted to ASCII lowercase.
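The mismatch can be seen with base alone. This is a minimal sketch comparing `isLower` with the ASCII range test that the documentation describes; `asciiLower` is a hypothetical helper, and '\233' (é) is chosen only as an illustration:

```haskell
import Data.Char (isLower)

-- | The behaviour the documentation describes: a character between 'a' and 'z'.
asciiLower :: Char -> Bool
asciiLower c = c >= 'a' && c <= 'z'

main :: IO ()
main = do
  print (isLower '\233')     -- True: é is a lowercase letter in Unicode
  print (asciiLower '\233')  -- False: é is outside the ASCII range
  print (asciiLower 'q')     -- True
```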

Move notFollowedBy to LookAheadParsing?

I think notFollowedBy should be in LookAheadParsing.

The attoparsec implementation is different than the one for parsers that have lookahead.

parsers on Hackage does not allow attoparsec 0.12

http://hackage.haskell.org/package/parsers-0.11.0.2

Feature Request: case-insensitive string match

Is it possible to add such a function?
stringCI :: (CharParsing m, IsString s) => s -> m s
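One possible shape for such a function is sketched below, assuming only the `satisfy` method of CharParsing and the `<?>` combinator. It is specialised to String rather than the requested IsString return type, and it compares characters with `toLower`, which is not full Unicode case folding:

```haskell
import Data.Char (toLower)
import Text.Parser.Char (CharParsing, satisfy)
import Text.Parser.Combinators ((<?>))

-- | Sketch of a case-insensitive string match. Returns the text as it
-- appeared in the input, not the pattern that was asked for.
stringCI :: CharParsing m => String -> m String
stringCI s = traverse charCI s <?> show s
  where
    -- Accept any character that equals the expected one after lower-casing.
    charCI c = satisfy (\x -> toLower x == toLower c)
```

Returning the matched input rather than the pattern is a design choice; a caller who wants the canonical spelling can use `s <$ stringCI s` instead.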

Applicative Permutation?

The structure of Permutation makes it seem like it'd be an Applicative, but it isn't. If it can't be done on the current structure, it can be done with the equivalent of Curried (Yoneda Permutation) Permutation. This would allow traversing over a list and matching all permutations of that list, for instance.

Intermittent QuickCheck test failures

On rare occasions, the quickcheck test suite will fail. Here's an example from a recent Travis build:

Test suite quickcheck: RUNNING...
*** Failed! Falsifiable (after 33 tests): 
attoparsec
'\721424'
'\DLE'

That was on GHC 7.8.4, but the GHC version appears to be unimportant, since I can reproduce the issue on 8.2.1 as well:

*** Failed! Falsifiable (after 100 tests):                   
attoparsec
'\1044548'
'D'

Suggestion for a different interface for permutations.

Why did nobody think of releasing a package like this before? Very useful!

I have a suggestion for the permutation parsers. Martijn van Steenbergen created a cleaner and more powerful way to specify permutation parsers, with PermuteEffects and ReplicateEffects. Based on that, Doaitse Swierstra added MergeAndPermute parsers to uu-parsinglib. Here are some examples.

The two solutions are not completely the same, and which you prefer is a matter of taste, but both are an improvement over the operators from "Parsing Permutation Phrases".

attoparsec doesn't support lookAhead

@bos has expressed willingness to accept a patch, however.

Incorrect implementation of someSpace for Unlined in Text.Parser.Token.

The comments on the 'TokenParsing' type class suggest that 'someSpace' should parse at least one space, but the instance of 'someSpace' for 'Unlined' also accepts the empty string. Its definition is:

someSpace = skipMany (satisfy $ \c -> c /= '\n' && isSpace c)

where it should probably be something like

someSpace = skipSome (satisfy spaceNoNewline) where
  spaceNoNewline c = c /= '\n' && isSpace c

minimal example:

Text.Parser.Token Text.Trifecta.Parser Text.Parser.Combinators> parseTest (runUnlined someSpace *> eof ) ""
()

parsers 0.11.0.2 doesn't compile

[3 of 8] Compiling Text.Parser.Combinators ( src/Text/Parser/Combinators.hs, dist/build/Text/Parser/Combinators.o )

src/Text/Parser/Combinators.hs:376:10:
    Not in scope: type constructor or class `Att.Chunk'

src/Text/Parser/Combinators.hs:382:21:
    Not in scope: `Att.endOfInput'

See http://hydra.cryp.to/build/136645/nixlog/1/raw for a complete build log.

Parsing an out-of-range character literal throws an exception

Attempting to parse a character literal that is larger than maxBound using Text.Parser.Token.charLiteral throws an exception. It should produce a useful parse error.

Prelude Text.Trifecta.Parser Text.Parser.Token> parseTest charLiteral "'\\1114112'"
*** Exception: Prelude.chr: bad argument: 1114112

The exception comes from the use of toEnum in

parsers/src/Text/Parser/Token.hs

Line 582 in 2f53572

charNum = toEnum . fromInteger <$> num where

The result should be bounds-checked before being passed to toEnum.
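A bounds check of this kind can be sketched with base alone. The helper name `safeChr` is hypothetical; inside the parser, the Nothing case would be turned into a parse failure (for example with `unexpected`) instead of an exception:

```haskell
import Data.Char (chr, ord)

-- | Convert a numeric escape value to a Char, rejecting values outside
-- the valid code point range instead of throwing.
safeChr :: Integer -> Maybe Char
safeChr n
  | n >= 0 && n <= toInteger (ord maxBound) = Just (chr (fromInteger n))
  | otherwise = Nothing

main :: IO ()
main = do
  print (safeChr 65)       -- Just 'A'
  print (safeChr 1114111)  -- Just '\1114111', the largest valid code point
  print (safeChr 1114112)  -- Nothing
```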

For comparison, when asking GHC to parse the literal by simply entering it, it generates a (relatively) useful parse error:

λ> '\1114112'

<interactive>:53:9:
    numeric escape sequence out of range at character '2'

buildExpressionParser Error Messages

Hi!

The parser names in buildExpressionParser seem to be a bit off / confusing.

For instance, let's say I define a non-associative operator.

term :: DeltaParsing m => m Integer
term = natural <?> "number in term"

opTable :: (DeltaParsing m) => OperatorTable m Integer
opTable = [[binary "+" (+) AssocNone]]
  where binary op f assoc = Infix (f <$ reservedOp op) assoc

expression :: (DeltaParsing m) => m Integer
expression = buildExpressionParser opTable term

Then something like parseString (expression <* eof) mempty "2 + 2 + 2" will produce some confusing messages:

*>  parseString (expression <* eof) mempty "2 + 2 + 2"
Failure (interactive):1:7: error: expected: ambiguous use of a left-associative operator,
    ambiguous use of a right-associative operator,
    end of input
2 + 2 + 2<EOF>

The problem is that there is an ambiguous use of a non-associative operator, not that the parser should be expecting an ambiguous left- or right-associative operator. In fact, this parser has no left- or right-associative operators, so why should it claim to be expecting anything of the sort?

I may be very wrong, but I can't seem to get these ambiguous messages to ever show up in a meaningful way. I'm not sure what would make sense, but maybe something like...

expected: a non-ambiguous use of a left-associative operator

would be better for a case where a left-associative operator was used ambiguously?

Thanks!

Dead link in package description

The package description links to Text.Parser.Combinators.Parsing which no longer exists.

Wrong example

Hello, the example in the documentation (http://hackage.haskell.org/package/parsers-0.12.1/docs/Text-Parser-Expression.html) is wrong.

Instead of:

  expr    = buildExpressionParser table term
          <?> "expression"

  term    =  parens expr
          <|> natural
          <?> "simple expression"

  table   = [ [prefix "-" negate, prefix "+" id ]
            , [postfix "++" (+1)]
            , [binary "*" (*) AssocLeft, binary "/" (div) AssocLeft ]
            , [binary "+" (+) AssocLeft, binary "-" (-)   AssocLeft ]
            ]

  binary  name fun assoc = Infix (fun <* reservedOp name) assoc
  prefix  name fun       = Prefix (fun <* reservedOp name)
  postfix name fun       = Postfix (fun <* reservedOp name)

it should be:

  expr    = buildExpressionParser table term
          <?> "expression"

  term    =  parens expr
          <|> natural
          <?> "simple expression"

  table   = [ [prefix "-" negate, prefix "+" id ]
            , [postfix "++" (+1)]
            , [binary "*" (*) AssocLeft, binary "/" (div) AssocLeft ]
            , [binary "+" (+) AssocLeft, binary "-" (-)   AssocLeft ]
            ]

  binary  name fun assoc = Infix (fun <$ reservedOp name) assoc
  prefix  name fun       = Prefix (fun <$ reservedOp name)
  postfix name fun       = Postfix (fun <$ reservedOp name)
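The difference between the two operators shows up in the types: `f <$ p` runs p and replaces its result with the plain value f, whereas `f <* p` would require f itself to be a parser. A small illustration, with Maybe standing in for the parser type (an assumption made only to keep the sketch self-contained):

```haskell
-- | 'Just "-"' plays the role of a successful 'reservedOp "-"'.
-- '<$' keeps the effect and swaps in the function as the result.
prefixNegate :: Maybe (Integer -> Integer)
prefixNegate = negate <$ Just "-"

-- | Apply the operator that was "parsed" to an operand.
applied :: Maybe Integer
applied = fmap ($ 2) prefixNegate

main :: IO ()
main = print applied  -- Just (-2)
```

Writing `negate <* Just "-"` instead would not type-check, because `negate` is a function and not a value of type `Maybe a`.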

Remove or move the MonadPlus constraint.

Why does Parsing need a MonadPlus constraint?

Maybe it would be nice to be able to see which parts need monadic parsing, so perhaps you could remove the MonadPlus constraint and add class MonadPlus m => MonadicParsing m.

Test run failure

Not sure if I should report this here or with GHC:

Test suite doctests: RUNNING...
GHC runtime linker: fatal error: I found a duplicate definition for symbol
   _hs_text_memcpy
whilst processing object file
   /home/stackage/.cabal/lib/x86_64-linux-ghc-7.10.1/text_IINWRW1LxFGIctooOLjJAI/libHStext-1.2.0.4-IINWRW1LxFGIctooOLjJAI.a
This could be caused by:
   * Loading two different object files which export the same symbol
   * Specifying the same object file twice on the GHCi command line
   * An incorrect `package.conf' entry, causing some object to be
     loaded twice.
doctests: doctests: panic! (the 'impossible' happened)
  (GHC version 7.10.1 for x86_64-unknown-linux):
        loadArchive "/home/stackage/.cabal/lib/x86_64-linux-ghc-7.10.1/text_IINWRW1LxFGIctooOLjJAI/libHStext-1.2.0.4-IINWRW1LxFGIctooOLjJAI.a": failed

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Test suite doctests: FAIL
Test suite logged to: /home/stackage/work/logs/nightly/parsers-0.12.2.1/test-run.out

Approximate matching combinators (missing?)

I was looking for combinators for the case where I want to do an approximate matching.

So for example I'm matching ************\d\d\d\d, but I'll accept (for OCR) a few non-* characters in the first part of the string.

A useful combinator would allow alternatives with associated weights, with something like a Levenshtein distance available as a combinator.

Runtime crash caused by permutation parsing with Optimization flag

Consider the following code:

module Main where
import Control.Applicative
import Text.Parser.Permutation
import Text.Trifecta

main :: IO ()
main =  parseTest (myPerm "a") "semanticsppp"

myPerm :: String -> Parser (String, String, String)
myPerm opN = permute $
   (,,) opN <$$> string "semantics"
            <|?> ([], (some (char 'p')))

If I compile this code with the -O2 flag using GHC 7.6.3, the program halts with a runtime crash:

$ ghc -O2 crash.hs
[1 of 1] Compiling Main             ( crash.hs, crash.o )
Linking crash ...

$ ./crash
crash: Oops!  Entered absent arg w_s3DL{v} [lid] base:GHC.Base.String{tc 36u}

If I compile the same code without the -O2 flag, it runs without failing:

$ ghc -O0 crash.hs -fforce-recomp
[1 of 1] Compiling Main             ( crash.hs, crash.o )
Linking crash ...

$ ./crash                        
("a","semantics","ppp")

It might be GHC's bug, but I couldn't find out what caused this.

Expression parser just doesn't work at all

The following code works correctly with parsers-0.9 and trifecta-1.2.1.1, but not with parsers-0.10.

module Main where
import Control.Applicative
import Text.Parser.Expression
import Text.Trifecta

data Expr = IntLit Integer | Add Expr Expr
            deriving (Read, Show, Eq, Ord)

term :: (Monad f, TokenParsing f) => f Expr
term = IntLit <$> integer

expression :: (Monad f, TokenParsing f) => f Expr
expression = buildExpressionParser table term
  where
    table = [ [ Infix (Add <$ symbolic '+') AssocLeft]
            ]

main :: IO ()
main = parseTest expression "1+2"

If I run the above code with parsers-0.9, I get the following:

Add (IntLit 1) (IntLit 2)

but with 0.10, parsing fails!

(interactive):1:3: error: unspecified error
1+2<EOF> 
  ^

Generalize highlighting

Right now, the Highlight type is very restricted in domain; it should really be a type class (perhaps with some standard instances and/or transformers).

new parsers not on hackage..

Please update parsers on Hackage to 0.12.4.

Building tests via cabal fails when adding dependencies hspec and trifecta

To reproduce:

checkout unexpected-bug-example branch from this fork: https://github.com/razvan-panda/parsers/tree/unexpected-bug-example

cabal sandbox init
cabal install --only-dependencies --enable-tests

Resolving dependencies...
cabal: Could not resolve dependencies:
next goal: trifecta (dependency of parsers-0.12.8:*test)
rejecting: trifecta-1.7.1.1/installed-A22... (package is broken)
rejecting: trifecta-1.7.1.1, trifecta-1.7.1, trifecta-1.7, trifecta-1.6.2.1,
trifecta-1.6.2, trifecta-1.6.1 (cyclic dependencies; conflict set: parsers,
trifecta)
rejecting: trifecta-1.6 (conflict: base==4.10.1.0/installed-4.1..., trifecta
=> base>=4.4 && <4.9.1)
rejecting: trifecta-1.5.2, trifecta-1.5.1.3, trifecta-1.5.1.2,
trifecta-1.5.1.1, trifecta-1.5.1, trifecta-1.5, trifecta-1.4.3,
trifecta-1.4.2, trifecta-1.4.1 (cyclic dependencies; conflict set: parsers,
trifecta)
rejecting: trifecta-1.4 (conflict: base==4.10.1.0/installed-4.1..., trifecta
=> base>=4.4 && <4.7)
rejecting: trifecta-1.2.1.1, trifecta-1.2.1 (cyclic dependencies; conflict
set: parsers, trifecta)
rejecting: trifecta-1.2 (conflict: base==4.10.1.0/installed-4.1..., trifecta
=> base<0)
rejecting: trifecta-1.1, trifecta-1.0 (cyclic dependencies; conflict set:
parsers, trifecta)
trying: trifecta-0.53
trying: template-haskell-2.12.0.0/installed-2.1... (dependency of
tagged-0.8.5)
next goal: pretty (dependency of template-haskell-2.12.0.0/installed-2.1...)
rejecting: pretty-1.1.3.3/installed-1.1... (conflict: pretty =>
deepseq==1.4.3.0/installed-1.4..., trifecta => deepseq>=1.2.0.1 && <1.4)
rejecting: pretty-1.1.3.6, pretty-1.1.3.5, pretty-1.1.3.4, pretty-1.1.3.3,
pretty-1.1.3.2, pretty-1.1.3.1, pretty-1.1.2.1, pretty-1.1.2.0,
pretty-1.1.1.3, pretty-1.1.1.2, pretty-1.1.1.1, pretty-1.1.1.0,
pretty-1.1.0.0, pretty-1.0.1.2, pretty-1.0.1.1, pretty-1.0.1.0, pretty-1.0.0.0
(conflict: template-haskell => pretty==1.1.3.3/installed-1.1...)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: terminfo, trifecta, base, parsers,
parsers-0.12.8:test
Note: when using a sandbox, all packages are required to have consistent
dependencies. Try reinstalling/unregistering the offending packages or
recreating the sandbox.

Building using stack test works fine.

Provide token parsers that return Scientific values instead of Doubles

Since Double values can lose precision, values such as 4.8132429863700174 can return incorrect values (in this case, 4.8132429863700175). It would be really neat if there were combinators that returned Scientific values (e.g. integerOrScientific). This package already indirectly depends on the scientific library, so it wouldn't entail any further dependencies.

commaSep1 does not parse one-or-more, it parses zero-or-more

http://hackage.haskell.org/packages/archive/parsers/0.5/doc/html/src/Text-Parser-Token.html#commaSep1 is identical to http://hackage.haskell.org/packages/archive/parsers/0.5/doc/html/src/Text-Parser-Token.html#commaSep, so commaSep1 will quite happily return the empty list.
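The one-or-more behaviour the name promises can be expressed with `sepBy1`. This is a minimal sketch, assuming `sepBy1` from Text.Parser.Combinators and `comma` from Text.Parser.Token; the primed name is hypothetical and is used to avoid clashing with the library's own definition:

```haskell
import Text.Parser.Combinators (sepBy1)
import Text.Parser.Token (TokenParsing, comma)

-- | One or more occurrences of p separated by commas.
-- Unlike a 'sepBy'-based definition, this fails on empty input
-- instead of returning the empty list.
commaSep1' :: TokenParsing m => m a -> m [a]
commaSep1' p = sepBy1 p comma
```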

Are there supposed to be two nested `token` calls here?

Was looking through this code, and the nested token (token (...)) call seems like it may be redundant...

integer :: forall m. TokenParsing m => m Integer
integer = token (token (highlight Operator sgn <*> natural')) <?> "integer"

notFollowedBy is a no-op for attoparsec and yoctoparsec, say

Hi,
I've been studying this definition of the notFollowedBy combinator for attoparsec:

parsers/src/Text/Parser/Combinators.hs

Line 456 in 9b86500

notFollowedBy p = optional p >>= maybe (pure ()) (unexpected . show)

I am now fairly convinced that this is a no-op, but I'd like to ask for some feedback for my reasoning from someone who knows the subject matter better than myself. To this end, please consider the following reduction steps:

  optional p >>= maybe (pure ()) (fail . show) -- `unexpected` is `fail`
~ optional p >>= maybe (pure ()) (const empty) -- `fail` should behave as `empty` here
~ Just <$> p <|> pure Nothing >>= maybe (pure ()) (const empty) -- inlining the definition of `optional`
~ (p >>= const empty) <|> pure () -- distributing the alternative
~ (p *> empty) <|> pure () -- using a simple fact about `const` and `>>=`
~ pure () -- if `p` succeeds, we've got `empty <|> pure ()` which reduces to `pure ()`. if `p` fails, we directly get `pure ()`.

This does nothing when used as p <* pure (): pure () always succeeds and consumes nothing, so the whole expression is equivalent to p.

Can someone please comment on this reasoning?

Thanks :)
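The step labelled "distributing the alternative" relies on (a <|> b) >>= f being equal to (a >>= f) <|> (b >>= f), which is not true of every parser type. The sketch below uses a toy parser with left-biased choice, not attoparsec itself, to show a case where that law fails and the same definition does reject input. All names here (P, item, notFollowedBy') are hypothetical:

```haskell
import Control.Applicative (Alternative (..), optional)

-- Toy parser with left-biased choice: once the left branch of <|>
-- succeeds, the right branch is never revisited.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P g) = P $ \s -> fmap (\(a, r) -> (f a, r)) (g s)

instance Applicative P where
  pure a = P $ \s -> Just (a, s)
  P f <*> P g = P $ \s -> case f s of
    Nothing -> Nothing
    Just (h, r) -> fmap (\(a, r') -> (h a, r')) (g r)

instance Monad P where
  P g >>= k = P $ \s -> case g s of
    Nothing -> Nothing
    Just (a, r) -> runP (k a) r

instance Alternative P where
  empty = P (const Nothing)
  P f <|> P g = P $ \s -> maybe (g s) Just (f s)

-- | Match one specific character.
item :: Char -> P Char
item c = P $ \s -> case s of
  (x : xs) | x == c -> Just (x, xs)
  _ -> Nothing

-- | Same shape as the definition quoted above, with 'fail' replaced by 'empty'.
notFollowedBy' :: P a -> P ()
notFollowedBy' p = optional p >>= maybe (pure ()) (const empty)

main :: IO ()
main = do
  print (fmap fst (runP (notFollowedBy' (item 'a')) "abc"))  -- Nothing
  print (fmap fst (runP (notFollowedBy' (item 'a')) "xyz"))  -- Just ()
```

In this toy parser, `optional p` commits to `Just` once p succeeds, so the following `empty` fails the whole parser. Whether a given library behaves like this depends on how its `<|>` and `>>=` interact.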

Add flag description

Add description for:

flag lib-Werror

Widen deps to allow use of new attoparsec

Hello from 0.12 land :-)

Parsing with source locations

It’s nice to get source locations from parsers: at a minimum, line and column, but character/byte offset might be important for a lot of use cases too. position, span, and spanned are all useful combinators here:

class TokenParsing m => PositionParsing m where
  position :: m Pos

span :: PositionParsing m => m a -> m Span
spanned :: PositionParsing m => m a -> m (Span, a)

This would require parsers to define or otherwise provide Pos and Span types, give them as part of the definition of the class (fundep or type family), or e.g. just return (nested) tuples. I’m in favour of defining them here, but not especially picky. A fundep or type family seems like it would make combinators less easily portable between parser implementations, though.
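To make the first option concrete, here is a minimal sketch with Pos and Span defined as plain records and the two combinators derived from `position`. The field names and the choice of line, column and offset are assumptions for illustration, not a settled design:

```haskell
import Prelude hiding (span)
import Text.Parser.Token (TokenParsing)

-- | Hypothetical position type: line, column, and character offset.
data Pos = Pos { posLine :: !Int, posColumn :: !Int, posOffset :: !Int }
  deriving (Eq, Ord, Show)

-- | Hypothetical span type: start position and end position.
data Span = Span { spanStart :: !Pos, spanEnd :: !Pos }
  deriving (Eq, Ord, Show)

class TokenParsing m => PositionParsing m where
  position :: m Pos

-- | Run a parser and pair its result with the span it covered.
spanned :: PositionParsing m => m a -> m (Span, a)
spanned p = (\start a end -> (Span start end, a)) <$> position <*> p <*> position

-- | Run a parser and return only the span it covered.
span :: PositionParsing m => m a -> m Span
span p = fst <$> spanned p
```

Note that if p is a token parser that skips trailing whitespace, the end position recorded this way falls after that whitespace.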

We may or may not want to accommodate e.g. #line directives. I kinda think not; it seems like the user should be able to deal with this themselves by watching for and remembering the most recent directive as they parse without requiring conforming parser implementations to be able to track them.

NB: trifecta has DeltaParsing, but Delta isn’t quite the shape I want for most of my parsers. DeltaParsing does, however, offer some extra functionality: returning the current line, slicing the input, etc. All of that is either tied to the input type (String, Text, ByteString…) or converts, so we might want to punt on that stuff for the moment. I think some of the complications of Delta are intended to deal with incremental or discontinuous parsing, which may or may not warrant further thought here as well.

No instance for (Text.Parser.Combinators.Parsing Parser) arising from a use of `unexpected'

How to reproduce:

checkout unexpected-bug-example branch from this fork: https://github.com/razvan-panda/parsers/tree/unexpected-bug-example

stack init
stack test

This causes the following error:

[1 of 2] Compiling SemVer           ( tests/SemVer.hs, .stack-work/dist/x86_64-linux-nix/Cabal-2.0.1.0/build/quickcheck/quickcheck-tmp/SemVer.o )
            
/home/neo/Forks/parsers/tests/SemVer.hs:13:10: error:
    * No instance for (Text.Parser.Combinators.Parsing Parser)
        arising from a use of `unexpected'
    There are instances for similar types:
        instance Data.Attoparsec.Internal.Types.Chunk t =>
                Text.Parser.Combinators.Parsing
                (Data.Attoparsec.Internal.Types.Parser t)
        -- Defined in `Text.Parser.Combinators'
    * In the expression: unexpected "Leading Zeros"
    In a stmt of a 'do' block:
        if length digits > 1 && head digits == '0' then
            unexpected "Leading Zeros"
        else 
            return $ read digits
    In the expression:
        do digits <- some digit
        if length digits > 1 && head digits == '0' then
            unexpected "Leading Zeros"
        else
            return $ read digits
|         
13 |     then unexpected "Leading Zeros"
|

T.P.Expression should be able to handle chained postfix operators

Languages like C, E, and Python permit chained postfix calls and other operators; we should have some functionality for it. @ekmett suggested adding a shunting yard; it might also be possible to permit multiple postfix operators directly in the current precedence-table algorithm.
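As a workaround at the operator-table level, several postfix operators can be folded into a single Postfix entry. The following is a sketch, assuming the `Postfix` constructor of `Operator` from Text.Parser.Expression; the helper name `chainedPostfix` is hypothetical:

```haskell
import Control.Applicative (Alternative, some)
import Text.Parser.Expression (Operator (Postfix))

-- | Parse one or more postfix operators and compose them so that the
-- operator nearest the term is applied first.
chainedPostfix :: Alternative m => m (a -> a) -> Operator m a
chainedPostfix op = Postfix (foldr1 (flip (.)) <$> some op)
```

For operators parsed in source order as f1, f2, f3, the fold builds (f3 . f2) . f1, so f1 is applied to the term first. This only chains operators within one precedence level and does not replace a more general approach such as a shunting yard.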