arith

A simple demo of building a parser with combinators. The parser consumes and evaluates arithmetic expressions. If you enter a badly formed expression it will tell you what went wrong. For example:

$ cabal run arith "2+(4/(5-4))"
6
$ cabal run arith "2+(4/(5-4)"
"arithmetic" (line 1, column 11):
unexpected end of input
expecting ")"

You can also supply an environment or lookup table of identifiers and the values assigned to them:

$ cabal run arith "2+(4/(5-4))+foo-x" "[(\"foo\",1),(\"x\",9)]"
-2

Note that you need to escape the quotation marks around identifier names with a backslash.

The parser uses the Parsec library. When building parsers with combinators, we start with tiny parsers capable of, for instance, parsing a single character or the repetition of a character, then combine these in various ways to produce more sophisticated parsers. Parsec provides atomic parsers like digit and letter, then ways of repeating and combining them, like many (takes a parser and applies it zero to many times), many1 (apply a parser one or more times) and (<|>) (take two parsers and apply the first one then, if the first one fails, apply the second one).

We want to parse strings, such as "2 + x", into our data type for expressions. An expression is either a value (e.g. 2), an identifier (e.g. "x") or the addition, subtraction, multiplication or division of two expressions:

-- in Arith.Types

data Exp = Val Int
  | Id String
  | Plus Exp Exp
  | Minus Exp Exp
  | Mult Exp Exp
  | Div Exp Exp deriving (Show, Eq)

We can create a parser that will consume an identifier and produce the right kind of Exp value using many1 and letter:

-- in Arith.Parse

-- |Parse an identifier
parseId :: Parser Exp
parseId = Id <$> many1 letter

Note that parseId is using applicative style. It tries to run the parser many1 letter then supplies the result to the Id constructor, which is "lifted" into the Parser context using (<$>). You could equally well write it with do-notation like so:

parseId = do str <- many1 letter
             pure (Id str)

Now we can run this parser using the parse function from Parsec. It will return Either ParseError Exp. That is, if all goes well, it will return an Exp value in the Left constructor, like Left (Id "x"), and if anything goes wrong it will return Right e, where e tells us what went wrong. The parse function takes three arguments: the parser, a string describing the parser and the string to be parsed. We can play with it in ghci like so:

$ cabal repl
ghci> :l Arith.Parse
ghci> :m + Text.Parsec
ghci> parse parseId "parsing an identifier" "x"
Right (Id "x")
ghci> parse parseId "parsing an identifier" "5"
Left "parsing an identifier" (line 1, column 1):
unexpected "5"
expecting letter

After creating similar parsers for the various things that can make up an arithmetic expression, we can think about parsing complex expressions made up of several parts. So that our expressions are unambiguous without the need for lots of brackets, we introduce a distinction between terms (expressions that are operands to a multiplication or division) and factors (expressions that are operands to an addition or subtraction). This allows us to say that we want terms to "bind" more tightly than factors, following the normal convention that 2+3*2 is equal to 8 rather than 10.

A factor is a value or an identifier or an expression inside brackets:

-- |Parse a factor
parseFactor :: Parser Exp
parseFactor = parseVal <|> parseId <|> parseParen

To parse a term expression, we get the factor on the left, decide which operator we're looking at, get the factor on the right then return a Mul or Div expression as required:

-- |Parse a term
parseTerm :: Parser Exp
parseTerm = parseFactor >>= loop
  where factorSuffix f1 = do
          op <- lexeme $ oneOf "*/"
          f2 <- parseFactor
          case op of
            '*' -> loop (Mult f1 f2)
            '/' -> loop (Div f1 f2)
        loop t = factorSuffix t <|> pure t

Note that the alternation in loop means that if the factorSuffix parser doesn't match we just return the expression as is. So calling parseTerm on an expression that doesn't contain any terms, like 2+2 or 5, just returns the original expression.

To parse an expression that might contains terms and factors, we first try to parse terms then try to parse factors.

-- |Parse an expression
parseExp :: Parser Exp
parseExp = parseTerm >>= loop
  where termSuffix t1 = do
          op <- lexeme $ oneOf "+-"
          t2 <- parseTerm
          case op of
            '+' -> loop (Plus t1 t2)
            '-' -> loop (Minus t1 t2)
        loop t = termSuffix t <|> pure t

Finally, we can hook this up in the main method to read expressions from the command line. The important part is the case expression that runs the parse function and unpacks the result. We are expecting args, the list of strings supplied as command line arguments, to have at least one thing in it, which is the expression to parse. If there is more than one thing then the second one is assumed to be an environment, which has to be valid Haskell syntax for a list of (String, Int) pairs.

case parse parseExp "arithmetic" (head args) of
  Left e -> print e
  Right x -> do let env = if length args > 1
                          then read (args !! 1)
                          else []
                case eval x env of
                    Nothing  -> putStrLn "error"
                    (Just i) -> print i

jimburton / arith Goto Github PK

arith's Introduction

arith

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent