Pyrsec

Simple parser combinator made in Python

This library is where the journey of building a parser combinator in Python, while being as type safe as possible, has arrived so far. I don't recommend using it for anything important; it is meant for exploration and fun. It is a mostly undocumented, bare-bones implementation of a parser combinator with no error recovery: when a parser can't continue, it simply returns None. I started with a minimal implementation, added a basic JSON parser as a test, and kept adding functionality as needed.
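To make the "returns None" convention concrete, here is a minimal sketch of the idea. It is not pyrsec's implementation: it only models a parser as a function from the input string to either a `(value, remaining_input)` pair or `None`. The names `Parser` and `literal` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical model: a parser takes the input and returns (value, rest),
# or None when it cannot continue.
Parser = Callable[[str], Optional[Tuple[T, str]]]


def literal(expected: str) -> "Parser[str]":
    """Build a parser that matches `expected` at the start of the input."""

    def parse(text: str) -> Optional[Tuple[str, str]]:
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None  # no error message, no recovery: just None

    return parse


hello = literal("hello")
print(hello("hello world"))  # ('hello', ' world')
print(hello("goodbye"))      # None
```

The second element of a successful result is whatever input is left over, which is why the examples in this README end in `''` when the whole string was consumed.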

pip install pyrsec

A JSON parser as an example

You should be able to inspect the types of the variables in the following code.

>>> from pyrsec import Parsec

Let's define the type of our JSON values,

>>> from typing import Union, List, Dict  # because 3.8 and 3.9 🙄
>>> # Recursive type alias 👀. See how we will not parse `floats` here.
>>> # Also, at this level we still can't reference JSON recursively, idk why.
>>> JSON = Union[bool, int, None, str, List["JSON"], Dict[str, "JSON"]]

and the type of our parser. Since this parser outputs JSON values, its type will be Parsec[JSON].

>>> # To be defined later
>>> json_: Parsec[JSON]
>>> # For recursive parsers like `list_` and `dict_`
>>> deferred_json_ = Parsec.from_deferred(lambda: json_)
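Passing a lambda works because Python resolves the name inside the lambda body only when the lambda is called, not when it is created. The following pyrsec-free sketch shows that mechanism on plain functions; `deferred`, `lazy_digit`, and `digit` are hypothetical names, and this is not a claim about how `Parsec.from_deferred` is written internally.

```python
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[str, str]]
Parser = Callable[[str], Result]


def deferred(get_parser: Callable[[], Parser]) -> Parser:
    """Wrap a parser that is looked up only when the wrapper is called."""

    def parse(text: str) -> Result:
        return get_parser()(text)  # name resolution happens here, at parse time

    return parse


# `digit` does not exist yet, but the lambda body is not evaluated until later.
lazy_digit = deferred(lambda: digit)


def digit(text: str) -> Result:
    """Match a single leading decimal digit."""
    if text[:1].isdigit():
        return text[0], text[1:]
    return None


print(lazy_digit("7up"))  # ('7', 'up')
```

Calling `lazy_digit` before `digit` is defined would raise a `NameError`, so the real parser has to be assigned before the first parse.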

Let's bring up a few basic parsers.

>>> import re
>>> true = Parsec.from_string("true").map(lambda _: True)
>>> false = Parsec.from_string("false").map(lambda _: False)
>>> null = Parsec.from_string("null").map(lambda _: None)
>>> number = Parsec.from_re(re.compile(r"-?\d+")).map(int)
>>> true("true")
(True, '')
>>> false("false")
(False, '')
>>> null("null")
(None, '')
>>> number("42")
(42, '')
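The `.map(...)` calls above transform the parsed value and leave the remaining input alone. A rough, pyrsec-free sketch of that idea follows; `regex_parser` and `fmap` are hypothetical helpers written for this illustration, assuming the `(value, rest)` or `None` result shape shown in the outputs above.

```python
import re
from typing import Callable, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Parser = Callable[[str], Optional[Tuple[A, str]]]


def regex_parser(pattern: "re.Pattern[str]") -> "Parser[str]":
    """Match `pattern` at the start of the input and return the matched text."""

    def parse(text: str) -> Optional[Tuple[str, str]]:
        match = pattern.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]

    return parse


def fmap(parser: "Parser[A]", fn: Callable[[A], B]) -> "Parser[B]":
    """Apply `fn` to the parsed value; a failure (None) passes through untouched."""

    def parse(text: str) -> Optional[Tuple[B, str]]:
        result = parser(text)
        if result is None:
            return None
        value, rest = result
        return fn(value), rest

    return parse


number = fmap(regex_parser(re.compile(r"-?\d+")), int)
print(number("-12 apples"))  # (-12, ' apples')
print(number("apples"))      # None
```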

We need to be able to parse character sequences; let's keep it simple.

The operators >> and << discard the result of the parser that the arrow is not pointing at. They are meant to work well with Parsec instances. In this case, only the result of the middle parser, Parsec.from_re(re.compile(r"[^\"]*")), is returned from the string parser.

If you want to combine the results instead, see the & operator (wait for the pair definition).

>>> quote = Parsec.from_string('"').ignore()
>>> string = quote >> Parsec.from_re(re.compile(r"[^\"]*")) << quote
>>> string('"foo"')
('foo', '')

See how the quotes got discarded?

Also, a missing quote on either side is a parsing error, so the parser returns None.

>>> string('foo"'), string('"bar')
(None, None)
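The behaviour of `>>` and `<<` can be imitated with Python's `__rshift__` and `__lshift__` methods. The class below is a toy written for this README, not pyrsec's `Parsec`; `Toy` and `regex` are hypothetical names, and the sketch only assumes the "keep the side the arrow points at" rule described above.

```python
import re
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[object, str]]


class Toy:
    """Hypothetical stand-in for a parser object; not pyrsec's Parsec class."""

    def __init__(self, fn: Callable[[str], Result]) -> None:
        self.fn = fn

    def __call__(self, text: str) -> Result:
        return self.fn(text)

    def __rshift__(self, other: "Toy") -> "Toy":
        # self >> other: run both in sequence, keep the right-hand result.
        def parse(text: str) -> Result:
            first = self(text)
            if first is None:
                return None
            return other(first[1])

        return Toy(parse)

    def __lshift__(self, other: "Toy") -> "Toy":
        # self << other: run both in sequence, keep the left-hand result.
        def parse(text: str) -> Result:
            first = self(text)
            if first is None:
                return None
            second = other(first[1])
            if second is None:
                return None
            return first[0], second[1]

        return Toy(parse)


def regex(pattern: str) -> Toy:
    """Build a Toy parser that matches `pattern` at the start of the input."""
    compiled = re.compile(pattern)

    def parse(text: str) -> Result:
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]

    return Toy(parse)


quote = regex('"')
string = quote >> regex('[^"]*') << quote
print(string('"foo"'))  # ('foo', '')
print(string('foo"'))   # None
print(string('"bar'))   # None
```

Because `>>` and `<<` share the same precedence and associate left to right, the expression is read as `(quote >> body) << quote`.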

Let's get a little more serious with lists.

Whitespace around values is always optional in JSON text. Other basic tokens are also needed.

>>> space = Parsec.from_re(re.compile(r"\s*")).ignore()
>>> comma = Parsec.from_string(",").ignore()
>>> opened_square_bracket = Parsec.from_string("[").ignore()
>>> closed_square_bracket = Parsec.from_string("]").ignore()

And finally, the list parser. We need to use a deferred value here because the definition is recursive and the whole JSON parser is not available yet.

>>> list_ = (
...     opened_square_bracket
...     >> (deferred_json_.sep_by(comma))  # See here?
...     << closed_square_bracket
... )

Let's create an incomplete json_ parser for now, without dicts.

>>> json_ = space >> (true | false | number | null | string | list_) << space
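The `|` operator is not described in this README. The sketch below assumes the usual parser-combinator meaning, ordered choice: try each alternative from left to right and return the first one that succeeds. `literal` and `either` are hypothetical helpers, not pyrsec functions.

```python
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[object, str]]
Parser = Callable[[str], Result]


def literal(expected: str, value: object) -> Parser:
    """Match `expected` at the start of the input and produce `value`."""

    def parse(text: str) -> Result:
        if text.startswith(expected):
            return value, text[len(expected):]
        return None

    return parse


def either(*parsers: Parser) -> Parser:
    """Try each parser in order; return the first result that is not None."""

    def parse(text: str) -> Result:
        for parser in parsers:
            result = parser(text)
            if result is not None:
                return result
        return None

    return parse


keyword = either(
    literal("true", True),
    literal("false", False),
    literal("null", None),
)
print(keyword("false!"))  # (False, '!')
print(keyword("null"))    # (None, '')
print(keyword("maybe"))   # None
```

Note that success is decided by the whole result being a tuple, so a parser whose value is `None`, like `null`, still counts as a match.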

Let's try it then!

>>> list_("[]")
([], '')
>>> list_("[1, true, false, []]")
([1, True, False, []], '')
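For readers curious about what a separator-driven loop might look like, here is a small sketch on plain functions. `sep_by`, `digit`, and `comma` below are hypothetical and independent of pyrsec. Leaving a trailing separator unconsumed is a choice made for this sketch and may differ from what pyrsec's `sep_by` does.

```python
from typing import Callable, List, Optional, Tuple

Result = Optional[Tuple[object, str]]
Parser = Callable[[str], Result]


def sep_by(item: Parser, separator: Parser) -> Parser:
    """Parse zero or more `item`s separated by `separator` into a list."""

    def parse(text: str) -> Result:
        values: List[object] = []
        first = item(text)
        if first is None:
            return values, text  # zero items is still a success
        value, rest = first
        values.append(value)
        while True:
            after_sep = separator(rest)
            if after_sep is None:
                return values, rest
            following = item(after_sep[1])
            if following is None:
                return values, rest  # trailing separator is left unconsumed
            value, rest = following
            values.append(value)

    return parse


def digit(text: str) -> Result:
    """Match one decimal digit and convert it to int."""
    if text[:1].isdigit():
        return int(text[0]), text[1:]
    return None


def comma(text: str) -> Result:
    """Match a single comma."""
    if text.startswith(","):
        return ",", text[1:]
    return None


digits = sep_by(digit, comma)
print(digits("1,2,3"))  # ([1, 2, 3], '')
print(digits(""))       # ([], '')
print(digits("1,2,"))   # ([1, 2], ',')
```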

Defining a dict should be pretty easy by now. The pair parser is perhaps the interesting part because of its use of &.

Some tokens,

>>> opened_bracket = Parsec.from_string("{").ignore()
>>> closed_bracket = Parsec.from_string("}").ignore()
>>> colon = Parsec.from_string(":").ignore()

And pair. Notice that the type of pair will be Parsec[tuple[str, JSON]].

>>> pair = ((space >> string << space) << colon) & deferred_json_
>>> pair('"foo": [123]')
(('foo', [123]), '')
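The tuple in that output is what makes the later `dict(xs)` call work: a list of `(key, value)` tuples converts directly into a dict. Here is a pyrsec-free sketch of sequencing two parsers and keeping both results; `both` and `regex` are hypothetical helpers written for this illustration.

```python
import re
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[object, str]]
Parser = Callable[[str], Result]


def regex(pattern: str) -> Parser:
    """Match `pattern` at the start of the input and return the matched text."""
    compiled = re.compile(pattern)

    def parse(text: str) -> Result:
        match = compiled.match(text)
        if match is None:
            return None
        return match.group(0), text[match.end():]

    return parse


def both(left: Parser, right: Parser) -> Parser:
    """Run `left` then `right` and keep both values as a tuple."""

    def parse(text: str) -> Result:
        first = left(text)
        if first is None:
            return None
        second = right(first[1])
        if second is None:
            return None
        return (first[0], second[0]), second[1]

    return parse


entry = both(regex(r"[a-z]+"), regex(r"\d+"))
result = entry("abc123")
print(result)             # (('abc', '123'), '')
print(dict([result[0]]))  # {'abc': '123'}
```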

The dict parser will finally be pretty close to the list one.

>>> dict_ = (
...     opened_bracket
...     >> pair.sep_by(comma).map(lambda xs: dict(xs))
...     << closed_bracket
... )

And finally, let's redefine the json parser to embrace the full beauty of it.

>>> json_ = space >> (true | false | number | null | string | list_ | dict_) << space
>>> json_("""
... {
...     "json_parser": [true]
... }
... """)
({'json_parser': [True]}, '')

Enjoy!
