hpython's Issues

Parser fails on type ascription

While the parser now supports type annotations in various places, there is one use it chokes on:

    result : List[HistoryLine] = []

BTW, thank you for this great work!

hpython.nix has version 0.2

this is a line in hpython.nix:

version = "0.2";

However, Hackage shows that the latest version is 0.3. I am new to Nix, so what is happening here?

Python parsing edge-case `"""`

"""""" is not a syntax error
""""""" is a syntax error
""" """ is not a syntax error
""" """" is a syntax error
""" " """ is not a syntax error

Top-level Validation module

There should be a Language.Python.Validation module that exports key things from the Validation tree. As things are now, I don't know where to get started with validation, and there's nowhere I can document how the pieces fit together.
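
As a sketch of shape only, assuming syntax, scope, and indentation submodules exist under the Validation tree (the module names below are guesses, not hpython's actual layout):

-- Hypothetical module names; the real Validation tree may differ.
module Language.Python.Validation
  ( module Language.Python.Validation.Syntax
  , module Language.Python.Validation.Scope
  , module Language.Python.Validation.Indentation
  ) where

import Language.Python.Validation.Syntax
import Language.Python.Validation.Scope
import Language.Python.Validation.Indentation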

Whitespace causing expressions to no longer be correct-by-construction

Valid forms of not expressions in Python:

not(True)
not True
not""
not{}
not 1

Invalid Python:

notTrue
not1

Here's a simplified form of the AST:

data NotExpr
  = NotExprOne KNot [Whitespace] NotExpr
  | NotExprNone Comparison

data SGt = SGt
data SLt = SLt
-- ...one unit type per comparison symbol

data Comparison
  = Gt Expr [Whitespace] SGt [Whitespace] Expr
  | Lt Expr [Whitespace] SLt [Whitespace] Expr
  | ...

data Expr = Number Int | True | False | None | String [Char] | Parens Expr | ...

This permits all the valid uses of not, but also some invalid uses. For example, one
could write NotExprOne KNot [] (NotExprOne KNot [] ..., which would result in
notnot....

The true whitespace rule for not is this:

Given (NotExprOne a b c): If print(c) begins with an identifier character, then 'b' should not be empty.

If we were to encode this with types, it might look like this:

data NotExpr'
  = NotExprNone' Comparison'

data NotExpr
  = NotExprOne KNot (NonEmpty Whitespace) NotExpr
  | NotExprOne' KNot [Whitespace] Comparison'
  | NotExprNone Comparison

-- | Comparison' will not begin with an identifier character when printed
data Comparison'
  = Gt' Expr' [Whitespace] SGt [Whitespace] Expr
  | Lt' Expr' [Whitespace] SLt [Whitespace] Expr

-- | Expr' will not begin with an identifier character when printed
data Expr' = Parens' (Either Expr Expr') | String' [Char]

-- | Comparison will begin with an identifier character when printed
data Comparison
  = Gt Expr [Whitespace] SGt [Whitespace] Expr
  | Lt Expr [Whitespace] SLt [Whitespace] Expr

-- | Expr will begin with an identifier character when printed
data Expr = Number Int | True | False | None

We split the terminals into two types: one whose printed form begins with an identifier character, and one whose printed form doesn't. Then we have to propagate this change to all the non-terminals, which essentially doubles the number of types and data constructors. A lot of this grammar (https://docs.python.org/3.5/reference/grammar.html) has to be duplicated to encode this, and I'm skeptical of the approach due to the amount of code it requires.


Ideally I still want to have simple prisms on all these types to provide a good user interface.
Now it would have to look something like this:

class FromNot s ws | s -> ws where
  _Not :: Prism' NotExpr (KNot, ws, s)

instance FromNot NotExpr (NonEmpty Whitespace) where
  _Not = -- match on NotExprOne

instance FromNot Comparison' [Whitespace] where
  _Not = -- match on NotExprOne'

class FromComparison comp expr | comp -> expr where
  _Gt :: Prism' comp (expr, [Whitespace], SGt, [Whitespace], Expr)
  _Lt :: Prism' comp (expr, [Whitespace], SLt, [Whitespace], Expr)

instance FromComparison Comparison' Expr' where
  _Gt = -- match on Gt'
  _Lt = -- match on Lt'

instance FromComparison Comparison Expr where
  _Gt = -- match on Gt
  _Lt = -- match on Lt

instance FromComparison NotExpr Expr where
  _Gt = -- match on NotExprNone then Gt
  _Lt = -- match on NotExprNone then Lt

instance FromComparison NotExpr' Expr' where
  _Gt = -- match on NotExprNone' then Gt'
  _Lt = -- match on NotExprNone' then Lt'

Following imports

Eventually I want to be able to parse and validate imported modules. Not going to do it yet, but I'm going to do some brainstorming here.

https://docs.python.org/3.5/reference/import.html#searching

  • We'll have to duplicate the "finders and loaders" logic
  • Should probably make import awareness work for calls to importlib.import_module as well as import statements
  • Need to warn about importing inside control flow, as we can't give useful guarantees

Smart constructors preclude useful optics

There are a bunch of syntax elements that are created using runtime validation, because a type-based correct-by-construction representation is too complex to be useful. All of these runtime-validated types are created using the smart constructor pattern. The problem is that smart constructors prevent helpful prisms.

Here's an example:

module NoNumbers (NoNumbers, mkNoNumbers, _NoNumbers) where

import Control.Lens (Prism', prism')
import Data.Char (isDigit)

newtype NoNumbers = NoNumbers { unNoNumbers :: String }

mkNoNumbers :: String -> Maybe NoNumbers
mkNoNumbers s
  | any isDigit s = Nothing
  | otherwise = Just $ NoNumbers s

_NoNumbers :: Prism' NoNumbers String
_NoNumbers = prism' NoNumbers (Just . unNoNumbers)

_NoNumbers is obviously wrong: review _NoNumbers is just the raw NoNumbers constructor, which is exactly what we were trying to hide in the first place.

How do we fix this?

  1. _NoNumbers :: Prism' (Maybe NoNumbers) String

    Now, review _NoNumbers :: String -> Maybe NoNumbers, which is mkNoNumbers. The downside is
    that preview _NoNumbers :: Maybe NoNumbers -> Maybe String, which will be a pain when chaining
    prisms.

  2. _NoNumbers :: Prism NoNumbers (Maybe NoNumbers) String String

Now review _NoNumbers :: String -> Maybe NoNumbers and preview _NoNumbers :: NoNumbers -> Maybe String. Since we can always get a string out, we can strengthen this to an Iso:

  3. _NoNumbers :: Iso NoNumbers (Maybe NoNumbers) String String

view _NoNumbers :: NoNumbers -> String
view (from _NoNumbers) :: String -> Maybe NoNumbers

Now that we have an accurate optic, let's see how it fares in nested updates

{-# language TemplateHaskell #-}
module Test where

import Control.Lens
import NoNumbers

-- 'Thing' and its 'thingNoNumbers' traversal are assumed to be defined
-- elsewhere along these lines.
testMkThing = Thing "hello" <$> ("goodbye" ^. getting (from _NoNumbers))
testSet =
  let
    a = testMkThing ^?! _Just
  in
    a & traverseOf thingNoNumbers (_NoNumbers .~ "goodbye1")

It turns out that in this context, an Iso is a burden. We could have achieved the same outcome with less code by using mkNoNumbers and unNoNumbers on their own:

testMkThing = Thing "hello" <$> mkNoNumbers "goodbye"
testSet =
  let
    a = testMkThing ^?! _Just
  in
    a & traverseOf thingNoNumbers (\_ -> mkNoNumbers "goodbye1")

Parser/Decoder leaks memory

<!> needs to try parsing the left side before it can know whether it needs the right side, so we use far too much memory in cases where the parser takes the left side of many <!>s.

For example

exprListComp :: Parser ann Whitespace -> Parser ann (Expr '[] ann)
exprListComp ws = do
  ex <- exprOrStar ws
  (\cf ->
      Generator (ex ^. exprAnnotation) .
      Comprehension (ex ^. exprAnnotation) ex cf) <$>
    compFor <*>
    many (Left <$> compFor <!> Right <$> compIf)
    <!>
    (\case
       ([], Nothing) -> ex
       ([], Just ws) ->
         Tuple (ex ^. exprAnnotation) ex ws Nothing
       ((ws, ex') : cs, mws) ->
         Tuple (ex ^. exprAnnotation) ex ws . Just $ (ex', cs, mws) ^. _CommaSep1') <$>
    commaSepRest (exprOrStar anySpace)

will cause an out of memory error for

(()for a in()for a in()for a in()if{**()}for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in ())

For this to work, <!> would need to be able to determine which side to take based on the current parse state, à la LL(k)/LR(k).
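
The retention problem is generic to backtracking alternation, not specific to hpython's combinators. A minimal sketch of the mechanism, using a toy parser type (none of these names are hpython's):

import Data.List (stripPrefix)

-- A toy backtracking parser: a parse returns the value and the remaining
-- input, or Nothing on failure.
newtype P a = P { runP :: String -> Maybe (a, String) }

-- To run 'g' when 'f' fails, 'orElse' must keep the entire starting input
-- 's' alive until 'f' finishes, however deep 'f' recurses. Nesting many
-- of these keeps many saved inputs live at once, which is the leak.
orElse :: P a -> P a -> P a
orElse (P f) (P g) = P $ \s ->
  case f s of
    Just r  -> Just r
    Nothing -> g s

str :: String -> P String
str pre = P $ \s -> (,) pre <$> stripPrefix pre s

A deterministic alternation that commits after bounded lookahead can drop the saved input immediately, which is what the LL(k)/LR(k) remark above is pointing at.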

`locals()` and `globals()` mess with scope checking

locals(), globals(), and vars() are built-in functions that return mutable, globally accessible dicts. When these dicts are modified, new variables are brought into scope. Warn about these usages. Warnings should only occur when it's the built-in definitions that are accessed: if a variable called globals is introduced, shadowing the built-in, then there's no issue.
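
A minimal sketch of that shadowing rule, with illustrative types that are not hpython's:

-- Warn about a call only when the callee is one of the scope-mutating
-- builtins *and* no user binding shadows that name.
shouldWarnCall
  :: [String]  -- names bound in the enclosing scopes
  -> String    -- the name being called
  -> Bool
shouldWarnCall boundNames callee =
  callee `elem` ["locals", "globals", "vars"]
    && callee `notElem` boundNames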

How to implement decorators parsing?

I am trying to implement parsing for Python 3 decorators and struggling (see http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonDecorators.html for details).

What I have so far is that I've defined:

data Decorator v a 
  = Decorator (Indents a) (Ident v a)
  -- ^ '@' <ident>
  deriving (Eq, Show, Functor, Foldable, Traversable)

The Decorator is parsed using a TkAt token that contains the string for the decorator's name (I am only tackling argument-less decorators at the moment):

    parseDecoratorIdentifier :: Parser ann (Ident '[] ann)
    parseDecoratorIdentifier = do
      curTk <- currentToken
      case curTk of
        TkAt s ann -> do
          Parser $ consumed ann
          pure $ MkIdent ann s []
        _ -> Parser . throwError $ ExpectedIdentifier curTk

    maybeDecorator :: Parser ann (Maybe (Decorator '[] ann))
    maybeDecorator =
      optional (Decorator <$> indents <*> parseDecoratorIdentifier)

Then I've augmented Fundef with a Maybe (Decorator v a) field.

compoundStatement :: Parser ann (CompoundStatement '[] ann)
compoundStatement =
  fundef <!> ....
    where
     fundef =
      (\dec a (tkDef, defSpaces) -> Fundef dec a (pyTokenAnn tkDef) (NonEmpty.fromList defSpaces)) <$>
      maybeDecorator <*>
      indents <*>
      token space (TkDef ()) <*>
.....

Now when I try to parse "@decorate\ndef fun(a:str) -> int:\n return 1" I get the following error:

     ┏━━ test/Helpers.hs ━━━
    40 ┃ doParse :: (Show ann, Monad m) => ann -> Parser ann a -> Nested ann -> PropertyT m a
    41 ┃ doParse initial pa input = do
    42 ┃   let res = runParser initial pa input
    43 ┃   case res of
    44 ┃     Left err -> do
    45 ┃       annotateShow err
       ┃       │ UnexpectedEndOfLine (Caret (Columns 0 0) "@decorate\n")
    46 ┃       failure
       ┃       ^^^^^^^
    47 ┃     Right a -> pure a

So the decorator needs to handle the newline and be at the same indentation level as the def it is attached to. How do I implement that in hpython?
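
One hedged sketch of that, reusing the names from the snippets above plus a hypothetical eol parser (hpython's real token-level combinator may differ): consume the decorator's trailing end-of-line inside the decorator parser, so that fundef resumes at the start of the def line.

    -- 'eol' is assumed here, not a real hpython combinator: it should
    -- consume the newline that ends the decorator line.
    decorator :: Parser ann (Decorator '[] ann)
    decorator =
      Decorator <$> indents <*> (parseDecoratorIdentifier <* eol)

Because both the decorator and the def consume indents, the two results can then be compared to enforce that they sit at the same indentation level.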

`else_` in the DSL isn't actually usable

if_, for_, and while_ all return fully-formed Statements, but else_ can only operate on types that have a HasElse instance. Additionally, the signature of else_ requires its input to be something with an 'else-shaped' hole in it, but the combinators above produce hole-less things.

I currently see two options: make else_ operate on more things, so it can modify the fully-formed output of if_ et al., or change the control flow combinators to output their pre-Statement types (like For and If), so that else_ can modify those. I like the second option more because it means the types force us to write code that is reflected in the generated Python. If else_ were allowed to modify Statements, then we could write redundant applications of else_ to an already-complete statement.
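
A self-contained toy of the second option; none of these types are hpython's, they just show the shape:

data Expr = ETrue
data Statement = SIf If | SPass

-- The pre-Statement type keeps the else-shaped hole visible.
data If = If Expr [Statement] (Maybe [Statement])

if_ :: Expr -> [Statement] -> If
if_ cond body = If cond body Nothing

else_ :: [Statement] -> If -> If
else_ body (If cond thenBody _) = If cond thenBody (Just body)

-- Conversion to a fully-formed Statement happens last:
st :: If -> Statement
st = SIf

-- st (else_ [SPass] (if_ ETrue [SPass]))

Note that a second application of else_ still type-checks in this toy; ruling that out completely would require tracking the hole's emptiness in the type as well.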

Example of Validation

It would help learners if there were an example of validation in the examples directory.

Treatment of `async` and `await`

async and await are not considered reserved keywords. async is an identifier unless it is followed by def, in which case it begins an async function definition. await is considered an identifier unless it is used inside an async def function, in which case it is a keyword.
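
A toy classification of that contextual rule; Word' and friends are illustrative, not hpython's types:

data Word' = Keyword String | Identifier String deriving Show

-- 'async' is a keyword only when the next significant token is 'def'.
classifyAsync :: Maybe String -> Word'
classifyAsync (Just "def") = Keyword "async"
classifyAsync _            = Identifier "async"

-- 'await' is a keyword only inside the body of an 'async def'.
classifyAwait :: Bool -> Word'
classifyAwait insideAsyncDef
  | insideAsyncDef = Keyword "await"
  | otherwise      = Identifier "await"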

Error explainer

Currently, if you validate Python code, you just get a list of errors. It would be good to be able to translate that list into a human-readable representation. The output would give a succinct explanation of each error, and use the original source file plus the annotations in the error to pull out the relevant areas of the code, so we can pinpoint the bits that caused it.

For example, if we validate this code:

def a():
    x = 1
    y = x + b
    return y

we could get an error message like:

Line 3: 'b' is not in scope
2 |     x = 1
3 |     y = x + b
                ^
4 |     return y
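
A minimal sketch of the rendering step, assuming we already have the source split into lines and a (line, column) annotation; the types here are illustrative, not hpython's error types:

render
  :: [String]  -- lines of the original source file
  -> Int       -- 1-based line of the error
  -> Int       -- 1-based column of the error
  -> String    -- succinct explanation
  -> [String]  -- rendered report, one string per output line
render src line col msg =
  ("Line " <> show line <> ": " <> msg)
    : context (line - 1)
   <> context line
   <> [replicate (gutter + col - 1) ' ' <> "^"]
   <> context (line + 1)
  where
    gutter = length (show line) + 3  -- width of the "N | " prefix
    context n
      | n < 1 || n > length src = []
      | otherwise = [show n <> " | " <> src !! (n - 1)]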

More robust treatment of `globals`, `nonlocals`, and `del`

Currently we warn about all usages of globals, nonlocals, and del. In the general case they interfere with scope checking, but there is a subset of usages that are safe.

We only really need to warn when extra computation is required to decide whether the scope should be modified. Usages at the top level and in unconditional control flow (like try and with) are okay, but usages inside function calls and if statements are not.

Function definitions with default values are incorrectly parsed

When parsing the following definition signature:

def foo(y : str, x : int = 0) -> bool:
   ...

hpython generates the following parameters list:

[ KeywordParam
    { _paramAnn = Ann { getAnn = SrcInfo
        { _srcInfoName = "data/test/contractWithSchema.py"
        , _srcInfoLineStart = 50, _srcInfoLineEnd = 50
        , _srcInfoColStart = 9, _srcInfoColEnd = 10
        , _srcInfoOffsetStart = 1170, _srcInfoOffsetEnd = 1171 } }
    , _paramName = MkIdent
        { _identAnn = Ann { getAnn = SrcInfo
            { _srcInfoName = "data/test/contractWithSchema.py"
            , _srcInfoLineStart = 50, _srcInfoLineEnd = 50
            , _srcInfoColStart = 9, _srcInfoColEnd = 10
            , _srcInfoOffsetStart = 1170, _srcInfoOffsetEnd = 1171 } }
        , _identValue = "y"
        , _identWhitespace = [Space, Space] }
    , _paramType = Nothing
    , _unsafeKeywordParamWhitespaceRight = [Space]
    , _unsafeKeywordParamExpr = Int
        { _exprAnn = Ann { getAnn = SrcInfo
            { _srcInfoName = "data/test/contractWithSchema.py"
            , _srcInfoLineStart = 50, _srcInfoLineEnd = 50
            , _srcInfoColStart = 14, _srcInfoColEnd = 15
            , _srcInfoOffsetStart = 1175, _srcInfoOffsetEnd = 1176 } }
        , _unsafeIntValue = IntLiteralDec
            { _intLiteralAnn = Ann { getAnn = SrcInfo
                { _srcInfoName = "data/test/contractWithSchema.py"
                , _srcInfoLineStart = 50, _srcInfoLineEnd = 50
                , _srcInfoColStart = 14, _srcInfoColEnd = 15
                , _srcInfoOffsetStart = 1175, _srcInfoOffsetEnd = 1176 } }
            , _unsafeIntLiteralDecValue = DecDigit0 :| [] }
        , _unsafeIntWhitespace = [] } } ]

i.e. the first parameter is given a type of Nothing and is assigned the literal default from the second parameter.

However, the following works as expected:

def other_foo(x : int = 0, y : str) -> bool:
   ...

When CR and LF are adjacent in the syntax tree they appear as CRLF

Def "a" NoArgs {- colon -} [Space, Continued CR []] (Just LF) block

If you create a value like this, you won't be able to parse the result of pretty-printing it: the CR and LF are rendered adjacently in the output and will be lexed as a single CRLF. You'll end up with a Continued CRLF [], and when the parser then looks for a newline token to start the block, it'll choke.

The "simplest" way to fix this would to be detecting it during syntax checking.

Parsing and handling of docstrings (in addition to comments)

As far as I can tell hpython only parses standard Python comments:

parseComment :: (CharParsing m, Monad m) => m (SrcInfo -> PyToken SrcInfo)
parseComment =
  (\a b -> TkComment (MkComment (Ann b) a)) <$ char '#' <*>
  many (satisfy (`notElem` ['\r', '\n']))

Is there any reason it doesn't support parsing of docstrings, or is it just a use case that QFPL hasn't needed yet (i.e. would a PR be welcome, or is this potentially something to add as a milestone for a future release)?

Improve subscripting

Subscript slicing desugars to a slice object, so all subscripts are actually a single expression. The comma-separatedness helps form a tuple. Figure out how to make this all line up.

Is "syntactically correct by construction" worth it?

Until now, the approach to the AST has been "as correct-by-construction as possible, falling back to smart constructors when necessary". Since we've been getting closer to a complete representation, I have been considering another approach that gets similar levels of safety but permits a more elegant library design, and potentially a better user experience.

Here it is, applied to a very small AST:

{-# language DataKinds, PolyKinds, LambdaCase, ViewPatterns #-}
module AST (AST, Val(..), _Int, _Add, _Assign, unvalidate, validate) where

import Control.Lens
import Data.Coerce

data Val = UV | V

data AST (a :: Val)
  = Int Int
  | Add (AST a) (AST a)
  | Assign String (AST a)
  deriving (Eq, Show)

_Int :: Prism (AST a) (AST UV) Int Int
_Int =
  prism
    Int
    (\case
        (unvalidate -> Int a) -> Right a
        (unvalidate -> a) -> Left a)

_Add :: Prism (AST a) (AST UV) (AST UV, AST UV) (AST UV, AST UV)
_Add =
  prism
    (uncurry Add)
    (\case
        (unvalidate -> Add a b) -> Right (a, b)
        (unvalidate -> a) -> Left a)

_Assign :: Prism (AST a) (AST UV) (String, AST UV) (String, AST UV)
_Assign =
  prism
    (uncurry Assign)
    (\case
        (unvalidate -> Assign a b) -> Right (a, b)
        (unvalidate -> a) -> Left a)

unvalidate :: AST a -> AST UV
unvalidate = coerce

validate :: AST UV -> Maybe (AST V)
validate (Int a) = Just $ coerce $ Int a
validate (Add a b) =
  fmap coerce $
  Add <$> (coerce <$> validate a) <*> (coerce <$> validate b)
validate (Assign a b)
  | a == "bad" = Nothing
  | otherwise =
    fmap coerce $
    Assign a <$> (coerce <$> validate b)

In this approach, I use a phantom type to indicate whether the AST has been validated. Due to the types of the prisms, only unvalidated terms can be constructed, but terms of any validation status can be matched on. This means we can use the same data structure to represent validated and unvalidated terms, whereas the current codebase has two distinct datatypes with a lot of duplication. The other consequence is that all syntax-correctness checking moves to run time. I don't believe this is a bad thing, considering the number of checks that are already performed at run time.

This pattern also fixes the optics problem that is demonstrated in #17.

There is a small "safety" flaw with this approach: a user can just use coerce to skip the validation stage. Currently I think that's an acceptable trade-off.
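
For a feel of the ergonomics, a small usage sketch against the module above (DataKinds assumed at the use site; only names exported from AST are used):

import AST
import Control.Lens (preview, review)

-- Construction always yields an unvalidated term...
ast :: AST UV
ast = review _Assign ("x", review _Add (review _Int 1, review _Int 2))

-- ...which 'validate' then promotes:
checked :: Maybe (AST V)
checked = validate ast

-- Matching works without validating first, and prisms chain as usual:
operands :: Maybe (AST UV, AST UV)
operands = preview _Assign ast >>= preview _Add . snd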

Revise the treatment of starred parameters

I think that we can remove the "TypedUnnamedStarredParam" (or whatever it's called) error by using the correct data structure for the contents of a starred parameter.

Warn for large constant exponentiations

Python will try to constant-fold expressions like 100 ** 123456789, which takes a very long time. Output a warning when this is encountered.

For extra credit, write a refactor rule that will change such occurrences to

a = 123456789
100 ** a
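
A sketch of the detection predicate, with an arbitrary digit threshold; nothing below is hpython's API:

-- Warn when both operands of '**' are integer literals and the result
-- would have an enormous number of decimal digits.
shouldWarn :: Integer -> Integer -> Bool
shouldWarn base ex =
  ex > 1 && abs base > 1 && estimatedDigits > threshold
  where
    estimatedDigits :: Double
    estimatedDigits = fromIntegral ex * logBase 10 (fromIntegral (abs base))
    threshold = 1e5  -- arbitrary cut-off

For 100 ** 123456789 this estimates roughly 2.5e8 digits, far over any sensible threshold.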

Conversion to `NonEmpty` rather than list for types which should do it naturally

Could we get Foldable1 on types which can implement it? For instance, for Block it would be nice to be able to convert it to a NonEmpty Statement rather than a [Statement]. I have very little idea about lenses, but I understand that in order to use toNonEmptyOf, the type should implement Foldable1 and/or Traversable1 rather than just Foldable, as it does now.

The same, I guess, applies to CommaSep1 and CommaSep1'.
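
A minimal sketch of the instance shape using the semigroupoids classes; Block here is a simplified, hypothetical stand-in for hpython's real type:

{-# LANGUAGE DeriveFoldable, DeriveFunctor, DeriveTraversable #-}

import Data.List.NonEmpty (NonEmpty)
import Data.Semigroup.Foldable (Foldable1 (..), toNonEmpty)

-- Hypothetical, simplified stand-in: the real Block carries more structure.
newtype Block stmt = Block (NonEmpty stmt)
  deriving (Functor, Foldable, Traversable)

instance Foldable1 Block where
  foldMap1 f (Block ss) = foldMap1 f ss

-- toNonEmpty now recovers a NonEmpty instead of a plain list:
statements :: Block stmt -> NonEmpty stmt
statements = toNonEmpty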

f-string support

It would be awesome to support Python 3.6 f-strings, introduced by PEP 498. For example:

name = 'paul'
message =  'hello'
str = f'{name} says {message}'

Replace `Trie` with `Map Text`

bytestring-trie is a blocker in getting support for newer GHC versions, and I'm not sure that we care about blazing fast scope analysis quite yet.

isPythonic

Is it possible to write this function?

isPythonic :: PythonAST -> Bool

Implement pretty errors for megaparsec and benchmark them

In the megaparsec-strict branch I have ported the lexer to megaparsec over strict Text. It's 10%+ faster than the trifecta version, but it isn't doing the extra work needed to produce the clang-style errors. It's possible that implementing this in megaparsec would make it slower than the trifecta version.

Let's find out.
