qfpl / hpython
Haskell language tools for Python
License: Other
While the parser now supports type annotations in various places, there is one use it chokes on:
result : List[HistoryLine] = []
BTW, thank you for this great work!
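For reference, CPython accepts this annotated-assignment form (PEP 526). `HistoryLine` is a hypothetical placeholder type for this sketch, since the original definition isn't shown:

```python
# HistoryLine is an assumed placeholder type; the real one isn't in the report.
from typing import List

HistoryLine = str
result: List[HistoryLine] = []
result.append("first entry")
assert result == ["first entry"]
```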
This is a line in hpython.nix:
version = "0.2";
however, Hackage shows that the latest version is 0.3. I am new to Nix, so what is happening here?
Are there any plans to support python 3.6/3.7 in the near term?
""""""
is not a syntax error
"""""""
is a syntax error
""" """
is not a syntax error
""" """"
is a syntax error
""" " """
is not a syntax error
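These cases can be checked against CPython itself with a small helper (`is_valid` is just a throwaway name for this sketch):

```python
import ast

def is_valid(src):
    """Return True iff CPython parses src without a SyntaxError."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

assert is_valid('""""""')        # empty triple-quoted string
assert not is_valid('"""""""')   # stray trailing quote
assert is_valid('""" """')       # string containing a space
assert not is_valid('""" """"')  # closes greedily, leaving one quote
assert is_valid('""" " """')     # a lone quote inside is fine
```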
There should be a Language.Python.Validation
module that exports key things from the Validation tree. As things are now, I don't know where to get started with validation, and there's nowhere I can document how the pieces fit together.
See the Travis logs for the latest PR.
In a recent deployment we generated lambda a=None,_=None,_=None:None
with the 'correct expression generator', which gets a 'DuplicateArguments' syntax error.
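CPython rejects the generated lambda for the same reason, so the DuplicateArguments error appears to be correct and the generator is at fault. A quick check (`is_valid` is a throwaway helper):

```python
def is_valid(src):
    """Return True iff CPython compiles src without a SyntaxError."""
    try:
        compile(src, "<test>", "eval")
        return True
    except SyntaxError:
        return False

# duplicate parameter names are rejected, even for _
assert not is_valid("lambda a=None,_=None,_=None:None")
assert is_valid("lambda a=None,_=None:None")
```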
This AST:
AtomExprNoAwait
(AtomInteger
(IntegerDecimal (Left (NonZeroDigit_1, [])) ())
())
(Compose
[ Compose . Before [] $
TrailerAccess
(Compose $ Before [] (Identifier "a" ()))
()
])
()
serializes to 1.a, which causes a SyntaxError. This is probably because the tokenizer tries to read 1. as a float literal.
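CPython shows the same behaviour: the tokenizer greedily reads `1.` as a float, so attribute access on an integer literal needs parentheses or a space (`is_valid` is a throwaway helper):

```python
def is_valid(src):
    """Return True iff CPython compiles src without a SyntaxError."""
    try:
        compile(src, "<test>", "eval")
        return True
    except SyntaxError:
        return False

assert not is_valid("1.a")   # tokenizer reads "1." as a float literal
assert is_valid("(1).a")     # compiles; only fails at runtime with AttributeError
assert is_valid("1 .a")      # the space also breaks the float tokenization
```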
Valid forms of not
expressions in Python:
not(True)
not True
not""
not{}
not 1
Invalid as not expressions (these lex as single identifiers):
notTrue
not1
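CPython's own parser confirms the split: the valid spellings produce a unary `not` node, while the "invalid" spellings tokenize as one identifier and never reach the `not` keyword at all:

```python
import ast

# the valid spellings parse as unary `not` expressions
for src in ["not(True)", "not True", 'not""', "not{}", "not 1"]:
    assert isinstance(ast.parse(src, mode="eval").body, ast.UnaryOp)

# the invalid spellings lex as single identifiers (names), not
# not-expressions at all
for src in ["notTrue", "not1"]:
    assert isinstance(ast.parse(src, mode="eval").body, ast.Name)
```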
Here's a simplified form of the AST:
data NotExpr
= NotExprOne KNot [Whitespace] NotExpr
| NotExprNone Comparison
data Symbol = SGt | SLt | ...
data Comparison
= Gt Expr [Whitespace] SGt [Whitespace] Expr
| Lt Expr [Whitespace] SLt [Whitespace] Expr
| ...
data Expr = Number Int | True | False | None | String [Char] | Parens Expr | ...
This permits all the valid uses of not
, but also some invalid uses. For example, one
could write NotExprOne KNot [] (NotExprOne KNot [] ...
, which would result in
notnot...
.
The true whitespace rule for not
is this:
Given (NotExprOne a b c): If print(c) begins with an identifier character, then 'b' should not be empty.
If we were to encode this with types, it might look like this:
data NotExpr'
= NotExprNone' Comparison'
data NotExpr
= NotExprOne KNot (NonEmpty Whitespace) NotExpr
| NotExprOne' KNot [Whitespace] Comparison'
| NotExprNone Comparison
-- | Comparison' will not begin with an identifier character when printed
data Comparison'
= Gt' Expr' [Whitespace] SGt [Whitespace] Expr
| Lt' Expr' [Whitespace] SLt [Whitespace] Expr
-- | Expr' will not begin with an identifier character when printed
data Expr' = Parens' (Either Expr Expr') | String' [Char]
-- | Comparison will begin with an identifier character when printed
data Comparison
= Gt Expr [Whitespace] SGt [Whitespace] Expr
| Lt Expr [Whitespace] SLt [Whitespace] Expr
-- | Expr will begin with an identifier character when printed
data Expr = Number Int | True | False | None
We split terminals into two types: one that will begin with an identifier character when printed, and one that won't. Then we have to propagate this change to all the non-terminals. It essentially doubles the number of types and data constructors. A lot of this grammar (https://docs.python.org/3.5/reference/grammar.html) has to be duplicated to encode this. I'm skeptical of this approach due to the amount of code it requires.
Ideally I still want to have simple prisms on all these types to provide a good user interface.
Now it would have to look something like this:
class FromNot s ws | s -> ws where
_Not :: Prism' NotExpr (KNot, ws, s)
instance FromNot NotExpr (NonEmpty Whitespace) where
_Not = -- match on NotExprOne
instance FromNot Comparison' [Whitespace] where
_Not = -- match on NotExprOne'
class FromComparison comp expr | comp -> expr where
_Gt :: Prism' comp (expr, [Whitespace], SGt, [Whitespace], Expr)
_Lt :: Prism' comp (expr, [Whitespace], SLt, [Whitespace], Expr)
instance FromComparison Comparison' Expr' where
_Gt = -- match on Gt'
_Lt = -- match on Lt'
instance FromComparison Comparison Expr where
_Gt = -- match on Gt
_Lt = -- match on Lt
instance FromComparison NotExpr Expr where
_Gt = -- match on NotExprNone then Gt
_Lt = -- match on NotExprNone then Lt
instance FromComparison NotExpr' Expr' where
_Gt = -- match on NotExprNone' then Gt'
_Lt = -- match on NotExprNone' then Lt'
Eventually I want to be able to parse and validate imported modules. Not going to do it yet, but I'm going to do some brainstorming here.
https://docs.python.org/3.5/reference/import.html#searching
importlib.import_module
as well as import statements.
It definitely should.
There are a bunch of syntax elements that are created using runtime validation, because a type-based correct-by-construction representation is too complex to be useful. All of these runtime-validated types are created using the smart constructor pattern. The problem is that smart constructors prevent helpful prisms.
Here's an example:
module NoNumbers (NoNumbers, mkNoNumbers, _NoNumbers) where
newtype NoNumbers = NoNumbers { unNoNumbers :: String }
mkNoNumbers :: String -> Maybe NoNumbers
mkNoNumbers s
| any isDigit s = Nothing
| otherwise = Just $ NoNumbers s
_NoNumbers :: Prism' NoNumbers String
_NoNumbers = prism NoNumbers (Right . unNoNumbers)
_NoNumbers
is obviously wrong. review _NoNumbers ~ NoNumbers
, which we were trying to hide in the first place.
How do we fix this?
_NoNumbers :: Prism' (Maybe NoNumbers) String
Now, review _NoNumbers :: String -> Maybe NoNumbers
, which is mkNoNumbers
. The downside is
that preview _NoNumbers :: Maybe NoNumbers -> Maybe String
, which will be a pain when chaining
prisms.
_NoNumbers :: Prism NoNumbers (Maybe NoNumbers) String String
Now review _NoNumbers :: String -> Maybe NoNumbers
and preview _NoNumbers :: NoNumbers -> Maybe String
. We can always get a string out, so let's downgrade to Iso
_NoNumbers :: Iso NoNumbers (Maybe NoNumbers) String String
view _NoNumbers :: NoNumbers -> String
view (from _NoNumbers) :: String -> Maybe NoNumbers
Now that we have an accurate optic, let's see how it fares in nested updates
{-# language TemplateHaskell #-}
module Test where
import Control.Lens
import NoNumbers
testMkThing = Thing "hello" <$> ("goodbye" ^. getting (from _NoNumbers))
testSet =
let
a = testMkThing ^?! _Just
in
a & traverseOf thingNoNumbers (_NoNumbers .~ "goodbye1")
It turns out that in this context, an Iso
is a burden. We could have achieved the same outcome with less code by using mkNoNumbers
and unNoNumbers
on their own:
testMkThing = Thing "hello" <$> mkNoNumbers "goodbye"
testSet =
let
a = testMkThing ^?! _Just
in
a & traverseOf thingNoNumbers (\_ -> mkNoNumbers "goodbye1")
<!>
needs to try parsing the left side before it knows whether or not it needs the right, so we will use way too much memory in cases where the parser takes the left side of many <!>
s.
For example
exprListComp :: Parser ann Whitespace -> Parser ann (Expr '[] ann)
exprListComp ws = do
ex <- exprOrStar ws
(\cf ->
Generator (ex ^. exprAnnotation) .
Comprehension (ex ^. exprAnnotation) ex cf) <$>
compFor <*>
many (Left <$> compFor <!> Right <$> compIf)
<!>
(\case
([], Nothing) -> ex
([], Just ws) ->
Tuple (ex ^. exprAnnotation) ex ws Nothing
((ws, ex') : cs, mws) ->
Tuple (ex ^. exprAnnotation) ex ws . Just $ (ex', cs, mws) ^. _CommaSep1') <$>
commaSepRest (exprOrStar anySpace)
will cause an out of memory error for
(()for a in()for a in()for a in()if{**()}for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a 
in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in()for a in ())
For this to work, <!>
needs to be able to determine which side to take based on the current parse state, ala LL/LR(k).
There are three mutable, globally accessible dicts, returned by locals(), globals(), and vars()
. When these dicts are modified, new variables are brought into scope. Warn about these usages. Warnings should only occur when it's the built-in definitions that are accessed. If a variable called globals
is introduced, shadowing the original, then there's no issue.
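A minimal demonstration of why these warrant a warning — writing through globals() really does introduce a binding that no static scope check can see (`introduce` and `injected_name` are invented names for this sketch):

```python
def introduce():
    # this creates a module-level binding invisibly to static analysis
    globals()["injected_name"] = 42

introduce()
assert injected_name == 42  # now in scope
```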
I am trying to implement parsing for Python 3 decorators and am struggling (see http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonDecorators.html for details...).
What I have so far is that I've defined:
data Decorator v a
= Decorator (Indents a) (Ident v a)
-- ^ '@' <ident>
deriving (Eq, Show, Functor, Foldable, Traversable)
The Decorator
is parsed using a TkAt
token that contains the string for the decorator's name (I am only tackling argument-less decorators at the moment):
parseDecoratorIdentifier :: Parser ann (Ident '[] ann)
parseDecoratorIdentifier = do
curTk <- currentToken
case curTk of
TkAt s ann -> do
Parser $ consumed ann
pure $ MkIdent ann s []
_ -> Parser . throwError $ ExpectedIdentifier curTk
maybeDecorator :: Parser ann (Maybe (Decorator '[] ann))
maybeDecorator = optional (Decorator <$>
indents <*> parseDecoratorIdentifier)
Then I've augmented FunDef
with a Maybe (Decorator v a)
field.
compoundStatement :: Parser ann (CompoundStatement '[] ann)
compoundStatement =
fundef <!> ....
where
fundef =
(\dec a (tkDef, defSpaces) -> Fundef dec a (pyTokenAnn tkDef) (NonEmpty.fromList defSpaces)) <$>
maybeDecorator <*>
indents <*>
token space (TkDef ()) <*>
.....
Now when I try to parse "@decorate\ndef fun(a:str) -> int:\n return 1"
I get the following error:
┏━━ test/Helpers.hs ━━━
40 ┃ doParse :: (Show ann, Monad m) => ann -> Parser ann a -> Nested ann -> PropertyT m a
41 ┃ doParse initial pa input = do
42 ┃ let res = runParser initial pa input
43 ┃ case res of
44 ┃ Left err -> do
45 ┃ annotateShow err
┃ │ UnexpectedEndOfLine (Caret (Columns 0 0) "@decorate\n")
46 ┃ failure
┃ ^^^^^^^
47 ┃ Right a -> pure a
So the decorator needs to handle the newline and be at the same indentation level as the def
it is attached to. How do I implement that in hpython
?
The grammar has the names the wrong way around, which is why I made this mistake.
if_
, for_
, and while_
all return fully-formed Statement
s, but else_
can only operate on types that have HasElse
. Additionally, the signature of else_
requires its input to be something with an 'else-shaped' hole in it, but the combinators from above produce hole-less things.
I currently see two options: make else_
operate on more things, so it can modify the fully-formed output of if_
et al, or change the control flow combinators to output their pre-Statement
types (like For
and If
), so that else_
can modify those. I like the second option more because it means the types force us to write code that is reflected by the generated Python. If I allow else_
to modify Statement
s, then we're allowed to write redundant applications of else_
to an old statement.
It would help learners if there were examples of validation in the examples directory.
async
and await
are not considered reserved keywords (in Python 3.5/3.6). async
is an identifier unless followed by def
, introducing a function definition. await
is considered an identifier unless used inside an async def
function, in which case it is a keyword.
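Note that this soft-keyword behaviour ended with Python 3.6: since 3.7, async and await are hard keywords and can no longer be used as identifiers, which a quick check confirms:

```python
import sys

# On Python >= 3.7, `async` is a hard keyword, so using it as a
# name is a SyntaxError; on 3.5/3.6 it was still a legal identifier.
if sys.version_info >= (3, 7):
    try:
        compile("async = 1", "<t>", "exec")
        raised = False
    except SyntaxError:
        raised = True
    assert raised
```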
It would be nice if you could derive this generically rather than writing it out by hand.
https://hackage.haskell.org/package/digit
We will probably want to add binary, octal and hex digits to that package too.
I think it probably should
Currently if you validate python code then you just get a list of errors. It would be good to be able to translate that list into a human readable representation. The output would give a succinct explanation of the error, and use the original source file + the annotations in the error to pull out relevant areas of the code so we can pinpoint the bits that caused the error.
For example, if we validate this code:
def a():
x = 1
y = x + b
return y
we could get an error message like:
Line 3: 'b' is not in scope
2 | x = 1
3 | y = x + b
^
4 | return y
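A minimal, hypothetical sketch of such a renderer — `render_error` and its layout are invented for illustration, not hpython's API; a real version would take the annotations carried by the validation errors:

```python
def render_error(src, line, col, msg):
    """Render a clang-style error: message, surrounding lines, and a caret."""
    lines = src.splitlines()
    out = [f"Line {line}: {msg}"]
    for n in range(max(1, line - 1), min(len(lines), line + 1) + 1):
        prefix = f"{n} | "
        out.append(prefix + lines[n - 1])
        if n == line:
            # caret under the offending column
            out.append(" " * (len(prefix) + col) + "^")
    return "\n".join(out)

src = "def a():\n  x = 1\n  y = x + b\n  return y"
print(render_error(src, 3, 10, "'b' is not in scope"))
```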
As part of fixing the Nix situation, we need to upgrade to the latest version of digit.
Currently we warn about all usages of globals
, nonlocals
, and del
. In the general case they interfere with scope checking, but there is a subset of usages that are safe.
We only really need to warn when we need to do extra computation to see if the scope should be modified. Usages at the top-level and in unconditional control flow (like try
, with
) would be okay, but usages inside function calls and if
statements are bad.
When parsing the following definition signature:
def foo(y : str, x : int = 0) -> bool:
...
hpython generates the following parameters list:
[KeywordParam {_paramAnn = Ann {getAnn = SrcInfo {_srcInfoName = "data/test/contractWithSchema.py", _srcInfoLineStart = 50, _srcInfoLineEnd = 50, _srcInfoColStart = 9, _srcInfoColEnd = 10, _srcInfoOffsetStart = 1170, _srcInfoOffsetEnd = 1171}}, _paramName = MkIdent {_identAnn = Ann {getAnn = SrcInfo {_srcInfoName = "data/test/contractWithSchema.py", _srcInfoLineStart = 50, _srcInfoLineEnd = 50, _srcInfoColStart = 9, _srcInfoColEnd = 10, _srcInfoOffsetStart = 1170, _srcInfoOffsetEnd = 1171}}, _identValue = "y", _identWhitespace = [Space,Space]}, _paramType = Nothing, _unsafeKeywordParamWhitespaceRight = [Space], _unsafeKeywordParamExpr = Int {_exprAnn = Ann {getAnn = SrcInfo {_srcInfoName = "data/test/contractWithSchema.py", _srcInfoLineStart = 50, _srcInfoLineEnd = 50, _srcInfoColStart = 14, _srcInfoColEnd = 15, _srcInfoOffsetStart = 1175, _srcInfoOffsetEnd = 1176}}, _unsafeIntValue = IntLiteralDec {_intLiteralAnn = Ann {getAnn = SrcInfo {_srcInfoName = "data/test/contractWithSchema.py", _srcInfoLineStart = 50, _srcInfoLineEnd = 50, _srcInfoColStart = 14, _srcInfoColEnd = 15, _srcInfoOffsetStart = 1175, _srcInfoOffsetEnd = 1176}}, _unsafeIntLiteralDecValue = DecDigit0 :| []}, _unsafeIntWhitespace = []}}]
E.g. the first parameter is given a type of Nothing
and assigned the literal expression from the second argument.
However, the following works as expected:
def other_foo(x : int = 0, y : str) -> bool:
...
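It's worth noting that CPython itself rejects the second signature (a parameter without a default may not follow one with a default), while the first is fine — so hpython's behaviour is inverted on both counts (`is_valid` is a throwaway helper):

```python
def is_valid(src):
    """Return True iff CPython compiles src without a SyntaxError."""
    try:
        compile(src, "<test>", "exec")
        return True
    except SyntaxError:
        return False

assert is_valid("def foo(y: str, x: int = 0) -> bool: ...")
# non-default parameter after a default one is a SyntaxError in CPython
assert not is_valid("def other_foo(x: int = 0, y: str) -> bool: ...")
```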
Def "a" NoArgs {- colon -} [Space, Continued CR []] (Just LF) block
If you create a value like this, then you won't be able to parse the result of pretty printing it, because the CR and LF are rendered adjacent in the string and will be considered a CRLF. You'll end up with a Continued CRLF []
, the parser will look for a newline token to start the block and it'll choke.
The "simplest" way to fix this would be to detect it during syntax checking.
As far as I can tell hpython
only parses standard Python comments:
hpython/src/Language/Python/Internal/Lexer.hs
Lines 115 to 118 in ede398a
Is there any reason that it doesn't support parsing of docstrings, or is it just a use case that QFPL hasn't had need for yet (i.e. would a PR be welcomed, or is it potentially something to add as a milestone for a future release)?
Subscript slicing desugars to a slice
object, so all subscripts are actually a single expression. The comma-separatedness helps form a tuple. Figure out how to make this all line up.
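The desugaring can be observed directly with a probe object — a single slice becomes one slice value, and comma-separated subscripts become a tuple (`Probe` is an invented helper for this sketch):

```python
class Probe:
    """Echo back whatever key the subscript desugars to."""
    def __getitem__(self, key):
        return key

p = Probe()
assert p[1:2] == slice(1, 2)           # one slice object
assert p[1:2, 3] == (slice(1, 2), 3)   # commas build a tuple of items
```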
We rely on type-level-sets for these functions, but that package isn't actively maintained, so we might as well just re-implement them.
f(a=False,*b)
is valid Python, but is reported as a positional after keyword argument error
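A quick check that CPython really does allow a keyword argument before *-unpacking (`f` and `args` are invented for this sketch):

```python
def f(a, b=None, c=None):
    return (a, b, c)

args = [1, 2]
# keyword before *-unpacking is legal: positionals fill a and b, keyword fills c
assert f(c=3, *args) == (1, 2, 3)

# and the exact form from the report compiles without error
compile("f(a=False,*b)", "<test>", "eval")
```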
Until now, the approach to the AST has been "as correct-by-construction as possible, falling back to smart constructors when necessary". Since we've been getting closer to a complete representation, I have been considering another approach that gets similar levels of safety but permits a more elegant library design, and potentially better user experience.
Here it is, applied to a very small AST:
{-# language DataKinds, PolyKinds, LambdaCase, ViewPatterns #-}
module AST (AST, Val(..), _Int, _Add, _Assign, unvalidate, validate) where
import Control.Lens
import Data.Coerce
data Val = UV | V
data AST (a :: Val)
= Int Int
| Add (AST a) (AST a)
| Assign String (AST a)
deriving (Eq, Show)
_Int :: Prism (AST a) (AST UV) Int Int
_Int =
prism
Int
(\case { (unvalidate -> Int a) -> Right a; (unvalidate -> a) -> Left a })
_Add :: Prism (AST a) (AST UV) (AST UV, AST UV) (AST UV, AST UV)
_Add =
prism
(uncurry Add)
(\case { (unvalidate -> Add a b) -> Right (a, b); (unvalidate -> a) -> Left a })
_Assign :: Prism (AST a) (AST UV) (String, AST UV) (String, AST UV)
_Assign =
prism
(uncurry Assign)
(\case { (unvalidate -> Assign a b) -> Right (a, b); (unvalidate -> a) -> Left a })
unvalidate :: AST a -> AST UV
unvalidate = coerce
validate :: AST UV -> Maybe (AST V)
validate (Int a) = Just $ coerce $ Int a
validate (Add a b) =
fmap coerce $
Add <$> (coerce <$> validate a) <*> (coerce <$> validate b)
validate (Assign a b)
| a == "bad" = Nothing
| otherwise =
fmap coerce $
Assign a <$> (coerce <$> validate b)
In this approach, I use a phantom type to indicate that the AST has been validated. Due to the types of the prisms, only unvalidated terms can be constructed, but terms of any validation status can be matched on. This means we can use the same data structure to represent validated and unvalidated terms. The current codebase has two distinct datatypes (with a lot of duplication). The other consequence is that all syntax-correctness checking is moved to run-time. I don't believe this is a bad thing, considering the amount of checks that are already performed at run-time.
This pattern also fixes the optics problem that is demonstrated in #17.
There is a small "safety" flaw with this approach, that a user can just use coerce
to skip the validation stage. Currently I think that's an okay trade-off.
For example, where we have Maybe (NonEmpty Whitespace)
we can instead have Maybe Async
I think that we can remove the "TypedUnnamedStarredParam" (or whatever it's called) error by using
the correct data structure for the contents of a starred parameter.
Python will try to constant-fold expressions like 100 ** 123456789
, which takes a very long time. Output a warning when this is encountered.
For extra credit, write a refactor rule that will change such occurrences to
a = 123456789
100 ** a
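The refactor works because CPython only folds expressions whose operands are all literals; a quick check of the compiled constants shows folding happening for a literal exponent and not for a variable (small exponents used here so the check itself is cheap):

```python
# literal operands: the compiler folds 2 ** 10 to 1024 at compile time
folded = compile("x = 2 ** 10", "<t>", "exec")
assert 1024 in folded.co_consts

# variable operand: no folding is possible, so 1024 never appears
unfolded = compile("a = 10\nx = 2 ** a", "<t>", "exec")
assert 1024 not in unfolded.co_consts
```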
Could we get Foldable1
on types which can implement it?
For instance for Block
it would be nice to be able to convert it to NonEmpty Statement
rather than [Statement]
. I have very little idea about lenses but I understand that in order to use toNonEmptyOf
the type should implement Foldable1
and/or Traversable1
rather than Foldable
as it does now.
The same I guess applies to CommaSep1
and CommaSep1'
It would be awesome to support Python 3.6 f-strings, introduced by PEP 498. For example:
name = 'paul'
message = 'hello'
str = f'{name} says {message}'
bytestring-trie
is a blocker in getting support for newer GHC versions, and I'm not sure that we care about blazing fast scope analysis quite yet.
Is it possible to write this function?
isPythonic :: PythonAST -> Bool
In the megaparsec-strict
branch I have ported the lexer to megaparsec over strict text. It's 10%+ faster than the trifecta version, but it isn't doing the extra work to be able to produce the clang-style errors. It's possible that implementing this in megaparsec would make it slower than the trifecta version.
Let's find out.
I can't build on Nix.
It would be good to give a commentary through the examples so that someone can pick up a lot about the library by reading the examples.
https://docs.python.org/3.5/reference/simple_stmts.html#the-del-statement
Currently, del
only supports Identifiers as targets.
Scope checking will need to be updated when this is completed, because there are usages of del
that can be statically known not to interfere with scoping.
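A sketch of the distinction (`safe` and `unsafe` are invented names): a del whose target is unconditionally rebound afterwards cannot affect later scope, while a conditional del can leave a local unbound at runtime, which is the case static checking has to worry about:

```python
def safe():
    x = 1
    del x    # statically fine: x is unconditionally rebound below
    x = 2
    return x

def unsafe(flag):
    x = 1
    if flag:
        del x
    return x  # may raise UnboundLocalError, depending on flag

assert safe() == 2
assert unsafe(False) == 1
try:
    unsafe(True)
    raised = False
except UnboundLocalError:
    raised = True
assert raised
```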