elm-syntax's Introduction

elm-syntax

Elm Syntax in Elm: for parsing and writing Elm in Elm.

How does this work?

When Elm code is parsed, it's converted into an Abstract Syntax Tree (AST). The AST lets us represent the code in a way that's much easier to work with when programming.

Here's an example of that:

Code: 3 + 4 * 2

AST:

OperatorApplication
    (Integer 3)
    "+"
    (OperatorApplication
        (Integer 4)
        "*"
        (Integer 2)
    )

Notice how it forms a tree structure where we first multiply 4 by 2, and then add the result to 3. That's where the "tree" part of AST comes from.
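To make the tree idea concrete, here is a minimal sketch that evaluates such a tree bottom-up. It uses a simplified, hypothetical `Expr` type that mirrors the shape shown above; it is not elm-syntax's real `Expression` type, which also carries `Node` wrappers and an infix direction.

```elm
module AstSketch exposing (Expr(..), eval, example)

-- Simplified stand-in for the AST shown above (an assumption for
-- illustration, not elm-syntax's actual Expression type).


type Expr
    = Integer Int
    | OperatorApplication Expr String Expr


-- Evaluate the tree from the leaves up. Operators this sketch does not
-- know about yield Nothing.
eval : Expr -> Maybe Int
eval expr =
    case expr of
        Integer n ->
            Just n

        OperatorApplication left op right ->
            case op of
                "+" ->
                    Maybe.map2 (+) (eval left) (eval right)

                "*" ->
                    Maybe.map2 (*) (eval left) (eval right)

                _ ->
                    Nothing


-- The tree for 3 + 4 * 2: the multiplication is nested inside the addition.
example : Expr
example =
    OperatorApplication
        (Integer 3)
        "+"
        (OperatorApplication (Integer 4) "*" (Integer 2))
```

Because the multiplication sits deeper in the tree, `eval example` computes `4 * 2` first and then adds 3, giving `Just 11`.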

Getting Started

import Elm.Parser
import Html exposing (Html)

src : String
src =
    """module Foo exposing (foo)

foo = 1
"""

parse : String -> String
parse input =
    case Elm.Parser.parseToFile input of
        Err e ->
            "Failed: " ++ Debug.toString e

        Ok v ->
            "Success: " ++ Debug.toString v

main : Html msg
main =
    Html.text (parse src)

Used in:

elm-syntax's People

Contributors

adeschamps, andys8, ceddlyburge, janiczek, jfmengels, jiegillet, lue-bird, martinsstewart, matheus23, michaeljones, ollef, phenax, rogeriochaves, siriusstarr, sparksp, stil4m, teodorofilippini, zwilias

elm-syntax's Issues

Provide a Node for the operator in OperatorApplication

An operator application is currently defined as:

OperatorApplication String InfixDirection (Node Expression) (Node Expression)

which gives you the node for the left and right expression, along with the infix direction and the operator which is being used.

This does not give you the position of the operator though.

There is an elm-review rule that forbids the use of <|, but when this rule tries to report something, it can't point to <| specifically because it does not have the range.

Proposal:

  • Change the String to a Node String which would point to the exact
  • Remove the InfixDirection? If this is only for internal purposes, then like I mentioned in #58 (comment), it might be better to have an intermediate representation of the AST. If this is not for internal purposes, I have no clue what this is for 🤔

Problem parsing type annotation with siblings params

elm-syntax is parsing type annotations like Dict String Int as if it were nested, like Dict (String Int)

I've written a failing test that you can add to tests/Elm/Parser/TypeAnnotationTests.elm to verify, but I couldn't fix the problem:

test "parse type with multiple params" <|
    \() ->
        parseFullStringWithNullState "Dict String Int" Parser.typeAnnotation
            |> Maybe.map noRangeTypeReference
            |> Expect.equal
                (Just
                    ( emptyRange
                    , Typed []
                        "Dict"
                        [ ( emptyRange, Typed [] "String" [] )
                        , ( emptyRange, Typed [] "Int" [] )
                        ]
                    )
                )

Elm.Parser incorrectly reports syntax errors

Calling Elm.Parser.parse on the Elm.Processing source file returns an Err result. This seems like a bug since if Elm.Processing really had a syntax error, I wouldn't be able to use this package in the first place.

Code used to reproduce this behavior
https://ellie-app.com/3NDQvRWHFrKa1

Details
The exact syntax error I get is
Err [{ col = 32, problem = ExpectingVariable, row = 190 },{ col = 32, problem = ExpectingVariable, row = 190 },{ col = 32, problem = ExpectingSymbol "\"", row = 190 },{ col = 32, problem = ExpectingSymbol "'", row = 190 },{ col = 32, problem = UnexpectedChar, row = 190 },{ col = 32, problem = ExpectingNumber, row = 190 },{ col = 32, problem = ExpectingSymbol "()", row = 190 },{ col = 32, problem = ExpectingSymbol "_", row = 190 },{ col = 32, problem = Expecting "{", row = 190 },{ col = 32, problem = Expecting "[", row = 190 },{ col = 32, problem = Expecting "(", row = 190 }]

which seems to point to (\( p, infix, s ) -> in this code within Elm.Processing

                    |> Maybe.map
                        (\( p, infix, s ) ->
                            OperatorApplication
                                (Node.value infix.operator)
                                (Node.value infix.direction)
                                (Node (Range.combine <| List.map Node.range p) (divideAndConquer p))
                                (Node (Range.combine <| List.map Node.range s) (divideAndConquer s))
                        )
                    |> Maybe.withDefault (fixExprs exps)

Simplify TypeAnnotation

I think the TypeAnnotation type could be simplified a bit.

elm/project-metadata-utils provides a different representation for the same data (without the Ranges).

In the following proposals, I will have a table of equivalences, where <thing> means that something is wrapped in a Node.

If you prefer discussing each one in different issues, let me know and I'll be happy to create them.

Removing Unit in favor of Tupled []

Elm code | elm-syntax | elm/project-metadata-utils
()       | Unit | Tuple []
( a, b ) | Tupled [ <GenericType "a">, <GenericType "b"> ] | Tuple [ Var "a", Var "b" ]

I think that the Unit serves little purpose, and could be simplified to Tupled [], like what project-metadata-utils does.

Changing this doesn't make pattern matching harder, as you can still pattern match for Tupled [].

Fusing GenericRecord and Record

Elm code       | elm-syntax | elm/project-metadata-utils
{ a : () }     | Record [ <( <"a">, <Unit> )> ] | Record [ ( "a", Tuple [] ) ] Nothing
{ b | a : () } | GenericRecord <"b"> [ <( <"a">, <Unit> )> ] | Record [ ( "a", Tuple [] ) ] (Just "b")

elm/project-metadata-utils employs the same type for records and generic records, and uses a Maybe for the optional generic part.
Both Record and GenericRecord serve similar purposes, and in practice I think that we will analyse them in similar ways. Having a single type can help avoid duplication of code.

Again, this does not prevent pattern matching directly on generic and non-generic record, as you can still pattern match for Record _ Nothing and Record _ (Just _)
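As a rough illustration of that point, the sketch below defines a hypothetical merged type (loosely modelled on the proposal, without ranges or `Node` wrappers) and shows that both kinds of records can still be told apart by pattern matching, while shared logic only needs one branch.

```elm
module RecordSketch exposing (Type(..), fieldNames, isGeneric)

-- Hypothetical merged representation for illustration only; the names
-- and shape are assumptions, not elm-syntax's current API.


type Type
    = Var String
    | Tuple (List Type)
    | Record (List ( String, Type )) (Maybe String)


-- Generic and non-generic records remain distinguishable.
isGeneric : Type -> Bool
isGeneric tipe =
    case tipe of
        Record _ (Just _) ->
            True

        Record _ Nothing ->
            False

        _ ->
            False


-- Logic that applies to both kinds of records needs a single branch.
fieldNames : Type -> List String
fieldNames tipe =
    case tipe of
        Record fields _ ->
            List.map (\( name, _ ) -> name) fields

        _ ->
            []
```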

Renaming Tupled to Tuple and Typed to Type

Elm code               | elm-syntax | elm/project-metadata-utils
a                      | GenericType "a" | Var "a"
Elm.Syntax.Node.Node a | Typed <( [ "Elm", "Syntax", "Node" ], "Node" )> [ <GenericType "a"> ] | Type "Elm.Syntax.Node.Node" [ Var "a" ]

I think the terms chosen by project-metadata-utils are closer to how people read them. I think people (at least I do) read type annotations like "It takes a type Foo and returns a tuple of String and Int", and not "It takes a typed Foo and returns a tupled String and Int".

I actually have a hard time figuring out how to read those types in elm-syntax.

Note: I think the Typed variant in elm-syntax is more useful than the Type from project-metadata-utils because splitting the module name from the function name can be very useful.

Rename GenericType to Var

Elm code | elm-syntax | elm/project-metadata-utils
a        | GenericType "a" | Var "a"

This is entirely subjective, but I think Var is nicer, or at least it gives the same meaning but with a shorter name. I also think it would be nicer to have similar names for both projects.

Provide a ModuleName.toString function

I very often need to get a module name as a string, to report it to the user in a human readable format, or to compare it to the configuration that a user gave (I don't want to have to let them give it to me as the underlying type, they shouldn't have to know that).

So I have written String.join "." moduleName dozens of times in a lot of rules and elm-review code, sometimes defining it in a new function, sometimes not.

In v8, the ModuleName will be made available as a ( String, List String ) (which by the way, is not reflected in the ModuleName module) or a SomeVariant String (List String).

With that change, the code needed to turn a module name into a string becomes more complex: String.join "." ( first :: rest ), which becomes quite annoying.

Proposal

Provide a ModuleName.toString function:

module ModuleName exposing (...)

toString : ( String, List String) -> String

I think the signature is better as ( String, List String) -> String than as String -> List String -> String because it is easier to do

case foo of
  SomeVariant first rest _ ->
    ModuleName.toString ( first, rest )

than to do the following, where you necessarily need the destructuring step:

let ( first, rest ) = getModuleNameFromContext something
in ModuleName.toString first rest
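For reference, a minimal sketch of what the tuple-based version could look like. The module name `ModuleNameSketch` is hypothetical, and the ( String, List String ) shape is the v8 format described above, not the current `List String` alias.

```elm
module ModuleNameSketch exposing (toString)

-- Sketch of the proposed function, assuming a module name is
-- represented as ( String, List String ).


toString : ( String, List String ) -> String
toString ( first, rest ) =
    String.join "." (first :: rest)
```

With this signature, a caller holding `first` and `rest` separately writes `toString ( first, rest )`, and a caller holding the tuple passes it through unchanged.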

Side-notes / side-proposal

I think that ModuleName.ModuleName should reflect the new format of a module name that will come in v8.

Also, as I mentioned before, it would be helpful to provide a way to validate that a string corresponds to a module name, meaning that it would be nice for a user to be able to "parse" a module name:

module ModuleName exposing (...)

fromString : String -> Maybe ( String, List String ) -- Or maybe a Result

Without something like this, I think that most people will validate a module name only by splitting it on "." or by writing a crude regex, and that wouldn't necessarily match elm-syntax's implementation.
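The kind of hand-rolled check described above might look like the following sketch. It is deliberately crude: the rule used here (each segment starts with an ASCII uppercase letter, followed by ASCII letters, digits or underscores) is an assumption and may not match what elm-syntax's parser actually accepts, which is exactly the concern.

```elm
module ModuleNameParseSketch exposing (fromString)

-- A crude, hypothetical validator; not elm-syntax's implementation.


fromString : String -> Maybe ( String, List String )
fromString input =
    case String.split "." input of
        first :: rest ->
            if List.all isValidSegment (first :: rest) then
                Just ( first, rest )

            else
                Nothing

        [] ->
            Nothing


-- Assumed rule: an uppercase ASCII letter followed by ASCII
-- alphanumerics or underscores.
isValidSegment : String -> Bool
isValidSegment segment =
    case String.uncons segment of
        Just ( firstChar, remaining ) ->
            Char.isUpper firstChar
                && String.all (\c -> Char.isAlphaNum c || c == '_') remaining

        Nothing ->
            False
```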

I thought the two proposals were kind of related, but I can split this into two issues if you prefer.

Potential use for Elm code generation?

In the last few weeks I have become interested in the idea of generating Elm code from other languages. This is motivated by a couple of internal projects:

  • Sharing constants between Typescript & Elm. Having the official definitions in Typescript & being able to generate an Elm module that mirrored it would be useful.
  • Handling language translations written with ICU strings. Being able to generate an Elm module from the ICU strings that achieves the intended logic of the ICU syntax.

Because whitespace is significant, it seems hard to export valid Elm code from a quickly written program. It might be easier to generate some kind of JSON AST representation and then have a commonly used tool convert the JSON to valid Elm code.

I see you've commented on a related issue on elm-format and there is another issue about it too.

Do you believe that the JSON structure you have developed and your Elm.Writer module are in a good position to help with that? I'm keen to do some experiments. Do you think it would be useful to have 'elm-to-json-ast' and 'json-ast-to-elm' command line tools? And maybe some kind of Typescript API for generating the JSON AST? Potentially as separate projects if you'd like to keep this project clean and with a clear scope?

Thanks for the project. It is very cool to see.

Elm.Syntax.Import allows for impossible module aliases

type alias Import =
    { moduleName : Node ModuleName
    , moduleAlias : Maybe (Node ModuleName)
    , exposingList : Maybe (Node Exposing)
    }

The moduleAlias field uses ModuleName instead of String. This allows for aliases such as Module.Submodule which can't exist since it's a syntax error.

Improve documentation

  • Add samples showing how to get started.
  • Describe how the syntax tree works.
  • (more suggestions).

Using the parser in CLI

Hi there - Could this project conceivably be used to write the AST of an elm file to standard-output as JSON? Or am I barking up the wrong tree?

Encode the AST as Bytes

As I mentioned in #55 (comment), elm-review parses every file in a project, and then caches the resulting AST by storing it on the disk. When it restarts, those files are then read to avoid having to parse the file again.

Over at my work project, we have 160k LoC over 600+ modules. When all of this gets cached, the combined disk space used for all these AST is about 39MB, which is a lot! (The raw source code is about 5.7MB big, FYI).

I think encoding and decoding the same data with elm/bytes would reduce the amount of space taken. And since reading from disk is (relatively) slow, it would also speed up startup time, and probably reduce the time spent writing this data to disk.

I don't have hard data on how much space and time this would save, but I imagine the result would be several times smaller, as we would be able to store data much more compactly than with JSON.

Since elm-syntax's AST is not opaque at all, we can try this out in a separate package (or directly in elm-review for that matter), and potentially keep it there forever to avoid having elm/bytes as a dependency of elm-syntax if that is something we wish to avoid.
One of the problems for elm-review though, is that this data needs to be sent over a port, but ports don't support Bytes. A workaround I heard of is to

Anyway, I wanted to share this need/want of mine. I'll likely tackle this at some point unless someone beats me to it (I have other things to work on for a while 😅 )
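To give a feel for the approach, here is a small sketch that encodes a single range with elm/bytes. The `Range` alias is redeclared locally so the example stands alone, and the fixed-width layout (four big-endian unsigned 32-bit integers) is only one possible choice, not a measured or recommended format.

```elm
module RangeBytesSketch exposing (Range, decoder, encoder, roundTrip)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder)
import Bytes.Encode as Encode exposing (Encoder)


type alias Location =
    { row : Int, column : Int }


-- Local copy of the range shape, to keep the sketch self-contained.
type alias Range =
    { start : Location, end : Location }


-- Four unsigned 32-bit integers: 16 bytes per range.
encoder : Range -> Encoder
encoder range =
    Encode.sequence
        [ Encode.unsignedInt32 BE range.start.row
        , Encode.unsignedInt32 BE range.start.column
        , Encode.unsignedInt32 BE range.end.row
        , Encode.unsignedInt32 BE range.end.column
        ]


decoder : Decoder Range
decoder =
    Decode.map4
        (\startRow startColumn endRow endColumn ->
            { start = { row = startRow, column = startColumn }
            , end = { row = endRow, column = endColumn }
            }
        )
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)
        (Decode.unsignedInt32 BE)


-- Encode and immediately decode, to check that the pair round-trips.
roundTrip : Range -> Maybe Range
roundTrip range =
    Decode.decode decoder (Encode.encode (encoder range))
```

A JSON encoding of the same range repeats the field names for every node, which is where much of the size difference would be expected to come from; a variable-width integer encoding could shrink the binary form further.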

Can't parse declaration starting with 'as' after import

I was using elm-analyse (which is great by the way, thanks for that!), and ran into a parse failure. I think this is the appropriate place to file the issue.

Here's a small module that doesn't parse with elm-analyse:

module Test exposing (asSomething)

import Import


asSomething : Int
asSomething =
    123

If asSomething is renamed to something that doesn't start with as, it works, or if I do import Import as Import, so it seems that the import parser is greedily munching away on the as of the declaration when it expects a possible as.
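The suspected behaviour can be reproduced in isolation with elm/parser, which distinguishes between matching a bare string and matching a keyword. This is only an illustration of the general pitfall; elm-syntax's own parser may be built differently, and the names `greedy`, `strict` and `matches` are made up for this sketch.

```elm
module AsKeywordSketch exposing (greedy, matches, strict)

import Parser exposing (Parser)


-- Matches the characters "as" regardless of what follows, so it also
-- accepts the start of "asSomething".
greedy : Parser ()
greedy =
    Parser.symbol "as"


-- Matches "as" only when it is not followed by a letter, digit or
-- underscore, so "asSomething" is rejected.
strict : Parser ()
strict =
    Parser.keyword "as"


-- True when the parser succeeds at the start of the input.
matches : Parser () -> String -> Bool
matches parser input =
    case Parser.run parser input of
        Ok _ ->
            True

        Err _ ->
            False
```

Here `matches greedy "asSomething"` succeeds while `matches strict "asSomething"` fails, and both accept `"as Import"`.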

Cheers!

Rename Expression variants

I think the names for the different expression variants can be made clearer, to make it easier for users to grasp.

FYI, at the moment, people who try elm-review tell me that the hardest part is working with elm-syntax (which does make sense, since that is a big part of the elm-review API ).

Application variant

Expression has an Application variant.

To this day, I find that name confusing and un-intuitive. I think that something like FunctionCall would be more understandable (to me at least), especially since with the changes proposed in #43, it would have one argument at the very least (that used to be true before, but this becomes more obvious to the user).

OperatorApplication variant

I think this one makes sense, but I feel like Operation would be more intuitive, although it could be not clear enough 🤔

(By the way, maybe for a different issue, but does the InfixDirection argument bring any value here after post-processing?)

Operator variant

I have no clue what this represents in practice, I think this one should not exist.

Literals

There are several kinds of literals that are not consistent. I think each literal type could be suffixed with Literal:

  • CharLiteral: This one is already good 👍
  • Floatable is a bit of an odd name I think. FloatLiteral is better IMO
  • ListExpr -> ListLiteral
  • For consistency, Integer -> IntLiteral, Hex -> HexLiteral
  • Lastly: Literal -> StringLiteral. It is not clear that Literal pertains to strings, so I think it is nice to make that explicit.

Other renames

I think the names for expression variants can also be re-thought to be more consistent. A lot of them are suffixed with Expr, some with Expression, others with Block and some are not suffixed, leading to an inconsistent experience.

I think we can remove Expr/Expression from most, while still making a lot of sense.

  • IfBlock could be renamed to If (or at least IfExpr, since the Elm guide calls them if expressions)
  • UnitExpr could be renamed to Unit. I don't think there is any confusion here
  • TupledExpression -> Tuple. Although this could conflict with the proposal I made in #49 when people import everything with (..). So either we discourage doing so, or this could be named TupleExpr
  • ParenthesizedExpression -> Parenthesized? Not entirely sure about this one, but I think it makes sense
  • LetExpression -> Let (I personally like LetIn, but that's not what other people call those)
  • CaseExpression -> Case (I personally like CaseOf, but that's not what other people call those)
  • LambdaExpression -> Lambda
  • RecordUpdateExpression -> RecordUpdate
  • GLSLExpression -> GLSL?

Some context

FYI, I tend to import and use elm-syntax expressions like this:

import Elm.Syntax.Expression as Expression exposing (Expression)

foo = case Node.value node of
  Expression.RecordUpdateExpression -> ...

So that it becomes easier to know where a value comes from, which I feel can become overwhelming otherwise. When doing so, having names like Expression.RecordUpdateExpression becomes a mouthful that does not add any additional information.

I wouldn't be against suffixing some variants with Expr/Expression as they are now, but I am not sure that would bring a lot of value.

Operator table for built in operators

Since the set of infix operators is static in Elm 0.19, would it make sense to just always use a predefined operator table containing these (plus maybe the ones from Parser and List.::)?

I was working on creating some elm-review rules, but couldn't quite get them to work because infix expressions weren't parsed taking operator precedence into account.
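Such a table could be sketched as a plain `Dict`. The entries below are the precedences and associativities as declared in elm/core's `Basics` and `List` modules to the best of my knowledge; they should be checked against the elm/core sources before being relied upon. Operators from other packages (such as `Parser`) are left out, and the names `table` and `bindsTighter` are hypothetical.

```elm
module OperatorTableSketch exposing (Associativity(..), bindsTighter, table)

import Dict exposing (Dict)


type Associativity
    = Left
    | Right
    | Non


-- Operator -> ( precedence, associativity ). Higher binds tighter.
table : Dict String ( Int, Associativity )
table =
    Dict.fromList
        [ ( "<|", ( 0, Right ) )
        , ( "|>", ( 0, Left ) )
        , ( "||", ( 2, Right ) )
        , ( "&&", ( 3, Right ) )
        , ( "==", ( 4, Non ) )
        , ( "/=", ( 4, Non ) )
        , ( "<", ( 4, Non ) )
        , ( ">", ( 4, Non ) )
        , ( "<=", ( 4, Non ) )
        , ( ">=", ( 4, Non ) )
        , ( "++", ( 5, Right ) )
        , ( "::", ( 5, Right ) )
        , ( "+", ( 6, Left ) )
        , ( "-", ( 6, Left ) )
        , ( "*", ( 7, Left ) )
        , ( "/", ( 7, Left ) )
        , ( "//", ( 7, Left ) )
        , ( "^", ( 8, Right ) )
        , ( "<<", ( 9, Left ) )
        , ( ">>", ( 9, Right ) )
        ]


-- Just True when the first operator has strictly higher precedence than
-- the second; Nothing when either operator is not in the table.
bindsTighter : String -> String -> Maybe Bool
bindsTighter a b =
    Maybe.map2 (\( precA, _ ) ( precB, _ ) -> precA > precB)
        (Dict.get a table)
        (Dict.get b table)
```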

Order comments by position

In elm-review, I currently sort comments by their start position.
I seem to remember that the comments were in a somewhat random order, but I can't reproduce anything else than just the reverse order.

Currently we have:

module TestModule exposing (..)

-- 1
a = ""
-- 2

which gives the following comment list:

[ Node <range> "-- 2"
, Node <range> "-- 1"
]

and I think that having it in source order (the reverse of the current order) would be more intuitive and useful:

[ Node <range> "-- 1"
, Node <range> "-- 2"
]

Side-note: I seem to understand that the way documentation is added in post-processing is by going through all comments and looking at ranges.
I think that having them sorted can lead to optimizations, like not going further down the list if the start range of the item to look for is before the current item's start range (in other words, using a recursive function rather than List.filter + List.head). I am not good at benchmarking, but that does sound faster to me than the current way, and sorting comments would help with that.
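A sketch of both ideas, using locally declared `Node` and `Range` types so that it stands alone. The rule used to associate a comment with a declaration (the comment ends on the row directly above) is an assumption made for this example, not how elm-syntax's post-processing is known to work.

```elm
module CommentOrderSketch exposing (Location, Node(..), Range, commentJustAbove, sortByStart)


type alias Location =
    { row : Int, column : Int }


type alias Range =
    { start : Location, end : Location }


-- Local stand-in for a value paired with its range.
type Node a
    = Node Range a


-- Sort nodes by start position: row first, then column.
sortByStart : List (Node a) -> List (Node a)
sortByStart nodes =
    List.sortBy (\(Node range _) -> ( range.start.row, range.start.column )) nodes


-- Find the comment ending on the row just above a declaration.
-- Because the list is sorted, the search stops as soon as a comment
-- starts at or after the declaration's row.
commentJustAbove : Int -> List (Node String) -> Maybe (Node String)
commentJustAbove declarationRow sortedComments =
    case sortedComments of
        [] ->
            Nothing

        ((Node range _) as comment) :: rest ->
            if range.start.row >= declarationRow then
                Nothing

            else if range.end.row + 1 == declarationRow then
                Just comment

            else
                commentJustAbove declarationRow rest
```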

Breaking changes in JSON encoders and decoders

This is a continuation of #55 (comment)

I personally think, if we are going to make a breaking change to the JSON data, we should swap out the JSON encoders/decoders we have now with https://package.elm-lang.org/packages/miniBill/elm-codec/latest/ . This will cut the amount of code we need to write in half and greatly reduce the risk of writing encoders/decoders that fail to round trip the data.

The one disadvantage I see with this approach is that there might be some performance concerns. I've created an issue for that here miniBill/elm-codec#4 to explore if this is an actual problem or if in practice elm-codec is just as fast as normal encoders/decoders.

Use a variation of Pattern for function arguments

Currently, the arguments used in a function declaration are List Pattern. The problem is that there are many variants of Pattern that aren't valid when deconstructing an argument.

For example:

-- Here we have the IntPattern variant. This isn't valid syntax.
myFunction 2 = 
    ...

As far as I can tell, only these pattern variants are valid:

AllPattern
UnitPattern
TuplePattern
RecordPattern
VarPattern
NamedPattern
AsPattern
ParenthesizedPattern

I think it makes sense then to have a new custom type (ArgumentPattern perhaps) that contains only these variants.

This would be used in Elm.Syntax.Declaration.Destructuring and Elm.Syntax.Expression.FunctionImplementation.
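A possible shape for such a type is sketched below. It is simplified (no ranges or `Node` wrappers, and a plain `String` for the constructor name), so the field types are assumptions rather than a proposal for the exact API. The `boundNames` helper is hypothetical and only there to show that consumers no longer need branches for literal patterns.

```elm
module ArgumentPatternSketch exposing (ArgumentPattern(..), boundNames)

-- Hypothetical restricted pattern type containing only the variants
-- listed above.


type ArgumentPattern
    = AllPattern
    | UnitPattern
    | TuplePattern (List ArgumentPattern)
    | RecordPattern (List String)
    | VarPattern String
    | NamedPattern String (List ArgumentPattern)
    | AsPattern ArgumentPattern String
    | ParenthesizedPattern ArgumentPattern


-- Collect the variable names introduced by an argument pattern.
boundNames : ArgumentPattern -> List String
boundNames pattern =
    case pattern of
        AllPattern ->
            []

        UnitPattern ->
            []

        TuplePattern parts ->
            List.concatMap boundNames parts

        RecordPattern fields ->
            fields

        VarPattern name ->
            [ name ]

        NamedPattern _ arguments ->
            List.concatMap boundNames arguments

        AsPattern inner asName ->
            asName :: boundNames inner

        ParenthesizedPattern inner ->
            boundNames inner
```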

Range miscalculation when using If expressions

module TestModule exposing (..)

a =
  if cond then
    1
  else
    2
  
-- hello

{-| doc
-}
b = 3

In the code above, the range for a ends where b starts, meaning it will include the comment in between and the documentation for b.

If the if expression is removed, like here:

a =
  2
  
-- hello

{-| doc
-}
b = 3

then a's range seems to be correct.

This looks very similar to #63 which had the same problem but for let expressions.

Problem with non-ascii characters in variable names

I encountered a problem with elm-syntax while using elm-analyse on the elm-visualization package.

@jfmengels pointed out that the problem is caused by the variable names ε2 and ρ.

Original issue opened here: stil4m/elm-analyse#242

Elm-analyse returns a FileLoadError (Could not load file due to: Unexpected parse error) for the module Zoom.Interpolation of elm-visualization.

https://github.com/gampleman/elm-visualization/blob/2.1.1/src/Zoom/Interpolation.elm

Remove FloatPattern from Elm.Syntax.Pattern.Pattern

Since Elm 0.19, pattern matching on float values is an error.

I cannot pattern match with floating point numbers:

42|     42.0 -> ()
        ^^^^
Equality on floats can be unreliable, so you usually want to check that they are
nearby with some sort of (abs (actual - expected) < 0.001) check.

I think it therefore makes sense to remove this pattern from the available pattern possibilities.

Use nonempty list to make impossible syntax impossible

There are a few places in elm-syntax where using List.Nonempty* instead of List could prevent needing to deal with impossible cases.

The ones I've been able to find are

  • Elm.Syntax.Expression.RecordUpdateExpression - Having no record setters would correspond to { a | } which is invalid syntax.
  • Elm.Syntax.Expression.Application - Not sure how to give an example of this one. If the list is empty then there's just nothing there right?
  • Elm.Syntax.ModuleName.ModuleName - Here I can understand why it's like this. Elm.Syntax.Expression.FunctionOrValue can have an empty ModuleName because functions can be local or exposed. However, in import statements, having a ModuleName that's just an empty list would look like this: import exposing (..), which is invalid syntax. One solution is to have two versions of ModuleName, one a list and one a nonempty list. Another solution is for FunctionOrValue to treat the last element in the nonempty list as the function/value name.
  • Elm.Syntax.Exposing.Explicit - Empty list results in exposing () which is a syntax error.

*By List.Nonempty I mean this package but an alternative could be to just have (a, List a).
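The tuple-based alternative needs very little code. The sketch below is a minimal version with hypothetical names; a dedicated package would offer a richer API.

```elm
module NonemptySketch exposing (Nonempty, fromList, head, toList)

-- A list that is guaranteed to have at least one element.


type alias Nonempty a =
    ( a, List a )


-- Fails only for the empty list, so the impossible case is handled
-- once, at the boundary.
fromList : List a -> Maybe (Nonempty a)
fromList list =
    case list of
        [] ->
            Nothing

        first :: rest ->
            Just ( first, rest )


toList : Nonempty a -> List a
toList ( first, rest ) =
    first :: rest


-- Unlike List.head, no Maybe is needed.
head : Nonempty a -> a
head ( first, _ ) =
    first
```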

Range miscalculation when using Let expressions

module TestModule exposing (..)

a =
  let
    c = 1
  in
  2
  
-- hello

b = 3

In the code above, the range for a is considered to be

{ start = { row = 3, column = 1 }
, end = { row = 11, column = 1 }
}

This is incorrect, as that would mean that a's range ends where b starts. Also, that means that removing the a node means deleting the comment in between a and b (which is how I noticed the problem).

If the let expression is removed, like here:

a =
  2
  
-- hello

b = 3

then a's range is correct and equal to

{ start = { row = 3, column = 1 }
, end = { row = 4, column = 4 }
}

Interactive Ellie with it at: https://ellie-app.com/99hdDGwS24La1

Elm.Syntax.Pattern.patternRange

This might be worth including in the package, similarly to Elm.Syntax.Exposing.topLevelExposeRange.

patternRange : Elm.Syntax.Pattern.Pattern -> Range
patternRange pattern =
    case pattern of
        Elm.Syntax.Pattern.AllPattern range ->
            range

        Elm.Syntax.Pattern.UnitPattern range ->
            range

        Elm.Syntax.Pattern.CharPattern _ range ->
            range

        Elm.Syntax.Pattern.StringPattern _ range ->
            range

        Elm.Syntax.Pattern.IntPattern _ range ->
            range

        Elm.Syntax.Pattern.FloatPattern _ range ->
            range

        Elm.Syntax.Pattern.TuplePattern _ range ->
            range

        Elm.Syntax.Pattern.RecordPattern _ range ->
            range

        Elm.Syntax.Pattern.UnConsPattern _ _ range ->
            range

        Elm.Syntax.Pattern.ListPattern _ range ->
            range

        Elm.Syntax.Pattern.VarPattern _ range ->
            range

        Elm.Syntax.Pattern.NamedPattern _ _ range ->
            range

        Elm.Syntax.Pattern.QualifiedNamePattern _ range ->
            range

        Elm.Syntax.Pattern.AsPattern _ _ range ->
            range

        Elm.Syntax.Pattern.ParenthesizedPattern _ range ->
            range

Simplification suggestion: Field `open` for Exposing.TypeExpose is always Just

When we parse the following code

import A exposing (B, C(..))

we'll get an Exposing.TypeOrAliasExpose "B" and an Exposing.TypeExpose { name = "C", open = Just <range> }.
It is however not possible to have an Exposing.TypeExpose { name = "C", open = Nothing }, and I therefore think the AST should be simplified.

I propose either

  1. Removing TypeOrAliasExpose and representing it as Exposing.TypeExpose { name = "B", open = Nothing }
  2. Changing Exposing.TypeExpose's open field to be a Range instead of a Maybe Range.

I'm leaning towards 1 personally at this moment.

Have a nice day! ❤️

Pattern matching empty records fails

This is valid elm syntax:

module Anything exposing (main)

main {} = ()

This is also valid elm syntax:

module Anything exposing (main)

main = \{} -> ()

But parsing this using elm-syntax fails. I got this while trying to use elm-analyse for one of my projects.

Testcase


import Html
import Elm.Parser


src = """module Anything exposing (main)

main {} = ()
-- also fails
main = \\{} -> ()
"""

parse : String -> String
parse input =
  case Elm.Parser.parse input of
    Err e ->
      "Failed: " ++ Debug.toString e
    Ok v ->
      "Success: " ++ Debug.toString v

main = Html.text (parse src)

Writes: Failed: [{ col = 7, problem = ExpectingVariable, row = 3 }]

Rename decode to decoder.

There are multiple functions that provide decoders for Elm syntax data, but they are all named decode. That is misleading, because they are values of type Json.Decode.Decoder. Wouldn't it be more natural to call them decoder?

Operator precedence is not taken into account when parsing expressions

It seems to me that operator precedence is not taken into account when parsing expressions.

For example the following two expressions

1 ^ 2 * 3 + 4
1 + 2 * 3 ^ 4

both produce the same shape of the expression tree, the first one being obviously incorrect

^
│
├─ 1
│
└─ *
   │
   ├─ 2
   │
   └─ +
      │
      ├─ 3
      │
      └─ 4

+
│
├─ 1
│
└─ *
   │
   ├─ 2
   │
   └─ ^
      │
      ├─ 3
      │
      └─ 4

I created SSCCE which demonstrated this issue: https://github.com/jhrcek/elm-syntax-sscce

Wider context: this library is used by elm-review, which, among other things, allows writing custom linting rules for Elm source code. Unfortunately, since expressions are not parsed correctly by this library, this leads to all sorts of false positives, as in the following example:

Screenshot from 2020-04-24 09-34-06

Because elm-syntax parses this as (f <| x) || y it gives this false positive warning, which shouldn't be there, because || has higher precedence than <|, so the correct parsing should look like f <| (x || y).
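For illustration, the expected grouping can be computed with a standard precedence-climbing pass over a flat list of operands and operators. The sketch below is not elm-syntax's implementation: the `Expr` type is a simplified stand-in, the operator table covers only a handful of elm/core operators (precedences as I understand them from `Basics`), and non-associative operators are ignored.

```elm
module PrecedenceSketch exposing (Expr(..), fix, render)


type Expr
    = Atom String
    | Binary String Expr Expr


type Assoc
    = LeftAssoc
    | RightAssoc


-- Precedence and associativity for a few operators. Treating unknown
-- operators as level 9, left-associative is an assumption.
info : String -> ( Int, Assoc )
info op =
    case op of
        "<|" ->
            ( 0, RightAssoc )

        "||" ->
            ( 2, RightAssoc )

        "+" ->
            ( 6, LeftAssoc )

        "*" ->
            ( 7, LeftAssoc )

        "^" ->
            ( 8, RightAssoc )

        _ ->
            ( 9, LeftAssoc )


-- Build a tree from a first operand and the ( operator, operand ) pairs
-- that follow it.
fix : Expr -> List ( String, Expr ) -> Expr
fix firstOperand rest =
    Tuple.first (climb 0 firstOperand rest)


climb : Int -> Expr -> List ( String, Expr ) -> ( Expr, List ( String, Expr ) )
climb minPrecedence left rest =
    case rest of
        [] ->
            ( left, [] )

        ( op, right ) :: afterRight ->
            let
                ( precedence, assoc ) =
                    info op
            in
            if precedence < minPrecedence then
                ( left, rest )

            else
                let
                    -- Right-associative operators may nest at the same level.
                    nextMin =
                        if assoc == RightAssoc then
                            precedence

                        else
                            precedence + 1

                    ( rhs, remaining ) =
                        climb nextMin right afterRight
                in
                climb minPrecedence (Binary op left rhs) remaining


-- Fully parenthesized rendering, to make the grouping visible.
render : Expr -> String
render expr =
    case expr of
        Atom name ->
            name

        Binary op left right ->
            "(" ++ render left ++ " " ++ op ++ " " ++ render right ++ ")"
```

Rendering `fix (Atom "f") [ ( "<|", Atom "x" ), ( "||", Atom "y" ) ]` gives `(f <| (x || y))`, and the operands of `1 ^ 2 * 3 + 4` group as `(((1 ^ 2) * 3) + 4)`.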

Remove Node and add Range directly to AST types

One problem I have when working with elm-syntax is that I spend a lot of time deconstructing types or fixing compiler errors caused by forgetting to use Node.value on what I thought was an Expression but was in fact a Node Expression. I spend a lot of time jumping between my code and the documentation because I often get lost in heavily nested data structures.

I think these problems could be greatly reduced if Node a was removed and instead Range was placed directly in all the AST types that use it.

As an example of how this makes things simpler, here is some code I wrote that uses elm-syntax

getTodo : ModuleContext -> Function -> List Todo
getTodo context function =
    case ( function.signature, function.declaration ) of
        ( Just (Node _ signature), Node _ declaration ) ->
            case signature.typeAnnotation of
                Node _ (TypeAnnotation.Typed (Node _ ( [], "Codec" )) [ Node _ (TypeAnnotation.Typed codecType []) ]) ->
                    case declaration.expression of
                        Node _ (Expression.Application ((Node _ (Expression.FunctionOrValue [ "Debug" ] "todo")) :: _)) ->
                            case ModuleNameLookupTable.moduleNameFor context.lookupTable codecType of
                                Just actualCodecType ->
                                    ...

                                Nothing ->
                                    []

                        _ ->
                            []

                _ ->
                    []

        _ ->
            []

and here is the same code but I've removed Node and placed extra _ where a range value would be instead

getTodo2 : ModuleContext -> Function -> List Todo
getTodo2 context function =
    case ( function.signature, function.declaration ) of
        ( Just signature, declaration ) ->
            case signature.typeAnnotation of
                (TypeAnnotation.Typed _ (_, [], "Codec" ) [  (TypeAnnotation.Typed _ codecType []) ]) ->
                    case declaration.expression of
                        Expression.Application _ ((Expression.FunctionOrValue _ [ "Debug" ] "todo") :: _) ->
                            case ModuleNameLookupTable.moduleNameFor context.lookupTable codecType of
                                Just actualCodecType ->
                                    ...

                                Nothing ->
                                    []

                        _ ->
                            []

                _ ->
                    []

        _ ->
            []

Failure to parse type annotation with no space between type and parenthesis

Working:

module Example exposing (example)

example : List (List String)
example =
    [ [ "example" ] ]

Failing:

module Example exposing (example)

example : List(List String)
example =
    [ [ "example" ] ]

Yields: Could not continue parsing on location (2,-1)

Elm-make will successfully build files with such type signatures.

`+1` is considered valid syntax

As shown (roughly) here, +1 is parsed and considered as valid syntax, even though Elm rejects it with a syntax error.

It seems to be parsed as OperatorApplication "+" Right (Application []) (Integer 1). The Application is obviously invalid, and I am not sure how this will turn out in v8, where it has become a non-empty list. Also, its range is the emptyRange (0,0,0,0).

++1 also seems to compile even though it shouldn't.

Incorrect parsing of the contents of triple-quoted strings

When encountering double backslashes (\\) in triple quotes, the order of characters is wrong.

module A exposing (..)
_ = """\\{"""
_ = """\\{\\}"""
_ = """\\}\\{"""
_ = """\\[\\]"""
_ = """\\(\\)"""
_ = """\\a\\b"""
_ = """\\a-blablabla-\\b"""

For instance \\{\\} becomes \\}\\{

To play with those interactively, I made an Ellie. It's in Elm code, so there is a lot of escaping unfortunately https://ellie-app.com/bbmqddVmJzka1 (see line ~65)

Add the information to String literal of how it was created

It would be nice to know whether a String literal was created using "x" or using """x""".

I don't think we have the information at this point, and if we want to pin-point a part of the string based on the range from the expression node, then we need to account for the """.

Possible ways:

  1. Add the information as an additional argument:
type Expression
  = ...
  | Literal StringLiteralType String

-- Let's think of a better name for this one
type StringLiteralType
  = SingleQuote
  | TripleQuote

Then, if we want to select a sub-string, we need to offset the string by 1 for SingleQuote and by 3 for TripleQuote.
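
The offset described above could be sketched roughly as follows. This assumes the expression's range includes the surrounding quotes. `Location` and `Range` are local aliases that mirror the shape of the ranges in elm-syntax so that the sketch stays dependency-free, and `quoteLength` and `contentRange` are hypothetical helpers, not part of the package.

```elm
module StringLiteralRange exposing
    ( Location
    , Range
    , StringLiteralType(..)
    , contentRange
    , quoteLength
    )

-- Local stand-ins with the same shape as elm-syntax's range records.


type alias Location =
    { row : Int, column : Int }


type alias Range =
    { start : Location, end : Location }


type StringLiteralType
    = SingleQuote
    | TripleQuote


-- Number of quote characters on each side of the literal.
quoteLength : StringLiteralType -> Int
quoteLength literalType =
    case literalType of
        SingleQuote ->
            1

        TripleQuote ->
            3


-- Shrink a range that includes the quotes so it covers only the contents.
-- Only columns change: the opening quotes sit on the first row and the
-- closing quotes on the last row, even for a multi-line literal.
contentRange : StringLiteralType -> Range -> Range
contentRange literalType range =
    let
        offset =
            quoteLength literalType
    in
    { start = { row = range.start.row, column = range.start.column + offset }
    , end = { row = range.end.row, column = range.end.column - offset }
    }
```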

  2. Add a new variant to Expression:
type Expression
  = ...
  | Literal String
  | MultilineStringLiteral (List (Node String))

I am thinking it could be interesting to have a List of lines, each with their own range. That would make a lot of things quite easier. If you do need the concatenated string, then that is relatively easy to do.

I know I have struggled with finding sub-strings in multiline comments (maybe that could also be split into different types like this?), and I hope that this can make things a bit easier (especially if we have the range for each line).

I think this option is better, but the thing that makes it less nice is that if you are looking at strings, you need to handle both cases.
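
To illustrate the second option, here is a small sketch of how a consumer might work with a list of line nodes. It assumes each `Node String` holds one line without its trailing newline, which the proposal does not specify. `joinLines` and `findLine` are hypothetical helpers.

```elm
module MultilineString exposing (findLine, joinLines)

import Elm.Syntax.Node as Node exposing (Node)
import Elm.Syntax.Range exposing (Range)


-- Rebuild the full string when the per-line structure is not needed.
joinLines : List (Node String) -> String
joinLines lines =
    lines
        |> List.map Node.value
        |> String.join "\n"


-- Range of the first line containing the given text, if any.
findLine : String -> List (Node String) -> Maybe Range
findLine needle lines =
    lines
        |> List.filter (\line -> String.contains needle (Node.value line))
        |> List.head
        |> Maybe.map Node.range
```

With per-line ranges available, locating text inside a multi-line literal no longer needs manual row counting, which is the convenience the proposal is after.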

Elm.Writer does not write any comments

When parsing the following program and then writing it again, all comments are lost.
SSCCE:

module Test exposing (main)

import Elm.Parser
import Elm.Processing
import Elm.Syntax.File as File
import Elm.Writer as Writer
import Html


src =
    """
module Foo exposing (..)

-- foo
foo = 1

-- bar
greet =
  let a = "Hello"
  in a ++ "Peter"
"""


parse : String -> File.File
parse input =
    case Elm.Parser.parse input of
        Err e ->
            Debug.todo <| "Oh no: " ++ Debug.toString e

        Ok rawFile ->
            Elm.Processing.process Elm.Processing.init rawFile


write : File.File -> String
write file =
    Writer.write <|
        Writer.writeFile file


main =
    Html.pre [] [ Html.text <| write <| parse src ]

Elm.Writer generates invalid case expressions

The following is code generated with Elm.Writer

codec : Codec MyType
codec =
    Codec.customType (\a b c value -> case value of
  VariantA ->
    a
  VariantB ->
    b
  VariantC ->
    c)

This is not valid Elm code for two reasons:

  1. The case expression patterns aren't indented past the start of the case keyword
  2. The closing parenthesis for the lambda expression is not on its own line
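
For comparison, a hand-formatted layout that avoids both problems might look like the sketch below. `MyType` and `pick` are hypothetical stand-ins so the example is self-contained. In the original, the lambda is passed to `Codec.customType` instead.

```elm
module CaseLayout exposing (MyType(..), pick)


type MyType
    = VariantA
    | VariantB
    | VariantC


-- The branches are indented past the `case` keyword, and the closing
-- parenthesis of the lambda sits on its own line.
pick : a -> a -> a -> MyType -> a
pick =
    (\a b c value ->
        case value of
            VariantA ->
                a

            VariantB ->
                b

            VariantC ->
                c
    )
```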

GLSL parsing in elm-syntax

Parsing GLSL blocks would be nice to have (read: high effort, low reward). Might have some uses in combination with elm-review but maybe not. I'm creating this issue so the idea isn't forgotten at least.
