
cstparser.jl's Introduction

CSTParser

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

A parser for Julia using Tokenize that aims to extend the built-in parser by providing additional meta information along with the resultant AST.

Installation and Usage

using Pkg
Pkg.add("CSTParser")
using CSTParser
CSTParser.parse("x = y + 123")

Documentation: Dev

Structure

CSTParser.EXPR nodes are broadly equivalent to Base.Expr in structure. The key differences are additional fields that store, for each expression:

  • trivia tokens such as punctuation or keywords that are not stored as part of the AST but are needed for the CST representation;
  • the span measurements for an expression;
  • the textual representation of the token (only needed for certain tokens including identifiers (symbols), operators and literals);
  • the parent expression, if present; and
  • any other meta information (this field is untyped and is used within CSTParser to hold errors).

All .head values used in Expr are used in EXPR. Unlike in the AST, tokens (terminal expressions with no child expressions) are stored as EXPR, and additional head values distinguish between the different types of token. The possible token head values include:

:IDENTIFIER
:NONSTDIDENTIFIER (e.g. var"id")
:OPERATOR

# Punctuation
:COMMA
:LPAREN
:RPAREN
:LSQUARE
:RSQUARE
:LBRACE
:RBRACE
:ATSIGN
:DOT

# Keywords
:ABSTRACT
:BAREMODULE
:BEGIN
:BREAK
:CATCH
:CONST
:CONTINUE
:DO
:ELSE
:ELSEIF
:END
:EXPORT
:FINALLY
:FOR
:FUNCTION
:GLOBAL
:IF
:IMPORT
:LET
:LOCAL
:MACRO
:MODULE
:MUTABLE
:NEW
:OUTER
:PRIMITIVE
:QUOTE
:RETURN
:STRUCT
:TRY
:TYPE
:USING
:WHILE

# Literals
:INTEGER
:BININT (0b0)
:HEXINT (0x0)
:OCTINT (0o0)
:FLOAT
:STRING
:TRIPLESTRING
:CHAR
:CMD
:TRIPLECMD
:NOTHING 
:TRUE
:FALSE

The ordering of .args members matches that in Base.Expr and members of .trivia are stored in the order in which they appear in text.
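
The fields described above can be inspected directly. The following is a minimal sketch, assuming the field names of recent CSTParser releases (head, args, trivia, fullspan, span, val, parent, meta); show_tree is a hypothetical helper written for this example and is not part of the package.

```julia
using CSTParser

x = CSTParser.parse("x = y + 123")

# Hypothetical helper: print each node's head, spans and token text.
# Assumes `head` is either a Symbol or (for operator calls) another EXPR.
function show_tree(x::CSTParser.EXPR, depth = 0)
    head = x.head isa CSTParser.EXPR ? x.head.val : x.head
    println("  "^depth, head,
            "  span=", x.span, " fullspan=", x.fullspan,
            x.val === nothing ? "" : "  val=$(repr(x.val))")
    x.args === nothing && return   # tokens have no children
    for a in x.args
        show_tree(a, depth + 1)
    end
end

show_tree(x)
```

Only .args is walked here, so trivia such as punctuation and keywords are not printed; a full CST walk would also need to visit .trivia.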

cstparser.jl's People

Contributors

aminya, aviatesk, benph, c42f, davidanthoff, femtocleaner[bot], fonsp, github-actions[bot], jeffbezanson, keno, kristofferc, musm, pangoraw, pfitzseb, simeonschaub, staticfloat, vtjnash, yingboma, zacln


cstparser.jl's Issues

Incorrectly parsed anonymous function call

julia> dump(Expr(CSTParser.parse("function(args::Vararg{Any,N}) where N end")))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol where
      args: Array{Any}((2,))
        1: Expr
          head: Symbol ::
          args: Array{Any}((2,))
            1: Symbol args
            2: Expr
              head: Symbol curly
              args: Array{Any}((3,))
                1: Symbol Vararg
                2: Symbol Any
                3: Symbol N
              typ: Any
          typ: Any
        2: Symbol N
      typ: Any
    2: Expr
      head: Symbol block
      args: Array{Any}((0,))
      typ: Any
  typ: Any

julia> dump(Meta.parse("function(args::Vararg{Any,N}) where N end"))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol where
      args: Array{Any}((2,))
        1: Expr
          head: Symbol tuple
          args: Array{Any}((1,))
            1: Expr
              head: Symbol ::
              args: Array{Any}((2,))
                1: Symbol args
                2: Expr
              typ: Any
          typ: Any
        2: Symbol N
      typ: Any
    2: Expr
      head: Symbol block
      args: Array{Any}((1,))
        1: Expr
          head: Symbol line
          args: Array{Any}((2,))
            1: Int64 1
            2: Symbol none
          typ: Any
      typ: Any
  typ: Any

julia> CSTParser.parse("function(args::Vararg{Any,N}) where N end")
CSTParser.FunctionDef  41 (1:41)
├─ CSTParser.KEYWORD(FUNCTION, 8, 1:8)
├─ WhereOpCall  30 (1:29)
│  ├─ CSTParser.InvisBrackets  22 (1:21)
│  │  ├─ CSTParser.PUNCTUATION(LPAREN, 1, 1:1)
│  │  ├─ BinarySyntaxOpCall  19 (1:19)
│  │  │  ├─ ID: args  4 (1:4)
│  │  │  ├─ OP: DECLARATION 2 (1:2)
│  │  │  └─ CSTParser.Curly  13 (1:13)
│  │  │     ├─ ID: Vararg  6 (1:6)
│  │  │     ├─ CSTParser.PUNCTUATION(LBRACE, 1, 1:1)
│  │  │     ├─ ID: Any  3 (1:3)
│  │  │     ├─ CSTParser.PUNCTUATION(COMMA, 1, 1:1)
│  │  │     ├─ ID: N  1 (1:1)
│  │  │     └─ CSTParser.PUNCTUATION(RBRACE, 1, 1:1)
│  │  └─ CSTParser.PUNCTUATION(RPAREN, 2, 1:1)
│  ├─ OP: WHERE 6 (1:5)
│  └─ ID: N  2 (1:1)
├─ CSTParser.Block  0 (1:0)
└─ CSTParser.KEYWORD(END, 3, 1:3)

Note the missing Expr(:tuple). I believe the CSTParser.InvisBrackets should probably be a TupleH.

Lexical order of nodes

Consider

julia> str
"[i for i in x if y==2]"

julia> CSTParser.parse(str)
CSTParser.Comprehension  22 (1:22)
├─ CSTParser.PUNCTUATION(LSQUARE, 1, 1:1)
├─ CSTParser.Generator  20 (1:12)
│  ├─ ID: i  2 (1:1)
│  ├─ CSTParser.KEYWORD(FOR, 4, 1:3)
│  └─ CSTParser.Filter  14 (1:6)
│     ├─ BinaryOpCall  4 (1:4)
│     │  ├─ ID: y  1 (1:1)
│     │  ├─ OP: EQEQ 2 (1:2)
│     │  └─ LITERAL: 2  1 (1:1)
│     ├─ CSTParser.KEYWORD(IF, 3, 1:2)
│     └─ BinaryOpCall  7 (1:6)
│        ├─ ID: i  2 (1:1)
│        ├─ OP: IN 3 (1:2)
│        └─ ID: x  2 (1:1)
└─ CSTParser.PUNCTUATION(RSQUARE, 1, 1:1)

Here, the y == 2 node is before the i in x node in args. If we try to accumulate spans by iterating over args we will get confused.
I think there either needs to be a guarantee that nodes come in lexical order or an iterator that allows one to iterate in lexical order.
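
To make the request concrete, here is a toy sketch of an iterator that visits children in lexical order. It does not use CSTParser's types: Node, lexorder, lexical and offsets! are all hypothetical names, and the spans are copied from the Filter node in the dump above.

```julia
# Toy model (hypothetical, not CSTParser's types): children are stored in
# AST order, and `lexorder` lists their indices in source order.
struct Node
    name::String
    fullspan::Int
    children::Vector{Node}
    lexorder::Vector{Int}
end
Node(name, fullspan) = Node(name, fullspan, Node[], Int[])

# Iterate over the children in the order they appear in the text.
lexical(n::Node) = (n.children[i] for i in n.lexorder)

# Record the starting offset of every leaf by accumulating fullspans.
function offsets!(out, n::Node, offset = 0)
    if isempty(n.children)
        push!(out, n.name => offset)
        return offset + n.fullspan
    end
    for c in lexical(n)
        offset = offsets!(out, c, offset)
    end
    return offset
end

# Filter node from the dump: stored as (condition, `if`, range),
# but written in the source as range, `if`, condition.
filter_node = Node("Filter", 14,
                   [Node("y==2", 4), Node("if", 3), Node("i in x", 7)],
                   [3, 2, 1])

out = Pair{String,Int}[]
total = offsets!(out, filter_node)
```

Walking children in stored order would instead place y==2 at offset 0, which is the confusion described above.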

v0.7/1.0 compatibility

  • #22089: suffixed operators (Tokenize)
  • #24404: add operator (Tokenize)
  • introduce specific literal types for bin/hex/oct (Tokenize)
  • #16356: disallow juxtaposition of bin/hex/octs
  • #24153: make <| right associative
  • #8470: replace cell1d and cell2d expr heads w/ braces and bracescat respectively
  • #21774: replace if head w/ elseif in elseif blocks
  • #21774: collect let assignments as in for blocks
  • #21774: add :do head
  • #23157: rename datatype heads
  • #25391: add higher precedence for =>
  • #20575: disallow string literals juxtaposed (trailed) by a symbol
  • #18650: change handling of macro calls to arguments of generators/comprehensions
  • #16937: parse error on repeated argument names (implement?)
  • #22868: disallow @ macroname args (space)
  • #22868: disallow juxtaposition of symbol w/ macrocall (was already disallowed)
  • use GenericIOBuffer
  • use @nospecialize

Random Roslyn link

This sounded interesting, I have no idea whether it really applies here, but thought I'd post it :)

Incorrect parsing of `-0x10`

julia> Expr(CSTParser.parse("-0x10"))
:($(Expr(:call)))

julia> Base.parse("-0x10")
:(-0x10)

julia> dump(ans)
Expr
  head: Symbol call
  args: Array{Any}((2,))
    1: Symbol -
    2: UInt8 16
  typ: Any

I'll look into fixing this.
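
For reference, the expected expression can be checked against the built-in parser on current Julia, where the string method of Base.parse used above is spelled Meta.parse. This is a small sketch of the target result only and does not exercise CSTParser.

```julia
# Reference result from the built-in parser (Julia 0.7 and later).
ex = Meta.parse("-0x10")

# The unsigned hex literal stays an argument of a unary minus call.
expected = Expr(:call, :-, 0x10)

println(ex == expected)
```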

empty triple-string parsing fails

@Keno, following the fixes to triple string quoting

using CSTParser
str = "\"\"\"\"\"\""
CSTParser.parse(str)

gives

ERROR: BoundsError: attempt to access ""
  at index [0]
Stacktrace:
 [1] next at ./strings/string.jl:197 [inlined]
 [2] getindex(::String, ::Int64) at ./strings/basic.jl:32
 [3] (::CSTParser.#adjust_lcp#20)(::CSTParser.EXPR{CSTParser.LITERAL{STRING::Tokenize.Tokens.Kind= 78}}, ::Bool) at /home/zac/.julia/v0.6/CSTParser/src/components/strings.jl:41
 [4] parse_string_or_cmd(::CSTParser.ParseState, ::Bool) at /home/zac/.julia/v0.6/CSTParser/src/components/strings.jl:89
 [5] CSTParser.LITERAL(::CSTParser.ParseState) at /home/zac/.julia/v0.6/CSTParser/src/spec.jl:27
 [6] INSTANCE(::CSTParser.ParseState) at /home/zac/.julia/v0.6/CSTParser/src/spec.jl:49
 [7] parse_doc(::CSTParser.ParseState) at /home/zac/.julia/v0.6/CSTParser/src/CSTParser.jl:273
 [8] parse(::CSTParser.ParseState, ::Bool) at /home/zac/.julia/v0.6/CSTParser/src/CSTParser.jl:330
 [9] parse(::String, ::Bool) at /home/zac/.julia/v0.6/CSTParser/src/CSTParser.jl:261
 [10] parse(::String) at /home/zac/.julia/v0.6/CSTParser/src/CSTParser.jl:260

Rename?

I feel like Parser.jl is too generic.
Parser for what?

I suggest instead

  • CstParser.jl
  • JuliaCstParser.jl

Possible error parsing @doc macrocall

There seems to be a difference in parsing:

julia> str = raw"""
       @doc \"""
          foo()
       \"""
       foo() = bar()"""

julia> CSTParser.parse(str)
MacroCall  35(35)
 MacroName  5(4)
  PUNC: AT_SIGN  1(1)
  doc  4(3)
 TRIPLE_STRING:    foo()
  17(16)
 BinaryOpCall  13(13) new scope foo
  Call  6(5)
   foo  3(3)
   (
   )
  OP: EQ  2(1)
  Block  5(5)
   Call  5(5)
    bar  3(3)
    (
    )

and

julia> str = """
       module MyModule

       import Markdown: @doc_str

       @doc doc\"""
           foo()
       \"""
       foo() = bar()

       end # module"""

julia> CSTParser.parse(str)
ModuleH  97(88) new scope MyModule
 MODULE  7(6)
 MyModule  10(8)
 Block  68(66)
  Import  27(25)
   IMPORT  7(6)
   Markdown  8(8)
   OP: COLON  2(1)
   MacroName  10(8)
    PUNC: AT_SIGN  1(1)
    doc_str  9(7)
  MacroCall  26(25)
   MacroName  5(4)
    PUNC: AT_SIGN  1(1)
    doc  4(3)
   x_Str  21(20)
    doc  3(3)
    TRIPLE_STRING: foo()
  18(17)
  BinaryOpCall  15(13) new scope foo
   Call  6(5)
    foo  3(3)
    (
    )
   OP: EQ  2(1)
   Block  7(5)
    Call  7(5)
     bar  3(3)
     (
     )
 END  12(3)

Specifically, in the first case the MacroCall has the BinaryOpCall as its third argument; in the second case it does not.

triple quoted string indent error

The de-indenting of the following (when in a file or as a string) is incorrect:

begin
        @info """
            METADATA $is out-of-date — you may not have the latest version of $pkg
            Use `Pkg.update()` to get the latest versions of your packages
            """
end

@Keno

short-form anon func not parsed as function definition

and also doesn't open a new scope:

julia> CSTParser.parse("function (a,b); a+b; end").scope
Root scope:
[]
[]


julia> CSTParser.parse("(a,b) -> a+b ").scope

Is that intended? I had expected short form anonymous functions to be parsed as FunctionDefs as well.

Bug in parse_comma_sep

I thought I had filed this before, but can't find it right now:

SERVER ERROR: BoundsError: attempt to access 0-element Array{CSTParser.EXPR,1} at index [0]
Stacktrace:
 [1] pop!(::CSTParser.EXPR{CSTParser.TupleH}) at /opt/pkgs/v0.6/CSTParser/src/spec.jl:47
 [2] parse_comma_sep(::CSTParser.ParseState, ::CSTParser.EXPR{CSTParser.TupleH}, ::Bool, ::Bool) at /opt/pkgs/v0.6/CSTParser/src/components/functions.jl:149
 [3] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:49 [inlined]
 [4] parse_paren(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:191
 [5] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:152 [inlined]
 [6] parse_expression(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:57
 [7] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:80 [inlined]
 [8] parse_unary(::CSTParser.ParseState, ::CSTParser.EXPR{CSTParser.OPERATOR{8,COLON::Tokenize.Tokens.Kind = 540,false}}) at /opt/pkgs/v0.6/CSTParser/src/components/operators.jl:206
 [9] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:152 [inlined]
 [10] parse_expression(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:69
 [11] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:80 [inlined]
 [12] parse_operator(::CSTParser.ParseState, ::CSTParser.EXPR{CSTParser.TupleH}, ::CSTParser.EXPR{CSTParser.OPERATOR{1,EQ::Tokenize.Tokens.Kind = 98,false}}) at /opt/pkgs/v0.6/CSTParser/src/components/operators.jl:214
 [13] parse_compound(::CSTParser.ParseState, ::CSTParser.EXPR{CSTParser.TupleH}) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:132
 [14] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:152 [inlined]
 [15] parse_expression(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:85
 [16] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:49 [inlined]
 [17] parse_block(::CSTParser.ParseState, ::CSTParser.EXPR{CSTParser.Block}, ::Array{Tokenize.Tokens.Kind,1}, ::Bool) at /opt/pkgs/v0.6/CSTParser/src/components/genericblocks.jl:34
 [18] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:124 [inlined]
 [19] parse_kw(::CSTParser.ParseState, ::Type{Val{FUNCTION::Tokenize.Tokens.Kind = 25}}) at /opt/pkgs/v0.6/CSTParser/src/components/functions.jl:34
 [20] macro expansion at /opt/pkgs/v0.6/CSTParser/src/utils.jl:152 [inlined]
 [21] parse_expression(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:55
 [22] parse_doc(::CSTParser.ParseState) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:242
 [23] parse(::CSTParser.ParseState, ::Bool) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:263
 [24] parse(::String, ::Bool) at /opt/pkgs/v0.6/CSTParser/src/CSTParser.jl:213

I think this happens for cases like (;abc), i.e. where there's no other expressions before the semicolon.

"TypeError: non-boolean (Tokenize.Tokens.Token) used in boolean context" from running Linter

ERROR: LoadError: TypeError: non-boolean (Tokenize.Tokens.Token) used in boolean context
Stacktrace:
 [1] lex_ws_comment(::Tokenize.Lexers.Lexer{Base.AbstractIOBuffer{Array{UInt8,1}}}, ::Char) at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/lexer.jl:158
 [2] next(::CSTParser.ParseState) at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/lexer.jl:122
 [3] parse_kw(::CSTParser.ParseState, ::Type{Val{BEGIN::Tokenize.Tokens.Kind = 12}}) at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/components/genericblocks.jl:11
 [4] macro expansion at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/utils.jl:182 [inlined]
 [5] parse_expression(::CSTParser.ParseState) at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/CSTParser.jl:63
 [6] macro expansion at /home/kristoffer/.vscode/extensions/julialang.language-julia-0.6.0-alpha.3/scripts/languageserver/packages/CSTParser/src/utils.jl:49 [inlined]
 [7] macro expansion at 

Incorrect parsing of nested generator expressions

This test is parsed incorrectly:

julia> t = ":(x for x in y if aa for z in w if bb)"
":(x for x in y if aa for z in w if bb)"

julia> parse(t)
:($(Expr(:quote, :((x for x = y if aa for z = w if bb)))))

julia> Expr(CSTParser.parse(t))
:($(Expr(:quote, :((x for x = y if (aa for z = w if bb))))))

julia> dump(parse(t))
Expr
  head: Symbol quote
  args: Array{Any}((1,))
    1: Expr
      head: Symbol flatten
      args: Array{Any}((1,))
        1: Expr
          head: Symbol generator
          args: Array{Any}((2,))
            1: Expr
              head: Symbol generator
              args: Array{Any}((2,))
                1: Symbol x
                2: Expr
              typ: Any
            2: Expr
              head: Symbol filter
              args: Array{Any}((2,))
                1: Symbol aa
                2: Expr
              typ: Any
          typ: Any
      typ: Any
  typ: Any

julia> dump(Expr(CSTParser.parse(t)))
Expr
  head: Symbol quote
  args: Array{Any}((1,))
    1: Expr
      head: Symbol generator
      args: Array{Any}((2,))
        1: Symbol x
        2: Expr
          head: Symbol filter
          args: Array{Any}((2,))
            1: Expr
              head: Symbol generator
              args: Array{Any}((2,))
                1: Symbol aa
                2: Expr
              typ: Any
            2: Expr
              head: Symbol =
              args: Array{Any}((2,))
                1: Symbol x
                2: Symbol y
              typ: Any
          typ: Any
      typ: Any
  typ: Any
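
The reference structure can be confirmed with the built-in parser on current Julia (Meta.parse replaces the string method of parse used above). This sketch only checks the outer heads shown in the first dump.

```julia
# Built-in parser result for the nested generator (without the leading `:`).
ex = Meta.parse("(x for x in y if aa for z in w if bb)")

# Two `for` clauses produce a :flatten wrapping nested :generator nodes.
outer = ex.head
inner = ex.args[1].head
println((outer, inner))
```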

Error when calling get_args on Parameters

julia> x = CSTParser.parse("@f(; x)")
MacroCall  7(7)
 MacroName  2(2)
  PUNC: AT_SIGN  1(1)
  f  1(1)
 (
 Parameters  1(1)
  x  1(1)
 )
julia> CSTParser.get_args(x[3])
ERROR: MethodError: no method matching rem_where(::Nothing)
Closest candidates are:
  rem_where(::CSTParser.EXPR) at /home/domluna/.julia/packages/CSTParser/d5RjS/src/interface.jl:170
Stacktrace:
 [1] get_args(::CSTParser.EXPR) at /home/domluna/.julia/packages/CSTParser/d5RjS/src/interface.jl:426
 [2] top-level scope at REPL[131]:1

precompilation on 0.7 alpha seems stuck

I am trying to use FemtoCleaner locally which requires CSTParser.

When 'using CSTParser', julia seems stuck on my computer (although the task manager in Windows10 shows the process is busy).
Is this a compilation/precompilation regression on 0.7 alpha?
Is this working on other OS?

How long should I reasonably wait after 'using CSTParser'?

Femtocleaner causes MethodError: no method matching get_sig(::CSTParser.UnarySyntaxOpCall)

on Julia 0.6.4 and CSTParser tag v0.3.5.

ERROR: MethodError: no method matching get_sig(::CSTParser.UnarySyntaxOpCall)
Closest candidates are:
  get_sig(::CSTParser.BinarySyntaxOpCall) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/CSTParser/src/interface.jl:266
  get_sig(::CSTParser.EXPR{CSTParser.Macro}) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/CSTParser/src/interface.jl:264
  get_sig(::CSTParser.EXPR{CSTParser.FunctionDef}) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/CSTParser/src/interface.jl:263
  ...
Stacktrace:
 [1] get_args(::CSTParser.UnarySyntaxOpCall) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/CSTParser/src/interface.jl:331
 [2] get_args(::CSTParser.EXPR{CSTParser.FunctionDef}) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/CSTParser/src/interface.jl:334
 [3] create_scope(::CSTParser.EXPR{CSTParser.FunctionDef}, ::Deprecations.CSTAnalyzer.Scope, ::Deprecations.CSTAnalyzer.State{Deprecations.CSTAnalyzer.FileSystem}) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/Deprecations/src/CSTAnalyzer/CSTAnalyzer.jl:102
 [4] trav(::Deprecations.IncludeWalker, ::CSTParser.EXPR{CSTParser.MacroCall}, ::Deprecations.CSTAnalyzer.Scope, ::Deprecations.CSTAnalyzer.State{Deprecations.CSTAnalyzer.FileSystem}) at /Users/goretkin/repos/update_polyhedra/jl_pkg/v0.6/Deprecations/src/CSTAnalyzer/trav.jl:20

Instrumenting trav(w::Walker, x::OverlayNode, s, S::State) to print x.expr as it goes reveals:

         │  ├─ BinarySyntaxOpCall  25 (1:25)
         │  │  ├─ CSTParser.Curly  14 (1:14)
         │  │  │  ├─ ID: RepIterator  11 (1:11)
         │  │  │  ├─ CSTParser.PUNCTUATION(LBRACE, 1, 1:1)
         │  │  │  ├─ ID: T  1 (1:1)
         │  │  │  └─ CSTParser.PUNCTUATION(RBRACE, 1, 1:1)
         │  │  ├─ OP: DOT 1 (1:1)
         │  │  └─ CSTParser.TupleH  10 (1:10)
         │  │     ├─ CSTParser.PUNCTUATION(LPAREN, 1, 1:1)
         │  │     ├─ CSTParser.Call  8 (1:8)
         │  │     │  ├─ ID: vreps  5 (1:5)
         │  │     │  ├─ CSTParser.PUNCTUATION(LPAREN, 1, 1:1)
         │  │     │  ├─ ID: p  1 (1:1)
         │  │     │  └─ CSTParser.PUNCTUATION(RPAREN, 1, 1:1)
         │  │     └─ CSTParser.PUNCTUATION(RPAREN, 1, 1:1)
         │  └─ OP: DDDOT 3 (1:3)
         └─ CSTParser.PUNCTUATION(RPAREN, 2, 1:1)

Incorrect parsing of '''

julia> '''
'\'': ASCII/Unicode U+0027 (category Po: Punctuation, other)

but CSTParser appears to treat it as the start of a multiline string.

confusing `BracesCat` usage

I think the following is a little confusing:

julia> """
       function foo() where {A <:B}
       body
       end
       """ |> Meta.parse
:(function foo() where A <: B
      #= none:2 =#
      body
  end)

julia> """
       function foo() where {A <: B}
       body
       end
       """ |> Meta.parse
:(function foo() where A <: B
      #= none:2 =#
      body
  end)

It looks like these return the same output with Meta.parse, but with CSTParser.parse the CSTs are different:

julia> """
       function foo() where {A <: B}
       body
       end
       """ |> CSTParser.parse
FunctionDef  39(38)
 FUNCTION  9(8)
 WhereOpCall  21(20)
  Call  6(5)
   foo  3(3)
   (
   )
  OP: WHERE  6(5)
  PUNC: LBRACE  1(1)
  BinaryOpCall  6(6)
   A  2(1)
   OP: ISSUBTYPE  3(2)
   B  1(1)
  PUNC: RBRACE  2(1)
 Block  5(4)
  body  5(4)
 END  4(3)


julia> """
       function foo() where {A <:B}
       body
       end
       """ |> CSTParser.parse
FunctionDef  38(37)
 FUNCTION  9(8)
 WhereOpCall  20(19)
  Call  6(5)
   foo  3(3)
   (
   )
  OP: WHERE  6(5)
  BracesCat  8(7)
   PUNC: LBRACE  1(1)
   Row  5(5)
    A  2(1)
    UnaryOpCall  3(3)
     OP: ISSUBTYPE  2(2)
     B  1(1)
   PUNC: RBRACE  2(1)
 Block  5(4)
  body  5(4)
 END  4(3)

This looks like a corner case where you don't actually want to treat this as a UnaryOpCall.

Triple strings aren't assigned Tokens.TRIPLE_STRING

Consider

julia> x = CSTParser.parse("t = \"\"\"hello world\"\"\"")
BinarySyntaxOpCall  21 (21)
 ID: t  2 (1)
 OP: EQ  2 (1)
 LITERAL: hello world  17 (17)

julia> x.arg2.kind
STRING::Kind = 62

should be TRIPLE_STRING::Kind = 63.

Another thing to consider

julia> s0 = """
           txt = \"\"\"
           cannot document the following expression:

           \$(isa(ex, AbstractString) ? repr(ex) : ex)\"\"\"
       """

This will be parsed into a StringH type. Should the STRING literals inside be TRIPLE_STRING or would it make more sense to have a TripleStringH type?

Support julia markdown (.jmd) files

I thought I'd bring this up now rather than later ;) It would be great if we could get all of this working for .jmd markdown files as used in Weave.jl, i.e. if we had the full intellisense available for code blocks in weave files.

Reintroduce mechanism for error annotations

We seem to have lost the capability to have the parser emit error annotations. That's a shame, because they were quite nice (e.g. in FancyDiagnostics). What would be the best way to add back that capability?

parsing LITERAL removes newline

julia> s
"\"\"\"\nInterpolate using `\\\$`\n\"\"\"\n"

julia> x = CSTParser.parse(s)
LITERAL: Interpolate using `$`
  31 (30)


julia> x.val
"Interpolate using `\$`\n"

should be "\nInterpolate using `\$`\n"

Getting hundreds of "Possible use of undeclared variable" errors

I can't come up with a MWE, but this is all over my lightgraphs code:

file: 'file:///Users/bromberger1/dev/julia/LightGraphs/src/graphtypes/simplegraphs/simpleedge.jl'
severity: 'Warning'
message: 'Possible use of undeclared variable SimpleEdge'
at: '11,1'
source: 'CSTParser.Diagnostics.Diagnostic'

However, this variable is indeed defined right above:

abstract type AbstractSimpleEdge <: AbstractEdge end

struct SimpleEdge{T<:Integer} <: AbstractSimpleEdge
    src::T
    dst::T
end

SimpleEdge(t::Tuple) = SimpleEdge(t[1], t[2]) # this is line 11

Error when calling get_args on MacroCall

julia> x = CSTParser.parse("@f(; x)")
MacroCall  7(7)
 MacroName  2(2)
  PUNC: AT_SIGN  1(1)
  f  1(1)
 (
 Parameters  1(1)
  x  1(1)
 )


julia> CSTParser.get_args(x)
ERROR: MethodError: no method matching rem_where(::Nothing)
Closest candidates are:
  rem_where(::CSTParser.EXPR) at /home/domluna/.julia/packages/CSTParser/d5RjS/src/interface.jl:170
Stacktrace:
 [1] get_args(::CSTParser.EXPR) at /home/domluna/.julia/packages/CSTParser/d5RjS/src/interface.jl:426
 [2] top-level scope at REPL[56]:1

I would expect this to behave the same as Call.

Might be as simple as adding || typof(x) === MacroCall below?

elseif typof(x) === Call

Misleading linter warning?

file: 'file:///Users/seth/dev/julia/wip/LightGraphs.jl/src/community/cliques.jl'
severity: ''
message: 'ExtraWS'
at: '56,32'
source: 'CSTParser.Diagnostics.Diagnostic'

line in question:

stack = Vector{Tuple{Set{T}, Set{T}, Set{T}}}()

I think we should have whitespace following the commas in the tuple, no?

WhereOpCall parse error

julia> str
"f(a, b, c)::Rtype where {A,B,C}"

julia> CSTParser.parse(str)
BinarySyntaxOpCall  31 (31)
 Call  10 (10)
  ID: f  1 (1)
  (
  ID: a  1 (1)
  ,
  ID: b  1 (1)
  ,
  ID: c  1 (1)
  )
 OP: DECLARATION  2 (2)
 WhereOpCall  19 (19)
  ID: Rtype  6 (5)
  OP: WHERE  6 (5)
  PUNC: LBRACE  1 (1)
  ID: A  1 (1)
  ,
  ID: B  1 (1)
  ,
  ID: C  1 (1)
  PUNC: RBRACE  1 (1)

I think this should be a WhereOpCall at the top-level and arg1 would be the BinarySyntaxOpCall.

CSTParser incorrectly accepts missing condition in `if`

julia> CSTParser.parse("if\n\true\nend")
CSTParser.If  11 (1:11)
├─ CSTParser.KEYWORD{IF::Tokenize.Tokens.Kind = 27}  4 (1:2)
├─ CSTParser.IDENTIFIER  4 (1:3)
├─ CSTParser.Block  0 (1:0)
└─ CSTParser.KEYWORD{END::Tokenize.Tokens.Kind = 21}  3 (1:3)


julia> parse("if\n\true\nend")
ERROR: ParseError("missing condition in \"if\" at none:1")
Stacktrace:
 [1] #parse#234(::Bool, ::Bool, ::Function, ::String, ::Int64) at ./parse.jl:222
 [2] (::Base.#kw##parse)(::Array{Any,1}, ::Base.#parse, ::String, ::Int64) at ./<missing>:0
 [3] #parse#235(::Bool, ::Function, ::String) at ./parse.jl:232
 [4] parse(::String) at ./parse.jl:232

should be an error instead.

Parsing of integers and floats calls back into Base parser

Replacing the Base parser with calls to CSTParser causes problems when parsing, for example, an integer or a float (see JuliaLang/FancyDiagnostics.jl#1).

I believe the reason is that https://github.com/ZacLN/CSTParser.jl/blob/b9a5f944bbd3c1de35ad3b1b5cbd7a0d3ddd4351/src/conversion.jl#L41 calls back into the Base parser with a string.

Shouldn't there exist a method like:

Expr(x::EXPR{LITERAL{Tokens.INTEGER}}) = Base.parse(Int, x.val)

and perhaps the same for floats?
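
A sketch of what such a conversion could look like without re-entering the code parser. literal_value is a hypothetical helper, not a CSTParser function; it takes the token kind as a Symbol and the token text, and it ignores cases the real parser handles, such as integers too large for Int, Float32 literals like 1f0, and hex floats.

```julia
# Hypothetical helper: turn literal text into a value using only the
# numeric `parse` methods, never the code parser.
function literal_value(kind::Symbol, text::AbstractString)
    digits = replace(text, "_" => "")   # literals may contain `_` separators
    if kind === :INTEGER
        return parse(Int, digits)
    elseif kind === :FLOAT
        return parse(Float64, digits)
    else
        error("unsupported literal kind: $kind")
    end
end

println(literal_value(:INTEGER, "1_000"))
println(literal_value(:FLOAT, "1.5"))
```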

MethodError

From crash reporting (v0.15.0.alpha.9 is identical to v0.14.0.rc.4 in all aspects that relate to the LS side of things).

Failed method: Expr(::CSTParser.EXPR)

Error message:

MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(!Matched::Core.SimpleVector) at essentials.jl:600
  iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:600
  iterate(!Matched::ExponentialBackOff) at error.jl:218
  ... 

Stack trace:

MethodError:
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl156)
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl158)
   at _binary_expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl622)
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl149)
   at _binary_expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl622)
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl149)
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl252)
   at _binary_expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl620)
   at Expr(::CSTParser.EXPR) (.\scripts\languageserver\packages\CSTParser\src\conversion.jl149)
   at get_hover(::StaticLint.Binding, ::String, ::LanguageServerInstance) (.\scripts\languageserver\packages\LanguageServer\src\requests\hover.jl53)
   at get_hover(::CSTParser.EXPR, ::String, ::LanguageServerInstance) (.\scripts\languageserver\packages\LanguageServer\src\requests\hover.jl20)
   at process(::LanguageServer.JSONRPC.Request{Val{Symbol("textDocument/hover")},LanguageServer.TextDocumentPositionParams}, ::LanguageServerInstance) (.\scripts\languageserver\packages\LanguageServer\src\requests\hover.jl5)
   at run(::LanguageServerInstance) (.\scripts\languageserver\packages\LanguageServer\src\languageserverinstance.jl217)
   at top-level scope (.\scripts\languageserver\main.jl28)
   at include (boot.jl328)
   at include_relative(::Module, ::String) (loading.jl1105)
   at include(::Module, ::String) (Base.jl31)
   at exec_options(::Base.JLOptions) (client.jl287)
   at _start() (client.jl460)

CSTParser incorrectly accepts invalid iteration spec

julia> parse("[x for (x ? true : false)]")
ERROR: ParseError("invalid iteration specification")
Stacktrace:
 [1] #parse#234(::Bool, ::Bool, ::Function, ::String, ::Int64) at ./parse.jl:222
 [2] (::Base.#kw##parse)(::Array{Any,1}, ::Base.#parse, ::String, ::Int64) at ./<missing>:0
 [3] #parse#235(::Bool, ::Function, ::String) at ./parse.jl:232
 [4] parse(::String) at ./parse.jl:232

julia> CSTParser.parse("[x for (x ? true : false)]")
CSTParser.Comprehension  26 (1:26)
├─ CSTParser.PUNCTUATION{LSQUARE::Tokenize.Tokens.Kind = 87}  1 (1:1)
├─ CSTParser.Generator  24 (1:24)
│  ├─ CSTParser.IDENTIFIER  2 (1:1)
│  ├─ CSTParser.KEYWORD{FOR::Tokenize.Tokens.Kind = 24}  4 (1:3)
│  └─ CSTParser.InvisBrackets  18 (1:18)
│     ├─ CSTParser.PUNCTUATION{LPAREN::Tokenize.Tokens.Kind = 91}  1 (1:1)
│     ├─ CSTParser.ConditionalOpCall  16 (1:16)
│     │  ├─ CSTParser.IDENTIFIER  2 (1:1)
│     │  ├─ CSTParser.OPERATOR{2,CONDITIONAL::Tokenize.Tokens.Kind = 120,false}  2 (1:1)
│     │  ├─ CSTParser.LITERAL{TRUE::Tokenize.Tokens.Kind = 83}  5 (1:4)
│     │  ├─ CSTParser.OPERATOR{8,COLON::Tokenize.Tokens.Kind = 540,false}  2 (1:1)
│     │  └─ CSTParser.LITERAL{FALSE::Tokenize.Tokens.Kind = 84}  5 (1:5)
│     └─ CSTParser.PUNCTUATION{RPAREN::Tokenize.Tokens.Kind = 92}  1 (1:1)
└─ CSTParser.PUNCTUATION{RSQUARE::Tokenize.Tokens.Kind = 88}  1 (1:1)

Error parsing newline after `:`

    Status `~/.julia/dev/JuliaFormatter.jl/Project.toml`
  [00ebfdb7] CSTParser v2.0.0
  [0796e94c] Tokenize v0.5.7
  [8dfed614] Test

This also occurs on v2.1.0.

ref domluna/JuliaFormatter.jl#194

julia> using CSTParser

julia> s = """
       function mystr( str::String )
           return SubString( str, 1:
           3 )
       end"""
"function mystr( str::String )\n    return SubString( str, 1:\n    3 )\nend"

julia> Meta.parse(s)
:(function mystr(str::String)
      #= none:2 =#
      return SubString(str, 1:3)
  end)

julia> CSTParser.parse(s)
FunctionDef  71(71)
 FUNCTION  9(8)
 Call  25(20)
  mystr  5(5) 
  (
  BinaryOpCall  12(11)
   str  3(3) 
   OP: DECLARATION  2(2)
   String  7(6) 
  )
 Block  34(33)
  Return  34(33)
   RETURN  7(6)
   Call  27(26)
    SubString  9(9) 
    (
    str  3(3) 
    ,
    BinaryOpCall  9(8)
     INTEGER: 1  1(1)
     ErrorToken  6(1)CSTParser.UnexpectedNewLine
      OP: COLON  6(1)
     INTEGER: 3  2(1)
    )
 END  3(3)

Implement graceful failure

Allow graceful failure of parsing.

In practice this will involve removing all @catcherror ps ... code snippets and deciding what tokens to ignore. Most of the changes should involve parse_expression and parse_compound, as well as explicit tests for PUNCTUATION tokens and punctuation-like KEYWORDs.

Fullspan doesn't match Tokenize offset

Project JuliaFormatter v0.1.37
    Status `~/.julia/dev/JuliaFormatter.jl/Project.toml`
  [00ebfdb7] CSTParser v1.1.0
  [0796e94c] Tokenize v0.5.7
  [8dfed614] Test

MWE:

julia> using CSTParser, Tokenize

julia> s0 = "using .. Foo"
"using .. Foo"

julia> CSTParser.parse(s0)
Using  11(11)
 USING  6(5)
 OP: DOT  1(1)
 OP: DOT  1(1)
 Foo  3(3)

julia> Tokenize.tokenize(s0) |> collect
6-element Array{Tokenize.Tokens.Token,1}:
 1,1-1,5          KEYWORD        "using"
 1,6-1,6          WHITESPACE     " "
 1,7-1,8          OP             ".."
 1,9-1,9          WHITESPACE     " "
 1,10-1,12        IDENTIFIER     "Foo"
 1,13-1,12        ENDMARKER      ""

It looks like CSTParser ignores the whitespace before Foo.

This also appears to have been the case with v1.0.0 and v0.5.6.
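The mismatch can be stated as a simple invariant: the full span of the top-level expression should equal the number of bytes in the source. A hedged sketch of that check follows. It assumes the first number in the tree dump above corresponds to the `fullspan` field, and that `Tokenize.untokenize` returns the text of a single token.

```julia
using CSTParser, Tokenize

s0 = "using .. Foo"

# Total number of bytes covered by the Tokenize tokens (whitespace included).
token_bytes = sum(sizeof(Tokenize.untokenize(t)) for t in Tokenize.tokenize(s0))

# Full span reported by CSTParser for the whole expression.
cst_bytes = CSTParser.parse(s0).fullspan

println((source = sizeof(s0), tokens = token_bytes, cst = cst_bytes))
```

If the three numbers differ, some part of the input (here, presumably the whitespace before `Foo`) is not accounted for in the CST.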

Formatting Utilities

For formatting purposes it's useful to extract certain formatting context information from a file; for example, the indenting style of the file as a whole (tabs/N spaces) or the indentation prefix of the current line. These things could live anywhere in principle (Deprecations.jl has its own set of layers over CSTParser, for example), but for the sake of making them more accessible and consistent, CSTParser seems like it'd be a good home for them.

If folks are on board I'll throw up a PR for some of these.
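As a rough idea of what such utilities might look like, here is a sketch using only Base. The names `indent_prefix` and `indent_style` are hypothetical, and the heuristic (first tab wins, otherwise the smallest indentation width) is an assumption rather than a proposal for the final API.

```julia
# Leading whitespace (spaces or tabs) of a single line.
indent_prefix(line::AbstractString) = String(match(r"^[ \t]*", line).match)

# Guess the indentation style of a whole file: `:tabs`, the smallest non-zero
# number of leading spaces seen on any line, or `nothing` if no line is indented.
function indent_style(text::AbstractString)
    widths = Int[]
    for line in split(text, '\n')
        isempty(strip(line)) && continue        # ignore blank lines
        prefix = indent_prefix(line)
        isempty(prefix) && continue             # ignore unindented lines
        first(prefix) == '\t' && return :tabs
        push!(widths, length(prefix))
    end
    return isempty(widths) ? nothing : minimum(widths)
end

println(indent_style("function f()\n    x\nend"))
```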

BoundsError: attempt to access "⬤" at index [4]

Moving this over from julia-vscode/julia-vscode#422

julia> using CSTParser
INFO: Precompiling module CSTParser.

julia> str = """\"\"\"\$(a)⬤\$b\"\"\""""
"\"\"\"\$(a)⬤\$b\"\"\""

julia> CSTParser.parse(str)
ERROR: BoundsError: attempt to access "⬤"
  at index [4]
Stacktrace:
 [1] next at ./strings/string.jl:197 [inlined]
 [2] getindex(::String, ::Int64) at ./strings/basic.jl:32
 [3] (::CSTParser.#adjust_lcp#10)(::CSTParser.LITERAL, ::Bool) at /Users/sabae/.julia/v0.6/CSTParser/src/components/strings.jl:47
 [4] parse_string_or_cmd(::CSTParser.ParseState, ::Bool) at /Users/sabae/.julia/v0.6/CSTParser/src/components/strings.jl:103
 [5] parse_string_or_cmd(::CSTParser.ParseState) at /Users/sabae/.julia/v0.6/CSTParser/src/components/strings.jl:25
 [6] CSTParser.LITERAL(::CSTParser.ParseState) at /Users/sabae/.julia/v0.6/CSTParser/src/spec.jl:83
 [7] parse_doc(::CSTParser.ParseState) at /Users/sabae/.julia/v0.6/CSTParser/src/CSTParser.jl:287
 [8] parse(::CSTParser.ParseState, ::Bool) at /Users/sabae/.julia/v0.6/CSTParser/src/CSTParser.jl:341
 [9] parse(::String, ::Bool) at /Users/sabae/.julia/v0.6/CSTParser/src/CSTParser.jl:276
 [10] parse(::String) at /Users/sabae/.julia/v0.6/CSTParser/src/CSTParser.jl:275

High-level stable API

I'm in the process of upgrading FemtoCleaner (the GitHub bot that fixes deprecations) from CSTParser 0.2.1 to the current version of CSTParser.
However, progress is very slow: because CSTParser does not have an API, a lot of the code in FemtoCleaner depends on the internals of CSTParser, which have now completely changed.

If the idea is to use CSTParser to build nice tools, I feel there needs to be a more high-level API where some stability is expected.

Keep more information in the CST

I think it would be useful to keep more information in the CST e.g. comments and whitespace.
An example for another language (javascript) can be found at https://github.com/cst/cst.

In the image:

[image]

the ones with a blue background are nodes and are important for syntax. The others (called tokens) are just whitespace or other trivia that can be deduced from the nodes. The job of a code formatter would then be to write a CST -> string function and modify the tokens so that the string output looks nice.
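A minimal sketch of that idea, using a hypothetical `CSTNode` type rather than CSTParser's own data structures: every leaf keeps its exact source text, so the CST -> string function is just a concatenation, and a formatter only rewrites the whitespace leaves.

```julia
# Hypothetical, simplified CST node: leaves carry text, inner nodes carry children.
struct CSTNode
    kind::Symbol
    text::String                 # non-empty only for leaves
    children::Vector{CSTNode}
end

leaf(kind::Symbol, text::String) = CSTNode(kind, text, CSTNode[])

# CST -> string: concatenate the text of all leaves in order.
to_string(n::CSTNode) =
    isempty(n.children) ? n.text : join(to_string(c) for c in n.children)

# A trivial "formatter": collapse every whitespace leaf to a single space.
normalize(n::CSTNode) =
    n.kind === :WHITESPACE ? leaf(:WHITESPACE, " ") :
    CSTNode(n.kind, n.text, CSTNode[normalize(c) for c in n.children])

tree = CSTNode(:ASSIGN, "", [
    leaf(:IDENTIFIER, "x"), leaf(:WHITESPACE, "   "),
    leaf(:OPERATOR, "="),   leaf(:WHITESPACE, "  "),
    leaf(:INTEGER, "1"),
])

println(to_string(tree))             # x   =  1
println(to_string(normalize(tree)))  # x = 1
```

Because nothing is discarded, `to_string` reproduces the input exactly, and formatting becomes a tree-to-tree transformation.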

Incorrect function definition

In the first case, all is well:

julia> CSTParser.defines_function(CSTParser.parse("s = 10"))
false

I'm pretty sure this is wrong:

julia> CSTParser.defines_function(CSTParser.parse("s.x = 10"))
true
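For comparison, the intended distinction can be sketched on `Base.Expr` (not on `CSTParser.EXPR`). An assignment is a short-form function definition only when its left-hand side is a call, possibly wrapped in `where` clauses or a return-type annotation. The helper names below are hypothetical, and this is not how `defines_function` is implemented.

```julia
# Peel off `where` clauses and return-type annotations around a signature.
function strip_signature(sig)
    while sig isa Expr && sig.head in (:where, :(::))
        sig = sig.args[1]
    end
    return sig
end

# Does `ex` (a Base.Expr) look like a function definition?
function looks_like_function_def(ex)
    ex isa Expr || return false
    ex.head === :function && return true          # long form: function f(x) ... end
    if ex.head === :(=)                           # short form: f(x) = ...
        sig = strip_signature(ex.args[1])
        return sig isa Expr && sig.head === :call
    end
    return false
end

println(looks_like_function_def(Meta.parse("s = 10")))     # false
println(looks_like_function_def(Meta.parse("s.x = 10")))   # false
println(looks_like_function_def(Meta.parse("f(x) = 10")))  # true
```

The left-hand side of `s.x = 10` has head `:.` rather than `:call`, so it is treated as a plain field assignment.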

Parsing input error for `test/show.jl` in julia repo

While trying out formatting on the entire julia repo, I encountered a file, test/show.jl, which CSTParser cannot parse.

The formatter checks if the input can be parsed before proceeding.

https://github.com/domluna/JuliaFormatter.jl/blob/577138b80e5f40a61e695e55c723a10492779b6d/src/JuliaFormatter.jl#L214-L215

julia> s = read("test/show.jl") |> String;
julia> x, ps = CSTParser.parse(CSTParser.ParseState(s), true);
julia> ps.errored
true

This is the only file in the entire julia repo where this occurs.

Successfully parses invalid colon expression

julia> CSTParser.parse("1:2.^3:10")
ColonOpCall  9(9)
 INTEGER: 1  1(1)
 OP: COLON  1(1)
 BinaryOpCall  4(4)
  INTEGER: 2  1(1)
  OP: CIRCUMFLEX_ACCENT  2(2)
  INTEGER: 3  1(1)
 OP: COLON  1(1)
 INTEGER: 10  2(2)

vs.

julia> Meta.parse("1:2.^3:10")
ERROR: Base.Meta.ParseError("invalid syntax \"2.^\"; add space(s) to clarify")
Stacktrace:
 [1] #parse#1(::Bool, ::Bool, ::Bool, ::typeof(Base.Meta.parse), ::String, ::Int64) at ./meta.jl:184
 [2] #parse at ./none:0 [inlined]
 [3] #parse#4(::Bool, ::Bool, ::typeof(Base.Meta.parse), ::String) at ./meta.jl:215
 [4] parse(::String) at ./meta.jl:215
 [5] top-level scope at REPL[11]:1
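Discrepancies like this can be looked for systematically by asking the built-in parser whether it accepts an input and comparing that with CSTParser's result. Below is a small sketch of the Base side only. The name `base_accepts` is hypothetical, and it relies on `Meta.parse(...; raise = false)` returning an `:error` or `:incomplete` expression instead of throwing.

```julia
# True if Julia's built-in parser accepts `src` as a single complete expression.
function base_accepts(src::AbstractString)
    ex = Meta.parse(src; raise = false)
    return !(ex isa Expr && ex.head in (:error, :incomplete))
end

println(base_accepts("x = y + 123"))
println(base_accepts("f("))
```

Any input for which `base_accepts` returns `false` while CSTParser reports no error is a candidate for a report like this one.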

Expr_char v0.7 compatibility

@Keno, on v0.7:

using CSTParser
c = String([0x, 0x, 0x])
s = "'$c'"
x = CSTParser.LITERAL(5, 1:5, s, CSTParser.Tokens.CHAR)
CSTParser.Expr_char(x)

gives
ERROR: StringIndexError(....
I don't know enough about strings/chars, or the changes made to them, to fix this.
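For background, a `StringIndexError` is raised when a string is indexed at a byte that is not the start of a character. The sketch below is not a fix for `Expr_char`, only an illustration: it extracts the character from a simple char literal with character-aware helpers instead of raw byte offsets. The function name `inner_char` and the sample character are assumptions for illustration, and escape sequences such as `'\n'` are not handled.

```julia
# Return the character between the quotes of a simple char literal such as "'a'".
function inner_char(lit::AbstractString)
    body = chop(lit; head = 1, tail = 1)   # drop the opening and closing quote
    return first(body)
end

s = "'⬤'"                 # '⬤' takes 3 bytes in UTF-8, so sizeof(s) == 5
println(inner_char(s))     # ⬤

# By contrast, s[3] would throw a StringIndexError,
# because byte 3 lies in the middle of '⬤'.
```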
