tree-sitter-c-sharp's Introduction

tree-sitter-c-sharp

C# grammar for tree-sitter based upon the Roslyn grammar with changes in order to:

  • Deal with differences between the parsing technologies
  • Work around some bugs in that grammar
  • Handle #if, #else, #elif, #endif blocks
  • Support syntax highlighting/parsing of fragments
  • Simplify the output tree
  • Reduce parser state count and complexity
  • Be in line with tree-sitter's conventions where applicable

Status

Comprehensive support for C# 1 through 13.0, with the following exception:

  • async, var and await cannot be used as identifiers everywhere they are valid

tree-sitter-c-sharp's People

Contributors

amaanq, bekavalentine, brandonspark, damieng, dcreager, gbogarinb, geekmasher, gonglinyuan, hendrikvanantwerpen, hvitved, initram, jcs090218, lukepistrol, luni-4, maxbrunsfeld, mjambon, msftenhanceprovenance, neuromagus, noellelc, patrickt, rien, sjord, tamasvajk, tiggilyboo, xapphire13

tree-sitter-c-sharp's Issues

Identifier regex should match non-ascii letter characters

Language spec for identifiers can be found here:
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/lexical-structure#identifiers

I have not been able to change the regex to use Unicode character classes, though, as I get the following error when I try:
Regex error: Unsupported character class syntax Unicode(ClassUnicode { span: Span(Position(o: 15, l: 1, c: 16), Position(o: 24, l: 1, c: 25)), negated: false, kind: Named("Lu") })

Is there a way to support Unicode character classes?
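
For reference, the identifier rules in the linked specification can be written in terms of Unicode general categories. The sketch below uses JavaScript's own regular-expression engine (the u flag with \p{...} escapes), not tree-sitter's, so it only illustrates which categories are involved. Whether the grammar's regex engine accepts equivalent classes is exactly the open question here. The name isCSharpIdentifier is hypothetical, and keywords, the @ prefix, and Unicode escape sequences are deliberately ignored.

```javascript
// Sketch only: models the identifier character classes from the C# lexical
// specification with JavaScript Unicode property escapes.
// Start: a letter (Lu, Ll, Lt, Lm, Lo, Nl) or an underscore.
// Part:  the start characters plus Nd, Pc, Mn, Mc and Cf.
const IDENTIFIER = /^[\p{L}\p{Nl}_][\p{L}\p{Nl}\p{Nd}\p{Pc}\p{Mn}\p{Mc}\p{Cf}]*$/u;

// Hypothetical helper; ignores keywords, the @ prefix and \u escapes.
function isCSharpIdentifier(text) {
  return IDENTIFIER.test(text);
}

console.log(isCSharpIdentifier("größe")); // identifier with non-ASCII letters
console.log(isCSharpIdentifier("1abc"));  // starts with a digit
```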

Add global_statement

The _member_declaration in grammar.js does not yet include global_statement.

Adding it causes all sorts of conflicts and precedence situations that would need resolving.

Support trailing commas in switch statements

This code doesn’t parse until you remove the trailing comma in the last clause:

        private static ITypeSymbol GetType(Compilation compilation, ISymbol symbol)
            => symbol switch
            {
                IFieldSymbol field => field.Type,
                IPropertySymbol property => property.Type,
                _ => compilation.GetSpecialType(SpecialType.System_Object),
            };

Update checklist for C# 9

The following features do not need changes to the syntax:

  • Module initializers (is just a new attribute)
  • Skip localsinit (is just a new attribute)
  • Extension GetEnumerator (allows extension methods with the right signature to be used automatically in foreach loops)

I don't know if we should remove them from the list or just check them off, as there should be nothing for us to implement. They are all semantic changes to the language.

How to tell what is missing

Is there a list or other easy way to identify aspects that are missing?

Perhaps we need a checklist so that we can know when the project is "complete".

Support keywords as arguments to lambdas.

It surprises me that this works, but it appears to be correct: note that the argument to the lambda is named var.

var firstVarDeclWithInitializer = variableDeclaration.Variables.FirstOrDefault(var => var.Initializer != null && var.Initializer.Value != null);

Add ref_type

The _type in grammar.js does not yet include ref_type because it causes a conflict with the ref keyword modifier.

Enable query_expression without conflicts

It would be nice to have the LINQ comprehension syntax working; however, enabling query_expression by uncommenting it in _expression causes lots of conflicts. It needs some precedence rules specified.

Support destructuring tuple assignment

We currently fail on code like this:

foreach (var (document, diagnosticsForDocument) in diagnostics.GroupBy(d => GetReportedDocument(d, treeToDocumentMap)))

Changing that (document, diagnosticsForDocument) pattern to a single binding removes the error, so I think this is probably some new syntax. I don’t know what it’s called, though; I’ll need a C# expert’s take here.

Can't build on Windows

When I run npm run build I get the following error:

binding.obj : error LNK2001: unresolved external symbol ts_language_c_sharp [D:\github\tree-sitter-c-sharp\build\ts_language_c_sharp_binding.vcxproj]
D:\github\tree-sitter-c-sharp\build\Release\ts_language_c_sharp_binding.node : fatal error LNK1120: 1 unresolved externals [D:\github\tree-sitter-c-sharp\build\ts_language_c_sharp_binding.vcxproj]

Running Windows 10
NPM 5.4.2
Node 8.7.0

Double-brackets in interpolated strings

As found in Roslyn:

            var content = $@"
class A
{{
    bool Method(int value)
    {{
        return value  is  {operatorText}  3  or  {operatorText}  5;
    }}
}}
";

Looks like {{ is interpreted as a literal {, and I’m not sure we have rules to handle that yet.
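
To make the distinction concrete, here is a rough sketch of how the body of an interpolated string could be split into literal text, escaped braces, and interpolation holes. It is plain JavaScript rather than grammar rules, the name scanInterpolated is hypothetical, and it assumes holes contain no nested braces, strings, or format specifiers that include }.

```javascript
// Hypothetical helper, not part of tree-sitter-c-sharp: splits the body of a
// C# interpolated string into literal text, escaped braces and holes.
function scanInterpolated(body) {
  const tokens = [];
  let text = "";
  const flush = () => {
    if (text) tokens.push({ kind: "text", value: text });
    text = "";
  };
  let i = 0;
  while (i < body.length) {
    const two = body.slice(i, i + 2);
    if (two === "{{" || two === "}}") {
      // A doubled brace is an escape for a single literal brace.
      flush();
      tokens.push({ kind: "escape", value: two });
      i += 2;
    } else if (body[i] === "{") {
      // A single opening brace starts a hole; assume it ends at the next "}".
      const end = body.indexOf("}", i);
      if (end === -1) throw new Error("unterminated interpolation");
      flush();
      tokens.push({ kind: "hole", value: body.slice(i + 1, end) });
      i = end + 1;
    } else {
      text += body[i];
      i += 1;
    }
  }
  flush();
  return tokens;
}

console.log(scanInterpolated("{{ return value is {operatorText} 3; }}"));
```

The important design point is that the two-character check runs before the single-brace check, so {{ is never mistaken for the start of a hole.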

Implement C# 9 record support

This should cover the record_declaration type, which is similar but not identical to class_declaration, as well as the difference in bases, which must allow arguments to be passed.

It should additionally include support for the with keyword.

Support target-type new expressions.

Right now this fails to parse:

        internal static class EnumFormatters
        {
            public static readonly EnumFormatter<AnalysisKind> AnalysisKind = new(value => (int)value, value => (AnalysisKind)value);
        }

failing at the = new( bit. I think this is the relevant piece of syntax.

Conditional expression evaluates sub-parts incorrectly

e.g.

var a = b ? 1 + 2 : 3;

Is generating:

...
(equals_value_clause
  (conditional_expression
    (identifier)
    (binary_expression (integer_literal) (integer_literal)) (integer_literal))

When it should be generating

...
(equals_value_clause
  (conditional_expression
    (member_access_expression (identifier) (identifier))
    (string_literal) (string_literal))))))))

Related to #131

node_gyp issue on Windows

This happens when running npm install tree-sitter-c-sharp --save on Windows:

gyp ERR! UNCAUGHT EXCEPTION
gyp ERR! stack Error: spawn C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\15.0\Bin\MSBuild.exe ENOENT
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:240:19)
gyp ERR! stack at onErrorNT (internal/child_process.js:415:16)
gyp ERR! stack at process._tickCallback (internal/process/next_tick.js:63:19)
gyp ERR! System Windows_NT 10.0.18363
gyp ERR! command "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\node_modules\node-gyp\bin\node-gyp.js" "rebuild"
gyp ERR! cwd C:\Users\mathi\source\repos\FluffySpoon.JavaScript.CSharpParser\node_modules\tree-sitter-c-sharp
gyp ERR! node -v v10.16.3
gyp ERR! node-gyp -v v3.8.0
gyp ERR! This is a bug in node-gyp.
gyp ERR! Try to update node-gyp and file an Issue if it does not help:
gyp ERR! https://github.com/nodejs/node-gyp/issues
npm ERR! code ELIFECYCLE
npm ERR! errno 7
npm ERR! [email protected] install: node-gyp rebuild
npm ERR! Exit status 7
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\mathi\AppData\Roaming\npm-cache_logs\2020-01-09T18_40_23_838Z-debug.log

Add tuple_type

The _type in grammar.js does not yet include tuple_type because it causes multiple rule conflicts and other issues.

Conditional access pattern failure

This kind of if pattern does not work in the following case:

if ((me as Person)?._Age != 1)
{
}

This looks like it could be related to the precedence of conditional_access_expression.

Add xml doc comment parsing

There is a lot of XML specified in grammar.txt, but it is not clear if or how it ties in with /// comments (there is no pattern for /// listed).

Set precedence of 'invocation_expression'

Source code:

void f()
{
    !this.Call();
}

Expected result:

(compilation_unit [0, 0] - [4, 0]
  (method_declaration [0, 0] - [3, 1]
    type: (void_keyword [0, 0] - [0, 4])
    name: (identifier [0, 5] - [0, 6])
    parameters: (parameter_list [0, 6] - [0, 8])
    body: (block [1, 0] - [3, 1]
      (expression_statement [2, 4] - [2, 17]
        (prefix_unary_expression [2, 4] - [2, 16]                  // parent: prefix_unary_expression
          (invocation_expression [2, 5] - [2, 16]                  // child: invocation_expression
            function: (member_access_expression [2, 5] - [2, 14]
              expression: (this_expression [2, 5] - [2, 9])
              name: (identifier [2, 10] - [2, 14]))
            arguments: (argument_list [2, 14] - [2, 16])))))))

Actual result:

(compilation_unit [0, 0] - [4, 0]
  (method_declaration [0, 0] - [3, 1]
    type: (void_keyword [0, 0] - [0, 4])
    name: (identifier [0, 5] - [0, 6])
    parameters: (parameter_list [0, 6] - [0, 8])
    body: (block [1, 0] - [3, 1]
      (expression_statement [2, 4] - [2, 17]
        (invocation_expression [2, 4] - [2, 16]                    // parent: invocation_expression
          function: (prefix_unary_expression [2, 4] - [2, 14]      // child: prefix_unary_expression
            (member_access_expression [2, 5] - [2, 14]
              expression: (this_expression [2, 5] - [2, 9])
              name: (identifier [2, 10] - [2, 14])))
          arguments: (argument_list [2, 14] - [2, 16]))))))
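
The expected nesting follows from postfix operations (member access, invocation) binding tighter than prefix operators. The toy recursive-descent parser below is only an illustration of that ordering and is not how the tree-sitter grammar is written. The function parse and its tiny token set are assumptions made for this sketch, and fields and positions are omitted from the output.

```javascript
// Toy parser, not the tree-sitter grammar: it only knows `!`, `this`,
// identifiers, `.` and empty argument lists `()`.
function parse(source) {
  const tokens = source.match(/[A-Za-z_]\w*|[!.()]/g) || [];
  let pos = 0;

  // Prefix operators are the lower-precedence level, so their operand is a
  // complete postfix expression.
  function parseUnary() {
    if (tokens[pos] === "!") {
      pos += 1;
      return `(prefix_unary_expression ${parseUnary()})`;
    }
    return parsePostfix();
  }

  // Member access and invocation bind tighter and wrap the primary first.
  function parsePostfix() {
    let node = parsePrimary();
    for (;;) {
      if (tokens[pos] === ".") {
        pos += 1;
        const name = parsePrimary();
        node = `(member_access_expression ${node} ${name})`;
      } else if (tokens[pos] === "(" && tokens[pos + 1] === ")") {
        pos += 2;
        node = `(invocation_expression ${node} (argument_list))`;
      } else {
        return node;
      }
    }
  }

  function parsePrimary() {
    const token = tokens[pos];
    if (token === undefined || !/^[A-Za-z_]/.test(token)) {
      throw new Error(`unexpected token: ${token}`);
    }
    pos += 1;
    return token === "this" ? "(this_expression)" : "(identifier)";
  }

  const tree = parseUnary();
  if (pos !== tokens.length) throw new Error(`unexpected token: ${tokens[pos]}`);
  return tree;
}

console.log(parse("!this.Call()"));
```

Here prefix_unary_expression ends up as the parent of invocation_expression, matching the expected result above.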

Implement top level statement support from C# 9.0

This will cause a large change in the corpus as previously we deviated from Roslyn to allow partial fragments of code to be syntax highlighted and parsed... and we did that by allowing all sorts of declarations at the top level.

Now that there is a specific syntax and tree for it, a lot of previously accepted top-level items will be treated differently. I expect we'll still support them all, but there will be subtle changes like:

var a = Assert.Range(from, to);

On its own, this will change from being a field_declaration (we treated it as if it were within a class, even though var isn't valid on a field declaration) to a local_declaration_statement, just as if it were in a function.

Gap list?

Is there a list of constructs etc. not yet supported?

Local functions that return tuple types cannot be parsed

When looking through which parts of Roslyn we cannot parse, I found this issue:

class A {
  void M(){
    (bool a, bool b) M2() 
    {
      return (true, false);
    }
  }
}

It results in a parse error where it tries to parse the local function as (expression_statement (invocation_expression ....

The line in the Roslyn source can be seen here: https://github.com/dotnet/roslyn/blob/057b9e2b3d782fb3ea98d92fb94bf1c924cfdf81/src/Analyzers/Core/Analyzers/MakeFieldReadonly/MakeFieldReadonlyDiagnosticAnalyzer.cs#L132

Fix conflict with variable declaration statements

When variable declaration statements are turned on, we get the following:

Error: Unresolved conflict for symbol sequence:

  'class'  identifier_name  '{'  identifier_name  parameter_list  '{'  identifier_name  •  '<'  …

Possible interpretations:

  1:  'class'  identifier_name  '{'  identifier_name  parameter_list  '{'  (generic_name  identifier_name  •  type_parameter_list)
  2:  'class'  identifier_name  '{'  identifier_name  parameter_list  '{'  (_expression  identifier_name)  •  '<'  …

Possible resolutions:

  1:  Specify a higher precedence in `generic_name` than in the other rules.
  2:  Specify a higher precedence in `_expression` than in the other rules.
  3:  Specify a left or right associativity in `_expression`
  4:  Add a conflict for these rules: `generic_name` `_expression`

Add structured_trivia

Right now pre-processor directives are handled in a simple block with some limitations:

  1. Does not support all operations
  2. Does not tokenize conditions, pragmas etc.
  3. Allows whitespace before the directive

The Roslyn grammar.txt has a structured_trivia that would help address at least 1 & 2.

This would replace preprocessor_directive in grammar.js
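
As a rough illustration of what tokenizing conditions (point 2) involves, the sketch below splits a directive line into its name and, for #if and #elif, a flat list of condition tokens. It is plain JavaScript and not a proposal for the grammar rules themselves. The name tokenizeDirective is hypothetical, and trailing comments and malformed conditions are not handled.

```javascript
// Hypothetical helper, not part of grammar.js: splits one preprocessor line
// into a directive name and, for #if/#elif, a list of condition tokens.
function tokenizeDirective(line) {
  const match = /^\s*#\s*([a-z]+)\s*(.*)$/.exec(line);
  if (!match) return null; // not a directive line
  const [, name, rest] = match;
  if (name !== "if" && name !== "elif") {
    // Other directives keep their remainder as untokenized text.
    return { name, text: rest.trim() };
  }
  // Operators allowed in conditional-compilation expressions, plus symbols
  // and the true/false literals (matched here as plain words).
  const tokens = rest.match(/&&|\|\||==|!=|[!()]|[A-Za-z_]\w*/g) || [];
  return { name, tokens };
}

console.log(tokenizeDirective("#if DEBUG && !(TRACE || NET5 == true)"));
console.log(tokenizeDirective("#pragma warning disable 414"));
```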

Fails to parse out argument with inline tuple declaration

The following fails to parse: M(out (int a, int b) c);
It tries to parse the tuple type as a tuple_expression, but it should parse it as tuple_type.

The following works as expected: (int a, int b) = c;
Here it is parsed as (tuple_expression (argument (declaration_expression ...
which is the same way that Roslyn parses it.
