chalcolith / ironmeta

The IronMeta parser generator provides a programming language and application for generating pattern matchers on arbitrary streams of objects. It is an implementation of Alessandro Warth's OMeta system for C#.

License: BSD 3-Clause "New" or "Revised" License

C# 100.00%

ironmeta's Issues

Integrating IronMetaGenerate task in Visual Studio

Can you be more specific about how to integrate this in Visual Studio 2017? Perhaps a step-by-step?

IronMeta.Library.dll contains an MSBuild task called "IronMetaGenerate". A simple example of how to use it:

      <UsingTask TaskName="IronMetaGenerate" AssemblyFile="path_to\IronMeta.Library.dll" />
      <Target Name="BeforeBuild">
        <IronMetaGenerate Input="MyParser.ironmeta" Output="MyParser.g.cs" Namespace="MyNamespace" Force="true" />
      </Target>

Non-linear performance for mixed right and left recursion

Hello, I think there is a performance problem with the current matching algorithm, when right and left recursions are being mixed.

When a right recursion grows the call stack and involves a left recursion and continues to grow (for example when it processes a nested expression), the performance characteristic of the current implementation of the algorithm is exponential.

Specifically, I think it is wrong to make non-memoizable all the productions that sit on the call stack above the earliest appearance of the current left-recursive production (https://github.com/kulibali/ironmeta/blob/master/Source/Matcher/Matcher.cs#L216).

I tested an alternative in which only the productions that sit on the stack above the most recent production matching the current left-recursive one are considered "involved" and are disabled for memoization. As far as I can see, this makes the performance characteristics of nested right recursions linear again, and it passes all of my tests.

I must admit that I do not completely understand the text of the original algorithm, and I made this change just based on a hunch. There is also a small probability that my F# port of the matching algorithm is simply wrong and caused this problem in the first place.

For reference, here is the change set of the F# version, which also includes a test grammar:

pragmatrix/ScanRat@b59643c

The production addition contains a direct left recursion and is never actually used in the string that is parsed, yet it unexpectedly causes the drop in performance for the nested, right-recursive property production. This might not be a huge problem for simple grammars, but it adds up considerably for more complex ones.

Solution does not build from repo

Clone the repo, open the solution, build.

Loads regex from NuGet. Then errors.

Severity Code Description Project File Line
Error Metadata file 'D:\MyDocs\Repos\ironmeta\Source\Library\bin\Debug\IronMeta.Library.dll' could not be found IronMeta.VSExtension D:\MyDocs\Repos\ironmeta\Tools\VSExtension\CSC
Error Error signing output with public key from file '..\IronMeta.snk' -- File not found. Library D:\MyDocs\Repos\ironmeta\Source\Library\CSC
Error Metadata file 'D:\MyDocs\Repos\ironmeta\Source\Library\bin\Debug\IronMeta.Library.dll' could not be found Calc D:\MyDocs\Repos\ironmeta\Samples\Calc\CSC
Error Metadata file 'D:\MyDocs\Repos\ironmeta\Source\Library\bin\Debug\IronMeta.Library.dll' could not be found IronMeta D:\MyDocs\Repos\ironmeta\Source\IronMeta\CSC
Error Metadata file 'D:\MyDocs\Repos\ironmeta\Samples\Calc\bin\Debug\Calc.exe' could not be found UnitTests D:\MyDocs\Repos\ironmeta\Tests\UnitTests\CSC
Error Metadata file 'D:\MyDocs\Repos\ironmeta\Source\Library\bin\Debug\IronMeta.Library.dll' could not be found UnitTests D:\MyDocs\Repos\ironmeta\Tests\UnitTests\CSC

C# library documentation from XML tags

The C# library contains lots of useful documentation, but this is not visible when debugging a target program. Is it possible to distribute documentation built from the tags, or perhaps a debug version of the software in which the source code can be browsed?

Add an MSBuild task

An MSBuild task would make it easy to add IronMeta to the build process, including understanding dependencies between .ironmeta and generated .cs.

This would avoid the problem where someone changes (for example) the name of an AST node class, but forgets to rerun ironmeta.

Here's a first stab at an MSBuild Task:

using IronMeta.Generator;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public class IronMetaTask : Task
{
    public override bool Execute()
    {
        var result = CSharpShell.Process(Input, Output, Namespace, Force);

        if (result.Success)
        {
            return true;
        }
        else
        {
            Log.LogError(result.Error);

            return false;
        }
    }

    public bool Force { get; set; }

    public string Namespace { get; set; }

    public string Output { get; set; }

    [Required]
    public string Input { get; set; }
}
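
The dependency between a .ironmeta file and its generated .cs file, mentioned above, could be approximated with a timestamp comparison. The following is only a sketch: the class and method names are hypothetical, and nothing here is taken from IronMeta itself.

```csharp
using System;
using System.IO;

public static class StalenessCheck
{
    // Hypothetical helper: returns true when the generated file is missing
    // or was last written before the grammar file was last written.
    public static bool NeedsRegeneration(string grammarPath, string generatedPath)
    {
        if (!File.Exists(generatedPath))
        {
            return true;
        }

        return File.GetLastWriteTimeUtc(grammarPath) > File.GetLastWriteTimeUtc(generatedPath);
    }
}
```

A task could skip the generation step when this returns false and Force is not set. MSBuild's own Inputs/Outputs attributes on a Target offer a similar check without custom code.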

Issues with special characters in regexp.

I have a lot of trouble getting IronMeta to work as expected with regular expressions.
Issue cases:

  • \s does not match \n and some other whitespace characters -> this is an issue in verophyle.regexp, not in IronMeta
  • CustomString = /[^\"]/; This works, but gets weird syntax highlighting in VS Code if you set the language to C#, because the " starts a new string.
  • /REGEXPCODE/* is legal in the grammar, but makes everything after it display as a comment in VS Code with C# syntax highlighting
  • it is not possible to have a newline \n or tab \t inside a regexp.

The reason for the latter is the following:
ws = /[\t \n\r]*/; This is my rule to match zero or more whitespace characters in the .ironmeta file. This gets parsed into new Verophyle.Regexp.StringRegexp(@"[\t \n\r]*") which looks fine at first. Actually it does not work due to the @ in front of the string which disables the tab and newline escapes.
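
The effect of the @ prefix can be seen in plain C#, independent of IronMeta. This small sketch (the class name is made up for illustration) compares the verbatim literal from the generated code with the equivalent regular literal. Whether Verophyle.Regexp interprets the two-character sequence \t inside a pattern is not shown here; the issue above reports that it does not.

```csharp
using System;

public static class VerbatimStringDemo
{
    // Verbatim literal: each backslash escape stays as two separate characters.
    public const string Verbatim = @"[\t \n\r]*";

    // Regular literal: \t, \n and \r each become one control character.
    public const string Regular = "[\t \n\r]*";

    public static void Main()
    {
        Console.WriteLine(Verbatim.Length);              // 10
        Console.WriteLine(Regular.Length);               // 7
        Console.WriteLine(Verbatim.IndexOf('\t') >= 0);  // False
        Console.WriteLine(Regular.IndexOf('\t') >= 0);   // True
    }
}
```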

Proposal:
Move away from the /REGEXPCODE/ syntax and use a C# string with a special prefix that marks it as a regexp, e.g. §"REGEXPCODE", which would give ws = §"[\t \n\r]*"; for my ws rule. This has the benefit of being interpreted as a string, so \t, \r and \n should work as intended, and it is also displayed nicely in VS Code with C# syntax highlighting.
If this is not to your liking, another way to input special characters is needed. One option would be to allow special characters (especially invisible ones like tab and newline) to be entered via a Unicode escape, which could look something like this: \u12345

item template for vs

Please add an item template, with the custom tool and assembly built in, to the VS extension.

expected eof or identbody

hi,

I have the following problem: "expected EOL or IdentBody" on line 14, the str part.
When I comment out the values part, it compiles. So what is wrong with my regex? It is a valid regex.
My grammar:

`
using System;
using System.Linq;
using IronMeta.Matcher;

ironmeta Function<char, string>
{
Expression = name "(" args ")";

name = /[_a-zA-Z][_a-zA-Z0-9]*/;
args = value | value ",";
value = str | integer | floating | hex;

//values
str = /"((?:\\.|[^"\\])*)"/;
integer = /[-+]?[0-9]+/;
hex = /0x[0-9A-Fa-f]+/;
floating = /[-+]?[0-9]*\.?[0-9]*/;

}
`

What am I doing wrong?

build ast

hi,

is it possible to build an ast from the parsing result?

Calculator Example

a) The calculator example is broken...

Will successfully parse '5+3' giving the answer '8'...

But will also successfully parse '5+' giving the answer '5'.

The parser returns a 'MatchResult' with an error condition 'not enough arguments;expected DecimalDigit or WS' which is correct but the 'Success' value is set to 'true' and not 'false'.

The match is not a success so why is it being reported as a success???

b) Also for '5+3' there is also the same error condition... just a different errorindex...

c) As an aside: Why is there no parse tree being constructed and returned as part of the 'MatchResult'? Does this have to be constructed manually using actions?
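
Regarding (a): until the reported Success flag is clarified, a caller can guard against partial matches by also checking how much input was consumed. The sketch below is a hypothetical helper that takes the values of MatchResult.Success and MatchResult.NextIndex as plain arguments so that it stays self-contained. It assumes NextIndex is the index just past the last consumed item, and the NextIndex value used for '5+' is illustrative, not measured.

```csharp
using System;

public static class MatchChecks
{
    // Hypothetical helper: a parse counts as complete only when the matcher
    // reports success AND the next unread index equals the input length.
    public static bool ConsumedAllInput(bool success, int nextIndex, int inputLength)
    {
        return success && nextIndex == inputLength;
    }

    public static void Main()
    {
        // Illustrative values only: "5+3" fully consumed, "5+" stopping after the "5".
        Console.WriteLine(ConsumedAllInput(true, 3, "5+3".Length)); // True
        Console.WriteLine(ConsumedAllInput(true, 1, "5+".Length));  // False
    }
}
```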

Can't pass constructed rules as arguments to rules

Given a rule that takes another as an argument:

Statement :indentation = indentation SomeOtherThing;

I would like to call this in some way that allows me to compute on the value :indentation. For example:

BlockBody :indentation = Statement('\t' indentation)*:statements;
// or
AlternateBlockBodyRule :indentation = Statement(Indent(indentation))*:statements;
Indent :indentation = '\t' indentation;

Neither of these work. The generator fails to parse my intent, and just passes null as the argument to Statement, which obviously fails.

extension does not work

I want to use your generator. I created a file Grammar.ironmeta with Custom Tool set to IronMetaGenerator, but nothing happens.

Regexes are undocumented and non-standard

So there is support for regex in the form /regex/ but not as per .NET. In particular, the characters +-|* have to be escaped when used inside []
So what flavours of regex are used exactly?

General Question: Parse anything

hi, i would parse a block with body that can be anything, like this:
xml { }

this works fine but when i want to try some body content with curly braces it fails:
json { "obj": { "key": true } }

my rule for the body: [^}]*
I've tried using .* instead, but it matches the closing curly brace of the root block.

how could i solve this?
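
The underlying difficulty is that nested braces need counting (or recursion), which a single character class like [^}]* cannot express. The following C# sketch shows the counting idea outside of IronMeta; the names are hypothetical, and it deliberately ignores braces that appear inside quoted strings. In a grammar, the same idea would presumably be a body rule that refers to itself for each nested { ... } pair.

```csharp
using System;

public static class BraceScanner
{
    // Returns the index of the '}' that closes the '{' at openIndex,
    // or -1 if the braces are unbalanced. Braces inside string literals
    // are NOT treated specially in this sketch.
    public static int FindMatchingBrace(string text, int openIndex)
    {
        if (openIndex < 0 || openIndex >= text.Length || text[openIndex] != '{')
        {
            throw new ArgumentException("openIndex must point at '{'", nameof(openIndex));
        }

        int depth = 0;
        for (int i = openIndex; i < text.Length; i++)
        {
            if (text[i] == '{')
            {
                depth++;
            }
            else if (text[i] == '}' && --depth == 0)
            {
                return i;
            }
        }

        return -1;
    }
}
```

For the input json { "obj": { "key": true } }, calling FindMatchingBrace with the index of the first { returns the index of the final }, not the inner one.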

Concurrency issues when using regex

Without regular expressions, multiple instances of the generated parser can be used concurrently. When regexps are used, they are declared as 'static' but are in fact mutable objects, and the _ParseRegexp method changes the state of those objects. This prevents parsers that use regular expressions from being used concurrently.

Please change parser generation so that regexps are not static

Nested matchers

Hi, we've been using IronMeta in production for a while now, and are pretty happy with it (even more after #24). Thanks for the project!

Currently we are looking towards using it for named entity recognition (NER) task together with general pattern matching.

Consider, for example, the phrase "I want ten apples". We'd like to have a matcher that would answer whether the phrase matches the pattern "I want <N> apples", AND would return the value of N as an int.

It's clear that we could write a specific grammar for this task, but consider a more complex example: I want <N> <Fruit> or I want <N> <Fruit> delivered at <Address> on <DayOrDate>.

It's still possible to write a specific grammar for every example, but it would cause huge duplication of code for specific entity matchers.

For example we already have a matcher that could transform words into integers ("ten thousands" -> 10000), so it would be great to have a way of re-using it without copy-pasting it as a part of specific grammar.

I imagine it to be something like this:

ironmeta IntMatcher<string, int>: Matcher<string, int>
{
...
}

enum Fruits
{
 Orange,
 Apple
}

ironmeta FruitsMatcher<string, Fruits>: Matcher<string, Fruits>
{
...
}

class Result
{
   int number;
   Fruits fruit;
}

ironmeta MainMatcher<string, Result>: Matcher<string, Result>
{
  Expression = "I" "want" IntMatcher:n FruitsMatcher:fr -> { return new Result() { number = n, fruit = fr }; };
}

What do you think? Is it possible to implement something like this? Or maybe there are simpler ways?

Unit testing example?

I am porting my project to .NET Standard and it is important to me to be able to unit test each grammar rule individually.

Can you please provide an example of calling a specific rule within a full grammar and verifying its output using either XUnit or similar?

Thanks in advance!

Problems with README documentation

As best I can tell the class CharMatcher doesn't exist in the library?

// IronMeta Calculator Example

using System;
using System.Linq;

ironmeta Calc<char, int> : IronMeta.Matcher.CharMatcher<int>
{
    Expression = Additive;

more docs/examples?

I might be missing something obvious, but the documentation would greatly benefit from the set of "building blocks", e.g. "how to match quoted string with escapes", "comma-separated list", "how to match right-associative infix operators" etc.

The custom tool "IronMetaGenerator" failed.

I am on Visual Studio 2017 v15.9.20. Right-clicking on the resource file x.ironmeta > Run Custom Tool results in the following error message:

The custom tool 'IronMetaGenerator' failed. Could not load file or assembly 'Microsoft.VisualStudio.Shell.15.0, Version=16.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

I've installed the NuGet package and the Visual Studio Extension in the project. Is this something that can be fixed?

note: IronMeta.App.Exe appears to be working.

Double Quotes in Regex

Hi, I have trouble getting a Regex to work which should match double quotes.

Example: CustomString = /[^"]*/; -> it should match any char from one " to the other ". (Note that the surrounding double quotes are in the parent expression. This actually just matches everything that is not a double quote)

As far as I understand, you are using https://github.com/verophyle/regexp for your regex and marking a regex with /..../ instead of new Regex("...."). This works just fine for everything BUT escaped or unescaped double quotes. CustomString = /[^\"]*/; also does not work and will cause a syntax error (i.e. the EOL message).

This also does not work:
CustomString = /[^\u0022]/; // this encodes unicode double quotes
or
CustomString = /[^\u{0022}]/; // this encodes unicode double quotes

I think letting the user escape the double quotes would be the more consistent approach, or if that is not an option, especially list the double quotes and how to work around them in a regex in the documentation.

Missing /tools folder in releases

It seems that nuget releases after 4.4.1 are missing the /tools folder, and therefore no IronMeta.App.exe is available. Is this an intended change?

It seems that it should still be possible to build .ironmeta files with the MSBuild task. Is this now the recommended way to use IronMeta?

Regex matches broken?

I am getting results from regex matches that I don't understand. In all cases, I'd expect the program below to output the same thing for Foo that it does for Bar, and to match exactly 6 characters every time.

Ironmeta file:

ironmeta Test<char, char>
{
    Foo = /[^\r\n]+/ -> { return new string(_IM_Result.Inputs.ToArray()); };
    Bar = (~('\r' | '\n').)+ -> { return new string(_IM_Result.Inputs.ToArray()); };
}

Test harness:

    class Program
    {
        static void PrintMatch(IronMeta.Matcher.MatchResult<char, char> m)
        {
            Console.WriteLine($"{m.Success} {m.NextIndex}");
            if (m.Success)
            {
                Console.WriteLine(string.Join(',', m.Results.Select(c => (int)c)));
            }
        }
        static void Compare(string s)
        {
            var p1 = new Test();
            var m1 = p1.GetMatch(s, p1.Foo);
            Console.WriteLine("foo:");
            PrintMatch(m1);
            var p2 = new Test();
            var m2 = p2.GetMatch(s, p2.Bar);
            Console.WriteLine("bar:");
            PrintMatch(m2);
        }
        static void Main(string[] args)
        {
            Compare("Hello!");
            Compare("Hello!\n");
            Compare("Hello!\nWorld!\n");
        }
    }

Results:

foo:
True 6
72,101,108,108,111,33
bar:
True 6
72,101,108,108,111,33
foo:
True 7
72,101,108,108,111,33,10
bar:
True 6
72,101,108,108,111,33
foo:
False -1
bar:
True 6
72,101,108,108,111,33

Handling word-level regexes with `.*`

Hi, we are looking towards moving our custom word-level regex engine to IronMeta. Considering that at least for characters IronMeta supports regex-based rules, it seems like it should work out of the box.

Here's the issue I'm currently looking at. Let's consider the following regex: .* "test" .*. It should match all phrases containing the word test, e.g. ["test"], ["some", "test"].
So a naive translation into IronMeta matcher would look like this:

ironmeta MatcherSpecific<string, bool>: Matcher<string, bool>
{
    Pattern_0 =  .* "test" .*;
    Expression = Pattern_0;
}

and it does not match any of the test phrases (giving the error: expected end of file). This is somewhat to be expected, considering the greedy nature of the quantification operators in IronMeta.

So I kinda found two ways to make matcher for this sample regex:

ironmeta MatcherSpecific<string, bool>: Matcher<string, bool>
{
    Pattern_0 =  "test" .*;
    Pattern_0 =  . Pattern_0;
    Expression = Pattern_0;
}

ironmeta MatcherSpecific<string, bool>: Matcher<string, bool>
{
    Pattern_0 =  (~"test" .)* "test" .*;
    Expression = Pattern_0;
}

But it's unclear to me how to generalize these solutions to more complex real-world regex cases. Could you please suggest something?

I wonder if the regular regexes for characters are also working this way in IronMeta?
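
To make the difference concrete, here is a plain C# sketch of the two behaviours over a word array. It assumes, as the issue suggests, that a greedy * never gives input back once it has consumed it. Both method names are made up for illustration, and this is not how IronMeta is implemented.

```csharp
using System;

public static class GreedyDemo
{
    // Models  .* "test" .*  with a greedy, non-backtracking star:
    // the first .* swallows every word, so "test" can never match afterwards.
    public static bool GreedyThenTest(string[] words)
    {
        int i = 0;
        while (i < words.Length)
        {
            i++; // .* consumes one more word
        }

        return i < words.Length && words[i] == "test"; // always false
    }

    // Models  (~"test" .)* "test" .*  : the negative lookahead stops the
    // loop in front of the first "test", so the literal can then match.
    public static bool GuardedThenTest(string[] words)
    {
        int i = 0;
        while (i < words.Length && words[i] != "test")
        {
            i++; // (~"test" .) consumes a word that is not "test"
        }

        // "test" must be the next word; the trailing .* accepts whatever follows.
        return i < words.Length;
    }
}
```

Generalising the second form amounts to guarding each repetition with a lookahead for whatever must follow it, which is the usual way to translate backtracking regex quantifiers into a PEG-style grammar.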
