benjamin-hodgson / pidgin
A lightweight and fast parsing library for C#.
Home Page: https://www.benjamin.pizza/Pidgin
License: MIT License
Background: I need a parser that can consume an ISO-8601 string into a DateTime. I'm trying to avoid reinventing the wheel for all the valid date and time formats, so I'm leaning on DateTime.Parse(). I need to parse out a string with Pidgin to send to DateTime.Parse(), but I don't actually know whether that string is fully valid until I feed it through that parse method (code below).
Unfortunately there doesn't seem to be a way to indicate failure from within the "combine parser outputs" function that's passed to Map, so an invalid date string will cause parsing to fail with a System.FormatException. Of course there's TryParse(), but once I know parsing has failed, it doesn't seem like I can do anything with that info?
Is there a recommended way to handle this type of situation?
internal static readonly Parser<char, IFilterExpression> DateTimeLiteral =
SharedParsers.Token(
Map(
(year, rest) =>
{
var input = new string(year.ToArray()) + rest;
var result = DateTime.Parse(
input,
CultureInfo.InvariantCulture,
DateTimeStyles.RoundtripKind
);
return (IFilterExpression) new DateTimeLiteral(result);
},
Digit.Repeat(4),
Char('-')
.Then(
Try(Digit)
.Or(OneOf('-', ':', '.', 'T', 'Z'))
.ManyString(),
(head, tail) => head + tail
)
)
);
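One way to signal failure from the conversion step is to do it in Bind rather than Map: Bind chooses the next parser from the result so far, so it can return Parser<char>.Return on success or Parser<char>.Fail on failure. This is a sketch that assumes Pidgin's Bind, Parser<char>.Return and Parser<char>.Fail<T>(message); rawDate and dateTime are illustrative names, the raw-text grammar mirrors the question's, and it yields a DateTime rather than the question's IFilterExpression.

```csharp
using System;
using System.Globalization;
using Pidgin;
using static Pidgin.Parser;

// Raw text shaped like the question's grammar: four digits, '-', then digits and separators.
Parser<char, string> rawDate = Map(
    (year, rest) => string.Concat(year) + rest,
    Digit.Repeat(4),
    Parser.Char('-').Then(
        OneOf(Digit, OneOf('-', ':', '.', 'T', 'Z')).ManyString(),
        (dash, tail) => dash + tail));

// Bind picks the next parser from the raw text: Return on success, Fail on an invalid date.
Parser<char, DateTime> dateTime = rawDate.Bind(text =>
    DateTime.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind, out var value)
        ? Parser<char>.Return(value)
        : Parser<char>.Fail<DateTime>($"'{text}' is not a valid date/time"));
```

Because Bind runs after the raw text has been consumed, this failure counts as having consumed input, so an enclosing Or will only try other alternatives if the whole parser is wrapped in Try.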
Let's have the following (simplified) example:
static readonly Parser<char, IEnumerable<string>> Foo = OneOf(Map(c => "<>", Token('|')), Token(c => c != '|').ManyString()).Many();
void Main()
{
Foo.Parse("|bar|foo").Dump();
}
It fails with "Many() used with a parser which consumed no input". What's wrong with that?
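The error comes from the second alternative: Token(c => c != '|').ManyString() can succeed without consuming anything (for example at the end of the input), and Many() refuses to repeat a parser that makes no progress because it would loop forever. A sketch of one fix, assuming the goal is "<>" for each '|' plus the text between them, is to require at least one character with AtLeastOnceString():

```csharp
using System.Collections.Generic;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Both alternatives consume at least one character, so Many() always makes progress.
Parser<char, IEnumerable<string>> foo = OneOf(
    Parser.Char('|').ThenReturn("<>"),
    Token(c => c != '|').AtLeastOnceString()
).Many();
```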
I'm trying to parse a quoted string with an escaped quote in it. I've tried a few variations of this with no success:
private static readonly Parser<char, string> QuotedString =
(String("\\\"").Or(Token(c => c != '"').ManyString())).Many().Select(System.String.Concat).Between(Quote);
Any ideas?
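Two things get in the way here: Token(c => c != '"').ManyString() can match an empty string (so Many() has nothing to make progress on), and it also swallows the backslash in front of an escaped quote. A sketch that reads one character at a time instead, assuming Pidgin's AnyCharExcept and treating a backslash that is not followed by a quote as an ordinary character; escapedQuote and quotedString are illustrative names:

```csharp
using Pidgin;
using static Pidgin.Parser;

var quote = Parser.Char('"');

// \" is an escaped quote. Try gives the backslash back if no quote follows it,
// so AnyCharExcept can then read it as a plain character.
var escapedQuote = Try(Parser.String("\\\"")).ThenReturn('"');

// Each step consumes exactly one logical character, so ManyString() always makes progress.
var quotedString = escapedQuote
    .Or(AnyCharExcept('"'))
    .ManyString()
    .Between(quote);
```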
(Sorry if I misunderstand the goal of this library.)
The parsers I found in the examples are all string-based. I'm wondering whether it is possible to use this library to build something that takes a Stream as input and outputs parsed items through events whenever new data arrives on the input stream.
The use case is that I want to connect the incoming data Stream from a SerialPort object to my parser, and I want it to emit parsed pieces of information to my GUI via events.
This throws an error because Any doesn't exist:
using NUnit.Framework;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
namespace Pidgin_Basic_Tests
{
public class Pidgin
{
[Test]
public void Any()
{
Assert.AreEqual('a', Any.ParseOrThrow("a"));
Assert.AreEqual('b', Any.ParseOrThrow("b"));
}
}
}
There is, however, an Any(), but it is a method and returns void.
EDIT: Renaming my Pidgin file and class to PidginTest doesn't seem to work.
EDIT2: So this:
using NUnit.Framework;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
namespace Pidgin_Basic_Tests
{
public class PidginTest
{
[Test]
public void Any()
{
Assert.AreEqual('a', Parser<char>.Any.ParseOrThrow("a"));
Assert.AreEqual('b', Parser<char>.Any.ParseOrThrow("b"));
}
}
}
works, and makes using Pidgin; count as a used directive, but the other two are still flagged as unnecessary.
EDIT3: Odd, it works fine in a different file. Maybe it's due to NUnit tests?
Hi,
we would like to use Pidgin to parse data received as a ReadOnlySequence<T>. I'm experimenting with an implementation in beho/Pidgin/read-only-sequence.
I was forced to fork this repository because ITokenStream, the essential extension point for creating new token streams, is marked internal. I would much rather use your build of Pidgin (and be able to update by simply updating the NuGet package) than have to merge new commits into a fork.
So the question is: is there any plan to make ITokenStream public for external consumers, or is it something you would be willing to consider? Or, from the opposite perspective, is there any reason preventing you from making this interface public?
Thanks for great work!
beho
I would love to be able to resume parsing once I finish parsing something. For example, it would be nice to do something like this:
IEnumerable<Token> ParseStream(TextReader stream) {
var result = this.Parser.Parse(stream);
while (result.Success) {
yield return result.Value;
result = result.ParseAgain(); // This would probably be implemented differently
}
}
This would allow us to parse from an input stream as needed, without needing to store all the output tokens in memory. For example, we could process a 2GB+ file and write the output into another file without using very much memory.
Sprache has an equivalent to this, but does not support parsing from a stream:
IEnumerable<Token> ParseStream(string input) {
var result = this.Parser.TryParse(input);
while (result.WasSuccessful) {
yield return result.Value;
result = this.Parser(result.Remainder);
}
}
One possible approach is to keep a reference to the ParseState<T> inside the object that Parse returns.
First, thank you for this library; I have some monster regexes that I can make more maintainable with this =D
But I'm having some difficulty composing a parser that emits a complex type.
I have a string like this
Bob Saget : (1234) 'Actor'
And an object like this
class Person
{
public string Name { get; set; }
public int Id { get; set; }
public string Title { get; set; }
}
And a few parsers defined like this
var Colon = Char(':');
var SingleQuote = Char('\'');
var Name = OneOf(Letter, Whitespace).ManyString();
var IdNumber = Num.Between(Char('('), Char(')'));
var Title = OneOf(Letter, Whitespace).Many().Between(SingleQuote).Select(chars => string.Concat(chars));
But I am having a hard time chaining my primitive parsers into a single complex parser without losing the data from early stages in the pipeline.
Ideally I would like a Result<char, Person>. To accomplish this I have awkwardly 'mapped' them together like this:
Person MakePerson(string name, char colon, char space1, int id, char space2, string title)
{
return new Person {
Name = name,
Id = id,
Title = title
};
}
var personParser = Map(MakePerson, Name, Colon, Whitespace, IdNumber, Whitespace, Title);
It works, but if I had 10 or more properties or discards, I would be out of luck using Map.
The given sequencing primitives (Then and Before) assume one of the two results is not useful, so if I don't have throwaway characters to burn, I'm not sure how to use them. I might just be bad at parser combinators, but this has me kind of stumped.
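Pidgin parsers can also be sequenced with C# query syntax (from ... in ... select), where every earlier result stays in scope until the final select, so there is no fixed limit like Map's. (Then also has an overload taking a result selector, Then(parser, (a, b) => ...), which keeps both results.) A sketch using the question's parsers as locals, with Title collected via ManyString; the tuple it returns stands in for new Person { ... }:

```csharp
using Pidgin;
using static Pidgin.Parser;

var colon = Parser.Char(':');
var singleQuote = Parser.Char('\'');
var name = OneOf(Letter, Whitespace).ManyString();
var idNumber = Num.Between(Parser.Char('('), Parser.Char(')'));
var title = OneOf(Letter, Whitespace).ManyString().Between(singleQuote);

// Each `from` runs one parser in sequence; all earlier results remain available in `select`.
// The tuple stands in for `new Person { Name = ..., Id = ..., Title = ... }`.
var person =
    from n in name
    from c in colon
    from ws1 in SkipWhitespaces
    from id in idNumber
    from ws2 in SkipWhitespaces
    from t in title
    select (Name: n.Trim(), Id: id, Title: t); // Trim: name also picks up the space before ':'
```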
I have worked around this by making a fluent builder that flows state, very similar to an F# computation expression, like this:
public static class Builder
{
public static Builder<TToken, TComplex, TToken> Create<TToken, TComplex>(IEnumerable<TToken> tokenStream, TComplex item) => new Builder<TToken, TComplex, TToken>(tokenStream, item);
}
public struct Builder<TToken, TComplex, TLast>
{
public IEnumerator<TToken> Enumerator { get; }
public TComplex Item { get; }
public Result<TToken, TLast>? LastResult { get; }
public Builder(IEnumerable<TToken> tokenStream, TComplex item)
{
Enumerator = tokenStream.GetEnumerator();
Item = item;
LastResult = null;
}
private Builder(IEnumerator<TToken> enumerator, TComplex item, Result<TToken, TLast> lastResult)
{
Enumerator = enumerator;
Item = item;
LastResult = lastResult;
}
private bool ResultOk => LastResult == null || LastResult.Value.Success;
public Builder<TToken, TComplex, T2> Capture<T2>(Parser<TToken, T2> parser, Action<TComplex, T2> assignment)
{
if (!ResultOk)
return new Builder<TToken, TComplex, T2>(Enumerator, Item, new Result<TToken, T2>());
var newResult = parser.Parse(Enumerator);
if (newResult.Success)
assignment(Item, newResult.Value);
return new Builder<TToken, TComplex, T2>(Enumerator, Item, newResult);
}
public Builder<TToken, TComplex, T2> Skip<T2>(Parser<TToken, T2> parser)
{
if (!ResultOk)
return new Builder<TToken, TComplex, T2>(Enumerator, Item, new Result<TToken, T2>());
return new Builder<TToken, TComplex, T2>(Enumerator, Item, parser.Parse(Enumerator));
}
public Result<TToken, TComplex> Done
{
get
{
Parser<TToken, TComplex> ret;
if (ResultOk)
ret = Parser<TToken>.Return(Item);
else
ret = Parser<TToken>.Fail<TComplex>();
return ret.Parse(Enumerable.Empty<TToken>());
}
}
}
Which I then use like this:
var result =
Builder
.Create(testSubject, new Person())
.Capture(Name, (e, name) => e.Name = name)
.Skip(Colon)
.Skip(Whitespace)
.Capture(IdNumber, (e, id) => e.Id = id)
.Skip(Whitespace)
.Capture(Title, (e, title) => e.Title = title)
.Done;
My questions are
It would be nice to be able to get, as a result, a Span over the original string/buffer that covers the matched extent.
It is quite a lot of code to assemble some literal types such as double, DateTimeOffset or TimeSpan, especially if you're going to support all optional parts and variations.
The framework already has some very efficient parsers for those and it's relatively easy to just check if the input matches a format you're supporting, then letting the framework do the actual parsing.
Predicate.SpanResult<T>(Func<Span<string>, T> selector)
Want to parse a phone number literal that should look like (0xx) xx xx and then keep it as a string?
String("(0")
.Then(Digit.Repeat(2))
.Then(String(") "))
.Then(Digit.Repeat(2))
.Then(Char(' '))
.Then(Digit.Repeat(2))
.SpanResult(s => s.ToString()); // Result is the full matched string
Want to parse a decimal?
Char('-').Optional()
.Then(Digit.SkipAtLeastOnce())
.Then(Char('.').Then(Digit.SkipMany()).Optional())
.SpanResult(s => decimal.Parse(s)); // Result is the parsed decimal number
Compare that with the work required to build up the decimal yourself.
Based on the idea that sometimes you just want to grab the matched substring, I think a regex parser building block that returns a Span when it matches would make the scenarios above even simpler.
var phoneParser = Regex(@"\(0\d\d\) \d\d \d\d");
var decimalParser = Regex(@"-?\d+(\.\d*)?", s => decimal.Parse(s));
When parsing an actual XML file with the XmlParser from the sample code, it fails.
What should I do to parse text containing '\r' and '\n'?
Does the relevant code exist?
XML input:
public void Test()
{
string xmlContents = "<?xml version=\"1.0\"?>\r\n" +
"<note>\r\n" +
"<from>Jani</from>\r\n" +
"<to>Tove</to>\r\n" +
"<message>Norwegian: aa. French: eee</message>\r\n" +
"</note>";
var result = XmlParser.Parse(xmlContents);
}
result's Success is false.
Sample code used: XmlParser.cs in the Pidgin.Examples project
Thanks, good day~
Hi,
Sorry if this isn't the right place for questions, but how would you modify your XmlParser example so that it could parse <foo>some text</foo>? Basically, adding an InnerText or Text property?
I'd like to parse something like
"Hi I'm ""Joonhwan"""
to
{quote} + {string:Hi I'm "Joonhwan"} + {quote}
Please note that the 'Joonhwan' part is quoted inside the string. Yes, weird, but I need this.
The quote mark itself is escaped by another quote.
How do I create a parser for this?
var result = Sequence('"', '"').Select(_ => '"').Or(Token(c => c!='"')).Many().Between(Char('"')).Parse("\"\"\"abcd\"\"1234.556\"");
That's what I tried, but it failed.
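The attempt breaks at the closing quote: Sequence('"', '"') consumes that quote before discovering there is no second one, and Or only falls through to its next alternative when the first one failed without consuming input. A sketch that wraps the doubled quote in Try, assuming Pidgin's AnyCharExcept; doubledQuote and csvStyleString are illustrative names:

```csharp
using Pidgin;
using static Pidgin.Parser;

var quote = Parser.Char('"');

// "" inside the string stands for one literal quote. At the closing quote, String("\"\"")
// consumes one '"' and then fails; Try gives it back so Between can read the closing quote.
var doubledQuote = Try(Parser.String("\"\"")).ThenReturn('"');

var csvStyleString = doubledQuote
    .Or(AnyCharExcept('"'))
    .ManyString()
    .Between(quote);
```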
Hi Benjamin, how would one go about parsing a piece of code that defines new operators?
A motivating example is e.g. Prolog where one can:
:- op(200, xfy, =>>).
40 =>> 20.
I can see that ExpressionParser.Build takes a collection of operators. Would it work to update the contents of the collection as parsing progresses and we encounter new operator definitions?
How would I parse a uint?
//Sounds like it would repeat until Letter or Symbol and then stop, but it wants a digit, letter or symbol at the end.
public static Parser<char, uint> uInt { get; protected set; } = Digit.AtLeastOnceUntil(Letter.Or(Symbol)).Cast<uint>();
EDIT:
I also tried using a cast hack, but it fails:
public static Parser<char, uint> uInt { get; protected set; } = Digit.AtLeastOnce().Cast<uint>().Labelled("uint");
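Cast<uint>() is a runtime type cast of the parser's result, not a numeric conversion, so it cannot turn the characters collected by AtLeastOnce() into a number. AtLeastOnceUntil fails at the end of the input because neither another digit nor the Letter/Symbol terminator is there. A sketch that collects the digits as a string and converts them, assuming Pidgin's Bind, Parser<char>.Return and Parser<char>.Fail<T>(message), so that a value too large for uint becomes a parse failure rather than an OverflowException:

```csharp
using System.Globalization;
using Pidgin;
using static Pidgin.Parser;

// Digits only (NumberStyles.None: no sign, no whitespace), converted after they are collected.
Parser<char, uint> uInt = Digit.AtLeastOnceString()
    .Bind(digits => uint.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out var value)
        ? Parser<char>.Return(value)
        : Parser<char>.Fail<uint>("uint out of range"))
    .Labelled("uint");
```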
Can you change the name of this project?
There has already been a Pidgin for many years (formerly named Gaim):
Hi,
In the expression parser, a (partially) invalid string takes a long time to fail. Ultimately the failure is reasonable, but the time it takes to fail may not be. I am looking for advice on structuring multi-character sequences; is there a better way?
Having lots of fun with this excellent tool!
Applicable expression parsers:
private static readonly Parser<char, Func<IExpr, IExpr, IExpr>> EqualTo
= Binary(Tok("=").Then(String("=")).ThenReturn(BinaryOperatorType.EqualTo)); // "=="
Input string
1=2
Expected
1==2
Exception message (reasonable):
Exception has occurred: CLR/Pidgin.ParseException
Exception thrown: 'Pidgin.ParseException' in Pidgin.dll: 'Parse error.
unexpected 2
expected expression
at line 1, col 3'
at Pidgin.ParserExtensions.GetValueOrThrow[TToken,T](Result`2 result)
at Pidgin.ParserExtensions.ParseOrThrow[T](Parser`2 parser, String input, Func`3 calculatePos)
at ApplicationSupport.Parsers.ExprParser.ParseOrThrow(String input) in /Users/mustik/Projects/ReservationCheck/ReservationCheck/Support/Parsers/ExprParser.cs:line 155
Hello.
I tried running XmlParser in Pidgin.Examples and found that it does not parse nested tags; it only parses simple tags.
Can you please update the example or tell me how to fix it? Thanks
Am I missing something?
Visual Studio 2017 can't seem to find Assert.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
namespace ScriptingLanguage
{
class Program
{
static void Main(string[] args)
{
Assert.AreEqual('a', Any.ParseOrThrow("a"));
}
}
}
I've been trying to implement chained function calls, i.e. foo(bar)(baz).
It seemed like the obvious way to do this is to use a postfix operator but I haven't managed to get it to work. The first call will parse (foo(bar)), but the operator won't chain. I looked at the source for the expression builder, and it looks like, unlike with binary operators, unary operators are not recursively applied and aggregated.
Should the unary operators be recursively applied? Can you suggest how I should get this to work?
In your conversation with kswaroop1 re: his "expression example" pull request you suggest doing exactly this for function call operators (including the foo(bar)(baz) example), but there isn't enough of an example for me to understand how to make this work.
Is this intended behaviour?
var p1 = Fail<Unit>().Labelled("fail1");
var p2 = p1.Where(r => true);
var p3 = p2.IgnoreResult();
Console.WriteLine(p1.Parse("abc").Error.Expected.Single().Label); // fail1
Console.WriteLine(p2.Parse("abc").Error.Expected.Single().Label); // fail1
Console.WriteLine(p3.Parse("abc").Error.Expected.Single().Label); // result satisfying assertion
It was certainly unexpected, since I didn't see why ignoring the result would then change the failure expectation.
Hi, thanks for such a great library. I noticed you have both DecimalNum and Num returning the same int type. Is this a "copy-paste feature"?
Is there any parser that parses a decimal number and returns a double?
I am trying to parse a sequence of numbers like below:
114365.0 121429.0 6.2 4.0 30357.0 117165.0
That's fine if not; I will write one myself, but as you have created DecimalNum, I thought it was worth asking.
//
// Summary:
// A parser which parses a base-10 integer with an optional sign. The resulting
// int is not checked for overflow.
//
// Returns:
// A parser which parses a base-10 integer with an optional sign
public static Parser<char, int> DecimalNum { get; }
//
// Summary:
// A parser which parses a base-10 integer with an optional sign. The resulting
// int is not checked for overflow.
//
// Returns:
// A parser which parses a base-10 integer with an optional sign
public static Parser<char, int> Num { get; }
Thanks in Advance,
Paulo.
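A sketch for the whitespace-separated numbers, assuming plain unsigned decimals like those in the sample (no sign, no exponent): collect the literal as a string and let double.Parse do the conversion. number and numbers are illustrative names. Newer Pidgin releases may also ship a ready-made floating-point parser (Real); it is worth checking the version you use before rolling your own.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// An unsigned decimal literal: digits, optionally followed by '.' and more digits.
Parser<char, double> number = Digit.AtLeastOnceString()
    .Then(
        Parser.Char('.').Then(Digit.AtLeastOnceString(), (dot, digits) => dot + digits).Optional(),
        (whole, frac) => whole + frac.GetValueOrDefault(""))
    .Select(text => double.Parse(text, CultureInfo.InvariantCulture));

// Skip leading whitespace, then read numbers, skipping the whitespace after each one.
// Every repetition consumes at least one digit, so Many() always makes progress.
Parser<char, IEnumerable<double>> numbers =
    SkipWhitespaces.Then(number.Before(SkipWhitespaces).Many()).Before(End);
```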
I'm experimenting with your parser (Celin.AIS.Data) but got stuck on a bunch of '... is defined in an assembly that is not referenced' errors. After hours of googling and trial and error, I finally pulled the Pidgin source and saw that its target is netstandard1.3. After building it with netstandard2.0 as the target, so it matches my library, the errors disappeared.
Is there a way to do this?
Parser<char, IEnumerable<char>> While(Predicate<string> pred) => Any().Until(!pred.Invoke(<UNKNOWNSTRING>));
Parser<char, IEnumerable<char>> Until(Predicate<string> pred) => Any().Until(pred.Invoke(<UNKNOWNSTRING>));
I'm trying to parse a big and complex database log file. There are a lot of event types written to this log file, and each has its own syntax. Some of the events are just a single line with some properties, but others are more complicated and can even be nested.
I need to parse just some of the event types, because currently I do not care about the rest of the log file. I was able to get the event-type parsers to work, but I struggled a lot to make a parser that cherry-picks just a few of them and skips the rest of the input. I'm wondering how you would solve this problem.
Here is the problem in a nutshell:
using System;
using System.Collections.Generic;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
namespace parsertest
{
static class NoisyParser
{
private static readonly Parser<char, string> ComplexParserA = String("AA");
private static readonly Parser<char, string> ComplexParserB = String("BB");
private static readonly Parser<char, string> RealData =
ComplexParserA
.Or(ComplexParserB);
private static readonly Parser<char, Unit> Skip =
Any.SkipUntil(Lookahead(Try(RealData)));
private static readonly Parser<char, string> RealDataWithNoiseBefore =
from _ in Skip
from traceData in RealData
select traceData;
private static readonly Parser<char, IEnumerable<string>> RealDataWithNoise =
from traceData in RealDataWithNoiseBefore.Until(Not(Lookahead(Skip)))
from __ in Any.Until(End)
select traceData;
public static Result<char, IEnumerable<string>> Parse(string input) =>
RealDataWithNoise.Parse(input);
}
class Program
{
static void Main(string[] args)
{
var veryNoisyInput = "asdfask asdf ASA a BB asdkfjAAaAa";
// Should match: "BB" and the "AA" in "asdkfjAAaAa"
var parserResult = NoisyParser.Parse(veryNoisyInput);
if (parserResult.Success)
{
foreach (var match in parserResult.Value)
{
Console.WriteLine(match);
}
}
else
{
Console.WriteLine($"Error.EOF: {parserResult.Error.EOF}");
Console.WriteLine($"Error.ErrorPos: {parserResult.Error.ErrorPos}");
Console.WriteLine($"Error.Message: {parserResult.Error.Message}");
}
}
}
}
Is there a better way to solve this, I've got the feeling I'm missing something essential...
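One alternative, assuming each record you care about can be recognized at the position where it starts: at every position, try the real-data parser, and if it fails, consume a single character of noise. Noise is mapped to an empty string and filtered out afterwards. realData is a stand-in for ComplexParserA/ComplexParserB; wrapping each in Try is what lets a partial match such as the "A" in "AS" fall back to the noise branch.

```csharp
using System.Collections.Generic;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Stand-ins for ComplexParserA / ComplexParserB. Try makes a partial match give its input back.
Parser<char, string> realData = Try(Parser.String("AA")).Or(Try(Parser.String("BB")));

// At each position: a record we care about, or exactly one character of noise (mapped to "").
Parser<char, List<string>> realDataWithNoise =
    realData
        .Or(Any.ThenReturn(""))
        .Many()
        .Select(items => items.Where(s => s.Length > 0).ToList())
        .Before(End);
```

This tries the real-data parsers at every character, which is simple but may be slow on a large log; if records begin with a recognizable prefix, a coarser noise parser (for example, one that skips to the next line) would reduce the work.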
Can we have some examples of how to make a scripting language?
Some things I would like to see would be...
Is there a way to add two parsers together?
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
using System;
using System.Collections.Generic;
using System.Text;
namespace Pidgin_Basic_Tests
{
public class TNumber
{
float value;
public static Parser<char, string> DotOp { get; protected set; } = String(".");
public static Parser<char, string> HexOp { get; protected set; } = String("0x");
public static Parser<char, string> DecOp { get; protected set; } = String("0d");
public static Parser<char, string> OctOp { get; protected set; } = String("0o");
public static Parser<char, string> BinOp { get; protected set; } = String("0b");
//trying to get either 0d+integer_number or integer_number+'.'+integer_number or integer_number
public static Parser<char, string> Dec { get; protected set; } = Try(DecOp.Or(DecimalNum.Cast<string>()));
}
}
I tried to use Many(), but it threw an exception:
System.InvalidOperationException: 'Many() used with a parser which consumed no input'
Reproduction Code:
var x = Try(WhitespaceString.Many()).Parse(" ");
More details:
var x = Try(WhitespaceString).Parse(" ");
works just fine.
However, if I want to skip whitespace or comments, I have to write something along the lines of var x = Try(WhitespaceString.Or(CommentString)).Parse(" ");. This won't quite do the trick yet, since it can't handle a string that contains both whitespace and comments. To solve that, I tried using .Many() to apply "the current parser zero or more times": var x = Try(WhitespaceString.Or(CommentString).Many()).Parse(" ");
But that results in an exception.
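WhitespaceString can succeed on an empty string, so Many() could repeat it forever without consuming anything, and Pidgin throws instead of looping. A sketch that skips trivia with parsers that always consume at least one character; the // line-comment syntax is a hypothetical stand-in for CommentString, and lineComment and skipTrivia are illustrative names:

```csharp
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Hypothetical comment syntax: "//" up to (not including) the end of the line.
// Try lets a lone '/' fail without consuming input.
Parser<char, Unit> lineComment =
    Try(Parser.String("//")).Then(AnyCharExcept('\n').SkipMany());

// Each alternative consumes at least one character, so SkipMany() always makes progress.
Parser<char, Unit> skipTrivia = OneOf(Whitespace.SkipAtLeastOnce(), lineComment).SkipMany();
```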
What is the right way to use Separated with a sequenced separator parser? The following code returns the original string.
private static readonly Parser<char, char> RBracket = Char(']');
private static readonly Parser<char, char> Asterisk = Char('*');
static readonly Parser<char, Unit> EndDocument =
RBracket.Then(Asterisk).Then(RBracket).ThenReturn(Unit.Value);
var test = Any
.ManyString()
.Separated(EndDocument)
.Parse("fd]*]vb]*]gf");
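Any.ManyString() consumes the rest of the input, separators included, so Separated never gets to see "]*]". A sketch that stops each item just before the separator (peeking with Lookahead) or at the end of input, assuming Pidgin's Until, Lookahead and Try; itemEnd, item and documents are illustrative names:

```csharp
using System.Collections.Generic;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

Parser<char, Unit> endDocument =
    Parser.Char(']').Then(Parser.Char('*')).Then(Parser.Char(']')).ThenReturn(Unit.Value);

// An item ends just before the next "]*]" (Lookahead leaves it for the separator) or at end of input.
// Try makes a lone ']' inside an item fail without consuming, so Any can read it.
Parser<char, Unit> itemEnd = Lookahead(Try(endDocument)).Or(End);
Parser<char, string> item = Any.Until(itemEnd).Select(cs => string.Concat(cs));

Parser<char, IEnumerable<string>> documents = item.Separated(Try(endDocument));
```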
This test:
[Theory]
[InlineData("<>", 1)]
[InlineData("<", 2)]
[InlineData(">", 3)]
[InlineData("=", 4)]
public void TryParse(string source, int expected)
{
Parser<char, int> parser =
Parser.Char('<').Then(_ => Parser.Char('>')).ThenReturn(1)
.Or(Parser.Char('<').ThenReturn(2))
.Or(Parser.Char('>').ThenReturn(3))
.Or(Parser.Char('=').ThenReturn(4));
Assert.Equal(expected, parser.ParseOrThrow(source));
}
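For the "<" case, the first alternative consumes '<' and then fails on the missing '>'. Or only tries the next alternative when the previous one failed without consuming input, so the '<' branch never runs. A sketch that wraps the two-character alternative in Try so it backtracks:

```csharp
using Pidgin;
using static Pidgin.Parser;

// Try restores the consumed '<' when '>' is missing, so the next alternative can run.
Parser<char, int> parser =
    Try(Parser.Char('<').Then(Parser.Char('>'))).ThenReturn(1)
        .Or(Parser.Char('<').ThenReturn(2))
        .Or(Parser.Char('>').ThenReturn(3))
        .Or(Parser.Char('=').ThenReturn(4));
```

Alternatively, factoring out the shared '<' (parse it once, then optionally a '>') avoids the backtracking altogether.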
fails with the following error:
Failed PidginTests.PidginTests.TryParse(source: "<", expected: 2)
Error Message:
Pidgin.ParseException : Parse error.
unexpected EOF
expected "<"
at line 1, col 2
Stack Trace:
at Pidgin.ParserExtensions.GetValueOrThrow[TToken,T](Result`2 result)
at Pidgin.ParserExtensions.ParseOrThrow[T](Parser`2 parser, String input, Func`3 calculatePos)
at PidginTests.PidginTests.TryParse(String source, Int32 expected)
Other tests in theory passed.
Verified on Windows 10 and MacOS.
Hi,
First time using this library, so I'm a novice and hence this might be a stupid question.
Problem
I want to write a parser that reads, for example, [64kB, 5MB) and produces the resulting tuple (64*1024, 5*1024^2) of integers.
So far
using System;
using Pidgin;
using static Pidgin.Parser;
namespace Parser
{
public class Range
{
public int Lower { get; set; }
public int Upper { get; set; }
}
public static class RangeParser
{
public static readonly char[] Suffix =
{
'k', 'M', 'G'
};
private static readonly Parser<char, char> LBracket = Char('[');
private static readonly Parser<char, char> RBracket = Char(']');
private static readonly Parser<char, char> LParenthesis = Char('(');
private static readonly Parser<char, char> RParenthesis = Char(')');
private static readonly Parser<char, char> Comma = Char(',');
private static readonly Parser<char, char> Byte = Char('B');
private static readonly Parser<char, char> Kilo = Char('k');
private static readonly Parser<char, char> Mega = Char('M');
private static readonly Parser<char, char> Giga = Char('G');
private static readonly Parser<char, Range> Parser =
OneOf(LBracket, LParenthesis).Then(DecimalNum).Then(OneOf(Kilo, Mega, Giga))
.Then(Byte)
.Separated(Comma)
.Then(DecimalNum)
.Then(OneOf(Kilo, Mega, Giga))
.Then(Byte)
.Then(OneOf(RBracket, RParenthesis))
.Select(c => new Range());
public static int SuffixMap(char suffix)
{
switch(suffix)
{
case 'k': return 1000;
case 'M': return 1000 * 1000;
case 'G': return 1000 * 1000 * 1000;
default:
throw new ArgumentOutOfRangeException(nameof(suffix),
"The suffix is not supported. Valid values are [" + string.Join(",", Suffix) + "]");
}
}
public static Result<char, Range> Parse(string input) => Parser.Parse(input);
}
}
I can't understand how to use Map or the second argument to Then properly.
Thanks
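A sketch of how Map can combine several parsers: it runs them in sequence and passes all of their results to one function, so each piece of the range is parsed into a value and combined at the end. It assumes binary multipliers as in the problem statement (k = 1024; the question's SuffixMap uses 1000), uses long because 1024^3 times anything above 1 overflows int, and returns a (Lower, Upper) tuple instead of the Range class. multiplier, size and range are illustrative names.

```csharp
using Pidgin;
using static Pidgin.Parser;

// Binary multipliers: k = 1024, M = 1024^2, G = 1024^3.
Parser<char, long> multiplier = OneOf(
    Parser.Char('k').ThenReturn(1024L),
    Parser.Char('M').ThenReturn(1024L * 1024),
    Parser.Char('G').ThenReturn(1024L * 1024 * 1024));

// "64kB": number, multiplier, then a 'B' whose value is ignored.
Parser<char, long> size =
    Map((n, mult, b) => n * mult, DecimalNum, multiplier, Parser.Char('B'));

// "[64kB, 5MB)": open bracket, size, comma (plus optional spaces), size, close bracket.
// The bracket kinds are discarded here; keep `open`/`close` if inclusive/exclusive matters.
Parser<char, (long Lower, long Upper)> range =
    Map((open, lower, comma, upper, close) => (Lower: lower, Upper: upper),
        OneOf('[', '('),
        size,
        Parser.Char(',').Before(SkipWhitespaces),
        size,
        OneOf(']', ')'));
```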
Hey Benjamin,
I couldn't find any other way to contact you, so I created an issue.
Just wanted to mention that after your NDC talk I got curious what Pidgin's syntax would look like in F#, so I started working on a port, which now has enough functionality to implement a simple JSON parser. The implementation closely follows Pidgin's primitive and complex parsers, although a couple of them had to be renamed due to reserved F# keywords (for example, I used after instead of then).
Many things are still missing, mainly some of the more finicky parsers (SeparatedAndOptionallyTerminated, etc.), error messages, backtracking, and any parser state other than strings. Also, I started working on it mainly to learn more about F#, so it's not really intended for production use (for which there is already FParsec anyway).
So far, what I really liked about doing it in F#: the … and … syntax.
What I didn't like: map with different numbers of arguments; also, … (let newState = advance state) suggests that the state is immutable, which is true for a string state but wouldn't be true for a stream (not implemented yet). I'm not sure yet what the nicest way to express this would be.
If you're interested, you can take a look here.
Cheers,
Mark
Hi! Very nice library, thanks for open-sourcing it.
I noticed some odd behavior of Parser.Labelled(). For example, given this parser:
var tupleParser = LetterOrDigit.AtLeastOnceString()
.Separated(Char(','))
.Between(Char('('), Char(')'))
.Labelled("Tuple");
tupleParser.ParseOrThrow("(1,2,!!,3)");
The error message is:
Pidgin.ParseException : Parse error.
unexpected !
expected Tuple
at line 1, col 6
Which is misleading IMHO, since a tuple isn't actually expected at col 6. A possible fix would be to change WithExpectedParser.Parse() as follows:
internal override InternalResult<T> Parse(ref ParseState<TToken> state)
{
state.BeginExpectedTran();
var result = _parser.Parse(ref state);
state.EndExpectedTran(commit: result.ConsumedInput);
if (!result.Success && !result.ConsumedInput)
{
state.AddExpected(_expected);
}
return result;
}
I only just started looking at the codebase, so I hope I understood the Expected transaction mechanism correctly :) With this change, the message is as if the Labelled wasn't there:
Pidgin.ParseException : Parse error.
unexpected !
expected letter or digit
at line 1, col 6
Maybe a better fix would be to include a sort of stack trace of the Labelled parsers that enclosed the error, e.g. "Error while parsing Tuple: ...", but this would be a larger change. Not sure if it should go into the error message, the Expected collection, or some new construct.
Hey there,
Since Pidgin positions itself as the fastest parser combinator library, it would be really useful to see the output of https://github.com/benjamin-hodgson/Pidgin/tree/master/Pidgin.Bench in the readme.md for quick access.
I'm working on writing a parser for the Microsoft API filtering language, and as a warmup exercise I'm doing a parser for their very simple sort syntax. The problem I'm running into is that my parser will succeed when there's a trailing separator, e.g. "foo,", or "foo desc, bar,".
Here's my parser. I must be doing something wrong, because the PropertyAccess parser does fail when there's a trailing dot, but the main SortExpression parser doesn't. Any guidance on what I've screwed up?
public static class SortExpressionParser
{
internal static readonly Parser<char, char> Comma = Char(',');
internal static readonly Parser<char, char> Dot = Char('.');
internal static Parser<char, T> Token<T>(Parser<char, T> parser) =>
Try(parser).Before(SkipWhitespaces);
internal static Parser<char, string> Token(string s) => Token(String(s));
internal static readonly Parser<char, string> PropertyName =
Token(Letter.Then(LetterOrDigit.ManyString(), (head, tail) => head + tail));
internal static readonly Parser<char, IImmutableList<string>> PropertyAccess =
PropertyName.Separated(Token(Dot))
.Select<IImmutableList<string>>(names => names.ToImmutableArray());
internal static readonly Parser<char, SortDirection> SortDirectionModifier =
OneOf(
Token("asc").ThenReturn(SortDirection.Ascending),
Token("desc").ThenReturn(SortDirection.Descending)
);
internal static readonly Parser<char, SortDirective> SortStatement =
Map(
(propertyAccess, sortDirection) => new SortDirective(
propertyAccess,
sortDirection.GetValueOrDefault(SortDirection.Ascending)
),
SharedParsers.PropertyAccess,
SortDirectionModifier.Optional()
);
public static readonly Parser<char, IImmutableList<SortDirective>> SortExpression =
SortStatement.Separated(Token(Comma))
.Select<IImmutableList<SortDirective>>(list => list.ToImmutableArray())
.Before(End);
}
What I mean is - get a breakdown of how many times each parser was called, how long each individual parser took, the total time spent for each parser, etc. I tried profiling a parsing run with JetBrains dotTrace but it's, uh, not very helpful since it's just a massive chain of parser calls.
Is there a clever way to at least log how many times each parser is called by injecting a side effect somewhere? I can fill in the other numbers from that.
We are trying to build an expression parser using Pidgin, based on the provided example code. I have two questions:
expr = ExpressionParser.Build(
term,
new[]
{
Operator.PostfixChainable(call),
Operator.Prefix(Neg).And(Operator.Prefix(Complement)).And(Operator.Prefix(UPlus)),
Operator.InfixL(Multiply).And(Operator.InfixL(Divide)),
Operator.InfixL(Plus).And(Operator.InfixL(Minus)),
Operator.InfixL(EqualTo).And(Operator.InfixL(NotEqualTo))
}
).Labelled("expression");
First-time initialization takes tens of seconds. As you can see, we have only added a few operators; is this to be expected? Is there a better way to structure this?
Thanks in advance.
Mark
This may be a stupid question, but I'm looking for something like End() in Sprache, with no luck.
Take the code below:
int num = Parser.Num.ParseOrThrow("1234aa");
Right now it parses 1234 successfully, but I need the parse to fail.
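Pidgin's counterpart is the End parser (Parser<char>.End), which succeeds only at the end of the input. Sequencing it after the number with Before makes trailing characters an error; a sketch, with wholeNumber as an illustrative name:

```csharp
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Num reads the digits; End then requires that nothing follows them.
Parser<char, int> wholeNumber = Num.Before(End);
```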
Parsing Function Call with empty parameters fails
I have been able to create a reasonably complex expression evaluator. The only issue I am having is that when the text contains a function call with no arguments, it fails (spinning in the DLL). I suspect Between is not working with no arguments.
Expression that fails:
TEST()
Expression that works fine:
TEST(X)
TEST(X,X ...)
Parsing logic (directly from your test application):
private static Parser<char, T> Parenthesised<T>(Parser<char, T> parser)
=> parser.Between(Tok("("), Tok(")"));
...
var call = Parenthesised(Rec(() => expr).Separated(Tok(",")))
.Select<Func<IExpr, IExpr>>(args => method => new Call(method, args.ToImmutableArray()))
.Labelled("function call");
Unfortunately, I have not been able to debug the DLL, so I can't give you more details.
Thanks in advance.
I am attempting to parse a field that is a potentially null-terminated ASCII string with a maximum of 64 bytes. If there is no null terminator by the 64th byte, I would like to terminate the string automatically.
This is what I have for a null-terminated ASCII string, and it appears to work fine. I just need to find a way to stop parsing past 64 bytes if there is no null terminator, ideally in an efficient manner.
static Parser<byte, IEnumerable<byte>> NullTerminatedStringBytes = SingleByte.Until(Token(b => b == 0x00).Labelled("Null Terminated"));
public static Parser<byte, string> NullTerminatedString = Map((nullString) =>
{
var buffer = nullString.ToArray();
var result = Encoding.ASCII.GetString(buffer);
return result;
}, NullTerminatedStringBytes);
Thoughts?
I am trying to match an identifier in which every character except a fixed set is legal. I would like a NotAnyOf, or possibly some sort of negation parser, so that I could write Not(AnyOf('a', 'b', 'c')). Is there any way to do this now without using a Token lambda?
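Assuming a Pidgin version that includes AnyCharExcept (a Parser member that accepts any single character except the listed ones), the negated set needs no Token lambda. A sketch where the identifier is one or more characters other than 'a', 'b' and 'c'; identifier is an illustrative name:

```csharp
using Pidgin;
using static Pidgin.Parser;

// Any character except 'a', 'b' and 'c', one or more times.
Parser<char, string> identifier = AnyCharExcept('a', 'b', 'c').AtLeastOnceString();
```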
I am having trouble understanding these parser builders.
I wanted to make a wrapper around Pidgin's parsers that is easier for me to understand and use but can do the same things.
Can someone help me implement it?
using System;
using System.Collections.Generic;
using System.Text;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;
namespace Pidgin_Basic_Tests
{
public class Parser<TToken, T>
{
Pidgin.Parser<TToken, T> value;
Parser<TToken, T> Tag(string name) => value.Labelled(name);
ParseValue<char, IEnumerable<char>> Any ( ) => Any ( ); //anything
ParseValue<char, char > CharOnce (char n) => Char(n); //requires a n
ParseValue<char, char > CharMany (char[] n) => ???; //requires one of n
ParseValue<char, char > Whitespace ( ) => Parser.Whitespace; //requires whitespace
public static implicit operator Parser<TToken, T>(Pidgin.Parser<TToken, T> parser)
{
return new Parser<TToken, T>() { value = parser };
}
}
public class ParseValue<TToken, T>
{
Pidgin.Parser<TToken, T> value;
ParseValue <TToken, T> Tag(string name) => value.Labelled(name);
ParseOperator<char, char> AtMin (int times); //Must occur at least "times" times
ParseOperator<char, char> AtMax (int times); //Must occur at most "times" times
ParseOperator<char, char> Until<T> (T c); //Must occur until "c"
ParseOperator<char, char> UntilAnyOf<T>(T[] c); //Must occur until any of "c"
ParseOperator<char, char> UntilAllOf<T>(T[] c); //Must occur until all of "c"
ParseOperator<char, char> UntilEnd(); //Must occur until end
public static implicit operator ParseValue<TToken, T>(Pidgin.Parser<TToken, T> parser)
{
return new ParseValue<TToken, T>() { value = parser };
}
}
public class ParseOperator<TToken, T>
{
Pidgin.Parser<TToken, T> value;
ParseOperator<TToken, T> Tag(string name) => value.Labelled(name);
//And
Parser<TToken, T> ReqAND(ParseValue value); //required AND
Parser<TToken, T> ReqNAND(ParseValue value); //required NAND
Parser<TToken, T> ReqXAND(ParseValue value); //required XAND/XNOR
Parser<TToken, T> OptAND(ParseValue value); //optional AND
Parser<TToken, T> OptNAND(ParseValue value); //optional NAND
Parser<TToken, T> OptXAND(ParseValue value); //optional XAND/XNOR
//Or
Parser<TToken, T> ReqOR(ParseValue value); //required OR
Parser<TToken, T> ReqNOR(ParseValue value); //required NOR
Parser<TToken, T> ReqXOR(ParseValue value); //required XOR/XNAND
Parser<TToken, T> OptOR(ParseValue value); //optional OR
Parser<TToken, T> OptNOR(ParseValue value); //optional NOR
Parser<TToken, T> OptXOR(ParseValue value); //optional XOR/XNAND
Parser<TToken, T> Result(); //End
Parser<TToken, T> Result(T v1, T v2); //End
public static implicit operator ParseOperator<TToken, T>(Pidgin.Parser<TToken, T> parser)
{
return new ParseOperator<TToken, T>() { value = parser };
}
}
}
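Several of the placeholders above map onto combinators Pidgin already has, which may make a separate wrapper unnecessary. A hedged sketch: the method names come from, or are modelled on, the hypothetical wrapper, while OneOf, AtLeastOnce, Until, Many, End and Or are existing Pidgin members. Note that Until also consumes the terminator.

```csharp
using System.Collections.Generic;
using Pidgin;
using static Pidgin.Parser;

public static class WrapperIdeas
{
    // CharMany(char[] n), "requires one of n": OneOf(params char[])
    public static Parser<char, char> CharMany(params char[] n) => OneOf(n);

    // AtMin(1), "must occur at least once": AtLeastOnce()
    public static Parser<char, IEnumerable<char>> AtLeastOne(Parser<char, char> p) => p.AtLeastOnce();

    // Until(c), "until c": Until(terminator), which also consumes the terminator
    public static Parser<char, IEnumerable<char>> UntilChar(char c) => Parser<char>.Any.Until(Char(c));

    // UntilEnd(): Many() followed by End, so the whole remaining input must match
    public static Parser<char, IEnumerable<char>> UntilEnd(Parser<char, char> p) =>
        p.Many().Before(Parser<char>.End);

    // ReqOR: Or(...) tries the second parser if the first fails without consuming input
    public static Parser<char, char> AOrB() => Char('a').Or(Char('b'));
}
```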
What if I don't have a delimiter, only a flat-file-style fixed-length string? Imagine, for example, getting data like this from a TCP stream:
"\0\0\0j\0\0\0\vT3A1111 2999BOSH 2100021 399APV 2100022 "
I can't rely on a delimiter here. The string above represents a message received from a server, with the following layout:
4-byte message length ("\0\0\0j"). This is a binary value, not ASCII text.
4-byte message ID ("\0\0\0\v"). This is also a binary value; the rest of the values below are ASCII.
1-byte message type ("T")
1-byte message sequence ("3")
8-byte car ID ("A1111 ")
9-byte part-1 price (" 2999")
30-byte part-1 manufacturer ("BOSH ")
9-byte part number ("2100021 ")
9-byte part-2 price (" 399")
30-byte part-2 manufacturer ("APV ")
9-byte part number ("2100022 ")
How can I parse a message like this?
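Since every field has a fixed width, each one can be read with Any.Repeat(width) instead of relying on delimiters. A minimal sketch, assuming the input arrives as a string as in the sample (for a raw TCP stream, a Parser<byte, ...> built the same way may fit better), that the ASCII fields are space-padded, and that the two 4-byte fields are big-endian integers; FixedWidth, CarMessage and PartInfo are hypothetical names. In the sample, 0x6A (106) equals the number of bytes after the two 4-byte header fields (1 + 1 + 8 + 2 × 48), which suggests the length field could also tell you how many parts to expect; the sketch simply reads 48-character parts until the input ends.

```csharp
using System.Collections.Generic;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;

public sealed class PartInfo
{
    public string Price { get; }
    public string Manufacturer { get; }
    public string PartNumber { get; }

    public PartInfo(string price, string manufacturer, string partNumber)
    {
        Price = price;
        Manufacturer = manufacturer;
        PartNumber = partNumber;
    }
}

public sealed class CarMessage
{
    public int Length { get; }
    public int Id { get; }
    public char Type { get; }
    public char Sequence { get; }
    public string CarId { get; }
    public List<PartInfo> Parts { get; }

    public CarMessage(int length, int id, char type, char sequence, string carId, List<PartInfo> parts)
    {
        Length = length; Id = id; Type = type; Sequence = sequence; CarId = carId; Parts = parts;
    }
}

public static class FixedWidth
{
    // Exactly `width` characters, whatever they are.
    static Parser<char, string> Field(int width) =>
        Parser<char>.Any.Repeat(width).Select(cs => new string(cs.ToArray()));

    // A space-padded ASCII field with the padding removed.
    static Parser<char, string> Trimmed(int width) => Field(width).Select(s => s.Trim());

    // Four characters holding raw byte values, read as a big-endian 32-bit integer.
    static readonly Parser<char, int> Int32BigEndian =
        Field(4).Select(s => s.Aggregate(0, (acc, c) => (acc << 8) | c));

    // price (9) + manufacturer (30) + part number (9) = 48 characters per part
    static readonly Parser<char, PartInfo> Part =
        Map((price, maker, number) => new PartInfo(price, maker, number),
            Trimmed(9), Trimmed(30), Trimmed(9));

    public static readonly Parser<char, CarMessage> Message =
        Map((length, id, type, seq, carId, parts) =>
                new CarMessage(length, id, type, seq, carId, parts.ToList()),
            Int32BigEndian, Int32BigEndian,
            Parser<char>.Any, Parser<char>.Any,
            Trimmed(8),
            Part.Many())
        .Before(Parser<char>.End);
}
```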
Hi.
For a given parser class...
public class SutParser
{
public static Parser<char, char> TabOrSpace
= Token(c => c == ' ' || c == '\t');
public static Parser<char, string> TextField
= Token(c => !char.IsWhiteSpace(c))
.AtLeastOnceString()
.Between(TabOrSpace.Many());
public static Parser<char, IEnumerable<string>> ControlValue =
TextField
.AtLeastOnce()
.Between(String("~ControlValue"), EndOfLine)
;
public static Parser<char, IEnumerable<IEnumerable<string>>> ControlValues
= ControlValue.Many();
}
and the string to be parsed ...
~ControlValue 1 55.7 51.0 46.4 41.8 37.1
~ControlValue 2 50.6 46.4 42.2 38.0 33.8
~ControlValue 3 55.7 51.0 46.4 41.8 37.1
~ControlValue 4 50.6 46.4 42.2 38.0 33.8
~ControlValue 5 77.2 70.7 64.3 57.9 51.4
~ControlValue 6 88.6 81.2 73.8 66.4 59.0
~Key 10
~Contrast 10 32.5 23
I assigned the above text to a string variable, testdata.
The ControlValue parser parses it successfully into a single list of strings:
var singleResult = SutParser.ControlValue.ParseOrThrow(testdata);
// singleResult == ["1", "55.7", "51.0", "46.4", "41.8", "37.1"]
Now I tried to extend it using ControlValues, but it failed:
var multipleResults = SutParser.ControlValues.ParseOrThrow(testdata);
It complained with:
{Parse error.
unexpected K
expected "~ControlValue"
at line 7, col 2}
How do I make the ControlValues parser parse only up to the ~Key part of the text above?
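The error position (line 7, col 2) suggests that String("~ControlValue") consumed the "~" of "~Key" before failing, and Many() fails, rather than stopping, when its parser fails after consuming input. Wrapping the prefix in Try lets it backtrack, so ControlValues stops at ~Key and returns the six rows (ParseOrThrow does not require the rest of the input to be consumed). A sketch of the class with that one change, renamed SutParserFixed here only to keep it distinct:

```csharp
using System.Collections.Generic;
using Pidgin;
using static Pidgin.Parser;

public static class SutParserFixed
{
    public static readonly Parser<char, char> TabOrSpace =
        Parser<char>.Token(c => c == ' ' || c == '\t');

    public static readonly Parser<char, string> TextField =
        Parser<char>.Token(c => !char.IsWhiteSpace(c))
            .AtLeastOnceString()
            .Between(TabOrSpace.Many());

    // Try lets the prefix backtrack when it only matches the '~' of "~Key",
    // so Many() stops there instead of failing after consuming input.
    public static readonly Parser<char, IEnumerable<string>> ControlValue =
        TextField
            .AtLeastOnce()
            .Between(Try(String("~ControlValue")), EndOfLine);

    public static readonly Parser<char, IEnumerable<IEnumerable<string>>> ControlValues =
        ControlValue.Many();
}
```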
This question is based on the example expression parser.
For the binary operator expression BinaryOp, the parser is able to create a BinaryOp object with the BinaryOperatorType enumeration.
Because the use case I am using Pidgin for has a finite number of "function calls", I was trying to design the function call expression to contain an enumeration of all the function call types, instead of an IExpr, to represent the function call type.
For example, something like this:
public enum FunctionCallType
{
Contains, // function name string to match it with: "contains"
StartsWith, // function name string to match it with: "startswith"
EndsWith // function name string to match it with: "endswith"
}
public class Call : IExpr
{
public FunctionCallType Type { get; }
public ImmutableArray<IExpr> Arguments { get; }
public Call(FunctionCallType type, ImmutableArray<IExpr> arguments)
{
Type = type;
Arguments = arguments;
}
}
I spent hours trying to come up with something that would give me a "warm fuzzy". I do have a working design, but it is not what I have above and seems dirty to me. I am curious how this could be done in an easy/clean way based on the Call design above.
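One clean option is to mirror how the example parser handles BinaryOperatorType: parse each function name and map it straight to its enum value with ThenReturn, then combine the name with the argument list in a single Map. A sketch, assuming a hypothetical StringLiteral argument type (single-quoted strings without escapes) and a local Tok helper so it compiles on its own; in the real expression parser the arguments would come from Rec(() => expr) instead.

```csharp
using System.Collections.Immutable;
using Pidgin;
using static Pidgin.Parser;

public interface IExpr { }

// Hypothetical argument node so the sketch compiles on its own.
public sealed class StringLiteral : IExpr
{
    public string Value { get; }
    public StringLiteral(string value) => Value = value;
}

public enum FunctionCallType { Contains, StartsWith, EndsWith }

public sealed class Call : IExpr
{
    public FunctionCallType Type { get; }
    public ImmutableArray<IExpr> Arguments { get; }

    public Call(FunctionCallType type, ImmutableArray<IExpr> arguments)
    {
        Type = type;
        Arguments = arguments;
    }
}

public static class CallParser
{
    static Parser<char, T> Tok<T>(Parser<char, T> token) => Try(token).Before(SkipWhitespaces);
    static Parser<char, string> Tok(string value) => Tok(String(value));

    // Each name maps straight to its enum value, as the example does for BinaryOperatorType.
    static readonly Parser<char, FunctionCallType> FunctionName = OneOf(
        Tok("contains").ThenReturn(FunctionCallType.Contains),
        Tok("startswith").ThenReturn(FunctionCallType.StartsWith),
        Tok("endswith").ThenReturn(FunctionCallType.EndsWith));

    // Hypothetical argument grammar: single-quoted strings without escapes.
    static readonly Parser<char, IExpr> Argument =
        Tok(Char('\'').Then(Parser<char>.Token(c => c != '\'').ManyString()).Before(Char('\'')))
            .Select<IExpr>(s => new StringLiteral(s));

    public static readonly Parser<char, Call> FunctionCall =
        Map((type, args) => new Call(type, args.ToImmutableArray()),
            FunctionName,
            Argument.Separated(Tok(",")).Between(Tok("("), Tok(")")));
}
```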
I have been finding it arduous to constantly add SkipWhitespaces with the Before or Between methods for various parsers. Would it make sense for char-based parsing to ignore whitespace automatically, or via a setting, to simplify parser creation?
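A common way to reduce the repetition, also used by the Tok helper in Pidgin's expression-parser example, is to attach SkipWhitespaces to the end of each token in one place, so only the start of the input needs an explicit skip. A sketch with hypothetical names (Lexing, Number, Addition):

```csharp
using System.Linq;
using Pidgin;
using static Pidgin.Parser;

public static class Lexing
{
    // Wrap every token once: parse it, then skip whatever whitespace follows it.
    static Parser<char, T> Tok<T>(Parser<char, T> token) => Try(token).Before(SkipWhitespaces);
    static Parser<char, string> Tok(string value) => Tok(String(value));

    static readonly Parser<char, int> Number =
        Tok(Digit.AtLeastOnceString()).Select(digits => int.Parse(digits));

    // Only leading whitespace needs an explicit skip; each token handles what follows it.
    public static readonly Parser<char, int> Addition =
        SkipWhitespaces
            .Then(Number.Separated(Tok("+")))
            .Select(numbers => numbers.Sum())
            .Before(Parser<char>.End);
}
```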
Is there a way to access the source begin/end positions?
I was hoping that Result<TToken, T> would reveal something, but I don't see anything. If not, is this planned?
The use case is that I would be able to highlight parts of the parsed source while processing the resulting tree.
I suppose the ideal API would expose the starting/ending SourcePos from a successful result instance.
Apologies if you've addressed this question before; I didn't find any similar questions.
Hi Benjamin,
If a debug build of Pidgin.dll is linked with a console app targeting .NET Framework 4.x, the app is terminated by a StackOverflowException.
It looks like, in this configuration, the CLR tries to eagerly initialize Parser.Optional._returnNothing and gets into an endless loop. Please take a look at the screenshot for an example.
Here is a minimal solution to reproduce the issue.
ParserOptionalStackoverflow.zip
I took the liberty of including binary files (Pidgin.dll and friends) within the archive.
If you prefer to remove them and rebuild yourself, here are the build commands.
dotnet restore
dotnet build --configuration Debug
Curiously enough:
One way to fix this issue might be to prevent eager initialization, as follows:
public abstract partial class Parser<TToken, T>
{
private static Parser<TToken, Maybe<T>> _returnNothing;
private static Parser<TToken, Maybe<T>> returnNothing =>
_returnNothing ?? (_returnNothing = Parser<TToken>.Return(Maybe.Nothing<T>()));
// Rest of the code left out for clarity
}
The motivation to use a debug build of Pidgin is, while learning the library, it gives a way to figure out what is going on inside of Pidgin if something does not work as I expect it to.
By the way, thank you for a great talk -- https://www.youtube.com/watch?v=lsUgwfK9XIM
Thank you.
Mykola.
Hi,
I have the following test case:
[TestMethod]
public void Bug()
{
var mins= OneOf(Tok("minutes"), Tok("mins"), Tok("min")).Trace(x => $"mins={x}");
var meters=OneOf(Tok("meters"), CIString("m").Before(WhitespaceString)).Trace(x => $"meters={x}");
meters.ParseOrThrow("min Run");
}
I would expect the meters parser to throw, because it should be looking for "m " and failing. Am I doing something wrong here?
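WhitespaceString matches zero or more whitespace characters, so CIString("m").Before(WhitespaceString) succeeds on the "m" of "min" without seeing any whitespace; ParseOrThrow also does not require the whole input to be consumed, so the rest ("in Run") is simply left over. A sketch that requires at least one whitespace character after "m", assuming that is the intended rule (Units and RequiredWhitespace are hypothetical names, and a plain Try(CIString(...)) stands in for the Tok helper, which isn't shown):

```csharp
using Pidgin;
using static Pidgin.Parser;

public static class Units
{
    // At least one whitespace character (WhitespaceString also accepts zero).
    static readonly Parser<char, string> RequiredWhitespace = Whitespace.AtLeastOnceString();

    public static readonly Parser<char, string> Meters = OneOf(
        Try(CIString("meters")).Before(SkipWhitespaces),
        Try(CIString("m").Before(RequiredWhitespace)));
}
```

Adding .Before(Parser<char>.End) would additionally require the whole input to be consumed.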