mecha's Introduction

Mecha

A parser combinator library for the Zig programming language. Time to make your own parser mech!

const mecha = @import("mecha");
const std = @import("std");

const Rgb = struct {
    r: u8,
    g: u8,
    b: u8,
};

fn toByte(v: u4) u8 {
    return @as(u8, v) * 0x10 + v;
}

const hex1 = mecha.int(u4, .{
    .parse_sign = false,
    .base = 16,
    .max_digits = 1,
}).map(toByte);
const hex2 = mecha.int(u8, .{
    .parse_sign = false,
    .base = 16,
    .max_digits = 2,
});
const rgb1 = mecha.manyN(hex1, 3, .{}).map(mecha.toStruct(Rgb));
const rgb2 = mecha.manyN(hex2, 3, .{}).map(mecha.toStruct(Rgb));
const rgb = mecha.combine(.{
    mecha.ascii.char('#').discard(),
    mecha.oneOf(.{ rgb2, rgb1 }),
});

test "rgb" {
    const testing = std.testing;
    const allocator = testing.allocator;
    const a = (try rgb.parse(allocator, "#aabbcc")).value;
    try testing.expectEqual(@as(u8, 0xaa), a.r);
    try testing.expectEqual(@as(u8, 0xbb), a.g);
    try testing.expectEqual(@as(u8, 0xcc), a.b);

    const b = (try rgb.parse(allocator, "#abc")).value;
    try testing.expectEqual(@as(u8, 0xaa), b.r);
    try testing.expectEqual(@as(u8, 0xbb), b.g);
    try testing.expectEqual(@as(u8, 0xcc), b.b);

    const c = (try rgb.parse(allocator, "#000000")).value;
    try testing.expectEqual(@as(u8, 0), c.r);
    try testing.expectEqual(@as(u8, 0), c.g);
    try testing.expectEqual(@as(u8, 0), c.b);

    const d = (try rgb.parse(allocator, "#000")).value;
    try testing.expectEqual(@as(u8, 0), d.r);
    try testing.expectEqual(@as(u8, 0), d.g);
    try testing.expectEqual(@as(u8, 0), d.b);
}

mecha's People

Contributors

anand-bala, data-man, dependabot-preview[bot], dependabot[bot], erooke, hejsil, jrachele, lotheac, mattnite, nektro, truemedian

mecha's Issues

Combine yields incompatible tuple types

I was trying to extend your example to also accept #rgb colors, like so:

const rgb = mecha.convert(Rgb, Rgb.from, mecha.combine(.{
    mecha.char('#'),
    mecha.oneOf(.{
        rgbBytes,
        rgbNibbles,
    }),
}));
const rgbBytes = mecha.combine(.{ byte, byte, byte });
const rgbNibbles = mecha.combine(.{ nibble, nibble, nibble });
const byte = mecha.convert(u8, mecha.toInt(u8, 16), mecha.manyRange(2, 2, hex));
const nibble = mecha.convert(u8, mecha.toInt(u8, 16), mecha.manyRange(1, 1, hex));
const hex = mecha.digit(16);

And I'm getting this error when testing: error: expected type '.mecha.Result(.mecha.struct:323:23)', found '.mecha.Result(.mecha.struct:323:23)'
As far as I understand from the code and asking on Discord, combine generates its return type, but rgbBytes and rgbNibbles have distinct tuples even though they are both u8 triplets… (due to @TypeOf)

Build including mecha fails with no field named 'root_source_file' in struct 'Build.CreateModuleOptions'

Zig version: 0.11.0
OS: Pop!_OS 22.04 LTS

Hello! I am trying to include mecha in my project and when running zig build I get the error in the issue title:

/home/jeremy/.cache/zig/p/12203ee851cb0451f3ebdf23b51cf21886e261a563b5b1e1ec16c801dc3ef9ecea5d/build.zig:8:45: error: no field named 'root_source_file' in struct 'Build.CreateModuleOptions'
    const module = b.addModule("mecha", .{ .root_source_file = .{ .path = "mecha.zig" } });
                                            ^~~~~~~~~~~~~~~~
/usr/local/lib/zig/std/Build.zig:680:33: note: struct declared here
pub const CreateModuleOptions = struct {
                                ^~~~~~
referenced by:
    runBuild__anon_47132: /usr/local/lib/zig/std/Build.zig:1638:27
    remaining reference traces hidden; use '-freference-trace' to see all reference traces

I am fetching mecha through the built-in package manager - here's the relevant excerpt from build.zig.zon:

    .dependencies = .{
        .mecha = .{
            .url = "https://github.com/Hejsil/mecha/archive/5d572b31b08268e977bd52ba6701dfea6bf08025.tar.gz",
            .hash = "12203ee851cb0451f3ebdf23b51cf21886e261a563b5b1e1ec16c801dc3ef9ecea5d",
        },
    },

And a snippet from my build.zig:

    const exe = b.addExecutable(.{
        .name = "openmunge",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });
    exe.addModule("mecha", mecha.module("mecha"));
    b.installArtifact(exe);

Any idea whether this is something I'm doing wrong? Thanks in advance.
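One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the pinned mecha commit appears to target a newer Zig than 0.11.0. To my knowledge, `root_source_file` in the module options and `exe.root_module.addImport` arrived with Zig 0.12, where 0.11 used `source_file` and `exe.addModule`. Below is a minimal sketch of the consuming build.zig after upgrading to Zig 0.12, reusing the names from the snippets above. The `b.dependency("mecha", .{})` call is an assumption, since the excerpt does not show how `mecha` is obtained.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Assumed: resolves the `.mecha` entry from build.zig.zon.
    const mecha = b.dependency("mecha", .{});

    const exe = b.addExecutable(.{
        .name = "openmunge",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });
    // Zig 0.12 spelling; on 0.11 this was `exe.addModule(...)`.
    exe.root_module.addImport("mecha", mecha.module("mecha"));
    b.installArtifact(exe);
}
```

Staying on Zig 0.11.0 would instead mean pinning an older mecha commit whose build.zig still uses `source_file`. Which commit that is would need to be checked in the repository history.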

How to parse indents?

I want to parse a format where indents are used instead of brackets, like in Nim or Python.
What's the best approach to this task using this library?

With the later parsing stages (AST generation) in mind, the easiest approach is probably to have Indent and Dedent token types. If one line has three levels of indentation and the next has only two, a Dedent token would be generated before the first token of the next line.
This will probably need a state machine that tracks the indentation level of the previous line. Is this library capable of doing this?
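The state machine described above can be sketched in plain Zig, independent of mecha (whether the library can express it directly is not something this sketch claims). The names `Token` and `IndentLexer` are hypothetical. It assumes indentation uses spaces only, two spaces per level, and it treats each line's text as a single token.

```zig
const std = @import("std");

const Token = union(enum) {
    indent,
    dedent,
    line: []const u8,
};

const IndentLexer = struct {
    src: []const u8,
    pos: usize = 0,
    level: usize = 0, // indentation level already reported to the caller
    target: usize = 0, // indentation level of the line waiting in `pending`
    pending: ?[]const u8 = null,

    const width = 2; // assumed number of spaces per indentation level

    fn next(self: *IndentLexer) ?Token {
        while (true) {
            // Emit one Indent/Dedent at a time until the levels match.
            if (self.level < self.target) {
                self.level += 1;
                return Token{ .indent = {} };
            }
            if (self.level > self.target) {
                self.level -= 1;
                return Token{ .dedent = {} };
            }
            if (self.pending) |text| {
                self.pending = null;
                return Token{ .line = text };
            }
            if (self.pos >= self.src.len) {
                if (self.level == 0) return null;
                self.target = 0; // close every block still open at end of input
                continue;
            }
            // Read the next line and measure its leading spaces.
            const end = std.mem.indexOfScalarPos(u8, self.src, self.pos, '\n') orelse self.src.len;
            const raw = self.src[self.pos..end];
            self.pos = if (end < self.src.len) end + 1 else end;
            var spaces: usize = 0;
            while (spaces < raw.len and raw[spaces] == ' ') spaces += 1;
            if (spaces == raw.len) continue; // skip blank lines
            self.target = spaces / width;
            self.pending = raw[spaces..];
        }
    }
};
```

For the input "a\n  b\n    c\nd", repeated calls to `next` yield line a, indent, line b, indent, line c, dedent, dedent, line d, and then null. A later parsing stage can then treat Indent and Dedent like opening and closing brackets.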

How to parse bareword literal?

I want to parse a bareword literal:

  • it must start with uppercase
  • then any of upper- or lowercase can follow

I tried to use the following code:

const UpperCase = mecha.utf8.range('A', 'Z');
const LowerCase = mecha.utf8.range('a', 'z');

/// A widget literal starts with an upper case letter.
/// Then any number of upper-, or lowercase letters can follow.
var WidgetLiteral = mecha.combine(.{
  // Starts with uppercase.
  // It's an u21 in the result.
  UpperCase,
  // Then other chars follow.
  // It's an []u8 in the result.
  mecha.many(
    mecha.oneOf(.{
      UpperCase,
      LowerCase
    }),
    .{ .collect = true}
  )
});

The problem is that it's not easy to parse into a single []u8 or []u21, as the .combine() output gets parsed into a struct.
The first, single UpperCase is a u21, but the following chars are a []u8.
Is it possible somehow with a clean solution?
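One way around the mixed u21/[]u8 result is to not collect the pieces at all and return the matched slice of the input instead. The sketch below shows that idea as a hand-written function in plain Zig, not as mecha code. `bareword` is a hypothetical name, and it assumes ASCII letters only.

```zig
const std = @import("std");

/// Returns the longest prefix of `str` that is a bareword
/// (one uppercase letter followed by any number of letters),
/// or null if `str` does not start with an uppercase letter.
fn bareword(str: []const u8) ?[]const u8 {
    if (str.len == 0 or !std.ascii.isUpper(str[0])) return null;
    var end: usize = 1;
    while (end < str.len and std.ascii.isAlphabetic(str[end])) end += 1;
    // A slice of the input: no allocation and a single result type.
    return str[0..end];
}
```

If the library offers a combinator that turns any parser's result into the slice of input it consumed, wrapping the combine in it should give the same effect. Whether such a combinator exists in the version you use is something to confirm in mecha.zig.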

Documentation

Hi

I am very interested in using mecha for a toy language. I have never used a parser combinator before and as such am struggling with how to use it. Is there any documentation or a more robust example of mecha (something like a small language like mini-c, lua etc)?

Or can you point me to any article/tutorial which explains the kind of parser mecha is, with hopefully more examples for me to read through.
I did try to read through the source, but mecha is super tiny and I am not sure how I would go about crafting a parser from those fundamental building blocks.

How to parse string literals?

Is this library capable of parsing string literals?
I'm interested in parsing two types of strings:
double quote: "this can be anything inside, except double quote"
slash: // this is a comment, until the end of the line

I guess some parser is needed that reads everything until a specified parser succeeds. E.g.

var StringLiteral = mecha.readUntil(
  mecha.string("\n")
);
// or maybe
var StringLiteral = mecha.combine(.{
  mecha.string("//"),
  mecha.matchAnyChar,
  mecha.string("\n")
});
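The "read everything until a terminator" behaviour asked for here can be sketched in plain Zig without relying on any particular mecha combinator. The `Result` struct below only mirrors the `.value`/`.rest` shape seen elsewhere on this page. `quoted` and `lineComment` are hypothetical names, and escape sequences inside the quotes are not handled.

```zig
const std = @import("std");

const Result = struct {
    value: []const u8, // the text between the delimiters
    rest: []const u8, // the input left after the literal
};

/// Matches "..." where the body may contain anything except a double quote.
fn quoted(str: []const u8) ?Result {
    if (str.len == 0 or str[0] != '"') return null;
    const close = std.mem.indexOfScalarPos(u8, str, 1, '"') orelse return null;
    return Result{ .value = str[1..close], .rest = str[close + 1 ..] };
}

/// Matches `//` followed by everything up to (not including) the end of the line.
fn lineComment(str: []const u8) ?Result {
    if (!std.mem.startsWith(u8, str, "//")) return null;
    const end = std.mem.indexOfScalarPos(u8, str, 2, '\n') orelse str.len;
    return Result{ .value = str[2..end], .rest = str[end..] };
}
```

Both functions return slices of the input, so no allocator is needed. The newline that ends a comment is left in `rest` for the caller to consume.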

Example Rgb parser is incorrect

test "rgb" {
    const allocator = std.testing.allocator;

    const c = (try rgb(allocator, "#0abbcc")).value;
    // expected result
    std.testing.expectEqual(@as(u8, 0x0a), c.r);
    std.testing.expectEqual(@as(u8, 0xbb), c.g);
    std.testing.expectEqual(@as(u8, 0xcc), c.b);
    // actual
    std.testing.expectEqual(@as(u8, 0xab), c.r);
    std.testing.expectEqual(@as(u8, 0xbc), c.g);
    std.testing.expectEqual(@as(u8, 0x0c), c.b);
}

Document what `map` passes to functions with various parsers (`combine`, `oneOf`, ...)

From reading the code, it's unclear how I'm supposed to define the function I pass to map. For instance, if I have this parser:

const if_statement = combine(.{keyword_if, condition, keyword_then, statements});

I can't easily tell how my function in map will be called, i.e., will I get a call like fn x(kw_if: []u8, condition: Condition, kw_then: []u8, statements: Statements), or something else?

Infinite recursion - i.e. how to use ref()?

I'm trying to write a very simple arithmetic expression parser. I'm not even trying to have any specific operator precedence yet, just going left to right. I'm ending up in an infinite loop, and I suspect it's because I'm misunderstanding ref.

Here's the data structure we're trying to parse into.

const Expression = union(enum) {
    value: u16,
    binOp: BinOp,

    const BinOp = struct {
        lhs: *Expression,
        operator: Op,
        rhs: *Expression,
    };

    const Op = enum { @"+", @"-", @"*", @"/" };
};

and a conversion function ...

fn toExpression(allocator: std.mem.Allocator, resultType: anytype) !*Expression {
    var x = try allocator.create(Expression);
    x.* = switch (@TypeOf(resultType)) {
        Expression.BinOp => Expression{ .binOp = resultType },
        u16 => Expression{ .value = resultType },
        *Expression => resultType.*, // probably shouldn't allocate again here, but it's fine for now.
        else => std.debug.panic("Unexpected type to toExpression {}", .{ @typeName(@TypeOf(resultType)) }),
    };
    return x;
}

here's the parser definition(s)

const ws = mecha.oneOf(.{
    mecha.utf8.char(0x0020),
    mecha.utf8.char(0x000A),
    mecha.utf8.char(0x000D),
    mecha.utf8.char(0x0009),
}).many(.{ .collect = false }).discard();

const integer = mecha.combine(.{ mecha.int(u16, .{ .parse_sign = false }), ws });

// TODO make these utf codepoints and then convert them before toEnuming. maybe fix up the toEnummer to not freak out on single characters.
const plus = mecha.string("+");
const minus = mecha.string("-");
const star = mecha.string("*");
const slash = mecha.string("/");

const operator = mecha.oneOf(.{ plus, minus, star, slash }).convert(mecha.toEnum(Expression.Op));
const binOp = mecha.combine(.{
    mecha.ref(expressionRef),
    operator,
    mecha.ref(expressionRef),
}).map(mecha.toStruct(Expression.BinOp)).convert(toExpression);


fn expressionRef() mecha.Parser(*Expression) {
    return expression;
}

const expression = mecha.oneOf(.{
    binOp,
    integer.convert(toExpression),
});

A simple usage ends in infinite recursion:

test "expr" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    const a = arena.allocator();
    const b = try expression.parse(a, "200 + 100");
    _ = b;
    // std.debug.print("\n {s}, {}", .{ b.rest, b.value });

    arena.deinit();
}
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fe0)
  * frame #0: 0x00000001000062f4 test`mecha.map__struct_4549.parse(allocator=<unavailable>, str=(ptr = <read memory from 0x16f603f28 failed (0 of 8 bytes read)>, len = <read memory from 0x16f603f30 failed (0 of 8 bytes read)>)) at mecha.zig:602
    frame #1: 0x0000000100006738 test`mecha.convert__struct_4556.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:487:39
    frame #2: 0x000000010000096c test`mecha.oneOf__struct_5182.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:406:28
    frame #3: 0x0000000100004f24 test`mecha.ref__struct_3703.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:890:32
    frame #4: 0x0000000100006090 test`mecha.combine__struct_4474.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:354:43
    frame #5: 0x0000000100006328 test`mecha.map__struct_4549.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:603:39
    frame #6: 0x0000000100006738 test`mecha.convert__struct_4556.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:487:39
    frame #7: 0x000000010000096c test`mecha.oneOf__struct_5182.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:406:28
    frame #8: 0x0000000100004f24 test`mecha.ref__struct_3703.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:890:32
    frame #9: 0x0000000100006090 test`mecha.combine__struct_4474.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:354:43
    frame #10: 0x0000000100006328 test`mecha.map__struct_4549.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:603:39
    frame #11: 0x0000000100006738 test`mecha.convert__struct_4556.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:487:39
    frame #12: 0x000000010000096c test`mecha.oneOf__struct_5182.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:406:28
    frame #13: 0x0000000100004f24 test`mecha.ref__struct_3703.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:890:32
    frame #14: 0x0000000100006090 test`mecha.combine__struct_4474.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:354:43
    frame #15: 0x0000000100006328 test`mecha.map__struct_4549.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:603:39
    frame #16: 0x0000000100006738 test`mecha.convert__struct_4556.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:487:39
    frame #17: 0x000000010000096c test`mecha.oneOf__struct_5182.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:406:28
    frame #18: 0x0000000100004f24 test`mecha.ref__struct_3703.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:890:32
    frame #19: 0x0000000100006090 test`mecha.combine__struct_4474.parse(allocator=mem.Allocator @ 0x000000016fdfee48, str=(ptr = "200 + 100", len = 9)) at mecha.zig:354:43
    frame #20: 0x0000000100006328 test`mecha.map__struct_4549.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:603:39
    frame #21: 0x0000000100006738 test`mecha.convert__struct_4556.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:487:39
    frame #22: 0x000000010000096c test`mecha.oneOf__struct_5182.parse(allocator=<unavailable>, str=(ptr = "200 + 100", len = 9)) at mecha.zig:406:28

and so on, for many tens of thousands of frames...

I'm simply unsure what to do next to troubleshoot. Any advice appreciated.
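The backtrace shows ref, combine, map, convert and oneOf re-entering each other with the same input "200 + 100" every time. That is the usual signature of left recursion in a recursive-descent parser: binOp starts by parsing expression, which starts by trying binOp, and no input is consumed in between. A common workaround is to rewrite the rule as `expression <- integer (operator integer)*`, so every step consumes input before recursing or looping. Below is a hand-written sketch of that shape in plain Zig, not mecha code. It evaluates left to right instead of building the `Expression` tree, and `ExprParser` and its methods are hypothetical names.

```zig
const std = @import("std");

const Op = enum { add, sub, mul, div };

const ExprParser = struct {
    src: []const u8,
    pos: usize = 0,

    fn skipWs(self: *ExprParser) void {
        while (self.pos < self.src.len and std.ascii.isWhitespace(self.src[self.pos])) self.pos += 1;
    }

    fn integer(self: *ExprParser) !i64 {
        self.skipWs();
        const start = self.pos;
        while (self.pos < self.src.len and std.ascii.isDigit(self.src[self.pos])) self.pos += 1;
        if (start == self.pos) return error.ParserFailed;
        return std.fmt.parseInt(i64, self.src[start..self.pos], 10);
    }

    fn operator(self: *ExprParser) ?Op {
        self.skipWs();
        if (self.pos >= self.src.len) return null;
        const op: Op = switch (self.src[self.pos]) {
            '+' => .add,
            '-' => .sub,
            '*' => .mul,
            '/' => .div,
            else => return null,
        };
        self.pos += 1;
        return op;
    }

    /// expression <- integer (operator integer)*
    /// The first step always consumes an integer, so there is no left recursion.
    fn expression(self: *ExprParser) !i64 {
        var acc = try self.integer();
        while (self.operator()) |op| {
            const rhs = try self.integer();
            acc = switch (op) {
                .add => acc + rhs,
                .sub => acc - rhs,
                .mul => acc * rhs,
                .div => try std.math.divTrunc(i64, acc, rhs),
            };
        }
        return acc;
    }
};
```

With combinators, the same shape would be "one integer, followed by zero or more (operator, integer) pairs", folded into a tree from the left. Operator precedence is deliberately ignored, matching the left-to-right goal stated in the issue.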

Delimiter separated value parser

It would be super nice to have a function that parses delimiter-separated values (DSV) into a type. The API would be something like this:

pub fn dsv(comptime T: type, comptime delim: Parser(void)) Parser(T) {...}

It would be able to take most types and parse a dsv into that type, doing all the conversions necessary to make this happen.

Open questions:

  • Delimiter escaping?
    • Maybe out of scope as this could become arbitrarily complex.
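To make the idea concrete, here is a rough sketch of what such a conversion could look like in plain Zig. It is not the proposed `Parser(T)` combinator: it takes the delimiter as a single byte and the input as a slice, supports only integer and `[]const u8` fields, and does no delimiter escaping. The error names are made up for the example.

```zig
const std = @import("std");

/// Fills one struct field per delimiter-separated cell of `line`.
/// Assumes every field of `T` is either an integer or a `[]const u8`.
fn dsv(comptime T: type, delim: u8, line: []const u8) !T {
    var result: T = undefined;
    var rest: ?[]const u8 = line;
    inline for (std.meta.fields(T)) |field| {
        const input = rest orelse return error.TooFewFields;
        const end = std.mem.indexOfScalar(u8, input, delim);
        const cell = if (end) |e| input[0..e] else input;
        rest = if (end) |e| input[e + 1 ..] else null;
        // The field type is known at compile time, so only one branch is compiled per field.
        @field(result, field.name) = if (field.type == []const u8)
            cell
        else
            try std.fmt.parseInt(field.type, cell, 10);
    }
    if (rest != null) return error.TooManyFields;
    return result;
}
```

For example, with `const Row = struct { name: []const u8, r: u8, g: u8 };`, calling `dsv(Row, ',', "red,255,0")` fills `name`, `r` and `g` in declaration order. String fields are slices of the input and stay valid only as long as the input does.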

most mecha.ascii.* parsers are Parser(u8), but mecha.ascii.char is Parser(void)

Hi, thank you for this library.

The following test fails:

const testing = @import("std").testing;
const mecha = @import("mecha");

const hash = mecha.ascii.char('#');
const digit = mecha.ascii.digit(10);
const either = mecha.oneOf(.{ hash, digit });

test {
    try mecha.expectResult(u8, .{ .value = '#' }, hash(testing.allocator, "#"));
    try mecha.expectResult(u8, .{ .value = '5' }, digit(testing.allocator, "5"));
    try mecha.expectResult(u8, .{ .value = '9' }, either(testing.allocator, "9"));
    try mecha.expectResult(u8, .{ .value = '#' }, either(testing.allocator, "#"));
}
foo.zig:9:55: error: expected type 'error{OtherError,OutOfMemory,ParserFailed}!mecha.Result(u8)', found 'error{OtherError,OutOfMemory,ParserFailed}!mecha.Result(void)'
    try mecha.expectResult(u8, .{ .value = '#' }, hash(testing.allocator, "#"));
                                                  ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
foo.zig:9:55: note: error union payload 'mecha.Result(void)' cannot cast into error union payload 'mecha.Result(u8)'
mecha.zig:23:12: note: struct declared here (2 times)
    return struct {
           ^~~~~~

This happens because mecha.ascii.char's result type is void, in contrast to the other pub fn parsers in this file (and it seems the same is true for mecha.utf8). I found that difference quite surprising, since it leads to somewhat unintuitive code, e.g. when using oneOf() on ascii.range/ascii.digit and ascii.char. Could you shed some light on why the char parsers use discard?

Allow mecha to allocate memory

As seen in #20 it is pretty hard to make a parser that parses an unknown number of items with the current API. The best we can do right now is parse twice, once for validation and once for getting each result one by one in an iterating manner. What we really want here is the ability to allocate memory from mecha. I think we can even allow a custom context to be passed around as well:

  • Make all parsers take both a []const u8 and a *mem.Allocator.

    • This will allow many and manyRange to allocate their result instead of just returning the parsed string.
    • Also pass this allocator to the converter of convert so that it can allocate.
  • All parsers will now return mecha.Error!Result instead of ?Result:

    const Error = error{ OutOfMemory, ParserFailed, OtherError };
    • Mecha will catch ParserFailed but will let the other two errors bubble up as mecha cannot handle these.
  • If the user needs some sort of context struct to store or look up something, this can be done with @fieldParentPtr:

const Context = struct {
    arena: ArenaAllocator,
};

fn parser(a: *mem.Allocator, str: []const u8) ?Result {
    // We cannot go directly from `a` to `Context` because `Context` does not implement
    // the allocator itself. We can take the double parent pointer approach instead.
    const arena = @fieldParentPtr(ArenaAllocator, "allocator", a);
    const context = @fieldParentPtr(Context, "arena", arena);
}
  • mecha will not free any memory on its own. It is the user's responsibility to use an allocator that can keep track of and free the memory it allocates (like ArenaAllocator).

Returning progress when parsing fails

One use case for this library is a parser that reports where it failed, which would allow it to fail gracefully instead of crashing. Reporting the position (or the remaining input) would suffice, since everything else can be calculated from it, but as far as I can tell there is currently no way to report progress.

This would essentially mean moving the Error inside the ParserResult, which isn't ideal and is an architecture change. I understand if such modification is out of scope, and I need to implement it myself.
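A minimal sketch of the "error inside the result" shape, written in plain Zig and not tied to mecha's actual types: failure becomes an ordinary value that carries the unconsumed input, and the failure offset is derived from it. `Outcome`, `countList` and `failureOffset` are hypothetical names used only for this illustration.

```zig
const std = @import("std");

/// Failure is a value rather than a Zig error, so it can carry the unconsumed input.
const Outcome = union(enum) {
    ok: struct { value: usize, rest: []const u8 },
    fail: struct { rest: []const u8 },
};

/// Parses comma-separated decimal numbers and returns how many there were.
fn countList(str: []const u8) Outcome {
    var rest = str;
    var count: usize = 0;
    while (true) {
        var end: usize = 0;
        while (end < rest.len and std.ascii.isDigit(rest[end])) end += 1;
        // No digits where a number was expected: report where we stopped.
        if (end == 0) return Outcome{ .fail = .{ .rest = rest } };
        count += 1;
        rest = rest[end..];
        if (rest.len == 0) return Outcome{ .ok = .{ .value = count, .rest = rest } };
        if (rest[0] != ',') return Outcome{ .fail = .{ .rest = rest } };
        rest = rest[1..];
    }
}

/// Byte offset of the failure within the original input.
fn failureOffset(input: []const u8, rest: []const u8) usize {
    return input.len - rest.len;
}
```

The cost mentioned above is visible here: every caller has to switch on the outcome instead of using `try`, which is why this would be an architectural change rather than a small addition.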
