Comments (15)

mathiasverraes avatar mathiasverraes commented on September 27, 2024 1

This commit shows that the problem with either is fixed: mathiasverraes@75761c7

(It's actually fixed as a consequence of something else, we swapped the implementation of or to make sure it only executes the second parser if the first one failed.)

This change is in the main branch and will be in 0.4. There are lots of breaking changes so beware.
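To illustrate what "only executes the second parser if the first one failed" means, here is a minimal sketch in a toy model. It does not use Parsica's API: a parser here is just a callable that returns [output, remainder] on success or null on failure, and literal(), orElse() and traced() are hypothetical helpers invented for this example.

```php
<?php
// Toy model, NOT Parsica's API: a parser is a callable that takes the input
// and returns [output, remainder] on success, or null on failure.

// Hypothetical helper: succeed if the input starts with $expected.
function literal(string $expected): callable
{
    return function (string $input) use ($expected): ?array {
        return strncmp($input, $expected, strlen($expected)) === 0
            ? [$expected, substr($input, strlen($expected))]
            : null;
    };
}

// Hypothetical short-circuiting alternative: $second runs only if $first fails.
function orElse(callable $first, callable $second): callable
{
    return function (string $input) use ($first, $second): ?array {
        $result = $first($input);
        return $result ?? $second($input);
    };
}

// Hypothetical helper: record a label every time the wrapped parser is executed.
function traced(callable $parser, string $label, array &$log): callable
{
    return function (string $input) use ($parser, $label, &$log): ?array {
        $log[] = $label;
        return $parser($input);
    };
}

$log = [];
$parser = orElse(
    traced(literal('/a/b/'), 'first', $log),
    traced(literal('/a/'), 'second', $log)
);

$result = $parser('/a/b/file4');
// The first alternative succeeded, so $log only contains 'first'.
```

An eager implementation that runs both alternatives up front and then picks a result would also log 'second', which is the kind of surprise reported with emit() further down the thread.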

from parsica.

mathiasverraes avatar mathiasverraes commented on September 27, 2024 1

The new JSON parser can escape characters in string literals: https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L214


mathiasverraes avatar mathiasverraes commented on September 27, 2024

See the commit above, that shows how to parse the path. But I'm sure you figured that part out yourself.

I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?


grifx avatar grifx commented on September 27, 2024

Thank you for your responsiveness!

To give you more context: I'm building an open-source utility that normalizes a flat map using a small DSL.
I'm currently running a version of this utility in production that I wrote in imperative code.
I thought it would be nice to use the Parsica library to build an AST that I then use to normalize the data.

I already managed to build a single branch of the AST by parsing a single key ("$.a.b.c.d").

[
  {
    "id@int": "123",
    "name": "Brian",
    "car.color": "red",
    "car.type": "SUV",
    "children[childId].id": "1",
    "children[childId].name": "Tom",
    "_childId": "1"
  },
  {
    "id@int": "123",
    "name": "Brian",
    "car.color": "red",
    "car.type": "SUV",
    "children[childId].id": "2",
    "children[childId].name": "Harry",
    "_childId": "2"
  }
]

=>

{
  "id": 123,
  "name": "Brian",
  "car": {"color": "red", "type": "SUV"},
  "children": [{"id": "1", "name": "Tom"}, {"id": "2", "name": "Harry"}]
}
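As a rough illustration of the simplest part of this normalization, the sketch below expands dotted keys such as "car.color" into nested arrays. It is only an assumption about how that one step could look: expandDottedKeys() is a hypothetical name, and the "@int" casts and the "children[childId]" list syntax from the example are deliberately not handled.

```php
<?php
// Hypothetical helper: turn ['car.color' => 'red'] into ['car' => ['color' => 'red']].
// Only plain dotted keys are supported; type hints (@int) and lists ([childId]) are not.
function expandDottedKeys(array $flat): array
{
    $nested = [];
    foreach ($flat as $key => $value) {
        // Walk down the nested array, creating intermediate levels as needed.
        $cursor = &$nested;
        foreach (explode('.', (string) $key) as $segment) {
            if (!isset($cursor[$segment]) || !is_array($cursor[$segment])) {
                $cursor[$segment] = [];
            }
            $cursor = &$cursor[$segment];
        }
        // $cursor now references the leaf, so store the value there.
        $cursor = $value;
        unset($cursor); // break the reference before the next key
    }
    return $nested;
}

$nested = expandDottedKeys([
    'name'      => 'Brian',
    'car.color' => 'red',
    'car.type'  => 'SUV',
]);
// $nested is ['name' => 'Brian', 'car' => ['color' => 'red', 'type' => 'SUV']]
```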

> I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?

That's correct: I would like to "skip" the parsing of the repeated parts.

[
  "/a/b/c/file1",
  " **/a/b/c/** file2",
  " **/a/b/c/** file3",
  " **/a/b/** file4"
]

I would like to introduce a ParsingContext to my parser factory to provide me with cachedParsingSteps:

function parserBoundToContext(ParsingContext $context) {
    $resolveFromCache = fn($input) => $this->resolveFromCache($input, $context);

    $parser = (
        // resolved from $context->cachedParsingSteps
        string("/a/b/c/")->map($resolveFromCache)
        ->or(string("/a/b/")->map($resolveFromCache))
        ->or(string("/a/")->map($resolveFromCache))
        ->or(nothing())
    );
    // ...
}

Whenever a folder is parsed, I would like to enrich the cachedParsingSteps with the input (the whole input since the start -- could be generated from the whole input minus the remainder) and its result.

Ideas

  • Provide both the value and the remainder in Parser::map($value, $remainder). We could use reflection to detect whether the
    $remainder parameter is declared.
  • Maybe we can create a cache($parser, $context) parser wrapper that would do something like:
    (startCacheNothing($context)->followedBy($parser)->thenIgnore(endCacheNothing($context)))->map(enrichContextWithValue($context))
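A minimal sketch of the second idea, written against a toy model rather than Parsica's API: a parser is a callable returning [output, remainder] or null, and folders() and cached() are hypothetical helpers. The point is only to show that the consumed input can be derived as the whole input minus the remainder and stored together with the output.

```php
<?php
// Toy model, NOT Parsica's API: a parser is a callable that returns
// [output, remainder] on success, or null on failure.

// Hypothetical parser: consume the leading directory part of a path,
// e.g. '/a/b/c/file1' yields output ['a', 'b', 'c'] and remainder 'file1'.
function folders(): callable
{
    return function (string $input): ?array {
        if (preg_match('~^(?:/[^/]+)*/~', $input, $match) !== 1) {
            return null;
        }
        $names = trim($match[0], '/');
        $output = $names === '' ? [] : explode('/', $names);
        return [$output, substr($input, strlen($match[0]))];
    };
}

// Hypothetical wrapper: on success, remember "consumed input => output".
function cached(callable $parser, ArrayObject $cache): callable
{
    return function (string $input) use ($parser, $cache): ?array {
        $result = $parser($input);
        if ($result !== null) {
            [$output, $remainder] = $result;
            // Consumed part = the whole input minus the remainder.
            $consumed = substr($input, 0, strlen($input) - strlen($remainder));
            $cache[$consumed] = $output;
        }
        return $result;
    };
}

$cache = new ArrayObject();
$parser = cached(folders(), $cache);
$result = $parser('/a/b/c/file1');
// $cache now maps '/a/b/c/' to ['a', 'b', 'c'].
```

A later parse could then check whether the input starts with one of the cached keys before running the real parser.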

I hope all this makes sense to you.

Cheers,


mathiasverraes avatar mathiasverraes commented on September 27, 2024

Some thoughts:

  1. You do have access to input, output, and remainder during the parsing process right now, but only by writing your own combinator. This is the general pattern:
<?php
function yourCombinator(Parser $parser, $context) : Parser {
    return Parser::make(function (string $input) use ($parser, $context) : ParseResult {
        $result = $parser->run($input);
        if ($result->isSuccess()) {
            // You now have access to $result->output() and ->remainder(),
            // as well as the original $input, the original $parser, and your $context
        }
        // Return a ParseResult (here, the unchanged result of the wrapped parser)
        return $result;
    });
}

  2. The above could be replaced by something similar to map. I don't want to change the definition of map, because that would conflict with the common definition of map in FP languages. But it could be something like map2(Parser, fn($output, $remainder) : ParseResult) : Parser

  3. Having a side effect when something is parsed (such as updating a cache) could be done with events: emit(Parser, fn($output):void) : Parser. In other words, it's a combinator that returns a Parser that behaves exactly like the original parser, but performs the side effects in the given function.
<?php
$cache = new Cache();
$addToCache = fn($output) => $cache->add($output);
$parser = emit(char('a'), $addToCache); // If we successfully parse 'a', add 'a' to cache.

I've added emit() in the commit above. (It's not in the main branch yet.) I'm not sure about the map2 thing yet, but you could easily make that yourself by adding a map2 to both Parser and ParseResult.
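For readers who want to see the described semantics in isolation, here is a sketch of an emit-like combinator in a toy model. This is not the code from the commit and not Parsica's API: a parser is a plain callable returning [output, remainder] or null, and literal() is a hypothetical helper.

```php
<?php
// Toy model, NOT Parsica's API: a parser is a callable that returns
// [output, remainder] on success, or null on failure.

// Hypothetical helper: succeed if the input starts with $expected.
function literal(string $expected): callable
{
    return function (string $input) use ($expected): ?array {
        return strncmp($input, $expected, strlen($expected)) === 0
            ? [$expected, substr($input, strlen($expected))]
            : null;
    };
}

// Behaves exactly like $parser, but calls $receiver with the output on success.
function emit(callable $parser, callable $receiver): callable
{
    return function (string $input) use ($parser, $receiver): ?array {
        $result = $parser($input);
        if ($result !== null) {
            $receiver($result[0]);
        }
        return $result;
    };
}

$seen = [];
$parser = emit(literal('a'), function (string $output) use (&$seen): void {
    $seen[] = $output;
});

$ok = $parser('abc');     // success: the receiver is called with 'a'
$failed = $parser('xyz'); // failure: the receiver is not called
```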

I'm still a bit unclear as to what exactly the point is of what you're trying to do, but let me know if these things help.


grifx avatar grifx commented on September 27, 2024

I think I found a way to achieve what I initially wanted to do.
I'll try to implement it and close this PR by the end of this week.

--

Do you think we should add this utility?

function lazy(callable /*: () => Parser */ $parserFactory): Parser
{
    return Parser::make(function (string $input) use ($parserFactory): ParseResult {
        $parser = $parserFactory();

        return $parser->run($input);
    });
}
$parser = lazy(fn() => string('joris'));


mathiasverraes avatar mathiasverraes commented on September 27, 2024

Good to hear you fixed it. The lazy combinator was suggested by someone on Twitter as well. I'm a bit hesitant to add it, because it forces the user to think about performance. I'm hoping to avoid that. We have some ideas that we want to try, but there are a number of intermediate steps we need first. So I don't want to add performance features before we can measure performance and exhaust other options.


grifx avatar grifx commented on September 27, 2024

I finished implementing the parser using Parsica. The library is now working as expected : ).

Some notes:

  • Unexpected behaviour when using or() (used under the hood by either) with emit().
either(
    emit(
        success(),
        fn() => var_dump("I expect to be triggered")
    ),
    emit(
        success(),
        fn() => var_dump("I do not expect to be triggered")
    )
)->try('test');

My use-case:
I try to parse from the cache first (which sometimes succeeds) and otherwise fall back to the root parser (which always succeeds).
I don't expect both events to be emitted.

return either(
    $context->preflightCacheParser(),
    $root
)->followedBy($rest);

This is what allows me to skip the repeated parts:

[
  "/a/b/c/file1",
  " **/a/b/c/** file2",
  " **/a/b/c/** file3",
  " **/a/b/** file4"
]

As you already know, it works when replacing the current implementation of or with the commented implementation.
Perhaps it's another reason, beyond performance, to get rid of the current implementation.
"or()" is used by quite a lot of utils that would otherwise have to be rewritten, so I hope the default behaviour will change.

  • When using emit(), the $receiver does not have access to all of $output, $input and $remainder. My use case requires both $output and $remainder, and I wouldn't know how to implement it with the current emit(). That said, unlike the or() behaviour, this is very easy to work around by creating a new emit function.

  • The Parsica implementation of the parser is very slow. This utility is now roughly 10 times slower than its previous version. Of course, we have to take into consideration that it's much easier to maintain, and that an extra layer of cache would result in similar performance. Unfortunately, in some environments, cross-request caching isn't possible.

    /**
     * @test
     */
    public function it_should_parse_under_100_ms()
    {
        $propertyName = atLeastOne(alphaNumChar());

        $type = emit(
            either(
                eof(),
                char('@')
                    ->followedBy($propertyName)
                    ->thenIgnore(eof()),
            ),
            function () {}
        );

        $map = emit(
            char('.')->followedBy($propertyName),
            function () {}
        );

        $list = emit(
            between(
                char('['),
                char(']'),
                either(
                    char('@')
                        ->followedBy($propertyName)
                        ->map(fn($value) => [
                            'discriminatorName' => $value,
                            'keepKeys'          => true
                        ]),
                    $propertyName
                        ->map(fn($value) => [
                            'discriminatorName' => $value,
                            'keepKeys'          => false
                        ]),
                )
            ),
            function () {}
        );

        $root = emit(
            char('$'),
            function () {}
        );

        $rest = many(any($map, $list))->followedBy($type);

        $parser = either(
            failure(), // $context->preflightCacheParser(),
            $root
        )->followedBy($rest);

        $start = microtime(true);
        for ($i = 0; $i < 500; $i++) {
            $parser->try('$.q.w[@1].e[2]@int');
        }
        $end = microtime(true);

        $this->assertLessThan(0.1, $end - $start);
    }

Thanks again for your work!


mathiasverraes avatar mathiasverraes commented on September 27, 2024

I've also added your other test. Would it be possible to share your original parser? That way, we can define the test as a comparison, where the Parsica version must perform at most x% slower than the original one.
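Such a comparison could be sketched roughly as follows. slowdownFactor() is a hypothetical helper, and the two tasks are stand-ins chosen only so that the example runs on its own; in a real test they would be the imperative parser and the Parsica parser, and the acceptable factor would have to be agreed on.

```php
<?php
// Hypothetical helper: how many times slower is $candidate than $baseline?
function slowdownFactor(callable $baseline, callable $candidate, int $iterations): float
{
    $time = function (callable $task) use ($iterations) {
        $start = hrtime(true); // monotonic clock, in nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $task();
        }
        // Never return 0, so the division below is always defined.
        return max(1, hrtime(true) - $start);
    };

    return $time($candidate) / $time($baseline);
}

$key = '$.q.w[@1].e[2]@int';

// Stand-in tasks, NOT the real parsers from this thread.
$baseline = fn() => explode('.', $key);
$candidate = fn() => preg_split('~(?=[.\[@])~', $key);

$factor = slowdownFactor($baseline, $candidate, 1000);
// A test could then assert that $factor stays below an agreed threshold.
```

Timing-based assertions are sensitive to the machine and to extensions such as xdebug, so a relative threshold tends to be less brittle than an absolute one like 100 ms.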


mathiasverraes avatar mathiasverraes commented on September 27, 2024

BTW I'm sure you know this, but if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course. I'm hoping to focus on performance after I get v0.4 out the door.


grifx avatar grifx commented on September 27, 2024

> if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course.

Good point, I probably had it on.

--

The original parser isn't open-sourced and does much more than parsing.
I'll try to implement another Parser using imperative code by the end of this week 🤞 so we can have a proper comparison.


grifx avatar grifx commented on September 27, 2024

FYI (without xdebug)

Imperative
Current draft: https://gist.github.com/grifx/1efe84852f2e4dd867793d66149d152b
Time: 00:00.046, Memory: 6.00 MB
Parsica
Time: 00:00.672, Memory: 6.00 MB

        $context = new ParsingContext();
        for ($i = 0; $i < 1000; $i++) {
            $context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
            $this->imperativeKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
            $context->endParsingKey();
            $context->clean();
        }

        $context = new ParsingContext();
        for ($i = 0; $i < 1000; $i++) {
            $context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
            $this->parsicaKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
            $context->endParsingKey();
            $context->clean();
        }

I'll try to clean, extract and share the unit test during the week.

Note: They are not behaving exactly the same way. For instance, the imperative parser deals with char escaping: $.www\.gooo\\\.oooogle\.com. I haven't thought about how to do it with Parsica yet.
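For context, this is roughly how an imperative scanner can treat a backslash as an escape for the next character while splitting a key on dots. It is only a sketch under that assumption, not the code from the gist, and splitEscapedPath() is a hypothetical name.

```php
<?php
// Hypothetical helper: split a key on unescaped dots.
// A backslash makes the following character literal, so '\.' does not split.
function splitEscapedPath(string $path): array
{
    $segments = [];
    $current = '';
    $length = strlen($path);

    for ($i = 0; $i < $length; $i++) {
        $char = $path[$i];
        if ($char === '\\' && $i + 1 < $length) {
            // Escape: copy the next character verbatim and skip over it.
            $current .= $path[++$i];
        } elseif ($char === '.') {
            // Unescaped dot: close the current segment.
            $segments[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $segments[] = $current;

    return $segments;
}

// In a single-quoted PHP string, '\.' is a backslash followed by a dot.
$segments = splitEscapedPath('$.www\.google\.com.path');
// $segments is ['$', 'www.google.com', 'path']
```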


mathiasverraes avatar mathiasverraes commented on September 27, 2024

Should we close this? I'm not sure if there's anything actionable right now.


grifx avatar grifx commented on September 27, 2024

Sorry for the late reply.

Yes, let's close this issue.

I'll release the library using the imperative parser and open a PR to introduce Parsica.

Thanks for your help!


grifx avatar grifx commented on September 27, 2024

@mathiasverraes FYI, I cannot close this issue since I did not open it.
