Comments (15)
This commit shows that the problem with either is fixed: mathiasverraes@75761c7
(It's actually fixed as a consequence of something else, we swapped the implementation of or to make sure it only executes the second parser if the first one failed.)
This change is in the main branch and will be in 0.4. There are lots of breaking changes so beware.
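To illustrate the short-circuit behaviour described above, here is a minimal sketch that does not use Parsica itself. It assumes a simplified model in which a parser is a callable returning [output, remainder] on success and null on failure; the names literal() and orElse() are hypothetical and exist only for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical model, not Parsica's API: a parser is a callable that takes
// the input and returns [output, remainder] on success or null on failure.
function literal(string $expected): callable
{
    return function (string $input) use ($expected): ?array {
        return strncmp($input, $expected, strlen($expected)) === 0
            ? [$expected, (string) substr($input, strlen($expected))]
            : null;
    };
}

// Tries $first; runs $second only when $first failed.
function orElse(callable $first, callable $second): callable
{
    return function (string $input) use ($first, $second): ?array {
        $result = $first($input);
        // The null coalescing operator evaluates its right side lazily.
        return $result ?? $second($input);
    };
}

// Wraps a parser so that every invocation is recorded in $calls.
$calls = [];
$traced = function (string $label, callable $parser) use (&$calls): callable {
    return function (string $input) use ($label, $parser, &$calls): ?array {
        $calls[] = $label;
        return $parser($input);
    };
};

$parser = orElse($traced('first', literal('a')), $traced('second', literal('a')));
$result = $parser('abc');
```

With this model, only the first parser is recorded in $calls when it succeeds, which is the behaviour the swapped implementation of or is meant to guarantee.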
from parsica.
The new JSON parser can escape characters in string literals: https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L214
from parsica.
See the commit above, that shows how to parse the path. But I'm sure you figured that part out yourself.
I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?
from parsica.
Thank you for your responsiveness!
To provide you with more context: I'm building an open source utility to normalize a flat map using a small DSL.
I'm currently using in production a version of this utility that I wrote with imperative code.
I thought it would be nice to use the Parsica library to build the AST that I use to normalize the data.
I already managed to build a single branch of the AST by parsing a single key ("$.a.b.c.d").
[
  {
    "id@int": "123",
    "name": "Brian",
    "car.color": "red",
    "car.type": "SUV",
    "children[childId].id": "1",
    "children[childId].name": "Tom",
    "_childId": "1"
  },
  {
    "id@int": "123",
    "name": "Brian",
    "car.color": "red",
    "car.type": "SUV",
    "children[childId].id": "2",
    "children[childId].name": "Harry",
    "_childId": "2"
  }
]
=>
{
  "id": 123,
  "name": "Brian",
  "car": {"color": "red", "type": "SUV"},
  "children": [{"id": "1", "name": "Tom"}, {"id": "2", "name": "Harry"}]
}
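As a rough illustration of the normalization shown above, the following sketch expands a single flat row into nested arrays. It is not the utility's actual code: expandFlatRow() is a hypothetical helper, and it assumes only plain dotted paths and an optional "@int" suffix, leaving list keys such as "children[childId].id" out of scope.

```php
<?php
declare(strict_types=1);

/**
 * Expands a flat map with dotted keys into nested arrays.
 * Assumption: only plain "a.b" paths and an optional "@int" suffix are
 * handled; list keys like "children[childId].id" are not supported here.
 */
function expandFlatRow(array $row): array
{
    $result = [];
    foreach ($row as $key => $value) {
        // Split off an optional type suffix, e.g. "id@int" -> ["id", "int"].
        [$path, $type] = array_pad(explode('@', (string) $key, 2), 2, null);
        if ($type === 'int') {
            $value = (int) $value;
        }
        // Walk the dotted path, creating nested arrays by reference.
        $node = &$result;
        foreach (explode('.', $path) as $segment) {
            if (!isset($node[$segment]) || !is_array($node[$segment])) {
                $node[$segment] = [];
            }
            $node = &$node[$segment];
        }
        $node = $value;
        unset($node); // Break the reference before the next key.
    }
    return $result;
}

$normalized = expandFlatRow([
    'id@int' => '123',
    'name' => 'Brian',
    'car.color' => 'red',
    'car.type' => 'SUV',
]);
```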
I'm not sure what you want to achieve. Are you trying to make the parser somehow understand that parts of the path are repeated?
That's correct, I would like to "skip" the parsing of repeated strings
[
  "/a/b/c/file1",
  "/a/b/c/file2",
  "/a/b/c/file3",
  "/a/b/file4"
]
I would like to introduce a ParsingContext to my parser factory to provide me with cachedParsingSteps:
function parserBoundToContext(ParsingContext $context) {
    $resolveFromCache = fn($input) => $this->resolveFromCache($input, $context);
    $parser = (
        // resolved from $context->cachedParsingSteps
        string("/a/b/c/")->map($resolveFromCache)
            ->or(string("/a/b/")->map($resolveFromCache))
            ->or(string("/a/")->map($resolveFromCache))
            ->or(nothing())
    );
    // ...
}
Whenever a folder is parsed, I would like to enrich the cachedParsingSteps with the input (the whole input since the start, which could be generated from the whole input minus the remainder) and its result.
Ideas
- Provide both the value and the remainder in Parser::map($value, $remainder). We could use reflection to know if the param $remainder is declared.
- Maybe we can create a cache($parser, $context) parser wrapper that would do something like:
  (startCacheNothing($context)->followedBy($parser)->thenIgnore(endCacheNothing($context)))->map(enrichContextWithValue($context))
I hope all this makes sense to you.
Cheers,
from parsica.
Some thoughts:
1. You do have access to input, output, and remainder during the parsing process right now, but only by writing your own combinator. This is the general pattern:
<?php
function yourCombinator(Parser $parser, $context) : Parser {
    return Parser::make(function (string $input) use ($parser, $context) : ParseResult {
        $result = $parser->run($input);
        if ($result->isSuccess()) {
            // You now have access to $result->output() and ->remainder(),
            // as well as the original $input, the original $parser, and your $context
        }
        // Return a ParseResult
        return $result;
    });
}
2. The above could be replaced by something similar to map. I don't want to change the definition of map because that would conflict with the common definition of map in FP languages. But it could be something like map2(Parser, fn($output, $remainder) : ParseResult) : Parser
3. Having side effects when something is parsed (such as updating a cache) could be done with events: emit(Parser, fn($output):void) : Parser. In other words, it's a combinator that returns a Parser that behaves exactly like the original parser, but performs the side effects in the given function.
<?php
$cache = new Cache();
$addToCache = fn($output) => $cache->add($output);
$parser = emit(char('a'), $addToCache); // If we successfully parse 'a', add 'a' to cache.
I've added emit() in the commit above. (It's not in the main branch yet.) I'm not sure about the map2 thing yet, but you could easily make that yourself by adding a map2 to both Parser and ParseResult.
I'm still a bit unclear as to what exactly the point is of what you're trying to do, but let me know if these things help.
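For the caching use case, a combinator in the spirit of emit() could also hand the consumed input and the remainder to its callback. The sketch below is not Parsica code: it assumes a simplified model in which a parser is a callable returning [output, remainder] or null, and the names token() and emitWithContext() are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical model, not Parsica's API: a parser is a callable that takes
// the input and returns [output, remainder] on success or null on failure.
function token(string $expected): callable
{
    return function (string $input) use ($expected): ?array {
        return strncmp($input, $expected, strlen($expected)) === 0
            ? [$expected, (string) substr($input, strlen($expected))]
            : null;
    };
}

/**
 * Behaves exactly like $parser, but on success calls $receiver with the
 * consumed input, the output and the remainder as a side effect.
 */
function emitWithContext(callable $parser, callable $receiver): callable
{
    return function (string $input) use ($parser, $receiver): ?array {
        $result = $parser($input);
        if ($result !== null) {
            [$output, $remainder] = $result;
            // Consumed input = whole input minus the remainder.
            $consumed = (string) substr($input, 0, strlen($input) - strlen($remainder));
            $receiver($consumed, $output, $remainder);
        }
        return $result;
    };
}

// Record every successfully parsed prefix in a plain array used as a cache.
$cache = [];
$parser = emitWithContext(
    token('/a/b/'),
    function (string $consumed, $output, string $remainder) use (&$cache): void {
        $cache[$consumed] = ['output' => $output, 'remainder' => $remainder];
    }
);
$result = $parser('/a/b/file4');
```

The receiver is only called on success, so a failed parse leaves the cache untouched.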
from parsica.
I think I found a way to achieve what I initially wanted to do.
I'll try to implement it and close this PR by the end of this week.
--
Do you think we should add this utility?
function lazy(callable /*: () => Parser */ $parserFactory): Parser
{
    return Parser::make(function (string $input) use ($parserFactory): ParseResult {
        $parser = $parserFactory();
        return $parser->run($input);
    });
}
$parser = lazy(fn() => string('joris'));
from parsica.
Good to hear you fixed it. The lazy combinator was suggested by someone on Twitter as well. I'm a bit hesitant to add it, because it forces the user to think about performance. I'm hoping to avoid that. We have some ideas that we want to try, but there are a number of intermediate steps we need first. So I don't want to add performance features before we can measure it and exhaust other options.
from parsica.
I finished implementing the parser using Parsica. The library is now working as expected : ).
Some notes:
- Unexpected behaviour when using or() (used under the hood by either) with emit().
either(
    emit(
        success(),
        fn() => var_dump("I expect to be triggered")
    ),
    emit(
        success(),
        fn() => var_dump("I do not expect to be triggered")
    )
)->try('test');
My use-case:
I'm trying to parse from the cache (sometimes successful) "or" using the root parser (always successful).
I don't expect both events to be emitted.
return either(
    $context->preflightCacheParser(),
    $root
)->followedBy($rest);
This is what allows me to skip the repeated parts:
[
  "/a/b/c/file1",
  "/a/b/c/file2",
  "/a/b/c/file3",
  "/a/b/file4"
]
As you already know, it works when replacing the current implementation of or with the commented implementation.
Perhaps it's another reason, beyond performance, to get rid of the current implementation.
"or()" is used by quite a lot of utils that would have to be rewritten, I hope the default behaviour will change.
- When using emit(), the $receiver does not have access to $output, $input and $remainder. My use case requires both $output and $remainder. I wouldn't know how to implement it with the current emit(). That said, unlike the or() behaviour, it's very easy to work around the problem by creating a new emit function.
- The Parsica implementation of the parser is very slow. This utility is now roughly 10 times slower than its previous version. Of course, we have to take into consideration that it's much easier to maintain, and an extra layer of cache would result in similar performance. Unfortunately, in some environments, cross-request caching isn't possible.
/**
 * @test
 */
public function it_should_parse_under_100_ms()
{
    $propertyName = atLeastOne(alphaNumChar());
    $type = emit(
        either(
            eof(),
            char('@')
                ->followedBy($propertyName)
                ->thenIgnore(eof()),
        ),
        function () {}
    );
    $map = emit(
        char('.')->followedBy($propertyName),
        function () {}
    );
    $list = emit(
        between(
            char('['),
            char(']'),
            either(
                char('@')
                    ->followedBy($propertyName)
                    ->map(fn($value) => [
                        'discriminatorName' => $value,
                        'keepKeys' => true
                    ]),
                $propertyName
                    ->map(fn($value) => [
                        'discriminatorName' => $value,
                        'keepKeys' => false
                    ]),
            )
        ),
        function () {}
    );
    $root = emit(
        char('$'),
        function () {}
    );
    $rest = many(any($map, $list))->followedBy($type);
    $parser = either(
        failure(), // $context->preflightCacheParser(),
        $root
    )->followedBy($rest);
    $start = microtime(true);
    for ($i = 0; $i < 500; $i++) {
        $parser->try('$.q.w[@1].e[2]@int');
    }
    $end = microtime(true);
    $this->assertLessThan(0.1, $end - $start);
}
Thanks again for your work!
from parsica.
I've also added your other test. Would it be possible to share your original parser? That way, we can define the test as a comparison, where the Parsica version must perform at most x% slower than the original one.
from parsica.
BTW I'm sure you know this, but if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course. I'm hoping to focus on performance after I get v0.4 out the door.
from parsica.
if xdebug is on, it makes that test about 7x slower on my machine. It's still too slow, of course.
Good point, I probably had it on.
--
The original parser isn't open-sourced and does much more than parsing.
I'll try to implement another Parser using imperative code by the end of this week 🤞 so we can have a proper comparison.
from parsica.
FYI (without xdebug)
Imperative
Current draft: https://gist.github.com/grifx/1efe84852f2e4dd867793d66149d152b
Time: 00:00.046, Memory: 6.00 MB
Parsica
Time: 00:00.672, Memory: 6.00 MB
$context = new ParsingContext();
for ($i = 0; $i < 1000; $i++) {
    $context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
    $this->imperativeKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
    $context->endParsingKey();
    $context->clean();
}
$context = new ParsingContext();
for ($i = 0; $i < 1000; $i++) {
    $context->startParsingKey(0, '$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty');
    $this->parsicaKeyParser->try('$.qwerty.qwerty[@qwerty].qwerty[qwerty]@qwerty', $context);
    $context->endParsingKey();
    $context->clean();
}
I'll try to clean, extract and share the unit test during the week.
Note: They are not behaving exactly the same way. For instance, the imperative parser deals with char escaping: $.www\.gooo\\\.oooogle\.com. I haven't thought about how to do it with Parsica yet.
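As a point of reference for the escaping behaviour mentioned above, here is a minimal imperative sketch of splitting a key on unescaped dots. It is not taken from the linked gist: splitOnUnescapedDots() is a hypothetical helper, and it assumes that a backslash escapes whichever character follows it.

```php
<?php
declare(strict_types=1);

/**
 * Splits a key on unescaped dots.
 * Assumption: a backslash escapes the next character, so a backslash
 * followed by a dot yields a literal dot inside the current segment.
 */
function splitOnUnescapedDots(string $key): array
{
    $segments = [];
    $current = '';
    $length = strlen($key);
    for ($i = 0; $i < $length; $i++) {
        $char = $key[$i];
        if ($char === '\\' && $i + 1 < $length) {
            // Drop the backslash and keep the escaped character literally.
            $current .= $key[++$i];
        } elseif ($char === '.') {
            // An unescaped dot closes the current segment.
            $segments[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $segments[] = $current;
    return $segments;
}

// In a single-quoted PHP string, "\." stays a backslash followed by a dot.
$segments = splitOnUnescapedDots('www\.example\.com.port');
```

Here the two escaped dots stay inside the first segment, and only the unescaped dot before "port" splits the key.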
from parsica.
Should we close this? I'm not sure if there's anything actionable right now.
from parsica.
Sorry for the late reply.
Yes, let's close this issue.
I'll release the library using the imperative parser and open a PR to introduce Parsica.
Thanks for your help!
from parsica.
@mathiasverraes FYI, I cannot close this issue since I did not open it.
from parsica.