Comments (16)

josevalim commented on September 1, 2024

Can you please describe the use case?

Show me a string that you want to parse that would require this functionality. If you can provide how such parser is implemented in ex_spirit or combine, even better.

NimbleParsec relies a lot on the fact it knows exactly what is happening. If we allow you to just hijack all of its state, then all optimizations are off, a submatch binary will be created as well as stacktrace entries.

I like the record idea. I will create another issue for it.

from nimble_parsec.

tmbb commented on September 1, 2024

> Can you please describe the use case?

Parsing XML

> Show me a string that you want to parse that would require this functionality. If you can provide how such parser is implemented in ex_spirit or combine, even better.

You can look at this example parsing: https://github.com/OvermindDL1/ex_spirit/blob/master/examples/simple_xml.exs

> If we allow you to just hijack all of its state, then all optimizations are off, a submatch binary will be created as well as stacktrace entries.

I don't want to hijack all the state. I only want to dynamically match something based on what I've matched before and update the state element based on what I match. I only need to update the state part of the tuple; all the rest would be handled by nimble.

tmbb commented on September 1, 2024

Actually, the state system would be a bonus, but giving read-only access to the tuple so that the user could tag tokens with the source positions would be pretty good already (even though this couldn't be used to parse context-sensitive languages)

josevalim commented on September 1, 2024

@tmbb thank you. I have also read your forum comment that helped me understand it better.

I have another question. The parser is almost fully tail recursive so returning the state is a no-go because we would break tail recursion. I will see if I can implement it by simply passing a context that we push forward.

I will investigate the simple xml implementation and report back.

tmbb commented on September 1, 2024

> I will investigate the simple xml implementation and report back.

The implementation is quite involved if I remember correctly, but the important thing is that there is a state map and you can push things into it.

tmbb commented on September 1, 2024

I actually use the state system in my HTML lexer to highlight matching open and closing tags.

josevalim commented on September 1, 2024

@tmbb how do you push things into it in a way that doesn't accidentally write something you have written before? Do you keep an integer too?

tmbb commented on September 1, 2024

I don't know, because I just copied that part from the file I showed you. ExSpirit only passes the state into child parsers; it doesn't overwrite the key of the parent parser. I don't remember how it's implemented internally. For "global" state, ExSpirit defines the userdata map.

Tonight (GMT) I think I can write you an XML parser that uses userdata, which is much more intuitive.

What you want for nimble is the userdata, not what @OvermindDL1 calls "the state system". That's not as useful.

tmbb commented on September 1, 2024

When using userdata to match XML tags you just need to keep your own stack in the shape of a list.
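
A minimal sketch of that idea, written in Python so that it does not imply any NimbleParsec or ExSpirit API. The names `TAG`, `check_tags` and the `open_tags` key are hypothetical, and the tag pattern is deliberately simplistic (no attributes, comments or self-closing tags). The stack is a plain list stored in a userdata map, with the innermost open tag at the head:

```python
import re

# Hypothetical pattern for this sketch: matches <name> and </name> only.
TAG = re.compile(r"<(/?)([A-Za-z][\w-]*)>")

def check_tags(text):
    """Return (ok, userdata) after pairing open and close tags with a stack."""
    userdata = {"open_tags": []}  # the user-managed stack lives in userdata
    for match in TAG.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        stack = userdata["open_tags"]
        if not closing:
            # Push by building a new list, mirroring an immutable userdata update.
            userdata = {**userdata, "open_tags": [name] + stack}
        elif stack and stack[0] == name:
            # The close tag matches the innermost open tag: pop it.
            userdata = {**userdata, "open_tags": stack[1:]}
        else:
            # The close tag does not match the innermost open tag.
            return False, userdata
    # Success only if every opened tag was closed.
    return not userdata["open_tags"], userdata

print(check_tags("<a><b>x</b></a>"))  # (True, {'open_tags': []})
```

Because the stack is ordinary data owned by the user, the parser itself only has to thread the userdata value through unchanged.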

josevalim commented on September 1, 2024

How do you handle things like HTML5 though? Where <p> may or may not have a closing tag?

josevalim commented on September 1, 2024

I have an idea of what needs to be implemented. Generally speaking, it is this:

repeat_while(combinator, initial_mfa, prelude, to_repeat, while_mfa)

The initial_mfa receives the binary, line, offset and userdata and returns {:ok, userdata} | :error. If it returns {:ok, userdata}, the parser runs the combinators in prelude and then enters the to_repeat loop, which continues for as long as while_mfa succeeds. The while_mfa receives the binary, line, offset and userdata and returns {:ok, userdata} | :error.
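
To make the control flow concrete, here is a rough sketch in Python. It is not NimbleParsec code: every name is hypothetical, the callbacks are plain functions rather than MFAs, and it assumes that an `"error"` from the while callback simply ends the loop without failing the parse. The example parses a digit followed by that many characters, which is context-sensitive and needs userdata to carry the count.

```python
def repeat_while(initial, prelude, to_repeat, while_fn):
    """Build a parser from four callables (hypothetical shapes).

    initial and while_fn take (rest, line, offset, userdata) and return
    ("ok", userdata) or "error". prelude and to_repeat take the same
    arguments and return (tokens, rest, line, offset, userdata).
    """
    def parse(rest, line=1, offset=0, userdata=None):
        result = initial(rest, line, offset, userdata)
        if result == "error":
            return "error"
        _, userdata = result
        tokens, rest, line, offset, userdata = prelude(rest, line, offset, userdata)
        while True:
            result = while_fn(rest, line, offset, userdata)
            if result == "error":
                break  # assumption: the loop ends, the parse still succeeds
            _, userdata = result
            more, rest, line, offset, userdata = to_repeat(rest, line, offset, userdata)
            tokens += more
        return tokens, rest, line, offset, userdata
    return parse

def initial(rest, line, offset, userdata):
    # Peek at the leading digit without consuming it.
    if rest[:1].isdecimal():
        return ("ok", {"remaining": int(rest[0])})
    return "error"

def prelude(rest, line, offset, userdata):
    # Consume the digit that initial already inspected.
    return [], rest[1:], line, offset + 1, userdata

def while_fn(rest, line, offset, userdata):
    # Keep looping while characters are still owed and input remains.
    if userdata["remaining"] > 0 and rest:
        return ("ok", userdata)
    return "error"

def to_repeat(rest, line, offset, userdata):
    # Consume one character and decrement the counter kept in userdata.
    userdata = {"remaining": userdata["remaining"] - 1}
    return [rest[0]], rest[1:], line, offset + 1, userdata

counted = repeat_while(initial, prelude, to_repeat, while_fn)
print(counted("3abcd"))  # (['a', 'b', 'c'], 'd', 1, 4, {'remaining': 0})
```

Note that the leading digit is looked at in `initial` and then consumed in `prelude`, so it is examined twice.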

We can use this approach to implement repeat_up_to, except we would stop using the stack and use the userdata. Although using the stack will definitely be faster since we don't need to do a key lookup. So I think I will keep userdata anyway.

So I think the next steps here are clear: add userdata support, revamp repeat_while.

The only downside of this approach is that each tag in an XML document will be parsed twice: once with a preemptive lookup in initial_mfa, and then the rest of it in prelude.

tmbb commented on September 1, 2024

> How do you handle things like HTML5 though? Where <p> may or may not have a closing tag?

I try to close the tag, and if it fails I give up and highlight the tag alone, without a matching tag. I don't need to actually parse HTML, only lex it.

tmbb commented on September 1, 2024

Matching the tags is just a bonus for better highlighting.

tmbb commented on September 1, 2024

I'll look at your commits tonight. Seems cool

josevalim commented on September 1, 2024

@tmbb I will close this when all is done and post an XML parsing example.

josevalim commented on September 1, 2024

Here is the XML example: https://github.com/plataformatec/nimble_parsec/blob/master/examples/simple_xml.exs

I need to document some of the functions used there and I will do that tomorrow.
