Giter Site home page Giter Site logo

Choice with Vec of parsers about combine HOT 4 CLOSED

stefnotch avatar stefnotch commented on September 3, 2024
Choice with Vec of parsers

from combine.

Comments (4)

Marwes avatar Marwes commented on September 3, 2024 1

Yes, that can be added (basically just copy the array impl). But in general, combine parsers are expected to be cheap to create so you may want to keep that in mind.

Since all the parsers must have the same type in the vector (or array, same issue) you must also take care that the closures cast to a function pointer so they have the same type. Alternatively you could do

choice::choice([
            string("alpha")),
            string("beta")),
            string("chi"),
            string("delta"),
...
]).map(|s| match s { "alpha" => MathChild::Identifier("α".into()), ... })

from combine.

stefnotch avatar stefnotch commented on September 3, 2024

I'm not certain if this request would actually fit my use case. My use case is re-implementing the AsciiMath grammar.

Something along the lines of

choice::choice([
            string("alpha").map(|_| MathChild::Identifier("α".into())),
            string("beta").map(|_| MathChild::Identifier("β".into())),
            string("chi").map(|_| MathChild::Identifier("χ".into())),
            string("delta").map(|_| MathChild::Identifier("δ".into())),
            string("Delta").map(|_| MathChild::Operator("Δ".into())),
            string("epsi").map(|_| MathChild::Identifier("ε".into())),
            string("varepsilon").map(|_| MathChild::Identifier("ɛ".into())),
            string("eta").map(|_| MathChild::Identifier("η".into())),
            string("gamma").map(|_| MathChild::Identifier("γ".into())),
            string("Gamma").map(|_| MathChild::Operator("Γ".into())),
            string("iota").map(|_| MathChild::Identifier("ι".into())),
            string("kappa").map(|_| MathChild::Identifier("κ".into())),
            string("lambda").map(|_| MathChild::Identifier("λ".into())),
            string("Lambda").map(|_| MathChild::Operator("Λ".into())),
            string("lamda").map(|_| MathChild::Identifier("λ".into())),
            string("Lamda").map(|_| MathChild::Operator("Λ".into())),
            string("mu").map(|_| MathChild::Identifier("μ".into())),
            string("nu").map(|_| MathChild::Identifier("ν".into())),
            string("omega").map(|_| MathChild::Identifier("ω".into())),
            string("Omega").map(|_| MathChild::Operator("Ω".into())),
            string("phi").map(|_| MathChild::Identifier("ϕ".into())),
            string("varphi").map(|_| MathChild::Identifier("φ".into())),
            string("Phi").map(|_| MathChild::Operator("Φ".into())),
            string("pi").map(|_| MathChild::Identifier("π".into())),
            string("Pi").map(|_| MathChild::Operator("Π".into())),
            string("psi").map(|_| MathChild::Identifier("ψ".into())),
            string("Psi").map(|_| MathChild::Identifier("Ψ".into())),
            string("rho").map(|_| MathChild::Identifier("ρ".into())),
            string("sigma").map(|_| MathChild::Identifier("σ".into())),
            string("Sigma").map(|_| MathChild::Operator("Σ".into())),
            string("tau").map(|_| MathChild::Identifier("τ".into())),
            string("theta").map(|_| MathChild::Identifier("θ".into())),
            string("vartheta").map(|_| MathChild::Identifier("ϑ".into())),
            string("Theta").map(|_| MathChild::Operator("Θ".into())),
            string("upsilon").map(|_| MathChild::Identifier("υ".into())),
            string("xi").map(|_| MathChild::Identifier("ξ".into())),
            string("Xi").map(|_| MathChild::Operator("Ξ".into())),
            string("zeta").map(|_| MathChild::Identifier("ζ".into())),
            string("*").map(|_| MathChild::Operator("⋅".into())),
            string("**").map(|_| MathChild::Operator("∗".into())),
            string("***").map(|_| MathChild::Operator("⋆".into())),
            string("//").map(|_| MathChild::Operator("/".into())),
            string("\\").map(|_| MathChild::Operator("\\".into())),
            string("setminus").map(|_| MathChild::Operator("\\".into())),
            string("xx").map(|_| MathChild::Operator("×".into())),
            string("|><").map(|_| MathChild::Operator("⋉".into())),
            string("><|").map(|_| MathChild::Operator("⋊".into())),
            string("|><|").map(|_| MathChild::Operator("⋈".into())),
            string("-:").map(|_| MathChild::Operator("÷".into())),
            string("@").map(|_| MathChild::Operator("∘".into())),
            string("o+").map(|_| MathChild::Operator("⊕".into())),
            string("ox").map(|_| MathChild::Operator("⊗".into())),
            string("o.").map(|_| MathChild::Operator("⊙".into())),
            string("^^").map(|_| MathChild::Operator("∧".into())),
            string("vv").map(|_| MathChild::Operator("∨".into())),
            string("nn").map(|_| MathChild::Operator("∩".into())),
            string("uu").map(|_| MathChild::Operator("∪".into())),
            string("!=").map(|_| MathChild::Operator("≠".into())),
            string(":=").map(|_| MathChild::Operator(":=".into())),
            string("lt").map(|_| MathChild::Operator("<".into())),
            string("<=").map(|_| MathChild::Operator("≤".into())),
            string("lt=").map(|_| MathChild::Operator("≤".into())),
            string("gt").map(|_| MathChild::Operator(">".into())),
            string("mlt").map(|_| MathChild::Operator("≪".into())),
            string(">=").map(|_| MathChild::Operator("≥".into())),
            string("gt=").map(|_| MathChild::Operator("≥".into())),
            string("mgt").map(|_| MathChild::Operator("≫".into())),
            string("-<").map(|_| MathChild::Operator("≺".into())),
            string("-lt").map(|_| MathChild::Operator("≺".into())),
            string(">-").map(|_| MathChild::Operator("≻".into())),
            string("-<=").map(|_| MathChild::Operator("⪯".into())),
            string(">-=").map(|_| MathChild::Operator("⪰".into())),
            string("in").map(|_| MathChild::Operator("∈".into())),
            string("!in").map(|_| MathChild::Operator("∉".into())),
            string("sub").map(|_| MathChild::Operator("⊂".into())),
            string("sup").map(|_| MathChild::Operator("⊃".into())),
            string("sube").map(|_| MathChild::Operator("⊆".into())),
            string("supe").map(|_| MathChild::Operator("⊇".into())),
            string("-=").map(|_| MathChild::Operator("≡".into())),
            string("~=").map(|_| MathChild::Operator("≅".into())),
            string("~~").map(|_| MathChild::Operator("≈".into())),
            string("~").map(|_| MathChild::Operator("∼".into())),
            string("prop").map(|_| MathChild::Operator("∝".into())),
            string("not").map(|_| MathChild::Operator("¬".into())),
            string("=>").map(|_| MathChild::Operator("⇒".into())),
            string("<=>").map(|_| MathChild::Operator("⇔".into())),
            string("AA").map(|_| MathChild::Operator("∀".into())),
            string("EE").map(|_| MathChild::Operator("∃".into())),
            string("_|_").map(|_| MathChild::Operator("⊥".into())),
            string("TT").map(|_| MathChild::Operator("⊤".into())),
            string("|--").map(|_| MathChild::Operator("⊢".into())),
            string("|==").map(|_| MathChild::Operator("⊨".into())),
            string(":|:").map(|_| MathChild::Operator("|".into())),
            string("int").map(|_| MathChild::Operator("∫".into())),
            string("oint").map(|_| MathChild::Operator("∮".into())),
            string("del").map(|_| MathChild::Operator("∂".into())),
            string("grad").map(|_| MathChild::Operator("∇".into())),
            string("+-").map(|_| MathChild::Operator("±".into())),
            string("-+").map(|_| MathChild::Operator("∓".into())),
            string("O/").map(|_| MathChild::Operator("∅".into())),
            string("oo").map(|_| MathChild::Operator("∞".into())),
            string("aleph").map(|_| MathChild::Operator("ℵ".into())),
            string("...").map(|_| MathChild::Operator("...".into())),
            string(":.").map(|_| MathChild::Operator("∴".into())),
            string(":'").map(|_| MathChild::Operator("∵".into())),
            string("/_").map(|_| MathChild::Operator("∠".into())),
            string("/_\\").map(|_| MathChild::Operator("△".into())),
            string("'").map(|_| MathChild::Operator("′".into())),
            string("\\ ").map(|_| MathChild::Operator(" ".into())),
            string("frown").map(|_| MathChild::Operator("⌢".into())),
            string("quad").map(|_| MathChild::Operator("  ".into())),
            string("qquad").map(|_| MathChild::Operator("    ".into())),
            string("cdots").map(|_| MathChild::Operator("⋯".into())),
            string("vdots").map(|_| MathChild::Operator("⋮".into())),
            string("ddots").map(|_| MathChild::Operator("⋱".into())),
            string("diamond").map(|_| MathChild::Operator("⋄".into())),
            string("square").map(|_| MathChild::Operator("□".into())),
            string("|__").map(|_| MathChild::Operator("⌊".into())),
            string("__|").map(|_| MathChild::Operator("⌋".into())),
            string("|~").map(|_| MathChild::Operator("⌈".into())),
            string("~|").map(|_| MathChild::Operator("⌉".into())),
            string("CC").map(|_| MathChild::Operator("ℂ".into())),
            string("NN").map(|_| MathChild::Operator("ℕ".into())),
            string("QQ").map(|_| MathChild::Operator("ℚ".into())),
            string("RR").map(|_| MathChild::Operator("ℝ".into())),
            string("ZZ").map(|_| MathChild::Operator("ℤ".into())),
            string("dim").map(|_| MathChild::Operator("dim".into())),
            string("mod").map(|_| MathChild::Operator("mod".into())),
            string("lub").map(|_| MathChild::Operator("lub".into())),
            string("glb").map(|_| MathChild::Operator("glb".into())),
            string("uarr").map(|_| MathChild::Operator("↑".into())),
            string("darr").map(|_| MathChild::Operator("↓".into())),
            string("rarr").map(|_| MathChild::Operator("→".into())),
            string("->").map(|_| MathChild::Operator("→".into())),
            string(">->").map(|_| MathChild::Operator("↣".into())),
            string("->>").map(|_| MathChild::Operator("↠".into())),
            string(">->>").map(|_| MathChild::Operator("⤖".into())),
            string("|->").map(|_| MathChild::Operator("↦".into())),
            string("larr").map(|_| MathChild::Operator("←".into())),
            string("harr").map(|_| MathChild::Operator("↔".into())),
            string("rArr").map(|_| MathChild::Operator("⇒".into())),
            string("lArr").map(|_| MathChild::Operator("⇐".into())),
            string("hArr").map(|_| MathChild::Operator("⇔".into())),
        ]);

from combine.

stefnotch avatar stefnotch commented on September 3, 2024

In the spirit of your other comment #289 (comment) , I'm now taking a stab at

  1. Creating a trie from the data
  2. Implementing a Parser<Input>
  3. And then I can cheaply create an optimised parser for that very use case.

I'll close this issue as soon as I got it figured out. I suppose adding the Vec impl might not be that useful after all.

from combine.

stefnotch avatar stefnotch commented on September 3, 2024

For future readers: I implemented a very basic parser that accepts a trie (using a branch on my forked https://github.com/stefnotch/qp-trie-rs/tree/fix-40 repo). It then greedily tries to find the longest match.

This particular approach is really just suited to the special case where one has a lot of different potential texts. It might also be possible to do better (?) with https://docs.rs/fst/latest/fst/ , but I have no idea how.

use combine::{
    error::{ErrorInfo, StreamError, Tracked},
    ParseError, ParseResult, Parser, Stream, StreamOnce,
};
use qp_trie::Trie;

/// A greedy parser that parses a string from a trie.
#[derive(Clone)]
pub struct TrieParser<'trie, Input, Output, E>
where
    Input: Stream<Token = char>,
    Output: Clone,
{
    trie: &'trie Trie<Vec<u8>, Output>,
    expected: E,
    _marker: std::marker::PhantomData<Input>,
}

impl<'trie, Input, Output, E> TrieParser<'trie, Input, Output, E>
where
    Input: Stream<Token = char>,
    Output: Clone,
{
    pub fn new(trie: &'trie Trie<Vec<u8>, Output>, expected: E) -> Self {
        assert!(!trie.is_empty());
        Self {
            trie,
            expected,
            _marker: std::marker::PhantomData,
        }
    }
}

pub fn trie_parser<'trie, Input, Output, E>(
    trie: &'trie Trie<Vec<u8>, Output>,
    expected: E,
) -> TrieParser<'trie, Input, Output, E>
where
    Input: Stream<Token = char>,
    Output: Clone,
{
    TrieParser::new(trie, expected)
}

impl<'trie, Input, Output, E> Parser<Input> for TrieParser<'trie, Input, Output, E>
where
    Input: Stream<Token = char>,
    Output: Clone,
    E: for<'s> ErrorInfo<'s, Input::Token, Input::Range>,
{
    type Output = Output;

    type PartialState = ();

    // Used https://docs.rs/combine/4.6.6/src/combine/parser/token.rs.html#240 as a reference
    #[inline]
    fn parse_lazy(&mut self, input: &mut Input) -> ParseResult<Self::Output, Input::Error> {
        let start = input.position();
        let mut char_bytes_container = [0; 4]; // The oversized container for the char bytes

        let mut last_match = None;

        let mut sub_trie = match combine::stream::uncons(input) {
            ParseResult::CommitOk(other) | ParseResult::PeekOk(other) => {
                let char_bytes = other.encode_utf8(&mut char_bytes_container).as_bytes();
                let sub_trie = self.trie.subtrie(char_bytes);
                if sub_trie.is_empty() {
                    return ParseResult::PeekErr(<Input as StreamOnce>::Error::empty(start).into());
                }
                sub_trie
            }
            ParseResult::PeekErr(mut error) => {
                error.error.set_position(start);
                return ParseResult::PeekErr(error);
            }
            ParseResult::CommitErr(mut error) => {
                error.set_position(start);
                return ParseResult::CommitErr(error);
            }
        };
        // Now I'm "committed" to parsing the rest of the input
        loop {
            if let Some(value) = sub_trie.get_value() {
                last_match = Some((input.checkpoint(), value));
            }

            sub_trie = match combine::stream::uncons(input) {
                ParseResult::CommitOk(other) | ParseResult::PeekOk(other) => {
                    let char_bytes = other.encode_utf8(&mut char_bytes_container).as_bytes();
                    let sub_trie = sub_trie.subtrie(char_bytes);
                    if sub_trie.is_empty() {
                        if let Some((checkpoint, value)) = last_match {
                            if let Err(err) = input.reset(checkpoint) {
                                return ParseResult::CommitErr(err.into());
                            }
                            return ParseResult::CommitOk(value.clone());
                        }
                        let mut errors = <Input as StreamOnce>::Error::from_error(
                            start,
                            StreamError::unexpected_token(other),
                        );
                        errors.add_expected(&self.expected);
                        return ParseResult::CommitErr(errors);
                    }
                    sub_trie
                }
                ParseResult::PeekErr(mut error) => {
                    if let Some((checkpoint, value)) = last_match {
                        if let Err(err) = input.reset(checkpoint) {
                            return ParseResult::CommitErr(err.into());
                        }
                        return ParseResult::CommitOk(value.clone());
                    }
                    error.error.set_position(start);
                    return ParseResult::CommitErr(error.error);
                }
                ParseResult::CommitErr(mut error) => {
                    if let Some((checkpoint, value)) = last_match {
                        if let Err(err) = input.reset(checkpoint) {
                            return ParseResult::CommitErr(err.into());
                        }
                        return ParseResult::CommitOk(value.clone());
                    }
                    error.set_position(start);
                    return ParseResult::CommitErr(error);
                }
            };
        }
    }

    fn add_error(&mut self, errors: &mut Tracked<<Input as StreamOnce>::Error>) {
        errors.error.add_expected(&self.expected);
    }
}

// Write unit tests for the trie parser
#[cfg(test)]
mod tests {
    use super::*;
    use combine::easy;
    use combine::Parser;
    use combine::Positioned;
    use qp_trie::Trie;

    #[test]
    fn test_trie_parser() {
        let mut trie = Trie::new();
        let map_insert = |map: &mut Trie<Vec<u8>, u32>, key: &str, value: u32| {
            map.insert(key.as_bytes().to_vec(), value);
        };
        map_insert(&mut trie, "abc", 1);
        map_insert(&mut trie, "abcd", 2);
        map_insert(&mut trie, "abcde", 3);
        map_insert(&mut trie, "abcdef", 4);
        map_insert(&mut trie, "abcdefg", 5);
        map_insert(&mut trie, "abcdefgh", 6);
        map_insert(&mut trie, "abcdefghi", 7);

        let mut parser = trie_parser(&trie, "expected string");
        let text = "abcd";
        let mut input = easy::Stream(text);
        let result = parser.parse_stream(&mut input);
        assert_eq!(result, ParseResult::CommitOk(2));
        assert_eq!(input.position().translate_position(text), 4);
    }
}

from combine.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.