nom's People

Contributors

badboy, cad97, chifflier, derekdreery, geal, guillaumegomez, homersimpsons, hywan, jansegre, joelself, jrakow, juchiast, kamarkiewicz, kamilaborowska, keruspe, kompass, kpp, lu-zero, lucretiel, meh, namsoocho, ngrewe, nickelc, sdroege, sourrust, tmccombs, tstorch, vickenty, waelwindows, willmurphyscode

nom's Issues

Difference with Parsec nomenclature

Some of Parsec's functions are available in nom, but not all of them, and not always with the same name:

Take remaining bytes.

Right now I'm doing this:

named!(verbatim(&'a [u8]) -> &'a str, map_res!(
    alt!(take_until_either!("|^*$#") | rest),
    str::from_utf8));

fn rest(i: &[u8]) -> IResult<&[u8], &[u8]> {
    IResult::Done(&i[i.len()..], i)
}

I need to take until one of those characters happens, or read up to the end and finish parsing. Is there a better way to do it currently?
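One way to fold both branches into a single function is sketched below. It is only an illustration: `IResult` here is a simplified local stand-in rather than nom's type, and `take_until_either_or_rest` is a hypothetical helper name, not something nom provides.

```rust
// Simplified local stand-in for nom's IResult (assumption, not the real type).
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
}

/// Takes bytes until one of `stops` is found, or the whole input if none occurs.
/// Hypothetical helper, not part of nom.
pub fn take_until_either_or_rest<'a>(
    i: &'a [u8],
    stops: &[u8],
) -> IResult<&'a [u8], &'a [u8]> {
    // Index of the first stop byte, or the input length when there is none.
    let end = i
        .iter()
        .position(|b| stops.contains(b))
        .unwrap_or(i.len());
    IResult::Done(&i[end..], &i[..end])
}
```

Because the function never fails, it does not need the alt! fallback; the trade-off is that it also succeeds with an empty match when the input starts with a stop byte.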

Add "switch" parser combinator macro

It's quite a common pattern in binary protocols to use input[0] as the message type and the remaining bytes as the message body. I propose adding a macro for the following "switch" parser:

fn parse (i: &[u8]) -> IResult<&[u8], T> {
    match takeN!(i) {
        IResult::Done(i, o) => {
            match o {
                1 => parse_a(i),
                2 => parse_b(i),
                3 => parse_c(i),
                ...
                _ => IResult::Error(Err::Code(1))
            }
        }
        IResult::Error(e) => IResult::Error(e),
        IResult::Incomplete(n) => IResult::Incomplete(n),
    }
}
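For reference, the same dispatch can be written without macros. The sketch below is self-contained and uses assumed names throughout: `IResult` is a simplified local enum, and `Message`, `parse_ping` and `parse_data` are hypothetical examples of per-type parsers.

```rust
// Simplified local stand-in for nom's IResult (assumption).
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete,
}

// Hypothetical message type used only for this illustration.
#[derive(Debug, PartialEq)]
pub enum Message {
    Ping,
    Data(u8),
}

// A ping has no body.
fn parse_ping(i: &[u8]) -> IResult<&[u8], Message> {
    IResult::Done(i, Message::Ping)
}

// A data message carries exactly one payload byte.
fn parse_data(i: &[u8]) -> IResult<&[u8], Message> {
    match i.split_first() {
        Some((&b, rest)) => IResult::Done(rest, Message::Data(b)),
        None => IResult::Incomplete,
    }
}

/// Uses the first byte as the message type and dispatches on it.
pub fn parse(i: &[u8]) -> IResult<&[u8], Message> {
    match i.split_first() {
        None => IResult::Incomplete,
        Some((&1, body)) => parse_ping(body),
        Some((&2, body)) => parse_data(body),
        Some(_) => IResult::Error(1),
    }
}
```

A macro version would only need to generate the `match` arms; the control flow stays the same.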

Separate naming of rules from definition of rules

Simple enough; it'd really just be the following + the closure-based stuff:

#[macro_export]
macro_rules! named (
    ($name:ident<$i:ty,$o:ty>, $rule:tt) => (
        fn $name( i: $i ) -> IResult<$i,$o> {
            ($rule)( i )
        }
    );
    ($name:ident, $rule:tt) => (
        named!( $name<&[u8], &[u8]>, $rule )
    );
);

If you wanted to be really fancy, you could abstract the _parser! stuff like so:

#[macro_export]
macro_rules! parser (
    (alt, $i:ident, $($rest:tt)*) => ( alt_parser!( $i | $($rest)* ) );
    ...
);

and then replace $rule:tt and ($rule)( i ) with $rule:ident, $($rest:tt)* and parser!( $rule, i, $rest ) in named.

Then you'd be able to say named!( foo, take, 3 ) and it wouldn't even have any closure overhead 😃

Bit field parsers

As mentioned by @rrichardson in UpstandingHackers/hammer#64 (comment), parsers that transform from bit positions to a tuple of fields would be useful:

pub fn be_bits_1<'a, A>(i: &[u8], a: u8) -> IResult<'a,&[u8], (A)>
pub fn be_bits_2<'a, A, B>(i: &[u8], a: u8, b: u8) -> IResult<'a,&[u8],( A, B )>
pub fn be_bits_3<'a, A, B, C>(i: &[u8], a: u8, b: u8, c: u8) -> IResult<'a,&[u8],( A, B, C )>
//.. up to 8 or so and also for le

// + a small bit of  macro magic to streamline the chaining 

// So to parse something like a TCP header, one would do something like: 
chain!(
    src_prt: be_u16 ~
    dst_prt: be_u16 ~
    seq_num: be_u32 ~
    ack_num: be_u32 ~
    (offs, _, flags): be_bits_3( 4, 3, 9 ) ~
    blah: blah ~
)
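To make the 4/3/9 split concrete, here is a minimal sketch that works on an already-parsed 16-bit word instead of a byte slice. The signature is an assumption chosen for the illustration and differs from the proposed parsers above: it takes a `u16` and returns an `Option` rather than an `IResult`.

```rust
/// Splits a big-endian 16-bit word into three fields of `a`, `b` and `c` bits,
/// most significant field first. Returns None unless every width is non-zero
/// and the widths sum to exactly 16. (Illustrative signature, not nom's API.)
pub fn be_bits_3(word: u16, a: u8, b: u8, c: u8) -> Option<(u16, u16, u16)> {
    let (a, b, c) = (u32::from(a), u32::from(b), u32::from(c));
    if a == 0 || b == 0 || c == 0 || a + b + c != 16 {
        return None;
    }
    // Mask with the lowest `bits` bits set; `bits` is at most 14 here.
    let mask = |bits: u32| -> u16 { ((1u32 << bits) - 1) as u16 };
    let first = (word >> (b + c)) & mask(a);
    let second = (word >> c) & mask(b);
    let third = word & mask(c);
    Some((first, second, third))
}
```

For the two bytes `0x50 0x18` this yields a data offset of 5, a reserved field of 0, and flags equal to `0x018`.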

count_fixed! macro doesn't seem to expand properly

Hi there,

I tried fixing this myself, but I don't understand why it's not working - sorry 😞

Essentially, this works (in a chain!):

        e_res2:     count_fixed!( call!(le_u16), u16, 10 ) ~

But this does not, despite this macro case:

        e_res2:     count_fixed!( le_u16, u16, 10 ) ~

Here's where I'm using it, if that's helpful.

`alt!` + `map!` + `call!` strange behavior

alt! + map! + call! have strange behavior.

This code works. Note the closure in map!:

named!(range<&[u8], Range>,
    alt!(
        chain!(
            start: take_char ~
            tag!("-") ~
            end: take_char,
            || {
                debug!("range: (start, end): ({:?}, {:?})", start, end);
                Range {
                    start: start,
                    end: end,
                }
            }
        ) |
        map!(
            take_char,
            |c| {
                debug!("range: c: {:?}", c);
                Range {
                    start: c,
                    end: c,
                }
            }
        )
    )
);

If we try to wrap the closure in map! with call!, it does not work:

...
        map!(
            take_char,
            call!(|c| {
                debug!("range: c: {:?}", c);
                Range {
                    start: c,
                    end: c,
                }
            })
        )
...

Error:

src/parser.rs:212:13: 212:14 error: unexpected end of macro invocation
src/parser.rs:212             })
                              ^

But if we use map! without alt!, it behaves differently. The following code works:

named!(literal<&[u8], Expr>,
    map!(
        many1!(take_char),
        call!(|cs| {
            debug!("literal: cs: {:?}", cs);
            Expr::Literal {
                chars: cs,
            }
        })
    )
);

Without call! it does not work:

named!(literal<&[u8], Expr>,
    map!(
        many1!(take_char),
        |cs| {
            debug!("literal: cs: {:?}", cs);
            Expr::Literal {
                chars: cs,
            }
        }
    )
);

Error:

src/parser.rs:220:9: 220:10 error: expected ident, found |
src/parser.rs:220         |cs| {
                          ^

Many macros could declare closures rather than just functions

As a quick example, a rewritten alt!()

#[macro_export]
macro_rules! alt (
    ($name:ident<$i:ty,$o:ty>, $($rest:tt)*) => (
        fn $name(i:$i) -> IResult<$i,$o>{
            alt_parser!(i | $($rest)*)
        }
    );
    ($($rest:tt)*) => ( | i | { alt_parser!(i | $($rest)*) } );
);

Just one additional line, and now omitting the name allows it to be used in value position.

Consumer.run() panics when consume() requests more data than available

The run() method currently doesn't check correctly whether the producer has given it enough data to meet what the consume() method requested. You'll drop out of the data collection loop because you are at eof (same thing could happen because of a ProducerError as well), with needed > acc.len(), and then try to get a slice that extends beyond the end of the buffer.

This test case demonstrates the problem:

struct TestConsumer {
    done: bool
}

impl Consumer for TestConsumer {
    fn end(&mut self) {
    }

    fn consume(&mut self, input: &[u8]) -> ConsumerState {
        if self.done {
            ConsumerState::ConsumerDone
        } else if input.len() < 2 {
            ConsumerState::Await(0, 2)
        } else {
            self.done = true;
            ConsumerState::ConsumerDone
        }
    }

    fn failed(&mut self, error_code: u32) {
        println!("failed with error code: {}", error_code);
    }
}

#[test]
fn overrun() {
    let mut p = MemProducer::new(&b"a"[..], 1);
    let mut c = TestConsumer { done: false };
    c.run(&mut p);
    assert_eq!(c.done, false);
}

The right thing would probably be to call failed(), but that usually takes error codes produced by the consumer as an argument, so I'm not sure what to do here.
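Whatever run() ends up reporting, the out-of-bounds slice itself can be avoided with a checked lookup. A minimal sketch, assuming the accumulator is a plain byte slice; `checked_window` is a hypothetical helper name, not part of nom:

```rust
/// Returns the first `needed` bytes of the accumulator, or None when the
/// producer stopped (EOF or error) before supplying that much data.
pub fn checked_window(acc: &[u8], needed: usize) -> Option<&[u8]> {
    // `get` performs the bounds check that direct slicing would panic on.
    acc.get(..needed)
}
```

The `None` case is the point where run() could stop the loop and signal a failure instead of panicking.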

Network producer

nom should be able to get data from the network and parse it as soon as it is available

How to get values from the pusher! macro?

Hi!
I used the pusher!() macro with FileProducer, but the generated code doesn't return the parser results.
For example, I have

pub struct Test {...}
...
named!(parse_test<&[u8],Test>, ...)

I want to get a [Test] or an iterable object over the sequence of Test values.
Does nom have an appropriate macro or example?

Make a size_buffer combinator

Currently, we have length_value taking the first byte as length then absorbing a buffer of that size, and length_value! taking the result of the first parser as count, then applying the second parser that many times.

There should be a combinator whose argument is a parser returning a number, then returning a buffer of that size (to be able to use be_u16, be_u32 and others as size parser).
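A rough sketch of such a combinator follows. Everything here is an assumption made for illustration: `IResult` is a simplified local enum, `be_u16` is a hand-written stand-in for the nom parser of the same name, and `length_bytes` is a hypothetical name for the proposed combinator.

```rust
// Simplified local stand-in for nom's IResult (assumption).
// Incomplete carries the number of bytes the failing step wanted.
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
    Incomplete(usize),
}

/// Hand-written big-endian u16 parser, standing in for nom's be_u16.
pub fn be_u16(i: &[u8]) -> IResult<&[u8], u16> {
    if i.len() < 2 {
        IResult::Incomplete(2)
    } else {
        IResult::Done(&i[2..], u16::from_be_bytes([i[0], i[1]]))
    }
}

/// Runs `size` to read a length, then returns a buffer of that many bytes.
pub fn length_bytes<'a, F>(i: &'a [u8], size: F) -> IResult<&'a [u8], &'a [u8]>
where
    F: Fn(&'a [u8]) -> IResult<&'a [u8], u16>,
{
    match size(i) {
        IResult::Incomplete(n) => IResult::Incomplete(n),
        IResult::Done(rest, len) => {
            let len = usize::from(len);
            if rest.len() < len {
                IResult::Incomplete(len)
            } else {
                IResult::Done(&rest[len..], &rest[..len])
            }
        }
    }
}
```

Making the length parser a parameter is what lets be_u16, be_u32 and similar parsers be swapped in; a real version would be generic over the integer type rather than fixed to `u16`.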

take_until_either! returns error when bytes aren't found

named!(xxx, take_until_either!("!."));

#[test]
fn test_take_until_either() {
    assert_eq!(
        xxx(&b"123"[..]),
        nom::IResult::Incomplete(nom::Needed::Unknown)
    );
}
//
//thread 'test_take_until_either' panicked at 'assertion failed: `(left == right)` (left: `Error(Position(14, [49, 50, 51]))`, right: `Incomplete(Unknown)`)'

I'd expect the result of the above to be Incomplete(Unknown) because it's unknown how much more must be read until any of the bytes are found. Is the error result correct?

Also compare this to take_until...

named!(xxx2, take_until!("end"));

#[test]
fn test_take_until() {
    assert_eq!(
        xxx2(&b"123"[..]),
        nom::IResult::Incomplete(nom::Needed::Unknown)
    );
}
//
//thread 'test_take_until' panicked at 'assertion failed: `(left == right)` (left: `Incomplete(Size(4))`, right: `Incomplete(Unknown)`)'

Channel producer

It should be possible to build a producer from an incoming channel, to parse data sent from another thread
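As a rough illustration of the idea, the sketch below uses only `std::sync::mpsc` and no nom types. `drain_lines` is a hypothetical function, and splitting on newlines stands in for whatever parser would consume the buffered data.

```rust
use std::sync::mpsc::Receiver;

/// Buffers chunks arriving on a channel and extracts newline-terminated lines.
/// The loop ends once every sender has been dropped.
pub fn drain_lines(rx: Receiver<Vec<u8>>) -> Vec<String> {
    let mut buf: Vec<u8> = Vec::new();
    let mut lines = Vec::new();
    for chunk in rx {
        buf.extend_from_slice(&chunk);
        // Extract every complete line currently in the buffer.
        while let Some(pos) = buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = buf.drain(..=pos).collect();
            // Drop the trailing newline before converting.
            lines.push(String::from_utf8_lossy(&line[..pos]).into_owned());
        }
    }
    lines
}
```

Data that spans two chunks is handled because incomplete input stays in `buf` until the next chunk arrives, which is the same role the producer/consumer accumulator plays.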

Missing Readme file.

Steps to reproduce: try reading the Readme file.

Expected result: the readme file explains what this is about.
Actual result: there is no readme file.

Consumer::run doesn't stop after ConsumerState::ConsumerError is returned

If you return ConsumerError from Consumer::consume, Consumer::run will essentially run in an infinite loop. I think this is due to the empty match branch in https://github.com/Geal/nom/blob/master/src/consumer.rs#L162.

Consumer::run() should ideally stop execution (at least stop calling Consumer::consume with invalid data) when this happens. With the current behavior, it's impossible to recover from parsing invalid data using a Consumer.

The consumed field in Await is confusing

Right now, the consume() method has to calculate every time how much data has been consumed. This complicates the code and makes it error prone.

One solution could be returning the remaining input, and let the run() method calculate how much data has been consumed.
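The calculation run() would perform is small. A sketch, assuming the returned remaining input is always a suffix of the input that was passed in; `consumed_len` is a hypothetical helper name:

```rust
/// Number of bytes consumed, derived from the input and the remaining input.
/// Returns None if `remaining` is longer than `input`, which would mean the
/// suffix assumption was violated.
pub fn consumed_len(input: &[u8], remaining: &[u8]) -> Option<usize> {
    input.len().checked_sub(remaining.len())
}
```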

nom consuming 100% cpu

I am exploring the possibility of switching to nom in a project I am working on. I am not fully familiar with nom yet, so please bear with me.

For starters, I was trying to come up with a parser that matches strings of the form [a-zA-Z][-a-zA-Z0-9_]*. I wrote this:

#[macro_use]
extern crate nom;

use std::str::from_utf8;

use nom::{alpha, alphanumeric};
use nom::{IResult, Needed};
use nom::IResult::*;

named!(identifier<&[u8], String>,
       chain!(
           h: map_res!(alpha, from_utf8) ~
           t: many0!(alt!(alphanumeric | tag!("-") | tag!("_"))),
           || {
               let  s = h.to_string();
               t.into_iter().fold(s, |mut accum, slice| {
                   accum.push_str(from_utf8(slice).unwrap()); accum })}));

And I tested it with:

    #[test]
    fn id_name() {
        let a_setting = &b"miles"[..];
        let res = setting_name(a_setting);
        assert_eq!(res, Done(&b""[..], "miles".to_string()));
    }

When I run cargo test my PC completely hangs. With top I can see that it starts consuming more and more CPU and memory until the entire system is completely unusable and I have to hard reset.

Am I doing something wrong? Is this the best way to make a parser to match this type of strings?
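One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is a repetition combinator wrapping a parser that can succeed without consuming any input: the loop then never advances and keeps pushing results. The self-contained sketch below shows a guard against that. `IResult`, `alphanumeric` and `many0_guarded` are local stand-ins written for this illustration, not nom's implementations.

```rust
// Simplified local stand-in for nom's IResult (assumption).
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
    Error,
}

/// Takes a run of ASCII alphanumerics. Note that it succeeds even when the
/// run is empty, which is what makes an unguarded repetition loop forever.
pub fn alphanumeric(i: &[u8]) -> IResult<&[u8], &[u8]> {
    let end = i
        .iter()
        .position(|b| !b.is_ascii_alphanumeric())
        .unwrap_or(i.len());
    IResult::Done(&i[end..], &i[..end])
}

/// Applies `parser` repeatedly, stopping on failure or when a successful
/// application consumed nothing.
pub fn many0_guarded<'a, O, F>(mut i: &'a [u8], parser: F) -> IResult<&'a [u8], Vec<O>>
where
    F: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
{
    let mut out = Vec::new();
    loop {
        match parser(i) {
            // Only keep results that made progress.
            IResult::Done(rest, o) if rest.len() < i.len() => {
                out.push(o);
                i = rest;
            }
            _ => return IResult::Done(i, out),
        }
    }
}
```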

Why use `IResult` instead of `std::result::Result<Status<I, O>, Error>`?

Why use:

pub enum IResult<I,O> {
  Done(I,O),
  Error(Err),
  Incomplete(u32)
}

why not (for example):

pub enum Status<I, O> {
    Done(I, O),
    Incomplete(u32)
}

type Result<I, O> = std::result::Result<Status<I, O>, Error>;

Because of the current design, there is no possibility to use map_err or try!...
What is the reason for it?
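To show what the alternative would allow, here is a small self-contained sketch. The names `PResult`, `ParseError`, `digit` and `two_digits` are hypothetical, and the `?` operator is used as the modern equivalent of `try!`.

```rust
#[derive(Debug, PartialEq)]
pub enum Status<I, O> {
    Done(I, O),
    Incomplete(u32),
}

// Hypothetical error type carrying a numeric code.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub u32);

pub type PResult<I, O> = Result<Status<I, O>, ParseError>;

/// Parses one ASCII digit.
pub fn digit(i: &[u8]) -> PResult<&[u8], u8> {
    match i.split_first() {
        None => Ok(Status::Incomplete(1)),
        Some((&b, rest)) if b.is_ascii_digit() => Ok(Status::Done(rest, b - b'0')),
        Some(_) => Err(ParseError(1)),
    }
}

/// Parses two digits, using `?` for early return and `map_err` to tag
/// errors that come from the second digit.
pub fn two_digits(i: &[u8]) -> PResult<&[u8], u8> {
    let (i, tens) = match digit(i)? {
        Status::Done(i, o) => (i, o),
        Status::Incomplete(n) => return Ok(Status::Incomplete(n)),
    };
    let second = digit(i).map_err(|ParseError(code)| ParseError(code + 100))?;
    match second {
        Status::Done(i, ones) => Ok(Status::Done(i, tens * 10 + ones)),
        Status::Incomplete(n) => Ok(Status::Incomplete(n)),
    }
}
```

The errors propagate through `?`, but `Incomplete` still has to be matched by hand, so the split into two types moves the boilerplate rather than removing it entirely.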

error: macro undefined: 'delimited1!'

When using the delimited! macro, it throws the following error:

<nom macros>:8:32: 8:42 error: macro undefined: 'delimited1!'
<nom macros>:8 IResult:: Done ( i1 , _ ) => { delimited1 ! ( i1 , $ ( $ rest ) * ) } } } ) ;
                                              ^~~~~~~~~~

Reason:
the delimited1! and delimited2! macros don't have the #[macro_export] attribute

alt!() is not commutative

I came across something interesting when writing a parser to match a string literal with escape sequences.

Consider this code:

// ~~~ String literal parser and auxiliary parsers ~~~
named!(not_escaped_seq<&[u8], &[u8]>, take_until_either!(&b"\\\""[..]));
named!(escaped_seq, alt!(tag!("\\r") | tag!("\\n") | tag!("\\t") | tag!("\\\"") | tag!("\\\\")));
named!(string_literal<&[u8], String>,
       chain!(
           tag!("\"") ~
           s: many0!(map_res!(alt!(escaped_seq | not_escaped_seq), from_utf8)) ~
           tag!("\""),
           || {
               syntax::parse::str_lit(&s.into_iter().fold(String::new(),
                                                          |mut accum, slice| {
                                                              accum.push_str(slice);
                                                              accum
                                                          })[..])}));

It matches string literals that can contain any of the escaped sequences listed in escaped_seq. This parser works as expected; however, switching the order of the options in alt!(escaped_seq | not_escaped_seq) makes the parser unable to recognize any string literal that contains at least one escape sequence.

That is, replacing this line:

           s: many0!(map_res!(alt!(escaped_seq | not_escaped_seq), from_utf8)) ~

with:

           s: many0!(map_res!(alt!(not_escaped_seq | escaped_seq), from_utf8)) ~

breaks the parser. Here are 2 test cases:

    #[test]
    fn single_str_scalar_value() {
        let input = &b"\"a string literal\""[..];
        let res = str_scalar_value(input);
        assert_eq!(res, Done(&b""[..], "a string literal".to_string()));        
    }

    #[test]
    fn single_str_scalar_value2() {
        let input = &b"\"A backslash in quotes: \\\"\\\\\\\"\""[..];
        let res = str_scalar_value(input);
        assert_eq!(res, Done(&b""[..], "A backslash in quotes: \"\\\"".to_string()));       
    }

The former passes with both versions; the latter fails with the 2nd version of the parser (the parser returns an Error). In general, any string with an escaped sequence is not recognized by the 2nd version of the parser.

Shouldn't alt be commutative?
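An ordered-choice combinator returns the first alternative that succeeds, so the order matters whenever an earlier alternative can succeed on input the later one was meant to handle. The sketch below illustrates this with local stand-ins. `IResult`, `alt2`, `escaped_quote` and `until_special` are hypothetical names, and `until_special` is assumed to succeed with an empty match when the input starts with a special byte, which may or may not match what take_until_either! does in a given nom version.

```rust
// Simplified local stand-in for nom's IResult (assumption).
#[derive(Debug, PartialEq)]
pub enum IResult<I, O> {
    Done(I, O),
    Error,
}

/// Matches the two-byte escape sequence backslash + double quote.
pub fn escaped_quote(i: &[u8]) -> IResult<&[u8], &[u8]> {
    let t = b"\\\"";
    if i.starts_with(t) {
        IResult::Done(&i[t.len()..], &i[..t.len()])
    } else {
        IResult::Error
    }
}

/// Takes bytes up to the first backslash or quote; succeeds with an empty
/// match when the input starts with one of them (assumed behavior).
pub fn until_special(i: &[u8]) -> IResult<&[u8], &[u8]> {
    match i.iter().position(|&b| b == b'\\' || b == b'"') {
        Some(end) => IResult::Done(&i[end..], &i[..end]),
        None => IResult::Error,
    }
}

/// Ordered choice: tries `first`, and only on error tries `second`.
pub fn alt2<'a, O, A, B>(i: &'a [u8], first: A, second: B) -> IResult<&'a [u8], O>
where
    A: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
    B: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
{
    match first(i) {
        IResult::Error => second(i),
        done => done,
    }
}
```

With the escape parser first, the escape sequence is consumed. With the order swapped, the first alternative succeeds without consuming anything, so the escape parser is never tried and an enclosing repetition makes no progress.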

tag! and byte arrays

What is the correct way to use tag! with a fixed byte array? tag!([42u8, 42u8]) fails, as AsBytes is not implemented for [u8; 2].
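For illustration only, the sketch below shows how a conversion trait can cover both slices and fixed-size arrays. The `AsBytes` trait here is a local stand-in that merely shares the name with nom's trait, `tag` is a plain function rather than the tag! macro, and the array impl relies on const generics, which Rust did not have when this issue was filed.

```rust
// Local stand-in trait (assumption); it only shares a name with nom's AsBytes.
pub trait AsBytes {
    fn as_bytes(&self) -> &[u8];
}

impl AsBytes for [u8] {
    fn as_bytes(&self) -> &[u8] {
        self
    }
}

// One impl covers every array length thanks to const generics.
impl<const N: usize> AsBytes for [u8; N] {
    fn as_bytes(&self) -> &[u8] {
        &self[..]
    }
}

/// Returns (remaining, matched) when `i` starts with the bytes of `t`.
pub fn tag<'a, T: AsBytes + ?Sized>(i: &'a [u8], t: &T) -> Option<(&'a [u8], &'a [u8])> {
    let t = t.as_bytes();
    if i.starts_with(t) {
        Some((&i[t.len()..], &i[..t.len()]))
    } else {
        None
    }
}
```

Without such an impl, slicing the array first, as in `&[42u8, 42u8][..]`, turns it into a `&[u8]` that a slice-based impl can accept.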

Bit sized parsers

Most of the parsers work on byte arrays, but IResult is completely generic, so accepting BitVec as input should be possible.

Example parsers

We currently have a few example parsers. In order to test the project and make it useful, other formats can be implemented. Here is a list, if anyone wants to try it:

Junk in the cargo package source

nom 0.3.10 has lots of large files in the cargo package source -- a 5 MB mp4 and more files olddoc, oldsrc (see at the end for file listing). (removed)

This is a reminder that cargo includes all non-ignored files in your working directory when you publish; look at git status before you publish.

(I downloaded all crates.io crates and I started to grep for junk)

Bit level parsing

Since nom is mostly generic, it should be possible to apply parsers on bit slices. The naive way is to do it like this:

let bv = BitVec::from_bytes(&data[..]);
let bits: Vec<bool> = bv.iter().collect();

This is not very efficient, and a lot of bit level manipulations are calculations, so working with booleans is not the right way.
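An alternative that avoids materializing booleans is to keep the byte slice and carry a bit offset alongside it. The sketch below is a minimal illustration of that idea; `take_bits` is a hypothetical function, and it reads one bit at a time for clarity rather than speed.

```rust
/// Reads `count` bits (at most 32), most significant bit first, starting
/// `offset` bits into `data`. Returns the value and the new bit offset, or
/// None if the request runs past the end of the data.
pub fn take_bits(data: &[u8], offset: usize, count: usize) -> Option<(u32, usize)> {
    let total_bits = data.len().checked_mul(8)?;
    let end = offset.checked_add(count)?;
    if count > 32 || end > total_bits {
        return None;
    }
    let mut value = 0u32;
    for bit in offset..end {
        let byte = data[bit / 8];
        // Bit 0 of the stream is the most significant bit of the first byte.
        let shift = 7 - (bit % 8);
        value = (value << 1) | u32::from((byte >> shift) & 1);
    }
    Some((value, end))
}
```

The returned offset plays the role of the remaining input, so calls can be chained the same way byte-level parsers are.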

Error(u32) is not used currently

An error code might not be the best way to represent that something went wrong. Returning an accumulation of sub parser errors could indicate what parsing path failed, instead of a global parsing error.

Document ConsumerState variant values

Although there are comments describing the variants of ConsumerState::Await and ConsumerState::Seek in https://github.com/Geal/nom/blob/master/src/consumer.rs#L76, there are no documentation comments on these variants, and thus no indication at all as to what these variants do in the documentation.

Fixing this would be as simple as adding /// (amount_consumed, buffer_needed) before Await and /// (consumed, new_position, buffer_needed) before Seek. That wouldn't be very much documentation, but it would be hugely useful for those trying to figure out ConsumerState from just looking at documentation.

Producers produce fixed size chunks

There are cases where we do not know how much data we need at first, but after getting a header, we know what chunk size would be optimal.
Making the producers able to produce arbitrarily sized chunks could be useful

Public named functions/parsers

I guess re-exporting would be an alternative to adding a public export option to named parsers. What do you think of adding something like named!(pub foo<&u8>, ...)?

Provide some way to access nom::Needed values

I know the documentation says (for now the value is ignored, but it should indicate how much is needed), but it seems like most built-in parsers return reasonable values for this, and it would be super nice to be able to use when returning a ConsumerState::Await value from a consumer.

I would think this would be as simple as:

nom::IResult::Incomplete(x) => {
    let x = match x {
        nom::internal::Needed::Size(x) => x,
        nom::internal::Needed::Unknown => 1,
    };
    ConsumerState::Await(0, x)
},

But alas, this does not work, due to Needed::Size and Needed::Unknown both being private variants. (error: variant `Size` is private)

It would be nice to make at least Needed::Size public, or to add a possible_size(&self) method which would return Option<usize>.
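The suggested accessor could look roughly like the sketch below. `Needed` and `ConsumerState` are redefined locally so the example compiles on its own; they are simplified stand-ins, and `await_for` is a hypothetical helper showing the intended use.

```rust
// Local stand-in for nom's Needed (assumption), with public variants.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Needed {
    Unknown,
    Size(usize),
}

impl Needed {
    /// Returns the requested size when it is known.
    pub fn possible_size(&self) -> Option<usize> {
        match *self {
            Needed::Size(n) => Some(n),
            Needed::Unknown => None,
        }
    }
}

// Simplified stand-in for the consumer state; only Await is modeled.
#[derive(Debug, PartialEq)]
pub enum ConsumerState {
    Await(usize, usize),
}

/// Maps a Needed value to an Await request, asking for one byte when the
/// size is unknown (same fallback as in the snippet above).
pub fn await_for(needed: Needed) -> ConsumerState {
    ConsumerState::Await(0, needed.possible_size().unwrap_or(1))
}
```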

Is it possible to use namespaced functions in chain!()?

When using chain!(), using space works, but not nom::space: error: no rules expected the token `::`.

Would it be possible to support this, or is this impossible/too complicated with the current macro system and the way chain!() is built? It would be nice not to have to import every function with use before using it in chain!().

This is mostly a question of whether this is currently possible; it isn't really needed if it isn't possible.

The data in Incomplete is not used right now

Most of the Incomplete(usize) instances return 0 right now. Here are the possible fixes:

  • remove the field entirely, and let the calling code manage data aggregation automatically. This is easy (and corresponds to how the current code works).
  • return a sum type, something like Unknown|Size(usize). This adds more code in pattern matches, but it handles the case where we do not know how much data should be returned, and the case where we know, and the calling parser can augment it

There is still the problem that parsers are just functions, and do not have a value attached for the minimal data size they could need.

Still, returning a needed size is useful in cases where you need to seek, or load a large part of a file in memory, instead of chunking.
