rust-scan-rules's Introduction

scan-rules

This crate provides some macros for quickly parsing values out of text. Roughly speaking, it does the inverse of the print!/format! macros; or, in other words, a similar job to scanf from C.

The macros of interest are:

  • readln! - reads and scans a line from standard input.
  • try_readln! - like readln!, except it returns a Result instead of panicking.
  • scan! - scans the provided string.

Plus two convenience macros:

  • let_scan! - scans a string and binds captured values directly to local variables. Only supports one pattern and panics if it doesn't match.
  • let_scanln! - reads and scans a line from standard input, binding captured values directly to local variables. Only supports one pattern and panics if it doesn't match.

If you are interested in implementing support for your own types, see the ScanFromStr trait.

The available abstract scanners can be found in the scanner module.

Compatibility

scan-rules is compatible with rustc version 1.6.0 and higher.

  • Due to a breaking change, scan-rules is not compatible with regex version 0.1.66 or higher.

  • rustc < 1.10 will not have the let_scanln! macro.

  • rustc < 1.7 will have only concrete implementations of ScanFromStr for the Everything, Ident, Line, NonSpace, Number, Word, and Wordish scanners for &str and String output types. 1.7 and higher will have generic implementations for all output types such that &str: Into<Output>.

  • rustc < 1.6 is explicitly not supported, due to breaking changes in Rust itself.
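The `&str: Into<Output>` bound mentioned in the rustc 1.7 note can be illustrated without the crate. The following is a minimal, std-only sketch: `first_word` is a hypothetical helper, not part of scan-rules, and it only mimics the shape of a scanner whose output type is chosen by the caller.

```rust
/// Hypothetical stand-in for a `Word`-like scanner (NOT the crate's API).
/// Returns the first whitespace-delimited word, converted into whatever
/// `Output` type the caller asks for, as long as `&str` converts into it.
fn first_word<'a, Output>(input: &'a str) -> Option<Output>
where
    &'a str: Into<Output>,
{
    // `split_whitespace` skips leading spaces and yields borrowed slices;
    // `into()` then performs the caller-selected conversion.
    input.split_whitespace().next().map(|word| word.into())
}
```

With this shape, `let s: Option<String> = first_word("hello world");` yields an owned string, while `let s: Option<&str> = first_word("hello world");` borrows from the input. A single generic implementation covers both, which is presumably what the concrete `&str` and `String` implementations had to spell out separately on older compilers.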

Quick Examples

Here is a simple CLI program that asks the user their name and age. You can run this using cargo run --example ask_age.

#[macro_use] extern crate scan_rules;

use scan_rules::scanner::Word;

fn main() {
    print!("What's your name? ");
    let name: String = readln! { (let name: Word<String>) => name };
    //                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^ rule
    //                                                       ^~~^ body
    //                           ^~~~~~~~~~~~~~~~~~~~~~~^ pattern
    //                            ^~~~~~~~~~~~~~~~~~~~~^ variable binding

    print!("Hi, {}.  How old are you? ", name);
    readln! {
        (let age) => {
    //   ^~~~~~^ implicitly typed variable binding
            let age: i32 = age;
            println!("{} years old, huh?  Neat.", age);
        },
        (..other) => println!("`{}` doesn't *look* like a number...", other),
    //   ^~~~~~^ bind to any input "left over"
    }

    print!("Ok.  What... is your favourite colour? (R, G, B): ");
    let_scanln!(let r: f32, ",", let g: f32, ",", let b: f32);
    //          ^~~~^            ^~~~^            ^~~~^
    // Scans and binds three variables without nesting scope.
    // Panics if *anything* goes wrong.
    if !(g < r && g < b && b >= r * 0.25 && b <= r * 0.75) {
        println!("Purple's better.");
    } else {
        println!("Good choice!");
    }
}

This example shows how to parse one of several different syntaxes. You can run this using cargo run --example scan_data.

#[macro_use] extern crate scan_rules;

use std::collections::BTreeSet;

// `Word` is an "abstract" scanner; rather than scanning itself, it scans some
// *other* type using custom rules.  In this case, it scans a word into a
// string slice.  You can use `Word<String>` to get an owned string.
use scan_rules::scanner::Word;

#[derive(Debug)]
enum Data {
    Vector(i32, i32, i32),
    Truthy(bool),
    Words(Vec<String>),
    Lucky(BTreeSet<i32>),
    Other(String),
}

fn main() {
    print!("Enter some data: ");
    let data = readln! {
        ("<", let x, ",", let y, ",", let z, ">") => Data::Vector(x, y, z),
    //      ^ pattern terms are comma-separated
    //   ^~^ literal text match

        // Rules are tried top-to-bottom, stopping as soon as one matches.
        (let b) => Data::Truthy(b),
        ("yes") => Data::Truthy(true),
        ("no") => Data::Truthy(false),

        ("words:", [ let words: Word<String> ],+) => Data::Words(words),
    //             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^ repetition pattern
    //                                         ^ one or more matches
    //                                        ^ matches must be comma-separated

        ("lucky numbers:", [ let ns: i32 ]*: BTreeSet<_>) => Data::Lucky(ns),
    //          collect into specific type ^~~~~~~~~~~~^
    //                                    ^ zero or more (you might be unlucky!)
    //                                      (no separator this time)

        // Rather than scanning a sequence of values and collecting them into
        // a `BTreeSet`, we can instead scan the `BTreeSet` *directly*.  This
        // scans the syntax `BTreeSet` uses when printed using `{:?}`:
        // `{1, 5, 13, ...}`.
        ("lucky numbers:", let ns) => Data::Lucky(ns),

        (..other) => Data::Other(String::from(other))
    };
    println!("data: {:?}", data);
}

This example demonstrates using runtime scanners and the let_scan! convenience macro. You can run this using cargo run --example runtime_scanners.

//! **NOTE**: requires the `regex` feature.
#[macro_use] extern crate scan_rules;

fn main() {
    use scan_rules::scanner::{
        NonSpace, Number, Word,             // static scanners
        max_width_a, exact_width_a, re_str, // runtime scanners
    };

    // Adapted example from <http://en.cppreference.com/w/cpp/io/c/fscanf>.
    let inp = "25 54.32E-1 Thompson 56789 0123 56ß水";

    // `let_scan!` avoids the need for indentation and braces, but only supports
    // a single pattern, and panics if anything goes wrong.
    let_scan!(inp; (
        let i: i32, let x: f32, let str1 <| max_width_a::<NonSpace>(9),
    //               use runtime scanner ^~~~~~~~~~~~~~~~~~~~~~~~~~~~^
    //          limit maximum width of a... ^~~~~~~~~~^
    //                      ...static NonSpace scanner... ^~~~~~~^
    //                                                      9 bytes ^
        let j <| exact_width_a::<i32>(2), let y: f32, let _: Number,
    //        ^~~~~~~~~~~~~~~~~~~~~~~~~^ scan an i32 with exactly 2 digits
        let str2 <| re_str(r"^[0-9]{1,3}"), let warr: Word
    //           ^~~~~~~~~~~~~~~~~~~~~~~~^ scan using a regular expression
    ));

    println!(
        "Converted fields:\n\
            i = {i:?}\n\
            x = {x:?}\n\
            str1 = {str1:?}\n\
            j = {j:?}\n\
            y = {y:?}\n\
            str2 = {str2:?}\n\
            warr = {warr:?}",
        i=i, j=j, x=x, y=y,
        str1=str1, str2=str2, warr=warr);
}

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual licensed as above, without any additional terms or conditions.

rust-scan-rules's People

Contributors

danielkeep, eroc33

rust-scan-rules's Issues

More scanners

Ideas for more scanners.

Self scanners:

  • std::path::Path[Buf] (rejected) - Path would probably only work on paths without spaces in them. You could also quote them, but that would only work for PathBuf. One other edge case is handling path separators for the current platform (for parsing PATH). There's enough variance that it might be simpler to just require the use of an abstract scanner. Decided against: there isn't enough benefit here over just using a regular string scanner of some description. Once rust-lang/rust#26448 gets fixed, they should all just work for paths, anyway.

Abstract scanners:

  • Various for std::time::Duration - Duration doesn't have a very nice Debug representation (although it could be parsed). Would probably be nicer to just have scanners for various units of time (Seconds, Days, etc.). ISO 8601 is a bit ugly, but covers all the usual bases. If there's demand, shorthand f64 -> Duration scanners can be added.
  • HorizontalSpace - match any horizontal spaces: \x20, \t, etc., but not newlines or vertical tabs.
  • Newline - match any of \r, \r\n, \n.
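The matching rules proposed for HorizontalSpace and Newline can be sketched as plain functions. This is only an illustration under assumptions: the function names are hypothetical, the functions return byte lengths rather than implementing the crate's scanner traits, and form feed is treated as "vertical" here, which the issue does not specify.

```rust
/// True for whitespace that does not move to another line.
/// Assumption: `\n`, `\r`, vertical tab and form feed are all excluded.
fn is_horizontal_space(c: char) -> bool {
    c.is_whitespace() && !matches!(c, '\n' | '\r' | '\x0B' | '\x0C')
}

/// Length in bytes of the leading run of horizontal whitespace (may be 0).
fn scan_horizontal_space(input: &str) -> usize {
    input
        .char_indices()
        .find(|&(_, c)| !is_horizontal_space(c))
        .map_or(input.len(), |(offset, _)| offset)
}

/// Length in bytes of a leading newline sequence, if there is one.
/// `\r\n` is checked first so it is consumed as a single newline.
fn scan_newline(input: &str) -> Option<usize> {
    if input.starts_with("\r\n") {
        Some(2)
    } else if input.starts_with('\r') || input.starts_with('\n') {
        Some(1)
    } else {
        None
    }
}
```

Checking `\r\n` before the single-character cases matters: otherwise a Windows line ending would be scanned as two separate newlines.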

Runtime scanners:

  • until(impl Pattern) - slice until a given std::str::pattern::Pattern matches.
  • pat(impl Pattern) - use str::starts_with. Impossible given the current Pattern design.
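The slicing behaviour behind the `until` idea might look roughly like the sketch below. It is deliberately narrower than the proposal: `std::str::pattern::Pattern` cannot be named in bounds on stable Rust, so this hypothetical helper accepts only a `&str` delimiter, and it is a free function rather than a runtime scanner for the crate.

```rust
/// Splits `input` just before the first occurrence of `delimiter`.
/// Returns `(before, rest)`, where `rest` still starts with the delimiter,
/// or `None` if the delimiter never occurs (a design choice of this sketch).
fn until<'a>(input: &'a str, delimiter: &str) -> Option<(&'a str, &'a str)> {
    // `find` returns the byte offset of the match, which is always a valid
    // char boundary, so `split_at` cannot panic here.
    input.find(delimiter).map(|at| input.split_at(at))
}
```

Leaving the delimiter in the remainder lets the next pattern term decide whether to consume it, for example as a literal text match.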

Maybe mark as deprecated?

I was looking for something like this and thought it was a good solution until I discovered that it panics when (for example) let_scan! doesn't match a pattern, which means it is not useful in production code. Finding that it hasn't been updated since 2016 and that there are no issues makes me think it is neither used nor maintained, so it might be good to add a warning to the README, such as a deprecation notice, and point people to other solutions.

In my case I found the Regex was a good alternative. You can see my newbie journey summed up in this topic.

Meanwhile, thanks for providing a nice solution. If it were picked up and brought up to date with idiomatic Rust (i.e. returning Result rather than panicking), I think it could still be a neat option. I like the Rust pattern matching style, for example.

Consider non-Self composite scanners (tuples, arrays, option, etc.)

Currently, the pattern (Word, Option<u64>) doesn't scan because Word can't scan to itself. This can be solved by changing the tuple scanners to not require elements to implement ScanSelfFromStr.

However, this breaks type inference; in that case, (Word, _) won't work.

A compromise would be to create some kind of NonSelf adaptor and implement it for various important types.

One possible problem is that I don't know whether it could be made general, but it could be quite handy to have nevertheless.
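To make the trade-off concrete, here is a heavily simplified sketch of a tuple scanner that does not require its elements to scan to themselves. The `Scan` trait, its signature, and this `Word` are hypothetical stand-ins written for this illustration; they are not the crate's ScanFromStr or ScanSelfFromStr definitions.

```rust
/// Simplified scanning trait: the type that drives the scan (`Self`) is
/// separate from the type it produces (`Output`).
trait Scan<'a> {
    type Output;
    /// Returns the scanned value and the number of bytes consumed.
    fn scan(input: &'a str) -> Option<(Self::Output, usize)>;
}

/// Abstract scanner: never constructed, only used to select scanning rules.
struct Word;

impl<'a> Scan<'a> for Word {
    type Output = &'a str;
    fn scan(input: &'a str) -> Option<(&'a str, usize)> {
        // Skip leading whitespace, then take everything up to the next space.
        let start = input.len() - input.trim_start().len();
        let rest = &input[start..];
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        if len == 0 { None } else { Some((&rest[..len], start + len)) }
    }
}

/// A "self" scanner: `u64` scans to `u64`.
impl<'a> Scan<'a> for u64 {
    type Output = u64;
    fn scan(input: &'a str) -> Option<(u64, usize)> {
        let (word, used) = Word::scan(input)?;
        word.parse().ok().map(|n| (n, used))
    }
}

/// Tuple scanner that accepts *any* scanners as elements, self or abstract.
impl<'a, A: Scan<'a>, B: Scan<'a>> Scan<'a> for (A, B) {
    type Output = (A::Output, B::Output);
    fn scan(input: &'a str) -> Option<(Self::Output, usize)> {
        let (a, used_a) = A::scan(input)?;
        let (b, used_b) = B::scan(&input[used_a..])?;
        Some(((a, b), used_a + used_b))
    }
}
```

In this sketch, `<(Word, u64)>::scan("width 42")` works, but the scanner tuple has to be named in full. Because several scanners could produce the same output type, the compiler cannot work backwards from a result of type `(&str, u64)` to the scanners, which is the inference problem described above.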

Implement more cursor marker types

List of desirable cursor marker types:

StrCompare:

  • Normalized - Pick a Unicode normalisation and use it.
  • IgnoreCase - Unicode-aware case insensitive matching... or at least as good as can be done without locales. Should probably also include normalisation.

SkipSpace:

  • IgnoreNonLine - Like IgnoreSpace, except don't skip/ignore newlines.
  • Loose - Spaces in the pattern must exist in the input, but the exact contents of the space don't matter.
  • Exact - Spaces in the pattern must exist exactly in the input.

SliceWord:

  • NonSpace - Break words on whitespace.
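As a rough illustration of how a comparison marker such as IgnoreCase could plug in, the sketch below uses uninhabited marker types selected through a type parameter. Everything here is an assumption made for the example: the trait is a simplified stand-in named after the StrCompare category above, `ExactCompare` and `literal_matches` are hypothetical, and the case handling uses only locale-independent `char::to_lowercase`, with no Unicode normalisation or full case folding.

```rust
/// Simplified comparison policy. Implementors are marker types that are
/// never instantiated; the policy is chosen purely at the type level.
trait StrCompare {
    fn compare(a: &str, b: &str) -> bool;
}

/// Marker: strings must match byte for byte.
enum ExactCompare {}

/// Marker: strings match if their lowercase forms are equal.
enum IgnoreCase {}

impl StrCompare for ExactCompare {
    fn compare(a: &str, b: &str) -> bool {
        a == b
    }
}

impl StrCompare for IgnoreCase {
    fn compare(a: &str, b: &str) -> bool {
        // Lowercase each char (which may expand to several chars) and
        // compare the resulting sequences without allocating.
        a.chars()
            .flat_map(char::to_lowercase)
            .eq(b.chars().flat_map(char::to_lowercase))
    }
}

/// Hypothetical helper: compare scanned input against a literal pattern
/// term using the policy selected by `C`.
fn literal_matches<C: StrCompare>(input: &str, literal: &str) -> bool {
    C::compare(input, literal)
}
```

A call such as `literal_matches::<IgnoreCase>("Yes", "yes")` accepts input that `literal_matches::<ExactCompare>` rejects. A Normalized marker would need an external Unicode normalisation library, which is outside what std provides.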
