Giter Site home page Giter Site logo

atoi_simd's Introduction

Rust fast &[u8] to integer parser

Crate API

SIMD (fast) parsing is supported on x86_64 (SSE4.1, AVX2) and on Arm64 (aarch64, Neon), but this library works even if you don't have a SIMD supported cpu (and it will be still faster than str::parse).

Supports negative values and validates the input.

Supported output types: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize.

Has good test coverage, and can be considered safe.

To enable SIMD it needs the target-feature or target-cpu flags set, or it will fallback to non-SIMD functions. You can copy the ./.cargo/config.toml to your project, or use one of the following environment variables:

  • RUSTFLAGS="-C target-feature=+sse2,+sse3,+sse4.1,+ssse3,+avx,+avx2" for x86_64

  • RUSTFLAGS="-C target-feature=+neon" for Arm64

  • RUSTFLAGS="-C target-cpu=native" will optimize for your current cpu

For Windows PowerShell you can set it with $Env:RUSTFLAGS='-C target-feature=+sse2,+sse3,+sse4.1,+ssse3,+avx,+avx2'

By default the target-feature is set in ./.cargo/config.toml, but seems like it works only inside this project.

If you have &str then use .as_bytes()

Supports no_std with --no-default-features

Got the idea from here (source).

Examples

let val: u64 = atoi_simd::parse(b"1234").unwrap();
assert_eq!(val, 1234_u64);

assert_eq!(atoi_simd::parse::<i64>(b"-2345"), Ok(-2345_i64));

assert_eq!(atoi_simd::parse_any::<u64>(b"123something_else"), Ok((123_u64, 3)));

// a drop-in replacement for `str::parse`
assert_eq!(atoi_simd::parse_skipped::<u64>(b"+000000000000000000001234"), Ok(1234_u64));

Benchmarks

You can run cargo bench from bench folder on your machine (or individually with cargo bench -- "parse u64")

Results

More information you can find here.

v0.16.0

Rust 1.78, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse::<u64>()

parse::<u64>()

str::parse::<u64>()

str::parse::<u64>()

parse::<i64>()

parse::<i64>()

str::parse::<i64>()

str::parse::<i64>()

parse::<u128>()

parse::<u128>()

str::parse::<u128>()

str::parse::<u128>()

parse::<i128>()

parse::<i128>()

str::parse::<i128>()

str::parse::<i128>()

v0.15.2

Rust 1.73, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse::<u64>()

parse::<u64>()

str::parse::<u64>()

str::parse::<u64>()

parse::<i64>()

parse::<i64>()

str::parse::<i64>()

str::parse::<i64>()

parse::<u128>()

parse::<u128>()

str::parse::<u128>()

str::parse::<u128>()

parse::<i128>()

parse::<i128>()

str::parse::<i128>()

str::parse::<i128>()

v0.14.5

Rust 1.72, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse::<u64>()

parse::<u64>()

str::parse::<u64>()

str::parse::<u64>()

parse::<i64>()

parse::<i64>()

str::parse::<i64>()

str::parse::<i64>()

parse::<u128>()

parse::<u128>()

str::parse::<u128>()

str::parse::<u128>()

parse::<i128>()

parse::<i128>()

str::parse::<i128>()

str::parse::<i128>()

v0.14.4

Rust 1.72, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse::<u64>()

parse::<u64>()

str::parse::<u64>()

str::parse::<u64>()

parse::<i64>()

parse::<i64>()

str::parse::<i64>()

str::parse::<i64>()

parse::<u128>()

parse::<u128>()

str::parse::<u128>()

str::parse::<u128>()

parse::<i128>()

parse::<i128>()

str::parse::<i128>()

str::parse::<i128>()

v0.10.1

Rust 1.67.1, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse::<u64>()

parse::<u64>()

str::parse::<u64>()

str::parse::<u64>()

parse::<i64>()

parse::<i64>()

str::parse::<i64>()

str::parse::<i64>()

parse::<u128>()

parse::<u128>()

str::parse::<u128>()

str::parse::<u128>()

parse::<i128>()

parse::<i128>()

str::parse::<i128>()

str::parse::<i128>()

v0.4-v0.5

Rust 1.63, Windows 10, Intel i7 9700K, "target-feature" set

benchmark 64

benchmark 128

parse_u64()

parse() u64

str::parse::<u64>()

str::parse::<u64>()

parse_i64()

parse_i64()

str::parse::<i64>()

str::parse::<i64>()

parse_u128()

parse_u128()

str::parse::<u128>()

str::parse::<u128>()

parse_i128()

parse_i128()

str::parse::<i128>()

str::parse::<i128>()

v0.3.0

Rust 1.63, Windows 10, Intel i7 9700K, "target-feature" set

all

parse() u64

parse() u64

str::parse::<u64>()

str::parse::<u64>()

parse_i64()

parse_i64()

str::parse::<i64>()

str::parse::<i64>()

parse_u128()

parse_u128()

str::parse::<u128>()

str::parse::<u128>()

parse_i128()

parse_i128()

str::parse::<i128>()

str::parse::<i128>()

v0.2.1

Rust 1.63, Windows 10, Intel i7 9700K, "target-feature" set

all

parse() u64

parse() u64

str::parse::<u64>()

str::parse::<u64>()

parse_i64()

parse_i64()

str::parse::<i64>()

str::parse::<i64>()

parse_u128()

parse_u128()

str::parse::<u128>()

str::parse::<u128>()

parse_i128()

parse_i128()

str::parse::<i128>()

str::parse::<i128>()

v0.2.0

Rust 1.63, Windows 10, Intel i7 9700K, "target-feature" set

all

parse() u64

parse() u64

str::parse::<u64>()

str::parse::<u64>()

parse_i64()

parse_i64()

str::parse::<i64>()

str::parse::<i64>()

parse_u128()

parse_u128()

str::parse::<u128>()

str::parse::<u128>()

parse_i128()

parse_i128()

str::parse::<i128>()

str::parse::<i128>()

v0.1.x

What was noticed

  • It's around 7 times faster than the standard parse (for long string, Rust 1.60)
  • The performance is constant (the same) for strings of different lengths
Rust 1.63, Windows 10, Intel i7 9700K, "target-feature" set

This one became faster on Rust 1.63:

long string std u64                  1234567890123456
                        time:   [9.0293 ns 9.0843 ns 9.1661 ns]
                        change: [-0.6548% +0.8424% +2.3425%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

This one became even slover. I reran it (with rebuild) multiple times - same result:

long string negative std i64         -1234567890123456
                        time:   [17.554 ns 17.607 ns 17.667 ns]
                        change: [-1.6112% -0.2132% +1.5620%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
long string u64                      1234567890123456
                        time:   [1.9273 ns 1.9346 ns 1.9424 ns]
                        change: [-2.3999% -0.4986% +1.2253%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
long string i64                      1234567890123456
                        time:   [2.3258 ns 2.3357 ns 2.3468 ns]
                        change: [-2.1695% -0.4296% +1.3102%] (p = 0.65 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
long string negative i64             -1234567890123456
                        time:   [2.5319 ns 2.5439 ns 2.5607 ns]
                        change: [-2.0344% -0.3167% +1.5650%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
short string std u64                       1
                        time:   [2.3305 ns 2.3462 ns 2.3656 ns]
                        change: [-4.1262% -1.9850% +0.2412%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
short string negative std i64              -1
                        time:   [3.7983 ns 3.8177 ns 3.8402 ns]
                        change: [-1.4979% -0.0694% +1.5137%] (p = 0.94 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
short string u64                           1
                        time:   [2.0024 ns 2.0097 ns 2.0184 ns]
                        change: [-3.4351% -1.3017% +0.5198%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
short string i64                           1
                        time:   [2.4245 ns 2.4356 ns 2.4499 ns]
                        change: [-2.9298% -1.3203% +0.3535%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
short string negative i64                  -1
                        time:   [2.5191 ns 2.5233 ns 2.5285 ns]
                        change: [-2.8014% -0.9235% +0.7916%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Bonus 15 chars benchmarks:

15 chars string std u64              123456789012345
                        time:   [8.4146 ns 8.4352 ns 8.4604 ns]
                        change: [-2.5855% -1.0348% +0.5767%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
15 chars string negative std i64     -123456789012345
                        time:   [10.268 ns 10.331 ns 10.415 ns]
                        change: [-0.7653% +0.9929% +2.7733%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
15 chars string u64                  123456789012345
                        time:   [1.8990 ns 1.9042 ns 1.9103 ns]
                        change: [-1.8510% -0.3256% +0.9332%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
15 chars string i64                  123456789012345
                        time:   [2.3780 ns 2.3831 ns 2.3892 ns]
                        change: [-2.1490% -0.6463% +0.8095%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
15 chars string negative i64         -123456789012345
                        time:   [2.5323 ns 2.5445 ns 2.5589 ns]
                        change: [-2.8686% -0.9755% +0.9693%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
Rust 1.60, Windows 10, Intel i7 9700K, "target-feature" set
long string std u64                  1234567890123456
                        time:   [15.136 ns 15.172 ns 15.220 ns]
                        change: [-1.0266% +1.4318% +4.7776%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

When parsing to i64 (standard .parse::<i64>()) it's somehow faster rather then u64 (.parse::<u64>())

long string negative std i64         -1234567890123456
                        time:   [12.451 ns 12.468 ns 12.489 ns]
                        change: [-2.8201% -1.8197% -0.9578%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  8 (8.00%) high severe
long string u64                      1234567890123456
                        time:   [2.1173 ns 2.1212 ns 2.1254 ns]
                        change: [-1.7643% -0.7705% +0.0464%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
long string i64                      1234567890123456
                        time:   [2.0971 ns 2.1018 ns 2.1083 ns]
                        change: [-1.4917% -0.3822% +0.4114%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  8 (8.00%) high severe
long string negative i64             -1234567890123456
                        time:   [2.1659 ns 2.1689 ns 2.1729 ns]
                        change: [-1.8464% -0.6673% +0.2406%] (p = 0.25 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
  7 (7.00%) high severe
short string std u64                       1
                        time:   [2.7282 ns 2.7315 ns 2.7355 ns]
                        change: [-0.3423% +0.5560% +1.4297%] (p = 0.25 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
short string negative std i64              -1
                        time:   [3.4122 ns 3.4210 ns 3.4304 ns]
                        change: [-0.4427% +0.2415% +1.0592%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
short string u64                           1
                        time:   [2.0971 ns 2.0989 ns 2.1014 ns]
                        change: [-0.4568% +0.1569% +0.7932%] (p = 0.63 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe

This one must be a little lower, around 2.3 ns

short string i64                           1
                        time:   [2.6629 ns 2.6704 ns 2.6789 ns]
                        change: [-0.2341% +0.4340% +0.9879%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
short string negative i64                  -1
                        time:   [2.3049 ns 2.3077 ns 2.3115 ns]
                        change: [-0.8049% -0.1058% +0.5989%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

Bonus 15 chars benchmarks:

15 chars string std u64              123456789012345
                        time:   [14.314 ns 14.347 ns 14.386 ns]
                        change: [+0.5781% +1.5775% +3.0108%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
15 chars string negative std i64     -123456789012345
                        time:   [11.797 ns 11.869 ns 11.952 ns]
                        change: [-2.0623% -0.8216% +0.4470%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
15 chars string u64                  123456789012345
                        time:   [1.8545 ns 1.8559 ns 1.8576 ns]
                        change: [-1.0279% -0.3076% +0.3114%] (p = 0.40 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  9 (9.00%) high severe
15 chars string i64                  123456789012345
                        time:   [2.3638 ns 2.3734 ns 2.3825 ns]
                        change: [-1.8528% -0.7356% +0.2488%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
15 chars string negative i64         -123456789012345
                        time:   [2.3077 ns 2.3109 ns 2.3152 ns]
                        change: [-1.9844% -1.2570% -0.5860%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  10 (10.00%) high severe

atoi_simd's People

Contributors

orlp avatar rodmitry avatar shnatsel avatar steffahn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

atoi_simd's Issues

Parsing very large `i32` and `u32` returns an incorrect number instead of an error

The following test case causes an integer overflow:

#[test]
fn overflow() {
    let input = [55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 55, 55, 55, 55, 55, 55, 55, 55, 56, 45];
    crate::parse::<i32>(&input);
}

It happens in this line:

res = res * 10 + digit;

Integer overflow panics in debug mode by default. If you are confident this overflow is benign, wrapping_add and wrapping_mul should be used. If it needs to be handled for correctness, use checked_mul (which looks like it would be more efficient than the overflow! macro, but I'm not willing to claim that without benchmarks).

parse_skipped can't parse zeros

parse_skipped strips all leading zeros, but this will cause atoi_simd::parse_skipped(b"0") to return Err(Empty).

Some test cases that should all parse to zero:

assert_eq!(parse_skipped::<i8>(b"0"), Ok(0));
assert_eq!(parse_skipped::<i8>(b"-0"), Ok(0));
assert_eq!(parse_skipped::<i8>(b"-0000000000000000000000000000"), Ok(0));
assert_eq!(parse_skipped::<i8>(b"+0"), Ok(0));
assert_eq!(parse_skipped::<i8>(b"+0000000000000000000000000000"), Ok(0));
assert_eq!(parse_skipped::<u8>(b"0"), Ok(0));
assert_eq!(parse_skipped::<u8>(b"+0"), Ok(0));
assert_eq!(parse_skipped::<u8>(b"+0000000000000000000000000000"), Ok(0));

and repeated for i16, i32, i64, i128 and u16, u32, u64, u128.

Out-of-bounds Read

This bounds check is wrong:

let mut val: u64 = unsafe { read_unaligned(s.get_safe_unchecked(8..).as_ptr().cast()) };

It should be

 let mut val: u64 = unsafe { read_unaligned(s.get_safe_unchecked(8..16).as_ptr().cast()) }; 

Applying this change causes a number of tests to fail.

This issue can also be observed using Miri, if I run cargo +nightly miri test without the .cargo/config in this repo, I can get at least these two UB reports:

196 |             let val: u64 = unsafe { read_unaligned(s.get_safe_unchecked(8..).as_ptr().cast()) };
    |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory
    |
   --> src/fallback.rs:145:41
    |
145 |             let mut val: u64 = unsafe { read_unaligned(s.get_safe_unchecked(8..).as_ptr().cast()) };
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: alloc29045 has size 9, so pointer to 8 bytes starting at offset 8 is out-of-bounds

Some `u128` and `i128` values are parsed incorrectly instead of returning an error

The number "707071770707000177170017011770740070701" (or, in &[u8] notation, [55, 48, 55, 48, 55, 49, 55, 55, 48, 55, 48, 55, 48, 48, 48, 49, 55, 55, 49, 55, 48, 48, 49, 55, 48, 49, 49, 55, 55, 48, 55, 52, 48, 48, 55, 48, 55, 48, 49]) is parsed as 26507036865123250243267796907203647789 by atoi_simd. The standard library correctly returns an error in this case.

It only happens with AVX enabled, so it is likely a symptom of #6

Found using cargo fuzz

Parsing numbers starting with 0 is not supported

Add the following test to the codebase:

#[test]
fn test_parse_i64_fuzzed() {
    let input = b"02221222122210221223";
    let result = parse::<i64>(input).unwrap();
    assert_eq!(result, 02221222122210221223);
}

Run it with SIMD disabled (with the .cargo/config.toml file deleted) and you'll see that it fails. Meanwhile the standard library parses this number correctly.

This bug was discovered using cargo-fuzz. You can find a guide to it here and the fuzzing setup I used here. I will contribute the fuzzing setup to your repository eventually.

Parsing `i128` and `u128` may cause integer overflow when AVX is used

The following test case causes a panic in debug mode when AVX is in use:

#[test]
fn overflow_u128() {
    let input = [54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 50, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 51, 54, 54, 54];
    crate::parse::<u128>(&input);
}

The overflow happens on this line:

((*arr.get_safe_unchecked(0) as u128 * 10_000_000_000_000_000

See #5 for potential solutions.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.