alexhuszagh / rust-lexical
Fast numeric to- and from-string conversion routines.
License: Other
The following code in the lexical-core build script branches on the value of `cfg(target_arch)`. That cfg refers to the target of the code currently being compiled by rustc; in the case of a build script, that's the host architecture.
rust-lexical/lexical-core/build.rs
Lines 10 to 20 in 8c75e29
Cargo provides a separate environment variable to build scripts, `CARGO_CFG_TARGET_ARCH`, to determine the target arch of the library, as opposed to the target arch of the build script.
To reproduce, put this in build.rs, run `cargo check --target wasm32-unknown-unknown`, and see which error is triggered.
#[cfg(target_arch = "x86_64")]
compile_error!("target_arch = x86_64");
#[cfg(target_arch = "wasm32")]
compile_error!("target_arch = wasm32");
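A minimal sketch of the fix, assuming the build script should branch on the target rather than the host; the emitted cfg name `limb_width_64` is illustrative, not necessarily what lexical-core's build script emits:

```rust
// build.rs sketch: read the *target* architecture from Cargo's env var
// instead of cfg!(target_arch), which describes the build-script host.
use std::env;

/// True when the crate being built targets x86_64 (not the host).
fn target_is_x86_64() -> bool {
    env::var("CARGO_CFG_TARGET_ARCH").as_deref() == Ok("x86_64")
}

fn main() {
    if target_is_x86_64() {
        // Hypothetical cfg flag; the real build script may emit others.
        println!("cargo:rustc-cfg=limb_width_64");
    }
}
```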
I've encountered the following compile error on a vs2017-win2016 VM (in Azure Pipelines) using `x86_64-pc-windows-msvc`, rustc 1.39.0-nightly (97e58c0d3 2019-09-20). I see this project has a CI run on `x86_64-pc-windows-gnu` (not msvc) that passes.
error[E0061]: this function takes 1 parameter but 0 parameters were supplied
--> C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.6.1\src\util\num.rs:961:44
|
961 | float_method_msvc!(self, f32, f64, powf, powf32, n as f32)
| ^^^^ expected 1 parameter
Tracking issue for rust-lang/rust#62146.
With `format` and `radix` enabled, I can e.g. parse hexadecimal floats using my proglang-specific syntax. However, that syntax includes base prefixes, in this case `0x`. To make matters worse, base prefixes usually appear between the sign and the integer digits of a number literal.
`0b`, `0o`, `0`, `0d`, and `0x`, as well as their upper-case variants, are all common base prefixes in programming languages: `0b` for base-2, `0o` for base-8, `0` (as in a leading zero) as a terrible way of saying `0o`, `0d` as an optional base-10 prefix for the sake of symmetry, and `0x` for base-16 numbers. I suggest ignoring a leading `0` meaning base-8: that's just terrible, a source of countless bugs, and should thus be up to the user to work around. While pretty much any radix is possible, I suggest only handling these four bases; I don't know of any common prefixes for other radices.
The following extensions to the `format` bit-packed config should be made: flags for the `0b`, `0o`, `0d`, and `0x` prefixes, plus the prefix character itself, e.g. `b'd'` for `0d`. If only upper-case were allowed in a format, this would be `b'D'`. The leading `0` is implied. If the format of the current radix has optional base indicators, then all leading zeros behave normally. This leaves 2 bytes and 4 bits reserved when using a `u128` or a second `u64` for the format settings.
Version: `0.7.*`. Features: `correct`, `format`, `radix`.
Currently I check the sign myself, memorise it, then skip the sign and base prefix to radix-parse the remaining number literal. This abuses the fact that flipping the sign of a float is a lossless operation. However, it's annoying and unergonomic.
An alternative design to support more radix prefixes would be to take a function pointer or something that maps `base` to a base-indicating `u8` ASCII char.
It's also worth noting that there are languages with base postfixes, like `03h` in Intel x86 assembly. Should these be supported as well?
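The workaround described above can be sketched as follows; `i64::from_str_radix` stands in for lexical's radix parsing, and the prefix set is the four bases suggested earlier:

```rust
/// Strip the sign and base prefix, radix-parse the rest, then restore
/// the sign (lossless for integers, and also for floats).
fn parse_prefixed(s: &str) -> Option<i64> {
    let (neg, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (radix, digits) = if let Some(d) = rest.strip_prefix("0x") {
        (16, d)
    } else if let Some(d) = rest.strip_prefix("0o") {
        (8, d)
    } else if let Some(d) = rest.strip_prefix("0b") {
        (2, d)
    } else {
        (10, rest)
    };
    let n = i64::from_str_radix(digits, radix).ok()?;
    Some(if neg { -n } else { n })
}
```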
Trying to define a custom format (a format containing digit separators), I couldn't get my number format to parse the string `"42.0"`. After a while I noticed that the provided formats which also contain digit separators can't parse the same string either. See the test case below.
rustc 1.54.0 (a178d0322 2021-07-26); lexical 6.0.0; `format` feature enabled.
fn main() {
const RUST: u128 = lexical::format::RUST_LITERAL;
const JSON: u128 = lexical::format::JSON;
const CXX: u128 = lexical::format::CXX17_LITERAL;
let o = lexical::ParseFloatOptions::new();
println!("{:?}", lexical::parse_with_options::<f64, _, JSON>("42.0", &o));
// RUST_LITERAL
println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("42.0", &o));
println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("4_2.0", &o));
// CXX17_LITERAL
println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("42.0", &o));
println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("4'2.0", &o));
}
I would expect all five `println!` invocations to print `Ok(42.0)`. But in the actual output, only the first one is able to parse the number.
Ok(42.0)
Err(EmptyInteger(2))
Err(EmptyInteger(3))
Err(EmptyMantissa(4))
Err(EmptyMantissa(5))
When I copy the `RUST_LITERAL` and `CXX17_LITERAL` definitions into my main function and comment out the `digit_separator`, the simple case can be parsed correctly:
pub const CXX_NOSEP: u128 = lexical::NumberFormatBuilder::new()
// .digit_separator(std::num::NonZeroU8::new(b'\''))
.case_sensitive_special(true)
.internal_digit_separator(true)
.build();
println!("{:?}", lexical::parse_with_options::<f64, _, CXX_NOSEP>("42.0", &o));
pub const RUST_NO_SEP: u128 = lexical::NumberFormatBuilder::new()
// .digit_separator(std::num::NonZeroU8::new(b'_'))
.required_digits(true)
.no_positive_mantissa_sign(true)
.no_special(true)
.internal_digit_separator(true)
.trailing_digit_separator(true)
.consecutive_digit_separator(true)
.build();
println!("{:?}", lexical::parse_with_options::<f64, _, RUST_NO_SEP>("42.0", &o));
prints
Ok(42.0)
Ok(42.0)
Currently, there are ISAs (Instruction Set Architectures) like Intel x86 whose assemblers support integer literals for interrupt instructions and numerous other uses. For example, for x86, we have the following reference specification.
First, we should add flags to `NumberFormat` to ensure all these numbers can be parsed. Specific flags, such as for base prefixes and postfixes (integer only?), should be added.
Next, we should add support for the numerical constants supported by popular ISAs, which could include:
I don't know any specifics for ISAs other than x86, so help is greatly appreciated. Do different ISAs have any differences from x86? Is there any difference between AT&T and Intel syntax? (I don't believe so.) I'm looking for a series of new flags to add to `NumberFormat`, and then pre-defined constants, so I can encompass all these possible variants.
Include lexical-core as a dependency in Cargo.toml, and turn off std:
lexical-core = {version="0.7.4", default-features=false }
Build gives an error:
error[E0554]: `#![feature]` may not be used on the stable release channel
--> /Users/todd/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.7.4/src/lib.rs:133:35
|
133 | #![cfg_attr(not(feature = "std"), feature(core_intrinsics))]
Some languages disallow leading zeros in integers (but not floats). For example, a Python REPL:
>>> 012
File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> -012
File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> 012.0
12.0
JavaScript/JSON has the same behavior.
I don't see an existing format that applies. If a new format is added, it would pass this test:
diff --git a/lexical-core/src/atoi/api.rs b/lexical-core/src/atoi/api.rs
index 022214f..1196fd8 100644
--- a/lexical-core/src/atoi/api.rs
+++ b/lexical-core/src/atoi/api.rs
@@ -348,6 +348,14 @@ mod tests {
assert!(i32::from_lexical_format(b"31_", format).is_err());
}
+ #[test]
+ #[cfg(feature = "format")]
+ fn i32_leading_zero() {
+ let format = NumberFormat::INTEGER_NO_LEADING_ZERO;
+ assert!(i32::from_lexical_format(b"012", format).is_err());
+ assert!(i32::from_lexical_format(b"-012", format).is_err());
+ }
+
#[cfg(feature = "std")]
proptest! {
#[test]
This could be applied to at least `NumberFormat::PYTHON_LITERAL` and `NumberFormat::JSON`.
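A sketch of the check such a flag would enable; the function name and the pre-pass approach are illustrative, not the proposed `NumberFormat` mechanism:

```rust
/// True when a decimal integer literal has a forbidden leading zero,
/// per the Python/JSON rule quoted above ("0" itself stays legal).
fn has_forbidden_leading_zero(s: &str) -> bool {
    // Skip an optional sign; the rule applies to the digit sequence.
    let digits = s
        .strip_prefix('+')
        .or_else(|| s.strip_prefix('-'))
        .unwrap_or(s);
    digits.len() > 1 && digits.starts_with('0')
}
```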
Tracking Issue for: rust-bakery/nom#1080
Should be fixed with a PR to v0.6 or higher.
Rust will parse `.` as `Err(ParseFloatError { kind: Invalid })`, while `rust-lexical` will parse it as `0.0`. Not sure which one is the "correct" one.
Currently, some settings, like the expected exponent character, are global state. This can be anything from an inconvenience to a tricky issue for projects parsing multiple languages.
The `format` feature of `lexical-core` added integer-packed settings which you have to pass to all format parsing functions. I suggest doing a similar thing for everything else, but by passing in a struct reference. Integer-packing works for things like the exponent characters, but fails for e.g. `set_inf_string`. This is still C-API-friendly; C libs are just forced to put `strlen` next to their `*const c_char`s.
Version: `0.7.*`. Features: `format`, `radix`, `correct`.
Uhh… not doing any of this? Maybe packing said strings into something like `staticvec::StaticString` inlined into the struct? But I don't think that gives any benefit. Another idea would be to always enable `format` and include `format`'s bit-packed settings in that one struct.
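A rough shape of the proposed struct; all field names are assumptions for illustration, and slices (which carry their own length) keep it C-API-friendly:

```rust
/// Sketch only: field names are illustrative, not lexical's actual API.
pub struct ParseOptions<'a> {
    /// Exponent character, replacing the old global setting.
    pub exponent_char: u8,
    /// Infinity string; a slice carries its length, mirroring what a
    /// C caller reconstructs with strlen next to a *const c_char.
    pub inf_string: &'a [u8],
    /// NaN string.
    pub nan_string: &'a [u8],
}

impl Default for ParseOptions<'static> {
    fn default() -> Self {
        Self {
            exponent_char: b'e',
            inf_string: b"inf",
            nan_string: b"NaN",
        }
    }
}
```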
rust-lexical does not implement the `std::error::Error` trait for `lexical::Error`.
That makes error handling harder, in my case the integration with `anyhow`. `anyhow` is a very helpful error-handling library, but it requires errors to implement the `std::error::Error` trait. `lexical` does not do that, which is why I'm going to have to write boilerplate code to make it work correctly.
Implement the `std::error::Error` trait for `lexical::Error`.
None, as far as I can tell. I'm a Rust beginner, but I think an implementation of `std::error::Error` should require nothing beyond a couple lines of code.
Alternative: not do anything.
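The couple of lines in question, sketched against a simplified stand-in enum (the real `lexical::Error` has more variants and already implements `Debug` and `Display`):

```rust
use std::fmt;

/// Simplified stand-in for lexical::Error.
#[derive(Debug)]
pub enum Error {
    Overflow(usize),
    InvalidDigit(usize),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Overflow(i) => write!(f, "numeric overflow at index {}", i),
            Error::InvalidDigit(i) => write!(f, "invalid digit at index {}", i),
        }
    }
}

// The trait has no required methods beyond the Debug + Display bounds,
// so this one line is enough for anyhow interop.
impl std::error::Error for Error {}
```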
use lexical::parse_with_options;
const FIXTURE: &str = "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
fn main() {
let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
FIXTURE,
&lexical::parse_float_options::STANDARD,
)
.expect("parse float failed");
println!("{}", f);
}
Here are a few things you should provide to help me understand the issue:
features = ["power-of-two", "parse-floats"]
use lexical::parse_with_options;
const FIXTURE: &str = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
fn main() {
let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
FIXTURE,
&lexical::parse_float_options::STANDARD,
)
.expect("parse float failed");
println!("{}", f);
}
I expected to see `inf`, but got an error:
thread 'main' panicked at 'parse float failed: InvalidPunctuation', src\main.rs:10:6
Regarding https://docs.rs/lexical-core/0.1.3/lexical_core/ftoa/fn.f64toa_slice.html: what should the `bytes` input be?
I'm trying to implement a protocol which does not accept the scientific format in all places. It would be useful to control whether the decimal output is written in normal or scientific format.
The number of significant digits would also be nice to have some degree of control over. Rounding the number to the desired precision beforehand doesn't help if the rounded value isn't representable (example: `1.2f32` -> `1.2000000476837158`).
An extra write function which can take formatting hints, possibly `write_format(n, format, significant_digits, bytes)`, where:
- `n` - value to be written
- `format` - an enum for the desired format
- `significant_digits` - a `usize` for the maximum number of significant digits; 0 could mean "don't care"
- `bytes` - output buffer

If applicable to the feature request, here are a few things you should provide to help me understand the issue:
rustc 1.39.0-nightly (4295eea90 2019-08-30)
lexical-core 0.6.2, features=["radix"], default-features=false
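The requested `write_format` could be sketched like this; it is a stand-in built on `std::fmt`, and `significant_digits` here controls digits after the point in normal mode, which is a simplification of true significant-digit handling:

```rust
/// Desired output style (names are the reporter's proposal, not an
/// existing lexical API).
enum Format {
    Normal,
    Scientific,
}

/// Sketch: format `n` into `bytes` with optional precision control;
/// 0 means "don't care".
fn write_format(n: f64, format: Format, significant_digits: usize, bytes: &mut Vec<u8>) {
    let s = match (format, significant_digits) {
        (Format::Normal, 0) => format!("{}", n),
        // Simplification: precision = digits after the decimal point.
        (Format::Normal, d) => format!("{:.*}", d, n),
        (Format::Scientific, 0) => format!("{:e}", n),
        (Format::Scientific, d) => format!("{:.*e}", d.saturating_sub(1), n),
    };
    bytes.extend_from_slice(s.as_bytes());
}
```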
In my application, over 25% of execution time is spent inside `lexical::parse_lossy`. Mind you, lexical's implementation is far better than the stdlib implementation and its speed is simply fantastic.
A little bit more speed can't hurt though, so I was looking at the implementation details, where it is stated that `parse_lossy` tries multiple parsing implementations: first the fast path, then the moderate path. Is it possible to make lexical return the fast path's result directly?
To give context; this is an excerpt of the floats that need to be parsed:
-0.018477, -0.018464, -0.018458, -0.014031, -0.014018, -0.014011, -0.000648, 0.008092,
0.000111, -0.009875, 0.012704, 0.012185, 0.011334, 0.011927, 0.012284, 0.010097,
0.012951, 0.001517, -0.005452, 0.015123, -0.004884, -0.007977, 0.019697, 0.010684
They're all between -1.0 and +1.0, and only the first 4-5 fractional digits matter.
It could be implemented using a new parse function, maybe `lexical::parse_lossier`?
Hi! Your work looks interesting and I'm interested in applying some of the concepts elsewhere, so I thought I'd have a play around.
Unfortunately I'm new to Rust and struggling to use this. I tried following your instructions and hit a couple of problems. I'm on Ubuntu 18.04/amd64, and I started off with the system-supplied Rust (1.30.0).
I started a new project with `cargo new`, added lexical to the `Cargo.toml`, and threw this into main.rs:
extern crate lexical;
fn main() {
let f: f32 = lexical::parse("12.34567");
println!("Hello, world! {}", f);
}
This gave me a `rename-dependency` error:
$ cargo install lexical
Updating crates.io index
Downloading lexical v2.0.0
error: failed to parse manifest at `/home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-2.0.0/Cargo.toml`
Caused by:
feature `rename-dependency` is required
consider adding `cargo-features = ["rename-dependency"]` to the manifest
I found various suggestions on the internet: adding `cargo-features = ["rename-dependency"]` at the top of my `Cargo.toml` (this did not appear to help), and trying a newer toolchain. Unfortunately, the latter suggestion led to this new error:
~/.local/rust/bin/cargo install lexical
Updating crates.io index
Installing lexical v2.0.0
Compiling void v1.0.2
Compiling ryu v0.2.7
Compiling static_assertions v0.2.5
Compiling cfg-if v0.1.6
Compiling unreachable v1.0.0
Compiling stackvector v1.0.2
Compiling lexical-core v0.3.1
error[E0309]: the parameter type `T` may not live long enough
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
|
439 | pub struct ReverseView<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
440 | inner: &'a [T],
| ^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
|
440 | inner: &'a [T],
| ^^^^^^^^^^^^^^
error[E0309]: the parameter type `T` may not live long enough
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
|
455 | pub struct ReverseViewMut<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
456 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
|
456 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0309`.
error: failed to compile `lexical v2.0.0`, intermediate artifacts can be found at `/tmp/cargo-installhhZln5`
Caused by:
Could not compile `lexical-core`.
To learn more, run the command again with --verbose.
Thanks in advance for any help.
Here are some comments in the code I find misleading to readers. The formula should be `floor(x * log10(2) - log10(4/3))`; please refer to the paper (Section 5.4). The comments are missing `floor` on the LHS's (the RHS's are not integers). This was my mistake and I corrected them in my repo recently.

Currently, lexical-core only allows `no_std` on stable if the `libm` feature is enabled. Since libm is very small, well-maintained, and fast to compile, this should be the default.
Remove `core::intrinsics` and replace them with `libm`.
Not sure if this is an issue with `lexical-core`, `nom`, `elastic-rs/elastic` (where I am using `nom`), or even `std`/`core` or the compiler, but I thought I would start here...
I am getting `undefined symbols` `ld` errors for various symbols in `std`/`core` when I enable the `lexical` feature of `nom`.
See elastic-rs/elastic/pull/389 for a little more background + error logs and this or this Travis build.
I have reproduced it on macOS 10.14 & 10.15 and Ubuntu 19.04 (and, for the sake of completeness, various Linux via Docker) with rustc 1.38.0 (625451e37 2019-09-23), 1.39.0-beta.6 (224f0bc90 2019-10-15), and 1.40.0-nightly (4a8c5b20c 2019-10-23), plus a few other nightlies, and when cross-compiling to `x86_64-unknown-linux-musl` from macOS and Linux hosts.
We don't actually use the `lexical` feature of `nom` in `elastic-rs/elastic`, so I disabled it and everything is building fine, but I thought I would open this issue in case someone else runs into it.
I can also upload our current `Cargo.lock` if that would help...
Git repository is currently too large due to compiled targets from lexical-test being added to the tree. This affects clone times dramatically. This can be fixed by running the following commands on each branch.
git filter-branch --tree-filter "rm -rf lexical-test/target" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git commit -m "Removing lexical-test/target from git history."
git gc
git push --force
let v = 1.0f64;
println!("{}", v.to_string()); // 1
let mut buf = [0u8; 64];
lexical_core::ftoa::f64toa_slice(v, 10, &mut buf);
println!("{}", std::str::from_utf8(&buf).unwrap()); // 1.
Is this an intended behavior?
Currently, `lexical-core` can only parse `f32` and `f64`, but supporting more number formats than Rust does would be nice, especially for designers of programming languages.
Offer a feature-gated default impl for `f128` using the `f128` crate, and for `f16` and `bf16` from the `half` crate.
Version: `0.7.*`. Features: `format`, `correct`, `radix`.
Don't see any beyond »let's not«.
Rust has `u128`, as having it for e.g. crypto is convenient, despite no mainstream CPU having 128-bit integer arithmetic and registers. `f16` is very often used, e.g. in GPU code, `bf16` specifically in neural-network code, and `f128` also finds some use here and there.
There's also 8-bit floats, though not IEEE-standardised, and there's IEEE 754 binary256. However, I know of no handy softfloat crates for these.
As `lexical-core` aims to be a proglang-agnostic number parser, i.e. not tied to Rust formats and types, I see no reason to completely restrict oneself to just the built-in Rust machine types.
If applicable to the issue, here are a few things you should provide to help me understand the issue:
`rustc -V`: 1.53.0
Refs #55, rust-lang/rust#85667
It would be nice if the versions that are incompatible with Rust 1.53.0 could be yanked. While yanking doesn't force people to update to the fixed versions, it does help as tools like cargo-audit
will now warn that you're using a yanked version and should upgrade.
At rust-lang/rust#85667 (comment), @Mark-Simulacrum said "I think yanking is likely not the right step to take at this time." I wonder if they still think that now that 1.53.0 is stable.
Hey, that's me again bugging you about arrayvec :)
It looks like the two crates are pretty close in the end, and I wonder if it makes sense for `rust-lexical` to switch to the latter? arrayvec has seen much more usage in the ecosystem, and, because `unsafe` code is involved, it seems like it makes sense to minimize duplication?
Then there's the problem that `ArrayVec` lacks an `insert_many` method, but I wonder if adding a `splice` method would help with that?
(The reason why I am asking about this is that I've noticed that rust-analyzer transitively, via `nom`, depends on `stackvector`, while it already has `arrayvec` among the deps.)
While the Ryu algorithm shows fine average throughput for arbitrary numbers, it does a lot of rounding iterations for numbers with a small mantissa (in serialized representation), and it has no version for bases other than decimal.
Use the Schubfach algorithm to avoid rounding loops. It will minimize tail latency, or increase throughput if the input has a lot of numbers with a small mantissa. Also, the Schubfach algorithm can be implemented for other bases, except 3, 12, 24, 48, etc.
- "The Schubfach way to render doubles" by Raffaello Giulietti
- Java variant for decimal representation, from the author of the algorithm
- Scala variant for decimal representation, from the jsoniter-scala library
When building the crate with the latest nightly, the compilation fails with 27 errors.
rustc 1.51.0-nightly (d4e3570db 2021-02-01)
All of them are E0308 and E0277.
This is useful when exposing a `lexical::Error` through a wrapping error's `source` or `cause` method.
Hello, we recently reported a buffer overflow bug in `SmallVec::insert_many()`: servo/rust-smallvec#252. This crate contains a slightly older copy of `insert_many()` that has the same vulnerability.
rust-lexical/lexical-core/src/util/sequence.rs
Lines 34 to 46 in 8c75e29
Please update the code when the fix is published.
Thanks!
In testing a project using lexical-core against https://github.com/nst/JSONTestSuite, I see that parsing floats such as `1.`, `0.e1`, and `2.e+3` passes, but these are expected to fail.
This patch shows the behavior I think should apply. What do you think?
diff --git a/lexical-core/src/atof/api.rs b/lexical-core/src/atof/api.rs
index 9bb0688..a95d682 100644
--- a/lexical-core/src/atof/api.rs
+++ b/lexical-core/src/atof/api.rs
@@ -270,6 +270,9 @@ mod tests {
assert_eq!(Err((ErrorCode::EmptyFraction, 0).into()), f32::from_lexical(b"e-1"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"+"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"-"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"1."));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"0.e1"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"2.e+3"));
// Bug fix for Issue #8
assert_eq!(Ok(5.002868148396374), f32::from_lexical(b"5.002868148396374"));
@@ -399,6 +402,9 @@ mod tests {
assert_eq!(Err((ErrorCode::EmptyFraction, 1).into()), f64::from_lexical(b"-."));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"+"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"-"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"1."));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"0.e1"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"2.e+3"));
// Bug fix for Issue #8
assert_eq!(Ok(5.002868148396374), f64::from_lexical(b"5.002868148396374"));
All of the following test cases fail with an `Overflow` error:
assert_eq!(i8::MIN, lexical::try_parse(i8::MIN.to_string()).unwrap());
assert_eq!(i16::MIN, lexical::try_parse(i16::MIN.to_string()).unwrap());
assert_eq!(i32::MIN, lexical::try_parse(i32::MIN.to_string()).unwrap());
assert_eq!(i64::MIN, lexical::try_parse(i64::MIN.to_string()).unwrap());
Not sure how to get around this.
[[package]]
name = "nom"
version = "5.0.0-beta2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"lexical-core 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.7 (registry+https://github.com/rust-lang/crates.io-index)",
"version_check 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
error[E0412]: cannot find type `ChunksExact` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:439:51
|
439 | fn chunks_exact(&self, size: usize) -> slice::ChunksExact<T> {
| ^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `ChunksExactMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:445:59
|
445 | fn chunks_exact_mut(&mut self, size: usize) -> slice::ChunksExactMut<T> {
| ^^^^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `RChunks` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:535:46
|
535 | fn rchunks(&self, size: usize) -> slice::RChunks<T> {
| ^^^^^^^ did you mean `Chunks`?
error[E0412]: cannot find type `RChunksMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:541:54
|
541 | fn rchunks_mut(&mut self, size: usize) -> slice::RChunksMut<T> {
| ^^^^^^^^^^ did you mean `ChunksMut`?
error[E0412]: cannot find type `RChunksExact` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:549:52
|
549 | fn rchunks_exact(&self, size: usize) -> slice::RChunksExact<T> {
| ^^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `RChunksExactMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:555:60
|
555 | fn rchunks_exact_mut(&mut self, size: usize) -> slice::RChunksExactMut<T> {
| ^^^^^^^^^^^^^^^ not found in `slice`
error[E0309]: the parameter type `T` may not live long enough
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
|
136 | pub struct ReverseView<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
137 | inner: &'a [T],
| ^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
|
137 | inner: &'a [T],
| ^^^^^^^^^^^^^^
error[E0309]: the parameter type `T` may not live long enough
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
|
152 | pub struct ReverseViewMut<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
153 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
|
153 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
error: aborting due to 8 previous errors
Some errors occurred: E0309, E0412.
For more information about an error, try `rustc --explain E0309`.
error: Could not compile `lexical-core`.
warning: build failed, waiting for other jobs to finish...
error: build failed
Hi.
Reported by @travismiller here.
Build log: https://ci.appveyor.com/project/blackbeam/mysql-async/builds/25478919/job/n0x2jg84ypetvh24#L250
List of broken builds (only i686): https://ci.appveyor.com/project/blackbeam/mysql-async/builds/25478919
Dutch floats are formatted like so: `101.123,456`, where the `.` is the grouping separator and the comma is used for the fraction. AFAIK, it is not possible to add a format flag to allow parsing Dutch floats. Some way to configure the parser to allow that would be great.
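Until such a flag exists, a pre-pass can map the Dutch convention onto the default float syntax; this is a workaround sketch built on the stdlib, not a lexical API:

```rust
/// Normalize a Dutch-formatted float ('.' grouping, ',' fraction)
/// into default syntax, then parse it.
fn parse_dutch(s: &str) -> Option<f64> {
    let normalized: String = s
        .chars()
        .filter_map(|c| match c {
            '.' => None,      // drop grouping separators
            ',' => Some('.'), // comma becomes the decimal point
            c => Some(c),
        })
        .collect();
    normalized.parse().ok()
}
```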
I just saw you released a 2.0 (nice!), so I bumped the version I use in a pet project, and it has caused a ~25% slowdown in parsing floats.
Are you aware of this already? With all the benchmarks here I figured you would be, but the only note about 2.0 I can find is that you use minimal `unsafe`, which is looking like a dubious trade on my end. Can you enlighten me?
This issue tracks the implementation of an atof function that could be used by serde_json. It is motivated by the parsing issues discussed in serde-rs/json#536.
@Alexhuszagh has provided background detail in #28. In particular, their `lexical` library has lots of testing, which provides a great foundation on which to build a customized atof function.
The direction that currently seems viable is to add a streaming atof function to `lexical-core`. It would operate on one byte at a time. This ought to allow serde_json to correctly parse JSON floats at high speed.
Errors:
error: duplicate lang item in crate `lexical_core`: `panic_impl`.
|
= note: first defined in crate `std`.
error: duplicate lang item in crate `lexical_core`: `eh_personality`.
|
= note: first defined in crate `panic_unwind`.
In cases like #24, where you wish to parse a subset of the formats supported by lexical, it would perhaps be nice if there was an API which would return an interim result, which could be fed back into the parse function along with the next character. Then, when parsing from e.g. a JSON float string representation, you could avoid malformed string representations and convert to float in a single pass over the input.
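A sketch of that interim-result shape; all names are illustrative, and the final rounding via a float division is not correctly rounded for hard cases, which is the part a real implementation would hand to lexical's slower paths:

```rust
/// Partial parse state fed one byte at a time (sketch, not a lexical API).
#[derive(Default)]
struct PartialFloat {
    negative: bool,
    mantissa: u64,
    digits: u32,
    frac_digits: i32,
    seen_point: bool,
}

impl PartialFloat {
    /// Feed one byte; returns false when the byte cannot extend the
    /// number, letting the caller stop without scanning past the float.
    fn push(&mut self, b: u8) -> bool {
        match b {
            b'-' if self.digits == 0 && !self.seen_point && !self.negative => {
                self.negative = true;
                true
            }
            b'0'..=b'9' => {
                self.mantissa = self.mantissa * 10 + u64::from(b - b'0');
                self.digits += 1;
                if self.seen_point {
                    self.frac_digits += 1;
                }
                true
            }
            b'.' if !self.seen_point => {
                self.seen_point = true;
                true
            }
            _ => false,
        }
    }

    /// Finish the parse. Exact only when the value fits a few digits.
    fn finish(&self) -> f64 {
        let v = self.mantissa as f64 / 10f64.powi(self.frac_digits);
        if self.negative { -v } else { v }
    }
}
```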
Generate an expanded TOML or similarly-formatted file with all the test cases for float-parsing conformance.
By default, we assume the radix is the same for the entire number. That is, the radix for the mantissa digits, the exponent base, and the radix for the exponent digits are all the same.
Provide in `ParseFloatOptions` 2 additional fields:
- `exponent_radix`, the radix for the exponent digit encoding
- `exponent_base`, the numerical base for the exponent

These should both be limited to valid radices as well.
C++ hexadecimal float literals, and hexadecimal float representations, demonstrate this issue:
// 0xa.b, which is 10.6875 in hex notation
// p specifies an exponent base of 2
// The exponent is never optional for literals
// The exponent is optional for strings
// 10 is a decimal-encoded integer
// So, the float is identical to 10.6875 * 2^10
const float f = 0xa.bp10;
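Checking the arithmetic in the comments above with plain Rust arithmetic (not a lexical call):

```rust
fn main() {
    // 0xa.b: integer part 0xa = 10, fraction 0xb/16 = 11/16 = 0.6875.
    let mantissa = 10.0 + 11.0 / 16.0;
    assert_eq!(mantissa, 10.6875);
    // p10 means an exponent base of 2: value = 10.6875 * 2^10.
    let value = mantissa * f64::powi(2.0, 10);
    assert_eq!(value, 10944.0);
    println!("0xa.bp10 = {}", value);
}
```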
Hi,
First off, I just want to thank you for the work you've put into this crate to create a faster parser from string to uint, int, and floating types.
I'm currently writing a crate that I hope will eventually act as a faster version of numpy's loadtxt and genfromtxt, but for Rust. The main logic can be found in the macro I wrote for the various different conversions, as seen here. The only lines I had to change to incorporate your crate can be found at L172-188. The commented-out lines are what used to be contained in the map function, based on the standard library's conversion for most primitive types.
The current tests I have fail for only the float cases. You can view them here. However, they pass for all of the integer and unsigned integer tests. Now, if I more or less copy exactly what's in the macro and run it outside of a macro, it works. Below is an example:
let mut results = Vec::<f32>::new();
let line_split_vec = vec!["1", "2", "3"];
results.extend({
line_split_vec.iter().map(|x| {
lexical::try_parse::<f32, _>(x.trim()).unwrap()
})
});
println!("{:?}", results);
I've also checked the type and values being fed into `lexical::try_parse`, and they are the same between my above example and the macro which fails.
edit: I just ran it using the debug build and that works...
So, I've been fiddling around with the compiler options a bit, and it appears that which compiler flags are passed in determines whether or not it will run. I'll need to look a bit more into what was so different between my basic cargo options and the flags used for release.
The current version (v0.8.2) of `lexical-core` claims to be `no_std` (when default features are disabled) and doesn't mention needing `std` in the context of the `compact` feature. But when enabling said feature, compilation is halted because `std` seems to be required by `lexical-util`, whose `std` feature was enabled somewhere in the dependency tree.
Compiling lexical-util v0.8.1
error[E0463]: can't find crate for `std`
|
= note: the `thumbv6m-none-eabi` target may not support the standard library
= note: `std` is required by `lexical_util` because it does not declare `#![no_std]`
Here is a minimal reproduction:
[package]
name = "foo"
version = "0.1.0"
authors = ["becominginsane <[email protected]>"]
edition = "2018"
[dependencies]
lexical-core = { version = "0.8", features = ["compact"], default-features = false }
fn main() {
println!("you won't compile me");
}
lexical-core 0.6 pins cfg-if 0.1.9, which causes downstream problems. Users may be stuck on lexical-core 0.6 for a while, since nom 5 requires it, and moving to newer nom versions is a notoriously slow process.
rustc 1.43.1 (8d69840ab 2020-05-04)
0.6.3
[dependencies]
lexical-core = "0.6" # or indirectly through nom = "5"
cfg-if = "0.1"
fn foo() {
    cfg_if::cfg_if! {
        if #[cfg(unix)] {
            fn bar() {}
            // A statement (not an item) inside cfg_if!: cfg-if 0.1.9's macro
            // only accepts `item` fragments, so this fails to parse.
            let tm = ();
        }
    }
}
$ cargo +stable check
Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
--> src/main.rs:5:13
|
5 | let tm = ();
| ^^^
error: aborting due to previous error
error: could not compile `foo`.
$ cargo +beta check
Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
--> src/main.rs:5:13
|
5 | let tm = ();
| ^^^
|
::: /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.9/src/lib.rs:41:40
|
41 | if #[cfg($($meta:meta),*)] { $($it:item)* }
| -------- while parsing argument for this `item` macro fragment
error: aborting due to previous error
error: could not compile `foo`.
So, the problem here is that cargo searches the dependency tree, sees that lexical-core requires exactly cfg-if 0.1.9, and so any crate in the tree that depends on cfg-if 0.1 gets that version (since cargo builds only one version of a crate per semver-compatible range).
As I understand it, the decision to pin cfg-if 0.1.9 was made in fefe818 to support older Rust versions. Unfortunately, that now means that newer Rust versions are not supported. It seems more important to support new Rust versions than old ones, so I suggest that the pin be removed.
There is also a note in that PR saying:
Update cfg-if to "0.1.10" when we support only Rustc >= 1.32.0.
Don't know if that applies now?
See also rust-bakery/nom#1115 (comment)
This is how I currently define my number format at comptime, because const fn is not supported:
const FORMI_LITERAL: NumberFormat =
NumberFormat::from_bits_truncate(0
| ((b'_' as u64) << 56) // digit_separator_to_flags
| 0x00000000_00000007 // REQUIRED_DIGITS
| 0x00000000_00000100 // NO_EXPONENT_WITHOUT_FRACTION
| 0x00000000_00000200 // NO_SPECIAL
| 0x00000111_00000000 // INTERNAL_DIGIT_SEPARATOR
);
I.e. a bunch of magic numbers that may break on any version jump.
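For illustration, the composition above can at least be given names with ordinary const items today; the constants below are copied from the snippet's own comments, and the builder function is a hypothetical sketch, not lexical-core's API:

```rust
// Flag values taken from the comments in the snippet above.
const REQUIRED_DIGITS: u64 = 0x0000_0000_0000_0007;
const NO_EXPONENT_WITHOUT_FRACTION: u64 = 0x0000_0000_0000_0100;
const NO_SPECIAL: u64 = 0x0000_0000_0000_0200;
const INTERNAL_DIGIT_SEPARATOR: u64 = 0x0000_0111_0000_0000;

// Hypothetical const-fn helper, mirroring digit_separator_to_flags.
const fn digit_separator_to_flags(sep: u8) -> u64 {
    (sep as u64) << 56
}

const FORMAT_BITS: u64 = digit_separator_to_flags(b'_')
    | REQUIRED_DIGITS
    | NO_EXPONENT_WITHOUT_FRACTION
    | NO_SPECIAL
    | INTERNAL_DIGIT_SEPARATOR;

fn main() {
    // The digit separator occupies the top byte.
    assert_eq!(FORMAT_BITS >> 56, b'_' as u64);
    println!("{:#018x}", FORMAT_BITS);
}
```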
Well, use const fn. =) Or provide alternative functions using const fn. Where checks are needed, a few different approaches are possible, each with ups and downs:
- unsafe fn with_×××_unchecked
- panic!() once panicking in const fns is a thing.

Version: 0.7.*
Features: correct, radix, format

Alternatively, provide a single const fn that returns true if a format spec is valid, otherwise false. Users of lexical-core can then do the static assertion themselves. However, this still makes constructing a format at comptime »magic«, as opposed to using const fn methods describing what it is one wants.
The repository is almost 500 megabytes due to including the lexical-core/target directory.
Git allows you to clean this up using tools like filter-branch. Doing so will change the commit IDs of every commit after the one where the lexical-core/target folder was accidentally added, so contributors with old histories will get conflicts, but it's still largely worth it IMHO.
There's a lot of unsafe code in lexical_core. A lot of it appears to deal with raw pointers, where a start and end pointer pair could just as well be a slice and be completely safe, with an acceptable performance cost.
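A minimal sketch of that pattern (the helper names are illustrative, not functions from lexical-core):

```rust
// Pointer-pair style: the kind of code the issue refers to.
unsafe fn count_digits_ptr(mut start: *const u8, end: *const u8) -> usize {
    let mut count = 0;
    while start < end && (*start).is_ascii_digit() {
        count += 1;
        start = start.add(1);
    }
    count
}

// Equivalent safe version: the pointer pair becomes a slice.
fn count_digits_slice(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|b| b.is_ascii_digit()).count()
}

fn main() {
    let input = b"42abc";
    let safe = count_digits_slice(input);
    let raw = unsafe { count_digits_ptr(input.as_ptr(), input.as_ptr().add(input.len())) };
    assert_eq!(safe, 2);
    assert_eq!(safe, raw);
    println!("{}", safe);
}
```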
It is currently impossible to do a lot of things without private forks of lexical-core.
There are also numerous features that are rarely used (including some undocumented ones) and have dubious utility:
- table (should be the default, since correct depends on it).
- unchecked_index (introduces security risks if enabled, and has no tangible performance benefits).
- libm (should be enabled by default, see #61).
- noinline (a debugging tool, no longer used).
- format (should be enabled by default, with fast-path algorithms to avoid overhead).
Hello,
With lexical 1.5 the following code was fine:
pub fn deserialize<'de, T, D>(deserializer: D) -> Result<T, D::Error>
where
    T: lexical::traits::Aton,
    D: Deserializer<'de>,
{
    lexical::try_parse::<T, _>(String::deserialize(deserializer)?).map_err(de::Error::custom)
}
Now it says:
error[E0603]: module `traits` is private
It looks like the trait Aton has become FromBytes, but that doesn't help either, as it cannot be used.
Compile the following with lexical-core (I tried 0.4.0 and 0.4.2) with the default options:
extern crate lexical_core;
fn main() {
let problematic: f64 = 7.689539722041643e164;
let as_str: String = format!("{:?}", problematic); // or other formats, see below
println!("{}", as_str);
let lcresult = lexical_core::atof64_slice(as_str.as_bytes());
println!("{:?}", lcresult);
let parse_result: f64 = as_str.parse().unwrap();
println!("{:?}", parse_result);
}
Output (I've removed most of the zeroes to make it more readable):
768953972204164300 … 0.0
768953972204164200 … 0.0
768953972204164300 … 0.0
Note the lexical-core result (middle) differs from the input value and from the String::parse result by 1 ULP. I got the same error using lexical::parse (1.2.2) instead of lexical_core::atof64_slice. OTOH, I got the expected result if I replace the format in the marked line with {} (which omits the ".0") or with {:e} (which outputs e164 like the input literal, instead of many zeroes). The input value has no special significance to me: I found it by testing my code (which calls into this function) using the proptest strategy proptest::num::f64::NORMAL.
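For reference, a 1 ULP error means the returned float is the representable f64 immediately adjacent to the correct one; a std-only sketch:

```rust
fn main() {
    let expected: f64 = 7.689539722041643e164;
    // The adjacent representable float, one unit in the last place (ULP) away.
    let one_ulp_off = f64::from_bits(expected.to_bits() - 1);
    assert_ne!(expected, one_ulp_off);
    // No representable f64 lies strictly between the two values:
    assert_eq!(one_ulp_off.to_bits() + 1, expected.to_bits());
    println!("{:e}", expected - one_ulp_off);
}
```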
Akin to the configurable exponent character etc., I'd like to be able to tell lexical_core that all b'_' bytes are to be ignored.
I'm using this crate to parse floating-point literals in a toy programming language that allows arbitrary _ separators after the first digit, both before and after the . . Currently, I have to allocate memory just to throw away the _ from otherwise valid input.
Test case: b"+4_2_.3_4_e+7_7_" should successfully parse as +42.34e77 if the to-be-ignored byte is set to b'_'.
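The allocating workaround described above can be sketched like this (strip_separators is an illustrative helper, not part of lexical-core):

```rust
// Illustrative workaround only: copy the input, dropping every b'_'.
fn strip_separators(input: &[u8]) -> Vec<u8> {
    input.iter().copied().filter(|&b| b != b'_').collect()
}

fn main() {
    let raw = b"+4_2_.3_4_e+7_7_";
    let cleaned = strip_separators(raw);
    assert_eq!(cleaned, b"+42.34e+77");
    // std's parser accepts the cleaned input:
    let value: f64 = std::str::from_utf8(&cleaned).unwrap().parse().unwrap();
    assert_eq!(value, 42.34e77);
    println!("{:e}", value);
}
```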
Prior art:
- _ separators.
- ' is a valid digit separator.
Questions:
- A filter callback? That'll for sure be slower.