alexhuszagh / rust-lexical
Fast numeric to- and from-string conversion routines.
License: Other
The following code in the lexical-core build script branches on the value of `cfg(target_arch)`. That cfg refers to the target of the code currently being compiled by rustc; in the case of a build script, that's the host architecture.
rust-lexical/lexical-core/build.rs
Lines 10 to 20 in 8c75e29
Cargo provides a separate environment variable to build scripts, `CARGO_CFG_TARGET_ARCH`, to determine the target arch of the library, as opposed to the target arch of the build script.
To reproduce, put this in build.rs, run `cargo check --target wasm32-unknown-unknown`, and see which error is triggered.
#[cfg(target_arch = "x86_64")]
compile_error!("target_arch = x86_64");
#[cfg(target_arch = "wasm32")]
compile_error!("target_arch = wasm32");
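A minimal sketch of the fix, assuming the build script should branch on the target rather than the host; the emitted cfg name `limb_width_64` is illustrative, not necessarily what lexical-core's build script emits:

```rust
// build.rs sketch: read the *target* architecture from Cargo's env var
// instead of cfg!(target_arch), which describes the build-script host.
use std::env;

/// True when the crate being built targets x86_64 (not the host).
fn target_is_x86_64() -> bool {
    env::var("CARGO_CFG_TARGET_ARCH").as_deref() == Ok("x86_64")
}

fn main() {
    if target_is_x86_64() {
        // Hypothetical cfg flag; the real build script may emit others.
        println!("cargo:rustc-cfg=limb_width_64");
    }
}
```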
I've encountered the following compile error on a vs2017-win2016 VM (in Azure Pipelines) using `x86_64-pc-windows-msvc`, rustc 1.39.0-nightly (97e58c0d3 2019-09-20). I see this project has a CI run on `x86_64-pc-windows-gnu` (not msvc) that passes.
error[E0061]: this function takes 1 parameter but 0 parameters were supplied
--> C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.6.1\src\util\num.rs:961:44
|
961 | float_method_msvc!(self, f32, f64, powf, powf32, n as f32)
| ^^^^ expected 1 parameter
Tracking issue for rust-lang/rust#62146.
With `format` and `radix` enabled, I can e.g. parse hexadecimal floats using my proglang-specific syntax. However, that syntax includes base prefixes, in this case `0x`. To make matters worse, base prefixes usually appear between the sign and the integer digits of a number literal.
`0b`, `0o`, `0`, `0d`, and `0x`, as well as their upper-case variants, are all common base prefixes in programming languages: `0b` for base-2, `0o` for base-8, `0` (as in a leading zero) as a terrible way of saying `0o`, `0d` as an optional base-10 prefix for the sake of symmetry, and `0x` for base-16 numbers. I suggest ignoring a leading `0` meaning base-8: that's just terrible, a source of countless bugs, and should thus be up to the user to work around. While pretty much any radix is possible, I suggest only handling these four bases; I don't know of any common prefixes for other radices.
The following extensions to the `format` bit-packed config should be made: flags for the `0b`, `0o`, `0d`, and `0x` prefixes, plus the prefix character itself, e.g. `b'd'` for `0d`. If only upper-case were allowed in a format, this would be `b'D'`. The leading `0` is implied. If the format of the current radix has optional base indicators, then all leading zeros behave normally. This leaves 2 bytes and 4 bits reserved when using a `u128` or a second `u64` for the format settings.
Version: `0.7.*`. Features: `correct`, `format`, `radix`.
Currently I check the sign myself, memorise it, then skip the sign and base prefix to radix-parse the remaining number literal. This abuses the fact that flipping the sign of a float is a lossless operation. However, it's annoying and unergonomic.
An alternative design to support more radix prefixes would be to take a function pointer or something that maps `base` to a base-indicating `u8` ASCII char.
It's also worth noting that there are languages with base postfixes, like `03h` in Intel x86 assembly. Should these be supported as well?
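The workaround described above can be sketched as follows; `i64::from_str_radix` stands in for lexical's radix parsing, and the prefix set is the four bases suggested earlier:

```rust
/// Strip the sign and base prefix, radix-parse the rest, then restore
/// the sign (lossless for integers, and also for floats).
fn parse_prefixed(s: &str) -> Option<i64> {
    let (neg, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (radix, digits) = if let Some(d) = rest.strip_prefix("0x") {
        (16, d)
    } else if let Some(d) = rest.strip_prefix("0o") {
        (8, d)
    } else if let Some(d) = rest.strip_prefix("0b") {
        (2, d)
    } else {
        (10, rest)
    };
    let n = i64::from_str_radix(digits, radix).ok()?;
    Some(if neg { -n } else { n })
}
```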
Trying to define a custom format (a format containing digit separators), I couldn't get my number format to parse the string `"42.0"`. After a while I noticed that the provided formats which also contain digit separators can't parse the same string either. See the test case below.
rustc 1.54.0 (a178d0322 2021-07-26); lexical 6.0.0; `format` feature enabled.
fn main() {
const RUST: u128 = lexical::format::RUST_LITERAL;
const JSON: u128 = lexical::format::JSON;
const CXX: u128 = lexical::format::CXX17_LITERAL;
let o = lexical::ParseFloatOptions::new();
println!("{:?}", lexical::parse_with_options::<f64, _, JSON>("42.0", &o));
// RUST_LITERAL
println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("42.0", &o));
println!("{:?}", lexical::parse_with_options::<f64, _, RUST>("4_2.0", &o));
// CXX17_LITERAL
println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("42.0", &o));
println!("{:?}", lexical::parse_with_options::<f64, _, CXX>("4'2.0", &o));
}
I would expect all five `println!` invocations to print `Ok(42.0)`. But in the actual output, only the first one is able to parse the number.
Ok(42.0)
Err(EmptyInteger(2))
Err(EmptyInteger(3))
Err(EmptyMantissa(4))
Err(EmptyMantissa(5))
When I copy the `RUST_LITERAL` and `CXX17_LITERAL` definitions into my main function and comment out the `digit_separator`, the simple case can be parsed correctly:
pub const CXX_NOSEP: u128 = lexical::NumberFormatBuilder::new()
// .digit_separator(std::num::NonZeroU8::new(b'\''))
.case_sensitive_special(true)
.internal_digit_separator(true)
.build();
println!("{:?}", lexical::parse_with_options::<f64, _, CXX_NOSEP>("42.0", &o));
pub const RUST_NO_SEP: u128 = lexical::NumberFormatBuilder::new()
// .digit_separator(std::num::NonZeroU8::new(b'_'))
.required_digits(true)
.no_positive_mantissa_sign(true)
.no_special(true)
.internal_digit_separator(true)
.trailing_digit_separator(true)
.consecutive_digit_separator(true)
.build();
println!("{:?}", lexical::parse_with_options::<f64, _, RUST_NO_SEP>("42.0", &o));
prints
Ok(42.0)
Ok(42.0)
Currently, there are ISAs (Instruction Set Architectures) like Intel x86 whose assemblers support integer literals for interrupt instructions and numerous other uses. For example, for x86, we have the following reference specification.
First, we should add flags to `NumberFormat` to ensure all these numbers can be parsed. Specific flags, such as for base prefixes and postfixes (integer only?), should be added.
Next, we should add support for the numerical constants supported by popular ISAs, which could include:
I don't know any specifics for ISAs other than x86, so help is greatly appreciated. Do different ISAs have any differences from x86? Is there any difference between AT&T and Intel syntax? (I don't believe so.) I'm looking for a series of new flags to add to `NumberFormat`, and then pre-defined constants, so I can encompass all these possible variants.
Include lexical-core as a dependency in Cargo.toml, and turn off std:
lexical-core = {version="0.7.4", default-features=false }
Build gives an error:
error[E0554]: `#![feature]` may not be used on the stable release channel
--> /Users/todd/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.7.4/src/lib.rs:133:35
|
133 | #![cfg_attr(not(feature = "std"), feature(core_intrinsics))]
Some languages disallow leading zeros in integers (but not floats). For example, a Python REPL:
>>> 012
File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> -012
File "<stdin>", line 1
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> 012.0
12.0
JavaScript/JSON has the same behavior.
I don't see an existing format that applies. If a new format is added, it would pass this test:
diff --git a/lexical-core/src/atoi/api.rs b/lexical-core/src/atoi/api.rs
index 022214f..1196fd8 100644
--- a/lexical-core/src/atoi/api.rs
+++ b/lexical-core/src/atoi/api.rs
@@ -348,6 +348,14 @@ mod tests {
assert!(i32::from_lexical_format(b"31_", format).is_err());
}
+ #[test]
+ #[cfg(feature = "format")]
+ fn i32_leading_zero() {
+ let format = NumberFormat::INTEGER_NO_LEADING_ZERO;
+ assert!(i32::from_lexical_format(b"012", format).is_err());
+ assert!(i32::from_lexical_format(b"-012", format).is_err());
+ }
+
#[cfg(feature = "std")]
proptest! {
#[test]
This could be applied to at least `NumberFormat::PYTHON_LITERAL` and `NumberFormat::JSON`.
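A sketch of the check such a flag would enable; the function name and the pre-pass approach are illustrative, not the proposed `NumberFormat` mechanism:

```rust
/// True when a decimal integer literal has a forbidden leading zero,
/// per the Python/JSON rule quoted above ("0" itself stays legal).
fn has_forbidden_leading_zero(s: &str) -> bool {
    // Skip an optional sign; the rule applies to the digit sequence.
    let digits = s
        .strip_prefix('+')
        .or_else(|| s.strip_prefix('-'))
        .unwrap_or(s);
    digits.len() > 1 && digits.starts_with('0')
}
```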
Tracking Issue for: rust-bakery/nom#1080
Should be fixed with a PR to v0.6 or higher.
Rust will parse `.` as `Err(ParseFloatError { kind: Invalid })`, while `rust-lexical` will parse it as `0.0`. Not sure which one is the "correct" one.
Currently, some settings, like the expected exponent character, are global state. This can be anything from an inconvenience to a tricky issue for projects parsing multiple languages.
The `format` feature of `lexical-core` added integer-packed settings which you have to pass to all format parsing functions. I suggest doing a similar thing for everything else, but by passing in a struct reference. Integer-packing works for things like the exponent characters, but fails for e.g. `set_inf_string`. This is still C-API-friendly; C libs are just forced to put `strlen` next to their `*const c_char`s.
Version: `0.7.*`. Features: `format`, `radix`, `correct`.
Uhh… not doing any of this? Maybe packing said strings into something like `staticvec::StaticString` inlined into the struct? But I don't think that gives any benefit. Another idea would be to always enable `format` and include `format`'s bit-packed settings in that one struct.
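A rough shape of the proposed struct; all field names are assumptions for illustration, and slices (which carry their own length) keep it C-API-friendly:

```rust
/// Sketch only: field names are illustrative, not lexical's actual API.
pub struct ParseOptions<'a> {
    /// Exponent character, replacing the old global setting.
    pub exponent_char: u8,
    /// Infinity string; a slice carries its length, mirroring what a
    /// C caller reconstructs with strlen next to a *const c_char.
    pub inf_string: &'a [u8],
    /// NaN string.
    pub nan_string: &'a [u8],
}

impl Default for ParseOptions<'static> {
    fn default() -> Self {
        Self {
            exponent_char: b'e',
            inf_string: b"inf",
            nan_string: b"NaN",
        }
    }
}
```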
rust-lexical does not implement the `std::error::Error` trait for `lexical::Error`.
That makes error handling harder, in my case the integration with `anyhow`. `anyhow` is a very helpful error-handling library, but it requires errors to implement the `std::error::Error` trait. `lexical` does not do that, which is why I'm going to have to write boilerplate code to make it work correctly.
Implement the `std::error::Error` trait for `lexical::Error`.
None, as far as I can tell. I'm a Rust beginner, but I think an implementation of `std::error::Error` should require nothing beyond a couple lines of code.
Alternative: not do anything.
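The couple of lines in question, sketched against a simplified stand-in enum (the real `lexical::Error` has more variants and already implements `Debug` and `Display`):

```rust
use std::fmt;

/// Simplified stand-in for lexical::Error.
#[derive(Debug)]
pub enum Error {
    Overflow(usize),
    InvalidDigit(usize),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Overflow(i) => write!(f, "numeric overflow at index {}", i),
            Error::InvalidDigit(i) => write!(f, "invalid digit at index {}", i),
        }
    }
}

// The trait has no required methods beyond the Debug + Display bounds,
// so this one line is enough for anyhow interop.
impl std::error::Error for Error {}
```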
use lexical::parse_with_options;
const FIXTURE: &str = "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
fn main() {
let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
FIXTURE,
&lexical::parse_float_options::STANDARD,
)
.expect("parse float failed");
println!("{}", f);
}
Here are a few things you should provide to help me understand the issue:
features = ["power-of-two", "parse-floats"]
use lexical::parse_with_options;
const FIXTURE: &str = "11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111";
fn main() {
let f: f64 = parse_with_options::<_, _, { lexical::NumberFormatBuilder::from_radix(16) }>(
FIXTURE,
&lexical::parse_float_options::STANDARD,
)
.expect("parse float failed");
println!("{}", f);
}
I expected to see `inf`, but got an error:
thread 'main' panicked at 'parse float failed: InvalidPunctuation', src\main.rs:10:6
Regarding https://docs.rs/lexical-core/0.1.3/lexical_core/ftoa/fn.f64toa_slice.html: what should the `bytes` input be?
I'm trying to implement a protocol which does not accept the scientific format in all places. It would be useful to control whether the decimal output is written in normal or scientific format.
The number of significant digits would also be nice to have some degree of control over. Rounding the number to the desired precision beforehand doesn't help if the rounded value isn't representable (example: `1.2f32` -> `1.2000000476837158`).
An extra write function which can take formatting hints, possibly `write_format(n, format, significant_digits, bytes)`, where:
- `n` - value to be written
- `format` - an enum for the desired format
- `significant_digits` - a `usize` for the maximum number of significant digits; 0 could mean "don't care"
- `bytes` - output buffer

If applicable to the feature request, here are a few things you should provide to help me understand the issue:
rustc 1.39.0-nightly (4295eea90 2019-08-30)
lexical-core 0.6.2, features=["radix"], default-features=false
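The requested `write_format` could be sketched like this; it is a stand-in built on `std::fmt`, and `significant_digits` here controls digits after the point in normal mode, which is a simplification of true significant-digit handling:

```rust
/// Desired output style (names are the reporter's proposal, not an
/// existing lexical API).
enum Format {
    Normal,
    Scientific,
}

/// Sketch: format `n` into `bytes` with optional precision control;
/// 0 means "don't care".
fn write_format(n: f64, format: Format, significant_digits: usize, bytes: &mut Vec<u8>) {
    let s = match (format, significant_digits) {
        (Format::Normal, 0) => format!("{}", n),
        // Simplification: precision = digits after the decimal point.
        (Format::Normal, d) => format!("{:.*}", d, n),
        (Format::Scientific, 0) => format!("{:e}", n),
        (Format::Scientific, d) => format!("{:.*e}", d.saturating_sub(1), n),
    };
    bytes.extend_from_slice(s.as_bytes());
}
```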
In my application, over 25% of execution time is spent inside `lexical::parse_lossy`. Mind you, lexical's implementation is far better than the stdlib implementation and its speed is simply fantastic.
A little bit more speed can't hurt though, so I was looking at the implementation details, where it is stated that `parse_lossy` tries multiple parsing implementations: first the fast path, then the moderate path. Is it possible to make lexical return the fast path's result directly?
To give context; this is an excerpt of the floats that need to be parsed:
-0.018477, -0.018464, -0.018458, -0.014031, -0.014018, -0.014011, -0.000648, 0.008092,
0.000111, -0.009875, 0.012704, 0.012185, 0.011334, 0.011927, 0.012284, 0.010097,
0.012951, 0.001517, -0.005452, 0.015123, -0.004884, -0.007977, 0.019697, 0.010684
They're all between -1.0 and +1.0, and only the first 4-5 fractional digits matter.
It could be implemented using a new parse function, maybe `lexical::parse_lossier`?
Hi! Your work looks interesting and I'm interested in applying some of the concepts elsewhere, so I thought I'd have a play around.
Unfortunately I'm new to Rust and struggling to use this. I tried following your instructions and hit a couple of problems. I'm on Ubuntu 18.04/amd64, and I started off with the system-supplied Rust (1.30.0).
I started a new project with `cargo new`, added lexical to the `Cargo.toml`, and threw this into main.rs:
extern crate lexical;
fn main() {
let f: f32 = lexical::parse("12.34567");
println!("Hello, world! {}", f);
}
This gave me a `rename-dependency` error:
$ cargo install lexical
Updating crates.io index
Downloading lexical v2.0.0
error: failed to parse manifest at `/home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-2.0.0/Cargo.toml`
Caused by:
feature `rename-dependency` is required
consider adding `cargo-features = ["rename-dependency"]` to the manifest
I found various suggestions on the internet: adding `cargo-features = ["rename-dependency"]` at the top of my `Cargo.toml` (this did not appear to help), and trying a newer toolchain. Unfortunately, the latter suggestion led to this new error:
~/.local/rust/bin/cargo install lexical
Updating crates.io index
Installing lexical v2.0.0
Compiling void v1.0.2
Compiling ryu v0.2.7
Compiling static_assertions v0.2.5
Compiling cfg-if v0.1.6
Compiling unreachable v1.0.0
Compiling stackvector v1.0.2
Compiling lexical-core v0.3.1
error[E0309]: the parameter type `T` may not live long enough
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
|
439 | pub struct ReverseView<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
440 | inner: &'a [T],
| ^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:440:5
|
440 | inner: &'a [T],
| ^^^^^^^^^^^^^^
error[E0309]: the parameter type `T` may not live long enough
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
|
455 | pub struct ReverseViewMut<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
456 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
--> /home/pwaller/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.3.1/src/util/veclike.rs:456:5
|
456 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
error: aborting due to 2 previous errors
For more information about this error, try `rustc --explain E0309`.
error: failed to compile `lexical v2.0.0`, intermediate artifacts can be found at `/tmp/cargo-installhhZln5`
Caused by:
Could not compile `lexical-core`.
To learn more, run the command again with --verbose.
Thanks in advance for any help.
Here are some comments in the code I find misleading to readers. The formula should be `floor(x * log10(2) - log10(4/3))`; please refer to the paper (Section 5.4). The comments are missing `floor` on the LHS's (the RHS's are not integers). This was my mistake and I corrected them in my repo recently.

Currently, lexical-core only allows `no_std` on stable if the `libm` feature is enabled. Since libm is very small, well-maintained, and fast to compile, this should be the default.
Remove `core::intrinsics` and replace them with `libm`.
Not sure if this is an issue with `lexical-core`, `nom`, `elastic-rs/elastic` (where I am using `nom`), or even `std`/`core` or the compiler, but I thought I would start here...
I am getting `undefined symbols` `ld` errors for various symbols in `std`/`core` when I enable the `lexical` feature of `nom`.
See elastic-rs/elastic/pull/389 for a little more background + error logs and this or this Travis build.
I have reproduced it on macOS 10.14 & 10.15 and Ubuntu 19.04 (and, for the sake of completeness, various Linux via Docker) with rustc 1.38.0 (625451e37 2019-09-23), 1.39.0-beta.6 (224f0bc90 2019-10-15), and 1.40.0-nightly (4a8c5b20c 2019-10-23), plus a few other nightlies, and when cross-compiling to `x86_64-unknown-linux-musl` from macOS and Linux hosts.
We don't actually use the `lexical` feature of `nom` in `elastic-rs/elastic`, so I disabled it and everything is building fine, but I thought I would open this issue in case someone else runs into it.
I can also upload our current `Cargo.lock` if that would help...
Git repository is currently too large due to compiled targets from lexical-test being added to the tree. This affects clone times dramatically. This can be fixed by running the following commands on each branch.
git filter-branch --tree-filter "rm -rf lexical-test/target" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git commit -m "Removing lexical-test/target from git history."
git gc
git push --force
let v = 1.0f64;
println!("{}", v.to_string()); // 1
let mut buf = [0u8; 64];
lexical_core::ftoa::f64toa_slice(v, 10, &mut buf);
println!("{}", std::str::from_utf8(&buf).unwrap()); // 1.
Is this an intended behavior?
Currently, `lexical-core` can only parse `f32` and `f64`, but supporting more number formats than Rust does would be nice, especially for designers of programming languages.
Offer a feature-gated default impl for `f128` using the `f128` crate, and for `f16` and `bf16` from the `half` crate.
Version: `0.7.*`. Features: `format`, `correct`, `radix`.
Don't see any beyond »let's not«.
Rust has `u128`, as having it for e.g. crypto is convenient, despite no mainstream CPU having 128-bit integer arithmetic and registers. `f16` is very often used, e.g. in GPU code, `bf16` specifically in neural-network code, and `f128` also finds some use here and there.
There's also 8-bit floats, though not IEEE-standardised, and there's IEEE 754 binary256. However, I know of no handy softfloat crates for these.
As `lexical-core` aims to be a proglang-agnostic number parser, i.e. not tied to Rust formats and types, I see no reason to completely restrict oneself to just the built-in Rust machine types.
If applicable to the issue, here are a few things you should provide to help me understand the issue:
`rustc -V`: 1.53.0
Refs #55, rust-lang/rust#85667
It would be nice if the versions that are incompatible with Rust 1.53.0 could be yanked. While yanking doesn't force people to update to the fixed versions, it does help as tools like cargo-audit
will now warn that you're using a yanked version and should upgrade.
At rust-lang/rust#85667 (comment), @Mark-Simulacrum said "I think yanking is likely not the right step to take at this time." I wonder if they still think that now that 1.53.0 is stable.
Hey, that's me again bugging you about arrayvec :)
It looks like the two crates are pretty close in the end, and I wonder if it makes sense for `rust-lexical` to switch to the latter? arrayvec has seen much more usage in the ecosystem, and, because `unsafe` code is involved, it seems like it makes sense to minimize duplication?
Then there's the problem that `ArrayVec` lacks an `insert_many` method, but I wonder if adding a `splice` method would help with that?
(The reason why I am asking about this is that I've noticed that rust-analyzer transitively, via `nom`, depends on `stackvector`, while it already has `arrayvec` among the deps.)
While the Ryu algorithm shows fine average throughput for arbitrary numbers, it does a lot of rounding iterations for numbers with a small mantissa (in serialized representation), and it has no version for bases other than decimal.
Use the Schubfach algorithm to avoid rounding loops. It will minimize tail latency, or increase throughput if the input has a lot of numbers with a small mantissa. Also, the Schubfach algorithm can be implemented for other bases, except 3, 12, 24, 48, etc.
- "The Schubfach way to render doubles" by Raffaello Giulietti
- Java variant for decimal representation, from the author of the algorithm
- Scala variant for decimal representation, from the jsoniter-scala library
When building the crate with the latest nightly, the compilation fails with 27 errors.
rustc 1.51.0-nightly (d4e3570db 2021-02-01)
All of them are E0308 and E0277.
This is useful when exposing a `lexical::Error` through a wrapping error's `source` or `cause` method.
Hello, we recently reported a buffer overflow bug in `SmallVec::insert_many()`: servo/rust-smallvec#252. This crate contains a slightly older copy of `insert_many()` that has the same vulnerability.
rust-lexical/lexical-core/src/util/sequence.rs
Lines 34 to 46 in 8c75e29
Please update the code when the fix is published.
Thanks!
In testing a project using lexical-core against https://github.com/nst/JSONTestSuite, I see that parsing floats such as `1.`, `0.e1`, and `2.e+3` passes, but these are expected to fail.
This patch shows the behavior I think should apply. What do you think?
diff --git a/lexical-core/src/atof/api.rs b/lexical-core/src/atof/api.rs
index 9bb0688..a95d682 100644
--- a/lexical-core/src/atof/api.rs
+++ b/lexical-core/src/atof/api.rs
@@ -270,6 +270,9 @@ mod tests {
assert_eq!(Err((ErrorCode::EmptyFraction, 0).into()), f32::from_lexical(b"e-1"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"+"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f32::from_lexical(b"-"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"1."));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"0.e1"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f32::from_lexical(b"2.e+3"));
// Bug fix for Issue #8
assert_eq!(Ok(5.002868148396374), f32::from_lexical(b"5.002868148396374"));
@@ -399,6 +402,9 @@ mod tests {
assert_eq!(Err((ErrorCode::EmptyFraction, 1).into()), f64::from_lexical(b"-."));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"+"));
assert_eq!(Err((ErrorCode::Empty, 1).into()), f64::from_lexical(b"-"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"1."));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"0.e1"));
+ assert_eq!(Err((ErrorCode::EmptyFraction, 2).into()), f64::from_lexical(b"2.e+3"));
// Bug fix for Issue #8
assert_eq!(Ok(5.002868148396374), f64::from_lexical(b"5.002868148396374"));
All of the following test cases fail with an `Overflow` error:
assert_eq!(i8::MIN, lexical::try_parse(i8::MIN.to_string()).unwrap());
assert_eq!(i16::MIN, lexical::try_parse(i16::MIN.to_string()).unwrap());
assert_eq!(i32::MIN, lexical::try_parse(i32::MIN.to_string()).unwrap());
assert_eq!(i64::MIN, lexical::try_parse(i64::MIN.to_string()).unwrap());
Not sure how to get around this.
[[package]]
name = "nom"
version = "5.0.0-beta2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"lexical-core 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.7 (registry+https://github.com/rust-lang/crates.io-index)",
"version_check 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
error[E0412]: cannot find type `ChunksExact` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:439:51
|
439 | fn chunks_exact(&self, size: usize) -> slice::ChunksExact<T> {
| ^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `ChunksExactMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:445:59
|
445 | fn chunks_exact_mut(&mut self, size: usize) -> slice::ChunksExactMut<T> {
| ^^^^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `RChunks` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:535:46
|
535 | fn rchunks(&self, size: usize) -> slice::RChunks<T> {
| ^^^^^^^ did you mean `Chunks`?
error[E0412]: cannot find type `RChunksMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:541:54
|
541 | fn rchunks_mut(&mut self, size: usize) -> slice::RChunksMut<T> {
| ^^^^^^^^^^ did you mean `ChunksMut`?
error[E0412]: cannot find type `RChunksExact` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:549:52
|
549 | fn rchunks_exact(&self, size: usize) -> slice::RChunksExact<T> {
| ^^^^^^^^^^^^ not found in `slice`
error[E0412]: cannot find type `RChunksExactMut` in module `slice`
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:555:60
|
555 | fn rchunks_exact_mut(&mut self, size: usize) -> slice::RChunksExactMut<T> {
| ^^^^^^^^^^^^^^^ not found in `slice`
error[E0309]: the parameter type `T` may not live long enough
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
|
136 | pub struct ReverseView<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
137 | inner: &'a [T],
| ^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a [T]` does not outlive the data it points at
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:137:5
|
137 | inner: &'a [T],
| ^^^^^^^^^^^^^^
error[E0309]: the parameter type `T` may not live long enough
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
|
152 | pub struct ReverseViewMut<'a, T> {
| - help: consider adding an explicit lifetime bound `T: 'a`...
153 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'a mut [T]` does not outlive the data it points at
--> C:\Users\Chris\.cargo\registry\src\github.com-1ecc6299db9ec823\lexical-core-0.4.0\src\util\sequence.rs:153:5
|
153 | inner: &'a mut [T],
| ^^^^^^^^^^^^^^^^^^
error: aborting due to 8 previous errors
Some errors occurred: E0309, E0412.
For more information about an error, try `rustc --explain E0309`.
error: Could not compile `lexical-core`.
warning: build failed, waiting for other jobs to finish...
error: build failed
Hi.
Reported by @travismiller here.
Build log: https://ci.appveyor.com/project/blackbeam/mysql-async/builds/25478919/job/n0x2jg84ypetvh24#L250
List of broken builds (only i686): https://ci.appveyor.com/project/blackbeam/mysql-async/builds/25478919
Dutch floats are formatted like so: `101.123,456`, where the `.` is the grouping separator and the comma is used for the fraction. AFAIK, it is not possible to add a format flag to allow parsing Dutch floats. Some way to configure the parser to allow that would be great.
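Until such a flag exists, a pre-pass can map the Dutch convention onto the default float syntax; this is a workaround sketch built on the stdlib, not a lexical API:

```rust
/// Normalize a Dutch-formatted float ('.' grouping, ',' fraction)
/// into default syntax, then parse it.
fn parse_dutch(s: &str) -> Option<f64> {
    let normalized: String = s
        .chars()
        .filter_map(|c| match c {
            '.' => None,      // drop grouping separators
            ',' => Some('.'), // comma becomes the decimal point
            c => Some(c),
        })
        .collect();
    normalized.parse().ok()
}
```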
I just saw you released a 2.0 (nice!), so I bumped the version I use in a pet project, and it has caused a ~25% slowdown in parsing floats.
Are you aware of this already? With all the benchmarks here I figured you would be, but the only note about 2.0 I can find is that you use minimal `unsafe`, which is looking like a dubious trade on my end. Can you enlighten me?
This issue tracks the implementation of an atof function that could be used by serde_json. It is motivated by the parsing issues discussed in serde-rs/json#536.
@Alexhuszagh has provided background detail in #28. In particular, their `lexical` library has lots of testing, which provides a great foundation on which to build a customized atof function.
The direction that currently seems viable is to add a streaming atof function to `lexical-core`. It would operate on one byte at a time. This ought to allow serde_json to correctly parse JSON floats at high speed.
Errors:
error: duplicate lang item in crate `lexical_core`: `panic_impl`.
|
= note: first defined in crate `std`.
error: duplicate lang item in crate `lexical_core`: `eh_personality`.
|
= note: first defined in crate `panic_unwind`.
In cases like #24, where you wish to parse a subset of the formats supported by lexical, it would perhaps be nice if there was an API which would return an interim result, which could be fed back into the parse function along with the next character. Then, when parsing from e.g. a JSON float string representation, you could avoid malformed string representations and convert to float in a single pass over the input.
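A sketch of that interim-result shape; all names are illustrative, and the final rounding via a float division is not correctly rounded for hard cases, which is the part a real implementation would hand to lexical's slower paths:

```rust
/// Partial parse state fed one byte at a time (sketch, not a lexical API).
#[derive(Default)]
struct PartialFloat {
    negative: bool,
    mantissa: u64,
    digits: u32,
    frac_digits: i32,
    seen_point: bool,
}

impl PartialFloat {
    /// Feed one byte; returns false when the byte cannot extend the
    /// number, letting the caller stop without scanning past the float.
    fn push(&mut self, b: u8) -> bool {
        match b {
            b'-' if self.digits == 0 && !self.seen_point && !self.negative => {
                self.negative = true;
                true
            }
            b'0'..=b'9' => {
                self.mantissa = self.mantissa * 10 + u64::from(b - b'0');
                self.digits += 1;
                if self.seen_point {
                    self.frac_digits += 1;
                }
                true
            }
            b'.' if !self.seen_point => {
                self.seen_point = true;
                true
            }
            _ => false,
        }
    }

    /// Finish the parse. Exact only when the value fits a few digits.
    fn finish(&self) -> f64 {
        let v = self.mantissa as f64 / 10f64.powi(self.frac_digits);
        if self.negative { -v } else { v }
    }
}
```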
Generate an expanded TOML or similarly-formatted file with all the test cases for float-parsing conformance.
By default, we assume the radix is the same for the entire number. That is, the radix for the mantissa digits, the exponent base, and the radix for the exponent digits are all the same.
Provide in `ParseFloatOptions` 2 additional fields:
- `exponent_radix`, the radix for the exponent digit encoding
- `exponent_base`, the numerical base for the exponent

These should both be limited to valid radices as well.
C++ hexadecimal float literals, and hexadecimal float representations, demonstrate this issue:
// 0xa.b, which is 10.6875 in hex notation
// p specifies an exponent base of 2
// The exponent is never optional for literals
// The exponent is optional for strings
// 10 is a decimal-encoded integer
// So, the float is identical to 10.6875 * 2^10
const float f = 0xa.bp10;
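Checking the arithmetic in the comments above with plain Rust arithmetic (not a lexical call):

```rust
fn main() {
    // 0xa.b: integer part 0xa = 10, fraction 0xb/16 = 11/16 = 0.6875.
    let mantissa = 10.0 + 11.0 / 16.0;
    assert_eq!(mantissa, 10.6875);
    // p10 means an exponent base of 2: value = 10.6875 * 2^10.
    let value = mantissa * f64::powi(2.0, 10);
    assert_eq!(value, 10944.0);
    println!("0xa.bp10 = {}", value);
}
```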
Hi,
First off, I just want to thank you for the work you've put into this crate to create a faster parser from string to uint, int, and floating types.
I'm currently writing a crate that I hope will eventually act as a faster version of numpy's loadtxt and genfromtxt, but for Rust. The main logic can be found in the macro I wrote for the various different conversions, as seen here. The only lines I had to change to incorporate your crate can be found at L172-188. The commented-out lines are what used to be contained in the map function, based on the standard library's conversion for most primitive types.
The current tests I have fail for only the float cases. You can view them here. However, they pass for all of the integer and unsigned integer tests. Now, if I more or less copy exactly what's in the macro and run it outside of a macro, it works. Below is an example:
let mut results = Vec::<f32>::new();
let line_split_vec = vec!["1", "2", "3"];
results.extend({
line_split_vec.iter().map(|x| {
lexical::try_parse::<f32, _>(x.trim()).unwrap()
})
});
println!("{:?}", results);
I've also checked the type and values being fed into `lexical::try_parse`, and they are the same between my above example and the macro which fails.
edit: I just ran it using the debug build and that works...
So, I've been fiddling around with the compiler options a bit, and it appears that which compiler flags are passed in determines whether or not it will run. I'll need to look a bit more into what was so different between my basic cargo options and the flags used for release.
The current version (v0.8.2) of `lexical-core` claims to be `no_std` (when default features are disabled) and doesn't mention needing `std` in the context of the `compact` feature. But when enabling said feature, compilation is halted because `std` seems to be required by `lexical-util`, whose `std` feature was enabled somewhere in the dependency tree.
Compiling lexical-util v0.8.1
error[E0463]: can't find crate for `std`
|
= note: the `thumbv6m-none-eabi` target may not support the standard library
= note: `std` is required by `lexical_util` because it does not declare `#![no_std]`
Here is a minimal reproduction:
[package]
name = "foo"
version = "0.1.0"
authors = ["becominginsane <[email protected]>"]
edition = "2018"
[dependencies]
lexical-core = { version = "0.8", features = ["compact"], default-features = false }
fn main() {
println!("you won't compile me");
}
lexical-core 0.6 pins cfg-if 0.1.9, which causes downstream problems. Users may be stuck on lexical-core 0.6 for a while, since nom 5 requires it, and moving to newer nom versions is a notoriously slow process.
rustc 1.43.1 (8d69840ab 2020-05-04)
0.6.3
[dependencies]
lexical-core = "0.6" # or indirectly through nom = "5"
cfg-if = "0.1"
fn foo() {
    cfg_if::cfg_if! {
        if #[cfg(unix)] {
            fn bar() {}
            // A statement (not an item) inside cfg_if!: cfg-if 0.1.9's macro
            // only accepts `item` fragments, so this fails to parse.
            let tm = ();
        }
    }
}
$ cargo +stable check
Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
--> src/main.rs:5:13
|
5 | let tm = ();
| ^^^
error: aborting due to previous error
error: could not compile `foo`.
$ cargo +beta check
Checking foo v0.1.0 (/home/jon/dev/tmp/foo)
error: expected an item keyword
--> src/main.rs:5:13
|
5 | let tm = ();
| ^^^
|
::: /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.9/src/lib.rs:41:40
|
41 | if #[cfg($($meta:meta),*)] { $($it:item)* }
| -------- while parsing argument for this `item` macro fragment
error: aborting due to previous error
error: could not compile `foo`.
So, the problem here is that cargo searches the dependency tree, sees that lexical-core requires exactly cfg-if 0.1.9, and so any crate in the tree that depends on cfg-if 0.1 gets that version (since cargo builds only one version of a crate per semver-compatible range).
As I understand it, the decision to pin cfg-if 0.1.9 was made in fefe818 to support older Rust versions. Unfortunately, that now means that newer Rust versions are not supported. It seems more important to support new Rust versions than old ones, so I suggest that the pin be removed.
There is also a note in that PR saying:
Update cfg-if to "0.1.10" when we support only Rustc >= 1.32.0.
Don't know if that applies now?
See also rust-bakery/nom#1115 (comment)
This is how I currently define my number format at comptime, because const fn is not supported:
const FORMI_LITERAL: NumberFormat =
NumberFormat::from_bits_truncate(0
| ((b'_' as u64) << 56) // digit_separator_to_flags
| 0x00000000_00000007 // REQUIRED_DIGITS
| 0x00000000_00000100 // NO_EXPONENT_WITHOUT_FRACTION
| 0x00000000_00000200 // NO_SPECIAL
| 0x00000111_00000000 // INTERNAL_DIGIT_SEPARATOR
);
I.e. a bunch of magic numbers that may break on any version jump.
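For illustration, the composition above can at least be given names with ordinary const items today; the constants below are copied from the snippet's own comments, and the builder function is a hypothetical sketch, not lexical-core's API:

```rust
// Flag values taken from the comments in the snippet above.
const REQUIRED_DIGITS: u64 = 0x0000_0000_0000_0007;
const NO_EXPONENT_WITHOUT_FRACTION: u64 = 0x0000_0000_0000_0100;
const NO_SPECIAL: u64 = 0x0000_0000_0000_0200;
const INTERNAL_DIGIT_SEPARATOR: u64 = 0x0000_0111_0000_0000;

// Hypothetical const-fn helper, mirroring digit_separator_to_flags.
const fn digit_separator_to_flags(sep: u8) -> u64 {
    (sep as u64) << 56
}

const FORMAT_BITS: u64 = digit_separator_to_flags(b'_')
    | REQUIRED_DIGITS
    | NO_EXPONENT_WITHOUT_FRACTION
    | NO_SPECIAL
    | INTERNAL_DIGIT_SEPARATOR;

fn main() {
    // The digit separator occupies the top byte.
    assert_eq!(FORMAT_BITS >> 56, b'_' as u64);
    println!("{:#018x}", FORMAT_BITS);
}
```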
Well, use const fn. =) Or provide alternative functions using const fn. Where checks are needed, a few different approaches are possible, each with ups and downs:
- unsafe fn with_×××_unchecked
- panic!() once panicking in const fns is a thing.

Version: 0.7.*
Features: correct, radix, format

Alternatively, provide a single const fn that returns true if a format spec is valid, otherwise false. Users of lexical-core can then do the static assertion themselves. However, this still makes constructing a format at comptime »magic«, as opposed to using const fn methods describing what it is one wants.
The repository is almost 500 megabytes due to including the lexical-core/target directory.
Git allows you to clean this up using tools like filter-branch. Doing so will change the commit IDs of every commit after the one where the lexical-core/target folder was accidentally added, so contributors with old histories will get conflicts, but it's still largely worth it IMHO.
There's a lot of unsafe code in lexical_core. A lot of it appears to deal with raw pointers, where a start and end pointer pair could just as well be a slice and be completely safe, with an acceptable performance cost.
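A minimal sketch of that pattern (the helper names are illustrative, not functions from lexical-core):

```rust
// Pointer-pair style: the kind of code the issue refers to.
unsafe fn count_digits_ptr(mut start: *const u8, end: *const u8) -> usize {
    let mut count = 0;
    while start < end && (*start).is_ascii_digit() {
        count += 1;
        start = start.add(1);
    }
    count
}

// Equivalent safe version: the pointer pair becomes a slice.
fn count_digits_slice(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|b| b.is_ascii_digit()).count()
}

fn main() {
    let input = b"42abc";
    let safe = count_digits_slice(input);
    let raw = unsafe { count_digits_ptr(input.as_ptr(), input.as_ptr().add(input.len())) };
    assert_eq!(safe, 2);
    assert_eq!(safe, raw);
    println!("{}", safe);
}
```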
It is currently impossible to do a lot of things without private forks of lexical-core.
There are also numerous features that are rarely used (including some undocumented ones) and have dubious utility:
- table (should be the default, since correct depends on it).
- unchecked_index (introduces security risks if enabled, and has no tangible performance benefits).
- libm (should be enabled by default, see #61).
- noinline (a debugging tool, no longer used).
- format (should be enabled by default, with fast-path algorithms to avoid overhead).
Hello,
With lexical 1.5 the following code was fine:
pub fn deserialize<'de, T, D>(deserializer: D) -> Result<T, D::Error>
where
    T: lexical::traits::Aton,
    D: Deserializer<'de>,
{
    lexical::try_parse::<T, _>(String::deserialize(deserializer)?).map_err(de::Error::custom)
}
Now it says:
error[E0603]: module `traits` is private
It looks like the trait Aton has become FromBytes, but that doesn't help either, as it cannot be used.
Compile the following with lexical-core (I tried 0.4.0 and 0.4.2) with the default options:
extern crate lexical_core;
fn main() {
let problematic: f64 = 7.689539722041643e164;
let as_str: String = format!("{:?}", problematic); // or other formats, see below
println!("{}", as_str);
let lcresult = lexical_core::atof64_slice(as_str.as_bytes());
println!("{:?}", lcresult);
let parse_result: f64 = as_str.parse().unwrap();
println!("{:?}", parse_result);
}
Output (I've removed most of the zeroes to make it more readable):
768953972204164300 … 0.0
768953972204164200 … 0.0
768953972204164300 … 0.0
Note the lexical-core result (middle) differs from the input value and from the String::parse result by 1 ULP. I got the same error using lexical::parse (1.2.2) instead of lexical_core::atof64_slice. OTOH, I got the expected result if I replace the format in the marked line with {} (which omits the ".0") or with {:e} (which outputs e164 like the input literal, instead of many zeroes). The input value has no special significance to me: I found it by testing my code (which calls into this function) using the proptest strategy proptest::num::f64::NORMAL.
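For reference, a 1 ULP error means the returned float is the representable f64 immediately adjacent to the correct one; a std-only sketch:

```rust
fn main() {
    let expected: f64 = 7.689539722041643e164;
    // The adjacent representable float, one unit in the last place (ULP) away.
    let one_ulp_off = f64::from_bits(expected.to_bits() - 1);
    assert_ne!(expected, one_ulp_off);
    // No representable f64 lies strictly between the two values:
    assert_eq!(one_ulp_off.to_bits() + 1, expected.to_bits());
    println!("{:e}", expected - one_ulp_off);
}
```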
Akin to the configurable exponent character etc., I'd like to be able to tell lexical_core that all b'_' bytes are to be ignored.
I'm using this crate to parse floating-point literals in a toy programming language that allows arbitrary _ separators after the first digit, both before and after the . . Currently, I have to allocate memory just to throw away the _ from otherwise valid input.
Test case: b"+4_2_.3_4_e+7_7_" should successfully parse as +42.34e77 if the to-be-ignored byte is set to b'_'.
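The allocating workaround described above can be sketched like this (strip_separators is an illustrative helper, not part of lexical-core):

```rust
// Illustrative workaround only: copy the input, dropping every b'_'.
fn strip_separators(input: &[u8]) -> Vec<u8> {
    input.iter().copied().filter(|&b| b != b'_').collect()
}

fn main() {
    let raw = b"+4_2_.3_4_e+7_7_";
    let cleaned = strip_separators(raw);
    assert_eq!(cleaned, b"+42.34e+77");
    // std's parser accepts the cleaned input:
    let value: f64 = std::str::from_utf8(&cleaned).unwrap().parse().unwrap();
    assert_eq!(value, 42.34e77);
    println!("{:e}", value);
}
```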
Prior art:
- _ separators.
- ' is a valid digit separator.
Questions:
- A filter callback? That'll for sure be slower.