wrandelshofer / fastdoubleparser


A Java port of Daniel Lemire's fast_float project

License: MIT License

Java 100.00%

java high-performance parser floating-point bigdecimal biginteger

fastdoubleparser's People

Contributors

sirinath cafxx codeintelligencetesting carldea rakhithjk lemire murodin lvheyang pjfanning chenzhongpu blacelle kosak marschall vlsi solitary-dream xtonik cklim647

fastdoubleparser's Issues

Large number of incorrect parsing

We have extensive tests for https://github.com/fastfloat/fast_float. I ran them against FastDoubleParser and found many failures, which I have collected in this gist:

https://gist.github.com/lemire/641a34589c36747f6d24ed6d29ac75f0

The algorithm at https://github.com/fastfloat/fast_float handles all of these cases correctly.

You may refer to https://arxiv.org/abs/2101.11408 or to the C# port at https://github.com/CarlVerret/csFastFloat

The parser throws StringIndexOutOfBoundsException/ArrayIndexOutOfBoundsException for some inputs

The parser throws StringIndexOutOfBoundsException/ArrayIndexOutOfBoundsException for some inputs.
For example with the following input: "0x".

This issue has been discovered in FasterXML/jackson-core#809

The only exceptions that the parser may throw are:

NumberFormatException when the input is illegal
OutOfMemoryError when the JVM fails to allocate memory for objects created by the parser
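
A contract like this can be checked mechanically. The following is only a minimal sketch, not the project's test suite: ExceptionContractCheck and violations are hypothetical names, and Double::parseDouble stands in for the parser under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class ExceptionContractCheck {

    /**
     * Returns every input for which the parser threw a runtime exception
     * other than NumberFormatException.
     */
    static List<String> violations(ToDoubleFunction<String> parser, List<String> inputs) {
        List<String> bad = new ArrayList<>();
        for (String s : inputs) {
            try {
                parser.applyAsDouble(s);
            } catch (NumberFormatException expected) {
                // Allowed by the contract: the input is simply illegal.
            } catch (RuntimeException e) {
                // Anything else (e.g. StringIndexOutOfBoundsException) breaks the contract.
                bad.add(s);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Short, truncated inputs such as "0x" are the interesting cases.
        List<String> inputs = List.of("0x", "0x.", "", "-", "1e", "1.5");
        System.out.println(violations(Double::parseDouble, inputs));
    }
}
```

Passing the parser as a function keeps the check independent of any particular parser class.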

Incorrect maven command sequence

The test can be done with javac and java directly, but it does NOT work as expected with maven.

After mvn clean package, the command below raises an error "Error: Could not find or load main class ch.randelshofer.fastdoubleparserdemo.Main in module ch.randelshofer.fastdoubleparserdemo":

java -XX:CompileCommand=inline,java/lang/String.charAt -p fastdoubleparser/target:fastdoubleparserdemo/target -m ch.randelshofer.fastdoubleparserdemo/ch.randelshofer.fastdoubleparserdemo.Main --markdown

I checked the jars inside fastdoubleparser/target and fastdoubleparserdemo/target, and found that they contain nothing but a META-INF folder!

jar xvf fastdoubleparser-0.7.0.jar
  created: META-INF/
 inflated: META-INF/MANIFEST.MF

So, the maven command cannot produce a correct jar, and I think this is caused by an incorrect Maven project structure and POM configuration. BTW, I think the current multi-release jar here is a little overkill.

FastDoubleParser accepts illegal inputs "." and ".e2"

FastDoubleParser accepts illegal inputs "." and ".e2".
Double.parseDouble() does not accept these values.
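
For reference, the JDK behaviour can be confirmed with a few lines. This is a sketch only: LoneDotCheck, jdkAccepts and hasSignificandDigit are hypothetical names, and hasSignificandDigit is just one possible way to express the missing rule for decimal literals ("at least one digit before the exponent"), not the library's code.

```java
public class LoneDotCheck {

    /** True if Double.parseDouble accepts the string. */
    static boolean jdkAccepts(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Hypothetical helper for decimal literals only: true if at least one
     * digit appears before the exponent marker.
     */
    static boolean hasSignificandDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'e' || c == 'E') {
                break; // digits after this point belong to the exponent
            }
            if (c >= '0' && c <= '9') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {".", ".e2", ".5", "5."}) {
            System.out.println("\"" + s + "\" jdk=" + jdkAccepts(s)
                    + " hasDigit=" + hasSignificandDigit(s));
        }
    }
}
```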

float parser

Hi - thanks for all the great work on the double parser. I've been experimenting with it for possible inclusion in jackson-core.

Parsing floats using the double parser is also much faster than using Float.parseFloat, but unfortunately casting a double to a float can often give you a different result from plain Float.parseFloat.

Would it be possible to consider also supporting a dedicated float parser?

An example is 7.006492321624086e-46 which Float.parseFloat returns as 1.4E-45 but using FastDoubleParser:

        double dbl = FastDoubleParser.parseDouble("7.006492321624086e-46");
        System.out.println("double=" + dbl); //7.006492321624085E-46
        System.out.println("float=" + (float)dbl); //0.0
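
The effect can be reproduced with the JDK alone. In the sketch below, Double.parseDouble stands in for the fast double parser (an assumption; any correctly rounding double parser should behave the same way), and FloatDoubleRounding is a hypothetical class name. The input lies just above half of Float.MIN_VALUE, so rounding once to float goes up, while rounding first to double lands on the tie and the cast then rounds to zero.

```java
public class FloatDoubleRounding {

    /** Rounds twice: decimal -> double -> float. */
    static float viaDouble(String s) {
        return (float) Double.parseDouble(s);
    }

    /** Rounds once: decimal -> float. */
    static float direct(String s) {
        return Float.parseFloat(s);
    }

    public static void main(String[] args) {
        String s = "7.006492321624086e-46";
        System.out.println("direct    = " + direct(s));
        System.out.println("viaDouble = " + viaDouble(s));
    }
}
```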

possible performance issue with very big doubles

JavaDoubleParser seems to be slower than Double.parseDouble for very large numbers (thousands of digits).
Malicious actors often create input files with large numbers to try to cause denial of service issues.

I have a jmh benchmark at https://github.com/pjfanning/jackson-number-parse-bench

./gradlew jmh

It's worth checking the build.gradle file as I have a param that controls which benchmark to run.

jmh {
    includes = ['org.example.jackson.bench.DoubleParserBench']
}

I'm wondering if it would be possible to disregard the least significant digits. If there are 1000 digits, only the first 30 or 40 digits should really impact the double value. Even if you were conservative and limited it to 100 or 200, this would limit the risk vector.
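
Until something like that exists in the parser, a caller can bound the work itself. The sketch below rejects over-long inputs instead of truncating them, because dropping digits can change the rounding of inputs that sit exactly between two doubles. LengthGuard, parseGuarded and MAX_LENGTH are hypothetical, the limit of 200 is only the figure floated above, and Double.parseDouble stands in for the actual parser.

```java
public class LengthGuard {

    /** Hypothetical limit; pick whatever suits the application. */
    static final int MAX_LENGTH = 200;

    /** Parses the input, but refuses to spend time on very long strings. */
    static double parseGuarded(String s) {
        if (s.length() > MAX_LENGTH) {
            throw new NumberFormatException(
                    "input has " + s.length() + " characters, limit is " + MAX_LENGTH);
        }
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(parseGuarded("3.141592653589793"));
    }
}
```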

Publish a multi-release JAR

"We use your java8 code in jackson-core. If you publish a jar with your java8 branch code that would be great - we would change our build to use your published jars and that shades the class packages to include them in jackson-core jar.

One solution would be to append '-java8' to the artifact name (and '-java17' for the java17 jar). Or maven supports 'classifiers' which basically lead to a similar result."

Originally posted by @pjfanning in #22 (comment)

Integrating it as the default parser in openjdk

Hi,
I truly think that https://arxiv.org/abs/2101.11408 is a breakthrough in computer science and that the world would benefit from such a parser being used by default in openjdk (as it is for Go).
(I wonder if an even faster parser couldn't be achieved using jsoniter-scala optimization techniques in addition to Lemire's FasterXML/jackson-core#577 ).

Parsing of hexadecimal floating point numbers is broken in release 0.5.0

There is a bug in the method 'tryHexToFloatWithFastAlgorithm'.

I accidentally removed the if-statements that check whether the fast algorithm succeeded, and didn't notice it because I had the corresponding unit tests commented out.

Document which code signing keys will be used for published artifacts.

Looks like artifacts are being signed with this key:
https://keyserver.ubuntu.com/pks/lookup?search=6ead752b3e2b38e8e2236d7ba9321edaa5cb3202&fingerprint=on&op=index

If that is the correct key can you add a section to the readme confirming that is the key that is expected to be used for code signing on the artifacts released from this repo? Thanks. :)

Examples of other libs that provide docs for the code signing key used are here:

Document usage and benchmarking

👍

Can you document how one would run the benchmarks and how one would use the code as an external library?

Double.parseDouble("0e555") != FastDoubleParser.parseDouble("0e555")

Double.parseDouble and FastDoubleParser.parseDouble return different results for the string "0e555":

Double.parseDouble("0e555"): 0.0
FastDoubleParser.parseDouble("0e555"): Infinity

Edit: I believe that is caused by the special case at

FastDoubleParser/src/ch/randelshofer/math/FastDoubleParser.java

Line 1089 in daa2392

return negative ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;

, which does not handle the even more special case of the mantissa being 0.
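
A sketch of the missing check, assuming the special case has the sign and the decimal significand at hand (hugeExponentResult and ZeroSignificand are hypothetical names, not the library's code):

```java
public class ZeroSignificand {

    /**
     * Result for significand * 10^exponent when the exponent is far beyond
     * the range of double. A zero significand stays zero no matter how
     * large the exponent is; only non-zero significands overflow.
     */
    static double hugeExponentResult(boolean negative, long significand) {
        if (significand == 0) {
            return negative ? -0.0 : 0.0;
        }
        return negative ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        System.out.println(hugeExponentResult(false, 0));
        System.out.println(Double.parseDouble("0e555"));
    }
}
```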

BigDecimal parser

Thanks for all the hard work on the double and float parsers. Would there be any chance that you could consider adding support for BigDecimal parsing? A lot of the low level parser could be reused.

release jar with jdk8 compatible

Could you release a JDK 8 compatible jar to make the library available for broader use cases?

BigInteger parser

@wrandelshofer I'm using v0.5.2 and have found that `JavaBigIntegerParser.parseBigInteger(CharSequence str)` accepts hex values like "AAAA" but `new BigInteger(String)` throws a NumberFormatException with "AAAA".

Would it be possible to support being able to disable hex support?
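
For comparison, this is how the JDK separates the two cases: the one-argument constructor is decimal only, and hex has to be requested with an explicit radix. BigIntegerRadix and isDecimal are hypothetical names used for this sketch.

```java
import java.math.BigInteger;

public class BigIntegerRadix {

    /** True if the JDK accepts the string as a decimal BigInteger. */
    static boolean isDecimal(String s) {
        try {
            new BigInteger(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDecimal("AAAA"));          // decimal parse is refused
        System.out.println(new BigInteger("AAAA", 16)); // hex only with radix 16
    }
}
```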

Originally posted by @pjfanning in #24 (comment)

More efficient character group check

The same trick as in a3c6df6.

9007199254740992.e-256 will not be parsed

It appears that wrandelshofer/FastDoubleParser might be tied to https://github.com/lemire/fast_double_parser which is based on RFC 7159 (JSON standard). This means that strings such as 9007199254740992.e-256 which are not valid in JSON will not parse.
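
The difference between the two grammars is easy to see in a small sketch. The regular expression below is my own transcription of the RFC 7159 number grammar (int, optional frac with at least one digit, optional exp), and TrailingDot and isJsonNumber are hypothetical names.

```java
import java.util.regex.Pattern;

public class TrailingDot {

    /** RFC 7159 number: [minus] int [frac] [exp], where frac needs at least one digit. */
    static final Pattern JSON_NUMBER =
            Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

    static boolean isJsonNumber(String s) {
        return JSON_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "9007199254740992.e-256";
        System.out.println("JSON number: " + isJsonNumber(s));
        System.out.println("JDK value:   " + Double.parseDouble(s));
    }
}
```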

I really recommend you follow more closely the approach in https://github.com/fastfloat/fast_float if you mean to solve the general float parsing problem.

Parser accepts invalid hex chars

See description of pull request #48.

make it allocation free on happy path

FastDoubleParser/src/main/java/ch/randelshofer/fastdoubleparser/FastDoubleParser.java

Line 530 in 8df6871

    
           Double d = FastDoubleMath.hexFloatLiteralToDouble(index, isNegative, digits, exponent, virtualIndexOfPoint, exp_number, isDigitsTruncated, skipCountInTruncatedDigits);

why allocate here?? by that time you know it's not a NaN for sure... so instead of returning null you can just return Double.NaN or whatever special constant.
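
Roughly what I mean, as a sketch. This is not the hex routine quoted above: it uses a simplified decimal fast path (significand up to 2^53, exponent 0 to 22) just to show the sentinel pattern, Double.parseDouble stands in for the real slow path, and all names are hypothetical.

```java
public class SentinelReturn {

    /** Powers of ten that are exactly representable as double. */
    private static final double[] POWERS = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };

    /**
     * Returns the exact result, or NaN as a "cannot decide" marker.
     * No boxing: the return type is the primitive double.
     */
    static double tryFastPath(long significand, int exponent) {
        if (significand < 0 || significand > (1L << 53) || exponent < 0 || exponent > 22) {
            return Double.NaN;
        }
        // Both factors are exact, so the product is correctly rounded.
        return significand * POWERS[exponent];
    }

    /** Falls back to a slow path only when the fast path gave up. */
    static double parse(long significand, int exponent, String text) {
        double d = tryFastPath(significand, exponent);
        return Double.isNaN(d) ? Double.parseDouble(text) : d;
    }

    public static void main(String[] args) {
        System.out.println(parse(15, 1, "15e1"));
        System.out.println(parse(1, 30, "1e30"));
    }
}
```

The sentinel is only safe because NaN can never be a legitimate fast-path result; a literal "NaN" input has to be handled before this point.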

lack of jmh tests is also troubling :(

1.0.0 release only supports very recent JVMs

Jackson still supports Java 8 but fastdoubleparser has at least some classes that have class file major version 66 - might be java 22

Jackson built fine with fastdoubleparser 0.9.0.

This could be a shortcoming of maven plugins - that don't know about Java 22. In fairness, Java 22 is only early access and many build tools really struggle to keep up.

Error:  Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.5.1:shade (shade-jackson-core) on project jackson-core: Error creating shaded jar: Problem shading JAR /home/runner/.m2/repository/ch/randelshofer/fastdoubleparser/1.0.0/fastdoubleparser-1.0.0.jar entry META-INF/versions/22/ch/randelshofer/fastdoubleparser/FastDoubleSwar.class: java.lang.IllegalArgumentException: Unsupported class file major version 66

Edit: This seems to be a shortcoming of maven-shade-plugin but I think I have managed to work around it by excluding the java 22 classes that are in META-INF/versions/22/ch/randelshofer/fastdoubleparser

Double.parseDouble(...) != FastDoubleParser.parseDouble(...)

I have found another input string for which the return values of Double.parseDouble and FastDoubleParser.parseDouble differ. This one is less important than #6 though as it implies only a very minor loss in precision:

Double.parseDouble("-2.2222222222223e-322"): -2.2E-322
FastDoubleParser.parseDouble("-2.2222222222223e-322"): 0.0

Both this issue and #6 have been found with the open-source JVM fuzzer Jazzer. If you are interested in these kinds of findings, I could add the fuzzer to the project as a PR.
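
Independently of the fuzzer, the comparison itself is small. A minimal differential check could look like the sketch below; it does not use the Jazzer API, DifferentialCheck and firstMismatch are hypothetical names, and it assumes every input is accepted by both parsers.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

public class DifferentialCheck {

    /**
     * Returns the first input for which the candidate disagrees with
     * Double.parseDouble. Results are compared bit for bit, so 0.0 and
     * -0.0 count as different.
     */
    static Optional<String> firstMismatch(ToDoubleFunction<String> candidate, List<String> inputs) {
        for (String s : inputs) {
            long expected = Double.doubleToLongBits(Double.parseDouble(s));
            long actual = Double.doubleToLongBits(candidate.applyAsDouble(s));
            if (expected != actual) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Deliberately broken stand-in that flushes subnormal results to zero.
        ToDoubleFunction<String> flushing = s -> {
            double d = Double.parseDouble(s);
            return Math.abs(d) < Double.MIN_NORMAL ? 0.0 : d;
        };
        System.out.println(firstMismatch(flushing, List.of("1.5", "-2.2222222222223e-322")));
    }
}
```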

Is this a mistake with hex float parsing?

In this section of FastFloatMath, it checks the significand against a 53-bit number (as if it were testing to see if it is an exactly representable double), but then casts to float, despite the comments repeatedly referring to the code as using doubles. I think the cast to float should probably be a cast to double (and d should be a double), but I'm not familiar with the code.

FastDoubleParser/fastdoubleparser-dev/src/main/java/ch.randelshofer.fastdoubleparser/ch/randelshofer/fastdoubleparser/FastFloatMath.java

Lines 324 to 334 in c7b1162

    
           if (Long.compareUnsigned(significand, 0x1fffffffffffffL) <= 0) {
               // convert the integer into a double. This is lossless since
               // 0 <= i <= 2^53 - 1.
               float d = (float) significand;
               //
               // The general idea is as follows.
               // If 0 <= s < 2^53  then
               // 1) Both s and p can be represented exactly as 64-bit floating-point
               // values (binary64).
               // 2) Because s and p can be represented exactly as floating-point values,
               // then s * p will produce correctly rounded values.
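
To illustrate why the cast matters (a standalone sketch with hypothetical names, independent of the rest of FastFloatMath): a float has a 24-bit significand, so a 53-bit integer generally does not survive the cast, whereas a double holds it exactly.

```java
public class SignificandCast {

    /** True if the value is unchanged after a round trip through float. */
    static boolean survivesFloat(long s) {
        return (long) (float) s == s;
    }

    /** True if the value is unchanged after a round trip through double. */
    static boolean survivesDouble(long s) {
        return (long) (double) s == s;
    }

    public static void main(String[] args) {
        long s = 0x1fffffffffffffL; // 2^53 - 1, the bound used in the excerpt
        System.out.println("float:  " + survivesFloat(s));
        System.out.println("double: " + survivesDouble(s));
    }
}
```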

Builds should be reproducible

Building a specific revision should be reproducible.

Currently, the multi-release jar created by the build contains the timestamps of the compiled class files. And therefore each time the multi-release jar is built, it has different content.

See
https://maven.apache.org/guides/mini/guide-reproducible-builds.html

SWAR routines accept invalid non-digit chars/bytes

FastDoubleParser/src/main/java/ch.randelshofer.fastdoubleparser/ch/randelshofer/fastdoubleparser/FastDoubleSimd.java

Line 117 in 0903817

long det = ((value + 0x4646464646464646L) | val) &

Use 0x76 instead of 0x46 byte for detection of invalid digits in "numbers" like 1X345678.

Please bundle LICENSE/NOTICE files in the produced jar files

I'm upgrading jackson in Apache JMeter, and I found the new jackson version depends on fastdoubleparser.
It turns out fastdoubleparser does not ship with the license, so it is problematic for the consumers.

See apache/jmeter#5831, and the build failure: https://github.com/apache/jmeter/actions/runs/4823397202/jobs/8592678119?pr=5831#step:4:1857

I have created a lot of similar requests, and almost all of them got fixed eventually, see Dependency with "manual" license configuration in apache/jmeter#469

Current issues

The current license is MIT: https://github.com/wrandelshofer/FastDoubleParser/blob/aeeab26365235cc2fbfb68fea2145a4b86a800fd/LICENSE
However, please note that there's no canonical MIT license text. Every MIT license is different since the copyright is a part of the license text.
In other words, the line "Copyright (c) 2021 Werner Randelshofer, Switzerland" is a part of the license, and the license text requires that "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software".

It is hard for consumers to comply with the requirement above, especially if fastdoubleparser.jar does not include the license text.

The pom file for fastdoubleparser refers to a different license. See https://repo1.maven.org/maven2/ch/randelshofer/fastdoubleparser/0.8.0/fastdoubleparser-0.8.0.pom
The URL there is http://www.opensource.org/licenses/mit-license.php, which does not mention Werner Randelshofer.
fastdoubleparser.jar misses reference to the license. There are cases when fastdoubleparser.jar appears without the corresponding pom.xml, so if you consider fastdoubleparser.jar alone, it is hard to tell what is the license for that artifact.

Consider relicensing with Apache-2.0

You might want to consider switching to Apache-2.0 license. It has several advantages for the consumers:

The copyright and the license text are separate. In other words, there's a canonical Apache-2.0 license text, and you can put your copyright notice into NOTICE file. In general, it becomes easier to review, since every MIT license is different while every Apache-2.0 is the same.
Apache-2.0 license mentions Grant of Patent License while MIT does not mention patents
With Apache-2.0 you can have a canonical license URL right in pom.xml and MANIFEST.MF
With MIT, you literally force everyone to double-check the license text since no one knows if you have other modifications than the custom copyright.

If you absolutely like MIT, you might go with MIT or Apache-2.0, however, I'm not sure if you want that complication (as it would be impossible to express in pom.xml)

Fix steps

Include the license text into the jars under META-INF/LICENSE, META-INF/NOTICE, etc. It would enable consumers to get up-to-date licenses when they depend on fastdoubleparser.
Fix pom.xml to point to the proper license text (e.g. a permalink to GitHub). The current link http://www.opensource.org/licenses/mit-license.php is invalid as it points to a wrong license text.
Add Bundle-License: Apache-2.0 (or Bundle-License: MIT; link=...) manifest entry (where Apache-2.0 is SPDX identifier, see https://osgi.org/specification/osgi.core/7.0.0/framework.module.html#framework.module-bundle-license )

porting FastFloatParser

https://github.com/fastfloat/fast_float
is it on your roadmap ? :)

FastDoubleParser doesn't support all input formats as the default OpenJDK Float/Double parsers

The FastDoubleParser was recently introduced in Jackson through issue FasterXML/jackson-core#577 and is 3-4 times faster than the version that's implemented in OpenJDK. This is fantastic news, since many numerical processing workloads would benefit from this.

However, the OpenJDK Double/Float parsers support a variety of input formats that the FastDoubleParser will fail on, so it can cause unexpected regressions when used.

For example, the FastDoubleParser will fail with a NumberFormatException on these example patterns (there are more to be found in the OpenJDK Double/Float tests):

1.1e-23f
0x.003p12f
0x1.17742db862a4P-1d

I think apart from the first one in this list, the rest are all hexadecimal if I'm not mistaken.
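
A quick way to see what the JDK does with these three strings (sketch only; JdkLiteralForms and parseAll are hypothetical names):

```java
public class JdkLiteralForms {

    /** Parses each string with Double.parseDouble; throws if any is rejected. */
    static double[] parseAll(String... inputs) {
        double[] out = new double[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            out[i] = Double.parseDouble(inputs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Decimal with 'f' suffix, hex with 'f' suffix, hex with 'd' suffix.
        double[] values = parseAll("1.1e-23f", "0x.003p12f", "0x1.17742db862a4P-1d");
        for (double v : values) {
            System.out.println(v);
        }
    }
}
```

Note that the trailing f does not make Double.parseDouble round to float precision; it is simply accepted as part of the syntax.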

Implement faster slow path for double parser (JDK 21)

JDK 21 now includes a faster conversion routine from BigDecimal to double.
We can now implement a performant slow path for double values with very few lines of code.

openjdk/jdk#9410
https://bugs.openjdk.org/browse/JDK-8205592
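
The "very few lines" could be as small as the sketch below. It assumes the slow path still has the decimal literal available as text; BigDecimalSlowPath and slowPath are hypothetical names. BigDecimal does not accept hex literals, type suffixes, or "Infinity"/"NaN", so those would have to be handled before reaching this point.

```java
import java.math.BigDecimal;

public class BigDecimalSlowPath {

    /** Slow path for plain decimal literals: let BigDecimal do the rounding. */
    static double slowPath(String decimalLiteral) {
        return new BigDecimal(decimalLiteral).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(slowPath("1e23"));
        System.out.println(slowPath("-2.2222222222223e-322"));
    }
}
```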

Bug: the highest bit of hexadecimal float significand ignored

Parsing hexadecimal float literals like 0x8000000000000000p0 yields an incorrect result.
See merge request #62
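
For context, the expected value here is 2^63. The sketch below is not a diagnosis of the library's bug; it only shows one common pitfall with a 64-bit significand stored in a signed long, and one way to convert it as unsigned. HexTopBit and unsignedToDouble are hypothetical names.

```java
public class HexTopBit {

    /** Converts a long holding an unsigned 64-bit value to the nearest double. */
    static double unsignedToDouble(long bits) {
        if (bits >= 0) {
            return (double) bits;
        }
        // Top bit set: halve the value, keeping the dropped bit as a sticky
        // bit so that rounding is unaffected, convert, then double again.
        return (double) ((bits >>> 1) | (bits & 1)) * 2.0;
    }

    public static void main(String[] args) {
        long significand = Long.parseUnsignedLong("8000000000000000", 16);
        System.out.println("signed cast:   " + (double) significand);
        System.out.println("unsigned:      " + unsignedToDouble(significand));
        System.out.println("JDK reference: " + Double.parseDouble("0x8000000000000000p0"));
    }
}
```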

issue with module-info classes in v0.9.0 release

There are multiple module-info classes in the v0.9.0 jar. In v0.8.0, there was just the versions/9/module-info.class.

In v0.9.0, there are module-info.class files in all the versions dirs.

This is causing FasterXML/jackson-core#1027

Would it be possible to get some background on the v0.9.0 changes, so that I can work out what to do with the jackson-core issues?

Publish 0.5.2 to maven central

Is it possible to publish the 0.5.2 release to maven central?

Thanks for the great project!

NPE in "FastDoubleParser", method "JavaBigDecimalParser.parseBigDecimal()"

See description in FasterXML/jackson-core#1161

See proposed fix in FasterXML/jackson-core#1162

Unfortunately the fix is incomplete. We need to replace all calls to parseDigitsRecursive() with last argument null by parseDigitsIterative().
