simd-json's Introduction

SIMD JSON for Rust

Rust port of the extremely fast simdjson JSON parser with serde compatibility.


readme (for real!)

simdjson version

Currently tracking version 0.2.x of simdjson upstream (work in progress, feedback is welcome!).

CPU target

To take advantage of simd-json, your system needs to be SIMD-capable. On x86, simd-json selects the best available SIMD feature set (AVX2 or SSE4.2) at runtime. If simd-json is compiled with SIMD support, runtime detection is disabled.

simd-json supports AVX2, SSE4.2, NEON, and simd128 (wasm) natively. It also includes an unoptimized fallback implementation in native Rust for other platforms; however, this is a last-resort measure and nothing we'd recommend relying on.
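As a rough illustration of the compile-time side of this, the following sketch reports which of the feature sets named above a binary was built for. It only uses the standard cfg! macro; the function name compiled_simd_level is made up for this example and is not part of simd-json.

```rust
/// Reports which SIMD feature set this binary was *compiled* for.
/// `cfg!(target_feature = "...")` is resolved at compile time, so the answer
/// depends on the build flags (for example `-C target-cpu=native`), not on
/// the machine the binary later runs on.
/// (Hypothetical helper for illustration; not a simd-json API.)
pub fn compiled_simd_level() -> &'static str {
    let x86 = cfg!(any(target_arch = "x86", target_arch = "x86_64"));
    if x86 && cfg!(target_feature = "avx2") {
        "avx2"
    } else if x86 && cfg!(target_feature = "sse4.2") {
        "sse4.2"
    } else if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        "simd128"
    } else {
        "none"
    }
}
```

Building with RUSTFLAGS="-C target-cpu=native" on an AVX2 machine would be expected to make this return "avx2", while a default x86_64 build typically reports "none" because only SSE2 is assumed.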

Performance characteristics

  • Compiling for the native CPU results in the best performance.
  • Runtime CPU detection for AVX2 and SSE4.2 is the second fastest (on x86_* only).
  • Portable std::simd is the next fastest implementation when compiled with a native CPU target.
  • std::simd without a native CPU target, or the native Rust implementation, is the least performant.

allocator

For best performance, we highly suggest using mimalloc or jemalloc instead of the system allocator used by default. Another recent allocator that works well (but we have yet to test it in production) is snmalloc.

runtime-detection

This feature allows selecting the optimal algorithm based on the features available at runtime; it has no effect on platforms other than x86 and x86_64. When neither AVX2 nor SSE4.2 is supported, it falls back to a native Rust implementation.

Note that an application compiled with runtime-detection will not run as fast as an application compiled for a specific CPU. The reason is that Rust can't optimise for a specific instruction set when it targets the generic one, and the non-SIMD parts of the code won't be optimised for the given instruction set either.
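The selection order described above can be sketched with the standard library's is_x86_feature_detected! macro. This is a minimal sketch of the idea, not simd-json's actual dispatch code; the Implementation enum and the detect function are names invented for this example.

```rust
/// The choices a dispatcher could make (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Implementation {
    Avx2,
    Sse42,
    NativeRust,
    /// Non-x86 targets: nothing is detected at runtime.
    CompileTimeSelected,
}

/// Picks the best implementation the *running* CPU supports.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect() -> Implementation {
    if std::is_x86_feature_detected!("avx2") {
        Implementation::Avx2
    } else if std::is_x86_feature_detected!("sse4.2") {
        Implementation::Sse42
    } else {
        // Neither feature set is available: use the plain Rust fallback.
        Implementation::NativeRust
    }
}

/// On other architectures runtime detection has no effect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect() -> Implementation {
    Implementation::CompileTimeSelected
}
```

A real dispatcher would cache the result of the check rather than repeat it for every call.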

portable

Currently disabled

An implementation of the algorithm using std::simd and registers up to 512 bits wide. It is highly experimental and currently disabled due to dependencies.

serde_impl

simd-json is compatible with serde and serde-json. The Value types provided implement serializers and deserializers. In addition, simd-json implements the Deserializer trait for the parser, so it can deserialize anything that implements the serde Deserialize trait. Note that serde provides both a Deserializer and a Deserialize trait.

The serde support is contained in the serde_impl feature, which is part of the default feature set of simd-json but can be disabled.

known-key

The known-key feature changes the hash mechanism for the DOM representation of the underlying JSON object from ahash to fxhash. The ahash hasher is faster at hashing and protects against DoS attacks that force multiple keys into a single hashing bucket. The fxhash hasher, on the other hand, produces repeatable hashing results, which in turn allows memoizing hashes for well-known keys and saving time on lookups. In workloads that frequently access a few well-known keys, this can be a performance advantage.

The known-key feature is optional and disabled by default; it must be enabled explicitly.
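To illustrate why repeatable hashing matters for memoization, here is a small sketch that uses only the standard library. DefaultHasher::new() stands in for a repeatable hasher such as fxhash, and RandomState stands in for a seeded hasher such as ahash; both substitutions are assumptions of this example, and MemoizedKey is a hypothetical type, not simd-json's own.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes `key` with an unseeded hasher. Every `DefaultHasher::new()` starts
/// from the same state, so the result is repeatable within one build.
pub fn repeatable_hash(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hashes `key` with a randomly seeded state. The result is only meaningful
/// for maps that share this exact `state`.
pub fn seeded_hash(state: &RandomState, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// A key whose hash is computed once and can be reused for later lookups.
/// (Hypothetical illustration type.)
pub struct MemoizedKey {
    pub key: &'static str,
    pub hash: u64,
}

impl MemoizedKey {
    pub fn new(key: &'static str) -> Self {
        MemoizedKey { key, hash: repeatable_hash(key) }
    }
}
```

With a seeded hasher the stored hash would be tied to one particular map's seed, which is why memoizing it across objects only works with a repeatable hasher.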

value-no-dup-keys

This flag has no effect on simd-json itself but purely affects the Value structs.

The value-no-dup-keys feature flag toggles stricter behavior for objects when deserializing into a Value. When enabled, the Value deserializer will remove duplicate keys in a JSON object and only keep the last one. If not set, duplicate keys are considered undefined behavior and Value will not make guarantees on its behavior.
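The "keep the last one" rule can be pictured with a plain HashMap. This is only a sketch of the semantics, assuming integer values for brevity; object_last_wins is a name invented here and says nothing about how the Value deserializer is implemented.

```rust
use std::collections::HashMap;

/// Builds an object from key/value pairs in document order.
/// `HashMap::insert` replaces the value of an existing key, so when a key
/// repeats, only its last occurrence survives.
pub fn object_last_wins<'a>(pairs: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut object = HashMap::new();
    for &(key, value) in pairs {
        object.insert(key, value);
    }
    object
}
```

For the input {"a": 1, "b": 2, "a": 3} this yields an object with two keys, where "a" maps to 3.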

big-int-as-float

The big-int-as-float feature flag treats very large integers that won't fit into u64 as f64 floats. This prevents parsing errors if the JSON you are parsing contains very large numbers. Keep in mind that f64 loses some precision when representing very large numbers.
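A minimal sketch of this fallback, using only the standard library parsers, might look as follows. The Number enum and parse_unsigned are illustrative names, not simd-json types, and the sketch only handles non-negative integer literals.

```rust
/// A parsed JSON number (illustrative type, not simd-json's own).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Number {
    U64(u64),
    F64(f64),
}

/// Parses a non-negative integer literal. If it does not fit into `u64`,
/// it is returned as an `f64` instead of producing an error.
pub fn parse_unsigned(literal: &str) -> Option<Number> {
    // Reject anything that is not a plain run of ASCII digits.
    if literal.is_empty() || !literal.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match literal.parse::<u64>() {
        Ok(value) => Some(Number::U64(value)),
        // Too large for u64: keep the magnitude, accept the lost precision.
        Err(_) => literal.parse::<f64>().ok().map(Number::F64),
    }
}
```

The precision loss is easy to see here: 18446744073709551616 (2^64) and 18446744073709551617 both come back as the same f64.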

safety

simd-json uses a lot of unsafe code.

There are a few reasons for this:

  • SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
  • We work around some performance bottlenecks imposed by safe rust. These are avoidable, but at a performance cost. This is a more considered path in simd-json.

simd-json goes through extra scrutiny for unsafe code. These steps are:

  • Unit tests - to test 'the obvious' cases, edge cases, and regression cases
  • Structural constructive property based testing - We generate random valid JSON objects to exercise the full simd-json codebase stochastically. Floats are currently excluded since slightly different parsing algorithms lead to slightly different results here. In short "is simd-json correct".
  • Data-oriented property-based testing of string-like data - to assert that sequences of legal printable characters don't panic or crash the parser (they may, and often do, produce an error, since they are not valid JSON)
  • Destructive Property based testing - make sure that no illegal byte sequences crash the parser in any way
  • Fuzzing - fuzz based on upstream & jsonorg simd pass/fail cases
  • Miri testing for UB

This doesn't ensure complete safety, nor is it a bulletproof guarantee, but it does go a long way to assert that the library is of high production quality and fit for purpose for practical industrial applications.
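The destructive testing step listed above can be sketched without any testing framework. The following is a simplified stand-in, assuming a hand-rolled xorshift generator instead of a property-testing crate; XorShift64 and first_panicking_input are names invented for this example, and the parser under test is passed in as a closure.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Small xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds `rounds` random byte strings (up to `max_len` bytes each) to `parser`
/// and returns the first input that made it panic, if any.
/// Returning an `Err` is fine; only a panic counts as a failure.
pub fn first_panicking_input<T, E>(
    parser: impl Fn(&[u8]) -> Result<T, E>,
    rounds: usize,
    max_len: usize,
    seed: u64,
) -> Option<Vec<u8>> {
    let mut rng = XorShift64::new(seed);
    for _ in 0..rounds {
        let len = (rng.next_u64() as usize) % (max_len + 1);
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        let outcome = catch_unwind(AssertUnwindSafe(|| {
            let _ = parser(&input);
        }));
        if outcome.is_err() {
            return Some(input);
        }
    }
    None
}
```

Note that catch_unwind only observes panics; crashes from undefined behavior are what the fuzzing and Miri steps are for.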

Other interesting things

There are also bindings for upstream simdjson available here

License

simd-json itself is licensed under either of

However, it ports a lot of code from simdjson, so their work and copyright on that should be respected alongside.

The serde integration is based on their example and serde-json, so again, their copyright should be respected as well.


simd-json's Issues

`Value` for Tape

It would be nice to be able to use the tape with the provided functions or the Value trait; for read-only situations this might be significantly faster. The current problem is around ownership (hello Rust ...).

arm support

It would be great to support ARM's NEON SIMD instruction set.

clean up serde integration

serde is a nightmare, trying to integrate both owned and unowned values is crazy :/ this needs some cleaning up.

runtime dispatch and other goodies from upstream

One idea we should consider soonish is compiling everything for the target architecture regardless of CPU features and doing runtime detection & dispatch (as implemented in upstream, hopefully easier in Rust?).

There might be other goodies in upstream as well...

Investigate slow float parsing

Float parsing seems slow and it seems we're hitting 'parse_float' more often than we should. Let's take a look at why.

flattened json access for the tape

Does simdjson have flattened JSON access? (similar to https://github.com/pikkr/pikkr)

Will there be any performance improvement if I use flattened JSON access?


Added by @Licenser as an issue description

The Tape struct should be queryable via a simplified version of JSONpath (section 3.2 in the paper linked below).

To achieve this we need at minimum:

  • a parser that takes a query string and turns it into a digestible format
  • a function that takes said format and applies it to a Tape
  • support for .<field> to query an object field
  • support for [<index>] to query array indexes
  • support for nesting those two
  • sufficient tests to cover the code (sufficient here is defined as 'does not drop crate coverage' or better)

Additional JSONpath operators are welcome but optional.
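The first two requirements, plus the .<field> and [<index>] operators, could be sketched roughly as below. This is only an illustration of the shape of such a parser: it runs against a tiny stand-in Json enum rather than the real Tape, and Segment, parse_query and lookup are hypothetical names.

```rust
/// One step of a simplified JSONpath-like query.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Segment {
    /// `.name` selects an object field.
    Field(String),
    /// `[n]` selects an array element.
    Index(usize),
}

/// Parses queries such as `.users[0].name` into a list of segments.
pub fn parse_query(query: &str) -> Result<Vec<Segment>, String> {
    let mut segments = Vec::new();
    let mut rest = query;
    while !rest.is_empty() {
        if let Some(after_dot) = rest.strip_prefix('.') {
            // A field name runs until the next `.` or `[`.
            let end = after_dot
                .find(|c: char| c == '.' || c == '[')
                .unwrap_or(after_dot.len());
            if end == 0 {
                return Err(format!("empty field name before {:?}", after_dot));
            }
            segments.push(Segment::Field(after_dot[..end].to_string()));
            rest = &after_dot[end..];
        } else if let Some(after_bracket) = rest.strip_prefix('[') {
            let end = after_bracket
                .find(']')
                .ok_or_else(|| "missing closing ']'".to_string())?;
            let index = after_bracket[..end]
                .parse::<usize>()
                .map_err(|e| format!("invalid index {:?}: {}", &after_bracket[..end], e))?;
            segments.push(Segment::Index(index));
            rest = &after_bracket[end + 1..];
        } else {
            return Err(format!("expected '.' or '[' at {:?}", rest));
        }
    }
    Ok(segments)
}

/// Stand-in document type for this sketch (the real target would be the Tape).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Int(i64),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Applies a parsed query to a document, returning the selected node if any.
pub fn lookup<'doc>(root: &'doc Json, path: &[Segment]) -> Option<&'doc Json> {
    let mut current = root;
    for segment in path {
        current = match (segment, current) {
            (Segment::Field(name), Json::Object(fields)) => {
                &fields.iter().find(|(key, _)| key == name)?.1
            }
            (Segment::Index(i), Json::Array(items)) => items.get(*i)?,
            // Field access on a non-object or index access on a non-array.
            _ => return None,
        };
    }
    Some(current)
}
```

Nesting falls out of the loop in lookup: each segment narrows the current node, so .a[1].b is just three steps in sequence.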

Improve testing

We still have rather low test coverage on parts of the code (my fault -.-). Now that we can measure it, we should aim to improve it. 90%+ sounds like a good target.

arch alignment & tracking upstream?

I started on this pull request to get us closer to upstream structure, I think we're in a good place to pick it up again maybe?

#46

@Licenser, what would you think about this idea:

  • in the README/Cargo.toml description/etc, we add some description of currently tracking simdjson vX.Y.Z
  • we finish that arch alignment to get the raw (current) codebase close in a majority sense (of course except for the necessary rust-specific enhancements)
  • we genericise/template-enable the code that was made more similar by arch alignment (reducing the size of platform-specific modules)
  • this should get the core features very close to the corresponding upstream version 🎉

Very interested in your thoughts! 🥂

0.1.17 does not compile

The 0.1.17 release is failing to build in Travis, when 0.1.16 used to work. Any idea what may be going on here?
https://travis-ci.org/serde-rs/json-benchmark/jobs/565755287

error messages
error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/stage2.rs:256:48
    |
256 |                 let mut copy = vec![0u8; len + SIMDJSON_PADDING];
    |                                                ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/stage2.rs:271:48
    |
271 |                 let mut copy = vec![0u8; len + SIMDJSON_PADDING];
    |                                                ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/stage2.rs:286:48
    |
286 |                 let mut copy = vec![0u8; len + SIMDJSON_PADDING];
    |                                                ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:142:79
    |
142 |         let needs_relocation = (buf_start + input.len()) % page_size::get() < SIMDJSON_PADDING;
    |                                                                               ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:145:62
    |
145 |             let mut data: Vec<u8> = Vec::with_capacity(len + SIMDJSON_PADDING);
    |                                                              ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:167:48
    |
167 |         let strings = Vec::with_capacity(len + SIMDJSON_PADDING);
    |                                                ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:215:40
    |
215 |         let mut copy = vec![0u8; len + SIMDJSON_PADDING];
    |                                        ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:227:18
    |
227 |         if len < SIMDJSON_PADDING {
    |                  ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `SIMDJSON_PADDING` in this scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:228:44
    |
228 |             let mut copy = vec![0u8; len + SIMDJSON_PADDING];
    |                                            ^^^^^^^^^^^^^^^^ not found in this scope

error[E0599]: no function or associated item named `find_structural_bits` found for type `Deserializer<'_>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:153:31
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- function or associated item `find_structural_bits` not found for this
...
153 |                 Deserializer::find_structural_bits(&data)
    |                               ^^^^^^^^^^^^^^^^^^^^ function or associated item not found in `Deserializer<'_>`

error[E0599]: no function or associated item named `find_structural_bits` found for type `Deserializer<'_>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:156:36
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- function or associated item `find_structural_bits` not found for this
...
156 |             unsafe { Deserializer::find_structural_bits(input) }
    |                                    ^^^^^^^^^^^^^^^^^^^^ function or associated item not found in `Deserializer<'_>`

error[E0599]: no method named `parse_str_` found for type `&'a mut Deserializer<'de>` in the current scope
  --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/serde/de.rs:21:55
   |
21 |                 visitor.visit_borrowed_str(stry!(self.parse_str_()))
   |                                                       ^^^^^^^^^^

error[E0599]: no method named `parse_str_` found for type `&'a mut Deserializer<'de>` in the current scope
  --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/serde/de.rs:78:62
   |
78 |                 return visitor.visit_borrowed_str(stry!(self.parse_str_()));
   |                                                              ^^^^^^^^^^

error[E0599]: no method named `parse_str_` found for type `&'a mut Deserializer<'de>` in the current scope
  --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/serde/de.rs:81:47
   |
81 |         visitor.visit_borrowed_str(stry!(self.parse_str_()))
   |                                               ^^^^^^^^^^

error[E0599]: no method named `parse_str_` found for type `&'a mut Deserializer<'de>` in the current scope
  --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/serde/de.rs:94:53
   |
94 |                 return visitor.visit_str(stry!(self.parse_str_()));
   |                                                     ^^^^^^^^^^

error[E0599]: no method named `parse_str_` found for type `&'a mut Deserializer<'de>` in the current scope
  --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/serde/de.rs:97:38
   |
97 |         visitor.visit_str(stry!(self.parse_str_()))
   |                                      ^^^^^^^^^^

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/borrowed.rs:184:29
    |
184 |             b'"' => self.de.parse_str_().map(Value::from),
    |                             ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/borrowed.rs:199:29
    |
199 |             b'"' => self.de.parse_str_().map(Value::from),
    |                             ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/borrowed.rs:244:37
    |
244 |             let key = stry!(self.de.parse_str_());
    |                                     ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/owned.rs:184:29
    |
184 |             b'"' => self.de.parse_str_().map(Value::from),
    |                             ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/owned.rs:199:29
    |
199 |             b'"' => self.de.parse_str_().map(Value::from),
    |                             ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

error[E0599]: no method named `parse_str_` found for type `Deserializer<'de>` in the current scope
   --> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/value/owned.rs:244:37
    |
244 |             let key = stry!(self.de.parse_str_());
    |                                     ^^^^^^^^^^
    | 
   ::: /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/simd-json-0.1.17/src/lib.rs:112:1
    |
112 | pub struct Deserializer<'de> {
    | ---------------------------- method `parse_str_` not found for this

ARM code coverage

SSE4.2 or earlier?

I ran the following little script to see which SIMD functions are used in the sse4.2 target:

rg '_mm' src/sse42 | sed -e 's/^.*://' -e 's/^[ ]*//' -e $'s/_mm/\\\n_mm/g' | grep '_mm' | sed -e 's/(.*//' | sort -u

I found that _mm_testz_si128 is the only function used that requires SSE4.1; everything else is part of the 3.x branch. We could possibly relax the test to SSE4.1, or even add an SSSE3 target that uses a fallback for that one function.

@sunnygleason since you wrote the SSE42 implementation what are your thoughts?

Split out code into many crates.

Right now the crate provides a great way to parse json. But a lot of the simd tricks are much more general. For example parsing a float or an integer from a string, locating unescaped quotes, etc. may be valuable generally.

If these could be made into sub-crates that are individually pushed to crates.io they could be more easily reused in other contexts.

If they can be made no_std, perhaps SIMD parsing of ints/floats could even be added to the standard library instead of the .parse() method on strings.

Add relocation at page ends

Simdjson relocates the data if the end is too close to a page end, so it can always read a bit past the data. We should adopt that behavior, as it saves us from having to re-check on the hot path.
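One way to picture the check is the sketch below. It is written from the description above rather than taken from simdjson or simd-json; the PADDING value of 32 bytes and the function names are assumptions made for the example, and the page size is passed in by the caller.

```rust
/// Bytes a SIMD loop may read past the end of the input.
/// (32 is an assumed value for this sketch.)
pub const PADDING: usize = 32;

/// Returns true when fewer than `PADDING` bytes remain between the end of the
/// buffer and the end of the page that holds its last byte.
pub fn needs_relocation(buf_start: usize, len: usize, page_size: usize) -> bool {
    let end = buf_start + len;
    // Bytes left in the current page after the buffer ends; zero when the
    // buffer ends exactly on a page boundary.
    let remaining = (page_size - end % page_size) % page_size;
    remaining < PADDING
}

/// Copies `input` into a fresh buffer with `PADDING` zero bytes appended, so
/// reads past the logical end stay inside memory we own.
pub fn relocate(input: &[u8]) -> Vec<u8> {
    let mut copy = vec![0u8; input.len() + PADDING];
    copy[..input.len()].copy_from_slice(input);
    copy
}
```

A caller would compute the check once per document, for example with needs_relocation(input.as_ptr() as usize, input.len(), 4096), and only pay for the copy when it returns true.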

Try adopting state machine approach for DOM creation

Originally, simdjson uses a state machine to build the structure on a tape. It's worth investigating whether this state machine approach is more efficient than the current recursion (spoiler: it probably is) and whether it can be adapted to create the DOM or interface with serde/deserialize.
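To make the comparison concrete, here is a heavily simplified sketch of building a flat tape with an explicit stack instead of recursion. It is not simdjson's state machine: the Token and TapeNode types are invented for the example, object keys are left out, and scalars are reduced to integers.

```rust
/// Structural events, as a tokenizer might produce them (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    ArrayStart,
    ArrayEnd,
    ObjectStart,
    ObjectEnd,
    Scalar(i64),
}

/// Flat tape entries. Containers record how many tape entries they span.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TapeNode {
    Array { span: usize },
    Object { span: usize },
    Scalar(i64),
}

/// Builds the tape in a single loop. The `open` vector holds the tape indexes
/// of containers that are still open and takes the place of the call stack.
pub fn build_tape(tokens: &[Token]) -> Result<Vec<TapeNode>, String> {
    let mut tape = Vec::with_capacity(tokens.len());
    let mut open: Vec<usize> = Vec::new();
    for token in tokens {
        match *token {
            Token::Scalar(value) => tape.push(TapeNode::Scalar(value)),
            Token::ArrayStart => {
                open.push(tape.len());
                tape.push(TapeNode::Array { span: 0 });
            }
            Token::ObjectStart => {
                open.push(tape.len());
                tape.push(TapeNode::Object { span: 0 });
            }
            Token::ArrayEnd | Token::ObjectEnd => {
                let start = open.pop().ok_or("unbalanced closing token")?;
                // Patch the placeholder now that the container's size is known.
                let span = tape.len() - start - 1;
                tape[start] = match (tape[start], *token) {
                    (TapeNode::Array { .. }, Token::ArrayEnd) => TapeNode::Array { span },
                    (TapeNode::Object { .. }, Token::ObjectEnd) => TapeNode::Object { span },
                    _ => return Err("mismatched closing token".to_string()),
                };
            }
        }
    }
    if open.is_empty() {
        Ok(tape)
    } else {
        Err("unclosed container".to_string())
    }
}
```

Because nesting depth only grows the open vector, deeply nested input cannot overflow the call stack the way a recursive builder can.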

more descriptive lifetimes

After ranting about single-letter lifetimes, I feel it's only right to own up to it and get rid of them here. The goal would be to rename each lifetime to something more descriptive. An example would be renaming 'v in borrowed::Value to 'value or 'input.
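As a small illustration of the difference, compare the two declarations below. The types are cut-down stand-ins written for this example, not the real borrowed::Value, and first_word is a hypothetical helper.

```rust
/// Before: `'v` says nothing about what is being borrowed.
#[derive(Debug, PartialEq)]
pub enum TerseValue<'v> {
    Null,
    String(&'v str),
}

/// After: the name documents that the string borrows from the parser input.
#[derive(Debug, PartialEq)]
pub enum DescriptiveValue<'input> {
    Null,
    String(&'input str),
}

/// The returned value cannot outlive `input`, which the lifetime name now
/// states directly in the signature.
pub fn first_word<'input>(input: &'input str) -> DescriptiveValue<'input> {
    match input.split_whitespace().next() {
        Some(word) => DescriptiveValue::String(word),
        None => DescriptiveValue::Null,
    }
}
```

The rename changes nothing for the compiler; the benefit is purely for readers of signatures and error messages.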

Use with serde_derive

Is it possible to use this crate with serde_derive to speed up parsing? E.g. currently I have

#[derive(Serialize, Deserialize, Debug)]
pub struct Foo {
  pub a: u64,
  pub b: Vec<u64>,
  pub c: Option<u64>,
}
...

  let file = File::open(filename)?;
  let reader = BufReader::new(file);
  let result: Foo = serde_json::from_reader(reader)?;

Is there a way to do something like this?

  let result: Foo = simd_json::from_reader(reader)?;

repository name

The crate is named simd-json (since we can't have simdjson ...) so would it make sense to rename the repository to reflect the crate name?

It will have very little (no?) outside impact but I'd love to get some feedback on this before going ahead.

quite strange detection result

After upgrading to 0.2.4, I got stuck on a SIMD detection error.

My PC supports sse4.2 and avx2, but it still detects nothing.

processor       : 11
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Xeon(R) E-2176M  CPU @ 2.70GHz
stepping        : 10
microcode       : 0xca
cpu MHz         : 900.053
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : 5
cpu cores       : 6
apicid          : 11
initial apicid  : 11
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips        : 5401.81
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

rustc also detects the features:

rustc --print target-features
Available features for this target:
    16bit-mode                    - 16-bit mode (i8086).
    32bit-mode                    - 32-bit mode (80386).
    3dnow                         - Enable 3DNow! instructions.
    3dnowa                        - Enable 3DNow! Athlon instructions.
    64bit                         - Support 64-bit instructions.
    64bit-mode                    - 64-bit mode (x86_64).
    adx                           - Support ADX instructions.
    aes                           - Enable AES instructions.
    avx                           - Enable AVX instructions.
    avx2                          - Enable AVX2 instructions.
    avx512bf16                    - Support bfloat16 floating point.
    avx512bitalg                  - Enable AVX-512 Bit Algorithms.
    avx512bw                      - Enable AVX-512 Byte and Word Instructions.
    avx512cd                      - Enable AVX-512 Conflict Detection Instructions.
    avx512dq                      - Enable AVX-512 Doubleword and Quadword Instructions.
    avx512er                      - Enable AVX-512 Exponential and Reciprocal Instructions.
    avx512f                       - Enable AVX-512 instructions.
    avx512ifma                    - Enable AVX-512 Integer Fused Multiple-Add.
    avx512pf                      - Enable AVX-512 PreFetch Instructions.
    avx512vbmi                    - Enable AVX-512 Vector Byte Manipulation Instructions.
    avx512vbmi2                   - Enable AVX-512 further Vector Byte Manipulation Instructions.
    avx512vl                      - Enable AVX-512 Vector Length eXtensions.
    avx512vnni                    - Enable AVX-512 Vector Neural Network Instructions.
    avx512vp2intersect            - Enable AVX-512 vp2intersect.
    avx512vpopcntdq               - Enable AVX-512 Population Count Instructions.
    bmi                           - Support BMI instructions.
    bmi2                          - Support BMI2 instructions.
    branchfusion                  - CMP/TEST can be fused with conditional branches.
    cldemote                      - Enable Cache Demote.
    clflushopt                    - Flush A Cache Line Optimized.
    clwb                          - Cache Line Write Back.
    clzero                        - Enable Cache Line Zero.
    cmov                          - Enable conditional move instructions.
    cx16                          - 64-bit with cmpxchg16b.
    cx8                           - Support CMPXCHG8B instructions.
    enqcmd                        - Has ENQCMD instructions.
    ermsb                         - REP MOVS/STOS are fast.
    f16c                          - Support 16-bit floating point conversion instructions.
    false-deps-lzcnt-tzcnt        - LZCNT/TZCNT have a false dependency on dest register.
    false-deps-popcnt             - POPCNT has a false dependency on dest register.
    fast-11bytenop                - Target can quickly decode up to 11 byte NOPs.
    fast-15bytenop                - Target can quickly decode up to 15 byte NOPs.
    fast-bextr                    - Indicates that the BEXTR instruction is implemented as a single uop with good throughput.
    fast-gather                   - Indicates if gather is reasonably fast.
    fast-hops                     - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
    fast-lzcnt                    - LZCNT instructions are as fast as most simple integer ops.
    fast-partial-ymm-or-zmm-write - Partial writes to YMM/ZMM registers are fast.
    fast-scalar-fsqrt             - Scalar SQRT is fast (disable Newton-Raphson).
    fast-scalar-shift-masks       - Prefer a left/right scalar logical shift pair over a shift+and pair.
    fast-shld-rotate              - SHLD can be used as a faster rotate.
    fast-variable-shuffle         - Shuffles with variable masks are fast.
    fast-vector-fsqrt             - Vector SQRT is fast (disable Newton-Raphson).
    fast-vector-shift-masks       - Prefer a left/right vector logical shift pair over a shift+and pair.
    fma                           - Enable three-operand fused multiple-add.
    fma4                          - Enable four-operand fused multiple-add.
    fsgsbase                      - Support FS/GS Base instructions.
    fxsr                          - Support fxsave/fxrestore instructions.
    gfni                          - Enable Galois Field Arithmetic Instructions.
    idivl-to-divb                 - Use 8-bit divide for positive values less than 256.
    idivq-to-divl                 - Use 32-bit divide for positive values less than 2^32.
    invpcid                       - Invalidate Process-Context Identifier.
    lea-sp                        - Use LEA for adjusting the stack pointer.
    lea-uses-ag                   - LEA instruction needs inputs at AG stage.
    lwp                           - Enable LWP instructions.
    lzcnt                         - Support LZCNT instruction.
    macrofusion                   - Various instructions can be fused with conditional branches.
    merge-to-threeway-branch      - Merge branches to a three-way conditional branch.
    mmx                           - Enable MMX instructions.
    movbe                         - Support MOVBE instruction.
    movdir64b                     - Support movdir64b instruction.
    movdiri                       - Support movdiri instruction.
    mpx                           - Support MPX instructions.
    mwaitx                        - Enable MONITORX/MWAITX timer functionality.
    nopl                          - Enable NOPL instruction.
    pad-short-functions           - Pad short functions.
    pclmul                        - Enable packed carry-less multiplication instructions.
    pconfig                       - platform configuration instruction.
    pku                           - Enable protection keys.
    popcnt                        - Support POPCNT instruction.
    prefer-256-bit                - Prefer 256-bit AVX instructions.
    prefetchwt1                   - Prefetch with Intent to Write and T1 Hint.
    prfchw                        - Support PRFCHW instructions.
    ptwrite                       - Support ptwrite instruction.
    rdpid                         - Support RDPID instructions.
    rdrnd                         - Support RDRAND instruction.
    rdseed                        - Support RDSEED instruction.
    retpoline                     - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct.
    retpoline-external-thunk      - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature.
    retpoline-indirect-branches   - Remove speculation of indirect branches from the generated code.
    retpoline-indirect-calls      - Remove speculation of indirect calls from the generated code.
    rtm                           - Support RTM instructions.
    sahf                          - Support LAHF and SAHF instructions.
    sgx                           - Enable Software Guard Extensions.
    sha                           - Enable SHA instructions.
    shstk                         - Support CET Shadow-Stack instructions.
    slow-3ops-lea                 - LEA instruction with 3 ops or certain registers is slow.
    slow-incdec                   - INC and DEC instructions are slower than ADD and SUB.
    slow-lea                      - LEA instruction with certain arguments is slow.
    slow-pmaddwd                  - PMADDWD is slower than PMULLD.
    slow-pmulld                   - PMULLD instruction is slow.
    slow-shld                     - SHLD instruction is slow.
    slow-two-mem-ops              - Two memory operand instructions are slow.
    slow-unaligned-mem-16         - Slow unaligned 16-byte memory access.
    slow-unaligned-mem-32         - Slow unaligned 32-byte memory access.
    soft-float                    - Use software floating point features.
    sse                           - Enable SSE instructions.
    sse-unaligned-mem             - Allow unaligned memory operands with SSE instructions.
    sse2                          - Enable SSE2 instructions.
    sse3                          - Enable SSE3 instructions.
    sse4.1                        - Enable SSE 4.1 instructions.
    sse4.2                        - Enable SSE 4.2 instructions.
    sse4a                         - Support SSE 4a instructions.
    ssse3                         - Enable SSSE3 instructions.
    tbm                           - Enable TBM instructions.
    vaes                          - Promote selected AES instructions to AVX512/AVX registers.
    vpclmulqdq                    - Enable vpclmulqdq instructions.
    waitpkg                       - Wait and pause enhancements.
    wbnoinvd                      - Write Back No Invalidate.
    x87                           - Enable X87 float instructions.
    xop                           - Enable XOP instructions.
    xsave                         - Support xsave instructions.
    xsavec                        - Support xsavec instructions.
    xsaveopt                      - Support xsaveopt instructions.
    xsaves                        - Support xsaves instructions.

Moreover, I have been using core::arch to write SIMD programs recently and it works just fine (no SIGILL, and I get the expected speedup).

Therefore, I think there is something wrong with the current settings?
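For reference, the pattern described above (using `core::arch` intrinsics without hitting SIGILL) usually relies on guarding the SIMD path with a runtime feature check. The following is a minimal sketch of that pattern, not code from simd-json: the function names `count_byte`, `count_byte_avx2`, and `count_byte_scalar` are hypothetical, and it assumes an x86_64 target for the AVX2 path (other targets use the scalar fallback).

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*; // re-export of core::arch::x86_64

/// Scalar fallback (hypothetical helper): counts occurrences of `needle`.
fn count_byte_scalar(data: &[u8], needle: u8) -> usize {
    data.iter().filter(|&&b| b == needle).count()
}

/// AVX2 path (hypothetical helper): compares 32 bytes at a time.
/// Must only be called after AVX2 support has been verified.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(data: &[u8], needle: u8) -> usize {
    let needle_v = _mm256_set1_epi8(needle as i8);
    let mut count = 0usize;
    let mut chunks = data.chunks_exact(32);
    for chunk in &mut chunks {
        // Unaligned load is fine for arbitrary slices; the chunk is exactly 32 bytes.
        let v = unsafe { _mm256_loadu_si256(chunk.as_ptr() as *const __m256i) };
        let eq = _mm256_cmpeq_epi8(v, needle_v);
        // One bit per matching byte.
        count += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
    }
    // Handle the tail (fewer than 32 bytes) with the scalar code.
    count + count_byte_scalar(chunks.remainder(), needle)
}

/// Picks the AVX2 path at runtime when available, otherwise the fallback.
fn count_byte(data: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { count_byte_avx2(data, needle) };
        }
    }
    count_byte_scalar(data, needle)
}
```

Because the AVX2 function is only reached after `is_x86_feature_detected!("avx2")` succeeds, the binary should not execute unsupported instructions even when it is built for a generic CPU.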

Clarify README regarding jemalloc

The text says "... make sure to use jemalloc and not the system allocator (which has now become default in rust)". But it's not clear to me what "which" refers to. Is it jemalloc or the system allocator? I believe it only makes sense for it to be about jemalloc, so I propose moving the parenthetical remark to just after the name "jemalloc".

Improved number handling

It would be nice to improve number handling. Right now we only have I64 and F64, which ignores U64 and extended types such as i128 and u128.

Defaulting to the 128-bit types would probably decrease performance, which we want to avoid. We could, however, add them as additional types, but that would explode the number of types we have (although ValueTrait mitigates that).

We definitely want a u64 type along with the i64 one to be able to represent those numbers.
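To make the proposal concrete, here is a rough sketch of what a number representation with a separate u64 variant could look like. It is an illustration only: the `Number` enum and `classify_number` function are hypothetical names, not simd-json's actual types, and the parsing uses the standard library's `str::parse`, which does not follow the JSON number grammar exactly.

```rust
/// Hypothetical number type with a dedicated unsigned variant.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Number {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Tries the narrowest fitting representation first:
/// i64, then u64 (for values above i64::MAX), then f64.
fn classify_number(text: &str) -> Option<Number> {
    if let Ok(v) = text.parse::<i64>() {
        return Some(Number::I64(v));
    }
    if let Ok(v) = text.parse::<u64>() {
        return Some(Number::U64(v));
    }
    text.parse::<f64>().ok().map(Number::F64)
}
```

With this shape, values that fit in i64 keep their current representation, and only values between `i64::MAX + 1` and `u64::MAX` land in the new variant, so existing code matching on I64 and F64 would see no change for those inputs.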
