bitintr's People

Contributors

bors[bot], gnzlbg, mortenlohne

bitintr's Issues

counts/offsets shouldn't reuse the `Self` type from the trait

popcnt, tzcnt, bzhi, bextr, and a few others should probably take/return a u32 (or similar) when they talk about bit positions or counts, to match the register types of the instructions, rather than reusing the Self type of the key argument. Otherwise the calling convention of these combinators is rather different from that of the underlying assembly opcodes, and any desugaring into those instructions is rather unlikely.

As a reference from the standard library, consider that .count_ones() returns a u32 even for a u64 argument.
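A minimal sketch of the suggested shape, assuming a hypothetical Tzcnt trait (this is not bitintr's current API; a portable body is shown only for illustration):

pub trait Tzcnt {
    // The result is a bit count, so it fits in a u32 regardless of the
    // width of Self, mirroring u64::count_ones() -> u32.
    fn tzcnt(self) -> u32;
}

impl Tzcnt for u64 {
    fn tzcnt(self) -> u32 {
        self.trailing_zeros()
    }
}

With that signature, x.tzcnt() composes directly with code that expects a shift amount or index as u32, the same way count_ones() does.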

Update with stabilized `core::arch` modules

Hi!

First of all, thanks for this library: it's been super easy to add BMI2 support to a decompression algorithm I needed, and it made the code 3x faster on x86 without having to worry about the target arch in my own code.

That being said, I noticed that the use of core::arch is feature-gated behind an unstable config flag, even though the core::arch::x86 and core::arch::x86_64 modules have been stabilized.

Removing that gate would make it possible to support the x86 and x86_64 intrinsics on stable builds, whereas currently the crate falls back to the default software implementation.
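A rough sketch of what the stable path could look like once the gate is removed (illustrative only, not the crate's actual source; the wrapper name and cfg choices here are assumptions):

// core::arch::x86_64 is stable, so no nightly feature flag is needed.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn pdep_u64(value: u64, mask: u64) -> u64 {
    // Safe to call: the cfg above guarantees BMI2 at compile time.
    unsafe { core::arch::x86_64::_pdep_u64(value, mask) }
}

Runtime feature detection (is_x86_feature_detected!) needs std, so a no_std build would either require the target feature at compile time as above or fall back to the software implementation.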

Unable to compile natively with `target-cpu=native` on `aarch64` (AWS `c6g`)

Hi!

According to the commit history and the library's CI, aarch64 should be fully supported. Yet:

~$ uname -a
Linux ip-172-31-25-117 5.4.0-1018-aws #18-Ubuntu SMP Wed Jun 24 01:14:47 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
~$ rustc --version
rustc 1.46.0-nightly (f844ea1e5 2020-07-03)
~$ RUSTFLAGS="-C target-cpu=native -g" make ...

...

   Compiling bitintr v0.3.0
error[E0658]: use of unstable library feature 'stdsimd'
  --> {{redacted}}/.cargo/registry/src/github.com-1ecc6299db9ec823/bitintr-0.3.0/src/lib.rs:28:13
   |
28 |     pub use core::arch::aarch64::*;
   |             ^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #27731 <https://github.com/rust-lang/rust/issues/27731> for more information

error: aborting due to previous error

What am I missing / doing wrong?

Benchmark and improve alg::arm::v7::rbit

  • Benchmark faster 8-bit versions on x86.
  • Benchmark faster 16-bit versions on x86.
  • The 32-bit version generates the rbit instruction on armv7.
  • The 64-bit version fails to generate rbit on armv8 (blocked on rust-lang#39410).

Note: llvm.bitreverse generates pretty bad code in LLVM 3.9, but most of that is fixed in LLVM 4.0.
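For reference, a branch-free 32-bit software bit reversal of the kind such a fallback typically uses (an illustration only, not necessarily the crate's alg::arm::v7::rbit code):

fn rbit_u32(mut x: u32) -> u32 {
    x = ((x & 0x5555_5555) << 1) | ((x >> 1) & 0x5555_5555); // swap adjacent bits
    x = ((x & 0x3333_3333) << 2) | ((x >> 2) & 0x3333_3333); // swap bit pairs
    x = ((x & 0x0F0F_0F0F) << 4) | ((x >> 4) & 0x0F0F_0F0F); // swap nibbles within bytes
    x.swap_bytes() // reverse the byte order
}

On current Rust the same operation is available as u32::reverse_bits() / u64::reverse_bits(), which lowers to llvm.bitreverse and hence can produce rbit where the target supports it.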

pdep failure

Great library, thanks for making it!

I am writing succinct data structure code using pdep to implement select on 64-bit words:

use bitintr::Pdep; // brings the pdep() method into scope

pub fn select64(x: u64, idx: usize) -> u64 {
    // Deposit a single bit into the idx-th set position of x, then locate it.
    (1u64 << idx).pdep(x).trailing_zeros() as u64
}

However, the following test is failing:

#[test]
fn test_select64_2() {
    let x: u64 = 18446744073709551615u64; // all 64 bits set
    assert_eq!(select64(x, 0), 0);
}

Note that x is 111....111 (i.e. -1).

The error message I'm seeing is:

thread 'words::tests::test_select64_2' panicked at 'attempt to add with overflow', /<path-to-my-home-directory>/.cargo/registry/src/github.com-1ecc6299db9ec823/bitintr-0.3.0/src/pdep.rs:107:17

I'm willing to be convinced that my code is somehow bad, but I think this might be a bug in the library. I'm on an X86_64 Mac.
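For context on where such a panic can come from: a common software PDEP loop doubles a selector bit once per mask bit (bb += bb), and with an all-ones mask that doubling runs one last time when the selector is already at bit 63, which is exactly an "attempt to add with overflow" under debug assertions. A sketch of a fallback written with wrapping arithmetic to sidestep that (an illustration only, not the actual code at pdep.rs:107):

fn pdep_u64_fallback(value: u64, mut mask: u64) -> u64 {
    // Deposit the low bits of `value` into the set bit positions of `mask`.
    let mut result = 0u64;
    let mut bb = 1u64; // selects the next bit of `value`
    while mask != 0 {
        if value & bb != 0 {
            result |= mask & mask.wrapping_neg(); // lowest set bit of mask
        }
        mask &= mask - 1;         // clear the lowest set bit
        bb = bb.wrapping_add(bb); // wrapping: no panic after bit 63
    }
    result
}

With that, (1u64 << 0) deposited into an all-ones mask is 1, and select64(u64::MAX, 0) returns 0 as the test expects.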

Use faster PEXT/PDEP implementation on older/non-Intel CPUs

The ZP7 implementation by Zach Wegner (https://github.com/zwegner/zp7) claims to be faster than the built-in instruction on some AMD architectures for most input masks. According to this Twitter thread, the performance on some AMD CPUs is input-dependent and much worse than the 1-cycle throughput on Intel.

The code is branch-free and probably also faster than the naive loop currently used in bitintr. It uses CLMUL if available.

If I find the time, I will do a Rust implementation and benchmark it.
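For reference, the kind of naive per-bit PEXT loop being compared against here (an illustration of the general technique, not bitintr's exact fallback):

fn pext_u64_naive(value: u64, mut mask: u64) -> u64 {
    // Gather the bits of `value` at the set positions of `mask` and pack
    // them into the low bits of the result, one mask bit per iteration.
    let mut result = 0u64;
    let mut out = 1u64; // next output bit
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // lowest set bit of mask
        if value & lowest != 0 {
            result |= out;
        }
        mask ^= lowest;
        out = out.wrapping_add(out);
    }
    result
}

The loop's cost scales with the number of set mask bits, whereas the branch-free approach does a fixed amount of work regardless of the mask, which is where the expected speedup on CPUs with a slow hardware PDEP/PEXT comes from.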

BZHI panic at wrong bit_position

As mentioned in rust-lang/stdarch#932, BZHI panics at the wrong bit_position. According to the docs, it will panic when:

If bit_position >= bit_size() and -C debug-assertions=1.

However, the behavior is actually defined for all bit_position <= 0xFF, because in the instruction the INDEX (i.e. bit_position) value is the low 8 bits of the second source, and "The INDEX value is saturated at the value of OperandSize -1", which means that for INDEX >= OperandSize no bits are cleared in the destination. This is confirmed by the operation pseudocode:

N ← SRC2[7:0]
DEST ← SRC1
IF (N < OperandSize)
    DEST[OperandSize-1:N] ← 0
FI
IF (N > OperandSize - 1)
    CF ← 1
ELSE
    CF ← 0
FI

As you can see, if N >= OperandSize no bits are cleared in the destination, and there are no undefined states except for the AF and PF flags. The Chromium test suite also actually tests such large N values, e.g. 64 or 257. The bit_position == bit_size() case is actually very useful for creating a mask with the N least significant bits set, for any N in the [0, 64] range.

So I think the behavior should be changed to panic if bit_position > 0xFF, or if bit_position > bit_size(), so that it won't panic for the bit_position == bit_size() case.
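A small illustration of that mask use case, assuming the relaxed semantics proposed above (portable code shown for clarity; with the change, something like u64::MAX.bzhi(n) would compute the same value for any n in 0..=64 instead of panicking at n == 64):

// Mask with the n least significant bits set, defined for all n in 0..=64.
// This matches the saturating hardware behavior of BZHI on an all-ones source.
fn low_mask(n: u32) -> u64 {
    if n >= 64 { u64::MAX } else { (1u64 << n) - 1 }
}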

+stable build fails on M1 Mac

Basically the same problem as #12.

$ cargo +stable build
   Compiling bitintr v0.3.0 (/Users/redacted/Develop/rust-workspace/bitintr)
error[E0658]: use of unstable library feature 'stdsimd'
  --> src/lib.rs:28:13
   |
28 |     pub use core::arch::aarch64::*;
   |             ^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #27731 <https://github.com/rust-lang/rust/issues/27731> for more information

Does this crate (intentionally) require +nightly on the aarch64 platform?

$ rustc +stable --version
rustc 1.55.0 (c8dfcfe04 2021-09-06)
