bitintr's People

Contributors

bors[bot], gnzlbg, mortenlohne

bitintr's Issues

counts/offsets shouldn't reuse the `Self` type from the trait

popcnt, tzcnt, bzhi, bextr, and a few others should probably take/return a u32 (or similar) when they talk about bit positions or counts, to match the register types of the instructions, rather than reusing the Self type of the key argument. Otherwise the calling convention of these combinators is rather different from that of the underlying assembly opcodes, and any desugaring into those instructions is rather unlikely.

As a reference from the standard library, consider that .count_ones() returns a u32 even for a u64 argument.
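A minimal sketch of the suggested shape, assuming a hypothetical Tzcnt trait (this is not bitintr's current API; a portable body is shown only for illustration):

pub trait Tzcnt {
    // The result is a bit count, so it fits in a u32 regardless of the
    // width of Self, mirroring u64::count_ones() -> u32.
    fn tzcnt(self) -> u32;
}

impl Tzcnt for u64 {
    fn tzcnt(self) -> u32 {
        self.trailing_zeros()
    }
}

With that signature, x.tzcnt() composes directly with code that expects a shift amount or index as u32, the same way count_ones() does.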

Update with stabilized `core::arch` modules

Hi!

First of all, thanks for this library: it's been super easy to add BMI2 support to a decompression algorithm I needed, and it made the code 3x faster on x86 without having to worry about the target arch in my own code.

That being said, I noticed that the use of core::arch is feature-gated behind an unstable config flag, even though the core::arch::x86 and core::arch::x86_64 modules have been stabilized.

Removing that gate would make it possible to support the x86 and x86_64 intrinsics on stable builds, whereas currently the crate falls back to the default software implementation.
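A rough sketch of what the stable path could look like once the gate is removed (illustrative only, not the crate's actual source; the wrapper name and cfg choices here are assumptions):

// core::arch::x86_64 is stable, so no nightly feature flag is needed.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn pdep_u64(value: u64, mask: u64) -> u64 {
    // Safe to call: the cfg above guarantees BMI2 at compile time.
    unsafe { core::arch::x86_64::_pdep_u64(value, mask) }
}

Runtime feature detection (is_x86_feature_detected!) needs std, so a no_std build would either require the target feature at compile time as above or fall back to the software implementation.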

Unable to compile natively with `target-cpu=native` on `aarch64` (AWS `c6g`)

Hi!

According to the commit history and the library's CI, aarch64 should be fully supported. Yet:

~$ uname -a
Linux ip-172-31-25-117 5.4.0-1018-aws #18-Ubuntu SMP Wed Jun 24 01:14:47 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
~$ rustc --version
rustc 1.46.0-nightly (f844ea1e5 2020-07-03)
~$ RUSTFLAGS="-C target-cpu=native -g" make ...

...

   Compiling bitintr v0.3.0
error[E0658]: use of unstable library feature 'stdsimd'
  --> {{redacted}}/.cargo/registry/src/github.com-1ecc6299db9ec823/bitintr-0.3.0/src/lib.rs:28:13
   |
28 |     pub use core::arch::aarch64::*;
   |             ^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #27731 <https://github.com/rust-lang/rust/issues/27731> for more information

error: aborting due to previous error

What am I missing / doing wrong?

Benchmark and improve alg::arm::v7::rbit

  • Benchmark faster 8-bit versions on x86.
  • Benchmark faster 16-bit versions on x86.
  • The 32-bit version generates the rbit instruction on armv7.
  • The 64-bit version fails to generate rbit on armv8 (blocked on rust-lang#39410).

Note: llvm.bitreverse generates pretty bad code in LLVM 3.9, but most of that is fixed in LLVM 4.0.
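For reference, a branch-free 32-bit software bit reversal of the kind such a fallback typically uses (an illustration only, not necessarily the crate's alg::arm::v7::rbit code):

fn rbit_u32(mut x: u32) -> u32 {
    x = ((x & 0x5555_5555) << 1) | ((x >> 1) & 0x5555_5555); // swap adjacent bits
    x = ((x & 0x3333_3333) << 2) | ((x >> 2) & 0x3333_3333); // swap bit pairs
    x = ((x & 0x0F0F_0F0F) << 4) | ((x >> 4) & 0x0F0F_0F0F); // swap nibbles within bytes
    x.swap_bytes() // reverse the byte order
}

On current Rust the same operation is available as u32::reverse_bits() / u64::reverse_bits(), which lowers to llvm.bitreverse and hence can produce rbit where the target supports it.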

pdep failure

Great library, thanks for making it!

I am writing succinct data structure code using pdep to implement select on 64-bit words:

use bitintr::Pdep; // brings the pdep() method into scope

pub fn select64(x: u64, idx: usize) -> u64 {
    // Deposit a single bit into the idx-th set position of x, then locate it.
    (1u64 << idx).pdep(x).trailing_zeros() as u64
}

However, the following test is failing:

#[test]
fn test_select64_2() {
    let x: u64 = 18446744073709551615u64; // all 64 bits set
    assert_eq!(select64(x, 0), 0);
}

Note that x is 111....111 (i.e. -1).

The error message I'm seeing is:

thread 'words::tests::test_select64_2' panicked at 'attempt to add with overflow', /<path-to-my-home-directory>/.cargo/registry/src/github.com-1ecc6299db9ec823/bitintr-0.3.0/src/pdep.rs:107:17

I'm willing to be convinced that my code is somehow bad, but I think this might be a bug in the library. I'm on an X86_64 Mac.
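For context on where such a panic can come from: a common software PDEP loop doubles a selector bit once per mask bit (bb += bb), and with an all-ones mask that doubling runs one last time when the selector is already at bit 63, which is exactly an "attempt to add with overflow" under debug assertions. A sketch of a fallback written with wrapping arithmetic to sidestep that (an illustration only, not the actual code at pdep.rs:107):

fn pdep_u64_fallback(value: u64, mut mask: u64) -> u64 {
    // Deposit the low bits of `value` into the set bit positions of `mask`.
    let mut result = 0u64;
    let mut bb = 1u64; // selects the next bit of `value`
    while mask != 0 {
        if value & bb != 0 {
            result |= mask & mask.wrapping_neg(); // lowest set bit of mask
        }
        mask &= mask - 1;         // clear the lowest set bit
        bb = bb.wrapping_add(bb); // wrapping: no panic after bit 63
    }
    result
}

With that, (1u64 << 0) deposited into an all-ones mask is 1, and select64(u64::MAX, 0) returns 0 as the test expects.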

Use faster PEXT/PDEP implementation on older/non-Intel CPUs

The ZP7 implementation by Zach Wegner (https://github.com/zwegner/zp7) claims to be faster than the built-in instruction on some AMD architectures for most input masks. According to this Twitter thread, the performance on some AMD CPUs is input-dependent and much worse than the 1-cycle throughput on Intel.

The code is branch-free and probably also faster than the naive loop currently used in bitintr. It uses CLMUL if available.

If I find the time, I will do a Rust implementation and benchmark it.
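For reference, the kind of naive per-bit PEXT loop being compared against here (an illustration of the general technique, not bitintr's exact fallback):

fn pext_u64_naive(value: u64, mut mask: u64) -> u64 {
    // Gather the bits of `value` at the set positions of `mask` and pack
    // them into the low bits of the result, one mask bit per iteration.
    let mut result = 0u64;
    let mut out = 1u64; // next output bit
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // lowest set bit of mask
        if value & lowest != 0 {
            result |= out;
        }
        mask ^= lowest;
        out = out.wrapping_add(out);
    }
    result
}

The loop's cost scales with the number of set mask bits, whereas the branch-free approach does a fixed amount of work regardless of the mask, which is where the expected speedup on CPUs with a slow hardware PDEP/PEXT comes from.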

BZHI panic at wrong bit_position

As mentioned in rust-lang/stdarch#932, BZHI panics at the wrong bit_position. According to the docs, it will panic when:

If bit_position >= bit_size() and -C debug-assertions=1.

However, the behavior is actually defined for all bit_position <= 0xFF, because in the instruction the INDEX (i.e. bit_position) value is the low 8 bits of the second source, and "The INDEX value is saturated at the value of OperandSize -1", which means that for INDEX >= OperandSize no bits are cleared in the destination. This is confirmed by the operation pseudocode:

N ← SRC2[7:0]
DEST ← SRC1
IF (N < OperandSize)
    DEST[OperandSize-1:N] ← 0
FI
IF (N > OperandSize - 1)
    CF ← 1
ELSE
    CF ← 0
FI

As you can see, if N >= OperandSize no bits are cleared in the destination, and there are no undefined states except for the AF and PF flags. The Chromium test suite also actually tests such large N values, e.g. 64 or 257. The bit_position == bit_size() case is actually very useful for creating a mask with the N least significant bits set, for any N in the [0, 64] range.

So I think the behavior should be changed to panic if bit_position > 0xFF, or if bit_position > bit_size(), so that it won't panic for the bit_position == bit_size() case.
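A small illustration of that mask use case, assuming the relaxed semantics proposed above (portable code shown for clarity; with the change, something like u64::MAX.bzhi(n) would compute the same value for any n in 0..=64 instead of panicking at n == 64):

// Mask with the n least significant bits set, defined for all n in 0..=64.
// This matches the saturating hardware behavior of BZHI on an all-ones source.
fn low_mask(n: u32) -> u64 {
    if n >= 64 { u64::MAX } else { (1u64 << n) - 1 }
}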

+stable build fails on M1 Mac

Basically the same problem as #12.

$ cargo +stable build
   Compiling bitintr v0.3.0 (/Users/redacted/Develop/rust-workspace/bitintr)
error[E0658]: use of unstable library feature 'stdsimd'
  --> src/lib.rs:28:13
   |
28 |     pub use core::arch::aarch64::*;
   |             ^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #27731 <https://github.com/rust-lang/rust/issues/27731> for more information

Does this crate (intentionally) require +nightly on the aarch64 platform?

$ rustc +stable --version
rustc 1.55.0 (c8dfcfe04 2021-09-06)
