errantmind commented on May 13, 2024

I've also been experimenting with getting inlining to work across the FFI boundary, and succeeded using Rust's linker-plugin-lto, clang-12, and lld-12. This improved the pico benchmarks a little more and put both of them in the lead: the full pico benchmark hits ~2900 MB/s vs. 1751 MB/s for httparse on my ancient laptop.
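For context, the function being inlined here is an ordinary C function reached through Rust FFI bindings, and what crosses the boundary is mostly pointers and lengths into the request buffer. Below is a minimal sketch of the Rust side. It assumes picohttpparser's struct phr_header layout (name pointer, name length, value pointer, value length). PhrHeader, bytes and name_value are hypothetical names, not the bindings crate's actual API. The extern declaration of phr_parse_request itself is left out so the example runs without linking the C library.

```rust
use std::os::raw::c_char;
use std::{ptr, slice};

/// Rust mirror of picohttpparser's `struct phr_header`
/// (assumed layout: name pointer, name length, value pointer, value length).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct PhrHeader {
    pub name: *const c_char,
    pub name_len: usize,
    pub value: *const c_char,
    pub value_len: usize,
}

/// View `len` bytes at `p` as a slice. A zero length yields an empty slice,
/// so the null pointers in unused header slots are never dereferenced.
unsafe fn bytes<'a>(p: *const c_char, len: usize) -> &'a [u8] {
    if len == 0 {
        &[]
    } else {
        slice::from_raw_parts(p as *const u8, len)
    }
}

impl PhrHeader {
    /// An unused slot, as pre-allocated before handing the array to the parser.
    pub fn empty() -> Self {
        PhrHeader { name: ptr::null(), name_len: 0, value: ptr::null(), value_len: 0 }
    }

    /// Hypothetical helper: borrow the header name and value as byte slices.
    ///
    /// # Safety
    /// The pointers must still point into a live buffer of at least
    /// `name_len` / `value_len` bytes (i.e. the request buffer that was parsed).
    pub unsafe fn name_value<'a>(&self) -> (&'a [u8], &'a [u8]) {
        (bytes(self.name, self.name_len), bytes(self.value, self.value_len))
    }
}

fn main() {
    let buf: &[u8] = b"Host: example.com\r\n";
    // Pre-allocated header array, as a caller of the C parser would pass in.
    let mut headers = [PhrHeader::empty(); 16];
    // Fill slot 0 by hand the way the C parser would: pointers into `buf`.
    headers[0] = PhrHeader {
        name: buf.as_ptr() as *const c_char,
        name_len: 4,
        value: unsafe { buf.as_ptr().add(6) as *const c_char },
        value_len: 11,
    };
    let (name, value) = unsafe { headers[0].name_value() };
    println!("{} = {}", String::from_utf8_lossy(name), String::from_utf8_lossy(value));
}
```

Because the parser only writes pointers and lengths, the Rust side never copies header bytes; with cross-language LTO the call that fills these slots can be inlined like any other function.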

from httparse.

seanmonstar commented on May 13, 2024

Ah, yeah, good point. Originally httparse didn't have SIMD support either, so the two were more similar.

errantmind commented on May 13, 2024

I haven't looked all that far into it, but I'm interested in your thoughts on why Pico is faster. Is it doing some memory-management tricks or something? I'm working on a pet project and am trying to figure out whether I should just write it in C, or whether there is a way to get comparable results with unsafe Rust.

seanmonstar commented on May 13, 2024

How do you run the Rust benchmarks? Do you set the target CPU so it doesn't have to do runtime checks? https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html
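The distinction in this question can be sketched as follows. With sse4.2 enabled at compile time (e.g. via -Ctarget-cpu=native on a CPU that has it), cfg!(target_feature = "sse4.2") is a constant and the compiler picks the SIMD path. Otherwise the code has to probe the CPU at runtime, for example with std's is_x86_feature_detected! macro, before choosing a path. This is a minimal sketch with hypothetical names (SimdPath, cpu_has_sse42, select_path); it is not httparse's actual dispatch code.

```rust
/// Which SSE4.2 path a parser would take (hypothetical names).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SimdPath {
    /// Feature enabled at compile time: no runtime check needed.
    CompileTime,
    /// Feature detected on the running CPU at runtime.
    Runtime,
    /// No SSE4.2: fall back to a scalar loop.
    Scalar,
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_sse42() -> bool {
    // std caches the CPUID result, but each call still costs a load and a branch.
    is_x86_feature_detected!("sse4.2")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_sse42() -> bool {
    false // SSE4.2 only exists on x86/x86_64
}

pub fn select_path() -> SimdPath {
    if cfg!(target_feature = "sse4.2") {
        // e.g. built with -Ctarget-cpu=native on an SSE4.2 machine; this
        // condition is a compile-time constant, so the other branches are dead code.
        SimdPath::CompileTime
    } else if cpu_has_sse42() {
        SimdPath::Runtime
    } else {
        SimdPath::Scalar
    }
}

fn main() {
    println!("SSE4.2 path: {:?}", select_path());
}
```

Running the same program with and without RUSTFLAGS="-Ctarget-cpu=native" on an SSE4.2 machine should switch the output between Runtime and CompileTime.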

errantmind commented on May 13, 2024

I set these flags globally in my config.toml:

rustflags=["-Ctarget-cpu=native","-Ctarget-feature=+sse4.2"]
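In a Cargo config file, rustflags has to sit under a [build] or [target.<triple>] table to take effect. One way to confirm the flags actually reached the compiler is a check in a build script: Cargo exposes the enabled target features to build scripts through the CARGO_CFG_TARGET_FEATURE environment variable, as a comma-separated list. This is a minimal sketch; has_target_feature is a hypothetical helper name.

```rust
use std::env;

/// Returns true if `feature` appears in a comma-separated feature list,
/// such as the value of CARGO_CFG_TARGET_FEATURE (e.g. "fxsr,sse,sse2,sse4.2").
fn has_target_feature(list: &str, feature: &str) -> bool {
    list.split(',').any(|f| f.trim() == feature)
}

// In a real build.rs this is the build script's main; Cargo only sets
// CARGO_CFG_TARGET_FEATURE while running build scripts.
fn main() {
    match env::var("CARGO_CFG_TARGET_FEATURE") {
        Ok(list) => {
            if !has_target_feature(&list, "sse4.2") {
                // Lines starting with cargo:warning= are shown to the user by Cargo.
                println!("cargo:warning=sse4.2 is not enabled; check rustflags");
            }
        }
        Err(_) => println!("CARGO_CFG_TARGET_FEATURE not set (not running as a build script)"),
    }
}
```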

errantmind commented on May 13, 2024

I'm going to dump some info here for reproducibility.

The speed improvements came primarily from two areas, both of which involved modifying the underlying Pico bindings crate:

  1. Modify the underlying crate to compile with SSE4.2 and LTO (add -msse4 and -flto=thin to the cc compile command).
  2. Check the LLVM version and current host using rustc --version --verbose. The host information is needed later.
  3. Install the LLVM version used by rustc; the current nightly uses LLVM 11. On Ubuntu 20.04 you can install the needed binaries with sudo apt-get install clang-11 lld-11
  4. Make clang the C compiler used by the cc crate with export CC=/usr/bin/clang-11 (adjust the path for your distribution).
  5. Set the appropriate rustflags in ~/.cargo/config.toml, using the host information from above. For me this is:
[target.x86_64-unknown-linux-gnu]
rustflags = [
   "-Ctarget-cpu=native",
   "-Clink-arg=-fuse-ld=lld",
   "-Clinker=clang-11",
]
  6. Clean up the benchmark project if needed with cargo clean && rm Cargo.lock
  7. Run cargo bench in the benchmark crate.
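Steps 2 and 3 depend on two lines of the rustc --version --verbose output: host: and LLVM version:. The sketch below extracts both, e.g. to pick the matching clang-N/lld-N packages and the [target.<triple>] section name. RustcInfo and parse_rustc_vv are hypothetical names, and the apt-get line it prints just follows the Ubuntu package naming used in step 3.

```rust
use std::process::Command;

/// The two fields of `rustc --version --verbose` needed for the setup steps.
#[derive(Debug, PartialEq)]
struct RustcInfo {
    host: String,            // e.g. "x86_64-unknown-linux-gnu"
    llvm_major: Option<u32>, // e.g. Some(11) for "LLVM version: 11.0.1"
}

fn parse_rustc_vv(output: &str) -> Option<RustcInfo> {
    let mut host = None;
    let mut llvm_major = None;
    for line in output.lines() {
        if let Some(rest) = line.strip_prefix("host: ") {
            host = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("LLVM version: ") {
            // Keep only the major version, which selects clang-N / lld-N.
            llvm_major = rest.trim().split('.').next().and_then(|m| m.parse().ok());
        }
    }
    // Without a host line the output is not what we expected.
    host.map(|host| RustcInfo { host, llvm_major })
}

fn main() {
    match Command::new("rustc").args(&["--version", "--verbose"]).output() {
        Ok(o) => match parse_rustc_vv(&String::from_utf8_lossy(&o.stdout)) {
            Some(info) => {
                println!("host triple: {}", info.host);
                if let Some(v) = info.llvm_major {
                    println!("suggested: sudo apt-get install clang-{} lld-{}", v, v);
                }
            }
            None => println!("could not parse rustc output"),
        },
        Err(e) => println!("failed to run rustc: {}", e),
    }
}
```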

Full cc invocation from the Pico bindings crate:

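// build.rs of the Pico bindings crate
fn main() {
    cc::Build::new()
        .file("extern/picohttpparser/picohttpparser.c")
        .opt_level_str("fast") // passed to the compiler as -Ofast
        .flag("-funroll-loops")
        .flag("-msse4") // SSE4.1 + SSE4.2, so picohttpparser's SIMD path is compiled in
        .flag("-flto=thin") // emit LLVM bitcode for cross-language LTO with rustc
        .flag("-march=native")
        .compile("libpicohttpparser.a");
}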
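One caveat with these flags: -flto=thin makes the C side emit LLVM bitcode, and rustc's linker-plugin-lto can only merge that with Rust code when the C compiler is clang, which is why the steps set CC to clang. A build script could therefore add the LTO flag only when CC names a clang binary. The sketch below is a hypothetical helper over plain strings so it runs without the cc crate; is_clang is a simple filename heuristic, and in a real build.rs each returned flag would be passed to .flag(...).

```rust
use std::path::Path;

/// Heuristic: does the CC value name a clang binary? (e.g. "/usr/bin/clang-11")
fn is_clang(cc: &str) -> bool {
    Path::new(cc)
        .file_name()
        .and_then(|n| n.to_str())
        .map_or(false, |n| n.starts_with("clang"))
}

/// Extra C flags for the pico build; -flto=thin only when CC is clang,
/// since rustc's linker-plugin-lto needs LLVM bitcode from the C side.
fn pico_flags(cc: Option<&str>) -> Vec<&'static str> {
    let mut flags = vec!["-funroll-loops", "-msse4", "-march=native"];
    if cc.map_or(false, is_clang) {
        flags.push("-flto=thin");
    }
    flags
}

fn main() {
    let cc = std::env::var("CC").ok();
    for f in pico_flags(cc.as_deref()) {
        println!("{}", f);
    }
}
```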

errantmind commented on May 13, 2024

I updated the comment above because the steps it originally described were incorrect; they now work as expected. Here are the results of my latest test:

results

errantmind commented on May 13, 2024

Alright, the adventure is coming to an end with this final update:

  • Cargo automatically adds the linker-plugin-lto flag when building certain kinds of crates, like my sys crate in this example. You can verify this by building with --verbose (e.g. cargo build --release --verbose).
    • It appears to be unnecessary to build the whole dependency chain with linker-plugin-lto; only the sys crate needs it, and that happens automatically. Building all dependencies with linker-plugin-lto actually costs about 5% in performance.
  • Setting RUSTFLAGS overrides the rustflags in a Cargo-wide config (e.g. ~/.cargo/config.toml).
  • In my setup, the Cargo-wide config took precedence over a project-local Cargo config (e.g. <project>/.cargo/config.toml).
  • clang-12 is significantly (~5%) faster than clang-11 for the pico tests, for reasons I haven't identified. clang-13 (a dev build) is, so far, not significantly faster than clang-12.

Final results
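The precedence observations above can be compared with Cargo's documented order for extra rustc flags: CARGO_ENCODED_RUSTFLAGS, then RUSTFLAGS, then the matching target.<triple>.rustflags (or target.<cfg>.rustflags) entries, then build.rustflags. Only the first source found is used; nothing is merged across them. Because of that ordering, a [target.<triple>] table in ~/.cargo/config.toml can outrank a project's [build] rustflags regardless of which file is closer, which may be why the Cargo-wide config appeared to win. The function below is a simplified model of that rule under these assumptions, not Cargo's code: it ignores target.<cfg> tables and the joining of arrays across multiple config files.

```rust
/// Sources of extra rustc flags, in Cargo's documented priority order.
/// Simplified model: ignores target.<cfg> tables and multi-file merging.
struct FlagSources<'a> {
    cargo_encoded_rustflags: Option<&'a str>, // CARGO_ENCODED_RUSTFLAGS, 0x1f-separated
    rustflags_env: Option<&'a str>,           // RUSTFLAGS, whitespace-separated
    target_rustflags: Option<Vec<&'a str>>,   // [target.<triple>] rustflags
    build_rustflags: Option<Vec<&'a str>>,    // [build] rustflags
}

/// Returns the flags Cargo would pass: the first source present wins,
/// and the remaining sources are ignored entirely (not merged).
fn effective_rustflags(s: FlagSources) -> Vec<String> {
    if let Some(enc) = s.cargo_encoded_rustflags {
        return enc.split('\x1f').filter(|f| !f.is_empty()).map(String::from).collect();
    }
    if let Some(env) = s.rustflags_env {
        return env.split_whitespace().map(String::from).collect();
    }
    if let Some(t) = s.target_rustflags {
        return t.into_iter().map(String::from).collect();
    }
    s.build_rustflags.unwrap_or_default().into_iter().map(String::from).collect()
}

fn main() {
    // Global config uses a [target.x86_64-unknown-linux-gnu] table,
    // while the project config uses [build]: the target table wins.
    let flags = effective_rustflags(FlagSources {
        cargo_encoded_rustflags: None,
        rustflags_env: None,
        target_rustflags: Some(vec!["-Ctarget-cpu=native", "-Clink-arg=-fuse-ld=lld"]),
        build_rustflags: Some(vec!["-Ctarget-feature=+sse4.2"]),
    });
    println!("{:?}", flags); // prints ["-Ctarget-cpu=native", "-Clink-arg=-fuse-ld=lld"]
}
```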
