
Comments (7)

phiresky commented on May 20, 2024 6

Just in case someone else finds this, here's an overview of things to make it faster (tested on a file with four columns and one billion lines):

  1. compile with --release (huge difference, >10x perf)

  2. wrap your input in a BufReader::with_capacity(1_000_000, input). No difference for me; it probably depends on where the data comes from

  3. use .byte_records() instead of .records() if you don't need the fields as UTF-8 strings (only a minor difference for me)

  4. add this to Cargo.toml (opt-level 3 is not much faster than level 2, but lto = "fat" improves perf by 15%!):

     [profile.release]
     opt-level = 3
     debug = true
     lto = "fat"
    
  5. compile with RUSTFLAGS="-C target-cpu=native" (only minor difference)

  6. Instead of for result in reader.into_byte_records(), use:

    let mut record = csv::ByteRecord::new();
    while reader.read_byte_record(&mut record)? {
        // process `record` here; its allocation is reused for every row
    }

    This doubles the performance!! This is also what xsv does: https://github.com/BurntSushi/xsv/blob/3de6c04269a7d315f7e9864b9013451cd9580a08/src/cmd/select.rs#L77

With release mode and read_byte_record, the library performs the same as xsv select for me.
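To illustrate why reusing one record helps, here is a minimal sketch that uses only the Rust standard library, not the csv crate. It is an analogy under stated assumptions: the function names are hypothetical, fields are split naively on commas (no quoting support), and the input is an in-memory sample. The first function allocates a fresh String per line, much like iterating over owned records; the second reuses a single byte buffer, much like read_byte_record(&mut record).

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Counts lines and comma-separated fields, allocating a new `String`
/// for every line (similar in spirit to iterating over owned records).
fn count_allocating<R: BufRead>(reader: R) -> io::Result<(u64, u64)> {
    let (mut lines, mut fields) = (0u64, 0u64);
    for line in reader.lines() {
        let line = line?; // fresh allocation on each iteration
        lines += 1;
        fields += line.split(',').count() as u64;
    }
    Ok((lines, fields))
}

/// Same counts, but a single `Vec<u8>` is reused for every line
/// (similar in spirit to `read_byte_record(&mut record)`).
fn count_reusing<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let (mut lines, mut fields) = (0u64, 0u64);
    let mut buf = Vec::new();
    loop {
        buf.clear(); // drops the old contents but keeps the capacity
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of input
        }
        // Strip the line terminator, as `lines()` does.
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        if buf.last() == Some(&b'\r') {
            buf.pop();
        }
        lines += 1;
        fields += buf.split(|&b| b == b',').count() as u64;
    }
    Ok((lines, fields))
}

fn main() -> io::Result<()> {
    // Hypothetical sample data standing in for a real file.
    let data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n";
    let a = count_allocating(BufReader::with_capacity(1_000_000, Cursor::new(data.as_bytes())))?;
    let b = count_reusing(BufReader::with_capacity(1_000_000, Cursor::new(data.as_bytes())))?;
    assert_eq!(a, b);
    println!("{} lines, {} fields", b.0, b.1);
    Ok(())
}
```

Both functions return the same counts; the difference is only in how often memory is allocated, which is where the speedup on large inputs would be expected to come from.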

from rust-csv.

Eh2406 commented on May 20, 2024 1

Did you compile in release mode? Don't know the real answer, just a knee-jerk response to "Rust beginner" and "expecting to run faster" :-)


BurntSushi commented on May 20, 2024 1

@patricebellan Have you read the section in the docs about iterating over records?

I would somewhat expect xsv to run quite a bit faster than 25 seconds on a mere 7 million rows. I share @Eh2406's concerns. Instead of cargo build you might try cargo build --release.

xsv select barely does anything either. It should run within spitting distance of a simple count loop.


BurntSushi commented on May 20, 2024 1

Thanks for adding your tip here! There is more explanation here on the technique: https://docs.rs/csv/1.1.3/csv/tutorial/index.html#amortizing-allocations


patricebellan commented on May 20, 2024

Nice try @Eh2406, I did compile in release ;)

I'm running it on a VM, so overall performance may not be the best.
But I was mostly concerned about comparing both, not pure performance per se.


hakunin commented on May 20, 2024

Just ran into this myself: a debug build reads a mere 2M rows in 18 seconds, while a release build does it in 630 ms.
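Since debug builds keep catching people out, here is a small sketch of a timing harness that warns when it was compiled without optimizations. It relies on cfg!(debug_assertions), which is on by default in the dev profile and off by default in the release profile (a custom profile can override this). The helper names and the stand-in workload are hypothetical, not part of the csv crate.

```rust
use std::time::{Duration, Instant};

/// Reports the likely build kind, based on the default profile settings
/// for `debug_assertions` (dev: enabled, release: disabled).
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Runs `work` once and returns its result with the elapsed wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed())
}

fn main() {
    if build_kind() == "debug" {
        eprintln!("warning: debug build; timings are not representative, try --release");
    }
    // Stand-in workload (hypothetical): count commas in an in-memory buffer.
    let data = "1,2,3,4\n".repeat(100_000);
    let (commas, elapsed) = timed(|| data.bytes().filter(|&b| b == b',').count());
    println!("{} build: {} commas in {:?}", build_kind(), commas, elapsed);
}
```

Running it once with cargo run and once with cargo run --release should make the gap between the two profiles visible on your own machine.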


phiresky commented on May 20, 2024

Thanks for the link. Funny, I didn't actually see that. I just searched for "performance" within the docs index page but it didn't have results, and I assumed "tutorial" was more about handling different types of files etc., not about improving performance. And the docs search (obviously, I guess) didn't yield it either :)

