Comments (7)
Just in case someone else finds this, here's an overview of things that make it faster (tested on a file with four columns and one billion lines):
- Compile with `--release` (huge difference, >10x perf).
- Wrap your input in a `BufReader::with_capacity(1_000_000)`. No difference for me; it probably depends on where the data comes from.
- Use `.byte_records()` instead of `.records()` if you don't need string parsing (only a minor difference for me).
- Enable this in `Cargo.toml`: `[profile.release]` with `opt-level = 3`, `debug = true`, `lto = "fat"` (opt-level 3 is not much faster than level 2, but `lto = "fat"` improves perf by 15%!).
- Compile with `RUSTFLAGS="-C target-cpu=native"` (only a minor difference).
- Instead of `for result in reader.into_byte_records()`, use `let mut record = csv::ByteRecord::new(); while reader.read_byte_record(&mut record)? { ... }`. This doubles the performance!! This is also what xsv does: https://github.com/BurntSushi/xsv/blob/3de6c04269a7d315f7e9864b9013451cd9580a08/src/cmd/select.rs#L77

With release mode and `read_byte_record`, the performance of using the library is the same as `xsv select` for me.
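The `read_byte_record` tip is the "amortizing allocations" pattern: one record buffer is reused for every row instead of a fresh record being allocated per row. As a dependency-free sketch of the same idea (this is std's `BufRead::read_line`, not the csv crate's API), the code below compares an allocating `lines()` loop with a loop that reuses one `String`. The function names are hypothetical, and the naive `split(',')` ignores CSV quoting, so it is not a CSV parser. With the csv crate, the reusing loop is the `read_byte_record` loop quoted in the tip. Separately, the csv docs note that `csv::Reader` is buffered automatically, which may explain why an extra `BufReader` made no difference.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts rows and comma-separated fields, allocating a new `String` per line
/// (analogous to `for result in reader.into_byte_records()`).
/// Hypothetical helper; naive split, no CSV quoting support.
fn count_fields_allocating<R: Read>(input: R) -> io::Result<(usize, usize)> {
    let reader = BufReader::new(input);
    let (mut rows, mut fields) = (0, 0);
    for line in reader.lines() {
        let line = line?; // a fresh String is allocated for every line
        rows += 1;
        fields += line.split(',').count();
    }
    Ok((rows, fields))
}

/// Same result, but reuses a single `String` buffer for every line
/// (analogous to `while reader.read_byte_record(&mut record)? { ... }`).
/// Hypothetical helper; naive split, no CSV quoting support.
fn count_fields_reusing<R: Read>(input: R) -> io::Result<(usize, usize)> {
    let mut reader = BufReader::new(input);
    let mut line = String::new();
    let (mut rows, mut fields) = (0, 0);
    loop {
        line.clear(); // drop the old contents but keep the allocation
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means EOF
        }
        rows += 1;
        // read_line keeps the line terminator, so strip "\n" / "\r\n" first
        let trimmed = line.trim_end_matches(&['\r', '\n'][..]);
        fields += trimmed.split(',').count();
    }
    Ok((rows, fields))
}
```

Both functions return the same counts; the second performs at most a handful of allocations no matter how many lines it reads, because the buffer only grows when a longer line arrives.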
from rust-csv.
Did you compile in release? I don't know the real answer; this is just a knee-jerk response to "Rust beginner" and "expecting to run faster" :-)
@patricebellan Have you read the section in the docs about iterating over records?
I would somewhat expect xsv to run quite a bit faster than 25 seconds on a mere 7 million rows. I share @Eh2406's concerns. Instead of `cargo build`, you might try `cargo build --release`.
`xsv select` barely does anything either. It should run within spitting distance of a simple count loop.
Thanks for adding your tip here! There is more explanation of the technique here: https://docs.rs/csv/1.1.3/csv/tutorial/index.html#amortizing-allocations
Nice try @Eh2406, I did compile in release ;)
I'm running it on a VM, so overall performance may not be the best.
But I was mostly concerned about comparing both, not pure performance per se.
Just ran into this myself: a debug build reads a mere 2M rows in 18 seconds, while a release build does it in 630 ms.
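Given a gap that large, one small safeguard (a sketch; the function name is hypothetical) is to have a benchmark binary warn when it was built without `--release`. It relies on `cfg!(debug_assertions)`, which Cargo enables by default in the dev profile and disables in the release profile. Because that setting can be overridden per profile independently of `opt-level`, it is only a proxy for "unoptimized build".

```rust
/// Returns a warning message when the binary was built with debug assertions
/// enabled (the default for plain `cargo build`), otherwise `None`.
/// Hypothetical helper; `debug_assertions` is only a proxy for optimization level.
fn debug_build_warning() -> Option<&'static str> {
    if cfg!(debug_assertions) {
        Some("warning: debug build detected; timings may be far slower than with `cargo build --release`")
    } else {
        None
    }
}
```

Calling `if let Some(msg) = debug_build_warning() { eprintln!("{}", msg); }` at the top of `main` would make an accidental debug run obvious before any timings are reported.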
Thanks for the link. Funny, I didn't actually see that. I just searched for "performance" within the docs index page, but it had no results, and I assumed "tutorial" was more about handling different types of files etc., not about improving performance. And the docs search (obviously, I guess) didn't yield it either :)
Related Issues (20)
- Use `#[non_exhaustive]` tag instead of manually `__Nonexhaustive` variant
- A nested struct deserializer problem
- Program crashes caused by inappropriate parameter sizes.
- Deserialize a field to an empty Vec<>
- Disable line terminator config
- `write_byte_record` and `write_field` does not mix well and this is not properly documented.
- Feature: Manually add headers to new CSV, using proposed csv::Writer.push_header() function
- Space after delimiter messes with quoting
- How to writing column?
- Handling (serialization) of nested containers
- Add Support for serde_transcode::transcode
- Deserializing a String field inside a flattende struct fails if the field contains a valid integer
- Feature request: please add `invalid_result` deserializer
- Can the separator in CSV format support the char type?
- Header Implementations "Content-Disposition".
- How to serialize to a byte buffer
- Automatically add an index number to headers that contain duplicate fields.
- Error During the Deserialization of `String` Fields from Nested `struct`s
- unexpected behavior (bug?) when using serde untagged with an enum to deserialize csv data
- Serializing `None` vs serializing empty string