suchapalaver / krust
counts k-mers, written in rust
License: MIT License
The biggest feature it adds over Vec<u8> is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data. Instead, a Bytes instance is a reference-counted handle to some underlying data. The Bytes type is roughly an Arc<Vec<u8>> but with some added capabilities.
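To make the shallow-cloning point concrete, here is a minimal sketch that uses only the standard library. It stands in an Arc<Vec<u8>> for Bytes, since the comment above describes Bytes as roughly that; the helper names shared_buffer and same_allocation are hypothetical and are not part of the bytes crate.

```rust
use std::sync::Arc;

/// Builds a reference-counted byte buffer.
/// Hypothetical stand-in for `Bytes`; this is NOT the `bytes` crate API.
pub fn shared_buffer(data: &[u8]) -> Arc<Vec<u8>> {
    // The bytes are copied once here, into a single heap allocation.
    Arc::new(data.to_vec())
}

/// Returns true when both handles point at the same heap allocation,
/// i.e. when cloning did not copy the underlying bytes.
pub fn same_allocation(a: &Arc<Vec<u8>>, b: &Arc<Vec<u8>>) -> bool {
    Arc::ptr_eq(a, b)
}
```

Cloning the handle with Arc::clone only bumps a reference count, so a sequence can be handed to several worker threads without duplicating it.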
could be a great functionality to add to your tool in time, maybe make one that is loadable into whatever handles hash tables in Rust
the utf8-processing of the kmers. The kmer iterator itself should really check it has valid kmers while iterating. Also, instead of storing the reverse-complement in heap-allocated strings, you can make a lazy reverse-complemented object. Alternatively, store the kmers in u64 - one of the reasons for using kmers in the first place is that they can be packed into machine integers for speed.
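As a rough illustration of the u64 suggestion, the sketch below packs a k-mer into a machine integer at two bits per base and validates the bases while doing so. The encoding (A=0, C=1, G=2, T=3) and the function names pack_kmer and revcomp_packed are assumptions made for this example, not krust's actual implementation.

```rust
/// Packs a k-mer of length 1..=32 into a u64, two bits per base.
/// Assumed encoding: A=0, C=1, G=2, T=3 (case-insensitive).
/// Returns None for an empty k-mer, k > 32, or any non-ACGT byte.
pub fn pack_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.is_empty() || kmer.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in kmer {
        let code: u64 = match base {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None, // invalid base: reject the whole k-mer
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Reverse complement of a packed k-mer of length `k`, with no heap allocation.
pub fn revcomp_packed(packed: u64, k: usize) -> u64 {
    // With this encoding the complement of a base is 3 - code,
    // which is the same as flipping both of its bits.
    let mut complemented = !packed;
    let mut rc = 0u64;
    // Pop 2-bit groups from the low end and push them onto `rc`,
    // which reverses the order of the bases.
    for _ in 0..k {
        rc = (rc << 2) | (complemented & 3);
        complemented >>= 2;
    }
    rc
}
```

A canonical k-mer can then be taken as the smaller of the packed value and its packed reverse complement, which avoids allocating a reverse-complement string per k-mer.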
something that explains which arguments to pass in. When the output directory exists, you just write "File exists", which is very confusing if you have an unrelated file called "output".
If anyone wants to import your function they won't be happy with something that crashes the whole application when it fails. You can panic in the executable portion of the program though.
Rather than do let kmer_len = config.kmer_len.parse::<usize>().unwrap();, I would instead change the Config struct member kmer_len to be a usize, and perform parsing while constructing Config - Config::new already returns Result.
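The suggestion could look something like the sketch below. Only Config, Config::new, and kmer_len come from the comment above; the filepath field, the argument order, and the String error type are assumptions made for illustration.

```rust
/// Configuration with `kmer_len` already parsed to a number.
/// The `filepath` field and the argument layout are assumed for this sketch.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub kmer_len: usize,
    pub filepath: String,
}

impl Config {
    /// Expects `args` shaped like `[program, kmer_len, filepath]`.
    /// Every failure is returned as an error instead of panicking,
    /// so library callers can decide how to react.
    pub fn new(args: &[String]) -> Result<Config, String> {
        if args.len() < 3 {
            return Err("expected a k-mer length and a file path".to_string());
        }
        let kmer_len = args[1]
            .parse::<usize>()
            .map_err(|e| format!("invalid k-mer length {:?}: {}", args[1], e))?;
        if kmer_len == 0 {
            return Err("k-mer length must be at least 1".to_string());
        }
        Ok(Config {
            kmer_len,
            filepath: args[2].clone(),
        })
    }
}
```

With parsing done at construction time, the rest of the program can use config.kmer_len directly, and the only place that needs to print an error and exit is the executable's main function.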
Hello Team,
It seems needletail is much faster than bio for FASTA file parsing. For larger FASTA files, parsing could also be parallelized. Is this doable?
Thanks,
Jianshu
use (which you are already doing) is a newer version of the same thing.
I was looking for a fast kmer counter and came across this as well as your package:
https://pirl.unc.edu/blog/shaking-the-rust-off-python-redox
I guess there's also needletail
which is older, on crates.io and hence more depended on, but yours uses rayon, so a better fit for the implementation at the link. Is that right?
This is also more recently maintained, just by you.
Nice that you are still maintaining this starter project! I myself am starting to learn Rust. It seems you have been hoping to implement hashmaps also. Let me know if you would like a contribution--please be warned, it is early days for me!
My use case is that I want to populate a large array with kmer counts from sequencing reads, likely from FASTQ files. Since that is supported by needletail, I might start there.
writing a line per kmer is too inefficient and rarely needed. Much better to just return a vector of kmer hashmaps. Alternatively, make a hashmap containing n -> m pairs, where n is the number of times some kmer has been seen, and m is the number of distinct kmers that have been seen n times.
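The n -> m hashmap described above could be built roughly as follows. This is a sketch under simple assumptions: k-mers are counted as raw byte windows over a single sequence, with no canonicalization or validation, and the function names count_kmers and count_spectrum are hypothetical.

```rust
use std::collections::HashMap;

/// Counts every length-`k` window of `seq`. The keys borrow from `seq`,
/// so no per-k-mer allocation is needed. Returns an empty map when k == 0.
pub fn count_kmers(seq: &[u8], k: usize) -> HashMap<&[u8], u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic, so bail out early
    }
    for kmer in seq.windows(k) {
        *counts.entry(kmer).or_insert(0) += 1;
    }
    counts
}

/// Collapses k-mer counts into n -> m pairs: m distinct k-mers
/// were each seen exactly n times.
pub fn count_spectrum<K>(counts: &HashMap<K, u64>) -> HashMap<u64, u64> {
    let mut spectrum = HashMap::new();
    for &n in counts.values() {
        *spectrum.entry(n).or_insert(0) += 1;
    }
    spectrum
}
```

For the sequence AAAAC with k = 2, the windows are AA, AA, AA, and AC, so the spectrum holds two entries: one k-mer seen three times and one k-mer seen once.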