
krust's People

Contributors

suchapalaver

krust's Issues

Explore using the bytes crate

bytes

The biggest feature Bytes adds over Vec<u8> is shallow cloning. In other words, calling clone() on a Bytes instance does not copy the underlying data. Instead, a Bytes instance is a reference-counted handle to some underlying data. The Bytes type is roughly an Arc<Vec<u8>> but with some added capabilities.
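
To make the shallow-cloning point concrete, here is a minimal sketch that uses only the standard library's Arc, which the description above says Bytes roughly resembles. It does not use the bytes crate itself, and the helper name share_sequence is hypothetical.

```rust
use std::sync::Arc;

/// Hypothetical helper: returns two handles to the same sequence buffer.
/// Cloning the `Arc` only bumps a reference count; the bytes are not copied.
fn share_sequence(seq: Vec<u8>) -> (Arc<Vec<u8>>, Arc<Vec<u8>>) {
    let first = Arc::new(seq);
    let second = Arc::clone(&first); // shallow clone: same allocation
    (first, second)
}
```

Both handles point at one allocation, so passing sequence data between threads or data structures this way costs a counter increment rather than a copy.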

build kmer indexes

This could be a great feature to add to your tool in time, maybe one that is loadable into whatever handles hash tables in Rust.
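
One possible reading of "kmer index" is a table from each kmer to the positions where it occurs. The sketch below assumes that reading and uses the standard HashMap; the function name build_index is hypothetical and nothing here is taken from krust.

```rust
use std::collections::HashMap;

/// Maps each k-mer in `seq` to the list of start positions where it occurs.
/// Keys borrow from `seq`, so no k-mer is copied.
fn build_index(seq: &[u8], k: usize) -> HashMap<&[u8], Vec<usize>> {
    let mut index: HashMap<&[u8], Vec<usize>> = HashMap::new();
    if k == 0 {
        return index; // `windows(0)` would panic, so return an empty index
    }
    for (pos, kmer) in seq.windows(k).enumerate() {
        index.entry(kmer).or_default().push(pos);
    }
    index
}
```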

speed up by changing the utf8 processing, reverse-comp, and storage

One thing to change is the utf8-processing of the kmers. The kmer iterator itself should really check that it has valid kmers while iterating. Also, instead of storing the reverse-complement in heap-allocated strings, you can make a lazy reverse-complemented object. Alternatively, store the kmers as u64 values: one of the reasons for using kmers in the first place is that they can be packed into machine integers for speed.
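
As a rough illustration of the u64 suggestion, the sketch below packs a kmer at 2 bits per base and computes the reverse complement on the integer. The encoding (A = 0, C = 1, G = 2, T = 3), the limit of 32 bases, and all function names are assumptions made for this example, not krust's code.

```rust
/// Packs a k-mer (1 <= k <= 32) into a u64 using 2 bits per base:
/// A = 0, C = 1, G = 2, T = 3. Returns None for any other byte,
/// so validity is checked while packing rather than in a separate pass.
fn pack_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.is_empty() || kmer.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in kmer {
        let code = match base {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Reverse complement of a packed k-mer, with no heap allocation.
/// With this encoding the complement of a base code is `code ^ 3`.
fn reverse_complement(packed: u64, k: usize) -> u64 {
    let mut rest = packed;
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | ((rest & 0b11) ^ 0b11);
        rest >>= 2;
    }
    rc
}

/// Canonical form: the smaller of a k-mer and its reverse complement.
fn canonical(packed: u64, k: usize) -> u64 {
    packed.min(reverse_complement(packed, k))
}
```

Because pack_kmer returns None on an invalid base, an iterator built on it can skip or report bad kmers as it goes, and the packed values can be used directly as hash map keys.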

avoid panicking at all in your library code

If anyone wants to import your function they won't be happy with something that crashes the whole application when it fails. You can panic in the executable portion of the program though.
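
A minimal sketch of what that could look like: the library function returns a Result with its own error type, and the caller decides what to do on failure. The KmerError type and the kmers function are hypothetical names for illustration only.

```rust
use std::fmt;

/// Hypothetical error type for the library portion of a k-mer counter.
#[derive(Debug, PartialEq)]
enum KmerError {
    ZeroLength,
    InvalidBase { position: usize, byte: u8 },
}

impl fmt::Display for KmerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KmerError::ZeroLength => write!(f, "k-mer length must be greater than zero"),
            KmerError::InvalidBase { position, byte } => {
                write!(f, "invalid base {:?} at position {}", *byte as char, position)
            }
        }
    }
}

impl std::error::Error for KmerError {}

/// Library function: reports problems to the caller instead of panicking.
fn kmers(seq: &[u8], k: usize) -> Result<Vec<&[u8]>, KmerError> {
    if k == 0 {
        return Err(KmerError::ZeroLength);
    }
    if let Some(position) = seq
        .iter()
        .position(|&b| !matches!(b, b'A' | b'C' | b'G' | b'T'))
    {
        return Err(KmerError::InvalidBase { position, byte: seq[position] });
    }
    Ok(seq.windows(k).collect())
}
```

The executable can still call unwrap, or print the error and exit, on the value returned by kmers; the difference is that the choice now belongs to the caller.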

change the Config struct member kmer_len to be a usize

Rather than doing let kmer_len = config.kmer_len.parse::<usize>().unwrap();, I would change the Config struct member kmer_len to be a usize and perform the parsing while constructing Config, since Config::new already returns Result.
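
A sketch of that change, under stated assumptions: the path field, the argument order, and the String error type are guesses made for illustration, since the real Config definition is not shown here.

```rust
/// Sketch of a `Config` whose `kmer_len` is already a `usize`.
/// The `path` field and the argument order are assumptions.
#[derive(Debug)]
struct Config {
    kmer_len: usize,
    path: String,
}

impl Config {
    /// Parses and validates the arguments once, at construction time.
    fn new(mut args: impl Iterator<Item = String>) -> Result<Config, String> {
        args.next(); // skip the program name
        let kmer_len = args
            .next()
            .ok_or("missing k-mer length")?
            .parse::<usize>()
            .map_err(|e| format!("k-mer length must be an unsigned integer: {}", e))?;
        if kmer_len == 0 {
            return Err("k-mer length must be greater than zero".to_string());
        }
        let path = args.next().ok_or("missing input path")?;
        Ok(Config { kmer_len, path })
    }
}
```

With this shape, code that receives a Config can use config.kmer_len directly, and a malformed length surfaces as an Err from Config::new rather than as a panic later on.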

implement map reduce

I was looking for a fast kmer counter and came across this as well as your package:
https://pirl.unc.edu/blog/shaking-the-rust-off-python-redox

I guess there's also needletail which is older, on crates.io and hence more depended on, but yours uses rayon, so a better fit for the implementation at the link. Is that right?

This is also more recently maintained, just by you. 😄

Nice that you are still maintaining this starter project! I myself am starting to learn Rust. It seems you have been hoping to implement hashmaps also. Let me know if you would like a contribution, but please be warned, it is early days for me!

My use case is that I want to populate a large array with kmer counts from sequencing reads, likely from FASTQ files. Since that is supported by needletail I might start there.

speed up using hashmaps

Writing a line per kmer is too inefficient and rarely needed. It is much better to just return a vector of kmer hashmaps. Alternatively, make a hashmap containing n -> m pairs, where n is the number of times some kmer has been seen, and m is the number of distinct kmers that have been seen n times.
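
Both ideas can be sketched with the standard HashMap. The function names count_kmers and count_histogram are hypothetical, and this single-sequence version leaves out validation, reverse complements, and parallelism.

```rust
use std::collections::HashMap;

/// Counts every k-mer in `seq` and returns the table instead of printing it.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<&[u8], usize> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic
    }
    for kmer in seq.windows(k) {
        *counts.entry(kmer).or_insert(0) += 1;
    }
    counts
}

/// Builds the n -> m table: n is an occurrence count, and m is the number
/// of distinct k-mers that were seen exactly n times.
fn count_histogram(counts: &HashMap<&[u8], usize>) -> HashMap<usize, usize> {
    let mut histogram = HashMap::new();
    for &n in counts.values() {
        *histogram.entry(n).or_insert(0) += 1;
    }
    histogram
}
```

For the sequence AAAC with k = 2, the count table holds AA -> 2 and AC -> 1, so the histogram holds 2 -> 1 and 1 -> 1.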
