Comments (6)
Thanks for the info!
This might not be helpful, but I have found that setting this option can really speed things up, at least up to 8-16 threads!
// stuff reading in a bam file and a header from that bam
// ...
let threads = 16;
let mut out = bam::Writer::from_path(out, &header, bam::Format::Bam).unwrap();
out.set_threads(threads).unwrap();
This of course assumes you use Rust, rust-htslib, etc.
But when I use this I can write >10,000 PacBio reads per second.
from hiphase.
I'm not entirely sure what I'm looking at on that top readout. Is the rg command providing sequential timepoints?
Regardless, there is likely some optimization of threads that can happen around all forms of I/O and parallelization. Most internal tests so far have been on 16 threads, and we have not revisited parallelization components probably since proof-of-concept. Historically, they were not the bottlenecks, but we may need to revisit that if further speed improvements get prioritized.
from hiphase.
Ahh sorry. rg is just a grep alternative I like and it's just searching top for updates with hiphase over a minute or so.
But I was able to remove the need for the bam with the new haplotag file you made for me and I am happy with that speed. So feel free to close if you want, or leave open to bookmark potential future improvements.
from hiphase.
v0.10.0 leverages the thread pools provided by htslib. This was the lowest hanging fruit in the short term for optimizing I/O. Internally, we saw about a 40% speedup while haplotagging, although mileage will vary there across systems and depending on contention.
from hiphase.
Yea, this is a bottleneck we're aware of that's specifically related to writing haplotagged files. The phasing itself is parallelized well, but the writing of files is still handled in a single-threaded manner. If you are not writing BAM files, this isn't really an issue because the file sizes are small, but once you start haplotagging the tool quickly becomes thread and/or I/O bound. Improving this is on our longer-term TODO list.
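To give a rough feel for why a single-threaded write step limits the benefit of extra threads, here is a minimal sketch based on Amdahl's law. The function name amdahl_speedup and the 25% serial fraction are made up for illustration; they are not measurements from HiPhase, and the real serial share would need to be profiled.

```rust
/// Upper bound on overall speedup according to Amdahl's law.
///
/// `serial_fraction` is the share of the single-threaded runtime that cannot
/// be parallelized (for example, a single-threaded writer). The value used
/// in `main` is hypothetical, not a HiPhase measurement.
fn amdahl_speedup(serial_fraction: f64, threads: u32) -> f64 {
    assert!(
        (0.0..=1.0).contains(&serial_fraction),
        "serial_fraction must be between 0 and 1"
    );
    assert!(threads >= 1, "at least one thread is required");
    1.0 / (serial_fraction + (1.0 - serial_fraction) / f64::from(threads))
}

fn main() {
    // Hypothetical: assume 25% of the runtime is spent in serial writing.
    let serial_fraction = 0.25;
    for threads in [1u32, 8, 16, 32] {
        let speedup = amdahl_speedup(serial_fraction, threads);
        // Average fraction of the requested threads that can be kept busy.
        let utilization = 100.0 * speedup / f64::from(threads);
        println!(
            "{threads:>2} threads: at most {speedup:.2}x faster, ~{utilization:.0}% average utilization"
        );
    }
}
```

Under that assumption, adding threads beyond a certain point mostly lowers average utilization rather than runtime, which is consistent with the tool becoming bound by the writer once haplotagging is enabled.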
from hiphase.
Can confirm that it is much faster without the bam output file. But FYI I am still not seeing great utilization for all 32 threads.
from hiphase.