
Problem speed of mapping (readfish issue, closed, 23 comments)

looselab commented on August 22, 2024
Problem speed of mapping

from readfish.

Comments (23)

mattloose commented on August 22, 2024

Hi,

There are a lot of issues here.

The toml file you provide (human_chr_selection.toml.txt) won't pass validation as it has no targets. It isn't the one passed in the command shown in ru_test.log (that one is human_chr_selection.toml).

The ru_test.log shows that the toml file you actually used has two targets (further suggesting that the attached toml file is not the right one), BUT neither of those targets is found in the reference.
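A quick way to catch this before a run is to compare the target names in the toml file against the sequence names in the reference FASTA. Below is a minimal sketch, assuming the targets are plain contig names that you have already copied out of the toml file into a list. The helper names are hypothetical and are not part of readfish.

```python
import gzip
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed headers that carry no name
                names.add(fields[0])
    return names


def read_fasta_names(path: str) -> Set[str]:
    """Read sequence names from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fasta_names(handle)


def missing_targets(targets: Iterable[str], reference_names: Set[str]) -> List[str]:
    """Return the targets that do not match any sequence name in the reference."""
    return [t for t in targets if t not in reference_names]


if __name__ == "__main__":
    # Toy headers: a UCSC-style name such as "chr21" will not match an Ensembl-style "21".
    headers = [">chr21 example", "ACGT", ">chr22 example", "ACGT"]
    print(missing_targets(["chr21", "chr22", "21"], fasta_names(headers)))  # ['21']
```

Any non-empty result means those targets can never be hit, so every read is either off target or unmapped.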

Reads will therefore either always be off target or not map at all. If they are not mapping (and I suspect that is the case here), you will keep collecting more data for each read, so your basecalling will take longer and longer.
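Why does a slowdown snowball instead of staying constant? Consider a toy queue model. It illustrates the general effect only and is not how readfish actually schedules work. Each second a fixed amount of signal arrives and the basecaller clears at most a fixed amount. Whatever is left over carries into the next second, so once arrivals exceed capacity the backlog, and with it the delay, grows every second.

```python
from typing import List


def backlog_over_time(arrival_per_s: float, capacity_per_s: float, seconds: int) -> List[float]:
    """Return the unprocessed signal (in samples) left at the end of each second.

    Toy model: `arrival_per_s` samples arrive every second and the basecaller
    can process at most `capacity_per_s` samples per second.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += arrival_per_s
        backlog -= min(backlog, capacity_per_s)  # process as much as capacity allows
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Capacity above the arrival rate: the backlog stays at zero.
    print(backlog_over_time(2.0e6, 2.5e6, 5))
    # Capacity below the arrival rate: the backlog grows by 0.5e6 samples every second.
    print(backlog_over_time(2.0e6, 1.5e6, 5))
```

Dividing the backlog by the capacity gives a rough per-second delay, which in this model increases without bound.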

In essence I'm not sure you have configured this experiment properly.

If you can provide further information including the source of data (are you playing back a bulkfile here or something else?) and the correct toml file we might be able to help further.

Matt


ArtemPalanaria commented on August 22, 2024

Thanks for the answer. I changed the reference and the toml file now passes validation. After starting, however, mapping still takes a long time. For playback I used the bulk file from
http://s3.amazonaws.com/nanopore-human-wgs/bulkfile/PLSP57501_20170308_FNFAF14035_MN16458_sequencing_run_NOTT_Hum_wh1rs2_60428.fast5

Attached files:
human_chr_selection.toml.txt
chunk_log.log
ru_test.log
chek_toml.txt


ArtemPalanaria commented on August 22, 2024

Last time I attached the wrong toml file; the correct one is now attached.


mattloose commented on August 22, 2024

Thanks for the update. That is a lot slower than I would expect.

I would check a few things here.

First, how quickly can your GPU call reads when running standalone? You may need to adjust guppy parameters to tune the basecaller optimally for your hardware.

However, we need to see whether the GPU or the CPU is the limiting factor here. How big is the reference file you are mapping to? Also, what sort of CPU do you have?

Have you tried the fast basecalling model instead of the high accuracy model? If you see an improvement in speed here then we can pinpoint the source of the problem a little.
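Standalone speed is easiest to judge against a rough yardstick. To keep up in real time, the basecaller must process at least as many samples per second as the flow cell produces. The sketch below shows that arithmetic. It assumes a MinION-style flow cell with 512 channels sampled at 4 kHz (typical R9.4.1 figures at the time), so adjust both numbers for your device. Because Read Until reprocesses accumulated chunks, treat the result as a lower bound on what you need, not a guarantee.

```python
def required_samples_per_second(channels: int = 512, sample_rate_hz: int = 4000) -> int:
    """Samples/s the basecaller must sustain if every channel is sequencing."""
    return channels * sample_rate_hz


def keeps_up(measured_samples_per_s: float, channels: int = 512, sample_rate_hz: int = 4000) -> bool:
    """True if a measured standalone basecalling rate matches or exceeds real time."""
    return measured_samples_per_s >= required_samples_per_second(channels, sample_rate_hz)


if __name__ == "__main__":
    print(required_samples_per_second())  # 2048000
    # Compare a measured rate, e.g. from a standalone guppy run, against that figure.
    print(keeps_up(1.5e6), keeps_up(3.0e6))
```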

Thanks


ArtemPalanaria commented on August 22, 2024

Thanks. I ran it with the high accuracy model: the speed was normal for the first 2 minutes, but then everything started to slow down again, to 1 second or more. My CPU is a Ryzen 7 3800X (8 cores, 16 threads). As the reference I use an index built from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
The index file (.mmi) is more than 7 GB. Could that be the cause?
Thanks


mattloose commented on August 22, 2024

This doesn't really make sense, then. Can you please try setting max_chunks to 8 rather than infinite and see what happens?

Also - please leave it running for more than 15 minutes and check the resulting data to see if the selection is working.
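Capping max_chunks means a read that still has no final decision after a fixed number of chunks is acted on anyway. Otherwise it keeps being re-evaluated with ever more signal. The rule below is a simplified sketch, not readfish's internal code. The function is hypothetical, and in readfish the action taken above max_chunks is configurable, so the "unblock" fallback here is an assumption.

```python
import math
from typing import Union


def decide(decision: str, chunks_seen: int,
           max_chunks: Union[int, float] = math.inf, fallback: str = "unblock") -> str:
    """Return the final action for a read, given the decision for its latest chunk.

    `decision` is what the mapping result implied for this chunk
    ("unblock", "stop_receiving" or "proceed"). A read that would otherwise
    "proceed" after more than `max_chunks` chunks gets `fallback` instead.
    """
    if decision == "proceed" and chunks_seen > max_chunks:
        return fallback
    return decision


if __name__ == "__main__":
    # With max_chunks = inf an unmapped read is never resolved and keeps accumulating signal.
    print(decide("proceed", 50))               # proceed
    # With max_chunks = 8 it is resolved once it goes past 8 chunks.
    print(decide("proceed", 9, max_chunks=8))  # unblock
```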


mattloose commented on August 22, 2024

Also - please can you try it with the FAST model and not the High Accuracy Model. Running on the fast model will tell us something about where the lag is.


ArtemPalanaria commented on August 22, 2024

Dear Matt. I ran the process with both the fast and hac models, with max_chunks = 8.
The data show that the fast model runs as fast as it needs to, while hac slows down a great deal. The results from the hac run also appear to be poor.
Here are the fast-model data:
read-length-histogram-05 05 2020, 10_55_26
chunk_log.zip
ru_test.zip
result.txt
and the hac-model data:
result.txt
chunk_log.log
hac
ru_test.zip

Thanks


mattloose commented on August 22, 2024

Can I check what operating system you are on? And can you also give us a metric for how quickly you can basecall standard reads on your current setup?


tchrisboles commented on August 22, 2024

(screenshot)

How would I check the speed of standard basecalling? From the log files? I've never looked at them; give me a hint and I'll dig it out.
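One way to get a single number out of a standalone guppy_basecaller run is to pull the samples/s figure from its end-of-run summary. The minimal sketch below assumes that the console output or log contains a line with `samples/s:` followed by a number. Check this against your own log, because the exact format may differ between guppy versions. The script name in the usage comment is hypothetical.

```python
import re
import sys
from typing import Iterable, Optional

# Assumed format: the run summary contains "samples/s: <number>", e.g. "samples/s: 2.1e+06".
SAMPLES_PER_S = re.compile(r"samples/s:\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)")


def samples_per_second(lines: Iterable[str]) -> Optional[float]:
    """Return the last samples/s value found in guppy output, or None if there is none."""
    value = None
    for line in lines:
        match = SAMPLES_PER_S.search(line)
        if match:
            value = float(match.group(1))
    return value


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python guppy_speed.py < guppy_basecaller_log-2020-05-07_10-42-57.log
    print(samples_per_second(sys.stdin))
```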


ArtemPalanaria commented on August 22, 2024

Dear Matt. Here are the system details (Ubuntu 18.04.4 LTS, GNOME 3.28.2) and the basecalling speed files:
guppy.txt
guppy_basecaller_log-2020-05-07_10-42-57.log
Thanks


vincentmanz commented on August 22, 2024

I have observed the same problem when using the hac model in the toml file: mapping times were very slow (>1 s).
Kernel: #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020


mattloose commented on August 22, 2024

Hi All,

A quick question - could people confirm the version of guppy they are using?

Thanks.


mattloose commented on August 22, 2024

If you are on version 3.6 it may be worth trying guppy 3.4.5 - it is available from:

https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_3.4.5_linux64.tar.gz

It looks as though a change in guppy performance may be negatively affecting the speed of Read Until.


tchrisboles commented on August 22, 2024

Hi Matt and Artem,

I have been having problems similar to Artem's, and I am running:
(screenshot)


tchrisboles commented on August 22, 2024

Thanks Matt - will try 3.4.5 later today.


mattloose commented on August 22, 2024

Hi Chris,

Please let us know how 3.4.5 goes. The accuracy differences aren't key here, but the speed is, so you should find that it gives you better performance. We're really keen to resolve this ASAP!

Best

Matt


tchrisboles commented on August 22, 2024

OK, I think you guys nailed it with the guppy server version. Here are my test results.
(I downloaded and untarred the ont-guppy 3.4.5 package, as Matt pointed out above.)
Set up the basecall server:
(screenshot)
In a second terminal window, set up the ru_generators command:
(screenshot)
I had previously modified Matt's toml file as follows:
(screenshot)
After 16 minutes the read distribution and mapping timing looked like this:
(screenshot)
This is much closer to the image in Matt's README than anything I had gotten previously, although mapping is still not quite as fast as Matt's. Here's a close-up of the read distribution at 16 minutes:
(screenshot)
And the summarise output:
(screenshot)
The median read lengths now show enrichment for chr21 and chr22. Again, not quite as good as in Matt's README, but significant.
What would help us all is some additional guidance on the best strategies for optimizing guppy server settings.
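Median read length works as an enrichment signal because reads on target contigs are allowed to keep sequencing, while off-target reads are unblocked after a chunk or two and stay short. The sketch below shows that comparison. It assumes you already have (contig, read length) pairs from mapping your output reads, for example parsed from a PAF file. The helper is hypothetical and is not readfish's summary command.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_length_by_contig(reads: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Median read length per contig from (contig, read_length) pairs."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for contig, length in reads:
        lengths[contig].append(length)
    return {contig: median(values) for contig, values in lengths.items()}


if __name__ == "__main__":
    # Toy data: long reads on a target contig, short (rejected) reads elsewhere.
    toy = [("chr1", 500), ("chr1", 700), ("chr21", 9000), ("chr21", 11000), ("chr21", 10000)]
    print(median_length_by_contig(toy))
```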

Hope this helps others who are as interested in Read Until (ru) as we are.


tchrisboles commented on August 22, 2024

By the way, you can see my previous results using guppy 3.5.2 in Question #39.


mattloose commented on August 22, 2024

Thanks @tchrisboles

We're just running some equivalence tests across a few GPUs here. All our work was reported using 3.4.5 - we will investigate the issues with guppy > 3.4.5 with ONT.


mattloose commented on August 22, 2024

(plot: signal-attachment-2020-05-08-195144_001)
So here is a comparison of a 1080 and the GPU in the GridION (GV100). As you can see, with guppy 3.4.5 performance is roughly equivalent, but with guppy 3.6 performance is not sufficient for real-time calling. We suspect an underlying issue that can be resolved, but for now we recommend guppy 3.4.5. You can run two versions of guppy side by side as required.
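With two guppy versions installed side by side, it is easy to start the wrong one. A small guard can catch this. The minimal sketch below compares version strings numerically. It assumes you already have the installed version as text, possibly with a build suffix after a `+`. The helper names are hypothetical, and the recommended version is simply the one suggested in this thread.

```python
from typing import Tuple

RECOMMENDED = "3.4.5"  # version recommended in this thread for Read Until at the time


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.4.5' (optionally with a '+build' suffix) into a comparable tuple (3, 4, 5)."""
    core = version.strip().split("+")[0]
    return tuple(int(part) for part in core.split("."))


def is_recommended(version: str) -> bool:
    """True if `version` is numerically equal to the recommended version."""
    return parse_version(version) == parse_version(RECOMMENDED)


if __name__ == "__main__":
    for v in ["3.4.5+fb1fbfb", "3.6.0"]:
        print(v, "OK" if is_recommended(v) else "differs from " + RECOMMENDED)
```

Tuples compare element by element, so `parse_version("3.10.0") > parse_version("3.6.0")` holds. A plain string comparison would get that case wrong.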


ArtemPalanaria commented on August 22, 2024

Dear Matt. I got similar results to Chris using guppy 3.4.5.
Run.txt
run
I also wanted to ask: can I use any fast5 file as the bulk file, or do I need to prepare it somehow?
And could you give me an example of setting up depletion of the human genome (to enrich for the metagenome)?
Thanks for the help. I am very glad that everything worked!


mattloose commented on August 22, 2024

Hi, you have to record a bulk file from a run; you cannot use just any fast5 file.

Look under the advanced file save options.

For depletion of a human genome you just need to configure your toml file to reject anything that maps to the reference you want to get rid of. Have a look at our paper for details.
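The depletion logic can be sketched as a simple per-read rule: reads that map to the host reference are ejected, and everything else keeps sequencing. This only illustrates the idea and is not readfish's configuration format. In readfish, the equivalent behaviour is set through the actions in the toml conditions. The helper and its input format are hypothetical, and the sketch assumes the mapping reference contains only host (human) sequence, so any hit means a host read.

```python
from typing import Dict, Iterable, Optional, Tuple


def depletion_actions(hits: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map each read ID to an action during host depletion.

    `hits` holds (read_id, contig) pairs, where contig is the host contig the
    read mapped to, or None if it did not map to the host reference.
    """
    actions = {}
    for read_id, contig in hits:
        # Host read: eject it. Unmapped read (potentially metagenomic): keep sequencing.
        actions[read_id] = "unblock" if contig is not None else "proceed"
    return actions


if __name__ == "__main__":
    print(depletion_actions([("read1", "chr1"), ("read2", None)]))
```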

