Dear Matt, I tried to run Read Until and went through the testing stages (to the Testing

Problem speed of mapping about readfish HOT 23 CLOSED

looselab commented on August 22, 2024

from readfish.

Comments (23)

mattloose commented on August 22, 2024

Hi,

There are a lot of issues here.

The toml file you provide (human_chr_selection.toml.txt) won't pass validation as it has no targets. It isn't the one passed in the command shown in ru_test.log (that one is human_chr_selection.toml).

The ru_test.log shows that the toml file you actually used has two targets (further suggesting that the wrong toml file was attached here), BUT neither of those targets is found in the reference.

Reads will therefore either always be off target or not map at all. If they are not mapping (and I suspect that is the case here), you will keep collecting more data for each read, so your basecalling will take longer and longer.
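One simplified way to picture why unmapped reads slow things down: if each decision round sends a read's whole accumulated signal back to the basecaller (an assumption made for illustration, not a statement about readfish internals), the per-round cost for a read that is never rejected grows linearly and the total work grows quadratically with the number of rounds. The chunk size used below is an illustrative parameter.

```python
def samples_sent(rounds: int, samples_per_chunk: int) -> int:
    """Total samples basecalled for one read over `rounds` decision rounds,
    assuming round k re-sends all k chunks collected so far (simplified model)."""
    if rounds < 0 or samples_per_chunk < 0:
        raise ValueError("rounds and samples_per_chunk must be non-negative")
    # 1 + 2 + ... + rounds chunks in total (a triangular number).
    return samples_per_chunk * rounds * (rounds + 1) // 2


if __name__ == "__main__":
    # Illustrative only: 0.4 s chunks at 4 kHz would be 1600 samples per chunk.
    for rounds in (1, 5, 10, 20):
        print(rounds, samples_sent(rounds, 1600))
```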

In essence I'm not sure you have configured this experiment properly.

If you can provide further information, including the source of the data (are you playing back a bulkfile here or something else?) and the correct toml file, we might be able to help further.

Matt

ArtemPalanaria commented on August 22, 2024

Thanks for the answer. I changed the reference and the toml file now passes the test. After starting, however, mapping still takes a long time. For playback I used the bulk file from
http://s3.amazonaws.com/nanopore-human-wgs/bulkfile/PLSP57501_20170308_FNFAF14035_MN16458_sequencing_run_NOTT_Hum_wh1rs2_60428.fast5

Attached files:
human_chr_selection.toml.txt
chunk_log.log
ru_test.log
chek_toml.txt

ArtemPalanaria commented on August 22, 2024

Last time I attached the wrong toml file.

mattloose commented on August 22, 2024

Thanks for the update. That is a lot slower than I would expect.

I would check a few things here.

First, how quickly can your GPU call reads when running standalone? You may need to adjust guppy's parameters to tune your basecaller optimally.

However, we need to see whether the GPU or the CPU is the limiting factor here. How big is the reference file you are mapping to? And what sort of CPU do you have?

Have you tried the fast basecalling model instead of the high accuracy model? If you see an improvement in speed, we can pinpoint the source of the problem a little better.

Thanks

ArtemPalanaria commented on August 22, 2024

Thanks. I launched it with the high accuracy model: for the first 2 minutes the speed was normal, but then everything slowed down again to 1 second or more. My CPU is a Ryzen 7 3800X (8 cores, 16 threads). As the reference I use an index built from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
The index (.mmi) file is more than 7 GB. Could that be the cause?
Thanks

mattloose commented on August 22, 2024

This doesn't really make sense, then. Can you please try setting max_chunks to 8 rather than infinite and see what happens?

Also, please leave it running for more than 15 minutes and check the resulting data to see whether the selection is working.
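The idea behind capping max_chunks can be sketched as a rule applied on top of the normal mapping decision: once a read has been evaluated more than max_chunks times and would otherwise just keep sequencing, it is unblocked instead. This is a simplified illustration, not readfish's own code; the exact comparison and action names in your readfish version may differ.

```python
def apply_max_chunks(decision: str, chunks_seen: int, max_chunks: float = float("inf")) -> str:
    """Override a 'proceed' decision with 'unblock' once a read exceeds max_chunks.

    Simplified illustration: with max_chunks = inf the rule never fires, so an
    undecided (e.g. unmapped) read can keep accumulating signal indefinitely.
    """
    if decision == "proceed" and chunks_seen > max_chunks:
        return "unblock"
    return decision


if __name__ == "__main__":
    for seen in (1, 8, 9):
        # Capped at 8 versus the uncapped (infinite) default.
        print(seen, apply_max_chunks("proceed", seen, 8), apply_max_chunks("proceed", seen))
```

Bounding how many times a read is reconsidered also bounds how much of its signal goes back to the basecaller, which is why the setting is a useful diagnostic here.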

mattloose commented on August 22, 2024

Also, could you please try it with the fast model rather than the high accuracy model? Running on the fast model will tell us something about where the lag is.

ArtemPalanaria commented on August 22, 2024

Dear Matt, I ran the process with both the fast and hac models, with max_chunks = 8.
The data show that the fast run keeps up as it should, while the hac run slows down a lot. The results also appear to be poor.
Here are the fast-model data:

chunk_log.zip
ru_test.zip
result.txt
and the hac data:
result.txt
chunk_log.log
ru_test.zip

Thanks

mattloose commented on August 22, 2024

Can I check what operating system you are on? Also, can you provide a metric for how quickly you can basecall standard reads on your current setup?

tchrisboles commented on August 22, 2024

How would I check the speed of standard basecalling? From the log files? I've never looked at them; give me a hint and I'll dig it out.
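One way to get a number from a standalone run is the summary guppy_basecaller prints when it finishes, which reports the caller time and the number of samples called. The sketch below assumes that summary line looks like `Caller time: 2000 ms, Samples called: 8000000, samples/s: 4e+06` (check against your own log) and compares the rate with a crude real-time reference of 512 channels sampled at 4 kHz; those are typical MinION R9.4.1 figures, not numbers stated in this thread.

```python
import re

# Assumed format of guppy_basecaller's end-of-run summary line.
SUMMARY = re.compile(r"Caller time:\s*(\d+)\s*ms,\s*Samples called:\s*(\d+)")


def samples_per_second(log_text: str) -> float:
    """Return basecalling throughput (samples/s) from a guppy log or stdout."""
    match = SUMMARY.search(log_text)
    if match is None:
        raise ValueError("no 'Caller time ... Samples called' line found")
    caller_ms, samples = (int(g) for g in match.groups())
    return samples / (caller_ms / 1000.0)


def realtime_samples_per_second(channels: int = 512, sample_rate_hz: int = 4000) -> int:
    """Signal produced per second if every channel is sequencing (an upper bound)."""
    return channels * sample_rate_hz


if __name__ == "__main__":
    example = "Caller time: 2000 ms, Samples called: 8000000, samples/s: 4e+06"
    rate = samples_per_second(example)
    print(rate, rate >= realtime_samples_per_second())
```

During Read Until only the reads under evaluation are basecalled, so the true requirement is lower than this figure; still, standalone throughput far below it would point towards the GPU as a likely bottleneck.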

ArtemPalanaria commented on August 22, 2024

Dear Matt, here are my system details (Ubuntu 18.04.4 LTS, GNOME 3.28.2)
and the basecalling speed files:
guppy.txt
guppy_basecaller_log-2020-05-07_10-42-57.log
Thanks

vincentmanz commented on August 22, 2024

I have observed the same problem when using the hac model in the toml file: I obtained very slow mapping times (>1 s).
#44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020

mattloose commented on August 22, 2024

Hi All,

A quick question - could people confirm the version of guppy they are using?

Thanks.

mattloose commented on August 22, 2024

If you are on version 3.6 it may be worth trying guppy 3.4.5 - it is available from:

https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_3.4.5_linux64.tar.gz

It looks as though a change in guppy's performance may be negatively impacting the speed of Read Until.
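With several guppy installs around, it helps to confirm which version you are actually running. This sketch assumes the text printed by `guppy_basecaller --version` contains `Version X.Y.Z`, optionally followed by a `+commit` suffix; treating anything newer than 3.4.5 as suspect simply mirrors the advice in this thread, and the example banner below is made up.

```python
import re


def parse_guppy_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from guppy's version banner."""
    match = re.search(r"Version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'Version X.Y.Z' found")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def newer_than_recommended(text: str, recommended: tuple[int, int, int] = (3, 4, 5)) -> bool:
    # Tuples compare element by element, so (3, 6, 0) > (3, 4, 5).
    return parse_guppy_version(text) > recommended


if __name__ == "__main__":
    # Made-up example banner; the commit suffix is a placeholder.
    banner = "Guppy Basecalling Software, Version 3.6.0+abc1234"
    print(parse_guppy_version(banner), newer_than_recommended(banner))
```

In practice the text could be captured with `subprocess.run(["guppy_basecaller", "--version"], capture_output=True, text=True).stdout`, pointing at the binary of whichever install you want to check.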

tchrisboles commented on August 22, 2024

Hi Matt and Artem,

I have been having problems similar to Artem's, and I am running:

tchrisboles commented on August 22, 2024

Thanks Matt - will try 3.4.5 later today.

mattloose commented on August 22, 2024

Hi Chris,

Please let us know how 3.4.5 goes. The accuracy differences aren't key here, but the speed is, and you should find that 3.4.5 gives you better performance. We're really keen to resolve this ASAP!

Best

Matt

tchrisboles commented on August 22, 2024

OK, I think you guys nailed it with the guppy server version. Here are my test results.
(I downloaded and untarred the ont-guppy 3.4.5 package as Matt pointed out above.)
Set up the basecall server:

In a second terminal window, set up the ru_generators command:

I had previously modified Matt's toml file as follows:

After 16 min the read distribution and mapping timing looked like this:

This is much closer to the image in Matt's README than anything I had gotten previously, although mapping timing is still not quite as fast as Matt's. Here's a close-up of the 16-minute read distribution:

And the summarise output:

The median read lengths now show enrichment for chr21 and chr22. Again, not quite as good as Matt's README, but significant.
I think what would help us all is some additional guidance on strategies for optimizing guppy server settings.

Hope this helps others who are as interested in ru as we are.

tchrisboles commented on August 22, 2024

By the way, you can see my previous results using guppy 3.5.2 in Question #39.

mattloose commented on August 22, 2024

Thanks @tchrisboles

We're just running some equivalence tests across a few GPUs here. All our work was reported using 3.4.5; we will investigate the issues with guppy versions newer than 3.4.5 with ONT.

mattloose commented on August 22, 2024

Here is a comparison of a GTX 1080 with the GPU (GV100) in the GridION. As you can see, with guppy 3.4.5 performance is roughly equivalent, but with guppy 3.6 performance is not sufficient for real-time calling. We suspect an underlying issue that can be resolved, but for now we recommend guppy 3.4.5. You can have two versions of guppy running side by side as required.

ArtemPalanaria commented on August 22, 2024

Dear Matt, I got results similar to Chris's using guppy 3.4.5.
Run.txt

I also wanted to know: can I use any fast5 file as the bulk file, or do I need to prepare it somehow?
And could you point me to an example of setting up depletion of the human genome (to enrich for the metagenome)?
Thanks for the help. I am very glad that everything worked!

mattloose commented on August 22, 2024

Hi, you have to record a bulkfile from a run; you cannot use just any fast5 file.

Look under the advanced file save options.

For depletion of a human genome, you just need to configure your toml file to reject anything that maps to the reference you want to get rid of. Have a look at our paper for details.
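As a rough illustration of "reject anything that maps to the reference", the sketch below writes a depletion condition as a Python dict using the same action keys as the selection example (single_on, multi_on, single_off, multi_off, no_seq, no_map) and applies it to a read's alignments. It assumes the reference is human only, so any alignment is a reason to unblock, while unmapped reads (e.g. microbial ones) keep sequencing. The contig list, condition name and the `decide` helper are hypothetical; check key names against the documentation for your readfish version.

```python
from typing import Optional

# Hypothetical list of primary human contigs (UCSC-style names).
HUMAN_CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]

# Hypothetical depletion condition mirroring the selection example's action keys.
DEPLETION = {
    "name": "deplete_human",
    "targets": HUMAN_CONTIGS,
    "single_on": "unblock",   # one alignment, to a listed human contig
    "multi_on": "unblock",    # several alignments, at least one to a listed contig
    "single_off": "unblock",  # maps only to an unlisted contig of the (human) reference
    "multi_off": "unblock",
    "no_map": "proceed",      # no human alignment: keep sequencing
    "no_seq": "proceed",      # nothing basecalled yet: wait for more signal
}


def decide(condition: dict, alignments: Optional[list[str]]) -> str:
    """Pick an action from the contigs a read aligned to.

    `alignments` holds one contig name per alignment; an empty list means the
    read was basecalled but did not map, and None means no sequence was produced.
    """
    if alignments is None:
        return condition["no_seq"]
    if not alignments:
        return condition["no_map"]
    on_target = any(contig in condition["targets"] for contig in alignments)
    prefix = "multi" if len(alignments) > 1 else "single"
    return condition[f"{prefix}_{'on' if on_target else 'off'}"]
```

With every mapped outcome set to unblock, the targets list barely matters; it becomes important only if some human mappings should be handled differently from others.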
