Using LRSIM with LongRanger: Extremely high rate of incorrect barcodes observed (99.90 %) (lrsim issue, closed)

aquaskyline commented on September 23, 2024

Using LRSIM with LongRanger: Extremely high rate of incorrect barcodes observed (99.90 %)

from lrsim.

Comments (13)

aquaskyline commented on September 23, 2024

Lowering -x to 100 is likely the cause of the error. If you need a smaller dataset, try simulating a full dataset with LRSIM's default parameters and then randomly sampling a subset of reads from the resulting fastq files. LongRanger checks the distribution and depth evenness of the barcodes very stringently. Changing LRSIM's default parameters and using -o to disable parameter checking can make the simulated dataset look unreliable to LongRanger.
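
The subsampling suggested above could be sketched as follows. This is only an illustration, not part of LRSIM or LongRanger: the function names and file paths are hypothetical, and it assumes uncompressed, 4-line-per-record FASTQ files with R1 and R2 in the same order. Each pair is kept or dropped with a single random decision so that the two files stay in sync.

```python
import random
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (header, sequence, plus, quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return the number kept.

    Hypothetical helper: one random draw per pair, so R1 and R2 never
    get out of step with each other.
    """
    rng = random.Random(seed)
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of records")
            if rng.random() < fraction:
                out1.writelines(rec1)
                out2.writelines(rec2)
                kept += 1
    return kept
```

Note that sampling individual read pairs also lowers the number of reads per barcode, so whether the subset still passes LongRanger's barcode checks is something to verify rather than assume.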

morispi commented on September 23, 2024

Thanks for your answer!

Yeah, I could also do that, but the reason I lowered -x was actually because the size of the data generated was getting pretty big. I'm not exactly sure how far I got through the simulation process, but it grew up to a little more than 500 GB. Since I don't have access to lots of disk space, I thought lowering -x was a good compromise.

I'll try running again and leave -x at its default value then. Do you have any idea how much disk space it is gonna use in total when running on E. coli? I would just like to be sure it's not gonna fill the disk space I have left, since I have other experiments running in parallel that also need some disk space.

Thanks again.

Pierre

aquaskyline commented on September 23, 2024

You could try running test.sh in the test folder; it provides an example on E. coli. The E. coli reference is already in the folder, so all you need to do is run the test.sh script.

morispi commented on September 23, 2024

I ran a full experiment with default parameters on E. coli last night. It finished successfully in a few hours and needed around 700 GB of disk space. However, I did not use the parameters specified in test.sh because I did not want any SVs in the data (it might sound weird, but I'm interested in seeing how SV-calling tools, especially the one I'm working on, behave on datasets with no SVs). The command I used was the following: `perl simulateLinkedReads.pl -r Ecoli.fasta -p /scratch/pmorisse/LRSIM/Ecoli/SimEcoli -n`

I then used seqtk to randomly subsample the fastq files, and ran the LongRanger alignment on the subsampled fastq files. Their total size was around 7 GB, which seems like reasonable coverage for a small test experiment.

However, I still got the same error: LongRanger reported that an extremely high rate of incorrect barcodes was observed.

Am I forced to perform LongRanger alignment with the whole 700 GB fastq file generated with LRSIM? I'm afraid I won't have enough disk space if I have to do so. Or might it be because I deactivated SV simulation?

aquaskyline commented on September 23, 2024

I suggest you first do a test run of `test.sh` to see whether its output goes through LongRanger.

(Email reply to Pierre Morisse's comment of Tue, Dec 8, 2020, 11:46 PM.)

-- Laurent

morispi commented on September 23, 2024

I just ran test.sh and provided the generated data to LongRanger.
It crashed again, this time with a different error message:

Log message: stage error:FASTQ parsing error: input fastq not consistent

aquaskyline commented on September 23, 2024

It ran well on my side. I uploaded the files generated at http://www.bio8.cs.hku.hk/lrsim/.

morispi commented on September 23, 2024

Just downloaded and tested with your data, and got the same error.
Might be something to do with LongRanger I guess? Can you tell me which version you are using?

aquaskyline commented on September 23, 2024

LRSIM was tested on LongRanger 2.0

(Email reply to Pierre Morisse's comment of Thu, Dec 10, 2020, 7:59 PM.)

-- Laurent

morispi commented on September 23, 2024

That might be why: I'm using LongRanger 2.2.2.
LongRanger 2.0 does not seem to be available for download on the 10x Genomics website, though.

morispi commented on September 23, 2024

I managed to pin down the problem.

As mentioned in a previous issue, this was caused by the "/1" and "/2" located at the end of the reads simulated by LRSIM, which seem to be incompatible with LongRanger. Removing them and re-running LongRanger seemed to fix the problem with the data generated by the test.sh script.
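
For anyone hitting the same error, here is a minimal sketch of that clean-up step. It is not taken from LRSIM or LongRanger: the function names are hypothetical, and it assumes the "/1" or "/2" sits at the end of the read name (the first whitespace-delimited token of each header line) in an uncompressed, 4-line-per-record FASTQ file.

```python
def strip_pair_suffix(header):
    """Remove a trailing /1 or /2 from the read name in a FASTQ header line."""
    # Split the read name from any description that follows the first space.
    name, sep, comment = header.rstrip("\n").partition(" ")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name + sep + comment + "\n"


def strip_suffixes(in_path, out_path):
    """Copy a FASTQ file, rewriting only the header line of each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line_number, line in enumerate(src):
            # Headers are every 4th line; quality lines may also start with '@',
            # so the position is used rather than the first character.
            if line_number % 4 == 0:
                line = strip_pair_suffix(line)
            dst.write(line)
```

Run it once on each of the R1 and R2 files before handing them to LongRanger; sequence and quality lines are copied unchanged.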

I also tried generating more data, using most of the parameters mentioned in test.sh but deactivating SV simulation, and all seems to work well. LongRanger is still running, but has not reported any error.

I believe my initial problem with the high rate of incorrect barcodes was due to the fact that I was using -x 1 without decreasing the -t parameter accordingly.
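
The arithmetic behind that last point can be sketched roughly as below. The units are assumptions based on a reading of the LRSIM usage text (-x as millions of read pairs, -t as thousands of partitions, with 600 and 1500 used as example values); check them against your own copy before relying on the numbers. The function names are hypothetical.

```python
def read_pairs_per_partition(x_million_pairs, t_thousand_partitions):
    """Average read pairs per partition, under the assumed units of -x and -t."""
    total_pairs = x_million_pairs * 1_000_000
    partitions = t_thousand_partitions * 1_000
    return total_pairs / partitions


def t_for_target(x_million_pairs, target_pairs_per_partition):
    """Value of -t (in thousands of partitions) that keeps a target pairs/partition."""
    return x_million_pairs * 1_000_000 / target_pairs_per_partition / 1_000


# Assumed example values: -x 600 with -t 1500.
baseline = read_pairs_per_partition(600, 1500)

# Lowering -x to 1 while leaving -t at 1500 spreads the reads far too thinly.
thin = read_pairs_per_partition(1, 1500)

# -t value that would restore the baseline density for -x 1.
matching_t = t_for_target(1, baseline)

print(baseline, thin, matching_t)
```

Under these assumptions, lowering -x alone leaves less than one read pair per partition on average, which would be consistent with LongRanger rejecting the barcodes.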

aquaskyline commented on September 23, 2024

That's great. I was trying to pinpoint the problem but focused too much on the barcode list.

(Email reply to Pierre Morisse's comment of Mon, Dec 14, 2020, 11:55 PM.)

-- Laurent

morispi commented on September 23, 2024

Closing since the problem is solved.
