Comments (13)

aquaskyline avatar aquaskyline commented on September 23, 2024

Lowering -x to 100 is likely the cause of the error. For a smaller dataset, you might try simulating a larger dataset with LRSIM's default parameters and then using a random subset of the simulated reads, sampled from the whole fastq file. LongRanger checks the distribution and depth evenness of the barcodes very stringently. Changing the default parameters in LRSIM and using -o to disable parameter checking can make the simulated dataset look unreliable to LongRanger.

from lrsim.

morispi avatar morispi commented on September 23, 2024

Thanks for your answer!

Yeah, I could also do that, but the reason I lowered -x was actually that the size of the generated data was getting pretty big. I'm not exactly sure how far I got through the simulation process, but it grew to a little more than 500 GB. Since I don't have access to lots of disk space, I thought lowering -x was a good compromise.

I'll try running again and leave -x at its default value then. Do you have any idea how much disk space it is gonna use in total when running on E. coli? I would just like to be sure it's not gonna completely fill the disk space I have left, since I have other experiments running in parallel that also need some disk space.

Thanks again.

Pierre

aquaskyline avatar aquaskyline commented on September 23, 2024

You could try running test.sh in the test folder; it provides an example on E. coli. The E. coli reference is already in the folder, so all you need to do is run the test.sh script.

morispi avatar morispi commented on September 23, 2024

I did run a full experiment with default parameters on E. coli last night. It ran successfully in a few hours and needed around 700 GB of disk space. However, I did not use the parameters specified in test.sh because I did not want any SVs to be included in the data (it might sound weird, but I'm interested in seeing how SV-calling tools, especially the one I'm working on, behave on datasets with no SVs). The command I used was the following:
perl simulateLinkedReads.pl -r Ecoli.fasta -p /scratch/pmorisse/LRSIM/Ecoli/SimEcoli -n

I then used seqtk to randomly subsample the fastq files, and ran LongRanger alignment on the subsampled fastq files. Their total size was around 7 GB, which seems like reasonable coverage for a small test experiment.
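For readers who want to see what paired subsampling involves, here is a minimal Python sketch. It is not what seqtk does internally and not part of LRSIM; the function names `read_fastq` and `subsample_pairs` are made up for illustration. The only point it shows is that both mates of a pair must be kept or dropped together so that the R1 and R2 files stay in sync. It assumes plain 4-line FASTQ records.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_lines, r2_lines, fraction, seed=11):
    """Yield (R1 record, R2 record) tuples, keeping each pair with
    probability `fraction`. One random draw decides the fate of both mates."""
    rng = random.Random(seed)
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rng.random() < fraction:
            yield rec1, rec2


if __name__ == "__main__":
    # Tiny made-up input; real input would be two open file handles.
    r1 = ["@a\n", "ACGT\n", "+\n", "IIII\n", "@b\n", "GGCC\n", "+\n", "IIII\n"]
    r2 = ["@a\n", "TTAA\n", "+\n", "IIII\n", "@b\n", "CCGG\n", "+\n", "IIII\n"]
    for rec1, rec2 in subsample_pairs(r1, r2, fraction=0.5):
        print(rec1[0].strip(), rec2[0].strip())
```

The line iterables can be ordinary text file handles, for example from `open(path)` or `gzip.open(path, "rt")`, so the files are streamed rather than loaded into memory.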

However, I still got the same error, and LongRanger reported that an extremely high rate of incorrect barcodes was observed.

Am I forced to perform LongRanger alignment with the whole 700 GB fastq file generated with LRSIM? I'm afraid I won't have enough disk space if I have to do so. Or might it be because I deactivated SV simulation?

aquaskyline avatar aquaskyline commented on September 23, 2024

morispi avatar morispi commented on September 23, 2024

I just ran test.sh and provided the generated data to LongRanger.
It crashed again, and output a different error message:

Log message: stage error:FASTQ parsing error: input fastq not consistent

aquaskyline avatar aquaskyline commented on September 23, 2024

It ran well on my side. I uploaded the generated files to http://www.bio8.cs.hku.hk/lrsim/.

morispi avatar morispi commented on September 23, 2024

Just downloaded and tested with your data, and got the same error.
Might be something to do with LongRanger I guess? Can you tell me which version you are using?

aquaskyline avatar aquaskyline commented on September 23, 2024

morispi avatar morispi commented on September 23, 2024

That might be why: I'm using LongRanger 2.2.2.
LongRanger 2.0 does not seem to be available for download on the 10x Genomics website though.

morispi avatar morispi commented on September 23, 2024

I managed to pin down the problem.

As mentioned in a previous issue, this was caused by the "/1" and "/2" located at the end of the names of the reads simulated by LRSIM, which seem to be incompatible with LongRanger. Removing them and rerunning LongRanger seemed to fix the problem with the data generated by the test.sh script.
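The thread does not say how the suffixes were removed, so the following is only one possible way to do it, sketched in Python. The names `strip_mate_suffix` and `clean_fastq` are hypothetical, and the sketch assumes plain 4-line FASTQ records in which the read name is the first whitespace-delimited token of the header line. Header lines are found by position (every fourth line) rather than by a leading "@", because quality strings can also start with "@".

```python
import re

# Matches a trailing "/1" or "/2" at the end of a read name.
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(header):
    """Remove a trailing "/1" or "/2" from the read name in a FASTQ header."""
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    # Split off an optional comment after the first space and keep it intact.
    name, sep, comment = header.rstrip("\n").partition(" ")
    return _MATE_SUFFIX.sub("", name) + sep + comment + "\n"


def clean_fastq(lines):
    """Yield FASTQ lines, rewriting only the header (first) line of each
    4-line record and passing sequence, "+", and quality lines through."""
    for index, line in enumerate(lines):
        yield strip_mate_suffix(line) if index % 4 == 0 else line


if __name__ == "__main__":
    # Made-up record; the quality line deliberately starts with "@".
    record = ["@read_0001/1\n", "ACGT\n", "+\n", "@II/\n"]
    print("".join(clean_fastq(record)), end="")
```

Applied to both the R1 and R2 files, this leaves the two mates of each pair with identical read names.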

I also tried generating more data, using most of the parameters mentioned in test.sh but deactivating SV simulation, and all seems to work well. LongRanger is still running, but has not reported any error.

I believe my initial problem with the high rate of incorrect barcodes was due to the fact that I was using -x 1 without decreasing the -t parameter accordingly.
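One way to read this last remark is as simple arithmetic on read pairs per partition. The sketch below is an illustration only and rests on assumptions that should be checked against LRSIM's own help output: that -x is given in millions of read pairs, that -t is given in thousands of partitions, and that the reference values 600 and 1500 used in the demo are the defaults. The function names are made up.

```python
def read_pairs_per_partition(x_million_pairs, t_thousand_partitions):
    """Average read pairs per partition.

    ASSUMPTION: -x is in millions of read pairs and -t is in thousands
    of partitions; verify against the LRSIM usage text."""
    return (x_million_pairs * 1_000_000) / (t_thousand_partitions * 1_000)


def scaled_t(x_new, x_ref, t_ref):
    """Value of -t that keeps read pairs per partition the same as in the
    reference setting when -x changes from x_ref to x_new."""
    return t_ref * x_new / x_ref


if __name__ == "__main__":
    # ASSUMPTION: 600 and 1500 are taken as the reference -x and -t values.
    print(read_pairs_per_partition(600, 1500))  # reference setting
    print(read_pairs_per_partition(1, 1500))    # -x lowered, -t unchanged
    print(scaled_t(1, 600, 1500))               # -t scaled along with -x
```

Under these assumptions, lowering -x alone leaves less than one read pair per partition on average, which would fit the suspicion that the barcode distribution then looked wrong to LongRanger.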

aquaskyline avatar aquaskyline commented on September 23, 2024

morispi avatar morispi commented on September 23, 2024

Closing since the problem is solved.
